Saturday, July 7, 2012

OSGI compatible Astyanax 1.0.3

This post describes how to make the Astyanax API OSGI compatible. As of the date of this post (July 2012), the Astyanax version is 1.0.3. Astyanax was developed and open sourced by Netflix and is used as a database client to access Cassandra. If you are familiar with the history of the Cassandra database, then you might know that one of the recent popular clients to access Cassandra was the 'Hector' API. Back in the day, before Hector, there was only one way to access Cassandra: Apache Thrift. So Hector builds on Thrift, and Astyanax follows on from Hector.
As a side note:
In Greek mythology, Astyanax was the son of Hector. His birth name was Scamandrius, but the people of Troy nicknamed him Astyanax, so it is a fitting name for the API. The parallels end there, though, as the Astyanax of myth was thrown from the walls of Troy by the Greeks and killed after the Trojan War. Let's hope this Astyanax (the API) is not thrown over the wall, but will live long and stand strong.

Many thanks to Netflix for making Astyanax open source. If you guys ever want to make Astyanax OSGI compliant, you may find this post useful.

Create a directory structure like the one shown below or, if you prefer, 'git' it at https://github.com/kentacious/astyanax-osgi.git

The Astyanax directory structure is based on the most recent version as of this post (July 2012). The dependencies are slightly tweaked to be used with the DataStax distro (community edition) of Cassandra, currently at version 1.1.0.

.
└── astyanax-osgify-1.0.3
    ├── astyanax-1.0.3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── avro-1.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-all-for-real-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-cql-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-jdbc-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── commons-cli-1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── commons-csv-1.0-r706900_3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── compress-lzf-0.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── concurrentlinkedhashmap-lru-1.2
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── eaio-uuid-3.2.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── high-scale-lib-1.1.2
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── jamm-0.2.5
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── javax-inject-1.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── jettison-1.3.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── libthrift-0.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── log4j-1.2.16
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── metrics-core-2.0.3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── paranamer-2.5
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── pom.xml
    └── snappy-1.0.4.1
        ├── pom.xml
        └── src
            └── main
                └── resources
                    └── META-INF
                        └── template.mf


Spring Source Bundlor

As you can see, this is just a bunch of pom.xml files (18 give or take) with one parent pom.xml at the root of 'astyanax-osgify-1.0.3'. When you crack open the parent pom.xml, you'll see that it uses the Maven plugin 'Spring Source Bundlor' as follows.
                <groupId>com.springsource.bundlor</groupId>
                <artifactId>com.springsource.bundlor.maven</artifactId>
This snippet of magic takes a non-osgified jar and examines its contents by scanning the byte code, identifying the packages the classes reference and the packages the jar itself contains. It then produces a brand new jar containing all of the original classes plus a newly decorated META-INF/MANIFEST.MF file, with entries defining all of the imports and exports plus some other important OSGI metadata.
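To make the export side of that scan concrete, here is a minimal Java sketch that lists the packages found in a jar, which are the candidates for an Export-Package header. This is not Bundlor's code: the class name JarPackageLister and its methods are made up for illustration, and the real tool also inspects the byte code for referenced packages in order to build the imports.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of Bundlor.
final class JarPackageLister {

    // Turns entry names such as "org/apache/thrift/TBase.class" into package names.
    static SortedSet<String> packagesOf(Collection<String> entryNames) {
        SortedSet<String> packages = new TreeSet<>();
        for (String name : entryNames) {
            int slash = name.lastIndexOf('/');
            // Skip non-class entries and classes in the default package.
            if (name.endsWith(".class") && slash > 0) {
                packages.add(name.substring(0, slash).replace('/', '.'));
            }
        }
        return packages;
    }

    // Collects the entry names of a jar on disk and derives its packages.
    static SortedSet<String> packagesInJar(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return packagesOf(jar.stream()
                    .map(entry -> entry.getName())
                    .collect(Collectors.toList()));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JarPackageLister <path-to-jar>");
            return;
        }
        packagesInJar(args[0]).forEach(System.out::println);
    }
}
```

Running it against a plain jar such as libthrift prints one package per line, which gives a rough idea of what ends up in the generated export list.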

Define Transitive Dependencies as Modules and Define Maven Repositories

Let's first take a look at the parent pom.xml file. Take note of the <module> entries below, which define all of the modules. By convention, these module names are the same as the directory names.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  
  <!-- ================== -->
  <!-- Basic project info -->
  <!-- ================== -->
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.netflix.astyanax</groupId>
  <artifactId>astyanax.parent</artifactId>
  <version>1.0.3</version>
  <packaging>pom</packaging>
  <name>Astyanax 1.0.3 OSGI Bundle builder - Parent POM</name>


  <!-- ================================= -->
  <!-- Properties unique to this project -->
  <!-- ================================= -->
  <properties>
    <osgi.bundle.symbolic.name>${groupId}.${artifactId}</osgi.bundle.symbolic.name>
    <parent.pom.version>1.0.3</parent.pom.version>
    <osgi.bundle.version>1.0.3</osgi.bundle.version>
  </properties>
  

  <!-- =============== -->
  <!-- Project Modules -->
  <!-- =============== -->
  <modules>
    <module>eaio-uuid-3.2.0</module>
    <module>astyanax-1.0.3</module>
    <module>avro-1.7.0</module>
    <module>cassandra-all-for-real-1.1.1</module>
    <module>commons-cli-1.1</module>
    <module>commons-csv-1.0-r706900_3</module>
    <module>compress-lzf-0.7.0</module>
    <module>concurrentlinkedhashmap-lru-1.2</module>
    <module>high-scale-lib-1.1.2</module>
    <module>jamm-0.2.5</module>
    <module>javax-inject-1.0</module>
    <module>jettison-1.3.1</module>
    <module>libthrift-0.7.0</module>
    <module>log4j-1.2.16</module>
    <module>metrics-core-2.0.3</module>
    <module>paranamer-2.5</module>
    <module>snappy-1.0.4.1</module>
  </modules>

  <!-- ==================== -->
  <!-- Project dependencies -->
  <!-- ==================== -->
  <dependencies>
  </dependencies>
  
  <build>
    <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>unpack-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.outputDirectory}/</outputDirectory>
                            <excludeTransitive>true</excludeTransitive>
                            <excludes>META-INF/**/*</excludes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
                <executions>
                    <execution>
                        <id>initialize</id>
                        <phase>initialize</phase>
                        <goals>
                            <goal>resources</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>com.springsource.bundlor</groupId>
                <artifactId>com.springsource.bundlor.maven</artifactId>
                <version>1.0.0.M5</version>
                <configuration>
                    <failOnWarnings>false</failOnWarnings>
                    <removeNullHeaders>true</removeNullHeaders>
                    <manifestTemplatePath>${basedir}/src/main/resources/META-INF/template.mf</manifestTemplatePath>
                    <outputManifest>${project.build.outputDirectory}/META-INF/manifest.mf</outputManifest>
                </configuration>
                <executions>
                    <execution>
                        <id>bundle-manifest</id>
                        <phase>package</phase>
                        <goals>
                            <goal>transform</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

    </plugins>
  </build>  

  <!-- ======================== -->
  <!-- Repository Configuration -->
  <!-- ======================== -->
  <repositories>
        <repository>
            <id>artifactory-releases</id>
            <name>artifactory-releases</name>
            <url>http://itstcb.com/artifactory/libs-release-local</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <releases>
                <enabled>true</enabled>
            </releases>
        </repository>
        <repository>
            <id>artifactory-snapshots</id>
            <name>artifactory-snapshots</name>
            <url>http://itstcb.com/artifactory/libs-snapshot-local</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>com.springsource.repository.bundles.milestone</id>
            <name>SpringSource Enterprise Bundle Repository - SpringSource Milestone Releases</name>
            <url>http://repository.springsource.com/maven/bundles/milestone</url>
        </pluginRepository>
        <pluginRepository>
            <id>com.springsource.repository.bundles.release</id>
            <name>SpringSource Enterprise Bundle Repository</name>
            <url>http://repository.springsource.com/maven/bundles/release</url>
        </pluginRepository>
    </pluginRepositories>
</project>



Having a parent pom file allows you to keep all the heavy lifting in one place, so that the individual modules' pom.xml files are relatively clean and simple.

Create Child Poms

I'll crack open one of the child poms to show you how clean they are. This is the pom file for 'libthrift-0.7.0':

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  
  <!-- ================== -->
  <!-- Basic project info -->
  <!-- ================== -->
  <parent>
    <groupId>com.netflix.astyanax</groupId>
    <artifactId>astyanax.parent</artifactId>
    <version>1.0.3</version>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift-osgi</artifactId>
  <version>0.7.0</version>
  <packaging>jar</packaging>
  <name>Lib Thrift ${version} </name>

  
  <!-- ================================= -->
  <!-- Properties unique to this project -->
  <!-- ================================= -->
  <properties>
    <osgi.bundle.symbolic.name>${groupId}.${artifactId}</osgi.bundle.symbolic.name>
    <osgi.bundle.version>${version}</osgi.bundle.version>
    <osgi.bundle.name>${name}</osgi.bundle.name>
  </properties>
  
  <!-- ==================== -->
  <!-- Project dependencies -->
  <!-- ==================== -->
  <dependencies>
     <dependency>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libthrift</artifactId>
      <version>${version}</version>
    </dependency>
  </dependencies>

</project>



Now that is crazy clean. All you really have to do is declare the name of the output, which becomes the OSGI compatible module, via the groupId and artifactId.
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift-osgi</artifactId>
  <version>0.7.0</version>
And define where the input is coming from.
  <dependencies>
     <dependency>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libthrift</artifactId>
      <version>${version}</version>
    </dependency>
  </dependencies>
The input will be downloaded from either your local Maven repository or a remote repository (an Artifactory server, for example). So there is no need to assemble all of the individual jars by hand in order to make the modules OSGI ready.

template.mf

Now it is time to move on to the template.mf file that you see buried in each of the module subdirectories. This file gives Spring Bundlor any additional OSGI metadata that might be needed at run time. The most common scenario is marking certain packages that a module imports as optional. When running in an OSGI container, if a bundle's mandatory imports are not all satisfied, the bundle cannot be resolved: it stays in the 'INSTALLED' state and never reaches 'ACTIVE', which is where it needs to be in order to be usable in the container. Below is the template.mf file for 'libthrift-0.7.0':
Bundle-ManifestVersion: 2
Bundle-SymbolicName: ${osgi.bundle.symbolic.name}
Bundle-Version: ${osgi.bundle.version}
Bundle-Name: ${osgi.bundle.name}
Import-Template:
  org.apache.http;version=0.0.0;resolution:=optional,
  org.apache.http.client;version=0.0.0;resolution:=optional,
  org.apache.http.client.methods;version=0.0.0;resolution:=optional,
  org.apache.http.entity;version=0.0.0;resolution:=optional,
  org.apache.http.params;version=0.0.0;resolution:=optional


As you can see, I added some 'optional' statements to tell OSGI to be quiet. Since we are running on the server side and Astyanax connects to Cassandra over a direct TCP socket, it was safe for me to assume we are not using an HTTP client to talk to Cassandra. If I were wrong, I would get a 'ClassNotFoundException' at runtime and would have to come back to this project and include an additional jar to resolve the dependency. Another interesting template.mf file, from the module 'cassandra-all-for-real', is shown below. It has a bunch of optional packages.
Bundle-ManifestVersion: 2
Bundle-SymbolicName: ${osgi.bundle.symbolic.name}
Bundle-Version: ${osgi.bundle.version}
Bundle-Name: ${osgi.bundle.name}
Import-Template:
  edu.stanford.ppl.concurrent;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.conf;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.fs;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.io;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.mapred;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.mapreduce;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.util;version=0.0.0.0;resolution:=optional,
  org.apache.pig;version=0.0.0.0;resolution:=optional,
  org.apache.pig.backend.executionengine;version=0.0.0.0;resolution:=optional,
  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer;version=0.0.0.0;resolution:=optional,
  org.apache.pig.data;version=0.0.0.0;resolution:=optional,
  org.apache.pig.impl.util;version=0.0.0.0;resolution:=optional,
  com.sun.jna;version=0.0.0.0;resolution:=optional

Since I am not using 'pig' or 'hadoop', I made those packages optional.
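Marking an import optional only moves a possible failure from install time to run time, so code that might touch an optional package can probe for it first. Here is a minimal sketch, assuming a hypothetical helper class OptionalDependencyProbe (it is not part of Astyanax, Thrift, or OSGI). The class org.apache.http.client.HttpClient is used as the example because it lives in one of the packages made optional for libthrift above.

```java
// Hypothetical helper for illustration only.
final class OptionalDependencyProbe {

    // Returns true if the named class can be located through the given class loader.
    // The class is looked up without being initialized.
    static boolean isAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The package was not wired in (or could not be linked), so treat it as absent.
            return false;
        }
    }

    public static void main(String[] args) {
        // Inside an OSGI container this is the class loader of the bundle holding this class.
        ClassLoader loader = OptionalDependencyProbe.class.getClassLoader();
        String httpClient = "org.apache.http.client.HttpClient";
        System.out.println(httpClient + " available: " + isAvailable(httpClient, loader));
    }
}
```

If the probe reports the class as missing and you actually need that code path, the fix is the one described above: add the missing jar as another bundle.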

Resolving OSGI Dependency Issues and Dealing with Package Ambiguity

You may be wondering why I named the OSGI module 'cassandra-all-for-real'.

It turned out to be an interesting story and a learning experience that strengthened my knowledge of OSGI. The Cassandra distro recently broke out the Thrift portion into a separate jar, so originally there were two jars, as follows.
cassandra-all-1.1.1.jar
cassandra-thrift-1.1.1.jar

Merging Multiple Jars into One Jar Using Spring Bundlor

It turns out both jars expose the same package name, 'org.apache.cassandra.thrift', but with different classes in each jar. In OSGI the lowest level of granularity is the package, so from the OSGI perspective that package name was ambiguous. In my previous implementation of this project both jars were osgified independently, and I didn't notice a problem until I started calling the Astyanax API from my own code inside an OSGI container.

It should be noted that OSGI can handle multiple versions of the same package, to support backwards compatibility in the same runtime environment. In this case, however, the two jars defined the same package name with different contents, and OSGI was not too happy. There was no easily identifiable solution except to merge the two jars into one, hence the 'for-real' added to the name cassandra-all to identify the OSGI bundle.

I started off merging the two jars by hand, and that was a pain because the manifest would not import and export the package names correctly. Then it occurred to me (divine intervention) that in the pom file that produces the OSGI bundle, you can add more than just one dependency.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <!-- ================== -->
    <!-- Basic project info -->
    <!-- ================== -->
    <parent>
      <groupId>com.netflix.astyanax</groupId>
      <artifactId>astyanax.parent</artifactId>
      <version>1.0.3</version>
      <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all-for-real-osgi</artifactId>
    <version>1.1.1</version>
    <packaging>jar</packaging>
    <name>Cassandra All ${version}</name>


    <!-- ================================= -->
    <!-- Properties unique to this project -->
    <!-- ================================= -->
    <properties>
        <osgi.bundle.symbolic.name>${groupId}.${artifactId}</osgi.bundle.symbolic.name>
        <osgi.bundle.version>${version}</osgi.bundle.version>
        <osgi.bundle.name>${name}</osgi.bundle.name>
    </properties>

    <!-- ==================== -->
    <!-- Project dependencies -->
    <!-- ==================== -->
    <dependencies>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-all</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-thrift</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

</project>
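The ambiguity described above (the same package present in two jars) can be spotted before deployment by intersecting the package lists of the jars. Here is a minimal sketch, with a hypothetical class name and deliberately abbreviated, illustrative package lists rather than the real contents of the two Cassandra jars.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper for illustration only.
final class SplitPackageCheck {

    // Returns the packages that appear in both sets, i.e. the ones OSGI would see as ambiguous.
    static SortedSet<String> sharedPackages(Set<String> first, Set<String> second) {
        SortedSet<String> shared = new TreeSet<>(first);
        shared.retainAll(second);
        return shared;
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative package lists; a real check would read them from the jars.
        Set<String> cassandraAll = new HashSet<>(Arrays.asList(
                "org.apache.cassandra.db",
                "org.apache.cassandra.thrift"));
        Set<String> cassandraThrift = new HashSet<>(Arrays.asList(
                "org.apache.cassandra.thrift"));

        System.out.println("Packages found in both jars: "
                + sharedPackages(cassandraAll, cassandraThrift));
    }
}
```

Any package reported here is a candidate for the merge approach shown in the pom above, where both jars are listed as dependencies of a single bundle.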


Final Step - Run Maven from the root directory of 'astyanax-osgify-1.0.3'

>  mvn clean install
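
Once the build finishes, one way to sanity-check a generated bundle is to read its manifest and look for the OSGI headers. Here is a minimal sketch using only the JDK; the class name BundleManifestPeek is made up for illustration, and the jar path you pass in (for example something under a module's target directory) depends on your own build output.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only.
final class BundleManifestPeek {

    // Picks the main OSGI headers out of a manifest, skipping any that are absent.
    static Map<String, String> osgiHeaders(Manifest manifest) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String name : Arrays.asList(
                "Bundle-SymbolicName", "Bundle-Version", "Export-Package", "Import-Package")) {
            String value = manifest.getMainAttributes().getValue(name);
            if (value != null) {
                headers.put(name, value);
            }
        }
        return headers;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: BundleManifestPeek <path-to-bundle-jar>");
            return;
        }
        try (JarFile jar = new JarFile(args[0])) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                System.out.println("No manifest found in " + args[0]);
                return;
            }
            osgiHeaders(manifest).forEach((name, value) -> System.out.println(name + ": " + value));
        }
    }
}
```

If none of the headers are printed, the manifest that Bundlor generated did not make it into the jar, and the bundle will not be usable in an OSGI container.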


Additional Thoughts

This led me to my next thought. If you wanted to, you could get rid of all the subdirectories in this project and have just one pom.xml that lists every Astyanax dependency in one place, producing one big jar that is entirely self-contained with no external dependencies. I don't know how big that jar would be, but it is possible to do. That way your distribution would be just one big jar, and people could start using it without worrying about dependencies. That is really a clever idea.
