Sunday, August 5, 2012

Install Python 2.7 on CentOS without breaking yum (which depends on 2.4)

Problem Statement:
Here's the deal. I want to be able to run the DataStax suite on a Red Hat Enterprise Linux Server release 5.5 (Tikanga) VM or a CentOS 5.x distro. DataStax has a super nice web-based console for monitoring Cassandra clusters, and that console requires Python 2.6 or 2.7 (3.x is not supported yet as of 8/5/2012). On top of that, cqlsh (the CQL shell bundled with Cassandra since 0.8) also needs a modern version of Python. The problem is that we can't just blindly upgrade to Python 2.7: CentOS and RHEL have a hard dependency on Python 2.4, and replacing it breaks the yum package manager. So to keep yum happy, we need Python 2.4 and 2.7 installed side by side.
OK, so to do this you need sudo access.
Download the Python source into /opt or ~/source (I put mine in ~/source) and build it from there:
sudo yum install gcc
mkdir -p ~/source && cd ~/source
wget http://www.python.org/ftp/python/2.7.3/Python-2.7.3.tgz
tar -xvzf Python-2.7.3.tgz
cd Python-2.7.3
./configure
make
sudo make altinstall
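The command
make altinstall
is critical. Unlike make install, it installs the interpreter only under its versioned name (python2.7, in /usr/local/bin with the default configure prefix) and does not create the unversioned python executable, so the system /usr/bin/python (2.4) that yum relies on is neither replaced nor shadowed on your PATH.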
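To confirm that both interpreters are really sitting side by side, a small shell check like the one below can help. It is only a sketch: it assumes the default configure prefix (so the new interpreter lives at /usr/local/bin/python2.7), and the function names are made up for this example. Note that Python 2.x prints its version on stderr, which is why the output is redirected.

```shell
#!/bin/sh
# Sketch: verify Python 2.7 was installed next to the system Python 2.4.
# Assumption: default ./configure prefix, so altinstall created
# /usr/local/bin/python2.7. Function names are illustrative only.

# Python 2.x writes "Python X.Y.Z" to stderr, so fold it into stdout.
py_version() {
  "$1" -V 2>&1
}

# Succeeds only for 2.6.x or 2.7.x, the versions the console accepts.
is_supported_python() {
  case "$1" in
    "Python 2.6."* | "Python 2.7."*) return 0 ;;
    *) return 1 ;;
  esac
}

# Print both versions and fail if the new interpreter is missing or unsupported.
check_side_by_side() {
  new=$(py_version /usr/local/bin/python2.7) || return 1
  sys=$(py_version /usr/bin/python) || return 1
  echo "new interpreter:    $new"
  echo "system interpreter: $sys (the one yum runs)"
  is_supported_python "$new"
}

# Usage: source this file, then run check_side_by_side
```

On a CentOS 5 box the system line should still report a 2.4.x interpreter; if it does not, something other than altinstall touched /usr/bin/python.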

Monday, July 9, 2012

Evangelizing OSGI



I've read a number of posts about OSGI, and most of them do not mention the framework's greatest strength. Every traditional JEE or web application server comes with its own set of built-in jars. That stack of jars changes from version to version and, for that matter, differs across vendor implementations. I've even seen variations in the jar stack within the same version of a container. In theory, you should be able to drop any web app into any JEE-compliant app server and have it just work. We all know that is just not true.

I was once on a project where the container's jar set had been manipulated so much during the dev cycle that no one knew how it was derived or why. The only thing we knew was that one of our developers had a working set of jars. We were set to go into production within a matter of days, so to get us back on track, all of the jars from his machine were copied into the production instances. The final app container did not even closely resemble the GA release of the app server version it claimed to be. Talk about a cluster. Whenever we came across a 'ClassNotFoundException' or a 'NoSuchMethodError', we pulled the matching jar from the working set and hand-jammed it into the container. That is no way to develop.

So the point of this post is that OSGI eliminates uncertainty in your jar dependencies. Every bundle declares exactly which packages (and which versions) it imports and exports, and the framework wires those dependencies explicitly instead of relying on whatever jars happen to ship with the container. Your OSGI app stays consistent, and you won't be left guessing at hard-to-find bugs caused by runtime dependencies, not to mention class loader problems. And finally, your app becomes the container: you distribute it as a self-contained tarball or zip file, not as a war or ear deployed into an app server (although your app may contain war files) that may or may not satisfy all of its dependencies. I value my sleep, and I don't want to be on a 3 am call trying to diagnose a class loader issue. Thanks for reading my post.

Saturday, July 7, 2012

OSGI compatible Astyanax 1.0.3

This post describes how to make the Astyanax API OSGI compatible. As of the date of this post (July 2012), the Astyanax version is 1.0.3. Astyanax was developed and open sourced by Netflix and is used as a client library for accessing the Cassandra database. If you are familiar with the history of Cassandra, you might know that one of the recent popular clients was the Hector API. Back in the day, before Hector, there was only one way to access Cassandra: Apache Thrift. Hector is built on top of Thrift, and Astyanax grew out of Hector.
As a side note:
In Greek mythology, Astyanax was the son of Hector. His birth name was Scamandrius, but the people of Troy nicknamed him Astyanax, which makes it a fitting name for the API. The parallels diverge from here, though: the mythical Astyanax was thrown from the walls of Troy by the Greeks when the city fell at the end of the Trojan War. Let's hope this Astyanax (the API) is not thrown over the wall, but will live long and stand strong.

Many thanks to Netflix for making Astyanax open source. If you guys ever want to make Astyanax OSGI compliant, you may find this post useful.

Create a directory structure like the one below, or, if you prefer, clone it with git from https://github.com/kentacious/astyanax-osgi.git

The astyanax directory structure is based on the most recent version as of this post (July 2012). The dependencies are slightly tweaked to be used with the DataStax distro (community edition) of Cassandra, currently at version 1.1.0.

.
└── astyanax-osgify-1.0.3
    ├── astyanax-1.0.3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── avro-1.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-all-for-real-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-cql-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── cassandra-jdbc-1.1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── commons-cli-1.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── commons-csv-1.0-r706900_3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── compress-lzf-0.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── concurrentlinkedhashmap-lru-1.2
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── eaio-uuid-3.2.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── high-scale-lib-1.1.2
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── jamm-0.2.5
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── javax-inject-1.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── jettison-1.3.1
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── libthrift-0.7.0
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── log4j-1.2.16
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── metrics-core-2.0.3
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── paranamer-2.5
    │   ├── pom.xml
    │   └── src
    │       └── main
    │           └── resources
    │               └── META-INF
    │                   └── template.mf
    ├── pom.xml
    └── snappy-1.0.4.1
        ├── pom.xml
        └── src
            └── main
                └── resources
                    └── META-INF
                        └── template.mf


Spring Source Bundlor

As you can see, this is just a bunch of pom.xml files (20 in all) with one parent pom.xml at the root of 'astyanax-osgify-1.0.3'. When we crack open the parent pom.xml, you'll see that it uses the SpringSource Bundlor Maven plugin:
                <groupId>com.springsource.bundlor</groupId>
                <artifactId>com.springsource.bundlor.maven</artifactId>
This snippet of magic takes a non-osgified jar, analyzes its bytecode to find every package the classes reference and every package the jar contains, and produces a brand new jar containing all of the original classes plus a newly decorated META-INF/MANIFEST.MF whose Import-Package and Export-Package headers describe those packages, along with some other important OSGI metadata.
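To see what Bundlor actually generated, you can pull the manifest out of a built jar and print a single header. Manifest lines are wrapped at 72 bytes, with each continuation line starting with a single space, so long headers such as Import-Package have to be re-joined before they are readable. This is a minimal sketch; the function name manifest_header is made up for this example, and the jar path in the usage comment assumes Maven's default artifactId-version naming.

```shell
#!/bin/sh
# Sketch: print the value of one manifest header read from stdin,
# re-joining the continuation lines (lines that start with one space).
manifest_header() {
  awk -v name="$1" '
    { sub(/\r$/, "") }                   # manifests normally use CRLF line ends
    /^ / {                               # continuation of the previous header
      if (cur == name) val = val substr($0, 2)
      next
    }
    {
      cur = ""
      i = index($0, ": ")
      if (i > 0) {
        cur = substr($0, 1, i - 1)
        if (cur == name) val = substr($0, i + 2)
      }
    }
    END { if (val != "") print val }
  '
}

# Usage (jar name assumes Maven's default <artifactId>-<version>.jar):
#   unzip -p libthrift-0.7.0/target/libthrift-osgi-0.7.0.jar META-INF/MANIFEST.MF \
#     | manifest_header Import-Package
```

Printing Import-Package and Export-Package this way is a quick check that the optional imports from template.mf made it into the final bundle.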

Define Transitive Dependencies as Modules and Define Maven Repositories

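Let's first take a look at the parent pom.xml file. Take note of the <modules> section, which defines all of the modules. Each <module> entry is the path of a module's directory relative to the parent pom, which is why the module names match the directory names. (Note that cassandra-cql-1.1.1 and cassandra-jdbc-1.1.1 appear in the directory tree above but not in this list, so a build from the root skips them.)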
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  
  <!-- ================== -->
  <!-- Basic project info -->
  <!-- ================== -->
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.netflix.astyanax</groupId>
  <artifactId>astyanax.parent</artifactId>
  <version>1.0.3</version>
  <packaging>pom</packaging>
  <name>Astyanax 1.0.3 OSGI Bundle builder - Parent POM</name>


  <!-- ================================= -->
  <!-- Properties unique to this project -->
  <!-- ================================= -->
  <properties>
    <osgi.bundle.symbolic.name>${project.groupId}.${project.artifactId}</osgi.bundle.symbolic.name>
    <parent.pom.version>1.0.3</parent.pom.version>
    <osgi.bundle.version>1.0.3</osgi.bundle.version>
  </properties>
  

  <!-- =============== -->
  <!-- Project Modules -->
  <!-- =============== -->
  <modules>
    <module>eaio-uuid-3.2.0</module>
    <module>astyanax-1.0.3</module>
    <module>avro-1.7.0</module>
    <module>cassandra-all-for-real-1.1.1</module>
    <module>commons-cli-1.1</module>
    <module>commons-csv-1.0-r706900_3</module>
    <module>compress-lzf-0.7.0</module>
    <module>concurrentlinkedhashmap-lru-1.2</module>
    <module>high-scale-lib-1.1.2</module>
    <module>jamm-0.2.5</module>
    <module>javax-inject-1.0</module>
    <module>jettison-1.3.1</module>
    <module>libthrift-0.7.0</module>
    <module>log4j-1.2.16</module>
    <module>metrics-core-2.0.3</module>
    <module>paranamer-2.5</module>
    <module>snappy-1.0.4.1</module>
 </modules>

  <!-- ==================== -->
  <!-- Project dependencies -->
  <!-- ==================== -->
  <dependencies>
  </dependencies>
  
  <build>
    <plugins>
            <plugin>
                <artifactId>maven-dependency-plugin</artifactId>
                <executions>
                    <execution>
                        <phase>compile</phase>
                        <goals>
                            <goal>unpack-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.outputDirectory}/</outputDirectory>
                            <excludeTransitive>true</excludeTransitive>
                            <excludes>META-INF/**/*</excludes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-resources-plugin</artifactId>
                <configuration>
                    <encoding>UTF-8</encoding>
                </configuration>
                <executions>
                    <execution>
                        <id>initialize</id>
                        <phase>initialize</phase>
                        <goals>
                            <goal>resources</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
           <plugin>
                <groupId>com.springsource.bundlor</groupId>
                <artifactId>com.springsource.bundlor.maven</artifactId>
                <version>1.0.0.M5</version>
                <configuration>
                    <failOnWarnings>false</failOnWarnings>
                    <removeNullHeaders>true</removeNullHeaders>
                    <manifestTemplatePath>${basedir}/src/main/resources/META-INF/template.mf</manifestTemplatePath>
                    <outputManifest>${project.build.outputDirectory}/META-INF/manifest.mf</outputManifest>
                </configuration>
                <executions>
                    <execution>
                        <id>bundle-manifest</id>
                        <phase>package</phase>
                        <goals>
                            <goal>transform</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

    </plugins>
  </build>  

  <!-- ======================== -->
  <!-- Repository Configuration -->
  <!-- ======================== -->
  <repositories>
        <repository>
            <id>artifactory-releases</id>
            <name>artifactory-releases</name>
            <url>http://itstcb.com/artifactory/libs-release-local</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
            <releases>
                <enabled>true</enabled>
            </releases>
        </repository>
        <repository>
            <id>artifactory-snapshots</id>
            <name>artifactory-snapshots</name>
            <url>http://itstcb.com/artifactory/libs-snapshot-local</url>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
            <releases>
                <enabled>false</enabled>
            </releases>
        </repository>
    </repositories>
    <pluginRepositories>
        <pluginRepository>
            <id>com.springsource.repository.bundles.milestone</id>
            <name>SpringSource Enterprise Bundle Repository - SpringSource Milestone Releases</name>
            <url>http://repository.springsource.com/maven/bundles/milestone</url>
        </pluginRepository>
        <pluginRepository>
            <id>com.springsource.repository.bundles.release</id>
            <name>SpringSource Enterprise Bundle Repository</name>
            <url>http://repository.springsource.com/maven/bundles/release</url>
        </pluginRepository>
    </pluginRepositories>
</project>



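Having a parent pom file keeps all the heavy lifting in one place, so the individual modules' pom.xml files stay relatively clean and simple. Note also that the dependency plugin runs with excludeTransitive set to true, so each bundle contains only the classes of the dependencies its own pom declares; that is why every transitive dependency gets its own module.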

Create Child Poms

I'll crack open one of the child poms to show you how clean they are. This is the pom file for 'libthrift-0.7.0':

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  
  <!-- ================== -->
  <!-- Basic project info -->
  <!-- ================== -->
  <parent>
    <groupId>com.netflix.astyanax</groupId>
    <artifactId>astyanax.parent</artifactId>
    <version>1.0.3</version>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift-osgi</artifactId>
  <version>0.7.0</version>
  <packaging>jar</packaging>
  <name>Lib Thrift ${project.version}</name>

  
  <!-- ================================= -->
  <!-- Properties unique to this project -->
  <!-- ================================= -->
  <properties>
    <osgi.bundle.symbolic.name>${project.groupId}.${project.artifactId}</osgi.bundle.symbolic.name>
    <osgi.bundle.version>${project.version}</osgi.bundle.version>
    <osgi.bundle.name>${project.name}</osgi.bundle.name>
  </properties>
  
  <!-- ==================== -->
  <!-- Project dependencies -->
  <!-- ==================== -->
  <dependencies>
     <dependency>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libthrift</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>

</project>



Now that is crazy clean. All you really have to do is declare the output coordinates (groupId, artifactId and version), which name the OSGI-compatible module:
  <groupId>org.apache.thrift</groupId>
  <artifactId>libthrift-osgi</artifactId>
  <version>0.7.0</version>
And declare where the input is coming from, as an ordinary dependency:
  <dependencies>
     <dependency>
      <groupId>org.apache.thrift</groupId>
      <artifactId>libthrift</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
Maven resolves the input from your local repository or downloads it from one of the remote repositories configured in the parent pom, so there is no need to assemble all of the individual jars by hand to make the modules OSGI ready.

template.mf

Now it's time to move on to the template.mf file buried in each of the module subdirectories. This template.mf file gives Bundlor any additional OSGI metadata that might be needed at run time. The most common scenario is marking certain packages that a module imports as optional. In an OSGI container, a bundle whose mandatory imports cannot all be satisfied stays in the 'INSTALLED' state: it cannot be resolved, so it can never be started and become 'ACTIVE', which is where it needs to be in order to be usable. Below is the template.mf file for 'libthrift-0.7.0':
Bundle-ManifestVersion: 2
Bundle-SymbolicName: ${osgi.bundle.symbolic.name}
Bundle-Version: ${osgi.bundle.version}
Bundle-Name: ${osgi.bundle.name}
Import-Template:
  org.apache.http;version=0.0.0;resolution:=optional,
  org.apache.http.client;version=0.0.0;resolution:=optional,
  org.apache.http.client.methods;version=0.0.0;resolution:=optional,
  org.apache.http.entity;version=0.0.0;resolution:=optional,
  org.apache.http.params;version=0.0.0;resolution:=optional


As you can see, I marked the org.apache.http packages as optional so that OSGI does not insist on them. Since we are running on the server side and Astyanax connects to Cassandra over a direct TCP socket, it was safe for me to assume we are not using an HTTP client to talk to Cassandra. If I were wrong, I would get a 'ClassNotFoundException' at runtime and would have to go back to this project and include an additional jar to resolve the dependency. Another interesting template.mf file, from the module 'cassandra-all-for-real', is shown below. It has a bunch of optional packages:
Bundle-ManifestVersion: 2
Bundle-SymbolicName: ${osgi.bundle.symbolic.name}
Bundle-Version: ${osgi.bundle.version}
Bundle-Name: ${osgi.bundle.name}
Import-Template:
  edu.stanford.ppl.concurrent;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.conf;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.fs;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.io;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.mapred;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.mapreduce;version=0.0.0.0;resolution:=optional,
  org.apache.hadoop.util;version=0.0.0.0;resolution:=optional,
  org.apache.pig;version=0.0.0.0;resolution:=optional,
  org.apache.pig.backend.executionengine;version=0.0.0.0;resolution:=optional,
  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer;version=0.0.0.0;resolution:=optional,
  org.apache.pig.data;version=0.0.0.0;resolution:=optional,
  org.apache.pig.impl.util;version=0.0.0.0;resolution:=optional,
  com.sun.jna;version=0.0.0.0;resolution:=optional

Since I am not using Pig or Hadoop, I made those packages optional.
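The effect of resolution:=optional can be checked against a built bundle: only the clauses without that directive have to be satisfied before the bundle can leave the INSTALLED state. The sketch below splits an Import-Package value into clauses (ignoring commas inside quoted version ranges) and prints the packages that are still mandatory. The function names are hypothetical, and it assumes the header value has already been un-wrapped onto a single line.

```shell
#!/bin/sh
# Sketch: list the mandatory (non-optional) packages of an Import-Package value.

# Split one Import-Package value (stdin) into one clause per line.
# Commas inside double quotes, as in version="[1.0,2.0)", do not split.
split_clauses() {
  awk '{
    inq = 0; clause = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq
      if (c == "," && !inq) { print clause; clause = "" }
      else clause = clause c
    }
    if (clause != "") print clause
  }'
}

# Print the package name of every clause not marked resolution:=optional.
# Each of these must be exported by some bundle before this one resolves.
mandatory_imports() {
  split_clauses | grep -v 'resolution:=optional' | sed 's/;.*//; s/^ *//'
}

# Usage: printf '%s\n' "$import_package_value" | mandatory_imports
```

If a package in that output is not exported by any bundle in the container, that is the one keeping the bundle INSTALLED.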

Resolving OSGI dependency issues and dealing with package ambiguity

You may be wondering why I named the OSGI module 'cassandra-all-for-real'.

It turns out to be an interesting story, and a learning experience that strengthened my knowledge of OSGI. The Cassandra distro recently broke the Thrift portion out into a separate jar, so there were originally two jars:
cassandra-all-1.1.1.jar
cassandra-thrift-1.1.1.jar

Merging multiple jars into one jar using Spring Bundlor

It turns out both jars contain the same package, 'org.apache.cassandra.thrift', but with different classes in each jar, and in OSGI the lowest level of granularity is the package. A bundle that imports a package is wired to exactly one exporter of it, so it sees only the classes from that one jar; from the OSGI perspective, the package was ambiguous. In my previous implementation of this project both jars were osgified independently, and I didn't notice a problem until I started calling the Astyanax API from my own code inside an OSGI container. OSGI can handle multiple versions of the same package in the same runtime to support backwards compatibility, but in this case two jars defined the same package name with different contents, and OSGI was not too happy. There was no easily identifiable solution except to merge the two jars into one, hence the 'for-real' in cassandra-all-for-real. I started off merging the two jars by hand, and that was a pain because the manifest wouldn't import and export the package names correctly. Then it occurred to me (divine intervention) that the pom file that produces the OSGI bundle can declare more than one dependency. The dependency plugin in the parent pom unpacks every declared dependency into the same output directory, so Bundlor sees a single jar containing the complete package:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <!-- ================== -->
    <!-- Basic project info -->
    <!-- ================== -->
    <parent>
      <groupId>com.netflix.astyanax</groupId>
      <artifactId>astyanax.parent</artifactId>
      <version>1.0.3</version>
      <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>cassandra-all-for-real-osgi</artifactId>
    <version>1.1.1</version>
    <packaging>jar</packaging>
    <name>Cassandra All ${project.version}</name>


    <!-- ================================= -->
    <!-- Properties unique to this project -->
    <!-- ================================= -->
    <properties>
        <osgi.bundle.symbolic.name>${project.groupId}.${project.artifactId}</osgi.bundle.symbolic.name>
        <osgi.bundle.version>${project.version}</osgi.bundle.version>
        <osgi.bundle.name>${project.name}</osgi.bundle.name>
    </properties>

    <!-- ==================== -->
    <!-- Project dependencies -->
    <!-- ==================== -->
    <dependencies>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-all</artifactId>
            <version>1.1.1</version>
        </dependency>
        <dependency>
            <groupId>org.apache.cassandra</groupId>
            <artifactId>cassandra-thrift</artifactId>
            <version>1.1.1</version>
        </dependency>
    </dependencies>

</project>
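The split-package problem described above can be spotted before it bites at runtime by comparing the package lists of two jars. The sketch below reads two jar listings (one entry per line, as printed by unzip -Z1 or jar tf) and prints every package that has classes in both; going by the story above, org.apache.cassandra.thrift is what you would expect to see for cassandra-all and cassandra-thrift. The function names are made up for this example.

```shell
#!/bin/sh
# Sketch: list split packages, i.e. packages with classes in both jars.
# Each argument is a jar listing with one entry per line, as printed by
# "unzip -Z1 some.jar" or "jar tf some.jar".

# Turn class entries into dotted package names, skipping META-INF and
# classes in the default package, then de-duplicate.
packages_of() {
  grep '/[^/]*\.class$' | grep -v '^META-INF/' | sed 's#/[^/]*$##; s#/#.#g' | sort -u
}

# Print the packages that appear in both listings (comm needs sorted input).
common_packages() {
  _pa=$(mktemp) || return 1
  _pb=$(mktemp) || { rm -f "$_pa"; return 1; }
  packages_of < "$1" > "$_pa"
  packages_of < "$2" > "$_pb"
  comm -12 "$_pa" "$_pb"
  rm -f "$_pa" "$_pb"
}

# Usage:
#   unzip -Z1 cassandra-all-1.1.1.jar    > all.txt
#   unzip -Z1 cassandra-thrift-1.1.1.jar > thrift.txt
#   common_packages all.txt thrift.txt
```

Any package this prints is a candidate for the merge-into-one-module approach used for cassandra-all-for-real.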


Final Step - Run Maven from the root directory of 'astyanax-osgify-1.0.3'

>  mvn clean install
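After the build finishes, a quick sanity check is to confirm that every module produced a jar whose manifest carries the OSGI bundle headers. This is a sketch under two assumptions: the jars land in Maven's default <module>/target directory, and a manifest counts as a bundle manifest when it declares Bundle-ManifestVersion: 2 and a Bundle-SymbolicName. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: flag built jars whose manifest lacks OSGI bundle headers.

# Reads a manifest on stdin; succeeds if it declares an R4 (version 2) bundle.
is_bundle_manifest() {
  tr -d '\r' | awk '
    /^Bundle-ManifestVersion: *2 *$/ { mv = 1 }
    /^Bundle-SymbolicName: /         { sn = 1 }
    END { exit !(mv && sn) }
  '
}

# Run from the root of astyanax-osgify-1.0.3 after "mvn clean install".
check_bundles() {
  status=0
  for jar in */target/*.jar; do
    [ -e "$jar" ] || continue          # glob matched nothing
    if unzip -p "$jar" META-INF/MANIFEST.MF | is_bundle_manifest; then
      echo "ok       $jar"
    else
      echo "MISSING  $jar"
      status=1
    fi
  done
  return $status
}
```

A MISSING line usually means Bundlor did not run for that module, so its manifest is the plain one Maven writes.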


Additional Thoughts

This led me to my next thought. If you wanted to, you could get rid of all the subdirectories in this project and have one pom.xml that declares every Astyanax dependency in one place, producing one big jar that is entirely self-contained with no external dependencies. I don't know how big that jar would be, but it is possible to do. That way your distribution would be just one big jar, and people could start using it without worrying about dependencies. That is really a clever idea.