J. Brisbin

Just another Wordpress.com weblog

Posts Tagged ‘tomcat’

Tomcat/tcServer session manager with attribute replication

I’d like to think that my programming projects don’t suffer from philosophical schizophrenia but that I simply move from point A to point B so fast it just looks that way. Unfortunately, sometimes I have to come face-to-face with this and accept it for what it is: my programming projects suffer from philosophical schizophrenia.

I say this because I’ve been hacking on some changes to the virtual/hybrid cloud Tomcat and tcServer session manager and I went through several stages while I was trying to solve some basic problems.

For one, how do I get a serialized object in the cloud when I don’t know where it is? RabbitMQ comes to my rescue here because I think I’ve finally settled on the most efficient way to answer this question: bind a queue to a topic exchange using the object’s key or identifier (in this case, a session ID). Then any messages for that object automagically get routed to the right VM (the one that has that object in memory). I’m thinking this idea can be extended to create an asynchronous, RabbitMQ-backed, Map implementation.

Now that I have the object, how do I keep it in sync with the “master”? In my case, I send a replication message out whenever a session’s (set|remove)Attribute methods are called and the value objects differ. One notable problem that I don’t see being easily overcome (but thankfully, doesn’t apply to my scenario) is if there are listeners on sessions. I don’t have RabbitMQ wired into the internal Catalina session event mechanism. I could add that at some point, but for the moment, I think this kind of dumb saving/loading via RabbitMQ messages will work for what we’re doing.
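To make the "only replicate when the value objects differ" rule concrete, here is a minimal sketch in plain Java. The class name ReplicatingSession and the BiConsumer publisher are hypothetical stand-ins (the publisher takes the place of sending a RabbitMQ message); this is not the session manager's actual code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.BiConsumer;

/** Hypothetical session wrapper, not the real vcloud session class. */
class ReplicatingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    // Stand-in for "publish a replication message"; a null value means removal.
    private final BiConsumer<String, Object> publisher;

    ReplicatingSession(BiConsumer<String, Object> publisher) {
        this.publisher = publisher;
    }

    void setAttribute(String name, Object value) {
        if (value == null) {
            // Treat setting null as a removal.
            removeAttribute(name);
            return;
        }
        Object previous = attributes.put(name, value);
        // Replicate only when the new value differs from the old one.
        if (!Objects.equals(previous, value)) {
            publisher.accept(name, value);
        }
    }

    void removeAttribute(String name) {
        // Replicate a removal only if there was something to remove.
        if (attributes.containsKey(name)) {
            attributes.remove(name);
            publisher.accept(name, null);
        }
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}
```

Setting the same value twice publishes once, which is the point: identical writes cost no bandwidth.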

I’ve now switched back to a ONEFORALL type operation mode, which means there is only one real session object, and it resides in an internal map on the server that first created it. Whenever another server sees a request for this session, it sends a load message to the queue bound to that session ID. That last part is important: it loads the session object every time it needs it. When code sets an attribute on server TC3, that attribute is replicated back to the original server (TC1) so subsequent session loads get the updated object. I’m still trying to wrap my head around how I want to handle replication in case of node failures. No good answer on that one, yet.
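The ONEFORALL flow can be modelled in a few lines of plain Java. This is an in-memory sketch only: SessionExchange is a hypothetical stand-in for the broker (a map from session ID to owning node, in place of a queue bound by session ID), and SessionNode stands in for a Tomcat instance. No messaging actually happens here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the broker: routes a session ID to its owner. */
class SessionExchange {
    private final Map<String, SessionNode> bindings = new HashMap<>();

    void bind(String sessionId, SessionNode owner) {
        bindings.put(sessionId, owner);
    }

    SessionNode route(String sessionId) {
        return bindings.get(sessionId);
    }
}

/** Hypothetical stand-in for one server in the cloud. */
class SessionNode {
    private final SessionExchange exchange;
    // Only sessions this node created live here.
    private final Map<String, Map<String, Object>> owned = new HashMap<>();

    SessionNode(SessionExchange exchange) {
        this.exchange = exchange;
    }

    void createSession(String sessionId) {
        owned.put(sessionId, new HashMap<>());
        exchange.bind(sessionId, this);
    }

    /** Loads a fresh copy from the owner on every call, or null if unknown. */
    Map<String, Object> load(String sessionId) {
        SessionNode owner = exchange.route(sessionId);
        if (owner == null) {
            return null;
        }
        // Copy, as a deserialized message payload would be.
        return new HashMap<>(owner.owned.get(sessionId));
    }

    /** Writes go back to the owner so later loads see them. */
    void setAttribute(String sessionId, String name, Object value) {
        SessionNode owner = exchange.route(sessionId);
        if (owner == null) {
            throw new IllegalStateException("unknown session " + sessionId);
        }
        owner.owned.get(sessionId).put(name, value);
    }
}
```

Because load() always returns a copy, nothing a non-owning node does to its local object matters until it calls setAttribute(), which mirrors the "replicated back to the original server" step.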

REPLICATED mode is my next task. In simplifying this code, I focussed on getting the ONEFORALL mode working right to begin with. Now I can go back and make it more performant by cranking out a little extra code to handle replication events.

Initial smoke tests seem to indicate this works pretty well. Session load times in my testing were around 20-60 milliseconds. Depending on the location of your RabbitMQ server and your network topology, you might experience different results. I’m sanding off a few rough edges now and I’ll be testing this on a new set of cloud servers we’re setting up as part of a new vSphere/vCenter-managed cloud.

As always, the code is available on GitHub:

http://github.com/jbrisbin/vcloud/tree/master/session-manager/

Written by J. Brisbin

May 4, 2010 at 6:35 pm

Distributed Atomicity in the cloud with RabbitMQ

The private cloud Tomcat/tcServer session manager I’m working on has a huge job cut out for it. Maintaining the state of an object that exists in possibly more than one location at any given point in time is not an easy task, I know. To be honest, if it weren’t for my Midwestern stubbornness, I might not take the time to work through these hefty issues. I might follow the path of least resistance, like most of the industry has done so far.

I just don’t like the idea of sticky sessions. I look at my pool of tcServer instances as one big homogeneous group of available resources. In my mind, there should be no distinction made between machines running in different VMs–or even on different hardware. They should exist and cooperate together as a single unit.

But in “replicated” mode, each server has a copy of the object. This is great for failover and it makes the session manager extremely performant. But yet another sticky wicket rears its ugly head. How do I protect this object and make sure it gets updated properly before someone else has a chance to operate on it?

Call it distributed atomicity if you want–the idea being that an object exists within the context of a cloud of compute resources (in this case, a Tomcat/tcServer user session object) and needs to be updated with all the right attributes when code in a different physical process operates on that object. I’m attacking this problem by implementing a form of distributed atomicity that uses RabbitMQ to send the contents of newly-added attributes to any interested parties throughout the cloud. I already replicate the session object by grabbing it with a Valve, just before the request is completed. This session object gets serialized to the cloud before the response is sent, the idea being that this particular object will be updated in all the places it is needed before another server has a chance to operate on that object.

By using the messaging infrastructure of RabbitMQ, I can at least make updates to this object reasonably atomic. Now the question becomes: where does this object live? For performance reasons, it’s probably not realistic to have just one object to share among web application servers. In the case of Tomcat/tcServer, the internal code is requesting the session object so often (multiple times during a single request) that each server simply has to cache a session object for the length of the user’s request.

A tool like ZooKeeper might be helpful in this case. If code has to set an attribute on a session object, the session would set a barrier in ZooKeeper that lets other code know it is in the process of being altered. Once setAttribute() is finished, a message is then sent with the serialized attribute. The other interested parties could update their local copies of the object with the new attribute until they receive a full replication of the object. Would the second, full replication be superfluous? At this point I can’t say. In the interest of completeness, I feel compelled to issue a second replication event, but in the interest of performance and bandwidth conservation, I wonder if it’s really necessary.

I’m far from finished with the cloud-based session manager. I’m trying to get it to a stable point so that I can migrate my cloud away from sticky sessions. The “replicated” mode seems to work fine; and I’m okay with sending too many messages–I’d rather have that than have too few and end up with page loads blocking because the session can’t be tracked down.

Distributed, asynchronous programming isn’t easy. It isn’t for the faint of heart or those with pesky bosses breathing down their necks to meet arbitrary and usually unhelpful deadlines. It also doesn’t help if you’re not a bona-fide genius. I often feel a little out of my league given the number of CompSci grads that are doing fantastic work in this interesting and growing segment of the industry. But I’m stubborn enough to keep plugging away when I should probably give up.

Written by J. Brisbin

April 22, 2010 at 6:17 pm

Tomcat/tcServer cloud session manager now has “replicated” mode

I’ve updated my virtual/hybrid cloud Tomcat/tcServer session manager to use two different modes of operation. The default mode is what I’ve described previously. The new mode of operation is called “replicated” and it, as the name implies, replicates the user’s session object to every node consuming events on that exchange. This might be the whole cloud, it might not, depending on how you have your exchanges configured.

I’m working on code to only replicate the session if it sees changes in the MD5 signature of the serialized session. Otherwise, it’ll conserve your bandwidth and not replicate the session until it has to. Until then, though, the entire session gets replicated after every request. Excessive? Maybe. 🙂
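A minimal sketch of that MD5 check, in plain Java with only the standard library. The class name SessionChangeDetector is hypothetical, and any Serializable object stands in for the session; the real session manager may do this differently.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical helper: remembers the MD5 of the last serialized session. */
class SessionChangeDetector {
    private byte[] lastDigest;

    static byte[] serialize(Serializable session) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(session);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Returns true when the serialized form differs from the last one seen. */
    boolean shouldReplicate(Serializable session) {
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(serialize(session));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        // The first call always reports a change because lastDigest is null.
        boolean changed = !Arrays.equals(digest, lastDigest);
        lastDigest = digest;
        return changed;
    }
}
```

One caveat with this approach: Java serialization does not promise identical bytes for two equal objects built in different ways, so the check can occasionally report a change that isn't one. That only costs an extra replication, which is no worse than replicating after every request.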

I’m also trying a different approach to loading user sessions. Rather than contacting the direct queue of the node that advertises itself as the owner of that session, I’m sending a load message to the fanout exchange. This way, dedicated replicator/failover consumers can also respond to load requests in case a node goes down unexpectedly.

At the moment, there’s still no persisting of sessions to disk since a server replicates all its sessions off that node when it’s going down. I’m not sure I really need to dump a node’s sessions to disk when it goes down. I think I want to have dedicated consumers for that purpose.

With dedicated failover consumers, when new servers come up, they get the list of current sessions from the failover node. I don’t see that restoring things from disk would add significant functionality to this store. If you feel differently, be sure and let me know. It wouldn’t be difficult to implement a disk-based persistence mechanism for restarts.

You can check out the source code from GitHub:

git clone git://github.com/jbrisbin/vcloud.git

The only other change is that you now need to add a special replication valve that calls the replicateSession() method after each request invocation.

<Valve className="com.jbrisbin.vcloud.session.CloudSessionReplicationValve"/>

Written by J. Brisbin

April 20, 2010 at 4:08 pm

Posted in The Virtual Cloud

Change logging package to SLF4J/Log4J in tcServer/Tomcat

I really dislike the JULI logging package, which is Tomcat’s (and thus tcServer’s) default. Its configuration seems uncomfortable and the log files are almost unreadable without grepping out what you’re looking for. In all my other applications I use SLF4J, powered by Log4J. This combination is powerful, easy to configure, and I like that it doesn’t put the date in the log file’s name until after it’s rotated. There’s been discussion on the Tomcat list recently about maybe changing this in the future, but I’m not very patient and I’d rather not spend the precious little time I do have mucking about with things that are difficult.

The documentation describing the switch from JULI to Log4J isn’t very long or informative, though the process itself–to be fair–isn’t very complicated. But I get the sense that not many Tomcat developers want to discuss switching from JULI to Log4J, hence the lack of documentation.

Making the switch for tcServer is really only one additional step, though the way tcServer structures its instance directories makes it slightly more complex to configure for use with Log4J.

Due Diligence

Please read the official documentation on switching from Tomcat JULI to Log4J first. We’ll be doing things a little bit differently, but you should understand where we’re coming from before simply jumping into this.

Building Tomcat

In order to switch from the default Tomcat JULI package, you’ll need to build Tomcat from source, then build the “extras” module. The official documentation leaves out that you have to build the whole server first, then build the extras. If you build only the extras, without building the whole server, you’ll end up with ClassNotFound errors when you try to start Tomcat/tcServer.

UPDATE: You can build the extras module from source, but, come to find out, SpringSource has helpfully included the two jar files mentioned in “tomcat-6.0.20.C/bin/extras”. You can simply copy those jar files to the locations discussed here rather than building the whole server from source.

  1. I’m using tcServer 6.0, so download the source tarball for Tomcat 6.0.20 and unzip it somewhere.
  2. “cd” into that directory.
  3. Copy the build.properties.default file to build.properties.
  4. “vi” build.properties and uncomment the “jdt.loc” property so the Ant build can download the JDT compiler, which the build process requires.
  5. Increase Ant’s heap size: export ANT_OPTS=-Xmx256m
  6. Build the server: ant
  7. Once the Tomcat server has been successfully built, build the “extras” module: ant -f extras.xml

When that’s finished:

  1. Copy extras/tomcat-juli.jar (from either $TCSERVER_HOME/tomcat-6.0.20.C/bin or $TOMCAT_SRC/output) to $TCSERVER_HOME/tomcat-6.0.20.C/bin/tomcat-juli.jar.
  2. Copy extras/tomcat-juli-adapters.jar (from the same place) to $TCSERVER_HOME/tomcat-6.0.20.C/lib/.
  3. Delete $TCSERVER_INSTANCE_DIR/conf/logging.properties.

Now, copy the Log4J and SLF4J jars. I used the ones from my personal Maven repository (from the $TCSERVER_HOME directory):

cp ~/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/slf4j-log4j12/1.5.8/slf4j-log4j12-1.5.8.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/jcl-over-slf4j/1.5.8/jcl-over-slf4j-1.5.8.jar tomcat-6.0.20.C/lib

Configuration

When you’ve got all the dependencies copied over, you need to put a configuration file in one of two places, depending on how you want to configure logging for your instances. In my case, I use three identical instances (actually, the names of the instances are different, but other than that, they’re identical) of tcServer, so I could put my log4j.xml file in tomcat-6.0.20.C/lib/. In your case, though, assuming your instances are configured differently from one another, you might want to put your log4j.xml file in (assuming an instance name of “dev1”) dev1/lib/.

NOTE: You also need to “vi” the tcServer start script (tcserver-ctl.sh) and comment out the lines that deal with a logging manager and a logging config file (lines 261-262 and 268-269). UPDATE: I actually don’t think this is necessary now. I think my errors were caused by something else. I think it’s safe to leave these be.

If you’re already using Log4J and SLF4J, you’ve likely already got an example XML file lying around that you could use. Copy that file to one of the locations mentioned previously. Mine looks something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <appender name="catalina" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="${catalina.base}/logs/catalina.log"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <appender name="vcloud" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="${catalina.base}/logs/vcloud.log"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <category name="org.springframework">
    <level value="INFO"/>
  </category>
  <category name="org.quartz">
    <level value="INFO"/>
  </category>
  <category name="org.apache.catalina">
    <level value="INFO"/>
    <appender-ref ref="catalina"/>
  </category>
  <category name="com.jbrisbin.vcloud">
    <level value="DEBUG"/>
    <appender-ref ref="vcloud"/>
  </category>

  <root>
    <level value="INFO"/>
    <appender-ref ref="console"/>
  </root>

</log4j:configuration>

You can now add categories and appenders to suit your particular needs. You can also change the pattern to suit your tastes.

Written by J. Brisbin

April 20, 2010 at 2:54 pm

Publish Tomcat/tcServer lifecycle events into the cloud with RabbitMQ

One of the common tasks in any cloud environment is to manage membership lists. In the case of a cloud of Tomcat or SpringSource tcServer instances, I wrote a simple JMX MBean class that exposes my tcServer instances to RabbitMQ and serves two functions:

  1. Expose the calling of internal JMX methods to management tools that send messages using RabbitMQ.
  2. Expose the Catalina lifecycle events to the entire cloud.

To maintain a membership list of tcServer instances, I now just have to listen to the events exchange and respond to the lifecycle events I’m interested in:

def members = []
mq.exchange(name: "vcloud.events", type: "topic") {
  // Bind with "#" so this queue receives every event on the exchange
  queue(name: null, routingKey: "#") {
    consume onmessage: {msg ->
      def key = msg.envelope.routingKey
      def msgBody = msg.bodyAsString
      // The routing key is the event name, a separator, then the instance ID
      def source = key[(msgBody.length() + 1)..(key.length() - 1)]
      println "Received ${msgBody} event from ${source}"
      if ( msgBody == "start" ) {
        members << source
      } else if ( msgBody == "stop" ) {
        members.remove(source)
      }
      // Show the membership list after each event
      println "members=${members}"

      return true
    }
  }
}

Starting and stopping the tcServer instance yields this in the console:

Received init event from instance.id
members=[]
Received before_start event from instance.id
members=[]
Received start event from instance.id
members=[instance.id]
Received after_start event from instance.id
members=[instance.id]
Received before_stop event from instance.id
members=[instance.id]
Received stop event from instance.id
members=[]
Received after_stop event from instance.id
members=[]
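The same bookkeeping can be done from Java code. The sketch below is a hypothetical MembershipTracker, not part of the vcloud project, and it assumes the routing key layout implied by the Groovy listing: the event name, one separator character, then the instance ID. It uses a Set rather than a list so that a "start" event delivered twice doesn't produce a duplicate member.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper that tracks which instances are currently up. */
class MembershipTracker {
    // LinkedHashSet keeps arrival order and ignores duplicate starts.
    private final Set<String> members = new LinkedHashSet<>();

    /**
     * Handles one lifecycle event.
     * Assumes routingKey looks like "<event>.<instance id>".
     */
    void onEvent(String routingKey, String event) {
        if (routingKey.length() <= event.length() + 1 || !routingKey.startsWith(event)) {
            return; // not in the assumed format, ignore it
        }
        String source = routingKey.substring(event.length() + 1);
        if (event.equals("start")) {
            members.add(source);
        } else if (event.equals("stop")) {
            members.remove(source);
        }
    }

    Set<String> members() {
        return Collections.unmodifiableSet(members);
    }
}
```

Events other than "start" and "stop" (init, before_start, and so on) pass through without changing the list, which matches the console output above.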

It seems to me one of the defining characteristics of cloud computing versus traditional clusters is the transparency between runtimes and what used to be separate servers. To that end, I’ve exposed the inner workings of my tcServers both to other servers of their kind in the cloud, and to sundry management and monitoring tools I may choose to write in the future.

If you’re concerned with security, opening up the JMX MBeans of your server may give you pause. Fair enough. In my case, that’s not as big of a concern because these servers are protected from the outside world. Only LAN and WAN users can access these servers, so I don’t mind exposing JMX methods to trivially-secured message brokers, particularly if it gives me this kind of inexpensive and direct control over the services I’m exposing to the cloud.

Written by J. Brisbin

April 15, 2010 at 9:32 pm