J. Brisbin


Archive for the ‘RabbitMQ’ Category

RabbitMQ as a NoSQL distributed cache


Part of what I’ve been doing with the cloud-friendly Tomcat session manager is essentially implementing my own asynchronous distributed object cache. At the moment, this functionality is tightly coupled to what I’m doing inside the session Store. But while recently making changes to add Spring Security integration and make working with Spring Security 3.0 a little easier, I noticed that a lot of what I’m doing inside the session Store could simply be abstracted into its own package and used as a standalone distributed cache.

The concept is simple and I think the code will be straightforward. Instead of synchronously loading an object from a data store (which is configured on the back end to shard its data or do other kinds of distributed load-balancing and failover replication), code would request that the object be loaded asynchronously and provide a callback to be executed when the load is complete. This would simplify my own code and make it quite a bit more robust, and it would add another voice to distributed caching, an area of our industry that’s getting a lot of focus at the moment.
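
To make that concrete, here’s a minimal sketch of the kind of callback API I have in mind. Every name here is hypothetical; this is a shape, not final code.

// Hypothetical API for an asynchronous, RabbitMQ-backed object cache.
// load() publishes a request message and returns immediately; whichever
// node holds the object responds, and the callback fires on delivery.

public interface LoadCallback<T> {
  void onLoad(T obj);
}

public interface CloudCache {
  <T> void load(String key, LoadCallback<T> callback);
}

Calling code would hand load() an anonymous LoadCallback implementation instead of blocking a thread while the backing store round-trips.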

Terracotta looks like a killer app in all ways. I’d love to be able to use something I didn’t write myself to solve a lot of our problems. But I spent all our money on VMware support and new servers. There’s nothing left to go chasing proprietary and heavy solutions to our problems. I’ll use open-source software or software I’ve written myself, or I’ll do something else entirely. A distributed data cache backed by RabbitMQ will be relatively lightweight (probably not at first, as I often have to strip things out to get to my lightweight goal) and, I’m sure, quite fast. It will transparently allow for sharding and aggregating data with no additional configuration. Since queue subscribers get load-balanced anyway, there’s no need to figure out some way to split up objects: they’ll be spread over however many listeners I put on those queues. I can partition data by using different RabbitMQ servers and combinations of queues and exchanges.

I’m starting work on this right away since I’ll be on vacation next week and, geek that I am, will likely not be able to pull myself away for long. Expect to see something on GitHub week after next!

Written by J. Brisbin

July 7, 2010 at 7:14 pm

Cloud-friendly Classloading with RabbitMQ


One of the things everyone who deploys artifacts into the cloud has to deal with is classloading. If you have multiple nodes out there, listening to your RabbitMQ server and waiting to do work, you have to have pre-deployed all the dependencies you need. This means either some system to copy them out there automatically (in the case of deployable artifacts), or simply copying the JAR files into a lib/ directory somewhere that the listener has access to.

Neither of these solutions is ideal.

I was contemplating this on my way to work the other day and I’ve come up with a solution that I’m most of the way finished coding: write a ClassLoader that uses RabbitMQ to load the class data from a “provider” (just a listener somewhere in the cloud that actually *does* have that class in its CLASSPATH).

There are two moving parts: a Provider and a ClassLoader. The Provider has a number of message listeners and binds them to the configured exchange with routing keys that could be “com.mycompany.cloud.#”, or “com.thirdparty.#”, or simply “#”. The routing key is the class or resource name, so you could have different providers for different areas of responsibility. Third-party classes could come from one provider, while your own internal class files could come from an entirely different provider (ostensibly running on a different VM).
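
To make the ClassLoader half of this concrete, here’s a rough sketch of the requesting side using the RabbitMQ Java client and the standard AMQP RPC pattern (private reply queue, routing key = class name). The exchange name and the five-second timeout are placeholders, not final code:

import com.rabbitmq.client.*;

public class CloudClassLoader extends ClassLoader {

  private final Channel channel;
  private final String exchange;

  public CloudClassLoader(ClassLoader parent, Channel channel, String exchange) {
    super(parent);
    this.channel = channel;
    this.exchange = exchange;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    try {
      // Private, server-named reply queue for this one request
      String replyQueue = channel.queueDeclare().getQueue();
      AMQP.BasicProperties props = new AMQP.BasicProperties.Builder()
          .replyTo(replyQueue)
          .build();
      // The routing key is the class name, so providers can bind with
      // patterns like "com.mycompany.cloud.#" or simply "#"
      channel.basicPublish(exchange, name, props, new byte[0]);

      // Wait briefly for a provider to answer with the class bytes
      QueueingConsumer consumer = new QueueingConsumer(channel);
      channel.basicConsume(replyQueue, true, consumer);
      QueueingConsumer.Delivery delivery = consumer.nextDelivery(5000);
      if (delivery == null) {
        throw new ClassNotFoundException("No provider answered for " + name);
      }
      byte[] bytes = delivery.getBody();
      return defineClass(name, bytes, 0, bytes.length);
    } catch (ClassNotFoundException e) {
      throw e;
    } catch (Exception e) {
      throw new ClassNotFoundException(name, e);
    }
  }
}

The Provider is the mirror image: a consumer bound with one of those routing-key patterns that looks the class up on its own CLASSPATH and publishes the bytes back to the replyTo queue.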

Some potential uses:

1. You could add a layer of class-file security, because you control exactly where class files come from without ever exposing them on a file system where they could be copied.
2. You could provide class files dynamically to nodes that come up and down based on system demand but still need to do work requiring those classes. Amazon EC2 instances wouldn’t need to be pre-populated with JAR files, just configured to use the cloud classloader pointed at your RabbitMQ server.
3. You could wrap normal classloading with AOP hooks that would cloud-ify an entire installation without touching the source code or using special configurations.

Point number 3 is the most interesting to me. Using Spring AOP, one could wrap normal classloading with a cloud-friendly version, which would alter the way all your classloaders work, without having to hack on the Tomcat source code (or whatever application you’re deploying).

I suspect I’ll write a Maven-aware Provider that will search Maven repositories for requested class files. I’m sure there are other possibilities here.

Code will be posted on GitHub this week or next.

As always, patches and feedback are eagerly sought and heartily welcomed.

Written by J. Brisbin

June 29, 2010 at 8:33 pm

Log4J Logging with RabbitMQ


In troubleshooting some problems I was having deploying my cloud-based session manager, I quickly grew frustrated by having to tail log files in three or four windows at once. With no real ability to filter what I was looking for, my important log messages would get buried under the truckloads of other DEBUG-level messages being dumped into those log files. I simply needed a better way to aggregate and monitor my log files.

I wrote an appender for Log4J that dumps logging events into a RabbitMQ queue rather than writing them to disk or inserting them into a database.
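
The essence of it fits in one class. This is a simplified sketch of the idea (single channel, hard-coded host, made-up exchange name), not the actual appender from the repo linked below:

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class AmqpAppender extends AppenderSkeleton {

  private Connection connection;
  private Channel channel;
  private String exchange = "vcloud.logging.events"; // made-up name

  @Override
  public void activateOptions() {
    try {
      ConnectionFactory factory = new ConnectionFactory();
      factory.setHost("rabbitmq");
      connection = factory.newConnection();
      channel = connection.createChannel();
      channel.exchangeDeclare(exchange, "topic", true);
    } catch (Exception e) {
      errorHandler.error("Cannot connect to RabbitMQ", e, 0);
    }
  }

  @Override
  protected void append(LoggingEvent event) {
    try {
      // Route by logger name and level, e.g. "com.mycompany.MyClass.DEBUG",
      // so a consumer can bind with a pattern like "#.ERROR"
      String routingKey = event.getLoggerName() + "." + event.getLevel();
      byte[] body = layout.format(event).getBytes();
      channel.basicPublish(exchange, routingKey, null, body);
    } catch (Exception e) {
      errorHandler.error("Cannot publish log event", e, 0);
    }
  }

  @Override
  public void close() {
    try { connection.close(); } catch (Exception ignored) {}
  }

  @Override
  public boolean requiresLayout() { return true; }
}

With something like this in place, tailing three or four windows turns into binding one queue with whatever routing-key filter you actually care about.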

Our company is quite frugal, so the quote we got for Splunk, a tool to aggregate log files, was throat-constricting. Something in the tens of thousands! Thanks, but no thanks.

I haven’t written a web front-end for this yet, but it will be really simple to build when I do. It will have a listener on the log events queue that processes incoming log events and builds nice grids so I can sort and search and do all those other Web 2.0 Ajax-y things.

It’s part of the larger umbrella of private, hybrid cloud utilities I have on GitHub. You can download the source on the vcloud project page: http://github.com/jbrisbin/vcloud/tree/master/amqp-appender/

Written by J. Brisbin

June 25, 2010 at 8:33 pm

Cloud Artifact Deployment with RabbitMQ and Ruby


Running a hybrid or private cloud is great for your scalability but can get a little dodgy when it comes to deploying artifacts onto the various servers that need them. To show how I’m solving this problem, I’ve uploaded my Ruby scripts that monitor and deploy artifacts staged by the automated processes on my continuous integration server, TeamCity. To keep it fairly secure, it will not deploy arbitrary artifacts: anything you want automatically deployed must be explicitly configured with the URL to download the artifact from and the path to copy it to (or unzip/untar it into).

The Parts

There are a couple of moving parts here. You need a RabbitMQ server, of course. You also need a couple of servers to deploy things to. I use three instances of SpringSource tcServer (basically Tomcat 6.0) per Ubuntu 10.04 virtual machine, so this script needs to deploy the same file to three different locations. I also need to deploy HTML files to my Apache server’s document root. As an aside: Apache has now been relegated to serving only static resources and PHP pages and is no longer the out-in-front proxy server. I’ve switched to HAProxy. I love it. More on that in a future post.

The Scripts

I haven’t included the script that actually publishes the notifications yet. That’s a Python script at the moment (Ruby is so much more fun to program in than Python :). It looks like this:


#!/usr/bin/python

import os, sys, hashlib
from amqplib import client_0_8 as amqp

EXCHANGE = 'vcloud.deployment.events'

def get_md5_sum(filename):
  if not os.path.exists(filename):
    return None

  md5 = hashlib.md5()
  try:
    # Binary mode, so the sum is right for WARs and tarballs
    with open(filename, 'rb') as f:
      buf = f.read(4096)
      while buf:
        md5.update(buf)
        buf = f.read(4096)
  except IOError:
    # Probably disappeared out from under us
    pass
  return md5.hexdigest()

def send_deploy_message(queue=None, artifact=None, unzip=False):
  if queue is not None and artifact is not None:
    md5sum = get_md5_sum('/var/deploy/artifacts/%s' % artifact)

    mq_conn = amqp.Connection(host='rabbitmq', userid='guest', password='guest', virtual_host='/')
    mq_channel = mq_conn.channel()
    mq_channel.exchange_declare(EXCHANGE, 'topic', durable=True, auto_delete=False)
    mq_channel.queue_declare(queue, durable=True, auto_delete=False, exclusive=False)
    mq_channel.queue_bind(queue=queue, exchange=EXCHANGE)
    # The MD5 sum rides along as the correlation ID so the deployer
    # can skip artifacts it has already seen
    msg = amqp.Message(artifact, delivery_mode=2, correlation_id=md5sum, application_headers={'unzip': unzip})
    mq_channel.basic_publish(msg, exchange=EXCHANGE, routing_key='')
    mq_channel.close()
    mq_conn.close()

if __name__ == '__main__':
  # The unzip flag arrives as a string on the command line
  send_deploy_message(queue=sys.argv[1], artifact=sys.argv[2], unzip=(sys.argv[3].lower() == 'true'))

I’ll be converting this to Ruby at some point soon.

You can check out the Ruby scripts themselves on GitHub: http://github.com/jbrisbin/cloud-utils-deployer

The Deployment Chain

When our developers check anything into our Git repository, TeamCity sees that change, builds the project, and automagically stages the artifacts onto the development server. This deployment requires no manual intervention; we always want development to use the latest bleeding edge of our application code. Once we’ve had a chance to test those changes and we’re ready to push them to production, I have a configuration in TeamCity that calls the above Python script. The developer can just click the button and it publishes a message to RabbitMQ announcing the availability of that project’s artifacts (of which there are likely several). We haven’t decided how often we want the actual deployment to happen, but for the moment a cron job runs every morning at 7:00 on all the running application servers (it should also be run from an init.d script to catch servers that have been down and are behind on their artifacts). That script is the “monitor” script. It simply subscribes to a queue with the same name as the configuration section in the monitor.yml YAML file:


myapp.war:
  :deploy: deploy -e %s

The “%s” placeholder in the “:deploy” section (the preceding colon is significant in Ruby) will be replaced by the name of the artifact, as pulled from the body of the message. It may or may not correspond to the queue name; it doesn’t have to, because it’s simply an arbitrary key in the deploy.yml file.

The “deploy” script is where all the fun happens. Via command-line switches, you can turn the ETag matching and MD5 sum matching on or off; these keep it from redeploying something it has already deployed (it keeps track in its own cache files).

First, the deployment script has to download the resource to a temporary file:


request = Net::HTTP::Get.new(@uri.request_uri)
outdated = false
load_etags do |etags|
  etag = etags[@name]
  if !@force and !etag.nil?
    # Ask the server to skip the body if our cached copy is current
    request.initialize_http_header({
      'If-None-Match' => etag
    })
  end

  response = @http.request(request)
  case response
    when Net::HTTPSuccess
      # Continue to download file...
      $log.info(@name) { "Downloading: #{@uri.to_s}..." }
      bytes = response.body
      require "digest/md5"
      @hash = Digest::MD5.hexdigest(bytes)
      # Write to temp file (binary mode), ready to deploy
      @temp_file = "/tmp/#{@name}"
      File.open(@temp_file, "wb") { |f| f.write(bytes) }
      # Update ETags
      etags[@name] = response['etag']

      outdated = true
    when Net::HTTPNotModified
      # No need to download it again
      $log.info(@name) { "ETag matched, not downloading: #{@uri.to_s}" }
    else
      $log.fatal(@name) { "Error HTTP status code received: #{response.code}" }
  end

  save_etags(etags) if @use_etags
end
outdated

This method returns true or false depending on whether it thinks the resource is out of date. The deployment script then calls the “deploy!” method, which attempts to either copy the resource (if it’s, say, a WAR file) or unzip it to the pre-configured path (if it’s, say, a “.tar.gz” of static HTML resources or a “.zip” of XML definitions). The deployer decides whether to untar or unzip based on the extension: if it’s “.tar.gz” it will run the “tar” command; anything else, it will try to unzip. This isn’t configurable, but it might be a good project for someone who wants to use “.tbz2” files or something! 🙂

Permissions

The user you run this as matters. I have the log file set to “/var/log/cloud/deployer.log”. This is configurable only in the sense that you can download the source code and change the constant where it’s defined (cloud/logger.rb). Your user should also have write permission to a directory named “/var/lib/cloud/”. You can change this (at the moment) only by editing the “cloud/deploy.rb” file and changing the constants. There are only so many hours in the day; I just didn’t have time to make it fully configurable. I’d love some help on that, though, and would gladly accept patches!

Still to come…

I just haven’t had time to make it a true Gem yet. That’s my intention, but at this point, on a Friday afternoon, I’m thinking it’ll be next week before that’s done. UPDATE: Done! This is now on RubyGems.org.

As always, the full source (Apache licensed) is on GitHub:

http://github.com/jbrisbin/cloud-utils-deployer

I’d love to hear what you think.

Written by J. Brisbin

June 11, 2010 at 8:55 pm

Tomcat/tcServer session manager with attribute replication


I’d like to think that my programming projects don’t suffer from philosophical schizophrenia but that I simply move from point A to point B so fast it just looks that way. Unfortunately, sometimes I have to come face-to-face with this and accept it for what it is: my programming projects suffer from philosophical schizophrenia.

I say this because I’ve been hacking on some changes to the virtual/hybrid cloud Tomcat and tcServer session manager and I went through several stages while I was trying to solve some basic problems.

For one, how do I get a serialized object in the cloud when I don’t know where it is? RabbitMQ comes to my rescue here because I think I’ve finally settled on the most efficient way to answer this question: bind a queue to a topic exchange using the object’s key or identifier (in this case, a session ID). Then any messages for that object automagically get routed to the right VM (the one that has that object in memory). I’m thinking this idea can be extended to create an asynchronous, RabbitMQ-backed Map implementation.
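
In RabbitMQ terms the trick is tiny. Here’s a self-contained Java sketch of it; the exchange name, host, and session ID are all invented for illustration:

import com.rabbitmq.client.*;

public class SessionRouting {
  public static void main(String[] args) throws Exception {
    ConnectionFactory factory = new ConnectionFactory();
    factory.setHost("rabbitmq"); // assumed broker host
    Connection conn = factory.newConnection();
    Channel channel = conn.createChannel();

    String sessionId = "ABC123"; // illustrative session ID
    channel.exchangeDeclare("vcloud.sessions", "topic", true);

    // The node that owns the session binds a private queue using the
    // session ID as the routing key...
    String queue = channel.queueDeclare().getQueue();
    channel.queueBind(queue, "vcloud.sessions", sessionId);

    // ...so any other node can route a load request straight to it
    // without knowing which VM actually holds the session
    channel.basicPublish("vcloud.sessions", sessionId, null, "load".getBytes());

    conn.close();
  }
}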

Now that I have the object, how do I keep it in sync with the “master”? In my case, I send a replication message out whenever a session’s (set|remove)Attribute methods are called and the value objects differ. One notable problem that I don’t see being easily overcome (though thankfully it doesn’t apply to my scenario) is sessions that have listeners on them. I don’t have RabbitMQ wired into the internal Catalina session event mechanism. I could add that at some point, but for the moment, I think this kind of dumb saving/loading via RabbitMQ messages will work for what we’re doing.

I’ve now switched back to a ONEFORALL type operation mode, which means there is only one real session object, residing in an internal map on the server that first created it. Whenever another server sees a request for this session, it sends a load message to that session’s queue every time it needs it. That last part is important: it loads the session object every time it needs it. When code sets an attribute on server TC3, that attribute is replicated back to the original server (TC1) so subsequent session loads get the updated object. I’m still trying to wrap my head around how I want to handle replication in case of node failures. No good answer on that one yet.

REPLICATED mode is my next task. In simplifying this code, I focused on getting the ONEFORALL mode working right to begin with. Now I can go back and make it more performant by cranking out a little extra code to handle replication events.

Initial smoke tests seem to indicate this works pretty well. Session load times in my testing were around 20-60 milliseconds. Depending on the location of your RabbitMQ server and your network topology, you might experience different results. I’m sanding off a few rough edges now and I’ll be testing this on a new set of cloud servers we’re setting up as part of a new vSphere/vCenter-managed cloud.

As always, the code is available on GitHub:

http://github.com/jbrisbin/vcloud/tree/master/session-manager/

Written by J. Brisbin

May 4, 2010 at 6:35 pm

Distributed Atomicity in the cloud with RabbitMQ


The private cloud Tomcat/tcServer session manager I’m working on has its work cut out for it. Maintaining the state of an object that exists in possibly more than one location at any given point in time is not an easy task, I know. To be honest, if it weren’t for my Midwestern stubbornness, I might not take the time to work through these hefty issues. I might follow the path of least resistance, like most of the industry has done so far.

I just don’t like the idea of sticky sessions. I look at my pool of tcServer instances as one big homogeneous group of available resources. In my mind, there should be no distinction made between machines running in different VMs, or even on different hardware. They should exist and cooperate together as a single unit.

But in “replicated” mode, each server has a copy of the object. This is great for failover and it makes the session manager extremely performant. Yet another sticky wicket rears its ugly head, though: how do I protect this object and make sure it gets updated properly before someone else has a chance to operate on it?

Call it distributed atomicity if you want: the idea being that an object exists within the context of a cloud of compute resources (in this case, a Tomcat/tcServer user session object) and needs to be updated with all the right attributes when code in a different physical process operates on it. I’m attacking this problem by using the messaging infrastructure of RabbitMQ to send the contents of newly-added attributes to any interested parties throughout the cloud. I already replicate the session object by grabbing it with a Valve just before the request is completed; the session gets serialized to the cloud before the response is sent, so that it is updated in all the places it’s needed before another server has a chance to operate on it.
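
For anyone unfamiliar with Catalina Valves, that piece is shaped roughly like this. It’s only a sketch: the class name is invented and the actual RabbitMQ publish is elided:

import java.io.IOException;
import javax.servlet.ServletException;
import org.apache.catalina.Session;
import org.apache.catalina.connector.Request;
import org.apache.catalina.connector.Response;
import org.apache.catalina.valves.ValveBase;

public class SessionReplicationValve extends ValveBase {

  @Override
  public void invoke(Request request, Response response)
      throws IOException, ServletException {
    // Let the rest of the pipeline (and the application) run first
    getNext().invoke(request, response);

    // Then push the session out to the cloud before the request is done
    Session session = request.getSessionInternal(false);
    if (session != null) {
      replicate(session);
    }
  }

  private void replicate(Session session) {
    // ...serialize the session and basicPublish() it to the
    // replication exchange (omitted in this sketch)...
  }
}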

By using the messaging infrastructure of RabbitMQ, I can at least make updates to this object reasonably atomic. Now the question becomes: where does this object live? For performance reasons, it’s probably not realistic to have just one object to share among web application servers. In the case of Tomcat/tcServer, the internal code is requesting the session object so often (multiple times during a single request) that each server simply has to cache a session object for the length of the user’s request.

A tool like ZooKeeper might be helpful in this case. If code has to set an attribute on a session object, the session would set a barrier in ZooKeeper that lets other code know it is in the process of being altered. Once setAttribute() is finished, a message is then sent with the serialized attribute. Each interested party could alter its local copy of the object with the updated attribute until it receives a full replication of the object. Would the second, full replication be superfluous? At this point I can’t say. In the interest of completeness, I feel compelled to issue a second replication event, but in the interest of performance and bandwidth conservation, I wonder if it’s really necessary.
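
Since this is still speculation, here’s only a rough sketch of what that barrier might look like with the plain ZooKeeper client; the znode path and timeout are invented:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class AttributeBarrier {

  private final ZooKeeper zk;

  public AttributeBarrier(String connectString) throws Exception {
    this.zk = new ZooKeeper(connectString, 5000, event -> {});
  }

  public void withBarrier(String sessionId, Runnable setAttribute) throws Exception {
    // Ephemeral znode: the barrier vanishes on its own if this JVM dies,
    // so a crashed node can't leave the session locked forever
    String path = "/barrier-" + sessionId;
    zk.create(path, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    try {
      setAttribute.run(); // mutate the session, then publish the attribute
    } finally {
      zk.delete(path, -1); // -1 skips the version check
    }
  }
}

Other nodes would check for (or watch) that znode before trusting their local copy of the session.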

I’m far from finished with the cloud-based session manager. I’m trying to get it to a stable point so that I can migrate my cloud away from sticky sessions. The “replicated” mode seems to work fine, and I’m okay with sending too many messages: I’d rather have that than have too few and end up with page loads blocking because the session can’t be tracked down.

Distributed, asynchronous programming isn’t easy. It isn’t for the faint of heart or those with pesky bosses breathing down their necks to meet arbitrary and usually unhelpful deadlines. It also doesn’t help if you’re not a bona fide genius. I often feel a little out of my league given the number of CompSci grads doing fantastic work in this interesting and growing segment of the industry. But I’m stubborn enough to keep plugging away when I should probably give up.

Written by J. Brisbin

April 22, 2010 at 6:17 pm

Publish Tomcat/tcServer lifecycle events into the cloud with RabbitMQ


One of the common tasks in any cloud environment is managing membership lists. In the case of a cloud of Tomcat or SpringSource tcServer instances, I wrote a simple JMX MBean class that exposes my tcServer instances to RabbitMQ and serves two functions:

  1. Expose the calling of internal JMX methods to management tools that send messages using RabbitMQ.
  2. Expose the Catalina lifecycle events to the entire cloud.

To maintain a membership list of tcServer instances, I now just have to listen to the events exchange and respond to the lifecycle events I’m interested in:

def members = []
mq.exchange(name: "vcloud.events", type: "topic") {
  queue(name: null, routingKey: "#") {
    consume onmessage: {msg ->
      def key = msg.envelope.routingKey
      def msgBody = msg.bodyAsString
      // The routing key is "<event>.<source>", so strip the event name
      // off the front to get the instance that emitted it
      def source = key[msgBody.length() + 1..-1]
      println "Received ${msgBody} event from ${source}"
      if ( msgBody == "start" ) {
        members << source
      } else if ( msgBody == "stop" ) {
        members.remove(source)
      }
      println "members=${members}"

      return true
    }
  }
}

Starting and stopping the tcServer instance yields this in the console:

Received init event from instance.id
members=[]
Received before_start event from instance.id
members=[]
Received start event from instance.id
members=[instance.id]
Received after_start event from instance.id
members=[instance.id]
Received before_stop event from instance.id
members=[instance.id]
Received stop event from instance.id
members=[]
Received after_stop event from instance.id
members=[]

It seems to me one of the defining characteristics of cloud computing versus traditional clusters is the transparency between runtimes and what used to be separate servers. To that end, I’ve exposed the inner workings of my tcServers both to other servers of their kind in the cloud, and to sundry management and monitoring tools I may choose to write in the future.

If you’re concerned with security, opening up the JMX MBeans of your server may give you pause. Fair enough. In my case, that’s not as big of a concern because these servers are protected from the outside world. Only LAN and WAN users can access these servers, so I don’t mind exposing JMX methods to trivially-secured message brokers, particularly if it gives me this kind of inexpensive and direct control over the services I’m exposing to the cloud.

Written by J. Brisbin

April 15, 2010 at 9:32 pm