J. Brisbin


Archive for June 2010

Cloud-friendly Classloading with RabbitMQ


One issue everyone who deploys artifacts into the cloud has to deal with is classloading. If you have multiple nodes out there listening to your RabbitMQ server and waiting to do work, you have to have pre-deployed all the dependencies they need. That means either some system that copies them out there automatically (in the case of deployable artifacts), or copying the JAR files by hand into a lib/ directory somewhere the listener has access to.

None of these solutions is ideal.

I was contemplating this on my way to work the other day and came up with a solution that I’m mostly finished coding: a ClassLoader that uses RabbitMQ to load the class data from a “provider” (just a listener somewhere in the cloud that actually *does* have that class in its CLASSPATH).

There are two moving parts: a Provider and a ClassLoader. The Provider has a number of message listeners and binds them to the configured exchange with routing keys that could be “com.mycompany.cloud.#”, or “com.thirdparty.#”, or simply “#”. The routing key is the class or resource name, so you could have different providers for different areas of responsibility. Third-party classes could come from one provider, while your own internal class files could come from an entirely different provider (possibly running on a different VM).
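Because the routing key is the class or resource name, which provider answers a request comes down to ordinary topic-exchange matching: `*` matches exactly one dot-separated word and `#` matches zero or more. The Python sketch below models that matching locally (RabbitMQ does it on the broker); the provider names in `BINDINGS` are hypothetical.

```python
# A minimal model of AMQP topic-exchange matching, used to illustrate which
# Provider would receive a request for a given class name. Assumption: the
# routing key is the fully qualified class name, split into words on ".".

def topic_matches(pattern, routing_key):
    """Return True if routing_key matches an AMQP topic binding pattern.

    "*" matches exactly one word; "#" matches zero or more words.
    """
    pat = pattern.split(".")
    key = routing_key.split(".")

    def match(i, j):
        if i == len(pat):
            return j == len(key)
        if pat[i] == "#":
            # "#" either matches zero words (skip it) or consumes one more word.
            return match(i + 1, j) or (j < len(key) and match(i, j + 1))
        if j == len(key):
            return False
        return (pat[i] == "*" or pat[i] == key[j]) and match(i + 1, j + 1)

    return match(0, 0)


# Hypothetical provider bindings, mirroring the routing keys in the text.
BINDINGS = {
    "internal-provider": "com.mycompany.cloud.#",
    "thirdparty-provider": "com.thirdparty.#",
}


def providers_for(class_name):
    """Names of the providers whose binding pattern matches class_name."""
    return sorted(name for name, pattern in BINDINGS.items()
                  if topic_matches(pattern, class_name))
```

With these bindings, a request for `com.mycompany.cloud.util.Helper` would reach only the internal provider, while a class matching no binding would be unroutable, so a real classloader would presumably need a timeout or a fallback to its parent classloader.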

Some potential uses:

1. You could add a layer of class file security, because you control exactly where class files come from without exposing them to be copied onto the file system.
2. You could provide class files dynamically to nodes that come up and go down with system demand but still need those classes to do their work. Amazon EC2 instances would not need to be pre-populated with JAR files; they would simply be configured to use the cloud classloader, pointed at your RabbitMQ server.
3. You could wrap normal classloading with AOP hooks that cloud-ify an entire installation without touching the source code or using special configurations.

Point number 3 is the most interesting to me. Using Spring AOP, one could wrap normal classloading with a cloud-friendly version, which would alter the way all your classloaders work, without having to hack on the Tomcat source code (or whatever application you’re deploying).

I suspect I’ll write a Maven-aware Provider that will search Maven repositories for requested class files. I’m sure there are other possibilities here.

Code will be posted on Github this week or next.

As always, patches and feedback are eagerly sought and heartily welcomed.


Written by J. Brisbin

June 29, 2010 at 8:33 pm

Log4J Logging with RabbitMQ


In troubleshooting some problems I was having deploying my cloud-based session manager, I quickly grew frustrated by having to tail log files in three or four windows at once. With no real ability to filter what I was looking for, my important log messages would get buried under the truckloads of other DEBUG-level messages being dumped into those log files. I simply needed a better way to aggregate and monitor my log files.

I wrote an appender for Log4J that dumps logging events into a RabbitMQ queue rather than writing them to disk or inserting them into a database.
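The appender itself is Java, but the idea translates directly: turn each logging event into a self-describing message and hand it to a broker instead of a file. Here is a rough Python analog using the standard `logging` module rather than Log4J. The `publish` callable is a hypothetical stand-in for a RabbitMQ publish call, and the JSON field names and the `<logger>.<LEVEL>` routing key are assumptions for illustration, not the appender’s actual format.

```python
import json
import logging


class AmqpEventHandler(logging.Handler):
    """Serialize each log record to JSON and pass it to a publish callable."""

    def __init__(self, publish, level=logging.NOTSET):
        super().__init__(level)
        # publish(routing_key, body): hypothetical stand-in for a broker call
        self.publish = publish

    def emit(self, record):
        try:
            event = {
                "timestamp": record.created,
                "level": record.levelname,
                "logger": record.name,
                "message": record.getMessage(),
            }
            # Assumed routing key "<logger>.<LEVEL>", so a consumer bound to
            # a topic exchange with "#.ERROR" would see only errors.
            routing_key = "%s.%s" % (record.name, record.levelname)
            self.publish(routing_key, json.dumps(event))
        except Exception:
            self.handleError(record)


# Usage: collect "published" messages in a list instead of a real broker.
sent = []
log = logging.getLogger("session.manager")
log.setLevel(logging.DEBUG)
log.propagate = False
log.addHandler(AmqpEventHandler(lambda key, body: sent.append((key, body))))
log.info("session %s replicated", "abc123")
```

Putting the level in the routing key is one way to address the “buried under DEBUG messages” problem at the broker, since a consumer can bind only to the levels it cares about.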

Our company is quite frugal, so the quote we got for Splunk, a tool to aggregate log files, was throat-constricting. Something in the tens of thousands! Thanks, but no thanks.

I haven’t written a web front-end for this yet, but it will be really simple to do when I get to it. It will have a listener on the log events queue that processes incoming log events and builds nice grids so I can sort and search and do all those other Web 2.0 Ajax-y things.
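Before any grid gets built, the listener would mostly be filtering and sorting. This Python sketch shows that step under an assumed message format: JSON objects with hypothetical `timestamp`, `level`, `logger` and `message` fields. Both Log4J level names (`WARN`, `FATAL`) and Python ones (`WARNING`, `CRITICAL`) are mapped so the filter works with either kind of producer.

```python
import json

# Numeric ordering for level names; unknown levels sort lowest.
LEVELS = {
    "TRACE": 0, "DEBUG": 10, "INFO": 20,
    "WARN": 30, "WARNING": 30,
    "ERROR": 40, "FATAL": 50, "CRITICAL": 50,
}


def filter_events(raw_messages, min_level="INFO", text=None):
    """Decode JSON log events, drop those below min_level (and, if given,
    those whose message does not contain `text`), and return the rest
    sorted newest first."""
    threshold = LEVELS[min_level]
    events = []
    for raw in raw_messages:
        event = json.loads(raw)
        if LEVELS.get(event["level"], 0) < threshold:
            continue
        if text is not None and text not in event["message"]:
            continue
        events.append(event)
    return sorted(events, key=lambda e: e["timestamp"], reverse=True)
```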

It’s part of the larger umbrella of private, hybrid cloud utilities I have on Github. You can download the source on the vcloud project page: http://github.com/jbrisbin/vcloud/tree/master/amqp-appender/

Written by J. Brisbin

June 25, 2010 at 8:33 pm

Cloud Artifact Deployment with RabbitMQ and Ruby


Running a hybrid or private cloud is great for scalability but can get a little dodgy when it comes to deploying artifacts onto the various servers that need them. To show how I’m solving this problem, I’ve uploaded my Ruby scripts that monitor for and deploy artifacts staged by the automated processes on my continuous integration server, TeamCity. To keep it reasonably secure, it will not deploy arbitrary artifacts: anything you want deployed automatically must be explicitly configured with the URL to download it from and the path to copy (or unzip/untar) it to.
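To make the “explicitly configured only” rule concrete, here is a minimal Python sketch of the check a deployer could run before touching anything: an artifact name is deployable only if the configuration maps it to a download URL and destination paths. The config structure, URL and paths are hypothetical; the three paths stand in for three tcServer instances on one VM.

```python
# Hypothetical, already-parsed deployer configuration: artifact name ->
# where to download it from and where to put it.
DEPLOY_CONFIG = {
    "myapp.war": {
        "url": "http://teamcity.example.com/artifacts/myapp.war",
        "paths": [
            "/opt/tcserver/instance1/webapps",
            "/opt/tcserver/instance2/webapps",
            "/opt/tcserver/instance3/webapps",
        ],
    },
}


def resolve_artifact(name, config=DEPLOY_CONFIG):
    """Return (url, paths) for a configured artifact; refuse anything else."""
    entry = config.get(name)
    if entry is None:
        raise ValueError("refusing to deploy unconfigured artifact: %r" % name)
    return entry["url"], entry["paths"]
```

Because the URL comes from local configuration rather than from the message, a forged message can at most trigger a redeploy of something already trusted.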

The Parts

There are a couple of moving parts here. You need a RabbitMQ server, of course. You also need a couple of servers to deploy things to. I use three instances of SpringSource tcServer (basically Tomcat 6.0) per Ubuntu 10.04 virtual machine, so this script needs to deploy the same file to three different locations. I also need to deploy HTML files to my Apache server’s document root. As an aside: Apache has now been relegated to serving only static resources and PHP pages and is no longer the out-in-front proxy server. I’ve switched to HAProxy. I love it. More on that in a future post.

The Scripts

I haven’t uploaded the script that actually publishes the notifications yet. It’s a Python script at the moment (Ruby is so much more fun to program in than Python :). It looks like this:


#!/usr/bin/python

import os, sys, hashlib
from amqplib import client_0_8 as amqp

EXCHANGE = 'vcloud.deployment.events'
ARTIFACT_DIR = '/var/deploy/artifacts'

def get_md5_sum(filename):
    if not os.path.exists(filename):
        return None

    md5 = hashlib.md5()
    try:
        # Read in binary mode so archives hash correctly
        with open(filename, 'rb') as f:
            chunk = f.read(4096)
            while chunk:
                md5.update(chunk)
                chunk = f.read(4096)
    except IOError:
        # Probably doesn't exist (removed after the exists() check)
        return None
    return md5.hexdigest()

def send_deploy_message(queue=None, artifact=None, unzip=False):
    if queue is None or artifact is None:
        return

    # Hash the artifact we were asked to announce (not sys.argv[2] directly)
    md5sum = get_md5_sum(os.path.join(ARTIFACT_DIR, artifact))

    mq_conn = amqp.Connection(host='rabbitmq', userid='guest', password='guest', virtual_host='/')
    mq_channel = mq_conn.channel()
    try:
        # Declare, don't delete: deleting the exchange would drop every
        # other queue's binding to it.
        mq_channel.exchange_declare(EXCHANGE, 'topic', durable=True, auto_delete=False)
        mq_channel.queue_declare(queue, durable=True, auto_delete=False, exclusive=False)
        # Bind and publish with the queue name as the routing key so only
        # this queue receives the notification.
        mq_channel.queue_bind(queue=queue, exchange=EXCHANGE, routing_key=queue)
        msg = amqp.Message(artifact, delivery_mode=2, correlation_id=md5sum, application_headers={'unzip': unzip})
        mq_channel.basic_publish(msg, exchange=EXCHANGE, routing_key=queue)
    finally:
        mq_channel.close()
        mq_conn.close()

if __name__ == '__main__':
    # sys.argv[3] is a string ('False' would be truthy), so parse it explicitly
    unzip = len(sys.argv) > 3 and sys.argv[3].lower() in ('1', 'true', 'yes')
    send_deploy_message(queue=sys.argv[1], artifact=sys.argv[2], unzip=unzip)

I’ll be converting this to Ruby at some point soon.

You can check out the Ruby scripts themselves on Github: http://github.com/jbrisbin/cloud-utils-deployer

The Deployment Chain

When our developers check anything into our Git repository, TeamCity sees that change, builds the project, and automagically stages those artifacts onto the development server. This deployment requires no manual intervention; we always want development to use the latest bleeding edge of our application code. Once we’ve had a chance to test those changes and we’re ready to push them to production, I have a configuration in TeamCity that calls the above Python script. The developer can just click the button, and it publishes a message to RabbitMQ announcing the availability of that project’s artifacts (there are likely several). We haven’t decided how often we want the actual deployment to happen, but for the moment a cron job runs at 7:00 A.M. every day on all the running application servers (it should also be run from an init.d script to catch servers that have been down and are behind on their artifacts). That script is the “monitor” script. It simply subscribes to a queue with the same name as the configuration section in the monitor.yml YAML file:


myapp.war:
  :deploy: deploy -e %s

The “%s” placeholder in the “:deploy” entry will be replaced by the name of the artifact, as pulled from the body of the message. (The leading colon matters: Ruby’s YAML loader turns “:deploy” into a Symbol key rather than a String.) The artifact name may or may not correspond to the queue name. It doesn’t have to, because it’s simply an arbitrary key in the deploy.yml file.
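The monitor’s job at this point is a lookup plus a substitution: find the section named after the queue, then drop the artifact name from the message body into the “%s”. This Python sketch assumes monitor.yml has already been loaded into a dict (where Ruby would have a Symbol, it just uses the string ":deploy"). Quoting the artifact name before splitting the command line is an addition of this sketch, not something the original scripts are known to do; it keeps a malformed message body from injecting extra arguments.

```python
import shlex

# Hypothetical, already-loaded monitor.yml contents.
MONITOR_CONFIG = {
    "myapp.war": {":deploy": "deploy -e %s"},
}


def build_deploy_command(queue_name, message_body, config=MONITOR_CONFIG):
    """Return the argv for the deploy command configured for queue_name,
    with the artifact name from the message body substituted for "%s"."""
    template = config[queue_name][":deploy"]
    # Quote first, so the body stays a single argument after splitting.
    return shlex.split(template % shlex.quote(message_body))
```

For example, `build_deploy_command("myapp.war", "myapp.war")` yields `["deploy", "-e", "myapp.war"]`, which could then be passed to `subprocess.run` without involving a shell.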

The “deploy” script is where all the fun happens. Via command-line switches, you can turn on or off the ETag matching and MD5 sum matching it does to keep from redeploying something that it’s already deployed (it keeps track in its own cache files).
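The MD5 check amounts to comparing a digest of the downloaded bytes with the digest recorded at the last successful deploy. This Python sketch assumes a JSON cache file mapping artifact names to hex digests; the deployer’s real cache files and their format aren’t described here. Recording the digest in a separate step, only after a deploy succeeds, keeps a failed deploy from being skipped on the next run.

```python
import hashlib
import json
import os


def md5_of(data):
    """Hex MD5 digest of the artifact's bytes."""
    return hashlib.md5(data).hexdigest()


def load_cache(cache_file):
    """Load the name -> digest map, or start empty if there is no cache yet."""
    if not os.path.exists(cache_file):
        return {}
    with open(cache_file) as f:
        return json.load(f)


def save_cache(cache, cache_file):
    with open(cache_file, "w") as f:
        json.dump(cache, f)


def is_outdated(name, data, cache, force=False):
    """True if `data` differs from what was last deployed as `name`,
    or if `force` is set (mirroring a force switch)."""
    return force or cache.get(name) != md5_of(data)


def record_deployed(name, data, cache):
    """Call only after a successful deploy, then persist with save_cache."""
    cache[name] = md5_of(data)
```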

First, the deployment script has to download the resource to a temporary file:


# Declared outside the block so the assignment inside it is visible at the end
outdated = false
request = Net::HTTP::Get.new(@uri.request_uri)
load_etags do |etags|
  etag = etags[@name]
  if !@force && !etag.nil?
    # Set one header; initialize_http_header would wipe the default headers
    request['If-None-Match'] = etag
  end

  response = @http.request(request)
  case response
  when Net::HTTPSuccess
    # Continue to download file...
    $log.info(@name) { "Downloading: #{@uri.to_s}..." }
    bytes = response.body
    require "digest/md5"
    @hash = Digest::MD5.hexdigest(bytes)
    # Write to temp file in binary mode, ready to deploy
    @temp_file = "/tmp/#{@name}"
    File.open(@temp_file, "wb") { |f| f.write(bytes) }
    # Update ETags
    etags[@name] = response['etag']

    outdated = true
  when Net::HTTPNotModified
    # No need to download it again
    $log.info(@name) { "ETag matched, not downloading: #{@uri.to_s}" }
  else
    # response.code is the status; response['code'] would read a header
    $log.fatal(@name) { "Error HTTP status code received: #{response.code}" }
  end

  if @use_etags
    save_etags(etags)
  end
end

outdated

This method returns true or false depending on whether it thinks the resource is out of date. The deployment script then calls the “deploy!” method, which either copies the resource to the pre-configured path (if it’s, say, a WAR file) or unpacks it there (if it’s, say, a “.tar.gz” file of static HTML resources or a “.zip” file of XML definitions). The deployer decides whether to untar or unzip based on the extension: if it’s “.tar.gz”, it runs the “tar” command; if it’s anything else, it tries to unzip it. This isn’t configurable, but it might be a good project for someone who wants to use “.tbz2” files or something! 🙂
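The extension rule is easy to express as a small lookup table, which also shows where “.tbz2” support could slot in. In this Python sketch the “.tar.gz” entry and the unzip fallback follow the behavior described above; the bzip2 entries are hypothetical extensions the original deployer does not support. It covers only the unpack case, not the plain copy used for WAR files.

```python
# Suffix -> extraction command prefix. Order matters: first match wins.
EXTRACTORS = [
    (".tar.gz", ["tar", "-xzf"]),
    (".tbz2", ["tar", "-xjf"]),     # hypothetical extension
    (".tar.bz2", ["tar", "-xjf"]),  # hypothetical extension
]


def extract_command(filename, dest):
    """Return the argv that unpacks `filename` into directory `dest`."""
    for suffix, cmd in EXTRACTORS:
        if filename.endswith(suffix):
            return cmd + [filename, "-C", dest]
    # Fallback matching the described behavior: try to unzip anything else.
    return ["unzip", "-o", filename, "-d", dest]
```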

Permissions

The user you run this as matters. The log file is set to “/var/log/cloud/deployer.log”. This is configurable only in the sense that you can download the source code and change the constant where it’s defined (cloud/logger.rb). Your user also needs write permission to a directory named “/var/lib/cloud/”. At the moment, you can change this only by editing the constants in the “cloud/deploy.rb” file. There are only so many hours in the day, and I just didn’t have time to make it fully configurable. I’d love some help on that, though, and would gladly accept patches!
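Since both locations are hard-coded, a quick pre-flight check before the first cron run can save some head-scratching. This Python sketch uses the two directories named above; the check itself is an illustration and not part of the deployer.

```python
import os

# Directories the deployer writes to, per the text above.
REQUIRED_WRITABLE = ["/var/log/cloud", "/var/lib/cloud"]


def unwritable(paths=REQUIRED_WRITABLE):
    """Return the paths the current user cannot write to.
    Missing directories count as unwritable."""
    return [p for p in paths
            if not (os.path.isdir(p) and os.access(p, os.W_OK))]
```

Running `print(unwritable())` as the intended user should print an empty list if the permissions are in place.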

Still to come…

I just haven’t had time to make it a true Gem yet. That’s my intention, but at this point, on a Friday afternoon, I’m thinking it’ll be next week before that’s done. UPDATE: Done! This is now on RubyGems.org.

As always, the full source (Apache licensed) is on Github:

http://github.com/jbrisbin/cloud-utils-deployer

I’d love to hear what you think.

Written by J. Brisbin

June 11, 2010 at 8:55 pm

Beware the 802.3ad Beast!


I readily admit I’m not a network person. We have engineers for that where I work, so I don’t have to be. But every once in a while, something crops up that makes even our network engineers scratch their heads, shrug, and say something to the effect of: “from what I can tell, it should work.”

We use VMware ESX server as our virtualization hypervisor and manage it with vSphere. I love my VMware boxes. You’d have to pry them from my cold, dead fingers if you want to get them away from me. But last week made me come very close to actually cursing. The problem seemed very basic: virtual machines on one ESX host couldn’t always talk to virtual machines on another ESX host. Identically configured hosts, no less! If I ssh’d into the second box and pinged back to the first one, traffic started flowing again. It was simply forgetting how to get to the other VM after some number of minutes. We tried every setting on the Cisco switch. We even called VMware support, and they looked at everything and couldn’t find any glaring errors.

The next day, on a whim, I had our network engineer enable EtherChannel (Cisco’s link aggregation, the same idea standardized as 802.3ad) on the ports my ESX hosts were plugged into, and I switched the ESX server’s NIC load balancing from “originating port ID” to “IP hash”. Voilà! It magically started working.

I was so frustrated the previous day that, truth be told, I was a little annoyed the fix was so simple. So if your VMs have trouble reliably talking to VMs on other hosts, and you suspect your switches are losing track of them (their MAC address table or ARP entries), enable EtherChannel on the switch ports for the NICs that ESX host uses and change the load balancing to “IP hash” in the vSwitch properties in your vSphere client. It’ll save you a lot of headaches.
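With “IP hash”, the uplink for a packet is chosen from a hash of its source and destination IP addresses, so one VM’s traffic can leave through different physical NICs depending on where it is going; that is why the switch ports need to be grouped into an EtherChannel rather than treated as independent ports. The Python sketch below is a simplified model (XOR of the full 32-bit addresses, modulo the number of uplinks), meant only to show that a given address pair always maps to the same uplink; VMware’s exact hash computation may differ.

```python
import ipaddress


def pick_uplink(src, dst, uplinks):
    """Simplified IP-hash model: (src XOR dst) mod number of uplinks.
    Not VMware's exact algorithm; it only illustrates determinism."""
    s = int(ipaddress.IPv4Address(src))
    d = int(ipaddress.IPv4Address(dst))
    return (s ^ d) % uplinks
```

Under this model, with two uplinks, traffic between 10.0.0.1 and 10.0.0.2 would always use uplink 1, while traffic between 10.0.0.1 and 10.0.0.3 would use uplink 0; the result is the same in both directions because XOR is symmetric.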

Written by J. Brisbin

June 7, 2010 at 4:23 pm

Posted in The Server Side