Distributed Atomicity in the cloud with RabbitMQ
The private cloud Tomcat/tcServer session manager I’m working on has a huge job cut out for it. Maintaining the state of an object that exists in possibly more than one location at any given point in time is not an easy task, I know. To be honest, if it weren’t for my Midwestern stubbornness, I might not take the time to work through these hefty issues. I might follow the path of least resistance, like most of the industry has done so far.
I just don’t like the idea of sticky sessions. I look at my pool of tcServer instances as one big homogeneous group of available resources. In my mind, there should be no distinction made between machines running in different VMs, or even on different hardware. They should exist and cooperate together as a single unit.
But in “replicated” mode, each server has a copy of the object. This is great for failover and it makes the session manager extremely performant. But yet another sticky wicket rears its ugly head. How do I protect this object and make sure it gets updated properly before someone else has a chance to operate on it?
Call it distributed atomicity if you want. The idea is that an object (in this case, a Tomcat/tcServer user session object) exists within a cloud of compute resources and needs to be updated with all the right attributes when code in a different physical process operates on it. I’m attacking this problem by implementing a form of distributed atomicity that uses RabbitMQ to send the contents of newly added attributes to any interested parties throughout the cloud. I already replicate the session object by grabbing it with a Valve just before the request is completed. The session object gets serialized to the cloud before the response is sent, so that it will be updated everywhere it is needed before another server has a chance to operate on it.
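To make the attribute message a little more concrete, here is a minimal sketch of what one update might look like on the wire and how a receiving server might apply it. It is not the session manager's actual code: the class names `AttributeUpdate` and `ReplicaStore` are hypothetical, plain Java serialization is assumed as the encoding, and the RabbitMQ publish and consume steps are deliberately left out (the `byte[]` is simply what would travel as the message body).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical message describing one newly set session attribute.
final class AttributeUpdate implements Serializable {
    private static final long serialVersionUID = 1L;

    final String sessionId;
    final String name;
    final Serializable value;

    AttributeUpdate(String sessionId, String name, Serializable value) {
        this.sessionId = sessionId;
        this.name = name;
        this.value = value;
    }

    // Encode the update as bytes suitable for use as a message body.
    byte[] toBytes() throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(this);
        }
        return buffer.toByteArray();
    }

    // Decode a message body back into an update on the receiving server.
    static AttributeUpdate fromBytes(byte[] body) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(body))) {
            return (AttributeUpdate) in.readObject();
        }
    }
}

// Hypothetical per-server store holding the local copies of session attributes.
final class ReplicaStore {
    private final Map<String, Map<String, Serializable>> sessions = new ConcurrentHashMap<>();

    // Apply one received update to the local copy (null values are not handled here).
    void apply(AttributeUpdate update) {
        sessions.computeIfAbsent(update.sessionId, id -> new ConcurrentHashMap<>())
                .put(update.name, update.value);
    }

    Serializable get(String sessionId, String name) {
        Map<String, Serializable> attributes = sessions.get(sessionId);
        return attributes == null ? null : attributes.get(name);
    }
}
```

In this sketch the sender would call `toBytes()` right after an attribute is set, and every interested server would call `fromBytes()` and then `apply()` when the message arrives. Attribute removal and ordering between competing updates are not addressed.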
By using the messaging infrastructure of RabbitMQ, I can at least make updates to this object reasonably atomic. Now the question becomes: where does this object live? For performance reasons, it’s probably not realistic to have just one object to share among web application servers. In the case of Tomcat/tcServer, the internal code is requesting the session object so often (multiple times during a single request) that each server simply has to cache a session object for the length of the user’s request.
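The request-length caching described above could be sketched roughly like this. The class `RequestSessionCache` and its `loader` function are hypothetical names; the loader stands in for whatever fetches a session from the cloud, and the sketch assumes one cache instance per request, so it makes no attempt at thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache that holds session objects only for the length of one request.
final class RequestSessionCache<S> {
    private final Function<String, S> loader; // assumed: fetches the session from the cloud
    private final Map<String, S> cached = new HashMap<>();
    private int loads = 0;

    RequestSessionCache(Function<String, S> loader) {
        this.loader = loader;
    }

    // The first lookup goes to the loader; repeated lookups are served locally.
    S get(String sessionId) {
        S session = cached.get(sessionId);
        if (session == null) {
            session = loader.apply(sessionId);
            loads++;
            if (session != null) {
                cached.put(sessionId, session);
            }
        }
        return session;
    }

    // Number of times the loader was actually called.
    int loadCount() {
        return loads;
    }

    // Drop the cached copies once the response has been sent.
    void endRequest() {
        cached.clear();
    }
}
```

However many times the container asks for the session during a request, the loader runs once; calling `endRequest()` forces the next request to fetch a fresh copy.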
A tool like ZooKeeper might be helpful in this case. If code has to set an attribute on a session object, the session would set a barrier in ZooKeeper that lets other code know it is in the process of being altered. Once setAttribute() is finished, a message is sent with the serialized attribute. The other interested parties could update their local copies of the object with the new attribute until they receive a full replication of the object. Would the second, full replication be superfluous? At this point I can’t say. In the interest of completeness, I feel compelled to issue a second replication event, but in the interest of performance and bandwidth conservation, I wonder if it’s really necessary.
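The barrier-then-message flow can be sketched without committing to any ZooKeeper API. Everything here is an assumption made for illustration: `InMemoryBarrier` is an in-process stand-in for whatever marker would actually live in ZooKeeper, `GuardedSession` is a hypothetical wrapper, and the `publisher` callback stands in for sending the serialized attribute over RabbitMQ.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// In-process stand-in for a ZooKeeper-backed "session is being altered" marker.
final class InMemoryBarrier {
    private final Set<String> altering = ConcurrentHashMap.newKeySet();

    // Returns false if some other code has already raised the barrier for this session.
    boolean enter(String sessionId) {
        return altering.add(sessionId);
    }

    void leave(String sessionId) {
        altering.remove(sessionId);
    }

    boolean isBeingAltered(String sessionId) {
        return altering.contains(sessionId);
    }
}

// Hypothetical session wrapper: raise the barrier, set the attribute, lower it, then notify.
final class GuardedSession {
    private final String id;
    private final InMemoryBarrier barrier;
    private final Consumer<String> publisher; // assumed: would send the serialized attribute
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    GuardedSession(String id, InMemoryBarrier barrier, Consumer<String> publisher) {
        this.id = id;
        this.barrier = barrier;
        this.publisher = publisher;
    }

    // Returns false without changing anything if the session is already being altered.
    boolean setAttribute(String name, Object value) {
        if (!barrier.enter(id)) {
            return false;
        }
        try {
            attributes.put(name, value);
        } finally {
            barrier.leave(id);
        }
        // Only after the update is complete do the other servers hear about it.
        publisher.accept(id + ":" + name);
        return true;
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }
}
```

This version simply refuses a competing update instead of waiting for the barrier to clear. Whether to block, retry, or fail is a design choice the sketch leaves open, as is the question of a second full replication.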
I’m far from finished with the cloud-based session manager. I’m trying to get it to a stable point so that I can migrate my cloud away from sticky sessions. The “replicated” mode seems to work fine, and I’m okay with sending too many messages. I’d rather have that than send too few and end up with page loads blocking because the session can’t be tracked down.
Distributed, asynchronous programming isn’t easy. It isn’t for the faint of heart or those with pesky bosses breathing down their necks to meet arbitrary and usually unhelpful deadlines. It also doesn’t help if you’re not a bona-fide genius. I often feel a little out of my league given the number of CompSci grads that are doing fantastic work in this interesting and growing segment of the industry. But I’m stubborn enough to keep plugging away when I should probably give up.