J. Brisbin

Just another Wordpress.com weblog

Archive for July 2010

RabbitMQ as a NoSQL distributed cache


Part of what I’ve been doing with the cloud-friendly Tomcat session manager is basically implementing my own asynchronous distributed object cache. At the moment, this functionality is tightly coupled to what I’m doing inside the session Store. But in making some changes recently to add Spring Security integration and make working with Spring Security 3.0 a little easier, I noticed that there’s a lot of what I’m doing inside the session Store that could simply be abstracted into its own package and used as a standalone distributed cache.

The concept is simple and I think the code will be straightforward. Instead of synchronously loading an object from a data store (which is configured on the back end to shard its data or do other kinds of distributed load-balancing and failover replication), code would request that the object be loaded asynchronously and provide a callback to be executed when the load is complete. This would actually simplify my own code and make it quite a bit more robust, and it would add another voice to an area of our industry that is getting a lot of focus at the moment: distributed caching.
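Sketched in Java, with entirely hypothetical names (the real version would publish load requests to RabbitMQ rather than run the lookup on a local pool), the callback-driven load looks something like this:

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Toy asynchronous cache: load() returns immediately and the supplied
// callback fires once the (possibly remote) lookup has completed. The
// real version would publish a load request to a RabbitMQ queue instead
// of running the lookup locally.
class AsyncCache<K, V> {
    private final Map<K, V> backingStore = new ConcurrentHashMap<>();

    void put(K key, V value) {
        backingStore.put(key, value);
    }

    CompletableFuture<Void> load(K key, Consumer<V> callback) {
        // supplyAsync performs the lookup off the calling thread;
        // thenAccept invokes the caller's callback with the result.
        return CompletableFuture.supplyAsync(() -> backingStore.get(key))
                                .thenAccept(callback);
    }
}
```

The caller never blocks on the data store; it just hands over a callback and goes on about its business.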

Terracotta looks like a killer app in all ways. I’d love to be able to use something I didn’t write myself to solve a lot of our problems. But I spent all our money on VMware support and new servers; there’s nothing left for chasing proprietary, heavyweight solutions. I’ll use OpenSource or software I’ve written myself, or I’ll do something else entirely. A distributed data cache backed by RabbitMQ will be relatively lightweight (probably not at first, as I often have to strip things out to reach my lightweight goal) and, I’m sure, quite fast. It will transparently allow for sharding and aggregating data with no additional configuration. Since queue subscribers get load-balanced anyway, there’s no need to figure out some way to split up objects: they’ll be spread over however many listeners I put on those queues. I can partition data by using different RabbitMQ servers and combinations of queues and exchanges.
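The competing-consumers behavior that makes this work can be shown with an in-memory analogy (plain Java, no broker; class and method names are mine, not RabbitMQ’s):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// In-memory analogy for RabbitMQ's competing-consumers pattern: several
// workers share one queue, so each message is handled by exactly one of
// them and the load spreads across however many listeners you start.
class CompetingConsumers {
    static List<String> process(List<String> messages, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(messages);
        List<String> handled = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(workerCount);
        for (int i = 0; i < workerCount; i++) {
            pool.submit(() -> {
                String msg;
                // Each worker takes whatever the others haven't claimed.
                while ((msg = queue.poll()) != null) {
                    handled.add(msg);
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled;
    }
}
```

Every message gets handled exactly once, no matter how many workers are listening: that is the "free" sharding I am counting on.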

I’m starting work on this right away since I’ll be on vacation next week and, geek that I am, will likely not be able to pull myself away for long. Expect to see something on GitHub week after next!


Written by J. Brisbin

July 7, 2010 at 7:14 pm

Adventures in GrAppEngine


Although I’ve blogged quite a bit about the cloud-based utilities I’ve been writing, releasing as OpenSource on GitHub, and working with here at the world’s largest Pizza Hut franchisee, there’s still plenty we’re doing to deploy Web 2.0 apps that I haven’t spent much time talking about. We’ve traditionally been a little tight-lipped about our application development because, quite frankly, there was no one to talk to about it. No one really cares much what a company they’ve never heard of does internally to develop applications for its own users.

But the tide is shifting away from this closed, isolationist attitude. It’s far from being endorsed at the highest levels; my superiors don’t particularly care that I blog on technical things, but they’re not going to encourage me to do it, either. That said, I feel like I’ve made a smallish contribution to the global discussion of cloud computing. I hope to continue doing that by introducing you to aspects of our development efforts that have broader application. One of the things that might make good discussion material is our use of a custom-built, Groovy-based REST framework for deploying web applications. I’ve alluded to it several times but never discussed it in detail. I’d like to open up this web framework a little and explain why we do what we do, because I think it has broader relevance to the cloud deployment model.

The (currently unnamed) Groovy web framework I wrote is very opinionated about how to build and deploy applications. We use ExtJS (now called Sencha) internally as our Ajax toolkit of choice–primarily for the grid. There is no other JavaScript Ajax grid that I’ve found that is as powerful and easy to use as the Ext grid. It’s the foundation of a lot of our applications because the users understand how to use it. They know what a spreadsheet is and they know how to use a grid because it looks a lot like their email application. To power the grid, a developer needs to use a DataStore, which is an abstraction over Ajax and JSON that exposes server-side data to Ext components.

I realized early on that a lot of what we do when we build applications is simply expose data to end users. We give them lists (grids) and detail items that explain things to them. We give them links to other detail pages. Even updating this information is simply exposing a form that, when the user hits the “Update” button, sends the data to the back end to be persisted. There’s not a lot of actual code that needs to go into the basic CRUD operations that are the majority of our applications.

So the first thing I did when designing this framework was provide a way to map an HTTP verb (GET, POST, PUT, DELETE) to a CRUD operation (create, retrieve, update, delete) as defined in a Groovy source file. This works extremely well for DataStore applications because the Store handles it all automatically. When the user interface updates a record on the client end and requests a save, the Store handles PUT’ing the data back to the server.
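A rough sketch of that verb-to-operation dispatch, with hypothetical names standing in for the framework’s internals:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch of the verb-to-CRUD mapping: each REST definition file
// contributes one handler per operation, and the framework picks the
// handler straight from the HTTP request method. Names are illustrative,
// not the actual framework API.
class VerbDispatcher {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    VerbDispatcher() {
        handlers.put("POST",   id -> "create " + id);
        handlers.put("GET",    id -> "retrieve " + id);
        handlers.put("PUT",    id -> "update " + id);
        handlers.put("DELETE", id -> "delete " + id);
    }

    String dispatch(String method, String id) {
        Function<String, String> handler = handlers.get(method);
        if (handler == null) {
            throw new IllegalArgumentException("Unsupported method: " + method);
        }
        return handler.apply(id);
    }
}
```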

The whole of the web framework is not really designed to return HTML. It’s designed to return JSON. It handles serializing the data you want to send back to the client at the framework level. The way it does this is by using an SQL DSL (Domain-Specific Language) that allows the developer to express an SQL statement such that it can be built based on input data (e.g. by adding or removing columns or changing sort orders) and can be handed off to the framework for delegated execution.

By way of example, here’s a REST definition file that is part of a maintenance application to update an Ext menubar component:

import com.npci.enterprise.rest.util.SqlTemplate

create = { sql ->
}

retrieve = { sql ->
  dataSource = bean("postgres")

  minId = 0
  sql {
    select "id,parent_id,title as \"text\",not(has_children) as leaf,order_index"
    from "webmenu.items_tree"
    where {
      condition column: "id", operator: ">", var: "minId"
      condition column: "id", var: "id", required: false
      condition column: "parent_id", param: "parent", type: "integer", required: false
    }
    if (exists("sort")) {
      order by: [sort], direction: dir
    }
    outputRawData true
    nolimit true
  }
}

update = { sql ->
  dataSource = bean("postgres")

  def relDelete = new SqlTemplate(dataSource, "DELETE FROM webmenu.relationships WHERE child_id = ?")

  sql {
    insert "webmenu.relationships"
    column name: "parent_id", param: "parentId"
    column name: "child_id", param: "childId"
    column name: "order_index", param: "orderIndex"
  }
}

delete = { sql ->
}

This illustrates several things about the REST framework that make it worth our time to develop with:

  1. When the DataStore that powers the Tree component requests the data with a “GET”, the framework runs the “retrieve” closure. This particular closure doesn’t actually execute anything. It simply sets up an environment that the framework will use when it invokes the SQL being returned by the configured DSL helper object (the “sql” variable being passed into the closure). The developer can also specify a filter expression, which will only output the row of data if the filter closure returns “true”.
  2. In the “where” block there are multiple column definitions, but only the first will always be applied. You’ll notice the other column definitions have “required: false” set on them. This means that if no variable exists with the name you’ve defined, that column won’t be included in the WHERE clause.
  3. The ORDER BY column and sort direction are controlled by the Store. I wrote a little helper Closure called “exists” that serves the same function as PHP’s isset().
  4. “outputRawData” means: don’t include the metaData normally added to JSON responses, which tells the requesting Store how many records there are and other such information about the results being returned. But the developer *can* specify any extra metaData that should be included with the results and sent back to the client. This provides a clean mechanism for returning not just results but arbitrary data that can be consumed by any Ajax request, not just Ext DataStores.
  5. “nolimit” means: don’t do any kind of pagination. By default, the REST framework will NOT return full result sets; it returns one page of results at a time, as controlled by the Ext grid in combination with the Store. This makes applications incredibly fast. In addition to the small size of JSON responses, we’re not selecting massive amounts of data from the database; we’re only working with slices of data at a time. That means performance improvements all the way back to the Postgres server, which can pull out a slice of records very quickly and efficiently.
  6. The “update” closure is invoked in response to an HTTP PUT request. You’ll notice that I also have a helper object here that ties the Spring JdbcTemplate to developer-supplied Groovy code (SqlTemplate). The execute method is overloaded to take several different kinds of input. In addition to parameters for the SQL statement, it takes a closure, which is invoked with every record. Executing SQL in a REST operation, then, means specifying the SQL statement in the helper, calling execute (passing any required parameters), and providing a Closure as a callback for each record returned.
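To make the optional-condition and paging behavior concrete, here is a toy version of how a statement might be assembled under the hood (all names hypothetical, not the actual framework code):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of how "required: false" conditions and paging could work:
// optional columns are skipped when the request carries no matching
// parameter, and a LIMIT/OFFSET pair is appended unless the operation
// opts out with nolimit.
class QueryBuilder {
    static String build(String table, Map<String, Object> params,
                        List<String> optionalColumns, boolean nolimit,
                        int page, int pageSize) {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table);
        List<String> where = new ArrayList<>();
        for (String col : optionalColumns) {
            // A "required: false" condition only appears when the request
            // actually supplied a value for it.
            if (params.containsKey(col)) {
                where.add(col + " = ?");
            }
        }
        if (!where.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", where));
        }
        if (!nolimit) {
            // Paging keeps each response down to one slice of the results.
            sql.append(" LIMIT ").append(pageSize)
               .append(" OFFSET ").append(page * pageSize);
        }
        return sql.toString();
    }
}
```

With a request that supplies only “parent_id”, the “id” condition simply drops out of the generated statement; with nolimit set, the LIMIT/OFFSET clause drops out instead.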

We see some real benefit to developing applications in a strictly RESTful manner. Our HTML pages are simply wrappers that define the important areas of the screen into which our Ext components will be inserted. Since our web framework is transparently integrated with our client framework, we don’t have to write code to handle plumbing. I designed the framework so that the developer would only have to define the bare minimum of information required to get data from the user’s browser into the database and vice-versa. The useful abstractions I’ve included like the SqlTemplate mean I don’t have to write any more code than is required to execute the business logic.

Groovy and AppEngine

This is all closed-source, unfortunately. I’ve asked a couple times about simply OpenSource-ing it as is, but I get the impression there’s too many fires burning (there’s *always* too many fires burning) to give the idea much thought. It’s not that they’re opposed to OpenSource (in general), just that they don’t know much about giving away internal code, fear the idea at least a little bit, and would rather take the path of least resistance which is to ignore the question and move on to the next crisis.

I firmly believe this paradigm could be useful to cloud deployments outside our company, though, and since I like to keep up-to-date with what everyone’s doing in the community, I decided to port this REST framework to an AppEngine-friendly, OpenSource version. In taking on a start-up project recently, I investigated using Grails on AppEngine and found it was unwieldy because it required restarts of the AppEngine server *every time I saved my Groovy source*. I simply can’t be productive that way, so I chose to use Rails 3 and deploy on Heroku.

But I still want to see a cloud deployment option for Java/Spring apps and, even though this framework is no different than Grails in that you will be constantly restarting the server (it’s a side-effect of the Draconian limitations Google places on Java apps running on AppEngine), it should make developing RESTful Ajax AppEngine applications a little less painful. It won’t include a full ORM because, in my opinion, that’s impractical when developing Ajax/REST applications where the data is meant to be sent to the client for the actual processing. Why go to the trouble of wrapping a datastore in an ORM so you can have dynamic finder methods when all you’re going to do with those objects is serialize them and send them on out to the client? Less is more in this case, so I opted to adapt the SQL DSL I wrote to interact with PostgreSQL and the AS/400 to a JDOQL version that puts fewer layers between the actual data and the JavaScript on the other end of the request.

Another limitation I don’t like at all is the time limit for requests. Since this REST framework enforces delegation of the execution of queries to the framework itself, it would be easy to farm out requests for large amounts of data processing to an asynchronous queue, where the work could be done in true parallel, cloud fashion. Page requests, then, would be shorter in duration because of the parallelism. But AppEngine has no such capability. Task Queues are an approximation, of course. But Task Queues cannot replace an asynchronous message bus where workers are listening for events and do work in an event-driven way.

Another piece I’ve intentionally left out, one that developers familiar with other web frameworks might be expecting, is a templating system. I use Sitemesh to make things a little more Grails-like, but we don’t need a complex templating system because we don’t generate HTML. Everything that comes out of our REST applications comes out as JSON. Data display is handled entirely by the Ext toolkit in the user’s browser. If your REST operation Groovy code delegates execution of a SQL statement to the framework, the framework handles streaming the JSON out to the browser; the developer doesn’t need to do anything else. The developer can, of course, choose to output their own content. You *could* output HTML or plain text or a PDF if you wanted. But the point is that you don’t have to. The idea of the framework is to let the developer get to the business of the application faster, without spending time and energy writing code that doesn’t really contribute to the business logic.

I translated most of the critical portions of the codebase into an OpenSource version that’s friendly to the confines of AppEngine. Script reloading could be achieved by storing Groovy code in BigTable and providing a ResourceConnector to the GroovyScriptEngine that reads files from BigTable rather than loading a resource from the classpath. But I’m not really sure how easy it would be to provide some kind of build hook that loaded the Groovy source into BigTable whenever the local resource is saved. Without this, the developer would have to use a form and a textbox to edit their Groovy code. That’s doable, but pretty rotary-phone if you ask me.

I’ll be posting an example application that runs as-is on AppEngine, interacts with BigTable via JDO, and illustrates all the points I’ve tried to outline in this rather lengthy article. It will be on GitHub, alongside my other cloud utilities. I won’t give myself a deadline just yet. I’m still halfway through a fairly extensive renovation of my house, where my wife and I are doing all the work. Time is limited. But I’ve been wanting to publish an OpenSource version of what I’m doing at work for a long time now. Keep an eye out!

Written by J. Brisbin

July 7, 2010 at 1:11 am