J. Brisbin

Archive for the ‘The Server Side’ Category

Beware the 802.3ad Beast!

I readily admit I’m not a network person. We have engineers for that where I work, so I don’t have to be. But every once in a while, something crops up that makes even our network engineers scratch their heads, shrug, and say something to the effect of: “from what I can tell, it should work.”

We use VMware ESX servers as our virtualization hypervisors and we manage them with vSphere. I love my VMware boxes. You’d have to pry them from my cold, dead fingers if you want to get them away from me. But last week made me come very close to actually cursing. The problem seemed like it was something very basic: virtual machines on one ESX host couldn’t always talk to virtual machines on another ESX host. Identically configured hosts, no less! If I ssh’d into one box and issued a “ping” back to the other, then traffic started flowing. Something was simply forgetting how to get to that other VM after some number of minutes. We tried every setting in the Cisco switch. We even called VMware support and they looked at everything and couldn’t find any glaring errors.

The next day, on a whim, I had our network engineer enable EtherChannel (Cisco’s link aggregation feature, the counterpart to IEEE 802.3ad) on the ports my ESX hosts were plugged into, and I switched the ESX servers’ NIC load balancing to “IP hash” instead of originating port ID, and voilà! It magically started working.

I was so frustrated the previous day that I was a little annoyed, truth be told, that the fix was something so simple. So if your VMs have trouble talking to other VMs reliably and you think your switches are losing the ARP entries for those VMs, enable EtherChannel for the NICs you’re using on that ESX host and change your load balancing to “IP hash” on the vSwitch properties in your vSphere client. It’ll save you a lot of headache.

Written by J. Brisbin

June 7, 2010 at 4:23 pm

Posted in The Server Side

Even more reason why Oracle/Sun isn’t really competitive in the cloud

I blogged recently about our terrible experience with Oracle/Sun support. I’d like to say it’s just because they’re getting out of the hardware business and their heart isn’t in it any more. I’d like to say it’s not a systemic problem with the company as a whole. I’d like to say that (honestly) only because I recommended we buy Sun hardware so we could run Solaris 10 with the virtualization offered by zones. We continue to be frustrated with Oracle/Sun as a company and we see systemic problems in the organization that make it unreliable as a vendor for our mission-critical applications.

There are too many problems to go into much detail on each one, but suffice it to say that our experience with Oracle/Sun support has been so bad, we have a demo next week to check out HP’s newest virtualization offering because the company we’re working with on that:

a) Answered a question I had about hooking up Solaris 10 to our SAN. The guy who helped me wasn’t a Solaris guy, just a UNIX guy. But we got it done. The support tech at Sun who was supposed to be helping me had a list of about 6 or 8 questions about our environment that he wanted me to answer first. Things it would have taken me more than an hour to answer. This HP guy helped me with a Sun problem on the fly with no questionnaires.

b) Will take our Sun servers on trade-in.

If the price is right and we like what we see, we’re going to dump Oracle/Sun entirely and trade in 5 Sun servers for something else–anything else.

To prove I’m not just exaggerating the atrociousness of the situation, our primary warehouse server running Solaris 10 on Sun hardware crashed on Sunday morning. The tech finally showed up Wednesday after lunch and proceeded to replace the main system board. He fired the server back up (or at least tried to) and got the same result as before: nothing. Completely dead. No BIOS, no boot screen, no nothing. He tried taking the processors out and swapping sockets, updating ILOM firmware, everything. It’s just as dead as a doornail. Then he tried to get someone at Oracle/Sun on the phone to discuss the next steps. After wading through automated menus and never getting to talk to a real person (this is the Oracle/Sun tech, remember, not Joe the Customer) he smiled and told me: “I feel your pain.” So the tech left last night with our server riding shotgun with him back to The Big City to operate on it.

If the field tech can’t even contact the engineer who’s supposed to be helping us with our problem, how can that company hope to support the Vast Throng of cloud-computing sycophants who are getting tech envy and want a piece of the action? How can a company that does business like this be serious competition for those (claim jumpers though they may be) already in this space?

But that’s just the hardware side of things, I hear. I use product X and I get great support, I also hear. Consider yourself lucky, then, to have avoided the vast bureaucracies within the company that make it an inefficient behemoth. I understand now why the company was failing before Oracle bailed them out and why they off-loaded Java to the community (which I’m glad they did, of course). But I can see what this merger with Oracle has done to the company and I haven’t seen any net gain yet.

Written by J. Brisbin

April 28, 2010 at 9:34 pm

Oracle/Sun can’t answer anyone, let alone SpringSource

I had to chuckle when I read the headline that Oracle “answers” VMware’s purchase of SpringSource with their new WebLogic tools. I’ve just spent two days knee-deep in problems caused by a catastrophic hardware failure in one of our Sun servers and I’m convinced that a company that does business the way Oracle/Sun does cannot survive in this new cloud ecology. Actually, I’m surprised they’ve made it this far.

When we found out (on a Sunday afternoon, no less) that the particulars of the support contract we had were Monday-Friday, 8:00-5:00 and not 24/7 we dutifully kicked ourselves in the rears and called them back. What can we do to get an upgraded support contract? Where can we send the check? We weren’t asking them to do something for free. We were willing to upgrade our support contract, pay for a support incident, or go to the friggin bank and bring back a roll of non-sequential Benjamins for them for crying out loud. Monday rolls around and we start calling our vendor, who supposedly calls the Oracle/Sun account rep, who never gets back to us. We spend most of the day getting bounced from one department within Oracle/Sun to another. Level one has me upgrade some firmware to the latest version so I can tell them the mainboard is fried much more elegantly than I can with the old version that’s currently on the machine. I talk to at least three different people within support before they begin the gradual process of bouncing my issue to the field techs, who will have to come onsite with a part. All the while, we still couldn’t get anyone on the phone to take our money. We were begging them to give us a chance to pay them whatever they wanted to get our support contract bumped up to the level where they actually start taking you seriously and don’t promise to call you back in 30 minutes when they mean two hours.

Oh, and that tech who’s supposed to come out? The pager number we were given for him isn’t valid. We tried paging him and got the familiar ascending tones and the pleasant voice: “that number is no longer in service.”

As of now, the Sun hardware is still kaput. Maybe a tech will show up tomorrow, maybe they won’t. I’m not going to hold my breath. Oracle as a company is a vast, inhuman labyrinth of bureaucracies that know nothing about what the others are doing. You can’t talk to a tech and then ask them to transfer you to someone who can take your money. You also can’t upgrade support plans on the fly. Buy a time machine instead and get the support plan you should have the first time.

All day I kept thinking about how Oracle was trying to “answer” what SpringSource was doing in the cloud computing space. It’s great that they understand where the industry is headed. But I can’t see a company that does business like this succeeding in anything it does. The last Sun server we bought (because we were running everything in Solaris 10, but we’ve moved completely away from that now to VMware and Ubuntu Linux) came, literally, in pieces. It’s like they just put all the parts required to construct a server into a box and shipped that to us. I couldn’t believe it. The actual CPUs weren’t even in the chassis. I had to install the heat sinks myself with that gray putty from the hardware guys. A $12,000 server and I’m putting in the parts myself. The developer. The system admin.

I will not willingly do business with Oracle or Sun ever again. Their support has been of no use, their account reps are unreliable, and their offerings pale in comparison to the robust and flexible solutions being offered by SpringSource and the community of cloud implementors. I don’t usually try to dissuade anyone from the product or vendor of their choice. Do what you want. Whatever works, right? But know that my experience with this new Oracle/Sun behemoth has been nothing but frustratingly schizophrenic. I’m embarrassed now to have ever suggested we use Sun for anything. Operating system, hardware, or what have you. I should have stuck with Linux.

I’ve learned my lesson. Never again, Oracle.

Written by J. Brisbin

April 27, 2010 at 1:10 am

Distributed Atomicity in the cloud with RabbitMQ

The private cloud Tomcat/tcServer session manager I’m working on has a huge job cut out for it. Maintaining the state of an object that exists in possibly more than one location at any given point in time is not an easy task, I know. To be honest, if it weren’t for my Midwestern stubbornness, I might not take the time to work through these hefty issues. I might follow the path of least resistance, like most of the industry has done so far.

I just don’t like the idea of sticky sessions. I look at my pool of tcServer instances as one big homogeneous group of available resources. In my mind, there should be no distinction made between machines running in different VMs, or even on different hardware. They should exist and cooperate together as a single unit.

But in “replicated” mode, each server has a copy of the object. This is great for failover and it makes the session manager extremely performant. But yet another sticky wicket rears its ugly head. How do I protect this object and make sure it gets updated properly before someone else has a chance to operate on it?

Call it distributed atomicity if you want–the idea being that an object exists within the context of a cloud of compute resources (in this case, a Tomcat/tcServer user session object) and needs to be updated with all the right attributes when code in a different physical process operates on that object. I’m attacking this problem by implementing a form of distributed atomicity that uses RabbitMQ to send the contents of newly-added attributes to any interested parties throughout the cloud. I already replicate the session object by grabbing it with a Valve, just before the request is completed. This session object gets serialized to the cloud before the response is sent, the idea being that this particular object will be updated in all the places it is needed before another server has a chance to operate on that object.
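To make the fan-out idea concrete, here is a minimal sketch in plain Java. It is not the session manager's actual code: `AttributeBus` is a hypothetical in-memory stand-in for a RabbitMQ fanout exchange, and `SessionNode` is a hypothetical stand-in for one tcServer instance's session cache. The only point it tries to show is that a node announces an attribute change and every interested party, including the sender, applies the same message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Usage: two nodes share one bus; a write on node A becomes visible on node B.
class FanoutSketch {
    public static void main(String[] args) {
        AttributeBus bus = new AttributeBus();
        SessionNode a = new SessionNode(bus);
        SessionNode b = new SessionNode(bus);
        a.setAttribute("session-1", "cart", "3 items");
        System.out.println("node B sees: " + b.getAttribute("session-1", "cart"));
    }
}

// Hypothetical stand-in for a fanout exchange: every subscriber receives every message.
// A real broker delivers asynchronously over the network; this delivers inline.
class AttributeBus {
    interface Listener {
        void onUpdate(String sessionId, String name, Object value);
    }

    private final List<Listener> listeners = new ArrayList<>();

    void subscribe(Listener listener) {
        listeners.add(listener);
    }

    void publish(String sessionId, String name, Object value) {
        for (Listener listener : listeners) {
            listener.onUpdate(sessionId, name, value);
        }
    }
}

// Hypothetical stand-in for one server's local cache of session attributes.
class SessionNode implements AttributeBus.Listener {
    private final Map<String, Map<String, Object>> sessions = new HashMap<>();
    private final AttributeBus bus;

    SessionNode(AttributeBus bus) {
        this.bus = bus;
        bus.subscribe(this);
    }

    // A local write is only announced here; the node updates its own copy
    // when its message comes back, so all copies apply the same update.
    void setAttribute(String sessionId, String name, Object value) {
        bus.publish(sessionId, name, value);
    }

    @Override
    public void onUpdate(String sessionId, String name, Object value) {
        sessions.computeIfAbsent(sessionId, id -> new HashMap<>()).put(name, value);
    }

    Object getAttribute(String sessionId, String name) {
        Map<String, Object> attributes = sessions.get(sessionId);
        return attributes == null ? null : attributes.get(name);
    }
}
```

With a real broker the delivery is asynchronous, so the window between the publish and the remote update is exactly where the atomicity question comes from.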

By using the messaging infrastructure of RabbitMQ, I can at least make updates to this object reasonably atomic. Now the question becomes: where does this object live? For performance reasons, it’s probably not realistic to have just one object to share among web application servers. In the case of Tomcat/tcServer, the internal code is requesting the session object so often (multiple times during a single request) that each server simply has to cache a session object for the length of the user’s request.

A tool like ZooKeeper might be helpful in this case. If code has to set an attribute on a session object, the session would set a barrier in ZooKeeper that lets other code know it is in the process of being altered. Once setAttribute() is finished, a message is then sent with the serialized attribute. Each of the other interested parties could alter its local copy of the object with the updated attribute until it receives a full replication of the object. Would the second, full replication be superfluous? At this point I can’t say. In the interest of completeness, I feel compelled to issue a second replication event, but in the interest of performance and bandwidth conservation, I wonder if it’s really necessary.
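The barrier-then-update flow can be sketched roughly as follows. Everything here is an assumption made for illustration: a `ReentrantLock` stands in for the ZooKeeper barrier (a real one would span processes, this one only spans threads), direct method calls stand in for the messages, and the version counter is an added device that is not part of the design described above. Under those assumptions, a later full replication at the same version simply overwrites the replica with identical data, which is one way to frame the question of whether it is superfluous.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Usage: a single-attribute update arrives first, the full copy arrives later.
class BarrierSketch {
    public static void main(String[] args) {
        ReplicaSession replica = new ReplicaSession();
        SessionWriter writer = new SessionWriter();
        writer.setAttribute("user", "jb", replica);
        writer.replicateFull(replica);
        System.out.println(replica.getAttribute("user") + " @ version " + replica.version());
    }
}

// Hypothetical local copy of a session held by another server.
class ReplicaSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private long version = 0;

    // Apply one attribute; ignore it if this copy is already at or past that version.
    synchronized void applyDelta(long newVersion, String name, Object value) {
        if (newVersion <= version) {
            return;
        }
        attributes.put(name, value);
        version = newVersion;
    }

    // Replace the whole copy unless the snapshot is older than what is held.
    synchronized void applyFull(long newVersion, Map<String, Object> snapshot) {
        if (newVersion < version) {
            return;
        }
        attributes.clear();
        attributes.putAll(snapshot);
        version = newVersion;
    }

    synchronized Object getAttribute(String name) {
        return attributes.get(name);
    }

    synchronized long version() {
        return version;
    }
}

// Hypothetical owner of the session on the server handling the request.
class SessionWriter {
    // Stand-in for the ZooKeeper barrier: held while the session is being altered.
    private final ReentrantLock barrier = new ReentrantLock();
    private final Map<String, Object> attributes = new HashMap<>();
    private long version = 0;

    void setAttribute(String name, Object value, ReplicaSession... replicas) {
        barrier.lock();
        try {
            attributes.put(name, value);
            version++;
            for (ReplicaSession replica : replicas) {
                replica.applyDelta(version, name, value); // the "attribute message"
            }
        } finally {
            barrier.unlock();
        }
    }

    void replicateFull(ReplicaSession... replicas) {
        barrier.lock();
        try {
            for (ReplicaSession replica : replicas) {
                replica.applyFull(version, new HashMap<>(attributes)); // the "full replication"
            }
        } finally {
            barrier.unlock();
        }
    }
}
```

In this sketch the full replication only matters when a replica has missed an attribute message; whether that case is common enough to justify the bandwidth is the open question.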

I’m far from finished with the cloud-based session manager. I’m trying to get it to a stable point so that I can migrate my cloud away from sticky sessions. The “replicated” mode seems to work fine; and I’m okay with sending too many messages–I’d rather have that than have too few and end up with page loads blocking because the session can’t be tracked down.

Distributed, asynchronous programming isn’t easy. It isn’t for the faint of heart or those with pesky bosses breathing down their necks to meet arbitrary and usually unhelpful deadlines. It also doesn’t help if you’re not a bona-fide genius. I often feel a little out of my league given the number of CompSci grads that are doing fantastic work in this interesting and growing segment of the industry. But I’m stubborn enough to keep plugging away when I should probably give up.

Written by J. Brisbin

April 22, 2010 at 6:17 pm

Change logging package to SLF4J/Log4J in tcServer/Tomcat

I really dislike the JULI logging package, which is Tomcat’s (and thus tcServer’s) default. Its configuration seems uncomfortable and the log files are almost unreadable without grepping out what you’re looking for. In all my other applications I use SLF4J, powered by Log4J. This combination is powerful, easy to configure, and I like that it doesn’t put the date in the log file’s name until after it’s rotated. There’s been discussion on the Tomcat list recently about maybe changing this in the future, but I’m not very patient and I’d rather not spend the precious little time I do have mucking about with things that are difficult.

The documentation describing the switch from JULI to Log4J isn’t very long or informative, though the process itself–to be fair–isn’t very complicated. But I get the sense that not many Tomcat developers want to discuss switching from JULI to Log4J, hence the lack of documentation.

Making the switch for tcServer is really only one additional step, though the way tcServer structures its instance directories makes it slightly more complex to configure for use with Log4J.

Due Diligence

Please read the official documentation on switching from Tomcat JULI to Log4J first. We’ll be doing things a little bit differently, but you should understand where we’re coming from before simply jumping into this.

Building Tomcat

In order to switch from the default Tomcat JULI package, you’ll need to build Tomcat from source, then build the “extras” module. The official documentation leaves out that you have to build the whole server first, then build the extras. If you build only the extras, without building the whole server, you’ll end up with ClassNotFound errors when you try to start Tomcat/tcServer.

UPDATE: You can build the extras module from source, but, come to find out, SpringSource has helpfully included the two jar files mentioned in “tomcat-6.0.20.C/bin/extras”. You can simply copy those jar files to the locations discussed here rather than building the whole server from source.

  1. I’m using tcServer 6.0, so download the source tarball for Tomcat 6.0.20 and unzip it somewhere.
  2. “cd” into that directory.
  3. Copy the build.properties.default file to build.properties.
  4. “vi” build.properties and uncomment the “jdt.loc” property, which will allow the Ant build to download the JDT compiler, which is a requirement of the build process.
  5. Increase Ant’s heap size: export ANT_OPTS=-Xmx256m
  6. Build the server: ant
  7. Once the Tomcat server has been successfully built, build the “extras” module: ant -f extras.xml

When that’s finished:

  1. Copy tomcat-juli.jar from either $TCSERVER_HOME/tomcat-6.0.20.C/bin/extras/ (the copy SpringSource ships) or $TOMCAT_SRC/output/extras/ (the copy you just built) to $TCSERVER_HOME/tomcat-6.0.20.C/bin/tomcat-juli.jar, replacing the existing file.
  2. Copy tomcat-juli-adapters.jar from the same extras directory to $TCSERVER_HOME/tomcat-6.0.20.C/lib/
  3. Delete $TCSERVER_INSTANCE_DIR/conf/logging.properties.

Now, copy the Log4J and SLF4J jars. I used the ones from my personal Maven repository (from the $TCSERVER_HOME directory):

cp ~/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/slf4j-api/1.5.8/slf4j-api-1.5.8.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/slf4j-log4j12/1.5.8/slf4j-log4j12-1.5.8.jar tomcat-6.0.20.C/lib
cp ~/.m2/repository/org/slf4j/jcl-over-slf4j/1.5.8/jcl-over-slf4j-1.5.8.jar tomcat-6.0.20.C/lib

Configuration

When you’ve got all the dependencies copied over, you need to put a configuration file in one of two places, depending on how you want to configure logging for your instances. In my case, I use three identical instances (actually, the names of the instances are different, but other than that, they’re identical) of tcServer, so I could put my log4j.xml file in tomcat-6.0.20.C/lib/. In your case, though, assuming your instances are configured differently from one another, you might want to put your log4j.xml file in (assuming an instance name of “dev1”) dev1/lib/.

NOTE: You also need to “vi” the tcServer start script (tcserver-ctl.sh) and comment out the lines that deal with a logging manager and a logging config file (lines 261-262 and 268-269). UPDATE: I actually don’t think this is necessary now. I think my errors were caused by something else. I think it’s safe to leave these be.

If you’re already using Log4J and SLF4J, you’ve likely already got an example XML file lying around that you could use. Copy that file to one of the locations mentioned previously. Mine looks something like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j/">

  <appender name="console" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <appender name="catalina" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="${catalina.base}/logs/catalina.log"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <appender name="vcloud" class="org.apache.log4j.DailyRollingFileAppender">
    <param name="File" value="${catalina.base}/logs/vcloud.log"/>
    <layout class="org.apache.log4j.PatternLayout">
      <param name="ConversionPattern" value="%d %-5p %c{1} - %m%n"/>
    </layout>
  </appender>

  <category name="org.springframework">
    <level value="INFO"/>
  </category>
  <category name="org.quartz">
    <level value="INFO"/>
  </category>
  <category name="org.apache.catalina">
    <level value="INFO"/>
    <appender-ref ref="catalina"/>
  </category>
  <category name="com.jbrisbin.vcloud">
    <level value="DEBUG"/>
    <appender-ref ref="vcloud"/>
  </category>

  <root>
    <level value="INFO"/>
    <appender-ref ref="console"/>
  </root>

</log4j:configuration>

You can now add categories and appenders to suit your particular needs. You can also change the pattern to suit your tastes.

Written by J. Brisbin

April 20, 2010 at 2:54 pm