Friday, June 21, 2013

The Pitfalls Of Virtualization

This week we took a little road trip down to San Diego and helped one of our clients migrate from our cloud architecture to their own in-house server.  Not many customers choose this option, preferring the convenience of the cloud, but we're always happy to support anyone who does.

I've always cautioned customers to have realistic expectations about performance when migrating from the cloud to an in-house server, on the assumption that most performance issues are in the code and not the hardware.  We decided to use this particular install as an opportunity to put that theory to a real test and were fairly blown away by the results.

We started by loading a large quote on the EC2/cloud-based system and executing some common workflow tasks: adding line items, generating pull sheets, scanning things out and back in.  We did all this while remotely profiling the Java virtual machine.

The Sawtooth Pattern

I wish I'd thought to grab a screenshot at the time, but while quotes were loading we saw a zig-zag sawtooth pattern in CPU utilization at regular intervals.  This reflects the underlying nature of virtualization.  One physical server hosting multiple virtual servers must find a way to allocate CPU time between them, and it usually does this the same way a multithreaded operating system allocates CPU time to threads - through time slicing.  Server A can have the CPU for a few hundred milliseconds, then Server B can have it, and so on.
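To make the time slicing idea concrete, here is a toy round-robin model in Python.  It is only a sketch under simple assumptions (four equally weighted virtual servers, fixed 100 ms slices, all of them always wanting the CPU), not how any particular hypervisor actually schedules, and every name and number in it is made up for illustration.

```python
def simulate_time_slicing(num_vms=4, slice_ms=100, total_ms=1200):
    """Return, for each millisecond, the index of the VM that holds the CPU.

    Assumption: plain round-robin with fixed-length slices and no priorities.
    """
    return [(t // slice_ms) % num_vms for t in range(total_ms)]


def utilization(schedule, vm, window_ms=50):
    """Percent of each sampling window during which `vm` held the CPU."""
    samples = []
    for start in range(0, len(schedule), window_ms):
        window = schedule[start:start + window_ms]
        held = sum(1 for owner in window if owner == vm)
        samples.append(100 * held // len(window))
    return samples


schedule = simulate_time_slicing()
samples = utilization(schedule, vm=0)
average = sum(samples) / len(samples)

print(samples)   # bursts of 100 separated by runs of 0
print(average)   # 25.0: one quarter of the CPU, on average
```

Seen from inside virtual server 0, the CPU alternates between being fully available and not available at all, so a request that needs more than one slice of work has to wait out everyone else's turn before it can continue.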

Real Servers

Then we ran the same tests on the real server - an 8-core HP ProLiant with 32GB of memory and software RAID - with exactly the same version of Flex.  The results were astonishing.  The client felt as though the system was more than twice as responsive, and our measurements confirmed that the dedicated server was 2-10 times faster than the Amazon cloud, depending on what operations the user was doing.  We expected an improvement of 20%-50%, not a 2x to 10x speedup.
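Speedup factors and percentage improvements are easy to mix up, so here is a small Python sketch of the arithmetic.  The timings are hypothetical round numbers chosen to illustrate a 2x and a 10x result; they are not our actual measurements, and the function names are just for illustration.

```python
def speedup(cloud_seconds, dedicated_seconds):
    """How many times faster the dedicated run is (2.0 means twice as fast)."""
    return cloud_seconds / dedicated_seconds


def percent_faster(cloud_seconds, dedicated_seconds):
    """Throughput gain in percent: a 2x speedup is 100% faster."""
    return (speedup(cloud_seconds, dedicated_seconds) - 1) * 100


def percent_time_saved(cloud_seconds, dedicated_seconds):
    """Reduction in wall-clock time in percent: a 2x speedup saves 50%."""
    return (1 - dedicated_seconds / cloud_seconds) * 100


# Hypothetical timings (seconds) for one operation, NOT measured values.
examples = [(10.0, 5.0), (10.0, 1.0)]

for cloud, dedicated in examples:
    print(
        f"{speedup(cloud, dedicated):.0f}x speedup = "
        f"{percent_faster(cloud, dedicated):.0f}% faster = "
        f"{percent_time_saved(cloud, dedicated):.0f}% less time"
    )
```

The same result can honestly be quoted as "10 times faster", "900% faster", or "90% less waiting", which is why it helps to say which one you mean.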

Now What?

This information is too compelling not to act on.  This is the summer of speed, after all, a time when we're almost exclusively focused on increasing the speed of Flex.  We have some big software-related speed improvements coming in Flex 4.6.15 that are now in testing, and we've discovered an opportunity to couple those improvements with the speed boost that comes from moving to dedicated hardware.

We've already started the process of moving away from virtual servers in Europe, where we run two production servers that support our European and South African customers.  We're working with a data center located in Roubaix, France and have moved several EU customers from the Amazon data center in Dublin to France.

Building a New Cloud

Unfortunately, that leaves 30+ servers in the United States to support everyone else.  Moving that many servers is a much larger project.  We've opened talks with several Tier 4 data center operators in North America about moving our entire remaining infrastructure to dedicated hardware.  We also plan to use this opportunity to beef up security, split the application and database load onto separate servers, and introduce a load balancer.  Our proposed new architecture for North America is shown in the diagram below:



The final version of our new network will likely be a bit different as we incorporate feedback from networking and security consultants, but the general idea of a front-side network for app servers with a back-side network for database I/O will probably survive the process.

Another Look At Self Hosting

The vast majority of our customers use our cloud hosting option.  We've always supported a traditional self-hosted site license deployment model, but have never really pushed it, as most customers seem to prefer a modest monthly payment to the up-front costs associated with servers and software licenses.

But our recent experience has prompted us to reevaluate our tendency to downplay self-hosting.  We're currently putting together pricing for servers with Flex preinstalled along with a few variations of on-site help getting up and running.  I believe Chris will be reaching out to the customer base soon with all the details.

A Bird In The Hand

It's never fun having problems with the software, but I'd rather know about them and know the reasons than not know, even if the solution is sometimes complicated and time-consuming.  Moving to dedicated high-performance hardware seems like a no-brainer given the scale we're running at these days.  We should ink a deal with the new data center by mid-July and move everyone over to the new cluster shortly thereafter.

For our European and South African customers, the necessary hardware is already up and running.  We should have those customers still running on the Amazon cloud fully moved over by Monday morning.
