Wednesday, December 26, 2012

Everything You Always Wanted To Know About Hibernate Performance Tuning But Were Afraid To Ask

One of the things I've learned over years developing software is that there's no such thing as a black box.  If you pick a framework or library off the shelf, just hope it works forever and that you'll never have to learn much about it, you're due for a day of reckoning.  The more complicated the problem this library solves, the sooner this day of reckoning will come.

In my opinion, there are few technologies more prone to black box syndrome than Hibernate.  If you browse forum posts and discussion threads on Hibernate, most of the posts are from people who just refuse to look inside the black box.  It's possible to get up and running on Hibernate without learning much - that is true.  But to use it in the real world, at web scale (whatever that means), requires a much deeper understanding of it - how it fetches data, how it caches data, etc.  Otherwise, your software will be dirt slow.

We've tinkered with Hibernate performance over the years, but always shied away from blocking off a few weeks to bear down and really dial it in.  Over the last few weeks, we took one of our high volume customer systems as a test case, worked through six key use cases and, through a fairly agonizing and time consuming process, were able to achieve 20X-30X performance boosts.  We were hoping for 100X, and there are still some things we can do (unrelated to Hibernate) that could possibly get us to the next order of magnitude, but we'll have to defer those for another day.

On XML And Religion

One area unrelated to performance tuning, but nonetheless a big philosophical debate within the community of Hibernate users, is whether to define the object relational mappings via JPA annotations or an external XML file.  The cool kids these days seem to prefer Annotations. Come to think of it, the cool kids seem to hate XML for just about anything these days.  And of course, anyone, especially someone who works in technology, who has absolute love or hatred for any technology, isn't practicing science or engineering - they're practicing religion.  This is a well known tendency of developers, which is why you'll often hear technology subjects referred to as "religious issues". 

The alternative to being a religious developer is being someone who evaluates technologies in a less satisfying, but more practical way.  XML's a good example.  You hear a lot of developers these days slam XML outright, preferring JSON or plain text files.  My take is that XML is bad for data, but good for configuration.  XML is self documenting which makes it easy to configure servers or object/relational mappings via XML, but it's not the sort of thing I'd want to send over the wire when speed or bandwidth are factors.  In those situations, JSON or a bitpacked format like AMF is more appropriate.  Right tool for the job.


I think this religious hatred of XML, even in situations like configuration where size isn't a factor, has driven a lot of young developers to go the annotations route with Hibernate.  One of the problems with religious zeal is it tends to blind people to all other considerations.  For example, there are all kinds of reasons why it's bad to co-mingle a system's domain model with the persistence mechanism, but all these reasons get smothered by XML hatred.

For example, suppose you believe, as I do, that a domain object or entity shouldn't know about the table used to persist it.  Why do I believe this?  It doesn't involve a bearded prophet.  It's because you might change your mind one day about the persistence mechanism or a customer might force a change on you.  All the persistence code needs to be contained in a single layer that can be swapped out if need be.  The annotation camp will usually say "nobody will ever really change the database" or "JPA is the persistence abstraction".  Maybe.  But you might change the ORM.  You might upgrade it (as we just did.)  You might decide to switch from HQL queries to criteria objects.  It's nice to know that all the code you'd need to change might be in a set of related DAO classes instead of scattered all over the application.  An even more practical example is that caching and indexes might be slightly different for different deployments of the application.  Being able to swap in a different set of Hibernate mapping files for a self-hosted deployment versus a cloud hosted deployment is of real value to us now, not in the abstract.  You can mix annotations with XML files, but I tend to think this creates confusion.

So we prefer XML configuration over Hibernate/JPA annotations at Flex because it decouples the persistence details from the domain.  I will concede, however, that if I were working on an internal application at a corporate IT shop where only one instance of the application will ever be run, the prospect of ever needing a plug-and-play persistence layer is pretty remote.  In that case, putting all the mapping configuration in annotations might make the code easier to understand.  I actually used this approach at the company I worked at prior to Flex.  Like most technology choices, XML vs Annotations is a choice that has to be informed by rational considerations specific to the environment and the project.  "I hate XML" is not a rational consideration (though you're still free to hate it.)

How Hibernate Works

With the philosophical debate out of the way, we can move on to the technical issues of making Hibernate fast.  We'll begin with a brief description of what Hibernate is and how it works - which should come as a great relief to those readers who've muddled through all the jargon to get this far in the post.

Most Java software (and most modern software) is based on an Object Oriented design model.  This means we represent "nouns" or data as objects, or more formally "classes".  For example, in Flex we have classes like ScanRecord, InventoryItem, SerialNumber and so on.  There are actually hundreds of different classes in Flex.  We do this because it's much easier and more logical to work with objects than it is to work directly with database queries and recordsets.  Contrary to what a lot of Database Administrators might think, we developers do this because it's mundane and repetitive to work directly with databases, not because we're afraid of SQL.

Hibernate's job is to translate these domain objects/classes into SQL for us, and in so doing remove much of the drudgery of database interaction so we can focus on higher order thinking skills like business logic and user interfaces.  To make this work, we configure Hibernate (via Annotations or XML files) to know which tables go with which classes.  Then we can use Hibernate provided classes to save objects, delete them and query them.  Hibernate generates the SQL and takes care of the details.

To make this fast, Hibernate provides several different caching mechanisms.  There's the query cache, which enables Hibernate to intelligently decide when a query doesn't need to be rerun.  There's a session cache, which caches objects for a brief time in the Hibernate session (typically a session has the lifespan of a single HTTP request), and a second level cache, which can cache objects between sessions.

If you examine a commonly used method on the Hibernate session like byId(), which retrieves an object instance by its identifier or primary key, you'll see that Hibernate checks for an instance of the object in the session cache and the second level cache before resorting to running a database query.
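The lookup order can be sketched in plain Java.  This is a toy model, not Hibernate's implementation: the class name LookupOrderSketch, its fields and the Function standing in for a SQL query are all assumptions made for illustration, and the shared map holds whole objects where the real second level cache holds a serialized form.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of the three-step lookup (hypothetical names, not Hibernate classes).
class LookupOrderSketch {
    // Session cache: private to one session.
    private final Map<String, Object> sessionCache = new HashMap<>();
    // Second level cache: shared by every session that is handed the same map.
    private final Map<String, Object> secondLevelCache;
    // Stands in for "run a SELECT by primary key".
    private final Function<String, Object> database;
    int databaseHits = 0;

    LookupOrderSketch(Map<String, Object> secondLevelCache, Function<String, Object> database) {
        this.secondLevelCache = secondLevelCache;
        this.database = database;
    }

    Object byId(String id) {
        Object entity = sessionCache.get(id);      // step 1: session cache
        if (entity != null) {
            return entity;
        }
        entity = secondLevelCache.get(id);         // step 2: second level cache
        if (entity == null) {
            databaseHits++;                        // step 3: database
            entity = database.apply(id);
            if (entity != null) {
                secondLevelCache.put(id, entity);
            }
        }
        if (entity != null) {
            sessionCache.put(id, entity);
        }
        return entity;
    }
}
```

Two lookups of the same id in one session cost at most one database hit, and a second session sharing the same second level map costs none.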

Let's assume that we all have a working knowledge of Hibernate now and dive into some optimization tips we uncovered over the last few weeks.

The Session Cache

You get the session cache for free.  There's nothing to configure or turn on, and it's essentially a hash table holding real references to the persistent objects (as opposed to the second level cache - which caches a serialized representation of the object.)

In essence, if a session attempts to retrieve an object twice, it will only result in one database hit.

The session cache is faster than the second level cache because it doesn't have to deserialize or "hydrate" objects.

The Second Level Cache

The second level cache is used to cache objects with a lifespan longer than the session.  Any meaningful performance tuning will usually involve extensive use of the second level cache.  In order to use the SLC, you have to configure an external cache provider like EHCache or SwarmCache (we use EHCache) that plugs into Hibernate and handles all the details of sizing, eviction, disk overflow, etc.

You also have to tell Hibernate which classes to cache.  In our XML based approach, that requires adding a cache tag to each class we want to cache, like this:

    <class name="ShippingMethod" table="st_biz_shipping_method">
        <cache usage="nonstrict-read-write"/>
        <id column="id" length="36" name="objectIdentifier" type="string">
            <generator class="alto-uuid"/>
        </id>
        <property column="method_name" length="128" name="name"/>
        <property column="method_code" length="16" name="code"/>
        <property column="method_type" length="16" name="type"/>
        <property column="min_days" name="minimumDays"/>
        <property column="max_days" name="maximumDays"/>
        <many-to-one column="cost_rule_set_id" name="costRuleSet"/>
        <many-to-one column="waybill_template_id" name="waybillPrintTemplate"/>
        <set name="disabledPricingModels" table="st_biz_rc_disabled_pricing_models">
            <cache usage="nonstrict-read-write"/>
            <key column="inventory_item_id"/>
            <many-to-many class="com.shoptick.bizops.domain.PricingModel" column="pricing_model_id"/>
        </set>
                  
    </class>

This example will ensure that ShippingMethods get cached in the second level cache - and just ShippingMethods.  A common misconception about the second level cache relates to what happens to the object graph under an object that's been cached.

Let's clear up the confusion.  Caching a class only caches instances of that class.  It will not cache any associated classes.  In this example, costRuleSet and the waybillPrintTemplate values will not be cached.  The ID of the related object will be cached, but not the object itself.

This is actually a really good design from the standpoint of concurrency.  We don't have to worry about dozens of objects in the cache that all refer to the same related object, and that each cached instance might have a stale version of the related object.

Under the hood, when Hibernate chooses to place an object in the second level cache, it takes the class (and just the class for which caching is enabled) and serializes it to a simple string based format that represents simple data types: strings, numbers, dates, etc.  If one of the properties refers to another object, that object's identifier is stored in the cache - but not the object.

When an object is loaded from the cache, Hibernate instantiates an instance of the class and sets all its properties using the serialized values stored in the cache.  This is called hydration.  If one of those values is an identifier for another object, that object will be loaded using the normal three step process: session cache, second level cache, database.  This happens recursively until the whole object graph is loaded (assuming the object graph is eager fetched.)

List, map, set and other collection style associations are not cached by default.  In the previous example, you'll see that we have a cache tag as part of the set declaration.  In this case, a separate cache is used just to store collections.  But note that collection caches are just mappings of ids to ids.  The key for the cache will be the parent object's identifier and the values will simply be a list or set of ids.  All that's cached is the association, not the referenced objects at either end of the association.
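To make the "ids, not objects" point concrete, here is a small plain-Java sketch of a cache entry and its hydration.  It is a simplified model under stated assumptions: the class HydrationSketch, the map-of-strings entry format and the query counter are invented for illustration and are not Hibernate's actual cache structures.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a second level cache entry (hypothetical names, not Hibernate classes).
class HydrationSketch {
    static class CostRuleSet {
        final String id;
        CostRuleSet(String id) { this.id = id; }
    }

    static class ShippingMethod {
        final String id;
        final String name;
        final CostRuleSet costRuleSet;
        ShippingMethod(String id, String name, CostRuleSet costRuleSet) {
            this.id = id;
            this.name = name;
            this.costRuleSet = costRuleSet;
        }
    }

    // Cache region for ShippingMethod: id -> flat state, with no object references.
    final Map<String, Map<String, String>> shippingMethodRegion = new HashMap<>();
    // CostRuleSet is deliberately not cached here, so each load counts as a query.
    int costRuleSetQueries = 0;

    void putInCache(ShippingMethod method) {
        Map<String, String> state = new HashMap<>();
        state.put("name", method.name);
        state.put("costRuleSetId", method.costRuleSet.id);  // identifier only
        shippingMethodRegion.put(method.id, state);
    }

    ShippingMethod hydrate(String id) {
        Map<String, String> state = shippingMethodRegion.get(id);
        if (state == null) {
            return null;  // cache miss; a real session would go to the database
        }
        // The association is resolved separately, through its own lookup.
        CostRuleSet rules = loadCostRuleSet(state.get("costRuleSetId"));
        return new ShippingMethod(id, state.get("name"), rules);
    }

    private CostRuleSet loadCostRuleSet(String id) {
        costRuleSetQueries++;
        return new CostRuleSet(id);
    }
}
```

Hydrating a cached ShippingMethod in this model still costs one lookup for its CostRuleSet, which is exactly why associated classes need their own cache configuration.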

Caching Modes

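Hibernate supports four cache modes: read-only, nonstrict-read-write, read-write and transactional.  We'll look at the first three here; transactional is aimed at cache providers that participate in JTA transactions.  These modes relate to how caches are locked and used to enforce concurrency.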

Read only is the fastest, since there is no locking overhead.  The downside is that objects cached as read-only will not get refreshed if an instance of the object is changed.  Read-write is the slowest because every read operation takes out a lock to prohibit a write operation.  It also uses locks to prevent concurrent write operations.  The nonstrict version of read-write assumes there won't be concurrent write operations and as a result, has a much lower overhead in terms of lock synchronization.  We use this version extensively for configuration data and read-write caches for data subject to frequent changes like line items.  We don't use read-only caches at all.

Lazy Fetching and Preloading

Most systems that start to make extensive use of the second level cache will inevitably need a preloader that initializes the cache with frequently referenced objects.  In this case it makes good sense to switch a lot of the objects and the portions of their associated object graphs likely to be frequently needed to eager fetch such that the entire object graph gets preloaded and not just the top level object.

We ended up setting a large number of properties to eager load that we ordinarily wouldn't have because we noticed a little quirk where lazy loaded associations were missing the cache.  For example, if getPricingModel() were set to lazy load and the associated pricing model was in the cache, the getter would go back out to the database anyway.  We think this could be the fault of the Hibernate Transaction Manager provided by Spring, but we aren't sure.  In short, our advice is to eager fetch if the associated object class is also cached.

The Join Fetch Problem

Let's consider a many-to-one association on a class like the one configured here:

  <many-to-one column="customer_id" name="customer"/>

If you want to load the parent object and the customer object, you can either run one query or two.  You can run a single query that joins the parent class table and the customer table or you can run two separate queries: one against the parent class table and a second query against the customer table.  Hibernate lets you choose which way to do this using a fetch strategy.  The snippet below shows the two options in context.

    <many-to-one column="customer_id" name="customer" lazy="false" fetch="join"/>
    <many-to-one column="customer_id" name="customer" lazy="false" fetch="select"/>

In a normal use case, where neither the parent object nor the child object is cached, one query is generally better than two, so join is the natural choice.

But once you move to a configuration where the child object is likely to be cached, a fetch strategy of join can cause problems.  Think about it.  The purpose of the cache is to avoid reading information from the database unless absolutely necessary, especially information that doesn't change that often, like product descriptions or customer contact information.  Assuming the parent object is not cached and we're definitely going to need a query to fetch the parent object, we have no way of knowing what the id of the associated customer object is without first running a query.  So Hibernate runs that query with a join that brings back all the child object's fields along with the parent's.  In short, if you use a fetch strategy of join, you negate the benefit of using the cache because Hibernate will end up hitting the cached object's table anyway.

The solution is counter-intuitive: use a fetch strategy of select.  For the optimizing mind this seems scary because a select strategy means two database queries instead of one.  If neither object were in the second level cache, that would be true.  But if the associated object is highly likely to be in the cache, only one query gets run because the first query brings back the id (because the id is on the parent object's table).  Once that ID is in hand, Hibernate can check the caches for it before resorting to running a query.

If you want to guarantee a second level cache hit for many-to-one associations, make the properties eager fetch with a fetch strategy of select.
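The trade-off can be modeled in a few lines of plain Java.  This is a sketch under assumptions, not Hibernate code: the "order" parent, the sample data and the counters are hypothetical, and the two methods only imitate what a join fetch and a select fetch would read.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model comparing fetch="join" and fetch="select" (hypothetical names and data).
class FetchStrategySketch {
    // "Tables": order id -> customer id, and customer id -> customer name.
    private final Map<String, String> orderTable = new HashMap<>();
    private final Map<String, String> customerTable = new HashMap<>();
    // Second level cache for customers: customer id -> customer name.
    final Map<String, String> customerCache = new HashMap<>();
    int queries = 0;
    int customerTableReads = 0;

    FetchStrategySketch() {
        orderTable.put("order-1", "cust-1");
        customerTable.put("cust-1", "Acme Staging");
    }

    // fetch="join": a single query, but it always reads the customer table.
    String customerNameWithJoin(String orderId) {
        queries++;
        customerTableReads++;
        return customerTable.get(orderTable.get(orderId));
    }

    // fetch="select": read the parent row, then consult the cache before querying again.
    String customerNameWithSelect(String orderId) {
        queries++;
        String customerId = orderTable.get(orderId);
        String cached = customerCache.get(customerId);
        if (cached != null) {
            return cached;
        }
        queries++;
        customerTableReads++;
        String name = customerTable.get(customerId);
        customerCache.put(customerId, name);
        return name;
    }
}
```

With a warm customer cache the select path runs one query and never touches the customer table; the join path reads the customer table on every call no matter what is cached.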

The Query Cache

One of the more common mistakes when configuring Hibernate is to enable the query cache and do nothing else.  This doesn't work.  You have to tell Hibernate in the code (or through saved queries) which queries or criteria objects can be cached.

A lot of developers are gun-shy about caching queries because of concurrency fears.  This is valid when something other than Hibernate writes to a table being queried, but if all the database I/O that can update the database is controlled by Hibernate, it's okay to be very aggressive with query caching.

Hibernate is smart enough to figure out when a cached query should no longer be used.  It does this by tracking when each table was last written to every time an object is inserted, updated or deleted, and treating any cached query result that references a table written to since the result was cached as stale.

If some other process writes to the tables, query caching can be dangerous (so could caching in general).  Otherwise, it's pretty useful, but you must manually tell Hibernate which queries to cache by calling setCacheable(true) on the query or criteria object.
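The two rules in this section, opt-in caching per query and invalidation when a referenced table is written, can be sketched in plain Java.  This is an illustrative model, not Hibernate's query cache: the class QueryCacheSketch, the logical clock and the recordWrite() hook are assumptions, and a real cache key would also include the query parameters.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Toy query cache with per-table invalidation (hypothetical names, not Hibernate classes).
class QueryCacheSketch {
    private static class CachedResult {
        final List<String> ids;
        final Set<String> tables;
        final long cachedAt;
        CachedResult(List<String> ids, Set<String> tables, long cachedAt) {
            this.ids = ids;
            this.tables = tables;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, CachedResult> results = new HashMap<>();
    private final Map<String, Long> lastUpdate = new HashMap<>();
    private long clock = 0;  // logical clock, so the sketch does not depend on wall time
    int executions = 0;

    // Only queries flagged cacheable ever read from or write to the cache.
    List<String> list(String query, Set<String> tables, boolean cacheable,
                      Supplier<List<String>> runQuery) {
        if (cacheable) {
            CachedResult hit = results.get(query);
            if (hit != null && isFresh(hit)) {
                return hit.ids;
            }
        }
        executions++;
        List<String> ids = runQuery.get();
        if (cacheable) {
            results.put(query, new CachedResult(ids, tables, ++clock));
        }
        return ids;
    }

    // Called whenever an object mapped to this table is inserted, updated or deleted.
    void recordWrite(String table) {
        lastUpdate.put(table, ++clock);
    }

    private boolean isFresh(CachedResult result) {
        for (String table : result.tables) {
            Long updatedAt = lastUpdate.get(table);
            if (updatedAt != null && updatedAt > result.cachedAt) {
                return false;
            }
        }
        return true;
    }
}
```

A write that bypasses recordWrite(), which is the position an external process is in, leaves stale results in the cache, and that is the danger described above.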


Use byId() Instead of load() Or Queries

One of the really dumb things we found in the code is that we were using queries to retrieve objects by identifier.  We did this to enable additional HQL to be added to queries for soft deletes, etc.  It's a dumb idea and ensures that in simple situations where someone calls findById() on a service or DAO, an SQL query gets executed, even if the object is cached.  One could hope that Hibernate would be smart enough to know that the only field in the where clause is the object's identifier and check the cache before running a query, but it doesn't work that way.

The solution is to use the byId() method on session instead of running a query.  This will ensure that Hibernate checks the cache first.  There's also a method on session called load() that seems semantically equivalent to byId().  The difference is that load() will never return null.  If no object matching the id exists, it still hands back a proxy for it, and the problem only surfaces later, when something touches the proxy and Hibernate throws an exception.  This is usually not what you want.  Stick to byId().

Use Natural Keys

Under the hood, we're big believers in surrogate keys, meaning primary keys that are random, immutable and have no meaning at all in business terms.  We use 36 character UUIDs as primary keys instead of sequential integers.  If you need to generate sequential numbers, you need locks and when you cluster the database you need expensive locks enforced with network I/O.

Every domain object in Flex and therefore every database table has a UUID as a primary key.  But sometimes you need to lookup an object by an alternate identifier, or a natural key.  In computer science lingo, a natural key would be a unique identifier that has some kind of meaning to the user.  Examples would be social security numbers, bar codes, user id's, or job numbers.

The most common natural key lookups in Flex are user id's for logins and barcodes for inventory.  In the previous version of Flex, barcode lookups were done using an HQL query, albeit a simple one.  This has the same disadvantages as using a query to retrieve an object by its surrogate key: even if the object you're looking for is cached, Hibernate runs a query anyway.  In the case of natural keys, it doesn't know which ID is associated with a given barcode, so it has to go to the database, and by default, when Hibernate goes to the database, it gets the whole object, even if it's cached.

To work around this issue, you can configure one property of an object as a natural key.  Here's the real configuration we use to support barcode lookups:

    <class name="ManagedResource" table="st_biz_managed_resource" lazy="false">
        <cache usage="read-write" include="all"/>
        <id column="id" length="36" name="objectIdentifier" type="string">
            <generator class="alto-uuid"/>
        </id>
        <natural-id mutable="true">
            <property column="bar_code_id" length="64" name="barCodeId" not-null="false"/>
        </natural-id>

This example flags the barCodeId property as a natural key and enables you to perform a natural key lookup and take full advantage of the cache - as shown in this example:

    return session.bySimpleNaturalId(InventoryItem.class).load(barcode);

This bypasses the normal query approach and saves a query - in theory.  In reality, it results in a simpler query and in time no queries.  When you invoke bySimpleNaturalId(), the first thing Hibernate does is try to determine which primary key matches the given natural id.  There is a natural id cache and this is checked first.  If there's no id in the natural key cache, Hibernate will run a very simple query like this:

         select id from inventory_item where bar_code = ?

This just gets the primary key and if the object isn't cached, the system will run a second query to retrieve the object.  Otherwise, the object will be retrieved from cache and the second time the same natural key is looked up, the natural-key to primary-key mapping will be in cache and there won't be any database queries at all.

Hibernate's natural key feature is really just a special technique for improving cache efficiency.  If you're not using the cache or have an object that you're not caching, natural keys can actually slow things down because you'll always get two queries with a natural key lookup: one to get the natural key to primary key mapping and another for the main object lookup.
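The two-step resolution, and the query counts that go with it, can be shown with a small plain-Java model.  The names here (NaturalIdSketch, the in-memory "table", the string standing in for an entity) are assumptions for illustration only and do not reflect Hibernate's internals.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a natural key lookup (hypothetical names and data, not Hibernate classes).
class NaturalIdSketch {
    // "Table": primary key -> bar code.
    private final Map<String, String> rows = new HashMap<>();
    final Map<String, String> naturalIdCache = new HashMap<>();  // bar code -> primary key
    final Map<String, String> entityCache = new HashMap<>();     // primary key -> entity
    int queries = 0;

    NaturalIdSketch() {
        rows.put("uuid-1", "BC-1001");
    }

    String loadByBarcode(String barcode) {
        // Step 1: resolve the natural key to a primary key.
        String id = naturalIdCache.get(barcode);
        if (id == null) {
            queries++;  // stands in for: select id ... where bar_code = ?
            id = findIdByBarcode(barcode);
            if (id == null) {
                return null;
            }
            naturalIdCache.put(barcode, id);
        }
        // Step 2: load the entity by primary key, cache first.
        String entity = entityCache.get(id);
        if (entity == null) {
            queries++;  // stands in for: select ... where id = ?
            entity = "InventoryItem[" + id + "]";
            entityCache.put(id, entity);
        }
        return entity;
    }

    private String findIdByBarcode(String barcode) {
        for (Map.Entry<String, String> row : rows.entrySet()) {
            if (row.getValue().equals(barcode)) {
                return row.getKey();
            }
        }
        return null;
    }
}
```

In this model the first lookup of a barcode costs two queries and every repeat costs none; remove the two caches and every lookup costs two, which is the slowdown the paragraph above warns about.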

Key Generation

Another good way to get a speed boost, especially if you define speed as reducing the number of database hits, is to use an in-process mechanism for generating new primary keys.  This is virtually impossible to do with sequential integers or other database driven key generation mechanisms.

Prior to 4.6, we used MySQL's built in GUID generator to generate primary keys, even though we weren't using sequential integers.  This meant that every insert operation required an extra select query to get a new primary key.

As part of this release we developed our own pure Java UUID generator and configured Hibernate to use it.  We released the UUID generator as part of the open source multi-tenancy project we've launched here: http://code.google.com/p/flex-alto/
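As a rough illustration of in-process key generation, the JDK's own java.util.UUID can produce a 36 character key without a database round trip.  This is not the flex-alto generator, whose implementation isn't shown in this post; the class and method names below are made up for the sketch.

```java
import java.util.UUID;

// Minimal in-process key generator (hypothetical class, uses only the JDK).
class InProcessKeySketch {
    // Returns a random UUID in its canonical 36 character form, e.g. for a char(36) id column.
    static String newKey() {
        return UUID.randomUUID().toString();
    }
}
```

Because the key is computed in the JVM, an insert needs no extra select to obtain its primary key first.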


In Conclusion

This long post hopefully chronicles some of the lessons we've learned in the process of getting Hibernate up to high speed.  Like so many things in performance tuning software, the solutions are rarely clever or exotic.  You either do things before you need to (preloading), wait until the last minute (lazy-fetching) or do them only once (non-preloaded caching).

The next version of Flex will be 4.6.0 and will include these performance improvements.  We're in the process of doing QA rework and should be in regression testing by the end of the week.  We're planning a short beta to test the memory footprint of the bigger cache in production and will do a wider release after that hurdle is cleared.

Wednesday, December 5, 2012

Hibernate 4

As part of our new "flight plan" for the end of the year, we decided to take this opportunity to upgrade Hibernate, the object/relational mapping tool responsible for most of the database interaction in Flex.

Upgrading our technology stack is fairly common at Flex.  We frequently upgrade Spring, tons of smaller libraries like apache-commons and we recently upgraded Jasper Reports.  Upgrading Hibernate, however, is not something to be undertaken lightly.  It's very rare that you can upgrade Hibernate without doing some kind of refactoring or finding strange and serious regressions.

We Fear Change

Part of this is cultural.  The folks at Hibernate and JBoss have a possibly undeserved reputation for not respecting backwards compatibility.  Gavin King, the original developer, had a paternalistic attitude about how Hibernate should be used and expressed this attitude through a notoriously abrasive personality.  Steve Ebersole, the current lead for Hibernate, is a bit more diplomatic and it seems that Hibernate now has a kinder and gentler culture.  Another reason Hibernate may have struggled with backwards compatibility issues is that the problem they're solving - mapping complex object graphs to two dimensional database tables - is incredibly difficult.  With so much abstraction and propeller-head algorithms required to build something like Hibernate, I'm not surprised in the least that abstractions leak and have to be revised as things move forward.

I would never even attempt to develop something like Hibernate and I have a tremendous amount of respect for what the Hibernate people have accomplished.  Not only did they build a great and useful ORM tool that's become the de facto standard - they forced the entire Java establishment including the folks at Sun and now Oracle to redefine Java persistence.  JPA is Hibernate and wouldn't exist without it - make no mistake.  Were it not for Gavin (love him or hate him) and Hibernate, Sun would no doubt still be trying to cram Entity Beans down our throats.

I mention this "upgrade friction" not to bag on Hibernate, but to highlight the fact that you don't upgrade Hibernate when you have a project that's up and running unless you expect to realize some kind of tangible benefit from it.  For us, we liked the new service based approach in Hibernate because it seemed more Spring friendly (but not quite, as we shall see) and we liked the new caching architecture that got introduced over several iterations of Hibernate 3.  But the big thing for us was multi-tenancy support.  The big project on the horizon for us at Flex is converting the architecture to a one-instance/many-customers approach.  I didn't even know the lingo for what we were about to do was "multi-tenancy" until I started reading release notes for Hibernate 4.  That sealed the deal.  However painful it might be to upgrade Hibernate, it had to be less pain than developing our own JDBC layer multi-tenancy system (which we may still have to do, but the likelihood is far less now.)

Can't We All Just Get Along

We, like many people who use Hibernate, also use Spring.  Hibernate folks, in their talks and blogs, tend to downplay a Spring/Hibernate stack, even suggesting in an oblique way that Spring/Hibernate is an uncommon architecture that kind of annoys them.  Of course, anybody with their eyes open in the world of Java Architecture knows that Spring/Hibernate architectures are incredibly common, perhaps the most common Java technology stack these days.  It wouldn't surprise me if the people that attend JBoss conferences don't reinforce this reality, but it's a reality nonetheless.

The Spring/Hibernate feud, to whatever extent it really exists, reminds me of an interview I once saw with Steven Morrissey, known to his fans as just "Morrissey" from The Smiths.  In this interview Morrissey talks about how he saw a fan of his in the airport wearing a Cure T-Shirt.  Morrissey gave the fan a talking to and goes on to talk about how much he hates The Cure, which was a shock to me.  It's very rare to find a CD collection with Morrissey that doesn't also include The Cure.  Another analogy might be the South Park / Family Guy feud.  It's disappointing because fans of one are usually fans of the other and it also reflects a certain cluelessness on the part of the belligerents.  If you think you can convince a typical Morrissey fan to pick a side and abandon The Cure, you don't have a clue, you don't understand your fans.  Likewise, every time Hibernate takes a dig at Spring, they're taking a dig at their own users.  We're going to use both together as long as there is a Spring and a Hibernate - and it's time they embraced the idea.  I'd like to see the Venn diagram of Spring and Hibernate contributors.  If they don't touch, that's a problem.  (And maybe they do.  I haven't checked.)

Square Pegs and Round Holes

With all this intrigue and background well established, let's talk about the actual upgrade process.  A key issue for us was the ability to define a RegionFactory (Hibernate's cache abstraction) in Spring and inject it into the Hibernate Session Factory.  We had no problem doing this in Hibernate 3 using one of Spring's factory beans.

There was no facility for doing this in Spring's SessionFactoryBean for Hibernate 4.  I read that Hibernate 4 permits services to be swapped out using the ServiceRegistry, so I assumed that Spring's Hibernate 4 factory bean was just a little stale.  So, I subclassed it to support service injection and used the ServiceRegistry as documented to inject our RegionFactory.

Problem is it didn't work.  We'd get an error during startup warning us to use Hibernate's preferred method of defining a region factory - which is by classname as a configuration property.  You can't inject dependencies in things you define by class name, which is why we don't want to do it that way.
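The difference between the two wiring styles is easy to see in a stripped-down Java sketch.  Nothing here is a Hibernate or Spring type: CacheFactory, RemoteCacheClient, HybridCacheFactory and FactoryWiringSketch are hypothetical stand-ins, and the host name is invented.

```java
// Hypothetical stand-ins; none of these are Hibernate or Spring types.
interface CacheFactory {
    String describe();
}

class RemoteCacheClient {
    final String host;
    RemoteCacheClient(String host) { this.host = host; }
}

class HybridCacheFactory implements CacheFactory {
    private final RemoteCacheClient client;

    // The only constructor a "create it from its class name" mechanism can call.
    HybridCacheFactory() { this(null); }

    // The constructor a dependency injection container would call.
    HybridCacheFactory(RemoteCacheClient client) { this.client = client; }

    public String describe() {
        return client == null ? "no remote client" : "remote client at " + client.host;
    }
}

class FactoryWiringSketch {
    // Configuration by class name: the framework builds the object itself, with no collaborators.
    static CacheFactory byClassName(String className) throws ReflectiveOperationException {
        return (CacheFactory) Class.forName(className).getDeclaredConstructor().newInstance();
    }

    // Configuration by instance: the caller hands over an object that is already wired.
    static CacheFactory byInstance(CacheFactory alreadyWired) {
        return alreadyWired;
    }
}
```

A factory created by class name has to find its collaborators some other way (static lookups, configuration properties), which is the kind of workaround we were trying to avoid.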

I poked around in the Hibernate source for several hours and found this little gem in a class called CacheImpl, which is the class that Hibernate uses to implement their cache integration.

    public CacheImpl(SessionFactoryImplementor sessionFactory) {
        this.sessionFactory = sessionFactory;
        this.settings = sessionFactory.getSettings();
        //todo should get this from service registry
        this.regionFactory = settings.getRegionFactory();
        regionFactory.start( settings, sessionFactory.getProperties() );
        ....
    }

So, there it is, proof positive that you can't inject a RegionFactory in Hibernate no matter how hard you try.  Although it does suggest that the folks at Hibernate are aware of the issue and will fix it relatively soon.

But we needed a fix now so I spent several more hours trying to find something I could subclass or swap out that would allow us to sneak in our RegionFactory, but through a combination of default scoped interfaces and final classes, it was all for naught. In the end, we had to fork Hibernate and change that one line of code ourselves. As it turned out, it wasn't just one line of code - several different places where RegionFactory is looked up had to be changed. I thought about submitting a patch to Hibernate, but my fix might be simplistic given other ways Hibernate is used, so I figured it was better to wait.

So, for now we have a fork of Hibernate.  We'll switch back to a standard version as soon as they fix the issue.

Sessions

The other major issue we've seen with Hibernate 4 - and in this case I think the issue concerns both Hibernate and some of the code Spring provides to work with Hibernate - is that Sessions aren't always readily available.  We no longer have a "get a session and create one if one doesn't exist" method of getting sessions.  We use the HibernateTransactionManager provided by Spring and the workaround so far has been to make facade or service methods that don't need transactions (things that Spring generates transactional proxies for in our architecture) transactional.  We also use the Spring OpenSessionInView filter which handles this for all browser initiated requests, leaving this session issue relegated to scheduled tasks, JMS queue consumers and startup stuff.  Personally, I think the Spring Hibernate Transaction Manager needs to be tweaked to handle this issue, but that's just my take on it.  I don't know the inner workings of Hibernate or Spring well enough to know if that's really the solution.  It's something we've been able to easily work around, so no biggie.

Our Region Factory

Since we went through so much trouble to inject our own Region Factory, it's reasonable to ask why we'd go through all that trouble.  The answer is that we're moving toward a custom distributed cache that's a hybrid of EhCache and Memcached.  Some caches are small and static enough that simply having a local in memory cache (provided via EhCache) makes perfect sense.  In other cases, we need to move the memory footprint of the cache to another server.  We need a lot of flexibility and most of the current cache implementations impose a kind of all or nothing proposition.  We're not even 100% sure how this will work yet.  More to come on that front....