Thursday, January 31, 2008

Hibernate Caches

Background
Hibernate comes with three different caching mechanisms - first level, second level and query cache. Truly understanding how the Hibernate caches work and interact with each other is important when you need to increase performance. Enabling caching for an entity with an annotation (or in a classic .hbm.xml mapping file) is easy, but understanding what happens behind the scenes is not. You might even end up with a slower system if you do not know what you are doing.

SessionFactory and Session
The purpose of the Hibernate SessionFactory (called EntityManagerFactory in JPA) is to create Sessions and to manage the JDBC connections they use, typically through a pluggable pooling provider like C3P0. A SessionFactory is immutable and built from a Configuration holding mapping information, cache information and a lot of other information, usually provided by means of a hibernate.cfg.xml file or through a Spring bean configuration.
A Session is a unit of work at its lowest level - roughly a transaction in database lingo. When a Session is created and operations are done on Hibernate entities, e.g. setting an attribute of an entity, Hibernate does not go off and update the underlying table immediately. Instead Hibernate keeps track of the state of each entity, whether it is dirty or not, and flushes the pending updates to the database at the end of the unit of work. This is what Hibernate calls the first level cache.

The 1st level cache
Definition: The first level cache is where Hibernate keeps track of the possibly dirty state of the entities loaded and touched in the ongoing Session. The ongoing Session represents a unit of work; the first level cache is always used and cannot be turned off. Its purpose is to prevent too many SQL queries or updates being made to the database, and instead batch them together at the end of the Session. When you think about the 1st level cache, think Session.
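To make the idea concrete, here is a minimal sketch of a session-scoped cache with dirty checking. It is a toy model in plain Java, not Hibernate code: the class ToySession and its methods are hypothetical names invented for this illustration, and entity state is reduced to an array of attribute values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a session-scoped (first level) cache; NOT Hibernate code.
final class ToySession {
    // Current in-memory state of each loaded entity, keyed by id.
    private final Map<Long, Object[]> entities = new LinkedHashMap<>();
    // Snapshot taken at load time, used to detect changes at flush time.
    private final Map<Long, Object[]> snapshots = new LinkedHashMap<>();

    // Registers an entity's state as if it had just been loaded from the database.
    void load(long id, Object... state) {
        entities.put(id, state.clone());
        snapshots.put(id, state.clone());
    }

    // Changes one attribute in memory only; nothing is written to the database here.
    void set(long id, int index, Object value) {
        entities.get(id)[index] = value;
    }

    // Compares every entity with its snapshot and returns the ids that would need an UPDATE.
    List<Long> flush() {
        List<Long> dirty = new ArrayList<>();
        for (Map.Entry<Long, Object[]> e : entities.entrySet()) {
            if (!Arrays.equals(e.getValue(), snapshots.get(e.getKey()))) {
                dirty.add(e.getKey());
                // After the (imaginary) UPDATE the entity is clean again.
                snapshots.put(e.getKey(), e.getValue().clone());
            }
        }
        return dirty;
    }
}
```

Setting the same attribute several times before flush() still yields a single entry in the returned list, which is the batching effect described above.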




The 2nd level cache
The 2nd level cache is a process scoped cache that is associated with one SessionFactory. It survives individual Sessions and can be reused by new Sessions created by the same SessionFactory (of which there usually is one per application). By default the 2nd level cache is not enabled.
The Hibernate cache does not store instances of an entity - instead Hibernate uses something called dehydrated state. A dehydrated state can be thought of as a disassembled entity: an array of strings, integers etc, with the id of the entity pointing to it. Conceptually you can think of it as a Map which contains the id as key and an array as value. Or something like below for a cache region:

{ id -> { attribute1, attribute2, attribute3 } }
{ 1 -> { "a name", 20, null } }
{ 2 -> { "another name", 30, 4 } }

If the entity holds a collection of other entities, the collection is cached as the ids of those entities, and the other entity needs to be cached as well. In this case it could look something like:

{ id -> { attribute1, attribute2, attribute3, Set{item1..n} } }
{ 1 -> { "a name", 20, null, {1,2,5} } }
{ 2 -> { "another name", 30, 4, {4,8} } }
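The Map analogy above can be sketched in a few lines of plain Java. This is only a conceptual model, not Hibernate's actual classes: ToyRegion and Person are hypothetical names made up for the illustration, and the entity has just two attributes.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual model of one cache region holding dehydrated state; not Hibernate's own classes.
final class ToyRegion {
    // Hypothetical entity used only for this illustration.
    static final class Person {
        final long id;
        final String name;
        final int age;

        Person(long id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // id -> array of attribute values, as in the { id -> { ... } } notation above.
    private final Map<Long, Object[]> entries = new HashMap<>();

    // "Dehydrate": store the attribute values, not the object itself.
    void put(Person p) {
        entries.put(p.id, new Object[] { p.name, p.age });
    }

    // "Hydrate": build a fresh instance from the cached values, or return null on a miss.
    Person get(long id) {
        Object[] state = entries.get(id);
        if (state == null) {
            return null;
        }
        return new Person(id, (String) state[0], (Integer) state[1]);
    }
}
```

Because get() always builds a new object, two callers never share the same instance, which is one reason to cache values rather than the entity objects themselves.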

The actual implementation of the 2nd level cache is not done by Hibernate (there is a simple Hashtable cache available, not aimed for production though). Hibernate instead has a plugin concept for caching providers which is used by e.g. EHCache.

Enabling the 2nd level cache and EHCache
To get the 2nd level cache working you need to do 2 things:
1 Cache Strategy. Enable a cache strategy for your Hibernate entity - either in the class with an annotation or in the Hibernate mapping XML file if you are stuck with pre Java 5. This can be done for an entity by adding this little snippet to your hbm.xml file (a better place to store the cache strategy settings is the hibernate.cfg.xml file):

<class name="org.grouter.domain.entities.Router" table="ROUTER">
<cache usage="transactional|read-write|nonstrict-read-write|read-only" />
<id ...
</class>

or using an annotation for your entity (if you are on java5 or greater)

@Entity
@Cache(usage = CacheConcurrencyStrategy.NONSTRICT_READ_WRITE)
public class Router { ... }
And as mentioned above if you want to cache collections of an entity you need to specify caching on collection level:
<class name="org.grouter.domain.entities.Router" table="ROUTER">
<cache usage="transactional|read-write|nonstrict-read-write|read-only"/>
<id ...
<set name="nodes">
<cache usage="transactional|read-write|nonstrict-read-write|read-only"/>
...
</set>
</class>
Hibernate has something called a cache region, which by default is named after the fully qualified name of your Java class. If you, like me, are a fan of convention over configuration, you will use the default region for an entity. A cache region is also needed for the collection; its default name is the fully qualified name of the Java class plus the name of the collection (i.e. org.grouter.domain.entities.Router.nodes).

2 Cache provider. Set up the physical caching for a cache provider. If you are using EHCache - which is the most common choice, I dare say - then you will need to specify some settings for the cache regions of your entities in a file called ehcache.xml. EHCache will look for this file in the classpath and if it is not found it will fall back to ehcache-failsafe.xml, which resides in the ehcache.jar library. A typical sample for an EHCache configuration could look like (see mind map below for explanations):

<cache name="org.grouter.domain.entities.Router" maxElementsInMemory="1000" eternal="false" timeToLiveSeconds="600" overflowToDisk="false"/>
and
<cache name="org.grouter.domain.entities.Router.nodes" maxElementsInMemory="1000" eternal="false" timeToLiveSeconds="600" overflowToDisk="false"/>

The name maps to the name of the cache region of your entity. The attribute maxElementsInMemory should be set high enough that elements do not keep getting evicted from and reloaded into the cache. A good choice for a read only cache would be as many entities as there are rows in the database table the entity represents. The attribute eternal, if set to true, means that any timeouts specified will be ignored and entities put into the cache by Hibernate will live forever.
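How eternal and timeToLiveSeconds interact can be sketched as a small rule. This is a simplified model and not EHCache's own code: ToyExpiry and isExpired are hypothetical names, and the treatment of a timeToLiveSeconds of 0 as "no limit" is an assumption of the sketch.

```java
// Simplified model of the expiry rule for one cached element; not EHCache's own code.
final class ToyExpiry {
    static boolean isExpired(boolean eternal, long timeToLiveSeconds,
                             long createdMillis, long nowMillis) {
        // eternal=true means any timeout is ignored.
        if (eternal) {
            return false;
        }
        // Assumption of this sketch: 0 means "no time-to-live limit".
        if (timeToLiveSeconds == 0) {
            return false;
        }
        // Otherwise the element expires once it is older than its time to live.
        return nowMillis - createdMillis > timeToLiveSeconds * 1000L;
    }
}
```

With the sample configuration above (eternal="false", timeToLiveSeconds="600") an element created at time 0 would still be valid after 599 seconds and expired after 601 seconds.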
Below is a mindmap for the second level cache and how it relates to the SessionFactory and the 1st level cache.

The Query cache
The Query cache of Hibernate is not on by default. It uses two cache regions called org.hibernate.cache.StandardQueryCache and org.hibernate.cache.UpdateTimestampsCache. The first one stores query results, using the query along with its parameters as the key, and the second one keeps track of when the queried tables were last updated so that stale query results can be detected. If an entity that is part of a cached query is updated, the cached result for that query is treated as stale and the query is run against the database again. Of course, to utilize the Query cache the returned entities must themselves be cached using a cache strategy as discussed previously. A simple load( id ) will not use the query cache, but a query like the following will:

Query query = session.createQuery("from Router as r where r.created = :creationDate");
query.setParameter("creationDate", new Date());
query.setCacheable(true);
List l = query.list(); // will return one instance with id 4321

Hibernate will cache the id of the returned entity, using the query and its parameters as the key:
{ query, {parameters} } ---> { id of cached entity }
{ "from Router as r where r.created = :creationDate", [ new Date() ] } ----> [ 4321 ]
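The key/value layout and the staleness check can be modelled in plain Java. The sketch below is a conceptual model only, not Hibernate's implementation: ToyQueryCache and its methods are hypothetical names, time is a simple counter, and each cached query is assumed to touch exactly one table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual model of a query cache plus an update-timestamps cache; not Hibernate code.
final class ToyQueryCache {
    private static final class Entry {
        final String table;
        final List<Long> ids;
        final long cachedAt;

        Entry(String table, List<Long> ids, long cachedAt) {
            this.table = table;
            this.ids = ids;
            this.cachedAt = cachedAt;
        }
    }

    // (query, parameters) -> ids of the entities the query returned.
    private final Map<List<Object>, Entry> results = new HashMap<>();
    // table name -> logical time of its last update.
    private final Map<String, Long> lastUpdate = new HashMap<>();
    // Logical clock standing in for real timestamps.
    private long clock = 0;

    void put(String query, List<?> params, String table, List<Long> ids) {
        results.put(key(query, params), new Entry(table, ids, ++clock));
    }

    // Called whenever a row in the table is inserted, updated or deleted.
    void tableUpdated(String table) {
        lastUpdate.put(table, ++clock);
    }

    // Returns the cached ids, or null if there is no entry or the entry is stale.
    List<Long> get(String query, List<?> params) {
        Entry e = results.get(key(query, params));
        if (e == null) {
            return null;
        }
        Long updated = lastUpdate.get(e.table);
        if (updated != null && updated > e.cachedAt) {
            return null;
        }
        return e.ids;
    }

    private static List<Object> key(String query, List<?> params) {
        return Arrays.<Object>asList(query, new ArrayList<Object>(params));
    }
}
```

One thing the model makes visible: the parameters are part of the key, so a parameter value that differs on every call (a fresh new Date(), for instance) would rarely match an earlier entry.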

Pragmatic approach to the 2nd level cache
How do you know if you are hitting the cache or not? One way is using Hibernate's SessionFactory to get statistics for cache hits. In your SessionFactory configuration you can enable the cache statistics by:

<prop key="hibernate.show_sql">true</prop>
<prop key="hibernate.format_sql">true</prop>
<prop key="hibernate.use_sql_comments">true</prop>
<prop key="hibernate.cache.use_query_cache">true</prop>
<prop key="hibernate.cache.use_second_level_cache">true</prop>
<prop key="hibernate.generate_statistics">true</prop>
<prop key="hibernate.cache.use_structured_entries">true</prop>

Then you might want to write a unit test to verify that you are indeed hitting the cache. Below is some sample code where the unit test extends Spring's excellent AbstractTransactionalDataSourceSpringContextTests:

// Imports are not shown in the original post; the two below are assumed.
import org.apache.commons.lang.time.StopWatch; // assumed: this StopWatch offers getTime() and reset()
import org.hibernate.stat.SecondLevelCacheStatistics;

public class MessageDAOTest extends AbstractDAOTests  // which extends AbstractTransactionalDataSourceSpringContextTests
{
    public void testCache()
    {
        long numberOfMessages = jdbcTemplate.queryForInt("SELECT count(*) FROM message ");
        System.out.println("Number of rows :" + numberOfMessages);
        // By default the cache region is named after the entity class
        final String cacheRegion = Message.class.getCanonicalName();
        SecondLevelCacheStatistics settingsStatistics = sessionFactory.getStatistics().
                getSecondLevelCacheStatistics(cacheRegion);
        StopWatch stopWatch = new StopWatch();
        for (int i = 0; i < 10; i++)
        {
            stopWatch.start();
            messageDAO.findAllMessages();
            stopWatch.stop();
            System.out.println("Query time : " + stopWatch.getTime());
            // The first pass fills the cache (puts only); every later pass should hit it once per row
            assertEquals(0, settingsStatistics.getMissCount());
            assertEquals(numberOfMessages * i, settingsStatistics.getHitCount());
            stopWatch.reset();
            System.out.println(settingsStatistics);
            endTransaction();

            // spring creates a transaction when the test starts - so we first end it, then start a new one in the loop
            startNewTransaction();
        }
    }

}

The output could look something like:

30 Jan 08 23:37:14  INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (1): 
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Number of rows :6
Query time : 562
SecondLevelCacheStatistics[hitCount=0,missCount=0,putCount=6,elementCountInMemory=6,elementCountOnDisk=0,sizeInMemory=8814]
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:290 - Rolled back transaction
after test execution
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (2):
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Query time : 8
SecondLevelCacheStatistics[hitCount=6,missCount=0,putCount=6,elementCountInMemory=6,elementCountOnDisk=0,sizeInMemory=8814]
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:290 - Rolled back transaction
after test execution
30 Jan 08 23:37:15 INFO org.springframework.test.AbstractTransactionalSpringContextTests:323 - Began transaction (3):
transaction manager [org.springframework.orm.hibernate3.HibernateTransactionManager@ced32d]; default rollback = true
Query time : 11
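If you prefer a single number over raw counters, a hit ratio is easy to derive from the values returned by getHitCount() and getMissCount() in the test above. The helper below is a small sketch in plain Java; CacheStats and hitRatio are hypothetical names and not part of Hibernate.

```java
// Small helper for turning hit/miss counters into a ratio; not part of Hibernate.
final class CacheStats {
    // Returns hits / (hits + misses), or 0.0 when there have been no lookups yet.
    static double hitRatio(long hits, long misses) {
        long lookups = hits + misses;
        if (lookups == 0) {
            return 0.0;
        }
        return (double) hits / lookups;
    }
}
```

For the second statistics line in the output above (hitCount=6, missCount=0) this gives 1.0, while the first line (no hits, no misses, only puts) gives 0.0.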

Another way to spy on what Hibernate is doing is to wrap the JDBC driver you use in a proxy driver. One excellent one I use is p6spy, which will show you exactly what is issued over a JDBC connection to the actual backend database. For other tips have a look below in the mindmap.

30 comments:

Anonymous said...

Which MindMapping tool is being used to create these mind maps then?

Anonymous said...

Very very informative article. Thank you.

Anonymous said...

vary very informative article
thanks a lot.

Georges said...

The mindmapping tool used is Freemind - open source, works great - truly recommended

Mr.mojo Risin' said...

Wonderful article
very informative!

Anonymous said...

Very good article...thanks a lotttt

marquedios said...

Do you have the .mm files for this blog and would you be willing to share them?

aectann said...

The best hibernate caching manual ever! Thanks!

Kristof Jozsa said...

"Hibernate SessionFactory (called EntityManager in JEE)" - I believe you meant EntityManagerFactory.. great article anyway, grats!

imyousuf said...

Great article! Thanks a lot. Since you work with Hibernate this project might interest you.

Andries said...

Impressive! ty

Anonymous said...

Would it be possible to get a printer-friendly copy of this article ?

Georges said...

A more printer friendly version can be found on : http://docs.google.com/Doc?id=dfxnknzf_89c6m5r3n3

harpreet1433 said...

Great work, I have been looking for this from so many days.

Anonymous said...

Finally... an accurate and detailed article on Hibernate caching. Thanks!

web designer said...

nice post

Madhu said...

wonderful article..
Can you please tell me if EHCache is used even for First Level Caching?

sirisha said...

very nice explanation.
Thank You.

Anonymous said...

Good Information, Thanks

Majid said...

Excellent work dude !

JavaKungFu said...

Thank you so much for this extremely well put overview of hibernate caching - i will be sure to follow any articles you write !

dpanupam said...

Thats a very interesting post. I have been inspired. Thanks. Web Designer

Amanda said...

Thanks for the manual! I need a nice mindmapping tool like that. Great post!



Amanda said...

Thanks so much for this cache information!


stevedogg said...

as many people have already said, this is a great article. good docs on hibernate internals are hard to come by, but they're necessary if you're going to do any significant deployment using it. i find myself reading the code far too frequently... so thanks a ton!

the one comment i have is that the description of the query cache isn't entirely accurate. the way you described it in terms of caching entity ids and using those to look into the second level cache is how hibernate really *should* do it. however, as of 3.3.1 GA, whenever hibernate issues a cacheable query, the top-level entity in the JDBC result set is blindly loaded and placed in the cache. association mappings seem to be pulled from the second-level cache, but not the initial entity.

to better explain, say i had a class called Foo with an id column called id. if issued in sequence (and assuming proper configuration, enabling of the query cache, etc), both would load the Foo entity with id 1 and the second call would stomp on the value already in the second level cache because of the blind load:

session.get(Foo.class, 1);
session.find("from Foo where id = 1");

what's worse, if you're using distributed ehcache (haven't experimented with other cache providers yet to see how they handle this case), the second statement will broadcast either a cache invalidation or a copy of the entity, depending on how you have it configured. you can imagine how messy that becomes when you start building up your server pool...

if Foo had a collection mapped, the second call would not need to load the collections again because it is properly going through the second level cache.

this seems to be a pretty big oversight on the hibernate developers' part, and i really hope they get it ironed out in future releases...

stevedogg said...

as it turns out, i found a barely documented property called hibernate.cache.use_minimal_puts. setting this value to true will prevent it from blindly overwriting the previously cached values, which solves my clustered cache problems. according to the manual, it is enabled by default for clustered cache implementations, but i guess hibernate 3.3.1 GA still isn't considering ehcache as a clustered provider. i recommend explicitly toggling it if you're using clustering (regardless of cache provider) just to be on the safe side...

thanks again for the great article!
