Improving performance
Fetching strategies
A fetching strategy is the strategy Hibernate will use for
retrieving associated objects if the application needs to navigate the association.
Fetch strategies may be declared in the O/R mapping metadata, or overridden by a
particular HQL or Criteria query.
Hibernate3 defines the following fetching strategies:
Join fetching - Hibernate retrieves the
associated instance or collection in the same SELECT,
using an OUTER JOIN.
Select fetching - a second SELECT
is used to retrieve the associated entity or collection. Unless
you explicitly disable lazy fetching by specifying lazy="false",
this second select will only be executed when you actually access the
association.
Subselect fetching - a second SELECT
is used to retrieve the associated collections for all entities retrieved in a
previous query or fetch. Unless you explicitly disable lazy fetching by specifying
lazy="false", this second select will only be executed when you
actually access the association.
Batch fetching - an optimization strategy
for select fetching - Hibernate retrieves a batch of entity instances
or collections in a single SELECT, by specifying
a list of primary keys or foreign keys.
Hibernate also distinguishes between:
Immediate fetching - an association, collection or
attribute is fetched immediately, when the owner is loaded.
Lazy collection fetching - a collection is fetched
when the application invokes an operation upon that collection. (This
is the default for collections.)
"Extra-lazy" collection fetching - individual
elements of the collection are accessed from the database as needed.
Hibernate tries not to fetch the whole collection into memory unless
absolutely needed. This is suitable for very large collections.
Proxy fetching - a single-valued association is
fetched when a method other than the identifier getter is invoked
upon the associated object.
"No-proxy" fetching - a single-valued association is
fetched when the instance variable is accessed. Compared to proxy fetching,
this approach is less lazy (the association is fetched even when only the
identifier is accessed) but more transparent, since no proxy is visible to
the application. This approach requires buildtime bytecode instrumentation
and is rarely necessary.
Lazy attribute fetching - an attribute or single
valued association is fetched when the instance variable is accessed.
This approach requires buildtime bytecode instrumentation and is rarely
necessary.
We have two orthogonal notions here: when is the association
fetched, and how is it fetched (what SQL is used). Don't
confuse them! We use fetch to tune performance. We may use
lazy to define a contract for what data is always available
in any detached instance of a particular class.
Working with lazy associations
By default, Hibernate3 uses lazy select fetching for collections and lazy proxy
fetching for single-valued associations. These defaults make sense for almost
all associations in almost all applications.
Note: if you set
hibernate.default_batch_fetch_size, Hibernate will use the
batch fetch optimization for lazy fetching (this optimization may also be enabled
at a more granular level).
However, lazy fetching poses one problem that you must be aware of. Access to a
lazy association outside of the context of an open Hibernate session will result
in an exception. For example:
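A sketch of the problematic pattern (the User class, its permissions map, the userName parameter and sessionFactory are illustrative assumptions):

Session s = sessionFactory.openSession();
Transaction tx = s.beginTransaction();

User u = (User) s.createQuery("from User u where u.name = :userName")
        .setString("userName", userName)
        .uniqueResult();
Map permissions = u.getPermissions();

tx.commit();
s.close();

Integer accessLevel = (Integer) permissions.get("accounts");  // Error! The session is already closed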
Since the permissions collection was not initialized when the
Session was closed, the collection will not be able to
load its state. Hibernate does not support lazy initialization
for detached objects. The fix is to move the code that reads
from the collection to just before the transaction is committed.
Alternatively, we could use a non-lazy collection or association,
by specifying lazy="false" for the association mapping.
However, it is intended that lazy initialization be used for almost all
collections and associations. If you define too many non-lazy associations
in your object model, Hibernate will end up needing to fetch the entire
database into memory in every transaction!
On the other hand, we often want to choose join fetching (which is non-lazy by
nature) instead of select fetching in a particular transaction. We'll now see
how to customize the fetching strategy. In Hibernate3, the mechanisms for
choosing a fetch strategy are identical for single-valued associations and
collections.
Tuning fetch strategies
Select fetching (the default) is extremely vulnerable to N+1 selects problems,
so we might want to enable join fetching in the mapping document:
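A mapping sketch (the permissions collection, the userId column, and the Cat/mother association are illustrative):

<set name="permissions" fetch="join">
    <key column="userId"/>
    <one-to-many class="Permission"/>
</set>

<many-to-one name="mother" class="Cat" fetch="join"/>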
The fetch strategy defined in the mapping document affects:
retrieval via get() or load()
retrieval that happens implicitly when an association is navigated
Criteria queries
HQL queries if subselect fetching is used
No matter what fetching strategy you use, the defined non-lazy graph is guaranteed
to be loaded into memory. Note that this might result in several immediate selects
being used to execute a particular HQL query.
Usually, we don't use the mapping document to customize fetching. Instead, we
keep the default behavior, and override it for a particular transaction, using
left join fetch in HQL. This tells Hibernate to fetch
the association eagerly in the first select, using an outer join. In the
Criteria query API, you would use
setFetchMode(FetchMode.JOIN).
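For example, an HQL sketch (Cat and its kittens collection are illustrative):

List cats = session.createQuery(
        "from Cat as cat left join fetch cat.kittens where cat.name like :name")
        .setString("name", "Fritz%")
        .list();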
If you ever feel like you wish you could change the fetching strategy used by
get() or load(), simply use a
Criteria query, for example:
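A sketch, assuming an illustrative User entity with a permissions collection and an open session:

User user = (User) session.createCriteria(User.class)
        .setFetchMode("permissions", FetchMode.JOIN)
        .add( Restrictions.idEq(userId) )
        .uniqueResult();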
(This is Hibernate's equivalent of what some ORM solutions call a "fetch plan".)
A completely different way to avoid problems with N+1 selects is to use the
second-level cache.
Single-ended association proxies
Lazy fetching for collections is implemented using Hibernate's own implementation
of persistent collections. However, a different mechanism is needed for lazy
behavior in single-ended associations. The target entity of the association must
be proxied. Hibernate implements lazy initializing proxies for persistent objects
using runtime bytecode enhancement (via the excellent CGLIB library).
By default, Hibernate3 generates proxies (at startup) for all persistent classes
and uses them to enable lazy fetching of many-to-one and
one-to-one associations.
The mapping file may declare an interface to use as the proxy interface for that
class, with the proxy attribute. By default, Hibernate uses a subclass
of the class. Note that the proxied class must implement a default constructor
with at least package visibility. We recommend this constructor for all persistent classes!
There are some gotchas to be aware of when extending this approach to polymorphic
classes, eg.
<class name="Cat" proxy="Cat">
    ......
    <subclass name="DomesticCat">
        .....
    </subclass>
</class>
Firstly, instances of Cat will never be castable to
DomesticCat, even if the underlying instance is an
instance of DomesticCat:
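For example (id identifies a row that is in fact a DomesticCat; isDomesticCat() is an illustrative business method):

Cat cat = (Cat) session.load(Cat.class, id);   // instantiate a proxy (does not hit the database)
if ( cat.isDomesticCat() ) {                   // hit the database to initialize the proxy
    DomesticCat dc = (DomesticCat) cat;        // Error: ClassCastException!
}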
Secondly, it is possible to break proxy ==.
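For example, continuing with the illustrative Cat/DomesticCat mapping:

Cat cat = (Cat) session.load(Cat.class, id);                    // instantiate a Cat proxy
DomesticCat dc =
        (DomesticCat) session.load(DomesticCat.class, id);      // acquire a new DomesticCat proxy!
System.out.println( cat == dc );                                // false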
However, the situation is not quite as bad as it looks. Even though we now have two references
to different proxy objects, the underlying instance will still be the same object:
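Continuing the previous sketch:

cat.setWeight(11.0);                   // hit the database to initialize the proxy
System.out.println( dc.getWeight() );  // 11.0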
Third, you may not use a CGLIB proxy for a final class or a class
with any final methods.
Finally, if your persistent object acquires any resources upon instantiation (eg. in
initializers or default constructor), then those resources will also be acquired by
the proxy. The proxy class is an actual subclass of the persistent class.
These problems are all due to fundamental limitations in Java's single inheritance model.
If you wish to avoid these problems your persistent classes must each implement an interface
that declares its business methods. You should specify these interfaces in the mapping file. eg.
<class name="CatImpl" proxy="Cat">
    ......
    <subclass name="DomesticCatImpl" proxy="DomesticCat">
        .....
    </subclass>
</class>
where CatImpl implements the interface Cat and
DomesticCatImpl implements the interface DomesticCat. Then
proxies for instances of Cat and DomesticCat may be returned
by load() or iterate(). (Note that list()
does not usually return proxies.)
Relationships are also lazily initialized. This means you must declare any properties to be of
type Cat, not CatImpl.
Certain operations do not require proxy initialization:
equals(), if the persistent class does not override
equals()
hashCode(), if the persistent class does not override
hashCode()
The identifier getter method
Hibernate will detect persistent classes that override equals() or
hashCode().
By choosing lazy="no-proxy" instead of the default
lazy="proxy", we can avoid the problems associated with typecasting.
However, we will require buildtime bytecode instrumentation, and all operations
will result in immediate proxy initialization.
Initializing collections and proxies
A LazyInitializationException will be thrown by Hibernate if an uninitialized
collection or proxy is accessed outside of the scope of the Session, ie. when
the entity owning the collection or having the reference to the proxy is in the detached state.
Sometimes we need to ensure that a proxy or collection is initialized before closing the
Session. Of course, we can always force initialization by calling
cat.getSex() or cat.getKittens().size(), for example.
But that is confusing to readers of the code and is not convenient for generic code.
The static methods Hibernate.initialize() and Hibernate.isInitialized()
provide the application with a convenient way of working with lazily initialized collections or
proxies. Hibernate.initialize(cat) will force the initialization of a proxy,
cat, as long as its Session is still open.
Hibernate.initialize( cat.getKittens() ) has a similar effect for the collection
of kittens.
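A sketch, assuming cat was loaded from the still-open session:

Hibernate.initialize(cat);                   // initialize the proxy itself
Hibernate.initialize( cat.getKittens() );    // initialize the kittens collection
session.close();
// cat and its kittens can now be used safely in the detached state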
Another option is to keep the Session open until all needed
collections and proxies have been loaded. In some application architectures,
particularly where the code that accesses data using Hibernate and the code that
uses it are in different application layers or different physical processes, it
can be a problem to ensure that the Session is open when a
collection is initialized. There are two basic ways to deal with this issue:
In a web-based application, a servlet filter can be used to close the
Session only at the very end of a user request, once
the rendering of the view is complete (the Open Session in
View pattern). Of course, this places heavy demands on the
correctness of the exception handling of your application infrastructure.
It is vitally important that the Session is closed and the
transaction ended before returning to the user, even when an exception occurs
during rendering of the view. The servlet filter has to be able to access the
Session for this approach. We recommend that a
ThreadLocal variable be used to hold the current
Session (see Chapter 1 for an example implementation).
In an application with a separate business tier, the business logic must
"prepare" all collections that will be needed by the web tier before
returning. This means that the business tier should load all the data and
return all the data already initialized to the presentation/web tier that
is required for a particular use case. Usually, the application calls
Hibernate.initialize() for each collection that will
be needed in the web tier (this call must occur before the session is closed)
or retrieves the collection eagerly using a Hibernate query with a
FETCH clause or a FetchMode.JOIN in
Criteria. This is usually easier if you adopt the
Command pattern instead of a Session Facade.
You may also attach a previously loaded object to a new Session
with merge() or lock() before
accessing uninitialized collections (or other proxies). Hibernate does not, and
certainly should not, do this automatically, since that would introduce ad hoc
transaction semantics.
Sometimes you don't want to initialize a large collection, but still need some
information about it (like its size) or a subset of the data.
You can use a collection filter to get the size of a collection without initializing it:
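For example (collection is an uninitialized persistent collection belonging to the open session s; the cast to Number hedges against the count being returned as Integer or Long, depending on the Hibernate version):

int size = ( (Number) s.createFilter( collection, "select count(*)" )
        .list().get(0) ).intValue();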
The createFilter() method is also used to efficiently retrieve subsets
of a collection without needing to initialize the whole collection:
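For example, to fetch just the first ten elements of an uninitialized collection:

List firstTen = s.createFilter( lazyCollection, "" )
        .setFirstResult(0)
        .setMaxResults(10)
        .list();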
Using batch fetching
Hibernate can make efficient use of batch fetching; that is, Hibernate can load several uninitialized
proxies (or collections) if one proxy is accessed. Batch fetching is an optimization of the lazy select
fetching strategy. There are two ways you can tune batch fetching: at the class level and at the collection level.
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation
at runtime: You have 25 Cat instances loaded in a Session, each
Cat has a reference to its owner, a Person.
The Person class is mapped with a proxy, lazy="true". If you now
iterate through all cats and call getOwner() on each, Hibernate will by default
execute 25 SELECT statements, to retrieve the proxied owners. You can tune this
behavior by specifying a batch-size in the mapping of Person:
<class name="Person" batch-size="10">...</class>
Hibernate will now execute only three queries, in the pattern 10, 10, 5.
You may also enable batch fetching of collections. For example, if each Person has
a lazy collection of Cats, and 10 persons are currently loaded in the
Session, iterating through all persons will generate 10 SELECTs,
one for every call to getCats(). If you enable batch fetching for the
cats collection in the mapping of Person, Hibernate can pre-fetch
collections:
<class name="Person">
    <set name="cats" batch-size="3">...</set>
</class>
With a batch-size of 3, Hibernate will load 3, 3, 3, 1 collections in four
SELECTs. Again, the value of the attribute depends on the expected number of
uninitialized collections in a particular Session.
Batch fetching of collections is particularly useful if you have a nested tree of items, ie.
the typical bill-of-materials pattern. (Although a nested set or a
materialized path might be a better option for read-mostly trees.)
Using subselect fetching
With subselect fetching, if one lazy collection or single-valued proxy has to be fetched,
Hibernate loads all of them, re-running the original query in a subselect. This works in the
same way as batch fetching, but without the piecemeal loading. Subselect fetching is enabled
for a collection by setting fetch="subselect" in its mapping.
Using lazy property fetching
Hibernate3 supports the lazy fetching of individual properties. This optimization technique
is also known as fetch groups. Please note that this is mostly a
marketing feature, as in practice, optimizing row reads is much more important than
optimization of column reads. However, only loading some properties of a class might
be useful in extreme cases, when legacy tables have hundreds of columns and the data model
cannot be improved.
To enable lazy property loading, set the lazy attribute on your
particular property mappings:
<property name="summary" not-null="true" length="200" lazy="true"/>
<property name="text" not-null="true" length="2000" lazy="true"/>
Lazy property loading requires buildtime bytecode instrumentation! If your persistent
classes are not enhanced, Hibernate will silently ignore lazy property settings and
fall back to immediate fetching.
For bytecode instrumentation, use the following Ant task:
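A sketch of such a target (the task class is org.hibernate.tool.instrument.InstrumentTask in early Hibernate3 releases; later versions ship bytecode-provider-specific variants, and the classpath references and fileset here are illustrative):

<target name="instrument" depends="compile">
    <taskdef name="instrument"
             classname="org.hibernate.tool.instrument.InstrumentTask">
        <classpath refid="lib.class.path"/>
        <classpath path="${classes.dir}"/>
    </taskdef>

    <instrument verbose="true">
        <fileset dir="${classes.dir}">
            <include name="**/model/*.class"/>
        </fileset>
    </instrument>
</target>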
A different (better?) way to avoid unnecessary column reads, at least for
read-only transactions, is to use the projection features of HQL or Criteria
queries. This avoids the need for buildtime bytecode processing and is
certainly a preferred solution.
You may force the usual eager fetching of properties using fetch all
properties in HQL.
The Second Level Cache
A Hibernate Session is a transaction-level cache of persistent data. It is
possible to configure a cluster or JVM-level (SessionFactory-level) cache on
a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be
careful. Caches are never aware of changes made to the persistent store by another application
(though they may be configured to regularly expire cached data).
By default, Hibernate uses EHCache for JVM-level caching. (JCS support is now deprecated and will
be removed in a future version of Hibernate.) You may choose a different implementation by
specifying the name of a class that implements org.hibernate.cache.CacheProvider
using the property hibernate.cache.provider_class.
Cache Providers
Cache                                       | Provider class                              | Type                                     | Cluster Safe                 | Query Cache Supported
Hashtable (not intended for production use) | org.hibernate.cache.HashtableCacheProvider  | memory                                   |                              | yes
EHCache                                     | org.hibernate.cache.EhCacheProvider         | memory, disk                             |                              | yes
OSCache                                     | org.hibernate.cache.OSCacheProvider         | memory, disk                             |                              | yes
SwarmCache                                  | org.hibernate.cache.SwarmCacheProvider      | clustered (ip multicast)                 | yes (clustered invalidation) |
JBoss TreeCache                             | org.hibernate.cache.TreeCacheProvider       | clustered (ip multicast), transactional  | yes (replication)            | yes (clock sync req.)
Cache mappings
The <cache> element of a class or collection mapping has the
following form:
<cache usage="transactional|read-write|nonstrict-read-write|read-only"/>
usage specifies the caching strategy:
transactional,
read-write,
nonstrict-read-write or
read-only
Alternatively (preferably?), you may specify <class-cache> and
<collection-cache> elements in hibernate.cfg.xml.
The usage attribute specifies a cache concurrency strategy.
Strategy: read only
If your application needs to read but never modify instances of a persistent class, a
read-only cache may be used. This is the simplest and best performing
strategy. It's even perfectly safe for use in a cluster.
<class name="eg.Immutable" mutable="false">
    <cache usage="read-only"/>
    ....
</class>
Strategy: read/write
If the application needs to update data, a read-write cache might be appropriate.
This cache strategy should never be used if serializable transaction isolation level is required.
If the cache is used in a JTA environment, you must specify the property
hibernate.transaction.manager_lookup_class, naming a strategy for obtaining the
JTA TransactionManager. In other environments, you should ensure that the transaction
is completed when Session.close() or Session.disconnect() is called.
If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation
supports locking. The built-in cache providers do not.
<class name="eg.Cat" ....>
    <cache usage="read-write"/>
    <set name="kittens" ....>
        <cache usage="read-write"/>
    </set>
</class>
Strategy: nonstrict read/write
If the application only occasionally needs to update data (ie. if it is extremely unlikely that two
transactions would try to update the same item simultaneously) and strict transaction isolation is
not required, a nonstrict-read-write cache might be appropriate. If the cache is
used in a JTA environment, you must specify hibernate.transaction.manager_lookup_class.
In other environments, you should ensure that the transaction is completed when
Session.close() or Session.disconnect() is called.
Strategy: transactional
The transactional cache strategy provides support for fully transactional cache
providers such as JBoss TreeCache. Such a cache may only be used in a JTA environment and you must
specify hibernate.transaction.manager_lookup_class.
None of the cache providers support all of the cache concurrency strategies. The following table shows
which providers are compatible with which concurrency strategies.
Cache Concurrency Strategy Support
Cache                                       | read-only | nonstrict-read-write | read-write | transactional
Hashtable (not intended for production use) | yes       | yes                  | yes        |
EHCache                                     | yes       | yes                  | yes        |
OSCache                                     | yes       | yes                  | yes        |
SwarmCache                                  | yes       | yes                  |            |
JBoss TreeCache                             | yes       |                      |            | yes
Managing the caches
Whenever you pass an object to save(), update()
or saveOrUpdate() and whenever you retrieve an object using
load(), get(), list(),
iterate() or scroll(), that object is added
to the internal cache of the Session.
When flush() is subsequently called, the state of that object will
be synchronized with the database. If you do not want this synchronization to occur or
if you are processing a huge number of objects and need to manage memory efficiently,
the evict() method may be used to remove the object and its collections
from the first-level cache.
The Session also provides a contains() method to determine
if an instance belongs to the session cache.
To completely evict all objects from the session cache, call Session.clear().
For the second-level cache, there are methods defined on SessionFactory for
evicting the cached state of an instance, entire class, collection instance or entire collection
role.
The CacheMode controls how a particular session interacts with the second-level
cache.
CacheMode.NORMAL - read items from and write items to the second-level cache
CacheMode.GET - read items from the second-level cache, but don't write to
the second-level cache except when updating data
CacheMode.PUT - write items to the second-level cache, but don't read from
the second-level cache
CacheMode.REFRESH - write items to the second-level cache, but don't read from
the second-level cache, bypass the effect of hibernate.cache.use_minimal_puts, forcing
a refresh of the second-level cache for all items read from the database
To browse the contents of a second-level or query cache region, use the Statistics
API:
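For example (regionName is the name of a configured cache region):

Map cacheEntries = sessionFactory.getStatistics()
        .getSecondLevelCacheStatistics(regionName)
        .getEntries();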
You'll need to enable statistics, and, optionally, force Hibernate to keep the cache entries in a
more human-understandable format:
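For example, in hibernate.properties (or the equivalent properties in hibernate.cfg.xml):

hibernate.generate_statistics true
hibernate.cache.use_structured_entries true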
The Query Cache
Query result sets may also be cached. This is only useful for queries that are run
frequently with the same parameters. To use the query cache you must first enable it:
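For example, in hibernate.properties:

hibernate.cache.use_query_cache true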
This setting causes the creation of two new cache regions - one holding cached query
result sets (org.hibernate.cache.StandardQueryCache), the other
holding timestamps of the most recent updates to queryable tables
(org.hibernate.cache.UpdateTimestampsCache). Note that the query
cache does not cache the state of the actual entities in the result set; it caches
only identifier values and results of value type. So the query cache should always be
used in conjunction with the second-level cache.
Most queries do not benefit from caching, so by default queries are not cached. To
enable caching, call Query.setCacheable(true). This call allows
the query to look for existing cache results or add its results to the cache when
it is executed.
If you require fine-grained control over query cache expiration policies, you may
specify a named cache region for a particular query by calling
Query.setCacheRegion().
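A sketch combining both calls (the Blog entity, the blogger parameter, and the "frontpages" region name are illustrative):

List blogs = session.createQuery("from Blog blog where blog.blogger = :blogger")
        .setEntity("blogger", blogger)
        .setMaxResults(15)
        .setCacheable(true)
        .setCacheRegion("frontpages")
        .list();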
If the query should force a refresh of its query cache region, you should call
Query.setCacheMode(CacheMode.REFRESH). This is particularly useful
in cases where underlying data may have been updated via a separate process (i.e.,
not modified through Hibernate) and allows the application to selectively refresh
particular query result sets. This is a more efficient alternative to eviction of
a query cache region via SessionFactory.evictQueries().
Understanding Collection performance
We've already spent quite some time talking about collections.
In this section we will highlight a couple more issues about
how collections behave at runtime.
Taxonomy
Hibernate defines three basic kinds of collections:
collections of values
one to many associations
many to many associations
This classification distinguishes the various table and foreign key
relationships but does not tell us quite everything we need to know
about the relational model. To fully understand the relational structure
and performance characteristics, we must also consider the structure of
the primary key that is used by Hibernate to update or delete collection
rows. This suggests the following classification:
indexed collections
sets
bags
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <key> and <index>
columns. In this case collection updates are usually extremely efficient -
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
Sets have a primary key consisting of <key> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want SchemaExport to actually create
the primary key of a <set> for you, you must declare all columns
as not-null="true".)
<idbag> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single DELETE) and recreating the collection whenever it
changes. This might be very inefficient.
Note that for a one-to-many association, the "primary key" may not be the physical
primary key of the database table - but even in this case, the above classification
is still useful. (It still reflects how Hibernate "locates" individual rows of the
collection.)
Lists, maps, idbags and sets are the most efficient collections to update
From the discussion above, it should be clear that indexed collections
and (usually) sets allow the most efficient operation in terms of adding,
removing and updating elements.
There is, arguably, one more advantage that indexed collections have over sets for
many to many associations or collections of values. Because of the structure of a
Set, Hibernate doesn't ever UPDATE a row when
an element is "changed". Changes to a Set always work via
INSERT and DELETE (of individual rows). Once
again, this consideration does not apply to one to many associations.
After observing that arrays cannot be lazy, we would conclude that lists, maps and
idbags are the most performant (non-inverse) collection types, with sets not far
behind. Sets are expected to be the most common kind of collection in Hibernate
applications. This is because the "set" semantics are most natural in the relational
model.
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with inverse="true". For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
Bags and lists are the most efficient inverse collections
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with inverse="true"
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the bag elements! This is because
Collection.add() or Collection.addAll() must always
return true for a bag or List (unlike a Set). This can
make the following common code much faster.
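For example (Parent and Child are illustrative classes with a bidirectional one-to-many whose children collection is mapped with inverse="true"; sess is an open Session):

Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c);   // no need to fetch the children collection!
sess.flush();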
One shot delete
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
(if you called list.clear(), for example). In this case, Hibernate will
issue a single DELETE and we are done!
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one INSERT statement and two DELETE
statements (unless the collection is a bag). This is certainly desirable.
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
delete eighteen rows one by one and then insert three rows
remove the whole collection (in one SQL DELETE) and insert
all five current elements (one by one)
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
confuse database triggers, etc.)
Fortunately, you can force this behaviour (ie. the second strategy) at any time by discarding
(ie. dereferencing) the original collection and returning a newly instantiated collection with
all the current elements. This can be very useful and powerful from time to time.
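A sketch, assuming sess is an open Session and parent is a persistent entity whose children collection is mapped without inverse="true" (all names here are illustrative):

// keep the two elements we want and add three new ones
List newChildren = new ArrayList();
newChildren.add(childOne);
newChildren.add(childTwo);
newChildren.add(newChildA);
newChildren.add(newChildB);
newChildren.add(newChildC);

parent.setChildren(newChildren);   // dereference the original collection
sess.flush();                      // one DELETE removes the old rows, then the five current elements are inserted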
Of course, one-shot-delete does not apply to collections mapped inverse="true".
Monitoring performance
Optimization is not much use without monitoring and access to performance numbers.
Hibernate provides a full range of figures about its internal operations.
Statistics in Hibernate are available per SessionFactory.
Monitoring a SessionFactory
You can access SessionFactory metrics in two ways.
Your first option is to call sessionFactory.getStatistics() and
read or display the Statistics yourself.
Hibernate can also use JMX to publish metrics if you enable the
StatisticsService MBean. You may enable a single MBean for all your
SessionFactory instances, or one per factory. See the following code for
minimalistic configuration examples:
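A sketch of the two registrations (server is an MBeanServer you have already obtained, sessionFactory is an existing SessionFactory, and the object names and JNDI binding are illustrative; the MBean class is org.hibernate.jmx.StatisticsService):

// 1. MBean registration bound directly to a specific SessionFactory
Hashtable tb = new Hashtable();
tb.put("type", "statistics");
tb.put("sessionFactory", "myFinancialApp");
ObjectName on = new ObjectName("hibernate", tb);          // MBean object name

StatisticsService stats = new StatisticsService();        // MBean implementation
stats.setSessionFactory(sessionFactory);                  // bind the MBean to the SessionFactory
server.registerMBean(stats, on);                          // register the MBean on the server

// 2. MBean registration for a SessionFactory looked up through JNDI
Hashtable tb2 = new Hashtable();
tb2.put("type", "statistics");
tb2.put("sessionFactory", "all");
ObjectName on2 = new ObjectName("hibernate", tb2);

StatisticsService hibernateStatsBean = new StatisticsService();
hibernateStatsBean.setSessionFactoryJNDIName("my/JNDI/Name");
server.registerMBean(hibernateStatsBean, on2);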
In the first case, we retrieve and use the MBean directly. In the second one, we must give
the JNDI name under which the session factory is bound before using it:
hibernateStatsBean.setSessionFactoryJNDIName("my/JNDI/Name").
You can (de)activate the monitoring for a SessionFactory:
at configuration time, set hibernate.generate_statistics to true or false
at runtime: sf.getStatistics().setStatisticsEnabled(true)
or hibernateStatsBean.setStatisticsEnabled(true)
Statistics can be reset programmatically using the clear() method.
A summary can be sent to a logger (info level) using the logSummary()
method.
Metrics
Hibernate provides a number of metrics, from very basic to the specialized information
only relevant in certain scenarios. All available counters are described in the
Statistics interface API, in three categories:
Metrics related to the general Session usage, such as
number of open sessions, retrieved JDBC connections, etc.
Metrics related to the entities, collections, queries, and caches as a
whole (aka global metrics),
Detailed metrics related to a particular entity, collection, query or
cache region.
For example, you can check the cache hit, miss, and put ratio of entities, collections,
and queries, and the average time a query needs. Beware that the number of milliseconds
is subject to approximation in Java. Hibernate is tied to the JVM's precision; on some
platforms this might even be accurate only to 10 seconds.
Simple getters are used to access the global metrics (i.e. not tied to a particular entity,
collection, cache region, etc.). You can access the metrics of a particular entity, collection
or cache region through its name, and through its HQL or SQL representation for queries. Please
refer to the Statistics, EntityStatistics,
CollectionStatistics, SecondLevelCacheStatistics,
and QueryStatistics API Javadoc for more information. The following
code shows a simple example:
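A sketch (sessionFactory and a log are assumed to be available; Cat is an illustrative entity; Statistics and EntityStatistics come from org.hibernate.stat):

Statistics stats = sessionFactory.getStatistics();

double queryCacheHitCount  = stats.getQueryCacheHitCount();
double queryCacheMissCount = stats.getQueryCacheMissCount();
double queryCacheHitRatio =
        queryCacheHitCount / (queryCacheHitCount + queryCacheMissCount);
log.info("Query cache hit ratio: " + queryCacheHitRatio);

EntityStatistics entityStats = stats.getEntityStatistics( Cat.class.getName() );
long changes = entityStats.getInserts()
        + entityStats.getUpdates()
        + entityStats.getDeletes();
log.info(Cat.class.getName() + " changed " + changes + " times");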
To work on all entities, collections, queries and region caches, you can retrieve
the list of names of entities, collections, queries and region caches with the
following methods: getQueries(), getEntityNames(),
getCollectionRoleNames(), and
getSecondLevelCacheRegionNames().