<chapter id="performance">
<title>Improving performance</title>
<sect1 id="performance-collections">
<title>Understanding Collection performance</title>
<para>
We've already spent quite some time talking about collections.
In this section we will highlight a couple more issues about
how collections behave at runtime.
</para>
<sect2 id="performance-collections-taxonomy">
<title>Taxonomy</title>
<para>Hibernate defines three basic kinds of collections:</para>
<itemizedlist>
<listitem>
<para>collections of values</para>
</listitem>
<listitem>
<para>one to many associations</para>
</listitem>
<listitem>
<para>many to many associations</para>
</listitem>
</itemizedlist>
<para>
This classification distinguishes the various table and foreign key
relationships but does not tell us quite everything we need to know
about the relational model. To fully understand the relational structure
and performance characteristics, we must also consider the structure of
the primary key that is used by Hibernate to update or delete collection
rows. This suggests the following classification:
</para>
<itemizedlist>
<listitem>
<para>indexed collections</para>
</listitem>
<listitem>
<para>sets</para>
</listitem>
<listitem>
<para>bags</para>
</listitem>
</itemizedlist>
<para>
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
columns. In this case collection updates are usually extremely efficient -
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
</para>
<para>
Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
as <literal>not-null="true"</literal>.)
</para>
<para>
<literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
</para>
<para>
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single <literal>DELETE</literal>) and recreating the collection whenever it
changes. This might be very inefficient.
</para>
<para>
Note that for a one-to-many association, the "primary key" may not be the physical
primary key of the database table - but even in this case, the above classification
is still useful. (It still reflects how Hibernate "locates" individual rows of the
collection.)
</para>
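<para>
As a rough illustration of the four cases, the mappings below show an indexed collection,
a set, an idbag and a bag of strings. The property, table and column names
(<literal>lines</literal>, <literal>ORDER_LINE</literal> and so on) are hypothetical;
they only serve to show which columns make up the key Hibernate uses to locate a row.
</para>
<programlisting><![CDATA[<!-- indexed collection: rows are located by (ORDER_ID, LINE_NO) -->
<list name="lines" table="ORDER_LINE">
    <key column="ORDER_ID"/>
    <index column="LINE_NO"/>
    <element column="SKU" type="string"/>
</list>

<!-- set: rows are located by (ORDER_ID, CODE); not-null lets SchemaExport create the key -->
<set name="codes" table="ORDER_CODE">
    <key column="ORDER_ID" not-null="true"/>
    <element column="CODE" type="string" not-null="true"/>
</set>

<!-- idbag: rows are located by the surrogate key TAG_ROW_ID -->
<idbag name="tags" table="ORDER_TAG">
    <collection-id column="TAG_ROW_ID" type="long">
        <generator class="sequence"/>
    </collection-id>
    <key column="ORDER_ID"/>
    <element column="TAG" type="string"/>
</idbag>

<!-- bag: no key is available to locate an individual row -->
<bag name="notes" table="ORDER_NOTE">
    <key column="ORDER_ID"/>
    <element column="NOTE" type="string"/>
</bag>]]></programlisting>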
</sect2>
<sect2 id="performance-collections-mostefficientupdate">
<title>Lists, maps, idbags and sets are the most efficient collections to update</title>
<para>
From the discussion above, it should be clear that indexed collections
and (usually) sets allow the most efficient operation in terms of adding,
removing and updating elements.
</para>
<para>
There is, arguably, one more advantage that indexed collections have over sets for
many to many associations or collections of values. Because of the structure of a
<literal>Set</literal>, Hibernate doesn't ever <literal>UPDATE</literal> a row when
an element is "changed". Changes to a <literal>Set</literal> always work via
<literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
again, this consideration does not apply to one to many associations.
</para>
<para>
After observing that arrays cannot be lazy, we would conclude that lists, maps and
idbags are the most performant (non-inverse) collection types, with sets not far
behind. Sets are expected to be the most common kind of collection in Hibernate
applications. This is because the "set" semantics are most natural in the relational
model.
</para>
<para>
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
</para>
</sect2>
<sect2 id="performance-collections-mostefficentinverse">
<title>Bags and lists are the most efficient inverse collections</title>
<para>
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with <literal>inverse="true"</literal>
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the bag elements! This is because
<literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
make the following common code much faster.
</para>
<programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c); //no need to fetch the collection!
sess.flush();]]></programlisting>
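<para>
A mapping that matches the code above might look like the following sketch. The column
names and the identifier mappings are assumptions made for illustration; the important
parts are <literal>inverse="true"</literal> on the bag and the
<literal>many-to-one</literal> on the child, which is the end that writes the foreign key.
</para>
<programlisting><![CDATA[<class name="Parent">
    <id name="id" column="PARENT_ID">
        <generator class="native"/>
    </id>
    <!-- inverse bag: add() never needs to load the existing children -->
    <bag name="children" inverse="true">
        <key column="PARENT_ID"/>
        <one-to-many class="Child"/>
    </bag>
</class>

<class name="Child">
    <id name="id" column="CHILD_ID">
        <generator class="native"/>
    </id>
    <!-- this end of the association is responsible for the PARENT_ID column -->
    <many-to-one name="parent" column="PARENT_ID" not-null="true"/>
</class>]]></programlisting>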
</sect2>
<sect2 id="performance-collections-oneshotdelete">
<title>One shot delete</title>
<para>
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
(if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
issue a single <literal>DELETE</literal> and we are done!
</para>
<para>
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
statements (unless the collection is a bag). This is certainly desirable.
</para>
<para>
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
</para>
<itemizedlist>
<listitem>
<para>delete eighteen rows one by one and then insert three rows</para>
</listitem>
<listitem>
<para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
all five current elements (one by one)</para>
</listitem>
</itemizedlist>
<para>
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
confuse database triggers, etc.)
</para>
<para>
Fortunately, you can force this behaviour (ie. the second strategy) at any time by discarding
(ie. dereferencing) the original collection and returning a newly instantiated collection with
all the current elements. This can be very useful and powerful from time to time.
</para>
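<para>
A minimal sketch of this technique follows. The <literal>Order</literal> class, its
<literal>lines</literal> property and the <literal>obsolete</literal> and
<literal>added</literal> lists are hypothetical names; the fragment assumes an open
<literal>Session</literal> named <literal>sess</literal> and a non-inverse collection.
</para>
<programlisting><![CDATA[Order order = (Order) sess.load(Order.class, id);

// copy the current elements into a brand new collection instance
List current = new ArrayList( order.getLines() );
current.removeAll(obsolete); // e.g. the eighteen elements to drop
current.addAll(added);       // e.g. the three new elements

// dereference the original collection by replacing it
order.setLines(current);

// the old collection is removed as a whole and the new one is
// written element by element
sess.flush();]]></programlisting>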
<para>
Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
</para>
</sect2>
</sect1>
<sect1 id="performance-fetching">
<title>Fetching strategies</title>
<para>
A fetching strategy describes the number of instances, the depth of a
subgraph of instances, and SQL <literal>SELECT</literal>s that are used
to retrieve these instances. Hibernate supports several strategies and you
can configure them on a global level, per entity class, per association, or
even for a particular query in HQL and with <literal>Criteria</literal>.
</para>
<para>
Hibernate offers the following fetching strategies:
</para>
<itemizedlist>
<listitem>
<para>
<emphasis>Lazy fetching</emphasis> - an associated instance (or a
collection) will only be loaded when needed, using an additional
deferred <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Batch fetching</emphasis> - an optimization strategy
for lazy fetching: Hibernate retrieves not just a single instance
(or collection), but several in the same <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Eager fetching</emphasis> - Hibernate retrieves the
associated instance (or collection) in the same <literal>SELECT</literal>,
using an <literal>OUTER JOIN</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
is used to retrieve the associated instance (or collection), but
it might be executed immediately and not deferred until first access
(as with lazy fetching).
</para>
</listitem>
</itemizedlist>
<para>
By default, Hibernate3 will only load the given entity using a single
<literal>SELECT</literal> statement if you retrieve an object with
<literal>load()</literal> or <literal>get()</literal>. This means that
all single-ended associations and collections are set for lazy fetching
by default. You can change this global default by setting the
<literal>default-lazy</literal> attribute on the <literal>hibernate-mapping</literal>
element to <literal>false</literal>.
</para>
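<para>
For example, a mapping document that switches the default off for everything it declares
could start like this (the package name <literal>eg</literal> is only a placeholder):
</para>
<programlisting><![CDATA[<hibernate-mapping package="eg" default-lazy="false">
    <!-- classes and collections mapped here are non-lazy unless they say otherwise -->
    ...
</hibernate-mapping>]]></programlisting>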
<para>
We'll now have a closer look at the individual fetching strategies and how
to change them for single-ended associations and collections.
</para>
<sect2 id="performance-fetching-collections" revision="3">
<title>Collection fetching</title>
<para>
Initialization of collections owned by persistent instances happens transparently
to the user, so the application would not normally need to worry about this (in
fact, transparent lazy initialization is the main reason why Hibernate needs its
own collection implementations). However, if the application tries something like
this:
</para>
<programlisting><![CDATA[s = sessions.openSession();
User u = (User) s.find("from User u where u.name=?", userName, Hibernate.STRING).get(0);
Map permissions = u.getPermissions();
s.connection().commit();
s.close();
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>
<para>
It could be in for a nasty surprise. Since the permissions collection was not
initialized when the <literal>Session</literal> was closed, the collection
will not be able to load its state. <emphasis>Hibernate does not support lazy
initialization for detached objects</emphasis>. The fix is to move the
line that reads from the collection to just before the commit. (There are
other more advanced ways to solve this problem, some are discussed later.)
</para>
<para>
It's possible to use a non-lazy collection. However, it is intended that lazy
initialization be used for almost all collections, especially for collections
of entity references (it's the default). If you define too many non-lazy associations
in your object model, Hibernate will end up needing to fetch the entire database
into memory in every transaction! Still, sometimes you want to use an additional
<literal>SELECT</literal> for a particular collection right away, not deferred
until the first access happens:
</para>
<programlisting><![CDATA[<set name="permissions" fetch="select">
<key column="USER_ID"/>
<one-to-many class="Permission"/>
</set>]]></programlisting>
<para>
Hibernate will now execute an immediate second <literal>SELECT</literal> loading
the collection of <literal>Permission</literal> instances, when a particular
<literal>User</literal> is retrieved.
</para>
<para>
Any kind of lazy fetching (and also Select fetching) is extremely vulnerable to
N+1 selects problems. So usually, we choose lazy fetching only as a default
strategy, and override it for a particular transaction, using the HQL
<literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch the
association eagerly in the first select, using an outer join. In the
<literal>Criteria</literal> API, you would use
<literal>setFetchMode(FetchMode.EAGER)</literal>.
</para>
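<para>
As a sketch of such a per-transaction override, the fragment below loads the
<literal>User</literal> from the earlier example together with its
<literal>permissions</literal>, first with HQL and then with <literal>Criteria</literal>.
It assumes an open <literal>Session</literal> named <literal>s</literal> and uses
<literal>FetchMode.JOIN</literal>, the constant this chapter refers to later on.
</para>
<programlisting><![CDATA[// HQL: fetch the collection in the same SELECT, using an outer join
List users = s.createQuery(
        "from User u left join fetch u.permissions where u.name = :name")
    .setString("name", userName)
    .list();

// Criteria: the same override, expressed on the association path
List sameUsers = s.createCriteria(User.class)
    .add( Restrictions.eq("name", userName) )
    .setFetchMode("permissions", FetchMode.JOIN)
    .list();]]></programlisting>
<para>
Because the join returns one row per collection element, the same <literal>User</literal>
instance may appear in the result list more than once.
</para>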
<para>
You can always force outer join association fetching in the mapping file, by setting
<literal>fetch="join"</literal> (or use the old <literal>outer-join="true"</literal>
syntax). We don't recommend this setting, especially not for collections, since it is
incredibly rare to find an entity which is <emphasis>always</emphasis> used when
an associated entity is used, at least in a sufficiently large system.
</para>
<para>
Eager fetching for collections has another restriction: you may only set one
collection role per persistent class to be fetched per outer join. Hibernate forbids
Cartesian products when possible, and <literal>SELECT</literal>ing two collections per
outer join would create one. This would almost always be slower than two (lazy or
non-deferred) <literal>SELECT</literal>s. The restriction to a single outer-joined
collection applies to both the mapping fetching strategies and to HQL/Criteria queries.
</para>
</sect2>
<sect2 id="performance-fetching-proxies" revision="2">
<title>Single-ended association proxies</title>
<para>
Lazy fetching for collections is implemented using Hibernate's own implementation
of persistent collections. However, a different mechanism is needed for lazy
behavior in single-ended associations. The target entity of the association must
be proxied. Hibernate implements lazy initializing proxies for persistent objects
using runtime bytecode enhancement (via the excellent CGLIB library).
</para>
<para>
By default, Hibernate3 generates proxies (at startup) for all persistent classes
and uses them to enable lazy fetching of <literal>many-to-one</literal> and
<literal>one-to-one</literal> associations.
</para>
<para>
The mapping file may declare an interface to use as the proxy interface for that
class, with the <literal>proxy</literal> attribute. By default, Hibernate uses a subclass
of the class. <emphasis>Note that the proxied class must implement a default constructor
with at least package visibility. We recommend this constructor for all persistent classes!</emphasis>
</para>
<para>
There are some gotchas to be aware of when extending this approach to polymorphic
classes, eg.
</para>
<programlisting><![CDATA[<class name="Cat" proxy="Cat">
......
<subclass name="DomesticCat">
.....
</subclass>
</class>]]></programlisting>
<para>
Firstly, instances of <literal>Cat</literal> will never be castable to
<literal>DomesticCat</literal>, even if the underlying instance is an
instance of <literal>DomesticCat</literal>:
</para>
<programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id); // instantiate a proxy (does not hit the db)
if ( cat.isDomesticCat() ) { // hit the db to initialize the proxy
DomesticCat dc = (DomesticCat) cat; // Error!
....
}]]></programlisting>
<para>
Secondly, it is possible to break proxy <literal>==</literal>.
</para>
<programlisting><![CDATA[
Cat cat = (Cat) session.load(Cat.class, id); // instantiate a Cat proxy
DomesticCat dc =
(DomesticCat) session.load(DomesticCat.class, id); // required new DomesticCat proxy!
System.out.println(cat==dc); // false]]></programlisting>
<para>
However, the situation is not quite as bad as it looks. Even though we now have two references
to different proxy objects, the underlying instance will still be the same object:
</para>
<programlisting><![CDATA[cat.setWeight(11.0); // hit the db to initialize the proxy
System.out.println( dc.getWeight() ); // 11.0]]></programlisting>
<para>
Third, you may not use a CGLIB proxy for a <literal>final</literal> class or a class
with any <literal>final</literal> methods.
</para>
<para>
Finally, if your persistent object acquires any resources upon instantiation (eg. in
initializers or default constructor), then those resources will also be acquired by
the proxy. The proxy class is an actual subclass of the persistent class.
</para>
<para>
These problems are all due to fundamental limitations in Java's single inheritance model.
If you wish to avoid these problems your persistent classes must each implement an interface
that declares its business methods. You should specify these interfaces in the mapping file. eg.
</para>
<programlisting><![CDATA[<class name="CatImpl" proxy="Cat">
......
<subclass name="DomesticCatImpl" proxy="DomesticCat">
.....
</subclass>
</class>]]></programlisting>
<para>
where <literal>CatImpl</literal> implements the interface <literal>Cat</literal> and
<literal>DomesticCatImpl</literal> implements the interface <literal>DomesticCat</literal>. Then
proxies for instances of <literal>Cat</literal> and <literal>DomesticCat</literal> may be returned
by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>find()</literal>
does not usually return proxies.)
</para>
<programlisting><![CDATA[Cat cat = (Cat) session.load(CatImpl.class, catid);
Iterator iter = session.iterate("from cat in class CatImpl where cat.name='fritz'");
Cat fritz = (Cat) iter.next();]]></programlisting>
<para>
Relationships are also lazily initialized. This means you must declare any properties to be of
type <literal>Cat</literal>, not <literal>CatImpl</literal>.
</para>
<para>
Certain operations do <emphasis>not</emphasis> require proxy initialization:
</para>
<itemizedlist spacing="compact">
<listitem>
<para>
<literal>equals()</literal>, if the persistent class does not override
<literal>equals()</literal>
</para>
</listitem>
<listitem>
<para>
<literal>hashCode()</literal>, if the persistent class does not override
<literal>hashCode()</literal>
</para>
</listitem>
<listitem>
<para>
The identifier getter method
</para>
</listitem>
</itemizedlist>
<para>
Hibernate will detect persistent classes that override <literal>equals()</literal> or
<literal>hashCode()</literal>.
</para>
<para>
You may of course also use Eager or Select fetching strategies for single-ended
associations:
</para>
<programlisting><![CDATA[<many-to-one name="mother" class="Cat" fetch="join"/>
<many-to-one name="father" class="Cat" fetch="select"/>]]></programlisting>
<para>
The first mapping tells Hibernate to fetch the associated <literal>mother</literal>
entity in the same initial <literal>SELECT</literal> using an <literal>OUTER JOIN</literal>.
You can set this option on as many *-to-one associations as you like; there is no
danger of creating a Cartesian product (as opposed to collections). Note that you can
set the maximum depth of outer joined tables with the global configuration option
<literal>max_fetch_depth</literal> (see <xref linkend="configuration-optional-outerjoin"/>).
</para>
<para>
The second mapping enables an additional <literal>SELECT</literal> for the
retrieval of the <literal>father</literal>. Note that Hibernate does not guarantee
<emphasis>when</emphasis> this query will be executed. If it should be executed
immediately (right after the initial <literal>SELECT</literal>), disable proxying
on the target of the association by setting it to <literal>lazy="false"</literal>:
</para>
<programlisting><![CDATA[<class name="Cat" lazy="false">...</class>]]></programlisting>
<para>
(Note that this example uses only a single persistent class <literal>Cat</literal>
and self-referencing associations. This doesn't change the fetching behavior, as expected.)
</para>
</sect2>
<sect2 id="performance-fetching-initialization">
<title>Initializing collections and proxies</title>
<para>
An exception (<literal>LazyInitializationException</literal>) will be thrown by
Hibernate if an uninitialized collection or proxy is accessed outside of the scope
of the <literal>Session</literal>, ie. when the entity owning the collection or
having the reference to the proxy is in detached state.
</para>
<para>
Sometimes we need to ensure that a proxy or collection is initialized before closing the
<literal>Session</literal>. Of course, we can always force initialization by calling
<literal>cat.getSex()</literal> or <literal>cat.getKittens().size()</literal>, for example.
But that is confusing to readers of the code and is not convenient for generic code.
</para>
<para>
The static methods <literal>Hibernate.initialize()</literal> and <literal>Hibernate.isInitialized()</literal>
provide the application with a convenient way of working with lazily initialized collections or
proxies. <literal>Hibernate.initialize(cat)</literal> will force the initialization of a proxy,
<literal>cat</literal>, as long as its <literal>Session</literal> is still open.
<literal>Hibernate.initialize( cat.getKittens() )</literal> has a similar effect for the collection
of kittens.
</para>
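<para>
Put together, a data access method might prepare an object for use after the
<literal>Session</literal> is closed roughly as follows. This is only a sketch; it assumes
an open <literal>Session</literal> named <literal>session</literal> and the
<literal>Cat</literal> class with a <literal>kittens</literal> collection used in this chapter.
</para>
<programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id); // may return an uninitialized proxy

// initialize the proxy itself, if that has not happened yet
if ( !Hibernate.isInitialized(cat) ) {
    Hibernate.initialize(cat);
}

// initialize the lazy collection while the Session is still open
Hibernate.initialize( cat.getKittens() );

session.close();

// safe now: both the proxy and the collection were loaded above
int litterSize = cat.getKittens().size();]]></programlisting>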
<para>
Another option is to keep the <literal>Session</literal> open until all needed
collections and proxies have been loaded. In some application architectures,
particularly where the code that accesses data using Hibernate, and the code that
uses it are in different application layers, it can be a problem to ensure that the
<literal>Session</literal> is open when a collection is initialized. There are
three basic ways to deal with this issue:
</para>
<itemizedlist>
<listitem>
<para>
In a web-based application, a servlet filter can be used to close the
<literal>Session</literal> only at the very end of a user request, once
the rendering of the view is complete (the <emphasis>Open Session in
View</emphasis> pattern). Of course, this places heavy
demands on the correctness of the exception handling of your application
infrastructure. It is vitally important that the <literal>Session</literal>
is closed and the transaction ended before returning to the user, even
when an exception occurs during rendering of the view. The servlet filter
has to be able to access the <literal>Session</literal> for this approach.
We recommend that a <literal>ThreadLocal</literal> variable be used to
hold the current <literal>Session</literal> (see chapter 1,
<xref linkend="quickstart-playingwithcats"/>, for an example implementation).
</para>
</listitem>
<listitem>
<para>
In an application with a separate business tier, the business logic must
"prepare" all collections that will be needed by the web tier before
returning. This means that the business tier should load all the data and
return all the data already initialized to the presentation/web tier that
is required for a particular use case. Usually, the application calls
<literal>Hibernate.initialize()</literal> for each collection that will
be needed in the web tier (this call must occur before the session is closed)
or retrieves the collection eagerly using a Hibernate query with a
<literal>FETCH</literal> clause or a <literal>FetchMode.JOIN</literal> in
<literal>Criteria</literal>. This is usually easier if you adopt the
<emphasis>Command</emphasis> pattern instead of a <emphasis>Session Facade</emphasis>.
</para>
</listitem>
<listitem>
<para>
You may also attach a previously loaded object to a new <literal>Session</literal>
with <literal>merge()</literal> or <literal>lock()</literal> before
accessing uninitialized collections (or other proxies). Hibernate can not
do this automatically, as it would introduce ad hoc transaction semantics!
</para>
</listitem>
</itemizedlist>
<para>
Sometimes you don't want to initialize a large collection, but still need some
information about it (like its size) or a subset of the data.
</para>
<para>
You can use a collection filter to get the size of a collection without initializing it:
</para>
<programlisting><![CDATA[( (Integer) s.createFilter( collection, "select count(*)" ).list().get(0) ).intValue()]]></programlisting>
<para>
The <literal>createFilter()</literal> method is also used to efficiently retrieve subsets
of a collection without needing to initialize the whole collection:
</para>
<programlisting><![CDATA[s.createFilter( lazyCollection, "").setFirstResult(0).setMaxResults(10).list();]]></programlisting>
</sect2>
<sect2 id="performance-fetching-batch">
<title>Using batch fetching</title>
<para>
Hibernate can make efficient use of batch fetching, that is, Hibernate can load several uninitialized
proxies (or collections) if one proxy is accessed. Batch fetching is an optimization for the lazy
loading strategy. There are two ways you can tune batch fetching: on the class and the collection level.
</para>
<para>
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation
at runtime: You have 25 <literal>Cat</literal> instances loaded in a <literal>Session</literal>, each
<literal>Cat</literal> has a reference to its <literal>owner</literal>, a <literal>Person</literal>.
The <literal>Person</literal> class is mapped with a proxy, <literal>lazy="true"</literal>. If you now
iterate through all cats and call <literal>getOwner()</literal> on each, Hibernate will by default
execute 25 <literal>SELECT</literal> statements, to retrieve the proxied owners. You can tune this
behavior by specifying a <literal>batch-size</literal> in the mapping of <literal>Person</literal>:
</para>
<programlisting><![CDATA[<class name="Person" batch-size="10">...</class>]]></programlisting>
<para>
Hibernate will now execute only three queries; the pattern is 10, 10, 5. You can see that batch fetching
is a blind guess as far as performance optimization goes: it depends on the number of uninitialized proxies
in a particular <literal>Session</literal>.
</para>
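<para>
To make the 10, 10, 5 pattern concrete, the three statements typically have roughly the
shape sketched below. The table and column names are assumptions, the select list is
elided, and the exact SQL depends on the dialect and the mapping.
</para>
<programlisting><![CDATA[-- first getOwner() call that touches an uninitialized proxy: 10 owners at once
select ... from PERSON where ID in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

-- next uninitialized proxy: 10 more owners
select ... from PERSON where ID in (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)

-- the remaining 5 owners
select ... from PERSON where ID in (?, ?, ?, ?, ?)]]></programlisting>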
<para>
You may also enable batch fetching of collections. For example, if each <literal>Person</literal> has
a lazy collection of <literal>Cat</literal>s, and 10 persons are currently loaded in the
<literal>Session</literal>, iterating through all persons will generate 10 <literal>SELECT</literal>s,
one for every call to <literal>getCats()</literal>. If you enable batch fetching for the
<literal>cats</literal> collection in the mapping of <literal>Person</literal>, Hibernate can pre-fetch
collections:
</para>
<programlisting><![CDATA[<class name="Person">
<set name="cats" batch-size="3">
...
</set>
</class>]]></programlisting>
<para>
With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in 4
<literal>SELECT</literal>s. Again, the value of the attribute depends on the expected number of
uninitialized collections in a particular <literal>Session</literal>.
</para>
<para>
Batch fetching of collections is particularly useful if you have a nested tree of items, ie.
the typical bill-of-materials pattern. (Although a <emphasis>nested set</emphasis> or a
<emphasis>materialized path</emphasis> might be a better option for read-mostly trees.)
</para>
</sect2>
<sect2 id="performance-fetching-lazyproperties">
<title>Using lazy property fetching</title>
<para>
Hibernate3 supports the lazy fetching of individual properties. This optimization technique
is also known as <emphasis>fetch groups</emphasis>. Please note that this is mostly a
marketing feature, as in practice, optimizing row reads is much more important than
optimization of column reads. However, only loading some properties of a class might
be useful in extreme cases, when legacy tables have hundreds of columns and the data model
can not be improved.
</para>
<para>
To enable lazy property loading, set the <literal>lazy</literal> attribute on your
particular property mappings:
</para>
<programlisting><![CDATA[<class name="Document">
<id name="id">
<generator class="native"/>
</id>
<property name="name" not-null="true" length="50"/>
<property name="summary" not-null="true" length="200" lazy="true"/>
<property name="text" not-null="true" length="2000" lazy="true"/>
</class>]]></programlisting>
<para>
Lazy property loading requires buildtime bytecode instrumentation! If your persistent
classes are not enhanced, Hibernate will silently ignore lazy property settings and
fall back to immediate fetching.
</para>
<para>
For bytecode instrumentation, use the following Ant task:
</para>
<programlisting><![CDATA[<target name="instrument" depends="compile">
<taskdef name="instrument" classname="org.hibernate.tool.instrument.InstrumentTask">
<classpath path="${jar.path}"/>
<classpath path="${classes.dir}"/>
<classpath refid="lib.class.path"/>
</taskdef>
<instrument verbose="true">
<fileset dir="${testclasses.dir}/org/hibernate/auction/model">
<include name="*.class"/>
</fileset>
</instrument>
</target>]]></programlisting>
<para>
A different (better?) way to avoid unnecessary column reads, at least for
read-only transactions, is to use the projection features of HQL. This avoids
the need for buildtime bytecode processing.
</para>
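<para>
For the <literal>Document</literal> class mapped above, such a projection could look like
the following sketch, which reads only the identifier and the name and never touches the
<literal>summary</literal> or <literal>text</literal> columns. It assumes an open
<literal>Session</literal> named <literal>session</literal>.
</para>
<programlisting><![CDATA[List rows = session.createQuery("select d.id, d.name from Document d").list();

for ( Iterator it = rows.iterator(); it.hasNext(); ) {
    // each result is an array holding the selected values, in select order
    Object[] row = (Object[]) it.next();
    Object id = row[0];
    String name = (String) row[1];
    System.out.println(id + ": " + name);
}]]></programlisting>
<para>
The results are plain values rather than managed <literal>Document</literal> instances,
which is why this approach suits read-only use.
</para>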
<para>
TODO: Document issues with lazy property loading
</para>
</sect2>
<para>
A completely different way to avoid problems with N+1 selects is to use the second-level
cache.
</para>
</sect1>
<sect1 id="performance-cache" revision="1">
<title>The Second Level Cache</title>
<para>
A Hibernate <literal>Session</literal> is a transaction-level cache of persistent data. It is
possible to configure a cluster or JVM-level (<literal>SessionFactory</literal>-level) cache on
a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be
careful. Caches are never aware of changes made to the persistent store by another application
(though they may be configured to regularly expire cached data).
</para>
<para>
By default, Hibernate uses EHCache for JVM-level caching. (JCS support is now deprecated and will
be removed in a future version of Hibernate.) You may choose a different implementation by
specifying the name of a class that implements <literal>org.hibernate.cache.CacheProvider</literal>
using the property <literal>hibernate.cache.provider_class</literal>.
</para>
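<para>
For instance, to name the EHCache provider from the table below explicitly, the property
could be set in <literal>hibernate.properties</literal> like this:
</para>
<programlisting><![CDATA[hibernate.cache.provider_class=org.hibernate.cache.EhCacheProvider]]></programlisting>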
<table frame="topbot" id="cacheproviders" revision="1">
<title>Cache Providers</title>
<tgroup cols='5' align='left' colsep='1' rowsep='1'>
<colspec colname='c1' colwidth="1*"/>
<colspec colname='c2' colwidth="3*"/>
<colspec colname='c3' colwidth="1*"/>
<colspec colname='c4' colwidth="1*"/>
<colspec colname='c5' colwidth="1*"/>
<thead>
<row>
<entry>Cache</entry>
<entry>Provider class</entry>
<entry>Type</entry>
<entry>Cluster Safe</entry>
<entry>Query Cache Supported</entry>
</row>
</thead>
<tbody>
<row>
<entry>Hashtable (not intended for production use)</entry>
<entry><literal>org.hibernate.cache.HashtableCacheProvider</literal></entry>
<entry>memory</entry>
<entry></entry>
<entry>yes</entry>
</row>
<row>
<entry>EHCache</entry>
<entry><literal>org.hibernate.cache.EhCacheProvider</literal></entry>
<entry>memory, disk</entry>
<entry></entry>
<entry>yes</entry>
</row>
<row>
<entry>OSCache</entry>
<entry><literal>org.hibernate.cache.OSCacheProvider</literal></entry>
<entry>memory, disk</entry>
<entry></entry>
<entry>yes</entry>
</row>
<row>
<entry>SwarmCache</entry>
<entry><literal>org.hibernate.cache.SwarmCacheProvider</literal></entry>
<entry>clustered (ip multicast)</entry>
<entry>yes (clustered invalidation)</entry>
<entry></entry>
</row>
<row>
<entry>JBoss TreeCache</entry>
<entry><literal>org.hibernate.cache.TreeCacheProvider</literal></entry>
<entry>clustered (ip multicast), transactional</entry>
<entry>yes (replication)</entry>
<entry>yes (clock sync req.)</entry>
</row>
</tbody>
</tgroup>
</table>
<sect2 id="performance-cache-mapping">
<title>Cache mappings</title>
<para>
The <literal>&lt;cache&gt;</literal> element of a class or collection mapping has the
following form:
</para>
<programlistingco>
<areaspec>
<area id="cache1" coords="2 70"/>
</areaspec>
<programlisting><![CDATA[<cache
usage="transactional|read-write|nonstrict-read-write|read-only"
/>]]></programlisting>
<calloutlist>
<callout arearefs="cache1">
<para>
<literal>usage</literal> specifies the caching strategy:
<literal>transactional</literal>,
<literal>read-write</literal>,
<literal>nonstrict-read-write</literal> or
<literal>read-only</literal>
</para>
</callout>
</calloutlist>
</programlistingco>
<para>
Alternatively (preferably?), you may specify <literal>&lt;class-cache&gt;</literal> and
<literal>&lt;collection-cache&gt;</literal> elements in <literal>hibernate.cfg.xml</literal>.
</para>
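<para>
As a sketch of this second approach, the fragment below declares caching for a class and one of its
collections in <literal>hibernate.cfg.xml</literal>. The names <literal>eg.Cat</literal> and
<literal>eg.Cat.kittens</literal> are borrowed from the mapping examples in this chapter and are
illustrative only; the rest of the configuration is elided.
</para>
<programlisting><![CDATA[<hibernate-configuration>
    <session-factory>
        ....
        <!-- cache declarations follow the property and mapping elements -->
        <class-cache class="eg.Cat" usage="read-write"/>
        <collection-cache collection="eg.Cat.kittens" usage="read-write"/>
    </session-factory>
</hibernate-configuration>]]></programlisting>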
<para>
The <literal>usage</literal> attribute specifies a <emphasis>cache concurrency strategy</emphasis>.
</para>
</sect2>
<sect2 id="performance-cache-readonly">
<title>Strategy: read only</title>
<para>
If your application needs to read but never modify instances of a persistent class, a
<literal>read-only</literal> cache may be used. This is the simplest and best performing
strategy. It's even perfectly safe for use in a cluster.
</para>
<programlisting><![CDATA[<class name="eg.Immutable" mutable="false">
<cache usage="read-only"/>
....
</class>]]></programlisting>
</sect2>
<sect2 id="performance-cache-readwrite">
<title>Strategy: read/write</title>
<para>
If the application needs to update data, a <literal>read-write</literal> cache might be appropriate.
This cache strategy should never be used if serializable transaction isolation level is required.
If the cache is used in a JTA environment, you must specify the property
<literal>hibernate.transaction.manager_lookup_class</literal>, naming a strategy for obtaining the
JTA <literal>TransactionManager</literal>. In other environments, you should ensure that the transaction
is completed when <literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation
supports locking. The built-in cache providers do <emphasis>not</emphasis>.
</para>
<programlisting><![CDATA[<class name="eg.Cat" .... >
<cache usage="read-write"/>
....
<set name="kittens" ... >
<cache usage="read-write"/>
....
</set>
</class>]]></programlisting>
</sect2>
<sect2 id="performance-cache-nonstrict">
<title>Strategy: nonstrict read/write</title>
<para>
If the application only occasionally needs to update data (i.e. if it is extremely unlikely that two
transactions would try to update the same item simultaneously) and strict transaction isolation is
not required, a <literal>nonstrict-read-write</literal> cache might be appropriate. If the cache is
used in a JTA environment, you must specify <literal>hibernate.transaction.manager_lookup_class</literal>.
In other environments, you should ensure that the transaction is completed when
<literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
</para>
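<para>
The mapping is declared in the same way as for the other strategies. A minimal sketch, assuming a
hypothetical, rarely updated <literal>eg.Country</literal> class that does not appear elsewhere in
this chapter:
</para>
<programlisting><![CDATA[<class name="eg.Country" .... >
    <cache usage="nonstrict-read-write"/>
    ....
</class>]]></programlisting>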
</sect2>
<sect2 id="performance-cache-transactional">
<title>Strategy: transactional</title>
<para>
The <literal>transactional</literal> cache strategy provides support for fully transactional cache
providers such as JBoss TreeCache. Such a cache may only be used in a JTA environment and you must
specify <literal>hibernate.transaction.manager_lookup_class</literal>.
</para>
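<para>
For illustration, a deployment using JBoss TreeCache might set the following properties. This is a
sketch rather than a complete configuration: the lookup class shown assumes the JBoss application
server, so substitute the strategy appropriate to your own container.
</para>
<programlisting><![CDATA[hibernate.cache.provider_class = org.hibernate.cache.TreeCacheProvider
hibernate.transaction.manager_lookup_class = org.hibernate.transaction.JBossTransactionManagerLookup]]></programlisting>
<para>
The class or collection mapping would then declare <literal>&lt;cache usage="transactional"/&gt;</literal>.
</para>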
</sect2>
<para>
None of the cache providers support all of the cache concurrency strategies. The following table shows
which providers are compatible with which concurrency strategies.
</para>
<table frame="topbot">
<title>Cache Concurrency Strategy Support</title>
<tgroup cols='5' align='left' colsep='1' rowsep='1'>
<colspec colname='c1' colwidth="1*"/>
<colspec colname='c2' colwidth="1*"/>
<colspec colname='c3' colwidth="1*"/>
<colspec colname='c4' colwidth="1*"/>
<colspec colname='c5' colwidth="1*"/>
<thead>
<row>
<entry>Cache</entry>
<entry>read-only</entry>
<entry>nonstrict-read-write</entry>
<entry>read-write</entry>
<entry>transactional</entry>
</row>
</thead>
<tbody>
<row>
<entry>Hashtable (not intended for production use)</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry></entry>
</row>
<row>
<entry>EHCache</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry></entry>
</row>
<row>
<entry>OSCache</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry></entry>
</row>
<row>
<entry>SwarmCache</entry>
<entry>yes</entry>
<entry>yes</entry>
<entry></entry>
<entry></entry>
</row>
<row>
<entry>JBoss TreeCache</entry>
<entry>yes</entry>
<entry></entry>
<entry></entry>
<entry>yes</entry>
</row>
</tbody>
</tgroup>
</table>
</sect1>
<sect1 id="performance-sessioncache" revision="2">
<title>Managing the <literal>Session</literal> Cache</title>
<para>
Whenever you pass an object to <literal>save()</literal>, <literal>update()</literal>
or <literal>saveOrUpdate()</literal> and whenever you retrieve an object using
<literal>load()</literal>, <literal>get()</literal>, <literal>list()</literal>,
<literal>iterate()</literal> or <literal>scroll()</literal>, that object is added
to the internal cache of the <literal>Session</literal>.
</para>
<para>
When <literal>flush()</literal> is subsequently called,
the state of that object will be synchronized with the database. If you do not want
this synchronization to occur or if you are processing a huge number of objects and
need to manage memory efficiently, the <literal>evict()</literal> method may be
used to remove the object and its collections from the cache.
</para>
<programlisting><![CDATA[Iterator cats = sess.iterate("from eg.Cat as cat"); //a huge result set
while ( cats.hasNext() ) {
    Cat cat = (Cat) cats.next();
    doSomethingWithACat(cat);
    sess.evict(cat); //remove the cat and its collections from the session cache
}]]></programlisting>
<para>
Hibernate will evict associated entities automatically if the association is mapped
with <literal>cascade="all"</literal> or <literal>cascade="all-delete-orphan"</literal>.
</para>
<para>
The <literal>Session</literal> also provides a <literal>contains()</literal> method
to determine if an instance belongs to the session cache.
</para>
<para>
To completely evict all objects from the session cache, call <literal>Session.clear()</literal>.
</para>
<para>
For the second-level cache, there are methods defined on <literal>SessionFactory</literal> for
evicting the cached state of an instance, entire class, collection instance or entire collection
role.
</para>
</sect1>
<sect1 id="performance-querycache" revision="1">
<title>The Query Cache</title>
<para>
Query result sets may also be cached. This is only useful for queries that are run
frequently with the same parameters. To use the query cache you must first enable it
by setting the property <literal>hibernate.cache.use_query_cache=true</literal>. This
causes the creation of two cache regions - one holding cached query result sets
(<literal>org.hibernate.cache.QueryCache</literal>), the other holding timestamps
of most recent updates to queried tables
(<literal>org.hibernate.cache.UpdateTimestampsCache</literal>). Note that the query
cache does not cache the state of any entities in the result set; it caches only
identifier values and results of value type. So the query cache is usually used in
conjunction with the second-level cache.
</para>
<para>
Most queries do not benefit from caching, so by default queries are not cached. To
enable caching, call <literal>Query.setCacheable(true)</literal>. This call allows
the query to look for existing cache results or add its results to the cache when
it is executed.
</para>
<para>
If you require fine-grained control over query cache expiration policies, you may
specify a named cache region for a particular query by calling
<literal>Query.setCacheRegion()</literal>.
</para>
<programlisting><![CDATA[List blogs = sess.createQuery("from Blog blog where blog.blogger = :blogger")
.setEntity("blogger", blogger)
.setMaxResults(15)
.setCacheable(true)
.setCacheRegion("frontpages")
.list();]]></programlisting>
<para>
If the query should force a refresh of its query cache region, you should call
<literal>Query.setForceCacheRefresh(true)</literal>.
This is particularly useful in cases where underlying data may have been updated
via a separate process (i.e., not modified through Hibernate) and allows the
application to selectively refresh the query cache regions based on its
knowledge of those events. This is an alternative to eviction of a query
cache region. If you need fine-grained refresh control for many queries, use
this function instead of a new region for each query.
</para>
</sect1>
<para>
TODO: document statistics
</para>
</chapter>