<chapter id="performance">
|
|
<title>Improving performance</title>
|
|
|
|
<sect1 id="performance-collections">
|
|
<title>Understanding Collection performance</title>
|
|
|
|
<para>
|
|
We've already spent quite some time talking about collections.
|
|
In this section we will highlight a couple more issues about
|
|
how collections behave at runtime.
|
|
</para>
|
|
|
|
<sect2 id="performance-collections-taxonomy">
|
|
<title>Taxonomy</title>
|
|
|
|
<para>Hibernate defines three basic kinds of collections:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>collections of values</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>one to many associations</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>many to many associations</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
This classification distinguishes the various table and foreign key
|
|
relationships but does not tell us quite everything we need to know
|
|
about the relational model. To fully understand the relational structure
|
|
and performance characteristics, we must also consider the structure of
|
|
the primary key that is used by Hibernate to update or delete collection
|
|
rows. This suggests the following classification:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>indexed collections</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>sets</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>bags</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
columns. In this case collection updates are usually extremely efficient:
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
|
|
</para>
|
|
|
|
<para>
|
|
Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
as <literal>not-null="true"</literal>.)
|
|
</para>
|
|
|
|
<para>
|
|
<literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
|
|
always very efficient to update. In fact, they are the best case.
|
|
</para>
|
|
|
|
<para>
|
|
Bags are the worst case. Since a bag permits duplicate element values and has no
|
|
index column, no primary key may be defined. Hibernate has no way of distinguishing
|
|
between duplicate rows. Hibernate resolves this problem by completely removing
|
|
(in a single <literal>DELETE</literal>) and recreating the collection whenever it
|
|
changes. This might be very inefficient.
|
|
</para>
|
|
|
|
<para>
|
|
Note that for a one-to-many association, the "primary key" may not be the physical
|
|
primary key of the database table - but even in this case, the above classification
|
|
is still useful. (It still reflects how Hibernate "locates" individual rows of the
|
|
collection.)
|
|
</para>
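<para>
The classification above can be summarised in a small lookup. This is only an illustrative
sketch: <literal>CollectionKind</literal> is a hypothetical name, not a Hibernate type, and the
strings merely restate which columns the text says are used to locate a row.
</para>

```java
// Illustrative summary of the classification above; not a Hibernate type.
enum CollectionKind {
    INDEXED("key + index columns"),
    SET("key + element columns"),
    IDBAG("surrogate id column"),
    BAG(null); // no key: duplicate rows cannot be told apart

    private final String rowLocator;

    CollectionKind(String rowLocator) {
        this.rowLocator = rowLocator;
    }

    // Columns used to find one row, or null when no such key exists.
    String rowLocator() {
        return rowLocator;
    }

    // True when a single row can be updated or deleted on its own.
    boolean supportsRowLevelChanges() {
        return rowLocator != null;
    }
}
```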
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-collections-mostefficientupdate">
|
|
<title>Lists, maps, idbags and sets are the most efficient collections to update</title>
|
|
|
|
<para>
|
|
From the discussion above, it should be clear that indexed collections
|
|
and (usually) sets allow the most efficient operation in terms of adding,
|
|
removing and updating elements.
|
|
</para>
|
|
|
|
<para>
|
|
There is, arguably, one more advantage that indexed collections have over sets for
|
|
many to many associations or collections of values. Because of the structure of a
|
|
<literal>Set</literal>, Hibernate doesn't ever <literal>UPDATE</literal> a row when
|
|
an element is "changed". Changes to a <literal>Set</literal> always work via
|
|
<literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
|
|
again, this consideration does not apply to one to many associations.
|
|
</para>
|
|
|
|
<para>
|
|
After observing that arrays cannot be lazy, we would conclude that lists, maps and
|
|
idbags are the most performant (non-inverse) collection types, with sets not far
|
|
behind. Sets are expected to be the most common kind of collection in Hibernate
|
|
applications. This is because the "set" semantics are most natural in the relational
|
|
model.
|
|
</para>
|
|
|
|
<para>
|
|
However, in well-designed Hibernate domain models, we usually see that most collections
|
|
are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
|
|
associations, the update is handled by the many-to-one end of the association, and so
|
|
considerations of collection update performance simply do not apply.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-collections-mostefficentinverse">
|
|
<title>Bags and lists are the most efficient inverse collections</title>
|
|
|
|
<para>
|
|
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
|
|
are much more performant than sets. For a collection with <literal>inverse="true"</literal>
|
|
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
|
|
to a bag or list without needing to initialize (fetch) the bag elements! This is because
|
|
<literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
|
|
return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
|
|
make the following common code much faster.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
|
|
Child c = new Child();
|
|
c.setParent(p);
|
|
p.getChildren().add(c); //no need to fetch the collection!
|
|
sess.flush();]]></programlisting>
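<para>
The difference in the <literal>add()</literal> contract can be seen with plain
<literal>java.util</literal> collections. The sketch below does not use Hibernate at all;
<literal>AddContractDemo</literal> is a hypothetical helper that only shows why a set has to
look at its current contents before answering, while a list or bag does not.
</para>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain java.util collections, used only to illustrate the add() contract.
final class AddContractDemo {

    // A List (or bag) accepts duplicates, so add() returns true without
    // inspecting the elements that are already present.
    static boolean addTwiceToList(String element) {
        List<String> bag = new ArrayList<String>();
        bag.add(element);
        return bag.add(element);
    }

    // A Set must compare the new element with its current contents before
    // it can report whether the set actually changed.
    static boolean addTwiceToSet(String element) {
        Set<String> set = new HashSet<String>();
        set.add(element);
        return set.add(element);
    }
}
```

<para>
Because the return value of a set's <literal>add()</literal> depends on what is already stored,
a lazy set would have to be fetched first; a lazy bag or list can simply queue the addition.
</para>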
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-collections-oneshotdelete">
|
|
<title>One shot delete</title>
|
|
|
|
<para>
|
|
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
|
|
isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
|
|
(if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
|
|
issue a single <literal>DELETE</literal> and we are done!
|
|
</para>
|
|
|
|
<para>
|
|
Suppose we add a single element to a collection of size twenty and then remove two elements.
|
|
Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
|
|
statements (unless the collection is a bag). This is certainly desirable.
|
|
</para>
|
|
|
|
<para>
|
|
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>delete eighteen rows one by one and then insert three rows</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
|
|
all five current elements (one by one)</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
|
|
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
|
|
confuse database triggers, etc.)
|
|
</para>
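<para>
The statement counts behind this comparison are simple arithmetic. The following sketch is a
simplified model (the class and method names are hypothetical, and it ignores bags and
<literal>UPDATE</literal>s); it only counts the SQL statements each strategy would need.
</para>

```java
final class CollectionUpdateCost {

    // Row-by-row strategy: one DELETE per removed element and one INSERT
    // per added element.
    static int rowByRow(int removed, int added) {
        return removed + added;
    }

    // One-shot strategy: a single DELETE for the whole collection, then one
    // INSERT for every element that should exist afterwards.
    static int oneShot(int originalSize, int removed, int added) {
        int finalSize = originalSize - removed + added;
        return 1 + finalSize;
    }
}
```

<para>
For a collection of twenty elements with eighteen removed and three added, the model gives 21
statements row by row against 6 for the one-shot strategy. With two removed and one added it
gives 3 against 20, which is why the row-by-row behaviour is the sensible default.
</para>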
|
|
|
|
<para>
|
|
Fortunately, you can force this behaviour (ie. the second strategy) at any time by discarding
|
|
(ie. dereferencing) the original collection and returning a newly instantiated collection with
|
|
all the current elements. This can be very useful and powerful from time to time.
|
|
</para>
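<para>
In application code, discarding the original collection might look roughly like the sketch
below. <literal>Parent</literal> here is a hypothetical plain class standing in for a mapped
entity, and the elements are strings for brevity; the point is only that the property ends up
referencing a newly instantiated collection holding all current elements.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a mapped entity; a real class would be
// declared in a mapping file.
final class Parent {
    private List<String> children = new ArrayList<String>();

    List<String> getChildren() { return children; }
    void setChildren(List<String> children) { this.children = children; }
}

final class OneShotDeleteSketch {

    // Copies the surviving elements into a brand-new collection and drops
    // the reference to the original one.
    static void replaceChildren(Parent parent, List<String> toRemove, List<String> toAdd) {
        List<String> current = new ArrayList<String>(parent.getChildren());
        current.removeAll(toRemove);
        current.addAll(toAdd);
        parent.setChildren(current); // the original collection is dereferenced here
    }
}
```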
|
|
|
|
<para>
|
|
Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="performance-fetching">
|
|
<title>Fetching strategies</title>
|
|
|
|
<para>
|
|
A fetching strategy describes the number of instances, the depth of a
|
|
subgraph of instances, and SQL <literal>SELECT</literal>s that are used
|
|
to retrieve these instances. Hibernate supports several strategies and you
|
|
can configure them on a global level, per entity class, per association, or
|
|
even for a particular query in HQL and with <literal>Criteria</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
Hibernate offers the following fetching strategies:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Lazy fetching</emphasis> - an associated instance (or a
|
|
collection) will only be loaded when needed, using an additional
|
|
deferred <literal>SELECT</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Batch fetching</emphasis> - an optimization strategy
for lazy fetching: Hibernate retrieves not just a single instance
(or collection), but several in the same <literal>SELECT</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Eager fetching</emphasis> - Hibernate retrieves the
|
|
associated instance (or collection) in the same <literal>SELECT</literal>,
|
|
using an <literal>OUTER JOIN</literal>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
|
|
is used to retrieve the associated instance (or collection), but
|
|
it might be executed immediately and not deferred until first access
|
|
(as with lazy fetching).
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
By default, Hibernate3 will only load the given entity using a single
|
|
<literal>SELECT</literal> statement if you retrieve an object with
|
|
<literal>load()</literal> or <literal>get()</literal>. This means that
|
|
all single-ended associations and collections are set for lazy fetching
|
|
by default. You can change this global default by setting the
|
|
<literal>default-lazy</literal> attribute on the <literal>hibernate-mapping</literal>
|
|
element to <literal>false</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
We'll now have a closer look at the individual fetching strategies and how
|
|
to change them for single-ended associations and collections.
|
|
</para>
|
|
|
|
<sect2 id="performance-fetching-collections" revision="3">
|
|
<title>Collection fetching</title>
|
|
|
|
<para>
|
|
Initialization of collections owned by persistent instances happens transparently
|
|
to the user, so the application would not normally need to worry about this (in
|
|
fact, transparent lazy initialization is the main reason why Hibernate needs its
|
|
own collection implementations). However, if the application tries something like
|
|
this:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[s = sessions.openSession();
|
|
User u = (User) s.find("from User u where u.name=?", userName, Hibernate.STRING).get(0);
|
|
Map permissions = u.getPermissions();
|
|
s.connection().commit();
|
|
s.close();
|
|
|
|
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>
|
|
|
|
<para>
|
|
It could be in for a nasty surprise. Since the permissions collection was not
|
|
initialized when the <literal>Session</literal> was closed, the collection
|
|
will not be able to load its state. <emphasis>Hibernate does not support lazy
|
|
initialization for detached objects</emphasis>. The fix is to move the
|
|
line that reads from the collection to just before the commit. (There are
|
|
other more advanced ways to solve this problem, some are discussed later.)
|
|
</para>
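<para>
The failure can be modelled without Hibernate. In the toy sketch below,
<literal>ToySession</literal> and <literal>LazyPermissions</literal> are hypothetical classes,
not Hibernate types, and the exception type is a plain <literal>IllegalStateException</literal>.
The sketch only mimics the rule that a lazy collection can load its state while its session is
open and not afterwards.
</para>

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-ins; these are NOT Hibernate classes.
final class ToySession {
    private boolean open = true;

    boolean isOpen() { return open; }
    void close() { open = false; }

    // Pretends to run the deferred SELECT for the permissions collection.
    Map<String, Integer> loadPermissions() {
        Map<String, Integer> rows = new HashMap<String, Integer>();
        rows.put("accounts", Integer.valueOf(3));
        return rows;
    }
}

final class LazyPermissions {
    private final ToySession session;
    private Map<String, Integer> loaded; // null until the first access

    LazyPermissions(ToySession session) { this.session = session; }

    Integer get(String key) {
        if (loaded == null) {
            if (!session.isOpen()) {
                throw new IllegalStateException(
                        "collection not initialized and session is closed");
            }
            loaded = session.loadPermissions();
        }
        return loaded.get(key);
    }
}
```

<para>
Reading <literal>get("accounts")</literal> before <literal>close()</literal> initializes the
collection, and later reads keep working; a first read after <literal>close()</literal> throws.
</para>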
|
|
|
|
<para>
|
|
It's possible to use a non-lazy collection. However, it is intended that lazy
|
|
initialization be used for almost all collections, especially for collections
|
|
of entity references (it's the default). If you define too many non-lazy associations
|
|
in your object model, Hibernate will end up needing to fetch the entire database
|
|
into memory in every transaction! Still, sometimes you want to use an additional
|
|
<literal>SELECT</literal> for a particular collection right away, not deferred
|
|
until the first access happens:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<set name="permissions" fetch="select">
|
|
<key column="USER_ID"/>
|
|
<one-to-many class="Permission"/>
|
|
</set>]]></programlisting>
|
|
|
|
<para>
|
|
Hibernate will now execute an immediate second <literal>SELECT</literal> loading
|
|
the collection of <literal>Permission</literal> instances, when a particular
|
|
<literal>User</literal> is retrieved.
|
|
</para>
|
|
|
|
<para>
|
|
Any kind of lazy fetching (and also Select fetching) is extremely vulnerable to
|
|
N+1 selects problems. So usually, we choose lazy fetching only as a default
|
|
strategy, and override it for a particular transaction, using the HQL
|
|
<literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch the
|
|
association eagerly in the first select, using an outer join. In the
|
|
<literal>Criteria</literal> API, you would use
|
|
<literal>setFetchMode(FetchMode.EAGER)</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
You can always force outer join association fetching in the mapping file, by setting
|
|
<literal>fetch="join"</literal> (or use the old <literal>outer-join="true"</literal>
|
|
syntax). We don't recommend this setting, especially not for collections, since it is
|
|
incredibly rare to find an entity which is <emphasis>always</emphasis> used when
|
|
an associated entity is used, at least in a sufficiently large system.
|
|
</para>
|
|
|
|
<para>
|
|
Eager fetching for collections has another restriction: you may only set one
|
|
collection role per persistent class to be fetched per outer join. Hibernate forbids
|
|
Cartesian products when possible; <literal>SELECT</literal>ing two collections per
outer join would create one. This would almost always be slower than two (lazy or
non-deferred) <literal>SELECT</literal>s. The restriction to a single outer-joined
|
|
collection applies to both the mapping fetching strategies and to HQL/Criteria queries.
|
|
</para>
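<para>
The cost of such a Cartesian product is easy to estimate. The sketch below is a rough model for
a single owning instance with two collections; <literal>JoinRowCount</literal> is a hypothetical
name, and the model assumes an outer join still returns one row for an empty collection.
</para>

```java
final class JoinRowCount {

    // One outer join across two collections of the same owner: every element
    // of the first collection is paired with every element of the second.
    // An empty collection still contributes one (NULL-filled) row.
    static long rowsWithSingleJoin(long firstSize, long secondSize) {
        return Math.max(firstSize, 1) * Math.max(secondSize, 1);
    }

    // Two separate SELECTs: each returns only the rows of its own collection.
    static long rowsWithTwoSelects(long firstSize, long secondSize) {
        return firstSize + secondSize;
    }
}
```

<para>
With collections of 100 and 50 elements, the single join would return 5000 rows, while two
separate <literal>SELECT</literal>s return 150 rows in total.
</para>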
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-fetching-proxies" revision="2">
|
|
<title>Single-ended association proxies</title>
|
|
|
|
<para>
|
|
Lazy fetching for collections is implemented using Hibernate's own implementation
|
|
of persistent collections. However, a different mechanism is needed for lazy
|
|
behavior in single-ended associations. The target entity of the association must
|
|
be proxied. Hibernate implements lazy initializing proxies for persistent objects
|
|
using runtime bytecode enhancement (via the excellent CGLIB library).
|
|
</para>
|
|
|
|
<para>
|
|
By default, Hibernate3 generates proxies (at startup) for all persistent classes
|
|
and uses them to enable lazy fetching of <literal>many-to-one</literal> and
|
|
<literal>one-to-one</literal> associations.
|
|
</para>
|
|
|
|
<para>
|
|
The mapping file may declare an interface to use as the proxy interface for that
|
|
class, with the <literal>proxy</literal> attribute. By default, Hibernate uses a subclass
|
|
of the class. <emphasis>Note that the proxied class must implement a default constructor
|
|
with at least package visibility. We recommend this constructor for all persistent classes!</emphasis>
|
|
</para>
|
|
|
|
<para>
|
|
There are some gotchas to be aware of when extending this approach to polymorphic
|
|
classes, eg.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="Cat" proxy="Cat">
|
|
......
|
|
<subclass name="DomesticCat">
|
|
.....
|
|
</subclass>
|
|
</class>]]></programlisting>
|
|
|
|
<para>
|
|
Firstly, instances of <literal>Cat</literal> will never be castable to
|
|
<literal>DomesticCat</literal>, even if the underlying instance is an
|
|
instance of <literal>DomesticCat</literal>:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id); // instantiate a proxy (does not hit the db)
|
|
if ( cat.isDomesticCat() ) { // hit the db to initialize the proxy
|
|
DomesticCat dc = (DomesticCat) cat; // Error!
|
|
....
|
|
}]]></programlisting>
|
|
|
|
<para>
|
|
Secondly, it is possible to break proxy <literal>==</literal>.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[
|
|
Cat cat = (Cat) session.load(Cat.class, id); // instantiate a Cat proxy
|
|
DomesticCat dc =
|
|
(DomesticCat) session.load(DomesticCat.class, id); // required new DomesticCat proxy!
|
|
System.out.println(cat==dc); // false]]></programlisting>
|
|
|
|
<para>
|
|
However, the situation is not quite as bad as it looks. Even though we now have two references
|
|
to different proxy objects, the underlying instance will still be the same object:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[cat.setWeight(11.0); // hit the db to initialize the proxy
|
|
System.out.println( dc.getWeight() ); // 11.0]]></programlisting>
|
|
|
|
<para>
|
|
Third, you may not use a CGLIB proxy for a <literal>final</literal> class or a class
|
|
with any <literal>final</literal> methods.
|
|
</para>
|
|
|
|
<para>
|
|
Finally, if your persistent object acquires any resources upon instantiation (eg. in
|
|
initializers or default constructor), then those resources will also be acquired by
|
|
the proxy. The proxy class is an actual subclass of the persistent class.
|
|
</para>
|
|
|
|
<para>
|
|
These problems are all due to fundamental limitations in Java's single inheritance model.
|
|
If you wish to avoid these problems your persistent classes must each implement an interface
|
|
that declares its business methods. You should specify these interfaces in the mapping file. eg.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="CatImpl" proxy="Cat">
|
|
......
|
|
<subclass name="DomesticCatImpl" proxy="DomesticCat">
|
|
.....
|
|
</subclass>
|
|
</class>]]></programlisting>
|
|
|
|
<para>
|
|
where <literal>CatImpl</literal> implements the interface <literal>Cat</literal> and
<literal>DomesticCatImpl</literal> implements the interface <literal>DomesticCat</literal>. Then
|
|
proxies for instances of <literal>Cat</literal> and <literal>DomesticCat</literal> may be returned
|
|
by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>find()</literal>
|
|
does not usually return proxies.)
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Cat cat = (Cat) session.load(CatImpl.class, catid);
|
|
Iterator iter = session.iterate("from cat in class CatImpl where cat.name='fritz'");
|
|
Cat fritz = (Cat) iter.next();]]></programlisting>
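<para>
The behaviour of interface-based proxies can be tried out with the JDK's own
<literal>java.lang.reflect.Proxy</literal>. This is only an analogy: Hibernate builds its
proxies with CGLIB, and <literal>ProxySketch</literal> and the accessor methods below are
hypothetical. The sketch shows that a proxy exposing only <literal>Cat</literal> cannot be
treated as a <literal>DomesticCat</literal>, and that two distinct proxies can still share one
underlying instance.
</para>

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

interface Cat {
    double getWeight();
    void setWeight(double weight);
}

interface DomesticCat extends Cat {
    String getName();
}

final class DomesticCatImpl implements DomesticCat {
    private double weight;

    public double getWeight() { return weight; }
    public void setWeight(double weight) { this.weight = weight; }
    public String getName() { return "fritz"; }
}

final class ProxySketch {

    // Builds a proxy that exposes only the given interface and forwards
    // every call to the target object.
    static Object proxyFor(final Object target, Class<?> exposedInterface) {
        InvocationHandler handler = new InvocationHandler() {
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                return method.invoke(target, args);
            }
        };
        return Proxy.newProxyInstance(
                exposedInterface.getClassLoader(),
                new Class<?>[] { exposedInterface },
                handler);
    }
}
```

<para>
A proxy created for <literal>Cat.class</literal> is not an instance of
<literal>DomesticCat</literal> even when the target is a <literal>DomesticCatImpl</literal>,
whereas a second proxy created for <literal>DomesticCat.class</literal> is a different object
that sees the same state.
</para>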
|
|
|
|
<para>
|
|
Relationships are also lazily initialized. This means you must declare any properties to be of
|
|
type <literal>Cat</literal>, not <literal>CatImpl</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
Certain operations do <emphasis>not</emphasis> require proxy initialization
|
|
</para>
|
|
|
|
<itemizedlist spacing="compact">
|
|
<listitem>
|
|
<para>
|
|
<literal>equals()</literal>, if the persistent class does not override
|
|
<literal>equals()</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
<literal>hashCode()</literal>, if the persistent class does not override
|
|
<literal>hashCode()</literal>
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The identifier getter method
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
Hibernate will detect persistent classes that override <literal>equals()</literal> or
|
|
<literal>hashCode()</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
You may of course also use Eager or Select fetching strategies for single-ended
|
|
associations:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<many-to-one name="mother" class="Cat" fetch="join"/>
|
|
<many-to-one name="father" class="Cat" fetch="select"/>]]></programlisting>
|
|
|
|
<para>
|
|
The first mapping tells Hibernate to fetch the associated <literal>mother</literal>
|
|
entity in the same initial <literal>SELECT</literal> using an <literal>OUTER JOIN</literal>.
|
|
You can set this option on as many *-to-one associations as you like; there is no
danger of creating a Cartesian product (as opposed to collections). Note that you can
|
|
set the maximum depth of outer joined tables with the global configuration option
|
|
<literal>max_fetch_depth</literal> (see <xref linkend="configuration-optional-outerjoin"/>).
|
|
</para>
|
|
|
|
<para>
|
|
The second mapping enables an additional <literal>SELECT</literal> for the
|
|
retrieval of the <literal>father</literal>. Note that Hibernate does not guarantee
|
|
<emphasis>when</emphasis> this query will be executed. If it should be executed
|
|
immediately (right after the initial <literal>SELECT</literal>), disable proxying
|
|
on the target of the association by setting it to <literal>lazy="false"</literal>:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="Cat" lazy="false">...</class>]]></programlisting>
|
|
|
|
<para>
|
|
(Note that this example uses only a single persistent class <literal>Cat</literal>
|
|
and self-referencing associations. This doesn't change the fetching behavior, as expected.)
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-fetching-initialization">
|
|
<title>Initializing collections and proxies</title>
|
|
|
|
<para>
|
|
An exception (<literal>LazyInitializationException</literal>) will be thrown by
|
|
Hibernate if an uninitialized collection or proxy is accessed outside of the scope
|
|
of the <literal>Session</literal>, ie. when the entity owning the collection or
|
|
having the reference to the proxy is in detached state.
|
|
</para>
|
|
|
|
<para>
|
|
Sometimes we need to ensure that a proxy or collection is initialized before closing the
|
|
<literal>Session</literal>. Of course, we can always force initialization by calling
|
|
<literal>cat.getSex()</literal> or <literal>cat.getKittens().size()</literal>, for example.
|
|
But that is confusing to readers of the code and is not convenient for generic code.
|
|
</para>
|
|
|
|
<para>
|
|
The static methods <literal>Hibernate.initialize()</literal> and <literal>Hibernate.isInitialized()</literal>
|
|
provide the application with a convenient way of working with lazily initialized collections or
|
|
proxies. <literal>Hibernate.initialize(cat)</literal> will force the initialization of a proxy,
|
|
<literal>cat</literal>, as long as its <literal>Session</literal> is still open.
|
|
<literal>Hibernate.initialize( cat.getKittens() )</literal> has a similar effect for the collection
|
|
of kittens.
|
|
</para>
|
|
|
|
<para>
|
|
Another option is to keep the <literal>Session</literal> open until all needed
|
|
collections and proxies have been loaded. In some application architectures,
|
|
particularly where the code that accesses data using Hibernate, and the code that
|
|
uses it are in different application layers, it can be a problem to ensure that the
|
|
<literal>Session</literal> is open when a collection is initialized. There are
|
|
two basic ways to deal with this issue:
|
|
</para>
|
|
|
|
<itemizedlist>
|
|
<listitem>
|
|
<para>
|
|
In a web-based application, a servlet filter can be used to close the
|
|
<literal>Session</literal> only at the very end of a user request, once
|
|
the rendering of the view is complete (the <emphasis>Open Session in
|
|
View</emphasis> pattern). Of course, this places heavy
|
|
demands on the correctness of the exception handling of your application
|
|
infrastructure. It is vitally important that the <literal>Session</literal>
|
|
is closed and the transaction ended before returning to the user, even
|
|
when an exception occurs during rendering of the view. The servlet filter
|
|
has to be able to access the <literal>Session</literal> for this approach.
|
|
We recommend that a <literal>ThreadLocal</literal> variable be used to
|
|
hold the current <literal>Session</literal> (see chapter 1,
|
|
<xref linkend="quickstart-playingwithcats"/>, for an example implementation).
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
In an application with a separate business tier, the business logic must
|
|
"prepare" all collections that will be needed by the web tier before
|
|
returning. This means that the business tier should load all the data and
|
|
return all the data already initialized to the presentation/web tier that
|
|
is required for a particular use case. Usually, the application calls
|
|
<literal>Hibernate.initialize()</literal> for each collection that will
|
|
be needed in the web tier (this call must occur before the session is closed)
|
|
or retrieves the collection eagerly using a Hibernate query with a
|
|
<literal>FETCH</literal> clause or a <literal>FetchMode.JOIN</literal> in
|
|
<literal>Criteria</literal>. This is usually easier if you adopt the
|
|
<emphasis>Command</emphasis> pattern instead of a <emphasis>Session Facade</emphasis>.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
You may also attach a previously loaded object to a new <literal>Session</literal>
|
|
with <literal>merge()</literal> or <literal>lock()</literal> before
|
|
accessing uninitialized collections (or other proxies). Hibernate cannot
|
|
do this automatically, as it would introduce ad hoc transaction semantics!
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
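<para>
For the first option above, the essential point is that the session is closed on every path,
including when rendering fails. The sketch below models only that guarantee with a
<literal>try</literal>/<literal>finally</literal> block. <literal>ToyViewSession</literal> and
<literal>OpenSessionInViewSketch</literal> are hypothetical names; a real servlet filter would
also end the transaction and would work with an actual Hibernate <literal>Session</literal>.
</para>

```java
// Toy stand-in for a session; NOT the Hibernate Session interface.
final class ToyViewSession {
    private boolean open = true;

    boolean isOpen() { return open; }
    void close() { open = false; }
}

final class OpenSessionInViewSketch {

    // Runs the request (business logic plus view rendering) and closes the
    // session afterwards, whether rendering succeeds or throws.
    static void handleRequest(ToyViewSession session, Runnable businessLogicAndView) {
        try {
            businessLogicAndView.run();
        } finally {
            session.close();
        }
    }
}
```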
|
|
|
|
<para>
|
|
Sometimes you don't want to initialize a large collection, but still need some
|
|
information about it (like its size) or a subset of the data.
|
|
</para>
|
|
|
|
<para>
|
|
You can use a collection filter to get the size of a collection without initializing it:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[( (Integer) s.createFilter( collection, "select count(*)" ).list().get(0) ).intValue()]]></programlisting>
|
|
|
|
<para>
|
|
The <literal>createFilter()</literal> method is also used to efficiently retrieve subsets
|
|
of a collection without needing to initialize the whole collection:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[s.createFilter( lazyCollection, "").setFirstResult(0).setMaxResults(10).list();]]></programlisting>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-fetching-batch">
|
|
<title>Using batch fetching</title>
|
|
|
|
<para>
|
|
Hibernate can make efficient use of batch fetching, that is, Hibernate can load several uninitialized
proxies (or collections) if one proxy (or collection) is accessed. Batch fetching is an optimization for the lazy
loading strategy. There are two ways you can tune batch fetching: on the class and the collection level.
|
|
</para>
|
|
|
|
<para>
|
|
Batch fetching for classes/entities is easier to understand. Imagine you have the following situation
|
|
at runtime: You have 25 <literal>Cat</literal> instances loaded in a <literal>Session</literal>, each
|
|
<literal>Cat</literal> has a reference to its <literal>owner</literal>, a <literal>Person</literal>.
|
|
The <literal>Person</literal> class is mapped with a proxy, <literal>lazy="true"</literal>. If you now
|
|
iterate through all cats and call <literal>getOwner()</literal> on each, Hibernate will by default
|
|
execute 25 <literal>SELECT</literal> statements, to retrieve the proxied owners. You can tune this
|
|
behavior by specifying a <literal>batch-size</literal> in the mapping of <literal>Person</literal>:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="Person" batch-size="10">...</class>]]></programlisting>
|
|
|
|
<para>
|
|
Hibernate will now execute only three queries; the pattern is 10, 10, 5. You can see that batch fetching
is a blind guess as far as performance optimization goes: it depends on the number of uninitialized proxies
|
|
in a particular <literal>Session</literal>.
|
|
</para>
|
|
|
|
<para>
|
|
You may also enable batch fetching of collections. For example, if each <literal>Person</literal> has
|
|
a lazy collection of <literal>Cat</literal>s, and 10 persons are currently loaded in the
|
|
<literal>Session</literal>, iterating through all persons will generate 10 <literal>SELECT</literal>s,
|
|
one for every call to <literal>getCats()</literal>. If you enable batch fetching for the
|
|
<literal>cats</literal> collection in the mapping of <literal>Person</literal>, Hibernate can pre-fetch
|
|
collections:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="Person">
|
|
<set name="cats" batch-size="3">
|
|
...
|
|
</set>
|
|
</class>]]></programlisting>
|
|
|
|
<para>
|
|
With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in 4
|
|
<literal>SELECT</literal>s. Again, the value of the attribute depends on the expected number of
|
|
uninitialized collections in a particular <literal>Session</literal>.
|
|
</para>
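<para>
The batch patterns quoted in this section (10, 10, 5 and 3, 3, 3, 1) follow from a simple
split. The sketch below reproduces only that arithmetic; <literal>BatchFetchPlan</literal> is a
hypothetical helper and is not how Hibernate's loaders are implemented.
</para>

```java
import java.util.ArrayList;
import java.util.List;

final class BatchFetchPlan {

    // Splits the uninitialized proxies or collections into consecutive
    // batches of at most batchSize; each batch stands for one SELECT.
    static List<Integer> batches(int uninitialized, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<Integer> plan = new ArrayList<Integer>();
        int remaining = uninitialized;
        while (remaining > 0) {
            int next = Math.min(batchSize, remaining);
            plan.add(Integer.valueOf(next));
            remaining -= next;
        }
        return plan;
    }
}
```

<para>
The size of the returned list is the number of <literal>SELECT</literal>s in this model, so a
batch size of 1 corresponds to the unbatched case of one query per proxy or collection.
</para>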
|
|
|
|
<para>
|
|
Batch fetching of collections is particularly useful if you have a nested tree of items, ie.
|
|
the typical bill-of-materials pattern. (Although a <emphasis>nested set</emphasis> or a
|
|
<emphasis>materialized path</emphasis> might be a better option for read-mostly trees.)
|
|
</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="performance-fetching-lazyproperties">
|
|
<title>Using lazy property fetching</title>
|
|
|
|
<para>
|
|
Hibernate3 supports the lazy fetching of individual properties. This optimization technique
|
|
is also known as <emphasis>fetch groups</emphasis>. Please note that this is mostly a
|
|
marketing feature, as in practice, optimizing row reads is much more important than
|
|
optimization of column reads. However, only loading some properties of a class might
|
|
be useful in extreme cases, when legacy tables have hundreds of columns and the data model
|
|
can not be improved.
|
|
</para>
|
|
|
|
<para>
|
|
To enable lazy property loading, set the <literal>lazy</literal> attribute on your
|
|
particular property mappings:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<class name="Document">
|
|
<id name="id">
|
|
<generator class="native"/>
|
|
</id>
|
|
<property name="name" not-null="true" length="50"/>
|
|
<property name="summary" not-null="true" length="200" lazy="true"/>
|
|
<property name="text" not-null="true" length="2000" lazy="true"/>
|
|
</class>]]></programlisting>
|
|
|
|
<para>
|
|
Lazy property loading requires buildtime bytecode instrumentation! If your persistent
|
|
classes are not enhanced, Hibernate will silently ignore lazy property settings and
|
|
fall back to immediate fetching.
|
|
</para>
|
|
|
|
<para>
|
|
For bytecode instrumentation, use the following Ant task:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[<target name="instrument" depends="compile">
|
|
<taskdef name="instrument" classname="org.hibernate.tool.instrument.InstrumentTask">
|
|
<classpath path="${jar.path}"/>
|
|
<classpath path="${classes.dir}"/>
|
|
<classpath refid="lib.class.path"/>
|
|
</taskdef>
|
|
|
|
<instrument verbose="true">
|
|
<fileset dir="${testclasses.dir}/org/hibernate/auction/model">
|
|
<include name="*.class"/>
|
|
</fileset>
|
|
</instrument>
|
|
</target>]]></programlisting>
|
|
|
|
<para>
|
|
A different (better?) way to avoid unnecessary column reads, at least for
read-only transactions, is to use the projection features of HQL. This avoids
|
|
the need for buildtime bytecode processing.
|
|
</para>

            <para>
                TODO: Document issues with lazy property loading
            </para>

        </sect2>

        <para>
            A completely different way to avoid problems with N+1 selects is to use the second-level
            cache.
        </para>

    </sect1>

    <sect1 id="performance-cache" revision="1">
        <title>The Second Level Cache</title>

        <para>
            A Hibernate <literal>Session</literal> is a transaction-level cache of persistent data. It is
            possible to configure a cluster or JVM-level (<literal>SessionFactory</literal>-level) cache on
            a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be
            careful. Caches are never aware of changes made to the persistent store by another application
            (though they may be configured to regularly expire cached data).
        </para>

        <para>
            By default, Hibernate uses EHCache for JVM-level caching. (JCS support is now deprecated and will
            be removed in a future version of Hibernate.) You may choose a different implementation by
            specifying the name of a class that implements <literal>org.hibernate.cache.CacheProvider</literal>
            using the property <literal>hibernate.cache.provider_class</literal>.
        </para>

        <table frame="topbot" id="cacheproviders" revision="1">
            <title>Cache Providers</title>
            <tgroup cols='5' align='left' colsep='1' rowsep='1'>
            <colspec colname='c1' colwidth="1*"/>
            <colspec colname='c2' colwidth="3*"/>
            <colspec colname='c3' colwidth="1*"/>
            <colspec colname='c4' colwidth="1*"/>
            <colspec colname='c5' colwidth="1*"/>
            <thead>
            <row>
              <entry>Cache</entry>
              <entry>Provider class</entry>
              <entry>Type</entry>
              <entry>Cluster Safe</entry>
              <entry>Query Cache Supported</entry>
            </row>
            </thead>
            <tbody>
            <row>
                <entry>Hashtable (not intended for production use)</entry>
                <entry><literal>org.hibernate.cache.HashtableCacheProvider</literal></entry>
                <entry>memory</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>EHCache</entry>
                <entry><literal>org.hibernate.cache.EhCacheProvider</literal></entry>
                <entry>memory, disk</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>OSCache</entry>
                <entry><literal>org.hibernate.cache.OSCacheProvider</literal></entry>
                <entry>memory, disk</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>SwarmCache</entry>
                <entry><literal>org.hibernate.cache.SwarmCacheProvider</literal></entry>
                <entry>clustered (ip multicast)</entry>
                <entry>yes (clustered invalidation)</entry>
                <entry></entry>
            </row>
            <row>
                <entry>JBoss TreeCache</entry>
                <entry><literal>org.hibernate.cache.TreeCacheProvider</literal></entry>
                <entry>clustered (ip multicast), transactional</entry>
                <entry>yes (replication)</entry>
                <entry>yes (clock sync req.)</entry>
            </row>
            </tbody>
            </tgroup>
        </table>
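
        <para>
            As an illustration, a provider other than the default could be selected in
            <literal>hibernate.cfg.xml</literal> roughly as follows. This is only a fragment,
            and OSCache is chosen arbitrarily from the providers listed in this section.
        </para>

        <programlisting><![CDATA[<hibernate-configuration>
    <session-factory>
        <!-- select OSCache instead of the default provider (arbitrary choice for this sketch) -->
        <property name="hibernate.cache.provider_class">org.hibernate.cache.OSCacheProvider</property>
    </session-factory>
</hibernate-configuration>]]></programlisting>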

        <sect2 id="performance-cache-mapping">
            <title>Cache mappings</title>

            <para>
                The <literal>&lt;cache&gt;</literal> element of a class or collection mapping has the
                following form:
            </para>

            <programlistingco>
                <areaspec>
                    <area id="cache1" coords="2 70"/>
                </areaspec>
                <programlisting><![CDATA[<cache
    usage="transactional|read-write|nonstrict-read-write|read-only"
/>]]></programlisting>
                <calloutlist>
                    <callout arearefs="cache1">
                        <para>
                            <literal>usage</literal> specifies the caching strategy:
                            <literal>transactional</literal>,
                            <literal>read-write</literal>,
                            <literal>nonstrict-read-write</literal> or
                            <literal>read-only</literal>
                        </para>
                    </callout>
                </calloutlist>
            </programlistingco>

            <para>
                Alternatively (preferably?), you may specify <literal>&lt;class-cache&gt;</literal> and
                <literal>&lt;collection-cache&gt;</literal> elements in <literal>hibernate.cfg.xml</literal>.
            </para>
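
            <para>
                A minimal sketch of the <literal>hibernate.cfg.xml</literal> variant is shown here,
                using the <literal>eg.Cat</literal> class and its <literal>kittens</literal>
                collection from the examples in this chapter. The mapping resource path is an
                assumption.
            </para>

            <programlisting><![CDATA[<hibernate-configuration>
    <session-factory>
        <!-- hypothetical location of the Cat mapping document -->
        <mapping resource="eg/Cat.hbm.xml"/>

        <!-- cache declarations follow the mappings they refer to -->
        <class-cache class="eg.Cat" usage="read-write"/>
        <collection-cache collection="eg.Cat.kittens" usage="read-write"/>
    </session-factory>
</hibernate-configuration>]]></programlisting>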

            <para>
                The <literal>usage</literal> attribute specifies a <emphasis>cache concurrency strategy</emphasis>.
            </para>

        </sect2>

        <sect2 id="performance-cache-readonly">
            <title>Strategy: read only</title>

            <para>
                If your application needs to read but never modify instances of a persistent class, a
                <literal>read-only</literal> cache may be used. This is the simplest and best performing
                strategy. It is even perfectly safe for use in a cluster.
            </para>

            <programlisting><![CDATA[<class name="eg.Immutable" mutable="false">
    <cache usage="read-only"/>
    ....
</class>]]></programlisting>

        </sect2>


        <sect2 id="performance-cache-readwrite">
            <title>Strategy: read/write</title>

            <para>
                If the application needs to update data, a <literal>read-write</literal> cache might be appropriate.
                This cache strategy should never be used if serializable transaction isolation level is required.
                If the cache is used in a JTA environment, you must specify the property
                <literal>hibernate.transaction.manager_lookup_class</literal>, naming a strategy for obtaining the
                JTA <literal>TransactionManager</literal>. In other environments, you should ensure that the transaction
                is completed when <literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
                If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation
                supports locking. The built-in cache providers do <emphasis>not</emphasis>.
            </para>

            <programlisting><![CDATA[<class name="eg.Cat" .... >
    <cache usage="read-write"/>
    ....
    <set name="kittens" ... >
        <cache usage="read-write"/>
        ....
    </set>
</class>]]></programlisting>

        </sect2>

        <sect2 id="performance-cache-nonstrict">
            <title>Strategy: nonstrict read/write</title>

            <para>
                If the application only occasionally needs to update data (i.e. if it is extremely unlikely that two
                transactions would try to update the same item simultaneously) and strict transaction isolation is
                not required, a <literal>nonstrict-read-write</literal> cache might be appropriate. If the cache is
                used in a JTA environment, you must specify <literal>hibernate.transaction.manager_lookup_class</literal>.
                In other environments, you should ensure that the transaction is completed when
                <literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
            </para>
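
            <para>
                A mapping for this strategy might look like the following sketch. The class
                <literal>eg.Category</literal> and its properties are hypothetical and stand in
                for any entity that is read often and updated rarely.
            </para>

            <programlisting><![CDATA[<!-- hypothetical entity: read frequently, updated only occasionally -->
<class name="eg.Category">
    <cache usage="nonstrict-read-write"/>
    <id name="id">
        <generator class="native"/>
    </id>
    <property name="name"/>
</class>]]></programlisting>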

        </sect2>

        <sect2 id="performance-cache-transactional">
            <title>Strategy: transactional</title>

            <para>
                The <literal>transactional</literal> cache strategy provides support for fully transactional cache
                providers such as JBoss TreeCache. Such a cache may only be used in a JTA environment and you must
                specify <literal>hibernate.transaction.manager_lookup_class</literal>.
            </para>
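
            <para>
                The settings involved could be combined roughly as in the fragment below. It assumes
                a JBoss environment, so it uses
                <literal>org.hibernate.transaction.JBossTransactionManagerLookup</literal>; other
                application servers need a different lookup class. The mapping resource path and the
                cached class are placeholders.
            </para>

            <programlisting><![CDATA[<hibernate-configuration>
    <session-factory>
        <!-- transactional cache provider -->
        <property name="hibernate.cache.provider_class">org.hibernate.cache.TreeCacheProvider</property>
        <!-- strategy for locating the JTA TransactionManager (JBoss assumed here) -->
        <property name="hibernate.transaction.manager_lookup_class">org.hibernate.transaction.JBossTransactionManagerLookup</property>

        <!-- hypothetical mapping document and cached class -->
        <mapping resource="eg/Cat.hbm.xml"/>
        <class-cache class="eg.Cat" usage="transactional"/>
    </session-factory>
</hibernate-configuration>]]></programlisting>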

        </sect2>

        <para>
            None of the cache providers supports all of the cache concurrency strategies. The following table shows
            which providers are compatible with which concurrency strategies.
        </para>

        <table frame="topbot">
            <title>Cache Concurrency Strategy Support</title>
            <tgroup cols='5' align='left' colsep='1' rowsep='1'>
            <colspec colname='c1' colwidth="1*"/>
            <colspec colname='c2' colwidth="1*"/>
            <colspec colname='c3' colwidth="1*"/>
            <colspec colname='c4' colwidth="1*"/>
            <colspec colname='c5' colwidth="1*"/>
            <thead>
            <row>
              <entry>Cache</entry>
              <entry>read-only</entry>
              <entry>nonstrict-read-write</entry>
              <entry>read-write</entry>
              <entry>transactional</entry>
            </row>
            </thead>
            <tbody>
            <row>
                <entry>Hashtable (not intended for production use)</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>EHCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>OSCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>SwarmCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
                <entry></entry>
            </row>
            <row>
                <entry>JBoss TreeCache</entry>
                <entry>yes</entry>
                <entry></entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            </tbody>
            </tgroup>
        </table>

    </sect1>

    <sect1 id="performance-sessioncache" revision="2">
        <title>Managing the <literal>Session</literal> Cache</title>

        <para>
            Whenever you pass an object to <literal>save()</literal>, <literal>update()</literal>
            or <literal>saveOrUpdate()</literal> and whenever you retrieve an object using
            <literal>load()</literal>, <literal>get()</literal>, <literal>list()</literal>,
            <literal>iterate()</literal> or <literal>scroll()</literal>, that object is added
            to the internal cache of the <literal>Session</literal>.
        </para>
        <para>
            When <literal>flush()</literal> is subsequently called,
            the state of that object will be synchronized with the database. If you do not want
            this synchronization to occur or if you are processing a huge number of objects and
            need to manage memory efficiently, the <literal>evict()</literal> method may be
            used to remove the object and its collections from the cache.
        </para>

        <programlisting><![CDATA[Iterator cats = sess.iterate("from eg.Cat as cat"); //a huge result set
while ( cats.hasNext() ) {
    Cat cat = (Cat) cats.next();
    doSomethingWithACat(cat);
    sess.evict(cat);
}]]></programlisting>

        <para>
            Hibernate will evict associated entities automatically if the association is mapped
            with <literal>cascade="all"</literal> or <literal>cascade="all-delete-orphan"</literal>.
        </para>

        <para>
            The <literal>Session</literal> also provides a <literal>contains()</literal> method
            to determine if an instance belongs to the session cache.
        </para>

        <para>
            To completely evict all objects from the session cache, call <literal>Session.clear()</literal>.
        </para>

        <para>
            For the second-level cache, there are methods defined on <literal>SessionFactory</literal> for
            evicting the cached state of an instance, entire class, collection instance or entire collection
            role.
        </para>

    </sect1>

    <sect1 id="performance-querycache" revision="1">
        <title>The Query Cache</title>

        <para>
            Query result sets may also be cached. This is only useful for queries that are run
            frequently with the same parameters. To use the query cache you must first enable it
            by setting the property <literal>hibernate.cache.use_query_cache=true</literal>. This
            causes the creation of two cache regions - one holding cached query result sets
            (<literal>org.hibernate.cache.QueryCache</literal>), the other holding timestamps
            of the most recent updates to queried tables
            (<literal>org.hibernate.cache.UpdateTimestampsCache</literal>). Note that the query
            cache does not cache the state of any entities in the result set; it caches only
            identifier values and results of value type. So the query cache is usually used in
            conjunction with the second-level cache.
        </para>
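
        <para>
            When the configuration is kept in <literal>hibernate.cfg.xml</literal>, the property
            could be set as in this fragment (a sketch showing only the relevant setting):
        </para>

        <programlisting><![CDATA[<hibernate-configuration>
    <session-factory>
        <!-- switch on the query cache; individual queries must still opt in -->
        <property name="hibernate.cache.use_query_cache">true</property>
    </session-factory>
</hibernate-configuration>]]></programlisting>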

        <para>
            Most queries do not benefit from caching, so by default queries are not cached. To
            enable caching, call <literal>Query.setCacheable(true)</literal>. This call allows
            the query to look for existing cache results or add its results to the cache when
            it is executed.
        </para>

        <para>
            If you require fine-grained control over query cache expiration policies, you may
            specify a named cache region for a particular query by calling
            <literal>Query.setCacheRegion()</literal>.
        </para>
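
        <para>
            The expiration policy itself is defined by the cache provider, not by Hibernate. With
            EHCache, for instance, a region named <literal>frontpages</literal> might be declared
            in <literal>ehcache.xml</literal> along these lines. The region name and all sizes and
            timeouts here are illustrative assumptions, not recommended values.
        </para>

        <programlisting><![CDATA[<ehcache>
    <!-- fallback settings for regions without an explicit entry -->
    <defaultCache
        maxElementsInMemory="10000"
        eternal="false"
        timeToIdleSeconds="120"
        timeToLiveSeconds="120"
        overflowToDisk="false"/>

    <!-- hypothetical query cache region: entries expire five minutes after creation -->
    <cache name="frontpages"
        maxElementsInMemory="100"
        eternal="false"
        timeToLiveSeconds="300"
        overflowToDisk="false"/>
</ehcache>]]></programlisting>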

        <programlisting><![CDATA[List blogs = sess.createQuery("from Blog blog where blog.blogger = :blogger")
    .setEntity("blogger", blogger)
    .setMaxResults(15)
    .setCacheable(true)
    .setCacheRegion("frontpages")
    .list();]]></programlisting>
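
        <para>
            The same caching options can also be declared on a named query in the mapping document,
            which keeps them out of application code. The query name
            <literal>blogsByBlogger</literal> in this sketch is hypothetical. The parameter binding
            and the result limit would still be applied in code.
        </para>

        <programlisting><![CDATA[<!-- hypothetical named query, placed inside <hibernate-mapping> -->
<query name="blogsByBlogger" cacheable="true" cache-region="frontpages">
    from Blog blog where blog.blogger = :blogger
</query>]]></programlisting>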

        <para>
            If the query should force a refresh of its query cache region, you may call
            <literal>Query.setForceCacheRefresh(true)</literal>.
            This is particularly useful in cases where underlying data may have been updated
            via a separate process (i.e., not modified through Hibernate) and allows the
            application to selectively refresh the query cache regions based on its
            knowledge of those events. This is an alternative to eviction of a query
            cache region. If you need fine-grained refresh control for many queries, use
            this function instead of a new region for each query.
        </para>

    </sect1>

</chapter>