fixed some probs with performance section

git-svn-id: https://svn.jboss.org/repos/hibernate/trunk/Hibernate3/doc@5546 1b8cb986-b30d-0410-93ca-fae66ebed9b2
Gavin King 2005-02-04 04:19:15 +00:00
parent 4b8b1266dc
commit b490a52e42
1 changed files with 315 additions and 291 deletions

@ -1,210 +1,15 @@
<chapter id="performance">
<title>Improving performance</title>
<sect1 id="performance-collections">
<title>Understanding Collection performance</title>
<para>
We've already spent quite some time talking about collections.
In this section we will highlight a couple more issues about
how collections behave at runtime.
</para>
<sect2 id="performance-collections-taxonomy">
<title>Taxonomy</title>
<para>Hibernate defines three basic kinds of collections:</para>
<itemizedlist>
<listitem>
<para>collections of values</para>
</listitem>
<listitem>
<para>one to many associations</para>
</listitem>
<listitem>
<para>many to many associations</para>
</listitem>
</itemizedlist>
<para>
This classification distinguishes the various table and foreign key
relationships but does not tell us quite everything we need to know
about the relational model. To fully understand the relational structure
and performance characteristics, we must also consider the structure of
the primary key that is used by Hibernate to update or delete collection
rows. This suggests the following classification:
</para>
<itemizedlist>
<listitem>
<para>indexed collections</para>
</listitem>
<listitem>
<para>sets</para>
</listitem>
<listitem>
<para>bags</para>
</listitem>
</itemizedlist>
<para>
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
columns. In this case collection updates are usually extremely efficient -
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
</para>
<para>
Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
as <literal>not-null="true"</literal>.)
</para>
<para>
<literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
</para>
<para>
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single <literal>DELETE</literal>) and recreating the collection whenever it
changes. This might be very inefficient.
</para>
<para>
Note that for a one-to-many association, the "primary key" may not be the physical
primary key of the database table - but even in this case, the above classification
is still useful. (It still reflects how Hibernate "locates" individual rows of the
collection.)
</para>
</sect2>
<sect2 id="performance-collections-mostefficientupdate">
<title>Lists, maps, idbags and sets are the most efficient collections to update</title>
<para>
From the discussion above, it should be clear that indexed collections
and (usually) sets allow the most efficient operation in terms of adding,
removing and updating elements.
</para>
<para>
There is, arguably, one more advantage that indexed collections have over sets for
many to many associations or collections of values. Because of the structure of a
<literal>Set</literal>, Hibernate doesn't ever <literal>UPDATE</literal> a row when
an element is "changed". Changes to a <literal>Set</literal> always work via
<literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
again, this consideration does not apply to one to many associations.
</para>
<para>
After observing that arrays cannot be lazy, we would conclude that lists, maps and
idbags are the most performant (non-inverse) collection types, with sets not far
behind. Sets are expected to be the most common kind of collection in Hibernate
applications. This is because the "set" semantics are most natural in the relational
model.
</para>
<para>
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
</para>
</sect2>
<sect2 id="performance-collections-mostefficentinverse">
<title>Bags and lists are the most efficient inverse collections</title>
<para>
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with <literal>inverse="true"</literal>
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the bag elements! This is because
<literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
make the following common code much faster.
</para>
<programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c); //no need to fetch the collection!
sess.flush();]]></programlisting>
</sect2>
<sect2 id="performance-collections-oneshotdelete">
<title>One shot delete</title>
<para>
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
(if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
issue a single <literal>DELETE</literal> and we are done!
</para>
<para>
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
statements (unless the collection is a bag). This is certainly desirable.
</para>
<para>
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
</para>
<itemizedlist>
<listitem>
<para>delete eighteen rows one by one and then insert three rows</para>
</listitem>
<listitem>
<para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
all five current elements (one by one)</para>
</listitem>
</itemizedlist>
<para>
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
confuse database triggers, etc.)
</para>
<para>
Fortunately, you can force this behaviour (i.e. the second strategy) at any time by discarding
(i.e. dereferencing) the original collection and returning a newly instantiated collection with
all the current elements. This can be very useful and powerful from time to time.
</para>
<para>
Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
</para>
</sect2>
</sect1>
<sect1 id="performance-fetching">
<title>Fetching strategies</title>
<para>
A fetching strategy describes the number of instances, the depth of a
subgraph of instances, and SQL <literal>SELECT</literal>s that are used
to retrieve these instances. Hibernate supports several strategies and you
can configure them on a global level, per entity class, per association, or
even for a particular query in HQL and with <literal>Criteria</literal>.
A <emphasis>fetching strategy</emphasis> is the strategy Hibernate will
use for retrieving associated objects if the application needs to
navigate the association. Fetch strategies may be declared in the O/R
mapping metadata, or over-ridden by particular HQL or
<literal>Criteria</literal> query.
</para>
<para>
@ -212,126 +17,152 @@
</para>
<itemizedlist>
<listitem>
<listitem>
<para>
<emphasis>Lazy fetching</emphasis> - an associated instance (or a
collection) will only be loaded when needed, using an additional
deferred <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Batch fetching</emphasis> - an optimization strategy
for lazy fetching, Hibernate not only retrieves a single instance
(or collection), but several in the same <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Eager fetching</emphasis> - Hibernate retrieves the
associated instance (or collection) in the same <literal>SELECT</literal>,
<emphasis>Join fetching</emphasis> - Hibernate retrieves the
associated instance or collection in the same <literal>SELECT</literal>,
using an <literal>OUTER JOIN</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
is used to retrieve the associated instance (or collection), but
it might be executed immediately and not deferred until first access
(as with lazy fetching).
is used to retrieve the associated entity or collection. Unless
you explicitly disable lazy fetching by specifying <literal>lazy="false"</literal>,
this second select will only be executed when you actually access the
association.
</para>
</listitem>
<listitem>
<para>
<emphasis>Batch fetching</emphasis> - an optimization strategy
for select fetching - Hibernate retrieves a batch of entity instances
or collections in a single <literal>SELECT</literal>, by specifying
a list of primary keys or foreign keys.
</para>
</listitem>
</itemizedlist>
<para>
By default, Hibernate3 will only load the given entity using a single
<literal>SELECT</literal> statement if you retrieve an object with
<literal>load()</literal> or <literal>get()</literal>. This means that
all single-ended associations and collections are set for lazy fetching
by default. You can change this global default by setting the
<literal>default-lazy</literal> attribute on the <literal>hibernate-mapping</literal>
element to <literal>false</literal>.
By default, Hibernate3 always uses lazy select fetching, which is the best
default for most entities and collections in most applications. However,
there is one problem to be aware of. Access to a lazy association outside
of the context of an open Hibernate session will result in an exception.
For example:
</para>
<para>
We'll now have a closer look at the individual fetching strategies and how
to change them for single-ended associations and collections.
</para>
<sect2 id="performance-fetching-collections" revision="3">
<title>Collection fetching</title>
<para>
Initialization of collections owned by persistent instances happens transparently
to the user, so the application would not normally need to worry about this (in
fact, transparent lazy initialization is the main reason why Hibernate needs its
own collection implementations). However, if the application tries something like
this:
</para>
<programlisting><![CDATA[s = sessions.openSession();
User u = (User) s.find("from User u where u.name=?", userName, Hibernate.STRING).get(0);
Transaction tx = s.beginTransaction();
User u = (User) s.createQuery("from User u where u.name=:userName")
.setString("userName", userName).uniqueResult();
Map permissions = u.getPermissions();
s.connection().commit();
tx.commit();
s.close();
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>
<para>
It could be in for a nasty surprise. Since the permissions collection was not
initialized when the <literal>Session</literal> was closed, the collection
will not be able to load its state. <emphasis>Hibernate does not support lazy
initialization for detached objects</emphasis>. The fix is to move the
line that reads from the collection to just before the commit. (There are
other more advanced ways to solve this problem, some are discussed later.)
</para>
<para>
Since the permissions collection was not
initialized when the <literal>Session</literal> was closed, the collection
will not be able to load its state. <emphasis>Hibernate does not support lazy
initialization for detached objects</emphasis>. The fix is to move the
code that reads from the collection to just before the commit. (There are
other more advanced ways to solve this problem, some are discussed later.)
</para>
<para>
It's possible to use a non-lazy collection. However, it is intended that lazy
initialization be used for almost all collections, especially for collections
of entity references (it's the default). If you define too many non-lazy associations
in your object model, Hibernate will end up needing to fetch the entire database
into memory in every transaction! Still, sometimes you want to use an additional
<literal>SELECT</literal> for a particular collection right away, not deferred
until the first access happens:
</para>
<para>
It is also possible to use a non-lazy collection or join fetching, which is
non-lazy by nature. However, it is intended that lazy initialization be used
for almost all collections, especially for collections of entity references.
If you define too many non-lazy associations in your object model, Hibernate
will end up needing to fetch the entire database into memory in every
transaction!
</para>
<programlisting><![CDATA[<set name="permissions" fetch="select">
<key column="USER_ID"/>
<para>
On the other hand, we often want to choose join fetching (which is non-lazy by
nature) instead of select fetching in a particular transaction. We'll now see
how to customize the fetching strategy. In Hibernate3, the mechanisms for
choosing a fetch strategy are identical for single-valued associations and
collections.
</para>
<sect2 id="performance-fetching-custom" revision="3">
<title>Tuning fetch strategies</title>
<para>
Select fetching (the default) is extremely vulnerable to N+1 selects problems,
so we might want to enable join fetching in the mapping document:
</para>
<programlisting><![CDATA[<set name="permissions"
fetch="join">
<key column="userId"/>
<one-to-many class="Permission"/>
</set>]]></programlisting>
<para>
Hibernate will now execute an immediate second <literal>SELECT</literal> loading
the collection of <literal>Permission</literal> instances, when a particular
<literal>User</literal> is retrieved.
The fetch strategy defined in the mapping document affects:
</para>
<itemizedlist>
<listitem>
<para>
retrieval via <literal>get()</literal> or <literal>load()</literal>
</para>
</listitem>
<listitem>
<para>
retrieval that happens implicitly when an association is navigated
(lazy fetching)
</para>
</listitem>
<listitem>
<para>
<literal>Criteria</literal> queries
</para>
</listitem>
</itemizedlist>
<para>
<emphasis>The fetch strategy specified in the mapping document does not affect
HQL queries</emphasis>
</para>
<para>
Any kind of lazy fetching (and also Select fetching) is extremely vulnerable to
N+1 selects problems. So usually, we choose lazy fetching only as a default
strategy, and override it for a particular transaction, using the HQL
<literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch the
association eagerly in the first select, using an outer join. In the
<literal>Criteria</literal> API, you would use
<literal>setFetchMode(FetchMode.EAGER)</literal>.
Usually, we don't use the mapping document to customize fetching. Instead, we
keep the default behavior, and override it for a particular transaction, using
the HQL <literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch
the association eagerly in the first select, using an outer join. In the
<literal>Criteria</literal> query API, you would use
<literal>setFetchMode(FetchMode.JOIN)</literal>.
</para>
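<para>
To make the N+1 selects problem concrete, the following toy model counts the
statements issued when the permissions of every user are visited. It is only a
sketch: <literal>NPlusOneDemo</literal> and its methods are hypothetical names
that simulate statements with a counter, and none of it is Hibernate API.
</para>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical and are not Hibernate API.
final class NPlusOneDemo {
    private static int selects = 0; // number of simulated SELECT statements

    // Simulates "from User": one SELECT that returns n user ids.
    static List<Integer> selectUsers(int n) {
        selects++;
        List<Integer> ids = new ArrayList<>();
        for (int id = 1; id <= n; id++) {
            ids.add(id);
        }
        return ids;
    }

    // Simulates lazy initialization of one user's permissions: one SELECT per user.
    static List<String> selectPermissions(int userId) {
        selects++;
        return List.of("permission-of-" + userId);
    }

    // Simulates "from User u left join fetch u.permissions":
    // users and their permissions arrive in a single outer-joined SELECT.
    static Map<Integer, List<String>> selectUsersJoinPermissions(int n) {
        selects++;
        Map<Integer, List<String>> result = new LinkedHashMap<>();
        for (int id = 1; id <= n; id++) {
            result.put(id, List.of("permission-of-" + id));
        }
        return result;
    }

    // Select fetching: 1 statement for the users plus 1 per user whose collection is touched.
    static int lazyTraversal(int n) {
        selects = 0;
        for (int id : selectUsers(n)) {
            selectPermissions(id);
        }
        return selects;
    }

    // Join fetching: everything is loaded by the first statement.
    static int joinTraversal(int n) {
        selects = 0;
        selectUsersJoinPermissions(n);
        return selects;
    }

    public static void main(String[] args) {
        System.out.println("select fetching: " + lazyTraversal(10) + " statements"); // 11
        System.out.println("join fetching:   " + joinTraversal(10) + " statements"); // 1
    }
}
```

<para>
For ten users the model reports eleven statements with select fetching and one
with join fetching, which is the trade-off that <literal>LEFT JOIN FETCH</literal>
addresses for a particular transaction.
</para>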
<para>
You can always force outer join association fetching in the mapping file, by setting
<literal>fetch="join"</literal> (or use the old <literal>outer-join="true"</literal>
syntax). We don't recommend this setting, especially not for collections, since it is
incredibly rare to find an entity which is <emphasis>always</emphasis> used when
an associated entity is used, at least in a sufficiently large system.
If you ever feel like you wish you could change the fetching strategy used by
<literal>get()</literal> or <literal>load()</literal>, simply use a
<literal>Criteria</literal> query, for example:
</para>
<programlisting><![CDATA[User user = (User) session.createCriteria(User.class)
.setFetchMode("permissions", FetchMode.JOIN)
.add( Restrictions.idEq(userId) )
.uniqueResult();]]></programlisting>
<para>
Eager fetching for collections has another restriction: you may only set one
collection role per persistent class to be fetched per outer join. Hibernate forbids
(This is Hibernate's equivalent of what some ORM solutions call a "fetch plan".)
</para>
<para>
Join fetching for collections has one limitation: you may only set one collection
role per persistent class or query to be fetched per outer join. Hibernate forbids
Cartesian products when possible; <literal>SELECT</literal>ing two collections per
outer join would create one. This would almost always be slower than two (lazy or
non-deferred) <literal>SELECT</literal>s. The restriction to a single outer-joined
collection applies to both the mapping fetching strategies and to HQL/Criteria queries.
collection applies to both the mapping fetching strategies and to HQL/Criteria
queries.
</para>
</sect2>
@ -430,7 +261,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
where <literal>Cat</literal> implements the interface <literal>ICat</literal> and
<literal>DomesticCat</literal> implements the interface <literal>IDomesticCat</literal>. Then
proxies for instances of <literal>Cat</literal> and <literal>DomesticCat</literal> may be returned
by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>find()</literal>
by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>list()</literal>
does not usually return proxies.)
</para>
@ -608,8 +439,8 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
<para>
Hibernate can make efficient use of batch fetching, that is, Hibernate can load several uninitialized
proxies if one proxy is accessed (or collections). Batch fetching is an optimization for the lazy
loading strategy. There are two ways you can tune batch fetching: on the class and the collection level.
proxies if one proxy is accessed (or collections). Batch fetching is an optimization of the lazy select
fetching strategy. There are two ways you can tune batch fetching: on the class and the collection level.
</para>
<para>
@ -625,9 +456,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
<programlisting><![CDATA[<class name="Person" batch-size="10">...</class>]]></programlisting>
<para>
Hibernate will now execute only three queries, the pattern is 10, 10, 5. You can see that batch fetching
is a blind guess, as far as performance optimization goes, it depends on the number of uninitialized proxies
in a particular <literal>Session</literal>.
Hibernate will now execute only three queries; the pattern is 10, 10, 5.
</para>
<para>
@ -646,7 +475,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
</class>]]></programlisting>
<para>
With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in 4
With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in four
<literal>SELECT</literal>s. Again, the value of the attribute depends on the expected number of
uninitialized collections in a particular <literal>Session</literal>.
</para>
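<para>
The batch patterns quoted above follow from simple arithmetic. The sketch below
assumes 25 uninitialized proxies (which is what the 10, 10, 5 pattern implies for
a batch size of 10) and ten uninitialized collections (which is what 3, 3, 3, 1
implies for a batch size of 3). <literal>BatchPattern</literal> is a hypothetical
helper, and the way Hibernate actually chooses batch sizes internally may differ
from this simple model.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Hibernate API: models how many items each batched SELECT loads.
final class BatchPattern {
    // Splits `uninitialized` items into consecutive batches of at most `batchSize`.
    static List<Integer> batches(int uninitialized, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = uninitialized; remaining > 0; remaining -= batchSize) {
            sizes.add(Math.min(remaining, batchSize));
        }
        return sizes;
    }

    public static void main(String[] args) {
        System.out.println(batches(25, 10)); // [10, 10, 5]
        System.out.println(batches(10, 3));  // [3, 3, 3, 1]
    }
}
```

<para>
The number of <literal>SELECT</literal>s is the length of the returned list, so
a larger batch size lowers the statement count only while there are enough
uninitialized proxies or collections in the <literal>Session</literal> to fill
the batches.
</para>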
@ -1069,6 +898,201 @@ while ( cats.hasNext() ) {
</sect1>
<sect1 id="performance-collections">
<title>Understanding Collection performance</title>
<para>
We've already spent quite some time talking about collections.
In this section we will highlight a couple more issues about
how collections behave at runtime.
</para>
<sect2 id="performance-collections-taxonomy">
<title>Taxonomy</title>
<para>Hibernate defines three basic kinds of collections:</para>
<itemizedlist>
<listitem>
<para>collections of values</para>
</listitem>
<listitem>
<para>one to many associations</para>
</listitem>
<listitem>
<para>many to many associations</para>
</listitem>
</itemizedlist>
<para>
This classification distinguishes the various table and foreign key
relationships but does not tell us quite everything we need to know
about the relational model. To fully understand the relational structure
and performance characteristics, we must also consider the structure of
the primary key that is used by Hibernate to update or delete collection
rows. This suggests the following classification:
</para>
<itemizedlist>
<listitem>
<para>indexed collections</para>
</listitem>
<listitem>
<para>sets</para>
</listitem>
<listitem>
<para>bags</para>
</listitem>
</itemizedlist>
<para>
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
columns. In this case collection updates are usually extremely efficient -
the primary key may be efficiently indexed and a particular row may be efficiently
located when Hibernate tries to update or delete it.
</para>
<para>
Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one to many or many to many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
as <literal>not-null="true"</literal>.)
</para>
<para>
<literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
</para>
<para>
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single <literal>DELETE</literal>) and recreating the collection whenever it
changes. This might be very inefficient.
</para>
<para>
Note that for a one-to-many association, the "primary key" may not be the physical
primary key of the database table - but even in this case, the above classification
is still useful. (It still reflects how Hibernate "locates" individual rows of the
collection.)
</para>
</sect2>
<sect2 id="performance-collections-mostefficientupdate">
<title>Lists, maps, idbags and sets are the most efficient collections to update</title>
<para>
From the discussion above, it should be clear that indexed collections
and (usually) sets allow the most efficient operation in terms of adding,
removing and updating elements.
</para>
<para>
There is, arguably, one more advantage that indexed collections have over sets for
many to many associations or collections of values. Because of the structure of a
<literal>Set</literal>, Hibernate doesn't ever <literal>UPDATE</literal> a row when
an element is "changed". Changes to a <literal>Set</literal> always work via
<literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
again, this consideration does not apply to one to many associations.
</para>
<para>
After observing that arrays cannot be lazy, we would conclude that lists, maps and
idbags are the most performant (non-inverse) collection types, with sets not far
behind. Sets are expected to be the most common kind of collection in Hibernate
applications. This is because the "set" semantics are most natural in the relational
model.
</para>
<para>
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
</para>
</sect2>
<sect2 id="performance-collections-mostefficentinverse">
<title>Bags and lists are the most efficient inverse collections</title>
<para>
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with <literal>inverse="true"</literal>
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the bag elements! This is because
<literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
make the following common code much faster.
</para>
<programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c); //no need to fetch the collection!
sess.flush();]]></programlisting>
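<para>
The difference in the <literal>add()</literal> contract can be seen with the plain
<literal>java.util</literal> collections. This is only an illustration of the
contract: it uses <literal>ArrayList</literal> and <literal>HashSet</literal>
rather than Hibernate's own collection implementations, and
<literal>AddContractDemo</literal> is a hypothetical name.
</para>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustration of the java.util contract only; not Hibernate's persistent collections.
final class AddContractDemo {
    // A List accepts duplicates, so add() can answer true without looking at existing elements.
    static boolean secondAddToList() {
        List<String> bag = new ArrayList<>();
        bag.add("child");
        return bag.add("child");
    }

    // A Set must know its current contents to decide whether the element is new.
    static boolean secondAddToSet() {
        Set<String> set = new HashSet<>();
        set.add("child");
        return set.add("child");
    }

    public static void main(String[] args) {
        System.out.println("List.add of a duplicate: " + secondAddToList()); // true
        System.out.println("Set.add of a duplicate:  " + secondAddToSet());  // false
    }
}
```

<para>
Because the return value for a set depends on the elements already present,
a lazy set has to be initialized before <literal>add()</literal> can return,
whereas a bag or list can simply queue the addition.
</para>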
</sect2>
<sect2 id="performance-collections-oneshotdelete">
<title>One shot delete</title>
<para>
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
(if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
issue a single <literal>DELETE</literal> and we are done!
</para>
<para>
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
statements (unless the collection is a bag). This is certainly desirable.
</para>
<para>
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
</para>
<itemizedlist>
<listitem>
<para>delete eighteen rows one by one and then insert three rows</para>
</listitem>
<listitem>
<para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
all five current elements (one by one)</para>
</listitem>
</itemizedlist>
<para>
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
confuse database triggers, etc.)
</para>
<para>
Fortunately, you can force this behaviour (i.e. the second strategy) at any time by discarding
(i.e. dereferencing) the original collection and returning a newly instantiated collection with
all the current elements. This can be very useful and powerful from time to time.
</para>
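<para>
The statement counts for the two strategies in the example above can be worked out
as follows. This is a back-of-the-envelope sketch that only counts statements;
<literal>OneShotDeleteCost</literal> is a hypothetical helper, not Hibernate API,
and it ignores the cost of each individual statement.
</para>

```java
// Hypothetical helper, not Hibernate API: counts SQL statements for each strategy.
final class OneShotDeleteCost {
    // Row by row: one DELETE per removed element plus one INSERT per added element.
    static int rowByRow(int removed, int added) {
        return removed + added;
    }

    // One shot: a single DELETE for the whole collection,
    // then one INSERT for every element the collection now holds.
    static int oneShot(int finalSize) {
        return 1 + finalSize;
    }

    public static void main(String[] args) {
        int originalSize = 20;
        int removed = 18;
        int added = 3;
        int finalSize = originalSize - removed + added; // 5 current elements

        System.out.println("row by row: " + rowByRow(removed, added)); // 21
        System.out.println("one shot:   " + oneShot(finalSize));       // 6
    }
}
```

<para>
With eighteen removals and three additions, the row-by-row strategy needs 21
statements while the one-shot strategy needs 6. For the earlier case of one
addition and two removals, the counts are 3 against 20, which is why the
row-by-row behaviour is the sensible default.
</para>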
<para>
Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
</para>
</sect2>
</sect1>
<para>
TODO: document statistics
</para>