fixed some probs with performance section
git-svn-id: https://svn.jboss.org/repos/hibernate/trunk/Hibernate3/doc@5546 1b8cb986-b30d-0410-93ca-fae66ebed9b2
parent 4b8b1266dc
commit b490a52e42
@ -1,210 +1,15 @@
<chapter id="performance">
<title>Improving performance</title>
<sect1 id="performance-fetching">
<title>Fetching strategies</title>

<para>
A <emphasis>fetching strategy</emphasis> is the strategy Hibernate will
use for retrieving associated objects if the application needs to
navigate the association. Fetch strategies may be declared in the O/R
mapping metadata, or overridden by a particular HQL or
<literal>Criteria</literal> query.
</para>

<para>
@ -214,124 +19,150 @@
<itemizedlist>
<listitem>
<para>
<emphasis>Join fetching</emphasis> - Hibernate retrieves the
associated instance or collection in the same <literal>SELECT</literal>,
using an <literal>OUTER JOIN</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
is used to retrieve the associated entity or collection. Unless
you explicitly disable lazy fetching by specifying <literal>lazy="false"</literal>,
this second select will only be executed when you actually access the
association.
</para>
</listitem>
<listitem>
<para>
<emphasis>Batch fetching</emphasis> - an optimization strategy
for select fetching: Hibernate retrieves a batch of entity instances
or collections in a single <literal>SELECT</literal> by specifying
a list of primary keys or foreign keys.
</para>
</listitem>
</itemizedlist>

<para>
By default, if you retrieve an object with <literal>load()</literal> or
<literal>get()</literal>, Hibernate3 loads only that entity, using a single
<literal>SELECT</literal> statement. In other words, all single-ended
associations and collections use lazy fetching by default. You can change this
global default by setting the <literal>default-lazy</literal> attribute of the
<literal>hibernate-mapping</literal> element to <literal>false</literal>.
</para>

<para>
We'll now have a closer look at the individual fetching strategies and how
to change them for single-ended associations and collections.
</para>

<para>
Lazy select fetching is the best default for most entities and collections in
most applications. However, there is one problem to be aware of: accessing a lazy
association outside the context of an open Hibernate session results in an
exception. For example:
</para>

<programlisting><![CDATA[s = sessions.openSession();
Transaction tx = s.beginTransaction();

User u = (User) s.createQuery("from User u where u.name=:userName")
        .setString("userName", userName).uniqueResult();
Map permissions = u.getPermissions();

tx.commit();
s.close();

// The session is closed and the collection was never initialized,
// so this access throws a LazyInitializationException
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>

<para>
Since the permissions collection was not initialized when the
<literal>Session</literal> was closed, the collection will not be able to load
its state. <emphasis>Hibernate does not support lazy initialization for detached
objects</emphasis>. The fix is to move the code that reads from the collection
to just before the transaction is committed. (There are other, more advanced ways
to solve this problem; some are discussed later.)
</para>
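
<para>
The failure can be modelled without Hibernate at all. The following plain-Java sketch is
only an illustration under stated assumptions. <literal>LazyInitSketch</literal>,
<literal>FakeSession</literal> and <literal>LazyMap</literal> are hypothetical stand-ins for a
<literal>Session</literal> and Hibernate's lazy collection wrapper. The
<literal>IllegalStateException</literal> stands in for the exception Hibernate actually throws.
Reading the collection while the session is open succeeds. The first read after the session
is closed fails.
</para>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these classes are part of Hibernate.
class LazyInitSketch {

    // Stand-in for a session: it only tracks whether it is still open.
    static class FakeSession {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    // Stand-in for a lazy collection: its rows are "fetched" on first access,
    // which is only possible while the owning session is open.
    static class LazyMap {
        private final FakeSession session;
        private Map<String, Integer> data; // null until initialized

        LazyMap(FakeSession session) { this.session = session; }

        Integer get(String key) {
            if (data == null) {
                if (!session.isOpen()) {
                    throw new IllegalStateException("cannot initialize collection: session is closed");
                }
                data = new HashMap<>();
                data.put("accounts", 3); // pretend this row was loaded from the database
            }
            return data.get(key);
        }
    }

    // The fix: read the collection before the session is closed.
    static Integer readBeforeClose() {
        FakeSession s = new FakeSession();
        LazyMap permissions = new LazyMap(s);
        Integer accessLevel = permissions.get("accounts"); // initialized here
        s.close();
        return accessLevel;
    }

    // The failing pattern: the first read happens after the session is closed.
    static Integer readAfterClose() {
        FakeSession s = new FakeSession();
        LazyMap permissions = new LazyMap(s);
        s.close();
        return permissions.get("accounts"); // throws IllegalStateException
    }

    public static void main(String[] args) {
        System.out.println(readBeforeClose()); // 3
        try {
            readAfterClose();
        } catch (IllegalStateException e) {
            System.out.println("Error: " + e.getMessage());
        }
    }
}
```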

<para>
It is also possible to use a non-lazy collection, or join fetching (which is
non-lazy by nature). However, lazy initialization is intended to be used for
almost all collections, especially for collections of entity references. If you
define too many non-lazy associations in your object model, Hibernate will end
up needing to fetch the entire database into memory in every transaction!
</para>

<para>
On the other hand, we often want to choose join fetching instead of select
fetching in a particular transaction. We'll now see how to customize the fetching
strategy. In Hibernate3, the mechanisms for choosing a fetch strategy are identical
for single-valued associations and collections.
</para>

<sect2 id="performance-fetching-custom" revision="3">
<title>Tuning fetch strategies</title>

<para>
Select fetching (the default) is extremely vulnerable to N+1 selects problems
(one <literal>SELECT</literal> for the parent rows, plus one more for each parent's
association), so we might want to enable join fetching in the mapping document:
</para>

<programlisting><![CDATA[<set name="permissions"
        fetch="join">
    <key column="userId"/>
    <one-to-many class="Permission"/>
</set>]]></programlisting>
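
<para>
To make the cost concrete, here is a rough statement-count model in plain Java. It is a
simplification, not Hibernate code. <literal>FetchCostSketch</literal> and its methods are
hypothetical. The model assumes a query loads <literal>n</literal> users, then the code touches
each user's <literal>permissions</literal> collection, and it counts only
<literal>SELECT</literal> statements. Select fetching adds one query per user and join fetching
adds none. Batch fetching, with an assumed <literal>batchSize</literal>, adds one query per batch.
</para>

```java
// Hypothetical illustration only; this counts statements, it does not call Hibernate.
class FetchCostSketch {

    // Select fetching: 1 query for the users + 1 query per user's collection (N+1).
    static int selectFetching(int n) {
        return 1 + n;
    }

    // Join fetching: users and their collections arrive in one outer-joined SELECT.
    static int joinFetching(int n) {
        return 1;
    }

    // Batch fetching: 1 query for the users + 1 query per batch of collections.
    static int batchFetching(int n, int batchSize) {
        return 1 + (n + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        int n = 25;
        System.out.println("select: " + selectFetching(n));    // select: 26
        System.out.println("join:   " + joinFetching(n));      // join:   1
        System.out.println("batch:  " + batchFetching(n, 10)); // batch:  4
    }
}
```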

<para>
The fetch strategy defined in the mapping document affects:
</para>

<itemizedlist>
<listitem>
<para>
retrieval via <literal>get()</literal> or <literal>load()</literal>
</para>
</listitem>
<listitem>
<para>
retrieval that happens implicitly when an association is navigated
(lazy fetching)
</para>
</listitem>
<listitem>
<para>
<literal>Criteria</literal> queries
</para>
</listitem>
</itemizedlist>

<para>
<emphasis>The fetch strategy specified in the mapping document does not affect
HQL queries.</emphasis>
</para>

<para>
Usually, we don't use the mapping document to customize fetching. Instead, we
keep the default behavior and override it for a particular transaction, using
the HQL <literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch
the association eagerly in the first select, using an outer join. In the
<literal>Criteria</literal> query API, you would use
<literal>setFetchMode(FetchMode.JOIN)</literal>.
</para>

<para>
If you ever wish you could change the fetching strategy used by
<literal>get()</literal> or <literal>load()</literal>, simply use a
<literal>Criteria</literal> query, for example:
</para>

<programlisting><![CDATA[User user = (User) session.createCriteria(User.class)
        .setFetchMode("permissions", FetchMode.JOIN) // join fetch this collection in this query only
        .add( Restrictions.idEq(userId) )
        .uniqueResult();]]></programlisting>

<para>
(This is Hibernate's equivalent of what some ORM solutions call a "fetch plan".)
</para>

<para>
Join fetching for collections has one limitation: you may fetch only one
collection role per persistent class or query using an outer join. Hibernate
forbids Cartesian products when possible, and <literal>SELECT</literal>ing two
collections in one outer join would create one. This would almost always be slower
than two (lazy or non-deferred) <literal>SELECT</literal>s. The restriction to a
single outer-joined collection applies to both the mapping fetching strategies and
HQL/Criteria queries.
</para>
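
<para>
A back-of-the-envelope model shows why. <literal>CartesianSketch</literal> is a hypothetical
plain-Java helper, and the sizes in it are illustrative, not taken from the text. Take a parent
owning <literal>a</literal> elements in one collection and <literal>b</literal> in another. If
both collections are outer-joined in a single query, it returns <literal>a * b</literal> rows. If
each collection is loaded by its own <literal>SELECT</literal>, only <literal>a + b</literal>
rows come back.
</para>

```java
// Hypothetical illustration only; it computes row counts, it does not run SQL.
class CartesianSketch {

    // Rows returned when both collections are outer-joined into one SELECT:
    // every element of the first collection is paired with every element of the second.
    static int rowsWithTwoJoinedCollections(int a, int b) {
        // An outer join still yields one (null-padded) row for an empty collection.
        return Math.max(a, 1) * Math.max(b, 1);
    }

    // Rows returned when each collection is loaded by its own SELECT.
    static int rowsWithSeparateSelects(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(rowsWithTwoJoinedCollections(10, 20)); // 200
        System.out.println(rowsWithSeparateSelects(10, 20));      // 30
    }
}
```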
</sect2>
@ -430,7 +261,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
where <literal>Cat</literal> implements the interface <literal>ICat</literal> and
<literal>DomesticCat</literal> implements the interface <literal>IDomesticCat</literal>. Then
proxies for instances of <literal>Cat</literal> and <literal>DomesticCat</literal> may be returned
by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>list()</literal>
does not usually return proxies.)
</para>
@ -608,8 +439,8 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
<para>
Hibernate can make efficient use of batch fetching: when one uninitialized proxy (or
collection) is accessed, Hibernate can load several uninitialized proxies (or
collections) at once. Batch fetching is an optimization of the lazy select
fetching strategy. There are two ways you can tune batch fetching: at the class
level and at the collection level.
</para>

<para>
@ -625,9 +456,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
<programlisting><![CDATA[<class name="Person" batch-size="10">...</class>]]></programlisting>

<para>
Hibernate will now execute only three queries; the pattern is 10, 10, 5. Choosing a
batch size is something of an educated guess: the best value depends on the number of
uninitialized proxies you expect in a particular <literal>Session</literal>.
</para>
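
<para>
The batch pattern is simple arithmetic. <literal>BatchPatternSketch</literal> is a hypothetical
helper. It assumes each round trip loads <literal>min(batchSize, remaining)</literal> instances.
Hibernate's own batch loader may choose batch sizes differently internally, so treat the numbers
as illustrative. With 25 pending proxies (10 + 10 + 5) and a <literal>batch-size</literal> of 10,
it reproduces the pattern above. With 10 collections and a batch size of 3, it gives 3, 3, 3, 1.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; a simplified model of batch sizes, not Hibernate code.
class BatchPatternSketch {

    // Splits `pending` uninitialized proxies (or collections) into batches of at most
    // `batchSize`; each element of the result is the number loaded by one SELECT.
    static List<Integer> batches(int pending, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> result = new ArrayList<>();
        int remaining = pending;
        while (remaining > 0) {
            int size = Math.min(batchSize, remaining);
            result.add(size);
            remaining -= size;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(batches(25, 10)); // [10, 10, 5]
        System.out.println(batches(10, 3));  // [3, 3, 3, 1]
    }
}
```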
<para>
@ -646,7 +475,7 @@ Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></prog
</class>]]></programlisting>

<para>
With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in four
<literal>SELECT</literal>s. Again, the value of the attribute depends on the expected number of
uninitialized collections in a particular <literal>Session</literal>.
</para>

@ -1069,6 +898,201 @@ while ( cats.hasNext() ) {
</sect1>

<sect1 id="performance-collections">
<title>Understanding Collection performance</title>

<para>
We've already spent quite some time talking about collections.
In this section we will highlight a couple more issues about
how collections behave at runtime.
</para>

<sect2 id="performance-collections-taxonomy">
<title>Taxonomy</title>

<para>Hibernate defines three basic kinds of collections:</para>

<itemizedlist>
<listitem>
<para>collections of values</para>
</listitem>
<listitem>
<para>one-to-many associations</para>
</listitem>
<listitem>
<para>many-to-many associations</para>
</listitem>
</itemizedlist>

<para>
This classification distinguishes the various table and foreign key
relationships but does not tell us quite everything we need to know
about the relational model. To fully understand the relational structure
and performance characteristics, we must also consider the structure of
the primary key that is used by Hibernate to update or delete collection
rows. This suggests the following classification:
</para>

<itemizedlist>
<listitem>
<para>indexed collections</para>
</listitem>
<listitem>
<para>sets</para>
</listitem>
<listitem>
<para>bags</para>
</listitem>
</itemizedlist>

<para>
All indexed collections (maps, lists, arrays) have a primary key consisting
of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
columns. In this case collection updates are usually extremely efficient: the
primary key may be efficiently indexed, and a particular row may be efficiently
located when Hibernate tries to update or delete it.
</para>

<para>
Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
columns. This may be less efficient for some types of collection element, particularly
composite elements or large text or binary fields; the database may not be able to index
a complex primary key as efficiently. On the other hand, for one-to-many or many-to-many
associations, particularly in the case of synthetic identifiers, it is likely to be just
as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
as <literal>not-null="true"</literal>.)
</para>

<para>
<literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
always very efficient to update. In fact, they are the best case.
</para>

<para>
Bags are the worst case. Since a bag permits duplicate element values and has no
index column, no primary key may be defined. Hibernate has no way of distinguishing
between duplicate rows. Hibernate resolves this problem by completely removing
(in a single <literal>DELETE</literal>) and recreating the collection whenever it
changes. This might be very inefficient.
</para>
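
<para>
A toy table shows why duplicates defeat row-by-row updates. In this plain-Java sketch,
<literal>BagDuplicateSketch</literal> is hypothetical. The key/element row layout and the sample
values are illustrative assumptions, not Hibernate's schema. A bag holds the same element twice
for one owner. A <literal>DELETE</literal> that can identify rows only by key and element value
matches both copies, so it cannot remove exactly one of them.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; an in-memory stand-in for a bag's collection table.
class BagDuplicateSketch {

    // One row of the collection table: the owner's foreign key and the element value.
    static final class Row {
        final long key;
        final String element;
        Row(long key, String element) { this.key = key; this.element = element; }
    }

    // Emulates "DELETE FROM table WHERE key = ? AND element = ?"; returns the rows deleted.
    static int deleteMatching(List<Row> table, long key, String element) {
        int before = table.size();
        table.removeIf(r -> r.key == key && r.element.equals(element));
        return before - table.size();
    }

    public static void main(String[] args) {
        List<Row> table = new ArrayList<>();
        table.add(new Row(1L, "red"));
        table.add(new Row(1L, "red")); // duplicate element, allowed in a bag
        table.add(new Row(1L, "blue"));

        // We wanted to remove a single "red", but both rows match the predicate.
        int deleted = deleteMatching(table, 1L, "red");
        System.out.println(deleted + " rows deleted, " + table.size() + " left"); // 2 rows deleted, 1 left
    }
}
```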

<para>
Note that for a one-to-many association, the "primary key" may not be the physical
primary key of the database table, but even in this case, the above classification
is still useful. (It still reflects how Hibernate "locates" individual rows of the
collection.)
</para>

</sect2>

<sect2 id="performance-collections-mostefficientupdate">
<title>Lists, maps, idbags and sets are the most efficient collections to update</title>

<para>
From the discussion above, it should be clear that indexed collections
and (usually) sets allow the most efficient operation in terms of adding,
removing and updating elements.
</para>

<para>
There is, arguably, one more advantage that indexed collections have over sets for
many-to-many associations or collections of values. Because of the structure of a
<literal>Set</literal>, Hibernate never <literal>UPDATE</literal>s a row when
an element is "changed". Changes to a <literal>Set</literal> always work via
<literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
again, this consideration does not apply to one-to-many associations.
</para>

<para>
Since arrays cannot be lazy, we conclude that lists, maps and
idbags are the most performant (non-inverse) collection types, with sets not far
behind. Sets are expected to be the most common kind of collection in Hibernate
applications, because "set" semantics are the most natural in the relational
model.
</para>

<para>
However, in well-designed Hibernate domain models, we usually see that most collections
are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
associations, the update is handled by the many-to-one end of the association, and so
considerations of collection update performance simply do not apply.
</para>

</sect2>

<sect2 id="performance-collections-mostefficentinverse">
<title>Bags and lists are the most efficient inverse collections</title>

<para>
Just before you ditch bags forever, there is a particular case in which bags (and also lists)
are much more performant than sets. For a collection with <literal>inverse="true"</literal>
(the standard bidirectional one-to-many relationship idiom, for example) we can add elements
to a bag or list without needing to initialize (fetch) the collection elements! This is because
<literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
make the following common code much faster:
</para>

<programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c);  // no need to fetch the collection!
sess.flush();]]></programlisting>
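
<para>
The <literal>java.util</literal> contracts behind this are easy to check in plain Java. The
sketch below uses only standard collections, not Hibernate. <literal>AddContractSketch</literal>
is a hypothetical helper. <literal>Set.add()</literal> reports whether the element was actually
new, so an implementation can only answer once it knows the current contents.
<literal>List.add()</literal> always returns <literal>true</literal>, so the addition can be
queued without reading anything first.
</para>

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; demonstrates the standard add() contracts.
class AddContractSketch {

    // Adds the same element twice to a Set and records what add() returned each time.
    static boolean[] addTwiceToSet() {
        Set<String> set = new HashSet<>();
        return new boolean[] { set.add("child"), set.add("child") };
    }

    // Adds the same element twice to a List and records what add() returned each time.
    static boolean[] addTwiceToList() {
        List<String> list = new ArrayList<>();
        return new boolean[] { list.add("child"), list.add("child") };
    }

    public static void main(String[] args) {
        boolean[] s = addTwiceToSet();
        boolean[] l = addTwiceToList();
        // A Set must detect the duplicate, so its answer depends on the existing contents.
        System.out.println("Set:  " + s[0] + ", " + s[1]); // Set:  true, false
        // A List accepts duplicates, so add() is true regardless of the contents.
        System.out.println("List: " + l[0] + ", " + l[1]); // List: true, true
    }
}
```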

</sect2>

<sect2 id="performance-collections-oneshotdelete">
<title>One shot delete</title>

<para>
Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
is smart enough not to do that in the case of a newly emptied collection
(if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
issue a single <literal>DELETE</literal> and we are done!
</para>

<para>
Suppose we add a single element to a collection of size twenty and then remove two elements.
Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
statements (unless the collection is a bag). This is certainly desirable.
</para>

<para>
However, suppose that we remove eighteen elements, leaving two, and then add three new elements.
There are two possible ways to proceed:
</para>

<itemizedlist>
<listitem>
<para>delete eighteen rows one by one and then insert three rows</para>
</listitem>
<listitem>
<para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
all five current elements (one by one)</para>
</listitem>
</itemizedlist>
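
<para>
Counting statements makes the trade-off explicit. <literal>OneShotDeleteSketch</literal> is a
hypothetical plain-Java helper. It tallies SQL statements for the two strategies in the scenario
above: a collection of twenty, eighteen removed, three added. It ignores statement batching and
row sizes, so it is an illustration rather than a cost model of any particular database. For the
earlier case (one added, two removed) the same functions give 3 versus 20 statements. That is
why row-by-row changes are the right default.
</para>

```java
// Hypothetical illustration only; counts statements, it does not issue SQL.
class OneShotDeleteSketch {

    // Strategy 1: one DELETE per removed row, then one INSERT per added row.
    static int rowByRow(int removed, int added) {
        return removed + added;
    }

    // Strategy 2: a single DELETE for the whole collection, then one INSERT
    // for every element the collection holds afterwards.
    static int recreate(int initialSize, int removed, int added) {
        int finalSize = initialSize - removed + added;
        return 1 + finalSize;
    }

    public static void main(String[] args) {
        System.out.println(rowByRow(18, 3));     // 21
        System.out.println(recreate(20, 18, 3)); // 6
    }
}
```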

<para>
Hibernate isn't smart enough to know that the second option is probably quicker in this case.
(And it would probably be undesirable for Hibernate to be that smart; such behaviour might
confuse database triggers, etc.)
</para>

<para>
Fortunately, you can force this behaviour (i.e. the second strategy) at any time by discarding
(i.e. dereferencing) the original collection and returning a newly instantiated collection with
all the current elements. This can be very useful and powerful from time to time.
</para>

<para>
Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
</para>

</sect2>

</sect1>

<para>
TODO: document statistics
</para>