Doc'd fetching strategies

git-svn-id: https://svn.jboss.org/repos/hibernate/trunk/Hibernate3/doc@5306 1b8cb986-b30d-0410-93ca-fae66ebed9b2
Christian Bauer 2005-01-24 14:54:21 +00:00
parent a9a5787af5
commit 94de13235f
6 changed files with 495 additions and 366 deletions


@ -566,109 +566,6 @@ kittens = cat.getKittens(); //Okay, kittens collection is a Set
</sect1>
<sect1 id="collections-lazy" revision="2">
<title>Lazy Initialization</title>
<para>
Collections (other than arrays) may be lazily initialized, meaning they load
their state from the database only when the application needs to access it.
Initialization of collections owned by persistent instances happens transparently
to the user, so the application would not normally need to worry about this (in
fact, transparent lazy initialization is the main reason why Hibernate needs its
own collection implementations). However, if the application tries something like
this:
</para>
<programlisting><![CDATA[s = sessions.openSession();
User u = (User) s.find("from User u where u.name=?", userName, Hibernate.STRING).get(0);
Map permissions = u.getPermissions();
s.connection().commit();
s.close();
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>
<para>
It could be in for a nasty surprise. Since the permissions collection was not
initialized when the <literal>Session</literal> was closed, the collection
will not be able to load its state. <emphasis>Hibernate does not support lazy
initialization for detached objects</emphasis>. The fix is to move the
line that reads from the collection to just before the commit. (There are
other more advanced ways to solve this problem, however.)
</para>
<para>
It's possible to use a non-lazy collection. However, it is intended that lazy
initialization be used for almost all collections, especially for collections
of entities, and is now the default. If you define too many non-lazy associations
in your object model, Hibernate will end up needing to fetch the entire database
into memory in every transaction!
</para>
<para>
Exceptions that occur while lazily initializing a collection are wrapped in a
<literal>LazyInitializationException</literal>.
</para>
<para>
In some application architectures, particularly where the code that accesses data
using Hibernate, and the code that uses it are in different application layers, it
can be a problem to ensure that the <literal>Session</literal> is open when a
collection is initialized. There are two basic ways to deal with this issue:
</para>
<itemizedlist>
<listitem>
<para>
In a web-based application, a servlet filter can be used to close the
<literal>Session</literal> only at the very end of a user request, once
the rendering of the view is complete. Of course, this places heavy
demands upon the correctness of the exception handling of your application
infrastructure. It is vitally important that the <literal>Session</literal>
is closed and the transaction ended before returning to the user, even
when an exception occurs during rendering of the view. The servlet filter
has to be able to access the <literal>Session</literal> for this approach.
We recommend that a <literal>ThreadLocal</literal> variable be used to
hold the current <literal>Session</literal> (see chapter 1,
<xref linkend="quickstart-playingwithcats"/>, for an example implementation).
</para>
</listitem>
<listitem>
<para>
In an application with a separate business tier, the business logic must
"prepare" all collections that will be needed by the web tier before
returning. This means that the business tier should load all the data and
return all the data already initialized to the presentation/web tier that
is required for a particular use case. Usually, the application calls
<literal>Hibernate.initialize()</literal> for each collection that will
be needed in the web tier (this call must occur before the session is closed)
or retrieves the collection eagerly using a Hibernate query with a
<literal>FETCH</literal> clause.
</para>
</listitem>
<listitem>
<para>
You may also attach a previously loaded object to a new <literal>Session</literal>
with <literal>update()</literal> or <literal>lock()</literal> before
accessing uninitialized collections (or other proxies). Hibernate cannot
do this automatically, as it would introduce ad hoc transaction semantics!
</para>
</listitem>
</itemizedlist>
<para>
You can use a collection filter to get the size of a collection without initializing it:
</para>
<programlisting><![CDATA[( (Integer) s.createFilter( collection, "select count(*)" ).list().get(0) ).intValue()]]></programlisting>
<para>
The <literal>createFilter()</literal> method is also used to efficiently retrieve subsets
of a collection without needing to initialize the whole collection. (And the new
<literal>&lt;filter&gt;</literal> functionality is a more powerful approach.)
</para>
</sect1>
<sect1 id="collections-sorted" revision="1">
<title>Sorted Collections</title>


@ -876,7 +876,7 @@ hibernate.dialect = \
</sect2>
<sect2 id="configuration-optional-outerjoin" revision="1">
<sect2 id="configuration-optional-outerjoin" revision="3">
<title>Outer Join Fetching</title>
<para>
@ -888,30 +888,20 @@ hibernate.dialect = \
in a single SQL <literal>SELECT</literal>.
</para>
<para>
By default, the fetched graph when loading an object ends at leaf objects,
collections, objects with proxies, or where circularities occur in the case
of *-to-one associations. Hibernate will however execute an immediate additional
<literal>SELECT</literal> for any persistent collection (we recommend that you
turn on lazy loading for all collection mappings).
</para>
<para>
For a <emphasis>particular association</emphasis>, fetching may be enabled
or disabled (and the default behaviour overridden) by setting the
<literal>outer-join</literal> attribute in the XML mapping.
</para>
<para>
Outer join fetching may be disabled <emphasis>globally</emphasis> by setting
the property <literal>hibernate.max_fetch_depth</literal> to <literal>0</literal>.
A setting of <literal>1</literal> or higher enables outer join fetching for
all one-to-one and many-to-one associations, which are, also by default, set
to <literal>auto</literal> outer join. However, one-to-many associations and
collections are never fetched with an outer-join, unless explicitly declared
for each particular association. This behavior can also be overridden at runtime
with Hibernate queries. See the query chapters in the documentation for more
details.
all one-to-one and many-to-one associations if no other fetching strategy is
defined in the mapping <emphasis>and</emphasis> if proxying of the target entity
class has been turned off (thus disabling lazy loading). However, one-to-many
associations and collections are never fetched with an outer-join, unless
explicitly declared for each particular association. This
behavior can also be overridden at runtime with Hibernate queries.
</para>
<para>
See <xref linkend="performance-fetching"/> for more information.
</para>
</sect2>


@ -2,7 +2,7 @@
<title>Working with Persistent Data</title>
<sect1 id="manipulatingdata-creating">
<sect1 id="manipulatingdata-creating" revision="1">
<title>Creating a persistent object</title>
<para>
@ -24,6 +24,8 @@ Long generatedId = (Long) sess.save(fritz);]]></programlisting>
is called. If <literal>Cat</literal> has an <literal>assigned</literal>
identifier, or a composite key, the identifier should be assigned to
the <literal>cat</literal> instance before calling <literal>save()</literal>.
You may also use <literal>create()</literal> instead of <literal>save()</literal>,
with the semantics defined in the EJB3 early draft.
</para>
<para>
@ -105,14 +107,16 @@ return cat;]]></programlisting>
<para>
You may even load an object using an SQL <literal>SELECT ... FOR UPDATE</literal>.
See the next section for a discussion of Hibernate <literal>LockMode</literal>s.
See the next sections for a discussion of Hibernate <literal>LockMode</literal>s.
</para>
<programlisting><![CDATA[Cat cat = (Cat) sess.get(Cat.class, id, LockMode.UPGRADE);]]></programlisting>
<para>
Note that any associated instances or contained collections are
<emphasis>not</emphasis> selected <literal>FOR UPDATE</literal>.
<emphasis>not</emphasis> selected <literal>FOR UPDATE</literal>, unless you decide
to specify <literal>lock</literal> or <literal>all</literal> as a
cascade style for the association.
</para>
<para>
@ -125,6 +129,13 @@ return cat;]]></programlisting>
sess.flush(); //force the SQL INSERT
sess.refresh(cat); //re-read the state (after the trigger executes)]]></programlisting>
<para>
An important question usually appears at this point: How much does Hibernate load
from the database and how many SQL <literal>SELECT</literal>s will it use? This
depends on the <emphasis>fetching strategy</emphasis> and is explained in
<xref linkend="performance-fetching"/>.
</para>
</sect1>
<sect1 id="manipulatingdata-querying">


@ -196,31 +196,168 @@
</sect1>
<sect1 id="performance-fetching">
<title>Fetching strategies</title>
<para>
We have already shown how you can use lazy initialization for persistent collections
in the chapter about collection mappings. A similar effect is achievable for ordinary object
references, using CGLIB proxies. We have also mentioned how Hibernate caches persistent
objects at the level of a <literal>Session</literal>. More aggressive caching strategies
may be configured upon a class-by-class basis.
A fetching strategy describes the number of instances, the depth of a
subgraph of instances, and SQL <literal>SELECT</literal>s that are used
to retrieve these instances. Hibernate supports several strategies and you
can configure them on a global level, per entity class, per association, or
even for a particular query in HQL and with <literal>Criteria</literal>.
</para>
<para>
In the next section, we show you how to use these features, which may be used to
achieve much higher performance, where necessary.
Hibernate offers the following fetching strategies:
</para>
<sect1 id="performance-proxies" revision="2">
<title>Proxies for Lazy Initialization</title>
<itemizedlist>
<listitem>
<para>
<emphasis>Lazy fetching</emphasis> - an associated instance (or a
collection) will only be loaded when needed, using an additional
deferred <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Batch fetching</emphasis> - an optimization strategy
for lazy fetching: Hibernate retrieves not just a single instance
(or collection), but several in the same <literal>SELECT</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Eager fetching</emphasis> - Hibernate retrieves the
associated instance (or collection) in the same <literal>SELECT</literal>,
using an <literal>OUTER JOIN</literal>.
</para>
</listitem>
<listitem>
<para>
<emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
is used to retrieve the associated instance (or collection), but
it might be executed immediately rather than deferred until first access
(as with lazy fetching).
</para>
</listitem>
</itemizedlist>
<para>
Hibernate implements lazy initializing proxies for persistent objects using runtime
bytecode enhancement (via the excellent CGLIB library).
By default, Hibernate3 will only load the given entity using a single
<literal>SELECT</literal> statement if you retrieve an object with
<literal>load()</literal> or <literal>get()</literal>. This means that
all single-ended associations and collections are set for lazy fetching
by default. You can change this global default by setting the
<literal>default-lazy</literal> attribute on the <literal>hibernate-mapping</literal>
element to <literal>false</literal>.
</para>
<para>
We'll now have a closer look at the individual fetching strategies and how
to change them for single-ended associations and collections.
</para>
<sect2 id="performance-fetching-collections" revision="3">
<title>Collection fetching</title>
<para>
Initialization of collections owned by persistent instances happens transparently
to the user, so the application would not normally need to worry about this (in
fact, transparent lazy initialization is the main reason why Hibernate needs its
own collection implementations). However, if the application tries something like
this:
</para>
<programlisting><![CDATA[s = sessions.openSession();
User u = (User) s.find("from User u where u.name=?", userName, Hibernate.STRING).get(0);
Map permissions = u.getPermissions();
s.connection().commit();
s.close();
Integer accessLevel = (Integer) permissions.get("accounts"); // Error!]]></programlisting>
<para>
It could be in for a nasty surprise. Since the permissions collection was not
initialized when the <literal>Session</literal> was closed, the collection
will not be able to load its state. <emphasis>Hibernate does not support lazy
initialization for detached objects</emphasis>. The fix is to move the
line that reads from the collection to just before the commit. (There are
other, more advanced ways to solve this problem; some are discussed later.)
</para>
<para>
It's possible to use a non-lazy collection. However, it is intended that lazy
initialization be used for almost all collections, especially for collections
of entity references (it is the default). If you define too many non-lazy associations
in your object model, Hibernate will end up needing to fetch the entire database
into memory in every transaction! Still, sometimes you want to use an additional
<literal>SELECT</literal> for a particular collection right away, not deferred
until the first access happens:
</para>
<programlisting><![CDATA[<set name="permissions" fetch="select">
<key column="USER_ID"/>
<one-to-many class="Permission"/>
</set>]]></programlisting>
<para>
Hibernate will now execute an immediate second <literal>SELECT</literal> loading
the collection of <literal>Permission</literal> instances, when a particular
<literal>User</literal> is retrieved.
</para>
<para>
Any kind of lazy fetching (and also Select fetching) is extremely vulnerable to
N+1 selects problems. So usually, we choose lazy fetching only as a default
strategy, and override it for a particular transaction, using the HQL
<literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate to fetch the
association eagerly in the first select, using an outer join. In the
<literal>Criteria</literal> API, you would use
<literal>setFetchMode(FetchMode.EAGER)</literal>.
</para>
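<para>
As a minimal sketch of such a per-query override, the following assumes the
<literal>User</literal> class and its <literal>permissions</literal> collection from
the example above, and a hypothetical helper class <literal>UserFinder</literal> that
is handed an open <literal>Session</literal>. The <literal>Criteria</literal> variant
uses <literal>FetchMode.JOIN</literal>, the constant referred to elsewhere in this chapter:
</para>
<programlisting><![CDATA[import java.util.List;

import org.hibernate.FetchMode;
import org.hibernate.Session;

// Hypothetical helper class, not part of Hibernate. User is the
// application's own persistent class from the earlier example.
public class UserFinder {

    // HQL: load the user and its permissions in one SELECT using an outer join.
    public static User findWithPermissions(Session s, String userName) {
        List users = s.createQuery(
                "from User u left join fetch u.permissions where u.name = :name")
            .setString("name", userName)
            .list();
        // A join fetch may return the same User once per joined row,
        // so take the first element instead of expecting a single row.
        return users.isEmpty() ? null : (User) users.get(0);
    }

    // Criteria: the same override, expressed per association path.
    public static List findAllWithPermissions(Session s) {
        return s.createCriteria(User.class)
            .setFetchMode("permissions", FetchMode.JOIN)
            .list();
    }
}]]></programlisting>
<para>
Because the override lives in the query, the mapping can keep lazy fetching as its
default and only the use cases that need the collection pay for the join.
</para>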
<para>
You can always force outer join association fetching in the mapping file, by setting
<literal>fetch="join"</literal> (or use the old <literal>outer-join="true"</literal>
syntax). We don't recommend this setting, especially not for collections, since it is
incredibly rare to find an entity which is <emphasis>always</emphasis> used when
an associated entity is used, at least in a sufficiently large system.
</para>
<para>
Eager fetching for collections has another restriction: you may only set one
collection role per persistent class to be fetched per outer join. Hibernate forbids
Cartesian products when possible, and <literal>SELECT</literal>ing two collections per
outer join would create one. This would almost always be slower than two (lazy or
non-deferred) <literal>SELECT</literal>s. The restriction to a single outer-joined
collection applies to both the mapping fetching strategies and to HQL/Criteria queries.
</para>
</sect2>
<sect2 id="performance-fetching-proxies" revision="2">
<title>Single-ended association proxies</title>
<para>
Lazy fetching for collections is implemented using Hibernate's own implementation
of persistent collections. However, a different mechanism is needed for lazy
behavior in single-ended associations. The target entity of the association must
be proxied. Hibernate implements lazy initializing proxies for persistent objects
using runtime bytecode enhancement (via the excellent CGLIB library).
</para>
<para>
By default, Hibernate3 generates proxies (at startup) for all persistent classes
and uses them to enable lazy fetching of <literal>many-to-one</literal> and
<literal>one-to-one</literal> associations.
</para>
<para>
The mapping file may declare an interface to use as the proxy interface for that
class. By default, Hibernate uses a subclass of the class itself. (The proxied
class must implement a default constructor with at least package visibility.)
class, with the <literal>proxy</literal> attribute. By default, Hibernate uses a subclass
of the class. <emphasis>Note that the proxied class must implement a default constructor
with at least package visibility. We recommend this constructor for all persistent classes!</emphasis>
</para>
<para>
@ -238,7 +375,7 @@
<para>
Firstly, instances of <literal>Cat</literal> will never be castable to
<literal>DomesticCat</literal>, even if the underlying instance is an
instance of <literal>DomesticCat</literal>.
instance of <literal>DomesticCat</literal>:
</para>
<programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id); // instantiate a proxy (does not hit the db)
@ -336,8 +473,47 @@ Cat fritz = (Cat) iter.next();]]></programlisting>
</para>
<para>
Exceptions that occur while initializing a proxy are wrapped in a
<literal>LazyInitializationException</literal>.
You may of course also use Eager or Select fetching strategies for single-ended
associations:
</para>
<programlisting><![CDATA[<many-to-one name="mother" class="Cat" fetch="join"/>
<many-to-one name="father" class="Cat" fetch="select"/>]]></programlisting>
<para>
The first mapping tells Hibernate to fetch the associated <literal>mother</literal>
entity in the same initial <literal>SELECT</literal> using an <literal>OUTER JOIN</literal>.
You can set this option on as many *-to-one associations as you like; there is no
danger of creating a Cartesian product (as opposed to collections). Note that you can
set the maximum depth of outer joined tables with the global configuration option
<literal>max_fetch_depth</literal> (see <xref linkend="configuration-optional-outerjoin"/>).
</para>
<para>
The second mapping enables an additional <literal>SELECT</literal> for the
retrieval of the <literal>father</literal>. Note that Hibernate does not guarantee
<emphasis>when</emphasis> this query will be executed. If it should be executed
immediately (right after the initial <literal>SELECT</literal>), disable proxying
on the target of the association by setting it to <literal>lazy="false"</literal>:
</para>
<programlisting><![CDATA[<class name="Cat" lazy="false">...</class>]]></programlisting>
<para>
(Note that this example uses only a single persistent class <literal>Cat</literal>
and self-referencing associations. As expected, this doesn't change the fetching behavior.)
</para>
</sect2>
<sect2 id="performance-fetching-initialization">
<title>Initializing collections and proxies</title>
<para>
An exception (<literal>LazyInitializationException</literal>) will be thrown by
Hibernate if an uninitialized collection or proxy is accessed outside of the scope
of the <literal>Session</literal>, i.e. when the entity owning the collection or
having the reference to the proxy is in detached state.
</para>
<para>
@ -345,6 +521,9 @@ Cat fritz = (Cat) iter.next();]]></programlisting>
<literal>Session</literal>. Of course, we can always force initialization by calling
<literal>cat.getSex()</literal> or <literal>cat.getKittens().size()</literal>, for example.
But that is confusing to readers of the code and is not convenient for generic code.
</para>
<para>
The static methods <literal>Hibernate.initialize()</literal> and <literal>Hibernate.isInitialized()</literal>
provide the application with a convenient way of working with lazily initialized collections or
proxies. <literal>Hibernate.initialize(cat)</literal> will force the initialization of a proxy,
@ -353,15 +532,84 @@ Cat fritz = (Cat) iter.next();]]></programlisting>
of kittens.
</para>
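<para>
A minimal sketch of using these two methods before the <literal>Session</literal> is
closed. It assumes the <literal>Cat</literal> class used in this chapter, with a
<literal>getKittens()</literal> accessor and a <literal>Long</literal> identifier;
the helper class <literal>CatLoader</literal> is hypothetical:
</para>
<programlisting><![CDATA[import org.hibernate.Hibernate;
import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.Transaction;

// Hypothetical helper class, not part of Hibernate. Cat is the
// application's own persistent class.
public class CatLoader {

    // Returns a Cat whose kittens collection can be read after the
    // Session has been closed, or null if no such Cat exists.
    public static Cat loadWithKittens(SessionFactory sessions, Long id) {
        Session s = sessions.openSession();
        Transaction tx = null;
        try {
            tx = s.beginTransaction();
            Cat cat = (Cat) s.get(Cat.class, id);
            // Only touch the database if the collection is still uninitialized.
            if (cat != null && !Hibernate.isInitialized(cat.getKittens())) {
                Hibernate.initialize(cat.getKittens());
            }
            tx.commit();
            return cat;
        } catch (RuntimeException e) {
            if (tx != null) tx.rollback();
            throw e;
        } finally {
            s.close();
        }
    }
}]]></programlisting>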
</sect1>
<para>
Another option is to keep the <literal>Session</literal> open until all needed
collections and proxies have been loaded. In some application architectures,
particularly where the code that accesses data using Hibernate, and the code that
uses it are in different application layers, it can be a problem to ensure that the
<literal>Session</literal> is open when a collection is initialized. There are
two basic ways to deal with this issue:
</para>
<sect1 id="performance-batchfetching">
<itemizedlist>
<listitem>
<para>
In a web-based application, a servlet filter can be used to close the
<literal>Session</literal> only at the very end of a user request, once
the rendering of the view is complete (the <emphasis>Open Session in
View</emphasis> pattern). Of course, this places heavy
demands on the correctness of the exception handling of your application
infrastructure. It is vitally important that the <literal>Session</literal>
is closed and the transaction ended before returning to the user, even
when an exception occurs during rendering of the view. The servlet filter
has to be able to access the <literal>Session</literal> for this approach.
We recommend that a <literal>ThreadLocal</literal> variable be used to
hold the current <literal>Session</literal> (see chapter 1,
<xref linkend="quickstart-playingwithcats"/>, for an example implementation).
</para>
</listitem>
<listitem>
<para>
In an application with a separate business tier, the business logic must
"prepare" all collections that will be needed by the web tier before
returning. This means that the business tier should load all the data and
return all the data already initialized to the presentation/web tier that
is required for a particular use case. Usually, the application calls
<literal>Hibernate.initialize()</literal> for each collection that will
be needed in the web tier (this call must occur before the session is closed)
or retrieves the collection eagerly using a Hibernate query with a
<literal>FETCH</literal> clause or a <literal>FetchMode.JOIN</literal> in
<literal>Criteria</literal>. This is usually easier if you adopt the
<emphasis>Command</emphasis> pattern instead of a <emphasis>Session Facade</emphasis>.
</para>
</listitem>
<listitem>
<para>
You may also attach a previously loaded object to a new <literal>Session</literal>
with <literal>merge()</literal> or <literal>lock()</literal> before
accessing uninitialized collections (or other proxies). Hibernate cannot
do this automatically, as it would introduce ad hoc transaction semantics!
</para>
</listitem>
</itemizedlist>
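<para>
The reattachment option can be sketched as follows, assuming the detached
<literal>Cat</literal> instance was not modified while detached and that the caller
manages the new <literal>Session</literal> and its transaction. The helper class
<literal>CatReattacher</literal> is hypothetical:
</para>
<programlisting><![CDATA[import org.hibernate.Hibernate;
import org.hibernate.LockMode;
import org.hibernate.Session;

// Hypothetical helper class, not part of Hibernate. Cat is the
// application's own persistent class.
public class CatReattacher {

    // Reassociates a detached, unmodified Cat with the given open
    // Session and then loads its kittens collection.
    public static void initializeKittens(Session s, Cat detachedCat) {
        // LockMode.NONE reattaches the instance without acquiring a database lock.
        s.lock(detachedCat, LockMode.NONE);
        Hibernate.initialize(detachedCat.getKittens());
    }
}]]></programlisting>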
<para>
Sometimes you don't want to initialize a large collection, but still need some
information about it (like its size) or a subset of the data.
</para>
<para>
You can use a collection filter to get the size of a collection without initializing it:
</para>
<programlisting><![CDATA[( (Integer) s.createFilter( collection, "select count(*)" ).list().get(0) ).intValue()]]></programlisting>
<para>
The <literal>createFilter()</literal> method is also used to efficiently retrieve subsets
of a collection without needing to initialize the whole collection:
</para>
<programlisting><![CDATA[s.createFilter( lazyCollection, "").setFirstResult(0).setMaxResults(10).list();]]></programlisting>
</sect2>
<sect2 id="performance-fetching-batch">
<title>Using batch fetching</title>
<para>
Hibernate can make efficient use of batch fetching, that is, Hibernate can load several uninitialized
proxies if one proxy is accessed. Batch fetching is an optimization for the lazy loading strategy.
There are two ways you can tune batch fetching: on the class and the collection level.
proxies (or collections) if one proxy is accessed. Batch fetching is an optimization for the lazy
loading strategy. There are two ways you can tune batch fetching: on the class and the collection level.
</para>
<para>
@ -405,12 +653,13 @@ Cat fritz = (Cat) iter.next();]]></programlisting>
<para>
Batch fetching of collections is particularly useful if you have a nested tree of items, i.e.
the typical bill-of-materials pattern.
the typical bill-of-materials pattern. (Although a <emphasis>nested set</emphasis> or a
<emphasis>materialized path</emphasis> might be a better option for read-mostly trees.)
</para>
</sect1>
</sect2>
<sect1 id="performance-lazyproperties">
<sect2 id="performance-fetching-lazyproperties">
<title>Using lazy property fetching</title>
<para>
@ -470,26 +719,7 @@ Cat fritz = (Cat) iter.next();]]></programlisting>
TODO: Document issues with lazy property loading
</para>
</sect1>
<sect1 id="performance-outerjoinfetch" revision="1">
<title>Outer join fetching</title>
<para>
Any kind of lazy fetching is extremely vulnerable to N+1 selects problems. So usually,
we choose lazy fetching only as a "default" strategy, and override it for a particular
transaction, using the HQL <literal>LEFT JOIN FETCH</literal> clause. This tells Hibernate
to fetch the association in the first select, using an outer join. In the
<literal>Criteria</literal> API, you would use <literal>setFetchMode(FetchMode.EAGER)</literal>.
</para>
<para>
You can always force outer join association fetching in the mapping file, by setting
<literal>outer-join="true"</literal>. We don't recommend this setting, especially
not for collections, since it is incredibly rare to find an entity which is
<emphasis>always</emphasis> used when an associated entity is used, at least in a
sufficiently large system.
</para>
</sect2>
<para>
A completely different way to avoid problems with N+1 selects is to use the second-level


@ -148,7 +148,7 @@ while ( iter.hasNext() ) {
</sect1>
<sect1 id="querycriteria-dynamicfetching">
<sect1 id="querycriteria-dynamicfetching" revision="1">
<title>Dynamic association fetching</title>
<para>
@ -164,7 +164,7 @@ while ( iter.hasNext() ) {
<para>
This query will fetch both <literal>mate</literal> and <literal>kittens</literal>
by outer join.
by outer join. See <xref linkend="performance-fetching"/> for more information.
</para>
</sect1>


@ -72,7 +72,7 @@
</sect1>
<sect1 id="queryhql-joins">
<sect1 id="queryhql-joins" revision="1">
<title>Associations and joins</title>
<para>
@ -128,7 +128,8 @@ from Formula form full join form.parameter param]]></programlisting>
In addition, a "fetch" join allows associations or collections of values to be
initialized along with their parent objects, using a single select. This is particularly
useful in the case of a collection. It effectively overrides the outer join and
lazy declarations of the mapping file for associations and collections.
lazy declarations of the mapping file for associations and collections. See
<xref linkend="performance-fetching"/> for more information.
</para>
<programlisting><![CDATA[from eg.Cat as cat