
<chapter id="performance">
    <title>Improving performance</title>

    <sect1 id="performance-fetching" revision="2">
        <title>Fetching strategies</title>

        <para>
            A <emphasis>fetching strategy</emphasis> is the strategy Hibernate will use for
            retrieving associated objects if the application needs to navigate the association.
            Fetch strategies may be declared in the O/R mapping metadata, or overridden by a
            particular HQL or <literal>Criteria</literal> query.
        </para>

        <para>
            Hibernate3 defines the following fetching strategies:
        </para>

        <itemizedlist>
             <listitem>
                <para>
                    <emphasis>Join fetching</emphasis> - Hibernate retrieves the
                    associated instance or collection in the same <literal>SELECT</literal>,
                    using an <literal>OUTER JOIN</literal>.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Select fetching</emphasis> - a second <literal>SELECT</literal>
                    is used to retrieve the associated entity or collection. Unless
                    you explicitly disable lazy fetching by specifying <literal>lazy="false"</literal>,
                    this second select will only be executed when you actually access the
                    association.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Subselect fetching</emphasis> - a second <literal>SELECT</literal>
                    is used to retrieve the associated collections for all entities retrieved in a
                    previous query or fetch. Unless you explicitly disable lazy fetching by specifying
                    <literal>lazy="false"</literal>, this second select will only be executed when you
                    actually access the association.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Batch fetching</emphasis> - an optimization strategy
                    for select fetching - Hibernate retrieves a batch of entity instances
                    or collections in a single <literal>SELECT</literal>, by specifying
                    a list of primary keys or foreign keys.
                </para>
            </listitem>
        </itemizedlist>
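        <para>
            To make the difference between these strategies concrete, the following plain-Java
            sketch counts the SQL statements needed to load <literal>n</literal> owners and then
            access one lazy association or collection on every owner. It is back-of-the-envelope
            arithmetic, not Hibernate API, and it assumes every association is actually accessed;
            the class name <literal>FetchCost</literal> is hypothetical.
        </para>

```java
// Back-of-the-envelope statement counts for loading n owners and then
// accessing one lazy association or collection on every owner.
// Plain Java arithmetic; not Hibernate API.
final class FetchCost {
    private FetchCost() {}

    /** Select fetching: one query for the owners, then one per owner (N+1). */
    static int select(int n) {
        return 1 + n;
    }

    /** Join fetching: owners and associations arrive in one outer-join SELECT. */
    static int join(int n) {
        return 1;
    }

    /** Batch fetching: one query for the owners, then ceil(n / batchSize) batches. */
    static int batch(int n, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        return 1 + (n + batchSize - 1) / batchSize;
    }

    /** Subselect fetching: one query for the owners, one subselect for all collections. */
    static int subselect(int n) {
        return n == 0 ? 1 : 2;
    }
}
```

        <para>
            For 25 owners and a batch size of 10, this gives 26, 1, 4 and 2 statements respectively.
        </para>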

        <para>
            Hibernate also distinguishes between:
        </para>

        <itemizedlist>
             <listitem>
                <para>
                    <emphasis>Immediate fetching</emphasis> - an association, collection or
                    attribute is fetched immediately, when the owner is loaded.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Lazy collection fetching</emphasis> - a collection is fetched
                    when the application invokes an operation upon that collection. (This
                    is the default for collections.)
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>"Extra-lazy" collection fetching</emphasis> - individual
                    elements of the collection are accessed from the database as needed.
                    Hibernate tries not to fetch the whole collection into memory unless
                    absolutely needed (suitable for very large collections).
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Proxy fetching</emphasis> - a single-valued association is
                    fetched when a method other than the identifier getter is invoked
                    upon the associated object.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>"No-proxy" fetching</emphasis> - a single-valued association is
                    fetched when the instance variable is accessed. Compared to proxy fetching,
                    this approach is less lazy (the association is fetched even when only the
                    identifier is accessed) but more transparent, since no proxy is visible to
                    the application. This approach requires buildtime bytecode instrumentation
                    and is rarely necessary.
                </para>
            </listitem>
            <listitem>
                <para>
                    <emphasis>Lazy attribute fetching</emphasis> - an attribute or single
                    valued association is fetched when the instance variable is accessed.
                    This approach requires buildtime bytecode instrumentation and is rarely
                    necessary.
                </para>
            </listitem>
        </itemizedlist>
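        <para>
            As an illustration of the extra-lazy idea, the sketch below models a collection whose
            <literal>size()</literal> and <literal>get()</literal> each issue one small, targeted
            query instead of loading every element. The two functional parameters stand in for
            SQL round trips; <literal>ExtraLazyList</literal> and its methods are hypothetical and
            are not Hibernate's persistent collection classes.
        </para>

```java
import java.util.function.IntFunction;
import java.util.function.IntSupplier;

// Hypothetical model of an "extra-lazy" indexed collection.
// Each call simulates one SQL round trip; nothing is ever loaded wholesale.
final class ExtraLazyList<E> {
    private final IntSupplier countQuery;      // simulated "select count(*) ..."
    private final IntFunction<E> elementQuery; // simulated single-element SELECT by index
    private int queriesIssued;

    ExtraLazyList(IntSupplier countQuery, IntFunction<E> elementQuery) {
        this.countQuery = countQuery;
        this.elementQuery = elementQuery;
    }

    /** Answers size() with a count query instead of loading every element. */
    int size() {
        queriesIssued++;
        return countQuery.getAsInt();
    }

    /** Loads only the requested element. */
    E get(int index) {
        queriesIssued++;
        return elementQuery.apply(index);
    }

    /** Number of simulated queries issued so far. */
    int queriesIssued() {
        return queriesIssued;
    }
}
```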

        <para>
            We have two orthogonal notions here: <emphasis>when</emphasis> is the association
            fetched, and <emphasis>how</emphasis> is it fetched (what SQL is used). Don't
            confuse them! We use <literal>fetch</literal> to tune performance. We may use
            <literal>lazy</literal> to define a contract for what data is always available
            in any detached instance of a particular class.
        </para>

        <sect2 id="performance-fetching-lazy">
            <title>Working with lazy associations</title>

            <para>
                By default, Hibernate3 uses lazy select fetching for collections and lazy proxy
                fetching for single-valued associations. These defaults make sense for almost
                all associations in almost all applications.
            </para>

            <para>
                <emphasis>Note:</emphasis> if you set
                <literal>hibernate.default_batch_fetch_size</literal>, Hibernate will use the
                batch fetch optimization for lazy fetching (this optimization may also be enabled
                at a more granular level).
            </para>

            <para>
                However, lazy fetching poses one problem that you must be aware of. Access to a
                lazy association outside of the context of an open Hibernate session will result
                in an exception. For example:
            </para>

            <programlisting><![CDATA[Session s = sessions.openSession();
Transaction tx = s.beginTransaction();

User u = (User) s.createQuery("from User u where u.name=:userName")
    .setString("userName", userName).uniqueResult();
Map permissions = u.getPermissions();

tx.commit();
s.close();

Integer accessLevel = (Integer) permissions.get("accounts");  // Error!]]></programlisting>

            <para>
                Since the permissions collection was not initialized when the
                <literal>Session</literal> was closed, the collection will not be able to
                load its state. <emphasis>Hibernate does not support lazy initialization
                for detached objects</emphasis>. The fix is to move the code that reads
                from the collection to just before the transaction is committed.
            </para>

            <para>
                Alternatively, we could use a non-lazy collection or association,
                by specifying <literal>lazy="false"</literal> for the association mapping.
                However, it is intended that lazy initialization be used for almost all
                collections and associations. If you define too many non-lazy associations
                in your object model, Hibernate will end up needing to fetch the entire
                database into memory in every transaction!
            </para>

            <para>
                On the other hand, we often want to choose join fetching (which is non-lazy by
                nature) instead of select fetching in a particular transaction. We'll now see
                how to customize the fetching strategy. In Hibernate3, the mechanisms for
                choosing a fetch strategy are identical for single-valued associations and
                collections.
            </para>

        </sect2>

        <sect2 id="performance-fetching-custom" revision="4">
            <title>Tuning fetch strategies</title>

            <para>
                Select fetching (the default) is extremely vulnerable to N+1 selects problems,
                so we might want to enable join fetching in the mapping document:
            </para>

            <programlisting><![CDATA[<set name="permissions"
            fetch="join">
    <key column="userId"/>
    <one-to-many class="Permission"/>
</set>]]></programlisting>

           <programlisting><![CDATA[<many-to-one name="mother" class="Cat" fetch="join"/>]]></programlisting>

            <para>
                The <literal>fetch</literal> strategy defined in the mapping document affects:
            </para>

        <itemizedlist>
             <listitem>
                <para>
                    retrieval via <literal>get()</literal> or <literal>load()</literal>
                </para>
            </listitem>
            <listitem>
                <para>
                    retrieval that happens implicitly when an association is navigated
                </para>
            </listitem>
            <listitem>
                <para>
                    <literal>Criteria</literal> queries
                </para>
            </listitem>
            <listitem>
                <para>
                    HQL queries if <literal>subselect</literal> fetching is used
                </para>
            </listitem>
        </itemizedlist>

            <para>
                No matter what fetching strategy you use, the defined non-lazy graph is guaranteed
                to be loaded into memory. Note that this might result in several immediate selects
                being used to execute a particular HQL query.
            </para>

            <para>
                Usually, we don't use the mapping document to customize fetching. Instead, we
                keep the default behavior, and override it for a particular transaction, using
                <literal>left join fetch</literal> in HQL. This tells Hibernate to fetch
                the association eagerly in the first select, using an outer join. In the
                <literal>Criteria</literal> query API, you would use
                <literal>setFetchMode()</literal> with <literal>FetchMode.JOIN</literal>.
            </para>

            <para>
                If you want to change the fetching strategy used by
                <literal>get()</literal> or <literal>load()</literal> for a particular retrieval,
                use a <literal>Criteria</literal> query instead, for example:
            </para>

            <programlisting><![CDATA[User user = (User) session.createCriteria(User.class)
                .setFetchMode("permissions", FetchMode.JOIN)
                .add( Restrictions.idEq(userId) )
                .uniqueResult();]]></programlisting>

            <para>
                (This is Hibernate's equivalent of what some ORM solutions call a "fetch plan".)
            </para>

            <para>
                A completely different way to avoid problems with N+1 selects is to use the
                second-level cache.
            </para>

        </sect2>

        <sect2 id="performance-fetching-proxies" revision="2">
            <title>Single-ended association proxies</title>

            <para>
                Lazy fetching for collections is implemented using Hibernate's own implementation
                of persistent collections. However, a different mechanism is needed for lazy
                behavior in single-ended associations. The target entity of the association must
                be proxied. Hibernate implements lazy initializing proxies for persistent objects
                using runtime bytecode enhancement (via the excellent CGLIB library).
            </para>

            <para>
                By default, Hibernate3 generates proxies (at startup) for all persistent classes
                and uses them to enable lazy fetching of <literal>many-to-one</literal> and
                <literal>one-to-one</literal> associations.
            </para>

            <para>
                The mapping file may declare an interface to use as the proxy interface for that
                class, with the <literal>proxy</literal> attribute. By default, Hibernate uses a subclass
                of the class. <emphasis>Note that the proxied class must have a default constructor
                with at least package visibility. We recommend this constructor for all persistent classes!</emphasis>
            </para>

            <para>
                There are some gotchas to be aware of when extending this approach to polymorphic
                classes, for example:
            </para>

            <programlisting><![CDATA[<class name="Cat" proxy="Cat">
    ......
    <subclass name="DomesticCat">
        .....
    </subclass>
</class>]]></programlisting>

            <para>
                Firstly, a proxy for <literal>Cat</literal> will never be castable to
                <literal>DomesticCat</literal>, even if the underlying instance is an
                instance of <literal>DomesticCat</literal>:
            </para>

            <programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id);  // instantiate a proxy (does not hit the db)
if ( cat.isDomesticCat() ) {                  // hit the db to initialize the proxy
    DomesticCat dc = (DomesticCat) cat;       // Error!
    // ...
}]]></programlisting>

            <para>
                Secondly, it is possible to break proxy identity (<literal>==</literal>):
            </para>

            <programlisting><![CDATA[Cat cat = (Cat) session.load(Cat.class, id);            // instantiate a Cat proxy
DomesticCat dc =
        (DomesticCat) session.load(DomesticCat.class, id);  // acquire new DomesticCat proxy!
System.out.println(cat==dc);                            // false]]></programlisting>

            <para>
                However, the situation is not quite as bad as it looks. Even though we now have two references
                to different proxy objects, the underlying instance will still be the same object:
            </para>

            <programlisting><![CDATA[cat.setWeight(11.0);  // hit the db to initialize the proxy
System.out.println( dc.getWeight() );  // 11.0]]></programlisting>

            <para>
                Thirdly, you may not use a CGLIB proxy for a <literal>final</literal> class or a class
                with any <literal>final</literal> methods.
            </para>

            <para>
                Finally, if your persistent object acquires any resources upon instantiation (for example,
                in initializers or the default constructor), then those resources will also be acquired by
                the proxy. The proxy class is an actual subclass of the persistent class.
            </para>

            <para>
                These problems are all due to fundamental limitations in Java's single inheritance model.
                If you wish to avoid these problems, your persistent classes must each implement an interface
                that declares its business methods. You should specify these interfaces in the mapping file, for example:
            </para>

            <programlisting><![CDATA[<class name="CatImpl" proxy="Cat">
    ......
    <subclass name="DomesticCatImpl" proxy="DomesticCat">
        .....
    </subclass>
</class>]]></programlisting>

            <para>
                where <literal>CatImpl</literal> implements the interface <literal>Cat</literal> and
                <literal>DomesticCatImpl</literal> implements the interface <literal>DomesticCat</literal>. Then
                proxies for instances of <literal>Cat</literal> and <literal>DomesticCat</literal> may be returned
                by <literal>load()</literal> or <literal>iterate()</literal>. (Note that <literal>list()</literal>
                does not usually return proxies.)
            </para>

            <programlisting><![CDATA[Cat cat = (Cat) session.load(CatImpl.class, catid);
Iterator iter = session.createQuery("from CatImpl as cat where cat.name='fritz'").iterate();
Cat fritz = (Cat) iter.next();]]></programlisting>

            <para>
                Relationships are also lazily initialized. This means you must declare any properties to be of
                type <literal>Cat</literal>, not <literal>CatImpl</literal>.
            </para>

            <para>
                Certain operations do <emphasis>not</emphasis> require proxy initialization:
            </para>

            <itemizedlist spacing="compact">
                <listitem>
                    <para>
                        <literal>equals()</literal>, if the persistent class does not override
                        <literal>equals()</literal>
                    </para>
                </listitem>
                <listitem>
                    <para>
                        <literal>hashCode()</literal>, if the persistent class does not override
                        <literal>hashCode()</literal>
                    </para>
                </listitem>
                <listitem>
                    <para>
                        The identifier getter method
                    </para>
                </listitem>
            </itemizedlist>

            <para>
                Hibernate will detect persistent classes that override <literal>equals()</literal> or
                <literal>hashCode()</literal>.
            </para>
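            <para>
                The interface-proxy approach, and the list of operations that do not trigger
                initialization, can be sketched with the JDK's own <literal>java.lang.reflect.Proxy</literal>,
                used here in place of CGLIB purely for illustration. In this hypothetical sketch,
                <literal>Cat</literal>, <literal>CatImpl</literal> and <literal>LazyProxies</literal>
                are stand-ins: the identifier getter and the non-overridden <literal>equals()</literal>
                and <literal>hashCode()</literal> are answered without loading, and any other call
                loads the target exactly once.
            </para>

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.function.Function;

// Hypothetical business interface, playing the role of proxy="Cat".
interface Cat {
    Long getId();
    double getWeight();
    void setWeight(double weight);
}

// Hypothetical persistent class, playing the role of CatImpl.
// It does not override equals() or hashCode().
class CatImpl implements Cat {
    private final Long id;
    private double weight;

    CatImpl(Long id, double weight) {
        this.id = id;
        this.weight = weight;
    }

    public Long getId() { return id; }
    public double getWeight() { return weight; }
    public void setWeight(double weight) { this.weight = weight; }
}

final class LazyProxies {
    private LazyProxies() {}

    // Returns an interface proxy that calls loader only when a method other than
    // getId(), equals(), hashCode() or toString() is invoked.
    static Cat lazyCat(Long id, Function<Long, ? extends Cat> loader) {
        InvocationHandler handler = new InvocationHandler() {
            private Cat target; // stays null until the proxy is initialized

            @Override
            public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
                if (method.getDeclaringClass() == Object.class) {
                    switch (method.getName()) {
                        case "equals":   return proxy == args[0];            // identity, no load
                        case "hashCode": return System.identityHashCode(proxy);
                        default:         return "Cat proxy #" + id;         // toString()
                    }
                }
                if (method.getName().equals("getId")) {
                    return id; // identifier getter: no load
                }
                if (target == null) {
                    target = loader.apply(id); // simulated database hit
                }
                try {
                    return method.invoke(target, args);
                } catch (InvocationTargetException e) {
                    throw e.getCause(); // rethrow the target's own exception
                }
            }
        };
        return (Cat) Proxy.newProxyInstance(
                Cat.class.getClassLoader(), new Class<?>[] {Cat.class}, handler);
    }
}
```

            <para>
                Because this proxy implements only <literal>Cat</literal>, a check such as
                <literal>cat instanceof CatImpl</literal> is false, which is why properties should be
                declared with the interface type.
            </para>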

            <para>
                By choosing <literal>lazy="no-proxy"</literal> instead of the default
                <literal>lazy="proxy"</literal>, we can avoid the problems associated with typecasting.
                However, we will require buildtime bytecode instrumentation, and all operations
                will result in immediate proxy initialization.
            </para>

        </sect2>

        <sect2 id="performance-fetching-initialization" revision="1">
            <title>Initializing collections and proxies</title>

            <para>
                A <literal>LazyInitializationException</literal> will be thrown by Hibernate if an uninitialized
                collection or proxy is accessed outside of the scope of the <literal>Session</literal>, that is,
                when the entity owning the collection or having the reference to the proxy is in the detached state.
            </para>

            <para>
                Sometimes we need to ensure that a proxy or collection is initialized before closing the
                <literal>Session</literal>. Of course, we can always force initialization by calling
                <literal>cat.getSex()</literal> or <literal>cat.getKittens().size()</literal>, for example.
                But that is confusing to readers of the code and is not convenient for generic code.
            </para>

            <para>
                The static methods <literal>Hibernate.initialize()</literal> and <literal>Hibernate.isInitialized()</literal>
                provide the application with a convenient way of working with lazily initialized collections or
                proxies. <literal>Hibernate.initialize(cat)</literal> will force the initialization of a proxy,
                <literal>cat</literal>, as long as its <literal>Session</literal> is still open.
                <literal>Hibernate.initialize( cat.getKittens() )</literal> has a similar effect for the collection
                of kittens.
            </para>
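            <para>
                The contract of <literal>Hibernate.initialize()</literal> and
                <literal>Hibernate.isInitialized()</literal> can be modelled in a few lines of plain Java.
                This is a hypothetical sketch, not Hibernate's implementation: <literal>FakeSession</literal>
                stands in for an open or closed <literal>Session</literal>, and the nested
                <literal>LazyInitializationException</literal> is a simplified stand-in for Hibernate's
                exception of the same name.
            </para>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of a lazily initialized persistent collection.
final class LazyCollection<E> {

    /** Simplified stand-in for org.hibernate.LazyInitializationException. */
    static final class LazyInitializationException extends RuntimeException {
        LazyInitializationException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a Session that is either open or closed. */
    static final class FakeSession {
        private boolean open = true;
        boolean isOpen() { return open; }
        void close() { open = false; }
    }

    private final FakeSession session;
    private final Supplier<List<E>> loader; // the deferred SELECT
    private List<E> elements;               // null until initialized

    LazyCollection(FakeSession session, Supplier<List<E>> loader) {
        this.session = session;
        this.loader = loader;
    }

    /** Counterpart of Hibernate.isInitialized(collection). */
    boolean isInitialized() {
        return elements != null;
    }

    /** Counterpart of Hibernate.initialize(collection): only works while the session is open. */
    void initialize() {
        if (elements != null) {
            return;
        }
        if (!session.isOpen()) {
            throw new LazyInitializationException("failed to lazily initialize a collection - no session");
        }
        elements = new ArrayList<>(loader.get());
    }

    int size() {
        initialize();
        return elements.size();
    }

    List<E> elements() {
        initialize();
        return Collections.unmodifiableList(elements);
    }
}
```

            <para>
                In this model, a collection initialized while the session was open stays usable after
                the session is closed, while an uninitialized one fails on first access.
            </para>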

            <para>
                Another option is to keep the <literal>Session</literal> open until all needed
                collections and proxies have been loaded. In some application architectures,
                particularly where the code that accesses data using Hibernate and the code that
                uses it are in different application layers or different physical processes, it
                can be a problem to ensure that the <literal>Session</literal> is open when a
                collection is initialized. There are three basic ways to deal with this issue:
            </para>

            <itemizedlist>
                <listitem>
                    <para>
                        In a web-based application, a servlet filter can be used to close the
                        <literal>Session</literal> only at the very end of a user request, once
                        the rendering of the view is complete (the <emphasis>Open Session in
                        View</emphasis> pattern).  Of course, this places heavy demands on the
                        correctness of the exception handling of your application infrastructure.
                        It is vitally important that the <literal>Session</literal> is closed and the
                        transaction ended before returning to the user, even when an exception occurs
                        during rendering of the view. See the Hibernate Wiki for examples of this
                        "Open Session in View" pattern.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        In an application with a separate business tier, the business logic must
                        "prepare" all collections that will be needed by the web tier before
                        returning. This means that the business tier should load all the data and
                        return all the data already initialized to the presentation/web tier that
                        is required for a particular use case. Usually, the application calls
                        <literal>Hibernate.initialize()</literal> for each collection that will
                        be needed in the web tier (this call must occur before the session is closed)
                        or retrieves the collection eagerly using a Hibernate query with a
                        <literal>FETCH</literal> clause or a <literal>FetchMode.JOIN</literal> in
                        <literal>Criteria</literal>. This is usually easier if you adopt the
                        <emphasis>Command</emphasis> pattern instead of a <emphasis>Session Facade</emphasis>.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        You may also attach a previously loaded object to a new <literal>Session</literal>
                        with <literal>merge()</literal> or <literal>lock()</literal> before
                        accessing uninitialized collections (or other proxies). Note that
                        <literal>merge()</literal> returns a persistent copy; it is that copy, not the
                        detached object passed in, whose associations can then be initialized.
                        Hibernate does not do this automatically, and certainly <emphasis>should</emphasis>
                        not, since it would introduce ad hoc transaction semantics.
                    </para>
                </listitem>
            </itemizedlist>

            <para>
                Sometimes you don't want to initialize a large collection, but still need some
                information about it (like its size) or a subset of the data.
            </para>

            <para>
                You can use a collection filter to get the size of a collection without initializing it:
            </para>

            <programlisting><![CDATA[( (Number) s.createFilter( collection, "select count(*)" ).list().get(0) ).intValue()]]></programlisting>

            <para>
                The <literal>createFilter()</literal> method is also used to efficiently retrieve subsets
                of a collection without needing to initialize the whole collection:
            </para>

            <programlisting><![CDATA[s.createFilter( lazyCollection, "").setFirstResult(0).setMaxResults(10).list();]]></programlisting>

        </sect2>

        <sect2 id="performance-fetching-batch">
            <title>Using batch fetching</title>

            <para>
                Hibernate can make efficient use of batch fetching; that is, Hibernate can load several uninitialized
                proxies (or collections) when one of them is accessed. Batch fetching is an optimization of the lazy
                select fetching strategy. There are two ways you can tune batch fetching: on the class level and on
                the collection level.
            </para>

            <para>
                Batch fetching for classes/entities is easier to understand. Imagine you have the following situation
                at runtime: You have 25 <literal>Cat</literal> instances loaded in a <literal>Session</literal>, each
                <literal>Cat</literal> has a reference to its <literal>owner</literal>, a <literal>Person</literal>.
                The <literal>Person</literal> class is mapped with a proxy, <literal>lazy="true"</literal>. If you now
                iterate through all cats and call <literal>getOwner()</literal> on each, Hibernate will by default
                execute 25 <literal>SELECT</literal> statements, to retrieve the proxied owners. You can tune this
                behavior by specifying a <literal>batch-size</literal> in the mapping of <literal>Person</literal>:
            </para>

            <programlisting><![CDATA[<class name="Person" batch-size="10">...</class>]]></programlisting>

            <para>
                Hibernate will now execute only three queries, loading 10, 10 and 5 owners respectively.
            </para>
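            <para>
                The 10, 10, 5 pattern is simply the list of pending identifiers cut into fixed-size
                chunks, each loaded with one <literal>IN</literal>-list <literal>SELECT</literal>. The
                sketch below shows that arithmetic under the assumption of a plain fixed-size split;
                Hibernate's own batch loader may size its batches differently, and the class, table
                and column names are hypothetical.
            </para>

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of batch fetching arithmetic: pending identifiers are cut
// into chunks of at most batchSize, and each chunk becomes one IN-list SELECT.
final class BatchPlanner {
    private BatchPlanner() {}

    static <T> List<List<T>> batches(List<T> pendingIds, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        List<List<T>> result = new ArrayList<>();
        for (int from = 0; from < pendingIds.size(); from += batchSize) {
            int to = Math.min(from + batchSize, pendingIds.size());
            result.add(new ArrayList<>(pendingIds.subList(from, to)));
        }
        return result;
    }

    // Table and column names are hypothetical placeholders.
    static String sqlFor(int batchLength) {
        if (batchLength < 1) {
            throw new IllegalArgumentException("batchLength must be at least 1");
        }
        return "select * from person where id in ("
                + String.join(", ", Collections.nCopies(batchLength, "?")) + ")";
    }
}
```

            <para>
                The same split applied to 10 pending collections with a batch size of 3 yields
                3, 3, 3 and 1.
            </para>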

            <para>
                You may also enable batch fetching of collections. For example, if each <literal>Person</literal> has
                a lazy collection of <literal>Cat</literal>s, and 10 persons are currently loaded in the
                <literal>Session</literal>, iterating through all persons will generate 10 <literal>SELECT</literal>s,
                one for every call to <literal>getCats()</literal>. If you enable batch fetching for the
                <literal>cats</literal> collection in the mapping of <literal>Person</literal>, Hibernate can pre-fetch
                collections:
            </para>

            <programlisting><![CDATA[<class name="Person">
    <set name="cats" batch-size="3">
        ...
    </set>
</class>]]></programlisting>

            <para>
                With a <literal>batch-size</literal> of 3, Hibernate will load 3, 3, 3, 1 collections in four
                <literal>SELECT</literal>s. Again, the value of the attribute depends on the expected number of
                uninitialized collections in a particular <literal>Session</literal>.
            </para>

            <para>
                Batch fetching of collections is particularly useful if you have a nested tree of items, that is,
                the typical bill-of-materials pattern. (Although a <emphasis>nested set</emphasis> or a
                <emphasis>materialized path</emphasis> might be a better option for read-mostly trees.)
            </para>

        </sect2>

        <sect2 id="performance-fetching-subselect">
            <title>Using subselect fetching</title>

            <para>
                If one lazy collection has to be fetched, Hibernate loads that collection for all
                entities returned by the original query, re-running the original query in a subselect.
                This works in the same way as batch fetching, but without the piecemeal loading.
            </para>
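            <para>
                Unlike batch fetching, the collection-loading statement embeds the original query
                rather than listing identifiers, so all the collections arrive in a single
                <literal>SELECT</literal> however many owners the first query returned. A minimal
                string-building sketch, assuming hypothetical table and column names:
            </para>

```java
// Hypothetical sketch of subselect fetching: the statement that loads the
// collections re-runs the original query as a subselect instead of listing ids.
final class SubselectSql {
    private SubselectSql() {}

    /**
     * @param childTable   table holding the collection rows (hypothetical name)
     * @param foreignKey   column referencing the owning entity (hypothetical name)
     * @param ownerIdQuery the original query, projected down to owner identifiers
     */
    static String collectionLoad(String childTable, String foreignKey, String ownerIdQuery) {
        return "select * from " + childTable
                + " where " + foreignKey + " in (" + ownerIdQuery + ")";
    }
}
```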

            <!-- TODO: Write more about this -->

        </sect2>

        <sect2 id="performance-fetching-lazyproperties">
            <title>Using lazy property fetching</title>

            <para>
                Hibernate3 supports the lazy fetching of individual properties. This optimization technique
                is also known as <emphasis>fetch groups</emphasis>. Please note that this is mostly a
                marketing feature, as in practice, optimizing row reads is much more important than
                optimization of column reads. However, only loading some properties of a class might
                be useful in extreme cases, when legacy tables have hundreds of columns and the data model
                cannot be improved.
            </para>

            <para>
                To enable lazy property loading, set the <literal>lazy</literal> attribute on your
                particular property mappings:
            </para>

            <programlisting><![CDATA[<class name="Document">
    <id name="id">
        <generator class="native"/>
    </id>
    <property name="name" not-null="true" length="50"/>
    <property name="summary" not-null="true" length="200" lazy="true"/>
    <property name="text" not-null="true" length="2000" lazy="true"/>
</class>]]></programlisting>

            <para>
                Lazy property loading requires buildtime bytecode instrumentation! If your persistent
                classes are not enhanced, Hibernate will silently ignore lazy property settings and
                fall back to immediate fetching.
            </para>

            <para>
                For bytecode instrumentation, use the following Ant task:
            </para>

            <programlisting><![CDATA[<target name="instrument" depends="compile">
    <taskdef name="instrument" classname="org.hibernate.tool.instrument.InstrumentTask">
        <classpath path="${jar.path}"/>
        <classpath path="${classes.dir}"/>
        <classpath refid="lib.class.path"/>
    </taskdef>

    <instrument verbose="true">
        <fileset dir="${testclasses.dir}/org/hibernate/auction/model">
            <include name="*.class"/>
        </fileset>
    </instrument>
</target>]]></programlisting>

            <para>
                A different, and often better, way to avoid unnecessary column reads, at least for
                read-only transactions, is to use the projection features of HQL or Criteria
                queries. This avoids the need for buildtime bytecode processing and is
                certainly a preferred solution.
            </para>

            <para>
                You may force the usual eager fetching of properties using <literal>fetch all
                properties</literal> in HQL.
            </para>

        </sect2>

    </sect1>

    <sect1 id="performance-cache" revision="1">
        <title>The Second Level Cache</title>

        <para>
            A Hibernate <literal>Session</literal> is a transaction-level cache of persistent data. It is
            possible to configure a cluster or JVM-level (<literal>SessionFactory</literal>-level) cache on
            a class-by-class and collection-by-collection basis. You may even plug in a clustered cache. Be
            careful. Caches are never aware of changes made to the persistent store by another application
            (though they may be configured to regularly expire cached data).
        </para>
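        <para>
            Conceptually, a cache region is a map from identifier to cached state that lives as long
            as the <literal>SessionFactory</literal>. The sketch below is a hypothetical, simplified
            region with time-based expiry, the kind of regular expiration mentioned above; it is not
            the <literal>org.hibernate.cache</literal> API, and real providers add eviction,
            clustering and locking.
        </para>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.LongSupplier;

// Hypothetical, simplified second-level cache region with time-based expiry.
// Not the org.hibernate.cache API; real providers add eviction, clustering and locking.
final class ExpiringRegion<K, V> {
    private static final class Entry<T> {
        final T value;
        final long expiresAt;

        Entry(T value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable so expiry can be tested deterministically

    ExpiringRegion(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = Objects.requireNonNull(clock);
    }

    /** Convenience constructor using the system clock. */
    ExpiringRegion(long ttlMillis) {
        this(ttlMillis, System::currentTimeMillis);
    }

    synchronized void put(K key, V value) {
        entries.put(key, new Entry<>(value, clock.getAsLong() + ttlMillis));
    }

    /** Returns the cached value, or null on a miss or once the entry has expired. */
    synchronized V get(K key) {
        Entry<V> entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (clock.getAsLong() >= entry.expiresAt) {
            entries.remove(key); // stale: another application may have changed the row
            return null;
        }
        return entry.value;
    }
}
```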

        <para revision="1">
            You have the option to tell Hibernate which caching implementation to use by
            specifying the name of a class that implements <literal>org.hibernate.cache.CacheProvider</literal>
            using the property <literal>hibernate.cache.provider_class</literal>.  Hibernate
            comes bundled with a number of built-in integrations with open-source cache providers
            (listed below); additionally, you could implement your own and plug it in as
            outlined above. Note that versions prior to 3.2 used EhCache as the
            default cache provider; that is no longer the case as of 3.2.
        </para>

        <table frame="topbot" id="cacheproviders" revision="1">
            <title>Cache Providers</title>
            <tgroup cols='5' align='left' colsep='1' rowsep='1'>
            <colspec colname='c1' colwidth="1*"/>
            <colspec colname='c2' colwidth="3*"/>
            <colspec colname='c3' colwidth="1*"/>
            <colspec colname='c4' colwidth="1*"/>
            <colspec colname='c5' colwidth="1*"/>
            <thead>
            <row>
              <entry>Cache</entry>
              <entry>Provider class</entry>
              <entry>Type</entry>
              <entry>Cluster Safe</entry>
              <entry>Query Cache Supported</entry>
            </row>
            </thead>
            <tbody>
            <row>
                <entry>Hashtable (not intended for production use)</entry>
                <entry><literal>org.hibernate.cache.HashtableCacheProvider</literal></entry>
                <entry>memory</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>EHCache</entry>
                <entry><literal>org.hibernate.cache.EhCacheProvider</literal></entry>
                <entry>memory, disk</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>OSCache</entry>
                <entry><literal>org.hibernate.cache.OSCacheProvider</literal></entry>
                <entry>memory, disk</entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            <row>
                <entry>SwarmCache</entry>
                <entry><literal>org.hibernate.cache.SwarmCacheProvider</literal></entry>
                <entry>clustered (IP multicast)</entry>
                <entry>yes (clustered invalidation)</entry>
                <entry></entry>
            </row>
            <row>
                <entry>JBoss TreeCache</entry>
                <entry><literal>org.hibernate.cache.TreeCacheProvider</literal></entry>
                <entry>clustered (IP multicast), transactional</entry>
                <entry>yes (replication)</entry>
                <entry>yes (clock sync required)</entry>
            </row>
            </tbody>
            </tgroup>
        </table>

        <sect2 id="performance-cache-mapping" revision="2">
            <title>Cache mappings</title>

            <para>
                The <literal>&lt;cache&gt;</literal> element of a class or collection mapping has the
                following form:
            </para>

            <programlistingco>
                <areaspec>
                    <area id="cache1" coords="2 70"/>
                    <area id="cache2" coords="3 70"/>
                    <area id="cache3" coords="4 70"/>
                </areaspec>
                <programlisting><![CDATA[<cache
    usage="transactional|read-write|nonstrict-read-write|read-only"
    region="RegionName"
    include="all|non-lazy"
/>]]></programlisting>
                <calloutlist>
                    <callout arearefs="cache1">
                        <para>
                            <literal>usage</literal> (required) specifies the caching strategy:
                            <literal>transactional</literal>,
                            <literal>read-write</literal>,
                            <literal>nonstrict-read-write</literal> or
                            <literal>read-only</literal>
                        </para>
                    </callout>
                    <callout arearefs="cache2">
                        <para>
                            <literal>region</literal> (optional, defaults to the class or
                            collection role name) specifies the name of the second level cache
                            region
                        </para>
                    </callout>
                    <callout arearefs="cache3">
                        <para>
                            <literal>include</literal> (optional, defaults to <literal>all</literal>):
                            <literal>non-lazy</literal> specifies that properties of the entity mapped
                            with <literal>lazy="true"</literal> are not cached when attribute-level
                            lazy fetching is enabled
                        </para>
                    </callout>
                </calloutlist>
            </programlistingco>

            <para>
                Alternatively, you may specify <literal>&lt;class-cache&gt;</literal> and
                <literal>&lt;collection-cache&gt;</literal> elements in <literal>hibernate.cfg.xml</literal>.
            </para>

            <para>
                The <literal>usage</literal> attribute specifies a <emphasis>cache concurrency strategy</emphasis>.
            </para>

        </sect2>

        <sect2 id="performance-cache-readonly">
            <title>Strategy: read only</title>

            <para>
                If your application needs to read but never modify instances of a persistent class, a
                <literal>read-only</literal> cache may be used. This is the simplest and best performing
                strategy. It's even perfectly safe for use in a cluster.
            </para>

            <programlisting><![CDATA[<class name="eg.Immutable" mutable="false">
    <cache usage="read-only"/>
    ....
</class>]]></programlisting>

        </sect2>


        <sect2 id="performance-cache-readwrite">
            <title>Strategy: read/write</title>

            <para>
                If the application needs to update data, a <literal>read-write</literal> cache might be appropriate.
                This cache strategy should never be used if serializable transaction isolation level is required.
                If the cache is used in a JTA environment, you must specify the property
                <literal>hibernate.transaction.manager_lookup_class</literal>, naming a strategy for obtaining the
                JTA <literal>TransactionManager</literal>. In other environments, you should ensure that the transaction
                is completed when <literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
                If you wish to use this strategy in a cluster, you should ensure that the underlying cache implementation
                supports locking. The built-in cache providers do <emphasis>not</emphasis>.
            </para>

            <programlisting><![CDATA[<class name="eg.Cat" .... >
    <cache usage="read-write"/>
    ....
    <set name="kittens" ... >
        <cache usage="read-write"/>
        ....
    </set>
</class>]]></programlisting>

        </sect2>

        <sect2 id="performance-cache-nonstrict">
            <title>Strategy: nonstrict read/write</title>

            <para>
                If the application only occasionally needs to update data (i.e., if it is extremely unlikely that two
                transactions would try to update the same item simultaneously) and strict transaction isolation is
                not required, a <literal>nonstrict-read-write</literal> cache might be appropriate. If the cache is
                used in a JTA environment, you must specify <literal>hibernate.transaction.manager_lookup_class</literal>.
                In other environments, you should ensure that the transaction is completed when
                <literal>Session.close()</literal> or <literal>Session.disconnect()</literal> is called.
            </para>

        </sect2>

        <sect2 id="performance-cache-transactional">
            <title>Strategy: transactional</title>

            <para>
                The <literal>transactional</literal> cache strategy provides support for fully transactional cache
                providers such as JBoss TreeCache. Such a cache may only be used in a JTA environment and you must
                specify <literal>hibernate.transaction.manager_lookup_class</literal>.
            </para>

        </sect2>

        <para>
            None of the cache providers supports all of the cache concurrency strategies. The following table shows
            which providers are compatible with which concurrency strategies.
        </para>

        <table frame="topbot">
            <title>Cache Concurrency Strategy Support</title>
            <tgroup cols='5' align='left' colsep='1' rowsep='1'>
            <colspec colname='c1' colwidth="1*"/>
            <colspec colname='c2' colwidth="1*"/>
            <colspec colname='c3' colwidth="1*"/>
            <colspec colname='c4' colwidth="1*"/>
            <colspec colname='c5' colwidth="1*"/>
            <thead>
            <row>
              <entry>Cache</entry>
              <entry>read-only</entry>
              <entry>nonstrict-read-write</entry>
              <entry>read-write</entry>
              <entry>transactional</entry>
            </row>
            </thead>
            <tbody>
            <row>
                <entry>Hashtable (not intended for production use)</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>EHCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>OSCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
            </row>
            <row>
                <entry>SwarmCache</entry>
                <entry>yes</entry>
                <entry>yes</entry>
                <entry></entry>
                <entry></entry>
            </row>
            <row>
                <entry>JBoss TreeCache</entry>
                <entry>yes</entry>
                <entry></entry>
                <entry></entry>
                <entry>yes</entry>
            </row>
            </tbody>
            </tgroup>
        </table>
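        <para>
            If you want to check a planned provider and strategy combination in code, the table can be
            encoded as a simple lookup. This is a hypothetical helper: the enums merely mirror the rows
            and columns of the table above and are not part of the Hibernate API.
        </para>

        <programlisting><![CDATA[import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class CacheStrategySupport {

    // Hypothetical enums mirroring the table; not Hibernate types.
    enum Provider { HASHTABLE, EHCACHE, OSCACHE, SWARMCACHE, TREECACHE }
    enum Strategy { READ_ONLY, NONSTRICT_READ_WRITE, READ_WRITE, TRANSACTIONAL }

    private static final Map<Provider, Set<Strategy>> SUPPORT =
            new EnumMap<Provider, Set<Strategy>>(Provider.class);

    static {
        // One entry per table row: the strategies marked "yes".
        SUPPORT.put(Provider.HASHTABLE, EnumSet.of(
                Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE, Strategy.READ_WRITE));
        SUPPORT.put(Provider.EHCACHE, EnumSet.of(
                Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE, Strategy.READ_WRITE));
        SUPPORT.put(Provider.OSCACHE, EnumSet.of(
                Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE, Strategy.READ_WRITE));
        SUPPORT.put(Provider.SWARMCACHE, EnumSet.of(
                Strategy.READ_ONLY, Strategy.NONSTRICT_READ_WRITE));
        SUPPORT.put(Provider.TREECACHE, EnumSet.of(
                Strategy.READ_ONLY, Strategy.TRANSACTIONAL));
    }

    // Returns true if the table lists the strategy as supported by the provider.
    static boolean supports(Provider provider, Strategy strategy) {
        return SUPPORT.get(provider).contains(strategy);
    }

    public static void main(String[] args) {
        System.out.println(supports(Provider.TREECACHE, Strategy.READ_WRITE)); // false
        System.out.println(supports(Provider.EHCACHE, Strategy.READ_WRITE));   // true
    }
}]]></programlisting>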

    </sect1>

    <sect1 id="performance-sessioncache" revision="2">
        <title>Managing the caches</title>

        <para>
            Whenever you pass an object to <literal>save()</literal>, <literal>update()</literal>
            or <literal>saveOrUpdate()</literal> and whenever you retrieve an object using
            <literal>load()</literal>, <literal>get()</literal>, <literal>list()</literal>,
            <literal>iterate()</literal> or <literal>scroll()</literal>, that object is added
            to the internal cache of the <literal>Session</literal>.
        </para>
        <para>
            When <literal>flush()</literal> is subsequently called, the state of that object will
            be synchronized with the database. If you do not want this synchronization to occur or
            if you are processing a huge number of objects and need to manage memory efficiently,
            the <literal>evict()</literal> method may be used to remove the object and its collections
            from the first-level cache.
        </para>

        <programlisting><![CDATA[ScrollableResults cats = sess.createQuery("from Cat as cat").scroll(); //a huge result set
while ( cats.next() ) {
    Cat cat = (Cat) cats.get(0);
    doSomethingWithACat(cat);
    sess.evict(cat);
}]]></programlisting>

        <para>
            The <literal>Session</literal> also provides a <literal>contains()</literal> method to determine
            if an instance belongs to the session cache.
        </para>

        <para>
            To completely evict all objects from the session cache, call <literal>Session.clear()</literal>.
        </para>

        <para>
            For the second-level cache, there are methods defined on <literal>SessionFactory</literal> for
            evicting the cached state of an instance, entire class, collection instance or entire collection
            role.
        </para>

        <programlisting><![CDATA[sessionFactory.evict(Cat.class, catId); //evict a particular Cat
sessionFactory.evict(Cat.class);  //evict all Cats
sessionFactory.evictCollection("Cat.kittens", catId); //evict a particular collection of kittens
sessionFactory.evictCollection("Cat.kittens"); //evict all kitten collections]]></programlisting>

        <para>
            The <literal>CacheMode</literal> controls how a particular session interacts with the second-level
            cache.
        </para>

        <itemizedlist>
        <listitem>
        <para>
            <literal>CacheMode.NORMAL</literal> - read items from and write items to the second-level cache
        </para>
        </listitem>
        <listitem>
        <para>
            <literal>CacheMode.GET</literal> - read items from the second-level cache, but don't write to
            the second-level cache except when updating data
        </para>
        </listitem>
        <listitem>
        <para>
            <literal>CacheMode.PUT</literal> - write items to the second-level cache, but don't read from
            the second-level cache
        </para>
        </listitem>
        <listitem>
        <para>
            <literal>CacheMode.REFRESH</literal> - write items to the second-level cache, but don't read from
            the second-level cache, bypass the effect of <literal>hibernate.cache.use_minimal_puts</literal>, forcing
            a refresh of the second-level cache for all items read from the database
        </para>
        </listitem>
        </itemizedlist>
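        <para>
            The difference between the read and write behaviour of these modes can be illustrated with a
            small, simplified simulation of loading a single item. This is a hypothetical model, not how
            Hibernate is implemented internally: the <literal>Mode</literal> enum only mirrors the modes
            above, two maps stand in for the second-level cache and the database, and only loads (not
            updates) are modelled. It shows why <literal>REFRESH</literal> is needed to overwrite a stale
            entry when <literal>hibernate.cache.use_minimal_puts</literal> is enabled.
        </para>

        <programlisting><![CDATA[import java.util.HashMap;
import java.util.Map;

public class CacheModeSimulation {

    // Hypothetical enum mirroring the modes described above; not org.hibernate.CacheMode.
    enum Mode { NORMAL, GET, PUT, REFRESH }

    // Simplified load of one item: "cache" stands in for the second-level cache,
    // "db" for the database. Updates are not modelled.
    static String load(Mode mode, boolean minimalPuts,
                       Map<Long, String> cache, Map<Long, String> db, long id) {
        boolean readFromCache = mode == Mode.NORMAL || mode == Mode.GET;
        boolean putAfterLoad = mode != Mode.GET;

        if (readFromCache && cache.containsKey(id)) {
            return cache.get(id);                // cache hit, database not touched
        }
        String value = db.get(id);               // cache miss, or cache reads disabled

        // With minimal puts, an item already in the cache is not rewritten,
        // unless the mode is REFRESH, which bypasses that optimization.
        boolean skipPut = minimalPuts && mode != Mode.REFRESH && cache.containsKey(id);
        if (putAfterLoad && value != null && !skipPut) {
            cache.put(id, value);
        }
        return value;
    }

    public static void main(String[] args) {
        Map<Long, String> db = new HashMap<Long, String>();
        Map<Long, String> cache = new HashMap<Long, String>();
        db.put(1L, "fresh");
        cache.put(1L, "stale");  // e.g. the row was changed by another application

        System.out.println(load(Mode.NORMAL, true, cache, db, 1L));  // stale
        System.out.println(load(Mode.PUT, true, cache, db, 1L));     // fresh, cache still stale
        System.out.println(load(Mode.REFRESH, true, cache, db, 1L)); // fresh, cache overwritten
        System.out.println(cache.get(1L));                           // fresh
    }
}]]></programlisting>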

        <para>
            To browse the contents of a second-level or query cache region, use the <literal>Statistics</literal>
            API:
        </para>

        <programlisting><![CDATA[Map cacheEntries = sessionFactory.getStatistics()
        .getSecondLevelCacheStatistics(regionName)
        .getEntries();]]></programlisting>

        <para>
            You'll need to enable statistics, and, optionally, force Hibernate to keep the cache entries in a
            more human-understandable format:
        </para>

        <programlisting><![CDATA[hibernate.generate_statistics true
hibernate.cache.use_structured_entries true]]></programlisting>

    </sect1>

    <sect1 id="performance-querycache" revision="1">
        <title>The Query Cache</title>

        <para>
            Query result sets may also be cached. This is only useful for queries that are run
            frequently with the same parameters. To use the query cache you must first enable it:
        </para>

        <programlisting><![CDATA[hibernate.cache.use_query_cache true]]></programlisting>

        <para>
            This setting causes the creation of two new cache regions - one holding cached query
            result sets (<literal>org.hibernate.cache.StandardQueryCache</literal>), the other
            holding timestamps of the most recent updates to queryable tables
            (<literal>org.hibernate.cache.UpdateTimestampsCache</literal>). Note that the query
            cache does not cache the state of the actual entities in the result set; it caches
            only identifier values and results of value type. So the query cache should always be
            used in conjunction with the second-level cache.
        </para>
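        <para>
            The role of the timestamps region can be sketched with a simplified, hypothetical model: a
            cached result keeps only identifiers, plus the tables it was read from and the time it was
            cached, and it is reused only if none of those tables has been updated since. The class and
            method names are illustrative and are not Hibernate's own implementation.
        </para>

        <programlisting><![CDATA[import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class QueryCacheModel {

    // Stand-in for the timestamps region: table name -> time of its most recent update.
    private final Map<String, Long> lastUpdate = new HashMap<String, Long>();

    // Stand-in for one entry of the query results region. Only identifiers are kept;
    // the entity state itself has to come from the second-level cache or the database.
    static final class CachedResult {
        final List<Long> ids;
        final Set<String> tables;   // tables the query reads from
        final long cachedAt;        // when the result was cached

        CachedResult(List<Long> ids, Set<String> tables, long cachedAt) {
            this.ids = ids;
            this.tables = tables;
            this.cachedAt = cachedAt;
        }
    }

    void recordUpdate(String table, long time) {
        lastUpdate.put(table, time);
    }

    // Reusable only if no queried table was updated at or after caching time.
    boolean isUpToDate(CachedResult result) {
        for (String table : result.tables) {
            Long updated = lastUpdate.get(table);
            if (updated != null && updated >= result.cachedAt) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        QueryCacheModel model = new QueryCacheModel();
        CachedResult cats = new CachedResult(
                Arrays.asList(1L, 2L, 3L), new HashSet<String>(Arrays.asList("CAT")), 100L);
        System.out.println(model.isUpToDate(cats)); // true
        model.recordUpdate("DOG", 150L);
        System.out.println(model.isUpToDate(cats)); // true: DOG is not queried
        model.recordUpdate("CAT", 150L);
        System.out.println(model.isUpToDate(cats)); // false: CAT changed after caching
    }
}]]></programlisting>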

        <para>
            Most queries do not benefit from caching, so by default queries are not cached. To
            enable caching, call <literal>Query.setCacheable(true)</literal>. This call allows
            the query to look for existing cache results or add its results to the cache when
            it is executed.
        </para>

        <para>
            If you require fine-grained control over query cache expiration policies, you may
            specify a named cache region for a particular query by calling
            <literal>Query.setCacheRegion()</literal>.
        </para>

        <programlisting><![CDATA[List blogs = sess.createQuery("from Blog blog where blog.blogger = :blogger")
    .setEntity("blogger", blogger)
    .setMaxResults(15)
    .setCacheable(true)
    .setCacheRegion("frontpages")
    .list();]]></programlisting>

        <para>
            If the query should force a refresh of its query cache region, you should call
            <literal>Query.setCacheMode(CacheMode.REFRESH)</literal>. This is particularly useful
            in cases where underlying data may have been updated via a separate process (i.e.,
            not modified through Hibernate) and allows the application to selectively refresh
            particular query result sets. This is a more efficient alternative to eviction of
            a query cache region via <literal>SessionFactory.evictQueries()</literal>.
        </para>

    </sect1>

    <sect1 id="performance-collections">
        <title>Understanding Collection performance</title>

        <para>
            We've already spent quite some time talking about collections.
            In this section we will highlight a couple more issues about
            how collections behave at runtime.
        </para>

        <sect2 id="performance-collections-taxonomy">
            <title>Taxonomy</title>

            <para>Hibernate defines three basic kinds of collections:</para>

            <itemizedlist>
            <listitem>
                <para>collections of values</para>
            </listitem>
            <listitem>
                <para>one to many associations</para>
            </listitem>
            <listitem>
                <para>many to many associations</para>
            </listitem>
            </itemizedlist>

            <para>
                This classification distinguishes the various table and foreign key
                relationships but does not tell us quite everything we need to know
                about the relational model. To fully understand the relational structure
                and performance characteristics, we must also consider the structure of
                the primary key that is used by Hibernate to update or delete collection
                rows. This suggests the following classification:
            </para>

            <itemizedlist>
            <listitem>
                <para>indexed collections</para>
            </listitem>
            <listitem>
                <para>sets</para>
            </listitem>
            <listitem>
                <para>bags</para>
            </listitem>
            </itemizedlist>

            <para>
                All indexed collections (maps, lists, arrays) have a primary key consisting
                of the <literal>&lt;key&gt;</literal> and <literal>&lt;index&gt;</literal>
                columns. In this case collection updates are usually extremely efficient -
                the primary key may be efficiently indexed and a particular row may be efficiently
                located when Hibernate tries to update or delete it.
            </para>

            <para>
                Sets have a primary key consisting of <literal>&lt;key&gt;</literal> and element
                columns. This may be less efficient for some types of collection element, particularly
                composite elements or large text or binary fields; the database may not be able to index
                a complex primary key as efficiently.  On the other hand, for one to many or many to many
                associations, particularly in the case of synthetic identifiers, it is likely to be just
                as efficient. (Side-note: if you want <literal>SchemaExport</literal> to actually create
                the primary key of a <literal>&lt;set&gt;</literal> for you, you must declare all columns
                as <literal>not-null="true"</literal>.)
            </para>

            <para>
                <literal>&lt;idbag&gt;</literal> mappings define a surrogate key, so they are
                always very efficient to update. In fact, they are the best case.
            </para>

            <para>
                Bags are the worst case. Since a bag permits duplicate element values and has no
                index column, no primary key may be defined. Hibernate has no way of distinguishing
                between duplicate rows. Hibernate resolves this problem by completely removing
                (in a single <literal>DELETE</literal>) and recreating the collection whenever it
                changes. This might be very inefficient.
            </para>

            <para>
                Note that for a one-to-many association, the "primary key" may not be the physical
                primary key of the database table - but even in this case, the above classification
                is still useful. (It still reflects how Hibernate "locates" individual rows of the
                collection.)
            </para>

        </sect2>

        <sect2 id="performance-collections-mostefficientupdate">
            <title>Lists, maps, idbags and sets are the most efficient collections to update</title>

            <para>
                From the discussion above, it should be clear that indexed collections
                and (usually) sets allow the most efficient operation in terms of adding,
                removing and updating elements.
            </para>

            <para>
                There is, arguably, one more advantage that indexed collections have over sets for
                many to many associations or collections of values. Because of the structure of a
                <literal>Set</literal>, Hibernate doesn't ever <literal>UPDATE</literal> a row when
                an element is "changed". Changes to a <literal>Set</literal> always work via
                <literal>INSERT</literal> and <literal>DELETE</literal> (of individual rows). Once
                again, this consideration does not apply to one to many associations.
            </para>

            <para>
                Since arrays cannot be lazy, we conclude that lists, maps and
                idbags are the most performant (non-inverse) collection types, with sets not far
                behind. Sets are expected to be the most common kind of collection in Hibernate
                applications. This is because the "set" semantics are most natural in the relational
                model.
            </para>

            <para>
                However, in well-designed Hibernate domain models, we usually see that most collections
                are in fact one-to-many associations with <literal>inverse="true"</literal>. For these
                associations, the update is handled by the many-to-one end of the association, and so
                considerations of collection update performance simply do not apply.
            </para>

        </sect2>

        <sect2 id="performance-collections-mostefficentinverse">
            <title>Bags and lists are the most efficient inverse collections</title>

            <para>
                Just before you ditch bags forever, there is a particular case in which bags (and also lists)
                are much more performant than sets. For a collection with <literal>inverse="true"</literal>
                (the standard bidirectional one-to-many relationship idiom, for example) we can add elements
                to a bag or list without needing to initialize (fetch) the bag elements! This is because
                <literal>Collection.add()</literal> or <literal>Collection.addAll()</literal> must always
                return true for a bag or <literal>List</literal> (unlike a <literal>Set</literal>). This can
                make the following common code much faster.
            </para>

            <programlisting><![CDATA[Parent p = (Parent) sess.load(Parent.class, id);
Child c = new Child();
c.setParent(p);
p.getChildren().add(c);  //no need to fetch the collection!
sess.flush();]]></programlisting>
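            <para>
                The underlying reason is the contract of <literal>java.util.Collection.add()</literal>, which
                returns whether the collection changed. The following plain Java sketch (no Hibernate involved;
                the class and method names are made up for illustration) shows that a list can answer without
                looking at its existing elements, while a set cannot.
            </para>

            <programlisting><![CDATA[import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

public class AddReturnValue {

    // Adds the same element twice and returns the result of the second add().
    static boolean addTwice(Collection<String> c, String element) {
        c.add(element);
        return c.add(element);
    }

    public static void main(String[] args) {
        // A List (like a bag) accepts duplicates, so add() is always true and can be
        // answered without inspecting the current elements.
        System.out.println(addTwice(new ArrayList<String>(), "kitten")); // true
        // A Set must know its current elements to answer, which for a lazy
        // persistent set means it has to be initialized first.
        System.out.println(addTwice(new HashSet<String>(), "kitten"));   // false
    }
}]]></programlisting>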

        </sect2>

        <sect2 id="performance-collections-oneshotdelete">
            <title>One shot delete</title>

            <para>
                Occasionally, deleting collection elements one by one can be extremely inefficient. Hibernate
                isn't completely stupid, so it knows not to do that in the case of a newly-empty collection
                (if you called <literal>list.clear()</literal>, for example). In this case, Hibernate will
                issue a single <literal>DELETE</literal> and we are done!
            </para>

            <para>
                Suppose we add a single element to a collection of size twenty and then remove two elements.
                Hibernate will issue one <literal>INSERT</literal> statement and two <literal>DELETE</literal>
                statements (unless the collection is a bag). This is certainly desirable.
            </para>

            <para>
                However, suppose that we remove eighteen elements, leaving two and then add three new elements.
                There are two possible ways to proceed:
            </para>

            <itemizedlist>
            <listitem>
                <para>delete eighteen rows one by one and then insert three rows</para>
            </listitem>
            <listitem>
                <para>remove the whole collection (in one SQL <literal>DELETE</literal>) and insert
                all five current elements (one by one)</para>
            </listitem>
            </itemizedlist>
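            <para>
                Counting SQL statements for the example makes the trade-off concrete. This sketch only counts
                statements, ignoring JDBC batching and differences in per-statement cost; the method names are
                hypothetical.
            </para>

            <programlisting><![CDATA[public class OneShotDeleteCost {

    // Option 1: one DELETE per removed row, then one INSERT per added row.
    static int rowByRowStatements(int removed, int added) {
        return removed + added;
    }

    // Option 2: a single DELETE for the whole collection, then one INSERT
    // for every element currently in it.
    static int oneShotStatements(int currentSize) {
        return 1 + currentSize;
    }

    public static void main(String[] args) {
        int originalSize = 20, removed = 18, added = 3;
        int currentSize = originalSize - removed + added;         // 5
        System.out.println(rowByRowStatements(removed, added));  // 21
        System.out.println(oneShotStatements(currentSize));      // 6
    }
}]]></programlisting>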

            <para>
                Hibernate isn't smart enough to know that the second option is probably quicker in this case.
                (And it would probably be undesirable for Hibernate to be that smart; such behaviour might
                confuse database triggers, etc.)
            </para>

            <para>
                Fortunately, you can force this behaviour (i.e., the second strategy) at any time by discarding
                (i.e., dereferencing) the original collection and returning a newly instantiated collection with
                all the current elements. This can be very useful and powerful from time to time.
            </para>
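            <para>
                On the application side, dereferencing amounts to replacing the collection with a new instance
                holding the current elements. The sketch below assumes a hypothetical <literal>Cat</literal>
                class whose <literal>kittens</literal> property would be mapped as a non-inverse collection; in a
                real session the original collection would be Hibernate's persistent wrapper rather than a plain
                <literal>HashSet</literal>.
            </para>

            <programlisting><![CDATA[import java.util.HashSet;
import java.util.Set;

public class OneShotDelete {

    // Hypothetical entity; "kittens" stands for a mapped, non-inverse collection.
    static class Cat {
        private Set<String> kittens = new HashSet<String>();

        Set<String> getKittens() { return kittens; }
        void setKittens(Set<String> kittens) { this.kittens = kittens; }
    }

    // Replace the collection with a fresh instance containing the same elements,
    // so the original collection is no longer referenced by the entity.
    static void replaceCollection(Cat cat) {
        cat.setKittens(new HashSet<String>(cat.getKittens()));
    }

    public static void main(String[] args) {
        Cat cat = new Cat();
        cat.getKittens().add("Tom");
        cat.getKittens().add("Kit");
        Set<String> original = cat.getKittens();

        replaceCollection(cat);
        System.out.println(cat.getKittens() != original);      // true: new instance
        System.out.println(cat.getKittens().equals(original)); // true: same elements
    }
}]]></programlisting>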

            <para>
                Of course, one-shot-delete does not apply to collections mapped <literal>inverse="true"</literal>.
            </para>

        </sect2>

    </sect1>

    <sect1 id="performance-monitoring" revision="1">
        <title>Monitoring performance</title>

        <para>
            Optimization is not much use without monitoring and access to performance numbers.
            Hibernate provides a full range of figures about its internal operations.
            Statistics in Hibernate are available per <literal>SessionFactory</literal>.
        </para>

        <sect2 id="performance-monitoring-sf" revision="2">
            <title>Monitoring a SessionFactory</title>

            <para>
                You can access <literal>SessionFactory</literal> metrics in two ways.
                Your first option is to call <literal>sessionFactory.getStatistics()</literal> and
                read or display the <literal>Statistics</literal> yourself.
            </para>

            <para>
                Hibernate can also use JMX to publish metrics if you enable the
                <literal>StatisticsService</literal> MBean. You may enable a single MBean for all your
                <literal>SessionFactory</literal> instances or one per factory. See the following code for
                minimalistic configuration examples:
            </para>

            <programlisting><![CDATA[// MBean service registration for a specific SessionFactory
// (assumes "server" is the javax.management.MBeanServer to register with,
// e.g. java.lang.management.ManagementFactory.getPlatformMBeanServer())
Hashtable tb = new Hashtable();
tb.put("type", "statistics");
tb.put("sessionFactory", "myFinancialApp");
ObjectName on = new ObjectName("hibernate", tb); // MBean object name

StatisticsService stats = new StatisticsService(); // MBean implementation
stats.setSessionFactory(sessionFactory); // Bind the stats to a SessionFactory
server.registerMBean(stats, on); // Register the MBean on the server]]></programlisting>


            <programlisting><![CDATA[// MBean service registration for all SessionFactory instances
Hashtable tb = new Hashtable();
tb.put("type", "statistics");
tb.put("sessionFactory", "all");
ObjectName on = new ObjectName("hibernate", tb); // MBean object name

StatisticsService stats = new StatisticsService(); // MBean implementation
server.registerMBean(stats, on); // Register the MBean on the server]]></programlisting>

            <para>
                In the first case, the MBean is bound directly to a specific <literal>SessionFactory</literal>
                and can be used right away. In the second case, you must give the MBean the JNDI name under
                which the session factory is bound before using it:
                <literal>hibernateStatsBean.setSessionFactoryJNDIName("my/JNDI/Name")</literal>.
            </para>
            <para>
                You can activate or deactivate monitoring for a <literal>SessionFactory</literal>:
            </para>
            <itemizedlist>
                <listitem>
                    <para>
                        at configuration time, by setting <literal>hibernate.generate_statistics</literal>
                        to <literal>true</literal> or <literal>false</literal>
                    </para>
                </listitem>
                <listitem>
                    <para>
                        at runtime, by calling <literal>sf.getStatistics().setStatisticsEnabled(true)</literal>
                        or <literal>hibernateStatsBean.setStatisticsEnabled(true)</literal>
                    </para>
                </listitem>
            </itemizedlist>

            <para>
                Statistics can be reset programmatically using the <literal>clear()</literal> method.
                A summary can be sent to a logger (info level) using the <literal>logSummary()</literal>
                method.
            </para>

        </sect2>

        <sect2 id="performance-monitoring-metrics" revision="1">
            <title>Metrics</title>

            <para>
                Hibernate provides a number of metrics, from very basic to the specialized information
                only relevant in certain scenarios. All available counters are described in the
                <literal>Statistics</literal> interface API, in three categories:
            </para>
            <itemizedlist>
                <listitem>
                    <para>
                        Metrics related to the general <literal>Session</literal> usage, such as
                        number of open sessions, retrieved JDBC connections, etc.
                    </para>
                </listitem>
                <listitem>
                    <para>
                        Metrics related to the entities, collections, queries, and caches as a
                        whole (aka global metrics).
                    </para>
                </listitem>
                <listitem>
                    <para>
                        Detailed metrics related to a particular entity, collection, query or
                        cache region.
                    </para>
                </listitem>
            </itemizedlist>

            <para>
                For example, you can check the cache hit, miss, and put ratio of entities, collections
                and queries, and the average time a query needs. Beware that timings in milliseconds are
                only approximate in Java: Hibernate is tied to the precision of the JVM clock, which on some
                platforms might only be accurate to about 10 milliseconds.
            </para>

            <para>
                Simple getters are used to access the global metrics (i.e. not tied to a particular entity,
                collection, cache region, etc.). You can access the metrics of a particular entity, collection
                or cache region through its name, and through its HQL or SQL representation for queries. Please
                refer to the <literal>Statistics</literal>, <literal>EntityStatistics</literal>,
                <literal>CollectionStatistics</literal>, <literal>SecondLevelCacheStatistics</literal>,
                and <literal>QueryStatistics</literal> API Javadoc for more information. The following
                code shows a simple example:
            </para>

            <programlisting><![CDATA[Statistics stats = HibernateUtil.sessionFactory.getStatistics();

double queryCacheHitCount  = stats.getQueryCacheHitCount();
double queryCacheMissCount = stats.getQueryCacheMissCount();
double queryCacheHitRatio =
  queryCacheHitCount / (queryCacheHitCount + queryCacheMissCount);

log.info("Query Hit ratio:" + queryCacheHitRatio);

EntityStatistics entityStats =
  stats.getEntityStatistics( Cat.class.getName() );
long changes =
        entityStats.getInsertCount()
        + entityStats.getUpdateCount()
        + entityStats.getDeleteCount();
log.info(Cat.class.getName() + " changed " + changes + " times");]]></programlisting>

            <para>
                To work on all entities, collections, queries and region caches, you can retrieve
                the list of names of entities, collections, queries and region caches with the
                following methods: <literal>getQueries()</literal>, <literal>getEntityNames()</literal>,
                <literal>getCollectionRoleNames()</literal>, and
                <literal>getSecondLevelCacheRegionNames()</literal>.
            </para>

        </sect2>

    </sect1>

</chapter>