317 lines
14 KiB
XML
Executable File
317 lines
14 KiB
XML
Executable File
<chapter id="batch">
|
|
<title>Batch processing</title>
|
|
|
|
<para>
|
|
A naive approach to inserting 100 000 rows in the database using Hibernate might
|
|
look like this:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
for ( int i=0; i<100000; i++ ) {
|
|
Customer customer = new Customer(.....);
|
|
session.save(customer);
|
|
}
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
<para>
|
|
This would fall over with an <literal>OutOfMemoryException</literal> somewhere
|
|
around the 50 000th row. That's because Hibernate caches all the newly inserted
|
|
<literal>Customer</literal> instances in the session-level cache.
|
|
</para>
|
|
|
|
<para>
|
|
In this chapter we'll show you how to avoid this problem. First, however, if you
|
|
are doing batch processing, it is absolutely critical that you enable the use of
|
|
JDBC batching, if you intend to achieve reasonable performance. Set the JDBC batch
|
|
size to a reasonable number (say, 10-50):
|
|
</para>
|
|
|
|
<programlisting><![CDATA[hibernate.jdbc.batch_size 20]]></programlisting>
|
|
|
|
<para>
|
|
You also might like to do this kind of work in a process where interaction with
|
|
the second-level cache is completely disabled:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[hibernate.cache.use_second_level_cache false]]></programlisting>
|
|
|
|
<para>
|
|
However, this is not absolutely necessary, since we can explicitly set the
|
|
<literal>CacheMode</literal> to disable interaction with the second-level cache.
|
|
</para>
|
|
|
|
<sect1 id="batch-inserts">
|
|
<title>Batch inserts</title>
|
|
|
|
<para>
|
|
When making new objects persistent, you must <literal>flush()</literal> and
|
|
then <literal>clear()</literal> the session regularly, to control the size of
|
|
the first-level cache.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
for ( int i=0; i<100000; i++ ) {
|
|
Customer customer = new Customer(.....);
|
|
session.save(customer);
|
|
if ( i % 20 == 0 ) { //20, same as the JDBC batch size
|
|
//flush a batch of inserts and release memory:
|
|
session.flush();
|
|
session.clear();
|
|
}
|
|
}
|
|
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="batch-update" >
|
|
<title>Batch updates</title>
|
|
|
|
<para>
|
|
For retrieving and updating data the same ideas apply. In addition, you need to
|
|
use <literal>scroll()</literal> to take advantage of server-side cursors for
|
|
queries that return many rows of data.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
ScrollableResults customers = session.getNamedQuery("GetCustomers")
|
|
.setCacheMode(CacheMode.IGNORE)
|
|
.scroll(ScrollMode.FORWARD_ONLY);
|
|
int count=0;
|
|
while ( customers.next() ) {
|
|
Customer customer = (Customer) customers.get(0);
|
|
customer.updateStuff(...);
|
|
if ( ++count % 20 == 0 ) {
|
|
//flush a batch of updates and release memory:
|
|
session.flush();
|
|
session.clear();
|
|
}
|
|
}
|
|
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="batch-statelesssession">
|
|
<title>The StatelessSession interface</title>
|
|
<para>
|
|
Alternatively, Hibernate provides a command-oriented API that may be used for
|
|
streaming data to and from the database in the form of detached objects. A
|
|
<literal>StatelessSession</literal> has no persistence context associated
|
|
with it and does not provide many of the higher-level lifecycle semantics.
|
|
In particular, a stateless session does not implement a first-level cache nor
|
|
interact with any second-level or query cache. It does not implement
|
|
transactional write-behind or automatic dirty checking. Operations performed
|
|
using a stateless session do not ever cascade to associated instances. Collections
|
|
are ignored by a stateless session. Operations performed via a stateless session
|
|
bypass Hibernate's event model and interceptors. Stateless sessions are vulnerable
|
|
to data aliasing effects, due to the lack of a first-level cache. A stateless
|
|
session is a lower-level abstraction, much closer to the underlying JDBC.
|
|
</para>
|
|
|
|
<programlisting><![CDATA[StatelessSession session = sessionFactory.openStatelessSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
ScrollableResults customers = session.getNamedQuery("GetCustomers")
|
|
.scroll(ScrollMode.FORWARD_ONLY);
|
|
while ( customers.next() ) {
|
|
Customer customer = (Customer) customers.get(0);
|
|
customer.updateStuff(...);
|
|
session.update(customer);
|
|
}
|
|
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
<para>
|
|
Note that in this code example, the <literal>Customer</literal> instances returned
|
|
by the query are immediately detached. They are never associated with any persistence
|
|
context.
|
|
</para>
|
|
|
|
<para>
|
|
The <literal>insert(), update()</literal> and <literal>delete()</literal> operations
|
|
defined by the <literal>StatelessSession</literal> interface are considered to be
|
|
direct database row-level operations, which result in immediate execution of a SQL
|
|
<literal>INSERT, UPDATE</literal> or <literal>DELETE</literal> respectively. Thus,
|
|
they have very different semantics to the <literal>save(), saveOrUpdate()</literal>
|
|
and <literal>delete()</literal> operations defined by the <literal>Session</literal>
|
|
interface.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="batch-direct" revision="2">
|
|
<title>DML-style operations</title>
|
|
|
|
<para>
|
|
As already discussed, automatic and transparent object/relational mapping is concerned
|
|
with the management of object state. This implies that the object state is available
|
|
in memory, hence manipulating (using the SQL <literal>Data Manipulation Language</literal>
|
|
(DML) statements: <literal>INSERT</literal>, <literal>UPDATE</literal>, <literal>DELETE</literal>)
|
|
data directly in the database will not affect in-memory state. However, Hibernate provides methods
|
|
for bulk SQL-style DML statement execution which are performed through the
|
|
Hibernate Query Language (<xref linkend="queryhql">HQL</xref>).
|
|
</para>
|
|
|
|
<para>
|
|
The pseudo-syntax for <literal>UPDATE</literal> and <literal>DELETE</literal> statements
|
|
is: <literal>( UPDATE | DELETE ) FROM? EntityName (WHERE where_conditions)?</literal>. Some
|
|
points to note:
|
|
</para>
|
|
|
|
<itemizedlist spacing="compact">
|
|
<listitem>
|
|
<para>
|
|
In the from-clause, the FROM keyword is optional
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
There can only be a single entity named in the from-clause; it can optionally be
|
|
aliased. If the entity name is aliased, then any property references must
|
|
be qualified using that alias; if the entity name is not aliased, then it is
|
|
illegal for any property references to be qualified.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
No joins (either implicit or explicit) can be specified in a bulk HQL query. Sub-queries
|
|
may be used in the where-clause; the subqueries, themselves, can contain joins.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
The where-clause is also optional.
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
As an example, to execute an HQL <literal>UPDATE</literal>, use the
|
|
<literal>Query.executeUpdate()</literal> method (the method is named for
|
|
those familiar with JDBC's <literal>PreparedStatement.executeUpdate()</literal>):
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
String hqlUpdate = "update Customer c set c.name = :newName where c.name = :oldName";
|
|
// or String hqlUpdate = "update Customer set name = :newName where name = :oldName";
|
|
int updatedEntities = s.createQuery( hqlUpdate )
|
|
.setString( "newName", newName )
|
|
.setString( "oldName", oldName )
|
|
.executeUpdate();
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
<para>
|
|
To execute an HQL <literal>DELETE</literal>, use the same <literal>Query.executeUpdate()</literal>
|
|
method:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
String hqlDelete = "delete Customer c where c.name = :oldName";
|
|
// or String hqlDelete = "delete Customer where name = :oldName";
|
|
int deletedEntities = s.createQuery( hqlDelete )
|
|
.setString( "oldName", oldName )
|
|
.executeUpdate();
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
<para>
|
|
The <literal>int</literal> value returned by the <literal>Query.executeUpdate()</literal>
|
|
method indicate the number of entities effected by the operation. Consider this may or may not
|
|
correlate to the number of rows effected in the database. An HQL bulk operation might result in
|
|
multiple actual SQL statements being executed, for joined-subclass, for example. The returned
|
|
number indicates the number of actual entities affected by the statement. Going back to the
|
|
example of joined-subclass, a delete against one of the subclasses may actually result
|
|
in deletes against not just the table to which that subclass is mapped, but also the "root"
|
|
table and potentially joined-subclass tables further down the inheritence hierarchy.
|
|
</para>
|
|
|
|
<para>
|
|
The pseudo-syntax for <literal>INSERT</literal> statements is:
|
|
<literal>INSERT INTO EntityName properties_list select_statement</literal>. Some
|
|
points to note:
|
|
</para>
|
|
|
|
<itemizedlist spacing="compact">
|
|
<listitem>
|
|
<para>
|
|
Only the INSERT INTO ... SELECT ... form is supported; not the INSERT INTO ... VALUES ... form.
|
|
</para>
|
|
<para>
|
|
The properties_list is analogous to the <literal>column speficiation</literal>
|
|
in the SQL <literal>INSERT</literal> statement. For entities involved in mapped
|
|
inheritence, only properties directly defined on that given class-level can be
|
|
used in the properties_list. Superclass properties are not allowed; and subclass
|
|
properties do not make sense. In other words, <literal>INSERT</literal>
|
|
statements are inherently non-polymorphic.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
select_statement can be any valid HQL select query, with the caveat that the return types
|
|
must match the types expected by the insert. Currently, this is checked during query
|
|
compilation rather than allowing the check to relegate to the database. Note however
|
|
that this might cause problems between Hibernate <literal>Type</literal>s which are
|
|
<emphasis>equivalent</emphasis> as opposed to <emphasis>equal</emphasis>. This might cause
|
|
issues with mismatches between a property defined as a <literal>org.hibernate.type.DateType</literal>
|
|
and a property defined as a <literal>org.hibernate.type.TimestampType</literal>, even though the
|
|
database might not make a distinction or might be able to handle the conversion.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
For the id property, the insert statement gives you two options. You can either
|
|
explicitly specify the id property in the properties_list (in which case its value
|
|
is taken from the corresponding select expression) or omit it from the properties_list
|
|
(in which case a generated value is used). This later option is only available when
|
|
using id generators that operate in the database; attempting to use this option with
|
|
any "in memory" type generators will cause an exception during parsing. Note that
|
|
for the purposes of this discussion, in-database generators are considered to be
|
|
<literal>org.hibernate.id.SequenceGenerator</literal> (and its subclasses) and
|
|
any implementors of <literal>org.hibernate.id.PostInsertIdentifierGenerator</literal>.
|
|
The most notable exception here is <literal>org.hibernate.id.TableHiLoGenerator</literal>,
|
|
which cannot be used because it does not expose a selectable way to get its values.
|
|
</para>
|
|
</listitem>
|
|
<listitem>
|
|
<para>
|
|
For properties mapped as either <literal>version</literal> or <literal>timestamp</literal>,
|
|
the insert statement gives you two options. You can either specify the property in the
|
|
properties_list (in which case its value is taken from the corresponding select expressions)
|
|
or omit it from the properties_list (in which case the <literal>seed value</literal> defined
|
|
by the <literal>org.hibernate.type.VersionType</literal> is used).
|
|
</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>
|
|
An example HQL <literal>INSERT</literal> statement execution:
|
|
</para>
|
|
|
|
<programlisting><![CDATA[Session session = sessionFactory.openSession();
|
|
Transaction tx = session.beginTransaction();
|
|
|
|
String hqlInsert = "insert into DelinquentAccount (id, name) select c.id, c.name from Customer c where ...";
|
|
int createdEntities = s.createQuery( hqlInsert )
|
|
.executeUpdate();
|
|
tx.commit();
|
|
session.close();]]></programlisting>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|