<chapter id="batch">

<title>Batch processing</title>
<para>
A naive approach to inserting 100 000 rows in the database using Hibernate might
look like this:
</para>
<programlisting><![CDATA[Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
}
tx.commit();
session.close();]]></programlisting>
<para>
This would fall over with an <literal>OutOfMemoryError</literal> somewhere
around the 50 000th row, because Hibernate caches all the newly inserted
<literal>Customer</literal> instances in the session-level cache.
</para>
<para>
In this chapter we'll show you how to avoid this problem. First, however, if you
are doing batch processing, it is absolutely critical that you enable JDBC
batching if you intend to achieve reasonable performance. Set the JDBC batch
size to a reasonable number (say, 10-50):
</para>
<programlisting><![CDATA[hibernate.jdbc.batch_size 20]]></programlisting>
<para>
You might also like to do this kind of work in a process where interaction with
the second-level cache is completely disabled:
</para>
<programlisting><![CDATA[hibernate.cache.use_second_level_cache false]]></programlisting>
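<para>
The settings used in this chapter can also be supplied programmatically when
building the <literal>SessionFactory</literal>. The sketch below only collects
the property values into a <literal>java.util.Properties</literal> object;
handing them to Hibernate (for example via
<literal>Configuration.addProperties()</literal>) is deliberately left out, so
the snippet runs without Hibernate on the classpath.
</para>

```java
import java.util.Properties;

// A minimal sketch: the batch-friendly settings from this chapter, collected
// into a java.util.Properties object. Passing them to Hibernate's
// Configuration (e.g. configuration.addProperties(props)) is not shown here.
public class BatchSettings {

    public static Properties batchProperties() {
        Properties props = new Properties();
        // Enable JDBC batching with a reasonable batch size (10-50):
        props.setProperty("hibernate.jdbc.batch_size", "20");
        // Disable second-level cache interaction for pure batch work:
        props.setProperty("hibernate.cache.use_second_level_cache", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchProperties();
        System.out.println(props.getProperty("hibernate.jdbc.batch_size"));
        System.out.println(props.getProperty("hibernate.cache.use_second_level_cache"));
    }
}
```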
<sect1 id="batch-inserts">
<title>Batch inserts</title>
<para>
When making new objects persistent, you must <literal>flush()</literal> and
then <literal>clear()</literal> the session regularly, to control the size of
the first-level cache.
</para>
<programlisting><![CDATA[Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();]]></programlisting>
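<para>
One detail worth noting about the loop above: with the condition
<literal>i % 20 == 0</literal>, the very first save (at <literal>i = 0</literal>)
triggers a flush of a single insert, and full batches of 20 follow. The
plain-Java sketch below models only that counting logic (no Hibernate
involved); the list entries it builds stand in for the number of pending
inserts flushed by each <literal>flush()</literal>/<literal>clear()</literal> pair.
</para>

```java
import java.util.ArrayList;
import java.util.List;

// Models the flush cadence of the insert loop above: counts how many saves
// accumulate in the session between successive flush()/clear() calls when
// flushing on "i % batchSize == 0".
public class FlushCadence {

    static List<Integer> batchSizes(int rows, int batchSize) {
        List<Integer> flushed = new ArrayList<Integer>();
        int pending = 0; // saves sitting in the session since the last clear()
        for (int i = 0; i < rows; i++) {
            pending++; // stands in for session.save(customer)
            if (i % batchSize == 0) { // same condition as the listing above
                flushed.add(pending); // session.flush(); session.clear();
                pending = 0;
            }
        }
        if (pending > 0) flushed.add(pending); // remainder flushed by tx.commit()
        return flushed;
    }

    public static void main(String[] args) {
        List<Integer> sizes = batchSizes(100000, 20);
        System.out.println(sizes.get(0));                // first flush: 1 insert
        System.out.println(sizes.get(1));                // then full batches of 20
        System.out.println(sizes.get(sizes.size() - 1)); // 19 left over at commit
    }
}
```

Flushing on <literal>(i + 1) % batchSize == 0</literal> instead would produce exact batches of 20 from the start; either way the memory behaviour is what matters.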
</sect1>
<sect1 id="batch-update">
<title>Batch updates</title>
<para>
For retrieving and updating data the same ideas apply. In addition, you need to
use <literal>scroll()</literal> to take advantage of server-side cursors for
queries that return many rows of data.
</para>
<programlisting><![CDATA[Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults customers = session.getNamedQuery("GetCustomers")
    .setCacheMode(CacheMode.IGNORE)
    .scroll(ScrollMode.FORWARD_ONLY);
int count=0;
while ( customers.next() ) {
    Customer customer = (Customer) customers.get(0);
    customer.updateStuff(...);
    if ( ++count % 20 == 0 ) {
        //flush a batch of updates and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();]]></programlisting>
</sect1>
<sect1 id="batch-direct">
<title>Bulk update/delete</title>
<para>
As already discussed, automatic and transparent object/relational mapping is concerned
with the management of object state. This implies that the object state is available
in memory, hence updating or deleting (using SQL <literal>UPDATE</literal> and
<literal>DELETE</literal>) data directly in the database will not affect in-memory
state. However, Hibernate provides methods for bulk SQL-style <literal>UPDATE</literal>
and <literal>DELETE</literal> statement execution, which are performed through the
Hibernate Query Language (HQL, see <xref linkend="queryhql"/>).
</para>
<para>
The pseudo-syntax for <literal>UPDATE</literal> and <literal>DELETE</literal> statements
is: <literal>( UPDATE | DELETE ) FROM? ClassName (WHERE WHERE_CONDITIONS)?</literal>. Some
points to note:
</para>
<itemizedlist spacing="compact">
<listitem>
    <para>
        In the from-clause, the FROM keyword is optional.
    </para>
</listitem>
<listitem>
    <para>
        There can only be a single class named in the from-clause, and it <emphasis>cannot</emphasis>
        have an alias.
    </para>
</listitem>
<listitem>
    <para>
        No joins (either implicit or explicit) can be specified in a bulk HQL query. Sub-queries
        may be used in the where-clause.
    </para>
</listitem>
<listitem>
    <para>
        The where-clause is also optional.
    </para>
</listitem>
</itemizedlist>
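<para>
For instance, all of the following are well-formed under that pseudo-syntax
(the entity and property names are the ones used in this chapter's examples):
</para>

```
delete Customer
delete from Customer where name = :oldName
update Customer set name = :newName
update Customer set name = :newName where name = :oldName
```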
<para>
As an example, to execute an HQL <literal>UPDATE</literal>, use the
<literal>Query.executeUpdate()</literal> method:
</para>
<programlisting><![CDATA[Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

String hqlUpdate = "update Customer set name = :newName where name = :oldName";
int updatedEntities = session.createQuery( hqlUpdate )
    .setString( "newName", newName )
    .setString( "oldName", oldName )
    .executeUpdate();
tx.commit();
session.close();]]></programlisting>
<para>
To execute an HQL <literal>DELETE</literal>, use the same <literal>Query.executeUpdate()</literal>
method (so named by analogy with JDBC's
<literal>PreparedStatement.executeUpdate()</literal>):
</para>
<programlisting><![CDATA[Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

String hqlDelete = "delete Customer where name = :oldName";
int deletedEntities = session.createQuery( hqlDelete )
    .setString( "oldName", oldName )
    .executeUpdate();
tx.commit();
session.close();]]></programlisting>
<para>
The <literal>int</literal> value returned by the <literal>Query.executeUpdate()</literal>
method indicates the number of entities affected by the operation. Note that this may or may not
correlate to the number of rows affected in the database. An HQL bulk operation might result in
multiple actual SQL statements being executed, for joined-subclass, for example. The returned
number indicates the number of actual entities affected by the statement. Going back to the
example of joined-subclass, a delete against one of the subclasses may actually result
in deletes against not just the table to which that subclass is mapped, but also the "root"
table and potentially joined-subclass tables further down the inheritance hierarchy.
</para>
<para>
Note that there are currently a few limitations with the bulk HQL operations, which
will be addressed in future releases; consult the JIRA roadmap for details.
</para>
</sect1>
</chapter>