Optimization Guidelines
There are numerous techniques you can use in order to ensure that OpenJPA
operates in the fastest and most efficient manner. Following are some
guidelines. Each describes what impact it will have on performance and
scalability. Note that general guidelines regarding performance or scalability
issues are just that - guidelines. Depending on the particular characteristics
of your application, the optimal settings may be considerably different from
what is outlined below.
In the following table, each row is labeled with a list of italicized keywords.
These keywords identify what characteristics the row in question may improve
upon. Many of the rows are marked with one or both of the performance
and scalability labels. It is important to bear
in mind the differences between performance and scalability (for the most part,
we are referring to system-wide scalability, and not necessarily only
scalability within a single JVM). The performance-related hints will probably
improve the performance of your application for a given user load, whereas the
scalability-related hints will probably increase the total number of users that
your application can service. Sometimes, increasing performance will decrease
scalability, and vice versa. Typically, options that reduce the amount of work
done on the database server will improve scalability, whereas those that push
more work onto the server will have a negative impact on scalability.
Optimization Guidelines
Plug in a Connection Pool
performance, scalability
OpenJPA's built-in datasource does not perform connection pooling or
prepared statement caching. Plugging in a third-party pooling datasource may
drastically improve performance.
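As a rough illustration, the sketch below builds property overrides that could be passed to `Persistence.createEntityManagerFactory(String, Map)` to point OpenJPA at a pooling datasource. It assumes Commons DBCP 1.x as the third-party pool and an in-memory HSQLDB database. The class name `PooledDataSourceConfig`, the URL, and the pool size are placeholders, and the bean property names inside `openjpa.ConnectionProperties` depend on the pool you choose.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PooledDataSourceConfig {

    /** Builds overrides that name a pooling DataSource instead of a plain JDBC driver. */
    static Map<String, String> pooledConnectionProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Assumption: Commons DBCP 1.x is on the classpath; any pooling DataSource class could go here.
        props.put("openjpa.ConnectionDriverName", "org.apache.commons.dbcp.BasicDataSource");
        // Assumption: these entries are applied to the DataSource as bean properties.
        // Driver class, URL, and pool size are placeholders for illustration only.
        props.put("openjpa.ConnectionProperties",
                "DriverClassName=org.hsqldb.jdbcDriver, Url=jdbc:hsqldb:mem:example, MaxActive=10");
        return props;
    }
}
```

The same two properties can equally be declared in `persistence.xml`; the map form is used here only to keep the example self-contained.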
Optimize database indexes
performance, scalability
The default set of indexes created by OpenJPA's mapping tool may not always be
the most appropriate for your application. Manually setting indexes in your
mapping metadata or manually manipulating database indexes to include
frequently-queried fields (as well as dropping indexes on rarely-queried
fields) can yield significant performance benefits.
A database must do extra work on insert, update, and delete to maintain an
index. This extra work will benefit selects with WHERE clauses, which will
execute much faster when the terms in the WHERE clause are appropriately
indexed. So, for a read-mostly application, appropriate indexing will slow down
updates (which are rare) but greatly accelerate reads. This means that the
system as a whole will be faster, and also that the database will experience
less load, meaning that the system will be more scalable.
Bear in mind that over-indexing is a bad thing, both for scalability and
performance, especially for applications that perform lots of inserts, updates,
or deletes.
JVM optimizations
performance, reliability
Manipulating various parameters of the Java Virtual Machine (such as HotSpot
compilation modes and the maximum heap size) can result in performance
improvements. For more details about optimizing the JVM execution environment,
please see .
Use the data cache
performance, scalability
Using OpenJPA's data and query caching
features can often result in a dramatic improvement in performance.
Additionally, these caches can significantly reduce the amount of load on
the database, increasing the scalability characteristics of your application.
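A minimal sketch of switching the caches on through configuration follows. It assumes the `openjpa.DataCache`, `openjpa.QueryCache`, and `openjpa.RemoteCommitProvider` properties with a single-JVM (`sjvm`) commit provider. The helper name `CacheConfig` and the cache size you pass in are illustrative, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CacheConfig {

    /** Builds overrides that enable the data cache and query cache for one JVM. */
    static Map<String, String> cachingProperties(int dataCacheSize) {
        if (dataCacheSize < 1) {
            throw new IllegalArgumentException("dataCacheSize must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        // Enable the data cache and bound the number of cached instances.
        props.put("openjpa.DataCache", "true(CacheSize=" + dataCacheSize + ")");
        // Enable caching of query results.
        props.put("openjpa.QueryCache", "true");
        // Assumption: a single-JVM deployment, so commit events stay inside this JVM.
        props.put("openjpa.RemoteCommitProvider", "sjvm");
        return props;
    }
}
```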
Set LargeTransaction to true, or set PopulateDataCache to false
performance vs. scalability
When using OpenJPA's data caching
features in a transaction that will delete, modify, or create a very large
number of objects, you can set LargeTransaction to true and
perform periodic flushes during your transaction to reduce its memory
requirements. See the Javadoc:
OpenJPAEntityManager.setLargeTransaction. Note that transactions in
large mode must flush items from the data cache more aggressively.
If your transaction will visit objects that you know are very unlikely to be
accessed by other transactions, for example an exhaustive report run only once a
month, you can turn off population of the data cache so that the transaction
doesn't fill the entire data cache with objects that won't be accessed again.
Again, see the Javadoc:
OpenJPAEntityManager.setPopulateDataCache
Disable logging, performance tracking
performance
Developer options such as verbose logging and the JDBC performance tracker can
result in serious performance hits for your application. Before evaluating
OpenJPA's performance, these options should all be disabled.
Set IgnoreChanges to true, or set FlushBeforeQueries to true
performance vs. scalability
When both the openjpa.IgnoreChanges and openjpa.FlushBeforeQueries
properties are set to false, OpenJPA needs to consider
in-memory dirty instances during queries. This can sometimes result in OpenJPA
needing to evaluate every object in the extent in memory in order to return the
correct query results, which can have drastic performance consequences. If it is
appropriate for your application, configuring FlushBeforeQueries
to automatically flush before queries involving dirty objects will
ensure that this never happens. Setting IgnoreChanges to
false will result in a small performance hit even if FlushBeforeQueries
is true, as incremental flushing is not as efficient overall as
delaying all flushing to a single operation during commit.
Setting IgnoreChanges to true will help
performance, since dirty objects can be ignored for queries, meaning that
incremental flushing or client-side processing is not necessary. It will also
improve scalability, since overall database server usage is diminished. On the
other hand, setting IgnoreChanges to false
will have a negative impact on scalability, even when using automatic flushing
before queries, since more operations will be performed on the database server.
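The interaction between the two settings can be summarized as a small decision function. This is a simplified model of the behavior described above, not OpenJPA's actual code. The class, enum, and parameter names are hypothetical, and any flush modes beyond plain true and false are left out.

```java
final class DirtyQueryStrategy {

    enum Strategy {
        QUERY_DATASTORE_AS_IS, // dirty state is ignored or absent: run the query directly
        FLUSH_THEN_QUERY,      // flush dirty state first, then run the query in the database
        EVALUATE_IN_MEMORY     // reconcile results against dirty instances on the client
    }

    /**
     * Models how a query is answered, given the two settings and whether the
     * current transaction holds dirty objects that the query may involve.
     */
    static Strategy forSettings(boolean ignoreChanges,
                                boolean flushBeforeQueries,
                                boolean hasDirtyObjects) {
        if (ignoreChanges || !hasDirtyObjects) {
            return Strategy.QUERY_DATASTORE_AS_IS;
        }
        if (flushBeforeQueries) {
            return Strategy.FLUSH_THEN_QUERY;
        }
        return Strategy.EVALUATE_IN_MEMORY;
    }
}
```

Only the last branch carries the risk of evaluating a whole extent in memory, which is why either of the two settings in this row removes the worst case.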
Configure openjpa.ConnectionRetainMode appropriately
performance vs. scalability
The ConnectionRetainMode
configuration option controls when OpenJPA will obtain a
connection, and how long it will hold that connection. The optimal settings for
this option will vary considerably depending on the particular behavior of
your application. You may even benefit from using different retain modes for
different parts of your application.
The default setting of on-demand minimizes the amount of time
that OpenJPA holds onto a datastore connection. This is generally the best
option from a scalability standpoint, as database resources are held for a
minimal amount of time. However, if you are not using connection pooling, or
if your DataSource is not efficient at managing its
pool, then this default value could cause undesirable pool contention.
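One way to vary the retain mode is to supply it as a property override. The sketch below assumes the mode names `on-demand`, `transaction`, and `always`; the helper `RetainModeConfig` is hypothetical. The resulting map could be passed at factory creation or, assuming your OpenJPA version honors the property at that level, to `EntityManagerFactory.createEntityManager(Map)` for one part of the application.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RetainModeConfig {

    // Assumption: these are the retain modes accepted by openjpa.ConnectionRetainMode.
    static final List<String> MODES = Arrays.asList("on-demand", "transaction", "always");

    /** Returns a single-entry override map, rejecting unknown mode names early. */
    static Map<String, String> withRetainMode(String mode) {
        if (!MODES.contains(mode)) {
            throw new IllegalArgumentException("unknown retain mode: " + mode);
        }
        return Collections.singletonMap("openjpa.ConnectionRetainMode", mode);
    }
}
```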
Use flat inheritance
performance, scalability vs. disk space
Mapping inheritance hierarchies to a single database table is faster for most
operations than other strategies employing multiple tables. If it is
appropriate for your application, you should use this strategy whenever
possible.
However, this strategy will require more disk space on the database side. Disk
space is relatively inexpensive, but if your object model is particularly large,
it can become a factor.
High sequence increment
performance, scalability
For applications that perform large bulk inserts, the retrieval of sequence
numbers can be a bottleneck. Increasing sequence increments and using
table-based rather than native database sequences can reduce or eliminate
this bottleneck. In some cases, implementing your own sequence factory can
further optimize sequence number retrieval.
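To see why a larger increment helps, consider the following simulation. It is not OpenJPA's sequence implementation; `BlockSequence` is a hypothetical stand-in that reserves a block of values per simulated database trip and counts those trips.

```java
final class BlockSequence {
    private final long increment;   // how many values each simulated trip reserves
    private long next = 0;          // next value to hand out
    private long limit = 0;         // exclusive upper bound of the reserved block
    private long databaseValue = 1; // stand-in for the value stored in the database
    private int roundTrips = 0;     // number of simulated database trips so far

    BlockSequence(long increment) {
        if (increment < 1) {
            throw new IllegalArgumentException("increment must be >= 1");
        }
        this.increment = increment;
    }

    /** Returns the next sequence value, reserving a new block only when the current one is used up. */
    synchronized long nextValue() {
        if (next == limit) {
            // Simulated trip: reserve the range [databaseValue, databaseValue + increment).
            next = databaseValue;
            databaseValue += increment;
            limit = databaseValue;
            roundTrips++;
        }
        return next++;
    }

    synchronized int roundTrips() {
        return roundTrips;
    }
}
```

Drawing 1000 values with an increment of 1 costs 1000 simulated trips in this model, while an increment of 100 costs 10. The trade-off is that values reserved but never used leave gaps in the sequence.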
Use optimistic transactions
performance, scalability
Using datastore transactions translates into pessimistic database row locking,
which can be a performance hit (depending on the database). If appropriate for
your application, optimistic transactions are typically faster than datastore
transactions.
Optimistic transactions provide the same transactional guarantees as datastore
transactions, except that you must handle a potential optimistic verification
exception at the end of a transaction instead of assuming that a transaction
will successfully complete. In many applications, it is unlikely that different
concurrent transactions will operate on the same set of data at the same time,
so optimistic verification increases the concurrency, and therefore both the
performance and scalability characteristics, of the application. A common
approach to handling optimistic verification exceptions is to simply present the
end user with the fact that concurrent modifications happened, and require that
the user redo any work.
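The verification step can be pictured with a version counter. The sketch below is a minimal in-memory model rather than OpenJPA's mechanism: `VersionedRow` stands in for a database row with a version column, and `StaleVersionException` is a hypothetical stand-in for the exception a JPA provider raises when the check fails.

```java
/** Hypothetical stand-in for the exception raised on a failed optimistic check. */
final class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

/** In-memory model of a row guarded by a version number instead of a lock. */
final class VersionedRow {
    private String value;
    private long version = 1;

    VersionedRow(String value) {
        this.value = value;
    }

    synchronized long version() {
        return version;
    }

    synchronized String value() {
        return value;
    }

    /** Applies the change only if no other commit happened since the version was read. */
    synchronized void commit(String newValue, long versionReadAtStart) {
        if (version != versionReadAtStart) {
            throw new StaleVersionException(
                    "expected version " + versionReadAtStart + " but found " + version);
        }
        value = newValue;
        version++;
    }
}
```

If two transactions both read version 1, the first commit succeeds and bumps the version to 2; the second then fails the check and must be retried or reported to the user. No row is locked while either transaction is doing its work.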
Use query aggregates and projections
performance, scalability
Using aggregates to compute reporting data on the database server can
drastically speed up queries. Similarly, using projections when you are
interested in specific object fields or relations rather than the entire object
state can reduce the amount of data OpenJPA must transfer from the database to
your application.
Always close resources
scalability
Under certain settings, EntityManagers, OpenJPA
Extent iterators, and Query
results may be backed by resources in the database.
For example, if you have configured OpenJPA to use scrollable cursors and lazy
object instantiation by default, each query result will hold open a
ResultSet object, which, in turn, will hold open a
Statement object (preventing it from being re-used). Garbage
collection will clean up these resources, so it is never necessary to explicitly
close them, but it is always faster if it is done at the application level.
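The general pattern is to release the resource in the same scope that acquired it. The sketch below uses a hypothetical `Cursor` class as a stand-in for a query result that pins a ResultSet and Statement; it is not an OpenJPA type. With a real EntityManager, the equivalent is calling `close()` in a finally block.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CursorDemo {

    /** Counts stand-in resources that are currently open. */
    static final AtomicInteger OPEN = new AtomicInteger();

    /** Hypothetical stand-in for a result backed by database-side resources. */
    static final class Cursor implements AutoCloseable {
        Cursor() {
            OPEN.incrementAndGet();
        }

        int firstRow() {
            return 42; // placeholder data
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    /** Reads one row and releases the cursor before returning, even if reading throws. */
    static int readFirstRow() {
        try (Cursor cursor = new Cursor()) {
            return cursor.firstRow();
        }
    }
}
```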
Use detached state managers
performance
Attaching and even persisting instances can be more efficient when your detached
objects use detached state managers. By default, OpenJPA does not use detached
state managers when serializing an instance across tiers. See
for how to force OpenJPA to use
detached state managers across tiers, and for other options for more efficient
attachment.
The downside of using a detached state manager across tiers is that your
enhanced persistent classes and the OpenJPA libraries must be available on the
client tier.
Utilize the EntityManager cache
performance, scalability
When possible and appropriate, re-using EntityManagers
and setting the RetainState
configuration option to true may result in
significant performance gains, since the EntityManager's
built-in object cache will be used.
Enable multithreaded operation only when necessary
performance
OpenJPA respects the
openjpa.Multithreaded option in that it does not impose as
much synchronization overhead for applications that do not set this value to
true. If your application is guaranteed to only use
single-threaded access to OpenJPA resources and persistent objects, leaving
this option as false will reduce synchronization overhead,
and may result in a modest performance increase.
Enable large data set handling
performance, scalability
If you execute queries that return large numbers of objects or have relations
(collections or maps) that are large, and if you often only access parts of
these data sets, enabling large result
set handling where appropriate can dramatically speed up your
application, since OpenJPA will bring the data sets into memory from the
database only as necessary.
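A configuration along the following lines is one way to turn on large result set handling globally. It is a sketch that assumes the `openjpa.FetchBatchSize`, `openjpa.jdbc.ResultSetType`, `openjpa.jdbc.FetchDirection`, and `openjpa.jdbc.LRSSize` properties; the helper name and the batch size you pass in are illustrative.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LargeResultSetConfig {

    /** Builds overrides that make results load on demand in batches. */
    static Map<String, String> largeResultSetProperties(int fetchBatchSize) {
        if (fetchBatchSize < 1) {
            throw new IllegalArgumentException("fetchBatchSize must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>();
        // Number of rows to instantiate at a time as the result is traversed.
        props.put("openjpa.FetchBatchSize", String.valueOf(fetchBatchSize));
        // Scrollable result sets allow random access without loading everything.
        props.put("openjpa.jdbc.ResultSetType", "scroll-insensitive");
        props.put("openjpa.jdbc.FetchDirection", "forward");
        // Assumption: "last" determines the size by scrolling to the last row.
        props.put("openjpa.jdbc.LRSSize", "last");
        return props;
    }
}
```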
Disable large data set handling
performance, scalability
If you have enabled scrollable result sets and on-demand loading but do not
require them, consider disabling them again. Some JDBC drivers and databases
(SQL Server, for example) are much slower when used with scrolling result sets.
Use the DynamicSchemaFactory
performance, validation
If you are using an
openjpa.jdbc.SchemaFactory setting of something other than
the default of dynamic, consider switching back. While other
factories can ensure that object-relational mapping information is valid when
a persistent class is first used, this can be a slow process. Though the
validation is only performed once for each class, switching back to the
DynamicSchemaFactory can reduce the warm-up time for
your application.
Do not use XA transactions
performance, scalability
XA transactions can be orders of
magnitude slower than standard transactions. Unless distributed transaction
functionality is required by your application, use standard transactions.
Recall that XA transactions are distinct from managed transactions - managed
transaction services such as those provided by EJB declarative transactions can
be used both with XA and non-XA transactions. XA transactions should only be
used when a given business transaction involves multiple different transactional
resources (an Oracle database and an IBM transactional message queue, for
example).
Use Sets instead of Lists/Collections
performance, scalability
There is a small amount of extra overhead for OpenJPA to maintain collections
where each element is not guaranteed to be unique. If your application does
not require duplicates for a collection, you should always declare your
fields to be of type Set, SortedSet, HashSet, or
TreeSet.
Use query parameters instead of encoding search data in filter strings
performance
If your queries depend on parameter data only known at runtime, you should use
query parameters rather than dynamically building different query strings.
OpenJPA performs aggressive caching of query compilation data, and the
effectiveness of this cache is diminished if multiple query filters are used
where a single one could have sufficed.
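The effect on a compilation cache keyed by query string can be shown with a toy model. `CompilationCacheDemo` below is hypothetical and only counts distinct query strings; it does not reproduce OpenJPA's cache. The `Person` entity and its `name` field are placeholders.

```java
import java.util.HashSet;
import java.util.Set;

final class CompilationCacheDemo {
    private final Set<String> compiledQueries = new HashSet<>();
    private int compilations = 0;

    /** "Compiles" a query string unless an identical string was compiled before. */
    void prepare(String queryString) {
        if (compiledQueries.add(queryString)) {
            compilations++;
        }
    }

    int compilations() {
        return compilations;
    }

    /** Search data encoded in the string: every distinct value yields a new query string. */
    static int compilationsWithLiterals(int distinctNames) {
        CompilationCacheDemo cache = new CompilationCacheDemo();
        for (int i = 0; i < distinctNames; i++) {
            cache.prepare("SELECT p FROM Person p WHERE p.name = 'name" + i + "'");
        }
        return cache.compilations();
    }

    /** Search data bound as a parameter: the query string never changes. */
    static int compilationsWithParameter(int distinctNames) {
        CompilationCacheDemo cache = new CompilationCacheDemo();
        for (int i = 0; i < distinctNames; i++) {
            // The value itself would be bound separately, e.g. via Query.setParameter("name", ...).
            cache.prepare("SELECT p FROM Person p WHERE p.name = :name");
        }
        return cache.compilations();
    }
}
```

For 100 distinct names, the literal form is compiled 100 times in this model and the parameterized form once.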
Tune your fetch groups appropriately
performance, scalability
The fetch groups used when loading an
object control how much data is eagerly loaded, and by extension, which fields
must be lazily loaded at a future time. The ideal fetch group configuration
loads all the data that is needed in one fetch, and no extra fields - this
minimizes both the amount of data transferred from the database, and the
number of trips to the database.
If extra fields are specified in the fetch groups (in particular, large fields
such as binary data, or relations to other persistence-capable objects), then
network overhead (for the extra data) and database processing (for any necessary
additional joins) will hurt your application's performance. If too few fields
are specified in the fetch groups, then OpenJPA will have to make additional
trips to the database to load additional fields as necessary.
Use eager fetching
performance, scalability
Using eager fetching when
loading subclass data or traversing relations for each instance in a large
collection of results can speed up data loading by orders of magnitude.
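A back-of-the-envelope cost model shows where the gain comes from. The functions below count SELECT statements under simplifying assumptions: lazy loading issues one extra select per result instance, a join-based eager mode folds relation data into the primary select, and a parallel eager mode issues one extra select per eagerly fetched relation. `FetchCostModel` is hypothetical, and real statement counts depend on your mappings and configuration.

```java
final class FetchCostModel {

    /** Lazy traversal: one select for the results, then one more per instance. */
    static long lazySelects(long instances) {
        return 1 + instances;
    }

    /** Join-based eager fetching: relation data arrives with the primary select. */
    static long joinSelects(long instances) {
        return 1;
    }

    /** Parallel eager fetching: one extra select per relation, regardless of result size. */
    static long parallelSelects(long instances, int eagerRelations) {
        return 1 + eagerRelations;
    }
}
```

For 100 result instances with two eagerly fetched collections, this model gives 101 statements for lazy traversal of one relation, 3 for the parallel mode, and 1 for the join mode.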