HHH-11132 - Add a Performance Tuning and Best Practices chapter

(cherry picked from commit a705b48c41)
2016-09-26 17:58:37 +03:00 · 2016-09-26 17:58:37 +03:00 · 12e798aa76
parent d82d398e90
commit 12e798aa76
2 changed files with 257 additions and 0 deletions
--- a/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc
+++ b/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc
@ -31,6 +31,7 @@ include::chapters/envers/Envers.adoc[]
 include::chapters/portability/Portability.adoc[]

 include::appendices/Configurations.adoc[]
+include::appendices/BestPractices.adoc[]
 include::appendices/Legacy_Bootstrap.adoc[]
 include::appendices/Legacy_DomainModel.adoc[]
 include::appendices/Legacy_Criteria.adoc[]
--- a/documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc
+++ b/documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc
@ -0,0 +1,256 @@
+[[best-practices]]
+== Performance Tuning and Best Practices
+
+Every enterprise system is unique. However, having a very efficient data access layer is a common requirement for many enterprise applications.
+Hibernate comes with a great variety of features that can help you tune the data access layer.
+
+[[best-practices-schema]]
+=== Schema management
+
+Although Hibernate provides the `update` option for the `hibernate.hbm2ddl.auto` configuration property,
+this feature is not suitable for a production environment.
+
+An automated schema migration tool (e.g. https://flywaydb.org/[Flyway], http://www.liquibase.org/[Liquibase]) allows you to use any database-specific DDL feature (e.g. Rules, Triggers, Partitioned Tables).
+Every migration should have an associated script, which is stored on the Version Control System, along with the application source code.
+
+When the application is deployed on a production-like QA environment, and the deploy worked as expected, then pushing the deploy to a production environment should be straightforward since the latest schema migration was already tested.
+
+[TIP]
+====
+You should always use an automatic schema migration tool and have all the migration scripts stored in the Version Control System.
+====
+
+[[best-practices-logging]]
+=== Logging
+
+Whenever you're using a framework that generates SQL statements on your behalf, you have to ensure that the generated statements are the ones that you intended in the first place.
+
+There are several alternatives to logging statements.
+You can log statements by configuring the underlying logging framework.
+For Log4j, you can use the following appenders:
+
+[source,java]
+----
+### log just the SQL
+log4j.logger.org.hibernate.SQL=debug
+
+### log JDBC bind parameters ###
+log4j.logger.org.hibernate.type=trace
+log4j.logger.org.hibernate.type.descriptor.sql=trace
+----
+
+However, there are some other alternatives like using https://vladmihalcea.com/2016/05/03/the-best-way-of-logging-jdbc-statements/[datasource-proxy or p6spy].
+The advantage of using a JDBC `Driver` or `DataSource` Proxy is that you can go beyond simple SQL logging:
+
+- statement execution time
+- JDBC batching logging
+- https://github.com/vladmihalcea/flexy-pool[database connection monitoring]
+
+Another advantage of using a `DataSource` proxy is that you can assert the number of executed statements at test time.
+This way, you can have the integration tests fail https://vladmihalcea.com/2014/02/01/how-to-detect-the-n-plus-one-query-problem-during-testing/[when a N+1 query issue is automatically detected].
+
+[TIP]
+====
+While simple statement logging is fine, using https://github.com/ttddyy/datasource-proxy[datasource-proxy] or https://github.com/p6spy/p6spy[p6spy] is even better.
+====
+
+[[best-practices-jdbc-batching]]
+=== JDBC batching
+
+JDBC allows us to batch multiple SQL statements and to send them to the database server into a single request.
+This saves database roundtrips, and so it https://leanpub.com/high-performance-java-persistence/read#jdbc-batch-updates[reduces response time significantly].
+
+Not only `INSERT` and `UPDATE` statements, but even `DELETE` statements can be batched as well.
+For `INSERT` and `UPDATE` statements, make sure that you have all the right configuration properties in place, like ordering inserts and updates and activating batching for versioned data.
+Check out https://vladmihalcea.com/2015/03/18/how-to-batch-insert-and-update-statements-with-hibernate/[this article] for more details on this topic.
+
+For `DELETE` statements, there is no option to order parent and child statements, so https://vladmihalcea.com/2015/03/26/how-to-batch-delete-statements-with-hibernate/[cascading can interfere with the JDBC batching process].
+
+Unlike any other framework which doesn't automate SQL statement generation, Hibernate makes it very easy to activate JDBC-level batching as indicated in the <<chapters/batch/Batching.adoc#batch,Batching chapter>>.
+
+[[best-practices-mapping]]
+=== Mapping
+
+Choosing the right mappings is very important for a high-performance data access layer.
+From the identifier generators to associations, there are many options to choose from, yet not all choices are equal from a performance perspective.
+
+[[best-practices-mapping-identifiers]]
+==== Identifiers
+
+When it comes to identifiers, you can either choose a natural id or a synthetic key.
+
+For natural identifiers, the *assigned* identifier generator is the right choice.
+
+For synthetic keys, the application developer can either choose a randomly generates fixed-size sequence (e.g. UUID) or a natural identifier.
+Natural identifiers are very practical, being more compact than their UUID counterparts, so there are multiple generators to choose from:
+
+- `IDENTITY`
+- `SEQUENCE`
+- `TABLE`
+
+Although the `TABLE` generator addresses the portability concern, in reality, it performs poorly because it requires emulating a database sequence using a separate transaction and row-level locks.
+For this reason, the choice is usually between `IDENTITY` and `SEQUENCE`.
+
+[TIP]
+====
+If the underlying database supports sequences, you should always use them for your Hibernate entity identifiers.
+
+Only if the relational database does not support sequences (e.g. MySQL 5.7), you should use the `IDENTITY` generators.
+However, you should keep in mind that the `IDENTITY` generators disables JDBC batching for `INSERT` statements.
+====
+
+If you're using the `SEQUENCE` generator, then you should be using the enhanced identifier generators that were enabled by default in Hibernate 5.
+The https://vladmihalcea.com/2014/07/21/hibernate-hidden-gem-the-pooled-lo-optimizer/[*pooled* and the *pooled-lo* optimizers] are very useful to reduce the number of database roundtrips when writing multiple entities per database transaction.
+
+[[best-practices-mapping-associations]]
+==== Associations
+
+JPA offers four entity association types:
+
+- `@ManyToOne`
+- `@OneToOne`
+- `@OneToMany`
+- `@ManyToMany`
+
+And an `@ElementCollection` for collections of embeddables.
+
+Because object associations can be bidirectional, there are many possible combinations of associations.
+However, not every possible association type is efficient from a database perspective.
+
+[TIP]
+====
+The closer the association mapping is to the underlying database relationship, the better it will perform.
+
+On the other hand, the more exotic the association mapping, the better the chance of being inefficient.
+====
+
+Therefore, the `@ManyToOne` and the `@OneToOne` child-side association are best to represent a `FOREIGN KEY` relationship.
+
+For collections, the association can be either:
+
+- unidirectional
+- bidirectional
+
+For unidirectional collections, `Sets` are the best choice because they generate the most efficient SQL statements.
+https://vladmihalcea.com/2015/05/04/how-to-optimize-unidirectional-collections-with-jpa-and-hibernate/[Unidirectional `Lists`] are less efficient than a `@ManyToOne` association.
+
+Bidirectional associations are usually a better choice because the `@ManyToOne` controls the association.
+
+The `@ManyToMany` annotation is rarely a good choice because it treats both sides as unidirectional associations.
+
+For this reason, it's much better to map the link table as depicted in the <<chapters/domain/associations.adoc#associations-many-to-many-bidirectional-with-link-entity-lifecycle-example,Bidirectional many-to-many with link entity lifecycle>> section.
+Each `FOREIGN KEY column will be mapped as a `@ManyToOne` association.
+On each parent-side, a bidirectional `@OneToMany` association is going to map to the aforementioned `@ManyToOne` relationship in the link entity.
+
+[TIP]
+====
+Just because you have support for collections, it does not mean that you have to turn any one-to-many database relationship into a collection.
+
+Sometimes, a `@ManyToOne` association is sufficient, and the collection can be simply replaced by an entity query which is easier to paginate or filter.
+====
+
+[[best-practices-inheritance]]
+=== Inheritance
+
+JPA offers `SINGLE_TABLE`, `JOINED`, and `TABLE_PER_CLASS` to deal with inheritance mapping, and each of these strategies has advantages and disadvantages.
+
+- `SINGLE_TABLE` performs the best in terms of executed SQL statements. However, you cannot use `NOT NULL` constraints on the column-level. You can still use triggers and rules to enforce such constraints, but it's not as straightforward.
+- `JOINED` addresses the data integrity concerns because every subclass is associated with a different table.
+   Polymorphic queries or ``@OneToMany` base class associations don't perform very well with this strategy.
+   However, polymorphic @ManyToOne` associations are fine, and they can provide a lot of value.
+- `TABLE_PER_CLASS` should be avoided since it does not render efficient SQL statements.
+
+[[best-practices-fetching]]
+=== Fetching
+
+[TIP]
+====
+Fetching too much data is the number one performance issue for the vast majority of JPA applications.
+====
+
+Hibernate supports both entity queries (JPQL/HQL and Criteria API) and native SQL statements.
+Entity queries are useful only if you need to modify the fetched entities, therefore benefiting from the https://vladmihalcea.com/2014/08/21/the-anatomy-of-hibernate-dirty-checking/[automatic dirty checking mechanism].
+
+For read-only transactions, you should fetch DTO projections because they allow you to fetch just as many columns as you need to fulfill a certain business use case.
+This has many benefits like reducing the load on the currently running Persistence Context because DTO projections don't need to be managed.
+
+[[best-practices-fetching-associations]]
+==== Fetching associations
+
+Related to associations, there are two major fetch strategies:
+
+- `EAGER`
+- `LAZY`
+
+https://vladmihalcea.com/2014/12/15/eager-fetching-is-a-code-smell/[`EAGER` fetching is almost always a bad choice].
+
+[TIP]
+====
+Prior to JPA, Hibernate used to have all associations as `LAZY` by default.
+However, when JPA 1.0 specification emerged, it was thought that not all providers would use Proxies. Hence, the `@ManyToOne` and the `@OneToOne` associations are now `EAGER` by default.
+
+The `EAGER` fetching strategy cannot be overwritten on a per query basis, so the association is always going to be retrieved even if you don't need it.
+More, if you forget to `JOIN FETCH` an `EAGER` association in a JPQL query, Hibernate will initialize it with a secondary statement, which in turn can lead to https://vladmihalcea.com/2014/02/01/how-to-detect-the-n-plus-one-query-problem-during-testing/[N+1 query issues].
+====
+
+So, `EAGER` fetching is to be avoided. For this reason, it's better if all associations are marked as `LAZY` by default.
+
+However, `LAZY` associations must be initialized prior to being accessed. Otherwise, a `LazyInitializationException` is thrown.
+There are good and bad ways to treat the `LazyInitializationException`.
+
+https://vladmihalcea.com/2016/09/13/the-best-way-to-handle-the-lazyinitializationexception/[The best way to deal with `LazyInitializationException`] is to fetch all the required associations prior to closing the Persistence Context.
+The `JOIN FETCH` directive is goof for `@ManyToOne` and `OneToOne` associations, and for at most one collection (e.g. `@OneToMany` or `@ManyToMany`).
+If you need to fetch multiple collections, to avoid a Cartesian Product, you should use secondary queries which are triggered either by navigating the `LAZY` association or by calling `Hibernate#initialize(proxy)` method.
+
+[[best-practices-caching]]
+=== Caching
+
+Hibernate has two caching layers:
+
+- the first-level cache (Persistence Context) which is a https://vladmihalcea.com/2015/04/20/a-beginners-guide-to-cache-synchronization-strategies/[transactional write-behind cache] providing https://vladmihalcea.com/2014/10/23/hibernate-application-level-repeatable-reads/[application-level repeatable reads].
+- the second-level cache which, unlike application-level caches, https://vladmihalcea.com/2015/04/09/how-does-hibernate-store-second-level-cache-entries/[it doesn't store entity aggregates but normalized dehydrated entity entries].
+
+The first-level cache is not a caching solution "per se", being more useful for ensuring https://vladmihalcea.com/2014/01/05/a-beginners-guide-to-acid-and-database-transactions/[REPEATABLE READS] even when using the https://vladmihalcea.com/2014/12/23/a-beginners-guide-to-transaction-isolation-levels-in-enterprise-java/[READ_COMMITTED isolation level].
+
+While the first-level cache is short lived, being cleared when the underlying `EntityManager` is closed, the second-level cache is tied to an `EntityManagerFactory`.
+Some second-level caching providers offer support for clusters. Therefore, a node needs only to store a subset of the whole cached data.
+
+Although the second-level cache can reduce transaction response time since entities are retrieved from the cache rather than from the database,
+https://vladmihalcea.com/2015/04/16/things-to-consider-before-jumping-to-enterprise-caching/[there are other options] to achieve the same goal,
+and you should consider these alternatives prior to jumping to a second-level cache layer:
+
+- tuning the underlying database cache so that the working set fits into memory, therefore reducing Disk I/O traffic.
+- optimizing database statements through JDBC batching, statement caching, indexing can reduce the average response time, therefore increasing throughput as well.
+- database replication is also a very valuable option to increase read-only transaction throughput
+
+After properly tuning the database, to further reduce the average response time and increase the system throughput, application-level caching becomes inevitable.
+
+Topically, a key-value application-level cache like https://memcached.org/[Memcached] or http://redis.io/[Redis] is a common choice to store data aggregates.
+If you can duplicate all data in the key-value store, you have the option of taking down the database system for maintenance without completely loosing availability since read-only traffic can still be served from the cache.
+
+One of the main challenges of using an application-level cache is ensuring data consistency across entity aggregates.
+That's where the second-level cache comes to the rescue.
+Being tightly integrated with Hibernate, the second-level cache can provide better data consistency since entries are cached in a normalized fashion, just like in a relational database.
+Changing a parent entity only requires a single entry cache update, as opposed to cache entry invalidation cascading in key-value stores.
+
+The second-level cache provides four cache concurrency strategies:
+
+- https://vladmihalcea.com/2015/04/27/how-does-hibernate-read_only-cacheconcurrencystrategy-work/[`READ_ONLY`]
+- https://vladmihalcea.com/2015/05/18/how-does-hibernate-nonstrict_read_write-cacheconcurrencystrategy-work/[`NONSTRICT_READ_WRITE`]
+- https://vladmihalcea.com/2015/05/25/how-does-hibernate-read_write-cacheconcurrencystrategy-work/[`READ_WRITE`]
+- https://vladmihalcea.com/2015/06/01/how-does-hibernate-transactional-cacheconcurrencystrategy-work/[`TRANSACTIONAL`]
+
+`READ_WRITE` is a very good default concurrency strategy since it provides strong consistency guarantees without compromising throughput.
+The `TRANSACTIONAL` concurrency strategy uses JTA. Hence, it's more suitable when entities are frequently modified.
+
+Both `READ_WRITE` and `TRANSACTIONAL` use write-through caching, while `NONSTRICT_READ_WRITE` is a read-through caching strategy.
+For this reason, `NONSTRICT_READ_WRITE` is not very suitable if entities are changed frequently.
+
+When using clustering, the second-level cache entries are spread across multiple nodes.
+When using http://blog.infinispan.org/2015/10/hibernate-second-level-cache.html[Infinispan distributed cache], only `READ_WRITE` and `NONSTRICT_READ_WRITE` are available for read-write caches.
+Bear in mind that `NONSTRICT_READ_WRITE` offers a weaker consistency guarantee since stale updates are possible.
+
+[NOTE]
+====
+For more about Hibernate Performance Tuning, check out the https://www.youtube.com/watch?v=BTdTEe9QL5k&amp;t=1s[High-Performance Hibernate] presentation from Devoxx France.
+====