From a705b48c4190240d0d6ad232591d2ab4edbe52a4 Mon Sep 17 00:00:00 2001
From: Vlad Mihalcea
Date: Mon, 26 Sep 2016 17:58:37 +0300
Subject: [PATCH] HHH-11132 - Add a Performance Tuning and Best Practices chapter

---
 .../userguide/Hibernate_User_Guide.adoc | 1 +
 .../userguide/appendices/BestPractices.adoc | 256 ++++++++++++++++++
 2 files changed, 257 insertions(+)
 create mode 100644 documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc

diff --git a/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc b/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc
index 839b5fe8ad..92118574d0 100644
--- a/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc
+++ b/documentation/src/main/asciidoc/userguide/Hibernate_User_Guide.adoc
@@ -31,6 +31,7 @@ include::chapters/envers/Envers.adoc[]
 include::chapters/portability/Portability.adoc[]
 
 include::appendices/Configurations.adoc[]
+include::appendices/BestPractices.adoc[]
 include::appendices/Legacy_Bootstrap.adoc[]
 include::appendices/Legacy_DomainModel.adoc[]
 include::appendices/Legacy_Criteria.adoc[]
diff --git a/documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc b/documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc
new file mode 100644
index 0000000000..c12cb9bb15
--- /dev/null
+++ b/documentation/src/main/asciidoc/userguide/appendices/BestPractices.adoc
@@ -0,0 +1,256 @@
+[[best-practices]]
+== Performance Tuning and Best Practices
+
+Every enterprise system is unique. However, having a very efficient data access layer is a common requirement for many enterprise applications.
+Hibernate comes with a great variety of features that can help you tune the data access layer.
+
+[[best-practices-schema]]
+=== Schema management
+
+Although Hibernate provides the `update` option for the `hibernate.hbm2ddl.auto` configuration property,
+this feature is not suitable for a production environment.
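As a minimal sketch of what avoiding `update` might look like in practice, the settings below switch the same property to `validate`, another accepted value that only checks the mappings against the existing schema. Choosing `validate` is an assumption made here for illustration, not something this chapter prescribes, and the `SchemaSettingsSketch` class and `productionSettings` method are hypothetical names.

```java
import java.util.Properties;

/** Illustrative only: the class and method names are hypothetical. */
final class SchemaSettingsSketch {

    /** Builds settings under which Hibernate checks the schema but never alters it. */
    static Properties productionSettings() {
        Properties settings = new Properties();
        // Assumption: schema changes are applied outside Hibernate,
        // so Hibernate is only asked to validate the mappings at startup.
        settings.setProperty("hibernate.hbm2ddl.auto", "validate");
        return settings;
    }
}
```

Such a `Properties` object can be passed as the override map when bootstrapping, for example through the JPA `Persistence.createEntityManagerFactory(String, Map)` method.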
+
+An automated schema migration tool (e.g. https://flywaydb.org/[Flyway], http://www.liquibase.org/[Liquibase]) allows you to use any database-specific DDL feature (e.g. Rules, Triggers, Partitioned Tables).
+Every migration should have an associated script, which is stored in the Version Control System, along with the application source code.
+
+Once the application has been deployed to a production-like QA environment and the deployment worked as expected, pushing it to a production environment should be straightforward since the latest schema migration has already been tested.
+
+[TIP]
+====
+You should always use an automatic schema migration tool and have all the migration scripts stored in the Version Control System.
+====
+
+[[best-practices-logging]]
+=== Logging
+
+Whenever you're using a framework that generates SQL statements on your behalf, you have to ensure that the generated statements are the ones that you intended in the first place.
+
+There are several ways to log statements.
+You can log statements by configuring the underlying logging framework.
+For Log4j, you can use the following loggers:
+
+[source,java]
+----
+### log just the SQL
+log4j.logger.org.hibernate.SQL=debug
+
+### log JDBC bind parameters ###
+log4j.logger.org.hibernate.type=trace
+log4j.logger.org.hibernate.type.descriptor.sql=trace
+----
+
+However, there are some other alternatives like using https://vladmihalcea.com/2016/05/03/the-best-way-of-logging-jdbc-statements/[datasource-proxy or p6spy].
+The advantage of using a JDBC `Driver` or `DataSource` Proxy is that you can go beyond simple SQL logging:
+
+- statement execution time
+- JDBC batching logging
+- https://github.com/vladmihalcea/flexy-pool[database connection monitoring]
+
+Another advantage of using a `DataSource` proxy is that you can assert the number of executed statements at test time.
+This way, you can have the integration tests fail https://vladmihalcea.com/2014/02/01/how-to-detect-the-n-plus-one-query-problem-during-testing/[when an N+1 query issue is automatically detected].
+
+[TIP]
+====
+While simple statement logging is fine, using https://github.com/ttddyy/datasource-proxy[datasource-proxy] or https://github.com/p6spy/p6spy[p6spy] is even better.
+====
+
+[[best-practices-jdbc-batching]]
+=== JDBC batching
+
+JDBC allows us to batch multiple SQL statements and to send them to the database server in a single request.
+This saves database roundtrips, and so it https://leanpub.com/high-performance-java-persistence/read#jdbc-batch-updates[reduces response time significantly].
+
+`INSERT`, `UPDATE`, and `DELETE` statements can all be batched.
+For `INSERT` and `UPDATE` statements, make sure that you have all the right configuration properties in place, like ordering inserts and updates and activating batching for versioned data.
+Check out https://vladmihalcea.com/2015/03/18/how-to-batch-insert-and-update-statements-with-hibernate/[this article] for more details on this topic.
+
+For `DELETE` statements, there is no option to order parent and child statements, so https://vladmihalcea.com/2015/03/26/how-to-batch-delete-statements-with-hibernate/[cascading can interfere with the JDBC batching process].
+
+Unlike frameworks that don't automate SQL statement generation, Hibernate makes it very easy to activate JDBC-level batching as indicated in the <>.
+
+[[best-practices-mapping]]
+=== Mapping
+
+Choosing the right mappings is very important for a high-performance data access layer.
+From the identifier generators to associations, there are many options to choose from, yet not all choices are equal from a performance perspective.
+
+[[best-practices-mapping-identifiers]]
+==== Identifiers
+
+When it comes to identifiers, you can choose either a natural id or a synthetic key.
+
+For natural identifiers, the *assigned* identifier generator is the right choice.
+
+For synthetic keys, the application developer can choose either a randomly generated fixed-size sequence (e.g. UUID) or a numerical identifier.
+Numerical identifiers are very practical, being more compact than their UUID counterparts, and there are multiple generators to choose from:
+
+- `IDENTITY`
+- `SEQUENCE`
+- `TABLE`
+
+Although the `TABLE` generator addresses the portability concern, in reality, it performs poorly because it requires emulating a database sequence using a separate transaction and row-level locks.
+For this reason, the choice is usually between `IDENTITY` and `SEQUENCE`.
+
+[TIP]
+====
+If the underlying database supports sequences, you should always use them for your Hibernate entity identifiers.
+
+You should use the `IDENTITY` generator only if the relational database does not support sequences (e.g. MySQL 5.7).
+However, you should keep in mind that the `IDENTITY` generator disables JDBC batching for `INSERT` statements.
+====
+
+If you're using the `SEQUENCE` generator, then you should be using the enhanced identifier generators that were enabled by default in Hibernate 5.
+The https://vladmihalcea.com/2014/07/21/hibernate-hidden-gem-the-pooled-lo-optimizer/[*pooled* and the *pooled-lo* optimizers] are very useful to reduce the number of database roundtrips when writing multiple entities per database transaction.
+
+[[best-practices-mapping-associations]]
+==== Associations
+
+JPA offers four entity association types:
+
+- `@ManyToOne`
+- `@OneToOne`
+- `@OneToMany`
+- `@ManyToMany`
+
+It also offers `@ElementCollection` for collections of embeddables.
+
+Because object associations can be bidirectional, there are many possible combinations of associations.
+However, not every possible association type is efficient from a database perspective.
+
+[TIP]
+====
+The closer the association mapping is to the underlying database relationship, the better it will perform.
+
+On the other hand, the more exotic the association mapping, the greater the chance that it is inefficient.
+====
+
+Therefore, the `@ManyToOne` and the `@OneToOne` child-side associations are the best way to represent a `FOREIGN KEY` relationship.
+
+For collections, the association can be either:
+
+- unidirectional
+- bidirectional
+
+For unidirectional collections, `Sets` are the best choice because they generate the most efficient SQL statements.
+https://vladmihalcea.com/2015/05/04/how-to-optimize-unidirectional-collections-with-jpa-and-hibernate/[Unidirectional `Lists`] are less efficient than a `@ManyToOne` association.
+
+Bidirectional associations are usually a better choice because the `@ManyToOne` controls the association.
+
+The `@ManyToMany` annotation is rarely a good choice because it treats both sides as unidirectional associations.
+
+For this reason, it's much better to map the link table as depicted in the <> section.
+Each `FOREIGN KEY` column will be mapped as a `@ManyToOne` association.
+On each parent side, a bidirectional `@OneToMany` association is going to map to the aforementioned `@ManyToOne` relationship in the link entity.
+
+[TIP]
+====
+Just because collections are supported does not mean that you have to turn every one-to-many database relationship into a collection.
+
+Sometimes, a `@ManyToOne` association is sufficient, and the collection can simply be replaced by an entity query, which is easier to paginate or filter.
+====
+
+[[best-practices-inheritance]]
+=== Inheritance
+
+JPA offers `SINGLE_TABLE`, `JOINED`, and `TABLE_PER_CLASS` to deal with inheritance mapping, and each of these strategies has advantages and disadvantages.
+
+- `SINGLE_TABLE` performs the best in terms of executed SQL statements. However, you cannot use `NOT NULL` constraints at the column level. You can still use triggers and rules to enforce such constraints, but it's not as straightforward.
+- `JOINED` addresses the data integrity concerns because every subclass is associated with a different table.
+  Polymorphic queries or `@OneToMany` base class associations don't perform very well with this strategy.
+  However, polymorphic `@ManyToOne` associations are fine, and they can provide a lot of value.
+- `TABLE_PER_CLASS` should be avoided since it does not render efficient SQL statements.
+
+[[best-practices-fetching]]
+=== Fetching
+
+[TIP]
+====
+Fetching too much data is the number one performance issue for the vast majority of JPA applications.
+====
+
+Hibernate supports both entity queries (JPQL/HQL and Criteria API) and native SQL statements.
+Entity queries are useful only if you need to modify the fetched entities, therefore benefiting from the https://vladmihalcea.com/2014/08/21/the-anatomy-of-hibernate-dirty-checking/[automatic dirty checking mechanism].
+
+For read-only transactions, you should fetch DTO projections because they allow you to fetch just as many columns as you need to fulfill a certain business use case.
+This has many benefits, like reducing the load on the currently running Persistence Context, because DTO projections don't need to be managed.
+
+[[best-practices-fetching-associations]]
+==== Fetching associations
+
+Related to associations, there are two major fetch strategies:
+
+- `EAGER`
+- `LAZY`
+
+https://vladmihalcea.com/2014/12/15/eager-fetching-is-a-code-smell/[`EAGER` fetching is almost always a bad choice].
+
+[TIP]
+====
+Prior to JPA, Hibernate used to have all associations as `LAZY` by default.
+However, when the JPA 1.0 specification emerged, it was thought that not all providers would use Proxies. Hence, the `@ManyToOne` and the `@OneToOne` associations are now `EAGER` by default.
+
+The `EAGER` fetching strategy cannot be overridden on a per-query basis, so the association is always going to be retrieved even if you don't need it.
+Moreover, if you forget to `JOIN FETCH` an `EAGER` association in a JPQL query, Hibernate will initialize it with a secondary statement, which in turn can lead to https://vladmihalcea.com/2014/02/01/how-to-detect-the-n-plus-one-query-problem-during-testing/[N+1 query issues].
+====
+
+So, `EAGER` fetching is to be avoided. For this reason, it's better if all associations are marked as `LAZY` by default.
+
+However, `LAZY` associations must be initialized prior to being accessed. Otherwise, a `LazyInitializationException` is thrown.
+There are good and bad ways to treat the `LazyInitializationException`.
+
+https://vladmihalcea.com/2016/09/13/the-best-way-to-handle-the-lazyinitializationexception/[The best way to deal with `LazyInitializationException`] is to fetch all the required associations prior to closing the Persistence Context.
+The `JOIN FETCH` directive is good for `@ManyToOne` and `@OneToOne` associations, and for at most one collection (e.g. `@OneToMany` or `@ManyToMany`).
+If you need to fetch multiple collections, to avoid a Cartesian Product, you should use secondary queries which are triggered either by navigating the `LAZY` association or by calling the `Hibernate#initialize(proxy)` method.
+
+[[best-practices-caching]]
+=== Caching
+
+Hibernate has two caching layers:
+
+- the first-level cache (Persistence Context) which is a https://vladmihalcea.com/2015/04/20/a-beginners-guide-to-cache-synchronization-strategies/[transactional write-behind cache] providing https://vladmihalcea.com/2014/10/23/hibernate-application-level-repeatable-reads/[application-level repeatable reads].
+- the second-level cache which, unlike application-level caches, https://vladmihalcea.com/2015/04/09/how-does-hibernate-store-second-level-cache-entries/[doesn't store entity aggregates but normalized dehydrated entity entries].
+
+The first-level cache is not a caching solution "per se", being more useful for ensuring https://vladmihalcea.com/2014/01/05/a-beginners-guide-to-acid-and-database-transactions/[REPEATABLE READS] even when using the https://vladmihalcea.com/2014/12/23/a-beginners-guide-to-transaction-isolation-levels-in-enterprise-java/[READ_COMMITTED isolation level].
+
+While the first-level cache is short-lived, being cleared when the underlying `EntityManager` is closed, the second-level cache is tied to an `EntityManagerFactory`.
+Some second-level caching providers offer support for clusters. Therefore, a node needs only to store a subset of the whole cached data.
+
+Although the second-level cache can reduce transaction response time since entities are retrieved from the cache rather than from the database,
+https://vladmihalcea.com/2015/04/16/things-to-consider-before-jumping-to-enterprise-caching/[there are other options] to achieve the same goal,
+and you should consider these alternatives prior to jumping to a second-level cache layer:
+
+- tuning the underlying database cache so that the working set fits into memory, thereby reducing disk I/O traffic.
+- optimizing database statements through JDBC batching, statement caching, and indexing, which can reduce the average response time and therefore increase throughput as well.
+- using database replication, which is also a very valuable option for increasing read-only transaction throughput.
+
+After properly tuning the database, to further reduce the average response time and increase the system throughput, application-level caching becomes inevitable.
+
+Typically, a key-value application-level cache like https://memcached.org/[Memcached] or http://redis.io/[Redis] is a common choice to store data aggregates.
+If you can duplicate all data in the key-value store, you have the option of taking down the database system for maintenance without completely losing availability since read-only traffic can still be served from the cache.
+
+One of the main challenges of using an application-level cache is ensuring data consistency across entity aggregates.
+That's where the second-level cache comes to the rescue.
+Being tightly integrated with Hibernate, the second-level cache can provide better data consistency since entries are cached in a normalized fashion, just like in a relational database.
+Changing a parent entity only requires a single cache entry update, as opposed to the cascading cache entry invalidation needed in key-value stores.
+
+The second-level cache provides four cache concurrency strategies:
+
+- https://vladmihalcea.com/2015/04/27/how-does-hibernate-read_only-cacheconcurrencystrategy-work/[`READ_ONLY`]
+- https://vladmihalcea.com/2015/05/18/how-does-hibernate-nonstrict_read_write-cacheconcurrencystrategy-work/[`NONSTRICT_READ_WRITE`]
+- https://vladmihalcea.com/2015/05/25/how-does-hibernate-read_write-cacheconcurrencystrategy-work/[`READ_WRITE`]
+- https://vladmihalcea.com/2015/06/01/how-does-hibernate-transactional-cacheconcurrencystrategy-work/[`TRANSACTIONAL`]
+
+`READ_WRITE` is a very good default concurrency strategy since it provides strong consistency guarantees without compromising throughput.
+The `TRANSACTIONAL` concurrency strategy uses JTA. Hence, it's more suitable when entities are frequently modified.
+
+Both `READ_WRITE` and `TRANSACTIONAL` use write-through caching, while `NONSTRICT_READ_WRITE` is a read-through caching strategy.
+For this reason, `NONSTRICT_READ_WRITE` is not very suitable if entities are changed frequently.
+
+When using clustering, the second-level cache entries are spread across multiple nodes.
+When using http://blog.infinispan.org/2015/10/hibernate-second-level-cache.html[Infinispan distributed cache], only `READ_WRITE` and `NONSTRICT_READ_WRITE` are available for read-write caches.
+Bear in mind that `NONSTRICT_READ_WRITE` offers a weaker consistency guarantee since stale updates are possible.
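The write-through versus read-through distinction drawn above can be illustrated with a toy model. This is only a sketch under stated assumptions: it is not how Hibernate implements its cache regions, the `CacheStrategySketch` class and its methods are hypothetical, and the read-through strategy is simplified to "drop the cache entry on update and reload it on the next read". It shows why frequent updates hurt that style of caching, since every update is followed by a cache miss.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: hypothetical names, not Hibernate's cache implementation. */
final class CacheStrategySketch {

    /** Stand-in for a database table: id -> value. */
    private final Map<Long, String> database = new HashMap<>();
    /** Stand-in for a second-level cache region. */
    private final Map<Long, String> cache = new HashMap<>();
    /** Counts how many reads had to go to the database. */
    private int databaseReads;

    /** Write-through: the database row and the cache entry are updated together. */
    void updateWriteThrough(long id, String value) {
        database.put(id, value);
        cache.put(id, value);
    }

    /** Invalidate on write: the cache entry is dropped and reloaded on the next read. */
    void updateWithInvalidation(long id, String value) {
        database.put(id, value);
        cache.remove(id);
    }

    /** Serves the value from the cache, falling back to the database on a miss. */
    String read(long id) {
        String cached = cache.get(id);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        String loaded = database.get(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return loaded;
    }

    int databaseReads() {
        return databaseReads;
    }

    public static void main(String[] args) {
        CacheStrategySketch writeThrough = new CacheStrategySketch();
        CacheStrategySketch invalidating = new CacheStrategySketch();
        // Alternate updates and reads of the same entry three times.
        for (int i = 0; i < 3; i++) {
            writeThrough.updateWriteThrough(1L, "v" + i);
            writeThrough.read(1L);
            invalidating.updateWithInvalidation(1L, "v" + i);
            invalidating.read(1L);
        }
        System.out.println(writeThrough.databaseReads()); // 0
        System.out.println(invalidating.databaseReads()); // 3
    }
}
```

In this model, the write-through variant never reads from the database after an update, while the invalidating variant pays one database read per update.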
+
+[NOTE]
+====
+For more about Hibernate Performance Tuning, check out the https://www.youtube.com/watch?v=BTdTEe9QL5k&t=1s[High-Performance Hibernate] presentation from Devoxx France.
+====
\ No newline at end of file