cannibalize good content from HR docs

2023-05-11 15:30:27 +02:00 · 2023-05-11 15:30:27 +02:00 · 9036184cc5
parent 02b551464d
commit 9036184cc5
4 changed files with 398 additions and 37 deletions
--- a/documentation/src/main/asciidoc/introduction/Entities.adoc
+++ b/documentation/src/main/asciidoc/introduction/Entities.adoc
@ -441,6 +441,50 @@ LocalDateTime lastUpdated;

 The `@Id` and `@Version` attributes we've already seen are just specialized examples of _basic attributes_.

+[[natural-id-attributes]]
+=== Natural id attributes
+
+Even when an entity has a surrogate key, it should still be possible to identify a combination of fields which uniquely identifies an instance of the entity, from the point of view of the user of the system.
+We call this combination of fields a _natural key_.
+
+[IMPORTANT]
+.What if my entity has no natural key?
+====
+If you can't identify a natural key, it might be a sign that you need to think more carefully about some aspect of your data model.
+If an entity doesn't have a meaningful unique key, then it's impossible to say what event or object it represents in the "real world" outside your program.
+====
+
+Since it's _extremely_ common to retrieve an entity based on its natural key, Hibernate has a way to mark the attributes of the entity which make up its natural key.
+Each attribute must be annotated `@NaturalId`.
+
+[source,java]
+----
+@Entity
+class Book {
+    Book() {}
+
+    @Id @GeneratedValue
+    Long id; // the system-generated surrogate key
+
+    @NaturalId
+    String isbn; // belongs to the natural key
+
+    @NaturalId
+    int printing; // also belongs to the natural key
+
+    ...
+}
+----
+
+Hibernate automatically generates a `UNIQUE` constraint on the columns mapped by the annotated fields.
+
+[TIP]
+====
+Consider using the natural id attributes to implement <<equals-and-hash>>.
+====
+
+Note that even when you've identified a natural key, we still recommend the use of a generated surrogate key in foreign keys, since this makes your data model _much_ easier to change.
+
 [[basic-attributes]]
 === Basic attributes

@ -490,7 +534,8 @@ Hibernate slightly extends this list with the following types:
 |====

 The `@Basic` annotation explicitly specifies that an attribute is basic, but it's often not needed, since attributes are assumed basic by default.
-On the other hand, if an attribute cannot be null, use of `@Basic(optional=false)` is highly recommended.
+On the other hand, if a non-primitively-typed attribute cannot be null, use of `@Basic(optional=false)` is highly recommended.
+Note that primitively-typed attributes are inferred `NOT NULL` by default.

 [source,java]
 ----
@ -1306,38 +1351,39 @@ That's already a lot of annotations, and we have not even started with the annot
 [[equals-and-hash]]
 === `equals()` and `hashCode()`

-Entity classes should override `equals()` and `hashCode()`. People new to
-Hibernate or JPA are often confused by exactly which fields should be
-included in the `hashCode()`, so please keep the following principles in
-mind:
+Entity classes should override `equals()` and `hashCode()`. People new to Hibernate or JPA are often confused by exactly which fields should be included in the `hashCode()`, so please keep the following principles in mind:

- You should not include mutable fields in the hashcode, since that would
-require rehashing any collection containing the entity whenever the field
-is mutated.
- It's not completely wrong to include a generated identifier (surrogate
-key) in the hashcode, but since the identifier is not generated until
-the entity instance is made persistent, you must take great care to not
-add it to any hashed collection before the identifier is generated. We
-therefore advise against including any database-generated field in the
-hashcode.
+- You should not include mutable fields in the hashcode, since that would require rehashing any collection containing the entity whenever the field is mutated.
+- It's not completely wrong to include a generated identifier (surrogate key) in the hashcode, but since the identifier is not generated until the entity instance is made persistent, you must take great care to not add it to any hashed collection before the identifier is generated. We therefore advise against including any database-generated field in the hashcode.

 It's OK to include any immutable, non-generated field in the hashcode.

-TIP: We therefore recommend identifying a _natural key_ for each entity,
-that is, a combination of fields that uniquely identifies an instance of
-the entity, from the perspective of the data model of the program. The
-business key should correspond to a unique constraint on the database,
-and to the fields which are included in `equals()` and `hashCode()`.
+TIP: We therefore recommend identifying a <<natural-id-attributes,natural key>> for each entity, that is, a combination of fields that uniquely identifies an instance of the entity, from the perspective of the data model of the program. The natural key should correspond to a unique constraint on the database, and to the fields which are included in `equals()` and `hashCode()`.

-That said, an implementation of `equals()` and `hashCode()` based on the
-generated identifier of the entity can work _if you're careful_.
+[source,java]
+----
+@Entity
+class Book {

-IMPORTANT: If you can't identify a natural key, it might be a sign that
-you need to think more carefully about some aspect of your data model.
-If an entity doesn't have a meaningful unique key, then it's impossible
-to say what event or object it represents in the "real world" outside
-your program.
+    @Id @GeneratedValue
+    Long id;

-Note that even when you've identified a natural key, we still recommend
-the use of a generated surrogate key in foreign keys, since this makes
-your data model _much_ easier to change.
+    @NaturalId
+    @Basic(optional=false)
+    String isbn;
+
+    ...
+
+    @Override
+    public boolean equals(Object other) {
+        return other instanceof Book
+            && ((Book) other).isbn.equals(isbn);
+    }
+    @Override
+    public int hashCode() {
+        return isbn.hashCode();
+    }
+}
+----
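+
+When the natural key spans several attributes, as with the `isbn` and `printing` of the `Book` shown in <<natural-id-attributes>>, both methods should cover every attribute of the key. The following is only a sketch in plain Java: the mapping annotations are left out so that it compiles on its own, and the constructor is an assumption added for illustration.
+
+[source,java]
+----
+import java.util.Objects;
+
+class Book {
+    // in the mapped entity, both fields would be annotated @NaturalId
+    final String isbn;
+    final int printing;
+
+    // hypothetical convenience constructor, not part of the mapping
+    Book(String isbn, int printing) {
+        this.isbn = isbn;
+        this.printing = printing;
+    }
+
+    @Override
+    public boolean equals(Object other) {
+        if (this == other) return true;
+        if (!(other instanceof Book)) return false;
+        Book that = (Book) other;
+        // two books are equal only if every natural key attribute matches
+        return printing == that.printing
+            && Objects.equals(isbn, that.isbn);
+    }
+
+    @Override
+    public int hashCode() {
+        // hash exactly the attributes compared in equals()
+        return Objects.hash(isbn, printing);
+    }
+}
+----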
+
+That said, an implementation of `equals()` and `hashCode()` based on the generated identifier of the entity can work _if you're careful_.
--- a/documentation/src/main/asciidoc/introduction/Hibernate_Introduction.adoc
+++ b/documentation/src/main/asciidoc/introduction/Hibernate_Introduction.adoc
@ -50,11 +50,11 @@ It quickly overtook other open source and commercial contenders to become the mo

 In 2004, Gavin and Christian joined a tiny startup called JBoss, and other early Hibernate contributors soon followed: Max Rydahl Andersen, Emmanuel Bernard, Steve Ebersole, and Sanne Grinovero.

-Soon after, Gavin joined the EJB 3 expert group and convinced the group to deprecate Entity Beans in favor of brand new persistence API modelled after Hibernate.
+Soon after, Gavin joined the EJB 3 expert group and convinced the group to deprecate Entity Beans in favor of a brand-new persistence API modelled after Hibernate.
 Later, members of the TopLink team got involved, and the Java Persistence API evolved as a collaboration between&mdash;primarily&mdash;Sun, JBoss, Oracle, and Sybase, under the leadership of Linda Demichiel.

-Over the intervening two decades, _many_ other talented people have contributed to the development of Hibernate.
-Special credit must go to Steve, who has led the project for many years, since Gavin stepped back to focus in other work.
+Over the intervening two decades, _many_ talented people have contributed to the development of Hibernate.
+We're all especially grateful to Steve, who has led the project for many years, since Gavin stepped back to focus on other work.
 ====

 We can think of the API of Hibernate in terms of three basic elements:
@ -78,6 +78,40 @@ JPA is a great baseline that really nails the basics of the object/relational ma
 But without the native APIs, and extended mapping annotations, you miss out on much of the power of Hibernate.
 ====

+[[java-code]]
+=== Writing Java code with Hibernate
+
+If you're completely new to Hibernate and JPA, you might already be wondering how the persistence-related code is structured, and how it fits into the rest of your program.
+
+Well, typically, your persistence-related code comes in two layers:
+
+. a representation of your data model in Java, which takes the form of a set of annotated entity classes, and
+. a larger number of functions which interact with Hibernate's APIs to perform the persistence operations associated with your various transactions.
+
+The first part, the data or "domain" model, is usually easier to write, but doing a clean, careful job of it will strongly affect your success in the second part.
+
+Most people implement the domain model as a set of what we used to call "Plain Old Java Objects", that is, as simple Java classes with no direct dependencies on technical infrastructure, nor on application logic which deals with request processing, transaction management, communications, or interaction with the database.
+
+[TIP]
+====
+Take your time with this code, and try to produce a Java model that's as close as reasonable to the relational data model. Avoid using exotic or advanced mapping features when they're not really needed.
+When in the slightest doubt, map a foreign key relationship using `@ManyToOne` with `@OneToMany(mappedBy=...)` in preference to more complicated association mappings.
+====
+
+The second part of the code is much trickier to get right. This code must:
+
+- manage transactions and sessions,
+- interact with the database via the Hibernate session,
+- fetch and prepare data needed by the UI, and
+- handle failures.
+
+[TIP]
+====
+Some responsibility for transaction and session management, and for
+recovery from certain kinds of failure, can be best handled in some sort
+of framework code.
+====
+
 [[overview]]
 === Overview

@ -94,4 +128,4 @@ Naturally, we'll start at the top of this list, with the least-interesting topic
 include::Configuration.adoc[]
 include::Entities.adoc[]
 include::Mapping.adoc[]
-
+include::Tuning.adoc[]
--- a/documentation/src/main/asciidoc/introduction/Preface.adoc
+++ b/documentation/src/main/asciidoc/introduction/Preface.adoc
@ -1,15 +1,15 @@
 [[preface]]
-
-[preface]
 == Preface

+:user-guide: https://docs.jboss.org/hibernate/orm/6.2/userguide/html_single/Hibernate_User_Guide.html
+
 Hibernate 6 is a major redesign of the world's most popular and feature-rich ORM solution.
 The redesign has touched almost every subsystem of Hibernate, including the APIs, mapping annotations, and the query language.
 This new Hibernate is more powerful, more robust, and more typesafe.

 Unfortunately, the changes in Hibernate 6 have obsoleted much of the information about Hibernate that's available in books, in blog posts, and on stackoverflow.

-On the other hand, the Hibernate User Guide provides a great deal of detail about many aspects of Hibernate, but with so much information to cover, readability is difficult to achieve.
+On the other hand, the Hibernate {user-guide}[User Guide] provides a great deal of detail about many aspects of Hibernate, but with so much information to cover, readability is difficult to achieve.

 This is therefore the new canonical guide to Hibernate.
-We do not attempt to cover every detail of Hibernate here, and so this guide should be used in conjunction with the extensive Javadoc available for Hibernate 6, and with the User Guide.
+We do not attempt to cover every detail of Hibernate here, and so this guide should be used in conjunction with the extensive Javadoc available for Hibernate 6.
--- a/documentation/src/main/asciidoc/introduction/Tuning.adoc
+++ b/documentation/src/main/asciidoc/introduction/Tuning.adoc
@ -0,0 +1,281 @@
+[[tuning-and-performance]]
+== Tuning and performance
+
+Once you have a program up and running using Hibernate to access
+the database, it's inevitable that you'll find places where performance is
+disappointing or unacceptable.
+
+Fortunately, most performance problems are relatively easy to solve with
+the tools that Hibernate makes available to you, as long as you keep a
+couple of simple principles in mind.
+
+First and most important: the reason you're using Hibernate is
+that it makes things easier. If, for a certain problem, it's making
+things _harder_, stop using it. Solve this problem with a different tool
+instead.
+
+IMPORTANT: Just because you're using Hibernate in your program doesn't mean
+you have to use it _everywhere_.
+
+Second: there are two main potential sources of performance bottlenecks in
+a program that uses Hibernate:
+
+- too many round trips to the database, and
+- memory consumption associated with the first-level (session) cache.
+
+So performance tuning primarily involves reducing the number of accesses
+to the database, and/or controlling the size of the session cache.
+
+But before we get to those more advanced topics, we should start by tuning
+the connection pool.
+
+[[connection-pool]]
+=== Tuning the connection pool
+
+TODO
+
+[[statement-batching]]
+=== Enabling statement batching
+
+An easy way to improve performance of some transactions with almost no
+work at all is to turn on automatic DML statement batching. Batching
+only helps in cases where a program executes many inserts, updates, or
+deletes against the same table in a single transaction.
+
+All you need to do is set a single property:
+
+|===
+| Configuration property name | Purpose
+
+| `hibernate.jdbc.batch_size` | Maximum batch size for SQL statement batching
+|===
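+
+For instance, in a properties-based configuration the setting might look like the following fragment. The value `50` is only an illustrative assumption, and should be tuned for your own workload:
+
+[source,properties]
+----
+# upper bound on the number of statements sent in one JDBC batch (illustrative value)
+hibernate.jdbc.batch_size=50
+----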
+
+TIP: Even better than DML statement batching is the use of HQL `update`
+or `delete` queries, or even native SQL that calls a stored procedure!
+
+[[association-fetching]]
+=== Association fetching
+
+:association-fetching: https://docs.jboss.org/hibernate/orm/6.2/userguide/html_single/Hibernate_User_Guide.html#fetching
+
+Achieving high performance in ORM means minimizing the number of round
+trips to the database. This goal should be uppermost in your mind
+whenever you're writing data access code with Hibernate. The most
+fundamental rule of thumb in ORM is:
+
+- explicitly specify all the data you're going to need right at the start
+of a session/transaction, and fetch it immediately in one or two queries,
+- and only then start navigating associations between persistent entities.
+
+Without question, the most common cause of poorly-performing data access
+code in Java programs is the problem of _N+1 selects_. Here, a list of N
+rows is retrieved from the database in an initial query, and then
+associated instances of a related entity are fetched using N subsequent
+queries.
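+
+To make the cost concrete, here is a toy model in plain Java. It doesn't use Hibernate at all: the class and method names are hypothetical, and each call to a `select...` method simply stands in for one SQL query sent to the database.
+
+[source,java]
+----
+import java.util.ArrayList;
+import java.util.List;
+
+class RoundTripDemo {
+    // counts simulated round trips to the database
+    static int roundTrips = 0;
+
+    // one query returning the ids of N rows
+    static List<Integer> selectBookIds(int n) {
+        roundTrips++;
+        List<Integer> ids = new ArrayList<>();
+        for (int i = 1; i <= n; i++) ids.add(i);
+        return ids;
+    }
+
+    // one query per row, fetching the associated instance
+    static String selectPublisherOf(int bookId) {
+        roundTrips++;
+        return "publisher-of-" + bookId;
+    }
+
+    // one joined query returning every book together with its publisher
+    static List<String> selectBooksWithPublishers(int n) {
+        roundTrips++;
+        List<String> rows = new ArrayList<>();
+        for (int i = 1; i <= n; i++) rows.add(i + ":publisher-of-" + i);
+        return rows;
+    }
+
+    // the N+1 pattern: 1 initial query, then N follow-up queries
+    static int nPlusOne(int n) {
+        roundTrips = 0;
+        for (int id : selectBookIds(n)) {
+            selectPublisherOf(id);
+        }
+        return roundTrips;
+    }
+
+    // the join fetching pattern: everything arrives in a single query
+    static int joinFetch(int n) {
+        roundTrips = 0;
+        selectBooksWithPublishers(n);
+        return roundTrips;
+    }
+}
+----
+
+In this model, `nPlusOne(100)` makes 101 simulated round trips, where `joinFetch(100)` makes just one.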
+
+IMPORTANT: Hibernate code which does this is bad code and makes
+Hibernate look bad to people who don't realize that it's their own
+fault for not following the advice in this section!
+
+Hibernate provides several strategies for efficiently fetching
+associations and avoiding N+1 selects:
+
+- outer join fetching,
+- batch fetching, and
+- subselect fetching.
+
+Of these, you should almost always use outer join fetching. Batch
+fetching and subselect fetching are only useful in rare cases where
+outer join fetching would result in a cartesian product and a huge
+result set. Unfortunately, outer join fetching simply isn't possible
+with lazy fetching.
+
+TIP: Avoid the use of lazy fetching, which is often the source of
+N+1 selects.
+
+Now, we're not saying that associations should be mapped for eager
+fetching by default! That would be a terrible idea, resulting in
+simple session operations that fetch almost the entire database.
+Therefore:
+
+TIP: Most associations should be mapped for lazy fetching by default.
+
+It sounds as if this tip is in contradiction to the previous one, but
+it's not. It's saying that you must explicitly specify eager fetching
+for associations precisely when and where they are needed.
+
+If you need eager fetching in some particular transaction, use:
+
+- `left join fetch` in HQL,
+- a fetch profile,
+- a JPA `EntityGraph`, or
+- `fetch()` in a criteria query.
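+
+As a sketch of the first option, an HQL query along the following lines fetches each `Book` together with its `authors` collection in a single round trip. The `Book` entity, its `authors` association, and the `:pattern` parameter are assumptions made up for illustration:
+
+[source,hql]
+----
+select b from Book b left join fetch b.authors where b.title like :pattern
+----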
+
+You can find much more information about association fetching in the
+{association-fetching}[User Guide].
+
+[[second-level-cache]]
+=== Enabling the second-level cache
+
+:second-level-cache: https://docs.jboss.org/hibernate/orm/6.2/userguide/html_single/Hibernate_User_Guide.html#caching
+
+A classic way to reduce the number of accesses to the database is to
+use a second-level cache, allowing cached data to be shared between
+sessions.
+
+Configuring Hibernate's second-level cache is a rather involved topic,
+and quite outside the scope of this document. But in case it helps, we
+often test Hibernate with the following configuration, which uses
+EHCache as the cache implementation, as above in <<optional-dependencies>>:
+
+|===
+| Configuration property name              | Property value
+
+| `hibernate.cache.use_second_level_cache` | `true`
+| `hibernate.cache.region.factory_class`   | `org.hibernate.cache.jcache.JCacheRegionFactory`
+| `hibernate.javax.cache.provider`         | `org.ehcache.jsr107.EhcacheCachingProvider`
+| `hibernate.javax.cache.uri`              | `/ehcache.xml`
+|===
+
+If you're using EHCache, you'll also need to include an `ehcache.xml` file
+that explicitly configures the behavior of each cache region belonging to
+your entities and collections.
+
+TIP: Don't forget that you need to explicitly mark each entity that will
+be stored in the second-level cache with the `@Cache` annotation from
+`org.hibernate.annotations`.
+
+You can find much more information about the second-level cache in the
+{second-level-cache}[User Guide].
+
+[[session-cache-management]]
+=== Session cache management
+
+Entity instances aren't automatically evicted from the session cache when
+they're no longer needed. (The session cache is quite different to the
+second-level cache in this respect!) Instead, they stay pinned in memory
+until the session they belong to is discarded by your program.
+
+The methods `detach()` and `clear()` allow you to remove entities from the
+session cache, making them available for garbage collection. Since most
+sessions are rather short-lived, you won't need these operations very often.
+And if you find yourself thinking you _do_ need them in a certain situation,
+you should strongly consider an alternative solution: a _stateless session_.
+
+[[stateless-sessions]]
+=== Stateless sessions
+
+An arguably-underappreciated feature of Hibernate is the `StatelessSession`
+interface, which provides a command-oriented, more bare-metal approach to
+interacting with the database.
+
+You may obtain a stateless session from the `SessionFactory`:
+
+[source, JAVA, indent=0]
+----
+StatelessSession ss = getSessionFactory().openStatelessSession();
+----
+
+A stateless session:
+
+- doesn't have a first-level cache (persistence context), nor does it interact
+with any second-level caches, and
+- doesn't implement transactional write-behind or automatic dirty checking,
+so all operations are executed immediately when they're explicitly called.
+
+For a stateless session, you're always working with detached objects. Thus,
+the programming model is a bit different:
+
+|===
+| Method name and parameters | Effect
+
+| `get(Class, Object)` | Obtain a detached object, given its type and its id,
+by executing a `select`
+| `fetch(Object)`      | Fetch an association of a detached object
+| `refresh(Object)`    | Refresh the state of a detached object by executing
+a `select`
+| `insert(Object)`     | Immediately `insert` the state of the given
+transient object into the database
+| `update(Object)`     | Immediately `update` the state of the given detached
+object in the database
+| `delete(Object)`     | Immediately `delete` the state of the given detached
+object from the database
+|===
+
+NOTE: There's no `flush()` operation, and so `update()` is always explicit.
+
+In certain circumstances, this makes stateless sessions easier to work with,
+but with the caveat that a stateless session is much more vulnerable to data
+aliasing effects, since it's easy to get two non-identical Java objects which
+both represent the same row of a database table.
+
+IMPORTANT: If you use `fetch()` in a stateless session, you can very easily
+obtain two objects representing the same database row!
+
+In particular, the absence of a persistence context means that you can safely
+perform bulk-processing tasks without allocating huge quantities of memory.
+Use of a `StatelessSession` eliminates the need to call:
+
+- `clear()` or `detach()` to perform first-level cache management, and
+- `setCacheMode()` to bypass interaction with the second-level cache.
+
+TIP: Stateless sessions can be useful, but for bulk operations on huge datasets,
+Hibernate can't possibly compete with stored procedures!
+
+When using a stateless session, you should be aware of the following additional
+limitations:
+
+- persistence operations never cascade to associated instances,
+- changes to `@ManyToMany` associations and ``@ElementCollection``s cannot be made
+persistent, and
+- operations performed via a stateless session bypass callbacks.
+
+[[optimistic-and-pessimistic-locking]]
+=== Optimistic and pessimistic locking
+
+Finally, an aspect of behavior under load that we didn't mention above is row-level
+data contention. When many transactions try to read and update the same data, the
+program might become unresponsive with lock escalation, deadlocks, and lock
+acquisition timeout errors.
+
+There are two basic approaches to data concurrency in Hibernate:
+
+- optimistic locking using `@Version` columns, and
+- database-level pessimistic locking using the SQL `for update` syntax (or equivalent).
+
+In the Hibernate community it's _much_ more common to use optimistic locking, and
+Hibernate makes that incredibly easy.
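+
+Conceptually, optimistic locking with a version column amounts to a compare-and-update on the version number. The following toy model in plain Java illustrates the idea only. It is not how Hibernate implements it, and the `VersionedRow` class and its methods are hypothetical:
+
+[source,java]
+----
+class VersionedRow {
+    private String title;
+    private int version = 0;
+
+    VersionedRow(String title) {
+        this.title = title;
+    }
+
+    String title() { return title; }
+
+    int version() { return version; }
+
+    // mimics: update ... set title = ?, version = version + 1 where version = ?
+    // returns true if the row was updated, false if the expected version was stale
+    synchronized boolean update(String newTitle, int expectedVersion) {
+        if (version != expectedVersion) {
+            return false; // another transaction updated the row first
+        }
+        title = newTitle;
+        version++;
+        return true;
+    }
+}
+----
+
+If two transactions both read version `0`, only the first call to `update()` succeeds. The second sees a stale version and must retry or give up. In Hibernate, a failed version check is reported as an exception rather than as a boolean result.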
+
+TIP: Where possible, in a multiuser system, avoid holding a pessimistic lock across
+a user interaction. Indeed, the usual practice is to avoid having transactions that
+span user interactions. For multiuser systems, optimistic locking is king.
+
+That said, there _is_ also a place for pessimistic locks, which can sometimes reduce
+the probability of transaction rollbacks.
+
+Therefore, the `find()`, `lock()`, and `refresh()` methods of the session
+accept an optional `LockMode`. You can also specify a `LockMode` for a query. The
+lock mode can be used to request a pessimistic lock, or to customize the behavior
+of optimistic locking:
+
+|===
+| `LockMode` type | Meaning
+
+| `READ`                        | An optimistic lock obtained implicitly whenever
+an entity is read from the database using `select`
+| `OPTIMISTIC`                  | An optimistic lock obtained when an entity is
+read from the database, and verified using a
+`select` to check the version when the
+transaction completes
+| `OPTIMISTIC_FORCE_INCREMENT`  | An optimistic lock obtained when an entity is
+read from the database, and enforced using an
+`update` to increment the version when the
+transaction completes
+| `WRITE`                       | A pessimistic lock obtained implicitly whenever
+an entity is written to the database using
+`update` or `insert`
+| `PESSIMISTIC_READ`            | A pessimistic `for share` lock
+| `PESSIMISTIC_WRITE`           | A pessimistic `for update` lock
+| `PESSIMISTIC_FORCE_INCREMENT` | A pessimistic lock enforced using an immediate
+`update` to increment the version
+|===