flushing, flush modes, and cache modes

2023-05-12 00:58:47 +02:00 · 2023-05-12 00:58:47 +02:00 · 02c179d07f
parent 05f4ac6483
commit 02c179d07f
2 changed files with 121 additions and 42 deletions
--- a/documentation/src/main/asciidoc/introduction/Interacting.adoc
+++ b/documentation/src/main/asciidoc/introduction/Interacting.adoc
@@ -40,7 +40,15 @@ Stateful sessions certainly have their advantages, but they're more difficult to
 === Persistence Contexts

 A persistence context is a sort of cache; we sometimes call it the "first-level cache", to distinguish it from the <<second-level-cache,second-level cache>>.
-For every entity instance read within the scope of a persistence context, and for every new entity made persistent within the scope of the persistence context, the context holds a unique mapping from the identifier of the entity instance to the instance itself.
+For every entity instance read from the database within the scope of a persistence context, and for every new entity made persistent within the scope of the persistence context, the context holds a unique mapping from the identifier of the entity instance to the instance itself.
+
+Thus, an entity instance may be in one of three states with respect to a given persistence context:
+
+1. _transient_ — never persistent, and not associated with the persistence context,
+2. _persistent_ — currently associated with the persistence context, or
+3. _detached_ — previously persistent in another session, but not currently associated with _this_ persistence context.
+
+At any given moment, an instance may be associated with at most one persistence context.

 The lifetime of a persistence context usually corresponds to the lifetime of a transaction, though it's possible to have a persistence context that spans several database-level transactions that form a single logical unit of work.
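
To make the three states concrete, here is a minimal sketch of a persistence context as an identity map, in plain Java. Everything in it (`ToyPersistenceContext`, `Book`, `stateOf()`, and the shared set of known identifiers standing in for the database) is hypothetical and is not how Hibernate is implemented; it only illustrates the definitions above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model of a persistence context. Hypothetical; not Hibernate's implementation. */
final class ToyPersistenceContext {
    enum State { TRANSIENT, PERSISTENT, DETACHED }

    /** Hypothetical entity type with an identifier. */
    static final class Book {
        final long id;
        Book(long id) { this.id = id; }
    }

    // The unique mapping from identifier to entity instance
    private final Map<Long, Book> managed = new HashMap<>();
    // Stand-in for the database: identifiers that have ever been made persistent
    private final Set<Long> knownIds;

    ToyPersistenceContext(Set<Long> knownIds) { this.knownIds = knownIds; }

    /** Associates an instance with this context. */
    void persist(Book book) {
        Book existing = managed.putIfAbsent(book.id, book);
        if (existing != null && existing != book) {
            throw new IllegalStateException("a different instance with id " + book.id + " is already managed");
        }
        knownIds.add(book.id);
    }

    /** Removes the instance from this context without touching the "database". */
    void detach(Book book) { managed.remove(book.id, book); }

    /** Classifies an instance with respect to this context. */
    State stateOf(Book book) {
        if (managed.get(book.id) == book) return State.PERSISTENT;
        return knownIds.contains(book.id) ? State.DETACHED : State.TRANSIENT;
    }
}
```

In this sketch, a freshly constructed `Book` is transient, becomes persistent after `persist()`, and is detached after `detach()`.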

@@ -48,7 +56,7 @@ There are several reasons we like persistence contexts.

 1. They help avoid _data aliasing_: if we modify an entity in one section of code, then other code executing within the same persistence context will see our modification.
 2. They enable _automatic dirty checking_: after modifying an entity, we don't need to perform any explicit operation to ask Hibernate to propagate that change back to the database.
-   Instead, the change will be automatically synchronized with the database when the session _flush_ occurs.
+   Instead, the change will be automatically synchronized with the database when the session is <<flush,flushed>>.
 3. They can improve performance by avoiding a trip to the database when a given entity instance is requested repeatedly in a given unit of work.
 4. They make it possible to _transparently batch_ together multiple database operations.

@@ -60,10 +68,13 @@ On the other hand, stateful sessions come with some very important restrictions,
 - persistence contexts aren't threadsafe, and can't be shared across threads, and
 - a persistence context can't be reused across unrelated transactions, since that would break the isolation and atomicity of the transactions.

+Furthermore, a persistence context holds hard references to all its entities, preventing them from being garbage collected.
+Thus, the session must be discarded once a unit of work is complete.
+
 [IMPORTANT]
 .This is important
 ====
-If you didn't quite understand the point above, then go back and re-read it until you do understand.
+If you don't completely understand the previous passage, go back and re-read it until you do.
 A great deal of human suffering has resulted from users mismanaging the lifecycle of the Hibernate `Session` or JPA `EntityManager`.
 ====

@@ -215,6 +226,13 @@ affecting the database
 | Obtain a reference to a persistent object without actually loading its state from the database
 |===

+Notice that some of these operations have no immediate effect on the database, and simply schedule a command for later execution.
+
+Any of these operations might throw an exception, and it's important that we know what to do when that happens.
+
+[[session-exception-handling]]
+=== Session and exception handling
+
 If an exception occurs while interacting with the database, there's no good way to resynchronize the state of the current persistence context with the state held in database tables.

 Therefore, a session is considered to be unusable after any of its methods throws an exception.
@@ -228,3 +246,33 @@ If you receive an exception from Hibernate, you should immediately close and dis
 [[flush]]
 === Flushing the session

+From time to time, a _flush_ operation is triggered, and the session synchronizes dirty state held in memory—that is, modifications to the state of entities associated with the persistence context—with persistent state held in the database. Of course, it does this by executing SQL `INSERT`, `UPDATE`, and `DELETE` statements.
+
+By default, a flush is triggered:
+
+- when the current transaction commits, for example, when `Transaction.commit()` is called,
+- before execution of a query whose result would be affected by the synchronization of dirty state held in memory, or
+- when the program directly calls `flush()`.
+
+[NOTE]
+.SQL execution happens asynchronously
+====
+Notice that SQL statements are not usually executed synchronously by methods of the `Session` interface like `persist()` and `remove()`; instead, they're scheduled and executed when the session is next flushed. If synchronous execution of SQL is desired, the `StatelessSession` allows this.
+====
+
+The conditions which trigger a flush can be controlled by explicitly setting the flush mode.
+For example, to disable flushes that occur before query execution, call:
+
+[source,java]
+----
+em.setFlushMode(FlushModeType.COMMIT);
+----
+
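+Hibernate allows greater control over the flush mode than JPA.
+For example, `FlushMode.MANUAL` disables automatic flushing entirely, so the session is flushed only when `flush()` is called explicitly: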
+
+[source,java]
+----
+s.setHibernateFlushMode(FlushMode.MANUAL);
+----
+
+Since flushing is a somewhat expensive operation (the session must dirty-check every entity in the persistence context), setting the flush mode to `COMMIT` can occasionally be a useful optimization.
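
The interplay between write-behind and the flush mode can be sketched in plain Java. This is a toy model under stated assumptions, not Hibernate's implementation: `ToySession`, its `FlushMode` enum, and the string "SQL" are all hypothetical, and in `AUTO` mode the toy flushes before every query, whereas Hibernate flushes only when the query result would be affected by pending changes.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of transactional write-behind. Hypothetical; not Hibernate's implementation. */
final class ToySession {
    enum FlushMode { AUTO, COMMIT, MANUAL }

    // Commands scheduled by persist()/remove() but not yet sent
    private final List<String> pending = new ArrayList<>();
    // Commands that have been "sent to the database", in order
    private final List<String> executed = new ArrayList<>();
    private FlushMode flushMode = FlushMode.AUTO;

    void setFlushMode(FlushMode mode) { flushMode = mode; }

    // These operations only schedule a command for later execution
    void persist(String row) { pending.add("INSERT " + row); }
    void remove(String row) { pending.add("DELETE " + row); }

    /** Synchronizes the in-memory state by executing all scheduled commands. */
    void flush() {
        executed.addAll(pending);
        pending.clear();
    }

    /** Simplification: AUTO flushes before every query, not just affected ones. */
    void query(String q) {
        if (flushMode == FlushMode.AUTO) flush();
        executed.add("SELECT " + q);
    }

    /** Commit triggers a flush unless flushing is fully manual. */
    void commit() {
        if (flushMode != FlushMode.MANUAL) flush();
    }

    List<String> executedSql() { return new ArrayList<>(executed); }
}
```

With the toy flush mode set to `COMMIT`, a query issued after `persist()` runs before the pending `INSERT`, which is executed only at `commit()`.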
--- a/documentation/src/main/asciidoc/introduction/Tuning.adoc
+++ b/documentation/src/main/asciidoc/introduction/Tuning.adoc
@@ -152,6 +152,7 @@ The appropriate policies depend on the kind of data an entity represents. For ex

 The `@Cache` annotation also specifies `CacheConcurrencyStrategy`, a policy governing access to the second-level cache by concurrent transactions.

+.Cache concurrency policies
 |===
 | Concurrency policy | Interpretation | Use case

@@ -180,6 +181,7 @@ Once we've marked some entities and collection as eligible for storage in the se

 Configuring Hibernate's second-level cache is a rather involved topic, and quite outside the scope of this document. But in case it helps, we often test Hibernate with the following configuration, which uses EHCache as the cache implementation, as above in <<optional-dependencies>>:

+.Cache provider configuration
 |===
 | Configuration property name              | Property value

@@ -201,6 +203,7 @@ You can find much more information about the second-level cache in the

 For the most part, the second-level cache is transparent.
 Program logic which interacts with the Hibernate session is unaware of the cache, and is not impacted by changes to caching policies.
+
 At worst, interaction with the cache may be controlled by specification of an explicit `CacheMode`.

 [source,java]
@@ -208,6 +211,42 @@ At worst, interaction with the cache may be controlled by specification of an ex
 s.setCacheMode(CacheMode.IGNORE);
 ----

+Or, using JPA-standard APIs:
+
+[source,java]
+----
+em.setCacheRetrieveMode(CacheRetrieveMode.BYPASS);
+em.setCacheStoreMode(CacheStoreMode.BYPASS);
+----
+
+The JPA-defined cache modes are:
+
+.Cache modes
+[cols=",3"]
+|===
+| Mode | Interpretation
+
+| `CacheRetrieveMode.USE` | Read data from the cache if available
+| `CacheRetrieveMode.BYPASS` | Don't read data from the cache; go direct to the database
+
+| `CacheStoreMode.USE` | Write data to the cache when read from the database or when modified; do not update already-cached items when reading
+| `CacheStoreMode.REFRESH` | Write data to the cache when read from the database or when modified; always update cached items when reading
+| `CacheStoreMode.BYPASS` | Don't write data to the cache
+|===
+
+[TIP]
+.A good time to `BYPASS` the cache
+====
+It's a good idea to set the `CacheStoreMode` to `BYPASS` just before running a query which returns a large result set full of data that we don't expect to need again soon.
+This saves work, and prevents the newly-read data from pushing out the previously cached data.
+
+[source,java]
+----
+em.setCacheStoreMode(CacheStoreMode.BYPASS);
+List<Publisher> allpubs = em.createQuery("from Publisher", Publisher.class).getResultList();
+----
+====
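
The interpretation of the retrieve and store modes tabulated above can be sketched as a small read path in plain Java. This is only an illustration: `ToyCachedLoader` and its two enums are hypothetical stand-ins that mirror the names of the JPA modes, and the maps stand in for the database and the second-level cache.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy read path through a cache. Hypothetical; not Hibernate's implementation. */
final class ToyCachedLoader {
    enum RetrieveMode { USE, BYPASS }
    enum StoreMode { USE, REFRESH, BYPASS }

    final Map<Long, String> database = new HashMap<>();
    final Map<Long, String> cache = new HashMap<>();

    String load(long id, RetrieveMode retrieve, StoreMode store) {
        // USE: read data from the cache if available
        if (retrieve == RetrieveMode.USE && cache.containsKey(id)) {
            return cache.get(id);
        }
        // Cache miss, or BYPASS: go direct to the database
        String row = database.get(id);
        if (row != null) {
            switch (store) {
                case USE:
                    cache.putIfAbsent(id, row); // don't update already-cached items
                    break;
                case REFRESH:
                    cache.put(id, row);         // always update cached items
                    break;
                case BYPASS:
                    break;                      // don't write to the cache
            }
        }
        return row;
    }
}
```

Note that in this sketch a stale cached item survives a read with store mode `USE`, and is only overwritten by a read with `REFRESH`.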
+
 Very occasionally, it's necessary or advantageous to control the cache explicitly, for example, to evict some data that we know to be stale.
 The `Cache` interface allows programmatic eviction of cached items.

@@ -222,26 +261,24 @@ sf.getCache().evictEntityData(Book.class, bookId);
 None of the operations of the `Cache` interface respect any isolation or transactional semantics associated with the underlying caches. In particular, eviction via the methods of this interface causes an immediate "hard" removal outside any current transaction and/or locking scheme.
 ====

+Ordinarily, however, Hibernate automatically evicts or updates cached data after modifications, and, in addition, cached data which is unused eventually expires according to the configured policies.
+
+This is quite different to what happens with the first-level cache.
+
 [[session-cache-management]]
 === Session cache management

-Entity instances aren't automatically evicted from the session cache when
-they're no longer needed. (The session cache is quite different to the
-second-level cache in this respect!) Instead, they stay pinned in memory
-until the session they belong to is discarded by your program.
+Entity instances aren't automatically evicted from the session cache when they're no longer needed.
+Instead, they stay pinned in memory until the session they belong to is discarded by your program.

-The methods `detach()` and `clear()` allow you to remove entities from the
-session cache, making them available for garbage collection. Since most
-sessions are rather short-lived, you won't need these operations very often.
-And if you find yourself thinking you _do_ need them in a certain situation,
-you should strongly consider an alternative solution: a _stateless session_.
+The methods `detach()` and `clear()` allow you to remove entities from the session cache, making them available for garbage collection.
+Since most sessions are rather short-lived, you won't need these operations very often.
+And if you find yourself thinking you _do_ need them in a certain situation, you should strongly consider an alternative solution: a _stateless session_.

 [[stateless-sessions]]
 === Stateless sessions

-An arguably-underappreciated feature of Hibernate is the `StatelessSession`
-interface, which provides a command-oriented, more bare-metal approach to
-interacting with the database.
+An arguably-underappreciated feature of Hibernate is the `StatelessSession` interface, which provides a command-oriented, more bare-metal approach to interacting with the database.

 You may obtain a reactive stateless session from the `SessionFactory`:

@@ -252,56 +289,50 @@ Stage.StatelessSession ss = getSessionFactory().openStatelessSession();

 A stateless session:

- doesn't have a first-level cache (persistence context), nor does it interact
-with any second-level caches, and
- doesn't implement transactional write-behind or automatic dirty checking,
-so all operations are executed immediately when they're explicitly called.
+- doesn't have a first-level cache (persistence context), nor does it interact with any second-level caches, and
+- doesn't implement transactional write-behind or automatic dirty checking, so all operations are executed immediately when they're explicitly called.

-For a stateless session, you're always working with detached objects. Thus,
-the programming model is a bit different:
+For a stateless session, you're always working with detached objects. Thus, the programming model is a bit different:

+.Important methods of the `StatelessSession`
+[cols=",2"]
 |===
 | Method name and parameters | Effect

-| `get(Class, Object)` | Obtain a detached object, given its type and its id,
-by executing a `select`
+| `get(Class, Object)` | Obtain a detached object, given its type and its id, by executing a `select`
 | `fetch(Object)`      | Fetch an association of a detached object
 | `refresh(Object)`    | Refresh the state of a detached object by executing
 a `select`
-| `insert(Object)`     | Immediately `insert` the state of the given
-transient object into the database
-| `update(Object)`     | Immediately `update` the state of the given detached
-object in the database
-| `delete(Object)`     | Immediately `delete` the state of the given detached
-object from the database
+| `insert(Object)`     | Immediately `insert` the state of the given transient object into the database
+| `update(Object)`     | Immediately `update` the state of the given detached object in the database
+| `delete(Object)`     | Immediately `delete` the state of the given detached object from the database
 |===

 NOTE: There's no `flush()` operation, and so `update()` is always explicit.

-In certain circumstances, this makes stateless sessions easier to work with,
-but with the caveat that a stateless session is much more vulnerable to data
-aliasing effects, since it's easy to get two non-identical Java objects which
-both represent the same row of a database table.
+In certain circumstances, this makes stateless sessions easier to work with, but with the caveat that a stateless session is much more vulnerable to data aliasing effects, since it's easy to get two non-identical Java objects which both represent the same row of a database table.

-IMPORTANT: If you use `fetch()` in a stateless session, you can very easily
-obtain two objects representing the same database row!
+[IMPORTANT]
+====
+If you use `fetch()` in a stateless session, you can very easily obtain two objects representing the same database row!
+====
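
The aliasing hazard is easy to reproduce in a toy model. The sketch below is plain Java with hypothetical names (`AliasingDemo`, `Stateful`, `Stateless`, `readRow()`), and is not Hibernate's implementation; it only contrasts a loader backed by a first-level cache with one that has none.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy comparison of stateful and stateless loading. Hypothetical; not Hibernate's implementation. */
final class AliasingDemo {
    /** Hypothetical entity type. */
    static final class Book {
        final long id;
        String title;
        Book(long id, String title) { this.id = id; this.title = title; }
    }

    /** Stand-in for reading a database row: always builds a fresh object. */
    static Book readRow(long id) { return new Book(id, "Title " + id); }

    /** Stateful: the first-level cache guarantees one instance per identifier. */
    static final class Stateful {
        private final Map<Long, Book> firstLevelCache = new HashMap<>();
        Book get(long id) { return firstLevelCache.computeIfAbsent(id, AliasingDemo::readRow); }
    }

    /** Stateless: no persistence context, so every call returns a new detached object. */
    static final class Stateless {
        Book get(long id) { return readRow(id); }
    }
}
```

Two calls to `Stateless.get()` with the same identifier return distinct objects, so a change made through one is invisible through the other, whereas `Stateful.get()` always returns the same instance.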

-In particular, the absence of a persistence context means that you can safely
-perform bulk-processing tasks without allocating huge quantities of memory.
+In particular, the absence of a persistence context means that you can safely perform bulk-processing tasks without allocating huge quantities of memory.
 Use of a `StatelessSession` alleviates the need to call:

 - `clear()` or `detach()` to perform first-level cache management, and
 - `setCacheMode()` to bypass interaction with the second-level cache.

-TIP: Stateless sessions can be useful, but for bulk operations on huge datasets,
+[TIP]
+====
+Stateless sessions can be useful, but for bulk operations on huge datasets,
 Hibernate can't possibly compete with stored procedures!
+====

-When using a stateless session, you should be aware of the following additional
-limitations:
+When using a stateless session, you should be aware of the following additional limitations:

 - persistence operations never cascade to associated instances,
- changes to `@ManyToMany` associations and ``@ElementCollection``s cannot be made
-persistent, and
+- changes to `@ManyToMany` associations and ``@ElementCollection``s cannot be made persistent, and
 - operations performed via a stateless session bypass callbacks.

 [[optimistic-and-pessimistic-locking]]