HHH-11445 - Improve Infinispan second-level cache documentation

-  Also add a replicated cache example for entity/collection to make it easy for users to switch to that. Since eviction is not recommended for replicated entity caches, it helps to provide a sample configuration.
This commit is contained in:
Galder Zamarreño 2017-02-02 16:07:17 +01:00 committed by Vlad Mihalcea
parent 260b21bd63
commit 8b2a852a92
2 changed files with 251 additions and 43 deletions

View File

@ -643,9 +643,144 @@ http://www.ehcache.org/documentation/2.8/integrations/hibernate#optional[Ehcache
Use of the built-in integration for http://infinispan.org/[Infinispan] requires that the `hibernate-infinispan` module jar (and all of its dependencies) are on the classpath.
====
Infinispan currently supports all cache concurrency modes, although not all combinations of configurations are compatible.
How to configure Infinispan to be the second level cache provider varies slightly depending on the deployment scenario:
Traditionally, the `transactional` and `read-only` strategies were supported on _transactional invalidation_ caches. In version 5.0, further modes have been added:
==== Single Node Local
In standalone library mode, a JPA/Hibernate application runs inside a Java SE application or inside containers that don't offer Infinispan integration.
Enabling the Infinispan second level cache provider inside a JPA/Hibernate application that runs in a single node is very straightforward.
First, make sure the Hibernate Infinispan cache provider (and its dependencies) are available in the classpath, then modify the persistence.xml to include these properties:
====
[source, XML, indent=0]
----
<!-- Use Infinispan second level cache provider -->
<property name="hibernate.cache.region.factory_class"
value="org.hibernate.cache.infinispan.InfinispanRegionFactory"/>
<!-- Optional: Force using local configuration when only using a single node.
Otherwise a clustered configuration is loaded. -->
<property name="hibernate.cache.infinispan.cfg"
value="org/hibernate/cache/infinispan/builder/infinispan-configs-local.xml"/>
----
====
Plugging in Infinispan as second-level cache provider requires at the bare minimum that `hibernate.cache.region.factory_class` is set to an Infinispan region factory implementation.
Normally, this is `org.hibernate.cache.infinispan.InfinispanRegionFactory` but other region factories are possible in alternative scenarios (see <<caching-provider-infinispan-region-factory, Alternative Region Factory>> section for more info).
By default, the Infinispan second-level cache provider uses an Infinispan configuration that's designed for clustered environments.
However, since this section is focused on running Infinispan second-level cache provider in a single node, an Infinispan configuration designed for local environments is recommended.
To enable that configuration, set `hibernate.cache.infinispan.cfg` to `org/hibernate/cache/infinispan/builder/infinispan-configs-local.xml` value.
The next section focuses on analysing how the default local configuration works.
Changing Infinispan configuration options can be done following the instructions in <<caching-provider-infinispan-config, Configuration Properties>> section.
===== Default Local Configuration
Infinispan second-level cache provider comes with a configuration designed for local, single node, environments.
These are the characteristics of such configuration:
* Entities, collections, queries and timestamps are stored in non-transactional local caches.
* Entity, collection and query caches are configured with the following eviction settings.
You can change these settings on a per entity or collection basis, or per individual entity or collection type.
More information can be found in the <<integrations:infinispan-2lc-advanced,Advanced Configuration>> section below.
- Eviction wake up interval is 5 seconds.
- Max number of entries is 10,000.
- Max idle time before expiration is 100 seconds.
- Default eviction algorithm is LRU (least recently used).
* _No eviction/expiration is configured for timestamp caches_, nor is it allowed.
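These defaults can be tuned without editing the Infinispan configuration file, using the property override mechanism described in the <<caching-provider-infinispan-config, Configuration Properties>> section. As a sketch (the values below are illustrative, not recommendations), the default entity eviction/expiration settings could be overridden in persistence.xml:

====
[source, XML, indent=0]
----
<!-- Illustrative values only: shrink the entity cache and shorten idle time -->
<property name="hibernate.cache.infinispan.entity.eviction.max_entries"
          value="5000"/>
<property name="hibernate.cache.infinispan.entity.expiration.max_idle"
          value="60000"/>
----
====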
===== Local Cache Strategies
Before version 5.0, Infinispan only supported `transactional` and `read-only` strategies.
Starting with version 5.0, all cache strategies are supported: `transactional`, `read-write`, `nonstrict-read-write` and `read-only`.
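The strategy itself is picked per entity or collection in the mapping. Assuming a hypothetical `com.example.Person` entity mapped via hbm.xml (the entity name and table are illustrative), selecting a strategy could look like this:

====
[source, XML, indent=0]
----
<hibernate-mapping>
    <class name="com.example.Person" table="person">
        <!-- Any of: read-only, read-write, nonstrict-read-write, transactional -->
        <cache usage="read-write"/>
        <id name="id" type="long"/>
    </class>
</hibernate-mapping>
----
====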
==== Multi Node Cluster
When running a JPA/Hibernate application in a multi-node environment and enabling the Infinispan second-level cache, it is necessary to cluster the second-level cache so that cache consistency can be guaranteed.
Clustering the Infinispan second-level cache provider is as simple as adding the following properties:
====
[source, XML, indent=0]
----
<!-- Use Infinispan second level cache provider -->
<property name="hibernate.cache.region.factory_class"
value="org.hibernate.cache.infinispan.InfinispanRegionFactory"/>
----
====
As with the standalone local mode, at the bare minimum the region factory has to be configured to point to an Infinispan region factory implementation.
However, the default Infinispan configuration used by the second-level cache provider is already configured to work in a cluster environment, so there is no need to add any extra properties.
The next section focuses on analysing how the default cluster configuration works.
Changing Infinispan configuration options can be done following the instructions in <<caching-provider-infinispan-config, Configuration Properties>> section.
[[integrations:infinispan-2lc-cluster-configuration]]
===== Default Cluster Configuration
Infinispan second-level cache provider default configuration is designed for multi-node clustered environments.
The aim of this section is to explain the default settings for each of the different global data type caches (entity, collection, query and timestamps), why these were chosen and what are the available alternatives.
These are the characteristics of such configuration:
* For all entities and collections, whenever a new _entity or collection is read from database_ and needs to be cached, _it's only cached locally_ in order to reduce intra-cluster traffic.
This option can be changed so that entities/collections are cached cluster wide, by switching the entity/collection cache to be replicated or distributed.
How to change this option is explained in the <<caching-provider-infinispan-config, Configuration Properties>> section.
* All _entities and collections are configured to use a synchronous invalidation_ as clustering mode.
This means that when an entity is updated, the updated cache will send a message to the other members of the cluster telling them that the entity has been modified.
Upon receipt of this message, the other nodes will remove this data from their local cache, if it was stored there.
This option can be changed so that both local and remote nodes contain the updates by configuring entities or collections to use a replicated or distributed cache.
With replicated caches all nodes would contain the update, whereas with distributed caches only a subset of the nodes.
How to change this option is explained in the <<caching-provider-infinispan-config, Configuration Properties>> section.
* All _entities and collections have initial state transfer disabled_ since there's no need for it.
* Entities and collections are configured with the following eviction settings.
You can change these settings on a per entity or collection basis or per individual entity or collection type.
More information can be found in the <<caching-provider-infinispan-config, Configuration Properties>> section below.
- Eviction wake up interval is 5 seconds.
- Max number of entries is 10,000.
- Max idle time before expiration is 100 seconds.
- Default eviction algorithm is LRU (least recently used).
* Assuming that query caching has been enabled for the persistence unit (see <<caching-query, query cache section>>), the query cache is configured so that _queries are only cached locally_.
Alternatively, you can configure query caching to use replication by selecting `replicated-query` as the query cache name.
However, replication for the query cache only makes sense if, and only if, all of these conditions are true:
- Performing the query is quite expensive.
- The same query is very likely to be repeatedly executed on different cluster nodes.
- The query is unlikely to be invalidated out of the cache (note: Hibernate must aggressively invalidate query results from the cache any time any instance of one of the entity types targeted by the query is modified; all such query results are invalidated, even if the change made to the specific entity instance would not have affected the query result).
* The _query cache_ uses the _same eviction/expiration settings as entities/collections_.
* The _query cache has initial state transfer disabled_. It is not recommended to enable it.
* The _timestamps cache is configured with asynchronous replication_ as clustering mode.
Local and invalidation cluster modes are not allowed, since all cluster nodes must store all timestamps.
As a result, _no eviction/expiration is allowed for timestamp caches either_.
[IMPORTANT]
====
Asynchronous replication was selected as default for timestamps cache for performance reasons.
A side effect of this choice is that when an entity/collection is updated, for a very brief period of time stale queries might be returned.
It's important to note that, due to how Infinispan deals with asynchronous replication, stale queries might be found even if the query is executed right after an entity/collection update on the same node.
The reason asynchronous replication works this way is that there is a single node that owns a given key, which enables changes to be applied in the same order on all nodes.
Without it, it could happen that an older value could replace a newer value in certain nodes.
====
[NOTE]
====
Hibernate must aggressively invalidate query results from the cache any time any instance of one of the entity types is modified. All cached query results referencing given entity type are invalidated, even if the change made to the specific entity instance would not have affected the query result.
The timestamps cache plays an important role here: it contains the last modification timestamp for each entity type. After a cached query result is loaded, its timestamp is compared to the timestamps of all entity types referenced in the query, and if any of these is higher, the cached query result is discarded and the query is executed against the DB.
====
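The cluster-wide caching switches described above can be made concrete with a sketch of the relevant persistence.xml properties. The `replicated-entity` and `replicated-query` cache names are the ones used by the provider's shipped configuration; verify them against the configuration file actually in use:

====
[source, XML, indent=0]
----
<!-- Cache entities cluster wide instead of only locally -->
<property name="hibernate.cache.infinispan.entity.cfg"
          value="replicated-entity"/>
<!-- Replicate cached query results across the cluster -->
<property name="hibernate.cache.infinispan.query.cfg"
          value="replicated-query"/>
----
====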
===== Cluster Cache Strategies
Before version 5.0, Infinispan only supported `transactional` and `read-only` strategies on top of _transactional invalidation_ caches.
Since version 5.0, Infinispan currently supports all cache concurrency modes in cluster mode, although not all combinations of configurations are compatible:
* _non-transactional invalidation_ caches are also supported with the `read-write` strategy. The actual setting of the cache concurrency mode (`read-write` vs. `transactional`) is not honored; the appropriate strategy is selected based on the cache configuration (_non-transactional_ vs. _transactional_).
* `read-write` mode is supported on _non-transactional distributed/replicated_ caches, however, eviction should not be used in this configuration. Use of eviction can lead to consistency issues. Expiration (with reasonably long max-idle times) can be used.
@ -665,7 +800,7 @@ The available combinations are summarized in table below:
|nonstrict-read-write|non-transactional |distributed/replicated |no
|===
If your second level cache is not clustered, it is possible to use a local cache instead of the clustered caches in all modes as described above.
Changing caches to behave differently from the default behavior explained in the previous section is described in the <<caching-provider-infinispan-config, Configuration Properties>> section.
[[caching-provider-infinispan-stale-read-example]]
.Stale read with `nonstrict-read-write` strategy
@ -684,30 +819,11 @@ Tx3: read A = 1, B = 1 // reads after TX1 commit completes are consistent again
====
[[caching-provider-infinispan-region-factory]]
==== RegionFactory
==== Alternative RegionFactory
The hibernate-infinispan module defines two specific providers: `infinispan` and `infinispan-jndi`.
[[caching-provider-infinispan-region-factory-basic]]
===== `InfinispanRegionFactory`
If Hibernate and Infinispan are running in a standalone environment, the `InfinispanRegionFactory` should be configured as follows:
[[caching-provider-infinispan-region-factory-basic-example]]
.`InfinispanRegionFactory` configuration
====
[source, XML, indent=0]
----
<property
name="hibernate.cache.region.factory_class"
value="org.hibernate.cache.infinispan.InfinispanRegionFactory" />
----
====
[[caching-provider-infinispan-region-factory-jndi]]
===== `JndiInfinispanRegionFactory`
If the Infinispan `CacheManager` is bound to JNDI, then the `JndiInfinispanRegionFactory` should be used as a region factory:
In standalone environments or managed environments with no Infinispan integration, `org.hibernate.cache.infinispan.InfinispanRegionFactory` should be the choice for region factory implementation.
However, it might sometimes be desirable for the Infinispan cache manager to be shared between different JPA/Hibernate applications, for example to share intra-cluster communication channels.
In this case, the Infinispan cache manager could be bound into JNDI and the JPA/Hibernate applications could use an alternative region factory implementation:
[[caching-provider-infinispan-region-factory-jndi-example]]
.`JndiInfinispanRegionFactory` configuration
@ -724,15 +840,36 @@ If the Infinispan `CacheManager` is bound to JNDI, then the `JndiInfinispanRegio
----
====
===== Infinispan in JBoss AS/WildFly
==== Inside WildFly
In WildFly, Infinispan is the default second level cache provider for JPA/Hibernate.
When using JPA in WildFly, region factory is automatically set upon configuring `hibernate.cache.use_second_level_cache=true` (by default second-level cache is not used).
For more information, please consult https://docs.jboss.org/author/display/WFLY9/JPA+Reference+Guide#JPAReferenceGuide-UsingtheInfinispansecondlevelcache[WildFly documentation].
You can find details about its configuration in link:{wildflydocroot}/JPA%20Reference%20Guide[the JPA reference guide], in particular, in the link:{wildflydocroot}/JPA%20Reference%20Guide#JPAReferenceGuide-UsingtheInfinispansecondlevelcache[second level cache] section.
The default second-level cache configurations used by WildFly match the configurations explained above, both for local and clustered environments.
So, an Infinispan-based second-level cache should behave exactly the same standalone and within containers that provide Infinispan as the default second-level cache provider for JPA/Hibernate.
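As a sketch, enabling the second-level cache from within a deployment's persistence.xml is then reduced to a single property, since WildFly sets the region factory automatically:

====
[source, XML, indent=0]
----
<!-- WildFly picks the Infinispan region factory automatically -->
<property name="hibernate.cache.use_second_level_cache" value="true"/>
----
====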
[IMPORTANT]
====
Remember that when deploying to WildFly or JBoss Application Server, the way some Infinispan second-level cache provider configuration is defined changes slightly, because the properties must include deployment and persistence information.
Check the <<caching-provider-infinispan-config,Configuration>> section for more details.
====
[[caching-provider-infinispan-config]]
==== Configuration properties
Hibernate-infinispan module comes with default configuration in `infinispan-config.xml` that is suited for clustered use. If there's only single instance accessing the DB, you can use more performant `infinispan-config-local.xml` by setting the `hibernate.cache.infinispan.cfg` property. If you require further tuning of the cache, you can provide your own configuration. Caches that are not specified in the provided configuration will default to `infinispan-config.xml` (if the provided configuration uses clustering) or `infinispan-config-local.xml`. It is not possible to specify the configuration this way in WildFly.
As explained above, Infinispan second-level cache provider comes with default configuration in `infinispan-config.xml` that is suited for clustered use.
If there's only a single JVM accessing the DB, you can use the more performant `infinispan-config-local.xml` by setting the `hibernate.cache.infinispan.cfg` property.
If you require further tuning of the cache, you can provide your own configuration.
Caches that are not specified in the provided configuration will default to `infinispan-config.xml` (if the provided configuration uses clustering) or `infinispan-config-local.xml`.
[WARNING]
====
It is not possible to specify the configuration this way in WildFly.
Cache configuration changes in WildFly should be done either by modifying the cache configurations inside the application server configuration, or by creating new caches with the desired tweaks and plugging them in accordingly.
See examples below on how entity/collection specific configurations can be applied.
====
[[caching-provider-infinispan-config-example]]
.Use custom Infinispan configuration
@ -781,6 +918,19 @@ For specifying cache template for specific region, use region name instead of th
value="my-entities-some-collection" />
====
.Use custom cache template in Wildfly
When applying entity/collection level changes inside JPA applications deployed in WildFly, it is necessary to specify the deployment name and the persistence unit name:
====
[source, XML, indent=0]
----
<property
name="hibernate.cache.infinispan._war_or_ear_name_._unit_name_.com.example.MyEntity.cfg"
value="my-entities" />
<property
name="hibernate.cache.infinispan._war_or_ear_name_._unit_name_.com.example.MyEntity.someCollection.cfg"
value="my-entities-some-collection" />
----
====
[IMPORTANT]
====
Cache configurations are used only as a template for the cache created for a given region (usually each entity hierarchy or collection has its own region). It is not possible to use the same cache for different regions.
@ -795,6 +945,50 @@ Some options in the cache configuration can also be overridden directly through
`hibernate.cache.infinispan._something_.expiration.wake_up_interval`:: Period of thread checking expired entries.
`hibernate.cache.infinispan.statistics`:: Globally enables/disable Infinispan statistics collection, and their exposure via JMX.
Example:
====
[source, XML, indent=0]
----
<property name="hibernate.cache.infinispan.entity.eviction.strategy"
          value="LRU"/>
<property name="hibernate.cache.infinispan.entity.eviction.wake_up_interval"
          value="2000"/>
<property name="hibernate.cache.infinispan.entity.eviction.max_entries"
          value="5000"/>
<property name="hibernate.cache.infinispan.entity.expiration.lifespan"
          value="60000"/>
<property name="hibernate.cache.infinispan.entity.expiration.max_idle"
          value="30000"/>
----
====
With the above configuration, you're overriding whatever eviction/expiration settings were defined for the default entity cache name in the Infinispan cache configuration used, regardless of whether it's the default one or user defined.
More specifically, we're defining the following:
* All entities to use the LRU eviction strategy.
* The eviction thread to wake up every 2 seconds (2000 milliseconds).
* The maximum number of entries for each entity type to be 5000 entries.
* The lifespan of each entity instance to be 1 minute (60000 milliseconds).
* The maximum idle time for each entity instance to be 30 seconds (30000 milliseconds).
You can also override eviction/expiration settings on a per entity/collection type basis, in such a way that the overridden settings only affect that particular entity (e.g. `com.acme.Person`) or collection type (e.g. `com.acme.Person.addresses`).
Example:
[source,xml]
----
<property name="hibernate.cache.infinispan.com.acme.Person.eviction.strategy"
          value="LIRS"/>
----
Inside WildFly, as with the entity/collection configuration override, eviction/expiration settings also require the deployment name and persistence unit information:
[source,xml]
----
<property name="hibernate.cache.infinispan._war_or_ear_name_._unit_name_.com.acme.Person.eviction.strategy"
          value="LIRS"/>
<property name="hibernate.cache.infinispan._war_or_ear_name_._unit_name_.com.acme.Person.expiration.lifespan"
          value="65000"/>
----
[NOTE]
====
In versions prior to 5.1, `hibernate.cache.infinispan._something_.expiration.wake_up_interval` was called `hibernate.cache.infinispan._something_.eviction.wake_up_interval`.
@ -807,24 +1001,32 @@ The old property still works, but its use is deprecated.
The property `hibernate.cache.infinispan.use_synchronization`, which allowed registering Infinispan as an XA resource in the transaction, was deprecated in 5.0 and is not honored anymore. The Infinispan 2LC must register as a synchronization on transactional caches. Non-transactional cache modes also hook into the current JTA/JDBC transaction as synchronizations.
====
[[caching-provider-infinispan-config-query-timestamps]]
===== Configuring Query and Timestamps caches
[[caching-provider-infinispan-remote]]
==== Remote Infinispan Caching
Lately, several questions (link:http://community.jboss.org/message/575814#575814[here] and link:http://community.jboss.org/message/585841#585841[here]) have appeared in the Infinispan user forums asking whether it's possible to have an Infinispan second level cache that, instead of living in the same JVM as the Hibernate code, resides in a remote server, i.e. an Infinispan Hot Rod server.
It's important to understand that trying to set up second level cache in this way is generally not a good idea for the following reasons:
Since version 5.0 it is possible to configure query caches as _non-transactional_. Consistency guarantees are not changed and writes to the query cache should be faster.
* The purpose of a JPA/Hibernate second level cache is to store entities/collections recently retrieved from database or to maintain results of recent queries.
So, part of the aim of the second level cache is to have data accessible locally, rather than having to go to the database to retrieve it every time it is needed.
Hence, if you decide to set the second level cache to be remote as well, you're losing one of the key advantages of the second level cache: the fact that the cache is local to the code that requires it.
The query cache is configured so that queries are only cached locally. Alternatively, you can configure query caching to use replication by selecting "replicated-query" as the query cache name. However, replication for the query cache only makes sense if, and only if, all of these conditions are true:
* Setting up a remote second level cache can have a negative impact on the overall performance of your application, because it means that cache misses require accessing a remote location to verify whether a particular entity/collection/query is cached.
With a local second level cache however, these misses are resolved locally and so they are much faster to execute than with a remote second level cache.
* Performing the query is quite expensive.
* The same query is very likely to be repeatedly executed on different cluster nodes.
* The query is unlikely to be invalidated out of the cache
There are however some edge cases where it might make sense to have a remote second level cache, for example:
[NOTE]
====
Hibernate must aggressively invalidate query results from the cache any time any instance of one of the entity types is modified. All cached query results referencing given entity type are invalidated, even if the change made to the specific entity instance would not have affected the query result.
The timestamps cache plays an important role here: it contains the last modification timestamp for each entity type. After a cached query result is loaded, its timestamp is compared to the timestamps of all entity types referenced in the query, and if any of these is higher, the cached query result is discarded and the query is executed against the DB.
====
* You are having memory issues in the JVM where the JPA/Hibernate code and the second level cache are running.
Offloading the second level cache to remote Hot Rod servers could be an interesting way to separate systems and allow you to find the culprit of the memory issues more easily.
In the default configuration, the timestamps cache is asynchronously replicated. This means that a cached query on a remote node can provide stale results for a brief time window before the remote timestamps cache is updated. However, requiring a synchronous RPC would result in severe performance degradation.
* Your application layer cannot be clustered but you still want to run multiple application layer nodes.
In this case, you can't have multiple local second level cache instances running, because they would not be able to invalidate each other when, for example, data in the second level cache is updated.
Here, having a remote second level cache could be a way out to make sure your second level cache is always in a consistent state, with all nodes in the application layer pointing to it.
Further, possibly outdated, information can be found in the http://infinispan.org/docs/8.0.x/user_guide/user_guide.html#_using_infinispan_as_jpa_hibernate_second_level_cache_provider[Infinispan documentation].
* Rather than having the second level cache in a remote server, you simply want to keep the cache in a separate JVM, still within the same machine.
In this case you would still have the additional overhead of talking to another JVM, but it wouldn't have the latency of going across a network.
The benefits of doing this are that:
** The cache can be sized separately from the application, since the cache and the application server have very different memory profiles.
One has lots of short lived objects, and the other could have lots of long lived objects.
** The cache and the application server can be pinned to different CPU cores (using _numactl_), and even pinned to different physical memory based on the NUMA nodes.

View File

@ -27,6 +27,12 @@
<expiration max-idle="100000" interval="5000"/>
</invalidation-cache-configuration>
<replicated-cache-configuration name="replicated-entity" mode="SYNC" remote-timeout="20000" statistics="false" statistics-available="false">
<locking concurrency-level="1000" acquire-timeout="15000"/>
<transaction mode="NONE" />
<expiration max-idle="100000" interval="5000"/>
</replicated-cache-configuration>
<!-- A config appropriate for query caching. Does not replicate queries. -->
<local-cache-configuration name="local-query" statistics="false" statistics-available="false">
<locking concurrency-level="1000" acquire-timeout="15000"/>