= High Availability and Failover :idprefix: :idseparator: - We define high availability as the _ability for the system to continue functioning after failure of one or more of the servers_. A part of high availability is _failover_ which we define as the _ability for client connections to migrate from one server to another in event of server failure so client applications can continue to operate_. == Terminology In order to discuss both configuration and runtime behavior consistently we need to define a pair of nouns and adjectives. These terms will be used throughout the documentation, configuration, source code, and runtime logs. === Configuration These nouns identify how the broker is _configured_, e.g. in `broker.xml`. primary:: This identifies the main broker in the high availability configuration. Oftentimes the hardware on this broker will be higher performance than the hardware on the backup broker. Typically, this broker is started before the backup and is active most of the time. backup:: This identifies the broker that should take over when the primary broker fails in a high availability configuration. Oftentimes the hardware on this broker will be lower performance than the hardware on the primary broker. Typically, this broker is started after the primary and is passive most of the time. === Runtime These adjectives describe the _behavior_ of the broker at runtime. For example, you could have a _passive_ primary or an _active_ backup. active:: This identifies a broker in a high-availability configuration which is accepting remote connections. For example, consider the scenario where the primary broker has failed and its backup has taken over. The backup would be described as _active_ at that point since it is accepting remote connections. passive:: This identifies a broker in a high-availability configuration which is **not** accepting remote connections. 
For example, consider the scenario where the primary broker was started and then the backup broker was started.
The backup broker would be _passive_ since it is not accepting remote connections.
It is waiting for the primary to fail before it activates and begins accepting remote connections.

== Primary/Backup Groups

Apache ActiveMQ Artemis allows servers to be linked together as _primary/backup_ groups where each primary server can have 1 or more backup servers.
A backup server is owned by only one primary server.
Backup servers are not operational until failover occurs.
However, one chosen backup, which will be passive, announces its status and waits to take over the primary server's work.

Before failover, only the primary server is active, serving clients while the backup servers remain passive, waiting to become active when the primary fails.
When a primary server crashes or is brought down in the correct mode the backup server currently in passive mode will activate.
If a primary server restarts after a failover then it will be passive, have priority, and be the next server to become active when the current active backup server goes down.
If the active backup server is configured to allow automatic failback then it will detect the primary server coming back up and automatically stop.

=== HA Policies

Apache ActiveMQ Artemis supports two different strategies for backing up a server:

* shared store
* replication

These are configured via the `ha-policy` configuration element, e.g.:

[,xml]
----
----

or

[,xml]
----
----

As well as these 2 strategies there is also a 3rd called `primary-only`.
This means there will be no backup strategy, and it is the default if none is provided.
It is also used to configure `scale-down`, which is covered later in this chapter.

[NOTE]
====
The `ha-policy` configuration replaces any current HA configuration in the root of the `broker.xml` configuration.
All old configuration is now deprecated, although best efforts will be made to honour it if configured this way.
====

[NOTE]
====
Only persistent message data will survive failover.
Any non-persistent message data will not be available after failover.
====

The `ha-policy` type configures which strategy a cluster should use to provide the backing up of a server's data.
Within this configuration element we configure how a server should behave within the cluster, either as a primary (active), backup (passive) or colocated (both active and passive).
This would look something like:

[,xml]
----
----

or

[,xml]
----
----

or

[,xml]
----
----

_Replication_ allows the configuration of two new roles to enable _pluggable quorum_ provider configuration, by using:

[,xml]
----
----

to configure the classic _primary_ role, and

[,xml]
----
----

for the classic _backup_ one.

If _replication_ is configured using these new roles, some additional elements are required to complete the configuration, as detailed later.

=== IMPORTANT NOTE ON PLUGGABLE QUORUM VOTE FEATURE

This feature is still *EXPERIMENTAL*.
Extra testing should be done before running this feature in production.
Please report any issues you find to the ActiveMQ Artemis mailing lists.

It means:

* its configuration can change until declared as *officially stable*

=== Data Replication

When using replication, the primary and the backup servers do not share the same data directories; all data synchronization is done over the network.
Therefore all (persistent) data received by the primary server will be duplicated to the backup.

Notice that upon start-up the backup server will first need to synchronize all existing data from the primary server before becoming capable of replacing the primary server should it fail.
So unlike when using shared storage, a replicating backup will not be a fully operational backup right after start-up, but only after it finishes synchronizing the data with its primary server.
The time it will take for this to happen will depend on the amount of data to be synchronized and the connection speed. [NOTE] ==== In general, synchronization occurs in parallel with current network traffic so this won't cause any blocking on current clients. However, there is a critical moment at the end of this process where the replicating server must complete the synchronization and ensure the replica acknowledges this completion. This exchange between the replicating server and replica will block any journal related operations. The maximum length of time that this exchange will block is controlled by the `initial-replication-sync-timeout` configuration element. ==== Replication will create a copy of the data at the backup. One issue to be aware of is: in case of a successful fail-over, the backup's data will be newer than the primary's data. If you configure your backup to allow failback to the primary then when the primary is restarted it will be passive and the active backup will synchronize its data with the passive primary before stopping to allow the passive primary to become active again. If both servers are shutdown then the administrator will have to determine which one has the latest data. The replicating primary and backup pair must be part of a cluster. The Cluster Connection also defines how backup servers will find the remote primary servers to pair with. Refer to xref:clusters.adoc#clusters[Clusters] for details on how this is done, and how to configure a cluster connection. Notice that: * Both primary and backup servers must be part of the same cluster. Notice that even a simple primary/backup replicating pair will require a cluster configuration. * Their cluster user and password must match. Within a cluster, there are two ways that a backup server will locate a primary server to replicate from. These are: specifying a node group:: You can specify a group of primary servers that a backup server can connect to. 
This is done by configuring `group-name` in either the `primary` or the `backup` element of the `broker.xml`.
A backup will only connect to a primary that shares the same node group name.
connecting to any primary::
This will be the behaviour if `group-name` is not configured, allowing a backup server to connect to any primary server.

[NOTE]
====
A `group-name` example: suppose you have 5 primary servers and 6 backup servers:

* `primary1`, `primary2`, `primary3`: with `group-name=fish`
* `primary4`, `primary5`: with `group-name=bird`
* `backup1`, `backup2`, `backup3`, `backup4`: with `group-name=fish`
* `backup5`, `backup6`: with `group-name=bird`

After joining the cluster the backups with `group-name=fish` will search for primary servers with `group-name=fish` to pair with.
Since there is one backup too many, the `fish` group will remain with one spare backup.

The 2 backups with `group-name=bird` (`backup5` and `backup6`) will pair with primary servers `primary4` and `primary5`.
====

The backup will search for any primary server that it is configured to connect to.
It then tries to replicate with each primary server in turn until it finds a primary server that has no current backup configured.
If no primary server is available it will wait until the cluster topology changes and repeat the process.

[NOTE]
====
This is an important distinction from a shared-store backup: if a shared-store backup starts and does not find a primary server, the server will just activate and start to serve client requests.
In the replication case, the backup just keeps waiting for a primary server to pair with.
Note that in replication the backup server does not know whether any data it might have is up to date, so it really cannot decide to activate automatically.
To activate a replicating backup server using the data it has, the administrator must change its configuration to make it a primary server by changing `backup` to `primary`.
====

Much like in the shared-store case, when the primary server stops or crashes, its backup will become active and take over its duties.
Specifically, the backup will become active when it loses connection to its primary server.
This can be problematic because it can also happen as the result of a temporary network problem.

The issue can be solved in two different ways, depending on which replication roles are configured:

* *non-pluggable replication*: the backup will try to determine whether it can still connect to the other servers in the cluster.
If it can connect to more than half the servers, it will become active.
If more than half the servers also disappeared with the primary, the backup will wait and try reconnecting with the primary.
This avoids a split brain situation.
* *pluggable replication*: the backup relies on a pluggable quorum provider (configurable via the `manager` xml element) to detect if there's any active primary.

[NOTE]
====
With *pluggable replication* a backup still needs a carefully configured xref:connection-ttl.adoc#detecting-dead-connections[connection-ttl] so that it promptly sends a request to the quorum manager to become active before failing over.
====

==== Configuration

To configure a non-pluggable replication's primary and backup servers to be a replicating pair, configure the primary server in `broker.xml` to have:

[,xml]
----
... ...
----

The backup server must be similarly configured but as a `backup`:

[,xml]
----
----

To configure a pluggable quorum replication's primary and backup use:

[,xml]
----
... ...
----

and

[,xml]
----
----

==== All Replication Configuration

===== Primary

The following lists all the `ha-policy` configuration elements for HA strategy Replication for `primary`:

check-for-active-server::
Whether to check the cluster for an active server using our own server ID when starting up.
This is an important option to avoid split-brain when failover happens and the primary is restarted.
Default is `false`.
cluster-name:: Name of the cluster configuration to use for replication. This setting is only necessary if you configure multiple cluster connections. If configured then the connector configuration of the cluster configuration with this name will be used when connecting to the cluster to discover if an active server is already running, see `check-for-active-server`. If unset then the default cluster connections configuration is used (the first one configured). group-name:: If set, backup servers will only pair with primary servers with matching `group-name`. initial-replication-sync-timeout:: The amount of time the replicating server will wait at the completion of the initial replication process for the replica to acknowledge it has received all the necessary data. The default is 30,000 milliseconds. + NOTE: during this interval any journal related operations will be blocked. ===== Backup The following table lists all the `ha-policy` configuration elements for HA strategy Replication for `backup`: cluster-name:: Name of the cluster configuration to use for replication. This setting is only necessary if you configure multiple cluster connections. If configured then the connector configuration of the cluster configuration with this name will be used when connecting to the cluster to discover if an active server is already running, see `check-for-active-server`. If unset then the default cluster connections configuration is used (the first one configured). group-name:: If set, backup servers will only pair with primary servers with matching group-name max-saved-replicated-journals-size:: This option specifies how many replication backup directories will be kept when server starts as replica. Every time when server starts as replica all former data moves to 'oldreplica.\{id}' directory, where id is growing backup index, this parameter sets the maximum number of such directories kept on disk. 
allow-failback::
Whether a server will automatically stop when another places a request to take over its place.
The use case is when the backup has failed over.
initial-replication-sync-timeout::
After failover and the backup has become active, this is set on the new active server.
It represents the amount of time the replicating server will wait at the completion of the initial replication process for the replica to acknowledge it has received all the necessary data.
The default is 30,000 milliseconds.
+
NOTE: During this interval any journal related operations will be blocked.

==== Pluggable Quorum Vote Replication configurations

Pluggable Quorum Vote replication configuration options are a bit different from classic replication, mostly because of its customizable nature.

https://curator.apache.org/[Apache Curator] is used by the default quorum provider.

Below are some example configurations to show how it works.

For `primary`:

[,xml]
----
org.apache.activemq.artemis.quorum.zookeeper.CuratorDistributedPrimitiveManager
----

And `backup`:

[,xml]
----
org.apache.activemq.artemis.quorum.zookeeper.CuratorDistributedPrimitiveManager true
----

The configuration of `class-name` as follows

[,xml]
----
org.apache.activemq.artemis.quorum.zookeeper.CuratorDistributedPrimitiveManager
----

isn't really needed, because Apache Curator is the default provider, but has been shown for completeness.

The `properties` element:

[,xml]
----
----

can specify a list of `property` elements in the form of key-value pairs, appropriate to what is supported by the specified `class-name` provider.
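
As a rough sketch of the key-value form described above, a `properties` element carrying a ZooKeeper connect string might look like the following.
The `key` and `value` attribute names, the placement inside `manager`, and the host names are assumptions made for illustration; check them against the schema shipped with the broker version in use.

[,xml]
----
<!-- Sketch only: attribute names and host names are assumed, not taken from this chapter -->
<manager>
   <properties>
      <!-- connect-string has no default, so it must always be supplied -->
      <property key="connect-string" value="zk1.example.org:2181,zk2.example.org:2181,zk3.example.org:2181"/>
      <!-- session-ms is shown at its default; higher values mean slower failover -->
      <property key="session-ms" value="18000"/>
   </properties>
</manager>
----

Three ZooKeeper addresses are used here because network isolation protection needs at least 3 nodes.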
Apache Curator's provider allows the following properties:

* https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#connectString(java.lang.String)[`connect-string`]: (no default)
* https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#sessionTimeoutMs(int)[`session-ms`]: (default is 18000 ms)
* https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#simulatedSessionExpirationPercent(int)[`session-percent`]: (default is 33); should be \<= default, see https://cwiki.apache.org/confluence/display/CURATOR/TN14 for more info
* https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#connectionTimeoutMs(int)[`connection-ms`]: (default is 8000 ms)
* https://curator.apache.org/apidocs/org/apache/curator/retry/RetryNTimes.html#%3Cinit%3E(int,int)[`retries`]: (default is 1)
* https://curator.apache.org/apidocs/org/apache/curator/retry/RetryNTimes.html#%3Cinit%3E(int,int)[`retries-ms`]: (default is 1000 ms)
* https://curator.apache.org/apidocs/org/apache/curator/framework/CuratorFrameworkFactory.Builder.html#namespace(java.lang.String)[`namespace`]: (no default)

Configuration of the https://zookeeper.apache.org/[Apache ZooKeeper] ensemble is the responsibility of the user, but there are a few *suggestions to improve the reliability of the quorum service*:

* broker `session-ms` must be `>= 2 * server tick time` and `+<= 20 * server tick time+` as per the https://zookeeper.apache.org/doc/r3.6.3/zookeeperAdmin.html[ZooKeeper 3.6.3 admin guide]: it directly impacts how fast a backup can fail over from an isolated/killed/unresponsive active broker; the higher, the slower.
* GC on the broker machine should allow keeping GC pauses within 1/3 of `session-ms` in order to let the ZooKeeper heartbeat protocol work reliably.
If that is not possible, it is better to increase `session-ms`, accepting a slower failover.
* ZooKeeper must have enough resources to keep GC (and OS) pauses much smaller than server tick time: please consider carefully if broker and ZooKeeper node should share the same physical machine, depending on the expected load of the broker
* network isolation protection requires configuring >=3 ZooKeeper nodes

.Important Notes on pluggable quorum replication configuration
****
There are some classic replication configuration options which are no longer needed, i.e.:

* `vote-on-replication-failure`
* `quorum-vote-wait`
* `vote-retries`
* `vote-retries-wait`
* `check-for-active-server`

*Regarding replication configuration with the https://curator.apache.org/[Apache Curator] quorum provider...*

As noted previously, `session-ms` affects the failover duration.
The passive broker can activate after `session-ms` expires, or immediately if the active broker voluntarily gives up its role, e.g. during a fail-back or a manual broker stop.

For the former case (session expiration with active broker no longer present), the passive broker can detect an unresponsive active broker by using:

. cluster connection PINGs (affected by xref:connection-ttl.adoc#detecting-dead-connections[connection-ttl] tuning)
. closed TCP connection notification (depends on TCP configuration and networking stack/topology)

The suggestion is to tune `connection-ttl` low enough to attempt failover as soon as possible, while taking into consideration that the whole fail-over duration cannot last less than the configured `session-ms`.
****

===== Peer or Multi Primary

With coordination delegated to the quorum service, roles are less important.
It is possible to have two peer servers compete for activation; the winner becomes active and the loser takes up a backup role.
On restart, 'any' peer server with the most up to date journal can activate.
The instances need to know in advance what identity they will coordinate on.
In the replication `primary` ha policy we can explicitly set the `coordination-id` to a common value for all peers in a cluster.

For `multi primary`:

[,xml]
----
org.apache.activemq.artemis.quorum.zookeeper.CuratorDistributedPrimitiveManager peer-journal-001
----

NOTE: The string value provided will be converted internally into a 16 byte UUID, so it may not be immediately recognisable or human-readable, however it will ensure that all 'peers' coordinate.

=== Shared Store

When using a shared store both primary and backup servers share the _same_ entire data directory using a shared file system.
This means the paging directory, journal directory, large messages and binding journal.

When failover occurs and a backup server takes over, it will load the persistent storage from the shared file system and clients can connect to it.

This style of high availability differs from data replication in that it requires a shared file system which is accessible by both the primary and backup nodes.
Typically this will be some kind of high performance Storage Area Network (SAN).
We do not recommend using Network Attached Storage (NAS), e.g. NFS mounts, to store any shared journal (NFS is slow).

The advantage of shared-store high availability is that no replication occurs between the primary and backup nodes.
This means it does not suffer any performance penalties due to the overhead of replication during normal operation.

The disadvantage of shared-store high availability is that it requires a shared file system, and when the backup server activates it needs to load the journal from the shared store which can take some time depending on the amount of data in the store.

If you require the highest performance during normal operation then acquire access to a fast SAN and deal with a slightly slower failover (depending on amount of data).
image::images/ha-shared-store.png[]

==== Configuration

To configure the primary and backup servers to share their store use the `ha-policy` configuration in `broker.xml`:

[,xml]
----
... ...
----

The backup server must also be configured as a backup.

[,xml]
----
----

In order for primary/backup groups to operate properly with a shared store, both servers must configure the location of the journal directory to point to the _same shared location_ (as explained in xref:persistence.adoc#persistence[Configuring the message journal]).

Also each node, primary and backups, will need to have a cluster connection defined even if not part of a cluster.
The cluster connection defines how backup servers announce their presence to their primary server or any other nodes in the cluster.
Refer to xref:clusters.adoc#clusters[Clusters] for details on how this is done.

=== Failing Back to Primary Server

After a primary server has failed and a backup has taken over its duties, you may want to restart the primary server and have clients fail back.

==== Shared Store

In case of "shared disk" you have a couple of options:

. Simply restart the primary and kill the backup.
You can do this by killing the process itself.
. Alternatively you can set `allow-failback` to `true` on the backup which will force the backup that has become active to automatically stop.
This configuration would look like: + [,xml] ---- true ---- It is also possible, in the case of shared store, to cause failover to occur on normal server shutdown, to enable this set the following property to true in the `ha-policy` configuration on either the `primary` or `backup` like so: [,xml] ---- true ---- By default this is set to false, if by some chance you have set this to false but still want to stop the server normally and cause failover then you can do this by using the management API as explained at xref:management.adoc#management[Management] You can also force the active backup to shutdown when the primary comes back up allowing the primary to take over automatically by setting the following property in the `broker.xml` configuration file as follows: [,xml] ---- true ---- ==== Replication As with shared storage the `allow-failback` option can be set for both non-pluggable and pluggable replication. ===== Non-Pluggable [,xml] ---- true ---- With non-pluggable replication you need to set an extra property `check-for-active-server` to `true` in the `primary` configuration. If set to `true` then during start-up the primary server will first search the cluster for another active server using its nodeID. If it finds one it will contact this server and try to "fail-back". Since this is a remote replication scenario the primary will have to synchronize its data with the backup server running with its ID. Once they are in sync it will request the other server (which it assumes it is a backup that has assumed its duties) to shutdown in order for it to take over. This is necessary because otherwise the primary server has no means to know whether there was a fail-over or not, and if there was, if the server that took its duties is still running or not. 
Configure this option in your `broker.xml` configuration file as follows:

[,xml]
----
true
----

[WARNING]
.For Non-Pluggable Replication
====
Be aware that if you restart a primary server after failover has occurred then `check-for-active-server` must be set to `true`.
If not, the primary server will restart and serve the same messages that the backup has already handled, causing duplicates.
====

===== Pluggable

One key difference between pluggable replication and non-pluggable replication is that with non-pluggable replication if the primary cannot reach any active server with its nodeID then it activates unilaterally.
With pluggable replication the responsibilities of coordination are delegated to the quorum provider.
There are no unilateral decisions.
The primary will only activate when it knows that it has the most up to date version of the journal identified by its nodeID.

In short: *a primary cannot become active without consensus when using pluggable replication*.

Here's an example configuration:

[,xml]
----
----

==== All Shared Store Configuration

===== Primary

The following lists all the `ha-policy` configuration elements for HA strategy shared store for `primary`:

failover-on-shutdown::
If set to `true` then when this server is stopped normally the backup will become active assuming failover.
If `false` then the backup server will remain passive.
Note that if `false` and you want failover to occur then you can use the management API as explained at xref:management.adoc#management[Management].
wait-for-activation::
If set to `true` then server startup will wait until it is activated.
If set to `false` then server startup will be done in the background.
Default is `true`.
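
The two `primary` options above can be combined in one policy.
The sketch below assumes the usual nesting of `primary` inside `shared-store` inside `ha-policy`; the values are illustrative rather than recommended.

[,xml]
----
<!-- Sketch only: element nesting is assumed from the structure described in this chapter -->
<ha-policy>
   <shared-store>
      <primary>
         <!-- a normal stop of this server also lets the backup become active -->
         <failover-on-shutdown>true</failover-on-shutdown>
         <!-- server startup waits until this server has activated (the default) -->
         <wait-for-activation>true</wait-for-activation>
      </primary>
   </shared-store>
</ha-policy>
----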
===== Backup The following lists all the `ha-policy` configuration elements for HA strategy Shared Store for `backup`: failover-on-shutdown:: In the case of a backup that has become active then when set to `true` and this server is stopped normally the passive primary will become active assuming failover. If `false` then the primary server will remain passive. Note that if `false` and you want failover to occur then you can use the management API as explained at xref:management.adoc#management[Management]. allow-failback:: Whether a server will automatically stop when another places a request to take over its place. The use case is when the backup has failed over. ==== Colocated Backup Servers It is also possible when running standalone to colocate backup servers in the same JVM as another primary server. Primary Servers can be configured to request another primary server in the cluster to start a backup server in the same JVM either using shared store or replication. The new backup server will inherit its configuration from the primary server creating it apart from its name, which will be set to `colocated_backup_n` where n is the number of backups the server has created, and any directories and its Connectors and Acceptors which are discussed later on in this chapter. A primary server can also be configured to allow requests from backups and also how many backups a primary server can start. This way you can evenly distribute backups around the cluster. This is configured via the `ha-policy` element in the `broker.xml` file like so: [,xml] ---- true 1 -1 5000 ---- the above example is configured to use replication, in this case the `primary` and `backup` configurations must match those for normal replication as in the previous chapter. 
`shared-store` is also supported.

image::images/ha-colocated.png[ActiveMQ Artemis ha-colocated.png]

==== Configuring Connectors and Acceptors

If the HA Policy is `colocated` then `connectors` and `acceptors` will be inherited from the primary server creating it and offset depending on the setting of the `backup-port-offset` configuration element.
If this is set to, say, 100 (which is the default) and a connector is using port 61616 then this will be set to 61716 for the first server created, 61816 for the second, and so on.

[NOTE]
====
For INVM connectors and acceptors the id will have `colocated_backup_n` appended, where n is the backup server number.
====

==== Remote Connectors

It may be that some of the connectors configured are for external servers and hence should be excluded from the offset, for instance a connector used by the cluster connection to do quorum voting for a replicated backup server.
These can be omitted from being offset by adding them to the `ha-policy` configuration like so:

[,xml]
----
... remote-connector ...
----

==== Configuring Directories

Directories for the journal, large messages and paging will be set according to what the HA strategy is.
If shared store is configured, the requesting server will notify the target server of which directories to use.
If replication is configured then directories will be inherited from the creating server but have the new backup's name appended.

The following lists all the `ha-policy` configuration elements for the colocated policy:

request-backup::
If `true` then the server will request a backup on another node.
backup-request-retries::
How many times the primary server will try to request a backup, `-1` means forever.
backup-request-retry-interval::
How long to wait for retries between attempts to request a backup server.
max-backups::
How many backups a primary server can create.
backup-port-offset::
The offset to use for the Connectors and Acceptors when creating a new backup server.
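
Putting the elements above together, a colocated replication policy might be sketched as follows.
The nesting of `colocated` inside `replication` and the empty `primary` and `backup` children are assumptions based on the structure described earlier in this chapter; the values mirror the earlier colocated example, with the default port offset written out explicitly.

[,xml]
----
<!-- Sketch only: element nesting is assumed, values are illustrative -->
<ha-policy>
   <replication>
      <colocated>
         <request-backup>true</request-backup>
         <max-backups>1</max-backups>
         <!-- -1 means keep requesting a backup forever -->
         <backup-request-retries>-1</backup-request-retries>
         <backup-request-retry-interval>5000</backup-request-retry-interval>
         <!-- a connector on port 61616 becomes 61716 on the first colocated backup -->
         <backup-port-offset>100</backup-port-offset>
         <primary/>
         <backup/>
      </colocated>
   </replication>
</ha-policy>
----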
=== Scaling Down

An alternative to using primary/backup groups is to configure _scaledown_.
When configured for scale down a server can copy all its messages and transaction state to another active server.
The advantage of this is that you don't need full backups to provide some form of HA.
However, there are disadvantages with this approach, the first being that it only deals with a server being stopped and not a server crash.
The caveat here is if you configure a backup to scale down.

Another disadvantage is that it is possible to lose message ordering.
This happens in the following scenario: say you have 2 active servers and messages are distributed evenly between the servers from a single producer.
If one of the servers scales down then the messages sent back to the other server will be in the queue after the ones already there.
So server 1 could have messages 1,3,5,7,9 and server 2 would have 2,4,6,8,10; if server 2 scales down the order in server 1 would be 1,3,5,7,9,2,4,6,8,10.

image::images/ha-scaledown.png[ActiveMQ Artemis ha-scaledown.png]

The configuration for an active server to scale down would be something like:

[,xml]
----
server1-connector
----

In this instance the server is configured to use a specific connector to scale down.
If a connector is not specified then the first INVM connector is chosen; this is to make scale down from a backup server easy to configure.
It is also possible to use discovery to scale down. This would look like:

[,xml]
----
----

==== Scale Down with groups

It is also possible to configure servers to only scale down to servers that belong in the same group.
This is done by configuring the group like so:

[,xml]
----
... my-group
----

In this scenario only servers that belong to the group `my-group` will be scaled down to.

==== Scale Down and Backups

It is also possible to mix scale down with HA via backup servers.
If a backup is configured to scale down then after failover has occurred, instead of starting fully the backup server will immediately scale down to another active server. The most appropriate configuration for this is using the `colocated` approach. It means that as you bring up primary servers they will automatically be backed up, and as they are shutdown their messages are made available on another active server. A typical configuration would look like: [,xml] ---- 44 33 3 false 33 purple true abcdefg tiddles 22 33rrrrr false boo! ---- ==== Scale Down and Clients When a server is stopping and preparing to scale down it will send a message to all its clients informing them which server it is scaling down to before disconnecting them. At this point the client will reconnect however this will only succeed once the server has completed the scaledown process. This is to ensure that any state such as queues or transactions are there for the client when it reconnects. The normal reconnect settings apply when the client is reconnecting so these should be high enough to deal with the time needed to scale down. == Client Failover Apache ActiveMQ Artemis clients can be configured to receive knowledge of all primary and backup servers, so that in event of connection failure the client will detect this and reconnect to the backup server. The backup server will then automatically recreate any sessions and consumers that existed on each connection before failover, thus saving the user from having to hand-code manual reconnection logic. For further details see xref:client-failover.adoc#core-client-failover[Client Failover] .A Note on Server Replication **** Apache ActiveMQ Artemis does not replicate full server state between active and passive servers. When the new session is automatically recreated on the backup it won't have any knowledge of messages already sent or acknowledged in that session. 
Any in-flight sends or acknowledgements at the time of failover might also be lost.

By replicating full server state, theoretically we could provide a 100% transparent seamless failover, which would avoid any lost messages or acknowledgements. However, this comes at a great cost: replicating the full server state (including the queues, sessions, etc.).
This would require replication of the entire server state machine; every operation on the primary server would have to be replicated on the replica server(s) in the exact same global order to ensure a consistent replica state.
This is extremely hard to do in a performant and scalable way, especially when one considers that multiple threads are changing the active server's state concurrently.

It is possible to provide full state machine replication using techniques such as _virtual synchrony_, but this does not scale well and effectively serializes all operations to a single thread, dramatically reducing concurrency.
Other techniques for multi-threaded active replication exist, such as replicating lock states or replicating thread scheduling, but this is very hard to achieve at a Java level.

Consequently it has been decided that it is not worth massively reducing performance and concurrency for the sake of 100% transparent failover.
Even without 100% transparent failover, it is simple to guarantee _once and only once_ delivery, even in the case of failure, by using a combination of duplicate detection and retrying of transactions.
However this is not 100% transparent to the client code.
****

=== Handling Blocking Calls During Failover

If the client code is in a blocking call to the server, waiting for a response to continue its execution, when failover occurs, the new session will not have any knowledge of the call that was in progress.
This call might otherwise hang forever, waiting for a response that will never come.
To prevent this, Apache ActiveMQ Artemis will unblock any blocking calls that were in progress at the time of failover by making them throw a `javax.jms.JMSException` (if using JMS), or an `ActiveMQException` with error code `ActiveMQException.UNBLOCKED` (if using the core API).
It is up to the client code to catch this exception and retry any operations if desired.

If the method being unblocked is a call to `commit()` or `prepare()`, then the transaction will be automatically rolled back and Apache ActiveMQ Artemis will throw a `javax.jms.TransactionRolledBackException` (if using JMS), or an `ActiveMQException` with error code `ActiveMQException.TRANSACTION_ROLLED_BACK` (if using the core API).

=== Handling Failover With Transactions

If the session is transactional and messages have already been sent or acknowledged in the current transaction, then the server cannot be sure that messages sent or acknowledgements have not been lost during the failover.

Consequently the transaction will be marked as rollback-only, and any subsequent attempt to commit it will throw a `javax.jms.TransactionRolledBackException` (if using JMS), or an `ActiveMQException` with error code `ActiveMQException.TRANSACTION_ROLLED_BACK` (if using the core API).

[WARNING]
====
The caveat to this rule is when XA is used either via JMS or through the core API.
If 2 phase commit is used and prepare has already been called then rolling back could cause a `HeuristicMixedException`.
Because of this the commit will throw an `XAException.XA_RETRY` exception.
This informs the Transaction Manager that it should retry the commit at some later point in time. A side effect of this is that any non-persistent messages will be lost.
To avoid this, use persistent messages when using XA.
With acknowledgements this is not an issue since they are flushed to the server before prepare gets called.
====

It is up to the user to catch the exception, and perform any client side local rollback code as necessary.
There is no need to manually roll back the session - it is already rolled back.
The user can then just retry the transactional operations again on the same session.

Apache ActiveMQ Artemis ships with a fully functioning example demonstrating how to do this; please see xref:examples.adoc#examples[the examples] chapter.

If failover occurs when a commit call is being executed, the server, as previously described, will unblock the call to prevent a hang, since no response will come back.
In this case it is not easy for the client to determine whether the transaction commit was actually processed before failure occurred.

[NOTE]
====
If XA is being used either via JMS or through the core API then an `XAException.XA_RETRY` is thrown.
This is to inform Transaction Managers that a retry should occur at some point.
At some later point in time the Transaction Manager will retry the commit.
If the original commit has not occurred then it will still exist and be committed. If it does not exist then it is assumed to have been committed, although the transaction manager may log a warning.
====

To remedy this, the client can simply enable duplicate detection (xref:duplicate-detection.adoc#duplicate-message-detection[Duplicate Message Detection]) in the transaction, and retry the transaction operations again after the call is unblocked.
If the transaction had indeed been committed successfully before failover, then when the transaction is retried, duplicate detection will ensure that any durable messages resent in the transaction will be ignored on the server to prevent them getting sent more than once.

[NOTE]
====
By catching the rollback exceptions and retrying, catching unblocked calls and enabling duplicate detection, _once and only once_ delivery guarantees can be provided for messages in the case of failure, guaranteeing 100% no loss or duplication of messages.
====

==== Handling Failover With Non Transactional Sessions

If the session is non transactional, messages or acknowledgements can be lost in the event of a failover.

If you wish to provide _once and only once_ delivery guarantees for non transacted sessions too, enable duplicate detection, and catch unblock exceptions as described in xref:ha.adoc#handling-blocking-calls-during-failover[Handling Blocking Calls During Failover].

==== Use client connectors to fail over

Apache ActiveMQ Artemis clients retrieve the backup connector from the topology updates that the cluster brokers send.
If the connection options of the clients don't match the options of the cluster brokers, the clients can define a client connector that will be used in place of the connector in the topology.
To define a client connector it must have a name that matches the name of the connector defined in the `cluster-connection` of the broker. For example, given a primary broker whose cluster connector is named `node-0` and a backup broker whose cluster connector is named `node-1`, the client connection URL must define 2 connectors with the names `node-0` and `node-1`:

Primary broker config:

[,xml]
----
tcp://localhost:61616 ... node-0 ...
----

Backup broker config:

[,xml]
----
tcp://localhost:61617 node-1 ...
----

Client connection URL:

----
(tcp://localhost:61616?name=node-0,tcp://localhost:61617?name=node-1)?ha=true&reconnectAttempts=-1
----

=== Getting Notified of Connection Failure

JMS provides a standard mechanism for getting notified asynchronously of connection failure: `javax.jms.ExceptionListener`.
Please consult the JMS javadoc or any good JMS tutorial for more information on how to use this.

The Apache ActiveMQ Artemis core API also provides a similar feature in the form of the class `org.apache.activemq.artemis.api.core.client.SessionFailureListener`.

Any `ExceptionListener` or `SessionFailureListener` instance will always be called by Apache ActiveMQ Artemis in the event of connection failure, *irrespective* of whether the connection was successfully failed over, reconnected or reattached. However, you can find out if reconnect or reattach has happened either by checking the `failedOver` flag passed to the `connectionFailed` method of `SessionFailureListener` or by inspecting the error code on the `javax.jms.JMSException`, which will be one of the following:

JMSException error codes:

FAILOVER:: Failover has occurred and we have successfully reattached or reconnected.
DISCONNECT:: No failover has occurred and we are disconnected.

=== Application-Level Failover

In some cases you may not want automatic client failover, and prefer to handle any connection failure yourself, coding your own manual reconnection logic in your own failure handler.
We define this as _application-level_ failover, since the failover is handled at the user application level.

To implement application-level failover, if you're using JMS then you need to set an `ExceptionListener` on the JMS connection.
The `ExceptionListener` will be called by Apache ActiveMQ Artemis in the event that connection failure is detected.
In your `ExceptionListener`, you would close your old JMS connections, potentially look up new connection factory instances from JNDI and create new connections.

For a working example of application-level failover, please see xref:examples.adoc#application-layer-failover[the Application-Layer Failover Example].

If you are using the core API, then the procedure is very similar: you would set a `FailureListener` on the core `ClientSession` instances.
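
The reconnection logic that such a failure handler runs can be sketched independently of JMS.
The following is a minimal, illustrative sketch that uses only the Java standard library. `ReconnectSketch`, `reconnect` and the `connector` function are hypothetical names chosen for this illustration and are not part of the Apache ActiveMQ Artemis or JMS APIs. In a real handler, the `connector` function is where you would create a new connection from a connection factory.

[,java]
----
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: NOT part of the Artemis or JMS APIs.
final class ReconnectSketch {

    private ReconnectSketch() {
    }

    /**
     * Tries each candidate URL in list order, making up to maxPasses passes
     * over the list, and returns the first connection that opens.
     *
     * @param urls        candidate broker URLs, most preferred first
     * @param connector   opens a connection for a URL and returns it (non-null),
     *                    or throws a RuntimeException if the broker is unreachable
     * @param maxPasses   how many times to walk the whole list before giving up
     * @param pauseMillis pause between passes, in milliseconds
     * @return the new connection, or empty if every attempt failed
     */
    static <C> Optional<C> reconnect(List<String> urls, Function<String, C> connector,
                                     int maxPasses, long pauseMillis) {
        for (int pass = 0; pass < maxPasses; pass++) {
            for (String url : urls) {
                try {
                    return Optional.of(connector.apply(url));
                } catch (RuntimeException e) {
                    // This candidate is unreachable; move on to the next one.
                }
            }
            if (pass < maxPasses - 1) {
                try {
                    Thread.sleep(pauseMillis); // wait before the next pass
                } catch (InterruptedException e) {
                    // Preserve the interrupt status and stop retrying.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
----

Trying the candidates in list order lets the application prefer one broker over another, and the pause between passes gives a restarting broker time to come back up. An empty result tells the caller that application-level failover did not succeed within the configured number of passes.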