= Restart Sequence
:idprefix:
:idseparator: -

Apache ActiveMQ Artemis ships with two architectures for providing high availability (HA).
Master and slave brokers can be configured to use either network replication or shared storage.
This document describes restart sequences for the brokers under various circumstances while client applications are connected to them.

== Restarting 1 broker at a time

When restarting the brokers one at a time at regular intervals, no particular sequence is required.
Just make sure that at least one broker in the master/slave pair is live to take over the connections from the client applications.

[NOTE]
====
When restarting brokers while client applications are connected, make sure that at least one broker is always live to serve the connected clients.
====
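
The rule above can be sketched as a small guard in a restart script.
This is only an illustration, not part of Artemis: the `running` map and both function names are hypothetical, and actually stopping and starting a broker is left to whatever tooling you use.

```python
from typing import Dict, Iterable, List


def safe_to_restart(broker: str, running: Dict[str, bool]) -> bool:
    """Return True if at least one other broker stays up while `broker` is down."""
    return any(is_up for name, is_up in running.items() if name != broker)


def rolling_restart(brokers: Iterable[str], running: Dict[str, bool]) -> List[str]:
    """Restart brokers one at a time, refusing any step that would leave none running.

    `running` is a hypothetical view of the pair, e.g. {"master": True, "slave": True}.
    The returned list records the actions in the order they were taken.
    """
    actions = []
    for broker in brokers:
        if not safe_to_restart(broker, running):
            raise RuntimeError(f"refusing to restart {broker}: no other broker is running")
        running[broker] = False  # broker is stopped; its partner serves the clients
        actions.append(f"stop {broker}")
        running[broker] = True   # broker is started again before moving on
        actions.append(f"start {broker}")
    return actions
```

Because each broker is brought back before the next one is touched, the order of `brokers` does not matter, which matches the statement that no particular sequence is required.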

== Completely shutting down the brokers and starting

If you need to shut down all the brokers completely and start them again, follow this procedure:

. Shut down all the slave brokers.
. Shut down all the master brokers.
. Start all the master brokers.
. Start all the slave brokers.

This sequence is particularly important with network replication, for the following reason.
If the master broker is shut down first, the slave broker becomes live and accepts all the client connections.
When the slave broker is then stopped, the clients still treat the slave as their last live connection.
When the slave and master brokers are started again, the clients keep trying to reconnect to that last connection, i.e. the slave, and cannot connect until the client applications are restarted.
Following the sequence above avoids having to restart the client applications.
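
The ordering and the reasoning behind it can be sketched in a few lines.
This is a toy model under stated assumptions, not Artemis code: `full_restart_plan` and `last_live_broker` are hypothetical helpers, and the model covers only a single master/slave pair in which the master is live at the start.

```python
from typing import List, Tuple


def full_restart_plan(masters: List[str], slaves: List[str]) -> List[Tuple[str, str]]:
    """Return (action, broker) steps: stop slaves, stop masters, start masters, start slaves."""
    plan = [("stop", b) for b in slaves]
    plan += [("stop", b) for b in masters]
    plan += [("start", b) for b in masters]
    plan += [("start", b) for b in slaves]
    return plan


def last_live_broker(stop_order: List[str]) -> str:
    """Toy model: which broker was live just before the pair was fully shut down?"""
    running = {"master", "slave"}
    live = "master"  # assumption: the master is live before the shutdown begins
    for broker in stop_order:
        running.discard(broker)
        if broker == live and running:
            # The live broker went down while its partner was still up,
            # so the partner takes over and clients fail over to it.
            live = next(iter(running))
    return live
```

In this model, `last_live_broker(["slave", "master"])` returns `"master"`, so clients come back to the broker that will be live again after the restart, while `last_live_broker(["master", "slave"])` returns `"slave"`, which is the case described above where clients keep retrying the slave.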

== Split-brain situation

The following procedure helps the cluster recover from a split-brain situation and lets the client connections reconnect to the cluster automatically.
With this sequence, client applications do not need to be restarted in order to connect to the brokers.

During a split-brain situation, both the master and slave brokers are live and no replication takes place from the master broker to the slave.

In this situation, some client applications may be connected to the master broker and others to the slave broker.
Suppose the brokers are then restarted and the cluster forms properly again.

The clients that were connected to the master broker during the split brain reconnect to the cluster automatically and resume processing messages.
The clients that were connected to the slave broker, however, keep trying to connect to that broker.
This happens because the slave broker has restarted in backup mode.

As a result, not all the clients reconnect to the brokers and function properly.

To avoid this, follow the sequence below:

. Stop the slave broker.
. Start the slave broker.
Observe the logs for the message "Waiting for the master".
. Stop the master broker.
. Start the master broker.
Observe the master broker logs for "Server is live".
Observe the slave broker logs for "backup announced".
. Stop the master broker again.
Wait until the slave broker becomes live.
Observe that all the clients are connected to the slave broker.
. Start the master broker.
This time, all the connections switch back to the master broker.
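
The recovery sequence above can be written down as data, together with a helper that reports which log messages are still outstanding for a step.
This is a minimal sketch: `RECOVERY_STEPS` and `pending_markers` are hypothetical names, the marker strings are the ones quoted in the steps above (exact wording in your broker version may differ, so matching is case-insensitive), and reading the real log files is left out.

```python
from typing import Dict, List, Tuple

Marker = Tuple[str, str]              # (broker whose log to read, text to look for)
Step = Tuple[str, str, List[Marker]]  # (action, broker, markers to wait for afterwards)

RECOVERY_STEPS: List[Step] = [
    ("stop", "slave", []),
    ("start", "slave", [("slave", "Waiting for the master")]),
    ("stop", "master", []),
    ("start", "master", [("master", "Server is live"), ("slave", "backup announced")]),
    # After this stop, wait until the slave is live and the clients are connected to it.
    # The text gives no log message for that, so no marker is listed here.
    ("stop", "master", []),
    ("start", "master", []),
]


def pending_markers(step: Step, logs: Dict[str, List[str]]) -> List[Marker]:
    """Return the markers of `step` that have not yet appeared in the given log lines."""
    _, _, markers = step
    return [
        (broker, text)
        for broker, text in markers
        if not any(text.lower() in line.lower() for line in logs.get(broker, []))
    ]
```

A script driving the sequence would move on to the next step only once `pending_markers` returns an empty list for the current one.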

[NOTE]
====
During the split-brain situation, messages are produced on the slave broker since it is live.
While the split brain is being resolved, any delta messages that were not produced on the slave broker cannot be recovered automatically.
Retrieving them requires manual intervention, and sometimes recovery is nearly impossible.
The sequence above helps re-form the cluster that was broken by the split brain and lets all client applications reconnect automatically, without needing to be restarted.
====