activemq-artemis

History

jbertram ef5a9809f2 ARTEMIS-256 orchestrate failback deterministically

The failback process needs to be deterministic rather than relying on various
incarnations of Thread.sleep() at crucial points. Important aspects of this
change include:

1) Make the initial replication synchronization process block at the very
last step and wait for a response from the replica confirming that it has all
the necessary data. This is a critical piece of knowledge during the failback
process because it lets the soon-to-be-backup server know for certain when it
can shut itself down and allow the soon-to-be-live server to take over. Also,
introduce a new configuration element called
"initial-replication-sync-timeout" to control how long this blocking lasts.

2) Set the server's state to 'LIVE' only after the server has fully started.
This is necessary because once the soon-to-be-backup server shuts down, it
needs to know that the soon-to-be-live server has fully started before it
restarts itself as the new backup. If the soon-to-be-backup server restarts
before the soon-to-be-live server has fully started, it won't become a backup
server; instead, it will become a live server, which breaks the failback
process.

3) Wait to receive the announcement of a backup server before failing back.
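All three changes replace timing guesses (Thread.sleep()) with waits on explicit signals. As a rough illustration only, the following self-contained Java sketch models each signal as a java.util.concurrent.CountDownLatch and bounds every wait with a timeout. The names FailbackSketch, Server, finishInitialSync and failback are hypothetical and are not Artemis classes or methods; the timeoutMillis parameter merely plays the role of "initial-replication-sync-timeout", and the order in which failback checks the three conditions is an assumption made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class FailbackSketch {

    enum State { STOPPED, STARTING, LIVE }

    // Hypothetical stand-in for a broker; this is NOT the Artemis server API.
    static final class Server {
        final AtomicReference<State> state = new AtomicReference<>(State.STOPPED);
        final CountDownLatch fullyStarted = new CountDownLatch(1);

        // Item 2: report LIVE only after every start-up task has completed,
        // then release anyone waiting for the server to be fully started.
        void start(Runnable startupTasks) {
            state.set(State.STARTING);
            startupTasks.run();
            state.set(State.LIVE);
            fullyStarted.countDown();
        }
    }

    // Item 1: block on the final sync step until the replica acknowledges it,
    // bounded by a timeout (standing in for initial-replication-sync-timeout).
    // Returns false if no acknowledgement arrives in time.
    static boolean finishInitialSync(CountDownLatch replicaAck, long timeoutMillis)
            throws InterruptedException {
        return replicaAck.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Hypothetical failback driver seen from the soon-to-be-backup server:
    // each step waits on an explicit signal instead of sleeping.
    static boolean failback(CountDownLatch backupAnnounced, CountDownLatch replicaAck,
                            Server soonToBeLive, long timeoutMillis)
            throws InterruptedException {
        // Item 3: do not fail back until a backup announcement has been received.
        if (!backupAnnounced.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        // Item 1: only shut down once the replica confirms it has all the data.
        if (!finishInitialSync(replicaAck, timeoutMillis)) {
            return false;
        }
        // Item 2: restart as the new backup only after the new live is fully started.
        if (!soonToBeLive.fullyStarted.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        return soonToBeLive.state.get() == State.LIVE;
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch backupAnnounced = new CountDownLatch(1);
        CountDownLatch replicaAck = new CountDownLatch(1);
        Server soonToBeLive = new Server();

        // Simulate the peer signals arriving from another thread.
        Thread peer = new Thread(() -> {
            backupAnnounced.countDown();              // a backup announces itself
            replicaAck.countDown();                   // replica confirms it has the data
            soonToBeLive.start(() -> { /* start-up work would go here */ });
        });
        peer.start();

        boolean ok = failback(backupAnnounced, replicaAck, soonToBeLive, 5_000);
        peer.join();
        System.out.println("failback completed: " + ok);
    }
}
```

Running main prints "failback completed: true". Because CountDownLatch.await(long, TimeUnit) returns false on timeout rather than blocking forever, a missing acknowledgement or announcement makes the sketch give up cleanly instead of guessing with a fixed sleep.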

2015-10-20 14:55:31 -04:00