The failback process needs to be deterministic rather than relying on various
incarnations of Thread.sleep() at crucial points. Important aspects of this
change include:
1) Make the initial replication synchronization process block at the very
last step and wait for a response from the replica to ensure the replica has
as the necessary data. This is a critical piece of knowledge during the
failback process because it allows the soon-to-become-backup server to know
for sure when it can shut itself down and allow the soon-to-become-live
server to take over. Also, introduce a new configuration element called
"initial-replication-sync-timeout" to conrol how long this blocking will occur.
2) Set the state of the server as 'LIVE' only after the server is fully
started. This is necessary because once the soon-to-be-backup server shuts
down it needs to know that the soon-to-be-live server has started fully before
it restarts itself as the new backup. If the soon-to-be-backup server restarts
before the soon-to-be-live is fully started then it won't actually become a
backup server but instead will become a live server which will break the
failback process.
3) Wait to receive the announcement of a backup server before failing-back.
Netty 4.x uses pooled buffers. These buffers can run out of memory when
transferring large amounts of data over connection. This was causing an
OutOfMemory exception to be thrown on the CoreBridge when tranferring
large messages. Netty provides a callback handler to notify listeners
when a Connection is writable. This patch adds the ability to register
connection writable listeners to the Netty connection and registers the
relevant callback from the Bridge to avoid writing when the buffers are
full.
https://issues.apache.org/jira/browse/ARTEMIS-217 fixing dead lock
This is using a separate lock for notifications, this way we won't hold a lock while communicating on netty which was the issue here.
Inbound sessions are always created from the same ActiveMQConnectionFactory
which means the load-balancing policy is applied to them in the expected
manner. However, outbound sessions are created from independent, unique
ActiveMQConnectionFactory instances which means that the load-balancing
doesn't follow the expected pattern.
This commit changes this behavior by caching each unique
ActiveMQConnectionFactory instance and using it for both inbound and outbound
sessions potentially. This ensures the sessions are load-balanced as
expected.
this is just calling Idea format on all the files using the new style
I am separating manual changes from automatic changes in case I have to repeat the manual changes again
https://issues.apache.org/jira/browse/ARTEMIS-142
I have previously exposed this exception as some errors were being hidden
Now that I have cleared those, It's best to remove this as some errors are now happening
https://issues.apache.org/jira/browse/ARTEMIS-136
From what I researched from implementers of XA TM if you throw ERR over communication errors the transaction manager will create
an heuristic transaction to be manually dealt with.
Other XA Implementations (such as Oracle JDBC) are return FAIL over communication failures during any XA operation.