Communication between nodes will fail under certain topologies
JGroups has something called JForkChannel that could be used on container systems.
And be injected into Artemis.
For some reason that channel cannot be reused for more than one channel per VM.
And it cannot ever be closed.
I am keeping the trace logs I used to debug this issue in case anything similar to this happens again.
https://issues.apache.org/jira/browse/ARTEMIS-484
The File copy after the initial synchronization on large messages was broken.
On this commit we fix how the buffer is cleaned up before each read since
a previously unfinished body read would make the buffer dirty.
I'm keeping also lots of Traces I have added to debug this issue, so they will
be useful if anything like this happens again.
If a LM write packet is received from the live assume that the large
message exists and create a local reference.
Old behavour would reject the packet which could lead to loss of data on
failover see JIRA.
https://issues.apache.org/jira/browse/ARTEMIS-463
This will have some extra refactoring on the protocol head, transferring responsibility to the broker classes in a lot of cases
and removing some duplicated code
This was a team effort from Clebert Suconic and Howard Gao
There is a potential deadlock scenario if 2 threads try
sendReplicatePacket. Thread 1 registers Netty callback which will
invoke readyForWrites() method, which locks the callback list and waits
on the replicationLock. Thread 2 enters sendReplicatePacket and gets
the replicationLock and is blocked on the callback list lock. This fix
ensures the sendReplicatePacket blocks on the ReplicationManager meaning
no interleaving of the Netty callback thread and the wait on lock can
happen.
It's possible for the latch used for flow control here to get out of sync. In
other words, multiple count-downs can occur between count-ups so that the latch
always has a count > 0. When this situation arises then every single packet
sent to the replica is delayed by 5 seconds.
The solution here essentially is to eliminate the latch completely and use a
condition/wait/notify pattern.