PriorityLinkedList has multiple sub-lists, before this commit PriorityLinkedList::setNodeStore would set the same node store between all the lists.
When a removeWithID was called for an item on list[0] the remove from list[4] would always succeed first. This operation would work correctly most of the time except
when tail and head is being used. Many NullPointerExceptions would be seen while iterating on the list for remove operations, and the navigation would be completely broken.
A test was added to PriorityLinkedListTest to make sure the correct lists were used however I was not able to reproduce the NPE condition in that test.
AccumulatedInPageSoakTest reproduced the exact condition for the NPE when significant load is used.
AckManager.flush would hold a lock on ackManager, There was a possible deadlock with MirrorTarget:
Thread 1:
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.addRetry(AckManager.java:393)
- waiting to lock <0x00000007990a13e8> (a org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.ack(AckManager.java:418)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.performAck(AMQPMirrorControllerTarget.java:479)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.postAcknowledge(AMQPMirrorControllerTarget.java:461)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.actualDelivery(AMQPMirrorControllerTarget.java:318)
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonAbstractReceiver.onMessageComplete(ProtonAbstractReceiver.java:361)
Thread 2:
at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
- parking to wait for <0x000000079de0af38> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.8/LockSupport.java:234)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.8/AbstractQueuedSynchronizer.java:1079)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.8/AbstractQueuedSynchronizer.java:1369)
at java.util.concurrent.CountDownLatch.await(java.base@11.0.8/CountDownLatch.java:278)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.flush(AMQPMirrorControllerTarget.java:230)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager$$Lambda$601/0x00000008005c3040.accept(Unknown Source)
at java.lang.Iterable.forEach(java.base@11.0.8/Iterable.java:75)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.flushMirrorTargets(AckManager.java:184)
- locked <0x00000007990a13e8> (a org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.initRetry(AckManager.java:162)
Say you use Mirroring and journal replication combined.
The target will wait a round trip on replica before sends are done.
It is possible to ignore that rountrip now with an option added into Configuration#mirrorReplicaSync
One side of the mirror will send and ack messages one by one.
As the message arrives in the mirror the ack comes before the persistence finishes, so we need to retry and configure retry accordingly.
Small change but say there's ever a leak on the journal. Removing the clause from paging would allow to also capture other leaks.
This is currently not an issue and the test should still pass.
Files will leak on the target and messages will not be received after failover.
Also messages will be written to wrong destinations on the replica. The leak is actually between destinations, and the consequence is the file leak.
I usually keep test and fix on the same commit, but in this case I have been heavily validating the server with and without the fix,
so I will open an exception in this case and keep the fix and test separated.
This commit uses lambdas or method references wherever possible. There
are still a handful of places that appear like they could be changed but
couldn't mainly because they use "this" and the meaning of "this"
changes when using a lambda.
This commit does the following:
- deprecate all QueueConfiguration ctors
- add `of` static factory methods for all the deprecated ctors
- replace any uses of the normal ctors with the `of` counterparts
This makes the code more concise and readable.
This commit does the following:
- deprecate the verbosely named `toSimpleString` static factory
methods
- add `of` static factory methods for all the ctors
- replace any uses of the normal ctors with the `of` counterparts
This makes the code more concise and readable.
PagingStore is supposed to send an event to replica on every file that is closed.
There are a few situation where the sendClose is being missed and that could generate leaks on the target
in this commit I'm storing a binding record with the address-settings for the correct size
this is also validating eventual merges of the AddressSettings in the same namespace.
This is a list of improvements done as part of this commit / task:
* Page Transactions on mirror target are now optional.
If you had an interrupt mirror while the target destination was paging, duplicate detection would be ineffective unless you used paged transactions
Users can now configure the ack manager retries intervals.
Say you need some time to remove a consumer from a target mirror. The delivering references would prevent acks from happening. You can allow bigger retry intervals and number of retries by tinkiering with ack manager retry parameters.
* AckManager restarted independent of incoming acks
The ackManager was only restarted when new acks were coming in. If you stopped receiving acks on a target server and restarted that server with pending acks, those acks would never be exercised. The AckManager is now restarted as soon as the server is started.