Ensure that on server restart the original priority value assigned to an
AMQP message is used when dispatching durable messages from the store.
The AMQP Header section is scanned if present and the priority value
is recovered in an efficient manner.
AckManager.flush would hold a lock on ackManager, There was a possible deadlock with MirrorTarget:
Thread 1:
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.addRetry(AckManager.java:393)
- waiting to lock <0x00000007990a13e8> (a org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.ack(AckManager.java:418)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.performAck(AMQPMirrorControllerTarget.java:479)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.postAcknowledge(AMQPMirrorControllerTarget.java:461)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.actualDelivery(AMQPMirrorControllerTarget.java:318)
at org.apache.activemq.artemis.protocol.amqp.proton.ProtonAbstractReceiver.onMessageComplete(ProtonAbstractReceiver.java:361)
Thread 2:
at jdk.internal.misc.Unsafe.park(java.base@11.0.8/Native Method)
- parking to wait for <0x000000079de0af38> (a java.util.concurrent.CountDownLatch$Sync)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.8/LockSupport.java:234)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.8/AbstractQueuedSynchronizer.java:1079)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.8/AbstractQueuedSynchronizer.java:1369)
at java.util.concurrent.CountDownLatch.await(java.base@11.0.8/CountDownLatch.java:278)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AMQPMirrorControllerTarget.flush(AMQPMirrorControllerTarget.java:230)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager$$Lambda$601/0x00000008005c3040.accept(Unknown Source)
at java.lang.Iterable.forEach(java.base@11.0.8/Iterable.java:75)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.flushMirrorTargets(AckManager.java:184)
- locked <0x00000007990a13e8> (a org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager)
at org.apache.activemq.artemis.protocol.amqp.connect.mirror.AckManager.initRetry(AckManager.java:162)
Send operations should ignore replication, while the ack of the message should wait a round trip in replication.
That will allow us to ack the message faster and still have consistency with its replica.
If a user for some reason force closes the local mirror connection SNF
consumer or the actual connection but hasn't stopped the broker connection
itself the connection should recover and rebuild. The fix ensures that if
local connections are closed first then local session resources get freed.
When an address consumer explicitly closes or is closed we should remove the
address binding for that consumer right away instead of waiting for possible
configured auto delete as the demand is gone and we don't need to binding to
stick around any longer.
Say you use Mirroring and journal replication combined.
The target will wait a round trip on replica before sends are done.
It is possible to ignore that rountrip now with an option added into Configuration#mirrorReplicaSync
The federation sender links can react incorrectly when the source broker
closes a receiver because the demand is gone and they can result in the
remote broker closing its side of the connection causing the whole federation
to need to be rebuilt. Handle the closure events correctly to prevent an
unexpected close and rebuild.
Move parameters out of base classes where needed to avoid clashes with
subclasses [re-]defining their own params. Needed due to change of
field + method annotation search handling to adopt Java hiding/shadowing
semantics, leading to ParameterizedTestExtension discovering both parent
and subclass parameter definition methods and throwing.
In some setups, there could be a few hundred thousand queues that are
created due to many consumers that are connecting. However, most of
these are empty and stay empty for the entire day since there aren't
necessarily messages to be sent. The 8K intermediateMessageReferences
instantiates an 64KB buffer (Object[]). This means we have large
allocation and live heap that ultimately remains empty for almost the
entire day.
In this commit, we introduce initial-queue-buffer-size, which defaults
to the current value of 8192. It can be set programmatically via
QueueConfiguration#setInitialQueueBufferSize(int).
Note that this must be a positive power of 2.
The run command uses the artemis.profile and log4j2.properties files while all
other CLI commands use the artemis-utility.profile and log4j2-default.properties
files.
When using targeted FQQN permissions the AMQP sender needs to check that
it can access not only the address but also the queue if sent an FQQN so
that the security can validate if the sender has been granted directed
access to the FQQN as a whole.
Check that an attaching Openwire producer has SEND permission on the target
destination and reject it if it does not instead of delaying checks until the
actual send. For anonymous producers check early in the send process to reduce
overhead in the JVM handling messages that are going to fail to send.
One side of the mirror will send and ack messages one by one.
As the message arrives in the mirror the ack comes before the persistence finishes, so we need to retry and configure retry accordingly.
When counting messages with provided filters and grouping use the API
getObjectPropertyForFilter which translates values from AMQP message
annotations that are otherwise missed by the filters.
for regular messages it's quite obvious when the message is leaving the queue but for paged messages it becomes a challenge. We should just ignore the update for paged messages.
This commit fixes 3 distinct issues with bridge configuration encoding:
- The encoding size was calculated incorrectly in three ways:
- Using 0 instead of `DataConstants.SIZE_NULL` when no transformer is
defined resulting in a calculated size discrepancy of -1.
- Using `DataConstants.INT` instead of `DataConstants.SIZE_INT` for
the number of transformer properties resulting in a calculated size
discrepancy of +2 when using a configuration with a transformer.
- Using 0 or `sizeOfNullableInteger` instead of
`DataConstants.SIZE_INT` for the number of static connectors
resulting in a variable calculated size discrepancy of either -4 or
+1 respectively.
- Encoding was using `writeString` instead of `writeNullableString` for
the name of the transformer class resulting in an actual buffer size
discrepancy of -1.
Aside from these fixes this commit also adds new tests specifically for
encoding & decoding include a test to verify known bytes during
encoding.
Lastly, it's worth noting that this *won't fix* bad data that was
already stored to disk by an older broker or bad data that comes over
the wire from an older broker (e.g. if a older primary broker paired
with a newer backup).
During the investigation for ARTEMIS-4910 it was noted that some of the
storage manager tests didn't actually force a full reload (i.e.
decoding) of the various configurations from disk. This commit does the
following:
- refactors these tests so that they all force a reload from disk
- organizes imports
- eliminates various bits of repeated code in all the tests
- improves code formatting
This commit exposes a bug in the encoding/decoding of
BridgeConfiguration via a failing test in
BridgeConfigurationStorageTest. This test is disabled. This failure will
be resolved in a subsequent commit at which point the test will be
enabled again.
If, for whatever reason, the response for a packet sent with blocking
semantics is never returned it's possible that an async response
received in the interventing time will be interpreted as the current
response. This is because ChannelImpl does not verify the correlation
ID set on the response packet when it is received.
Small change but say there's ever a leak on the journal. Removing the clause from paging would allow to also capture other leaks.
This is currently not an issue and the test should still pass.
No need for the iterations, and a few minor improvements to the test.
This method was taking up to 15 seconds on the CI and the same method is reused a couple times, resulting in many minutes wasted
This commit fixes 4 distinct issues with divert configuration encoding:
- The encoding size was calculated incorrectly in three ways:
- Using 0 instead of `DataConstants.SIZE_NULL` when no transformer is
defined resulting in a calculated size discrepancy of -1.
- Using `DataConstants.INT` instead of `DataConstants.SIZE_INT` for
the number of transformer properties resulting in a calculated size
discrepancy of +2 when using a configuration with a transformer.
- Using `BufferHelper.sizeOfNullableBoolean` instead of
`DataConstants.SIZE_BOOLEAN` for the `exclusive` property resulting
in a calculated discrepancy of +1.
- Encoding was using `writeString` instead of `writeNullableString` for
the name of the transformer class resulting in an actual buffer size
discrepancy of -1.
Aside from these fixes this commit also:
- Updates the divert storage test to force a reload from disk which
will also trigger this problem.
- Adds new tests specifically for encoding & decoding include a test to
verify known bytes during encoding.
Lastly, it's worth noting that this *won't fix* bad data that was
already stored to disk by an older broker or bad data that comes over
the wire from an older broker (e.g. if a older primary broker paired
with a newer backup).
Files will leak on the target and messages will not be received after failover.
Also messages will be written to wrong destinations on the replica. The leak is actually between destinations, and the consequence is the file leak.
I usually keep test and fix on the same commit, but in this case I have been heavily validating the server with and without the fix,
so I will open an exception in this case and keep the fix and test separated.
When a bridge is stopped it doesn't wait for pending send
acknowledgements to arrive. However, when a bridge is paused it does
wait. The behavior should be consistent and more importantly
configurable. This commit implements these improvements and generally
refactors BridgeImpl to clarify and simplify the code. In total, this
commit includes the follow changes:
- Removes the hard-coded 60-second timeout for pending acks when
pausing the bridge and adds a new config parameter (i.e.
"pending-ack-timeout").
- Applies the new pending-ack-timeout when the bridge is stopped.
- Updates existing and adds new logging messages for clarity.
- De-duplicates code for sending bridge-related notifications.
- Avoids converting bridge name to/from SimpleString.
- Removes unnecessary comments.
- Renames variables & functions for clarity.
- Replaces the `started`, `stopping`, & `active` booleans with a
single `state` variable which is an enum.
- Adds `final` to a few variables that were functionally final.
- Synchronizes `stop` & `pause` methods to add safety when invoked
concurrently with `handle` (since both deal with `state` and execute
runnables on the ordered executor).
- Reorganizes and removes a few methods for clarity.
- Relocates `connect` method directly into `ConnectRunnable` (mirroring
the structure of the `StopRunnable` and `PauseRunnable`).
- Eliminates unnecessary variables in `ConnectRunnable` and
`ScheduledConnectRunnable`.
- Adds test to verify pending ack timeout works as expected with both
`stop` & `pause` with both regular and large messages.
Handle any exceptions from the proton transport and set the error on the
transport for processing in the events dispatch cycle by adding in handling
of the transport error event.
This commit uses lambdas or method references wherever possible. There
are still a handful of places that appear like they could be changed but
couldn't mainly because they use "this" and the meaning of "this"
changes when using a lambda.
This test is spinning a concurrent call on getDirectBindings
Since I added a synchronization point to get the Bindings the test could
be taking up to 5 seconds, being a variance between 500ms and 5 seconds.
Adding Thread.sleep(1) on this loop solved the issue as it is now letting other threads to do work
since I'm starting more executors than cores I have on my box.
Currently, with 500K+ queues, the cleanup step of TempQueueCleanerUpper
requires invoking WildcardAddressManager#getDirectBindings, which is
O(k) in the number of queues.
From method profiling, this can consume up to 95% of our CPU time when
needing to clean up many of these.
Add a new map to keep track of the direct bindings, and add a test
assertion that fails if we don't properly remove it.
When setting expiration on the AMQPMessage the AMQP header TTL value
should be read as an unsigned integer and as such should use the longValue
API of UnsignedInteger to get the right value to set expiration.
When checking if address federation can be done the manager needs to look
at the policy level settings before looking at federation or connector
level settings for amqp credits.
No semantic changes in this commit,
I was just exploring the possibility of
issues with paging and large messages combined
and I decided to keep the test.