This one should improve eventual failures on ClusterConnectionControlTest
to validate this, run ClusterConnectionControlTest::testNotifications in a loop
you will see eventually the test taking longer to shutdown the executor as the call could be blocked.
This won't be an issue on a real server (Production system)
however, on the testsuite or while embedded this could cause issues,
if a same instance is stopped then started.
This is the reason why BackupAuthenticationTest was intermittently failing.
I also adapted the test since I would need to stop the server and reactivate it in order to change the configuration.
The previous test wasn't acting like a real server.
Currently when the broker hits the common 'Address already in use' issue
when starting the process terminates with a vague exception. The user is
left to guess which acceptor actually failed. If the broker has lots of
acceptors it is a tedious process to identify the problematic
configuration. This commit adds details to the exception message about
which acceptor failed and which host:port it was attempting to bind to.
This commit does the following:
- Deprecates existing overloaded createQueue, createSharedQueue,
createTemporaryQueue, & updateQueue methods for ClientSession,
ServerSession, ActiveMQServer, & ActiveMQServerControl where
applicable.
- Deprecates QueueAttributes, QueueConfig, & CoreQueueConfiguration.
- Deprecates existing overloaded constructors for QueueImpl.
- Implements QueueConfiguration with JavaDoc to be the single,
centralized configuration object for both client-side and broker-side
queue creation including methods to convert to & from JSON for use in
the management API.
- Implements new createQueue, createSharedQueue & updateQueue methods
with JavaDoc for ClientSession, ServerSession, ActiveMQServer, &
ActiveMQServerControl as well as a new constructor for QueueImpl all
using the new QueueConfiguration object.
- Changes all internal broker code to use the new methods.
Due to the changes in 6b5fff40cb the
config parameter message-expiry-thread-priority is no longer needed. The
code now uses a ScheduledExecutorService and a thread pool rather than
dedicating a thread 100% to the expiry scanner. The pool's size can be
controlled via scheduled-thread-pool-max-size.
This appears to have been added to the code-base by mistake over 10
years ago. It seems related to debugging and I can't see anywhere where
it is actually used so I'm removing it.
Using a property on AMQPLargeMessage instead of a ThreadLocal
This was causing issues on the journal as the message may transverse different threads on the journal.
There is no guarantee that the encodeSize size is the same in AMQP right after read.
As the protocol may add additional bytes right after decoded such as header, extra properties.. etc.
This is a Large commit where I am refactoring largeMessage Body out of CoreMessage
which is now reused with AMQP.
I had also to fix Reference Counting to fix how Large Messages are Acked
And I also had to make sure Large Messages are transversing correctly when in cluster.
- Avoid some Properties Decoding, checking if we need certain properties like scheduled delivery
- Avoid creating some unnecessary SimpleString instances
- Removed some intermediate ActiveMQBuffer allocation
- Removed some intermediate UnreleasableByteBuf allocation
Adding serialVersionUID to WildcardConfiguration since it is
serializable. Without serialVersionUID being specified a new one is
generated on each compilation which prevents object deserialization
between releases.
This is to make it possible to identify what test is leaking files whenever that is happening.
That is because future tests will report the leaks, and it's difficult to identify where it happened.
Also i'm changing NoProcessFilesBehind to show the getOpenFD propertly
Large messages can be split up using Websocket Continuation Frames.
This allows for much smaller buffer sizes to send or receive
potentially very large messages.
There is an optimization in AMQP, that properties are only parsed over demand.
It happens that after ARTEMIS-2294 (commit 2dd0671698),
every send would request for the property on the message, resulting the properties to always be parsed upon send.
Even when there's no use of application properties.
This is a surprisingly large change just to fix some log messages, but
the changes were necessary in order to get the relevant data to where it
was being logged. The fact that the data wasn't readily available is
probably why it wasn't logged in the first place.
Removed unused private constant FLUSH_TIMEOUT.
Solved some compiler warnings (missing generic type, private class can
be final). Fixed a performance related sevntu checkstyle warning
"Variable 'xyz' can be moved inside the block at line '2,864' to
restrict runtime creation.".
Add a paged message to the tail, when the QueueIterateAction doesn't handle it, to avoid removing unhandled paged message. Move the refRemoved calls from the QueueIterateActions to the iterQueue to fix the queue stats.
If a broker loses its file lock on the journal and doesn't notice (e.g.
network connection failure to an NFS mount) then it can continue to run
after its backup activates resulting in split-brain.
This commit implements periodic journal lock evaluation so that if a live
server loses its lock it will automatically restart itself.
The PageSubscriptionImpl.cleanupEntries could be locked by the queue
depage because they are executed with the same executor and the depage
could be locked by the iterQueue.
If PageSubscriptionImpl.cleanupEntries is locked, no one clean up the
JournalRecord and PagePositionImpl instances created during iterQueue.
So removing all messages from a huge queue, causes the retention of too
JournalRecord and PagePositionImpl instances until an OOM.
To avoid to lock the PageSubscriptionImpl.cleanupEntries the depage is
executed only if the queue isn't iterating.
This commit introduces the ability to configure a downstream connection
for federation. This works by sending information to the remote broker
and that broker will parse the message and create a new upstream back
to the original broker.
Add the config parameter `page-sync-timeout` to set a customized value,
because if the broker is configured to use ASYNCIO journal, the timeout
has the same value of NIO default journal buffer timeout ie 3333333.
The iterQueue transaction commits are locked by the synchronization
context. So removing all messages from a huge queue causes the creation
of too locked transactions for the paged messages and so OOM.
The iteration on paged message is executed out the iterQueue
synchronization context to avoid to lock the transaction commits.
Active Directory servers are unable to handle referrals automatically.
This causes a PartialResultException to be thrown if a referral is
encountered beneath the base search DN, even if the LDAPLoginModule is
set to ignore referrals.
This option may be set to 'true' to ignore these exceptions, allowing
login to proceed with the query results received before the exception
was encountered.
Note: there are no tests for this change as I could not reproduce the
issue with the ApacheDS test server. The issue is specific to directory
servers that don't support the ManageDsaIT control such as Active
Directory.
A new feature to preserve messages sent to an address for queues that will be
created on the address in the future. This is essentially equivalent to the
"retroactive consumer" feature from 5.x. However, it's implemented in a way
that fits with the address model of Artemis.
Improve wildcard support for the key attribute in the roles access
match element and whitelist entry element, allowing prefix match for
the mBean properties.
In LargeMessageImpl.copy(long) it need to open the underlying
file in order to read and copy bytes into the new copied message.
However there is a chance that another thread can come in and close
the file in the middle, making the copy failed
with "channel is null" error.
This is happening in cases where a large message is sent to a jms
topic (multicast address). During delivery it to multiple
subscribers, some consumer is doing delivery and closed the
underlying file after. Some other consumer is rolling back
the messages and eventually move it to DLQ (which will call
the above copy method). So there is a chance this bug being hit on.
The crititical analyser trigger the broker shutdown if try to
removeAllMessages with a huge queue. The iterQueue is split so as
not to keep the lock too time.
The LocalMonitor tick log is very useful to establish a "heartbeat" log
statement. It is moved into its own logger from PagingManager logger,
which is too verbose to leave activated indefinitely in production.
After a node is scaled down to a target node, the sf queue in the
target node is not deleted.
Normally this is fine because may be reused when the scaled down
node is back up.
However in cloud environment many drainer pods can be created and
then shutdown in order to drain the messages to a live node (pod).
Each drainer pod will have a different node-id. Over time the sf
queues in the target broker node grows and those sf queues are
no longer reused.
Although use can use management API/console to manually delete
them, it would be nice to have an option to automatically delete
those sf queue/address resources after scale down.
In this PR it added a boolean configuration parameter called
cleanup-sf-queue to scale down policy so that if the parameter
is "true" the broker will send a message to the
target broker signalling that the SF queue is no longer
needed and should be deleted.
If the parameter is not defined (default) or is "false"
the scale down won't remove the sf queue.
By default, such a cancelled task is not automatically removed from the
work queue until its delay elapses. It may cause unbounded retention of
cancelled tasks. To avoid this, set remove on cancel policy to true.
The core server session tracks details about producers like what
addresses have had messages sent to them, the most recent message ID
sent to each address, and the number of messages sent to each address.
This information is made available to users via the
listProducersInfoAsJSON method on the various management interfaces
(JMX, web console, etc.). However, in situations where a server session
is long lived (e.g. in a pool) and is used to send to many different
addresses (e.g. randomly named temporary JMS queues) this info can
accumulate to a problematic degree. Therefore, we should limit the
amount of producer details saved by the session.
Certain devices or file systems won't support record level locking.
For that reason I am changing FileLockNodeManager to use separate files (one for each position) instead of using tryLock(position);
A good example for this would be cephFS where channel.tryLock or channel.tryLock works but it fails at a record level.