After a node is scaled down to a target node, the store-and-forward (SF)
queue on the target node is not deleted.
Normally this is fine because it may be reused when the scaled-down
node comes back up.
However, in a cloud environment many drainer pods can be created and
then shut down in order to drain the messages to a live node (pod).
Each drainer pod has a different node ID, so over time the number of
SF queues on the target broker grows, and those SF queues are
never reused.
Although users can delete them manually via the management API/console,
it would be nice to have an option to automatically delete
those SF queue/address resources after scale-down.
This PR adds a boolean configuration parameter called
cleanup-sf-queue to the scale-down policy. If the parameter
is "true", the broker sends a message to the
target broker signalling that the SF queue is no longer
needed and should be deleted.
If the parameter is not defined (the default) or is "false",
scale-down leaves the SF queue in place.
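
A minimal broker.xml sketch of what the configuration could look like,
assuming the usual live-only scale-down setup and that the element is
named cleanup-sf-queue like the parameter; the connector name is
illustrative:

    <ha-policy>
       <live-only>
          <scale-down>
             <!-- assumed element name, matching the cleanup-sf-queue parameter:
                  ask the target broker to delete this node's SF queue after scale-down -->
             <cleanup-sf-queue>true</cleanup-sf-queue>
             <connectors>
                <!-- illustrative connector pointing at the target (live) broker -->
                <connector-ref>target-connector</connector-ref>
             </connectors>
          </scale-down>
       </live-only>
    </ha-policy>
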
By default, a task cancelled on a ScheduledThreadPoolExecutor is not
removed from the work queue until its delay elapses, which can cause
unbounded retention of cancelled tasks. To avoid this, set the
executor's remove-on-cancel policy to true via
setRemoveOnCancelPolicy(true).
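
A minimal, self-contained sketch of the JDK behaviour involved
(ScheduledThreadPoolExecutor.setRemoveOnCancelPolicy is standard JDK
API since Java 7):

    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.ScheduledThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class RemoveOnCancelExample {
        public static void main(String[] args) {
            ScheduledThreadPoolExecutor executor = new ScheduledThreadPoolExecutor(1);
            // Without this, a cancelled task stays in the work queue until its
            // delay elapses, retaining memory for every cancelled task.
            executor.setRemoveOnCancelPolicy(true);

            ScheduledFuture<?> future =
                executor.schedule(() -> System.out.println("ran"), 1, TimeUnit.HOURS);
            future.cancel(false); // with the policy set, the task leaves the queue now

            System.out.println("queued tasks: " + executor.getQueue().size()); // prints 0
            executor.shutdown();
        }
    }
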
The core server session tracks details about producers like what
addresses have had messages sent to them, the most recent message ID
sent to each address, and the number of messages sent to each address.
This information is made available to users via the
listProducersInfoAsJSON method on the various management interfaces
(JMX, web console, etc.). However, in situations where a server session
is long-lived (e.g. in a pool) and is used to send to many different
addresses (e.g. randomly named temporary JMS queues) this info can
accumulate to a problematic degree. Therefore, we should limit the
amount of producer details saved by the session.
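
A minimal sketch of one way to bound this, using an access-ordered
LinkedHashMap that evicts the least-recently-used address; the class,
field names, and limit here are hypothetical, not the session's actual
implementation:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class BoundedProducerMetrics {
        // Hypothetical cap; in practice this would likely be configurable.
        private static final int MAX_TRACKED_ADDRESSES = 100;

        // Access-ordered map: once the cap is exceeded, the least-recently-used
        // address is dropped, so temporary addresses cannot accumulate forever.
        private final Map<String, ProducerInfo> infoByAddress =
            new LinkedHashMap<String, ProducerInfo>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, ProducerInfo> eldest) {
                    return size() > MAX_TRACKED_ADDRESSES;
                }
            };

        public void recordSend(String address, long messageId) {
            infoByAddress.compute(address, (addr, info) -> {
                if (info == null) {
                    info = new ProducerInfo();
                }
                info.lastMessageId = messageId;
                info.messagesSent++;
                return info;
            });
        }

        // Hypothetical holder for the per-address details the session tracks.
        static class ProducerInfo {
            long lastMessageId;
            long messagesSent;
        }
    }
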
Certain devices or file systems don't support record-level locking.
For that reason I am changing FileLockNodeManager to use separate files (one for each position) instead of using tryLock(position).
A good example is cephFS, where a whole-file channel.lock() or channel.tryLock() works but locking at a record level fails.
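
A minimal sketch of the separate-files approach, with a hypothetical
lock-file naming scheme; the real FileLockNodeManager wiring is omitted:

    import java.io.IOException;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    public final class PerPositionFileLock {
        private final Path directory;

        public PerPositionFileLock(Path directory) {
            this.directory = directory;
        }

        // Instead of channel.tryLock(position, 1, false) against one shared file,
        // open a dedicated file per position and take a whole-file lock, which
        // file systems like cephFS do support.
        public FileLock tryLock(int position) throws IOException {
            Path lockFile = directory.resolve("serverlock." + position); // hypothetical name
            FileChannel channel = FileChannel.open(lockFile,
                    StandardOpenOption.CREATE, StandardOpenOption.READ, StandardOpenOption.WRITE);
            FileLock lock = channel.tryLock(); // whole-file lock, no record range
            if (lock == null) {
                channel.close(); // another process holds it; release our channel
            }
            return lock;
        }
    }

Releasing a held lock is then just lock.release() followed by closing
the per-position channel.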