Durable subscription state is part of the MQTT specification which has
not been supported until now. This functionality is implemented via an
internal last-value queue. When an MQTT client creates or updates a
subscription, a message using the client ID as the last value is sent
to the internal queue. When the broker restarts this data is read from
the queue and used to populate the in-memory MQTT data structures.
Therefore subscribers can reconnect and resume their sessions'
subscriptions without having to manually resubscribe.
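Below is a loose sketch of the idea using the core client API; it is not the
actual broker code, and the queue and property names are illustrative. The
client ID acts as the last-value key so the internal queue only ever retains
the latest subscription state per client.

```java
import org.apache.activemq.artemis.api.core.client.ClientMessage;
import org.apache.activemq.artemis.api.core.client.ClientProducer;
import org.apache.activemq.artemis.api.core.client.ClientSession;

public class MqttSubscriptionStateSketch {
   void persistState(ClientSession session, ClientProducer toStateQueue,
                     String clientId, byte[] serializedSubscriptions) throws Exception {
      ClientMessage state = session.createMessage(true);                  // durable so it survives restart
      state.putStringProperty("_AMQ_LVQ_NAME", clientId);                 // last-value key = client ID
      state.putBytesProperty("subscriptions", serializedSubscriptions);   // versioned raw bytes
      toStateQueue.send(state);
   }
}
```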
MQTT state is now managed centrally per broker rather than in the
MQTTProtocolManager since there is one instance of MQTTProtocolManager
for each acceptor allowing MQTT connections. Managing state per acceptor
could lead to odd behavior when clients connect to different acceptors
with the same client ID.
The subscriptions are serialized as raw bytes with a "version" byte for
potential future use, but I intentionally avoided adding complex
scaffolding to support multiple versions. We can add that complexity
later if necessary.
Some tests needed to be changed since instantiating an MQTT protocol
manager now creates an internal queue. A handful of tests assume that no
queues will exist other than the ones they create themselves. I updated
the main test super-class so that an MQTT protocol manager is not
automatically instantiated when configuring a broker for in-vm support.
The exception thrown by serverLocator.connect() should be all you need in such a case,
and the caller should then be responsible for taking appropriate action.
Only report finding a matching log message if all requested entries are present in it, not just the last one provided.
Also fix the updated AssertionLoggerHandler usage within AddressFullLoggingTest, ensuring it is active across the full period in which the expected messages can occur and does not miss early ones.
This commit introduces support for configuring a specific Duplicate ID cache size per address in the Artemis server. Previously, there was only a global setting for the ID cache size, but now each address can have its own cache size.
The changes include the addition of a new configuration property id-cache-size in the Artemis server configuration file. This property can now be specified under each address setting in the configuration file, and its value will determine the Duplicate ID cache size for that particular address. If the id-cache-size property is not specified for an address, it will use the global setting.
The test cases have been updated to cover this new functionality, and integration tests have been added to verify that address-specific cache sizes work as expected.
Documentation has been added to address-settings.adoc, configuration-index.adoc and duplicate-detection.adoc.
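As a rough illustration, the same per-address setting could be applied
programmatically via the address settings repository; setIDCacheSize is
assumed here to mirror the new id-cache-size attribute and has not been
verified against the final API.

```java
import org.apache.activemq.artemis.core.server.ActiveMQServer;
import org.apache.activemq.artemis.core.settings.impl.AddressSettings;

public class IdCacheSizeSketch {
   void configure(ActiveMQServer server) {
      // assumed setter mirroring the new id-cache-size attribute under
      // <address-setting>; addresses matching "orders.#" keep a small
      // duplicate ID cache while everything else uses the global setting
      AddressSettings settings = new AddressSettings().setIDCacheSize(5);
      server.getAddressSettingsRepository().addMatch("orders.#", settings);
   }
}
```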
ARTEMIS-4375 Implement artemis shell using JLine3 integrated with auto-completion from picocli
This commit involves two JIRAs. One is adding picocli and the other is using JLine3 to implement a shell.
I tried to keep these commits separate, but the changes became interdependent, hence the two JIRAs are squashed into this commit.
This commit contains the following changes:
- eliminate used, undeclared dependencies
- eliminate unused, declared dependencies
- fix scope for test dependencies
- eliminate org.hamcrest completely as its use involved deprecated code
as well as dependencies from multiple versions
In rare cases a store operation could silently fail or starve, blocking the
related server session and all delivering messages. Such server sessions can
now be closed via a new management method that cleans their operation context
before closing them.
If the Security Manager is using Netty, and in particular the same Netty connection,
you could run into a deadlock / starvation.
This is particularly true in the WildFly case where the same connection is reused for everything via XNIO.
Improve CORE client failover by connecting to other live servers when all
reconnect attempts fail. For example, in a cluster composed of 2 live servers,
when the server to which the CORE client is connected goes down, the CORE
client should reconnect its sessions to the other live broker.
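A hedged sketch of the client-side configuration this enables; the
failoverAttempts parameter name is an assumption about the new setting that
allows connecting to another live broker once reconnect attempts against the
original broker are exhausted.

```java
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

public class FailoverClientSketch {
   public static void main(String[] args) {
      // reconnectAttempts is an existing parameter; failoverAttempts is assumed
      ActiveMQConnectionFactory cf = new ActiveMQConnectionFactory(
         "(tcp://broker1:61616,tcp://broker2:61616)?reconnectAttempts=3&failoverAttempts=1");
      // sessions created from this factory should fail over to broker2 when
      // broker1 goes down and all reconnect attempts against it fail
   }
}
```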
Continually read from the compressed byte[] into
the decompressed object
Add a test to validate that large (>1024 bytes) compressed data can be
deserialized properly.
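For reference, a minimal standalone sketch of the looping read being
described, using java.util.zip directly; the class and method names are
illustrative, not the actual Artemis code.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Inflater;

public class InflateLoopSketch {
   public static byte[] decompress(byte[] compressed) throws Exception {
      Inflater inflater = new Inflater();
      inflater.setInput(compressed);
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buffer = new byte[1024];
      // keep reading until the inflater reports the end of the stream;
      // a single pass would truncate payloads larger than the buffer
      while (!inflater.finished()) {
         int n = inflater.inflate(buffer);
         if (n == 0 && inflater.needsInput()) {
            break; // truncated/malformed input, avoid spinning forever
         }
         out.write(buffer, 0, n);
      }
      inflater.end();
      return out.toByteArray();
   }
}
```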
Some scrapers, e.g. Prometheus, add an "instance" tag. This value may not be the same as
the broker name, which makes these metrics more difficult to match up with
the corresponding broker.
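For context, a hedged sketch of how a broker-identifying tag can be attached
to every meter via Micrometer common tags; the tag name "broker" and the
broker name shown are assumptions for illustration.

```java
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;

public class BrokerTagSketch {
   public static void main(String[] args) {
      MeterRegistry registry = new SimpleMeterRegistry();
      // every meter registered afterwards carries the broker name, so metrics
      // can be matched to a broker even when the scraper adds its own
      // "instance" label
      registry.config().commonTags("broker", "broker-0");
      registry.counter("example.messages.added").increment();
   }
}
```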
legacy-integration-tests is being created to hold LDAP tests (or any other tests that won't play well with keeping threads clean).
It will use fork-mode=always on the maven-surefire-plugin.
When skipping the authentication cache the details for the original
exception are not logged.
This commit ensures these details are logged and adopts the
ExceptionUtils class from Apache Commons Lang in lieu of the previous
custom implementation.
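A minimal sketch of the kind of logging this enables, using ExceptionUtils
from Apache Commons Lang; the message text and logger category are
illustrative only.

```java
import org.apache.commons.lang3.exception.ExceptionUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AuthFailureLoggingSketch {
   private static final Logger logger = LoggerFactory.getLogger(AuthFailureLoggingSketch.class);

   void authenticate(Runnable doAuthenticate) {
      try {
         doAuthenticate.run();
      } catch (Exception e) {
         // log the root cause even when the authentication cache is skipped
         logger.debug("Authentication failed: {}", ExceptionUtils.getRootCauseMessage(e));
      }
   }
}
```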
When sending, for example, to a predefined anycast address and queue
from a multicast (JMS topic) producer, the routed count on the address
is incremented, but the message count on the matching queue is not. No
indication is given at the client end that the messages failed to get
routed - the messages are just silently dropped.
Fixing this problem requires a slight semantic change. The broker is now
more strict about what it allows, specifically with regard to
auto-creation. If, for example, a JMS application attempts to send a
message to a topic and the corresponding multicast address doesn't
already exist, or the broker cannot automatically create or update it,
then sending the message will fail.
Also, part of this commit moves a chunk of auto-create logic into
ServerSession and adds an enum for auto-create results. Aside from
helping fix this specific issue this can serve as a foundation for
de-duplicating the auto-create logic spread across many of the protocol
implementations.
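Purely as an illustration of the shape the new enum could take (the actual
names used in ServerSession may differ):

```java
// Hypothetical sketch of an auto-create result enum; names are assumptions.
public enum AutoCreateResultSketch {
   CREATED,   // the address/queue didn't exist and was auto-created
   UPDATED,   // it existed and was updated (e.g. a routing type was added)
   EXISTED,   // it already existed in the required form
   NOT_FOUND  // it doesn't exist and couldn't be auto-created; the send fails
}
```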
testSimpleResume is intermittently failing.
This test forces another page while cleanup is happening in the background.
ForceAnotherPage may not put the address back into paging if this happens right after the cleanup call.
To fix the test, we should call startPaging after forceAnotherPage is called.
- activemq.notifications are being transferred to the target node, unless an ignore is set up
- topics are being duplicated after redistribution
- topic sends are being duplicated when a 2 node cluster mirrors to another 2 node cluster, and both nodes are mirrored
- interrupted messages breaking reference counting
After the server writing to the client is interrupted in AMQP, the reference counting was broken, which would require a server restart
in order to clean up the files of any interrupted sends.
- Removed consumer during large message delivery damaging large messages
If the consumer failed to deliver messages for any reason, the message on the queue would be duplicated, which would wipe out the body of the message,
and other journal errors would happen because of this.
Extra debug capabilities were added to RefCountMessage as part of ARTEMIS-4206 in order to identify these issues.
This test adds data into the journal without bound when there are no syncs.
That's creating 600 MiB worth of data on our CIs, and this test was not meant to act like a soak test.
I'm limiting the load the test can generate with a TokenBucketLimiterImpl now.
MQTT5Test::testMaxMessageSize is spiking the memory on the integration testsuite all the way up to 1.5G,
which makes this test more like a soak test.
The test is now converted to use a real server like other soak tests.
This fix delays the message.copy to the redistributor itself,
meaning no copy is performed if the redistribution itself fails.
There is no longer any need to remove a copy.
The federated queue consumer has to generate a new id for the messages
received from the upstream broker because they have an id generated by
the store manager of the upstream broker.
Co-authored-by: Clebert Suconic <clebertsuconic@apache.org>
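A hedged sketch of the idea described above; the real code lives in the
federation consumer and the helper method shown here is hypothetical.

```java
import org.apache.activemq.artemis.api.core.Message;
import org.apache.activemq.artemis.core.persistence.StorageManager;

public class FederatedMessageIdSketch {
   Message prepareForLocalRouting(Message upstreamMessage, StorageManager storageManager) {
      Message copy = upstreamMessage.copy();
      // replace the upstream broker's store-generated ID with one from the
      // local store manager before routing the message locally
      copy.setMessageID(storageManager.generateID());
      return copy;
   }
}
```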
Basically I started the testsuite and attached check leak with "java -jar check-leak.jar --pid <pid> --report testsuite-report --sleep 1000" and saw the allocations of this were pretty high.
This fix scans the journal and paging for existing large messages. We will remove any large messages that do not have a corresponding record in the journal or paging.
This failure was caused by noise from the test itself, as the test creates a producer and then measures for a producer from the test.
It makes no sense to fix it for OverCore; we just ignore it on UsingCore.
There are certain use-cases where addresses will be auto-created and
never have a direct binding created on them. Because of this they will
never be auto-deleted. If a large number of these addresses build up
they will consume a problematic amount of heap space.
One specific example of this use-case is an MQTT subscriber with a
wild-card subscription and a large number of MQTT producers sending one
or two messages to a large number of different MQTT topics covered by the
wild-card. Since no bindings are ever created on any of these individual
addresses (e.g. from a subscription queue) they will never be
auto-deleted, but they will eventually consume a large amount of heap.
The only way to deal with these addresses is to manually delete them.
There are also situations where queues may be created and never have
any messages sent to them or never have a consumer connect. These
queues will never be auto-deleted so they must be deleted manually.
This commit adds the ability to configure the broker to skip the usage
check so that these kinds of addresses and queues can be deleted
automatically.
The issue identified with AMQP was under transaction usage, and while opening and closing sessions.
It seems the leak would be released once the connection is closed.
We added a new testsuite under ./tests/leak-tests to fix and validate these issues.
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In reality this situation is pretty unlikely due to the timing involved,
but the possibility still exists. Currently the file lock held by the
primary broker on the NFS share is essentially worthless in this
situation. This commit adds logic by which the timestamp of the lock
file is updated during activation and then routinely checked during
runtime to ensure consistency. This effectively mitigates split-brain in
this situation (and likely others). Here's how it works now.
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for
the replicated use-case. This commit applies the protection for
removeMember() so that the Topology remains intact.
There are no tests for these changes as I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases so this is not a
departure from the norm.
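A rough standalone sketch of the timestamp check described above, under the
assumption that the lock file lives on the shared NFS mount; the real logic is
inside the broker's file lock handling and may differ in detail.

```java
import java.io.File;

public class SharedLockTimestampSketch {
   private final File lockFile;
   private long lastWrittenTimestamp;

   public SharedLockTimestampSketch(File lockFile) {
      this.lockFile = lockFile;
   }

   public void onActivation() {
      // record ownership by touching the lock file
      lockFile.setLastModified(System.currentTimeMillis());
      lastWrittenTimestamp = lockFile.lastModified(); // read back to respect FS granularity
   }

   public boolean stillOwner() {
      // if another broker (e.g. an activated backup) touched the file after we
      // did, our claim on the shared store is stale and we should shut down
      return lockFile.lastModified() == lastWrittenTimestamp;
   }
}
```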
For pipelined open cases the event processing should ignore additional begin
and attach events if the open event handler closes the connection. This avoids
the processing throwing additional exceptions and replacing the error condition
on the connection with an unrelated NPE error from the additional events.
I am adding three attributes to address-settings:
* page-limit-bytes: Number of bytes. We will convert this metric into a max number of pages internally by dividing page-limit-bytes by page-size. It will allow a max based on an estimate.
* page-limit-messages: Number of messages
* page-full-message-policy: fail or drop
We will now allow paging until these max values are reached and then fail or drop messages.
Once usage falls back under these values, the address will remain full until a cleanup is kicked in by paging. So these values may be applied with a certain delay, but they should always be cleared once cleanup has happened.
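For example, assuming a page-size of 10 MiB, a page-limit-bytes of 100 MiB
translates into a limit of 10 pages; a small sketch of that estimate:

```java
public class PageLimitSketch {
   public static void main(String[] args) {
      long pageLimitBytes = 100L * 1024 * 1024; // page-limit-bytes
      long pageSizeBytes = 10L * 1024 * 1024;   // page-size configured for the address
      long pageLimit = pageLimitBytes / pageSizeBytes;
      System.out.println("max pages allowed = " + pageLimit); // 10
   }
}
```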
o.a.a.a.c.p.m.MQTTSubscriptionManager#removeSubscription() had a chunk
of code from 971f673c60 removed. That code
was added under the assumption that there should only ever be one
consumer per queue. That was true for MQTT 3.x, but it's not always true
for MQTT 5 due to shared subscriptions. However, the tests from that
commit all still pass even with it removed now (as well as all the other
MQTT tests) so I think it's safe.
If the client is using address prefixes to define the routing type along with
durable subscriptions, then on re-attach the comparison that checks whether the
subscription address has changed needs to remove the prefix when comparing
against the address. The prefix isn't propagated when creating the address, so
the comparison would always fail, resulting in the subscription queue being
deleted in error.
Adds some tests to validate that destination prefixes, if set and used
properly by the client, are honored over the default address auto-create
routing type configuration.
When an AMQP client subscribes to a new (non-existing) address with a receiver link, the
address is created with routing type ANYCAST regardless of the default address creation
configuration of the broker, ignoring even the broker-wide default of MULTICAST.
I am adding an option sync=true or false on mirror. If sync, any client blocking operation will wait for a roundtrip to the mirror,
acting like a synchronous replica.
Over time org.apache.activemq.artemis.tests.integration.amqp has become
home to many multi-protocol JMS tests even though the package is really
for AMQP-specific tests. This commit splits those tests out into their
own package.
This is a preliminary step to clarify these tests before I add another
one for a different issue.
When the last non-durable subscriber on a JMS topic disconnects the
corresponding queue representing the subscription is deleted as
expected. However, the queue's address will also be deleted no matter
what, which is *not* expected.
Some LDAP servers (e.g. OpenLDAP) do not support the "persistent search"
feature and therefore the existing "listener" feature does not actually
fetch updates. This commit implements a "pull" feature controlled by a
configurable interval equivalent to what is implemented in the cached
LDAP authorization module from ActiveMQ "Classic."
A handful of tests started to fail after the original fix was committed.
This commit fixes those failures mainly by using a mock
`TransactionSynchronizationRegistry`.
I changed `o.a.a.a.r.ActiveMQRAManagedConnection#checkTransactionActive`
slightly because `getTransactionStatus` will never return `null` unlike
`getTransaction` would. The semantics should still be the same, though.
Allow setting id-cache-size to 0 from broker.xml and ensure the broker
handles this gracefully. Previously you could only set the cache size to
0 via broker properties or programmatically and it would throw an
ArrayIndexOutOfBoundsException when adding an item to the cache.
- From now on we will save snapshots of page-counters in the journal (basically for compatibility with previous versions),
and we will recount the records on startup.
- While the rebuild is being done, the value from the previous snapshot is still available with current updates.
Attempt to standardize all Logger declarations to a singular variable name,
which makes the code more consistent and makes finding usages of loggers in
the code a bit easier.
Commit 5a42de5fa6 called my attention to
this test. It really needs to be refactored because:
- It belongs in the integration-tests module rather than the MQTT
protocol module.
- It is using a lot of non-standard components (e.g.
EmbeddedJMSResource, Awaitility, etc.).
- It is overly complicated (e.g. using its own MqttClientService).
This commit resolves all those problems. The new implementation is quite
a bit different but still equivalent. I reverted the original fix from
ARTEMIS-2476 and the test still fails.
Logger statements should use formatting syntax and let the normal framework checks take care of
checking whether a logger is enabled, instead of string concatenation and isXEnabled logger checks,
except in cases where there is known expense to the specific logging message/arg preparation or passing.
Changes from myself and Robbie Gemmell.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
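A small illustration of the preferred style; the class and message are made
up for the example.

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class LoggingStyleSketch {
   private static final Logger logger = LoggerFactory.getLogger(LoggingStyleSketch.class);

   void deliver(String queue, int count) {
      // avoid: if (logger.isDebugEnabled()) logger.debug("Queue " + queue + " has " + count + " messages");
      // prefer: the framework checks the level and formats the arguments lazily
      logger.debug("Queue {} has {} messages", queue, count);
   }
}
```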
I was debugging compacting, looking for a possible issue here in these conditions.
Even though I found nothing wrong with the code, I still want to keep the test as there's no such thing as enough testing.
In commit a9a85f98db I removed the code
which modified existing matches. However, I forgot that the matches read
from LDAP are often duplicated so instead of always adding a new match
this commit ensures that the *right* match is modified rather than a
potentially more generic wildcard match (which was the original
problem).
If an AMQP consumer tries to receive a message and the broker is unable
to convert the message from core to AMQP then the consumer is
disconnected and the offending message stays in the queue. When the
consumer reconnects the conversion error will happen again resulting in
a loop that can only be resolved through administrative action (e.g.
deleting the message manually or sending it to a dead letter address).
This commit fixes that problem by detecting the conversion problem and
sending the message to the queue's dead letter address. It also doesn't
disconnect the consumer.
This commit also changes the log messages associated with sending a
message to the dead letter address since this event can now occur
regardless of the delivery attempts.
The map used by LastValueQueue was inadvertently changed to a
non-thread-safe implementation in
4a4765c39c. This resulted in an occasional
ConcurrentModificationException from the hashCode implementation.
This commit restores the thread-safe map implementation and adds a test
which brute-forces a CME when using the non-thread-safe implementation.
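A small, self-contained illustration of the failure mode (not the actual
LastValueQueue code): computing hashCode() over a plain HashMap while another
thread mutates it can throw ConcurrentModificationException, whereas a
ConcurrentHashMap tolerates the concurrent mutation.

```java
import java.util.HashMap;
import java.util.Map;

public class MapHashCodeRaceSketch {
   public static void main(String[] args) throws Exception {
      Map<Integer, Integer> map = new HashMap<>(); // swap for ConcurrentHashMap to make this safe
      Thread writer = new Thread(() -> {
         for (int i = 0; i < 1_000_000; i++) {
            map.put(i, i);
            map.remove(i);
         }
      });
      writer.start();
      while (writer.isAlive()) {
         map.hashCode(); // intermittently throws ConcurrentModificationException with HashMap
      }
      writer.join();
   }
}
```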
Sometimes users want to perform custom client ID validation, and in the
case of an invalid client ID the proper reason code should be returned
in the CONNACK packet.
When the LegacyLDAPSecuritySettingPlugin has enableListener set to true
and a new permission is added it will try to modify the existing match
if one exists. This is problematic if there's a more generic wildcard
match than the specific one that's modified.
This commit fixes that problem so that instead of modifying the existing
match(es) it simply adds a new one. The plugin never should have tried
modifying the existing match in the first place as two identical matches
would be a configuration error.
- Fixing RoleInfo to provide information on deleteAddress.
- Adding more test coverage to check the number of permissions returned.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>