in cluster.
When we know that a node leaves a clustercleanly we shouldn't log WARN
messages about it.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>
Allows the configuration of AMQP Federation broker connections to be updated and
reloaded. This allows for update, add or remove of AMQP federation broker connections
as well as the basic AMQP sender and receiver broker connections. It checks for and
ignores changes in AMQP broker connections that are performing Mirroring as that
would lead to issues that can break mirroring.
I just did some static analysis of this code, and I believe it would be better to first add to the list
before setting the cancel task.
Reason for that is in case the Runnable is dequeued between the add in the deQueue and setting the cancel task.
Possibility is remote but my OCD wouldn't let me ignore this small possibility.
There is a race condition between ConnectionEntry.ttl and
FailureCheckAndFlushThread whereby an in-vm connection may get closed
inadvertently due to a TTL timeout. This is because ConnectionEntry.ttl
is initialized to 60000 and then later set to -1 upon the initial Ping.
If this update happens at *just* the right time in
FailureCheckAndFlushThread then the connection will be closed.
The fix ensures that the ConnectionEntry.ttl is set to -1 for in-vm
connections from the start. It also eliminates the possibility of the
race in FailureCheckAndFlushThread.
This fix is based on static analysis of the code. The timing window is
just too small to contruct a reliable test. The failure has only been
seen in the wild a handful of times.
When initially developed the expectation was that no more producers would keep connecting but in a scenario like this
the consumers could actually give up and things will just accumulate on the server.
We should cleanup these upon disconnect.
- Async commit
* async here meaning the recording of the commit record is not doing a sync on the storage.
This is useful for internal operations where we don't need an immediate sync on the journal storage.
- Wired notification
* I need finer control on a afterWired (to the storage) and before the completions, so I can plug the sync context on mirror right before the commit is called.
This commit does the following:
- Replaces non-inclusive terms (e.g. master, slave, etc.) in the
source, docs, & configuration.
- Supports previous configuration elements, but logs when old elements
are used.
- Provides migration documentation.
- Updates XSD with new config elements and simplifies by combining some
overlapping complexTypes.
- Removes ambiguous "live" language that's used with regard to high
availability.
- Standardizes use of "primary," "backup," "active," & "passive" as
nomenclature to describe both configuration & runtime state for high
availability.
This commit does the following:
- prevents a large stack trace when the user-supplied message filter
has errors
- improves feedback to user for filter-validity
- indents the search-field in the "Browse Queue" screen same as on
other screens
- fixes a spelling error in a nearby comment
As I worked through implementing a more generic JSON marshaller, I tried using reflection through BeanUtils and other ways
however the endresult was always worse as there were a few caveats that were not as easy to accomplish.
For that reason I went to a declarative appraoch where I define a meta-data object on AddressSettings and AddressSettingsInfo and
reuse the metadata in a few other places.
Allow for core messages to be tunneled over broker connection links used
for AMQP Federation and for broker mirroring. This eliminates the need to
convert from Core to AMQP and from loading core large messages fully into
memory for that conversion.
In case the bindings "news.#" and "news.europe.#" are registered, only the first one matches with the address "news.europe" while both are supposed to match. Those changes are meant to get rid of this limitation.
Durable subscrption state is part of the MQTT specification which has
not been supported until now. This functionality is implemented via an
internal last-value queue. When an MQTT client creates, updates, or
adds a subscription a message using the client-ID as the last-value is
sent to the internal queue. When the broker restarts this data is read
from the queue and populates the in-memory MQTT data-structures.
Therefore subscribers can reconnect and resume their session's
subscriptions without have to manually resubscribe.
MQTT state is now managed centrally per-broker rather than in the
MQTTProtocolManager since there is one instance of MQTTProtocolManager
for each acceptor allowing MQTT connections. Managing state per acceptor
would allow odd behavior with clients connecting to different acceptors
with the same client ID.
The subscriptions are serialized as raw bytes with a "version" byte for
potential future use, but I intentionally avoided adding complex
scaffolding to support multiple versions. We can add that complexity
later if necessary.
Some tests needed to be changed since instantiating an MQTT protocol
manager now creates an internal queue. A handful of tests assume that no
queues will exist other than the ones they create themselves. I updated
the main test super-class so that an MQTT protocol manager is not
automatically instantiated when configuring a broker for in-vm support.
This commit introduces support for configuring a specific Duplicate ID cache size per address in the Artemis server. Previously, there was only a global setting for the ID cache size, but now each address can have its own cache size.
The changes include the addition of a new configuration property id-cache-size in the Artemis server configuration file. This property can now be specified under each address setting in the configuration file, and its value will determine the Duplicate ID cache size for that particular address. If the id-cache-size property is not specified for an address, it will use the global setting.
The test cases have been updated to cover this new functionality, and integration test have been added to verify that address-specific cache sizes work as expected.
Documentation has been added to address-settings.adoc, configuration-index.adoc and duplicate-detection.adoc
This commit contains the following changes:
- eliminate used, undeclared dependencies
- eliminate unused, declared dependencies
- fix scope for test dependencies
- eliminate org.hamcrest completely as its use involved deprecated code
as well as dependencies from multiple versions
In rare cases a store operation could silently fails or starves, blocking the
related server session and all delivering messages. Those server sessions can
be closed adding a management method that cleans their operation context
before closing them.
If the Security Manager is using Netty, and in particular the same Netty connection,
you could run into a deadlock / starvation.
This is particularly true in the Wildfly case where they reuse the same connection for everything via XNIO.
When resource audit logging is enabled STOMP is completely inoperable
due to an NPE during the protocol handshake. Unfortunately the failure
is completely silent. There are no logs to indicate a problem.
This commit fixes this problem via the following changes:
- Mitigate the original NPE via a check for null
- Move the logic necessary to set the "protocol connection" on the
"transport connection" to a class shared by all implementations.
- Add exception handling to log failures like this in the future.
- Add tests to ensure the audit logging is correct.
Currently JavaDoc is generated for many classes that don't need it.
JavaDoc should be reserved for user-facing classes (e.g. those used by
client application developers and developers embedding a broker into
their application). This commit narrows down the configuration to just
the classes that are needed. This will save time during release builds,
and save disk space wherever these files are stored (e.g. Apache
website).
`org.apache.activemq.artemis.core.server.embedded.MainTest` expects the
broker to throw a `java.io.IOException` when it is started due to its
inability to find the file app/data/server.lock. However, if that files
just happens to exist then the test will simply hang indefinitely. This
commit adds a timeout to avoid hanging.
This is Fixing BackupSyncJournalTest::testReplicationDuringSync
ARTEMIS-4215 introduced a failure on the testsuite.
However the failure is non related to the Buffer itself. it introduced a race that unveiled ARTEMIS-4298.
Some scrapers, e.g. prometheus, add an "instance" tag. This value may not be the same as
the broker name, which results in these metrics becoming more difficult to match up with
the corresponding broker.
The broker process fails to exit if an error is encountered starting the NodeManager. The issue is resolved by converting the critical analyzer thread to a daemon thread. As added protection, the thread is manually stopped when this error is encountered.
When skipping the authentication cache details for the original
exception are not logged.
This commit ensures these details are logged and adopts the
ExceptionUtils class from Apache Commons Lang in lieu of the previous
custom implementation.
When sending, for example, to a predefined anycast address and queue
from a multicast (JMS topic) producer, the routed count on the address
is incremented, but the message count on the matching queue is not. No
indication is given at the client end that the messages failed to get
routed - the messages are just silently dropped.
Fixing this problem requires a slight semantic change. The broker is now
more strict in what it allows specifically with regards to
auto-creation. If, for example, a JMS application attempts to send a
message to a topic and the corresponding multicast address doesn't exist
already or the broker cannot automatically create it or update it then
sending the message will fail.
Also, part of this commit moves a chunk of auto-create logic into
ServerSession and adds an enum for auto-create results. Aside from
helping fix this specific issue this can serve as a foundation for
de-duplicating the auto-create logic spread across many of the protocol
implementations.
- activemq.notifications are being transferred to the target node, unless an ignore is setup
- topics are being duplicated after redistribution
- topics sends are being duplicated when a 2 node cluster mirrors to another 2 node cluster, and both nodes are mirrored.
- interrupted message breaking reference counting
After the server writing to the client is interrupted in AMQP, the reference counting was broken what would require the server restarted
in order to cleanup the files of any interrupted sends.
- Removed consumer during large message delivery damaging large messages
If the consumer failed to deliver messages for any reason, the message on the queue would be duplicated. what would wipe out the body of the message
and other journal errors would happen because of this.
extra debug capabilities added into RefCountMessage as part of ARTEMIS-4206 in order to identify these issues
This commit fixes the following things:
- Moves connection audit logging to the resource audit logger instead
of using a dedicated logger as that would adversely impact upgrading
users, and arguably didn't make sense in the first place.
- Mitigates an potential NPE w.r.t. connection ID.
- Updates the "dummy" management connection to return a valid
connection ID.
This fix will delay the message.copy to the redistributor itself.
Meaning no copy would be performed if the redistribution itself failed.
No need to remove a copy any longer
The federated queue consumer has to generate a new id for the messages
received from the upstream broker because they have an id generated by
the store manager of the upstream broker.
Co-authored-by: Clebert Suconic <clebertsuconic@apache.org>
- redistribute received the handle call, it then copies the message
- the routing table changes
- the message is left behind
With the new version of the server these messages will be removed. But we should remove these right away
Basically I started the testsuite and attached check leak with "java -jar check-leak.jar --pid <pid> --report testsuite-report --sleep 1000" and saw the allocations of this were pretty high.
I have seen a werid intermittent failure in the testsuite
caused by some race where the EMPTY_SET ends up Not Empty! Causing weird failures that were really difficult to be investigated
This fix is scanning journal and paging for existing large messages. We will remove any large messages that do not have a corresponding record in journals or paging.
This is just an annoying thing. as the functionality would work properly if there was a previous value.
No other harm other than the cannot find TX record on startup for this, hence I'm fixing it.
There are certain use-cases where addresses will be auto-created and
never have a direct binding created on them. Because of this they will
never be auto-deleted. If a large number of these addresses build up
they will consume a problematic amount of heap space.
One specific example of this use-case is an MQTT subscriber with a
wild-card subscription and a large number of MQTT producers sending one
or two messages a large number of different MQTT topics covered by the
wild-card. Since no bindings are ever created on any of these individual
addresses (e.g. from a subscription queue) they will never be
auto-deleted, but they will eventually consume a large amount of heap.
The only way to deal with these addresses is to manually delete them.
There are also situations where queues may be created and never have
any messages sent to them or never have a consumer connect. These
queues will never be auto-deleted so they must be deleted manually.
This commit adds the ability to configure the broker to skip the usage
check so that these kinds of addresses and queues can be deleted
automatically.
there are two leaks here:
* QueueImpl::delivery might create a new iterator if a delivery happens right after a consumer was removed, and that iterator might belog to a consumer that was already closed
as a result of that, the iterator may leak messages and hold references until a reboot is done. I have seen scenarios where messages would not be dleivered because of this.
* ProtonTransaction holding references: the last transaction might hold messages in the memory longer than expected. In tests I have performed the messages were accumulating in memory. and I cleared it here.
The issue identified with AMQP was under Transaction usage, and while opening and closing sessions.
It seems the leak would be released once the connection is closed.
We added a new testsuite under ./tests/leak-tests To fix and validate these issues
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In reality this situation is pretty unlikely due to the timing involved,
but the possibility still exists. Currently the file lock held by the
primary broker on the NFS share is essentially worthless in this
situation. This commit adds logic by which the timestamp of the lock
file is updated during activation and then routinely checked during
runtime to ensure consistency. This effectively mitigates split-brain in
this situation (and likely others). Here's how it works now.
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for
the replicated use-case. This commit applies the protection for
removeMember() so that the Topology remains intact.
There are no tests for these changes as I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases so this is not a
departure from the norm.
I am adding three attributes to Address-settings:
* page-limit-bytes: Number of bytes. We will convert this metric into max number of pages internally by dividing max-bytes / page-size. It will allow a max based on an estimate.
* page-limit-messages: Number of messages
* page-full-message-policy: fail or drop
We will now allow paging, until these max values and then fail or drop messages.
Once these values are retracted, the address will remain full until a period where cleanup is kicked in by paging. So these values may have a certain delay on being applied, but they should always be cleared once cleanup happened.
Moves embedded XSD element definitions and associated complexTypes
from XSD element definitions to top-level types available for XML
schema validation in IDEs.