The broker process fails to exit if an error is encountered starting the NodeManager. The issue is resolved by converting the critical analyzer thread to a daemon thread. As added protection, the thread is manually stopped when this error is encountered.
When skipping the authentication cache details for the original
exception are not logged.
This commit ensures these details are logged and adopts the
ExceptionUtils class from Apache Commons Lang in lieu of the previous
custom implementation.
When sending, for example, to a predefined anycast address and queue
from a multicast (JMS topic) producer, the routed count on the address
is incremented, but the message count on the matching queue is not. No
indication is given at the client end that the messages failed to get
routed - the messages are just silently dropped.
Fixing this problem requires a slight semantic change. The broker is now
more strict in what it allows specifically with regards to
auto-creation. If, for example, a JMS application attempts to send a
message to a topic and the corresponding multicast address doesn't exist
already or the broker cannot automatically create it or update it then
sending the message will fail.
Also, part of this commit moves a chunk of auto-create logic into
ServerSession and adds an enum for auto-create results. Aside from
helping fix this specific issue this can serve as a foundation for
de-duplicating the auto-create logic spread across many of the protocol
implementations.
- activemq.notifications are being transferred to the target node, unless an ignore is setup
- topics are being duplicated after redistribution
- topics sends are being duplicated when a 2 node cluster mirrors to another 2 node cluster, and both nodes are mirrored.
- interrupted message breaking reference counting
After the server writing to the client is interrupted in AMQP, the reference counting was broken what would require the server restarted
in order to cleanup the files of any interrupted sends.
- Removed consumer during large message delivery damaging large messages
If the consumer failed to deliver messages for any reason, the message on the queue would be duplicated. what would wipe out the body of the message
and other journal errors would happen because of this.
extra debug capabilities added into RefCountMessage as part of ARTEMIS-4206 in order to identify these issues
This commit fixes the following things:
- Moves connection audit logging to the resource audit logger instead
of using a dedicated logger as that would adversely impact upgrading
users, and arguably didn't make sense in the first place.
- Mitigates an potential NPE w.r.t. connection ID.
- Updates the "dummy" management connection to return a valid
connection ID.
This fix will delay the message.copy to the redistributor itself.
Meaning no copy would be performed if the redistribution itself failed.
No need to remove a copy any longer
The federated queue consumer has to generate a new id for the messages
received from the upstream broker because they have an id generated by
the store manager of the upstream broker.
Co-authored-by: Clebert Suconic <clebertsuconic@apache.org>
- redistribute received the handle call, it then copies the message
- the routing table changes
- the message is left behind
With the new version of the server these messages will be removed. But we should remove these right away
Basically I started the testsuite and attached check leak with "java -jar check-leak.jar --pid <pid> --report testsuite-report --sleep 1000" and saw the allocations of this were pretty high.
I have seen a werid intermittent failure in the testsuite
caused by some race where the EMPTY_SET ends up Not Empty! Causing weird failures that were really difficult to be investigated
This fix is scanning journal and paging for existing large messages. We will remove any large messages that do not have a corresponding record in journals or paging.
This is just an annoying thing. as the functionality would work properly if there was a previous value.
No other harm other than the cannot find TX record on startup for this, hence I'm fixing it.
There are certain use-cases where addresses will be auto-created and
never have a direct binding created on them. Because of this they will
never be auto-deleted. If a large number of these addresses build up
they will consume a problematic amount of heap space.
One specific example of this use-case is an MQTT subscriber with a
wild-card subscription and a large number of MQTT producers sending one
or two messages a large number of different MQTT topics covered by the
wild-card. Since no bindings are ever created on any of these individual
addresses (e.g. from a subscription queue) they will never be
auto-deleted, but they will eventually consume a large amount of heap.
The only way to deal with these addresses is to manually delete them.
There are also situations where queues may be created and never have
any messages sent to them or never have a consumer connect. These
queues will never be auto-deleted so they must be deleted manually.
This commit adds the ability to configure the broker to skip the usage
check so that these kinds of addresses and queues can be deleted
automatically.
there are two leaks here:
* QueueImpl::delivery might create a new iterator if a delivery happens right after a consumer was removed, and that iterator might belog to a consumer that was already closed
as a result of that, the iterator may leak messages and hold references until a reboot is done. I have seen scenarios where messages would not be dleivered because of this.
* ProtonTransaction holding references: the last transaction might hold messages in the memory longer than expected. In tests I have performed the messages were accumulating in memory. and I cleared it here.
The issue identified with AMQP was under Transaction usage, and while opening and closing sessions.
It seems the leak would be released once the connection is closed.
We added a new testsuite under ./tests/leak-tests To fix and validate these issues
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In reality this situation is pretty unlikely due to the timing involved,
but the possibility still exists. Currently the file lock held by the
primary broker on the NFS share is essentially worthless in this
situation. This commit adds logic by which the timestamp of the lock
file is updated during activation and then routinely checked during
runtime to ensure consistency. This effectively mitigates split-brain in
this situation (and likely others). Here's how it works now.
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for
the replicated use-case. This commit applies the protection for
removeMember() so that the Topology remains intact.
There are no tests for these changes as I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases so this is not a
departure from the norm.
I am adding three attributes to Address-settings:
* page-limit-bytes: Number of bytes. We will convert this metric into max number of pages internally by dividing max-bytes / page-size. It will allow a max based on an estimate.
* page-limit-messages: Number of messages
* page-full-message-policy: fail or drop
We will now allow paging, until these max values and then fail or drop messages.
Once these values are retracted, the address will remain full until a period where cleanup is kicked in by paging. So these values may have a certain delay on being applied, but they should always be cleared once cleanup happened.
Moves embedded XSD element definitions and associated complexTypes
from XSD element definitions to top-level types available for XML
schema validation in IDEs.
I am adding an option sync=true or false on mirror. if sync, any client blocking operation will wait a roundtrip to the mirror
acting like a sync replica.
When the last non-durable subscriber on a JMS topic disconnects the
corresponding queue representing the subscription is deleted as
expected. However, the queue's address will also be deleted no matter
what, which is *not* expected.
Some LDAP servers (e.g. OpenLDAP) do not support the "persistent search"
feature and therefore the existing "listener" feature does not actually
fetch updates. This commit implements a "pull" feature controlled by a
configurable interval equivalent to what is implemented in the cached
LDAP authorization module from ActiveMQ "Classic."
Allow setting id-cache-size to 0 from broker.xml and ensure the broker
handles this gracefully. Previously you could only set the cache size to
0 via broker properties or programmatically and it would throw an
ArrayIndexOutOfBoundsException when adding an item to the cache.
- From now on we will save snapshots of page-counters on the journal (basically for compatibility with previous verions).
And we will recount the records on startup.
- While the rebuild is being done the value from the previous snapshot is still available with current updates.
when cancelling a large number of messages, the addSorted could be holding a lock for too long causing the server to crash under CriticalAnalyzer
co-authored: AntonRoskvist <anton.roskvist@volvo.com> (discovering the issue and providing the test ClientCrashMassiveRollbackTest.java)
In order to improve trouble-shooting for the MetricsManager there should
be additional logging and exceptions. In all, this commit contains the
following changes:
- Additional logging
- Throw an exception when registering meters if meters already exist
- Rename a few variables & methods to more clearly identify what they
are used for
- Upgrade Micrometer to 1.9.5
- Simplify/clarify a few blocks of code
- No longer pass the ManagementServiceImpl when registering the
metrics, but instead pass the Object the meter is observing (e.g.
broker, address, or queue)
- optimize startup time on paging (check-depage on startup)
- otpimize getNextPage() on complete pages
- optimize getFirstMessage() and paging. (avoid iterator usage)
Attempt to standardize all Logger declaration to a singular variable name
which makes the code more consistent and make finding usages of loggers in
the code a bit easier.
Logger statements should use formatting syntax and let the normal framework checks take care of
checking if a logger is enabled instead of string concats and isXEnabled logger checks except
in cases there is known expense to the specifc logging message/arg preparation or passing.
Changes from myself and Robbie Gemmell.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The issue is that depage should not put pages on the used pages as they were not actually intended to read.
instead I should create a newPageObject and not use the RefCounts caching.
In commit a9a85f98db I removed the code
which modified existing matches. However, I forgot that the matches read
from LDAP are often duplicated so instead of always adding a new match
this commit ensures that the *right* match is modified rather than a
potentially more generic wildcard match (which was the original
problem).
If an AMQP consumer tries to receive a message and the broker is unable
to convert the message from core to AMQP then the consumer is
disconnected and the offending message stays in the queue. When the
consumer reconnects the conversion error will happen again resulting in
a loop that can only be resolved through administrative action (e.g.
deleting the message manually or sending it to a dead letter address).
This commit fixes that problem by detecting the conversion problem and
sending the message to the queue's dead letter address. It also doesn't
disconnect the consumer.
This commit also changes the log messages associated with sending a
message to the dead letter address since this event can now occur
regardless of the delivery attempts.
Incorrect handling of unknown values in selectors.
There is a slight semantic change here due to an error in the way we
were handling null identifiers. This may require a change in selector
syntax to use "IS NULL" or "IS NOT NULL" when using identifiers which
may be null in the message being selected.
This was the case for an internal filter used by the cluster connection
bridge to select which cluster notification messages to consume.
See https://issues.apache.org/jira/browse/AMQ-5281 for more details.
The map used by LastValueQueue was inadvertently changed to a
non-thread-safe implementation in
4a4765c39c. This resulted in an occasional
ConcurrentModificationException from the hashCode implementation.
This commit restores the thread-safe map implementation and adds a test
which brute-forces a CME when using the non-thread-safe implementation.
When the LegacyLDAPSecuritySettingPlugin has enableListener set to true
and a new permission is added it will try to modify the existing match
if one exists. This is problematic if there's a more generic wildcard
match than the specific one that's modified.
This commit fixes that problem so that instead of modifying the existing
match(es) it simply adds a new one. The plugin never should have tried
modifying the existing match in the first place as two identical matches
would be a configuration error.
- Fixing RoleInfo to provide informations on deleteAddress.
- Adding more coverage on test to check the number of permissions
returned.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>
By allowing to pass caller's classname directly to org.apache.activemq.artemis.utils.ActiveMQThreadFactory#defaultThreadFactory instead of calculating it from stack.
RoleSet.class.getMethods() returns the same methods on both OpenJDK 11 and
OpenJ9 JDK 11 but the order is different. OpenJDK 11 returns
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add` before
`public boolean java.util.HashSet.add` while OpenJ9 JDK 11 returns
`public boolean java.util.HashSet.add` before
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add`
Due to the changes in 682f505e32 we now
send "Last Will & Testament" MQTT messages via ServerSession. This means
sending will fail if the disk is full. For MQTT this triggers a
connection failure which in turns triggers sending an LWT message. This
process will recurse infinitely until it results in a
java.lang.StackOverflowError.
This commit fixes that by tracking whether or not sending a LWT message
is already in progress.
org.apache.activemq.artemis.spi.core.protocol.RemotingConnection has a
number of implementations most notably an abstract version which
provides many methods shared among the implementations. The sharing
could be improved to eliminate duplicate code.
This commit eliminates more than 700 lines of unnecessary code.
There should be no semantic changes.
When a message is sent to an anycast queue via FQQN on one node of a
cluster and then a consumer is created on that same anycast queue via
FQQN on another node in the cluster the message is not redistributed to
the node with the consumer.
This commit fixes this use-case primarily by including the FQQN info in
the notification messages sent to other nodes in the cluster.
Messages without a last-value property sent to an LVQ are being pruned
rather than just passing through. Only messages with a non-null
last-value property should be subject to pruning.
Running HorizontalPagingTest with these variables would make the test to fail unless these changes are applied.
export TEST_HORIZONTAL_SERVER_START_TIMEOUT=300000
export TEST_HORIZONTAL_TIMEOUT_MINUTES=120
export TEST_HORIZONTAL_PROTOCOL_LIST=OPENWIRE
export TEST_HORIZONTAL_OPENWIRE_DESTINATIONS=200
export TEST_HORIZONTAL_OPENWIRE_MESSAGES=1000
export TEST_HORIZONTAL_OPENWIRE_COMMIT_INTERVAL=100
export TEST_HORIZONTAL_OPENWIRE_RECEIVE_COMMIT_INTERVAL=0
export TEST_HORIZONTAL_OPENWIRE_MESSAGE_SIZE=20000
export TEST_HORIZONTAL_OPENWIRE_PARALLEL_SENDS=10
We were lucky that processReferences was pretty much a static operation, hence I am moving it away from RoutingContext both for clarity and avoiding state being needed on that method.
If state was needed as part of processReferences you would had a pretty nasty bug as the IOCallback could introduce a race where a send would change the state while the IO was pending.
Both audit logging and logging from the LoggingActiveMQServerPlugin are
unclear as they relate to transactional sends and acks. Both essentially
ignore the transaction which makes it appear that an operation has taken
place when, in fact, it hasn't (e.g. a transactional ack is rolled back
but the log indicates the ack went through).
This commit fix this with the following changes:
- Log details when a send or ack is added to a transaction.
- Log details when the transaction is committed.
- Log when the transaction is rolled back.
- Include transaction details in the relevant DEBUG logs.
- Simplify INFO level logging for sends & acks in
LoggingActiveMQServerPlugin. Ensure details are in the DEBUG logs.
Other changes:
- Make capitalization more consistent in a handful of audit logs.
AddressControl has 2 methods to get same metric. Both
getNumberOfMessages() and getMessageCount() return the same metric
albeit in different ways.
Also, getNumberOfMessages() inspects both "local" and "remote" queue
bindings which is wrong.
This commit fixes these issues via the following changes:
- Deprecate getNumberOfMessages().
- Change getNumberOfMessages() to invoke getMessageCount().
- Add a test to ensure getNumberOfMessages() does not count remote
queue bindings.
- Simplify getMessageCount(DurabilityType).
This is caused by too many entries on the HashMap for ThreadLocals.
Also: I'm reviewing some readlock usage on the StorageManager to simplify things a little bit.
It would be useful to be able to cycle the embedded web server if, for
example, one needed to renew the SSL certificates. To support
functionality I made a handful of changes, e.g.:
- Refactoring WebServerComponent so that all the necessary
configuration would happen in the start() method.
- Refactoring WebServerComponentTest to re-use code.
Allow replication only certain addresses with mirror controller.
The configuration is similar to cluster address configuration.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The control existing in Redistributor is not needed as the Queue::deliver will already have a control on re-scheduling the loop and avoid holding references for too long.
This PR adds a "Validated User" column to the Hawtio plugin. The column is
not visible by default. The "Validated User" column is available in the
Session, Consumer, and Producer tabs of the artemis-navigation notebook.
"Validated User" is also available for sort and filtering in those view.
The "Validated Consumer" in the Producer view is always blank. I debugged
as far as I could, but did not find the cause of the issue.
It is possible to receive a compressed message from the client as regular message. Such a message will already contain correct body size, that takes compression into account.
Older versions of Openwire clients wil be affected by AMQ-6431.
As a result of the issue if the ID of the message>Integer.MAX_VALUE
a consumer configured with Failover and doing duplicate detection on the client
will not be able to process duplicate detection accordingly and miss messages.
Paging only removes files at the beginning of the stream...
Say you have paged files 1 through 1000...
if all the messages are ack, but one message on file 1 is missing an ack, all the 999 subsequent files would not be removed until all the messages on file 1 is ack.
This was working as engineered, but sometimes devs don't have complete control on their app.
With this improvement we will now remove messages in the middle of the stream as well.
There is also some improvement to how browsing and page work with this
The utility methods in
`org.apache.activemq.artemis.core.management.impl.MBeanInfoHelper` are
executed *a lot* - especially for Jolokia which is used by the web
console. The `MBeanOperationInfo` and `MBeanAttributeInfo` results are
static and reflection is slow therefore they should not be calculated
over and over again. Rather they should be calculated once and cached
for later use.
Caching these results significantly improves performance. Over the
course of 1,000,000 invocations the difference is several orders of
magnitude. This improves usability substantially when dealing with,
for example, tens of thousands of addresses and/or queues.
When using a temporary queue with a `temporary-queue-namespace` the
`AddressSettings` lookup wasn't correct. This commit fixes that and
refactors `QueueImpl` a bit so that it holds a copy of its
`AddressSettings` rather than looking them up all the time. If any
relevant `AddressSettings` changes the
`HierarchicalRepositoryChangeListener` implementation will still
refresh the `QueueImpl` appropriately.
The `QueueControlImpl` was likewise changed to get the dead-letter
address and expiry address directly from the `QueueImpl` rather than
looking them up in the `AddressSettings` repository.
I modified some code that came from ARTEMIS-734, but I ran the test that
was associated with that Jira (i.e.
`o.a.a.a.t.i.c.d.ExpireWhileLoadBalanceTest`) and it passed so I think
that should be fine. There actually was no test included with the
original commit. One was added later so it's hard to say for sure it
exactly captures the original issue.
It sometimes makes sense to set an acceptor's port to 0 to allow the JVM
to select an ephemeral port (e.g. in embedded integration tests). This
commit adds a new getter on NettyAcceptor so tests can programmtically
determine the actual port used by the acceptor.
This commit also changes the ACCEPTOR_STARTED notification and the
related logging to clarify the actual port value where clients can
connect.
The auto-create-jms-queues, auto-delete-jms-queues,
auto-create-jms-topics, and auto-delete-jms-topics address settings
were deprecated in ARTEMIS-881 way back in 2016. There's no need to keep
them in the default broker.xml at this point.
JGroups 3.x hasn't been updated in some time now. The last release was
in April 2020 almost 2 years ago. Lots of protocols have been updated
and added and users are wanting to use them. There is also increasing
concern about using older components triggered mainly by other
recently-discovered high-profile vulnerabilities in the wider Open
Source Java community.
This commit bumps JGroups up to the latest release - 5.2.0.Final.
However, there is a cost associated with upgrading.
The old-style properties configuration is no longer supported. I think
it's unlikely that end-users are leveraging this because it is not
exposed via broker.xml. The JGroups XML configuration has been around
for a long time, is widely adopted, and is still supported. I expect
most (if not all) users are using this. However, a handful of tests
needed to be updated and/or removed to deal with this absence.
Some protocols and/or protocol properties are no longer supported. This
means that users may have to change their JGroups stack configurations
when they upgrade. For example, our own clustered-jgroups example had to
be updated or it wouldn't run properly.
MQTT 5 is an OASIS standard which debuted in March 2019. It boasts
numerous improvments over its predecessor (i.e. MQTT 3.1.1) which will
benefit users. These improvements are summarized in the specification
at:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901293
The specification describes all the behavior necessary for a client or
server to conform. The spec is highlighted with special "normative"
conformance statements which distill the descriptions into concise
terms. The specification provides a helpful summary of all these
statements. See:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901292
This commit implements all of the mandatory elements from the
specification and provides tests which are identified using the
corresponding normative conformance statement. All normative
conformance statements either have an explicit test or are noted in
comments with an explanation of why an explicit test doesn't exist. See
org.apache.activemq.artemis.tests.integration.mqtt5 for all those
details.
This commit also includes documentation about how to configure
everything related to the new MQTT 5 features.
The address-setting config-delete-diverts is not being applied correctly
hierarchically because it's not included in the merge() method. It is
also not being persisted to disk either. This commit fixes both issues.