when cancelling a large number of messages, the addSorted could be holding a lock for too long causing the server to crash under CriticalAnalyzer
co-authored: AntonRoskvist <anton.roskvist@volvo.com> (discovering the issue and providing the test ClientCrashMassiveRollbackTest.java)
In order to improve trouble-shooting for the MetricsManager there should
be additional logging and exceptions. In all, this commit contains the
following changes:
- Additional logging
- Throw an exception when registering meters if meters already exist
- Rename a few variables & methods to more clearly identify what they
are used for
- Upgrade Micrometer to 1.9.5
- Simplify/clarify a few blocks of code
- No longer pass the ManagementServiceImpl when registering the
metrics, but instead pass the Object the meter is observing (e.g.
broker, address, or queue)
- optimize startup time on paging (check-depage on startup)
- otpimize getNextPage() on complete pages
- optimize getFirstMessage() and paging. (avoid iterator usage)
Attempt to standardize all Logger declaration to a singular variable name
which makes the code more consistent and make finding usages of loggers in
the code a bit easier.
Logger statements should use formatting syntax and let the normal framework checks take care of
checking if a logger is enabled instead of string concats and isXEnabled logger checks except
in cases there is known expense to the specifc logging message/arg preparation or passing.
Changes from myself and Robbie Gemmell.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The issue is that depage should not put pages on the used pages as they were not actually intended to read.
instead I should create a newPageObject and not use the RefCounts caching.
In commit a9a85f98db I removed the code
which modified existing matches. However, I forgot that the matches read
from LDAP are often duplicated so instead of always adding a new match
this commit ensures that the *right* match is modified rather than a
potentially more generic wildcard match (which was the original
problem).
If an AMQP consumer tries to receive a message and the broker is unable
to convert the message from core to AMQP then the consumer is
disconnected and the offending message stays in the queue. When the
consumer reconnects the conversion error will happen again resulting in
a loop that can only be resolved through administrative action (e.g.
deleting the message manually or sending it to a dead letter address).
This commit fixes that problem by detecting the conversion problem and
sending the message to the queue's dead letter address. It also doesn't
disconnect the consumer.
This commit also changes the log messages associated with sending a
message to the dead letter address since this event can now occur
regardless of the delivery attempts.
Incorrect handling of unknown values in selectors.
There is a slight semantic change here due to an error in the way we
were handling null identifiers. This may require a change in selector
syntax to use "IS NULL" or "IS NOT NULL" when using identifiers which
may be null in the message being selected.
This was the case for an internal filter used by the cluster connection
bridge to select which cluster notification messages to consume.
See https://issues.apache.org/jira/browse/AMQ-5281 for more details.
The map used by LastValueQueue was inadvertently changed to a
non-thread-safe implementation in
4a4765c39c. This resulted in an occasional
ConcurrentModificationException from the hashCode implementation.
This commit restores the thread-safe map implementation and adds a test
which brute-forces a CME when using the non-thread-safe implementation.
When the LegacyLDAPSecuritySettingPlugin has enableListener set to true
and a new permission is added it will try to modify the existing match
if one exists. This is problematic if there's a more generic wildcard
match than the specific one that's modified.
This commit fixes that problem so that instead of modifying the existing
match(es) it simply adds a new one. The plugin never should have tried
modifying the existing match in the first place as two identical matches
would be a configuration error.
- Fixing RoleInfo to provide informations on deleteAddress.
- Adding more coverage on test to check the number of permissions
returned.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>
By allowing to pass caller's classname directly to org.apache.activemq.artemis.utils.ActiveMQThreadFactory#defaultThreadFactory instead of calculating it from stack.
RoleSet.class.getMethods() returns the same methods on both OpenJDK 11 and
OpenJ9 JDK 11 but the order is different. OpenJDK 11 returns
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add` before
`public boolean java.util.HashSet.add` while OpenJ9 JDK 11 returns
`public boolean java.util.HashSet.add` before
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add`
Due to the changes in 682f505e32 we now
send "Last Will & Testament" MQTT messages via ServerSession. This means
sending will fail if the disk is full. For MQTT this triggers a
connection failure which in turns triggers sending an LWT message. This
process will recurse infinitely until it results in a
java.lang.StackOverflowError.
This commit fixes that by tracking whether or not sending a LWT message
is already in progress.
org.apache.activemq.artemis.spi.core.protocol.RemotingConnection has a
number of implementations most notably an abstract version which
provides many methods shared among the implementations. The sharing
could be improved to eliminate duplicate code.
This commit eliminates more than 700 lines of unnecessary code.
There should be no semantic changes.
When a message is sent to an anycast queue via FQQN on one node of a
cluster and then a consumer is created on that same anycast queue via
FQQN on another node in the cluster the message is not redistributed to
the node with the consumer.
This commit fixes this use-case primarily by including the FQQN info in
the notification messages sent to other nodes in the cluster.
Messages without a last-value property sent to an LVQ are being pruned
rather than just passing through. Only messages with a non-null
last-value property should be subject to pruning.
Running HorizontalPagingTest with these variables would make the test to fail unless these changes are applied.
export TEST_HORIZONTAL_SERVER_START_TIMEOUT=300000
export TEST_HORIZONTAL_TIMEOUT_MINUTES=120
export TEST_HORIZONTAL_PROTOCOL_LIST=OPENWIRE
export TEST_HORIZONTAL_OPENWIRE_DESTINATIONS=200
export TEST_HORIZONTAL_OPENWIRE_MESSAGES=1000
export TEST_HORIZONTAL_OPENWIRE_COMMIT_INTERVAL=100
export TEST_HORIZONTAL_OPENWIRE_RECEIVE_COMMIT_INTERVAL=0
export TEST_HORIZONTAL_OPENWIRE_MESSAGE_SIZE=20000
export TEST_HORIZONTAL_OPENWIRE_PARALLEL_SENDS=10
We were lucky that processReferences was pretty much a static operation, hence I am moving it away from RoutingContext both for clarity and avoiding state being needed on that method.
If state was needed as part of processReferences you would had a pretty nasty bug as the IOCallback could introduce a race where a send would change the state while the IO was pending.
Both audit logging and logging from the LoggingActiveMQServerPlugin are
unclear as they relate to transactional sends and acks. Both essentially
ignore the transaction which makes it appear that an operation has taken
place when, in fact, it hasn't (e.g. a transactional ack is rolled back
but the log indicates the ack went through).
This commit fix this with the following changes:
- Log details when a send or ack is added to a transaction.
- Log details when the transaction is committed.
- Log when the transaction is rolled back.
- Include transaction details in the relevant DEBUG logs.
- Simplify INFO level logging for sends & acks in
LoggingActiveMQServerPlugin. Ensure details are in the DEBUG logs.
Other changes:
- Make capitalization more consistent in a handful of audit logs.
AddressControl has 2 methods to get same metric. Both
getNumberOfMessages() and getMessageCount() return the same metric
albeit in different ways.
Also, getNumberOfMessages() inspects both "local" and "remote" queue
bindings which is wrong.
This commit fixes these issues via the following changes:
- Deprecate getNumberOfMessages().
- Change getNumberOfMessages() to invoke getMessageCount().
- Add a test to ensure getNumberOfMessages() does not count remote
queue bindings.
- Simplify getMessageCount(DurabilityType).
This is caused by too many entries on the HashMap for ThreadLocals.
Also: I'm reviewing some readlock usage on the StorageManager to simplify things a little bit.
It would be useful to be able to cycle the embedded web server if, for
example, one needed to renew the SSL certificates. To support
functionality I made a handful of changes, e.g.:
- Refactoring WebServerComponent so that all the necessary
configuration would happen in the start() method.
- Refactoring WebServerComponentTest to re-use code.
Allow replication only certain addresses with mirror controller.
The configuration is similar to cluster address configuration.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The control existing in Redistributor is not needed as the Queue::deliver will already have a control on re-scheduling the loop and avoid holding references for too long.
This PR adds a "Validated User" column to the Hawtio plugin. The column is
not visible by default. The "Validated User" column is available in the
Session, Consumer, and Producer tabs of the artemis-navigation notebook.
"Validated User" is also available for sort and filtering in those view.
The "Validated Consumer" in the Producer view is always blank. I debugged
as far as I could, but did not find the cause of the issue.
It is possible to receive a compressed message from the client as regular message. Such a message will already contain correct body size, that takes compression into account.
Older versions of Openwire clients wil be affected by AMQ-6431.
As a result of the issue if the ID of the message>Integer.MAX_VALUE
a consumer configured with Failover and doing duplicate detection on the client
will not be able to process duplicate detection accordingly and miss messages.
Paging only removes files at the beginning of the stream...
Say you have paged files 1 through 1000...
if all the messages are ack, but one message on file 1 is missing an ack, all the 999 subsequent files would not be removed until all the messages on file 1 is ack.
This was working as engineered, but sometimes devs don't have complete control on their app.
With this improvement we will now remove messages in the middle of the stream as well.
There is also some improvement to how browsing and page work with this
The utility methods in
`org.apache.activemq.artemis.core.management.impl.MBeanInfoHelper` are
executed *a lot* - especially for Jolokia which is used by the web
console. The `MBeanOperationInfo` and `MBeanAttributeInfo` results are
static and reflection is slow therefore they should not be calculated
over and over again. Rather they should be calculated once and cached
for later use.
Caching these results significantly improves performance. Over the
course of 1,000,000 invocations the difference is several orders of
magnitude. This improves usability substantially when dealing with,
for example, tens of thousands of addresses and/or queues.
When using a temporary queue with a `temporary-queue-namespace` the
`AddressSettings` lookup wasn't correct. This commit fixes that and
refactors `QueueImpl` a bit so that it holds a copy of its
`AddressSettings` rather than looking them up all the time. If any
relevant `AddressSettings` changes the
`HierarchicalRepositoryChangeListener` implementation will still
refresh the `QueueImpl` appropriately.
The `QueueControlImpl` was likewise changed to get the dead-letter
address and expiry address directly from the `QueueImpl` rather than
looking them up in the `AddressSettings` repository.
I modified some code that came from ARTEMIS-734, but I ran the test that
was associated with that Jira (i.e.
`o.a.a.a.t.i.c.d.ExpireWhileLoadBalanceTest`) and it passed so I think
that should be fine. There actually was no test included with the
original commit. One was added later so it's hard to say for sure it
exactly captures the original issue.
It sometimes makes sense to set an acceptor's port to 0 to allow the JVM
to select an ephemeral port (e.g. in embedded integration tests). This
commit adds a new getter on NettyAcceptor so tests can programmtically
determine the actual port used by the acceptor.
This commit also changes the ACCEPTOR_STARTED notification and the
related logging to clarify the actual port value where clients can
connect.
The auto-create-jms-queues, auto-delete-jms-queues,
auto-create-jms-topics, and auto-delete-jms-topics address settings
were deprecated in ARTEMIS-881 way back in 2016. There's no need to keep
them in the default broker.xml at this point.
JGroups 3.x hasn't been updated in some time now. The last release was
in April 2020 almost 2 years ago. Lots of protocols have been updated
and added and users are wanting to use them. There is also increasing
concern about using older components triggered mainly by other
recently-discovered high-profile vulnerabilities in the wider Open
Source Java community.
This commit bumps JGroups up to the latest release - 5.2.0.Final.
However, there is a cost associated with upgrading.
The old-style properties configuration is no longer supported. I think
it's unlikely that end-users are leveraging this because it is not
exposed via broker.xml. The JGroups XML configuration has been around
for a long time, is widely adopted, and is still supported. I expect
most (if not all) users are using this. However, a handful of tests
needed to be updated and/or removed to deal with this absence.
Some protocols and/or protocol properties are no longer supported. This
means that users may have to change their JGroups stack configurations
when they upgrade. For example, our own clustered-jgroups example had to
be updated or it wouldn't run properly.
MQTT 5 is an OASIS standard which debuted in March 2019. It boasts
numerous improvments over its predecessor (i.e. MQTT 3.1.1) which will
benefit users. These improvements are summarized in the specification
at:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901293
The specification describes all the behavior necessary for a client or
server to conform. The spec is highlighted with special "normative"
conformance statements which distill the descriptions into concise
terms. The specification provides a helpful summary of all these
statements. See:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901292
This commit implements all of the mandatory elements from the
specification and provides tests which are identified using the
corresponding normative conformance statement. All normative
conformance statements either have an explicit test or are noted in
comments with an explanation of why an explicit test doesn't exist. See
org.apache.activemq.artemis.tests.integration.mqtt5 for all those
details.
This commit also includes documentation about how to configure
everything related to the new MQTT 5 features.
The address-setting config-delete-diverts is not being applied correctly
hierarchically because it's not included in the merge() method. It is
also not being persisted to disk either. This commit fixes both issues.
scenario - avoid paging, if address is full chain another broker and produce to the head, consume from the tail using producer and consumer roles to partition connections. When tail is drained, drop it.
- adds a option to treat an idle consumer as slow
- adds basic support for credit based address blocking ARTEMIS-2097
- adds some more visiblity to address memory usage and balancer attribute modifier operations