I have seen a werid intermittent failure in the testsuite
caused by some race where the EMPTY_SET ends up Not Empty! Causing weird failures that were really difficult to be investigated
This fix is scanning journal and paging for existing large messages. We will remove any large messages that do not have a corresponding record in journals or paging.
This is just an annoying thing. as the functionality would work properly if there was a previous value.
No other harm other than the cannot find TX record on startup for this, hence I'm fixing it.
There are certain use-cases where addresses will be auto-created and
never have a direct binding created on them. Because of this they will
never be auto-deleted. If a large number of these addresses build up
they will consume a problematic amount of heap space.
One specific example of this use-case is an MQTT subscriber with a
wild-card subscription and a large number of MQTT producers sending one
or two messages a large number of different MQTT topics covered by the
wild-card. Since no bindings are ever created on any of these individual
addresses (e.g. from a subscription queue) they will never be
auto-deleted, but they will eventually consume a large amount of heap.
The only way to deal with these addresses is to manually delete them.
There are also situations where queues may be created and never have
any messages sent to them or never have a consumer connect. These
queues will never be auto-deleted so they must be deleted manually.
This commit adds the ability to configure the broker to skip the usage
check so that these kinds of addresses and queues can be deleted
automatically.
there are two leaks here:
* QueueImpl::delivery might create a new iterator if a delivery happens right after a consumer was removed, and that iterator might belog to a consumer that was already closed
as a result of that, the iterator may leak messages and hold references until a reboot is done. I have seen scenarios where messages would not be dleivered because of this.
* ProtonTransaction holding references: the last transaction might hold messages in the memory longer than expected. In tests I have performed the messages were accumulating in memory. and I cleared it here.
The issue identified with AMQP was under Transaction usage, and while opening and closing sessions.
It seems the leak would be released once the connection is closed.
We added a new testsuite under ./tests/leak-tests To fix and validate these issues
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In reality this situation is pretty unlikely due to the timing involved,
but the possibility still exists. Currently the file lock held by the
primary broker on the NFS share is essentially worthless in this
situation. This commit adds logic by which the timestamp of the lock
file is updated during activation and then routinely checked during
runtime to ensure consistency. This effectively mitigates split-brain in
this situation (and likely others). Here's how it works now.
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for
the replicated use-case. This commit applies the protection for
removeMember() so that the Topology remains intact.
There are no tests for these changes as I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases so this is not a
departure from the norm.
I am adding three attributes to Address-settings:
* page-limit-bytes: Number of bytes. We will convert this metric into max number of pages internally by dividing max-bytes / page-size. It will allow a max based on an estimate.
* page-limit-messages: Number of messages
* page-full-message-policy: fail or drop
We will now allow paging, until these max values and then fail or drop messages.
Once these values are retracted, the address will remain full until a period where cleanup is kicked in by paging. So these values may have a certain delay on being applied, but they should always be cleared once cleanup happened.
Moves embedded XSD element definitions and associated complexTypes
from XSD element definitions to top-level types available for XML
schema validation in IDEs.
I am adding an option sync=true or false on mirror. if sync, any client blocking operation will wait a roundtrip to the mirror
acting like a sync replica.
When the last non-durable subscriber on a JMS topic disconnects the
corresponding queue representing the subscription is deleted as
expected. However, the queue's address will also be deleted no matter
what, which is *not* expected.
Some LDAP servers (e.g. OpenLDAP) do not support the "persistent search"
feature and therefore the existing "listener" feature does not actually
fetch updates. This commit implements a "pull" feature controlled by a
configurable interval equivalent to what is implemented in the cached
LDAP authorization module from ActiveMQ "Classic."
Allow setting id-cache-size to 0 from broker.xml and ensure the broker
handles this gracefully. Previously you could only set the cache size to
0 via broker properties or programmatically and it would throw an
ArrayIndexOutOfBoundsException when adding an item to the cache.
- From now on we will save snapshots of page-counters on the journal (basically for compatibility with previous verions).
And we will recount the records on startup.
- While the rebuild is being done the value from the previous snapshot is still available with current updates.
when cancelling a large number of messages, the addSorted could be holding a lock for too long causing the server to crash under CriticalAnalyzer
co-authored: AntonRoskvist <anton.roskvist@volvo.com> (discovering the issue and providing the test ClientCrashMassiveRollbackTest.java)
In order to improve trouble-shooting for the MetricsManager there should
be additional logging and exceptions. In all, this commit contains the
following changes:
- Additional logging
- Throw an exception when registering meters if meters already exist
- Rename a few variables & methods to more clearly identify what they
are used for
- Upgrade Micrometer to 1.9.5
- Simplify/clarify a few blocks of code
- No longer pass the ManagementServiceImpl when registering the
metrics, but instead pass the Object the meter is observing (e.g.
broker, address, or queue)
- optimize startup time on paging (check-depage on startup)
- otpimize getNextPage() on complete pages
- optimize getFirstMessage() and paging. (avoid iterator usage)
Attempt to standardize all Logger declaration to a singular variable name
which makes the code more consistent and make finding usages of loggers in
the code a bit easier.
Logger statements should use formatting syntax and let the normal framework checks take care of
checking if a logger is enabled instead of string concats and isXEnabled logger checks except
in cases there is known expense to the specifc logging message/arg preparation or passing.
Changes from myself and Robbie Gemmell.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The issue is that depage should not put pages on the used pages as they were not actually intended to read.
instead I should create a newPageObject and not use the RefCounts caching.
In commit a9a85f98db I removed the code
which modified existing matches. However, I forgot that the matches read
from LDAP are often duplicated so instead of always adding a new match
this commit ensures that the *right* match is modified rather than a
potentially more generic wildcard match (which was the original
problem).
If an AMQP consumer tries to receive a message and the broker is unable
to convert the message from core to AMQP then the consumer is
disconnected and the offending message stays in the queue. When the
consumer reconnects the conversion error will happen again resulting in
a loop that can only be resolved through administrative action (e.g.
deleting the message manually or sending it to a dead letter address).
This commit fixes that problem by detecting the conversion problem and
sending the message to the queue's dead letter address. It also doesn't
disconnect the consumer.
This commit also changes the log messages associated with sending a
message to the dead letter address since this event can now occur
regardless of the delivery attempts.
Incorrect handling of unknown values in selectors.
There is a slight semantic change here due to an error in the way we
were handling null identifiers. This may require a change in selector
syntax to use "IS NULL" or "IS NOT NULL" when using identifiers which
may be null in the message being selected.
This was the case for an internal filter used by the cluster connection
bridge to select which cluster notification messages to consume.
See https://issues.apache.org/jira/browse/AMQ-5281 for more details.
The map used by LastValueQueue was inadvertently changed to a
non-thread-safe implementation in
4a4765c39c. This resulted in an occasional
ConcurrentModificationException from the hashCode implementation.
This commit restores the thread-safe map implementation and adds a test
which brute-forces a CME when using the non-thread-safe implementation.
When the LegacyLDAPSecuritySettingPlugin has enableListener set to true
and a new permission is added it will try to modify the existing match
if one exists. This is problematic if there's a more generic wildcard
match than the specific one that's modified.
This commit fixes that problem so that instead of modifying the existing
match(es) it simply adds a new one. The plugin never should have tried
modifying the existing match in the first place as two identical matches
would be a configuration error.
- Fixing RoleInfo to provide informations on deleteAddress.
- Adding more coverage on test to check the number of permissions
returned.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>
By allowing to pass caller's classname directly to org.apache.activemq.artemis.utils.ActiveMQThreadFactory#defaultThreadFactory instead of calculating it from stack.
RoleSet.class.getMethods() returns the same methods on both OpenJDK 11 and
OpenJ9 JDK 11 but the order is different. OpenJDK 11 returns
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add` before
`public boolean java.util.HashSet.add` while OpenJ9 JDK 11 returns
`public boolean java.util.HashSet.add` before
`public void org.apache.activemq.artemis.core.config.impl.RoleSet.add`
Due to the changes in 682f505e32 we now
send "Last Will & Testament" MQTT messages via ServerSession. This means
sending will fail if the disk is full. For MQTT this triggers a
connection failure which in turns triggers sending an LWT message. This
process will recurse infinitely until it results in a
java.lang.StackOverflowError.
This commit fixes that by tracking whether or not sending a LWT message
is already in progress.
org.apache.activemq.artemis.spi.core.protocol.RemotingConnection has a
number of implementations most notably an abstract version which
provides many methods shared among the implementations. The sharing
could be improved to eliminate duplicate code.
This commit eliminates more than 700 lines of unnecessary code.
There should be no semantic changes.
When a message is sent to an anycast queue via FQQN on one node of a
cluster and then a consumer is created on that same anycast queue via
FQQN on another node in the cluster the message is not redistributed to
the node with the consumer.
This commit fixes this use-case primarily by including the FQQN info in
the notification messages sent to other nodes in the cluster.
Messages without a last-value property sent to an LVQ are being pruned
rather than just passing through. Only messages with a non-null
last-value property should be subject to pruning.
Running HorizontalPagingTest with these variables would make the test to fail unless these changes are applied.
export TEST_HORIZONTAL_SERVER_START_TIMEOUT=300000
export TEST_HORIZONTAL_TIMEOUT_MINUTES=120
export TEST_HORIZONTAL_PROTOCOL_LIST=OPENWIRE
export TEST_HORIZONTAL_OPENWIRE_DESTINATIONS=200
export TEST_HORIZONTAL_OPENWIRE_MESSAGES=1000
export TEST_HORIZONTAL_OPENWIRE_COMMIT_INTERVAL=100
export TEST_HORIZONTAL_OPENWIRE_RECEIVE_COMMIT_INTERVAL=0
export TEST_HORIZONTAL_OPENWIRE_MESSAGE_SIZE=20000
export TEST_HORIZONTAL_OPENWIRE_PARALLEL_SENDS=10
We were lucky that processReferences was pretty much a static operation, hence I am moving it away from RoutingContext both for clarity and avoiding state being needed on that method.
If state was needed as part of processReferences you would had a pretty nasty bug as the IOCallback could introduce a race where a send would change the state while the IO was pending.
Both audit logging and logging from the LoggingActiveMQServerPlugin are
unclear as they relate to transactional sends and acks. Both essentially
ignore the transaction which makes it appear that an operation has taken
place when, in fact, it hasn't (e.g. a transactional ack is rolled back
but the log indicates the ack went through).
This commit fix this with the following changes:
- Log details when a send or ack is added to a transaction.
- Log details when the transaction is committed.
- Log when the transaction is rolled back.
- Include transaction details in the relevant DEBUG logs.
- Simplify INFO level logging for sends & acks in
LoggingActiveMQServerPlugin. Ensure details are in the DEBUG logs.
Other changes:
- Make capitalization more consistent in a handful of audit logs.
AddressControl has 2 methods to get same metric. Both
getNumberOfMessages() and getMessageCount() return the same metric
albeit in different ways.
Also, getNumberOfMessages() inspects both "local" and "remote" queue
bindings which is wrong.
This commit fixes these issues via the following changes:
- Deprecate getNumberOfMessages().
- Change getNumberOfMessages() to invoke getMessageCount().
- Add a test to ensure getNumberOfMessages() does not count remote
queue bindings.
- Simplify getMessageCount(DurabilityType).
This is caused by too many entries on the HashMap for ThreadLocals.
Also: I'm reviewing some readlock usage on the StorageManager to simplify things a little bit.
It would be useful to be able to cycle the embedded web server if, for
example, one needed to renew the SSL certificates. To support
functionality I made a handful of changes, e.g.:
- Refactoring WebServerComponent so that all the necessary
configuration would happen in the start() method.
- Refactoring WebServerComponentTest to re-use code.
Allow replication only certain addresses with mirror controller.
The configuration is similar to cluster address configuration.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The control existing in Redistributor is not needed as the Queue::deliver will already have a control on re-scheduling the loop and avoid holding references for too long.
This PR adds a "Validated User" column to the Hawtio plugin. The column is
not visible by default. The "Validated User" column is available in the
Session, Consumer, and Producer tabs of the artemis-navigation notebook.
"Validated User" is also available for sort and filtering in those view.
The "Validated Consumer" in the Producer view is always blank. I debugged
as far as I could, but did not find the cause of the issue.
It is possible to receive a compressed message from the client as regular message. Such a message will already contain correct body size, that takes compression into account.
Older versions of Openwire clients wil be affected by AMQ-6431.
As a result of the issue if the ID of the message>Integer.MAX_VALUE
a consumer configured with Failover and doing duplicate detection on the client
will not be able to process duplicate detection accordingly and miss messages.
Paging only removes files at the beginning of the stream...
Say you have paged files 1 through 1000...
if all the messages are ack, but one message on file 1 is missing an ack, all the 999 subsequent files would not be removed until all the messages on file 1 is ack.
This was working as engineered, but sometimes devs don't have complete control on their app.
With this improvement we will now remove messages in the middle of the stream as well.
There is also some improvement to how browsing and page work with this
The utility methods in
`org.apache.activemq.artemis.core.management.impl.MBeanInfoHelper` are
executed *a lot* - especially for Jolokia which is used by the web
console. The `MBeanOperationInfo` and `MBeanAttributeInfo` results are
static and reflection is slow therefore they should not be calculated
over and over again. Rather they should be calculated once and cached
for later use.
Caching these results significantly improves performance. Over the
course of 1,000,000 invocations the difference is several orders of
magnitude. This improves usability substantially when dealing with,
for example, tens of thousands of addresses and/or queues.
When using a temporary queue with a `temporary-queue-namespace` the
`AddressSettings` lookup wasn't correct. This commit fixes that and
refactors `QueueImpl` a bit so that it holds a copy of its
`AddressSettings` rather than looking them up all the time. If any
relevant `AddressSettings` changes the
`HierarchicalRepositoryChangeListener` implementation will still
refresh the `QueueImpl` appropriately.
The `QueueControlImpl` was likewise changed to get the dead-letter
address and expiry address directly from the `QueueImpl` rather than
looking them up in the `AddressSettings` repository.
I modified some code that came from ARTEMIS-734, but I ran the test that
was associated with that Jira (i.e.
`o.a.a.a.t.i.c.d.ExpireWhileLoadBalanceTest`) and it passed so I think
that should be fine. There actually was no test included with the
original commit. One was added later so it's hard to say for sure it
exactly captures the original issue.
It sometimes makes sense to set an acceptor's port to 0 to allow the JVM
to select an ephemeral port (e.g. in embedded integration tests). This
commit adds a new getter on NettyAcceptor so tests can programmtically
determine the actual port used by the acceptor.
This commit also changes the ACCEPTOR_STARTED notification and the
related logging to clarify the actual port value where clients can
connect.
The auto-create-jms-queues, auto-delete-jms-queues,
auto-create-jms-topics, and auto-delete-jms-topics address settings
were deprecated in ARTEMIS-881 way back in 2016. There's no need to keep
them in the default broker.xml at this point.
JGroups 3.x hasn't been updated in some time now. The last release was
in April 2020 almost 2 years ago. Lots of protocols have been updated
and added and users are wanting to use them. There is also increasing
concern about using older components triggered mainly by other
recently-discovered high-profile vulnerabilities in the wider Open
Source Java community.
This commit bumps JGroups up to the latest release - 5.2.0.Final.
However, there is a cost associated with upgrading.
The old-style properties configuration is no longer supported. I think
it's unlikely that end-users are leveraging this because it is not
exposed via broker.xml. The JGroups XML configuration has been around
for a long time, is widely adopted, and is still supported. I expect
most (if not all) users are using this. However, a handful of tests
needed to be updated and/or removed to deal with this absence.
Some protocols and/or protocol properties are no longer supported. This
means that users may have to change their JGroups stack configurations
when they upgrade. For example, our own clustered-jgroups example had to
be updated or it wouldn't run properly.
MQTT 5 is an OASIS standard which debuted in March 2019. It boasts
numerous improvments over its predecessor (i.e. MQTT 3.1.1) which will
benefit users. These improvements are summarized in the specification
at:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901293
The specification describes all the behavior necessary for a client or
server to conform. The spec is highlighted with special "normative"
conformance statements which distill the descriptions into concise
terms. The specification provides a helpful summary of all these
statements. See:
https://docs.oasis-open.org/mqtt/mqtt/v5.0/os/mqtt-v5.0-os.html#_Toc3901292
This commit implements all of the mandatory elements from the
specification and provides tests which are identified using the
corresponding normative conformance statement. All normative
conformance statements either have an explicit test or are noted in
comments with an explanation of why an explicit test doesn't exist. See
org.apache.activemq.artemis.tests.integration.mqtt5 for all those
details.
This commit also includes documentation about how to configure
everything related to the new MQTT 5 features.
The address-setting config-delete-diverts is not being applied correctly
hierarchically because it's not included in the merge() method. It is
also not being persisted to disk either. This commit fixes both issues.
scenario - avoid paging, if address is full chain another broker and produce to the head, consume from the tail using producer and consumer roles to partition connections. When tail is drained, drop it.
- adds a option to treat an idle consumer as slow
- adds basic support for credit based address blocking ARTEMIS-2097
- adds some more visiblity to address memory usage and balancer attribute modifier operations
The HTML output methods are hold-overs from way back when the code-base
started off as JBoss Messaging 2 and the broker mainly ran in JBoss AS 4
and 5 which leveraged an HTML-based JMX console where these methods
would be executed and spit out nicely formatted data. That stuff has all
long since been retired so this commit deprecates the HTML-based
management methods so they can be removed completely in a future release.
JSON is a better structured output format for this and most of the
deprecated methods have JSON alternatives.
Commit 481b73c8ca from ARTEMIS-3502
inadvertently broke this functionality. This commit restores the
original behavior.
autoDeleteAddress was renamed to forceAutoDeleteAddress which will ignore the address settings.
delete temporary queues will use forceAutoDeleteAddress=true.
this is done in collaboration with Justin Bertram
Adds support for extra configuration options to LDAP login module to
prepare for supporting any future/custom string configuration in LDAP
directory context creation.
Details:
- Changed LDAPLoginModule to pass any string configuration not
recognized by the module itself to the InitialDirContext contruction
environment.
- Changed the static LDAPLoginModule configuration key fields to an
enum to be able to loop through the specified keys (e.g. to filter out
the internal LDAPLoginModule configuration keys from the keys passed to
InitialDirContext).
- Few fixes for issues reported by static analysis tools.
- Tested that LDAP authentication with TLS+GSSAPI works against a
recent Windows AD server with Java
OpenJDK11U-jdk_x64_windows_hotspot_11.0.13_8 by setting the property
com.sun.jndi.ldap.tls.cbtype (see ARTEMIS-3140) in JAAS login.conf.
- Moved LDAPLoginModuleTest to the correct package to be able to
access LDAPLoginModule package privates from the test code.
- Added a test to LDAPLoginModuleTest for the task changes.
- Updated documentation to reflect the changes.
While converting a core message to an OpenWire message there may be an
error processing a property value. Currently this results in an
exception and the message is not dispatched to the client. The broker
eventually attempts to redeliver this message resulting in the same
error. Instead of throwing an exception the broker should simply log a
WARN message and skip the property. This will allow clients to receive
the message without the problematic property and the broker will not
have to attempt to redeliver the message again.
Durable changes made via the management API (e.g. adding
security-settings, adding address-settings, adding diverts) can be
reverted when reloading the XML at runtime.
This is a follow-up from ARTEMIS-2322.
The changes related to expired message are only there because
QueueFilterPredicate had a bug where the rate was correlated to expired
messages. When I fixed that I noticed that expired messages was actually
missing so I added it.
Casting the result of getPeerCertificates() to X509Certificate[] mirrors
what is done in the ActiveMQ "Classic" code-base.
A few tests which were imported from ActiveMQ "Classic" to verify our
OpenWire implementation were removed as they relied on a "stub"
implementation of javax.net.ssl.SSLSession that never would have worked
across multiple JDKs once javax.security.cert.X509Certificate[] was
removed. Furthermore, the tests appeared to be related to the OpenWire
*client* and not relevant to our broker-side implementation.
Aside from adding audit logging for message acknowledgement this commit
also consolidates the two nearly identical acknowledge method
implementations in o.a.a.a.c.s.i.QueueImpl. This avoids duplicating
code for audit logging, plugin invocation, etc. There is no semantic
change.
Due to the multi-threaded AMQP implementation the ThreadLocal variables
used by the AuditLogger to track the username and remote address don't
work properly. Changes include:
- Passing the audit Subject (set during authentication) and the remote
address explicitly for audit logging on the relevant ServerSession
methods rather than relying on the AuditLogger's ThreadLocal
variables
- Audit logging core session creation *after* successful authentication
so that we have the proper Subject; this is especially important for
the SSL certificate authentication use-case
- Renaming some methods and variables in AuditLogger to more accurately
reflect their intended use
- Adding JavaDoc and refactoring the getCaller methods on AuditLogger
- Refactor audit log testing and add a new test
As a follow-up to #3618/dc7de893747b90b627d729f9f18a758bb4dad9d5 update
checkstyle to the latest version, restoring the originally intended
"RightCurly" style, and updating all the code to properly adhere to the
style as enforced by the new checkstyle version.
The version of checkstyle we used before the aforementioned commit had
a bug which didn't properly enforced our intended "RightCurly" style
(see https://github.com/checkstyle/checkstyle/issues/6345). That commit
changed the style to accommodate the handful of unintended style
violations. This commit reverts that change for 2 main reasons:
- The style was always intended to use `alone` for both `METHOD_DEF`
and `CTOR_DEF`.
- There are over 1,000 existing uses of the intended style and around
30 violations of this style which were unintentionally allowed.
Reverting the style back to the original and cleaning up the unintented
violations makes the code more consistent and prevents further style
inconsistencies in the future.
There were a handful of other changes related to checkstyle bugs which
allowed unintended style violations. These were related to indentation
levels.
This closes#3619
(with some minor changes from Robbie to fix remaining violations)