The issue identified with AMQP was under Transaction usage, and while opening and closing sessions.
It seems the leak would be released once the connection is closed.
We added a new testsuite under ./tests/leak-tests To fix and validate these issues
Configurations employing shared-storage with NFS are susceptible to
split-brain in certain scenarios. For example:
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Split-brain.
In reality this situation is pretty unlikely due to the timing involved,
but the possibility still exists. Currently the file lock held by the
primary broker on the NFS share is essentially worthless in this
situation. This commit adds logic by which the timestamp of the lock
file is updated during activation and then routinely checked during
runtime to ensure consistency. This effectively mitigates split-brain in
this situation (and likely others). Here's how it works now.
1) Primary loses network connection to NFS.
2) Backup activates.
3) Primary reconnects to NFS.
4) Primary detects that the lock file's timestamp has been updated and
shuts itself down.
When the primary shuts down in step #4 the Topology on the backup can be
damaged. Protections were added for this via ARTEMIS-2868 but only for
the replicated use-case. This commit applies the protection for
removeMember() so that the Topology remains intact.
There are no tests for these changes as I cannot determine how to
properly simulate this use-case. However, there have never been robust,
automated tests for these kinds of NFS use-cases so this is not a
departure from the norm.
For pipelined open cases the events processing should ignore additional begin
and attach events if the open event handler closes the connection to avoid the
processing throwing additional exceptions and replacing the error condition in
the connection with an unrelated error about NPE from the additional events.
I am adding three attributes to Address-settings:
* page-limit-bytes: Number of bytes. We will convert this metric into max number of pages internally by dividing max-bytes / page-size. It will allow a max based on an estimate.
* page-limit-messages: Number of messages
* page-full-message-policy: fail or drop
We will now allow paging, until these max values and then fail or drop messages.
Once these values are retracted, the address will remain full until a period where cleanup is kicked in by paging. So these values may have a certain delay on being applied, but they should always be cleared once cleanup happened.
o.a.a.a.c.p.m.MQTTSubscriptionManager#removeSubscription() had a chunk
of code from 971f673c60 removed. That code
was added under the assumption that there should only ever be one
consumer per queue. That was true for MQTT 3.x, but it's not always true
for MQTT 5 due to shared subscriptions. However, the tests from that
commit all still pass even with it removed now (as well as all the other
MQTT tests) so I think it's safe.
If the client is using address prefixes to define the routing type along with
durable subscriptions then on re-attach the compairon to check if the subscription
address has changed needs to remove the prefix when comparing against the address
since the prefix isn't propagated when creating the address and will always fail
resulting in the subscription queue being deleted in error.
Adds some tests to validate that the destination prefixes if set and
are used properly by the client are honored over the default address
auto create routing type condiguration.
When an AMQP client subscribes to a new address (non-existing) with a receiver link, the
address is created with routing type ANYCAST regardles of the default address creation
configuration of the broker, and ignores even the broker wide default of MULTICAST.
I am adding an option sync=true or false on mirror. if sync, any client blocking operation will wait a roundtrip to the mirror
acting like a sync replica.
Over time org.apache.activemq.artemis.tests.integration.amqp has become
home to many multi-protocol JMS tests even though the package is really
for AMQP-specific tests. This commit splits those tests out into their
own package.
This is a preliminary step to clarify these tests before I add another
one for a different issue.
When the last non-durable subscriber on a JMS topic disconnects the
corresponding queue representing the subscription is deleted as
expected. However, the queue's address will also be deleted no matter
what, which is *not* expected.
Some LDAP servers (e.g. OpenLDAP) do not support the "persistent search"
feature and therefore the existing "listener" feature does not actually
fetch updates. This commit implements a "pull" feature controlled by a
configurable interval equivalent to what is implemented in the cached
LDAP authorization module from ActiveMQ "Classic."
A handful of tests started to fail after the original fix was committed.
This commit fixes those failures mainly by using a mock
`TransactionSynchronizationRegistry`.
I changed `o.a.a.a.r.ActiveMQRAManagedConnection#checkTransactionActive`
slightly because `getTransactionStatus` will never return `null` unlike
`getTransaction` would. The semantics should still be the same, though.
Adds support for standard Java TLS and ActiveMQ Artemis-specific override
encrypted system property values for the key store and trust store
passwords, including a separate codec property
Allow setting id-cache-size to 0 from broker.xml and ensure the broker
handles this gracefully. Previously you could only set the cache size to
0 via broker properties or programmatically and it would throw an
ArrayIndexOutOfBoundsException when adding an item to the cache.
- From now on we will save snapshots of page-counters on the journal (basically for compatibility with previous verions).
And we will recount the records on startup.
- While the rebuild is being done the value from the previous snapshot is still available with current updates.
when cancelling a large number of messages, the addSorted could be holding a lock for too long causing the server to crash under CriticalAnalyzer
co-authored: AntonRoskvist <anton.roskvist@volvo.com> (discovering the issue and providing the test ClientCrashMassiveRollbackTest.java)
- optimize startup time on paging (check-depage on startup)
- otpimize getNextPage() on complete pages
- optimize getFirstMessage() and paging. (avoid iterator usage)
Attempt to standardize all Logger declaration to a singular variable name
which makes the code more consistent and make finding usages of loggers in
the code a bit easier.
Commit 5a42de5fa6 called my attention to
this test. It really needs to be refactored because:
- It belongs in the integration-tests module rather than the MQTT
protocol module.
- It is using a lot of non-standard components (e.g.
EmbeddedJMSResource, Awaitility, etc.).
- It is overly complicated (e.g. using its own MqttClientService).
This commit resolves all those problems. The new implementation is quite
a bit different but still equivalent. I reverted the original fix from
ARTEMIS-2476 and the test still fails.
Logger statements should use formatting syntax and let the normal framework checks take care of
checking if a logger is enabled instead of string concats and isXEnabled logger checks except
in cases there is known expense to the specifc logging message/arg preparation or passing.
Changes from myself and Robbie Gemmell.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The issue is that depage should not put pages on the used pages as they were not actually intended to read.
instead I should create a newPageObject and not use the RefCounts caching.
I did not intend to have this difference in the semaphore.
The idea is that we never keep messages pending on this case, otherwise such a huge message would use too much memory.
I was debugging Compacting, looking for a possible issue here in these conditions.
even though I found nothing wrong with the code, I still want to keep the test as there's no such thing as enough testing.
In commit a9a85f98db I removed the code
which modified existing matches. However, I forgot that the matches read
from LDAP are often duplicated so instead of always adding a new match
this commit ensures that the *right* match is modified rather than a
potentially more generic wildcard match (which was the original
problem).
If an AMQP consumer tries to receive a message and the broker is unable
to convert the message from core to AMQP then the consumer is
disconnected and the offending message stays in the queue. When the
consumer reconnects the conversion error will happen again resulting in
a loop that can only be resolved through administrative action (e.g.
deleting the message manually or sending it to a dead letter address).
This commit fixes that problem by detecting the conversion problem and
sending the message to the queue's dead letter address. It also doesn't
disconnect the consumer.
This commit also changes the log messages associated with sending a
message to the dead letter address since this event can now occur
regardless of the delivery attempts.
The map used by LastValueQueue was inadvertently changed to a
non-thread-safe implementation in
4a4765c39c. This resulted in an occasional
ConcurrentModificationException from the hashCode implementation.
This commit restores the thread-safe map implementation and adds a test
which brute-forces a CME when using the non-thread-safe implementation.
Sometimes users want to perform custom client ID validation, and in the
case of an invalid client ID the proper reason code should be returned
in the CONNACK packet.
When the LegacyLDAPSecuritySettingPlugin has enableListener set to true
and a new permission is added it will try to modify the existing match
if one exists. This is problematic if there's a more generic wildcard
match than the specific one that's modified.
This commit fixes that problem so that instead of modifying the existing
match(es) it simply adds a new one. The plugin never should have tried
modifying the existing match in the first place as two identical matches
would be a configuration error.
- Fixing RoleInfo to provide informations on deleteAddress.
- Adding more coverage on test to check the number of permissions
returned.
Signed-off-by: Emmanuel Hugonnet <ehugonne@redhat.com>
By allowing to pass caller's classname directly to org.apache.activemq.artemis.utils.ActiveMQThreadFactory#defaultThreadFactory instead of calculating it from stack.
Due to the changes in 682f505e32 we now
send "Last Will & Testament" MQTT messages via ServerSession. This means
sending will fail if the disk is full. For MQTT this triggers a
connection failure which in turns triggers sending an LWT message. This
process will recurse infinitely until it results in a
java.lang.StackOverflowError.
This commit fixes that by tracking whether or not sending a LWT message
is already in progress.
When a message is sent to an anycast queue via FQQN on one node of a
cluster and then a consumer is created on that same anycast queue via
FQQN on another node in the cluster the message is not redistributed to
the node with the consumer.
This commit fixes this use-case primarily by including the FQQN info in
the notification messages sent to other nodes in the cluster.
These scripts are supposed to be execute with ". ./parameters-paging.sh"
which would incur in -e, and any mistake on the shell will kill your shell and you would be wondering what happened.
Using direct routing skips authorization for "Last Will and Testament"
messages (a.k.a. "will" messages). This commit fixes that problem by
using the internal session that is established for normal message
production and consumption.
Messages without a last-value property sent to an LVQ are being pruned
rather than just passing through. Only messages with a non-null
last-value property should be subject to pruning.
Running HorizontalPagingTest with these variables would make the test to fail unless these changes are applied.
export TEST_HORIZONTAL_SERVER_START_TIMEOUT=300000
export TEST_HORIZONTAL_TIMEOUT_MINUTES=120
export TEST_HORIZONTAL_PROTOCOL_LIST=OPENWIRE
export TEST_HORIZONTAL_OPENWIRE_DESTINATIONS=200
export TEST_HORIZONTAL_OPENWIRE_MESSAGES=1000
export TEST_HORIZONTAL_OPENWIRE_COMMIT_INTERVAL=100
export TEST_HORIZONTAL_OPENWIRE_RECEIVE_COMMIT_INTERVAL=0
export TEST_HORIZONTAL_OPENWIRE_MESSAGE_SIZE=20000
export TEST_HORIZONTAL_OPENWIRE_PARALLEL_SENDS=10
The OpenWire JMS client shipped with ActiveMQ "Classic" uses the
client's hostname as part of the `JMSMessageID`. Consumers may use this
data to select messages sent from particular hosts. Although this is
brittle and not recommended it is nonetheless possible.
However, when messages arrive to ActiveMQ Artemis they are converted
to core messages, and the broker doesn't properly map the selector from
`JMSMessageID` to the corresponding property on the underlying core
message. This commit fixes that problem. Changes include:
- Mapping selector from JMSMessageID to the internal __HDR_MESSAGE_ID
- Relocating some constant values so that both the protocol and commons
module can use them
- Adding a test
AddressControl has 2 methods to get same metric. Both
getNumberOfMessages() and getMessageCount() return the same metric
albeit in different ways.
Also, getNumberOfMessages() inspects both "local" and "remote" queue
bindings which is wrong.
This commit fixes these issues via the following changes:
- Deprecate getNumberOfMessages().
- Change getNumberOfMessages() to invoke getMessageCount().
- Add a test to ensure getNumberOfMessages() does not count remote
queue bindings.
- Simplify getMessageCount(DurabilityType).
The condition fixed on this commit should not really happen in production
as the compacting counts should always be ordered (records that were compacted earelier will always be at the top of the journal).
However it highlights an improvement that could be done on the journal compacting.
MQTT 3.1 and 3.1.1 clients using a clean session should have a
*non-durable* subscription queue. If the broker restarts the queue
should be removed. This is due to [MQTT-3.1.2-6] which states that the
session (and any state) must last only as long as the network
connection.
This is caused by too many entries on the HashMap for ThreadLocals.
Also: I'm reviewing some readlock usage on the StorageManager to simplify things a little bit.
JMSTestCase is deprecated anyway.
in its older form many many years ago a server would be reused over
between tests.
what forced us to make such verification to avoid messages from one test
leaking into the next.
This was because a server startup was expensive many years ago (less
efficient code and the hardware available 10 years ago)
with the current state of things this is not needed as the server will
be started from scratch on every test
It would be useful to be able to cycle the embedded web server if, for
example, one needed to renew the SSL certificates. To support
functionality I made a handful of changes, e.g.:
- Refactoring WebServerComponent so that all the necessary
configuration would happen in the start() method.
- Refactoring WebServerComponentTest to re-use code.
It would be useful for security manager implementations to be able to
alter the client ID of MQTT connections.
This commit supports this functionality by moving the code which handles
the client ID *ahead* of the authentication code. There it sets the
client ID on the connection and thereafter any component (e.g. security
managers) which needs to inspect or modify it can do so on the
connection.
This commit also refactors the MQTT connection class to extend the
abstract connection class. This greatly simplifies the MQTT connection
class and will make it easier to maintain in the future.
Allow replication only certain addresses with mirror controller.
The configuration is similar to cluster address configuration.
Co-authored-by: Robbie Gemmell <robbie@apache.org>
The control existing in Redistributor is not needed as the Queue::deliver will already have a control on re-scheduling the loop and avoid holding references for too long.
When running a UDP connection factory you have to either keep it running, or close it so the UDP thread is closed.
this is an issue for the testsuite as we validate for leaked threads. This needs to be fixed on the test.
The MQTT 5 (and 3.1.1) specification states:
Until it has received the corresponding PUBREL packet, the receiver
MUST acknowledge any subsequent PUBLISH packet with the same Packet
Identifier by sending a PUBREC. It MUST NOT cause duplicate messages to
be delivered to any onward recipients in this case [MQTT-4.3.3-10].
The broker prevents a duplicate message, but it doesn't respond with a
PUBREC. This commit fixes that.
Removing the connection ID property from the actual *message* breaks the
nolocal functionality. Removing the property isn't necessary in the
first place so this commit reomves that code.
It is possible to receive a compressed message from the client as regular message. Such a message will already contain correct body size, that takes compression into account.
Older versions of Openwire clients wil be affected by AMQ-6431.
As a result of the issue if the ID of the message>Integer.MAX_VALUE
a consumer configured with Failover and doing duplicate detection on the client
will not be able to process duplicate detection accordingly and miss messages.
Paging only removes files at the beginning of the stream...
Say you have paged files 1 through 1000...
if all the messages are ack, but one message on file 1 is missing an ack, all the 999 subsequent files would not be removed until all the messages on file 1 is ack.
This was working as engineered, but sometimes devs don't have complete control on their app.
With this improvement we will now remove messages in the middle of the stream as well.
There is also some improvement to how browsing and page work with this
Scripts:
- Fix the preapre-docker.sh to exit with 0 instead of 1 on success
On pom files:
- Change e2e-tests variable names to e2e-tests.xxxxxx for clarity on
e2e-tests variables
- Add e2e-tests.skipImageBuild variable to control if the docker image
will be build (defaults to not build)
- Add e2e-tests.dockerfile variable to specify the dockerfile to be
used (defaults to Dockerfile-centos)
- Bump testcontainers version to 1.16.3
- Add artemis distribution dependency since the docker image build
depends on it
On ContainerService class:
- Fix exposePorts and exporseFolder to use SELinux shared mode
otherwise the mount fails on machines with SELinux enabled
- Move the logic to use specific user on container from generic start
method to broker specific method to avoid affect other images
- Update the broker image name to a more generic name (activemq-artemis
instead of artemis-centos)
- Update the broker image tag to match with the project version in pom
file
As the test needs the generated jms jars to be verified I moved it from
unit-tests to smoke-tests.
Updated the test to look for the correct jars as the originally
specified does not exist.
Update the test to assert against Implementation-Version instead of
ActiveMQ-Version in the manifest file as the ActiveMQ-Version property does not exist.
Move all tests which are related to end-to-end testing from smoke-tests
module to a new module named e2e-tests.
These e2e tests are those which are dependent of ContainerService
class. ContainerService class uses artemis inside a container by using
the testcontainers library and for that reason these tests are usually
a quite slow and tecnically they are not a smoke test.
The new e2e-tests module is part of tests module but it is not enabled
by default and to get executed it requires the e2e-tests profile
specification on maven command.
The commit includes the following changes:
- Don't drop the connection on subscribe or publish authorization
failures for 3.1 clients.
- Don't drop the connection on subscribe authorization failures for
3.1.1 clients.
- Add configuration parameter to control behavior on publish
authorization failures for 3.1.1 clients (either disconnect or not).
When using a temporary queue with a `temporary-queue-namespace` the
`AddressSettings` lookup wasn't correct. This commit fixes that and
refactors `QueueImpl` a bit so that it holds a copy of its
`AddressSettings` rather than looking them up all the time. If any
relevant `AddressSettings` changes the
`HierarchicalRepositoryChangeListener` implementation will still
refresh the `QueueImpl` appropriately.
The `QueueControlImpl` was likewise changed to get the dead-letter
address and expiry address directly from the `QueueImpl` rather than
looking them up in the `AddressSettings` repository.
I modified some code that came from ARTEMIS-734, but I ran the test that
was associated with that Jira (i.e.
`o.a.a.a.t.i.c.d.ExpireWhileLoadBalanceTest`) and it passed so I think
that should be fine. There actually was no test included with the
original commit. One was added later so it's hard to say for sure it
exactly captures the original issue.
When copying message properties from the core message to the OpenWire
message we intentially omit any properties starting with `_AMQ` and
`__HDR_`. However, we were effectively negating that logic because we
copied the marshalled properties directly to the message without any
filtering. Now that we no longer copy the marshalled properties directly
to the message the test breaks because it expects properties starting
with `__HDR_`. This commit fixes the test by removing those
expectations. The test is still valid because the message is still
receieved rather than being swallowed due to an exception (which was the
original problem).
I am adding a test showing it is safe to not wait pending callbacks before closing a file.
With this I can just close the file and let the kernel to deal with sending the completions.
It sometimes makes sense to set an acceptor's port to 0 to allow the JVM
to select an ephemeral port (e.g. in embedded integration tests). This
commit adds a new getter on NettyAcceptor so tests can programmtically
determine the actual port used by the acceptor.
This commit also changes the ACCEPTOR_STARTED notification and the
related logging to clarify the actual port value where clients can
connect.