If a broker loses its file lock on the journal and doesn't notice (e.g.
network connection failure to an NFS mount) then it can continue to run
after its backup activates resulting in split-brain.
This commit implements periodic journal lock evaluation so that if a live
server loses its lock it will automatically restart itself.
The PageSubscriptionImpl.cleanupEntries could be locked by the queue
depage because they are executed with the same executor and the depage
could be locked by the iterQueue.
If PageSubscriptionImpl.cleanupEntries is locked, no one clean up the
JournalRecord and PagePositionImpl instances created during iterQueue.
So removing all messages from a huge queue, causes the retention of too
JournalRecord and PagePositionImpl instances until an OOM.
To avoid to lock the PageSubscriptionImpl.cleanupEntries the depage is
executed only if the queue isn't iterating.
This commit introduces the ability to configure a downstream connection
for federation. This works by sending information to the remote broker
and that broker will parse the message and create a new upstream back
to the original broker.
Add the config parameter `page-sync-timeout` to set a customized value,
because if the broker is configured to use ASYNCIO journal, the timeout
has the same value of NIO default journal buffer timeout ie 3333333.
The iterQueue transaction commits are locked by the synchronization
context. So removing all messages from a huge queue causes the creation
of too locked transactions for the paged messages and so OOM.
The iteration on paged message is executed out the iterQueue
synchronization context to avoid to lock the transaction commits.
Active Directory servers are unable to handle referrals automatically.
This causes a PartialResultException to be thrown if a referral is
encountered beneath the base search DN, even if the LDAPLoginModule is
set to ignore referrals.
This option may be set to 'true' to ignore these exceptions, allowing
login to proceed with the query results received before the exception
was encountered.
Note: there are no tests for this change as I could not reproduce the
issue with the ApacheDS test server. The issue is specific to directory
servers that don't support the ManageDsaIT control such as Active
Directory.
A new feature to preserve messages sent to an address for queues that will be
created on the address in the future. This is essentially equivalent to the
"retroactive consumer" feature from 5.x. However, it's implemented in a way
that fits with the address model of Artemis.
Improve wildcard support for the key attribute in the roles access
match element and whitelist entry element, allowing prefix match for
the mBean properties.
In LargeMessageImpl.copy(long) it need to open the underlying
file in order to read and copy bytes into the new copied message.
However there is a chance that another thread can come in and close
the file in the middle, making the copy failed
with "channel is null" error.
This is happening in cases where a large message is sent to a jms
topic (multicast address). During delivery it to multiple
subscribers, some consumer is doing delivery and closed the
underlying file after. Some other consumer is rolling back
the messages and eventually move it to DLQ (which will call
the above copy method). So there is a chance this bug being hit on.
The crititical analyser trigger the broker shutdown if try to
removeAllMessages with a huge queue. The iterQueue is split so as
not to keep the lock too time.
The LocalMonitor tick log is very useful to establish a "heartbeat" log
statement. It is moved into its own logger from PagingManager logger,
which is too verbose to leave activated indefinitely in production.
After a node is scaled down to a target node, the sf queue in the
target node is not deleted.
Normally this is fine because may be reused when the scaled down
node is back up.
However in cloud environment many drainer pods can be created and
then shutdown in order to drain the messages to a live node (pod).
Each drainer pod will have a different node-id. Over time the sf
queues in the target broker node grows and those sf queues are
no longer reused.
Although use can use management API/console to manually delete
them, it would be nice to have an option to automatically delete
those sf queue/address resources after scale down.
In this PR it added a boolean configuration parameter called
cleanup-sf-queue to scale down policy so that if the parameter
is "true" the broker will send a message to the
target broker signalling that the SF queue is no longer
needed and should be deleted.
If the parameter is not defined (default) or is "false"
the scale down won't remove the sf queue.
By default, such a cancelled task is not automatically removed from the
work queue until its delay elapses. It may cause unbounded retention of
cancelled tasks. To avoid this, set remove on cancel policy to true.
The core server session tracks details about producers like what
addresses have had messages sent to them, the most recent message ID
sent to each address, and the number of messages sent to each address.
This information is made available to users via the
listProducersInfoAsJSON method on the various management interfaces
(JMX, web console, etc.). However, in situations where a server session
is long lived (e.g. in a pool) and is used to send to many different
addresses (e.g. randomly named temporary JMS queues) this info can
accumulate to a problematic degree. Therefore, we should limit the
amount of producer details saved by the session.
Certain devices or file systems won't support record level locking.
For that reason I am changing FileLockNodeManager to use separate files (one for each position) instead of using tryLock(position);
A good example for this would be cephFS where channel.tryLock or channel.tryLock works but it fails at a record level.
Wait netty event loop group shutdown to avoid too many opened FDs after
server stops, when netty configuration is used. Clear server
activateCallbacks to avoid reactivation of previous nodeManager and
consequent FD leaks on restart. Fix LargeServerMessageImpl.copy to avoid
FD leaks when a large message expiry or it is sent to DLA. Terminate
HawtDispatcher global queue to avoid pipes and eventpolls leaks after a
MQTT test.
cherry-picking commit 9617058ba0649af4eea15ce8793f86de827c4b7f
NO-JIRA adding check for open FD on the testsuite
cherry-picking commit 0facb7ddf4d3baa14a3add4290684aff7fd46053
NO-JIRA addressing connections leaks on integration tests
If a jms client (be it openwire, amqp, or core jms) receives a message that
is from a different protocol, the JMSMessageID maybe null when the
jms client expects it.
Add max record size check before adding a record to prevent that the
broker shuts down, when there is one really large header sent with the
message. Add message size check before allocating large message resource
if it can't be stored.
AbstractJournalStorageManager::performCachedLargeMessageDeletes
must enforce acquisition of manager write lock (as documented)
to avoid unlucky racing calls of stopReplication while stopping
to deadlock.
FileConfigurationParserTest was creating a data folder.
This is simply disabling persistence from the configuration used by the server on this test as it is not needed.
There are a few issues with prefixing and compatibility.
This is basically an issue when integrated with Wildfly or any other case
where prefix is activated
and playing with older versions.
Page::read is allocating a new ChannelBufferWrapper on each
paged message read: to reduce the allocation rate, it could be
reused until a new wrapped ByteBuffer is created
This test has been failing as part of the main testsuite
and it should really be a smoke test as it is using a real test.
so, I'm moving it as smoke-test
This is fixing these tests:
- org.apache.activemq.artemis.tests.integration.paging.PagingOrderTest#testPagingOverCreatedDestinationQueues
- org.apache.activemq.artemis.tests.integration.paging.PagingOrderTest#testPagingOverCreatedDestinationTopics
No additional tests are needed as this change is covereted by the current testsuite
Historically the broker has read the XML configuration file as a String,
substituted system properties, and then parsed that String into an XML
document. However, this method won't substitute system properties in the
files which are imported via xinclude. In order to substitue system
properties in xincluded files the substitution needs to be performed
after the file is parsed into an XML document. This commit implements
that change and refactors the XMLUtil class a bit to eliminate redundant
code, obsolete comments, etc.
When auto-creation is off then older clients consuming messages from an
FQQN won't work. This commit fixes that problem and adds a compatibility
test to verify.
The changes from ARTEMIS-2189 mean that
o.a.a.a.c.s.i.ServerSessionImpl#deleteQueue
is no longer called from the same ServerSessionImpl instance that
created it which means that TempQueueCleanerUpper instances will leak.
To resolve the leak the client will only create a new session when
necessary instead of every time delete() is invoked.
Implement using the ActiveMQ5 JMSXGroupFirstForConsumer, property as default, but make it possible for future to make it configurable easily. (Not this PR)
Add test
In adding auto-delete queue level feature, its been noticed as some feature bits were added during hot fix branch, that there's api break with the 2.6.x hotfix branch.
This addresses that by fixing this in 2.7.x
LocalMonitor::under on PagingManagerImpl won't log anymore with a
warning message if the producers got unblocked and with info
if disk it getting freed
Performing direct deliveries of management messages could enter
a code path on QueueImpl::addTail with a NULL pageIterator: performing
a null check will avoid it to throw NPE.
The Audit log allows user to log some important actions,
such as ones performed via management APIs or clients,
like queue management, sending messages, etc.
The log tries to record who (the user if any) doing what
(like deleting a queue) with arguments (if any) and timestamps.
By default the audit log is disabled. Through configuration can
be easily turned on.
Multiple consumers using the same clientId in the cluster, the last consumer connection should close the previous consumer connection!
ARTEMIS-2226 last consumer connection should close the previous consumer connection
to address apache-rat-plugin:0.12:check
ARTEMIS-2226 last consumer connection should close the previous consumer connection
to address checkstyle
ARTEMIS-2226 last consumer connection should close the previous consumer connection
adjust the code structure
ARTEMIS-2226 last consumer connection should close the previous consumer connection
adjust the code structure
ARTEMIS-2226 last consumer connection should close the previous consumer connection
adjust the code structure
ARTEMIS-2226 last consumer connection should close the previous consumer connection
adjust the code structure
ARTEMIS-2226 last consumer connection should close the previous consumer connection
adjust the code structure
ARTEMIS-2226 last consumer connection should close the previous consumer connection
add javadoc
Direct and async deliveries lock QueueImpl::this and
ServerConsumerImpl::this in different order causing deadlock:
has been introduced a deliverLock to prevent both type of delivers
to concurrently happen, making irrelevant the lock ordering.
Added test reproducer and changed Queue::isDurableMessage usages into
Queue::isDurable to allow acks to hit the journal and being
correctly replicated across nodes.
Add ability to configure when creating auto created queues at the queue level
Add support for configuring message count check
Add test cases
Update docs
Support using group buckets on a queue for better local group scaling
Support disabling message groups on a queue
Support rebalancing groups when a consumer is added.
This is simply fixing the example under examples/features/standard/divert
Other tests are passing.
No additional tests are needed as the example on this case acts like a test.
Push isDirectDeliver method from netty impl, to the Connection interface
Add support to InVMConnection for isDirectDeliver flag and ability to set via config, defaulting to false, to keep current default behavior.
Extend DirectDeliverTest to check InVM as well.
Any checkProperties();<usage of this.properties> pattern has been
replaced by an atomic checkProperties().<usage of returned properties>
to help both performance and consistency.
The cleanup is now performed into CoreTypedProperties both
for performance reasons (avoid lock/unlock many times)
and consistency, given that the operation is now atomic.
In Page.write(final PagedMessage message) if the page file is closed
it returns silently. The caller has no way to know that if the message
is paged to file or not. It should throw an exception so that the
caller can handle it correctly.
This causes random failure PagingTest#testExpireLargeMessageOnPaging().
The test shows that when the server stops it closes the page file.
In the mean time a message is expired to the expiry queue and if
the expiry queue is in paging mode, it goes to Page.write() and
returns without any error. The result is that the message is removed
from the original queue and not added to the expiry queue.
If we throw exception here it makes the expiration failed, the message
will not be removed from the orginal queue. Next time broker is started,
the message will be reloaded and expired again. no message lost.
Add consumer priority support
Includes refactor of consumer iterating in QueueImpl to its own logical class, to be able to implement.
Add OpenWire JMS Test - taken from ActiveMQ5
Add Core JMS Test
Add AMQP Test
Add Docs
There's a *slight* semantic change with the behavior of the queue query
and binding query to make them consistent with the address query, namely
that they will return the name of the queue and the name of the address
in every case and the returned names will be not use the FQQN syntax but
will be parsed to reflect their actual names in the broker.
There were two different but nearly identical implementations of
createQueue(). I consolidated these into a single method. There should
be no semantic differences.
When trying to get the bindings for an address the getBindingsForAddress
method will create a Bindings instance if there are no bindings for the
address. This is unnecessary in most circumstances so use the
lookupBindingsForAddress method instead and check for null.
MULTICAST messages forwarded by a core bridge will not be routed to any
ANYCAST queues and vice-versa. Diverts have the ability to configure how
routing-type is treated. Core bridges now support this same kind of
functionality. By default the bridge does not alter the routing-type of
forwarded messages to maintain compatibility with existing behavior.
Large messages pendingRecordID is not accessed atomically, leading
to races that would lead to records that cannot been found on the
journal for deletion: it would lead to cause NPE that won't clean
the pending tasks on the current OperationContextImpl.
Adding a cleanup on error of those tasks and avoiding the race
to happen by adding proper synchronization will both enforce
correct clean up when something bad happen and avoid NPE.
In PagingManagerImpl#getPageStore() the operations on the map 'stores'
are not synchronzed and it's possible that more than one paging store is
created for one address.
PageSubscriptionImpl::ackTx is already performing a counter update
using the message persistent size: the size can be reused on
PagePosition::setPersistentSize, avoiding to query the page cache just
to compute it.
These improvements were also part of this task:
- Routing is now cached as much as possible.
- A new Runnable is avoided for each individual message,
since we use the Netty executor to perform delivery
https://issues.apache.org/jira/browse/ARTEMIS-2205
TransactionImpl::properties are often not used and could be
avoided to be allocated.
OperationContextImpl.TaskHolders instances are turned into static
classes to avoid refecencing back the context, making the life
easier for the GC.
OperationContexImpl volatile loads can be reduced to make the
code faster on the hot path.
When a receiving transaction is committed in a paging situation,
if a page happens to be completed and it will be deleted in a
transaction operation (PageCursorTx). The other tx operation
RefsOperation needs to access the page (in PageCache) to finish
its job. There is a chance that the PageCursorTx removes the
page before RefsOperation and it will cause the RefsOperation
failed to find a message in a page.
When a node tries to reconnects to another node in a scale down cluster,
the reconnect request gets denied by the other node and keeps retrying,
which causes tasks in the ordered executor accumulate and eventually OOM.
The fix is to change the ActiveMQPacketHandler#handleCheckForFailover
to allow reconnect if the scale down node is the node itself.
Not closing the InputStream makes this test flaky on Windows. The test
breaks because FileMoveManager::delete(java.io.File) Line 221 fails
to delete the file if it's still "owned" by the JVM process on Windows.
With AMQP protocol when some messages are received in a transaction,
calling JMX QueueControl.listDeliveringMessages() returns empty list
before the transaction is committed.
Add test cases
Add GroupSequence to Message Interface
Implement Support closing/reset group in queue impl
Update Documentation (copy from activemq5)
Change/Fix OpenWireMessageConverter to use default of 0 if not set, for OpenWire as per documentation http://activemq.apache.org/activemq-message-properties.html
Implement custom LVQ Key and Non-Destructive in broker - protocol agnostic
Make feature configurable via broker.xml, core apis and activemqservercontrol
Add last-value-key test cases
Add non-destructive with lvq test cases
Add non-destructive with expiry-delay test cases
Update documents
Add new methods to support create, update with new attributes
Refactor to pass through queue-attributes in client side methods to reduce further method changes for adding new attributes in future and avoid methods with endless parameters. (note: in future this should prob be done server side too)
Update existing test cases and fake impls for new methods/attributes
Add Concurrency Test to expose concurrency errors seen in logs.
Add Fix to ensure TypedProperties to ensure threadsafety
Add forEach and forEachKey to allow for provide a thread safe way of iterating through keys and values, without needing to duplicate the collection.
Add getMapNames method to remove code duplication and to ensure thread safe
Further fix around network loss.
If network loss (split) and slave activates, for a period its config used when it initializes in initialisePart2 was stale.
Add/Extend test to ensure address-setting change made on live preserved on slave (after activation)
Add/Extend test to ensure security-setting change made on live preserved on backup (after activation)
Extend test case to reproduce problem of client created queues being incorrectly removed on simple reload of config.
Add a flag/field to the queues created by configuration/broker.xml so we can correctly filter only queues created/managed by config.
Update listConfiguredQueues to use the new queue flag
Add Tests
Add implementation inline with other queue updatable settings.
Enhance tests to ensure queue is not destroyed during config change and messages in queue already are preserved
Revert previous fix
Keep original ConfigChangeTest
Apply new non-destructive fix.
Enhance tests to ensure messages in queues are not lost either on reload when running or when config changed on-restart (e.g. queue i not destroyed)
Any failure to deploy an address or queue will short-circuit the broker
initialization process preventing any other addresses or queues from
being deployed as well as other critical resources like acceptors, etc.
The ServerSessionPacketHandler has a close() callback handler which will
delete any pending large messages. However, there is a race where a
large message can be routed, then the close delete the associated large
message resulting in data loss.
First, QueueQuery should use address name for address settings
The name used for looking up address settings for a queue now uses the
address name if there is a local queue binding
Second, make sure sent credits to the server is the correct value
Fix checkstyle
Avoid duplicated logic
Ability to filter and group
Instantiate SimpleString property key once
Get property value via getObjectProprty to ensure all special mapped properties such as in AMQPMessage would return
Avoid a custom string to represent null, instead rely on Java's representation "null" by using Objects.toString to get the string value of the property value used to group by.
Seperate plugin interface by area, all extending a base interface.
Update code to check and call only plugins implementing specific interfaces.
Existing interface extends all the new interfaces for back compatibility or those who want simplicity and don't care about perf.
This commit adds support for tracking metrics for bridges for both
normal bridges and bridges that are part of a cluster. The two
statistics added in this commit are messages pending acknowledgement
and messages acknowledged but more can be added later.
Anonymous senders (those created without a target address) are not
blocked when max-disk-usage is reached. The cause is that when such
a sender is created on the broker, the broker doesn't check the
disk/memory usage and gives out the credit immediately.
Fix so that it only updates if not null, to avoid user being unset from existing api's,
This is similar to other values, that only change the value when not null.
This reverts commit c3fbd1b9e4.
Based on the discussion on the PR
https://github.com/apache/activemq-artemis/pull/2035 this shouldn't have
been merged. It's importing JMS-specific code into the core broker which
is something we've worked hard to eliminate in recent releases.
- Split protocols into individual chapters
- Reorganize summary to flow more logically
- Fill in missing parameters in configuration index
- Normalize spaces for ordered and unordered lists
- Re-wrap lots of text for readability
- Fix incorrect XML snippets
- Normalize table formatting
- Improve internal links with anchors
- Update content to reflect new address model
- Resized architecture images to avoid excessive white-space
- Update some JavaDoc
- Update some schema elements
- Disambiguate AIO & ASYNCIO where necessary
- Use URIs instead of Objects in code examples
Authentication failures are currently only logged for CORE clients.
This change puts the logging in a central location which all protocols
use for authentication so that authentication failures are logged for
all protocols.
Calling close multiple times on ServerConsumer can result in multiple
notifications being routed around the cluster. This causes cluster
topology info to become skewed. Which affects a number of components
such as message redistribution, metrics and can eventually cause OOM
should multiple queues be redistributing at the same time.
After sending pages, the thread will hold the storage manager write lock
and send synchronization finished packet(use the parent thread pool) to
the backup node. At the same time, thread pool is full bcs they are
waiting for the storage manager read lock to write the page or journal,
leading to replication starting failure. Here we use io executor to send
replicate packet to fix thread pool starvation problem.
Quorum voting is used by both the live and the backup to decide what to do if a replication connection is disconnected.
Basically, the server will request each live server in the cluster to vote as to whether it thinks the server it is replicating to or from is still alive.
You can also configure the time for which the quorum manager will wait for the quorum vote response.
Currently, the value is hardcoded as 30 sec. We should change this 30-second wait to be configurable.
When setting "default-max-consumers" in addressing setting in broker.xml
It has no effect on the "max-consumers" property on matching address queues.
Tests added as part of a previous commit
This closes#2107