- Fixed an issue where I needed to set connection to null after closing it
- Added more tests on QpidDispatchPeerTest (tests i would have done manually, and reproduced a few issues along the way)
- Adding a paragraph about addressing and distinct queue names
- Renaming match on peers, senders and receivers as "address-match"
- Changing qpid dispatch test to use a single listener
- Fixing reconnect attemps message
It add additional required fixes:
- Fixed uncommitted deleted tx records
- Fixed JDBC authorization on test
- Using property-based version for commons-dbcp2
- stopping thread pool after activation to allow JDBC lease locks to release the lock
- centralize JDBC network timeout configuration and save repeating it
- adding dbcp2 as the default pooled DataSource to be used
Replaces direct jdbc connections with dbcp2 datasource. Adds
configuration options to use alternative datasources and to alter the
parameters. While adding slight overhead, this vastly improves the
management and pooling capabilities with db connections.
This reverts commit dbb3a90fe6.
The org.apache.activemq.artemis.core.server.Queue#getRate method is for
slow-consumer detection and is designed for internal use only.
Furthermore, it's too opaque to be trusted by a remote user as it only
returns the number of message added to the queue since *the last time
it was called*. The problem here is that the user calling it doesn't
know when it was invoked last. Therefore, they could be getting the
rate of messages added for the last 5 minutes or the last 5
milliseconds. This can lead to inconsistent and misleading results.
There are three main ways for users to track rates of message
production and consumption:
1. Use a metrics plugin. This is the most feature-rich and flexible
way to track broker metrics, although it requires tools (e.g.
Prometheus) to store the metrics and display them (e.g. Grafana).
2. Invoke the getMessageCount() and getMessagesAdded() management
methods and store the returned values along with the time they were
retrieved. A time-series database is a great tool for this job. This is
exactly what tools like Prometheus do. That data can then be used to
create informative graphs, etc. using tools like Grafana. Of course, one
can skip all the tools and just do some simple math to calculate rates
based on the last time the counts were retrieved.
3. Use the broker's message counters. Message counters are the broker's
simple way of providing historical information about the queue. They
provide similar results to the previous solutions, but with less
flexibility since they only track data while the broker is up and
there's not really any good options for graphing.
The queue is missing access to the server,
recent changed functionality on temporary queues namespace needed
the server and now the unit test has to pass in the reference to fix the test.
In a cluster scenario where non durable subscribers fail over to
backup while another live node forwarding messages to it,
there is a chance that the the live node keeps the old remote
binding for the subs and messages go to those
old remote bindings will result in "binding not found".
Both authentication and authorization will hit the underlying security
repository (e.g. files, LDAP, etc.). For example, creating a JMS
connection and a consumer will result in 2 hits with the *same*
authentication request. This can cause unwanted (and unnecessary)
resource utilization, especially in the case of networked configuration
like LDAP.
There is already a rudimentary cache for authorization, but it is
cleared *totally* every 10 seconds by default (controlled via the
security-invalidation-interval setting), and it must be populated
initially which still results in duplicate auth requests.
This commit optimizes authentication and authorization via the following
changes:
- Replace our home-grown cache with Google Guava's cache. This provides
simple caching with both time-based and size-based LRU eviction. See more
at https://github.com/google/guava/wiki/CachesExplained. I also thought
about using Caffeine, but we already have a dependency on Guava and the
cache implementions look to be negligibly different for this use-case.
- Add caching for authentication. Both successful and unsuccessful
authentication attempts will be cached to spare the underlying security
repository as much as possible. Authenticated Subjects will be cached
and re-used whenever possible.
- Authorization will used Subjects cached during authentication. If the
required Subject is not in the cache it will be fetched from the
underlying security repo.
- Caching can be disabled by setting the security-invalidation-interval
to 0.
- Cache sizes are configurable.
- Management operations exist to inspect cache sizes at runtime.
This is allowing journal appends to happen in burst
during replication, by batching replication response
into the network at the end of the append burst.
I couldn't reproduce this with a test, but static code analysis led me
to this solution which is similar to the fix done for ARTEMIS-2592 via
e397a17796.
1 of 2) - Porting of HORNETMQ-1575
In a live-backup scenario, when live is down and backup becomes live, clients
using HA Connection Factories can failover automatically. However if a
client decides to create a new connection by itself (as in camel jms case)
there is a chance that the new connection is pointing to the dead live
and the connection won't be successful. The reason is that if the old
connection is gone the backup will not get a chance to announce itself
back to client so it fails on initial connection.
The fix is to let CF remember the old topology and use it on any
initial connection attempts.
Since getDiskStoreUsage on the ActiveMQServerControl is converting a
double to a long the value is always 0 in the management API. It should
return a double instead.
Adding this metric required moving the meter registration code from the
AddressInfo class to the ManagementService in order to get clean access
to both the AddressInfo and AddressControl classes.
The calculation used by
ActiveMQServerControlImpl.getDiskStoreUsagePercentage() is incorrect. It
uses disk space info with global-max-size which is for address memory.
Also, the existing getDiskStoreUsage() method *already* returns a
percentage of total disk store usage so this method seems redundant.
Now it is possible to reset queue parameters to their defaults by removing them
from broker.xml and redeploying the configuration.
Originally this PR covered the "filter" parameter only.
ORIG message propertes like _AMQ_ORIG_ADDRESS are added to messages
during various broker operations (e.g. diverting a message, expiring a
message, etc.). However, if multiple operations try to set these
properties on the same message (e.g. administratively moving a message
which eventually gets sent to a dead-letter address) then important
details can be lost. This is particularly problematic when using
auto-created dead-letter or expiry resources which use filters based on
_AMQ_ORIG_ADDRESS and can lead to message loss.
This commit simply over-writes the existing ORIG properties rather than
preserving them so that the most recent information is available.
- when sending messages to DLQ or Expiry we now use x-opt legal names
- we now support filtering thorugh annotations if using m. as a prefix.
- enabling hyphenated_props: to allow m. as a prefix
DivertBindings are now properly cleaned up when a queue binding is
removed that matches the divert. The correct key is now used to remove
the queue address from the set and the correct address is now used to
remove the remote consumer.
Test fails with the primary server being killed by the crash and the backup server is killed
by the tearDown before ScaleDownHandler can kick in. This commit adds a wait method to allow
ScaleDownHandler to process before the test completes.
AmqpExpiredMessageTest will expire messages, eventually the counter could be 0
so it is invalid to assertEquals(1, queue.getMessageCount()) as it will be 0 eventually.
This one should improve eventual failures on ClusterConnectionControlTest
to validate this, run ClusterConnectionControlTest::testNotifications in a loop
you will see eventually the test taking longer to shutdown the executor as the call could be blocked.
This won't be an issue on a real server (Production system)
however, on the testsuite or while embedded this could cause issues,
if a same instance is stopped then started.
This is the reason why BackupAuthenticationTest was intermittently failing.
I also adapted the test since I would need to stop the server and reactivate it in order to change the configuration.
The previous test wasn't acting like a real server.