After sending pages, the thread holds the storage manager write lock
and sends the synchronization-finished packet (using the parent thread pool)
to the backup node. At the same time, the thread pool is full because its
threads are waiting for the storage manager read lock to write a page or
journal record, so replication fails to start. Here we use the IO executor
to send the replication packet, which fixes the thread pool starvation problem.
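
The starvation described above can be reproduced in miniature. This is a minimal sketch, not the broker code: `StarvationSketch`, the pool size of 2 and the 200 ms wait are assumptions made for illustration, and plain `java.util.concurrent` executors stand in for the broker's thread pool and IO executor.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class StarvationSketch {

    /** Returns true if the "synchronization finished" task ran while the write lock was held. */
    static boolean sendSyncFinished(boolean useDedicatedIoExecutor) {
        ReentrantReadWriteLock storageLock = new ReentrantReadWriteLock();
        ExecutorService threadPool = Executors.newFixedThreadPool(2);
        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        storageLock.writeLock().lock();
        try {
            // Occupy every pool thread with a worker that needs the read lock.
            CountDownLatch workersStarted = new CountDownLatch(2);
            for (int i = 0; i < 2; i++) {
                threadPool.execute(() -> {
                    workersStarted.countDown();
                    storageLock.readLock().lock(); // blocks until the write lock is released
                    storageLock.readLock().unlock();
                });
            }
            workersStarted.await();

            // Submit the packet-sending task while still holding the write lock.
            CountDownLatch sent = new CountDownLatch(1);
            ExecutorService sender = useDedicatedIoExecutor ? ioExecutor : threadPool;
            sender.execute(sent::countDown);
            return sent.await(200, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            storageLock.writeLock().unlock();
            threadPool.shutdown();
            ioExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("shared pool: " + sendSyncFinished(false));
        System.out.println("io executor: " + sendSyncFinished(true));
    }
}
```

With the shared pool the task stays queued behind the blocked workers until the write lock is released; with the dedicated executor it runs immediately.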
Quorum voting is used by both the live and the backup to decide what to do if a replication connection is disconnected.
Basically, the server will request each live server in the cluster to vote as to whether it thinks the server it is replicating to or from is still alive.
You can also configure the time for which the quorum manager will wait for the quorum vote response.
Currently, the value is hardcoded to 30 seconds. This 30-second wait should be configurable.
When "default-max-consumers" is set in an address setting in broker.xml,
it has no effect on the "max-consumers" property of matching address queues.
Tests added as part of a previous commit
This closes #2107
1. Add test cases to verify the issue and the fix; the tests also check for the same behavior using CORE, OPENWIRE and AMQP JMS clients.
2. Update the Core client to check for the queue before creating a sharedQueue, as per the createQueue logic.
3. Update ServerSessionPacketHandler to handle packets from old clients, implementing the same fix server side for older clients.
4. Correct the AMQP protocol so that the correct error code is returned on a security exception and AMQP JMS can correctly throw JMSSecurityException.
5. Correct the AMQP protocol to check that the queue exists before creating it.
6. Correct the OpenWire protocol to check that the address exists before creating it.
When database persistence is used with no shared store option,
Artemis chooses InVMNodeManager, which does not provide
the same behaviour as FileLockNodeManager.
Replace guava Preconditions with artemis Preconditions
Replace guava Predicate with java Predicate
Replace guava Ordering with java Comparator
Replace guava Immutable collections with ArrayList/Set wrapped as unmodifiable
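
A minimal sketch of what these replacements can look like using only the JDK. `GuavaFreeSketch` and `sortedNonEmpty` are hypothetical names, and `Objects.requireNonNull` stands in here for a precondition check; it is not the artemis Preconditions class.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.function.Predicate;

final class GuavaFreeSketch {

    static List<String> sortedNonEmpty(Collection<String> names) {
        // Precondition check without guava (stand-in for a Preconditions call).
        Objects.requireNonNull(names, "names");

        // java.util.function.Predicate instead of guava Predicate.
        Predicate<String> nonEmpty = s -> s != null && !s.isEmpty();
        // java.util.Comparator instead of guava Ordering.
        Comparator<String> order = Comparator.naturalOrder();

        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (nonEmpty.test(name)) {
                result.add(name);
            }
        }
        result.sort(order);

        // ArrayList wrapped as unmodifiable instead of a guava immutable list.
        return Collections.unmodifiableList(result);
    }
}
```

Note that an unmodifiable wrapper is a read-only view rather than a true immutable copy, which is only safe here because the backing list never escapes the method.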
PageCursorProviderImpl is not handling any pending cleanup tasks
on stop, leaving paging enabled due to the remaining pages to be
cleared up.
PagingStoreImpl is responsible for triggering the flush of pending
tasks on PageCursorProviderImpl before stopping it, and for trying to
execute any remaining tasks on the owned common executor before
shutting it down.
It fixes testTopicsWithNonDurableSubscription.
NotificationActiveMQServerPlugin and LoggingActiveMQServerPlugin are
implementing the deprecated version of
ActiveMQServerPlugin::messageExpired, which is not called by the
new version of the method or by any other part of the code.
This fixes the test org.apache.activemq.artemis.tests.integration.management.NotificationTest#testMessageExpired
It avoids using the system clock for the lock logic
by using the DBMS time.
It contains several improvements to the JDBC error handling
and improved observability thanks to debug logs.
largeMessagesFactory::newBuffer could create a pooled direct ByteBuffer
that will not be released into the factory pool: using a heap ByteBuffer
will perform more internal copies, but makes the buffer simpler to garbage
collect.
QuorumFailOverTest.testQuorumVotingLiveNotDead fails
because the quorum vote takes longer to finish than
the test expects.
(The test used to pass until commit ARTEMIS-1763)
The previous commit for this feature wasn't using the ResultSet of the
row count query.
The mechanics have been changed to allow the row count query
to fail, because DROP and CREATE aren't transactional and immediate
in most DBMS.
It includes a test that stresses these mechanics when used with DBMS like
DB2 10.5 and Oracle 12c.
Additional checks and logs have been added to trace each step.
JdbcNodeManager is configured to use the same network timeout
value as the journal and to validate all the timeout values
related to correct HA behaviour.
The JDBC Connection leaks on:
- JDBCFileUtils::getDBFileDriver(DataSource, SQLProvider)
- SharedStoreBackupActivation.FailbackChecker::run on a failed awaitLiveStatus
Expose method to return current mappings of groups to consumers
Expose methods to reset (remove) specific group mapping from groupID to Consumer
Expose methods to reset (remove) all group mappings
messageAcknowledged plugin callback methods
Knowing the consumer that expired or acked a message (if available) is
useful. Right now a message reference only contains a consumer id,
which by itself is not unique, so the actual consumer needs to be passed.
When finding out whether a connector belongs to a target node, the code
compares the whole parameter map, which is not necessary. Also, the best
place to interpret the connector is the corresponding remoting connection,
which understands it, so the check is delegated there. (E.g. INVMConnection
knows whether the connector belongs to a target node by checking its
serverID only. The netty ones only need to match host and port,
understanding that localhost and 127.0.0.1 are the same thing.)
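
The netty comparison can be sketched as follows. This is an illustration only: `ConnectorMatchSketch`, `sameNettyTarget` and the `"host"`/`"port"` map keys are assumptions, not the actual remoting connection code.

```java
import java.util.Locale;
import java.util.Map;

final class ConnectorMatchSketch {

    // Treats "127.0.0.1" and "localhost" as the same host.
    private static String normalizeHost(Object host) {
        String h = String.valueOf(host).trim().toLowerCase(Locale.ROOT);
        return h.equals("127.0.0.1") ? "localhost" : h;
    }

    // Compares only host and port; every other connector parameter is ignored.
    // Ports are compared as strings so that 61616 and "61616" match.
    static boolean sameNettyTarget(Map<String, Object> a, Map<String, Object> b) {
        boolean sameHost = normalizeHost(a.get("host")).equals(normalizeHost(b.get("host")));
        boolean samePort = String.valueOf(a.get("port")).equals(String.valueOf(b.get("port")));
        return sameHost && samePort;
    }
}
```

Comparing the whole parameter map would treat two connectors to the same node as different whenever an unrelated parameter (for example an SSL flag) differs, which is the case the commit avoids.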
The queue metrics were being decremented improperly on iteration
over the cancelled scheduled messages because the flag for fromMessageReferences was not
set to false. Setting the flag to false skips the metrics update,
which is what we want, as the scheduled messages were never added to the
message references in the first place, so the metrics don't need updating.
When creating a temp destination with auto-create-address set to false, the
broker throws an error and refuses to create it. This doesn't conform to the
normal use case (like the AMQP dynamic flag), where the temp destination should
be allowed even if auto-create-address is false.
There was logic to validate whether member is null,
which seemed a bit weird considering the else branch would throw an NPE.
Fixing it proactively based on Coverity scan findings.
The cluster connection bridge has a TopologyListener and connects to a new node
each time it receives a nodeUp() event. It needs a check here to make
sure that the cluster bridge only connects to its target node and its backups.
This issue shows up when you run the LiveToLiveFailoverTest.testConsumerTransacted
test.
This commit also improves BackupSyncJournalTest so that it runs more
stably.
It includes:
- Message References: no longer uses boxed primitives and AtomicInteger
- Node: intrusive nodes no longer need a reference field holding itself
- RefCountMessage: no longer uses AtomicInteger, but AtomicIntegerFieldUpdater
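
A minimal sketch of the AtomicIntegerFieldUpdater pattern mentioned for RefCountMessage. The class and method names are hypothetical, not the Artemis ones; the point is that one static updater operates on a plain volatile int field, so each instance avoids allocating a separate AtomicInteger object.

```java
import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;

class RefCountSketch {

    // One shared updater per class; it must be created inside the class
    // because the target field is private.
    private static final AtomicIntegerFieldUpdater<RefCountSketch> REF_COUNT =
        AtomicIntegerFieldUpdater.newUpdater(RefCountSketch.class, "refCount");

    // The updated field must be a volatile int.
    private volatile int refCount;

    int incrementRefCount() {
        return REF_COUNT.incrementAndGet(this);
    }

    int decrementRefCount() {
        return REF_COUNT.decrementAndGet(this);
    }

    int getRefCount() {
        return refCount;
    }
}
```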
It allows a user to customize the max allowed distance between system and DB time,
improving HA reliability by shutting down the broker when the misalignment
exceeds the configured limit.
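
The check itself reduces to comparing the absolute difference of the two clocks with the limit. A sketch under assumed names (`ClockSkewSketch` and `exceedsAllowedDistance` are hypothetical, and all values are taken to be in milliseconds):

```java
final class ClockSkewSketch {

    // True when system and DB time are further apart than the allowed
    // distance, in either direction.
    static boolean exceedsAllowedDistance(long systemTimeMillis, long dbTimeMillis, long maxAllowedMillis) {
        return Math.abs(systemTimeMillis - dbTimeMillis) > maxAllowedMillis;
    }
}
```

A caller would typically read the DB time through a query against the DBMS and trigger the broker shutdown when this returns true.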
In some environments it is not allowed to create a schema
by the application itself. With this change the AbstractJDBCDriver
now tests if an existing table is empty and executes further
statements in the same way as if the table does not exist.
It forces the use of InVMNodeManager when no HA option is selected with JDBC persistence and includes checks that the only valid JDBC HA options are SHARED_STORE_MASTER and SHARED_STORE_SLAVE.
The JDBC Lock Acquisition Timeout is no longer exposed to any user configuration and defaults to infinite to match the behaviour of the journal (file-based) one.
When creating a durable topic subscription using the Artemis 1.x JMS
client library, the client sends a QueueQuery to the server to see if
the durable subscription queue already exists. The broker then performs
some transformation of the queue addresses to suit the 1.x naming
scheme. However, if the queue does not already exist, the transform is
attempted on a null string, causing an NPE. To fix this we simply check that
the returned result has isExists=true.
Add a test case that stops and restarts the server after a config reload and checks state; this re-creates the network health check issue where config changes are lost when the network health check de-activates the server and then re-activates it.
Add a fix to update the held configuration that's used when the initialisation steps during start are done.
Free the hash sets used to hold page positions for acks and removed refs.
The two sets are cleared, but they still hold a big array.
It is safe to replace the old ones with empty sets.
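
A minimal sketch of the difference, with hypothetical names (`AckTrackerSketch` is not the Artemis class, and it is not thread-safe): `clear()` empties a HashSet but keeps the internal table it grew to, while assigning a fresh set lets the old table be garbage collected.

```java
import java.util.HashSet;
import java.util.Set;

final class AckTrackerSketch {

    private Set<Long> ackedPositions = new HashSet<>();

    void acked(long position) {
        ackedPositions.add(position);
    }

    int pending() {
        return ackedPositions.size();
    }

    void release() {
        // ackedPositions.clear() would leave the large backing table allocated.
        // Swapping in a new empty set drops the only reference to it.
        ackedPositions = new HashSet<>();
    }
}
```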
Logging for the "fast-tests" profile used for PR builds could be reduced
significantly. This would save time as well as prevent log truncation
(Travis CI only supports logs up to 4MB).
Revert #1875
This reverts commit 5ad45369ce.
The storage manager is broken now as the AddressManager change here is trying to insert a record on the journal before startup.
- LargeServerMessageImpl.finalize is eventually causing deadlocks
- CoreMessage needs to check properties before decoding
- PagingTest tweaks
- ServerLocatorImpl can deadlock eventually, avoiding a lock and using actors
- ActiveMQServerImpl.finalize is also evil and can cause deadlocks on the testsuite
- MqttClusterRemoteSubscribeTest needs to setup the Address now on the setup
The PageCountPendingImpl was increasing the encode size without using its full allocation.
This was causing issues on replication, as the encode size is also used to determine the size of the packets.
However, the packets were not receiving the full allocated data, causing missing packets on replication
and test failures.
This fixes the issue.
This is good when you are a customer and an artemis engineer (e.g. me) asks for your journal print-data but you can't provide it because that would expose your users' data. If you run artemis data print --safe, that will only expose the journal structure without exposing users' data, eliminating any liability between the engineer and users.
Transactions may initialize a PagedReference without a valid message yet
during load of prepared transactions.
Caching has to be lazy in this case and should load on demand.
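
Lazy caching of this kind can be sketched as below. `LazyIdSketch`, its `LongSupplier` loader and the use of -1 as the "not loaded yet" marker are assumptions for illustration, not the PagedReference implementation.

```java
import java.util.function.LongSupplier;

final class LazyIdSketch {

    private final LongSupplier loader; // reads the id from the message when it becomes available
    private long messageId = -1;       // -1 marks "not loaded yet" in this sketch
    private int loads;

    LazyIdSketch(LongSupplier loader) {
        this.loader = loader;
    }

    long getMessageId() {
        // Nothing is read at construction time; the first call loads and caches.
        if (messageId < 0) {
            messageId = loader.getAsLong();
            loads++;
        }
        return messageId;
    }

    int loadCount() {
        return loads;
    }
}
```

Constructing the reference never touches the message, so it is safe to create one before the message is valid; only the first `getMessageId()` call pays the load cost.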
Adding new metrics for tracking message counts and sizes on a Queue.
This includes tracking metrics for pending, delivering and scheduled
messages. The paging store also tracks message size now.
Cache `messageID`, `transactionID` and `isLargeMessage`
in PagedReference so that, on acknowledge, we do not have to
get the PagedMessage, which may have been GCed and would cause a re-read of the entire page.
There was a small bug in the previous commit: the beforeMessageRoute
callback was being executed too early, so the RoutingContext wasn't
being filled in.
Support exclusive consumer
Allow default address level settings for exclusive consumer
Allow queue level setting in broker.xml
Add the ability to set queue settings via Core JMS using the address, similar to ActiveMQ 5.x
Allow the Core JMS client to define an exclusive consumer using address parameters
Add tests
Make sure that if a bridge disconnects and there is no record in the topology, it uses the original bridge connector to reconnect.
Originally the live broker that disconnected was left in the Topology; this broke quorum voting, as when the vote happened all brokers, when asked, thought the target broker was still alive.
The fix for this was to remove the target live broker from the Topology. Since the bridge reconnect logic relied on this record in a non-HA environment to reconnect, this stopped working.
The fix now uses the original target connector (or backup) to reconnect in the case where the broker was actually removed from the cluster.
https://issues.apache.org/jira/browse/ARTEMIS-1654
ActiveMQTestBase has been enhanced to expose the Database storage configuration and by adding specific JDBC HA configuration properties.
JdbcLeaseLockTest and NettyFailoverTests have been changed in order to make use of the JDBC configuration provided by ActiveMQTestBase.
JdbcNodeManager has been made restartable to allow failover tests to reuse it after a failover.
When the live starts replication, it must make sure there is
no pending write in the message and bindings journals, or we may
lose journal records during initial replication.
So we need to flush the append executor after acquiring the StorageManager's
write lock and before the Journal's write lock.
We also set a 10-second timeout on the flush, the same as
Journal::flushExecutor. If we fail to flush within 10 seconds,
we abort replication; the backup will try again later.
Use OrderedExecutorFactory::flushExecutor to flush the executor.
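
Flushing an ordered executor with a timeout can be sketched as follows. This is an illustration of the general technique, not the body of OrderedExecutorFactory::flushExecutor; `FlushSketch` and `flush` are hypothetical names, and the sketch assumes the executor runs tasks one at a time in submission order.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

final class FlushSketch {

    /**
     * Submits a marker task and waits for it. On an ordered executor the marker
     * only runs after every previously submitted task has finished, so a true
     * result means there is no pending work left from before the call.
     */
    static boolean flush(ExecutorService orderedExecutor, long timeout, TimeUnit unit) {
        CountDownLatch marker = new CountDownLatch(1);
        orderedExecutor.execute(marker::countDown);
        try {
            return marker.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A caller following the description above would invoke `flush(appendExecutor, 10, TimeUnit.SECONDS)` while holding the storage manager write lock and abort replication on a false result.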