This is still part of the ARTEMIS-1776 fix, which is still part of the same release we are on now.
Hence I'm not opening a new JIRA for this one.
Cherry-picked from e5bce13316f7e81bb15a12592622df2ea2632a35
The cluster connection bridge hard-codes its producerWindowSize to -1
(meaning no producer flow control) even if you configure it otherwise.
Cherry-picked from 98ce31bf588a314a0408216e4ad4b85597a15186
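For context, a minimal sketch of setting a producer window on a cluster
connection programmatically; the setter name setProducerWindowSize is assumed
here and the names and values are purely illustrative:

    import org.apache.activemq.artemis.core.config.ClusterConnectionConfiguration;

    public class ClusterProducerWindowSketch {
       public static void main(String[] args) {
          // The window configured here is what the cluster connection bridge
          // should honor instead of silently overriding it with -1.
          ClusterConnectionConfiguration clusterConnection = new ClusterConnectionConfiguration();
          clusterConnection.setName("my-cluster");               // hypothetical cluster name
          clusterConnection.setConnectorName("netty-connector"); // hypothetical connector ref
          clusterConnection.setProducerWindowSize(1024 * 1024);  // 1MB window; setter name assumed
          System.out.println(clusterConnection.getName());
       }
    }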
With NFSv4 it is now necessary to lock/unlock the byte of the server
lock file where the state information is written, so that the
information is flushed to the other clients looking at the file.
(cherry picked from commit 2ec173bc708daca163c9356cf12440432abc61c4)
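A minimal sketch, in plain java.nio, of the byte-range lock/unlock described
above; the file name, offset, and state byte are illustrative and do not
reflect the actual Artemis lock-file layout:

    import java.io.RandomAccessFile;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.channels.FileLock;

    public class NfsStateByteSketch {
       public static void main(String[] args) throws Exception {
          try (RandomAccessFile raf = new RandomAccessFile("server.lock", "rw");
               FileChannel channel = raf.getChannel()) {
             // Lock only the single byte that carries the shared state information.
             FileLock lock = channel.lock(0, 1, false);
             try {
                channel.write(ByteBuffer.wrap(new byte[]{(byte) 'L'}), 0); // illustrative state marker
                channel.force(true); // flush so other NFS clients see the new value
             } finally {
                lock.release(); // unlocking prompts NFSv4 clients to revalidate the byte
             }
          }
       }
    }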
There is a leak of replication tokens at the moment a backup is
shut down or killed and the ReplicationManager is stopped. If there
are still tasks (holding replication tokens) in the executor, these
tokens are simply ignored and the replicationDone method isn't called
on them. Because of this, some tasks in OperationContextImpl can never
be finished.
(cherry picked from commit 88a018e17fd49097de1186c65e25cd0af578b6a9)
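A hedged sketch of the idea behind the fix: when the manager stops, any tokens
still queued should be completed rather than dropped. The type and field names
below (ReplicationToken, pendingTokens) are illustrative stand-ins, not the
actual Artemis code:

    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.ExecutorService;

    public class ReplicationStopSketch {
       // Hypothetical stand-in for the tokens held by queued replication tasks.
       interface ReplicationToken { void replicationDone(); }

       private final Queue<ReplicationToken> pendingTokens = new ConcurrentLinkedQueue<>();
       private final ExecutorService executor;

       ReplicationStopSketch(ExecutorService executor) { this.executor = executor; }

       void stop() {
          executor.shutdown(); // no new replication tasks are accepted
          ReplicationToken token;
          while ((token = pendingTokens.poll()) != null) {
             token.replicationDone(); // complete the token instead of silently leaking it
          }
       }
    }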
(cherry picked from commit d6cbc0aa885fa88beb7f1b2450cdfe4da9466947)
This is based on the work @jbertram did in GitHub PR #1466 and the discussions we had there.
(cherry picked from commit ce6942a9aa9375efaa449424fe89de2db3f22e36)
When a large message is replicated to backup, a pendingID is generated
when the large message is finished. This pendingID is generated by a
BatchingIDGenerator at backup.
It is possible that a pendingID generated at the backup may be a
duplicate of an ID generated at the live server.
This can cause a problem when a large message with a messageID that is
the same as another large message's pendingID is replicated and stored
in the backup's journal, and then a deleteRecord for the pendingID
is appended. If the backup becomes live and loads the journal, it will
drop the large message add record because there is a deleteRecord with
the same ID (even though it is the pendingID of another message).
As a result, the client expecting it will never receive this large message.
So in summary, the root cause is that the pendingIDs for large
messages are generated at the backup while the backup is not live.
The solution is that, instead of the backup generating the pendingID,
we generate them all in advance at the live server and have them
replicated to the backup wherever needed.
The ID generator at the backup only works once the backup becomes live
(when it is properly initialized from the journal).
(cherry picked from commit d50f577cd50df37634f592db65200861fe3e13d3)
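A small illustration of why two independently running ID sequences can
collide; AtomicLong stands in for BatchingIDGenerator here purely to show the
collision, it is not the real implementation:

    import java.util.concurrent.atomic.AtomicLong;

    public class DuplicateIdSketch {
       public static void main(String[] args) {
          AtomicLong liveIds = new AtomicLong();   // message IDs generated at the live server
          AtomicLong backupIds = new AtomicLong(); // pendingIDs generated at the backup

          long messageId = liveIds.incrementAndGet();   // add record for a large message
          long pendingId = backupIds.incrementAndGet(); // pendingID for a different message

          // Both values land in the backup's journal; a deleteRecord for the pendingID
          // then also kills the large message add record that happens to share the ID.
          System.out.println("collision: " + (messageId == pendingId));
       }
    }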
add the adaptTransportConfiguration() method to the
ClientProtocolManagerFactory so that the ClientProtocolManager has an
opportunity to adapt the transport configurations it uses.
This allows the HornetQClientProtocolManagerFactory to adapt the
transport configuration received from a remote HornetQ broker, replacing
the HornetQ-based NettyConnectorFactory with the Artemis-based one.
JIRA: https://issues.apache.org/jira/browse/ARTEMIS-1431
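A hedged sketch of the adaptation hook described above; the exact method
signature and the way HornetQClientProtocolManagerFactory performs the swap
may differ from this:

    import org.apache.activemq.artemis.api.core.TransportConfiguration;

    public class HornetQAdaptSketch {
       // Given a transport configuration received from a HornetQ broker, return one
       // that points at the Artemis connector factory instead. Class names are spelled
       // out for illustration; the real factory may resolve them differently.
       public TransportConfiguration adaptTransportConfiguration(TransportConfiguration tc) {
          if ("org.hornetq.core.remoting.impl.netty.NettyConnectorFactory".equals(tc.getFactoryClassName())) {
             return new TransportConfiguration(
                "org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnectorFactory",
                tc.getParams(), tc.getName());
          }
          return tc;
       }
    }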
The next commit should have the fix.
This is to make it easy for people looking to confirm the fix.
(cherry picked from commit 96c6268f5a68a293589ac0061d561265d9e79972)
The default id-cache-size is 20000 and the default
confirmation-window-size is 1MB. It turns out the 1MB
size is too small for that id-cache-size.
To fix it we adjust the default confirmation-window-size to 10MB. A
test is also added to guarantee this rule won't be broken if either
default value is changed to a new value.
(cherry picked from commit 06986e4ee1eb32fc2642b111ca3955518f684adb)
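A hedged sketch of the kind of guard test mentioned above; the constants and
the exact relationship checked are illustrative, since the real test compares
the actual Artemis defaults:

    public class DefaultsGuardSketch {
       static final int DEFAULT_ID_CACHE_SIZE = 20_000;                      // mirrors the commit message
       static final int DEFAULT_CONFIRMATION_WINDOW_SIZE = 10 * 1024 * 1024; // 10MB after this change

       public static void main(String[] args) {
          // Guard against one default being changed without revisiting the other:
          // the confirmation window must stay comfortably larger than the id cache needs.
          if (DEFAULT_CONFIRMATION_WINDOW_SIZE <= DEFAULT_ID_CACHE_SIZE) {
             throw new IllegalStateException("confirmation-window-size too small for id-cache-size");
          }
          System.out.println("defaults consistent");
       }
    }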
When a large message is being diverted, a new copy of the original
message is created and replicated (if there is a backup) to the backup.
In LargeServerMessageImpl.copy(long) a byte array is reused to copy the
message body. It is possible that one block of data is read into
the byte array before the previous read has been replicated,
causing the replicated bytes to be corrupted.
If we make a copy of the byte array before replication, the corruption
of data will be avoided.
(cherry picked from commit 045021f7df583f6109cb0749dc1601c9a85dbe75)
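A minimal sketch of the defensive copy described above; the Replicator
interface and method names are illustrative stand-ins, not the actual
LargeServerMessageImpl code:

    import java.util.Arrays;

    public class LargeMessageCopySketch {
       // Hypothetical stand-in for the asynchronous replication channel.
       interface Replicator { void replicate(byte[] block); }

       private final Replicator replicator;
       private final byte[] buffer = new byte[64 * 1024]; // read buffer reused for every block

       LargeMessageCopySketch(Replicator replicator) { this.replicator = replicator; }

       void onBlockRead(int bytesRead) {
          // Snapshot the block before the buffer is refilled by the next read; otherwise
          // the asynchronous replication may ship bytes that were already overwritten.
          replicator.replicate(Arrays.copyOf(buffer, bytesRead));
       }
    }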
If replication blocked anything on the journal,
the processing from clients would be blocked
and nothing would work.
As part of this fix I am using an executor on ServerSessionPacketHandler,
which will also scale better as the reader from Netty is freed immediately.
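A rough sketch of the hand-off pattern, assuming a per-session single-threaded
executor; this is not the actual ServerSessionPacketHandler code:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class PacketHandOffSketch {
       // One executor per session keeps packet ordering while letting the
       // Netty reader thread return immediately instead of blocking on the journal.
       private final ExecutorService sessionExecutor = Executors.newSingleThreadExecutor();

       void handlePacket(Runnable packetWork) {
          sessionExecutor.execute(packetWork); // caller never blocks on replication or journal waits
       }
    }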
remove custom repo
update groupid to match artifact in maven central.
also bump version to the one now deployed to maven central.
bump checkstyle version to 7.7 to make it compatible.
updated checkstyle.xml to ignore existing issues which are prolific and are
now flagged in the latest version, as some bugs in previous versions meant
they weren't detected, e.g. https://github.com/checkstyle/checkstyle/issues/3320
fixing some violations which are not too prolific.
(cherry picked from commit b8ebe0577504fa0b2c39471f36656a0fe0ad59bd)
Even if there is no address/URL configured for the NetworkHealthCheck,
an isolated backup will still fail over, potentially causing split-brain.
(cherry picked from commit 5cb5c8a6dc7179078f099d5455343e1f18fd3398)
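A rough sketch of the kind of reachability probe a configured
NetworkHealthCheck address enables; InetAddress.isReachable is used here
purely for illustration and is not necessarily what Artemis does internally:

    import java.net.InetAddress;

    public class ReachabilitySketch {
       // With no address/URL configured there is nothing to probe, so an isolated backup
       // cannot distinguish "the live server died" from "my own network is down".
       static boolean networkIsUp(String pingAddress) {
          try {
             return InetAddress.getByName(pingAddress).isReachable(1000); // 1s timeout, illustrative
          } catch (Exception e) {
             return false;
          }
       }

       public static void main(String[] args) {
          System.out.println("allow failover: " + networkIsUp("192.168.0.1")); // hypothetical gateway
       }
    }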
Added a wait-for-activation option to shared-store master HA policies.
This option is enabled by default to ensure unchanged server startup behavior.
If this option is enabled, ActiveMQServer.start() with a shared-store master server will not return
before the server has been activated.
If this option is disabled, start() will return after a background activation thread has been started.
The caller can use waitForActivation() to wait until the server is activated, or just check the current activation status.
(cherry picked from commit 6017e305d90664b4bf5b8f891e43514907df9a4b)
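A hedged usage sketch, assuming waitForActivation accepts a timeout and
TimeUnit as suggested above; check the actual ActiveMQServer API before
relying on the exact signature:

    import java.util.concurrent.TimeUnit;
    import org.apache.activemq.artemis.core.server.ActiveMQServer;

    public class StartAndWaitSketch {
       // With wait-for-activation disabled, start() returns once the background
       // activation thread is running; the caller then waits explicitly if it needs to.
       void startAndWait(ActiveMQServer server) throws Exception {
          server.start();
          server.waitForActivation(30, TimeUnit.SECONDS); // assumed timeout-based signature
       }
    }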
(cherry picked from commit 7aa50546b3dc6c012bd7d4118d82055ffd1e1226)