With NFSv4 it is now necessary to lock/unlock the byte of the server
lock file where the state information is written so that the
information is then flushed to the other clients looking at the file.
(cherry picked from commit 2ec173bc70)
There is a leak on replication tokens in the moment when a backup is
shutdowned or killed and the ReplicationManager is stopped. If there
are some tasks (holding replication tokens) in the executor, these
tokens are simply ignored and replicationDone method isn't called on
them. Because of this, some tasks in OperationContextImpl cannot be
finished.
(cherry picked from commit 88a018e17fd49097de1186c65e25cd0af578b6a9)
(cherry picked from commit d6cbc0aa88)
When a large message is replicated to backup, a pendingID is generated
when the large message is finished. This pendingID is generated by a
BatchingIDGenerator at backup.
It is possible that a pendingID generated at backup may be a duplicate
to an ID generated at live server.
This can cause a problem when a large message with a messageID that is
the same as another largemessage's pendingID is replicated and stored
in the backup's journal, and then a deleteRecord for the pendingID
is appended. If backup becomes live and loads the journal, it will
drop the large message add record because there is a deleteRecord of
the same ID (even though it is a pendingID of another message).
As a result the expecting client will never get this large message.
So in summary, the root cause is that the pendingIDs for large
messages are generated at backup while backup is not alive.
The solution to this is that instead of the backup generating
the pendingID, we make them all be generated in advance
at live server and let them replicated to backup whereever needed.
The ID generater at backup only works when backup becomes live
(when it is properly initialized from journal).
(cherry picked from commit d50f577cd5)
add the adaptTransportConfiguration() method to the
ClientProtocolManagerFactory so that transport configurations used by
the ClientProtocolManager have an opportunity to adapt their transport
configuration.
This allows the HornetQClientProtocolManagerFactory to adapt the
transport configuration received by remote HornetQ broker to replace the
HornetQ-based NettyConnectorFactory by the Artemis-based one.
JIRA: https://issues.apache.org/jira/browse/ARTEMIS-1431
The default id-cache-size is 20000 and the default
confirmation-window-size is 1MB. It turns out the 1MB
size is too small for id-cache-size.
To fix it we adjust the confirmation-window-size to 10MB. Also
a test is added to guarantee it won't break this rule when this
default value is to be changed to any new value.
(cherry picked from commit 06986e4ee1)
When a large message is being diverted, a new copy of the original
message is created and replicated (if there is a backup) to the backup.
In LargeServerMessageImpl.copy(long) it reuse a byte array to copy
message body. It is possible that one block of date is read into
the byte array before the previous read has been replicated,
causing the replicated bytes to corrupt.
If we make a copy of the byte array before replication, the corruption
of data will be avoided.
(cherry picked from commit 045021f7df)
If replication blocked anything on the journal
the processing from clients would be blocked
and nothing would work.
As part of this fix I am using an executor on ServerSessionPacketHandler
which will also scale better as the reader from Netty would be feed immediately.
remove custom repo
update groupid to match artifact in maven central.
bump version also to that now deployed to maven central.
bump checkstyle version to 7.7 to make compatible.
updated checkstyle.xml to ignore existing issues which are prolific
which are now flagged in latest version as some bugs in previous meant they we'ren't detected e.g. https://github.com/checkstyle/checkstyle/issues/3320
fixing some violations which are not too prolific.
(cherry picked from commit b8ebe05775)
Even if there is no address/url configured for the NetworkHealthCheck
an isolated backup will still fail-over potentially causing split-brain.
(cherry picked from commit 5cb5c8a6dc)
Added a wait-for-activation option to shared-store master HA policies.
This option is enabled by default to ensure unchanged server startup behavior.
If this option is enabled, ActiveMQServer.start() with a shared-store master server will not return
before the server has been activated.
If this options is disabled, start() will return after a background activation thread has been started.
The caller can use waitForActivation() to wait until server is activated, or just check the current activation status.
(cherry picked from commit 6017e305d9)
(cherry picked from commit 7aa50546b3)
AIOFileLockManager doesn't work on NFS-mounted share store directories.
Since the GFS2 bug https://bugzilla.redhat.com/show_bug.cgi?id=678585
has been fixed end of 2011, the class AIOFileLockManager is no longer needed and I have removed it.
(cherry picked from commit 557f02ba4d)