https://issues.apache.org/jira/browse/ARTEMIS-524
I am keeping all the debug ad tracing I added during the debug of this issue,
for that reason this commit may look longer than expected
The fix will be highlited by the tests added on org.apache.activemq.artemis.tests.integration.client.PagingTest
OptimizedAckTest: Using core api to replace old activemq
broker API to checking message count.
JmsQueueTransactionTest#testCloseConsumer: a bug in
delivery when prefetchSize is 0.
(InitalReconnectDelayTest)close connection after test.
- Added a thread pool executor, that combines cached and fixed size thread pooling.
It behaves like a cached thread pool in that it reuses exising threads and removes
idle threads after a timeout, limits the maximum number of threads in the pool, but
queue additional request instead of rejecting them.
- changed existing code to use the new thread pool instead of a fixed-size thread pool in
all places that are configured with a client thread pool size.
Temp Queue not deleted when connection is closed.
Enable Stomp in openwire test because some test uses it.
Remove unused code in opwnwire
Wrong XA error code returned when xid is missing
(ActiveMQXAConnectionFactory.testRollbackXaErrorCode)
regression in ActiveMQSslConnectionFactoryTest (SSL related)
1. Changed public fields in ActiveMQClient to private and added getters.
Exposing fields for thread pool sized allow to modify them in undesired ways.
I made these fields private and added corresponding getter methods.
In addition, I renamed the field 'globalThreadMaxPoolSize'
to 'globalThreadPoolSize' to be more consistent with the
'globalScheduledThreadPoolSize' field name.
I also adapted some tests to always call clearThreadPools after
the thread pool size configuration has been changed.
2. Protect against injecting null as thread pools
ActiveMQClient.injectPools allowed null as injected thread pools.
The effect was that internal threads pools were created,
but not shutdown correctly.
Communication between nodes will fail under certain topologies
JGroups has something called JForkChannel that could be used on container systems.
And be injected into Artemis.
For some reason that channel cannot be reused for more than one channel per VM.
And it cannot ever be closed.
I am keeping the trace logs I used to debug this issue in case anything similar to this happens again.
https://issues.apache.org/jira/browse/ARTEMIS-484
The File copy after the initial synchronization on large messages was broken.
On this commit we fix how the buffer is cleaned up before each read since
a previously unfinished body read would make the buffer dirty.
I'm keeping also lots of Traces I have added to debug this issue, so they will
be useful if anything like this happens again.
If a LM write packet is received from the live assume that the large
message exists and create a local reference.
Old behavour would reject the packet which could lead to loss of data on
failover see JIRA.
https://issues.apache.org/jira/browse/ARTEMIS-463
This will have some extra refactoring on the protocol head, transferring responsibility to the broker classes in a lot of cases
and removing some duplicated code
This was a team effort from Clebert Suconic and Howard Gao
There is a potential deadlock scenario if 2 threads try
sendReplicatePacket. Thread 1 registers Netty callback which will
invoke readyForWrites() method, which locks the callback list and waits
on the replicationLock. Thread 2 enters sendReplicatePacket and gets
the replicationLock and is blocked on the callback list lock. This fix
ensures the sendReplicatePacket blocks on the ReplicationManager meaning
no interleaving of the Netty callback thread and the wait on lock can
happen.
It's possible for the latch used for flow control here to get out of sync. In
other words, multiple count-downs can occur between count-ups so that the latch
always has a count > 0. When this situation arises then every single packet
sent to the replica is delayed by 5 seconds.
The solution here essentially is to eliminate the latch completely and use a
condition/wait/notify pattern.
This patch fixes a number of bugs with the JDBC Journal implementation.
Mainly around how it was handling transactions. The XA transactions
tests are now enabled to test both the File and Database store.
When publishing server connectors to other cluster members the whole buffer is sent.
This fixes ARTEMIS-362 by extracting the filled part of the buffer for broadcasting.
Added test case that checks that packet size does not exceed 1500 bytes with one connector.
This feature required a bit of refactoring to the plugin interface itself as
well as a restriction on the configuration so that either only one plugin could
be specified or an ulimited number of security-setting matches. This was done
to prevent messy situations where a plugin could update settings from the XML
or even another plugin if there were overlapping matches.
Its now possible to also add the broker name to jmx tree avoiding clashes when multiple brokers are in a single vm. This is now the default but the old way can be used with some configuration
https://issues.apache.org/jira/browse/ARTEMIS-311
The old property-file based security manager shouldn't be used anymore. Instead
use the JAAS InVMLoginModule for in-vm tests, embedded use-cases, etc. and use
the other JAAS login modules for normal server use-cases.
Stopping the server should be able to handle exceptions thrown by
individual components. Before this patch the stop method would throw
any raised exception and exit. This can lead to partial shutdowns of
the server resulting in leaking threads. This patch catches any
component exceptions and logs an appropriate error message, allowing the
server to properly shutdown.
The failback process needs to be deterministic rather than relying on various
incarnations of Thread.sleep() at crucial points. Important aspects of this
change include:
1) Make the initial replication synchronization process block at the very
last step and wait for a response from the replica to ensure the replica has
as the necessary data. This is a critical piece of knowledge during the
failback process because it allows the soon-to-become-backup server to know
for sure when it can shut itself down and allow the soon-to-become-live
server to take over. Also, introduce a new configuration element called
"initial-replication-sync-timeout" to conrol how long this blocking will occur.
2) Set the state of the server as 'LIVE' only after the server is fully
started. This is necessary because once the soon-to-be-backup server shuts
down it needs to know that the soon-to-be-live server has started fully before
it restarts itself as the new backup. If the soon-to-be-backup server restarts
before the soon-to-be-live is fully started then it won't actually become a
backup server but instead will become a live server which will break the
failback process.
3) Wait to receive the announcement of a backup server before failing-back.
Netty 4.x uses pooled buffers. These buffers can run out of memory when
transferring large amounts of data over connection. This was causing an
OutOfMemory exception to be thrown on the CoreBridge when tranferring
large messages. Netty provides a callback handler to notify listeners
when a Connection is writable. This patch adds the ability to register
connection writable listeners to the Netty connection and registers the
relevant callback from the Bridge to avoid writing when the buffers are
full.
In one situation I have seen a failrue on ProducerFlowControl to break everything else from here
This change will both avoid the failure and change the report of leaked threads so we can find them easily on the system.out when it happens (for future debugging)
this is doing some refactoring, making the SecurityStore mechnism possible to be reused on other protocols,
without forcing them to implement ServerSession on checks that won't fit the Server Model from Artemis
this is just calling Idea format on all the files using the new style
I am separating manual changes from automatic changes in case I have to repeat the manual changes again
The information about the bridge connection retries is already logged
at DEBUG level elsewhere and a WARN message is already logged when the
connection dies in the first place. I don't see any good reason to
keep this logging here.
https://issues.apache.org/jira/browse/ARTEMIS-163
On this pass I'm just converting the native layer to a simpler one.
It wasn't very easy to change the alignment at the current framework,
so I did some refactoring simplifying the native layer
The volume of the nubmer of changes here is because:
- The API is changed, we now don't close the libaio queue between files
- The native layer won't use malloc as much as it used to, saving some CPU and memory defragmentation
- I organized the code around nio and libaio
The current Security Manager implementation was returning the username
instead of the default password when validating the default user.
This patch returns the correct value and cleans up the validate method.
https://issues.apache.org/jira/browse/ARTEMIS-136
From what I researched from implementers of XA TM if you throw ERR over communication errors the transaction manager will create
an heuristic transaction to be manually dealt with.
Other XA Implementations (such as Oracle JDBC) are return FAIL over communication failures during any XA operation.
"mvn install" now works without the lint, but a "mvn install javadoc:jar" still fails. Since that is what the release plugin uses, need to keep the lint there for now. Still lots of failures.
To reproduce this commit, apply a replace regex rule using:
search regex: /\*\*\n \* Licensed
replace: /\*\n \* Licensed
These files had to be changed manually:
artemis-selector/src/main/javacc/HyphenatedParser.jj
artemis-selector/src/main/javacc/StrictParser.jj
artemis-website/src/main/resources/styles/impact/css/pygmentize.css
artemis-website/src/main/resources/styles/impact/css/site.css
We had a few reported small issues on the codebase from the recent introduced google error prone.
This should eliminate any issues, and I am making sure these won't happen again
If standalone backup server with shared has defined scale-down policy
but it's disabled then backup does not activate. Problem is that
server is checking only whether scale down is defined but if it's
enabled. This causes that server.stop() is called and backup does
not activate.