If nodes are shutting down we close thread pools and throw
'EsRejectedExecutionException'. This commit handles these exceptions
gracefully if thrown during Ping execution.
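A minimal sketch of the idea, not the actual change: it uses the JDK's RejectedExecutionException as a stand-in for the Elasticsearch exception, and the class and method names (PingSubmitSketch, submitPing) are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

final class PingSubmitSketch {
    /** Returns true if the ping task was accepted, false if the pool was already closed. */
    static boolean submitPing(ExecutorService pool, Runnable ping) {
        try {
            pool.execute(ping);
            return true;
        } catch (RejectedExecutionException e) {
            // The pool is shut down (node is closing): drop the ping instead of propagating.
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.shutdown(); // simulate a node that is shutting down
        System.out.println("ping accepted: " + submitPing(pool, () -> {}));
    }
}
```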
Index settings can now override default behavior for 'double write'
and 'no delete open files' on MockDirectoryWrapper if tests use these
options in a legitimate way, e.g. in a restore situation a double write
is a legitimate operation.
This can go wrong if indices with the same name are repeatedly created and deleted.
UUIDs can not be null anymore. If UUID is not available `_na_` will be used as a value.
Also - some minor clean up in ShardStateAction where shard started events could be added twice to the to-be-applied list where the second instance will be ignored.
Closes #3783
Instead of relying on the current cluster state from the cluster service, make sure to rely on the cluster state we get from the change event. This will allow us to move processing of the cluster event around, potentially to before the local cluster state has been updated.
Currently we fail tests if any searcher reference is pending. Yet,
on a slow machine the freeContext calls that are async could still be
in flight so if there are pending searchers we wait for a bit to make
sure we don't fail if a freeContext call is in flight.
The MockEngine now also contains the stack trace of the first close call
if a searcher is closed twice.
DateHistogramFacets can be used on fields that store dates in a numeric
(long) field using different resolutions like seconds instead of
milliseconds. This commit adds a test that checks if the factor is
applied correctly to scale it up to milliseconds.
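As a worked illustration of the scaling, assuming a stored value in seconds and a factor of 1000 (the class and method names are hypothetical and not part of the facet code):

```java
final class DateFactorSketch {
    /** Scales a stored numeric date value up to milliseconds using a multiplicative factor. */
    static long toMillis(long storedValue, long factor) {
        // multiplyExact fails loudly on overflow instead of silently wrapping around
        return Math.multiplyExact(storedValue, factor);
    }

    public static void main(String[] args) {
        long storedSeconds = 1_380_000_000L; // value stored with second resolution
        System.out.println(toMillis(storedSeconds, 1000L));
    }
}
```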
Introduce a new internal, index shard level, post recovery state that the shard moves to when it's done with recovery. The shard will now move to started only once the cluster state with its respective cluster state level state is started.
This change allows more fine-grained control over when to allow reads on a shard, resolving potential refresh temporal visibility issues while indexing and issuing a refresh, by only allowing reads on started shards and making sure we refresh right before we move to started.
Added node restart capabilities to TestCluster
Trigger retry mechanism (onFailure method) instead of invoking transport service with null DiscoveryNode when no DiscoveryNode can be found for a ShardRouting.
Currently tests only run with node clients but eventually we want to
run all tests with randomly chosen node / transport clients. To enable
this during development and on test servers as a transition phase this
commit adds the ability to allow a fraction of the clients used in
tests to be transport clients. By default this is still disabled.
To enable transport clients in tests pass '-Dtests.client.ratio=[0..1]'
where '1.0' will force transport clients and '0.0' completely disables
them. If an empty string is passed the ratio is chosen at random for
each test.
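A rough sketch of how such a ratio could be interpreted, under the assumption that an unset property means "disabled". The class and method names are hypothetical and do not mirror the TestCluster code.

```java
import java.util.Random;

final class ClientRatioSketch {
    /** Resolves the ratio: null means disabled, an empty string means "pick at random". */
    static double resolveRatio(String property, Random random) {
        if (property == null) {
            return 0.0; // default: transport clients disabled
        }
        if (property.isEmpty()) {
            return random.nextDouble(); // random ratio in [0, 1)
        }
        double ratio = Double.parseDouble(property);
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be in [0..1] but was " + ratio);
        }
        return ratio;
    }

    /** Decides per client: 1.0 always yields a transport client, 0.0 never does. */
    static boolean useTransportClient(double ratio, Random random) {
        return random.nextDouble() < ratio;
    }

    public static void main(String[] args) {
        Random random = new Random();
        double ratio = resolveRatio(System.getProperty("tests.client.ratio"), random);
        System.out.println("transport client: " + useTransportClient(ratio, random));
    }
}
```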
Catching Throwable instead of Exception in TransportClient
and TransportClientNodesService and restore interrupted flag
if interrupt exception is caught and ignored
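The interrupt handling follows the usual Java idiom, sketched here in isolation (InterruptFlagSketch and sleepQuietly are hypothetical names, not taken from TransportClient):

```java
final class InterruptFlagSketch {
    /** Sleeps, and if interrupted, swallows the exception but restores the interrupt flag. */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Catching InterruptedException clears the flag; set it again so that
            // callers further up the stack can still observe the interruption.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();
        sleepQuietly(10);
        System.out.println("flag restored: " + Thread.currentThread().isInterrupted());
    }
}
```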
TransportClient doesn't add the initial nodes to the nodes list
if it doesn't retrieve any nodes from the listeners which can cause
the transport client to throw a 'NoNodeAvailableException' if the
'sniff' response didn't return any nodes. This situation can occur
if the client tries to get the listener nodes cluster state while that
node is not yet connected to any other nodes.
TestCluster can currently only be used in a globally shared scope.
This commit adds the ability to use the TestCluster in 3 different
scopes per test-suite. The scopes are 'Global', 'Suite' and 'Test'
where the cluster is shared across all tests, across all test methods or
not at all respectively.
Subclasses of AbstractIntegrationTest (formerly AbstractSharedClusterTest)
can add an annotation if they need a different scope than Global (default):
```
@ClusterScope(scope=Scope.Suite, numNodes=1)
```
This also allows specifying the number of shared nodes in that TestCluster
that are available when a test starts.
The cleanups in this commit include:
- s/Elasticsearch/ElasticSearch/g on test classes
- Move test classes in org.elasticsearch.test
This assertion module also injects an AssertingIndexSearcher that
checks if our queries are all compliant with the lucene specification
which is important for future updates and changes in the upstream project.
There was a small window of time where the transport response handler's handleException method was invoked twice. As far as I can tell this happened when a node disconnect event was processed just after the request was registered but before the "Node not connected" error was thrown. The TransportService#sendRequest method would invoke the transport response handler's handleException method regardless of whether it had already been invoked. As a result, two retries were executed for one request failure.
The mpercolate api has an assert that tripped when more than the expected number of shard level responses were returned. This was caused by the issue described above. For a single shard level request we had multiple responses, and this broke the count of total expected responses. Also the reduce could be started prematurely, which resulted in an incorrect final response (e.g. total count being incorrect). For example: two shards in total, shard 0 gets reduced twice. The second shard 0 response gets in just before the shard 1 response gets in. The reduce starts without the shard 1 response.
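One common way to guarantee that a failure callback fires at most once is a compare-and-set guard. The sketch below illustrates that general technique only; it is not the code of this fix, and FailureHandler and onceOnly are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

final class OnceOnlyHandlerSketch {
    /** Hypothetical stand-in for a transport response handler's failure callback. */
    interface FailureHandler {
        void handleException(Exception e);
    }

    /** Wraps a handler so that only the first failure is delivered to it. */
    static FailureHandler onceOnly(FailureHandler delegate) {
        AtomicBoolean invoked = new AtomicBoolean(false);
        return e -> {
            // compareAndSet succeeds for exactly one caller, even under races
            if (invoked.compareAndSet(false, true)) {
                delegate.handleException(e);
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger retries = new AtomicInteger();
        FailureHandler handler = onceOnly(e -> retries.incrementAndGet());
        handler.handleException(new IllegalStateException("node disconnected"));
        handler.handleException(new IllegalStateException("node not connected"));
        System.out.println("retries triggered: " + retries.get());
    }
}
```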
The master node processes changes to cluster state, and part of the processing is publishing the cluster state to other nodes. It does not wait for the cluster state to be processed on the other nodes before it moves on to the next cluster state processing job.
This is fine, we support out of order cluster state events using versioning, and nodes can handle those cases. It does lead though to non optimal API semantics. For example, when issuing cluster health, and waiting for green state, the master node will report back once the cluster is green based on its cluster state, but that mentioned "green" state might not have been received by all other nodes yet.
Add a discovery.zen.publish_timeout setting, and default it to 5s. This makes a best effort to ensure that all nodes process a cluster state within a window of time.
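The best-effort wait can be pictured as a bounded wait on outstanding acknowledgements. This is a simplified sketch using a CountDownLatch, not the discovery code itself; PublishTimeoutSketch and awaitAcks are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class PublishTimeoutSketch {
    /**
     * Waits until every node has acknowledged or the timeout elapses.
     * Returns true if all acks arrived in time; either way the caller moves on.
     */
    static boolean awaitAcks(CountDownLatch pendingAcks, long timeoutMillis) {
        try {
            return pendingAcks.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch pendingAcks = new CountDownLatch(2); // two other nodes
        pendingAcks.countDown();                            // only one of them responds
        boolean allAcked = awaitAcks(pendingAcks, 50);
        // Best effort: publishing continues even when not every node answered in time.
        System.out.println("all nodes acked in time: " + allAcked);
    }
}
```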
closes #3736
If a request hasn't been acknowledged, there's no guarantee for any node to hold the up-to-date cluster state (not even the master yet, as the execution is asynchronous)
The average and sum comparators basically share the same code which is
copy-pasted today. We can simplify this into a base class which reduces
code duplication and prevents copy-paste bugs.
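The shape of such a refactoring, reduced to plain arrays: the shared accumulation loop lives in a base class and each subclass only supplies the final step. All class names here are hypothetical and do not correspond to the actual comparator classes.

```java
abstract class AccumulatingValueSketch {
    /** Shared loop: sums all values, then lets the subclass finish the computation. */
    final double reduce(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return finish(sum, values.length);
    }

    /** The only part that differs between sum and average. */
    protected abstract double finish(double sum, int count);
}

final class SumValueSketch extends AccumulatingValueSketch {
    @Override
    protected double finish(double sum, int count) {
        return sum;
    }
}

final class AvgValueSketch extends AccumulatingValueSketch {
    @Override
    protected double finish(double sum, int count) {
        return count == 0 ? 0 : sum / count; // avoid dividing by zero for empty input
    }
}
```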
While providing reproducible tests for an async system that uses
randomized components is a hard task, we should at least try to
reestablish the environment of a failing test as much as possible.
This commit allows re-establishing the shared 'TestCluster' by
resetting the cluster to a predefined shared state before each test.
Before this commit a test that was executed in isolation was likely
using an entirely different node environment than the failing test since
the 'TestCluster' kept intermediate nodes started by other tests around.
use service id for pid name
disable filtering on *.exe (caused corruption)
rename exe names and add more options to .bat
start/stop operations are now supported (and expected to be called) by service.bat
add more variables from the env to customize default behavior prior to installing the service
add manager option
fixes regarding batch flow
specify service id in description
minor readability improvement
include .exe only in ZIP archive
rename x64 service id to make it work out of the box
add elasticsearch as a service for Windows platforms
based on Apache Commons Daemon
supports both x64 and x86
Compared to setting node.local to true, it would be nicer to support node.mode with values of local or network.
Note, node.local is still supported.
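A sketch of how the two settings could be reconciled. The precedence shown (node.mode wins, node.local is the fallback) is an assumption made for illustration, and NodeModeSketch is a hypothetical name.

```java
import java.util.Map;

final class NodeModeSketch {
    /** Returns true for a local node, false for a network node. */
    static boolean isLocal(Map<String, String> settings) {
        String mode = settings.get("node.mode");
        if (mode != null) {
            if (mode.equals("local")) {
                return true;
            }
            if (mode.equals("network")) {
                return false;
            }
            throw new IllegalArgumentException("unsupported node.mode [" + mode + "]");
        }
        // Assumed fallback: the older boolean setting; a missing value parses as false.
        return Boolean.parseBoolean(settings.get("node.local"));
    }
}
```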
closes #3713
The SearchWithRandomExceptionTests aim to catch problems when resources
are not closed due to exceptions etc. Yet in some cases the random seeds
cause the index to never be fully allocated so we basically go into a
ping-pong state where we try to allocate shards back and forth on nodes.
This causes all docs to time out, which in turn causes the tests to run
for a very long time (hours or days).
If we cannot allocate the index and get to a yellow state we simply
index only one doc and expect all searches to fail.
This commit also beefs up the assertions in this test to check if
documents are actually present if they are indexed and refresh was
successful.
Closes #3694
Stats can be retrieved on a per-feature / per-component basis including the fields
they apply to. This commit adds support for a 'completion' flag to include statistics
for the completion feature as well as 'completion_fields' to only
include certain fields in the returned statistics.
To disambiguate between 'fielddata' and 'completion' fields this commit
uses 'fields' as the default inclusion filter for stats fields, only used
if no dedicated '[completion|fielddata]_fields' parameter is provided.
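The fallback rule can be sketched as follows. The comma-separated value format and the names StatsFieldsSketch and resolve are assumptions for illustration, not the REST handler's code.

```java
final class StatsFieldsSketch {
    /**
     * Picks the dedicated parameter (e.g. 'completion_fields') when present,
     * otherwise falls back to the generic 'fields' parameter.
     */
    static String[] resolve(String dedicatedFields, String genericFields) {
        String chosen = dedicatedFields != null ? dedicatedFields : genericFields;
        return chosen == null ? new String[0] : chosen.split(",");
    }

    public static void main(String[] args) {
        // 'completion_fields' is absent, so the generic 'fields' value applies
        System.out.println(String.join(" ", resolve(null, "title,suggest")));
    }
}
```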
Relates to #3522
Add a dedicated suggest thread pool for the suggest API. With the new completion suggest type, which is purely CPU bound, it makes more sense to have a dedicated thread pool for suggest compared to having it share the search thread pool and "competing" against other search operations.
closes #3698
When a dynamic type is introduced during indexing, the node that introduces it notifies the master node that the type was added, so that it can be added on the master node. The master node then adds it to the index metadata and republishes that fact.
In order not to delete the mapping while the new type is introduced on the node that introduced it, we keep a map of seen mappings, and remove a mapping type when we already processed it.
The map is not properly cleared though in all places where an actual index service is being removed on a node.
closes #3697
The refresh flag in optimize is problematic, since the set of shards that refresh is allowed to execute on differs from the set of shards that optimize executes on. In order to do optimize and then refresh, they should be executed as separate APIs when needed.
closes #3690
The refresh flag in flush is problematic, since the set of shards that refresh is allowed to execute on differs from the set of shards that flush executes on. In order to do flush and then refresh, they should be executed as separate APIs when needed.
closes #3689
Both package types, RPM and deb now contain an option to not restart on upgrade.
This option can be configured in /etc/default/elasticsearch for dpkg based systems
and /etc/sysconfig/elasticsearch for rpm based systems.
By default the setting is as before, where a restart is executed on upgrade.
Closes #3685
In case of retries, we update the clusterState and shardsIt, make sure they are visible using volatile (even though updates will probably go through a memory barrier, this might explain rare failure we see when retry happens)
Introduced ElasticsearchAssertions that check for failures in all responses
Added comment to flush requests as we don't check for failures there
Attempt to remove Thread.sleep in testChangeInitialShardsRecovery in favour of awaitBusy block
Added awaitBusy block in testQuorumRecovery waiting for YELLOW. We could have GREEN after closing the node but we want to wait for the YELLOW state.
As we are within an awaitBusy block, it doesn't make sense to have an assertion, since it would fail the test instead of waiting till the condition is verified (till timeout expires)
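To illustrate why the condition should return false rather than assert, here is a simplified polling helper. It is a stand-in written for this note and does not claim to match the signature of the test framework's awaitBusy; the back-off values are arbitrary.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class AwaitBusySketch {
    /** Polls the condition with a growing sleep until it holds or the time budget runs out. */
    static boolean awaitBusy(BooleanSupplier condition, long maxWaitMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        long sleepMillis = 1;
        while (System.nanoTime() - deadline < 0) {
            // The condition reports "not yet" by returning false; an assertion here
            // would abort the test on the first poll instead of waiting.
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis = Math.min(sleepMillis * 2, 100);
        }
        return condition.getAsBoolean(); // one last check at the deadline
    }
}
```

The caller asserts once on the returned value, after the wait has finished.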
In 0.90.4, we deprecated some code:
* `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String)` moved to `GetIndexTemplatesRequest#GetIndexTemplatesRequest(String...)`
* `GetIndexTemplatesRequest#name(String)` moved to `GetIndexTemplatesRequest#names(String...)`
* `GetIndexTemplatesRequest#name()` moved to `GetIndexTemplatesRequest#names()`
* `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String)` moved to `GetIndexTemplatesRequestBuilder#GetIndexTemplatesRequestBuilder(IndicesAdminClient, String...)`
* `IndicesAdminClient#prepareGetTemplates(String)` moved to `IndicesAdminClient#prepareGetTemplates(String...)`
* `AbstractIndicesAdminClient#prepareGetTemplates(String)` moved to `AbstractIndicesAdminClient#prepareGetTemplates(String...)`
We can now remove those old methods in 1.0.
**Note**: it breaks the Java API
Relates to #2532.
Closes #3681.
/_template shows: No handler found for uri [/_template] and method [GET]
It would make sense to list the templates as they are listed in the /_cluster/state call.
Closes #2532.