The current setting of 20MB/sec seems to be too conservative given
the capabilities of modern hardware. Even on cloud infrastructure this
seems to be too lowish. A 50MB default should provide better out of the box
performance
Make improvements to how bloom filter hashing works based on guava 17 upcoming changes, see more here (https://code.google.com/p/guava-libraries/issues/detail?id=1119)
In order to do it, introduce a hashing enum, and use the (unused until now) hash type serialization to choose the correct hashing used based on serialized version.
Also, move to use our own optimized murmur hash for the new hashing logic.
Currently we use 5k operations as a flush threshold. Indexing 5k documents
per second is rather common which would cause the index to be committed on
the lucene level each time the flush logic runs which is 5 seconds by default.
We should rather use a size based threshold similar to the lucene index writer
that doesn't cause such agressive commits which can slow down indexing significantly
especially since they cause the underlying devices to fsync their data.
When the ClusterService applies a new cluster state, it is first assigned as the new active one and then all listeners are called. Some of ES's features sample the current state and try to take action on it (for example index a document). If that fails, they will wait for change in the cluster state and try again (for example, wait for a shard to start and try indexing again).
If you're unlucky you sample the state after it has been assigned as the "active" state but before all listeners has done the work. In this cases the action take (i.e., indexing a doc) will still fail (as the shard is not yet started) but waiting for a new state may take a long time or fail.
This commit adds a new ClusterStateStatus that allows to better track the stages a cluster state goes through (currently `RECEIVED`, `BEING_APPLIED` & `APPLIED`). This allows detecting that a cluster state is not yet fully applied and retry without waiting for a new state to arrive.
This commit also adds a utility class , ClusterStateObserver, to make this pattern slightly simpler and avoid common pit falls.
Closes#5741
The blacklist can be provided through -Dtests.rest.blacklist and supports a comma separated list of globs
e.g. -Dtests.rest.blacklist=get/10_basic/*,index/*/*
Also added some missing docs and made it clearer that the suite/test descriptions effectively contains their (relative) path (api/yaml_file/test section)
Closes#5881
This is a fix for a bug whereby a cluster that has no nodes started with
-Des.node.bench=true will cause clients to hang if they attempt to
submit a benchmark.
Also adds REST tests to validate fix
Closes#5754
we use the "local host" address in sevearl places in our networking layer, if local host is not resolved for some reason, still continue and operate but using the loopback interface
Today we first get a reference to the IndexSearcher in #acquireSearcher
and then futher down we try to run Store#incRef() which might throw an
exception if the store is already closed. There is a small window
that allows this to happen during InternalEngine#close() when we try
to acquire the searcher at the same time and the engine is the last
resource that holds a reference to the store.
This commit only affects unreleased code since the Store's ref counting
has not yet been released.
The test starts a single node, indexes into, restarts the node and checks that no data was lost. It only indexed into 2 shards and didn't wait for green meaning that the node could be restarted with non-started primary. In that case the node will not re-assign the primary as it was not started. This commit makes sure that we either wait for primaries to start or index into all shards which has the same net effect.
Also extending some logging in InternalIndexShard.
The test verifies that stats are measure by checking timeInMillis>0. On fast machines the suggestions are done in < 1 millis time. The tests now index documents (to power suggestions) and does multiple suggestions per iterations to slow things down.
When a replication operation (index/delete/update) fails to be executed properly, we fail the replica and allow master to allocate a new copy of it. At the moment, the node hosting the primary shard is responsible of notifying the master of a failed replica. However, if the replica shard is initializing (`POST_RECOVERY` state), we have a racing condition between the failed shard message and moving the shard into the `STARTED` state. If the latter happen first, master will fail to resolve the fail shard message.
This commit builds on #5800 and fails the engine of the replica shard if a replication operation fails. This protects us against the above as the shard will reject the `STARTED` command from master. It also makes us more resilient to other racing conditions in this area.
Closes#5847
A merge (and refresh) might rarely happen in the background between the two queries whose output is compared. It might then happen that two docs with same scores get returned by the two queries in a different order due to different lucene document id (which has changed in the meantime). To fix this we need to order by id when the score is the same, so that we can safely compare the output of the two queries (multimatch and dismax).
TTLPercolatorTests indexes docs with small TTLs which can trigger
AlreadyExpiredException exception. This is expected while rare and
we should just catch them.
Until today we did close the engine without aqcuireing the write lock
since most calls were still holding a read lock. This commit removes
the code that holds on to the readlock when failing the engine which
means we can simply call #close()
This test requires a mapping since otherwise if there is no mapping
added the percolator query might not be parsed as a query on a numeric
field since the query might arrive on a node before the dynamic mapping
reached that node.
This commit also moves the `indexService.readAllowed()` call up before
the number of percolation queries is check to make sure we fail if reads
are not allowed - there might be a query in-flight which means we need
to check another node rather than return an empty result.
Load tests showed that SerialMS has problems to keep up with
the merges under high load. We should switch back to CMS
until we have a better story to balance merge
threads / efforts across shards on a single node.
Closes#5817
If the during percolating a new field was introduced
in the local mapping service, then those changes should
be updated in cluster state of the master as well.
Closes#5776
This methods had some workarounds for bugs that seem to be fixed
in Java 7 [1]. There seem to be other problems on shared file-systems
which are not really supported by lucene anyway or rather not
recommeded. Yet the current solution that interrupts a static thread
reference is too dangrous given all the usage of NIO across
elasticsearch.
[1] http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4742723
This method basically forcefully creates as many files as possible
to find out the process limit in a brute-force manner. The number of
possible probles with this approach would exceed the number of lines
left on this commit message.
This commit uses a JMX based alternative to print the process limit.
This prevents executing bulks internal autocreate indices logic
and ensures that this internal request never creates an index
automaticall.
This fixes a bug where the TTL purger thread ran after the actual
index it was purging was already closed / deleted and that re-created
that index.
Closes#5766
This is related to LUCENE-5570 where fsync creates a 0-byte file
if the file does not exists. This commit adds the patched lucene
version using Java 7 APIs as well as a note to replace this method
with the upcomeing IOUtils#fsync in Lucene 4.8
This commit cleans up FsImmutableBlobContainer#writeBlob to make
use of Java7 Auto-Closing features and ensures that the directory
the blob was written to is fsynced as well if possible.
For resources that have their life time effectively defined by the search
context they are attached to, it is convenient to use the search context to
schedule the release of such resources.
This commit changes aggregations to use this mechanism and also introduces
a `Lifetime` object that can be used to define how long the object should
live:
- COLLECTION: if the object only needs to live during collection time and is
what SearchContext.addReleasable would have chosen before this change
(used for p/c queries),
- SEARCH_PHASE for resources that only need to live during the current search
phase (DFS, QUERY or FETCH),
- SEARCH_CONTEXT for resources that need to live until the context is
destroyed.
Aggregators are currently registed with SEARCH_CONTEXT. The reason is that when
using the DFS_QUERY_THEN_FETCH search type, they are allocated during the DFS
phase but only used during the QUERY phase. However we should fix it in order
to only allocate them during the QUERY phase and use SEARCH_PHASE as a life
time.
Close#5703
Java7's AutoCloseable allows to manage resources more nicely using
try-with-resources statements. Since the semantics of our Releasable interface
are very close to a Closeable, let's switch to it.
Close#5689