The syntax to specify one or more items is the same as for the Multi GET API.
If only one document is specified, the results returned are the same as when
using the More Like This API.
Relates #4075Closes#5857
The retry mechanism in the transport layer might cause the delete snapshot request to be executed twice if the cluster master is closed while the request is executed. First time delete snapshot request is getting successfully executed on the old master and then it is retried on the newly elected master. When the new master tries to delete the snapshot - the snapshot no longer exists (since it was successfully deleted by the old master) and SnapshotMissingException is returned.
Currently even if some shards of the snapshot are not snapshotted successfully, the snapshot is still marked as "SUCCESS". Users may miss the fact the there are shard failures present in the snapshot and think that snapshot was completed. This change adds a new snapshot state "PARTIAL" that provides a quick indication that the snapshot was only partially successful.
Closes#5792
Until now all version types have officially required the version to be a positive long number. Despite of this has being documented, ES versions <=1.0 did not enforce it when using the `external` version type. As a result people have succesfully indexed documents with 0 as a version. In 1.1. we introduced validation checks on incoming version values and causing indexing request to fail if the version was set to 0. While this is strictly speaking OK, we effectively have a situation where data already indexed does not match the version invariant.
To be lenient and adhere to spirit of our data backward compatibility policy, we have decided to allow 0 as a valid external version type. This is somewhat complicated as 0 is also the internal value of `MATCH_ANY`, which indicates requests should succeed regardles off the current doc version. To keep things simple, this commit changes the internal value of `MATCH_ANY` to `-3` for all version types.
Since we're doing this in a minor release (and because versions are stored in the transaction log), the default `internal` version type still accepts 0 as a `MATCH_ANY` value. This is not a problem for other version types as `MATCH_ANY` doesn't make sense in that context.
Closes#5662
We try to reuse character arrays and UTF8 writers with softreferences.
SoftReferences have negative impact on GC and should be avoided in
general. Yet in this case it can simply replaced with a per-stream
Bytes/CharsRef that is thread local and has the same lifetime as the
stream.
Today the RecovyerID is taken from a static atomic long which
is essentially a per JVM ID. We run the tests within the same
JVM and that means we don't really simulate what happens in
production environments. Instead we should use a per node generated
ID.
* If plugin does not provide `lucene` property, we consider that the plugin is compatible.
* If plugin provides `lucene` property, we try to load related Enum org.apache.lucene.util.Version. If this fails, it means that the node is too "old" comparing to the Lucene version the plugin was built for.
* We compare then two first digits of current node lucene version against two first digits of plugin Lucene version. If not equal, it means that the plugin is too "old" for the current node.
Plugin developers who wants to launch plugin check only have to add a `lucene` property in `es-plugin.properties` file. If you are using maven to build your plugin, you can do it like this:
In `pom.xml`:
```xml
<properties>
<lucene.version>4.6.0</lucene.version>
</properties>
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
</resource>
</resources>
</build>
```
In `es-plugin.properties`, add:
```properties
lucene=${lucene.version}
```
BTW, if you don't already have it, you can add the plugin version as well:
```properties
version=${project.version}
```
You can disable that check using `plugins.check_lucene: false`.
This commit adds randomization for:
* `index.merge.scheduler.max_thread_count`
* `index.merge.scheduler.max_merge_count`
This commit also moves to use
EsExecutors#boundedNumberOfProcessors(Settings) to default
configure the default `max_thread_count` for better reproducibility
Closes#6194
Added new internal flag to IndicesOptions that tells whether aliases can be resolved to multiple indices or not.
Cut over to new metaData#concreteIndices(IndicesOptions, String...) for all the api previously using MetaData#concreteIndices(String[], IndicesOptions) and removed old method, deprecation is not needed as it doesn't break client code.
Introduced constants for flags in IndicesOptions for more readability
Renamed MetaData#concreteIndex to concreteSingleIndex, left method as a shortcut although it calls the common concreteIndices that accepts IndicesOptions and multipleIndices
We configure the threadpools according to the number of processors which is
different on every machine. Yet, we had some test failures related to this
and #6174 that only happened reproducibly on a node with 1 available processor.
This commit does:
* sometimes randomize the number of available processors
* if we don't randomize we should set the actual number of available processors
in the settings on the test node
* always print out the num of processors when a test fails to make sure we can
reproduce the thread pool settings with the reproduce info line
Closes#6176
On small hardware, the BENCH thread pool can be set to size 1. This is
problematic as it means that while a benchmark is active, there are no
threads available to service administrative tasks such as listing and
aborting. This change fixes that by executing list and abort operations
on the GENERIC thread pool.
Closes#6174
I think Chuck Norris is required to fix this at this point until we have an API
that can for instance pause a Benchmark. We basically wait for a query to be executed
and that query syncs on a latch with the test in a script :)
This commit also adds some more testing for benchmarks that run into errors.
When you have a nested document and want to sort on its fields, it's perfectly doable on regular fields but not on "generated" sub fields.
Here is a SENSE recreation:
```
DELETE /tmp
PUT /tmp
PUT /tmp/doc/_mapping
{
"properties": {
"flat": {
"type": "string",
"index": "analyzed",
"fields": {
"sub": {
"type": "string",
"index": "not_analyzed"
}
}
},
"nested": {
"type": "nested",
"properties": {
"foo": {
"type": "string",
"index": "analyzed",
"fields": {
"sub": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
PUT /tmp/doc/1
{
"flat":"bar",
"nested":{
"foo":"bar"
}
}
```
When sorting on `flat.sub` sub field, everything is fine:
```
GET /tmp/doc/_search
{
"sort": [
{
"flat.sub": {
"order": "desc"
}
}
]
}
```
When sorting on `nested` field, everything is fine:
```
GET /tmp/doc/_search
{
"sort": [
{
"nested.foo": {
"order": "desc"
}
}
]
}
```
But when sorting on `nested.sub` field, sorting is incorrect:
```
GET /tmp/doc/_search
{
"sort": [
{
"nested.foo.sub": {
"order": "desc"
}
}
]
}
Closes#6150.
Every class using these parameters has their own member where these four
are stored. This clutters the code. Because they mostly needed together
it might make sense to group them.
The recovery API was sometimes misreporting the recovered byte
percentages of index files. This was caused by summing up total file
lengths on each file chunk transfer. It should have been summing the
lengths of each transfer request.
Closes#6113
NettyTransport's ChannelPipelineFactory uses the instance variable
serverOpenChannels in order to create sockets. However, this instance variable
is set to null when stoping the netty transport, so if the transport tries to
stop and to initialize a socket at the same time you might hit the following
NullPointerException:
[2014-05-13 07:33:47,616][WARN ][netty.channel.socket.nio.AbstractNioSelector] Failed to initialize an accepted socket.
java.lang.NullPointerException: handler
at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.<init>(DefaultChannelPipeline.java:725)
at org.jboss.netty.channel.DefaultChannelPipeline.init(DefaultChannelPipeline.java:667)
at org.jboss.netty.channel.DefaultChannelPipeline.addLast(DefaultChannelPipeline.java:96)
at org.elasticsearch.transport.netty.NettyTransport$2.getPipeline(NettyTransport.java:327)
at org.jboss.netty.channel.socket.nio.NioServerBoss.registerAcceptedChannel(NioServerBoss.java:134)
at org.jboss.netty.channel.socket.nio.NioServerBoss.process(NioServerBoss.java:104)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
This fix ensures that the ChannelPipelineFactory always uses the channels that
have been used upon start, even if a stop request is issued concurrently.
Close#6144
Our improvements to t-digest have been pushed upstream and t-digest also got
some additional nice improvements around memory usage and speedups of quantile
estimation. So it makes sense to use it as a dependency now.
This also allows to remove the test dependency on Apache Mahout.
Close#6142
It's dangerous to expose SerialMergeScheduler as an option: since it only allows one merge at a time, it can easily cause merging to fall behind.
Closes#6120
When a mapping is declared and the type is known from the uri
then the type can be skipped in the body (see #4483). However,
there was no check if the given keys actually make a valid mapping.
closes#5864closes#6093
Percentiles are supposed to be monotonically increasing but floating-point
rounding issues can come into play and make the test fail if checks are too
strict.
The validate API was failing to reject JSON input that had unsupported
fields placed after a supported field. This was causing invalid requests
to be reported as valid.
Fixes#5685
Made sure that a match_all query is used when no query is specified and ensure no NPE is thrown either.
Also used the same code path as the search api to ensure that alias filters are taken into account, same for type filters.
Closes#6111Closes#6112Closes#6116
More Like This API would not take into account 'size' and 'from' in request body parameters.
Instead these values would always be overriden by the default values of REST parameters
'search_size' and 'search_from'.
Closes#5981
- Fix bug where repeatedly calling computeSummaryStatistics() could
accumulate some values incorrectly
- Fix check for number of responsive nodes on list is <= number of
candidate benchmark nodes
- Add public getters for summary statistics
- Add javadoc for new getters
- Add javadoc comments about API use
- Improve abort and status tests by calling awaitBusy() to wait for jobs
to be completely submitted before testing them
In our REST tests we already have support for features and skip sections that allow to skip tests if a feature is not supported.
We can then add a skip section based on the benchmark feature to the benchmark tests and execute them only when they are supported, knowing that they need at least a node with node.bench settings within the cluster. We can check that this requirement is met by calling the nodes info api.
This way we can dynamically decide whether to execute those tests or not and we don't need to have a node.bench around all the time. In fact, given that the REST tests use the GLOBAL cluster, we want to be able to randomize settings as much as possible and run tests against default settings as well. Also, this mechanism can be easily supported by the external cluster implementation that is used during the release process.
Introduced ability to disable benchmark nodes which is needed by BenchmarkNegativeTest.
The test failed because 'percent_terms_to_match' defaults to 0.3, which results
in requiring that some terms only found in the queried document must match, when
all the documents are on the same shard.
By default More Like This API excludes the queried document from the response.
However, when debugging or when comparing scores across different queries, it
could be useful to have the best possible matched hit. So this option lets users
explicitly specify the desired behavior.
Closes#6067
* The shardTopDocs array should get created with the size equal to the total number of shard level requests and not the total number of requests that have a shard level result.
* Make sure no null TopDocs entires are passed down to TopDocs#merge
* Added dedicated scroll tests that tests scrolling on an index that has missing shards due to node failure.
* Made sure that the sort fields in SimpleNestedTests exists by adding the fields in the mapping during index creation.
Closes#6022
- Fix bug where repeatedly calling computeSummaryStatistics() could
accumulate some values incorrectly.
- Fix check for number of responsive nodes on list is <= number of
candidate benchmark nodes.
- Add public getters for summary statistics
- Add javadoc for new getters
- Add javadoc comments about API use
- Randomized integration tests for the benchmark API.
- Negative tests for cases where the cluster cannot run benchmarks.
- Return 404 on missing benchmark name.
- Allow to specify 'types' as an array in the JSON syntax when describing a benchmark competition.
- Don't record slowest for single-request competitions.
Closes#6003, #5906, #5903, #5904
Significant terms internally maintain a priority queue per shard with a size potentially
lower than the number of terms. This queue uses the score as criterion to determine if
a bucket is kept or not. If many terms with low subsetDF score very high
but the `min_doc_count` is set high, this might result in no terms being
returned because the pq is filled with low frequent terms which are all sorted
out in the end.
This can be avoided by increasing the `shard_size` parameter to a higher value.
However, it is not immediately clear to which value this parameter must be set
because we can not know how many terms with low frequency are scored higher that
the high frequent terms that we are actually interested in.
On the other hand, if there is no routing of docs to shards involved, we can maybe
assume that the documents of classes and also the terms therein are distributed evenly
across shards. In that case it might be easier to not add documents to the pq that have
subsetDF <= `shard_min_doc_count` which can be set to something like
`min_doc_count`/number of shards because we would assume that even when summing up
the subsetDF across shards `min_doc_count` will not be reached.
closes#5998closes#6041
Relates to #6059, where two new constants were introduced in IndicesOptions. There were already two constants there though, one of which we could have reused. This commit tries to unify them.
We currently compute initial sizings based on the cardinality of our fields.
This can be highly exagerated for sub aggregations, for example if there is a
parent terms aggregation that is executed over a field that has a very long
tail: most buckets will only collect a couple of documents.
Close#5994
We switched to Lucene's SloppyMath way of computing an approximate value of
the eath diameter given a latitude in order to compute distances, yet the
bounding box optimization of the geo distance filter still assumed a constant
earth diameter, equal to the average.
Close#6008
This relates to #6040, the fix is twofold, first, not handling missing context specifically in the search code, but behave the same as we do in non scroll search, where if all the shards failed, raise an exception. The second is to apply this logic in both scroll cases.
Way back when, when ES started, there was an idea for a dump infrastructure, but it ended up supporting its serviceability aspects through APIs, remove the unused code
Removed MetaData#concreteIndices variations that didn't require an IndicesOptions argument. Every caller should specify how indices should be resolved to concrete indices based on the indices options argument.
Closes#6059
RootMapper.validate was only used by the routing field mapper, which makes
buggy assumptions about how fields are indexed. For example, it assumes that
the index representation of a field is the same as its external representation.
Close#5844
A bad/non-existing scroll ID used to return a 200, however a 404 might be more useful.
Also, this PR returns the right Exception (SearchContextMissingException) in the Java API.
Additionally: Added StatusToXContent interface and RestStatusToXContentListener listener, so
the appropriate RestStatus can be returned
Closes#5729
Similar to search removal, the operation threading options are not really ued, and the default should always be used. This also considerably simplifies the code.
A side affect is that we can now remove the ShardIterator#firstOrNull method, which can cause for sneaky bugs to occur.
closes#6044
The analyze API used the standard analyzer from lucene and therefore removed
stopwords instead of using the elasticsearch default analyzer.
Closes#5974
The possibility of filtering for index templates in the cluster state API
had been introduced before there was a dedicated index templates API. This
commit removes this support from the cluster state API, as it was not really
clean, requiring you to specify the metadata and the index templates.
Closes#4954
Search operation threading is an option that is not really used, and current non default implementations are flawed. Handling it also creates quite the complexity in the search handling codebase...
This is a breaking change, but one that is actually a good one, since I haven't seen/heard anybody use it, and if its used, its problematic...
closes#6042
Change #5561 introduced a potential bug in that iterations that are performed
on a thread are might not be visible to other threads due to the removal of the
`volatile` keyword.
Close#6039
When a thread pool rejects the execution on the local node, the search might not return.
This happens due to the fact that we move to the next shard only *within* the execution on the thread pool in the start method. If it fails to submit the task to the thread pool, it will go through the fail shard logic, but without "counting" the current shard itself. When this happens, the relevant shard will then execute more times than intended, causing the total opes counter to skew, and for example, if on another shard the search is successful, the total ops will be incremented *beyond* the expectedTotalOps, causing the check on == as the exit condition to never happen.
The fix here makes sure that the shard iterator properly progresses even in the case of rejections, and also includes improvement to when cleaning a context is sent in case of failures (which were exposed by the test).
Though the change fixes the problem, we should work on simplifying the code path considerably, the first suggestion as a followup is to remove the support for operation threading (also in broadcast), and move the local optimization execution to SearchService, this will simplify the code in different search action considerably, and will allow to remove the problematic #firstOrNull method on the shard iterator.
The second suggestion is to move the optimization of local execution to the TransportService, so all actions will not have to explicitly do the mentioned optimization.
fixes#4887
Changed getFiniteStrings to use an iterative implementation instead of
recursive, so we don't use a Java stack-frame per character for each
suggestion at build & query time.
The defaults we have today in our data intensive memory structures don't properly add up to properly protected from potential OOM.
The circuit breaker, today at 80%, aims at protecting from extensive field data loading. The default threshold today is too permissive and can still cause OOMs.
The filter cache today is at 20%, and its too high when adding it to other limits we have, reduce it to 10%, which is still a big enough portion of the heap, yet provides improved safety measure.
closes#5990
If a source node disconnect during recover, the target node will respond by canceling the recovery. Typically the master will respond by removing the disconnected node from the cluster state, promoting another shard to become primary. This is sent it to all nodes and the target node will start recovering from the new primary. However, if the drop of a node caused the node count to go bellow min_master_node, the master will step down and will not promote shard immediately. When a new master is elected we may publish a new cluster state (who's point is to notify of a new master) which is not yet updated. This caused the node to start a recovery to a non existent node. Before we aborted the recovery without cleaning up the shard, causing subsequent correct cluster states to be ignored. We should not start the recovery process but wait for another cluster state to come in.
Closes#6024
A boost terms factor of 1.0 is not the same as no boosting of terms.
The desired behavior is to deactivate boosting by default. If the user
specifies any value other than 0, then boosting is activated.
Closes#6021
Updating to this version allows to configure a special JNA directory,
in case the /tmp directory is mounted with the noexec option, as JNA
extracts some data and tries to execute parts of it.
Also updated documentation to clarify mlockall and memory settings as well
as pointing to the new jna.tmpdir system property.
Closes#5493
In addition, add a special warning if the misplaced function is a "boost_factor"
function to avoid confusion of "boost" and "boost_function".
closes#5995
In the string type, we have an optimization to reuse the StringTokenStream on a thread local when a non analyzed field is used (instead of creating it each time). We should use this across the board on all places where we create a field with a String.
Also, move to a specific XStringField, that we can reuse StringTokenStream instead of copying it.
closes#6001
Some tests run before the thread is started and thus use 0 as a the current time, which later on leads to big time jumps and thus failures.
Ex. InternalEngineTests.testVersioningReplicaConflict2
Sub-aggregations are currently collected directly, by just forwarding the
doc ID and bucket ordinal to them. This change adds the new BucketCollector
abstract class that Aggregator extends, so that we have more flexibility to
add implicit filters or buffering between an aggregator and its sub
aggregators.
Close#5975
Sorting was broken on nested documents because the `missing(slot)` method
didn't correctly set the segment ordinal (readerGen), causing term ordinals to
be compared across segments.
Close#5986
When providing a number (milliseconds since epoch, UTC), range and term query/filter don't handle it correctly and convert it to a string, that is then first tried to parse as a date
closes#5969
Instead of resolving the global ordinal for each hit on the fly, resolve the global ordinals during post collect.
On fields with not so many unique values, that can reduce the number of global ordinals significantly.
Closes#5895Closes#5854
The default number of clients nodes is randomized between 0 and 1, applied to all cluster scopes (global, suite and test). Can be changed through the newly added `@ClusterScope#numClientNodes`.
In our tests we currently refer to nodes in a generic way. All the tests that either stop or start nodes rely on the fact that those nodes hold data though. Made that clearer as that becomes more important when introducing other types of nodes within the test cluster. Reflected this by adapting and renaming the following methods in `TestCluster`:
- ensureAtLeastNumNodes to ensureAtLeastNumDataNodes
- ensureAtMostNumNodes to ensureAtMostNumDataNodes
- stopRandomNode to stopRandomDataNode
and the following ones in `ElasticsearchIntegrationTest`:
- allowNodes to allowDataNodes
- dataNodes to numDataNodes.
- @ClusterScope#numNodes to numDataNodes
- @ClusterScope#minNumNodes to minNumDataNodes
- @ClusterScope#maxNumNodes to maxNumDataNodes
Added facilities to be able to deal with data nodes specifically, like for instance retrieve a client to a data node, or retrieve an instance of a class through guice only from data nodes.
Adapted existing tests to successfully run although there's a node client around.
Fixed _cat/allocation REST tests to make disk.total, disk.avail and disk.percent optional as client nodes won't return that info.
Closes#5949
Decay functions currently only use the first value in a field that contains
multiple values to compute the distance to the origin. Instead, it should
consider all distances if more values are in the field and then use
one of min/max/sum/avg which is defined by the user.
Relates to #3960closes#5940
during the stop process, we raise network disconnect, so it is valid to raise then while we are in stop mode, and actually, we should not miss any events in such a case.
Typically, this is not a problem, since its during the normal shutdown process on the JVM, but when running a reused cluster within the JVM (like in our test infra with the shared cluster), we should properly raise those node disconnects
closes#5918
Our ordinals currently start at 1, like FieldCache did in older Lucene versions.
However, Lucene 4.2 changed it in order to make ordinals start at 0, using -1
as the ordinal for the missing value. We should switch to the same numbering as
Lucene for consistency. This also allows to remove some abstraction on top of
Lucene doc values.
Close#5871
Separate version check logic for reads and writes for all version types, which allows different behavior in these cases.
Change `VersionType.EXTERNAL` & `VersionType.EXTERNAL_GTE` to behave the same as `VersionType.INTERNAL` for read operations.
The previous behavior was fit for writes but is useless in reads.
This commit also makes the usage of `EXTERNAL` & `EXTERNAL_GTE` in the update api raise a validation error as it make cause data to
be lost.
Closes#5663 , Closes#5661, Closes#5929
When a create document is executed, and its an auto generated id (based on UUID), we know that the document will not exists in the index, so there is no need to try and lookup the version from the index.
For many cases, like logging, where ids are auto generated, this can improve the indexing performance, specifically for lightweight documents where analysis is not a big part of the execution.
closes#5917
Today we use a builder pattern / setters to set relevant information
to Engine#Delete|Create|Index. Yet almost all the values are required
but they are not passed via ctor arguments but via an error prone builder
pattern. If we add a required argument we should see compile errors on that
level to make sure we don't miss any place to set them.
Prerequisite for #5917
Currently the parser accepts queries like
```
"query" : {
"any_query": {
...
},
"any_field_name":...
}
```
The "any_field_name" is silently ignored. However, this also causes the parser
not to move to the next closing bracket which in turn can lead to additional query
paremters being ignored such as "fields", "highlight",...
This was the case in issue #4895
closes issue #4895
Needed for REST backwards compatibility tests, since we need to run older tests with the latest runner, which randomizes shards and replicas, but the tests rely on defaults (5,1).
Done in a generic way based on compatibility versions e.g. `-Dtests.compatibility=1.0.0` allows to run tests in a special manner that is compatibile with 1.0.0 version.
Also moved back randomIndexTemplate to ElasticsearchIntegrationTest (from ImmutableCluster) where all the randomized aspects should be.
Closes#5897
In case of a DFS_QUERY_THEN_FETCH request, facets and aggregations are currently
instantiated during the DFS phase while they only become useful during the QUERY
phase. By instantiating during the QUERY phase instead, we can make better use
of recycling since objects will have a shorter life out of the recyclers.
Close#5821
Today, if some shards pass the DFS phase but all of them fail the QUERY phase,
the response will only consist of failed shards. We should throw an exception
instead in order to be consistent with the QUERY_THEN_FETCH type.
We initially added abstraction in the percentiles aggregation in order to be
able to plug in different percentiles estimators. However, only one of the 3
options that we looked into proved useful and I don't see us adding new
estimators in the future.
Moreover, because of this, we let the parser put unknown parameters into a hash
table in case these parameters would have meaning for a specific percentiles
estimator impl. But this makes parsing error-prone: for example a user reported
that his percentiles aggregation reported extremely high (in the order of
several millions while the maximum field value was `5`), and the reason was that
he had a typo and had written `fields` instead of `field`. As a consequence,
the percentiles aggregation used the parent value source which was a timestamp,
hence the large values. Parsing would now barf in case of an unknown parameter.
Close#5859
we do this test in other places in ES, but no dedicated test for it. This test was born out of the auto generate id work, but we should have this test regardless if it gets in or not
The current setting of 20MB/sec seems to be too conservative given
the capabilities of modern hardware / network throughput.
A 50MB default should provide better out of the box performance.
Change the default numeric precision_step to 16 for 64-bit types,
8 for 32-bit and 16-bit types. Disable precision_step for the 8-bit
byte type.
Closes#5905
The current setting of 20MB/sec seems to be too conservative given
the capabilities of modern hardware. Even on cloud infrastructure this
seems to be too lowish. A 50MB default should provide better out of the box
performance
Make improvements to how bloom filter hashing works based on guava 17 upcoming changes, see more here (https://code.google.com/p/guava-libraries/issues/detail?id=1119)
In order to do it, introduce a hashing enum, and use the (unused until now) hash type serialization to choose the correct hashing used based on serialized version.
Also, move to use our own optimized murmur hash for the new hashing logic.
Currently we use 5k operations as a flush threshold. Indexing 5k documents
per second is rather common which would cause the index to be committed on
the lucene level each time the flush logic runs which is 5 seconds by default.
We should rather use a size based threshold similar to the lucene index writer
that doesn't cause such agressive commits which can slow down indexing significantly
especially since they cause the underlying devices to fsync their data.
When the ClusterService applies a new cluster state, it is first assigned as the new active one and then all listeners are called. Some of ES's features sample the current state and try to take action on it (for example index a document). If that fails, they will wait for change in the cluster state and try again (for example, wait for a shard to start and try indexing again).
If you're unlucky you sample the state after it has been assigned as the "active" state but before all listeners has done the work. In this cases the action take (i.e., indexing a doc) will still fail (as the shard is not yet started) but waiting for a new state may take a long time or fail.
This commit adds a new ClusterStateStatus that allows to better track the stages a cluster state goes through (currently `RECEIVED`, `BEING_APPLIED` & `APPLIED`). This allows detecting that a cluster state is not yet fully applied and retry without waiting for a new state to arrive.
This commit also adds a utility class , ClusterStateObserver, to make this pattern slightly simpler and avoid common pit falls.
Closes#5741
The blacklist can be provided through -Dtests.rest.blacklist and supports a comma separated list of globs
e.g. -Dtests.rest.blacklist=get/10_basic/*,index/*/*
Also added some missing docs and made it clearer that the suite/test descriptions effectively contains their (relative) path (api/yaml_file/test section)
Closes#5881
This is a fix for a bug whereby a cluster that has no nodes started with
-Des.node.bench=true will cause clients to hang if they attempt to
submit a benchmark.
Also adds REST tests to validate fix
Closes#5754
we use the "local host" address in sevearl places in our networking layer, if local host is not resolved for some reason, still continue and operate but using the loopback interface
Today we first get a reference to the IndexSearcher in #acquireSearcher
and then futher down we try to run Store#incRef() which might throw an
exception if the store is already closed. There is a small window
that allows this to happen during InternalEngine#close() when we try
to acquire the searcher at the same time and the engine is the last
resource that holds a reference to the store.
This commit only affects unreleased code since the Store's ref counting
has not yet been released.
The test starts a single node, indexes into, restarts the node and checks that no data was lost. It only indexed into 2 shards and didn't wait for green meaning that the node could be restarted with non-started primary. In that case the node will not re-assign the primary as it was not started. This commit makes sure that we either wait for primaries to start or index into all shards which has the same net effect.
Also extending some logging in InternalIndexShard.
The test verifies that stats are measure by checking timeInMillis>0. On fast machines the suggestions are done in < 1 millis time. The tests now index documents (to power suggestions) and does multiple suggestions per iterations to slow things down.
When a replication operation (index/delete/update) fails to be executed properly, we fail the replica and allow master to allocate a new copy of it. At the moment, the node hosting the primary shard is responsible of notifying the master of a failed replica. However, if the replica shard is initializing (`POST_RECOVERY` state), we have a racing condition between the failed shard message and moving the shard into the `STARTED` state. If the latter happen first, master will fail to resolve the fail shard message.
This commit builds on #5800 and fails the engine of the replica shard if a replication operation fails. This protects us against the above as the shard will reject the `STARTED` command from master. It also makes us more resilient to other racing conditions in this area.
Closes#5847
A merge (and refresh) might rarely happen in the background between the two queries whose output is compared. It might then happen that two docs with same scores get returned by the two queries in a different order due to different lucene document id (which has changed in the meantime). To fix this we need to order by id when the score is the same, so that we can safely compare the output of the two queries (multimatch and dismax).
TTLPercolatorTests indexes docs with small TTLs which can trigger
AlreadyExpiredException exception. This is expected while rare and
we should just catch them.
Until today we did close the engine without aqcuireing the write lock
since most calls were still holding a read lock. This commit removes
the code that holds on to the readlock when failing the engine which
means we can simply call #close()
This test requires a mapping since otherwise if there is no mapping
added the percolator query might not be parsed as a query on a numeric
field since the query might arrive on a node before the dynamic mapping
reached that node.
This commit also moves the `indexService.readAllowed()` call up before
the number of percolation queries is check to make sure we fail if reads
are not allowed - there might be a query in-flight which means we need
to check another node rather than return an empty result.
Load tests showed that SerialMS has problems to keep up with
the merges under high load. We should switch back to CMS
until we have a better story to balance merge
threads / efforts across shards on a single node.
Closes#5817
If the during percolating a new field was introduced
in the local mapping service, then those changes should
be updated in cluster state of the master as well.
Closes#5776
This methods had some workarounds for bugs that seem to be fixed
in Java 7 [1]. There seem to be other problems on shared file-systems
which are not really supported by lucene anyway or rather not
recommeded. Yet the current solution that interrupts a static thread
reference is too dangrous given all the usage of NIO across
elasticsearch.
[1] http://bugs.java.com/bugdatabase/view_bug.do?bug_id=4742723
This method basically forcefully creates as many files as possible
to find out the process limit in a brute-force manner. The number of
possible probles with this approach would exceed the number of lines
left on this commit message.
This commit uses a JMX based alternative to print the process limit.
This prevents executing bulks internal autocreate indices logic
and ensures that this internal request never creates an index
automaticall.
This fixes a bug where the TTL purger thread ran after the actual
index it was purging was already closed / deleted and that re-created
that index.
Closes#5766
This is related to LUCENE-5570 where fsync creates a 0-byte file
if the file does not exists. This commit adds the patched lucene
version using Java 7 APIs as well as a note to replace this method
with the upcomeing IOUtils#fsync in Lucene 4.8
This commit cleans up FsImmutableBlobContainer#writeBlob to make
use of Java7 Auto-Closing features and ensures that the directory
the blob was written to is fsynced as well if possible.
For resources that have their life time effectively defined by the search
context they are attached to, it is convenient to use the search context to
schedule the release of such resources.
This commit changes aggregations to use this mechanism and also introduces
a `Lifetime` object that can be used to define how long the object should
live:
- COLLECTION: if the object only needs to live during collection time and is
what SearchContext.addReleasable would have chosen before this change
(used for p/c queries),
- SEARCH_PHASE for resources that only need to live during the current search
phase (DFS, QUERY or FETCH),
- SEARCH_CONTEXT for resources that need to live until the context is
destroyed.
Aggregators are currently registed with SEARCH_CONTEXT. The reason is that when
using the DFS_QUERY_THEN_FETCH search type, they are allocated during the DFS
phase but only used during the QUERY phase. However we should fix it in order
to only allocate them during the QUERY phase and use SEARCH_PHASE as a life
time.
Close#5703
Java7's AutoCloseable allows to manage resources more nicely using
try-with-resources statements. Since the semantics of our Releasable interface
are very close to a Closeable, let's switch to it.
Close#5689
When a create document is executed, and its an auto generated id (based on UUID), we know that the document will not exists in the index, so there is no need to try and lookup the version from the index.
For many cases, like logging, where ids are auto generated, this can improve the indexing performance, specifically for lightweight documents where analysis is not a big part of the execution.
Due to the default of `async_merge` to `true` we never run
the merge policy on a segment flush which prevented the
pending merges from being updated and that caused actual
pending merges not to contribute to the merge decision.
This commit also removes the `index.async.merge` setting is actually
misleading since we take care of merges not being excecuted on the
indexing threads on a different level (the merge scheduler) since 1.1.
This commit also adds an additional check when to run a refresh
since soely relying on the dirty flag might leave merges un-refreshed
which can cause search slowdowns and higher memory consumption.
Closes#5779
When we load sparse single valued data, we automatically assign a missing value to represent a document who has none. We try to find a value that will increase the number of bits needed to represent the data. If that missing value happen to be 0, we do no properly intialize the value array.
This commit solved this problem but also cleans up the code even more to make spotting such issues easier in the future.
The AppendingDeltaPackedLongBuffer uses delta compression in paged fashion. For data which is roughly monotonic this results in reduced memory signature.
By default we use the storage format expected to use the least memory. You can force a choice using a new field data setting `memory_storage_hint` which can be set to `ORDINALS`, `PACKED` or `PAGED`
Closes#5706