Instead of resolving the global ordinal for each hit on the fly, resolve global ordinals during post-collect. On fields with few unique values, this can significantly reduce the number of global ordinal lookups.
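A minimal sketch of the idea, with hypothetical names (the real aggregator code differs): buffer cheap segment ordinals during collection and map them to global ordinals once in the post-collect step.

```java
// Hypothetical sketch: count hits per segment ordinal during collect(), and
// resolve each distinct segment ordinal to its global ordinal only once in
// postCollect(). With few unique values, this replaces one global-ordinal
// lookup per hit with one lookup per unique segment ordinal.
final class OrdinalsCollector {
    interface GlobalOrdinalMapping { // assumed: maps segment ord -> global ord
        long getGlobalOrd(int segmentOrd);
    }

    private final long[] segmentOrdCounts;
    private final GlobalOrdinalMapping mapping;

    OrdinalsCollector(int maxSegmentOrd, GlobalOrdinalMapping mapping) {
        this.segmentOrdCounts = new long[maxSegmentOrd];
        this.mapping = mapping;
    }

    void collect(int segmentOrd) {
        segmentOrdCounts[segmentOrd]++; // cheap: no global lookup per hit
    }

    void postCollect(long[] globalCounts) {
        for (int ord = 0; ord < segmentOrdCounts.length; ord++) {
            if (segmentOrdCounts[ord] > 0) {
                globalCounts[(int) mapping.getGlobalOrd(ord)] += segmentOrdCounts[ord];
            }
        }
    }
}
```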
Closes #5895, Closes #5854
The default number of client nodes is randomized between 0 and 1, applied to all cluster scopes (global, suite and test). It can be changed through the newly added `@ClusterScope#numClientNodes`.
In our tests we currently refer to nodes in a generic way, yet all the tests that either stop or start nodes rely on the fact that those nodes hold data. Made that explicit, as it becomes more important when introducing other types of nodes within the test cluster. Reflected this by adapting and renaming the following methods in `TestCluster`:
- `ensureAtLeastNumNodes` to `ensureAtLeastNumDataNodes`
- `ensureAtMostNumNodes` to `ensureAtMostNumDataNodes`
- `stopRandomNode` to `stopRandomDataNode`
and the following ones in `ElasticsearchIntegrationTest`:
- `allowNodes` to `allowDataNodes`
- `dataNodes` to `numDataNodes`
- `@ClusterScope#numNodes` to `numDataNodes`
- `@ClusterScope#minNumNodes` to `minNumDataNodes`
- `@ClusterScope#maxNumNodes` to `maxNumDataNodes`
Added facilities to deal with data nodes specifically, for instance retrieving a client to a data node, or retrieving an instance of a class through Guice from data nodes only.
Adapted existing tests to run successfully even when a node client is around.
Fixed `_cat/allocation` REST tests to make `disk.total`, `disk.avail` and `disk.percent` optional, as client nodes won't return that info.
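For illustration, a test relying on the renamed attributes might look like the sketch below (the package paths and the `cluster()` accessor are assumptions based on the test infra described here):

```java
import org.elasticsearch.test.ElasticsearchIntegrationTest;
import org.elasticsearch.test.ElasticsearchIntegrationTest.ClusterScope;
import org.elasticsearch.test.ElasticsearchIntegrationTest.Scope;

// Sketch: data nodes are requested via numDataNodes, while the newly added
// numClientNodes controls how many client (non-data) nodes the cluster starts.
@ClusterScope(scope = Scope.TEST, numDataNodes = 2, numClientNodes = 1)
public class ClientNodesTests extends ElasticsearchIntegrationTest {

    public void testOnlyDataNodesAreCounted() {
        // the renamed method makes it explicit that only data nodes count here
        cluster().ensureAtLeastNumDataNodes(2);
    }
}
```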
Closes #5949
Decay functions currently use only the first value of a multi-valued field to compute the distance to the origin. Instead, they should consider all values in the field and combine the resulting distances using one of min/max/sum/avg, as chosen by the user.
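A minimal sketch of that combination step, with hypothetical names (the actual decay function implementation differs):

```java
// Hypothetical sketch: combine all distances of a multi-valued field with
// the user-chosen mode before feeding the result into the decay function.
final class DistanceCombiner {
    enum Mode { MIN, MAX, SUM, AVG }

    static double combine(double[] distances, Mode mode) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double d : distances) {
            sum += d;
            min = Math.min(min, d);
            max = Math.max(max, d);
        }
        switch (mode) {
            case MIN: return min;
            case MAX: return max;
            case SUM: return sum;
            default:  return sum / distances.length; // AVG
        }
    }
}
```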
Relates to #3960, Closes #5940
During the stop process we raise network disconnects, so it is valid to raise them while we are in stop mode, and in fact we should not miss any events in such a case.
Typically this is not a problem, since it happens during the normal shutdown process of the JVM, but when running a reused cluster within the JVM (like in our test infra with the shared cluster), we should properly raise those node disconnects.
Closes #5918
Our ordinals currently start at 1, like FieldCache did in older Lucene versions. However, Lucene 4.2 changed this so that ordinals start at 0, with -1 as the ordinal for the missing value. We should switch to the same numbering as Lucene for consistency. This also allows removing some abstraction on top of Lucene doc values.
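For illustration, consuming code then follows Lucene's own `SortedDocValues` convention directly (a sketch using the Lucene 4.x two-argument `lookupOrd` signature):

```java
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.util.BytesRef;

// 0-based ordinals with -1 for "missing", exactly as in SortedDocValues,
// so no +1/-1 shifting layer is needed on top of doc values anymore.
static boolean readValue(SortedDocValues values, int docId, BytesRef scratch) {
    int ord = values.getOrd(docId);
    if (ord == -1) {
        return false; // document has no value for this field
    }
    values.lookupOrd(ord, scratch); // Lucene 4.x signature
    return true;
}
```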
Closes #5871
Separate the version check logic for reads and writes for all version types, allowing different behavior in these two cases.
Change `VersionType.EXTERNAL` & `VersionType.EXTERNAL_GTE` to behave the same as `VersionType.INTERNAL` for read operations. The previous behavior suited writes but is useless for reads.
This commit also makes the usage of `EXTERNAL` & `EXTERNAL_GTE` in the update API raise a validation error, as it may cause data to be lost.
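A minimal sketch of the split, with hypothetical method names and sentinel (the real `VersionType` differs):

```java
// Hypothetical sketch: version-conflict checks split into read and write paths.
enum SketchVersionType {
    INTERNAL, EXTERNAL, EXTERNAL_GTE;

    static final long MATCH_ANY = -3L; // illustrative sentinel: "accept any version"

    // Reads: every type now behaves like INTERNAL, i.e. a plain equality check.
    boolean isVersionConflictForReads(long currentVersion, long expectedVersion) {
        return expectedVersion != MATCH_ANY && currentVersion != expectedVersion;
    }

    // Writes: the external types keep their "strictly newer" / "newer or equal" semantics.
    boolean isVersionConflictForWrites(long currentVersion, long expectedVersion) {
        switch (this) {
            case EXTERNAL:     return currentVersion >= expectedVersion;
            case EXTERNAL_GTE: return currentVersion > expectedVersion;
            default:           return isVersionConflictForReads(currentVersion, expectedVersion);
        }
    }
}
```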
Closes #5663, Closes #5661, Closes #5929
When a document create is executed with an auto-generated id (based on UUID), we know that the document does not exist in the index yet, so there is no need to look up its version in the index.
In many cases, such as logging, where ids are auto-generated, this can improve indexing performance, especially for lightweight documents where analysis is not a big part of the execution.
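A sketch of the shortcut; the names (`resolveCurrentVersion`, `loadCurrentVersionFromIndex`) are hypothetical stand-ins for the engine internals:

```java
import java.io.IOException;

// Hypothetical sketch: creates with an auto-generated id skip the version lookup.
abstract class CreateVersionSketch {
    static final long VERSION_NOT_FOUND = -1L; // illustrative sentinel

    long resolveCurrentVersion(boolean autoGeneratedId, String uid) throws IOException {
        if (autoGeneratedId) {
            // the id is a freshly generated UUID: no document with it can
            // exist in the index yet, so the lookup would always miss anyway
            return VERSION_NOT_FOUND;
        }
        return loadCurrentVersionFromIndex(uid); // costly term lookup in Lucene
    }

    abstract long loadCurrentVersionFromIndex(String uid) throws IOException;
}
```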
Closes #5917
Today we use a builder pattern / setters to pass relevant information to Engine#Delete|Create|Index. Yet almost all of these values are required, and they are set via an error-prone builder pattern rather than via ctor arguments. If we add a required argument, we should see compile errors at that level to make sure we don't miss any place that needs to set it.
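For illustration, a simplified before/after sketch (hypothetical types, not the actual Engine classes):

```java
// Hypothetical, simplified sketch of the change. Before: required values go
// through setters, so a forgotten one compiles fine and fails only at runtime.
class IndexOpBefore {
    String type, id, source;
    IndexOpBefore type(String type)     { this.type = type; return this; }
    IndexOpBefore id(String id)         { this.id = id; return this; }
    IndexOpBefore source(String source) { this.source = source; return this; }
}

// After: required values are ctor arguments. Adding a new required argument
// makes every call site fail to compile until it is updated.
class IndexOpAfter {
    final String type, id, source;
    IndexOpAfter(String type, String id, String source) {
        this.type = type;
        this.id = id;
        this.source = source;
    }
}
```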
Prerequisite for #5917
Currently the parser accepts queries like
```
"query" : {
    "any_query": {
        ...
    },
    "any_field_name": ...
}
```
The "any_field_name" is silently ignored. However, this also causes the parser
not to move to the next closing bracket which in turn can lead to additional query
paremters being ignored such as "fields", "highlight",...
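A sketch of the stricter behavior (simplified; `queryParsers` is a hypothetical registry, while `XContentParser` is the real parsing abstraction):

```java
import java.io.IOException;
import java.util.Map;
import org.elasticsearch.common.xcontent.XContentParser;

// Hypothetical sketch: a field name under "query" that is not a registered
// query now fails the request instead of being skipped, so trailing request
// parameters like "fields" or "highlight" can no longer be swallowed.
void parseQuery(XContentParser parser, Map<String, ?> queryParsers) throws IOException {
    XContentParser.Token token;
    while ((token = parser.nextToken()) != XContentParser.Token.END_OBJECT) {
        if (token == XContentParser.Token.FIELD_NAME && !queryParsers.containsKey(parser.currentName())) {
            throw new IllegalArgumentException("[" + parser.currentName() + "] is not a known query");
        }
        // otherwise delegate to the matching query parser ...
    }
}
```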
This was the case in issue #4895.
Closes #4895
Needed for REST backwards compatibility tests, since we need to run older tests with the latest runner, which randomizes shards and replicas, while those tests rely on the defaults (5 shards, 1 replica).
Done in a generic way based on compatibility versions, e.g. `-Dtests.compatibility=1.0.0` allows running the tests in a special manner that is compatible with version 1.0.0.
Also moved `randomIndexTemplate` back to `ElasticsearchIntegrationTest` (from `ImmutableCluster`), where all the randomized aspects should live.
Closes #5897
In case of a DFS_QUERY_THEN_FETCH request, facets and aggregations are currently
instantiated during the DFS phase while they only become useful during the QUERY
phase. By instantiating during the QUERY phase instead, we can make better use
of recycling since objects will have a shorter life out of the recyclers.
Closes #5821
Today, if some shards pass the DFS phase but all of them fail the QUERY phase,
the response will only consist of failed shards. We should throw an exception
instead in order to be consistent with the QUERY_THEN_FETCH type.
We initially added abstraction in the percentiles aggregation in order to be
able to plug in different percentiles estimators. However, only one of the 3
options that we looked into proved useful and I don't see us adding new
estimators in the future.
Moreover, because of this, we let the parser put unknown parameters into a hash table in case these parameters would have meaning for a specific percentiles estimator implementation. But this makes parsing error-prone: for example, a user reported that their percentiles aggregation returned extremely high values (on the order of several millions, while the maximum field value was `5`). The reason was a typo: they had written `fields` instead of `field`, so the percentiles aggregation fell back to the parent value source, which was a timestamp, hence the large values. Parsing now fails on unknown parameters.
Closes #5859