Mid-term we should switch from `BytesValues` to Lucene's doc values APIs, in
particular the `SortedSetDocValues` class. While `BytesValues.WithOrdinals` and
`SortedSetDocValues` expose the same functionality, `BytesValues.WithOrdinals`
exposes its ordinals via a separate `Ordinals.Docs` object, whereas
`SortedSetDocValues` exposes them on the same object that holds the values.
This commit merges ordinals into `BytesValues.WithOrdinals` in order to bring
the two classes closer together.
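For reference, a minimal sketch of reading per-document ordinals and values from the same object with the Lucene 4.x `SortedSetDocValues` API; the class, field name and document id below are hypothetical:
```
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;

class SortedSetExample {
    // Ordinals and values are exposed on the same object (Lucene 4.x API);
    // the "tags" field name is only an example.
    static void collectTerms(AtomicReader reader, int docId) throws IOException {
        SortedSetDocValues dv = reader.getSortedSetDocValues("tags");
        if (dv == null) {
            return; // the field has no sorted-set doc values
        }
        BytesRef scratch = new BytesRef();
        dv.setDocument(docId); // position on the current document
        for (long ord = dv.nextOrd(); ord != SortedSetDocValues.NO_MORE_ORDS; ord = dv.nextOrd()) {
            dv.lookupOrd(ord, scratch); // resolve the ordinal to its term bytes
            // ... use scratch ...
        }
    }
}
```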
Global ordinals were a bit tricky to migrate, so I just changed them to use
Lucene's OrdinalMap, which will soon (LUCENE-5767, scheduled for 4.9) have the
same optimizations as our global ordinals.
Close#6524
The `exists` and `missing` filters need to merge postings lists of all existing
terms, which can be very costly, especially on high-cardinality fields. This
commit indexes the field names of a document under `_field_names` and reuses it
to speed up the `exists` and `missing` filters.
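As an illustration (not the actual implementation), an `exists` check then boils down to a single term lookup on `_field_names` instead of a union over every term of the target field; `TermFilter` is the Lucene queries-module filter and the field name is hypothetical:
```
import org.apache.lucene.index.Term;
import org.apache.lucene.queries.TermFilter;
import org.apache.lucene.search.Filter;

class FieldNamesSketch {
    // Sketch only: one postings-list lookup against _field_names replaces
    // merging the postings lists of all terms of the target field.
    static Filter existsFilter(String field) {
        return new TermFilter(new Term("_field_names", field));
    }
}
```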
This is only enabled for indices that are created on or after Elasticsearch
1.3.0.
Close#5659
`VersionFieldMapper.defaultDocValuesFormat` claims that the default is `disk`.
This value is not used to choose the doc values format in the index, but for
mappings serialization, in order to know when the `_version` doc values format
differs from the default and needs to be serialized. This made it impossible
to use the `disk` doc values format for `_version`, since mappings would never
retain that information at serialization time.
Close#6523
The TTL, size, timestamp and index meta properties could be lost on an
update of a single field mapping due to a wrong comparison in the merge
method, which was caused by a wrong initialization that marked an update
as explicitly disabled instead of unset.
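A minimal illustration of the underlying problem, with hypothetical names rather than the actual mapper code: a merge has to tell "explicitly disabled" apart from "never set", otherwise an update that says nothing about the property looks like a request to disable it.
```
// Hypothetical tri-state flag: initializing an unset value as DISABLED makes
// the merge below drop the existing setting even though the update never
// mentioned the property.
enum EnabledState {
    ENABLED, DISABLED, UNSET;

    static EnabledState merge(EnabledState current, EnabledState update) {
        // keep the current value unless the update states something explicitly
        return update == UNSET ? current : update;
    }
}
```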
Closes#5053
Added the `http.jsonp.enable` setting to allow disabling JSONP responses, as those
might pose a security risk and can be turned off if unused.
This also fixes two bugs in NettyHttpChannel (sketched below):
* JSONP responses were never setting application/javascript as the content-type
* The content-type and content-length headers were being overwritten even if they were set before
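A sketch of the header guard, assuming the Netty 3.x HTTP codec used at the time; the method and the mime types are placeholders rather than the actual NettyHttpChannel code:
```
import org.jboss.netty.handler.codec.http.HttpHeaders;
import org.jboss.netty.handler.codec.http.HttpResponse;

class ContentTypeGuard {
    // Sketch only: pick the JSONP content type when needed, but never
    // overwrite a content-type header that was set earlier on the response.
    static void setContentType(HttpResponse response, boolean isJsonp) {
        String contentType = isJsonp ? "application/javascript" : "application/json; charset=UTF-8";
        if (!response.headers().contains(HttpHeaders.Names.CONTENT_TYPE)) {
            response.headers().set(HttpHeaders.Names.CONTENT_TYPE, contentType);
        }
    }
}
```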
Closes#6164
Commit fbd7c9aa5d introduced a regression that caused
the min_doc_count to be equal to the number of documents in the
background set. As a result no buckets were built when the
response for significant terms was created.
This only affected the final XContent response.
closes#6535
Currently we send relocation & flush actions based on all assigned ShardRoutings. During the final stage of relocation, we may fail to refresh/flush a shard if the coordinating node has not yet processed the cluster state update indicating that a relocation is completed *and* the relocation target node has already processed it (i.e., started the shard and is accepting new indexing requests).
Closes#6545
Percentile Rank Aggregation is the reverse of the Percentiles aggregation. It determines the percentile rank (the proportion of values less than a given value) of each value in the provided array.
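A toy illustration of the metric itself, not the aggregation's implementation (which works on an approximate digest rather than an exact scan); the helper below is hypothetical:
```
class PercentileRankSketch {
    // Toy example: exact percentile rank of `value` over a small array.
    static double percentileRank(double[] values, double value) {
        if (values.length == 0) {
            return Double.NaN;
        }
        int below = 0;
        for (double v : values) {
            if (v < value) {
                below++;
            }
        }
        return 100.0 * below / values.length; // proportion of values less than `value`
    }
}
```
For example, over the values [1, 3, 5, 7] the percentile rank of 5 would be 50, since two of the four values are smaller.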
Closes#6386
Nested documents were indexed as separate documents, but it was never checked
whether the hits represent nested documents or not. Therefore, nested objects
could match non-nested queries and nested queries could also match non-nested
documents.
Examples are in issue #6540.
closes #6540
closes #6544
Client, ClusterAdminClient and IndicesAdminClient had corresponding
intermediate `internal` interfaces that are unnecessary and cause
a lot of casting. This commit removes the intermediate interfaces
and uses the super interfaces directly.
This commit also adds `Releasable` to `Node` and `Client` so that they can be
used with constructs like try-with-resources.
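A usage sketch of what this enables, assuming `Releasable` makes `Node` and `Client` usable as try-with-resources targets; the local node setup and the health call are only placeholders:
```
import org.elasticsearch.client.Client;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

class TryWithResourcesSketch {
    // Sketch only, assuming Node and Client can act as AutoCloseable resources:
    // both are released automatically when the block exits.
    static void run() {
        try (Node node = NodeBuilder.nodeBuilder().local(true).node();
             Client client = node.client()) {
            client.admin().cluster().prepareHealth().get();
        }
    }
}
```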
Closes #4355
Closes #6517
This commit renames `TestCluster` -> `InternalTestCluster` and
`ImmutableTestCluster` to `TestCluster` for consistency. This also
makes `ExternalTestCluster` and `InternalTestCluster` consistent
with respect to their execution environment.
Closes#6510
This commit adds a basic infrastructure as well as primitive tests
to ensure version backwards compatibility between the current
development trunk and an arbitrary previous version. The compatibility
tests are simple unit tests derived from a base class that starts
and manages nodes from a provided elasticsearch release package.
The following command line executes all backwards compatibility tests
in isolation:
```
mvn test -Dtests.bwc=true -Dtests.bwc.version=1.2.1 -Dtests.class=org.elasticsearch.bwcompat.*
```
These tests run basic checks like rolling upgrades and
routing/searching/get etc. against the specified version. The version
must be present in the `./backwards` folder as
`./backwards/elasticsearch-x.y.z`.
The alias -> (index -> alias) map, specifically the index -> alias one, typically just holds one entry, yet we eagerly initialize it to the number of indices. When there are many indices, each with many aliases, this is a very large overhead per alias.
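A minimal sketch of the sizing issue, with hypothetical names rather than the actual metadata code:
```
import java.util.HashMap;
import java.util.Map;

class AliasMapSizing {
    // Hypothetical illustration: the inner index -> alias map usually holds a
    // single entry, so pre-sizing it to the number of indices reserves a large
    // backing array for every alias in the cluster state.
    static Map<String, Object> innerAliasMap(int numberOfIndices) {
        // before: new HashMap<>(numberOfIndices)  -- capacity proportional to the index count
        // after: start small and let it grow in the rare multi-index case
        return new HashMap<>(1);
    }
}
```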
closes#6504
Our field data currently exposes hashes of the bytes values. That takes roughly
4 bytes per unique value, which is definitely not negligible on high-cardinality
fields.
These hashes have been used for 3 different purposes:
- term-based aggregations,
- parent/child queries,
- the percolator _id -> Query cache.
Both aggregations and parent/child queries have been moved to ordinals which
provide a greater speedup and lower memory usage. In the case of the percolator,
the hash is used in conjunction with HashedBytesRef to avoid recomputing the hash
value when resolving a query given its ID. However, removing this has no impact
on PercolatorStressBenchmark.
Close#6500
The `ordinals` execution mode was how terms aggregations initially managed not
to be too slow: it cached reads into the terms dictionary using ordinals.
However, this doesn't behave nicely on high-cardinality fields, since the reads
into the terms dict are random and this execution mode loads all unique terms
into memory. The `global_ordinals` execution mode (the default since 1.2) is
expected to be better in all cases.
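For reference, a sketch of setting an execution hint explicitly through the Java API; the aggregation and field names are hypothetical, and with `global_ordinals` being the default the hint is normally unnecessary:
```
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.TermsBuilder;

class ExecutionHintSketch {
    // Sketch only: request the global_ordinals execution mode explicitly
    // ("categories" and "category" are hypothetical names).
    static TermsBuilder categoriesAgg() {
        return AggregationBuilders.terms("categories")
                .field("category")
                .executionHint("global_ordinals");
    }
}
```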
Close#6499
Under some rare circumstances:
- local transport,
- the range aggregation has both a parent and a child aggregation,
- the range aggregation got no documents on at least one shard and several
documents on at least one other shard,
the range aggregation could return incorrect counts and sub-aggregations.
The root cause is that since the reduce happens in-place and since the range
aggregation uses the same instance for all sub-aggregation in case of an
empty bucket, sometimes non-empty buckets would have been reduced into this
shared instance.
In order to avoid similar bugs in the future, aggregations have been updated
to return a new instance when reducing instead of doing it in-place.
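The shape of the change, with hypothetical types rather than the actual aggregation classes: reduction builds and returns a fresh instance instead of mutating one of its inputs, so a shared empty-bucket instance can never accumulate counts that belong to other buckets.
```
import java.util.List;

class ReduceSketch {
    // Hypothetical bucket type illustrating copy-on-reduce.
    static final class Bucket {
        final long docCount;

        Bucket(long docCount) {
            this.docCount = docCount;
        }

        // Combine shard-level buckets into a brand-new instance, leaving the
        // inputs (possibly a shared "empty bucket") untouched.
        static Bucket reduce(List<Bucket> shardBuckets) {
            long total = 0;
            for (Bucket b : shardBuckets) {
                total += b.docCount;
            }
            return new Bucket(total);
        }
    }
}
```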
Close#6435
Moved BulkProcessor tests from BulkTests to the newly added BulkProcessorTests class.
Strengthened BulkProcessorTests by adding randomizations to existing tests and new tests for concurrent requests and exceptions.
Also made sure that afterBulk is called only once per request if concurrentRequests==0.
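A usage sketch of the case the fix covers: a `BulkProcessor` built with `setConcurrentRequests(0)` executes bulks synchronously, and `afterBulk` must then be invoked exactly once per executed request (the listener bodies are left empty for brevity):
```
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.Client;

class BulkProcessorSketch {
    // With concurrentRequests == 0, bulks run on the calling thread and the
    // listener must see exactly one afterBulk call per executed request.
    static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) { }
        }).setConcurrentRequests(0).setBulkActions(100).build();
    }
}
```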
Closes#5038