Before this change, the GetField#getValue() method was returning a list of values of a multivalued fields if the field values were obtained from source or if the field was stored and real-time get was used. If the field was stored but non-realtime get was used, GetField#getValue() was returning only the first element and the GetField#getValues() was returning a list of elements. This change makes behavior consistent. GetField#getValue() now always returns only the first value of the field and GetField#getValues() returns the entire list.
Typically, the main reason a reroute allocation command with allow_primary is enabled, is to force create an empty new shard because a shard (and its replicas) were lost. This can't be done today because the shard expects to have a valid index where its allocated, we need to clear its post allocation flag to make sure it is allowed to create a fresh index.
Lucene provides a set of statistics that depend on the codec / postingsformat
as well as on the index options used when the field is created / indexed.
If a certain stats value is not available lucene return `-1` instead of the
correct value. We need to ensure that those values are encoded correctly if
we try to write vLongs as well as when we aggregate those values.
Closes#3012
NgramTokenizer and NGramTokenFilter are broken with a version < 4.2
We should still support these filters but should prevent the StringIOOB
exceptions. Adding these fitlers for the FragmentBuilderHelper will
allow seamless highlighting on fields indexed with those tokenizers or
tokenfilters
Lucenes span queries are a different family than 'ordinary' queries
in lucene. Spans only work with other spans such that smart query
wrapping doesn't work with span queries at all ie. we can't wrap
in filtered query.
Closes#2994
Today an analysis chain with broken tokenfilters or tokenizers like
WordDelimiterFilter might produce somewhat broken term vectors that cause
`StringIndexOutOfBoundsExceptions` if FastVectorHighlighter is used
since the positions / offsets contract is violated and offsets of highlight
tokens are not increasing but decreasing even if their positions are increasing.
Yet, if we detect such a situation we can resort the tokens which might cause
somewhat odd highlights but doesn't fail hard with a StringIndexOOBException.
Closes#3006
All index meta data API's have urgent priority when it comes to cluster state updates. We'd like to remove indices asap to avoid things like unnecessary shards relocations
This Lucene Release introduced a new API on DocIdSetIterator that requires each
implementation to return a `cost` upperbound as a function of the iterated documents.
This API allows for several optimizations during query execution especially in
Conjunction and Disjunction Queries with min_should_match set.
Closes#2990
Allow to get the source directly using a specific REST endpoint without any additional content around it, the endpoint is `{index}/{type}/{id}/_source`.
Note, HEAD now also support the _source endpoint.
closes#2993, closes#2995
BytesRefOrdValComparator starts at 1 and ends at maxOrd - 1. Yet, numOrd is defined
as maxOrd - 1 excluding the 0 ord.
This causes wrong sort ords when the bottom of the queue is compared to the next
segment and the greatest term in the new segment is in-fact less than the current
queue bottom. If that is true we treat the values as equal and never include the right
value into the queue.
Closes#2991
This adds support for lucene span multi term queries. This lucene query
allows users to form complicated queries such as wildcards or prefix
queries embedded within span queries.
WeightFunction#weight to allow dedicated weight calculations per operation. In certain
circumstance it is more efficient / required to ignore certain factors in the weight
calculation to prevent for instance relocations if they are solely triggered by tie-breakers.
In particular the primary balance property should not be taken into account if the delta for
early termination is calculated since otherwise a relocation could be triggered solely by the
fact that two nodes have different amount of primaries allocated to them.
Closes#2984
A first implementation of adding matchers and helper methods to elasticsearch.
The following ones are supported
assertHitCount(searchResponse, 2);
// helper methods to easily access the first hits
assertFirstHit(searchResponse, hasId("foo")):
assertSecondHit(searchResponse, hasType("foo")):
assertThirdHit(searchResponse, hasIndex("foo")):
// methods to access all other hits
assertSearchHit(searchResponse, 5, hasId("10"));
// same as above, but maybe more readable
assertSearchHit(searchResponse.getHits().getAt(5), hasIndex("foo"));
I changed GeoFilterTests to show how it works.
Furthermore I inlined assertHighlight() from HighlighterSearchTests.
The ElasticsearchAssertions class can be used now as a centralized assertion class
in order have a centralized class for every developer to look at.
Custom settings are not always present in the `Settings` that are passed
to `NodeSettingsService.Listener#onRefreshSettings` such that using the defaults
will necessarily override the custom settings if set before.
Closes#2973
Fixed testX and testSingleNodeNoFlush by specifying mapping on index creation instead of using dynamic mapping. Dynamic mapping is updated on the cluster level asynchronously and if mapping changes are not applied to the cluster state before node is closed, these changes are not be available after node restart. While data added in the test is preserved, due to absence of mapping, the test still fails. This is a known issue that we are not planning to fix at the moment.
Currently realtime GET does not take source includes/excludes into account.
This patch adds support for the source field mapper includes/excludes
when getting an entry from the transaction log. Even though it introduces
a slight performance penalty, it now adheres to the defined configuration
instead of returning all source data when a realtime get is done.
The filter method of XContentMapValues actually filtered out nested
arrays/lists completely due to a bug in the filter method, which threw
away all data inside of such an array.
Closes#2944
This bug was a follow up problem, because of the filtering of nested arrays
in case source exclusion was configured.
Analysing a numeric field will return UTF-16 representations of
of Lucenes numeric prefix terms. Those terms are meaningless in general
unless used for lookups in the lucene index. Passing a numeric field
to the analysis action is most likely a bug.
Closes#2953#2952
use the latest Lucene version as specified in o.e.common.lucene.Lucene
and must be upgraded with each lucene release.
This commit adds an assert that fails once the actual lucene version
that is used is higher than the current releases version.
Lucene ships with a version constant that is mainly used to provide consistent behaviour across lucene release versions. Lucene's Analysis capabilities are commonly applied at index and search time such that the search-time behaviour should be identical to the index-time behaviour in most of the cases. Currently ElasticSearch always uses the latest version from Lucene which can break backwards compatibility with the index for users that rely on behaviour that changed in new Lucene version.
Users should always use the version the index was created with unless it's explicitly configured.
closes#2945
don't produce broken positions anymore and prevent certain highlighter bugs that fail with
StringArrayOutOfBoundsExceptions as in #2931
This commit breaks backwards compatibility in terms of highlighting when NGramTokenFilter is used.
The highlighter will highlight the entire terms as produced by the tokenizer instead of the individual
sub-gram. To do sub-gram highlighting, the ngram tokenizer should be used. This behavior was based on
broken NGramTokenFilter behavior which will be fixed in Lucene 4.4 but was ported in this commit
to elasticsearch 0.90. The broken behavior can still be used if a version < LUCENE_42 is used
in the token filter mapping.
Closes#2931
- rename to open_contexts from open, we might have other open stats in the future related to search (lucene index searchers?)
- add a test to verify it works