Some FieldData consumers require hash values per byte. We provide an optimization
that allows to cache the hashes internally if the consumer knows that they are needed
this optimization got lost in a previous commit. This commit adds them back and folds
the dedicated method into AtomicFieldData#getBytesValues(true|false)
As a side note, the internal reroute call is now part of the ack mechanism. That means that if the response contains acknowledged flag, the internal reroute that was eventually issued was acknowledged too. Also, even if the request is not acknowledged, the reroute is issued before returning, which means that there is no need to manually call reroute afterwards to make sure the new settings are immediately applied.
Closes#3995
Added support for serialization based on version to AcknowledgedResponse. Useful in api that don't support yet the acknowledged flag in the response.
Moved also ack warmer tests to more specific AckTests class
Close#3983
This addes the _cat/recovery/{index} API endpoint, which displays
information about the status of recovering shards. An example of the
output:
index shard node target recovered %
test2 0 Fwo7c_6MSdWM0uM1Ho4t-g 147304414 19236101 13.1%
test 0 Fwo7c_6MSdWM0uM1Ho4t-g 145891423 119640535 82.0%
Fixes#3969
This patch takes the version of the created index into account when a
prebuilt analyzer is created.
So, if an index was created with 0.90.4, then the prebuilt analyzers
will be the same than on the 0.90.4 release.
One reason for this feature is the possibility to change pre built
analyzers like the standard one.
The patch tries to reuse analyzers as mutch as possible. So even if
version X.Y.Z and X.Y.A use the same lucene analyzers, the same instance
is reused in order to prevent overcreation of lucene analyzer instances.
Closes#3790
The setting causes the upper bound for a range query/filter to be rounded up,
therefore the name `round_ceil` seems to make more sense.
Also this commit removes the redundant fourth parameter to DateMathParser.parse(..)
which was never used.
was: parse(String text, long now, boolean roundUp, boolean upperInclusive)
is now: parse(String text, long now, boolean roundCeil)
closes#3914
When running tests, Engine.searcher() is going to be an AssertingIndexSearcher
so we definitely don't want to discard it. This commit fixes it as well as the
bugs it found.
Closes#3987
There seems to be an issue with this test since it shuts down random
nodes and TransportClients seem to be confused due to that. For
now we disable them to figure out if this is the cause of the sporadic
timeouts.
In order to make sure that people do not get confused, if they
index a float as weight, it makes more sense to reject it instead of
silently parsing it to an integer and using it.
The CompletionFieldMapper now checks for the type of the number which
is being read and throws and exception if the number is something else
than int or long.
Closes#3977
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.
Uses forked version of lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same
The lucene postings highlighter api is quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.
Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.
Closes#3704
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.
The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first. The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.
Closes#1171
Currently we have a marker interface for Acknowledged[Request|Response],
this makes not much sense since we duplicate the code in each subclass
or class that implements the interface. We can simply use abstract
classes and have it implemented only once.
This commit primarily folds [Double|Bytes|Long|GeoPoint]Values.Iter
into [Double|Bytes|Long|GeoPoint]Values. Iterations now don't require
a auxillary class (Iter) but instead driven by native for loops. All
[Double|Bytes|Long|GeoPoint]Values are stateful and provide `setDocId`
and `nextValue` methods to iterate over all values in a document.
This has several advantage:
* The amout of specialized classes is reduced
* Iteration is clearly stateful ie. Iters can't be confused to be local.
* All iterations are size bounded which prevents runtime checks and
allows JIT optimizations / loop un-rolling and most iterations are
branch free.
* Due to the bounded iteration the need for a `hasNext` method call
is removed.
* Value iterations feels more native.
This commit also adds consistent documentation and unifies the calcualtion
if SortMode is involved.
This commit also changes the runtime behavior of BytesValues#getValue() such that it
will never return `null` anymore. If a document has no value in a field
this method still returns a `BytesRef` with a `length` of 0. To identify
documents with no values #hasValue() or #setDocument(int) should be used.
The latter should be preferred if the value will be consumed in the case
the document has a value.
Have a separate channel for recovery, so it won't overflow the "low" channel which is also used for bulk indexing.
Also, rename the channel names to be more descriptive. Change low to bulk (for bulk based operations, currently just bulk indexing), med to reg (for "regular" operations), and high to state (for state based communication). The new channel for recovery will be named recovery, and the ping channel will remain the same.
closes#3954
When setting track scores, the scan search type will return the scores for each document. The Java API builder does not properly set this value (it only sets it if a sort in in place, which is not relevant for scan search type).
closes#3949
The description of the timeout parameter was worded misleadingly; it implied that the API would wait until the cluster reached the desired level and then stayed at that level for the timeout. I've tweaked the sentence to remove the risk of confusion.