* Hole charactor now can change with new releases
* Fixed bug where the SEP_LABEL constant was used instead of the sepLabel instance variable
* Replaced if- with switch-statement
When we relocate a shard we might still have pending SearchContext
instances hanging around that will be used in "in-flight" searches
on the already relocated shard. This is a valid operation but if
we have already closed the underlying directory which happens during
cleanup concurrently the close call on the IndexReader can trigger
an AlreadyClosedException when the NRT reader tries to cleanup files
via the IndexWriter.
Closes#4273
This method has 2 signatures and one of them is dangerous since it allows to
discard fielddata configuration of the field mapper. This commit changes the
percolator so that it uses fielddata configuration of the _id field mapper
instead of forcing the paged_bytes format.
Closes#4270
uri parameters were not all parsed for the multi term vector request. This commit
makes sure that all parameters are parsed and used when creating the requests for the
multi term vector request.
In order to simplify both code and json request, the request structure now allows
two ways to use multi term vectors:
1. Give all parameters for each document requested in the docs array like this:
```
{
"docs": [
{
"_index": "testidx",
"_type": "test",
"_id": "2",
"terms": [
"fox"
],
"term_statistics": true
},
{
"_index": "testidx",
"_type": "test",
"_id": "1",
"terms": [
"quick",
"brown"
],
"term_statistics": false
}
]
}
```
2. Define a list of ids and give parameters in a separate parameters object like this:
```
{
"ids": [
"1",
"2"
],
"parameters": {
"_index": "testidx",
"_type": "test",
"terms": [
"brown"
]
}
}
```
uri parameters are global parameters that are set for both cases. They are overwritten
by parameter definitions in the body.
Also, this commit adds the missing setParent(..) and setPreference(..) to TermVectorRequestBuilder.
* Minor alignments (like setter to ctor)
* FuzzySuggester has a unicode aware flag, which is not exposed in the fuzzy completion request parameters
* Made XAnalyzingSuggester flags (PAYLOAD_SEP, END_BYTE, SEP_LABEL) to be written into the postings format, so we can retain backwards compatibility
* The above change also implies, that these flags can be set per instantiated XAnalyzingSuggester
* CompletionPostingsFormatTest now uses a randomProvider for writing data to check for bwc
This commit upgrades to Lucene 4.6 and contains the following improvements:
* Remove XIndexWriter in favor of the fixed IndexWriter
* Removes patched XLuceneConstantScoreQuery
* Now uses Lucene passage formatters contributed from Elasticsearch in PostingsHighlighter
* Upgrades to Lucene46 Codec from Lucene45 Codec
* Fixes problem in CommonTermsQueryParser where close was never called.
Closes#4241
We have pending open files on a regular basis since we search while
relocating etc. and keep search contexts around that are cleaned up
later. We should rather let the close call pass on even if files are
open and only force failures on teardown in ElasticsearchIntegrationTest
When starting elasticsearch with a wrong linux user, it could generate a `NullPointerException` when `PluginsService` tries to list available plugins in `./plugins` dir.
To reproduce:
* create a plugins directory with `rwx` rights for root user only
* launch elasticsearch from another account (elasticsearch for example)
Related discussion: https://groups.google.com/forum/#!topic/elasticsearch/_WRW4Qfpo7MCloses#4186.
Closes#4187.
Add FieldDataTermsFilter that compares terms out of
the fielddata cache. When filtering on a large
set of terms this filter can be considerably faster
than using a standard lucene terms filter.
Add the "fielddata" execution mode to the
terms filter parser to enable the use of
the new FieldDataTermsFilter.
Add supporting tests and documentation.
Closes#4209
In case of a misconfigured slow search/index configuration (unparseable
TimeValue) an exception is thrown.
This is not a problem when creating a shard of an index, as an exception
is returned and all is good. However, this is a huge problem, when
starting up a node, as the shard creation is repeated endlessly.
This patch changes the behaviour to go on as usual and just disable the
slowlog, as an improper configuration of logging should not affect the
allocation behaviour.
Closes#2730
The search request inside of a put warmer request was nullable, but actually we have to have that request in the transport action.
Validation and appropriate test added.
Closes#4196
Make the FilterBuilder interface consistent with the QueryBuilder
interface and replace usage of QueryBuilderException with
ElasticSearchIllegalArgumentException.
Allow the user to configure the number of hash functions as well as add
support for serializing/deserializing the bloom filter from a stream.
Add a hashCode to the bloom filter.
In this view you never care about the actual heap used bytes; you only
want to know that your max is set to what you meant and what
percentage you're currently using.
Closes#4151.
The multi_match query accepted only an array in the fields parameter. This patch allows to use a single string as well.
Also added tests for parsing in both cases.
Closes#4164
There is an optimization that executes bit based (slow) filters in the end. Matched docs could be unset if they didn't match with any of these filters. The bug was that also iterator based (fast) filters should be checked.
This change checks all should filters in the end part (if must or must_not clauses exists), so it can now correctly unset matched docs. The current bool filters requires that at least one should clause must match for docs to be match regardless of any other clauses.
Closes#4130
This commit adds javadocs and removed unused methods from central
classes like ElasticsearchIntegrationTest. It also changes visibility
of many methods and classes that are only needed inside the test infrastructure.
On windows tests sometimes fail since files can not be deleted due
to existing repos still holding on to the files. The test
framework is picky about that since it could be a bug and fails
the test if a temp file can not be deleted.
At the end recovery, the IndexShardGatewayService will double check the gateway has moved the shard to POST_RECOVERY and if not, do it it self.
The shard state could have already move to started, causing the post_recovery call to throw an exception and the entire shard recovery to fail.
This can happened if, after the gateway moved the shard to POST_RECOVERY:
1) master sent a new cluster state indicating shard is initializing
2) IndicesClusterStateService#applyInitializingShard will send a shard started event
3) Master will mark shard as started and this will be processed quickly and move the shard to STARTED.
Closes#4147
With #3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.
Closes#4089, #3840
This adds a delegate to CharTermAttributeImpl to be compatible
with the Percolator that needs a CharTermAttribute. Yet compared
to CharTermAttributImpl we only fill the BytesRef with UTF-8 since
we already have it and only if we need to convert to UTF-16 we do it.
Closes#4028
Previously the field name specified in the search request was used, which isn't correct in case a custom index_name has been used for a field or the "path":"just_name" has been used in the mapping.
Closes#4116
Fixed also bug in the fast vector highlighter which was raised by enabling the object cache, due to null FieldQuery (NPE) in case the objects are taken from the cache
Added tests to check if there are issues when highlighting multiple fields at the same time
Closes#4106
Values returned by [Double|Long|Bytes]Values are sorted today which
is guaranteed by the underlying Lucene index. Several implementations can
make use of this property but the interfaces don't guarantee this behavior.
This commit adds the guarantees and makes use of them in several places.
Note: This change might require sorting for 3rd party implemenations of these
interaces.
The 'default' / 'standard' analyzer can be a trappy default sicne it filters
english stopwords by default. Yet a default should not be dedicated to a certain language
since elasticsearch is used in many different scenarios where a standard analysis chain
with specialization to english full-text might be rather counter productive.
This commit changes the 'standard' analyzer to use an empty stopword list for indices
that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibiliy
for older indices.
Closes#3775
Use .percolator as the internal (hidden) type name for percolators within the index. Seems nicer name to represent "hidden" types within an index.
closes#4090
Also, fix all the problems it brought up in tests.
Removed OverrideTypeMappingTests as it is no longer relevant.
Better naming for the default percolator mapping and change it's content use _default_ as root node.
Closes#4038
It create the following challenges:
- it automatically load all the norms for all fields. This should be an opt in feature similar to the new loading feature in field data. Will open a separate issue for it.
- It automatically loads all doc values for all fields (if they have it), overriding effectively the loading option of field data when its backed by doc values.
closes#4078
We now return acknowledged true when no wait is needed (mustAck always returns false). We do wait for the master node to complete its actions though. Previously it would try to timeout and hang due to a CountDown#fastForward call when the internal counter is set to 0
We now return acknowledged false without starting the timeout thread when the timeout is set 0, as starting the wait and immediately stopping the thread seems pointless.
Added coverage for ack in ClusterServiceTests
this map is a "true" immutable map, encapsulating an open impl, and has a builder that allows it to be built easily.
the builder has the optimization of using clone if its being built based on an existing immutable map.
Usually acknowledged true means that all nodes have digested the change and are in sync. When no changes are made, there is no need to push a new cluster state, no need to wait for ack either, but can't guarantee that all nodes are in sync.
When using cluster reroute with dryRun flag no changes are made, this test was based on the wrong assumption that acknowledged meant all nodes were in sync, which is not the case here.
Changed the test to only read the cluster state from the master, to check that nothing changed there after processing the cluster state update.
we need to move to started from post recovery on cluster level changes, we need to make sure we handle a global state change of relocating, which can happen (and not pass through started)
Improve the introduction of new fields into the concrete parsed mappings by not relying on immutable maps and copying over entries, but instead using open maps (which will also use less memory), and using clone to perform the copy on write logic
This allows the RegexpQueryBuilder to be used in span queries
Added tests for all span multi term queries.
Also updated the documentation and removed mentioning of numeric range
queries for span queries (they have to be terms).
Closes#3392
This new API allows to get the mapping for a specific set of fields rather than get the whole index mapping and traverse it.
The fields to be retrieved can be specified by their full path, index name and field name and will be resolved in this order.
In case multiple field match, the first one will be returned.
Since we are now generating the output (rather then fall back to the stored mapping), you can specify `include_defaults`=true on the request to have default values returned.
Closes#3941
its preferable to execute the indices cluster state service as quickly as possible, as one of the first listeners, so it will apply the cluster state to the local state
potentially, it should even execute before it state is "visible" (through the state call), but that's another change...
make sure to throw the already exists exception, so when indexing into an alias, and it has not propagated yet through the cluster state, it will end up being ignored if it already exists
Some FieldData consumers require hash values per byte. We provide an optimization
that allows to cache the hashes internally if the consumer knows that they are needed
this optimization got lost in a previous commit. This commit adds them back and folds
the dedicated method into AtomicFieldData#getBytesValues(true|false)
As a side note, the internal reroute call is now part of the ack mechanism. That means that if the response contains acknowledged flag, the internal reroute that was eventually issued was acknowledged too. Also, even if the request is not acknowledged, the reroute is issued before returning, which means that there is no need to manually call reroute afterwards to make sure the new settings are immediately applied.
Closes#3995
Added support for serialization based on version to AcknowledgedResponse. Useful in api that don't support yet the acknowledged flag in the response.
Moved also ack warmer tests to more specific AckTests class
Close#3983
This addes the _cat/recovery/{index} API endpoint, which displays
information about the status of recovering shards. An example of the
output:
index shard node target recovered %
test2 0 Fwo7c_6MSdWM0uM1Ho4t-g 147304414 19236101 13.1%
test 0 Fwo7c_6MSdWM0uM1Ho4t-g 145891423 119640535 82.0%
Fixes#3969
This patch takes the version of the created index into account when a
prebuilt analyzer is created.
So, if an index was created with 0.90.4, then the prebuilt analyzers
will be the same than on the 0.90.4 release.
One reason for this feature is the possibility to change pre built
analyzers like the standard one.
The patch tries to reuse analyzers as mutch as possible. So even if
version X.Y.Z and X.Y.A use the same lucene analyzers, the same instance
is reused in order to prevent overcreation of lucene analyzer instances.
Closes#3790
The setting causes the upper bound for a range query/filter to be rounded up,
therefore the name `round_ceil` seems to make more sense.
Also this commit removes the redundant fourth parameter to DateMathParser.parse(..)
which was never used.
was: parse(String text, long now, boolean roundUp, boolean upperInclusive)
is now: parse(String text, long now, boolean roundCeil)
closes#3914
When running tests, Engine.searcher() is going to be an AssertingIndexSearcher
so we definitely don't want to discard it. This commit fixes it as well as the
bugs it found.
Closes#3987
There seems to be an issue with this test since it shuts down random
nodes and TransportClients seem to be confused due to that. For
now we disable them to figure out if this is the cause of the sporadic
timeouts.
In order to make sure that people do not get confused, if they
index a float as weight, it makes more sense to reject it instead of
silently parsing it to an integer and using it.
The CompletionFieldMapper now checks for the type of the number which
is being read and throws and exception if the number is something else
than int or long.
Closes#3977
Requires field index_options set to "offsets" in order to store positions and offsets in the postings list.
Considerably faster than the plain highlighter since it doesn't require to reanalyze the text to be highlighted: the larger the documents the better the performance gain should be.
Requires less disk space than term_vectors, needed for the fast_vector_highlighter.
Breaks the text into sentences and highlights them. Uses a BreakIterator to find sentences in the text. Plays really well with natural text, not quite the same if the text contains html markup for instance.
Treats the document as the whole corpus, and scores individual sentences as if they were documents in this corpus, using the BM25 algorithm.
Uses forked version of lucene postings highlighter to support:
- per value discrete highlighting for fields that have multiple values, needed when number_of_fragments=0 since we want to return a snippet per value
- manually passing in query terms to avoid calling extract terms multiple times, since we use a different highlighter instance per doc/field, but the query is always the same
The lucene postings highlighter api is quite different compared to the existing highlighters api, the main difference being that it allows to highlight multiple fields in multiple docs with a single call, ensuring sequential IO.
The way it is introduced in elasticsearch in this first round is a compromise trying not to change the current highlight api, which works per document, per field. The main disadvantage is that we lose the sequential IO, but we can always refactor the highlight api to work with multiple documents.
Supports pre_tag, post_tag, number_of_fragments (0 highlights the whole field), require_field_match, no_match_size, order by score and html encoding.
Closes#3704
You can configure the highlighting api to return an excerpt of a field
even if there wasn't a match on the field.
The FVH makes excerpts from the beginning of the string to the first
boundary character after the requested length or the boundary_max_scan,
whichever comes first. The Plain highlighter makes excerpts from the
beginning of the string to the end of the last token before the requested
length.
Closes#1171
Currently we have a marker interface for Acknowledged[Request|Response],
this makes not much sense since we duplicate the code in each subclass
or class that implements the interface. We can simply use abstract
classes and have it implemented only once.
This commit primarily folds [Double|Bytes|Long|GeoPoint]Values.Iter
into [Double|Bytes|Long|GeoPoint]Values. Iterations now don't require
a auxillary class (Iter) but instead driven by native for loops. All
[Double|Bytes|Long|GeoPoint]Values are stateful and provide `setDocId`
and `nextValue` methods to iterate over all values in a document.
This has several advantage:
* The amout of specialized classes is reduced
* Iteration is clearly stateful ie. Iters can't be confused to be local.
* All iterations are size bounded which prevents runtime checks and
allows JIT optimizations / loop un-rolling and most iterations are
branch free.
* Due to the bounded iteration the need for a `hasNext` method call
is removed.
* Value iterations feels more native.
This commit also adds consistent documentation and unifies the calcualtion
if SortMode is involved.
This commit also changes the runtime behavior of BytesValues#getValue() such that it
will never return `null` anymore. If a document has no value in a field
this method still returns a `BytesRef` with a `length` of 0. To identify
documents with no values #hasValue() or #setDocument(int) should be used.
The latter should be preferred if the value will be consumed in the case
the document has a value.
Have a separate channel for recovery, so it won't overflow the "low" channel which is also used for bulk indexing.
Also, rename the channel names to be more descriptive. Change low to bulk (for bulk based operations, currently just bulk indexing), med to reg (for "regular" operations), and high to state (for state based communication). The new channel for recovery will be named recovery, and the ping channel will remain the same.
closes#3954
When setting track scores, the scan search type will return the scores for each document. The Java API builder does not properly set this value (it only sets it if a sort in in place, which is not relevant for scan search type).
closes#3949