The newly added afterIndexShardPostRecovery method to InternalIndicesLifecycle, that the percolator now uses to trigger the loading of the registered queries will make sure that a shard doesn't go to started state before the queries have been loaded.
The percolate api (like other apis) will retry the execution on a different shard copy if a shard isn't in a started state preventing empty results if there registered queries to be loaded. Percolator tests fail sometimes for this reason.
The Fast Vector Highlighter can combine matches on multiple fields to
highlight a single field using `matched_fields`. This is most
intuitive for multifields that analyze the same string in different
ways. Example:
{
"query": {
"query_string": {
"query": "content.plain:running scissors",
"fields": ["content"]
}
},
"highlight": {
"order": "score",
"fields": {
"content": {
"matched_fields": ["content", "content.plain"],
"type" : "fvh"
}
}
}
}
Closes#3750
For example when a has_child is wrapped in a filtered query as query and the wrapped filter is cached.
The short circuit mechanism in that case counts down based on deleted docs, which then yields lower results than is expected.
Relates to #4306
* Removed the applyAcceptedDocs in ChildrenConstantScoreQuery, they need to be applied at all times. (because of short circuit mechanism)
* Moved ParentDocSet to FilteredDocIdSetIterator, because it fits better than MatchDocIdSet.
* Made similar changes to ParentConstantScoreQuery for consistency between the two queries. The bug accepted docs bug didn't occur in the ParentConstantScoreQuery.
* Updated random p/c tests to randomly update parent or child docs during the test run.
Closes#4306
* Force the cat API classes to have simple help available by extending from a AbstractCatAction
* Use a set binder on guice creation to create the help on node start up
* Make sure that the help/field info is returned without querying any data
This happens when reverting the trans transaction log on failure, and when that happens, actually we might have failed on the transient translog creation to being with....
fixes#4223
The use of freq() instead of sloppyFreq() and the fact that `numMatches` was
not updated in `setFreqCurrentDoc` could lead to an inaccurate score in the
explanation.
Close#4298
Note: we were previously waiting for ack only from all nodes that contain shards for the indices that the mapping updatewas applied to. This change introduces a wait for ack from all nodes, consitent with other api as the ack is meant more on the cluster state itself, which is held by all nodes and needs to be updated on all nodes anyway.
Closes#4228
FilterDirectory has been ported to Lucene in LUCENE-5204 which makes
the class in elasticsearch obsolet. This commit removes the class and
moves the static utils that are not in lucene to `DirectoryUtils`
* Hole charactor now can change with new releases
* Fixed bug where the SEP_LABEL constant was used instead of the sepLabel instance variable
* Replaced if- with switch-statement
When we relocate a shard we might still have pending SearchContext
instances hanging around that will be used in "in-flight" searches
on the already relocated shard. This is a valid operation but if
we have already closed the underlying directory which happens during
cleanup concurrently the close call on the IndexReader can trigger
an AlreadyClosedException when the NRT reader tries to cleanup files
via the IndexWriter.
Closes#4273
This method has 2 signatures and one of them is dangerous since it allows to
discard fielddata configuration of the field mapper. This commit changes the
percolator so that it uses fielddata configuration of the _id field mapper
instead of forcing the paged_bytes format.
Closes#4270
uri parameters were not all parsed for the multi term vector request. This commit
makes sure that all parameters are parsed and used when creating the requests for the
multi term vector request.
In order to simplify both code and json request, the request structure now allows
two ways to use multi term vectors:
1. Give all parameters for each document requested in the docs array like this:
```
{
"docs": [
{
"_index": "testidx",
"_type": "test",
"_id": "2",
"terms": [
"fox"
],
"term_statistics": true
},
{
"_index": "testidx",
"_type": "test",
"_id": "1",
"terms": [
"quick",
"brown"
],
"term_statistics": false
}
]
}
```
2. Define a list of ids and give parameters in a separate parameters object like this:
```
{
"ids": [
"1",
"2"
],
"parameters": {
"_index": "testidx",
"_type": "test",
"terms": [
"brown"
]
}
}
```
uri parameters are global parameters that are set for both cases. They are overwritten
by parameter definitions in the body.
Also, this commit adds the missing setParent(..) and setPreference(..) to TermVectorRequestBuilder.
* Minor alignments (like setter to ctor)
* FuzzySuggester has a unicode aware flag, which is not exposed in the fuzzy completion request parameters
* Made XAnalyzingSuggester flags (PAYLOAD_SEP, END_BYTE, SEP_LABEL) to be written into the postings format, so we can retain backwards compatibility
* The above change also implies, that these flags can be set per instantiated XAnalyzingSuggester
* CompletionPostingsFormatTest now uses a randomProvider for writing data to check for bwc
This commit upgrades to Lucene 4.6 and contains the following improvements:
* Remove XIndexWriter in favor of the fixed IndexWriter
* Removes patched XLuceneConstantScoreQuery
* Now uses Lucene passage formatters contributed from Elasticsearch in PostingsHighlighter
* Upgrades to Lucene46 Codec from Lucene45 Codec
* Fixes problem in CommonTermsQueryParser where close was never called.
Closes#4241
We have pending open files on a regular basis since we search while
relocating etc. and keep search contexts around that are cleaned up
later. We should rather let the close call pass on even if files are
open and only force failures on teardown in ElasticsearchIntegrationTest
When starting elasticsearch with a wrong linux user, it could generate a `NullPointerException` when `PluginsService` tries to list available plugins in `./plugins` dir.
To reproduce:
* create a plugins directory with `rwx` rights for root user only
* launch elasticsearch from another account (elasticsearch for example)
Related discussion: https://groups.google.com/forum/#!topic/elasticsearch/_WRW4Qfpo7MCloses#4186.
Closes#4187.
Add FieldDataTermsFilter that compares terms out of
the fielddata cache. When filtering on a large
set of terms this filter can be considerably faster
than using a standard lucene terms filter.
Add the "fielddata" execution mode to the
terms filter parser to enable the use of
the new FieldDataTermsFilter.
Add supporting tests and documentation.
Closes#4209
In case of a misconfigured slow search/index configuration (unparseable
TimeValue) an exception is thrown.
This is not a problem when creating a shard of an index, as an exception
is returned and all is good. However, this is a huge problem, when
starting up a node, as the shard creation is repeated endlessly.
This patch changes the behaviour to go on as usual and just disable the
slowlog, as an improper configuration of logging should not affect the
allocation behaviour.
Closes#2730
The search request inside of a put warmer request was nullable, but actually we have to have that request in the transport action.
Validation and appropriate test added.
Closes#4196
Make the FilterBuilder interface consistent with the QueryBuilder
interface and replace usage of QueryBuilderException with
ElasticSearchIllegalArgumentException.
Allow the user to configure the number of hash functions as well as add
support for serializing/deserializing the bloom filter from a stream.
Add a hashCode to the bloom filter.
In this view you never care about the actual heap used bytes; you only
want to know that your max is set to what you meant and what
percentage you're currently using.
Closes#4151.
The multi_match query accepted only an array in the fields parameter. This patch allows to use a single string as well.
Also added tests for parsing in both cases.
Closes#4164
There is an optimization that executes bit based (slow) filters in the end. Matched docs could be unset if they didn't match with any of these filters. The bug was that also iterator based (fast) filters should be checked.
This change checks all should filters in the end part (if must or must_not clauses exists), so it can now correctly unset matched docs. The current bool filters requires that at least one should clause must match for docs to be match regardless of any other clauses.
Closes#4130
This commit adds javadocs and removed unused methods from central
classes like ElasticsearchIntegrationTest. It also changes visibility
of many methods and classes that are only needed inside the test infrastructure.
On windows tests sometimes fail since files can not be deleted due
to existing repos still holding on to the files. The test
framework is picky about that since it could be a bug and fails
the test if a temp file can not be deleted.
At the end recovery, the IndexShardGatewayService will double check the gateway has moved the shard to POST_RECOVERY and if not, do it it self.
The shard state could have already move to started, causing the post_recovery call to throw an exception and the entire shard recovery to fail.
This can happened if, after the gateway moved the shard to POST_RECOVERY:
1) master sent a new cluster state indicating shard is initializing
2) IndicesClusterStateService#applyInitializingShard will send a shard started event
3) Master will mark shard as started and this will be processed quickly and move the shard to STARTED.
Closes#4147
With #3782 we changed the execution order of dynamic mapping updates and index operations. We now first send the mapping update to the master node, and then we index the document. This makes sense but caused issues with rivers as they are started due to the cluster changed event that is triggered on the master node right after the mapping update has been applied, but in order for the river to be started its _meta document needs to be available, which is not the case anymore as the index operation most likely hasn't happened yet. As a result in most of the cases rivers don't get started.
What we want to do is retry a few times if the _meta document wasn't found, so that the river gets started anyway.
Closes#4089, #3840
This adds a delegate to CharTermAttributeImpl to be compatible
with the Percolator that needs a CharTermAttribute. Yet compared
to CharTermAttributImpl we only fill the BytesRef with UTF-8 since
we already have it and only if we need to convert to UTF-16 we do it.
Closes#4028
Previously the field name specified in the search request was used, which isn't correct in case a custom index_name has been used for a field or the "path":"just_name" has been used in the mapping.
Closes#4116
Fixed also bug in the fast vector highlighter which was raised by enabling the object cache, due to null FieldQuery (NPE) in case the objects are taken from the cache
Added tests to check if there are issues when highlighting multiple fields at the same time
Closes#4106
Values returned by [Double|Long|Bytes]Values are sorted today which
is guaranteed by the underlying Lucene index. Several implementations can
make use of this property but the interfaces don't guarantee this behavior.
This commit adds the guarantees and makes use of them in several places.
Note: This change might require sorting for 3rd party implemenations of these
interaces.
The 'default' / 'standard' analyzer can be a trappy default sicne it filters
english stopwords by default. Yet a default should not be dedicated to a certain language
since elasticsearch is used in many different scenarios where a standard analysis chain
with specialization to english full-text might be rather counter productive.
This commit changes the 'standard' analyzer to use an empty stopword list for indices
that are created from 1.0.0.Beta1 version onwards but will maintain backwards compatibiliy
for older indices.
Closes#3775
Use .percolator as the internal (hidden) type name for percolators within the index. Seems nicer name to represent "hidden" types within an index.
closes#4090