* Fuzzy Suggester parameter names are now easier to understand
* non_prefix_length became prefix_length
* min_prefix_length became min_length
* Instead of specifying search_analyzer and index_analyzer separately, using analyzer for both is now supported
* CompletionSuggester now reuses a CharsRef spare instead of relying on excessive toString() calls
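A minimal sketch of a completion field mapping that uses the single `analyzer` option in place of separate search_analyzer / index_analyzer settings (index, type, and field names are hypothetical):
```
PUT localhost:9200/my-index/my-type/_mapping
{
  "my-type" : {
    "properties" : {
      "suggest" : {
        "type" : "completion",
        "analyzer" : "simple"
      }
    }
  }
}
```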
The ClusterState can hold an 'invalid' local 'DiscoveryNode' during
node startup and rare race conditions can cause NPEs if an 'invalid'
'DiscoveryNode' is serialized.
Closes#3515
* The _percolator type now always has the _id field enabled (index=not_analyzed, store=no)
* During shard initialization the query ids are now fetched from field data; previously the ids were fetched from stored values.
* Moved internal percolator query map storage from Text to HashedBytesRef based keys.
For machines with lots of cores, i.e. >= 48, the number of threads
created by default might cause unnecessary memory pressure on the system
and can even lead to OOM where the system is not able to create any
native threads anymore. This commit limits the number of available
CPUs on the system used for thread pool initialization to at most
24 cores.
Closes#3478
It will eventually time out (with the default 5 minute timeout), but we should properly handle it and also properly propagate the failure.
closes#3513
The '"side" : "back"' parameter is no longer supported on
EdgeNGramTokenizer if the mapping is created with 0.90.2 / Lucene 4.3.
For the 'EdgeNgramTokenFilter' this is handled automatically by wrapping
it in a 'ReverseTokenFilter', yet with tokenizers this
optimization is not possible. This commit also adds a more verbose error message
explaining how to work around this limitation.
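A hedged sketch of the workaround, building a back edge n-gram analyzer out of reverse and edge n-gram token filters (index, analyzer, and filter names are hypothetical):
```
PUT localhost:9200/my-index
{
  "settings" : {
    "analysis" : {
      "filter" : {
        "back_ngram" : {
          "type" : "edgeNGram",
          "min_gram" : 1,
          "max_gram" : 5
        }
      },
      "analyzer" : {
        "back_edge_ngram" : {
          "type" : "custom",
          "tokenizer" : "standard",
          "filter" : ["reverse", "back_ngram", "reverse"]
        }
      }
    }
  }
}
```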
Closes#3489
- remove default scale weight in builder
- make parameters object/double instead of string
- do not convert number to string and back again, parse double instead
- remove javadoc reference to test classes
- set parameters in constructor instead of in method
Scoring support allows the percolate matches to be sorted, or just assigns a score to each percolate match. Sorting by score can be very useful when millions of matching percolate queries are being returned.
The scoring support hooks into the percolate query option and adds two new boolean options:
* `sort` - Whether to sort the matches based on the score. This will also include the score for each match. The `size` option is a required option when sorting percolate matches is enabled.
* `score` - Whether to compute the score and include it with each match. This will not sort the matches.
For both new options the `query` option needs to be specified, since it is used to produce the scores. The `query` option is normally used to control which percolate queries are evaluated. The recently added `function_score` query (#3423) can be used to wrap the percolate query, so that the scores carry real meaning.
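A minimal sketch of a percolate request with scoring and sorting enabled (index, type, and field names are hypothetical):
```
POST localhost:9200/my-index/my-type/_percolate
{
  "doc" : {
    "message" : "a new bonsai tree in the office"
  },
  "query" : {
    "match_all" : {}
  },
  "sort" : true,
  "size" : 10
}
```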
Closes#3506
Currently the timeout for a delete index operation is set to 10 seconds.
Yet, if a full flush is running while we delete an index this can
easily exceed 10 seconds. The timeout is not dramatic, i.e. the index
will be deleted eventually, but the client request is not acked, which
can cause confusion. We should raise it to prevent unnecessary confusion,
especially in client tests where this can happen if the machine is pretty busy.
The new timeout is set to 60 seconds.
Closes#3498
- also, properly report on the failed assertion in toFloat
- use function score in the explain test instead of custom score
- use the Tests suffix convention
When building a plugin with a new search endpoint, you need to parse the request as a SearchRequest.
Methods for this exist in the RestSearchAction class but are private.
We will modify them to be public static. This applies to:
* `RestSearchAction#parseSearchRequest(RestRequest)`
* `RestSearchAction#parseSearchSource(RestRequest)`
Closes#3499.
The multi percolate api allows bundling multiple percolate requests into one request. This api works similarly to the multi search api. The request body format is line based. Each percolate request item takes two lines: the first line is the header and the second line is the body.
The header can contain any parameter that would normally be set via the request path or query string parameters. There are several percolate actions, because there are multiple types of percolate requests:
* `percolate` - Action for defining a regular percolate request.
* `count_percolate` - Action for defining a count percolate request.
* `percolate_existing_doc` - Action for defining a percolate existing document request.
* `count_percolate_existing_doc` - Action for defining a count percolate existing document request.
Each action has its own set of parameters that need to be specified in the percolate action.
Format:
```
{"[header_type]" : {[options...]}}
{[body]}
```
Depending on the percolate action, different parameters can be specified; for example, the percolate and percolate existing document actions support different header parameters.
The following endpoints are supported:
```
POST localhost:9200/[index]/[type]/_mpercolate
POST localhost:9200/[index]/_mpercolate
POST localhost:9200/_mpercolate
```
The `index` and `type` defined in the url path are the default index and type.
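A hedged sketch of a multi percolate request (index, type, and field names are hypothetical):
```
POST localhost:9200/my-index/my-type/_mpercolate
{"percolate" : {"index" : "my-index", "type" : "my-type"}}
{"doc" : {"message" : "some text"}}
{"count_percolate" : {}}
{"doc" : {"message" : "some other text"}}
```
Here the second item omits the index and type in its header, so it falls back to the defaults from the url path.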
Closes#3488
FVH deploys some recursive logic to extract terms from documents
that need to be highlighted. For documents that have terms with a super
large term frequency, like a document that repeats a term very
often, this can produce some very large stacks when extracting
the terms. Taken to an extreme this causes stack overflow errors
once the term frequency grows beyond 6000.
The ultimate solution is an iterative implementation of the extract
logic, but until then we should protect users from these massive
term extractions, which might not be very useful in the first place.
Closes#3486
Added the FuzzySuggester in order to support fuzzy completion queries.
The following options have been added for the fuzzy suggester:
* edit_distance: Maximum edit distance
* transpositions: Sets if transpositions should be counted as one or two changes
* min_prefix_len: Minimum length of the input before fuzzy suggestions are returned
* non_prefix_len: Minimum length of the input, which is not checked for fuzzy alternatives
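A minimal sketch of a fuzzy completion suggest request using these options as originally named (index and field names are hypothetical; a later change renamed some of these options, as noted in the first entry above):
```
POST localhost:9200/my-index/_suggest
{
  "my-suggestion" : {
    "text" : "nirv",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "edit_distance" : 2,
        "transpositions" : true,
        "min_prefix_len" : 3,
        "non_prefix_len" : 1
      }
    }
  }
}
```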
Closes#3465
By making use of the LSB provided functions, one does not depend on the start-stop-daemon version to test if elasticsearch is running.
This ensures that the init script works on Debian wheezy, squeeze, and current Ubuntu and LTS versions.
Closes#3452