Scoring support allows percolate matches to be sorted by score, or simply assigns a score to each percolate match. Sorting by score can be very useful when millions of matching percolate queries are being returned.
The scoring support hooks into the percolate query option and adds two new boolean options:
* `sort` - Whether to sort the matches based on the score. This will also include the score for each match. The `size` option is a required option when sorting percolate matches is enabled.
* `score` - Whether to compute the score and include it with each match. This will not sort the matches.
For both new options the `query` option needs to be specified, since it is used to produce the scores. The `query` option is normally used to control which percolate queries are evaluated. To make the scores meaningful, the `function_score` query recently added in #3423 can be used to wrap the percolate query.
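As an illustration, a request body using these options could be assembled as in the sketch below. Only `query`, `size`, `sort`, `score` and `function_score` come from the description above; the helper name `build_percolate_body`, the `doc` key, the `message` field and the `script_score` script are assumptions made for the example and may not match the actual request format.

```python
import json


def build_percolate_body(doc, query, size=None, sort=False, score=False):
    """Assemble a percolate request body (hypothetical helper, not part of any client)."""
    # The description states that `size` is required when sorting is enabled.
    if sort and size is None:
        raise ValueError("`size` is required when `sort` is enabled")
    body = {"doc": doc, "query": query}  # the `doc` key is an assumption
    if size is not None:
        body["size"] = size
    if sort:
        body["sort"] = True
    if score:
        body["score"] = True
    return body


# Wrap a query in `function_score` so the resulting scores carry meaning.
# The script and the `priority` field are purely illustrative.
scoring_query = {
    "function_score": {
        "query": {"match_all": {}},
        "script_score": {"script": "doc['priority'].value"},
    }
}

body = build_percolate_body({"message": "some text"}, scoring_query, size=10, sort=True)
print(json.dumps(body, indent=2))
```

Setting `score=True` instead of `sort=True` would request scores without ordering the matches, in which case `size` could be left out.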
Closes #3506
Currently the timeout for a delete index operation is set to 10 seconds.
Yet, if a full flush is running while we delete an index this can
easily exceed 10 seconds. The timeout is not dramatic, i.e. the index
will be deleted eventually but the client request is not acked which
can cause confusion. We should raise it to prevent unnecessary confusion
especially in client tests where this can happen if the machine is pretty busy.
The new timeout is set to 60 seconds.
Closes #3498
- also, properly report on the failed assertion in toFloat
- use function score in the explain compared to custom score
- use the Tests suffix convention
When building a plugin with a new search endpoint, you need to parse the request as a `SearchRequest`.
Methods for this exist in the `RestSearchAction` class but are private.
We will modify them to be public static. This applies to:
* `RestSearchAction#parseSearchRequest(RestRequest)`
* `RestSearchAction#parseSearchSource(RestRequest)`
Closes #3499.
The multi percolate API allows bundling multiple percolate requests into one request. This API works similarly to the multi search API. The request body format is line based. Each percolate request item takes two lines: the first line is the header and the second line is the body.
The header can contain any parameter that normally would be set via the request path or query string parameters. There are several percolate actions, because there are multiple types of percolate requests:
* `percolate` - Action for defining a regular percolate request.
* `count_percolate` - Action for defining a count percolate request.
* `percolate_existing_doc` - Action for defining a percolate existing document request.
* `count_percolate_existing_doc` - Action for defining a count percolate existing document request.
Each action has its own set of parameters that need to be specified in the percolate action.
Format:
```
{"[header_type]" : {[options...]}}
{[body]}
```
Depending on the percolate action different parameters can be specified. For example the percolate and percolate existing document actions support different parameters.
The following endpoints are supported:
```
POST localhost:9200/[index]/[type]/_mpercolate
POST localhost:9200/[index]/_mpercolate
POST localhost:9200/_mpercolate
```
The `index` and `type` defined in the url path are the default index and type.
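The line-based format described above can be sketched as follows. This is a minimal illustration, not client code: the helper name `build_mpercolate_body`, the `id` header option, the document fields and the trailing newline are assumptions, while the action names and the header/body pairing are taken from the description.

```python
import json

# Action names as listed in the description above.
PERCOLATE_ACTIONS = {
    "percolate",
    "count_percolate",
    "percolate_existing_doc",
    "count_percolate_existing_doc",
}


def build_mpercolate_body(items):
    """Serialize (action, header_options, body) tuples into the two-lines-per-item format."""
    lines = []
    for action, header_options, body in items:
        if action not in PERCOLATE_ACTIONS:
            raise ValueError("unknown percolate action: %s" % action)
        lines.append(json.dumps({action: header_options}))  # first line: header
        lines.append(json.dumps(body))                      # second line: body
    # Ending the payload with a newline is an assumption carried over from
    # other line-based APIs.
    return "\n".join(lines) + "\n"


payload = build_mpercolate_body([
    ("percolate", {"index": "my-index", "type": "my-type"},
     {"doc": {"message": "some text"}}),
    ("count_percolate_existing_doc", {"index": "my-index", "type": "my-type", "id": "1"},
     {}),
])
print(payload)
```

An item that leaves `index` or `type` out of its header would fall back to the defaults taken from the URL path.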
Closes #3488
FVH uses recursive logic to extract the terms that need to be
highlighted from documents. For documents that contain terms with a
very large term frequency, like a document that repeats a term very
often, this can produce very deep stacks when extracting the terms.
Taken to an extreme, this causes stack overflow errors once the term
frequency grows to 6000 or beyond.
The ultimate solution is an iterative implementation of the extract
logic, but until then we should protect users from these massive
term extractions, which might not be very useful in the first place.
Closes #3486
Added the FuzzySuggester in order to support fuzzy completion queries.
The following options have been added for the fuzzy suggester:
* edit_distance: Maximum edit distance
* transpositions: Sets if transpositions should be counted as one or two changes
* min_prefix_len: Minimum length of the input before fuzzy suggestions are returned
* non_prefix_len: Minimum length of the input, which is not checked for fuzzy alternatives
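To make the `transpositions` option concrete, the sketch below computes an edit distance in which swapping two adjacent characters counts as either one change or two. It is a plain dynamic-programming illustration of the concept (optimal string alignment distance), not the way the suggester itself evaluates fuzzy matches; the function name `edit_distance` is chosen for this example.

```python
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b.

    With transpositions=True an adjacent swap costs one change,
    otherwise it costs two (plain Levenshtein distance).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("nirvana", "nrivana", transpositions=True))   # prints 1
print(edit_distance("nirvana", "nrivana", transpositions=False))  # prints 2
```

With a maximum `edit_distance` of 1, the misspelling above would only be within range when transpositions count as a single change.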
Closes #3465
By making use of the LSB-provided functions, the init script does not depend on the start-stop daemon version to test if elasticsearch is running.
This ensures that the init script works on Debian wheezy, squeeze, current Ubuntu and LTS versions.
Closes #3452
This commit adds support for failing fast when running a test
case with `-Dtests.iters=N` and uses some goodness from LuceneTestCase
in a new base `AbstractRandomizedTest`. This class checks, among other
things, whether a test fails to call `super.setup` / `super.tearDown` when
it should, and whether large static resources are left uncleaned
after the tests, e.g. a running node.
Retrieving term vectors for a document that does not have the requested field
caused a null pointer exception. The same happened if the field has no term vectors,
for example, because the field only contains "?".
Now, an empty response is returned.
Closes #3471
MultiOrdinals.MultiDocs returned 'null' ordinals, which caused
an NPE if the field was single valued and multi ordinals allowed a
significantly smaller in-memory representation than single packed int ordinals.
Closes #3470
We currently return with status code 0 when an IOException occurs.
The plugin manager should in any case return a nonzero status if
the operation was not successful. Now the PluginManager uses the
following response codes based on 'sysexits.h':
* '0' on success
* '64' command line usage error
* '70' internal software error
* '74' input/output error
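The mapping from failure type to exit code listed above can be sketched like this. It is an illustration of the idea only, not the PluginManager code: the names `UsageError`, `run_and_get_exit_code` and `failing_download` are hypothetical, and the choice of which exception maps to which code is an assumption.

```python
EXIT_OK = 0          # success
EXIT_USAGE = 64      # command line usage error
EXIT_SOFTWARE = 70   # internal software error
EXIT_IO_ERROR = 74   # input/output error


class UsageError(Exception):
    """Hypothetical exception for invalid command line arguments."""


def run_and_get_exit_code(operation):
    """Run the operation and translate any failure into a nonzero exit code."""
    try:
        operation()
    except UsageError:
        return EXIT_USAGE
    except OSError:  # IOError is an alias of OSError in Python 3
        return EXIT_IO_ERROR
    except Exception:
        return EXIT_SOFTWARE
    return EXIT_OK


def failing_download():
    raise IOError("connection reset")  # hypothetical I/O failure


print(run_and_get_exit_code(failing_download))  # prints 74
```

A command line entry point would pass the returned value to `sys.exit`, so that shell scripts can detect a failed operation.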
Closes #3463
Lucene 4.4 shipped with a fundamental change in how the decision
on when to write compound files is made. During segment flush the
compound files are written by default which solely relies on a flag
in the IndexWriterConfig. The merge policy has been factored out to
only make decisions on merges and not on IW flushes. The default now
is to always write CFS on flush to reduce resource usage like open files
etc. if segments are flushed regularly. While this provides a sensible
default, certain users / use cases might need to change this setting if
re-packing flushed segments into CFS is not desired.
Closes #3461
The ClusterService might not see the latest cluster state and therefore
might not contain the local node id. Discovery will always see the local
node id since it's set on startup.