This commit allows for configuring the sort order of missing values in BytesRef
comparators (used for strings) with the following options:
- _first: missing values will appear in the first positions,
- _last: missing values will appear in the last positions (default),
- <any value>: documents with missing sort value will use the given value when
sorting.
Since the default is _last, sorting by string value will behave differently
from previous versions of elasticsearch, which used to insert missing values
in the first positions when sorting in ascending order.
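For example, a search request can set the `missing` parameter on a string sort
as follows (the index and the `name` field below are hypothetical; `missing`
accepts `_first`, `_last`, or a literal value):
```
curl -XGET 'http://localhost:9200/index/_search' -d '{
    "query": { "match_all": {} },
    "sort": [
        { "name": { "order": "asc", "missing": "_last" } }
    ]
}'
```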
Implementation notes:
- Nested sorting is supported through the implementation of
NestedWrappableComparator,
- BytesRefValComparator was mostly broken since no field data implementation
was using it; it is now tested through NoOrdinalsStringFieldDataTests,
- Specialized BytesRefOrdValComparators have been removed now that the ordinals
rely on packed arrays instead of raw arrays,
- Field data tests hierarchy has been changed so that the numeric tests don't
inherit from the string tests anymore,
- When _first or _last is used, internally the comparators are told to use
null or BytesRefFieldComparatorSource.MAX_TERM to replace missing values
(depending on the sort order),
- BytesRefValComparator just replaces missing values with the provided value
and uses them for comparisons,
- BytesRefOrdValComparator multiplies ordinals by 4 so that it can find
ordinals for the missing value and the bottom value which are directly
comparable with the segment ordinals. For example, if the segment values and
ordinals are (a,1) and (b,2), they will be stored internally as (a,4) and
(b,8) and if the missing value is 'ab', it will be assigned 6 as an ordinal,
since it is between 'a' and 'b'. Then if the bottom value is 'abc', it will
be assigned 7 as an ordinal since it is between 'ab' and 'b'.
Closes #896
count_percolate -> count
percolate_existing_doc -> percolate
count_percolate_existing_doc -> count
If the header contains an `id` field, then an existing document will automatically be percolated.
The highlighter in the percolate api highlights snippets in the document being percolated. If highlighting is enabled then, for each matching query, highlight snippets will be generated.
All highlight options that are supported via the search api are also supported in the percolate api, since the percolate api embeds the same highlighter infrastructure as the search api.
The `size` option is required if highlighting is specified in the percolate api; other than that, the `highlight` request part can simply be placed in the percolate api request body, as in the sketch below.
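A sketch of such a percolate request with highlighting (the index, type, and `message` field are hypothetical):
```
curl -XGET 'http://localhost:9200/index/type/_percolate' -d '{
    "doc": {
        "message": "some text that the registered queries may match"
    },
    "size": 5,
    "highlight": {
        "fields": {
            "message": {}
        }
    }
}'
```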
Closes #3574
- improve the test to be more reproducible
- have tests for various number of replica counts, to check if failures are caused by searching on replicas that might not have been refreshed yet
- improve the test to cover explicit index creation, as well as index creation caused by an index operation
- have an initial search go to _primary, to check if the failure occurs when searching on a replica because it missed a refresh (see the example below)
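For reference, a search can be pinned to primary shards via the `preference` parameter (the index name below is illustrative):
```
curl -XGET 'http://localhost:9200/index/_search?preference=_primary' -d '{
    "query": { "match_all": {} }
}'
```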
--------------------------
This feature allows retrieving [term vectors](https://github.com/elasticsearch/elasticsearch/issues/3114) for a list of documents. The JSON request has exactly the same [format](https://github.com/elasticsearch/elasticsearch/issues/3484) as the ```_termvectors``` endpoint.
To use it, call
```
curl -XGET 'http://localhost:9200/index/type/_mtermvectors' -d '{
    "fields": [
        "field1",
        "field2",
        ...
    ],
    "ids": [
        "docId1",
        "docId2",
        ...
    ],
    "offsets": false|true,
    "payloads": false|true,
    "positions": false|true,
    "term_statistics": false|true,
    "field_statistics": false|true
}'
```
The return format is an array, each entry of which contains the term vector response for one document:
```
{
    "docs": [
        {
            "_index": "index",
            "_type": "type",
            "_id": "docId1",
            "_version": 1,
            "exists": true,
            "term_vectors": {
                ...
            }
        },
        {
            "_index": "index",
            "_type": "type",
            "_id": "docId2",
            "_version": 1,
            "exists": true,
            "term_vectors": {
                ...
            }
        }
    ]
}
```
Note that, like the term vectors API, the multi term vectors request will silently skip over documents that have no term vectors stored in the index and will simply return an empty response in this case.
Closes #3536
Currently we run unit tests with clusters running in the background
that can potentially spawn threads, causing the thread leak control
to fire off in tests that don't use the test cluster. This commit
introduces some base classes for that purpose, shadowing lucene test
framework classes and adding the appropriate ThreadScope.
Always multiply the query score into the function score. For script score
functions, this means that boost_mode has to be set to `plain` if
`function_score` should behave like `custom_score`.
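A sketch of what that looks like (the `popularity` field is hypothetical, and the single-function `script_score` shorthand is assumed):
```
curl -XGET 'http://localhost:9200/index/_search' -d '{
    "query": {
        "function_score": {
            "query": { "match_all": {} },
            "script_score": {
                "script": "_score * doc[\"popularity\"].value"
            },
            "boost_mode": "plain"
        }
    }
}'
```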
This commit fixes inconsistencies in `function_score` and `filters_function_score`
when using scripts; see issue #3464.
The method 'ScoreFunction.factor(docId)' is removed completely, since the name
suggests that this method actually computes a factor, which was not the case.
Multiplying the computed score is now handled by 'FiltersFunctionScoreQuery'
and 'FunctionScoreQuery' and not implicitly performed in
'ScoreFunction.factor(docId, subQueryScore)' as was the case for 'BoostScoreFunction'
and 'DecayScoreFunctions'.
This commit also fixes the explain output for FiltersFunctionScoreQuery. Here,
the influence of the maxBoost was never printed. Furthermore, the queryBoost was
printed as being multiplied into the filter score.
Closes #3464
There is no need to write the pidfile in the bin/elasticsearch shell script
as this happens already in the java code.
Also cleaning up the bin/elasticsearch shell script a bit (no need to return
an error code when exec is called, as this forks and exits the shell script
immediately).
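For context, the pid file is requested via the startup script and written by the JVM process; a typical invocation might look like this (the path is illustrative, and the `-d`/`-p` options are assumed to be available in this version):
```
./bin/elasticsearch -d -p /var/run/elasticsearch/es.pid
```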
Closes #3529
Closes #1745
This looks like a copy/paste issue where onCached was being called
rather than onRemoval. This should fix the ID cache stats not being
correct after a call to /_cache/clear?id_cache=true
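A quick way to verify the fix (the stats flag name is assumed to match this version's indices stats API):
```
curl -XPOST 'http://localhost:9200/_cache/clear?id_cache=true'
curl -XGET 'http://localhost:9200/_stats?id_cache=true'
```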