The percolator will add a `_percolator_document_slot` field to all percolator
hits to indicate with what document it has matched. This number matches with
the order in which the documents have been specified in the percolate query.
Also improved the support for multiple percolate queries in a search request.
This change exposes the duplicate removal option added in Lucene for the completion suggester
with a new option called `skip_duplicates` (defaults to false).
This commit also adapts the custom suggest collector to handle deduplication when multiple contexts match the input.
Closes#23364
The `index.percolator.map_unmapped_fields_as_text` is a more better name, because unmapped fields are mapped to a text field with default settings
and string is no longer a field type (it is either keyword or text).
The definition of development vs. production mode has evolved slightly
over time (with the introduction of single-node) discovery. This commit
clarifies the documentation to better account for this adjustment.
Relates #26460
Adding a check to QueryStringQueryBuilderTests that checks the override
behaviour of `quote_analyzer`, also adding documentation explaining the use of
this parameter in `query_string` query.
Closes#25417
The current script service has a script compilation limit for a one
minute window. This is set to a small default value of 15. Instead of
increasing that default value, this commit introduces a new setting
that allows to configure a rate per time unit, so that the script service can deal with bursts better.
The new setting is named `script.max_compilations_rate`,
requires a nonnegative number and a positive time value.
The default is `75/5m`, which is equivalent to the existing 15 per minute.
Multi-level Nested Sort with Filters
Allow multiple levels of nested sorting where each level can have it's own filter.
Backward compatible with previous single-level nested sort.
* Remove the _all metadata field
This change removes the `_all` metadata field. This field is deprecated in 6
and cannot be activated for indices created in 6 so it can be safely removed in
the next major version (e.g. 7).
587409e893 introduced a bug where an example of the format of a request which contained placeholder values was attempted to be tested. This change adds `NOTCONSOLE` to that snippet as the immediately following snippet tests a concrete example.
220212dd69 introduced a bug because the test substitution was looking for `otherhost` where the snippet contained `oldhost`. This change fixes the substitution
* Accept an array of field names and boosts in the index.query.default_field setting
This commit allows to define an array of field names and boosts for the index setting `index.query.default_field`.
The format is equivalent to the `fields` options of the full text search queries (e.g. field_name^boost).
This commit also makes this setting dynamically updatable.
Fixes#25946
* Deprecate global_ordinals_hash and global_ordinals_low_cardinality
This change deprecates the `global_ordinals_hash` and `global_ordinals_low_cardinality` and
makes the `global_ordinals` execution hint choose internally if global ords should be remapped or use the segment ord directly.
These hints are too sensitive and expert to be exposed and we should be able to take the right decision internally based on the agg tree.
Currently the `precision` parameter must be a precision level
in the range of [1,12]. In #5042 it was suggested also supporting
distance units like "1km" to automatically approcimate the needed
precision level. This change adds this support to the Rest API by
making use of GeoUtils#geoHashLevelsForPrecision.
Plain integer values without a unit are still treated as precision
levels like before. Distance values that are too small to be represented
by a precision level of 12 (values approx. less than 0.056m) are
rejected.
Closes#5042
There was some confusion about the fact that tokens emitted from a Pattern
Capture Token Filter are treated as synonyms when used to analyze a search
query. This commit adds an explanation to the note in the docs to emphasize this
behaviour.
Closes#25746
This change is a continuation of #25726 that aligns field expansions for the simple_query_string with the query_string and multi_match query.
The main changes are:
* For exact field name, the new behavior is to rewrite to a matchnodocs query when the field name is not found in the mapping.
* For partial field names (with * suffix), the expansion is done only on keyword, text, date, ip and number field types. Other field types are simply ignored.
* For all fields (*), the expansion is done on accepted field types only (see above) and metadata fields are also filtered.
The use_all_fields option is deprecated in this change and can be replaced by setting `*` in the fields parameter.
This commit also changes how text fields are analyzed. Previously the default search analyzer (or the provided analyzer) was used to analyze every text part
, ignoring the analyzer set on the field in the mapping. With this change, the field analyzer is used instead unless an analyzer has been forced in the parameter of the query.
Finally now that all full text queries can handle the special "*" expansion (`all_fields` mode), the `index.query.default_field` is now set to `*` for indices created in 6.
We use `:` for cross-cluster search (eg `cluster:index`), therefore, we should
not allow the ambiguity when allowing cluster or index names.
Relates to #23892
All of the snippets in our docs marked with `// TESTRESPONSE` are
checked against the response from Elasticsearch but, due to the
way they are implemented they are actually parsed as YAML instead
of JSON. Luckilly, all valid JSON is valid YAML! Unfurtunately
that means that invalid JSON has snuck into the exmples!
This adds a step during the build to parse them as JSON and fail
the build if they don't parse.
But no! It isn't quite that simple. The displayed text of some of
these responses looks like:
```
{
...
"aggregations": {
"range": {
"buckets": [
{
"to": 1.4436576E12,
"to_as_string": "10-2015",
"doc_count": 7,
"key": "*-10-2015"
},
{
"from": 1.4436576E12,
"from_as_string": "10-2015",
"doc_count": 0,
"key": "10-2015-*"
}
]
}
}
}
```
Note the `...` which isn't valid json but we like it anyway and want
it in the output. We use substitution rules to convert the `...`
into the response we expect. That yields a response that looks like:
```
{
"took": $body.took,"timed_out": false,"_shards": $body._shards,"hits": $body.hits,
"aggregations": {
"range": {
"buckets": [
{
"to": 1.4436576E12,
"to_as_string": "10-2015",
"doc_count": 7,
"key": "*-10-2015"
},
{
"from": 1.4436576E12,
"from_as_string": "10-2015",
"doc_count": 0,
"key": "10-2015-*"
}
]
}
}
}
```
That is what the tests consume but it isn't valid JSON! Oh no! We don't
want to go update all the substitution rules because that'd be huge and,
ultimately, wouldn't buy much. So we quote the `$body.took` bits before
parsing the JSON.
Note the responses that we use for the `_cat` APIs are all converted into
regexes and there is no expectation that they are valid JSON.
Closes#26233
* Migrate migration docs from 6.0 to 7.0
Since we only keep one version of migration docs and master is now on 7.0, we
should migrate these so breaking changes can be added in the right place.
* Remove release notes as well
They link to the migration guides, so they have to go.
* Add placeholder notes for 7.0 so doc build is happy
The names of two settings in the script security docs are incorrect,
referring to the prefix as "scripts" instead of "script". This commit
fixes this issue.
Relates #26236
In #26185 we made the description of `requests_per_second` sane
for reindex. This improves on the description by using some more
common vocabulary ("batch size", etc) and improving the formatting
of the example calculation so it stands out and doesn't require
scrolling.
An array of values is required because there is no default (or
reasonable way to set a default). But validation for values
only happens if it is actually set. If the values param is omitted
entirely than the agg builder will NPE.
The environment variable CONF_DIR was previously inconsistently used in
our packaging to customize the location of Elasticsearch configuration
files. The importance of this environment variable has increased
starting in 6.0.0 as it's now used consistently to ensure Elasticsearch
and all secondary scripts (e.g., elasticsearch-keystore) all use the
same configuration. The name CONF_DIR is there for legacy reasons yet
it's too generic. This commit renames CONF_DIR to ES_PATH_CONF.
Relates #26197
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them.
This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
The percolator field mapper doesn't need to extract all terms and ranges from a bool query with must or filter clauses.
In order to help to default extraction behavior, boost fields can be configured, so that fields that are known for not being
selective enough can be ignored in favor for other fields or clauses with specific fields can forcefully take precedence over other clauses.
This can help selecting clauses for fields that don't match with a lot of percolator queries over other clauses and thus improving performance of the percolate query.
For example a status like field is something that should configured as an ignore field.
Queries on this field tend to match with more documents and so if clauses for this fields
get selected as best clause then that isn't very helpful for the candidate query that the
percolate query generates to filter out percolator queries that are likely not going to match.