This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.
Relates to #48665.
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
This commit introduces a new `geo` module that is intended
to be contain all the geo-spatial-specific features in server.
As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.
Co-authored-by: Nicholas Knize <nknize@gmail.com>
Some field name constants were not updaten when we moved from "string" to "text"
and "keyword" fields. Renaming them makes it easier and faster to know which
field type is used in test subclassing this base test case.
Refactor SearchHit to have separate document and meta fields.
This is a part of bigger refactoring of issue #24422 to remove
dependency on MapperService to check if a field is metafield.
Relates to PR: #38373
Relates to issue #24422
Co-authored-by: sandmannn <bohdanpukalskyi@gmail.com>
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
The highlighting phase for percolator queries currently uses some custom query
traversal logic to find all instances of PercolatorQuery in the query tree for the
current search context. This commit converts things to instead use a QueryVisitor,
which future-proofs us against new wrapper queries or queries from custom
plugins that the percolator module doesn't know about.
Lucene 8.5.0 release candidates are imminent. This commit upgrades master to use
the latest snapshot to check that there are no last-minute bugs or regressions.
ObjectParser allows you to declare a set of required fields, such that at least one
of the set must appear in an xcontent object for it to be valid. This commit adds
the similar concept of a set of exclusive fields, such that at most one of the set
must be present. It also enables required fields on ConstructingObjectParser, and
re-implements PercolateQueryBuilder.fromXContent() to use object parsing as
an example of how this works.
Generalize how queries on `_index` are handled at rewrite time (#52486)
Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.
This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.
Closes#49254
Commit #52748 fixed a bug where percolate queries wrapped in a constant score
could report incorrect matches. This commit adds a test to check that it also fixes
the case where a percolate query is sorted by something other than score.
Closes#52618
Lucene's RAMDirectory has been deprecated. This commit replaces all uses of
RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses
are in tests, but the percolator and painless executor may get some small speedups.
Currently, date ranges queries using NOW-based date math are rewritten to
MatchAllDocs queries when being preprocessed for the percolator. However,
since we added the verification step, this can result in incorrect matches when
percolator queries are run without scores. This commit changes things to instead
wrap date queries that use NOW with a new DateRangeIncludingNowQuery.
This is a simple wrapper query that returns its delegate at rewrite time, but it can
be detected by the percolator QueryAnalyzer and be dealt with accordingly.
This also allows us to remove a method on QueryRewriteContext, and push all
logic relating to NOW-based ranges into the DateFieldMapper.
Fixes#52617
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
With the rewrite of the percolator's QueryAnalyzer to use lucene's QueryVisitor API,
term queries that are direct children of a boolean query are handled separately from
other children. This works fine for conjunctions, but for disjunctions we need to
treat the extracted terms from these direct descendents along with extractions from
more deeply nested children to ensure that minimum-should-match requirements
are met correctly.
This commit changes the logic in QueryAnalyzer#getResult() to bundle child term
results with all other results before handling them.
Fixes#50305
When the query analyzer examines a conjunction containing both terms and ranges,
it should only include ranges in the minimum_should_match calculation if there are no
other range queries on that same field within the conjunction. This is because we cannot
build a selection query over disjoint ranges on the same field, and it is not easy to check
if two range queries have an overlap.
The current logic to calculate this just sets minimum_should_match to 1 or 0, dependent
on whether or not the current range is over a field that has already been seen. However, this
can be incorrect in the case that there are terms in the same match group which adjust the
minimum_should_match downwards. Instead, the logic should be changed to match the
terms extraction, whereby we adjust minimum_should_match downwards if we have already
seen a range field.
Fixes#49684
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
Lucene now allows us to explore the structure of a query using QueryVisitors,
delegating the knowledge of how to recurse through and collect terms to the
query implementations themselves. The percolator currently has a home-grown
external version of this API to construct sets of matching terms that must be
present in a document in order for it to possibly match the query.
This commit removes the home-grown implementation in favour of one using
QueryVisitor. This has the added benefit of making interval queries available
for percolator pre-filtering. Due to a bug in multi-term intervals (LUCENE-9050)
it also includes a clone of some of the lucene intervals logic, that can be removed
once upstream has been fixed.
Closes#45639
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
Today built-in highlighter and plugins have access to the SearchContext through the
highlighter context. However most of the information exposed in the SearchContext are not needed and a QueryShardContext
would be enough to perform highlighting. This change replaces the SearchContext by the informations that are absolutely
required by highlighter: a QueryShardContext and the SearchContextHighlight. This change allows to reduce the exposure of the
complex SearchContext and remove the needs to clone it in the percolator sub phase.
Relates #47198
Relates #46523
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.
Relates #46523
Currently we allow `_field_names` fields to be disabled explicitely, but since
the overhead is negligible now we decided to keep it turned on by default and
deprecate the `enable` option on the field type. This change adds a deprecation
warning whenever this setting is used, going forward we want to ignore and finally
remove it.
Closes#27239
This change adds an IndexSearcher and the node's BigArrays in the QueryShardContext.
It's a spin off of #46527 as this change is required to allow aggregation builder to solely use the
query shard context.
Relates #46523
Backport of 1a0dddf4ad24b3f2c751a1fe0e024fdbf8754f94 (AKA #445395)
* Add support for a Range field ValuesSource, including decode logic for range doc values and exposing RangeType as a first class enum
* Provide hooks in ValuesSourceConfig for aggregations to control ValuesSource class selection on missing & script values
* Branch aggregator creation in Histogram and DateHistogram based on ValuesSource class, to enable specialization based on type. This is similar to how Terms aggregator works.
* Prioritize field type when available for selecting the ValuesSource class type to use for an aggregation
This commit changes the way we manage refreshes in the index engines.
Instead of relying on a SearcherManager, this change uses a ReaderManager that
creates ElasticsearchDirectoryReader when needed. Searchers are now created on-demand
(when acquireSearcher is called) from the current ElasticsearchDirectoryReader.
It also slightly changes the Engine.Searcher to extend IndexSearcher in order
to simplify the usage in the consumer.
Currently `AbstractQueryTestCase#testToQuery` checks the search context cachable
flag. This is a bit fragile due to the high randomization of query builders
performed by this general test. Also we might only rarely check the
"interesting" cases because they rarely get generated when fully randomizing the
query builder.
This change moved the general checks out ot #testToQuery and instead adds
dedicated cache tests for those query builders that exhibit something other than
the default behaviour.
Closes#43200
This commit makes creators of GetField split the fields into document fields and metadata fields. It is part of larger refactoring that aims to remove the calls to static methods of MapperService related to metadata fields, as discussed in #24422.
`document_type` is the type to use for parsing the document to percolate, which
is already optional and deprecated. However `percotale` queries also have the
ability to percolate existing documents, identified by an index, an id and a
type. This change makes the latter optional and deprecated.
Closes#39963
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.
Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.
Relates #36148
Relates #10708
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
Today when a percolator query contains a date range then the query
analyzer extracts that range, so that at search time the `percolate` query
can exclude percolator queries efficiently that are never going to match.
The problem is that if 'now' is used it is evaluated at index time.
So the idea is to rewrite date ranges with 'now' to a match all query,
so that the query analyzer can't extract it and the `percolate` query
is then able to evaluate 'now' at query time.
Stop passing `Settings` to `AbstractComponent`'s ctor. This allows us to
stop passing around `Settings` in a *ton* of places. While this change
touches many files, it touches them all in fairly small, mechanical
ways, doing a few things per file:
1. Drop the `super(settings);` line on everything that extends
`AbstractComponent`.
2. Drop the `settings` argument to the ctor if it is no longer used.
3. If the file doesn't use `logger` then drop `extends
AbstractComponent` from it.
4. Clean up all compilation failure caused by the `settings` removal
and drop any now unused `settings` isntances and method arguments.
I've intentionally *not* removed the `settings` argument from a few
files:
1. TransportAction
2. AbstractLifecycleComponent
3. BaseRestHandler
These files don't *need* `settings` either, but this change is large
enough as is.
Relates to #34488
After discussing on the team's FixItFriday, we concluded that
static final instance variables that are mutable should be lowercased.
Historically, DeprecationLogger was uppercased more frequently than lowercased.
This change adds the OneStatementPerLineCheck to our checkstyle precommit
checks. This rule restricts the number of statements per line to one. The
resoning behind this is that it is very difficult to read multiple statements on
one line. People seem to mostly use it in short lambdas and switch statements in
our code base, but just going through the changes already uncovered some actual
problems in randomization in test code, so I think its worth it.
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899