DfsPhase captures terms used for scoring a query in order to build global term statistics across
multiple shards for more accurate scoring. It currently does this by building the query's `Weight`
and calling `extractTerms` on it to collect terms, and then calling `IndexSearcher.termStatistics()`
for each collected term. This duplicates work, however, as the various `Weight` implementations
will already have collected these statistics at construction time.
This commit replaces this round-about way of collecting stats, instead using a delegating
IndexSearcher that collects the term contexts and statistics when `IndexSearcher.termStatistics()`
is called from the Weight.
It also fixes a bug when using rescorers, where a `QueryRescorer` would calculate distributed term
statistics, but ignore field statistics. `Rescorer.extractTerms` has been removed, and replaced with
a new method on `RescoreContext` that returns any queries used by the rescore implementation.
The delegating IndexSearcher then collects term contexts and statistics in the same way described
above for each Query.
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).
Relates #33028
The contains syntax was added in #30874 but the skips were not properly
put in place.
The java runner has the feature so the tests will run as part of the
build, but language clients will be able to support it at their own
pace.
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Add tests for build-tools to make sure example plugins build stand-alone using it.
This will catch issues such as referencing files from the buildSrc directly, breaking external uses of build-tools.
* remove left-over comment
* make sure of the property for plugins
* skip installing modules if these exist in the distribution
* Log the distrbution being ran
* Don't allow running with integ-tests-zip passed externally
* top level x-pack/qa can't run with oss distro
* Add support for matching objects in lists
Makes it possible to have a key that points to a list and assert that a
certain object is present in the list. All keys have to be present and
values have to match. The objects in the source list may have additional
fields.
example:
```
match: { 'nodes.$master.plugins': { name: ingest-attachment } }
```
* Update plugin and module tests to work with other distributions
Some of the tests expected that the integration tests will always be ran
with the `integ-test-zip` distribution so that there will be no other
plugins loaded.
With this change, we check for the presence of the plugin without
assuming exclusivity.
* Allow modules to run on other distros as well
To match the behavior of tets.distributions
* Add and use a new `contains` assertion
Replaces the previus changes that caused `match` to do a partial match.
* Implement PR review comments
This allows plugins to plug rescore implementations into
Elasticsearch. While this is a fairly expert thing to do I've
done my best to point folks to the QueryRescorer as one that at
least documents the tradeoffs that it makes. I've attempted to
limit the API surface area by removing `SearchContext` from the
exposed interface, instead exposing just the IndexSearcher and
`QueryShardContext`. I also tried to make some of the class names
more consistent and do some general cleanup while I was there.
I entertained the notion of moving the `QueryRescorer` to module.
After all, it'd be a wonderful test to prove that you can plug
rescore implementation into Elasticsearch if the only built in
rescore implementation is in the module. But I decided against it
because the new module would require a client jar and it'd require
moving some more things around. I think if we really want to do
it, we should do it as a followup.
I did, on the other hand, create an "example" rescore plugin which
should both be a nice example for anyone wanting to plug in their
own rescore implementation and servers as a good integration test
to make sure that you can indeed plug one in.
Closes#26208