This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:
Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033
ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank `i` conditioned on the degree of relevance of the items at ranks
less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.
Relates to #29653
This change moves tests in `smoke-test-rank-eval-with-mustache` into the main
ranking evaluation module by declaring that the integration testing cluster
requires the `lang-mustache` plugin. This avoids having to maintain the qa
project for only one basic test suite.
While the other two ranking evaluation metrics (precicion and reciprocal rank)
already provide a more detailed output for how their score is calculated, the
discounted cumulative gain metric (dcg) and its normalized variant are lacking
this until now. Its not really clear which level of detail might be useful for
debugging and understanding the final metric calculation, but this change adds a
`metric_details` section to REST output that contains some information about the
evaluation details.
ObjectParser should throw XContentParseExceptions, not IAE. A dedicated parsing
exception can includes the place where the error occurred.
Closes#30605
Currently the ranking evaluation API accepts the full query syntax for
the queries specified in the evaluation set and executes them via multi
search. This potentially runs costly aggregations and suggestions too.
This change adds checks that forbid using aggregations, suggesters,
highlighters and the explain and profile options in the queries that are
run as part of the ranking evaluation since they are irrelevent in the
context of this API.
The ranking evaluation requests so far were not tested against aliases
but they should run regardless of the targeted index is a real index or
an alias. This change adds cases for this to the integration and rest
tests.
Allow high level java rest client to access details of the metric
calculation by making them accessible across packages. Also renaming the
inner `Breakdown` classes of the evaluation metrics to `Detail` to
better communicate their use.
* Move ObjectParser into the x-content lib
This moves `ObjectParser`, `AbstractObjectParser`, and
`ConstructingObjectParser` into the libs/x-content dependency. This decoupling
allows them to be used for parsing for projects that don't want to depend on the
entire Elasticsearch jar.
Relates to #28504
Currently the ranking evaluation API doesn't support many of the
standard parameters of the search API. Some of these make sense, like
adding support for the common indices options parameters, which this
change adds.
Fixes and edge case where DiscountedCumulativeGain can return NaN as
result of the quality metric calculation. This can happen when the
search result set is empty and normalization is used. We should return 0
in this case. Also adding related unit tests to the other two metrics.
Currently we store the indices specified in the request URL together with all
the other ranking evaluation specification in RankEvalSpec. This is not ideal
since e.g. the indices are not rendered to xContent and so cannot be parsed
back. Instead we should keep them in RankEvalRequest.
* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
Parsing of a ranking evaluation request and its subcomponents should throw parsing
errors on unknown fields. This change adds tests for this and changes the parser
behaviour in cases where it is needed.
This change adds support for the new ranking evaluation API to the High Level Rest Client.
This mostly means adding support for parsing the various response objects back from the
REST representation. It includes one change to the response syntax where previously we didn't
print the type of the metric details section but we now need it to pick the right parser to
parse this section back.
Closes#28198
Currenty the rest response of the ranking evaluation API wraps all inside an
enclosing `rank_eval` object. This is redundant since it is clear from the API
call and it doesn't provide any other useful information. This change removes
this.
Removing the unnecessary RankEvalTestHelper, making use of the common test infra
in ESTestCase, also hardening a few of the classes by making more fields final.
Problem: So far all rank eval requests are being executed in parallel. If there
are more than the search thread pool can handle, or if there are other search
requests executed in parallel rank eval can fail.
Solution: Make number of max_concurrent_searches configurable.
Name of configuration parameter is analogous to msearch. Default
max_concurrent_searches set to 10: Rank_eval isn't particularly time critical so
trying to avoid being more clever than probably needed here. Can set this value
through the API to a higher value anytime.
Fixes#21403