This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.
The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
* Remove empty statements
There are a couple of instances of undocumented empty statements all across the
code base. While they are mostly harmless, they make the code hard to read and
are potentially error-prone. Removing most of these instances and marking blocks
that look empty by intention as such.
* Change test, slightly more verbose but less confusing
Today if a wildcard, date-math expression or alias expands/resolves
to an index that is search-throttled we still search it. This is likely
not the desired behavior since it can unexpectedly slow down searches
significantly.
This change adds a new indices option that allows `search`, `count`
and `msearch` to ignore throttled indices by default. Users can
force expansion to throttled indices by using `ignore_throttled=true`
on the rest request to expand also to throttled indices.
Relates to #34352
With this commit we differentiate between permanent circuit breaking
exceptions (which require intervention from an operator and should not
be automatically retried) and transient ones (which may heal themselves
eventually and should be retried). Furthermore, the parent circuit
breaker will categorize a circuit breaking exception as either transient
or permanent based on the categorization of memory usage of its child
circuit breakers.
Closes#31986
Relates #34460
Optionals containing boxed primitive types are prohibitively costly because they
have two level of boxing. For Optional<Integer> the analogous OptionalInt can be
used to avoid the boxing of the contained int value.
This adds the ERR metric to the provided xContent parsers in the module and the
high level rest client registry. Also adding integration tests to make sure the
metric is correctly registered and usable from the client.
The notion of "quality" is an overloaded term in the search ranking evaluation
context. Its usually used to decribe certain levels of "good" vs. "bad" of a
seach result with respect to the users information need. We currently report the
result of the ranking evaluation as `quality_level` which is a bit missleading.
This changes the response parameter name to `metric_score` which fits better.
Currently the ranking evaluation response contains a 'unknown_docs' section
for each search use case in the evaluation set. It contains document ids for
results in the search hits that currently don't have a quality rating.
This change renames it to `unrated_docs`, which better reflects its purpose.
This change adds Expected Reciprocal Rank (ERR) as a ranking evaluation metric
as descriped in:
Chapelle, O., Metlzer, D., Zhang, Y., & Grinspan, P. (2009).
Expected reciprocal rank for graded relevance.
Proceeding of the 18th ACM Conference on Information and Knowledge Management.
https://doi.org/10.1145/1645953.1646033
ERR is an extension of the classical reciprocal rank to the graded relevance
case and assumes a cascade browsing model. It quantifies the usefulness of a
document at rank `i` conditioned on the degree of relevance of the items at ranks
less than `i`. ERR seems to be gain traction as an alternative to (n)DCG, so it
seems like a good metric to support. Also ERR seems to be the default optimization
metric used for training in RankLib, a widely used learning to rank library.
Relates to #29653
TransportAction currently contains 2 doExecute methods, one which takes
a the task, and one that does not. The latter is what some subclasses
implement, while the first one just calls the latter, dropping the given
task. This commit combines these methods, in favor of just always
assuming a task is present.
Most transport actions don't need the node ThreadPool. This commit
removes the ThreadPool as a super constructor parameter for
TransportAction. The actions that do need the thread pool then have a
member added to keep it from their own constructor.
Most transport actions don't need to resolve index names. This commit
removes the index name resolver as a super constructor parameter for
TransportAction. The actions that do need the resolver then have a
member added to keep the resolver from their own constructor.
Since #30966, Action no longer has anything but a call to the
GenericAction super constructor. This commit renames GenericAction
into Action, thus eliminating the Action class. Additionally, this
commit removes the Request generic parameter of the class, since
it was unused.
This change moves tests in `smoke-test-rank-eval-with-mustache` into the main
ranking evaluation module by declaring that the integration testing cluster
requires the `lang-mustache` plugin. This avoids having to maintain the qa
project for only one basic test suite.
While the other two ranking evaluation metrics (precicion and reciprocal rank)
already provide a more detailed output for how their score is calculated, the
discounted cumulative gain metric (dcg) and its normalized variant are lacking
this until now. Its not really clear which level of detail might be useful for
debugging and understanding the final metric calculation, but this change adds a
`metric_details` section to REST output that contains some information about the
evaluation details.
ObjectParser should throw XContentParseExceptions, not IAE. A dedicated parsing
exception can includes the place where the error occurred.
Closes#30605
This commit removes the RequestBuilder generic type from Action. It was
needed to be used by the newRequest method, which in turn was used by
client.prepareExecute. Both of these methods are now removed, along with
the existing users of prepareExecute constructing the appropriate
builder directly.
Currently the ranking evaluation API accepts the full query syntax for
the queries specified in the evaluation set and executes them via multi
search. This potentially runs costly aggregations and suggestions too.
This change adds checks that forbid using aggregations, suggesters,
highlighters and the explain and profile options in the queries that are
run as part of the ranking evaluation since they are irrelevent in the
context of this API.
This *mostly* silences `javadoc`'s warning about defaulting to
generating html4 files by enabling generating html5 file for the
projects for which that works. It didn't work in a half dozen projects,
about half of which I've fixed in this PR, entirely by replacing
`<tt>thing</tt>` with `{@code thing}`.
There are a few remaining projects that contain javadoc with invalid
html5. I'll fix those projects in a followup.
The ranking evaluation requests so far were not tested against aliases
but they should run regardless of the targeted index is a real index or
an alias. This change adds cases for this to the integration and rest
tests.
Allow high level java rest client to access details of the metric
calculation by making them accessible across packages. Also renaming the
inner `Breakdown` classes of the evaluation metrics to `Detail` to
better communicate their use.
This change validates that the `_search` request does not have trailing
tokens after the main object and fails the request with a parsing exception otherwise.
Closes#28995
* Move ObjectParser into the x-content lib
This moves `ObjectParser`, `AbstractObjectParser`, and
`ConstructingObjectParser` into the libs/x-content dependency. This decoupling
allows them to be used for parsing for projects that don't want to depend on the
entire Elasticsearch jar.
Relates to #28504
Currently the ranking evaluation API doesn't support many of the
standard parameters of the search API. Some of these make sense, like
adding support for the common indices options parameters, which this
change adds.
Fixes and edge case where DiscountedCumulativeGain can return NaN as
result of the quality metric calculation. This can happen when the
search result set is empty and normalization is used. We should return 0
in this case. Also adding related unit tests to the other two metrics.
Currently we store the indices specified in the request URL together with all
the other ranking evaluation specification in RankEvalSpec. This is not ideal
since e.g. the indices are not rendered to xContent and so cannot be parsed
back. Instead we should keep them in RankEvalRequest.
* Decouple XContentBuilder from BytesReference
This commit removes all mentions of `BytesReference` from `XContentBuilder`.
This is needed so that we can completely decouple the XContent code and move it
into its own dependency.
While this change appears large, it is due to two main changes, moving
`.bytes()` and `.string()` out of XContentBuilder itself into static methods
`BytesReference.bytes` and `Strings.toString` respectively. The rest of the
change is code reacting to these changes (the majority of it in tests).
Relates to #28504
Parsing of a ranking evaluation request and its subcomponents should throw parsing
errors on unknown fields. This change adds tests for this and changes the parser
behaviour in cases where it is needed.
* Move to non-deprecated XContentHelper.createParser(...)
This moves away from one of the now-deprecated XContentHelper.createParser
methods in favor of specifying the deprecation logger at parser creation time.
Relates to #28449
Note that this doesn't move all the `createParser` calls because some of them
use the already-deprecated method that doesn't specify the XContentType.
* Remove the deprecated (and now non-needed) createParser method
This change adds support for the new ranking evaluation API to the High Level Rest Client.
This mostly means adding support for parsing the various response objects back from the
REST representation. It includes one change to the response syntax where previously we didn't
print the type of the metric details section but we now need it to pick the right parser to
parse this section back.
Closes#28198
Currenty the rest response of the ranking evaluation API wraps all inside an
enclosing `rank_eval` object. This is redundant since it is clear from the API
call and it doesn't provide any other useful information. This change removes
this.
Removing the unnecessary RankEvalTestHelper, making use of the common test infra
in ESTestCase, also hardening a few of the classes by making more fields final.
Problem: So far all rank eval requests are being executed in parallel. If there
are more than the search thread pool can handle, or if there are other search
requests executed in parallel rank eval can fail.
Solution: Make number of max_concurrent_searches configurable.
Name of configuration parameter is analogous to msearch. Default
max_concurrent_searches set to 10: Rank_eval isn't particularly time critical so
trying to avoid being more clever than probably needed here. Can set this value
through the API to a higher value anytime.
Fixes#21403
Problem: We introduced the ability to shorten the rank eval request by using a
template in #20231. When playing with the API it turned out that there might be
use cases where - e.g. due to various heuristics - folks might want to translate
the original user query into more than just one type of Elasticsearch query.
Solution: Give each template an id that can later be referenced in the
actual requests.
Closes#21257
Add checks to RankEvalSpec to safe guard against missing parameters.
Fail early in case no metric is supplied, no rated requests are supplied or the search source builder is missing but no template is supplied neither.
Add stricter checks around rank eval request parsing: Fail if in a rated request we see both, a verbatim request as well as request
template parameters.
Relates to #21260
Adds tests around serialisation/validation checks for rank evaluation request components
* Add null/ empty string checks to RatedDocument constructor
* Add mutation test to RatedDocument serialization tests.
* Reorganise rank-eval RatedDocument tests and add serialisation test.
* Add roundtrip serialisation testing for RatedRequests
* Adds serialisation testing and equals/hashcode testing for RatedRequest.
* Fixes a bug in previous equals implementation of RatedRequest along the way.
* Add roundtrip tests for Precision and ReciprocalRank
* Also fixes a bug with serialising ReciprocalRank.
* Add roundtrip testing for DiscountedCumulativeGain
* Add serialisation test for DocumentKey and fix test init
* Add check that relevant doc threshold is always positive for precision.
* Check that relevant threshold is always positive for precision and reciprocal
rank
Closes#21401
Move rank-eval template compilation down to TransportRankEvalAction
Closes#21777 and #21465
Search templates for rank_eval endpoint so far only worked when sent through
REST end point
However we also allow templates to be set through a Java API call to
"setTemplate" on that same spec. This doesn't go through template execution so
fails further down the line.
To make this work, moved template execution further down, probably to
TransportRankEvalAction.
* Add template IT test for Java API
* Move template compilation to TransportRankEvalAction
Currently we fail the whole ranking evaluation request when we receive an
exception for any of the search requests. We should collect those errors and
report them back to the user in the rest response. This change adds collecting
the errors and propagating them back via the RankEvalResponse.
Closes#19889
Our current default behaviour to ignore unrated documents when calculating the
precision seems a bit counter intuitive. Instead we should treat those documents
as "irrelevant" by default and provide an optional parameter to ignore those
documents if that is the behaviour the user wants.
There's a currently unhandled edge case for the precion@ metric. When none of
the search hits in the result are rated, we have neither true nor false
positives which currently leads to division by zero. We should return a precion
of 0.0 in this case.
When multiple ratings for the same document (identified by _index, _type,
_id) are specified in the request we should throw an error. This change adds a
check for this in the RatedRequest setter (and ctor that uses that setter).
Closes#20997
This adds support for templating in rank eval requests.
Relates to #20231
Problem: In it's current state the rank-eval request API forces the user to repeat complete queries for each test request. In most use cases the structure of the query to test will be stable with only parameters changing across requests, so this looks like lots of boilerplate json for something that could be expressed in a more concise way.
Uses templating/ ScriptServices to enable users to submit only one test request template and let them only specify template parameters on a per test request basis.