Add ERR to ranking evaluation documentation (#32314)
This change adds a section about the Expected Reciprocal Rank metric (ERR) to the Ranking Evaluation documentation.
This commit is contained in:
parent
387c3c7f1d
commit
c1cc0cef61
@@ -259,6 +259,56 @@ in the query. Defaults to 10.
|`normalize` | If set to `true`, this metric will calculate the https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG[Normalized DCG].
|=======================================================================

[float]
==== Expected Reciprocal Rank (ERR)

Expected Reciprocal Rank (ERR) is an extension of the classical reciprocal rank for the graded relevance case
(Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009. http://olivier.chapelle.cc/pub/err.pdf[Expected reciprocal rank for graded relevance].)

It is based on the assumption of a cascade model of search, in which a user scans through ranked search
results in order and stops at the first document that satisfies the information need. For this reason, it
is a good metric for question answering and navigational queries, but less so for survey-oriented information
needs where the user is interested in finding many relevant documents in the top k results.

The metric models the expectation of the reciprocal of the position at which a user stops reading through
the result list. This means that a relevant document in a top ranking position contributes a lot to the
overall score. However, the same document contributes much less to the score if it appears at a lower rank,
even more so if there are some relevant (but maybe less relevant) documents preceding it.
In this way, the ERR metric discounts documents that are shown after very relevant documents. This introduces
a notion of dependency in the ordering of relevant documents that metrics like Precision or DCG don't account for.
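
For reference, the Chapelle et al. paper linked above defines ERR as exactly this expected reciprocal stopping rank. A sketch of that definition, where `g_i` is the relevance grade of the document at rank `i` and `g_max` is the highest possible grade (which the `maximum_relevance` parameter below is assumed to correspond to):

[source,latex]
--------------------------------
\mathrm{ERR} = \sum_{r=1}^{k} \frac{1}{r} \, R_r \prod_{i=1}^{r-1} (1 - R_i),
\qquad
R_i = \frac{2^{g_i} - 1}{2^{g_{\max}}}
--------------------------------

Here `R_i` is the probability that the document at rank `i` satisfies the user, so each rank's contribution is weighted by the probability that the user reached that rank without having stopped earlier.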

[source,js]
--------------------------------
GET /twitter/_rank_eval
{
    "requests": [
    {
        "id": "JFK query",
        "request": { "query": { "match_all": {}}},
        "ratings": []
    }],
    "metric": {
        "expected_reciprocal_rank": {
            "maximum_relevance": 3,
            "k": 20
        }
    }
}
--------------------------------
// CONSOLE
// TEST[setup:twitter]

The `expected_reciprocal_rank` metric takes the following parameters:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`maximum_relevance` | Mandatory parameter. The highest relevance grade used in the user-supplied
relevance judgments.
|`k` | Sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
in the query. Defaults to 10.
|=======================================================================

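To make the discounting behaviour concrete, here is a minimal standalone sketch (plain Python, written for this illustration and not part of Elasticsearch) that computes ERR for a toy ranked list of relevance grades, following the definition sketched above:

[source,python]
--------------------------------
def err(grades, max_relevance, k=10):
    """Expected Reciprocal Rank for the top-k results of one ranked list.

    grades: relevance grade g_i of the document at rank i (1-based).
    max_relevance: highest grade in the judgments, as in `maximum_relevance`.
    """
    score = 0.0
    p_reach = 1.0  # probability the user is still reading at this rank
    for rank, grade in enumerate(grades[:k], start=1):
        # Probability that this document satisfies the user.
        r = (2 ** grade - 1) / 2 ** max_relevance
        score += p_reach * r / rank
        p_reach *= 1 - r
    return score

# A highly relevant document at rank 1 dominates the score ...
print(err([3, 2, 0, 1], max_relevance=3))  # ~0.901
# ... but contributes far less at rank 3, behind two less relevant documents.
print(err([1, 0, 3, 2], max_relevance=3))  # ~0.390
--------------------------------

Note how moving the grade-3 document from rank 1 to rank 3 cuts the score by more than half: the user has some chance of stopping at the weakly relevant rank-1 document first, and the reciprocal-rank weight at rank 3 is only one third.
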
[float]
=== Response format