Add ERR to ranking evaluation documentation (#32314)

This change adds a section about the Expected Reciprocal Rank metric (ERR) to
the Ranking Evaluation documentation.
Christoph Büscher 2018-07-24 19:58:34 +02:00 committed by GitHub
parent 387c3c7f1d
commit c1cc0cef61
1 changed file with 50 additions and 0 deletions

@@ -259,6 +259,56 @@ in the query. Defaults to 10.
|`normalize` | If set to `true`, this metric will calculate the https://en.wikipedia.org/wiki/Discounted_cumulative_gain#Normalized_DCG[Normalized DCG].
|=======================================================================

[float]
==== Expected Reciprocal Rank (ERR)

Expected Reciprocal Rank (ERR) is an extension of the classical reciprocal rank to the graded relevance case
(Olivier Chapelle, Donald Metzler, Ya Zhang, and Pierre Grinspan. 2009.
http://olivier.chapelle.cc/pub/err.pdf[Expected reciprocal rank for graded relevance].)

It is based on the assumption of a cascade model of search, in which a user scans through ranked search
results in order and stops at the first document that satisfies the information need. For this reason, it
is a good metric for question answering and navigational queries, but less so for survey-oriented information
needs where the user is interested in finding many relevant documents in the top k results.

The metric models the expectation of the reciprocal of the position at which a user stops reading through
the result list. This means that a relevant document in the top ranking positions will contribute much to the
overall score. However, the same document will contribute much less to the score if it appears at a lower rank,
even more so if some relevant (but maybe less relevant) documents precede it.
In this way, the ERR metric discounts documents that are shown after very relevant documents. This introduces
a notion of dependency in the ordering of relevant documents that metrics like Precision or DCG do not account for.
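
Concretely, the linked paper first maps each relevance grade `g` to a probability that the document satisfies
the user, and then defines ERR as the expected reciprocal of the rank at which the user stops. A sketch of the
definition, where `g_max` corresponds to the `maximum_relevance` parameter described below:

[source,latex]
--------------------------------
R_i = \frac{2^{g_i} - 1}{2^{g_{\max}}}
\qquad
\mathrm{ERR} = \sum_{r=1}^{k} \frac{1}{r} \, R_r \prod_{i=1}^{r-1} (1 - R_i)
--------------------------------
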
[source,js]
--------------------------------
GET /twitter/_rank_eval
{
"requests": [
{
"id": "JFK query",
"request": { "query": { "match_all": {}}},
"ratings": []
}],
"metric": {
"expected_reciprocal_rank": {
"maximum_relevance" : 3,
"k" : 20
}
}
}
--------------------------------
// CONSOLE
// TEST[setup:twitter]

The `expected_reciprocal_rank` metric takes the following parameters:

[cols="<,<",options="header",]
|=======================================================================
|Parameter |Description
|`maximum_relevance` | Mandatory parameter. The highest relevance grade used in the
user-supplied relevance judgments.
|`k` | Sets the maximum number of documents retrieved per query. This value will act in place of the usual `size` parameter
in the query. Defaults to 10.
|=======================================================================
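
To make the definition above concrete, here is a small, self-contained Python sketch (not part of Elasticsearch;
the grades below are made-up example judgments) that computes ERR the way the paper defines it, with
`max_relevance` playing the role of the `maximum_relevance` parameter and `k` truncating the result list:

[source,python]
--------------------------------
# Illustrative ERR computation following Chapelle et al. (2009).
# Not Elasticsearch code; the grades are made-up example judgments.

def err(grades, max_relevance, k=10):
    """Expected Reciprocal Rank for a ranked list of relevance grades."""
    score = 0.0
    p_reached = 1.0  # probability that the user reads down to this rank
    for rank, grade in enumerate(grades[:k], start=1):
        # Map the grade to a probability of relevance.
        relevance = (2 ** grade - 1) / 2 ** max_relevance
        score += p_reached * relevance / rank
        p_reached *= 1.0 - relevance
    return score

# A ranking that puts the highest grade first scores close to 1 ...
print(err([3, 2, 0, 1], max_relevance=3))  # ~0.90
# ... while the same documents in reverse order score much lower.
print(err([0, 1, 2, 3], max_relevance=3))  # ~0.29
--------------------------------
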
[float]
=== Response format