PrecisionAtN and ReciprocalRank are binary evaluation metrics by default that
only distiguish between relevant/irrelevant search results. So far we assumed
that relevant documents are labaled with 1 (irrelevant docs with 0) in the
evaluation request, but this is cumbersome if the ratings are provided on a
larger integer scale and would need to get mapped to a 0/1 value.
This change introduces a threshold parameter on the PrecisionAtN and
ReciprocalRank metric than can be used to set the threshold from which on a
document is considered "relevant". It defaults to 1, so in case of 0/1 ratings
the threshold doesn't have to be set and only ratings with value 0 are
considered to be irrelevant.
This adds roundtrip testing to RankEvalSpec, fixes issues
introduced with the previous roundtrip tests, splits
xcontent generation/parsing from actually checking the resulting
objects to deal with e.g. all evaluation metrics needing
some extra treatment.
Renames QuerySpec to RatedRequest, renames newly introduced xcontent
generation helper to conform with naming conventions.
Fixes several lines that were too long, adds missing types where
needed.
This factors the roundtripping out of RatedDocumentTests. Makes
RankedListQualityMetric and RatedDocument implement FromXContenBuilder
to be able to do the aforementioned refactoring in a generic way. Adds
a roundtrip test to DiscountedCumulativeGainAt.
Open questions:
DiscountedCumulativeGain didn't have a constructor that accepted all possible
parameters as arguments. Added one. I guess we still want to keep the one
that only requires the position argument?
To make roundtripping work I had to change the NAME parameter when generating
XContent for DiscountedCumulativeGainAt - all remaining unit tests seem to be
passing (haven't checked the REST tests yet) - need to figure out why that was
there to begin with.
For the two current metrics Prec@ and reciprocal rank we currently average the
partial results in the transport action. If other metric later need a different
behaviour or want to parametrize this, this operation should be part of the
metric itself, so this change moves it there. Also removing on of the two test
packages, main code is also in one package only.
This adds a second query evaluation metric alongside precision_at. Reciprocal
Rank is defined as 1/rank, where rank is the position of the first relevant
document in the search result. The results are averaged across all queries
across the sample of queries, according to
https://en.wikipedia.org/wiki/Mean_reciprocal_rank