Provided some insights as to how More Like This works internally.

In the Google Groups forum there appears to be some confusion as to what mlt
does. This documentation update should help demystify this feature and
provide some understanding of how to use its parameters.

Closes #6092
Alex Ksikes 2014-05-08 12:21:18 +02:00
parent a972aaa7ae
commit 48b7172ee7
2 changed files with 16 additions and 1 deletions

@ -18,6 +18,19 @@ running it against one or more fields.
`more_like_this` can be shortened to `mlt`.
+Under the hood, `more_like_this` simply creates multiple `should` clauses in a
+`bool` query, one for each interesting term extracted from the provided text.
+The interesting terms are selected by their tf-idf scores, subject to the limits
+set by `min_term_freq`, `min_doc_freq`, and `max_doc_freq`. The number of
+interesting terms is controlled by `max_query_terms`, while the minimum number
+of clauses that must be satisfied is controlled by `percent_terms_to_match`.
+The terms are extracted from `like_text`, which is analyzed by the analyzer
+associated with the field unless another one is specified by `analyzer`. Other
+parameters, such as `min_word_length`, `max_word_length` or `stop_words`,
+further control which terms are considered interesting. To give more weight to
+more interesting terms, each boolean clause associated with a term can be
+boosted by the term's tf-idf score times a boosting factor, `boost_terms`.
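
The selection process described in this paragraph can be sketched roughly as follows. This is an illustrative Python model, not the Lucene or Elasticsearch implementation: the function names `interesting_terms` and `build_bool_query`, the whitespace tokenizer standing in for the field analyzer, the exact idf formula, and the default values are all assumptions made for the example. Only the parameter names come from the documentation above.

```python
import math
from collections import Counter


def interesting_terms(like_text, doc_freqs, num_docs, *, min_term_freq=2,
                      min_doc_freq=5, max_doc_freq=None, max_query_terms=25,
                      min_word_length=0, max_word_length=0, stop_words=()):
    """Rank the terms of like_text by tf-idf and keep the best ones.

    doc_freqs maps a term to the number of documents containing it
    (hypothetical input; a real index would supply this).
    """
    # Assumption: lowercase + whitespace split stands in for the analyzer.
    term_freqs = Counter(like_text.lower().split())
    scored = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        if tf < min_term_freq or df < min_doc_freq:
            continue
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        if len(term) < min_word_length:
            continue
        if max_word_length and len(term) > max_word_length:
            continue
        if term in stop_words:
            continue
        # Assumption: a classic smoothed idf; the real formula may differ.
        idf = math.log(num_docs / (df + 1)) + 1.0
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:max_query_terms]


def build_bool_query(field, scored_terms, *, percent_terms_to_match=0.3,
                     boost_terms=0.0):
    """Turn (score, term) pairs into a bool query of should clauses."""
    should = []
    for score, term in scored_terms:
        clause = {"term": {field: {"value": term}}}
        if boost_terms > 0:
            # Boost each clause by its tf-idf score times boost_terms.
            clause["term"][field]["boost"] = boost_terms * score
        should.append(clause)
    minimum = int(percent_terms_to_match * len(should))
    return {"bool": {"should": should, "minimum_should_match": minimum}}


if __name__ == "__main__":
    freqs = {"lucene": 10, "search": 50, "index": 40, "the": 1000}
    terms = interesting_terms("lucene search lucene index the the the",
                              freqs, 1000, min_term_freq=1,
                              max_doc_freq=500, max_query_terms=2)
    print(build_bool_query("body", terms, percent_terms_to_match=0.5))
```

In this sketch a very common term such as `the` is dropped by `max_doc_freq`, and `max_query_terms` then truncates the ranked list before any clause is built.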
The `more_like_this` top level parameters include:
[cols="<,<",options="header",]

@ -14,7 +14,9 @@ The API simply results in executing a search request with
parameters match the parameters to the `more_like_this` query). This
means that the body of the request can optionally include all the
request body options in the <<search-search,search
-API>> (facets, from/to and so on).
+API>> (aggs, from/to and so on). Internally, the more like this
+API is equivalent to performing a boolean query of `more_like_this_field`
+queries, with one query per field specified in `mlt_fields`.
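
To make the equivalence above concrete, here is a rough Python sketch of the query body it implies. The helper name `mlt_api_as_bool_query` is hypothetical, and two details are assumptions rather than statements from the documentation: that the per-field queries are combined as `should` clauses, and that each field's own text in the source document serves as that field's `like_text`.

```python
def mlt_api_as_bool_query(doc_source, mlt_fields, **mlt_params):
    """Build a bool query with one more_like_this_field query per field.

    doc_source: dict of field name -> text for the document to compare against.
    mlt_fields: the fields named in the `mlt_fields` parameter.
    mlt_params: extra query parameters, e.g. min_term_freq=1.
    """
    clauses = []
    for field in mlt_fields:
        if field not in doc_source:
            # Nothing to compare on for a field the document lacks.
            continue
        field_params = {"like_text": doc_source[field]}
        field_params.update(mlt_params)
        clauses.append({"more_like_this_field": {field: field_params}})
    # Assumption: the per-field queries are OR-ed together.
    return {"bool": {"should": clauses}}


if __name__ == "__main__":
    doc = {"title": "more like this", "body": "finding similar documents"}
    print(mlt_api_as_bool_query(doc, ["title", "body"], min_term_freq=1))
```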
Rest parameters relating to search are also allowed, including
`search_type`, `search_indices`, `search_types`, `search_scroll`,