Provided some insights as to how More Like This works internally.
In the Google Groups forum there appears to be some confusion as to what `mlt` does. This documentation update should help demystify the feature and explain how to use its parameters. Closes #6092
parent a972aaa7ae
commit 48b7172ee7
@@ -18,6 +18,19 @@ running it against one or more fields.
 
 `more_like_this` can be shortened to `mlt`.
 
+Under the hood, `more_like_this` simply creates multiple `should` clauses in a `bool` query of
+interesting terms extracted from some provided text. The interesting terms are
+selected with respect to their tf-idf scores. These are controlled by
+`min_term_freq`, `min_doc_freq`, and `max_doc_freq`. The number of interesting
+terms is controlled by `max_query_terms`. While the minimum number of clauses
+that must be satisfied is controlled by `percent_terms_to_match`. The terms
+are extracted from `like_text` which is analyzed by the analyzer associated
+with the field, unless specified by `analyzer`. There are other parameters,
+such as `min_word_length`, `max_word_length` or `stop_words`, to control what
+terms should be considered as interesting. In order to give more weight to
+more interesting terms, each boolean clause associated with a term could be
+boosted by the term tf-idf score times some boosting factor `boost_terms`.
+
 The `more_like_this` top level parameters include:
 
 [cols="<,<",options="header",]
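The paragraph added in the hunk above describes a pipeline: analyze `like_text`, filter and score terms by tf-idf, keep the best `max_query_terms`, and emit one `should` clause per term. The following is a rough Python sketch of that pipeline, not the actual Elasticsearch or Lucene code. The function name `build_mlt_bool_query`, the whitespace tokenizer standing in for the field's analyzer, the `doc_freqs`/`num_docs` inputs, the idf formula, the rounding used for `minimum_should_match`, and the default values are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_mlt_bool_query(like_text, field, doc_freqs, num_docs,
                         min_term_freq=2, min_doc_freq=5, max_doc_freq=None,
                         max_query_terms=25, percent_terms_to_match=0.3,
                         min_word_length=0, max_word_length=0,
                         stop_words=(), boost_terms=0.0):
    """Hypothetical sketch: turn like_text into a bool query of should clauses."""
    # Assumption: lowercase + whitespace split stands in for the field's analyzer.
    term_freqs = Counter(like_text.lower().split())

    scored = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        # Frequency-based filters.
        if tf < min_term_freq or df < min_doc_freq:
            continue
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        # Word-length and stop-word filters (0 means "no limit" in this sketch).
        if min_word_length and len(term) < min_word_length:
            continue
        if max_word_length and len(term) > max_word_length:
            continue
        if term in stop_words:
            continue
        # Assumption: a common tf-idf variant; the real scoring may differ.
        idf = 1.0 + math.log(num_docs / (df + 1))
        scored.append((tf * idf, term))

    # Keep the highest-scoring terms, breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    top = scored[:max_query_terms]

    should = []
    for score, term in top:
        clause = {"term": {field: {"value": term}}}
        if boost_terms > 0:
            # Weight each clause by its tf-idf score times boost_terms.
            clause["term"][field]["boost"] = score * boost_terms
        should.append(clause)

    # Assumption: the percentage is truncated to a whole number of clauses.
    minimum = int(len(should) * percent_terms_to_match)
    return {"bool": {"should": should, "minimum_should_match": minimum}}
```

Calling the function with a short text and a small, made-up document-frequency table makes it easy to see which terms survive the filters and how `percent_terms_to_match` translates into a clause count.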
@@ -14,7 +14,9 @@ The API simply results in executing a search request with
 parameters match the parameters to the `more_like_this` query). This
 means that the body of the request can optionally include all the
 request body options in the <<search-search,search
-API>> (facets, from/to and so on).
+API>> (aggs, from/to and so on). Internally, the more like this
+API is equivalent to performing a boolean query of `more_like_this_field`
+queries, with one query per specified `mlt_fields`.
 
 Rest parameters relating to search are also allowed, including
 `search_type`, `search_indices`, `search_types`, `search_scroll`,
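The second hunk states that the more like this API is equivalent to a boolean query of `more_like_this_field` queries, one per entry in `mlt_fields`. A minimal Python sketch of what that equivalent query body might look like follows. The helper name `mlt_api_equivalent_query` is hypothetical, and two details are assumptions not confirmed by the diff: that the per-field queries are combined as `should` clauses, and that each field's `like_text` is taken from the source document's value for that field.

```python
def mlt_api_equivalent_query(doc_source, mlt_fields, **mlt_params):
    """Hypothetical sketch: one more_like_this_field query per mlt field."""
    should = []
    for field in mlt_fields:
        if field not in doc_source:
            # Skip fields the source document does not contain.
            continue
        # Assumption: the document's own field value serves as like_text.
        body = {"like_text": str(doc_source[field])}
        # Remaining parameters (e.g. min_term_freq) are passed through as-is.
        body.update(mlt_params)
        should.append({"more_like_this_field": {field: body}})
    # Assumption: the per-field queries are combined as should clauses.
    return {"bool": {"should": should}}


# Example: build the query body for a document with two text fields.
query = mlt_api_equivalent_query(
    {"title": "Quick fox", "body": "jumps over"},
    ["title", "body"],
    min_term_freq=1,
)
```

The resulting dictionary could then be sent as the `query` part of an ordinary search request, which is why the regular search body options and REST parameters apply to the API as well.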