From 48b7172ee71249c9b76aea38a9f0cc9e13d7d5bb Mon Sep 17 00:00:00 2001
From: Alex Ksikes
Date: Thu, 8 May 2014 12:21:18 +0200
Subject: [PATCH] Provided some insights as to how More Like This works internally.

In the Google Groups forum there appears to be some confusion as to what mlt
does. This documentation update should hopefully help demystify this feature,
and provide some understanding as to how to use its parameters.

Closes #6092
---
 docs/reference/query-dsl/queries/mlt-query.asciidoc | 13 +++++++++++++
 docs/reference/search/more-like-this.asciidoc       |  4 +++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/docs/reference/query-dsl/queries/mlt-query.asciidoc b/docs/reference/query-dsl/queries/mlt-query.asciidoc
index b72a6fc4cab..4965b8677fc 100644
--- a/docs/reference/query-dsl/queries/mlt-query.asciidoc
+++ b/docs/reference/query-dsl/queries/mlt-query.asciidoc
@@ -18,6 +18,19 @@ running it against one or more fields.
 `more_like_this` can be shortened to `mlt`.

+Under the hood, `more_like_this` simply creates a `bool` query with one `should` clause per
+interesting term extracted from some provided text. The interesting terms are
+selected with respect to their tf-idf scores. The selection is controlled by
+`min_term_freq`, `min_doc_freq`, and `max_doc_freq`. The number of interesting
+terms is controlled by `max_query_terms`, while the minimum number of clauses
+that must be satisfied is controlled by `percent_terms_to_match`. The terms
+are extracted from `like_text`, which is analyzed by the analyzer associated
+with the field unless another one is specified by `analyzer`. There are other
+parameters, such as `min_word_length`, `max_word_length` or `stop_words`, to control
+which terms should be considered as interesting. In order to give more weight to
+more interesting terms, each boolean clause associated with a term can be
+boosted by the term's tf-idf score times some boosting factor `boost_terms`.
+
 The `more_like_this` top level parameters include:

 [cols="<,<",options="header",]
diff --git a/docs/reference/search/more-like-this.asciidoc b/docs/reference/search/more-like-this.asciidoc
index 28bd07871f9..25e8158c461 100644
--- a/docs/reference/search/more-like-this.asciidoc
+++ b/docs/reference/search/more-like-this.asciidoc
@@ -14,7 +14,9 @@ The API simply results in executing a search request with
 parameters match the parameters to the `more_like_this` query). This means that the body
 of the request can optionally include all the request body options in the
 <<
-API>> (facets, from/to and so on).
+API>> (aggs, from/to and so on). Internally, the more like this
+API is equivalent to performing a boolean query of `more_like_this_field`
+queries, with one query per specified `mlt_fields`.

 Rest parameters relating to search are also allowed, including
 `search_type`, `search_indices`, `search_types`, `search_scroll`,
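
The behaviour described in the added paragraphs can be illustrated with a rough Python sketch. It is not the Lucene or Elasticsearch implementation: the function names (`interesting_terms`, `build_bool_query`, `mlt_api_as_bool`), the whitespace tokenizer standing in for the field analyzer, the simplified tf-idf formula, the mapping of `percent_terms_to_match` onto `minimum_should_match`, and the use of `should` to combine the per-field queries are all assumptions made for illustration. The default values are also illustrative only.

```python
import math
from collections import Counter


def interesting_terms(like_text, doc_freqs, num_docs, min_term_freq=2,
                      min_doc_freq=5, max_doc_freq=None, max_query_terms=25):
    """Return (score, term) pairs ranked by a simplified tf-idf (assumption)."""
    # Assumption: a lowercase whitespace split stands in for the field analyzer.
    term_freqs = Counter(like_text.lower().split())
    scored = []
    for term, tf in term_freqs.items():
        df = doc_freqs.get(term, 0)
        # Drop terms that are too rare in the text or in the index.
        if tf < min_term_freq or df < min_doc_freq:
            continue
        # Drop terms that are too common in the index.
        if max_doc_freq is not None and df > max_doc_freq:
            continue
        idf = math.log(num_docs / (df + 1)) + 1.0
        scored.append((tf * idf, term))
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:max_query_terms]


def build_bool_query(field, scored_terms, percent_terms_to_match=0.3,
                     boost_terms=0.0):
    """Build one `should` clause per interesting term."""
    should = []
    for score, term in scored_terms:
        clause = {"term": {field: {"value": term}}}
        if boost_terms > 0:
            # Boost each clause by its tf-idf score times `boost_terms`.
            clause["term"][field]["boost"] = boost_terms * score
        should.append(clause)
    # Assumption: the percentage is turned into an absolute clause count.
    must_match = int(percent_terms_to_match * len(should))
    return {"bool": {"should": should, "minimum_should_match": must_match}}


def mlt_api_as_bool(doc, mlt_fields):
    """One `more_like_this_field` query per field, combined in a `bool`."""
    return {"bool": {"should": [
        {"more_like_this_field": {name: {"like_text": doc[name]}}}
        for name in mlt_fields
    ]}}
```

With an index of 1000 documents, a text in which `search` and `engine` are frequent, and a `max_doc_freq` that excludes a very common word such as `the`, the sketch keeps only `search` and `engine` and emits two `should` clauses. This is the selection behaviour that `min_term_freq`, `min_doc_freq` and `max_doc_freq` are described as controlling.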