Commit Graph

20 Commits

Author SHA1 Message Date
Alex Ksikes d339ee4005 Term Vectors: terms filtering
This adds a new feature to the Term Vectors API which allows for filtering of
terms based on their tf-idf scores. With `dfs` option on, this could be useful
for finding out a good characteric vector of a document or a set of documents.
The parameters are similar to the ones used in the MLT Query.

Closes #9561
2015-04-14 19:11:09 +02:00
Alex Ksikes 615513ee9b Docs: clearer MLT documentation
Closes #9351
2015-01-20 16:42:39 +01:00
Alex Ksikes 256712640f MLT Query: Support for ignore docs
Adds a `ignore_like` parameter to the MLT Query, which simply tells the
algorithm to skip all the terms from the given documents. This could be useful
in order to better guide nearest neighbor search by telling the algorithm to
never explore the space spanned by the given `ignore_like` docs. In essence we
are interested about the characteristic of a given item, but not of the ones
provided by `ignore_like`, thereby forcing the algorithm to go deeper in its
selection of terms. Note that this is different than simply performing a must
not boolean query on the unliked items. The syntax is exactly the same as the
`like` parameter.

Closes #8674
2014-11-28 14:48:43 +01:00
Clinton Gormley cff544dcc2 Docs: Removed old coming/added tags 2014-11-10 14:41:24 +01:00
Alex Ksikes 0be5c60bce MLT Query: use ParseField#withAllDeprecated for percent_terms_to_match
Also the parameter was deprecated but not removed so we keep it in the doc and
mark it as deprecated ...

Closes #8241
2014-10-27 17:35:06 +01:00
Alex Ksikes 991f3e2cd3 Docs: fix tags for dfs and new like parameter 2014-10-27 15:42:44 +01:00
Alex Ksikes 4da407a869 MLT Query: versatile 'like' parameter
The MLT query has a lot of parameters. For example, a set of documents is
specified with either `like_text`, `ids` or `docs`, with at least one
parameter required. This commit groups all the document specification
parameters under one called `like`. The syntax is described below and could
easily be extended to allow for new means of specifying document input. The
`like_text`, `ids` and `docs` parameters are deprecated.

As a single piece text:

{
  "query": {
    "more_like_this": {
      "like": "some text here"
    }
  }
}

As a single item:

{
  "query": {
    "more_like_this": {
      "like": {
        "_index": "imdb",
        "_type": "movies",
        "_id": "88247"
      }
    }
  }
}

Or as a mixture of all:

{
  "query": {
    "more_like_this": {
      "like": [
        "Some random text ...",
        {
          "_index": "imdb",
          "_type": "movies",
          "_id": "88247"
        },
        {
          "_index": "imdb",
          "_type": "movies",
          "doc": {
            "title": "Document with an artificial title!"
          }
        }
      ]
    }
  }
}

Closes #8039
2014-10-25 11:04:51 +02:00
Alex Ksikes 349b7a3a8b Term Vectors/MLT Query: support for different analyzers than default at field
This adds a `per_field_analyzer` parameter to the Term Vectors API, which
allows to override the default analyzer at the field. If the field already
stores term vectors, then they will be re-generated. Since the MLT Query uses
the Term Vectors API under its hood, this commits also adds the same ability
to the MLT Query, thereby allowing users to fine grain how each field item
should be processed and analyzed.

Closes #7801
2014-10-03 16:40:17 +02:00
Alex Ksikes b118558962 MLT Query: Support for artificial documents
Previously, the only way to specify a document not present in the index was to
use `like_text`. This would usually lead to complex queries made of multiple
MLT queries per document field. This commit adds the ability to the MLT query
to directly specify documents not present in the index (artificial documents).
The syntax is similar to the Percolator API or to the Multi Term Vector API.

Closes #7725
2014-09-29 15:49:13 +02:00
Alex Ksikes 5014158d6b MLT Query: use minimum should match more extensive syntax
The minimum number of optional should clauses of the generated query to match
can now be set using the more extensive minimum should match syntax. This
makes the `percent_terms_to_match` parameter deprecated, and replaced in favor
to a new `minimum_should_match` parameter.

Closes #7898
2014-09-29 11:14:56 +02:00
Alex Ksikes 51bf3e6730 MLT Query: fix percent_terms_to_match
The parameter `percent_terms_to_match` (percentage of terms that must match in
the generated query) was wrongly set to the top level boolean query. This
would lead to zero or all results type of situations. This commit ensures that
the parameter is indeed applied to the query of generated terms.

Closes #7754
2014-09-25 09:56:53 +02:00
Alex Ksikes e78694ae82 More Like This Query: defaults to all possible fields for items
Items with no specified field now defaults to all the possible fields from the
document source. Previously, we had required 'fields' to be specified either
as a top level parameter or for each item. The default behavior is now similar
to the MLT API.

Closes #7382
2014-08-22 15:07:22 +02:00
Alex Ksikes f1a6b4e9fe More Like This Query: Switch to using the multi-termvectors API
The term vector API can now generate term vectors on the fly, if the terms are
not already stored in the index. This commit exploits this new functionality
for the MLT query. Now the terms are directly retrieved using multi-
termvectors API, instead of generating them from the texts retrieved using the
multi-get API.

Closes #7014
2014-08-21 12:18:21 +02:00
Alex Ksikes 2546c06131 More Like This Query: allow for both 'like_text' and 'docs/ids' to be specified.
Closes #6246
2014-05-22 13:50:17 +02:00
Alex Ksikes a29b4a800d More Like This Query: replaced 'exclude' with 'include' to avoid double negation when set.
Closes #6248
2014-05-21 18:45:03 +02:00
Alex Ksikes db991dc3a4 More Like This Query: Added searching for multiple items.
The syntax to specify one or more items is the same as for the Multi GET API.
If only one document is specified, the results returned are the same as when
using the More Like This API.

Relates #4075 Closes #5857
2014-05-17 19:14:56 +02:00
Alex Ksikes 48b7172ee7 Provided some insights as to how More Like This works internally.
In the Google Groups forum there appears to be some confusion as to what mlt
does. This documentation update should hopefully help demystifying this
feature, and provide some understanding as to how to use its parameters.

Closes #6092
2014-05-09 12:13:29 +02:00
Alex Ksikes b55d8ed2e3 Fix behavior on default boost factor for More Like This.
A boost terms factor of 1.0 is not the same as no boosting of terms.
The desired behavior is to deactivate boosting by default. If the user
specifies any value other than 0, then boosting is activated.

Closes #6021
2014-05-02 16:59:09 +02:00
markharwood 2795f4e55d Standardized use of “*_length” for parameter names rather than “*_len”.
Java Builder apis drop old “len” methods in favour of new “length”
Rest APIs support both old “len: and new “length” forms using new ParseField class to a) provide compiler-checked consistency between Builder and Parser classes and
b) a common means of handling deprecated syntax in the DSL.
Documentation and rest specs only document the new “*length” forms
Closes #4083
2014-01-13 15:59:15 +00:00
Clinton Gormley 822043347e Migrated documentation into the main repo 2013-08-29 01:24:34 +02:00