Allowing `_doc` as a type will enable users to make the transition to 7.0
smoother since the index APIs will be `PUT index/_doc/id` and `POST index/_doc`.
This also moves most of the documentation to `_doc` as a type name.
Closes#27750Closes#27751
This adds a new feature to the Term Vectors API which allows for filtering of
terms based on their tf-idf scores. With `dfs` option on, this could be useful
for finding out a good characteric vector of a document or a set of documents.
The parameters are similar to the ones used in the MLT Query.
Closes#9561
The `analyzer` setting is now the base setting, and `search_analyzer`
is simply an override of the search time analyzer. When setting
`search_analyzer`, `analyzer` must be set.
closes#9371
We speak of the term vectors of a document, where each field has an associated
stored term vector. Since by default we are requesting all the term vectors of
a document, the HTTP request endpoint should rather be called `_termvectors`
instead of `_termvector`. The usage of `_termvector` is now deprecated, as
well as the transport client call to termVector and prepareTermVector.
Closes#8484
The MLT query has a lot of parameters. For example, a set of documents is
specified with either `like_text`, `ids` or `docs`, with at least one
parameter required. This commit groups all the document specification
parameters under one called `like`. The syntax is described below and could
easily be extended to allow for new means of specifying document input. The
`like_text`, `ids` and `docs` parameters are deprecated.
As a single piece text:
{
"query": {
"more_like_this": {
"like": "some text here"
}
}
}
As a single item:
{
"query": {
"more_like_this": {
"like": {
"_index": "imdb",
"_type": "movies",
"_id": "88247"
}
}
}
}
Or as a mixture of all:
{
"query": {
"more_like_this": {
"like": [
"Some random text ...",
{
"_index": "imdb",
"_type": "movies",
"_id": "88247"
},
{
"_index": "imdb",
"_type": "movies",
"doc": {
"title": "Document with an artificial title!"
}
}
]
}
}
}
Closes#8039
This adds a `per_field_analyzer` parameter to the Term Vectors API, which
allows to override the default analyzer at the field. If the field already
stores term vectors, then they will be re-generated. Since the MLT Query uses
the Term Vectors API under its hood, this commits also adds the same ability
to the MLT Query, thereby allowing users to fine grain how each field item
should be processed and analyzed.
Closes#7801
By default term vectors are now realtime, as opposed to previously near
realtime. If they are not found in the index, they will be generated on the
fly. The document is fetched from the transaction log and treated as an
artificial document. One can set `realtime` parameter to `false` in order to
disable this functionality. This consequently makes the MLT query realtime in
fetching documents, as it previsouly used to be before switching from using
the multi get API to the mtv API.
Closes#7846
This adds the ability to the Term Vector API to generate term vectors for
artifical documents, that is for documents not present in the index. Following
a similar syntax to the Percolator API, a new 'doc' parameter is used, instead
of '_id', that specifies the document of interest. The parameters '_index' and
'_type' determine the mapping and therefore analyzers to apply to each value
field.
Closes#7530
Adds the ability to the Term Vector API to generate term vectors for some
chosen fields, even though they haven't been explicitely stored in the index.
Relates to #5184Closes#6567