OpenSearch

Commit Graph

Author	SHA1	Message	Date
Alex Ksikes	d339ee4005	Term Vectors: terms filtering This adds a new feature to the Term Vectors API which allows for filtering of terms based on their tf-idf scores. With `dfs` option on, this could be useful for finding out a good characteristic vector of a document or a set of documents. The parameters are similar to the ones used in the MLT Query. Closes #9561	2015-04-14 19:11:09 +02:00
Alex Ksikes	615513ee9b	Docs: clearer MLT documentation Closes #9351	2015-01-20 16:42:39 +01:00
Alex Ksikes	256712640f	MLT Query: Support for ignore docs Adds an `ignore_like` parameter to the MLT Query, which simply tells the algorithm to skip all the terms from the given documents. This could be useful in order to better guide nearest neighbor search by telling the algorithm to never explore the space spanned by the given `ignore_like` docs. In essence we are interested in the characteristics of a given item, but not in those of the items provided by `ignore_like`, thereby forcing the algorithm to go deeper in its selection of terms. Note that this is different from simply performing a must not boolean query on the unliked items. The syntax is exactly the same as the `like` parameter. Closes #8674	2014-11-28 14:48:43 +01:00
Clinton Gormley	cff544dcc2	Docs: Removed old coming/added tags	2014-11-10 14:41:24 +01:00
Alex Ksikes	0be5c60bce	MLT Query: use ParseField#withAllDeprecated for percent_terms_to_match Also the parameter was deprecated but not removed so we keep it in the doc and mark it as deprecated ... Closes #8241	2014-10-27 17:35:06 +01:00
Alex Ksikes	991f3e2cd3	Docs: fix tags for dfs and new like parameter	2014-10-27 15:42:44 +01:00
Alex Ksikes	4da407a869	MLT Query: versatile 'like' parameter The MLT query has a lot of parameters. For example, a set of documents is specified with either `like_text`, `ids` or `docs`, with at least one parameter required. This commit groups all the document specification parameters under one called `like`. The syntax is described below and could easily be extended to allow for new means of specifying document input. The `like_text`, `ids` and `docs` parameters are deprecated. As a single piece of text: { "query": { "more_like_this": { "like": "some text here" } } } As a single item: { "query": { "more_like_this": { "like": { "_index": "imdb", "_type": "movies", "_id": "88247" } } } } Or as a mixture of all: { "query": { "more_like_this": { "like": [ "Some random text ...", { "_index": "imdb", "_type": "movies", "_id": "88247" }, { "_index": "imdb", "_type": "movies", "doc": { "title": "Document with an artificial title!" } } ] } } } Closes #8039	2014-10-25 11:04:51 +02:00
Alex Ksikes	349b7a3a8b	Term Vectors/MLT Query: support for different analyzers than default at field This adds a `per_field_analyzer` parameter to the Term Vectors API, which allows overriding the default analyzer at the field. If the field already stores term vectors, then they will be re-generated. Since the MLT Query uses the Term Vectors API under the hood, this commit also adds the same ability to the MLT Query, thereby allowing users to fine-tune how each field item should be processed and analyzed. Closes #7801	2014-10-03 16:40:17 +02:00
Alex Ksikes	b118558962	MLT Query: Support for artificial documents Previously, the only way to specify a document not present in the index was to use `like_text`. This would usually lead to complex queries made of multiple MLT queries per document field. This commit adds the ability to the MLT query to directly specify documents not present in the index (artificial documents). The syntax is similar to the Percolator API or to the Multi Term Vector API. Closes #7725	2014-09-29 15:49:13 +02:00
Alex Ksikes	5014158d6b	MLT Query: use minimum should match more extensive syntax The minimum number of optional should clauses of the generated query to match can now be set using the more extensive minimum should match syntax. This deprecates the `percent_terms_to_match` parameter in favor of a new `minimum_should_match` parameter. Closes #7898	2014-09-29 11:14:56 +02:00
Alex Ksikes	51bf3e6730	MLT Query: fix percent_terms_to_match The parameter `percent_terms_to_match` (percentage of terms that must match in the generated query) was wrongly set to the top level boolean query. This would lead to situations where either zero or all results were returned. This commit ensures that the parameter is indeed applied to the query of generated terms. Closes #7754	2014-09-25 09:56:53 +02:00
Alex Ksikes	e78694ae82	More Like This Query: defaults to all possible fields for items Items with no specified field now default to all the possible fields from the document source. Previously, we had required 'fields' to be specified either as a top level parameter or for each item. The default behavior is now similar to the MLT API. Closes #7382	2014-08-22 15:07:22 +02:00
Alex Ksikes	f1a6b4e9fe	More Like This Query: Switch to using the multi-termvectors API The term vector API can now generate term vectors on the fly, if the terms are not already stored in the index. This commit exploits this new functionality for the MLT query. Now the terms are directly retrieved using the multi-termvectors API, instead of generating them from the texts retrieved using the multi-get API. Closes #7014	2014-08-21 12:18:21 +02:00
Alex Ksikes	2546c06131	More Like This Query: allow for both 'like_text' and 'docs/ids' to be specified. Closes #6246	2014-05-22 13:50:17 +02:00
Alex Ksikes	a29b4a800d	More Like This Query: replaced 'exclude' with 'include' to avoid double negation when set. Closes #6248	2014-05-21 18:45:03 +02:00
Alex Ksikes	db991dc3a4	More Like This Query: Added searching for multiple items. The syntax to specify one or more items is the same as for the Multi GET API. If only one document is specified, the results returned are the same as when using the More Like This API. Relates #4075 Closes #5857	2014-05-17 19:14:56 +02:00
Alex Ksikes	48b7172ee7	Provided some insights as to how More Like This works internally. In the Google Groups forum there appears to be some confusion as to what mlt does. This documentation update should hopefully help demystify this feature, and provide some understanding as to how to use its parameters. Closes #6092	2014-05-09 12:13:29 +02:00
Alex Ksikes	b55d8ed2e3	Fix behavior on default boost factor for More Like This. A boost terms factor of 1.0 is not the same as no boosting of terms. The desired behavior is to deactivate boosting by default. If the user specifies any value other than 0, then boosting is activated. Closes #6021	2014-05-02 16:59:09 +02:00
markharwood	2795f4e55d	Standardized use of “_length” for parameter names rather than “_len”. Java Builder APIs drop old “len” methods in favour of new “length”. Rest APIs support both old “len” and new “length” forms using new ParseField class to a) provide compiler-checked consistency between Builder and Parser classes and b) a common means of handling deprecated syntax in the DSL. Documentation and rest specs only document the new “*length” forms. Closes #4083	2014-01-13 15:59:15 +00:00
Clinton Gormley	822043347e	Migrated documentation into the main repo	2013-08-29 01:24:34 +02:00

20 Commits