[[search-suggesters-completion]]
=== Completion Suggester

NOTE: In order to understand the format of suggestions, please
read the <<search-suggesters>> page first.

The `completion` suggester is a so-called prefix suggester. It does not
do spell correction like the `term` or `phrase` suggesters but allows
basic `auto-complete` functionality.

==== Why another suggester? Why not prefix queries?

The first question that comes to mind when reading about prefix
suggestions is why you should use them at all if you already have prefix
queries. The answer is simple: prefix suggestions are fast.

The data structures are internally backed by Lucene's
`AnalyzingSuggester`, which uses FSTs to execute suggestions. Usually
these data structures are costly to create, are stored in memory, and need
to be rebuilt every now and then to reflect changes in your indexed
documents. The `completion` suggester circumvents this by storing the
FST as part of your index at index time. This allows for really fast
loads and executions.

[[completion-suggester-mapping]]
==== Mapping

To use this feature, you have to map the field with the `completion`
type, which enables the special storage of the field.

[source,js]
--------------------------------------------------
curl -X PUT localhost:9200/music
curl -X PUT localhost:9200/music/song/_mapping -d '{
    "song" : {
        "properties" : {
            "name" : { "type" : "string" },
            "suggest" : {
                "type" : "completion",
                "index_analyzer" : "simple",
                "search_analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}'
--------------------------------------------------

Mapping supports the following parameters:

`index_analyzer`::
    The index analyzer to use, defaults to `simple`.

`search_analyzer`::
    The search analyzer to use, defaults to `simple`.
    In case you are wondering why we did not opt for the `standard`
    analyzer: we try to keep the behaviour easy to understand here. With
    the `standard` analyzer, if you index the field content
    `At the Drive-in`, you will not get any suggestions for `a`, nor for
    `d` (the first non-stopword).


`payloads`::
    Enables the storing of payloads, defaults to `false`.

`preserve_separators`::
    Preserves the separators, defaults to `true`.
    If disabled, a suggestion for `foof` could match a field starting
    with `Foo Fighters`.

`preserve_position_increments`::
    Enables position increments, defaults
    to `true`. If disabled and you use an analyzer that removes stopwords,
    you could get a field starting with `The Beatles` when you suggest for
    `b`. *Note*: You could also achieve this by indexing two inputs,
    `Beatles` and `The Beatles`; there is no need to change a simple
    analyzer if you are able to enrich your data.

`max_input_length`::
    Limits the length of a single input, defaults to `50` UTF-16 code points.
    This limit is only used at index time to reduce the total number of
    characters per input string in order to prevent massive inputs from
    bloating the underlying data structure. Most use cases won't be affected
    by the default value, since prefix completions rarely grow beyond
    a handful of characters. (The old name `max_input_len` is deprecated.)
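
As a rough illustration of `preserve_separators` and `max_input_length`,
the following Python sketch models an indexed input as a lowercased string
of letter tokens. The helpers `normalize`, `index_key` and `matches` are
hypothetical names invented for this example. The real suggester runs the
configured analyzer and looks the prefix up in an FST, and Python slicing
counts code points rather than UTF-16 units, so treat this as an
approximation only.

[source,python]
--------------------------------------------------
import re


def normalize(text, preserve_separators=True):
    """Toy stand-in for analysis: lowercase, keep only runs of letters."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    # With separators preserved, token boundaries remain part of the key.
    separator = " " if preserve_separators else ""
    return separator.join(tokens)


def index_key(text, preserve_separators=True, max_input_length=50):
    """Key stored for one input; the length limit applies at index time only."""
    return normalize(text[:max_input_length], preserve_separators)


def matches(query, indexed_text, preserve_separators=True, max_input_length=50):
    """Return True if the normalized query is a prefix of the indexed key."""
    key = index_key(indexed_text, preserve_separators, max_input_length)
    return key.startswith(normalize(query, preserve_separators))


print(matches("foof", "Foo Fighters", preserve_separators=True))   # False
print(matches("foof", "Foo Fighters", preserve_separators=False))  # True
print(len(index_key("a" * 60)))                                    # 50
--------------------------------------------------

In this toy model, dropping the separator merges `foo` and `fighters` into
one key, which is why the query `foof` starts to match.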

[[indexing]]
==== Indexing

[source,js]
--------------------------------------------------
curl -X PUT 'localhost:9200/music/song/1?refresh=true' -d '{
    "name" : "Nevermind",
    "suggest" : {
        "input": [ "Nevermind", "Nirvana" ],
        "output": "Nirvana - Nevermind",
        "payload" : { "artistId" : 2321 },
        "weight" : 34
    }
}'
--------------------------------------------------

The following parameters are supported:

`input`::
    The input to store; this can be an array of strings or just
    a string. This field is mandatory.

`output`::
    The string to return if a suggestion matches. This is very
    useful to normalize outputs (e.g. have them always in the format
    `artist - songname`). The result is de-duplicated if several documents
    have the same output, i.e. only one is returned as part of the
    suggest result. This field is optional.

`payload`::
    An arbitrary JSON object, which is simply returned in the
    suggest option. You could store data like the id of a document, in order
    to load it from elasticsearch without executing another search (which
    might not yield any results, if `input` and `output` differ strongly).

`weight`::
    A positive integer, which defines a weight and allows you to
    rank your suggestions. This field is optional.
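
The parameters above can be assembled programmatically. The following
Python sketch builds the `suggest` field value for a document and applies
the rules stated in the list (`input` is mandatory, `weight` is a positive
integer). The function `build_suggest` is a hypothetical helper written for
this example, not part of any client library.

[source,python]
--------------------------------------------------
import json


def build_suggest(inputs, output=None, payload=None, weight=None):
    """Build the value of a completion field from its documented parameters."""
    # A single string is accepted as well as a list of strings.
    if isinstance(inputs, str):
        inputs = [inputs]
    if not inputs:
        raise ValueError("input is mandatory")
    body = {"input": list(inputs)}
    if output is not None:
        body["output"] = output
    if payload is not None:
        body["payload"] = payload
    if weight is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(weight, bool) or not isinstance(weight, int) or weight <= 0:
            raise ValueError("weight must be a positive integer")
        body["weight"] = weight
    return body


doc = {
    "name": "Nevermind",
    "suggest": build_suggest(
        ["Nevermind", "Nirvana"],
        output="Nirvana - Nevermind",
        payload={"artistId": 2321},
        weight=34,
    ),
}
# The printed JSON can be used as the request body when indexing the document.
print(json.dumps(doc, indent=4))
--------------------------------------------------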

NOTE: You can also use the following shorthand form, although you will
lose most of the features of the completion suggester: you will not be
able to use several inputs, an output, payloads or weights. This form
does still work inside of multi fields.

[source,js]
--------------------------------------------------
{
  "suggest" : "Nirvana"
}
--------------------------------------------------

NOTE: The suggest data structure might not reflect deletes on
documents immediately. You may need to do an <<indices-optimize>> for that.
You can call optimize with `only_expunge_deletes=true` to only cater for
deletes, or alternatively call a <<index-modules-merge>> operation.

[[querying]]
==== Querying

Suggesting works as usual, except that you have to specify the suggest
type as `completion`.

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest"
        }
    }
}'

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}
--------------------------------------------------

As you can see, the payload is included in the response if payloads are
enabled in the mapping. If you configured a weight for a suggestion, this
weight is used as `score`. The `text` field uses the `output` of your
indexed suggestion if configured, otherwise the matched part of the
`input` field.
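
A client typically only needs the options of each suggestion entry. The
following Python sketch parses a trimmed copy of the response shown above
and collects `text`, `score` and `payload` for every option. The function
name `extract_options` is an assumption made for this example.

[source,python]
--------------------------------------------------
import json

# Trimmed copy of the example response (shard info and offsets omitted).
response_text = """
{
  "song-suggest" : [ {
    "text" : "n",
    "options" : [ {
      "text" : "Nirvana - Nevermind",
      "score" : 34.0,
      "payload" : {"artistId":2321}
    } ]
  } ]
}
"""


def extract_options(response, suggestion_name):
    """Return (text, score, payload) tuples for one named suggestion."""
    results = []
    for entry in response.get(suggestion_name, []):
        for option in entry.get("options", []):
            # The payload is only present if payloads were stored.
            results.append((option["text"], option["score"], option.get("payload")))
    return results


options = extract_options(json.loads(response_text), "song-suggest")
for text, score, payload in options:
    print(text, score, payload)
--------------------------------------------------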

NOTE: The completion suggester considers all documents in the index.
See <<suggester-context>> for an explanation of how to query a subset of
documents instead.

[[fuzzy]]
==== Fuzzy queries

The completion suggester also supports fuzzy queries. This means
you can have a typo in your search and still get results back.

[source,js]
--------------------------------------------------
curl -X POST 'localhost:9200/music/_suggest?pretty' -d '{
    "song-suggest" : {
        "text" : "n",
        "completion" : {
            "field" : "suggest",
            "fuzzy" : {
                "fuzziness" : 2
            }
        }
    }
}'
--------------------------------------------------

The fuzzy query can take specific fuzzy parameters.
The following parameters are supported:

[horizontal]
`fuzziness`::
    The fuzziness factor, defaults to `AUTO`.
    See <<fuzziness>> for allowed settings.

`transpositions`::
    Sets whether a transposition of two adjacent characters
    is counted as one change instead of two, defaults to `true`.

`min_length`::
    Minimum length of the input before fuzzy
    suggestions are returned, defaults to `3`.

`prefix_length`::
    Length of the leading part of the input that is not
    checked for fuzzy alternatives, defaults to `1`.

`unicode_aware`::
    Sets all measurements (like edit distance,
    transpositions and lengths) to be in Unicode code points
    (actual letters) instead of bytes.

NOTE: If you want to stick with the default values, but
      still use fuzzy, you can either use `fuzzy: {}`
      or `fuzzy: true`.
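
To make the `transpositions` option concrete, here is a small Python sketch
of an edit distance that can count a swap of two adjacent characters either
as one change or as two. The function `edit_distance` is a hypothetical
helper for illustration and is not the suggester's own implementation.
Python strings are sequences of code points, so lengths here are measured
in code points rather than bytes.

[source,python]
--------------------------------------------------
def edit_distance(a, b, transpositions=True):
    """Edit distance between a and b using dynamic programming.

    With transpositions=True an adjacent swap costs one change (optimal
    string alignment); otherwise it takes two single-character changes.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (transpositions and i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                # swap of two adjacent characters counted as one change
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("nirvana", "nrivana", transpositions=True))   # 1
print(edit_distance("nirvana", "nrivana", transpositions=False))  # 2
--------------------------------------------------

Under this model, a typo such as `nrivana` is within a fuzziness of `1`
only when transpositions are counted as a single change.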