[[search-suggesters-completion]]
=== Completion Suggester

NOTE: In order to understand the format of suggestions, please
read the <<search-suggesters>> page first.

The `completion` suggester provides auto-complete/search-as-you-type
functionality. This is a navigational feature to guide users to
relevant results as they are typing, improving search precision.
It is not meant for spell correction or did-you-mean functionality
like the `term` or `phrase` suggesters.

Ideally, auto-complete should keep pace with the user's typing and
give instant feedback on what has been typed so far. Hence, the
`completion` suggester is optimized for speed.
The suggester uses data structures that enable fast lookups,
but are costly to build and are stored in-memory.

[[completion-suggester-mapping]]
==== Mapping

To use this feature, specify a special mapping for this field,
which indexes the field values for fast completions.

[source,js]
--------------------------------------------------
PUT music/song/_mapping
{
  "song" : {
    "properties" : {
      ...
      "suggest" : {
        "type" : "completion"
      }
    }
  }
}
--------------------------------------------------

Mapping supports the following parameters:

`analyzer`::
    The index analyzer to use, defaults to `simple`.
    In case you are wondering why we did not opt for the `standard`
    analyzer: we want behaviour that is easy to understand here. If you
    index the field content `At the Drive-in` with the `standard`
    analyzer, you will not get any suggestions for `a`, nor for `d`
    (the first non-stopword).

`search_analyzer`::
    The search analyzer to use, defaults to the value of `analyzer`.

`preserve_separators`::
    Preserves the separators, defaults to `true`.
    If disabled, you could find a field starting with `Foo Fighters` if you
    suggest for `foof`.

`preserve_position_increments`::
    Enables position increments, defaults to `true`.
    If disabled and using a stopwords analyzer, you could get a
    field starting with `The Beatles` if you suggest for `b`. *Note*: If you
    are able to enrich your data, you can achieve the same result with the
    `simple` analyzer by indexing two inputs, `Beatles` and `The Beatles`.

`max_input_length`::
    Limits the length of a single input, defaults to `50` UTF-16 code points.
    This limit is only used at index time to reduce the total number of
    characters per input string, which prevents massive inputs from
    bloating the underlying data structure. Most use cases won't be affected
    by the default value, since prefix completions seldom grow beyond
    a handful of characters.

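As an illustration of the parameters above, the following sketch spells out
every mapping parameter explicitly. The values shown are simply the documented
defaults, so this mapping should behave the same as one that only sets `type`.
Change individual values to suit your data.

[source,js]
--------------------------------------------------
PUT music/song/_mapping
{
  "song" : {
    "properties" : {
      "suggest" : {
        "type" : "completion",
        "analyzer" : "simple",
        "search_analyzer" : "simple",
        "preserve_separators" : true,
        "preserve_position_increments" : true,
        "max_input_length" : 50
      }
    }
  }
}
--------------------------------------------------
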
[[indexing]]
==== Indexing

You index suggestions like any other field. A suggestion is made of an
`input` and an optional `weight` attribute. An `input` is the expected
text to be matched by a suggestion query and the `weight` determines how
the suggestions will be scored. Indexing a suggestion is as follows:

[source,js]
--------------------------------------------------
PUT music/song/1?refresh=true
{
  "suggest" : {
    "input": [ "Nevermind", "Nirvana" ],
    "weight" : 34
  }
}
--------------------------------------------------

The following parameters are supported:

`input`::
    The input to store. This can be an array of strings or just
    a string. This field is mandatory.

`weight`::
    A positive integer or a string containing a positive integer,
    which defines a weight and allows you to rank your suggestions.
    This field is optional.

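As a sketch of both parameters, the request below indexes two inputs for one
document and passes the weight as a string. The document ID `2`, the inputs and
the weight are illustrative values only. Indexing both `The Beatles` and
`Beatles` is the data-enrichment approach mentioned under
`preserve_position_increments` in <<completion-suggester-mapping>>.

[source,js]
--------------------------------------------------
PUT music/song/2?refresh=true
{
  "suggest" : {
    "input" : [ "The Beatles", "Beatles" ],
    "weight" : "20"
  }
}
--------------------------------------------------
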
You can index multiple suggestions for a document as follows:

[source,js]
--------------------------------------------------
PUT music/song/1?refresh=true
{
  "suggest" : [
    {
      "input": "Nevermind",
      "weight" : 10
    },
    {
      "input": "Nirvana",
      "weight" : 3
    }
  ]
}
--------------------------------------------------

You can use the following shorthand form. Note that you cannot specify
a weight for suggestions in the shorthand form.

[source,js]
--------------------------------------------------
{
  "suggest" : [ "Nevermind", "Nirvana" ]
}
--------------------------------------------------

[[querying]]
==== Querying

Suggesting works as usual, except that you have to specify the suggest
type as `completion`. Suggestions are near real-time, which means
new suggestions become visible after a <<indices-refresh,refresh>>, and
deleted documents are never shown.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "prefix" : "n",
    "completion" : {
      "field" : "suggest"
    }
  }
}

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "Nirvana",
      "score" : 34.0
    } ]
  } ]
}
--------------------------------------------------

The configured weight for a suggestion is returned as `score`.
The `text` field uses the `input` of your indexed suggestion.

Suggestions are document-oriented, so you can specify fields to be
returned as part of the suggestion payload. All field types (`string`,
`numeric`, `date`, etc.) are supported.

For example, if you index a "title" field along with the suggestion
as follows:

[source,js]
--------------------------------------------------
POST music/song
{
  "suggest" : "Nirvana",
  "title" : "Nevermind"
}
--------------------------------------------------

You can get the "title" as part of the suggestion
payload by specifying it as a `payload`:

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "prefix" : "n",
    "completion" : {
      "field" : "suggest",
      "payload" : [ "title" ] <1>
    }
  }
}

{
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "song-suggest" : [ {
    "text" : "n",
    "offset" : 0,
    "length" : 1,
    "options" : [ {
      "text" : "Nirvana",
      "score" : 34.0,
      "payload" : {
        "title" : [ "Nevermind" ]
      }
    } ]
  } ]
}
--------------------------------------------------
<1> The fields to be returned as part of each suggestion payload.

The basic completion suggester query supports the following parameters:

`field`:: The name of the field on which to run the query (required).
`size`:: The number of suggestions to return (defaults to `5`).
`payload`:: The name of the field or field name array to be returned
            as payload (defaults to no fields).

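The following sketch combines the three parameters in one request. The `size`
of `3` is an arbitrary illustrative value, and `payload` is given as a single
field name rather than an array. It assumes the `title` field indexed in the
example above.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "prefix" : "n",
    "completion" : {
      "field" : "suggest",
      "size" : 3,
      "payload" : "title"
    }
  }
}
--------------------------------------------------
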
NOTE: The completion suggester considers all documents in the index.
See <<suggester-context>> for an explanation of how to query a subset of
documents instead.

NOTE: Specifying `payload` fields incurs an additional search performance
hit. The `payload` fields are retrieved eagerly (single pass) for top
suggestions at the shard level using field data or from doc values.

[[fuzzy]]
==== Fuzzy queries

The completion suggester also supports fuzzy queries, which means
you can have a typo in your search and still get results back.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "prefix" : "n",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "fuzziness" : 2
      }
    }
  }
}
--------------------------------------------------

Suggestions that share the longest prefix with the query `prefix` are
scored higher.

The fuzzy query can take specific fuzzy parameters.
The following parameters are supported:

[horizontal]
`fuzziness`::
    The fuzziness factor, defaults to `AUTO`.
    See <<fuzziness>> for allowed settings.

`transpositions`::
    If set to `true`, transpositions are counted
    as one change instead of two, defaults to `true`.

`min_length`::
    Minimum length of the input before fuzzy
    suggestions are returned, defaults to `3`.

`prefix_length`::
    Minimum length of the input, which is not
    checked for fuzzy alternatives, defaults to `1`.

`unicode_aware`::
    If `true`, all measurements (like fuzzy edit
    distance, transpositions, and lengths) are
    measured in Unicode code points instead of
    in bytes. This is slightly slower than raw
    bytes, so it is set to `false` by default.

NOTE: If you want to stick with the default values, but
      still use fuzzy, you can either use `fuzzy: {}`
      or `fuzzy: true`.

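As a sketch of how the fuzzy parameters fit together, the request below sets
each of them explicitly. The prefix `nrivana` is an illustrative misspelling of
`Nirvana` with two adjacent letters swapped. With `transpositions` enabled that
swap counts as a single edit, so a `fuzziness` of `1` should be enough to reach
the suggestion, assuming the document from <<indexing>> is present. The other
values are the documented defaults.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "prefix" : "nrivana",
    "completion" : {
      "field" : "suggest",
      "fuzzy" : {
        "fuzziness" : 1,
        "transpositions" : true,
        "min_length" : 3,
        "prefix_length" : 1,
        "unicode_aware" : false
      }
    }
  }
}
--------------------------------------------------
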
[[regex]]
==== Regex queries

The completion suggester also supports regex queries, meaning
you can express a prefix as a regular expression.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "regex" : "n[ever|i]r",
    "completion" : {
      "field" : "suggest"
    }
  }
}
--------------------------------------------------

The regex query can take specific regex parameters.
The following parameters are supported:

[horizontal]
`flags`::
    Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`,
    `EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. See <<query-dsl-regexp-query, regexp-syntax>>
    for their meaning.

`max_determinized_states`::
    Regular expressions are dangerous because it's easy to accidentally
    create an innocuous looking one that requires an exponential number of
    internal determinized automaton states (and corresponding RAM and CPU)
    for Lucene to execute. Lucene prevents these using the
    `max_determinized_states` setting (defaults to 10000). You can raise
    this limit to allow more complex regular expressions to execute.

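
The following sketch shows one way the regex parameters might be passed. It
assumes they are nested in a `regex` object inside `completion`, mirroring how
the `fuzzy` object carries the fuzzy parameters. That placement is an
assumption, not something stated above, so verify it against your version.
The `flags` value is the documented default, and `20000` is an arbitrary
illustrative value that raises the limit.

[source,js]
--------------------------------------------------
POST music/_suggest?pretty
{
  "song-suggest" : {
    "regex" : "n[ever|i]r",
    "completion" : {
      "field" : "suggest",
      "regex" : {
        "flags" : "ALL",
        "max_determinized_states" : 20000
      }
    }
  }
}
--------------------------------------------------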