OpenSearch/docs/reference/query-dsl/percolator-query.asciidoc

424 lines
13 KiB
Plaintext

[[query-dsl-percolator-query]]
=== Percolator Query
Traditionally you design documents based on your data, store them into an index, and then define queries via the search API
in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an
index and then you use the `percolator` query to search for the queries which match a specified document (or documents).
The reason that queries can be stored comes from the fact that in Elasticsearch both documents and queries are defined in
JSON. This allows you to embed queries into documents via the index API. Elasticsearch can extract the query from a
document and make it available for search via the `percolator` query. Since documents are also defined as JSON,
you can define a document in the `percolator` query.
[IMPORTANT]
=====================================
Fields referred to in a percolator query must *already* exist in the mapping
associated with the index used for percolation. In order to make sure these fields exist,
add or update a mapping via the <<indices-create-index,create index>> or <<indices-put-mapping,put mapping>> APIs.
=====================================
[float]
=== Sample Usage
Create an index with a mapping for the field `message`:
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my-index' -d '{
"mappings": {
"my-type": {
"properties": {
"message": {
"type": "string"
}
}
}
}
}'
--------------------------------------------------
Register a query in the percolator:
[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
"query" : {
"match" : {
"message" : "bonsai tree"
}
}
}'
--------------------------------------------------
Match a document to the registered percolator queries:
[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/my-index/_search' -d '{
"query" : {
"percolator" : {
"document_type" : "my-type",
"document" : {
"message" : "A new bonsai tree in the office"
}
}
}
}'
--------------------------------------------------
The above request will yield the following response:
[source,js]
--------------------------------------------------
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": [
{ <1>
"_index": "my-index",
"_type": ".percolator",
"_id": "1",
"_score": 0,
"_source": {
"query": {
"match": {
"message": "bonsai tree"
}
}
}
}
]
}
}
--------------------------------------------------
<1> The percolate query with id `1` matches our document.
[float]
=== Indexing Percolator Queries
Percolate queries are stored as documents in a specific format and in an arbitrary index under a reserved type with the
name `.percolator`. The query itself is placed as is in a JSON object under the top level field `query`.
[source,js]
--------------------------------------------------
{
"query" : {
"match" : {
"field" : "value"
}
}
}
--------------------------------------------------
Since this is just an ordinary document, any field can be added to this document. This can be useful later on to only
percolate documents by specific queries.
[source,js]
--------------------------------------------------
{
"query" : {
"match" : {
"field" : "value"
}
},
"priority" : "high"
}
--------------------------------------------------
Just as with any other type, the `.percolator` type has a mapping, which you can configure via the mappings APIs.
The default percolate mapping doesn't index the query field, only stores it.
Because `.percolate` is a type it also has a mapping. By default the following mapping is active:
[source,js]
--------------------------------------------------
{
".percolator" : {
"properties" : {
"query" : {
"type" : "percolator"
}
}
}
}
--------------------------------------------------
If needed, this mapping can be modified with the update mapping API.
In order to un-register a percolate query the delete API can be used. So if the previous added query needs to be deleted
the following delete requests needs to be executed:
[source,js]
--------------------------------------------------
curl -XDELETE localhost:9200/my-index/.percolator/1
--------------------------------------------------
[float]
==== Parameters
The following parameters are required when percolating a document:
[horizontal]
`document_type`:: The type / mapping of the document being percolated. This is parameter is always required.
`document`:: The source of the document being percolated.
Instead of specifying a the source of the document being percolated, the source can also be retrieved from an already
stored document. The `percolator` query will then internally execute a get request to fetch that document.
In that case the `document` parameter can be substituted with the following parameters:
[horizontal]
`index`:: The index the document resides in. This is a required parameter.
`type`:: The type of the document to fetch. This is a required parameter.
`id`:: The id of the document to fetch. This is a required parameter.
`routing`:: Optionally, routing to be used to fetch document to percolate.
`preference`:: Optionally, preference to be used to fetch document to percolate.
`version`:: Optionally, the expected version of the document to be fetched.
[float]
==== Dedicated Percolator Index
Percolate queries can be added to any index. Instead of adding percolate queries to the index the data resides in,
these queries can also be added to a dedicated index. The advantage of this is that this dedicated percolator index
can have its own index settings (For example the number of primary and replica shards). If you choose to have a dedicated
percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index.
Otherwise percolate queries can be parsed incorrectly.
[float]
==== Percolating an Existing Document
In order to percolate a newly indexed document, the `percolator` query can be used. Based on the response
from an index request, the `_id` and other meta information can be used to immediately percolate the newly added
document.
[float]
===== Example
Based on the previous example.
Index the document we want to percolate:
[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/message/1" -d'
{
"message" : "A new bonsai tree in the office"
}'
--------------------------------------------------
Index response:
[source,js]
--------------------------------------------------
{
"_index": "my-index",
"_type": "message",
"_id": "1",
"_version": 1,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true
}
--------------------------------------------------
Percolating an existing document, using the index response as basis to build to new search request:
[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/my-index/_search" -d'
{
"query" : {
"percolator" : {
"document_type" : "my-type",
"index" : "my-index",
"type" : "message",
"id" : "1",
"version" : 1 <1>
}
}
}'
--------------------------------------------------
<1> The version is optional, but useful in certain cases. We can then ensure that we are try to percolate
the document we just have indexed. A change may be made after we have indexed, and if that is the
case the then the search request would fail with a version conflict error.
The search response returned is identical as in the previous example.
[float]
==== Percolator and highlighting
The percolator query is handled in a special way when it comes to highlighting. The percolator queries hits are used
to highlight the document that is provided in the `percolator` query. Whereas with regular highlighting the query in
the search request is used to highlight the hits.
[float]
===== Example
This example is based on the mapping of the first example.
Add a percolator query:
[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/.percolator/1" -d'
{
"query" : {
"match" : {
"message" : "brown fox"
}
}
}'
--------------------------------------------------
Add another percolator query:
[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/.percolator/2" -d'
{
"query" : {
"match" : {
"message" : "lazy dog"
}
}
}'
--------------------------------------------------
Execute a search request with `percolator` and highlighting enabled:
[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/my-index/_search" -d'
{
"query" : {
"percolator" : {
"document_type" : "my-type",
"document" : {
"message" : "The quick brown fox jumps over the lazy dog"
}
}
},
"highlight": {
"fields": {
"message": {}
}
}
}'
--------------------------------------------------
This will yield the following response.
[source,js]
--------------------------------------------------
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": [
{
"_index": "my-index",
"_type": ".percolator",
"_id": "2",
"_score": 0,
"_source": {
"query": {
"match": {
"message": "lazy dog"
}
}
},
"highlight": {
"message": [
"The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" <1>
]
}
},
{
"_index": "my-index",
"_type": ".percolator",
"_id": "1",
"_score": 0,
"_source": {
"query": {
"match": {
"message": "brown fox"
}
}
},
"highlight": {
"message": [
"The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" <1>
]
}
}
]
}
}
--------------------------------------------------
<1> Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting
the document defined in the `percolator` query.
[float]
==== How it Works Under the Hood
When indexing a document that contains a query in an index and the `.percolator` type, the query part of the documents gets
parsed into a Lucene query and is kept in memory until that percolator document is removed or the index containing the
`.percolator` type gets removed. So, all the active percolator queries are kept in memory.
At search time, the document specified in the request gets parsed into a Lucene document and is stored in a in-memory
Lucene index. This in-memory index can just hold this one document and it is optimized for that. Then all the queries
that are registered to the index that the searh request is targeted for, are going to be executed on this single document
in-memory index. This happens on each shard the search request needs to execute.
By using `routing` or additional queries the amount of percolator queries that need to be executed can be reduced and thus
the time the search API needs to run can be decreased.
[float]
==== Important Notes
Because the percolator query is processing one document at a time, it doesn't support queries and filters that run
against child documents such as `has_child` and `has_parent`.
The percolator doesn't work with queries like `template` and `geo_shape` queries when these queries fetch documents
to substitute parts of the query. The reason is that the percolator stores the query terms during indexing in order to
speedup percolating in certain cases and this doesn't work if part of the query is defined in another document.
There is no way to know for the percolator to know if an external document has changed and even if this was the case the
percolator query has to be reindexed.
The `wildcard` and `regexp` query natively use a lot of memory and because the percolator keeps the queries into memory
this can easily take up the available memory in the heap space. If possible try to use a `prefix` query or ngramming to
achieve the same result (with way less memory being used).
[float]
==== Forcing Unmapped Fields to be Handled as Strings
In certain cases it is unknown what kind of percolator queries do get registered, and if no field mapping exists for fields
that are referred by percolator queries then adding a percolator query fails. This means the mapping needs to be updated
to have the field with the appropriate settings, and then the percolator query can be added. But sometimes it is sufficient
if all unmapped fields are handled as if these were default string fields. In those cases one can configure the
`index.percolator.map_unmapped_fields_as_string` setting to `true` (default to `false`) and then if a field referred in
a percolator query does not exist, it will be handled as a default string field so that adding the percolator query doesn't
fail.