424 lines
13 KiB
Plaintext
424 lines
13 KiB
Plaintext
|
[[query-dsl-percolator-query]]
|
||
|
=== Percolator Query
|
||
|
|
||
|
Traditionally you design documents based on your data, store them into an index, and then define queries via the search API
|
||
|
in order to retrieve these documents. The percolator works in the opposite direction. First you store queries into an
|
||
|
index and then you use the `percolator` query to search for the queries which match a specified document (or documents).
|
||
|
|
||
|
The reason that queries can be stored comes from the fact that in Elasticsearch both documents and queries are defined in
|
||
|
JSON. This allows you to embed queries into documents via the index API. Elasticsearch can extract the query from a
|
||
|
document and make it available for search via the `percolator` query. Since documents are also defined as JSON,
|
||
|
you can define a document in the `percolator` query.
|
||
|
|
||
|
[IMPORTANT]
|
||
|
=====================================
|
||
|
|
||
|
Fields referred to in a percolator query must *already* exist in the mapping
|
||
|
associated with the index used for percolation. In order to make sure these fields exist,
|
||
|
add or update a mapping via the <<indices-create-index,create index>> or <<indices-put-mapping,put mapping>> APIs.
|
||
|
|
||
|
=====================================
|
||
|
|
||
|
[float]
|
||
|
=== Sample Usage
|
||
|
|
||
|
Create an index with a mapping for the field `message`:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XPUT 'localhost:9200/my-index' -d '{
|
||
|
"mappings": {
|
||
|
"my-type": {
|
||
|
"properties": {
|
||
|
"message": {
|
||
|
"type": "string"
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Register a query in the percolator:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XPUT 'localhost:9200/my-index/.percolator/1' -d '{
|
||
|
"query" : {
|
||
|
"match" : {
|
||
|
"message" : "bonsai tree"
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Match a document to the registered percolator queries:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XGET 'localhost:9200/my-index/_search' -d '{
|
||
|
"query" : {
|
||
|
"percolator" : {
|
||
|
"document_type" : "my-type",
|
||
|
"document" : {
|
||
|
"message" : "A new bonsai tree in the office"
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
The above request will yield the following response:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"took": 5,
|
||
|
"timed_out": false,
|
||
|
"_shards": {
|
||
|
"total": 5,
|
||
|
"successful": 5,
|
||
|
"failed": 0
|
||
|
},
|
||
|
"hits": {
|
||
|
"total": 1,
|
||
|
"max_score": 0,
|
||
|
"hits": [
|
||
|
{ <1>
|
||
|
"_index": "my-index",
|
||
|
"_type": ".percolator",
|
||
|
"_id": "1",
|
||
|
"_score": 0,
|
||
|
"_source": {
|
||
|
"query": {
|
||
|
"match": {
|
||
|
"message": "bonsai tree"
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
]
|
||
|
}
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
<1> The percolate query with id `1` matches our document.
|
||
|
|
||
|
[float]
|
||
|
=== Indexing Percolator Queries
|
||
|
|
||
|
Percolate queries are stored as documents in a specific format and in an arbitrary index under a reserved type with the
|
||
|
name `.percolator`. The query itself is placed as is in a JSON object under the top level field `query`.
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"query" : {
|
||
|
"match" : {
|
||
|
"field" : "value"
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Since this is just an ordinary document, any field can be added to this document. This can be useful later on to only
|
||
|
percolate documents by specific queries.
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"query" : {
|
||
|
"match" : {
|
||
|
"field" : "value"
|
||
|
}
|
||
|
},
|
||
|
"priority" : "high"
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Just as with any other type, the `.percolator` type has a mapping, which you can configure via the mappings APIs.
|
||
|
The default percolate mapping doesn't index the query field, only stores it.
|
||
|
|
||
|
Because `.percolate` is a type it also has a mapping. By default the following mapping is active:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
".percolator" : {
|
||
|
"properties" : {
|
||
|
"query" : {
|
||
|
"type" : "percolator"
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
If needed, this mapping can be modified with the update mapping API.
|
||
|
|
||
|
In order to un-register a percolate query the delete API can be used. So if the previous added query needs to be deleted
|
||
|
the following delete requests needs to be executed:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XDELETE localhost:9200/my-index/.percolator/1
|
||
|
--------------------------------------------------
|
||
|
|
||
|
[float]
|
||
|
==== Parameters
|
||
|
|
||
|
The following parameters are required when percolating a document:
|
||
|
|
||
|
[horizontal]
|
||
|
`document_type`:: The type / mapping of the document being percolated. This is parameter is always required.
|
||
|
`document`:: The source of the document being percolated.
|
||
|
|
||
|
Instead of specifying a the source of the document being percolated, the source can also be retrieved from an already
|
||
|
stored document. The `percolator` query will then internally execute a get request to fetch that document.
|
||
|
|
||
|
In that case the `document` parameter can be substituted with the following parameters:
|
||
|
|
||
|
[horizontal]
|
||
|
`index`:: The index the document resides in. This is a required parameter.
|
||
|
`type`:: The type of the document to fetch. This is a required parameter.
|
||
|
`id`:: The id of the document to fetch. This is a required parameter.
|
||
|
`routing`:: Optionally, routing to be used to fetch document to percolate.
|
||
|
`preference`:: Optionally, preference to be used to fetch document to percolate.
|
||
|
`version`:: Optionally, the expected version of the document to be fetched.
|
||
|
|
||
|
[float]
|
||
|
==== Dedicated Percolator Index
|
||
|
|
||
|
Percolate queries can be added to any index. Instead of adding percolate queries to the index the data resides in,
|
||
|
these queries can also be added to a dedicated index. The advantage of this is that this dedicated percolator index
|
||
|
can have its own index settings (For example the number of primary and replica shards). If you choose to have a dedicated
|
||
|
percolate index, you need to make sure that the mappings from the normal index are also available on the percolate index.
|
||
|
Otherwise percolate queries can be parsed incorrectly.
|
||
|
|
||
|
[float]
|
||
|
==== Percolating an Existing Document
|
||
|
|
||
|
In order to percolate a newly indexed document, the `percolator` query can be used. Based on the response
|
||
|
from an index request, the `_id` and other meta information can be used to immediately percolate the newly added
|
||
|
document.
|
||
|
|
||
|
[float]
|
||
|
===== Example
|
||
|
|
||
|
Based on the previous example.
|
||
|
|
||
|
Index the document we want to percolate:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XPUT "http://localhost:9200/my-index/message/1" -d'
|
||
|
{
|
||
|
"message" : "A new bonsai tree in the office"
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Index response:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"_index": "my-index",
|
||
|
"_type": "message",
|
||
|
"_id": "1",
|
||
|
"_version": 1,
|
||
|
"_shards": {
|
||
|
"total": 2,
|
||
|
"successful": 1,
|
||
|
"failed": 0
|
||
|
},
|
||
|
"created": true
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Percolating an existing document, using the index response as basis to build to new search request:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XGET "http://localhost:9200/my-index/_search" -d'
|
||
|
{
|
||
|
"query" : {
|
||
|
"percolator" : {
|
||
|
"document_type" : "my-type",
|
||
|
"index" : "my-index",
|
||
|
"type" : "message",
|
||
|
"id" : "1",
|
||
|
"version" : 1 <1>
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
<1> The version is optional, but useful in certain cases. We can then ensure that we are try to percolate
|
||
|
the document we just have indexed. A change may be made after we have indexed, and if that is the
|
||
|
case the then the search request would fail with a version conflict error.
|
||
|
|
||
|
The search response returned is identical as in the previous example.
|
||
|
|
||
|
[float]
|
||
|
==== Percolator and highlighting
|
||
|
|
||
|
The percolator query is handled in a special way when it comes to highlighting. The percolator queries hits are used
|
||
|
to highlight the document that is provided in the `percolator` query. Whereas with regular highlighting the query in
|
||
|
the search request is used to highlight the hits.
|
||
|
|
||
|
[float]
|
||
|
===== Example
|
||
|
|
||
|
This example is based on the mapping of the first example.
|
||
|
|
||
|
Add a percolator query:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XPUT "http://localhost:9200/my-index/.percolator/1" -d'
|
||
|
{
|
||
|
"query" : {
|
||
|
"match" : {
|
||
|
"message" : "brown fox"
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Add another percolator query:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XPUT "http://localhost:9200/my-index/.percolator/2" -d'
|
||
|
{
|
||
|
"query" : {
|
||
|
"match" : {
|
||
|
"message" : "lazy dog"
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
Execute a search request with `percolator` and highlighting enabled:
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
curl -XGET "http://localhost:9200/my-index/_search" -d'
|
||
|
{
|
||
|
"query" : {
|
||
|
"percolator" : {
|
||
|
"document_type" : "my-type",
|
||
|
"document" : {
|
||
|
"message" : "The quick brown fox jumps over the lazy dog"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"highlight": {
|
||
|
"fields": {
|
||
|
"message": {}
|
||
|
}
|
||
|
}
|
||
|
}'
|
||
|
--------------------------------------------------
|
||
|
|
||
|
This will yield the following response.
|
||
|
|
||
|
[source,js]
|
||
|
--------------------------------------------------
|
||
|
{
|
||
|
"took": 14,
|
||
|
"timed_out": false,
|
||
|
"_shards": {
|
||
|
"total": 5,
|
||
|
"successful": 5,
|
||
|
"failed": 0
|
||
|
},
|
||
|
"hits": {
|
||
|
"total": 2,
|
||
|
"max_score": 0,
|
||
|
"hits": [
|
||
|
{
|
||
|
"_index": "my-index",
|
||
|
"_type": ".percolator",
|
||
|
"_id": "2",
|
||
|
"_score": 0,
|
||
|
"_source": {
|
||
|
"query": {
|
||
|
"match": {
|
||
|
"message": "lazy dog"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"highlight": {
|
||
|
"message": [
|
||
|
"The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" <1>
|
||
|
]
|
||
|
}
|
||
|
},
|
||
|
{
|
||
|
"_index": "my-index",
|
||
|
"_type": ".percolator",
|
||
|
"_id": "1",
|
||
|
"_score": 0,
|
||
|
"_source": {
|
||
|
"query": {
|
||
|
"match": {
|
||
|
"message": "brown fox"
|
||
|
}
|
||
|
}
|
||
|
},
|
||
|
"highlight": {
|
||
|
"message": [
|
||
|
"The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" <1>
|
||
|
]
|
||
|
}
|
||
|
}
|
||
|
]
|
||
|
}
|
||
|
}
|
||
|
--------------------------------------------------
|
||
|
|
||
|
<1> Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting
|
||
|
the document defined in the `percolator` query.
|
||
|
|
||
|
[float]
|
||
|
==== How it Works Under the Hood
|
||
|
|
||
|
When indexing a document that contains a query in an index and the `.percolator` type, the query part of the documents gets
|
||
|
parsed into a Lucene query and is kept in memory until that percolator document is removed or the index containing the
|
||
|
`.percolator` type gets removed. So, all the active percolator queries are kept in memory.
|
||
|
|
||
|
At search time, the document specified in the request gets parsed into a Lucene document and is stored in a in-memory
|
||
|
Lucene index. This in-memory index can just hold this one document and it is optimized for that. Then all the queries
|
||
|
that are registered to the index that the searh request is targeted for, are going to be executed on this single document
|
||
|
in-memory index. This happens on each shard the search request needs to execute.
|
||
|
|
||
|
By using `routing` or additional queries the amount of percolator queries that need to be executed can be reduced and thus
|
||
|
the time the search API needs to run can be decreased.
|
||
|
|
||
|
[float]
|
||
|
==== Important Notes
|
||
|
|
||
|
Because the percolator query is processing one document at a time, it doesn't support queries and filters that run
|
||
|
against child documents such as `has_child` and `has_parent`.
|
||
|
|
||
|
The percolator doesn't work with queries like `template` and `geo_shape` queries when these queries fetch documents
|
||
|
to substitute parts of the query. The reason is that the percolator stores the query terms during indexing in order to
|
||
|
speedup percolating in certain cases and this doesn't work if part of the query is defined in another document.
|
||
|
There is no way to know for the percolator to know if an external document has changed and even if this was the case the
|
||
|
percolator query has to be reindexed.
|
||
|
|
||
|
The `wildcard` and `regexp` query natively use a lot of memory and because the percolator keeps the queries into memory
|
||
|
this can easily take up the available memory in the heap space. If possible try to use a `prefix` query or ngramming to
|
||
|
achieve the same result (with way less memory being used).
|
||
|
|
||
|
[float]
|
||
|
==== Forcing Unmapped Fields to be Handled as Strings
|
||
|
|
||
|
In certain cases it is unknown what kind of percolator queries do get registered, and if no field mapping exists for fields
|
||
|
that are referred by percolator queries then adding a percolator query fails. This means the mapping needs to be updated
|
||
|
to have the field with the appropriate settings, and then the percolator query can be added. But sometimes it is sufficient
|
||
|
if all unmapped fields are handled as if these were default string fields. In those cases one can configure the
|
||
|
`index.percolator.map_unmapped_fields_as_string` setting to `true` (default to `false`) and then if a field referred in
|
||
|
a percolator query does not exist, it will be handled as a default string field so that adding the percolator query doesn't
|
||
|
fail.
|