[[query-dsl-percolator-query]]
=== Percolator Query

The `percolator` query matches a document against queries stored in an
index. The `percolator` query itself contains the document that is
used as the query to match against the stored queries.

[float]
==== Sample Usage

Create an index with two mappings:

[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index" -d'
{
    "mappings": {
        "doctype": {
            "properties": {
                "message": {
                    "type": "string"
                }
            }
        },
        "queries": {
            "properties": {
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}'
--------------------------------------------------

The `doctype` mapping is used to preprocess the document defined in
the `percolator` query before it is indexed into a temporary index.

The `queries` mapping is used to index the query documents. The `query`
field holds a JSON object that represents an actual Elasticsearch query.
The `query` field is configured to use the
<<percolator,percolator field type>>. This field type understands
the query DSL and stores the query in such a way that it can later
be used to match documents defined in the `percolator` query.

Register a query in the percolator:

[source,js]
--------------------------------------------------
curl -XPUT 'localhost:9200/my-index/queries/1' -d '{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}'
--------------------------------------------------
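
The index-creation and registration requests above can also be issued from a program. Below is a minimal sketch using only Python's standard `urllib.request`. It assumes a node at `http://localhost:9200` as in the curl examples, and `es_request` is a hypothetical helper, not part of any Elasticsearch client. The sketch only builds the requests; passing one to `urllib.request.urlopen` would send it.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed node address, as in the curl examples


def es_request(method, path, body):
    """Build (but do not send) an HTTP request with a JSON body.

    Hypothetical helper for illustration only.
    """
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


# Create the index with the `doctype` and `queries` mappings.
create_index = es_request("PUT", "/my-index", {
    "mappings": {
        "doctype": {"properties": {"message": {"type": "string"}}},
        "queries": {"properties": {"query": {"type": "percolator"}}},
    }
})

# Register query `1` under the `queries` type.
register_query = es_request("PUT", "/my-index/queries/1", {
    "query": {"match": {"message": "bonsai tree"}}
})

# Sending is left to the caller, e.g. urllib.request.urlopen(register_query)
```

Keeping request construction separate from sending makes the bodies easy to inspect before anything reaches the cluster.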

Match a document to the registered percolator queries:

[source,js]
--------------------------------------------------
curl -XGET 'localhost:9200/my-index/_search' -d '{
    "query" : {
        "percolator" : {
            "field" : "query",
            "document_type" : "doctype",
            "document" : {
                "message" : "A new bonsai tree in the office"
            }
        }
    }
}'
--------------------------------------------------

The above request will yield the following response:

[source,js]
--------------------------------------------------
{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5716521,
    "hits": [
      { <1>
        "_index": "my-index",
        "_type": "queries",
        "_id": "1",
        "_score": 0.5716521,
        "_source": {
          "query": {
            "match": {
              "message": "bonsai tree"
            }
          }
        }
      }
    ]
  }
}
--------------------------------------------------

<1> The stored query with id `1` matches our document.
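
On the client side, the matching stored queries are simply the search hits. Below is a minimal sketch that collects the `_id` of every matched query from a response shaped like the one above. The response is abbreviated to the fields used, and `matched_query_ids` is a hypothetical helper name.

```python
import json

# Response from the request above, abbreviated to the fields used here.
response_text = """
{
  "hits": {
    "total": 1,
    "hits": [
      {"_index": "my-index", "_type": "queries", "_id": "1",
       "_score": 0.5716521,
       "_source": {"query": {"match": {"message": "bonsai tree"}}}}
    ]
  }
}
"""


def matched_query_ids(response):
    """Return the ids of the stored queries that matched the percolated document."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


response = json.loads(response_text)
print(matched_query_ids(response))
```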

[float]
==== Parameters

The following parameters are required when percolating a document:

[horizontal]
`field`:: The field of type `percolator` that holds the indexed queries. This is a required parameter.
`document_type`:: The type (mapping) of the document being percolated. This is a required parameter.
`document`:: The source of the document being percolated.

Instead of specifying the source of the document being percolated, the source can also be retrieved from an already
stored document. The `percolator` query will then internally execute a get request to fetch that document.

In that case, the `document` parameter can be substituted with the following parameters:

[horizontal]
`index`:: The index the document resides in. This is a required parameter.
`type`:: The type of the document to fetch. This is a required parameter.
`id`:: The id of the document to fetch. This is a required parameter.
`routing`:: Optionally, the routing to use when fetching the document to percolate.
`preference`:: Optionally, the preference to use when fetching the document to percolate.
`version`:: Optionally, the expected version of the document to fetch.
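
The parameter rules above can be summarized in code. The sketch below uses a hypothetical helper, `percolator_query`, that builds the query body and enforces the rules as this page describes them. `field` and `document_type` are always required. `document` and the fetch parameters (`index`, `type`, `id`, plus the optional `routing`, `preference` and `version`) are treated as alternatives. This mirrors the documentation only; Elasticsearch's own validation may differ in detail.

```python
def percolator_query(field, document_type, document=None, *, index=None,
                     doc_type=None, doc_id=None, routing=None,
                     preference=None, version=None):
    """Build a `percolator` query body (hypothetical helper)."""
    if not field or not document_type:
        raise ValueError("`field` and `document_type` are required")
    body = {"field": field, "document_type": document_type}

    # Parameters that only make sense when the document is fetched by id.
    fetch_params = {"index": index, "type": doc_type, "id": doc_id,
                    "routing": routing, "preference": preference,
                    "version": version}
    given = {k: v for k, v in fetch_params.items() if v is not None}

    if document is not None:
        if given:
            raise ValueError("`document` cannot be combined with "
                             + ", ".join(sorted(given)))
        body["document"] = document
    else:
        missing = [k for k in ("index", "type", "id") if k not in given]
        if missing:
            raise ValueError("missing required parameter(s): " + ", ".join(missing))
        body.update(given)
    return {"query": {"percolator": body}}


inline = percolator_query("query", "doctype",
                          {"message": "A new bonsai tree in the office"})
fetched = percolator_query("query", "doctype", index="my-index",
                           doc_type="message", doc_id="1", version=1)
```

Raising early on a half-specified fetch (for example `index` without `id`) surfaces the mistake before a request is sent.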

[float]
==== Percolating an Existing Document

The `percolator` query can also be used to percolate a newly indexed document. The `_id` and other metadata
from the response to an index request can be used to percolate the newly added document immediately.

[float]
===== Example

This example builds on the previous one.

Index the document we want to percolate:

[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/message/1" -d'
{
    "message" : "A new bonsai tree in the office"
}'
--------------------------------------------------

Index response:

[source,js]
--------------------------------------------------
{
  "_index": "my-index",
  "_type": "message",
  "_id": "1",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "created": true
}
--------------------------------------------------

Percolate the existing document, using the index response as the basis for the new search request:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/my-index/_search" -d'
{
    "query" : {
        "percolator" : {
            "field": "query",
            "document_type" : "doctype",
            "index" : "my-index",
            "type" : "message",
            "id" : "1",
            "version" : 1 <1>
        }
    }
}'
--------------------------------------------------

<1> The version is optional, but useful in certain cases: it ensures that we percolate exactly the document
we just indexed. If the document was changed after we indexed it, the search request fails with a version
conflict error.

The search response is identical to the one in the previous example.
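
The step from index response to search request is mechanical. Below is a minimal sketch, assuming the index response has already been parsed into a dict. `percolate_existing` is a hypothetical name, and the `field` and `document_type` defaults are the values from the first example.

```python
import json

# The index response shown above.
index_response = json.loads("""
{"_index": "my-index", "_type": "message", "_id": "1", "_version": 1,
 "_shards": {"total": 2, "successful": 1, "failed": 0}, "created": true}
""")


def percolate_existing(resp, field="query", document_type="doctype"):
    """Turn an index response into a search body that percolates that document."""
    return {
        "query": {
            "percolator": {
                "field": field,
                "document_type": document_type,
                "index": resp["_index"],
                "type": resp["_type"],
                "id": resp["_id"],
                # Pinning the version makes the search fail with a version
                # conflict if the document changed after it was indexed.
                "version": resp["_version"],
            }
        }
    }


print(json.dumps(percolate_existing(index_response), indent=2))
```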

[float]
==== Percolator and highlighting

The `percolator` query is handled in a special way when it comes to highlighting. The matching percolator queries
are used to highlight the document that is provided in the `percolator` query, whereas with regular highlighting the
query in the search request is used to highlight the hits.

[float]
===== Example

This example is based on the mapping of the first example.

Add a percolator query:

[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/queries/1" -d'
{
    "query" : {
        "match" : {
            "message" : "brown fox"
        }
    }
}'
--------------------------------------------------

Add another percolator query:

[source,js]
--------------------------------------------------
curl -XPUT "http://localhost:9200/my-index/queries/2" -d'
{
    "query" : {
        "match" : {
            "message" : "lazy dog"
        }
    }
}'
--------------------------------------------------

Execute a search request with the `percolator` query and highlighting enabled:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/my-index/_search" -d'
{
    "query" : {
        "percolator" : {
            "field": "query",
            "document_type" : "doctype",
            "document" : {
                "message" : "The quick brown fox jumps over the lazy dog"
            }
        }
    },
    "highlight": {
        "fields": {
            "message": {}
        }
    }
}'
--------------------------------------------------

This will yield the following response:

[source,js]
--------------------------------------------------
{
  "took": 83,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5446649,
    "hits": [
      {
        "_index": "my-index",
        "_type": "queries",
        "_id": "2",
        "_score": 0.5446649,
        "_source": {
          "query": {
            "match": {
              "message": "lazy dog"
            }
          }
        },
        "highlight": {
          "message": [
            "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" <1>
          ]
        }
      },
      {
        "_index": "my-index",
        "_type": "queries",
        "_id": "1",
        "_score": 0.5446649,
        "_source": {
          "query": {
            "match": {
              "message": "brown fox"
            }
          }
        },
        "highlight": {
          "message": [
            "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" <1>
          ]
        }
      }
    ]
  }
}
--------------------------------------------------

<1> Instead of the query in the search request highlighting the hits, each matching percolator query highlights
the document defined in the `percolator` query.
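
To make this inversion concrete, here is a toy sketch. It is not Elasticsearch's highlighter. It highlights the percolated document once per matching stored query by wrapping in `<em>` the words that also occur in that query's `match` text. A naive lowercase word tokenizer stands in for the real analyzer, a stored query counts as matching if it shares any term with the document (the default OR behavior of `match`), and all function names are hypothetical.

```python
import re


def terms(text):
    """Naive stand-in for the real analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def highlight(document_text, query_text):
    """Wrap every document word that is also a term of the query in <em> tags."""
    query_terms = terms(query_text)

    def tag(match):
        word = match.group(0)
        return f"<em>{word}</em>" if word.lower() in query_terms else word

    return re.sub(r"\w+", tag, document_text)


def percolate_and_highlight(document_text, stored_queries):
    """Return {query_id: highlighted document} for each stored match query that hits."""
    doc_terms = terms(document_text)
    return {
        query_id: highlight(document_text, query_text)
        for query_id, query_text in stored_queries.items()
        if terms(query_text) & doc_terms  # any shared term counts as a hit
    }


doc = "The quick brown fox jumps over the lazy dog"
stored = {"1": "brown fox", "2": "lazy dog", "3": "bonsai tree"}
for query_id, fragment in percolate_and_highlight(doc, stored).items():
    print(query_id, fragment)
```

Query `3` does not match and gets no highlight. The other two reproduce the fragments in the response above.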

[float]
==== How it Works Under the Hood

When a document is indexed into an index that has the <<percolator,percolator field type>> mapping configured, the query
part of the document is parsed into a Lucene query and kept in memory until that percolator document is removed.
As a result, all active percolator queries are held in memory.

At search time, the document specified in the request is parsed into a Lucene document and stored in a temporary
in-memory Lucene index. This in-memory index holds only this one document and is optimized for that case. Then all
queries registered in the index that the search request targets are executed against this single-document in-memory
index. This happens on each shard the search request needs to execute on.
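
The flow described above can be illustrated with a toy model. It is only an illustration, not how Lucene or Elasticsearch implement it:

* each stored `match` query is "parsed" once into a set of terms and kept in memory until removed;
* the incoming document is analyzed into a throwaway single-document "index" (field to terms);
* every stored query is evaluated against that document.

The model assumes `match` queries with the default OR semantics and a naive tokenizer. It ignores shards and scoring. `ToyPercolator` and the other names are hypothetical.

```python
import re
from dataclasses import dataclass


def analyze(text):
    """Naive analyzer: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass(frozen=True)
class MatchQuery:
    """Parsed form of {"match": {field: text}}; any shared term is a match (OR)."""
    field: str
    terms: frozenset

    def matches(self, doc_index):
        return bool(self.terms & doc_index.get(self.field, set()))


def parse(query_source):
    ((field, text),) = query_source["match"].items()
    return MatchQuery(field, frozenset(analyze(text)))


class ToyPercolator:
    def __init__(self):
        self.queries = {}  # id -> parsed query, kept in memory until removed

    def register(self, query_id, query_source):
        self.queries[query_id] = parse(query_source)

    def remove(self, query_id):
        self.queries.pop(query_id, None)

    def percolate(self, document):
        # Build a single-document "index" (field -> terms), then run every query on it.
        doc_index = {field: analyze(value) for field, value in document.items()}
        return sorted(qid for qid, q in self.queries.items() if q.matches(doc_index))


p = ToyPercolator()
p.register("1", {"match": {"message": "bonsai tree"}})
p.register("2", {"match": {"message": "lazy dog"}})
print(p.percolate({"message": "A new bonsai tree in the office"}))
```

The cost of `percolate` grows with the number of registered queries, since each one is evaluated against the document.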

By using `routing` or additional queries, the number of percolator queries that need to be executed can be reduced,
which decreases the time the search API needs to run.
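
Following that suggestion, a search can combine the `percolator` query with an extra filter and a `routing` URL parameter. Below is a minimal sketch that only builds the request path and body. It assumes a hypothetical `category` field that would have to be added to the `queries` mapping and set on each query document at registration time, and it assumes the stored queries were indexed with the same routing value that is passed at search time. `narrowed_percolate_search` is a hypothetical helper.

```python
from urllib.parse import urlencode


def narrowed_percolate_search(document, category, routing=None):
    """Build (path, body) for a percolation restricted to one category of stored queries.

    `category` is a hypothetical extra field on the query documents.
    """
    body = {
        "query": {
            "bool": {
                "must": {
                    "percolator": {
                        "field": "query",
                        "document_type": "doctype",
                        "document": document,
                    }
                },
                # Additional query that narrows down the candidate stored queries.
                "filter": {"term": {"category": category}},
            }
        }
    }
    path = "/my-index/_search"
    if routing is not None:
        # Routing limits the search to the shard(s) the routing value maps to.
        path += "?" + urlencode({"routing": routing})
    return path, body


path, body = narrowed_percolate_search(
    {"message": "A new bonsai tree in the office"}, "plants", routing="user-1")
```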