docs: describe parent/child performances

Martijn van Groningen 2017-10-26 11:25:10 +02:00
parent 8bf33241ed
commit f1e944a675
No known key found for this signature in database
GPG Key ID: AB236F4FCF2AF12A
3 changed files with 27 additions and 60 deletions


@ -114,6 +114,17 @@ PUT my_index/doc/4?routing=1&refresh
<2> `answer` is the name of the join for this document
<3> The parent id of this child document
==== Parent-join and performance
The join field shouldn't be used like joins in a relational database. In Elasticsearch the key to good performance
is to de-normalize your data into documents. Each join field, `has_child` query, or `has_parent` query adds a
significant cost to query performance.
The only case where the join field makes sense is if your data contains a one-to-many relationship where
one entity significantly outnumbers the other. An example of such a case is products and the offers for
these products. If offers significantly outnumber products, then it makes sense to model the product as
the parent document and the offer as the child document.
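As a rough sketch of the product and offer case, the mapping could look like the following. The index name
`products_index`, the field name `product_offer_join` and the relation names are made up for illustration:
[source,js]
--------------------------------------------------
PUT products_index
{
  "mappings": {
    "doc": {
      "properties": {
        "product_offer_join": {
          "type": "join",
          "relations": {
            "product": "offer" <1>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> `product` is the parent of `offer` (hypothetical relation names)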
==== Parent-join restrictions
* Only one `join` field mapping is allowed per index.
@ -338,7 +349,7 @@ GET _nodes/stats/indices/fielddata?human&fields=my_join_field#question
// CONSOLE
// TEST[continued]
==== Multiple levels of parent join
==== Multiple children per parent
It is also possible to define multiple children for a single parent:
@ -363,62 +374,3 @@ PUT my_index
// CONSOLE
<1> `question` is parent of `answer` and `comment`.
And multiple levels of parent/child:
[source,js]
--------------------------------------------------
PUT my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "question": ["answer", "comment"],  <1>
            "answer": "vote" <2>
          }
        }
      }
    }
  }
}
--------------------------------------------------
// CONSOLE
<1> `question` is parent of `answer` and `comment`
<2> `answer` is parent of `vote`
The mapping above represents the following tree:
   question
    /    \
   /      \
comment  answer
           |
           |
          vote
Indexing a grandchild document requires a `routing` value equal
to the id of the grandparent (the topmost parent of the lineage):
[source,js]
--------------------------------------------------
PUT my_index/doc/3?routing=1&refresh <1>
{
  "text": "This is a vote",
  "my_join_field": {
    "name": "vote",
    "parent": "2" <2>
  }
}
--------------------------------------------------
// CONSOLE
// TEST[continued]
<1> This child document must be on the same shard as its grandparent and parent
<2> The parent id of this document (must point to an `answer` document)


@ -23,6 +23,14 @@ GET /_search
--------------------------------------------------
// CONSOLE
Note that the `has_child` query is slow compared to other queries in the
query DSL because it performs a join. Performance degrades as the number of
matching child documents pointing to unique parent documents increases. If you
care about query performance, you should not use this query. If you do need
it, use it as sparingly as possible. Each `has_child` query added to a search
request can increase query time significantly.
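For example, when two conditions must hold for the same child document, they can usually be combined in a
single `has_child` query instead of two separate ones. The relation name `offer` and the fields `price` and
`in_stock` below are assumptions made for illustration. Two separate `has_child` clauses would also match
parents whose conditions are met by different children, so the two forms are not interchangeable:
[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "has_child": {
      "type": "offer",
      "query": {
        "bool": {
          "filter": [
            { "range": { "price": { "lte": 100 } } },
            { "term": { "in_stock": true } }
          ]
        }
      }
    }
  }
}
--------------------------------------------------
// NOTCONSOLE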
[float]
==== Scoring capabilities


@ -25,6 +25,13 @@ GET /_search
--------------------------------------------------
// CONSOLE
Note that the `has_parent` query is slow compared to other queries in the
query DSL because it performs a join. Performance degrades as the number of
matching parent documents increases. If you care about query performance, you
should not use this query. If you do need it, use it as sparingly as possible.
Each `has_parent` query added to a search request can increase query time
significantly.
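As a sketch, a search request can keep a single `has_parent` clause and express the remaining conditions as
regular filters on the child documents. The relation name `product` and the fields `price` and `brand` are
assumptions made for illustration:
[source,js]
--------------------------------------------------
GET /_search
{
  "query": {
    "bool": {
      "filter": [
        { "range": { "price": { "lte": 100 } } },
        {
          "has_parent": {
            "parent_type": "product",
            "query": {
              "term": { "brand": "acme" }
            }
          }
        }
      ]
    }
  }
}
--------------------------------------------------
// NOTCONSOLE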
[float]
==== Scoring capabilities