OpenSearch/docs/reference/mapping/fields/parent-field.asciidoc
Ryan Ernst a03b6c2fa5 Scripting: Change keys for inline/stored scripts to source/id (#25127)
This commit adds back "id" as the key within a script to specify a
stored script (which with file scripts now gone is no longer ambiguous).
It also adds "source" as a replacement for "code". This is in an attempt
to normalize how scripts are specified across both put stored scripts and script usages, including search template requests. This also deprecates the old inline/stored keys.
2017-06-09 08:29:25 -07:00

170 lines
4.6 KiB
Plaintext

[[mapping-parent-field]]
=== `_parent` field
A parent-child relationship can be established between documents in the same
index by making one mapping type the parent of another:
[source,js]
--------------------------------------------------
PUT my_index
{
"settings": {
"mapping.single_type": false
},
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent" <1>
}
}
}
}
PUT my_index/my_parent/1 <2>
{
"text": "This is a parent document"
}
PUT my_index/my_child/2?parent=1 <3>
{
"text": "This is a child document"
}
PUT my_index/my_child/3?parent=1&refresh=true <3>
{
"text": "This is another child document"
}
GET my_index/my_parent/_search
{
"query": {
"has_child": { <4>
"type": "my_child",
"query": {
"match": {
"text": "child document"
}
}
}
}
}
--------------------------------------------------
// CONSOLE
<1> The `my_parent` type is parent to the `my_child` type.
<2> Index a parent document.
<3> Index two child documents, specifying the parent document's ID.
<4> Find all parent documents that have children which match the query.
See the <<query-dsl-has-child-query,`has_child`>> and
<<query-dsl-has-parent-query,`has_parent`>> queries,
the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation,
and <<parent-child-inner-hits,inner hits>> for more information.
The value of the `_parent` field is accessible in aggregations
and scripts, and may be queried with the
<<query-dsl-parent-id-query, `parent_id` query>>:
[source,js]
--------------------------
GET my_index/_search
{
"query": {
"parent_id": { <1>
"type": "my_child",
"id": "1"
}
},
"aggs": {
"parents": {
"terms": {
"field": "_parent", <2>
"size": 10
}
}
},
"script_fields": {
"parent": {
"script": {
"source": "doc['_parent']" <3>
}
}
}
}
--------------------------
// CONSOLE
// TEST[continued]
<1> Querying the id of the `_parent` field (also see the <<query-dsl-has-parent-query,`has_parent` query>> and the <<query-dsl-has-child-query,`has_child` query>>)
<2> Aggregating on the `_parent` field (also see the <<search-aggregations-bucket-children-aggregation,`children`>> aggregation)
<3> Accessing the `_parent` field in scripts
==== Parent-child restrictions
* The parent and child types must be different -- parent-child relationships
cannot be established between documents of the same type.
* The `_parent.type` setting can only point to a type that doesn't exist yet.
This means that a type cannot become a parent type after it has been
created.
* Parent and child documents must be indexed on the same shard. The `parent`
ID is used as the <<mapping-routing-field,routing>> value for the child,
to ensure that the child is indexed on the same shard as the parent.
This means that the same `parent` value needs to be provided when
<<docs-get,getting>>, <<docs-delete,deleting>>, or <<docs-update,updating>>
a child document.
==== Global ordinals
Parent-child uses <<eager-global-ordinals,global ordinals>> to speed up joins.
Global ordinals need to be rebuilt after any change to a shard. The more
parent id values are stored in a shard, the longer it takes to rebuild the
global ordinals for the `_parent` field.
Global ordinals, by default, are built eagerly: if the index has changed,
global ordinals for the `_parent` field will be rebuilt as part of the refresh.
This can add significant time the refresh. However most of the times this is the
right trade-off, otherwise global ordinals are rebuilt when the first parent-child
query or aggregation is used. This can introduce a significant latency spike for
your users and usually this is worse as multiple global ordinals for the `_parent`
field may be attempt rebuilt within a single refresh interval when many writes
are occurring.
When the parent/child is used infrequently and writes occur frequently it may
make sense to disable eager loading:
[source,js]
--------------------------------------------------
PUT my_index
{
"settings": {
"mapping.single_type": false
},
"mappings": {
"my_parent": {},
"my_child": {
"_parent": {
"type": "my_parent",
"eager_global_ordinals": false
}
}
}
}
--------------------------------------------------
// CONSOLE
The amount of heap used by global ordinals can be checked as follows:
[source,sh]
--------------------------------------------------
# Per-index
GET _stats/fielddata?human&fields=_parent
# Per-node per-index
GET _nodes/stats/indices/fielddata?human&fields=_parent
--------------------------------------------------
// CONSOLE