OpenSearch/docs/reference/query-dsl/queries/top-children-query.asciidoc

[[query-dsl-top-children-query]]
=== Top Children Query

The `top_children` query runs the child query with an estimated hits
size, and out of the hit docs, aggregates it into parent docs. If there
aren't enough parent docs matching the requested from/size search
request, then it is run again with a wider (more hits) search.

The `top_children` also provide scoring capabilities, with the ability
to specify `max`, `sum` or `avg` as the score type.

One downside of using the `top_children` is that if there are more child
docs matching the required hits when executing the child query, then the
`total_hits` result of the search response will be incorrect.

How many hits are asked for in the first child query run is controlled
using the `factor` parameter (defaults to `5`). For example, when asking
for 10 parent docs (with `from` set to 0), then the child query will
execute with 50 hits expected. If not enough parents are found (in our
example 10), and there are still more child docs to query, then the
child search hits are expanded by multiplying by the
`incremental_factor` (defaults to `2`).

The required parameters are the `query` and `type` (the child type to
execute the query on). Here is an example with all different parameters,
including the default values:

[source,js]
--------------------------------------------------
{
    "top_children" : {
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        },
        "score" : "max",
        "factor" : 5,
        "incremental_factor" : 2
    }
}
--------------------------------------------------

[float]
==== Scope

A `_scope` can be defined on the query allowing to run facets on the
same scope name that will work against the child documents. For example:

[source,js]
--------------------------------------------------
{
    "top_children" : {
        "_scope" : "my_scope",
        "type": "blog_tag",
        "query" : {
            "term" : {
                "tag" : "something"
            }
        }
    }
}
--------------------------------------------------

[float]
==== Memory Considerations

In order to support parent-child joins, all of the (string) parent IDs 
must be resident in memory (in the <<index-modules-fielddata,field data cache>>. 
Additionaly, every child document is mapped to its parent using a long 
value (approximately). It is advisable to keep the string parent ID short
in order to reduce memory usage.

You can check how much memory is being used by the ID cache using the
<<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>
APIS, eg:

[source,js]
--------------------------------------------------
curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"
--------------------------------------------------
Migrated documentation into the main repo 2013-08-28 19:24:34 -04:00			`[[query-dsl-top-children-query]]`
			`=== Top Children Query`

			The `top_children` query runs the child query with an estimated hits
			`size, and out of the hit docs, aggregates it into parent docs. If there`
			`aren't enough parent docs matching the requested from/size search`
			`request, then it is run again with a wider (more hits) search.`

			The `top_children` also provide scoring capabilities, with the ability
			to specify `max`, `sum` or `avg` as the score type.

			One downside of using the `top_children` is that if there are more child
			`docs matching the required hits when executing the child query, then the`
			`total_hits` result of the search response will be incorrect.

			`How many hits are asked for in the first child query run is controlled`
			using the `factor` parameter (defaults to `5`). For example, when asking
			for 10 parent docs (with `from` set to 0), then the child query will
			`execute with 50 hits expected. If not enough parents are found (in our`
			`example 10), and there are still more child docs to query, then the`
			`child search hits are expanded by multiplying by the`
			`incremental_factor` (defaults to `2`).

			The required parameters are the `query` and `type` (the child type to
			`execute the query on). Here is an example with all different parameters,`
			`including the default values:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"top_children" : {`
			`"type": "blog_tag",`
			`"query" : {`
			`"term" : {`
			`"tag" : "something"`
			`}`
			`},`
			`"score" : "max",`
			`"factor" : 5,`
			`"incremental_factor" : 2`
			`}`
			`}`
			`--------------------------------------------------`

			`[float]`
			`==== Scope`

			A `_scope` can be defined on the query allowing to run facets on the
			`same scope name that will work against the child documents. For example:`

			`[source,js]`
			`--------------------------------------------------`
			`{`
			`"top_children" : {`
			`"_scope" : "my_scope",`
			`"type": "blog_tag",`
			`"query" : {`
			`"term" : {`
			`"tag" : "something"`
			`}`
			`}`
			`}`
			`}`
			`--------------------------------------------------`

			`[float]`
			`==== Memory Considerations`

Docs: Updated the explanation about memory usage with parent/child 2014-06-21 10:32:29 -04:00			`In order to support parent-child joins, all of the (string) parent IDs`
			`must be resident in memory (in the <<index-modules-fielddata,field data cache>>.`
			`Additionaly, every child document is mapped to its parent using a long`
			`value (approximately). It is advisable to keep the string parent ID short`
			`in order to reduce memory usage.`

			`You can check how much memory is being used by the ID cache using the`
			`<<indices-stats,indices stats>> or <<cluster-nodes-stats,nodes stats>>`
			`APIS, eg:`

			`[source,js]`
			`--------------------------------------------------`
			`curl -XGET "http://localhost:9200/_stats/id_cache?pretty&human"`
			`--------------------------------------------------`