Docs: Added explanation of how to do multi-field terms agg

Closes #5100
2025-02-19 19:35:02 +00:00 · 2014-09-07 11:09:52 +02:00 · 2014-09-07 11:09:52 +02:00 · 1bdf79e527
commit 1bdf79e527
parent 7c2490b2ad
1 changed files with 87 additions and 59 deletions
--- a/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc
+++ b/docs/reference/search/aggregations/bucket/terms-aggregation.asciidoc
@@ -380,6 +380,7 @@ WARNING: When NOT sorting on `doc_count` descending, high values of `min_doc_cou
         back by increasing `shard_size`.
         Setting `shard_min_doc_count` too high will cause terms to be filtered out on a shard level. This value should be set much lower than `min_doc_count/#shards`.

+[[search-aggregations-bucket-terms-aggregation-script]]
 ==== Script

 Generating the terms using a script:
@@ -476,6 +477,33 @@ http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CA
 http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS[`UNICODE_CHARACTER_CLASS`] and
 http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html#UNIX_LINES[`UNIX_LINES`]

+==== Multi-field terms aggregation
+
+The `terms` aggregation does not support collecting terms from multiple fields
+in the same document.  The reason is that the `terms` agg doesn't collect the
+string term values themselves, but rather uses
+<<search-aggregations-bucket-terms-aggregation-execution-hint,global ordinals>>
+to produce a list of all of the unique values in the field.  Global ordinals
+result in a significant performance boost, which would not be possible across
+multiple fields.
+
+There are two approaches that you can use to perform a `terms` agg across
+multiple fields:
+
+<<search-aggregations-bucket-terms-aggregation-script,Script>>::
+
+Use a script to retrieve terms from multiple fields.  This disables the global
+ordinals optimization and will be slower than collecting terms from a single
+field, but it gives you the flexibility to implement this option at search
+time.
+
+<<copy-to,`copy_to` field>>::
+
+If you know ahead of time that you want to collect the terms from two or more
+fields, then use `copy_to` in your mapping to create a new dedicated field at
+index time which contains the values from all of those fields.  You can then
+aggregate on this single field, which benefits from the global ordinals optimization.
+
 ==== Collect mode

 added[1.3.0] Deferring calculation of child aggregations
@@ -548,7 +576,7 @@ WARNING: It is not possible to nest aggregations such as `top_hits` which requir
 the `breadth_first` collection mode. This is because this would require a RAM buffer to hold the float score value for every document and
 this would typically be too costly in terms of RAM.

-
+[[search-aggregations-bucket-terms-aggregation-execution-hint]]
 ==== Execution hint

 added[1.2.0] Added the `global_ordinals`, `global_ordinals_hash` and `global_ordinals_low_cardinality` execution modes
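
Note (not part of the commit): the `copy_to` approach described in the added section could be sketched roughly as follows. This is a minimal illustration, not the documented example. The field names `first_name`, `last_name` and `full_name`, the `person` mapping type, and both helper functions are hypothetical, and the mapping syntax assumes the 1.x-era `string` field type. The code only builds the request bodies as plain dictionaries; it does not talk to a cluster.

```python
import json


def copy_to_mapping(source_fields, combined_field, doc_type="person"):
    """Build a mapping body in which each source field is copied into one
    combined field at index time (all names here are hypothetical)."""
    properties = {
        name: {"type": "string", "copy_to": combined_field}
        for name in source_fields
    }
    # Assumption: the combined field is kept un-analyzed so that each
    # copied value is indexed as a single term.
    properties[combined_field] = {"type": "string", "index": "not_analyzed"}
    return {"mappings": {doc_type: {"properties": properties}}}


def terms_agg_on(field, agg_name="all_names"):
    """Build a search body that runs a single-field `terms` agg on `field`."""
    return {"size": 0, "aggs": {agg_name: {"terms": {"field": field}}}}


if __name__ == "__main__":
    mapping = copy_to_mapping(["first_name", "last_name"], "full_name")
    search = terms_agg_on("full_name")
    print(json.dumps(mapping, indent=2))
    print(json.dumps(search, indent=2))
```

Because the aggregation targets only the combined field, it stays an ordinary single-field `terms` aggregation, which is what allows the global ordinals optimization to apply.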