[[search-facets-histogram-facet]]
=== Histogram Facets

The histogram facet works with numeric data by building a histogram
across intervals of the field values. Each value is "rounded" into an
interval (or placed in a bucket), and statistics are provided per
interval/bucket (count and total). Here is a simple example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "field_name",
                "interval" : 100
            }
        }
    }
}
--------------------------------------------------

The above example will run a histogram facet on the `field_name` field,
with an `interval` of `100` (so, for example, a value of `1055` will be
placed within the `1000` bucket).
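
The rounding described above can be sketched in a few lines of Python.
This is only an illustration, not the facet's actual implementation: the
`histogram` function is a hypothetical name, and the sketch assumes
non-negative values that are rounded down to the start of their interval.

[source,python]
--------------------------------------------------
from collections import defaultdict


def histogram(values, interval):
    """Group values into buckets keyed by the start of each interval."""
    buckets = defaultdict(lambda: {"count": 0, "total": 0})
    for value in values:
        # Round down to the start of the interval (non-negative values assumed).
        key = (value // interval) * interval
        buckets[key]["count"] += 1
        buckets[key]["total"] += value
    return dict(buckets)


result = histogram([1055, 1010, 230, 299], interval=100)
print(result[1000])  # {'count': 2, 'total': 2065}
--------------------------------------------------

Here `1055` and `1010` both land in the `1000` bucket, which then reports
a count of 2 and a total of 2065.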

The interval can also be provided as a time-based interval (using the
time format). This mainly makes sense when working on date fields or
fields that represent absolute milliseconds. Here is an example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "field_name",
                "time_interval" : "1.5h"
            }
        }
    }
}
--------------------------------------------------
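
As a rough illustration of what a time-based interval such as `1.5h`
means for millisecond values, the following Python sketch converts the
interval to milliseconds and rounds a timestamp down to its bucket. The
helper names and the unit table are assumptions made for this example;
the time format accepted by the facet may support other suffixes.

[source,python]
--------------------------------------------------
# Hypothetical unit table; the real time format may accept other suffixes.
UNIT_MILLIS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}


def time_interval_millis(text):
    """Convert a string such as "1.5h" into a number of milliseconds."""
    # Check longer suffixes first so that "ms" is not mistaken for "s".
    for suffix in sorted(UNIT_MILLIS, key=len, reverse=True):
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * UNIT_MILLIS[suffix])
    raise ValueError("unsupported time interval: " + text)


def bucket_start(timestamp_millis, interval_millis):
    """Round a non-negative millisecond timestamp down to its bucket."""
    return (timestamp_millis // interval_millis) * interval_millis


interval = time_interval_millis("1.5h")
print(interval)                            # 5400000
print(bucket_start(10_000_000, interval))  # 5400000
--------------------------------------------------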

==== Key and Value

The histogram facet allows a different key and value to be used. The key
is used to place the hit/document within the appropriate bucket, and the
value is used to compute statistical data (for example, total). Here is
an example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "key_field" : "key_field_name",
                "value_field" : "value_field_name",
                "interval" : 100
            }
        }
    }
}
--------------------------------------------------
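
The split between key and value can be pictured with the following
Python sketch, in which documents are plain dictionaries. It is a
simplified model rather than the facet's implementation:
`key_value_histogram` is a hypothetical name, the sample documents are
made up, and non-negative keys are assumed.

[source,python]
--------------------------------------------------
def key_value_histogram(docs, key_field, value_field, interval):
    """Bucket each document by its key field and total its value field."""
    buckets = {}
    for doc in docs:
        # The key field alone decides which bucket the document falls into.
        key = (doc[key_field] // interval) * interval
        bucket = buckets.setdefault(key, {"count": 0, "total": 0})
        bucket["count"] += 1
        # The value field alone feeds the statistics.
        bucket["total"] += doc[value_field]
    return buckets


docs = [
    {"key_field_name": 120, "value_field_name": 5},
    {"key_field_name": 180, "value_field_name": 7},
    {"key_field_name": 310, "value_field_name": 2},
]
result = key_value_histogram(docs, "key_field_name", "value_field_name", 100)
print(result[100])  # {'count': 2, 'total': 12}
--------------------------------------------------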

==== Script Key and Value

Sometimes, some munging of the key, the value, or both is needed. For
this, <<modules-scripting,scripts>> can be used: the key script is
applied before the key is rounded into a bucket, and the value script is
applied when the statistical data is computed per bucket. Here is an
example:

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "key_script" : "doc['date'].date.minuteOfHour",
                "value_script" : "doc['num1'].value"
            }
        }
    }
}
--------------------------------------------------

In the above sample, a date type field called `date` is used to get the
minute of hour, and the total is computed based on another field,
`num1`. Note that in this case no `interval` was provided, so the bucket
is based directly on the `key_script` (no rounding).

Parameters can also be provided to the different scripts (preferable if
the script is the same, with different values for a specific parameter,
like "factor"):

[source,js]
--------------------------------------------------
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "key_script" : "doc['date'].date.minuteOfHour * factor1",
                "value_script" : "doc['num1'].value + factor2",
                "params" : {
                    "factor1" : 2,
                    "factor2" : 3
                }
            }
        }
    }
}
--------------------------------------------------
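
To make the effect of the scripts and their parameters concrete, here is
a small Python model of the request above. It is a sketch under stated
assumptions: Python lambdas stand in for the key and value scripts,
`script_histogram` is a hypothetical name, and the sample documents are
invented. Since no `interval` is given, the key produced by the script is
used as the bucket key without rounding.

[source,python]
--------------------------------------------------
from datetime import datetime


def script_histogram(docs, key_script, value_script, params):
    """Bucket documents by a computed key and total a computed value."""
    buckets = {}
    for doc in docs:
        # No interval: the script result is the bucket key (no rounding).
        key = key_script(doc, params)
        bucket = buckets.setdefault(key, {"count": 0, "total": 0})
        bucket["count"] += 1
        bucket["total"] += value_script(doc, params)
    return buckets


docs = [
    {"date": datetime(2013, 1, 1, 10, 15), "num1": 4},
    {"date": datetime(2013, 1, 1, 11, 15), "num1": 6},
    {"date": datetime(2013, 1, 1, 11, 40), "num1": 1},
]
params = {"factor1": 2, "factor2": 3}
result = script_histogram(
    docs,
    # Stand-in for "doc['date'].date.minuteOfHour * factor1".
    key_script=lambda doc, p: doc["date"].minute * p["factor1"],
    # Stand-in for "doc['num1'].value + factor2".
    value_script=lambda doc, p: doc["num1"] + p["factor2"],
    params=params,
)
print(result[30])  # {'count': 2, 'total': 16}
--------------------------------------------------

Both documents at minute 15 share the key `30`, whatever their hour.
Changing `factor1` or `factor2` only requires new `params`, not a new
script.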

==== Memory Considerations

In order to implement the histogram facet, the relevant field values are
loaded into memory from the index. This means that, per shard, there
should be enough memory to contain them. Since the default dynamically
introduced types are `long` and `double`, one option to reduce the
memory footprint is to explicitly set the types for the relevant fields
to `short`, `integer`, or `float` when possible.
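
A back-of-the-envelope estimate shows why the narrower types help. The
Python sketch below multiplies a number of loaded values by the width of
each numeric type. It assumes memory use scales with the raw width of the
values and ignores any overhead of the in-memory structures; the value
count is a made-up figure and `estimated_bytes` is a hypothetical helper.

[source,python]
--------------------------------------------------
# Width in bytes of each numeric type (Java primitive sizes).
BYTES_PER_VALUE = {"long": 8, "double": 8, "integer": 4, "float": 4, "short": 2}


def estimated_bytes(num_values, field_type):
    """Raw size of the loaded values, ignoring data structure overhead."""
    return num_values * BYTES_PER_VALUE[field_type]


values_per_shard = 10_000_000  # hypothetical number of field values
for field_type in ("long", "integer", "short"):
    print(field_type, estimated_bytes(values_per_shard, field_type), "bytes")
--------------------------------------------------

Under these assumptions, mapping a field as `integer` instead of `long`
halves the raw size of its values, and `short` cuts it to a quarter.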