[DOCS] Merge rollup config details into API (#49412)
parent 97c7ea60b9
commit ca895d3ad5

@@ -21,7 +21,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
 ==== Rollup Job Configuration
 
 The `RollupJobConfig` object contains all the details about the rollup job
-configuration. See {ref}/rollup-job-config.html[Rollup configuration] to learn more
+configuration. See {ref}/rollup-put-job.html[create rollup job API] to learn more
 about the various configuration settings.
 
 A `RollupJobConfig` requires the following arguments:

@@ -45,7 +45,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
 
 The grouping configuration of the Rollup job is defined in the `RollupJobConfig`
 using a `GroupConfig` instance. `GroupConfig` reflects all the configuration
-settings that can be defined using the REST API. See {ref}/rollup-job-config.html#rollup-groups-config[Grouping Config]
+settings that can be defined using the REST API. See {ref}/rollup-put-job.html#rollup-groups-config[Grouping config]
 to learn more about these settings.
 
 Using the REST API, we could define this grouping configuration:

@@ -89,7 +89,7 @@ include-tagged::{doc-tests}/RollupDocumentationIT.java[x-pack-rollup-put-rollup-
 After defining which groups should be generated for the data, you next configure
 which metrics should be collected. The list of metrics is defined in the `RollupJobConfig`
 using a `List<MetricConfig>` instance. `MetricConfig` reflects all the configuration
-settings that can be defined using the REST API. See {ref}/rollup-job-config.html#rollup-metrics-config[Metrics Config]
+settings that can be defined using the REST API. See {ref}/rollup-put-job.html#rollup-metrics-config[Metrics config]
 to learn more about these settings.
 
 Using the REST API, we could define this metrics configuration:

@@ -1,5 +1,5 @@
 [[search-aggregations-bucket-datehistogram-aggregation]]
-=== Date Histogram Aggregation
+=== Date histogram aggregation
 
 This multi-bucket aggregation is similar to the normal
 <<search-aggregations-bucket-histogram-aggregation,histogram>>, but it can

@@ -10,7 +10,8 @@ that here the interval can be specified using date/time expressions. Time-based
 data requires special support because time-based intervals are not always a
 fixed length.
 
-==== Calendar and Fixed intervals
+[[calendar_and_fixed_intervals]]
+==== Calendar and fixed intervals
 
 When configuring a date histogram aggregation, the interval can be specified
 in two manners: calendar-aware time intervals, and fixed time intervals.

@@ -42,7 +43,8 @@ are clear to the user immediately and there is no ambiguity. The old `interval`
 will be removed in the future.
 ==================================
 
-===== Calendar Intervals
+[[calendar_intervals]]
+===== Calendar intervals
 
 Calendar-aware intervals are configured with the `calendar_interval` parameter.
 Calendar intervals can only be specified in "singular" quantities of the unit

@@ -100,7 +102,8 @@ One year (1y) is the interval between the start day of the month and time of
 day and the same day of the month and time of day the following year in the
 specified timezone, so that the date and time are the same at the start and end. +
 
-===== Calendar Interval Examples
+[[calendar_interval_examples]]
+===== Calendar interval examples
 As an example, here is an aggregation requesting bucket intervals of a month in calendar time:
 
 [source,console]

@@ -157,7 +160,8 @@ POST /sales/_search?size=0
 --------------------------------------------------
 // NOTCONSOLE
 
-===== Fixed Intervals
+[[fixed_intervals]]
+===== Fixed intervals
 
 Fixed intervals are configured with the `fixed_interval` parameter.
 

@@ -192,7 +196,8 @@ All days begin at the earliest possible time, which is usually 00:00:00
 
 Defined as 24 hours (86,400,000 milliseconds)
 
-===== Fixed Interval Examples
+[[fixed_interval_examples]]
+===== Fixed interval examples
 
 If we try to recreate the "month" `calendar_interval` from earlier, we can approximate that with
 30 fixed days:

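To make the calendar/fixed distinction in the renamed sections above concrete, here is a hedged sketch of the two interval styles; the field name and values are illustrative and are not part of this commit:

[source,js]
--------------------------------------------------
"date_histogram": {
  "field": "date",
  "calendar_interval": "1M"    <1>
}

"date_histogram": {
  "field": "date",
  "fixed_interval": "30d"      <2>
}
--------------------------------------------------
// NOTCONSOLE
<1> Calendar-aware: each bucket tracks real month boundaries, so buckets span 28 to 31 days.
<2> Fixed: every bucket is exactly 30 * 86,400,000 = 2,592,000,000 milliseconds.
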
@@ -1015,3 +1015,8 @@ See <<ad-realm-configuration>>.
 === How security works
 
 See <<elasticsearch-security>>.
+
+[role="exclude",id="rollup-job-config"]
+=== Rollup job configuration
+
+See <<rollup-put-job-api-request-body>>.

@@ -26,6 +26,14 @@ experimental[]
 [[rollup-put-job-api-desc]]
 ==== {api-description-title}
 
+The {rollup-job} configuration contains all the details about how the job should
+run, when it indexes documents, and what future queries will be able to execute
+against the rollup index.
+
+There are three main sections to the job configuration: the logistical details
+about the job (cron schedule, etc), the fields that are used for grouping, and
+what metrics to collect for each group.
+
 Jobs are created in a `STOPPED` state. You can start them with the
 <<rollup-start-job,start {rollup-jobs} API>>.

@@ -33,42 +41,183 @@ Jobs are created in a `STOPPED` state. You can start them with the
 ==== {api-path-parms-title}
 
 `<job_id>`::
-(Required, string) Identifier for the {rollup-job}.
+(Required, string) Identifier for the {rollup-job}. This can be any
+alphanumeric string and uniquely identifies the data that is associated with
+the {rollup-job}. The ID is persistent; it is stored with the rolled up data.
+If you create a job, let it run for a while, then delete the job, the data
+that the job rolled up is still associated with this job ID. You cannot
+create a new job with the same ID since that could lead to problems with
+mismatched job configurations.
 
+[[rollup-put-job-api-request-body]]
 ==== {api-request-body-title}
 
 `cron`::
-(Required, string) A cron string which defines when the {rollup-job} should be executed.
+(Required, string) A cron string which defines the intervals when the
+{rollup-job} should be executed. When the interval triggers, the indexer
+attempts to rollup the data in the index pattern. The cron pattern is
+unrelated to the time interval of the data being rolled up. For example, you
+may wish to create hourly rollups of your documents but to only run the indexer
+on a daily basis at midnight, as defined by the cron. The cron pattern is
+defined just like a {watcher} cron schedule.
 
+[[rollup-groups-config]]
 `groups`::
-(Required, object) Defines the grouping fields that are defined for this
-{rollup-job}. See <<rollup-job-config,{rollup-job} config>>.
+(Required, object) Defines the grouping fields and aggregations that are
+defined for this {rollup-job}. These fields will then be available later for
+aggregating into buckets.
++
+--
+These aggs and fields can be used in any combination. Think of the `groups`
+configuration as defining a set of tools that can later be used in aggregations
+to partition the data. Unlike raw data, we have to think ahead to which fields
+and aggregations might be used. Rollups provide enough flexibility that you
+simply need to determine _which_ fields are needed, not _in what order_ they are
+needed.
+
+There are three types of groupings currently available:
+--
+
+`date_histogram`:::
+(Required, object) A date histogram group aggregates a `date` field into
+time-based buckets. This group is *mandatory*; you currently cannot rollup
+documents without a timestamp and a `date_histogram` group. The
+`date_histogram` group has several parameters:
+
+`field`::::
+(Required, string) The date field that is to be rolled up.
+
+`calendar_interval` or `fixed_interval`::::
+(Required, <<time-units,time units>>) The interval of time buckets to be
+generated when rolling up. For example, `60m` produces 60 minute (hourly)
+rollups. This follows standard time formatting syntax as used elsewhere in
+{es}. The interval defines only the _minimum_ interval that can be aggregated.
+If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
+can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
+So define the interval as the smallest unit that you wish to later query. For
+more information about the difference between calendar and fixed time
+intervals, see <<rollup-understanding-group-intervals>>.
++
+--
+NOTE: Smaller, more granular intervals take up proportionally more space.
+
+--
+
+`delay`::::
+(Optional, <<time-units,time units>>) How long to wait before rolling up new
+documents. By default, the indexer attempts to roll up all data that is
+available. However, it is not uncommon for data to arrive out of order,
+sometimes even a few days late. The indexer is unable to deal with data that
+arrives after a time-span has been rolled up. That is to say, there is no
+provision to update already-existing rollups.
++
+--
+Instead, you should specify a `delay` that matches the longest period of time
+you expect out-of-order data to arrive. For example, a `delay` of `1d`
+instructs the indexer to roll up documents up to `now - 1d`, which provides
+a day of buffer time for out-of-order documents to arrive.
+--
+
+`time_zone`::::
+(Optional, string) Defines what time_zone the rollup documents are stored as.
+Unlike raw data, which can shift timezones on the fly, rolled documents have
+to be stored with a specific timezone. By default, rollup documents are stored
+in `UTC`.
+
+`terms`:::
+(Optional, object) The terms group can be used on `keyword` or numeric fields
+to allow bucketing via the `terms` aggregation at a later point. The indexer
+enumerates and stores _all_ values of a field for each time-period. This can
+be potentially costly for high-cardinality groups such as IP addresses,
+especially if the time-bucket is particularly sparse.
++
+--
+TIP: While it is unlikely that a rollup will ever be larger in size than the raw
+data, defining `terms` groups on multiple high-cardinality fields can
+effectively reduce the compression of a rollup to a large extent. You should be
+judicious which high-cardinality fields are included for that reason.
+
+The `terms` group has a single parameter:
+--
+
+`fields`::::
+(Required, array) The set of fields that you wish to collect terms for. This
+array can contain fields that are both `keyword` and numerics. Order does not
+matter.
+
+`histogram`:::
+(Optional, object) The histogram group aggregates one or more numeric fields
+into numeric histogram intervals.
++
+--
+The `histogram` group has two parameters:
+--
+
+`fields`::::
+(Required, array) The set of fields that you wish to build histograms for. All
+fields specified must be some kind of numeric. Order does not matter.
+
+`interval`::::
+(Required, integer) The interval of histogram buckets to be generated when
+rolling up. For example, a value of `5` creates buckets that are five units
+wide (`0-5`, `5-10`, etc). Note that only one interval can be specified in the
+`histogram` group, meaning that all fields being grouped via the histogram
+must share the same interval.
 
 `index_pattern`::
 (Required, string) The index or index pattern to roll up. Supports
-wildcard-style patterns (`logstash-*`).
+wildcard-style patterns (`logstash-*`). The job will
+attempt to rollup the entire index or index-pattern.
++
+--
+NOTE: The `index_pattern` cannot be a pattern that would also match the
+destination `rollup_index`. For example, the pattern `foo-*` would match the
+rollup index `foo-rollup`. This situation would cause problems because the
+{rollup-job} would attempt to rollup its own data at runtime. If you attempt to
+configure a pattern that matches the `rollup_index`, an exception occurs to
+prevent this behavior.
+
+--
 
+[[rollup-metrics-config]]
 `metrics`::
-(Optional, object) Defines the metrics to collect for each grouping tuple. See
-<<rollup-job-config,{rollup-job} config>>.
+(Optional, object) Defines the metrics to collect for each grouping tuple.
+By default, only the doc_counts are collected for each group. To make rollup
+useful, you will often add metrics like averages, mins, maxes, etc. Metrics
+are defined on a per-field basis and for each field you configure which metric
+should be collected.
++
+--
+The `metrics` configuration accepts an array of objects, where each object has
+two parameters:
+--
+
+`field`:::
+(Required, string) The field to collect metrics for. This must be a numeric
+of some kind.
+
+`metrics`:::
+(Required, array) An array of metrics to collect for the field. At least one
+metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`,
+and `value_count`.
 
 `page_size`::
 (Required, integer) The number of bucket results that are processed on each
 iteration of the rollup indexer. A larger value tends to execute faster, but
-requires more memory during processing.
+requires more memory during processing. This value has no effect on how the
+data is rolled up; it is merely used for tweaking the speed or memory cost of
+the indexer.
 
 `rollup_index`::
 (Required, string) The index that contains the rollup results. The index can
-be shared with other {rollup-jobs}.
-
-For more details about the job configuration, see <<rollup-job-config>>.
+be shared with other {rollup-jobs}. The data is stored so that it doesn't
+interfere with unrelated jobs.
 
 [[rollup-put-job-api-example]]
 ==== {api-example-title}
 
-The following example creates a {rollup-job} named "sensor", targeting the
-"sensor-*" index pattern:
+The following example creates a {rollup-job} named `sensor`, targeting the
+`sensor-*` index pattern:
 
 [source,console]
 --------------------------------------------------

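As a companion to the `cron` and `delay` descriptions in the hunk above, here is a hedged sketch of how the two combine; the cron expression and values are illustrative, not part of this commit. A job can bucket hourly while only indexing once a day at midnight, ignoring documents newer than `now - 1d`:

[source,js]
--------------------------------------------------
"cron": "0 0 0 * * ?",          <1>
"groups": {
  "date_histogram": {
    "field": "timestamp",
    "fixed_interval": "60m",    <2>
    "delay": "1d"               <3>
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> Run the indexer once a day at midnight.
<2> Bucket hourly; queries can later aggregate at 60m or any coarser interval.
<3> Only roll up documents older than `now - 1d`, buffering out-of-order arrivals.
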
@@ -78,7 +227,7 @@ PUT _rollup/job/sensor
   "rollup_index": "sensor_rollup",
   "cron": "*/30 * * * * ?",
   "page_size" :1000,
-  "groups" : {
+  "groups" : { <1>
     "date_histogram": {
       "field": "timestamp",
       "fixed_interval": "1h",

@@ -88,7 +237,7 @@ PUT _rollup/job/sensor
       "fields": ["node"]
     }
   },
-  "metrics": [
+  "metrics": [ <2>
     {
      "field": "temperature",
      "metrics": ["min", "max", "sum"]

@@ -101,6 +250,11 @@ PUT _rollup/job/sensor
 }
 --------------------------------------------------
 // TEST[setup:sensor_index]
+<1> This configuration enables date histograms to be used on the `timestamp`
+field and `terms` aggregations to be used on the `node` field.
+<2> This configuration defines metrics over two fields: `temperature` and
+`voltage`. For the `temperature` field, we are collecting the min, max, and
+sum of the temperature. For `voltage`, we are collecting the average.
 
 When the job is created, you receive the following results:

@@ -109,4 +263,4 @@ When the job is created, you receive the following results:
 {
   "acknowledged": true
 }
-----
+----

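For reference, the `terms` and `histogram` groupings documented in the request body above can be combined freely alongside the mandatory `date_histogram`. A hedged sketch, with field names borrowed from the removed rollup-job-config page rather than from this commit's example:

[source,js]
--------------------------------------------------
"groups": {
  "date_histogram": {
    "field": "timestamp",
    "fixed_interval": "60m"                    <1>
  },
  "terms": {
    "fields": ["hostname", "datacenter"]       <2>
  },
  "histogram": {
    "fields": ["load", "net_in", "net_out"],
    "interval": 5                              <3>
  }
}
--------------------------------------------------
// NOTCONSOLE
<1> The `date_histogram` group is mandatory; the other groups are optional.
<2> Accepts both `keyword` and numeric fields; order does not matter.
<3> A single interval applies to all histogram fields: buckets are `0-5`, `5-10`, etc.
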
@@ -1,279 +0,0 @@
-[role="xpack"]
-[testenv="basic"]
-[[rollup-job-config]]
-=== Rollup job configuration
-
-experimental[]
-
-The Rollup Job Configuration contains all the details about how the rollup job should run, when it indexes documents,
-and what future queries will be able to execute against the rollup index.
-
-There are three main sections to the Job Configuration; the logistical details about the job (cron schedule, etc), what fields
-should be grouped on, and what metrics to collect for each group.
-
-A full job configuration might look like this:
-
-[source,console]
---------------------------------------------------
-PUT _rollup/job/sensor
-{
-  "index_pattern": "sensor-*",
-  "rollup_index": "sensor_rollup",
-  "cron": "*/30 * * * * ?",
-  "page_size" :1000,
-  "groups" : {
-    "date_histogram": {
-      "field": "timestamp",
-      "fixed_interval": "60m",
-      "delay": "7d"
-    },
-    "terms": {
-      "fields": ["hostname", "datacenter"]
-    },
-    "histogram": {
-      "fields": ["load", "net_in", "net_out"],
-      "interval": 5
-    }
-  },
-  "metrics": [
-    {
-      "field": "temperature",
-      "metrics": ["min", "max", "sum"]
-    },
-    {
-      "field": "voltage",
-      "metrics": ["avg"]
-    }
-  ]
-}
---------------------------------------------------
-// TEST[setup:sensor_index]
-
-==== Logistical Details
-
-In the above example, there are several pieces of logistical configuration for the job itself.
-
-`{job_id}` (required)::
-  (string) In the endpoint URL, you specify the name of the job (`sensor` in the above example). This can be any alphanumeric string,
-  and uniquely identifies the data that is associated with the rollup job. The ID is persistent, in that it is stored with the rolled
-  up data. So if you create a job, let it run for a while, then delete the job... the data that the job rolled up will still be
-  associated with this job ID. You will be unable to create a new job with the same ID, as that could lead to problems with mismatched
-  job configurations.
-
-`index_pattern` (required)::
-  (string) The index, or index pattern, that you wish to rollup. Supports wildcard-style patterns (`logstash-*`). The job will
-  attempt to rollup the entire index or index-pattern. Once the "backfill" is finished, it will periodically (as defined by the cron)
-  look for new data and roll that up too.
-
-`rollup_index` (required)::
-  (string) The index that you wish to store rollup results into. All the rollup data that is generated by the job will be
-  stored in this index. When searching the rollup data, this index will be used in the <<rollup-search,Rollup Search>> endpoint's URL.
-  The rollup index can be shared with other rollup jobs. The data is stored so that it doesn't interfere with unrelated jobs.
-
-`cron` (required)::
-  (string) A cron string which defines when the rollup job should be executed. The cron string defines an interval of when to run
-  the job's indexer. When the interval triggers, the indexer will attempt to rollup the data in the index pattern. The cron pattern
-  is unrelated to the time interval of the data being rolled up. For example, you may wish to create hourly rollups of your documents (as
-  defined in the <<rollup-groups-config,grouping configuration>>) but to only run the indexer on a daily basis at midnight, as defined by the cron.
-  The cron pattern is defined just like Watcher's Cron Schedule.
-
-`page_size` (required)::
-  (int) The number of bucket results that should be processed on each iteration of the rollup indexer. A larger value
-  will tend to execute faster, but will require more memory during processing. This has no effect on how the data is rolled up, it is
-  merely used for tweaking the speed/memory cost of the indexer.
-
-[NOTE]
-The `index_pattern` cannot be a pattern that would also match the destination `rollup_index`. E.g. the pattern
-`"foo-*"` would match the rollup index `"foo-rollup"`. This causes problems because the rollup job would attempt
-to rollup its own data at runtime. If you attempt to configure a pattern that matches the `rollup_index`, an exception
-will be thrown to prevent this behavior.
-
-[[rollup-groups-config]]
-==== Grouping Config
-
-The `groups` section of the configuration is where you decide which fields should be grouped on, and with what aggregations. These
-fields will then be available later for aggregating into buckets. For example, this configuration:
-
-[source,js]
---------------------------------------------------
-"groups" : {
-  "date_histogram": {
-    "field": "timestamp",
-    "fixed_interval": "60m",
-    "delay": "7d"
-  },
-  "terms": {
-    "fields": ["hostname", "datacenter"]
-  },
-  "histogram": {
-    "fields": ["load", "net_in", "net_out"],
-    "interval": 5
-  }
-}
---------------------------------------------------
-// NOTCONSOLE
-
-Allows `date_histogram`'s to be used on the `"timestamp"` field, `terms` aggregations to be used on the `"hostname"` and `"datacenter"`
-fields, and `histograms` to be used on any of `"load"`, `"net_in"`, `"net_out"` fields.
-
-Importantly, these aggs/fields can be used in any combination. Think of the `groups` configuration as defining a set of tools that can
-later be used in aggregations to partition the data. Unlike raw data, we have to think ahead to which fields and aggregations might be used.
-But Rollups provide enough flexibility that you simply need to determine _which_ fields are needed, not _in what order_ they are needed.
-
-There are three types of groupings currently available:
-
-===== Date Histogram
-
-A `date_histogram` group aggregates a `date` field into time-based buckets. The `date_histogram` group is *mandatory* -- you currently
-cannot rollup documents without a timestamp and a `date_histogram` group.
-
-The `date_histogram` group has several parameters:
-
-`field` (required)::
-  The date field that is to be rolled up.
-
-`interval` (required)::
-  The interval of time buckets to be generated when rolling up. E.g. `"60m"` will produce 60 minute (hourly) rollups. This follows standard time formatting
-  syntax as used elsewhere in Elasticsearch. The `interval` defines the _minimum_ interval that can be aggregated only. If hourly (`"60m"`)
-  intervals are configured, <<rollup-search,Rollup Search>> can execute aggregations with 60m or greater (weekly, monthly, etc) intervals.
-  So define the interval as the smallest unit that you wish to later query.
-
-  Note: smaller, more granular intervals take up proportionally more space.
-
-`delay`::
-  How long to wait before rolling up new documents. By default, the indexer attempts to roll up all data that is available. However, it
-  is not uncommon for data to arrive out of order, sometimes even a few days late. The indexer is unable to deal with data that arrives
-  after a time-span has been rolled up (e.g. there is no provision to update already-existing rollups).
-
-  Instead, you should specify a `delay` that matches the longest period of time you expect out-of-order data to arrive. E.g. a `delay` of
-  `"1d"` will instruct the indexer to roll up documents up to `"now - 1d"`, which provides a day of buffer time for out-of-order documents
-  to arrive.
-
-`time_zone`::
-  Defines what time_zone the rollup documents are stored as. Unlike raw data, which can shift timezones on the fly, rolled documents have
-  to be stored with a specific timezone. By default, rollup documents are stored in `UTC`, but this can be changed with the `time_zone`
-  parameter.
-
-.Calendar vs Fixed time intervals
-**********************************
-Elasticsearch understands both "calendar" and "fixed" time intervals. Fixed time intervals are fairly easy to understand;
-`"60s"` means sixty seconds. But what does `"1M"` mean? One month of time depends on which month we are talking about,
-some months are longer or shorter than others. This is an example of "calendar" time, and the duration of that unit
-depends on context. Calendar units are also affected by leap-seconds, leap-years, etc.
-
-This is important because the buckets generated by Rollup will be in either calendar or fixed intervals, and will limit
-how you can query them later (see <<rollup-search-limitations-intervals, Requests must be multiples of the config>>).
-
-We recommend sticking with "fixed" time intervals, since they are easier to understand and are more flexible at query
-time. It will introduce some drift in your data during leap-events, and you will have to think about months in a fixed
-quantity (30 days) instead of the actual calendar length... but it is often easier than dealing with calendar units
-at query time.
-
-Multiples of units are always "fixed" (e.g. `"2h"` is always the fixed quantity `7200` seconds). Single units can be
-fixed or calendar depending on the unit:
-
-[options="header"]
-|=======
-|Unit        |Calendar |Fixed
-|millisecond |NA       |`1ms`, `10ms`, etc
-|second      |NA       |`1s`, `10s`, etc
-|minute      |`1m`     |`2m`, `10m`, etc
-|hour        |`1h`     |`2h`, `10h`, etc
-|day         |`1d`     |`2d`, `10d`, etc
-|week        |`1w`     |NA
-|month       |`1M`     |NA
-|quarter     |`1q`     |NA
-|year        |`1y`     |NA
-|=======
-
-For some units where there are both fixed and calendar, you may need to express the quantity in terms of the next
-smaller unit. For example, if you want a fixed day (not a calendar day), you should specify `24h` instead of `1d`.
-Similarly, if you want fixed hours, specify `60m` instead of `1h`. This is because the single quantity entails
-calendar time, and limits you to querying by calendar time in the future.
-
-**********************************
-
-===== Terms
-
-The `terms` group can be used on `keyword` or numeric fields, to allow bucketing via the `terms` aggregation at a later point. The `terms`
-group is optional. If defined, the indexer will enumerate and store _all_ values of a field for each time-period. This can be potentially
-costly for high-cardinality groups such as IP addresses, especially if the time-bucket is particularly sparse.
-
-While it is unlikely that a rollup will ever be larger in size than the raw data, defining `terms` groups on multiple high-cardinality fields
-can effectively reduce the compression of a rollup to a large extent. You should be judicious which high-cardinality fields are included
-for that reason.
-
-The `terms` group has a single parameter:
-
-`fields` (required)::
-  The set of fields that you wish to collect terms for. This array can contain fields that are both `keyword` and numerics. Order
-  does not matter.
-
-===== Histogram
-
-The `histogram` group aggregates one or more numeric fields into numeric histogram intervals. This group is optional.
-
-The `histogram` group has two parameters:
-
-`fields` (required)::
-  The set of fields that you wish to build histograms for. All fields specified must be some kind of numeric. Order does not matter.
-
-`interval` (required)::
-  The interval of histogram buckets to be generated when rolling up. E.g. `5` will create buckets that are five units wide
-  (`0-5`, `5-10`, etc). Note that only one interval can be specified in the `histogram` group, meaning that all fields being grouped via
-  the histogram must share the same interval.
-
-[[rollup-metrics-config]]
-==== Metrics Config
-
-After defining which groups should be generated for the data, you next configure which metrics should be collected. By default, only
-the doc_counts are collected for each group. To make rollup useful, you will often add metrics like averages, mins, maxes, etc.
-
-Metrics are defined on a per-field basis, and for each field you configure which metric should be collected. For example:
-
-[source,js]
---------------------------------------------------
-"metrics": [
-  {
-    "field": "temperature",
-    "metrics": ["min", "max", "sum"]
-  },
-  {
-    "field": "voltage",
-    "metrics": ["avg"]
-  }
-]
---------------------------------------------------
-// NOTCONSOLE
-
-This configuration defines metrics over two fields, `"temperature"` and `"voltage"`. For the `"temperature"` field, we are collecting
-the min, max and sum of the temperature. For `"voltage"`, we are collecting the average. These metrics are collected in a way that makes
-them compatible with any combination of defined groups.
-
-The `metrics` configuration accepts an array of objects, where each object has two parameters:
-
-`field` (required)::
-  The field to collect metrics for. This must be a numeric of some kind.
-
-`metrics` (required)::
-  An array of metrics to collect for the field. At least one metric must be configured. Acceptable metrics are min/max/sum/avg/value_count.
-
-.Averages aren't composable?!
-**********************************
-If you've worked with rollups before, you may be cautious around averages. If an average is saved for a 10 minute
-interval, it usually isn't useful for larger intervals. You cannot average six 10-minute averages to find an
-hourly average (average of averages is not equal to the total average).
-
-For this reason, other systems tend to either omit the ability to average, or store the average at multiple intervals
-to support more flexible querying.
-
-Instead, the Rollup feature saves the `count` and `sum` for the defined time interval. This allows us to reconstruct
-the average at any interval greater-than or equal to the defined interval. This gives maximum flexibility for
-minimal storage costs... and you don't have to worry about average accuracies (no average of averages here!)
-**********************************

|
@ -10,7 +10,6 @@
|
|||
* <<rollup-put-job,Create>> or <<rollup-delete-job,delete {rollup-jobs}>>
|
||||
* <<rollup-start-job,Start>> or <<rollup-stop-job,stop {rollup-jobs}>>
|
||||
* <<rollup-get-job,Get {rollup-jobs}>>
|
||||
* <<rollup-job-config,Job configuration details>>
|
||||
|
||||
[float]
|
||||
[[rollup-data-endpoint]]
|
||||
|
@ -32,6 +31,5 @@ include::apis/get-job.asciidoc[]
include::apis/rollup-caps.asciidoc[]
include::apis/rollup-index-caps.asciidoc[]
include::apis/rollup-search.asciidoc[]
include::apis/start-job.asciidoc[]
include::apis/stop-job.asciidoc[]

@ -72,12 +72,11 @@ seconds worth of data that was indexed into the `sensor-*` indices.
If instead the cron was configured to run once a day at midnight, the job would process the last 24 hours worth of data. The choice is largely
preference, based on how "realtime" you want the rollups, and if you wish to process continuously or move it to off-peak hours.

Next, we define a set of `groups`. Essentially, we are defining the dimensions
that we wish to pivot on at a later date when querying the data. The grouping in
this job allows us to use `date_histogram` aggregations on the `timestamp` field,
rolled up at hourly intervals. It also allows us to run terms aggregations on
the `node` field.

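
A sketch of the corresponding `groups` section is shown below. The parameter names assume the split `fixed_interval`/`calendar_interval` date histogram syntax, and the optional `delay` parameter is omitted here:

[source,js]
--------------------------------------------------
"groups": {
    "date_histogram": {
        "field": "timestamp",
        "fixed_interval": "60m"
    },
    "terms": {
        "fields": ["node"]
    }
}
--------------------------------------------------
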
.Date histogram interval vs cron schedule
**********************************
@ -93,8 +92,31 @@ simply go back to sleep. But there's nothing wrong with it either, the job will
**********************************

After defining which groups should be generated for the data, you next configure
which metrics should be collected. By default, only the `doc_counts` are
collected for each group. To make rollup useful, you will often add metrics
like averages, mins, maxes, etc. In this example, the metrics are fairly
straightforward: we want to save the min/max/sum of the `temperature`
field, and the average of the `voltage` field.

.Averages aren't composable?!
**********************************
If you've worked with rollups before, you may be cautious around averages. If an
average is saved for a 10 minute interval, it usually isn't useful for larger
intervals. You cannot average six 10-minute averages to find an hourly average;
the average of averages is not equal to the total average.

For this reason, other systems tend to either omit the ability to average or
store the average at multiple intervals to support more flexible querying.

Instead, the {rollup-features} save the `count` and `sum` for the defined time
interval. This allows us to reconstruct the average at any interval greater-than
or equal to the defined interval. This gives maximum flexibility for minimal
storage costs... and you don't have to worry about average accuracies (no
average of averages here!)
**********************************
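
Because the `count` and `sum` are stored, an `avg` aggregation in a rollup search is reconstructed transparently. A hedged sketch, assuming a hypothetical rollup index named `sensor_rollup` and using `avg_voltage` as an illustrative aggregation name:

[source,js]
--------------------------------------------------
GET /sensor_rollup/_rollup_search
{
    "size": 0,
    "aggregations": {
        "avg_voltage": {
            "avg": {
                "field": "voltage"
            }
        }
    }
}
--------------------------------------------------
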
For more details about the job syntax, see <<rollup-put-job>>.

After you execute the above command and create the job, you'll receive the following response:

@ -119,7 +119,54 @@ Rollup Search to execute:
Ultimately, when configuring `groups` for a job, think in terms of how you might wish to partition data in a query at a future date...
then include those in the config. Because Rollup Search allows any order or combination of the grouped fields, you just need to decide
if a field is useful for aggregating later, and how you might wish to use it (terms, histogram, etc).

[[rollup-understanding-group-intervals]]
==== Calendar vs fixed time intervals

Each rollup-job must have a date histogram group with a defined interval. {es}
understands both
<<calendar_and_fixed_intervals,calendar and fixed time intervals>>. Fixed time
intervals are fairly easy to understand; `60s` means sixty seconds. But what
does `1M` mean? One month of time depends on which month we are talking about;
some months are longer or shorter than others. This is an example of calendar
time, where the duration of the unit depends on context. Calendar units are also
affected by leap-seconds, leap-years, etc.

This is important because the buckets generated by rollup are in either calendar
or fixed intervals and this limits how you can query them later. See
<<rollup-search-limitations-intervals>>.

We recommend sticking with fixed time intervals, since they are easier to
understand and are more flexible at query time. Fixed intervals will introduce
some drift in your data during leap-events, and you will have to think about
months as a fixed quantity (30 days) instead of the actual calendar length.
However, this is often easier than dealing with calendar units at query time.

Multiples of units are always "fixed". For example, `2h` is always the fixed
quantity `7200` seconds. Single units can be fixed or calendar depending on the
unit:

[options="header"]
|=======
|Unit        |Calendar |Fixed
|millisecond |NA       |`1ms`, `10ms`, etc
|second      |NA       |`1s`, `10s`, etc
|minute      |`1m`     |`2m`, `10m`, etc
|hour        |`1h`     |`2h`, `10h`, etc
|day         |`1d`     |`2d`, `10d`, etc
|week        |`1w`     |NA
|month       |`1M`     |NA
|quarter     |`1q`     |NA
|year        |`1y`     |NA
|=======

For some units where there are both fixed and calendar, you may need to express
the quantity in terms of the next smaller unit. For example, if you want a fixed
day (not a calendar day), you should specify `24h` instead of `1d`. Similarly,
if you want fixed hours, specify `60m` instead of `1h`. This is because the
single quantity entails calendar time, and limits you to querying by calendar
time in the future.

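
For example, the following two date histogram groups are not equivalent. The first always produces buckets of exactly 86400 seconds; the second produces calendar-day buckets, and restricts later queries to calendar time. Parameter names assume the split `fixed_interval`/`calendar_interval` syntax:

[source,js]
--------------------------------------------------
"date_histogram": {
    "field": "timestamp",
    "fixed_interval": "24h"
}

"date_histogram": {
    "field": "timestamp",
    "calendar_interval": "1d"
}
--------------------------------------------------
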
==== Grouping limitations with heterogeneous indices