[role="xpack"]
[testenv="basic"]
[[rollup-put-job]]
=== Create {rollup-jobs} API
[subs="attributes"]
++++
<titleabbrev>Create {rollup-jobs}</titleabbrev>
++++

Creates a {rollup-job}.

experimental[]

[[rollup-put-job-api-request]]
==== {api-request-title}

`PUT _rollup/job/<job_id>`

[[rollup-put-job-api-prereqs]]
==== {api-prereq-title}

* If the {es} {security-features} are enabled, you must have `manage` or
`manage_rollup` cluster privileges to use this API. For more information, see
<<security-privileges>>.

[[rollup-put-job-api-desc]]
==== {api-description-title}

The {rollup-job} configuration contains all the details about how the job should
run, when it indexes documents, and what future queries will be able to execute
against the rollup index.

There are three main sections to the job configuration: the logistical details
about the job (cron schedule, etc.), the fields that are used for grouping, and
what metrics to collect for each group.

Jobs are created in a `STOPPED` state. You can start them with the
<<rollup-start-job,start {rollup-jobs} API>>.

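
Because a new job does not run until it is started, a typical next step after
creating one is a start request. As a minimal sketch, assuming a job with the
ID `sensor` (the ID used in the example on this page) has already been created:

[source,console]
----
POST _rollup/job/sensor/_start
----
// TEST[skip:illustrative sketch]
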
[[rollup-put-job-api-path-params]]
==== {api-path-parms-title}

`<job_id>`::
(Required, string) Identifier for the {rollup-job}. This can be any
alphanumeric string and uniquely identifies the data that is associated with
the {rollup-job}. The ID is persistent; it is stored with the rolled-up data.
If you create a job, let it run for a while, then delete the job, the data
that the job rolled up is still associated with this job ID. You cannot
create a new job with the same ID since that could lead to problems with
mismatched job configurations.

[[rollup-put-job-api-request-body]]
==== {api-request-body-title}

`cron`::
(Required, string) A cron string that defines the intervals at which the
{rollup-job} should be executed. When the interval triggers, the indexer
attempts to roll up the data in the index pattern. The cron pattern is
unrelated to the time interval of the data being rolled up. For example, you
may wish to create hourly rollups of your documents but run the indexer only
once a day at midnight, as defined by the cron. The cron pattern is
defined just like a {watcher} cron schedule.

[[rollup-groups-config]]
`groups`::
(Required, object) Defines the grouping fields and aggregations for this
{rollup-job}. These fields will then be available later for
aggregating into buckets.
+
--
These aggregations and fields can be used in any combination. Think of the
`groups` configuration as defining a set of tools that can later be used in
aggregations to partition the data. Unlike with raw data, you have to decide
ahead of time which fields and aggregations might be used. Rollups provide
enough flexibility that you simply need to determine _which_ fields are needed,
not _in what order_ they are needed.

There are three types of groupings currently available:
--

`date_histogram`:::
(Required, object) A date histogram group aggregates a `date` field into
time-based buckets. This group is *mandatory*; you currently cannot roll up
documents without a timestamp and a `date_histogram` group. The
`date_histogram` group has several parameters:

`field`::::
(Required, string) The date field that is to be rolled up.

`calendar_interval` or `fixed_interval`::::
(Required, <<time-units,time units>>) The interval of time buckets to be
generated when rolling up. For example, `60m` produces 60-minute (hourly)
rollups. This follows standard time formatting syntax as used elsewhere in
{es}. The interval defines only the _minimum_ interval that can be aggregated.
If hourly (`60m`) intervals are configured, <<rollup-search,rollup search>>
can execute aggregations with 60m or greater (weekly, monthly, etc.) intervals.
So define the interval as the smallest unit that you wish to later query. For
more information about the difference between calendar and fixed time
intervals, see <<rollup-understanding-group-intervals>>.
+
--
NOTE: Smaller, more granular intervals take up proportionally more space.

--

`delay`::::
(Optional, <<time-units,time units>>) How long to wait before rolling up new
documents. By default, the indexer attempts to roll up all data that is
available. However, it is not uncommon for data to arrive out of order,
sometimes even a few days late. The indexer is unable to deal with data that
arrives after a time span has been rolled up. That is to say, there is no
provision to update already-existing rollups.
+
--
Instead, you should specify a `delay` that matches the longest period of time
by which you expect out-of-order data to be late. For example, a `delay` of
`1d` instructs the indexer to roll up documents up to `now - 1d`, which
provides a day of buffer time for out-of-order documents to arrive.
--

`time_zone`::::
(Optional, string) Defines the time zone in which the rollup documents are
stored. Unlike raw data, which can shift time zones on the fly, rolled-up
documents have to be stored with a specific time zone. By default, rollup
documents are stored in `UTC`.

`terms`:::
(Optional, object) The terms group can be used on `keyword` or numeric fields
to allow bucketing via the `terms` aggregation at a later point. The indexer
enumerates and stores _all_ values of a field for each time period. This can
be costly for high-cardinality groups such as IP addresses,
especially if the time bucket is particularly sparse.
+
--
TIP: While it is unlikely that a rollup will ever be larger in size than the raw
data, defining `terms` groups on multiple high-cardinality fields can
greatly reduce the compression of a rollup. For that reason, be judicious
about which high-cardinality fields you include.

The `terms` group has a single parameter:
--

`fields`::::
(Required, array) The set of fields that you wish to collect terms for. This
array can contain both `keyword` and numeric fields. Order does not
matter.

`histogram`:::
(Optional, object) The histogram group aggregates one or more numeric fields
into numeric histogram intervals.
+
--
The `histogram` group has two parameters:
--

`fields`::::
(Required, array) The set of fields that you wish to build histograms for. All
fields specified must be some kind of numeric. Order does not matter.

`interval`::::
(Required, integer) The interval of histogram buckets to be generated when
rolling up. For example, a value of `5` creates buckets that are five units
wide (`0-5`, `5-10`, etc.). Note that only one interval can be specified in the
`histogram` group, meaning that all fields being grouped via the histogram
must share the same interval.

`index_pattern`::
(Required, string) The index or index pattern to roll up. Supports
wildcard-style patterns (`logstash-*`). The job attempts to roll up the
entire index or index pattern.
+
--
NOTE: The `index_pattern` cannot be a pattern that would also match the
destination `rollup_index`. For example, the pattern `foo-*` would match the
rollup index `foo-rollup`. This situation would cause problems because the
{rollup-job} would attempt to roll up its own data at runtime. If you attempt to
configure a pattern that matches the `rollup_index`, an exception occurs to
prevent this behavior.

--

[[rollup-metrics-config]]
`metrics`::
(Optional, object) Defines the metrics to collect for each grouping tuple.
By default, only the doc_counts are collected for each group. To make rollup
useful, you will often add metrics like averages, mins, maxes, etc. Metrics
are defined on a per-field basis, and for each field you configure which
metrics should be collected.
+
--
The `metrics` configuration accepts an array of objects, where each object has
two parameters:
--

`field`:::
(Required, string) The field to collect metrics for. This must be a numeric
field.

`metrics`:::
(Required, array) An array of metrics to collect for the field. At least one
metric must be configured. Acceptable metrics are `min`, `max`, `sum`, `avg`,
and `value_count`.

`page_size`::
(Required, integer) The number of bucket results that are processed on each
iteration of the rollup indexer. A larger value tends to execute faster, but
requires more memory during processing. This value has no effect on how the
data is rolled up; it is merely used for tweaking the speed or memory cost of
the indexer.

`rollup_index`::
(Required, string) The index that contains the rollup results. The index can
be shared with other {rollup-jobs}. The data is stored so that it doesn't
interfere with unrelated jobs.

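
To show how the request body parameters described above can fit together, the
following sketch combines all three grouping types with a cron schedule that
runs once a day at midnight while still producing hourly (`60m`) buckets. The
job ID `sensor_daily` is hypothetical, and the `timestamp`, `node`,
`temperature`, and `voltage` fields are assumed to exist in indices matching
`sensor-*`. Note that the `sensor-*` pattern does not match the `sensor_rollup`
destination index.

[source,console]
----
PUT _rollup/job/sensor_daily
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "0 0 0 * * ?",
  "page_size": 1000,
  "groups": {
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "60m",
      "delay": "1d",
      "time_zone": "UTC"
    },
    "terms": {
      "fields": [ "node" ]
    },
    "histogram": {
      "fields": [ "temperature", "voltage" ],
      "interval": 5
    }
  },
  "metrics": [
    {
      "field": "temperature",
      "metrics": [ "min", "max", "avg" ]
    }
  ]
}
----
// TEST[skip:illustrative sketch]

With a `delay` of `1d`, each nightly run in this sketch would roll up documents
only up to `now - 1d`. Both histogram fields share the single `interval` of `5`.
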
[[rollup-put-job-api-example]]
==== {api-example-title}

The following example creates a {rollup-job} named `sensor`, targeting the
`sensor-*` index pattern:

[source,console]
--------------------------------------------------
PUT _rollup/job/sensor
{
  "index_pattern": "sensor-*",
  "rollup_index": "sensor_rollup",
  "cron": "*/30 * * * * ?",
  "page_size": 1000,
  "groups": { <1>
    "date_histogram": {
      "field": "timestamp",
      "fixed_interval": "1h",
      "delay": "7d"
    },
    "terms": {
      "fields": [ "node" ]
    }
  },
  "metrics": [ <2>
    {
      "field": "temperature",
      "metrics": [ "min", "max", "sum" ]
    },
    {
      "field": "voltage",
      "metrics": [ "avg" ]
    }
  ]
}
--------------------------------------------------
// TEST[setup:sensor_index]
<1> This configuration enables date histograms to be used on the `timestamp`
field and `terms` aggregations to be used on the `node` field.
<2> This configuration defines metrics over two fields: `temperature` and
`voltage`. For the `temperature` field, we are collecting the min, max, and
sum of the temperature. For `voltage`, we are collecting the average.

When the job is created, you receive the following results:

[source,console-result]
----
{
  "acknowledged": true
}
----
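
To check what was stored, one option is to retrieve the job afterward. The
sketch below assumes the `sensor` job from the example above was created
successfully. The response (not shown here) should include the job's
configuration and its current state, which remains stopped until the job is
started.

[source,console]
----
GET _rollup/job/sensor
----
// TEST[skip:illustrative sketch]
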