136 lines
6.0 KiB
Plaintext
136 lines
6.0 KiB
Plaintext
[role="xpack"]
|
|
[testenv="basic"]
|
|
[[rollup-search-limitations]]
|
|
=== {rollup-cap} search limitations
|
|
|
|
experimental[]
|
|
|
|
While we feel the Rollup function is extremely flexible, the nature of summarizing data means there will be some limitations. Once
|
|
live data is thrown away, you will always lose some flexibility.
|
|
|
|
This page highlights the major limitations so that you are aware of them.
|
|
|
|
[discrete]
|
|
==== Only one {rollup} index per search
|
|
|
|
When using the <<rollup-search>> endpoint, the `index` parameter accepts one or more indices. These can be a mix of regular, non-rollup
|
|
indices and rollup indices. However, only one rollup index can be specified. The exact list of rules for the `index` parameter are as
|
|
follows:
|
|
|
|
- At least one index/index-pattern must be specified. This can be either a rollup or non-rollup index. Omitting the index parameter,
|
|
or using `_all`, is not permitted
|
|
- Multiple non-rollup indices may be specified
|
|
- Only one rollup index may be specified. If more than one are supplied an exception will be thrown
|
|
- Index patterns may be used, but if they match more than one rollup index an exception will be thrown.
|
|
|
|
This limitation is driven by the logic that decides which jobs are the "best" for any given query. If you have ten jobs stored in a single
|
|
index, which cover the source data with varying degrees of completeness and different intervals, the query needs to determine which set
|
|
of jobs to actually search. Incorrect decisions can lead to inaccurate aggregation results (e.g. over-counting doc counts, or bad metrics).
|
|
Needless to say, this is a technically challenging piece of code.
|
|
|
|
To help simplify the problem, we have limited search to just one rollup index at a time (which may contain multiple jobs). In the future we
|
|
may be able to open this up to multiple rollup jobs.
|
|
|
|
[discrete]
|
|
[[aggregate-stored-only]]
|
|
==== Can only aggregate what's been stored
|
|
|
|
A perhaps obvious limitation, but rollups can only aggregate on data that has been stored in the rollups. If you don't configure the
|
|
rollup job to store metrics about the `price` field, you won't be able to use the `price` field in any query or aggregation.
|
|
|
|
For example, the `temperature` field in the following query has been stored in a rollup job... but not with an `avg` metric. Which means
|
|
the usage of `avg` here is not allowed:
|
|
|
|
[source,console]
|
|
--------------------------------------------------
|
|
GET sensor_rollup/_rollup_search
|
|
{
|
|
"size": 0,
|
|
"aggregations": {
|
|
"avg_temperature": {
|
|
"avg": {
|
|
"field": "temperature"
|
|
}
|
|
}
|
|
}
|
|
}
|
|
--------------------------------------------------
|
|
// TEST[setup:sensor_prefab_data]
|
|
// TEST[catch:/illegal_argument_exception/]
|
|
|
|
The response will tell you that the field and aggregation were not possible, because no rollup jobs were found which contained them:
|
|
|
|
[source,console-result]
|
|
----
|
|
{
|
|
"error": {
|
|
"root_cause": [
|
|
{
|
|
"type": "illegal_argument_exception",
|
|
"reason": "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
|
|
"stack_trace": ...
|
|
}
|
|
],
|
|
"type": "illegal_argument_exception",
|
|
"reason": "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
|
|
"stack_trace": ...
|
|
},
|
|
"status": 400
|
|
}
|
|
----
|
|
// TESTRESPONSE[s/"stack_trace": \.\.\./"stack_trace": $body.$_path/]
|
|
|
|
[discrete]
|
|
==== Interval granularity
|
|
|
|
Rollups are stored at a certain granularity, as defined by the `date_histogram` group in the configuration. This means you
|
|
can only search/aggregate the rollup data with an interval that is greater-than or equal to the configured rollup interval.
|
|
|
|
For example, if data is rolled up at hourly intervals, the <<rollup-search>> API can aggregate on any time interval
|
|
hourly or greater. Intervals that are less than an hour will throw an exception, since the data simply doesn't
|
|
exist for finer granularities.
|
|
|
|
[[rollup-search-limitations-intervals]]
|
|
.Requests must be multiples of the config
|
|
**********************************
|
|
Perhaps not immediately apparent, but the interval specified in an aggregation request must be a whole
|
|
multiple of the configured interval. If the job was configured to rollup on `3d` intervals, you can only
|
|
query and aggregate on multiples of three (`3d`, `6d`, `9d`, etc).
|
|
|
|
A non-multiple wouldn't work, since the rolled up data wouldn't cleanly "overlap" with the buckets generated
|
|
by the aggregation, leading to incorrect results.
|
|
|
|
For that reason, an error is thrown if a whole multiple of the configured interval isn't found.
|
|
**********************************
|
|
|
|
Because the RollupSearch endpoint can "upsample" intervals, there is no need to configure jobs with multiple intervals (hourly, daily, etc).
|
|
It's recommended to just configure a single job with the smallest granularity that is needed, and allow the search endpoint to upsample
|
|
as needed.
|
|
|
|
That said, if multiple jobs are present in a single rollup index with varying intervals, the search endpoint will identify and use the job(s)
|
|
with the largest interval to satisfy the search request.
|
|
|
|
[discrete]
|
|
==== Limited querying components
|
|
|
|
The Rollup functionality allows `query`'s in the search request, but with a limited subset of components. The queries currently allowed are:
|
|
|
|
- Term Query
|
|
- Terms Query
|
|
- Range Query
|
|
- MatchAll Query
|
|
- Any compound query (Boolean, Boosting, ConstantScore, etc)
|
|
|
|
Furthermore, these queries can only use fields that were also saved in the rollup job as a `group`.
|
|
If you wish to filter on a keyword `hostname` field, that field must have been configured in the rollup job under a `terms` grouping.
|
|
|
|
If you attempt to use an unsupported query, or the query references a field that wasn't configured in the rollup job, an exception will be
|
|
thrown. We expect the list of support queries to grow over time as more are implemented.
|
|
|
|
[discrete]
|
|
==== Timezones
|
|
|
|
Rollup documents are stored in the timezone of the `date_histogram` group configuration in the job. If no timezone is specified, the default
|
|
is to rollup timestamps in `UTC`.
|
|
|