
[role="xpack"]
[testenv="basic"]
[[rollup-search-limitations]]
== Rollup Search Limitations

experimental[]

While we feel the Rollup feature is extremely flexible, summarizing data inevitably imposes some limitations.  Once
live data is thrown away, some flexibility is permanently lost.

This page highlights the major limitations so that you are aware of them.

[float]
=== Only one Rollup index per search

When using the <<rollup-search>> endpoint, the `index` parameter accepts one or more indices.  These can be a mix of regular, non-rollup
indices and rollup indices.  However, only one rollup index can be specified.  The exact rules for the `index` parameter are as
follows:

- At least one index or index pattern must be specified.  This can be either a rollup or non-rollup index.  Omitting the `index` parameter,
or using `_all`, is not permitted.
- Multiple non-rollup indices may be specified.
- Only one rollup index may be specified.  If more than one is supplied, an exception is thrown.
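
As a sketch of these rules, the following request searches one non-rollup index and one rollup index together.  The index names
`sensor-1` and `sensor_rollup`, and the assumption that the rollup job stored a `max` metric for `temperature`, are illustrative only:

[source,js]
--------------------------------------------------
GET sensor-1,sensor_rollup/_rollup_search <1>
{
    "size": 0,
    "aggregations": {
        "max_temperature": {
            "max": {
                "field": "temperature"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> `sensor-1` is assumed to be a regular index and `sensor_rollup` a rollup index.  Listing a second rollup index here would be
rejected with an exception.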

This limitation is driven by the logic that decides which jobs are the "best" for any given query.  If you have ten jobs stored in a single
index, which cover the source data with varying degrees of completeness and different intervals, the query needs to determine which set
of jobs to actually search.  Incorrect decisions can lead to inaccurate aggregation results (e.g. over-counting doc counts, or bad metrics).
Needless to say, this is a technically challenging piece of code.

To help simplify the problem, we have limited search to just one rollup index at a time (which may contain multiple jobs).  In the future we
may be able to open this up to multiple rollup indices.

[float]
=== Can only aggregate what's been stored

It is perhaps an obvious limitation, but rollups can only aggregate on data that has been stored in the rollups.  If you don't configure the
rollup job to store metrics about the `price` field, you won't be able to use the `price` field in any query or aggregation.

For example, the `temperature` field in the following query has been stored in a rollup job, but not with an `avg` metric, so
the use of `avg` here is not allowed:

[source,js]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
    "size": 0,
    "aggregations": {
        "avg_temperature": {
            "avg": {
                "field": "temperature"
            }
        }
    }
}
--------------------------------------------------
// CONSOLE
// TEST[setup:sensor_prefab_data]
// TEST[catch:/illegal_argument_exception/]

The response indicates that the aggregation could not be satisfied, because no rollup job was found that stores the requested field and metric:

[source,js]
----
{
    "error" : {
        "root_cause" : [
            {
                "type" : "illegal_argument_exception",
                "reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
                "stack_trace": ...
            }
        ],
        "type" : "illegal_argument_exception",
        "reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
        "stack_trace": ...
    },
    "status": 400
}
----
// TESTRESPONSE[s/"stack_trace": \.\.\./"stack_trace": $body.$_path/]
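
For comparison, the following fragment sketches how the `metrics` section of a job configuration might look in this situation.  The
exact list of metrics is an assumption made for illustration; the point is only that `avg` is absent for `temperature`:

[source,js]
--------------------------------------------------
{
    "metrics": [
        {
            "field": "temperature",
            "metrics": ["min", "max", "sum"] <1>
        }
    ]
}
--------------------------------------------------
// NOTCONSOLE
<1> Hypothetical configuration.  Because `avg` is not listed, an `avg` aggregation on `temperature` is rejected, while `min`, `max`
and `sum` aggregations on the same field can be served from the rollup data.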

[float]
=== Interval Granularity

Rollups are stored at a certain granularity, as defined by the `date_histogram` group in the configuration.  This means you
can only search or aggregate the rollup data with an interval that is greater than or equal to the configured rollup interval.

For example, if data is rolled up at hourly intervals, the <<rollup-search>> API can aggregate on any time interval
hourly or greater.  Requesting an interval of less than an hour throws an exception, since the data simply doesn't
exist for finer granularities.

[[rollup-search-limitations-intervals]]
.Requests must be multiples of the config
**********************************
It may not be immediately apparent, but the interval specified in an aggregation request must be a whole
multiple of the configured interval.  If the job was configured to roll up on `3d` intervals, you can only
query and aggregate on multiples of three (`3d`, `6d`, `9d`, etc.).

A non-multiple wouldn't work, since the rolled up data wouldn't cleanly "overlap" with the buckets generated
by the aggregation, leading to incorrect results.

For that reason, an error is thrown if the requested interval is not a whole multiple of the configured interval.
**********************************
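
As an illustration, assume a hypothetical job that rolls up the `timestamp` field into a rollup index named `sensor_rollup` on `3d`
intervals.  Under that assumption, a request such as the following sketch would be acceptable, because `6d` is a whole multiple of
the configured interval:

[source,js]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
    "size": 0,
    "aggregations": {
        "timeline": {
            "date_histogram": {
                "field": "timestamp",
                "interval": "6d" <1>
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> The index name, field name and configured `3d` interval are assumptions.  An interval such as `4d` is not a whole multiple of
`3d` and would be rejected, as would anything smaller than `3d`.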

Because the RollupSearch endpoint can "upsample" intervals, there is no need to configure jobs with multiple intervals (hourly, daily, etc.).
We recommend configuring a single job with the smallest granularity that is needed, and allowing the search endpoint to upsample
as needed.

That said, if multiple jobs are present in a single rollup index with varying intervals, the search endpoint will identify and use the job(s)
with the largest interval to satisfy the search request.

[float]
=== Limited querying components

The Rollup functionality allows a `query` in the search request, but with a limited subset of components.  The queries currently allowed are:

- Term Query
- Terms Query
- Range Query
- MatchAll Query
- Any compound query (Boolean, Boosting, ConstantScore, etc.)

Furthermore, these queries can only use fields that were also saved in the rollup job as a `group`.
If you wish to filter on a keyword `hostname` field, that field must have been configured in the rollup job under a `terms` grouping.

If you attempt to use an unsupported query, or the query references a field that wasn't configured in the rollup job, an exception will be
thrown.  We expect the list of supported queries to grow over time as more are implemented.
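
A minimal sketch of a permitted query follows.  It assumes a hypothetical rollup index `sensor_rollup` whose job lists `hostname`
under a `terms` group and stores a `max` metric for `temperature`; the value `host-a` is likewise made up:

[source,js]
--------------------------------------------------
GET sensor_rollup/_rollup_search
{
    "size": 0,
    "query": {
        "term": {
            "hostname": "host-a" <1>
        }
    },
    "aggregations": {
        "max_temperature": {
            "max": {
                "field": "temperature"
            }
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> The `term` query is on the allowed list, and `hostname` is assumed to be configured as a `terms` group.  Filtering on a field
that is not part of a group would throw an exception.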

[float]
=== Timezones

Rollup documents are stored in the timezone of the `date_histogram` group configuration in the job.  If no timezone is specified, the default
is to roll up timestamps in `UTC`.
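
The following fragment of a job configuration sketches where the timezone is set.  The field name, interval and timezone value are
assumptions chosen for illustration:

[source,js]
--------------------------------------------------
{
    "groups": {
        "date_histogram": {
            "field": "timestamp",
            "interval": "1h",
            "time_zone": "America/New_York" <1>
        }
    }
}
--------------------------------------------------
// NOTCONSOLE
<1> Illustrative value.  If `time_zone` is omitted, timestamps are rolled up in `UTC`.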