2018-02-23 17:10:37 -05:00
|
|
|
[role="xpack"]
|
2018-08-31 13:50:43 -04:00
|
|
|
[testenv="basic"]
|
2018-02-23 17:10:37 -05:00
|
|
|
[[rollup-search]]
|
2018-12-20 13:23:28 -05:00
|
|
|
=== Rollup search
|
2018-02-23 17:10:37 -05:00
|
|
|
++++
|
2018-12-20 13:23:28 -05:00
|
|
|
<titleabbrev>Rollup search</titleabbrev>
|
2018-02-23 17:10:37 -05:00
|
|
|
++++
|
|
|
|
|
2018-06-13 15:42:20 -04:00
|
|
|
experimental[]
|
|
|
|
|
2018-02-23 17:10:37 -05:00
|
|
|
The Rollup Search endpoint allows searching rolled-up data using the standard query DSL. The Rollup Search endpoint
|
|
|
|
is needed because, internally, rolled-up documents utilize a different document structure than the original data. The
|
|
|
|
Rollup Search endpoint rewrites standard query DSL into a format that matches the rollup documents, then takes the response
|
|
|
|
and rewrites it back to what a client would expect given the original query.
|
|
|
|
|
|
|
|
==== Request
|
|
|
|
|
|
|
|
`GET {index}/_rollup_search`
|
|
|
|
|
|
|
|
//===== Description
|
|
|
|
|
|
|
|
==== Path Parameters
|
|
|
|
|
|
|
|
`index`::
|
|
|
|
(string) Index, indices or index-pattern to execute a rollup search against. This can include both rollup and non-rollup
|
2018-03-30 16:43:33 -04:00
|
|
|
indices.
|
2018-02-23 17:10:37 -05:00
|
|
|
|
2018-03-30 16:43:33 -04:00
|
|
|
Rules for the `index` parameter:
|
2018-12-07 14:39:58 -05:00
|
|
|
|
2018-03-30 16:43:33 -04:00
|
|
|
- At least one index/index-pattern must be specified. This can be either a rollup or non-rollup index. Omitting the index parameter,
|
|
|
|
or using `_all`, is not permitted
|
|
|
|
- Multiple non-rollup indices may be specified
|
|
|
|
- Only one rollup index may be specified. If more than one are supplied an exception will be thrown
|
2018-10-30 13:50:50 -04:00
|
|
|
- Index patterns may be used, but if they match more than one rollup index an exception will be thrown.
|
2018-02-23 17:10:37 -05:00
|
|
|
|
|
|
|
==== Request Body
|
|
|
|
|
|
|
|
The request body supports a subset of features from the regular Search API. It supports:
|
|
|
|
|
2018-08-17 13:33:12 -04:00
|
|
|
- `query` param for specifying an DSL query, subject to some limitations (see <<rollup-search-limitations>> and <<rollup-agg-limitations>>
|
2018-02-23 17:10:37 -05:00
|
|
|
- `aggregations` param for specifying aggregations
|
|
|
|
|
|
|
|
Functionality that is not available:
|
|
|
|
|
|
|
|
- `size`: because rollups work on pre-aggregated data, no search hits can be returned and so size must be set to zero or
|
|
|
|
omitted entirely.
|
|
|
|
- `highlighter`, `suggestors`, `post_filter`, `profile`, `explain` are similarly disallowed
|
|
|
|
|
|
|
|
|
|
|
|
==== Historical-only search example
|
|
|
|
|
|
|
|
Imagine we have an index named `sensor-1` full of raw data, and we have created a rollup job with the following configuration:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
2018-12-11 19:43:17 -05:00
|
|
|
PUT _rollup/job/sensor
|
2018-02-23 17:10:37 -05:00
|
|
|
{
|
|
|
|
"index_pattern": "sensor-*",
|
|
|
|
"rollup_index": "sensor_rollup",
|
|
|
|
"cron": "*/30 * * * * ?",
|
2018-04-10 16:34:40 -04:00
|
|
|
"page_size" :1000,
|
2018-02-23 17:10:37 -05:00
|
|
|
"groups" : {
|
|
|
|
"date_histogram": {
|
|
|
|
"field": "timestamp",
|
[7.x Backport] Force selection of calendar or fixed intervals (#41906)
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
2019-05-20 12:07:29 -04:00
|
|
|
"fixed_interval": "1h",
|
2018-02-23 17:10:37 -05:00
|
|
|
"delay": "7d"
|
|
|
|
},
|
|
|
|
"terms": {
|
|
|
|
"fields": ["node"]
|
|
|
|
}
|
|
|
|
},
|
|
|
|
"metrics": [
|
|
|
|
{
|
|
|
|
"field": "temperature",
|
|
|
|
"metrics": ["min", "max", "sum"]
|
|
|
|
},
|
|
|
|
{
|
|
|
|
"field": "voltage",
|
|
|
|
"metrics": ["avg"]
|
|
|
|
}
|
|
|
|
]
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
2018-04-04 18:32:26 -04:00
|
|
|
// TEST[setup:sensor_index]
|
2018-02-23 17:10:37 -05:00
|
|
|
|
|
|
|
This rolls up the `sensor-*` pattern and stores the results in `sensor_rollup`. To search this rolled up data, we
|
|
|
|
need to use the `_rollup_search` endpoint. However, you'll notice that we can use regular query DSL to search the
|
|
|
|
rolled-up data:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
GET /sensor_rollup/_rollup_search
|
|
|
|
{
|
|
|
|
"size": 0,
|
|
|
|
"aggregations": {
|
|
|
|
"max_temperature": {
|
|
|
|
"max": {
|
|
|
|
"field": "temperature"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
// TEST[setup:sensor_prefab_data]
|
2018-08-23 16:15:37 -04:00
|
|
|
// TEST[s/_rollup_search/_rollup_search?filter_path=took,timed_out,terminated_early,_shards,hits,aggregations/]
|
2018-02-23 17:10:37 -05:00
|
|
|
|
|
|
|
The query is targeting the `sensor_rollup` data, since this contains the rollup data as configured in the job. A `max`
|
|
|
|
aggregation has been used on the `temperature` field, yielding the following response:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
----
|
|
|
|
{
|
|
|
|
"took" : 102,
|
|
|
|
"timed_out" : false,
|
[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039912ba55ee4b5d21f25eaae54b76b68
2018-03-27 13:33:59 -04:00
|
|
|
"terminated_early" : false,
|
2018-02-23 17:10:37 -05:00
|
|
|
"_shards" : ... ,
|
|
|
|
"hits" : {
|
2018-12-05 13:49:06 -05:00
|
|
|
"total" : {
|
|
|
|
"value": 0,
|
|
|
|
"relation": "eq"
|
|
|
|
},
|
2018-02-23 17:10:37 -05:00
|
|
|
"max_score" : 0.0,
|
|
|
|
"hits" : [ ]
|
|
|
|
},
|
|
|
|
"aggregations" : {
|
|
|
|
"max_temperature" : {
|
|
|
|
"value" : 202.0
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
----
|
|
|
|
// TESTRESPONSE[s/"took" : 102/"took" : $body.$_path/]
|
|
|
|
// TESTRESPONSE[s/"_shards" : \.\.\. /"_shards" : $body.$_path/]
|
|
|
|
|
|
|
|
The response is exactly as you'd expect from a regular query + aggregation; it provides some metadata about the request
|
|
|
|
(`took`, `_shards`, etc), the search hits (which is always empty for rollup searches), and the aggregation response.
|
|
|
|
|
|
|
|
Rollup searches are limited to functionality that was configured in the rollup job. For example, we are not able to calculate
|
|
|
|
the average temperature because `avg` was not one of the configured metrics for the `temperature` field. If we try
|
|
|
|
to execute that search:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
GET sensor_rollup/_rollup_search
|
|
|
|
{
|
|
|
|
"size": 0,
|
|
|
|
"aggregations": {
|
|
|
|
"avg_temperature": {
|
|
|
|
"avg": {
|
|
|
|
"field": "temperature"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
// TEST[continued]
|
[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039912ba55ee4b5d21f25eaae54b76b68
2018-03-27 13:33:59 -04:00
|
|
|
// TEST[catch:/illegal_argument_exception/]
|
2018-02-23 17:10:37 -05:00
|
|
|
|
|
|
|
[source,js]
|
|
|
|
----
|
|
|
|
{
|
|
|
|
"error" : {
|
|
|
|
"root_cause" : [
|
|
|
|
{
|
|
|
|
"type" : "illegal_argument_exception",
|
[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039912ba55ee4b5d21f25eaae54b76b68
2018-03-27 13:33:59 -04:00
|
|
|
"reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
|
2018-02-23 17:10:37 -05:00
|
|
|
"stack_trace": ...
|
|
|
|
}
|
|
|
|
],
|
|
|
|
"type" : "illegal_argument_exception",
|
[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039912ba55ee4b5d21f25eaae54b76b68
2018-03-27 13:33:59 -04:00
|
|
|
"reason" : "There is not a rollup job that has a [avg] agg with name [avg_temperature] which also satisfies all requirements of query.",
|
2018-02-23 17:10:37 -05:00
|
|
|
"stack_trace": ...
|
|
|
|
},
|
|
|
|
"status": 400
|
|
|
|
}
|
|
|
|
----
|
|
|
|
// TESTRESPONSE[s/"stack_trace": \.\.\./"stack_trace": $body.$_path/]
|
|
|
|
|
|
|
|
==== Searching both historical rollup and non-rollup data
|
|
|
|
|
|
|
|
The Rollup Search API has the capability to search across both "live", non-rollup data as well as the aggregated rollup
|
|
|
|
data. This is done by simply adding the live indices to the URI:
|
|
|
|
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
--------------------------------------------------
|
|
|
|
GET sensor-1,sensor_rollup/_rollup_search <1>
|
|
|
|
{
|
|
|
|
"size": 0,
|
|
|
|
"aggregations": {
|
|
|
|
"max_temperature": {
|
|
|
|
"max": {
|
|
|
|
"field": "temperature"
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
--------------------------------------------------
|
|
|
|
// CONSOLE
|
|
|
|
// TEST[continued]
|
2018-08-23 16:15:37 -04:00
|
|
|
// TEST[s/_rollup_search/_rollup_search?filter_path=took,timed_out,terminated_early,_shards,hits,aggregations/]
|
2018-02-23 17:10:37 -05:00
|
|
|
<1> Note the URI now searches `sensor-1` and `sensor_rollup` at the same time
|
|
|
|
|
|
|
|
When the search is executed, the Rollup Search endpoint will do two things:
|
|
|
|
|
|
|
|
1. The original request will be sent to the non-rollup index unaltered
|
|
|
|
2. A rewritten version of the original request will be sent to the rollup index.
|
|
|
|
|
|
|
|
When the two responses are received, the endpoint will then rewrite the rollup response and merge the two together.
|
|
|
|
During the merging process, if there is any overlap in buckets between the two responses, the buckets from the non-rollup
|
|
|
|
index will be used.
|
|
|
|
|
|
|
|
The response to the above query will look as expected, despite spanning rollup and non-rollup indices:
|
|
|
|
|
|
|
|
[source,js]
|
|
|
|
----
|
|
|
|
{
|
|
|
|
"took" : 102,
|
|
|
|
"timed_out" : false,
|
[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152)
If there are multiple jobs that are all the "best" (e.g. share the
best interval) we have no way of knowing which is actually the best.
Unfortunately, we cannot just filter for all the jobs in a single
search because their doc_counts can potentially overlap.
To solve this, we execute an msearch-per-job so that the results
stay isolated. When rewriting the response, we iteratively
unroll and reduce the independent msearch responses into a single
"working tree". This allows us to intervene if there are
overlapping buckets and manually choose a doc_count.
Job selection is found by recursively descending through the aggregation
tree and independently pruning the list of valid job caps in each branch.
When a leaf node is reached in the branch, the remaining jobs are
sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the
implementation) and added to a global set of "best jobs". Once
all branches have been evaluated, the final set is returned to the
calling code.
Job "best'ness" is, briefly, the job(s) that have
- The largest compatible date interval
- Fewer and larger interval histograms
- Fewer terms groups
Note: the final set of "best" jobs is not guaranteed to be minimal,
there may be redundant effort due to independent branches choosing
jobs that are subsets of other branches.
Related changes:
- We have to include the job's ID in the rollup doc's
hash, so that different jobs don't overwrite the same summary
document.
- Now that we iteratively reduce the agg tree, the agg framework
injects empty buckets while we're working. In most cases this
is harmless, but for `avg` aggs the empty bucket is a SumAgg while
any unrolled versions are converted into AvgAggs... causing a cast
exception. To get around this, avg's are renamed to
`{source_name}.value` to prevent a conflict
- The job filtering has been pushed up into a query filter, since it
applies to the entire msearch rather than just individual agg components
- We no longer add a filter agg clause about the date_histo's interval, because
that is handled by the job validation and pruning.
Original commit: elastic/x-pack-elasticsearch@995be2a039912ba55ee4b5d21f25eaae54b76b68
2018-03-27 13:33:59 -04:00
|
|
|
"terminated_early" : false,
|
2018-02-23 17:10:37 -05:00
|
|
|
"_shards" : ... ,
|
|
|
|
"hits" : {
|
2018-12-05 13:49:06 -05:00
|
|
|
"total" : {
|
|
|
|
"value": 0,
|
|
|
|
"relation": "eq"
|
|
|
|
},
|
2018-02-23 17:10:37 -05:00
|
|
|
"max_score" : 0.0,
|
|
|
|
"hits" : [ ]
|
|
|
|
},
|
|
|
|
"aggregations" : {
|
|
|
|
"max_temperature" : {
|
|
|
|
"value" : 202.0
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
----
|
|
|
|
// TESTRESPONSE[s/"took" : 102/"took" : $body.$_path/]
|
|
|
|
// TESTRESPONSE[s/"_shards" : \.\.\. /"_shards" : $body.$_path/]
|