OpenSearch

Commit Graph

Author	SHA1	Message	Date
Zachary Tong	20dbd75623	[Rollup] Rename job config `size` to `page_size` (elastic/x-pack-elasticsearch#4309) Renaming should hopefully make it more clear that this is the size of pages to process during rolling up, nothing to do with the size of the various groups, metrics, etc. Original commit: elastic/x-pack-elasticsearch@8a0a44f04b	2018-04-10 13:34:40 -07:00
Zachary Tong	7810dc6146	[Rollup] Add `value_count` metric (elastic/x-pack-elasticsearch#4315) Adds `value_count` as one of the accepted metrics. The caveat is that it only accepts numeric values, for two reasons: - Job validation at creation makes sure all metrics are numeric fields. Changing this would require new syntax (or disallowing anything but value_count on mixed fields). - When `toBuilders()` is called, we have to supply a ValueSource to the ValueCountBuilder, and we don't know what the field type is at that time. These are both fixable, but relatively more involved. I think numeric-only is a reasonable limitation to start with. Original commit: elastic/x-pack-elasticsearch@270f24c8bf	2018-04-06 10:47:33 -07:00
Zachary Tong	f682ecc576	[Rollup] Remove `computed` field from rollup docs The computed field contained a list of all aggs that were computed for this particular rollup doc. It was used to help filter to the correct rollup job/set of jobs. But this functionality was never perfect, and has been obsoleted by validating the rollup caps while searching. So we can remove the computed field and save a bunch of space (since they were quite bulky) Original commit: elastic/x-pack-elasticsearch@455644488f	2018-04-05 15:25:20 +00:00
Zachary Tong	3852b41330	[Rollup] Validate field mapping before creating Rollup Job (elastic/x-pack-elasticsearch#4274) This PR adds logic to ensure that the fields (and field types) configured in the Rollup Job are present in the index/indices specified by the job's index pattern. If a field is missing, or is not aggregatable, it will throw an exception before the job is created. This is important for user-friendliness, because otherwise the user only discovers a mapping issue when the job is started and fails to roll up correctly (and the failure is only really noticeable by looking at logs, since it's a runtime failure). Original commit: elastic/x-pack-elasticsearch@686cd03072	2018-04-04 15:32:26 -07:00
Zachary Tong	e8a6c9f5d1	[Rollup] Delegate GetJobs to master (elastic/x-pack-elasticsearch#4247) If a job is deleted and then the GetJobs API is immediately called, it is possible for the deleted job to be returned in the response. This is likely due to the GetJobs API being executed on a node with a slightly stale cluster state which shows the job as still existing. So we delegate to the master node so the list of jobs/tasks is current. After routing to the master, we need to check if the rollup job is in the PersistentTask's CS. A job can be acknowledged canceled, removed from the CS, but the allocated task is still alive. So we first check the CS to make sure it's really there before going to the allocated task to get the status. As an extra precaution, when running local to the task, we also make sure the task isn't canceled before including it in the response. Relates to elastic/x-pack-elasticsearch#4041 Original commit: elastic/x-pack-elasticsearch@3b6fb65e12	2018-03-30 06:24:29 -07:00
Zachary Tong	54539a1eb0	[Rollup] Make Rollup a Basic license feature (elastic/x-pack-elasticsearch#4246) * Make Rollup a Basic license feature Original commit: elastic/x-pack-elasticsearch@ef1ee98855	2018-03-30 06:23:08 -07:00
Zachary Tong	df88ba4ed7	[Rollup] Don't persist state if aborting `doSaveState` can be invoked on different types of failure. Some of these failures are recoverable (e.g. search exception) which just cause the job to reset until the next trigger time. Other exceptions might be caused by an Abort request. Previously `doSaveState` assumed that the indexer state would be INDEXING, STOPPED or STARTED and asserted that. But if we are ABORTING it failed the assertion, and in production would try to persist that aborting state which is not needed (and may complicate matters later). This commit removes the assertion and only tries to persist if we are not aborting. If we're aborting, we just invoke the next handler which is likely an onFailure handler. Relates to elastic/x-pack-elasticsearch#4243 Original commit: elastic/x-pack-elasticsearch@3643b7c0e4	2018-03-28 13:01:58 +00:00
Zachary Tong	9cc33f4e29	[Rollup] Select best jobs then execute msearch-per-job (elastic/x-pack-elasticsearch#4152) If there are multiple jobs that are all the "best" (e.g. share the best interval) we have no way of knowing which is actually the best. Unfortunately, we cannot just filter for all the jobs in a single search because their doc_counts can potentially overlap. To solve this, we execute an msearch-per-job so that the results stay isolated. When rewriting the response, we iteratively unroll and reduce the independent msearch responses into a single "working tree". This allows us to intervene if there are overlapping buckets and manually choose a doc_count. Job selection is found by recursively descending through the aggregation tree and independently pruning the list of valid job caps in each branch. When a leaf node is reached in the branch, the remaining jobs are sorted by "best'ness" (see comparator in RollupJobIdentifierUtils for the implementation) and added to a global set of "best jobs". Once all branches have been evaluated, the final set is returned to the calling code. Job "best'ness" is, briefly, the job(s) that have - The largest compatible date interval - Fewer and larger interval histograms - Fewer terms groups Note: the final set of "best" jobs is not guaranteed to be minimal, there may be redundant effort due to independent branches choosing jobs that are subsets of other branches. Related changes: - We have to include the job's ID in the rollup doc's hash, so that different jobs don't overwrite the same summary document. - Now that we iteratively reduce the agg tree, the agg framework injects empty buckets while we're working. In most cases this is harmless, but for `avg` aggs the empty bucket is a SumAgg while any unrolled versions are converted into AvgAggs... causing a cast exception. To get around this, avg's are renamed to `{source_name}.value` to prevent a conflict - The job filtering has been pushed up into a query filter, since it applies to the entire msearch rather than just individual agg components - We no longer add a filter agg clause about the date_histo's interval, because that is handled by the job validation and pruning. Original commit: elastic/x-pack-elasticsearch@995be2a039	2018-03-27 10:33:59 -07:00
Jim Ferenczi	3a75435980	Fix IndexerUtilsTests that relies on indexed fields This test creates doc values fields only but does not set the index options to none. This commit fixes this discrepancy by adding an indexed point field for all doc values field. relates elastic/x-pack-elasticsearch#4223 Original commit: elastic/x-pack-elasticsearch@8adab7c849	2018-03-26 13:37:18 +02:00
David Turner	8c8de0a774	Mute failing IndexerUtilsTests Awaiting a fix of elastic/x-pack-elasticsearch#4223 Original commit: elastic/x-pack-elasticsearch@d385099719	2018-03-26 10:57:34 +01:00
Zachary Tong	aa877161ff	[Rollup] Register FeatureSetUsage with xpack, add tests (elastic/x-pack-elasticsearch#4040) We had a Usage class before, but weren't registering it with XPack. Would be nice to add more usage info in the future (like the running jobs on each node), but unclear the best way to do it since we'd need to filter through the list of allocated tasks. Original commit: elastic/x-pack-elasticsearch@5207d2758b	2018-03-08 08:06:42 -08:00
Jason Tedor	ead1c6c315	Fix Javadocs for o.e.x.r.j.RollupIndexer This commit fixes the Javadocs for the class o.e.x.r.j.RollupIndexer as these Javadocs were referring to instance methods on the class incorrectly (using a this prefix). Original commit: elastic/x-pack-elasticsearch@fdcc7338f9	2018-03-06 14:12:42 -08:00
Lee Hinman	2147d217df	Wrap stream passed to createParser in try-with-resources (elastic/x-pack-elasticsearch#4055) This wraps the stream (`.streamInput()`) that is passed to many of the `createParser` instances in the enclosing (or a new) try-with-resources block. This ensures the `BytesReference.streamInput()` is closed. Relates to elastic/x-pack-elasticsearch#28504 Original commit: elastic/x-pack-elasticsearch@7546e3b4d4	2018-03-04 16:48:15 -07:00
polyfractal	933738c264	[Rollup] Don't use lucene's newSearcher() method in tests Use AggregatorTestCase's `newIndexSearcher()` instead. Lucene's version can randomly wrap the IndexReader with things we can't handle, like ParallelCompositeReader. Original commit: elastic/x-pack-elasticsearch@b4c0e9a601	2018-03-02 17:07:57 -08:00
polyfractal	faac0d2a52	[Rollup] Don't randomize index name in test The test job helper randomizes the index name with 1-10 characters, which can cause randomized index names to overlap and show fewer caps than the test expects. The solution is to just use index names "0"-"24" to ensure none of the names overlap, and thus the caps don't overlap. Original commit: elastic/x-pack-elasticsearch@74a6d13213	2018-03-02 16:16:11 -08:00
polyfractal	7fbe289d30	[Rollup] Fix bad await in tests The arrangement of the final latch meant the latch could count down and the test could end before the await() triggered, which caused the thread to be interrupted and fail. The whole arrangement was incorrect anyhow. We need to await the latch before sending the search response as before, but move the final atomicBoolean to the second time the persistent task status is updated, which is a signal that we are done and can end the test. If these tests continue to be flaky, we should probably just remove them. The headers are tested elsewhere and not required to be tested in this context. Original commit: elastic/x-pack-elasticsearch@0cf5603972	2018-03-02 16:05:36 -08:00
Zachary Tong	3b474d8868	[Test] Fix slow rollup job task test Incorrect latch caused this test to run slowly (until the await finished), and could probably cause failure due to incorrect ordering Original commit: elastic/x-pack-elasticsearch@ebeb8655da	2018-02-24 20:04:33 +00:00
Zachary Tong	eb82e3cf61	[Test] Fix bad latches in rollup state tests The latches were not placed correctly, allowing the aborts to be set before we checked the state for Indexing the first time. This was due to using the DelayingIndexer's built in latch, which isn't placed quite where we needed it. Original commit: elastic/x-pack-elasticsearch@590cfa07b0	2018-02-24 18:44:51 +00:00
Zachary Tong	390e64aabd	Add empty policy file to Rollups Packaging tests seem to require a policy file for the time being Original commit: elastic/x-pack-elasticsearch@ce34b023db	2018-02-24 03:28:28 +00:00
Zachary Tong	bf1550a0b2	Rollups for Elasticsearch (elastic/x-pack-elasticsearch#4002) This adds a new Rollup module to XPack, which allows users to configure periodic "rollup jobs" to pre-aggregate data. That data is then available later for search through a special RollupSearch API, which mimics the DSL and functionality of regular search. Rollups are used to drastically reduce the on-disk footprint of metric-based data (e.g. timestamped documents with numeric and keyword fields). They can also be used to speed up aggregations over large datasets, since the rolled-up data will be considerably smaller, with fewer documents to search. The PR adds seven new endpoints to interact with Rollups: create/get/delete job, start/stop job, a capabilities API similar to field-caps, and a Rollup-enabled search. Original commit: elastic/x-pack-elasticsearch@dcde91aacf	2018-02-23 17:10:37 -05:00

20 Commits