39 Commits

Author SHA1 Message Date
Jim Ferenczi 79cd6385fe
Collapse package structure for metrics aggs (#33463)
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.

Relates #22868
2018-09-07 10:58:06 +02:00
Zachary Tong 90ce3a6224 [Rollup] Fix Caps Comparator to handle calendar/fixed time (#33336)
The comparator used TimeValue parsing, which meant it couldn't handle
calendar time.  This fixes the comparator to handle either (and potentially
mixed).  The mixing shouldn't be an issue since the validation code
upstream will prevent it, but was simplest to allow the comparator
to handle both.
2018-09-03 10:49:19 +02:00
Zachary Tong d93b2a2e9a
[Rollup] Only allow aggregating on multiples of configured interval (#32052)
We need to limit the search request aggregations to whole multiples
of the configured interval for both histogram and date_histogram.
Otherwise, agg buckets won't overlap with the rolled up buckets
and the results will be incorrect.

For histogram, the validation is very simple: request must be >= the config,
and evenly divisible by it.

Dates are more tricky.
- If both request and config are fixed dates, we can convert to millis
and treat them just like the histo
- If both are calendar, we make sure the request is >= the config with
a static lookup map that ranks the calendar values relatively.  All
calendar units are "singles", so they are evenly divisible already
- We disallow any other combination (one fixed, one calendar, etc)
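
The rules above can be sketched roughly as follows. This is a minimal illustration and not the code from the commit: the class name, the method names, and the contents of the ranking map are all assumptions made for the example.

```java
import java.util.Map;

final class IntervalValidationSketch {
    // Hypothetical ranking of calendar units from smallest to largest.
    // The unit names are illustrative, not the lookup map used by the commit.
    private static final Map<String, Integer> CALENDAR_RANK = Map.of(
        "1m", 1, "1h", 2, "1d", 3, "1w", 4, "1M", 5, "1q", 6, "1y", 7);

    // Histogram intervals, or fixed date intervals already converted to millis:
    // the request must be at least the configured interval and a whole multiple of it.
    static boolean isValidFixed(long requestInterval, long configInterval) {
        if (configInterval <= 0) {
            throw new IllegalArgumentException("config interval must be positive");
        }
        return requestInterval >= configInterval && requestInterval % configInterval == 0;
    }

    // Calendar intervals: both sides must be known calendar units, and the
    // request must rank at least as high as the config.
    static boolean isValidCalendar(String requestUnit, String configUnit) {
        Integer requestRank = CALENDAR_RANK.get(requestUnit);
        Integer configRank = CALENDAR_RANK.get(configUnit);
        if (requestRank == null || configRank == null) {
            return false; // one side is not a calendar unit, so the combination is rejected
        }
        return requestRank >= configRank;
    }

    public static void main(String[] args) {
        System.out.println(isValidFixed(10_000, 5_000));  // request is a whole multiple
        System.out.println(isValidFixed(7_000, 5_000));   // request is not a multiple
        System.out.println(isValidCalendar("1d", "1h"));  // day ranks above hour
        System.out.println(isValidCalendar("1h", "1d"));  // hour ranks below day
    }
}
```

Because every calendar unit is a "single", the calendar check only needs the rank comparison and no modulo.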
2018-08-29 17:10:00 -04:00
Hendrik Muhs cfc003d485 [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability (#32743)
This extracts a super class out of the rollup indexer called the AsyncTwoPhaseIterator. 
The implementor of it can define the query, transformation of the response, 
indexing and the object to persist the position/state of the indexer.

The stats object used by the indexer to record progress is also now abstract, allowing
the implementation to provide custom stats beyond what the indexer provides.  It also
allows the implementation to decide how the stats are presented (leaves toXContent()
up to the implementation).

This should allow new projects to reuse the search-then-index persistent task that Rollup
uses, but without the restrictions/baggage of how Rollup has to work internally to
satisfy time-based rollups.
2018-08-29 14:28:21 -04:00
Zachary Tong 353112a033
[Rollup] Better error message when trying to set non-rollup index (#32965)
We don't allow the user to configure a rollup index against an
existing index, but the exceptions that we return are not clear about
that.  They indicate issues with metadata, instead of stating
the real reason (not allowed to use a non-rollup index to store
rollup data).

This makes the exception better, and adds a bit more testing
2018-08-28 11:50:35 -04:00
Tanguy Leroux e1e8cf382f
[Rollup] Move toBuilders() methods out of rollup config objects (#32585) 2018-08-27 09:18:26 +02:00
Tanguy Leroux 879a90b999
[Rollup] Move getMetadata() methods out of rollup config objects (#32579)
This commit removes the getMetadata() methods from the DateHistoGroupConfig
and HistoGroupConfig objects. This way the configuration objects do not rely on
RollupField.formatMetaField() anymore and do not expose a getMetadata()
method that is tightly coupled to the rollup indexer.
2018-08-24 11:57:46 +02:00
Zachary Tong 8f8d3a5556
[Rollup] Return empty response when aggs are missing (#32796)
If a search request doesn't contain aggs (or an empty agg object),
we should just return an empty response.  This is how the normal search
API works if you specify zero hits and empty aggs.

The existing behavior throws an exception because it tries to send
an empty msearch.

Closes #32256
2018-08-23 16:15:37 -04:00
Nik Everett 462e91d362
Logging: Use settings when building daemon threads (#32751)
Subclasses of `EsIntegTestCase` run multiple Elasticsearch nodes in the
same JVM and when we log we look at the name of the thread to figure out
the node name. This makes sure that all calls to `daemonThreadFactory`
include the node name.

Closes #32574

I'd like to follow this up with more drastic changes that make it
impossible to do this incorrectly but that change is much larger than
this and I'd like to get these log lines fixed up sooner rather than
later.
2018-08-20 13:53:15 -04:00
Lee Hinman 48281ac5bc
Use generic AcknowledgedResponse instead of extended classes (#32859)
This removes custom Response classes that extend `AcknowledgedResponse` and do nothing. These classes are not needed, and we can directly use the non-abstract super-class instead.

While this appears to be a large PR, no code has actually changed; only class names have been changed and entire classes removed.
2018-08-15 08:06:14 -06:00
Tanguy Leroux 2e65bac5dd
[Rollup] Remove builders from RollupJobConfig (#32669) 2018-08-07 18:54:42 +02:00
Tanguy Leroux 1122314b3b
[Rollup] Remove builders from GroupConfig (#32614) 2018-08-07 09:39:24 +02:00
Zachary Tong fc9fb64ad5
[Rollup] Improve ID scheme for rollup documents (#32558)
Previously, we were using a simple CRC32 for the IDs of rollup documents.
This is a very poor choice however, since 32bit IDs lead to collisions
between documents very quickly.

This commit moves Rollups over to a 128bit ID.  The ID is a concatenation
of all the keys in the document (similar to the rolling CRC before),
hashed with 128bit Murmur3, then base64 encoded.  Finally, the job
ID and a delimiter (`$`) are prepended to the ID.

This guarantees that there are 128bits per-job.  128bits should
essentially remove all chances of collisions, and the prepended
job ID means that _if_ there is a collision, it stays "within"
the job.
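
As a rough sketch of the shape of this ID scheme (and not the implementation in the commit): the JDK has no Murmur3, so the example below swaps in MD5 purely as a stand-in 128-bit hash. The class name, method name, and the zero-byte separator between keys are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.List;

final class RollupIdSketch {
    // Builds "<jobId>$<base64 of a 128-bit hash over all keys>".
    static String rollupDocId(String jobId, List<String> keys) {
        try {
            // ASSUMPTION: MD5 is only a stand-in for 128-bit Murmur3 here.
            MessageDigest digest = MessageDigest.getInstance("MD5");
            for (String key : keys) {
                digest.update(key.getBytes(StandardCharsets.UTF_8));
                // ASSUMPTION: a separator byte so ["ab", "c"] and ["a", "bc"] hash differently.
                digest.update((byte) 0);
            }
            String encoded = Base64.getUrlEncoder().withoutPadding()
                .encodeToString(digest.digest());
            // Prepending the job ID keeps any collision "within" the job.
            return jobId + "$" + encoded;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 should be available on every JDK", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rollupDocId("job1", List.of("2018-08-03", "host-a")));
    }
}
```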

BWC notes:

We can only upgrade the ID scheme after we know there has been a good
checkpoint during indexing.  We don't rely on a STARTED/STOPPED
status since we can't guarantee that resulted from a real checkpoint,
or other state.  So we only upgrade the ID after we have reached
a checkpoint state during an active index run, and only after the
checkpoint has been confirmed.

Once a job has been upgraded and checkpointed, the version increments
and the new ID is used in the future.  All new jobs use the
new ID from the start
2018-08-03 11:13:25 -04:00
Tanguy Leroux 21f660d801
[Rollup] Remove builders from DateHistogramGroupConfig (#32555)
Same motivation as #32507 but for the DateHistogramGroupConfig
configuration object. This pull request also changes the format of the
time zone from a Joda's DateTimeZone to a simple String.

It should help to port the API to the high level rest client and allows
clients to not be forced to use the Joda Time library. Serialization is
impacted but does not need a backward compatibility layer as
DateTimeZone are serialized as String anyway. XContent also expects
a String for timezone, so I found it easier to move everything to String.

Related to #29827
2018-08-03 13:11:00 +02:00
Tanguy Leroux 937dcfd716
[Rollup] Remove builders from MetricConfig (#32536)
Related to #29827
2018-08-03 10:01:20 +02:00
Tanguy Leroux 08e4f4be42
[Rollup] Remove builders from HistoGroupConfig (#32533)
Related to #29827
2018-08-02 17:55:00 +02:00
Tanguy Leroux 82fe67b225
[Rollup] Remove builders from TermsGroupConfig (#32507)
While working on adding the Create Rollup Job API to the 
high level REST client (#29827), I noticed that the configuration 
objects like TermsGroupConfig rely on the Builder pattern in 
order to create or parse instances. These builders are doing 
some validation but the same validation could be done within 
the constructor itself or on the server side when appropriate.

This commit removes the builder for TermsGroupConfig and
removes some other methods that I consider not really useful
once the TermsGroupConfig object is exposed in the
high level REST client. It also simplifies the parsing logic.

Related to #29827
2018-08-01 09:43:32 +02:00
Zachary Tong 6cf7588c3d
[TEST] Fix failure due to exception message in java11 (#32321)
Java 11 uses more verbose exception messages, causing this assertion
to fail.  Changed the test to be less restrictive and only look
for the classes we care about.
2018-07-25 11:34:26 -04:00
Zachary Tong 6ba144ae31
Add WeightedAvg metric aggregation (#31037)
Adds a new single-value metrics aggregation that computes the weighted 
average of numeric values that are extracted from the aggregated 
documents. These values can be extracted from specific numeric
fields in the documents.

When calculating a regular average, each datapoint has an equal "weight"; it
contributes equally to the final value.  In contrast, weighted averages
scale each datapoint differently.  The amount that each datapoint contributes 
to the final value is extracted from the document, or provided by a script.

As a formula, a weighted average is the `∑(value * weight) / ∑(weight)`

A regular average can be thought of as a weighted average where every value has
an implicit weight of `1`.
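
The formula can be illustrated with a few lines of plain Java. This is only a sketch of the arithmetic, not the aggregation's implementation; the class and method names are made up for the example.

```java
final class WeightedAvgSketch {
    // Computes sum(value * weight) / sum(weight) over parallel arrays.
    static double weightedAverage(double[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("values and weights must have the same length");
        }
        double weightedSum = 0.0;
        double weightSum = 0.0;
        for (int i = 0; i < values.length; i++) {
            weightedSum += values[i] * weights[i];
            weightSum += weights[i];
        }
        if (weightSum == 0.0) {
            throw new IllegalArgumentException("sum of weights must not be zero");
        }
        return weightedSum / weightSum;
    }

    public static void main(String[] args) {
        double[] values = {1.0, 2.0, 3.0};
        // Weighted: (1*3 + 2*2 + 3*1) / (3 + 2 + 1) = 10 / 6
        System.out.println(weightedAverage(values, new double[] {3.0, 2.0, 1.0}));
        // With every weight set to 1, the result is the regular average: 6 / 3
        System.out.println(weightedAverage(values, new double[] {1.0, 1.0, 1.0}));
    }
}
```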

Closes #15731
2018-07-23 18:33:15 -04:00
Jim Ferenczi 644a92f158
Fix rollup on date fields that don't support epoch_millis (#31890)
The rollup indexer uses a range query to select the next page
of results based on the last time bucket of the previous round
and the `delay` configured on the rollup job. This query uses
the `epoch_millis` format implicitly but doesn't set the `format`.
This results in errors during the rollup job if the field
definition doesn't allow this format. It can also miss documents
if the format is not accepted but another format in the field
definition is able to parse the query (e.g.: `epoch_second`).
This change ensures that we use `epoch_millis` as the only format
to parse the rollup range query.
2018-07-19 09:34:23 +02:00
Zachary Tong 791b9b147c
[Rollup] Add new capabilities endpoint for concrete rollup indices (#30401)
This introduces a new GetRollupIndexCaps API which allows the user to retrieve rollup capabilities of a specific rollup index (or index pattern). This is distinct from the existing RollupCaps endpoint.

- Multiple jobs can be stored in multiple indices and point to a single target data index pattern (logstash-*). The existing API finds capabilities/config of all jobs matching that data index pattern.
- One rollup index can hold data from multiple jobs, targeting multiple data index patterns. This new API finds the capabilities based on the concrete rollup indices.
2018-07-16 17:20:50 -04:00
Zachary Tong 59191b4998
[Rollup] Replace RollupIT with a ESRestTestCase version (#31977)
The old RollupIT was a node IT, and flaky for a number of reasons.
This new version is an ESRestTestCase and should be a little more robust.

This was added to the multi-node QA tests as that seemed like the most
appropriate location.  It didn't seem necessary to create a whole new
QA module.

Note: The only test that was ported was the "Big" test for validating
a larger dataset.  The rest of the tests are represented in existing
yaml tests.

Closes #31258
Closes #30232
Related to #30290
2018-07-16 10:47:46 -04:00
Zachary Tong b7f07f03ed
[Rollup] Use composite's missing_bucket (#31402)
We can leverage the composite agg's new `missing_bucket` feature on
terms groupings.  This means the aggregation criteria used in the indexer
will now return null buckets for missing keys.  

Because all buckets are now returned (even if a key is null),
we can guarantee correct doc counts with
"combined" jobs (where a job rolls up multiple schemas).  This was
previously impossible since composite would ignore documents that
didn't have _all_ the keys, meaning non-overlapping schemas would
cause composite to return no buckets.

Note: date_histo does not use `missing_bucket`, since a timestamp is
always required.

The docs have been adjusted to recommend a single, combined job.  It
also makes reference to the previous issue to help users that are upgrading
(rather than just deleting the sections).
2018-07-13 10:07:42 -04:00
Vladimir Dolzhenko 6acb591012 mark RollupIT.testTwoJobsStartStopDeleteOne as AwaitsFix 2018-07-05 10:03:10 +02:00
Alpar Torok 8557bbab28
Upgrade gradle wrapper to 4.8 (#31525)
* Move to Gradle 4.8 RC1

* Use latest version of plugin

The current does not work with Gradle 4.8 RC1

* Switch to Gradle GA

* Add and configure build compare plugin

* add work-around for https://github.com/gradle/gradle/issues/5692

* work around https://github.com/gradle/gradle/issues/5696

* Make use of Gradle build compare with reference project

* Make the manifest more compare friendly

* Clear the manifest in compare friendly mode

* Remove animalsniffer from buildscript classpath

* Fix javadoc errors

* Fix doc issues

* reference Gradle issues in comments

* Conditionally configure build compare

* Fix some more doclint issues

* fix typo in build script

* Add sanity check to make sure the test task was replaced

Relates to #31324. It seems like Gradle has an inconsistent behavior and
the task is not always replaced.

* Include number of non conforming tasks in the exception.

* No longer replace test task, create implicit instead

Closes #31324. The issue has full context in comments.

With this change the `test` task becomes nothing more than an alias for `utest`.
Some of the stand alone tests that had a `test` task now have `integTest`, and a
few of them that used to have `integTest` to run multiple tests now only
have `check`.
This will also help separate unit/micro tests from integration tests.

* Revert "No longer replace test task, create implicit instead"

This reverts commit f1ebaf7d93e4a0a19e751109bf620477dc35023c.

* Fix replacement of the test task

Based on information from gradle/gradle#5730 replace the task taking
into account the task providers.
Closes #31324.

* Only apply build compare plugin if needed

* Make sure test runs before integTest

* Fix doclint after merge

* PR review comments

* Switch to Gradle 4.8.1 and remove workaround

* PR review comments

* Consolidate task ordering
2018-06-28 08:13:21 +03:00
Tanguy Leroux be9292cac6
[Test] Add full cluster restart test for Rollup (#31533)
This pull request adds a full cluster restart test for a Rollup job. 
The test creates and starts a Rollup job on the cluster and checks 
that the job already exists and is correctly started on the upgraded 
cluster.

This test verifies that the persistent task state is correctly 
parsed from the cluster state after the upgrade, as the status field 
has been renamed to state in #31031.

The test uncovers a ClassCastException that can be thrown in 
the RollupIndexer when the timestamp has a very low value that fits 
into an integer. When that is the case, the value is parsed back as an 
Integer instead of Long object and (long) position.get(rollupFieldName) 
fails.
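
The failure mode can be reproduced in isolation. In the sketch below the class and method names are hypothetical, and reading the value through `Number.longValue()` is just one common way to avoid the exception, not necessarily what the project did.

```java
import java.util.HashMap;
import java.util.Map;

final class PositionCastSketch {
    // Mirrors the failing pattern: (long) on an Object is a cast to Long
    // followed by unboxing, so an Integer value throws ClassCastException.
    static long unsafeRead(Map<String, Object> position, String field) {
        return (long) position.get(field);
    }

    // Works for both Integer and Long because both extend Number.
    static long safeRead(Map<String, Object> position, String field) {
        return ((Number) position.get(field)).longValue();
    }

    public static void main(String[] args) {
        Map<String, Object> position = new HashMap<>();
        position.put("timestamp", 42); // small value stored as an Integer

        System.out.println(safeRead(position, "timestamp"));
        try {
            unsafeRead(position, "timestamp");
        } catch (ClassCastException e) {
            System.out.println("unsafeRead failed: " + e.getClass().getSimpleName());
        }
    }
}
```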
2018-06-26 10:07:25 +02:00
Ryan Ernst 7a150ec06d
Core: Combine doExecute methods in TransportAction (#31517)
TransportAction currently contains 2 doExecute methods, one which takes
the task, and one that does not. The latter is what some subclasses
implement, while the first one just calls the latter, dropping the given
task. This commit combines these methods, in favor of just always
assuming a task is present.
2018-06-22 15:03:01 -07:00
Ryan Ernst 59e7c6411a
Core: Combine messageReceived methods in TransportRequestHandler (#31519)
TransportRequestHandler currently contains 2 messageReceived methods,
one which takes a Task, and one that does not. The first just delegates
to the second. This commit changes all existing implementors of
TransportRequestHandler to implement the version which takes Task, thus
allowing the class to be a functional interface, and eliminating the
need to throw exceptions when a task needs to be ensured.
2018-06-22 07:36:03 -07:00
Ryan Ernst 4f9332ee16
Core: Remove ThreadPool from base TransportAction (#31492)
Most transport actions don't need the node ThreadPool. This commit
removes the ThreadPool as a super constructor parameter for
TransportAction. The actions that do need the thread pool then have a
member added to keep it from their own constructor.
2018-06-21 11:25:26 -07:00
Ryan Ernst 401800d958
Core: Remove index name resolver from base TransportAction (#31002)
Most transport actions don't need to resolve index names. This commit
removes the index name resolver as a super constructor parameter for
TransportAction. The actions that do need the resolver then have a
member added to keep the resolver from their own constructor.
2018-06-19 17:06:09 -07:00
Tanguy Leroux 992c7889ee
Uncouple persistent task state and status (#31031)
This pull request removes the relationship between the state 
of a persistent task (as stored in the cluster state) and the status 
of the task (as reported by the Task APIs and used in various 
places), which has been confusing for some time (#29608).

In order to do that, a new PersistentTaskState interface is added. 
This interface represents the persisted state of a persistent task. 
The methods used to update the state of persistent tasks are 
renamed: updatePersistentStatus() becomes updatePersistentTaskState() 
and now takes a PersistentTaskState as a parameter. The 
Task.Status type has been changed to PersistentTaskState in all 
places where it makes sense (in persistent task customs in cluster 
state and all other methods that deal with the state of an allocated 
persistent task).
2018-06-15 09:26:47 +02:00
Tanguy Leroux bf58660482
Remove all unused imports and fix CRLF (#31207)
The X-Pack opening and the recent other refactorings left a lot of 
unused imports in the codebase. This commit removes them all.
2018-06-11 15:12:12 +02:00
Zachary Tong a1c9def64e
[Rollup] Disallow index patterns that match the rollup index (#30491)
We should not allow the user to configure index patterns that also match
the index which stores the rollup data.

For example, it is quite natural for a user to specify `metricbeat-*`
as the index pattern, and then store the rollups in `metricbeat-rolled`.
This will start throwing errors as soon as the rollup index is created
because the indexer will try to search it.

Note: this does not prevent the user from matching against existing
rollup indices.  That should be prevented by the field-level validation
during job creation.
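
The check described above can be sketched as follows. This is a simplified illustration that treats `*` as the only wildcard; the class name, method names, and error message are assumptions, and the real validation may resolve patterns differently.

```java
import java.util.regex.Pattern;

final class RollupIndexPatternSketch {
    // Simplified wildcard matching: '*' matches any run of characters,
    // every other character is treated literally.
    static boolean matches(String indexPattern, String indexName) {
        StringBuilder regex = new StringBuilder();
        String[] literals = indexPattern.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.matches(regex.toString(), indexName);
    }

    // Rejects a job config whose index pattern would also match the rollup index.
    static void validate(String indexPattern, String rollupIndex) {
        if (matches(indexPattern, rollupIndex)) {
            throw new IllegalArgumentException("Index pattern [" + indexPattern
                + "] must not match the rollup index [" + rollupIndex + "]");
        }
    }

    public static void main(String[] args) {
        validate("metricbeat-*", "rollup-metrics"); // accepted
        try {
            validate("metricbeat-*", "metricbeat-rolled");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```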
2018-06-05 15:00:34 -04:00
Jim Ferenczi 7f850bb8ce
Allow terms query in _rollup_search (#30973)
This change adds the `terms` query to the list of accepted queries
for the _rollup_search endpoint.
2018-06-05 16:51:14 +02:00
Yannick Welsch e1649b8669
Allow rollup job creation only if cluster is x-pack ready (#30963)
Otherwise we could end up with persistent tasks metadata in the cluster that some of the nodes
might not understand in the case where the cluster is in the middle of a rolling upgrade from the default 6.2 to the
default 6.3 distribution.

Follow-up to #30743
2018-06-01 10:47:53 +02:00
Tanguy Leroux a0af0e7f1e
Rename methods in PersistentTasksService (#30837)
This commit renames methods in the PersistentTasksService, to 
make it obvious that the methods send requests in order to change 
the state of persistent tasks. 

Relates to #29608.
2018-05-30 09:20:14 +02:00
Colin Goodheart-Smithe a75b8adce5
Refactors ClientHelper to combine header logic (#30620)
* Refactors ClientHelper to combine header logic

This change removes all the `*ClientHelper` classes which were
repeating logic between plugins and instead adds
`ClientHelper.executeWithHeaders()` and
`ClientHelper.executeWithHeadersAsync()` methods to centralise the
logic for executing requests with stored security headers.

* Removes Watcher headers constant
2018-05-16 11:38:24 +01:00
Zachary Tong 1c0d339904
[Rollup] Validate timezone in range queries (#30338)
When validating the search request, we make sure any date_histogram
aggregations have timezones that match the jobs.  But we didn't
do any such validation on range queries.

While it wouldn't produce incorrect results, it would be confusing
to the user as no documents would match the aggregation (because we
add a filter clause on the timezone for the agg).

Now the user gets an exception up front, and some helpful text about
why the range query didn't match, and which timezones are acceptable
2018-05-04 10:45:16 -07:00
Ryan Ernst 2efd22454a Migrate x-pack-elasticsearch source to elasticsearch 2018-04-20 15:29:54 -07:00