Commit Graph

339 Commits

Author SHA1 Message Date
Dimitris Athanasiou ccb9ab5717 Fix time field extraction after upstream change (elastic/elasticsearch#873)
Elasticsearch changed doc_values of date fields to return a
joda DateTime object. Thus, we need to call getMillis() to extract
the epoch millis value.

Original commit: elastic/x-pack-elasticsearch@b992882af5
2017-02-07 12:04:01 +00:00
Zachary Tong 0b71e015d8 Remove incorrect/unused parameter
Original commit: elastic/x-pack-elasticsearch@4f33186b5c
2017-02-06 14:50:10 -05:00
Zachary Tong a1a5d590b6 Integrate DBQ into job deletion process (elastic/elasticsearch#691)
A JobStorageDeletionTask is created, which supervises the physical deletion of the job.  This
task is a child of the DeleteJob action.  After the DBQ finishes, the normal flow
resumes (physical index deleted, job removed from CS)

Original commit: elastic/x-pack-elasticsearch@5d6f694408
2017-02-06 14:34:36 -05:00
Igor Motov a0b37a2510 Replace List with Map in PersistentTasksInProgress
Store currently running persistent tasks in a map instead of a list.

Original commit: elastic/x-pack-elasticsearch@f383b0bbed
2017-02-06 12:18:42 -05:00
Dimitrios Athanasiou aa86d57487 Include start/end in log message for starting datafeed
Original commit: elastic/x-pack-elasticsearch@7b88bb27c1
2017-02-06 13:49:03 +00:00
David Roberts f594030c9e Fix some mappings on the .ml indexes (elastic/elasticsearch#870)
Closes elastic/elasticsearch#814

Original commit: elastic/x-pack-elasticsearch@206efacc4c
2017-02-06 12:26:03 +00:00
David Kyle 3f9741b85f Make config and result objects with dates human readable (elastic/elasticsearch#863)
Original commit: elastic/x-pack-elasticsearch@9c0c306741
2017-02-06 09:46:21 +00:00
David Roberts 50c4090541 Remove timeout setting from Job (elastic/elasticsearch#866)
This setting was related to auto-close, and Jobs no longer auto-close.

Closes elastic/elasticsearch#832

Original commit: elastic/x-pack-elasticsearch@fef81f9c3b
2017-02-06 09:39:55 +00:00
David Roberts fb0ccde8d8 More checkstyle fixes
Original commit: elastic/x-pack-elasticsearch@f68e57835b
2017-02-03 16:32:09 +00:00
David Kyle 55482ca65c Fix check style error after upgrade
Original commit: elastic/x-pack-elasticsearch@db802d1837
2017-02-03 16:08:07 +00:00
Martijn van Groningen 1b65366478 Simplified AutodetectProcess interface:
* Removed getPersistStream() method from this interface and let the NativeAutodetectProcess implementation deal with this. The persist stream is an implementation detail and BlackHoleAutodetectProcess does not deal with it either.
* Replaced getProcessOutStream() method with readAutodetectResults() method. This method now returns an `Iterator<AutodetectResult>` instead of an input stream. This makes the BlackHoleAutodetectProcess and future mocked implementations easier to write.
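
The interface change described above can be illustrated with a minimal sketch. All type names ending in `Sketch` are hypothetical stand-ins, not the real classes, and the assumption that a black-hole process yields no results at all is made only to keep the example short.

```java
import java.util.Collections;
import java.util.Iterator;

// Hypothetical stand-in for the real result type.
class AutodetectResultSketch {
    final String payload;

    AutodetectResultSketch(String payload) {
        this.payload = payload;
    }
}

// Callers iterate over parsed results instead of reading a raw stream.
interface AutodetectProcessSketch {
    Iterator<AutodetectResultSketch> readAutodetectResults();
}

// Assumption: a black-hole process produces nothing, so an empty iterator suffices.
class BlackHoleProcessSketch implements AutodetectProcessSketch {
    @Override
    public Iterator<AutodetectResultSketch> readAutodetectResults() {
        return Collections.emptyIterator();
    }
}
```

With an iterator as the return type, a mocked implementation does not need to serialize results into a stream only for the caller to parse them back.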

Original commit: elastic/x-pack-elasticsearch@086e7b40ab
2017-02-03 16:52:51 +01:00
Dimitris Athanasiou 9d9572e2b2 Reintroduce chunking to improve data extractor performance (elastic/elasticsearch#849)
* Reintroduce chunking to improve data extractor performance

Performing a sorted search/scroll over a period of time that matches
a lot of documents is very expensive because for each page all
documents are traversed.

The solution is to chunk the search time and perform separate
search/scrolls for each chunk.

This commit introduces a new `chunk` config in `datafeed_config`
whose mode can be set to one of AUTO, OFF or MANUAL, with the latter
allowing an explicit chunk size to be specified.

When set to AUTO, a heuristic is used in order to determine the chunk
size. The heuristic is based on estimating the time interval within
which we expect `scroll_size` documents and then taking the 10x multiple
of that. Based on benchmarking, this method gives a dramatic performance
increase. For example, for the citizens dataset it improved the ingest
rate from 0.33M docs / minute to 13.6M docs / minute. Farequote is now
done in ~1 second.

Finally, note that when `chunk` is not specified, it defaults to AUTO
when aggregations are not set and to OFF otherwise. This is because
the chunk size heuristic does not lend itself well to aggregations
where one needs to chunk based on the cardinality of buckets rather
than simply time.
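
As a rough illustration of the AUTO heuristic described in this message, the sketch below estimates the interval expected to contain `scroll_size` documents, multiplies it by 10, and splits the search range into chunks. The class and method names are hypothetical, and the uniform-density estimate is an assumption; the commit does not show the actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkingSketch {
    // Interval expected to hold scrollSize documents, times 10.
    // Assumes documents are spread evenly between earliestMs and latestMs.
    static long estimateChunkMillis(long earliestMs, long latestMs, long totalDocs, int scrollSize) {
        if (totalDocs <= 0 || latestMs <= earliestMs) {
            // Nothing to estimate from: fall back to a single chunk.
            return Math.max(1L, latestMs - earliestMs);
        }
        double docsPerMs = (double) totalDocs / (latestMs - earliestMs);
        long intervalForOnePage = (long) Math.ceil(scrollSize / docsPerMs);
        return Math.max(1L, 10L * intervalForOnePage);
    }

    // Splits [startMs, endMs) into consecutive {from, to} pairs of at most chunkMs each.
    static List<long[]> chunks(long startMs, long endMs, long chunkMs) {
        if (chunkMs <= 0) {
            throw new IllegalArgumentException("chunkMs must be positive");
        }
        List<long[]> result = new ArrayList<>();
        for (long from = startMs; from < endMs; from += chunkMs) {
            result.add(new long[] {from, Math.min(from + chunkMs, endMs)});
        }
        return result;
    }
}
```

Each `{from, to}` pair would then bound one separate search/scroll, so that no single scroll has to traverse the documents of the whole time range.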

Relates to elastic/elasticsearch#734

Original commit: elastic/x-pack-elasticsearch@a738e86d21
2017-02-03 15:50:01 +00:00
David Kyle 21adb19b22 Checkstyle fix
Original commit: elastic/x-pack-elasticsearch@1d0eaed282
2017-02-03 15:35:36 +00:00
Martijn van Groningen a7d95951a6 Removed forgotten blocking call when opening a job.
Original commit: elastic/x-pack-elasticsearch@e1dfa54240
2017-02-03 16:24:43 +01:00
Dimitris Athanasiou 9b0344cd90 Write enum values in lowercase (elastic/elasticsearch#861)
Original commit: elastic/x-pack-elasticsearch@6788ad3304
2017-02-03 15:10:11 +00:00
David Kyle e7dcab48ab Test was testing the wrong endpoint
Original commit: elastic/x-pack-elasticsearch@ca7a1a1097
2017-02-03 14:50:08 +00:00
David Kyle 70b8129b78 Add job update endpoint (elastic/elasticsearch#854)
* Remove redundant code

* Add job update endpoint

* Support updating detector description & rules

* Fix merge conflicts

* Use toStrings and fix race condition in update

* Revert to using xpack.ml.support.AbstractSerializingTestCase

Original commit: elastic/x-pack-elasticsearch@771ada0572
2017-02-03 14:22:36 +00:00
Dimitrios Athanasiou 2883b00b7c Also rename some *Status*Tests to *State*Tests
Original commit: elastic/x-pack-elasticsearch@6e1d3e2bba
2017-02-03 11:08:02 +00:00
Dimitris Athanasiou ca4badeb46 Rename {Job|Datafeed}Status to {Job|Datafeed}State (elastic/elasticsearch#856)
This is more consistent with elasticsearch where an index
has state [open, close], etc.

Original commit: elastic/x-pack-elasticsearch@30bf720c3e
2017-02-03 10:43:05 +00:00
David Kyle b940dbf6d9 Remove the overwrite option from PUT job (elastic/elasticsearch#855)
Original commit: elastic/x-pack-elasticsearch@0f7e0d35a9
2017-02-03 09:54:47 +00:00
Igor Motov 53a5e19c70 Add support for task status on persistent tasks
Similarly to task status on normal tasks, it is now possible to update task status on persistent tasks. This should allow updating the state of running tasks (such as loading, started, etc.) as well as storing intermediate state or progress.

Original commit: elastic/x-pack-elasticsearch@ed109cfa84
2017-02-02 13:35:37 -05:00
Martijn van Groningen 06d688eb74 AutodetectProcessManager#getStatistics(...) can now just return stats for a single job, as the _all expansion is done on the transport layer
Original commit: elastic/x-pack-elasticsearch@02d5272a4e
2017-02-02 13:11:39 +01:00
Dimitris Athanasiou 5ba9a6cfcc Clear scroll after it is complete (elastic/elasticsearch#847)
The ScrollDataExtractor needs to clear the scroll after
it is complete. Originally, it was thought that completing a scroll
leads to an automatic clearing of its context. That is not true,
thus manual clearing has to be requested.

- Also removes sorting in AggregationDataExtractor as it was redundant

Original commit: elastic/x-pack-elasticsearch@8f955da8ce
2017-02-02 10:18:36 +00:00
Zachary Tong a11ddd1e04 Integrate domainSplit function into datafeeds (elastic/elasticsearch#841)
If `domainSplit(` is detected in an inline script, the function and params are injected into
the script.
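
A minimal sketch of the detection and injection step, under stated assumptions: the class, the method names and the `FUNCTION_SOURCE` placeholder are hypothetical, and the real Painless function body and parameter maps are not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

class DomainSplitInjectionSketch {
    // Placeholder only; the real function source is not shown in this log.
    static final String FUNCTION_SOURCE = "/* domainSplit function definition goes here */\n";

    static boolean usesDomainSplit(String inlineScript) {
        return inlineScript != null && inlineScript.contains("domainSplit(");
    }

    // Returns the script unchanged unless it calls domainSplit(,
    // in which case the function source is prepended.
    static String injectFunction(String inlineScript) {
        return usesDomainSplit(inlineScript) ? FUNCTION_SOURCE + inlineScript : inlineScript;
    }

    // Copies the user params and adds the extra params only when the function is used.
    static Map<String, Object> injectParams(String inlineScript,
                                            Map<String, Object> userParams,
                                            Map<String, Object> extraParams) {
        Map<String, Object> merged = new HashMap<>(userParams);
        if (usesDomainSplit(inlineScript)) {
            merged.putAll(extraParams);
        }
        return merged;
    }
}
```

A plain substring check like this would also match `domainSplit(` inside a string literal or comment; whether the real detection is stricter is not stated in the commit.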

The majority of this PR is actually test-related.  Adds a unit test to check for the injected
script/params.  Also adds another QA test which -- through a very round-about mechanism --
confirms that the injected script compiles and functions correctly.  The QA test can
be simplified greatly once the Preview API is added.

Original commit: elastic/x-pack-elasticsearch@c7c35a982c
2017-02-01 10:20:00 -05:00
Martijn van Groningen ec902c4dc3 test: change assertion to be more lenient to platform specifics
Original commit: elastic/x-pack-elasticsearch@2131e7f0c7
2017-02-01 14:49:45 +01:00
Martijn van Groningen 051d8d8fdf Moved start and stop datafeed APIs over to the persistent task infrastructure
Original commit: elastic/x-pack-elasticsearch@8e15578fb7
2017-01-31 22:50:00 +01:00
Martijn van Groningen 22282e9d56 cleanup toString() methods
Original commit: elastic/x-pack-elasticsearch@17a10ea68f
2017-01-31 22:35:54 +01:00
Martijn van Groningen ce6dc4a506 Make job stats api task aware.
This will allow the job stats API to redirect the request to the node where the job is running.

Original commit: elastic/x-pack-elasticsearch@9f1d12dfcb
2017-01-31 22:35:54 +01:00
Martijn van Groningen b07e9bbd07 Fixed AOBE caused by fetching model state when opening a job.
This error only occurred for jobs that have been opened before and persisted model state.

Closes elastic/elasticsearch#836

Original commit: elastic/x-pack-elasticsearch@ad76f4167f
2017-01-31 19:56:24 +01:00
David Roberts 7ff3b707a8 Make renormalization thread-safe (elastic/elasticsearch#840)
Each ScoresUpdater needs its own JobRenormalizedResultsPersister, because
each JobRenormalizedResultsPersister has a single BulkRequest that various
methods update.

Fixes elastic/elasticsearch#838

Original commit: elastic/x-pack-elasticsearch@90f4bbd5a0
2017-01-31 16:51:26 +00:00
David Kyle 97970b94cd Remove the ignoreDowntime parameter from the _data endpoint (elastic/elasticsearch#834)
The parameter only applies when a job is opened

Original commit: elastic/x-pack-elasticsearch@37b902aa2a
2017-01-31 11:48:58 +00:00
David Roberts 34274a30ed Make transport layer names consistent with corresponding endpoints (elastic/elasticsearch#822)
Closes elastic/elasticsearch#630

Original commit: elastic/x-pack-elasticsearch@32aae3e1d9
2017-01-31 11:42:06 +00:00
David Kyle c84f227857 Set memory usage log message to trace (elastic/elasticsearch#829)
Original commit: elastic/x-pack-elasticsearch@13412cc4cf
2017-01-31 09:54:56 +00:00
David Roberts ab957b6d91 Adjust validation endpoints (elastic/elasticsearch#812)
Changes are:

1. The detector validation endpoint is changed from /_xpack/ml/_validate/detector
   to /_xpack/ml/anomaly_detectors/_validate/detector
2. A new endpoint is added for validating an entire job config:
   /_xpack/ml/anomaly_detectors/_validate

Relates elastic/elasticsearch#630

Original commit: elastic/x-pack-elasticsearch@7b2031e746
2017-01-30 17:10:22 +00:00
David Kyle 4eab74ce29 Store input fields for anomaly records and influencers (elastic/elasticsearch#799)
* Store input fields for anomaly records and influencers

* Address review comments

* Remove DotNotationReverser

* Remove duplicated constants

* Can’t use the same date for all records as they will have equivalent Ids

Original commit: elastic/x-pack-elasticsearch@40796b5efc
2017-01-30 14:05:18 +00:00
Colin Goodheart-Smithe 79d1a10a86 Mutes DataFeedJobIT test method that uses painless
This needs to be moved to the single-node-tests qa modules since integTests shouldn’t access modules.

Original commit: elastic/x-pack-elasticsearch@289b697eb8
2017-01-30 10:55:59 +00:00
Colin Goodheart-Smithe 618cb2a1a0 Make all action names cluster actions
Original commit: elastic/x-pack-elasticsearch@815d8f0aac
2017-01-30 10:00:23 +00:00
Igor Motov 827118e154 Adds support for persistent actions
A persistent action is a transport-like action that uses the cluster state instead of transport to start tasks. This allows persistent tasks to survive a restart of the executing nodes. A persistent action can be implemented by extending TransportPersistentAction. TransportPersistentAction will start the task by using PersistentActionService, which controls the lifecycle of persistent tasks. See TestPersistentActionPlugin for an example implementing a persistent action.

Original commit: elastic/x-pack-elasticsearch@8ef4103cd6
2017-01-27 11:20:54 -05:00
Martijn van Groningen ff65c38253 [TEST] fixed mocking logic to include id
Original commit: elastic/x-pack-elasticsearch@7b20e92fdc
2017-01-27 17:20:18 +01:00
Martijn van Groningen 2059b91620 Workaround for retried index requests without an id tripping an assertion in the internal engine. (2)
Original commit: elastic/x-pack-elasticsearch@22d5060deb
2017-01-27 17:07:51 +01:00
Martijn van Groningen ad4218320c Workaround for retried index requests without an id tripping an assertion in the internal engine.
Original commit: elastic/x-pack-elasticsearch@ba44acc28b
2017-01-27 16:15:00 +01:00
David Roberts 64fdb039ab Reduce the controller connect timeout (elastic/elasticsearch#804)
This used to be 60 seconds, dating back to the days when the controller
had to be started manually after starting Elasticsearch.  However, now that
Elasticsearch starts it automatically, it should already be running when
we try to connect, so the timeout can be much lower.  It just needs to
be long enough to give the C++ process time to create its named pipes.
2 seconds seems reasonable, and matches what we use for autodetect and
normalize.

Original commit: elastic/x-pack-elasticsearch@7300d68482
2017-01-27 14:23:18 +00:00
Zachary Tong 9395ef81b1 Painless DomainSplit tests in new Single-Node QA Module (elastic/elasticsearch#787)
This contains the Painless-based DomainSplit function, generated static maps and basic tests.  Due to cross-module complications, the tests are run by executing searches with script_fields and checking the response


Original commit: elastic/x-pack-elasticsearch@c6c2942e01
2017-01-27 08:52:48 -05:00
Dimitris Athanasiou 91be1e719d Disable stored_fields when possible in ScrollDataExtractor (elastic/elasticsearch#801)
When source fields are not required, stored_fields can be disabled.
This can make the query faster as no stored fields have to be
decompressed. Note that this means no metadata (_id, _index, _type, etc.)
will be returned.

Original commit: elastic/x-pack-elasticsearch@b1ea526d83
2017-01-27 11:38:54 +00:00
Dimitris Athanasiou 5790a6f152 Handle shard failures in extractors (elastic/elasticsearch#794)
Even though a search response may return a 200 status code, things could
still have gone wrong. A search response may report shard failures.

The datafeed extractors should check for that and report an extraction
error accordingly.
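
The check described above might look roughly like the following. `ResponseSummary` is a hypothetical, minimal view of a search response, and `IllegalStateException` stands in for whatever extraction error the datafeed actually raises.

```java
class ShardFailureCheckSketch {
    // Hypothetical summary of a search response: status code plus shard counters.
    static final class ResponseSummary {
        final int status;
        final int totalShards;
        final int failedShards;

        ResponseSummary(int status, int totalShards, int failedShards) {
            this.status = status;
            this.totalShards = totalShards;
            this.failedShards = failedShards;
        }
    }

    // A 200 status alone is not enough: shard failures must also be ruled out.
    static void checkForErrors(ResponseSummary response) {
        if (response.status != 200) {
            throw new IllegalStateException("search failed with status " + response.status);
        }
        if (response.failedShards > 0) {
            throw new IllegalStateException("search reported " + response.failedShards
                    + " failed shard(s) out of " + response.totalShards);
        }
    }
}
```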

Closes elastic/elasticsearch#775

Original commit: elastic/x-pack-elasticsearch@5d6d899738
2017-01-26 16:01:43 +00:00
David Kyle efc47c2a6f Remove Usage classes (elastic/elasticsearch#796)
* Delete usage class

* Delete usage reporter

* Remove unused constant

Original commit: elastic/x-pack-elasticsearch@c7a6c457bd
2017-01-26 11:50:08 +00:00
David Kyle db14d89358 Fix checkstyle
Original commit: elastic/x-pack-elasticsearch@05d59da705
2017-01-26 10:05:03 +00:00
David Kyle e3bb7cfea3 Split ml-int index into .ml-audit and .ml-meta (elastic/elasticsearch#752)
* Audit messages in .ml-audit

* Rename ml-int to .ml-meta

* Remove no release comment

* Fix compilation after classes moved to a different package

* Create the Audit, state and meta indices every time a job is created

* Revert change creating the audit index etc when the job is created

* Rename index .ml-audit -> .ml-notifications

Original commit: elastic/x-pack-elasticsearch@95168fa341
2017-01-26 09:44:54 +00:00
Martijn van Groningen 3a36f94a4a When timeout has been reached, check one more time if the job / datafeed status has the expected value.
Decreased wait timeout from 30s to 20s
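
The "check one more time" behaviour can be sketched as a polling loop with a final check after the deadline. The class name, the `Supplier`-based signature and the polling interval are assumptions made for illustration.

```java
import java.util.function.Supplier;

class WaitForStateSketch {
    // Polls until the current state equals the expected one or the timeout elapses.
    static <T> boolean waitFor(Supplier<T> current, T expected, long timeoutMs, long pollMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (deadline - System.nanoTime() > 0) {
            if (expected.equals(current.get())) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and fall through to the final check.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Timeout reached: the state may have changed during the last sleep, so check once more.
        return expected.equals(current.get());
    }
}
```

Without the final check, a state change that lands between the last poll and the deadline would be reported as a timeout.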

Original commit: elastic/x-pack-elasticsearch@b46fb0abe3
2017-01-25 23:32:04 +01:00
Dimitris Athanasiou 86291c12e2 Handle manual aggregations in datafeeds (elastic/elasticsearch#784)
* Handle manual aggregations in datafeeds

Adds a DataExtractor implementation that runs aggregated searches.

The manual aggregations supported have the following limitations:

- each aggregation can have 0 or 1 sub-aggregations
- the top aggregation has to be a histogram
- sub-aggregations have to be either terms aggregations or single value
metric aggregations.

The response is converted into flat JSON documents that contain only the
fields of interest and can be parsed without additional context from our
JSON parser. The fields in the JSON documents correspond to the names of the aggregations.
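
To make the flattening concrete, here is a sketch for one supported shape only: a histogram whose buckets contain a terms sub-aggregation, which in turn carries one single-value metric. The bucket classes are hypothetical models, not Elasticsearch types, and the output is shown as maps rather than serialized JSON.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AggregationFlattenerSketch {
    // Hypothetical model of one terms bucket carrying a single-value metric.
    static final class TermBucket {
        final String term;
        final double metricValue;

        TermBucket(String term, double metricValue) {
            this.term = term;
            this.metricValue = metricValue;
        }
    }

    // Hypothetical model of one histogram bucket with a terms sub-aggregation.
    static final class HistogramBucket {
        final long timestampMs;
        final List<TermBucket> terms;

        HistogramBucket(long timestampMs, List<TermBucket> terms) {
            this.timestampMs = timestampMs;
            this.terms = terms;
        }
    }

    // Emits one flat document per (histogram bucket, term) pair.
    // Field names are the aggregation names, as the commit message describes.
    static List<Map<String, Object>> flatten(String histogramName, String termsName,
                                             String metricName, List<HistogramBucket> buckets) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (HistogramBucket histogramBucket : buckets) {
            for (TermBucket termBucket : histogramBucket.terms) {
                Map<String, Object> doc = new LinkedHashMap<>();
                doc.put(histogramName, histogramBucket.timestampMs);
                doc.put(termsName, termBucket.term);
                doc.put(metricName, termBucket.metricValue);
                docs.add(doc);
            }
        }
        return docs;
    }
}
```

Each resulting map holds only the fields of interest, so it can be parsed without knowing the shape of the aggregation tree it came from.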

Closes elastic/elasticsearch#680

Original commit: elastic/x-pack-elasticsearch@7dfd2d31e6
2017-01-25 19:13:03 +00:00