* Removed getPersistStream() method from this interface and let the NativeAutodetectProcess implementation deal with this. The persist stream is an implementation detail and BlackHoleAutodetectProcess doesn't deal with this too.
* Replaced getProcessOutStream() method with readAutodetectResults() method. This method now returns a `Iterator<AutodetectResult>` instead of an inputstream. This makes the BlackHoleAutodetectProcess and future mocked implementations easier.
Original commit: elastic/x-pack-elasticsearch@086e7b40ab
* Reintroduce chunking to improve data extractor performance
Performing a sorted search/scroll over a period of time that matches
a lot of documents is very expensive because for each page all
documents are traversed.
The solution is to chunk the search time and perform separate
search/scrolls for each chunk.
This commit is introducing a new `chung` config in `datafeed_config`
whose mode can be set to either of AUTO, OFF, MANUAL, with the latter
allowing to specify an explicit chunk size.
When set to AUTO, a heuristic is used in order to determine the chunk
size. The heuristic is based on estimating the time interval within
which we expect `scroll_size` documents and then taking the 10x multiple
of that. Based on benchmarking, this method gives a dramatic performance
increase. For example, for the citizens dataset it improved the ingest
rate from 0.33M docs / minute to 13.6M docs / minute. Farequote is now
done in ~1 second.
Finally, note that when `chunk` is not specified, it defaults to AUTO
when aggregations are not set and to OFF otherwise. This is because
the chunk size heuristic does not lend itself great for aggregations
where one needs to chunk based on the cardinality of buckets rather
than simply time.
Relates to elastic/elasticsearch#734
Original commit: elastic/x-pack-elasticsearch@a738e86d21
Similarly to task status on normal tasks it's now possible to update task status on the persistent tasks. This should allow updating the state of the running tasks (such as loading, started, etc) as well as store intermediate state or progress.
Original commit: elastic/x-pack-elasticsearch@ed109cfa84
The ScrollDataExtractor needs to clear the scroll after
it is complete. Originally, it was thought that completing a scroll
leads to an automatic clearing of its context. That is not true,
thus manual clearing has to be requested.
- Also removes sorting in AggregationDataExtractor as it was redundant
Original commit: elastic/x-pack-elasticsearch@8f955da8ce
If `domainSplit(` is detected in an inline script, the function and params are injected into
the script.
The majority of this PR is actually test-related. Adds a unit test to check for the injected
script/params. Also adds another QA test which -- through a very round-about mechanism --
confirms that the injected script compiles and functions correctly. The QA test can
be simplified greatly once the Preview API is added.
Original commit: elastic/x-pack-elasticsearch@c7c35a982c
Changes are:
1. The detector validation endpoint is changed from /_xpack/ml/_validate/detector
to /_xpack/ml/anomaly_detectors/_validate/detector
2. A new endpoint is added for validating an entire job config:
/_xpack/ml/anomaly_detectors/_validate
Relates elastic/elasticsearch#630
Original commit: elastic/x-pack-elasticsearch@7b2031e746
* Store input fields for anomaly records and influencers
* Address review comments
* Remove DotNotationReverser
* Remove duplicated constants
* Can’t use the same date for all records as they will have equivalent Ids
Original commit: elastic/x-pack-elasticsearch@40796b5efc
This needs to be moved to the single-node-tests qa modules since integTests shouldn’t access modules.
Original commit: elastic/x-pack-elasticsearch@289b697eb8
A persistent action is a transport-like action that is using the cluster state instead of transport to start tasks. This allows persistent tasks to survive restart of executing nodes. A persistent action can be implemented by extending TransportPersistentAction. TransportPersistentAction will start the task by using PersistentActionService, which controls persistent tasks lifecycle. See TestPersistentActionPlugin for an example implementing a persistent action.
Original commit: elastic/x-pack-elasticsearch@8ef4103cd6
This used to be 60 seconds, dating back to the days when the controller
had to be started manually after starting Elasticsearch. However, now
Elasticsearch starts it automatically it should already be running when
we try to connect, so the timeout can be much lower. It just needs to
be long enough to give the C++ process time to create its named pipes.
2 seconds seems reasonable, and matches what we use for autodetect and
normalize.
Original commit: elastic/x-pack-elasticsearch@7300d68482
This contains the Painless-based DomainSplit function, generated static maps and basic tests. Due to cross-module complications, the tests are run by executing searches with script_fields and checking the response
Original commit: elastic/x-pack-elasticsearch@c6c2942e01
When source fields are not required, stored_fields can be disabled.
This can make the query faster as no stored fields have to be
decompressed. Note that this means no metadata (_id, _index, _type, etc.)
will be returned.
Original commit: elastic/x-pack-elasticsearch@b1ea526d83
Even though a search response may return a 200 status code, things could
still have gone wrong. A search response may report shard failures.
The datafeed extractors should check for that and report an extraction
error accordingly.
Closeselastic/elasticsearch#775
Original commit: elastic/x-pack-elasticsearch@5d6d899738
* Audit messages in .ml-audit
* Rename ml-int to .ml-meta
* Remove no release comment
* Fix compilation after classes moved to a different package
* Create the Audit, state and meta indices every time a job is created
* Revert change creating the audit index etc when the job is created
* Rename index .ml-audit -> .ml-notifications
Original commit: elastic/x-pack-elasticsearch@95168fa341
* Handle manual aggregations in datafeeds
Adds a DataExtractor implementation that runs aggregated searches.
The manual aggregations supported have the following limitations:
- each aggregation can hava 0 or 1 sub-aggregations
- the top aggregation has to be a histogram
- sub-aggregations have to be either terms aggregations or single value
metric aggregations.
The response is converted into flat JSON documents that contain only the
fields of interest and can be parsed without additional context from our
JSON parser. The fields in the JSON documents correspond to the names of the aggregations.
Closeselastic/elasticsearch#680
Original commit: elastic/x-pack-elasticsearch@7dfd2d31e6
The new constructor takes an Environment object. This is needed for migration to X-Pack since the environment instance is built by the XPackPlugin and then passed into the feature plugins.
Original commit: elastic/x-pack-elasticsearch@f25225bc6a
Most transforms will be replaced with Painless scripts.
The exception is the DateTransform, whose functionality is now simplified
to what existed before the other transforms were added.
The SINGLE_LINE format relied on transforms to extract fields, so has also
been removed, but this is reasonable as it strays into Logstash territory.
Relates elastic/elasticsearch#630Closeselastic/elasticsearch#39
Original commit: elastic/x-pack-elasticsearch@a593d3e0ad
This matches the way tests that need to run without an Elasticsearch
bootstrap are run in core Elasticsearch. This should make merging to
x-pack easier.
Note that the no bootstrap tests now run after the integration tests, but
this doesn't really matter.
Original commit: elastic/x-pack-elasticsearch@5547f457b6