* Removed getPersistStream() method from this interface and let the NativeAutodetectProcess implementation deal with this. The persist stream is an implementation detail and BlackHoleAutodetectProcess doesn't deal with this too.
* Replaced getProcessOutStream() method with readAutodetectResults() method. This method now returns a `Iterator<AutodetectResult>` instead of an inputstream. This makes the BlackHoleAutodetectProcess and future mocked implementations easier.
Original commit: elastic/x-pack-elasticsearch@086e7b40ab
* Reintroduce chunking to improve data extractor performance
Performing a sorted search/scroll over a period of time that matches
a lot of documents is very expensive because for each page all
documents are traversed.
The solution is to chunk the search time and perform separate
search/scrolls for each chunk.
This commit is introducing a new `chung` config in `datafeed_config`
whose mode can be set to either of AUTO, OFF, MANUAL, with the latter
allowing to specify an explicit chunk size.
When set to AUTO, a heuristic is used in order to determine the chunk
size. The heuristic is based on estimating the time interval within
which we expect `scroll_size` documents and then taking the 10x multiple
of that. Based on benchmarking, this method gives a dramatic performance
increase. For example, for the citizens dataset it improved the ingest
rate from 0.33M docs / minute to 13.6M docs / minute. Farequote is now
done in ~1 second.
Finally, note that when `chunk` is not specified, it defaults to AUTO
when aggregations are not set and to OFF otherwise. This is because
the chunk size heuristic does not lend itself great for aggregations
where one needs to chunk based on the cardinality of buckets rather
than simply time.
Relates to elastic/elasticsearch#734
Original commit: elastic/x-pack-elasticsearch@a738e86d21
This commit upgrades the checkstyle configuration from version 5.9 to
version 7.5, the latest version as of today. The main enhancement
obtained via this upgrade is better detection of redundant modifiers.
Relates elastic/elasticsearch#4810
Original commit: elastic/x-pack-elasticsearch@2c9b7d23dc
This commit reuses the automaton that defines the allowed fields in
`FieldSubsetReader` rather than resolving the list of all matching fields from
the mapping. As a side-effect this change solves a bug that unmapped fields
could previously not be read from the source. Moreover it avoids determinization
errors in the case that the number of matching fields is high.
It also uses `CharacterRunAutomaton` to evaluate automata against a given
string, which should be faster than naively stepping into the automaton since
`CharacterRunAutomaton` builds a lookup table of transitions.
Closeselastic/elasticsearch#4679
Original commit: elastic/x-pack-elasticsearch@a30913dbd5
The way we check for the triggered watches on start-up did not take into account
that an index could be closed and thus resulted in an NPE.
This commit adds a check to ensure that the watch index and triggered watches index
are open, before trying to check if all primary shards are active.
Original commit: elastic/x-pack-elasticsearch@ee05779963
This change adapts x-pack to pass on the parsed XContentType from rest requests to transport
requests and use this value in place of attempting to auto-detect the content type.
Original commit: elastic/x-pack-elasticsearch@57475fd403
Similarly to task status on normal tasks it's now possible to update task status on the persistent tasks. This should allow updating the state of the running tasks (such as loading, started, etc) as well as store intermediate state or progress.
Original commit: elastic/x-pack-elasticsearch@ed109cfa84
The ScrollDataExtractor needs to clear the scroll after
it is complete. Originally, it was thought that completing a scroll
leads to an automatic clearing of its context. That is not true,
thus manual clearing has to be requested.
- Also removes sorting in AggregationDataExtractor as it was redundant
Original commit: elastic/x-pack-elasticsearch@8f955da8ce
If `domainSplit(` is detected in an inline script, the function and params are injected into
the script.
The majority of this PR is actually test-related. Adds a unit test to check for the injected
script/params. Also adds another QA test which -- through a very round-about mechanism --
confirms that the injected script compiles and functions correctly. The QA test can
be simplified greatly once the Preview API is added.
Original commit: elastic/x-pack-elasticsearch@c7c35a982c
Changes are:
1. The detector validation endpoint is changed from /_xpack/ml/_validate/detector
to /_xpack/ml/anomaly_detectors/_validate/detector
2. A new endpoint is added for validating an entire job config:
/_xpack/ml/anomaly_detectors/_validate
Relates elastic/elasticsearch#630
Original commit: elastic/x-pack-elasticsearch@7b2031e746
* Store input fields for anomaly records and influencers
* Address review comments
* Remove DotNotationReverser
* Remove duplicated constants
* Can’t use the same date for all records as they will have equivalent Ids
Original commit: elastic/x-pack-elasticsearch@40796b5efc
This needs to be moved to the single-node-tests qa modules since integTests shouldn’t access modules.
Original commit: elastic/x-pack-elasticsearch@289b697eb8
A persistent action is a transport-like action that is using the cluster state instead of transport to start tasks. This allows persistent tasks to survive restart of executing nodes. A persistent action can be implemented by extending TransportPersistentAction. TransportPersistentAction will start the task by using PersistentActionService, which controls persistent tasks lifecycle. See TestPersistentActionPlugin for an example implementing a persistent action.
Original commit: elastic/x-pack-elasticsearch@8ef4103cd6
This used to be 60 seconds, dating back to the days when the controller
had to be started manually after starting Elasticsearch. However, now
Elasticsearch starts it automatically it should already be running when
we try to connect, so the timeout can be much lower. It just needs to
be long enough to give the C++ process time to create its named pipes.
2 seconds seems reasonable, and matches what we use for autodetect and
normalize.
Original commit: elastic/x-pack-elasticsearch@7300d68482
This contains the Painless-based DomainSplit function, generated static maps and basic tests. Due to cross-module complications, the tests are run by executing searches with script_fields and checking the response
Original commit: elastic/x-pack-elasticsearch@c6c2942e01
When source fields are not required, stored_fields can be disabled.
This can make the query faster as no stored fields have to be
decompressed. Note that this means no metadata (_id, _index, _type, etc.)
will be returned.
Original commit: elastic/x-pack-elasticsearch@b1ea526d83