Commit Graph

658 Commits

Author SHA1 Message Date
David Roberts 50c34b2a9b
[ML] Reverse engineer Grok patterns from categorization results (#30125)
This change adds a grok_pattern field to the GET categories API
output in ML. It's calculated using the regex and examples in the
categorization result, and applying a list of candidate Grok
patterns to the bits in between the tokens that are considered to
define the category.

This can currently be considered a prototype, as the Grok patterns
it produces are not optimal. However, enough people have said it
would be useful for it to be worthwhile exposing it as experimental
functionality for interested parties to try out.
2018-05-15 09:02:38 +01:00
David Kyle 9dd629648d [ML] Improve state persistence log message 2018-05-12 09:20:08 +01:00
Dimitris Athanasiou 3b260dcfc1
[ML] Account for gaps in data counts after job is reopened (#30294)
This commit fixes an issue with the data diagnostics were
empty buckets are not reported even though they should. Once
a job is reopened, the diagnostics do not get initialized from
the current data counts (especially the latest record timestamp).
The result is that if the data that is sent have a time gap compared
to the previous ones, that gap is not accounted for in the empty bucket
count.

This commit fixes that by initializing the diagnostics with the current
data counts.

Closes #30080
2018-05-03 15:08:24 +01:00
Ryan Ernst fb0aa562a5
Network: Remove http.enabled setting (#29601)
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.

closes #12792
2018-05-02 11:42:05 -07:00
Dimitris Athanasiou 057cdffed5
[ML] Refactor DataStreamDiagnostics to use array (#30129)
This commit refactors the DataStreamDiagnostics class
achieving the following advantages:

- simpler code; by encapsulating the moving bucket histogram
into its own class
- better performance; by using an array to store the buckets
instead of a map
- explicit handling of gap buckets; in preparation of fixing #30080
2018-05-01 09:50:32 +01:00
David Roberts 225f7093a9
[ML] Include 3rd party C++ component notices (#30132)
The overall NOTICE file for the ML X-Pack module should
include the notices from the 3rd party C++ components as
well as the 3rd party Java components.
2018-04-30 20:05:27 +01:00
David Kyle cfc66a1fd5 [ML] Wait for updates to established memory usage
Tests need to wait for changes to the job's established memory usage to
propagate and an over enthusiastic optimisation meant jobs were updated
from stale state causing recent change to be lost.
2018-04-24 13:46:58 -04:00
Ryan Ernst 2efd22454a Migrate x-pack-elasticsearch source to elasticsearch 2018-04-20 15:29:54 -07:00