Commit Graph

313 Commits

Author SHA1 Message Date
Hendrik Muhs 6c313a9871 This implementation lazily (on the first forecast request) checks for available
disk space and creates a subfolder for storing data outside of Lucene
indexes, but as part of the ES data paths.

Details:
 - tmp storage is managed and does not allow allocation if disk space is
   below a threshold (5GB at the moment)
 - tmp storage is supposed to be managed by the native component but in
   case this fails cleanup is provided:
    - on job close
    - on process crash
    - after node crash, on restart
 - available space is re-checked for every forecast call (the native
   component has to check again before writing)

Note: The 1st path that has enough space is chosen on job open (job
close/reopen triggers a new search)
2018-05-18 14:04:09 +02:00
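
The path selection described in the commit above (take the first data path with enough free space, refuse allocation below the threshold) can be sketched roughly as follows. This is a minimal illustration, not the actual Elasticsearch code: the class name `TmpStorageChooser`, the method `firstPathWithSpace`, and the reading of "5GB" as 5 * 1024^3 bytes are all assumptions. The usable space is passed in as a map so the logic stays testable; a real caller might obtain it from `java.nio.file.FileStore.getUsableSpace()`.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for illustration only; not an Elasticsearch class. */
final class TmpStorageChooser {
    // Threshold from the commit message; the exact byte value is an assumption.
    static final long MIN_FREE_BYTES = 5L * 1024 * 1024 * 1024;

    /**
     * Returns the first path (in the map's iteration order) whose usable
     * space is at least the threshold, or empty if no path qualifies.
     */
    static Optional<String> firstPathWithSpace(Map<String, Long> usableBytesByPath) {
        for (Map.Entry<String, Long> entry : usableBytesByPath.entrySet()) {
            if (entry.getValue() >= MIN_FREE_BYTES) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }
}
```

Passing an ordered map (for example a `LinkedHashMap`) keeps the "first path wins" behaviour deterministic. Because the commit notes that space is re-checked on every forecast call, a caller would presumably repeat the threshold check before each write rather than trust the result from job open.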
Hendrik Muhs d893041634
[ML] add version information in case of crash of native ML process (#30674)
This change adds version information in case a native ML process crashes. The version is important for choosing the right symbol files when analyzing the crash. Adding the version combines all necessary information on one line.

relates elastic/ml-cpp#94
2018-05-18 07:46:52 +02:00
Dimitris Athanasiou 75665a2d3e
[ML] Clean left behind model state docs (#30659)
It is possible for state documents to be
left behind in the state index. This may be
because of bugs or uncontrollable scenarios.
In any case, those documents may take up quite
some disk space when they add up. This commit
adds a step in the expired data deletion that
is part of the daily maintenance service. The
new step searches for state documents that
do not belong to any of the current jobs and
deletes them.

Closes #30551
2018-05-17 17:51:26 +03:00
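
The cleanup step described in the commit above amounts to a set difference: any state document whose job is not among the current jobs is a deletion candidate. The sketch below shows only that filtering idea. The class `OrphanedStateDocs`, the method `findOrphans`, and the in-memory map from document ID to job ID are hypothetical stand-ins; the real step searches the state index and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the orphan filter; not an Elasticsearch class. */
final class OrphanedStateDocs {
    /**
     * @param jobIdByDocId  state document ID mapped to the job ID it was written for
     * @param currentJobIds IDs of the jobs that currently exist
     * @return IDs of state documents that belong to none of the current jobs
     */
    static List<String> findOrphans(Map<String, String> jobIdByDocId, Set<String> currentJobIds) {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> entry : jobIdByDocId.entrySet()) {
            if (!currentJobIds.contains(entry.getValue())) {
                orphans.add(entry.getKey());
            }
        }
        return orphans;
    }
}
```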
Dimitris Athanasiou 01bdfcde6f
[ML] DeleteExpiredDataAction should use client with origin (#30646)
This is an admin action that should be allowed to operate on
ML indices with full permissions.
2018-05-16 23:35:23 +03:00
Colin Goodheart-Smithe a75b8adce5
Refactors ClientHelper to combine header logic (#30620)
* Refactors ClientHelper to combine header logic

This change removes all the `*ClientHelper` classes which were
repeating logic between plugins and instead adds
`ClientHelper.executeWithHeaders()` and
`ClientHelper.executeWithHeadersAsync()` methods to centralise the
logic for executing requests with stored security headers.

* Removes Watcher headers constant
2018-05-16 11:38:24 +01:00
David Roberts 50c34b2a9b
[ML] Reverse engineer Grok patterns from categorization results (#30125)
This change adds a grok_pattern field to the GET categories API
output in ML. It's calculated using the regex and examples in the
categorization result, and applying a list of candidate Grok
patterns to the bits in between the tokens that are considered to
define the category.

This can currently be considered a prototype, as the Grok patterns
it produces are not optimal. However, enough people have said it
would be useful for it to be worthwhile exposing it as experimental
functionality for interested parties to try out.
2018-05-15 09:02:38 +01:00
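
As a rough illustration of "applying a list of candidate Grok patterns to the bits in between the tokens", the sketch below picks the first candidate whose regex matches every fragment found at one position across the examples, and falls back to a catch-all otherwise. Everything here is an assumption made for illustration: the class `GrokCandidateChooser`, the three candidates, and their simplified regexes are stand-ins and are not the definitions or the algorithm used by the actual change.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not the ML plugin's implementation. */
final class GrokCandidateChooser {
    // Simplified stand-ins for Grok definitions, ordered most specific first.
    private static final Map<String, Pattern> CANDIDATES = new LinkedHashMap<>();
    static {
        CANDIDATES.put("INT", Pattern.compile("[+-]?\\d+"));
        CANDIDATES.put("NUMBER", Pattern.compile("[+-]?\\d+(\\.\\d+)?"));
        CANDIDATES.put("WORD", Pattern.compile("\\w+"));
    }

    /** Returns the first candidate name matching all fragments, else "DATA". */
    static String choose(List<String> fragments) {
        if (fragments.isEmpty()) {
            return "DATA"; // nothing to learn from, so use the catch-all
        }
        for (Map.Entry<String, Pattern> candidate : CANDIDATES.entrySet()) {
            boolean allMatch = true;
            for (String fragment : fragments) {
                if (!candidate.getValue().matcher(fragment).matches()) {
                    allMatch = false;
                    break;
                }
            }
            if (allMatch) {
                return candidate.getKey();
            }
        }
        return "DATA";
    }
}
```

Ordering the candidates from most to least specific is a design choice of this sketch: it prefers the narrowest pattern that still covers every example, which is one plausible reason the commit describes the generated patterns as not yet optimal.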
David Kyle 9dd629648d [ML] Improve state persistence log message 2018-05-12 09:20:08 +01:00
Dimitris Athanasiou 3b260dcfc1
[ML] Account for gaps in data counts after job is reopened (#30294)
This commit fixes an issue with the data diagnostics where
empty buckets are not reported even though they should be. Once
a job is reopened, the diagnostics do not get initialized from
the current data counts (especially the latest record timestamp).
The result is that if the data that is sent has a time gap compared
to the previous data, that gap is not accounted for in the empty bucket
count.

This commit fixes that by initializing the diagnostics with the current
data counts.

Closes #30080
2018-05-03 15:08:24 +01:00
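
The effect of the fix above can be shown with a small counter that is seeded from the persisted data counts instead of starting from scratch. This is a simplified sketch under stated assumptions: the class `EmptyBucketCounter` and its fields are hypothetical, timestamps are non-negative epoch milliseconds, and the real diagnostics (which work over a moving window of buckets) are more involved.

```java
/** Hypothetical illustration of gap accounting; not the ML plugin's class. */
final class EmptyBucketCounter {
    private final long bucketSpanMs;
    private long latestBucket;      // index of the bucket holding the latest record
    private long emptyBucketCount;

    /** Seeds the counter from the current data counts, as the fix describes. */
    EmptyBucketCounter(long bucketSpanMs, long latestRecordTimeMs, long emptyBucketCount) {
        this.bucketSpanMs = bucketSpanMs;
        this.latestBucket = latestRecordTimeMs / bucketSpanMs;
        this.emptyBucketCount = emptyBucketCount;
    }

    /** Counts every whole bucket skipped between the latest record and this one. */
    void onRecord(long timestampMs) {
        long bucket = timestampMs / bucketSpanMs;
        if (bucket > latestBucket) {
            emptyBucketCount += bucket - latestBucket - 1;
            latestBucket = bucket;
        }
    }

    long getEmptyBucketCount() {
        return emptyBucketCount;
    }
}
```

Without the seeded `latestRecordTimeMs`, a reopened job would have no earlier bucket to compare against, so the gap between the old data and the first new record would go uncounted, which is the behaviour reported in #30080.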
Ryan Ernst fb0aa562a5
Network: Remove http.enabled setting (#29601)
This commit removes the http.enabled setting. While all real nodes (started with bin/elasticsearch) will always have an http binding, there are many tests that rely on the quickness of not actually needing to bind to 2 ports. For this case, the MockHttpTransport.TestPlugin provides a dummy http transport implementation which is used by default in ESIntegTestCase.

closes #12792
2018-05-02 11:42:05 -07:00
Dimitris Athanasiou 057cdffed5
[ML] Refactor DataStreamDiagnostics to use array (#30129)
This commit refactors the DataStreamDiagnostics class
achieving the following advantages:

- simpler code; by encapsulating the moving bucket histogram
into its own class
- better performance; by using an array to store the buckets
instead of a map
- explicit handling of gap buckets; in preparation of fixing #30080
2018-05-01 09:50:32 +01:00
David Roberts 225f7093a9
[ML] Include 3rd party C++ component notices (#30132)
The overall NOTICE file for the ML X-Pack module should
include the notices from the 3rd party C++ components as
well as the 3rd party Java components.
2018-04-30 20:05:27 +01:00
David Kyle cfc66a1fd5 [ML] Wait for updates to established memory usage
Tests need to wait for changes to the job's established memory usage to
propagate, and an over-enthusiastic optimisation meant jobs were updated
from stale state, causing recent changes to be lost.
2018-04-24 13:46:58 -04:00
Ryan Ernst 2efd22454a Migrate x-pack-elasticsearch source to elasticsearch 2018-04-20 15:29:54 -07:00