Commit Graph

937 Commits

Author SHA1 Message Date
Przemysław Witek 4e4ca6ac25
Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447) (#58459) 2020-06-23 22:18:39 +02:00
Benjamin Trent a9b868b7a9
[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280) (#58455)
This commit allows data streams to be a valid source for analytics and transforms.

Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.

For `_transform` the checkpointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within the checkpointing information.
2020-06-23 14:40:35 -04:00
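As a rough illustration of the change above, a transform can now name a data stream directly as its source. The sketch below uses the low-level REST client; the data stream name, destination index, and field names are placeholders, not part of the commit.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class TransformFromDataStream {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Hypothetical transform whose source is a data stream rather than a concrete index.
            Request put = new Request("PUT", "/_transform/myapp-daily");
            put.setJsonEntity(
                "{\n" +
                "  \"source\": { \"index\": \"logs-myapp\" },\n" +               // data stream name as source
                "  \"dest\":   { \"index\": \"myapp-daily-summary\" },\n" +
                "  \"pivot\": {\n" +
                "    \"group_by\": { \"host\": { \"terms\": { \"field\": \"host.name\" } } },\n" +
                "    \"aggregations\": { \"avg_latency\": { \"avg\": { \"field\": \"latency_ms\" } } }\n" +
                "  }\n" +
                "}");
            Response response = client.performRequest(put);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```
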
Dimitris Athanasiou f67fee387b
[7.x][ML] Make regression training set predictable in size (#58331) (#58453)
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.

This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.

Backport of #58331
2020-06-23 19:49:03 +03:00
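For readers unfamiliar with the streaming-reservoir idea mentioned above, here is a generic, self-contained sketch of reservoir sampling over a document stream. It is not the Elasticsearch implementation, just the standard single-pass algorithm that yields a training sample of predictable size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReservoirSampler {

    /**
     * Classic reservoir sampling: keeps exactly min(k, stream size) items,
     * each with equal probability, in a single pass over the stream.
     */
    public static <T> List<T> sample(Iterable<T> stream, int k, Random random) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T item : stream) {
            seen++;
            if (reservoir.size() < k) {
                reservoir.add(item);                 // fill the reservoir first
            } else {
                long j = (long) (random.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, item);    // replace with decreasing probability
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) docs.add(i);
        // A predictable training set size: exactly 300 documents out of 1,000.
        System.out.println(sample(docs, 300, new Random(42)).size());
    }
}
```
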
David Roberts 0d6bfd0ac3
[7.x][ML] Fix wire serialization for flush acknowledgements (#58443)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null.  This works because the C++ sends a zero "last
finalized bucket time" when it is not known or not
relevant.  But then the Java code will print that zero to
XContent, as it assumes null represents not known or
not relevant.

This change corrects the discrepancies.  Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.

Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.

Backport of #58413
2020-06-23 16:42:06 +01:00
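A generic sketch of the null-to-zero translation described above, not the actual FlushAcknowledgement code: internally the timestamp is a nullable Instant, while on the wire a zero epoch-millisecond value stands in for "not known".

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.time.Instant;

public class NullableInstantWire {

    // Write side: null is translated to 0 for readers that assume a value is always present.
    static void writeLastFinalizedBucketTime(DataOutputStream out, Instant time) throws IOException {
        out.writeLong(time == null ? 0L : time.toEpochMilli());
    }

    // Read side: a 0 coming from the C++ process, or from old nodes with the bug, becomes null again.
    static Instant readLastFinalizedBucketTime(DataInputStream in) throws IOException {
        long millis = in.readLong();
        return millis == 0L ? null : Instant.ofEpochMilli(millis);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeLastFinalizedBucketTime(new DataOutputStream(bytes), null);
        Instant roundTripped =
            readLastFinalizedBucketTime(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(roundTripped); // prints "null": unknown stays unknown across the wire
    }
}
```
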
Benjamin Trent 61142a3005
[ML] only log if forecasts are set to failed (#58421) (#58437)
This adjusts the logging level for setting forecasts to failed to WARN, and it will only log if one or more forecasts were actually adjusted to failed.
2020-06-23 10:24:03 -04:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use the mapping source directly instead of using the mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved the logic that inserts the timestamp field mapping into a mapping.
If the timestamp field path consisted of object fields and
the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache miss count is incremented.

When a user calls _stats, we will include the summed cache miss count across ALL nodes. This statistic is important when comparing against the inference_count. If the cache miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-19 09:46:51 -04:00
Przemysław Witek 9dd3d5aa48
[7.x] Delete auto-generated annotations when model snapshot is reverted (#58240) (#58335) 2020-06-18 17:59:52 +02:00
Benjamin Trent 2de242f80e
[ML] rename EnsembleSizeInfo#inputFieldNameLengths to this.featureNameLengths (#58241) (#58253) 2020-06-17 10:08:55 -04:00
Benjamin Trent 69338b03d7
[ML] expand data_streams when assigning datafeed to node (#58175) (#58242) 2020-06-17 08:34:34 -04:00
Jason Tedor b78b3edeea
Upgrade to JNA 5.5.0 (#58183)
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
2020-06-17 07:35:08 -04:00
Dimitris Athanasiou 36dbf08d47
[7.x][ML] Improve stability of stratified splitter tests (#58180) (#58224)
The main improvement here is that the total expected
count of training rows in the test is calculated as the
sum of the training fraction times the cardinality of each
class (instead of the training fraction times the total doc count).

Also relaxes slightly the error bound on the uniformity test from 0.12
to 0.13.

Closes #54122

Backport of #58180
2020-06-17 12:40:21 +03:00
Przemysław Witek b22e91cefc
[7.x] Delete auto-generated annotations when job is deleted. (#58169) (#58219) 2020-06-17 09:17:20 +02:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
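A minimal sketch of the new optional `description` field on an ingest processor, using the low-level REST client; the pipeline id, processor, and field names are made up for illustration.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class PipelineWithDescription {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request put = new Request("PUT", "/_ingest/pipeline/orders");
            put.setJsonEntity(
                "{\n" +
                "  \"processors\": [\n" +
                "    {\n" +
                "      \"set\": {\n" +
                "        \"description\": \"Tag documents that arrived via the bulk importer\",\n" +
                "        \"field\": \"ingest_source\",\n" +
                "        \"value\": \"bulk-importer\"\n" +
                "      }\n" +
                "    }\n" +
                "  ]\n" +
                "}");
            Response response = client.performRequest(put);
            System.out.println(EntityUtils.toString(response.getEntity()));
        }
    }
}
```
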
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
David Roberts 93b693527a
[7.x][ML] Add categorizer stats ML result type (#58001)
This type of result will store stats about how well categorization
is performing.  When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.

This PR is a minimal implementation to allow the C++ side changes to
be made.  More Java side changes related to per-partition
categorization will be in followup PRs.  However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats.  Like forecast request stats the
categorizer stats can be read directly from the job's results alias.

Backport of #57978
2020-06-12 12:08:07 +01:00
David Kyle 39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Benjamin Trent 2881995a45
[ML] adding new inference model size estimate handling from native process (#57930) (#57999)
Adds support for reading in `model_size_info` objects.

These objects contain numeric values indicating the model definition size and complexity.

Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-11 15:59:23 -04:00
David Roberts 54d4f2a623 [ML] Refresh annotations index on job flush and close (#57979)
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
2020-06-11 12:29:04 +01:00
David Kyle b87b147704
Add models for search to ModelLoadingService (#57592) (#57919)
ModelLoadingService only caches models if they are referenced by an 
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest 
pipeline is deleted the model it references should not be evicted if 
it is used in search.
2020-06-11 10:48:37 +01:00
David Kyle 2905a2f623
Use Search After job iterators (#57875) (#57923)
Search after is a better choice for the delete expired data iterators,
where processing takes a long time, because unlike scroll no search
context has to be kept alive. Also changes the delete expired data
endpoint to return 404 if the job is unknown.
2020-06-11 10:06:18 +01:00
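For context on the search_after approach mentioned above, a generic pagination sketch with the low-level REST client; the index name, sort fields, and seed values are illustrative. Unlike scroll, no server-side context has to be kept alive between pages.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class SearchAfterExample {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // First page: sort on a field plus a unique tiebreaker so the last sort values can seed the next page.
            Request first = new Request("GET", "/my-index/_search");
            first.setJsonEntity("{ \"size\": 1000, \"sort\": [ { \"timestamp\": \"asc\" }, { \"serial_id\": \"asc\" } ] }");
            System.out.println(EntityUtils.toString(client.performRequest(first).getEntity()));

            // Next page: pass the sort values of the last hit from the previous page.
            Request next = new Request("GET", "/my-index/_search");
            next.setJsonEntity(
                "{ \"size\": 1000,"
                + " \"sort\": [ { \"timestamp\": \"asc\" }, { \"serial_id\": \"asc\" } ],"
                + " \"search_after\": [ 1589000000000, \"doc-42\" ] }");
            System.out.println(EntityUtils.toString(client.performRequest(next).getEntity()));
        }
    }
}
```
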
Valeriy Khakhutskyy c0f368bbf3
[7.x][ML] Adjust assertion for job case memory usage estimates (#57929)
Since we changed the memory estimates for data frame analytics jobs from the worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-than-or-equal assertion.

Backport of #57882
2020-06-10 15:17:16 +02:00
Jake Landis a370d5eead
[7.x] Ensure Joni warning are logged at debug (#57302) (#57897)
When Joni, the regex engine that powers grok, emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
against that pattern. For ingest node that means every document
that is run through Grok. Fortunately, Joni provides a callback
hook to push these warnings to a custom location.

This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warnings indicate a possible issue with
the regular expression, so upon creation of the Grok processor we
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation, which
is a much lower frequency than once per document.

Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
2020-06-09 17:06:29 -05:00
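As described above, the per-execution Joni warnings can be surfaced by raising the GrokProcessor logger to debug. A sketch using the cluster settings API via the low-level REST client; the choice of a transient setting is an assumption for illustration.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class EnableGrokDebugLogging {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request put = new Request("PUT", "/_cluster/settings");
            // Transient so the verbose logging goes away on a full cluster restart.
            put.setJsonEntity(
                "{ \"transient\": {"
                + " \"logger.org.elasticsearch.ingest.common.GrokProcessor\": \"debug\" } }");
            System.out.println(EntityUtils.toString(client.performRequest(put).getEntity()));
        }
    }
}
```
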
Benjamin Trent d5522c2747
[ML] add new circuit breaker for inference model caching (#57731) (#57830)
This adds a new plugin-level circuit breaker for the ML plugin.

`model_inference` is the circuit breaker qualified name.

Right now it simply adds to the breaker when the model is loaded (and possibly breaks) and removes from the breaker when the model is unloaded.
2020-06-08 16:02:48 -04:00
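Once the breaker exists, its usage is visible alongside the other circuit breakers in node stats. A sketch of checking it: the breaker name `model_inference` comes from the commit above, while the node stats call itself is the standard API.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class InspectInferenceBreaker {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Per-node breaker stats; the response should include a "model_inference" entry under "breakers".
            Request stats = new Request("GET", "/_nodes/stats/breaker");
            System.out.println(EntityUtils.toString(client.performRequest(stats).getEntity()));
        }
    }
}
```
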
Mayya Sharipova 70e63a365a
Refactor how to determine if a field is metafield (#57378) (#57771)
Previously, to determine whether a field is a meta-field, the static method
MapperService.isMetadataField was used. This method was using an outdated
static list of meta-fields.

This PR changes this method into an instance method that
is also aware of meta-fields in all registered plugins.

Related #38373, #41656
Closes #24422
2020-06-08 09:16:18 -04:00
David Kyle 08d1286de7
[7.x] Delete expired data by job (#57337) (#57796)
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-08 13:00:23 +01:00
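A sketch of the new per-job form of the delete expired data endpoint described above; `my-noisy-job` is a placeholder job id.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class DeleteExpiredDataForJob {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Without the job id this cleans up all jobs; the path parameter limits the clean-up to one job.
            Request delete = new Request("DELETE", "/_ml/_delete_expired_data/my-noisy-job");
            System.out.println(EntityUtils.toString(client.performRequest(delete).getEntity()));
        }
    }
}
```
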
David Roberts 1d64d55a86
[7.x][ML] Add per-partition categorization option (#57723)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.

Backport of #57683
2020-06-06 08:15:17 +01:00
Benjamin Trent 9666a895f7
[ML] inference performance optimizations and refactor (#57674) (#57753)
This is a major refactor of the underlying inference logic.

The main refactor is now we are separating the model configuration and
the inference interfaces.

This has the following benefits:
 - we can store extra things with the model that are not
   necessary for inference (e.g. tree node split information gain)
 - we can optimize inference separately from model serialization and storage.
 - The user is oblivious to the optimizations (other than seeing the benefits).

A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.

This new class satisfies a new interface that is ONLY for inference.

The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
  (improves inference + feature importance throughput)
- only storing what we need for inference + feature importance on heap
2020-06-05 14:20:58 -04:00
Dimitris Athanasiou f49a14ce6f
[7.x][ML] Fix race condition when force stopping DF analytics job (#57680) (#57717)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.

Backport of #57680
2020-06-05 17:50:01 +03:00
William Brafford dfb6def3da Revert "Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383)"
This reverts commit 7a67fb2d04.
2020-06-04 16:25:05 -04:00
William Brafford 7a67fb2d04
Restore xpack.ilm.enabled and xpack.slm.enabled settings (#57383)
In #55592 and #55416, we deprecated the settings for enabling and disabling
basic license features and turned those settings into no-ops. Since doing so,
we've had feedback that this change may not give users enough time to cleanly
switch from non-ILM index management tools to ILM. If two index managers
operate simultaneously, results could be strange and difficult to
reconstruct. We don't know of any cases where SLM will cause a problem, but we
are restoring that setting as well, to be on the safe side.

This PR is not a strict commit reversion. First, we are keeping the new
xpack.watcher.use_ilm_index_management setting, introduced when
xpack.ilm.enabled was made a no-op, so that users can begin migrating to using
it. Second, the SLM setting was modified in the same commit as a group of other
settings, so I have taken just the changes relating to SLM.
2020-06-04 13:38:22 -04:00
Przemysław Witek 6b5f49d097
[7.x] Introduce ModelPlotConfig.annotations_enabled setting (#57539) (#57641) 2020-06-04 15:15:35 +02:00
Benjamin Trent ea9b8b9d41
[ML] fix setting forecasts to failed method (#57654) (#57656) 2020-06-04 08:54:46 -04:00
Przemysław Witek ea6cfb7c3d
[7.x] Make Annotation a result type (#56342) (#57508) 2020-06-02 11:56:41 +02:00
Przemysław Witek ceb4b29b98
Introduce Annotation.event field (#57144) (#57453) 2020-06-01 20:42:25 +02:00
David Kyle 064093c4d4 Fix compilation after backport of #57278 2020-06-01 12:03:13 +01:00
Przemysław Witek 72ad9a4548
[7.x] Make AnnotationPersister use bulk requests instead of indexing individual documents (#57278) (#57354) 2020-06-01 12:05:09 +02:00
Benjamin Trent 34f1e0b6bb
[7.x] [ML] mark forecasts for force closed/failed jobs as failed (#57143) (#57374)
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)

forecasts that are still running should be marked as failed/finished in the following scenarios:

- Job is force closed
- Job is re-assigned to another node.

Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.

Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.

relates to https://github.com/elastic/elasticsearch/issues/56419
2020-05-29 14:48:10 -04:00
Benjamin Trent 35d5126cea
[7.x] [ML] adds new for_export flag to GET _ml/inference API (#57351) (#57368)
* [ML] adds new for_export flag to GET _ml/inference API (#57351)

Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 14:01:08 -04:00
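A sketch of the new flag: retrieving a model with `for_export=true` so the returned configuration can be PUT into another cluster, which is the workflow the commit above describes. The model id is a placeholder.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class ExportTrainedModel {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request get = new Request("GET", "/_ml/inference/my-regression-model");
            get.addParameter("for_export", "true"); // shape the response so it can be moved to another cluster
            System.out.println(EntityUtils.toString(client.performRequest(get).getEntity()));
        }
    }
}
```
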
Benjamin Trent c8374dc9f3
[ML] add max_model_memory parameter to forecast request (#57254) (#57355)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit of `500mb`; specifying a larger value will throw an error.

If the limit is larger than 40% of the anomaly job's configured model memory limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 11:16:08 -04:00
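A sketch of a forecast request using the new `max_model_memory` setting described above; the job id and forecast duration are placeholders.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class ForecastWithMemoryCap {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request forecast = new Request("POST", "/_ml/anomaly_detectors/traffic-job/_forecast");
            // Byte-size string; per the commit above the default is 20mb with a hard limit of 500mb.
            forecast.setJsonEntity("{ \"duration\": \"3d\", \"max_model_memory\": \"150mb\" }");
            System.out.println(EntityUtils.toString(client.performRequest(forecast).getEntity()));
        }
    }
}
```
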
Dimitris Athanasiou 322f953060
[7.x][ML] Anomaly detection jobs should allow missing values for geo fields (#57300) (#57338)
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.

Closes #57299

Backport of #57300
2020-05-29 13:06:16 +03:00
Benjamin Trent 24d605e41e
[ML] fixing GET _ml/inference so size param is respected (#57303) (#57308)
`size` was previously ignored when grabbing full trained model configs. 

closes https://github.com/elastic/elasticsearch/issues/57298
2020-05-28 15:45:26 -04:00
David Roberts d139a79ef6
[7.x][ML] Fix monitoring if orphaned anomaly detector persistent tasks exist (#57240)
Since #51888 the ML job stats endpoint has returned entries for
jobs that have a persistent task but no job config. Such
orphaned tasks caused monitoring to fail.

This change ignores any such corrupt jobs for monitoring purposes.

Backport of #57235
2020-05-27 22:59:11 +01:00
Benjamin Trent decc6277f9
[ML] allow unran/incomplete forecasts to be deleted for stopped/failed jobs (#57152) (#57172)
If a job is NOT opened, forecasts should be able to be deleted, no matter their state.

This also fixes a bug with expanding forecast IDs. We should check for the wildcard `*` and `_all` when expanding the IDs.

closes https://github.com/elastic/elasticsearch/issues/56419
2020-05-26 15:44:22 -04:00
David Kyle 571477d0ad
[7.x] Fix delete_expired_data/nightly maintenance when many model snapshots need deleting (#57041) (#57136)
Fix delete_expired_data/nightly maintenance when 
many model snapshots need deleting (#57041)

The queries performed by the expired data removers pull back entire 
documents when only a few fields are required. For ModelSnapshots in 
particular this is a problem as they contain quantiles which may be 
100s of KB and the search size is set to 10,000.

This change makes the search more efficient by only requesting the 
fields needed to work out which expired data should be deleted.
2020-05-26 10:56:42 +01:00
Przemysław Witek ea2012778e
Mute failing test (#57112) (#57113) 2020-05-25 14:06:29 +02:00
Benjamin Trent f00dfb2d5f
[ML] adds WKT support in filestructurefinder (#57014) (#57032)
Field mapping detection is done via grok patterns. 
This commit adds well-known text (WKT) formatted geometry detection.

If everything is a `POINT`, then a `geo_point` mapping is preferred. 
Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred.

This does **NOT** detect other types of formatted geometries (geohash, comma delimited points, etc.)

closes https://github.com/elastic/elasticsearch/issues/56967
2020-05-21 08:22:51 -04:00
Alan Woodward 18bfbeda29 Move merge compatibility logic from MappedFieldType to FieldMapper (#56915)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.

This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.

Relates to #56814
2020-05-20 09:43:13 +01:00
Benjamin Trent 297f864884
[ML] relax throttling on expired data cleanup (#56711) (#56895)
Throttling nightly cleanup as much as we do has been overcautious.

Nightly cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 08:46:42 -04:00
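A sketch of the two knobs mentioned above, assuming the throttle and timeout are accepted in the request body: the per-call `requests_per_second`/`timeout` on the delete expired data API, and the new `xpack.ml.nightly_maintenance_requests_per_second` cluster setting for the nightly maintenance task.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class ExpiredDataThrottling {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // Ad-hoc call with explicit throttling and timeout.
            Request delete = new Request("DELETE", "/_ml/_delete_expired_data");
            delete.setJsonEntity("{ \"requests_per_second\": 100.0, \"timeout\": \"2h\" }");
            System.out.println(EntityUtils.toString(client.performRequest(delete).getEntity()));

            // Throttle used by the nightly maintenance task, via the new cluster setting.
            Request settings = new Request("PUT", "/_cluster/settings");
            settings.setJsonEntity(
                "{ \"persistent\": { \"xpack.ml.nightly_maintenance_requests_per_second\": 100.0 } }");
            System.out.println(EntityUtils.toString(client.performRequest(settings).getEntity()));
        }
    }
}
```
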
Dimitris Athanasiou 54d3cc74ec
[7.x][ML] Ensure class is represented when its cardinality is low (#56783) (#56829)
In DF analytics classification, it is possible to use no samples
of a class if its cardinality is too low.

This commit fixes this by ensuring the target sample count can never be zero.

Backport of #56783
2020-05-15 20:52:06 +03:00
David Roberts 270a23e422 [TEST] Fix log tail mocking in native process unit tests (#56804)
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.

Fixes #56796
2020-05-15 12:46:37 +01:00
Dimitris Athanasiou ac5902624c
[7.x][ML] Improve error upon DF analytics mappings conflict (#56700) (#56776)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.

Backport of #56700
2020-05-14 19:16:10 +03:00
David Roberts 3051c37f92
[ML] Tail the C++ logging pipe before connecting other pipes (#56701)
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.

This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block.  This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.

This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.

Backport of #56632
2020-05-14 07:10:30 +01:00
Armin Braun 0a879b95d1
Save Bounds Checks in BytesReference (#56577) (#56621)
Two spots that allow for some optimization:

* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
   * Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point in dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
2020-05-12 20:33:45 +02:00
Ryan Ernst 902fc546bd
Migrate remaining ESIntegTestCases to internalClusterTest (#56479) (#56563)
This commit migrates the ESIntegTestCase tests in x-pack to the
internalClusterTest source set.
2020-05-11 21:06:04 -07:00
zhenxianyimeng 8e96e5c936
Use CollectionUtils.isEmpty where appropriate (#55910)
This commit uses the isEmpty utility method for arrays in place of null and greater than zero checks.
2020-05-11 09:55:57 -07:00
Dimitris Athanasiou 44ffa388ac
[7.x][ML] Use non-zero timeout when force stopping DF analytics (#56423) (#56428)
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.

This commit fixes this by using the default timeout instead.

Backport of #56423
2020-05-08 21:12:11 +03:00
David Roberts 9a3924a641
[ML] Adjust list of platforms that have ML native code (#56426)
Native code is now available for linux-aarch64.

Note that it is _not_ currently supported!
2020-05-08 16:22:45 +01:00
Dimitris Athanasiou c117ae7a6e
[7.x][ML] Force stopping stopped DF analytics should succeed (#56421) (#56424)
Force stopping a DF analytics job whose config exists and that
is stopped should succeed. This was broken by #56360.

Closes #56414

Backport of #56421
2020-05-08 18:04:24 +03:00
Dimitris Athanasiou 60b1c67409
[7.x][ML] Allow stopping DF analytics whose config is missing (#56360) (#56408)
It is possible that the config document for a data frame
analytics job is deleted from the config index. If that is
the case the user is unable to stop a running job because
we attempt to retrieve the config and that will throw.

This commit changes that. When the request is forced,
we do not expand the requested ids based on the existing
configs but from the list of running tasks instead.

Backport of #56360
2020-05-08 13:54:44 +03:00
Dimitris Athanasiou d064eda2b0
[7.x][ML] Ensure phase progress may only increase (#56339) (#56357)
Due to multi-threading it is possible that phase progress
updates written from the c++ process arrive reordered.
We can address this by ensuring that progress may only increase.

Closes #56282

Backport of #56339
2020-05-07 19:46:58 +03:00
Przemysław Witek 0cd0ab276e
Introduce Annotation.Builder class and use it to create instances of Annotation class (#56276) (#56286) 2020-05-06 20:47:03 +02:00
Dimitris Athanasiou 011e995165
[7.x][ML] Unmute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange (#56268) (#56287)
Closes #56240
2020-05-06 18:20:46 +03:00
Julie Tibshirani 49de092b38 Mute RegressionIT.testTwoJobsWithSameRandomizeSeedUseSameTrainingSet. 2020-05-05 16:25:36 -07:00
Julie Tibshirani 63062ec7bd Mute ClassificationIT.testDependentVariableCardinalityTooHighButWithQueryMakesItWithinRange. 2020-05-05 13:48:35 -07:00
Dan Hermann 6674f14fb3
[7.x] Get index includes parent data stream for backing indices (#56238) 2020-05-05 15:43:42 -05:00
Benjamin Trent e1c5ca421e
[7.x] [ML] lay ground work for handling >1 result indices (#55892) (#56192)
* [ML] lay ground work for handling >1 result indices (#55892)

This commit removes all but one reference to `getInitialResultsIndexName`. 
This is to support more than one result index for a single job.
2020-05-05 15:54:08 -04:00
William Brafford 3499fa917c
Deprecated xpack "enable" settings should be no-ops (#55416) (#56167)
The following settings are now no-ops:

* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled

Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.

We also update documentation to remove references to these settings.
2020-05-05 10:40:49 -04:00
David Roberts 7aa0daaabd
[7.x][ML] More advanced model snapshot retention options (#56194)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of #56125
2020-05-05 14:31:58 +01:00
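A sketch of configuring the retention settings described above on an existing job via the update API; the job id is a placeholder.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class SnapshotRetentionSettings {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request update = new Request("POST", "/_ml/anomaly_detectors/traffic-job/_update");
            // Keep every snapshot for 1 day, then only the first snapshot per day for up to 10 days.
            update.setJsonEntity(
                "{ \"model_snapshot_retention_days\": 10,"
                + " \"daily_model_snapshot_retention_after_days\": 1 }");
            System.out.println(EntityUtils.toString(client.performRequest(update).getEntity()));
        }
    }
}
```
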
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
Dimitris Athanasiou 6061aa3db4
[7.x][ML] Fix race condition updating reindexing progress (#56135) (#56146)
In #55763 I thought I could remove the flag that marks that
reindexing has finished on a data frame analytics task.
However, that exposed a race condition. It is possible that
between updating reindexing progress to 100 because we
have called `DataFrameAnalyticsManager.startAnalytics()` and
a call to the _stats API which updates reindexing progress via the
method `DataFrameAnalyticsTask.updateReindexTaskProgress()` we
end up overwriting the 100 with a lower progress value.

This commit fixes this issue by bringing back the
`isReindexingFinished` flag, as it was prior to #55763.

Closes #56128

Backport of #56135
2020-05-05 10:48:42 +03:00
Martijn van Groningen 2ac32db607
Move includeDataStream flag from IndicesOptions to IndexNameExpressionResolver.Context (#56151)
Backport of #56034.

Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.

Also alter indices stats api to support data streams.
The rollover api uses this api, and otherwise rolling over a data stream would no longer work.

Relates to #53100
2020-05-04 22:38:33 +02:00
Benjamin Trent 6c26de444d
[ML] reduce InferenceProcessor.Factory log spam by not parsing pipelines (#56020) (#56126)
If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.

It might lead folks to think there is an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using it, leading to more confusion.

Additionally, pipelines might not be parseable in this method as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).

closes https://github.com/elastic/elasticsearch/issues/55985
2020-05-04 13:32:01 -04:00
Martijn van Groningen 6d03081560
Add auto create action (#56122)
Backport of #55858 to 7.x branch.

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which can then lead to the
TransportBulkAction creating an index instead of a data stream. Therefore the coordination of
creating an index or data stream should occur on the master node. See #55377

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:

AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to #53100
2020-05-04 19:10:09 +02:00
Przemysław Witek 44f5a8ccd3
Use snapshot's latest result time rather than snapshot's creation time when creating an annotation (#56093) (#56103) 2020-05-04 12:36:12 +02:00
William Brafford d53c941c41
Make xpack.monitoring.enabled setting a no-op (#55617) (#56061)
* Make xpack.monitoring.enabled setting a no-op

This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved
removing the setting from the setup for integration tests. Monitoring may
introduce some complexity for test setup and teardown, so we should keep an eye
out for turbulence and failures.

* Docs for making deprecated setting a no-op
2020-05-01 16:42:11 -04:00
Ryan Ernst 52b9d8d15e
Convert remaining license methods to isAllowed (#55908) (#55991)
This commit converts the remaining isXXXAllowed methods to instead
use isAllowed with a Feature value. There are a couple other methods
that are static, as well as some licensed features that check the
license directly, but those will be dealt with in other followups.
2020-04-30 15:52:22 -07:00
Benjamin Trent c36bcb4dd0
[ML] fixing file structure finder multiline merge max for delimited formats (#56023) (#56035)
This commit correctly sets the maxLinesPerRow in the CsvPreference for delimited files given the file structure finder settings.

Previously, it was silently ignored.
2020-04-30 10:51:32 -04:00
Benjamin Trent 04b1f6498b
[ML] using new fixed interval in ml tests (#56021) (#56031)
This commit removes deprecated references to DateHistogram.interval from ml tests
2020-04-30 10:26:39 -04:00
Dimitris Athanasiou 17b904def5
[7.x][ML] Decouple DFA progress testing from analyses phases (#55925) (#56024)
This refactors native integ tests to assert progress without
expecting explicit phases for analyses. We can test those with
yaml tests in a single place.

Backport of #55925
2020-04-30 17:05:47 +03:00
William Brafford 273ff6a105
Make xpack.ilm.enabled setting a no-op (#55592) (#55980)
* Make xpack.ilm.enabled setting a no-op

* Add watcher setting to not use ILM

* Update documentation for no-op setting

* Remove NO_ILM ml index templates

* Remove unneeded setting from test setup

* Inline variable definitions for ML templates

* Use identical parameter names in templates

* New ILM/watcher setting falls back to old setting

* Add fallback unit test for watcher/ilm setting
2020-04-30 09:50:18 -04:00
David Kyle c204353249
[ML] Wait for model loaded and cached in ModelLoadingServiceTests (#56014)
Fixes test by exposing the method ModelLoadingService::addModelLoadedListener() 
so that the test class can be notified when a model is loaded, which happens in
a background thread.
2020-04-30 13:32:07 +01:00
Dimitris Athanasiou c5aa281171
[7.x][ML] Remove error on parsing progress for unknown phase in DFA (#55926) (#55954)
On second thought, this check does not seem to be adding value.
We can test that the phases are as we expect them for each analysis
by adding yaml tests. Those would fail if we introduce new phases
from c++ accidentally or without coordination. This would achieve
the same thing. At the same time we would not have to comment out
this code each time a new phase is introduced. Instead we can just
temporarily mute those yaml tests. Note I will add those tests
right after the imminent new phases are added to the c++ side.

Backport of #55926
2020-04-29 20:11:33 +03:00
Benjamin Trent edd049f9cd
[ML] Allow a certain number of ill-formatted rows when delimited format is specified (#55735) (#55944)
While it is good not to be lenient when attempting to guess the file format, it is frustrating to users when they KNOW it is CSV but there are a few ill-formatted rows in the file (due to some data entry error, etc.).

This commit allows for up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format.

This percentage of allowed "bad rows" is only applied when the user has specified delimited formatting options, as the structure finder needs some guidance on what a "bad row" actually means.

related to https://github.com/elastic/elasticsearch/issues/38890
2020-04-29 11:15:21 -04:00
Dimitris Athanasiou d9685a0f19
[7.x][ML] Validate at least one feature is available for DF analytics (#55876) (#55914)
We were previously checking that at least one supported field existed
when the _explain API was called. However, in the case of analyses
with required fields (e.g. regression) we were not accounting for the fact
that the dependent variable is not a feature, and thus if the source index
only contains the dependent variable field there are no features to
train a model on.

This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
`ExtractedFieldsDetector` and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.

For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.

Closes #55593

Backport of #55876
2020-04-29 11:39:58 +03:00
David Roberts 61ac09ae21
[ML] Add daily_model_snapshot_retention_after_days to job config (#55891)
This change adds a new setting, daily_model_snapshot_retention_after_days,
to the anomaly detection job config.

Initially this has no effect, the effect will be added in a followup PR.
This PR gets the complexities of making changes that interact with BWC
over well before feature freeze.

Backport of #55878
2020-04-29 09:12:53 +01:00
Dimitris Athanasiou abab4c4d4f
[7.x][ML] Do not fail DFA task when it's stopped whilst reindexing (#55797) (#55800)
Adding to #55659, we missed another way we could set the task to
failed due to task cancellation. CI revealed that we might also
get a `SearchPhaseExecutionException` whose cause is a
`TaskCancelledException`. That exception is not wrapped so
unwrapping it will not return the underlying `TaskCancelledException`.
Thus to be complete in catching this, we also need to check the
error's cause.

Closes #55068

Backport of #55797
2020-04-27 16:03:57 +03:00
Dimitris Athanasiou 7f100c1196
[7.x][ML] Allow analytics process define its own progress phases (#55763) (#55791)
This is a continuation from #55580.

Now that we're parsing phase progresses from the analytics process
we change `ProgressTracker` to allow for custom phases between
the `loading_data` and `writing_results` phases. Each `DataFrameAnalysis`
may declare its own phases.

This commit sets things in place for the analytics process to start
reporting different phases per analysis type. However, this is
still preserving existing behaviour as all analyses currently
declare a single `analyzing` phase.

Backport of #55763
2020-04-27 13:30:05 +03:00
David Roberts 3ba44a5af8
[ML] Adding failed_category_count to model_size_stats (#55761)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Backport of #55716
2020-04-25 10:36:49 +01:00
Dimitris Athanasiou 210b7f1b76
[7.x][ML] Remove parsing of old progress format in DF Analytics (#55711) (#55720)
Since #55580 we've introduced a new format for parsing progress
from the data frame analytics process. As the process is now
writing out progress in this new way, we can remove the parsing
of the old format.

Backport of #55711
2020-04-24 16:50:56 +03:00
Przemysław Witek c89917c799
Register DFA jobs on putAnalytics rather than via a separate method (#55458) (#55708) 2020-04-24 10:59:32 +02:00
Dimitris Athanasiou b8379872a7
[7.x][ML] Logs error when DFA task is set to failed (#55545) (#55668)
Also unmutes the integ test that stops and restarts
an outlier detection job with the hope of learning more
about the failure in #55068.

Backport of #55545

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-04-24 11:06:07 +03:00
David Roberts 46be9959a0
[ML] Audit when unassigned datafeeds are stopped (#55667)
Previously audit messages were indexed when datafeeds that were
assigned to a node were stopped, but not datafeeds that were
unassigned at the time they were stopped.

This change adds auditing for the unassigned case.

Backport of #55656
2020-04-23 20:46:35 +01:00
Dimitris Athanasiou 4b11adf074
[7.x][ML] Do not fail DFA task that is stopped during reindexing (#55659) (#55663)
While we were catching `TaskCancelledException` while we wait for
reindexing to complete, we missed the fact that this exception
may be wrapped in a multi-node cluster. This is the reason
we may still fail the task when stop is called while reindexing.

Some times we're lucky and the exception is thrown by the same
node that runs the job. Then the exception is not wrapped and
things work fine. But when that is not the case the exception is
wrapped, we fail to catch it, and set the task to failed.

The fix is to simply unwrap the exception when we check whether it
is `TaskCancelledException`.

Closes #55068

Backport of #55659
2020-04-23 15:57:01 +03:00
Rory Hunter d66af46724
Always use deprecateAndMaybeLog for deprecation warnings (#55319)
Backport of #55115.

Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
2020-04-23 09:20:54 +01:00
David Roberts 87f4751eca [ML] Make find_file_structure recognize Kibana CSV report timestamps (#55609)
The Kibana CSV export feature uses a non-standard timestamp format.
This change adds it to the formats the find_file_structure endpoint
recognizes out-of-the-box, to make round-tripping data from Kibana
back to Kibana via CSV files easier.

Fixes #55586
2020-04-23 08:39:07 +01:00
Dimitris Athanasiou 50a5afed15
[7.x][ML] Prepare parsing phase_progress from DFA process (#55580) (#55587)
Data frame analytics process currently reports progress as
an integer `progress_percent`. We parse that and report it
from the _stats API as the progress of the `analyzing` phase.
However, we want to allow the DFA process to report progress
for more than one phase. This commit prepares for this by
parsing `phase_progress` from the process, an object that
contains the `phase` name plus the `progress_percent` for that
phase.

Backport of #55580
2020-04-22 16:38:32 +03:00
Benjamin Trent 7c81cd7833
[ML] explicitly disallow partial results in datafeed extractors (#55537) (#55585)
Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`.

- Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set.
- Code error handling is simplified

closes https://github.com/elastic/elasticsearch/issues/40793
2020-04-22 09:07:44 -04:00
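A generic sketch of the equivalent request-level flag when calling the search API directly; the index name and query are placeholders. With partial results disallowed, a shard failure fails the whole search instead of returning a partial page.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class NoPartialResultsSearch {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            Request search = new Request("GET", "/my-index/_search");
            search.addParameter("allow_partial_search_results", "false"); // fail fast rather than return partial data
            search.setJsonEntity("{ \"query\": { \"match_all\": {} } }");
            System.out.println(EntityUtils.toString(client.performRequest(search).getEntity()));
        }
    }
}
```
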
David Roberts 810caf5ffe
[ML] Test that audit message is written when closing unassigned job (#55582)
Issue #55521 suggested that audit messages were not written when
closing an unassigned job.  This is not the case, but we didn't
have a test to prove it.

Backport of #55571
2020-04-22 13:23:43 +01:00
David Roberts 2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
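A sketch of reading the new field: per the commit above, `limits.effective_max_model_memory_limit` appears in the ML info response, alongside `max_model_memory_limit` when that setting is configured.

```java
import org.apache.http.HttpHost;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RestClient;

public class MlInfoLimits {
    public static void main(String[] args) throws Exception {
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // The response JSON should contain limits.effective_max_model_memory_limit.
            Request info = new Request("GET", "/_ml/info");
            System.out.println(EntityUtils.toString(client.performRequest(info).getEntity()));
        }
    }
}
```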