Commit Graph

1271 Commits

Author SHA1 Message Date
Ryland Herrick 7e8769a666
EQL: make allow_no_indices true by default (#63573) (#63645)
* Allow all indices options variants
Irrespective of allow_no_indices value, throw VerificationException when
there is no index validated

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
2020-10-14 03:41:04 +03:00
Przemysław Witek acbd48f834
[ML] Allow setting num_top_classes to a special value -1 (#63587) (#63602) 2020-10-13 13:57:50 +02:00
Przemysław Witek bd761cce1d
[ML] Validate that AucRoc has the data necessary to be calculated (#63302) (#63454) 2020-10-08 09:52:15 +02:00
Lisa Cawley 8f76c89cd3
[7.x][DOCS] Add feature_importance_baseline to get trained model API (#63279) (#63336)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-10-06 10:08:34 -07:00
Hendrik Muhs 058c55da6a [Transform] disallow field and script being empty for group sources (#63313)
fail validation earlier when field and script are both missing in a group source
2020-10-06 16:59:02 +02:00
Yang Wang 7969fbb4ab
Cache API key doc to reduce traffic to the security index (#59376) (#63319)
Getting the API key document form the security index is the most time consuing part
of the API Key authentication flow (>60% if index is local and >90% if index is remote).
This traffic is now avoided by caching added with this PR.

Additionally, we add a cache invalidator registry so that clearing of different caches will
be managed in a single place (requires follow-up PRs).
2020-10-06 23:49:23 +11:00
Andrei Stefan 76bba601ab
Remove case_sensitive request option (#63218) (#63244)
Make EQL case sensitive by default and adapt some of the string functions
Remove the case sensitive option from Between string function
Add case_insensitive option to term and wildcard queries usage

(cherry picked from commit 7550e0664c8c2f1f13519036c759b1e76345551f)
2020-10-05 22:04:42 +03:00
Armin Braun cf75abb021
Optimize XContentParserUtils.ensureExpectedToken (#62691) (#63253)
We only ever use this with `XContentParser` no need to make it inline
worse by forcing the lambda and hence dynamic callsite here.
=> Extraced the exception formatting code path that is likely very cold
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
2020-10-05 19:08:32 +02:00
Benjamin Trent 1e63313c19
[ML] adds feature_importance_baseline object to model metadata (#63172) (#63237)
this adds the new field `feature_importance_baseline` and allows it to be optionally be included in the model's metadata.

Related to: https://github.com/elastic/ml-cpp/pull/1522
2020-10-05 09:33:38 -04:00
Costin Leau 8c4503bcc3 EQL: Change default indices options (#63192)
Ignore by default unavailable indices (same as ES) and verify that
allowNoIndices is set to false since at least one index is required
for validating the query.

Fix #62986

(cherry picked from commit fd75ac27223cd1b699b8d9c311dc401a39f9e0c8)
2020-10-05 14:21:56 +03:00
Benjamin Trent cfcf973259
[7.x] [ML] renames */inference* apis to */trained_models* (#63097) (#63136)
* [ML] renames */inference* apis to */trained_models* (#63097)

This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-02 07:34:28 -04:00
Benjamin Trent 535f8a434b
Revert "[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)" (#63144)
This reverts commit 95242eccee.
2020-10-02 07:03:15 -04:00
Benjamin Trent 95242eccee
[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)
This adds a new `baseline` field to the feature importance values. 

This field contains the baseline importance for a given feature and class.
2020-10-01 09:48:07 -04:00
Costin Leau a6b903b783 EQL: Remove unused classes from reponse API (#62134)
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.

Fix #62131

(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
2020-09-30 15:45:30 +03:00
Przemysław Witek d677a2b8ee
[7.x] [ML] Implement AucRoc metric for classification - HLRC (#62304) (#63058) 2020-09-30 14:04:10 +02:00
Benjamin Trent 0b3af242d4
[ML] fixing classification feature importance parsing (#63003) (#63015)
Classification feature importance supports various types in the class name:
- string
- boolean
- numerical

The xcontent parsing on the server side and the HLRC side should support and test these types.
2020-09-29 10:54:35 -04:00
Dimitris Athanasiou 7f6c1ff5b4
[7.x][ML] Remove top level importance from classification inference results (#62486) (#62964)
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.

Backport of #62486
2020-09-29 10:58:48 +03:00
Andrei Dan 25106ba58f
HLRC: add support for the wait_for_snapshot ILM action (#62333) (#62931)
(cherry picked from commit b8a10b3995669954f0e8c6b3512c50da6c76d48d)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-28 09:54:24 +01:00
Andrei Dan 3590a77b2b
HLRC: add support for the searchable_snapshot ILM action (#62323) (#62887)
(cherry picked from commit 681eb58718c4cce9ed18a835f4eadb06997e91a0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-24 16:45:50 +01:00
Hendrik Muhs a70389015d [Transform] Return parsed count for get transform stats (#62809)
In case of more than 500 transforms, get and stats return paged results which can be requested using
page parameters. For >500 transforms count wasn't parsed out of the server response but taken from
size of the list of transforms.

The change also adds client/server hlrc tests and fixes a wrong type for count in get.

fixes #56245
2020-09-24 08:38:07 +02:00
Marios Trivyzas 1e72144847
EQL: Remove support for `=` for comparisons (#62756) (#62775)
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.

Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
2020-09-22 20:56:04 +02:00
Benjamin Trent e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Costin Leau 81f2f84177 EQL: Allow requests with size 0 (#62537)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.

Fix #62494

(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
2020-09-18 11:24:39 +03:00
William Brafford 5a0dca2491
Deprecate xpack.eql.enabled setting and make it a no-op (#61375) (#62491)
* Deprecate xpack.eql.enabled and make it a no-op
* Remove uses of xpack.eql.enabled
2020-09-17 14:17:27 -04:00
Andrei Dan fe1194d58f
[7.x] ILM migrate data between tiers (#61377) (#62536)
This adds ILM support for automatically migrating the managed
indices between data tiers.

This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).

(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-17 15:08:31 +01:00
Benjamin Trent cec102a391
[7.x] [ML] adds new n_gram_encoding custom processor (#61578) (#61935)
* [ML] adds new n_gram_encoding custom processor (#61578)

This adds a new `n_gram_encoding` feature processor for analytics and inference.

The focus of this processor is simple ngram encodings that allow:
 - multiple ngrams [1..5]
 - Prefix, infix, suffix
2020-09-04 08:36:50 -04:00
Armin Braun 28710c985d
Dry up Settings from Map Construction (#61778) (#61803)
We used the same hack all over the place. At least drying it up to a single place.

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
2020-09-01 19:46:10 +02:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Costin Leau bff3c7470e
EQL: Replace SearchHit in response with Event (#61428) (#61522)
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.

As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).

Fix #59764
Fix #59779

Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
2020-08-25 17:32:42 +03:00
Benjamin Trent 1ae2923632
[7.x] [ML] adding docs + hlrc for data frame analysis feature_processors (#61149) (#61493)
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)

Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:56:21 -04:00
Armin Braun d05649bfae
Fix PutPolicyRequestTests.testFromXContent (#61485) (#61494)
We only ever support `JSON` for the query source format in practice.
The reason this test worked before is a bug in xcontent parsing that parses
empty maps out of streams of the wrong format.

Closes #61483
2020-08-24 18:52:05 +02:00
Yang Wang cd52233b94
Include authentication type for the authenticate response (#61247) (#61411)
Add a new "authentication_type" field to the response of "GET _security/_authenticate".
2020-08-21 22:59:43 +10:00
Andrei Stefan 5de0f19cc3
EQL: Return sequence join keys in the original type (#61268) (#61282)
(cherry picked from commit d54957d61faa0d502387656e3cace594017b6ea0)
2020-08-18 19:37:15 +03:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Benjamin Trent 038cc26ac5
[ML] adjusts feature importance format for hlrc (#61150) (#61153)
related to PR https://github.com/elastic/elasticsearch/pull/61104
2020-08-14 11:33:41 -04:00
Andrei Dan 186e8b865d
HLRC: UpdateByQuery API with wait_for_completion being false (#58552) (#61081)
(cherry picked from commit 291f5bd1b2e889e9447d660e5407f3120cffb1a5)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

Co-authored-by: Tuan Le <23419763+tumile@users.noreply.github.com>
2020-08-13 11:06:41 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Henning Andersen a0b54b53fc Rest high level ReindexIT fix (#60834)
ReindexIT would rethrottle any delete or update by query task, fixed to
more precisely match the task started by the test.

Closes #60811
2020-08-11 10:35:15 +02:00
Martijn van Groningen 9163c9ce36
Adjust hlrc data streams integration test (#60804)
Backport of #60746

to wait for at least a single shard to be allocated for a backing index of a data stream,
so that total store size is larger than zero (which is what the tests expects).

Closes #60461
2020-08-06 11:57:12 +02:00
Hendrik Muhs 2b6891b584
[7.x][Transform] implement test suite to test continuous transforms (#60725)
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:

 - continuous mode with several checkpoints created
 - correctness of results
 - optimizations (minimal necessary writes)
 - permutations of features (index settings, aggs, data types, index or data stream)
2020-08-05 16:56:01 +02:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Rene Groeschke bdd7347bbf
Merge test runner task into RestIntegTest (7.x backport) (#60600)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
2020-08-04 14:46:32 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Dan Hermann fe12217c7f
[7.x] Move REST specs for data streams (#60111) 2020-07-23 08:10:54 -05:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00