Commit Graph

1333 Commits

Author SHA1 Message Date
Nick Knize 125958eb2c Remove the Elastic license file, all checks for this license and the license REST APIs. (#12)
Co-authored-by: Rabi Panda <rabipanda@icloud.com>
Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:07 -06:00
Nick Knize 0dd4f9f281 [PURIFY] remove all trace of x-pack ccr (#11)
This committ removes all trace of Elastic licensed CCR.

Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:07 -06:00
Nick Knize 479dd4b42a [PURIFY] remove all trace of x-pack async-search (#10)
This commit removes all trace of Elastic licensed asyc-search

Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:07 -06:00
Nick Knize 204122aae4 [PURIFY] remove all trace of x-pack enrichment processor (#9)
This commit removes all trace of the Elastic licensed enrichment processor.

Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:06 -06:00
Nick Knize 3a52e9ddc1 [PURIFY] update build.gradle files to ensure build completes; gradle check fails (#7)
Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:06 -06:00
Nick Knize 168f5e825f [PURIFY] remove all trace of x-pack eql (#5)
This commit removes all trace of EQL from the sanitized fork.

Signed-off-by: Peter Nied <petern@amazon.com>
2021-03-13 10:36:06 -06:00
Jay Modi 55e79dd286
Mute IndicesClientIT.testDataStreams (#65859)
This commit mutes IndicesClientIT.testDataStreams as this test is
failing in CI intermittently.

Relates #60746
Relates #60461
2020-12-03 14:36:35 -07:00
Sylvain Wallez b2475f9ccf
Fix parsing RareTerms aggregation response in RestHighLevelClient (#65144)
Backport of #64454

- Add LongRareTerms and StringRareTerms to the DefaultNamedXContents,
  ensure that the response of RareTerms aggregation can be parsed
  correctly.
- Add testSearchWithRareTermsAgg method to test the response of
  RareTerms aggregation can be parsed correctly.
- Add some test code to ensure the AggregationsTests can execute
  successfully.

Co-authored-by: bellengao <gbl_long@163.com>
2020-11-17 17:43:51 +01:00
Costin Leau 76e73fec79
EQL: Add option for returning results from the tail of the stream (#64869) (#65040)
Introduce option for specifying whether the results are returned from
the tail (end) of the stream or the head (beginning).
Improve sequencing algorithm by significantly eliminating the number
of in-flight sequences for spare datasets.
Refactor the sequence class by eliminating some of the redundant code.
Change matching behavior for tail sequences.
Return results based on their first entry ordinal instead of
insertion order (which was ordered on the last match ordinal).
Randomize results position inside test suite.

Close #58646

(cherry picked from commit e85d9d1bbee13ad408e789fd62efb30bc8d223f2)
(cherry picked from commit 452c674a10cdc16dced3cde7babf5d5a9d64a6d9)
2020-11-14 13:44:17 +02:00
Tanguy Leroux 09ff421d4f
Fix TaskIT (#64724) (#64848)
Closes #64056
2020-11-10 11:06:25 +01:00
Andrei Dan 460c1364c0
Tests: fix testBulkProcessorConcurrentRequestsReadOnlyIndex (#64633) (#64657)
Executing a MultiGetRequest with no items will fail. This makes sure there
is always at least one item in the request.

(cherry picked from commit bd4703250fe331296b8613b277ea25c8bef1dcd9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-11-06 09:10:51 +00:00
Armin Braun 1880bcdc09
Add REST Test for Snapshot Clone API (#63863) (#63881)
Adds snapshot clone REST tests and HLRC support for the API.
2020-10-20 09:48:03 +02:00
Ryland Herrick 7e8769a666
EQL: make allow_no_indices true by default (#63573) (#63645)
* Allow all indices options variants
Irrespective of allow_no_indices value, throw VerificationException when
there is no index validated

Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
2020-10-14 03:41:04 +03:00
Przemysław Witek acbd48f834
[ML] Allow setting num_top_classes to a special value -1 (#63587) (#63602) 2020-10-13 13:57:50 +02:00
Przemysław Witek bd761cce1d
[ML] Validate that AucRoc has the data necessary to be calculated (#63302) (#63454) 2020-10-08 09:52:15 +02:00
Lisa Cawley 8f76c89cd3
[7.x][DOCS] Add feature_importance_baseline to get trained model API (#63279) (#63336)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-10-06 10:08:34 -07:00
Hendrik Muhs 058c55da6a [Transform] disallow field and script being empty for group sources (#63313)
fail validation earlier when field and script are both missing in a group source
2020-10-06 16:59:02 +02:00
Yang Wang 7969fbb4ab
Cache API key doc to reduce traffic to the security index (#59376) (#63319)
Getting the API key document form the security index is the most time consuing part
of the API Key authentication flow (>60% if index is local and >90% if index is remote).
This traffic is now avoided by caching added with this PR.

Additionally, we add a cache invalidator registry so that clearing of different caches will
be managed in a single place (requires follow-up PRs).
2020-10-06 23:49:23 +11:00
Andrei Stefan 76bba601ab
Remove case_sensitive request option (#63218) (#63244)
Make EQL case sensitive by default and adapt some of the string functions
Remove the case sensitive option from Between string function
Add case_insensitive option to term and wildcard queries usage

(cherry picked from commit 7550e0664c8c2f1f13519036c759b1e76345551f)
2020-10-05 22:04:42 +03:00
Armin Braun cf75abb021
Optimize XContentParserUtils.ensureExpectedToken (#62691) (#63253)
We only ever use this with `XContentParser` no need to make it inline
worse by forcing the lambda and hence dynamic callsite here.
=> Extraced the exception formatting code path that is likely very cold
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
2020-10-05 19:08:32 +02:00
Benjamin Trent 1e63313c19
[ML] adds feature_importance_baseline object to model metadata (#63172) (#63237)
this adds the new field `feature_importance_baseline` and allows it to be optionally be included in the model's metadata.

Related to: https://github.com/elastic/ml-cpp/pull/1522
2020-10-05 09:33:38 -04:00
Costin Leau 8c4503bcc3 EQL: Change default indices options (#63192)
Ignore by default unavailable indices (same as ES) and verify that
allowNoIndices is set to false since at least one index is required
for validating the query.

Fix #62986

(cherry picked from commit fd75ac27223cd1b699b8d9c311dc401a39f9e0c8)
2020-10-05 14:21:56 +03:00
Benjamin Trent cfcf973259
[7.x] [ML] renames */inference* apis to */trained_models* (#63097) (#63136)
* [ML] renames */inference* apis to */trained_models* (#63097)

This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-02 07:34:28 -04:00
Benjamin Trent 535f8a434b
Revert "[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)" (#63144)
This reverts commit 95242eccee.
2020-10-02 07:03:15 -04:00
Benjamin Trent 95242eccee
[ML] adding `baseline` field to total_feature_importance objects (#63098) (#63125)
This adds a new `baseline` field to the feature importance values. 

This field contains the baseline importance for a given feature and class.
2020-10-01 09:48:07 -04:00
Costin Leau a6b903b783 EQL: Remove unused classes from reponse API (#62134)
Remove Count class and related artifacts since that functionality is not
(yet) available.
Update parser name for better error reporting.

Fix #62131

(cherry picked from commit 060f500346788c4c5d0b3b9c045facec5d677d3d)
2020-09-30 15:45:30 +03:00
Przemysław Witek d677a2b8ee
[7.x] [ML] Implement AucRoc metric for classification - HLRC (#62304) (#63058) 2020-09-30 14:04:10 +02:00
Benjamin Trent 0b3af242d4
[ML] fixing classification feature importance parsing (#63003) (#63015)
Classification feature importance supports various types in the class name:
- string
- boolean
- numerical

The xcontent parsing on the server side and the HLRC side should support and test these types.
2020-09-29 10:54:35 -04:00
Dimitris Athanasiou 7f6c1ff5b4
[7.x][ML] Remove top level importance from classification inference results (#62486) (#62964)
As we have decided top level importance for classification is not useful,
it has been removed from the results from the training job. This commit
also removes them from inference.

Backport of #62486
2020-09-29 10:58:48 +03:00
Andrei Dan 25106ba58f
HLRC: add support for the wait_for_snapshot ILM action (#62333) (#62931)
(cherry picked from commit b8a10b3995669954f0e8c6b3512c50da6c76d48d)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-28 09:54:24 +01:00
Andrei Dan 3590a77b2b
HLRC: add support for the searchable_snapshot ILM action (#62323) (#62887)
(cherry picked from commit 681eb58718c4cce9ed18a835f4eadb06997e91a0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-24 16:45:50 +01:00
Hendrik Muhs a70389015d [Transform] Return parsed count for get transform stats (#62809)
In case of more than 500 transforms, get and stats return paged results which can be requested using
page parameters. For >500 transforms count wasn't parsed out of the server response but taken from
size of the list of transforms.

The change also adds client/server hlrc tests and fixes a wrong type for count in get.

fixes #56245
2020-09-24 08:38:07 +02:00
Marios Trivyzas 1e72144847
EQL: Remove support for `=` for comparisons (#62756) (#62775)
Since `=` is rarely used and is undocumented we its support for
equality comparisons keeping `==` as the only option. `=` is now only
used for assignments like in `maxspan=10m`.

Closes: #62650
(cherry picked from commit ad5ae4d887b5c2feca2d0e874d7bdf738e3fd54e)
2020-09-22 20:56:04 +02:00
Benjamin Trent e163559e4c
[7.x] [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922) (#62620)
* [ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)

Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.

* fixing test

* Update x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ml/action/GetTrainedModelsRequestTests.java
2020-09-18 10:07:35 -04:00
Costin Leau 81f2f84177 EQL: Allow requests with size 0 (#62537)
The purpose for this change is to allow validation of queries without
having to actually execute them. The optimizer already picks up this
case.

Fix #62494

(cherry picked from commit 675889559b2f96a0c1faa6fc84fd537148ba2cce)
2020-09-18 11:24:39 +03:00
William Brafford 5a0dca2491
Deprecate xpack.eql.enabled setting and make it a no-op (#61375) (#62491)
* Deprecate xpack.eql.enabled and make it a no-op
* Remove uses of xpack.eql.enabled
2020-09-17 14:17:27 -04:00
Andrei Dan fe1194d58f
[7.x] ILM migrate data between tiers (#61377) (#62536)
This adds ILM support for automatically migrating the managed
indices between data tiers.

This proposal makes use of a MigrateAction that is injected
(similar to how the Unfollow action is injected) in phases that
don't define index allocation rules using the AllocateAction or
don't explicitly define the MigrateAction itself (regardless if it's
enabled or disabled).

(cherry picked from commit c1746afffd61048d0c12d3a77e6d8191a804ed49)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-09-17 15:08:31 +01:00
Benjamin Trent cec102a391
[7.x] [ML] adds new n_gram_encoding custom processor (#61578) (#61935)
* [ML] adds new n_gram_encoding custom processor (#61578)

This adds a new `n_gram_encoding` feature processor for analytics and inference.

The focus of this processor is simple ngram encodings that allow:
 - multiple ngrams [1..5]
 - Prefix, infix, suffix
2020-09-04 08:36:50 -04:00
Armin Braun 28710c985d
Dry up Settings from Map Construction (#61778) (#61803)
We used the same hack all over the place. At least drying it up to a single place.

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
2020-09-01 19:46:10 +02:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Costin Leau bff3c7470e
EQL: Replace SearchHit in response with Event (#61428) (#61522)
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.

As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).

Fix #59764
Fix #59779

Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
2020-08-25 17:32:42 +03:00
Benjamin Trent 1ae2923632
[7.x] [ML] adding docs + hlrc for data frame analysis feature_processors (#61149) (#61493)
* [ML] adding docs + hlrc for data frame analysis feature_processors (#61149)

Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:56:21 -04:00
Armin Braun d05649bfae
Fix PutPolicyRequestTests.testFromXContent (#61485) (#61494)
We only ever support `JSON` for the query source format in practice.
The reason this test worked before is a bug in xcontent parsing that parses
empty maps out of streams of the wrong format.

Closes #61483
2020-08-24 18:52:05 +02:00
Yang Wang cd52233b94
Include authentication type for the authenticate response (#61247) (#61411)
Add a new "authentication_type" field to the response of "GET _security/_authenticate".
2020-08-21 22:59:43 +10:00
Andrei Stefan 5de0f19cc3
EQL: Return sequence join keys in the original type (#61268) (#61282)
(cherry picked from commit d54957d61faa0d502387656e3cace594017b6ea0)
2020-08-18 19:37:15 +03:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
Benjamin Trent 038cc26ac5
[ML] adjusts feature importance format for hlrc (#61150) (#61153)
related to PR https://github.com/elastic/elasticsearch/pull/61104
2020-08-14 11:33:41 -04:00
Andrei Dan 186e8b865d
HLRC: UpdateByQuery API with wait_for_completion being false (#58552) (#61081)
(cherry picked from commit 291f5bd1b2e889e9447d660e5407f3120cffb1a5)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>

Co-authored-by: Tuan Le <23419763+tumile@users.noreply.github.com>
2020-08-13 11:06:41 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Henning Andersen a0b54b53fc Rest high level ReindexIT fix (#60834)
ReindexIT would rethrottle any delete or update by query task, fixed to
more precisely match the task started by the test.

Closes #60811
2020-08-11 10:35:15 +02:00
Martijn van Groningen 9163c9ce36
Adjust hlrc data streams integration test (#60804)
Backport of #60746

to wait for at least a single shard to be allocated for a backing index of a data stream,
so that total store size is larger than zero (which is what the tests expects).

Closes #60461
2020-08-06 11:57:12 +02:00
Hendrik Muhs 2b6891b584
[7.x][Transform] implement test suite to test continuous transforms (#60725)
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:

 - continuous mode with several checkpoints created
 - correctness of results
 - optimizations (minimal necessary writes)
 - permutations of features (index settings, aggs, data types, index or data stream)
2020-08-05 16:56:01 +02:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Rene Groeschke bdd7347bbf
Merge test runner task into RestIntegTest (7.x backport) (#60600)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
2020-08-04 14:46:32 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Dan Hermann fe12217c7f
[7.x] Move REST specs for data streams (#60111) 2020-07-23 08:10:54 -05:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Igor Motov 1acb4aeba9
EQL: Prepare for release (#59331) (#59426)
Enables eql setting in release builds.

Relates #51613
2020-07-13 11:54:32 -04:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
David Kyle c5443f78ce
Add Inference Pipeline aggregation to HLRC (#59086) (#59250)
Adds InferencePipelineAggregationBuilder to the HLRC duplicating 
the server side classes
2020-07-09 13:38:45 +01:00
Andrei Stefan c0e0bca84c
Remove search_after and implicit_join_key_field (#59232) (#59280)
(cherry picked from commit 6ede6c59eff321b9fedad30e19508b9e4f788b54)
2020-07-09 12:34:01 +03:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
Costin Leau 3e32d060bf EQL: Fix bug in skipping window (#59196)
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)

(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
2020-07-08 14:36:39 +03:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
Costin Leau f9c15d0fec EQL: Introduce sequencing fetch size (#59063)
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, use in testing a minimum fetch size which exposed a number of bugs:

Jumping across data across queries causing valid data to be seen as a gap.
Incorrectly resuming searching across pages (again causing data to be discarded).
which have been addressed.

(cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)
2020-07-06 19:14:26 +03:00
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.

Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Yannick Welsch b885cbff1a
Add index block api (#58716)
Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have
been completed after adding the write block.

This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
2020-06-30 14:06:52 +02:00
Andrei Stefan 3cb8f54f28
EQL: case sensitivity aware integration testing (#58624) (#58672)
* EQL: case sensitivity aware integration testing (#58624)

* Add DataLoader
* Rewrite case sensitivity settings:
NULL -> run both case sensitive and insensitive tests
TRUE -> run case sensitive test only
FALSE -> run case insensitive test only
* Rename test_queries_supported
* Add more toml tests from the Python client

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)
2020-06-29 18:40:07 +03:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig james.rodewig@elastic.co
2020-06-25 14:11:57 -04:00
Nik Everett 03e6d1b535
Add Variable Width Histogram Aggregation (backport of #42035) (#58440)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-06-25 11:40:47 -04:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-19 09:46:51 -04:00
Jason Tedor be08268562
Allow follower indices to override leader settings (#58103)
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follower calls, or via auto-follow patterns. Note that not
all settings can be overrode (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follow index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, the must be implicitly copied from the leader index, not
explicitly set by the user.
2020-06-18 11:56:06 -04:00
Jim Ferenczi 82db0b575c
Allow index filtering in field capabilities API (#57276) (#58299)
This change allows to use an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:

````
GET metrics-*
{
  "index_filter": {
    "bool": {
      "must": [
        "range": {
          "@timestamp": {
            "gt": "2019"
          }
        }
      }
  }
}
````

The filtering is done on a best-effort basis, it uses the can match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.

Closes #56195
2020-06-18 10:23:26 +02:00
Rory Hunter e065d6cc91 Rename dangling index APIs (#58266)
The dangling_indices.import API name could cause issues in the client
libs because import is a reserved word in many languages. Rename the
API to avoid this, and rename the other APIs for consistency.

Related to #48366.
2020-06-18 08:58:32 +01:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Przemko Robakowski 3249ee9a86
HLRC support for data streams (#58106) (#58202)
This change adds high level REST client support for data streams

Relates to #53100
2020-06-17 00:21:14 +02:00
Rory Hunter 03369e0980
Implement dangling indices API (#58176)
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.

Co-authored-by: David Turner <david.turner@elastic.co>
2020-06-16 21:50:38 +01:00
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Dan Hermann 17f3318732
[7.x] Resolve index API (#58037) 2020-06-12 15:41:32 -05:00
David Kyle 39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Costin Leau ff0ea62cb8 EQL: Fix casing for tiebreaker field (#57943)
Use tiebreaker instead of tieBreaker

(cherry picked from commit 3c774948a5d5e10fac267cb9a54f5d0559a00c1d)
2020-06-11 00:10:19 +03:00