Commit Graph

6547 Commits

Author SHA1 Message Date
Rory Hunter ff6c071275
Implement deprecation logging using log4j (#61629)
Backport of #61474.

Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
2020-08-31 12:42:04 +01:00
Henning Andersen 4c9fe31da8 Mute testTooLowConfiguredMemoryStillStarts (#61705)
Related to #61704
2020-08-31 11:19:53 +02:00
Ioannis Kakavas c621d291d2
Call ActionListener.onResponse exactly once (#61584) (#61682)
Under specific circumstances we would call onResponse twice, which led to unexpected behavior.
2020-08-30 16:47:09 +03:00
Lee Hinman 1bfebd54ea
[7.x] Allocate newly created indices on data_hot tier nodes (#61342) (#61650)
This commit adds the functionality to allocate newly created indices on nodes in the "hot" tier by
default when they are created.

This does not break existing behavior, as nodes with the `data` role are considered to be part of
the hot tier. Users that separate their deployments by using the `data_hot` (and `data_warm`,
`data_cold`, `data_frozen`) roles will have their data allocated on the hot tier nodes now by
default.

This change is a little more complicated than changing the default value for
`index.routing.allocation.include._tier` from null to "data_hot". Instead, this adds the ability to
have a plugin inject a setting into the builder for a newly created index. This has the benefit of
allowing this setting to be visible as part of the settings when retrieving the index, for example:

```
// Create an index
PUT /eggplant

// Get an index
GET /eggplant?flat_settings
```

Returns the default settings now of:

```json
{
  "eggplant" : {
    "aliases" : { },
    "mappings" : { },
    "settings" : {
      "index.creation_date" : "1597855465598",
      "index.number_of_replicas" : "1",
      "index.number_of_shards" : "1",
      "index.provided_name" : "eggplant",
      "index.routing.allocation.include._tier" : "data_hot",
      "index.uuid" : "6ySG78s9RWGystRipoBFCA",
      "index.version.created" : "8000099"
    }
  }
}
```

After the initial setting of this setting, it can be treated like any other index level setting.

This new setting is *not* set on a new index if any of the following is true:

- The index is created with an `index.routing.allocation.include.<anything>` setting
- The index is created with an `index.routing.allocation.exclude.<anything>` setting
- The index is created with an `index.routing.allocation.require.<anything>` setting
- The index is created with a null `index.routing.allocation.include._tier` value
- The index was created from an existing source metadata (shrink, clone, split, etc)

Relates to #60848
2020-08-27 13:41:12 -06:00
Albert Zaharovits 1cb97a2c4f
Relax the index access control check for scroll searches (#61446)
The check introduced by #60640 for scroll searches, in which we log
if the index access control before the query and fetch phases differs
from when the scroll context is created, is too strict, leading to spurious
warning log messages.
The check verifies instance equality but this assumes that the fetch
phase is executed in the same thread context as the scroll context
validation. However, this is not true if the scroll search is executed
cross-cluster, and even for local scroll searches it is an unfounded assumption.

The check is hence reduced to a null check for the index access.
The fact that the access control is suitable given the indices that
are actually accessed (by the scroll) will be done in a follow-up,
after we better regulate the creation of index access controls in general.
2020-08-27 21:16:01 +03:00
Luca Cavanna f769821bc8
Pass SearchLookup supplier through to fielddataBuilder (#61430) (#61638)
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.

To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.

As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.

With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.

Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.

This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.

Relates to #59332

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-08-27 18:09:56 +02:00
Nik Everett 5a83e89a2b
Migrate histogram field test (#61602) (#61632)
Replaces the superclass of the test for `HistogramFieldMapperTests` with
one that doesn't extend `ESSingleNodeTestCase` so we don't depend on the
entire world to test the field mapper.

Continues #61301.
2020-08-27 11:08:19 -04:00
David Turner c89fb8b9fa Avoid listener call under SparseFileTracker#mutex (#61626)
Today we sometimes notify a listener of completion while holding
`SparseFileTracker#mutex`. This commit move all such calls out from
under the mutex and adds assertions that the mutex is not held in the
listener.

Closes #61520
2020-08-27 15:39:38 +01:00
David Kyle 49a5afc6c1
[ML] Increase wait for templates timeout in tests (#61623) (#61628) 2020-08-27 12:57:12 +01:00
David Kyle 25e811ced7
Rewrite Inference yml tests for better clean up (#61180) (#61555)
Inference processors asynchronously usage write stats to the .ml-stats index after they used. 
In tests the write can leak into the next test causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test
2020-08-27 11:16:26 +01:00
David Turner f6055dc9b2 Suppress noisy SSL exceptions (#61359)
If a TLS-protected connection closes unexpectedly then today we often
emit a `WARN` log, typically one of the following:

    io.netty.handler.codec.DecoderException: javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)

    io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received close_notify during handshake

We typically only report unexpectedly-closed connections at `DEBUG`
level, but these two messages don't follow that rule and generate a lot
of noise as a result. This commit adjusts the logging to report these
two exceptions at `DEBUG` level only.
2020-08-27 10:59:39 +01:00
David Turner b866aaf81c Use int for number of parts in blob store (#61618)
Today we use `long` to represent the number of parts of a blob. There's
no need for this extra range, it forces us to do some casting elsewhere,
and indeed when snapshotting we iterate over the parts using an `int`
which would be an infinite loop in case of overflow anyway:

    for (int i = 0; i < fileInfo.numberOfParts(); i++) {

This commit changes the representation of the number of parts of a blob
to an `int`.
2020-08-27 10:54:03 +01:00
Ioannis Kakavas aac9eb6b64
Kerberos doc kibana link (#61466) (#61619)
Add a note in Kerberos documentation that Kibana requires a
configuration change too, and link to that documentation page.
2020-08-27 12:42:52 +03:00
Ioannis Kakavas 3640ff1ff2
Add SAML AuthN request signing tests (#61582)
- Add a unit test for our signing code
- Change SAML IT to use signed authentication requests for Shibboleth to consume

Backport of #48444
2020-08-27 10:41:56 +03:00
David Turner 5df74cc888 Replace Math.toIntExact with toIntBytes (#61604)
We convert longs to ints using `Math.toIntExact` in places where we're
sure there will be no overflow, but this doesn't explain the intent of
these conversions very well. This commit introduces a dedicated method
for these conversions, and adds an assertion that we never overflow.
2020-08-27 08:28:54 +01:00
David Turner e14d9c9514
Introduce cache index for searchable snapshots (#61595)
If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.

This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today(*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.

(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.

This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.

We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.

Backport of #60522

Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
2020-08-27 06:38:32 +01:00
Dimitris Athanasiou 3ed65eb418
[7.x][ML] Recover data frame extraction search from latest sort key (#61544) (#61572)
If a search failure occurs during data frame extraction we catch
the error and retry once. However, we retry another search that is
identical to the first one. This means we will re-fetch any docs
that were already processed. This may result either to training
a model using duplicate data or in the case of outlier detection to
an error message that the process received more records than it
expected.

This commit fixes this issue by tracking the latest doc's sort key
and then using that in a range query in case we restart the search
due to a failure.

Backport of #61544

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-08-26 17:54:00 +03:00
Benjamin Trent a6e7a3d65f
[7.x] [ML] write warning if configured memory limit is too low for analytics job (#61505) (#61528)
Backports the following commits to 7.x:

[ML] write warning if configured memory limit is too low for analytics job (#61505)

Having `_start` fail when the configured memory limit is too low can be frustrating. 

We should instead warn the user that their job might not run properly if their configured limit is too low. 

It might be that our estimate is too high, and their configured limit works just fine.
2020-08-26 10:35:38 -04:00
Przemyslaw Gomulka 9f566644af
Do not create two loggers for DeprecationLogger backport(#58435) (#61530)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.

depends on #61515
backports #58435
2020-08-26 16:04:02 +02:00
Ioannis Kakavas 283eaabc71
[7.x] Refactor SamlAuthenticationIT (#57162) (#61568)
Refactor the tests to not require a mock HTTP Server. This has been
the cause of flakiness and removing it doesn't affect the logical
coverage of this suite. The "fake UI" is now simulated by an
http client that makes the necessary requests to Elasticsearch APIs.
2020-08-26 15:34:56 +03:00
Przemysław Witek 11c2710e7f
[7.x] [ML] Do not mark the DFA job as FAILED when a failure occurs after the node is shutdown (#61331) (#61526) 2020-08-26 09:53:13 +02:00
Igor Motov f70a59971a
[7.x] Add rate aggregation (#61369) (#61554)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 17:39:00 -04:00
markharwood 8b56441d2b
Search - add case insensitive support for regex queries. (#59441) (#61532)
Backport to add case insensitive support for regex queries. 
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.

Closes #59235
2020-08-25 17:18:59 +01:00
Przemyslaw Gomulka f3f7d25316
Header warning logging refactoring backport(#55941) (#61515)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
backports #55941
2020-08-25 16:35:54 +02:00
Costin Leau bff3c7470e
EQL: Replace SearchHit in response with Event (#61428) (#61522)
The building block of the eql response is currently the SearchHit. This
is a problem since it is tied to an actual search, and thus has scoring,
highlighting, shard information and a lot of other things that are not
relevant for EQL.
This becomes a problem when doing sequence queries since the response is
not generated from one search query and thus there are no SearchHits to
speak of.
Emulating one is not just conceptually incorrect but also problematic
since most of the data is missed or made-up.

As such this PR introduces a simple class, Event, that maps nicely to
the terminology while hiding the ES internals (the use of SearchHit or
GetResult/GetResponse depending on the API used).

Fix #59764
Fix #59779

Co-authored-by: Igor Motov <igor@motovs.org>
(cherry picked from commit 997376fbe6ef2894038968842f5e0635731ede65)
2020-08-25 17:32:42 +03:00
Armin Braun f22ddf822e
Some Optimizations around BytesArray (#61183) (#61511)
* Faster `equals` for `BytesArray` which is nice since with this change we use it for the search cache
* Lighter `StreamInput` for `BytesArray` that should save memory and some indirection relative to the one on the abstract bytes reference
* Lighter `writeTo` implementation
* Build a `BytesArray` instead of a PagedBytesReference whenever possible to save indirection and memory
2020-08-25 07:13:39 +02:00
Armin Braun 806dfcfcf7
Speed up Compression Logic by Pooling Resources (#61358) (#61495)
This is mostly motivated by the performance issues we are seeing around the GET mappings
REST API which (in case of a large number of indices) will create decompressing streams in a hot loop
which takes a significant amount of time for the system calls involved in instantiating deflaters
and inflaters.
Also, this fixes a leaked deflater when deserializing cached repository data.
2020-08-25 04:01:55 +02:00
David Kyle 539cf914bc
[ML] handle new model metadata stream from native process (#59725) (#61251)
This adds the serialization handling for the new model_metadata object from the native process.

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-08-24 15:52:13 -04:00
James Rodewig 2b852388c5
[DOCS] Fix hyphenation for "time series" (#61472) (#61481) 2020-08-24 11:18:07 -04:00
Dimitris Athanasiou 618dd65d5f
[7.x][ML] Add debug logging for field caps request during DF Analytics (#61459) (#61478)
Adds debug logging for the request and the response that is getting
field capabilities during a data frame analytics job.

Backport of #61459
2020-08-24 18:01:30 +03:00
Dimitris Athanasiou 18ca8a6be3
[7.x][ML] Remove redundant logging for creation of annotations index (#61461) (#61475)
This commit removes the log info message "Created ML annotations index and aliases".

The message comes in addition to elasticsearch's index creation logging and it does
not add to it. In addition, since #61107 that message may be logged multiple times.

Backport of #61461
2020-08-24 17:46:29 +03:00
Yang Wang f0615113b6
Report anonymous roles in authenticate response (#61355) (#61454)
Report anonymous roles in response to "GET _security/_authenticate" API call when:
* Anonymous role is enabled
* User is not the anonymous user
* Credentials is not an API Key
2020-08-24 14:51:44 +10:00
Yang Wang 0509465a9e
Warn about unlicensed realms if no auth token can be extracted (#61402) (#61419)
There are warnings about unlicense realms when user lookup fails. This PR adds
similar warnings for when no authentication token can be extracted from the request.
2020-08-22 00:04:45 +10:00
Yang Wang cd52233b94
Include authentication type for the authenticate response (#61247) (#61411)
Add a new "authentication_type" field to the response of "GET _security/_authenticate".
2020-08-21 22:59:43 +10:00
Lloyd cb83e7011c
[Backport][API keys] Add full_name and email to API key doc and use them to populate authing User (#61354) (#61403)
The API key document currently doesn't include the user's full_name or email attributes,
and as a result, when those attributes return `null` when hitting `GET`ing  `/_security/_authenticate`,
and in the SAML response from the [IdP Plugin](https://github.com/elastic/elasticsearch/pull/54046).

This changeset adds those fields to the document and extracts them to fill in the User when
authenticating. They're effectively going to be a snapshot of the User from when the key was
created, but this is in line with roles and metadata as well.

Signed-off-by: lloydmeta <lloydmeta@gmail.com>
2020-08-21 18:32:19 +09:00
Julie Tibshirani 997c73ec17
Correct how field retrieval handles multifields and copy_to. (#61391)
Before when a value was copied to a field through a parent field or `copy_to`,
we parsed it using the `FieldMapper` from the source field. Instead we should
parse it using the target `FieldMapper`. This ensures that we apply the
appropriate mapping type and options to the copied value.

To implement the fix cleanly, this PR refactors the value parsing strategy. Now
instead of looking up values directly, field mappers produce a helper object
`ValueFetcher`. The value fetchers are responsible for almost all aspects of
fetching, including looking up the right paths in the _source.

The PR is fairly big but each commit can be reviewed individually.

Fixes #61033.
2020-08-20 15:53:35 -07:00
Alan Woodward a3a0c63ccf
Convert NumberFieldMapper to parametrized form (#61092) (#61376)
In addition, this commit converts ScaledFloatFieldMapper as it was relying
on a number of static values taken from NumberFieldMapper that had changed
or been removed.
2020-08-20 16:43:26 +01:00
Nik Everett 9789e6d154
Migrate some field mapper tests to ESTestCase (#61301) (#61346)
This switches a few tests for field mappers from `ESSingleNodeTestCase`
to `ESTestCase` because, in general, we prefer to avoid
`ESSingleNodeTestCase` when we can because it is slow and "big". "Big"
here means that it pulls in an entire node, making it difficult to
reason about what you are testing.
2020-08-19 15:43:49 -04:00
Francisco Fernández Castaño 89a7f32100
Fix SearchableSnapshotDirectoryTests#testRecoveryStateIsKeptOpenAfterPreWarmFailure (#61343)
The test didn't take into account the case where 0 documents are
indexed into the shard, meaning that files aren't loaded during
the pre-warm phase. The test injects FileSystem failures, if
the snapshot doesn't contain any files, pre-warm doesn't read
any files and the recovery completes normally.

Closes #61295
Backport of #61317
2020-08-19 19:28:47 +02:00
Andrei Stefan a214d7902a
EQL: make endsWith function use a wildcard ES query wherever possible (#61160) (#61320)
(cherry picked from commit 55fdb7e2c74d4fae86ec40686091ecba831caeaf)
2020-08-19 14:17:55 +03:00
Andrei Stefan a6c0670a14
EQL: make stringContains function use a wildcard ES query (#61189) (#61313)
(cherry picked from commit 039a7d1c68f6f1ed0e7e6cfb86be6b04eec8051c)
2020-08-19 12:40:48 +03:00
Martijn van Groningen d4a8172f8e
Disable ilm history in data streams rest qa module. (#61312)
Backport of #61291 to 7.x branch.

Closes #61273
2020-08-19 10:34:26 +02:00
Andrei Stefan 93abbb9057
Add data streams wildcard pattern yml test (#61269) (#61280)
(cherry picked from commit e13a365eeb6d8c6a7c9a91f94f0e8e78e3fe4773)
2020-08-18 19:38:07 +03:00
Andrei Stefan 5de0f19cc3
EQL: Return sequence join keys in the original type (#61268) (#61282)
(cherry picked from commit d54957d61faa0d502387656e3cace594017b6ea0)
2020-08-18 19:37:15 +03:00
Martijn van Groningen cbf60f6c5e
Add tests that simulate new indexing strategy upgrade procedure. (#61263)
Backport of #61082 to 7.x branch.

Closes #58251
2020-08-18 17:02:29 +02:00
Andrei Stefan ad627c7eab
Introduce ordering in the constant_keyword test for better predictibility. (#61248) (#61252)
(cherry picked from commit 69193f9de8178dbaa1d8467f1686b100dd2b161c)
2020-08-18 12:17:15 +03:00
Mark Tozzi db1df6cc30
[7.x] Remove a bunch of type boilerplate from Aggs (#60852) (#61031) 2020-08-17 12:13:05 -04:00
James Rodewig 60876a0e32
[DOCS] Replace Wikipedia links with attribute (#61171) (#61209) 2020-08-17 11:27:04 -04:00
Andrei Stefan db8788e5a2
QL: wildcard field type support (#58062) (#61205)
(cherry picked from commit c874e6cdd3e051ce599b50c18642de038b84105f)
2020-08-17 18:24:32 +03:00
Andrei Stefan 90e116738e
QL: add filtering query dsl support to IndexResolver (#60514) (#61200)
(cherry picked from commit 7b3635d796be26af9f87d19963a8ed4ab4bbf13f)
2020-08-17 17:59:58 +03:00
Nik Everett 1b7bbafd81
Add method to make random DateFormatter pattern (backport of #60613) (#61213)
Adds a method to make a random date `DateFormatter` pattern. We expect
this'll be useful for runtime fields to compate their formatting with
the standard date field.
2020-08-17 10:57:52 -04:00
David Kyle ba89af544f
[7.x] Respect ML upgrade mode in TrainedModelStatsService (#61143) (#61187)
When in upgrade mode the ml stats service should not write to the stats index.
2020-08-17 11:09:25 +01:00
Benjamin Trent 43fc6c34bc
Muting analytics integration tests for change new native output model_metadata (#61158)
relates to elastic/ml-cpp#1456
2020-08-14 11:45:35 -04:00
Benjamin Trent 8f302282f4
[ML] adds new feature_processors field for data frame analytics (#60528) (#61148)
feature_processors allow users to create custom features from
individual document fields.

These `feature_processors` are the same object as the trained model's pre_processors.

They are passed to the native process and the native process then appends them to the
pre_processor array in the inference model.

closes https://github.com/elastic/elasticsearch/issues/59327
2020-08-14 10:32:20 -04:00
David Roberts d1b60269f4
[ML] Ensure annotations index mappings are up to date (#61142)
When the ML annotations index was first added, only the
ML UI wrote to it, so the code to create it was designed
with this in mind.  Now the ML backend also creates
annotations, and those mappings can change between
versions.

In this change:

1. The code that runs on the master node to create the
   annotations index if it doesn't exist but another ML
   index does also now ensures the mappings are up-to-date.
   This is good enough for the ML UI's use of the
   annotations index, because the upgrade order rules say
   that the whole Elasticsearch cluster must be upgraded
   prior to Kibana, so the master node should be on the
   newer version before Kibana tries to write an
   annotation with the new fields.
2. We now also check whether the annotations index exists
   with the correct mappings before starting an autodetect
   process on a node.  This is necessary because ML nodes
   can be upgraded before the master node, so could write
   an annotation with the new fields before the master node
   knows about the new fields.

Backport of #61107
2020-08-14 13:51:04 +01:00
Benjamin Trent 7c3bfb9437
[ML] updating feature_importance results mapping (#61104) (#61144)
This updates the feature_importance mapping change from elastic/ml-cpp#1387
2020-08-14 08:43:10 -04:00
Nhat Nguyen 328c86a4ec Increase timeout in PrimaryFollowerAllocationIT
A slow CI can take more than 10 seconds to relocate shards on the follower.
2020-08-13 14:41:32 -04:00
Dan Hermann c17839c255
Add warning handler to resolve test failure (#60427) (#61102) 2020-08-13 10:37:08 -05:00
Benjamin Trent a497263c47
[ML] ensure config index is updated before clearing finished_time (#61064) (#61085)
When a user upgrades between versions, they may stop their ML jobs.

Then when the upgrade is complete, they will want to open the jobs again.

But, when opening a job, we attempt to clear out the jobs finished_time. If the job configuration has adjusted between the versions (i.e. added a new field), it will dynamically update the .ml-config index.

We should instead manually change the mapping to be the updated version.
2020-08-13 08:12:10 -04:00
David Turner dd7410d8c2 Disable rebalancing in searchable snapshots tests (#61068)
Fixes a test failure in which we allocated some shards and then
relocated them elsewhere, invalidating an assertion about the recovery
statistics which assumed that the shards stayed where they were
originally allocated.

Closes #61067.
2020-08-13 09:08:27 +01:00
Lee Hinman e3df64a429
[7.x] Add data tiers (hot, warm, cold, frozen) as custom node roles (#60994) (#61045)
This commit adds the `data_hot`, `data_warm`, `data_cold`, and `data_frozen` node roles to the
x-pack plugin. These roles are intended to be the base for the formalization of data tiers in
Elasticsearch.

These roles all act as data nodes (meaning shards can be allocated to them). Nodes with the existing
`data` role acts as though they have all of the roles configured (it is a hot, warm, cold, and
frozen node).

This also includes a custom `AllocationDecider` that allows the user to configure the following
settings on a cluster level:
- `cluster.routing.allocation.require._tier`
- `cluster.routing.allocation.include._tier`
- `cluster.routing.allocation.exclude._tier`

And in index settings:
- `index.routing.allocation.require._tier`
- `index.routing.allocation.include._tier`
- `index.routing.allocation.exclude._tier`

Relates to #60848
2020-08-12 11:06:23 -06:00
Andrei Dan 32173a82c8
ILM: add frozen phase (#60983) (#61035)
This adds a frozen phase to ILM that will allow the execution of the
set_priority, unfollow, allocate, freeze and searchable_snapshot actions.

The frozen phase will be executed after the cold and before the delete phase.

(cherry picked from commit 6d0148001c3481290ed7e60dab588e0191346864)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-12 16:36:27 +01:00
Yannick Welsch 6644f2283d Do not access snapshot repo on dedicated voting-only master node (#61016)
Today a snapshot repository verification ensures that all master-eligible and data nodes have write access to the
snapshot repository (and can see each other's data) since taking a snapshot requires data nodes and the currently
elected master to write to the repository. However, a dedicated voting-only master-eligible node is not a data node and
will never be the elected master so we should not require it to have write access to the repository.

Closes #59649
2020-08-12 16:56:45 +02:00
Benjamin Trent 4275a715c9
[ML] adjusting inference processor to support foreach usage (#60915) (#61022)
`foreach` processors store information within the `_ingest` metadata object.

This commit adds the contents of the `_ingest` metadata (if it is not empty).

And will append new inference results if the result field already exists.

This allows a `foreach` to execute and multiple inference results being written to the same result field.

closes https://github.com/elastic/elasticsearch/issues/60867
2020-08-12 08:34:18 -04:00
markharwood 66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
Armin Braun 32423a486d
Simplify and Speed up some Compression Usage (#60953) (#61008)
Use thread-local buffers and deflater and inflater instances to speed up
compressing and decompressing from in-memory bytes.
Not manually invoking `end()` on these should be safe since their off-heap memory
will eventually be reclaimed by the finalizer thread which should not be an issue for thread-locals
that are not instantiated at a high frequency.
This significantly reduces the amount of byte copying and object creation relative to the previous approach
which had to create a fresh temporary buffer (that was then resized multiple times during operations), copied
bytes out of that buffer to a freshly allocated `byte[]`, used 4k stream buffers needlessly when working with
bytes that are already in arrays (`writeTo` handles efficient writing to the compression logic now) etc.

Relates #57284 which should be helped by this change to some degree.
Also, I expect this change to speed up mapping/template updates a little as those make heavy use of these
code paths.
2020-08-12 11:06:23 +02:00
Andrei Dan 35423a75af
Tests: don't fail if ILM executed the action already (#60916) (#60982)
(cherry picked from commit 8c970ad20f4f55a9c0d6a256aa643ea037281e75)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-12 09:04:04 +01:00
Dimitris Athanasiou 2e18c0f2ac
[7.x][ML] Audit force stopping data frame analytics (#60973) (#61004)
Audits a message when a data frame analytics job is force stopped.

Backport of #60973
2020-08-12 07:45:26 +03:00
Yang Wang c7b0290256
Mute kerberos tests for jdk 8u[262,271) (#60995)
The Kerberos bug (JDK-8246193) is introduced in JDK 8u262 and fixed in 8u271.
This PR mute for any possible releases between these two versions.
2020-08-12 11:50:48 +10:00
Nhat Nguyen ceaa28e97b Increase timeout in testFollowIndexWithConcurrentMappingChanges (#60534)
The test failed because the leader was taking a lot of CPUs to process
many mapping updates. This commit reduces the mapping updates, increases
timeout, and adds more debug info.

Closes #59832
2020-08-11 17:03:22 -04:00
Nhat Nguyen bf7eecf1dc Fix synchronization in ShardFollowNodeTask (#60490)
The leader mapping, settings, and aliases versions in a shard follow-task 
are updated without proper synchronization and can go backward.
2020-08-11 14:52:52 -04:00
James Rodewig 929f1cc9f9
[DOCS] Remove search request body page (#60972) (#60977) 2020-08-11 13:04:07 -04:00
Francisco Fernández Castaño d544528c7b
Increase information on assertRecoveryStats assertion (#60960)
Backport of #60952
2020-08-11 15:30:59 +02:00
Dimitris Athanasiou 6062672148
[7.x][ML] Monitor reindex response in DF analytics (#60911) (#60958)
Examines the reindex response in order to report potential
problems that occurred during the reindexing phase of
data frame analytics jobs.

Backport of #60911
2020-08-11 16:17:37 +03:00
Mark Tozzi ab8518fb5b
[7.x] Extensibility for Composite Agg #59648 (#60842) 2020-08-11 09:14:33 -04:00
Dan Hermann 839c6cdfc0
Un-mute data stream REST test (#60120) (#60939) 2020-08-11 08:10:04 -05:00
David Kyle 18a65c5b9a
DFA Get Stats can return multiple responses if more than one error occurs (#60950)
If the search for get stats with multiple job Ids fails the listener is called for each failure. 
This change waits for all responses then returns the first error if there was one.
2020-08-11 10:28:05 +01:00
Alan Woodward 54279212cf
Make MetadataFieldMapper extend ParametrizedFieldMapper (#59847) (#60924)
This commit cuts over all metadata field mappers to parametrized format.
2020-08-11 09:02:28 +01:00
Benjamin Trent 66b3e89482
[ML] enable logging for test failures (#60902) (#60910) 2020-08-10 12:36:30 -04:00
Francisco Fernández Castaño 2a4fd8329b
Avoid a race condition while waiting for pre warm to finish on SearchableSnapshotDirectoryTests (#60906)
Backport of #60885. Closes #60813
2020-08-10 17:29:16 +02:00
Jim Ferenczi f30f1f04e2
Replace AggregatorTestCase#search with AggregatorTestCase#searchAndReduce (#60816)
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
2020-08-10 17:23:00 +02:00
David Roberts dd02e9f31a [TEST] Mute SearchableSnapshotActionIT testSearchableSnapshotForceMergesIndexToOneSegment (#60904)
Due to https://github.com/elastic/elasticsearch/issues/60901
2020-08-10 15:25:39 +01:00
Henning Andersen a155315ceb
Autoscaling decider and decision service (#59005) (#60884)
Split the autoscaling decider into a service and configuration
in order to enable having additional context information available
in the service. Added AutoscalingDeciderContext holding generic
information all deciders are expected to need. Implemented GET
_autoscaling/decision
2020-08-10 15:28:52 +02:00
Andrei Dan 235e5ed3ea
[7.x] ILM: add force-merge step to searchable snapshots action (#60819) (#60882)
This adds a force-merge step to the searchable snapshot action, enabled by default,
but parameterizable using the `force_merge-index" optional boolean.

eg.
```
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "cold": {
        "actions": {
          "searchable_snapshot" : {
            "snapshot_repository" : "backing_repo",
            "force_merge_index": true
          }
        }
      }
    }
  }
}
```

(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-10 13:45:11 +01:00
Martijn van Groningen 64bb082f9b
Improve error message for non append-only writes that target data stream (#60874)
Backport of #60809 to 7.x branch.

Closes #60581
2020-08-10 13:18:59 +02:00
David Kyle 6b2ddf4453
Fix typo in DataHistogramGroupByIT name (#60880) (#60883) 2020-08-10 11:55:01 +01:00
David Turner f168bdac7d Change transitive -> transient in ILM log message (#60871)
"Transitive" is technically ok here but it's an overloaded word and it's
not immediately clear which meaning is intended so this log message
always makes me do a double-take. I think both "transient" and
"transitory" are clearer, with "transient" being the usual choice.
2020-08-10 11:37:49 +01:00
David Turner a2d5bfca2f Even longer timeout for XPackRestIT (#60812)
This suite is still occasionally failing with a timeout on macOS.
Suggest further increasing this timeout until this suite is broken up.

Relates #58071
2020-08-10 10:26:21 +01:00
James Rodewig ff4ea4720a
[DOCS] Update example data stream names (#60783) (#60820)
Uses `my-data-stream` in place of `logs` for data stream examples.
This provides a more intuitive experience for users that copy/paste
their own values into snippets.
2020-08-06 09:38:35 -04:00
Benjamin Trent bc17afc535
[7.x] [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776) (#60784)
* [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776)

When deleting an analytics configuration, the request MIGHT fail if
the .ml-stats index does not exist or is in strange state (shards unallocated).

Instead of making the request fail, we should log that we were unable to delete the stats docs and then
have them cleaned up in the 'delete_expire_data' janitorial process
2020-08-06 08:55:35 -04:00
David Turner 05b2a2db8b AwaitsFix for #60781 2020-08-06 12:28:53 +01:00
David Turner f24a3a4e81 AwaitsFix for 60781 2020-08-06 11:35:44 +01:00
Hendrik Muhs b210aaf666 [Transform] remove wrong test (#60807)
remove test, scripts are excluded in the change collector, the test is a leftover from a previous
solution of #57332, which has been discarded

relates #60724
fixes #60794
2020-08-06 11:56:19 +02:00
Dimitris Athanasiou cedbe6968b
[7.x][ML] Include cause in logging during test inference (#60749) (#60805)
When an exception is thrown during test inference we are
not including the cause message in our logging. This commit
addresses this issue.

Backport of #60749
2020-08-06 11:45:59 +03:00
Ryan Ernst d88098c1d5
Mute flaky transform pivot test
see https://github.com/elastic/elasticsearch/issues/60794
2020-08-05 14:53:25 -07:00
James Rodewig 029869eb35
[DOCS] Fix metadata field refs (#60764) (#60769) 2020-08-05 14:04:55 -04:00
Francisco Fernández Castaño b4044004aa
Add recovery state tracking for Searchable Snapshots (#60751)
This pull request adds recovery state tracking for Searchable Snapshots.

In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.

Those differences can be summarized as follows:

-  The Directory implementation that's provided by SearchableSnapshots mark the
    snapshot files as reused during recovery. In order to keep track of the
    recovery process as the cache is pre-warmed, those files shouldn't be marked
    as reused.
 - Once the shard is created, the cache starts its pre-warming phase, meaning that
    we should keep track of those downloads during that process and tie the recovery
    to this pre-warming phase. The shard is considered recovered once this pre-warming
    phase has finished.

Backport of #60505
2020-08-05 17:41:49 +02:00
Hendrik Muhs 08f94c914b [Transform] disable optimizations when using scripts in group_by (#60724)
disable optimizations when using scripts in group_by, when scripts using scripts we can not predict
the outcome and we have no query counterpart. Other optimizations for other group_by's are not
affected.

fixes #57332
2020-08-05 17:27:19 +02:00
Hendrik Muhs 2b6891b584
[7.x][Transform] implement test suite to test continuous transforms (#60725)
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:

 - continuous mode with several checkpoints created
 - correctness of results
 - optimizations (minimal necessary writes)
 - permutations of features (index settings, aggs, data types, index or data stream)
2020-08-05 16:56:01 +02:00
Albert Zaharovits e5dce5e805
Use the Index Access Control from the scroll search context (#60640)
When the RBACEngine authorizes scroll searches it sets the index access control
to the very limiting IndicesAccessControl.ALLOW_NO_INDICES value.
This change will set it to the value for the index access control that was produced
during the authorization of the initial search that created the scroll,
which is now stored in the scroll context.
2020-08-05 15:37:37 +03:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Adrien Grand 67f6f34c23
Remove dataset.* fields. (#60720)
These are being replaced by the `data_stream.*` fields.
2020-08-05 11:35:05 +02:00
Rory Hunter 43762f69d1
Move deprecation HTTP tests to deprecation plugin (#60523)
Backport of #60298.

This PR moves the deprecation HTTP tests under the deprecation plugin, as a precursor to
adding further tests as part of #58924.
2020-08-05 09:54:34 +01:00
Adrien Grand 602d269059
Rename `datastream` to `data_stream`. (#60714)
The name of the feature having a space: "data stream", the key should
have an underscore.
2020-08-05 09:55:02 +02:00
Russ Cam e9c0bf1566 Remove body from indices.create_data_stream REST spec (#60705)
This commit removes the body property from the
indices.create_data_stream.json REST API spec
as the API does not support sending a body.

Update the description of the API to remove
that a data stream can be updated with the
API - data streams can only be created with
this API and attempting to update yields a
`resource_already_exists_exception`.

Closes #60704

(cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)
2020-08-05 17:01:28 +10:00
Igor Motov 959690a64a
Refactor extendedBounds to use DoubleBounds (#60556) (#60681)
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
2020-08-04 16:45:47 -04:00
Francisco Fernández Castaño b500b3d55a
Decrease restore rate limit value to enforce its usage on SearchableSnapshotsIntegTests#testMaxRestoreBytesPerSecIsUsed (#60650)
Fixes #59287. Backport of #59592
2020-08-04 17:44:47 +02:00
Alan Woodward b3ae5d26bd
Move mapper validation to the mappers themselves (#60072) (#60649)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.

This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
2020-08-04 14:39:20 +01:00
Rene Groeschke bdd7347bbf
Merge test runner task into RestIntegTest (7.x backport) (#60600)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
2020-08-04 14:46:32 +02:00
Adrien Grand 20ae1b75bd
Rename dataset to datastream (#60638)
Co-authored-by: ruflin <spam@ruflin.com>
2020-08-04 09:58:54 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Tim Vernum c58e32bb27
Improve assertion failure when error is not empty (#60572)
This commit changes TokenAuthIntegTests so all occurrences of

    assertThat(x.size(), equalTo(0));

become

    assertThat(x, empty());

This means that the assertion failure message will include the
contents of the list (`x`) instead of just its size, which
facilitates easier failure diagnosis.

Relates: #56903
Backport of: #60496
2020-08-04 11:05:18 +10:00
Jake Landis bcb9d06bb6
[7.x] Cleanup xpack build.gradle (#60554) (#60603)
This commit does three things:
* Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license)
* Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack
* Removes a place holder test in favor of disabling the test task (in the async plugin)
2020-08-03 13:11:43 -05:00
Hendrik Muhs 1e01832b0c fix possible NPE introduced in #60591 2020-08-03 16:40:38 +02:00
Hendrik Muhs cd6492fc11 [Transform] fix regression of date histogram optimization (#60591)
fixes mix up of input and output field name for date histogram optimization.

minimal fix, more tests to be added with #60469

fixes #60590
2020-08-03 15:52:08 +02:00
Yannick Welsch b0d601fa63 Adjust searchable snapshot license (#60578)
No longer needs Platinum license for testing on staging.
2020-08-03 13:19:53 +02:00
Yannick Welsch 9e24a54382 Clean existing index folder when loading searchable snapshot (#60122)
Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index
folders of those preexisting shards.

This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will
make reuse of the existing index files to populate the cache.
2020-08-03 13:19:11 +02:00
Yang Wang a76fc324d4
Fix get-license test failure by ensure cluster is ready (#60498) (#60569)
When a new cluster starts, the HTTP layer becomes ready to accept incoming 
requests while the basic license is still being populated in the background. 
When a get license request comes in before the license is ready, it can get 
404 error. This PR fixes it by either wrap the license check in assertBusy or 
ensure the license is ready before perform the check.

This is a backport for both #60498 and #60573
2020-08-03 19:40:03 +10:00
Tim Vernum 1a373b0c21
Only call listener once (SP template registration) (#60567)
This fixes a bug in the IdP's template registration that would
sometimes call the listener twice.

Resolves: #54285
Resolves: #54423

Backport of: #60497
2020-08-03 13:45:16 +10:00
Andrei Dan ac258f10d6
Data streams: throw ResourceAlreadyExists exception (#60518) (#60536)
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.

(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-01 16:31:09 +01:00
Julie Tibshirani f1d4fd8c3e Correct name of IndexFieldData#loadGlobalDirect. (#60492)
It seems 'localGlobalDirect' was just a typo.
2020-07-31 10:53:21 -07:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Hendrik Muhs a721d6d19b [Transform] use correct version in BWC serialization test (#60500)
use correct version in BWC serialization test

fixes #60464
2020-07-31 11:23:05 +02:00
Julie Tibshirani 8ac81a3447 Remove IndexFieldData#clear since it is unused. (#60475)
This method was never called. It also seemed tricky that calling a method on
`IndexFieldData` could clear the contents of a shared cache.
2020-07-30 14:07:55 -07:00
Lisa Cawley 9425de8dd1
[DOCS] Clarify support for run as in OIDC realms (#60246) (#60485) 2020-07-30 13:39:41 -07:00
Mark Tozzi 970a0c8957
[7.x] Aggregation tests for Wildcard Field (#58507) (#60423) 2020-07-30 08:56:21 -04:00
Przemysław Witek 9e27f7474c
Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121) (#60439) 2020-07-30 09:53:11 +02:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Bogdan Pintea 8c22adc447
SQL: Add option to provide the delimiter for the CSV format (#59907) (#60420)
* SQL: Add option to provide the delimiter for the CSV format (#59907)

* Add option to provide the delimiter to the CSV fmt

This adds the option to provide the desired character as the separator
for the CSV format (the default remains comma).
A set of characters are excluded though - like CR, LF, `"` - to avoid
slipping onto the CSV-dialects slope. The tab is also forbidden, the
user needs to choose the "tsv" format explicitely.

Update the doc to make it clear that the textual CSV, TSV and TXT
formats pass the cursor back to the user through the Cursor HTTP header.

(cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc)

* Java8 fixes

- replace Set#of();
- URLDecoder#decode() requires a string (vs a charset) as 2nd arg.
2020-07-29 21:40:11 +02:00
Bogdan Pintea 30610d962a
Fix SYS COLUMNS schema in ODBC mode (#59513) (#60418)
* Fix SYS COLUMNS schema in ODBC mode (#59513)

* Fix SYS COLUMNS schema in ODBC mode

This fixes a regression when certain ODBC-specific columns that need to
be of the short type were returned as the integer type.

This also fixes the stubbing for the *-indices SYS COLUMN commands.

(cherry picked from commit 96d89dc9b1fd731e736ef804a16bd05496c1dea6)

* Java8 fix: avoid diamond notation in test.

Qualify anonymous class in test.
2020-07-29 21:19:32 +02:00
Bogdan Pintea 4c771485f6
SQL: fix NPE on ambiguous GROUP BY (#59370) (#60416)
* fix npe on ambiguous group by

* add tests for aggregates and group by, add quotes to error message

* add more cases for Group By ambiguity test

* change error messages for field ambiguity

* change collection aliases approach

* add locations of attributes for ambiguous grouping error

* Adress review comments

- remove Comparable implementations from Attribute and Location;
- add ad-hoc comparator for sorting locations in ambiguity message;
- remove added AttributeAlias class with Touple;
- add code comment to explain issue with Location overwriting.

* Fix c&p error in location ref generation comparator

Fix copy&paste error in dedicated comparator used for sorting ambiguity
location references.
Slightly increase its readability.

Co-authored-by: Nikita Verkhovin <verkhovin13@gmail.com>
(cherry picked from commit 9ba70a3483f0f4987229bec231cdc004f51b88a5)
2020-07-29 20:44:28 +02:00
Bogdan Pintea 79ef263fc2
Add test with alias reuse and grouping (#60396) (#60421)
Add test with alias reuse and grouping.

(cherry picked from commit 37ee819eb98fd10c1b16a61e4e1d446d0ee859de)
2020-07-29 20:43:04 +02:00
Mark Vieira 39fa1c4df0
Add compatibility testing for JDBC driver (#60409)
This commit adds compatibility testing of our JDBC driver against
different Elasticsearch versions. Although we are really testing the
forwards compatibility nature of the JDBC driver we model the testing
the same as we do existing BWC tests, that is, with the current branch
fetching the earlier versions of the artifact that is to be tested. In
this case, that's the JDBC driver itself.

Because the tests include the JDBC driver jar on it's classpath we had
to change the packaging of the driver jar in order to avoid jarhell and
other conflicting dependency issues when using an old JDBC driver with
later branches. For this we simply relocate all driver dependencies in
the shadow jar under a "shadowed" package. This allows the JDBC driver
to use the correct version of Elasticsearch libs classes, while the
tests themselves use their versions. Since this required a change to the
driver jar compatibility testing can only go back as far as that version
which at the time of this commit is 7.8.1.
2020-07-29 10:45:11 -07:00
David Roberts 2a0116f51b
[ML] Take more care that memory estimation uses unique named pipes (#60405)
Prior to this change ML memory estimation processes for a
given job would always use the same named pipe names.  This
would often cause one of the processes to fail.

This change avoids this risk by adding an incrementing counter
value into the named pipe names used for memory estimation
processes.

Backport of #60395
2020-07-29 17:29:55 +01:00
Armin Braun bfee7b91ff
Increase Timeouts in SLMBlockingIntegTests (#60356) (#60403)
The retention run goes through a number of steps and can randomly take more than 10s.
=> increased timeout to 30s like we did in other spots in this test

Also, noticed that we had a hard wait of 10s in this test, removed it and adjusted following
busy assert in a way that can deal with a missing snapshot (from when the assert runs before
the snapshot was put into the CS).

Closes #60336
2020-07-29 17:34:49 +02:00
Benjamin Trent 76359aaa53
[ML] always write prediction_[score|probability] for classification inference (#60335) (#60397)
In order to unify model inference and analytics results we
need to write the same fields.

prediction_probability and prediction_score are now written
for inference calls against classification models.
2020-07-29 10:58:14 -04:00
Nhat Nguyen 9d4a64e749
Allow CCR on nodes with legacy roles only (#60093)
CCR will stop functioning if the master node is on 7.8, but data nodes 
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.

Relates #54146
Relates #59375
2020-07-29 10:57:31 -04:00
Benjamin Trent a6da1fd73e
[ML] require alias when indexing to an alias that should be created (#60315) (#60394)
This sets up all indexing to one of our write aliases to require it actually be an alias.

This allows failures scenarios to be captured quickly, loudly, and then potentially recovered.
2020-07-29 10:52:36 -04:00
Jim Ferenczi 578749a5e8 Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337)
This change ensures that the expiration time that is set in the test
is long enough to not be triggered by a slow execution.

Closes #60255
2020-07-29 09:47:47 +02:00
Hendrik Muhs 5eb04fb413 [Transform] fix performance regression introduced in #60196 (#60276)
re-work #60196, to not skip building change collectors as otherwise date histogram only
pivots would run slow

relates #60125
2020-07-29 09:44:03 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Yang Wang 3a0e7f4294
Unmute kerberos tests for jdk 15 and mute for jdk 8u262 (#60279)
The JDK bug (https://bugs.openjdk.java.net/browse/JDK-8246193) is fixed since b26.
The tests can be unmuted since we are already using b33. However the same bug is now
affecting jdk 8u262, which is the base for current Zulu jdk 8.48. This PR mute the tests
for this specific jdk version.

Relates: #56507
2020-07-29 12:57:00 +10:00
Benjamin Trent 54c8936508
[ML] do not summerize importance for custom features (#60198) (#60333)
If a feature is created via a custom pre-processor,
we should return the importance for that feature.

This means we will not return the importance for the
original document field for custom processed features.

closes https://github.com/elastic/elasticsearch/issues/59330
2020-07-28 15:58:20 -04:00
Julie Tibshirani c7bfb5de41
Add search `fields` parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
James Rodewig 8edae3cd15
[DOCS] Update pre-existing data stream refs (#60289) (#60293) 2020-07-28 13:51:43 -04:00
Nhat Nguyen 416e51980c Relax ShardFollowTasksExecutor validation (#60054)
If a primary shard of a follower index is being relocated, then we
will fail to create a follow-task. This validation is too restricted.
We should ensure that all primaries of the follower index are active
instead.

Closes #59625
2020-07-28 13:46:49 -04:00
Nhat Nguyen 6ece629ec3 Set timeout of master requests on follower to unbounded (#60070)
Today, a follow task will fail if the master node of the follower
cluster is temporarily overloaded and unable to process master node
requests (such as update mapping, setting, or alias) from a follow-task
within the default timeout. This error is transient, and follow-tasks
should not abort. We can avoid this problem by setting the timeout of
master node requests on the follower cluster to unbounded.

Closes #56891
2020-07-28 13:46:49 -04:00
Zachary Tong 9f8ec3e3fb Mute SSLDriverTests#testCloseDuringHandshakePreJDK11
Tracking issue: https://github.com/elastic/elasticsearch/issues/59992
2020-07-28 13:20:53 -04:00
Zachary Tong 46f9c38c33 Mute tests while waiting on 58807
Bugurl: https://github.com/elastic/elasticsearch/issues/58807
2020-07-28 12:45:49 -04:00
markharwood e0286e9bd3
Search - remove allow-expensive-query checks from wildcard field. (#60273) (#60308)
Removing allow-expensive-query checks because we think this field type is fast enough.

Closes #60139
2020-07-28 17:12:33 +01:00
Dimitris Athanasiou ed7dcff7c4
[7.x][ML] Audit updates on data frame analytics jobs (#60126) (#60287)
Closes #59652

Backport of #60126
2020-07-28 16:33:35 +03:00
Dimitris Athanasiou 16ffcfb9f6
[7.x][ML] Ensure bulk requests are not over memory limit (#60219) (#60283)
Data frame analytics jobs that work with very large datasets
may produce bulk requests that are over the memory limit
for indexing. This commit adds a helper class that bundles
index requests in bulk requests that steer away from the
memory limit. We then use this class both from the results
joiner and the inference runner ensuring data frame analytics
jobs do not generate bulk requests that are too large.

Note the limit was implemented in #58885.

Backport of #60219
2020-07-28 16:04:03 +03:00
Dimitris Athanasiou 981e436d6c
[7.x][ML] Improve assertion on regression alias field test (#60221) (#60264)
Previously the test was asserting the prediction on each document
was close 10.0 from the expected. It turned out that was not enough
as we occasionally saw the test failing by little.

Instead of relaxing that assertion, this commit changes it to
assert the mean prediction error is less than 10.0. This should
reduce the chances of the test failing significantly.

Fixes #60212

Backport of #60221
2020-07-28 11:48:00 +03:00
James Rodewig aba785cb6e
[DOCS] Update my-index examples (#60132) (#60248)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 15:58:26 -04:00
Dan Hermann b98caf58ee
Mark data stream APIs as stable (#59860) (#60206) 2020-07-27 10:37:52 -05:00
Benjamin Trent ea3c49979e
Test mute for issue 60212 (#60214) 2020-07-27 10:10:40 -04:00
Hendrik Muhs 95c99ca887 [Transform] Fix Regression: continuous transform can fail for (date) histogram group_by(#60196)
do not create change collector if group_by configuration does not support change detection

fixes #60125
2020-07-27 14:50:03 +02:00
Dimitris Athanasiou 439b7f7e59
[7.x][ML] DFA result processor should only skip rows and model chunks on cancel (#60113) (#60193)
When the job is force-closed or shutting down due to a fatal error we clean
up all cancellable job operations. This includes cancelling the results processor.
However, this means that we might not persist objects that are written from the
process like stats, memory usage, etc.

In hindsight, we do not gain from cancelling the results processor in its
entirety. It makes more sense to skip row results and model chunks but keep
stats and instrumentation about the job as the latter may contain useful information
to understand what happened to the job.

Backport of #60113
2020-07-27 13:42:46 +03:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementaton detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Dan Hermann 88e8f691af
Update index privileges doc to include data streams (#59139) (#60169) 2020-07-24 07:52:36 -05:00
Lisa Cawley 2665bfffce
[DOCS] Fix security links in machine learning APIs (#60098) (#60152) 2020-07-23 16:43:10 -07:00
Nhat Nguyen bc65b3a590 Increase timeout in AutoFollowIT (#60004)
It can take more than 10 seconds to auto-follow and create a follow-task
on a slow CI. This commit increases timeout in AutoFollowIT by replacing
assertBusy with assertLongBusy.

Closes #59952
2020-07-23 16:36:53 -04:00
Nhat Nguyen 0fe4d5df67 Increase timeout testFollowIndexWithConcurrentMappingChanges
Fixes #59273
2020-07-23 16:22:58 -04:00
Albert Zaharovits 2eaf5e1c25
[DOCS] Mapping updates are deprecated for ingestion privileges (#60024)
This PR contains the deprecation notice that `create`, `create_doc`, `index` and
`write` ingest privileges do not permit mapping updates in version 8. It also
updates the docs description of said privileges. 

This should've been part of #58784
2020-07-23 19:49:23 +03:00
James Rodewig 988e8c8fc6
[DOCS] Swap `[float]` for `[discrete]` (#60134)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 12:42:33 -04:00
Dimitris Athanasiou 6b9a362ec2
[7.x][ML] Skip test inference if DFA task has been stopped (#60116) (#60127)
If the job is stopped before starting inference on test data, we
should skip inference entirely.

Backport of #60116
2020-07-23 18:34:09 +03:00
Dan Hermann ca25f6ae6f
Include the resolve index action in the view_index_metadata privilege (#59785) (#60112) 2020-07-23 08:13:56 -05:00
Dan Hermann fe12217c7f
[7.x] Move REST specs for data streams (#60111) 2020-07-23 08:10:54 -05:00
Albert Zaharovits 3ad3a7d268 DOCS audit attributes for API Key authn (#60033)
This PR describes the new audit entry attributes api_key.id,
api_key.name and authentication.type, as well as the meaning of
existing attributes when authentication is performed using API keys.

This should've been part of #58928
2020-07-23 15:51:40 +03:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size  for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HFDS repositories.
2020-07-22 21:06:31 +02:00
James Rodewig 74a34777d1
[DOCS] Fix outdated Kibana UI refs and screenshots in security docs (#60023) (#60059) 2020-07-22 13:08:22 -04:00
Larry Gregory a686ccc9b2
[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854) (#60047) 2020-07-22 11:06:10 -04:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeModeListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overriden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Dimitris Athanasiou 7e652ca873
[7.x][ML] Include same fields during test inference as in training (#… (#60034)
In #58877, when we switched test inference on java, we just
use the doc's `_source` as features. However, this could be
missing out on features that were used during training,
e.g. alias fields, etc.

This commit addresses this by extracting fields to use as
features during inference the same way they are extracted
in `DataFrameDataExtractor` when they are used for training.

Backport of #59963
2020-07-22 12:54:13 +03:00
David Roberts 7358f9fb05 [ML] Mute ForecastIT.testOverflowToDisk in EAR builds (#60040)
Due to https://github.com/elastic/elasticsearch/issues/58806
2020-07-22 10:17:37 +01:00
James Baiera 1c1a4297e0
Track backing indices in data streams stats from cluster state (#59817) (#60015)
If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate 
counts of the number of backing indices, despite this data being accurate and available in the 
cluster state.
2020-07-21 23:21:33 -04:00
James Rodewig 401e12dc2b
[DOCS] Fix data stream docs (#59818) (#60010) 2020-07-21 17:04:13 -04:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
Armin Braun 5613e4b00b
Increase Timeout in testSLMRetentionAfterRestore (#59979) (#59991)
This test failed by hitting the 10s default busy assert timeout.
Given how involved the retention run is (multiple disk reads, CS updates etc.)
we should have a higher timeout here.

Also, removed the pointless delete call for the snapshot that we just asserted is gone,
 at the end of the test.

Closes #59956
2020-07-21 18:19:18 +02:00
James Rodewig 4d646ca819
[DOCS] Fix typo in LDAP config docs (#59953) (#59974)
Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
2020-07-21 10:48:08 -04:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Tal Levy c9ac4bf7c8 Reduce memory usage of GeoGridTiler tests (#59921)
This PR further reduces the memory footprint of the
testGeoHashGridCircuitBreaker test such that only
0.26% of the randomized runs result in memory usage of between
500kb-1mb. where most of that those that are in that range
produce ~650kb of usage. Before, 3% of the runs would use
> 50mb of memory resulting in OOMs in CI

Closes #59853.
2020-07-20 15:45:39 -07:00
Jay Modi 515b53d297
Fix race in SLM master/cluster state listeners (#59896)
This change fixes two possible race conditions in SLM related to
how local master changes and cluster state events are observed. When
implementing the `LocalNodeMasterListener` interface, it is only
recommended to execute on a separate threadpool if the operations are
heavy and would block the cluster state thread. SLM specified that the
listeners should run in the Snapshot thread pool, but the operations
in the listener were lightweight. This had the side effect of causing
master changes to be delayed if the Snapshot threads were all busy and
could also potentially cause the `onMaster` and `offMaster` calls to
race if both were queued and then executed concurrently. Additionally,
the `SnapshotLifecycleService` is also a `ClusterStateListener` and
there is currently no order of operations guarantee between
`LocalNodeMasterListeners` and `ClusterStateListeners` so this could
lead to incorrect behavior.

The resolution for these two issues is that the
SnapshotRetentionService now specifies the `SAME` executor for its
implementation of the `LocalNodeMasterListener` interface. The
`SnapshotLifecycleService` is no longer a `LocalNodeMasterListener` and
instead tracks local master changes in its `ClusterStateListner`.

Backport of #59801
2020-07-20 09:59:46 -06:00
Nik Everett fcd8b5fe6e
Fix top_metrics when metric is missing (backport of #59471) (#59881)
This fixes a null pointer exception when the metric is missing for the
latest document returned by `top_metrics`.

Closes #58926
2020-07-20 10:42:58 -04:00
Rene Groeschke e31ebc96f9
Enforce fail on deprecated gradle usage (7.x backport) (#59758)
* Enforce fail on deprecated gradle usage (#59598)
* Fix branch specific deprecated gradle api usages
* Fix archiveVersion property usage
2020-07-20 08:52:30 +02:00
Albert Zaharovits 3ffb20bdfc
Fix DLS/FLS permission for the submit async search action (#59693)
The submit async search action should not populate the thread context
DLS/FLS permission set, because it is not currently authorised as an "indices request"
and hence the permission set that it builds is incomplete and it overrides the
DLS/FLS permission set of the actual spawned search request (which is built correctly).
2020-07-20 09:37:26 +03:00
Costin Leau 9cc80621c3 EQL: Fix matching of tail/desc queries (#59827)
When dealing with tail queries, data is returned descending for the base
criterion yet the rest of the queries are ascending. This caused a
problem during insertion since while in a page, the data is ASC, between
pages the blocks of data is DESC.
This caused incorrectly sorting inside a SequenceGroup which led to
incorrect results.

Further more in case of limit, since the data in a page is ASC, early
return is not possible neither is desc matching. Thus the page needs to
be consumed first before finding the final results.
A future improvement could be to keep only the top N results dropping
the rest during insertion time.

(cherry picked from commit 77c88da054a1ce662a264f72cde5986d4ce37e3a)
2020-07-19 00:49:16 +03:00
Lee Hinman 8c7d414a3b
[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806) (#59810)
Backports the following commits to 7.x:

    Fix retrieving data stream stats for a DS with multiple backing indices (#59806)
2020-07-17 16:56:07 -06:00
Nik Everett 95e6e4a452
Small cleanup for IndexFieldData (#59724) (#59800)
This drops `IndexComponent` from `IndexFieldData` because it wasn't
doing anything other than forcing us to perform a bunch of ceremony to
build them.
2020-07-17 13:38:15 -04:00
Tal Levy c9ab7bb651
Fix bug in circuit-breaker check for geoshape grid aggregations (#57962) (#59741)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.

Fixes #57847.
2020-07-17 09:26:00 -07:00
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
Alan Woodward b29d368b52
Convert DateFieldMapper to parametrized format (#59429) (#59759)
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
2020-07-17 12:46:18 +01:00
Andrei Dan 301d61a98e
Tests: fix TimeSeriesDataStreamsIT.testShrinkActionInPolicyWithoutHotPhase (#59603) (#59689)
The ILM policy for the source and shrunk indices run separately (ie. they
are two separate managed indices). This fixes the test which exhibited some
flakiness by allowing some time for the ILM policy for the source index
to finish executing.

(cherry picked from commit c78d5e8499fc5ca2ca1314f97bcc6b55ba06e2e7)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-17 11:26:06 +01:00
Andrei Stefan d513e1090f
Do not create the index, if it's already there (#59745) (#59747)
(cherry picked from commit d097447d257efdf0a36b1157e1f177aed86ecca1)
2020-07-17 11:38:30 +03:00
Tanguy Leroux 4827fec1cf
Revert "Mute AzureSearchableSnapshotsIT (#58775)" (#59749)
This reverts commit 74a78b3a7b.
2020-07-17 10:02:46 +02:00
Martijn van Groningen 74c9402912
Re-enable data stream bwc tests (#59734)
after backporting #59503
Backport of #59732 yo 7.x
2020-07-16 23:59:52 +02:00
Martijn van Groningen 0096238df1
Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727)
Backport #59503 to 7.x

and adjusted exception messages.

Relates to #59076
2020-07-16 22:29:40 +02:00
Igor Motov 2408803fad
Adds hard_bounds to histogram aggregations (#59175) (#59656)
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
2020-07-16 15:31:53 -04:00
Martijn van Groningen 4089cbd767
Ignore multiple matching templates warning in specific tests. (#59692) (#59715)
Closes #59679
2020-07-16 20:07:38 +02:00
Marios Trivyzas c7efbc1b83
SQL: Implement DATE_PARSE function for parsing strings into DATE values (#57391) (#59699)
Implement DATE_PARSE(<date_str>, <pattern_str>) function
which allows to parse a date string according to the specified
pattern into a date object. The patterns allowed are those of
java.time.format.DateTimeFormatter.

Closes #54962

Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com>

(cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)
2020-07-16 17:24:30 +02:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Nik Everett 343053c0a7 Fix compilation in Eclipse (backport #59675)
Eclipse was confused by #59583. It can't see a the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
2020-07-16 08:25:12 -04:00
David Kyle c349fdcb89 Mute RegressionIT testWithDataStream (#59687)
For #59664
2020-07-16 09:45:29 +01:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Nhat Nguyen b599f7a9c0
Fix estimate size of translog operations (#59206)
Make sure that the estimateSize method includes all fields of translog operations.
2020-07-16 00:19:30 -04:00
Yang Wang 067db1fc3b
Fix test of API key creation in a mixed cluster (#59680)
RoleDescriptors are mandatory prior to v7.3

Relates: #59425
2020-07-16 12:44:17 +10:00
Costin Leau 5f2285a8b3 EQL: Fix bug in returning results (#59673)
Using serialization/deserialization when dealing with non-trivial
documents causes the process to get stuck not to mention it is expensive.
Use a much more simple approach at the expense of losing information
(we're just interested in the source after all).

(cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)
2020-07-16 01:01:13 +03:00
Julie Tibshirani 2b70758a05 Correct type parametrization in geo mappers. (#59583)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
2020-07-15 14:10:47 -07:00
Martijn van Groningen f1028fbbcc
Only install stack templates via elected master node (#59624) (#59657)
to avoid many error stacktraces in logs during a rolling upgrade.

Stack templates use the composable index template and component APIs,these APIs
aren't supported in 7.7 and earlier and in mixed cluster
environments this can cause a lot of ActionNotFoundTransportException
errors in the logs during rolling upgrades. If these templates
are only installed via elected master node then the APIs are always
there and the ActionNotFoundTransportException errors are then prevented.
2020-07-15 22:22:01 +02:00
Lee Hinman 74372df824
Mute {p0=mixed_cluster/120_api_key_auth/Test API key authentication will work in a mixed cluster} (#59663)
Relates to #59425
2020-07-15 14:14:33 -06:00
Nhat Nguyen 93d419b9c8 Mute CcrRollingUpgradeIT
Tracked at #59625
2020-07-15 14:43:32 -04:00
David Kyle df7fc8f967
Accounting for model size when models are not cached (#59607)
When an inference model is loaded it is accounted for in circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
2020-07-15 18:06:15 +01:00
David Turner 691759fb1f
Validate snapshot UUID during restore (#59601)
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.

This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.

Relates #50999
2020-07-15 16:23:20 +01:00
Costin Leau 6b75525efb EQL: Improve testing spec (#59615)
Case sensitivity is incorporated as a test dimension - instead of
running the same test twice, two different tests are created.
Clean-up the test invocation by removing unused parameters.

Fix #59294

(cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)
2020-07-15 18:07:24 +03:00
Igor Motov b5ab447b3e
EQL: Fix async EQL Rest test (#59556) (#59620)
Unfortunately, we cannot guarantee that the execution will be truly
async even with 0ms timeout since we cannot block the execution. So, we need
to modify the test to work in both async and non-async mode.

Closes #59416
2020-07-15 11:02:33 -04:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Martijn van Groningen 53249dcca8
No need to select only < 7.9 nodes in 7.x branch. (#59609) 2020-07-15 15:23:16 +02:00
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
Tanguy Leroux 604f22db79
Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590)
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.

Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
 been completed because of the prewarming tasks making no progress and
filling up the thread pool.

This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.

This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
2020-07-15 11:45:52 +02:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Yannick Welsch bc11503dc3 Wait for active license in CcrRestIT (#59543)
Relates #53966

Closes #59486
2020-07-15 09:38:08 +02:00
Tal Levy 4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Ryan Ernst 3b688bfee5
Add license feature usage api (#59342) (#59571)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 14:34:59 -07:00
James Rodewig e5baacbe2e
[DOCS] Simplify index template snippets for data streams (#59533) (#59553)
Removes the `@timestamp` field mapping from several data stream index
template snippets.

With #59317, the `@timestamp` field defaults to a `date` field data type
for data streams.
2020-07-14 17:28:43 -04:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Costin Leau 679619c798 EQL: Improve retrieval of results (#59552)
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
2020-07-14 23:53:57 +03:00
Albert Zaharovits 6d6d565eeb
Fix auditing of nameless API Keys (#59531)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
2020-07-14 23:46:25 +03:00
Albert Zaharovits 4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00
David Kyle 0d2ea1b881
Check for ml privilege when using the Inference Aggregation (#59530) (#59562)
The inference pipeline aggregation requires the user has permission to access
the ml get trained models endpoint (_ml/inference/)
2020-07-14 20:53:40 +01:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Dan Hermann 70fe553ce0
[7.x] Reenable BWC tests for data streams (#59538) 2020-07-14 13:35:52 -05:00
Albert Zaharovits b1e4233806
Fix auditing of API Key authn without the owner realm name (#59470)
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .

This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.

Closes #59425
2020-07-14 21:35:29 +03:00
Dimitris Athanasiou ee4610c0ca
[7.x][ML] Rename cross validation splitter package (#59529) (#59544)
Renames and moves the cross validation splitter package.

First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.

Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.

Backport of #59529
2020-07-14 18:54:46 +03:00
Dimitris Athanasiou 37406487b9
[7.x][ML] Improve error for non-included field with unsupported type (#59424) (#59541)
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.

This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.

Backport of #59424
2020-07-14 18:54:34 +03:00
Andrei Stefan 1fd16ffb70
Add license header to EqlStatsIT.java (#59537) 2020-07-14 18:45:13 +03:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Nhat Nguyen 4d7c59bedb
Assign follower primary to nodes with remote cluster client role (#59375)
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-07-14 11:23:55 -04:00
Dimitris Athanasiou e302c66847
[7.x][ML] Fix NPE when starting classification with missing dependent_variable (#59524) (#59540)
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.

This commit is fixing this issue.

Backport of #59524
2020-07-14 17:56:55 +03:00
Andrei Dan d477aa14ef
Data Streams: fix bwc test (#59528) (#59534)
(cherry picked from commit ed1a5c00abed8c63ad395ea93df7a303da7b7a65)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 15:17:20 +01:00
Andrei Stefan cf752992d6
Add telemetry metrics (#59526) 2020-07-14 16:25:24 +03:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
David Kyle d86435938b
[7.x] Add ml licence check to the pipeline inference agg. (#59213) (#59412)
Ensures the licence is sufficient for the model used in inference
2020-07-14 14:03:10 +01:00
Yang Wang f651487d74
Support prefix search for API key names (#59113) (#59520)
This PR adds minimum support for prefix search of API Key name. It only touches API key name and leave all other query parameters, e.g. realm name, username unchanged.
2020-07-14 22:06:20 +10:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Yang Wang 2e71d0aa91
Allow mixed usage of boolean and string when merging OIDC claims (#59112) (#59512)
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.

This PR introduce a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In another word, it will merge true with "true", also will merge false with "false", but nothing else.
2020-07-14 20:41:16 +10:00
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Costin Leau 5580eb61ed EQL: Improve sequence limiting (#59439)
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.

Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.

(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
2020-07-14 13:19:09 +03:00
Hendrik Muhs c8290167a0
[7.x][Transform] separate pivot and extract function interface (#59505)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.

piggy backed fixes:
 - when running geo tile group_by it could fail due to query clause limit (unreleased)
 - new style page size using settings was not validating limit of 10k (7.8)
2020-07-14 11:27:16 +02:00
Martijn van Groningen 5f24be1bc1
Also set system property when running test task. (#59499)
Closes #59488
2020-07-14 10:34:52 +02:00
Rene Groeschke d5c11479da
Remove remaining deprecated api usages (#59231) (#59498)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
2020-07-14 10:25:00 +02:00
David Roberts 529aa345df [ML] Account for per-partition categorization in model memory estimate (#59458)
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
2020-07-14 09:16:28 +01:00
Yang Wang 4350add12c
Allow null name when deserialising API key document (#59485) (#59496)
API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accomodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
2020-07-14 16:08:32 +10:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Lee Hinman 81bdb20b8a Fix license header for DataStreamRestIT 2020-07-13 14:41:29 -06:00
Lee Hinman bf1a60130d
[7.x] Add telemetery for data streams (#59433) (#59454)
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:

```
  ...
  "data_streams" : {
    "available" : true,
    "enabled" : true,
    "data_streams" : 3,
    "indices_count" : 17
  }
  ...
```
2020-07-13 14:30:11 -06:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Dimitris Athanasiou a7895ff458
[7.x][ML] Remove unused member var from ExtractedFieldsDetector (#59395) (#59406)
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.

Backport of #59395

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-13 19:10:43 +03:00
Igor Motov 1acb4aeba9
EQL: Prepare for release (#59331) (#59426)
Enables eql setting in release builds.

Relates #51613
2020-07-13 11:54:32 -04:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Yang Wang cc9166a5ea Mute failed 120_api_key_auth test till #59425 is addressed. 2020-07-14 01:10:36 +10:00
Yang Wang edf27cd765 Adjust BWC versions for API key auth test.
API key realm name is not available in authentication metadata prior to
v7.5. The issue is tracked at #59425
2020-07-14 00:38:42 +10:00
David Roberts b5e8250a4e
[ML] Drive categorization warning notifications from annotations (#59393)
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work.  However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to.  Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes.  (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list.  This means it's best to have both.)

Backport of #59377
2020-07-13 15:28:57 +01:00
David Kyle 054d5236d4 Mute RegressionIT failure (#59414)
For #59413
2020-07-13 14:12:19 +01:00
Yang Wang a84469742c
Improve role cache efficiency for API key roles (#58156) (#59397)
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
2020-07-13 22:58:11 +10:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Dan Hermann 7fa9cf601b
Data stream support for rollup search 2020-07-10 11:13:34 -05:00
Alan Woodward 4b9cbfca64 Remove test backported in error 2020-07-09 21:45:41 +01:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Lisa Cawley 54483394ae
[DOCS] Clarify subscription requirements (#58958) (#59307) 2020-07-09 12:24:45 -07:00
Dan Hermann c7e977701a
Data stream support for async search 2020-07-09 13:12:04 -05:00
Dan Hermann b9fb12924b
Data stream support for EQL search 2020-07-09 13:10:44 -05:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Costin Leau d9c1e531db EQL: Introduce until functionality (#59292)
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.

(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
2020-07-09 17:12:01 +03:00
Dimitris Athanasiou d07b11b86b
[7.x][ML] Perform test inference on java (#58877) (#59298)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.

Backport of #58877

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-07-09 16:30:49 +03:00
David Kyle 86555ec163
Remove unused function InferenceIndexConstants.mapping() (#59146) (#59158)
InferenceIndexConstants.mapping() is broken and unused.
2020-07-09 14:28:53 +01:00
Andrei Stefan d187b531ed
EQL: Give a name to all toml tests and enforce the naming of new tests (#59283) (#59295)
(cherry picked from commit c8ffe3c9237d3cdd90331795b8e37517155b7e91)
2020-07-09 16:20:29 +03:00
David Kyle dbb9c802b1
Better error message when the model cannot be parsed due to its size (#59166) (#59209)
The actual cause can be lost in a long list of parse exceptions
this surfaces the cause when the problem is size.
2020-07-09 13:43:46 +01:00
David Kyle c5443f78ce
Add Inference Pipeline aggregation to HLRC (#59086) (#59250)
Adds InferencePipelineAggregationBuilder to the HLRC duplicating 
the server side classes
2020-07-09 13:38:45 +01:00
Daniel Mitterdorfer 10ef4d2140
Mute testMaxRestoreBytesPerSecIsUsed (#59289)
Relates #59287
2020-07-09 12:52:17 +02:00
Alan Woodward 67a27e2b9d Add declarative parameters to FieldMappers (#58663)
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
2020-07-09 11:43:21 +01:00
Daniel Mitterdorfer daa48329ec
[TEST] Mute FollowerFailOverIT.testFailOverOnFollower (#58659) (#59286)
Relates #58534

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
2020-07-09 12:38:36 +02:00
Albert Zaharovits 2b7456db7f
Improve auditing of API key authentication #58928
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
2020-07-09 13:26:18 +03:00
Dimitris Athanasiou d323f8d698
[ML] Add REST spec for the update data frame analytics endpoint (#59253) (#59281)
Closes #59148

Backport of #59253
2020-07-09 13:12:21 +03:00
Ignacio Vera 1ad00d1ceb
Add Support in geo_match enrichment policy for any type of geometry (#59276)
geo_match enrichment works currently only with points. This change adds the ability to
use any type of geometry.
2020-07-09 11:41:41 +02:00
Andrei Stefan c0e0bca84c
Remove search_after and implicit_join_key_field (#59232) (#59280)
(cherry picked from commit 6ede6c59eff321b9fedad30e19508b9e4f788b54)
2020-07-09 12:34:01 +03:00
Bogdan Pintea acfff7b896
Add sample versions of standard deviation and variance funcs (#59093) (#59274)
* Add sample versions of standard deviation and variance functions (#59093)

* Add STDDEV_SAMP, VAR_SAMP

This commit adds the sampling variations of the standard deviation and
variance agg functions.

(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)

* Fix: workaround for lack of Map#of() in Java8

Replace Map#of() with a HashMap static init.
2020-07-09 10:17:13 +02:00
Ignacio Vera 14ab35e323
Fix numerical error in CentroidCalculatorTests#testPolygonAsPoint (#59012) (#59272) 2020-07-09 08:42:07 +02:00
Lee Hinman bb1c53a0f5
Allow warnings about 'global' template in upgrade tests (#59242)
These tests sometimes install a template so they can be compatible with older versions, but they run
amok of the occasionally installed "global" template which changes the default number of shards.

This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they
are not (since the global template is only randomly installed).

Resolves #58807
Resolves #58258
2020-07-08 13:40:55 -06:00
Armin Braun cc3c8be0f1
Fix SLMSnapshotBlockingIntegTests.testSnapshotInProgress (#59218) (#59239)
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.

relates #59140
2020-07-08 19:13:01 +02:00
James Rodewig 838f717e5f
[DOCS] Add data streams to security docs (#59084) (#59237) 2020-07-08 12:53:56 -04:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
David Turner 6ffdb19a2a Clean searchable snapshots cache on startup (#59009)
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.

Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
2020-07-08 15:17:52 +01:00
Nik Everett a29d3515a2
Improve cardinality measure used to build aggs (#56533) (#59107)
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-08 08:42:23 -04:00
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Costin Leau 3e32d060bf EQL: Fix bug in skipping window (#59196)
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)

(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
2020-07-08 14:36:39 +03:00
Yannick Welsch 0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
Yang Wang a6109063a2
Even more robust test for API key auth 429 response (#59159) (#59208)
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
2020-07-08 16:43:07 +10:00
Nhat Nguyen ef5c397c0f
Sending operations concurrently in peer recovery (#58018)
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.

With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.

Backport of #58018
2020-07-07 22:03:31 -04:00
Albert Zaharovits d4a0f80c32
Ensure authz role for API key is named after owner role (#59041)
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
2020-07-07 23:26:57 +03:00
Benjamin Trent e343e066fc
[7.x] [ML] prefer secondary auth headers on evaluate (#59167) (#59183)
* [ML] prefer secondary auth headers on evaluate (#59167)

We should prefer the secondary auth headers when evaluating a data frame
2020-07-07 15:34:47 -04:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Nik Everett 93ff5bf9c8
Remove blocking from inference pipeline builder (#59096) (#59162)
This removes the blocking model lookup from the `inference` aggregator's
builder by integrating it into the request rewrite process that loads
stuff asynchronously.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-07 12:31:17 -04:00
Armin Braun 6dec2cf722
Fix SLM Tests Leaking Snapshot Operation (#59150) (#59155)
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
2020-07-07 17:19:06 +02:00
Rene Groeschke a896df53ac
Remove misc dependency related deprecation warnings (7.x backport) (#59122)
* Fix dependency related deprecations (#58892)
* Fix classpath setup for forbiddenapi usage
2020-07-07 17:10:31 +02:00
Christoph Büscher 7c64a1bd7b Muting failing ApiKeyIntegTests 2020-07-07 16:02:59 +02:00
Yang Wang f84b76661d
Make test more robust for API key auth 429 (#59077) (#59136)
Adds error handling when filling up the queue of the crypto thread pool. Also reduce queue size of the crypto thread pool to 10 so that the queue can be cleared out in time.

Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failure on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, it may be possible that there are lingering requests not fully cleared out from the queue. When it happens, we will not be able to push all 1000 tasks into the queue. But since what we need is just queue saturation, so as long as we can be sure that the queue is fully filled, it is safe to ignore rejection error and just move on.

A number of 1000 tasks also take some to clear out, which could cause the test suite to time out. This PR change the queue to 10 so the tests would have better chance to complete in time.
2020-07-07 22:27:10 +10:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
Ignacio Vera 5cc6457ed8
upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091) (#59120) 2020-07-07 12:07:41 +02:00
Armin Braun d6d6df16bb
Share IT Infrastructure between Core Snapshot and SLM ITs (#59082) (#59119)
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
2020-07-07 12:04:41 +02:00
David Roberts e217f9a1e8
[ML] Wait for shards to initialize after creating ML internal indices (#59087)
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices.  Currently this can result in attempts
to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.

Backport of #59027
2020-07-07 10:52:10 +01:00
Francisco Fernández Castaño 0752a86fe5
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)
* Enforce higher priority for RepositoriesService ClusterStateApplier

This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.

Backport of #58808
2020-07-07 09:51:08 +02:00
Jake Landis 604c6dd528
7.x - Create plugin for yamlTest task (#56841) (#59090)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 14:16:26 -05:00
Costin Leau f9c15d0fec EQL: Introduce sequencing fetch size (#59063)
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, use in testing a minimum fetch size which exposed a number of bugs:

Jumping across data across queries causing valid data to be seen as a gap.
Incorrectly resuming searching across pages (again causing data to be discarded).
which have been addressed.

(cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)
2020-07-06 19:14:26 +03:00
Costin Leau b2e9c6f640 Update UnresolvedRelationTests
UnresolvedRelation does not care about its source during equality hence
ignore it when doing randomized mutations.

Relates #59014

(cherry picked from commit b21222e714fbf85aad0916e4d4b6a933d2b6958a)
2020-07-06 19:14:25 +03:00
Costin Leau fe775a315f EQL: Obey size request parameter (#59014)
While at it, change the default size to 10 (to align it with the search
API defaults).

(cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)
2020-07-06 19:14:25 +03:00
Yang Wang 2a1635ad69
Create API key with TransportBulkAction directly (#59046) (#59060)
Use TransportBulkAction directly to create API keys instead of going through the proxy from IndexAction to BulkAction.
2020-07-06 23:32:07 +10:00
Yang Wang 66c0231895
Improve threadpool usage and error handling for API key validation (#58090) (#59047)
The PR introduces following two changes:

Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and 1000 in queue size. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it could be a rushed optimization since I am not clear about potential impact on the token service.

On threadpoool saturation, it now fails with EsRejectedExecutionException which in turns gives back a 429, instead of 401 status code to users.
2020-07-06 21:21:07 +10:00
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
David Kyle 0fc12194bf
[ML] Increase timeout in MlDistributedFailureIT (#58997) (#59013)
Doubles the timeout on the ensureStableClusterOnAllNodes method to 60s
to account for v slow ci
2020-07-06 11:30:41 +01:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Armin Braun 49857cc35d
Dry up Master Disconnect Disruption Tests (#58953) (#59050)
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
2020-07-06 11:04:24 +02:00
Yang Wang a9151db735
Map only specific type of OIDC Claims (#58524) (#59043)
This commit changes our behavior in 2 ways:

- When mapping claims to user properties ( principal, email, groups,
name), we only handle string and array of string type. Previously
we would fail to recognize an array of other types and that would
cause failures when trying to cast to String.
- When adding unmapped claims to the user metadata, we only handle
string, number, boolean and arrays of these. Previously, we would
fail to recognize an array of other types and that would cause
failures when attempting to process role mappings.

For user properties that are inherently single valued, like
principal(username) we continue to support arrays of strings where
we select the first one in case this is being depended on by users
but we plan on removing this leniency in the next major release.

Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
2020-07-06 11:36:41 +10:00
Tanguy Leroux 49f4227837 Check acknowledged responses in FsSearchableSnapshotsIT (#59021)
Despite all my attempts I did not manage to reproduce issues like the ones 
described in #58961. My guess is that the _mount request got retried at 
some point but I wasn't able to validate this assumption.

Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small 
random chunk size and a large number of documents is picked up in the 
tests. The parent class also does not verify the acknowledged status 
of some requests.

This commit lowers down the chunk size and number of docs in tests 
(this is extensively tests in unit tests) and also adds assertions on 
acknowledged responses.

Relates #58961
2020-07-05 10:50:31 +02:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
2020-07-04 14:02:16 +02:00
Bogdan Pintea e88d71b187
[7.x] SQL: Redact credentials in connection exceptions (#58650) (#59025)
* SQL: Redact credentials in connection exceptions (#58650)

This commit adds the functionality to redact the credentials from the
exceptions generated when a connection attempt fails, preventing them
from leaking into logs, console history etc.

There are a few causes that can lead to failed connections. The most
challenging to deal with is a malformed connection string. The redaction
tries to get around it by modifying the URI to a parsable state, so that
the redaction can be applied reliably. If there's no reliability
guarantee, the redaction will bluntly replace the entire connection
string and the user informed about the option to modify it so that the
redaction won't apply. (This is done by using a caplitalized scheme,
which is legal, but otherwise never used in practice.)

The commit fixes a couple of other issues with the URI parser:
- it allows an empty hostname, or even entire connection string (as per
the existing documentation);
- it reduces the editing of the connection string in the exception
messages (so that the user easier recognize their input);
- it uses the default URI as source for the scheme and hostname.

(cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd)

* Implement String#repeat(), unavailable in Java8

Implement a client.StringUtils#repeatString() as a replacement for
String#repeat(), unavailable in Java8.
2020-07-04 11:29:06 +02:00
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Lisa Cawley 5c19464a2f [DOCS] Clarifies number of file and native realms (#58949) 2020-07-03 11:00:28 -07:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Bogdan Pintea 3d96d91efb
[7.x] SQL: fix handling of escaped chars in JDBC connection string (#58429) (#58977)
SQL: fix handling of escaped chars in JDBC connection string (#58429)

This commit fixes an issue emerging when the connection string URI
contains escaped characters.

The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. The new URI has been
using however the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.

The commit changes that, so that the new URI be built from the unescaped
"raw" parts of the original URI.

(cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)
2020-07-03 17:03:00 +02:00
Luca Cavanna e3fc1638d8 Improve error handling in async search code (#57925)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.

Closes #58995
2020-07-03 16:07:26 +02:00
Hendrik Muhs ca3da7af85 [ML] handle broken setup with state alias being an index (#58999)
.ml-state-write is supposed to be an index alias, however by accident it can become an index. If
.ml-state-write is a concrete index instead of an alias ML stops working. This change improves error
handling by setting the job to failed and properly log and audit the problem. The user still has to
manually fix the problem. This change should lead to a quicker resolution of the problem.

fixes #58482
2020-07-03 15:26:59 +02:00
Ignacio Vera 2c2486d3d4
Fix GeoHash grid aggregation circuit breaker tests (#58218) (#59001) 2020-07-03 13:46:35 +02:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Luca Cavanna 4f86f6fb38 Submit async search to not require read privilege (#58942)
When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data.

The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.
2020-07-03 12:18:07 +02:00
David Kyle f6a0c2c59d
[7.x] Pipeline Inference Aggregation (#58965)
Adds a pipeline aggregation that loads a model and performs inference on the
input aggregation results.
2020-07-03 09:29:04 +01:00
Tim Vernum 1133c29ce9
Treat roles as a SortedSet (#58988)
The Saml SP document stored the role mapping in a Set, but this made
the order in XContent inconsistent. This switched it to use a TreeSet.

Resolves: #54733
Backport of: #55201
2020-07-03 13:40:58 +10:00
Tim Brooks dc9e364ff2
Count coordinating and primary bytes as write bytes (#58984)
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
2020-07-02 19:48:19 -06:00
Benjamin Trent bd9b3b6116
[ML] fix inference ml-stats-write alias creation (#58947) (#58959)
The check for potentially creating the .ml-stats-write alias should verify that the indices actually exist.

closes #58662
2020-07-02 16:16:42 -04:00
Tim Brooks 1ef2cd7f1a
Add memory tracking to queued write operations (#58957)
Currently we do not track the memory consuming by in-process write
operations.

This commit adds a mechanism to track write operation memory usage.
2020-07-02 14:14:57 -06:00
Tal Levy d516959774
Re-enable support for array-valued geo_shape fields. (#58786) (#58943)
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
2020-07-02 11:21:55 -07:00
David Roberts 2c04685b81
[ML] Ensure config index mappings are up-to-date before updating configs (#58938)
We already had code to ensure the config index mappings were
up-to-date before creating a new config.  However, it's also
possible that an update to a config could add the latest
settings that require the latest mappings to index correctly.

This change checks that the latest config index mappings are
in place in the 3 update actions in the same way as the checks
are done in the 3 put actions.

Backport of #58916
2020-07-02 18:55:19 +01:00
Robin Clarke 567720d970 [DOCS] Added caveat about the number of file realms (#58369) 2020-07-02 10:27:36 -07:00
Dan Hermann c988afdc15
Data stream support for migrations deprecations info API 2020-07-02 11:16:22 -05:00
Przemysław Witek 751e84e4c8
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) (#58927) 2020-07-02 17:35:55 +02:00
Tanguy Leroux 6aa669c8bb Fix SearchableSnapshotDirectoryStatsTests (#58912)
Similar to #58847 but in a different tests. The failure never 
reproduced locally but occurs from time to time on CI.
2020-07-02 16:39:26 +02:00
Dan Hermann b78bfa01f6
[7.x] Data stream support for graph explore API 2020-07-02 08:19:03 -05:00
David Kyle d6643bfc7f Revert "Mute FsSearchableSnapshotsIT testClearCache (#58902)"
The test was fixed in #58847

This reverts commit bb96c910a5.
2020-07-02 13:21:05 +01:00
David Kyle bb96c910a5 Mute FsSearchableSnapshotsIT testClearCache (#58902)
For #58901
2020-07-02 12:58:28 +01:00
Costin Leau 965f77fa44 EQL: Introduce sequence internal paging (#58859)
Refactor sequence matching classes in order to decouple querying from
results consumption (and matching).
Rename some classes to better convey their intent.

Introduce internal pagination of sequence algorithm, that is getting the
data in slices and, if needed, moving forward in order to find more
matches until either the dataset is consumer or the number of results
desired is found.

(cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)
2020-07-02 13:44:21 +03:00
Przemysław Witek 8e074c4495
Rename "error" field to "value" for consistency between metrics (#58726) (#58870) 2020-07-02 09:08:56 +02:00
Yang Wang a5a8b4ae1d
Add cache for application privileges (#55836) (#58798)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.

Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
2020-07-02 11:50:03 +10:00
Benjamin Trent c64e283dbf
[7.x] [ML] handles compressed model stream from native process (#58009) (#58836)
* [ML] handles compressed model stream from native process (#58009)

This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.

1. ModelSizeInfo which contains model size information 
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.

`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.


Native side change: https://github.com/elastic/ml-cpp/pull/1349
2020-07-01 15:14:31 -04:00
Mark Vieira 1fcaec7dfc
Ignore test seed used in test system properties (#58789) 2020-07-01 11:52:22 -07:00
James Rodewig a966513eae
[DOCS] Remove problematic terms (#58832) (#58851) 2020-07-01 13:47:14 -04:00
Nhat Nguyen f63cbad629
Ensure CCR partial reads never overuse buffer (#58620)
When the documents are large, a follower can receive a partial response 
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.

Backport of #58620
2020-07-01 13:23:28 -04:00
Tanguy Leroux ec4843f4df Fix AbstractSearchableSnapshotsRestTestCase.testClearCache (#58847)
Since #58728 part of searchable snapshot shard files are written in cache 
in an asynchronous manner in a dedicated thread pool. It means that even 
if a search query is successful and returns, there are still more bytes to 
write in the cached files on disk.

On CI this can be slow; if we want to check that the cached_bytes_written 
has changed we need to check multiple times to give some time for the 
cached data to be effectively written.
2020-07-01 18:01:00 +02:00
Benjamin Trent c768467155
Muting flakey test (#58855) (#58856) 2020-07-01 11:54:43 -04:00
Lee Hinman d3d03fc1c6
[7.x] Add default composable templates for new indexing strategy (#57629) (#58757)
Backports the following commits to 7.x:

    Add default composable templates for new indexing strategy (#57629)
2020-07-01 09:32:32 -06:00
Ryan Ernst c23613e05a
Split license allowed checks into two types (#58704) (#58797)
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.

Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
2020-07-01 07:11:05 -07:00
Alan Woodward 3ba16e0f39
Move MappedFieldType#getSearchAnalyzer and #getSearchQuoteAnalyzer to TextSearchInfo (#58830)
Analyzers are specific to text searching, and so should be in TextSearchInfo rather than on
the generic MappedFieldType.

Backport of #58639
2020-07-01 14:52:14 +01:00
Tanguy Leroux d35e8f45da
Allow read operations to be executed without waiting for full range to be written in cache (#58728) (#58829)
This commit changes CacheFile and CachedBlobContainerIndexInput so that
 the read operations made by these classes are now progressively executed 
and do not wait for full range to be written in cache. It relies on the change 
introduced in #58477 and it is the last change extracted from #58164.

Relates #58164
2020-07-01 15:38:17 +02:00
Przemysław Witek 909649dd15
[7.x] Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) (#58825) 2020-07-01 14:52:06 +02:00
Andrei Stefan b904a60275
EQL: Add case handling to stringContains (#58762) (#58813)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 1a58776d3aa563beb364b067a1db46497122306f)
2020-07-01 13:51:45 +03:00
Andrei Stefan 470bcee5bf
EQL: Integrate TOML tests for function folding (#58748) (#58812)
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit e9b1fa58cf8d510a4b4afb14f66b0d5f9c603ebb)
2020-07-01 13:50:54 +03:00
Przemysław Witek 2638809cba
Mute failing test DataFrameAnalyticsConfigProviderIT.testUpdate_UpdateCannotBeAppliedWhenTaskIsRunning (#58821) 2020-07-01 12:28:23 +02:00
Yannick Welsch 15c85b29fd
Account for recovery throttling when restoring snapshot (#58658) (#58811)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.

The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.

Relates https://github.com/elastic/elasticsearch/issues/57023

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2020-07-01 12:19:29 +02:00
David Turner 3a234d2669
Account for remaining recovery in disk allocator (#58800)
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.

This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.

Backport of #58029 to 7.x

* Picky picky

* Missing type
2020-07-01 10:12:44 +01:00
David Kyle 27d52d4d23
Remove the Model interface (#58754) (#58803)
The Model interface was implemented by just one class and did not 
contribute to making the code more undertandable
2020-07-01 09:57:02 +01:00
Dario Gieselaar 417f7062c5
[7.x] Add read privileges for annotations for apm_user (#58530) (#58781)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-01 09:04:57 +02:00
Yang Wang 3d49e62960
Support handling LogoutResponse from SAML idP (#56316) (#58792)
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.

The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.

This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
2020-07-01 16:47:27 +10:00
Tim Vernum 9e49af03b7
Reenable test after backport (#58717)
This commit re-enables CCR rolling upgrade tests following the
backport of #58217 to 7.8 branch (7.8.1)
2020-07-01 11:50:30 +10:00
Lee Hinman 74a78b3a7b
Mute AzureSearchableSnapshotsIT (#58775)
Relates to #58260
2020-06-30 13:30:51 -06:00
Dan Hermann 22806c943d
Data stream support for ILM remove policy API (#58595) (#58770) 2020-06-30 14:03:19 -05:00
Benjamin Trent a2331bc9d4
[Transform] fix bug in supporting boolean values in pivot (#58741) (#58760)
Since the underlying composite aggs support boolean mapped values for terms, transforms should also support them

closes #58697
2020-06-30 13:47:58 -04:00
Martijn van Groningen adcef93a6c
Introduce new put mapping action for dynamic mapping updates. (#58746)
Backport of #58419

Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.

The new auto put mapping action is only used if all nodes are on the version that supports it.
2020-06-30 18:02:31 +02:00
Julie Tibshirani ab65a57d70
Merge mappings for composable index templates (#58709)
This PR implements recursive mapping merging for composable index templates.

When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).

Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.

Relates to #53101.
2020-06-30 08:01:37 -07:00
David Roberts d9e0e0bf95
[ML] Pass through the stop-on-warn setting for categorization jobs (#58738)
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.

Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356

Backport of #58632
2020-06-30 15:17:04 +01:00
Rene Groeschke d952b101e6
Replace compile configuration usage with api (7.x backport) (#58721)
* Replace compile configuration usage with api (#58451)

- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
  - required as java library will by default not have build jar file
  - jar file is now explicit input of the task and gradle will ensure its properly build

* Fix compile usages in 7.x branch
2020-06-30 15:57:41 +02:00
Przemysław Witek 9ea9b7bd3b
[7.x] Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) (#58731) 2020-06-30 14:09:11 +02:00
Benjamin Trent def5550df3
[ML] fix ml inference stats tests (#58690) (#58729) 2020-06-30 07:53:33 -04:00
Przemyslaw Gomulka 3923a10165
Exclude SystemV timezones from randomZone method (#58549) (#58655)
RandomZone test method returns a ZoneId from the set of ids supported by
java. The only difference between joda and java supported timezones are
SystemV* timezones.
These should be excluded from randomZone method as they would break
testing. They also do not bring much confidence when used in testing as
I suspect they are rarely used.
That exclude should be removed for simplification once joda support is removed.
2020-06-30 12:45:53 +02:00
Andrei Stefan 7b80ea7218
Fix release tests (#58713) (#58725)
(cherry picked from commit 7816c100612168bf46595c4813fe374bca2e7259)
2020-06-30 13:42:32 +03:00
Tanguy Leroux 4e03633a66
Differentiate base paths for searchable snapshots QA tests (#58664) (#58714)
This commit adds the BuildParams.testSeed to the repository base paths used 
in searchable snapshots QA tests. For S3 and GCS the test seed is added for 
coherency sake with other integration tests while it's required for Azure as 
Azure 3rd party tests are executed on CI simultaneously for regular and 
SAS token accounts.

Closes #58260
2020-06-30 10:18:33 +02:00
Tim Vernum dcc5a06dec
Display enterprise license as platinum in /_xpack (#58217)
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.

However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.

This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.

This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
2020-06-30 16:42:28 +10:00
Costin Leau 3a546f1f51 EQL: Introduce support for sequence maxspan (#58635)
EQL sequences can specify now a maximum time allowed for their span
(computed between the first and the last matching event).

(cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)
2020-06-29 21:31:00 +03:00
Igor Motov 773f3574a9
Removes debug logging from RestEqlCancellationIT (#58676)
The test didn't fail since the fix in #58493. So, it's time to remove debug
logging and close the issue.

Closes #58270
2020-06-29 13:15:01 -04:00
Andrei Stefan 3cb8f54f28
EQL: case sensitivity aware integration testing (#58624) (#58672)
* EQL: case sensitivity aware integration testing (#58624)

* Add DataLoader
* Rewrite case sensitivity settings:
NULL -> run both case sensitive and insensitive tests
TRUE -> run case sensitive test only
FALSE -> run case insensitive test only
* Rename test_queries_supported
* Add more toml tests from the Python client

Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)
2020-06-29 18:40:07 +03:00
Tanguy Leroux 73adcf4d44
SparseFileTracker.Gap should keep a reference to the corresponding Range (#58587) (#58665)
SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill,
it does not need to resolve the range each time onSuccess/onProgress/onFailure are 
called.

Relates #58477
2020-06-29 15:24:19 +02:00
Przemysław Witek 3f7c45472e
[7.x] Introduce DataFrameAnalyticsConfig update API (#58302) (#58648) 2020-06-29 10:56:11 +02:00
Yang Wang 61fa7f4d22
Change privilege of enrich stats API to monitor (#52027) (#52196)
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
2020-06-29 10:25:33 +10:00
Dimitris Athanasiou 1817b896c9
[7.x][ML] Add status and increased estimate to memory usage (#58588) (#58606)
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.

Backport of #58588
2020-06-28 16:27:26 +03:00
Costin Leau 3c81b91474 EQL: Add Head/Tail pipe support (#58536)
Introduce pipe support, in particular head and tail
(which can also be chained).

(cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea)
(cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)
2020-06-27 09:49:14 +03:00
Benjamin Trent 7a202b149e
Muting analytics tests (#58617) (#58618) 2020-06-26 16:50:59 -04:00
Tanguy Leroux 775fb5d4cf
Allows SparseFileTracker to progressively execute listeners during Gap processing (#58477) (#58584)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.

This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
 SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-26 18:26:20 +02:00
James Baiera 89243857ce
Update precommit to filter out project dependencies (#58189) (#58572)
If a project is pulling in an external org.elasticsearch dependency, the dependency
report generation would require a license file for the dependency to be present. 
This would break precommit because a license was present that it did not feel was
warranted. This un-reverts the update to the dependenciesInfo task, as well as the 
JNA license addition.
2020-06-25 16:33:25 -04:00
Lee Hinman f732003370
[7.x] Fix negative limiting with fewer PARTIAL snapshots than minimum required (#58563) (#58569)
In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can
cause:

```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
	at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
	at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
	at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
	at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
	at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
	at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```

When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.

Resolves #58515
2020-06-25 14:16:34 -06:00
Henning Andersen 38be2812b1
Enhance extensible plugin (#58542)
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
2020-06-25 20:37:56 +02:00
Jason Tedor 52ad5842a9
Introduce node.roles setting (#58512)
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
 - node.data: false
 - node.ingest: false
 - node.remote_cluster_client: false
 - node.ml: false

at a minimum if they are relying on defaults, but also add:
 - node.master: true
 - node.transform: false
 - node.voting_only: false

If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.

This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.

With this change we deprecate the existing 'node.*' settings such as
'node.data'.
2020-06-25 14:14:51 -04:00
Igor Motov 20af856abd
[7.x] EQL: Adds an ability to execute an asynchronous EQL search (#58192)
Adds async support to EQL searches

Closes #49638

Co-authored-by: James Rodewig james.rodewig@elastic.co
2020-06-25 14:11:57 -04:00
Benjamin Trent c7ba79bc19
[7.x] [ML] make waiting for renormalization optional for internally flushing job (#58537) (#58553)
* [ML] make waiting for renormalization optional for internally flushing job (#58537)

When flushing, datafeeds only need the guaruntee that the latest bucket has been handled.

But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.

This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.

closes #58395
2020-06-25 12:26:52 -04:00
Nik Everett 03e6d1b535
Add Variable Width Histogram Aggregation (backport of #42035) (#58440)
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.

This PR addresses #9572.

The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.

At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.

The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.

Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.

It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.

Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
2020-06-25 11:40:47 -04:00
Nik Everett 71adade73a
Return clear error message if aggregation type is invalid (#58255) (#58365)
The main changes are:

1. Catch the `NamedObjectNotFoundException` when parsing aggregation
   type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().

Closes #58146.

Co-authored-by: bellengao <gbl_long@163.com>
2020-06-25 11:08:25 -04:00
Dimitris Athanasiou c3dfafe0b4
[7.x][ML] Avoid assertion error on empty string feature values for inference (#58541) (#58550)
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.

Backport of #58541
2020-06-25 18:07:29 +03:00
Dimitris Athanasiou 5af7071db0
[7.x][ML] Change inference default field name to <dep_var>_prediction… (#58546)
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.

Backport of #58538
2020-06-25 18:03:43 +03:00
Benjamin Trent add8ff1ad3
[ML] assume data streams are enabled in data stream tests (#58502) (#58508) 2020-06-24 14:14:48 -04:00
Chris Roberson d5899d1765
[Monitoring] APM mapping update (#46244) (#58498)
* Add acm mapping to APM for beats

* Add root mapping for APM

* Add sourcemap mapping to APM

* Fix missing properties

* Fix a second missing properties

* Add request property to acm

* Remove root and sourcemap per review

Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-24 13:26:30 -04:00
Armin Braun 9e4c5d1dde
Cleaner Handling of Snapshot Related null Custom Values in CS (#58382) (#58501)
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
2020-06-24 17:24:44 +02:00
Martijn van Groningen f4fad9c65a
Re-enable data streams yaml tests in bwc mode (#58500)
Backport of #58403 to 7.x branch.
2020-06-24 16:59:51 +02:00
Hendrik Muhs c1bbfeddc9 Improve rolling upgrade test setup assertions (#58313)
wrap test setup and add proper assert messages

relates #58282
2020-06-24 16:54:48 +02:00
Andrei Stefan 69f73d948b
EQL: code cleanup and further tests (#58458) (#58497)
Add FunctionPipe tests to all functions. Cleanup functions code.

(cherry picked from commit 0f83d5799841fe99d8aeaf46e50dd11aa6bf8a57)
2020-06-24 17:38:56 +03:00
Przemysław Witek 551b8bcd73
[7.x] Use static methods (rather than constants) to obtain .ml-meta and .ml-config index names (#58484) (#58490) 2020-06-24 15:52:45 +02:00
Benjamin Trent fa88e71532
[ML] unify usages of _all and wildcard <*> (#58460) (#58494) 2020-06-24 09:47:57 -04:00
Luca Cavanna dbbf2772d8 Mute newly added ml data streams tests (#58492)
Relates to #58491
2020-06-24 15:11:40 +02:00
markharwood d5ac3bb87f
Field capabilities - make `keyword` a family of field types (#58315) (#58483)
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.

Relates to #53175
2020-06-24 12:32:14 +01:00
Alan Woodward d251a482e9 Move MappedFieldType.similarity() to TextSearchInfo (#58439)
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.

It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
2020-06-24 10:00:32 +01:00
Jim Ferenczi fcd8a432d9 Submit _async search task should cancel children on cancellation (#58332)
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
2020-06-24 09:10:26 +02:00
Larry Gregory 2ca09cddaf [DOCS] Rename kibana user to kibana_system (#58423) 2020-06-23 14:25:09 -07:00
Przemysław Witek 4e4ca6ac25
Extract ClientHelper.filterSecurityHeaders method and use it in ML code (#58447) (#58459) 2020-06-23 22:18:39 +02:00
Benjamin Trent a9b868b7a9
[7.x] [ML] allow data streams to be expanded for analytics and transforms (#58280) (#58455)
This commits allows data streams to be a valid source for analytics and transforms.

Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.

For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
2020-06-23 14:40:35 -04:00
Benjamin Trent 0cc84d3caf
[ML] wait for yellow state for stats index in tests (#58436) (#58456)
GET inference stats now reads from the .ml-stats index.

Our tests should wait for yellow state before attempting to query the index for stat information.
2020-06-23 13:32:24 -04:00
Dimitris Athanasiou f67fee387b
[7.x][ML] Make regression training set predictable in size (#58331) (#58453)
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.

This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.

Backport of #58331
2020-06-23 19:49:03 +03:00
Marios Trivyzas e7c40d973e
SQL: Relax parsing of date/time escaped literals (#58336) (#58450)
Improve the usability of the MS-SQL server/ODBC escaped
date/time/timestamp literals, by allowing timezone/offset ids
in the parsed string, e.g.:
```
{ts '2000-01-01T11:11:11Z'}
```

Closes: #58262
(cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)
2020-06-23 18:05:54 +02:00
David Roberts 0d6bfd0ac3
[7.x][ML] Fix wire serialization for flush acknowledgements (#58443)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null.  This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant.  But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.

This change corrects the discrepancies.  Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.

Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.

Backport of #58413
2020-06-23 16:42:06 +01:00
Mark Tozzi 52806a8f89
Small VS config cleanup (#58294) (#58442) 2020-06-23 10:53:06 -04:00
Benjamin Trent 61142a3005
[ML] only log if forecasts are set to failed (#58421) (#58437)
This adjusts the logging level for setting forecasts to failed to WARN. And it will only log if 1 or more forecasts were adjusted to failed.
2020-06-23 10:24:03 -04:00
Alan Woodward 8ebd341710
Add text search information to MappedFieldType (#58230) (#58432)
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.

This allows us to remove the MapperService.getLuceneFieldType() shim method.
2020-06-23 14:37:26 +01:00
Alan Woodward 519d1278e2
Make FieldTypeLookup immutable (#58162) (#58411)
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
2020-06-23 10:51:32 +01:00
David Roberts f97b37190b [ML] Add a new annotation type for categorization status changes (#58394)
Adds a new value to the "event" enum of ML annotations, namely
"categorization_status_change".

This will allow users to see when categorization was found to
be performing poorly.  Once per-partition categorization is
available, it will allow users to see when categorization is
performing poorly for a specific partition.

It does not make sense to reuse the "model_change" event that
annotations already have, because categorizer state is separate
to model state ("model" state is really anomaly detector state),
and is not reverted by the revert model snapshot API.
Therefore annotations related to categorization need to be
treated differently to annotations related to anomaly detection.
2020-06-23 09:16:27 +01:00
Rene Groeschke bd2dd81bc6
Fix deprecated property usage in archive tasks (#58269) (#58308) 2020-06-23 09:11:46 +02:00
Martijn van Groningen 7dda9934f9
Keep track of timestamp_field mapping as part of a data stream (#58400)
Backporting #58096 to 7.x branch.
Relates to #53100

* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
2020-06-22 17:46:38 +02:00
Costin Leau 765f1b5775 SQL: Fix bug in resolving aliases against filters (#58399)
When doing aliasing with the same name over non existing fields, the analyzer gets stuck in a loop trying to resolve the alias over and over leading to SO. This PR breaks the cycle by checking the relationship between the alias and the child it tries to replace as an alias should never replace its child.

Fix #57270
Close #57417
Co-authored-by: Hailei <zhh5919@163.com>

(cherry picked from commit 46786ff2e1ed5951006ff4bdd2b6ac6a1ebcf17b)
2020-06-22 16:05:42 +03:00
Przemko Robakowski a44dad9fbb
[7.x] Add support for snapshot and restore to data streams (#57675) (#58371)
* Add support for snapshot and restore to data streams (#57675)

This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.

Closes #57127

Relates to #53100

* version fix

* compilation fix

* compilation fix

* remove unused changes

* compilation fix

* test fix
2020-06-19 22:41:51 +02:00
Benjamin Trent bf8641aa15
[7.x] [ML] calculate cache misses for inference and return in stats (#58252) (#58363)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-19 09:46:51 -04:00
Stuart Tettemer 20abba8433
Scripting: Deprecate general cache settings (#55753) (#58283)
Backport: ef543b0
2020-06-18 11:54:23 -06:00
Jim Ferenczi 1c1a6d4ec8 Handle failures with no explicit cause in async search (#58319)
This commit fixes an AOOBE in the handling of fatal
failures in _async_search. If the underlying cause is not found,
this change uses the root failure.

Closes #58311
2020-06-18 18:57:58 +02:00
Przemysław Witek 9dd3d5aa48
[7.x] Delete auto-generated annotations when model snapshot is reverted (#58240) (#58335) 2020-06-18 17:59:52 +02:00
Jason Tedor be08268562
Allow follower indices to override leader settings (#58103)
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follower calls, or via auto-follow patterns. Note that not
all settings can be overrode (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follow index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, the must be implicitly copied from the leader index, not
explicitly set by the user.
2020-06-18 11:56:06 -04:00
Alan Woodward 4b8cf2af6a
Add serialization test for FieldMappers when include_defaults=true (#58235) (#58328)
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.

Fixes #58188
2020-06-18 15:46:04 +01:00
Marios Trivyzas 50b391e91b
SQL: [Docs] Fix TIME_PARSE documentation (#58182) (#58317)
TIME_PARSE works correctly if both date and time parts are specified,
and a TIME object (that contains only time is returned).

Adjust docs and add a unit test that validates the behavior.

Follows: #55223
(cherry picked from commit 9d6b679a5da88f3c131b9bdba49aa92c6c272abe)
2020-06-18 16:09:13 +02:00
Alan Woodward ca2d12d039 Remove Settings parameter from FieldMapper base class (#58237)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
2020-06-18 12:53:54 +01:00
Tanguy Leroux f3b6e41f02 Do not wrap CacheFile reentrant r/w locks with ReleasableLock (#58244)
Today the read/write locks used internally by CacheFile object are 
wrapped into a ReleasableLock. This is not strictly required and also 
prevents usage of the tryLock() methods which we would like to use 
for early releasing of read operations (#58164).
2020-06-18 11:01:53 +02:00
Andrei Dan caa5d3abe0
ILM actions check the managed index is not a DS write index (#58239) (#58295)
This changes the actions that would attempt to make the managed index read only to
check if the managed index is the write index of a data stream before proceeding.
The updated actions are shrink, readonly, freeze and forcemerge.

(cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-18 07:45:11 +01:00
Rene Groeschke abc72c1a27
Unify dependency licenses task configuration (#58116) (#58274)
- Remove duplicate dependency configuration
- Use task avoidance api accross the build
- Remove redundant licensesCheck config
2020-06-18 08:15:50 +02:00
Lee Hinman d272646a55 Fix name of template in allowed warning for DS YML test (#58273)
The warning was present, but had the incorrect template name, leading to a test failure.
2020-06-17 11:23:04 -06:00
David Roberts 3f8d16304c
Add ML admin permissions to the kibana_system role (#58172)
As part of the "ML in Spaces" project, access to the ML UI in
Kibana is migrating to being controlled by Kibana privileges.
The ML UI will check whether the logged-in user has permission
to do something ML-related using Kibana privileges, and if they
do will call the relevant ML Elasticsearch API using the Kibana
system user.  In order for this to work the kibana_system role
needs to have administrative access to ML.

Backport of #58061
2020-06-17 17:03:32 +01:00
Benjamin Trent 2de242f80e
[ML] rename EnsembleSizeInfo#inputFieldNameLengths to this.featureNameLengths (#58241) (#58253) 2020-06-17 10:08:55 -04:00
Benjamin Trent 69338b03d7
[ML] expand data_streams when assigning datafeed to node (#58175) (#58242) 2020-06-17 08:34:34 -04:00
Ignacio Vera 2d3d7ab387
mute CentroidCalculatorTests#testPolygonAsPoint (#58249) (#58250) 2020-06-17 14:32:13 +02:00
Jason Tedor b78b3edeea
Upgrade to JNA 5.5.0 (#58183)
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
2020-06-17 07:35:08 -04:00
Dimitris Athanasiou 36dbf08d47
[7.x][ML] Improve stability of stratified splitter tests (#58180) (#58224)
The main improvement here is that the total expected
count of training rows in the test is calculated as the
sum of the training fraction times the cardinality of each
class (instead of the training fraction times the total doc count).

Also relaxes slightly the error bound on the uniformity test from 0.12
to 0.13.

Closes #54122

Backport of #58180
2020-06-17 12:40:21 +03:00
Andrei Dan e17c51151b
[7.x] ILM: don't take snapshot of a data stream's write index (#58159) (#58222)
We don't allow converting a data stream's writeable index into a searchable
snapshot. We are currently preventing swapping a data stream's write index
with the restored index.

This adds another step that will not proceed with the searchable snapshot action
until the managed index is not the write index of a data stream anymore.

(cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-17 09:45:16 +01:00
Ignacio Vera 7080ba5b05
Check for degenerated lines when calculating the centroid (#58216) 2020-06-17 09:34:49 +02:00
Przemysław Witek b22e91cefc
[7.x] Delete auto-generated annotations when job is deleted. (#58169) (#58219) 2020-06-17 09:17:20 +02:00
Lisa Cawley 46d797b1d9 [DOCS] Fixes license management links (#58213) 2020-06-16 16:49:48 -07:00
Stuart Tettemer 01795d1925
Revert "Scripting: Deprecate general cache settings (#55753)" (#58201)
This reverts commit 88e8b34fc2.
2020-06-16 14:58:18 -06:00
Stuart Tettemer 88e8b34fc2
Scripting: Deprecate general cache settings (#55753)
Backport: ef543b0
2020-06-16 13:06:59 -06:00
Benjamin Trent 081da09c72
Allow GET <pattern>/_rollup/data to expand data streams (#58173) (#58177) 2020-06-16 14:01:54 -04:00
Benjamin Trent 3309817d18
[ML] fixing tree inference ctor to allow target_type to be optional (#58132) (#58165)
The tree trained model object will set its target_type to be regression by default.

This updates the inference object to behave the same way.
2020-06-16 13:29:11 -04:00
Benjamin Trent 6c03d97419
Mute TimeSeriesDataStreamsIT.testSearchableSnapshotAction (#58127) (#58181)
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-06-16 12:40:38 -04:00
Alan Woodward 12a3f6dfca
MappedFieldType should not extend FieldType (#58160)
MappedFieldType is a combination of two concerns:

* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched

We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.

Relates to #56814
2020-06-16 16:56:43 +01:00
Dan Hermann 7079a3b09f
[7.x] Prohibit freezing the write index of a data stream (#58168) 2020-06-16 09:37:32 -05:00
Yannick Welsch 1e235a7f55 Fix off-by-one on CCR lease (#58158)
The leases issued by CCR keep one extra operation around on the leader shards. This is not
harmful to the leader cluster, but means that there's potentially one delete that can't be
cleaned up.
2020-06-16 14:04:58 +02:00
David Turner 423697f414 Default to zero replicas for searchable snapshots (#57802)
Today a mounted searchable snapshot defaults to having the same replica
configuration as the index that was snapshotted. This commit changes this
behaviour so that we default to zero replicas on these indices, but allow the
user to override this in the mount request.

Relates #50999
2020-06-16 10:12:23 +01:00
Tal Levy 69d5e044af
Add optional description parameter to ingest processors. (#57906) (#58152)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.

Closes #56000.
2020-06-15 19:27:57 -07:00
Lisa Cawley 554e60860f [DOCS] Add token and HTTPS requirements for Kerberos (#57180)
Co-authored-by: Tim Vernum <tim@adjective.org>
2020-06-15 14:30:13 -07:00
Lee Hinman d56d2dfb09
[7.x] Scope index templates put during cluster upgrade tests (#58065) (#58122)
This template was added for 7.0 for what I am guessing is a BWC issue related to deprecation
warnings. It unfortunately seems to cause failures because templates for these tests are not cleared
after the test (because these are upgrade tests).

Resolves #56363
2020-06-15 10:47:36 -06:00
markharwood 03dd73dc0d
Fix for wildcard fields that returned ByteRefs not Strings to scripts. (#58060) (#58109)
This need some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.

Closes #58044
2020-06-15 14:52:56 +01:00
Alejandro Fernández Haro 3d0c8da66d Add monitor and view_index_metadata to the built-in `kibana_system` role (#57755)
Allows the kibana user to collect data telemetry in a background
task by giving the kibana_system built-in role the view_index_metadata
and monitoring privileges over all indices (*).
2020-06-15 14:40:27 +03:00
Shaunak Kashyap 5e2faad783 Add ILM policy PUT and GET for remote_monitoring_agent built-in role (#57963)
Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.

Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
2020-06-15 14:35:30 +03:00
Rene Groeschke 01e9126588
Remove deprecated usage of testCompile configuration (#57921) (#58083)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-14 22:30:44 +02:00
Jason Tedor dcf4131f00
Revert "Add JNA license to SQL CLI dependency licenses"
This reverts commit 076b32d4f3.
2020-06-12 17:04:39 -04:00
Dan Hermann 17f3318732
[7.x] Resolve index API (#58037) 2020-06-12 15:41:32 -05:00
Jason Tedor 076b32d4f3
Add JNA license to SQL CLI dependency licenses
Previously we excluded requiring licenses for dependencies with the
group name org.elasticsearch under the assumption that these use the
top-level Elasticsearch license. This is not always correct, for
example, for the org.elasticsearch:jna dependency as this is merely a
wrapper around the upstream JNA project, and that is the license that we
should be including. A recent change modified this check from using the
group name to checking only if the dependency is a project
dependency. This exposed the use of JNA in SQL CLI to this check, but
the license for it was not added. This commit addresses this by adding
the license.

Relates #58015
2020-06-12 16:38:23 -04:00
Benjamin Trent 79c784932f
[ML] allow feature_names to be optional in ensemble inference model (#58059) (#58067)
This has `EnsembleInferenceModel` not parse feature_names from the XContent.

Instead, it will rely on `rewriteFeatureIndices` to be called ahead time.

Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.
2020-06-12 16:33:54 -04:00
Mark Vieira 0ce102a5f4
Fix issue with bwc tests running wrong cluster versions (#58063)
We were previously configuring BWC testing tasks by matching on task
name prefix. This naive approach breaks down when you have versions like
1.0.1 and 1.0.10 since they both share a common prefix. This commit
makes the pattern matching more specific so we won't inadvertently
spin up the wrong cluster version.
2020-06-12 12:34:15 -07:00
Ignacio Vera c518670f83
Fix Geo grid aggregation circuit breaker tests (#58028) (#58042)
This commit makes sure we create index with only one shard.
2020-06-12 15:39:27 +02:00
Martijn van Groningen 01d8bb8cfa
Enforce valid field mapping exists for timestamp_field in templates. (#58036)
Backport of #57741 to 7.x branch.

Relates to #53100
2020-06-12 15:24:42 +02:00
David Roberts 93b693527a
[7.x][ML] Add categorizer stats ML result type (#58001)
This type of result will store stats about how well categorization
is performing.  When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.

This PR is a minimal implementation to allow the C++ side changes to
be made.  More Java side changes related to per-partition
categorization will be in followup PRs.  However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats.  Like forecast request stats the
categorizer stats can be read directly from the job's results alias.

Backport of #57978
2020-06-12 12:08:07 +01:00
markharwood 2da8e57f59
Search - add range query support to wildcard field (#57881) (#57988)
Backport to add range query support to wildcard field

Closes #57816
2020-06-12 11:30:54 +01:00
David Kyle 39020f3900
HLRC for delete expired data by job Id (#57722) (#57975)
High level rest client changes for #57337
2020-06-12 09:44:17 +01:00
Mark Tozzi 36f551bdb4
Make ValuesSourceConfig behave like a config object (#57762) (#58012) 2020-06-11 17:23:55 -04:00
Benjamin Trent 2881995a45
[ML] adding new inference model size estimate handling from native process (#57930) (#57999)
Adds support for reading in `model_size_info` objects.

These objects contain numeric values indicating the model definition size and complexity.

Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-06-11 15:59:23 -04:00
Alan Woodward 16e230dcb8 Update to lucene snapshot e7c625430ed (#57981)
Includes LUCENE-9148 and LUCENE-9398, which splits the BKD metadata, index and data into separate files and keeps the index off-heap.
2020-06-11 14:51:53 +01:00
David Roberts 54d4f2a623 [ML] Refresh annotations index on job flush and close (#57979)
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
2020-06-11 12:29:04 +01:00
David Kyle b87b147704
Add models for search to ModelLoadingService (#57592) (#57919)
ModelLoadingService only caches models if they are referenced by an 
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest 
pipeline is deleted the model it references should not be evicted if 
it is used in search.
2020-06-11 10:48:37 +01:00
David Kyle 2905a2f623
Use Search After job iterators (#57875) (#57923)
Search after is a better choice for the delete expired data iterators
where processing takes a long time as unlike scroll a context does not
have to be kept alive. Also changes the delete expired data endpoint to
404 if the job is unknown
2020-06-11 10:06:18 +01:00
Costin Leau ff0ea62cb8 EQL: Fix casing for tiebreaker field (#57943)
Use tiebreaker instead of tieBreaker

(cherry picked from commit 3c774948a5d5e10fac267cb9a54f5d0559a00c1d)
2020-06-11 00:10:19 +03:00