Commit Graph

6198 Commits

Author SHA1 Message Date
Albert Zaharovits e5dce5e805
Use the Index Access Control from the scroll search context (#60640)
When the RBACEngine authorizes scroll searches it sets the index access control
to the very limiting IndicesAccessControl.ALLOW_NO_INDICES value.
This change will set it to the value for the index access control that was produced
during the authorization of the initial search that created the scroll,
which is now stored in the scroll context.
2020-08-05 15:37:37 +03:00
Przemysław Witek 0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Yannick Welsch 9f6f66f156 Fail searchable snapshot shards on invalid license (#60722)
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After a valid license is put in place again, shards are allocated again.
2020-08-05 13:14:15 +02:00
Adrien Grand 67f6f34c23
Remove dataset.* fields. (#60720)
These are being replaced by the `data_stream.*` fields.
2020-08-05 11:35:05 +02:00
Rory Hunter 43762f69d1
Move deprecation HTTP tests to deprecation plugin (#60523)
Backport of #60298.

This PR moves the deprecation HTTP tests under the deprecation plugin, as a precursor to
adding further tests as part of #58924.
2020-08-05 09:54:34 +01:00
Adrien Grand 602d269059
Rename `datastream` to `data_stream`. (#60714)
Since the name of the feature has a space ("data stream"), the key should
have an underscore.
2020-08-05 09:55:02 +02:00
Russ Cam e9c0bf1566 Remove body from indices.create_data_stream REST spec (#60705)
This commit removes the body property from the
indices.create_data_stream.json REST API spec
as the API does not support sending a body.

Update the description of the API to remove
that a data stream can be updated with the
API - data streams can only be created with
this API and attempting to update yields a
`resource_already_exists_exception`.

Closes #60704

(cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)
2020-08-05 17:01:28 +10:00
Igor Motov 959690a64a
Refactor extendedBounds to use DoubleBounds (#60556) (#60681)
Refactors extendedBounds to use DoubleBounds instead
of 2 variables.

This is a follow up for #59175
2020-08-04 16:45:47 -04:00
Francisco Fernández Castaño b500b3d55a
Decrease restore rate limit value to enforce its usage on SearchableSnapshotsIntegTests#testMaxRestoreBytesPerSecIsUsed (#60650)
Fixes #59287. Backport of #59592
2020-08-04 17:44:47 +02:00
Alan Woodward b3ae5d26bd
Move mapper validation to the mappers themselves (#60072) (#60649)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.

This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
2020-08-04 14:39:20 +01:00
Rene Groeschke bdd7347bbf
Merge test runner task into RestIntegTest (7.x backport) (#60600)
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
2020-08-04 14:46:32 +02:00
Adrien Grand 20ae1b75bd
Rename dataset to datastream (#60638)
Co-authored-by: ruflin <spam@ruflin.com>
2020-08-04 09:58:54 +02:00
Armin Braun 7ae9dc2092
Unify Stream Copy Buffer Usage (#56078) (#60608)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-04 09:54:52 +02:00
Yang Wang 54aaadade7
API key name should always be required for creation (#59836) (#60636)
The name is now required when creating or granting API keys.
2020-08-04 13:28:47 +10:00
Tim Vernum c58e32bb27
Improve assertion failure when error is not empty (#60572)
This commit changes TokenAuthIntegTests so all occurrences of

    assertThat(x.size(), equalTo(0));

become

    assertThat(x, empty());

This means that the assertion failure message will include the
contents of the list (`x`) instead of just its size, which
facilitates easier failure diagnosis.

Relates: #56903
Backport of: #60496
2020-08-04 11:05:18 +10:00
Jake Landis bcb9d06bb6
[7.x] Cleanup xpack build.gradle (#60554) (#60603)
This commit does three things:
* Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license)
* Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack
* Removes a place holder test in favor of disabling the test task (in the async plugin)
2020-08-03 13:11:43 -05:00
Hendrik Muhs 1e01832b0c fix possible NPE introduced in #60591 2020-08-03 16:40:38 +02:00
Hendrik Muhs cd6492fc11 [Transform] fix regression of date histogram optimization (#60591)
fixes mix up of input and output field name for date histogram optimization.

minimal fix, more tests to be added with #60469

fixes #60590
2020-08-03 15:52:08 +02:00
Yannick Welsch b0d601fa63 Adjust searchable snapshot license (#60578)
No longer needs Platinum license for testing on staging.
2020-08-03 13:19:53 +02:00
Yannick Welsch 9e24a54382 Clean existing index folder when loading searchable snapshot (#60122)
Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index
folders of those preexisting shards.

This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will
make reuse of the existing index files to populate the cache.
2020-08-03 13:19:11 +02:00
Yang Wang a76fc324d4
Fix get-license test failure by ensure cluster is ready (#60498) (#60569)
When a new cluster starts, the HTTP layer becomes ready to accept incoming
requests while the basic license is still being populated in the background.
When a get license request comes in before the license is ready, it can get a
404 error. This PR fixes it by either wrapping the license check in assertBusy or
ensuring the license is ready before performing the check.

This is a backport for both #60498 and #60573
2020-08-03 19:40:03 +10:00
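The retry idea behind the get-license fix above (#60569) can be sketched roughly as follows. This is a minimal illustration only: `BusyWait.waitUntil` is a hypothetical helper written for this example, not the `assertBusy` method of the Elasticsearch test framework, and the attempt count and sleep interval are assumptions.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for an assertBusy-style helper: poll a condition
// until it holds or the attempts run out.
final class BusyWait {
    static boolean waitUntil(BooleanSupplier condition, int maxAttempts, long sleepMillis) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return true; // e.g. the license is now available
            }
            if (attempt < maxAttempts - 1) {
                try {
                    Thread.sleep(sleepMillis); // back off before the next check
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false;
    }
}
```

A test would then wrap the "license is present" check in such a loop instead of asserting once right after the HTTP layer comes up.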
Tim Vernum 1a373b0c21
Only call listener once (SP template registration) (#60567)
This fixes a bug in the IdP's template registration that would
sometimes call the listener twice.

Resolves: #54285
Resolves: #54423

Backport of: #60497
2020-08-03 13:45:16 +10:00
Andrei Dan ac258f10d6
Data streams: throw ResourceAlreadyExists exception (#60518) (#60536)
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.

(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-08-01 16:31:09 +01:00
Julie Tibshirani f1d4fd8c3e Correct name of IndexFieldData#loadGlobalDirect. (#60492)
It seems 'localGlobalDirect' was just a typo.
2020-07-31 10:53:21 -07:00
Rene Groeschke ed4b70190b
Replace immediate task creations by using task avoidance api (#60071) (#60504)
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
2020-07-31 13:09:04 +02:00
Hendrik Muhs a721d6d19b [Transform] use correct version in BWC serialization test (#60500)
use correct version in BWC serialization test

fixes #60464
2020-07-31 11:23:05 +02:00
Julie Tibshirani 8ac81a3447 Remove IndexFieldData#clear since it is unused. (#60475)
This method was never called. It also seemed tricky that calling a method on
`IndexFieldData` could clear the contents of a shared cache.
2020-07-30 14:07:55 -07:00
Lisa Cawley 9425de8dd1
[DOCS] Clarify support for run as in OIDC realms (#60246) (#60485) 2020-07-30 13:39:41 -07:00
Mark Tozzi 970a0c8957
[7.x] Aggregation tests for Wildcard Field (#58507) (#60423) 2020-07-30 08:56:21 -04:00
Przemysław Witek 9e27f7474c
Make MlDailyMaintenanceService delete jobs that are in deleting state anyway (#60121) (#60439) 2020-07-30 09:53:11 +02:00
Hendrik Muhs aaed6b59d6
[7.x][Transform] add support for missing bucket (#59591) (#60390)
add support for "missing_bucket" in group_by

fixes #42941
fixes #55102
backport #59591
2020-07-30 08:26:51 +02:00
Bogdan Pintea 8c22adc447
SQL: Add option to provide the delimiter for the CSV format (#59907) (#60420)
* SQL: Add option to provide the delimiter for the CSV format (#59907)

* Add option to provide the delimiter to the CSV fmt

This adds the option to provide the desired character as the separator
for the CSV format (the default remains comma).
A set of characters are excluded though - like CR, LF, `"` - to avoid
slipping onto the CSV-dialects slope. The tab is also forbidden, the
user needs to choose the "tsv" format explicitly.

Update the doc to make it clear that the textual CSV, TSV and TXT
formats pass the cursor back to the user through the Cursor HTTP header.

(cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc)

* Java8 fixes

- replace Set#of();
- URLDecoder#decode() requires a string (vs a charset) as 2nd arg.
2020-07-29 21:40:11 +02:00
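The delimiter rule described in the CSV commit above (#60420) might look roughly like the sketch below. The class and method names are hypothetical, and the forbidden set here contains only the characters the commit message names (CR, LF, `"`, tab); the real implementation may exclude more.

```java
// Hypothetical sketch of CSV delimiter validation; not the Elasticsearch SQL code.
final class CsvDelimiter {
    // Assumption: only the characters named in the commit message are rejected.
    private static final String FORBIDDEN = "\r\n\"\t";

    static char parse(String value) {
        if (value == null || value.length() != 1) {
            throw new IllegalArgumentException("delimiter must be a single character");
        }
        char c = value.charAt(0);
        if (FORBIDDEN.indexOf(c) >= 0) {
            throw new IllegalArgumentException("illegal delimiter character");
        }
        return c;
    }

    // Joins already-escaped cells into one row using the chosen delimiter.
    static String formatRow(char delimiter, String... cells) {
        return String.join(String.valueOf(delimiter), cells);
    }
}
```

Rejecting the tab keeps the "csv" and "tsv" formats distinct, which matches the commit's stated intent of not sliding into CSV dialects.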
Bogdan Pintea 30610d962a
Fix SYS COLUMNS schema in ODBC mode (#59513) (#60418)
* Fix SYS COLUMNS schema in ODBC mode (#59513)

* Fix SYS COLUMNS schema in ODBC mode

This fixes a regression when certain ODBC-specific columns that need to
be of the short type were returned as the integer type.

This also fixes the stubbing for the *-indices SYS COLUMN commands.

(cherry picked from commit 96d89dc9b1fd731e736ef804a16bd05496c1dea6)

* Java8 fix: avoid diamond notation in test.

Qualify anonymous class in test.
2020-07-29 21:19:32 +02:00
Bogdan Pintea 4c771485f6
SQL: fix NPE on ambiguous GROUP BY (#59370) (#60416)
* fix npe on ambiguous group by

* add tests for aggregates and group by, add quotes to error message

* add more cases for Group By ambiguity test

* change error messages for field ambiguity

* change collection aliases approach

* add locations of attributes for ambiguous grouping error

* Address review comments

- remove Comparable implementations from Attribute and Location;
- add ad-hoc comparator for sorting locations in ambiguity message;
- remove added AttributeAlias class with Tuple;
- add code comment to explain issue with Location overwriting.

* Fix c&p error in location ref generation comparator

Fix copy&paste error in dedicated comparator used for sorting ambiguity
location references.
Slightly increase its readability.

Co-authored-by: Nikita Verkhovin <verkhovin13@gmail.com>
(cherry picked from commit 9ba70a3483f0f4987229bec231cdc004f51b88a5)
2020-07-29 20:44:28 +02:00
Bogdan Pintea 79ef263fc2
Add test with alias reuse and grouping (#60396) (#60421)
Add test with alias reuse and grouping.

(cherry picked from commit 37ee819eb98fd10c1b16a61e4e1d446d0ee859de)
2020-07-29 20:43:04 +02:00
Mark Vieira 39fa1c4df0
Add compatibility testing for JDBC driver (#60409)
This commit adds compatibility testing of our JDBC driver against
different Elasticsearch versions. Although we are really testing the
forwards compatibility nature of the JDBC driver we model the testing
the same as we do existing BWC tests, that is, with the current branch
fetching the earlier versions of the artifact that is to be tested. In
this case, that's the JDBC driver itself.

Because the tests include the JDBC driver jar on its classpath we had
to change the packaging of the driver jar in order to avoid jarhell and
other conflicting dependency issues when using an old JDBC driver with
later branches. For this we simply relocate all driver dependencies in
the shadow jar under a "shadowed" package. This allows the JDBC driver
to use the correct version of Elasticsearch libs classes, while the
tests themselves use their versions. Since this required a change to the
driver jar compatibility testing can only go back as far as that version
which at the time of this commit is 7.8.1.
2020-07-29 10:45:11 -07:00
David Roberts 2a0116f51b
[ML] Take more care that memory estimation uses unique named pipes (#60405)
Prior to this change ML memory estimation processes for a
given job would always use the same named pipe names.  This
would often cause one of the processes to fail.

This change avoids this risk by adding an incrementing counter
value into the named pipe names used for memory estimation
processes.

Backport of #60395
2020-07-29 17:29:55 +01:00
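The counter-based naming described in the memory estimation commit above (#60405) can be illustrated with a small sketch. The class name and the name format are assumptions made for this example; only the idea of appending an incrementing counter comes from the commit message.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: make named pipe names unique per process launch
// by appending an incrementing counter.
final class PipeNamer {
    private final AtomicLong counter = new AtomicLong();

    // The "memory_estimate_<jobId>_<n>" format is illustrative only.
    String next(String jobId) {
        return "memory_estimate_" + jobId + "_" + counter.incrementAndGet();
    }
}
```

Two estimation processes started for the same job then get different pipe names, so neither can collide with the other.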
Armin Braun bfee7b91ff
Increase Timeouts in SLMBlockingIntegTests (#60356) (#60403)
The retention run goes through a number of steps and can randomly take more than 10s.
=> increased timeout to 30s like we did in other spots in this test

Also, noticed that we had a hard wait of 10s in this test, removed it and adjusted following
busy assert in a way that can deal with a missing snapshot (from when the assert runs before
the snapshot was put into the CS).

Closes #60336
2020-07-29 17:34:49 +02:00
Benjamin Trent 76359aaa53
[ML] always write prediction_[score|probability] for classification inference (#60335) (#60397)
In order to unify model inference and analytics results we
need to write the same fields.

prediction_probability and prediction_score are now written
for inference calls against classification models.
2020-07-29 10:58:14 -04:00
Nhat Nguyen 9d4a64e749
Allow CCR on nodes with legacy roles only (#60093)
CCR will stop functioning if the master node is on 7.8, but data nodes 
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
to work on data nodes with legacy roles only.

Relates #54146
Relates #59375
2020-07-29 10:57:31 -04:00
Benjamin Trent a6da1fd73e
[ML] require alias when indexing to an alias that should be created (#60315) (#60394)
This sets up all indexing to one of our write aliases to require it actually be an alias.

This allows failure scenarios to be captured quickly, loudly, and then potentially recovered.
2020-07-29 10:52:36 -04:00
Jim Ferenczi 578749a5e8 Fix AsyncResultsServiceTests#testRetrieveFromMemoryWithExpiration (#60337)
This change ensures that the expiration time that is set in the test
is long enough to not be triggered by a slow execution.

Closes #60255
2020-07-29 09:47:47 +02:00
Hendrik Muhs 5eb04fb413 [Transform] fix performance regression introduced in #60196 (#60276)
re-work #60196, to not skip building change collectors as otherwise date histogram only
pivots would run slow

relates #60125
2020-07-29 09:44:03 +02:00
Armin Braun 753fd4f6bc
Cleanup and optimize More Serialization Spots (#59959) (#60331)
Same as #59626 for a few more spots.
2020-07-29 07:20:44 +02:00
Yang Wang 3a0e7f4294
Unmute kerberos tests for jdk 15 and mute for jdk 8u262 (#60279)
The JDK bug (https://bugs.openjdk.java.net/browse/JDK-8246193) is fixed since b26.
The tests can be unmuted since we are already using b33. However the same bug is now
affecting jdk 8u262, which is the base for current Zulu jdk 8.48. This PR mutes the tests
for this specific jdk version.

Relates: #56507
2020-07-29 12:57:00 +10:00
Benjamin Trent 54c8936508
[ML] do not summarize importance for custom features (#60198) (#60333)
If a feature is created via a custom pre-processor,
we should return the importance for that feature.

This means we will not return the importance for the
original document field for custom processed features.

closes https://github.com/elastic/elasticsearch/issues/59330
2020-07-28 15:58:20 -04:00
Julie Tibshirani c7bfb5de41
Add search `fields` parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
James Rodewig 8edae3cd15
[DOCS] Update pre-existing data stream refs (#60289) (#60293) 2020-07-28 13:51:43 -04:00
Nhat Nguyen 416e51980c Relax ShardFollowTasksExecutor validation (#60054)
If a primary shard of a follower index is being relocated, then we
will fail to create a follow-task. This validation is too restrictive.
We should ensure that all primaries of the follower index are active
instead.

Closes #59625
2020-07-28 13:46:49 -04:00
Nhat Nguyen 6ece629ec3 Set timeout of master requests on follower to unbounded (#60070)
Today, a follow task will fail if the master node of the follower
cluster is temporarily overloaded and unable to process master node
requests (such as update mapping, setting, or alias) from a follow-task
within the default timeout. This error is transient, and follow-tasks
should not abort. We can avoid this problem by setting the timeout of
master node requests on the follower cluster to unbounded.

Closes #56891
2020-07-28 13:46:49 -04:00
Zachary Tong 9f8ec3e3fb Mute SSLDriverTests#testCloseDuringHandshakePreJDK11
Tracking issue: https://github.com/elastic/elasticsearch/issues/59992
2020-07-28 13:20:53 -04:00
Zachary Tong 46f9c38c33 Mute tests while waiting on 58807
Bugurl: https://github.com/elastic/elasticsearch/issues/58807
2020-07-28 12:45:49 -04:00
markharwood e0286e9bd3
Search - remove allow-expensive-query checks from wildcard field. (#60273) (#60308)
Removing allow-expensive-query checks because we think this field type is fast enough.

Closes #60139
2020-07-28 17:12:33 +01:00
Dimitris Athanasiou ed7dcff7c4
[7.x][ML] Audit updates on data frame analytics jobs (#60126) (#60287)
Closes #59652

Backport of #60126
2020-07-28 16:33:35 +03:00
Dimitris Athanasiou 16ffcfb9f6
[7.x][ML] Ensure bulk requests are not over memory limit (#60219) (#60283)
Data frame analytics jobs that work with very large datasets
may produce bulk requests that are over the memory limit
for indexing. This commit adds a helper class that bundles
index requests in bulk requests that steer away from the
memory limit. We then use this class both from the results
joiner and the inference runner ensuring data frame analytics
jobs do not generate bulk requests that are too large.

Note the limit was implemented in #58885.

Backport of #60219
2020-07-28 16:04:03 +03:00
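The bundling idea in the bulk request commit above (#60283) can be sketched as a size-aware batcher. This is a simplified illustration, not the helper class the commit adds: documents are plain strings, the byte limit is passed in directly, and all names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: group documents into batches whose total UTF-8 size
// stays at or under a byte limit.
final class SizeLimitedBatcher {
    static List<List<String>> batch(List<String> docs, long maxBytes) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String doc : docs) {
            long size = doc.getBytes(StandardCharsets.UTF_8).length;
            // Flush the current batch if adding this doc would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            // A single oversized doc still gets a batch of its own.
            current.add(doc);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each returned batch would correspond to one bulk request, so no request grows past the configured limit unless a single document already does.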
Dimitris Athanasiou 981e436d6c
[7.x][ML] Improve assertion on regression alias field test (#60221) (#60264)
Previously the test was asserting the prediction on each document
was within 10.0 of the expected value. It turned out that was not enough
as we occasionally saw the test failing by a small margin.

Instead of relaxing that assertion, this commit changes it to
assert the mean prediction error is less than 10.0. This should
significantly reduce the chances of the test failing.

Fixes #60212

Backport of #60221
2020-07-28 11:48:00 +03:00
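The difference between the two assertion styles in the commit above (#60264) can be shown with a short sketch. It assumes "mean prediction error" means mean absolute error; the class and method names are hypothetical and the numbers in the note below are made up for illustration.

```java
// Hypothetical sketch comparing a per-document tolerance check with a
// check on the mean absolute error.
final class PredictionError {
    static double meanAbsoluteError(double[] predicted, double[] expected) {
        if (predicted.length != expected.length || predicted.length == 0) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            sum += Math.abs(predicted[i] - expected[i]);
        }
        return sum / predicted.length;
    }

    // The stricter check: every single prediction must be within the tolerance.
    static boolean allWithin(double[] predicted, double[] expected, double tolerance) {
        for (int i = 0; i < predicted.length; i++) {
            if (Math.abs(predicted[i] - expected[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }
}
```

With predictions of 100, 112 and 95 against an expected 100, the per-document check at tolerance 10.0 fails because of the single outlier, while the mean absolute error stays below 10.0.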
James Rodewig aba785cb6e
[DOCS] Update my-index examples (#60132) (#60248)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 15:58:26 -04:00
Dan Hermann b98caf58ee
Mark data stream APIs as stable (#59860) (#60206) 2020-07-27 10:37:52 -05:00
Benjamin Trent ea3c49979e
Test mute for issue 60212 (#60214) 2020-07-27 10:10:40 -04:00
Hendrik Muhs 95c99ca887 [Transform] Fix Regression: continuous transform can fail for (date) histogram group_by(#60196)
do not create change collector if group_by configuration does not support change detection

fixes #60125
2020-07-27 14:50:03 +02:00
Dimitris Athanasiou 439b7f7e59
[7.x][ML] DFA result processor should only skip rows and model chunks on cancel (#60113) (#60193)
When the job is force-closed or shutting down due to a fatal error we clean
up all cancellable job operations. This includes cancelling the results processor.
However, this means that we might not persist objects that are written from the
process like stats, memory usage, etc.

In hindsight, we do not gain from cancelling the results processor in its
entirety. It makes more sense to skip row results and model chunks but keep
stats and instrumentation about the job as the latter may contain useful information
to understand what happened to the job.

Backport of #60113
2020-07-27 13:42:46 +03:00
David Roberts 89466eefa5
Don't require separate privilege for internal detail of put pipeline (#60190)
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest.  This was due to an internal implementation detail
that was not visible to the end user.

This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info.  The internal implementation detail now runs as
the internal _xpack user when security is enabled.

Backport of #60106
2020-07-27 10:44:48 +01:00
Dan Hermann 88e8f691af
Update index privileges doc to include data streams (#59139) (#60169) 2020-07-24 07:52:36 -05:00
Lisa Cawley 2665bfffce
[DOCS] Fix security links in machine learning APIs (#60098) (#60152) 2020-07-23 16:43:10 -07:00
Nhat Nguyen bc65b3a590 Increase timeout in AutoFollowIT (#60004)
It can take more than 10 seconds to auto-follow and create a follow-task
on a slow CI. This commit increases timeout in AutoFollowIT by replacing
assertBusy with assertLongBusy.

Closes #59952
2020-07-23 16:36:53 -04:00
Nhat Nguyen 0fe4d5df67 Increase timeout testFollowIndexWithConcurrentMappingChanges
Fixes #59273
2020-07-23 16:22:58 -04:00
Albert Zaharovits 2eaf5e1c25
[DOCS] Mapping updates are deprecated for ingestion privileges (#60024)
This PR contains the deprecation notice that `create`, `create_doc`, `index` and
`write` ingest privileges do not permit mapping updates in version 8. It also
updates the docs description of said privileges. 

This should've been part of #58784
2020-07-23 19:49:23 +03:00
James Rodewig 988e8c8fc6
[DOCS] Swap `[float]` for `[discrete]` (#60134)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 12:42:33 -04:00
Dimitris Athanasiou 6b9a362ec2
[7.x][ML] Skip test inference if DFA task has been stopped (#60116) (#60127)
If the job is stopped before starting inference on test data, we
should skip inference entirely.

Backport of #60116
2020-07-23 18:34:09 +03:00
Dan Hermann ca25f6ae6f
Include the resolve index action in the view_index_metadata privilege (#59785) (#60112) 2020-07-23 08:13:56 -05:00
Dan Hermann fe12217c7f
[7.x] Move REST specs for data streams (#60111) 2020-07-23 08:10:54 -05:00
Albert Zaharovits 3ad3a7d268 DOCS audit attributes for API Key authn (#60033)
This PR describes the new audit entry attributes api_key.id,
api_key.name and authentication.type, as well as the meaning of
existing attributes when authentication is performed using API keys.

This should've been part of #58928
2020-07-23 15:51:40 +03:00
Armin Braun ebb6677815
Formalize and Streamline Buffer Sizes used by Repositories (#59771) (#60051)
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size  for blob writes was inefficient as well.

We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.

This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.

This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HDFS repositories.
2020-07-22 21:06:31 +02:00
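The effect of the buffer size discussed in the repository commit above (#60051) can be seen in a plain stream copy loop. This is a generic sketch using only `java.io`, not the repository code; the class name and the constant are assumptions, with 128k taken from the default the commit message mentions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: copy a stream with a caller-chosen buffer size.
final class StreamCopy {
    // 128k, the default mentioned in the commit message.
    static final int DEFAULT_BUFFER_SIZE = 128 * 1024;

    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        // Each read/write pair moves at most bufferSize bytes, so a larger
        // buffer means fewer calls into the underlying streams.
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```

When every read or write carries per-call overhead, as the commit describes for access-checked repositories, a larger buffer reduces the number of such calls for big blobs, while small metadata blobs can keep using a small buffer.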
James Rodewig 74a34777d1
[DOCS] Fix outdated Kibana UI refs and screenshots in security docs (#60023) (#60059) 2020-07-22 13:08:22 -04:00
Larry Gregory a686ccc9b2
[Backport][7.x] Introduce reserved_ml_apm_user kibana privilege (#59854) (#60047) 2020-07-22 11:06:10 -04:00
Jay Modi c8ef2e18f7
Thread safe clean up of LocalNodeModeListeners (#60007)
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.

In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.

Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overridden executorName method was
removed to use the default, which runs on the same thread.

Backport of #59932
2020-07-22 08:02:18 -06:00
Dimitris Athanasiou 7e652ca873
[7.x][ML] Include same fields during test inference as in training (#… (#60034)
In #58877, when we switched test inference to Java, we just
used the doc's `_source` as features. However, this could be
missing out on features that were used during training,
e.g. alias fields, etc.

This commit addresses this by extracting fields to use as
features during inference the same way they are extracted
in `DataFrameDataExtractor` when they are used for training.

Backport of #59963
2020-07-22 12:54:13 +03:00
David Roberts 7358f9fb05 [ML] Mute ForecastIT.testOverflowToDisk in EAR builds (#60040)
Due to https://github.com/elastic/elasticsearch/issues/58806
2020-07-22 10:17:37 +01:00
James Baiera 1c1a4297e0
Track backing indices in data streams stats from cluster state (#59817) (#60015)
If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate 
counts of the number of backing indices, despite this data being accurate and available in the 
cluster state.
2020-07-21 23:21:33 -04:00
James Rodewig 401e12dc2b
[DOCS] Fix data stream docs (#59818) (#60010) 2020-07-21 17:04:13 -04:00
James Baiera b3363cf8f9
[7.x] Remove unneeded rest params from Data Stream Stats (#59575) (#59661)
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data 
Streams Stats REST endpoint. These options are required for broadcast requests, but are not 
needed for anything in terms of resolving data streams. Instead, we just set a default set of 
IndicesOptions on the transport request.
2020-07-21 15:59:16 -04:00
James Rodewig b302b09b85
[DOCS] Reformat snippets to use two-space indents (#59973) (#59994) 2020-07-21 15:49:58 -04:00
Armin Braun 5613e4b00b
Increase Timeout in testSLMRetentionAfterRestore (#59979) (#59991)
This test failed by hitting the 10s default busy assert timeout.
Given how involved the retention run is (multiple disk reads, CS updates etc.)
we should have a higher timeout here.

Also, removed the pointless delete call for the snapshot that we just asserted is gone,
at the end of the test.

Closes #59956
2020-07-21 18:19:18 +02:00
James Rodewig 4d646ca819
[DOCS] Fix typo in LDAP config docs (#59953) (#59974)
Co-authored-by: AndyHunt66 <andrew.hunt@elastic.co>
2020-07-21 10:48:08 -04:00
Nik Everett 6f6076e208
Drop some params from IndexFieldData.Builder (backport of #59934) (#59972)
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
2020-07-21 10:28:59 -04:00
Przemysław Witek 283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Yannick Welsch 07784a0b16 CCR recoveries using wrong setting for chunk sizes (#59597)
The default chunk size for CCR file-based recoveries was wrongly set to 40MB instead of 1MB.
2020-07-21 13:56:06 +02:00
Tal Levy c9ac4bf7c8 Reduce memory usage of GeoGridTiler tests (#59921)
This PR further reduces the memory footprint of the
testGeoHashGridCircuitBreaker test such that only
0.26% of the randomized runs result in memory usage of between
500kb and 1mb, and most of the runs in that range
produce ~650kb of usage. Before, 3% of the runs would use
> 50mb of memory, resulting in OOMs in CI.

Closes #59853.
2020-07-20 15:45:39 -07:00
Jay Modi 515b53d297
Fix race in SLM master/cluster state listeners (#59896)
This change fixes two possible race conditions in SLM related to
how local master changes and cluster state events are observed. When
implementing the `LocalNodeMasterListener` interface, it is only
recommended to execute on a separate threadpool if the operations are
heavy and would block the cluster state thread. SLM specified that the
listeners should run in the Snapshot thread pool, but the operations
in the listener were lightweight. This had the side effect of causing
master changes to be delayed if the Snapshot threads were all busy and
could also potentially cause the `onMaster` and `offMaster` calls to
race if both were queued and then executed concurrently. Additionally,
the `SnapshotLifecycleService` is also a `ClusterStateListener` and
there is currently no order of operations guarantee between
`LocalNodeMasterListeners` and `ClusterStateListeners` so this could
lead to incorrect behavior.

The resolution for these two issues is that the
SnapshotRetentionService now specifies the `SAME` executor for its
implementation of the `LocalNodeMasterListener` interface. The
`SnapshotLifecycleService` is no longer a `LocalNodeMasterListener` and
instead tracks local master changes in its `ClusterStateListener`.

Backport of #59801
2020-07-20 09:59:46 -06:00
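The approach in the SLM commit above (#59896), tracking local master changes from within the cluster state callback, can be sketched in a much simplified form. Everything here is hypothetical: the real listener receives a cluster changed event rather than a boolean, and the recorded event names only mirror the `onMaster`/`offMaster` callbacks for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive master transitions inside a single
// cluster-state callback, so they are observed in order on one thread.
final class MasterTracker {
    private boolean isMaster = false;
    private final List<String> events = new ArrayList<>();

    // Assumed to be called once per cluster state update, on the applier thread.
    void clusterChanged(boolean localNodeIsMaster) {
        if (localNodeIsMaster && !isMaster) {
            isMaster = true;
            events.add("onMaster");
        } else if (!localNodeIsMaster && isMaster) {
            isMaster = false;
            events.add("offMaster");
        }
        // No transition: nothing to do.
    }

    List<String> events() {
        return events;
    }
}
```

Because both transitions are computed in the same callback, they cannot be queued on a separate pool and run concurrently, which is the race the commit message describes.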
Nik Everett fcd8b5fe6e
Fix top_metrics when metric is missing (backport of #59471) (#59881)
This fixes a null pointer exception when the metric is missing for the
latest document returned by `top_metrics`.

Closes #58926
2020-07-20 10:42:58 -04:00
Rene Groeschke e31ebc96f9
Enforce fail on deprecated gradle usage (7.x backport) (#59758)
* Enforce fail on deprecated gradle usage (#59598)
* Fix branch specific deprecated gradle api usages
* Fix archiveVersion property usage
2020-07-20 08:52:30 +02:00
Albert Zaharovits 3ffb20bdfc
Fix DLS/FLS permission for the submit async search action (#59693)
The submit async search action should not populate the thread context
DLS/FLS permission set, because it is not currently authorised as an "indices request"
and hence the permission set that it builds is incomplete and it overrides the
DLS/FLS permission set of the actual spawned search request (which is built correctly).
2020-07-20 09:37:26 +03:00
Costin Leau 9cc80621c3 EQL: Fix matching of tail/desc queries (#59827)
When dealing with tail queries, data is returned descending for the base
criterion yet the rest of the queries are ascending. This caused a
problem during insertion: within a page the data is ASC, but between
pages the blocks of data are DESC.
This caused incorrect sorting inside a SequenceGroup, which led to
incorrect results.

Furthermore, in case of limit, since the data in a page is ASC, early
return is not possible and neither is desc matching. Thus the page needs to
be consumed first before finding the final results.
A future improvement could be to keep only the top N results dropping
the rest during insertion time.

(cherry picked from commit 77c88da054a1ce662a264f72cde5986d4ce37e3a)
2020-07-19 00:49:16 +03:00
Lee Hinman 8c7d414a3b
[7.x] Fix retrieving data stream stats for a DS with multiple backing indices (#59806) (#59810)
Backports the following commits to 7.x:

    Fix retrieving data stream stats for a DS with multiple backing indices (#59806)
2020-07-17 16:56:07 -06:00
Nik Everett 95e6e4a452
Small cleanup for IndexFieldData (#59724) (#59800)
This drops `IndexComponent` from `IndexFieldData` because it wasn't
doing anything other than forcing us to perform a bunch of ceremony to
build them.
2020-07-17 13:38:15 -04:00
Tal Levy c9ab7bb651
Fix bug in circuit-breaker check for geoshape grid aggregations (#57962) (#59741)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.

Fixes #57847.
2020-07-17 09:26:00 -07:00
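The ordering issue described in the geoshape circuit-breaker fix above can be illustrated with a small sketch. This is illustrative Python, not the Elasticsearch code: `CircuitBreaker`, `add_estimate`, `grow`, and the 8-bytes-per-value figure are all hypothetical names and assumptions. The only point it shows is that the growth is reserved with the breaker before the array is enlarged.

```python
class CircuitBreaker:
    """Toy breaker: tracks reserved bytes and refuses to exceed a limit."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.used_bytes = 0

    def add_estimate(self, num_bytes):
        if self.used_bytes + num_bytes > self.limit_bytes:
            raise MemoryError("circuit breaker tripped")
        self.used_bytes += num_bytes


def grow(values, new_size, breaker, bytes_per_value=8):
    """Grow `values` to `new_size`, reserving the extra bytes before allocating."""
    extra_slots = new_size - len(values)
    if extra_slots > 0:
        # Account for the growth first; if the breaker trips, nothing is allocated.
        breaker.add_estimate(extra_slots * bytes_per_value)
        values.extend([0] * extra_slots)
    return values
```

If the reservation fails, the list keeps its old length and the breaker's usage is unchanged, which is the property the fix is after.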
Benjamin Trent b7f30fc929
[7.x] Adding new `require_alias` option to indexing requests (#58917) (#59769)
* Adding new `require_alias` option to indexing requests (#58917)

This commit adds the `require_alias` flag to requests that create new documents.

This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.

When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.

This is useful when an alias is required instead of a concrete index.

closes https://github.com/elastic/elasticsearch/issues/55267
2020-07-17 10:24:58 -04:00
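The `require_alias` behavior described in the commit above can be sketched as a small decision function. This is a simplified model in Python, not Elasticsearch code: `resolve_target`, `aliases`, `indices`, and the returned action strings are hypothetical, and the real `action.auto_create_index` setting (which also accepts patterns) is reduced to a boolean here.

```python
def resolve_target(name, aliases, indices, require_alias=False, auto_create_index=True):
    """Return the action a write to `name` would take in this simplified model."""
    if name in aliases:
        return "write_to_alias"
    if require_alias:
        # With require_alias=true the destination must be an alias,
        # so the request is rejected instead of auto-creating an index.
        raise ValueError(f"[{name}] is not an alias and require_alias is set")
    if name in indices:
        return "write_to_index"
    if auto_create_index:
        # Flag unset or false: fall back to the auto-create setting.
        return "create_index_then_write"
    raise ValueError(f"no such index [{name}] and auto-create is disabled")
```

For example, `resolve_target("logs", {"logs"}, set(), require_alias=True)` takes the alias path, while the same call with an empty alias set raises instead of creating an index.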
Alan Woodward b29d368b52
Convert DateFieldMapper to parametrized format (#59429) (#59759)
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
2020-07-17 12:46:18 +01:00
Andrei Dan 301d61a98e
Tests: fix TimeSeriesDataStreamsIT.testShrinkActionInPolicyWithoutHotPhase (#59603) (#59689)
The ILM policies for the source and shrunk indices run separately (i.e. they
are two separate managed indices). This fixes the test which exhibited some
flakiness by allowing some time for the ILM policy for the source index
to finish executing.

(cherry picked from commit c78d5e8499fc5ca2ca1314f97bcc6b55ba06e2e7)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-17 11:26:06 +01:00
Andrei Stefan d513e1090f
Do not create the index, if it's already there (#59745) (#59747)
(cherry picked from commit d097447d257efdf0a36b1157e1f177aed86ecca1)
2020-07-17 11:38:30 +03:00
Tanguy Leroux 4827fec1cf
Revert "Mute AzureSearchableSnapshotsIT (#58775)" (#59749)
This reverts commit 74a78b3a7b.
2020-07-17 10:02:46 +02:00
Martijn van Groningen 74c9402912
Re-enable data stream bwc tests (#59734)
after backporting #59503
Backport of #59732 to 7.x
2020-07-16 23:59:52 +02:00
Martijn van Groningen 0096238df1
Replaced _data_stream_timestamp meta field's 'path' option with 'enabled' option (#59727)
Backport #59503 to 7.x

and adjusted exception messages.

Relates to #59076
2020-07-16 22:29:40 +02:00
Igor Motov 2408803fad
Adds hard_bounds to histogram aggregations (#59175) (#59656)
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
2020-07-16 15:31:53 -04:00
Martijn van Groningen 4089cbd767
Ignore multiple matching templates warning in specific tests. (#59692) (#59715)
Closes #59679
2020-07-16 20:07:38 +02:00
Marios Trivyzas c7efbc1b83
SQL: Implement DATE_PARSE function for parsing strings into DATE values (#57391) (#59699)
Implement DATE_PARSE(<date_str>, <pattern_str>) function
which allows parsing a date string according to the specified
pattern into a date object. The patterns allowed are those of
java.time.format.DateTimeFormatter.

Closes #54962

Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com>

(cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)
2020-07-16 17:24:30 +02:00
Benjamin Trent a28547c4b4
[7.x] [ML] add new `custom` field to trained model processors (#59542) (#59700)
* [ML] add new `custom` field to trained model processors (#59542)

This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 10:57:38 -04:00
Nik Everett 343053c0a7 Fix compilation in Eclipse (backport #59675)
Eclipse was confused by #59583. It can't see the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
2020-07-16 08:25:12 -04:00
David Kyle c349fdcb89 Mute RegressionIT testWithDataStream (#59687)
For #59664
2020-07-16 09:45:29 +01:00
Przemysław Witek df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Nhat Nguyen b599f7a9c0
Fix estimate size of translog operations (#59206)
Make sure that the estimateSize method includes all fields of translog operations.
2020-07-16 00:19:30 -04:00
Yang Wang 067db1fc3b
Fix test of API key creation in a mixed cluster (#59680)
RoleDescriptors are mandatory prior to v7.3

Relates: #59425
2020-07-16 12:44:17 +10:00
Costin Leau 5f2285a8b3 EQL: Fix bug in returning results (#59673)
Using serialization/deserialization when dealing with non-trivial
documents causes the process to get stuck, not to mention that it is expensive.
Use a much simpler approach at the expense of losing information
(we're just interested in the source after all).

(cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)
2020-07-16 01:01:13 +03:00
Julie Tibshirani 2b70758a05 Correct type parametrization in geo mappers. (#59583)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
2020-07-15 14:10:47 -07:00
Martijn van Groningen f1028fbbcc
Only install stack templates via elected master node (#59624) (#59657)
to avoid many error stacktraces in logs during a rolling upgrade.

Stack templates use the composable index template and component APIs. These APIs
aren't supported in 7.7 and earlier and in mixed cluster
environments this can cause a lot of ActionNotFoundTransportException
errors in the logs during rolling upgrades. If these templates
are only installed via elected master node then the APIs are always
there and the ActionNotFoundTransportException errors are then prevented.
2020-07-15 22:22:01 +02:00
Lee Hinman 74372df824
Mute {p0=mixed_cluster/120_api_key_auth/Test API key authentication will work in a mixed cluster} (#59663)
Relates to #59425
2020-07-15 14:14:33 -06:00
Nhat Nguyen 93d419b9c8 Mute CcrRollingUpgradeIT
Tracked at #59625
2020-07-15 14:43:32 -04:00
David Kyle df7fc8f967
Accounting for model size when models are not cached (#59607)
When an inference model is loaded it is accounted for in circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
2020-07-15 18:06:15 +01:00
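A minimal sketch of the reference-counting idea in the commit above, assuming a toy breaker with a single `used_bytes` counter. `ToyBreaker`, `RefCountedModel`, `acquire`, and `release` are hypothetical names for illustration, and the sketch is not thread-safe; the actual implementation may differ in all of these respects.

```python
class ToyBreaker:
    """Stand-in for a circuit breaker: only tracks accounted bytes."""

    def __init__(self):
        self.used_bytes = 0


class RefCountedModel:
    """Accounts for the model's size on load and releases it with the last user."""

    def __init__(self, size_bytes, breaker):
        self._size_bytes = size_bytes
        self._breaker = breaker
        self._refs = 1  # the loader holds the first reference
        breaker.used_bytes += size_bytes

    def acquire(self):
        if self._refs <= 0:
            raise RuntimeError("model already released")
        self._refs += 1

    def release(self):
        if self._refs <= 0:
            raise RuntimeError("model already released")
        self._refs -= 1
        if self._refs == 0:
            # Only when no users remain is the memory handed back to the breaker.
            self._breaker.used_bytes -= self._size_bytes
```

The accounted bytes stay reserved while any caller still holds a reference, so a model that is in use but not cached is still counted.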
David Turner 691759fb1f
Validate snapshot UUID during restore (#59601)
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.

This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.

Relates #50999
2020-07-15 16:23:20 +01:00
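The check introduced by the commit above can be pictured roughly as follows. This Python sketch models the repository as a plain dict from snapshot name to UUID; `mount_snapshot` and the returned `restore` callable are hypothetical names, and the real mount and restore steps are of course far more involved.

```python
def mount_snapshot(repository, snapshot_name):
    """Capture the snapshot UUID at mount time and verify it again at restore time."""
    expected_uuid = repository[snapshot_name]

    def restore():
        current_uuid = repository.get(snapshot_name)
        if current_uuid != expected_uuid:
            # The snapshot was deleted or replaced by one with the same name.
            raise RuntimeError(
                f"snapshot [{snapshot_name}] changed: expected {expected_uuid}, got {current_uuid}"
            )
        return f"restored {snapshot_name} ({current_uuid})"

    return restore
```

Replacing the dict entry for the same name between `mount_snapshot` and `restore()` makes the restore fail rather than proceed against a different snapshot.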
Costin Leau 6b75525efb EQL: Improve testing spec (#59615)
Case sensitivity is incorporated as a test dimension - instead of
running the same test twice, two different tests are created.
Clean-up the test invocation by removing unused parameters.

Fix #59294

(cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)
2020-07-15 18:07:24 +03:00
Igor Motov b5ab447b3e
EQL: Fix async EQL Rest test (#59556) (#59620)
Unfortunately, we cannot guarantee that the execution will be truly
async even with 0ms timeout since we cannot block the execution. So, we need
to modify the test to work in both async and non-async mode.

Closes #59416
2020-07-15 11:02:33 -04:00
Martijn van Groningen 2a89e13e43
Move data stream transport and rest action to xpack (#59593)
Backport of #59525 to 7.x branch.

* Actions are moved to xpack core.
* Transport and rest actions are moved to the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longer deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
2020-07-15 16:50:44 +02:00
Martijn van Groningen 53249dcca8
No need to select only < 7.9 nodes in 7.x branch. (#59609) 2020-07-15 15:23:16 +02:00
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
Tanguy Leroux 604f22db79
Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590)
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.

Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situations arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.

This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.

This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complete integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
2020-07-15 11:45:52 +02:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide their own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Yannick Welsch bc11503dc3 Wait for active license in CcrRestIT (#59543)
Relates #53966

Closes #59486
2020-07-15 09:38:08 +02:00
Tal Levy 4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long-running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've otherwise become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` instead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can be used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Ryan Ernst 3b688bfee5
Add license feature usage api (#59342) (#59571)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 14:34:59 -07:00
James Rodewig e5baacbe2e
[DOCS] Simplify index template snippets for data streams (#59533) (#59553)
Removes the `@timestamp` field mapping from several data stream index
template snippets.

With #59317, the `@timestamp` field defaults to a `date` field data type
for data streams.
2020-07-14 17:28:43 -04:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Costin Leau 679619c798 EQL: Improve retrieval of results (#59552)
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
2020-07-14 23:53:57 +03:00
Albert Zaharovits 6d6d565eeb
Fix auditing of nameless API Keys (#59531)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
2020-07-14 23:46:25 +03:00
Albert Zaharovits 4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields into `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices`) so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of making the concurrent loading of index metadata for multiple snapshots of the same index much more efficient, speeding up future concurrent snapshot deletes
2020-07-14 22:18:42 +02:00
David Kyle 0d2ea1b881
Check for ml privilege when using the Inference Aggregation (#59530) (#59562)
The inference pipeline aggregation requires the user has permission to access
the ml get trained models endpoint (_ml/inference/)
2020-07-14 20:53:40 +01:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Dan Hermann 70fe553ce0
[7.x] Reenable BWC tests for data streams (#59538) 2020-07-14 13:35:52 -05:00
Albert Zaharovits b1e4233806
Fix auditing of API Key authn without the owner realm name (#59470)
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .

This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.

Closes #59425
2020-07-14 21:35:29 +03:00
Dimitris Athanasiou ee4610c0ca
[7.x][ML] Rename cross validation splitter package (#59529) (#59544)
Renames and moves the cross validation splitter package.

First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.

Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.

Backport of #59529
2020-07-14 18:54:46 +03:00
Dimitris Athanasiou 37406487b9
[7.x][ML] Improve error for non-included field with unsupported type (#59424) (#59541)
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.

This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.

Backport of #59424
2020-07-14 18:54:34 +03:00
Andrei Stefan 1fd16ffb70
Add license header to EqlStatsIT.java (#59537) 2020-07-14 18:45:13 +03:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Nhat Nguyen 4d7c59bedb
Assign follower primary to nodes with remote cluster client role (#59375)
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-07-14 11:23:55 -04:00
Dimitris Athanasiou e302c66847
[7.x][ML] Fix NPE when starting classification with missing dependent_variable (#59524) (#59540)
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.

This commit is fixing this issue.

Backport of #59524
2020-07-14 17:56:55 +03:00
Andrei Dan d477aa14ef
Data Streams: fix bwc test (#59528) (#59534)
(cherry picked from commit ed1a5c00abed8c63ad395ea93df7a303da7b7a65)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 15:17:20 +01:00
Andrei Stefan cf752992d6
Add telemetry metrics (#59526) 2020-07-14 16:25:24 +03:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
David Kyle d86435938b
[7.x] Add ml licence check to the pipeline inference agg. (#59213) (#59412)
Ensures the licence is sufficient for the model used in inference
2020-07-14 14:03:10 +01:00
Yang Wang f651487d74
Support prefix search for API key names (#59113) (#59520)
This PR adds minimal support for prefix search of API key names. It only touches the API key name and leaves all other query parameters, e.g. realm name and username, unchanged.
2020-07-14 22:06:20 +10:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Yang Wang 2e71d0aa91
Allow mixed usage of boolean and string when merging OIDC claims (#59112) (#59512)
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.

This PR introduces a small leniency so that it will merge a boolean with a string that has the value of the boolean's string representation. In other words, it will merge true with "true" and false with "false", but nothing else.
2020-07-14 20:41:16 +10:00
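The leniency described in the OIDC commit above can be sketched as a small merge function. This is an illustrative Python model, not the actual implementation: `merge_claim` is a hypothetical name, and returning the boolean form of the merged value is an assumption, since the commit message does not say which representation is kept.

```python
def merge_claim(id_token_value, user_info_value):
    """Merge two values of the same claim, tolerating bool vs. "true"/"false"."""
    same_type = type(id_token_value) is type(user_info_value)
    if same_type and id_token_value == user_info_value:
        return id_token_value
    pairs = ((id_token_value, user_info_value), (user_info_value, id_token_value))
    for maybe_bool, maybe_str in pairs:
        if isinstance(maybe_bool, bool) and isinstance(maybe_str, str):
            # Only the exact string representation of the boolean is accepted.
            if maybe_str == str(maybe_bool).lower():
                return maybe_bool  # assumption: keep the boolean form
    raise ValueError(
        f"cannot merge claim values {id_token_value!r} and {user_info_value!r}"
    )
```

Anything outside the two accepted pairings, such as `True` with `"false"` or `True` with `"True"`, still fails to merge in this sketch.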
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precedence mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Costin Leau 5580eb61ed EQL: Improve sequence limiting (#59439)
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.

Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.

(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
2020-07-14 13:19:09 +03:00
Hendrik Muhs c8290167a0
[7.x][Transform] separate pivot and extract function interface (#59505)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.

piggy backed fixes:
 - when running geo tile group_by it could fail due to query clause limit (unreleased)
 - new style page size using settings was not validating limit of 10k (7.8)
2020-07-14 11:27:16 +02:00
Martijn van Groningen 5f24be1bc1
Also set system property when running test task. (#59499)
Closes #59488
2020-07-14 10:34:52 +02:00
Rene Groeschke d5c11479da
Remove remaining deprecated api usages (#59231) (#59498)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
2020-07-14 10:25:00 +02:00
David Roberts 529aa345df [ML] Account for per-partition categorization in model memory estimate (#59458)
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
2020-07-14 09:16:28 +01:00
Yang Wang 4350add12c
Allow null name when deserialising API key document (#59485) (#59496)
API keys can be created without names using the grant API key action. This is considered a bug (#59484). Since the feature has already been released, we need to accommodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
2020-07-14 16:08:32 +10:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Lee Hinman 81bdb20b8a Fix license header for DataStreamRestIT 2020-07-13 14:41:29 -06:00
Lee Hinman bf1a60130d
[7.x] Add telemetry for data streams (#59433) (#59454)
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:

```
  ...
  "data_streams" : {
    "available" : true,
    "enabled" : true,
    "data_streams" : 3,
    "indices_count" : 17
  }
  ...
```
2020-07-13 14:30:11 -06:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Dimitris Athanasiou a7895ff458
[7.x][ML] Remove unused member var from ExtractedFieldsDetector (#59395) (#59406)
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.

Backport of #59395

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-13 19:10:43 +03:00
Igor Motov 1acb4aeba9
EQL: Prepare for release (#59331) (#59426)
Enables eql setting in release builds.

Relates #51613
2020-07-13 11:54:32 -04:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Yang Wang cc9166a5ea Mute failed 120_api_key_auth test till #59425 is addressed. 2020-07-14 01:10:36 +10:00
Yang Wang edf27cd765 Adjust BWC versions for API key auth test.
API key realm name is not available in authentication metadata prior to
v7.5. The issue is tracked at #59425
2020-07-14 00:38:42 +10:00
David Roberts b5e8250a4e
[ML] Drive categorization warning notifications from annotations (#59393)
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work.  However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to.  Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes.  (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list.  This means it's best to have both.)

Backport of #59377
2020-07-13 15:28:57 +01:00
David Kyle 054d5236d4 Mute RegressionIT failure (#59414)
For #59413
2020-07-13 14:12:19 +01:00
Yang Wang a84469742c
Improve role cache efficiency for API key roles (#58156) (#59397)
This PR ensures that the same roles are cached only once, even when they come from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
2020-07-13 22:58:11 +10:00
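The caching idea in #58156 (hash the raw role-descriptor bytes, use the hash as the cache key, and deserialise only on a miss) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the Elasticsearch code: `RoleBytesCacheSketch`, `keyFor`, `rolesFor`, the SHA-256 choice and the comma-separated "descriptor" format are all hypothetical stand-ins.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: cache parsed roles under a hash of their raw bytes.
class RoleBytesCacheSketch {
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    final AtomicInteger parseCount = new AtomicInteger();

    // Identical raw bytes always produce the same key, whichever API key they came from.
    static String keyFor(byte[] rawDescriptors) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(rawDescriptors);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for "raw bytes -> RoleDescriptors"; here a comma-separated list of names.
    private List<String> parse(byte[] rawDescriptors) {
        parseCount.incrementAndGet();
        return Arrays.asList(new String(rawDescriptors, StandardCharsets.UTF_8).split(","));
    }

    // Deserialise only when the hash is not already cached.
    List<String> rolesFor(byte[] rawDescriptors) {
        return cache.computeIfAbsent(keyFor(rawDescriptors), k -> parse(rawDescriptors));
    }

    public static void main(String[] args) {
        RoleBytesCacheSketch cache = new RoleBytesCacheSketch();
        byte[] fromKeyOne = "role_a,role_b".getBytes(StandardCharsets.UTF_8);
        byte[] fromKeyTwo = "role_a,role_b".getBytes(StandardCharsets.UTF_8);
        cache.rolesFor(fromKeyOne);
        cache.rolesFor(fromKeyTwo);
        System.out.println("parsed " + cache.parseCount.get() + " time(s)");
    }
}
```

Two API keys carrying byte-identical descriptors hit the same cache entry, so the parse step runs once.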
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Dan Hermann 7fa9cf601b
Data stream support for rollup search 2020-07-10 11:13:34 -05:00
Alan Woodward 4b9cbfca64 Remove test backported in error 2020-07-09 21:45:41 +01:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Lisa Cawley 54483394ae
[DOCS] Clarify subscription requirements (#58958) (#59307) 2020-07-09 12:24:45 -07:00
Dan Hermann c7e977701a
Data stream support for async search 2020-07-09 13:12:04 -05:00
Dan Hermann b9fb12924b
Data stream support for EQL search 2020-07-09 13:10:44 -05:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_num_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Costin Leau d9c1e531db EQL: Introduce until functionality (#59292)
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.

(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
2020-07-09 17:12:01 +03:00
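The `until` rule described in #59292 can be read as a simple timestamp check: a sequence is kept only if every one of its events occurs strictly before the `until` hit. The sketch below assumes events are reduced to numeric timestamps; `UntilSketch` and `keepSequence` are hypothetical names, not part of the EQL engine.

```java
// Hypothetical sketch of the "until" rule for one candidate sequence.
class UntilSketch {
    // Returns false as soon as any event occurs at, or after, the until hit.
    static boolean keepSequence(long[] eventTimestamps, long untilTimestamp) {
        for (long ts : eventTimestamps) {
            if (ts >= untilTimestamp) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(keepSequence(new long[] {10, 20, 30}, 40));
        System.out.println(keepSequence(new long[] {10, 20, 40}, 40));
    }
}
```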
Dimitris Athanasiou d07b11b86b
[7.x][ML] Perform test inference on java (#58877) (#59298)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.

Backport of #58877

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-07-09 16:30:49 +03:00
David Kyle 86555ec163
Remove unused function InferenceIndexConstants.mapping() (#59146) (#59158)
InferenceIndexConstants.mapping() is broken and unused.
2020-07-09 14:28:53 +01:00
Andrei Stefan d187b531ed
EQL: Give a name to all toml tests and enforce the naming of new tests (#59283) (#59295)
(cherry picked from commit c8ffe3c9237d3cdd90331795b8e37517155b7e91)
2020-07-09 16:20:29 +03:00
David Kyle dbb9c802b1
Better error message when the model cannot be parsed due to its size (#59166) (#59209)
The actual cause can be lost in a long list of parse exceptions;
this surfaces the cause when the problem is size.
2020-07-09 13:43:46 +01:00
David Kyle c5443f78ce
Add Inference Pipeline aggregation to HLRC (#59086) (#59250)
Adds InferencePipelineAggregationBuilder to the HLRC duplicating 
the server side classes
2020-07-09 13:38:45 +01:00
Daniel Mitterdorfer 10ef4d2140
Mute testMaxRestoreBytesPerSecIsUsed (#59289)
Relates #59287
2020-07-09 12:52:17 +02:00
Alan Woodward 67a27e2b9d Add declarative parameters to FieldMappers (#58663)
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
2020-07-09 11:43:21 +01:00
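To make the declarative style from #58663 concrete, here is a minimal sketch of a parameter that knows its own name, default value and updateability, and that detects conflicts at merge time. It is an illustration only: `ParameterSketch.Parameter` and its methods are hypothetical and far simpler than the real `ParametrizedFieldMapper` infrastructure, which also covers parsing and serialization.

```java
import java.util.Objects;

// Hypothetical, simplified take on a declaratively described mapper parameter.
class ParameterSketch {
    static final class Parameter<T> {
        final String name;
        final boolean updateable;
        private T value;

        Parameter(String name, boolean updateable, T defaultValue) {
            this.name = name;
            this.updateable = updateable;
            this.value = defaultValue;
        }

        T get() {
            return value;
        }

        void set(T newValue) {
            this.value = newValue;
        }

        // Merge the value from another mapper's parameter; conflicts surface here.
        void merge(Parameter<T> incoming) {
            if (Objects.equals(value, incoming.value)) {
                return;
            }
            if (!updateable) {
                throw new IllegalArgumentException(
                    "Cannot update parameter [" + name + "] from [" + value + "] to [" + incoming.value + "]");
            }
            value = incoming.value;
        }
    }

    public static void main(String[] args) {
        Parameter<Boolean> existing = new Parameter<>("doc_values", false, true);
        Parameter<Boolean> incoming = new Parameter<>("doc_values", false, true);
        incoming.set(false);
        try {
            existing.merge(incoming);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because every conflict is caught while merging into a builder-like holder, the object built afterwards can keep all of its values final.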
Daniel Mitterdorfer daa48329ec
[TEST] Mute FollowerFailOverIT.testFailOverOnFollower (#58659) (#59286)
Relates #58534

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
2020-07-09 12:38:36 +02:00
Albert Zaharovits 2b7456db7f
Improve auditing of API key authentication #58928
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
2020-07-09 13:26:18 +03:00
Dimitris Athanasiou d323f8d698
[ML] Add REST spec for the update data frame analytics endpoint (#59253) (#59281)
Closes #59148

Backport of #59253
2020-07-09 13:12:21 +03:00
Ignacio Vera 1ad00d1ceb
Add Support in geo_match enrichment policy for any type of geometry (#59276)
geo_match enrichment works currently only with points. This change adds the ability to
use any type of geometry.
2020-07-09 11:41:41 +02:00
Andrei Stefan c0e0bca84c
Remove search_after and implicit_join_key_field (#59232) (#59280)
(cherry picked from commit 6ede6c59eff321b9fedad30e19508b9e4f788b54)
2020-07-09 12:34:01 +03:00
Bogdan Pintea acfff7b896
Add sample versions of standard deviation and variance funcs (#59093) (#59274)
* Add sample versions of standard deviation and variance functions (#59093)

* Add STDDEV_SAMP, VAR_SAMP

This commit adds the sampling variations of the standard deviation and
variance agg functions.

(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)

* Fix: workaround for lack of Map#of() in Java8

Replace Map#of() with a HashMap static init.
2020-07-09 10:17:13 +02:00
Ignacio Vera 14ab35e323
Fix numerical error in CentroidCalculatorTests#testPolygonAsPoint (#59012) (#59272) 2020-07-09 08:42:07 +02:00
Lee Hinman bb1c53a0f5
Allow warnings about 'global' template in upgrade tests (#59242)
These tests sometimes install a template so they can be compatible with older versions, but they run
afoul of the occasionally installed "global" template which changes the default number of shards.

This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they
are not (since the global template is only randomly installed).

Resolves #58807
Resolves #58258
2020-07-08 13:40:55 -06:00
Armin Braun cc3c8be0f1
Fix SLMSnapshotBlockingIntegTests.testSnapshotInProgress (#59218) (#59239)
Waiting for `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.

relates #59140
2020-07-08 19:13:01 +02:00
James Rodewig 838f717e5f
[DOCS] Add data streams to security docs (#59084) (#59237) 2020-07-08 12:53:56 -04:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that causes timestamp field validation to be performed
  for each template instead of for the final mappings that are created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
David Turner 6ffdb19a2a Clean searchable snapshots cache on startup (#59009)
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.

Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
2020-07-08 15:17:52 +01:00
Nik Everett a29d3515a2
Improve cardinality measure used to build aggs (#56533) (#59107)
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket form, like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-08 08:42:23 -04:00
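The three-valued `parentCardinality` described in #56533 can be pictured as an upper bound that each aggregator passes down to its children. The sketch below is an assumption-laden illustration: the enum name `Cardinality` and the `multiply` helper are hypothetical, chosen only to show how `NONE`, `ONE` and `MANY` could be carried forward through single-bucket and multi-bucket parents.

```java
// Hypothetical sketch of carrying a NONE/ONE/MANY bound down an aggregation tree.
class CardinalitySketch {
    enum Cardinality {
        NONE, ONE, MANY;

        // Bound seen by a child, given how many buckets this level can produce per parent bucket.
        Cardinality multiply(int bucketsPerParent) {
            if (this == NONE || bucketsPerParent == 0) {
                return NONE;
            }
            if (this == ONE && bucketsPerParent == 1) {
                return ONE;
            }
            return MANY;
        }
    }

    public static void main(String[] args) {
        // A top-level `filter` (one bucket) keeps the bound at ONE for its children.
        System.out.println(Cardinality.ONE.multiply(1));
        // A `range` with several ranges pushes the bound to MANY.
        System.out.println(Cardinality.ONE.multiply(3));
    }
}
```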
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the physical repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Costin Leau 3e32d060bf EQL: Fix bug in skipping window (#59196)
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)

(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
2020-07-08 14:36:39 +03:00
Yannick Welsch 0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
Yang Wang a6109063a2
Even more robust test for API key auth 429 response (#59159) (#59208)
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication requests.
2020-07-08 16:43:07 +10:00
Nhat Nguyen ef5c397c0f
Sending operations concurrently in peer recovery (#58018)
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small number of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.

With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.

Backport of #58018
2020-07-07 22:03:31 -04:00
Albert Zaharovits d4a0f80c32
Ensure authz role for API key is named after owner role (#59041)
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
2020-07-07 23:26:57 +03:00
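The role composition described in #59041 can be sketched as a set intersection that keeps the owner role's names. This is only an illustration with privileges reduced to plain strings; `LimitedRoleSketch` and `forApiKey` are hypothetical names, and real roles carry far richer privilege structures.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: effective API key role = owner privileges intersected with key privileges.
class LimitedRoleSketch {
    final Set<String> names;
    final Set<String> privileges;

    LimitedRoleSketch(Set<String> names, Set<String> privileges) {
        this.names = Collections.unmodifiableSet(new TreeSet<>(names));
        this.privileges = Collections.unmodifiableSet(new TreeSet<>(privileges));
    }

    // The privileges are narrowed, but the names are taken from the owner role unchanged.
    static LimitedRoleSketch forApiKey(LimitedRoleSketch ownerRole, Set<String> keyPrivileges) {
        Set<String> effective = new TreeSet<>(ownerRole.privileges);
        effective.retainAll(keyPrivileges);
        return new LimitedRoleSketch(ownerRole.names, effective);
    }

    public static void main(String[] args) {
        LimitedRoleSketch owner = new LimitedRoleSketch(
            new TreeSet<>(Arrays.asList("logs_admin")),
            new TreeSet<>(Arrays.asList("read", "write", "manage")));
        LimitedRoleSketch effective = forApiKey(owner, new TreeSet<>(Arrays.asList("read", "monitor")));
        System.out.println(effective.names + " " + effective.privileges);
    }
}
```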
Benjamin Trent e343e066fc
[7.x] [ML] prefer secondary auth headers on evaluate (#59167) (#59183)
* [ML] prefer secondary auth headers on evaluate (#59167)

We should prefer the secondary auth headers when evaluating a data frame
2020-07-07 15:34:47 -04:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Nik Everett 93ff5bf9c8
Remove blocking from inference pipeline builder (#59096) (#59162)
This removes the blocking model lookup from the `inference` aggregator's
builder by integrating it into the request rewrite process that loads
stuff asynchronously.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-07 12:31:17 -04:00
Armin Braun 6dec2cf722
Fix SLM Tests Leaking Snapshot Operation (#59150) (#59155)
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
2020-07-07 17:19:06 +02:00
Rene Groeschke a896df53ac
Remove misc dependency related deprecation warnings (7.x backport) (#59122)
* Fix dependency related deprecations (#58892)
* Fix classpath setup for forbiddenapi usage
2020-07-07 17:10:31 +02:00
Christoph Büscher 7c64a1bd7b Muting failing ApiKeyIntegTests 2020-07-07 16:02:59 +02:00
Yang Wang f84b76661d
Make test more robust for API key auth 429 (#59077) (#59136)
Adds error handling when filling up the queue of the crypto thread pool. Also reduce queue size of the crypto thread pool to 10 so that the queue can be cleared out in time.

Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failure on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, it may be possible that there are lingering requests not fully cleared out from the queue. When it happens, we will not be able to push all 1000 tasks into the queue. But since what we need is just queue saturation, so as long as we can be sure that the queue is fully filled, it is safe to ignore rejection error and just move on.

Clearing out 1000 tasks also takes some time, which could cause the test suite to time out. This PR changes the queue size to 10 so the tests have a better chance of completing in time.
2020-07-07 22:27:10 +10:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
Ignacio Vera 5cc6457ed8
upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091) (#59120) 2020-07-07 12:07:41 +02:00
Armin Braun d6d6df16bb
Share IT Infrastructure between Core Snapshot and SLM ITs (#59082) (#59119)
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
2020-07-07 12:04:41 +02:00
David Roberts e217f9a1e8
[ML] Wait for shards to initialize after creating ML internal indices (#59087)
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices.  Currently this can result in attempts
to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.

Backport of #59027
2020-07-07 10:52:10 +01:00
Francisco Fernández Castaño 0752a86fe5
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)
* Enforce higher priority for RepositoriesService ClusterStateApplier

This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.

Backport of #58808
2020-07-07 09:51:08 +02:00
Jake Landis 604c6dd528
7.x - Create plugin for yamlTest task (#56841) (#59090)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
Which source set is the target for the test resources depends on whether this
new plugin is applied. If it is not applied, it defaults to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 14:16:26 -05:00
Costin Leau f9c15d0fec EQL: Introduce sequencing fetch size (#59063)
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, testing now uses a minimum fetch size, which exposed a number of bugs that have since been addressed:

- Jumping across data across queries, causing valid data to be seen as a gap.
- Incorrectly resuming searching across pages (again causing data to be discarded).

(cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)
2020-07-06 19:14:26 +03:00
Costin Leau b2e9c6f640 Update UnresolvedRelationTests
UnresolvedRelation does not care about its source during equality hence
ignore it when doing randomized mutations.

Relates #59014

(cherry picked from commit b21222e714fbf85aad0916e4d4b6a933d2b6958a)
2020-07-06 19:14:25 +03:00
Costin Leau fe775a315f EQL: Obey size request parameter (#59014)
While at it, change the default size to 10 (to align it with the search
API defaults).

(cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)
2020-07-06 19:14:25 +03:00
Yang Wang 2a1635ad69
Create API key with TransportBulkAction directly (#59046) (#59060)
Use TransportBulkAction directly to create API keys instead of going through the proxy from IndexAction to BulkAction.
2020-07-06 23:32:07 +10:00
Yang Wang 66c0231895
Improve threadpool usage and error handling for API key validation (#58090) (#59047)
The PR introduces following two changes:

Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and 1000 in queue size. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it could be a rushed optimization since I am not clear about potential impact on the token service.

On thread pool saturation, it now fails with EsRejectedExecutionException, which in turn gives back a 429 instead of a 401 status code to users.
2020-07-06 21:21:07 +10:00
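The saturation behaviour described in #58090 (a bounded queue that rejects work, with the rejection surfaced as a 429 rather than a 401) can be sketched with plain JDK executors. This is a simplified illustration, not the Elasticsearch thread pool: `SaturationSketch`, `submit` and `demo` are hypothetical, the pool is shrunk to one thread and a queue of one so the effect is easy to see, and the JDK's `RejectedExecutionException` stands in for `EsRejectedExecutionException`.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: map a rejected validation task to HTTP 429.
class SaturationSketch {
    // Returns 429 when the pool cannot accept the task; 200 here just means "accepted".
    static int submit(ThreadPoolExecutor pool, Runnable validation) {
        try {
            pool.execute(validation);
            return 200;
        } catch (RejectedExecutionException e) {
            return 429;
        }
    }

    static int[] demo() {
        // One worker thread and room for exactly one queued task.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<Runnable>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocking = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int first = submit(pool, blocking);   // handed straight to the worker thread
        int second = submit(pool, blocking);  // waits in the queue
        int third = submit(pool, blocking);   // queue is full, so this is rejected
        release.countDown();
        pool.shutdown();
        return new int[] {first, second, third};
    }

    public static void main(String[] args) {
        int[] statuses = demo();
        System.out.println(statuses[0] + " " + statuses[1] + " " + statuses[2]);
    }
}
```

The same mechanism explains the test changes in the neighbouring commits: once the queue is known to be full, any further authentication attempt is rejected deterministically.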
Przemysław Witek 4a791e835b
Simplify parser declarations when specialist types are stored in strings (#58996) (#59056) 2020-07-06 13:05:03 +02:00
Przemysław Witek f35ad0d4e1
Report peak model memory in ModelSizeStats (#59017) (#59055) 2020-07-06 12:55:12 +02:00
David Kyle c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must have a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
David Kyle 0fc12194bf
[ML] Increase timeout in MlDistributedFailureIT (#58997) (#59013)
Doubles the timeout on the ensureStableClusterOnAllNodes method to 60s
to account for very slow CI
2020-07-06 11:30:41 +01:00
Martijn van Groningen f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00
Armin Braun 49857cc35d
Dry up Master Disconnect Disruption Tests (#58953) (#59050)
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
2020-07-06 11:04:24 +02:00
Yang Wang a9151db735
Map only specific type of OIDC Claims (#58524) (#59043)
This commit changes our behavior in 2 ways:

- When mapping claims to user properties ( principal, email, groups,
name), we only handle string and array of string type. Previously
we would fail to recognize an array of other types and that would
cause failures when trying to cast to String.
- When adding unmapped claims to the user metadata, we only handle
string, number, boolean and arrays of these. Previously, we would
fail to recognize an array of other types and that would cause
failures when attempting to process role mappings.

For user properties that are inherently single valued, like
principal(username) we continue to support arrays of strings where
we select the first one in case this is being depended on by users
but we plan on removing this leniency in the next major release.

Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
2020-07-06 11:36:41 +10:00
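The claim handling described in #58524 for user properties (accept a string or an array of strings, and take the first element for single-valued properties) might look roughly like the sketch below. It is a hedged illustration: `ClaimMappingSketch`, `asStrings` and `firstOrNull` are hypothetical names, and dropping a whole array that contains a non-string element is an assumption made here, not something the commit message specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: only string and array-of-string claims map to user properties.
class ClaimMappingSketch {
    // Returns the claim as strings, or an empty list when the type is not supported.
    static List<String> asStrings(Object claim) {
        if (claim instanceof String) {
            return Collections.singletonList((String) claim);
        }
        if (claim instanceof Collection) {
            List<String> values = new ArrayList<>();
            for (Object element : (Collection<?>) claim) {
                if (!(element instanceof String)) {
                    return Collections.emptyList(); // assumption: ignore arrays of other types entirely
                }
                values.add((String) element);
            }
            return values;
        }
        return Collections.emptyList();
    }

    // For inherently single-valued properties such as the principal, pick the first string.
    static String firstOrNull(Object claim) {
        List<String> values = asStrings(claim);
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        System.out.println(asStrings("alice"));
        System.out.println(firstOrNull(Arrays.asList("alice", "alice@example.org")));
        System.out.println(asStrings(Arrays.asList(1, 2, 3)));
    }
}
```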
Tanguy Leroux 49f4227837 Check acknowledged responses in FsSearchableSnapshotsIT (#59021)
Despite all my attempts I did not manage to reproduce issues like the ones 
described in #58961. My guess is that the _mount request got retried at 
some point but I wasn't able to validate this assumption.

Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small 
random chunk size and a large number of documents is picked up in the 
tests. The parent class also does not verify the acknowledged status 
of some requests.

This commit lowers down the chunk size and number of docs in tests 
(this is extensively tested in unit tests) and also adds assertions on 
acknowledged responses.

Relates #58961
2020-07-05 10:50:31 +02:00
Armin Braun 071d8b2c1c
Deduplicate Empty InternalAggregations (#58386) (#59032)
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
2020-07-04 14:02:16 +02:00
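The deduplication in #58386 relies on a common pattern: hide the constructor behind a static factory that hands out one shared instance for the empty case. A minimal sketch follows, with a hypothetical `AggregationsSketch` class and a list of strings standing in for the real aggregation objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: a static factory that never allocates a second empty instance.
class AggregationsSketch {
    static final AggregationsSketch EMPTY = new AggregationsSketch(Collections.<String>emptyList());

    private final List<String> aggregations;

    // Private so that every caller has to go through from(...).
    private AggregationsSketch(List<String> aggregations) {
        this.aggregations = aggregations;
    }

    static AggregationsSketch from(List<String> aggregations) {
        if (aggregations.isEmpty()) {
            return EMPTY;
        }
        return new AggregationsSketch(new ArrayList<>(aggregations));
    }

    int size() {
        return aggregations.size();
    }

    public static void main(String[] args) {
        AggregationsSketch a = from(Collections.<String>emptyList());
        AggregationsSketch b = from(new ArrayList<String>());
        System.out.println(a == b);
        System.out.println(from(Arrays.asList("avg_price")).size());
    }
}
```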
Bogdan Pintea e88d71b187
[7.x] SQL: Redact credentials in connection exceptions (#58650) (#59025)
* SQL: Redact credentials in connection exceptions (#58650)

This commit adds the functionality to redact the credentials from the
exceptions generated when a connection attempt fails, preventing them
from leaking into logs, console history etc.

There are a few causes that can lead to failed connections. The most
challenging to deal with is a malformed connection string. The redaction
tries to get around it by modifying the URI to a parsable state, so that
the redaction can be applied reliably. If there's no reliability
guarantee, the redaction will bluntly replace the entire connection
string and the user is informed about the option to modify it so that the
redaction won't apply. (This is done by using a capitalized scheme,
which is legal, but otherwise never used in practice.)

The commit fixes a couple of other issues with the URI parser:
- it allows an empty hostname, or even entire connection string (as per
the existing documentation);
- it reduces the editing of the connection string in the exception
messages (so that users can more easily recognize their input);
- it uses the default URI as source for the scheme and hostname.

(cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd)

* Implement String#repeat(), unavailable in Java8

Implement a client.StringUtils#repeatString() as a replacement for
String#repeat(), unavailable in Java8.
2020-07-04 11:29:06 +02:00
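A Java 8 compatible replacement for `String#repeat()`, as mentioned in the second part of the commit above, can be as small as the sketch below. The class name `StringUtilsSketch` and the exact signature and argument checks are assumptions; the real `client.StringUtils#repeatString()` may differ.

```java
// Hypothetical Java 8 stand-in for String#repeat(int), which only exists from Java 11.
class StringUtilsSketch {
    static String repeatString(String in, int count) {
        if (count < 0) {
            throw new IllegalArgumentException("negative count: " + count);
        }
        StringBuilder sb = new StringBuilder(in.length() * count);
        for (int i = 0; i < count; i++) {
            sb.append(in);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // For example, masking a secret with one '*' per character.
        System.out.println(repeatString("*", "s3cret".length()));
    }
}
```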
Benjamin Trent b9d9964d10
[ML] add exponent output aggregator to inference (#58933) (#59016)
* [ML] add exponent output aggregator to inference

* fixing docs

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-03 14:51:00 -04:00
Lisa Cawley 5c19464a2f [DOCS] Clarifies number of file and native realms (#58949) 2020-07-03 11:00:28 -07:00
Dan Hermann c1781bc7e7
[7.x] Add include_data_streams flag for authorization (#59008) 2020-07-03 12:58:39 -05:00
Bogdan Pintea 3d96d91efb
[7.x] SQL: fix handling of escaped chars in JDBC connection string (#58429) (#58977)
SQL: fix handling of escaped chars in JDBC connection string (#58429)

This commit fixes an issue emerging when the connection string URI
contains escaped characters.

The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. The new URI has been
using however the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.

The commit changes that, so that the new URI is built from the still-escaped
"raw" parts of the original URI.

(cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)
2020-07-03 17:03:00 +02:00
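The problem fixed in #58429 is easy to reproduce with `java.net.URI` alone: `getQuery()` decodes escaped characters, while `getRawQuery()` leaves them escaped. The sketch below uses a made-up connection string and a naive `&` split (`RawQuerySketch` and `optionCount` are hypothetical helpers, not the JDBC driver's parser) to show how a decoded password can be mistaken for extra options.

```java
import java.net.URI;

// Sketch: decoded vs raw query parts of a URI whose password contains escaped '&' and '='.
class RawQuerySketch {
    static String decodedQuery(String uri) {
        return URI.create(uri).getQuery();
    }

    static String rawQuery(String uri) {
        return URI.create(uri).getRawQuery();
    }

    // Naive option count: number of '&'-separated pieces.
    static int optionCount(String query) {
        return query.split("&").length;
    }

    public static void main(String[] args) {
        // Hypothetical connection string; the password value is "a&b=c", escaped.
        String uri = "es://localhost:9200/?user=elastic&password=a%26b%3Dc";
        System.out.println(decodedQuery(uri) + " -> " + optionCount(decodedQuery(uri)) + " options");
        System.out.println(rawQuery(uri) + " -> " + optionCount(rawQuery(uri)) + " options");
    }
}
```

Splitting the raw query first and unescaping each value afterwards keeps the escaped `&` and `=` inside the password value.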
Luca Cavanna e3fc1638d8 Improve error handling in async search code (#57925)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused the get or submit async search task not to be properly unregistered from the task manager.
- When the completion listener's onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.

Closes #58995
2020-07-03 16:07:26 +02:00
Hendrik Muhs ca3da7af85 [ML] handle broken setup with state alias being an index (#58999)
.ml-state-write is supposed to be an index alias; however, it can accidentally become an index. If
.ml-state-write is a concrete index instead of an alias, ML stops working. This change improves error
handling by setting the job to failed and properly logging and auditing the problem. The user still has to
manually fix the problem. This change should lead to a quicker resolution of the problem.

fixes #58482
2020-07-03 15:26:59 +02:00
Ignacio Vera 2c2486d3d4
Fix GeoHash grid aggregation circuit breaker tests (#58218) (#59001) 2020-07-03 13:46:35 +02:00
Dan Hermann 5e7746d3bd
[7.x] Mirror privileges over data streams to their backing indices (#58991) 2020-07-03 06:33:38 -05:00
Luca Cavanna 4f86f6fb38 Submit async search to not require read privilege (#58942)
When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data.

The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.
2020-07-03 12:18:07 +02:00
David Kyle f6a0c2c59d
[7.x] Pipeline Inference Aggregation (#58965)
Adds a pipeline aggregation that loads a model and performs inference on the
input aggregation results.
2020-07-03 09:29:04 +01:00
Tim Vernum 1133c29ce9
Treat roles as a SortedSet (#58988)
The Saml SP document stored the role mapping in a Set, but this made
the order in XContent inconsistent. This switched it to use a TreeSet.

Resolves: #54733
Backport of: #55201
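
The ordering issue this commit addresses can be illustrated with the standard
collections alone. In the sketch below, the class SortedRolesSketch and the
role names are hypothetical; it does not reproduce the SAML service provider
document code. A HashSet gives no guarantee about iteration order, so
serialized output may vary, whereas a TreeSet always iterates in natural
(sorted) order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration of deterministic ordering via TreeSet.
final class SortedRolesSketch {
    private SortedRolesSketch() {}

    // Returns the roles in the order a TreeSet would iterate (and thus
    // serialize) them: de-duplicated and sorted by natural String order.
    static List<String> serializationOrder(Collection<String> roles) {
        return new ArrayList<>(new TreeSet<>(roles));
    }

    public static void main(String[] args) {
        List<String> roles = Arrays.asList("viewer", "admin", "editor", "admin");
        System.out.println(serializationOrder(roles)); // prints [admin, editor, viewer]
    }
}
```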
2020-07-03 13:40:58 +10:00