Commit Graph

5925 Commits

Author SHA1 Message Date
Ignacio Vera f8037abf47
upgrade to lucene-8.6.0 release (#59596) (#59599) 2020-07-15 12:40:57 +02:00
Tanguy Leroux 604f22db79
Use a dedicated thread pool for searchable snapshot cache prewarming (#59313) (#59590)
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.

Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
 been completed because of the prewarming tasks making no progress and
filling up the thread pool.

This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.

This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
2020-07-15 11:45:52 +02:00
Francisco Fernández Castaño 66ef1cdad7
Add the possibility to inject a custom RecoveryState factory to IndexStorePlugin implementations (#59124)
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.

Backport of #59038
2020-07-15 11:11:07 +02:00
Yannick Welsch bc11503dc3 Wait for active license in CcrRestIT (#59543)
Relates #53966

Closes #59486
2020-07-15 09:38:08 +02:00
Tal Levy 4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
Armin Braun 2dd086445c
Enable Fully Concurrent Snapshot Operations (#56911) (#59578)
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
   * We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations

See updated JavaDoc and added test cases for more details and illustration on the functionality.

Some notes:

The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring.  The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.

This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.

Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
2020-07-15 03:42:31 +02:00
Armin Braun 06d94cbb2a
Fix TODO about Spurious FAILED Snapshots (#58994) (#59576)
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
2020-07-15 00:54:30 +02:00
Armin Braun e1014038e9
Simplify Repository.finalizeSnapshot Signature (#58834) (#59574)
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
2020-07-15 00:14:28 +02:00
Martijn van Groningen 35ae3d19db
Remove data stream feature flag (#59572)
so that it can used in the next minor release (7.9.0).

Backport of #59504 to 7.x branch.
Closes #53100
2020-07-14 23:50:41 +02:00
Ryan Ernst 3b688bfee5
Add license feature usage api (#59342) (#59571)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 14:34:59 -07:00
James Rodewig e5baacbe2e
[DOCS] Simplify index template snippets for data streams (#59533) (#59553)
Removes the `@timestamp` field mapping from several data stream index
template snippets.

With #59317, the `@timestamp` field defaults to a `date` field data type
for data streams.
2020-07-14 17:28:43 -04:00
James Baiera 5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Costin Leau 679619c798 EQL: Improve retrieval of results (#59552)
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
2020-07-14 23:53:57 +03:00
Albert Zaharovits 6d6d565eeb
Fix auditing of nameless API Keys (#59531)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
2020-07-14 23:46:25 +03:00
Albert Zaharovits 4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
Armin Braun d456f7870a
Deduplicate Index Metadata in BlobStore (#50278) (#59514)
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.

The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).

Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
2020-07-14 22:18:42 +02:00
David Kyle 0d2ea1b881
Check for ml privilege when using the Inference Aggregation (#59530) (#59562)
The inference pipeline aggregation requires the user has permission to access
the ml get trained models endpoint (_ml/inference/)
2020-07-14 20:53:40 +01:00
Tim Brooks 408a07f96a
Separate coordinating and primary bytes in stats (#59487)
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
2020-07-14 12:37:06 -06:00
Dan Hermann 70fe553ce0
[7.x] Reenable BWC tests for data streams (#59538) 2020-07-14 13:35:52 -05:00
Albert Zaharovits b1e4233806
Fix auditing of API Key authn without the owner realm name (#59470)
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .

This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.

Closes #59425
2020-07-14 21:35:29 +03:00
Dimitris Athanasiou ee4610c0ca
[7.x][ML] Rename cross validation splitter package (#59529) (#59544)
Renames and moves the cross validation splitter package.

First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.

Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.

Backport of #59529
2020-07-14 18:54:46 +03:00
Dimitris Athanasiou 37406487b9
[7.x][ML] Improve error for non-included field with unsupported type (#59424) (#59541)
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.

This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.

Backport of #59424
2020-07-14 18:54:34 +03:00
Andrei Stefan 1fd16ffb70
Add license header to EqlStatsIT.java (#59537) 2020-07-14 18:45:13 +03:00
Dan Hermann e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Nhat Nguyen 4d7c59bedb
Assign follower primary to nodes with remote cluster client role (#59375)
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.

Co-authored-by: Jason Tedor <jason@tedor.me>
2020-07-14 11:23:55 -04:00
Dimitris Athanasiou e302c66847
[7.x][ML] Fix NPE when starting classification with missing dependent_variable (#59524) (#59540)
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.

This commit is fixing this issue.

Backport of #59524
2020-07-14 17:56:55 +03:00
Andrei Dan d477aa14ef
Data Streams: fix bwc test (#59528) (#59534)
(cherry picked from commit ed1a5c00abed8c63ad395ea93df7a303da7b7a65)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 15:17:20 +01:00
Andrei Stefan cf752992d6
Add telemetry metrics (#59526) 2020-07-14 16:25:24 +03:00
Dan Hermann 59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
David Kyle d86435938b
[7.x] Add ml licence check to the pipeline inference agg. (#59213) (#59412)
Ensures the licence is sufficient for the model used in inference
2020-07-14 14:03:10 +01:00
Yang Wang f651487d74
Support prefix search for API key names (#59113) (#59520)
This PR adds minimum support for prefix search of API Key name. It only touches API key name and leave all other query parameters, e.g. realm name, username unchanged.
2020-07-14 22:06:20 +10:00
Andrei Dan 7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Yang Wang 2e71d0aa91
Allow mixed usage of boolean and string when merging OIDC claims (#59112) (#59512)
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.

This PR introduce a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In another word, it will merge true with "true", also will merge false with "false", but nothing else.
2020-07-14 20:41:16 +10:00
Andrei Dan 4180333bbc
[7.x] Composable templates: add a default mapping for @timestamp (#59244) (#59510)
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.

(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 11:29:33 +01:00
Costin Leau 5580eb61ed EQL: Improve sequence limiting (#59439)
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.

Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.

(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
2020-07-14 13:19:09 +03:00
Hendrik Muhs c8290167a0
[7.x][Transform] separate pivot and extract function interface (#59505)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.

piggy backed fixes:
 - when running geo tile group_by it could fail due to query clause limit (unreleased)
 - new style page size using settings was not validating limit of 10k (7.8)
2020-07-14 11:27:16 +02:00
Martijn van Groningen 5f24be1bc1
Also set system property when running test task. (#59499)
Closes #59488
2020-07-14 10:34:52 +02:00
Rene Groeschke d5c11479da
Remove remaining deprecated api usages (#59231) (#59498)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
2020-07-14 10:25:00 +02:00
David Roberts 529aa345df [ML] Account for per-partition categorization in model memory estimate (#59458)
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
2020-07-14 09:16:28 +01:00
Yang Wang 4350add12c
Allow null name when deserialising API key document (#59485) (#59496)
API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accomodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
2020-07-14 16:08:32 +10:00
Tim Brooks 623df95a32
Adding indexing pressure stats to node stats API (#59467)
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
2020-07-13 17:23:42 -06:00
Lee Hinman 81bdb20b8a Fix license header for DataStreamRestIT 2020-07-13 14:41:29 -06:00
Lee Hinman bf1a60130d
[7.x] Add telemetery for data streams (#59433) (#59454)
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:

```
  ...
  "data_streams" : {
    "available" : true,
    "enabled" : true,
    "data_streams" : 3,
    "indices_count" : 17
  }
  ...
```
2020-07-13 14:30:11 -06:00
Christos Soulios 3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Dimitris Athanasiou a7895ff458
[7.x][ML] Remove unused member var from ExtractedFieldsDetector (#59395) (#59406)
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.

Backport of #59395

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-13 19:10:43 +03:00
Igor Motov 1acb4aeba9
EQL: Prepare for release (#59331) (#59426)
Enables eql setting in release builds.

Relates #51613
2020-07-13 11:54:32 -04:00
Martijn van Groningen b1b7bf3912
Make data streams a basic licensed feature. (#59392)
Backport of #59293 to 7.x branch.

* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
  this results in storing a composable index template
  with data stream definition only to work with default
  distribution. This way data streams can only be used
  with default distribution, since a data stream can
  currently only be created if a matching composable index
  template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
   to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
  to fail if `_data_stream_timestamp` meta field mapper
  isn't registered. So that a more understandable
  error is returned when attempting to store a template
  with data stream definition via the oss distribution.

In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
2020-07-13 17:26:46 +02:00
Yang Wang cc9166a5ea Mute failed 120_api_key_auth test till #59425 is addressed. 2020-07-14 01:10:36 +10:00
Yang Wang edf27cd765 Adjust BWC versions for API key auth test.
API key realm name is not available in authentication metadata prior to
v7.5. The issue is tracked at #59425
2020-07-14 00:38:42 +10:00
David Roberts b5e8250a4e
[ML] Drive categorization warning notifications from annotations (#59393)
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work.  However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to.  Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes.  (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list.  This means it's best to have both.)

Backport of #59377
2020-07-13 15:28:57 +01:00
David Kyle 054d5236d4 Mute RegressionIT failure (#59414)
For #59413
2020-07-13 14:12:19 +01:00
Yang Wang a84469742c
Improve role cache efficiency for API key roles (#58156) (#59397)
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
2020-07-13 22:58:11 +10:00
Dan Hermann e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Dan Hermann 7fa9cf601b
Data stream support for rollup search 2020-07-10 11:13:34 -05:00
Alan Woodward 4b9cbfca64 Remove test backported in error 2020-07-09 21:45:41 +01:00
Alan Woodward f4caadd239 MappedFieldType no longer requires equals/hashCode/clone (#59212)
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
2020-07-09 21:05:10 +01:00
Lisa Cawley 54483394ae
[DOCS] Clarify subscription requirements (#58958) (#59307) 2020-07-09 12:24:45 -07:00
Dan Hermann c7e977701a
Data stream support for async search 2020-07-09 13:12:04 -05:00
Dan Hermann b9fb12924b
Data stream support for EQL search 2020-07-09 13:10:44 -05:00
Dimitris Athanasiou b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Costin Leau d9c1e531db EQL: Introduce until functionality (#59292)
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.

(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
2020-07-09 17:12:01 +03:00
Dimitris Athanasiou d07b11b86b
[7.x][ML] Perform test inference on java (#58877) (#59298)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.

Backport of #58877

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-07-09 16:30:49 +03:00
David Kyle 86555ec163
Remove unused function InferenceIndexConstants.mapping() (#59146) (#59158)
InferenceIndexConstants.mapping() is broken and unused.
2020-07-09 14:28:53 +01:00
Andrei Stefan d187b531ed
EQL: Give a name to all toml tests and enforce the naming of new tests (#59283) (#59295)
(cherry picked from commit c8ffe3c9237d3cdd90331795b8e37517155b7e91)
2020-07-09 16:20:29 +03:00
David Kyle dbb9c802b1
Better error message when the model cannot be parsed due to its size (#59166) (#59209)
The actual cause can be lost in a long list of parse exceptions
this surfaces the cause when the problem is size.
2020-07-09 13:43:46 +01:00
David Kyle c5443f78ce
Add Inference Pipeline aggregation to HLRC (#59086) (#59250)
Adds InferencePipelineAggregationBuilder to the HLRC duplicating 
the server side classes
2020-07-09 13:38:45 +01:00
Daniel Mitterdorfer 10ef4d2140
Mute testMaxRestoreBytesPerSecIsUsed (#59289)
Relates #59287
2020-07-09 12:52:17 +02:00
Alan Woodward 67a27e2b9d Add declarative parameters to FieldMappers (#58663)
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.

This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.

Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
2020-07-09 11:43:21 +01:00
Daniel Mitterdorfer daa48329ec
[TEST] Mute FollowerFailOverIT.testFailOverOnFollower (#58659) (#59286)
Relates #58534

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
2020-07-09 12:38:36 +02:00
Albert Zaharovits 2b7456db7f
Improve auditing of API key authentication #58928
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
2020-07-09 13:26:18 +03:00
Dimitris Athanasiou d323f8d698
[ML] Add REST spec for the update data frame analytics endpoint (#59253) (#59281)
Closes #59148

Backport of #59253
2020-07-09 13:12:21 +03:00
Ignacio Vera 1ad00d1ceb
Add Support in geo_match enrichment policy for any type of geometry (#59276)
geo_match enrichment works currently only with points. This change adds the ability to
use any type of geometry.
2020-07-09 11:41:41 +02:00
Andrei Stefan c0e0bca84c
Remove search_after and implicit_join_key_field (#59232) (#59280)
(cherry picked from commit 6ede6c59eff321b9fedad30e19508b9e4f788b54)
2020-07-09 12:34:01 +03:00
Bogdan Pintea acfff7b896
Add sample versions of standard deviation and variance funcs (#59093) (#59274)
* Add sample versions of standard deviation and variance functions (#59093)

* Add STDDEV_SAMP, VAR_SAMP

This commit adds the sampling variations of the standard deviation and
variance agg functions.

(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)

* Fix: workaround for lack of Map#of() in Java8

Replace Map#of() with a HashMap static init.
2020-07-09 10:17:13 +02:00
Ignacio Vera 14ab35e323
Fix numerical error in CentroidCalculatorTests#testPolygonAsPoint (#59012) (#59272) 2020-07-09 08:42:07 +02:00
Lee Hinman bb1c53a0f5
Allow warnings about 'global' template in upgrade tests (#59242)
These tests sometimes install a template so they can be compatible with older versions, but they run
amok of the occasionally installed "global" template which changes the default number of shards.

This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they
are not (since the global template is only randomly installed).

Resolves #58807
Resolves #58258
2020-07-08 13:40:55 -06:00
Armin Braun cc3c8be0f1
Fix SLMSnapshotBlockingIntegTests.testSnapshotInProgress (#59218) (#59239)
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.

relates #59140
2020-07-08 19:13:01 +02:00
James Rodewig 838f717e5f
[DOCS] Add data streams to security docs (#59084) (#59237) 2020-07-08 12:53:56 -04:00
Martijn van Groningen 17bd559253
Fix the timestamp field of a data stream to @timestamp (#59210)
Backport of #59076 to 7.x branch.

The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
  index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
  instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
  to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
  for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.

Relates to #58642
Relates to #53100
Closes #58956
Closes #58583
2020-07-08 17:30:46 +02:00
David Turner 6ffdb19a2a Clean searchable snapshots cache on startup (#59009)
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.

Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
2020-07-08 15:17:52 +01:00
Nik Everett a29d3515a2
Improve cardinality measure used to build aggs (#56533) (#59107)
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
   extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
   sub-aggregations live under multi-bucket aggregations. This is
   normally true but `parentCardinality` is properly carried forward
   for single bucket aggregations like `filter` and for multi-bucket
   aggregations configured in single-bucket for like `range` with a
   single range.

While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.

Relates to #56487

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-08 08:42:23 -04:00
Dan Hermann 90c8d3fc9d
IndexNameExpressionResolver::dataStreamNames should support exclusions 2020-07-08 07:35:52 -05:00
Armin Braun 9268b25789
Add Check for Metadata Existence in BlobStoreRepository (#59141) (#59216)
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.

Relates #56911
2020-07-08 14:25:01 +02:00
Costin Leau 3e32d060bf EQL: Fix bug in skipping window (#59196)
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)

(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
2020-07-08 14:36:39 +03:00
Yannick Welsch 0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
Yang Wang a6109063a2
Even more robust test for API key auth 429 response (#59159) (#59208)
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
2020-07-08 16:43:07 +10:00
Nhat Nguyen ef5c397c0f
Sending operations concurrently in peer recovery (#58018)
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.

With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.

Backport of #58018
2020-07-07 22:03:31 -04:00
Albert Zaharovits d4a0f80c32
Ensure authz role for API key is named after owner role (#59041)
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
2020-07-07 23:26:57 +03:00
Benjamin Trent e343e066fc
[7.x] [ML] prefer secondary auth headers on evaluate (#59167) (#59183)
* [ML] prefer secondary auth headers on evaluate (#59167)

We should prefer the secondary auth headers when evaluating a data frame
2020-07-07 15:34:47 -04:00
Andrei Dan 24c6a30e2b
[7.9] GET data stream API returns additional information (#59128) (#59177)
* GET data stream API returns additional information (#59128)

This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.

Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.

(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-07 20:30:09 +01:00
Nik Everett 93ff5bf9c8
Remove blocking from inference pipeline builder (#59096) (#59162)
This removes the blocking model lookup from the `inference` aggregator's
builder by integrating it into the request rewrite process that loads
stuff asynchronously.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-07-07 12:31:17 -04:00
Armin Braun 6dec2cf722
Fix SLM Tests Leaking Snapshot Operation (#59150) (#59155)
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
2020-07-07 17:19:06 +02:00
Rene Groeschke a896df53ac
Remove misc dependency related deprecation warnings (7.x backport) (#59122)
* Fix dependency related deprecations (#58892)
* Fix classpath setup for forbiddenapi usage
2020-07-07 17:10:31 +02:00
Christoph Büscher 7c64a1bd7b Muting failing ApiKeyIntegTests 2020-07-07 16:02:59 +02:00
Yang Wang f84b76661d
Make test more robust for API key auth 429 (#59077) (#59136)
Adds error handling when filling up the queue of the crypto thread pool. Also reduce queue size of the crypto thread pool to 10 so that the queue can be cleared out in time.

Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failure on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, it may be possible that there are lingering requests not fully cleared out from the queue. When it happens, we will not be able to push all 1000 tasks into the queue. But since what we need is just queue saturation, so as long as we can be sure that the queue is fully filled, it is safe to ignore rejection error and just move on.

A number of 1000 tasks also take some to clear out, which could cause the test suite to time out. This PR change the queue to 10 so the tests would have better chance to complete in time.
2020-07-07 22:27:10 +10:00
Rene Groeschke e8181fc627
Fix implicit duplicate duplicatesStrategy in processResources (#58929) (#59127)
* Fix implicit duplicate duplicatesStrategy in processResources
* Fix duplicates strategy in docker distribution setup
2020-07-07 13:45:36 +02:00
Ignacio Vera 5cc6457ed8
upgrade to lucene-8.6.0-snapshot-6a715e2ecc3 (#59091) (#59120) 2020-07-07 12:07:41 +02:00
Armin Braun d6d6df16bb
Share IT Infrastructure between Core Snapshot and SLM ITs (#59082) (#59119)
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
2020-07-07 12:04:41 +02:00
David Roberts e217f9a1e8
[ML] Wait for shards to initialize after creating ML internal indices (#59087)
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices.  Currently this can result in attempts
to search the newly created index before its shards have initialized.

This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.

Backport of #59027
2020-07-07 10:52:10 +01:00
Francisco Fernández Castaño 0752a86fe5
Enforce higher priority for RepositoriesService ClusterStateApplier (#59040)
* Enforce higher priority for RepositoriesService ClusterStateApplier

This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.

Backport of #58808
2020-07-07 09:51:08 +02:00