Commit Graph

5477 Commits

Author SHA1 Message Date
Lee Hinman bf9651c635
[7.x] Add "content" tier as new "data_content" role (#62247) (#62322)
Similar to the work in #60994 where we introduced the `data_hot`, `data_warm`, etc node roles. This
introduces a new `data_content` node role to be used for the Content tier.

Currently this tier is not used anywhere, but subsequent work will use this tier.

Relates to #60848
2020-09-14 09:42:57 -06:00
Benjamin Trent 13c193a9fc
[Enrich] add logging for when there are search/bulk failures on _execute (#62313) (#62320)
When calling `_execute` there is a chance that there will be bulk indexing failures 
or search failures. 

These will result in the call failing overall. But, no information is provided for troubleshooting the failure.

This commit adds logging to indicate the number of failures, and new debug level logging so that
failure details can be determined if necessary.

closes https://github.com/elastic/elasticsearch/issues/60491
2020-09-14 11:20:13 -04:00
Armin Braun 95766da345
Save Some Allocations when Working with ClusterState (#62060) (#62303)
Just a number of obvious spots where we were allocating
duplicate empty structures or otherwise inefficient that I
found while investigating snapshot cluster state update performance.
2020-09-14 15:09:54 +02:00
Tanguy Leroux 9e38dd0254
Deprecate Repository Stats API (#62297) (#62308)
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.

This API is now superseded by the Repositories Metering API.
2020-09-14 14:57:38 +02:00
David Roberts d8288526d9 [ML] Add null checks for C++ log handler (#62238)
It has been observed that if the normalizer process fails
to connect to the JVM then this causes a null pointer
exception as the JVM tries to close the native process
object.  The accessors and close methods of the native
process class that access the C++ log handler should not
assume that it connected correctly.
2020-09-14 11:28:26 +01:00
Martijn van Groningen c88f4174ec
Fix resolve index data streams yaml test. (#62221)
Closes #62190
2020-09-14 08:43:58 +02:00
Nhat Nguyen 7779c1f703 Ensure to release async search iterator in tests
We need to close an async search response iterator to release
the related point in time if the test uses pit.
2020-09-12 12:04:10 -04:00
Martijn van Groningen 1bb094a27b
Return 404 when deleting a non existing data stream (#62224)
Backport of #62059 to 7.x branch.

Return a 404 http status code when attempting to delete a non existing data stream.
However only return a 404 when targeting a data stream without any wildcards.

Closes #62022
2020-09-11 15:36:05 +02:00
Nhat Nguyen b118697368
Adjust BWC rest version for point in time (#62264)
Relates #61872
2020-09-11 08:54:11 -04:00
Luca Cavanna b5e1e652c1 Remove unused import 2020-09-11 10:19:01 +02:00
Luca Cavanna 3d3a1b4bc2 Tweak OpenPointInTimeRequest createTask
This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before point in time APIs were backported.
2020-09-11 10:06:35 +02:00
Nhat Nguyen aafb2cb812 Support point in time cross cluster search (#61827)
This commit integrates point in time into cross cluster search.

Relates #61062
Closes #61790
2020-09-10 19:25:48 -04:00
Nhat Nguyen 808c8689ac Always include the matching node when resolving point in time (#61658)
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.

Closes #61627
2020-09-10 19:25:48 -04:00
Nhat Nguyen 035f0638f4 Support point in time in async_search (#61560)
This commit integrates point in time into async search and
ensures that it works correctly with security enabled.

Relates #61062
2020-09-10 19:25:48 -04:00
Nhat Nguyen 2eb1e8bc84 Make keep alive of point in time optional in search (#62184)
A search request should not be required to extend the keep_alive of a point in time. 
This change makes that parameter optional.
2020-09-10 19:25:48 -04:00
Jim Ferenczi 4d528e91a1 Ensure validation of the reader context is executed first (#61831)
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.

Relates #61446

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2020-09-10 19:25:48 -04:00
Nhat Nguyen 3d69b5c41e Introduce point in time APIs in x-pack basic (#61062)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
2020-09-10 19:25:47 -04:00
Nhat Nguyen 87c889f9c9 CCR should retry on CircuitBreakingException (#62013)
CCR shard follow task can hit CircuitBreakingException on the leader 
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
2020-09-10 17:23:47 -04:00
Nik Everett ac23380560 Fix some query methods in runtime fields
We were missing a few `@Override` annotations in runtime fields which
let us drift from the methods we were supposed to override. Oops. This
adds them and links the methods.
2020-09-10 17:06:05 -04:00
Luca Cavanna 39e59d6edf Share more query execution code for runtime fields (#62229)
For runtime fields we have written quite some lucene queries that work against runtime values that are the result of the execution of the different script contexts that runtime fields support.

The all (but one) share the same main logic: use a two phase iterator, iterate over all documents, and decide whether the current doc matches or not based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.
2020-09-10 20:27:49 +02:00
Luca Cavanna cd9774d8cb Runtime fields: rename emitValue function to emit (#62191)
We decided to shorten the emitValue function to emit, given that emit is self-explanatory.

Relates to #59332
2020-09-10 20:27:49 +02:00
Tanguy Leroux 42f5d38d9b
Remove REST APIs documentation for experimental Searchable Snapshot APIs (#62217) (#62231)
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats

These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
2020-09-10 16:51:28 +02:00
Ignacio Vera c8981ea93d
upgrade to lucene-8.7.0-snapshot-b313618cc1d (#62213) (#62222) 2020-09-10 16:23:18 +02:00
Martijn van Groningen f87fc67592
Mute resolve index data stream tests. (#62211)
Relates to #62190 and #62210
2020-09-10 13:13:50 +02:00
Andrei Stefan cce6da7d52
EQL: add the wildcard field type to the IT tests (#62166) (#62200)
* Add wildcard field type as an option for randomized testing of IT queries

(cherry picked from commit 87b14c409c180c4d53c3c61a30bd69f1b81a2823)
2020-09-10 12:36:36 +03:00
Yannick Welsch e3feafc1e9 Enable searchable snapshots in release builds (#62201)
Enables searchable snapshot functionality not only in snapshot, but also release builds.
2020-09-10 11:20:12 +02:00
David Roberts 969a1c558b [ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.

This is a small building block towards fixing #55616
2020-09-10 09:33:42 +01:00
Jake Landis c6c6596623
[7.x] Configure internalClusterTest for snapshot_feature_enabled flag (#62185) (#62189)
closes #62039
2020-09-09 15:41:43 -05:00
Jake Landis d8dad9ab2c
[7.x] Remove integTest task from PluginBuildPlugin (#61879) (#62135)
This commit removes `integTest` task from all es-plugins.  
Most relevant projects have been converted to use yamlRestTest, javaRestTest, 
or internalClusterTest in prior PRs. 

A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest 
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)

related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
2020-09-09 14:25:41 -05:00
Nik Everett 6d2cab9437
Stop runtime script from emitting too many values (#61938) (#62186)
This prevent `keyword` valued runtime scripts from emitting too many
values or values that take up too much space. Without this you can put
allocate a ton of memory with the script by sticking it into a tight
loop. Painless has some protections against this but:
1. I don't want to rely on them out of sheer paranoia
2. They don't really kick in when the script uses callbacks like we do
   anyway.

Relates to #59332
2020-09-09 14:47:24 -04:00
Benjamin Trent e181e24d48
[ML] only persist progress if it has changed (#62123) (#62180)
* [ML] only persist progress if it has changed

We already search for the previously stored progress document.

For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and current progress
If the progress has changed, persistence continues as normal.
2020-09-09 12:04:09 -04:00
Benjamin Trent 1b9dc0172a
[ML] adding feature_name and node size validation for tree models (#62096) (#62161)
When a tree model is provided, it is possible that it is a stump.
Meaning, it only has one node with no splits
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.

This commit adds the validation if there is more than 1 node,
that the feature_names in the model are non-empty.

closes #60759
2020-09-09 08:50:25 -04:00
Luca Cavanna b680d3fb29 Async search: don't track fetch failures (#62111)
Fetch failures are currently tracked byy AsyncSearchTask like ordinary shard failures. Though they should be treated differently or they end up causing weird scenarios like total=num_shards and successful=num_shards as the query phase ran fine yet the failed count would reflect the number of shards where fetch failed.

Given that partial results only include aggs for now and are complete even if fetch fails, we can ignore fetch failures in async search, as they will be anyways included in the response. They are in fact either received as a failure when all shards fail during fetch, or as part of the final response when only some shards fail during fetch.
2020-09-09 13:44:07 +02:00
Luca Cavanna ad83261348 Print out search request as part of async search task description (#62057)
Currently, the async search task is the task that will be running through the whole execution of an async search. While the submit async search task prints out the search as part of its description, async search task doesn't while it should.

With this commit we address that while also making sure that the description highlights that the task is originated from an async search.

Also, we streamline the way the description is printed out by SearchTask so that it does not get forgotten in the future.
2020-09-09 13:44:07 +02:00
Rory Hunter b7fd7cf154
Write deprecation logs to a data stream (#61966)
Backport of #58924.

Closes #46106. Introduce a mechanism for writing deprecation logs to a data stream
as well as to disk.
2020-09-09 12:16:28 +01:00
markharwood 5a48895065
Wildcard field bug fix for prefix + term queries. Wildcard syntax should not be supported. (#62085) (#62154)
Wildcard field bug fix for term and prefix queries.
We now escape any * or ? characters in the search string before delegating to the main wildcardQuery() method.

Closes #62081
2020-09-09 10:46:52 +01:00
Dimitris Athanasiou 6c1700c343
[7.x][ML] Outlier detection mapping for nested feature influence (#62068) (#62150)
Adds mappings for outlier detection results.

Backport of #62068
2020-09-09 12:41:26 +03:00
Yang Wang 146b2e6b1a
[Test] Fix data-stream rest test failure (#62137) (#62144)
By enabling searchable snapshots for release builds.
2020-09-09 17:12:53 +10:00
Armin Braun ed4984a32e
Remove Redundant Stream Wrapping from Compression (#62017) (#62132)
In many cases we don't need a `StreamInput` or `StreamOutput`
wrapper around these streams so I this commit adjusts the API
to just normal streams and adds the wrapping where necessary.
2020-09-09 03:27:38 +02:00
Costin Leau 0f9532689f EQL: Propagate key constraints through the query (#62073)
Since join keys are common across all queries in a Join/Sequence, any
constraint applied on one query needs to be obeyed but all the other
queries.
This PR enhances the optimizer to propagate such constraints across
all queries so they get pushed down to the actual generated ES queries.

Fix #58937

(cherry picked from commit 4afa5debc199c132c07015bfae17952c40a21e5d)
2020-09-08 18:40:47 +03:00
Benjamin Trent 057bf3f7d5
[ML] setting require_alias to previous value on bulk index retry (#62103) (#62108)
Previous work has been done to prevent automatically creating a concrete index when an alias is desired.

This commit addresses a path where this check was not being done.

relates: #62064
2020-09-08 11:38:32 -04:00
Dimitris Athanasiou 41507cff48
[7.x][ML] Update mappings of ml stats index (#61980) (#62091)
- Adds missing mappings for `alpha`, `gamma`, and `lambda`.
- Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`.
- Removes unused `regularization_depth_penalty_multiplier`,
  `regularization_leaf_weight_penalty_multiplier` and
  `regularization_tree_size_penalty_multiplier`.

Backport of #61980
2020-09-08 16:41:57 +03:00
David Roberts b2636678b2 [ML] Add support for date_nanos fields in find_file_structure (#62048)
Now that #61324 is merged it is possible for the find_file_structure
endpoint to suggest using date_nanos fields for timestamps where
the timestamp format provides greater than millisecond accuracy.
2020-09-08 13:05:09 +01:00
Francisco Fernández Castaño 2bb5716b3d
Add repositories metering API (#62088)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.

Backport of #60371
2020-09-08 14:01:04 +02:00
David Kyle fb6ee5b36d
[7.x] [ML] Assert mappings match templates in Upgrade tests (#61905)
At the end of the rolling upgrade tests check the mappings of the concrete
.ml and .transform-internal indices match the mappings in the templates.
When the templates change, the tests should prove that the mappings have
been updated in the new cluster.
2020-09-08 12:21:19 +01:00
Przemko Robakowski bb357f6aae
[7.x] Move internal index templates to composable templates (#61457) (#61661)
This change moves watcher, ILM history and SLM history templates to composable templates.
Versions are updated to reflect the switch. Only change to the templates themselves is added `_meta` to mark them as managed
2020-09-08 11:26:06 +02:00
Andrei Stefan 7d5791b6bd
EQL: create the search request with a list of indices (#62005) (#62076)
* The query client uses an array of indices instead of the comma separated
version of the indices names

(cherry picked from commit 8ec4a768f4892a4a2faed25836cb333a9deb2ace)
2020-09-08 10:26:59 +03:00
David Kyle a5b24bf44c
Mute ClassificationIT (#62063)
testWithOnlyTrainingRowsAndTrainingPercentIsFifty_DependentVariableIsBoolean
For #60759
2020-09-07 16:10:48 +01:00
Luca Cavanna 168b448a0f
Rename runtime_script field type to runtime (#62034)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime fields implementation (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.

This translates also to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper .

Relates to #59332
2020-09-07 15:07:23 +02:00
Jim Ferenczi fa8e76abb1
Improve reduction of terms aggregations (#61779) (#62028)
Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
to lookup every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).

Relates #51857
2020-09-07 13:13:20 +02:00