896 Commits

Author SHA1 Message Date
Nhat Nguyen
035f0638f4 Support point in time in async_search (#61560)
This commit integrates point in time into async search and
ensures that it works correctly with security enabled.

Relates #61062
2020-09-10 19:25:48 -04:00
Nhat Nguyen
2eb1e8bc84 Make keep alive of point in time optional in search (#62184)
A search request should not be required to extend the keep_alive of a point in time. 
This change makes that parameter optional.
2020-09-10 19:25:48 -04:00
Nhat Nguyen
3d69b5c41e Introduce point in time APIs in x-pack basic (#61062)
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.

A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.

```
POST /my_index/_pit?keep_alive=1m
```

The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.

```
POST /_search
{
    "query": {
        "match" : {
            "title" : "elasticsearch"
        }
    },
    "pit": {
            "id":  "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
            "keep_alive": "1m"
    }
}
```

Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.

```
DELETE /_pit
{
    "id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```

#### Notable works in this change:

- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966

Relates #46523
Relates #26472

Co-authored-by: Jim Ferenczi <jimczi@apache.org>
2020-09-10 19:25:47 -04:00
Luca Cavanna
cd9774d8cb Runtime fields: rename emitValue function to emit (#62191)
We decided to shorten the emitValue function to emit, given that emit is self-explanatory.

Relates to #59332
2020-09-10 20:27:49 +02:00
Tanguy Leroux
42f5d38d9b
Remove REST APIs documentation for experimental Searchable Snapshot APIs (#62217) (#62231)
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats

These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
2020-09-10 16:51:28 +02:00
Martijn van Groningen
f87fc67592
Mute resolve index data stream tests. (#62211)
Relates to #62190 and #62210
2020-09-10 13:13:50 +02:00
David Roberts
969a1c558b [ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.

This is a small building block towards fixing #55616
2020-09-10 09:33:42 +01:00
Jake Landis
d8dad9ab2c
[7.x] Remove integTest task from PluginBuildPlugin (#61879) (#62135)
This commit removes `integTest` task from all es-plugins.  
Most relevant projects have been converted to use yamlRestTest, javaRestTest, 
or internalClusterTest in prior PRs. 

A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest 
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)

related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
2020-09-09 14:25:41 -05:00
markharwood
5a48895065
Wildcard field bug fix for prefix + term queries. Wildcard syntax should not be supported. (#62085) (#62154)
Wildcard field bug fix for term and prefix queries.
We now escape any * or ? characters in the search string before delegating to the main wildcardQuery() method.

Closes #62081
2020-09-09 10:46:52 +01:00
Luca Cavanna
168b448a0f
Rename runtime_script field type to runtime (#62034)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime fields implementation (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.

This translates also to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper .

Relates to #59332
2020-09-07 15:07:23 +02:00
Martijn van Groningen
7e566ddd06
Move data stream yaml tests to xpack plugin module. (#62032)
Backport of #61998 to 7.x branch.

Moving the data stream yaml tests to xpack plugin module has the following benefits:
* The tests are ran both with security enabled (as part of xpack/plugin integTest)
  and disabled (as part of xpack/plugin/data-stream/qa/rest integTest).
* and running the tests in mixed cluster qa environment.
2020-09-07 11:03:32 +02:00
Luca Cavanna
0c8b438577
Add support for runtime fields (#61776)
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.

We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.

Co-authored-by: Nik Everett <nik9000@gmail.com>
2020-09-07 09:14:53 +02:00
Dimitris Athanasiou
d37f197efd
[7.x][ML] Allow training_percent to be any positive double up to hundred (#61977) (#61990)
This changes the valid range of `training_percent` for regression and
classification from [1, 100] to (0, 100].

Backport of #61977
2020-09-04 17:34:14 +03:00
Martijn van Groningen
84af9abd76
Fix skip versions fix xpack data stream yaml tests. (#61981)
Backport of #61926 to 7.x branch.

Relates to #61904
2020-09-04 14:53:38 +02:00
Martijn van Groningen
3d9c12e2d3
Fix data stream wildcard resolution bug in eql search api.(#61910)
Backport of #61904 to 7.x branch.

The eql search api redirects to the search api. For this reason the eql
search api could work with concrete data stream names. However if security
is enabled and a data stream name snippet with a wildcard was used then
it could not resolve this expressions. This is because the EqlSearchRequest
class didn't overwrite the `includeDataStreams()` method. This pr fixes this,
so that the security layer can properly expand data stream name wildcard
expressions for the eql search api.

This commit also moves the eql data stream test to xpack rest tests,
so that the test runs with security enabled. This is required to reproduce
the bug.

Closes #60828
2020-09-03 16:03:57 +02:00
David Kyle
25e811ced7
Rewrite Inference yml tests for better clean up (#61180) (#61555)
Inference processors asynchronously usage write stats to the .ml-stats index after they used. 
In tests the write can leak into the next test causing failures depending on which test follows.
This change waits for the usage stats docs to be written at the end of the test
2020-08-27 11:16:26 +01:00
Igor Motov
f70a59971a
[7.x] Add rate aggregation (#61369) (#61554)
Adds a new rate aggregation that can calculate a document rate for buckets
of a date_histogram.

Closes #60674
2020-08-25 17:39:00 -04:00
markharwood
8b56441d2b
Search - add case insensitive support for regex queries. (#59441) (#61532)
Backport to add case insensitive support for regex queries. 
Forks a copy of Lucene’s RegexpQuery and RegExp from Lucene master.
This can be removed when 8.7 Lucene is released.

Closes #59235
2020-08-25 17:18:59 +01:00
David Kyle
539cf914bc
[ML] handle new model metadata stream from native process (#59725) (#61251)
This adds the serialization handling for the new model_metadata object from the native process.

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-08-24 15:52:13 -04:00
Yang Wang
cd52233b94
Include authentication type for the authenticate response (#61247) (#61411)
Add a new "authentication_type" field to the response of "GET _security/_authenticate".
2020-08-21 22:59:43 +10:00
markharwood
66098e0bf4
Search fix: query_string regex/wildcard searches not working on wildcard fields (#60959) (#61010)
The Query string parser was not delegating the construction of wildcard/regex queries to the underlying field type.
The wildcard field has special data structures and queries that operate on them so cannot rely on the basic regex/wildcard queries that were being used for other fields.

Closes #60957
2020-08-12 10:44:52 +01:00
David Turner
a2d5bfca2f Even longer timeout for XPackRestIT (#60812)
This suite is still occasionally failing with a timeout on macOS.
Suggest further increasing this timeout until this suite is broken up.

Relates #58071
2020-08-10 10:26:21 +01:00
Przemysław Witek
0afa1bd972
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) (#60727) 2020-08-05 13:39:40 +02:00
Russ Cam
e9c0bf1566 Remove body from indices.create_data_stream REST spec (#60705)
This commit removes the body property from the
indices.create_data_stream.json REST API spec
as the API does not support sending a body.

Update the description of the API to remove
that a data stream can be updated with the
API - data streams can only be created with
this API and attempting to update yields a
`resource_already_exists_exception`.

Closes #60704

(cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)
2020-08-05 17:01:28 +10:00
Julie Tibshirani
c7bfb5de41
Add search fields parameter to support high-level field retrieval. (#60258)
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.

Addresses #49028 and #55363.
2020-07-28 10:58:20 -07:00
Dan Hermann
b98caf58ee
Mark data stream APIs as stable (#59860) (#60206) 2020-07-27 10:37:52 -05:00
Dan Hermann
ca25f6ae6f
Include the resolve index action in the view_index_metadata privilege (#59785) (#60112) 2020-07-23 08:13:56 -05:00
Dan Hermann
fe12217c7f
[7.x] Move REST specs for data streams (#60111) 2020-07-23 08:10:54 -05:00
Przemysław Witek
283a1f605c
Rename binary_soft_classification evaluation to outlier_detection (#59951) (#59970) 2020-07-21 15:15:04 +02:00
Nik Everett
fcd8b5fe6e
Fix top_metrics when metric is missing (backport of #59471) (#59881)
This fixes a null pointer exception when the metric is missing for the
latest document returned by `top_metrics`.

Closes #58926
2020-07-20 10:42:58 -04:00
Przemysław Witek
df4fea79cb
Add a "verbose" option to the data frame analytics stats endpoint (#59589) (#59621) 2020-07-16 09:51:31 +02:00
Tal Levy
4bb91b61e8
Adds support for date_nanos in Rollup Metric and DateHistogram Configs (#59349) (#59577)
Closes #44505.
2020-07-14 22:37:48 -07:00
James Baiera
5f7e7e9410
[7.x] Data Stream Stats API (#58707) (#59566)
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
2020-07-14 16:57:46 -04:00
Albert Zaharovits
4eb310c777
Disallow mapping updates for doc ingestion privileges (#58784)
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.

In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
2020-07-14 23:39:41 +03:00
David Kyle
0d2ea1b881
Check for ml privilege when using the Inference Aggregation (#59530) (#59562)
The inference pipeline aggregation requires the user has permission to access
the ml get trained models endpoint (_ml/inference/)
2020-07-14 20:53:40 +01:00
Dan Hermann
70fe553ce0
[7.x] Reenable BWC tests for data streams (#59538) 2020-07-14 13:35:52 -05:00
Dan Hermann
e54b4a729f
[7.x] Adds write_index_only option to put mapping API (#59539) 2020-07-14 10:34:08 -05:00
Dimitris Athanasiou
e302c66847
[7.x][ML] Fix NPE when starting classification with missing dependent_variable (#59524) (#59540)
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.

This commit is fixing this issue.

Backport of #59524
2020-07-14 17:56:55 +03:00
Dan Hermann
59f639a279
Add auto_configure privilege 2020-07-14 08:23:49 -05:00
Andrei Dan
7dcdaeae49
Default to @timestamp in composable template datastream definition (#59317) (#59516)
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.

(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
2020-07-14 12:36:54 +01:00
Christos Soulios
3868bcc7b8
[7.x] Histogram integration on Histogram field type (#59431)
Backports #58930 to 7.x
Implements histogram aggregation over histogram fields as requested in #53285.
2020-07-13 19:36:33 +03:00
Dan Hermann
e01d73c737
[7.x] Data stream admin actions are now index-level actions 2020-07-10 14:36:18 -05:00
Dan Hermann
7fa9cf601b
Data stream support for rollup search 2020-07-10 11:13:34 -05:00
Dan Hermann
c7e977701a
Data stream support for async search 2020-07-09 13:12:04 -05:00
Dimitris Athanasiou
b2243337d8
[7.x][ML] Data frame analytics max_num_threads setting (#59254) (#59308)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.

Backport of #59254 and #57274
2020-07-09 19:15:46 +03:00
Dimitris Athanasiou
d07b11b86b
[7.x][ML] Perform test inference on java (#58877) (#59298)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.

Backport of #58877

Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>

Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-07-09 16:30:49 +03:00
Dimitris Athanasiou
d323f8d698
[ML] Add REST spec for the update data frame analytics endpoint (#59253) (#59281)
Closes #59148

Backport of #59253
2020-07-09 13:12:21 +03:00
Yannick Welsch
0b9eb210b8
Add basic searchable snapshots usage information (#58828) (#59160)
Adds super basic usage information for searchable snapshots, to be extended later.

Backport of #58828
2020-07-08 13:09:29 +02:00
David Kyle
c651135562
[ML] Make Inference processor field_map and inference_config optional (#59010)
Relaxes the requirement that the inference ingest processor must has a
field_map and inference_config defined even if they are empty.
2020-07-06 11:35:30 +01:00
Martijn van Groningen
f0dd9b4ace
Add data stream timestamp validation via metadata field mapper (#59002)
Backport of #58582 to 7.x branch.

This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.

The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.

Relates to #53100
2020-07-06 11:32:33 +02:00