Commit Graph

1676 Commits

Author SHA1 Message Date
Luca Cavanna 932a7e3112
Backport of async search changes (#53976)
* Get Async Search: omit _clusters section when empty (#53907)

The _clusters section is omitted by the search API whenever no remote clusters are searched. Async search should do the same, but Get Async Search returns a deserialized response, hence a weird `_clusters` section with all values set to `0` gets returned instead. In fact the recreated Clusters object is not the same object as the EMPTY constant, yet it has the same content.

This commit addresses this by changing the comparison in the `toXContent` method to not print out the section if the number of total clusters is `0`.

* Async search: remove version from response (#53960)

The goal of the version field was to quickly show when you can expect to find something new in the search response, compared to when nothing has changed. This can also be done by looking at the `_shards` section and `num_reduce_phases` returned with the search response. In fact when there has been one or more additional reduction of the results, you can expect new results in the search response. Otherwise, the `_shards` section could notify of additional failures of shards that have completed the query, but that is not a guarantee that their results will be exposed (only when the following partial reduction is performed their results will be available).

That said this commit clarifies this in the docs and removes the version field from the async search response

* Async Search: replicas to auto expand from 0 to 1 (#53964)

This way single node clusters that are green don't go yellow once async search is used, while
all the others still have one replica.

* [DOCS] address timing issue in async search docs tests (#53910)

The docs snippets for submit async search have proven difficult to test as it is not possible to guarantee that you get a response that is not final, even when providing `wait_for_completion=0`. In the docs we want to show though a proper long-running query, and its first response should be partial rather than final.

With this commit we adapt the docs snippets to show a partial response, and replace under the hood all that's needed to make the snippets tests succeed when we get a final response. Also, increased the timeout so we always get a final response.

Closes #53887
Closes #53891
2020-03-23 19:13:31 +01:00
Dimitris Athanasiou 965af3a68b
[7.x][ML] Delete DF analytics stats upon job deletion (#53933) (#53997)
Since a data frame analytics job may have associated docs
in the .ml-stats-* indices, when the job is deleted we
should delete those docs too.

Backport of #53933
2020-03-23 19:55:36 +02:00
Dimitris Athanasiou 08a8345269
[7.x][ML] Fix typo in outlier detection timing stats (#53988) (#53995)
The field holding the timing stats was mistakenly called
`timings_stats`.

Backport of #53988
2020-03-23 19:46:39 +02:00
Armin Braun 5b9864db2c
Better Incrementality for Snapshots of Unchanged Shards (#52182) (#53984)
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
2020-03-23 16:43:41 +01:00
Martijn van Groningen aef7b89219
Backport: initial data stream commit (#53959)
This commits adds a data stream feature flag, initial definition of a data stream and
the stubs for the data stream create, delete and get APIs. Also simple serialization
tests are added and a rest test to thest the data stream API stubs.

This is a large amount of code and mainly mechanical, but this commit should be
straightforward to review, because there isn't any real logic.

The data stream transport and rest action are behind the data stream feature flag and
are only intialized if the feature flag is enabled. The feature flag is enabled if
elasticsearch is build as snapshot or a release build and the
'es.datastreams_feature_flag_registered' is enabled.

The integ-test-zip sets the feature flag if building a release build, otherwise
rest tests would fail.

Relates to #53100
2020-03-23 12:58:09 +01:00
Yannick Welsch 060c72c799 Only link fd* files during source-only snapshot (#53463)
Source-only snapshots currently create a second full source-only copy of the shard on disk to
support incrementality during upload. Given that stored fields are occupying a substantial part
of a shard's storage, this means that clusters with source-only snapshots can require up to
50% more local storage. Ideally we would only generate source-only parts of the shard for the
things that need to be uploaded (i.e. do incrementality checks on original file instead of
trimmed-down source-only versions), but that requires much bigger changes to the snapshot
infrastructure. This here is an attempt to dramatically cut down on the storage used by the
source-only copy of the shard by soft-linking the stored-fields files (fd*) instead of copying
them.

Relates #50231
2020-03-23 11:04:53 +01:00
Tim Vernum cde8725e3c
Create API Key on behalf of other user (#53943)
This change adds a "grant API key action"

   POST /_security/api_key/grant

that creates a new API key using the privileges of one user ("the
system user") to execute the action, but creates the API key with
the roles of the second user ("the end user").

This allows a system (such as Kibana) to create API keys representing
the identity and access of an authenticated user without requiring
that user to have permission to create API keys on their own.

This also creates a new QA project for security on trial licenses and runs
the API key tests there

Backport of: #52886
2020-03-23 18:50:07 +11:00
Jason Tedor 27c8bcbbd1
Introduce aarch64 packaging (#53914) (#53926)
This commit introduces aarch64 packaging, including bundling an aarch64
JDK distribution. We had to make some interesting choices here:
 - ML binaries are not compiled for aarch64, so for now we disable ML on
   aarch64
 - depending on underlying page sizes, we have to disable class data
   sharing
2020-03-22 11:58:11 -04:00
Ryan Ernst caa4e0dc18
Use boolean methods for allowed realm types in license state (#53456) (#53834)
In xpack the license state contains methods to determine whether a
particular feature is allowed to be used. The one exception is
allowsRealmTypes() which returns an enum of the types of realms allowed.
This change converts the enum values to boolean methods. There are 2
notable changes: NONE is removed as we always fall back to basic license
behavior, and NATIVE is not needed because it would always return true
since we should always have a basic license.
2020-03-20 14:30:31 -07:00
Christoph Büscher 8eacb153df
Add async_search.submit to HLRC #53592 (#53852)
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.

Relates to #49091
Backport of #53592
2020-03-20 13:15:58 +01:00
Alan Woodward d23112f441 Report parser name and location in XContent deprecation warnings (#53805)
It's simple to deprecate a field used in an ObjectParser just by adding deprecation
markers to the relevant ParseField objects. The warnings themselves don't currently
have any context - they simply say that a deprecated field has been used, but not
where in the input xcontent it appears. This commit adds the parent object parser
name and XContentLocation to these deprecation messages.

Note that the context is automatically stripped from warning messages when they
are asserted on by integration tests and REST tests, because randomization of
xcontent type during these tests means that the XContentLocation is not constant
2020-03-20 11:52:55 +00:00
Dimitris Athanasiou 60153c5433
[7.x][ML] Data frame analytics analysis stats (#53788) (#53844)
Adds parsing and indexing of analysis instrumentation stats.
The latest one is also returned from the get-stats API.

Note that we chose to duplicate objects even where they are currently
similar. There are already ideas on how these will diverge in the future
and while the duplication looks ugly at the moment, it is the option
that offers the highest flexibility.

Backport of #53788
2020-03-20 12:11:53 +02:00
Christoph Büscher d846ea43f4
Fix ReloadSynonymAnalyzerIT failure (#53663) (#53806)
There is an assertion in ReloadAnalyzersResponse.merge that compares index names
of merged responses that was falsely using object equality instead of
String.equals(). In the past this didn't seem to matter but with changes in the
test setup we started to see failures. Correcting this and also simplifying test
a bit to be able to run it repeatedly if needed.

Backport of #53663
2020-03-19 19:00:14 +01:00
Benjamin Trent 2ccb963f1d
Create GET _cat/transforms API Issue (#53643) (#53726)
Adds new` _cat/transform` and `_cat/transform/{transform_id}` endpoints.
2020-03-18 10:45:28 -04:00
Alan Woodward 580bc40c0c Make it possible to deprecate all variants of a ParseField with no replacement (#53722)
Sometimes we want to deprecate and remove a ParseField entirely, without replacement;
for example, the various places where we specify a _type field in 7x. Currently we can
tell users only that a particular field name should not be used, and that another name should
be used in its place. This commit adds the ability to say that a field should not be used at
all.
2020-03-18 14:16:19 +00:00
Christoph Büscher 2384c1359d Revert "Fix ReloadSynonymAnalyzerIT failure (#53663)"
This reverts commit 2c32173fce.
2020-03-18 12:44:23 +01:00
Christoph Büscher 2c32173fce Fix ReloadSynonymAnalyzerIT failure (#53663)
There is an assertion in ReloadAnalyzersResponse.merge that compares index names
of merged responses that was falsely using object equality instead of
String.equals(). In the past this didn't seem to matter but with changes in the
test setup we started to see failures. Correcting this and also simplifying test
a bit to be able to run it repeatedly if needed.

Closes #53443
2020-03-18 11:55:37 +01:00
Przemysław Witek ec13c093df
Make ML index aliases hidden (#53160) (#53710) 2020-03-18 10:28:45 +01:00
Hendrik Muhs 7a12300ce6
[7.x][Transform] enhance the output of preview to return full… (#53695)
changes the output format of preview regarding deduced mappings and enhances
it to return all the details about auto-index creation. This allows the user
to customize the index creation. Using HLRC you can create a index request
from the output of the response.

backport #53572
2020-03-18 08:37:56 +01:00
David Kyle 2b635737e1
[ML] Parse single named object in config classes (#53472) (#53542) 2020-03-17 13:59:52 +00:00
Yang Wang 7f21ade924
Explicitly require that derived API keys have no privileges (#53647) (#53648)
The current implicit behaviour is that when an API keys is used to create another API key,
the child key is created without any privilege. This implicit behaviour is surprising and is
a source of confusion for users.

This change makes that behaviour explicit.
2020-03-17 17:56:37 +11:00
Ryan Ernst e7f38674ed Add internalClusterTest to check task (#53444)
This commit adds internalClusterTest in xpack core to run as part of
check. This was accidentally removed in a refactoring. Other xpack
modules already do this, but core was left out. This commit also mutes 2
tests that currently fail.

closes #53407
2020-03-16 18:55:01 -07:00
Gordon Brown 880cc3ca7e
Hide I/SLM history aliases (#53564)
This commit adjusts the aliases used for the ILM and SLM history indices
to be hidden aliases.

Also tweaks the configuration of the `IndexTemplateRegistry`s used by
these history system to only upgrade the template from the master node,
as documents are indexed from the master node, so the template version
should only be upgraded from the master node.
2020-03-16 13:07:26 -06:00
markharwood 2c74f3e22c
Backport of new wildcard field type (#53590)
* New wildcard field optimised for wildcard queries (#49993)

Indexes values using size 3 ngrams and also stores the full original as a binary doc value.
Wildcard queries operate by using a cheap approximation query on the ngram field followed up by a more expensive verification query using an automaton on the binary doc values.  Also supports aggregations and sorting.
2020-03-16 15:07:13 +00:00
Przemysław Witek 376b2ae735
[7.x] Make classification evaluation metrics work when there is field mapping type mismatch (#53458) (#53601) 2020-03-16 15:38:56 +01:00
Jim Ferenczi e6680be0b1
Add new x-pack endpoints to track the progress of a search asynchronously (#49931) (#53591)
This change introduces a new API in x-pack basic that allows to track the progress of a search.
Users can submit an asynchronous search through a new endpoint called `_async_search` that
works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time.

````
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
  "aggs": {
    "date_histogram": {
      "field": "@timestamp",
      "fixed_interval": "1h"
    }
  }
}
````

If after 100ms the final response is not available, a `partial_response` is included in the body:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 1,
  "is_running": true,
  "is_partial": true,
  "response": {
   "_shards": {
       "total": 100,
       "successful": 5,
       "failed": 0
    },
    "total_hits": {
      "value": 1653433,
      "relation": "eq"
    },
    "aggs": {
      ...
    }
  }
}
````

The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed.
It also contains the total hits as well as partial aggregations computed from the successful shards.
To continue to monitor the progress of the search users can call the get `_async_search` API like the following:

````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
````

That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version`
should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress.
Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response:

````
{
  "id": "9N3J1m4BgyzUDzqgC15b",
  "version": 10,
  "is_running": false,
  "response": {
     "is_partial": false,
     ...
  }
}
````

Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time:
`````
GET my_index_pattern*/_async_search?wait_for_completion=100ms&keep_alive=10d
`````
The default can be changed when submitting the search, the example above raises the default value for the search to `10d`.
`````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d
`````
The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days.
A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result.

Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed.

Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows
to retrieve a response even if the task finishes.

````
DELETE _async_search/9N3J1m4BgyzUDzqgC15b
````

The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`.

The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks.

Relates #49091

Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
2020-03-16 15:31:27 +01:00
Dimitris Athanasiou 94da4ca3fc
[7.x][ML] Extend classification to support multiple classes (#53539) (#53597)
Prepares classification analysis to support more than just
two classes. It introduces a new parameter to the process config
which dictates the `num_classes` to the process. It also
changes the max classes limit to `30` provisionally.

Backport of #53539
2020-03-16 15:00:54 +02:00
Tom Veasey 690099553c
[7.x][ML] Adds the class_assignment_objective parameter to classification (#53552)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-13 17:35:51 +00:00
Tim Vernum a8677499d7
[Backport] Add support for secondary authentication (#53530)
This change makes it possible to send secondary authentication
credentials to select endpoints that need to perform a single action
in the context of two users.

Typically this need arises when a server process needs to call an
endpoint that users should not (or might not) have direct access to,
but some part of that action must be performed using the logged-in
user's identity.

Backport of: #52093
2020-03-13 16:30:20 +11:00
Jay Modi af36665b08
Deprecate the logstash enabled setting (#53487)
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.

Backport of #53367
2020-03-12 10:18:39 -06:00
Yannick Welsch 48124807d5 Fix SourceOnlySnapshotIT (#53462)
The tests in this class had been failing for a while, but went unnoticed as not tested by CI (see #53442).

The reason the tests fail is that the can-match phase is smarter now, and filters out access to a non-existing field.

Closes #53442
2020-03-12 14:15:03 +01:00
Benjamin Trent 89668c5ea0
[ML][Inference] adds new default_field_map field to trained models (#53294) (#53419)
Adds a new `default_field_map` field to trained model config objects.

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 13:49:39 -04:00
Przemysław Witek 8c4c19d310
Perform evaluation in multiple steps when necessary (#53295) (#53409) 2020-03-11 15:36:38 +01:00
Dimitris Athanasiou cc7751eb16
[7.x][ML] Add ILM policy to ml stats indices (#53349) (#53392)
Adds a size based ILM policy to automatically
rollover ml stats indices.

Backport of #53349
2020-03-11 13:01:34 +02:00
Dimitris Athanasiou 0fd0516d0d
[7.x][ML] Rename data frame analytics maximum_number_trees to max_trees (#53300) (#53390)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.

Backport of #53300
2020-03-11 12:45:27 +02:00
David Roberts 532a720e1b
[ML] Skeleton estimate_model_memory endpoint for anomaly detection (#53386)
This is a partial implementation of an endpoint for anomaly
detector model memory estimation.

It is not complete, lacking docs, HLRC and sensible numbers
for many anomaly detector configurations.  These will be
added in a followup PR in time for 7.7 feature freeze.

A skeleton endpoint is useful now because it allows work on
the UI side of the change to commence.  The skeleton endpoint
handles the same cases that the old UI code used to handle,
and produces very similar estimates for these cases.

Backport of #53333
2020-03-11 10:20:00 +00:00
Jake Landis 2ab502afc4
[7.x] Remove dead 'beats' code (#53312) (#53376) 2020-03-10 20:57:29 -05:00
Przemko Robakowski 847ac9c7d7
Fix null config in SnapshotLifecyclePolicy.toRequest (#53328) (#53355)
This avoids NPE when executing SLM policy when no config was provided.

Related to #44465

Closes #53171

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-03-10 20:44:30 +01:00
Przemysław Witek d54d7f2be0
[7.x] Implement ILM policy for .ml-state* indices (#52356) (#53327) 2020-03-10 14:24:18 +01:00
Hendrik Muhs 696aa4ddaf
[7.x][Transform] add support for script in group_by (#53167) (#53324)
add the possibility to base the group_by on the output of a script.

closes #43152
backport #53167
2020-03-10 11:12:58 +01:00
Cauê Marcondes b68d7b1c33
giving kibana user privileges to create custom link index (#53221) (#53278) 2020-03-10 09:50:38 +01:00
Henning Andersen a4d481f2bb ILM Freeze step retry when not acknowledged (#53287)
A freeze operation can partially fail in multiple places, including the
close verification step. This left the index in an unfrozen but
partially closed state. Now throw an exception to retry the freeze step
instead.
2020-03-10 08:03:39 +01:00
Jay Modi a81460dbf5
Make watch history indices hidden (#52974)
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.

Relates #50251
Backport of #52962
2020-03-06 09:47:03 -07:00
Dimitris Athanasiou 9abf537527
[7.x][ML] Improve DF analytics audits and logging (#53179) (#53218)
Adds audits for when the job starts reindexing, loading data,
analyzing, writing results. Also adds some info logging.

Backport of #53179
2020-03-06 13:47:27 +02:00
Nik Everett 609c61f75c
Formalize usage stats for analytics (backport of #52966) (#53077)
This moves the usage statistics gathering from the `AnalyticsPlugin`
into an `AnalyicsUsage`, removing the static state. It also checks the
license level when parsing all analytics aggregations. This is how we
were checking them before but we did it in an easy to forget way. This
way is slightly simpler, I think.
2020-03-04 10:29:11 -05:00
Adrien Grand cb868d2f5e
Introduce a `constant_keyword` field. (#49713) (#53024)
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.

The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?

For this field there is a choice between
 1. accepting values in `_source` when they are equal to the value configured
    in mappings, but rejecting mapping updates
 2. rejecting values in `_source` but then allowing updates to the value that
    is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.

Backport of #49713
2020-03-03 16:01:47 +01:00
Yang Wang 70814daa86
Allow _rollup_search with read privilege (#52043) (#53047)
Currently _rollup_search requires manage privilege to access. It should really be
a read only operation. This PR changes the requirement to be read indices privilege.

Resolves: #50245
2020-03-03 22:29:54 +11:00
Hendrik Muhs a328a8eaf1
[7.x][Transform] implement node.transform to control where to… (#52998)
implement transform node attributes to disable transform on certain nodes and
test which nodes are allowed to do remote connections

closes #52200
closes #50033
closes #48734

backport #52712
2020-03-02 16:10:57 +01:00
Martijn van Groningen d102158e6f
Improve closing mock webserver when failed to start (#52943)
Fix NPE when closing a webserver that hasn't started correctly.

This can happen when ssl context isn't initialized. The server instance is then never set,
which causes an NPE that masks the actual failure.

Example stacktrace that would mask an actual failure:

```
java.lang.NullPointerException
	at org.elasticsearch.test.http.MockWebServer.close(MockWebServer.java:271)
	at org.elasticsearch.xpack.watcher.test.integration.HttpSecretsIntegrationTests.cleanup(HttpSecretsIntegrationTests.java:70)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
```
2020-03-02 07:19:08 +01:00
Dimitris Athanasiou 85b4e45093
[7.x]ML] Parse and report memory usage for DF Analytics (#52778) (#52980)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.

Backport of #52778 and #52958
2020-02-29 13:03:40 +02:00