Commit Graph

733 Commits

Author SHA1 Message Date
Jake Landis 9c76ee47c4
[7.x] json spec: allow null for documentation url (#55749) (#56625)
This commit allows the JSON schema's documentation.url property to have a null value.
This can useful for cases where a feature is under development, and does not have
documentation published yet.

This commit also adds a documentation.url for two ml resources.
2020-05-12 14:49:02 -05:00
Martijn van Groningen 0c61bc63e4
Backport: auto create data streams using index templates v2 (#56596)
Backport: #55377

This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.

Relates to #53100
2020-05-12 17:01:15 +02:00
Martijn van Groningen 2e86801f61
Backport: enable searchable snapshots feature flag for xpack rest tests.
Backport of: #56569

A data stream test, which tests data stream resolvability in xpack apis failed in release builds.
A invocation of a searchable snapshot api failed, because the corresponding feature flag
wasn't enabled for xpack rest tests.

Closes #56531
2020-05-12 12:18:24 +02:00
Ignacio Vera 222ee721ec
Add moving percentiles pipeline aggregation (#55441) (#56575)
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
2020-05-12 11:35:23 +02:00
Martijn van Groningen 9ae09570d8
Allow a number of broadcast transport actions to resolve data streams (#55726) (#56502)
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.

This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats, 
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.

Relates to #53100
2020-05-11 12:48:35 +02:00
Benjamin Trent 641f598364
[Transform] fixes http status code when bad scripts are provided (#56117) (#56219)
Transforms should propagate up the search execution exception if one is returned when it does the test query. 

this allows transforms to return a `4xx` when the aggs are malformed but parseable. 

closes https://github.com/elastic/elasticsearch/issues/55994
2020-05-05 12:36:22 -04:00
David Roberts 7aa0daaabd
[7.x][ML] More advanced model snapshot retention options (#56194)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Backport of #56125
2020-05-05 14:31:58 +01:00
Dimitris Athanasiou 2d7899c83c
[7.x][ML] Adjust DF Analytics process phases (#56107) (#56177)
As of elastic/ml-cpp#1179, the analytics process reports phases
depending on the analysis type. This commit adjusts the phases
of current analyses from `analyzing` to the following:

 - outlier_detection: [`computing_outlier`]
 - regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`]

Backport of #56107
2020-05-05 15:00:07 +03:00
Dimitris Athanasiou 75dadb7a6d
[7.x][ML] Add loss_function to regression (#56118) (#56187)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.

Backport of #56118
2020-05-05 14:59:51 +03:00
Martijn van Groningen 6d03081560
Add auto create action (#56122)
Backport of #55858 to 7.x branch.

Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.

This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:

* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377

* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:

AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.

The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.

Relates to #53100
2020-05-04 19:10:09 +02:00
Christos Soulios c65f828cb7
[7.x] Histogram field type support for ValueCount and Avg aggregations (#56099)
Backports #55933 to 7.x

Implements value_count and avg aggregations over Histogram fields as discussed in #53285

- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
2020-05-04 13:23:02 +03:00
Przemko Robakowski 797f63e743
[7.x] Emit deprecation warning if multiple v1 templates match with a new index (#55558) (#56038)
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)

* Emit deprecation warning if multiple v1 templates match with a new index

* DEPRECATION_LOGGER rename
2020-04-30 17:36:17 +02:00
Yang Wang 317d9fb88f
Remove synthetic role names of API keys as they confuse users (#56005) (#56011)
Synthetic role names of API keys add confusion to users. This happens to API responses as well as audit logs. The PR removes them for clarity.
2020-04-30 21:32:55 +10:00
Jake Landis ae4d980c8c
[7.x] json spec - add description for autoscaling (#55748) (#55901) 2020-04-29 08:40:11 -05:00
Christos Soulios 02bf0c586a
[7.x] Histogram field type support for Sum aggregation (#55916)
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285

Backports #55681 to 7.x
2020-04-29 15:06:12 +03:00
Dimitris Athanasiou d9685a0f19
[7.x][ML] Validate at least one feature is available for DF analytics (#55876) (#55914)
We were previously checking at least one supported field existed
when the _explain API was called. However, in the case of analyses
with required fields (e.g. regression) we were not accounting that
the dependent variable is not a feature and thus if the source index
only contains the dependent variable field there are no features to
train a model on.

This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
`ExtractedFieldsDetector` and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.

For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.

Closes #55593

Backport of #55876
2020-04-29 11:39:58 +03:00
Jake Landis 6f392cf5b9
[7.x] json spec - add description for searchable snapshots (#55746) (#55809) 2020-04-27 10:08:09 -05:00
David Roberts 3ba44a5af8
[ML] Adding failed_category_count to model_size_stats (#55761)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Backport of #55716
2020-04-25 10:36:49 +01:00
James Rodewig c1b0548db0
[DOCS] Document EQL search REST API (#52384) 2020-04-24 15:36:01 -04:00
David Roberts 2dc5586afe
[ML] Add effective max model memory limit to ML info (#55581)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Backport of #55529
2020-04-22 12:28:50 +01:00
David Roberts da5aeb8be7
[ML] Return assigned node in start/open job/datafeed response (#55570)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Backport of #55473
2020-04-22 12:06:53 +01:00
markharwood 7761b01a33
Remove normalizer support from wildcard field while we decide on approach for handling case insensitvity (#55294) (#55375)
Closes #55288
2020-04-17 11:43:26 +01:00
Ioannis Kakavas ac87c10039
[7.x] Fix responses for the token APIs (#54532) (#55278)
This commit fixes our behavior regarding the responses we
return in various cases for the use of token related APIs.
More concretely:

- In the Get Token API with the `refresh` grant, when an invalid
(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with `400` HTTP status code
 and an `error_description` header with the message "could not
refresh the requested token".
Previously we would return erroneously return a  `401` with "token
malformed" message.

- In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with `404` and a body that shows that no tokens were invalidated:
   ```
   {
     "invalidated_tokens":0,
     "previously_invalidated_tokens":0,
      "error_count":0
   }
   ```
   The previous behavior would be to erroneously return
a `400` or `401` ( depending on the case ).

- In the Invalidate Token API, when the tokens index doesn't
exist or is closed, we return `400` because we assume this is
a user issue either because they tried to invalidate a token
when there is no tokens index yet ( i.e. no tokens have
been created yet or the tokens index has been deleted ) or the
index is closed.

- In the Invalidate Token API, when the tokens index is
unavailable, we return a `503` status code because
we want to signal to the caller of the API that the token they
tried to invalidate was not invalidated and we can't be sure
if it is still valid or not, and that they should try the request
again.

Resolves: #53323
2020-04-16 14:05:55 +03:00
Igor Motov 1754e50cbd
[7.x] Add analytics plugin usage stats to _xpack/usage (#54911) (#55162)
Adds analytics plugin usage stats to _xpack/usage.

Closes #54847
2020-04-14 17:03:14 -04:00
Yannick Welsch a610513ec7 Provide repository-level stats for searchable snapshots (#55051)
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
2020-04-14 14:34:08 +02:00
Igor Motov 51c6f69e02
[7.x] Add support for filters to T-Test aggregation (#54980) (#55066)
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.

Closes #53692
2020-04-13 12:28:58 -04:00
Przemysław Witek 17101d86d9
[7.x] Do not execute ML CRUD actions when upgrade mode is enabled (#54437) (#55049) 2020-04-10 16:07:11 +02:00
Tanguy Leroux 4d36917e52
Merge feature/searchable-snapshots branch into 7.x (#54803) (#54825)
This is a backport of #54803 for 7.x.

This pull request cherry picks the squashed commit from #54803 with the additional commits:

    6f50c92 which adjusts master code to 7.x
    a114549 to mute a failing ILM test (#54818)
    48cbca1 and 50186b2 that cleans up and fixes the previous test
    aae12bb that adds a missing feature flag (#54861)
    6f330e3 that adds missing serialization bits (#54864)
    bf72c02 that adjust the version in YAML tests
    a51955f that adds some plumbing for the transport client used in integration tests

Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
2020-04-07 13:28:53 +02:00
David Roberts 8f2ddaee1a [TEST] Allow kb or mb for data frame analytics memory estimate (#54869)
This change is to support the switch from kb to mb being made in
https://github.com/elastic/ml-cpp/pull/1126
2020-04-07 11:28:29 +01:00
Jason Tedor 184c038f59
Add get autoscaling policy API (#54762)
This commit adds the get autoscaling policy API.
2020-04-04 18:04:25 -04:00
Benjamin Trent 4a1610265f
[7.x] [ML] add new inference_config field to trained model config (#54421) (#54647)
* [ML] add new inference_config field to trained model config (#54421)

A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.

The inference processor can still override whatever is set as the default in the trained model config.

* fixing for backport
2020-04-02 12:25:10 -04:00
Jason Tedor 54ecb009bb
Add delete autoscaling policy API (#54601)
This commit adds an API for deleting autoscaling policies.
2020-04-02 09:05:12 -04:00
Martijn Laarman 0ed20cc349 Rename cat.transform => cat.transforms (#54438)
* Rename cat.transform => cat.transforms

To match the url.

We typically prefer singular url nouns but _cat tends to use plural and
this API does in fact uses `/_cat/transforms`

* also rename the api in the spec and tests

(cherry picked from commit c495d220ac8fedba7f70f82387cd6d6a672b8b14)
2020-04-02 09:40:51 +02:00
Russ Cam 2978024375 Update rest API specs (#54252)
This commit updates the rest API specs to validate against a
JSON schema for the specifications. Most updates are to add
a description, whilst others fix typos and unify conventions
e.g. deprecations, descriptions, urls starting with /. The schema
conforms to draft-07 JSON schema.

(cherry picked from commit da37e01d32f9764c3937736ef0c7d3ab40af9a77)
2020-04-02 10:53:32 +10:00
Russ Cam a2f59a2744 Add hidden value to expand_wildcards params (#54551)
This commit adds the hidden enum value to
all expand_wildcards params

(cherry picked from commit 581b8cdabe11444105edb62226b439ba4c7e908a)
2020-04-02 09:01:20 +10:00
Jason Tedor f670ae0bc8
Introduce autoscaling policies (#54473)
This commit is the first in a series of commits that introduces
autoscaling policies, and APIs for working with them. For now, we
introduce the basic infrastructure, and a single API for putting an
autoscaling policy. We will follow in rapid succession with APIs for
getting, and deleting autoscaling policies.
2020-04-01 08:12:26 -04:00
Jason Tedor 5fcda57b37
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 17:24:38 -04:00
David Roberts b8f06df53f
[ML] Fix bug, add tests, improve estimates for estimate_model_memory (#54508)
This PR:

1. Fixes the bug where a cardinality estimate of zero could cause
   a 500 status
2. Adds tests for that scenario and a few others
3. Adds sensible estimates for the cases that were previously TODO

Backport of #54462
2020-03-31 17:59:38 +01:00
Christoph Büscher 67b9b68c66 [Docs] Add HLRC Async Search API documentation (#54353)
Adds documentation and a corresponding test case containing typical API usage
for the Async Search API to the High Level Rest Client.
2020-03-30 15:37:22 +02:00
Benjamin Trent 374e76d7cd
[Transform] fixing naming in HLRC and _cat to match API content (#54300) (#54408)
Fixing the naming of the HLRC values to match the ToXContent field names (i.e. the field names returned from an API call).

Also fixes the names in the _cat API as well.

closes #53946
2020-03-30 08:57:02 -04:00
Dimitris Athanasiou 13368aae37
[7.x][ML] DF Analytics should always display operational stats (#54210) (#54290)
This commit populates the _stats API response with sensible "empty"
`data_counts` and `memory_usage` objects when the job itself
has not started reporting them.

Backport of #54210
2020-03-26 20:03:14 +02:00
David Turner fc92bf4208 assertBusy in XPackRestIT#awaitCallApi (#54264)
Retries in this method were lost in #45794. This commit reinstates them.
2020-03-26 16:16:05 +00:00
Dimitris Athanasiou cc981fa377
[7.x][ML] Get ML filters size should default to 100 (#54207) (#54278)
When get filters is called without setting the `size`
paramter only up to 10 filters are returned. However,
100 filters should be returned. This commit fixes this
and adds an integ test to guard it.

It seems this was accidentally broken in #39976.

Closes #54206

Backport of #54207
2020-03-26 17:51:43 +02:00
Luca Cavanna ff269160af Async search: rename REST parameters (#54198)
This commit renames wait_for_completion to wait_for_completion_timeout in submit async search and get async search.
Also it renames clean_on_completion to keep_on_completion and turns around its behaviour.

Closes #54069
2020-03-26 09:40:50 +01:00
Martijn Laarman 077bf52acc transform.cat should live in the cat namespace. (#54196)
* transform.cat should live in the cat namespace.

Similarly to to ml cat API's also living in the `cat` namespace.

Clients treat the `cat` namespace differently then other API's (return
types, content types). This introduces an exception to this rule.

* rename the specification file as well

(cherry picked from commit 0a98904b1a73a30bbaebc32bd16a238c8d03c329)
2020-03-25 18:16:01 +01:00
David Roberts 7667004b20
[ML] Add a model memory estimation endpoint for anomaly detection (#54129)
A new endpoint for estimating anomaly detection job
model memory requirements:

POST _ml/anomaly_detectors/estimate_model_memory

Backport of #53507
2020-03-24 22:55:11 +00:00
markharwood 6a60f85bba
Wildcard field - add normalizer support (#53851) (#54109)
Backport support for normalisation to wildcard field

Closes #53603
2020-03-24 17:37:47 +00:00
David Roberts 1421471556
[ML] Introduce a "starting" datafeed state for lazy jobs (#54065)
It is possible for ML jobs to open lazily if the "allow_lazy_open"
option in the job config is set to true.  Such jobs wait in the
"opening" state until a node has sufficient capacity to run them.

This commit fixes the bug that prevented datafeeds for jobs lazily
waiting assignment from being started.  The state of such datafeeds
is "starting", and they can be stopped by the stop datafeed API
while in this state with or without force.

Backport of #53918
2020-03-24 13:00:04 +00:00
Hendrik Muhs 7dcacf531f
[7.x][Transform][Rollup] add processing stats to record the ti… (#54027)
add 2 additional stats: processing time and processing total which capture the
time spent for processing results and how often it ran. The 2 new stats
correspond to the existing indexing and search stats. Together with indexing
and search this now allows the user to see the full picture, all 3 stages.
2020-03-24 09:22:02 +01:00
Tim Vernum 4bd853a6f2
Add "grant_api_key" cluster privilege (#54042)
This change adds a new cluster privilege "grant_api_key" that allows
the use of the new /_security/api_key/grant endpoint

Backport of: #53527
2020-03-24 13:17:45 +11:00