This merges the initial work that adds a framework for performing
machine learning analytics on data frames. The feature is currently experimental
and requires a platinum license. Note that the original commits can be
found in the `feature-ml-data-frame-analytics` branch.
A new set of APIs is added which allows the creation of data frame analytics
jobs. Configuration allows specifying different types of analysis to be performed
on a data frame. At first there is support for outlier detection.
The APIs are:
- PUT _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}
- GET _ml/data_frame/analysis/{id}/_stats
- POST _ml/data_frame/analysis/{id}/_start
- POST _ml/data_frame/analysis/{id}/_stop
- DELETE _ml/data_frame/analysis/{id}
When a data frame analytics job is started a persistent task is created and started.
The main steps of the task are:
1. reindex the source index into the dest index
2. analyze the data through the data_frame_analyzer c++ process
3. merge the results of the process back into the destination index
In addition, an evaluation API is added which packages commonly used metrics
that provide evaluation of various analysis:
- POST _ml/data_frame/_evaluate
* [ML][Data Frame] adds new pipeline field to dest config (#43124)
* [ML][Data Frame] adds new pipeline field to dest config
* Adding pipeline support to _preview
* removing unused import
* moving towards extracting _source from pipeline simulation
* fixing permission requirement, adding _index entry to doc
* adjusting for java 8 compatibility
* adjusting bwc serialization version to 7.3.0
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
This commit clones the existing AnalyzeRequest/AnalyzeResponse classes
to the high-level rest client, and adjusts request converters to use these new
classes.
This is a prerequisite to removing the Streamable interface from the internal
server version of these classes.
As a follow-up to #38540 we can use lambda functions and method
references where convenient in the low-level REST client.
Also, we need to update the docs to state that the minimum java version
required is 1.8.
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
* [ML] adding pivot.size option for setting paging size
* Changing field name to address PR comments
* fixing ctor usage
* adjust hlrc for field name change
* [ML] Adds progress reporting for transforms
* fixing after master merge
* Addressing PR comments
* removing unused imports
* Adjusting afterKey handling and percentage to be 100*
* Making sure it is a linked hashmap for serialization
* removing unused import
* addressing PR comments
* removing unused import
* simplifying code, only storing total docs and decrementing
* adjusting for rewrite
* removing initial progress gathering from executor
This commit fixes a problem with BWC that was brought up in #40511. A
newer version of the code was emitting a new value for an enum to an
older version, and the older version could not handle that. It caused
the response to error. The MainResponse is now relaxed, and will accept
whatever values the server expose, and holds most of them as Strings
instead of complex objects.
Fixes#40511
We generate two pages with "funny" names:
* _changing_the_client_8217_s_initialization_code.html
* _changing_the_application_8217_s_code.html
The leading `_` comes from us not specifying the name of the page. The
`8217` comes about because of the single quote character. This is a
funny name, but it is the name that we have so we shouldn't change it
without putting in a redirect.
We're looking at switching these docs from being built with the
no-longer-maintained AsciiDoc project to being built with the
actively-maintained Asciidoctor project. Asciidoctor Doesn't include the
`8217`s in the generated ids. That is *better*, but we don't really want
to change the pages. Ultimately we'd prefer none of our pages start with
`_`, but that is a problem for a different time.
Anyway, this pins the ids to their "funny" id so it won't change when we
switch to Asciidoctor. We'll remove it later, when we have more fine
control of our redirects.
* [ML] Add data frame task state object and field
* A new state item is added so that the overall task state can be
accoutned for
* A new FAILED state and reason have been added as well so that failures
can be shown to the user for optional correction
* Addressing PR comments
* adjusting after master merge
* addressing pr comment
* Adjusting auditor usage with failure state
* Refactor, renamed state items to task_state and indexer_state
* Adding todo and removing redundant auditor call
* Address HLRC changes and PR comment
* adjusting hlrc IT test
* [ML] make source and dest objects in the transform config
* addressing PR comments
* Fixing compilation post merge
* adding comment for Arrays.hashCode
* addressing changes for moving dest to object
* fixing data_frame yml tests
* fixing API test
The Migration Assistance API has been functionally replaced by the
Deprecation Info API, and the Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
This commit introduces the forget follower API. This API is needed in cases that
unfollowing a following index fails to remove the shard history retention leases
on the leader index. This can happen explicitly through user action, or
implicitly through an index managed by ILM. When this occurs, history will be
retained longer than necessary. While the retention lease will eventually
expire, it can be expensive to allow history to persist for that long, and also
prevent ILM from performing actions like shrink on the leader index. As such, we
introduce an API to allow for manual removal of the shard history retention
leases in this case.
We have had various reports of problems caused by the maxRetryTimeout
setting in the low-level REST client. Such setting was initially added
in the attempts to not have requests go through retries if the request
already took longer than the provided timeout.
The implementation was problematic though as such timeout would also
expire in the first request attempt (see #31834), would leave the
request executing after expiration causing memory leaks (see #33342),
and would not take into account the http client internal queuing (see #25951).
Given all these issues, it seems that this custom timeout mechanism
gives little benefits while causing a lot of harm. We should rather rely
on connect and socket timeout exposed by the underlying http client
and accept that a request can overall take longer than the configured
timeout, which is the case even with a single retry anyways.
This commit removes the `maxRetryTimeout` setting and all of its usages.
Elasticsearch has long [supported](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#index-versioning) compare and set (a.k.a optimistic concurrency control) operations using internal document versioning. Sadly that approach is flawed and can sometime do the wrong thing. Here's the relevant excerpt from the resiliency status page:
> When a primary has been partitioned away from the cluster there is a short period of time until it detects this. During that time it will continue indexing writes locally, thereby updating document versions. When it tries to replicate the operation, however, it will discover that it is partitioned away. It won’t acknowledge the write and will wait until the partition is resolved to negotiate with the master on how to proceed. The master will decide to either fail any replicas which failed to index the operations on the primary or tell the primary that it has to step down because a new primary has been chosen in the meantime. Since the old primary has already written documents, clients may already have read from the old primary before it shuts itself down. The version numbers of these reads may not be unique if the new primary has already accepted writes for the same document
We recently [introduced](https://www.elastic.co/guide/en/elasticsearch/reference/6.x/optimistic-concurrency-control.html) a new sequence number based approach that doesn't suffer from this dirty reads problem.
This commit removes support for internal versioning as a concurrency control mechanism in favor of the sequence number approach.
Relates to #1078
X-Pack security supports built-in authentication service
`token-service` that allows access tokens to be used to
access Elasticsearch without using Basic authentication.
The tokens are generated by `token-service` based on
OAuth2 spec. The access token is a short-lived token
(defaults to 20m) and refresh token with a lifetime of 24 hours,
making them unsuitable for long-lived or recurring tasks where
the system might go offline thereby failing refresh of tokens.
This commit introduces a built-in authentication service
`api-key-service` that adds support for long-lived tokens aka API
keys to access Elasticsearch. The `api-key-service` is consulted
after `token-service` in the authentication chain. By default,
if TLS is enabled then `api-key-service` is also enabled.
The service can be disabled using the configuration setting.
The API keys:-
- by default do not have an expiration but expiration can be
configured where the API keys need to be expired after a
certain amount of time.
- when generated will keep authentication information of the user that
generated them.
- can be defined with a role describing the privileges for accessing
Elasticsearch and will be limited by the role of the user that
generated them
- can be invalidated via invalidation API
- information can be retrieved via a get API
- that have been expired or invalidated will be retained for 1 week
before being deleted. The expired API keys remover task handles this.
Following are the API key management APIs:-
1. Create API Key - `PUT/POST /_security/api_key`
2. Get API key(s) - `GET /_security/api_key`
3. Invalidate API Key(s) `DELETE /_security/api_key`
The API keys can be used to access Elasticsearch using `Authorization`
header, where the auth scheme is `ApiKey` and the credentials, is the
base64 encoding of API key Id and API key separated by a colon.
Example:-
```
curl -H "Authorization: ApiKey YXBpLWtleS1pZDphcGkta2V5" http://localhost:9200/_cluster/health
```
Closes#34383
Added deprecation warnings for use of include_type_name in put/get index templates.
HLRC changes:
GetIndexTemplateRequest has a new client-side class which is a copy of server's GetIndexTemplateResponse but modified to be typeless.
PutIndexTemplateRequest has a new client-side counterpart which doesn't use types in the mappings
Relates to #35190
This commit modifies the put follow index action to use a
CcrRepository when creating a follower index. It routes
the logic through the snapshot/restore process. A
wait_for_active_shards parameter can be used to configure
how long to wait before returning the response.
From previous PRs, we've already added support for include_type_name to
the get mapping API. We had also taken an approach to the HLRC where the
server-side `GetMappingResponse#fromXContent` could only handle typeless
input.
This PR updates the HLRC for 'get mapping' to be in line with our new approach:
* Add a typeless 'get mappings' method to the Java HLRC, that accepts new
client-side request and response objects. This new response only handles
typeless mapping definitions.
* Switch the old version of `GetMappingResponse` back to expecting typed
mappings, and deprecate the corresponding method on the HLRC.
Finally, the PR also does some small, related clean-up around 'get field mappings'.
From #29453 and #37285, the include_type_name parameter was already present and defaulted to false. This PR makes the following updates:
* Add deprecation warnings to RestCreateIndexAction, plus tests in RestCreateIndexActionTests.
* Add a typeless 'create index' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I created new CreateIndexRequest and CreateIndexResponse objects that differ from the existing server ones.
The EmptyResponse is essentially the same as returning a boolean, which
is done in other places. This commit deprecates all the existing
EmptyResponse methods and creates new boolean methods that have method
params reordered so they can exist with the deprecated methods. A
followup PR in master will remove the existing deprecated methods, fix
the parameter ordering and deprecate the incorrectly ordered parameter
methods.
Relates #36938
- Add deprecation warning to RestGetFieldMappingAction
- Add two new java HRLC classes GetFieldMappingsRequest and
GetFieldMappingsResponse. These classes use new typeless forms
of a request and response, and differ in that from the server
versions.
Relates to #35190
From #29453 and #37285, the `include_type_name` parameter was already present and defaulted to false. This PR makes the following updates:
- Add deprecation warnings to `RestPutMappingAction`, plus tests in `RestPutMappingActionTests`.
- Add a typeless 'put mappings' method to the Java HLRC, and deprecate the old typed version. To do this cleanly, I opted to create a new `PutMappingRequest` object that differs from the existing server one.
This change adds deprecation warning to the indices.get_mapping API in case the
"inlcude_type_name" parameter is set to "true" and changes the parsing code in
GetMappingsResponse to parse the type-less response instead of the one
containing types. As a consequence the HLRC client doesn't need to force
"include_type_name=true" any more and the GetMappingsResponseTests can be
adapted to the new format as well. Also removing some "include_type_name"
parameters in yaml test and docs where not necessary.
Update the scroll example ascii and Java docs, so it is more clear when to
consume the scroll documents. Before this change the user could loose
the first results if one uses copy & paste.
Added warnings checks to existing tests
Added “defaultTypeIfNull” to DocWriteRequest interface so that Bulk requests can override a null choice of document type with any global custom choice.
Related to #35190
* Deprecate types in index API
- deprecate type-based constructors of IndexRequest
- update tests to use typeless IndexRequest constructors
- no yaml tests as they have been already added in #35790
Relates to #35190
This change:
- Adds functionality to invalidate all (refresh+access) tokens for all users of a realm
- Adds functionality to invalidate all (refresh+access)tokens for a user in all realms
- Adds functionality to invalidate all (refresh+access) tokens for a user in a specific realm
- Changes the response format for the invalidate token API to contain information about the
number of the invalidated tokens and possible errors that were encountered.
- Updates the API Documentation
After back-porting to 6.x, the `created` field will be removed from master as a field in the
response
Resolves: #35115
Relates: #34556
The following updates were made:
* Add deprecation warnings to `RestUpdateAction`, plus a test in `RestUpdateActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Add HLRC integration tests for the typed APIs.
* Update documentation (for both the REST API and Java HLRC).
* Fix failing integration tests.
Because of an earlier PR, the REST yml tests were already updated (one version without types, and another legacy version that retains types).
This adds the _security/user/_privileges API to the High
Level Rest Client.
This also makes some changes to the Java model for the Role APIs
in order to better accommodate the GetPrivileges API
The following updates were made:
- Add a new untyped endpoint `{index}/_explain/{id}`.
- Add deprecation warnings to Rest*Action, plus tests in Rest*ActionTests.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
For each API, the following updates were made:
- Add deprecation warnings to `Rest*Action`, plus tests in `Rest*ActionTests`.
- For each REST yml test, make sure there is one version without types, and another legacy version that retains types (called *_with_types.yml).
- Deprecate relevant methods on the Java HLRC requests/ responses.
- Update documentation (for both the REST API and Java HLRC).
This commit adds support for the index templates exist API, creating
new client-side request types for that API and the get index
templates API. Also adds links in hlrc docs to pages for supported
index template APIs
* Add deprecation warnings to `Rest*TermVectorsAction`, plus tests in `Rest*TermVectorsActionTests`.
* Deprecate relevant methods on the Java HLRC requests/ responses.
* Update documentation (for both the REST API and Java HLRC).
* For each REST yml test, create one version without types, and another legacy version that retains types (called *_with_types.yml).
- GetSslCertificatesRequest need not implement toXContentObject
- getRequest() returns a new Request object
- Add tests for GetSslCertificatesResponse
- Adjust docs to the new format
Add a more detailed section about what exceptions to expect from the blocking
calls in this class and removing the mostly redundant mentions of the
IOExceptions from each method javadoc since it doesn't give much details about
the expected exceptions anyway.
Closes#30334
- Add the authentication realm and lookup realm name and type in the response for the _authenticate API
- The authentication realm is set as the lookup realm too (instead of setting the lookup realm to null or empty ) when no lookup realm is used.
Update PutUserRequest to support password_hash (see: #35242)
This also updates the documentation to bring it in line with our more
recent approach to HLRC docs.
this adds documentation for the retry method in the
high-level-ilm-rest-client.
this PR also renames retryLifecycleStep to retryLifecyclePolicy in the index-lifecycle-client
* Deprecate types in count requests.
* Move RestCountAction to the 'search' package.
* Deprecate types in multi search requests.
* Add tests for types deprecation in the _search endpoint.
* ML: Adding missing datacheck to datafeedjob
* Adding client side and docs
* Making adjustments to validations
* Making values default to on, having more sensible limits
* Intermittent commit, still need to figure out interval
* Adjusting delayed data check interval
* updating docs
* Making parameter Boolean, so it is nullable
* bumping bwc to 7 before backport
* changing to version current
* moving delayed data check config its own object
* Separation of duties for delayed data detection
* fixing checkstyles
* fixing checkstyles
* Adjusting default behavior so that null windows are allowed
* Mentioning the default value
* Fixing comments, syncing up validations
With #34811 the API for stopping rollup jobs got two new url parameters
"wait_for_completion" and "timeout". This change adds these to the HLRC APIs as
well.
Relates to #34811
* Adds HLRC docs for put lifecycle policy
* Adds link to docs in client javadocs
* Fixes checkstyle
* Make the documentation use the right ack response
Implement high level client for migration upgrade API. It should wrap
RestHighLevelClient and expose high level IndexUpgradeRequest (new),
IndexTaskResponse for submissions with wait_for_completion=false and
BulkByScrollResponse (already used) objects.
refers: #29827
* Moved `AcknowledgedResponse` to core package
* Made AcknowledgedResponse not abstract and provided a default parser,
so that in cases when the field name is not overwritten then there
is no need for a subclass.
Relates to #33824
This change adds support for clearing the cache of a realm. The realms
cache may contain a stale set of credentials or incorrect role
assignment, which can be corrected by clearing the cache of the entire
realm or just that of a specific user.
Relates #29827
This adds the security `_authenticate` API to the HLREST client.
It is unlike some of the other APIs because the request does not
have a body.
The commit also creates the `User` entity. It is important
to note that the `User` entity does not have the `enabled`
flag. The `enabled` flag is part of the response, alongside
the `User` entity.
Moreover this adds the `SecurityIT` test class
(extending `ESRestHighLevelClientTestCase`).
Relates #29827
Bulk Request in High level rest client should be consistent with what is
possible in Rest API, therefore should support global parameters. Global
parameters are passed in URL in Rest API.
Some parameters are mandatory - index, type - and would fail validation
if not provided before before the bulk is executed.
Optional parameters - routing, pipeline.
The usage of these should be consistent across sync/async execution,
bulk processor and BulkRequestBuilder
closes#26026
This further applies the pattern set in #34125 to reduce copy-and-paste
in the multi-document CRUD portion of the High Level REST Client docs.
It also adds line wraps to snippets that are too wide to fit into the box
when rendered in the docs, following up on the work started in #34163.
Adds support for the Clear Roles Cache API to the High Level Rest
Client. As part of this a helper class, NodesResponseHeader, has been
added that enables parsing the nodes header from responses that are
node requests.
Relates to #29827
* Replace custom type names with _doc in REST examples.
* Avoid using two mapping types in the percolator docs.
* Rename doc -> _doc in the main repository README.
* Also replace some custom type names in the HLRC docs.
Introduces client-specific request and response classes that do not
depend on the server
The `type` parameter is named `licenseType` in the response class to be
more descriptive. The parts that make up the acknowledged-required
response are given slightly different names than their server-response
types to be consistent with the naming in the put license API
Tests do not cover all cases because the integ test cluster starts up
with a trial license - this will be addressed in a future commit
* Replace deprecated field `code` with `source` for stored scripts (#25127)
* Replace examples using the deprecated endpoint `{index}/{type}/_search`
with `{index}/_search` (#29468)
* Use a system property to avoid deprecation warnings after the Update
Scripts have been moved to their own context (#32096)
We added support for role mapper expression DSL in #33745,
that allows us to build the role mapper expression used in the
role mapping (as rules for determining user roles based on what
the boolean expression resolves to).
This change now adds support for create/update role mapping
API to the high-level rest client.
* HLRC: ML Add preview datafeed api
* Changing deprecation handling for parser
* Removing some duplication in docs, will address other APIs in another PR
* HLRC: ML Cleanup docs
* updating get datafeed stats docs
This further applies the pattern set in #34125 to reduce copy-and-paste
in the single document CRUD portion of the High Level REST Client docs.
It also adds line wraps to snippets that are too wide to fit into the box
when rendered in the docs, following up on the work started in #34163.
This changes the delete job API by adding
the choice to delete a job asynchronously.
The commit adds a `wait_for_completion` parameter
to the delete job request. When set to `false`,
the action returns immediately and the response
contains the task id.
This also changes the handling of subsequent
delete requests for a job that is already being
deleted. It now uses the task framework to check
if the job is being deleted instead of the cluster
state. This is a beneficial for it is going to also
be working once the job configs are moved out of the
cluster state and into an index. Also, force delete
requests that are waiting for the job to be deleted
will not proceed with the deletion if the first task
fails. This will prevent overloading the cluster. Instead,
the failure is communicated better via notifications
so that the user may retry.
Finally, this makes the `deleting` property of the job
visible (also it was renamed from `deleted`). This allows
a client to render a deleting job differently.
Closes#32836
* HLRC: ML Add preview datafeed api
* Changing deprecation handling for parser
* Removing some duplication in docs, will address other APIs in another PR
This change adds throttling to the update-by-query and delete-by-query cases
similar to throttling for reindex. This mostly means additional methods on the
client class itself, since the request hits the same RestHandler, just with
slightly different endpoints, and also the return values are similar.
Adds support for the get rollup job to the High Level REST Client. I had
to do three interesting and unexpected things:
1. I ported the rollup state wiping code into the high level client
tests. I'll move this into the test framework in a followup and remove
the x-pack version.
2. The `timeout` in the rollup config was serialized using the
`toString` representation of `TimeValue` which produces fractional time
values which are more human readable but aren't supported by parsing. So
I switched it to `getStringRep`.
3. Refactor the xcontent round trip testing utilities so we can test
parsing of classes that don't implements `ToXContent`.
We use wrap code in `// tag` and `//end` to include it in our docs. Our
current docs style wraps code snippets in a box that is only wide enough
for 76 characters and adds a horizontal scroll bar for wider snippets
which makes the snippet much harder to read. This adds a checkstyle check
that looks for java code that is included in the docs and is wider than
that 76 characters so all snippets fit into the box. It solves many of
the failures that this catches but suppresses many more. I will clean
those up in a follow up change.
This commit adds the Create Rollup Job API to the high level REST
client. It supersedes #32703 and adds dedicated request/response
objects so that it does not depend on server side components.
Related #29827
This also changes both `DatafeedConfig` and `DatafeedUpdate`
to store the query and aggs as a bytes reference. This allows
the client to remove its dependency to the named objects
registry of the search module.
Relates #29827
This change adds support for enable and disable user APIs to the high
level rest client. There is a common request base class for both
requests with specific requests that simplify the use of these APIs.
The response for these APIs is simply an empty object so a new response
class has been created for cases where we expect an empty response to
be returned.
Finally, the put user documentation has been moved to the proper
location that is not within an x-pack sub directory and the document
tags no longer contain x-pack.
See #29827