This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.
While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
This change adds support for the following new model_size_stats
fields:
- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status
Backport of #51879
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
in preparation for feature importance and split information gain, adding `number_samples` field to `TreeNode` definition.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The main purpose of this commit is to add a single autoscaling REST
endpoint skeleton, for the purpose of starting to build out the build
and testing infrastructure that will surround it. For example, rather
than commiting a fully-functioning autoscaling API, we introduce here
the skeleton so that we can start wiring up the build and testing
infrastructure, establish security roles/permissions, an so on. This
way, in a forthcoming PR that introduces actual functionality, that PR
will be smaller and have less distractions around that sort of
infrastructure.
SecurityIT.testGetUser creates a user for testing purposes, but did
not delete the user at the end of the test. This could leave the
cluster in an unexpected state for other tests.
This commit:
- Deletes the user at the end of `testGetUser`
- Adds the test-name as metadata to the users that are created in `SecurityIT`
so that their origin is clear if they do interfere with other tests
- Enables SecurityDocumentationIT.testGetUsers on the expectation that
the new cleanup step will resolve the unreliability of that test.
Relates: #48440
Co-authored-by: Tim Vernum <tim@adjective.org>
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.
Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
* Rename ILM history index enablement setting
The previous setting was `index.lifecycle.history_index_enabled`, this commit changes it to
`indices.lifecycle.history_index_enabled` to indicate this is not an index-level setting (it's node
level).
* [ML][Inference] Fix weighted mode definition (#51648)
Weighted mode inaccurately assumed that the "max value" of the input values would be the maximum class value. This does not make sense.
Weighted Mode should know how many classes there are. Hence the new parameter `num_classes`. This indicates what the maximum class value to be expected.
The audit index is re-created for every testrun and therefore potential useful debug information
gets lost. This change reads out the audit index and logs the results, which makes them available
for debugging CI issues.
relates #51549
This commit creates a new index privilege named `maintenance`.
The privilege grants the following actions: `refresh`, `flush` (also synced-`flush`),
and `force-merge`. Previously the actions were only under the `manage` privilege
which in some situations was too permissive.
Co-authored-by: Amir H Movahed <arhd83@gmail.com>
This commit adds examples in our documentation for
- An HLRC instance authenticating to an elasticsearch cluster using
an elasticsearch token service access token or an API key
- An HLRC instance connecting to an elasticsearch cluster that is
setup for TLS on the HTTP layer when the CA certificate of the
cluster is available either as a PEM file or a keystore
- An HLRC instance connecting to an elasticsearch cluster that
requires client authentication where the client key and certificate
are available in a keystore
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
This change changes the way to run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:
- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.
- When runtime java is 11 and higher, using BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests
- When runtime java is 8, using BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.
Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
* [ML][Inference] add tags url param to GET (#51330)
Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.
This parameter allows the list of models to be further reduced to those who contain all the provided tags.
This change adds a new `kibana_admin` role, and deprecates
the old `kibana_user` and`kibana_dashboard_only_user`roles.
The deprecation is implemented via a new reserved metadata
attribute, which can be consumed from the API and also triggers
deprecation logging when used (by a user authenticating to
Elasticsearch).
Some docs have been updated to avoid references to these
deprecated roles.
Backport of: #46456
Co-authored-by: Larry Gregory <lgregorydev@gmail.com>
* [ML][Inference] Adding classification_weights to ensemble models
classification_weights are a way to allow models to
prefer specific classification results over others
this might be advantageous if classification value
probabilities are a known quantity and can improve
model error rates.
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.
Backport of #50914
This commit improves the performance of warning value extraction in the
low-level REST client, and is similar to the approach taken in
#24114. There are some differences since the low-level REST client might
be connected to Elasticsearch through a proxy that injects its own
warnings.
* [ML][Inference] PUT API (#50852)
This adds the `PUT` API for creating trained models that support our format.
This includes
* HLRC change for the API
* API creation
* Validations of model format and call
* fixing backport
Since 6.0, the 'template' field has been deprecated in put template requests in
favour of index_patterns. Previously, the PutIndexTemplateRequest would accept
the 'template' field in its 'source' methods and silently convert it to
'index_patterns'. This meant that users specifying 'template' in the source
would not receive a deprecation warning from the server.
This PR makes a small change to no longer silently convert 'template' to
'index_patterns', which ensures that users receive a deprecation warning.
Follow-up to #49460.
Replaces the "funny"
`Function<String, ConstructingObjectParser<T, Void>>` with a much
simpler `ConstructingObjectParser<T, String>`. This makes pretty much
all of our object parsers static.
This adds the necessary named XContent classes to the HLRC for the lang ident model. This is so the HLRC can call `GET _ml/inference/lang_ident_model_1?include_definition=true` without XContent parsing errors.
The constructors are package private as since this classes are used exclusively within the pre-packaged model (and require the specific weights, etc. to be of any use).
If a pipeline referenced by a transform does not exist, we should not allow the transform to be created.
We do allow the pipeline existence check to be skipped with defer_validations, but if the pipeline still does not exist on `_start`, the pipeline will fail to start.
relates: #50135
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.
In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner
arrays or objects,
- the metadata can't have more than 5 keys in total.
Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.
Here is how the meta might look like in mappings:
```json
{
"properties": {
"latency": {
"type": "long",
"meta": {
"unit": "ms"
}
}
}
}
```
And then in the field capabilities response:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms" ]
}
}
}
}
```
When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms", "ns" ]
}
}
}
}
```
Closes#33267
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to these
parsers.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
Adds a `force` parameter to the delete data frame analytics
request. When `force` is `true`, the action force-stops the
jobs and then proceeds to the deletion. This can be used in
order to delete a non-stopped job with a single request.
Closes#48124
Backport of #50553
We have about 800 `ObjectParsers` in Elasticsearch, about 700 of which
are final. This is *probably* the right way to declare them because in
practice we never mutate them after they are built. And we certainly
don't change the static reference. Anyway, this adds `final` to a bunch
of these parsers, mostly the ones in xpack and their "paired" parsers in
the high level rest client. I picked these just to have somewhere to
break the up the change so it wouldn't be huge.
I found the non-final parsers with this:
```
diff \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*PARSER\s*=' {} \+ | sort) \
<(find . -type f -name '*.java' -exec grep -iHe 'static.*final.*PARSER\s*=' {} \+ | sort) \
2>&1 | grep '^<'
```
The additional change to the original PR (#49657), is that `org.elasticsearch.client.cluster.RemoteConnectionInfo` now parses the initial_connect_timeout field as a string instead of a TimeValue instance.
The reason that this is needed is because that the initial_connect_timeout field in the remote connection api is serialized for human consumption, but not for parsing purposes.
Therefore the HLRC can't parse it correctly (which caused test failures in CI, but not in the PR CI
:( ). The way this field is serialized needs to be changed in the remote connection api, but that is a breaking change. We should wait making this change until rest api versioning is introduced.
Co-Authored-By: j-bean <anton.shuvaev91@gmail.com>
Co-authored-by: j-bean <anton.shuvaev91@gmail.com>
* Add ILM histore store index (#50287)
* Add ILM histore store index
This commit adds an ILM history store that tracks the lifecycle
execution state as an index progresses through its ILM policy. ILM
history documents store output similar to what the ILM explain API
returns.
An example document with ALL fields (not all documents will have all
fields) would look like:
```json
{
"@timestamp": 1203012389,
"policy": "my-ilm-policy",
"index": "index-2019.1.1-000023",
"index_age":123120,
"success": true,
"state": {
"phase": "warm",
"action": "allocate",
"step": "ERROR",
"failed_step": "update-settings",
"is_auto-retryable_error": true,
"creation_date": 12389012039,
"phase_time": 12908389120,
"action_time": 1283901209,
"step_time": 123904107140,
"phase_definition": "{\"policy\":\"ilm-history-ilm-policy\",\"phase_definition\":{\"min_age\":\"0ms\",\"actions\":{\"rollover\":{\"max_size\":\"50gb\",\"max_age\":\"30d\"}}},\"version\":1,\"modified_date_in_millis\":1576517253463}",
"step_info": "{... etc step info here as json ...}"
},
"error_details": "java.lang.RuntimeException: etc\n\tcaused by:etc etc etc full stacktrace"
}
```
These documents go into the `ilm-history-1-00000N` index to provide an
audit trail of the operations ILM has performed.
This history storage is enabled by default but can be disabled by setting
`index.lifecycle.history_index_enabled` to `false.`
Resolves#49180
* Make ILMHistoryStore.putAsync truly async (#50403)
This moves the `putAsync` method in `ILMHistoryStore` never to block.
Previously due to the way that the `BulkProcessor` works, it was possible
for `BulkProcessor#add` to block executing a bulk request. This was bad
as we may be adding things to the history store in cluster state update
threads.
This also moves the index creation to be done prior to the bulk request
execution, rather than being checked every time an operation was added
to the queue. This lessens the chance of the index being created, then
deleted (by some external force), and then recreated via a bulk indexing
request.
Resolves#50353
* Update remote cluster stats to support simple mode (#49961)
Remote cluster stats API currently only returns useful information if
the strategy in use is the SNIFF mode. This PR modifies the API to
provide relevant information if the user is in the SIMPLE mode. This
information is the configured addresses, max socket connections, and
open socket connections.
* Send hostname in SNI header in simple remote mode (#50247)
Currently an intermediate proxy must route conncctions to the
appropriate remote cluster when using simple mode. This commit offers
a additional mechanism for the proxy to route the connections by
including the hostname in the TLS SNI header.
* Rename the remote connection mode simple to proxy (#50291)
This commit renames the simple connection mode to the proxy connection
mode for remote cluster connections. In order to do this, the mode specific
settings which we namespaced by their mode (ex: sniff.seed and
proxy.addresses) have been reverted.
* Modify proxy mode to support a single address (#50391)
Currently, the remote proxy connection mode uses a list setting for the
proxy address. This commit modifies this so that the setting is
proxy_address and only supports a single remote proxy address.
The "code_user" and "code_admin" reserved roles existed to support
code search which is no longer included in Kibana.
The "kibana_system" role included privileges to read/write from the
code search indices, but no longer needs that access.
Backport of: #50068
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.
Backport of #49990
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.
Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.
For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.
Response format:
```
{
"types_allowed": [
"inline",
"stored"
],
"language_contexts": [
{
"language": "expression",
"contexts": [
"aggregation_selector",
"aggs"
...
]
},
{
"language": "painless",
"contexts": [
"aggregation_selector",
"aggs",
"aggs_combine",
...
]
}
...
]
}
```
Fixes: #49463
**Backport**
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.
It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
Related to #47567
Reindex sort never gave a guarantee about the order of documents being
indexed into the destination, though it could give a sense of locality
of source data.
It prevents us from doing resilient reindex and other optimizations and
it has therefore been deprecated.
Related to #47567
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.
Closes#49531
Backport of #49690
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
This commit back ports three commits related to enabling the simple
connection strategy.
Allow simple connection strategy to be configured (#49066)
Currently the simple connection strategy only exists in the code. It
cannot be configured. This commit moves in the direction of allowing it
to be configured. It introduces settings for the addresses and socket
count. Additionally it introduces new settings for the sniff strategy
so that the more generic number of connections and seed node settings
can be deprecated.
The simple settings are not yet registered as the registration is
dependent on follow-up work to validate the settings.
Ensure at least 1 seed configured in remote test (#49389)
This fixes#49384. Currently when we select a random subset of seed
nodes from a list, it is possible for 0 seeds to be selected. This test
depends on at least 1 seed being selected.
Add the simple strategy to cluster settings (#49414)
This is related to #49067. This commit adds the simple connection
strategy settings and strategy mode setting to the cluster settings
registry. With these changes, the simple connection mode can be used.
Additionally, it adds validation to ensure that settings cannot be
misconfigured.
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.
The API consolidates information that is useful before
creating a data frame analytics job.
It includes:
- memory estimation
- field selection explanation
Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.
Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.
Backport of #49455
This commit adds a deprecation warning when starting
a node where either of the server contexts
(xpack.security.transport.ssl and xpack.security.http.ssl)
meet either of these conditions:
1. The server lacks a certificate/key pair (i.e. neither
ssl.keystore.path not ssl.certificate are configured)
2. The server has some ssl configuration, but ssl.enabled is not
specified. This new validation does not care whether ssl.enabled is
true or false (though other validation might), it simply makes it
an error to configure server SSL without being explicit about
whether to enable that configuration.
Backport of: #45892
* [ML] ML Model Inference Ingest Processor (#49052)
* [ML][Inference] adds lazy model loader and inference (#47410)
This adds a couple of things:
- A model loader service that is accessible via transport calls. This service will load in models and cache them. They will stay loaded until a processor no longer references them
- A Model class and its first sub-class LocalModel. Used to cache model information and run inference.
- Transport action and handler for requests to infer against a local model
Related Feature PRs:
* [ML][Inference] Adjust inference configuration option API (#47812)
* [ML][Inference] adds logistic_regression output aggregator (#48075)
* [ML][Inference] Adding read/del trained models (#47882)
* [ML][Inference] Adding inference ingest processor (#47859)
* [ML][Inference] fixing classification inference for ensemble (#48463)
* [ML][Inference] Adding model memory estimations (#48323)
* [ML][Inference] adding more options to inference processor (#48545)
* [ML][Inference] handle string values better in feature extraction (#48584)
* [ML][Inference] Adding _stats endpoint for inference (#48492)
* [ML][Inference] add inference processors and trained models to usage (#47869)
* [ML][Inference] add new flag for optionally including model definition (#48718)
* [ML][Inference] adding license checks (#49056)
* [ML][Inference] Adding memory and compute estimates to inference (#48955)
* fixing version of indexed docs for model inference
Backport of #48849. Update `.editorconfig` to make the Java settings the
default for all files, and then apply a 2-space indent to all `*.gradle`
files. Then reformat all the files.
* [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)
[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)
Related PR: https://github.com/elastic/ml-cpp/pull/809
* adjusting bwc version
This commit finishes cleaning up the AbstractHlrcXContentTestCase work
and removes this class. All classes that were using this are now using
the updated base class.
Ref #39745
The old graph tests were duplicated a lot and used a deprecated parent
class. This commit cleans that up and removes one of the duplicated
tests.
Ref #39745
* [ML][Inference] separating definition and config object storage (#48651)
This separates out the `definition` object from being stored within the configuration object in the index.
This allows us to gather the config object without decompressing a potentially large definition.
Additionally, `input` is moved to the TrainedModelConfig object and out of the definition. This is so the trained input fields are accessible outside the potentially large model definition.
Adding support for the `search_type` request parameter to the Ranking Evaluation
API since this parameter can impact the ranking and the metric score and should
be choosen in the same way when evaluating the search as later in the real
search.
Closes#48503
Backport of #48447. Make a number of changes so that code in the
`client` directory is more resilient to automatic formatting. This
covers:
* Literal JSON handling:
* Reformatting multiline JSON to embed whitespace in the strings
* Remove string concatenation where JSON fits on a single line
* Use `String.format` for large documents with variable content
* Remove some erroneous doc refs in `QueryDSLDocumentationTests`
* Move some comments around to they aren't auto-formatted to a strange
place
This commit simplifies and standardizes our usage of the Gradle Shadow
plugin to conform more to plugin conventions. The custom "bundle" plugin
has been removed as it's not necessary and performs the same function
as the Shadow plugin's default behavior with existing configurations.
Additionally, this removes unnecessary creation of a "nodeps" artifact,
which is unnecessary because by default project dependencies will in
fact use the non-shadowed JAR unless explicitly depending on the
"shadow" configuration.
Finally, we've cleaned up the logic used for unit testing, so we are
now correctly testing against the shadow JAR when the plugin is applied.
This better represents a real-world scenario for consumers and provides
better test coverage for incorrectly declared dependencies.
(cherry picked from commit 3698131109c7e78bdd3a3340707e1c7b4740d310)
Due to a bug, GETing a snapshot can cause a RespositoryException to be
thrown. This error is transient and should be retried, rather than
causing the test to fail. This commit converts those
RepositoryExceptions into AssertionErrors so that they will be retried
in code wrapped in assertBusy.
BytesReference is currently an abstract class which is extended by
various implementations. This makes it very difficult to use the
delegation pattern. The implication of this is that our releasable
BytesReference is a PagedBytesReference type and cannot be used as a
generic releasable bytes reference that delegates to any reference type.
This commit makes BytesReference an interface and introduces an
AbstractBytesReference for common functionality.
The AbstractHlrcWriteableXContentTestCase was replaced by a better test
case a while ago, and this is the last two instances using it. They have
been converted and the test is now deleted.
Ref #39745
This commit removes the randomization used by every execute call in the
high level rest tests. Previously every execute call, which can be many
calls per single test, would rely on a random boolean to determine if
they should use the sync or async methods provided to the execute
method. This commit runs the tests twice, using two different clusters,
both of them providing the value one time via a sysprop. This ensures
that the whole suite of tests is run using the sync and async code
paths.
Closes#39667
All internal searches (triggered by APIs) across the .security index
must be performed while "under the security origin". Otherwise,
the search is performed in the context of the caller which most
likely does not have privileges to search .security (hopefully).
This commit fixes this in the case of two methods in the
TokenService and corrects an overly done such context switch
in the ApiKeyService.
In addition, this makes all tests from the client/rest-high-level
module execute as an all mighty administrator,
but not a literal superuser.
Closes#47151
Several links from the ILM HLRC Javadoc to the online documentation were
not updated when the ILM HLRC documentation was written. This commit
fixes those links.
Adds `GET /_script_context`, returning a `contexts` object with each
available context as a key whose value is an empty object. eg.
```
{
"contexts": {
"aggregation_selector": {},
"aggs": {},
"aggs_combine": {},
...
}
}
```
refs: #47411
This adds parsing an inference model as a possible
result of the analytics process. When we do parse such a model
we persist a `TrainedModelConfig` into the inference index
that contains additional metadata derived from the running job.
which is backport merge and adds a new ingest processor, named enrich processor,
that allows document being ingested to be enriched with data from other indices.
Besides a new enrich processor, this PR adds several APIs to manage an enrich policy.
An enrich policy is in charge of making the data from other indices available to the enrich processor in an efficient manner.
Related to #32789
This change adds:
- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs
Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)
Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.
A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
This commit adds HLRC support and documentation for the SLM Start and
Stop APIs, as well as updating existing documentation where appropriate.
This commit also ensures that the SLM APIs are properly included in the
HLRC documentation.
Elastic cloud has a concept of a cloud Id. This Id is a base64 encoded
url, split up into a few parts. This commit allows the user to pass in a
cloud id now, which is translated to a HttpHost that is defined by the
encoded parts therein.
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.
Backport of #47922
This commit adds two APIs that allow to pause and resume
CCR auto-follower patterns:
// pause auto-follower
POST /_ccr/auto_follow/my_pattern/pause
// resume auto-follower
POST /_ccr/auto_follow/my_pattern/resume
The ability to pause and resume auto-follow patterns can be
useful in some situations, including the rolling upgrades of
cluster using a bi-directional cross-cluster replication scheme
(see #46665).
This commit adds a new active flag to the AutoFollowPattern
and adapts the AutoCoordinator and AutoFollower classes so
that it stops to fetch remote's cluster state when all auto-follow
patterns associate to the remote cluster are paused.
When an auto-follower is paused, remote indices that match the
pattern are just ignored: they are not added to the pattern's
followed indices uids list that is maintained in the local cluster
state. This way, when the auto-follow pattern is resumed the
indices created in the remote cluster in the meantime will be
picked up again and added as new following indices. Indices
created and then deleted in the remote cluster will be ignored
as they won't be seen at all by the auto-follower pattern at
resume time.
Backport of #47510 for 7.x
Currently there are two issues with serializing BulkByScrollResponse.
First, when deserializing from XContent, indexing exceptions and search
exceptions are switched. Additionally, search exceptions do no retain
the appropriate RestStatus code, so you must evaluate the status code
from the exception. However, the exception class is not always correctly
retained when serialized.
This commit adds tests in the failure case. Additionally, fixes the
swapping of failure types and adds the rest status code to the search
failure.
Adds the following parameters to `outlier_detection`:
- `compute_feature_influence` (boolean): whether to compute or not
feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
to the feature values
Backport of #47600
Use case:
User with `create_doc` index privilege will be allowed to only index new documents
either via Index API or Bulk API.
There are two cases that we need to think:
- **User indexing a new document without specifying an Id.**
For this ES auto generates an Id and now ES version 7.5.0 onwards defaults to `op_type` `create` we just need to authorize on the `op_type`.
- **User indexing a new document with an Id.**
This is problematic as we do not know whether a document with Id exists or not.
If the `op_type` is `create` then we can assume the user is trying to add a document, if it exists it is going to throw an error from the index engine.
Given these both cases, we can safely authorize based on the `op_type` value. If the value is `create` then the user with `create_doc` privilege is authorized to index new documents.
In the `AuthorizationService` when authorizing a bulk request, we check the implied action.
This code changes that to append the `:op_type/index` or `:op_type/create`
to indicate the implied index action.
This commit adds support to retrieve all API keys if the authenticated
user is authorized to do so.
This removes the restriction of specifying one of the
parameters (like id, name, username and/or realm name)
when the `owner` is set to `false`.
Closes#46887
* Remove eclipse conditionals
We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.
This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.
Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.
This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
* Add API to execute SLM retention on-demand (#47405)
This is a backport of #47405
This commit adds the `/_slm/_execute_retention` API endpoint. This
endpoint kicks off SLM retention and then returns immediately.
This in particular allows us to run retention without scheduling it
(for entirely manual invocation) or perform a one-off cleanup.
This commit also includes HLRC for the new API, and fixes an issue
in SLMSnapshotBlockingIntegTests where retention invoked prior to the
test completing could resurrect an index the internal test cluster
cleanup had already deleted.
Resolves#46508
Relates to #43663
* [ML][Inference] adding .ml-inference* index and storage (#47267)
* [ML][Inference] adding .ml-inference* index and storage
* Addressing PR comments
* Allowing null definition, adding validation tests for model config
* fixing line length
* adjusting for backport
Currently the policy config is placed directly in the json object
of the toplevel `policies` array field. For example:
```
{
"policies": [
{
"match": {
"name" : "my-policy",
"indices" : ["users"],
"match_field" : "email",
"enrich_fields" : [
"first_name",
"last_name",
"city",
"zip",
"state"
]
}
}
]
}
```
This change adds a `config` field in each policy json object:
```
{
"policies": [
{
"config": {
"match": {
"name" : "my-policy",
"indices" : ["users"],
"match_field" : "email",
"enrich_fields" : [
"first_name",
"last_name",
"city",
"zip",
"state"
]
}
}
}
]
}
```
This allows us in the future to add other information about policies
in the get policy api response.
The UI will consume this API to build an overview of all policies.
The UI may in the future include additional information about a policy
and the plan is to include that in the get policy api, so that this
information can be gathered in a single api call.
An example of the information that is likely to be added is:
* Last policy execution time
* The status of a policy (executing, executed, unexecuted)
* Information about the last failure if exists
Backport of #45794 to 7.x. Convert most `awaitBusy` calls to
`assertBusy`, and use asserts where possible. Follows on from #28548 by
@liketic.
There were a small number of places where it didn't make sense to me to
call `assertBusy`, so I kept the existing calls but renamed the method to
`waitUntil`. This was partly to better reflect its usage, and partly so
that anyone trying to add a new call to awaitBusy wouldn't be able to find
it.
I also didn't change the usage in `TransportStopRollupAction` as the
comments state that the local awaitBusy method is a temporary
copy-and-paste.
Other changes:
* Rework `waitForDocs` to scale its timeout. Instead of calling
`assertBusy` in a loop, work out a reasonable overall timeout and await
just once.
* Some tests failed after switching to `assertBusy` and had to be fixed.
* Correct the expect templates in AbstractUpgradeTestCase. The ES
Security team confirmed that they don't use templates any more, so
remove this from the expected templates. Also rewrite how the setup
code checks for templates, in order to give more information.
* Remove an expected ML template from XPackRestTestConstants The ML team
advised that the ML tests shouldn't be waiting for any
`.ml-notifications*` templates, since such checks should happen in the
production code instead.
* Also rework the template checking code in `XPackRestTestHelper` to give
more helpful failure messages.
* Fix issue in `DataFrameSurvivesUpgradeIT` when upgrading from < 7.4
In the current implementation, the validation of the role query
occurs at runtime when the query is being executed.
This commit adds validation for the role query when creating a role
but not for the template query as we do not have the runtime
information required for evaluating the template query (eg. authenticated user's
information). This is similar to the scripts that we
store but do not evaluate or parse if they are valid queries or not.
For validation, the query is evaluated (if not a template), parsed to build the
QueryBuilder and verify if the query type is allowed.
Closes#34252
This commit adds support for POST requests to the SLM `_execute` API,
because POST is a more appropriate HTTP verb for this action as it is
not idempotent. The docs are also changed to favor POST over PUT,
although PUT is not removed or officially deprecated.
Using arrays of objects with embedded IDs is preferred for new APIs over
using entity IDs as JSON keys. This commit changes the SLM stats API to
use the preferred format.
* [ML][Inference] Feature pre-processing objects and functions (#46777)
To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future.
This PR lays down the ground work for 3 basic encodings:
* HotOne
* Target Mean
* Frequency
More feature encodings or pre-processings could be added in the future:
* Handling missing columns
* Standardization
* Label encoding
* etc....
* fixing compilation for namedxcontent tests
The HLRC has a method for reindex, that allows to trigger an async reindex by running RestHighLevelClient.submitReindexTask and RestHighLevelClient.reindex. The delete by query however only has an RestHighLevelClient.deleteByQuery method (and its async counterpart), but no RestHighLevelClient.submitDeleteByQueryTask. So add RestHighLevelClient.submitDeleteByQueryTask
Closes#46395
Currently the CountRequest accepts a search source builder, while the
RestCountAction only accepts a top level query object. This can lead
to confusion if another element (e.g. aggregations) is specified,
because that will be ignored on the server side in RestCountAction.
By deprecating the current setter & constructor that accept a
SearchSourceBuilder and adding replacement that accepts a QueryBuilder
it is clear what the count api can handle from HLRC side.
Follow up from #46829
* addSnapshotLifecyclePolicy drop version assertion
This drops the assertion on the policy version (which was pinned to 1L)
as we want to execute both put policy apis (sync and async) for
documentation purposes. This will sometimes (depending on the async
call) yield a version of 2L. Waiting for the async call to always
complete could be an option but the test is already rather slow and it's
a bit of an overkill as we're already verifying the policy was created.
(cherry picked from commit af4864c39129bcdbf98d00223f445346a62075e4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Prior to this commit terminate_after was sent as request body parameter
(via SearchSourceBuilder), which is not possible in the count api.
Closes#46446
This commits makes all the async methods in the high level client return the `Cancellable` object that the low level client now exposes.
Relates to #45379Closes#44802
The low-level REST client exposes a `performRequestAsync` method that
allows to send async requests, but today it does not expose the ability
to cancel such requests. That is something that the underlying apache
async http client supports, and it makes sense for us to expose.
This commit adds a return value to the `performRequestAsync` method,
which is backwards compatible. A `Cancellable` object gets returned,
which exposes a `cancel` public method. When calling `cancel`, the
on-going request associated with the returned `Cancellable` instance
will be cancelled by calling its `abort` method. This works throughout
multiple retries, though some special care was needed for the case where
`cancel` is called between different attempts (when one attempt has
failed and the consecutive one has not been sent yet).
Note that cancelling a request on the client side does not automatically
translate to cancelling the server side execution of it. That needs to be
specifically implemented, which is on the work for the search API (see #43332).
Relates to #44802
rename data frame transform plugin to transform:
- rename plugin data-frame to transform
- change all package names from o.e.*.dataframe.* to o.e.*.transform.*
- necessary changes to fix loading/testing
Changed the signature of AbstractResponseTestCase#createServerTestInstance(...)
to include the randomly selected xcontent type. This is needed for the
creating a server response instance with a query which is represented as BytesReference.
Maybe this should go into a different change?
This PR also includes HLRC docs for the get policy api.
Relates to #32789
* Add retention to Snapshot Lifecycle Management (#46407)
This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.
An example policy would look like:
```
PUT /_slm/policy/snapshot-every-day
{
"schedule": "0 30 2 * * ?",
"name": "<production-snap-{now/d}>",
"repository": "my-s3-repository",
"config": {
"indices": ["foo-*", "important"]
},
// Newly configured retention options
"retention": {
// Snapshots should be deleted after 14 days
"expire_after": "14d",
// Keep a maximum of thirty snapshots
"max_count": 30,
// Keep a minimum of the four most recent snapshots
"min_count": 4
}
}
```
SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.
Included in this work is a new SLM stats API endpoint available through
``` json
GET /_slm/stats
```
That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.
* Add base framework for snapshot retention (#43605)
* Add base framework for snapshot retention
This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.
Relates to #38461
* Remove extraneous 'public'
* Use a local var instead of reading class var repeatedly
* Add SnapshotRetentionConfiguration for retention configuration (#43777)
* Add SnapshotRetentionConfiguration for retention configuration
This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.
Relates to #43663
* Fix REST tests
* Fix more documentation
* Use Objects.equals to avoid NPE
* Put `randomSnapshotLifecyclePolicy` in only one place
* Occasionally return retention with no configuration
* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)
* Implement SnapshotRetentionTask's snapshot filtering and deletion
This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.
Relates to #43663
* Fix deletes running on the wrong thread
* Handle missing or null policy in snap metadata differently
* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>
* Use the `OriginSettingClient` to work with security, enhance logging
* Prevent NPE in test by mocking Client
* Allow empty/missing SLM retention configuration (#45018)
Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.
Relates to #43663
* Add min_count and max_count as SLM retention predicates (#44926)
This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.
These options are optional and one, two, or all three can be specified
in an SLM policy.
Relates to #43663
* Time-bound deletion of snapshots in retention delete function (#45065)
* Time-bound deletion of snapshots in retention delete function
With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.
Relates to #43663
* Wow snapshots suuuure can take a long time.
* Use a LongSupplier instead of actually sleeping
* Remove TestLogging annotation
* Remove rate limiting
* Add SLM metrics gathering and endpoint (#45362)
* Add SLM metrics gathering and endpoint
This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.
This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:
```
GET /_slm/stats
{
"retention_runs" : 13,
"retention_failed" : 0,
"retention_timed_out" : 0,
"retention_deletion_time" : "1.4s",
"retention_deletion_time_millis" : 1404,
"policy_metrics" : {
"daily-snapshots2" : {
"snapshots_taken" : 7,
"snapshots_failed" : 0,
"snapshots_deleted" : 6,
"snapshot_deletion_failures" : 0
},
"daily-snapshots" : {
"snapshots_taken" : 12,
"snapshots_failed" : 0,
"snapshots_deleted" : 12,
"snapshot_deletion_failures" : 6
}
},
"total_snapshots_taken" : 19,
"total_snapshots_failed" : 0,
"total_snapshots_deleted" : 18,
"total_snapshot_deletion_failures" : 6
}
```
This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.
Relates to #43663
* Version qualify serialization
* Initialize counters outside constructor
* Use computeIfAbsent instead of being too verbose
* Move part of XContent generation into subclass
* Fix REST action for master merge
* Unused import
* Record history of SLM retention actions (#45513)
This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.
* Retry SLM retention after currently running snapshot completes (#45802)
* Retry SLM retention after currently running snapshot completes
This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.
Relates to #43663
* Increase timeout waiting for snapshot completion
* Apply patch
From 2374316f0d.patch
* Rename test variables
* [TEST] Be less strict for stats checking
* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)
This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.
Relates to #43663
* Check all actions preventing snapshot delete during retention (#45992)
* Check all actions preventing snapshot delete during retention run
Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:
- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running
This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.
Relates to #43663
* Add unit test for okayToDeleteSnapshots
* Fix bug where SLM retention task would be scheduled on every node
* Enhance test logging
* Ignore if snapshot is already deleted
* Missing import
* Fix SnapshotRetentionServiceTests
* Expose SLM policy stats in get SLM policy API (#45989)
This also adds support for the SLM stats endpoint to the high level rest client.
Retrieving a policy now looks like:
```json
{
"daily-snapshots" : {
"version": 1,
"modified_date": "2019-04-23T01:30:00.000Z",
"modified_date_millis": 1556048137314,
"policy" : {
"schedule": "0 30 1 * * ?",
"name": "<daily-snap-{now/d}>",
"repository": "my_repository",
"config": {
"indices": ["data-*", "important"],
"ignore_unavailable": false,
"include_global_state": false
},
"retention": {}
},
"stats": {
"snapshots_taken": 0,
"snapshots_failed": 0,
"snapshots_deleted": 0,
"snapshot_deletion_failures": 0
},
"next_execution": "2019-04-24T01:30:00.000Z",
"next_execution_millis": 1556048160000
}
}
```
Relates to #43663
* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)
* Rewrite SnapshotLifecycleIT as as ESIntegTestCase
This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.
Relates to #43663Resolves#46205
* Add error logging when exceptions are thrown
* Update serialization versions
* Fix type inference
* Use non-Cancellable HLRC return value
* Fix Client mocking in test
* Fix SLMSnapshotBlockingIntegTests for 7.x branch
* Update SnapshotRetentionTask for non-multi-repo snapshot retrieval
* Add serialization guards for SnapshotLifecyclePolicy
Since 7.3, the request converter for multiSearchTemplate would silently
not set the two request parameters `typed_keys` and
`max_concurrent_searches`.
Closes#46488
Besides a rename, this changes allows to processor to attach multiple
enrich docs to the document being ingested.
Also in order to control the maximum number of enrich docs to be
included in the document being ingested, the `max_matches` setting
is added to the enrich processor.
Relates #32789
Add XContentType as parameter to the
AbstractResponseTestCase#createServerTestInstance method.
In the case a server side response class serializes xcontent as
bytes then the test needs to know what xcontent type was randomily selected.
This change is needed in #45970
The existing privilege model for API keys with privileges like
`manage_api_key`, `manage_security` etc. are too permissive and
we would want finer-grained control over the cluster privileges
for API keys. Previously APIs created would also need these
privileges to get its own information.
This commit adds support for `manage_own_api_key` cluster privilege
which only allows api key cluster actions on API keys owned by the
currently authenticated user. Also adds support for retrieval of
the API key self-information when authenticating via API key
without the need for the additional API key privileges.
To support this privilege, we are introducing additional
authentication context along with the request context such that
it can be used to authorize cluster actions based on the current
user authentication.
The API key get and invalidate APIs introduce an `owner` flag
that can be set to true if the API key request (Get or Invalidate)
is for the API keys owned by the currently authenticated user only.
In that case, `realm` and `username` cannot be set as they are
assumed to be the currently authenticated ones.
The changes cover HLRC changes, documentation for the API changes.
Closes#40031
This commit introduces PKI realm delegation. This feature
supports the PKI authentication feature in Kibana.
In essence, this creates a new API endpoint which Kibana must
call to authenticate clients that use certificates in their TLS
connection to Kibana. The API call passes to Elasticsearch the client's
certificate chain. The response contains an access token to be further
used to authenticate as the client. The client's certificates are validated
by the PKI realms that have been explicitly configured to permit
certificates from the proxy (Kibana). The user calling the delegation
API must have the delegate_pki privilege.
Closes#34396
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.
This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.
In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.
This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.
When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
A policy type controls how the enrich index is created and
the query executed against the match field. Currently there
is a single policy type (`exact_match`). In the near future
more policy types will be added and different policy may have
different configuration options.
For this reason type should be a json object instead of a string field:
```
{
"exact_match": {
...
}
}
```
instead of:
```
{
"type": "exact_match",
...
}
```
This will make streaming parsing of enrich policies easier as in the
new format, the parsing code can know ahead what configuration fields
to expect. In the latter format that is not possible if the type field
appears not as the first field.
Relates to #32789
* Repository Cleanup Endpoint (#43900)
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
The get and list APIs are a single API in this commit. Whether
requesting one named policy or all policies, a list of policies is
returened. The list API code has all been removed and the GET api is
what remains, which contains much of the list response code.
Adjusts the cluster cleanup routine in ESRestTestCase to clean up SLM
test cases, and optionally wait for all snapshots to be deleted.
Waiting for all snapshots to be deleted, rather than failing if any are
in progress, is necessary for tests which use SLM policies because SLM
policies may be in the process of executing when the test ends.
This change adds the support for the RankFeatureQuery in the HLRC by
providing an extra dependency on mapper-extras-client. It also removes
the dependency on lang-painless in mapper-extras which is not needed
anymore since the move of the vector field into a dedicated module.
Closes#43634
This commit replaces task_state and indexer_state in the
data frame _stats output with a single top level state
that combines the two. It is defined as:
- failed if what's currently reported as task_state is failed
- stopped if there is no persistent task
- Otherwise what's currently reported as indexer_state
Backport of #45276
* [ML][Data Frame] Add update transform api endpoint (#45154)
This adds the ability to `_update` stored data frame transforms. All mutable fields are applied when the next checkpoint starts. The exception being `description`.
This PR contains all that is necessary for this addition:
* HLRC
* Docs
* Server side
This commit adds a deprecation warning in 7.x for the Force Merge API
when both only_expunge_deletes and max_num_segments are set in a request.
Relates #44761
introduces an abstraction for how checkpointing and synchronization works, covering
- retrieval of checkpoints
- check for updates
- retrieving stats information
This commit switches to using the full hash to build into the JAR
manifest, which is used in node startup and the REST main action to
display the build hash.
Today we recover a replica by copying operations from the primary's translog.
However we also retain some historical operations in the index itself, as long
as soft-deletes are enabled. This commit adjusts peer recovery to use the
operations in the index for recovery rather than those in the translog, and
ensures that the replication group retains enough history for use in peer
recovery by means of retention leases.
Reverts #38904 and #42211
Relates #41536
Backport of #45136 to 7.x.
* Rename indexlifecycle to ilm and snapshotlifecycle to slm (#44917)
As a followup to #44725 and #44608, which renamed the packages within
the x-pack project, this renames the packages within the core x-pack
project. It also renames 'snapshotlifecycle' within the HLRC to slm.
* Fix one more import
With this change, we will return primary_term and seq_no of the current
document if an update is detected as a noop. We already return the
version; hence we should also return seq_no and primary_term.
Relates #42497
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In case where the filesystem
provides hard-linking capabilities, this is a very cheap operation.
Indexing cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.
Closes#44128
This is a followup to #44350. The indexer stats used to
be persisted standalone, but now are only persisted as
part of a state-and-stats document. During the review
of #44350 it was decided that we'll stick with this
design, so there will never be a need for an indexer
stats object to store its transform ID as it is stored
on the enclosing document. This PR removes the indexer
stats document ID.
Backport of #44768
* Only emit deprecation warning if there was actual change of a datafeed's job_id.
* Add @Deprecated annotation to DatafeedUpdate.Builder#setJobId method
This change adjusts the data frame transforms stats
endpoint to return a structure that is easier to
understand.
This is a breaking change for clients of the data frame
transforms stats endpoint, but the feature is in beta so
stability is not guaranteed.
Backport of #44350
We often start testing with early access versions of new Java
versions and this have caused minor issues in our tests
(i.e. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.
This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of java.version
system property that might include an additional alphanumeric
part after the version numbers
(see [JEP 223[(https://openjdk.java.net/jeps/223)). In short it
handles a version number part, like before, but additionally a
PRE part that matches ([a-zA-Z0-9]+).
It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We know parse java.version that can be consistently
parsed.
Resolves#43141
This commit converts all remaining TransportRequest and
TransportResponse classes to implement Writeable, and disallows
Streamable implementations.
relates #34389
This commit converts several more classes from streamable to writeable
in server, mostly within the o.e.index and o.e.persistent packages.
relates #34389
* Allow empty configuration for SLM policies
When putting or updating a snapshot lifecycle policy it was not possible
to elide the `config` map. This commit makes the configuration optional,
the same way that it is when taking a snapshot.
Relates to #38461
* Add Objects.requireNonNull for required parts of the policy
* Expose index age in ILM explain output
This adds the index's age to the ILM explain output, for example:
```
{
"indices" : {
"ilm-000001" : {
"index" : "ilm-000001",
"managed" : true,
"policy" : "full-lifecycle",
"lifecycle_date" : "2019-07-16T19:48:22.294Z",
"lifecycle_date_millis" : 1563306502294,
"age" : "1.34m",
"phase" : "hot",
"phase_time" : "2019-07-16T19:48:22.487Z",
... etc ...
}
}
}
```
This age can be used to tell when ILM will transition the index to the
next phase, based on that phase's `min_age`.
Resolves#38988
* Expose age in getters and in HLRC
this commit removes usage of the deprecated
constructor with a single argument and no Writeable.Reader.
The purpose of this is to reduce the boilerplate necessary for
properly implementing a new action, as well as reducing the
chances of using the incorrect super constructor while classes
are being migrated to Writeable
relates #34389.
This commit converts all remaining ActionType response classes to
writeable in xpack core. It also converts a few from server which were
used by xpack core.
relates #34389
* Migrate ML Actions to use writeable ActionType (#44302)
This commit converts all the StreamableResponseActionType
actions in the ML core module to be ActionType and leverage
the Writeable infrastructure.
* Add Snapshot Lifecycle Management (#43934)
* Add SnapshotLifecycleService and related CRUD APIs
This commit adds `SnapshotLifecycleService` as a new service under the ilm
plugin. This service handles snapshot lifecycle policies by scheduling based on
the policies defined schedule.
This also includes the get, put, and delete APIs for these policies
Relates to #38461
* Make scheduledJobIds return an immutable set
* Use Object.equals for SnapshotLifecyclePolicy
* Remove unneeded TODO
* Implement ToXContentFragment on SnapshotLifecyclePolicyItem
* Copy contents of the scheduledJobIds
* Handle snapshot lifecycle policy updates and deletions (#40062)
(Note this is a PR against the `snapshot-lifecycle-management` feature branch)
This adds logic to `SnapshotLifecycleService` to handle updates and deletes for
snapshot policies. Policies with incremented versions have the old policy
cancelled and the new one scheduled. Deleted policies have their schedules
cancelled when they are no longer present in the cluster state metadata.
Relates to #38461
* Take a snapshot for the policy when the SLM policy is triggered (#40383)
(This is a PR for the `snapshot-lifecycle-management` branch)
This commit fills in `SnapshotLifecycleTask` to actually perform the
snapshotting when the policy is triggered. Currently there is no handling of the
results (other than logging) as that will be added in subsequent work.
This also adds unit tests and an integration test that schedules a policy and
ensures that a snapshot is correctly taken.
Relates to #38461
* Record most recent snapshot policy success/failure (#40619)
Keeping a record of the results of the successes and failures will aid
troubleshooting of policies and make users more confident that their
snapshots are being taken as expected.
This is the first step toward writing history in a more permanent
fashion.
* Validate snapshot lifecycle policies (#40654)
(This is a PR against the `snapshot-lifecycle-management` branch)
With the commit, we now validate the content of snapshot lifecycle policies when
the policy is being created or updated. This checks for the validity of the id,
name, schedule, and repository. Additionally, cluster state is checked to ensure
that the repository exists prior to the lifecycle being added to the cluster
state.
Part of #38461
* Hook SLM into ILM's start and stop APIs (#40871)
(This pull request is for the `snapshot-lifecycle-management` branch)
This change allows the existing `/_ilm/stop` and `/_ilm/start` APIs to also
manage snapshot lifecycle scheduling. When ILM is stopped all scheduled jobs are
cancelled.
Relates to #38461
* Add tests for SnapshotLifecyclePolicyItem (#40912)
Adds serialization tests for SnapshotLifecyclePolicyItem.
* Fix improper import in build.gradle after master merge
* Add human readable version of modified date for snapshot lifecycle policy (#41035)
* Add human readable version of modified date for snapshot lifecycle policy
This small change changes it from:
```
...
"modified_date": 1554843903242,
...
```
To
```
...
"modified_date" : "2019-04-09T21:05:03.242Z",
"modified_date_millis" : 1554843903242,
...
```
Including the `"modified_date"` field when the `?human` field is used.
Relates to #38461
* Fix test
* Add API to execute SLM policy on demand (#41038)
This commit adds the ability to perform a snapshot on demand for a policy. This
can be useful to take a snapshot immediately prior to performing some sort of
maintenance.
```json
PUT /_ilm/snapshot/<policy>/_execute
```
And it returns the response with the generated snapshot name:
```json
{
"snapshot_name" : "production-snap-2019.04.09-rfyv3j9qreixkdbnfuw0ug"
}
```
Note that this does not allow waiting for the snapshot, and the snapshot could
still fail. It *does* record this information into the cluster state similar to
a regularly trigged SLM job.
Relates to #38461
* Add next_execution to SLM policy metadata (#41221)
* Add next_execution to SLM policy metadata
This adds the next time a snapshot lifecycle policy will be executed when
retriving a policy's metadata, for example:
```json
GET /_ilm/snapshot?human
{
"production" : {
"version" : 1,
"modified_date" : "2019-04-15T21:16:21.865Z",
"modified_date_millis" : 1555362981865,
"policy" : {
"name" : "<production-snap-{now/d}>",
"schedule" : "*/30 * * * * ?",
"repository" : "repo",
"config" : {
"indices" : [
"foo-*",
"important"
],
"ignore_unavailable" : true,
"include_global_state" : false
}
},
"next_execution" : "2019-04-15T21:16:30.000Z",
"next_execution_millis" : 1555362990000
},
"other" : {
"version" : 1,
"modified_date" : "2019-04-15T21:12:19.959Z",
"modified_date_millis" : 1555362739959,
"policy" : {
"name" : "<other-snap-{now/d}>",
"schedule" : "0 30 2 * * ?",
"repository" : "repo",
"config" : {
"indices" : [
"other"
],
"ignore_unavailable" : false,
"include_global_state" : true
}
},
"next_execution" : "2019-04-16T02:30:00.000Z",
"next_execution_millis" : 1555381800000
}
}
```
Relates to #38461
* Fix and enhance tests
* Figured out how to Cron
* Change SLM endpoint from /_ilm/* to /_slm/* (#41320)
This commit changes the endpoint for snapshot lifecycle management from:
```
GET /_ilm/snapshot/<policy>
```
to:
```
GET /_slm/policy/<policy>
```
It mimics the ILM path only using `slm` instead of `ilm`.
Relates to #38461
* Add initial documentation for SLM (#41510)
* Add initial documentation for SLM
This adds the initial documentation for snapshot lifecycle management.
It also includes the REST spec API json files since they're sort of
documentation.
Relates to #38461
* Add `manage_slm` and `read_slm` roles (#41607)
* Add `manage_slm` and `read_slm` roles
This adds two more built in roles -
`manage_slm` which has permission to perform any of the SLM actions, as well as
stopping, starting, and retrieving the operation status of ILM.
`read_slm` which has permission to retrieve snapshot lifecycle policies as well
as retrieving the operation status of ILM.
Relates to #38461
* Add execute to the test
* Fix ilm -> slm typo in test
* Record SLM history into an index (#41707)
It is useful to have a record of the actions that Snapshot Lifecycle
Management takes, especially for the purposes of alerting when a
snapshot fails or has not been taken successfully for a certain amount of
time.
This adds the infrastructure to record SLM actions into an index that
can be queried at leisure, along with a lifecycle policy so that this
history does not grow without bound.
Additionally,
SLM automatically setting up an index + lifecycle policy leads to
`index_lifecycle` custom metadata in the cluster state, which some of
the ML tests don't know how to deal with due to setting up custom
`NamedXContentRegistry`s. Watcher would cause the same problem, but it
is already disabled (for the same reason).
* High Level Rest Client support for SLM (#41767)
* High Level Rest Client support for SLM
This commit add HLRC support for SLM.
Relates to #38461
* Fill out documentation tests with tags
* Add more callouts and asciidoc for HLRC
* Update javadoc links to real locations
* Add security test testing SLM cluster privileges (#42678)
* Add security test testing SLM cluster privileges
This adds a test to `PermissionsIT` that uses the `manage_slm` and `read_slm`
cluster privileges.
Relates to #38461
* Don't redefine vars
* Add Getting Started Guide for SLM (#42878)
This commit adds a basic Getting Started Guide for SLM.
* Include SLM policy name in Snapshot metadata (#43132)
Keep track of which SLM policy in the metadata field of the Snapshots
taken by SLM. This allows users to more easily understand where the
snapshot came from, and will enable future SLM features such as
retention policies.
* Fix compilation after master merge
* [TEST] Move exception wrapping for devious exception throwing
Fixes an issue where an exception was created from one line and thrown in another.
* Fix SLM for the change to AcknowledgedResponse
* Add Snapshot Lifecycle Management Package Docs (#43535)
* Fix compilation for transport actions now that task is required
* Add a note mentioning the privileges needed for SLM (#43708)
* Add a note mentioning the privileges needed for SLM
This adds a note to the top of the "getting started with SLM"
documentation mentioning that there are two built-in privileges to
assist with creating roles for SLM users and administrators.
Relates to #38461
* Mention that you can create snapshots for indices you can't read
* Fix REST tests for new number of cluster privileges
* Mute testThatNonExistingTemplatesAreAddedImmediately (#43951)
* Fix SnapshotHistoryStoreTests after merge
* Remove overridden newResponse functions that have been removed
* Fix compilation for backport
* Fix get snapshot output parsing in test
* [DOCS] Add redirects for removed autogen anchors (#44380)
* Switch <tt>...</tt> in javadocs for {@code ...}
This commit converts all the StreamableResponseActionType security
classes in xpack core to ActionType, implementing Writeable for their
response classes.
relates #34389
Test clusters currently has its own set of logic for dealing with
finding different versions of Elasticsearch, downloading them, and
extracting them. This commit converts testclusters to use the
DistributionDownloadPlugin.
* HLRC: Fix '+' Not Correctly Encoded in GET Req.
* Encode `+` correctly as `%2B` in URL paths
* Keep encoding `+` as space in URL parameters
* Closes#33077
Currently we loose information about whether a token list in an AnalyzeAction
response is null or an empty list, because we write a 0 value to the stream in
both cases and deserialize to a null value on the receiving side. This change
fixes this so we write an additional flag indicating whether the value is null
or not, followed by the size of the list and its content.
Closes#44078
Rewrites how continuous data frame transforms calculates and handles buckets that require an update. Instead of storing the whole set in memory, it pages through the updates using a 2nd cursor. This lowers memory consumption and prevents problems with limits at query time (max_terms_count). The list of updates can be re-retrieved in a failure case (#43662)
This commit moves the Supplier variant of HandledTransportAction to have
a different ordering than the Writeable.Reader variant. The Supplier
version is used for the legacy Streamable, and currently having the
location of the Writeable.Reader vs Supplier in the same place forces
using casts of Writeable.Reader to select the correct super constructor.
This change in ordering allows easier migration to Writeable.Reader.
relates #34389
The base classes for transport requests and responses currently
implement Streamable and Writeable. The writeTo method on these base
classes is implemented with an empty implementation. Not only does this
complicate subclasses to think they need to call super.writeTo, but it
also can lead to not implementing writeTo when it should have been
implemented, or extendiong one of these classes when not necessary,
since there is nothing to actually implement.
This commit removes the empty writeTo from these base classes, and fixes
subclasses to not call super and in some cases implement an empty
writeTo themselves.
relates #34389
Previously a data frame transform would check whether the
source index was changed every 10 seconds. Sometimes it
may be desirable for the check to be done less frequently.
This commit increases the default to 60 seconds but also
allows the frequency to be overridden by a setting in the
data frame transform config.
The rest client does not communicate over the transport protocol.
However, in the move to make all apis supported in the HLRC, some
response classes were copied with extending ActionResponse, which is
meant strictly for the transport protocol. This commit removes uses of
that base class from HLRC.
Currently we log a deprecation warning to the types removal in
RestGetIndicesAction even if the REST method is HEAD, which is used by the
indices.exists API. Since the body is empty in this case we should not need to
show the deprecation warning.
Closes#43905