Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.
Fixes#52427.
This change makes it possible to send secondary authentication
credentials to select endpoints that need to perform a single action
in the context of two users.
Typically this need arises when a server process needs to call an
endpoint that users should not (or might not) have direct access to,
but some part of that action must be performed using the logged-in
user's identity.
Backport of: #52093
The setting, `xpack.logstash.enabled`, exists to enable or disable the
logstash extensions found within x-pack. In practice, this setting had
no effect on the functionality of the extension. Given this, the
setting is now deprecated in preparation for removal.
Backport of #53367
The tests in this class had been failing for a while, but went unnoticed as not tested by CI (see #53442).
The reason the tests fail is that the can-match phase is smarter now, and filters out access to a non-existing field.
Closes#53442
Adds a new `default_field_map` field to trained model config objects.
This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.
The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
This is a partial implementation of an endpoint for anomaly
detector model memory estimation.
It is not complete, lacking docs, HLRC and sensible numbers
for many anomaly detector configurations. These will be
added in a followup PR in time for 7.7 feature freeze.
A skeleton endpoint is useful now because it allows work on
the UI side of the change to commence. The skeleton endpoint
handles the same cases that the old UI code used to handle,
and produces very similar estimates for these cases.
Backport of #53333
This avoids NPE when executing SLM policy when no config was provided.
Related to #44465Closes#53171
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
A freeze operation can partially fail in multiple places, including the
close verification step. This left the index in an unfrozen but
partially closed state. Now throw an exception to retry the freeze step
instead.
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.
Relates #50251
Backport of #52962
This moves the usage statistics gathering from the `AnalyticsPlugin`
into an `AnalyicsUsage`, removing the static state. It also checks the
license level when parsing all analytics aggregations. This is how we
were checking them before but we did it in an easy to forget way. This
way is slightly simpler, I think.
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.
The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?
For this field there is a choice between
1. accepting values in `_source` when they are equal to the value configured
in mappings, but rejecting mapping updates
2. rejecting values in `_source` but then allowing updates to the value that
is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.
Backport of #49713
Currently _rollup_search requires manage privilege to access. It should really be
a read only operation. This PR changes the requirement to be read indices privilege.
Resolves: #50245
implement transform node attributes to disable transform on certain nodes and
test which nodes are allowed to do remote connections
closes#52200closes#50033closes#48734
backport #52712
Fix NPE when closing a webserver that hasn't started correctly.
This can happen when ssl context isn't initialized. The server instance is then never set,
which causes an NPE that masks the actual failure.
Example stacktrace that would mask an actual failure:
```
java.lang.NullPointerException
at org.elasticsearch.test.http.MockWebServer.close(MockWebServer.java:271)
at org.elasticsearch.xpack.watcher.test.integration.HttpSecretsIntegrationTests.cleanup(HttpSecretsIntegrationTests.java:70)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
```
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.
Backport of #52778 and #52958
When user A runs as user B and performs any API key related operations,
user B's realm should always be used to associate with the API key.
Currently user A's realm is used when getting or invalidating API keys
and owner=true. The PR is to fix this bug.
resolves: #51975
* [ML][Inference] Add support for multi-value leaves to the tree model (#52531)
This adds support for multi-value leaves. This is a prerequisite for multi-class boosted tree classification.
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.
This is necessary for the following use cases:
- Reading from frozen indices
- Allowing certain indices in multiple index patterns to not exist yet
These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.
closes https://github.com/elastic/elasticsearch/issues/48056
This change removes TrainedModelConfig#isAvailableWithLicense method with calls to
XPackLicenseState#isAllowedByLicense.
Please note there are subtle changes to the code logic. But they are the right changes:
* Instead of Platinum license, Enterprise license nows guarantees availability.
* No explicit check when the license requirement is basic. Since basic license is always available, this check is unnecessary.
* Trial license is always allowed.
Generalize how queries on `_index` are handled at rewrite time (#52486)
Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.
This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.
Closes#49254
XPackLicenseState reads to necessary to validate a number of cluster
operations. This reads occasionally occur on transport threads which
should not be blocked. Currently we sychronize when reading. However,
this is unecessary as only a single piece of state is updateable. This
commit makes this state volatile and removes the locking.
When a license expires, or license state changes, functionality might be
disabled. This commit adds messages for CCR to inform users that CCR
functionality will be disabled when a license expires, or when license
state changes to a license level lower than trial/platinum/enterprise.
This adds machine learning model feature importance calculations to the inference processor.
The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
"field_mappings": {},
"model_id": "my_model",
"inference_config": {
"regression": {
"num_top_feature_importance_values": 3
}
}
}
```
This will write to the document as follows:
```
"inference" : {
"feature_importance" : {
"FlightTimeMin" : -76.90955548511226,
"FlightDelayType" : 114.13514762158526,
"DistanceMiles" : 13.731580450792187
},
"predicted_value" : 108.33165831875137,
"model_id" : "my_model"
}
```
This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).
It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.
Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.
NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc
usability blocked by: https://github.com/elastic/ml-cpp/pull/991
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
Backport of #52596
* Make FreezeStep retryable
This change marks `FreezeStep` as retryable and adds test to make sure we can really run it again.
* refactor tests
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Refactor Inflexible Snapshot Repository BwC (#52365)
Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.
Add enterprise operation mode to properly map enterprise license.
Aslo refactor XPackLicenstate class to consolidate license status and mode checks.
This class has many sychronised methods to check basically three things:
* Minimum operation mode required
* Whether security is enabled
* Whether current license needs to be active
Depends on the actual feature, either 1, 2 or all of above checks are performed.
These are now consolidated in to 3 helper methods (2 of them are new).
The synchronization is pushed down to the helper methods so actual checking
methods no longer need to worry about it.
resolves: #51081
When `PUT` is called to store a trained model, it is useful to return the newly create model config. But, it is NOT useful to return the inflated definition.
These definitions can be large and returning the inflated definition causes undo work on the server and client side.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Make DeleteStep retryable
This change marks `DeleteStep` as retryable and adds test to make sure we really can invoke it again.
* Fix unused import
* revert unneeded changes
* test reworked
This changes the tree validation code to ensure no node in the tree has a
feature index that is beyond the bounds of the feature_names array.
Specifically this handles the situation where the C++ emits a tree containing
a single node and an empty feature_names list. This is valid tree used to
centre the data in the ensemble but the validation code would reject this
as feature_names is empty. This meant a broken workflow as you cannot GET
the model and PUT it back
This is useful in cases where the caller of the API needs to know
the name of the realm that consumed the SAML Response and
authenticated the user and this is not self evident (i.e. because
there are many saml realms defined in ES).
Currently, the way to learn the realm name would be to make a
subsequent request to the `_authenticate` API.
The shard follow task cleaner executes on behalf of the user to clean up
a shard follow task after the follower index has been
deleted. Otherwise, these persistent tasks are left laying around, and
they fail to execute because the follower index has been deleted. In the
face of security, attempts to complete these persistent tasks would
fail. This is because these cleanups are executed under the system
context (this makes sense, they are happening on behalf of the user
after the user has executed an action) but the system role was never
granted the permission for persistent task completion. This commit
addresses this by adding this cluster privilege to the system role.
We marked the `init` ILM step as retryable but our test used `waitUntil`
without an assert so we didn’t catch the fact that we were not actually
able to retry this step as our ILM state didn’t contain any information
about the policy execution (as we were in the process of initialising
it).
This commit manually sets the current step to `init` when we’re moving
the ilm policy into the ERROR step (this enables us to successfully
move to the error step and later retry the step)
* ShrunkenIndexCheckStep: Use correct logger
(cherry picked from commit f78d4b3d91345a2a8fc0f48b90dd66c9959bd7ff)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
ML mappings and index templates have so far been created
programmatically. While this had its merits due to static typing,
there is consensus it would be clear to maintain those in json files.
In addition, we are going to adding ILM policies to these indices
and the component for a plugin to register ILM policies is
`IndexTemplateRegistry`. It expects the templates to be in resource
json files.
For the above reasons this commit refactors ML mappings and index
templates into json resource files that are registered via
`MlIndexTemplateRegistry`.
Backport of #51765