This commit instruments data frame analytics
with stats for the data that are being analyzed.
In particular, we count training docs, test docs,
and skipped docs.
In order to account docs with missing values as skipped
docs for analyses that do not support missing values,
this commit changes the extractor so that it only ignores
docs with missing values when it collects the data summary,
which is used to estimate memory usage.
Backport of #53998
add 2 additional stats: processing time and processing total which capture the
time spent for processing results and how often it ran. The 2 new stats
correspond to the existing indexing and search stats. Together with indexing
and search this now allows the user to see the full picture, all 3 stages.
This is the first in a series of commits that will introduce the
autoscaling deciders framework. This commit introduces the basic
framework for representing autoscaling decisions.
Adds multi-class feature importance calculation.
Feature importance objects are now mapped as follows
(logistic) Regression:
```
{
"feature_name": "feature_0",
"importance": -1.3
}
```
Multi-class [class names are `foo`, `bar`, `baz`]
```
{
“feature_name”: “feature_0”,
“importance”: 2.0, // sum(abs()) of class importances
“foo”: 1.0,
“bar”: 0.5,
“baz”: -0.5
},
```
For users to get the full benefit of aggregating and searching for feature importance, they should update their index mapping as follows (before turning this option on in their pipelines)
```
"ml.inference.feature_importance": {
"type": "nested",
"dynamic": true,
"properties": {
"feature_name": {
"type": "keyword"
},
"importance": {
"type": "double"
}
}
}
```
The mapping field name is as follows
`ml.<inference.target_field>.<inference.tag>.feature_importance`
if `inference.tag` is not provided in the processor definition, it is not part of the field path.
`inference.target_field` is defaulted to `ml.inference`.
//cc @lcawl ^ Where should we document this?
If this makes it in for 7.7, there shouldn't be any feature_importance at inference BWC worries as 7.7 is the first version to have it.
This commit removes the configuration time vs execution time distinction
with regards to certain BuildParms properties. Because of the cost of
determining Java versions for configuration JDK locations we deferred
this until execution time. This had two main downsides. First, we had
to implement all this build logic in tasks, which required a bunch of
additional plumbing and complexity. Second, because some information
wasn't known during configuration time, we had to nest any build logic
that depended on this in awkward callbacks.
We now defer to the JavaInstallationRegistry recently added in Gradle.
This utility uses a much more efficient method for probing Java
installations vs our jrunscript implementation. This, combined with some
optimizations to avoid probing the current JVM as well as deferring
some evaluation via Providers when probing installations for BWC builds
we can maintain effectively the same configuration time performance
while removing a bunch of complexity and runtime cost (snapshotting
inputs for the GenerateGlobalBuildInfoTask was very expensive). The end
result should be a much more responsive build execution in almost all
scenarios.
(cherry picked from commit ecdbd37f2e0f0447ed574b306adb64c19adc3ce1)
This moves the pipeline aggregation validation from the data node to the
coordinating node so that we, eventually, can stop sending pipeline
aggregations to the data nodes entirely. In fact, it moves it into the
"request validation" stage so multiple errors can be accumulated and
sent back to the requester for the entire request. We can't always take
advantage of that, but it'll be nice for folks not to have to play
whack-a-mole with validation.
This is implemented by replacing `PipelineAggretionBuilder#validate`
with:
```
protected abstract void validate(ValidationContext context);
```
The `ValidationContext` handles the accumulation of validation failures,
provides access to the aggregation's siblings, and implements a few
validation utility methods.
Benchmarking showed that the effect of the ExitableDirectoryReader
is reduced considerably when checking every 8191 docs. Moreover,
set the cancellable task before calling QueryPhase#preProcess()
and make sure we don't wrap with an ExitableDirectoryReader at all
when lowLevelCancellation is set to false to avoid completely any
performance impact.
Follows: #52822
Follows: #53166
Follows: #53496
(cherry picked from commit cdc377e8e74d3ca6c231c36dc5e80621aab47c69)
Feature importance storage format is changing to encompass multi-class.
Feature importance objects are now mapped as follows
(logistic) Regression:
```
{
"feature_name": "feature_0",
"importance": -1.3
}
```
Multi-class [class names are `foo`, `bar`, `baz`]
```
{
“feature_name”: “feature_0”,
“importance”: 2.0, // sum(abs()) of class importances
“foo”: 1.0,
“bar”: 0.5,
“baz”: -0.5
},
```
This change adjusts the mapping creation for analytics so that the field is mapped as a `nested` type.
Native side change: https://github.com/elastic/ml-cpp/pull/1071
* Get Async Search: omit _clusters section when empty (#53907)
The _clusters section is omitted by the search API whenever no remote clusters are searched. Async search should do the same, but Get Async Search returns a deserialized response, hence a weird `_clusters` section with all values set to `0` gets returned instead. In fact the recreated Clusters object is not the same object as the EMPTY constant, yet it has the same content.
This commit addresses this by changing the comparison in the `toXContent` method to not print out the section if the number of total clusters is `0`.
* Async search: remove version from response (#53960)
The goal of the version field was to quickly show when you can expect to find something new in the search response, compared to when nothing has changed. This can also be done by looking at the `_shards` section and `num_reduce_phases` returned with the search response. In fact when there has been one or more additional reduction of the results, you can expect new results in the search response. Otherwise, the `_shards` section could notify of additional failures of shards that have completed the query, but that is not a guarantee that their results will be exposed (only when the following partial reduction is performed their results will be available).
That said this commit clarifies this in the docs and removes the version field from the async search response
* Async Search: replicas to auto expand from 0 to 1 (#53964)
This way single node clusters that are green don't go yellow once async search is used, while
all the others still have one replica.
* [DOCS] address timing issue in async search docs tests (#53910)
The docs snippets for submit async search have proven difficult to test as it is not possible to guarantee that you get a response that is not final, even when providing `wait_for_completion=0`. In the docs we want to show though a proper long-running query, and its first response should be partial rather than final.
With this commit we adapt the docs snippets to show a partial response, and replace under the hood all that's needed to make the snippets tests succeed when we get a final response. Also, increased the timeout so we always get a final response.
Closes#53887Closes#53891
Since a data frame analytics job may have associated docs
in the .ml-stats-* indices, when the job is deleted we
should delete those docs too.
Backport of #53933
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
While `CustomProcessor` is generic and allows for flexibility, there
are new requirements that make cross validation a concept it's hard
to abstract behind custom processor. In particular, we would like to
add data_counts to the DFA jobs stats. Counting training VS. test
docs would be a useful statistic. We would also want to add a
different cross validation strategy for multiclass classification.
This commit renames custom processors to cross validation splitters
which allows for those enhancements without cryptically doing
things as a side effect of the abstract custom processing.
Backport of #53915
Upgrading AWS SDK to v1.11.749.
Required building clients inside privileged contexts because some class loading that requires privileges now happens there and working around a new SDK bug in the S3 client builder.
Closes#53191
This commits adds a data stream feature flag, initial definition of a data stream and
the stubs for the data stream create, delete and get APIs. Also simple serialization
tests are added and a rest test to thest the data stream API stubs.
This is a large amount of code and mainly mechanical, but this commit should be
straightforward to review, because there isn't any real logic.
The data stream transport and rest action are behind the data stream feature flag and
are only intialized if the feature flag is enabled. The feature flag is enabled if
elasticsearch is build as snapshot or a release build and the
'es.datastreams_feature_flag_registered' is enabled.
The integ-test-zip sets the feature flag if building a release build, otherwise
rest tests would fail.
Relates to #53100
Source-only snapshots currently create a second full source-only copy of the shard on disk to
support incrementality during upload. Given that stored fields are occupying a substantial part
of a shard's storage, this means that clusters with source-only snapshots can require up to
50% more local storage. Ideally we would only generate source-only parts of the shard for the
things that need to be uploaded (i.e. do incrementality checks on original file instead of
trimmed-down source-only versions), but that requires much bigger changes to the snapshot
infrastructure. This here is an attempt to dramatically cut down on the storage used by the
source-only copy of the shard by soft-linking the stored-fields files (fd*) instead of copying
them.
Relates #50231
This change adds a "grant API key action"
POST /_security/api_key/grant
that creates a new API key using the privileges of one user ("the
system user") to execute the action, but creates the API key with
the roles of the second user ("the end user").
This allows a system (such as Kibana) to create API keys representing
the identity and access of an authenticated user without requiring
that user to have permission to create API keys on their own.
This also creates a new QA project for security on trial licenses and runs
the API key tests there
Backport of: #52886
This change adds a new exception with consistent metadata for when
security features are not enabled. This allows clients to be able to
tell that an API failed due to a configuration option, and respond
accordingly.
Relates: kibana#55255
Resolves: #52311, #47759
Backport of: #52811
This commit introduces aarch64 packaging, including bundling an aarch64
JDK distribution. We had to make some interesting choices here:
- ML binaries are not compiled for aarch64, so for now we disable ML on
aarch64
- depending on underlying page sizes, we have to disable class data
sharing
This commit changes the Transforms notifications index to be hidden
index, with a hidden alias.
This commit also removes the temporary hack in
MetaDataCreateIndexService that prevents deprecation warnings for known
dot-prefixed index names which are not hidden/system indices, as this
was the last index pattern to need that hack.
In xpack the license state contains methods to determine whether a
particular feature is allowed to be used. The one exception is
allowsRealmTypes() which returns an enum of the types of realms allowed.
This change converts the enum values to boolean methods. There are 2
notable changes: NONE is removed as we always fall back to basic license
behavior, and NATIVE is not needed because it would always return true
since we should always have a basic license.
Fixes up the "forbidden" warnings that you get when you import
Elasticsearch using "import gradle projects".
With this, and the manual step of switching circular project definitions
to warnings this gets most thing *compiling*.
This commit adds a new AsyncSearchClient to the High Level Rest Client which
initially supporst the submitAsyncSearch in its blocking and non-blocking
flavour. Also adding client side request and response objects and parsing code
to parse the xContent output of the client side AsyncSearchResponse together
with parsing roundtrip tests and a simple roundtrip integration test.
Relates to #49091
Backport of #53592
It's simple to deprecate a field used in an ObjectParser just by adding deprecation
markers to the relevant ParseField objects. The warnings themselves don't currently
have any context - they simply say that a deprecated field has been used, but not
where in the input xcontent it appears. This commit adds the parent object parser
name and XContentLocation to these deprecation messages.
Note that the context is automatically stripped from warning messages when they
are asserted on by integration tests and REST tests, because randomization of
xcontent type during these tests means that the XContentLocation is not constant
Adds parsing and indexing of analysis instrumentation stats.
The latest one is also returned from the get-stats API.
Note that we chose to duplicate objects even where they are currently
similar. There are already ideas on how these will diverge in the future
and while the duplication looks ugly at the moment, it is the option
that offers the highest flexibility.
Backport of #53788
The AuditTrailService has historically been an AuditTrail itself, acting
as a composite of the configured audit trails. This commit removes that
interface from the service and instead builds a composite delegating
implementation internally. The service now has a single get() method to
get an AuditTrail implementation which may be called. If auditing is not
allowed by the license, an empty noop version is returned.
There is an assertion in ReloadAnalyzersResponse.merge that compares index names
of merged responses that was falsely using object equality instead of
String.equals(). In the past this didn't seem to matter but with changes in the
test setup we started to see failures. Correcting this and also simplifying test
a bit to be able to run it repeatedly if needed.
Backport of #53663
* [ML] only retry persistence failures when the failure is intermittent and stop retrying when analytics job is stopping (#53725)
This fixes two issues:
- Results persister would retry actions even if they are not intermittent. An example of an persistent failure is a doc mapping problem.
- Data frame analytics would continue to retry to persist results even after the job is stopped.
closes https://github.com/elastic/elasticsearch/issues/53687
Prior to this commit Watcher explicitly copied test between two
projects with a copy task. This commit removes the explicit copy in favor
of adding the Watcher tests to the available restResources that may be
copied between projects.
This is how inter-project dependencies should be modeled. However, only
Watcher is included here since it is (currently) the only project with
inter-project test dependencies.
* Fix feature flag setting for ComponentTemplate APIs (#53758)
The feature flag was set for *most* of the builds, but there are a couple where it was missing.
Resolves#53708
* Add skip for older versions of ES
Backport to 7x
Enable geo_shape query to work on geo_point fields for shapes: circle, polygon, multipolygon, rectangle see: #48928
Co-Authored-By: @iverase
* Adds per context settings:
`script.context.${CONTEXT}.cache_max_size` ~
`script.cache.max_size`
`script.context.${CONTEXT}.cache_expire` ~
`script.cache.expire`
`script.context.${CONTEXT}.max_compilations_rate` ~
`script.max_compilations_rate`
* Context cache is used if:
`script.max_compilations_rate=use-context`. This
value is dynamically updatable, so users can
switch back to the general cache if desired.
* Settings for context caches take the first value
that applies:
1) Context specific settings if set, eg
`script.context.ingest.cache_max_size`
2) Correlated general setting is set to the non-default
value, eg `script.cache.max_size`
3) Context default
The reason for 2's inclusion is to allow an easy
transition for users who've customized their general
cache settings.
Using the general cache settings for the context caches
results in higher effective settings, since they are
multiplied across the number of contexts. So a general
cache max size of 200 will become 200 * # of contexts.
However, this behavior it will avoid users snapping to a
value that is too low for them.
Backport of: #52855
Refs: #50152