Feature importance storage format is changing to encompass multi-class.
Feature importance objects are now mapped as follows
(logistic) Regression:
```
{
"feature_name": "feature_0",
"importance": -1.3
}
```
Multi-class [class names are `foo`, `bar`, `baz`]
```
{
“feature_name”: “feature_0”,
“importance”: 2.0, // sum(abs()) of class importances
“foo”: 1.0,
“bar”: 0.5,
“baz”: -0.5
},
```
This change adjusts the mapping creation for analytics so that the field is mapped as a `nested` type.
Native side change: https://github.com/elastic/ml-cpp/pull/1071
This change adds the `nori_number` token filter.
It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.
* Get Async Search: omit _clusters section when empty (#53907)
The _clusters section is omitted by the search API whenever no remote clusters are searched. Async search should do the same, but Get Async Search returns a deserialized response, hence a weird `_clusters` section with all values set to `0` gets returned instead. In fact the recreated Clusters object is not the same object as the EMPTY constant, yet it has the same content.
This commit addresses this by changing the comparison in the `toXContent` method to not print out the section if the number of total clusters is `0`.
* Async search: remove version from response (#53960)
The goal of the version field was to quickly show when you can expect to find something new in the search response, compared to when nothing has changed. This can also be done by looking at the `_shards` section and `num_reduce_phases` returned with the search response. In fact when there has been one or more additional reduction of the results, you can expect new results in the search response. Otherwise, the `_shards` section could notify of additional failures of shards that have completed the query, but that is not a guarantee that their results will be exposed (only when the following partial reduction is performed their results will be available).
That said this commit clarifies this in the docs and removes the version field from the async search response
* Async Search: replicas to auto expand from 0 to 1 (#53964)
This way single node clusters that are green don't go yellow once async search is used, while
all the others still have one replica.
* [DOCS] address timing issue in async search docs tests (#53910)
The docs snippets for submit async search have proven difficult to test as it is not possible to guarantee that you get a response that is not final, even when providing `wait_for_completion=0`. In the docs we want to show though a proper long-running query, and its first response should be partial rather than final.
With this commit we adapt the docs snippets to show a partial response, and replace under the hood all that's needed to make the snippets tests succeed when we get a final response. Also, increased the timeout so we always get a final response.
Closes#53887Closes#53891
Since a data frame analytics job may have associated docs
in the .ml-stats-* indices, when the job is deleted we
should delete those docs too.
Backport of #53933
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
The test in CloseWhileRelocatingShardsIT failed recently
multiple times (3) when waiting for initial indices to be
become green. Looking at the execution logs from #53544
it appears at the very beginning of the test and when
the WindowsFS file system is picked up (which is known
to slow down tests).
This commit simply increases the timeout for the first
ensureGreen() to 60 seconds. If the test continues to fail,
we might want to test a larger timeout or disable
WindowsFS for this test.
Closes#53544
While `CustomProcessor` is generic and allows for flexibility, there
are new requirements that make cross validation a concept it's hard
to abstract behind custom processor. In particular, we would like to
add data_counts to the DFA jobs stats. Counting training VS. test
docs would be a useful statistic. We would also want to add a
different cross validation strategy for multiclass classification.
This commit renames custom processors to cross validation splitters
which allows for those enhancements without cryptically doing
things as a side effect of the abstract custom processing.
Backport of #53915
DoubleValuesSource is the type-safe replacement for ValueSource in the lucene
core. Most of elasticsearch has moved to use these, but lang-expressions is still
using the old version. This commit migrates lang-expressions as well.
Upgrading AWS SDK to v1.11.749.
Required building clients inside privileged contexts because some class loading that requires privileges now happens there and working around a new SDK bug in the S3 client builder.
Closes#53191
This commits adds a data stream feature flag, initial definition of a data stream and
the stubs for the data stream create, delete and get APIs. Also simple serialization
tests are added and a rest test to thest the data stream API stubs.
This is a large amount of code and mainly mechanical, but this commit should be
straightforward to review, because there isn't any real logic.
The data stream transport and rest action are behind the data stream feature flag and
are only intialized if the feature flag is enabled. The feature flag is enabled if
elasticsearch is build as snapshot or a release build and the
'es.datastreams_feature_flag_registered' is enabled.
The integ-test-zip sets the feature flag if building a release build, otherwise
rest tests would fail.
Relates to #53100
Downstream Elasticsearch clients, such as the Elaticsearch-JS client,
use the documentation links in our REST API JSON specifications to
create their docs.
Using a broken link or linking to yet-to-be-created doc pages can
break the docs build for these clients.
This PR adds a related note to the README for the REST API JSON Specs.
Upgrading to 8.6.2 in #53865 broke running against HTTPs endpoints (and hence real azure)
because the https url connection needs the newly added permission to work.
Source-only snapshots currently create a second full source-only copy of the shard on disk to
support incrementality during upload. Given that stored fields are occupying a substantial part
of a shard's storage, this means that clusters with source-only snapshots can require up to
50% more local storage. Ideally we would only generate source-only parts of the shard for the
things that need to be uploaded (i.e. do incrementality checks on original file instead of
trimmed-down source-only versions), but that requires much bigger changes to the snapshot
infrastructure. This here is an attempt to dramatically cut down on the storage used by the
source-only copy of the shard by soft-linking the stored-fields files (fd*) instead of copying
them.
Relates #50231
Today we only read `cluster.max_voting_config_exclusions` from the dynamic
settings in the cluster metadata, ignoring any value set in
`elasticsearch.yml`. This commit addresses this.
Closes#53455
When indexing a rectangle that crosses the dateline, we are currently not
handling it properly and we index a polygon that do not cross the dateline.
This changes generates two polygons wrapping the dateline.
This change adds a "grant API key action"
POST /_security/api_key/grant
that creates a new API key using the privileges of one user ("the
system user") to execute the action, but creates the API key with
the roles of the second user ("the end user").
This allows a system (such as Kibana) to create API keys representing
the identity and access of an authenticated user without requiring
that user to have permission to create API keys on their own.
This also creates a new QA project for security on trial licenses and runs
the API key tests there
Backport of: #52886
The joda to java.time migration requires users to upgrade their mappings. We allow them to still use 6.x created indices with joda patterns in 7 but ask them to upgrade their patterns in 7.x.
This migration guide is to help them understand how they could be affected and what needs to be changed in their mappings.
closes#51614closes#51236
This change adds a new exception with consistent metadata for when
security features are not enabled. This allows clients to be able to
tell that an API failed due to a configuration option, and respond
accordingly.
Relates: kibana#55255
Resolves: #52311, #47759
Backport of: #52811
This commit introduces aarch64 packaging, including bundling an aarch64
JDK distribution. We had to make some interesting choices here:
- ML binaries are not compiled for aarch64, so for now we disable ML on
aarch64
- depending on underlying page sizes, we have to disable class data
sharing
* Adds ability for contexts to specify their own defaults.
* Context defaults are applied if no context-specific or
general setting exists.
* See 070ea7e for settings keys.
* Increases the per-context default for the `ingest` context.
* Cache size is doubled, 200 compared to default of 100
* Cache expiration is unchanged at no expiration
* Cache max compilation is quintupled, 375/5m instead of 75/5m
Backport of: 1b37d4b
Refs: #50152
This commit changes the Transforms notifications index to be hidden
index, with a hidden alias.
This commit also removes the temporary hack in
MetaDataCreateIndexService that prevents deprecation warnings for known
dot-prefixed index names which are not hidden/system indices, as this
was the last index pattern to need that hack.
In xpack the license state contains methods to determine whether a
particular feature is allowed to be used. The one exception is
allowsRealmTypes() which returns an enum of the types of realms allowed.
This change converts the enum values to boolean methods. There are 2
notable changes: NONE is removed as we always fall back to basic license
behavior, and NATIVE is not needed because it would always return true
since we should always have a basic license.
We mark cluster states persisted on master-ineligible nodes as
potentially-stale using the voting configuration `{STALE_STATE_CONFIG}` which
prevents these nodes from being elected as master if they are restarted as
master-eligible. Today we do not handle this special voting configuration
differently in the `ClusterFormationFailureHandler`, leading to a mysterious
message `an election requires a node with id [STALE_STATE_CONFIG]` if the
election does not succeed.
This commit adds a special case description for this situation to explain
better why this node cannot win an election.
Closes#53734