The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.
However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.
This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.
This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.
Backport of #58588
GET inference stats now reads from the .ml-stats index.
Our tests should wait for yellow state before attempting to query the index for stat information.
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null. This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant. But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.
This change corrects the discrepancies. Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.
Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.
Backport of #58413
This need some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.
Closes#58044
Search after is a better choice for the delete expired data iterators
where processing takes a long time as unlike scroll a context does not
have to be kept alive. Also changes the delete expired data endpoint to
404 if the job is unknown
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.
There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.
The changes so far cover REST APIs, results object
formats, HLRC and docs.
Backport of #57683
#47711 and #47246 helped to validate that monitoring settings are
rejected at time of setting the monitoring settings. Else an invalid
monitoring setting can find it's way into the cluster state and result
in an exception thrown [1] on the cluster state application (there by
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.
All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
of a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is abit out the scope this PR is trying to
address.
This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there an exporter with a type
defined. The type is used as dependent setting, such that it must
exist to set the value. This ensures that when any monitoring settings
changes that they can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.
Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.
[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.
Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.
In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
Backport of #57680
* [ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.
This flag is useful for moving models between clusters.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
Backport of #55548
Adds equivalence for keyword field to the wildcard field. Regex, fuzzy, wildcard and prefix queries are all supported.
All queries use an approximation query backed by an automaton-based verification queries.
Closes#54275
Throttling nightly cleanup as much as we do has been over cautious.
Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.
Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.
This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
* [Transform] add support for terms agg in transforms (#56696)
This adds support for `terms` and `rare_terms` aggs in transforms.
The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`
The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
This commit allows the JSON schema's documentation.url property to have a null value.
This can useful for cases where a feature is under development, and does not have
documentation published yet.
This commit also adds a documentation.url for two ml resources.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
Backport of: #56569
A data stream test, which tests data stream resolvability in xpack apis failed in release builds.
A invocation of a searchable snapshot api failed, because the corresponding feature flag
wasn't enabled for xpack rest tests.
Closes#56531
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.
This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.
Relates to #53100
Transforms should propagate up the search execution exception if one is returned when it does the test query.
this allows transforms to return a `4xx` when the aggs are malformed but parseable.
closes https://github.com/elastic/elasticsearch/issues/55994
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.
- The default for `model_snapshot_retention_days` for new jobs is now
10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
that defaults to 1 for new jobs and `model_snapshot_retention_days`
for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
and `model_snapshot_retention_days` all but the first model snapshot
for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
selected model snapshots to be retained indefinitely
Backport of #56125
As of elastic/ml-cpp#1179, the analytics process reports phases
depending on the analysis type. This commit adjusts the phases
of current analyses from `analyzing` to the following:
- outlier_detection: [`computing_outlier`]
- regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`]
Backport of #56107
Backport of #55858 to 7.x branch.
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.
This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377
* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:
AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
Backports #55933 to 7.x
Implements value_count and avg aggregations over Histogram fields as discussed in #53285
- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)
* Emit deprecation warning if multiple v1 templates match with a new index
* DEPRECATION_LOGGER rename
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
Backports #55681 to 7.x