In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can
cause:
```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```
When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.
Resolves#58515
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
script because we *assumed* that it'd be ok to run the script once
and clone its results. Sadly, cloning can't clone *anything* that the
init script can make, like `String` arrays. This runs the init script
once for every new bucket so we don't need to clone.
If we failed over while the data nodes were doing their work
we would never resolve the listener and leak it.
This change fails all listeners if master fails over.
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
* [ML] make waiting for renormalization optional for internally flushing job (#58537)
When flushing, datafeeds only need the guaruntee that the latest bucket has been handled.
But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.
This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.
closes#58395
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
This PR addresses #9572.
The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.
At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.
Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.
It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
The main changes are:
1. Catch the `NamedObjectNotFoundException` when parsing aggregation
type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().
Closes#58146.
Co-authored-by: bellengao <gbl_long@163.com>
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.
Backport of #58541
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.
Backport of #58538
If a persistent task cannot be assigned on the first attempt
then the master node will schedule periodic rechecks to see
if the assignment requirements have been met.
These periodic rechecks should be cancelled if the node ceases
to be master. Previously they weren't, leading to exceptions
being logged repeatedly. This PR cancels the rechecks on
learning that the node is no longer the master.
Fixes#58531
* Enable TTY password OS tests, plus refactoring (#57759)
Two keystore tests were unintentionally ignored when the
password-protected keystore work was merged. I've reënabled those tests
here.
I've also refactored the test methods a little bit to reduce the API
surface: instead of having a "startElasticsearchTtyPassword" method and
a "startElasticsearchStandardInputPassword" method, I've made a single
"startElasticsearch" method with a "useTty" boolean argument.
* Separate daemonization and non-daemonization case for tty
Centos 6 uses a version of expect that kills the elasticsearch process
when it tries to daemonize. I will fix this in future work but for now
I'm replacing it with a todo.
Follow-up to 35aecf4c9aa. Somehow I missed the fact that there's an ILM
API named `retry`, which is a keyword in Ruby. I've removed it from the
keywords list.
If an API name (or components of a name) overlaps with a reserved word in
the programming language for an ES client, then it's possible that the code
that is generated from the API will not compile. This PR adds validation to
check for such overlaps.
With #58096, data streams now track the timestamp field mapping outside
of the template associated with the stream. This means you can no longer
update the timestamp field mapping using template changes.
This updates the associated data stream docs.
When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.
This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
* Add acm mapping to APM for beats
* Add root mapping for APM
* Add sourcemap mapping to APM
* Fix missing properties
* Fix a second missing properties
* Add request property to acm
* Remove root and sourcemap per review
Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Netty4HttpServerTransportTests has started to fail intermittently. It
seems like unexpected successful responses are being received when the
test is simulating errors. This commit adds logging to the test to
provide additional information when there is an unexpected success. It
also adds the logging to the nio http test.
The plugin installer currently checks the fake plugins installed contain
a single jar file. This commit adds another test for a plugin which
contains multiple jar files, ensuring all jars exist in the installed
plugin.
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not tracked the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.
Closes#57624
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:
TOO_MANY_REQUESTS/12/index read-only / allow delete (api)
Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:
FORBIDDEN/12/index read-only / allow delete (api)
Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.
It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
When assigning ports for internal cluster tests, we use the gradle
worker id as an adjustment on the base port of 10300. In order to not go
outside the max port range, we modulo the worker id by 223. Since gradle
worker ids start at 1, we expect to never actually get the base port of
10300. However, as the gradle daemon lasts for longer, the module can
result in a value of 0, which cases the test to fail. This commit
adjusts the modulo to ensure the value is never 0.
closes#58279
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
This commits allows data streams to be a valid source for analytics and transforms.
Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.
For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
GET inference stats now reads from the .ml-stats index.
Our tests should wait for yellow state before attempting to query the index for stat information.
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.
This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.
Backport of #58331