This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.
The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Previously we'd get a `ClassCastException` when you tried to use
`numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves
some code around so the casting actually works.
* Make it more clear that you can use `month` or `1M`.
* Explain rounding rules
* Consistently use "time zone" instead of "timezone". It looks like both
are right but I see "time zone" much more. And the parameter in
elasticsearch is `time_zone` so we may as well line up.
Closes#56760
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
This includes a few small cleanups for the `TermsAggregatorFactory`:
1. Removes an unused `DeprecationLogger`
2. Moves the members to right above the ctor.
3. Merges some all of the heuristics for picking `SubAggCollectionMode`
into a single method.
This commit adds support for rules with multiple tokens on LHS, also
known as "contraction rules", into stemmer override token
filter. Contraction rules are handy into translating multiple
inflected words into the same root form. One side effect of this change is
that it brings stemmer override rules format closer to synonym rules
format so that it makes it easier to translate one into another.
This change also makes stemmer override rules parser more strict so
that it should catch more errors which were previously accepted.
Closes#56113
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)
forecasts that are still running should be marked as failed/finished in the following scenarios:
- Job is force closed
- Job is re-assigned to another node.
Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.
Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.
relates to https://github.com/elastic/elasticsearch/issues/56419
* [ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.
This flag is useful for moving models between clusters.
This commit adds the `expand_wildcards` parameter documentation to the
`_cat/indices` and `_cat/aliases` docs, as those APIs now support
`expand_wildcards`. Additionally, clarifies the `expand_wildcards` docs with
respect to hidden indices.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Unfortunately, we cannot have a safety mechnism like this where we throw whenever we find unreadable data in a shard.
This breaks in the case of an older ES version (without shard generations enabled) having failed to snapshot a shard snapshot after writing some data to its path and having finalized it for example.
Another example of where we can't support this check is the test I added, if we snapshot an index with a name that already exists in the repository and more shards than the existing index, fail doing that and then retry snapshotting it we will also see unexpected data in the path.
We could technically do deeper inspections on the unexpected data but I don't think it's worth it really. In the end if we are unable to read the data here it's broken anyway. By moving to a new `index-` blob in the shard directory I don't see us ever
corrupting existing data and since we (by virtue of moving to an empty generation) won't do any incremental work on top of potentially corrupt data we also do not risk creating broken snapshots going forward.
=> Just logging a warning in this very unlikely case is the best we can do I think
Implement TIME_PARSE(<time_str>, <pattern_str>) function
which allows to parse a time string according to the specified
pattern into a time object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54963
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com>
(cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)
When the parameter `max_docs` is less than `slices` in update_by_query,
delete_by_query or reindex API, `max_docs ` is set to 0 and we throw an
action_request_validation_exception with confused error message:
"maxDocs should be greater than 0...".
This change checks that whether `max_docs` is less than `slices` and
throw an illegal_argument_exception with clear message.
Relates to #52786.
Co-authored-by: bellengao <gbl_long@163.com>
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.
Closes#57299
Backport of #57300
* Fix GCS Mock Behavior for Missing Bucket
We were throwing a 500 instead of a 404 for a missing bucket.
This would make yaml tests needlessly wait for multiple seconds, retrying
the 500 response with backoff, in the test checking behavior for missing buckets.
Jackson 2.10 library has added a new type of error that is thrown when a numeric value is out
of range. This error should be catch and handle properly in case the flag ignore_malformed
has been set to true.
Relates: elastic/elasticsearch#55014
This commit deprecates the local param in get_mapping.json.
This parameter is a no-op and field mappings are always retrieved locally.
(cherry picked from commit 0b041cccd894f01d723fb2979f70c1cf279700a6)
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
The new translog bwc test checks a corruption case before 6.3.0.
However, it needs to restart the old node to reproduce, which does not
currently work given how testclusters works when plugins are installed.
As a workaround, this commit omits creating bwc tests before 6.3.0 only
when the default distribution is used.
fixes#57252
* Fix up-to-date checks for precommit related tasks
- Do not use lambdas for doFirst / doLast action declarations as this is not supported by gradle up-to-date check
- Use marker output folder for dependencies license task to make task incremental build compliant
* Tweak formatting
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
Previously, `CASE` and `IIF` when translated to painless scripts
(used in GROUP BY, HAVING, WHERE) a custom `caseFunction`
registered in the `InternalSqlScriptUtils` was used. This function
received and array of arbitrary length:
```[condition1, result1, condition2, result2, ... elseResult]```
Painless doesn't know of the context and therefore is evaluating
all conditions and results before invoking the `caseFunction` on them.
As a consequence, erroneous result expressions (i.e. division by 0)
where always evaluated despite of the guarding condition.
Replace the `caseFunction` with painless `<cond> ? <res1> : <res2>`
expressions to properly guard the result expressions and only evaluate
the one for which its guarding condition evaluates to true (or of course
the elseResult).
As a bonus, this approach includes performance benefits since we avoid
unnecessary evaluations of both conditions and result expressions.
Fixes: #49672
(cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)
The ssl.trust setting for Watcher provides a list of hostnames that
should be automatically trusted for SSL hostname verification. It was
accidentally broken when we added the full ssl.* settings for email
notifications (see #45272)
This commit corrects this, so the setting is once again respected,
as long as none of the other ssl settings are configured for email
notifications.
Resolves: #52153
Backport of: #56090
This commit moves the global hook for reporting failed test cases to the
ElasticsearchJavaPlugin. It should always be applied for all java
projects since the Test class is what emits the failures logged.
The gradle version check currently exists in BuildPlugin. However, there
is no reason to check this within every project. Instead, this commit
moves the check to the global build info, which is only applied to the
root project. Additionally, this commit removes the check from buildSrc
because it is not really necessary. The check exists really just for
external plugin authors since we use the gradle wrapper for our own
build.
This commit removes the compiler.java setting from the build. It was
originally added when Gradle was far behind support for the latest jdk,
but is no longer applicable as we don't have any need to update the
supported compile version before gradle supports the newer version. Note
that the runtime version changing support still exists here, this only
ensures we use the same jdk to compile as we use to run gradle.
Since #51888 the ML job stats endpoint has returned entries for
jobs that have a persistent task but not job config. Such
orphaned tasks caused monitoring to fail.
This change ignores any such corrupt jobs for monitoring purposes.
Backport of #57235
This PR removes the blocking call to insert ingest documents into a queue in the
coordinator. It replaces it with an offer call which will throw a rejection exception
in the event that the queue is full. This prevents deadlocks of the write threads
when the queue fills to capacity and there are more than one enrich processors
in a pipeline.
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem
is that with the new translog policy can trim translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.
Closes#57091
Changes:
* Rewrites description and adds a Lucene link
* Reformats the configurable parameters as a definition list
* Changes the `Theory` heading to `Using the min_hash token filter for
similarity search`
* Adds some additional detail to the analyzer example