* [ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)
[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)
Related PR: https://github.com/elastic/ml-cpp/pull/809
* adjusting bwc version
The realtime GET API currently has erratic performance in case where a document is accessed
that has just been indexed but not refreshed yet, as the implementation will currently force an
internal refresh in that case. Refreshing can be an expensive operation, and also will block the
thread that executes the GET operation, blocking other GETs to be processed. In case of
frequent access of recently indexed documents, this can lead to a refresh storm and terrible
GET performance.
While older versions of Elasticsearch (2.x and older) did not trigger refreshes and instead opted
to read from the translog in case of realtime GET API or update API, this was removed in 5.0
(#20102) to avoid inconsistencies between values that were returned from the translog and
those returned by the index. This was partially reverted in 6.3 (#29264) to allow _update and
upsert to read from the translog again as it was easier to guarantee consistency for these, and
also brought back more predictable performance characteristics of this API. Calls to the realtime
GET API, however, would still always do a refresh if necessary to return consistent results. This
means that users that were calling realtime GET APIs to coordinate updates on client side
(realtime GET + CAS for conditional index of updated doc) would still see very erratic
performance.
This PR (together with #48707) resolves the inconsistencies between reading from translog and
index. In particular it fixes the inconsistencies that happen when requesting stored fields, which
were not available when reading from translog. In case where stored fields are requested, this
PR will reparse the _source from the translog and derive the stored fields to be returned. With
this, it changes the realtime GET API to allow reading from the translog again, avoid refresh
storms and blocking the GET threadpool, and provide overall much better and predictable
performance for this API.
The first example of splitting rules for the `word_delimiter` token filter was spread across two bullet points. This makes it look like they are two separate splitting rules.
CCR follower stats can return information for persistent tasks that are in the process of being cleaned up. This is problematic for tests where CCR follower indices have been deleted, but their persistent follower task is only cleaned up asynchronously afterwards. If one of the following tests then accesses the follower stats, it might still get the stats for that follower task.
In addition, some tests were not cleaning up their auto-follow patterns, leaving orphaned patterns behind. Other tests cleaned up their auto-follow patterns. As always the same name was used, it just depended on the test execution order whether this led to a failure or not. This commit fixes the offensive tests, and will also automatically remove auto-follow-patterns at the end of tests, like we do for many other features.
Closes #48700
* Add ingest info to Cluster Stats (#48485)
This commit enhances the ClusterStatsNodes response to include global
processor usage stats on a per-processor basis.
example output:
```
...
"processor_stats": {
"gsub": {
"count": 0,
"failed": 0
"current": 0
"time_in_millis": 0
},
"script": {
"count": 0,
"failed": 0
"current": 0,
"time_in_millis": 0
}
}
...
```
The purpose for this enhancement is to make it easier to collect stats on how specific processors are being used across the cluster beyond the current per-node usage statistics that currently exist in node stats.
Closes#46146.
* fix BWC of ingest stats
The introduction of processor types into IngestStats had a bug.
It was set to `null` and set as the key to the map. This would
throw a NPE. This commit resolves this by setting all the processor
types from previous versions that are not serializing it out to
`_NOT_AVAILABLE`.
This adds the infrastructure to be able to retry the execution of retryable
steps and makes the `check-rollover-ready` retryable as an initial step to
make the rollover action more resilient to transient errors.
(cherry picked from commit 454020ac8acb147eae97acb4ccd6fb470d1e5f48)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
The docs specify that cluster.remote.connect disables cross-cluster
search. This is correct, but not fully accurate as it disables any
functionality that relies on remote cluster connections: cross-cluster
search, remote data feeds, and cross-cluster replication. This commit
updates the docs to reflect this.
For the user agent ingest processor, custom regex files must end
with the `.yml` file extension.
This corrects the docs which said the `.yaml` extension was required.
Previously the functions accepted a doc values reference, whereas they now
accept the name of the vector field. Here's an example of how a vector function
was called before and after the change.
```
Before: cosineSimilarity(params.query_vector, doc['field'])
After: cosineSimilarity(params.query_vector, 'field')
```
This seems more intuitive, since we don't allow direct access to vector doc
values and the the meaning of `doc['field']` is unclear.
The PR makes the following changes (broken into distinct commits):
* Add new function signatures of the form `function(params.query_vector,
'field')` and deprecates the old ones. Because Painless doesn't allow two
methods with the same name and number of arguments, we allow a generic `Object`
to be passed in to the function and decide on the behavior through an
`instanceof` check.
* Refactor the class bindings so that the document field is passed to the
constructor instead of the instance method. This allows us to avoid retrieving
the vector doc values on every function invocation, which gives a tiny speed-up
in benchmarks.
Note that this PR adds new signatures for the sparse vector functions too, even
though sparse vectors are deprecated. It seemed simplest to understand (for both
us and users) to keep everything symmetric between dense and sparse vectors.
PR #25543 removed the `_uid` field in favor of the `_id` field,
including for use in slicing.
This removes an outdated reference to `_uid` in our reindex docs.