This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.
This is necessary for the following use cases:
- Reading from frozen indices
- Allowing certain indices in multiple index patterns to not exist yet
These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.
closes https://github.com/elastic/elasticsearch/issues/48056
This change adds the recall@k metric and refactors precision@k to match
the new metric.
Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.
See: https://github.com/elastic/elasticsearch/issues/51676
Backports: https://github.com/elastic/elasticsearch/pull/52577
Add query execution and return actual results returned from
Elasticsearch inside the tests
(cherry picked from commit 3e039282bf991af87604a6d4f8eada19d5e33842)
The introductory sections of the reference manual contains some simplified
instructions for adding a node to the cluster. Unfortunately they are a little
too simplified and only really work for clusters running on `localhost`. If you
try and follow these instructions for a distributed cluster then the new node
will, confusingly, auto-bootstrap itself into a distinct one-node cluster.
Multiple nodes running on localhost is a valid config, of course, but we should
spell out that these instructions are really only for experimentation and that
it takes a bit more work to add nodes to a distributed cluster. This commit
does so.
Also, the "important config" instructions for discovery say that you MUST set
`discovery.seed_hosts` whereas in fact it is fine to ignore this setting and
use a dynamic discovery mechanism instead. This commit weakens this statement
and links to the docs for dynamic discovery mechanisms.
Finally, this section is also overloaded with some technical details that are
not important for this context and are adequately covered elsewhere, and
completely fails to note that the default discovery port is 9300. This commit
addresses this.
Adds the `?refresh=wait_for` query argument to an index API snippet in
the term vectors API docs.
This should ensure the document is indexed and available before a
subsequent term vectors API request executes.
Fixes#52814.
Remove reference to an "SQL API" which could suggest that one needs to
treat this in a special way when configuring the ODBC driver.
(cherry picked from commit 451c341e0193b542409e8891ec2a31e62529a5e7)
Adds an explicit "important" admonition discouraging apps from using
cat APIs.
cat APIs are intended for human consumption via the command line or
Kibana console only. They are not intended for consumption by
applications.
Indices open with the `niofs` store type load much more data on-heap than
indices open with the `mmapfs` store type. This limitation is now documented
and examples have been updated to show how to update settings to use the
`mmapfs` store type rather than `niofs`.
We should be more explicit about the downsides of disabling replicas and
explain that users should be ready to re-do the entire load in case of
issues mid-way.
One architecture that we have recommended to several users to speed up
indexing involved using CCR to prevent searching from stealing resources
from indexing.
Before boost in script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of script.
Closes#48465
Explicitly notes the Elasticsearch API endpoints that support CCS.
This should deter users from attempting to use CCS with other API
endpoints, such as `GET <index>/_doc/<_id>`.
* Adds an example request to the top of the page.
* Relocates several parameters erroneously listed under "Request body"
to the appropriate "Query parameters" section.
* Updates the "Request body" section to better document the NDJSON
structure of msearch requests.
Add default value to each one of the usages of `allow_no_indices`
since it differs between different APIs.
Relates to: #52534
(cherry picked from commit 2eb986488ac326d6da6ab8ad0203a94e08684a36)
This adds machine learning model feature importance calculations to the inference processor.
The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
"field_mappings": {},
"model_id": "my_model",
"inference_config": {
"regression": {
"num_top_feature_importance_values": 3
}
}
}
```
This will write to the document as follows:
```
"inference" : {
"feature_importance" : {
"FlightTimeMin" : -76.90955548511226,
"FlightDelayType" : 114.13514762158526,
"DistanceMiles" : 13.731580450792187
},
"predicted_value" : 108.33165831875137,
"model_id" : "my_model"
}
```
This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).
It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.
Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.
NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc
usability blocked by: https://github.com/elastic/ml-cpp/pull/991
Re-adds several redirects removed with #50510.
These redirects were related to the relocation of several API docs to
new pages under the 'REST APIs' chapter.
We've since decided to only remove such redirects with major releases.
This commit updates the enrich.get_policy API to specify name
as a list, in line with other URL parts that accept a comma-separated
list of values.
In addition, update the get enrich policy API docs
to align the URL part name in the documentation with
the name used in the REST API specs.
(cherry picked from commit 94f6f946ef283dc93040e052b4676c5bc37f4bde)
* Refresh snapshots with latest look
Add new snapshots with the connection editor to reflect the latest UI.
* Document the effect of the late added params
Add details about the Cloud ID setting, as well as those on the Misc
tab.
(cherry picked from commit afa67625e847e99a22264f5dd6fa0daa37786c6f)
Updates the cross-cluster search (CCS) documentation to note how
cluster-level settings are applied.
When `ccs_minimize_roundtrips` is `true`, each cluster applies its own
cluster-level settings to the request.
When `ccs_minimize_roundtrips` is `false`, cluster-level settings for
the local cluster is used. This includes shard limit settings, such as
`action.search.shard_count.limit`, `pre_filter_shard_size`, and
`max_concurrent_shard_requests`. If these limits are set too low, the
request could be rejected.