Creates a reusable template for token filter reference documentation.
Contributors can make a copy of this template and customize it when
documenting new token filters.
Implement DATETIME_PARSE(<datetime_str>, <pattern_str>) function
which allows to parse a datetime string according to the specified
pattern into a datetime object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Relates to #53714
(cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)
This commit includes a number of changes to reduce overall build
configuration time. These optimizations include:
- Removing the usage of the 'nebula.info-scm' plugin. This plugin
leverages jgit to load read various pieces of VCS information. This
is mostly overkill and we have our own minimal implementation for
determining the current commit id.
- Removing unnecessary build dependencies such as perforce and jgit
now that we don't need them. This reduces our classpath considerably.
- Expanding the usage lazy task creation, particularly in our
distribution projects. The archives and packages projects create
lots of tasks with very complex configuration. Avoiding the creation
of these tasks at configuration time gives us a nice boost.
Implement DATETIME_FORMAT(<date/datetime/time>, ) function
which allows for formatting a timestamp to the specified format. The
patterns allowed as those of java.time.format.DateTimeFormatter.
Related to #53714
(cherry picked from commit 72be0b54a9299e87e785469cdc9aafac2a48c046)
In 7.x, an index template will fail to apply if it contains a `_default_`
mapping. Several users have expressed confusion over the fact that loading the
template doesn't show any default mappings. This docs change clarifies that in
order to see all mappings in the template, you must pass `include_type_name`.
Adds a detailed example to the "Avoid scripts" section of the "Tune
for search speed" docs. The detail outlines how a script used to
transform indexed data can be moved to ingest.
The update also removes an outdated reference to supported script
languages.
This commit adds a new point field that is able to index arbitrary pair of values (x/y)
in the cartesian space. It only supports filtering using shape queries at the moment.
This is a backport of #54803 for 7.x.
This pull request cherry picks the squashed commit from #54803 with the additional commits:
6f50c92 which adjusts master code to 7.x
a114549 to mute a failing ILM test (#54818)
48cbca1 and 50186b2 that cleans up and fixes the previous test
aae12bb that adds a missing feature flag (#54861)
6f330e3 that adds missing serialization bits (#54864)
bf72c02 that adjust the version in YAML tests
a51955f that adds some plumbing for the transport client used in integration tests
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.
Relates to #53692
Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.
Relates #50990
Supersedes #51157
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Looking into #50237 I realized that two of the examples given in the
documentation around date math rounding for range queries on date fields using
`gt` and `lt` is slightly off by a nanosecond. This PR changes this to the
bounds that are currently parsed using these parameters.
* Document VarcharLimit and EarlyExecution params
Add the documentation for the newly added VarcharLimit and
EarlyExecution DSN attributes.
* Remove obsolete VersionChecking param
This param had been removed already along the #53082 work.
* Update docs/reference/sql/endpoints/odbc/configuration.asciidoc
fix typo
Co-Authored-By: Stuart Cam <stuart@codebrain.co.uk>
* Update docs/reference/sql/endpoints/odbc/configuration.asciidoc
fix typo
Co-Authored-By: Stuart Cam <stuart@codebrain.co.uk>
(cherry picked from commit f38761631a12b38f7f075635f7ac61dc96656cd7)
In #33933 we disallowed changing the `enabled` parameter in object mappings.
However, the fix didn't cover the root object mapper. This PR adjusts the change
to also include the root mapper and clarifies the error message.
The `indices.recovery.max_bytes_per_sec` recovery bandwidth limit can differ
between nodes if it is not set dynamically, but today this is not obvious. This
commit adds a paragraph to its documentation clarifying how to set different
bandwidth limits on each node.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* [ML] add new inference_config field to trained model config (#54421)
A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.
The inference processor can still override whatever is set as the default in the trained model config.
* fixing for backport
* [ML] prefer secondary authorization header for data[feed|frame] authz (#54121)
Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds.
Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided).
closes https://github.com/elastic/elasticsearch/issues/53801
* fixing for backport
* [ML] add num_matches and preferred_to_categories to category defintion objects (#54214)
This adds two new fields to category definitions.
- `num_matches` indicating how many documents have been seen by this category
- `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized.
These fields are only guaranteed to be up to date after a `_flush` or `_close`
native change: https://github.com/elastic/ml-cpp/pull/1062
* adjusting for backport
This commit corrects the description for the request URI index for the Multi Get (mget) API.
The index can only be a single index name (multiple or wildcard expressions not supported),
and acts as the index to use when "ids" are specified, or a document in the "docs" array does
not specify an index.
(cherry picked from commit aa4926ed7f91dfbf7973a01b1e4682e91dda11a9)
This commit adds a top-level link to the autoscaling API reference page
to the API docs. Additionally, we add a conditional guard on the API
pages to only include them in development builds of the docs.
EQL functions are an easy way for users to transform indexed data
at search time. However, using multiple functions can make
queries difficult to write and slows search speeds.
Users can circumvent this by indexing fields containing the transformed
data, but that usually slows index speeds.
This adds a related tip and example covering these tradeoffs.
This commit is the first in a series of commits that introduces
autoscaling policies, and APIs for working with them. For now, we
introduce the basic infrastructure, and a single API for putting an
autoscaling policy. We will follow in rapid succession with APIs for
getting, and deleting autoscaling policies.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
The anchor ID for the snapshot repository plugins section in the docs
was recently changes from `_repository_plugins` to
`snapshots-repository-plugins`.
This adds a corresponding redirect so no links are broken.
Remove mention of the `yellow` and `red` starting
health status from the rolling upgrade docs.
Instead, we should emphasize that users wait
for the node to recover with a health status of
`green` rather than the starting status.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Fixing the naming of the HLRC values to match the ToXContent field names (i.e. the field names returned from an API call).
Also fixes the names in the _cat API as well.
closes#53946
This setting is not documented and has dubious value since it means
there can be nodes in the cluster (non-data and non-master nodes) that
do not have persistent node IDs. This does not have any use cases so
this commit removes the setting.
The failing suggester documentation test was expecting specific scores in the
test response, which is fragile implementation details that e.g. can change with
different lucene versions and generally shouldn't be done in documentation test.
Instead we usually replace the float values in the output response by the ones
in the actual response.
Closes#54257
The remove keystore command can handle multiple settings. In a few
places, we were not consistent about mentioning this. This commit
addreses this, in the CLI help, and the docs.
This commit renames wait_for_completion to wait_for_completion_timeout in submit async search and get async search.
Also it renames clean_on_completion to keep_on_completion and turns around its behaviour.
Closes#54069
Today the keystore add-file command can only handle adding a single
setting/file pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add-file keystore command to accept adding multiple
settings in a single invocation.
The documentation was missing the long option for the force option, and
the short option for the stdin option. This commit addresses this by
adding these to the documentation.
Today the keystore add command can only handle adding a single
setting/value pair in a single invocation. This incurs the startup costs
of the JVM many times, which in some environments can be expensive. This
commit teaches the add keystore command to accept adding multiple
settings in a single invocation.
Adds documentation for the EQL `substring` function.
Supporting changes:
* Creates a new "EQL function reference" page
* Updates the title of the "EQL syntax reference" page for consistency
* Adds a brief "Functions" section to the EQL syntax docs
* Updates EQL limitations docs to state that only array functions are
unsupported
Makes the following changes to the `keyword_marker` token filter docs:
* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Rewrites parameter definitions
* Adds custom analyzer and filter example
Documents missing data types for several response parameters returned
by the node stats API.
Also adds several missing human-readable parameters returned by the API.
Currently the remote info api has added a number of possible fields
(proxy, num_socket_connections, etc) that are available in proxy mode.
These fields are not aligned with what the settings are named. This
commit modifies this API to align with the settings.
Submit async search forces pre_filter_shard_size for the underlying search that it creates.
With this commit we also prevent users from overriding such default as part of request validation.
DocsClientYamlTestSuiteIT sometimes fails for CCR
related tests because tests are started before the license
is fully applied and active within the cluster. The first
tests to be executed then fails with the error noticed
in #53430. This can be easily reproduced locally by
only running CCR docs tests.
This commit adds some @Before logic in
DocsClientYamlTestSuiteIT so that it waits for the
license to be active before running CCR tests.
Closes#53430
It is possible for ML jobs to open lazily if the "allow_lazy_open"
option in the job config is set to true. Such jobs wait in the
"opening" state until a node has sufficient capacity to run them.
This commit fixes the bug that prevented datafeeds for jobs lazily
waiting assignment from being started. The state of such datafeeds
is "starting", and they can be stopped by the stop datafeed API
while in this state with or without force.
Backport of #53918
add 2 additional stats: processing time and processing total which capture the
time spent for processing results and how often it ran. The 2 new stats
correspond to the existing indexing and search stats. Together with indexing
and search this now allows the user to see the full picture, all 3 stages.
This is the first in a series of commits that will introduce the
autoscaling deciders framework. This commit introduces the basic
framework for representing autoscaling decisions.
This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.
Closes#39835
It seemed confusing for users that our top-level mapping page still had a
prominent section named 'Mapping Type'. This PR reworks the docs to remove this
reference and adds a note about types removal (similar to the note we added to
other APIs like put mapping).
This change adds the `nori_number` token filter.
It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.
* Get Async Search: omit _clusters section when empty (#53907)
The _clusters section is omitted by the search API whenever no remote clusters are searched. Async search should do the same, but Get Async Search returns a deserialized response, hence a weird `_clusters` section with all values set to `0` gets returned instead. In fact the recreated Clusters object is not the same object as the EMPTY constant, yet it has the same content.
This commit addresses this by changing the comparison in the `toXContent` method to not print out the section if the number of total clusters is `0`.
* Async search: remove version from response (#53960)
The goal of the version field was to quickly show when you can expect to find something new in the search response, compared to when nothing has changed. This can also be done by looking at the `_shards` section and `num_reduce_phases` returned with the search response. In fact when there has been one or more additional reduction of the results, you can expect new results in the search response. Otherwise, the `_shards` section could notify of additional failures of shards that have completed the query, but that is not a guarantee that their results will be exposed (only when the following partial reduction is performed their results will be available).
That said this commit clarifies this in the docs and removes the version field from the async search response
* Async Search: replicas to auto expand from 0 to 1 (#53964)
This way single node clusters that are green don't go yellow once async search is used, while
all the others still have one replica.
* [DOCS] address timing issue in async search docs tests (#53910)
The docs snippets for submit async search have proven difficult to test as it is not possible to guarantee that you get a response that is not final, even when providing `wait_for_completion=0`. In the docs we want to show though a proper long-running query, and its first response should be partial rather than final.
With this commit we adapt the docs snippets to show a partial response, and replace under the hood all that's needed to make the snippets tests succeed when we get a final response. Also, increased the timeout so we always get a final response.
Closes#53887Closes#53891
The joda to java.time migration requires users to upgrade their mappings. We allow them to still use 6.x created indices with joda patterns in 7 but ask them to upgrade their patterns in 7.x.
This migration guide is to help them understand how they could be affected and what needs to be changed in their mappings.
closes#51614closes#51236
The terms-lookup section of our terms query docs currently state that the
index, id and path fields are optional. They should be marked instead
as required.
Backport to 7x
Enable geo_shape query to work on geo_point fields for shapes: circle, polygon, multipolygon, rectangle see: #48928
Co-Authored-By: @iverase
Adds conceptual docs for token graphs.
These docs cover:
* How a token graph is constructed from a token stream
* How synonyms and multi-position tokens impact token graphs
* How token graphs are used during search
* Why some token filters produce invalid token graphs
Also makes the following supporting changes:
* Adds anchors to the 'Anatomy of an Analyzer' docs for cross-linking
* Adds several SVGs for token graph diagrams
Removes the `flat_settings` and `timeout` query parameters from the JSON
spec and asciidoc docs for the put index template API.
These parameters are not supported by the API.
It is useful to be able to delay state recovery until enough data nodes have
joined the cluster, since this gives the shard allocator a decent opportunity
to re-use as much existing data as possible. However we also have the option to
delay state recovery until a certain number of master-eligible nodes have
joined, and this is unnecessary: we require a majority of master-eligible nodes
for state recovery, and there is no advantage in waiting for more.
This commit deprecates the unnecessary settings in preparation for their
removal.
Relates #51806
This commit adjusts a `deprecation[...]` message in the docs since such
messages must be on a single line. It also moves this message to the start of
the description of the deprecated setting as is the case with other such
messages.
* Submit async search to work only with POST (#53368)
Currently the submit async search API can be called using both GET and POST at REST, but given that it submits a call and creates internal state, POST should be the only allowed method.
* Refine SearchProgressListener internal API (#53373)
The following cumulative improvements have been made:
- rename `onReduce` and `notifyReduce` to `onFinalReduce` and `notifyFinalReduce`
- add unit test for `SearchShard`
- on* methods in `SearchProgressListener` shouldn't need to be public as they should never be called directly, they only need to be overridden hence they can be made protected. They are actually called directly from a test which required some adapting, like making `AsyncSearchTask.Listener` class package private instead of private
- Instead of overriding `getProgressListener` in `AsyncSearchTask`, as it feels weird to override a getter method, added a specific method that allows to retrieve the Listener directly without needing to cast it. Made the getter and setter for the listener final in the base class.
- rename `SearchProgressListener#searchShards` methods to `buildSearchShards` and make it static given that it accesses no instance members
- make `SearchShard` and `SearchShardTask` classes final
* Move async search yaml tests to x-pack yaml test folder (#53537)
The yaml tests for async search currently sit in its qa folder. There is no reason though for them to live in a separate folder as they don't require particular setup. This commit moves them to the main folder together with the other x-pack yaml tests so that they will be run by the client test runners too.
* [DOCS] Add temporary redirect for async-search (#53454)
The following API spec files contain a link to a not-yet-created
async search docs page:
* [async_search.delete.json][0]
* [async_search.get.json][1]
* [async_search.submit.json][2]
The Elaticsearch-js client uses these spec files to create their docs.
This created a broken link in the Elaticsearch-js docs, which has broken
the docs build.
This PR adds a temporary redirect for the docs page. This redirect
should be removed when the actual API docs are added.
[0]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.delete.json
[1]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.get.json
[2]: https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/src/test/resources/rest-api-spec/api/async_search.submit.json
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* Removes experimental.
* Replaces `"v"` (for value) with `"m"` (for metric).
* Move the note about tiebreaking into the list of limitations of the
sort.
* Explain how you ask for `metrics`.
* Clean up some wording.
* Link to the docs from `top_metrics`.
Closes#51813
Makes the following changes to the `remove_duplicates` token filter
docs:
* Rewrites description and adds Lucene link
* Adds detailed analyze example
* Adds custom analyzer example
* New wildcard field optimised for wildcard queries (#49993)
Indexes values using size 3 ngrams and also stores the full original as a binary doc value.
Wildcard queries operate by using a cheap approximation query on the ngram field followed up by a more expensive verification query using an automaton on the binary doc values. Also supports aggregations and sorting.
Keyword field values with length more than ignore_above are not
indexed. But highlighters still were retrieving these values
from _source and were trying to highlight them. This sometimes lead to
errors if a field length exceeded max_analyzed_offset. But also this
is an overall wrong behaviour to attempt to highlight something that was
ignored during indexing.
This PR checks if a keyword value was ignored because of its length,
and if yes, skips highlighting it.
Backport: #53408Closes#43800
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.
Fixes#52427.
This changes the `top_metrics` aggregation to return metrics in their
original type. Since it only supports numerics, that means that dates,
longs, and doubles will come back as stored, with their appropriate
formatter applied.
This change removes the Lucene's experimental flag from the documentations of the following
tokenizer/filters:
* Simple Pattern Split Tokenizer
* Simple Pattern tokenizer
* Flatten Graph Token Filter
* Word Delimiter Graph Token Filter
The flag is still present in Lucene codebase but we're fully supporting these tokenizers/filters
in ES for a long time now so the docs flag is misleading.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Restructures the 'Update an enrich policy' section to:
* Migrate the content to the section. It was previously stored in the
Put Enrich Policy API docs.
* Remove the warning tag admonition from the section content.
* Replace a reused section earlier in the "Set up an enrich processor"
page with a link.
No substantive changes were made to the content.
Adds a new `default_field_map` field to trained model config objects.
This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.
The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
Makes the following changes to the `word_delimiter` token filter docs:
* Adds a warning admonition recommending the `word_delimiter_graph`
filter instead. This warning includes a link to the deprecated Lucene
`WordDelimiterFilter`.
* Updates the description
* Adds detailed analyze snippet
* Adds custom analyzer and custom filter snippets
* Reorganizes and updates parameter documentation
In a tip admonition, we recommend using the `keyword` tokenizer with the
`word_delimiter_graph` token filter. However, we only use the
`whitespace` tokenizer in the example snippets. This updates those
snippets to use the `keyword` tokenizer instead.
Also corrects several spacing issues for arrays in these docs.
Updates the SVG for a token graph to make the layout consistent with
other graphs. This means moving the text directly above the edge lines.
Previously, the text was above the edge line.
Adds a tip admonition to the basic example in the EQL search docs.
This tip lets users know they can set up a Beat to automatically
index data in ES, rather than manually indexing using the bulk or index
APIs.
Documents the `nodes` response parameters returned by the
`_cluster/stats` API.
Also adds collapsible attributes for the `indices` and `nodes`
sections.
Makes the following changes to the `word_delimiter_graph` token filter
docs:
* Updates the Lucene experimental admonition.
* Updates description
* Adds analyze snippet
* Adds custom analyzer and custom filter snippets
* Reorganizes and updates parameter list
* Expands and updates section re: differences between `word_delimiter`
and `word_delimiter_graph`
This commit introduces hidden aliases. These are similar to hidden
indices, in that they are not visible by default, unless explicitly
specified by name or by indicating that hidden indices/aliases are
desired.
The new alias property, `is_hidden` is implemented similarly to
`is_write_index`, except that it must be consistent across all indices
with a given alias - that is, all indices with a given alias must
specify the alias as either hidden, or all specify it as non-hidden,
either explicitly or by omitting the `is_hidden` property.
7.5 and 7.6 had a regression that allowed for
script_score queries to have negative scores.
We have corrected this regression in #52478.
This is an addition to #52478 that adds
a test and release notes.
Adds documentation for the `any` keyword to the EQL syntax docs.
Includes:
* Definition of an event category and its relationship to the event
category field.
* Example matching all event categories using `any` keyword
* Example using `any` with `where true`
Updates the documented default `event_category_field` and `timestamp_field`
values for the EQL search API. Also updates related guidance in the
EQL requirement docs.
Relates to #53073.
Per the [Asciidoctor docs][0], Asciidoctor replaces the following
syntax with double arrows in the rendered HTML:
* => renders as ⇒
* <= renders as ⇐
This escapes several unintended replacements, such as in the Painless
docs.
Where appropriate, it also replaces some double arrow instances with
single arrows for consistency.
[0]: https://asciidoctor.org/docs/user-manual/#replacements
Makes the following changes to the `stop` token filter docs:
* Updates description
* Adds a link to the related Lucene filter
* Adds detailed analyze snippet
* Updates custom analyzer and custom filter snippets
* Adds a list of predefined stop words by language
Co-authored-by: ScottieL <36999642+ScottieL@users.noreply.github.com>
This field is a specialization of the `keyword` field for the case when all
documents have the same value. It typically performs more efficiently than
keywords at query time by figuring out whether all or none of the documents
match at rewrite time, like `term` queries on `_index`.
The name is up for discussion. I liked including `keyword` in it, so that we
still have room for a `singleton_numeric` in the future. However I'm unsure
whether to call it `singleton`, `constant` or something else, any opinions?
For this field there is a choice between
1. accepting values in `_source` when they are equal to the value configured
in mappings, but rejecting mapping updates
2. rejecting values in `_source` but then allowing updates to the value that
is configured in the mapping
This commit implements option 1, so that it is possible to reindex from/to an
index that has the field mapped as a keyword with no changes to the source.
Backport of #49713
implement transform node attributes to disable transform on certain nodes and
test which nodes are allowed to do remote connections
closes#52200closes#50033closes#48734
backport #52712
Makes the following updates to the EQL search tutorial:
* Adds an API response to the basic tutorial
* Adds an example using the `event_type_field` parm
* Adds an example using the `timestamp_field`parm
* Adds an example using the `query` parm
* Updates example dataset to support more EQL query variety
Makes the following changes to the `trim` token filter docs:
* Updates description
* Adds a link to the related Lucene filter
* Adds tip about removing whitespace using tokenizers
* Adds detailed analyze snippets
* Adds custom analyzer snippet
Adds a warning admonition stating that the `index_options` mapping
parameter is intended only for `text` fields.
Removes an outdated statement regarding default values for numeric
and other datatypes.
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.
Backport of #52778 and #52958
Closes#43990. Describe how to change the default GC settings without changing
the default `jvm.options`. Give examples using `jvm.options.d`, and
`ES_JAVA_OPTS` with Docker.
Backport of #51233 to the seven dot x branch.
Tries to load a `Mapper` instance for the mapping snippet of a dynamic template.
This should catch things like using an analyzer that is undefined or mapping attributes that are unused.
This is best effort:
* If `{{name}}` placeholder is used in the mapping snippet then validation is skipped.
* If `match_mapping_type` is not specified then validation is performed for all mapping types.
If parsing succeeds with a single mapping type then this the dynamic mapping is considered valid.
If is detected that a dynamic template mapping snippet is invalid at mapping update time then the mapping update is failed for indices created on 8.0.0-alpha1 and later. For indices created on prior version a deprecation warning is omitted instead. In 7.x clusters the mapping update will never fail in case of an invalid dynamic template mapping snippet and a deprecation warning will always be omitted.
Closes#17411Closes#24419
Co-authored-by: Adrien Grand <jpountz@gmail.com>
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index.
This is necessary for the following use cases:
- Reading from frozen indices
- Allowing certain indices in multiple index patterns to not exist yet
These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.
closes https://github.com/elastic/elasticsearch/issues/48056
This change adds the recall@k metric and refactors precision@k to match
the new metric.
Recall@k is an important metric to use for learning to rank (LTR)
use-cases. Candidate generation or first ranking phase ranking functions
are often optimized for high recall, in order to generate as many
relevant candidates in the top-k as possible for a second phase of
ranking. Adding this metric allows tuning that base query for LTR.
See: https://github.com/elastic/elasticsearch/issues/51676
Backports: https://github.com/elastic/elasticsearch/pull/52577
Add query execution and return actual results returned from
Elasticsearch inside the tests
(cherry picked from commit 3e039282bf991af87604a6d4f8eada19d5e33842)
The introductory sections of the reference manual contains some simplified
instructions for adding a node to the cluster. Unfortunately they are a little
too simplified and only really work for clusters running on `localhost`. If you
try and follow these instructions for a distributed cluster then the new node
will, confusingly, auto-bootstrap itself into a distinct one-node cluster.
Multiple nodes running on localhost is a valid config, of course, but we should
spell out that these instructions are really only for experimentation and that
it takes a bit more work to add nodes to a distributed cluster. This commit
does so.
Also, the "important config" instructions for discovery say that you MUST set
`discovery.seed_hosts` whereas in fact it is fine to ignore this setting and
use a dynamic discovery mechanism instead. This commit weakens this statement
and links to the docs for dynamic discovery mechanisms.
Finally, this section is also overloaded with some technical details that are
not important for this context and are adequately covered elsewhere, and
completely fails to note that the default discovery port is 9300. This commit
addresses this.
Adds the `?refresh=wait_for` query argument to an index API snippet in
the term vectors API docs.
This should ensure the document is indexed and available before a
subsequent term vectors API request executes.
Fixes#52814.
* Smarter copying of the rest specs and tests (#52114)
This PR addresses the unnecessary copying of the rest specs and allows
for better semantics for which specs and tests are copied. By default
the rest specs will get copied if the project applies
`elasticsearch.standalone-rest-test` or `esplugin` and the project
has rest tests or you configure the custom extension `restResources`.
This PR also removes the need for dozens of places where the x-pack
specs were copied by supporting copying of the x-pack rest specs too.
The plugin/task introduced here can also copy the rest tests to the
local project through a similar configuration.
The new plugin/task allows a user to minimize the surface area of
which rest specs are copied. Per project can be configured to include
only a subset of the specs (or tests). Configuring a project to only
copy the specs when actually needed should help with build cache hit
rates since we can better define what is actually in use.
However, project level optimizations for build cache hit rates are
not included with this PR.
Also, with this PR you can no longer use the includePackaged flag on
integTest task.
The following items are included in this PR:
* new plugin: `elasticsearch.rest-resources`
* new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy
* new extension 'restResources'
```
restResources {
restApi {
includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar
includeXpack 'baz' //will include x-pack specs that start with baz
}
restTests {
includeCore 'foo', 'bar' //will include the core tests that start with foo and bar
includeXpack 'baz' //will include the x-pack tests that start with baz
}
}
```
Remove reference to an "SQL API" which could suggest that one needs to
treat this in a special way when configuring the ODBC driver.
(cherry picked from commit 451c341e0193b542409e8891ec2a31e62529a5e7)
Adds an explicit "important" admonition discouraging apps from using
cat APIs.
cat APIs are intended for human consumption via the command line or
Kibana console only. They are not intended for consumption by
applications.
Indices open with the `niofs` store type load much more data on-heap than
indices open with the `mmapfs` store type. This limitation is now documented
and examples have been updated to show how to update settings to use the
`mmapfs` store type rather than `niofs`.
We should be more explicit about the downsides of disabling replicas and
explain that users should be ready to re-do the entire load in case of
issues mid-way.
One architecture that we have recommended to several users to speed up
indexing involved using CCR to prevent searching from stealing resources
from indexing.
Before boost in script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of script.
Closes#48465
Explicitly notes the Elasticsearch API endpoints that support CCS.
This should deter users from attempting to use CCS with other API
endpoints, such as `GET <index>/_doc/<_id>`.
* Adds an example request to the top of the page.
* Relocates several parameters erroneously listed under "Request body"
to the appropriate "Query parameters" section.
* Updates the "Request body" section to better document the NDJSON
structure of msearch requests.
Add default value to each one of the usages of `allow_no_indices`
since it differs between different APIs.
Relates to: #52534
(cherry picked from commit 2eb986488ac326d6da6ab8ad0203a94e08684a36)
This adds machine learning model feature importance calculations to the inference processor.
The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
"field_mappings": {},
"model_id": "my_model",
"inference_config": {
"regression": {
"num_top_feature_importance_values": 3
}
}
}
```
This will write to the document as follows:
```
"inference" : {
"feature_importance" : {
"FlightTimeMin" : -76.90955548511226,
"FlightDelayType" : 114.13514762158526,
"DistanceMiles" : 13.731580450792187
},
"predicted_value" : 108.33165831875137,
"model_id" : "my_model"
}
```
This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888).
It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7.
Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.
NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc
usability blocked by: https://github.com/elastic/ml-cpp/pull/991
Re-adds several redirects removed with #50510.
These redirects were related to the relocation of several API docs to
new pages under the 'REST APIs' chapter.
We've since decided to only remove such redirects with major releases.
When `PUT` is called to store a trained model, it is useful to return the newly create model config. But, it is NOT useful to return the inflated definition.
These definitions can be large and returning the inflated definition causes undo work on the server and client side.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit updates the enrich.get_policy API to specify name
as a list, in line with other URL parts that accept a comma-separated
list of values.
In addition, update the get enrich policy API docs
to align the URL part name in the documentation with
the name used in the REST API specs.
(cherry picked from commit 94f6f946ef283dc93040e052b4676c5bc37f4bde)
* Refresh snapshots with latest look
Add new snapshots with the connection editor to reflect the latest UI.
* Document the effect of the late added params
Add details about the Cloud ID setting, as well as those on the Misc
tab.
(cherry picked from commit afa67625e847e99a22264f5dd6fa0daa37786c6f)
Updates the cross-cluster search (CCS) documentation to note how
cluster-level settings are applied.
When `ccs_minimize_roundtrips` is `true`, each cluster applies its own
cluster-level settings to the request.
When `ccs_minimize_roundtrips` is `false`, cluster-level settings for
the local cluster is used. This includes shard limit settings, such as
`action.search.shard_count.limit`, `pre_filter_shard_size`, and
`max_concurrent_shard_requests`. If these limits are set too low, the
request could be rejected.
* Fix "Description"s for various sections in the functions pages.
* Added a TIP for searching using a routing key.
* Other small polishings
(cherry picked from commit 9fad0b1ac4409a42c435ed040f41cbaea18930a3)
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.
While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
The example of how to access the nano value of a date_nanos field has
been broken since it was created. This commit fixes it to use the
correct scripting methods.
closes#51931
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
I plan to add additional sections to this page with future PRs:
* Specify timestamp and event type fields
* Specify a join key field
* Filter using query DSL
* Paginate a large response
See #51057.
Add a section to point out that when ordering by an aggregate
only plain aggregate functions are allowed, no scalars/operators
can be used on top of them.
Fixes: #52204
(cherry picked from commit 78a1185549ff7f3229fd2d036567eb2a4f2cf230)
7.x backport of #52201
Provides a path to set register the EQL feature flag in release builds.
This enables EQL in release builds so that release docs tests pass.
Release docs tests do not have infrastructure in place to only register
snippets from included portions of the docs, they instead include all
docs snippets.
Since EQL can not be enabled in release builds, this meant that the EQL
snippets fail in the release docs tests.
This adds the ability to enable EQL in the release docs tests. This
system property will be removed when EQL is ready for release.
Adds the ability to display docs on permanently unreleased branches,
such as `master` and `7.x`.
Also updates how the autoscaling and EQL docs are included.
Currently, these feature-flag docs would display on any unreleased
branches that contain the changes, such as 7.7.
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.
In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
The changes add more granularity for identiying the data ingestion user.
The ingest pipeline can now be configure to record authentication realm and
type. It can also record API key name and ID when one is in use.
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)
Resolves: #49106
* Allow forcemerge in the hot phase for ILM policies
This commit changes the `forcemerge` action to also be allowed in the `hot` phase for policies. The
forcemerge will occur after a rollover, and allows users to take advantage of higher disk speeds for
performing the force merge (on a separate node type, for example).
On caveat with this is that a `forcemerge` in the `hot` phase *MUST* be accompanied by a `rollover`
action. ILM validates policies to ensure this is the case.
Resolves#43165
* Use anyMatch instead of findAny in validation
* Make randomTimeseriesLifecyclePolicy single-pass
This change adds support for the following new model_size_stats
fields:
- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status
Backport of #51879
This commit introduces the ability to override JVM options by adding
custom JVM options files to a jvm.options.d directory. This simplifies
administration of Elasticsearch by not requiring administrators to keep
the root jvm.options file in sync with changes that we make to the root
jvm.options file. Instead, they are not expected to modify this file but
instead supply their own in jvm.options.d. In Docker installations, this
means they can bind mount this directory in. In future versions of
Elasticsearch, we can consider removing the root jvm.options file
(instead, providing all options there as system JVM options).
This commit provides a path to set register the autoscaling feature flag
in release builds, and therefore enabling autoscaling in release
builds. The primary reason that we add this is so that our release docs
tests can pass. Our release docs tests do not have infrastructure in
place to only register snippets from included portions of the docs, they
instead include all docs snippets. Since autoscaling can not be enabled
in release builds, this meant that the autoscaling snippets would fail
in the release docs tests. To address then, we need the ability to
enable autoscaling in the release docs tests which we can now do with
the system property added here. This system property will be removed
when autoscaling is ready for release.
There is some extraneous whitespace here, and every time I look at this
file my editor wants to make these changes and so my diffs end up having
this noise in it which I fight to exclude. This commit addresses this
issue by removing this extraneous whitespace.
The main purpose of this commit is to add a single autoscaling REST
endpoint skeleton, for the purpose of starting to build out the build
and testing infrastructure that will surround it. For example, rather
than commiting a fully-functioning autoscaling API, we introduce here
the skeleton so that we can start wiring up the build and testing
infrastructure, establish security roles/permissions, an so on. This
way, in a forthcoming PR that introduces actual functionality, that PR
will be smaller and have less distractions around that sort of
infrastructure.
Adds the ability to display docs on permanently unreleased branches,
such as `master` and `7.x`.
Also updates how the autoscaling and EQL docs are included.
Currently, these feature-flag docs would display on any unreleased
branches that contain the changes, such as 7.7.
Backport of #51867.
Tweak the documentation around configuring the heap size when using
Docker, to state that:
- using `ES_JAVA_OPTS` is the preferred method
- Any `ES_JAVA_OPTS` overrides the defaults in `jvm.options`
- It's possible to bind-mount a custom `jvm.options`
* Add empty_value parameter to CSV processor
This change adds `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is ommited. This behavior is the same for both
quoted and unquoted fields.
* docs updated
* Fix compilation problem
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Adds documentation for basic EQL syntax.
Joins, sequences, and other syntax to be added as its supported
in future development.
Co-Authored-By: Ross Wolf <31489089+rw-access@users.noreply.github.com>
We only drop ilm/slm policies on teardown only if the running docs tests
are ilm/slm related.
This updates the test name pattern to match the ilm/slm related tests
when running on windows
(eg.`reference\ilm/update-lifecycle-policy/line_29`).
(cherry picked from commit 4bb5bbd52eee59bd3eee6d766a9efc159822d9b9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* [DOCS] Align with ILM API docs (#48705)
* [DOCS] Reconciled with Snapshot/Restore reorg
* [DOCS] Split off ILM overview to a separate topic. (#51287)
* [DOCS} Split off overview to a separate topic.
* [DOCS] Incorporated feedback from @jrodewig.
* [DOCS] Edit ILM GS tutorial (#51513)
* [DOCS] Edit ILM GS tutorial
* [DOCS] Incorporated review feedback from @andreidan.
* [DOCS] Removed test link & fixed anchor & title.
* Update docs/reference/ilm/getting-started-ilm.asciidoc
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
* Fixed glossary merge error.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* Use standard format for reload settings API
The reload-secure-settings API page was not reorganized for the standard
API format, so this commit is reorganizing the page and adding some
links to the page in related documentation.
* Fix broken links
* Reorder examples to correctly check API response
* Note that only certain settings are reloadable
* [DOCS] Edits layout
* [DOCS] Removes unnecessary callouts
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Adds a secure and reloadable SECURE_AUTH_PASSWORD setting to allow keystore entries in the form "xpack.monitoring.exporters.*.auth.secure_password" to securely supply passwords for monitoring HTTP exporters. Also deprecates the insecure `AUTH_PASSWORD` setting.
The changes are to help users prepare for migration to next major
release (v8.0.0) regarding to the break change of realm order config.
Warnings are added for when:
* A realm does not have an order config
* Multiple realms have the same order config
The warning messages are added to both deprecation API and loggings.
The main reasons for doing this are: 1) there is currently no automatic relay
between the two; 2) deprecation API is under basic and we need logging
for OSS.
* Rename ILM history index enablement setting
The previous setting was `index.lifecycle.history_index_enabled`, this commit changes it to
`indices.lifecycle.history_index_enabled` to indicate this is not an index-level setting (it's node
level).
* [DOCS] Rewrite analysis intro. Move index/search analysis content.
* Rewrites 'Text analysis' page intro as high-level definition.
Adds guidance on when users should configure text analysis
* Rewrites and splits index/search analysis content:
* Conceptual content -> 'Index and search analysis' under 'Concepts'
* Task-based content -> 'Specify an analyzer' under 'Configure...'
* Adds detailed examples for when to use the same index/search analyzer
and when not.
* Adds new example snippets for specifying search analyzers
* clarifications
* Add toc. Decrement headings.
* Reword 'When to configure' section
* Remove sentence from tip
Previously, if YEAR() was used as and ORDER BY argument without being
wrapped with another scalar (e.g. YEAR(birth_date) + 10), no script
ordering was used but instead the underlying field (e.g. birth_date)
was used instead as a performance optimisation. This works correctly if
YEAR() is the only ORDER BY arg but if further args are used as tie
breakers for the ordering wrong results are produced. This is because
2 rows with the different birth_date but on the same year are not tied
as the underlying ordering is on birth_date and not on the
YEAR(birth_date), and the following ORDER BY args are ignored.
Remove this optimisation for YEAR() to avoid incorrect results in
such cases.
As a consequence another bug is revealed: scalar functions on top
of nested fields produce scripted sorting/filtering which is not yet
supported. In such cases no error was thrown but instead all values for
such nested fields were null and were passed to the script implementing
the sorting/filtering, producing incorrect results.
Detect such cases and throw a validation exception.
Fixes: #51224
(cherry picked from commit f41efd6753dc3650a7eabb3e07b02b3b32c5704c)