Adding support for the `search_type` request parameter to the Ranking Evaluation
API since this parameter can impact the ranking and the metric score and should
be choosen in the same way when evaluating the search as later in the real
search.
Closes#48503
Adds `GET /_script_context`, returning a `contexts` object with each
available context as a key whose value is an empty object. eg.
```
{
"contexts": {
"aggregation_selector": {},
"aggs": {},
"aggs_combine": {},
...
}
}
```
refs: #47411
* Bwc testclusters all (#46265)
Convert all bwc projects to testclusters
* Fix bwc versions config
* WIP fix rolling upgrade
* Fix bwc tests on old versions
* Fix rolling upgrade
Changes auto-id index requests to use optype CREATE, making it compliant with our docs.
This will also make these auto-id index requests compatible with the new "create-doc" index
privilege (which is based on the optype), the default optype is changed to create, just as it is
already documented.
Bulk requests currently do not allow adding "create" actions with auto-generated IDs.
This commit allows using the optype CREATE for append-only indexing operations. This is
mainly the user facing aspect of it.
API spec now use an object for the documentation field. _common was not updated yet. This commit updates _common.json and its corresponding parser.
Closes#46744
Co-Authored-By: Tomas Della Vedova <delvedor@users.noreply.github.com>
Follow up from #42346. Since the `methods` array is in order of
preference when calling the index API with an `{id}` we prefer to use
the `PUT` http method.
When waiting for no initializing shards we also have to wait for events
when we have more than one node in the cluster. When the primary is
started, there is a short period of time, where neither the primary nor
any of the replicas are initializing.
Closes#46535
We leave replicas unassigned until we reroute after the primary shard
starts. If a cluster health request with wait_for_no_initializing_shards
is executed before the reroute, it will return immediately although
there will be some initializing replicas. Peer recoveries of those
shards can prevent translog on the primary from trimming.
We add wait_for_events to the cluster health request so that it will
execute after the reroute.
Closes#46425
* Support geotile_grid aggregation in composite agg sources (#45810)
Adds support for `geotile_grid` as a source in composite aggs.
Part of this change includes adding a new docFormat of `GEOTILE` that formats a hashed `long` value into a geotile formatting string `zoom/x/y`.
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.
Backport of #45473
Relates #45136
* Add is_write_index column to cat.aliases (#44772)
Aliases have had the option to set `is_write_index` since 6.4,
but the cat.aliases action was never updated.
* correct version bounds to 7.4
* Repository Cleanup Endpoint (#43900)
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
* Update the REST API specification
This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.
Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.
Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.
Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.
* Adapt YAML runner to new REST API specification format
The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.
Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
This commit adds a deprecation warning in 7.x for the Force Merge API
when both only_expunge_deletes and max_num_segments are set in a request.
Relates #44761
With this change, we will return primary_term and seq_no of the current
document if an update is detected as a noop. We already return the
version; hence we should also return seq_no and primary_term.
Relates #42497
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In case where the filesystem
provides hard-linking capabilities, this is a very cheap operation.
Indexing cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.
Closes#44128
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
backport #41354
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.
This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).
This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again. If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter. If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".
The map keys are the "rare" terms after collection is done.
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.
Closes#43198
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.
Relates to #43198
* introduce state to the REST API specification
* change state over to stability
* CCR is no GA updated to stable
* SQL is now GA so marked as stable
* Introduce `internal` as state for API's, marks stable in terms of lifetime but unstable in terms of guarantees on its output format since it exposes internal representations
* make setting a wrong stability value, or not setting it at all an error that causes the YAML test suite to fail
* update spec files to be explicit about their stability state
* Document the fact that stability needs to be defined
Otherwise the YAML test runner will fail (with a nice exception message)
* address check style violations
* update rest spec unit tests to include stability
* found one more test spec file not declaring stability, made sure stability appears after documentation everywhere
* cluster.state is stable, mark response in some way to denote its a key value format that can be changed during minors
* mark data frame API's as beta
* remove internal and private as states for an API
* removed the wrong enum values in the Stability Enum in the previous commit
(cherry picked from commit 61c34bbd92f8f7e5f22fa411c6b682b0ebd8a99d)
The painless context api is internal and currently meant only for use in
generating docs. This commit moves the spec file for the api so that it
is only used by the test for this api, and not externally by any clients
building from the public rest spec.
* Documents the new deprecations options on the rest-api-spec
Relates #41439#38613#35262
* remove reference to path now that #41452 is merged, also fixed missing a comma rendering the example json invalid
* removed one more instance of path
* make sure json examples are self contained and not excerpts
(cherry picked from commit 4430f99750a3bf98373d69d2be59d71475c7aaad)
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.
This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
Tests were failing in mixed cluster after more broad warnings were introduced
in 6.x These tests were using `yyyy-MM-dd` pattern which is now warning about
the change of `y` to `u`. However, using predefined pattern
`strict_date` which uses the same format prevents the warning from being
generate and allow smooth upgrade/work in mixed cluster.
relates #42679
The date_histogram accepts an interval which can be either a calendar
interval (DST-aware, leap seconds, arbitrary length of months, etc) or
fixed interval (strict multiples of SI units). Unfortunately this is inferred
by first trying to parse as a calendar interval, then falling back to fixed
if that fails.
This leads to confusing arrangement where `1d` == calendar, but
`2d` == fixed. And if you want a day of fixed time, you have to
specify `24h` (e.g. the next smallest unit). This arrangement is very
error-prone for users.
This PR adds `calendar_interval` and `fixed_interval` parameters to any
code that uses intervals (date_histogram, rollup, composite, datafeed, etc).
Calendar only accepts calendar intervals, fixed accepts any combination of
units (meaning `1d` can be used to specify `24h` in fixed time), and both
are mutually exclusive.
The old interval behavior is deprecated and will throw a deprecation warning.
It is also mutually exclusive with the two new parameters. In the future the
old dual-purpose interval will be removed.
The change applies to both REST and java clients.
If `keyedFilters` is null it assumes there are unkeyed filters...which
will NPE if the unkeyed filters was actually empty.
This refactors to simplify the filter assignment a bit, adds an empty
check and tidies up some formatting.