This fixes a serialization bug in `auto_date_histogram` that comes up in
a cluster mixed between pre-7.3.0 and post-7.3.0.
Includes #54429 to keep 7.x looking like master for simpler backports.
Closes#54382
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.
This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.
This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.
Closes#53215
This commit ensures that node roles are sorted by node role name, which
makes the output easier to consume, and also makes it easier to rely on
the behavior of the output in assertions.
* Add REST APIs for IndexTemplateV2Metadata CRUD (#54039)
* Add REST APIs for IndexTemplateV2Metadata CRUD
This commit adds the get/put/delete APIs for interacting with the now v2 versions of index
templates.
These APIs are behind the existing `es.itv2_feature_flag_registered` system property feature flag.
Relates to #53101
* Add exceptions for HLRC tests
* Add skips for 7.x versions
* Use index_template instead of template_v2 in action names
* Add test for MetaDataIndexTemplateService.addIndexTemplateV2
* Move removal to static method and add test
* Add unit tests for request classes (implement hashCode & equals)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Fix compilation
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Document known features on rest-api-spec tests
Features dictate wheter a `rest-api-spec` test runner can execute a
test. This PR documents all the know features in the java implementation
of the runner.
* Apply suggestions from code review
Co-Authored-By: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
(cherry picked from commit fbe173723d0ad7ebb920cda855bce3fb758b04a6)
This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.
Closes#39835
This commits adds a data stream feature flag, initial definition of a data stream and
the stubs for the data stream create, delete and get APIs. Also simple serialization
tests are added and a rest test to thest the data stream API stubs.
This is a large amount of code and mainly mechanical, but this commit should be
straightforward to review, because there isn't any real logic.
The data stream transport and rest action are behind the data stream feature flag and
are only intialized if the feature flag is enabled. The feature flag is enabled if
elasticsearch is build as snapshot or a release build and the
'es.datastreams_feature_flag_registered' is enabled.
The integ-test-zip sets the feature flag if building a release build, otherwise
rest tests would fail.
Relates to #53100
Downstream Elasticsearch clients, such as the Elaticsearch-JS client,
use the documentation links in our REST API JSON specifications to
create their docs.
Using a broken link or linking to yet-to-be-created doc pages can
break the docs build for these clients.
This PR adds a related note to the README for the REST API JSON Specs.
TermsLookup in master no longer accepts a type parameter. We should emit
a deprecate warning in 7.x when a terms lookup requests includes type to prepare
users for its removal.
Relates to #41059
Prior to this commit Watcher explicitly copied test between two
projects with a copy task. This commit removes the explicit copy in favor
of adding the Watcher tests to the available restResources that may be
copied between projects.
This is how inter-project dependencies should be modeled. However, only
Watcher is included here since it is (currently) the only project with
inter-project test dependencies.
* Fix feature flag setting for ComponentTemplate APIs (#53758)
The feature flag was set for *most* of the builds, but there are a couple where it was missing.
Resolves#53708
* Add skip for older versions of ES
Removes the `flat_settings` and `timeout` query parameters from the JSON
spec and asciidoc docs for the put index template API.
These parameters are not supported by the API.
This commit, built on top of #51708, allows to modify shard search requests based on informations collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents.
This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
* Add REST API for ComponentTemplate CRUD
This adds the Put/Get/DeleteComponentTemplate APIs that allow inserting, retrieving, and removing
ComponentTemplateMetadata into the cluster state metadata.
These APIs are currently only available behind a feature flag system property -
`es.itv2_feature_flag_registered`.
Relates to #53101
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
I broke sorting aggregations by `doc_count` in #51271 by mixing up true
and false. This flips that comparison and adds a few tests to double
check that we don't so this again.
This commit adjusts the _cat/indices and _cat/aliases APIs to allow
specifying indices options, so that these APIs can handle hidden
indices/aliases in the same way as other APIs.
Also adds the hidden option to the expand_wildcards parameter
in the YAML spec for every API that accepts it.
Keyword field values with length more than ignore_above are not
indexed. But highlighters still were retrieving these values
from _source and were trying to highlight them. This sometimes lead to
errors if a field length exceeded max_analyzed_offset. But also this
is an overall wrong behaviour to attempt to highlight something that was
ignored during indexing.
This PR checks if a keyword value was ignored because of its length,
and if yes, skips highlighting it.
Backport: #53408Closes#43800
Remove excessive testing and keep only the checks for when the queries
are disallowed. Fix also the check for the initial value of the setting
to be conmbatible with Go client tests.
(cherry picked from commit 314145294ea926e069c6f8629dfc622a7f31a0fb)
It looks like `date_nanos` fields weren't likely to work properly in
composite aggs because composites iterate field values using points and
we weren't converting the points into milliseconds. Because the doc
values were coming back in milliseconds we ended up geting very confused
and just never collecting sub-aggregations.
This fixes that by adding a method to `DateFieldMapper.Resolution` to
`parsePointAsMillis` which is similarly in name and function to
`NumberFieldMapper.NumberType`'s `parsePoint` except that it normalizes
to milliseconds which is what aggs need at the moment.
Closes#53168
When an composite aggregation is run against an index with a sort that
*starts* with the "source" fields from the composite but has additional
fields it'd blow up in while trying to decide if it could use the sort.
This changes it to decide that it *can* use the sort.
Closes#52480
When we test backwards compatibility we often end up in a situation
where we *sometimes* get a warning, and sometimes don't. Like, we won't
get the warning if we're testing against an older version, but we will
in a newer one. Or we won't get the warning if the request randomly
lands on a node with an old version of the code. But we wouldn't if it
randomed into a node with newer code.
This adds `allowed_warnings` to our yaml test runner for those cases:
warnings declared this way are "allowed" but not "required".
Blocks #52959
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.
This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.
Closes#51915
Backport of #51991
Today we use the translog_generation of the safe commit as the minimum
required translog generation for recovery. This approach has a
limitation, where we won't be able to clean up translog unless we flush.
Reopening an already recovered engine will create a new empty translog,
and we leave it there until we force flush.
This commit removes the translog_generation commit tag and uses the
local checkpoint of the safe commit to calculate the minimum required
translog generation for recovery instead.
Closes#49970
* Smarter copying of the rest specs and tests (#52114)
This PR addresses the unnecessary copying of the rest specs and allows
for better semantics for which specs and tests are copied. By default
the rest specs will get copied if the project applies
`elasticsearch.standalone-rest-test` or `esplugin` and the project
has rest tests or you configure the custom extension `restResources`.
This PR also removes the need for dozens of places where the x-pack
specs were copied by supporting copying of the x-pack rest specs too.
The plugin/task introduced here can also copy the rest tests to the
local project through a similar configuration.
The new plugin/task allows a user to minimize the surface area of
which rest specs are copied. Per project can be configured to include
only a subset of the specs (or tests). Configuring a project to only
copy the specs when actually needed should help with build cache hit
rates since we can better define what is actually in use.
However, project level optimizations for build cache hit rates are
not included with this PR.
Also, with this PR you can no longer use the includePackaged flag on
integTest task.
The following items are included in this PR:
* new plugin: `elasticsearch.rest-resources`
* new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy
* new extension 'restResources'
```
restResources {
restApi {
includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar
includeXpack 'baz' //will include x-pack specs that start with baz
}
restTests {
includeCore 'foo', 'bar' //will include the core tests that start with foo and bar
includeXpack 'baz' //will include the x-pack tests that start with baz
}
}
```
This commit reinstates the following params in the rest specs:
1. "analyzer" in delete_by_query
2. "ccs_minimize_roundtrips" in msearch_template
3. "ccs_minimize_roundtrips" in search_template
All appear to be valid options that seem to have been inadvertantly removed
between 7.3 and 7.4.
Fixeselastic/elasticsearch#47768
When `date_histogram` attempts to optimize itself it for a particular
time zone it checks to see if the entire shard is within the same
"transition". Most time zone transition once every size months or
thereabouts so the optimization can usually kicks in.
*But* it crashes when you attempt feed it a time zone who's last DST
transition was before epoch. The reason for this is a little twisted:
before this patch it'd find the next and previous transitions in
milliseconds since epoch. Then it'd cast them to `Long`s and pass them
into the `DateFieldType` to check if the shard's contents were within
the range. The trouble is they are then converted to `String`s which are
*then* parsed back to `Instant`s which are then convertd to `long`s. And
the parser doesn't like most negative numbers. And everything before
epoch is negative.
This change removes the
`long` -> `Long` -> `String` -> `Instant` -> `long` chain in favor of
passing the `long` -> `Instant` -> `long` which avoids the fairly complex
parsing code and handles a bunch of interesting edge cases around
epoch. And other edge cases around `date_nanos`.
Closes#50265
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
* Time parameter includes description
In option enumeration causing codegenerators to pick up the description
as a value to send.
* cat.shards missing ending quotes
(cherry picked from commit 1c3b341960e3b70555927bdbab325d26382f68b2)
This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.
This change is also important for #49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.
When the `rare_terms` aggregation contained another aggregation it'd
break them. Most of the time. This happened because the process that it
uses to remove buckets that turn out not to be rare was incorrectly
merging results from multiple leaves. This'd cause array index out of
bounds issues. We didn't catch it in the test because the issue doesn't
happen on the very first bucket. And the tests generated data in such a
way that the first bucket always contained the rare terms. Randomizing
the order of the generated data fixed the test so it caught the issue.
Closes#51020
This patch supplements #51792 and #51535 where the type of the "slices" parameter has been fixed.
(cherry picked from commit 2ed9e95100474f3dfbeb7efb0529e237b8f61e53)
The previous patch in c1d9966d35d incorrectly set the `type` to `number|auto`,
which is incorrect — the "polymorphic" type, denoted with the `|` sign,
should contain only other types, ie. number, string, bool, etc.
Fixes#51535
(cherry picked from commit 68db7fc611622ca0e418f454249e376e01f80587)
* REST: Test: Fix the `accept_enterprise` parameter for Get License API (#51527)
The Get License API specifies the `accept_enterprise` parameter as a `boolean`:
0ca5cb8cb6/x-pack/plugin/src/test/resources/rest-api-spec/api/license.get.json (L22-L27)
In the test, a `string` is passed however, which makes the test compilation fail in the Go client.
(cherry picked from commit e2a2169b3d44592057c143253bb56375ed3e4268)
* Fix the SQL API documentation in REST specification (#51534)
This patch fixes the SQL REST API documentation to conform to the current schema.
(cherry picked from commit c8b6a849852699883086a6ada42279f2f68d7e07)
* Fix the "slices" parameter for the Delete By Query API in the REST specification (#51535)
This patch updates the `type` parameter in the Delete By Query API: according to
[the documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-slice),
it can be set to "auto", but the type in the documentation allows only numerical values.
This prevents people from setting the parameter to "auto" eg. in the Go client,
which generates source from the specification, and sets the corresponding Go
type as number.
The patch uses the `|` notation, which we have discussed previously for encoding
a "polymorphic" parameter like this.
Related: https://github.com/elastic/go-elasticsearch/issues/77
* Fix the Enrich API documentation in REST specification (#51528)
This patch fixes the REST API documentation for the Enrich APIs to conform to the current schema.
(cherry picked from commit 59f28f4f2feeba3f6d2f0b632410577eacb28121)
We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
This is a redo of #50873 with more integration tests.
This reverts commit d114c9db3e1d1a766f9f48f846eed0466125ce83.
We've been parsing the `time_zone` parameter on `date_hitogram` for a
while but it hasn't *done* anything. This wires it up.
Closes#45199
Inspired by #45200
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
Adding back accidentally removed jvm option that is required to enforce
start of the week = Monday in IsoCalendarDataProvider.
Adding a `feature` to yml test in order to skip running it in JDK8
commit that removed it 398c802
commit that backports SystemJvmOptions c4fbda3
relates 7.x backport of code that enforces CalendarDataProvider use #48349
When you declare an ObjectParser with top level named objects like we do
with `significant_terms` we didn't support "did you mean". This fixes
that.
Relates #50938
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
When a composite aggregation is reduced using the results from an
index that has one of the fields unmapped we were throwing away the
formatter. This is mildly annoying, except in the case of IP addresses
which were coming out as non-utf-8-characters. And tripping assertions.
This carefully preserves the formatter from the working bucket.
Closes#50600
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.
This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.
Closes#48757
Previously, the following situation would throw an error:
* A search contains a `collapse` on a particular field.
* The search spans multiple indices, and in one index the field is mapped as a
concrete field, but in another it is a field alias.
The error occurs when we attempt to merge `CollapseTopFieldDocs` across shards.
When merging, we validate that the name of the collapse field is the same across
shards. But the name has already been resolved to the concrete field name, so it
will be different on shards where the field was mapped as an alias vs. shards
where it was a concrete field.
This PR updates the collapse field name in `CollapseTopFieldDocs` to the
original requested field, so that it will always be consistent across shards.
Note that in #32648, we already made a fix around collapsing on field aliases.
However, we didn't test this specific scenario where the field was mapped as an
alias in only one of the indices being searched.
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.
In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner
arrays or objects,
- the metadata can't have more than 5 keys in total.
Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.
Here is how the meta might look like in mappings:
```json
{
"properties": {
"latency": {
"type": "long",
"meta": {
"unit": "ms"
}
}
}
}
```
And then in the field capabilities response:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms" ]
}
}
}
}
```
When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms", "ns" ]
}
}
}
}
```
Closes#33267
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.
Closes#49595
The cat nodes API performs a `ClusterStateAction` then a `NodesInfoAction`.
Today it accepts the `?local` parameter and passes this to the
`ClusterStateAction` but this parameter has no effect on the `NodesInfoAction`.
This is surprising, because `GET _cat/nodes?local` looks like it might be a
completely local call but in fact it still depends on every node in the
cluster.
This commit deprecates the `?local` parameter on this API so that it can be
removed in 8.0.
Relates #50088
Multiple version ranges are allowed to be used in section skip in yml
tests. This is useful when a bugfix was backported to latest versions
and all previous releases contain a wire breaking bug.
examples:
6.1.0 - 6.3.0, 6.6.0 - 6.7.9, 7.0 -
- 7.2, 8.0.0 -
backport #50014
Adds `GET /_script_language` to support Kibana dynamic scripting
language selection.
Response contains whether `inline` and/or `stored` scripts are
enabled as determined by the `script.allowed_types` settings.
For each scripting language registered, such as `painless`,
`expression`, `mustache` or custom, available contexts for the language
are included as determined by the `script.allowed_contexts` setting.
Response format:
```
{
"types_allowed": [
"inline",
"stored"
],
"language_contexts": [
{
"language": "expression",
"contexts": [
"aggregation_selector",
"aggs"
...
]
},
{
"language": "painless",
"contexts": [
"aggregation_selector",
"aggs",
"aggs_combine",
...
]
}
...
]
}
```
Fixes: #49463
**Backport**
This change adds a dynamic cluster setting named `indices.id_field_data.enabled`.
When set to `false` any attempt to load the fielddata for the `_id` field will fail
with an exception. The default value in this change is set to `false` in order to prevent
fielddata usage on this field for future versions but it will be set to `true` when backporting
to 7x. When the setting is set to true (manually or by default in 7x) the loading will also issue
a deprecation warning since we want to disallow fielddata entirely when https://github.com/elastic/elasticsearch/issues/26472
is implemented.
Closes#43599
JavaDateFormatter should keep the pattern with the prefixed 8 as it will be used for serialisation. The stripped pattern should be used for the enclosed formatters.
closes#48698
Adding support for the `search_type` request parameter to the Ranking Evaluation
API since this parameter can impact the ranking and the metric score and should
be choosen in the same way when evaluating the search as later in the real
search.
Closes#48503
Adds `GET /_script_context`, returning a `contexts` object with each
available context as a key whose value is an empty object. eg.
```
{
"contexts": {
"aggregation_selector": {},
"aggs": {},
"aggs_combine": {},
...
}
}
```
refs: #47411
* Bwc testclusters all (#46265)
Convert all bwc projects to testclusters
* Fix bwc versions config
* WIP fix rolling upgrade
* Fix bwc tests on old versions
* Fix rolling upgrade
Changes auto-id index requests to use optype CREATE, making it compliant with our docs.
This will also make these auto-id index requests compatible with the new "create-doc" index
privilege (which is based on the optype), the default optype is changed to create, just as it is
already documented.
Bulk requests currently do not allow adding "create" actions with auto-generated IDs.
This commit allows using the optype CREATE for append-only indexing operations. This is
mainly the user facing aspect of it.
API spec now use an object for the documentation field. _common was not updated yet. This commit updates _common.json and its corresponding parser.
Closes#46744
Co-Authored-By: Tomas Della Vedova <delvedor@users.noreply.github.com>
Follow up from #42346. Since the `methods` array is in order of
preference when calling the index API with an `{id}` we prefer to use
the `PUT` http method.
When waiting for no initializing shards we also have to wait for events
when we have more than one node in the cluster. When the primary is
started, there is a short period of time, where neither the primary nor
any of the replicas are initializing.
Closes#46535
We leave replicas unassigned until we reroute after the primary shard
starts. If a cluster health request with wait_for_no_initializing_shards
is executed before the reroute, it will return immediately although
there will be some initializing replicas. Peer recoveries of those
shards can prevent translog on the primary from trimming.
We add wait_for_events to the cluster health request so that it will
execute after the reroute.
Closes#46425