When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.
Closes#57402
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.
Relates #56487
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.
We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.
Closes#57407
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Relates: elastic/elasticsearch#55014
This commit deprecates the local param in get_mapping.json.
This parameter is a no-op and field mappings are always retrieved locally.
(cherry picked from commit 0b041cccd894f01d723fb2979f70c1cf279700a6)
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
Fixes for the REST specification specific to 7.x
* remove ignore "cat.thread_pool.json" and add the "" as valid option. #55984 deprecated this field since it these params here have no effect on this specific API
* remove ignore "indices.put_mapping.json" by adding the required / in the path to pass validation.
Changes:
* Adds API reference docs for the delete snapshot repo API.
* Corrects an error in the delete snapshot repo API spec. Comma-separated
repository names are not supported.
* Relocates the existing delete snapshot repo API example docs.
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.
Part of #56487
In KeystoreWrapper class we determine if the error to decrypt a
given keystore is caused by a wrong password based on the exception
that the SunJCE implementation of AES is
throwing(AEADBadTagException). Other implementations from other
Security Providers fail with a different exception and as such we
cannot differentiate between a corrupted file and a wrong password
in a foolproof way.
As in other tests such as in
KeyStoreWrapperTests#testDecryptKeyStoreWithWrongPassword
we handle this by matching both possible exception messages.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
This commit allows the JSON schema's documentation.url property to have a null value.
This can useful for cases where a feature is under development, and does not have
documentation published yet.
This commit also adds a documentation.url for two ml resources.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
`auto_date_histogram` was returning the incorrect `interval` because
of a combination of two things:
1. When pipeline aggregations rewrote `auto_date_histogram` we reset the
interval to 1. Oops. Fixed that.
2. *Every* bucket aggregation was rewriting its buckets as though there
was a pipeline aggregation even if there aren't any. This is a bit
silly so we skip that too.
Closes#56116
Only run the tests verifyin the overlapping index templates when there is
no `global` index template (ie. when the default shards are not changed)
(cherry picked from commit e256becad7650018ed6687d6f4ddba5e255f6b29)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This removed the specification of `order` as it is not a parameter of the
v2 put template api (the priority is the equivalent of `order` and is
defined in the body) and add a bit of description for the `cause` parameter
(which is currently used as a cluster update task tracking)
(cherry picked from commit e3e9782b2059e28bc4a08be2232c1e5baecad3d6)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a new api to simulate matching the given index name against the
index templates in the system.
The syntax for the new API takes the following form:
POST _index_template/_simulate_index/{index_name}
{
"index_patterns": ["logs-*"],
"priority": 15,
"template": {
"settings": {
"number_of_shards": 3
}
...
}
}
Where the body is optional, but we support the entire body used by the
PUT _index_template/{name} api. When the body is specified we'll simulate
matching the given index against a system that'd have the given index
template together with the index templates that exist in the system.
The response, in both cases, will return the matching template's resolved
settings, mappings and aliases, together with a special field that'll print any
overlapping templates and their corresponding index patterns.
(cherry picked from commit 1a5845edce1f445c58e094e9a3b6792e21e543b0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.
The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.
An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.
Closes#54314
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover
These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.
Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.
This also adds support for sending this parameter to the High Level Rest Client.
Relates to #53101
Backport from: #54726
The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.
In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.
Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).
Relates to #53100
* Add ValuesSource Registry and associated logic (#54281)
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)
* ValuesSourceRegistry Prototype (#48758)
* Remove generics from ValuesSource related classes (#49606)
* fix percentile aggregation tests (#50712)
* Basic thread safety for ValuesSourceRegistry (#50340)
* Remove target value type from ValuesSourceAggregationBuilder (#49943)
* Cleanup default values source type (#50992)
* CoreValuesSourceType no longer implements Writable (#51276)
* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)
* Put values source types on fields (#51503)
* Remove VST Any (#51539)
* Rewire terms agg to use new VS registry (#51182)
Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)
* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)
* Wire Percentiles aggregator into new VS framework (#51639)
This required a bit of a refactor to percentiles itself. Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory. This doesn't work (or at least, would be
difficult) in the new VS framework.
This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory. This object
is then used when deciding which kind of aggregator to create
Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647)
* fix checkstyle after merge (#52008)
* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)
* Convert RareTerms to new VS registry (#52166)
* Wire up Value Count (#52225)
* Wire up Max & Min aggregations (#52219)
* ValuesSource refactoring: Wire up Sum aggregation (#52571)
* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)
* Soft immutability for VSConfig (#52729)
* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)
Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)
* VS refactoring: Wire up stats aggregation (#52891)
* ValuesSource refactoring: Wire up string_stats aggregation (#52875)
* VS refactoring: Wire up median (MAD) aggregation (#52945)
* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
this commit implements `getValuesSourceType` for
the ConstantKeyword field type.
master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.
* ValuesSource refactoring: Wire up Avg aggregation (#52752)
* Wire PercentileRanks aggregator into new VS framework (#51693)
* Add a VSConfig resolver for aggregations not using the registry (#53038)
* Vs refactor wire up ranges and date ranges (#52918)
* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)
This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry
relates #42949.
* VS refactoring: convert Boxplot to new registry (#53132)
* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)
This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry
relates to the values-source refactoring meta issue #42949.
* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)
This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.
relates to the values-source refactoring meta issue #42949.
* Fix type tests for Missing aggregation (#53501)
* ValuesSource Refactor: move histo VSType into XPack module (#53298)
- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs
* Wire up DateHistogram to the ValuesSourceRegistry (#53484)
* Vs refactor parser cleanup (#53198)
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* First batch of easy fixes
* Remove List.of from ValuesSourceRegistry
Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.
* More compiler fixes
* More compiler fixes
* More compiler fixes
* Precommit is happy and so am I
* Add new Core VSTs to tests
* Disabled supported type test on SigTerms until we can backport it's fix
* fix checkstyle
* Fix test failure from semantic merge issue
* Fix some metaData->metadata replacements that got lost
* Fix list of supported types for MinAggregator
* Fix list of supported types for Avg
* remove unused import
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
Modify the value of nowInMillis in queryShardContext to current timestamp, because the
value will be used lately when validating the filtered alias which uses now in a date_nanos
range query.
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.
Closes#50990
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:
- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.
This PR deprecates the parameter from rest layer. It will be removed in next major version.
* HLRC support for Index Templates V2 (#54838)
* HLRC support for Index Templates V2
This change adds High Level Rest Client support for Index Templates V2.
Relates to #53101
* fixed compilation error
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
These tests do CRUD for component templates, however, for 7.7 some changes weren't backported in the
`_doc` wrapping/unwrapping done for the APIs, this can cause test failures.
This bumps the minimum version for these tests to 7.8, which is okay because component templates are
hidden behind a flag and have no compatibility guarantees for 7.7.
Relates to #53101
We occasionally add a global template for our YAML tests, and this can cause warnings for these
template tests. This commit adds these warnings so they don't cause test failures.
Resolves#54822
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit introduces a new `geo` module that is intended
to be contain all the geo-spatial-specific features in server.
As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.
Co-authored-by: Nicholas Knize <nknize@gmail.com>
This replaces the last bit of validation that pipeline aggregations
performed on the data nodes with explicit checks in a few
`PipelineAggregationBuilders`. We were *already* catching these
validation errors for pipeline aggregations that require that their
parent be squentially ordered. This just adds validation for pipelines
that require *any* parent like `bucket_selector` and `bucket_sort`.
There were some failures on 7.x of field collapse tests,
where total hits count was less then expected.
This adds an additional test to check total hits count
before field collapse queries to understand if the problem
is with field collapsing or with simply that writes have
not been finished yet
Relates to #52416
Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.
Relates #50990
Supersedes #51157
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
* Use V2 index templates during index creation
This commit changes our index creation code to use (and favor!) V2 index templates during index
creation. The creation precedence goes like so, in order of precedence:
- Existing source `IndexMetadata` - for example, when recovering from a peer or a shrink/split/clone
where index templates should not be applied
- A matching V2 index template, if one is found
- When a V2 template is found, all component templates (in the `composed_of` field) are applied
in the order that they appear, with the index template having the 2nd highest precedence (the
create index request always has the top priority when it comes to index settings)
- All matching V1 templates (the old style)
This also adds index template validation when `PUT`-ing a new v2 index template (because this was
required) and ensures that all index and component templates specify *no* top-level mapping type (it
is automatically added when the template is added to the cluster state).
This does not yet implement fine-grained component template merging of mappings, where we favor
merging only a single field's configuration, that will be done in subsequent work.
This also keeps the existing hidden index behavior present for v1 templates, where a hidden index
will match v2 index templates unless they are global (`*`) templates.
Relates to #53101
- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo
The unified config method carries both the method and algo-specific
setting. This provides a mechanism to reject settings that apply
to the wrong algorithm. For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
This commit updates the rest API specs to validate against a
JSON schema for the specifications. Most updates are to add
a description, whilst others fix typos and unify conventions
e.g. deprecations, descriptions, urls starting with /. The schema
conforms to draft-07 JSON schema.
(cherry picked from commit da37e01d32f9764c3937736ef0c7d3ab40af9a77)
Tests for unmapped fields, the missing parameter, scripting, and correct
ValuesSource types in MissingAggregatorTests. Basic yaml tests for the
missing agg
For #42949
There is a setting in `ESClientYamlSuiteTestCase` under `usually()` that can install a `global`
template changing the number of shards for all indices. This can cause warnings when installing v2
templates (see #54367). This adds these as optional warnings so they don't cause failures regardless
of whether the global template is installed or not.
These warnings can be removed when our internal template usage has been moved to index templates v2
Relates to #53101
This fixes a serialization bug in `auto_date_histogram` that comes up in
a cluster mixed between pre-7.3.0 and post-7.3.0.
Includes #54429 to keep 7.x looking like master for simpler backports.
Closes#54382
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.
This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.
This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.
Closes#53215
This commit ensures that node roles are sorted by node role name, which
makes the output easier to consume, and also makes it easier to rely on
the behavior of the output in assertions.
* Add REST APIs for IndexTemplateV2Metadata CRUD (#54039)
* Add REST APIs for IndexTemplateV2Metadata CRUD
This commit adds the get/put/delete APIs for interacting with the now v2 versions of index
templates.
These APIs are behind the existing `es.itv2_feature_flag_registered` system property feature flag.
Relates to #53101
* Add exceptions for HLRC tests
* Add skips for 7.x versions
* Use index_template instead of template_v2 in action names
* Add test for MetaDataIndexTemplateService.addIndexTemplateV2
* Move removal to static method and add test
* Add unit tests for request classes (implement hashCode & equals)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Fix compilation
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Document known features on rest-api-spec tests
Features dictate wheter a `rest-api-spec` test runner can execute a
test. This PR documents all the know features in the java implementation
of the runner.
* Apply suggestions from code review
Co-Authored-By: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
(cherry picked from commit fbe173723d0ad7ebb920cda855bce3fb758b04a6)
This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.
Closes#39835
This commits adds a data stream feature flag, initial definition of a data stream and
the stubs for the data stream create, delete and get APIs. Also simple serialization
tests are added and a rest test to thest the data stream API stubs.
This is a large amount of code and mainly mechanical, but this commit should be
straightforward to review, because there isn't any real logic.
The data stream transport and rest action are behind the data stream feature flag and
are only intialized if the feature flag is enabled. The feature flag is enabled if
elasticsearch is build as snapshot or a release build and the
'es.datastreams_feature_flag_registered' is enabled.
The integ-test-zip sets the feature flag if building a release build, otherwise
rest tests would fail.
Relates to #53100
Downstream Elasticsearch clients, such as the Elaticsearch-JS client,
use the documentation links in our REST API JSON specifications to
create their docs.
Using a broken link or linking to yet-to-be-created doc pages can
break the docs build for these clients.
This PR adds a related note to the README for the REST API JSON Specs.
TermsLookup in master no longer accepts a type parameter. We should emit
a deprecate warning in 7.x when a terms lookup requests includes type to prepare
users for its removal.
Relates to #41059
Prior to this commit Watcher explicitly copied test between two
projects with a copy task. This commit removes the explicit copy in favor
of adding the Watcher tests to the available restResources that may be
copied between projects.
This is how inter-project dependencies should be modeled. However, only
Watcher is included here since it is (currently) the only project with
inter-project test dependencies.
* Fix feature flag setting for ComponentTemplate APIs (#53758)
The feature flag was set for *most* of the builds, but there are a couple where it was missing.
Resolves#53708
* Add skip for older versions of ES
Removes the `flat_settings` and `timeout` query parameters from the JSON
spec and asciidoc docs for the put index template API.
These parameters are not supported by the API.
This commit, built on top of #51708, allows to modify shard search requests based on informations collected on other shards. It is intended to speed up sorted queries on time-based indices. For queries that are only interested in the top documents.
This change will rewrite the shard queries to match none if the bottom sort value computed in prior shards is better than all values in the shard.
For queries that mix top documents and aggregations this change will reset the size of the top documents to 0 instead of rewriting to match none.
This means that we don't need to keep a search context open for this shard since we know in advance that it doesn't contain any competitive hit.
* Add REST API for ComponentTemplate CRUD
This adds the Put/Get/DeleteComponentTemplate APIs that allow inserting, retrieving, and removing
ComponentTemplateMetadata into the cluster state metadata.
These APIs are currently only available behind a feature flag system property -
`es.itv2_feature_flag_registered`.
Relates to #53101
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
I broke sorting aggregations by `doc_count` in #51271 by mixing up true
and false. This flips that comparison and adds a few tests to double
check that we don't so this again.
This commit adjusts the _cat/indices and _cat/aliases APIs to allow
specifying indices options, so that these APIs can handle hidden
indices/aliases in the same way as other APIs.
Also adds the hidden option to the expand_wildcards parameter
in the YAML spec for every API that accepts it.
Keyword field values with length more than ignore_above are not
indexed. But highlighters still were retrieving these values
from _source and were trying to highlight them. This sometimes lead to
errors if a field length exceeded max_analyzed_offset. But also this
is an overall wrong behaviour to attempt to highlight something that was
ignored during indexing.
This PR checks if a keyword value was ignored because of its length,
and if yes, skips highlighting it.
Backport: #53408Closes#43800
Remove excessive testing and keep only the checks for when the queries
are disallowed. Fix also the check for the initial value of the setting
to be conmbatible with Go client tests.
(cherry picked from commit 314145294ea926e069c6f8629dfc622a7f31a0fb)
It looks like `date_nanos` fields weren't likely to work properly in
composite aggs because composites iterate field values using points and
we weren't converting the points into milliseconds. Because the doc
values were coming back in milliseconds we ended up geting very confused
and just never collecting sub-aggregations.
This fixes that by adding a method to `DateFieldMapper.Resolution` to
`parsePointAsMillis` which is similarly in name and function to
`NumberFieldMapper.NumberType`'s `parsePoint` except that it normalizes
to milliseconds which is what aggs need at the moment.
Closes#53168
When an composite aggregation is run against an index with a sort that
*starts* with the "source" fields from the composite but has additional
fields it'd blow up in while trying to decide if it could use the sort.
This changes it to decide that it *can* use the sort.
Closes#52480
When we test backwards compatibility we often end up in a situation
where we *sometimes* get a warning, and sometimes don't. Like, we won't
get the warning if we're testing against an older version, but we will
in a newer one. Or we won't get the warning if the request randomly
lands on a node with an old version of the code. But we wouldn't if it
randomed into a node with newer code.
This adds `allowed_warnings` to our yaml test runner for those cases:
warnings declared this way are "allowed" but not "required".
Blocks #52959
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Computing the stats for completion fields may involve a significant amount of
work since it walks every field of every segment looking for completion fields.
Innocuous-looking APIs like `GET _stats` or `GET _cluster/stats` do this for
every shard in the cluster. This repeated work is unnecessary since these stats
do not change between refreshes; in many indices they remain constant for a
long time.
This commit introduces a cache for these stats which is invalidated on a
refresh, allowing most stats calls to bypass the work needed to compute them on
most shards.
Closes#51915
Backport of #51991
Today we use the translog_generation of the safe commit as the minimum
required translog generation for recovery. This approach has a
limitation, where we won't be able to clean up translog unless we flush.
Reopening an already recovered engine will create a new empty translog,
and we leave it there until we force flush.
This commit removes the translog_generation commit tag and uses the
local checkpoint of the safe commit to calculate the minimum required
translog generation for recovery instead.
Closes#49970
* Smarter copying of the rest specs and tests (#52114)
This PR addresses the unnecessary copying of the rest specs and allows
for better semantics for which specs and tests are copied. By default
the rest specs will get copied if the project applies
`elasticsearch.standalone-rest-test` or `esplugin` and the project
has rest tests or you configure the custom extension `restResources`.
This PR also removes the need for dozens of places where the x-pack
specs were copied by supporting copying of the x-pack rest specs too.
The plugin/task introduced here can also copy the rest tests to the
local project through a similar configuration.
The new plugin/task allows a user to minimize the surface area of
which rest specs are copied. Per project can be configured to include
only a subset of the specs (or tests). Configuring a project to only
copy the specs when actually needed should help with build cache hit
rates since we can better define what is actually in use.
However, project level optimizations for build cache hit rates are
not included with this PR.
Also, with this PR you can no longer use the includePackaged flag on
integTest task.
The following items are included in this PR:
* new plugin: `elasticsearch.rest-resources`
* new tasks: CopyRestApiTask and CopyRestTestsTask - performs the copy
* new extension 'restResources'
```
restResources {
restApi {
includeCore 'foo' , 'bar' //will include the core specs that start with foo and bar
includeXpack 'baz' //will include x-pack specs that start with baz
}
restTests {
includeCore 'foo', 'bar' //will include the core tests that start with foo and bar
includeXpack 'baz' //will include the x-pack tests that start with baz
}
}
```
This commit reinstates the following params in the rest specs:
1. "analyzer" in delete_by_query
2. "ccs_minimize_roundtrips" in msearch_template
3. "ccs_minimize_roundtrips" in search_template
All appear to be valid options that seem to have been inadvertantly removed
between 7.3 and 7.4.
Fixeselastic/elasticsearch#47768
When `date_histogram` attempts to optimize itself it for a particular
time zone it checks to see if the entire shard is within the same
"transition". Most time zone transition once every size months or
thereabouts so the optimization can usually kicks in.
*But* it crashes when you attempt feed it a time zone who's last DST
transition was before epoch. The reason for this is a little twisted:
before this patch it'd find the next and previous transitions in
milliseconds since epoch. Then it'd cast them to `Long`s and pass them
into the `DateFieldType` to check if the shard's contents were within
the range. The trouble is they are then converted to `String`s which are
*then* parsed back to `Instant`s which are then convertd to `long`s. And
the parser doesn't like most negative numbers. And everything before
epoch is negative.
This change removes the
`long` -> `Long` -> `String` -> `Instant` -> `long` chain in favor of
passing the `long` -> `Instant` -> `long` which avoids the fairly complex
parsing code and handles a bunch of interesting edge cases around
epoch. And other edge cases around `date_nanos`.
Closes#50265
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
* Time parameter includes description
In option enumeration causing codegenerators to pick up the description
as a value to send.
* cat.shards missing ending quotes
(cherry picked from commit 1c3b341960e3b70555927bdbab325d26382f68b2)
This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.
This change is also important for #49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.
When the `rare_terms` aggregation contained another aggregation it'd
break them. Most of the time. This happened because the process that it
uses to remove buckets that turn out not to be rare was incorrectly
merging results from multiple leaves. This'd cause array index out of
bounds issues. We didn't catch it in the test because the issue doesn't
happen on the very first bucket. And the tests generated data in such a
way that the first bucket always contained the rare terms. Randomizing
the order of the generated data fixed the test so it caught the issue.
Closes#51020
This patch supplements #51792 and #51535 where the type of the "slices" parameter has been fixed.
(cherry picked from commit 2ed9e95100474f3dfbeb7efb0529e237b8f61e53)
The previous patch in c1d9966d35d incorrectly set the `type` to `number|auto`,
which is incorrect — the "polymorphic" type, denoted with the `|` sign,
should contain only other types, ie. number, string, bool, etc.
Fixes#51535
(cherry picked from commit 68db7fc611622ca0e418f454249e376e01f80587)
* REST: Test: Fix the `accept_enterprise` parameter for Get License API (#51527)
The Get License API specifies the `accept_enterprise` parameter as a `boolean`:
0ca5cb8cb6/x-pack/plugin/src/test/resources/rest-api-spec/api/license.get.json (L22-L27)
In the test, a `string` is passed however, which makes the test compilation fail in the Go client.
(cherry picked from commit e2a2169b3d44592057c143253bb56375ed3e4268)
* Fix the SQL API documentation in REST specification (#51534)
This patch fixes the SQL REST API documentation to conform to the current schema.
(cherry picked from commit c8b6a849852699883086a6ada42279f2f68d7e07)
* Fix the "slices" parameter for the Delete By Query API in the REST specification (#51535)
This patch updates the `type` parameter in the Delete By Query API: according to
[the documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-delete-by-query.html#docs-delete-by-query-slice),
it can be set to "auto", but the type in the documentation allows only numerical values.
This prevents people from setting the parameter to "auto" eg. in the Go client,
which generates source from the specification, and sets the corresponding Go
type as number.
The patch uses the `|` notation, which we have discussed previously for encoding
a "polymorphic" parameter like this.
Related: https://github.com/elastic/go-elasticsearch/issues/77
* Fix the Enrich API documentation in REST specification (#51528)
This patch fixes the REST API documentation for the Enrich APIs to conform to the current schema.
(cherry picked from commit 59f28f4f2feeba3f6d2f0b632410577eacb28121)
We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
This is a redo of #50873 with more integration tests.
This reverts commit d114c9db3e1d1a766f9f48f846eed0466125ce83.
We've been parsing the `time_zone` parameter on `date_hitogram` for a
while but it hasn't *done* anything. This wires it up.
Closes#45199
Inspired by #45200
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
Adding back accidentally removed jvm option that is required to enforce
start of the week = Monday in IsoCalendarDataProvider.
Adding a `feature` to yml test in order to skip running it in JDK8
commit that removed it 398c802
commit that backports SystemJvmOptions c4fbda3
relates 7.x backport of code that enforces CalendarDataProvider use #48349
When you declare an ObjectParser with top level named objects like we do
with `significant_terms` we didn't support "did you mean". This fixes
that.
Relates #50938
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s
When a composite aggregation is reduced using the results from an
index that has one of the fields unmapped we were throwing away the
formatter. This is mildly annoying, except in the case of IP addresses
which were coming out as non-utf-8-characters. And tripping assertions.
This carefully preserves the formatter from the working bucket.
Closes#50600
Adds support for the `offset` parameter to the `date_histogram` source
of composite aggs. The `offset` parameter is supported by the normal
`date_histogram` aggregation and is useful for folks that need to
measure things from, say, 6am one day to 6am the next day.
This is implemented by creating a new `Rounding` that knows how to
handle offsets and delegates to other rounding implementations. That
implementation doesn't fully implement the `Rounding` contract, namely
`nextRoundingValue`. That method isn't used by composite aggs so I can't
be sure that any implementation that I add will be correct. I propose to
leave it throwing `UnsupportedOperationException` until I need it.
Closes#48757
Previously, the following situation would throw an error:
* A search contains a `collapse` on a particular field.
* The search spans multiple indices, and in one index the field is mapped as a
concrete field, but in another it is a field alias.
The error occurs when we attempt to merge `CollapseTopFieldDocs` across shards.
When merging, we validate that the name of the collapse field is the same across
shards. But the name has already been resolved to the concrete field name, so it
will be different on shards where the field was mapped as an alias vs. shards
where it was a concrete field.
This PR updates the collapse field name in `CollapseTopFieldDocs` to the
original requested field, so that it will always be consistent across shards.
Note that in #32648, we already made a fix around collapsing on field aliases.
However, we didn't test this specific scenario where the field was mapped as an
alias in only one of the indices being searched.
This PR adds per-field metadata that can be set in the mappings and is later
returned by the field capabilities API. This metadata is completely opaque to
Elasticsearch but may be used by tools that index data in Elasticsearch to
communicate metadata about fields with tools that then search this data. A
typical example that has been requested in the past is the ability to attach
a unit to a numeric field.
In order to not bloat the cluster state, Elasticsearch requires that this
metadata be small:
- keys can't be longer than 20 chars,
- values can only be numbers or strings of no more than 50 chars - no inner
arrays or objects,
- the metadata can't have more than 5 keys in total.
Given that metadata is opaque to Elasticsearch, field capabilities don't try to
do anything smart when merging metadata about multiple indices, the union of
all field metadatas is returned.
Here is how the meta might look like in mappings:
```json
{
"properties": {
"latency": {
"type": "long",
"meta": {
"unit": "ms"
}
}
}
}
```
And then in the field capabilities response:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms" ]
}
}
}
}
```
When there are no conflicts, values are arrays of size 1, but when there are
conflicts, Elasticsearch includes all unique values in this array, without
giving ways to know which index has which metadata value:
```json
{
"latency": {
"long": {
"searchable": true,
"aggreggatable": true,
"meta": {
"unit": [ "ms", "ns" ]
}
}
}
}
```
Closes#33267
This intervals source will return terms that are similar to an input term, up to
an edit distance defined by fuzziness, similar to FuzzyQuery.
Closes#49595