Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.
Co-authored-by: David Turner <david.turner@elastic.co>
Before #57042 the max_buckets test would consistently pass because the
request would consistently fail. In particular, the request would fail on
the data node. After #57042 it only fails on the coordinating node. When
the max_buckets test is run in a mixed version cluster it consistently
fails on *either* the data node or the coordinating node. Except when
the coordinating node is missing #43095. In that case if the one data
node has #57042 and one does not, *and* the one that doesn't gets the
request first, fails it as expected, and then the coordinating node
retries the request on the node with #57042. When that happens the
request fails mysteriously with "partial shard failures" as the error
message but not partial failures reported. This is *exactly* the bug
fixed in #43095.
This updates the test to be skipped in mixed version clusters without
#43095 because they *sometimes* fail the test spuriously. The request
fails in those cases, just like we expect, but with a mysterious error
message.
Closes#57657
We keep a static list of meta-fields: META_FIELDS_BEFORE_7_8
as it was before.
This is done to ensure the backwards compatability with pre 7.8 nodes.
Closes#57831
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.
Closes#57402
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.
Relates #56487
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.
We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.
Closes#57407
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Relates: elastic/elasticsearch#55014
This commit deprecates the local param in get_mapping.json.
This parameter is a no-op and field mappings are always retrieved locally.
(cherry picked from commit 0b041cccd894f01d723fb2979f70c1cf279700a6)
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
Fixes for the REST specification specific to 7.x
* remove ignore "cat.thread_pool.json" and add the "" as valid option. #55984 deprecated this field since it these params here have no effect on this specific API
* remove ignore "indices.put_mapping.json" by adding the required / in the path to pass validation.
Changes:
* Adds API reference docs for the delete snapshot repo API.
* Corrects an error in the delete snapshot repo API spec. Comma-separated
repository names are not supported.
* Relocates the existing delete snapshot repo API example docs.
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.
Part of #56487
In KeystoreWrapper class we determine if the error to decrypt a
given keystore is caused by a wrong password based on the exception
that the SunJCE implementation of AES is
throwing(AEADBadTagException). Other implementations from other
Security Providers fail with a different exception and as such we
cannot differentiate between a corrupted file and a wrong password
in a foolproof way.
As in other tests such as in
KeyStoreWrapperTests#testDecryptKeyStoreWithWrongPassword
we handle this by matching both possible exception messages.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
This commit allows the JSON schema's documentation.url property to have a null value.
This can useful for cases where a feature is under development, and does not have
documentation published yet.
This commit also adds a documentation.url for two ml resources.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
`auto_date_histogram` was returning the incorrect `interval` because
of a combination of two things:
1. When pipeline aggregations rewrote `auto_date_histogram` we reset the
interval to 1. Oops. Fixed that.
2. *Every* bucket aggregation was rewriting its buckets as though there
was a pipeline aggregation even if there aren't any. This is a bit
silly so we skip that too.
Closes#56116
Only run the tests verifyin the overlapping index templates when there is
no `global` index template (ie. when the default shards are not changed)
(cherry picked from commit e256becad7650018ed6687d6f4ddba5e255f6b29)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This removed the specification of `order` as it is not a parameter of the
v2 put template api (the priority is the equivalent of `order` and is
defined in the body) and add a bit of description for the `cause` parameter
(which is currently used as a cluster update task tracking)
(cherry picked from commit e3e9782b2059e28bc4a08be2232c1e5baecad3d6)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a new api to simulate matching the given index name against the
index templates in the system.
The syntax for the new API takes the following form:
POST _index_template/_simulate_index/{index_name}
{
"index_patterns": ["logs-*"],
"priority": 15,
"template": {
"settings": {
"number_of_shards": 3
}
...
}
}
Where the body is optional, but we support the entire body used by the
PUT _index_template/{name} api. When the body is specified we'll simulate
matching the given index against a system that'd have the given index
template together with the index templates that exist in the system.
The response, in both cases, will return the matching template's resolved
settings, mappings and aliases, together with a special field that'll print any
overlapping templates and their corresponding index patterns.
(cherry picked from commit 1a5845edce1f445c58e094e9a3b6792e21e543b0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.
The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.
An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.
Closes#54314
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover
These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.
Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.
This also adds support for sending this parameter to the High Level Rest Client.
Relates to #53101
Backport from: #54726
The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.
In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.
Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).
Relates to #53100
* Add ValuesSource Registry and associated logic (#54281)
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)
* ValuesSourceRegistry Prototype (#48758)
* Remove generics from ValuesSource related classes (#49606)
* fix percentile aggregation tests (#50712)
* Basic thread safety for ValuesSourceRegistry (#50340)
* Remove target value type from ValuesSourceAggregationBuilder (#49943)
* Cleanup default values source type (#50992)
* CoreValuesSourceType no longer implements Writable (#51276)
* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)
* Put values source types on fields (#51503)
* Remove VST Any (#51539)
* Rewire terms agg to use new VS registry (#51182)
Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)
* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)
* Wire Percentiles aggregator into new VS framework (#51639)
This required a bit of a refactor to percentiles itself. Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory. This doesn't work (or at least, would be
difficult) in the new VS framework.
This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory. This object
is then used when deciding which kind of aggregator to create
Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647)
* fix checkstyle after merge (#52008)
* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)
* Convert RareTerms to new VS registry (#52166)
* Wire up Value Count (#52225)
* Wire up Max & Min aggregations (#52219)
* ValuesSource refactoring: Wire up Sum aggregation (#52571)
* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)
* Soft immutability for VSConfig (#52729)
* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)
Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)
* VS refactoring: Wire up stats aggregation (#52891)
* ValuesSource refactoring: Wire up string_stats aggregation (#52875)
* VS refactoring: Wire up median (MAD) aggregation (#52945)
* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
this commit implements `getValuesSourceType` for
the ConstantKeyword field type.
master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.
* ValuesSource refactoring: Wire up Avg aggregation (#52752)
* Wire PercentileRanks aggregator into new VS framework (#51693)
* Add a VSConfig resolver for aggregations not using the registry (#53038)
* Vs refactor wire up ranges and date ranges (#52918)
* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)
This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry
relates #42949.
* VS refactoring: convert Boxplot to new registry (#53132)
* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)
This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry
relates to the values-source refactoring meta issue #42949.
* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)
This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.
relates to the values-source refactoring meta issue #42949.
* Fix type tests for Missing aggregation (#53501)
* ValuesSource Refactor: move histo VSType into XPack module (#53298)
- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs
* Wire up DateHistogram to the ValuesSourceRegistry (#53484)
* Vs refactor parser cleanup (#53198)
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* First batch of easy fixes
* Remove List.of from ValuesSourceRegistry
Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.
* More compiler fixes
* More compiler fixes
* More compiler fixes
* Precommit is happy and so am I
* Add new Core VSTs to tests
* Disabled supported type test on SigTerms until we can backport it's fix
* fix checkstyle
* Fix test failure from semantic merge issue
* Fix some metaData->metadata replacements that got lost
* Fix list of supported types for MinAggregator
* Fix list of supported types for Avg
* remove unused import
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
Modify the value of nowInMillis in queryShardContext to current timestamp, because the
value will be used lately when validating the filtered alias which uses now in a date_nanos
range query.
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.
Closes#50990
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:
- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.
This PR deprecates the parameter from rest layer. It will be removed in next major version.
* HLRC support for Index Templates V2 (#54838)
* HLRC support for Index Templates V2
This change adds High Level Rest Client support for Index Templates V2.
Relates to #53101
* fixed compilation error
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
These tests do CRUD for component templates, however, for 7.7 some changes weren't backported in the
`_doc` wrapping/unwrapping done for the APIs, this can cause test failures.
This bumps the minimum version for these tests to 7.8, which is okay because component templates are
hidden behind a flag and have no compatibility guarantees for 7.7.
Relates to #53101
We occasionally add a global template for our YAML tests, and this can cause warnings for these
template tests. This commit adds these warnings so they don't cause test failures.
Resolves#54822
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This commit introduces a new `geo` module that is intended
to be contain all the geo-spatial-specific features in server.
As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.
Co-authored-by: Nicholas Knize <nknize@gmail.com>
This replaces the last bit of validation that pipeline aggregations
performed on the data nodes with explicit checks in a few
`PipelineAggregationBuilders`. We were *already* catching these
validation errors for pipeline aggregations that require that their
parent be squentially ordered. This just adds validation for pipelines
that require *any* parent like `bucket_selector` and `bucket_sort`.
There were some failures on 7.x of field collapse tests,
where total hits count was less then expected.
This adds an additional test to check total hits count
before field collapse queries to understand if the problem
is with field collapsing or with simply that writes have
not been finished yet
Relates to #52416
Today when canceling a task we broadcast ban/unban requests to all nodes
in the cluster. This strategy does not scale well for hierarchical
cancellation. With this change, we will track outstanding child requests
and broadcast the cancellation to only nodes that have outstanding child
tasks. This change also prevents a parent task from sending child
requests once it got canceled.
Relates #50990
Supersedes #51157
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
* Use V2 index templates during index creation
This commit changes our index creation code to use (and favor!) V2 index templates during index
creation. The creation precedence goes like so, in order of precedence:
- Existing source `IndexMetadata` - for example, when recovering from a peer or a shrink/split/clone
where index templates should not be applied
- A matching V2 index template, if one is found
- When a V2 template is found, all component templates (in the `composed_of` field) are applied
in the order that they appear, with the index template having the 2nd highest precedence (the
create index request always has the top priority when it comes to index settings)
- All matching V1 templates (the old style)
This also adds index template validation when `PUT`-ing a new v2 index template (because this was
required) and ensures that all index and component templates specify *no* top-level mapping type (it
is automatically added when the template is added to the cluster state).
This does not yet implement fine-grained component template merging of mappings, where we favor
merging only a single field's configuration, that will be done in subsequent work.
This also keeps the existing hidden index behavior present for v1 templates, where a hidden index
will match v2 index templates unless they are global (`*`) templates.
Relates to #53101
- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo
The unified config method carries both the method and algo-specific
setting. This provides a mechanism to reject settings that apply
to the wrong algorithm. For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
This commit updates the rest API specs to validate against a
JSON schema for the specifications. Most updates are to add
a description, whilst others fix typos and unify conventions
e.g. deprecations, descriptions, urls starting with /. The schema
conforms to draft-07 JSON schema.
(cherry picked from commit da37e01d32f9764c3937736ef0c7d3ab40af9a77)
Tests for unmapped fields, the missing parameter, scripting, and correct
ValuesSource types in MissingAggregatorTests. Basic yaml tests for the
missing agg
For #42949
There is a setting in `ESClientYamlSuiteTestCase` under `usually()` that can install a `global`
template changing the number of shards for all indices. This can cause warnings when installing v2
templates (see #54367). This adds these as optional warnings so they don't cause failures regardless
of whether the global template is installed or not.
These warnings can be removed when our internal template usage has been moved to index templates v2
Relates to #53101
This fixes a serialization bug in `auto_date_histogram` that comes up in
a cluster mixed between pre-7.3.0 and post-7.3.0.
Includes #54429 to keep 7.x looking like master for simpler backports.
Closes#54382
Pipeline aggregations like `stats_bucket`, `sum_bucket`, and
`percentiles_bucket` only operate on buckets that have multiple buckets.
This adds support for those aggregations to `geo_distance`, `ip_range`,
`auto_date_histogram`, and `rare_terms`.
This all happened because we used a marker interface to mark compatible
aggs, `MultiBucketAggregationBuilder` and it was fairly easy to forget
to implement the interface.
This replaces the marker interface with an abstract method in
`AggregationBuilder`, `bucketCardinality` which makes you return `NONE`,
`ONE`, or `MANY`. The `bucket` aggregations can check for `MANY`. At
this point `ONE` and `NONE` amount to about the same thing, but I
suspect that'll be a useful distinction when validating bucket sorts.
Closes#53215
This commit ensures that node roles are sorted by node role name, which
makes the output easier to consume, and also makes it easier to rely on
the behavior of the output in assertions.
* Add REST APIs for IndexTemplateV2Metadata CRUD (#54039)
* Add REST APIs for IndexTemplateV2Metadata CRUD
This commit adds the get/put/delete APIs for interacting with the now v2 versions of index
templates.
These APIs are behind the existing `es.itv2_feature_flag_registered` system property feature flag.
Relates to #53101
* Add exceptions for HLRC tests
* Add skips for 7.x versions
* Use index_template instead of template_v2 in action names
* Add test for MetaDataIndexTemplateService.addIndexTemplateV2
* Move removal to static method and add test
* Add unit tests for request classes (implement hashCode & equals)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Fix compilation
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Document known features on rest-api-spec tests
Features dictate wheter a `rest-api-spec` test runner can execute a
test. This PR documents all the know features in the java implementation
of the runner.
* Apply suggestions from code review
Co-Authored-By: Luca Cavanna <javanna@users.noreply.github.com>
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
(cherry picked from commit fbe173723d0ad7ebb920cda855bce3fb758b04a6)
This commit changes the pre_filter_shard_size default from 128 to unspecified.
This allows to apply heuristics based on the request and the target indices when deciding
whether the can match phase should run or not. When unspecified, this pr runs the can match phase
automatically if one of these conditions is met:
* The request targets more than 128 shards.
* The request contains read-only indices.
* The primary sort of the query targets an indexed field.
Users can opt-out from this behavior by setting the `pre_filter_shard_size` to a static value.
Closes#39835