Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
A number of obvious possible simplifications that also improve efficiency
in some cases (better empty collection handling and size hint use).
Also, added a shortcut for writing and reading immutable open maps that
can be used to dry up additional spots.
Backport of #58898.
Part of #48366. Now that there is a dedicated API for dangling indices, the auto-import
behaviour can default to off. Also add a note to the breaking changes for 7.9.0.
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.
This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.
Relates #50999
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
Today `GET _cat/shards` requests the nodes, routing table, and metadata
from the cluster state, but it does not use any information from the
metadata portion of the response. Metadata includes things like mappings
and templates that may be substantial in size.
This commit drops the unnecessary metadata portion of this cluster state
request.
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.
Backport of #59038
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations
See updated JavaDoc and added test cases for more details and illustration on the functionality.
Some notes:
The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.
This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.
Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
With parallel snapshots incoming (but also in isolation) it makes sense to clean up
`SnapshotsInProgress` construction.
We don't need to pre-compute the waiting shards for every entry. We rarely use this information
(only on routing changes) and in the one spot we did we now simply spent the extra cycles for looping
over all shards instead of just the waiting ones once per routing change tops instead of on every change
to `SnapshotsInProgress` (moreover, we would burn the cycles for looping on all nodes even though only the
current master cares about the information).
In addition to that change I removed some dead code constructors and slighly optimized deserialization.
* Use consistent cluster state instead in state update
* Remove dead loop in tests
* Remove some dead exception ctors
Just three trivial/random things I found.
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).
Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
This commit increases the default write queue size to 10000. This is to
allow a greater number of pending indexing requests. This work is safe
as we have added additional memory limits. Relates to #59263.
The update by query action parses a script from an object (map or string). We will need to do the same for runtime fields as they are parsed as part of mappings (#59391).
This commit moves the existing parsing of a script from an object from RestUpdateByQueryAction to the Script class. It also adds tests and adjusts some error messages that are incorrect. Also, options were not parsed before and they are now. And unsupported fields trigger now a deprecation warning.
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Relates to #58680. Bugs like that should not only show up in logs
but ideally also get caught in tests. We expect to never see exceptions
in these two spots.
This change removes the redundant submitting of two separate cluster state updates
for the node configuration changes and routing changes that affect snapshots.
Since we submitted the task to deal with node configuration changes every time on master
fail-over we could also move the BwC cleanup loop that removes `INIT` state snapshots as well
as snapshots that have all their shards completed into this cluster state update task.
Aside from improving efficiency overall this change has the fortunate side effect of moving
all snapshot finalization to the CS update thread. This is helpful for concurrent snapshots
since it makes it very natural and straight forward to order snapshot finalizations by exploiting
that they are all initiated on the same thread.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
This commit adds rejections when the indexing memory limits are
exceeded for primary or coordinating operations. The amount of bytes
allow for indexing is controlled by a new setting
`indexing_limits.memory.limit`.
We don't need to switch to the generic or snapshot pool for loading
cached repository data (i.e. most of the time in normal operation).
This makes `executeConsistentStateUpdate` less heavy if it has to retry
and lowers the chance of having to retry in the first place.
Also, this change allowed simplifying a few other spots in the codebase
where we would fork off to another pool just to load repository data.
No need to do any switch to the `SNAPSHOT` pool here, the blob store
repo handles all its writes async on the `SNAPSHOT` pool so we're just
needlessly context-switching to enqueue those tasks there.
Also cleaned up the source only repository (the only override to `finalizeSnapshot`)
to make it clear that no IO is happening there and we don't need to run it on the
`SNAPSHOT` pool either.
Follow up to #56365. Instead of redundantly checking snapshots for completion
over and over, just track the completed snapshots in the CS updates that complete
them instead of looping over the smae snapshot entries over and over.
Also, in the batched snapshot shard status updates, only check for completion
of a snapshot entry if it isn't already finalizing.
Using G1 GC, Elasticsearch can rarely trigger that heap usage goes above
the real memory circuit breaker limit and stays there for an extended
period. This situation will persist until the next young GC. The circuit
breaking itself hinders that from occurring in a timely manner since it
breaks all request before real work is done.
This commit gently nudges G1 to do a young GC and then double checks
that heap usage is still above the real memory circuit breaker limit
before throwing the circuit breaker exception.
Related to #57202
Backport of #59293 to 7.x branch.
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
this results in storing a composable index template
with data stream definition only to work with default
distribution. This way data streams can only be used
with default distribution, since a data stream can
currently only be created if a matching composable index
template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
to fail if `_data_stream_timestamp` meta field mapper
isn't registered. So that a more understandable
error is returned when attempting to store a template
with data stream definition via the oss distribution.
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
The code path for closed indices is dead code here ever since #39644
because `shards(currentState, indexIds, ...)` does not set
`MISSING` on a closed index's shard that is assigned any longer. Before that change it would always set `MISSING` for a closed index's shard even it was assigned.
=> simplified the code accordingly.
In #52680 we introduced a new health check mechanism. This commit fixes
up some related test failures on Windows caused by erroneously assuming
that all paths begin with `/`.
Closes#59380
With #55773 the snapshot INIT state step has become obsolete. We can set up the snapshot directly in one single step to simplify the state machine.
This is a big help for building concurrent snapshots because it allows us to establish a deterministic order of operations between snapshot create and delete operations since all of their entries now contain a repository generation. With this change simple queuing up of snapshot operations can and will be added in a follow-up.
We have a number of parameters which are universally parsed by almost all
mappers, whether or not they make sense. Migrating the binary and boolean
mappers to the new style of declaring their parameters explicitly has meant
that these universal parameters stopped being accepted, which would break
existing mappings.
This commit adds some extra logic to ParametrizedFieldMapper that checks
for the existence of these universal parameters, and issues a warning on
7x indexes if it finds them. Indexes created in 8.0 and beyond will throw an
error.
Fixes#59359
This refactoring has three motivations:
1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
* This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
ends. This should slow down the rate at which we build extra buckets
as we build more of them.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.
Closes#59252Closes#59172
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
We are leaking a FileChannel in #39585 if we release a safe commit with
CancellableThreads. Although it is a bug in Lucene where we do not close
a FileChannel if we failed to create a NIOFSIndexInput, I think it's
safer if we release a safe commit using the generic thread pool instead.
Closes#39585
Relates #45409
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.
Relates to #58642
Relates to #53100Closes#58956Closes#58583
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
sub-aggregations live under multi-bucket aggregations. This is
normally true but `parentCardinality` is properly carried forward
for single bucket aggregations like `filter` and for multi-bucket
aggregations configured in single-bucket for like `range` with a
single range.
While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
Relates #56911
Currently we assert that the reason we fail collecting nodes in this
test is due to the fact that no seeds are available or no connections
could be established to cluster_2. However, the collection could fail if
we cannot establish connections to cluster_1. This commit adds that as
an acceptible assertion.
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
Backport of #58018
This commit adds validation that when a composable index template is updated, that the number
of unreferenced data streams does not increase. While it is still possible to have data streams
without a backing template (through snapshot restoration), this reduces the chance of getting
in to that scenario.
Relates to #53100
Currently in the recovery request tracker tests we place the futures
into the future map on the GENERIC thread. It is possible that the test
has already advanced past the point where we block on these futures
before they are placed in the map. This introduces other potential
failures as we expect all futures have been completed. This commit fixes
the test by places the futures in the map prior to dispatching.
The test would try to prepare a `Rounding` even when there aren't any
buckets. This would fail because there is no range over which to prepare
the rounding. It turns out that we don't need the rounding in that case
so we just use `null` then.
Closes#59131
* GET data stream API returns additional information (#59128)
This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.
Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.
(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
If the recovery source is on an old node (before 7.2), then the recovery
target won't have the safe commit after phase1 because the recovery
source does not send the global checkpoint in the clean_files step. And
if the recovery fails and retries, then the recovery stage won't
transition properly. If a sync_id is used in peer recovery, then the
clean_files step won't be executed to move the stage to TRANSLOG.
Relates ##7187
Closes#57708
This request:
```
POST /_search
{
"aggs": {
"a": {
"adjacency_matrix": {
"filters": {
"1": {
"terms": { "t": { "index": "lookup", "id": "1", "path": "t" } }
}
}
}
}
}
}
```
Would fail with a 500 error and a message like:
```
{
"error": {
"root_cause": [
{
"type": "illegal_state_exception",
"reason":"async actions are left after rewrite"
}
]
}
}
```
This fixes that by moving the query rewrite phase from a synchronous
call on the data nodes into the standard aggregation rewrite phase which
can properly handle the asynchronous actions.
Today we do not allow a node to start if its filesystem is readonly, but
it is possible for a filesystem to become readonly while the node is
running. We don't currently have any infrastructure in place to make
sure that Elasticsearch behaves well if this happens. A node that cannot
write to disk may be poisonous to the rest of the cluster.
With this commit we periodically verify that nodes' filesystems are
writable. If a node fails these writability checks then it is removed
from the cluster and prevented from re-joining until the checks start
passing again.
Closes#45286
Co-authored-by: Bukhtawar Khan <bukhtawar7152@gmail.com>
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
* Enforce higher priority for RepositoriesService ClusterStateApplier
This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.
Backport of #58808
This allows pipeline aggregations to participate in the up-front rewrite
phase for searches, in particular, it allows them to load data that they
need asynchronously.
Relates to #58193
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If the global checkpoint equals max_seq_no, then we won't reset an engine
(as all operations are safe), and max_seqno_of_updates_or_deletes
won't advance to max_seq_no.
Closes#58163
Part of the original PR was merged by #59028
(cherry picked from commit 2598327726124d8a86333f79cdc45bf6a4297dbc)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Backport of #58582 to 7.x branch.
This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
Relates to #53100
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
Since #55785, exists queries rewrite to MatchNoneQueryBuilder when the field is unmapped.
This change also introduced a bug in the `query_string` query, using an unmapped field
like `_exists_:foo` throws an exception if the field is unmapped. This commit avoids the
exception if the query is built outside of an `ExistsQueryBuilder`.
Closes#58737
An old translog header does not have a checksum. If we flip the header
version of an empty translog to the older version, then we won't detect
that corruption, and translog will be considered clean as before.
Closes#58671
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
Ingest script processors were changed to eagerly compile their scripts
when the ingest pipeline is saved, but conditional scripts were missed.
This commit adds eager compilation to ingest conditional scripts, which
will help surface errors before runtime, as well as adds tests for each
case we might encounter between inline and stored script compilation
failures.
closes#58864
This commit adds an integration test that component templates used to form a composite template can
still be updated after a cluster restart.
In #58643 an issue arose where mappings were causing problems because of the way we unwrap `_doc` in
template mappings. This was also related to the mappings being merged manually rather than using the
`MapperService` to do the merging. #58643 was fixed in 7.9 and master with the #58521 change, since
mappings now are read and digested by the actual mapper service.
This test passes for 7.x and master, and I intend to open a separate PR including this test for
7.8.1 along with a bug fix for #58643. This test is to ensure we don't have any regression in the
future.
The refactoring in #57666 inadvertently enabled norms on two of the percolator subfields,
leading to an increase in memory usage. This commit disables norms on these fields again.
The `date_histogram` aggregation had an optimization where it'd rewrite
`time_zones` who's offset from UTC is fixed across the entire index.
This rewrite is no longer needed after #56371 because we can tell that a
time zone is fixed lower down in the aggregation. So this removes it.
This prohibits the use of a custom _routing when the index/bulk requests are targetting a data stream.
Using a custom _routing when targetting a backing index is still permitted.
Relates to #53100
(cherry picked from commit ece6b7a318a8bd3a010499189f31fc5e3a012d4f)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Date processor was incorrectly parsing week based dates because when a
weekbased year was provided ingest module was thinking year was not
on a date and was trying to applying the logic for dd/MM type of
dates.
Date Processor is also allowing users to specify locale parameter. It
should be taken into account when parsing dates - currently only used
for formatting. If someone specifies 'en-us' locale, then calendar data
rules for that locale should be used.
The exception is iso8601 format. If someone is using that format,
then locale should not override calendar data rules.
closes#58479
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
Backport of #58231 to 7.x branch.
Change update index setting and put mapping api
to execute on all backing indices if data stream is targeted.
Relates #53100
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
Relates https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
Backport of #58029 to 7.x
* Picky picky
* Missing type
Adds an explicit check to `variable_width_histogram` to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.
Relates to #42035
Backport of #58419
Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.
The new auto put mapping action is only used if all nodes are on the version that supports it.
This PR implements recursive mapping merging for composable index templates.
When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).
Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
Relates to #53101.
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
In the unlikely corner case of deleting a relocation (hence `WAITING`) primary shard's
index during a partial snapshot, we would throw an NPE when checking if there's any external
changes to process.
Adds an API for putting an index block in place, which also ensures for write blocks that, once successfully returning to
the user, all shards of the index are properly accounting for the block, for example that all in-flight writes to an index have
been completed after adding the write block.
This API allows coordinating more complex workflows, where it is crucial that an index is no longer receiving writes after
the API completes, useful for example when marking an index as read-only during an upgrade in order to reindex its
documents.
Only snapshot datastreams that are recorded in `SnapshotInfo` and clean those
that aren't from the snapshotted metadata.
Do not restore all datastreams by default when restoring global metadata, use the same
mechanics used for indices here.
Closes#58544
When restoring a global metadata snapshot we were overwriting the correctly
adjusted data streams in the metadata when looping over all custom values.
Closes#58496
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
Java 9 removed pathname canonicalization, which means that we need to
add permissions for the path and also the real path when adding file
permissions. Since master requires a minimum runtime of JDK 11, we no
longer need conditional logic here to apply this pathname
canonicalization with our bares hands. This commit removes that
conditional pathname canonicalization.
Co-authored-by: Jason Tedor <jason@tedor.me>
`descendsFromBucketAggregator` was important before we removed
`asMultiBucketAggregator` but now that it is gone
`collectsFromSingleBucket` is good enough.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.
This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Minor bugs/inconsistencies:
If a shard hasn't changed at all we were reporting `0` for total size and total file count
while it was ongoing.
If a data node restarts/drops out during snapshot creation the fallback logic did not load the correct statistic from the repository but just created a status with `0` counts from the snapshot state in the CS. Added a fallback to reading from the repository in this case.
Fixes two bugs introduced by #57627:
1. We were not properly letting go of memory from the request breaker
when the aggregation finished.
2. We no longer supported totally arbitrary stuff produced by the init
script because we *assumed* that it'd be ok to run the script once
and clone its results. Sadly, cloning can't clone *anything* that the
init script can make, like `String` arrays. This runs the init script
once for every new bucket so we don't need to clone.
If we failed over while the data nodes were doing their work
we would never resolve the listener and leak it.
This change fails all listeners if master fails over.
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
This commit restores the filtering of empty fields during the
xcontent serialization of SearchHit. The filtering was removed
unintentionally in #41656.
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
This PR addresses #9572.
The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.
At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.
Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.
It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
The main changes are:
1. Catch the `NamedObjectNotFoundException` when parsing aggregation
type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().
Closes#58146.
Co-authored-by: bellengao <gbl_long@163.com>
If a persistent task cannot be assigned on the first attempt
then the master node will schedule periodic rechecks to see
if the assignment requirements have been met.
These periodic rechecks should be cancelled if the node ceases
to be master. Previously they weren't, leading to exceptions
being logged repeatedly. This PR cancels the rechecks on
learning that the node is no longer the master.
Fixes#58531
When creating a target index from a source index, we don't allow for target
mappings to be specified. This PR simplifies the check that the target mappings
are empty.
This refactor will help when implementing composable template merging, since we
no longer need to resolve + check the target mappings when creating an index
from a template.
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
`terminate_after` is ignored on search requests that don't return top hits (`size` set to 0)
and do not tracked the number of hits accurately (`track_total_hits`).
We use early termination when the number of hits to track is reached during collection
but this breaks the hard termination of `terminate_after` if it happens before we reached
the `terminate_after` value.
This change ensures that we continue to check `terminate_after` even if the tracking of total
hits has reached the provided value.
Closes#57624
Users are perennially confused by the message they get when writing to
an index is blocked due to excessive disk usage:
TOO_MANY_REQUESTS/12/index read-only / allow delete (api)
Of course this is technically accurate but it is hard to join the dots
from this message to "your disk was too full" without some searching of
forums and documentation. Additionally in #50166 we changed the status
code to today's `429` from the previous `403` which changed the message
from the one that's widely documented elsewhere:
FORBIDDEN/12/index read-only / allow delete (api)
Since #42559 we've considered this block to be under the sole control of
the disk-based shard allocator, and we have seen no evidence to suggest
that anyone is applying this block manually. Therefore this commit
adjusts this block's message to indicate that it's caused by a lack of
disk space.
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.
It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
Very rarely this test can fail if we draw a random TimeZone id that we cannot
parse with the legacy joda DateMathParser and get an IllegalArgumentException.
In addition to a "SystemV/*" time zone we also need an index "versionCreated"
before V_7_0_0 and no "format" setting in the query builder. Given how unlikely
this combination is, we should simply dissallow those time zone ids when
generating the random query builder for RangeQueryBuilderTests.
Closes#58431
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.
This allows us to remove the MapperService.getLuceneFieldType() shim method.
This merges the aggregator for `significant_text` into
`significant_terms`, applying the optimization built in #55873 to save
memory when the aggregation is not on top. The `significant_text`
aggregation is pretty memory intensive all on its own and this doesn't
particularly help with that, but it'll help with the memory usage of any
sub-aggregations.
Just like #56094 but for the request side.
Removes a lot of redundant `ShardId` instances from bulk shard requests as well as stops serializing index names when they're not needed because they're not different from what is in the shard id.
Even ignoring the index name serialization savings here, this change saves one `ShardId` instance per bulk shard request at least. This means it saves approximately:
* 8 bytes for the `ShardId` object (itself + one field)
* + another 4 bytes for the `int` in the `ShardId`
* 16 bytes (two fields + the instance itself + the padding) for the `Index` object
* + 30 bytes for the `Index` uuid string
* + all the bytes in the index name string
=> 60+ bytes per bulk request item saved on heap and over the wire
Today the `PublicationContext` interface has a single anonymous
implementation, and `PublicationTransportHandler` has various methods
that take the variables that this anonymous class captures. This commit
refactors this into a proper class with proper fields and moves the
relevant methods onto this class.
Backport of #58405 to 7.x.
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
Backporting #58096 to 7.x branch.
Relates to #53100
* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
* Add support for snapshot and restore to data streams (#57675)
This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.
Closes#57127
Relates to #53100
* version fix
* compilation fix
* compilation fix
* remove unused changes
* compilation fix
* test fix
This adds validation to make sure alias operations (add, remove, remove index)
don't target data streams or the backing indices.
(cherry picked from commit 816448990e464a02f3960f12f6f6644a8cce36a4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.
Fixes#58188
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
When a numeric value in e.g. a `term` query doesn't fit into a long, it
curerently gets parsed to a BigInteger object, that the various term query
builders store untouched. This leads to serialization errors when these queries
are sent across the wire. Instead we can convert to a string representation
early on, since that is what we store e.g. when indexing big integers into
`keyword` fields anyway.
Closes#57917
This change allows to use an `index_filter` in the
field capabilities API. Indices are filtered from
the response if the provided query rewrites to `match_none`
on every shard:
````
GET metrics-*
{
"index_filter": {
"bool": {
"must": [
"range": {
"@timestamp": {
"gt": "2019"
}
}
}
}
}
````
The filtering is done on a best-effort basis, it uses the can match phase
to rewrite queries to `match_none` instead of fully executing the request.
The first shard that can match the filter is used to create the field
capabilities response for the entire index.
Closes#56195
This allows doing true CAS operations on aliases, making sure that an alias is actually properly
moved from a given source index onto a given target index. This is useful to ensure that an
alias is actually moved from a given index to another one, and not just added to another index.
Currently a failed replication action will fail an entire replica. This
includes when replication fails due to potentially short lived transient
issues such as network distruptions or circuit breaking errors.
This commit implements retries using the retryable action.
Forgot the brackets here in #58214 so in the rare case where the
first update seen by the listener doesn't match it will still remove
itself and never be invoked again -> timeout.
This builds an `auto_date_histogram` aggregator that natively aggregates
from many buckets and uses it when the `auto_date_histogram` used to use
`asMultiBucketAggregator` which should save a significant amount of
memory in those cases. In particular, this happens when
`auto_date_histogram` is a sub-aggregator of a multi-bucketing aggregator
like `terms` or `histogram` or `filters`. For the most part we preserve
the original implementation when `auto_date_histogram` only collects from
a single bucket.
It isn't possible to "just port the aggregator" without taking a pretty
significant performance hit because we used to rewrite all of the
buckets every time we switched to a coarser and coarser rounding
configuration. Without some major surgery to how to delay sub-aggs
we'd end up rewriting the delay list zillions of time if there are many
buckets.
The multi-bucket version of the aggregator has a "budget" of "wasted"
buckets and only rewrites all of the buckets when we exceed that budget.
Now that we don't rebucket every time we increase the rounding we can no
longer get an accurate count of the number of buckets! So instead the
aggregator uses an estimate of the number of buckets to trigger switching
to a coarser rounding. This estimate is likely to be *terrible* when
buckets are far apart compared to the rounding. So it also uses the
difference between the first and last bucket to trigger switching to a
coarser rounding. Which covers for the shortcomings of the bucket
estimation technique pretty well. It also causes the aggregator to emit
fewer buckets in cases where they'd be reduced together on the
coordinating node. This is wonderful! But probably fairly rare.
All of that does buy us some speed improvements when the aggregator is
a child of multi-bucket aggregator:
Without metrics or time zone: 25% faster
With metrics: 15% faster
With time zone: 22% faster
Relates to #56487
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
This was a really subtle bug that we introduced a long time ago.
If a shard snapshot is in aborted state but hasn't started snapshotting on a node
we can only send the failed notification for it if the shard was actually supposed
to execute on the local node.
Without this fix, if shard snapshots were spread out across at least two data nodes
(so that each data node does not have all the primaries) the abort would actually
never wait on the data nodes. This isn't a big deal with uuid shard generations
but could lead to potential corruption on S3 when using numeric shard generations
(albeit very unlikely now that we have the 3 minute wait there).
Another negative side-effect of this bug was that master would receive a lot more
shard status update messages for aborted shards since each data node not assigned
a primary would send one message for that primary.
The dangling indices action is not a proper master node action so it does not
retry when executed while the cluster hasn't fully formed yet.
Since we use node restarts when setting up the dangling indices state we need
to manually ensure a fully formed cluster before moving on with the tests to avoid
failures.
Backport of #50920. Part of #48366. Implement an API for listing,
importing and deleting dangling indices.
Co-authored-by: David Turner <david.turner@elastic.co>
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
* Normalized prefix for rollover API (#57271)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Lee Hinman <lee@writequit.org>
It fixes the issue #53388
by normalizing prefix at index creation request itself
* Fix compilation for backport
Co-authored-by: Gaurav Chandani <chngau@amazon.com>
After an index has been deleted it may take some time to cancel all the
maintenance tasks such as RetentionLeaseSync, it's possible that the
task is already executing before the cancellation. This commit just
avoids logging a warning message for those scenarios.
Closes#57864
Backport of (#58098)
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.
Closes#56000.
Instead of serializing compilation using a plain lock / mutex combined with a double check, rely on the computeIfAbsent logic to prevent duplicated compilation of scripts. Made checkCompilationLimit to be thread-safe and lock free.
Backport: 865acad
Co-authored-by: Michael Bischoff <michael.bischoff@elastic.co>
This need some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.
Closes#58044
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
We keep a static list of meta-fields: META_FIELDS_BEFORE_7_8
as it was before.
This is done to ensure the backwards compatability with pre 7.8 nodes.
Closes#57831
If `ExtraFS` decides to put `extra0/0` into the indices folder
then the previous logic in this test would have interpreted the `0`
as shard `0` of index `extra0` and fail to list its contents (since it's a file
and not an actual shard directory).
=> simplified the logic to use actually referenced `IndexId` for iterating over indices
instead.
Scheduling on the threadpool will throw if the scheduler is already
shut down. Handled by treating the rejection like any other non-retryable
exception.
Closes#58021
This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:
1. Make us feel better about ourselves for not passing around the
factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
`significant_text` aggregation.
3. Make if very simple to cache the background frequencies which should
speed up when the agg is a sub-agg. We had done this for numerics
but not string-shaped significant terms.
When a search phase fails, we release the context of all successful shards.
Successful shards that rewrite the request to match none will not create any context
since #. This change ensures that we don't try to release a `null` context on these
successful shards.
Closes#57945
Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the
shard size or disk usage function (to mock out disk usage) are overridden.
Closes#57888
Today `InternalEngine#releaseIndexCommit` fails with an
`AlreadyClosedException` if the engine is closed before the index commit is
released. This can happen if, for example, a node leaves and rejoins the
cluster and acquires an index commit for replica shard allocation concurrently
with shutting the shard down.
There's no need to fail the operation like this: if the engine is shut down
then we will clean up the unreferenced files when it's restarted (or if it's
allocated elsewhere) so we can suppress an `AlreadyClosedException` in this
case. This commit does so.
Fixes#57797
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.
Closes#49554
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
When reducing `auto_date_histogram` we were using `Rounding#round`
which is quite a bit more expensive than
```
Rounding.Prepared prepared = rounding.prepare(min, max);
long result = prepared.round(date);
```
when rounding to a non-fixed time zone like `America/New_York`. This
stops using the former and starts using the latter.
Relates to #56124
Use the the hack used in `CorruptedBlobStoreRepositoryIT` in more snapshot
failure tests to verify that BwC repository metadata is handled properly
in these so far not-test-covered scenarios.
Also, some minor related dry-up of snapshot tests.
Relates #57798
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
If a node is disconnected we retry. It does not make sense
to retry the recovery if the node is removed from the cluster though.
=> added a CS listener that cancels the recovery for removed nodes
Also, we were running the retry on the `SAME` pool which for each retry will
be the scheduler pool. Since the error path of the listener we use here
will do blocking operations when closing the resources used by the recovery
we can't use the `SAME` pool here since not all exceptions go to the `ActionListenerResponseHandler`
threading like e.g. `NodeNotConnectedException`.
Closes#57585
In ff9e8c622427d42a2d87b4ceb298d043ae3c4e6a we changed the format
used when serializing snapshot failures in the cluster state and
`SnapshotInfo`. This turned them from a short string holding all the
nested exception messages into a multi kb stacktrace in many cases.
This is not great if you snapshot a large number of shards that all fail
for example and massively blows up the size of the GET snapshots response
if there are snapshots with failures in there.
This change reverts to the format used for exceptions before the above commit.
Also, this change short circuits logging and serialization of the failure
for an aborted snapshot where we don't care about the specific message at all
and aligns the message to "aborted" in all cases (current if we aborted before any IO,
it would have been "aborted" and an exception when aborting later during IO).
Previously, hidden indices were not included in snapshots by default, unless
specified using one of the usual methods for doing so: naming indices directly,
using index patterns starting with a ., or specifying expand_wildcards to
a value that includes hidden (e.g. all or hidden,open).
This commit changes the default expand_wildcards value to include hidden
indices.
Fixed two newly introduced issues with rollover:
1. Using auto-expand replicas, rollover could result in unexpected log
messages on future indexes.
2. It did a reroute and other heavy work on the network thread.
Closes#57706
Supersedes #57865
Relates #53965
Allow for optimistic concurrency control during ingest by checking the
sequence number and primary term. This is accomplished by defining
_if_seq_no and _if_primary_term in the pipeline, similarly to _version
and _version_type.
Closes#41255
Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.
(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This commit fixes a bug on the composite aggregation when the index
is sorted and the primary composite source needs to round values (date_histo).
In such case, we cannot take into account the subsequent sources even if they
match the index sort because the rounding of the primary sort value may break
the original index order.
Fixes#57849
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val)
```
because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and stil be faster
than calling the deprecated method over and over and over again.
Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.
Relates to #56124
Currently it is possible for a transient network error to disrupt the
start recovery request from the remote to source node. This disruption
is racy with the recovery occurring on the source node. It is possible
for the source node to finish and clear its recovery. When this occurs,
the recovery cannot be reestablished and the "no two start" assertion
is tripped. This commit fixes this issue by allowing two starts if the
finalize request has been received.
Fixes#57416.
Currently, the translog ops request is reentrent when there is a mapping
update. The impact of this is that a translog ops ends up waiting on the
pre-existing listener and it is never completed. This commit fixes this
by introducing a new code path to avoid the idempotency logic.
The action name is passed to the `ChannelListener` and is used for
logging purposes. Currently, we are using the incorrect action name for
the translog ops listener. This commit fixes the issue.
This reworks string flavored implementations of the `terms` aggregation
to save memory when it is under another bucket by dropping the usage of
`asMultiBucketAggregator`.
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
Currently we check that exceptions are the same in the recovery request
tracker test. This is inconsistent because the future wraps the
exception in a new instance. This commit fixes the test by comparing a
random exception message.
Fixes#57199
In #57701 we changed mappings merging so that duplicate fields specified in mappings caused an
exception during validation. This change makes the same exception thrown when metadata fields are
duplicated. This will allow us to be strict currently with plans to make the merging more
fine-grained in a later release.
Currently a network disruption will fail a peer recovery. This commit
adds network errors as retryable actions for the source node.
Additionally, it adds sequence numbers to the recovery request to
ensure that the requests are idempotent.
Additionally it adds a reestablish recovery action. The target node
will attempt to reestablish an existing recovery after a network
failure. This is necessary to ensure that the retries occurring on the
source node provide value in bidirectional failures.
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.
Closes#57798
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.
Closes#57402
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.
This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.
Related #38373, #41656Closes#24422
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.
This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Fix Bug With RepositoryData Caching
This fixes a really subtle bug with caching `RepositoryData`
that can corrupt a repository.
We were caching `RepositoryData` serialized in the newest
metadata format. This lead to a confusing situation where
numeric shard generations would be cached in `ShardGenerations`
that were not written to the repository because the repository
or cluster did not yet support `ShardGenerations`.
In the case where shard generations are not actually supported yet,
these cached numeric generations are not safe and there's multiple
scenarios where they would be incorrect, leading to the repository
trying to read shard level metadata from index-N that don't exist.
This commit makes it so that cached metadata is always in the same
format as the metadata in the repository.
Relates #57798
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.
Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
Improve efficiency of background indexer by allowing to add
an assertion for failures while they are produced to prevent
queuing them up.
Also, add non-blocking stop to the background indexer so that when
stopping multiple indexers we don't needlessly continue indexing
on some indexers while stopping another one.
Closes#57766
This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.
Relates to #56487
Backport of #57640 to 7.x branch.
Composable templates with exact matches, can match with the data stream name, but not with the backing index name.
Also if the backing index naming scheme changes, then a composable template may never match with a backing index.
In that case mappings and settings may not get applied.
#47711 and #47246 helped to validate that monitoring settings are
rejected at time of setting the monitoring settings. Else an invalid
monitoring setting can find it's way into the cluster state and result
in an exception thrown [1] on the cluster state application (there by
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.
All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
of a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is abit out the scope this PR is trying to
address.
This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there an exporter with a type
defined. The type is used as dependent setting, such that it must
exist to set the value. This ensures that when any monitoring settings
changes that they can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.
Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.
[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
Prior to this commit, `cluster.max_shards_per_node` is not correctly handled
when it is set via the YAML config file, only when it is set via the Cluster
Settings API.
This commit refactors how the limit is implemented, both to enable correctly
handling the setting in the YAML and to more effectively centralize the logic
used to enforce the limit. The logic used to apply the limit, as well as the
setting value, has been moved to the new `ShardLimitValidator`.
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.
Relates #56487
Today `GET _cluster/health?wait_for_events=...&timeout=...` will wait
indefinitely for the master to process the pending cluster health task,
ignoring the specified timeout. This could take a very long time if the master
is overloaded. This commit fixes this by adding a timeout to the pending
cluster health task.
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).
The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
The test for `auto_date_histogram` as trying to round `Long.MAX_VALUE`
if there were 0 buckets. That doesn't work.
Also, this replaces all of the class variables created to make
consistent random result when testing `InternalAutoDateHistogram` with
the newer `randomResultsToReduce` which is a little simpler to
understand.
The test failed when it was running with 4 replicas and 3 indexing
threads. The recovering replicas can prevent the global checkpoint from
advancing. This commit increases the timeout to 60 seconds for this
suite and the check for no inflight requests.
Closes#57204
SigTerms cannot run on fields that are not searchable, and SigText
cannot run on fields that do not have analyzers. Both of these
situations fail today with an esoteric exception, so this just formalizes
the constraint by throwing an IllegalArgumentException up front.
In practice, the only affected field seems to be the `binary` field,
which is neither searchable or has a default analyzer (e.g. even numeric
and keyword fields have a default analyzer despite not being tokenized)
Adds supported-type tests, and makes some changes to the test itself
to allow testing sigtext (indexing _source).
Also a few tweaks to the test to avoid bad randomization (negative
numbers, etc).
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.
We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.
Closes#57407
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.
With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.
This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.
Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
Allow for a fairer distribution of snapshot and restore operations
to enable parallel snapshots and improve behaviour for parallel snapshot + restore.
Closes#55803
There are several mapping settings that are currently re-parsed every
time they are read. This can be quite frequent, for example within every
document ingestion. This commit moves the parsed versions of these
mapping settings to be stored in IndexSettings, just as other index settings
are already.
closes#57395
The `routingNodes` variable is unused. Replace `clusterState.getRoutingNodes()` with `routingNodes`.
Co-authored-by: Boice Huang <boicehuang@tencent.com>
At some point, we changed the supported-type test to also catch
assertion errors. This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.
This modifies the test to throw at the end of the try-catch
block to prevent from accidentally catching itself.
Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.
Also includes a variety of fixes to other tests which were failing
but being silently smothered.
In case the local checkpoint in the latest commit is less
than the last processed local checkpoint we would recover
0 ops and hence not commit again.
This would lead to the logic in `IndexShard#recoverLocallyUpToGlobalCheckpoint`
not seeing the latest local checkpoint when it reload the safe commit from the store
and thus cause inefficient recoveries because the recoveries would work from a
lower than possible local checkpoint.
Closes#57010
This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.
The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Previously we'd get a `ClassCastException` when you tried to use
`numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves
some code around so the casting actually works.
This includes a few small cleanups for the `TermsAggregatorFactory`:
1. Removes an unused `DeprecationLogger`
2. Moves the members to right above the ctor.
3. Merges some all of the heuristics for picking `SubAggCollectionMode`
into a single method.
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
Unfortunately, we cannot have a safety mechnism like this where we throw whenever we find unreadable data in a shard.
This breaks in the case of an older ES version (without shard generations enabled) having failed to snapshot a shard snapshot after writing some data to its path and having finalized it for example.
Another example of where we can't support this check is the test I added, if we snapshot an index with a name that already exists in the repository and more shards than the existing index, fail doing that and then retry snapshotting it we will also see unexpected data in the path.
We could technically do deeper inspections on the unexpected data but I don't think it's worth it really. In the end if we are unable to read the data here it's broken anyway. By moving to a new `index-` blob in the shard directory I don't see us ever
corrupting existing data and since we (by virtue of moving to an empty generation) won't do any incremental work on top of potentially corrupt data we also do not risk creating broken snapshots going forward.
=> Just logging a warning in this very unlikely case is the best we can do I think
When the parameter `max_docs` is less than `slices` in update_by_query,
delete_by_query or reindex API, `max_docs ` is set to 0 and we throw an
action_request_validation_exception with confused error message:
"maxDocs should be greater than 0...".
This change checks that whether `max_docs` is less than `slices` and
throw an illegal_argument_exception with clear message.
Relates to #52786.
Co-authored-by: bellengao <gbl_long@163.com>
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Jackson 2.10 library has added a new type of error that is thrown when a numeric value is out
of range. This error should be catch and handle properly in case the flag ignore_malformed
has been set to true.
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem
is that with the new translog policy can trim translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.
Closes#57091
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
Slow loggers should use single shared logger as otherwise when index is
deleted the log4j logger will remain reachable (log4j is caching) and
will create a memory leak.
closes https://github.com/elastic/elasticsearch/issues/56171
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.
Relates to #41059
Backport of #56986
Closes#57168 by using `AggregatorTestCase#newIndexSearcher` in the
`AggregatorTestCase#testCase`. Without that global ordinals will
*sometimes* fail to work.
Until 7.7 we used to ignore `null` values for `bool`queries `minimum_should_match`,
parameters and also for the `must`, `must_not`, `should` and `filter` clauses.
An internal refactoring has changed this so now we get a parsing error. While `null`
should not a common value here, we should restore the old behaviour for bwc for now.
Closes#56812
If a partial snapshot has some of its shards aborted because an index got deleted, this can lead to confusing `IllegalStateExceptions` when trying to increment the ref count of the already closed `Store`.
Refactored this a little to throw the same exception for aborted shards no matter the timing of the store close and assert that the concurrent store close can in fact only happen when the shard snapshot has already been aborted.
A task might not be canceled on disconnection if it is completed before the cancellation
is started. We need to relax the assertion in this test.
Closes#56746
When slicing a releasable bytes reference we would create a new counter
every time and pass the original reference chain to the new slice on every
slice invocation. This would lead to extremely deep reference chains and
needlessly uses a dedicated counter for every slice when all the slices
eventually just refer to the same underlying bytes and `Releasable`.
This commit tracks the ref count wrapper with its releasable in a separate
object that can be passed around on every slicing, making the slices' tree
as flat as the original releasable bytes reference.
Also, we were needlessly creating a redundant releasable bytes reference from
a releasable bytes-stream-output that we never actually used for releasing (all code
that uses it just releases the stream itself instead).
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
* Update DeprecationMap to DynamicMap (#56149)
This renames DeprecationMap to DynamicMap, and changes the deprecation
messages Map to accept a Map of String (keys) to Functions (updated values)
instead. This creates more flexibility in either logging or updating values from
params within a script. This change is required to fix (#52103) in a future PR.
* Fix Source Return Bug in Scripting (#56831)
This change ensures that when a user returns _source directly no matter where
accessed within scripting, the value is a Map of the converted source as
opposed to a SourceLookup.
These log statements are also logged by every "simulate adding this index"
functionality. One of them is the rollover action in ILM which executes
rollover dry-runs until the conditions are met, when the actual rollover
is executed.
This changes the statements log level to DEBUG and changes the phrasing
from V1/V2 to legacy/composable templates.
(cherry picked from commit 7cc8e1fe7f9731213ac4869fe99853564fbaaba9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
All of these files were written by us, and not sourced from
anywhere. Therefore, the license head should be granting licenses to
Elasticsearch, rathern than to the ASF. This commit address them by
changing the license to our standard Apache 2.0 license header.
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.
Part of #56487
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
PR #56893 was supposed to randomise the iteration count in
`testDataOnlyNodePersistence` but this change was mistakenly omitted. This
commit addresses this.
This test failed if all 1000 top-level `rarely()` calls in the loop returned
`false`, because then we would never set the term of the persisted state. This
commit fixes this by adding an earlier call to `persistedState#setCurrentTerm`.
It also changes the test to clean up the threadpools it starts whether it
passes or fails.
When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
This assertion is too strict. A snapshot will be removed from the cluster state
on the CS thread before it is removed from the listeners map on the snapshot thread pool.
Throughout the removal from the cluster state and listener map, the snapshot is tracked
in `endingSnapshots` though, so we can relax the assertion accordingly and are still able
to catch leaked listeners.
Closes#56607
In the unlikely event that the data nodes started snapshotting the
shards already (and hence got blocked on the data blobs) before the
master has applied the cluster state to its own `SnapshotsService` on
the CS applier thread, we can get a `SnapshotMissingException` here which
breaks the busy assert loop so we have to deal with it explicitly.
Closes#56858
Currently it is possible that a sniff connection round is occurring as
we enter another test loop in testEnsureWeReconnect. The problem is that
once we enter another loop, closing the connection manually can cause
this pre-existing connection round to fail. This round failing can fail
the test. This commit fixes the issue by ensuring that there are no
in-progress connections before entering another loop.
It was relying on the compensated sum working but the test framework was
dodging it. This forces the accuracy tests to come from a single shard
where we get the proper compensated sum.
Closes#56757
We get the number of shards and replicas with our bare hands in index
metadata, rather than letting the settings infrastructure do the work
for us. This commit switches to using the settings infrastructure.
Today a 7.x node logs `cluster UUID set to [...]` on every cluster state update
received from a 6.8 master, because 6.8 nodes are not able to commit the
cluster UUID properly. We could try and deduplicate these logs somehow, but
that would introduce a good deal of complexity. Instead, this commit suppresses
these logs entirely when receiving cluster state updates from a 6.8 master.
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
If a channel gets disconnected, then we should cancel the tasks
associated with that channel as their results won't be retrieved.
Closes#56327
Relates #56619
Backport of #56620
We previously rejected removing the number of replicas setting, which
prevents users from reverting this setting to its default the natural
way. To fix this, we put back the setting with the default value in the
cases that the user is trying to remove it. Yet, we also need to do the
work of updating the routing table and so on appropriately. This case
was missed because when the setting is being removed, we were defaulting
to -1 in this code path, which is treated as not being updated. Instead,
we must treat the case when we are removing this setting as if the
setting is being updated, too. This commit does that.
In normal operation native controllers are not expected to write
anything to stdout or stderr. However, if due to an error or
something unexpected with the environment a native controller
does write something to stdout or stderr then it will block if
nothing is reading that output.
This change makes the stdout and stderr of native controllers
reuse the same stdout and stderr as the Elasticsearch JVM (which
are by default redirected to es.stdout.log and es.stderr.log) so
that if something unexpected is written to native controller
output then:
1. The native controller process does not block, waiting for
something to read the output
2. We can see what the output was, making it easier to debug
obscure environmental problems
Backport of #56491
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.
Precondition for the terms work in #56487
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.
It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).
Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.
Closes#23796
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.
Relates to #48665.
This is similar to a previous change that allowed removing the number of
replicas settings (so setting it to its default) on open indices. This
commit allows the same for closed indices.
It is unfortunate that we have separate branches for handling open and
closed indices here, but I do not see a clean way to merge these two
together without making a rather unnatural method (note that they invoke
different methods for doing the settings updates). For now, we leave
this as-is even though it led to the miss here.
Today a user can create an index without setting the
index.number_of_replicas setting even though the index metadata requires
that the setting has a value. We do this when creating an index by
explicitly settings index.number_of_replicas to a default value if one
is not provided. However, if a user updates the number of replicas, and
then let wants to return to the default value, they are naturally
inclined to try setting this setting to null, as the agreed upon way to
return a setting to its default. Since the index metadata requires that
this setting has a non-null value, we blow up when a user attempts to
make this change. This is because we are not taking the same action when
updating a setting on an index that we take when create an
index. Namely, we are not explicitly setting index.number_of_replicas if
the request does not carry a value for this setting. This would happen
when nulling the setting, which we want to support. This commit
addresses this by setting index.number_of_replicas to the default if the
value for this setting is null when updating the settings for an index.
Currently the `time_zone` parameter in `query_string` queries gets applied
correctly only when using the range syntax, e.g "date:[2020-01-02 TO
2020-01-05]. When a date field gets searched without explicit range syntax, e.g.
"date:"2020-01-01" we internally create a range query than uses the specified
date as start date and rounds up to the next underspecified units for the end
date (e.g. here 2020-01-01T23:59:59) without considering the `time_zone`
settings. This change adds a check in QueryStringQueryParser to detect this
scenario early where we have access to the time zone information and directly
create a range query using it.
Closes#55813
In a race condition, a search context could remain enlisted in
SearchService when an index is deleted, potentially causing the index
folder to not be cleaned up (for either lengthy searches or scrolls with
timeouts > 30 minutes or if the scroll is kept active).
If a conditional is added to a processor, and that processor fails, and
that processor has an on_failure handler, the full trace of all of the
executed processors may not be displayed in simulate verbose. The
information is correct, but misses displaying some of the steps used
to get there.
This happens because a processor that is conditional processor is a
wrapper around the real processor and a processor with an on_failure
handler is also a wrapper around the processor(s). When decorating for
simulation we treat compound processor specially, but if a compound processor
is wrapped by a conditional processor that compound processor's processors
can be missed for decoration resulting in the missing displayed steps.
The fix to this is to treat the conditional processor specially and
explicitly seperate it from the processor it is wrapping. This requires
us to keep track of 2 processors a possible conditional processor and
the actual processor it may be wrapping.
related: #56004
Two spots that allow for some optimization:
* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
* Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
Today the heap size check warns the user about two issues why they might
care about the heap size check: resize pauses, and if memory locking is
enabled. Yet, we unconditionally make mention of the memory locking
reason, even if memory locking is not enabled. This can confuse some
users, so we adjust the warning about memory locking to only display if
memory locking is enabled.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
Move data stream resolvability test from IndicesOptionsIntegrationIT to DataStreamIT class.
Whether a transport action supports data streams is no longer controlled via indices options.
This adds support for parsing numbers as range keys. They get converted
into a string, but we allow numbers.
While I was there I replaced the parser for `Range` with a
`ConstructingObjectParser` which will automatically add support for "did
you mean" style corrections on errors.
Closes#56402
This commit refactors the following:
* GeoPointFieldMapper and PointFieldMapper to
AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
* .setupFieldType moved up to AbstractGeometryFieldMapper
* lucene indexing moved up to AbstractGeometryFieldMapper.parse
* new addStoredFields, addDocValuesFields abstract methods for implementing
stored field and doc values field indexing in the concrete field mappers
This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.
This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.
Relates to #53100
This wires `auto_date_histogram` into the rounding optimization that I
built in #55559. This is should significantly speed up any
`auto_date_histogram`s with `time_zone`s on them.
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:
1. We didn't have an appropriate data structure to track the
sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
per `Aggregator` which was the only way to delay collection of
sub-aggregations.
This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.
I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
When an index spans a daylight savings time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone
and instead we used to fall back to a java.util.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.
The java.util.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.util.time implementation allocates a ton of short lived
objects but the optimized implementation doesn't. So it *might* end up
being faster than the microbenchmarks imply.
Use proper facility for creating temporary index service for the simulation
that does not add itself to the `IndicesService` unnecessarily (breaking an assertion about the
internal consistency of the cluster state and the `IndicesService`).
Closes#56298
Backport of: #56413
Allow cluster health api to resolve data streams and
automatically remove data streams after each test in
test cases extending from `ESIntegTestCase`
Relates to #53100
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.
Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.
In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.
Since searchable snapshots already contains the footer information
of every file in BlobStoreIndexShardSnapshot it can directly read the
checksum from it and avoid to use the cache at all to create a
MetadataSnapshot for the operations mentioned above.
This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
A bug in InternalGeoCentroid#reduce existed that summed up
the aggregation's long-valued counts into a local integer variable.
Since it is definitely possible to reduce more than Integer.MAX points,
this change simply updates that variable to be a long-valued number.
Closes#55992.
Currently, the logging around the SniffConnectionStrategy is limited.
The log messages are inconsistent and sometimes wrong. This commit
cleans up these log message to describe when connections are happening
and what failed if a step fails.
Additionally, this commit enables TRACE logging for a problematic test
(testEnsureWeReconnect).
Currently when a connection closes a new sniff round begins. The
testCollectNodes test closes four transports before triggering the
method to collect the remote nodes. This leads to a race where there are
a number of reasons the collect nodes call might fail. This commit fixes
that issue by changing the test assertion to include a potential failure
condition.
Fixes#55292.
`auto_date_histogram` was returning the incorrect `interval` because
of a combination of two things:
1. When pipeline aggregations rewrote `auto_date_histogram` we reset the
interval to 1. Oops. Fixed that.
2. *Every* bucket aggregation was rewriting its buckets as though there
was a pipeline aggregation even if there aren't any. This is a bit
silly so we skip that too.
Closes#56116
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.
Closes#55875
Relates #54312
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.
Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)
I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.
This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
We were logging the cleanup of the snap- and meta- blobs for every snapshot delete
which is needlessly noisy and confusing to users. We should only log actual stale/unexpected
blobs here.
This commit creates a new gradle plugin to provide a separate task name
and source set for running ESIntegTestCase tests. The only project
converted to use the new plugin in this PR is server, as an example. The
remaining cases in x-pack will be handled in followups.
backport of #55896
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
A FilterBlobContainer class was introduced in #55952 and it delegates
its behavior to a given BlobContainer while allowing to override
only necessary methods.
This commit replaces the existing BlobContainerWrapper class from
the test framework with the new FilterBlobContainer from core.
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.
it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
This commit changes searchable snapshots so that it now respects the
repository's max_restore_bytes_per_sec setting when it downloads blobs.
Backport of #55952 for 7.x
This arose after a backport where we do not have the nicities of the
Java 11 diamond operator. This commit fixes it by adding the proper type
parameter.
When the index we are validating a query does not exist, we try to send
back a response letting the client know that the index does not
exist. Yet, we accidentally fallthrough into the case that the
validation failed for some other reason. This means that we end up
notifying the channel twice. Sometimes the notification occurs after the
failure has been written out and the channel closed (so the second
invocation leads to a silent failed to write to a closed channel issue),
and sometimes the response does end up in the channel, creating garbled
responses to the client. This commit fixes that issue by avoiding the
fallthrough.
Backport of #56034.
Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.
Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.
Relates to #53100
Backport of #55858 to 7.x branch.
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.
This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377
* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:
AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
It's possible for a constant_keyword to have a 'null' value before any documents
are seen that contain a value for the field. In this case, no documents have a
value for the field, and 'exists' queries should return no documents.
Making use of #55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the fist update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
Backports #55933 to 7.x
Implements value_count and avg aggregations over Histogram fields as discussed in #53285
- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
* Allow Deleting Multiple Snapshots at Once (#55474)
Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
I see occasional confusion about the explanations emitted by the same-shard
allocation decider, particularly amongst new users setting up a single-node
cluster and trying to determine why their cluster has `yellow` health. For
example:
the shard cannot be allocated to the same node on which a copy of the shard
already exists
This is technically correct but it's quite a complicated sentence. Also, by
starting with "the shard cannot be allocated" it makes it sound like this is
the problem, whereas in fact this message is a good thing and users should
typically focus their attention elsewhere.
This commit simplifies the wording of these messages and makes them sound more
positive, for example:
a copy of this shard is already allocated to this node
In order to iterate through remote connections, the remote connection
manager maintains a local cache of connected nodes. Unfortunately this
is difficult in relationship with testing as it is inherently racy in
comparison to the parent connection manager map of connections.
This commit improves the relationship by only returning a cached
connection if it is still registered with the parent. If the connection
is not open, we will go to the slow path of allocating a iterator
directly from the parent.
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)
* Emit deprecation warning if multiple v1 templates match with a new index
* DEPRECATION_LOGGER rename
This commit includes a number of minor improvements around `DelayableWriteable`: javadocs were expanded and reworded, `get` was renamed to `expand` and `DelayableWriteable` no longer implements `Supplier`. Also a couple of methods are now private instead of package private.
We generate a circular reference exception in translog in 6.8 in the
following scenario:
- The first rollGeneration hits "too many open files" exception when it's
copying a checkpoint file. We will set the tragic exception and close
the translog
- The second rollGeneration hits AlreadyClosedException as the current
writer is closed. We will suppress the ACE to the current tragic
exception. Unfortunately, this leads to a circular reference as ACE
already suppresses the tragic exception.
Other factors that help to manifest this bug:
- We do not fail the engine on AlreadyClosedException in flush
- We do not check for ensureOpen before rolling a new generation
Closes#55893
Ocassionally this test can fail when the randomized index.version.created is
before 6.1. In this case we don't check that if mappedFields.size() == 0 we
expect a MatchNoDocsQuery query being returned, which we do for other versions.
This fails only occasionally but with the seed provided on the original issue.
It also shouldn't be an issue on master since we shouldn't test with these pre-7
index versions there.
Closes#55950
Backports #55826 to 7.x
Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted.
Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase()
Fixes#55824
Currently if we shortcircuit a message the breaker release is null since
there is nothing to be broken. However, the TcpTransportChannel
infrastructure still expects it. This commit resolves this issue be
returning a no-op breaker release.
Currently `currentFieldType` is an instance variable that is first set and then
used by all methods referring to it. We can make it local to each method
instead, avoiding possible state problems and improve readability of the code
instead.
This adds a new api to simulate matching the given index name against the
index templates in the system.
The syntax for the new API takes the following form:
POST _index_template/_simulate_index/{index_name}
{
"index_patterns": ["logs-*"],
"priority": 15,
"template": {
"settings": {
"number_of_shards": 3
}
...
}
}
Where the body is optional, but we support the entire body used by the
PUT _index_template/{name} api. When the body is specified we'll simulate
matching the given index against a system that'd have the given index
template together with the index templates that exist in the system.
The response, in both cases, will return the matching template's resolved
settings, mappings and aliases, together with a special field that'll print any
overlapping templates and their corresponding index patterns.
(cherry picked from commit 1a5845edce1f445c58e094e9a3b6792e21e543b0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
Backports #55681 to 7.x
Today if state recovery is delayed by the `gateway.recover_after_*` settings
then we may end up performing state recovery twice: once when enough nodes have
joined the cluster, and again when the timeout elapses. The second state
recovery reinitializes the routing table, effectively discarding all
recovered/recovering shards and starting again from scratch. This commit adds a
check to prevent this second state recovery.
Closes#55564
Currently the testTransientErrorsDuringRecoveryAreRetried validates that
the expected peer recovery starts only once. This check is coarse and is
executed on all nodes and indexes. This commit modifies this check to
only be performed on the expected index. Additionally this commit
removes the disruption behavior from the "blue" node where it is not
relevant. Finally, this commit improves the logging for this test.
This replaces a reference to the result of partially reducing
aggregations that async search keeps with a reference to the serialized
form of the result of the partial reduction which we need to keep
anyway.
When calling scripts in metric aggregation, the returned metric state is
passed along to the coordinating node to do the final reduce. However,
it is possible the object could contain nested state which is unknown to
StreamOutput/StreamInput. This would then result in the node crashing as
exceptions are not expected in the middle of serialization.
This commit adds a method to StreamOutput that can determine if an
object is writeable by the stream. It uses the same logic
writeGenericValue, special casing each of the supported collection types
to recursively determine if each contained value is itself writeable.
relates #54708
Currently cancelling the RetryableAction does not stop one last run from
being executed. This commit makes a best effort attempt to cancel a
scheduled retry and guards future executions from the action already
being completed.
Currently a failed peer recovery action will fail an recovery. This
includes when the recovery fails due to potentially short lived
transient issues such as rejected exceptions or circuit breaking
errors.
This commit adds the concept of a retryable action. A retryable action
will be retryed in face of certain errors. The action will be retried
after an exponentially increasing backoff period. After defined time,
the action will timeout.
This commit only implements retries for responses that indicate the
target node has NOT executed the action.
If we advance the global checkpoint during commit and sync that
checkpoint after commit, then the assertions in the test won't hold
because the deletion policy did not see the latest global checkpoint
but only the value before committing.
Closes#55680
The disk decider had special handling for the single data node case,
allowing any allocation (skipping watermark checks) for such clusters.
This special handling can now be avoided via a setting.
If a node client (or rather its underlying node) is closed then
any executions on it will just quietly fail as happens in #55660
via closing the nodes on the test thread and asynchronously using
a node client.
Closes#55660
For 7.x, we already implemented the `?prefer_v2_templates` flag and made V2 templates opt-in, so we
can relax the error when updating V1 templates to just a warning. This will still be a hard error
for 8.0+
Relates to #53101
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.
Filtering out throttled indices made sense a couple years ago, but as
we're now improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap, that we'd like to not have with upcoming features that
would use search throttling.
Relates #54058
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
Fixes confusing error message when unknown value type is specified in a terms
aggregation. Adds support for parsing "numeric" and "number" value types.
Fixes#55727
We make sure to filter shard generations for indices that are missing
from the metadata when finalizing a partial snapshot (from concurrent index deletion)
but we failed to account for the case where we manually build a fake metadata instance
for snapshots without the global state.
Fixed this by handling missing indices by skipping, same way we do it for filtering the
shard generations.
Relates #50234
With this change, we will always return true for can_match requests on
idle search shards; otherwise, some shards will never get refreshed if
all search requests perform the can_match phase (i.e., total shards >
pre_filter_shard_size).
Relates #27500
Relates #50043
This commit refactors all spatial Field Mappers to a common
AbstractGeometryFieldMapper that implements shared parameter functionality
(e.g., ignore_malformed, ignore_z_value) and provides a common framework for
overriding type parsing, and building in xpack. Common shape functionality is
implemented in a new AbstractShapeGeometryFieldMapper that is reused and
overridden in GeoShapeFieldMapper, GeoShapeFieldMapperWithDocValues,
LegacyGeoShapeFieldMapper, and ShapeFieldMapper. This abstraction provides a
reusable foundation for adding new xpack features; such as coordinate reference
system support.
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
We are using `FORCE_STALE_PRIMARY_INSTANCE` in instance equality checks `==`
but were creating new instances of `ExistingStoreRecoverySource` when reading
from the wire. This could break these checks in corner cases, causing
`org.elasticsearch.cluster.routing.allocation.IndexMetadataUpdater#shardStarted`
to not remove the force allocation fake id when starting a shard.
Closes#55513
Currently QueryStringQueryParser already checks if the field limit is breached
at construction time, which e.g. leads to errors if the default field is set to
"*" or the default isn't used and there are more fields than the limit, even if the
query itself does not use all these fields.
This change moves this check to happen after query parsing. QueryStringQueryParser now
keeps track of the fields that are actually resolved while parsing. The size of
that set is later used to check against the limit set by the
`indices.query.bool.max_clause_count` setting.
Backport of #55158
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
This commit changes the way that V2 index, component, and request mappings are merged. Specifically:
- Fields are merged in a "replacement" manner, meaning that the entire definition is replaced rather
than merging the interior configuration
- Mapping metadata (all fields outside of `properties`) are merged recursively.
The merging for V1 templates does not change.
Relates to #53101
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
This change fixes problem with updating Index Templates V2.
Validatation added in #54933 didn't filter list of conflicting templates correctly so
new template was always clashing with itself unless patterns were not changed completely.
Introduces InstantiatingObjectParser which is similar to the
ConstructingObjectParser, but instantiates the object using its constructor
instead of a builder function.
Closes#52499
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.
This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
There is no guarantee the observer and subsequent CS update will execute
before we move on to the next test here and we ahve to wait for the observer + CS update cycle to complete
before moving on to the next test.
closes#55481
If some of the nodes are pre-7.8 nodes, they may not support the `index.prefer_v2_templates`
setting (since it exists only on 7.8+). This commit makes sure that we only add this setting to the
index's metadata if all of the nodes are 7.8+. Otherwise we get errors like:
```
[ WARN ][o.e.i.c.IndicesClusterStateService] [v7.7.0-3] [test-snapshot-index][2] marking and sending shard failed due to [failed to create index]
java.lang.IllegalArgumentException: unknown setting [index.prefer_v2_templates] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:544) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:489) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:460) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:595) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:549) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:176) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:484) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:246) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
```
Relates to #55411
Relates to #53101Resolves#55539
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
In the documentation of `pre_filter_shard_size` we state that the pre-filter
phase will be executed if the request targets more than 128 shards, yet in the
current test randomization we check that the
TransportSearchAction.shouldPreFilterSearchShards is true already for 127
targeted shards. This should be raised to 129 instead.
Closes#55514
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.
This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.
Relates #50999
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
Validate adding global V2 templates don't configure the index.hidden setting.
This also prevents updating the component template to add the index.hidden
setting if that component template is referenced by a global index template.
(cherry picked from commit 2e768981809887649f49d265d039f056985f7e6a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Peer recovery fails if the primary does not see the recovering replica
in the replication group (when the cluster state update on the primary
is delayed). To verify the local recovery stats, we have to remember
this value in the first try because the local recovery happens once, and
its stats is reset when the recovery fails.
Closes#54829
If more than 100 shard-follow tasks are trying to connect to the remote
cluster, then some of them will abort with "connect listener queue is
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.