We keep a static list of meta-fields: META_FIELDS_BEFORE_7_8
as it was before.
This is done to ensure the backwards compatability with pre 7.8 nodes.
Closes#57831
If `ExtraFS` decides to put `extra0/0` into the indices folder
then the previous logic in this test would have interpreted the `0`
as shard `0` of index `extra0` and fail to list its contents (since it's a file
and not an actual shard directory).
=> simplified the logic to use actually referenced `IndexId` for iterating over indices
instead.
Scheduling on the threadpool will throw if the scheduler is already
shut down. Handled by treating the rejection like any other non-retryable
exception.
Closes#58021
This moves the code to look up significance heuristics information like
background frequency and superset size out of
`SignificantTermsAggregatorFactory` and into its own home so that it is
easier to pass around. This will:
1. Make us feel better about ourselves for not passing around the
factory, which is really *supposed* to be a throw away thing.
2. Abstract the significance lookup logic so we can reuse it for the
`significant_text` aggregation.
3. Make if very simple to cache the background frequencies which should
speed up when the agg is a sub-agg. We had done this for numerics
but not string-shaped significant terms.
When a search phase fails, we release the context of all successful shards.
Successful shards that rewrite the request to match none will not create any context
since #. This change ensures that we don't try to release a `null` context on these
successful shards.
Closes#57945
Ensures that InternalClusterInfoService's internally cached stats are refreshed whenever the
shard size or disk usage function (to mock out disk usage) are overridden.
Closes#57888
Today `InternalEngine#releaseIndexCommit` fails with an
`AlreadyClosedException` if the engine is closed before the index commit is
released. This can happen if, for example, a node leaves and rejoins the
cluster and acquires an index commit for replica shard allocation concurrently
with shutting the shard down.
There's no need to fail the operation like this: if the engine is shut down
then we will clean up the unreferenced files when it's restarted (or if it's
allocated elsewhere) so we can suppress an `AlreadyClosedException` in this
case. This commit does so.
Fixes#57797
Per 49554 I added standard deviation sampling and variance sampling to the extended stats interface.
Closes#49554
Co-authored-by: Igor Motov <igor@motovs.org>
Co-authored-by: andrewjohnson2 <aj114114@gmail.com>
When reducing `auto_date_histogram` we were using `Rounding#round`
which is quite a bit more expensive than
```
Rounding.Prepared prepared = rounding.prepare(min, max);
long result = prepared.round(date);
```
when rounding to a non-fixed time zone like `America/New_York`. This
stops using the former and starts using the latter.
Relates to #56124
Use the the hack used in `CorruptedBlobStoreRepositoryIT` in more snapshot
failure tests to verify that BwC repository metadata is handled properly
in these so far not-test-covered scenarios.
Also, some minor related dry-up of snapshot tests.
Relates #57798
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
If a node is disconnected we retry. It does not make sense
to retry the recovery if the node is removed from the cluster though.
=> added a CS listener that cancels the recovery for removed nodes
Also, we were running the retry on the `SAME` pool which for each retry will
be the scheduler pool. Since the error path of the listener we use here
will do blocking operations when closing the resources used by the recovery
we can't use the `SAME` pool here since not all exceptions go to the `ActionListenerResponseHandler`
threading like e.g. `NodeNotConnectedException`.
Closes#57585
In ff9e8c622427d42a2d87b4ceb298d043ae3c4e6a we changed the format
used when serializing snapshot failures in the cluster state and
`SnapshotInfo`. This turned them from a short string holding all the
nested exception messages into a multi kb stacktrace in many cases.
This is not great if you snapshot a large number of shards that all fail
for example and massively blows up the size of the GET snapshots response
if there are snapshots with failures in there.
This change reverts to the format used for exceptions before the above commit.
Also, this change short circuits logging and serialization of the failure
for an aborted snapshot where we don't care about the specific message at all
and aligns the message to "aborted" in all cases (current if we aborted before any IO,
it would have been "aborted" and an exception when aborting later during IO).
Previously, hidden indices were not included in snapshots by default, unless
specified using one of the usual methods for doing so: naming indices directly,
using index patterns starting with a ., or specifying expand_wildcards to
a value that includes hidden (e.g. all or hidden,open).
This commit changes the default expand_wildcards value to include hidden
indices.
Fixed two newly introduced issues with rollover:
1. Using auto-expand replicas, rollover could result in unexpected log
messages on future indexes.
2. It did a reroute and other heavy work on the network thread.
Closes#57706
Supersedes #57865
Relates #53965
Allow for optimistic concurrency control during ingest by checking the
sequence number and primary term. This is accomplished by defining
_if_seq_no and _if_primary_term in the pipeline, similarly to _version
and _version_type.
Closes#41255
Co-authored-by: Maria Ralli <mariai.ralli@gmail.com>
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.
(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This commit fixes a bug on the composite aggregation when the index
is sorted and the primary composite source needs to round values (date_histo).
In such case, we cannot take into account the subsequent sources even if they
match the index sort because the rounding of the primary sort value may break
the original index order.
Fixes#57849
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val)
```
because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and stil be faster
than calling the deprecated method over and over and over again.
Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.
Relates to #56124
Currently it is possible for a transient network error to disrupt the
start recovery request from the remote to source node. This disruption
is racy with the recovery occurring on the source node. It is possible
for the source node to finish and clear its recovery. When this occurs,
the recovery cannot be reestablished and the "no two start" assertion
is tripped. This commit fixes this issue by allowing two starts if the
finalize request has been received.
Fixes#57416.
Currently, the translog ops request is reentrent when there is a mapping
update. The impact of this is that a translog ops ends up waiting on the
pre-existing listener and it is never completed. This commit fixes this
by introducing a new code path to avoid the idempotency logic.
The action name is passed to the `ChannelListener` and is used for
logging purposes. Currently, we are using the incorrect action name for
the translog ops listener. This commit fixes the issue.
This reworks string flavored implementations of the `terms` aggregation
to save memory when it is under another bucket by dropping the usage of
`asMultiBucketAggregator`.
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
Currently we check that exceptions are the same in the recovery request
tracker test. This is inconsistent because the future wraps the
exception in a new instance. This commit fixes the test by comparing a
random exception message.
Fixes#57199
In #57701 we changed mappings merging so that duplicate fields specified in mappings caused an
exception during validation. This change makes the same exception thrown when metadata fields are
duplicated. This will allow us to be strict currently with plans to make the merging more
fine-grained in a later release.
Currently a network disruption will fail a peer recovery. This commit
adds network errors as retryable actions for the source node.
Additionally, it adds sequence numbers to the recovery request to
ensure that the requests are idempotent.
Additionally it adds a reestablish recovery action. The target node
will attempt to reestablish an existing recovery after a network
failure. This is necessary to ensure that the retries occurring on the
source node provide value in bidirectional failures.
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.
Closes#57798
When you run a `significant_terms` aggregation on a field and it *is*
mapped but there aren't any values for it then the count of the
documents that match the query on that shard still have to be added to
the overall doc count. I broke that in #57361. This fixes that.
Closes#57402
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.
This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.
Related #38373, #41656Closes#24422
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.
This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Fix Bug With RepositoryData Caching
This fixes a really subtle bug with caching `RepositoryData`
that can corrupt a repository.
We were caching `RepositoryData` serialized in the newest
metadata format. This lead to a confusing situation where
numeric shard generations would be cached in `ShardGenerations`
that were not written to the repository because the repository
or cluster did not yet support `ShardGenerations`.
In the case where shard generations are not actually supported yet,
these cached numeric generations are not safe and there's multiple
scenarios where they would be incorrect, leading to the repository
trying to read shard level metadata from index-N that don't exist.
This commit makes it so that cached metadata is always in the same
format as the metadata in the repository.
Relates #57798
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.
Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
Improve efficiency of background indexer by allowing to add
an assertion for failures while they are produced to prevent
queuing them up.
Also, add non-blocking stop to the background indexer so that when
stopping multiple indexers we don't needlessly continue indexing
on some indexers while stopping another one.
Closes#57766
This removes the deprecated `asMultiBucketAggregator` wrapper from
`scripted_metric`. Unlike most other such removals, this isn't likely to
save much memory. But it does make the internals of the aggregator
slightly less twisted.
Relates to #56487
Backport of #57640 to 7.x branch.
Composable templates with exact matches, can match with the data stream name, but not with the backing index name.
Also if the backing index naming scheme changes, then a composable template may never match with a backing index.
In that case mappings and settings may not get applied.
#47711 and #47246 helped to validate that monitoring settings are
rejected at time of setting the monitoring settings. Else an invalid
monitoring setting can find it's way into the cluster state and result
in an exception thrown [1] on the cluster state application (there by
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.
All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
of a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is abit out the scope this PR is trying to
address.
This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there an exporter with a type
defined. The type is used as dependent setting, such that it must
exist to set the value. This ensures that when any monitoring settings
changes that they can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.
Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.
[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
Prior to this commit, `cluster.max_shards_per_node` is not correctly handled
when it is set via the YAML config file, only when it is set via the Cluster
Settings API.
This commit refactors how the limit is implemented, both to enable correctly
handling the setting in the YAML and to more effectively centralize the logic
used to enforce the limit. The logic used to apply the limit, as well as the
setting value, has been moved to the new `ShardLimitValidator`.
Merges the remaining implementation of `significant_terms` into `terms`
so that we can more easilly make them work properly without
`asMultiBucketAggregator` which *should* save memory and speed them up.
Relates #56487
Today `GET _cluster/health?wait_for_events=...&timeout=...` will wait
indefinitely for the master to process the pending cluster health task,
ignoring the specified timeout. This could take a very long time if the master
is overloaded. This commit fixes this by adding a timeout to the pending
cluster health task.
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).
The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
The test for `auto_date_histogram` as trying to round `Long.MAX_VALUE`
if there were 0 buckets. That doesn't work.
Also, this replaces all of the class variables created to make
consistent random result when testing `InternalAutoDateHistogram` with
the newer `randomResultsToReduce` which is a little simpler to
understand.
The test failed when it was running with 4 replicas and 3 indexing
threads. The recovering replicas can prevent the global checkpoint from
advancing. This commit increases the timeout to 60 seconds for this
suite and the check for no inflight requests.
Closes#57204
SigTerms cannot run on fields that are not searchable, and SigText
cannot run on fields that do not have analyzers. Both of these
situations fail today with an esoteric exception, so this just formalizes
the constraint by throwing an IllegalArgumentException up front.
In practice, the only affected field seems to be the `binary` field,
which is neither searchable or has a default analyzer (e.g. even numeric
and keyword fields have a default analyzer despite not being tokenized)
Adds supported-type tests, and makes some changes to the test itself
to allow testing sigtext (indexing _source).
Also a few tweaks to the test to avoid bad randomization (negative
numbers, etc).
When the `terms` agg runs against strings and uses global ordinals it
has an optimization when it collects segments that only ever have a
single value for the particular string. This is *very* common. But I
broke it in #57241. This fixes that optimization and adds `debug`
information that you can use to see how often we collect segments of
each type. And adds a test to make sure that I don't break the
optimization again.
We also had a specialiation for when there isn't a filter on the terms
to aggregate. I had removed that specialization in #57241 which resulted
in some slow down as well. This adds it back but in a more clear way.
And, hopefully, a way that is marginally faster when there *is* a
filter.
Closes#57407
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.
With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.
This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.
Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
Allow for a fairer distribution of snapshot and restore operations
to enable parallel snapshots and improve behaviour for parallel snapshot + restore.
Closes#55803
There are several mapping settings that are currently re-parsed every
time they are read. This can be quite frequent, for example within every
document ingestion. This commit moves the parsed versions of these
mapping settings to be stored in IndexSettings, just as other index settings
are already.
closes#57395
The `routingNodes` variable is unused. Replace `clusterState.getRoutingNodes()` with `routingNodes`.
Co-authored-by: Boice Huang <boicehuang@tencent.com>
At some point, we changed the supported-type test to also catch
assertion errors. This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.
This modifies the test to throw at the end of the try-catch
block to prevent from accidentally catching itself.
Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.
Also includes a variety of fixes to other tests which were failing
but being silently smothered.
In case the local checkpoint in the latest commit is less
than the last processed local checkpoint we would recover
0 ops and hence not commit again.
This would lead to the logic in `IndexShard#recoverLocallyUpToGlobalCheckpoint`
not seeing the latest local checkpoint when it reload the safe commit from the store
and thus cause inefficient recoveries because the recoveries would work from a
lower than possible local checkpoint.
Closes#57010
This merges the global-ordinals-based implementation for
`significant_terms` into the global-ordinals-based implementation of
`terms`, removing a bunch of copy and pasted code that is subtly
different across the two implementations and replacing it with an
explicit `ResultStrategy` with nice stuff like Javadoc.
The actual behavior is mostly unchanged, though I was able to remove a
redundant copy of bytes representing the string from the result
construction phase of `significant_terms`.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Previously we'd get a `ClassCastException` when you tried to use
`numeric_type` on `scaled_float`. Oops! This cleans up the CCE and moves
some code around so the casting actually works.
This includes a few small cleanups for the `TermsAggregatorFactory`:
1. Removes an unused `DeprecationLogger`
2. Moves the members to right above the ctor.
3. Merges some all of the heuristics for picking `SubAggCollectionMode`
into a single method.
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
Unfortunately, we cannot have a safety mechnism like this where we throw whenever we find unreadable data in a shard.
This breaks in the case of an older ES version (without shard generations enabled) having failed to snapshot a shard snapshot after writing some data to its path and having finalized it for example.
Another example of where we can't support this check is the test I added, if we snapshot an index with a name that already exists in the repository and more shards than the existing index, fail doing that and then retry snapshotting it we will also see unexpected data in the path.
We could technically do deeper inspections on the unexpected data but I don't think it's worth it really. In the end if we are unable to read the data here it's broken anyway. By moving to a new `index-` blob in the shard directory I don't see us ever
corrupting existing data and since we (by virtue of moving to an empty generation) won't do any incremental work on top of potentially corrupt data we also do not risk creating broken snapshots going forward.
=> Just logging a warning in this very unlikely case is the best we can do I think
When the parameter `max_docs` is less than `slices` in update_by_query,
delete_by_query or reindex API, `max_docs ` is set to 0 and we throw an
action_request_validation_exception with confused error message:
"maxDocs should be greater than 0...".
This change checks that whether `max_docs` is less than `slices` and
throw an illegal_argument_exception with clear message.
Relates to #52786.
Co-authored-by: bellengao <gbl_long@163.com>
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Jackson 2.10 library has added a new type of error that is thrown when a numeric value is out
of range. This error should be catch and handle properly in case the flag ignore_malformed
has been set to true.
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem
is that with the new translog policy can trim translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.
Closes#57091
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
Slow loggers should use single shared logger as otherwise when index is
deleted the log4j logger will remain reachable (log4j is caching) and
will create a memory leak.
closes https://github.com/elastic/elasticsearch/issues/56171
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.
Relates to #41059
Backport of #56986
Closes#57168 by using `AggregatorTestCase#newIndexSearcher` in the
`AggregatorTestCase#testCase`. Without that global ordinals will
*sometimes* fail to work.
Until 7.7 we used to ignore `null` values for `bool`queries `minimum_should_match`,
parameters and also for the `must`, `must_not`, `should` and `filter` clauses.
An internal refactoring has changed this so now we get a parsing error. While `null`
should not a common value here, we should restore the old behaviour for bwc for now.
Closes#56812
If a partial snapshot has some of its shards aborted because an index got deleted, this can lead to confusing `IllegalStateExceptions` when trying to increment the ref count of the already closed `Store`.
Refactored this a little to throw the same exception for aborted shards no matter the timing of the store close and assert that the concurrent store close can in fact only happen when the shard snapshot has already been aborted.
A task might not be canceled on disconnection if it is completed before the cancellation
is started. We need to relax the assertion in this test.
Closes#56746
When slicing a releasable bytes reference we would create a new counter
every time and pass the original reference chain to the new slice on every
slice invocation. This would lead to extremely deep reference chains and
needlessly uses a dedicated counter for every slice when all the slices
eventually just refer to the same underlying bytes and `Releasable`.
This commit tracks the ref count wrapper with its releasable in a separate
object that can be passed around on every slicing, making the slices' tree
as flat as the original releasable bytes reference.
Also, we were needlessly creating a redundant releasable bytes reference from
a releasable bytes-stream-output that we never actually used for releasing (all code
that uses it just releases the stream itself instead).
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
* Update DeprecationMap to DynamicMap (#56149)
This renames DeprecationMap to DynamicMap, and changes the deprecation
messages Map to accept a Map of String (keys) to Functions (updated values)
instead. This creates more flexibility in either logging or updating values from
params within a script. This change is required to fix (#52103) in a future PR.
* Fix Source Return Bug in Scripting (#56831)
This change ensures that when a user returns _source directly no matter where
accessed within scripting, the value is a Map of the converted source as
opposed to a SourceLookup.
These log statements are also logged by every "simulate adding this index"
functionality. One of them is the rollover action in ILM which executes
rollover dry-runs until the conditions are met, when the actual rollover
is executed.
This changes the statements log level to DEBUG and changes the phrasing
from V1/V2 to legacy/composable templates.
(cherry picked from commit 7cc8e1fe7f9731213ac4869fe99853564fbaaba9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
All of these files were written by us, and not sourced from
anywhere. Therefore, the license head should be granting licenses to
Elasticsearch, rathern than to the ASF. This commit address them by
changing the license to our standard Apache 2.0 license header.
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.
Part of #56487
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
PR #56893 was supposed to randomise the iteration count in
`testDataOnlyNodePersistence` but this change was mistakenly omitted. This
commit addresses this.
This test failed if all 1000 top-level `rarely()` calls in the loop returned
`false`, because then we would never set the term of the persisted state. This
commit fixes this by adding an earlier call to `persistedState#setCurrentTerm`.
It also changes the test to clean up the threadpools it starts whether it
passes or fails.
When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
This assertion is too strict. A snapshot will be removed from the cluster state
on the CS thread before it is removed from the listeners map on the snapshot thread pool.
Throughout the removal from the cluster state and listener map, the snapshot is tracked
in `endingSnapshots` though, so we can relax the assertion accordingly and are still able
to catch leaked listeners.
Closes#56607
In the unlikely event that the data nodes started snapshotting the
shards already (and hence got blocked on the data blobs) before the
master has applied the cluster state to its own `SnapshotsService` on
the CS applier thread, we can get a `SnapshotMissingException` here which
breaks the busy assert loop so we have to deal with it explicitly.
Closes#56858
Currently it is possible that a sniff connection round is occurring as
we enter another test loop in testEnsureWeReconnect. The problem is that
once we enter another loop, closing the connection manually can cause
this pre-existing connection round to fail. This round failing can fail
the test. This commit fixes the issue by ensuring that there are no
in-progress connections before entering another loop.
It was relying on the compensated sum working but the test framework was
dodging it. This forces the accuracy tests to come from a single shard
where we get the proper compensated sum.
Closes#56757
We get the number of shards and replicas with our bare hands in index
metadata, rather than letting the settings infrastructure do the work
for us. This commit switches to using the settings infrastructure.
Today a 7.x node logs `cluster UUID set to [...]` on every cluster state update
received from a 6.8 master, because 6.8 nodes are not able to commit the
cluster UUID properly. We could try and deduplicate these logs somehow, but
that would introduce a good deal of complexity. Instead, this commit suppresses
these logs entirely when receiving cluster state updates from a 6.8 master.
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
If a channel gets disconnected, then we should cancel the tasks
associated with that channel as their results won't be retrieved.
Closes#56327
Relates #56619
Backport of #56620
We previously rejected removing the number of replicas setting, which
prevents users from reverting this setting to its default the natural
way. To fix this, we put back the setting with the default value in the
cases that the user is trying to remove it. Yet, we also need to do the
work of updating the routing table and so on appropriately. This case
was missed because when the setting is being removed, we were defaulting
to -1 in this code path, which is treated as not being updated. Instead,
we must treat the case when we are removing this setting as if the
setting is being updated, too. This commit does that.
In normal operation native controllers are not expected to write
anything to stdout or stderr. However, if due to an error or
something unexpected with the environment a native controller
does write something to stdout or stderr then it will block if
nothing is reading that output.
This change makes the stdout and stderr of native controllers
reuse the same stdout and stderr as the Elasticsearch JVM (which
are by default redirected to es.stdout.log and es.stderr.log) so
that if something unexpected is written to native controller
output then:
1. The native controller process does not block, waiting for
something to read the output
2. We can see what the output was, making it easier to debug
obscure environmental problems
Backport of #56491
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.
Precondition for the terms work in #56487
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.
It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).
Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.
Closes#23796
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.
Relates to #48665.
This is similar to a previous change that allowed removing the number of
replicas settings (so setting it to its default) on open indices. This
commit allows the same for closed indices.
It is unfortunate that we have separate branches for handling open and
closed indices here, but I do not see a clean way to merge these two
together without making a rather unnatural method (note that they invoke
different methods for doing the settings updates). For now, we leave
this as-is even though it led to the miss here.
Today a user can create an index without setting the
index.number_of_replicas setting even though the index metadata requires
that the setting has a value. We do this when creating an index by
explicitly settings index.number_of_replicas to a default value if one
is not provided. However, if a user updates the number of replicas, and
then let wants to return to the default value, they are naturally
inclined to try setting this setting to null, as the agreed upon way to
return a setting to its default. Since the index metadata requires that
this setting has a non-null value, we blow up when a user attempts to
make this change. This is because we are not taking the same action when
updating a setting on an index that we take when create an
index. Namely, we are not explicitly setting index.number_of_replicas if
the request does not carry a value for this setting. This would happen
when nulling the setting, which we want to support. This commit
addresses this by setting index.number_of_replicas to the default if the
value for this setting is null when updating the settings for an index.
Currently the `time_zone` parameter in `query_string` queries gets applied
correctly only when using the range syntax, e.g "date:[2020-01-02 TO
2020-01-05]. When a date field gets searched without explicit range syntax, e.g.
"date:"2020-01-01" we internally create a range query than uses the specified
date as start date and rounds up to the next underspecified units for the end
date (e.g. here 2020-01-01T23:59:59) without considering the `time_zone`
settings. This change adds a check in QueryStringQueryParser to detect this
scenario early where we have access to the time zone information and directly
create a range query using it.
Closes#55813
In a race condition, a search context could remain enlisted in
SearchService when an index is deleted, potentially causing the index
folder to not be cleaned up (for either lengthy searches or scrolls with
timeouts > 30 minutes or if the scroll is kept active).
If a conditional is added to a processor, and that processor fails, and
that processor has an on_failure handler, the full trace of all of the
executed processors may not be displayed in simulate verbose. The
information is correct, but misses displaying some of the steps used
to get there.
This happens because a processor that is conditional processor is a
wrapper around the real processor and a processor with an on_failure
handler is also a wrapper around the processor(s). When decorating for
simulation we treat compound processor specially, but if a compound processor
is wrapped by a conditional processor that compound processor's processors
can be missed for decoration resulting in the missing displayed steps.
The fix to this is to treat the conditional processor specially and
explicitly seperate it from the processor it is wrapping. This requires
us to keep track of 2 processors a possible conditional processor and
the actual processor it may be wrapping.
related: #56004
Two spots that allow for some optimization:
* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
* Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
Today the heap size check warns the user about two issues why they might
care about the heap size check: resize pauses, and if memory locking is
enabled. Yet, we unconditionally make mention of the memory locking
reason, even if memory locking is not enabled. This can confuse some
users, so we adjust the warning about memory locking to only display if
memory locking is enabled.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
Move data stream resolvability test from IndicesOptionsIntegrationIT to DataStreamIT class.
Whether a transport action supports data streams is no longer controlled via indices options.
This adds support for parsing numbers as range keys. They get converted
into a string, but we allow numbers.
While I was there I replaced the parser for `Range` with a
`ConstructingObjectParser` which will automatically add support for "did
you mean" style corrections on errors.
Closes#56402
This commit refactors the following:
* GeoPointFieldMapper and PointFieldMapper to
AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
* .setupFieldType moved up to AbstractGeometryFieldMapper
* lucene indexing moved up to AbstractGeometryFieldMapper.parse
* new addStoredFields, addDocValuesFields abstract methods for implementing
stored field and doc values field indexing in the concrete field mappers
This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.
This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.
Relates to #53100
This wires `auto_date_histogram` into the rounding optimization that I
built in #55559. This is should significantly speed up any
`auto_date_histogram`s with `time_zone`s on them.
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:
1. We didn't have an appropriate data structure to track the
sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
per `Aggregator` which was the only way to delay collection of
sub-aggregations.
This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.
I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
When an index spans a daylight savings time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone
and instead we used to fall back to a java.util.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.
The java.util.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.util.time implementation allocates a ton of short lived
objects but the optimized implementation doesn't. So it *might* end up
being faster than the microbenchmarks imply.
Use proper facility for creating temporary index service for the simulation
that does not add itself to the `IndicesService` unnecessarily (breaking an assertion about the
internal consistency of the cluster state and the `IndicesService`).
Closes#56298
Backport of: #56413
Allow cluster health api to resolve data streams and
automatically remove data streams after each test in
test cases extending from `ESIntegTestCase`
Relates to #53100
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.
Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.
In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.
Since searchable snapshots already contains the footer information
of every file in BlobStoreIndexShardSnapshot it can directly read the
checksum from it and avoid to use the cache at all to create a
MetadataSnapshot for the operations mentioned above.
This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
A bug in InternalGeoCentroid#reduce existed that summed up
the aggregation's long-valued counts into a local integer variable.
Since it is definitely possible to reduce more than Integer.MAX points,
this change simply updates that variable to be a long-valued number.
Closes#55992.
Currently, the logging around the SniffConnectionStrategy is limited.
The log messages are inconsistent and sometimes wrong. This commit
cleans up these log message to describe when connections are happening
and what failed if a step fails.
Additionally, this commit enables TRACE logging for a problematic test
(testEnsureWeReconnect).
Currently when a connection closes a new sniff round begins. The
testCollectNodes test closes four transports before triggering the
method to collect the remote nodes. This leads to a race where there are
a number of reasons the collect nodes call might fail. This commit fixes
that issue by changing the test assertion to include a potential failure
condition.
Fixes#55292.
`auto_date_histogram` was returning the incorrect `interval` because
of a combination of two things:
1. When pipeline aggregations rewrote `auto_date_histogram` we reset the
interval to 1. Oops. Fixed that.
2. *Every* bucket aggregation was rewriting its buckets as though there
was a pipeline aggregation even if there aren't any. This is a bit
silly so we skip that too.
Closes#56116
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.
Closes#55875
Relates #54312
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.
Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)
I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.
This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
We were logging the cleanup of the snap- and meta- blobs for every snapshot delete
which is needlessly noisy and confusing to users. We should only log actual stale/unexpected
blobs here.
This commit creates a new gradle plugin to provide a separate task name
and source set for running ESIntegTestCase tests. The only project
converted to use the new plugin in this PR is server, as an example. The
remaining cases in x-pack will be handled in followups.
backport of #55896
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
A FilterBlobContainer class was introduced in #55952 and it delegates
its behavior to a given BlobContainer while allowing to override
only necessary methods.
This commit replaces the existing BlobContainerWrapper class from
the test framework with the new FilterBlobContainer from core.
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.
it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
This commit changes searchable snapshots so that it now respects the
repository's max_restore_bytes_per_sec setting when it downloads blobs.
Backport of #55952 for 7.x
This arose after a backport where we do not have the nicities of the
Java 11 diamond operator. This commit fixes it by adding the proper type
parameter.
When the index we are validating a query does not exist, we try to send
back a response letting the client know that the index does not
exist. Yet, we accidentally fallthrough into the case that the
validation failed for some other reason. This means that we end up
notifying the channel twice. Sometimes the notification occurs after the
failure has been written out and the channel closed (so the second
invocation leads to a silent failed to write to a closed channel issue),
and sometimes the response does end up in the channel, creating garbled
responses to the client. This commit fixes that issue by avoiding the
fallthrough.
Backport of #56034.
Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.
Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.
Relates to #53100
Backport of #55858 to 7.x branch.
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.
This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377
* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:
AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
It's possible for a constant_keyword to have a 'null' value before any documents
are seen that contain a value for the field. In this case, no documents have a
value for the field, and 'exists' queries should return no documents.
Making use of #55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the fist update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
Backports #55933 to 7.x
Implements value_count and avg aggregations over Histogram fields as discussed in #53285
- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
* Allow Deleting Multiple Snapshots at Once (#55474)
Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
I see occasional confusion about the explanations emitted by the same-shard
allocation decider, particularly amongst new users setting up a single-node
cluster and trying to determine why their cluster has `yellow` health. For
example:
the shard cannot be allocated to the same node on which a copy of the shard
already exists
This is technically correct but it's quite a complicated sentence. Also, by
starting with "the shard cannot be allocated" it makes it sound like this is
the problem, whereas in fact this message is a good thing and users should
typically focus their attention elsewhere.
This commit simplifies the wording of these messages and makes them sound more
positive, for example:
a copy of this shard is already allocated to this node
In order to iterate through remote connections, the remote connection
manager maintains a local cache of connected nodes. Unfortunately this
is difficult in relationship with testing as it is inherently racy in
comparison to the parent connection manager map of connections.
This commit improves the relationship by only returning a cached
connection if it is still registered with the parent. If the connection
is not open, we will go to the slow path of allocating a iterator
directly from the parent.
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)
* Emit deprecation warning if multiple v1 templates match with a new index
* DEPRECATION_LOGGER rename
This commit includes a number of minor improvements around `DelayableWriteable`: javadocs were expanded and reworded, `get` was renamed to `expand` and `DelayableWriteable` no longer implements `Supplier`. Also a couple of methods are now private instead of package private.
We generate a circular reference exception in translog in 6.8 in the
following scenario:
- The first rollGeneration hits "too many open files" exception when it's
copying a checkpoint file. We will set the tragic exception and close
the translog
- The second rollGeneration hits AlreadyClosedException as the current
writer is closed. We will suppress the ACE to the current tragic
exception. Unfortunately, this leads to a circular reference as ACE
already suppresses the tragic exception.
Other factors that help to manifest this bug:
- We do not fail the engine on AlreadyClosedException in flush
- We do not check for ensureOpen before rolling a new generation
Closes#55893
Ocassionally this test can fail when the randomized index.version.created is
before 6.1. In this case we don't check that if mappedFields.size() == 0 we
expect a MatchNoDocsQuery query being returned, which we do for other versions.
This fails only occasionally but with the seed provided on the original issue.
It also shouldn't be an issue on master since we shouldn't test with these pre-7
index versions there.
Closes#55950
Backports #55826 to 7.x
Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted.
Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase()
Fixes#55824
Currently if we shortcircuit a message the breaker release is null since
there is nothing to be broken. However, the TcpTransportChannel
infrastructure still expects it. This commit resolves this issue be
returning a no-op breaker release.
Currently `currentFieldType` is an instance variable that is first set and then
used by all methods referring to it. We can make it local to each method
instead, avoiding possible state problems and improve readability of the code
instead.
This adds a new api to simulate matching the given index name against the
index templates in the system.
The syntax for the new API takes the following form:
POST _index_template/_simulate_index/{index_name}
{
"index_patterns": ["logs-*"],
"priority": 15,
"template": {
"settings": {
"number_of_shards": 3
}
...
}
}
Where the body is optional, but we support the entire body used by the
PUT _index_template/{name} api. When the body is specified we'll simulate
matching the given index against a system that'd have the given index
template together with the index templates that exist in the system.
The response, in both cases, will return the matching template's resolved
settings, mappings and aliases, together with a special field that'll print any
overlapping templates and their corresponding index patterns.
(cherry picked from commit 1a5845edce1f445c58e094e9a3b6792e21e543b0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
Backports #55681 to 7.x
Today if state recovery is delayed by the `gateway.recover_after_*` settings
then we may end up performing state recovery twice: once when enough nodes have
joined the cluster, and again when the timeout elapses. The second state
recovery reinitializes the routing table, effectively discarding all
recovered/recovering shards and starting again from scratch. This commit adds a
check to prevent this second state recovery.
Closes#55564
Currently the testTransientErrorsDuringRecoveryAreRetried validates that
the expected peer recovery starts only once. This check is coarse and is
executed on all nodes and indexes. This commit modifies this check to
only be performed on the expected index. Additionally this commit
removes the disruption behavior from the "blue" node where it is not
relevant. Finally, this commit improves the logging for this test.
This replaces a reference to the result of partially reducing
aggregations that async search keeps with a reference to the serialized
form of the result of the partial reduction which we need to keep
anyway.
When calling scripts in metric aggregation, the returned metric state is
passed along to the coordinating node to do the final reduce. However,
it is possible the object could contain nested state which is unknown to
StreamOutput/StreamInput. This would then result in the node crashing as
exceptions are not expected in the middle of serialization.
This commit adds a method to StreamOutput that can determine if an
object is writeable by the stream. It uses the same logic
writeGenericValue, special casing each of the supported collection types
to recursively determine if each contained value is itself writeable.
relates #54708
Currently cancelling the RetryableAction does not stop one last run from
being executed. This commit makes a best effort attempt to cancel a
scheduled retry and guards future executions from the action already
being completed.
Currently a failed peer recovery action will fail an recovery. This
includes when the recovery fails due to potentially short lived
transient issues such as rejected exceptions or circuit breaking
errors.
This commit adds the concept of a retryable action. A retryable action
will be retryed in face of certain errors. The action will be retried
after an exponentially increasing backoff period. After defined time,
the action will timeout.
This commit only implements retries for responses that indicate the
target node has NOT executed the action.
If we advance the global checkpoint during commit and sync that
checkpoint after commit, then the assertions in the test won't hold
because the deletion policy did not see the latest global checkpoint
but only the value before committing.
Closes#55680
The disk decider had special handling for the single data node case,
allowing any allocation (skipping watermark checks) for such clusters.
This special handling can now be avoided via a setting.
If a node client (or rather its underlying node) is closed then
any executions on it will just quietly fail as happens in #55660
via closing the nodes on the test thread and asynchronously using
a node client.
Closes#55660
For 7.x, we already implemented the `?prefer_v2_templates` flag and made V2 templates opt-in, so we
can relax the error when updating V1 templates to just a warning. This will still be a hard error
for 8.0+
Relates to #53101
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.
Filtering out throttled indices made sense a couple years ago, but as
we're now improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap, that we'd like to not have with upcoming features that
would use search throttling.
Relates #54058
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
Fixes confusing error message when unknown value type is specified in a terms
aggregation. Adds support for parsing "numeric" and "number" value types.
Fixes#55727
We make sure to filter shard generations for indices that are missing
from the metadata when finalizing a partial snapshot (from concurrent index deletion)
but we failed to account for the case where we manually build a fake metadata instance
for snapshots without the global state.
Fixed this by handling missing indices by skipping, same way we do it for filtering the
shard generations.
Relates #50234
With this change, we will always return true for can_match requests on
idle search shards; otherwise, some shards will never get refreshed if
all search requests perform the can_match phase (i.e., total shards >
pre_filter_shard_size).
Relates #27500
Relates #50043
This commit refactors all spatial Field Mappers to a common
AbstractGeometryFieldMapper that implements shared parameter functionality
(e.g., ignore_malformed, ignore_z_value) and provides a common framework for
overriding type parsing, and building in xpack. Common shape functionality is
implemented in a new AbstractShapeGeometryFieldMapper that is reused and
overridden in GeoShapeFieldMapper, GeoShapeFieldMapperWithDocValues,
LegacyGeoShapeFieldMapper, and ShapeFieldMapper. This abstraction provides a
reusable foundation for adding new xpack features; such as coordinate reference
system support.
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
We are using `FORCE_STALE_PRIMARY_INSTANCE` in instance equality checks `==`
but were creating new instances of `ExistingStoreRecoverySource` when reading
from the wire. This could break these checks in corner cases, causing
`org.elasticsearch.cluster.routing.allocation.IndexMetadataUpdater#shardStarted`
to not remove the force allocation fake id when starting a shard.
Closes#55513
Currently QueryStringQueryParser already checks if the field limit is breached
at construction time, which e.g. leads to errors if the default field is set to
"*" or the default isn't used and there are more fields than the limit, even if the
query itself does not use all these fields.
This change moves this check to happen after query parsing. QueryStringQueryParser now
keeps track of the fields that are actually resolved while parsing. The size of
that set is later used to check against the limit set by the
`indices.query.bool.max_clause_count` setting.
Backport of #55158
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
This commit changes the way that V2 index, component, and request mappings are merged. Specifically:
- Fields are merged in a "replacement" manner, meaning that the entire definition is replaced rather
than merging the interior configuration
- Mapping metadata (all fields outside of `properties`) are merged recursively.
The merging for V1 templates does not change.
Relates to #53101
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
This change fixes problem with updating Index Templates V2.
Validatation added in #54933 didn't filter list of conflicting templates correctly so
new template was always clashing with itself unless patterns were not changed completely.
Introduces InstantiatingObjectParser which is similar to the
ConstructingObjectParser, but instantiates the object using its constructor
instead of a builder function.
Closes#52499
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.
This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
There is no guarantee the observer and subsequent CS update will execute
before we move on to the next test here and we ahve to wait for the observer + CS update cycle to complete
before moving on to the next test.
closes#55481
If some of the nodes are pre-7.8 nodes, they may not support the `index.prefer_v2_templates`
setting (since it exists only on 7.8+). This commit makes sure that we only add this setting to the
index's metadata if all of the nodes are 7.8+. Otherwise we get errors like:
```
[ WARN ][o.e.i.c.IndicesClusterStateService] [v7.7.0-3] [test-snapshot-index][2] marking and sending shard failed due to [failed to create index]
java.lang.IllegalArgumentException: unknown setting [index.prefer_v2_templates] please check that any required plugins are installed, or check the breaking changes documentation for removed settings
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:544) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:489) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.common.settings.AbstractScopedSettings.validate(AbstractScopedSettings.java:460) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndexService(IndicesService.java:595) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:549) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.IndicesService.createIndex(IndicesService.java:176) ~[elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.createIndices(IndicesClusterStateService.java:484) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:246) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
at org.elasticsearch.cluster.service.ClusterApplierService.lambda$callClusterStateAppliers$5(ClusterApplierService.java:517) [elasticsearch-7.7.0-SNAPSHOT.jar:7.7.0-SNAPSHOT]
```
Relates to #55411
Relates to #53101Resolves#55539
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
In the documentation of `pre_filter_shard_size` we state that the pre-filter
phase will be executed if the request targets more than 128 shards, yet in the
current test randomization we check that the
TransportSearchAction.shouldPreFilterSearchShards is true already for 127
targeted shards. This should be raised to 129 instead.
Closes#55514
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.
This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.
Relates #50999
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
Validate adding global V2 templates don't configure the index.hidden setting.
This also prevents updating the component template to add the index.hidden
setting if that component template is referenced by a global index template.
(cherry picked from commit 2e768981809887649f49d265d039f056985f7e6a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Peer recovery fails if the primary does not see the recovering replica
in the replication group (when the cluster state update on the primary
is delayed). To verify the local recovery stats, we have to remember
this value in the first try because the local recovery happens once, and
its stats is reset when the recovery fails.
Closes#54829
If more than 100 shard-follow tasks are trying to connect to the remote
cluster, then some of them will abort with "connect listener queue is
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover
These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.
Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.
This also adds support for sending this parameter to the High Level Rest Client.
Relates to #53101
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
* Add Snapshot Resiliency Test for Master Failover during Delete
We only have very indirect coverage of master failovers during snaphot delete
at the moment. This comment adds a direct test of this scenario and also
an assertion that makes sure we are not leaking any snapshot completion listeners
in the snapshots service in this scenario.
This gives us better coverage of scenarios like #54256 and makes the diff
to the upcoming more consistent snapshot delete implementation in #54705
smaller.
If we run into an INIT state snapshot and the current master didn't create it, it will be removed anyway.
=> no need to have that block another snapshot from starting.
This has practical relevance because on master fail-over after snapshot INIT but before start, the create snapshot request will be retried by the client (as it's a transport master node action) and needlessly fail with an unexpected exception (snapshot clearly didn't exist so it's confusing to the user).
This allowed making two disruption type tests stricter
isHidden was a `Boolean` in order to treat a special case identified
with V1 templates where if the create index request didn't specify if
the index should be hidden or not (ie. isHidden was `null`) but the
index matched a template that specified the `index.hidden` setting we
needed to remove the global templates from the templates we'll apply to
the new index (note: this is important with V1 templates as inheritance
is supported).
With V2 templates we match only one template with an index so the
equivalent check did not need to exist (we added a sanity check in
https://github.com/elastic/elasticsearch/pull/55015 where we make sure
we don't apply an invalid global template - one that specifes the
`index.hidden` setting, but this is a check we make irrespective of the
user specifying or not if the index should be hidden)
This commit makes `isHidden` when matching V2 templates a boolean
primitive, eliminating the need for the `null` state to exist. Note that
some methods which use the matching V2 templates still work with a
`Boolean` object `isHidden` attribute as they are also matching the V1
templates. These methods will pass in `false` instead of `null` when
finding the V2 templates.
(cherry picked from commit c5b923afec911c6ae8fc5179e65ae6bf55dcc5f1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).
This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class. Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
If we start unbanning when the last child task completed and that child
task executed with a specific user, then unban requests are denied
because internal requests can't run with a user. We need to remove bans
with the current thread context.
This validates that if the winner v2 template is a global one, it doesn't specify the
index.hidden setting.
(cherry picked from commit 19a97f76aac73e0455053097e5391165a9357427)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Enabled data streams and itv2 feature enabled system properties in server module's integ test task.
PR #54726 added java integration tests for data streams, so this is why these system properties
need to be enabled when running release build.
Hotfix to not run into stuck snapshots because of master circuit breaking these requests.
Given that these requests are very small and much of the memory associated with them is already allocated
when the circuit breaker kicks in, the risk of this change introducing a higher chance of master running out
of memory should be very small.
Closes#54714
We rewrite more query builders to MatchNoneQueryBuilders now, which are always
cacheable. We should make sure the tests expects this when the rewritten query
is a MatchNoneQueryBuilder.
Closes#55331
Backport from: #54726
The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.
In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.
Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).
Relates to #53100
* Add ValuesSource Registry and associated logic (#54281)
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)
* ValuesSourceRegistry Prototype (#48758)
* Remove generics from ValuesSource related classes (#49606)
* fix percentile aggregation tests (#50712)
* Basic thread safety for ValuesSourceRegistry (#50340)
* Remove target value type from ValuesSourceAggregationBuilder (#49943)
* Cleanup default values source type (#50992)
* CoreValuesSourceType no longer implements Writable (#51276)
* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)
* Put values source types on fields (#51503)
* Remove VST Any (#51539)
* Rewire terms agg to use new VS registry (#51182)
Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)
* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)
* Wire Percentiles aggregator into new VS framework (#51639)
This required a bit of a refactor to percentiles itself. Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory. This doesn't work (or at least, would be
difficult) in the new VS framework.
This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory. This object
is then used when deciding which kind of aggregator to create
Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647)
* fix checkstyle after merge (#52008)
* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)
* Convert RareTerms to new VS registry (#52166)
* Wire up Value Count (#52225)
* Wire up Max & Min aggregations (#52219)
* ValuesSource refactoring: Wire up Sum aggregation (#52571)
* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)
* Soft immutability for VSConfig (#52729)
* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)
Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)
* VS refactoring: Wire up stats aggregation (#52891)
* ValuesSource refactoring: Wire up string_stats aggregation (#52875)
* VS refactoring: Wire up median (MAD) aggregation (#52945)
* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
this commit implements `getValuesSourceType` for
the ConstantKeyword field type.
master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.
* ValuesSource refactoring: Wire up Avg aggregation (#52752)
* Wire PercentileRanks aggregator into new VS framework (#51693)
* Add a VSConfig resolver for aggregations not using the registry (#53038)
* Vs refactor wire up ranges and date ranges (#52918)
* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)
This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry
relates #42949.
* VS refactoring: convert Boxplot to new registry (#53132)
* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)
This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry
relates to the values-source refactoring meta issue #42949.
* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)
This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.
relates to the values-source refactoring meta issue #42949.
* Fix type tests for Missing aggregation (#53501)
* ValuesSource Refactor: move histo VSType into XPack module (#53298)
- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs
* Wire up DateHistogram to the ValuesSourceRegistry (#53484)
* Vs refactor parser cleanup (#53198)
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* First batch of easy fixes
* Remove List.of from ValuesSourceRegistry
Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.
* More compiler fixes
* More compiler fixes
* More compiler fixes
* Precommit is happy and so am I
* Add new Core VSTs to tests
* Disabled supported type test on SigTerms until we can backport it's fix
* fix checkstyle
* Fix test failure from semantic merge issue
* Fix some metaData->metadata replacements that got lost
* Fix list of supported types for MinAggregator
* Fix list of supported types for Avg
* remove unused import
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
The main changes are:
1. Throw an error when updating `include_in_parent` or `include_in_root` attribute of nested field dynamically by the PUT mapping API.
2. Add a test for the change.
Closes#53792
Co-authored-by: bellengao <gbl_long@163.com>
Modify the value of nowInMillis in queryShardContext to current timestamp, because the
value will be used lately when validating the filtered alias which uses now in a date_nanos
range query.
When retrieving the snapshots for a set of repos or deleting a single snapshot, it's possible for
the body of the `ActionListener`'s `onResponse` method to throw an Exception. In this case, the
`errHandler` passed in may not be executed, resulting in the `running` boolean not being reset back
to false.
This commit uses `ActionListener.wrap(...)` instead of creating a new ActionListener, which ensures
that if the `onResponse` fails in any way, the `onFailure` handler is still called.
Resolves#55217
Queries like script_score wrap a query and modify its score. If the inner query
rewrites to match_none, then the entire query can rewrite to match_none. This
lets us detect that certain shards can be skipped during the 'can match' phase.
This was a simple change that seemed like it would help in some cases. But it
will likely not have a huge impact, since in many use cases where the 'can
match' phase is helpful, the search is not sorted by score.
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.
After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
Today the voting config exclusions API accepts node filters and resolves them
to a collection of node IDs against the current cluster membership.
This is problematic since we may want to exclude nodes that are not currently
members of the cluster. For instance:
- if attempting to remove a flaky node from the cluster you cannot reliably
exclude it from the voting configuration since it may not reliably be a
member of the cluster
- if `cluster.auto_shrink_voting_configuration: false` then naively shrinking
the cluster will remove some nodes but will leaving their node IDs in the
voting configuration. The only way to clean up the voting configuration is to
grow the cluster back to its original size (potentially replacing some of the
voting configuration) and then use the exclusions API.
This commit adds an alternative API that accepts node names and node IDs but
not node filters in general, and deprecates the current node-filters-based API.
Relates #47990.
Backport of #50836 to 7.x.
Co-authored-by: zacharymorn <zacharymorn@gmail.com>
The ResourceWatcherService enables watching of files for modifications
and deletions. During startup various consumers register the files that
should be watched by this service. There is behavior that might be
unexpected in that the service may not start polling until later in the
startup process due to the use of lifecycle states to control when the
service actually starts the jobs to monitor resources. This change
removes this unexpected behavior so that upon construction the service
has already registered its tasks to poll resources for changes. In
making this modification, the service no longer extends
AbstractLifecycleComponent and instead implements the Closeable
interface so that the polling jobs can be terminated when the service
is no longer required.
Relates #54867
Backport of #54993
`auto_date_histogram`'s reduction behavior is fairly complex and we have
some fairly complex testing logic for it but it is super difficult to
look at that testing logic and say "ah, that is what it does in this
case". This adds some tests explicit (non-randomized) tests of the
reduction logic that *should* be easier to read.
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
We can be a little more efficient when aborting a snapshot. Since we know the new repository
data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners.
This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.
Snapshot deletes should first check the cluster state for an in-progress snapshot
and try to abort it before checking the repository contents. This allows for atomically
checking and aborting a snapshot in the same cluster state update, removing all possible
races where a snapshot that is in-progress could not be found if it finishes between
checking the repository contents and the cluster state.
Also removes confusing races, where checking the cluster state off of the cluster state thread
finds an in-progress snapshot that is then not found in the cluster state update to abort it.
Finally, the logic to use the repository generation of the in-progress snapshot + 1 was error
prone because it would always fail the delete when the repository had a pending generation different from its safe generation when a snapshot started (leading to the snapshot finalizing at a
higher generation).
These issues (particularly that last point) can easily be reproduced by running `SLMSnapshotBlockingIntegTests` in a loop with current `master` (see #54766).
The snapshot resiliency test for concurrent snapshot creation and deletion was made to more
aggressively start the delete operation so that the above races would become visible.
Previously, the fact that deletes would never coincide with initializing snapshots resulted
in a number of the above races not reproducing.
This PR is the most consistent I could get snapshot deletes without changes to the state machine. The fact that aborted deletes will not put the delete operation in the cluster state before waiting for the snapshot to abort still allows for some possible (though practically very unlikely) races. These will be fixed by a state-machine change in upcoming work in #54705 (which will have a much simpler and clearer diff after this change).
Closes#54766
* Remove Redundant Cluster State during Snapshot INIT + Master Failover (#54420)
Similar to #54395 we know that a snapshot in INIT state has not
written anything to the repository yet. If we see one from a master
failover, there is no point in moving it to ABORTED before removing it
from the cluster state in a subsequent CS update.
Instead, we can simply remove its job from the CS the first time
we see it on master failover and be done with it.
* Move Snapshot Status Related Method to Appropriate Places
Lots of things living in `SnapshotsService` for no reason other than
that `SnapshotsService` provides the `RepositoriesService`.
Cleaning this up to directly use `RepositoriesService` in the relevant
transport actions and by that shortening the already very complex `SnapshotsService`.
Just like in `AbstractCoordinatorTestCase` we can't just assume the cluster
is stable once all the cluster states align since stray follower/leader check
tasks could still hit us after a disconnect, causing future test operations to fail.
=> fixed by running all tasks in the possible time span of running into these
checks before validating that cluster states align on all nodes to prevent this
like we do in the coordinator tests.
Closes#55103
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate nodes stats for the cluster stats endpoint.
In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."
I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
With this change, when a task is canceled, the task manager will cancel
not only its direct child tasks but all also its descendant tasks.
Closes#50990
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.
Closes#53692
The secure_settings_password was never taken into consideration in
the ReloadSecureSettings API. This commit fixes that and adds
necessary REST layer testing. Doing so, it also:
- Allows TestClusters to have a password protected keystore
so that it can be set for tests.
- Adds a parameter to the run task so that elastisearch can
be run with a password protected keystore from source.
The usage of local parameter for GetFieldMappingRequest has been removed from the underlying transport action since v2.0.
This PR deprecates the parameter from rest layer. It will be removed in next major version.
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialiation tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.
I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
We found some problems during the test.
Data: 200Million docs, 1 shard, 0 replica
hits | avg | sum | value_count |
----------- | ------- | ------- | ----------- |
20,000 | .038s | .033s | .063s |
200,000 | .127s | .125s | .334s |
2,000,000 | .789s | .729s | 3.176s |
20,000,000 | 4.200s | 3.239s | 22.787s |
200,000,000 | 21.000s | 22.000s | 154.917s |
The performance of `avg`, `sum` and other is very close when performing
statistics, but the performance of `value_count` has always been poor,
even not on an order of magnitude. Based on some common-sense knowledge,
we think that `value_count` and sum are similar operations, and the time
consumed should be the same. Therefore, we have discussed the agg
of `value_count`.
The principle of counting in es is to traverse the field of each
document. If the field is an ordinary value, the count value is
increased by 1. If it is an array type, the count value is increased
by n. However, the problem lies in traversing each document and taking
out the field, which changes from disk to an object in the Java
language. We summarize its current problems with Elasticsearch as:
- Number cast to string overhead, and GC problems caused by a large
number of strings
- After the number type is converted to string, sorting and other
unnecessary operations are performed
Here is the proof of type conversion overhead.
```
// Java long to string source code, getChars is very time-consuming.
public static String toString(long i) {
int size = stringSize(i);
if (COMPACT_STRINGS) {
byte[] buf = new byte[size];
getChars(i, size, buf);
return new String(buf, LATIN1);
} else {
byte[] buf = new byte[size * 2];
StringUTF16.getChars(i, size, buf);
return new String(buf, UTF16);
}
}
```
test type | average | min | max | sum
------------ | ------- | ---- | ----------- | -------
double->long | 32.2ns | 28ns | 0.024ms | 3.22s
long->double | 31.9ns | 28ns | 0.036ms | 3.19s
long->String | 163.8ns | 93ns | 1921 ms | 16.3s
particularly serious.
Our optimization code is actually very simple. It is to manage different
types separately, instead of uniformly converting to string unified
processing. We added type identification in ValueCountAggregator, and
made special treatment for number and geopoint types to cancel their
type conversion. Because the string type is reduced and the string
constant is reduced, the improvement effect is very obvious.
hits | avg | sum | value_count | value_count | value_count | value_count | value_count | value_count |
| | | double | double | keyword | keyword | geo_point | geo_point |
| | | before | after | before | after | before | after |
----------- | ------- | ------- | ----------- | ----------- | ----------- | ----------- | ----------- | ----------- |
20,000 | 38s | .033s | .063s | .026s | .030s | .030s | .038s | .015s |
200,000 | 127s | .125s | .334s | .078s | .116s | .099s | .278s | .031s |
2,000,000 | 789s | .729s | 3.176s | .439s | .348s | .386s | 3.365s | .178s |
20,000,000 | 4.200s | 3.239s | 22.787s | 2.700s | 2.500s | 2.600s | 25.192s | 1.278s |
200,000,000 | 21.000s | 22.000s | 154.917s | 18.990s | 19.000s | 20.000s | 168.971s | 9.093s |
- The results are more in line with common sense. `value_count` is about
the same as `avg`, `sum`, etc., or even lower than these. Previously,
`value_count` was much larger than avg and sum, and it was not even an
order of magnitude when the amount of data was large.
- When calculating numeric types such as `double` and `long`, the
performance is improved by about 8 to 9 times; when calculating the
`geo_point` type, the performance is improved by 18 to 20 times.
Currently the remote cluster sniff connection process can succeed even
if no connections are opened. This commit fixes this by failing the
connection process if no connections are successfully opened.
This commit adds an explicit test of time zone rewrite on date nanos
field. Today this is working but we need tests to ensure that we don't
break it unintentionally.