This includes a few small cleanups for the `TermsAggregatorFactory`:
1. Removes an unused `DeprecationLogger`
2. Moves the members to right above the ctor.
3. Merges some all of the heuristics for picking `SubAggCollectionMode`
into a single method.
This saves some memory when the `histogram` aggregation is not a top
level aggregation by dropping `asMultiBucketAggregator` in favor of
natively implementing multi-bucket storage in the aggregator. For the
most part this just uses the `LongKeyedBucketOrds` that we built the
first time we did this.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
Unfortunately, we cannot have a safety mechnism like this where we throw whenever we find unreadable data in a shard.
This breaks in the case of an older ES version (without shard generations enabled) having failed to snapshot a shard snapshot after writing some data to its path and having finalized it for example.
Another example of where we can't support this check is the test I added, if we snapshot an index with a name that already exists in the repository and more shards than the existing index, fail doing that and then retry snapshotting it we will also see unexpected data in the path.
We could technically do deeper inspections on the unexpected data but I don't think it's worth it really. In the end if we are unable to read the data here it's broken anyway. By moving to a new `index-` blob in the shard directory I don't see us ever
corrupting existing data and since we (by virtue of moving to an empty generation) won't do any incremental work on top of potentially corrupt data we also do not risk creating broken snapshots going forward.
=> Just logging a warning in this very unlikely case is the best we can do I think
When the parameter `max_docs` is less than `slices` in update_by_query,
delete_by_query or reindex API, `max_docs ` is set to 0 and we throw an
action_request_validation_exception with confused error message:
"maxDocs should be greater than 0...".
This change checks that whether `max_docs` is less than `slices` and
throw an illegal_argument_exception with clear message.
Relates to #52786.
Co-authored-by: bellengao <gbl_long@163.com>
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Jackson 2.10 library has added a new type of error that is thrown when a numeric value is out
of range. This error should be catch and handle properly in case the flag ignore_malformed
has been set to true.
When the `terms` enum operates on non-numeric data it can collect it via
global ordinals. It actually has two separate collection strategies for,
one "dense" and one "remapping". Each of *those* strategies has two
"iteration" strategies that it uses to build buckets, depending on
whether or not we need buckets with `0` docs in them. Previously this
was done with several `null` checks and never really explained. This
change replaces those checks with two `CollectionStrategy` classes which
have good stuff like documentation.
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
If an upgraded node is restarted multiple times without flushing a new
index commit, then we will wrongly exclude all commits from the starting
commits. This bug is reproducible with these minimal steps: (1) create
an empty index on 6.1.4 with translog retention disabled, (2) upgrade
the cluster to 7.7.0, (3) restart the upgraded the cluster. The problem
is that with the new translog policy can trim translog without having a
new index commit, while the existing commit still refers to the previous
translog generation.
Closes#57091
This saves memory when running numeric significant terms which are not
at the top level by merging its collection into numeric terms and relying
on the optimization that we made in #55873.
Slow loggers should use single shared logger as otherwise when index is
deleted the log4j logger will remain reachable (log4j is caching) and
will create a memory leak.
closes https://github.com/elastic/elasticsearch/issues/56171
When we had multiple mapping types, an update to a field in one type had to be
propagated to the same field in all other types. This was done using the
Mapper.updateFieldType() method, called at the end of a merge. However, now
that we only have a single type per index, this method is unnecessary and can
be removed.
Relates to #41059
Backport of #56986
Closes#57168 by using `AggregatorTestCase#newIndexSearcher` in the
`AggregatorTestCase#testCase`. Without that global ordinals will
*sometimes* fail to work.
Until 7.7 we used to ignore `null` values for `bool`queries `minimum_should_match`,
parameters and also for the `must`, `must_not`, `should` and `filter` clauses.
An internal refactoring has changed this so now we get a parsing error. While `null`
should not a common value here, we should restore the old behaviour for bwc for now.
Closes#56812
If a partial snapshot has some of its shards aborted because an index got deleted, this can lead to confusing `IllegalStateExceptions` when trying to increment the ref count of the already closed `Store`.
Refactored this a little to throw the same exception for aborted shards no matter the timing of the store close and assert that the concurrent store close can in fact only happen when the shard snapshot has already been aborted.
A task might not be canceled on disconnection if it is completed before the cancellation
is started. We need to relax the assertion in this test.
Closes#56746
When slicing a releasable bytes reference we would create a new counter
every time and pass the original reference chain to the new slice on every
slice invocation. This would lead to extremely deep reference chains and
needlessly uses a dedicated counter for every slice when all the slices
eventually just refer to the same underlying bytes and `Releasable`.
This commit tracks the ref count wrapper with its releasable in a separate
object that can be passed around on every slicing, making the slices' tree
as flat as the original releasable bytes reference.
Also, we were needlessly creating a redundant releasable bytes reference from
a releasable bytes-stream-output that we never actually used for releasing (all code
that uses it just releases the stream itself instead).
We don't need to hold on to the request body past the beginning of sending
the response. There is no need to keep a reference to it until after the response
has been sent fully and we can eagerly release it here.
Note, this can be optimized further to release the contents even earlier but for now
this is an easy increment to saving some memory on the IO pool.
* Update DeprecationMap to DynamicMap (#56149)
This renames DeprecationMap to DynamicMap, and changes the deprecation
messages Map to accept a Map of String (keys) to Functions (updated values)
instead. This creates more flexibility in either logging or updating values from
params within a script. This change is required to fix (#52103) in a future PR.
* Fix Source Return Bug in Scripting (#56831)
This change ensures that when a user returns _source directly no matter where
accessed within scripting, the value is a Map of the converted source as
opposed to a SourceLookup.
These log statements are also logged by every "simulate adding this index"
functionality. One of them is the rollover action in ILM which executes
rollover dry-runs until the conditions are met, when the actual rollover
is executed.
This changes the statements log level to DEBUG and changes the phrasing
from V1/V2 to legacy/composable templates.
(cherry picked from commit 7cc8e1fe7f9731213ac4869fe99853564fbaaba9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
Today a transport response uses the same wire format version as the
corresponding request. This mostly works since we mostly know we are
communicating with a node with a compatible version. TCP handshakes don't have
this guarantee since they use `Version.CURRENT.minimumCompatibilityVersion()`
to let us handshake with older nodes. This results in the strange situation of
a node of major version `N` responding to a node of major version `N-1` using a
wire format of version `N-2`.
We put extra effort into the longer BWC requirements for successful responses,
but we do not offer the same guarantees for error responses since they may be
rather complicated to serialize. This can result in the request sender
misinterpreting the response which may have unpredictable consequences.
Rather than strengthening the guarantees in this area, this commit simply logs
the exception and closes the connection on a handshake error with a node that
uses an incompatible wire format.
Closes#54337
All of these files were written by us, and not sourced from
anywhere. Therefore, the license head should be granting licenses to
Elasticsearch, rathern than to the ASF. This commit address them by
changing the license to our standard Apache 2.0 license header.
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
When `date_histogram` is a sub-aggregator it used to allocate a bunch of
objects for every one of it's parent's buckets. This uses the data
structures that we built in #55873 rework the `date_histogram`
aggregator instead of all of the allocation.
Part of #56487
Elasticsearch requires that a HttpRequest abstraction be implemented
by http modules before server processing. This abstraction controls when
underlying resources are released. This commit moves this abstraction to
be created immediately after content aggregation. This change will
enable follow-up work including moving Cors logic into the server
package and tracking bytes as they are aggregated from the network
level.
PR #56893 was supposed to randomise the iteration count in
`testDataOnlyNodePersistence` but this change was mistakenly omitted. This
commit addresses this.
This test failed if all 1000 top-level `rarely()` calls in the loop returned
`false`, because then we would never set the term of the persisted state. This
commit fixes this by adding an earlier call to `persistedState#setCurrentTerm`.
It also changes the test to clean up the threadpools it starts whether it
passes or fails.
When reading/writing the individual doc responses in the context
of a bulk shard response there is no need to serialize the `ShardId`
over and over. This can waste a lot of memory when handling large bulk
requests.
This assertion is too strict. A snapshot will be removed from the cluster state
on the CS thread before it is removed from the listeners map on the snapshot thread pool.
Throughout the removal from the cluster state and listener map, the snapshot is tracked
in `endingSnapshots` though, so we can relax the assertion accordingly and are still able
to catch leaked listeners.
Closes#56607
In the unlikely event that the data nodes started snapshotting the
shards already (and hence got blocked on the data blobs) before the
master has applied the cluster state to its own `SnapshotsService` on
the CS applier thread, we can get a `SnapshotMissingException` here which
breaks the busy assert loop so we have to deal with it explicitly.
Closes#56858
Currently it is possible that a sniff connection round is occurring as
we enter another test loop in testEnsureWeReconnect. The problem is that
once we enter another loop, closing the connection manually can cause
this pre-existing connection round to fail. This round failing can fail
the test. This commit fixes the issue by ensuring that there are no
in-progress connections before entering another loop.
It was relying on the compensated sum working but the test framework was
dodging it. This forces the accuracy tests to come from a single shard
where we get the proper compensated sum.
Closes#56757
We get the number of shards and replicas with our bare hands in index
metadata, rather than letting the settings infrastructure do the work
for us. This commit switches to using the settings infrastructure.
Today a 7.x node logs `cluster UUID set to [...]` on every cluster state update
received from a 6.8 master, because 6.8 nodes are not able to commit the
cluster UUID properly. We could try and deduplicate these logs somehow, but
that would introduce a good deal of complexity. Instead, this commit suppresses
these logs entirely when receiving cluster state updates from a 6.8 master.
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
If a channel gets disconnected, then we should cancel the tasks
associated with that channel as their results won't be retrieved.
Closes#56327
Relates #56619
Backport of #56620
We previously rejected removing the number of replicas setting, which
prevents users from reverting this setting to its default the natural
way. To fix this, we put back the setting with the default value in the
cases that the user is trying to remove it. Yet, we also need to do the
work of updating the routing table and so on appropriately. This case
was missed because when the setting is being removed, we were defaulting
to -1 in this code path, which is treated as not being updated. Instead,
we must treat the case when we are removing this setting as if the
setting is being updated, too. This commit does that.
In normal operation native controllers are not expected to write
anything to stdout or stderr. However, if due to an error or
something unexpected with the environment a native controller
does write something to stdout or stderr then it will block if
nothing is reading that output.
This change makes the stdout and stderr of native controllers
reuse the same stdout and stderr as the Elasticsearch JVM (which
are by default redirected to es.stdout.log and es.stderr.log) so
that if something unexpected is written to native controller
output then:
1. The native controller process does not block, waiting for
something to read the output
2. We can see what the output was, making it easier to debug
obscure environmental problems
Backport of #56491
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.
Precondition for the terms work in #56487
When using source filtering exclusions, empty arrays are not preserved in documents, and no empty arrays are returned if arrays are empty after applying exclusions. We have special treatment to make sure that we preserve empty objects, but the behaviour for arrays is different.
It looks like this regression was introduced by #22593, shortly after we refactored source filtering to use automata (#20736).
Note that this change affects what the search API returns when using source exclusions, as well as what gets indexed when using source exclusions for the _source field.
Closes#23796
This adds a few things to the `breakdown` of the profiler:
* `histogram` aggregations now contain `total_buckets` which is the
count of buckets that they collected. This could be useful when
debugging a histogram inside of another bucketing agg that is fairly
selective.
* All bucketing aggs that can delay their sub-aggregations will now add
a list of delayed sub-aggregations. This is useful because we
sometimes have fairly involved logic around which sub-aggregations get
delayed and this will save you from having to guess.
* Aggregtations wrapped in the `MultiBucketAggregatorWrapper` can't
accurately add anything to the breakdown. Instead they the wrapper
adds a marker entry `"multi_bucket_aggregator_wrapper": true` so we
can be quickly pick out such aggregations when debugging.
It also fixes a bug where `_count` breakdown entries were contributing
to the overall `time_in_nanos`. They didn't add a large amount of time
so it is unlikely that this caused a big problem, but I was there.
To support the arbitrary breakdown data this reworks the profiler so
that the `breakdown` can contain any data that is supported by
`StreamOutput#writeGenericValue(Object)` and
`XContentBuilder#value(Object)`.
This PR proposes to use `IndexSortSortedNumericDocValuesRangeQuery` when
possible to speed up certain range queries. Points-based queries are already
very efficient, the only time this query makes a difference is when the range
matches a large number of documents.
Relates to #48665.
This is similar to a previous change that allowed removing the number of
replicas settings (so setting it to its default) on open indices. This
commit allows the same for closed indices.
It is unfortunate that we have separate branches for handling open and
closed indices here, but I do not see a clean way to merge these two
together without making a rather unnatural method (note that they invoke
different methods for doing the settings updates). For now, we leave
this as-is even though it led to the miss here.
Today a user can create an index without setting the
index.number_of_replicas setting even though the index metadata requires
that the setting has a value. We do this when creating an index by
explicitly settings index.number_of_replicas to a default value if one
is not provided. However, if a user updates the number of replicas, and
then let wants to return to the default value, they are naturally
inclined to try setting this setting to null, as the agreed upon way to
return a setting to its default. Since the index metadata requires that
this setting has a non-null value, we blow up when a user attempts to
make this change. This is because we are not taking the same action when
updating a setting on an index that we take when create an
index. Namely, we are not explicitly setting index.number_of_replicas if
the request does not carry a value for this setting. This would happen
when nulling the setting, which we want to support. This commit
addresses this by setting index.number_of_replicas to the default if the
value for this setting is null when updating the settings for an index.
Currently the `time_zone` parameter in `query_string` queries gets applied
correctly only when using the range syntax, e.g "date:[2020-01-02 TO
2020-01-05]. When a date field gets searched without explicit range syntax, e.g.
"date:"2020-01-01" we internally create a range query than uses the specified
date as start date and rounds up to the next underspecified units for the end
date (e.g. here 2020-01-01T23:59:59) without considering the `time_zone`
settings. This change adds a check in QueryStringQueryParser to detect this
scenario early where we have access to the time zone information and directly
create a range query using it.
Closes#55813
In a race condition, a search context could remain enlisted in
SearchService when an index is deleted, potentially causing the index
folder to not be cleaned up (for either lengthy searches or scrolls with
timeouts > 30 minutes or if the scroll is kept active).
If a conditional is added to a processor, and that processor fails, and
that processor has an on_failure handler, the full trace of all of the
executed processors may not be displayed in simulate verbose. The
information is correct, but misses displaying some of the steps used
to get there.
This happens because a processor that is conditional processor is a
wrapper around the real processor and a processor with an on_failure
handler is also a wrapper around the processor(s). When decorating for
simulation we treat compound processor specially, but if a compound processor
is wrapped by a conditional processor that compound processor's processors
can be missed for decoration resulting in the missing displayed steps.
The fix to this is to treat the conditional processor specially and
explicitly seperate it from the processor it is wrapping. This requires
us to keep track of 2 processors a possible conditional processor and
the actual processor it may be wrapping.
related: #56004
Two spots that allow for some optimization:
* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
* Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
Today the heap size check warns the user about two issues why they might
care about the heap size check: resize pauses, and if memory locking is
enabled. Yet, we unconditionally make mention of the memory locking
reason, even if memory locking is not enabled. This can confuse some
users, so we adjust the warning about memory locking to only display if
memory locking is enabled.
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
Move data stream resolvability test from IndicesOptionsIntegrationIT to DataStreamIT class.
Whether a transport action supports data streams is no longer controlled via indices options.
This adds support for parsing numbers as range keys. They get converted
into a string, but we allow numbers.
While I was there I replaced the parser for `Range` with a
`ConstructingObjectParser` which will automatically add support for "did
you mean" style corrections on errors.
Closes#56402
This commit refactors the following:
* GeoPointFieldMapper and PointFieldMapper to
AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
* .setupFieldType moved up to AbstractGeometryFieldMapper
* lucene indexing moved up to AbstractGeometryFieldMapper.parse
* new addStoredFields, addDocValuesFields abstract methods for implementing
stored field and doc values field indexing in the concrete field mappers
This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.
This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.
Relates to #53100
This wires `auto_date_histogram` into the rounding optimization that I
built in #55559. This is should significantly speed up any
`auto_date_histogram`s with `time_zone`s on them.
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:
1. We didn't have an appropriate data structure to track the
sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
per `Aggregator` which was the only way to delay collection of
sub-aggregations.
This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.
I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
When an index spans a daylight savings time transition we can't use our
optimization that rewrites the requested time zone to a fixed time zone
and instead we used to fall back to a java.util.time based rounding
implementation. In #55559 we optimized "time unit" rounding. This
optimizes "time interval" rounding.
The java.util.time based implementation is about 1650% slower than the
rounding implementation for a fixed time zone. This replaces it with a
similar optimization that is only about 30% slower than the fixed time
zone. The java.util.time implementation allocates a ton of short lived
objects but the optimized implementation doesn't. So it *might* end up
being faster than the microbenchmarks imply.
Use proper facility for creating temporary index service for the simulation
that does not add itself to the `IndicesService` unnecessarily (breaking an assertion about the
internal consistency of the cluster state and the `IndicesService`).
Closes#56298
Backport of: #56413
Allow cluster health api to resolve data streams and
automatically remove data streams after each test in
test cases extending from `ESIntegTestCase`
Relates to #53100
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.
Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.
In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.
Since searchable snapshots already contains the footer information
of every file in BlobStoreIndexShardSnapshot it can directly read the
checksum from it and avoid to use the cache at all to create a
MetadataSnapshot for the operations mentioned above.
This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
A bug in InternalGeoCentroid#reduce existed that summed up
the aggregation's long-valued counts into a local integer variable.
Since it is definitely possible to reduce more than Integer.MAX points,
this change simply updates that variable to be a long-valued number.
Closes#55992.
Currently, the logging around the SniffConnectionStrategy is limited.
The log messages are inconsistent and sometimes wrong. This commit
cleans up these log message to describe when connections are happening
and what failed if a step fails.
Additionally, this commit enables TRACE logging for a problematic test
(testEnsureWeReconnect).
Currently when a connection closes a new sniff round begins. The
testCollectNodes test closes four transports before triggering the
method to collect the remote nodes. This leads to a race where there are
a number of reasons the collect nodes call might fail. This commit fixes
that issue by changing the test assertion to include a potential failure
condition.
Fixes#55292.
`auto_date_histogram` was returning the incorrect `interval` because
of a combination of two things:
1. When pipeline aggregations rewrote `auto_date_histogram` we reset the
interval to 1. Oops. Fixed that.
2. *Every* bucket aggregation was rewriting its buckets as though there
was a pipeline aggregation even if there aren't any. This is a bit
silly so we skip that too.
Closes#56116
We fail to unregister the child node in registerAndExecute if the parent
task is being canceled. This leads to a bug where a cancel request never
completes.
Closes#55875
Relates #54312
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.
Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)
I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.
This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
We were logging the cleanup of the snap- and meta- blobs for every snapshot delete
which is needlessly noisy and confusing to users. We should only log actual stale/unexpected
blobs here.
This commit creates a new gradle plugin to provide a separate task name
and source set for running ESIntegTestCase tests. The only project
converted to use the new plugin in this PR is server, as an example. The
remaining cases in x-pack will be handled in followups.
backport of #55896
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
A FilterBlobContainer class was introduced in #55952 and it delegates
its behavior to a given BlobContainer while allowing to override
only necessary methods.
This commit replaces the existing BlobContainerWrapper class from
the test framework with the new FilterBlobContainer from core.
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.
it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
This commit changes searchable snapshots so that it now respects the
repository's max_restore_bytes_per_sec setting when it downloads blobs.
Backport of #55952 for 7.x
This arose after a backport where we do not have the nicities of the
Java 11 diamond operator. This commit fixes it by adding the proper type
parameter.
When the index we are validating a query does not exist, we try to send
back a response letting the client know that the index does not
exist. Yet, we accidentally fallthrough into the case that the
validation failed for some other reason. This means that we end up
notifying the channel twice. Sometimes the notification occurs after the
failure has been written out and the channel closed (so the second
invocation leads to a silent failed to write to a closed channel issue),
and sometimes the response does end up in the channel, creating garbled
responses to the client. This commit fixes that issue by avoiding the
fallthrough.
Backport of #56034.
Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.
Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.
Relates to #53100
Backport of #55858 to 7.x branch.
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.
This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377
* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:
AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
It's possible for a constant_keyword to have a 'null' value before any documents
are seen that contain a value for the field. In this case, no documents have a
value for the field, and 'exists' queries should return no documents.
Making use of #55773 to simplify snapshot state machine.
1. Deletes with no in-progress snapshot now add the delete entry to the cluster state right away
instead of doing a second CS update after the fist update was a NOOP.
2. If a bulk delete matches in-progress as well as completed snapshots, abort the in-progress snapshot
and then move on to delete from the repository.
Backports #55933 to 7.x
Implements value_count and avg aggregations over Histogram fields as discussed in #53285
- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
* Allow Deleting Multiple Snapshots at Once (#55474)
Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
I see occasional confusion about the explanations emitted by the same-shard
allocation decider, particularly amongst new users setting up a single-node
cluster and trying to determine why their cluster has `yellow` health. For
example:
the shard cannot be allocated to the same node on which a copy of the shard
already exists
This is technically correct but it's quite a complicated sentence. Also, by
starting with "the shard cannot be allocated" it makes it sound like this is
the problem, whereas in fact this message is a good thing and users should
typically focus their attention elsewhere.
This commit simplifies the wording of these messages and makes them sound more
positive, for example:
a copy of this shard is already allocated to this node
In order to iterate through remote connections, the remote connection
manager maintains a local cache of connected nodes. Unfortunately this
is difficult in relationship with testing as it is inherently racy in
comparison to the parent connection manager map of connections.
This commit improves the relationship by only returning a cached
connection if it is still registered with the parent. If the connection
is not open, we will go to the slow path of allocating a iterator
directly from the parent.
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)
* Emit deprecation warning if multiple v1 templates match with a new index
* DEPRECATION_LOGGER rename
This commit includes a number of minor improvements around `DelayableWriteable`: javadocs were expanded and reworded, `get` was renamed to `expand` and `DelayableWriteable` no longer implements `Supplier`. Also a couple of methods are now private instead of package private.
We generate a circular reference exception in translog in 6.8 in the
following scenario:
- The first rollGeneration hits "too many open files" exception when it's
copying a checkpoint file. We will set the tragic exception and close
the translog
- The second rollGeneration hits AlreadyClosedException as the current
writer is closed. We will suppress the ACE to the current tragic
exception. Unfortunately, this leads to a circular reference as ACE
already suppresses the tragic exception.
Other factors that help to manifest this bug:
- We do not fail the engine on AlreadyClosedException in flush
- We do not check for ensureOpen before rolling a new generation
Closes#55893
Ocassionally this test can fail when the randomized index.version.created is
before 6.1. In this case we don't check that if mappedFields.size() == 0 we
expect a MatchNoDocsQuery query being returned, which we do for other versions.
This fails only occasionally but with the seed provided on the original issue.
It also shouldn't be an issue on master since we shouldn't test with these pre-7
index versions there.
Closes#55950
Backports #55826 to 7.x
Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted.
Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase()
Fixes#55824
Currently if we shortcircuit a message the breaker release is null since
there is nothing to be broken. However, the TcpTransportChannel
infrastructure still expects it. This commit resolves this issue be
returning a no-op breaker release.
Currently `currentFieldType` is an instance variable that is first set and then
used by all methods referring to it. We can make it local to each method
instead, avoiding possible state problems and improve readability of the code
instead.
This adds a new api to simulate matching the given index name against the
index templates in the system.
The syntax for the new API takes the following form:
POST _index_template/_simulate_index/{index_name}
{
"index_patterns": ["logs-*"],
"priority": 15,
"template": {
"settings": {
"number_of_shards": 3
}
...
}
}
Where the body is optional, but we support the entire body used by the
PUT _index_template/{name} api. When the body is specified we'll simulate
matching the given index against a system that'd have the given index
template together with the index templates that exist in the system.
The response, in both cases, will return the matching template's resolved
settings, mappings and aliases, together with a special field that'll print any
overlapping templates and their corresponding index patterns.
(cherry picked from commit 1a5845edce1f445c58e094e9a3b6792e21e543b0)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
Backports #55681 to 7.x
Today if state recovery is delayed by the `gateway.recover_after_*` settings
then we may end up performing state recovery twice: once when enough nodes have
joined the cluster, and again when the timeout elapses. The second state
recovery reinitializes the routing table, effectively discarding all
recovered/recovering shards and starting again from scratch. This commit adds a
check to prevent this second state recovery.
Closes#55564
Currently the testTransientErrorsDuringRecoveryAreRetried validates that
the expected peer recovery starts only once. This check is coarse and is
executed on all nodes and indexes. This commit modifies this check to
only be performed on the expected index. Additionally this commit
removes the disruption behavior from the "blue" node where it is not
relevant. Finally, this commit improves the logging for this test.
This replaces a reference to the result of partially reducing
aggregations that async search keeps with a reference to the serialized
form of the result of the partial reduction which we need to keep
anyway.
When calling scripts in metric aggregation, the returned metric state is
passed along to the coordinating node to do the final reduce. However,
it is possible the object could contain nested state which is unknown to
StreamOutput/StreamInput. This would then result in the node crashing as
exceptions are not expected in the middle of serialization.
This commit adds a method to StreamOutput that can determine if an
object is writeable by the stream. It uses the same logic
writeGenericValue, special casing each of the supported collection types
to recursively determine if each contained value is itself writeable.
relates #54708
Currently cancelling the RetryableAction does not stop one last run from
being executed. This commit makes a best effort attempt to cancel a
scheduled retry and guards future executions from the action already
being completed.
Currently a failed peer recovery action will fail an recovery. This
includes when the recovery fails due to potentially short lived
transient issues such as rejected exceptions or circuit breaking
errors.
This commit adds the concept of a retryable action. A retryable action
will be retryed in face of certain errors. The action will be retried
after an exponentially increasing backoff period. After defined time,
the action will timeout.
This commit only implements retries for responses that indicate the
target node has NOT executed the action.
If we advance the global checkpoint during commit and sync that
checkpoint after commit, then the assertions in the test won't hold
because the deletion policy did not see the latest global checkpoint
but only the value before committing.
Closes#55680
The disk decider had special handling for the single data node case,
allowing any allocation (skipping watermark checks) for such clusters.
This special handling can now be avoided via a setting.
If a node client (or rather its underlying node) is closed then
any executions on it will just quietly fail as happens in #55660
via closing the nodes on the test thread and asynchronously using
a node client.
Closes#55660
For 7.x, we already implemented the `?prefer_v2_templates` flag and made V2 templates opt-in, so we
can relax the error when updating V1 templates to just a warning. This will still be a hard error
for 8.0+
Relates to #53101
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.
Filtering out throttled indices made sense a couple years ago, but as
we're now improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap, that we'd like to not have with upcoming features that
would use search throttling.
Relates #54058
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
Fixes confusing error message when unknown value type is specified in a terms
aggregation. Adds support for parsing "numeric" and "number" value types.
Fixes#55727
We make sure to filter shard generations for indices that are missing
from the metadata when finalizing a partial snapshot (from concurrent index deletion)
but we failed to account for the case where we manually build a fake metadata instance
for snapshots without the global state.
Fixed this by handling missing indices by skipping, same way we do it for filtering the
shard generations.
Relates #50234
With this change, we will always return true for can_match requests on
idle search shards; otherwise, some shards will never get refreshed if
all search requests perform the can_match phase (i.e., total shards >
pre_filter_shard_size).
Relates #27500
Relates #50043
This commit refactors all spatial Field Mappers to a common
AbstractGeometryFieldMapper that implements shared parameter functionality
(e.g., ignore_malformed, ignore_z_value) and provides a common framework for
overriding type parsing, and building in xpack. Common shape functionality is
implemented in a new AbstractShapeGeometryFieldMapper that is reused and
overridden in GeoShapeFieldMapper, GeoShapeFieldMapperWithDocValues,
LegacyGeoShapeFieldMapper, and ShapeFieldMapper. This abstraction provides a
reusable foundation for adding new xpack features; such as coordinate reference
system support.
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
We are using `FORCE_STALE_PRIMARY_INSTANCE` in instance equality checks `==`
but were creating new instances of `ExistingStoreRecoverySource` when reading
from the wire. This could break these checks in corner cases, causing
`org.elasticsearch.cluster.routing.allocation.IndexMetadataUpdater#shardStarted`
to not remove the force allocation fake id when starting a shard.
Closes#55513
Currently QueryStringQueryParser already checks if the field limit is breached
at construction time, which e.g. leads to errors if the default field is set to
"*" or the default isn't used and there are more fields than the limit, even if the
query itself does not use all these fields.
This change moves this check to happen after query parsing. QueryStringQueryParser now
keeps track of the fields that are actually resolved while parsing. The size of
that set is later used to check against the limit set by the
`indices.query.bool.max_clause_count` setting.
Backport of #55158
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
This commit changes the way that V2 index, component, and request mappings are merged. Specifically:
- Fields are merged in a "replacement" manner, meaning that the entire definition is replaced rather
than merging the interior configuration
- Mapping metadata (all fields outside of `properties`) are merged recursively.
The merging for V1 templates does not change.
Relates to #53101
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
This change fixes problem with updating Index Templates V2.
Validatation added in #54933 didn't filter list of conflicting templates correctly so
new template was always clashing with itself unless patterns were not changed completely.
Introduces InstantiatingObjectParser which is similar to the
ConstructingObjectParser, but instantiates the object using its constructor
instead of a builder function.
Closes#52499
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.
This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.