Even if we increase the limit it might not take effect straight away if a thread is
blocked on a long wait in `org.elasticsearch.index.snapshots.blobstore.RateLimitingInputStream#maybePause`.
Let's increase the limit a little and see if that deals with the remaining failures for good and stop burning
cycles busy asserting a future completion.
Closes#63246
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way.
Just a few spots where we can dry up these tests using the snapshot test infrastructure
in core that I found while studying the existing searchable snapshot tests.
In #62509 we already plugged faster sequential access for stored fields in the fetch phase.
This PR now adds using the potentially better field reader also in SourceLookup.
Rally exeriments are showing that this speeds up e.g. when runtime fields that are using
"_source" are added e.g. via "docvalue_fields" or are used in queries or aggs.
Closes#62621
In 6x and 7x, indexes can have only one type, which means that we can rework
all queries against the type field to use a ConstantFieldType. This has already
been done in master with the removal of the TypeFieldMapper, but we still need
that class in 7x to deal with nested documents. This commit leaves
TypeFieldMapper in place, but refactors TypeFieldType to extend
ConstantFieldType and consolidates deprecation warnings within that class.
It also incidentally removes the requirement to pass a MapperService to
IndexFieldData.Builder#build, which should allow #63197 to be backported.
There is no need to let snapshots that haven't yet written anything to the repo
finalize with `FAILED`. When we still had the `INIT` state we would also just remove
these snapshots from the state without any further action.
This is not just a theoretical optimization. Currently, the situation of having a lot of
queued up snapshots is fairly complicated to resolve when all the queued shards move to aborted
since it is now necessary to execute tasks on the `SNAPSHOT` pool (that might be very busy) to
remove the snapshot from the CS (including a number of redundant CS updates and repo writes
for finalizing these snapshots before deleting them right away after).
If the connection between clusters is disconnected or the leader cluster
is offline, then CCR shard-follow tasks can stop with "no seed node
left". CCR should retry on this error.
The copy constructors previously used were hard to read and the exact state changes
were not obvious at all.
Refactored those into a number of named constructors instead, added additional assertions
and moved the snapshot abort logic into `SnapshotsInProgress`.
In #63242 we changed how we build `nextRoundingValue` to, well, be
correct. But the old `org.elasticsearch.common.rounding.Rounding`
implementation didn't get the fix. Which is fine, because it doesn't
that method on that implementation doesn't receive any use outside of
tests. In fact, it is entirely removed in master. Anyway, now that the
two implementation produce different values we really can't go around
asserting that they produce the same values now can we? Well, we were!
This skips that assertion if we know `nextRoundingValue` is implemented
differently.
Closes#63256
* Setting `script.painless.regex.enabled` has a new option,
`use-factor`, the default. This defaults to using regular
expressions but limiting the complexity of the regular
expressions.
In addition to `use-factor`, the setting can be `true`, as
before, which enables regular expressions without limiting them.
`false` totally disables regular expressions, which was the
old default.
* New setting `script.painless.regex.limit-factor`. This limits
regular expression complexity by limiting the number characters
a regular expression can consider based on input length.
The default is `6`, so a regular expression can consider
`6` * input length number of characters. With input
`foobarbaz` (length `9`), for example, the regular expression
can consider `54` (`6 * 9`) characters.
This reduces the impact of exponential backtracking in Java's
regular expression engine.
* add `@inject_constant` annotation to whitelist.
This annotation signals that a compiler settings will
be injected at the beginning of a whitelisted method.
The format is `argnum=settingname`:
`1=foo_setting 2=bar_setting`.
Argument numbers must start at one and must be sequential.
* Augment
`Pattern.split(CharSequence)`
`Pattern.split(CharSequence, int)`,
`Pattern.splitAsStream(CharSequence)`
`Pattern.matcher(CharSequence)`
to take the value of `script.painless.regex.limit-factor` as a
an injected parameter, limiting as explained above when this
setting is in use.
Fixes: #49873
Backport of: 93f29a4
We only ever use this with `XContentParser` no need to make it inline
worse by forcing the lambda and hence dynamic callsite here.
=> Extraced the exception formatting code path that is likely very cold
to a separate method and removed the lambda usage in hot loops by simplifying
the signature here.
Small refactoring to shorten the diff with the clone logic in #61839:
* Since clones will create a different kind of shard state update that
isn't the same request sent by the snapshot shards service (and cannot be
the same request because we have no `ShardId`) base the shard state updates
on a different class that can be extended to be general enough to accomodate
shard clones as well.
* Make the update executor a singleton (can't make it an inline lambda as that
would break CS update batching because the executor is used as a map key but
this change still makes it crystal clear that there's no internal state to the
executor)
* Make shard state update responses a singleton (can't use TransportResponse.Empty because
we need an action response but still it makes it clear that there's no actual
response with content here)
* Just some obvious drying up of these super complex tests.
* Mainly just shortening the diff of #61839 here by moving test utilities
to the abstract test case.
Also, making use of the now available functionality to simplify existing tests
and improve logging in them.
"interval" style roundings were implementing `nextRoundingValue` in a
fairly inconsistent way - it'd produce a value, but sometimes that
value would be the same as the previous rounding value. This makes it
consistently the next value that `rounding` would make.
As long as `bestEffortConsistency` is `true`, the value of `latestKnownRepoGen`
can be updated as a result of reads. We can only assert that `latestKnownRepoGen`
and cluster state move in lock-step if `bestEffortConsistency` was `false` before
updating the metadata generation as well as after.
Closes#62877
There is a small race when processing the cluster state that is used to
establish a newly elected leader as master of the cluster: it can pick the term
in its master state update task from a different (newer) election. This trips
an assertion in `Coordinator.publish(...)` where we claim that the term on the
state allows to uniquely define the pre-state but this isn't so. There are no
bad consequences of this race since such a publication fails later on anyway.
This PR fixes things so that the assertion holds true by improving the handling
of terms during cluster state processing by associating each master state
update task that is used to establish a newly elected leader with the correct
corresponding term from its election. It also explicitly handles the case where
the pre-state that is used as base state has already superseded the current
state. As a nice side-effect, join batching now only happens based on the same
term.
Closes#61437
The iteration over `timeoutClusterStateListeners` starts when the CS applier
thread is still running. This can lead to entries being added to it that never
get their listener resolved on shutdown and thus leak that listener as observed
in a stuck test in #62863.
Since `listener.onClose()` is idempotent we can just call it if we run into a stopped service
on the CS thread to avoid the race with certainty (because the iteration in `doStop` starts after
the stopped state has been set).
Closes#62863
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
In #22721, the decision to throttle indexing was inadvertently flipped,
so that we until this commit throttle indexing during recovery but
never throttle user initiated indexing requests. This commit
fixes that to throttle user initiated indexing requests and never
throttle recovery requests.
Closes#61959
Backport #63170 to 7.x branch.
The _index field is a special field that allows using queries against the name of an index or alias.
Data stream names were not included, this pr fixes that by changing SearchIndexNameMatcher
(which used via IndexFieldMapper) to also include data streams.
Splitting some tests out of this class that has become a catch-all
for random snapshot related tests into either existing suits that fit
better for these tests or one of two new suits to prevent timeouts
in extreme cases (e.g. `WindowsFS` + many nodes + multiple data paths per node).
No other changes to tests were made whatsoever.
Closes#61541
7.x client can pass media type with a version which will return a 7.x
version of the api in ES 8.
In ES server 7 this media type shoulld be accepted but it serve the same
version of the API (7x)
relates #61427
* Add System Indices check to AutoCreateIndex
By default, Elasticsearch auto-creates indices when a document is
submitted to a non-existent index. There is a setting that allows users
to disable this behavior. However, this setting should not apply to
system indices, so that Elasticsearch modules and plugins are able to
use auto-create behavior whether or not it is exposed to users.
This commit constructs the AutoCreateIndex object with a reference to
the SystemIndices object so that we bypass the check for the user-facing
autocreate setting when it's a system index that is being autocreated.
We also modify the logic in TransportBulkAction to make sure that if a
system index is included in a bulk request, we don't skip the
autocreation step.
It extracts the query capabilities from AbstractGeometryFieldType into two new interfaces, GeoshapeQueryable and ShapeQueryable. Those interfaces are implemented by the final mappers.
Today InternalClusterInfoService ignores hidden indices when
retrieving shard stats of the cluster. This can lead to suboptimal
shard allocation decisions as the size of shards are taken into
account when allocating new shards or rebalancing existing shards.
This converts RankFeatureFieldMapper, RankFeaturesFieldMapper,
SearchAsYouTypeFieldMapper and TokenCountFieldMapper to
parametrized forms. It also adds a TextParams utility class to core
containing functions that help declare text parameters - mainly shared
between SearchAsYouTypeFieldMapper and KeywordFieldMapper at
the moment, but it will come in handy when we convert TextFieldMapper
and friends.
Relates to #62988
Currently, `finishHim` can either execute on the specified executor
(in the less likely case that the local node request is the last to arrive)
or on a transport thread.
In case of e.g. `org.elasticsearch.action.admin.cluster.stats.TransportClusterStatsAction`
this leads to an expensive execution that deserializes all mapping metadata in the cluster
running on the transport thread and destabilizing the cluster. In case of this transport
action it was specifically moved to the `MANAGEMENT` thread to avoid the high cost of processing
the stats requests on the nodes during fan-out but that did not cover the final execution
on the node that received the initial request. This PR adds to ability to optionally specify the executor for the final step of the
nodes request execution and uses that to work around the issue for the slow `TransportClusterStatsAction`.
Note: the specific problem that motivated this PR is essentially the same as https://github.com/elastic/elasticsearch/pull/57937 where we moved the execution off the transport and on the management thread as a fix as well.
Passing FieldMappers to point parsing functions makes trying to build source-only
fields from MappedFieldTypes more complicated. This small refactoring changes
things so that the relevant parsing and factory functions from
AbstractGeometryFieldMapper are instead passed as lambdas to the PointParser
constructor.
make node resolving more robust by ignoring null values. This is a bug in
the usage of this class, however you don't want NPE's in prod. The root cause
might be a corner case. Because silencing the root cause is bad, the assert
causes a fail if assertions are enabled
relates #62847
We have to make sure the applier and not the accept state versions allign here.
Otherwise we can get into the situation where the data node is so slow to process
one version that the next one arrives, gets rejected and the request return with
ack `false` and we fail the assertion that the put mapping request didn't complete.
Closes#62446
Currently we duplicate our specialized cors logic in all transport
plugins. This is unnecessary as it could be implemented in a single
place. This commit moves the logic to server. Additionally it fixes a
but where we are incorrectly closing http channels on early Cors
responses.
Introduce 64-bit unsigned long field type
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
Backport for #60050Closes#32434
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.
Fixes#61631
This commit allows coordinating node to account the memory used to perform partial and final reduce of
aggregations in the request circuit breaker. The search coordinator adds the memory that it used to save
and reduce the results of shard aggregations in the request circuit breaker. Before any partial or final
reduce, the memory needed to reduce the aggregations is estimated and a CircuitBreakingException} is thrown
if exceeds the maximum memory allowed in this breaker.
This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced.
This estimation can be completely off for some aggregations but it is corrected with the real size after
the reduce completes.
If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations
and replace the estimation with the serialized size of the newly reduced result.
As a follow up we could trigger partial reduces based on the memory accounted in the circuit breaker instead
of relying on a static number of shard responses. A simpler follow up that could be done in the mean time is
to [reduce the default batch reduce size](https://github.com/elastic/elasticsearch/issues/51857) of blocking
search request to a more sane number.
Closes#37182
Backport #62825 to 7.x branch.
Today if a data stream is auto created, but an index with same name as the
first backing index already exists then internally that error is ignored,
which then result that later in the execution of a bulk request, the
bulk item fails due to that the data stream hasn't been auto created.
This situation can only occur if an index with same is created that
will be the backing index of a data stream prior to the creation
of the data stream.
Co-authored-by: Dan Hermann <danhermann@users.noreply.github.com>
A few of us were talking about ways to speed up the `date_histogram`
using the index for the timestamp rather than the doc values. To do that
we'd have to pre-compute all of the "round down" points in the index. It
turns out that *just* precomputing those values speeds up rounding
fairly significantly:
```
Benchmark (count) (interval) (range) (zone) Mode Cnt Score Error Units
before 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 96461080.982 ± 616373.011 ns/op
before 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 130598950.850 ± 1249189.867 ns/op
after 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 52311775.080 ± 107171.092 ns/op
after 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 54800134.968 ± 373844.796 ns/op
```
That's a 46% speed up when there isn't a time zone and a 58% speed up
when there is.
This doesn't work for every time zone, specifically those that have two
midnights in a single day due to daylight savings time will produce wonky
results. So they don't get the optimization.
Second, this requires a few expensive computation up front to make the
transition array. And if the transition array is too large then we give
up and use the original mechanism, throwing away all of the work we did
to build the array. This seems appropriate for most usages of `round`,
but this change uses it for *all* usages of `round`. That seems ok for
now, but it might be worth investigating in a follow up.
I ran a macrobenchmark as well which showed an 11% preformance
improvement. *BUT* the benchmark wasn't tuned for my desktop so it
overwhelmed it and might have produced "funny" results. I think it is
pretty clear that this is an improvement, but know the measurement is
weird:
```
Benchmark (count) (interval) (range) (zone) Mode Cnt Score Error Units
before 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 96461080.982 ± 616373.011 ns/op
before 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 g± 1249189.867 ns/op
after 10000000 calendar month 2000-10-28 to 2000-10-31 UTC avgt 10 52311775.080 ± 107171.092 ns/op
after 10000000 calendar month 2000-10-28 to 2000-10-31 America/New_York avgt 10 54800134.968 ± 373844.796 ns/op
Before:
| Min Throughput | hourly_agg | 0.11 | ops/s |
| Median Throughput | hourly_agg | 0.11 | ops/s |
| Max Throughput | hourly_agg | 0.11 | ops/s |
| 50th percentile latency | hourly_agg | 650623 | ms |
| 90th percentile latency | hourly_agg | 821478 | ms |
| 99th percentile latency | hourly_agg | 859780 | ms |
| 100th percentile latency | hourly_agg | 864030 | ms |
| 50th percentile service time | hourly_agg | 9268.71 | ms |
| 90th percentile service time | hourly_agg | 9380 | ms |
| 99th percentile service time | hourly_agg | 9626.88 | ms |
|100th percentile service time | hourly_agg | 9884.27 | ms |
| error rate | hourly_agg | 0 | % |
After:
| Min Throughput | hourly_agg | 0.12 | ops/s |
| Median Throughput | hourly_agg | 0.12 | ops/s |
| Max Throughput | hourly_agg | 0.12 | ops/s |
| 50th percentile latency | hourly_agg | 519254 | ms |
| 90th percentile latency | hourly_agg | 653099 | ms |
| 99th percentile latency | hourly_agg | 683276 | ms |
| 100th percentile latency | hourly_agg | 686611 | ms |
| 50th percentile service time | hourly_agg | 8371.41 | ms |
| 90th percentile service time | hourly_agg | 8407.02 | ms |
| 99th percentile service time | hourly_agg | 8536.64 | ms |
|100th percentile service time | hourly_agg | 8538.54 | ms |
| error rate | hourly_agg | 0 | % |
```
The name `FieldFetcher` fits better with the 'fetch' terminology we use
elsewhere, for example `FetchFieldsPhase` and `ValueFetcher`.
This PR also moves the construction of the fetcher off the context and onto
`FetchFieldsPhase`, which feels like a more natural place for it, and fixes a
TODO in javadocs.