Determines the shard size of shards before allocating shards that are
recovering from snapshots. It ensures during shard allocation that the
target node that is selected as recovery target will have enough free
disk space for the recovery event. This applies to regular restores,
CCR bootstrap from remote, as well as mounting searchable snapshots.
The InternalSnapshotInfoService is responsible for fetching snapshot
shard sizes from repositories. It provides a getShardSize() method
to other components of the system that can be used to retrieve the
latest known shard size. If the latest snapshot shard size retrieval
failed, the getShardSize() returns
ShardRouting.UNAVAILABLE_EXPECTED_SHARD_SIZE. While
we'd like a better way to handle such failures, returning this value
allows to keep the existing behavior for now.
Note that this PR does not address an issues (we already have today)
where a replica is being allocated without knowing how much disk
space is being used by the primary.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
MapperService carries a lot of weight and is only used to determine if loading of field data for the id field is enabled, which can be done in a different way.
Just a few spots where we can dry up these tests using the snapshot test infrastructure
in core that I found while studying the existing searchable snapshot tests.
* Just some obvious drying up of these super complex tests.
* Mainly just shortening the diff of #61839 here by moving test utilities
to the abstract test case.
Also, making use of the now available functionality to simplify existing tests
and improve logging in them.
For runtime fields, we will want to do all search-time interaction with
a field definition via a MappedFieldType, rather than a FieldMapper, to
avoid interfering with the logic of document parsing. Currently, fetching
values for runtime scripts and for building top hits responses need to
call a method on FieldMapper. This commit moves this method to
MappedFieldType, incidentally simplifying the current call sites and freeing
us up to implement runtime fields as pure MappedFieldType objects.
Splitting some tests out of this class that has become a catch-all
for random snapshot related tests into either existing suits that fit
better for these tests or one of two new suits to prevent timeouts
in extreme cases (e.g. `WindowsFS` + many nodes + multiple data paths per node).
No other changes to tests were made whatsoever.
Closes#61541
Introduce 64-bit unsigned long field type
This field type supports
- indexing of integer values from [0, 18446744073709551615]
- precise queries (term, range)
- precise sort and terms aggregations
- other aggregations are based on conversion of long values
to double and can be imprecise for large values.
Backport for #60050Closes#32434
This commit adds a mechanism to MapperTestCase that allows implementing
test classes to check that their parameters can be updated, or throw conflict
errors as advertised. Child classes override the registerParameters method
and tell the passed-in UpdateChecker class about their parameters. Simple
conflicts can be checked, using the existing minimal mappings as a base to
compare against, or alternatively a particular initial mapping can be provided
to check edge cases (eg, norms can be updated from true to false, but not
vice versa). Updates are registered with a predicate that checks that the update
has in fact been applied to the resulting FieldMapper.
Fixes#61631
Most of our field types have the same implementation for their `existsQuery` method which relies on doc_values if present, otherwise it queries norms if available or uses a term query against the _field_names meta field. This standard implementation is repeated in many different mappers.
There are field types that only query doc_values, because they always have them, and field types that always query _field_names, because they never have norms nor doc_values. We could apply the same standard logic to all of these field types as `MappedFieldType` has the knowledge about what data structures are available.
This commit introduces a standard implementation that does the right thing depending on the data structure that is available. With that only field types that require a different behaviour need to override the existsQuery method.
At the same time, this no longer forces subclasses to override `existsQuery`, which could be forgotten when needed. To address this we introduced a new test method in `MapperTestCase` that verifies the `existsQuery` being generated and its consistency with the available data structures.
This commit adds a dedicated threadpool for system index write
operations. The dedicated resources for system index writes serves as
a means to ensure that user activity does not block important system
operations from occurring such as the management of users and roles.
Backport of #61655
`RepositoriesService#doClose` was never called which lead to
mock repositories not unblocking until the `ThreadPool` interrupts
all threads. Thus stopping a node that is blocked on a mock repository operation wastes `10s`
in each test that does it (which is quite a few as it turns out).
The dense vector field is not aggregatable although it produces fielddata through its BinaryDocValuesField. It should pass up hasDocValues set to true to its parent class in its constructor, and return isAggregatable false. Same for the sparse vector field (only in 7.x).
This may not have consequences today, but it will be important once we try to share the same exists query implementation throughout all of the mappers with #57607.
We removed index-time boosting back in 5x, and we no longer document the 'boost'
parameter on any of our mapping types. However, it is still possible to define an
index-time boost on a field mapper for a surprisingly large number of field types, and
they even have an effect (sometimes, on some queries).
As a first step in finally removing all traces of index time boosting, this comment emits
a deprecation warning whenever a boost parameter is found on a mapping definition.
Today when a snapshot restore is aborted (for example when the index is
explicitly deleted) while the restoration of the files from the repository has
already started the file restores are not interrupted. It means that Elasticsearch
will continue to read the files from the repository and will continue to write
them to disk until all files are restored; the store will then be closed and
files will be deleted from disk at some point but this can take a while. This
will also take some slots in the SNAPSHOT thread pool too. The Recovery
API won't show any files actively being recovered, the only notable
indicator would be the active threads in the SNAPSHOT thread pool.
This commit adds a check before reading a file to restore and before
writing bytes on disk so that a closing store can be detected more
quickly and the file recovery process aborted. This way the file
restores just stops and for most of the repository implementations
it means that no more bytes are read (see #62370 for S3), finishing
threads in the SNAPSHOT thread pool more quickly too.
In #57666 we changed when null_value was parsed for ip and date fields. Previously,
the null value was stored as a string, and parsed into a date or InetAddress whenever
a document containing a null value was encountered. Now, the values are parsed when
the mappings are built, which means that bad values are detected up front; if you try and
add a mapping with a badly-parsed ip or date for a null_value, the mapping will be
rejected.
This causes problems for upgrades in the case when you have a badly-formed null_value
in a pre-7.9 cluster. This commit fixes the upgrade case by changing the logic to only
logging a warning on the badly formed value, replicating the earlier behaviour.
Fixes#62363
Backport of #62484 to 7.x branch.
It is possible in mixed version clusters (nodes prior to 7.10)
that a 404 is returned when wiping all data streams.
This is because there are no data streams and
the coordinator node is on a version that doesn't
mark the delete request for wildcard usage.
This implements the `fields` API in `_search` for runtime fields using
doc values. Most of that implementation is stolen from the
`docvalue_fields` fetch sub-phase, just moved into the same API that the
`fields` API uses. At this point the `docvalue_fields` fetch phase looks
like a special case of the `fields` API.
While I was at it I moved the "which doc values sub-implementation
should I use for fetching?" question from a bunch of `instanceof`s to a
method on `LeafFieldData` so we can be much more flexible with what is
returned and we're not forced to extend certain classes just to make the
fetch phase happy.
Relates to #59332
This change adds an aggregation that can be used to delay the
query phase execution on shards with a configurable time:
{
"aggs": {
"delay": {
"shard_delay": {
"value": "30s"
},
"aggs": {
"host": {
"terms": {
"field": "hostname"
}
}
}
}
}
}
This test module is built on top of #61954 so the aggregation will be available only within
snapshots since this module is not meant to be used in production.
Closes#54159
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves#61665
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.
This commit fixes this and adds tests for it.
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
Backporting #62205 to 7.x branch.
This is similar to what happens for indices. Initially we decided to let each test cleanup the
data streams it created.
The reason behind this was that client yaml test runners would need to be modified to do this too and
because data steams were new, we waited with that and let each test cleanup the data stream it created.
However we sometimes have very hard to debug test failures, because many tests fail because another test
failed mid way and didn't clean up the data streams it created. Given that and data streams exist in
the code base for a while now, we should automatically delete all data streams after each yaml test.
Relates to #62190
* preserve data streams for rolling upgrade yaml tests
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.
In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.
Backport of #60371
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
This commit includes the work that has been done on the runtime fields feature branch until now. The high level tasks are listed in #59332. The tasks that have not yet been completed can be worked on after merging the feature branch.
We are adding a new x-pack plugin called runtime-fields that plugs in a custom mapper which allows to define runtime fields based on a script.
The changes included in this commit that were made outside of the x-pack/plugin/runtime-fields directory are minimal and revolve around 1) making the ScriptService available while parsing index mappings so that the scripts associated to runtime fields can be compiled 2) sharing code to manipulate ranges etc. as it can be reused in runtime fields.
Co-authored-by: Nik Everett <nik9000@gmail.com>
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.
Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
This commit adds a test to MapperTestCase that explicitly checks that a mapper can
serialize all its default values, and that this serialization can then be re-parsed. Note that
the test is disabled for non-parametrized mappers as their serialization may in some cases
output parameters that are not accepted. Gradually moving all mappers to parametrized
form will address this.
The commit also contains a fix to keyword mappers, which were not correctly serializing
the similarity parameter; this partially addresses #61563. It also enables `null` as a
value for `null_value` on `scaled_float`, as a follow-up to #61798
Several field mappers have a null_value parameter, that allows you to specify a placeholder
value to insert into a document if the incoming value for that field is null. The default value
for this is always null, meaning "add no placeholder". However, we explicitly bar users from
setting this parameter directly to null (done in #7978, in order to fix an NPE).
This exclusion means that if a mapper is serialized with include_defaults, then we either need
to special-case null_value to ensure that it is not output when it holds the default value, or
we find that the resulting serialized form cannot be used to create a mapping. This stops us
doing some useful generic testing of mappers.
This commit permits null as a parameter value for null_value, and changes the tests to check
that it is a) permissible and b) applied without throwing errors. As part of the testing changes,
a new base class MapperServiceTestCase is refactored from MapperTestCase, holding
the various helper methods related to building mappings but not the single-mapper specific
abstract methods.
Closes#58823
The recursive data.path FilePermission check is an extremely hot
codepath in Elasticsearch. Unfortunately the FilePermission check in
Java is extremely allocation heavy. As it iterates through different
file permissions, it allocates byte arrays for each Path component that
must be compared. This PR improves the situation by adding the recursive
data.path FilePermission it its own PermissionsCollection object which
is checked first.
Backport of #61474.
Part of #46106. Simplify the implementation of deprecation logging by
relying of log4j more completely, and implementing additional behaviour
through custom appenders and filters.
Runtime fields need to have a SearchLookup available, when building their fielddata implementations, so that they can look up other fields, runtime or not.
To achieve that, we add a Supplier<SearchLookup> argument to the existing MappedFieldType#fielddataBuilder method.
As we introduce the ability to look up other fields while building fielddata for mapped fields, we implicitly add the ability for a field to require other fields. This requires some protection mechanism that detects dependency cycles to prevent stack overflow errors.
With this commit we also introduce detection for cycles, as well as a limit on the depth of the references for a runtime field. Note that we also plan on introducing cycles detection at compile time, so the runtime cycles detection is a last resort to prevent stack overflow errors but we hope that we can reject runtime fields from being registered in the mappings when they create a cycle in their definition.
Note that this commit does not introduce any production implementation of runtime fields, but is rather a pre-requisite to merge the runtime fields feature branch.
This is a breaking change for MapperPlugins that plug in a mapper, as the signature of MappedFieldType#fielddataBuilder changes from taking a single argument (the index name), to also accept a Supplier<SearchLookup>.
Relates to #59332
Co-authored-by: Nik Everett <nik9000@gmail.com>
Today the `CoordinatorTests` run the publication process as a single
atomic action; however in production it appears possible that another
master may be elected, publish its state, then fail, then we win another
election, all in between the time we sampled our previous cluster state
and started to publish the one we first thought of.
This violates the `assertClusterStateConsistency()` assertion that
verifies the cluster state update event matches the states we actually
published and applied.
This commit adjusts the tests to run the publication process more
asynchronously so as to allow time for this behaviour to occur. This
should eventually result in a reproduction of the failure in #61437 that
will let us analyse what's really going on there and help us fix it.
If a searchable snapshot shard fails (e.g. its node leaves the cluster)
we want to be able to start it up again on a different node as quickly
as possible to avoid unnecessarily blocking or failing searches. It
isn't feasible to fully restore such shards in an acceptably short time.
In particular we would like to be able to deal with the `can_match`
phase of a search ASAP so that we can skip unnecessary waiting on shards
that may still be warming up but which are not required for the search.
This commit solves this problem by introducing a system index that holds
much of the data required to start a shard. Today(*) this means it holds
the contents of every file with size <8kB, and the first 4kB of every
other file in the shard. This system index acts as a second-level cache,
behind the first-level node-local disk cache but in front of the blob
store itself. Reading chunks from the index is slower than reading them
directly from disk, but faster than reading them from the blob store,
and is also replicated and accessible to all nodes in the cluster.
(*) the exact heuristics for what we should put into the system index
are still under investigation and may change in future.
This second-level cache is populated when we attempt to read a chunk
which is missing from both levels of cache and must therefore be read
from the blob store.
We also introduce `SearchableSnapshotsBlobStoreCacheIntegTests` which
verify that we do not hit the blob store more than necessary when
starting up a shard that we've seen before, whether due to a node
restart or because a snapshot was mounted multiple times.
Backport of #60522
Co-authored-by: Tanguy Leroux <tlrx.dev@gmail.com>
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
depends on #61515
backports #58435