Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
We can be a little more efficient when aborting a snapshot. Since we know the new repository
data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners.
This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate nodes stats for the cluster stats endpoint.
In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."
I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialiation tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.
I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
Instead delete the data streams manually, until client yaml test runners
have been updated to also delete all data streams after each yaml test.
Relates to #53100
This commit introduces a new `geo` module that is intended
to be contain all the geo-spatial-specific features in server.
As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.
Co-authored-by: Nicholas Knize <nknize@gmail.com>
This commit moves the action name validation and circuit breaking into
the InboundAggregator. This work is valuable because it lays the
groundwork for incrementally circuit breaking as data is received.
This PR includes the follow behavioral change:
Handshakes contribute to circuit breaking, but cannot be broken. They
currently do not contribute nor are they broken.
Currently the TransportHandshaker has a specialized codepath for sending
a response. In other work, we are going to start having handshakes
contribute to circuit breaking (while not being breakable). This commit
moves in that direction by allowing the handshaker to responding using a
standard TcpTransportChannel similar to other requests.
This removes pipeline aggregators from the aggregation result tree
except for a single field used for backwards compatibility with pre-7.8
versions of Elasticsearch. That field isn't populated unless we are
serializing to pre-7.8 Elasticsearch. So, good news! We no longer build
pipeline aggregators on the data node. Most of the time.
This allows subclasses of `InternalAggregationTestCase` to make a `List`
of values to reduce so that it can make values that are realistic
*together*. The first use of this is with `InternalTTest` which uses it
to make results that don't cause their `sum` field to wrap. It'd likely
be useful for a ton of other aggs but just one for now.
`scripted_metric` did not work with cross cluster search because it
assumed that you'd never perform a partial reduction, serialize the
results, and then perform a final reduction. That
serialized-after-partial-reduction step was broken.
This is also required to support #54758.
This is a backport of #54803 for 7.x.
This pull request cherry picks the squashed commit from #54803 with the additional commits:
6f50c92 which adjusts master code to 7.x
a114549 to mute a failing ILM test (#54818)
48cbca1 and 50186b2 that cleans up and fixes the previous test
aae12bb that adds a missing feature flag (#54861)
6f330e3 that adds missing serialization bits (#54864)
bf72c02 that adjust the version in YAML tests
a51955f that adds some plumbing for the transport client used in integration tests
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
This should avoid REST failures caused by the inability to delete said
policy
Fix#54759
(cherry picked from commit 3ba5e02b713c03b1bdf14e0367a2bce68c35dd30)
We recently cleaned up the use of the word "metadata" across the
codebase. A few additional uses have trickled in, likely from
in-progress work. This commit cleans up these last few instances.
Relates #54519
Some field name constants were not updaten when we moved from "string" to "text"
and "keyword" fields. Renaming them makes it easier and faster to know which
field type is used in test subclassing this base test case.
Removes pipeline aggregations from the aggregation result tree as they
are no longer used. This stops us from building the pipeline aggregators
at all on data nodes except for backwards compatibility serialization.
This will save a tiny bit of space in the aggregation tree which is
lovely, but the biggest benefit is that it is a step towards simplifying
pipeline aggregators.
This only does about half of the work to remove the pipeline aggs from
the tree. Removing all of it would, well, double the size of the change
and make it harder to review.
This fixes pipeline aggregations used in cross cluster search from an older
version of Elasticsearch to a newer version of Elasticsearch. I broke
this in #53730 when I was too aggressive in shutting off serialization
of pipeline aggs. In particular, this comes up when the coordinating
node is pre-7.8.0 and the gateway node is on or after 7.8.0.
The fix is another step down the line to remove pipeline aggregators
from the aggregation tree. Sort of. It create a new
`List<PipelineAggregator>` member in `InternalAggregation` *but* it is
only used for bwc serialization and it is fed by the mechanism
established in #53730 to read the pipelines from the
Adds tests for supported ValuesSourceTypes, unmapped fields, scripting,
and the missing param. The tests for unmapped fields and scripting are
migrated from the StatsIT integration test
* Refactor nodes stats request builders to match requests (#54363)
* Remove hard-coded setters from NodesInfoRequestBuilder
* Remove hard-coded setters from NodesStatsRequest
* Use static imports to reduce clutter
* Remove uses of old info APIs
Use the same ES cluster as both an SP and an IDP and perform
IDP initiated and SP initiated SSO. The REST client plays the role
of both the Cloud UI and Kibana in these flows
Backport of #54215
* fix compilation issues
We previously checked for a 405 response, but on 7.x BWC tests may be hitting versions that don't
support the 405 response (returning 400 or 500), so be more lenient in those cases.
Relates to #54513
This is a follow up to a previous commit that renamed MetaData to
Metadata in all of the places. In that commit in master, we renamed
META_DATA to METADATA, but lost this on the backport. This commit
addresses that.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
* Comprehensively test supported/unsupported field type:agg combinations (#52493)
This adds a test to AggregatorTestCase that allows us to programmatically
verify that an aggregator supports or does not support a particular
field type. It fetches the list of registered field type parsers,
creates a MappedFieldType from the parser and then attempts to run
a basic agg against the field.
A supplied list of supported VSTypes are then compared against the
output (success or exception) and suceeds or fails the test accordingly.
Co-Authored-By: Mark Tozzi <mark.tozzi@gmail.com>
* Skip fields that are not aggregatable
* Use newIndexSearcher() to avoid incompatible readers (#52723)
Lucene's `newSearcher()` can generate readers like ParallelCompositeReader
which we can't use. We need to instead use our helper `newIndexSearcher`
* Add warnings/errors when V2 templates would match same indices… (#54367)
* Add warnings/errors when V2 templates would match same indices as V1
With the introduction of V2 index templates, we want to warn users that templates they put in place
might not take precedence (because v2 templates are going to "win"). This adds this validation at
`PUT` time for both V1 and V2 templates with the following rules:
** When creating or updating a V2 template
- If the v2 template would match indices for an existing v1 template or templates, provide a
warning (through the deprecation logging so it shows up to the client) as well as logging the
warning
The v2 warning looks like:
```
index template [my-v2-template] has index patterns [foo-*] matching patterns from existing older
templates [old-v1-template,match-all-template] with patterns (old-v1-template =>
[foo*],match-all-template => [*]); this template [my-v2-template] will take
precedence during new index creation
```
** When creating a V1 template
- If the v1 template is for index patterns of `"*"` and a v2 template exists, warn that the v2
template may take precedence
- If the v1 template is for index patterns other than all indices, and a v2 template exists that
would match, throw an error preventing creation of the v1 template
** When updating a V1 template (without changing its existing `index_patterns`!)
- If the v1 template is for index patterns that would match an existing v2 template, warn that the
v2 template may take precedence.
The v1 warning looks like:
```
template [my-v1-template] has index patterns [*] matching patterns from existing index templates
[existing-v2-template] with patterns (existing-v2-template => [foo*]); this template [my-v1-template] may be ignored in favor of an index template at index creation time
```
And the v1 error looks like:
```
template [my-v1-template] has index patterns [foo*] matching patterns from existing index templates
[existing-v2-template] with patterns (existing-v2-template => [f*]), use index templates (/_index_template) instead
```
Relates to #53101
* Remove v2 index and component templates when cleaning up tests
* Finish half-finished comment sentence
* Guard template removal and ignore for earlier versions of ES
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Also ignore 500 errors when clearing index template v2 templates
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Currently all of our transport protocol decoding and aggregation occurs
in the individual transport modules. This means that each implementation
(test, netty, nio) must implement this logic. Additionally, it means
that the entire message has been read from the network before the server
package receives it.
This commit creates a pipeline in server which can be passed arbitrary
bytes to handle. Internally, the pipeline will decode, decompress, and
aggregate the messages. Additionally, this allows us to run many
megabytes of bytes through the pipeline in tests to ensure that the
logic works.
This work will enable future work:
Circuit breaking or backoff logic based on message type and byte
in the content aggregator.
Sharing bytes with the application layer using the ref counted
releasable network bytes.
Improved network monitoring based specifically on channels.
Finally, this fixes the bug where we do not circuit break on the correct
message size when compression is enabled.
Changes ThreadPool's schedule method to run the schedule task in the context of the thread
that scheduled the task.
This is the more sensible default for this method, and eliminates a range of bugs where the
current thread context is mistakenly dropped.
Closes#17143
This drop the "top level" pipeline aggregators from the aggregation
result tree which should save a little memory and a few serialization
bytes. Perhaps more imporantly, this provides a mechanism by which we
can remove *all* pipelines from the aggregation result tree. This will
save quite a bit of space when pipelines are deep in the tree.
Sadly, doing this isn't simple because of backwards compatibility. Nodes
before 7.7.0 *need* those pipelines. We provide them by setting passing
a `Supplier<PipelineTree>` into the root of the aggregation tree that we
only call if we need to serialize to a version before 7.7.0.
This solution works for cross cluster search because we always reduce
the aggregations in each remote cluster and then forward them back to
the coordinating node. Its quite possible that the coordinating node
needs the pipeline (say it is version 7.1.0) and the gateway node in the
remote cluster doesn't (version 7.7.0). In that case the data nodes
won't send the pipeline aggregations back to the gateway node.
Critically, the gateway node *will* send the pipeline aggregations back
to the coordinating node. This is all managed with that
`Supplier<PipelineTree>`, but *how* it is managed is a bit tricky.
Fixes an issue where the elasticsearch-node command-line tools would not work correctly
because PersistentTasksCustomMetaData contains named XContent from plugins. This PR
makes it so that the parsing for all custom metadata is skipped, even if the core system would
know how to handle it.
Closes#53549
DocsClientYamlTestSuiteIT sometimes fails for CCR
related tests because tests are started before the license
is fully applied and active within the cluster. The first
tests to be executed then fails with the error noticed
in #53430. This can be easily reproduced locally by
only running CCR docs tests.
This commit adds some @Before logic in
DocsClientYamlTestSuiteIT so that it waits for the
license to be active before running CCR tests.
Closes#53430
This moves the pipeline aggregation validation from the data node to the
coordinating node so that we, eventually, can stop sending pipeline
aggregations to the data nodes entirely. In fact, it moves it into the
"request validation" stage so multiple errors can be accumulated and
sent back to the requester for the entire request. We can't always take
advantage of that, but it'll be nice for folks not to have to play
whack-a-mole with validation.
This is implemented by replacing `PipelineAggretionBuilder#validate`
with:
```
protected abstract void validate(ValidationContext context);
```
The `ValidationContext` handles the accumulation of validation failures,
provides access to the aggregation's siblings, and implements a few
validation utility methods.
Benchmarking showed that the effect of the ExitableDirectoryReader
is reduced considerably when checking every 8191 docs. Moreover,
set the cancellable task before calling QueryPhase#preProcess()
and make sure we don't wrap with an ExitableDirectoryReader at all
when lowLevelCancellation is set to false to avoid completely any
performance impact.
Follows: #52822
Follows: #53166
Follows: #53496
(cherry picked from commit cdc377e8e74d3ca6c231c36dc5e80621aab47c69)
Use sequence numbers and force merge UUID to determine whether a shard has changed or not instead before falling back to comparing files to get incremental snapshots on primary fail-over.
This change adds a "grant API key action"
POST /_security/api_key/grant
that creates a new API key using the privileges of one user ("the
system user") to execute the action, but creates the API key with
the roles of the second user ("the end user").
This allows a system (such as Kibana) to create API keys representing
the identity and access of an authenticated user without requiring
that user to have permission to create API keys on their own.
This also creates a new QA project for security on trial licenses and runs
the API key tests there
Backport of: #52886
Today in the `CoordinatorTests` each node uses multiple threadpools. This is
mostly fine as they are almost completely stateless, except for the
`ThreadContext`: by using multiple threadpools we cannot make assertions that
the thread context is/isn't preserved as we expect. This commit consolidates
the threadpool instances in use so that each node uses just one.
This fixes two issues:
1. Currently, the future here is never resolved on assertion error so a failing test would take a full minute
to complete until the future times out.
2. S3 tests overide this method to busy assert on this method. This only works if an assertion error makes it
to the calling thread.
Closes#53508
It's simple to deprecate a field used in an ObjectParser just by adding deprecation
markers to the relevant ParseField objects. The warnings themselves don't currently
have any context - they simply say that a deprecated field has been used, but not
where in the input xcontent it appears. This commit adds the parent object parser
name and XContentLocation to these deprecation messages.
Note that the context is automatically stripped from warning messages when they
are asserted on by integration tests and REST tests, because randomization of
xcontent type during these tests means that the XContentLocation is not constant
Today cluster states are sometimes (rarely) applied in the default context
rather than system context, which means that any appliers which capture their
contexts cannot do things like remote transport actions when security is
enabled.
There are at least two ways that we end up applying the cluster state in the
default context:
1. locally applying a cluster state that indicates that the master has failed
2. the elected master times out while waiting for a response from another node
This commit ensures that cluster states are always applied in the system
context.
Mitigates #53751
* Adds per context settings:
`script.context.${CONTEXT}.cache_max_size` ~
`script.cache.max_size`
`script.context.${CONTEXT}.cache_expire` ~
`script.cache.expire`
`script.context.${CONTEXT}.max_compilations_rate` ~
`script.max_compilations_rate`
* Context cache is used if:
`script.max_compilations_rate=use-context`. This
value is dynamically updatable, so users can
switch back to the general cache if desired.
* Settings for context caches take the first value
that applies:
1) Context specific settings if set, eg
`script.context.ingest.cache_max_size`
2) Correlated general setting is set to the non-default
value, eg `script.cache.max_size`
3) Context default
The reason for 2's inclusion is to allow an easy
transition for users who've customized their general
cache settings.
Using the general cache settings for the context caches
results in higher effective settings, since they are
multiplied across the number of contexts. So a general
cache max size of 200 will become 200 * # of contexts.
However, this behavior it will avoid users snapping to a
value that is too low for them.
Backport of: #52855
Refs: #50152
Re-applies the change from #53523 along with test fixes.
closes#53626closes#53624closes#53622closes#53625
Co-authored-by: Nik Everett <nik9000@gmail.com>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Jake Landis <jake.landis@elastic.co>
Today it can happen that a transport message fails to send (for example,
because a transport interceptor rejects the request). In this case, the
response handler is never invoked, which can lead to necessary cleanups
not being performed. There are two ways to handle this. One is to expect
every callsite that sends a message to try/catch these exceptions and
handle them appropriately. The other is merely to invoke the response
handler to handle the exception, which is already equipped to handle
transport exceptions.
This begins to clean up how `PipelineAggregator`s and executed.
Previously, we would create the `PipelineAggregator`s on the data nodes
and embed them in the aggregation tree. When it came time to execute the
pipeline aggregation we'd use the `PipelineAggregator`s that were on the
first shard's results. This is inefficient because:
1. The data node needs to make the `PipelineAggregator` only to
serialize it and then throw it away.
2. The coordinating node needs to deserialize all of the
`PipelineAggregator`s even though it only needs one of them.
3. You end up with many `PipelineAggregator` instances when you only
really *need* one per pipeline.
4. `PipelineAggregator` needs to implement serialization.
This begins to undo these by building the `PipelineAggregator`s directly
on the coordinating node and using those instead of the
`PipelineAggregator`s in the aggregtion tree. In a follow up change
we'll stop serializing the `PipelineAggregator`s to node versions that
support this behavior. And, one day, we'll be able to remove
`PipelineAggregator` from the aggregation result tree entirely.
Importantly, this doesn't change how pipeline aggregations are declared
or parsed or requested. They are still part of the `AggregationBuilder`
tree because *that* makes sense.
This change introduces a new API in x-pack basic that allows to track the progress of a search.
Users can submit an asynchronous search through a new endpoint called `_async_search` that
works exactly the same as the `_search` endpoint but instead of blocking and returning the final response when available, it returns a response after a provided `wait_for_completion` time.
````
GET my_index_pattern*/_async_search?wait_for_completion=100ms
{
"aggs": {
"date_histogram": {
"field": "@timestamp",
"fixed_interval": "1h"
}
}
}
````
If after 100ms the final response is not available, a `partial_response` is included in the body:
````
{
"id": "9N3J1m4BgyzUDzqgC15b",
"version": 1,
"is_running": true,
"is_partial": true,
"response": {
"_shards": {
"total": 100,
"successful": 5,
"failed": 0
},
"total_hits": {
"value": 1653433,
"relation": "eq"
},
"aggs": {
...
}
}
}
````
The partial response contains the total number of requested shards, the number of shards that successfully returned and the number of shards that failed.
It also contains the total hits as well as partial aggregations computed from the successful shards.
To continue to monitor the progress of the search users can call the get `_async_search` API like the following:
````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms
````
That returns a new response that can contain the same partial response than the previous call if the search didn't progress, in such case the returned `version`
should be the same. If new partial results are available, the version is incremented and the `partial_response` contains the updated progress.
Finally if the response is fully available while or after waiting for completion, the `partial_response` is replaced by a `response` section that contains the usual _search response:
````
{
"id": "9N3J1m4BgyzUDzqgC15b",
"version": 10,
"is_running": false,
"response": {
"is_partial": false,
...
}
}
````
Asynchronous search are stored in a restricted index called `.async-search` if they survive (still running) after the initial submit. Each request has a keep alive that defaults to 5 days but this value can be changed/updated any time:
`````
GET my_index_pattern*/_async_search?wait_for_completion=100ms&keep_alive=10d
`````
The default can be changed when submitting the search, the example above raises the default value for the search to `10d`.
`````
GET _async_search/9N3J1m4BgyzUDzqgC15b/?wait_for_completion=100ms&keep_alive=10d
`````
The time to live for a specific search can be extended when getting the progress/result. In the example above we extend the keep alive to 10 more days.
A background service that runs only on the node that holds the first primary shard of the `async-search` index is responsible for deleting the expired results. It runs every hour but the expiration is also checked by running queries (if they take longer than the keep_alive) and when getting a result.
Like a normal `_search`, if the http channel that is used to submit a request is closed before getting a response, the search is automatically cancelled. Note that this behavior is only for the submit API, subsequent GET requests will not cancel if they are closed.
Asynchronous search are not persistent, if the coordinator node crashes or is restarted during the search, the asynchronous search will stop. To know if the search is still running or not the response contains a field called `is_running` that indicates if the task is up or not. It is the responsibility of the user to resume an asynchronous search that didn't reach a final response by re-submitting the query. However final responses and failures are persisted in a system index that allows
to retrieve a response even if the task finishes.
````
DELETE _async_search/9N3J1m4BgyzUDzqgC15b
````
The response is also not stored if the initial submit action returns a final response. This allows to not add any overhead to queries that completes within the initial `wait_for_completion`.
The `.async-search` index is a restricted index (should be migrated to a system index in +8.0) that is accessible only through the async search APIs. These APIs also ensure that only the user that submitted the initial query can retrieve or delete the running search. Note that admins/superusers would still be able to cancel the search task through the task manager like any other tasks.
Relates #49091
Co-authored-by: Luca Cavanna <javanna@users.noreply.github.com>
Using a Long alone is not strong enough for the id of search contexts
because we reset the id generator whenever a data node is restarted.
This can lead to two issues:
1. Fetch phase can fetch documents from another index
2. A scroll search can return documents from another index
This commit avoids these issues by adding a UUID to SearchContexId.
This commit introduces hidden aliases. These are similar to hidden
indices, in that they are not visible by default, unless explicitly
specified by name or by indicating that hidden indices/aliases are
desired.
The new alias property, `is_hidden` is implemented similarly to
`is_write_index`, except that it must be consistent across all indices
with a given alias - that is, all indices with a given alias must
specify the alias as either hidden, or all specify it as non-hidden,
either explicitly or by omitting the `is_hidden` property.
This commit updates the template used for watch history indices with
the hidden index setting so that new indices will be created as hidden.
Relates #50251
Backport of #52962
When we test backwards compatibility we often end up in a situation
where we *sometimes* get a warning, and sometimes don't. Like, we won't
get the warning if we're testing against an older version, but we will
in a newer one. Or we won't get the warning if the request randomly
lands on a node with an old version of the code. But we wouldn't if it
randomed into a node with newer code.
This adds `allowed_warnings` to our yaml test runner for those cases:
warnings declared this way are "allowed" but not "required".
Blocks #52959
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Implement an Exitable DirectoryReader that wraps the original
DirectoryReader so that when a search task is cancelled the
DirectoryReaders also stop their work fast. This is usuful for
expensive operations like wilcard/prefix queries where the
DirectoryReaders can spend lots of time and consume resources,
as previously their work wouldn't stop even though the original
search task was cancelled (e.g. because of timeout or dropped client
connection).
(cherry picked from commit 67acaf61f33bc5f54e26541514d07e375c202e03)
Tests in GoogleCloudStorageBlobStoreRepositoryTests are known
to be flaky on JDK 8 (#51446, #52430 ) and we suspect a JDK
bug (https://bugs.openjdk.java.net/browse/JDK-8180754) that triggers
some assertion on the server side logic that emulates the Google
Cloud Storage service.
Sadly we were not able to reproduce the failures, even when using
the same OS (Debian 9, Ubuntu 16.04) and JDK (Oracle Corporation
1.8.0_241 [Java HotSpot(TM) 64-Bit Server VM 25.241-b07]) of
almost all the test failures on CI. While we spent some time fixing
code (#51933, #52431) to circumvent the JDK bug they are still flaky
on JDK-8. This commit mute these tests for JDK-8 only.
Close ##52906
This commit introduces a module for Kibana that exposes REST APIs that
will be used by Kibana for access to its system indices. These APIs are wrapped
versions of the existing REST endpoints. A new setting is also introduced since
the Kibana system indices' names are allowed to be changed by a user in case
multiple instances of Kibana use the same instance of Elasticsearch.
Additionally, the ThreadContext has been extended to indicate that the use of
system indices may be allowed in a request. This will be built upon in the future
for the protection of system indices.
Backport of #52385
Generalize how queries on `_index` are handled at rewrite time (#52486)
Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.
This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.
Closes#49254
In #42838 we moved the terms index of all fields off-heap except the
`_id` field because we were worried it might make indexing slower. In
general, the indexing rate is only affected if explicit IDs are used, as
otherwise Elasticsearch almost never performs lookups in the terms
dictionary for the purpose of indexing. So it's quite wasteful to
require the terms index of `_id` to be loaded on-heap for users who have
append-only workloads. Furthermore I've been conducting benchmarks when
indexing with explicit ids on the http_logs dataset that suggest that
the slowdown is low enough that it's probably not worth forcing the terms
index to be kept on-heap. Here are some numbers for the median indexing
rate in docs/s:
| Run | Master | Patch |
| --- | ------- | ------- |
| 1 | 45851.2 | 46401.4 |
| 2 | 45192.6 | 44561.0 |
| 3 | 45635.2 | 44137.0 |
| 4 | 46435.0 | 44692.8 |
| 5 | 45829.0 | 44949.0 |
And now heap usage in MB for segments:
| Run | Master | Patch |
| --- | ------- | -------- |
| 1 | 41.1720 | 0.352083 |
| 2 | 45.1545 | 0.382534 |
| 3 | 41.7746 | 0.381285 |
| 4 | 45.3673 | 0.412737 |
| 5 | 45.4616 | 0.375063 |
Indexing rate decreased by 1.8% on average, while memory usage decreased
by more than 100x.
The `http_logs` dataset contains small documents and has a simple
indexing chain. More complex indexing chains, e.g. with more fields,
ingest pipelines, etc. would see an even lower decrease of indexing rate.
We consider index level read_only_allow_delete blocks temporary since
the DiskThresholdMonitor can automatically release those when an index
is no longer allocated on nodes above high threshold.
The rest status has therefore been changed to 429 when encountering this
index block to signal retryability to clients.
Related to #49393
This commit renames ElasticsearchAssertions#assertThrows to
assertRequestBuilderThrows and assertFutureThrows to avoid a
naming clash with JUnit 4.13+ and static imports of these methods.
Additionally, these methods have been updated to make use of
expectThrows internally to avoid duplicating the logic there.
Relates #51787
Backport of #52582
Phase 1 of adding compilation limits per context.
* Refactor rate limiting and caching into separate class,
`ScriptCache`, which will be used per context.
* Disable compilation limit for certain tests.
Backport of 0866031
Refs: #50152
Cache latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode).
`RepositoryData` can safely be assumed to not grow to a size that would cause trouble because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot API status requests currently load it independently of each other and so on, making it safe to cache on heap and assume as "small" IMO.
The benefits of this move are:
* Much faster repository status API calls
* listing all snapshot names becomes instant
* Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data
* Additional cloud cost savings
* Better resiliency, saving another spot where an IO issue could break the snapshot
* We can simplify a number of spots in the current code that currently pass around the repository data in tricky ways to avoid loading it multiple times in follow ups.
* Refactor Inflexible Snapshot Repository BwC (#52365)
Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.
Currently we have three different implementations representing a
`ConnectionManager`. There is the basic `ConnectionManager` which
holds all connections for a cluster. And a remote connection manager
which support proxy behavior. And a stubbable connection manager for
tests. The remote and stubbable instances use the delegate pattern,
so this commit extracts an interface for them all to implement.
It looks like #52000 is caused by a slowdown in cluster state application
(maybe due to #50907) but I would like to understand the details to ensure that
there's nothing else going on here too before simply increasing the timeout.
This commit enables some relevant `DEBUG` loggers and also captures stack
traces from all threads rather than just the three hottest ones.
Closes#52450. `setRandomIndexSettings(...)` in `ESIntegTestCase` was
using both a thread-local and a supplied source of randomness. Fix this
method to only use the supplied source.
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
This commit makes the names of fetch subphases more consistent:
* Now the names end in just 'Phase', whereas before some ended in
'FetchSubPhase'. This matches the query subphases like AggregationPhase.
* Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what
they do.
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.
While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
Modifies SLM's and ILM's history indices to be hidden indices for added
protection against accidental querying and deletion, and improves
IndexTemplateRegistry to handle upgrading index templates.
Also modifies the REST test cleanup to delete hidden indices.
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.
In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
We can just put the `IndexId` instead of just the index name into the recovery soruce and
save one load of `RepositoryData` on each shard restore that way.
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
This commit moves the logic that cancels search requests when the rest channel is closed
to a generic client that can be used by other APIs. This will be useful for any rest action
that wants to cancel the execution of a task if the underlying rest channel is closed by the
client before completion.
Relates #49931
Relates #50990
Relates #50990
* Allow Repository Plugins to Filter Metadata on Create
Add a hook that allows repository plugins to filter the repository metadata
before it gets written to the cluster state.
The docs tests have recently been running much slower than before (see #49753).
The gist here is that with ILM/SLM we do a lot of unnecessary setup / teardown work on each
test. Compounded with the slightly slower cluster state storage mechanism, this causes the
tests to run much slower.
In particular, on RAMDisk, docs:check is taking
ES 7.4: 6:55 minutes
ES master: 16:09 minutes
ES with this commit: 6:52 minutes
on SSD, docs:check is taking
ES 7.4: ??? minutes
ES master: 32:20 minutes
ES with this commit: 11:21 minutes
* Reload secure settings with password (#43197)
If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.
* Add passphrase support to elasticsearch-keystore (#38498)
This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.
Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores
When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.
When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase
Passphrase can be set to be empty when changing/setting it for an
existing keystore
Relates to: #32691
Supersedes: #37472
* Restore behavior for force parameter (#44847)
Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.
* Handle pwd protected keystores in all CLI tools (#45289)
This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.
* Modify docs for setup passwords and saml metadata cli (#45797)
Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* Elasticsearch keystore passphrase for startup scripts (#44775)
This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.
The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)
In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.
In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.
A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.
* Adjust docs for password protected keystore (#45054)
This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.
* Fix failing Keystore Passphrase test for feature branch (#50154)
One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.
It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.
* Run keystore management tests on docker distros (#50610)
* Add Docker handling to PackagingTestCase
Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.
* Improve ES startup check for docker
Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.
* Test password-protected keystore with Docker (#50803)
This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.
We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.
* Add documentation for keystore startup prompting (#50821)
When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
* Warn when unable to upgrade keystore on debian (#51011)
For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.
* Restore handling of string input
Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null
* Apply spotless reformatting
* Use '--since' flag to get recent journal messages
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.
Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.
It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.
Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
* Use new journald wrapper pattern
* Update version added in secure settings request
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
This commit sets `xpack.security.ssl.diagnose.trust` to false in all
of our tests when running in FIPS 140 mode and when settings objects
are used to create an instance of the SSLService. This is needed
in 7.x because setting xpack.security.ssl.diagnose.trust to true
wraps SunJSSE TrustManager with our own DiagnosticTrustManager and
this is not allowed when SunJSSE is in FIPS mode.
An alternative would be to set xpack.security.fips.enabled to
true which would also implicitly disable
xpack.security.ssl.diagnose.trust but would have additional effects
(would require that we set PBKDF2 for password hashing algorithm in
all test clusters, would prohibit using JKS keystores in nodes even
if relevant tests have been muted in FIPS mode etc.)
Relates: #49900Resolves: #51268
This change changes the way to run our test suites in
JVMs configured in FIPS 140 approved mode. It does so by:
- Configuring any given runtime Java in FIPS mode with the bundled
policy and security properties files, setting the system
properties java.security.properties and java.security.policy
with the == operator that overrides the default JVM properties
and policy.
- When runtime java is 11 and higher, using BouncyCastle FIPS
Cryptographic provider and BCJSSE in FIPS mode. These are
used as testRuntime dependencies for unit
tests and internal clusters, and copied (relevant jars)
explicitly to the lib directory for testclusters used in REST tests
- When runtime java is 8, using BouncyCastle FIPS
Cryptographic provider and SunJSSE in FIPS mode.
Running the tests in FIPS 140 approved mode doesn't require an
additional configuration either in CI workers or locally and is
controlled by specifying -Dtests.fips.enabled=true
This reverts commit c7fd24ca1569a809b499caf34077599e463bb8d6.
Now that JDK-8236582 is fixed in JDK 14 EA, we can revert the workaround.
Relates #50523 and #50512