This adds two extra bits of info to the profiler:
1. Count of the number of different types of collectors. This lets us figure
out if we're using the optimization for segment ordinals. It adds a few
more similar counters just for good measure.
2. Profiles the `getLeafCollector` and `postCollection` methods. These are
non-trivial for some aggregations, like cardinality.
We want Logstash indices to be system indices, but the logstash
service will still need to be able to manage its indices. This PR
adds special system index APIs to the logstash plugin so that
logstash can manage its pipelines without direct access to the
underlying indices.
* Add logstash module with dedicated logstash APIs
* merge with x-pack plugin
* add system index access allowance
* Break out serialization tests into distinct classes
* Log failures for partial multiget failure
* Move LogstashSystemIndexIT to javaRestTest task
Co-authored-by: William Brafford <william.brafford@elastic.co>
Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
AuthorizationService#authorize uses the thread context to carry the result of the
authorisation as transient headers. The listener argument to the `authorize` method
must necessarily observe the header values. This PR makes it so that
the authorisation transient headers (`_indices_permissions` and `_authz_info`, but
NOT `_originating_action_name`) of the child action override the ones of the parent action.
Co-authored-by: Tim Vernum tim@adjective.org
This was missing and caused nodes to drop out of the cluster on serialization failures
when ever one tried to get an enrich policy task by name.
The test in here is a little dirty but I figured it would be nice to have an actual reproducer
for the issue and I couldn't find any infrastructure to nicely time the tasks so I put this on
top of existing test infra.
Use the newly introduced PIT API to have a consistent view of the data
while doing sequence matching, which involves multiple calls, aka
repeatable reads and thus avoid race conditions or any in-flight updates
on the data.
(cherry picked from commit daa72fc3c71fd36afb55278021ff6bbc591ef148)
Backport of #62361 to 7.x branch.
This test was fine and shouldn't have been muted.
The test case class should have preserved data streams as part of #62205Closes#62210
* Add "synthetics-*-*" templates for synthetics fleet data
For the Elastic Agent we currently have `logs` and `metrics`, however, synthetic data doesn't belong
with those and thus we should have a place for it to live. This would be data reported from
heartbeat and under the 'monitoring' category.
This commit adds a composable index template for `synthetics-*-*` indices similar to the work in
#56709 and #57629.
Resolves#61665
This PR adds support for the 'fields' option in the following places:
* Anytime `inner_hits` is used, for both fetching nested/ child docs and field collapsing
* The `top_hits` aggregation
Addresses #61949.
The annotations index is not covered by the comparison between
mappings and templates, as it does not use an index template.
This commit adds an assertion on annotations index mappings
that will fail if the mappings are not upgraded as expected.
Backport of #62325
The job comms thread pool is intended for the long-running job
processes that do anomaly detection or data frame analytics and
count towards job count and memory limits.
This commit moves the short-lived memory estimation processes
to the ML utility thread pool.
Although this doesn't matter in most cases, at the limits of
scale it could mean that memory estimations would get in the way
of starting jobs, or would queue up for an excessive period of
time while waiting for jobs to finish.
Similar to the work in #60994 where we introduced the `data_hot`, `data_warm`, etc node roles. This
introduces a new `data_content` node role to be used for the Content tier.
Currently this tier is not used anywhere, but subsequent work will use this tier.
Relates to #60848
When calling `_execute` there is a chance that there will be bulk indexing failures
or search failures.
These will result in the call failing overall. But, no information is provided for troubleshooting the failure.
This commit adds logging to indicate the number of failures, and new debug level logging so that
failure details can be determined if necessary.
closes https://github.com/elastic/elasticsearch/issues/60491
Just a number of obvious spots where we were allocating
duplicate empty structures or otherwise inefficient that I
found while investigating snapshot cluster state update performance.
This commit deprecates the Repository Stats API added in 7.8.0 as
an experimental API behind a feature flag. The goal is to deprecate
this API in 7.10.0 and remove it in a follow up PR in 8.0.0.
This API is now superseded by the Repositories Metering API.
It has been observed that if the normalizer process fails
to connect to the JVM then this causes a null pointer
exception as the JVM tries to close the native process
object. The accessors and close methods of the native
process class that access the C++ log handler should not
assume that it connected correctly.
Backport of #62059 to 7.x branch.
Return a 404 http status code when attempting to delete a non existing data stream.
However only return a 404 when targeting a data stream without any wildcards.
Closes#62022
This commit addresses a super minor misalignment with master, applying exactly the same change that was made as part of #62057, which was backported before point in time APIs were backported.
If shards are relocated to new nodes, then searches with a point in time
will fail, although a pit keeps search contexts open. This commit solves
this problem by reducing info used by SearchShardIterator and always
including the matching nodes when resolving a point in time.
Closes#61627
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.
Relates #61446
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
This commit introduces a new API that manages point-in-times in x-pack
basic. Elasticsearch pit (point in time) is a lightweight view into the
state of the data as it existed when initiated. A search request by
default executes against the most recent point in time. In some cases,
it is preferred to perform multiple search requests using the same point
in time. For example, if refreshes happen between search_after requests,
then the results of those requests might not be consistent as changes
happening between searches are only visible to the more recent point in
time.
A point in time must be opened before being used in search requests. The
`keep_alive` parameter tells Elasticsearch how long it should keep a
point in time around.
```
POST /my_index/_pit?keep_alive=1m
```
The response from the above request includes a `id`, which should be
passed to the `id` of the `pit` parameter of search requests.
```
POST /_search
{
"query": {
"match" : {
"title" : "elasticsearch"
}
},
"pit": {
"id": "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWICBXV1aWQyAAAFdXVpZDEAAQltYXRjaF9hbGw_gAAAAA==",
"keep_alive": "1m"
}
}
```
Point-in-times are automatically closed when the `keep_alive` is
elapsed. However, keeping point-in-times has a cost; hence,
point-in-times should be closed as soon as they are no longer used in
search requests.
```
DELETE /_pit
{
"id" : "46ToAwMDaWR4BXV1aWQxAgZub2RlXzEAAAAAAAAAAAEBYQNpZHkFdXVpZDIrBm5vZGVfMwAAAAAAAAAAKgFjA2lkeQV1dWlkMioGbm9kZV8yAAAAAAAAAAAMAWIBBXV1aWQyAAA="
}
```
#### Notable works in this change:
- Move the search state to the coordinating node: #52741
- Allow searches with a specific reader context: #53989
- Add the ability to acquire readers in IndexShard: #54966
Relates #46523
Relates #26472
Co-authored-by: Jim Ferenczi <jimczi@apache.org>
CCR shard follow task can hit CircuitBreakingException on the leader
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
We were missing a few `@Override` annotations in runtime fields which
let us drift from the methods we were supposed to override. Oops. This
adds them and links the methods.
For runtime fields we have written quite some lucene queries that work against runtime values that are the result of the execution of the different script contexts that runtime fields support.
The all (but one) share the same main logic: use a two phase iterator, iterate over all documents, and decide whether the current doc matches or not based on what the script returns. I went ahead and shared this bit of code in the base class for all queries on top of runtime fields.
This commit removes the documentation for some specific Searchable Snapshot REST APIs:
- clear cache
- searchable snapshot stats
- repository stats
These APIs are low-level and are useful to investigate the behavior of snapshot
backed indices but we expect them to be removed in the future or to appear in
a different form.
Backporting #62205 to 7.x branch.
This is similar to what happens for indices. Initially we decided to let each test cleanup the
data streams it created.
The reason behind this was that client yaml test runners would need to be modified to do this too and
because data steams were new, we waited with that and let each test cleanup the data stream it created.
However we sometimes have very hard to debug test failures, because many tests fail because another test
failed mid way and didn't clean up the data streams it created. Given that and data streams exist in
the code base for a while now, we should automatically delete all data streams after each yaml test.
Relates to #62190
* preserve data streams for rolling upgrade yaml tests
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing. The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects. However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects. As a first
step it makes sense to move the returned mappings closer
to the standard format.
This is a small building block towards fixing #55616
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
This prevent `keyword` valued runtime scripts from emitting too many
values or values that take up too much space. Without this you can put
allocate a ton of memory with the script by sticking it into a tight
loop. Painless has some protections against this but:
1. I don't want to rely on them out of sheer paranoia
2. They don't really kick in when the script uses callbacks like we do
anyway.
Relates to #59332
* [ML] only persist progress if it has changed
We already search for the previously stored progress document.
For optimization purposes, and to prevent restoring the same
progress after a failed analytics job is stopped,
this commit does an equality check between the previously stored progress and current progress
If the progress has changed, persistence continues as normal.
When a tree model is provided, it is possible that it is a stump.
Meaning, it only has one node with no splits
This implies that the tree has no features. In this case,
having zero feature_names is appropriate. In any other case,
this should be considered a validation failure.
This commit adds the validation if there is more than 1 node,
that the feature_names in the model are non-empty.
closes#60759