This change ensures that internal client requests that are spawned by the
transform persistent task executor and that use the end-user security
credentials have the parent task id assigned. The objective here is
to permit auditing (as well as tracking, for debugging purposes) of all
the end-user requests executed on the user's behalf by persistent tasks.
Because transform tasks already implement graceful shutdown of their
child tasks, this change does not interfere with that; it opts out of
the persistent task cancellation of child tasks.
Relates #55046 #54943 #52314
Closes #54957
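A minimal Python sketch of the idea, for illustration only: `TaskId`, `Request`, `ParentTaggingClient` and the action names are hypothetical stand-ins, not the Elasticsearch Java API. A wrapper client stamps every request it sends with the persistent task's id, so an audit trail can relate each end-user request back to the task that issued it. Cancellation behaviour is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class TaskId:
    node_id: str
    task_number: int

    def __str__(self) -> str:
        return f"{self.node_id}:{self.task_number}"


@dataclass
class Request:
    action: str
    parent_task_id: Optional[TaskId] = None


class ParentTaggingClient:
    """Hypothetical client wrapper that stamps each outgoing request with a parent task id."""

    def __init__(self, parent: TaskId, send: Callable[[Request], None]) -> None:
        self._parent = parent
        self._send = send

    def execute(self, request: Request) -> None:
        request.parent_task_id = self._parent
        self._send(request)


audit_log: List[str] = []


def audit_and_send(request: Request) -> None:
    # Stand-in for the transport layer: record what an audit trail could log.
    audit_log.append(f"{request.action} parent={request.parent_task_id}")


client = ParentTaggingClient(TaskId("node-1", 42), audit_and_send)
client.execute(Request("indices:data/read/search"))
client.execute(Request("indices:data/write/bulk"))
print(audit_log)
# ['indices:data/read/search parent=node-1:42', 'indices:data/write/bulk parent=node-1:42']
```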
This commit refactors the `AuditTrail` to use the `TransportRequest` as a parameter
for all its audit methods, instead of the current `TransportMessage` superclass.
The goal is to gain access to the `TransportRequest#parentTaskId` member,
so that it can be audited. The `parentTaskId` is used internally when spawning tasks
that handle transport requests; in this way tasks across nodes are related by the
same parent task.
Relates #52314
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is, unfortunately, boilerplate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.
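A rough sketch of how such counters might work, assuming a hypothetical `RequestStats` helper that wraps each SDK call and counts it by operation; only GET and LIST are tracked, mirroring the S3-only scope above. The real endpoint and its response format are not shown here.

```python
from collections import Counter
from typing import Callable, Dict, TypeVar

T = TypeVar("T")

TRACKED_OPERATIONS = ("GET", "LIST")


class RequestStats:
    """Hypothetical per-repository counter of SDK requests by operation."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def track(self, operation: str, call: Callable[[], T]) -> T:
        # Count the request before issuing it, so failed calls are counted too.
        if operation in TRACKED_OPERATIONS:
            self._counts[operation] += 1
        return call()

    def snapshot(self) -> Dict[str, int]:
        # Report every tracked operation, including those with zero requests.
        return {op: self._counts[op] for op in TRACKED_OPERATIONS}


stats = RequestStats()
stats.track("GET", lambda: b"blob-bytes")
stats.track("GET", lambda: b"more-bytes")
stats.track("LIST", lambda: ["a", "b"])
stats.track("PUT", lambda: None)  # not tracked
print(stats.snapshot())  # {'GET': 2, 'LIST': 1}
```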
Fixes a couple of related failures in SearchableSnapshotsIntegTests.
Firstly, we were not correctly accounting for the case where the cache was so
small that some/all files were read directly; fixed this by only asserting that
the cache is definitely used if the corresponding node has a cache that's large
enough to hold the whole index.
Secondly, we were not permitting shards to be completely empty, which might be
the case (rarely) if there were not many documents indexed and the distribution
of IDs was a bit unlucky; fixed this by asserting that we get stats for at
least one file for the whole index, rather than for each shard separately.
Closes #55126
This is a first cut at giving NodeInfo the ability to carry a flexible
list of heterogeneous info responses. The trick is to be able to
serialize and deserialize an arbitrary list of blocks of information. It
is convenient to be able to deserialize into usable Java objects so that
we can aggregate node stats for the cluster stats endpoint.
In order to provide a little bit of clarity about which objects can and
can't be used as info blocks, I've introduced a new interface called
"ReportingService."
I have removed the hard-coded getters (e.g., getOs()) in favor of a
flexible method that can return heterogeneous kinds of info blocks
(e.g., getInfo(OsInfo.class)). Taking a class as an argument removes the
need to cast in the client code.
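One way to picture the pattern, as a Python sketch with hypothetical names (`InfoBlock`, `get_info`, a name-keyed `REGISTRY`): blocks are stored by class, serialized as (type name, fields) pairs, and deserialized back into usable objects through a registry. Passing the class to `get_info` plays the role of `getInfo(OsInfo.class)`, so callers get a typed result without casting.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Tuple, Type, TypeVar, cast


class InfoBlock:
    """Hypothetical marker base class for objects that may be carried as info blocks."""


@dataclass
class OsInfo(InfoBlock):
    available_processors: int


@dataclass
class JvmInfo(InfoBlock):
    version: str


# Registry used to turn a serialized block name back into a usable object.
REGISTRY: Dict[str, Type[InfoBlock]] = {cls.__name__: cls for cls in (OsInfo, JvmInfo)}

B = TypeVar("B", bound=InfoBlock)


class NodeInfo:
    def __init__(self, blocks: List[InfoBlock]) -> None:
        self._blocks: Dict[type, InfoBlock] = {type(b): b for b in blocks}

    def get_info(self, cls: Type[B]) -> Optional[B]:
        # Lookup keyed by class: the caller names the type it wants, so no cast is needed.
        return cast(Optional[B], self._blocks.get(cls))

    def serialize(self) -> List[Tuple[str, dict]]:
        # Each block is written as (type name, fields), in insertion order.
        return [(type(b).__name__, asdict(b)) for b in self._blocks.values()]

    @classmethod
    def deserialize(cls, data: List[Tuple[str, dict]]) -> "NodeInfo":
        return cls([REGISTRY[name](**fields) for name, fields in data])


original = NodeInfo([OsInfo(available_processors=8), JvmInfo(version="14")])
restored = NodeInfo.deserialize(original.serialize())
print(restored.get_info(OsInfo))  # OsInfo(available_processors=8)
print(NodeInfo([JvmInfo(version="14")]).get_info(OsInfo))  # None
```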
The isAuthAllowed() method for license checking is used by code that
wants to ensure security is both enabled and available. The enabled
state is dynamic and provided by isSecurityEnabled(). But since security
is available with all license types, a check on the license level is
not necessary. Thus, this change replaces isAuthAllowed() with calling
isSecurityEnabled().
We needlessly send documents to be persisted. If no stats have been added, we should not attempt to persist them.
Also, this PR fixes the race condition that caused issue https://github.com/elastic/elasticsearch/issues/54786.
Adds support for filters to T-Test aggregation. The filters can be used to
select populations based on some criteria and use values from the same or
different fields.
Closes #53692
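A small sketch of filter-selected populations, assuming documents are plain dicts and filters are predicates (the helpers `select` and `welch_t` are hypothetical). Each population may name its own field. Welch's unequal-variance statistic is used here only as an example; this sketch does not reproduce the aggregation's actual output.

```python
import math
from statistics import mean, variance
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def select(docs: List[Doc], field: str, keep: Callable[[Doc], bool]) -> List[float]:
    """Values of `field` from the documents that pass the population filter."""
    return [float(d[field]) for d in docs if keep(d)]


def welch_t(a: List[float], b: List[float]) -> float:
    """Welch's (unequal-variance) two-sample t statistic."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


docs: List[Doc] = [{"group": "A", "latency": v} for v in (1, 2, 3, 4, 5)] + [
    {"group": "B", "latency": v} for v in (2, 4, 6, 8, 10)
]
# Two populations from the same field, selected by different filters.
a = select(docs, "latency", lambda d: d["group"] == "A")
b = select(docs, "latency", lambda d: d["group"] == "B")
print(round(welch_t(a, b), 4))  # -1.8974
```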
This change converts the module and plugin parameters
for testClusters to be lazy, meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.
Note: this does not completely remove the need for afterEvaluate,
since it is still needed for the custom resource extension.
A small follow-up to #54910. Now that we can generate a consistent set of
internal aggs to reduce, we no longer need to keep agg parameters as class
variables.
Related to #54910
* [ML] Start gathering and storing inference stats (#53429)
This PR enables stats on inference to be gathered and stored in the `.ml-stats-*` indices.
Each node + model_id will have its own running stats document and these will later be summed together when returning _stats to the user.
`.ml-stats-*` is ILM managed (when possible), so at any point the underlying index could change. This means that a stats document that is read in and then later updated will actually be a new doc in a new index. Because we don't know the latest index name, keeping a running knowledge of seq_no and primary_term is complicated and almost impossible.
We should also strive for throughput, as this code sits in the middle of an ingest pipeline (or even a query).
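A sketch of the read-side summation, assuming hypothetical field names `inference_count` and `failure_count`: each (node, model_id) pair owns one running document, and the totals reported for a model are the sum across nodes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InferenceStatsDoc:
    node_id: str
    model_id: str
    inference_count: int
    failure_count: int


def summed_stats(docs: List[InferenceStatsDoc]) -> Dict[str, Dict[str, int]]:
    """Sum the per-node running documents into one total per model."""
    totals: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"inference_count": 0, "failure_count": 0}
    )
    for d in docs:
        t = totals[d.model_id]
        t["inference_count"] += d.inference_count
        t["failure_count"] += d.failure_count
    return dict(totals)


docs = [
    InferenceStatsDoc("node-1", "m1", 10, 1),
    InferenceStatsDoc("node-2", "m1", 5, 0),
    InferenceStatsDoc("node-1", "m2", 3, 0),
]
print(summed_stats(docs))
# {'m1': {'inference_count': 15, 'failure_count': 1}, 'm2': {'inference_count': 3, 'failure_count': 0}}
```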
This commit adds a timeout when moving ILM back onto a failed step. In
case the master is struggling to process the cluster update requests,
these ones will expire (as we'll send them again anyway on the next ILM
loop run).
ILM more descriptive source messages for cluster updates
Use the configured ILM step master timeout setting
(cherry picked from commit ff6c5ed16616eadfcddd9c95317d370f0d126583)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* ILM use Priority.IMMEDIATE for stop ILM cluster update (#54909)
This changes the priority of the cluster state update that stops ILM
altogether to `IMMEDIATE`. We've chosen to change this as it can be useful to
temporarily stop ILM if a cluster is overwhelmed, but with `NORMAL`
priority the "stop ILM" update may never make it to the front of the task queue.
On the same note, we're keeping the `start ILM` cluster update priority
at `NORMAL` on purpose, so that we only start ILM if the cluster can
handle it.
(cherry picked from commit d67df3a7cd2a8619c2c9efac4dde3ba83271f2fa)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
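A simplified model of why the priority matters, assuming a pending-task queue that always runs the highest-priority task first and is FIFO within a priority (a simplification, not the master service implementation). A stop request submitted behind a backlog of `NORMAL` updates jumps the queue, while a `NORMAL` start request waits its turn.

```python
import heapq
import itertools
from enum import IntEnum
from typing import List, Tuple


class Priority(IntEnum):
    # Lower value runs first. Only the two priorities mentioned above are modeled.
    IMMEDIATE = 0
    NORMAL = 1


class PendingTasks:
    """Simplified pending-task queue: highest priority first, FIFO within a priority."""

    def __init__(self) -> None:
        self._heap: List[Tuple[Priority, int, str]] = []
        self._seq = itertools.count()  # tie-breaker preserving submission order

    def submit(self, source: str, priority: Priority) -> None:
        heapq.heappush(self._heap, (priority, next(self._seq), source))

    def drain(self) -> List[str]:
        order = []
        while self._heap:
            order.append(heapq.heappop(self._heap)[2])
        return order


queue = PendingTasks()
for i in range(3):
    queue.submit(f"ilm-move-to-step-{i}", Priority.NORMAL)
queue.submit("stop-ilm", Priority.IMMEDIATE)
queue.submit("start-ilm", Priority.NORMAL)
print(queue.drain())
# ['stop-ilm', 'ilm-move-to-step-0', 'ilm-move-to-step-1', 'ilm-move-to-step-2', 'start-ilm']
```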
This change makes sure that all internal client requests that are spawned by the
data frame analytics persistent task executor and that use the end-user
security credentials have the parent task id assigned. The objective here
is to permit auditing (as well as tracking, for debugging purposes) of all
the end-user requests executed on the user's behalf by persistent tasks.
Because data frame analytics tasks already implement graceful shutdown
of child tasks, this change does not interfere with it; it opts out of
the persistent task cancellation of child tasks.
Relates #54943 #52314
We added a fancy method to provide random realistic test data to the
reduction tests in #54910. This uses that to remove some of the more
esoteric machinations in the agg tests. This will marginally increase
the coverage of the serialization tests and, more importantly, remove
some mysterious value generation code that only really made sense for
random reduction tests but was used all over the place. It doesn't, on
the other hand, make the tests shorter. Just *hopefully* more clear.
I only cleaned up a few tests this way. If we like this it'd probably be
worth grabbing others.
We currently create the .async-search index if necessary before performing any action (index, update or delete). In truth, this is needed only before storing the initial response. The other operations are updates or deletes, which will not find the document to update or delete anyway, even if the index gets created when missing. This also caused `testCancellation` failures, as we were trying to delete the document twice from the .async-search index: once from `TransportDeleteAsyncSearchAction` and once as a consequence of the search task being completed. The latter may run after the test has completed but before the cluster is shut down, causing problems for the after-test checks, for instance if it happens after all the indices have been cleaned up. It is totally fine to try to delete a response that is no longer found, but not if that call also triggers index creation.
With this commit we remove all the calls to createIndexIfNecessary from the update/delete operations, and leave a single call in storeInitialResponse, which is where the index is expected to be created.
Closes #54180
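A minimal in-memory sketch of the new rule, using a hypothetical `AsyncSearchStore` rather than the real index service: only `store_initial_response` may create the index, and deleting a response twice is harmless.

```python
from typing import Any, Dict


class AsyncSearchStore:
    """Hypothetical in-memory stand-in for the .async-search index."""

    def __init__(self) -> None:
        self.index_exists = False
        self._docs: Dict[str, Any] = {}

    def store_initial_response(self, doc_id: str, response: Any) -> None:
        # The only operation allowed to create the index.
        self.index_exists = True
        self._docs[doc_id] = response

    def update_response(self, doc_id: str, response: Any) -> bool:
        # A missing index or document simply means there is nothing to update.
        if doc_id not in self._docs:
            return False
        self._docs[doc_id] = response
        return True

    def delete_response(self, doc_id: str) -> bool:
        # Deleting an already-deleted response is harmless and never creates the index.
        if doc_id not in self._docs:
            return False
        del self._docs[doc_id]
        return True


store = AsyncSearchStore()
print(store.delete_response("id-1"), store.index_exists)  # False False
store.store_initial_response("id-1", {"is_running": True})
first = store.delete_response("id-1")   # e.g. from the delete API
second = store.delete_response("id-1")  # e.g. when the search task completes
print(first, second, store.index_exists)  # True False True
```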
We recently cleaned up the use of the word "metadata" across the
codebase. A few more uses have since trickled in, likely from
in-progress work. This commit cleans up these last few remaining
instances.
Relates #54519
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the code and in the docs to adjust to the current terminology.
Deprecate alternative sequence parameter declaration (with then by)
Disallow lack of time units inside maxspan
Fix #55023
Relates #54680
(cherry picked from commit 201adafba9def1de4bf843760defb9def3394f63)
Implement DATETIME_PARSE(<datetime_str>, <pattern_str>) function,
which parses a datetime string into a datetime object according to the
specified pattern. The allowed patterns are those of
java.time.format.DateTimeFormatter.
Relates to #53714
(cherry picked from commit 3febcd8f3cdf9fdda4faf01f23a5f139f38b57e0)
Today, we do not clear the recent errors in AutoFollowCoordinator when
we successfully auto-follow indices. This can lead to confusion for the
operators.
This change preserves the task id for internal requests for the `StartDatafeedPersistentTask`.
Task ids are a way to express a relationship between related internal requests.
In this particular case, the task ids are used for debugging and (soon) security auditing,
but not for task cancellation, because there is already a graceful shutdown of child
internal requests (given a task id) in place.
This change reintroduces the system index APIs for Kibana without the
changes made for marking which system indices could be accessed using
these APIs. In essence, this is a partial revert of #53912. Marking
which system indices may be accessed will be handled in a separate
change.
The APIs introduced here are wrapped versions of the existing REST
endpoints. A new setting is also introduced, since a user may change the
names of the Kibana system indices when multiple
instances of Kibana use the same instance of Elasticsearch.
Relates #52385
Backport of #54858
* Drop BASE TABLE type in favour of just TABLE
This commit drops the table type 'BASE TABLE' and replaces all
occurrences with just 'TABLE', since this type is more widely used and
friendlier to client applications that query for certain table types
in their discovery mode.
The 'TABLE' type is also explicitly mentioned by the JDBC and ODBC
standards, and although other data source-specific types are permitted,
older apps will not work well with them.
* Refactor table type constants out of IndexType
Move SQL_TABLE/_ALIAS out of IndexType, so that they can also be used in
that Enum definition.
(cherry picked from commit 70241b52697ac2cf71004040042123c1ec050299)
Implement DATETIME_FORMAT(<date/datetime/time>, ) function
which allows for formatting a timestamp to the specified format. The
patterns allowed are those of java.time.format.DateTimeFormatter.
Related to #53714
(cherry picked from commit 72be0b54a9299e87e785469cdc9aafac2a48c046)
Today the shards of searchable snapshots are allocated with a naive
`ExistingShardsAllocator` which selects the first valid node for each shard.
Thanks to #54729 we can now allow these shards to fall through to the balanced
shards allocator so that they are allocated in a more balanced fashion.
Relates #50999
Guava was removed from Elasticsearch many years ago, but remnants of it
remain due to transitive dependencies. When a dependency pulls guava
into the compile classpath, devs can inadvertently begin using methods
from guava without realizing it. This commit moves guava to a runtime
dependency in the modules where it is needed.
Note that one special case is the html sanitizer in watcher. The third-party
dependency uses guava in the PolicyFactory class signature. However, only
calling a method on the PolicyFactory actually causes the class to be
loaded; a reference alone does not make compilation look at the
class implementation. So we use a MethodHandle to invoke the
relevant method at runtime, where guava will continue to exist.
This commit introduces a new `geo` module that is intended
to contain all the geo-spatial-specific features in server.
As a first step, the responsibility of registering the geo_shape
field mapper is moved to this module.
Co-authored-by: Nicholas Knize <nknize@gmail.com>
This removes pipeline aggregators from the aggregation result tree
except for a single field used for backwards compatibility with pre-7.8
versions of Elasticsearch. That field isn't populated unless we are
serializing to pre-7.8 Elasticsearch. So, good news! We no longer build
pipeline aggregators on the data node. Most of the time.
This allows subclasses of `InternalAggregationTestCase` to make a `List`
of values to reduce so that it can make values that are realistic
*together*. The first use of this is with `InternalTTest` which uses it
to make results that don't cause their `sum` field to wrap. It'd likely
be useful for a ton of other aggs but just one for now.
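A sketch of why generating the values *together* helps, assuming a hypothetical `values_for_reduce` helper and a caller-chosen `limit`: bounding each of the `n` values by `limit / n` guarantees their sum stays within `limit`, which independently generated values cannot promise.

```python
import math
import random
import sys
from typing import List


def values_for_reduce(n: int, rng: random.Random, limit: float) -> List[float]:
    """Generate n values together, bounding each by limit / n so their sum stays within limit."""
    bound = limit / n
    return [rng.uniform(-bound, bound) for _ in range(n)]


rng = random.Random(42)
values = values_for_reduce(5, rng, limit=1e6)
print(abs(sum(values)) <= 1e6)  # True

# Independently generated values have no such guarantee; with doubles the sum can overflow.
print(sys.float_info.max + sys.float_info.max == math.inf)  # True
```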
Using blank lines as separators between tests can be tricky to
deal with in merges, where such lines can be added by accident.
Furthermore, counting non-consecutive lines is non-intuitive.
The tests have been aligned to use ; at the end of the query and
exceptions, so that the presence or absence of empty lines is irrelevant.
The spec parsing has been changed to perform validation, so that invalid or
incomplete specs are reported clearly instead of causing unexpected exceptions.
(cherry picked from commit 192ad88d3a51e1e1f1f82830526518720ec88217)
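A sketch of a `;`-terminated spec parser under an assumed format (a `name:` line, a query ending with `;`, expected output ending with `;`); the real spec layout may differ. Blank lines are ignored, and an incomplete spec at the end of input is reported with a descriptive `SpecError`. `parse_spec`, `SpecTest` and `SpecError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SpecTest:
    name: str
    query: str
    expected: str


class SpecError(ValueError):
    pass


def parse_spec(text: str) -> List[SpecTest]:
    """Parse `name:` / query `;` / expected `;` blocks; blank lines carry no meaning."""
    tests: List[SpecTest] = []
    name: Optional[str] = None
    parts: List[str] = []
    buf: List[str] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines are irrelevant, wherever they appear
        if name is None:
            if not line.endswith(":"):
                raise SpecError(f"line {lineno}: expected a test name ending with ':'")
            name = line[:-1]
            continue
        if line.endswith(";"):
            # A terminator closes the current part (first the query, then the expected output).
            buf.append(line[:-1])
            parts.append("\n".join(buf).strip())
            buf = []
            if len(parts) == 2:
                tests.append(SpecTest(name, parts[0], parts[1]))
                name, parts = None, []
        else:
            buf.append(line)
    if name is not None or buf:
        raise SpecError(f"incomplete spec at end of input (test {name!r})")
    return tests


SPEC = """
basicSelect:
SELECT 1
;
1
;

twoLines:

SELECT a FROM t;
a
2
;
"""

for t in parse_spec(SPEC):
    print(t)
```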
#54803 introduces more QA tests for Azure storage service, but
they fail the build if either the key or the token is missing. They should
instead work like the repository-azure:qa tests.
`testReduceRandom` was bumping up against the serialization that I added
in #54776. This makes it use random values that reduce in ways that
don't cause the randomized serialization to fail.
When a data frame analytics job is stopped, if the reindexing
task was still in progress we cancel it. Cancelling it should
be done from the same context as when we executed the reindexing
task. That means from a thread context with ML origin.
Backport of #54874
Asserting on the failed_step field from the explain API can produce flakiness
because the ILM state is moved back and forth between the (failing) step and
the ERROR step (as the workflow is retry, fail then move to ERROR step,
move back to the (failing) step, retry, fail, etc) and the failed_step
information is only available whilst in the ERROR state.
Unmute other tests as they were collateral failures
A read-only index could not be deleted in the wipeCluster phase and caused
these failures
(cherry picked from commit 99a6d57aeb3cf11abc38b514f38a96bb1612e357)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
`scripted_metric` did not work with cross cluster search because it
assumed that you'd never perform a partial reduction, serialize the
results, and then perform a final reduction. That
serialized-after-partial-reduction step was broken.
This is also required to support #54758.
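A toy illustration of the reduction shape that cross cluster search needs, assuming a hypothetical average-style metric with `sum`/`count` state: the output of a partial reduction must survive serialization and be reducible again. This is not the `scripted_metric` implementation.

```python
import json
from typing import List


def combine(shard_docs: List[int]) -> dict:
    # Per-shard state, analogous to what a combine step would produce.
    return {"sum": sum(shard_docs), "count": len(shard_docs)}


def partial_reduce(states: List[dict]) -> dict:
    # Merge states without producing the final value, so the result can be merged again.
    return {"sum": sum(s["sum"] for s in states), "count": sum(s["count"] for s in states)}


def final_reduce(states: List[dict]) -> float:
    merged = partial_reduce(states)
    return merged["sum"] / merged["count"]


# Remote cluster: partially reduce its shards, then serialize the intermediate result.
remote = json.dumps(partial_reduce([combine([1, 2, 3]), combine([4])]))
# Local cluster: deserialize the remote result and finish the reduction with local shards.
local = [combine([5, 6])]
print(final_reduce(local + [json.loads(remote)]))  # 3.5
```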
This commit adds a new point field that is able to index arbitrary pairs of values (x/y)
in the cartesian space. It only supports filtering using shape queries at the moment.
This is a backport of #54803 for 7.x.
This pull request cherry picks the squashed commit from #54803 with the additional commits:
6f50c92 which adjusts master code to 7.x
a114549 to mute a failing ILM test (#54818)
48cbca1 and 50186b2 that clean up and fix the previous test
aae12bb that adds a missing feature flag (#54861)
6f330e3 that adds missing serialization bits (#54864)
bf72c02 that adjusts the version in YAML tests
a51955f that adds some plumbing for the transport client used in integration tests
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Yannick Welsch <yannick@welsch.lu>
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
This change adds the response headers of the original search request
in the stored response in order to be able to restore them when retrieving a result
from the async-search index. It also ensures that response headers are preserved for
users that retrieve a final response on a running search task.
Partial responses may eventually return response headers too, but this change only ensures
that they are present when the response is final.
Relates #33936
This change ensures that the AsyncSearchUser is correctly (de)serialized when
an action executed by this user is sent to a remote node internally (via transport client).
Some functions act as shortcuts for more verbose declarations (sometimes
with certain constraints). This PR removes the boilerplate around
declaring such functions as well as a dedicated rule for the optimizer
to perform the actual substitution.
Fix #54334
(cherry picked from commit 3231d01b0c583deb89252fafe84db48878da3246)
First step towards async search execution. At the moment we don't try to cancel
the underlying search requests, and just check if the task is canceled before
performing network operations (such as field caps and search).
Relates to #49638
Adds t_test metric aggregation that can perform paired and unpaired two-sample
t-tests. In this PR support for filters in unpaired is still missing. It will
be added in a follow-up PR.
Relates to #53692
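A sketch of the two statistics on the same data, using the textbook formulas via Python's `statistics` module: the paired test works on per-pair differences, while this unpaired variant pools the variances (equal-variance assumption). The function names are hypothetical and p-values are omitted.

```python
import math
from statistics import mean, variance
from typing import List


def paired_t(a: List[float], b: List[float]) -> float:
    """Paired t statistic: a one-sample test on the per-pair differences."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / math.sqrt(variance(d) / len(d))


def homoscedastic_t(a: List[float], b: List[float]) -> float:
    """Unpaired t statistic assuming equal variances (pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


before = [1.0, 2.0, 3.0, 4.0, 5.0]
after = [2.0, 4.0, 6.0, 8.0, 10.0]
print(round(paired_t(before, after), 4))         # -4.2426
print(round(homoscedastic_t(before, after), 4))  # -1.8974
```

Pairing removes the between-sample spread, which is why the paired statistic is larger in magnitude on this data.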
Previously, the id of the `GetDataFrameAnalyticsStatsAction.Request`
could be `null`, which caused an NPE on serialization as `writeString`
is used (it doesn't accept null values).
This commit ensures the id is never null.
Closes #54807
Backport of #54808
It seems the 20-second timeout is occasionally not enough.
We still get sporadic failures where the logs reveal the job
wasn't opened within 20 seconds. I'm increasing the wait time
to 30 seconds.
Closes #54448
Backport of #54792
This changes a SamlServiceProvider to have a function that maps
from an "action-name" to a set of role names, instead of a Map that does
so.
The on-disk representation of this mapping is a set of Java Regexp
Patterns, for which the first matching group is the role name.
For example "sso:(\w+)" would map any action that started with "sso:"
to the corresponding role name (e.g. "sso:superuser" -> "superuser").
Backport of: #54440
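A Python sketch of the regex-based mapping, assuming full-match semantics (as with Java's `Matcher.matches()`) and that group 1 holds the role name; `role_mapper` is a hypothetical name, not the SamlServiceProvider API.

```python
import re
from typing import Callable, List, Set


def role_mapper(patterns: List[str]) -> Callable[[str], Set[str]]:
    """Build a function mapping an action name to role names via regex capture groups."""
    compiled = [re.compile(p) for p in patterns]

    def roles_for(action: str) -> Set[str]:
        roles: Set[str] = set()
        for pattern in compiled:
            # Assumption: the whole action name must match, and group 1 is the role.
            match = pattern.fullmatch(action)
            if match:
                roles.add(match.group(1))
        return roles

    return roles_for


roles_for = role_mapper([r"sso:(\w+)"])
print(roles_for("sso:superuser"))  # {'superuser'}
print(roles_for("kibana:read"))    # set()
```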
The SamlIdentityProviderTests IntegTests would sometimes encounter a
service unavailable exception when registering a new service provider.
This change ensures that there is a data node, and that the cluster
state is recovered before registering providers.
Backport of: #54622
A remote client can throw a NoSuchRemoteClusterException while fetching
the cluster state from the leader cluster. We also need to handle that
exception when retrying to add a retention lease to the leader shard.
Closes #53225
The autoscaling REST tests use policies named "hot" in their test
cases. Instead, this commit changes the name of these policies to
"my_autoscaling_policy".
Some field name constants were not updated when we moved from "string" to "text"
and "keyword" fields. Renaming them makes it easier and faster to know which
field type is used in tests subclassing this base test case.
The test results are affected by the off-by-one error that is
fixed by https://github.com/elastic/ml-cpp/pull/1122
This test can be unmuted once that fix is merged and has been
built into ml-cpp snapshots.
Force stopping a failed job used to work but it
now puts the job in `stopping` state and hangs.
In addition, force stopping a `stopping` job is
not handled.
This commit addresses those issues with force
stopping data frame analytics. It aligns the
approach with the one followed for anomaly detection
jobs.
Backport of #54650
Backport of #54576.
This commit is part of issue #40366 to remove disabled Xlint warnings
from gradle files. Remove the Xlint exclusions from the following files:
- x-pack/plugin/rollup/build.gradle
- x-pack/plugin/monitoring/build.gradle
- x-pack/qa/rolling-upgrade-basic/build.gradle
Add type parameters to parameterized types. Add wildcard-type parameters
or bounded wildcard-type parameters. Suppress `unchecked` and `rawtypes`
warnings at method level.
In #33933 we disallowed changing the `enabled` parameter in object mappings.
However, the fix didn't cover the root object mapper. This PR adjusts the change
to also include the root mapper and clarifies the error message.
This adds the training_percent parameter to the analytics process for Classification and Regression. This parameter is then used to give more accurate memory estimations.
See native-side PR: elastic/ml-cpp#1111
Removes pipeline aggregations from the aggregation result tree as they
are no longer used. This stops us from building the pipeline aggregators
at all on data nodes except for backwards compatibility serialization.
This will save a tiny bit of space in the aggregation tree which is
lovely, but the biggest benefit is that it is a step towards simplifying
pipeline aggregators.
This only does about half of the work to remove the pipeline aggs from
the tree. Removing all of it would, well, double the size of the change
and make it harder to review.
* [ML] add new inference_config field to trained model config (#54421)
A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder.
The inference processor can still override whatever is set as the default in the trained model config.
* fixing for backport
This commit addresses an issue with the autoscaling feature flag not
being registered in release builds of the internal cluster tests. It
fixes this by enabling the system property that is needed,
but only in release builds.
* [ML] prefer secondary authorization header for data[feed|frame] authz (#54121)
Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds.
Now on PUT/Update/Preview datafeed and PUT data frame analytics, the secondary authorization is preferred over the primary (if provided).
closes https://github.com/elastic/elasticsearch/issues/53801
* fixing for backport
- Consolidates HDR/TDigest factories into a single factory
- Consolidates most HDR/TDigest builder logic into an abstract builder
- Deprecates method(), compression(), numSigFig() in favor of a new
unified PercentileConfig object
- Disallows setting algo options that don't apply to current algo
The unified config method carries both the method and the algo-specific
settings. This provides a mechanism to reject settings that apply
to the wrong algorithm. For BWC the old methods are retained
but marked as deprecated, and can be removed in future versions.
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
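A sketch of the validation idea with hypothetical `TDigestConfig`/`HdrConfig` types and a `build_config` helper (not the Elasticsearch builder API); the default values shown are illustrative. Settings that belong to the other algorithm are rejected instead of silently ignored.

```python
from dataclasses import dataclass
from typing import Any, Dict, Set, Union


@dataclass(frozen=True)
class TDigestConfig:
    compression: float = 100.0


@dataclass(frozen=True)
class HdrConfig:
    number_of_significant_value_digits: int = 3


PercentilesConfig = Union[TDigestConfig, HdrConfig]

# Which settings each algorithm understands.
ALLOWED: Dict[str, Set[str]] = {
    "tdigest": {"compression"},
    "hdr": {"number_of_significant_value_digits"},
}


def build_config(method: str, **settings: Any) -> PercentilesConfig:
    """Build a unified config, rejecting settings that belong to another algorithm."""
    if method not in ALLOWED:
        raise ValueError(f"unknown percentiles method [{method}]")
    unexpected = set(settings) - ALLOWED[method]
    if unexpected:
        raise ValueError(f"settings {sorted(unexpected)} do not apply to method [{method}]")
    if method == "tdigest":
        return TDigestConfig(**settings)
    return HdrConfig(**settings)


print(build_config("hdr", number_of_significant_value_digits=2))
# HdrConfig(number_of_significant_value_digits=2)
try:
    build_config("hdr", compression=200.0)
except ValueError as e:
    print(e)  # settings ['compression'] do not apply to method [hdr]
```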
These methods are not needed; we were only following a pattern in the
rest of the codebase, but it's a legacy of the HLRC sharing
request/response objects with the server.
When one of ML's normalize processes fails to connect to the JVM
quickly enough and another normalize process for the same job
starts shortly afterwards it is possible that their named pipes
can get mixed up.
This change avoids the risk of that by adding an incrementing
counter value into the named pipe names used for normalize
processes.
Backport of #54636
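A sketch of the naming scheme, assuming a hypothetical `normalize_pipe_names` helper and path layout: each normalize process gets a fresh counter value, so two processes for the same job never share pipe names. Java code would typically use something like an `AtomicLong` for the counter.

```python
import itertools
import os
from typing import Dict

# Process-wide counter shared by all normalize processes.
_PIPE_COUNTER = itertools.count(1)


def normalize_pipe_names(job_id: str, pipe_dir: str) -> Dict[str, str]:
    """Hypothetical naming scheme: job id plus a unique, ever-increasing counter value."""
    n = next(_PIPE_COUNTER)
    base = os.path.join(pipe_dir, f"normalize_{job_id}_{n}")
    return {"input": base + "_input", "output": base + "_output"}


first = normalize_pipe_names("job-1", "pipes")
second = normalize_pipe_names("job-1", "pipes")
# A slow process from the first attempt can no longer open the second attempt's pipes.
print(first["input"] != second["input"])  # True
```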
* [ML] add num_matches and preferred_to_categories to category definition objects (#54214)
This adds two new fields to category definitions.
- `num_matches` indicating how many documents have been seen by this category
- `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized.
These fields are only guaranteed to be up to date after a `_flush` or `_close`.
native change: https://github.com/elastic/ml-cpp/pull/1062
* adjusting for backport