We disable MSU optimization if the local checkpoint is smaller than
max_seq_no_of_updates. Hence, we need to relax the MSU assertion in
FollowingEngine for that scenario. Suppose the leader has three
operations: index-0, delete-1, and index-2 for the same doc Id. MSU on
the leader is 1 as index-2 is an append. If the follower applies index-0
then index-2, then the assertion is violated.
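
The scenario can be replayed with a toy model. This is only a sketch under simplifying assumptions, not FollowingEngine code: operations are `(seq_no, kind)` tuples for a single doc, MSU is taken to be the highest seq_no among deletes and overwrites of a live doc, and the function names are made up for illustration.

```python
def leader_msu(ops):
    """Highest seq_no among deletes/updates; appends do not advance MSU."""
    live = False
    msu = -1
    for seq_no, kind in ops:
        if kind == "delete":
            msu = max(msu, seq_no)
            live = False
        else:  # "index"
            if live:
                msu = max(msu, seq_no)  # overwriting a live doc is an update
            live = True
    return msu


def follower_violations(applied_ops, msu):
    """seq_nos that look like updates on the follower yet exceed the leader's MSU."""
    live = False
    violations = []
    for seq_no, kind in applied_ops:
        if kind == "delete":
            live = False
        else:
            if live and seq_no > msu:
                violations.append(seq_no)
            live = True
    return violations


leader_ops = [(0, "index"), (1, "delete"), (2, "index")]
msu = leader_msu(leader_ops)
# The follower has not yet applied delete-1, so index-2 looks like an update.
print(msu, follower_violations([(0, "index"), (2, "index")], msu))
```

With the delete applied in order, the same check reports no violation, which is why the assertion only needs relaxing for the out-of-order case.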
Closes #47137
To be on the safe side in terms of use cases also add the alias
DATETRUNC to the DATE_TRUNC function.
Follows: #46473
(cherry picked from commit 9ac223cb1fc66486f86e218fa785a32b61e9bacc)
Drop the usage of `SimpleDateFormat` and use the `DateFormatter` instead
(cherry picked from commit 7cf509a7a11ecf6c40c44c18e8f03b8e81fcd1c2)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
In some cases, the fetch size affects the way the groups are returned
causing the last page to go beyond the limit. Add dedicated check to
prevent extra data from being returned.
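
A minimal sketch of such a check, assuming rows are paged in memory; `paginate_with_limit` is a hypothetical helper, not the actual SQL cursor code. The last page is truncated to whatever is left of the limit, regardless of the fetch size.

```python
def paginate_with_limit(rows, fetch_size, limit):
    """Yield pages of at most fetch_size rows, never more than limit rows in total."""
    remaining = limit
    for start in range(0, len(rows), fetch_size):
        if remaining <= 0:
            break
        # Cut the page down so the running total cannot exceed the limit.
        page = rows[start:start + fetch_size][:remaining]
        remaining -= len(page)
        yield page


pages = list(paginate_with_limit(list(range(10)), fetch_size=4, limit=6))
print(pages)
```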
Fix #47002
(cherry picked from commit f4c29646f097bbd29855300342823ef4cef61c05)
Enables support for Cartesian geometries shape type. We still need to
decide how to handle the distance function since it is currently using
the haversine distance formula and returns results in meters, which
doesn't make any sense for Cartesian geometries.
Closes #46412
Relates to #43644
When the ML native multi-node tests use _cat/indices/_all
and the request goes to a non-master node, _all is
translated to a list of concrete indices by the authz layer
on the coordinating node before the request is forwarded
to the master node. Then it is possible for the master
node to return an index_not_found_exception if one of
the concrete indices that was expanded on the
coordinating node has been deleted in the meantime.
(#47159 has been opened to track the underlying problem.)
It has been observed that the index that gets deleted when
the problem affects the ML native multi-node tests is
always the ML notifications index. The tests that fail are
only interested in the presence or absence of ML results
indices. Therefore the workaround is to only _cat indices
that match the ML results index pattern.
Fixes #45652
In the current implementation, the validation of the role query
occurs at runtime when the query is being executed.
This commit adds validation for the role query when creating a role
but not for the template query, as we do not have the runtime
information required for evaluating the template query (e.g. the authenticated
user's information). This is similar to the scripts that we store without
evaluating or parsing them to check whether they are valid.
For validation, the query is evaluated (if not a template), parsed to build the
QueryBuilder, and checked to verify that the query type is allowed.
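
The validation flow might be sketched as follows. This is an illustration only: the allow-list, the function name and the return values are assumptions, and the real code builds a QueryBuilder rather than inspecting JSON keys.

```python
import json

# Hypothetical allow-list; the real set of permitted query types differs.
ALLOWED_QUERY_TYPES = {"term", "terms", "match", "bool", "range"}


def validate_role_query(raw_query):
    """Validate a role query at role-creation time; template queries are deferred."""
    query = json.loads(raw_query)  # malformed JSON raises ValueError
    if not isinstance(query, dict) or len(query) != 1:
        raise ValueError("role query must be an object with exactly one query type")
    (query_type,) = query
    if query_type == "template":
        # Needs runtime information (e.g. the authenticated user), so skip it here.
        return "deferred"
    if query_type not in ALLOWED_QUERY_TYPES:
        raise ValueError(f"query type [{query_type}] is not allowed in a role query")
    return "validated"


print(validate_role_query('{"term": {"owner": "alice"}}'))
```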
Closes #34252
This change merges the `ShardSearchTransportRequest` and `ShardSearchLocalRequest`
into a single `ShardSearchRequest` that can be used to create a SearchContext.
Relates #46523
This commit adds support for POST requests to the SLM `_execute` API,
because POST is a more appropriate HTTP verb for this action as it is
not idempotent. The docs are also changed to favor POST over PUT,
although PUT is not removed or officially deprecated.
* ILM: parse origination date from index name (#46755)
Introduce the `index.lifecycle.parse_origination_date` setting that
indicates if the origination date should be parsed from the index name.
If set to true, an index which doesn't match the expected format (namely
`indexName-{dateFormat}-optional_digits`) will fail before being created.
The origination date will be parsed when initialising a lifecycle for an
index and it will be set as the `index.lifecycle.origination_date` for
that index.
A user-set value for `index.lifecycle.origination_date` will always
override a date parsed from the index name.
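
A rough sketch of the parsing rule, assuming the date part uses a `yyyy.MM.dd` layout and is interpreted as UTC (both are assumptions, as are the function names); the actual implementation lives in the ILM code and may differ.

```python
import re
from datetime import datetime, timezone

# Assumed layout: <name>-<yyyy.MM.dd>[-<digits>]
_INDEX_NAME = re.compile(r"^.*-(\d{4}\.\d{2}\.\d{2})(-\d+)?$")


def parse_origination_date(index_name):
    """Return the date embedded in the index name as epoch milliseconds."""
    match = _INDEX_NAME.match(index_name)
    if match is None:
        raise ValueError(f"index name [{index_name}] does not match the expected format")
    parsed = datetime.strptime(match.group(1), "%Y.%m.%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)


def effective_origination_date(index_name, user_value=None):
    """The name must still parse, but a user-set value takes precedence."""
    parsed = parse_origination_date(index_name)
    return user_value if user_value is not None else parsed


print(parse_origination_date("logs-2019.09.24-000001"))
```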
(cherry picked from commit c363d27f0210733dad0c307d54fa224a92ddb569)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
* Drop usage of Map.of to be java 8 compliant
* Wait for snapshot completion in SLM snapshot invocation
This changes the snapshots internally invoked by SLM to wait for
completion. This allows us to capture more snapshotting failure
scenarios.
For example, previously a snapshot would be created and then registered
as a "success", however, the snapshot may have been aborted, or it may
have had a subset of its shards fail. These cases are now handled by
inspecting the response to the `CreateSnapshotRequest` and ensuring that
there are no failures. If any failures are present, the history store
now stores the action as a failure instead of a success.
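
The inspection step could look roughly like this. It is a sketch only: the response is modelled as a plain dict, and the field names (`state`, `shards.failed`) and the function name are illustrative assumptions rather than the SLM code.

```python
def classify_snapshot_result(response):
    """Return ("success", None) or ("failure", reason) for a create-snapshot response."""
    snapshot = response.get("snapshot", {})
    state = snapshot.get("state")
    failed_shards = snapshot.get("shards", {}).get("failed", 0)
    if state != "SUCCESS":
        # Covers aborted or otherwise incomplete snapshots.
        return ("failure", f"snapshot finished in state [{state}]")
    if failed_shards > 0:
        return ("failure", f"{failed_shards} shard(s) failed")
    return ("success", None)


print(classify_snapshot_result({"snapshot": {"state": "SUCCESS", "shards": {"failed": 0}}}))
```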
Relates to #38461 and #43663
Using arrays of objects with embedded IDs is preferred for new APIs over
using entity IDs as JSON keys. This commit changes the SLM stats API to
use the preferred format.
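
To illustrate the difference between the two shapes, here is a small conversion sketch. The field name `policy` and the stat names are assumptions chosen for the example.

```python
def to_array_format(stats_by_policy_id):
    """Turn {"<policy-id>": {...}} into [{"policy": "<policy-id>", ...}]."""
    return [
        {"policy": policy_id, **stats}
        for policy_id, stats in sorted(stats_by_policy_id.items())
    ]


keyed = {
    "nightly": {"snapshots_taken": 3},
    "daily": {"snapshots_taken": 7},
}
print(to_array_format(keyed))
```

With IDs embedded in each object, clients can map the response onto a fixed schema instead of handling arbitrary JSON keys.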
* [ML][Inference] Feature pre-processing objects and functions (#46777)
To support inference on pre-trained machine learning models, some basic feature encoding will be necessary. I am using a named object serialization approach so new encodings/pre-processing steps could be added in the future.
This PR lays down the ground work for 3 basic encodings:
* HotOne
* Target Mean
* Frequency
More feature encodings or pre-processings could be added in the future:
* Handling missing columns
* Standardization
* Label encoding
* etc....
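
For readers unfamiliar with the three encodings, a plain-Python sketch follows, with one-hot corresponding to the HotOne entry above. It only illustrates the general techniques on in-memory lists; it is not the named-object implementation from this PR, and the function names are made up.

```python
from collections import Counter, defaultdict


def one_hot(values):
    """One column per category (sorted), 1 where the value matches."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def frequency_encode(values):
    """Replace each value by its relative frequency in the input."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in values]


def target_mean_encode(values, targets):
    """Replace each value by the mean target observed for that value."""
    sums = defaultdict(float)
    counts = Counter()
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [sums[v] / counts[v] for v in values]


colors = ["red", "blue", "red"]
print(one_hot(colors))
print(frequency_encode(colors))
print(target_mean_encode(colors, [1.0, 0.0, 3.0]))
```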
* fixing compilation for namedxcontent tests
This commit clarifies and points out that the Role management UI and
the Role management API cannot be used to manage roles that are
defined in roles.yml and that file based role management is
intended to have a small administrative scope and not handle all
possible RBAC use cases.
This change allows for the caller of the `saml/prepare` API to pass
a `relay_state` parameter that will then be part of the redirect
URL in the response as the `RelayState` query parameter.
The SAML IdP is required to reflect back the value of that relay
state when sending a SAML Response. The caller of the APIs can
then, when receiving the SAML Response, read and consume the value
as it sees fit.
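
How the relay state ends up in the redirect URL can be sketched as below. The IdP URL, the placeholder request value and the helper name are assumptions for illustration; only the `RelayState` and `SAMLRequest` parameter names come from the SAML redirect binding.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_redirect_url(idp_sso_url, saml_request, relay_state=None):
    """Append SAMLRequest and, if given, RelayState as query parameters."""
    params = {"SAMLRequest": saml_request}
    if relay_state is not None:
        params["RelayState"] = relay_state
    separator = "&" if urlsplit(idp_sso_url).query else "?"
    return idp_sso_url + separator + urlencode(params)


url = build_redirect_url("https://idp.example.org/sso", "encoded-request", "/app/home?tab=1")
# The caller can later recover the value the IdP reflects back.
print(parse_qs(urlsplit(url).query)["RelayState"])
```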
This change adds a check to the migration tool that warns about the deprecated
`enabled` setting for the `_field_names` field on 7.x indices and issues a
warning for templates containing this setting, which has been removed
in 8.0.
Relates to #42854, #46681
When we rewrite alias requests, after filtering down to only those that
the user is authorized to see, it can be that there are no aliases
remaining in the request. However, core Elasticsearch interprets this as
_all so the user would see more than they are authorized for. To address
this, we previously rewrote all such requests to have aliases `"*"`,
`"-*"`, which would be interpreted when aliases are resolved as
none. Yet, this is only needed for get aliases requests and we were
applying it to all alias requests, including remove index requests. If
such a request was sent to a coordinating node that is not the master
node, the request would be rewritten to include `"*"` and `"-*"`, and
then the master would authorize the user for these. If the user had
limited permissions, the request would fail, even if they were
authorized on the index that the remove index action was over. This
commit addresses this by rewriting for get aliases and remove
aliases request types but not for the remove index.
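
The rule can be sketched as follows. The request type strings and the function name are hypothetical stand-ins for the real request classes; only the `"*"`, `"-*"` rewrite itself comes from the description above.

```python
NO_ALIASES = ["*", "-*"]  # resolves to "no aliases" instead of being read as _all


def rewrite_aliases(request_type, requested, authorized):
    """Filter aliases by authorization; rewrite an empty result only where needed."""
    visible = [alias for alias in requested if alias in authorized]
    if not visible and request_type in {"get_aliases", "remove_aliases"}:
        return list(NO_ALIASES)
    # Remove-index requests are left untouched so they are not authorized on "*".
    return visible


print(rewrite_aliases("get_aliases", ["logs"], set()))
print(rewrite_aliases("remove_index", ["logs"], set()))
```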
Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
Co-authored-by: Tim Vernum <tim@adjective.org>
This change works around JDK-8213202, which is a bug related to TLSv1.3
session resumption before JDK 11.0.3 that occurs when there are
multiple concurrent sessions being established. Nodes connecting to
each other will trigger this bug when client authentication is
disabled, which is the case for SSLClientAuthTests.
Backport of #46680
Previously, queries on the _index field were not able to specify index aliases.
This was a regression in functionality compared to the 'indices' query that was
deprecated and removed in 6.0.
Now queries on _index can specify an alias, which is resolved to the concrete
index names when we check whether an index matches. To match a remote shard
target, the pattern needs to be of the form 'cluster:index' to match the
fully-qualified index name. Index aliases can be specified in the following query
types: term, terms, prefix, and wildcard.
This commit replaces the SearchContext used in AbstractQueryTestCase with
a QueryShardContext in order to reduce the visibility of search contexts.
Relates #46523
Add initial PIVOT support for transforming a regular table into a
statistics table around an arbitrary pivoting column:
SELECT * FROM
(SELECT languages, country, salary FROM mp)
PIVOT (AVG(salary) FOR country IN ('NL', 'DE', 'ES', 'RO', 'US'))
In the current implementation, PIVOT allows only one aggregation; however,
this restriction is likely to be lifted in the future.
Also, not all aggregations work; in particular, MatrixStats is not yet supported.
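
What the PIVOT query computes can be shown with a small in-memory equivalent. This is an illustration of the semantics only, assuming rows are dicts with `languages`, `country` and `salary` keys; it says nothing about how the query is translated internally.

```python
from collections import defaultdict


def pivot_avg(rows, group_key, pivot_key, value_key, pivot_values):
    """One output row per group, one AVG column per requested pivot value."""
    buckets = defaultdict(list)
    for row in rows:
        if row[pivot_key] in pivot_values:
            buckets[(row[group_key], row[pivot_key])].append(row[value_key])
    groups = sorted({group for group, _ in buckets})
    return [
        {
            group_key: group,
            **{
                p: sum(buckets[(group, p)]) / len(buckets[(group, p)])
                if (group, p) in buckets
                else None  # no rows for this combination
                for p in pivot_values
            },
        }
        for group in groups
    ]


rows = [
    {"languages": 1, "country": "NL", "salary": 10},
    {"languages": 1, "country": "NL", "salary": 20},
    {"languages": 1, "country": "DE", "salary": 30},
    {"languages": 2, "country": "NL", "salary": 40},
]
print(pivot_avg(rows, "languages", "country", "salary", ["NL", "DE"]))
```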
(cherry picked from commit d91263746a222915c570d4a662ec48c1d6b4f583)
This PR adds some restrictions around testfixtures to make sure the same service (as defined in docker-compose.yml) is not shared between multiple projects.
Sharing would break running with --parallel.
Projects can still share fixtures as long as each has its own service within.
This is still useful to share some of the setup and configuration code of the fixture.
Projects now also have to specify a service name when calling useCluster to refer to a specific service.
If this is not the case all services will be claimed and the fixture can't be shared.
For this reason fixtures have to explicitly specify if they are using themselves (fixture and tests in the same project).
This commit adds the ability to require an ingest pipeline on an
index. Today we can have a default pipeline, but that could be
overridden by a request pipeline parameter. This commit introduces a new
index setting index.required_pipeline that acts similarly to
index.default_pipeline, except that it can not be overridden by a
request pipeline parameter. Additionally, a default pipeline and a
required pipeline can not both be set. The required pipeline can be set
to _none to ensure that no pipeline ever runs for index requests on that
index.
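
The precedence between the three pipeline sources might be sketched like this. It is a simplified model with a hypothetical function name; in particular, silently ignoring a conflicting request pipeline is an assumption made here, and the real behaviour may be to reject such a request instead.

```python
def resolve_pipeline(request_pipeline, default_pipeline=None, required_pipeline=None):
    """Pick the pipeline to run for an index request, or None for no pipeline."""
    if required_pipeline is not None:
        # A required pipeline cannot be overridden by the request parameter.
        chosen = required_pipeline
    else:
        # A request pipeline overrides the index default.
        chosen = request_pipeline or default_pipeline
    return None if chosen in (None, "_none") else chosen


print(resolve_pipeline("from-request", default_pipeline="index-default"))
print(resolve_pipeline("from-request", required_pipeline="always-run"))
print(resolve_pipeline("from-request", required_pipeline="_none"))
```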
When using auto-generated IDs + the ingest drop processor (which looks to be used by filebeat
as well) + coordinating nodes that do not have the ingest processor functionality, this can lead
to a NullPointerException.
The issue is that markCurrentItemAsDropped() is creating an UpdateResponse with no id when
the request contains auto-generated IDs. The response serialization is lenient for our
REST/XContent format (i.e. we will send "id" : null) but the internal transport format (used for
communication between nodes) assumes this field is non-null, which means that it can't
be serialized between nodes. Bulk requests with ingest functionality are processed on the
coordinating node if the node has the ingest capability, and only otherwise sent to a different
node. This means that, in order to reproduce this, one needs two nodes, with the coordinating
node not having the ingest functionality.
Closes #46678
The fact that this test randomly uses a relatively large number
of nodes and hence Netty worker threads created a problem with
running out of direct memory on CI.
Tests run with 512M heap (and hence 512M direct memory) by default.
On a CI worker with 16 cores, this means Netty will by default set
up 32 transport workers. If we get unlucky and a lot of them
actually do work (and thus instantiate a `CopyBytesSocketChannel`
which costs 1M per thread for the thread-local IO buffer) we
would run out of memory.
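
The arithmetic can be made concrete with a back-of-the-envelope sketch. It assumes two transport workers per core (consistent with 16 cores giving 32 workers), 1 MiB of thread-local buffer per active worker, and ignores every other consumer of direct memory; the helper names are made up.

```python
MIB = 1024 * 1024


def transport_workers(cores):
    """Assumed default: two workers per core (16 cores -> 32 workers)."""
    return 2 * cores


def worst_case_buffer_bytes(nodes, cores, per_thread_bytes=MIB):
    """Direct memory used if every worker on every node allocates its IO buffer."""
    return nodes * transport_workers(cores) * per_thread_bytes


def nodes_to_fill(direct_memory_bytes, cores, per_thread_bytes=MIB):
    """Smallest node count whose worst-case buffers reach the direct memory limit."""
    per_node = worst_case_buffer_bytes(1, cores, per_thread_bytes)
    return -(-direct_memory_bytes // per_node)  # ceiling division


print(worst_case_buffer_bytes(1, 16) // MIB)  # MiB per node in the worst case
print(nodes_to_fill(512 * MIB, 16))           # nodes needed to reach 512 MiB
```

Since other allocations also draw on direct memory, the practical threshold is lower than this estimate.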
This specific failure was only seen with `NativeRealmIntegTests` so I
only added the constraint on the Netty worker count here.
We can add it to other tests (or `SecurityIntegTestCase`) if need be
but for now it doesn't seem necessary so I opted for least impact.
Closes #46803
This commit reuses the same state processor that is used for autodetect
to parse state output from data frame analytics jobs. We then index the
state document into the state index.
Backport of #46804
* [ML][Transforms] remove `force` flag from _start (#46414)
* [ML][Transforms] remove `force` flag from _start
* fixing expected error message
* adjusting bwc version
It is possible for the config of a running analytics job to be removed
from the '.ml-config' index (perhaps the user deleted the entire index,
etc.). In that case the task remains without a matching config. I have
raised #46781 to discuss how to deal with this issue.
This commit focuses on `MlMemoryTracker` and changes it so that when
we get the configs for the running tasks we leniently ignore missing ones.
This at least means memory tracking will keep working for other jobs
if one or more are missing.
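
The lenient lookup can be sketched as below. This is a toy model, not `MlMemoryTracker` itself: configs are a dict keyed by job ID, and the `model_memory_limit_mb` field name is a hypothetical stand-in.

```python
def memory_requirements(running_job_ids, configs_by_id):
    """Collect memory limits for running jobs, skipping jobs whose config is gone."""
    found = {}
    missing = []
    for job_id in running_job_ids:
        config = configs_by_id.get(job_id)
        if config is None:
            # Leniently record and skip instead of failing the whole refresh.
            missing.append(job_id)
            continue
        found[job_id] = config["model_memory_limit_mb"]
    return found, missing


configs = {"job-a": {"model_memory_limit_mb": 100}}
print(memory_requirements(["job-a", "job-b"], configs))
```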
In addition, this commit makes the cleanup code for native analytics
tests more robust by explicitly stopping all jobs and force-stopping
if an error occurs. This helps so that a single failing test does
not cause other tests to fail due to pending tasks.
Backport of #46789
Since the `resolveAllDependencies` task resolves all the configurations
it can find, this was not caught by our testing, but it's required to be
configured specifically.
We should probably cut-over to the new configurations at some point to
avoid problems like this.
Closes elastic/infra#14580