Using serialization/deserialization when dealing with non-trivial
documents causes the process to get stuck not to mention it is expensive.
Use a much more simple approach at the expense of losing information
(we're just interested in the source after all).
(cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
to avoid many error stacktraces in logs during a rolling upgrade.
Stack templates use the composable index template and component APIs,these APIs
aren't supported in 7.7 and earlier and in mixed cluster
environments this can cause a lot of ActionNotFoundTransportException
errors in the logs during rolling upgrades. If these templates
are only installed via elected master node then the APIs are always
there and the ActionNotFoundTransportException errors are then prevented.
When an inference model is loaded it is accounted for in circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.
This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.
Relates #50999
Case sensitivity is incorporated as a test dimension - instead of
running the same test twice, two different tests are created.
Clean-up the test invocation by removing unused parameters.
Fix#59294
(cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)
Unfortunately, we cannot guarantee that the execution will be truly
async even with 0ms timeout since we cannot block the execution. So, we need
to modify the test to work in both async and non-async mode.
Closes#59416
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.
Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.
This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.
This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.
Backport of #59038
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations
See updated JavaDoc and added test cases for more details and illustration on the functionality.
Some notes:
The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.
This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.
Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.
In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).
Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .
This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.
Closes#59425
Renames and moves the cross validation splitter package.
First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.
Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.
Backport of #59529
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.
This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.
Backport of #59424
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.
Co-authored-by: Jason Tedor <jason@tedor.me>
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.
This commit is fixing this issue.
Backport of #59524
This PR adds minimum support for prefix search of API Key name. It only touches API key name and leave all other query parameters, e.g. realm name, username unchanged.
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.
This PR introduce a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In another word, it will merge true with "true", also will merge false with "false", but nothing else.
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.
Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.
(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.
piggy backed fixes:
- when running geo tile group_by it could fail due to query clause limit (unreleased)
- new style page size using settings was not validating limit of 10k (7.8)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accomodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:
```
...
"data_streams" : {
"available" : true,
"enabled" : true,
"data_streams" : 3,
"indices_count" : 17
}
...
```
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.
Backport of #59395
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backport of #59293 to 7.x branch.
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
this results in storing a composable index template
with data stream definition only to work with default
distribution. This way data streams can only be used
with default distribution, since a data stream can
currently only be created if a matching composable index
template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
to fail if `_data_stream_timestamp` meta field mapper
isn't registered. So that a more understandable
error is returned when attempting to store a template
with data stream definition via the oss distribution.
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work. However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to. Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes. (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list. This means it's best to have both.)
Backport of #59377
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.
This setting may also be updated for a stopped job.
More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
Backport of #59254 and #57274
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.
(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.
Backport of #58877
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
* Add sample versions of standard deviation and variance functions (#59093)
* Add STDDEV_SAMP, VAR_SAMP
This commit adds the sampling variations of the standard deviation and
variance agg functions.
(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)
* Fix: workaround for lack of Map#of() in Java8
Replace Map#of() with a HashMap static init.
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.
relates #59140
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.
Relates to #58642
Relates to #53100Closes#58956Closes#58583
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.
Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
sub-aggregations live under multi-bucket aggregations. This is
normally true but `parentCardinality` is properly carried forward
for single bucket aggregations like `filter` and for multi-bucket
aggregations configured in single-bucket for like `range` with a
single range.
While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
Relates #56911
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)
(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
Backport of #58018
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
* GET data stream API returns additional information (#59128)
This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.
Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.
(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This removes the blocking model lookup from the `inference` aggregator's
builder by integrating it into the request rewrite process that loads
stuff asynchronously.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
Adds error handling when filling up the queue of the crypto thread pool. Also reduce queue size of the crypto thread pool to 10 so that the queue can be cleared out in time.
Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failure on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, it may be possible that there are lingering requests not fully cleared out from the queue. When it happens, we will not be able to push all 1000 tasks into the queue. But since what we need is just queue saturation, so as long as we can be sure that the queue is fully filled, it is safe to ignore rejection error and just move on.
A number of 1000 tasks also take some to clear out, which could cause the test suite to time out. This PR change the queue to 10 so the tests would have better chance to complete in time.
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices. Currently this can result in attempts
to search the newly created index before its shards have initialized.
This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.
Backport of #59027
* Enforce higher priority for RepositoriesService ClusterStateApplier
This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.
Backport of #58808
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, use in testing a minimum fetch size which exposed a number of bugs:
Jumping across data across queries causing valid data to be seen as a gap.
Incorrectly resuming searching across pages (again causing data to be discarded).
which have been addressed.
(cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)
UnresolvedRelation does not care about its source during equality hence
ignore it when doing randomized mutations.
Relates #59014
(cherry picked from commit b21222e714fbf85aad0916e4d4b6a933d2b6958a)
While at it, change the default size to 10 (to align it with the search
API defaults).
(cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)
The PR introduces following two changes:
Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and 1000 in queue size. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it could be a rushed optimization since I am not clear about potential impact on the token service.
On threadpoool saturation, it now fails with EsRejectedExecutionException which in turns gives back a 429, instead of 401 status code to users.
Backport of #58582 to 7.x branch.
This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
Relates to #53100
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
This commit changes our behavior in 2 ways:
- When mapping claims to user properties ( principal, email, groups,
name), we only handle string and array of string type. Previously
we would fail to recognize an array of other types and that would
cause failures when trying to cast to String.
- When adding unmapped claims to the user metadata, we only handle
string, number, boolean and arrays of these. Previously, we would
fail to recognize an array of other types and that would cause
failures when attempting to process role mappings.
For user properties that are inherently single valued, like
principal(username) we continue to support arrays of strings where
we select the first one in case this is being depended on by users
but we plan on removing this leniency in the next major release.
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Despite all my attempts I did not manage to reproduce issues like the ones
described in #58961. My guess is that the _mount request got retried at
some point but I wasn't able to validate this assumption.
Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small
random chunk size and a large number of documents is picked up in the
tests. The parent class also does not verify the acknowledged status
of some requests.
This commit lowers down the chunk size and number of docs in tests
(this is extensively tests in unit tests) and also adds assertions on
acknowledged responses.
Relates #58961
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
* SQL: Redact credentials in connection exceptions (#58650)
This commit adds the functionality to redact the credentials from the
exceptions generated when a connection attempt fails, preventing them
from leaking into logs, console history etc.
There are a few causes that can lead to failed connections. The most
challenging to deal with is a malformed connection string. The redaction
tries to get around it by modifying the URI to a parsable state, so that
the redaction can be applied reliably. If there's no reliability
guarantee, the redaction will bluntly replace the entire connection
string and the user informed about the option to modify it so that the
redaction won't apply. (This is done by using a caplitalized scheme,
which is legal, but otherwise never used in practice.)
The commit fixes a couple of other issues with the URI parser:
- it allows an empty hostname, or even entire connection string (as per
the existing documentation);
- it reduces the editing of the connection string in the exception
messages (so that the user easier recognize their input);
- it uses the default URI as source for the scheme and hostname.
(cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd)
* Implement String#repeat(), unavailable in Java8
Implement a client.StringUtils#repeatString() as a replacement for
String#repeat(), unavailable in Java8.
SQL: fix handling of escaped chars in JDBC connection string (#58429)
This commit fixes an issue emerging when the connection string URI
contains escaped characters.
The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. The new URI has been
using however the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.
The commit changes that, so that the new URI be built from the unescaped
"raw" parts of the original URI.
(cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.
Closes#58995
.ml-state-write is supposed to be an index alias, however by accident it can become an index. If
.ml-state-write is a concrete index instead of an alias ML stops working. This change improves error
handling by setting the job to failed and properly log and audit the problem. The user still has to
manually fix the problem. This change should lead to a quicker resolution of the problem.
fixes#58482
When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data.
The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.
The Saml SP document stored the role mapping in a Set, but this made
the order in XContent inconsistent. This switched it to use a TreeSet.
Resolves: #54733
Backport of: #55201
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
We already had code to ensure the config index mappings were
up-to-date before creating a new config. However, it's also
possible that an update to a config could add the latest
settings that require the latest mappings to index correctly.
This change checks that the latest config index mappings are
in place in the 3 update actions in the same way as the checks
are done in the 3 put actions.
Backport of #58916
Refactor sequence matching classes in order to decouple querying from
results consumption (and matching).
Rename some classes to better convey their intent.
Introduce internal pagination of sequence algorithm, that is getting the
data in slices and, if needed, moving forward in order to find more
matches until either the dataset is consumer or the number of results
desired is found.
(cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.
Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
* [ML] handles compressed model stream from native process (#58009)
This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.
1. ModelSizeInfo which contains model size information
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.
`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.
Native side change: https://github.com/elastic/ml-cpp/pull/1349
When the documents are large, a follower can receive a partial response
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.
Backport of #58620
Since #58728 part of searchable snapshot shard files are written in cache
in an asynchronous manner in a dedicated thread pool. It means that even
if a search query is successful and returns, there are still more bytes to
write in the cached files on disk.
On CI this can be slow; if we want to check that the cached_bytes_written
has changed we need to check multiple times to give some time for the
cached data to be effectively written.
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.
Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
This commit changes CacheFile and CachedBlobContainerIndexInput so that
the read operations made by these classes are now progressively executed
and do not wait for full range to be written in cache. It relies on the change
introduced in #58477 and it is the last change extracted from #58164.
Relates #58164
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
Relates https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
Backport of #58029 to 7.x
* Picky picky
* Missing type
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.
The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.
This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
Backport of #58419
Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.
The new auto put mapping action is only used if all nodes are on the version that supports it.
This PR implements recursive mapping merging for composable index templates.
When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).
Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
Relates to #53101.
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.
Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356
Backport of #58632
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
RandomZone test method returns a ZoneId from the set of ids supported by
java. The only difference between joda and java supported timezones are
SystemV* timezones.
These should be excluded from randomZone method as they would break
testing. They also do not bring much confidence when used in testing as
I suspect they are rarely used.
That exclude should be removed for simplification once joda support is removed.
This commit adds the BuildParams.testSeed to the repository base paths used
in searchable snapshots QA tests. For S3 and GCS the test seed is added for
coherency sake with other integration tests while it's required for Azure as
Azure 3rd party tests are executed on CI simultaneously for regular and
SAS token accounts.
Closes#58260
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.
However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.
This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.
This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
EQL sequences can specify now a maximum time allowed for their span
(computed between the first and the last matching event).
(cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)
* EQL: case sensitivity aware integration testing (#58624)
* Add DataLoader
* Rewrite case sensitivity settings:
NULL -> run both case sensitive and insensitive tests
TRUE -> run case sensitive test only
FALSE -> run case insensitive test only
* Rename test_queries_supported
* Add more toml tests from the Python client
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)
SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill,
it does not need to resolve the range each time onSuccess/onProgress/onFailure are
called.
Relates #58477
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.
Backport of #58588
Introduce pipe support, in particular head and tail
(which can also be chained).
(cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea)
(cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.
This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If a project is pulling in an external org.elasticsearch dependency, the dependency
report generation would require a license file for the dependency to be present.
This would break precommit because a license was present that it did not feel was
warranted. This un-reverts the update to the dependenciesInfo task, as well as the
JNA license addition.
In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can
cause:
```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```
When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.
Resolves#58515
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
* [ML] make waiting for renormalization optional for internally flushing job (#58537)
When flushing, datafeeds only need the guaruntee that the latest bucket has been handled.
But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.
This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.
closes#58395
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
This PR addresses #9572.
The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.
At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.
Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.
It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
The main changes are:
1. Catch the `NamedObjectNotFoundException` when parsing aggregation
type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().
Closes#58146.
Co-authored-by: bellengao <gbl_long@163.com>
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.
Backport of #58541
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.
Backport of #58538
* Add acm mapping to APM for beats
* Add root mapping for APM
* Add sourcemap mapping to APM
* Fix missing properties
* Fix a second missing properties
* Add request property to acm
* Remove root and sourcemap per review
Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.
It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
This commits allows data streams to be a valid source for analytics and transforms.
Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.
For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
GET inference stats now reads from the .ml-stats index.
Our tests should wait for yellow state before attempting to query the index for stat information.
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.
This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.
Backport of #58331
Improve the usability of the MS-SQL server/ODBC escaped
date/time/timestamp literals, by allowing timezone/offset ids
in the parsed string, e.g.:
```
{ts '2000-01-01T11:11:11Z'}
```
Closes: #58262
(cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null. This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant. But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.
This change corrects the discrepancies. Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.
Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.
Backport of #58413
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.
This allows us to remove the MapperService.getLuceneFieldType() shim method.
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
Adds a new value to the "event" enum of ML annotations, namely
"categorization_status_change".
This will allow users to see when categorization was found to
be performing poorly. Once per-partition categorization is
available, it will allow users to see when categorization is
performing poorly for a specific partition.
It does not make sense to reuse the "model_change" event that
annotations already have, because categorizer state is separate
to model state ("model" state is really anomaly detector state),
and is not reverted by the revert model snapshot API.
Therefore annotations related to categorization need to be
treated differently to annotations related to anomaly detection.
Backporting #58096 to 7.x branch.
Relates to #53100
* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
When doing aliasing with the same name over non existing fields, the analyzer gets stuck in a loop trying to resolve the alias over and over leading to SO. This PR breaks the cycle by checking the relationship between the alias and the child it tries to replace as an alias should never replace its child.
Fix#57270Close#57417
Co-authored-by: Hailei <zhh5919@163.com>
(cherry picked from commit 46786ff2e1ed5951006ff4bdd2b6ac6a1ebcf17b)
* Add support for snapshot and restore to data streams (#57675)
This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.
Closes#57127
Relates to #53100
* version fix
* compilation fix
* compilation fix
* remove unused changes
* compilation fix
* test fix
When a local model is constructed, the cache hit miss count is incremented.
When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
This commit fixes an AOOBE in the handling of fatal
failures in _async_search. If the underlying cause is not found,
this change uses the root failure.
Closes#58311
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follower calls, or via auto-follow patterns. Note that not
all settings can be overrode (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follow index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, the must be implicitly copied from the leader index, not
explicitly set by the user.
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.
Fixes#58188
TIME_PARSE works correctly if both date and time parts are specified,
and a TIME object (that contains only time is returned).
Adjust docs and add a unit test that validates the behavior.
Follows: #55223
(cherry picked from commit 9d6b679a5da88f3c131b9bdba49aa92c6c272abe)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
Today the read/write locks used internally by CacheFile object are
wrapped into a ReleasableLock. This is not strictly required and also
prevents usage of the tryLock() methods which we would like to use
for early releasing of read operations (#58164).
This changes the actions that would attempt to make the managed index read only to
check if the managed index is the write index of a data stream before proceeding.
The updated actions are shrink, readonly, freeze and forcemerge.
(cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
As part of the "ML in Spaces" project, access to the ML UI in
Kibana is migrating to being controlled by Kibana privileges.
The ML UI will check whether the logged-in user has permission
to do something ML-related using Kibana privileges, and if they
do will call the relevant ML Elasticsearch API using the Kibana
system user. In order for this to work the kibana_system role
needs to have administrative access to ML.
Backport of #58061
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
The main improvement here is that the total expected
count of training rows in the test is calculated as the
sum of the training fraction times the cardinality of each
class (instead of the training fraction times the total doc count).
Also relaxes slightly the error bound on the uniformity test from 0.12
to 0.13.
Closes#54122
Backport of #58180
We don't allow converting a data stream's writeable index into a searchable
snapshot. We are currently preventing swapping a data stream's write index
with the restored index.
This adds another step that will not proceed with the searchable snapshot action
until the managed index is not the write index of a data stream anymore.
(cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
The leases issued by CCR keep one extra operation around on the leader shards. This is not
harmful to the leader cluster, but means that there's potentially one delete that can't be
cleaned up.
Today a mounted searchable snapshot defaults to having the same replica
configuration as the index that was snapshotted. This commit changes this
behaviour so that we default to zero replicas on these indices, but allow the
user to override this in the mount request.
Relates #50999
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.
Closes#56000.
This need some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.
Closes#58044
Allows the kibana user to collect data telemetry in a background
task by giving the kibana_system built-in role the view_index_metadata
and monitoring privileges over all indices (*).
Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.
Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
Previously we excluded requiring licenses for dependencies with the
group name org.elasticsearch under the assumption that these use the
top-level Elasticsearch license. This is not always correct, for
example, for the org.elasticsearch:jna dependency as this is merely a
wrapper around the upstream JNA project, and that is the license that we
should be including. A recent change modified this check from using the
group name to checking only if the dependency is a project
dependency. This exposed the use of JNA in SQL CLI to this check, but
the license for it was not added. This commit addresses this by adding
the license.
Relates #58015
This has `EnsembleInferenceModel` not parse feature_names from the XContent.
Instead, it will rely on `rewriteFeatureIndices` to be called ahead time.
Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.
This type of result will store stats about how well categorization
is performing. When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.
This PR is a minimal implementation to allow the C++ side changes to
be made. More Java side changes related to per-partition
categorization will be in followup PRs. However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats. Like forecast request stats the
categorizer stats can be read directly from the job's results alias.
Backport of #57978
Adds support for reading in `model_size_info` objects.
These objects contain numeric values indicating the model definition size and complexity.
Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
ModelLoadingService only caches models if they are referenced by an
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest
pipeline is deleted the model it references should not be evicted if
it is used in search.
Search after is a better choice for the delete expired data iterators
where processing takes a long time as unlike scroll a context does not
have to be kept alive. Also changes the delete expired data endpoint to
404 if the job is unknown
Since we change the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-or-equal-than assertion.
Backport or #57882
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.
This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation which
is a much lower frequency then every document.
Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
Allow a field inside the data to be used as a tie breaker for events
that have the same timestamp.
The field is optional by default.
If used, the tie-breaker always requires a non-null value since it is
used inside `search_after` which requires a non-null value.
Fix#56824
(cherry picked from commit e5719ecb474b32730d93afdbb6834a32b0b2df8b)
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.
(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val)
```
because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and stil be faster
than calling the deprecated method over and over and over again.
Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.
Relates to #56124
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
* Convert to date/datetime the result of numeric aggregations (min, max)
in Painless scripts
(cherry picked from commit f1de99e2a6fbf3806c4f2b6b809738aa8faa2d75)
This adds new plugin level circuit breaker for the ML plugin.
`model_inference` is the circuit breaker qualified name.
Right now it simply adds to the breaker when the model is loaded (and possibly breaking) and removing from the breaker when the model is unloaded.
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.
This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.
Related #38373, #41656Closes#24422
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.
This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.
Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.
There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.
The changes so far cover REST APIs, results object
formats, HLRC and docs.
Backport of #57683
This is a major refactor of the underlying inference logic.
The main refactor is now we are separating the model configuration and
the inference interfaces.
This has the following benefits:
- we can store extra things with the model that are not
necessary for inference (i.e. treenode split information gain)
- we can optimize inference separate from model serialization and storage.
- The user is oblivious to the optimizations (other than seeing the benefits).
A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.
This new class satisfies a new interface that is ONLY for inference.
The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
(improves inference + feature importance through put)
- Only storing what we need for inference + feature importance on heap
#47711 and #47246 helped to validate that monitoring settings are
rejected at time of setting the monitoring settings. Else an invalid
monitoring setting can find it's way into the cluster state and result
in an exception thrown [1] on the cluster state application (there by
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.
All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
of a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is abit out the scope this PR is trying to
address.
This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there an exporter with a type
defined. The type is used as dependent setting, such that it must
exist to set the value. This ensures that when any monitoring settings
changes that they can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.
Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.
[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.
Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.
In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
Backport of #57680
rewrite config on update if either version is outdated, credentials change,
the update changes the config or deprecated settings are found. Deprecated
settings get migrated to the new format. The upgrade can be easily extended to
do any necessary re-writes.
fixes#56499
backport #57648
For a rolling/mixed cluster upgrade (add new version to existing cluster
then shutdown old instances), the watches that ship by default
with monitoring may not get properly updated to the new version.
Monitoring watches can only get published if the internal state is
marked as dirty. If a node is not master, will also get marked as
clean (e.g. not dirty).
For a mixed cluster upgrade, it is possible for the new node to be
added, not as master, the internal state gets marked as clean so
that no more attempts can be made to publish the watches. This
happens on all new nodes. Once the old nodes are de-commissioned
one of the new version nodes in the cluster gets promoted to master.
However, that new master node (with out intervention like restarting
the node or removing/adding exporters) will never attempt to re-publish
since the internal state was already marked as clean.
This commit adds a cluster state listener to mark the resource dirty
when a node is promoted to master. This will allow the new resource
to be published without any intervention.
In #55592 and #55416, we deprecated the settings for enabling and disabling
basic license features and turned those settings into no-ops. Since doing so,
we've had feedback that this change may not give users enough time to cleanly
switch from non-ILM index management tools to ILM. If two index managers
operate simultaneously, results could be strange and difficult to
reconstruct. We don't know of any cases where SLM will cause a problem, but we
are restoring that setting as well, to be on the safe side.
This PR is not a strict commit reversion. First, we are keeping the new
xpack.watcher.use_ilm_index_management setting, introduced when
xpack.ilm.enabled was made a no-op, so that users can begin migrating to using
it. Second, the SLM setting was modified in the same commit as a group of other
settings, so I have taken just the changes relating to SLM.
* Remove duplicate ssl setup in sql/qa projects
* Fix enforcement of task instances
* Use static data for cert generation
* Move ssl testing logic into a plugin
* Document test cert creation
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).
The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
Add `TRIM` function which combines the functionality of both
`LTRIM` and `RTRIM` by stripping both leading and trailing
whitespaces.
Refers to #41195
(cherry picked from commit 6c86c919e12f0c4cb5e39d129aa65ab3e274268f)
We test expected TLS failures by catching SSLException, but other
security providers ( i.e. BCFIPS ) might throw a different one. In
this case, BCFIPS throws org.bouncycastle.tls.TlsFatalAlert
* Move classes from build scripts to buildSrc
- move Run task
- move duplicate SanEvaluator
* Remove :run workaround
* Some little cleanup on build scripts on the way
As the datastream information is stored in the `ClusterState.Metadata` we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for
the steps to be able to identify when a managed index is part of a DataStream.
If a managed index is part of a DataStream the rollover target is the DataStream
name and the highest generation index is the write index (ie. the rolled index).
(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Some BI tools (i.e. Tableau) would try to cast strings where the time
part is separated from the date part with a whitespace instead of `T`.
Adjust type conversion used by CAST to support this.
(cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)
Add basic support for `TOP X` as a synonym to LIMIT X which is used
by [MS-SQL server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15),
e.g.:
```
SELECT TOP 5 a, b, c FROM test
```
TOP in SQL server also supports the `PERCENTAGE` and `WITH TIES`
keywords which this implementation doesn't.
Don't allow usage of both TOP and LIMIT in the same query.
Refers to #41195
(cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)
At some point, we changed the supported-type test to also catch
assertion errors. This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.
This modifies the test to throw at the end of the try-catch
block to prevent from accidentally catching itself.
Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.
Also includes a variety of fixes to other tests which were failing
but being silently smothered.
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)
forecasts that are still running should be marked as failed/finished in the following scenarios:
- Job is force closed
- Job is re-assigned to another node.
Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.
Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.
relates to https://github.com/elastic/elasticsearch/issues/56419
* [ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.
This flag is useful for moving models between clusters.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Implement TIME_PARSE(<time_str>, <pattern_str>) function
which allows to parse a time string according to the specified
pattern into a time object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54963
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com>
(cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.
Closes#57299
Backport of #57300
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
Previously, `CASE` and `IIF` when translated to painless scripts
(used in GROUP BY, HAVING, WHERE) a custom `caseFunction`
registered in the `InternalSqlScriptUtils` was used. This function
received and array of arbitrary length:
```[condition1, result1, condition2, result2, ... elseResult]```
Painless doesn't know of the context and therefore is evaluating
all conditions and results before invoking the `caseFunction` on them.
As a consequence, erroneous result expressions (i.e. division by 0)
where always evaluated despite of the guarding condition.
Replace the `caseFunction` with painless `<cond> ? <res1> : <res2>`
expressions to properly guard the result expressions and only evaluate
the one for which its guarding condition evaluates to true (or of course
the elseResult).
As a bonus, this approach includes performance benefits since we avoid
unnecessary evaluations of both conditions and result expressions.
Fixes: #49672
(cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)
The ssl.trust setting for Watcher provides a list of hostnames that
should be automatically trusted for SSL hostname verification. It was
accidentally broken when we added the full ssl.* settings for email
notifications (see #45272)
This commit corrects this, so the setting is once again respected,
as long as none of the other ssl settings are configured for email
notifications.
Resolves: #52153
Backport of: #56090
This commit removes the compiler.java setting from the build. It was
originally added when Gradle was far behind support for the latest jdk,
but is no longer applicable as we don't have any need to update the
supported compile version before gradle supports the newer version. Note
that the runtime version changing support still exists here, this only
ensures we use the same jdk to compile as we use to run gradle.
Since #51888 the ML job stats endpoint has returned entries for
jobs that have a persistent task but not job config. Such
orphaned tasks caused monitoring to fail.
This change ignores any such corrupt jobs for monitoring purposes.
Backport of #57235
This PR removes the blocking call to insert ingest documents into a queue in the
coordinator. It replaces it with an offer call which will throw a rejection exception
in the event that the queue is full. This prevents deadlocks of the write threads
when the queue fills to capacity and there are more than one enrich processors
in a pipeline.
The searcher was randomly wrapping its reader as slow, parallel, or filtered.
This was causing casting issues in the normalizer tests. By removing the
wrapping, the problem goes away.
Closes#57164
If a job is NOT opened, forecasts should be able to be deleted, no matter their state.
This also fixes a bug with expanding forecast IDs. We should check for wildcard `*` and `_all` when expanding the ids
closes https://github.com/elastic/elasticsearch/issues/56419
Change the error message wording for comparisons against fields in
filtering (s/variables/fields).
(cherry picked from commit d9a1cb50940d0a98fd75b9c0123ca6e1d862f65d)
* Update the JLine dependency to 3.14.1
Update the JLine dependency from 3.10.0 to 3.14.1.
(cherry picked from commit c2d9b74046fa5ddb54604da3afa7887cc38548a1)
Backport of #55548
Adds equivalence for keyword field to the wildcard field. Regex, fuzzy, wildcard and prefix queries are all supported.
All queries use an approximation query backed by an automaton-based verification queries.
Closes#54275
Fix delete_expired_data/nightly maintenance when
many model snapshots need deleting (#57041)
The queries performed by the expired data removers pull back entire
documents when only a few fields are required. For ModelSnapshots in
particular this is a problem as they contain quantiles which may be
100s of KB and the search size is set to 10,000.
This change makes the search more efficient by only requesting the
fields needed to work out which expired data should be deleted.
In #51089 where SamlAuthenticatorTests were refactored, we missed
to update one test case which meant that a single key would be
used both for signing and encryption in the same run. As explained
in #51089, and due to FIPS 140 requirements, BouncyCastle FIPS
provider will block RSA keys that have been used for signing from
being used for encryption and vice versa
This commit changes testNoAttributesReturnedWhenTheyCannotBeDecrypted
to always use the specific keys we have added for encryption.
This change ensures that we stop the maintenance service on all nodes
when a data node is restarted. This ensures that we don't send
update_by_query requests on the node that is restarted.
This commit also raises the log level to trace for some packages
in order to investigate the failures to acquire a shard lock
after a restart.
Relates #56765
- Use opensaml to sign and encrypt responses/assertions/attributes
instead of doing this manually
- Use opensaml to build response and assertion objects instead of
parsing xml strings
- Always use different keys for signing and encryption. Due to FIPS
140 requirements, BouncyCastle FIPS provider will block
RSA keys that have been used for signing from being used for
encryption and vice versa. This change adds new encryption specific
keys to be used throughout the tests.
Move the JDBC functionality integration tests from `:sql:qa` to a separate
module `:sql:qa:jdbc`. This way the tests are isolated from the rest of the
integration tests and they only depend to the `:sql:jdbc` module, thus
removing the danger of accidentally pulling in some dependency that may
hide bugs.
Moreover this is a preparation for #56722, so that we can run those tests
between different JDBC and ES node versions and ensure forward
compatibility.
Move the rest of existing tests inside a new `:sql:qa:server` project, so that
the `:sql:qa` becomes the parent project for both and one can run all the integration
tests by using this parent project.
(cherry picked from commit c09f4a04484b8a43934fe58fbc41bd90b7dbcc76)
Our FIPS 140 testing depends on setting the appropriate java policy
in order to configure the JVM in FIPS mode. Some tests (
discovery-ec2 and ccr qa ) also needed to set a custom policy file
to grant a specific permission, which overwrote the FIPS related
policy and tests would fail. This change ensures that when a
custom policy needs to be set in these tests, the permissions that
are necessary for FIPS are also set.
Resolves: #51685, #52034
Field mapping detection is done via grok patterns.
This commit adds well-known text (WKT) formatted geometry detection.
If everything is a `POINT`, then a `geo_point` mapping is preferred.
Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred.
This does **NOT** detect other types of formatted geometries (geohash, comma delimited points, etc.)
closes https://github.com/elastic/elasticsearch/issues/56967
* Fix temp dir locked errors
The tests involving a temporary directory (containing the JDBC JAR) fail
on Windows because they can't be deleted, due to still being in use.
This commit forces a premature closing of the JAR file, which mitigates
the failure by giving the JVM more time to collect any open FDs.
(Calling the System.gc() in the tests is another working alternative
fix.)
The stream-based JAR access is taken care by disabling the cache usage
(cherry picked from commit 04f97333a015404a68e8f19223f33aadeb396687)
The original implementation utilized `bbox` as the index mapping type. This would not work as it would have to be `envelope`. But, given that `envelope` and `polygon` are tessellated in the same way, we choose to use `polygon` as the geo_shape type. This is for easier support other places in the stack (a la kibana maps)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
Previously `COUNT(DISTINCT <literal>)` was returning the same result
as `COUNT(<literal>)` which is not correct as it should always return 1
if there is at least one matching row (bucket if there is a GROUP BY),
or 0 otherwise.
(cherry picked from commit 7f7d7562d43034907f432d39d0d66f490d78f4a8)
Throttling nightly cleanup as much as we do has been over cautious.
Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.
Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.
This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
The version number componenent can't equal or exceed the revision
multiplier.
This fixes a the VersionTests unit test.
(cherry picked from commit 7d2331a2818ae20024c5c3617cd4433f90e9c098)
* Adds support for MIN, MAX, AVG, SUM aggregates acting on literals.
SELECT SUM(1) FROM index
and
SELECT SUM(1), AVG(2)
work both on indices and as local execution.
(cherry picked from commit efb72907c0391612c4a2b6256e327060b4167912)
WatcherIndexTemplateRegistry as of https://github.com/elastic/elasticsearch/pull/52962
requires all nodes to be on 7.7.0 before it allows the version 11 index template to be
installed.
While in a mixed cluster, nothing prevents Watcher from running on the new
host before the all of the nodes are on 7.7.0. This will result in the
.watcher-history-11* index without the proper mappings. Without the proper
mapping a single document (for a large watch) can exceed the default 1000 field
limit and cause error to show in the logs.
This commit ensures the same logic for writing to the index is applied as for
installing the template. In a mixed cluster, the `10` index template will continue
to be written. Only once all of nodes are on 7.7.0+ will the `11` index template
be installed and used.
closes#56732
In DF analytics classification, it is possible to use no samples
of a class if its cardinality is too low.
This commit fixes this by ensuring the target sample count can never be zero.
Backport of #56783
* JDBC: fix access to the Manifest for non-entry JAR
The JDBC driver will attempt to read its version from the Manifest file
embedded into its JAR. The URL pointing to the JAR can be provided in a
few ways.
So far, accessing the Manfiest was attempted by getting a URLConnection
out of the URL and then getting an input stream out of this connection.
For file JAR URLs, this only works however if the URL points to the
driver as a JAR file entry (i.e. <sub-url>!/jdbc-driver.jar!/). If
that's not the case, the JarURLConnection will throw an IOException.
This commit fixes that: in case the URL points to a JAR entry
(jar:file:<path>/jdbc-driver.jar!/), the manifest is read directly with
JarURLConnection#getManifest().
(cherry picked from commit 2175b7b01cf5fcf3ab2bb21404a9bd454a8df3f0)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In some cases the Enrich processor factory may be called before it is
ready to create processors. While these calls are usually made in error,
the response from the Enrich processor is an NPE which is almost always
an unhelpful error when debugging an issue.
This change aims to fix our setup in CI so that we can run 7.x in
FIPS 140 mode. The major issue that we have in 7.x and did not
have in master is that we can't use the diagnostic trust manager
in FIPS mode in Java 8 with SunJSSE in FIPS approved mode as it
explicitly disallows the wrapping of X509TrustManager.
Previous attempts like #56427 and #52211 focused on disabling the
setting in all of our tests when creating a Settings object or
on setting fips_mode.enabled accordingly (which implicitly disables
the diagnostic trust manager). The attempts weren't future proof
though as nothing would forbid someone to add new tests without
setting the necessary setting and forcing this would be very
inconvenient for any other case ( see
#56427 (comment) for the full argumentation).
This change introduces a runtime check in SSLService that overrides
the configuration value of xpack.security.ssl.diagnose.trust and
disables the diagnostic trust manager when we are running in Java 8
and the SunJSSE provider is set in FIPS mode.
* [Transform] add support for terms agg in transforms (#56696)
This adds support for `terms` and `rare_terms` aggs in transforms.
The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`
The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.
Fixes#56796
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
Optimize away events queries and joins/sequence that cannot match any
results without having to query the backend.
(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.
Backport of #56700
This change ensures that the maintenance service that is responsible for deleting the expired response is stopped between each test. This is needed since we check that no search context are in-flight after each test method.
Fixes#55988
If an email action is used in a foreach loop, message ids could have
been duplicated, which then get rejected by the mail server.
This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.
This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block. This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.
This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.
Backport of #56632
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.
Precondition for the terms work in #56487
Initial support for EQL sequences
The current algorithm is focused on correctness and does not contain
any optimization which is left for the future.
The current implementation uses a state machine approach which moves
ascending and runs each query one after the other working on computing
sequences as the data comes in.
For each result, the key and its timestamp are being extracted which are
then used for matching/building a sequence.
(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
- Add support for scalar functions on the field of SQL's LIKE/RLIKE
- Add support for scalar functions on the field of EQL's match/matchLite
Closes: #55058
(cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)
When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]".
Added tests coverage for a couple of these scenarios and also added tests for equals/hashcode methods.