Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
Implement DATE_PARSE(<date_str>, <pattern_str>) function
which allows to parse a date string according to the specified
pattern into a date object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54962
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com>
(cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)
* [ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.
`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.
Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).
This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
Eclipse was confused by #59583. It can't see a the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
Using serialization/deserialization when dealing with non-trivial
documents causes the process to get stuck not to mention it is expensive.
Use a much more simple approach at the expense of losing information
(we're just interested in the source after all).
(cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
to avoid many error stacktraces in logs during a rolling upgrade.
Stack templates use the composable index template and component APIs,these APIs
aren't supported in 7.7 and earlier and in mixed cluster
environments this can cause a lot of ActionNotFoundTransportException
errors in the logs during rolling upgrades. If these templates
are only installed via elected master node then the APIs are always
there and the ActionNotFoundTransportException errors are then prevented.
When an inference model is loaded it is accounted for in circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.
This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.
Relates #50999
Case sensitivity is incorporated as a test dimension - instead of
running the same test twice, two different tests are created.
Clean-up the test invocation by removing unused parameters.
Fix#59294
(cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)
Unfortunately, we cannot guarantee that the execution will be truly
async even with 0ms timeout since we cannot block the execution. So, we need
to modify the test to work in both async and non-async mode.
Closes#59416
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.
Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.
This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.
This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.
Backport of #59038
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations
See updated JavaDoc and added test cases for more details and illustration on the functionality.
Some notes:
The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.
This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.
Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
Removes the `@timestamp` field mapping from several data stream index
template snippets.
With #59317, the `@timestamp` field defaults to a `date` field data type
for data streams.
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.
In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).
Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .
This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.
Closes#59425
Renames and moves the cross validation splitter package.
First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.
Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.
Backport of #59529
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.
This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.
Backport of #59424
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.
Co-authored-by: Jason Tedor <jason@tedor.me>
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.
This commit is fixing this issue.
Backport of #59524
This PR adds minimum support for prefix search of API Key name. It only touches API key name and leave all other query parameters, e.g. realm name, username unchanged.
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.
This PR introduce a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In another word, it will merge true with "true", also will merge false with "false", but nothing else.
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.
Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.
(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.
piggy backed fixes:
- when running geo tile group_by it could fail due to query clause limit (unreleased)
- new style page size using settings was not validating limit of 10k (7.8)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accomodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:
```
...
"data_streams" : {
"available" : true,
"enabled" : true,
"data_streams" : 3,
"indices_count" : 17
}
...
```
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.
Backport of #59395
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backport of #59293 to 7.x branch.
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
this results in storing a composable index template
with data stream definition only to work with default
distribution. This way data streams can only be used
with default distribution, since a data stream can
currently only be created if a matching composable index
template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
to fail if `_data_stream_timestamp` meta field mapper
isn't registered. So that a more understandable
error is returned when attempting to store a template
with data stream definition via the oss distribution.
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work. However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to. Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes. (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list. This means it's best to have both.)
Backport of #59377
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.
This setting may also be updated for a stopped job.
More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
Backport of #59254 and #57274
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.
(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.
Backport of #58877
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
* Add sample versions of standard deviation and variance functions (#59093)
* Add STDDEV_SAMP, VAR_SAMP
This commit adds the sampling variations of the standard deviation and
variance agg functions.
(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)
* Fix: workaround for lack of Map#of() in Java8
Replace Map#of() with a HashMap static init.
These tests sometimes install a template so they can be compatible with older versions, but they run
amok of the occasionally installed "global" template which changes the default number of shards.
This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they
are not (since the global template is only randomly installed).
Resolves#58807Resolves#58258
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.
relates #59140
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.
Relates to #58642
Relates to #53100Closes#58956Closes#58583
Today we empty the searchable snapshots cache when cleanly closing a
shard, but leak cache files in some cases involving an unclean shutdown.
Such leaks are not permanent, they are cleaned up on shard relocation or
deletion, but they still might last for arbitrarily long until that
happens. This commit introduces a cleanup process that runs during node
startup to catch such leaks sooner.
Also, today we permit searchable snapshots to be held on custom data
paths, and store the corresponding cache files within the custom
location. Supporting this feature would make the cleanup process
significantly more complicated since it would require each node to parse
the index metadata for the shards it held before shutdown. Yet, this
feature is undocumented and offers minimal benefits to searchable
snapshots. Therefore with this commit we forbid custom data paths for
searchable snapshot shards.
This makes a `parentCardinality` available to every `Aggregator`'s ctor
so it can make intelligent choices about how it collects bucket values.
This replaces `collectsFromSingleBucket` and is similar to it but:
1. It supports `NONE`, `ONE`, and `MANY` values and is generally
extensible if we decide we can use more precise counts.
2. It is more accurate. `collectsFromSingleBucket` assumed that all
sub-aggregations live under multi-bucket aggregations. This is
normally true but `parentCardinality` is properly carried forward
for single bucket aggregations like `filter` and for multi-bucket
aggregations configured in single-bucket for like `range` with a
single range.
While I was touching every aggregation I renamed `doCreateInternal` to
`createMapped` because that seemed like a much better name and it was
right there, next to the change I was already making.
Relates to #56487
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In order to ensure that we do not write a broken piece of `RepositoryData`
because the phyiscal repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository we must check whether or not
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
Relates #56911
Corrected condition that caused a sequence window to be skipped when a query
returns no results by checking not just the current stage but also following
ones as they can match with in-flight sequences.
Improve logging
Fix NPE when emptying a SequenceGroup
Increase randomization in testing
Make maxspan inclusive (up to and equal to value vs just up to)
(cherry picked from commit ad32c488688cb350c2934dfca03af86045e997b0)
Ensure blocking tasks are running before submitting more no-op tasks. This ensures no task would be popped out of the queue unexpectedly, which in turn guarantees the rejection of subsequent authentication request.
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
Backport of #58018
The composite role that is used for authz, following the authn with an API key,
is an intersection of the privileges from the owner role and the key privileges defined
when the key has been created.
This change ensures that the `#names` property of such a role equals the `#names`
property of the key owner role, thereby rectifying the value for the `user.roles`
audit event field.
* GET data stream API returns additional information (#59128)
This adds the data stream's index template, the configured ILM policy
(if any) and the health status of the data stream to the GET _data_stream
response.
Restoring a data stream from a snapshot could install a data stream that
doesn't match any composable templates. This also makes the `template`
field in the `GET _data_stream` response optional.
(cherry picked from commit 0d9c98a82353b088c782b6a04c44844e66137054)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This removes the blocking model lookup from the `inference` aggregator's
builder by integrating it into the request rewrite process that loads
stuff asynchronously.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Fixed an issue #59082 introduced. We have to wait for no more operations
in all tests here not just the one we were waiting in already so that the cleanup
operation from the parent class can run without failure.
Adds error handling when filling up the queue of the crypto thread pool. Also reduce queue size of the crypto thread pool to 10 so that the queue can be cleared out in time.
Test testAuthenticationReturns429WhenThreadPoolIsSaturated has seen failure on CI when it tries to push 1000 tasks into the queue (setup phase). Since multiple tests share the same internal test cluster, it may be possible that there are lingering requests not fully cleared out from the queue. When it happens, we will not be able to push all 1000 tasks into the queue. But since what we need is just queue saturation, so as long as we can be sure that the queue is fully filled, it is safe to ignore rejection error and just move on.
A number of 1000 tasks also take some to clear out, which could cause the test suite to time out. This PR change the queue to 10 so the tests would have better chance to complete in time.
For #58994 it would be useful to be able to share test infrastructure.
This PR shares `AbstractSnapshotIntegTestCase` for that purpose, dries up SLM tests
accordingly and adds a shared and efficient (compared to the previous implementations)
way of waiting for no running snapshot operations to the test infrastructure to dry things up further.
There have been a few test failures that are likely caused by tests
performing actions that use ML indices immediately after the actions
that create those ML indices. Currently this can result in attempts
to search the newly created index before its shards have initialized.
This change makes the method that creates the internal ML indices
that have been affected by this problem (state and stats) wait for
the shards to be initialized before returning.
Backport of #59027
* Enforce higher priority for RepositoriesService ClusterStateApplier
This avoids shards allocation failures when the repository instance
comes in the same ClusterState update as the shard allocation.
Backport of #58808
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
The current internal sequence algorithm relies on fetching multiple results and then paginating through the dataset. Depending on the dataset and memory, setting a larger page size can yield better performance at the expense of memory.
This PR makes this behavior explicit by decoupling the fetch size from size, the maximum number of results desired.
As such, use in testing a minimum fetch size which exposed a number of bugs:
Jumping across data across queries causing valid data to be seen as a gap.
Incorrectly resuming searching across pages (again causing data to be discarded).
which have been addressed.
(cherry picked from commit 2f389a7724790d7b0bda67264d6eafcfa8b2116e)
UnresolvedRelation does not care about its source during equality hence
ignore it when doing randomized mutations.
Relates #59014
(cherry picked from commit b21222e714fbf85aad0916e4d4b6a933d2b6958a)
While at it, change the default size to 10 (to align it with the search
API defaults).
(cherry picked from commit 45795939b277e736a9e4f2f008d1c3f406239075)
The PR introduces following two changes:
Move API key validation into a new separate threadpool. The new threadpool is created separately with half of the available processors and 1000 in queue size. We could combine it with the existing TokenService's threadpool. Technically it is straightforward, but I am not sure whether it could be a rushed optimization since I am not clear about potential impact on the token service.
On threadpoool saturation, it now fails with EsRejectedExecutionException which in turns gives back a 429, instead of 401 status code to users.
Backport of #58582 to 7.x branch.
This commit adds a new metadata field mapper that validates,
that a document has exactly a single timestamp value in the data stream timestamp field and
that the timestamp field mapping only has `type`, `meta` or `format` attributes configured.
Other attributes can affect the guarantee that an index with this meta field mapper has a
useable timestamp field.
The MetadataCreateIndexService inserts a data stream timestamp field mapper whenever
a new backing index of a data stream is created.
Relates to #53100
Dry up tests that use a disruption that isolates the master from all other nodes.
Also, turn disruption types that have neither parameters nor state into constants
to make things a little clearer.
This commit changes our behavior in 2 ways:
- When mapping claims to user properties ( principal, email, groups,
name), we only handle string and array of string type. Previously
we would fail to recognize an array of other types and that would
cause failures when trying to cast to String.
- When adding unmapped claims to the user metadata, we only handle
string, number, boolean and arrays of these. Previously, we would
fail to recognize an array of other types and that would cause
failures when attempting to process role mappings.
For user properties that are inherently single valued, like
principal(username) we continue to support arrays of strings where
we select the first one in case this is being depended on by users
but we plan on removing this leniency in the next major release.
Co-authored-by: Ioannis Kakavas <ioannis@elastic.co>
Despite all my attempts I did not manage to reproduce issues like the ones
described in #58961. My guess is that the _mount request got retried at
some point but I wasn't able to validate this assumption.
Still, the FsSearchableSnapshotsIT can be pretty disk heavy if a small
random chunk size and a large number of documents is picked up in the
tests. The parent class also does not verify the acknowledged status
of some requests.
This commit lowers down the chunk size and number of docs in tests
(this is extensively tests in unit tests) and also adds assertions on
acknowledged responses.
Relates #58961
Working through a heap dump for an unrelated issue I found that we can easily rack up
tens of MBs of duplicate empty instances in some cases.
I moved to a static constructor to guard against that in all cases.
* SQL: Redact credentials in connection exceptions (#58650)
This commit adds the functionality to redact the credentials from the
exceptions generated when a connection attempt fails, preventing them
from leaking into logs, console history etc.
There are a few causes that can lead to failed connections. The most
challenging to deal with is a malformed connection string. The redaction
tries to get around it by modifying the URI to a parsable state, so that
the redaction can be applied reliably. If there's no reliability
guarantee, the redaction will bluntly replace the entire connection
string and the user informed about the option to modify it so that the
redaction won't apply. (This is done by using a caplitalized scheme,
which is legal, but otherwise never used in practice.)
The commit fixes a couple of other issues with the URI parser:
- it allows an empty hostname, or even entire connection string (as per
the existing documentation);
- it reduces the editing of the connection string in the exception
messages (so that the user easier recognize their input);
- it uses the default URI as source for the scheme and hostname.
(cherry picked from commit a0bd5929d0658c4fed44404e0c4d78eac88222fd)
* Implement String#repeat(), unavailable in Java8
Implement a client.StringUtils#repeatString() as a replacement for
String#repeat(), unavailable in Java8.
SQL: fix handling of escaped chars in JDBC connection string (#58429)
This commit fixes an issue emerging when the connection string URI
contains escaped characters.
The original URI is pre-parsed in order to re-assemble a new URI having
the optional elements filled in with defaults. The new URI has been
using however the unescaped query and fragment parts. So if these
contained any escaped `&` or `=` (such as in the password option value),
the unescaping would reveal them and make them later interfere with the
options parsing.
The commit changes that, so that the new URI be built from the unescaped
"raw" parts of the original URI.
(cherry picked from commit 94eb5a05e79c6e203de548d05b13e00295bd4489)
- The exception that we caught when failing to schedule a thread was incorrect.
- We may have failures when reducing the response before returning it, which were not handled correctly and may have caused get or submit async search task to not be properly unregistered from the task manager
- when the completion listener onFailure method is invoked, the search task has to be unregistered. Not doing so may cause the search task to be stuck in the task manager although it has completed.
Closes#58995
.ml-state-write is supposed to be an index alias, however by accident it can become an index. If
.ml-state-write is a concrete index instead of an alias ML stops working. This change improves error
handling by setting the job to failed and properly log and audit the problem. The user still has to
manually fix the problem. This change should lead to a quicker resolution of the problem.
fixes#58482
When we execute search against remote indices, the remote indices are authorized on the remote cluster and not on the CCS cluster. When we introduced submit async search we added a check that requires that the user running it has the privilege to execute it on some index. That prevents users from executing async searches against remote indices unless they also have read access on the CCS cluster, which is common when the CCS cluster holds no data.
The solution is to let the submit async search go through as we already do for get and delete async search. Note that the inner search action will still check that the user can access local indices, and remote indices on the remote cluster, like search always does.
The Saml SP document stored the role mapping in a Set, but this made
the order in XContent inconsistent. This switched it to use a TreeSet.
Resolves: #54733
Backport of: #55201
This is a follow-up to #57573. This commit combines coordinating and
primary bytes under the same "write" bucket. Double accounting is
prevented by only accounting the bytes at either the reroute phase or
the primary phase. TransportBulkAction calls execute directly, so the
operations handler is skipped and the bytes are not double accounted.
A regression in the mapping code led to geo_shape no longer supporting
array-valued fields. This commit fixes this support and adds an integration
test to make sure this problem does not return!
We already had code to ensure the config index mappings were
up-to-date before creating a new config. However, it's also
possible that an update to a config could add the latest
settings that require the latest mappings to index correctly.
This change checks that the latest config index mappings are
in place in the 3 update actions in the same way as the checks
are done in the 3 put actions.
Backport of #58916
Refactor sequence matching classes in order to decouple querying from
results consumption (and matching).
Rename some classes to better convey their intent.
Introduce internal pagination of sequence algorithm, that is getting the
data in slices and, if needed, moving forward in order to find more
matches until either the dataset is consumer or the number of results
desired is found.
(cherry picked from commit bcf2c1141302f3f98c85e82d2c501aa02c8540e9)
Add caching support for application privileges to reduce number of round-trips to security index when building application privilege descriptors.
Privilege retrieving in NativePrivilegeStore is changed to always fetching all privilege documents for a given application. The caching is applied to all places including "get privilege", "has privileges" APIs and CompositeRolesStore (for authentication).
* [ML] handles compressed model stream from native process (#58009)
This moves model storage from handling the fully parsed JSON string to handling two separate types of documents.
1. ModelSizeInfo which contains model size information
2. TrainedModelDefinitionChunk which contains a particular chunk of the compressed model definition string.
`model_size_info` is assumed to be handled first. This will generate the model_id and store the initial trained model config object. Then each chunk is assumed to be in correct order for concatenating the chunks to get a compressed definition.
Native side change: https://github.com/elastic/ml-cpp/pull/1349
When the documents are large, a follower can receive a partial response
because the requesting range of operations is capped by
max_read_request_size instead of max_read_request_operation_count. In
this case, the follower will continue reading the subsequent ranges
without checking the remaining size of the buffer. The buffer then can
use more memory than max_write_buffer_size and even causes OOM.
Backport of #58620
Since #58728 part of searchable snapshot shard files are written in cache
in an asynchronous manner in a dedicated thread pool. It means that even
if a search query is successful and returns, there are still more bytes to
write in the cached files on disk.
On CI this can be slow; if we want to check that the cached_bytes_written
has changed we need to check multiple times to give some time for the
cached data to be effectively written.
The checks on the license state have a singular method, isAllowed, that
returns whether the given feature is allowed by the current license.
However, there are two classes of usages, one which intends to actually
use a feature, and another that intends to return in telemetry whether
the feature is allowed. When feature usage tracking is added, the latter
case should not count as a "usage", so this commit reworks the calls to
isAllowed into 2 methods, checkFeature, which will (eventually) both
check whether a feature is allowed, and keep track of the last usage
time, and isAllowed, which simply determines whether the feature is
allowed.
Note that I considered having a boolean flag on the current method, but
wanted the additional clarity that a different method name provides,
versus a boolean flag which is more easily copied without realizing what
the flag means since it is nameless in call sites.
This commit changes CacheFile and CachedBlobContainerIndexInput so that
the read operations made by these classes are now progressively executed
and do not wait for full range to be written in cache. It relies on the change
introduced in #58477 and it is the last change extracted from #58164.
Relates #58164
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
Relates https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
Backport of #58029 to 7.x
* Picky picky
* Missing type
SAML idP sends back a LogoutResponse at the end of the logout workflow. It can be sent via either HTTP-Redirect binding or HTTP-POST binding. Currently, the HTTP-Redirect request is simply ignored by Kibana and never reaches ES. It does not cause any obvious issue and the workflow is completed normally from user's perspective.
The HTTP-POST request results in a 404 error because POST request is not accepted by Kibana's logout end-point. This causes a non-trivial issue because it renders an error page in user's browser. In addition, some resources do not seem to be fully cleaned up due to the error, e.g. the username will be pre-filled when trying to login again after the 404 error.
This PR solves both of the above issues from ES side with a new /_security/saml/complete_logout end-point. Changes are still needed on Kibana side to relay the messages.
Backport of #58419
Mapping updates that originate from indexing a document with unmapped fields will use this new action
instead of the current put mapping action. This way on the security side, authorization logic
can easily determine whether a mapping update is automatically generated or a mapping update originates
from the put mapping api.
The new auto put mapping action is only used if all nodes are on the version that supports it.
This PR implements recursive mapping merging for composable index templates.
When creating an index, we perform the following:
* Add each component template mapping in order, merging each one in after the
last.
* Merge in the index template mappings (if present).
* Merge in the mappings on the index request itself (if present).
Some principles:
* All 'structural' changes are disallowed (but everything else is fine). An
object mapper can never be changed between `type: object` and `type: nested`. A
field mapper can never be changed to an object mapper, and vice versa.
* Generally, each section is merged recursively. This includes `object`
mappings, as well as root options like `dynamic_templates` and `meta`. Once we
reach 'leaf components' like field definitions, they always overwrite an
existing one instead of being merged.
Relates to #53101.
When per_partition_categorization.stop_on_warn is set for an analysis
config it is now passed through to the autodetect C++ process.
Also adds some end-to-end tests that exercise the functionality
added in elastic/ml-cpp#1356
Backport of #58632
* Replace compile configuration usage with api (#58451)
- Use java-library instead of plugin to allow api configuration usage
- Remove explicit references to runtime configurations in dependency declarations
- Make test runtime classpath input for testing convention
- required as java library will by default not have build jar file
- jar file is now explicit input of the task and gradle will ensure its properly build
* Fix compile usages in 7.x branch
RandomZone test method returns a ZoneId from the set of ids supported by
java. The only difference between joda and java supported timezones are
SystemV* timezones.
These should be excluded from randomZone method as they would break
testing. They also do not bring much confidence when used in testing as
I suspect they are rarely used.
That exclude should be removed for simplification once joda support is removed.
This commit adds the BuildParams.testSeed to the repository base paths used
in searchable snapshots QA tests. For S3 and GCS the test seed is added for
coherency sake with other integration tests while it's required for Azure as
Azure 3rd party tests are executed on CI simultaneously for regular and
SAS token accounts.
Closes#58260
The GET /_license endpoint displays "enterprise" licenses as
"platinum" by default so that old clients (including beats, kibana and
logstash) know to interpret this new license type as if it were a
platinum license.
However, this compatibility layer was not applied to the GET /_xpack/
endpoint which also displays a license type & mode.
This commit causes the _xpack API to mimic the _license API and treat
enterprise as platinum by default, with a new accept_enterprise
parameter that will cause the API to return the correct "enterprise"
value.
This BWC layer exists only for the 7.x branch.
This is a breaking change because, since 7.6, the _xpack API has
returned "enterprise" for enterprise licenses, but this has been found
to break old versions of beats and logstash so needs to be corrected.
EQL sequences can specify now a maximum time allowed for their span
(computed between the first and the last matching event).
(cherry picked from commit 747c3592244192a2e25a092f62aec91a899afc83)
* EQL: case sensitivity aware integration testing (#58624)
* Add DataLoader
* Rewrite case sensitivity settings:
NULL -> run both case sensitive and insensitive tests
TRUE -> run case sensitive test only
FALSE -> run case insensitive test only
* Rename test_queries_supported
* Add more toml tests from the Python client
Co-authored-by: Ross Wolf <31489089+rw-access@users.noreply.github.com>
(cherry picked from commit 34d383421599f060a5c083b40df35f135de49e39)
SparseFileTracker.Gap can keep a reference to the corresponding range it is about to fill,
it does not need to resolve the range each time onSuccess/onProgress/onFailure are
called.
Relates #58477
The remote_monitoring_user user needs to access the enrich stats API.
But the request is denied because the API is categorized under admin.
The correct privilege should be monitor.
Adds parsing of `status` and `memory_reestimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`memory_reestimate_bytes` can be used to update the job's
limit in order to restart the job.
Backport of #58588
Introduce pipe support, in particular head and tail
(which can also be chained).
(cherry picked from commit 4521ca3367147d4d6531cf0ab975d8d705f400ea)
(cherry picked from commit d6731d659d012c96b19879d13cfc9e1eaf4745a4)
Today SparseFileTracker allows to wait for a range to become available
before executing a given listener. In the case of searchable snapshot,
we'd like to be able to wait for a large range to be filled (ie, downloaded
and written to disk) while being able to execute the listener as soon as
a smaller range is available.
This pull request is an extract from #58164 which introduces a
ProgressListenableActionFuture that is used internally by
SparseFileTracker. The progressive listenable future allows to register
listeners attached to SparseFileTracker.Gap so that they are executed
once the Gap is completed (with success or failure) or as soon as the
Gap progress reaches a given progress value. This progress value is
defined when the tracker.waitForRange() method is called; this method
has been modified to accept a range and another listener's range to
operate on.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
If a project is pulling in an external org.elasticsearch dependency, the dependency
report generation would require a license file for the dependency to be present.
This would break precommit because a license was present that it did not feel was
warranted. This un-reverts the update to the dependenciesInfo task, as well as the
JNA license addition.
In SLM retention, when a minimum number of snapshots is required for retention, we prefer to remove
the oldest snapshots first. To perform this, we limit one of the streams, in a rare case this can
cause:
```
[mynode] error during snapshot retention task
java.lang.IllegalArgumentException: -5
at java.util.stream.ReferencePipeline.limit(ReferencePipeline.java:469) ~[?:?]
at org.elasticsearch.xpack.core.slm.SnapshotRetentionConfiguration.lambda$getSnapshotDeletionPredicate$6(SnapshotRetentionConfiguration.java:195) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask.snapshotEligibleForDeletion(SnapshotRetentionTask.java:245) ~[?:?]
at org.elasticsearch.xpack.slm.SnapshotRetentionTask$1.lambda$onResponse$0(SnapshotRetentionTask.java:163) ~[?:?]
at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:176) ~[?:?]
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1624) ~[?:?]
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484) ~[?:?]
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474) ~[?:?]
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:913) ~[?:?]
```
When certain criteria are met. This commit fixes the negative limiting with `Math.max(0, ...)` and
adds a unit test for the behavior.
Resolves#58515
Rather than let ExtensiblePlugins know extending plugins' classloaders,
we now pass along an explicit ExtensionLoader that loads the extensions
asked for. Extensions constructed that way can optionally receive their
own Plugin instance in the constructor.
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
* [ML] make waiting for renormalization optional for internally flushing job (#58537)
When flushing, datafeeds only need the guaruntee that the latest bucket has been handled.
But, in addition to this, the typical call to flush waits for renormalization to complete. For large jobs, this can take a fair bit of time (even longer than a bucket length). This causes unnecessary delays in handling data.
This commit adds a new internal only flag that allows datafeeds (and forecasting) to skip waiting on renormalization.
closes#58395
Implements a new histogram aggregation called `variable_width_histogram` which
dynamically determines bucket intervals based on document groupings. These
groups are determined by running a one-pass clustering algorithm on each shard
and then reducing each shard's clusters using an agglomerative
clustering algorithm.
This PR addresses #9572.
The shard-level clustering is done in one pass to minimize memory overhead. The
algorithm was lightly inspired by
[this paper](https://ieeexplore.ieee.org/abstract/document/1198387). It fetches
a small number of documents to sample the data and determine initial clusters.
Subsequent documents are then placed into one of these clusters, or a new one
if they are an outlier. This algorithm is described in more details in the
aggregation's docs.
At reduce time, a
[hierarchical agglomerative clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)
algorithm inspired by [this paper](https://arxiv.org/abs/1802.00304)
continually merges the closest buckets from all shards (based on their
centroids) until the target number of buckets is reached.
The final values produced by this aggregation are approximate. Each bucket's
min value is used as its key in the histogram. Furthermore, buckets are merged
based on their centroids and not their bounds. So it is possible that adjacent
buckets will overlap after reduction. Because each bucket's key is its min,
this overlap is not shown in the final histogram. However, when such overlap
occurs, we set the key of the bucket with the larger centroid to the midpoint
between its minimum and the smaller bucket’s maximum:
`min[large] = (min[large] + max[small]) / 2`. This heuristic is expected to
increases the accuracy of the clustering.
Nodes are unable to share centroids during the shard-level clustering phase. In
the future, resolving https://github.com/elastic/elasticsearch/issues/50863
would let us solve this issue.
It doesn’t make sense for this aggregation to support the `min_doc_count`
parameter, since clusters are determined dynamically. The `order` parameter is
not supported here to keep this large PR from becoming too complex.
Co-authored-by: James Dorfman <jamesdorfman@users.noreply.github.com>
The main changes are:
1. Catch the `NamedObjectNotFoundException` when parsing aggregation
type, and then throw a `ParsingException` with clear error message with hint.
2. Add a unit test method: AggregatorFactoriesTests#testInvalidType().
Closes#58146.
Co-authored-by: bellengao <gbl_long@163.com>
It is possible for the source document to have an empty string value
for a field that is mapped as numeric. We should treat those as missing
values and avoid throwing an assertion error.
Backport of #58541
This changes the default value for the results field of inference
applied on models that are trained via a data frame analytics job.
Previously, the results field default was `predicted_value`. This
commit makes it the same as in the training job itself. The new
default field is `<dependent_variable>_prediction`. Apart from
making inference consistent with the training job the model came
from, it is helpful to preserve the dependent variable name
by default as it provides some context to the user that may
avoid confusion as to which model results came from.
Backport of #58538
* Add acm mapping to APM for beats
* Add root mapping for APM
* Add sourcemap mapping to APM
* Fix missing properties
* Fix a second missing properties
* Add request property to acm
* Remove root and sourcemap per review
Co-authored-by: Mike Place <mike.place@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Add the ability to get a custom value while specifying a default and use it throughout the
codebase to get rid of the `null` edge case and shorten the code a little.
Introduces a new method on `MappedFieldType` to return a family type name which defaults to the field type.
Changes `wildcard` and `constant_keyword` field types to return `keyword` for field capabilities.
Relates to #53175
Similarities only apply to a few text-based field types, but are currently set directly on
the base MappedFieldType class. This commit moves similarity information into
TextSearchInfo, and removes any mentions of it from MappedFieldType or FieldMapper.
It was previously possible to include a similarity parameter on a number of field types
that would then ignore this information. To make it obvious that this has no effect, setting
this parameter on non-text field types now issues a deprecation warning.
This change allows the submit async search task to cancel children
and removes the manual indirection that cancels the search task when the submit
task is cancelled. This is now handled by the task cancellation, which can cancel
grand-children since #54757.
This commits allows data streams to be a valid source for analytics and transforms.
Data streams are fairly transparent and our `_search` and `_reindex` actions work without error.
For `_transforms` the check-pointing works as desired as well. Data streams are effectively treated as an `alias` and the backing index values are stored within checkpointing information.
GET inference stats now reads from the .ml-stats index.
Our tests should wait for yellow state before attempting to query the index for stat information.
Unlike `classification`, which is using a cross validation splitter
that produces training sets whose size is predictable and equal to
`training_percent * class_cardinality`, for regression we have been
using a random splitter that takes an independent decision for each
document. This means we cannot predict the exact size of the training
set. This poses a problem as we move towards performing test inference
on the java side as we need to be able to provide an accurate upper
bound of the training set size to the c++ process.
This commit replaces the random splitter we use for regression with
the same streaming-reservoir approach we do for `classification`.
Backport of #58331
Improve the usability of the MS-SQL server/ODBC escaped
date/time/timestamp literals, by allowing timezone/offset ids
in the parsed string, e.g.:
```
{ts '2000-01-01T11:11:11Z'}
```
Closes: #58262
(cherry picked from commit 0af1f2fef805324e802d97d2fd9b4660abb403f0)
There was a discrepancy in the implementation of flush
acknowledgements: most of the class was designed on the
basis that the "last finalized bucket time" could be null
but the wire serialization assumed that it was never
null. This works because, the C++ sends zero "last
finalized bucket time" when it is not known or not
relevant. But then the Java code will print that to
XContent as it is assuming null represents not known or
not relevant.
This change corrects the discrepancies. Internally within
the class null represents not known or not relevant, but
this is translated from/to 0 for communications from the
C++ and old nodes that have the bug.
Additionally I switched from Date to Instant for this
class and made the member variables final to modernise it
a bit.
Backport of #58413
Now that MappedFieldType no longer extends lucene's FieldType, we need to have a
way of getting the index information about a field necessary for building text queries,
building term vectors, highlighting, etc. This commit introduces a new TextSearchInfo
abstraction that holds this information, and a getTextSearchInfo() method to
MappedFieldType to make it available. Field types that do not support text search can
just return null here.
This allows us to remove the MapperService.getLuceneFieldType() shim method.
FieldTypeLookup maps field names to their MappedFieldTypes. In the past, due to
the presence of multiple mapping types within a single index, this had to be updated
in-place because a mapping update might only affect one type. However, now that
we only have a single type per index, we can completely rebuild the FieldTypeLookup
on each update, removing lots of concurrency worries.
Adds a new value to the "event" enum of ML annotations, namely
"categorization_status_change".
This will allow users to see when categorization was found to
be performing poorly. Once per-partition categorization is
available, it will allow users to see when categorization is
performing poorly for a specific partition.
It does not make sense to reuse the "model_change" event that
annotations already have, because categorizer state is separate
to model state ("model" state is really anomaly detector state),
and is not reverted by the revert model snapshot API.
Therefore annotations related to categorization need to be
treated differently to annotations related to anomaly detection.
Backporting #58096 to 7.x branch.
Relates to #53100
* use mapping source direcly instead of using mapper service to extract the relevant mapping details
* moved assertion to TimestampField class and added helper method for tests
* Improved logic that inserts timestamp field mapping into an mapping.
If the timestamp field path consisted out of object fields and
if the final mapping did not contain the parent field then an error
occurred, because the prior logic assumed that the object field existed.
When doing aliasing with the same name over non existing fields, the analyzer gets stuck in a loop trying to resolve the alias over and over leading to SO. This PR breaks the cycle by checking the relationship between the alias and the child it tries to replace as an alias should never replace its child.
Fix#57270Close#57417
Co-authored-by: Hailei <zhh5919@163.com>
(cherry picked from commit 46786ff2e1ed5951006ff4bdd2b6ac6a1ebcf17b)
* Add support for snapshot and restore to data streams (#57675)
This change adds support for including data streams in snapshots.
Names are provided in indices field (the same way as in other APIs), wildcards are supported.
If rename pattern is specified it renames both data streams and backing indices.
It also adds test to make sure SLM works correctly.
Closes#57127
Relates to #53100
* version fix
* compilation fix
* compilation fix
* remove unused changes
* compilation fix
* test fix
When a local model is constructed, the cache hit miss count is incremented.
When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
This commit fixes an AOOBE in the handling of fatal
failures in _async_search. If the underlying cause is not found,
this change uses the root failure.
Closes#58311
Today when creating a follower index via the put follow API, or via an
auto-follow pattern, it is not possible to specify settings overrides
for the follower index. Instead, we copy all of the leader index
settings to the follower. Yet, there are cases where a user would want
some different settings on the follower index such as the number of
replicas, or allocation settings. This commit addresses this by allowing
the user to specify settings overrides when creating follower index via
manual put follower calls, or via auto-follow patterns. Note that not
all settings can be overrode (e.g., index.number_of_shards) so we also
have detection that prevents attempting to override settings that must
be equal between the leader and follow index. Note that we do not even
allow specifying such settings in the overrides, even if they are
specified to be equal between the leader and the follower
index. Instead, the must be implicitly copied from the leader index, not
explicitly set by the user.
Fixes a bug in TextFieldMapper serialization when index is false, and adds a
base-class test to ensure that all field mappers are tested against all variations
with defaults both included and excluded.
Fixes#58188
TIME_PARSE works correctly if both date and time parts are specified,
and a TIME object (that contains only time is returned).
Adjust docs and add a unit test that validates the behavior.
Follows: #55223
(cherry picked from commit 9d6b679a5da88f3c131b9bdba49aa92c6c272abe)
This is currently used to set the indexVersionCreated parameter on FieldMapper.
However, this parameter is only actually used by two implementations, and clutters
the API considerably. We should just remove it, and use it directly in the
implementations that require it.
Today the read/write locks used internally by CacheFile object are
wrapped into a ReleasableLock. This is not strictly required and also
prevents usage of the tryLock() methods which we would like to use
for early releasing of read operations (#58164).
This changes the actions that would attempt to make the managed index read only to
check if the managed index is the write index of a data stream before proceeding.
The updated actions are shrink, readonly, freeze and forcemerge.
(cherry picked from commit c906f631833fee8628f898917a8613a1f436c6b1)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
As part of the "ML in Spaces" project, access to the ML UI in
Kibana is migrating to being controlled by Kibana privileges.
The ML UI will check whether the logged-in user has permission
to do something ML-related using Kibana privileges, and if they
do will call the relevant ML Elasticsearch API using the Kibana
system user. In order for this to work the kibana_system role
needs to have administrative access to ML.
Backport of #58061
This commit bumps our JNA dependency from 4.5.1 to 5.5.0, so that we are
now on the latest maintained line, and pick up a large collection of bug
fixes that have accumulated.
The main improvement here is that the total expected
count of training rows in the test is calculated as the
sum of the training fraction times the cardinality of each
class (instead of the training fraction times the total doc count).
Also relaxes slightly the error bound on the uniformity test from 0.12
to 0.13.
Closes#54122
Backport of #58180
We don't allow converting a data stream's writeable index into a searchable
snapshot. We are currently preventing swapping a data stream's write index
with the restored index.
This adds another step that will not proceed with the searchable snapshot action
until the managed index is not the write index of a data stream anymore.
(cherry picked from commit ccd618ead7cf7f5a74b9fb34524d00024de1479a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
MappedFieldType is a combination of two concerns:
* an extension of lucene's FieldType, defining how a field should be indexed
* a set of query factory methods, defining how a field should be searched
We want to break these two concerns apart. This commit is a first step to doing this, breaking
the inheritance relationship between MappedFieldType and FieldType. MappedFieldType
instead has a series of boolean flags defining whether or not the field is searchable or
aggregatable, and FieldMapper has a separate FieldType passed to its constructor defining
how indexing should be done.
Relates to #56814
The leases issued by CCR keep one extra operation around on the leader shards. This is not
harmful to the leader cluster, but means that there's potentially one delete that can't be
cleaned up.
Today a mounted searchable snapshot defaults to having the same replica
configuration as the index that was snapshotted. This commit changes this
behaviour so that we default to zero replicas on these indices, but allow the
user to override this in the mount request.
Relates #50999
This commit adds an optional field, `description`, to all ingest processors
so that users can explain the purpose of the specific processor instance.
Closes#56000.
This template was added for 7.0 for what I am guessing is a BWC issue related to deprecation
warnings. It unfortunately seems to cause failures because templates for these tests are not cleared
after the test (because these are upgrade tests).
Resolves#56363
This need some reorg of BinaryDV field data classes to allow specialisation of scripted doc values.
Moved common logic to a new abstract base class and added a new subclass to return string-based representations to scripts.
Closes#58044
Allows the kibana user to collect data telemetry in a background
task by giving the kibana_system built-in role the view_index_metadata
and monitoring privileges over all indices (*).
Without this fix, users who try to use Metricbeat for Stack Monitoring today
see the following error repeatedly in their Metricbeat log. Due to this error
Metricbeat is unwilling to proceed further and, thus, no Stack Monitoring
data is indexed into the Elasticsearch cluster.
Co-authored-by: Albert Zaharovits <albert.zaharovits@elastic.co>
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
Previously we excluded requiring licenses for dependencies with the
group name org.elasticsearch under the assumption that these use the
top-level Elasticsearch license. This is not always correct, for
example, for the org.elasticsearch:jna dependency as this is merely a
wrapper around the upstream JNA project, and that is the license that we
should be including. A recent change modified this check from using the
group name to checking only if the dependency is a project
dependency. This exposed the use of JNA in SQL CLI to this check, but
the license for it was not added. This commit addresses this by adding
the license.
Relates #58015
This has `EnsembleInferenceModel` not parse feature_names from the XContent.
Instead, it will rely on `rewriteFeatureIndices` to be called ahead time.
Consequently, protections are made for a fail fast path if `rewriteFeatureIndices` has not been called before `infer`.
We were previously configuring BWC testing tasks by matching on task
name prefix. This naive approach breaks down when you have versions like
1.0.1 and 1.0.10 since they both share a common prefix. This commit
makes the pattern matching more specific so we won't inadvertently
spin up the wrong cluster version.
This type of result will store stats about how well categorization
is performing. When per-partition categorization is in use, separate
documents will be written for every partition so that it is possible
to see if categorization is working well for some partitions but not
others.
This PR is a minimal implementation to allow the C++ side changes to
be made. More Java side changes related to per-partition
categorization will be in followup PRs. However, even in the long
term I do not see a major benefit in introducing dedicated APIs for
querying categorizer stats. Like forecast request stats the
categorizer stats can be read directly from the job's results alias.
Backport of #57978
Adds support for reading in `model_size_info` objects.
These objects contain numeric values indicating the model definition size and complexity.
Additionally, these objects are not stored or serialized to any other node. They are to be used for calculating and storing model metadata. They are much smaller on heap than the true model definition and should help prevent the analytics process from using too much memory.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Now that annotations are part of the anomaly detection job results
the annotations index should be refreshed on flushing and closing
the job so that flush and close continue to fulfil their contracts
that immediately after returning all results the job generated up
to that point are searchable.
ModelLoadingService only caches models if they are referenced by an
ingest pipeline. For models used in search we want to always cache the
models and rely on TTL to evict them. Additionally when an ingest
pipeline is deleted the model it references should not be evicted if
it is used in search.
Search after is a better choice for the delete expired data iterators
where processing takes a long time as unlike scroll a context does not
have to be kept alive. Also changes the delete expired data endpoint to
404 if the job is unknown
Since we change the memory estimates for data frame analytics jobs from worst case to a realistic case, the strict less-than assertion in the test does not hold anymore. I replaced it with a less-or-equal-than assertion.
Backport or #57882
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
When Joni, the regex engine that powers grok emits a warning it
does so by default to System.err. System.err logs are all bucketed
together in the server log at WARN level. When Joni emits a warning,
it can be extremely verbose, logging a message for each execution
again that pattern. For ingest node that means for every document
that is run that through Grok. Fortunately, Joni provides a call
back hook to push these warnings to a custom location.
This commit implements Joni's callback hook to push the Joni warning
to the Elasticsearch server logger (logger.org.elasticsearch.ingest.common.GrokProcessor)
at debug level. Generally these warning indicate a possible issue with
the regular expression and upon creation of the Grok processor will
do a "test run" of the expression and log the result (if any) at WARN
level. This WARN level log should only occur on pipeline creation which
is a much lower frequency then every document.
Additionally, the documentation is updated with instructions for how
to set the logger to debug level.
Allow a field inside the data to be used as a tie breaker for events
that have the same timestamp.
The field is optional by default.
If used, the tie-breaker always requires a non-null value since it is
used inside `search_after` which requires a non-null value.
Fix#56824
(cherry picked from commit e5719ecb474b32730d93afdbb6834a32b0b2df8b)
The shrink action creates a shrunken index with the target number of shards.
This makes the shrink action data stream aware. If the ILM managed index is
part of a data stream the shrink action will make sure to swap the original
managed index with the shrunken one as part of the data stream's backing
indices and then delete the original index.
(cherry picked from commit 99aeed6acf4ae7cbdd97a3bcfe54c5d37ab7a574)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
This deprecates `Rounding#round` and `Rounding#nextRoundingValue` in
favor of calling
```
Rounding.Prepared prepared = rounding.prepare(min, max);
...
prepared.round(val)
```
because it is always going to be faster to prepare once. There
are going to be some cases where we won't know what to prepare *for*
and in those cases you can call `prepareForUnknown` and stil be faster
than calling the deprecated method over and over and over again.
Ultimately, this is important because it doesn't look like there is an
easy way to cache `Rounding.Prepared` or any of its precursors like
`LocalTimeOffset.Lookup`. Instead, we can just build it at most once per
request.
Relates to #56124
Adds assertions to Netty to make sure that its threads are not polluted by thread contexts (and
also that thread contexts are not leaked). Moves the ClusterApplierService to use the system
context (same as we do for MasterService), which allows to remove a hack from
TemplateUgradeService and makes it clearer that applying CS updates is fully executing under
system context.
* Convert to date/datetime the result of numeric aggregations (min, max)
in Painless scripts
(cherry picked from commit f1de99e2a6fbf3806c4f2b6b809738aa8faa2d75)
This adds new plugin level circuit breaker for the ML plugin.
`model_inference` is the circuit breaker qualified name.
Right now it simply adds to the breaker when the model is loaded (and possibly breaking) and removing from the breaker when the model is unloaded.
Fix broken numeric shard generations when reading them from the wire
or physically from the physical repository.
This should be the cheapest way to clean up broken shard generations
in a BwC and safe-to-backport manner for now. We can potentially
further optimize this by also not doing the checks on the generations
based on the versions we see in the `RepositoryData` but I don't think
it matters much since we will read `RepositoryData` from cache in almost
all cases.
Closes#57798
Before to determine if a field is meta-field, a static method of MapperService
isMetadataField was used. This method was using an outdated static list
of meta-fields.
This PR instead changes this method to the instance method that
is also aware of meta-fields in all registered plugins.
Related #38373, #41656Closes#24422
We want to validate the DataStreams on creation to make sure the future backing
indices would not clash with existing indices in the system (so we can
always rollover the data stream).
This changes the validation logic to allow for a DataStream to be created
with a backing index that has a prefix (eg. `shrink-foo-000001`) even if the
former backing index (`foo-000001`) exists in the system.
The new validation logic will look for potential index conflicts with indices
in the system that have the counter in the name greater than the data stream's
generation.
This ensures that the `DataStream`'s future rollovers are safe because for a
`DataStream` `foo` of generation 4, we will look for standalone indices in the
form of `foo-%06d` with the counter greater than 4 (ie. validation will fail if
`foo-000006` exists in the system), but will also allow replacing a
backing index with an index named by prefixing the backing index it replaces.
(cherry picked from commit 695b242d69f0dc017e732b63737625adb01fe595)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
This makes it easier to debug where such tasks come from in case they are returned from the get tasks API.
Also renamed the last occurrence of waitForCompletion to waitForCompletionTimeout in get async search request.
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.
There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.
The changes so far cover REST APIs, results object
formats, HLRC and docs.
Backport of #57683
This is a major refactor of the underlying inference logic.
The main refactor is now we are separating the model configuration and
the inference interfaces.
This has the following benefits:
- we can store extra things with the model that are not
necessary for inference (i.e. treenode split information gain)
- we can optimize inference separate from model serialization and storage.
- The user is oblivious to the optimizations (other than seeing the benefits).
A major part of this commit is removing all inference related methods from the
trained model configurations (ensemble, tree, etc.) and moving them to a new class.
This new class satisfies a new interface that is ONLY for inference.
The optimizations applied currently are:
- feature maps are flattened once
- feature extraction only happens once at the highest level
(improves inference + feature importance through put)
- Only storing what we need for inference + feature importance on heap
#47711 and #47246 helped to validate that monitoring settings are
rejected at time of setting the monitoring settings. Else an invalid
monitoring setting can find it's way into the cluster state and result
in an exception thrown [1] on the cluster state application (there by
causing significant issues). Some additional monitoring settings have
been identified that can result in invalid cluster state that also
result in exceptions thrown on cluster state application.
All settings require a type of either http or local to be
applicable. When a setting is changed, the exporters are automatically
updated with the new settings. However, if the old or new settings lack
of a type setting an exception will be thrown (since exporters are
always of type 'http' or 'local'). Arguably we shouldn't blindly create
and destroy new exporters on each monitoring setting update, but the
lifecycle of the exporters is abit out the scope this PR is trying to
address.
This commit introduces a similar methodology to check for validity as
#47711 and #47246 but this time for ALL (including non-http) settings.
Monitoring settings are not useful unless there an exporter with a type
defined. The type is used as dependent setting, such that it must
exist to set the value. This ensures that when any monitoring settings
changes that they can only get added to cluster state if the type
exists. If the type exists (and the other validations pass) then the
exporters will get re-built and the cluster state remains valid.
Tests have been included to ensure that all dynamic monitoring settings
have the type as dependent settings.
[1]
org.elasticsearch.common.settings.SettingsException: missing exporter type for [found-user-defined] exporter
at org.elasticsearch.xpack.monitoring.exporter.Exporters.initExporters(Exporters.java:126) ~[?:?]
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.
Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.
In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
Backport of #57680
rewrite config on update if either version is outdated, credentials change,
the update changes the config or deprecated settings are found. Deprecated
settings get migrated to the new format. The upgrade can be easily extended to
do any necessary re-writes.
fixes#56499
backport #57648
For a rolling/mixed cluster upgrade (add new version to existing cluster
then shutdown old instances), the watches that ship by default
with monitoring may not get properly updated to the new version.
Monitoring watches can only get published if the internal state is
marked as dirty. If a node is not master, will also get marked as
clean (e.g. not dirty).
For a mixed cluster upgrade, it is possible for the new node to be
added, not as master, the internal state gets marked as clean so
that no more attempts can be made to publish the watches. This
happens on all new nodes. Once the old nodes are de-commissioned
one of the new version nodes in the cluster gets promoted to master.
However, that new master node (with out intervention like restarting
the node or removing/adding exporters) will never attempt to re-publish
since the internal state was already marked as clean.
This commit adds a cluster state listener to mark the resource dirty
when a node is promoted to master. This will allow the new resource
to be published without any intervention.
In #55592 and #55416, we deprecated the settings for enabling and disabling
basic license features and turned those settings into no-ops. Since doing so,
we've had feedback that this change may not give users enough time to cleanly
switch from non-ILM index management tools to ILM. If two index managers
operate simultaneously, results could be strange and difficult to
reconstruct. We don't know of any cases where SLM will cause a problem, but we
are restoring that setting as well, to be on the safe side.
This PR is not a strict commit reversion. First, we are keeping the new
xpack.watcher.use_ilm_index_management setting, introduced when
xpack.ilm.enabled was made a no-op, so that users can begin migrating to using
it. Second, the SLM setting was modified in the same commit as a group of other
settings, so I have taken just the changes relating to SLM.
* Remove duplicate ssl setup in sql/qa projects
* Fix enforcement of task instances
* Use static data for cert generation
* Move ssl testing logic into a plugin
* Document test cert creation
This PR replaces the marker interface with the method
FieldMapper#parsesArrayValue. I find this cleaner and it will help with the
fields retrieval work (#55363).
The refactor also ensures that only field mappers can declare they parse array
values. Previously other types like ObjectMapper could implement the marker
interface and be passed array values, which doesn't make sense.
Add `TRIM` function which combines the functionality of both
`LTRIM` and `RTRIM` by stripping both leading and trailing
whitespaces.
Refers to #41195
(cherry picked from commit 6c86c919e12f0c4cb5e39d129aa65ab3e274268f)
We test expected TLS failures by catching SSLException, but other
security providers ( i.e. BCFIPS ) might throw a different one. In
this case, BCFIPS throws org.bouncycastle.tls.TlsFatalAlert
Almost every outbound message is serialized to buffers of 16k pagesize.
We were serializing these messages off the IO loop (and retaining the concrete message
instance as well) and would then enqueue it on the IO loop to be dealt with as soon as the
channel is ready.
1. This would cause buffers to be held onto for longer than necessary, causing less reuse on average.
2. If a channel was slow for some reason, not only would concrete message instances queue up for it, but also 16k of buffers would be reserved for each message until it would be written+flushed physically.
With this change, the serialization happens on the event loop which effectively limits the number of buffers that `N` IO-threads will ever use so long as messages are small and channels writable.
Also, this change dereferences the reference to the concrete outbound message as soon as it has been serialized to save some more on GC.
This reduces the GC time for a default PMC run by about 50% in experiments (3 nodes, 2G heap each, loopback ... obvious caveat is that GC isn't that heavy in the first place with recent changes but still a measurable gain).
I also expect it to be helpful for master node stability by causing less of a spike if master is e.g. hit by a large number of requests that are processed batched (e.g. shard snapshot status updates) and responded to in a short time frame all at once.
Obviously, the downside to this change is that it introduces more latency on the IO loop for the serialization. But since we read all of these messages on the IO loop as well I don't see it as much of a qualitative change really and the more predictable buffer use seems much more valuable relatively.
* Move classes from build scripts to buildSrc
- move Run task
- move duplicate SanEvaluator
* Remove :run workaround
* Some little cleanup on build scripts on the way
As the datastream information is stored in the `ClusterState.Metadata` we exposed
the `Metadata` to the `AsyncWaitStep#evaluateCondition` method in order for
the steps to be able to identify when a managed index is part of a DataStream.
If a managed index is part of a DataStream the rollover target is the DataStream
name and the highest generation index is the write index (ie. the rolled index).
(cherry picked from commit 6b410dfb78f3676fce1b7401f1628c1ca6fbd45a)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Some BI tools (i.e. Tableau) would try to cast strings where the time
part is separated from the date part with a whitespace instead of `T`.
Adjust type conversion used by CAST to support this.
(cherry picked from commit 0e18321e7ad9f779c42855efbf93f171b9128a5e)
Add basic support for `TOP X` as a synonym to LIMIT X which is used
by [MS-SQL server](https://docs.microsoft.com/en-us/sql/t-sql/queries/top-transact-sql?view=sql-server-ver15),
e.g.:
```
SELECT TOP 5 a, b, c FROM test
```
TOP in SQL server also supports the `PERCENTAGE` and `WITH TIES`
keywords which this implementation doesn't.
Don't allow usage of both TOP and LIMIT in the same query.
Refers to #41195
(cherry picked from commit 2f5ab81b9ad884434d1faa60f4391f966ede73e8)
At some point, we changed the supported-type test to also catch
assertion errors. This has the side effect of also catching the
`fail()` call inside the try-catch, which silently smothered some
failures.
This modifies the test to throw at the end of the try-catch
block to prevent from accidentally catching itself.
Catching the AssertionError is convenient because there are other locations
that do throw an assertion in tests (due to hitting an assertion
before the exception is thrown) so I think we should keep it around.
Also includes a variety of fixes to other tests which were failing
but being silently smothered.
* [ML] mark forecasts for force closed/failed jobs as failed (#57143)
forecasts that are still running should be marked as failed/finished in the following scenarios:
- Job is force closed
- Job is re-assigned to another node.
Forecasts are not "resilient". Their execution does not continue after a node failure. Consequently, forecasts marked as STARTED or SCHEDULED should be flagged as failed. These forecasts can then be deleted.
Additionally, force closing a job kills the native task directly. This means that if a forecast was running, it is not allowed to complete and could still have the status of `STARTED` in the index.
relates to https://github.com/elastic/elasticsearch/issues/56419
* [ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.
This flag is useful for moving models between clusters.
* Add new circuitbreaker plugin and refactor CircuitBreakerService (#55695)
This commit lays the ground work for plugins supplying their own circuit breakers.
It adds a new interface: `CircuitBreakerPlugin`.
This interface provides methods for providing custom child CircuitBreaker objects. There are also facilities for allowing dynamic settings for the custom breakers.
With the refactor, circuit breakers are no longer replaced on setting changes. Instead, the two mutable settings themselves are `volatile`. Plugins that want to use their custom circuit breaker should keep a reference of their constructed breaker.
This adds a max_model_memory setting to forecast requests.
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").
The default value is `20mb`.
There is a HARD limit at `500mb` which will throw an error if used.
If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.
related native change: https://github.com/elastic/ml-cpp/pull/1238
closes: https://github.com/elastic/elasticsearch/issues/56420
Implement TIME_PARSE(<time_str>, <pattern_str>) function
which allows to parse a time string according to the specified
pattern into a time object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54963
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <patrickjiang0530@gmail.com>
(cherry picked from commit 1fe1188d449cad7d0782a202372edc52a4014135)
Backport of #56878 to 7.x branch.
With this change the following APIs will be able to resolve data streams:
get index, get mappings and ilm explain APIs.
Relates to #53100
Allows geo fields (`geo_point`, `geo_shape`) to have missing values.
Fixes a bug where such missing values would result in an error.
Closes#57299
Backport of #57300
Backporting #56888 to 7.x branch.
Limit the creation of data streams only for namespaces that have a composable template with a data stream definition.
This way we ensure that mappings/settings have been specified and will be used at data stream creation and data stream rollover.
Also remove `timestamp_field` parameter from create data stream request and
let the create data stream api resolve the timestamp field
from the data stream definition snippet inside a composable template.
Relates to #53100
Previously, `CASE` and `IIF` when translated to painless scripts
(used in GROUP BY, HAVING, WHERE) a custom `caseFunction`
registered in the `InternalSqlScriptUtils` was used. This function
received and array of arbitrary length:
```[condition1, result1, condition2, result2, ... elseResult]```
Painless doesn't know of the context and therefore is evaluating
all conditions and results before invoking the `caseFunction` on them.
As a consequence, erroneous result expressions (i.e. division by 0)
where always evaluated despite of the guarding condition.
Replace the `caseFunction` with painless `<cond> ? <res1> : <res2>`
expressions to properly guard the result expressions and only evaluate
the one for which its guarding condition evaluates to true (or of course
the elseResult).
As a bonus, this approach includes performance benefits since we avoid
unnecessary evaluations of both conditions and result expressions.
Fixes: #49672
(cherry picked from commit 9584b345d89f797bfb658212b928b9812804f02f)
The ssl.trust setting for Watcher provides a list of hostnames that
should be automatically trusted for SSL hostname verification. It was
accidentally broken when we added the full ssl.* settings for email
notifications (see #45272)
This commit corrects this, so the setting is once again respected,
as long as none of the other ssl settings are configured for email
notifications.
Resolves: #52153
Backport of: #56090
This commit removes the compiler.java setting from the build. It was
originally added when Gradle was far behind support for the latest jdk,
but is no longer applicable as we don't have any need to update the
supported compile version before gradle supports the newer version. Note
that the runtime version changing support still exists here, this only
ensures we use the same jdk to compile as we use to run gradle.
Since #51888 the ML job stats endpoint has returned entries for
jobs that have a persistent task but not job config. Such
orphaned tasks caused monitoring to fail.
This change ignores any such corrupt jobs for monitoring purposes.
Backport of #57235
This PR removes the blocking call to insert ingest documents into a queue in the
coordinator. It replaces it with an offer call which will throw a rejection exception
in the event that the queue is full. This prevents deadlocks of the write threads
when the queue fills to capacity and there are more than one enrich processors
in a pipeline.
The searcher was randomly wrapping its reader as slow, parallel, or filtered.
This was causing casting issues in the normalizer tests. By removing the
wrapping, the problem goes away.
Closes#57164
If a job is NOT opened, forecasts should be able to be deleted, no matter their state.
This also fixes a bug with expanding forecast IDs. We should check for wildcard `*` and `_all` when expanding the ids
closes https://github.com/elastic/elasticsearch/issues/56419
Change the error message wording for comparisons against fields in
filtering (s/variables/fields).
(cherry picked from commit d9a1cb50940d0a98fd75b9c0123ca6e1d862f65d)
* Update the JLine dependency to 3.14.1
Update the JLine dependency from 3.10.0 to 3.14.1.
(cherry picked from commit c2d9b74046fa5ddb54604da3afa7887cc38548a1)
Backport of #55548
Adds equivalence for keyword field to the wildcard field. Regex, fuzzy, wildcard and prefix queries are all supported.
All queries use an approximation query backed by an automaton-based verification queries.
Closes#54275
Fix delete_expired_data/nightly maintenance when
many model snapshots need deleting (#57041)
The queries performed by the expired data removers pull back entire
documents when only a few fields are required. For ModelSnapshots in
particular this is a problem as they contain quantiles which may be
100s of KB and the search size is set to 10,000.
This change makes the search more efficient by only requesting the
fields needed to work out which expired data should be deleted.
In KeystoreWrapper class we determine if the error to decrypt a
given keystore is caused by a wrong password based on the exception
that the SunJCE implementation of AES is throwing
(AEADBadTagException). Other implementations from other Security
Providers might cause decryption to fail in a different way and cause
us to throw a generic error message.
We handle this in this test by matching both possible
exception messages.
Relates: #56889
In #51089 where SamlAuthenticatorTests were refactored, we missed
to update one test case which meant that a single key would be
used both for signing and encryption in the same run. As explained
in #51089, and due to FIPS 140 requirements, BouncyCastle FIPS
provider will block RSA keys that have been used for signing from
being used for encryption and vice versa
This commit changes testNoAttributesReturnedWhenTheyCannotBeDecrypted
to always use the specific keys we have added for encryption.
This change ensures that we stop the maintenance service on all nodes
when a data node is restarted. This ensures that we don't send
update_by_query requests on the node that is restarted.
This commit also raises the log level to trace for some packages
in order to investigate the failures to acquire a shard lock
after a restart.
Relates #56765
- Use opensaml to sign and encrypt responses/assertions/attributes
instead of doing this manually
- Use opensaml to build response and assertion objects instead of
parsing xml strings
- Always use different keys for signing and encryption. Due to FIPS
140 requirements, BouncyCastle FIPS provider will block
RSA keys that have been used for signing from being used for
encryption and vice versa. This change adds new encryption specific
keys to be used throughout the tests.
Move the JDBC functionality integration tests from `:sql:qa` to a separate
module `:sql:qa:jdbc`. This way the tests are isolated from the rest of the
integration tests and they only depend to the `:sql:jdbc` module, thus
removing the danger of accidentally pulling in some dependency that may
hide bugs.
Moreover this is a preparation for #56722, so that we can run those tests
between different JDBC and ES node versions and ensure forward
compatibility.
Move the rest of existing tests inside a new `:sql:qa:server` project, so that
the `:sql:qa` becomes the parent project for both and one can run all the integration
tests by using this parent project.
(cherry picked from commit c09f4a04484b8a43934fe58fbc41bd90b7dbcc76)
Our FIPS 140 testing depends on setting the appropriate java policy
in order to configure the JVM in FIPS mode. Some tests (
discovery-ec2 and ccr qa ) also needed to set a custom policy file
to grant a specific permission, which overwrote the FIPS related
policy and tests would fail. This change ensures that when a
custom policy needs to be set in these tests, the permissions that
are necessary for FIPS are also set.
Resolves: #51685, #52034
Field mapping detection is done via grok patterns.
This commit adds well-known text (WKT) formatted geometry detection.
If everything is a `POINT`, then a `geo_point` mapping is preferred.
Otherwise, if all the fields are WKT geometries a `geo_shape` mapping is preferred.
This does **NOT** detect other types of formatted geometries (geohash, comma delimited points, etc.)
closes https://github.com/elastic/elasticsearch/issues/56967
Adds an example snippet for creating a `_doc` payload field with the
Watcher `index` action.
Co-authored-by: Luiz Guilherme Pais dos Santos <luiz.santos@elastic.co>
* Fix temp dir locked errors
The tests involving a temporary directory (containing the JDBC JAR) fail
on Windows because they can't be deleted, due to still being in use.
This commit forces a premature closing of the JAR file, which mitigates
the failure by giving the JVM more time to collect any open FDs.
(Calling the System.gc() in the tests is another working alternative
fix.)
The stream-based JAR access is taken care by disabling the cache usage
(cherry picked from commit 04f97333a015404a68e8f19223f33aadeb396687)
The original implementation utilized `bbox` as the index mapping type. This would not work as it would have to be `envelope`. But, given that `envelope` and `polygon` are tessellated in the same way, we choose to use `polygon` as the geo_shape type. This is for easier support other places in the stack (a la kibana maps)
Merging logic is currently split between FieldMapper, with its merge() method, and
MappedFieldType, which checks for merging compatibility. The compatibility checks
are called from a third class, MappingMergeValidator. This makes it difficult to reason
about what is or is not compatible in updates, and even what is in fact updateable - we
have a number of tests that check compatibility on changes in mapping configuration
that are not in fact possible.
This commit refactors the compatibility logic so that it all sits on FieldMapper, and
makes it called at merge time. It adds a new FieldMapperTestCase base class that
FieldMapper tests can extend, and moves the compatibility testing machinery from
FieldTypeTestCase to here.
Relates to #56814
Previously `COUNT(DISTINCT <literal>)` was returning the same result
as `COUNT(<literal>)` which is not correct as it should always return 1
if there is at least one matching row (bucket if there is a GROUP BY),
or 0 otherwise.
(cherry picked from commit 7f7d7562d43034907f432d39d0d66f490d78f4a8)
Throttling nightly cleanup as much as we do has been over cautious.
Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.
Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.
This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
The version number componenent can't equal or exceed the revision
multiplier.
This fixes a the VersionTests unit test.
(cherry picked from commit 7d2331a2818ae20024c5c3617cd4433f90e9c098)
* Adds support for MIN, MAX, AVG, SUM aggregates acting on literals.
SELECT SUM(1) FROM index
and
SELECT SUM(1), AVG(2)
work both on indices and as local execution.
(cherry picked from commit efb72907c0391612c4a2b6256e327060b4167912)
WatcherIndexTemplateRegistry as of https://github.com/elastic/elasticsearch/pull/52962
requires all nodes to be on 7.7.0 before it allows the version 11 index template to be
installed.
While in a mixed cluster, nothing prevents Watcher from running on the new
host before the all of the nodes are on 7.7.0. This will result in the
.watcher-history-11* index without the proper mappings. Without the proper
mapping a single document (for a large watch) can exceed the default 1000 field
limit and cause error to show in the logs.
This commit ensures the same logic for writing to the index is applied as for
installing the template. In a mixed cluster, the `10` index template will continue
to be written. Only once all of nodes are on 7.7.0+ will the `11` index template
be installed and used.
closes#56732
In DF analytics classification, it is possible to use no samples
of a class if its cardinality is too low.
This commit fixes this by ensuring the target sample count can never be zero.
Backport of #56783
* JDBC: fix access to the Manifest for non-entry JAR
The JDBC driver will attempt to read its version from the Manifest file
embedded into its JAR. The URL pointing to the JAR can be provided in a
few ways.
So far, accessing the Manfiest was attempted by getting a URLConnection
out of the URL and then getting an input stream out of this connection.
For file JAR URLs, this only works however if the URL points to the
driver as a JAR file entry (i.e. <sub-url>!/jdbc-driver.jar!/). If
that's not the case, the JarURLConnection will throw an IOException.
This commit fixes that: in case the URL points to a JAR entry
(jar:file:<path>/jdbc-driver.jar!/), the manifest is read directly with
JarURLConnection#getManifest().
(cherry picked from commit 2175b7b01cf5fcf3ab2bb21404a9bd454a8df3f0)
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
In some cases the Enrich processor factory may be called before it is
ready to create processors. While these calls are usually made in error,
the response from the Enrich processor is an NPE which is almost always
an unhelpful error when debugging an issue.
This change aims to fix our setup in CI so that we can run 7.x in
FIPS 140 mode. The major issue that we have in 7.x and did not
have in master is that we can't use the diagnostic trust manager
in FIPS mode in Java 8 with SunJSSE in FIPS approved mode as it
explicitly disallows the wrapping of X509TrustManager.
Previous attempts like #56427 and #52211 focused on disabling the
setting in all of our tests when creating a Settings object or
on setting fips_mode.enabled accordingly (which implicitly disables
the diagnostic trust manager). The attempts weren't future proof
though as nothing would forbid someone to add new tests without
setting the necessary setting and forcing this would be very
inconvenient for any other case ( see
#56427 (comment) for the full argumentation).
This change introduces a runtime check in SSLService that overrides
the configuration value of xpack.security.ssl.diagnose.trust and
disables the diagnostic trust manager when we are running in Java 8
and the SunJSSE provider is set in FIPS mode.
* [Transform] add support for terms agg in transforms (#56696)
This adds support for `terms` and `rare_terms` aggs in transforms.
The default behavior is that the results are collapsed in the following manner:
`<AGG_NAME>.<BUCKET_NAME>.<SUBAGGS...>...`
Or if no sub aggs exist
`<AGG_NAME>.<BUCKET_NAME>.<_doc_count>`
The mapping is also defined as `flattened` by default. This is to avoid field explosion while still providing (limited) search and aggregation capabilities.
This is a followup to #56632. Tests that had to be changed
to mock the C++ log handler more accurately need to be more
careful about when that stream ends, as ending of that
stream is used to detect crashes in the production system.
Fixes#56796
Mapper.Builder currently has some complex generics on it to allow fluent builder
construction. However, the second parameter, a return type from the build() method,
is unnecessary, as we can use covariant return types. This commit removes this second
generic parameter.
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
This aggregation will perform normalizations of metrics
for a given series of data in the form of bucket values.
The aggregations supports the following normalizations
- rescale 0-1
- rescale 0-100
- percentage of sum
- mean normalization
- z-score normalization
- softmax normalization
To specify which normalization is to be used, it can be specified
in the normalize agg's `normalizer` field.
For example:
```
{
"normalize": {
"buckets_path": <>,
"normalizer": "percent"
}
}
```
If CI is running tests at exactly 0 or 5 minutes past the hour
the ack-watch docs tests may fail with a 409 error if the ack
test happens to run at the exact time that the schedule watch
is running.
This commit changes the public documentation (and the test) for
the ack to a feb 29th at noon schedule. Test doc or tests do
not really care about the schedule date and this is chosen
since it is a valid date, but one that is extremely unlikely
to cause issues.
Optimize away events queries and joins/sequence that cannot match any
results without having to query the backend.
(cherry picked from commit 69c8ef8cfefd8fc6dcb6d1a566bfcd537068e3e4)
Adds the conflicting types and an example of an index which specifies
them in order to make it easier for the user to understand the conflict.
Backport of #56700
This change ensures that the maintenance service that is responsible for deleting the expired response is stopped between each test. This is needed since we check that no search context are in-flight after each test method.
Fixes#55988
If an email action is used in a foreach loop, message ids could have
been duplicated, which then get rejected by the mail server.
This commit introduces an additional static counter in the email action
in order to ensure that every message id is unique.
Prior to this change the named pipes that connect the ML C++
processes to the Elasticsearch JVM were all opened before any
of them were read from or written to.
This created a problem, where if the C++ process logged more
messages between opening the log pipe and opening the last
pipe to be connected than there was space for in the named
pipe's buffer then the C++ process would block. This would
mean it never got as far as opening the last named pipe, so
the JVM would never get as far as reading from the log pipe,
hence a deadlock.
This change alters the connection order so that the JVM
starts reading from the logging pipe immediately after opening
it so that if the C++ process logs messages while opening the
other named pipes they are captured in a timely manner and
there is no danger of a deadlock.
Backport of #56632
This merges the code for the `significant_terms` agg into the package
for the code for the `terms` agg. They are *super* entangled already,
this mostly just admits that to ourselves.
Precondition for the terms work in #56487
Initial support for EQL sequences
The current algorithm is focused on correctness and does not contain
any optimization which is left for the future.
The current implementation uses a state machine approach which moves
ascending and runs each query one after the other working on computing
sequences as the data comes in.
For each result, the key and its timestamp are being extracted which are
then used for matching/building a sequence.
(cherry picked from commit 4f3e18c894a1841d333022361ad9d1fdf1477dc3)
- Add support for scalar functions on the field of SQL's LIKE/RLIKE
- Add support for scalar functions on the field of EQL's match/matchLite
Closes: #55058
(cherry picked from commit 51c14e2dbb7fb29004a23369c449d425b3ac8fe2)
When decoding async execution ids, exceptions thrown from the decode method itself were not caught, leading to cryptic errors like "Input byte array has incorrect ending byte at 68" being returned. With this commit we return "invalid id: [abcdef]".
Added tests coverage for a couple of these scenarios and also added tests for equals/hashcode methods.
The docs pattern url was using `*` which means zero or many instead
of `?` which means zero or one. The pattern url returned in error
messages was not in sync with the one in the docs.
Fixes: #56476
(cherry picked from commit 1a5945c3962cdda21482f4b0b3e0ca508534c2c4)
Today you can convert a searchable snapshot index back into a regular index by
restoring the underlying snapshot, but this is somewhat wasteful if the shards
are already in cache since it copies the whole index from the repository again.
Instead, we can make use of the locally-cached data by using the clone API to
copy the contents of the cache into the layout expected by a regular shard.
This commit marks the searchable snapshot's private index settings as
`NotCopyableOnResize` so that they are removed by resize operations such as
cloning.
Cloning a regular index typically hard-links the underlying files rather than
copying them, but this is tricky to support in the case of a searchable
snapshot so this commit takes the simpler approach of always copying the
underlying files.
This setting was not returned in the SamlRealmSettings#getSettings
so it was not possible for users to set this in the realm config
in our configuration.
* [DOCS] Promote cron expressions info from Watcher to a separate topic.
* Fix table error
* Fixed xref
* Apply suggestions from code review
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
* Incorporated review feedback
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Watcher adds watches to the trigger service on the postIndex action
for the .watches index. This has the (intentional) side effect of also
adding the watches to the stats. The tests rely on these stats for their
assertions. The tests also start and stop Watcher between each test for
a clean slate.
When Watcher executes it updates the .watches index and upon this update
it will go through the postIndex method and end up added that watch to the
trigger service (and stats). Functionally this is not a problem, if Watcher
is stopping or stopped since Watcher is also paused and will not execute
the watch. However, with specific timing and expectations of a clean slate
can cause issues the test assertions against the stats.
This commit ensures that the postIndex action only adds to the trigger service
if the Watcher state is not stopping or stopped. When started back up it will
re-read index .watches.
This commit also un-mutes the tests related to #53177 and #56534
* Swaps outdated index patterns for the default `logstash` index alias.
Adds some related information about Logstash ILM defaults to the callout.
* Swaps `*.raw` fields for `*.keyword` fields. The Logstash template
uses `keyword` fields by default since 6.x.
* Swaps instances of `ctx.payload.hits.total.value` with
`ctx.payload.hits.total`
This commit allows the JSON schema's documentation.url property to have a null value.
This can useful for cases where a feature is under development, and does not have
documentation published yet.
This commit also adds a documentation.url for two ml resources.
Two spots that allow for some optimization:
* We are often creating a composite reference of just a single item in
the transport layer => special cased via static constructor to make sure we never do that
* Also removed the pointless case of an empty composite bytes ref
* `ByteBufferReference` is practically always created from a heap buffer these days so there
is no point of dealing with all the bounds checks and extra references to sliced buffers from that
and we can just use the underlying array directly
Without the flag we run into the situation where a broken repository (broken by some old 6.x
version of ES that is missing some snap-${uuid}.dat blobs fails to run the SLM retention task
since it always errors out).
Use `ORDER BY` to ensure order of the rows since more
than are returned in the testDate().
Follows: #56492
(cherry picked from commit 0053a1cb515b4db160d7b0bed5cf3f13c1050687)
Backport: #55377
This commit adds the ability to auto create data streams using index templates v2.
Index templates (v2) now have a data_steam field that includes a timestamp field,
if provided and index name matches with that template then a data stream
(plus first backing index) is auto created.
Relates to #53100
* QL: case sensitive support in EQL (#56404)
* adds a generic startsWith function to QL
* modifies the existent EQL startsWith function to be case sensitive
aware
* improves the existent EQL startsWith function to use a prefix query
when the function is used in a case sensitive context. Same improvement
is used in SQL's newly added STARTS_WITH function.
* adds case sensitivity to EQL configuration through a case_sensitive
parameter in the eql request, as established in #54411.
The case_sensitive parameter can be specified when running queries
(default is case insensitive)
(cherry picked from commit ee5a09ea840167566e34c28c8225dc38bc6a7ae8)
fix count in get and get stats if explicit ids are given and ids might be
duplicated when configuration are stored in different index (versions).
fixes#56196
The Date/Time related query params of a JDBC prepared statement
serialized using java.util.Date. The rules for serializing
`java.util.Date` objects though reside in
`XContentElasticsearchExtension` which is not available in the
jdbc jar as this class is in `server` module. Therefore, a
custom extension of the `XContentBuilderExtension` iface has been
added to the jdbc module/jar.
Moreover the sql's `qa` project had as dependency the `sql-action`
module which depends on `server` so the `XContentBuilderExtension`
was available for the integ tests hiding the real problem.
Previously, when a user was setting a `java.sql.Time` to the prepStmt,
the DataType used was `DATETIME` instead of `TIME` and therefore
prevented from filtering with a `TIME` casted field:
```
SELECT * FROM test WHERE date::TIME = ?
```
Fixes: #56084
(cherry picked from commit f8d8e971bd2c85fa4aea44b5b3ba0cdcc950a4ed)
Backport of: #56569
A data stream test, which tests data stream resolvability in xpack apis failed in release builds.
A invocation of a searchable snapshot api failed, because the corresponding feature flag
wasn't enabled for xpack rest tests.
Closes#56531
Similar to what the moving function aggregation does, except merging windows of percentiles
sketches together instead of cumulatively merging final metrics
When no timezone is specified the session timezone is used without
conversion, fix the docs test accordingly.
Follows: #56158
(cherry picked from commit 4b79b19ea5c3d17e05cb8130f3c754ac9bfd2382)
This commit refactors the following:
* GeoPointFieldMapper and PointFieldMapper to
AbstractPointGeometryFieldMapper derived from AbstractGeometryFieldMapper.
* .setupFieldType moved up to AbstractGeometryFieldMapper
* lucene indexing moved up to AbstractGeometryFieldMapper.parse
* new addStoredFields, addDocValuesFields abstract methods for implementing
stored field and doc values field indexing in the concrete field mappers
This refactor is the next phase for setting up a framework for extending
spatial field mapper functionality in x-pack.
Currently Elasticsearch creates independent event loop groups for each
transport (http and internal) transport type. This is unnecessary and
can lead to contention when different threads access shared resources
(ex: allocators). This commit moves to a model where, by default, the
event loops are shared between the transports. The previous behavior can
be attained by specifically setting the http worker count.
This commit removes the `prefer_v2_templates` flag and setting. This was a brief setting that
allowed specifying whether V1 or V2 template should be used when an index is created. It has been
removed in favor of V2 templates always having priority.
Relates to #53101Resolves#56528
This is not a breaking change because this flag was never in a released version.
We are ensuring order in the two tests changed by waiting on latches.
The problem is, that 3s is a pretty short wait and on CI can randomly be exceeded
by pure chance. If that happened we wouldn't have visibility on it since we didn't
assert that the waits actually worked.
=> Fixed by asserting that the waits work and upping the timeout to our standard 10s
Also, moved to a per-test threadpool to make it simpler to identify which test failed,
should an unexpected task run on a closed client's pool afterall.
Async search integration tests are subject to random failures when:
* The test index has more than one replica.
* The request cache is used.
* Some shards are empty.
* The maintenance service starts a garbage collection when node is closing.
They are also slow because the test index is created/populated on each
test method.
This change refactors these integration tests in order to:
* Create the index once for the entire test suite.
* Fix the usage of the request cache and replicas.
* Ensures that all shards have at least one document.
* Increase the delay of the maintenance service garbage collection.
Closes#55895Closes#55988
Change TransportBroadcastByNodeAction and TransportBroadcastReplicationAction
to be able to resolve data streams by default. Implementations can change this ability.
This change allows to following APIs to resolve data streams: flush,
refresh (already supported data streams), force merge, clear indices cache,
indices stats (already supported data streams), segments, upgrade stats,
upgrade, validate query, searchable snapshots stats, clear searchable snapshots cache and
reload analyzers APIs.
Relates to #53100
Right now all implementations of the `terms` agg allocate a new
`Aggregator` per bucket. This uses a bunch of memory. Exactly how much
isn't clear but each `Aggregator` ends up making its own objects to read
doc values which have non-trivial buffers. And it forces all of it
sub-aggregations to do the same. We allocate a new `Aggregator` per
bucket for two reasons:
1. We didn't have an appropriate data structure to track the
sub-ordinals of each parent bucket.
2. You can only make a single call to `runDeferredCollections(long...)`
per `Aggregator` which was the only way to delay collection of
sub-aggregations.
This change switches the method that builds aggregation results from
building them one at a time to building all of the results for the
entire aggregator at the same time.
It also adds a fairly simplistic data structure to track the sub-ordinals
for `long`-keyed buckets.
It uses both of those to power numeric `terms` aggregations and removes
the per-bucket allocation of their `Aggregator`. This fairly
substantially reduces memory consumption of numeric `terms` aggregations
that are not the "top level", especially when those aggregations contain
many sub-aggregations. It also is a pretty big speed up, especially when
the aggregation is under a non-selective aggregation like
the `date_histogram`.
I picked numeric `terms` aggregations because those have the simplest
implementation. At least, I could kind of fit it in my head. And I
haven't fully understood the "bytes"-based terms aggregations, but I
imagine I'll be able to make similar optimizations to them in follow up
changes.
Even with changes from #48854 we're still seeing significant (as in tens and hundreds of MB)
buffer usage for bulk exports in some cases which destabilizes master nodes.
Since we need to know the serialized length of the bulk body we can't do the serialization
in a streaming manner. (also it's not easily doable with the HTTP client API we're using anyway).
=> let's at least serialize on heap in compressed form and decompress as we're streaming to the
HTTP connection. For small requests this adds negligible overhead but for large requests this reduces
the size of the payload field by about an order of magnitude (empirically determined) which is a massive reduction in size when considering O(100MB) bulk requests.
We have been using a zero timeout in the case that DF analytics
is stopped. This may cause a timeout when we cancel, for example,
the reindex task.
This commit fixes this by using the default timeout instead.
Backport of #56423
While investigating possible optimizations to speed up searchable
snapshots shard restores, we noticed that Elasticsearch builds the
list of shard files on local disk in order to compare it with the list of
files contained in the snapshot to restore. This list of files is
materialized with a MetadataSnapshot object whose construction
involves to read the footer checksum of every files of the shard
using Store.checksumFromLuceneFile() method.
Further investigation shows that a MetadataSnapshot object is
also created for other types of operations like building the list of
files to recover in a peer recovery (and primary shard relocation)
or in order to assign a shard to a node. These operations use the
Store.getMetadata(IndexCommit) method to build the list of files
and checksums.
In the case of searchable snapshots building the MetadataSnapshot
object can potentially trigger cache misses, which in turn can
cause the download and the writing in cache of the last range of
the file in order to check the 16 bytes footer. This in turn can
cause more evictions.
Since searchable snapshots already contains the footer information
of every file in BlobStoreIndexShardSnapshot it can directly read the
checksum from it and avoid to use the cache at all to create a
MetadataSnapshot for the operations mentioned above.
This commit adds a shortcut to the
SearchableSnapshotDirectory.openInput() method - similarly to what
already exists for segment infos - so that it creates a specific
IndexInput for checksum reading operation.
It is possible that the config document for a data frame
analytics job is deleted from the config index. If that is
the case the user is unable to stop a running job because
we attempt to retrieve the config and that will throw.
This commit changes that. When the request is forced,
we do not expand the requested ids based on the existing
configs but from the list of running tasks instead.
Backport of #56360
Due to multi-threading it is possible that phase progress
updates written from the c++ process arrive reordered.
We can address this by ensuring that progress may only increase.
Closes#56282
Backport of #56339
* Add xpack setting deprecations to deprecation API
The deprecated settings showed up in the deprecation log file by
default, but I did not add them to the deprecation API. This commit
fixes that. Now if you use one of the deprecated basic feature
enablement settings, calling _monitoring/deprecations will inform you of
that fact.
* Remove incorrectly backported settings documents
It seems that I backported these docs to the wrong place in #56061,
in #55980, and in #56167. I hope they're in the right place now.
Co-authored-by: debadair <debadair@elastic.co>
Rounding dates on a shard that contains a daylight savings time transition
is currently something like 1400% slower than when a shard contains dates
only on one side of the DST transition. And it makes a ton of short lived
garbage. This replaces that implementation with one that benchmarks to
having around 30% overhead instead of the 1400%. And it doesn't generate
any garbage per search hit.
Some background:
There are two ways to round in ES:
* Round to the nearest time unit (Day/Hour/Week/Month/etc)
* Round to the nearest time *interval* (3 days/2 weeks/etc)
I'm only optimizing the first one in this change and plan to do the second
in a follow up. It turns out that rounding to the nearest unit really *is*
two problems: when the unit rounds to midnight (day/week/month/year) and
when it doesn't (hour/minute/second). Rounding to midnight is consistently
about 25% faster and rounding to individual hour or minutes.
This optimization relies on being able to *usually* figure out what the
minimum and maximum dates are on the shard. This is similar to an existing
optimization where we rewrite time zones that aren't fixed
(think America/New_York and its daylight savings time transitions) into
fixed time zones so long as there isn't a daylight savings time transition
on the shard (UTC-5 or UTC-4 for America/New_York). Once I implement
time interval rounding the time zone rewriting optimization *should* no
longer be needed.
This optimization doesn't come into play for `composite` or
`auto_date_histogram` aggs because neither have been migrated to the new
`DATE` `ValuesSourceType` which is where that range lookup happens. When
they are they will be able to pick up the optimization without much work.
I expect this to be substantial for `auto_date_histogram` but less so for
`composite` because it deals with fewer values.
Note: My 30% overhead figure comes from small numbers of daylight savings
time transitions. That overhead gets higher when there are more
transitions in logarithmic fashion. When there are two thousand years
worth of transitions my algorithm ends up being 250% slower than rounding
without a time zone, but java time is 47000% slower at that point,
allocating memory as fast as it possibly can.
This test sometimes fails when prewarming is enabled because
it's possible that some files are cached in background while the
test tries to clear the cache. This commit disables prewarming
for this test.
* Simplify equals/not-equals TRUE/FALSE expressions, by returning them
as is (TRUE variant) or negating them (FALSE variant)
(cherry picked from commit 17858afbe6da5fa0b3ecfc537cabb337e4baaffe)
Another Jackson release is available. There are some CVEs addressed,
none of which impact us, but since we can now bump Jackson easily, let
us move along with the train to avoid the false positives from security
scanners.
`FieldMapper#parseCreateField` accepts the parse context, plus a list of fields
as an output parameter. These fields are immediately added to the document
through `ParseContext#doc()`.
This commit simplifies the signature by removing the list of fields, and having
the mappers add the fields directly to `ParseContext#doc()`. I think this is
nicer for implementors, because previously fields could be added either through
the list, or the context (through `add`, `addWithKey`, etc.)
Fixes#56164. A minor update in the documentation, API key name is required when creating API key. If the API key name is not provided then the request will fail.
Async search allows users to retrieve partial results for a running search. For partial results, the number of successful shards does not include the skipped shards, while the response returned to users should.
Also, we recently had a bug where async search would miss tracking shard failures, which would have been caught if we had assertions in place that verified that whenever we get the last response, the number of failures included in it is the same as the failures that were tracked through the listener notifications.
A FilterBlobContainer class was introduced in #55952 and it delegates
its behavior to a given BlobContainer while allowing to override
only necessary methods.
This commit replaces the existing BlobContainerWrapper class from
the test framework with the new FilterBlobContainer from core.
If an exception occurs while flushing a bulk the cause of the exception
can be lost. This commit ensures that cause of the exception is carried
forward and gets logged.
* SQL: Add BigDecimal support to JDBC (#56015)
* Introduce BigDecimal support to JDBC -- fetching
This commit adds support for the getBigDecimal() methods.
* Allow BigDecimal params in double range
A prepared statement will now accept a BigDecimal parameter as a proxy
for a double, if the conversion is lossless.
(cherry picked from commit e9a873ad7f387682e3472110b1d7c0514bd347c9)
* Fix compilation error
Dimond notation with anonymous inner classes not avail in Java8.
The incomatible client version test is changed to:
- iterate on all versions prior to the allowed one_s;
- format the exception message just as the server does it.
The defect stemed from the fact that the clients will not send a
version's qualifier, but just major.minor.revision, so the raised
error/exception_message won't contain it, while the test expected it.
(cherry picked from commit 4a81c8f7a1f4573e3be95f346d9fb18772b297ee)
* [ML] lay ground work for handling >1 result indices (#55892)
This commit removes all but one reference to `getInitialResultsIndexName`.
This is to support more than one result index for a single job.
* Introduce a query builder for the rest tests
The new BaseRestSqlTestCase.RequestObjectBuilder class is a helper class
to build REST request objects for the tests. Consequently, "manual" string
concatenation to form JSON is done away with.
The class mimics SqlQueryRequestBuilder API.
(cherry picked from commit c8363f04c029542c233a758e9286d33c51d9c0c4)
this commit adds aggregation support for the geo_shape field
type on geo*_grid aggregations.
it introduces a Tiler for both tiles and hashes that enables a new type of
ValuesSource to replace the GeoPoint's CellIdSource. This makes it possible
for the existing Aggregator to be re-used, so no new implementations of
the grid aggregators are added.
Transforms should propagate up the search execution exception if one is returned when it does the test query.
this allows transforms to return a `4xx` when the aggs are malformed but parseable.
closes https://github.com/elastic/elasticsearch/issues/55994
* Relax version lock between ES/SQL and clients
Allow older-than-server clients to connect, if these are past or on a
certain min release.
(cherry picked from commit 108f907297542ce649aa7304060aaf0a504eb699)
The following settings are now no-ops:
* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled
Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.
We also update documentation to remove references to these settings.
This commit changes searchable snapshots so that it now respects the
repository's max_restore_bytes_per_sec setting when it downloads blobs.
Backport of #55952 for 7.x
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.
- The default for `model_snapshot_retention_days` for new jobs is now
10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
that defaults to 1 for new jobs and `model_snapshot_retention_days`
for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
and `model_snapshot_retention_days` all but the first model snapshot
for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
selected model snapshots to be retained indefinitely
Backport of #56125
This commit strengthens the assertion about which threads may access a blob
store to exclude the cluster applier thread, since we no longer need to do so.
Relates #50999
As of elastic/ml-cpp#1179, the analytics process reports phases
depending on the analysis type. This commit adjusts the phases
of current analyses from `analyzing` to the following:
- outlier_detection: [`computing_outlier`]
- regression/classification: [`feature_selection`, `coarse_parameter_search`, `fine_tuning_parameters`, `final_training`]
Backport of #56107
Previously, when the timezone was missing from the datetime string
and the pattern, UTC was used, instead of the session defined timezone.
Moreover, if a timezone was included in the datetime string and the
pattern then this timezone was used. To have a consistent behaviour
the resulting datetime will always be converted to the session defined
timezone, e.g.:
```
SELECT DATETIME_PARSE('2020-05-04 10:20:30.123 +02:00', 'HH:mm:ss dd/MM/uuuu VV') AS datetime;
```
with `time_zone` set to `-03:00` will result in
```
2020-05-04T05:20:40.123-03:00
```
Follows: #54960
(cherry picked from commit 8810ed03a209cc8fe1bad309a81e85b56a39da27)
Today the cache prewarming introduced in #55322 works by
enqueuing altogether the files parts to warm in the
searchable_snapshots thread pool. In order to make this fairer
among concurrent warmings, this commit starts workers that
concurrently polls file parts to warm from a queue, warms the
part and then immediately schedule another warming
execution. This should leave more room for concurrent
shard warming to sneak in and be executed.
Relates #55322
Previously, the timezone parameter was not passed to the RangeQuery
and as a results queries that use the ES date math notation (now,
now-1d, now/d, now/h, now+2h, etc.) were using the UTC timezone and
not the one passed through the "timezone"/"time_zone" JDBC/REST params.
As a consequence, the date math defined dates were always considered in
UTC and possibly led to incorrect results for queries like:
```
SELECT * FROM t WHERE date BETWEEN now-1d/d AND now/d
```
Fixes: #56049
(cherry picked from commit 300f010c0b18ed0f10a41d5e1606466ba0a3088f)
In #55763 I thought I could remove the flag that marks
reindexing was finished on a data frame analytics task.
However, that exposed a race condition. It is possible that
between updating reindexing progress to 100 because we
have called `DataFrameAnalyticsManager.startAnalytics()` and
a call to the _stats API which updates reindexing progress via the
method `DataFrameAnalyticsTask.updateReindexTaskProgress()` we
end up overwriting the 100 with a lower progress value.
This commit fixes this issue by bringing back the help of
a `isReindexingFinished` flag as it was prior to #55763.
Closes#56128
Backport of #56135
AuthN realms are ordered as a chain so that the credentials of a given
user are verified in succession. Upon the first successful verification,
the user is authenticated. Realms do however have the option to cut short
this iterative process, when the credentials don't verify and the user
cannot exist in any other realm. This mechanism is currently used by
the Reserved and the Kerberos realm.
This commit improves the early termination operation by allowing
realms to gracefully terminate authentication, as if the chain has been
tried out completely. Previously, early termination resulted in an
authentication error which varies the response body compared
to the failed authentication outcome where no realm could verify the
credentials successfully.
Reserved users are hence denied authentication in exactly the same
way as other users are when no realm can validate their credentials.
Backport of #56034.
Move includeDataStream flag from an IndicesOptions to IndexNameExpressionResolver.Context
as a dedicated field that callers to IndexNameExpressionResolver can set.
Also alter indices stats api to support data streams.
The rollover api uses this api and otherwise rolling over data stream does no longer work.
Relates to #53100
* Delay warning about missing x-pack (#54265)
Currently, when monitoring is enabled in a freshly-installed cluster,
the non-master nodes log a warning message indicating that master may
not have x-pack installed. The message is often printed even when the
master does have x-pack installed but takes some time to setup the local
exporter for monitoring. This commit adds the local exporter setting
`wait_master.timeout` which defaults to 30 seconds. The setting
configures the time that the non-master nodes should wait for master to
setup monitoring. After the time elapses, they log a message to the user
about possible missing x-pack installation on master.
The logging of this warning was moved from `resolveBulk()` to
`openBulk()` since `resolveBulk()` is called only on cluster updates and
the message might not be logged until a new cluster update occurs.
Closes#40898
If there are ill-formed pipelines, or other pipelines are not ready to be parsed, `InferenceProcessor.Factory::accept(ClusterState)` logs warnings. This can be confusing and cause log spam.
It might lead folks to think there an issue with the inference processor. Also, they would see logs for the inference processor even though they might not be using the inference processor. Leading to more confusion.
Additionally, pipelines might not be parseable in this method as some processors require the new cluster state metadata before construction (e.g. `enrich` requires cluster metadata to be set before creating the processor).
closes https://github.com/elastic/elasticsearch/issues/55985
Backport of #55858 to 7.x branch.
Currently the TransportBulkAction detects whether an index is missing and
then decides whether it should be auto created. The coordination of the
index creation also happens in the TransportBulkAction on the coordinating node.
This change adds a new transport action that the TransportBulkAction delegates to
if missing indices need to be created. The reasons for this change:
* Auto creation of data streams can't occur on the coordinating node.
Based on the index template (v2) either a regular index or a data stream should be created.
However if the coordinating node is slow in processing cluster state updates then it may be
unaware of the existence of certain index templates, which then can load to the
TransportBulkAction creating an index instead of a data stream. Therefor the coordination of
creating an index or data stream should occur on the master node. See #55377
* From a security perspective it is useful to know whether index creation originates from the
create index api or from auto creating a new index via the bulk or index api. For example
a user would be allowed to auto create an index, but not to use the create index api. The
auto create action will allow security to distinguish these two different patterns of
index creation.
This change adds the following new transport actions:
AutoCreateAction, the TransportBulkAction redirects to this action and this action will actually create the index (instead of the TransportCreateIndexAction). Later via #55377, can improve the AutoCreateAction to also determine whether an index or data stream should be created.
The create_index index privilege is also modified, so that if this permission is granted then a user is also allowed to auto create indices. This change does not yet add an auto_create index privilege. A future change can introduce this new index privilege or modify an existing index / write index privilege.
Relates to #53100
It's possible for a constant_keyword to have a 'null' value before any documents
are seen that contain a value for the field. In this case, no documents have a
value for the field, and 'exists' queries should return no documents.
Adds the step of stopping all data frame analytics before
deleting them to the cleanup of the corresponding HLRC tests.
Closes#56097
Backport of #56101
* Reject queries that act on nested fields or fields with nested field types in their hierarchy (#55721)
(cherry picked from commit 2a024461cd9da821112953d4c6e565ea622c678b)
Backports #55933 to 7.x
Implements value_count and avg aggregations over Histogram fields as discussed in #53285
- value_count returns the sum of all counts array of the histograms
- avg computes a weighted average of the values array of the histogram by multiplying each value with its associated element in the counts array
Using optimistic locking, add the ability to run a repository state
update task with a consistent view of the current repository data.
Allows for a follow-up to remove the snapshot INIT state.
* Allow Deleting Multiple Snapshots at Once (#55474)
Adds deleting multiple snapshots in one go without significantly changing the mechanics of snapshot deletes otherwise.
This change does not yet allow mixing snapshot delete and abort. Abort is still only allowed for a single snapshot delete by exact name.
* Make xpack.monitoring.enabled setting a no-op
This commit turns xpack.monitoring.enabled into a no-op. Mostly, this involved
removing the setting from the setup for integration tests. Monitoring may
introduce some complexity for test setup and teardown, so we should keep an eye
out for turbulence and failures
* Docs for making deprecated setting a no-op
This commit converts the remaining isXXXAllowed methods to instead of
use isAllowed with a Feature value. There are a couple other methods
that are static, as well as some licensed features that check the
license directly, but those will be dealt with in other followups.
* Emit deprecation warning if multiple v1 templates match with a new index (#55558)
* Emit deprecation warning if multiple v1 templates match with a new index
* DEPRECATION_LOGGER rename
This commit includes a number of minor improvements around `DelayableWriteable`: javadocs were expanded and reworded, `get` was renamed to `expand` and `DelayableWriteable` no longer implements `Supplier`. Also a couple of methods are now private instead of package private.
This commit correctly sets the maxLinesPerRow in the CsvPreference for delimited files given the file structure finder settings.
Previously, it was silently ignored.
This refactors native integ tests to assert progress without
expecting explicit phases for analyses. We can test those with
yaml tests in a single place.
Backport of #55925
* Make xpack.ilm.enabled setting a no-op
* Add watcher setting to not use ILM
* Update documentation for no-op setting
* Remove NO_ILM ml index templates
* Remove unneeded setting from test setup
* Inline variable definitions for ML templates
* Use identical parameter names in templates
* New ILM/watcher setting falls back to old setting
* Add fallback unit test for watcher/ilm setting
Fixes test by exposing the method ModelLoadingService::addModelLoadedListener()
so that the test class can be notified when a model is loaded which happens in
a background thread
implement throttling in async-indexer used by rollup and transform. The added
docs_per_second parameter is used to calculate a delay before the next
search request is send. With re-throttle its possible to change the parameter
at runtime. When stopping a running job, its ensured that despite throttling
the indexer stops in reasonable time. This change contains the groundwork, but
does not expose the new functionality.
relates #54862
backport: #55011
We were creating PemKeyConfig objects using different private
keys but always using testnode.crt certificate that uses the
RSA public key. The PemKeyConfig was built but we would
then later fail to handle SSL connections during the TLS
handshake eitherway.
This became obvious in FIPS tests where the consistency
checks that FIPS 140 mandates kick in and failed early
becausethe private key was of different type than the
public key
Anonymous roles resolution and user role deduplication are now performed during authentication instead of authorization. The change ensures:
* If anonymous access is enabled, user will be able to see the anonymous roles added in the roles field in the /_security/_authenticate response.
* Any duplication in user roles are removed and will not show in the above authenticate response.
* In any other case, the response is unchanged.
It also introduces a behaviour change: the anonymous role resolution is now authentication node specific, previously it was authorization node specific. Details can be found at #47195 (comment)
Backports #55826 to 7.x
Modified AggregatorTestCase.searchAndReduce() method so that it returns an empty aggregation result when no documents have been inserted.
Also refactored several aggregation tests so they do not re-implement method AggregatorTestCase.testCase()
Fixes#55824
On second thought, this check does not seem to be adding value.
We can test that the phases are as we expect them for each analysis
by adding yaml tests. Those would fail if we introduce new phases
from c++ accidentally or without coordination. This would achieve
the same thing. At the same time we would not have to comment out
this code each time a new phase is introduced. Instead we can just
temporarily mute those yaml tests. Note I will add those tests
right after the imminent new phases are added to the c++ side.
Backport of #55926
While it is good to not be lenient when attempting to guess the file format, it is frustrating to users when they KNOW it is CSV but there are a few ill-formatted rows in the file (via some entry error, etc.).
This commit allows for up to 10% of sample rows to be considered "bad". These rows are effectively ignored while guessing the format.
This percentage of "allows bad rows" is only applied when the user has specified delimited formatting options. As the structure finder needs some guidance on what a "bad row" actually means.
related to https://github.com/elastic/elasticsearch/issues/38890
Implements Sum aggregation over Histogram fields by summing the value of each bucket multiplied by their count as requested in #53285
Backports #55681 to 7.x
We were previously checking at least one supported field existed
when the _explain API was called. However, in the case of analyses
with required fields (e.g. regression) we were not accounting that
the dependent variable is not a feature and thus if the source index
only contains the dependent variable field there are no features to
train a model on.
This commit adds a validation that at least one feature is available
for analysis. Note that we also move that validation away from
`ExtractedFieldsDetector` and the _explain API and straight into
the _start API. The reason for doing this is to allow the user to use
the _explain API in order to understand why they would be seeing an
error like this one.
For example, the user might be using an index that has fields but
they are of unsupported types. If they start the job and get
an error that there are no features, they will wonder why that is.
Calling the _explain API will show them that all their fields are
unsupported. If the _explain API was failing instead, there would
be no way for the user to understand why all those fields are
ignored.
Closes#55593
Backport of #55876
This change adds a new setting, daily_model_snapshot_retention_after_days,
to the anomaly detection job config.
Initially this has no effect, the effect will be added in a followup PR.
This PR gets the complexities of making changes that interact with BWC
over well before feature freeze.
Backport of #55878
This replaces a reference to the result of partially reducing
aggregations that async search keeps with a reference to the serialized
form of the result of the partial reduction which we need to keep
anyway.
This has no practical impact on users since frozen indices are the only
throttled indices today. However this has an impact on upcoming features
that would use search throttling.
Filtering out throttled indices made sense a couple years ago, but as
we're now improving support for slow requests with `_async_search` and
exploring ways to reduce storage costs, this feature has most likely
become a trap, that we'd like to not have with upcoming features that
would use search throttling.
Relates #54058
Today when prewarming a searchable snapshot we use the `SparseFileTracker` to
lock each (part of a) snapshotted blob, blocking any other readers from
accessing this data until the whole part is available.
This commit changes this strategy: instead we optimistically start to download
the blob without any locking, and then lock much smaller ranges after each
individual `read()` call. This may mean that some bytes are downloaded twice,
but reduces the time that other readers may need to wait before the data they
need is available.
As a best-effort optimisation we try to request the smallest possible single
range of missing bytes in the part by first checking how many of the initial
and terminal bytes of the part are already present in cache. In particular if
the part is already fully cached before prewarming then this check means we
skip the part entirely.
Currently there is a clear mechanism to stub sending a request through
the transport. However, this is limited to testing exceptions on the
sender side. This commit reworks our transport related testing
infrastructure to allow stubbing request handling on the receiving side.
Adding to #55659, we missed another way we could set the task to
failed due to task cancellation. CI revealed that we might also
get a `SearchPhaseExecutionException` whose cause is a
`TaskCancelledException`. That exception is not wrapped so
unwrapping it will not return the underlying `TaskCancelledException`.
Thus to be complete in catching this, we also need to check the
error's cause.
Closes#55068
Backport of #55797
This is a continuation from #55580.
Now that we're parsing phase progresses from the analytics process
we change `ProgressTracker` to allow for custom phases between
the `loading_data` and `writing_results` phases. Each `DataFrameAnalysis`
may declare its own phases.
This commit sets things in place for the analytics process to start
reporting different phases per analysis type. However, this is
still preserving existing behaviour as all analyses currently
declare a single `analyzing` phase.
Backport of #55763
This change adds validation when running the users tool so that
if Elasticsearch is expected to run in a JVM that is configured to
be in FIPS 140 mode and the password hashing algorithm is not
compliant, we would throw an error.
Users tool uses the configuration from the node and this validation
would also happen upon node startup but users might be added in the
file realm before the node is started and we would have the
opportunity to notify the user of this misconfiguration.
The changes in #55544 make this much less probable to happen in 8
since the default algorithm will be compliant but this change can
act as a fallback in anycase and makes for a better user experience.
Our handling for concurrent refresh of access tokens suffered from
a race condition where:
1. Thread A has just finished with updating the existing token
document, but hasn't stored the new tokens in a new document
yet
2. Thread B attempts to refresh the same token and since the
original token document is marked as refreshed, it decrypts and
gets the new access token and refresh token and returns that to
the caller of the API.
3. The caller attempts to use the newly refreshed access token
immediately and gets an authentication error since thread A still
hasn't finished writing the document.
This commit changes the behavior so that Thread B, would first try
to do a Get request for the token document where it expects that
the access token it decrypted is stored(with exponential backoff )
and will not respond until it can verify that it reads it in the
tokens index. That ensures that we only ever return tokens in a
response if they are already valid and can be used immediately
It also adjusts TokenAuthIntegTests
to test authenticating with the tokens each thread receives,
which would fail without the fix.
Resolves: #54289
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.
Backport of #55716
This commit refactors all spatial Field Mappers to a common
AbstractGeometryFieldMapper that implements shared parameter functionality
(e.g., ignore_malformed, ignore_z_value) and provides a common framework for
overriding type parsing, and building in xpack. Common shape functionality is
implemented in a new AbstractShapeGeometryFieldMapper that is reused and
overridden in GeoShapeFieldMapper, GeoShapeFieldMapperWithDocValues,
LegacyGeoShapeFieldMapper, and ShapeFieldMapper. This abstraction provides a
reusable foundation for adding new xpack features; such as coordinate reference
system support.
This commit removes some code duplication in the CCR non-compliance
tests by refactoring an assertion method so that it can be used in both
tests that are present there.
Since #55580 we've introduced a new format for parsing progress
from the data frame analytics process. As the process is now
writing out progress in this new way, we can remove the parsing
of the old format.
Backport of #55711
If we choose to read from two random positions that are 1024 bytes apart then
this counts as a contiguous read for stats purposes, failing this test. This
commit ensures that we always perform a non-contiguous read.
`SearchableSnapshotDirectoryStatsTests#testCachedBytesReadsAndWrites` asserts
that each write takes one clock tick, but we now permit concurrent reads and
writes so each write might take longer. This commit relaxes the assertion to
match.
Closes#55707
Also unmutes the integ test that stops and restarts
an outlier detection job with the hope of learning more
of the failure in #55068.
Backport of #55545
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This change turns the AsyncSearchMaintenanceService into an
AbstractLifecycleComponent and ensures that the service is
stopped when a node is closing.
Closes#55646
improve tests related to stopping using a client that answers and can be
synchronized with the test thread in order to test special situations
relates #55011
This commit fixes reproducible test failures with the
SSLReloadDuringStartupIntegTests on the 7.x branch. The failures only
occur on 7.x due to the existence of the transport client and its usage
in our test infrastructure. This change removes the randomized usage of
transport clients when retrieving a client from a node in the internal
cluster. Transport clients do not support the reloading of files for
TLS configuration changes but if we build one from the nodes settings
and attempt to use it after the files have been changed, the client
will not know about the changes and the TLS connection will fail.
Closes#55524
License state is currently made up of boolean methods that check whether
a particular feature is allowed by the current license state. Each new
feature must copy/past boiler plate code. While that has gotten easier
with utilities like isAllowedByLicense, this is still more cumbersome
than should be necessary. This commit adds a general purpose isAllowed
method which takes a new Feature enum, where each value of the enum
defines the minimum license mode and whether the license must be active
to be allowed. Only security features are converted in this PR, in order
to keep the commit size relatively small. The rest of the features will
be converted in a followup.
This adds a validation to VSParserHelper to ensure that a field or
script or both are specified by the user. This is technically
required today already, but throws an exception much deeper
in the agg framework and has a very unintuitive error for the user
(as well as eating more resources instead of failing early)
The (de)serialization code of the async search response
cannot handle exceptions that extend ElasticsearchException (e.g. ScriptException).
This commit fixes this bug by serializing the error with the more generic
StreamInput#writeException.
EQL will require very similar functionality to async search. This PR refactors
AsyncSearchIndexService to make it reusable for EQL.
Supersedes #55119
Relates to #49638
This commit refactors geo_shape doc values, fielddata, and utility classes from
the single mapper package in x-pack spatial plugin to a package structure that
is consistent with the server module.
Previously audit messages were indexed when datafeeds that were
assigned to a node were stopped, but not datafeeds that were
unassigned at the time they were stopped.
This change adds auditing for the unassigned case.
Backport of #55656
While we were catching `TaskCancelledException` while we wait for
reindexing to complete, we missed the fact that this exception
may be wrapped in a multi-node cluster. This is the reason
we may still fail the task when stop is called while reindexing.
Some times we're lucky and the exception is thrown by the same
node that runs the job. Then the exception is not wrapped and
things work fine. But when that is not the case the exception is
wrapped, we fail to catch it, and set the task to failed.
The fix is to simply unwrap the exception when we check it it
is `TaskCancelledException`.
Closes#55068
Backport of #55659
The CacheFile.fileLock() method is used to acquire a lock
on a cache file so that the file can't be deleted (or its file
handle closed) during the execution of a read or a write
operation.
Today this lock is obtained by first acquiring the eviction
lock (the write lock of the readwrite lock), then by checking
if the cache file is evicted and the file channel still open,
and finally by obtaining the file lock (the read lock of the
readwrite lock). Acquiring the read lock while the eviction
lock is held ensures that the cache file eviction cannot
start in the meanwhile. But eviction starts (and terminations)
also acquire the eviction lock; and this lock cannot be
obtained while a read lock is held (the write lock of a
readwrite lock is exclusive).
If we were acquiring a read lock and checking the eviction
flag and file channel existence while holding the read lock
we know that no eviction can start or finish until the
read lock is released.
Backport of #55115.
Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...),
with an appropriate key, so that all messages can potentially be deduplicated.
The Kibana CSV export feature uses a non-standard timestamp format.
This change adds it to the formats the find_file_structure endpoint
recognizes out-of-the-box, to make round-tripping data from Kibana
back to Kibana via CSV files easier.
Fixes#55586
A JSON schema was recently introduced for the REST API specification. #54252
This PR introduces a 3rd party validation tool to ensure that the
REST specification conforms to the schema.
The task is applied to the 3 projects that contain REST API specifications.
The plugin wires this task into the precommit commit task, and should be
considered as part of the public API for the build tools for any plugin
developer to contribute their plugin's specification.
An ignore parameter has been introduced for the task to allow specific
file to be ignored from the validation. The ignored files in this PR
will soon get issues logged and a link so they can be fixed.
Closes#54314
Out of the box "access granted" audit events are not logged
for system users. The list of system users was stale and included
only the _system and _xpack users. This commit expands this list
with _xpack_security and _async_search, effectively reducing the
auditing noise by not logging the audit events of these system
users out of the box.
Closes#37924
The testPerformAction test has been failing periodically due to
how Hamcrest's containsStringIgnoringCase does not lowercase using
the same Locale set in the test infrastructure.
This commit falls back to explicitly lowercasing using the root
locale
This commit adds a new GeoShapeBoundsAggregator to the spatial plugin and registers it with the GeoShapeValuesSourceType. This enables geo_bounds aggregations on geo_shape fields
After #53562, the `geo_shape` field mapper is registered within
a module. This opens the door for introducing a new `geo_shape`
field mapper into the Spatial Plugin that has doc-values support.
This is very much an extension of server's GeoShapeFieldMapper,
but with the addition of the doc values implementation.
Data frame analytics process currently reports progress as
an integer `progress_percent`. We parse that and report it
from the _stats API as the progress of the `analyzing` phase.
However, we want to allow the DFA process to report progress
for more than one phase. This commit prepares for this by
parsing `phase_progress` from the process, an object that
contains the `phase` name plus the `progress_percent` for that
phase.
Backport of #55580
Instead of doing our own checks against REST status, shard counts, and shard failures, this commit changes all our extractor search requests to set `.setAllowPartialSearchResults(false)`.
- Scrolls are automatically cleared when a search failure occurs with `.setAllowPartialSearchResults(false)` set.
- Code error handling is simplified
closes https://github.com/elastic/elasticsearch/issues/40793
Issue #55521 suggested that audit messages were not written when
closing an unassigned job. This is not the case, but we didn't
have a test to prove it.
Backport of #55571
The ML info endpoint returns the max_model_memory_limit setting
if one is configured. However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.
This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.
The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.
Backport of #55529
Adds a "node" field to the response from the following endpoints:
1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job
If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.
In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string. Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.
Backport of #55473
PEMUtils would incorrectly fill the encryption password with zeros
(the '\0' character) after decrypting a PKCS#8 key.
Since PEMUtils did not take ownership of this password it should not
zero it out because it does not know whether the caller will use that
password array again. This is actually what PEMKeyConfig does - it
uses the key encryption password as the password for the ephemeral
keystore that it creates in order to build a KeyManager.
Backport of: #55457
In the test after the first load event is is not known which models are cached as
loading a later one will evict an earlier one and the order is not known.
The models could have been loaded 1 or 2 times not exactly twice
This change folds the removal of the in-progress snapshot entry
into setting the safe repository generation. Outside of removing
an unnecessary cluster state update, this also has the advantage
of removing a somewhat inconsistent cluster state where the safe
repository generation points at `RepositoryData` that contains a
finished snapshot while it is still in-progress in the cluster
state, making it easier to reason about the state machine of
upcoming concurrent snapshot operations.
Today a read-only engine requires a complete history of operations, in the
sense that its local checkpoint must equal its maximum sequence number. This is
a valid check for read-only engines that were obtained by closing an index
since closing an index waits for all in-flight operations to complete. However
a snapshot may not have this property if it was taken while indexing was
ongoing, but that's ok.
This commit weakens the check for a complete history to exclude the case of a
searchable snapshot.
Relates #50999
This change ensures that we return the latest expiration time
when retrieving the response from the index.
This commit also fixes a bug that stops the garbage collection of saved responses if the async search index is deleted.
If more than 100 shard-follow tasks are trying to connect to the remote
cluster, then some of them will abort with "connect listener queue is
full". This is because we retry on ESRejectedExecutionException, but not
on RejectedExecutionException.
`updateAndGet` could actually call the internal method more than once on contention.
If I read the JavaDocs, it says:
```* @param updateFunction a side-effect-free function```
So, it could be getting multiple updates on contention, thus having a race condition where stats are double counted.
To fix, I am going to use a `ReadWriteLock`. The `LongAdder` objects allows fast thread safe writes in high contention environments. These can be protected by the `ReadWriteLock::readLock`.
When stats are persisted, I need to call reset on all these adders. This is NOT thread safe if additions are taking place concurrently. So, I am going to protect with `ReadWriteLock::writeLock`.
This should prevent race conditions while allowing high (ish) throughput in the highly contention paths in inference.
I did some simple throughput tests and this change is not significantly slower and is simpler to grok (IMO).
closes https://github.com/elastic/elasticsearch/issues/54786
This paves the data layer way so that exceptionally large models are partitioned across multiple documents.
This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.
I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap.
With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents.
Supporting models 20 times this size (compressed) seems adequate for now.
* [ML] fix native ML test log spam (#55459)
This adds a dependency to ingest common. This removes the log spam resulting from basic plugins being enabled that require the common ingest processors.
* removing unnecessary changes
* removing unused imports
* removing unnecessary java setting
This commit adds a new querystring parameter on the following APIs:
- Index
- Update
- Bulk
- Create Index
- Rollover
These APIs now support a `?prefer_v2_templates=true|false` flag. This flag changes the preference
creation to use either V2 index templates or V1 templates. This flag defaults to `false` and will be
changed to `true` for 8.0+ in subsequent work.
Additionally, setting this flag internally sets the `index.prefer_v2_templates` index-level setting.
This setting is used so that actions that automatically create a new index (things like rollover
initiated by ILM) will inherit the preference from the original index. This setting is dynamic so
that a transition from v1 to v2 templates can occur for long-running indices grouped by an alias
performing periodic rollover.
This also adds support for sending this parameter to the High Level Rest Client.
Relates to #53101
We don't really need `LinkedHashSet` here. We can assume that all the
entries are unique and just use a list and use the list utilities to
create the cheapest possible version of the list.
Also, this fixes a bug in `addSnapshot` which would mutate the existing
linked hash set on the current instance (fortunately this never caused a real world bug)
and brings the collection in line with the java docs on its getter that claim immutability.
Removing the deprecated "xpack.monitoring.enabled" setting introduced
log spam and potentially some failures in ML tests. It's possible to use
a different, non-deprecated setting to disable monitoring, so we do that
here.
Adds ranged read support for GCS repositories in order to enable searchable snapshot support
for GCS.
As part of this PR, I've extracted some of the test infrastructure to make sure that
GoogleCloudStorageBlobContainerRetriesTests and S3BlobContainerRetriesTests are covering
similar test (as I saw those diverging in what they cover)
This validation is not needed, as we have discovered the source of the
serialization error that was leading to some usage instances appearing
to not have a name.
This commit upgrades the ASM dependency used in the feature aware check
to 7.3.1. This gives support for JDK 14. Additionally, now that Gradle
understands JDK 13, it means we can remove a restriction on running the
feature aware check to JDK 12 and lower.
This change reworks the loading and monitoring of files that are used
for the construction of SSLContexts so that updates to these files are
not lost if the updates occur during startup. Previously, the
SSLService would parse the settings, build the SSLConfiguration
objects, and construct the SSLContexts prior to the
SSLConfigurationReloader starting to monitor these files for changes.
This allowed for a small window where updates to these files may never
be observed until the node restarted.
To remove the potential miss of a change to these files, the code now
parses the settings and builds SSLConfiguration instances prior to the
construction of the SSLService. The files back the SSLConfiguration
instances are then registered for monitoring and finally the SSLService
is constructed from the previously parse SSLConfiguration instances. As
the SSLService is not constructed when the code starts monitoring the
files for changes, a CompleteableFuture is used to obtain a reference
to the SSLService; this allows for construction of the SSLService to
complete and ensures that we do not miss any file updates during the
construction of the SSLService.
While working on this change, the SSLConfigurationReloader was also
refactored to reflect how it is currently used. When the
SSLConfigurationReloader was originally written the files that it
monitored could change during runtime. This is no longer the case as
we stopped the monitoring of files that back dynamic SSLContext
instances. In order to support the ability for items to change during
runtime, the class made use of concurrent data structures. The use of
these concurrent datastructures has been removed.
Closes#54867
Backport of #54999
Some aggregations, such as the Terms* family, will use an alternate
class to represent unmapped shard results (while the rest of the aggs
use the same object but with some form of "empty" or "nullish" values
to represent unmapped).
This was problematic with AbstractWireSerializingTestCase because it
expects the instanceReader to always match the original class. Instead,
we need to use the NamedWriteable version so that the registry
can be consulted for the proper deserialization reader.
Security features in the license state currently do a dynamic check on
whether security is enabled. This is because the license level can
change the default security enabled state. This commit splits out the
check on security being enabled, so that the combo method of security
enabled plus license allowed is no longer necessary.
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.
This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
* [ML] fix bugs with prediction field value settings (#55333)
This fixes two unreleased bugs:
1. Prediction value type of `number` might show unexpected classes
Analytics created models may have class labels like `1, 5, 10` (or some collection of discrete, whole numbers). These labels are passed to the inference model config in the `classification_labels` field.
When the predicted value format is `numeric` it should attempt to see if the classification labels are provided and are numeric. If so, use those. If not, use the underlying value.
2. When supplying an update overwrite, inference was losing the default prediction field value. This is because it was not copied over in the copy ctor in the ClassificationConfig.Builder class.
closes#55332
This fixes the long muted testHRDSplit. Some minor adjustments for modern day elasticsearch changes :).
The cause of the failure is that a new `by` field entering the model with an exceptionally high count does not cause an anomaly. We have since stopped combining the `rare` and `by` in this manner. New entries in a `by` field are not anomalous because we have no history on them yet.
closes https://github.com/elastic/elasticsearch/issues/32966
When a anomaly jobs, datafeeds, and analytics tasks are stopped, they enter an ephemeral state called `STOPPING`.
If the node executing the task fails while this is occurring, they could be stuck in the limbo state of `STOPPING`. It is best to mark the tasks as completed if they get reassigned to a node.
Implement the use of scalar functions inside aggregate functions.
This allows for complex expressions inside aggregations, with or without
GROUBY as well as with or without a HAVING clause. e.g.:
```
SELECT MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) AS max, b
FROM test
GROUP BY b
HAVING MAX(CASE WHEN a IS NULL then -1 ELSE abs(a * 10) + 1 END) > 5
```
Scalar functions are still not allowed for `KURTOSIS` and `SKEWNESS` as
this is currently not implemented on the ElasticSearch side.
Fixes: #29980Fixes: #36865Fixes: #37271
(cherry picked from commit 506d1beea7abb2b45de793bba2e349090a78f2f9)
Backport from: #54726
The INCLUDE_DATA_STREAMS indices option controls whether data streams can be resolved in an api for both concrete names and wildcard expressions. If data streams cannot be resolved then a 400 error is returned indicating that data streams cannot be used.
In this pr, the INCLUDE_DATA_STREAMS indices option is enabled in the following APIs: search, msearch, refresh, index (op_type create only) and bulk (index requests with op type create only). In a subsequent later change, we will determine which other APIs need to be able to resolve data streams and enable the INCLUDE_DATA_STREAMS indices option for these APIs.
Whether an api resolve all backing indices of a data stream or the latest index of a data stream (write index) depends on the IndexNameExpressionResolver.Context.isResolveToWriteIndex().
If isResolveToWriteIndex() returns true then data streams resolve to the latest index (for example: index api) and otherwise a data stream resolves to all backing indices of a data stream (for example: search api).
Relates to #53100
We do not validate the name is not null, and not empty. Even though it
never should be, we had a build failure where it appears that somehow
this did happen. We add some validation here, in case this really is
happening, we will have a more clear indication where this is coming
from, and of course, validation that name fits the implicit assumptions
that it is not null and not empty.
* Add ValuesSource Registry and associated logic (#54281)
* Remove ValuesSourceType argument to ValuesSourceAggregationBuilder (#48638)
* ValuesSourceRegistry Prototype (#48758)
* Remove generics from ValuesSource related classes (#49606)
* fix percentile aggregation tests (#50712)
* Basic thread safety for ValuesSourceRegistry (#50340)
* Remove target value type from ValuesSourceAggregationBuilder (#49943)
* Cleanup default values source type (#50992)
* CoreValuesSourceType no longer implements Writable (#51276)
* Remove genereics & hard coded ValuesSource references from Matrix Stats (#51131)
* Put values source types on fields (#51503)
* Remove VST Any (#51539)
* Rewire terms agg to use new VS registry (#51182)
Also adds some basic AggTestCases for untested code
paths (and boilerplate for future tests once the IT are
converted over)
* Wire Cardinality aggregation to work with the ValuesSourceRegistry (#51337)
* Wire Percentiles aggregator into new VS framework (#51639)
This required a bit of a refactor to percentiles itself. Before,
the Builder would switch on the chosen algo to generate an
algo-specific factory. This doesn't work (or at least, would be
difficult) in the new VS framework.
This refactor consolidates both factories together and introduces
a PercentilesConfig object to act as a standardized way to pass
algo-specific parameters through the factory. This object
is then used when deciding which kind of aggregator to create
Note: CoreValuesSourceType.HISTOGRAM still lives in core, and will
be moved in a subsequent PR.
* Remove generics and target value type from MultiVSAB (#51647)
* fix checkstyle after merge (#52008)
* Plumb ValuesSourceRegistry through to QuerySearchContext (#51710)
* Convert RareTerms to new VS registry (#52166)
* Wire up Value Count (#52225)
* Wire up Max & Min aggregations (#52219)
* ValuesSource refactoring: Wire up Sum aggregation (#52571)
* ValuesSource refactoring: Wire up SigTerms aggregation (#52590)
* Soft immutability for VSConfig (#52729)
* Unmute testSupportedFieldTypes, fix Percentiles/Ranks/Terms tests (#52734)
Also fixes Percentiles which was incorrectly specified to only accept
numeric, but in fact also accepts Boolean and Date (because those are
numeric on master - thanks `testSupportedFieldTypes` for catching it!)
* VS refactoring: Wire up stats aggregation (#52891)
* ValuesSource refactoring: Wire up string_stats aggregation (#52875)
* VS refactoring: Wire up median (MAD) aggregation (#52945)
* fix valuesourcetype issue with constant_keyword field (#53041)x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/job/RollupIndexer.java
this commit implements `getValuesSourceType` for
the ConstantKeyword field type.
master was merged into feature/extensible-values-source
introducing a new field type that was not implementing
`getValuesSourceType`.
* ValuesSource refactoring: Wire up Avg aggregation (#52752)
* Wire PercentileRanks aggregator into new VS framework (#51693)
* Add a VSConfig resolver for aggregations not using the registry (#53038)
* Vs refactor wire up ranges and date ranges (#52918)
* Wire up geo_bounds aggregation to ValuesSourceRegistry (#53034)
This commit updates the geo_bounds aggregation to depend
on registering itself in the ValuesSourceRegistry
relates #42949.
* VS refactoring: convert Boxplot to new registry (#53132)
* Wire-up geotile_grid and geohash_grid to ValuesSourceRegistry (#53037)
This commit updates the geo*_grid aggregations to depend
on registering itself in the ValuesSourceRegistry
relates to the values-source refactoring meta issue #42949.
* Wire-up geo_centroid agg to ValuesSourceRegistry (#53040)
This commit updates the geo_centroid aggregation to depend
on registering itself in the ValuesSourceRegistry.
relates to the values-source refactoring meta issue #42949.
* Fix type tests for Missing aggregation (#53501)
* ValuesSource Refactor: move histo VSType into XPack module (#53298)
- Introduces a new API (`getBareAggregatorRegistrar()`) which allows plugins to register aggregations against existing agg definitions defined in Core.
- This moves the histogram VSType over to XPack where it belongs. `getHistogramValues()` still remains as a Core concept
- Moves the histo-specific bits over to xpack (e.g. the actual aggregator logic). This requires extra boilerplate since we need to create a new "Analytics" Percentile/Rank aggregators to deal with the histo field. Doubly-so since percentiles/ranks are extra boiler-plate'y... should be much lighter for other aggs
* Wire up DateHistogram to the ValuesSourceRegistry (#53484)
* Vs refactor parser cleanup (#53198)
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
* First batch of easy fixes
* Remove List.of from ValuesSourceRegistry
Note that we intend to have a follow up PR dealing with the mutability
of the registry, so I didn't even try to address that here.
* More compiler fixes
* More compiler fixes
* More compiler fixes
* Precommit is happy and so am I
* Add new Core VSTs to tests
* Disabled supported type test on SigTerms until we can backport it's fix
* fix checkstyle
* Fix test failure from semantic merge issue
* Fix some metaData->metadata replacements that got lost
* Fix list of supported types for MinAggregator
* Fix list of supported types for Avg
* remove unused import
Co-authored-by: Zachary Tong <polyfractal@elastic.co>
Co-authored-by: Zachary Tong <zach@elastic.co>
Co-authored-by: Christos Soulios <1561376+csoulios@users.noreply.github.com>
Co-authored-by: Tal Levy <JubBoy333@gmail.com>
Fix MINIMUM_SCALE, MAXIMUM_SCALE and SQL_DATETIME_SUB
ODBC metadata for the DATE & TIME data types.
Fixes: #41086
(cherry picked from commit c23677cd2955e25bb952c8e7ff8ca3151ee0df98)
We have some Dockerfiles that reference Ubuntu 19.04, which is not an LTS
version and has now appears to have been retired from the Ubuntu repositories.
Switch to 18.04, which is the current long-term support version. Also change a
usage of 16.04 to 18.04, for consistency.
This change adds the spec for the new REST APIs that we
introduce for the IDP and documentation for each of the APIs. The
documentation pages are intentionally not included in the API
reference so as to minimize unnecessary exposure.
supersedes: #53858
When retrieving the snapshots for a set of repos or deleting a single snapshot, it's possible for
the body of the `ActionListener`'s `onResponse` method to throw an Exception. In this case, the
`errHandler` passed in may not be executed, resulting in the `running` boolean not being reset back
to false.
This commit uses `ActionListener.wrap(...)` instead of creating a new ActionListener, which ensures
that if the `onResponse` fails in any way, the `onFailure` handler is still called.
Resolves#55217
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.
After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
Added an integration test to validate behaviour of string scalars on top
of aggregate functions. The behaviour was fixed with #49570.
Relates to: #41597
(cherry picked from commit 35f964154850e3f02b6c7f9ca238da98ad83ebb3)
Simplify the code by removing the generic type from InferenceConfigUpdate which
meant wildcard types were used in many places. Instead check the class type is
appropriate where used.
This commit fixes our behavior regarding the responses we
return in various cases for the use of token related APIs.
More concretely:
- In the Get Token API with the `refresh` grant, when an invalid
(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with `400` HTTP status code
and an `error_description` header with the message "could not
refresh the requested token".
Previously we would return erroneously return a `401` with "token
malformed" message.
- In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with `404` and a body that shows that no tokens were invalidated:
```
{
"invalidated_tokens":0,
"previously_invalidated_tokens":0,
"error_count":0
}
```
The previous behavior would be to erroneously return
a `400` or `401` ( depending on the case ).
- In the Invalidate Token API, when the tokens index doesn't
exist or is closed, we return `400` because we assume this is
a user issue either because they tried to invalidate a token
when there is no tokens index yet ( i.e. no tokens have
been created yet or the tokens index has been deleted ) or the
index is closed.
- In the Invalidate Token API, when the tokens index is
unavailable, we return a `503` status code because
we want to signal to the caller of the API that the token they
tried to invalidate was not invalidated and we can't be sure
if it is still valid or not, and that they should try the request
again.
Resolves: #53323
When a datafeed transitions from lookback to real-time we request
that state is persisted from the autodetect process in the
background.
This PR adds a test to prove that for a categorization job the
state that is persisted includes the categorization state.
Without the fix from elastic/ml-cpp#1137 this test fails. After
that C++ fix is merged this test should pass.
Backport of #55243
After #54650 we catch `TaskCancelledException` when we wait for
reindexing to complete as it may be thrown. However, when that happens
we do not mark the task as completed. This results in the stop request
never returning and the failures we saw in #55068.
Closes#55068
Backport of #55286
Following elastic/ml-cpp#1135 there are now Linux binaries
for both x86_64 and aarch64. The code that finds the
correct binaries to ship with each distribution was
including both on every Linux distribution. This change
alters that logic to consider the architecture as well
as the operating system.
Also, there is no need to disable ML on aarch64 now that
we have the native binaries available. ML is still not
supported on aarch64, but the processes at least run up
and work at a superficial level.
Backport of #55256
The ResourceWatcherService enables watching of files for modifications
and deletions. During startup various consumers register the files that
should be watched by this service. There is behavior that might be
unexpected in that the service may not start polling until later in the
startup process due to the use of lifecycle states to control when the
service actually starts the jobs to monitor resources. This change
removes this unexpected behavior so that upon construction the service
has already registered its tasks to poll resources for changes. In
making this modification, the service no longer extends
AbstractLifecycleComponent and instead implements the Closeable
interface so that the polling jobs can be terminated when the service
is no longer required.
Relates #54867
Backport of #54993
Today we indiscriminately serialize these independent of the version on
the stream, even though the other side might not understand a new
feature set usage that we have added. For example, if we add feature set
usage in 7.7 for EQL, in a mixed cluster context if a request is sent to
an old coordinating node, but the master is a new version, then it would
attempt to serialize the usage information for the new feature back to
the old coordinating node, who will blow up on the unrecognized named
writeable. This commit addresses this by making feature usage version
aware, and only serializing those that the other side would understand.
I've noticed that a lot of our tests are using deprecated static methods
from the Hamcrest matchers. While this is not a big deal in any
objective sense, it seems like a small good thing to reduce compilation
warnings and be ready for a new release of the matcher library if we
need to upgrade. I've also switched a few other methods in tests that
have drop-in replacements.
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
Added testing of following on top of a closed index.
This could for instance be the old leader index in
cases where leader and follower clusters have been
swapped.
Upgrade to lucene 8.5.1 release that contains a bug fix for a bug that might introduce index corruption when deleting data from an index that was previously shrunk.
* [ML] adding prediction_field_type to inference config (#55128)
Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch.
Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics.
Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise).
Analytics provides the default `prediction_field_type` when the model is created from the process.
We can be a little more efficient when aborting a snapshot. Since we know the new repository
data after finalizing the aborted snapshot when can pass it down to the snapshot completion listeners.
This way, we don't have to fork off to the snapshot threadpool to get the repository data when the listener completes and can directly submit the delete task with high priority straight from the cluster state thread.
Sets the default cache size for searchable snapshots to unlimited, which, for testing purposes,
is a better default than the 1GB that we currently have.
We implicitly only supported the prime256v1 ( aka secp256r1 )
curve for the EC keys we read as PEM files to be used in any
SSL Context. We would not fail when trying to read a key
pair using a different curve but we would silently assume
that it was using `secp256r1` which would lead to strange
TLS handshake issues if the curve was actually another one.
This commit fixes that behavior in that it
supports parsing EC keys that use any of the named curves
defined in rfc5915 and rfc5480 making no assumptions about
whether the security provider in use supports them (JDK8 and
higher support all the curves defined in rfc5480).
Prior to the change in #51631 indices were moved to the `TerminalPolicyStep` when their ILM actions
had completed. Once we switched ILM to stop in the last policy configured, these steps because
inaccessible from the policy's perspective. This meant that indices upgraded from ES prior to 7.7.0
could see the following error spammed in their logs every 10 minutes (by default) for every index in
this state:
```
[2020-04-14T15:52:23,764][ERROR][o.e.x.i.IndexLifecycleRunner] [midgar] current step [{"phase":"completed","action":"completed","name":"completed"}] for index [foo] with policy [full] is not recognized
```
This changes the runner to ignore these steps, which is what is desired anyway since the index is
already in the terminal phase.
This change ensures that internal client requests spawned by the
transform persistent task executor and that use the end user security
credentials, have the parent task id assigned. The objective here is
to permit auditing (as well as tracking for debugging purposes) of all
the end-user requests executed on its behalf by persistent tasks.
Because transform tasks already implements graceful shutdown of the
child tasks, this change does not interfere with that by opting out of
the persistent task cancellation of child tasks.
Relates #55046#54943#52314Closes#54957
This commit refactors the `AuditTrail` to use the `TransportRequest` as a parameter
for all its audit methods, instead of the current `TransportMessage` super class.
The goal is to gain access to the `TransportRequest#parentTaskId` member,
so that it can be audited. The `parentTaskId` is used internally when spawning tasks
that handle transport requests; in this way tasks across nodes are related by the
same parent task.
Relates #52314
Provides basic repository-level stats that will allow us to get some insight into how many
requests are actually being made by the underlying SDK. Currently only tracks GET and LIST
calls for S3 repositories. Most of the code is unfortunately boiler plate to add a new endpoint
that will help us better understand some of the low-level dynamics of searchable snapshots.