Examines the reindex response in order to report potential
problems that occurred during the reindexing phase of
data frame analytics jobs.
Backport of #60911
If the search for get stats with multiple job Ids fails the listener is called for each failure.
This change waits for all responses then returns the first error if there was one.
This commit removes the ability to test the top level result of an aggregator
before it runs the final reduce. All aggregator tests that use AggregatorTestCase#search
are rewritten with AggregatorTestCase#searchAndReduce in order to ensure that we test
the final output (the one sent to the end user) rather than an intermediary result
that could be different.
This change also removes spurious commits triggered on top of a random index writer.
These commits slow down the tests and are redundant with the commits that the
random index writer performs.
Split the autoscaling decider into a service and configuration
in order to enable having additional context information available
in the service. Added AutoscalingDeciderContext holding generic
information all deciders are expected to need. Implemented GET
_autoscaling/decision
This adds a force-merge step to the searchable snapshot action, enabled by default,
but parameterizable using the `force_merge-index" optional boolean.
eg.
```
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"cold": {
"actions": {
"searchable_snapshot" : {
"snapshot_repository" : "backing_repo",
"force_merge_index": true
}
}
}
}
}
}
```
(cherry picked from commit d0a17b2d35f1b083b574246bdbf3e1929471a4a9)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
"Transitive" is technically ok here but it's an overloaded word and it's
not immediately clear which meaning is intended so this log message
always makes me do a double-take. I think both "transient" and
"transitory" are clearer, with "transient" being the usual choice.
This suite is still occasionally failing with a timeout on macOS.
Suggest further increasing this timeout until this suite is broken up.
Relates #58071
* [ML] have DELETE analytics ignore stats failures and clean up unused stats (#60776)
When deleting an analytics configuration, the request MIGHT fail if
the .ml-stats index does not exist or is in strange state (shards unallocated).
Instead of making the request fail, we should log that we were unable to delete the stats docs and then
have them cleaned up in the 'delete_expire_data' janitorial process
remove test, scripts are excluded in the change collector, the test is a leftover from a previous
solution of #57332, which has been discarded
relates #60724fixes#60794
When an exception is thrown during test inference we are
not including the cause message in our logging. This commit
addresses this issue.
Backport of #60749
This pull request adds recovery state tracking for Searchable Snapshots.
In order to track recoveries for searchable snapshot backed indices, this pull
request adds a new type of RecoveryState.
This newRecoveryState instance is able to deal with the
small differences that arise during Searchable snapshots recoveries.
Those differences can be summarized as follows:
- The Directory implementation that's provided by SearchableSnapshots mark the
snapshot files as reused during recovery. In order to keep track of the
recovery process as the cache is pre-warmed, those files shouldn't be marked
as reused.
- Once the shard is created, the cache starts its pre-warming phase, meaning that
we should keep track of those downloads during that process and tie the recovery
to this pre-warming phase. The shard is considered recovered once this pre-warming
phase has finished.
Backport of #60505
disable optimizations when using scripts in group_by, when scripts using scripts we can not predict
the outcome and we have no query counterpart. Other optimizations for other group_by's are not
affected.
fixes#57332
implements a test suite for testing continuous transform with randomization in terms of mappings,
index settings, transform configuration. Add a test case for terms and date histogram. The test
covers:
- continuous mode with several checkpoints created
- correctness of results
- optimizations (minimal necessary writes)
- permutations of features (index settings, aggs, data types, index or data stream)
When the RBACEngine authorizes scroll searches it sets the index access control
to the very limiting IndicesAccessControl.ALLOW_NO_INDICES value.
This change will set it to the value for the index access control that was produced
during the authorization of the initial search that created the scroll,
which is now stored in the scroll context.
Implements license degradation behavior for searchable snapshots. Snapshot-backed shards are failed when the license becomes invalid, and shards won't be reallocated. After valid license is put in place again, shards are allocated again.
This commit removes the body property from the
indices.create_data_stream.json REST API spec
as the API does not support sending a body.
Update the description of the API to remove
that a data stream can be updated with the
API - data streams can only be created with
this API and attempting to update yields a
`resource_already_exists_exception`.
Closes#60704
(cherry picked from commit 2cab2e0ee094769852df31566dbe22b5df59d900)
Currently, validation of mappers (checking that cross-references are correct, limits on
field name lengths and object depths, multiple definitions, etc) is performed by the
MapperService. This means that any mapper-specific validation, for example that done
on the CompletionFieldMapper, needs to be called specifically from core server code,
and so we can't add validation to mappers that live in plugins.
This commit reworks the validation framework so that mapper-specific validation is
done on the Mapper itself. Mapper gets a new `validate(MappingLookup)`
method (already present on `MetadataFieldMapper` and now pulled up to the parent
interface), which is called from a new `DocumentMapper.validate()` method. All
the validation code currently living on `MapperService` moves either to individual
mapper implementations (FieldAliasMapper, CompletionFieldMapper) or into
`MappingLookup`, an altered `DocumentFieldMappers` which now knows about
object fields and can check for duplicate definitions, or into DocumentMapper
which handles soft limit checks.
* Merge test runner task into RestIntegTest (#60261)
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
* Fix merge issues
* use former 7.x common test configuration
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
This commit changes TokenAuthIntegTests so all occurrences of
assertThat(x.size(), equalTo(0));
become
assertThat(x, empty());
This means that the assertion failure message will include the
contents of the list (`x`) instead of just its size, which
facilitates easier failure diagnosis.
Relates: #56903
Backport of: #60496
This commit does three things:
* Removes all Copyright/license headers for the build.gradle files under x-pack. (implicit Apache license)
* Removes evaluationDependsOn(xpackModule('core')) from build.gradle files under x-pack
* Removes a place holder test in favor of disabling the test task (in the async plugin)
Closing a regular index and mounting a snapshot-backed index into that existing index does not clean the existing index
folders of those preexisting shards.
This PR removes the existing Lucene / translog files once the searchable snapshot shard is starting up. Future PRs will
make reuse of the existing index files to populate the cache.
When a new cluster starts, the HTTP layer becomes ready to accept incoming
requests while the basic license is still being populated in the background.
When a get license request comes in before the license is ready, it can get
404 error. This PR fixes it by either wrap the license check in assertBusy or
ensure the license is ready before perform the check.
This is a backport for both #60498 and #60573
For consistency reasons (and reducing the overload of IllegalArgumentException)
this changes the exception thrown when trying to create a data stream
that already exists.
(cherry picked from commit ac2184c4614bba0f3ee377da49aea0daed98bab4)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
* SQL: Add option to provide the delimiter for the CSV format (#59907)
* Add option to provide the delimiter to the CSV fmt
This adds the option to provide the desired character as the separator
for the CSV format (the default remains comma).
A set of characters are excluded though - like CR, LF, `"` - to avoid
slipping onto the CSV-dialects slope. The tab is also forbidden, the
user needs to choose the "tsv" format explicitely.
Update the doc to make it clear that the textual CSV, TSV and TXT
formats pass the cursor back to the user through the Cursor HTTP header.
(cherry picked from commit 3a8b00cc7480f7ada57fcea3cbac957facac08fc)
* Java8 fixes
- replace Set#of();
- URLDecoder#decode() requires a string (vs a charset) as 2nd arg.
* Fix SYS COLUMNS schema in ODBC mode (#59513)
* Fix SYS COLUMNS schema in ODBC mode
This fixes a regression when certain ODBC-specific columns that need to
be of the short type were returned as the integer type.
This also fixes the stubbing for the *-indices SYS COLUMN commands.
(cherry picked from commit 96d89dc9b1fd731e736ef804a16bd05496c1dea6)
* Java8 fix: avoid diamond notation in test.
Qualify anonymous class in test.
* fix npe on ambiguous group by
* add tests for aggregates and group by, add quotes to error message
* add more cases for Group By ambiguity test
* change error messages for field ambiguity
* change collection aliases approach
* add locations of attributes for ambiguous grouping error
* Adress review comments
- remove Comparable implementations from Attribute and Location;
- add ad-hoc comparator for sorting locations in ambiguity message;
- remove added AttributeAlias class with Touple;
- add code comment to explain issue with Location overwriting.
* Fix c&p error in location ref generation comparator
Fix copy&paste error in dedicated comparator used for sorting ambiguity
location references.
Slightly increase its readability.
Co-authored-by: Nikita Verkhovin <verkhovin13@gmail.com>
(cherry picked from commit 9ba70a3483f0f4987229bec231cdc004f51b88a5)
This commit adds compatibility testing of our JDBC driver against
different Elasticsearch versions. Although we are really testing the
forwards compatibility nature of the JDBC driver we model the testing
the same as we do existing BWC tests, that is, with the current branch
fetching the earlier versions of the artifact that is to be tested. In
this case, that's the JDBC driver itself.
Because the tests include the JDBC driver jar on it's classpath we had
to change the packaging of the driver jar in order to avoid jarhell and
other conflicting dependency issues when using an old JDBC driver with
later branches. For this we simply relocate all driver dependencies in
the shadow jar under a "shadowed" package. This allows the JDBC driver
to use the correct version of Elasticsearch libs classes, while the
tests themselves use their versions. Since this required a change to the
driver jar compatibility testing can only go back as far as that version
which at the time of this commit is 7.8.1.
Prior to this change ML memory estimation processes for a
given job would always use the same named pipe names. This
would often cause one of the processes to fail.
This change avoids this risk by adding an incrementing counter
value into the named pipe names used for memory estimation
processes.
Backport of #60395
The retention run goes through a number of steps and can randomly take more than 10s.
=> increased timeout to 30s like we did in other spots in this test
Also, noticed that we had a hard wait of 10s in this test, removed it and adjusted following
busy assert in a way that can deal with a missing snapshot (from when the assert runs before
the snapshot was put into the CS).
Closes#60336
In order to unify model inference and analytics results we
need to write the same fields.
prediction_probability and prediction_score are now written
for inference calls against classification models.
CCR will stop functioning if the master node is on 7.8, but data nodes
are before that version because the master node considers that all data
nodes do not have the remote cluster client role. This commit allows CCR
work on data nodes with legacy roles only.
Relates #54146
Relates #59375
This sets up all indexing to one of our write aliases to require it actually be an alias.
This allows failures scenarios to be captured quickly, loudly, and then potentially recovered.
If a feature is created via a custom pre-processor,
we should return the importance for that feature.
This means we will not return the importance for the
original document field for custom processed features.
closes https://github.com/elastic/elasticsearch/issues/59330
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.
Addresses #49028 and #55363.
If a primary shard of a follower index is being relocated, then we
will fail to create a follow-task. This validation is too restricted.
We should ensure that all primaries of the follower index are active
instead.
Closes#59625
Today, a follow task will fail if the master node of the follower
cluster is temporarily overloaded and unable to process master node
requests (such as update mapping, setting, or alias) from a follow-task
within the default timeout. This error is transient, and follow-tasks
should not abort. We can avoid this problem by setting the timeout of
master node requests on the follower cluster to unbounded.
Closes#56891
Data frame analytics jobs that work with very large datasets
may produce bulk requests that are over the memory limit
for indexing. This commit adds a helper class that bundles
index requests in bulk requests that steer away from the
memory limit. We then use this class both from the results
joiner and the inference runner ensuring data frame analytics
jobs do not generate bulk requests that are too large.
Note the limit was implemented in #58885.
Backport of #60219
Previously the test was asserting the prediction on each document
was close 10.0 from the expected. It turned out that was not enough
as we occasionally saw the test failing by little.
Instead of relaxing that assertion, this commit changes it to
assert the mean prediction error is less than 10.0. This should
reduce the chances of the test failing significantly.
Fixes#60212
Backport of #60221
When the job is force-closed or shutting down due to a fatal error we clean
up all cancellable job operations. This includes cancelling the results processor.
However, this means that we might not persist objects that are written from the
process like stats, memory usage, etc.
In hindsight, we do not gain from cancelling the results processor in its
entirety. It makes more sense to skip row results and model chunks but keep
stats and instrumentation about the job as the latter may contain useful information
to understand what happened to the job.
Backport of #60113
Putting an ingest pipeline used to require that the user calling
it had permission to get nodes info as well as permission to
manage ingest. This was due to an internal implementaton detail
that was not visible to the end user.
This change alters the behaviour so that a user with the
manage_pipeline cluster privilege can put an ingest pipeline
regardless of whether they have the separate privilege to get
nodes info. The internal implementation detail now runs as
the internal _xpack user when security is enabled.
Backport of #60106
It can take more than 10 seconds to auto-follow and create a follow-task
on a slow CI. This commit increases timeout in AutoFollowIT by replacing
assertBusy with assertLongBusy.
Closes#59952
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well.
We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer which is needlessly small for reading from a raw `URLConnection`.
This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.
This should speed up Azure and GCS restores and snapshots in a non-trivial way as well as save some memory when reading small blobs on FS and HFDS repositories.
This commit continues on the work in #59801 and makes other
implementors of the LocalNodeMasterListener interface thread safe in
that they will no longer allow the callbacks to run on different
threads and possibly race each other. This also helps address other
issues where these events could be queued to wait for execution while
the service keeps moving forward thinking it is the master even when
that is not the case.
In order to accomplish this, the LocalNodeMasterListener no longer has
the executorName() method to prevent future uses that could encounter
this surprising behavior.
Each use was inspected and if the class was also a
ClusterStateListener, the implementation of LocalNodeMasterListener
was removed in favor of a single listener that combined the logic. A
single listener is used and there is currently no guarantee on execution
order between ClusterStateListeners and LocalNodeMasterListeners,
so a future change there could cause undesired consequences. For other
classes, the implementations of the callbacks were inspected and if the
operations were lightweight, the overriden executorName method was
removed to use the default, which runs on the same thread.
Backport of #59932
In #58877, when we switched test inference on java, we just
use the doc's `_source` as features. However, this could be
missing out on features that were used during training,
e.g. alias fields, etc.
This commit addresses this by extracting fields to use as
features during inference the same way they are extracted
in `DataFrameDataExtractor` when they are used for training.
Backport of #59963
If shard level results are incomplete in the data streams stats call, it is possible to get inaccurate
counts of the number of backing indices, despite this data being accurate and available in the
cluster state.
This PR removes the expand_wildcards and forbid_closed_indices parameters from the Data
Streams Stats REST endpoint. These options are required for broadcast requests, but are not
needed for anything in terms of resolving data streams. Instead, we just set a default set of
IndicesOptions on the transport request.
This test failed by hitting the 10s default busy assert timeout.
Given how involved the retention run is (multiple disk reads, CS updates etc.)
we should have a higher timeout here.
Also, removed the pointless delete call for the snapshot that we just asserted is gone,
at the end of the test.
Closes#59956
We never used the `IndexSettings` parameter and we only used the
`MappedFieldType` parameter to get the name of the field which we
already know everywhere where we build the `IFD.Builder`. This allows us
to drop a fair bit of ceremony from a couple of tests.
This PR further reduces the memory footprint of the
testGeoHashGridCircuitBreaker test such that only
0.26% of the randomized runs result in memory usage of between
500kb-1mb. where most of that those that are in that range
produce ~650kb of usage. Before, 3% of the runs would use
> 50mb of memory resulting in OOMs in CI
Closes#59853.
This change fixes two possible race conditions in SLM related to
how local master changes and cluster state events are observed. When
implementing the `LocalNodeMasterListener` interface, it is only
recommended to execute on a separate threadpool if the operations are
heavy and would block the cluster state thread. SLM specified that the
listeners should run in the Snapshot thread pool, but the operations
in the listener were lightweight. This had the side effect of causing
master changes to be delayed if the Snapshot threads were all busy and
could also potentially cause the `onMaster` and `offMaster` calls to
race if both were queued and then executed concurrently. Additionally,
the `SnapshotLifecycleService` is also a `ClusterStateListener` and
there is currently no order of operations guarantee between
`LocalNodeMasterListeners` and `ClusterStateListeners` so this could
lead to incorrect behavior.
The resolution for these two issues is that the
SnapshotRetentionService now specifies the `SAME` executor for its
implementation of the `LocalNodeMasterListener` interface. The
`SnapshotLifecycleService` is no longer a `LocalNodeMasterListener` and
instead tracks local master changes in its `ClusterStateListner`.
Backport of #59801
The submit async search action should not populate the thread context
DLS/FLS permission set, because it is not currently authorised as an "indices request"
and hence the permission set that it builds is incomplete and it overrides the
DLS/FLS permission set of the actual spawned search request (which is built correctly).
When dealing with tail queries, data is returned descending for the base
criterion yet the rest of the queries are ascending. This caused a
problem during insertion since while in a page, the data is ASC, between
pages the blocks of data is DESC.
This caused incorrectly sorting inside a SequenceGroup which led to
incorrect results.
Further more in case of limit, since the data in a page is ASC, early
return is not possible neither is desc matching. Thus the page needs to
be consumed first before finding the final results.
A future improvement could be to keep only the top N results dropping
the rest during insertion time.
(cherry picked from commit 77c88da054a1ce662a264f72cde5986d4ce37e3a)
There was a bug in the geoshape circuit-breaker check where the
hash values array was being allocated before its new size was
accounted for by the circuit breaker.
Fixes#57847.
* Adding new `require_alias` option to indexing requests (#58917)
This commit adds the `require_alias` flag to requests that create new documents.
This flag, when `true` prevents the request from automatically creating an index. Instead, the destination of the request MUST be an alias.
When the flag is not set, or `false`, the behavior defaults to the `action.auto_create_index` settings.
This is useful when an alias is required instead of a concrete index.
closes https://github.com/elastic/elasticsearch/issues/55267
This commit makes DateFieldMapper extend ParametrizedFieldMapper,
declaring its parameters explicitly. As well as changes to DateFieldMapper
itself, there are some changes to dynamic mapping code to ensure that
dynamically detected date formats are passed through to new date mapper
builders.
The ILM policy for the source and shrunk indices run separately (ie. they
are two separate managed indices). This fixes the test which exhibited some
flakiness by allowing some time for the ILM policy for the source index
to finish executing.
(cherry picked from commit c78d5e8499fc5ca2ca1314f97bcc6b55ba06e2e7)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Adds a hard_bounds parameter to explicitly limit the buckets that a histogram
can generate. This is especially useful in case of open ended ranges that can
produce a very large number of buckets.
Implement DATE_PARSE(<date_str>, <pattern_str>) function
which allows to parse a date string according to the specified
pattern into a date object. The patterns allowed are those of
java.time.format.DateTimeFormatter.
Closes#54962
Co-authored-by: Marios Trivyzas <matriv@users.noreply.github.com>
Co-authored-by: Patrick Jiang(白泽) <dreamlike.sky@foxmail.com>
(cherry picked from commit 647a413d9b21bd3938f1716bb19f8407e1334125)
* [ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.
`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.
Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).
This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors
in the analytics job configuration, we need to know the input and output field names.
Eclipse was confused by #59583. It can't see a the public inner
interface within the superclass. This time. Usually that is fine, but
the Eclipse gods don't like this particular code, I guess.
Using serialization/deserialization when dealing with non-trivial
documents causes the process to get stuck not to mention it is expensive.
Use a much more simple approach at the expense of losing information
(we're just interested in the source after all).
(cherry picked from commit e1659822db7ce1390ba9bbfb21768e24a0907dff)
Previously the concrete type parameters for the MappedFieldType didn't always
match those for the FieldMapper. This PR updates the mappers so that the type
parameters always match, which makes the design easier to follow.
to avoid many error stacktraces in logs during a rolling upgrade.
Stack templates use the composable index template and component APIs,these APIs
aren't supported in 7.7 and earlier and in mixed cluster
environments this can cause a lot of ActionNotFoundTransportException
errors in the logs during rolling upgrades. If these templates
are only installed via elected master node then the APIs are always
there and the ActionNotFoundTransportException errors are then prevented.
When an inference model is loaded it is accounted for in circuit breaker
and should not be released until there are no users of the model. Adds
a reference count to the model to track usage.
Today when mounting a searchable snapshot we obtain the snapshot/index
UUIDs and then assume that these are the UUIDs used during the
subsequent restore. If you concurrently delete the snapshot and replace
it with one with the same name then this assumption is violated, with
chaotic consequences.
This commit introduces a check that ensures that the snapshot UUID does
not change during the mount process. If the snapshot remains in place
then the index UUID necessarily does not change either.
Relates #50999
Case sensitivity is incorporated as a test dimension - instead of
running the same test twice, two different tests are created.
Clean-up the test invocation by removing unused parameters.
Fix#59294
(cherry picked from commit 72c8a3582d8e8a4a663d82814a17a1a3d2757292)
Unfortunately, we cannot guarantee that the execution will be truly
async even with 0ms timeout since we cannot block the execution. So, we need
to modify the test to work in both async and non-async mode.
Closes#59416
Backport of #59525 to 7.x branch.
* Actions are moved to xpack core.
* Transport and rest actions are moved the data-streams module.
* Removed data streams methods from Client interface.
* Adjusted tests to use client.execute(...) instead of data stream specific methods.
* only attempt to delete all data streams if xpack is installed in rest tests
* Now that ds apis are in xpack and ESIntegTestCase
no longers deletes all ds, do that in the MlNativeIntegTestCase
class for ml tests.
Since #58728 writing operations on searchable snapshot directory cache files
are executed in an asynchronous manner using a dedicated thread pool. The
thread pool used is searchable_snapshots which has been created to execute
prewarming tasks.
Reusing the same thread pool wasn't a good idea as it can lead to deadlock
situations. One of these situation arose in a test failure where the thread pool
was full of prewarming tasks, all waiting for a cache file to be accessible, while
the cache file was being evicted by the cache service. But such an eviction
can only be processed when all read/write operations on the cache file are
completed and in this case the deadlock occurred because the cache file was
actively being read by a concurrent search which also won the privilege to
write the range of bytes in cache... and this writing operation could never have
been completed because of the prewarming tasks making no progress and
filling up the thread pool.
This commit renames the searchable_snapshots thread pool to
searchable_snapshots_cache_fetch_async. Assertions are added to assert
that cache writes are executed using this thread pool and to assert that read
on cached index inputs are executed using a different thread pool to avoid
potential deadlock situations.
This commit also adds a searchable_snapshots_cache_prewarming that is
used to execute prewarming tasks. It also converts the existing cache prewarming
test into a more complte integration test that creates multiple searchable
snapshot indices concurrently with randomized thread pool sizes, and verifies
that all files have been correctly prewarmed.
Add a custom factory for recovery state into IndexStorePlugin that
allows different implementors to provide its own RecoveryState
implementation.
Backport of #59038
Enables fully concurrent snapshot operations:
* Snapshot create- and delete operations can be started in any order
* Delete operations wait for snapshot finalization to finish, are batched as much as possible to improve efficiency and once enqueued in the cluster state prevent new snapshots from starting on data nodes until executed
* We could be even more concurrent here in a follow-up by interleaving deletes and snapshots on a per-shard level. I decided not to do this for now since it seemed not worth the added complexity yet. Due to batching+deduplicating of deletes the pain of having a delete stuck behind a long -running snapshot seemed manageable (dropped client connections + resulting retries don't cause issues due to deduplication of delete jobs, batching of deletes allows enqueuing more and more deletes even if a snapshot blocks for a long time that will all be executed in essentially constant time (due to bulk snapshot deletion, deleting multiple snapshots is mostly about as fast as deleting a single one))
* Snapshot creation is completely concurrent across shards, but per shard snapshots are linearized for each repository as are snapshot finalizations
See updated JavaDoc and added test cases for more details and illustration on the functionality.
Some notes:
The queuing of snapshot finalizations and deletes and the related locking/synchronization is a little awkward in this version but can be much simplified with some refactoring. The problem is that snapshot finalizations resolve their listeners on the `SNAPSHOT` pool while deletes resolve the listener on the master update thread. With some refactoring both of these could be moved to the master update thread, effectively removing the need for any synchronization around the `SnapshotService` state. I didn't do this refactoring here because it's a fairly large change and not necessary for the functionality but plan to do so in a follow-up.
This change allows for completely removing any trickery around synchronizing deletes and snapshots from SLM and 100% does away with SLM errors from collisions between deletes and snapshots.
Snapshotting a single index in parallel to a long running full backup will execute without having to wait for the long running backup as required by the ILM/SLM use case of moving indices to "snapshot tier". Finalizations are linearized but ordered according to which snapshot saw all of its shards complete first
There is no point in writing out snapshots that contain no data that can be restored
whatsoever. It may have made sense to do so in the past when there was an `INIT` snapshot
step that wrote data to the repository that would've other become unreferenced, but in the
current day state machine without the `INIT` step there is no point in doing so.
Many of the parameters we pass into this method were only used to
build the `SnapshotInfo` instance to write.
This change simplifies the signature. Also, it seems less error prone to build
`SnapshotInfo` in `SnapshotsService` isntead of relying on the fact that each repository
implementation will build the correct `SnapshotInfo`.
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.
There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
This API reports on statistics important for data streams, including the number of data
streams, the number of backing indices for those streams, the disk usage for each data
stream, and the maximum timestamp for each data stream
Instead of retrieving an entire SearchHit, get just a reference and
postpone the document retrieval when assembling the final results.
Remove sort information from results to make them consistent.
Move TumblingWindow under the sequence package.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
(cherry picked from commit bccfbcd81f2f1d3552e95e4a9ee2618fb3059bd9)
API keys can be created nameless using the grant endpoint (it is a bug, see #59484).
This change ensures auditing doesn't throw when such an API Key is used for authn.
The `create_doc`, `create`, `write` and `index` privileges do not grant
the PutMapping action anymore. Apart from the `write` privilege, the other
three privileges also do NOT grant (auto) updating the mapping when ingesting
a document with unmapped fields, according to the templates.
In order to maintain the BWC in the 7.x releases, the above privileges will still grant
the Put and AutoPutMapping actions, but only when the "index" entity is an alias
or a concrete index, but not a data stream or a backing index of a data stream.
This PR introduces two new fields in to `RepositoryData` (index-N) to track the blob name of `IndexMetaData` blobs and their content via setting generations and uuids. This is used to deduplicate the `IndexMetaData` blobs (`meta-{uuid}.dat` in the indices folders under `/indices` so that new metadata for an index is only written to the repository during a snapshot if that same metadata can't be found in another snapshot.
This saves one write per index in the common case of unchanged metadata thus saving cost and making snapshot finalization drastically faster if many indices are being snapshotted at the same time.
The implementation is mostly analogous to that for shard generations in #46250 and piggy backs on the BwC mechanism introduced in that PR (which means this PR needs adjustments if it doesn't go into `7.6`).
Relates to #45736 as it improves the efficiency of snapshotting unchanged indices
Relates to #49800 as it has the potential of loading the index metadata for multiple snapshots of the same index concurrently much more efficient speeding up future concurrent snapshot delete
Currently we combine coordinating and primary bytes into a single bucket
for indexing pressure stats. This makes sense for rejection logic.
However, for metrics it would be useful to separate them.
The `Authentication` object that gets built following an API Key authentication
contains the realm name of the owner user that created the key (which is audited),
but the specific field used for storing it changed in #51305 .
This PR makes it so that auditing tolerates an "unfound" realm name, so it doesn't
throw an NPE, because the owner realm name is not found in the expected field.
Closes#59425
Renames and moves the cross validation splitter package.
First, the package and classes are renamed from using
"cross validation splitter" to "train test splitter".
Cross validation as a term is overloaded and encompasses
more concepts than what we are trying to do here.
Second, the package used to be under `process` but it does
not make sense to be there, it can be a top level package
under `dataframe`.
Backport of #59529
When a field is not included yet its type is unsupported, we currently
state that the reason the field is excluded is that it is not in the
includes list. However, this implies the user could include it but
if the user tried to do so, they would get a failure as they would
be including a field with unsupported type.
This commit improves this by stating the reason a not included field
with unsupported type is excluded is because of its type.
Backport of #59424
The primary shards of follower indices during the bootstrap need to be
on nodes with the remote cluster client role as those nodes reach out to
the corresponding leader shards on the remote cluster to copy Lucene
segment files and renew the retention leases. This commit introduces a
new allocation decider that ensures bootstrapping follower primaries are
allocated to nodes with the remote cluster client role.
Co-authored-by: Jason Tedor <jason@tedor.me>
Since we have added checking the cardinality of the dependent_variable
for classification, we have introduced a bug where an NPE is thrown
if the dependent_variable is a missing field.
This commit is fixing this issue.
Backport of #59524
This PR adds minimum support for prefix search of API Key name. It only touches API key name and leave all other query parameters, e.g. realm name, username unchanged.
This makes the data_stream timestamp field specification optional when
defining a composable template.
When there isn't one specified it will default to `@timestamp`.
(cherry picked from commit 5609353c5d164e15a636c22019c9c17fa98aac30)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Certain OPs mix usage of boolean and string for boolean type OIDC claims. For example, the same "email_verified" field is presented as boolean in IdToken, but is a string of "true" in the response of user info. This inconsistency results in failures when we try to merge them during authorization.
This PR introduce a small leniency so that it will merge a boolean with a string that has value of the boolean's string representation. In another word, it will merge true with "true", also will merge false with "false", but nothing else.
This adds a low precendece mapping for the `@timestamp` field with
type `date`.
This will aid with the bootstrapping of data streams as a timestamp
mapping can be omitted when nanos precision is not needed.
(cherry picked from commit 4e72f43d62edfe52a934367ce9809b5efbcdb531)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
Improve the way limit (in particular offset) is being applied to handle
the case where the matches are less than the offset and absolute limit.
Combine Matcher and SequenceStateMachine into one class since the two
have evolved beyond their original name and structure.
(cherry picked from commit 63d3c62cdfc33dea03f21d5565b9c8ea104003eb)
separate pivot from the indexer and introduce an abstraction layer, pivot becomes a function.
Foundation to add more functions to transform.
piggy backed fixes:
- when running geo tile group_by it could fail due to query clause limit (unreleased)
- new style page size using settings was not validating limit of 10k (7.8)
- Fix duplicate path deprecation by removing duplicate test resources
- fix deprecated non annotated input property in LazyPropertyList
- fix deprecated usage of AbstractArchiveTask.version
- Resolve correct test resources
Now that we have per-partition categorization, the estimate for
the model memory limit required for a particular analysis config
needs to take into account whether categorization is operating
for the job as a whole or per-partition.
API keys can be created without names using grant API key action. This is considered as a bug (#59484). Since the feature has already been released, we need to accomodate existing keys that are created with null names. This PR relaxes the parser logic so that a null name is accepted.
We have recently added internal metrics to monitor the amount of
indexing occurring on a node. These metrics introduce back pressure to
indexing when memory utilization is too high. This commit exposes these
stats through the node stats API.
This commit adds data stream info to the `/_xpack` and `/_xpack/usage` APIs. Currently the usage is
pretty minimal, returning only the number of data streams and the number of indices currently
abstracted by a data stream:
```
...
"data_streams" : {
"available" : true,
"enabled" : true,
"data_streams" : 3,
"indices_count" : 17
}
...
```
Removes member variable `index` from `ExtractedFieldsDetector`
as it is not used.
Backport of #59395
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Backport of #59293 to 7.x branch.
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module,
this results in storing a composable index template
with data stream definition only to work with default
distribution. This way data streams can only be used
with default distribution, since a data stream can
currently only be created if a matching composable index
template exists with a data stream definition.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to put composable index template api
to fail if `_data_stream_timestamp` meta field mapper
isn't registered. So that a more understandable
error is returned when attempting to store a template
with data stream definition via the oss distribution.
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
With the introduction of per-partition categorization the old
logic for creating a job notification for categorization status
"warn" does not work. However, the C++ code is already writing
annotations for categorization status "warn" that take into
account whether per-partition categorization is being used and
which partition(s) the warnings relate to. Therefore, this
change alters the Java results processor to create notifications
based on the annotations the C++ writes. (It is arguable that
we don't need both annotations and notifications, but they show
up in different ways in the UI: only annotations are visible in
results and only notifications set the warning symbol in the
jobs list. This means it's best to have both.)
Backport of #59377
This PR ensure that same roles are cached only once even when they are from different API keys.
API key role descriptors and limited role descriptors are now saved in Authentication#metadata
as raw bytes instead of deserialised Map<String, Object>.
Hashes of these bytes are used as keys for API key roles. Only when the required role is not found
in the cache, they will be deserialised to build the RoleDescriptors. The deserialisation is directly
from raw bytes to RoleDescriptors without going through the current detour of
"bytes -> Map -> bytes -> RoleDescriptors".
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.
This setting may also be updated for a stopped job.
More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
Backport of #59254 and #57274
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.
(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.
Backport of #58877
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
* Add sample versions of standard deviation and variance functions (#59093)
* Add STDDEV_SAMP, VAR_SAMP
This commit adds the sampling variations of the standard deviation and
variance agg functions.
(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)
* Fix: workaround for lack of Map#of() in Java8
Replace Map#of() with a HashMap static init.
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.
relates #59140
Backport of #59076 to 7.x branch.
The commit makes the following changes:
* The timestamp field of a data stream definition in a composable
index template can only be set to '@timestamp'.
* Removed custom data stream timestamp field validation and reuse the validation from `TimestampFieldMapper` and
instead only check that the _timestamp field mapping has been defined on a backing index of a data stream.
* Moved code that injects _timestamp meta field mapping from `MetadataCreateIndexService#applyCreateIndexRequestWithV2Template58956(...)` method
to `MetadataIndexTemplateService#collectMappings(...)` method.
* Fixed a bug (#58956) that cases timestamp field validation to be performed
for each template and instead of the final mappings that is created.
* only apply _timestamp meta field if index is created as part of a data stream or data stream rollover,
this fixes a docs test, where a regular index creation matches (logs-*) with a template with a data stream definition.
Relates to #58642
Relates to #53100Closes#58956Closes#58583