This refactoring has three motivations:
1. Separate all master node steps during snapshot operations from all data node steps in code.
2. Set up next steps in concurrent repository operations and general improvements by centralizing tracking of each shard's state in the repository in `SnapshotsService` so that operations for each shard can be linearized efficiently (i.e. without having to inspect the full snapshot state for all shards on every cluster state update, allowing us to track more in memory and only fall back to inspecting the full CS on master failover like we do in the snapshot shards service).
* This PR already contains some best effort examples of this, but obviously this could be way improved upon still (just did not want to do it in this PR for complexity reasons)
3. Make the `SnapshotsService` less expensive on the CS thread for large snapshots
- Fixes how libs in distribution are resolved
- Required minor rework on common repository setup to allow distribution projects
to resolve thirdparty artifacts
- Use Default configurations when resolving tools for distribution packaging
- Related to #57920
With the removal of mapping types and the immutability of FieldTypeLookup in #58162, we no longer
have any cause to compare MappedFieldType instances. This means that we can remove all equals
and hashCode implementations, and in addition we no longer need the clone implementations which
were required for equals/hashcode testing. This greatly simplifies implementing new MappedFieldTypes,
which will be particularly useful for the runtime fields project.
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.
This setting may also be updated for a stopped job.
More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
Backport of #59254 and #57274
This modifies the `variable_width_histogram`'s distant bucket handling
to:
1. Properly handle integer overflows
2. Recalculate the average distance when new buckets are added on the
ends. This should slow down the rate at which we build extra buckets
as we build more of them.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Sequences now support until conditional, which prevents a match from
occurring if the until matches a document while doing look-ups.
Thus a sequence must complete before the until condition matches - if
any document within the sequence occurs at, or after, the until hit, the
sequence is discarded.
(cherry picked from commit 1ba1b9f0661aee655aa48cf9475ac61aaee2bfda)
Since we are able to load the inference model
and perform inference in java, we no longer need
to rely on the analytics process to be performing
test inference on the docs that were not used for
training. The benefit is that we do not need to
send test docs and fit them in memory of the c++
process.
Backport of #58877
Co-authored-by: Dimitris Athanasiou <dimitris@elastic.co>
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Part of #48366. Add documentation for the dangling indices
API added in #58176.
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
In #52680 we introduced a new health check mechanism. This commit fixes
up some sporadic related test failures, and improves the behaviour of
the `FollowersChecker` slightly in the case that no retries are
configured.
Closes#59252Closes#59172
Today `NodeEnvironment#findAllShardIds` enumerates the index directories
in each data path in order to find one with a specific name. Since we
already know the name of the folder we seek we can construct the path
directly and avoid this directory listing. This commit does that.
The FieldMapper infrastructure currently has a bunch of shared parameters, many of which
are only applicable to a subset of the 41 mapper implementations we ship with. Merging,
parsing and serialization of these parameters are spread around the class hierarchy, with
much repetitive boilerplate code required. It would be much easier to reason about these
things if we could declare the parameter set of each FieldMapper directly in the implementing
class, and share the parsing, merging and serialization logic instead.
This commit is a first effort at introducing a declarative parameter style. It adds a new FieldMapper
subclass, ParametrizedFieldMapper, and refactors two mappers, Boolean and Binary, to use it.
Parameters are declared on Builder classes, with the declaration including the parameter name,
whether or not it is updateable, a default value, how to parse it from mappings, and how to
extract it from another mapper at merge time. Builders have a getParameters method, which
returns a list of the declared parameters; this is then used for parsing, merging and serialization.
Merging is achieved by constructing a new Builder from the existing Mapper, and merging in
values from the merging Mapper; conflicts are all caught at this point, and if none exist then a new,
merged, Mapper can be built from the Builder. This allows all values on the Mapper to be final.
Other mappers can be gradually migrated to this new style, and once they have all been refactored
we can merge ParametrizedFieldMapper and FieldMapper entirely.
1. Add the `apikey.id`, `apikey.name` and `authentication.type` fields
to the `access_granted`, `access_denied`, `authentication_success`, and
(some) `tampered_request` audit events. The `apikey.id` and `apikey.name`
are present only when authn using an API Key.
2. When authn with an API Key, the `user.realm` field now contains the effective
realm name of the user that created the key, instead of the synthetic value of
`_es_api_key`.
* Add sample versions of standard deviation and variance functions (#59093)
* Add STDDEV_SAMP, VAR_SAMP
This commit adds the sampling variations of the standard deviation and
variance agg functions.
(cherry picked from commit 8b29817b49e386215f29cb5b3356d0183fd5d9de)
* Fix: workaround for lack of Map#of() in Java8
Replace Map#of() with a HashMap static init.
In 7.x, we have java 8 as minimum jdk version. In the past, for
packaging tests, we relied on the system to provide an alternative jdk
to be used by the no-jdk distributions. Master switched to using a build
provided jdk, but 7.x was stuck relying on the system because it needed
a java 8 jdk. The jdk download plugin was updated a while ago to support
jdk 8, and so this PR converts the distro tests to use the build
provided jdk just as master branch does.
Note also this fixes a failure that would sometimes occur on older jdks
in windows where the expected gc filename can be different.
* [DOCS] Adding get snapshot API docs (#59098)
* Adding page for get snapshot API.
* Adding values for state and cleaning up some other formatting.
* Adding missing forward slash to GET request.
* Updating values for start_time and end_time in TESTRESPONSE.
* Swap "return" for "retrieve"
* Swap "return" for "retrieve" 2
* Change .snapshot to .response
* Adding response parameters and incorporating edits from review.
* Update response example to include repository info
* Change dash to underscore
* Add data type for snapshot in response
* Incorporating review comments and adding missing response definitions.
* Minor rewording in description.
* Removing multi-snapshot support for 7.x.
* Changing end_time value from build error.
* Removing .response from snippet testing.
These tests sometimes install a template so they can be compatible with older versions, but they run
amok of the occasionally installed "global" template which changes the default number of shards.
This commit adds `allowedWarnings` and allows these warnings to be present, but doesn't fail if they
are not (since the global template is only randomly installed).
Resolves#58807Resolves#58258
The recovery chunk size setting was injected in #58018, but too
aggressively and broke several tests. This change removes that
random injection.
Relates #58018
We are leaking a FileChannel in #39585 if we release a safe commit with
CancellableThreads. Although it is a bug in Lucene where we do not close
a FileChannel if we failed to create a NIOFSIndexInput, I think it's
safer if we release a safe commit using the generic thread pool instead.
Closes#39585
Relates #45409
Waiting `INIT` here is dead code in newer versions that don't use `INIT`
any longer and leads to nothing being written to the repository in older versions
if the snapshot is cancelled at the `INIT` step which then breaks repo consistency
checks.
Since we have other tests ensuring that snapshot abort works properly we can just remove
the wait for `INIT` here and backport this down to 7.8 to fix tests.
relates #59140