We must not remove the snapshot from the initializing set
in the `timeout` getter. This was a plain oversight that
went unnoticed. It can lead to the removal of a valid
snapshot clone from the cluster state in rare circumstances
(e.g. when a node concurrently joins the cluster or a routing
change happens as it did in the linked test failure).
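A minimal sketch of the bug shape (class and member names here are hypothetical): the getter must be a plain read with no side effects.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of a getter that also mutated tracking state.
class CloneEntry {
    private final Set<String> initializingClones = ConcurrentHashMap.newKeySet();
    private final long timeoutMillis;

    CloneEntry(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    long timeout() {
        // The buggy version also removed the snapshot from initializingClones here,
        // so a concurrent cluster state update (node join, routing change) that
        // read the timeout could drop a still-valid snapshot clone.
        return timeoutMillis;
    }
}
```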
Closes #64115
The `NodeNotConnectedException` exception can also be nested in the
fairly unlikely case of the disconnect occurring between the connected check
and actually sending the request in the transport service.
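A sketch of the resulting check, assuming the `ExceptionsHelper.unwrap` utility; the helper method itself is illustrative.

```java
import org.elasticsearch.ExceptionsHelper;
import org.elasticsearch.transport.NodeNotConnectedException;

// Treat the failure as a disconnect whether the exception is top-level
// or nested somewhere in the cause chain.
static boolean isNodeNotConnected(Exception e) {
    return e instanceof NodeNotConnectedException
            || ExceptionsHelper.unwrap(e, NodeNotConnectedException.class) != null;
}
```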
Closes #63233
There is a small chance that the file deletion will run
on the searchable snapshot thread pool and not on the test
thread now that the cache is non-blocking, in which case
the assertion fails unless we wait for that thread.
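A sketch of the test-side wait, assuming `assertBusy` from `ESTestCase`; the helper method is hypothetical.

```java
import static org.junit.Assert.assertFalse;

import java.nio.file.Files;
import java.nio.file.Path;

// Poll until the asynchronous deletion has happened instead of asserting immediately.
static void awaitFileDeleted(Path file) throws Exception {
    assertBusy(() -> assertFalse(Files.exists(file)));
}
```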
Adds support for the unsigned_long type to data frame analytics.
This type is handled in the same way as the long type. Values
sent to the ML native processes are converted to floats and
hence will lose accuracy when outside the range where a float
can uniquely represent long values.
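For example, a `float` can represent consecutive integers exactly only up to 2^24, so distinct long values beyond that collapse onto the same float:

```java
public class FloatPrecision {
    public static void main(String[] args) {
        long a = 16_777_216L; // 2^24
        long b = 16_777_217L; // 2^24 + 1
        // Both round to the same float, so the original long cannot be recovered.
        System.out.println((float) a == (float) b); // true
    }
}
```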
Backport of #64066
If we run into a background merge between creating the snapshot and closing the index,
then with compound files we could end up in a situation with zero file reuse
on restore.
Force merging before the snapshot gives us a single segment that won't change down the line,
so the restore always sees file reuse from the closed index.
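A sketch of the test change using the integration-test client; index, repository, and snapshot names are placeholders.

```java
// Collapse the index to one stable segment so a later background merge cannot
// rewrite the files between taking the snapshot and restoring it.
client().admin().indices().prepareForceMerge("test-index").setMaxNumSegments(1).get();
client().admin().cluster().prepareCreateSnapshot("test-repo", "test-snap")
        .setIndices("test-index").setWaitForCompletion(true).get();
```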
Closes #63476
Replace the read-write-lock based mechanism for eviction and listener references with
a reference counting implementation.
This fixes a bug that caused test failure #63586, in which concurrently trying to acquire or release
an eviction listener while doing a file operation would sometimes throw an exception,
since the `tryLock` call on the read lock would fail in this case.
This also removes the possibility of blocking cluster state updates as a result of them waiting
on the write lock, which might take a long time if a slow read operation executes concurrently.
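A simplified sketch of the idea on top of the `AbstractRefCounted` base class (the subclass is illustrative): each file operation holds a reference, and eviction runs only after the last reference is released, so no lock is ever held across IO.

```java
import org.elasticsearch.common.util.concurrent.AbstractRefCounted;

class CacheFileRef extends AbstractRefCounted {
    CacheFileRef() {
        super("cache-file");
    }

    @Override
    protected void closeInternal() {
        // Runs exactly once, after the final decRef(): evict and notify listeners here.
    }
}
```

Callers then wrap file operations in `tryIncRef()`/`decRef()` pairs instead of taking a read lock, so a slow read can never block an evicting cluster state update.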
Closes #63586
Assuming the clone failed when the request failed is not sufficient.
There are failure modes where the request fails but the clone still works out,
because the data node resent the request after the first clone had already been
failed and removed from the cluster state when the master was restarted.
Closes #63473
The setting name is script.context.$CONTEXT.cache_max_size rather than
script.context.$CONTEXT.context_max_size.
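For example, with a hypothetical script context named `field`:

```java
import org.elasticsearch.common.settings.Settings;

class CacheSizeExample {
    // The last component is cache_max_size, not context_max_size.
    static final Settings SETTINGS = Settings.builder()
            .put("script.context.field.cache_max_size", 300)
            .build();
}
```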
Co-authored-by: Andrew Kroh <andrew.kroh@elastic.co>
This commit updates the list of system index names to be complete and
correct for Kibana and APM. The pattern `.kibana*` is too inclusive for
system indices and actually includes the
`.kibana-event-log-${version}-${int}` pattern for the Kibana event log,
which should only be hidden and not a system index. Additionally, the
`.apm-custom-link` index was not included in the list of system
indices. Finally, the reporting pattern has been updated to match that
of the permissions given to the kibana_system role.
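A sketch of what the descriptor change looks like, assuming the two-argument `SystemIndexDescriptor` constructor of that era; the description string is illustrative.

```java
import org.elasticsearch.indices.SystemIndexDescriptor;

class KibanaSystemIndices {
    // `.apm-custom-link` becomes a system index, while `.kibana-event-log-*`
    // remains only a hidden index rather than a system index.
    static final SystemIndexDescriptor APM_CUSTOM_LINK =
            new SystemIndexDescriptor(".apm-custom-link", "system index for APM custom links");
}
```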
Backport of #63950
* Add APM index to Kibana system indices, making it
accessible through the _kibana endpoint and giving it the
same access privileges as the other Kibana system indices.
* Parameterize kibana system index tests by index name
Backport of #63756
Co-authored-by: William Brafford <williamrandolphbrafford@gmail.com>
The deprecation indexing code was writing to a regular data stream,
and it is not yet possible to hide a data stream or prefix it with
a period. This functionality will be re-added once it is possible to
mark a data stream as hidden, and also to not rely on the standard
logs template since that can be disabled.
We have to wait for no more operations here, not for `1`. This mostly worked
because the test thread would add the listener quickly enough so that it sees the
state where either the snapshot or clone, but not both, have already finished,
but randomly the test thread would be slow and time out on a state without snapshots in it.
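A sketch of the corrected wait, using `assertBusy`; the exact helper used by the test may differ.

```java
import org.elasticsearch.cluster.SnapshotsInProgress;

// Wait until no snapshot or clone operations remain in the cluster state,
// rather than waiting for exactly one entry.
assertBusy(() -> {
    SnapshotsInProgress inProgress = client().admin().cluster().prepareState().get()
            .getState().custom(SnapshotsInProgress.TYPE);
    assertTrue(inProgress == null || inProgress.entries().isEmpty()); // from the ES test framework
});
```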
testHealthOnMasterFailover could time out on some of the health requests
in the case where an index is added, since the recovery leads to
extended test run time.
Closes #62690
The officially supported way to clear all entries from a cache is to use a
wildcard of either `*` or `_all`. Though an empty string has the same effect, it was
never intended. Therefore the tests should not use an empty string, and this PR
changes them to use `*`.
Add support for unsigned_long, which required a change to
write out integer results properly, because coerce is not
supported for unsigned_long.
Fixes #63871
Backport of #63940
We had an error when serializing fully reduced scripted metrics:
a small typo and a severe lack of tests. Anyway, this fixes the one-character
typo and adds a bunch more tests.
This commit adds a test in DiskThresholdDeciderTests that verifies
the allocation of a shard with a snapshot recovery source, both in the
situation where the snapshot shard size was successfully provided
by the SnapshotInfoService introduced in #61906 and in the case where the
service failed to provide the size.
Relates #61906
When calculating feature importance, the leaf values directly correlate with the value of the importance.
Consequently, positive leaf values -> positive feature importance, and
negative leaf values -> negative feature importance.
It follows that for binary classification the importance relates to the leaf values, which relate directly to the "probability of class 1".
So the feature importance calculated is always the importance as it relates to class 1.
The inverse is the importance as it relates to class 0.
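A trivial worked example of that relationship (the numbers are made up):

```java
public class ImportanceExample {
    public static void main(String[] args) {
        double importanceClass1 = 0.42;              // importance reported for class 1
        double importanceClass0 = -importanceClass1; // the class 0 importance is its negation
        System.out.println(importanceClass0);        // -0.42
    }
}
```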
* Fix concurrent modification on task realization
* Use TaskProvider instead of relying on tasks in distribution setup
* Port more task references in :distribution to TaskProvider
* Fix NullPointerException in distribution setup
Max and min aggs were producing wrong results for the unsigned_long field
if the field was indexed. If the field is indexed, max/min aggs use values
from indexed points instead of field data; those values
are derived using the method pointReaderIfPossible. Before,
UnsignedLongFieldType#pointReaderIfPossible was incorrectly
producing values, as it failed to shift them back to the original
values.
This patch fixes the method pointReaderIfPossible to produce the
correct original values.
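A sketch of the shift involved; the helper name is hypothetical, and the constant follows from unsigned_long values being stored as signed longs offset by 2^63.

```java
// Adding 2^63 modulo 2^64 is the same as flipping the sign bit, which maps an
// indexed signed long back to the bit pattern of the original unsigned value.
static long shiftBackToUnsignedBits(long indexedValue) {
    return indexedValue ^ Long.MIN_VALUE;
}

// Example: the stored minimum maps back to unsigned zero.
//   Long.toUnsignedString(shiftBackToUnsignedBits(Long.MIN_VALUE)) -> "0"
```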
Relates to #60050
In #57892 I broke *some* sub-aggregations inside of the `parent` and
`child` aggregator, specifically any sub-aggregations that do work in
the `postCollect` phase. This fixes it by delaying the post collect
phase of aggs under `parent` and `child` until `beforeBuildingBuckets`
because, well, we haven't done *any* collection until after that phase.
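A heavily simplified sketch of the shape of the fix; the method and field names only approximate the aggregator framework of that version, and `replayMatchingDocs` is hypothetical.

```java
@Override
protected void doPostCollection() {
    // Deliberately empty: the sub-aggregators have not collected anything yet,
    // because the join replays matching docs only when buckets are built.
}

@Override
protected void beforeBuildingBuckets(long[] ordsToCollect) throws java.io.IOException {
    replayMatchingDocs(ordsToCollect);           // run the delayed collection first
    collectableSubAggregators.postCollection();  // only now is post-collect safe
}
```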
* [DOCS] Combining important config settings into a single page (#63849)
* Combining important config settings into a single page.
* Updating IDs for two pages that caused link errors and implementing redirects.
* Updating links to use IDs instead of xrefs.