For correctness we need to verify whether the history uuid of the leader
index shards never changes while that index is being followed.
* The history UUIDs are recorded as custom index metadata in the follow index.
* The follow api validates whether the current history UUIDs of the leader
index shards are the same as the recorded history UUIDs.
If not the follow api fails.
* While a follow index is following a leader index; shard follow tasks
on each shard changes api call verify whether their current history uuid
is the same as the recorded history uuid.
Relates to #30086
Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
This change adds a `_source` only snapshot repository that allows to wrap
any existing repository as a _backend_ to snapshot only the `_source` part
including live docs markers. Snapshots taken with the `source` repository
won't include any indices, doc-values or points. The snapshot will be reduced in size and
functionality such that it requires full re-indexing after it's successfully restored.
The restore process will copy the `_source` data locally starts a special shard and engine
to allow `match_all` scrolls and searches. Any other query, or get call will fail with and unsupported operation exception. The restored index is also marked as read-only.
This feature aims mainly for disaster recovery use-cases where snapshot size is
a concern or where time to restore is less of an issue.
**NOTE**: The snapshot produced by this repository is still a valid lucene index. This change doesn't allow for any longer retention policies which is out of scope for this change.
Improve failure handling of retryable errors by retrying remote calls in
a exponential backoff like manner. The delay between a retry would not be
longer than the configured max retry delay. Also retryable errors will be
retried indefinitely.
Relates to #30086
This commit adds settings upgraders for the search.remote.* settings
that can be in the cluster state to automatically upgrade these settings
to cluster.remote.*. Because of the infrastructure that we have here,
these settings can be upgraded when recovering the cluster state, but
also when a user tries to make a dynamic update for these settings.
These logs are incredibly verbose, and it makes chasing normal failures
burdensome. This commit removes the debug logging, which can be
reenabled again if needed.
* SQL: Make Literal a NamedExpression
Literal now is a NamedExpression reducing the need for Aliases for
folded expressions leading to simpler optimization rules.
Fix#33523
This change tightens up the meaning of the "input_fields" field
in the file structure finder output. Previously it was permitted
but not calculated for JSON and XML files. Following this change
the field is called "column_names" and is only permitted for
delimited files.
Additionally the way the column names are set for headerless
delimited files is refactored to encapsulate the way they're
named to one line of the code rather than having the same
logic in two places.
Previously, when an arithmetic function got applied on a
table column in the `SELECT` clause, the name of the result
column contained weird characters used internally when
processing the SQL statement e.g.:
SELECT CHAR(emp_no % 10000) FROM "test_emp"
returned:
CHAR((emp_no{f}#14) % 10000))
as the column name instead of:
CHAR((emp_no) % 10000))
Also, fix an issue that causes a ClassCastException to be thrown
when using functions where both arguments are literals.
Closes#31869Closes#33461
* LeafCollector.setScorer() now takes a Scorable
* Scorers may not have null Weights
* IndexWriter.getFlushingBytes() reports how much memory is being used by IW threads writing to disk
* [CCR] Delay auto follow license check
so that we're sure that there are auto follow patterns configured
Otherwise we log a warning in case someone is running with basic or gold
license and has not used the ccr feature.
This is a new index privilege that the user needs to have in the follow cluster.
This privilege is required in addition to the `manage_ccr` cluster privilege in
order to execute the create and follow api.
Closes#33555
* Correctly handle NONE keyword for system keystore
As defined in the PKCS#11 reference guide
https://docs.oracle.com/javase/8/docs/technotes/guides/security/p11guide.html
PKCS#11 tokens can be used as the JSSE keystore and truststore and
the way to indicate this is to set `javax.net.ssl.keyStore` and
`javax.net.ssl.trustStore` to `NONE` (case sensitive).
This commits ensures that we honor this convention and do not
attempt to load the keystore or truststore if the system property is
set to NONE.
* Handle password protected system truststore
When a PKCS#11 token is used as the system truststore, we need to
pass a password when loading it, even if only for reading
certificate entries. This commit ensures that if
`javax.net.ssl.trustStoreType` is set to `PKCS#11` (as it would
when a PKCS#11 token is in use) the password specified in
`javax.net.ssl.trustStorePassword` is passed when attempting to
load the truststore.
Relates #33459
In some cases we want to deprecate a setting, and then automatically
upgrade uses of that setting to a replacement setting. This commit adds
infrastructure for this so that we can upgrade settings when recovering
the cluster state, as well as when such settings are dynamically applied
on cluster update settings requests. This commit only focuses on cluster
settings, index settings can build on this infrastructure in a
follow-up.
When requesting job stats for `_all`, all ES tasks are accepted
resulting to loads of cluster traffic and a memory overhead.
This commit correctly filters out non ML job tasks.
Closes#33515
We may use different global checkpoints to validate/normalize the range
of a change request if the global checkpoint is advanced between these
calls. If this is the case, then we generate an invalid request range.
This commit reverses the logic for CCR license checks in a few
actions. This is done so that the successful case, which tends to be a
larger block of code, does not require indentation.
We have some listeners in the CCR license tests that invoke Assert#fail
if the onSuccess method for the listener is unexpectedly invoked. This
can leave the main test thread hanging until the test suite times out
rather than failing quickly. This commit adds some latch countdowns so
that we fail quickly if these cases are hit.
In the multi-cluster-with-non-compliant-license tests, we try to write
out a java.policy to a temporary directory. However, if this temporary
directory does not already exist then writing the java.policy file will
fail. This commit ensures that the temporary directory exists before we
attempt to write the java.policy file.
This commit adds license checks for the auto-follow implementation. We
check the license on put auto-follow patterns, and then for every
coordination round we check that the local and remote clusters are
licensed for CCR. In the case of non-compliance, we skip coordination
yet continue to schedule follow-ups.
This commit ensures that we bootstrap a new history_uuid when force
allocating a stale primary. A stale primary should never be the source
of an operation-based recovery to another shard which exists before the
forced-allocation.
Closes#26712
Today when checking settings dependencies, we do not check if fallback
settings are present. This means, for example, that if
cluster.remote.*.seeds falls back to search.remote.*.seeds, and
cluster.remote.*.skip_unavailable and search.remote.*.skip_unavailable
depend on cluster.remote.*.seeds, and we have set search.remote.*.seeds
and search.remote.*.skip_unavailable, then validation will fail because
it is expected that cluster.ermote.*.seeds is set here. This commit
addresses this by also checking fallback settings when validating
dependencies. To do this, we adjust the settings exist method to also
check for fallback settings, a case that it was not handling previously.
Change the logging infrastructure to handle when the node name isn't
available in `elasticsearch.yml`. In that case the node name is not
available until long after logging is configured. The biggest change is
that the node name logging no longer fixed at pattern build time.
Instead it is read from a `SetOnce` on every print. If it is unset it is
printed as `unknown` so we have something that fits in the pattern.
On normal startup we don't log anything until the node name is available
so we never see the `unknown`s.
This endpoint accepts an arbitrary file in the request body and
attempts to determine the structure. If successful it also
proposes mappings that could be used when indexing the file's
contents, and calculates simple statistics for each of the fields
that are useful in the data preparation step prior to configuring
machine learning jobs.
Watcher validates `action.auto_create_index` upon startup. If a user
specifies a pattern that does not contain watcher indices, it raises an
error message to include a list of three indices. However, the indices
are separated by a comma and a space which is not considered in parsing.
With this commit we change the error message string so it does not
contain the additional space thus making it more straightforward to copy
it to the configuration file.
Closes#33369
Relates #33497
Instead of passing DirectoryService which causes yet another dependency
on Store we can just pass in a Directory since we will just call
`DirectoryService#newDirectory()` on it anyway.
This change collapses all metrics aggregations classes into a single package `org.elasticsearch.aggregations.metrics`.
It also restricts the visibility of some classes (aggregators and factories) that should not be used outside of the package.
Relates #22868
Some browsers (eg. Firefox) behave differently when presented with
multiple auth schemes in 'WWW-Authenticate' header. The expected
behavior is that browser select the most secure auth-scheme before
trying others, but Firefox selects the first presented auth scheme and
tries the next ones sequentially. As the browser interpretation is
something that we do not control, we can at least present the auth
schemes in most to least secure order as the server's preference.
This commit modifies the code to collect and sort the auth schemes
presented by most to least secure. The priority of the auth schemes is
fixed, the lower number denoting more secure auth-scheme.
The current order of schemes based on the ES supported auth-scheme is
[Negotiate, Bearer,Basic] and when we add future support for
other schemes we will need to update the code. If need be we will make
this configuration customizable in future.
Unit test to verify the WWW-Authenticate header values are sorted by
server preference as more secure to least secure auth schemes.
Tested with Firefox, Chrome, Internet Explorer 11.
Closes#32699
This commit allows us to use different TranslogRecoveryRunner when
recovering an engine from its local translog. This change is a
prerequisite for the commit-based rollback PR.
Relates #32867
Split function section into multiple chapters
Add String functions
Add (small) section on Conversion/Cast functions
Add missing aggregation functions
Enable documentation testing (was disabled by accident). While at it,
fix failing tests
Improve spec tests to allow multi-line queries (useful for docs)
Add ability to ignore a spec test (name should end with -Ignore)
The main benefit of the upgrade for users is the search optimization for top scored documents when the total hit count is not needed. However this optimization is not activated in this change, there is another issue opened to discuss how it should be integrated smoothly.
Some comments about the change:
* Tests that can produce negative scores have been adapted but we need to forbid them completely: #33309Closes#32899
Many files supplied to the upcoming ML data preparation
functionality will not be "log" files. For example,
CSV files are generally not "log" files. Therefore it
makes sense to rename library that determines the
structure of these files.
Although "file structure" could be considered too broad,
as the library currently only works with a few text
formats, in the future it may be extended to work with
more formats.