Generalize how queries on `_index` are handled at rewrite time (#52486)
Since this change refactors rewrites, I also took it as an opportunity to adrress #49254: instead of returning the same queries you would get on a keyword field when a field is unmapped, queries get rewritten to a MatchNoDocsQueryBuilder.
This change exposed a couple bugs, like the fact that the percolator doesn't rewrite queries at query time, or that the significant_terms aggregation doesn't rewrite its inner filter, which I fixed.
Closes#49254
Currently we have two ways to create a GroupShardsIterator: one that will resort the iterators based on their natural ordering, and another one that will leave them in their original order. This is currently done through two constructors, one that accepts a single argument which does the sorting, and another which accepts a second boolean argument to control whether sorting should happen or not. This second constructor is only called externally to disable the sorting.
By introducing a specific method to create a sorted shard iterator we clarify and make it easier to track when we do sort and when we do not as the iterators are externally sorted.
This change fixes the incomplete backport of #46731 in 7.x (as of 7.5).
We now check if `max_children` is set on the top level nested sort and fails with an
exception if it's not the case.
Relates #46731Closes#52202
* Remove TODO in MaxAgeCondition serialization
This removes the TODO with a message for any future readers regarding the code in question.
Resolves#52505
Currently the shard bulk request can be rejected by the write threadpool
after a mapping update. This introduces a scenario where the mapping
listener thread will attempt to finish the request and fsync. This
thread can potentially be a transport thread. This commit fixes this
issue by forcing the finish action to happen on the write threadpool.
Fixes#51904.
Lucene's RAMDirectory has been deprecated. This commit replaces all uses of
RAMDirectory in elasticsearch with the newer ByteBuffersDirectory. Most uses
are in tests, but the percolator and painless executor may get some small speedups.
Currently, date ranges queries using NOW-based date math are rewritten to
MatchAllDocs queries when being preprocessed for the percolator. However,
since we added the verification step, this can result in incorrect matches when
percolator queries are run without scores. This commit changes things to instead
wrap date queries that use NOW with a new DateRangeIncludingNowQuery.
This is a simple wrapper query that returns its delegate at rewrite time, but it can
be detected by the percolator QueryAnalyzer and be dealt with accordingly.
This also allows us to remove a method on QueryRewriteContext, and push all
logic relating to NOW-based ranges into the DateFieldMapper.
Fixes#52617
When the Node class is being constructed, an initial environment is
passed in with the initial settings for the node. Once the plugin
servicie is initialized, the final Environment+Settings are created, at
which point the initial environment should no longer be used. This
commit renames the constructor arg to avoid naming clashes with the
final environment variable.
Before boost in script_score query was wrongly applied only to the subquery.
This commit makes sure that the boost is applied to the whole score
that comes out of script.
Closes#48465
In #42838 we moved the terms index of all fields off-heap except the
`_id` field because we were worried it might make indexing slower. In
general, the indexing rate is only affected if explicit IDs are used, as
otherwise Elasticsearch almost never performs lookups in the terms
dictionary for the purpose of indexing. So it's quite wasteful to
require the terms index of `_id` to be loaded on-heap for users who have
append-only workloads. Furthermore I've been conducting benchmarks when
indexing with explicit ids on the http_logs dataset that suggest that
the slowdown is low enough that it's probably not worth forcing the terms
index to be kept on-heap. Here are some numbers for the median indexing
rate in docs/s:
| Run | Master | Patch |
| --- | ------- | ------- |
| 1 | 45851.2 | 46401.4 |
| 2 | 45192.6 | 44561.0 |
| 3 | 45635.2 | 44137.0 |
| 4 | 46435.0 | 44692.8 |
| 5 | 45829.0 | 44949.0 |
And now heap usage in MB for segments:
| Run | Master | Patch |
| --- | ------- | -------- |
| 1 | 41.1720 | 0.352083 |
| 2 | 45.1545 | 0.382534 |
| 3 | 41.7746 | 0.381285 |
| 4 | 45.3673 | 0.412737 |
| 5 | 45.4616 | 0.375063 |
Indexing rate decreased by 1.8% on average, while memory usage decreased
by more than 100x.
The `http_logs` dataset contains small documents and has a simple
indexing chain. More complex indexing chains, e.g. with more fields,
ingest pipelines, etc. would see an even lower decrease of indexing rate.
This drops more of the `instanceof`s from `AggregationPath`. There are
still a couple in `AggregationPath`. And I ended up moving two into
`BucketsAggregator`, but I think this is still an improvement!
We consider index level read_only_allow_delete blocks temporary since
the DiskThresholdMonitor can automatically release those when an index
is no longer allocated on nodes above high threshold.
The rest status has therefore been changed to 429 when encountering this
index block to signal retryability to clients.
Related to #49393
This commit renames ElasticsearchAssertions#assertThrows to
assertRequestBuilderThrows and assertFutureThrows to avoid a
naming clash with JUnit 4.13+ and static imports of these methods.
Additionally, these methods have been updated to make use of
expectThrows internally to avoid duplicating the logic there.
Relates #51787
Backport of #52582
Phase 1 of adding compilation limits per context.
* Refactor rate limiting and caching into separate class,
`ScriptCache`, which will be used per context.
* Disable compilation limit for certain tests.
Backport of 0866031
Refs: #50152
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
Backport of #52596
Cache latest `RepositoryData` on heap when it's absolutely safe to do so (i.e. when the repository is in strictly consistent mode).
`RepositoryData` can safely be assumed to not grow to a size that would cause trouble because we often have at least two copies of it loaded at the same time when doing repository operations. Also, concurrent snapshot API status requests currently load it independently of each other and so on, making it safe to cache on heap and assume as "small" IMO.
The benefits of this move are:
* Much faster repository status API calls
* listing all snapshot names becomes instant
* Other operations are sped up massively too because they mostly operate in two steps: load repository data then load multiple other blobs to get the additional data
* Additional cloud cost savings
* Better resiliency, saving another spot where an IO issue could break the snapshot
* We can simplify a number of spots in the current code that currently pass around the repository data in tricky ways to avoid loading it multiple times in follow ups.
* Refactor Inflexible Snapshot Repository BwC (#52365)
Transport the version to use for a snapshot instead of whether to use shard generations in the snapshots in progress entry. This allows making upcoming repository metadata changes in a flexible manner in an analogous way to how we handle serialization BwC elsewhere.
Also, exposing the version at the repository API level will make it easier to do BwC relevant changes in derived repositories like source only or encrypted.
AllocationDeciders would collect Yes decisions when not asking for debug
info. Changed to only include Yes decisions when debug is requested
(explain).
Added ability to specify comma separated list of source indices without
array. Also fixed so that empty string results in validation error
rather than index does not exist.
Closes#51949
Issue #52000 looks like a case of cluster state updates being slower than
expected, but it seems that these slowdowns are relatively rare: most
invocations of `testDelayWithALargeAmountOfShards` take well under a minute in
CI, but there are occasional failures that take 6+ minutes instead. When it
fails like this, cluster state persistence seems generally slow: most are
slower than expected, with some small updates even taking over 2 seconds to
complete.
The failures all have in common that they use `WindowsFS` to emulate Windows'
behaviour of refusing to delete files that are still open, by tracking all
files (really, inodes) and validating that deleted files are really closed
first. There is a suggestion that this is a little slow in the Lucene test
framework [1]. To see if we can attribute the slowdown to that common factor,
this commit suppresses the use of `WindowsFS` for this test suite.
[1] 4a513fa99f/lucene/test-framework/src/java/org/apache/lucene/util/TestRuleTemporaryFilesCleanup.java (L166)
Currently we lock when generating time based uuids. The lock is
implemented to prevent concurrent writes to the last timestamp. The uuid
generation is an area of contention when indexing. This commit modifies
the code to use atomic compare and set operations to update the last
timestamp.
Every time a setting#exist call is made we lock on the keyset to ensure
that it has been initialized. This a heavyweight operation that only
should be done once. This commit moves to a volatile read instead to
prevent unnecessary locking.
Currently we have three different implementations representing a
`ConnectionManager`. There is the basic `ConnectionManager` which
holds all connections for a cluster. And a remote connection manager
which support proxy behavior. And a stubbable connection manager for
tests. The remote and stubbable instances use the delegate pattern,
so this commit extracts an interface for them all to implement.
It looks like #52000 is caused by a slowdown in cluster state application
(maybe due to #50907) but I would like to understand the details to ensure that
there's nothing else going on here too before simply increasing the timeout.
This commit enables some relevant `DEBUG` loggers and also captures stack
traces from all threads rather than just the three hottest ones.
When `FilterStreamInput` wraps a Netty `ByteBuf` based stream it
did not forward the bulk primitive reads to the delegate.
These are optimized on the delegate but if they're not forwarded
then the delegate will be called e.g. 4 times to read an `int`.
This happens for essentially all network reads prior to this
change because they all run from a `NamedWritableAwareStreamInput`.
This also required optimising `BufferedChecksumStreamInput` individually to use bulk reads from the buffer because it implicitly assumed that the filter stream input wouldn't override any of the bulk operations.
The `top_metrics` agg is kind of like `top_hits` but it only works on
doc values so it *should* be faster.
At this point it is fairly limited in that it only supports a single,
numeric sort and a single, numeric metric. And it only fetches the "very
topest" document worth of metric. We plan to support returning a
configurable number of top metrics, requesting more than one metric and
more than one sort. And, eventually, non-numeric sorts and metrics. The
trick is doing those things fairly efficiently.
Co-Authored by: Zachary Tong <zach@elastic.co>
Fixes the the no-query optimization for `min` and `max` aggregations
for `date_nanos` fields by delegating decoding dates "through" their
`resolution` member.
Closes#52220
This commit makes the names of fetch subphases more consistent:
* Now the names end in just 'Phase', whereas before some ended in
'FetchSubPhase'. This matches the query subphases like AggregationPhase.
* Some names include 'fetch' like FetchScorePhase to avoid ambiguity about what
they do.
This adds a builder and parsed results for the `string_stats`
aggregation directly to the high level rest client. Without this the
HLRC can't access the `string_stats` API without the elastic licensed
`analytics` module.
While I'm in there this adds a few of our usual unit tests and
modernizes the parsing.
When `date_histogram` attempts to optimize itself it for a particular
time zone it checks to see if the entire shard is within the same
"transition". Most time zone transition once every size months or
thereabouts so the optimization can usually kicks in.
*But* it crashes when you attempt feed it a time zone who's last DST
transition was before epoch. The reason for this is a little twisted:
before this patch it'd find the next and previous transitions in
milliseconds since epoch. Then it'd cast them to `Long`s and pass them
into the `DateFieldType` to check if the shard's contents were within
the range. The trouble is they are then converted to `String`s which are
*then* parsed back to `Instant`s which are then convertd to `long`s. And
the parser doesn't like most negative numbers. And everything before
epoch is negative.
This change removes the
`long` -> `Long` -> `String` -> `Instant` -> `long` chain in favor of
passing the `long` -> `Instant` -> `long` which avoids the fairly complex
parsing code and handles a bunch of interesting edge cases around
epoch. And other edge cases around `date_nanos`.
Closes#50265
We need to reduce the translog sync interval for indices with translog
async setting so that we can have the safe commit in the assertBusy
interval. This is needed since #51905, where we use the local checkpoint
of the safe commit to calculate the number of uncommitted operations of
a translog stats.
Closes#52251
Relates #51905
This commit removes the need for DeprecatedRoute and ReplacedRoute to
have an instance of a DeprecationLogger. Instead the RestController now
has a DeprecationLogger that will be used for all deprecated and
replaced route messages.
Relates #51950
Backport of #52278
Add a new cluster setting `search.allow_expensive_queries` which by
default is `true`. If set to `false`, certain queries that have
usually slow performance cannot be executed and an error message
is returned.
- Queries that need to do linear scans to identify matches:
- Script queries
- Queries that have a high up-front cost:
- Fuzzy queries
- Regexp queries
- Prefix queries (without index_prefixes enabled
- Wildcard queries
- Range queries on text and keyword fields
- Joining queries
- HasParent queries
- HasChild queries
- ParentId queries
- Nested queries
- Queries on deprecated 6.x geo shapes (using PrefixTree implementation)
- Queries that may have a high per-document cost:
- Script score queries
- Percolate queries
Closes: #29050
(cherry picked from commit a8b39ed842c7770bd9275958c9f747502fd9a3ea)
The buffer in LoggingOutputStream skips flushing when only a newline
appears. However, if a windows newline appeared, the buffer length was
not reset. This commit resets the length so the \r does not appear in
the next logging message.
closes#51838
MockRandomMergePolicy randomly determines if a segment should use a
compound format. This can cause a force merge performing two merges: (1)
merging to a single segment, (2) rewriting the new segment using the
compound format. If the second merge completes after we have flushed,
then it can flip the flag shouldPeriodicallyFlushAfterBigMerge to true.
Closes#52205
Modifies SLM's and ILM's history indices to be hidden indices for added
protection against accidental querying and deletion, and improves
IndexTemplateRegistry to handle upgrading index templates.
Also modifies the REST test cleanup to delete hidden indices.
This removes a bunch of `instanceof`s in favor of two new methods on
`InernalAggregation`. The default implementations of these methods just
throw exceptions explaining that you can't sort on this aggregation.
They are overridden by all of the classes that used to have `instanceof`
checks against them.
I doubt this is really any faster in practice. The real benefit here is
that it is a little more obvious *that* you can sort by the results of
an aggregation and it should be *much* more obvious where to look at
*how* aggregations sort themselves.
There are still a bunch more `instanceof`s in left in `AggregationPath`
but those will wait for a followup change.
disallow to specify percentile out of range [0,100]. This also fixes a problem in transform by failing
validation if an invalid percentile configuration is used.
Changes the misleading error message when attempting to open
a job while the "cluster.persistent_tasks.allocation.enable"
setting is set to "none" to a clearer message that names the
setting.
Closes#51956
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.
In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
Previously, the dot-index rules (namely, that indices with dot-prefixed
names should be either hidden indices or system indices) was done
before* template application, and so only checked for the `index.hidden`
setting in the request, ignoring if that setting was set via a template.
This commit moves that check to a different method, which is applied
after templates have been resolved and applied to the index settings.
This commit fixes another edge case in handling windows newlines in our
capture of stdout/stderr to log4j. The case is that the \r appears at
the beginning of the buffer when flushing, which would unintentionally
be emitted as an empty string. This commit skips the flush if only a \r
was found.
closes#51838
Currently, the logic for looking up `flattened` field types lives in the
top-level `FieldTypeLookup`. This PR moves it into a dedicated class
`DynamicKeyFieldTypeLookup`.
Segment(s) info blobs are already stored with their full content
in the "hash" field in the shard snapshot metadata as long as they are
smaller than 1MB. We can make use of this fact and never upload them
physically to the repo.
This saves a non-trivial number of uploads and downloads when restoring
and might also lower the latency of searchable snapshots since they can save
phyiscally loading this information as well.
- Enable SunJGSS provider for Kerberos tests
- Handle the fact that in the decrypt method in KeyStoreWrapper might
not throw immediately when the GCM cipher is from BouncyCastle FIPS
and we end up with a DataInputStream that has reached it's end.
- Disable tests, jarHell, testingConventions for ingest attachment
plugin. We don't support this plugin (and document this) in FIPS
mode.
- Don't attempt to install ingest-attachment in smoke-test-plugins
This commit changes how RestHandlers are registered with the
RestController so that a RestHandler no longer needs to register itself
with the RestController. Instead the RestHandler interface has new
methods which when called provide information about the routes
(method and path combinations) that are handled by the handler
including any deprecated and/or replaced combinations.
This change also makes the publication of RestHandlers safe since they
no longer publish a reference to themselves within their constructors.
Closes#51622
Co-authored-by: Jason Tedor <jason@tedor.me>
Backport of #51950
We might leak a searcher if the target shard is removed (i.e., its index
is deleted) or relocated while we are creating a SearchContext from a
SearchRewriteContext.
Relates #51708Closes#52021
I labelled this non-issue for an unreleased bug introduced in #51708.
We can just put the `IndexId` instead of just the index name into the recovery soruce and
save one load of `RepositoryData` on each shard restore that way.
We need to either exclude null responses from the scroll search response
or always create a search context for every target shards, although that
scroll query can be written to match_no_docs. Otherwise, we won't find
search_context for subsequent scroll requests.
This commit implements the latter option as it's less error-prone.
Relates #51708
This change ensures that the rewrite of the shard request is executed in the network thread or in the refresh listener when waiting for an active shard. This allows queries that rewrite to match_no_docs to bypass the search thread pool entirely even if the can_match phase was skipped (pre_filter_shard_size > number of shards). Coordinating nodes don't have the ability to create empty responses so this change also ensures that at least one shard creates a full empty response while the other can return null ones. This is needed since creating true empty responses on shards require to create concrete aggregators which would be too costly to build on a network thread. We should move this functionality to aggregation builders in a follow up but that would be a much bigger change.
This change is also important for #49601 since we want to add the ability to use the result of other shards to rewrite the request of subsequent ones. For instance if the first M shards have their top N computed, the top worst document in the global queue can be pass to subsequent shards that can then rewrite to match_no_docs if they can guarantee that they don't have any document better than the provided one.
QueryBuilders that throw exceptions on shards when building the Lucene query
returns the full serialization of the query builder in the exception message.
For large queries that fails to execute due to the max boolean clause, this means
that we keep a reference of these big messages for every shard that participate
in the request. In order to limit the memory needed to hold these query shard
exceptions in the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user so
there should be no need to repeat it on every shard exception. We could also
omit the entire stack trace for known bad request exception but it would deserve
a separate issue/pr.
Closes#51843Closes#48910
When the `rare_terms` aggregation contained another aggregation it'd
break them. Most of the time. This happened because the process that it
uses to remove buckets that turn out not to be rare was incorrectly
merging results from multiple leaves. This'd cause array index out of
bounds issues. We didn't catch it in the test because the issue doesn't
happen on the very first bucket. And the tests generated data in such a
way that the first bucket always contained the rare terms. Randomizing
the order of the generated data fixed the test so it caught the issue.
Closes#51020
Currently, the same class `FieldCapabilities` is used both to represent the
capabilities for one index, and also the merged capabilities across indices. To
help clarify the logic, this PR proposes to create a separate class
`IndexFieldCapabilities` for the capabilities in one index. The refactor will
also help when adding `source_path` information in #49264, since the merged
source path field will have a different structure from the field for a single index.
Individual changes:
* Add a new class IndexFieldCapabilities.
* Remove extra constructor from FieldCapabilities.
* Combine the add and merge methods in FieldCapabilities.Builder.
Currently, a mappings update request, where dynamic_mappings is an object
instead of an array, results in a http response with a 500 code. This PR checks
for this condition and throws a MapperParsingException like we do for other
malformed mapping cases.
Closes#51486
ActionListener.completeWith would catch exceptions from
listener.onResponse and deliver them to lister.onFailure, essentially
double notifying the listener. Instead we now assert that listeners do
not throw when using ActionListener.completeWith.
Relates #50886
when a timezone is not provided Ingest logic should consider a time to be in a timezone provided as a parameter.
When a timezone is provided Ingest should recalculate a time to the timezone provided as a parameter
closes#51108
backport(#51215)
With the new mechanism for storing cluster state in lucene, we store
index metadata in multiple data paths too. This causes cluster state
publish to timeout too frequently with a 1s timeout, so increasing it to
5s. Also increasing follower check timeout to 5s since it also sometimes
has fsync in its timeout path and leader check for symmetry.
Closes#51329
While we use `== false` as a more visible form of boolean negation
(instead of `!`), the true case is implied and the true value does not
need to explicitly checked. This commit converts cases that have slipped
into the code checking for `== true`.
LoggingOutputStream reads a stream and breaks on newlines. This commit
fixes the behavior to account for windows newlines also containing `\r`.
closes#51532
This commit inlines the `weightShardAdded` and `weightShardRemoved` methods
from the `BalancedShardsAllocator#WeightFunction` that respectively add and
subtract 1 (±ε) from the result of `weight`. It then follows up with a number
of simplifications that this inlining enables.
As a side-effect it also somewhat reduces the number of calls to canRebalance
and canAllocate during rebalancing when there are multiple shards of the same
index on a node that is heavier than average.
When running Elasticsearch on a flaky network, we may see nodes leaving the
cluster with reason `disconnected`. It may be useful to the cluster
administrator to see the full exception that caused the disconnection, but this
is only available with `TRACE` level logging which commingles the details of
the problem with other messages that are not useful to end users.
This commit promotes logging of exceptions in `TcpTransport` from `TRACE` to
`DEBUG` to separate them from the truly `TRACE`-level messages.
This commit switches the strategy for managing dot-prefixed indices that
should be hidden indices from using "fake" system indices to an explicit
exclusions list that must be updated when those indices are converted to
hidden indices.
* Fix InternalEngineTests.testSeqNoAndCheckpoints
If we force flush while possibly triggering a merge the local checkpoint may change
from the expectation from the loop that just increments on every operation.
Closes#51604
There is no reason not to allow deletes in parallel to restores
if they're dealing with different snapshots.
A delete will not remove any files related to the snapshot that
is being restored if it is different from the deleted snapshot
because those files will still be referenced by the restoring
snapshot.
Loading RepositoryData concurrently to modifying it is concurrency
safe nowadays as well since the repo generation is tracked in the
cluster state.
Closes#41463
ActionListener.map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
In #50259 we redirected stdout and stderr to log4j, to capture jdk
and external library messages. However, a typo in the method name used
to redirect the stream in java means stdout is currently being
duplicated twice, and stderr not captured. This commit corrects that
mistake. Unfortunately this is at a level that cannot really be tested,
thus we are still missing tests for this behavior.
We need to warm up the engine (i.e., perform an external refresh) before
accessing the external refresh. Note that we refresh externally before
allowing reading from a shard.
Relates #48605Closes#51548
When checking if a device is up, today we can run into virtual ethernet
devices that disappear while we are in the middle of checking. This
leads to "no such device". This commit addresses such devices by
treating them as not being up, if they are virtual ethernet devices that
disappeared while we were checking.
This commit moves the logic that cancels search requests when the rest channel is closed
to a generic client that can be used by other APIs. This will be useful for any rest action
that wants to cancel the execution of a task if the underlying rest channel is closed by the
client before completion.
Relates #49931
Relates #50990
Relates #50990
* Allow Repository Plugins to Filter Metadata on Create
Add a hook that allows repository plugins to filter the repository metadata
before it gets written to the cluster state.
This commit deprecates the creation of dot-prefixed index names (e.g.
.watches) unless they are either 1) a hidden index, or 2) registered by
a plugin that extends SystemIndexPlugin. This is the first step
towards more thorough protections for system indices.
This commit also modifies several plugins which use dot-prefixed indices
to register indices they own as system indices, and adds a plugin to
register .tasks as a system index.
* Reload secure settings with password (#43197)
If a password is not set, we assume an empty string to be
compatible with previous behavior.
Only allow the reload to be broadcast to other nodes if TLS is
enabled for the transport layer.
* Add passphrase support to elasticsearch-keystore (#38498)
This change adds support for keystore passphrases to all subcommands
of the elasticsearch-keystore cli tool and adds a subcommand for
changing the passphrase of an existing keystore.
The work to read the passphrase in Elasticsearch when
loading, which will be addressed in a different PR.
Subcommands of elasticsearch-keystore can handle (open and create)
passphrase protected keystores
When reading a keystore, a user is only prompted for a passphrase
only if the keystore is passphrase protected.
When creating a keystore, a user is allowed (default behavior) to create one with an
empty passphrase
Passphrase can be set to be empty when changing/setting it for an
existing keystore
Relates to: #32691
Supersedes: #37472
* Restore behavior for force parameter (#44847)
Turns out that the behavior of `-f` for the add and add-file sub
commands where it would also forcibly create the keystore if it
didn't exist, was by design - although undocumented.
This change restores that behavior auto-creating a keystore that
is not password protected if the force flag is used. The force
OptionSpec is moved to the BaseKeyStoreCommand as we will presumably
want to maintain the same behavior in any other command that takes
a force option.
* Handle pwd protected keystores in all CLI tools (#45289)
This change ensures that `elasticsearch-setup-passwords` and
`elasticsearch-saml-metadata` can handle a password protected
elasticsearch.keystore.
For setup passwords the user would be prompted to add the
elasticsearch keystore password upon running the tool. There is no
option to pass the password as a parameter as we assume the user is
present in order to enter the desired passwords for the built-in
users.
For saml-metadata, we prompt for the keystore password at all times
even though we'd only need to read something from the keystore when
there is a signing or encryption configuration.
* Modify docs for setup passwords and saml metadata cli (#45797)
Adds a sentence in the documentation of `elasticsearch-setup-passwords`
and `elasticsearch-saml-metadata` to describe that users would be
prompted for the keystore's password when running these CLI tools,
when the keystore is password protected.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
* Elasticsearch keystore passphrase for startup scripts (#44775)
This commit allows a user to provide a keystore password on Elasticsearch
startup, but only prompts when the keystore exists and is encrypted.
The entrypoint in Java code is standard input. When the Bootstrap class is
checking for secure keystore settings, it checks whether or not the keystore
is encrypted. If so, we read one line from standard input and use this as the
password. For simplicity's sake, we allow a maximum passphrase length of 128
characters. (This is an arbitrary limit and could be increased or eliminated.
It is also enforced in the keystore tools, so that a user can't create a
password that's too long to enter at startup.)
In order to provide a password on standard input, we have to account for four
different ways of starting Elasticsearch: the bash startup script, the Windows
batch startup script, systemd startup, and docker startup. We use wrapper
scripts to reduce systemd and docker to the bash case: in both cases, a
wrapper script can read a passphrase from the filesystem and pass it to the
bash script.
In order to simplify testing the need for a passphrase, I have added a
has-passwd command to the keystore tool. This command can run silently, and
exit with status 0 when the keystore has a password. It exits with status 1 if
the keystore doesn't exist or exists and is unencrypted.
A good deal of the code-change in this commit has to do with refactoring
packaging tests to cleanly use the same tests for both the "archive" and the
"package" cases. This required not only moving tests around, but also adding
some convenience methods for an abstraction layer over distribution-specific
commands.
* Adjust docs for password protected keystore (#45054)
This commit adds relevant parts in the elasticsearch-keystore
sub-commands reference docs and in the reload secure settings API
doc.
* Fix failing Keystore Passphrase test for feature branch (#50154)
One problem with the passphrase-from-file tests, as written, is that
they would leave a SystemD environment variable set when they failed,
and this setting would cause elasticsearch startup to fail for other
tests as well. By using a try-finally, I hope that these tests will fail
more gracefully.
It appears that our Fedora and Ubuntu environments may be configured to
store journald information under /var rather than under /run, so that it
will persist between boots. Our destructive tests that read from the
journal need to account for this in order to avoid trying to limit the
output we check in tests.
* Run keystore management tests on docker distros (#50610)
* Add Docker handling to PackagingTestCase
Keystore tests need to be able to run in the Docker case. We can do this
by using a DockerShell instead of a plain Shell when Docker is running.
* Improve ES startup check for docker
Previously we were checking truncated output for the packaged JDK as
an indication that Elasticsearch had started. With new preliminary
password checks, we might get a false positive from ES keystore
commands, so we have to check specifically that the Elasticsearch
class from the Bootstrap package is what's running.
* Test password-protected keystore with Docker (#50803)
This commit adds two tests for the case where we mount a
password-protected keystore into a Docker container and provide a
password via a Docker environment variable.
We also fix a logging bug where we were logging the identifier for an
array of strings rather than the contents of that array.
* Add documentation for keystore startup prompting (#50821)
When a keystore is password-protected, Elasticsearch will prompt at
startup. This commit adds documentation for this prompt for the archive,
systemd, and Docker cases.
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
* Warn when unable to upgrade keystore on debian (#51011)
For Red Hat RPM upgrades, we warn if we can't upgrade the keystore. This
commit brings the same logic to the code for Debian packages. See the
posttrans file for gets executed for RPMs.
* Restore handling of string input
Adds tests that were mistakenly removed. One of these tests proved
we were not handling the the stdin (-x) option correctly when no
input was added. This commit restores the original approach of
reading stdin one char at a time until there is no more (-1, \r, \n)
instead of using readline() that might return null
* Apply spotless reformatting
* Use '--since' flag to get recent journal messages
When we get Elasticsearch logs from journald, we want to fetch only log
messages from the last run. There are two reasons for this. First, if
there are many logs, we might get a string that's too large for our
utility methods. Second, when we're looking for a specific message or
error, we almost certainly want to look only at messages from the last
execution.
Previously, we've been trying to do this by clearing out the physical
files under the journald process. But there seems to be some contention
over these directories: if journald writes a log file in between when
our deletion command deletes the file and when it deletes the log
directory, the deletion will fail.
It seems to me that we might be able to use journald's "--since" flag to
retrieve only log messages from the last run, and that this might be
less likely to fail due to race conditions in file deletion.
Unfortunately, it looks as if the "--since" flag has a granularity of
one-second. I've added a two-second sleep to make sure that there's a
sufficient gap between the test that will read from journald and the
test before it.
* Use new journald wrapper pattern
* Update version added in secure settings request
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
This commit modifies the bounding box for geogrid unit tests
to only consider bounding boxes that have significant longitudinal
width and whose coordinates are normalized to quantized space
Closes#51103.
We added a new rounding in #50609 that handles offsets to the start and
end of the rounding so that we could support `offset` in the `composite`
aggregation. This starts moving `date_histogram` to that new offset.
This is a redo of #50873 with more integration tests.
This reverts commit d114c9db3e1d1a766f9f48f846eed0466125ce83.
We've been parsing the `time_zone` parameter on `date_hitogram` for a
while but it hasn't *done* anything. This wires it up.
Closes#45199
Inspired by #45200
We shouldn't be using potentially changing versions of the cluster state
when answering a snapshot status API call by calling `SnapshotService#currentSnapshots` multiple times (each time using `ClusterService#state` under the hood) but instead pass down the state from the transport action.
Having these API behave more in a more deterministic way will make it easier to use them once parallel repository operations
are introduced.
This commit overrides the stdout and stderr print streams to be
redirected to the main elasticsearch.log file. While the Elasticsearch
project ensures stdout and stderr are not written to, the jdk or 3rd
party libs may do this, which can be unexepected for users used to
looking the elasticsearch log.
closes#50156
Wait for the cluster to have settled down and have the same accepted version on all nodes before
executing and cancelling request so that a slow CS accept on one node doesn't make it fall behind
and then get sent the full CS because of the diff-version mismatch, breaking the mechanics of this test.
Closes#51308
Added node closed exception to the retryable remote exceptions as it's possible to run into this exception instead of a connect exception when the master node is just shutting down but still responding to requests.
* Fix Inconsistent Shard Failure Count in Failed Snapshots
This fix was necessary to allow for the below test enhancement:
We were not adding shard failure entries to a failed snapshot for those
snapshot entries that were never attempted because the snapshot failed during
the init stage and wasn't partial. This caused the never attempted snapshots
to be counted towards the successful shard count which seems wrong and
broke repository consistency tests.
Also, this change adjusts snapshot resiliency tests to run another snapshot
at the end of each test run to guarantee a correct `index.latest` blob exists
after each run.
Closes#47550
The order indices are returned in in the metadata is not guaranteed.
This commit accounts for any possible ordering in assertions about
hidden indices.
closes#51340
It is permitted for nodes to accept transport connections at addresses other
than their publish address, which allows a good deal of flexibility when
configuring discovery. However, it is not unusual for users to misconfigure
nodes to pick a publish address which is inaccessible to other nodes. We see
this happen a lot if the nodes are on different networks separated by a proxy,
or if the nodes are running in Docker with the wrong kind of network config.
In this case we offer no useful feedback to the user unless they enable
TRACE-level logs. It's particularly tricky to diagnose because if we test
connectivity between the nodes (using their discovery addresses) then all will
appear well.
This commit adds a WARN-level log if this kind of misconfiguration is detected:
the probe connection has succeeded (to indicate that we are really talking to a
healthy Elasticsearch node) but the followup connection attempt fails.
It also tidies up some loose ends in `HandshakingTransportAddressConnector`,
removing some TODOs that need not be completed, and registering its
accidentally-unregistered timeout settings.
IndexWriter might not filter out fully deleted segments if retention
leases exist or the number of the retaining operations is non-zero.
SoftDeletesDirectoryReaderWrapper, however, always filters out fully
deleted segments.
This change uses the original directory reader when calculating segment
stats instead.
Relates #51192Closes#51303
We were loading `RepositoryData` twice during snapshot initialization,
redundantly checking if a snapshot existed already.
The first snapshot existence check is somewhat redundant because a snapshot could be
created between loading `RepositoryData` and updating the cluster state with the `INIT`
state snapshot entry.
Also, it is much safer to do the subsequent checks for index existence in the repo and
and the presence of old version snapshots once the `INIT` state entry prevents further
snapshots from being created concurrently.
While the current state of things will never lead to corruption on a concurrent snapshot
creation, it could result in a situation (though unlikely) where all the snapshot's work
is done on the data nodes, only to find out that the repository generation was off during
snapshot finalization, failing there and leaving a bunch of dead data in the repository
that won't be used in a subsequent snapshot (because the shard generation was never referenced
due to the failed snapshot finalization).
Note: This is a step on the way to parallel repository operations by making snapshot related CS
and repo related CS more tightly correlated.
This reverts commit c7fd24ca1569a809b499caf34077599e463bb8d6.
Now that JDK-8236582 is fixed in JDK 14 EA, we can revert the workaround.
Relates #50523 and #50512
LuceneChangesSnapshot can be slow if nested documents are heavily used.
Also, it estimates the number of operations to be recovered in peer
recoveries inaccurately. With this change, we prefer excluding the
nested non-root documents in a Lucene query instead.
The method parameter is not used in the percentile aggs, instead
the method is determined by the presence of `hdr` or `tdigest`
objects.
Relates to #8324
After we rollover the index we wait for the configured number of shards for the
rolled index to become active (based on the index.write.wait_for_active_shards
setting which might be present in a template, or otherwise in the default case,
for the primaries to become active).
This wait might be long due to disk watermarks being tripped, replicas not
being able to spring to life due to cluster nodes reconfiguration and others
and, the RolloverStep might not complete successfully due to this inherent
transient situation, albeit the rolled index having been created.
(cherry picked from commit 457a92fb4c68c55976cc3c3e2f00a053dd2eac70)
Signed-off-by: Andrei Dan <andrei.dan@elastic.co>
On master failover we have to resent all the shard failed messages,
but the transport requests remain the same in the eyes of `equals`.
If the master failover is registered and the requests to the new master
are sent before all the callbacks have executed and the request to the
old master removed from the deduplicator then the requuests to the new
master will incorrectly fail and the snapshot get stuck.
Closes#51253
Add the character position of a scripting error to error responses.
The contents of the `position` field are experimental and subject to
change. Currently, `offset` refers to the character location where the
error was encountered, `start` and `end` define a range of characters
that contain the error.
eg.
```
{
"error": {
"root_cause": [
{
"type": "script_exception",
"reason": "runtime error",
"script_stack": [
"y = x;",
" ^---- HERE"
],
"script": "def x = new ArrayList(); Map y = x;",
"lang": "painless",
"position": {
"offset": 33,
"start": 29,
"end": 35
}
}
```
Refs: #50993
This replaces the message we return for unknown queries with the standard
one that we use for unknown fields from `ObjectParser`. This is nice
because it includes "did you mean". One day we might convert parsing
queries to using object parser, but that looks complex. This change is
much smaller and seems useful.
Adding back accidentally removed jvm option that is required to enforce
start of the week = Monday in IsoCalendarDataProvider.
Adding a `feature` to yml test in order to skip running it in JDK8
commit that removed it 398c802
commit that backports SystemJvmOptions c4fbda3
relates 7.x backport of code that enforces CalendarDataProvider use #48349
The tests, when creating broken serialized blobs could randomly create
a sequence of bytes that is partially readable by the deserializer and then
not throw `IOException` but instead `ElasticsearchParseException`.
We should just handle these unexpected exceptions downstream properly and pass them
wrapped as `RepositoryException` to the listener to fix the test and keep the API consistent.
This change introduces a new feature for indices so that they can be
hidden from wildcard expansion. The feature is referred to as hidden
indices. An index can be marked hidden through the use of an index
setting, `index.hidden`, at creation time. One primary use case for
this feature is to have a construct that fits indices that are created
by the stack that contain data used for display to the user and/or
intended for querying by the user. The desire to keep them hidden is
to avoid confusing users when searching all of the data they have
indexed and getting results returned from indices created by the
system.
Hidden indices have the following properties:
* API calls for all indices (empty indices array, _all, or *) will not
return hidden indices by default.
* Wildcard expansion will not return hidden indices by default unless
the wildcard pattern begins with a `.`. This behavior is similar to
shell expansion of wildcards.
* REST API calls can enable the expansion of wildcards to hidden
indices with the `expand_wildcards` parameter. To expand wildcards
to hidden indices, use the value `hidden` in conjunction with `open`
and/or `closed`.
* Creation of a hidden index will ignore global index templates. A
global index template is one with a match-all pattern.
* Index templates can make an index hidden, with the exception of a
global index template.
* Accessing a hidden index directly requires no additional parameters.
Backport of #50452
When you declare an ObjectParser with top level named objects like we do
with `significant_terms` we didn't support "did you mean". This fixes
that.
Relates #50938
* Fix Infinite Retry Loop in loading RepositoryData
We were running into an infinite loop when trying to load corrupted (or otherwise un-loadable)
repository data for a repo that uses best effort consistency (e.g. that was just freshly mounted
as done in the test) because we kepy resetting to `-1` on `IOException`, listing and finding the broken
generation `N` and then interpreted the subsequent reset to `-1` as a concurrent change to the repository.
This moves the testing of custom significance heuristic plugins from an
`ESIntegTestCase` to an example plugin. This is *much* more "real" and
can be used as an example for anyone that needs to actually build such a
plugin. The old test had testing concerns and the example all jumbled
together.
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the version created of the first index
that is processed.
Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.
Fixes: #50734
(cherry picked from commit 24e1858)
Backport: #50467
This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.
Example usage in pipeline:
PUT /_ingest/pipeline/2
{
"processors": [
{
"set": {
"field": "pipeline_name",
"value": "{{_ingest.pipeline}}"
}
}
]
}
Closes#42106
Index templates created in the 5x line can still be present in the cluster state
through multiple upgrades, and may have more than one mapping defined. 8x
will stop supporting templates with multiple mappings, and we should emit
deprecation warnings in 7x clusters to give users a chance to update their
templates before upgrading.
Check it out:
```
$ curl -u elastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_update/foo?pretty -d'{
"dac": {}
}'
{
"error" : {
"root_cause" : [
{
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
}
],
"type" : "x_content_parse_exception",
"reason" : "[2:3] [UpdateRequest] unknown field [dac] did you mean [doc]?"
},
"status" : 400
}
```
The tricky thing about implementing this is that x-content doesn't
depend on Lucene. So this works by creating an extension point for the
error message using SPI. Elasticsearch's server module provides the
"spell checking" implementation.
s