When slices is set as auto, there's an additional network call
needed for the reindex tasks to know how to rethrottle. Sometimes
the rethrottle action happens before the reindex task is fully
initialized, so in the test we wait for the task to be ready.
This commit also adds some safeguards to ensure that
cancel and rethrottle operations are handled correctly.
Closes #26192
If we do not have permissions to write the keystore, an unclear access
denied exception is thrown. This commit catches this exception so that
we can decorate it with a friendlier error message.
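A minimal sketch of the catch-and-decorate approach, not the actual change. The `KeystoreWriter` interface, the `save` method, the file name `elasticsearch.keystore`, the exception type and the message wording are all assumptions made for illustration:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.AccessDeniedException;
import java.nio.file.Path;

final class KeystoreSaveSketch {
    /** Hypothetical stand-in for whatever code writes the keystore file. */
    interface KeystoreWriter {
        void write(Path file) throws IOException;
    }

    static void save(Path configDir, KeystoreWriter writer) {
        Path file = configDir.resolve("elasticsearch.keystore"); // assumed file name
        try {
            writer.write(file);
        } catch (AccessDeniedException e) {
            // Decorate the bare access-denied error, keeping the original as the cause.
            throw new IllegalStateException(
                "unable to write keystore [" + file + "]; check the permissions on [" + configDir + "]", e);
        } catch (IOException e) {
            // Other I/O failures are passed through unchanged (wrapped unchecked).
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the original exception as the cause means the stack trace is still available when debugging, while the user sees the friendlier message first.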
Relates #26284
We use `:` for cross-cluster search (e.g. `cluster:index`), so to avoid
ambiguity we should not allow it in cluster or index names.
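As a rough illustration of the kind of check this implies (the class name, method name, exception type and message are hypothetical, not the actual validation code):

```java
final class NameValidationSketch {
    /** Rejects names containing ':' because that character separates cluster and index. */
    static void validateName(String kind, String name) {
        if (name.indexOf(':') >= 0) {
            throw new IllegalArgumentException(
                "invalid " + kind + " name [" + name + "], must not contain ':'");
        }
    }
}
```

With such a check, `validateName("index", "logs-2017")` passes, while `validateName("index", "cluster:index")` is rejected.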
Relates to #23892
We already added the functionality to create a new keystore on startup
in #26126 but apparently failed to persist the keystore. This change adds
persistence and adds a test for the bootstrap loading.
Today a `ClusterState.Custom` can be fetched by a transport client and
leaks to the user even if the classes are private etc since the serialized
bytes can be reconstructed. This change adds an option to customs to mark
them as private such that our clusterstate action will never leak it.
The AwaitsFix issue has been closed, as deleting an index and recreating it with the same name will give the
shard a fresh folder to be written to (based on the index uuid).
Due to the weird way of structuring the serialization code in AcknowledgedRequest, many request types forgot to properly serialize the request timeout, for example "index deletion", "index rollover", "index shrink", "putting pipeline", and other requests. This means that if those requests were not directly sent to the master node, the acknowledgement timeout information would be lost (and the default used instead).
Some requests also don't properly expose the timeout mechanism in the REST layer, such as put / delete stored script. This commit fixes all that.
This commit adds a keystore.seed setting that is automatically
generated when the ES keystore is created. This setting may be used by
plugins as a secure, random value. This commit also auto creates the
keystore upon startup to ensure the new setting is always available.
For the document field equals and hash code tests, we try to mutate the
document field to intentionally produce a document field not equal to
our provided one. We do this by randomly choosing a document field that
has either
- a randomly chosen field name and the same field value as the provided
document field
- a randomly chosen field value and the same field name as the
provided document field
If we are unlucky, it can be that the document field chosen by this
method can be equal to the provided document field. In this case, our
test will fail because the mutation really should be not equal. In this
case, we should simply try the other mutation. Note that the random document
field produced by the second method can be equal to the provided
document because it has the same field name and we can get unlucky with
our randomly chosen field values. It is not the case that the random
document field produced by the first method can be equal to the provided
document field; this is because the current implementation guarantees
that the field name length will be different guaranteeing that we have a
different field name. Nevertheless, we fix the issue here by checking
that our random choice gives us a non-equal document field, and assert
that if we got unlucky the other one will work for us.
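The retry logic described above can be sketched generically as follows. This is an illustration only: `mutate`, `chosen` and `other` are hypothetical names, and the real test works on document fields rather than arbitrary values:

```java
import java.util.function.Supplier;

final class MutationSketch {
    /**
     * Returns a value that is not equal to the original. The randomly chosen
     * mutation is tried first; if it happens to equal the original, the other
     * mutation is used, and we assert that it really differs.
     */
    static <T> T mutate(T original, Supplier<T> chosen, Supplier<T> other) {
        T candidate = chosen.get();
        if (candidate.equals(original)) {
            candidate = other.get();
            if (candidate.equals(original)) {
                throw new AssertionError("both mutations equal the original");
            }
        }
        return candidate;
    }
}
```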
In a few places we need to lazy initialize static deprecation
loggers. This is needed to avoid touching logging before logging is
configured, but deprecation loggers that are used in foundational
classes like settings and parsers would be initialized before logging is
configured. Previously we used a lazy set once pattern which is fine,
but there's a simpler approach: the holder pattern.
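For reference, a minimal sketch of the holder pattern. `FakeDeprecationLogger` and `initCount` are stand-ins invented for this illustration so that the laziness can be observed; they are not part of the codebase:

```java
import java.util.ArrayList;
import java.util.List;

final class LazyHolderSketch {
    static int initCount = 0; // only here to make the lazy initialization observable

    /** Stand-in for a real deprecation logger. */
    static final class FakeDeprecationLogger {
        final List<String> messages = new ArrayList<>();

        FakeDeprecationLogger() {
            initCount++;
        }

        void deprecated(String message) {
            messages.add(message);
        }
    }

    // The nested class is not initialized until it is first referenced, so the
    // logger is only created on the first call to deprecationLogger(). The JVM
    // guarantees that class initialization happens once and is thread-safe.
    private static final class Holder {
        static final FakeDeprecationLogger INSTANCE = new FakeDeprecationLogger();
    }

    static FakeDeprecationLogger deprecationLogger() {
        return Holder.INSTANCE;
    }
}
```

Loading `LazyHolderSketch` does not touch the logger; only the first call to `deprecationLogger()` does, which is what lets foundational classes avoid initializing logging too early.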
Relates #26218
This commit changes the keystore CLI add commands to prompt for
creating the keystore if it does not exist. This will make it easier on
users starting out, not having to run a separate command for creation.
An array of values is required because there is no default (or
reasonable way to set a default). But validation for values
only happens if it is actually set. If the values param is omitted
entirely, then the agg builder will NPE.
This change adds several random test infrastructure improvements whose absence caused
issues in ongoing development but which are generally useful. For instance, it is impossible
to restart a node with a secure setting source since we close it after the node is started.
This change makes it cloneable such that we can reuse it for a restart.
Currently the `percentiles` aggregation allows specifying both possible methods
in the query DSL, but only the last one is used. This change rejects
such requests with an error. Setting the method multiple times via the Java API
still works (and the last one wins).
Closes #26095
This simply removes the default identity hashcode and equals methods in InternalAggregation, which were only temporarily put there while we implemented the methods in the subclasses.
The node setting `cluster.indices.tombstones.size` was not registered with the settings infrastructure, making it impossible for it to be set by a user.
Closes #26191
For CLI tools, we configure logging without reading the
log4j2.properties file. This is because any log statements in a CLI tool
should dump to the console while reading from the log4j2.properties file
would cause them to dump wherever the log configuration there indicates
(e.g., possibly a remote machine). To do this, we added some code to the
base implementation of all CLI tools to configure logging without a
config file. This code is also executed when Elasticsearch starts up. In
the past this was fine yet we previously added detection to
Elasticsearch to find cases where we use logging before it is
configured. Because of configuring logging without a config, this means
we only catch uses of logging before the logging without config is
performed. To correct this, we enable a CLI tool to skip enabling
logging without a config and then in the Elasticsearch CLI we indeed
utilize this to skip configuring logging without a config.
Relates #26209
The flood warning checks the wrong threshold, namely the high
watermark. This would impact any node for which the disk usage is above
the high watermark and below the flood stage watermark. This commit
fixes this so that it compares to the flood threshold.
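To make the intended comparison concrete, here is a small sketch. The names and the use of "percent used" values are assumptions for illustration; the real code may compare free space or byte values instead:

```java
final class DiskWatermarkSketch {
    enum Level { OK, HIGH_WATERMARK, FLOOD_STAGE }

    /** Classifies disk usage; thresholds are percentages with high <= flood. */
    static Level classify(double usedPercent, double highPercent, double floodPercent) {
        // The flood warning must be driven by the flood stage threshold,
        // not by the high watermark.
        if (usedPercent >= floodPercent) {
            return Level.FLOOD_STAGE;
        }
        if (usedPercent >= highPercent) {
            return Level.HIGH_WATERMARK;
        }
        return Level.OK;
    }
}
```

With illustrative thresholds of 90 (high) and 95 (flood), a node at 92 percent usage is above the high watermark but must not trigger the flood warning.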
Relates #26204
* Rewrite range queries with open bounds to exists query
This change rewrites range query with open bounds to an exists query that should be faster to execute.
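A simplified model of the rewrite, using toy query classes invented for this sketch rather than the real query builders. A `null` bound stands for an open bound:

```java
final class RangeRewriteSketch {
    interface Query {}

    static final class ExistsQuery implements Query {
        final String field;

        ExistsQuery(String field) {
            this.field = field;
        }
    }

    static final class RangeQuery implements Query {
        final String field;
        final Object from; // null means no lower bound
        final Object to;   // null means no upper bound

        RangeQuery(String field, Object from, Object to) {
            this.field = field;
            this.from = from;
            this.to = to;
        }

        /** A range with both bounds open matches every document that has a value. */
        Query rewrite() {
            if (from == null && to == null) {
                return new ExistsQuery(field);
            }
            return this;
        }
    }
}
```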
Fixes #22640
`epoch_millis` and `epoch_second` date formats truncate float values, as numbers or as strings.
The `coerce` parameter is not defined for `date` field type and this is not changing.
See PR #26119
Closes #14641
The following token filters were moved: arabic_stem, brazilian_stem, czech_stem, dutch_stem, french_stem, german_stem and russian_stem.
Relates to #23658
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them.
This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
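The rule for choosing the number of slices can be sketched like this. The method name and the `ceiling` parameter are assumptions, and the actual ceiling value is not stated here:

```java
import java.util.Collection;
import java.util.Collections;

final class AutoSlicesSketch {
    /**
     * Picks the number of slices for slices=auto: the smallest shard count
     * among the source indices, capped at the given ceiling.
     */
    static int resolve(Collection<Integer> shardCountsPerSourceIndex, int ceiling) {
        if (shardCountsPerSourceIndex.isEmpty()) {
            throw new IllegalArgumentException("at least one source index is required");
        }
        return Math.min(Collections.min(shardCountsPerSourceIndex), ceiling);
    }
}
```

For source indices with 5, 3 and 8 shards and a hypothetical ceiling of 20, this yields 3 slices; a single index with 50 shards is capped at 20.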
By default we only serialize analyzers if the index analyzer is not the
`default` analyzer or if the `search_analyzer` is different from the index
`analyzer`. This raises issues with the `_all` field when the
`index.analysis.analyzer.default_search` is set, since it automatically makes
the `search_analyzer` different from the index `analyzer`. Then there are
exceptions since we expect the `_all` configuration to be empty on 6.0 indices.
Closes #26136
Today we have a `null` invariant on all `ClusterState.Custom`. This makes
several code paths complicated and requires complex state handling in some cases.
This change allows registering a custom supplier that is used to initialize the
initial clusterstate with these transient customs.
This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.
Closes #24359
Two tests were still using the static indices:
* IndexFolderUpgraderTests#testUpgradeRealIndex()
* InternalEngineTests#testUpgradeOldIndex()
I removed these tests too, because these tests functionally overlap
with the full-cluster-restart qa tests.
Relates to #24939
This occasionally fails now because if `top` is `-Infinity` (which we sometimes
test for in randomization), the value might not get changed for the
equals/hashCode tests.
Closes #26107
* Adds ToXContentFragment
This interface is meant for objects that implement `ToXContent` but are not complete objects. It is basically the opposite of `ToXContentObject`. It means that it will be easier to track the migration of classes over to the fragment/not fragment ToXContent model as it will be clear which classes are not migrated. When no classes directly implement `ToXContent` we can make `ToXContent` package private to be sure that all new classes must implement `ToXContentObject` or `ToXContentFragment`.
* review comments
* more review comments
* javadocs
* iter
* Adds tests
* iter
* adds toString test for aggs
* improves tests following review comments
* iter
* iter
* validate half float values
* test upper bound for numeric mapper
* test for upper bound for float, double and half_float
* more tests on NaN and Infinity for NumberFieldMapper
* fix checkstyle errors
* minor renaming
* comments for disabled test
* tests for byte/short/integer/long removed and will be added in separate PR
* remove unused import
* Fix scaledfloat out of range validation message
* 1) delayed autoboxing in numbertype.parse(...)
2) no redundant checks in half_float validation
3) tests with negative values for half_float/float/double
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string
This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi terms synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)
Note how the multi terms synonym "new york" produces a phrase query.
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific use-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize themselves with the low-level
Lucene API.
When `refresh=wait_for` is set on an indexing request, we register a listener on the shards that is called during the next refresh. During the translog recovery phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Besides being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below).
To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway.
Deadlock with recovery:
When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary may then wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.
* Allow ingest simulate to parse _id, _index, _type, _routing and _parent as either string or int (#23823)
* Generate data that includes Integer and String type fields for testing document parsing.
https://github.com/elastic/elasticsearch/pull/17379 fixed many metric aggs so that if the parent aggregation does not collect any documents an empty bucket value is returned instead of an ArrayIndexOutOfBoundsException being thrown. Unfortunately the value count aggregation was missed from this fix.
This change applies this fix from #17379 for the value count aggregation.
`ClusterSearchShardsResponseTests.testSerialization` randomly uses `IdsQueryBuilderTests` to generate an alias filter. `IdsQueryBuilderTests` checks if the array of current types is length zero but it can also be null, which causes a `NullPointerException`. This change adds a null check to avoid the exception.
Closes #26021
* Adds mutate function to various tests
Relates to #25929
* fix test
* implements mutate function for all single bucket aggs
* review comments
* convert getMutateFunction to mutateInstance
This commit adds the nio transport as an option in place of the mock tcp
transport for tests. Each test will only use one transport type. The
transport type is decided by a random boolean generated inside of the
`ESTestCase` class.
This commit updates the version for master to 7.0.0-alpha1. It also adds
the 6.1 version constant, and fixes many tests, as well as marking some
as awaits fix.
Closes #25893
Closes #25870
We are currently quite lenient about the targets of `copy_to`. However in a
number of cases we can detect illegal use of `copy_to` at mapping update time.
For instance, it does not make sense to use object fields as targets of
`copy_to`, or fields that would end up in a different nested document.
When ES starts up we verify we can write to all data folders and that they support atomic moves. We do so by creating and deleting temp files. If for some reason the files were successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.
This commit makes sure to first clean any existing temporary files.
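A sketch of the "clean first, then probe" idea using `java.nio.file`. The class name, method name and temp file names are assumptions for illustration, not necessarily what the node environment code uses:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class AtomicMoveProbeSketch {
    /** Verifies that the directory is writable and supports atomic moves. */
    static void ensureAtomicMoveSupported(Path dataDir) throws IOException {
        Path src = dataDir.resolve(".es_temp_file.tmp");      // assumed name
        Path target = dataDir.resolve(".es_temp_file.final"); // assumed name
        // Remove leftovers from an earlier run that failed to clean up, so
        // that createFile below cannot fail with "file already exists".
        Files.deleteIfExists(src);
        Files.deleteIfExists(target);
        try {
            Files.createFile(src);
            Files.move(src, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(src);
            Files.deleteIfExists(target);
        }
    }
}
```

`Files.deleteIfExists` is a no-op when the file is absent, so the up-front cleanup is safe on a fresh data folder as well.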
Supersedes #21007
ToXContentToBytes is used as a base class that adds toString and buildAsBytes method implementation to classes that implement ToXContent. With the ongoing cleanups, this class is limited and doesn't add a lot of value, given that buildAsBytes can be replaced with XContentHelper.toXContent and toString can be replaced with Strings.toString(this).
The plan would be to remove ToXContentToBytes entirely, and AbstractQueryBuilder is the first place where we can remove its usage.
During peer recoveries, we need to copy over lucene files and replay the operations they miss from the source translog. Guaranteeing that translog files are not cleaned up has seen many iterations over time. Back in the old 1.0 days, recoveries went through the Engine and actively prevented both translog cleaning and lucene commits. We then moved to a notion called Translog Views, which allowed the recovery code to "acquire" a view into the translog which is then guaranteed to be kept around until the view is closed. The Engine code was free to commit lucene and do whatever it wanted without coordinating with recoveries. Translog file deletion logic was based on reference counting on the file level. Those counters were incremented when a view was acquired but also when the view was used to create a `Snapshot` that allowed you to read operations from the files. At some point we removed the file based counting complexity in favor of constructs on the Translog level that just keep track of "open" views and the minimum translog generation they refer to. To do so, Views had to be kept around until the last snapshot that was made from them was consumed. This was fine in recovery code but led to [a subtle bug](https://github.com/elastic/elasticsearch/pull/25862) in the [Primary Replica Resyncer](https://github.com/elastic/elasticsearch/pull/25862).
Concurrently, we have developed the notion of a `TranslogDeletionPolicy` which is responsible for the liveness aspect of translog files. This class makes it very simple to take translog snapshots into account when keeping translog files around, allowing people that just need a snapshot to just take a snapshot and not worry about views and such. Recovery code which actually does need a view can now prevent trimming by acquiring a simple retention lock (a `Closeable`). This removes the need for the notion of a View.
The following token filters were moved: delimited_payload_filter, keep, keep_types, classic, apostrophe, decimal_digit, fingerprint, min_hash and scandinavian_folding.
Relates to #23658
This commit adds a bootstrap check for the maximum file size, and
ensures the limit is set correctly when Elasticsearch is installed as a
service on systemd-based systems.
Relates #25974
We have a command-line flag -V or --version that can be used to display
the version of Elasticsearch. However, the version that we display does
not contain whether or not the version is a snapshot build. This commit
changes the behavior here so that if the build is a snapshot, that is
included in the version string.
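For illustration only, the displayed string could be built along these lines. The `-SNAPSHOT` suffix and the method name are assumptions; the exact format is not specified here:

```java
final class VersionDisplaySketch {
    /** Appends a snapshot marker to the version when the build is a snapshot. */
    static String display(String version, boolean isSnapshot) {
        return isSnapshot ? version + "-SNAPSHOT" : version;
    }
}
```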
Relates #25970
Previously we manually checked if mutually exclusive options are passed
on the command line. Yet, after an upgrade to our option parser
dependency, we were able to use built-in functionality to establish
these mutually exclusive options and the parser would take care of
checking if such options are passed on the command line. However, the
previous manual checking code is now dead and was left behind. This
commit removes that dead code.
Relates #19278
The failure reason for snapshot shard failures might not be propagated properly if the master node changes after the errors were reported by other data nodes. This commit ensures that the snapshot shard failure reason is preserved properly and adds a workaround for reading old snapshot files where this information might not have been preserved.
Closes #25878
The Writeable representation is less heavy to parse and that will benefit percolate performance and throughput.
The query builder's binary format now has the same bwc guarantees as the xcontent format.
Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equal to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that rely on score comparison.
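A minimal sketch of such a check. The exception type and message are placeholders; the real code may throw a different exception:

```java
final class ScoreCheckSketch {
    /** Returns the score unchanged, or throws if it cannot be compared meaningfully. */
    static float checkScore(float score, int docId) {
        // NaN never compares equal to anything (including itself), so it has
        // to be detected with Float.isNaN rather than ==.
        if (Float.isNaN(score) || score == Float.NEGATIVE_INFINITY) {
            throw new IllegalArgumentException(
                "function score query returned an invalid score: " + score + " for doc: " + docId);
        }
        return score;
    }
}
```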
Fixes #15709
Fixes #23628
This commit fixes tests for environment-aware commands. A previous
change added a check that es.path.conf is not null. The problem is that
this system property is not being set in tests so this check trips every
single time. To fix this, we move the check into a method that can be
overridden, and then override this method in relevant places in tests to
avoid having to set the property in tests. We also add a test that this
check works as expected.
A previous change enabled it so that users could configure the
configuration path via a command-line option --path.conf. However, a
subsequent change has made it so that we expect users to set the
configuration path via the environment variable CONF_DIR. To enable
this, we now pass the value of CONF_DIR as the value for the
command-line option --path.conf. This has two problems:
- the presence of --path.conf always being on the command line breaks
other flags like --help for multi-commands
- the scripts for which --help is not broken say that you can pass
--path.conf but this is a lie since passing it will make it appear
twice in the command-line arguments breaking the script
Since --path.conf is no longer the way that we want users to set the
configuration path, we should remove the --path.conf option. However, we
still need a way to get the configuration path from the scripts to the
running Java process. To do this, we now pass the configuration path as
a system property. This keeps it off the script command line fixing the
above problems.
The only remaining question (that I can see) is whether or not to
respect -Des.path.conf=<some path> if the user sets this in their
jvm.options or via ES_JAVA_OPTS. I think that we should not do this (as
has been our tradition): es.path.home and es.path.conf are special and
should be set by our scripts only. Users should not be setting them at
all, so we should not make any effort to respect these flags if the user
tries to set them some other way.
Relates #25943
With Gradle 4.1 and newer JDK versions, we can finally invoke Gradle directly using a JDK9 JAVA_HOME without requiring a JDK8 to "bootstrap" the build. As the thirdPartyAudit task runs within the JVM that Gradle runs in, it needs to be adapted now to be JDK9 aware.
This commit also changes the `JavaCompile` tasks to only fork if necessary (i.e. when Gradle's JVM and JAVA_HOME's JVM differ).
At the shard level we use an operation permit to coordinate between regular shard operations and special operations that need exclusive access. In ES versions < 6, the operation requiring exclusive access was invoked during primary relocation, but in ES versions >= 6 this exclusive access is also used when a replica learns about a new primary or when a replica is promoted to primary.
These special operations requiring exclusive access delay regular operations from running, by adding them to a queue, and after finishing the exclusive access, release these operations which then need to be put back on the original thread-pool they were running on. In the presence of thread pool rejections, the current implementation had two issues:
- it would not properly release the operation permit when hitting a rejection (i.e. when calling ThreadedActionListener.onResponse from IndexShardOperationPermits.acquire).
- it would not invoke the onFailure method of the action listener when the shard was closed, and just log a warning instead (see ThreadedActionListener.onFailure), which would ultimately lead to the replication task never being cleaned up (see #25863).
This commit fixes both issues by introducing a custom threaded action listener that is permit-aware and properly deals with rejections.
Closes #25863
It fixes random score generation to ensure that you will not always get the
same scores on a read-only index by integrating the seed into the score
computation when using doc ids. It also removes `ctx.docBase` from the formula
since it might change over time if deletes are compacted while scores are
supposed to be cacheable per segment.
Extracts ranges from range queries on byte, short, integer, long, half_float, scaled_float, float, double, date and ip fields.
byte, short, integer and date ranges are normalized to Lucene's LongRange.
half_float and float are normalized to Lucene's DoubleRange.
When extracting range queries, the QueryAnalyzer computes the width of the range. This width is used to determine
what range should be preferred in a conjunction query. The QueryAnalyzer prefers the smaller ranges, because these
ranges tend to match fewer documents.
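The width-based preference can be sketched as follows, using a toy `Range` class with inclusive long bounds. This is an illustration under those assumptions, not the QueryAnalyzer code:

```java
final class RangePreferenceSketch {
    static final class Range {
        final String field;
        final long min; // inclusive, min <= max
        final long max; // inclusive

        Range(String field, long min, long max) {
            this.field = field;
            this.min = min;
            this.max = max;
        }
    }

    /** Prefers the narrower range, which tends to match fewer documents. */
    static Range preferred(Range a, Range b) {
        // max - min can overflow a signed long for very wide ranges, so the
        // widths are compared as unsigned 64-bit values.
        return Long.compareUnsigned(a.max - a.min, b.max - b.min) <= 0 ? a : b;
    }
}
```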
Closes #21040
Today we expose `IndexFieldDataService` outside of IndexService to do maintenance
or lookup field data in different ways. Yet, we have a streamlined way to access IndexFieldData
via `QueryShardContext` that should encapsulate all access to it. This also ensures that we control all other functionality like cache clearing etc.
This change also removes the `recycler` option from `ClearIndicesCacheRequest`; this option is a no-op and should have been removed long ago.
Currently, NioTransport does not start normal socket selectors and the
client when the network server setting is set to false. This commit
makes it so that the client will be started even when the network server
is not enabled.
Additionally, it randomly introduces the NioTransport as an option for
the MockTransportClient throughout tests.
This predicate is used to deal with the intricacies of detecting when a master is reelected or a node rejoins an existing master. The current implementation is based on nodeIds, which is fine if the master really changes. If the nodeId is equal, the code falls back to detecting an increment in the cluster state version, which happens when a node is re-elected or when the node rejoins. Sadly this doesn't cover the case where the same node is elected after a full restart of all master nodes. In that case we recover the cluster state from disk but the version is reset back to 0. To fix this, the check should be done based on ephemeral IDs which are reset on restart.
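A simplified sketch of why ephemeral IDs help. The `Node` class and `masterChanged` method are invented for this illustration, and the real predicate also takes cluster state versions into account:

```java
final class MasterChangeSketch {
    static final class Node {
        final String nodeId;      // stable across restarts
        final String ephemeralId; // regenerated on every restart

        Node(String nodeId, String ephemeralId) {
            this.nodeId = nodeId;
            this.ephemeralId = ephemeralId;
        }
    }

    /** True if a master is now known and it is a different incarnation than before. */
    static boolean masterChanged(Node previousMaster, Node currentMaster) {
        if (currentMaster == null) {
            return false; // no master elected yet
        }
        if (previousMaster == null) {
            return true;
        }
        // Comparing node IDs would miss a restarted node that is elected again.
        return !previousMaster.ephemeralId.equals(currentMaster.ephemeralId);
    }
}
```

A node that keeps its node ID across a full restart still gets a new ephemeral ID, so the same node being elected again is detected as a change.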
Fixes #25471
These two methods do the same thing. The subtle difference between the two is that the former prints out pretty printed content by default while the latter doesn't. There are way more usages of the latter throughout the codebase, hence I kept that variant, although I do think that it would be much better to print out prettified content by default from a `toString`. That breaks quite a few tests so I didn't make that change yet.
Also XContentHelper#toString was outdated as it didn't check the ToXContent#isFragment method to decide whether a new anonymous object has to be created or not. It would simply fail with any ToXContentObject.
The test only waited for one op to be stuck. On rare occasions the other ops were still in flight when recovery captured a translog snapshot, throwing the doc count off.
The configuration removed from the runtime configuration did not
properly remove the deps jar from gradle versions > 3.3. The rest client
now removes both the 3.3 and 3.3+ configurations so this works on both
versions of gradle.
Closes #25884
Relates #25208