For the document field equals and hash code tests, we try to mutate the
document field to intentionally produce a document field not equal to
our provided one. We do this by randomly choosing a document field that
has either
- a randomly chosen field name and the same field value as the provided
document field
- a randomly chosen field value and the same field value as the
provided document field
If we are unlucky, it can be that the document field chosen by this
method can be equal to the provided document field. In this case, our
test will fail because the mutation really should be not equal. In this
case, we should simply try the other mutation. Note that random document
field produced by the second method can be equal to the provided
document because it has the same field name and we can get unlucky with
our randomly chosen field values. It is not the case that the random
document field produced by the first method can be equal to the provided
document field; this is because the current implementation guarantees
that the field name length will be different guaranteeing that we have a
different field name. Nevertheless, we fix the issue here by checking
that our random choice gives us a non-equal document field, and assert
that if we got unlucky the other one will work for us.
In a few places we need to lazy initialize static deprecation
loggers. This is needed to avoid touching logging before logging is
configured, but deprecation loggers that are used in foundational
classes like settings and parsers would be initialized before logging is
configured. Previously we used a lazy set once pattern which is fine,
but there's a simpler approach: the holder pattern.
Relates #26218
This commits changes the keystore cli add commands to prompt for
creating the keystore if it does not exist. This will make it easier on
users starting out, not having to run a separate command for creation.
An array of values is required because there is no default (or
reasonable way to set a default). But validation for values
only happens if it is actually set. If the values param is omitted
entirely than the agg builder will NPE.
This chance adds several random test infrastructure improvements that caused
issues in on-going developments but are generally useful. For instance is it impossible
to restart a node with a secure setting source since we close it after the node is started.
This change makes it cloneable such that we can reuse it for a restart.
Currently the `percentiles` aggregation allows specifying both possible methods
in the query DSL, but only the later one is used. This changes it to rejecting
such requests with an error. Setting the method multiple times via the java API
still works (and the last one wins).
Closes#26095
This simply removes the default identity hashcode and equals methods in InternalAggregation which where only temporarily put there while we implmeneted the methods in the subclasses.
The node setting `cluster.indices.tombstones.size` was not registered with the settings infrastructure, making it impossible for it to be set by a user.
Closes#26191
For CLI tools, we configure logging without reading the
log4j2.properties file. This because any log statements in a CLI tool
should dump to the console while reading from the log4j2.properties file
would cause them to dump whereever the log configuration there indicates
(e.g., possibly a remote machine). To do this, we added some code to the
base implementation of all CLI tools to configure logging without a
config file. This code is also executed when Elasticsearch starts up. In
the past this was fine yet we previously added detection to
Elasticsearch to find cases where we use logging before it is
configured. Because of configuring logging without a config, this means
we only catch uses of logging before the logging without config is
performed. To correct this, we enable a CLI tool to skip enabling
logging without a config and then in the Elasticsearch CLI we indeed
utilize this to skip configuring logging without a config.
Relates #26209
The flood warning checks the wrong threshold, namely the high
watermark. This would impact any node for which the disk usage is above
the high watermark and below the flood stage watermark. This commit
fixes this so that it compares to the flood threshold.
Relates #26204
* Rewrite range queries with open bounds to exists query
This change rewrites range query with open bounds to an exists query that should be faster to execute.
Fixes#22640
`epoch_millis` and `epoch_second` date formats truncate float values, as numbers or as strings.
The `coerce` parameter is not defined for `date` field type and this is not changing.
See PR #26119Closes#14641
The following token filters were moved: arabic_stem, brazilian_stem, czech_stem, dutch_stem, french_stem, german_stem and russian_stem.
Relates to #23658
In reindex APIs, when using the `slices` parameter to choose the number of slices, adds the option to specify `slices` as "auto" which will choose a reasonable number of slices. It uses the number of shards in the source index, up to a ceiling. If there is more than one source index, it uses the smallest number of shards among them.
This gives users an easy way to use slicing in these APIs without having to make decisions about how to configure it, as it provides a good-enough configuration for them out of the box. This may become the default behavior for these APIs in the future.
By default we only serialize analyzers if the index analyzer is not the
`default` analyzer or if the `search_analyzer` is different from the index
`analyzer`. This raises issues with the `_all` field when the
`index.analysis.analyzer.default_search` is set, since it automatically makes
the `search_analyzer` different from the index `analyzer`. Then there are
exceptions since we expect the `_all` configuration to be empty on 6.0 indices.
Closes#26136
Today we have a `null` invariant on all `ClusterState.Custom`. This makes
several code paths complicated and requires complex state handling in some cases.
This change allows to register a custom supplier that is used to initialize the
initial clusterstate with these transient customs.
This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.
Closes#24359
Two tests were still using the static indices:
* IndexFolderUpgraderTests#testUpgradeRealIndex()
* InternalEngineTests#testUpgradeOldIndex()
I removed these tests too, because these tests functionally overlap
with the full-cluster-restart qa tests.
Relates to #24939
This occasionally fails now because if `top` is `-Infinity` (which we sometimes
test for in randomization), the value might not get changed for the
equals/hashCode tests.
Closes#26107
* Adds ToXContentFragment
This interface is meant for objects that implement `ToXContent` but are not complete objects. It is basically the opposite of `ToXContentObject`. It means that it will be easier to track the migration of classes over to the fragment/not fragment ToXContent model as it will be clear which classes are not migrated. When no classes directly implement `ToXContent` we can make `ToXContent` package private to be sure that all new classes must implement `ToXContentObject` or `ToXContentFragment`.
* review comments
* more review comments
* javadocs
* iter
* Adds tests
* iter
* adds toString test for aggs
* improves tests following review comments
* iter
* iter
* validate half float values
* test upper bound for numeric mapper
* test for upper bound for float, double and half_float
* more tests on NaN and Infinity for NumberFieldMapper
* fix checkstyle errors
* minor renaming
* comments for disabled test
* tests for byte/short/integer/long removed and will be added in separate PR
* remove unused import
* Fix scaledfloat out of range validation message
* 1) delayed autoboxing in numbertype.parse(...)
2) no redudant checks in half_float validation
3) tests with negative values for half_float/float/double
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string
This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi terms synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)
Note how the multi terms synonym "new york" produces a phrase query.
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific usec-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize with the low-level
Lucene API.
When `refresh=wait_for` is set on an indexing request, we register a listener on the shards that are call during the next refresh. During the recover translog phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Next to being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below).
To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway.
Deadlock with recovery:
When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary than may wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.
* Allow ingest simulate to parse _id, _index, _type, _routing and _parent as either string or int (#23823)
* Generate data that includes Integer and String type fields for testing document parsing.
https://github.com/elastic/elasticsearch/pull/17379 fixed many metric aggs so that if the parent aggregation does not collect any documents an empty bucket value is returned instead of an ArrayOutOfBoundsException being thrown. Unfortunately the value count aggregation was mised from this fix.
This change applies this fix from #17379 for the value count aggregation.
`ClusterSearchShardsResponseTests.testSerialization` randomly uses `IdsQueryBuilderTests` to generate an alias filter. `IdsQueryBuilderTests` shecks if the array of current types is length zero but it can also be null which causes a `NullPointerException`. This changes adds a null check to avoid the exception.
Closes#26021
* Adds mutate function to various tests
Relates to #25929
* fix test
* implements mutate function for all single bucket aggs
* review comments
* convert getMutateFunction to mutateIInstance