This is a safer default since sorting by sub aggregations prevents these
aggregations from being deferred. `global_ordinals_hash` will at least
make sure that we do not use memory for buckets that are not collected.
Closes#24359
Two tests were still using the static indices:
* IndexFolderUpgraderTests#testUpgradeRealIndex()
* InternalEngineTests#testUpgradeOldIndex()
I removed these tests too, because these tests functionally overlap
with the full-cluster-restart qa tests.
Relates to #24939
This occasionally fails now because if `top` is `-Infinity` (which we sometimes
test for in randomization), the value might not get changed for the
equals/hashCode tests.
Closes#26107
* Adds ToXContentFragment
This interface is meant for objects that implement `ToXContent` but are not complete objects. It is basically the opposite of `ToXContentObject`. It means that it will be easier to track the migration of classes over to the fragment/not fragment ToXContent model as it will be clear which classes are not migrated. When no classes directly implement `ToXContent` we can make `ToXContent` package private to be sure that all new classes must implement `ToXContentObject` or `ToXContentFragment`.
* review comments
* more review comments
* javadocs
* iter
* Adds tests
* iter
* adds toString test for aggs
* improves tests following review comments
* iter
* iter
* validate half float values
* test upper bound for numeric mapper
* test for upper bound for float, double and half_float
* more tests on NaN and Infinity for NumberFieldMapper
* fix checkstyle errors
* minor renaming
* comments for disabled test
* tests for byte/short/integer/long removed and will be added in separate PR
* remove unused import
* Fix scaledfloat out of range validation message
* 1) delayed autoboxing in numbertype.parse(...)
2) no redudant checks in half_float validation
3) tests with negative values for half_float/float/double
* Add support for auto_generate_synonyms_phrase_query in match_query, multi_match_query, query_string and simple_query_string
This change adds a new parameter called auto_generate_synonyms_phrase_query (defaults to true).
This option can be used in conjunction with synonym_graph token filter to generate phrase queries
when multi terms synonyms are encountered.
For example, a synonym like "ny, new york" would produce the following boolean query when "ny city" is parsed:
((ny OR "new york") AND city)
Note how the multi terms synonym "new york" produces a phrase query.
We introduced a hack in #25885 to respect the cluster alias if available on the `_index` field. This is important if aggregations or other field data related operations are executed. Yet, we added a small hack that duplicated an implementation detail from the `_index` field data builder to make this work. This change adds a necessary but simple API change that allows us to remove the hack and only have a single implementation.
The goal of this similarity is to help users who would like to keep the
functionality of the `tf-idf` similarity that we want to remove, or to allow
for specific usec-cases (disabling idf, disabling tf, disabling length norm,
etc.) to not have to build a custom plugin and familiarize with the low-level
Lucene API.
When `refresh=wait_for` is set on an indexing request, we register a listener on the shards that are call during the next refresh. During the recover translog phase, when the engine is open, we have a window of time when indexing operations succeed and they can add their listeners. Those listeners will only be called when the recovery finishes as we do not refresh during recoveries (unless the indexing buffer is full). Next to being a bad user experience, it can also cause deadlocks with an ongoing peer recovery that may wait for those operations to mark the replica in sync (details below).
To fix this, this PR changes refresh listeners to be a noop when the shard is not yet serving reads (implicitly covering the recovery period). It doesn't matter anyway.
Deadlock with recovery:
When finalizing a peer recovery we mark the peer as "in sync". To do so we wait until the peer's local checkpoint is at least as high as the global checkpoint. If an operation with `refresh=wait_for` is added as a listener on that peer during recovery, it is not completed from the perspective of the primary. The primary than may wait for it to complete before advancing the local checkpoint for that peer. Since that peer is not considered in sync, the global checkpoint on the primary can be higher, causing a deadlock. Operation waits for recovery to finish and a refresh to happen. Recovery waits on the operation.
* Allow ingest simulate to parse _id, _index, _type, _routing and _parent as either string or int (#23823)
* Generate data that includes Integer and String type fields for testing document parsing.
https://github.com/elastic/elasticsearch/pull/17379 fixed many metric aggs so that if the parent aggregation does not collect any documents an empty bucket value is returned instead of an ArrayOutOfBoundsException being thrown. Unfortunately the value count aggregation was mised from this fix.
This change applies this fix from #17379 for the value count aggregation.
`ClusterSearchShardsResponseTests.testSerialization` randomly uses `IdsQueryBuilderTests` to generate an alias filter. `IdsQueryBuilderTests` shecks if the array of current types is length zero but it can also be null which causes a `NullPointerException`. This changes adds a null check to avoid the exception.
Closes#26021
* Adds mutate function to various tests
Relates to #25929
* fix test
* implements mutate function for all single bucket aggs
* review comments
* convert getMutateFunction to mutateIInstance
This commit adds the nio transport as an option in place of the mock tcp
transport for tests. Each test will only use one transport type. The
transport type is decided by a random boolean generated inside of the
`ESTestCase` class.
This commit updates the version for master to 7.0.0-alpha1. It also adds
the 6.1 version constant, and fixes many tests, as well as marking some
as awaits fix.
Closes#25893Closes#25870
We are currently quite lenient about the targets of `copy_to`. However in a
number of cases we can detect illegal use of `copy_to` at mapping update time.
For instance, it does not make sense to use object fields as targets of
`copy_to`, or fields that would end up in a different nested document.
When ES starts up we verify we can write to all data folders and that they support atomic moves. We do so by creating and deleting temp files. If for some reason the files was successfully created but not successfully deleted, we still shut down correctly but subsequent start attempts will fail with a file already exists exception.
This commit makes sure to first clean any existing temporary files.
Superseeds #21007
ToXContentToBytes is used as a base class that adds toString and buildAsBytes method implementation to classes that implement ToXContent. With the ongoing cleanups, this class is limited and doesn't add a lot of value, given that buildAsBytes can be replaced with XContentHelper.toXContent and toString can be replaced with Strings.toString(this).
The plan would be to remove ToXContentToBytes entirely, and AbstractQueryBuilder is the first place where we can remove its usage.
During peer recoveries, we need to copy over lucene files and replay the operations they miss from the source translog. Guaranteeing that translog files are not cleaned up has seen many iterations overtime. Back in the old 1.0 days, recoveries went through the Engine and actively prevented both translog cleaning and lucene commits. We then moved to a notion called Translog Views, which allowed the recovery code to "acquire" a view into the translog which is then guaranteed to be kept around until the view is closed. The Engine code was free to commit lucene and do what it ever it wanted without coordinating with recoveries. Translog file deletion logic was based on reference counting on the file level. Those counters were incremented when a view was acquired but also when the view was used to create a `Snapshot` that allowed you to read operations from the files. At some point we removed the file based counting complexity in favor of constructs on the Translog level that just keep track of "open" views and the minimum translog generation they refer to. To do so, Views had to be kept around until the last snapshot that was made from them was consumed. This was fine in recovery code but lead to [a subtle bug](https://github.com/elastic/elasticsearch/pull/25862) in the [Primary Replica Resyncer](https://github.com/elastic/elasticsearch/pull/25862).
Concurrently, we have developed the notion of a `TranslogDeletionPolicy` which is responsible for the liveness aspect of translog files. This class makes it very simple to take translog Snapshot into account for keep translog files around, allowing people that just need a snapshot to just take a snapshot and not worry about views and such. Recovery code which actually does need a view can now prevent trimming by acquiring a simple retention lock (a `Closable`). This removes the need for the notion of a View.
The following token filters were moved: delimited_payload_filter, keep, keep_types, classic, apostrophe, decimal_digit, fingerprint, min_hash and scandinavian_folding.
Relates to #23658
This commit adds a bootstrap check for the maximum file size, and
ensures the limit is set correctly when Elasticsearch is installed as a
service on systemd-based systems.
Relates #25974
We have a command-line flag -V or --version that can be used to display
the version of Elasticsearch. However, the version that we display does
not contain whether or not the version is a snapshot build. This commit
changes the behavior here so that if the build is a snapshot, that is
included in the version string.
Relates #25970
Previously we manually checked if mutually exclusive options are passed
on the command line. Yet, after an upgrade to our option parser
dependency, we were able to use built-in functionality to establish
these mutually exclusive options and the parser would take care of
checking if such options are passed on the command line. However, the
previous manually checking code is now dead and was left behind. This
commit removes that dead code.
Relates #19278
The failure reason for snapshot shard failures might not be propagated properly if the master node changes after the errors were reported by other data nodes. This commits ensures that the snapshot shard failure reason is preserved properly and adds workaround for reading old snapshot files where this information might not have been preserved.
Closes#25878
The Writeble representation is less heavy to parse and that will benefit percolate performance and throughput.
The query builder's binary format has now the same bwc guarentees as the xcontent format.
Added a qa test that verifies that percolator queries written in older versions are still readable by the current version.
This change merges the functionality of the FiltersFunctionScoreQuery in the FunctionScoreQuery.
It also ensures that an exception is thrown when the computed score is equals to Float.NaN or Float.NEGATIVE_INFINITY.
These scores are invalid for TopDocsCollectors that relies on score comparison.
Fixes#15709Fixes#23628
This commit fixes tests for environment-aware commands. A previous
change added a check that es.path.conf is not null. The problem is that
this system property is not being set in tests so this check trips every
single time. To fix this, we move the check into a method that can be
overridden, and then override this method in relevant places in tests to
avoid having to set the property in tests. We also add a test that this
check works as expected.