This commit changes the phonetic filter factory to use a DaitchMokotoffSoundexFilter
instead of a PhoneticFilter with a daitch_mokotoff encoder when daitch_mokotoff is selected.
The latter does not hanlde branching when computing the soundex and fails to encode multiple
variations when possible.
Closes#28211
Currently we test all maps, arrays or iterables. However, in the case that maps
contain sub maps for instance, we will test the sub maps again even though the
work has already been done for the top-level map.
Relates #26907
Currently if the Lucene version is `X.Y.Z-snapshot-{gitrev}`, then we will
expect the docs to have `X.Y.Z-snapshot` as a Lucene version. I would like
to change it to `X.Y.Z` so that this doesn't need changing when we move from a
snapshot to a final release.
A bug introduced in #24407 currently prevents `eager_global_ordinals` from
being updated. This new approach should fix the issue while still allowing
mapping updates to not specify the `_parent` field if it doesn't need
updating, which was the goal of #24407.
The composite aggregation defers the collection of sub-aggregations to a second pass that visits documents only if they
appear in the top buckets. Though the scorer for sub-aggregations is not set on this second pass and generates an NPE if any sub-aggregation
tries to access the score. This change creates a scorer for the second pass and makes sure that sub-aggs can use it safely to check the score of
the collected documents.
The method `initiateChannel` on `TcpTransport` is explicit in that
channels can be connect asynchronously. All production implementations
do connect asynchronously. Only the blocking `MockTcpTransport`
connects in a synchronous manner. This avoids testing some of the
blocking code in `TcpTransport` that waits on connections to complete.
Additionally, it requires a more extensive method signature than
required for other transports.
This commit modifies the `MockTcpTransport` to make these connections
asynchronously on a different thread. Additionally, it simplifies that
`initiateChannel` method signature.
* Fix synonym phrase query expansion for cross_fields parsing
The `cross_fields` mode for query parser ignores phrase query generated by multi-word synonyms.
In such case only the first field of each analyzer group is kept. This change fixes this issue
by expanding the phrase query for each analyzer group to **all** fields using a disjunction max query.
This is related to #27933. It introduces a jar named elasticsearch-core
in the lib directory. This commit moves the JarHell class from server to
elasticsearch-core. Additionally, PathUtils and some of Loggers are
moved as JarHell depends on them.
* Adds metadata to rewritten aggregations
Previous to this change, if any filters in the filters aggregation were rewritten, the rewritten version of the FiltersAggregationBuilder would not contain the metadata form the original. This is because `AbstractAggregationBuilder.getMetadata()` returns an empty map when not metadata is set.
Closes#28170
* Always set metadata when rewritten
As a replica always keeps a safe commit and starts peer-recovery with
that commit; file-based recovery only happens if new operations are
added to the primary and the required translog is not fully retained. In
the test, we tried to produce this condition by flushing a new commit in
order to trim all translog. However, if the new global checkpoint is not
persisted yet, we will keep two commits and not trim translog. This
commit tightens the file-based condition in the test by waiting for the
global checkpoint persisted properly on the new primary before flushing.
Close#28209
Relates #28181
We introduced a new option `createNewTranslog` in #28181. However, we
named that parameter as deleteLocalTranslog in other places. This commit
makes sure to have a consistent naming in these places.
Relates #28181
The global checkpoint should be assigned to unassigned rather than 0. If
a single document is indexed and the global checkpoint is initialized
with 0, the first commit is safe which the test does not suppose.
Relates #28038
Today a replica starts a peer-recovery with the last commit. If the last
commit is not a safe commit, a replica will immediately fallback to the
file based sync which is more expensive than the sequence based
recovery. This commit modifies the peer-recovery in replica to start
with a safe commit. Moreover we can keep the existing translog on the
target if the recovery is sequence based recovery.
Relates #10708
We are targeting to always have a safe index once the recovery is done.
This invariant does not hold if the translog is manually truncated by
users because the truncate translog cli resets the global checkpoint to
unassigned. This commit assigns the global checkpoint to the max_seqno
of the last commit when truncating translog. We can only safely do it
because the truncate translog command will generate a new history uuid
for that shard. With a new history UUID, sequence-based recovery between
that shard and other old shards will be disabled.
Relates #28181
Releasble locks hold accounting on who holds the lock when assertions
are enabled. However, the underlying lock can be re-entrant yet we mark
the lock as not held by the current thread as soon as the releasable is
closed. For a re-entrant lock this is not right because the thread could
have entered the lock multiple times. Instead, we have to count how many
times the thread has entered the lock and only mark the lock as not held
by the current thread when the counter reaches zero.
Relates #28202
Currently we keep a 5.x index commit as a safe commit until we have a
6.x safe commit. During that time, if peer-recovery happens, a primary
will send a 5.x commit in file-based sync and the recovery will even
fail as the snapshotted commit does not have sequence number tags.
This commit updates the combined deletion policy to delete legacy
commits if there are 6.x commits.
Relates #27606
Relates #28038
KeyedLockTests calls RandomizedTest.scaledRandomIntBetween from multiple threads which uses RandomizedTest.isNightly()
whereas that method is not concurrency-safe (see implementation of GroupEvaluator.isGroupEnabled)
Today a primary shard transfers the most recent commit point to a
replica shard in a file-based recovery. However, the most recent commit
may not be a "safe" commit; this causes a replica shard not having a
safe commit point until it can retain a safe commit by itself.
This commits collapses the snapshot deletion policy into the combined
deletion policy and modifies the peer recovery source to send a safe
commit.
Relates #10708
The unified highlighter selects a single sentence per fragment from the offset of the first highlighted term.
This change modifies this selection and allows more than one sentence in a single fragment.
The expansion is done forward (on the right of the matching offset), sentences are added to the current fragment iff the overall size of the fragment is smaller than the maximum length (fragment_size).
We should also add a way to expand the left context with the surrounding sentences but this is currently avoided because the unified highlighter in Lucene uses only the first offset that matches the query to derive the start and end offset of the next fragment.
If we expand on the left we could split multiple terms that would be grouped otherwise. Breaking this limitation implies some changes in the core of the unified highlighter.
Closes#28089
Currently when adding a document with a `null` value for a range field,
the range field mapper raises an error. Instead we should ignore null like
we do eg. with numbers or geo points.
Closes#27845
Since Elasticsearch 6.1.0 environment variable substitutions in lists do not work.
Environment variables in a list setting were not resolved, because settings with a list type were skipped during variables resolution.
This commit fixes by processing list settings as well.
Closes#27926
* This change makes sure that we don't detect a file path containing a ':' as
a maven coordinate (e.g.: `file:C:\path\to\zip`)
* restore test muted on master
This change modifies the installation for a meta plugin,
the content of the config and bin directory inside each bundled plugins are now moved in the meta plugin directory.
So instead of `$configDir/meta-plugin-name/bundled_plugin/name/` the content of the config
for a bundled plugin is now in `$configDir/meta-plugin-name`. Same applies for the bin directory.