Now that we unwrap mappings in DocumentMapperParser#extractMappings, it is not
necessary for the mapping definition to always be nested under the type. This
leniency around the mapping format was added in 2341825358.
* The static threadpool leaks a lot of memory in these tests because it prevents things like the connect listeners from `org.elasticsearch.transport.TcpTransport#initiateConnection`
to be GCed between tests (since they keep being referenced by the threadpool) which in turn reference channels and their underlying buffers
* I could not find any slowdown in executing these tests from this change, if anything they are slightly faster now on my machine
* Relates #36906 (which may be caused by slowness from leaking memory and also becomes testable in a loop by this change)
Types can be used both in the source and dest section of the body which will
be translated to search and index requests respectively. Adding a deprecation warning
for those cases and removing examples using more than one type in reindex since
support for this is going to be removed.
The `composite` aggregation uses a TreeMap to keep track of the best buckets.
This ensures a log(n) time cost to insert new buckets but also to retrieve buckets
that are already present in the map. In order to speed up the retrieval of buckets
this change replaces the TreeMap with a priority queue and a HashMap. The insertion
cost is still log(n) but the retrieval of buckets through the HashMap is now done in constant
time. This optimization can bring significant improvement since each document needs
to check if its associated buckets are already present in the current best buckets.
With this commit we rename `node.store.allow_mmapfs` to
`node.store.allow_mmap`. Previously this setting has controlled whether
`mmapfs` could be used as a store type. With the introduction of
`hybridfs` which also relies on memory-mapping,
`node.store.allow_mmapfs` also applies to `hybridfs` and thus we rename
it in order to convey that it is actually used to allow memory-mapping
but not a specific store type.
Relates #36668
Relates #37070
The QueryStringQueryBuilder does not currently delegate to the field mapper's prefixQuery
method, so does not use indexed prefixes. This commit corrects this.
It also fixes a bug where a query a* would not match the word a if indexed prefixes were used with
a minchar setting of 2.
When executing terms aggregations we set the shard_size, meaning the
number of buckets to collect on each shard, to a value that's higher than
the number of requested buckets, to guarantee some basic level of
precision. We have an optimization in place so that we leave shard_size
set to size whenever we are searching against a single shard, in which
case maximum precision is guaranteed by definition.
Such optimization requires us access to the total number of shards that
the search is executing against. In the context of cross-cluster search,
once we will introduce multiple reduction steps (one per cluster) each
cluster will only know the number of local shards, which is problematic
as we should only optimize if we are searching against a single shard in a
single cluster. It could be that we are searching against one shard per cluster
in which case the current code would optimize number of terms causing
a loss of precision.
While discussing how to address the CCS scenario, we decided that we do
not want to introduce further complexity caused by this single shard
optimization, as it benefits only a minority of cases, especially when
the benefits are not so great.
This commit removes the single shard optimization, meaning that we will
always have heuristic enabled on how many number of buckets to collect
on the shards, even when searching against a single shard.
This will cause more buckets to be collected when searching against a single
shard compared to before. If that becomes a problem for some users, they
can work around that by setting the shard_size equal to the size.
Relates to #32125
With #36221 we introduced shards counting to address a rare failure.
This caused a worse problem in this test when replicas were allocated
and shards failures were randomly returned. The latch has to take into
account additional attempts caused by the shard failures, which means
that in order for run to be called, performPhaseOnShard will be called
(numShards + numFailures) times.
To address this, we need to decide upfront which shard is going to fail,
making sure that at least one shards is successful otherwise the whole
request fails.
Closes#37074
With this commit we introduce a new store type `hybridfs` that is a
hybrid between `mmapfs` and `niofs`. This store type chooses different
strategies to read Lucene files based on the read access pattern (random
or linear) in order to optimize performance.
This store type has been available in earlier versions of Elasticsearch
as `default_fs`. We have chosen a different name now in order to convey
the intent of the store type instead of tying it to the fact whether it
is the default choice.
Relates #36668
SearchAsyncActionTests may fail with RejectedExecutionException as InitialSearchPhase may try to execute a runnable after the test has successfully completed, and the corresponding executor was already shut down. The latch was located in getNextPhase that is almost correct, but does not cover for the last finishAndRunNext round that gets executed after onShardResult is invoked.
This commit moves the latch to count the number of shards, and allowing the test to count down later, after finishAndRunNext has been potentially forked. This way nothing else will be executed once the executor is shut down at the end of the tests.
Closes#36221Closes#33699
* Fixes the issue reproduced in the added tests:
* When having open index requests on a shard that are waiting for a refresh, relocating that shard
becomes blocked until that refresh happens (which could be never as in the test scenario).
With #36997 we added the ability to provide a cluster alias with a
SearchRequest.
The next step is to disable the final reduction whenever a cluster alias
is provided with the SearchRequest. A cluster alias will be provided
when executing a cross-cluster search request with alternate execution
mode, where each cluster does its own reduction locally. In order for
the CCS node to be able to later perform an additional reduction of the
results, we need to make sure that all the needed info stays available.
This means that terms aggregations can be reduced but not pruned, and
pipeline aggs should not be executed. The final reduction will happen
later in the CCS coordinating node.
Relates to #36997 & #32125
With the upcoming cross-cluster search alternate execution mode, the CCS
node will be able to split a CCS request into multiple search requests,
one per remote cluster involved. In order to do that, the CCS node has
to be able to signal to each remote cluster that such sub-requests are
part of a CCS request. Each cluster does not know about the other
clusters involved, and does not know either what alias it is given in
the CCS node, hence the CCS coordinating node needs to be able to provide
the alias as part of the search request so that it is used as index prefix
in the returned search hits.
The cluster alias is a notion that's already supported in the search shards
iterator and search shard target, but it is currently used in CCS as both
index prefix and connection lookup key when fanning out to all the shards.
With CCS alternate execution mode the provided cluster alias needs to be
used only as index prefix, as shards are local to each cluster hence no
cluster alias should be used for connection lookups.
The local cluster alias can be set to the SearchRequest at the transport layer
only, and its constructor/getter methods are package private.
Relates to #32125
Improves on #36449 which did not cover the situation where a node had bumped its term during
the election, and not when receiving the first follower check. This was uncovered while refactoring
NodeJoinTests so that they don't need to access to an internal field of Coordinator anymore (which
can now be made private).
* This speeds up the test from an average 25s down to 7s runtime
* There is no need for artificially slowing down the snapshot to reproduce the issue of an out of sync routing table in practice.
Over hundreds of test runs the test's snapshot shard service still runs in the index not found exception every time reproducing this issue.
* Relates #36294
Keys are compared in BucketSortPipelineAggregation so making key type (ArrayMap) implement Comparable. Maps are compared using the entry set's iterator so ordered maps order is maintain. For each entry first comparing key then value. Assuming all keys are strings. When comparing entries' values if type is not identical and\or type not implementing Comparable, throwing exception. Not implementing equals() and hashCode() functions as parent's ones are sufficient. Tests included.
Today the routing of a SourceToParse is assigned in a separate step
after the object is created. We can easily forget to set the routing.
With this commit, the routing must be provided in the constructor of
SourceToParse.
Relates #36921
We introduce a typeless API in #35790 where we translate the default
docType "_doc" to the user-defined docType. However, we do not rewrite
the SourceToParse with the resolved docType. This leads to a situation
where we have two translog operations for the same document with
different types:
- prvOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=1545125562123}]
- newOp [Index{id='9LCpwGcBkJN7eZxaB54L', type='not_doc', seqNo=1,
primaryTerm=1, version=1, autoGeneratedIdTimestamp=-1}]
Closes#36769
In this test, we verify that the LocalCheckpointTracker is initialized
with the operations of the safe commit. And the test fails because
Engine#Index does not implement the equals method (should not
implement as it consists of a mutable ParsedDocument).
Closes#36470
This is a follow-up to some discussions around #36399. Currently we have
relatively confusing compression behavior where compression can be
configured for requests based on transport.compress or a specific
setting for a remote cluster. However, we can only compress responses
based on transport.compress as we do not know where a request is
coming from (currently).
This commit modifies the behavior to NEVER compress responses based on
settings. Instead, a response will only be compressed if the request was
compressed. This commit also updates the documentation to more clearly
described transport level compression.
When the script contexts were created in 6, the use of params.ctx was
deprecated. This commit cleans up that code and ensures that params.ctx
is null in both watcher script contexts.
Relates: #34059
This commit adds a RemoteClusterAwareRequest interface that allows a
request to specify which remote node it should be routed to. The remote
cluster aware client will attempt to route the request directly to this
node. Otherwise it will send it as a proxy action to eventually end up
on the requested node.
It implements the ccr clean_session action with this client.