The number of user data attributes of an index commit has increased
from 6 to 8, but we forgot to adjust the map that holds them. This
change increases the initial size of that map to avoid resizing.
Reducing the scope of some methods and marking them as static where possible.
Removing "alias" support from AnalysisRegistry#produceAnalyzer and changing that
method to return a NamedAnalyzer instead of having a side effect on the analyzer
map passed in. Also, CustomAnalyzerProvider doesn't seem to need the
`environment` field.
Today if `cluster.routing.rebalance.enable: none` then rebalancing is disabled,
but we still execute `balanceByWeights()` and perform some rather expensive
calculations before discovering that we cannot rebalance any shards. In a large
cluster this can make cluster state updates occur rather slowly. With this
change we check earlier whether rebalancing is globally disabled and, if so,
avoid the rebalancing process entirely.
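A minimal sketch of the early exit, using hypothetical names for the balancer entry point (the actual BalancedShardsAllocator internals differ):
```
// Hypothetical sketch: bail out before the expensive weight calculations
// when rebalancing is disabled cluster-wide.
void balance(RoutingAllocation allocation) {
    if (allocation.deciders().canRebalance(allocation).type() == Decision.Type.NO) {
        return; // rebalancing is globally disabled; nothing to do
    }
    balanceByWeights(allocation);
}
```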
The phrase suggesters have an option to remove terms that have
a frequency lower than a provided min_doc_freq. However this value is
overwritten by the frequency of the original term in the popular mode.
This change ensures that we keep the maximum value between the provided
min_doc_freq and the original term frequency as the threshold to select
candidates.
Fixes #16764
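A sketch of the intended threshold logic, with illustrative variable names rather than the actual suggester code:
```
// In "popular" mode a candidate must be more frequent than the original
// term; with min_doc_freq set it must also clear that floor. Keep the
// stricter of the two constraints instead of overwriting one with the other.
long threshold = Math.max(minDocFreqThreshold, originalTermFrequency);
```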
Today we are strict when parsing build flavor and types off the
wire. This means that if a later version introduces a new build flavor
or type, an older version would not be able to parse what that new
version is sending. For a practical example of this, we recently added
the build type "docker", and this means that in a rolling upgrade
scenario older nodes would not be able to understand the build type that
the newer node is sending. This breaks clusters and is bad. We do not
normally think of adding a new enumeration value as being a
serialization-breaking change; it is just not a lesson that we have
learned before. We should be lenient here though, so that we can add
future values without running the risk of breaking ourselves
horribly. The alternative is super-strict testing infrastructure,
and even then I fear the possibility of mistakes. This
commit changes the parsing of build flavor and build type so that we are
still strict at startup, yet we are lenient with values coming across
the wire. This will help avoid us breaking rolling upgrades, or clients
that are on an older version.
If the transport service is stopped, likely because we are shutting
down, and a retention lease background sync fires, the logs will display
a warning message and stacktrace. Yet this situation is harmless and can
happen as a normal course of business when shutting down. This commit
suppresses the log messages in this case.
The `getIndexShard()` and `sendReplicaRequest()` methods in
TransportReplicationAction are effectively only used to customise some
behaviour in tests. However there are other ways to do this that do not cause
such an obstacle to separating the TransportReplicationAction into its two
halves (see #40706).
This commit removes these customisation points and injects the test-only
behaviour using other techniques.
Many gradle projects specifically use the -try exclude flag because
there are many cases where an auto-closeable resource is never
referenced in the body of the corresponding try statement. Suppressing
this warning specifically in each place where it happens using
`@SuppressWarnings("try")` would be very verbose.
This change removes `-try` from every gradle project and adds it to the
build plugin. It also removes exclude flags from gradle projects
that are already specified in the build plugin (for example -deprecation).
Relates to #40366
Primary-replica resync in a mixed cluster between 6.x and 5.6 can send
operations without sequence numbers to a replica which has already processed
operations with sequence numbers. This leads to the failure of that
replica because we trip the sequence number assertion when writing resync
operations without sequence numbers to the translog.
This change rejects an illegal combination of flush parameters where
force is true, but wait_if_ongoing is false. This combination is trappy
and should be forbidden.
Closes #36342
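A minimal sketch of the validation this implies, assuming it lives in the flush request's validate() method (the actual location and message may differ):
```
// Reject force=true together with wait_if_ongoing=false: forcing a flush
// that silently bails out when another flush is running is trappy.
@Override
public ActionRequestValidationException validate() {
    ActionRequestValidationException validationException = null;
    if (force && waitIfOngoing == false) {
        validationException = ValidateActions.addValidationError(
            "wait_if_ongoing must be true for a force flush", validationException);
    }
    return validationException;
}
```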
If there's an ongoing flush triggered by the translog flush threshold,
we may fail to execute a flush because waitIfOngoing is false by
default.
Relates to #36342
If a refresh, which is scheduled by the setting change, executes after
the index-2 operation and wins the refresh race (i.e., maybeRefresh) against
the scheduledRefresh that we are going to check, then the latter will
return false.
Closes #39565
Relates #39462
PR #40387
A user reported that the same query that takes ~900ms when querying an index
pattern only takes ~50ms when only querying indices that have matches. The
query is a date range query and we confirmed that the `can_match` phase works
as expected. I was able to reproduce this issue locally with a single node: with
900 1-shard indices, a query to an index pattern that matches all indices runs
in ~90ms while a query to the only index that has matches runs in 0-1ms.
This ended up not being related to the `can_match` phase but to the cost of
resolving aliases when querying an index pattern that matches lots of indices.
In that case, we first resolve the index pattern to a list of concrete indices
and then for each concrete index, we check whether it was matched through an
alias, meaning we might have to apply alias filters. Unfortunately this second
per-index operation runs in linear time with the number of matched concrete
indices, which means that alias resolution runs in O(num_indices^2) overall.
So queries get quadratically slower as an index pattern matches more indices.
I reorganized alias resolution into a one-step operation that runs in linear
time with the number of matched indices, and then a per-index operation that
runs in linear time with the number of aliases of this index. This makes alias
resolution run in O(num_indices * num_aliases_per_index) overall instead. When
testing the scenario described above, the `took` went down from ~90ms to ~10ms.
It is still more than the 0-1ms latency that one gets when only querying the
single index that has data, but still much better than what we had before.
Closes #40248
If a field `field_name` was missing in a document,
doc['field_name'].get(0) incorrectly retrieved
a value of the previously accessed document.
This happened because the `get(int index)` function
was just accessing `values[index]` without
checking the number of values, `count`.
This PR fixes this.
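A sketch of the shape of the fix, modeled on the script doc-values classes (field names and messages here are illustrative):
```
// Illustrative sketch: guard element access with the per-document value
// count instead of trusting the reused backing array, which may still
// hold values from the previously accessed document.
public long get(int index) {
    if (count == 0) {
        throw new IllegalStateException("A document doesn't have a value for this field; " +
            "use doc[<field>].size()==0 to check whether it is missing");
    }
    if (index >= count) {
        throw new IndexOutOfBoundsException(
            "attempted to fetch index [" + index + "] but there are only [" + count + "] values");
    }
    return values[index];
}
```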
There were some test failures caused by the background retention lease sync running on a relocated
primary. This commit fixes the situation that triggered the assertion and reactivates the failing test.
Closes #40731
The Eclipse compiler (4.10, Photon) cannot build this test because it cannot
correctly infer the type arguments of the functions. Explicitly adding them
helps in this case.
This change adds the following internal refactorings:
* wraps input analyzers into an unmodifiable map in IndexAnalyzers ctor
* removes duplicated indexSetting in IndexAnalyzers
* removes references to IndexAnalyzers from DocumentMapperParser and TypeParser.ParserContext.
It can always be retrieved from MapperService directly in those cases.
It is important that resync actions are not rejected on the primary even if its
`write` threadpool is overloaded. Today we do this by exposing
`registerRequestHandlers` to subclasses and overriding it in
`TransportResyncReplicationAction`. This isn't ideal because it obscures the
difference between this action and other replication actions, and also might
allow subclasses to try and use some state before they are properly
initialised. This change replaces this override with a constructor parameter to
solve these issues.
Relates #40706
This commit deprecates versions of Java prior to Java 11. This commit
will cause a warning to be printed to standard error when any command
line tool is invoked, or when Elasticsearch is started. Additionally, we
log a deprecation message when Elasticsearch is started.
`TransportReplicationAction` is a rather complex beast, and some of its
concrete implementations do not need all of its features. More specifically, it
(a) chases a primary around the cluster until it manages to pin it down and
then (b) executes an action on that primary and all its replicas. There are
some actions that are coordinated by the primary itself, meaning that there is
no need for the chase-the-primary phases, and in the case of peer recovery
retention leases and primary/replica resync it is important to bypass these
first phases.
This commit is a step towards separating the `TransportReplicationAction` into
these two parts. It is a mostly mechanical sequence of steps to remove some
abstractions that are no longer in use.
Completion and DocStats are pulled from internal readers
instead of external ones since #33835 and #33847, which means we don't
need to refresh after a stats call: refreshes will happen internally
anyhow, and those will cause updated stats during ongoing indexing.
In 6.7.0 (#39378) we added a build type of DOCKER for the docker images, but
unfortunately earlier versions do not understand this and will reject any
transport messages that mention this build type.
This commit fixes this by reporting TAR instead of DOCKER when talking to older
nodes.
Relates (but does not fix) #40511
Relates #39378
In order to remain compatible with the existing joda-based
implementation, the parsing of milliseconds should support single
digits instead of requiring three, even with strict formats.
This adds a few tests to duel against the existing joda based
implementation in order to ensure the parsing behaviour is the same.
Closes #40403
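A sketch of how java.time can be told to accept one to three fraction digits, using `DateTimeFormatterBuilder#appendFraction` (the formatters in the codebase are more involved):
```
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

// Accept 1 to 3 millisecond digits rather than requiring exactly three,
// matching the lenient behaviour of the joda-based implementation.
DateTimeFormatter formatter = new DateTimeFormatterBuilder()
    .appendPattern("HH:mm:ss")
    .appendFraction(ChronoField.MILLI_OF_SECOND, 1, 3, true)
    .toFormatter();

LocalTime.parse("12:34:56.7", formatter);   // a single digit parses
LocalTime.parse("12:34:56.789", formatter); // three digits parse too
```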
Currently, if a Manifest write is unsuccessful (i.e. a WriteStateException
is thrown) we perform cleanup of newly created metadata files.
However, this is wrong.
Consider the following sequence (caught by CI here
https://github.com/elastic/elasticsearch/issues/39077):
- cluster global data is written **successfully**
- the associated manifest write **fails** (during the fsync, i.e. the files
have been written)
- deleting (reverting) the manifest file **fails**, so the metadata is
persisted
- deleting (reverting) the cluster global data is **successful**
In this case, when trying to load metadata (after node restart
because of dirty WriteStateException), the following exception will
happen
```
java.io.IOException: failed to find global metadata [generation: 0]
```
because the manifest file is referencing missing global metadata file.
This commit checks whether the thrown WriteStateException is dirty and,
if it is, we don't perform any cleanup, because the new Manifest file
might have been created but its deletion has failed.
In the future, we might add a more fine-grained check: perform the
cleanup if the WriteStateException is dirty but the Manifest deletion is
successful.
Closes https://github.com/elastic/elasticsearch/issues/39077
(cherry picked from commit 1fac56916bb3c4f3333c639e59188dbe743e385b)
On mapping updates the `text` field mapper does not update
the field types for the underlying prefix and phrase fields.
In practice this shouldn't be considered a bug, but we have
an assert in the code that checks that the field types in the mapper service
are identical to the ones present in the field mappers.
When geo point parsing threw a parse exception, it did not consume
remaining tokens from the parser. This in turn meant that
indexing documents with malformed geo points into mappings with
ignore_malformed=true would fail in some cases, since DocumentParser
expects geo_point parsing to end on the END_OBJECT token.
Related to #17617
As part of #40177 we have added top-level pipeline aggs to
`InternalAggregations`. Given that `QuerySearchResult` holds an
`InternalAggregations` instance, there is no need to keep on setting
top-level pipeline aggs separately. Top-level pipeline aggs can then
always be transported through `InternalAggregations`. This change is
made in a backwards-compatible manner.
IOException is never thrown in any of the existing pipeline aggregation
builders. Removing the throws IOException from the create method allows
removing it also from a couple of other methods, which ends up simplifying
AggregationPhase (one less catch).
The cat recovery API is incredibly useful. Yet it is missing the start
and stop times as options in the output. This commit adds these as
options to the cat recovery API. We elect to make them not visible by
default to avoid breaking output that users might rely on.
To make the script_score query have the same features
as the function_score query, we need to add a randomScore
function.
This function produces different
random scores on different index shards.
It is also able to produce random scores
based on the internal Lucene Document Ids.
Today if you try and insert a very large number like `1e9999999` into a long
field we first construct this number as a `BigDecimal`, convert this to a
`BigInteger` and then reject it because it is out of range. Unfortunately
making such a large `BigInteger` is rather expensive.
We can avoid this expense by performing a (weaker) range check on the
`BigDecimal` representation of incoming `long`s too.
Relates #26137
Closes #40323
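A sketch of the idea, with illustrative method and message names (the actual coercion code is written differently):
```
import java.math.BigDecimal;

// Illustrative sketch: reject values that can't possibly fit in a long
// while they are still a cheap BigDecimal, before materialising a
// potentially enormous BigInteger such as 1e9999999.
static long toLongExact(BigDecimal value) {
    if (value.compareTo(BigDecimal.valueOf(Long.MAX_VALUE)) > 0
            || value.compareTo(BigDecimal.valueOf(Long.MIN_VALUE)) < 0) {
        throw new IllegalArgumentException("out of range for a long: " + value);
    }
    return value.toBigInteger().longValueExact(); // now cheap: the value is small
}
```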
In #33062 we introduced the `cluster.remote.*.proxy` setting for proxied
connections to remote clusters, but left it deliberately undocumented since it
needed followup work so that it could work with SNI. However, since #32517 is
now closed we can add this documentation and remove the comment about its lack
of documentation.
This commit fixes an edge case in tests where search hits are empty
after the merge but some shards returned hits. This can happen if
the total number of merged hits is less than the provided `from`.
Closes #40553
It initially mentioned the type in the exception because the type used to be
required to uniquely identify a document. This is not necessary anymore given
that indices have at most one type.
`Index` interns its name and uuid. My guess is that the main goal is to avoid
having duplicate strings in the representation of the cluster state. However
I doubt it helps much given that we have many other objects in the cluster state
that we don't try to reuse, and interning has some cost. When looking into
#40263 my profiler pointed to string interning because of the `Index` object
that is created in `QueryShardContext` as one of the bottlenecks of the
`can_match` phase.
Adds the search_as_you_type field type that acts like a text field optimized
for as-you-type search completion. It creates a couple of subfields that analyze
the indexed terms as shingles, against which full terms are queried, and a
prefix subfield that analyzes terms as the largest shingle size used plus
edge-ngrams, against which partial terms are queried.
Adds a match_bool_prefix query type that creates a boolean clause of a term
query for each term except the last, for which a boolean clause with a prefix
query is created.
The match_bool_prefix query is the recommended way of querying a search as you
type field, which will boil down to term queries for each shingle of the input
text on the appropriate shingle field, and the final (possibly partial) term
as a term query on the prefix field. This field type also supports phrase and
phrase prefix queries, however.
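A sketch of the boolean query shape this describes, using Lucene query classes directly (the real query builder also handles analysis, operators, and fuzziness):
```
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TermQuery;

// match_bool_prefix over the analyzed terms ["quick", "brown", "f"]:
// every term but the last becomes a term clause; the last, a prefix clause.
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(new TermQuery(new Term("body", "quick")), BooleanClause.Occur.SHOULD);
builder.add(new TermQuery(new Term("body", "brown")), BooleanClause.Occur.SHOULD);
builder.add(new PrefixQuery(new Term("body", "f")), BooleanClause.Occur.SHOULD);
BooleanQuery query = builder.build();
```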
This commit adds an InboundHandler to handle inbound message processing.
With this commit, this code is moved out of the TcpTransport.
Additionally, finer-grained unit tests are added to ensure that the
inbound processing works as expected.
Replicated closed indices can't be indexed into or searched, and therefore don't need a shard with
full indexing and search capabilities allocated. We can save on a lot of heap memory for those
indices by not allocating a mapper service and caching infrastructure (which preallocates a constant
amount per instance). Before this change, a 1GB ES instance could host 250 replicated closed
metricbeat indices (each index with one shard). After this change, the same instance can host 7300
replicated closed metricbeat indices (not that this would be a recommended configuration). Most
of the remaining memory is in the cluster state and the IndexSettings object.
Switches "discovery.type: single-node" from using a separate implementation for single-node discovery to using the existing standard discovery implementation, with two small adaptions:
- auto-bootstrapping, but requiring initial_master_nodes not to be set.
- not actively pinging other nodes using the Peerfinder
- not allowing other nodes to join its single-node cluster (if they have e.g. been set up using regular discovery and connect to the single-disco node).
Currently there are some components of message serialization and sending
that still occur in TcpTransport. This commit makes it possible to
send a message without the TcpTransport by moving all of the remaining
application logic to the OutboundHandler. Additionally, it adds unit
tests to ensure that this logic works as expected.
This test inadvertently asserts that the election that occurs after a master
failure is clean. However, messy elections are a fact of life so we should not
fail on a messy election.
This change moves this test away from an `AbstractDisruptionTestCase` since it
does not need the fault detector to be so enthusiastic, and weakens the
assertions to merely say that we ignore states published by the old master
without saying anything about the cleanliness of the election.
Closes #36556
Currently the TransportMessageListener is applied and used in the
Transport class. However, local requests and responses never make it to
this class. This PR moves the listener add/remove methods to the
TransportService. After this change the Transport can only have one
listener set with it. This one listener is the TransportService, which
will then propagate the events to the external listeners.
Additionally, this commit backports #40237:
Remove Tracer from MockTransportService
Java-time fails to parse composite patterns when the first pattern matches only a prefix of the input. It expects patterns in longest-to-shortest order. Because of this, constructing just one DateTimeFormatter with appendOptional is not sufficient. The parsers have to be iterated, and if parsing fails, the next one in order should be used. In order not to degrade performance, parsing should not throw exceptions on failure. Format.parseObject was used as it only returns null when parsing fails and allows checking whether the full input was read.
Closes #39916
Backport of #40100
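A sketch of the iteration strategy described above, using `DateTimeFormatter#toFormat` so that failures return null instead of throwing (the formatter list and error message are illustrative):
```
import java.text.Format;
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.temporal.TemporalAccessor;
import java.util.List;

// Try each formatter in turn; Format.parseObject returns null on failure
// instead of throwing, and ParsePosition lets us require that the whole
// input was consumed, not just a prefix.
static TemporalAccessor parse(String input, List<DateTimeFormatter> formatters) {
    for (DateTimeFormatter formatter : formatters) {
        ParsePosition pos = new ParsePosition(0);
        Format format = formatter.toFormat();
        Object result = format.parseObject(input, pos);
        if (result != null && pos.getIndex() == input.length()) {
            return (TemporalAccessor) result;
        }
    }
    throw new IllegalArgumentException("failed to parse date [" + input + "]");
}
```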
The implementation of TransportIndexAction and TransportDeleteAction as
TransportReplicationAction existed for interoperability with older 5.x nodes, as these older nodes
coordinated single index / delete operations as replication requests. This BWC layer is no longer needed in 7.x,
where these single actions are now mapped to bulk requests. Completely removing the deprecated
transport actions is not possible yet if we want to keep BWC with a 6.x transport client. The best
way here is to wait for the transport client to go away and then just remove the actions.
Each cluster state publication schedules a cancellation task with the provided publication timeout
(30s by default). This scheduled cancellation keeps a reference to the publication, and therefore the
full cluster state that was published. In case of frequently updating a large cluster state, this results
in a large number of cancellation tasks keeping references to all previously published cluster states.
FilterDirectory.getPendingDeletions does not delegate; this is fixed
temporarily by overriding it in StoreDirectory.
This in turn caused duplicate file name use after a trimUnsafeCommits
had been done, since a new IndexWriter would not consider the pending
deletes in IndexFileDeleter. This should only happen on windows (AFAIK).
Reenabled doing index updates for all tests using
IndexShardTests.indexOnReplicaWithGaps (which could fail due to above
when using mocked WindowsFS).
Added getPendingDeletions delegation to all elasticsearch
FilterDirectory subclasses that were not trivial test-only overrides to
minimize the risk of hitting this issue in another case.
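A sketch of the delegation added to the FilterDirectory subclasses (Lucene's `Directory#getPendingDeletions` returns the set of file names that are pending deletion; the class here is illustrative):
```
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;

class DelegatingDirectory extends FilterDirectory {
    DelegatingDirectory(Directory in) {
        super(in);
    }

    // Without this override a FilterDirectory reports no pending deletions,
    // so a new IndexWriter may reuse a file name that the OS (notably
    // Windows) has not actually released yet.
    @Override
    public Set<String> getPendingDeletions() throws IOException {
        return in.getPendingDeletions();
    }
}
```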
This test checks that interval queries constructed against a field with no indexed
positions will throw exceptions. It uses a randomly-built IntervalsSourceProvider
against a fixed set of fields; however, the random source builder can occasionally
provide a source with a fixed field, meaning that even if the top-level query asks
for a set of intervals over a non-indexed field, the source will delegate to another
field, and no exception will be thrown.
This commit changes the test to always use a simple Match provider.
Fixes #40436
* Log Warning on Failed Blob Deletes in BlobStoreRepository
* We should not just debug-log these spots; they all can and will lead to leaked files when snapshot deletion fails.
Right now, the stats API only provides refresh metrics regarding
internal refreshes. This isn't very useful and is somewhat misleading for
cluster administrators, since the internal refreshes are not indicative
of documents being available for search.
In this PR I added a new metric for collecting external refreshes as
they occur and exposing them through the stats API. Now, calling an
endpoint for stats will yield external refresh metrics as well.
Relates #36712
In some cases, a request to perform a retention lease action can arrive
on a primary shard before it is active. In this case, the primary shard
would not yet be in primary mode, tripping an assertion in the
replication tracker. Instead, we should not attempt to perform such
actions on an initializing shard. This commit addresses this by not
returning the primary shard in the single shard iterator if the primary
shard is not yet active.
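A sketch of the guard in the single-shard iterator (the actual routing-table method differs; names here are illustrative):
```
import java.util.Collections;

// Only hand out the primary if it is active; an initializing primary is
// not yet in primary mode and cannot serve retention lease actions.
ShardRouting primary = shardRoutingTable.primaryShard();
if (primary != null && primary.active()) {
    return new PlainShardIterator(shardId, Collections.singletonList(primary));
}
return new PlainShardIterator(shardId, Collections.emptyList());
```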
We were accidentally not mapping the index, which meant dynamic mapping
was choosing floats for the values. This led to enough loss of precision
for the aggregated values to differ slightly from the test doubles,
which accumulated into large differences in the holt output.
This test fix adds an explicit mapping.
This commit adjusts the frequency with which CCR renews retention leases
and with which primaries sync retention leases to replicas. This helps
Lucene reclaim soft-deleted documents more aggressively, which we have
found in some use-cases can help improve performance, and either way
will help keep disk space under more control.
This is the equivalent of the `field_masking_span` query, allowing users to
merge intervals from multiple fields - for example, to search for stemmed tokens
near unstemmed tokens.
Currently, we cannot update the index setting index.translog.sync_interval if the index is open, because it's
not dynamic and can only be updated for a closed index.
Closes #32763
A recent refactoring (#37130) where imports got mixed up (changing Lucene's
IndexNotFoundException to Elasticsearch's IndexNotFoundException) led to many warnings being
logged in case of restoring a fresh snapshot.
This change adds an option to convert a `date` field to nanoseconds resolution
and a `date_nanos` field to millisecond resolution when sorting.
The resolution of the sort can be set using the `numeric_type` option of the
field sort builder. The conversion is done at the shard level and is restricted
to dates from 1970 to 2262 for the nanoseconds resolution in order to avoid
numeric overflow.
If a replica were first reset due to one primary failover and then
promoted (before the resync completes), its MSU would not include changes
since the global checkpoint, leading to errors during translog replay.
Fixed by re-initializing the MSU before restoring local history.
Today we don't return segments stats for closed indices which makes it
hard to tell how much memory such an index would require. With this change
we return the statistics if requested by setting `include_unloaded_segments` to
true on the rest request.
Relates to #39512
Today RareClusterStateIT#testAssignmentWithJustAddedNodes fails on my Mac
because it waits for the default connection timeout of 30 seconds to connect to
a fake node with IP address 0.0.0.0. This connection attempt fails much more
quickly on Linux so the test passes.
This commit fixes this by reducing the connection timeout for this test.
Unlike index operations, which can fail at the document level due to
analyzing errors, delete operations should never fail at the document
level whether soft-deletes is enabled or not. With this change, we will
always fail the engine if we fail to apply a delete operation to Lucene.
Closes #33256
We currently convert pipeline aggregators to their corresponding
InternalAggregation instance as part of the final reduction phase.
They arrive at the coordinating node as part of QuerySearchResult
objects from the shards and, although we may incrementally reduce
aggs (hence we may have some non-final reduces and the final
one later), all the reduction phases happen on the same node.
With CCS minimizing roundtrips though, each cluster performs its
own non-final reduction, and then serializes the results back to
the CCS coordinating node which will perform the final reduction.
This breaks the assumptions made up until now around reductions
happening all on the same node.
With #40101 we have made sure that top-level pipeline aggs are not
reduced as part of the non-final reduction. The next step is to make
sure that they don't get lost, meaning that each coordinating node
needs to send them back to the CCS coordinating node as part of
the top-level `InternalAggregations` object.
Closes #40059
Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips as each cluster ends up reducing its own pipeline aggregators locally while that should only be done by the CCS coordinating node later. This causes issues as after their reduction, pipeline aggs cannot be further reduced, which is what happens with CCS causing errors like "java.lang.UnsupportedOperationException: Not supported" being returned.
Each coordinating node should rather honour the reduce context flag that
indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone.
Note that this bug affects only pipeline aggs that don't have a parent in
the aggs tree, while all the others work well.
Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.
When minimizing round-trips, each cluster returns its own independent
search response. In case sort by field and/or field collapsing were
requested, when one cluster has no results to return, the information
about the field that sorting was based on (SortField array) as well as
the field (and the values) that collapsing was performed on are missing
in the search response. That causes problems as we can't build the
proper `TopDocs` instance which would need to be either `TopFieldDocs`
or `CollapseTopFieldDocs`. The merge routine expects that all the top
docs are of the same exact type which can't be guaranteed. Given that
the problematic results are empty, hence have no impact on the final
results, we can simply skip them.
Relates to #32125
Closes #40067
When using DFS_QUERY_THEN_FETCH search type, the dfs phase is run and
its results are used in the query phase to make scoring accurate.
When using CCS, depending on whether the DFS phase runs in the CCS
coordinating node (like if all shards were local) or in each remote
cluster (when minimizing round-trips), scoring will differ.
This commit disables minimizing round-trips whenever DFS is requested,
as it is not currently possible to ensure that scoring is accurate in
that case.
Relates to #32125
When a node is repurposed to master/no-data or no-master/no-data, v7.x
will not start (see #37748 and #37347). The `elasticsearch repurpose`
tool can fix this by cleaning up the problematic data.
The cache used in the linearizability checker now uses approximately 6x less
memory by changing the cache from a set of (bits, state) tuples into a
map from bits -> { state }.
Each combination of states is kept once only, building on the
assumption that the number of state permutations is small compared to
the number of bits permutations. For those histories that are difficult
to check we will have many bits combinations that use the same state
permutations.
We now end up using approximately 15 bytes per entry compared to 101
bytes before, i.e. a 6x improvement, allowing us to linearizability-check
significantly longer histories.
Re-enabled the linearizability checker in CoordinatorTests, hoping the
above ensures we no longer run out of memory.
Resolves #39437
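A sketch of the data-structure change, with a hypothetical class standing in for the checker's actual internals:
```
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LinearizabilityCache {
    // Before: a Set of (bits, state) tuples, one tuple object per pair.
    // After: one map entry per distinct bits value; the few distinct state
    // permutations are shared across the many bits permutations.
    private final Map<BitSet, Set<Object>> cache = new HashMap<>();

    /** Returns true if this (bits, state) combination was not seen before. */
    boolean add(BitSet bits, Object state) {
        return cache.computeIfAbsent(bits, k -> new HashSet<>()).add(state);
    }
}
```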
We introduced WAIT_CLUSTERSTATE action in #19287 (5.0), but then stopped
using it since #25692 (6.0). This change removes that action and related
code in 7.x and 8.0.
Relates #19287
Relates #25692
This PR introduces AsyncRecoveryTarget which executes remote calls of
peer recovery asynchronously. In this change, we also add a new
assertion to ensure that method sendBatch, which sends a batch of
history operations in phase2, is never called recursively on the same
thread. This new assertion will also be used in method sendFileChunks.
This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible to
sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin.
This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed.
Closes#30758
This change adds an option to the `FieldSortBuilder` that allows transforming the type
of a numeric field into another. Possible values for this option are `long`, which transforms
the source field into an integer, and `double`, which transforms the source field into a floating point.
This new option is useful for cross-index search when the sort field is mapped differently on some
indices. For instance if a field is mapped as a floating point in one index and as an integer in another
it is possible to align the type for both indices using the `numeric_type` option:
```
{
  "sort": {
    "field": "my_field",
    "numeric_type": "double" <1>
  }
}
```
<1> Ensure that values for this field are transformed to a floating point if needed.
This commit removes the cluster state size field from the cluster state
response, and drops the backwards compatibility layer added in 6.7.0 to
continue to support this field. As calculation of this field was
expensive and had dubious value, we have elected to remove this field.
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
The `sampler` agg creates a BestDocsDeferringCollector, which internally
initializes a priority queue of size `shardSize`. This queue is
populated with empty `Object` sentinels, at roughly 16b per
object.
Similarly, the diversified samplers create a DiversifiedTopDocsCollector
which internally tracks PQ slots with ScoreDocKeys, each weighing in at
around 28b.
If the user sets a very abusive `shard_size`, this could easily OOM
a node or cluster, since these PQs are allocated up-front without
any checks.
This commit makes sure that when we create the collector, its size
cannot be greater than the overall index maxDoc, so that we don't
accidentally blow up the node. A similar treatment is done for the
`maxDocsPerValue` parameter of the diversified samplers.
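A sketch of the clamping, assuming access to the shard's IndexReader at collector-creation time (names are illustrative):
```
// Illustrative sketch: never allocate a priority queue larger than the
// number of documents that could possibly fill it.
int maxDoc = searchContext.searcher().getIndexReader().maxDoc();
int effectiveShardSize = Math.min(shardSize, maxDoc);
// pass effectiveShardSize to the deferring collector instead of the raw,
// potentially abusive shard_size
```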
For good measure, this also adds in some CB accounting to try and track
memory usage.
Finally, a redundant array creation is removed to reduce a bit of
temporary memory.