Previously, we compared the number of vectors in a segment against the total
number of documents in the IndexReader. As a result, FieldExistsQuery would not
rewrite to MatchAllDocsQuery when the index had multiple segments.
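
To illustrate the difference, here is a minimal sketch that does not use Lucene's reader API. `SegmentStats` and both method names are hypothetical stand-ins, and the sketch assumes the fix compares each segment's vector count against that segment's own document count.

```java
import java.util.List;

final class FieldExistsRewriteSketch {
  /** Hypothetical per-segment statistics; not a Lucene class. */
  static final class SegmentStats {
    final int vectorCount; // number of vectors stored in this segment
    final int docCount;    // number of documents in this segment

    SegmentStats(int vectorCount, int docCount) {
      this.vectorCount = vectorCount;
      this.docCount = docCount;
    }
  }

  /** Old check: each segment's vector count is compared to the reader-wide document count. */
  static boolean canRewriteOld(List<SegmentStats> segments) {
    int totalDocs = segments.stream().mapToInt(s -> s.docCount).sum();
    return segments.stream().allMatch(s -> s.vectorCount == totalDocs);
  }

  /** Assumed fix: each segment's vector count is compared to that segment's document count. */
  static boolean canRewriteNew(List<SegmentStats> segments) {
    return segments.stream().allMatch(s -> s.vectorCount == s.docCount);
  }
}
```

With two fully populated segments, the old check fails because no single segment holds as many vectors as the whole reader has documents, while the per-segment check succeeds.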
Instead of collecting hit-by-hit using a `LeafCollector`, we break down the
search by instantiating a weight, creating scorers, and checking the underlying
iterator. If it is backed by a `BitSet`, we reuse the reference directly (this
is safe because we never modify the `Bits`). Otherwise, we create a new `BitSet`
from the iterator using `BitSet.of`.
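
The branch described above can be sketched roughly as follows. This uses `java.util.BitSet` rather than Lucene's own `BitSet`, and `DocIterator` and `BitSetBackedIterator` are hypothetical types invented for the illustration, so treat it as a sketch of the idea rather than the actual change.

```java
import java.util.BitSet;
import java.util.PrimitiveIterator;

final class FilterBitSetSketch {
  /** Hypothetical iterator over matching doc IDs; a stand-in for the scorer's iterator. */
  interface DocIterator {
    PrimitiveIterator.OfInt docs();
  }

  /** Hypothetical iterator that exposes the bit set it is backed by. */
  static final class BitSetBackedIterator implements DocIterator {
    final BitSet bits;

    BitSetBackedIterator(BitSet bits) {
      this.bits = bits;
    }

    @Override
    public PrimitiveIterator.OfInt docs() {
      return bits.stream().iterator();
    }
  }

  /** Returns the accepted docs as a bit set, reusing the backing set when there is one. */
  static BitSet toBitSet(DocIterator iterator, int maxDoc) {
    if (iterator instanceof BitSetBackedIterator) {
      // Reuse the reference: the caller only reads from it, so no copy is needed.
      return ((BitSetBackedIterator) iterator).bits;
    }
    // Otherwise materialize the iterator into a fresh bit set.
    BitSet result = new BitSet(maxDoc);
    PrimitiveIterator.OfInt docs = iterator.docs();
    while (docs.hasNext()) {
      result.set(docs.nextInt());
    }
    return result;
  }
}
```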
The test used to leave hanging threads behind after a failure. Also, one method was executing two different tests. I split the existing method into two, and I now use setup and teardown to properly close all resources both when the tests succeed and when they fail.
As suggested by @zhaih on #950, we could support more cases in
`BooleanWeight#count`. This PR adds support for these cases specifically:
- Pure disjunctions where only one clause has a non-zero count.
- Pure disjunctions where one clause matches all docs.
- Negations where positive clauses match all docs (pure negation).
- Negations where positive clauses match no docs.
- Negations where negative clauses match no docs.
- Negations where negative clauses match all docs.
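
The cases listed above can be sketched as plain arithmetic on clause counts. This is not the `BooleanWeight#count` code: the class, the method names, and the `UNKNOWN` sentinel are hypothetical, and the sketch assumes the positive and negative clauses have each already been reduced to a single count.

```java
final class BooleanCountSketch {
  /** Hypothetical sentinel meaning "the count cannot be computed cheaply". */
  static final int UNKNOWN = -1;

  /** Count for a pure disjunction, given each clause's count (or UNKNOWN). */
  static int countDisjunction(int[] clauseCounts, int numDocs) {
    int nonZeroClauses = 0;
    int lastNonZero = 0;
    boolean anyUnknown = false;
    for (int count : clauseCounts) {
      if (count == numDocs) {
        return numDocs; // one clause matches all docs, so the disjunction does too
      }
      if (count == UNKNOWN) {
        anyUnknown = true;
      } else if (count > 0) {
        nonZeroClauses++;
        lastNonZero = count;
      }
    }
    if (anyUnknown || nonZeroClauses > 1) {
      return UNKNOWN; // overlapping or unknown clauses: no shortcut
    }
    return lastNonZero; // at most one clause matches anything
  }

  /** Count for "positive AND NOT negative", given both counts (or UNKNOWN). */
  static int countWithNegation(int positive, int negative, int numDocs) {
    if (positive == 0 || negative == numDocs) {
      return 0; // nothing to match, or everything is excluded
    }
    if (positive == UNKNOWN || negative == UNKNOWN) {
      return UNKNOWN;
    }
    if (negative == 0) {
      return positive; // nothing is excluded
    }
    if (positive == numDocs) {
      return numDocs - negative; // pure negation
    }
    return UNKNOWN;
  }
}
```

For example, in an index of 10 documents, a pure negation that excludes 4 documents yields a count of 6 without visiting any postings.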
The HNSW graph search does not consider that visitedLimit may be reached in the
upper levels of the graph search itself.
This occurs when the pre-filter is too restrictive (and its count sets the
visitedLimit). Instead of switching over to exactSearch, the search then tries
to pop from an empty heap and throws an error.
We can check whether the results are incomplete after searching the upper
levels, and break out accordingly. This way it no longer throws heap errors and
gracefully switches to exactSearch instead.
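
The control flow of the fix can be sketched with a toy model. Everything here is hypothetical (the class, the `Strategy` enum, the per-level visit costs, and the exact comparison against the limit); it only shows the early exit from the upper levels, not the real graph searcher.

```java
final class VisitedLimitSketch {
  enum Strategy { APPROXIMATE, EXACT }

  /**
   * Toy model: upperLevelVisits[i] is the number of nodes a search would visit on the i-th
   * upper level (top level first), and bottomLevelVisits is the cost of the final level.
   */
  static Strategy chooseStrategy(int[] upperLevelVisits, int bottomLevelVisits, int visitedLimit) {
    int visited = 0;
    for (int visits : upperLevelVisits) {
      visited += visits;
      if (visited > visitedLimit) {
        // Results are incomplete: stop descending instead of asking the (empty)
        // result heap for the next entry point, and fall back to exact search.
        return Strategy.EXACT;
      }
    }
    visited += bottomLevelVisits;
    return visited > visitedLimit ? Strategy.EXACT : Strategy.APPROXIMATE;
  }
}
```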
With this change, segments are more likely to be considered for merging until
they reach the max merge size. Before this change, LogMergePolicy would exclude
an entire window of `mergeFactor` segments from merging if this window contained
a segment that was too large and the other segments were on the same tier.
* Try to fix the Gradle compilation in IDEA
* Try to detect sync and build phases within IntelliJ and act accordingly to support both modes of compilation (Gradle and IntelliJ).
This adds implementations of `findFullFlushMerges` to `LogMergePolicy` and
`TieredMergePolicy` and enables merge-on-refresh with a default timeout of
500ms.
The idea behind the 500ms default is that it felt both high enough to leave time
to run merges of small segments, and low enough that the freshness of the data
wouldn't be noticeably affected for users who have high refresh rates (e.g.
refreshing every second).
In both cases, `findFullFlushMerges` delegates to `findMerges` and keeps only
the merges whose segments are all below the min/floor size.
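
A minimal sketch of that filtering step, assuming the filter retains merges made up entirely of small segments. Each merge is modelled as a plain list of segment sizes in bytes; the class and method names are hypothetical and do not mirror the `MergePolicy` API.

```java
import java.util.ArrayList;
import java.util.List;

final class FullFlushMergeFilterSketch {
  /**
   * Keeps only the merges whose segments are all smaller than floorSizeBytes.
   * naturalMerges stands in for whatever findMerges would have returned.
   */
  static List<List<Long>> filterFullFlushMerges(List<List<Long>> naturalMerges, long floorSizeBytes) {
    List<List<Long>> kept = new ArrayList<>();
    for (List<Long> merge : naturalMerges) {
      boolean allBelowFloor = merge.stream().allMatch(size -> size < floorSizeBytes);
      if (allBelowFloor) {
        kept.add(merge); // cheap enough to run within the refresh timeout
      }
    }
    return kept;
  }
}
```

Restricting the result to small merges is what makes the 500ms budget plausible: a merge involving a large segment would be unlikely to finish in that time.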