Add support for BlockMax WAND via a minExactHits parameter. Hits are counted accurately at least up to this value; above it, the count is an approximation. In distributed search requests the threshold applies per shard, so the count may remain accurate up to numShards * minExactHits. The response includes the value numFoundExact, which is true if the value in numFound is exact and false if it is an approximation.
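The per-shard behaviour described above can be sketched roughly as follows. This is a minimal model, not Solr's code: the class `ShardCountSketch`, its `Count` type, and both methods are hypothetical. It assumes a shard that passes the threshold reports the threshold itself as an approximate count, and that the merged result is exact only if every shard was exact.

```java
import java.util.List;

// Hypothetical model of per-shard hit counting; these are not Solr classes.
final class ShardCountSketch {

    /** A hit count plus a flag saying whether the count is exact. */
    static final class Count {
        final long numFound;
        final boolean numFoundExact;

        Count(long numFound, boolean numFoundExact) {
            this.numFound = numFound;
            this.numFoundExact = numFoundExact;
        }
    }

    /** Counts exactly up to minExactHits, then gives up on exactness. */
    static Count countShard(long matchingDocs, long minExactHits) {
        if (matchingDocs <= minExactHits) {
            return new Count(matchingDocs, true);
        }
        // Assumption: past the threshold the shard reports the threshold
        // as an approximate value instead of the true count.
        return new Count(minExactHits, false);
    }

    /** Sums shard counts; the total is exact only if all shards are exact. */
    static Count merge(List<Count> shards) {
        long total = 0;
        boolean exact = true;
        for (Count c : shards) {
            total += c.numFound;
            exact &= c.numFoundExact;
        }
        return new Count(total, exact);
    }
}
```

Under these assumptions, a merged count can stay exact as long as no single shard exceeds minExactHits, which is where the numShards * minExactHits upper bound comes from.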
TestIndexWriterReader#testDuringAddDelete times out
if nightly hits a configuration that causes tons of flushes
in combination with SMS. This change disables auto-flush for this
test on nightly since it already produces lots of segments by definition.
This reverts commit 28e47549c8.
The use of RegExpQuery as a fallback has to consider that the search string may contain characters which are illegal in regex syntax and need escaping.
Will rethink the approach.
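The escaping problem can be illustrated with a small sketch. It uses java.util.regex as a stand-in, since Lucene's own RegExp syntax differs from it; the class `RegexEscapeSketch` and its methods are hypothetical and are not part of the reverted commit.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Illustration with java.util.regex only; Lucene's RegExp syntax differs.
final class RegexEscapeSketch {

    /** Returns true if the raw search string compiles as a pattern. */
    static boolean compilesUnescaped(String raw) {
        try {
            Pattern.compile(raw);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    /**
     * Matches the candidate against the literal search string followed by
     * at most one extra character. Pattern.quote escapes metacharacters.
     */
    static boolean matchesLiteralPlusOne(String search, String candidate) {
        Pattern p = Pattern.compile(Pattern.quote(search) + ".?");
        return p.matcher(candidate).matches();
    }
}
```

A search string such as `a(b` fails to compile when used raw, and `a.c` compiles but would match `abc` unless it is quoted first.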
Fix for Jira issue 9365, where a search for `abc` doesn't match doc `abcd` if prefix length = 3 and edit distance = 1.
The fix is to rewrite the FuzzyQuery as a regex if prefix length == search string length.
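The expected matching semantics can be sketched as below. This is a simplified model, not FuzzyQuery's implementation: `FuzzyPrefixSketch` and its methods are hypothetical, and it uses plain Levenshtein distance on the part of the term after the prefix (no transpositions).

```java
// Hypothetical model of "exact prefix, fuzzy remainder" matching.
final class FuzzyPrefixSketch {

    /** Plain Levenshtein distance using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1),
                                  prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    /** The prefix must match exactly; the rest may differ by maxEdits. */
    static boolean matches(String term, String candidate,
                           int prefixLength, int maxEdits) {
        int p = Math.min(prefixLength, term.length());
        if (!candidate.startsWith(term.substring(0, p))) {
            return false;
        }
        // When p == term.length() the term's remainder is empty, so the
        // candidate may still carry up to maxEdits extra characters.
        return editDistance(term.substring(p), candidate.substring(p)) <= maxEdits;
    }
}
```

With term `abc`, prefix length 3 and edit distance 1, the remainder of the term is empty and `abcd` is one insertion away, so it should match.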
LUCENE-9068 moved fuzzy automata construction into FuzzyQuery itself. However,
this has the nasty side-effect of blowing up query caches that expect queries to be
fairly small. This commit restores the previous behaviour of caching the large automata
on an AttributeSource shared between segments, while making the construction a
bit clearer by factoring it out into a package-private FuzzyAutomatonBuilder.
The grouping module tests currently all try to test both grouping by term and
grouping by ValueSource. They are quite difficult to follow, however, and it is not
at all easy to add tests for a new grouping type. This commit adds a new
BaseGroupSelectorTestCase class which can be extended to test particular
GroupSelector implementations, and adds tests for TermGroupSelector and
ValueSourceGroupSelector. It also adds a separate test for Block grouping,
so that the distinct grouping types are tested separately.
* Sync French stop words with latest version from Snowball.
This new version removed some French homonyms from the list.
* Use latest master commit from snowball-website
* LUCENE-9354: regenerate with 'gradle snowball'
* LUCENE-9354: add CHANGES.txt entry
TermInSetQuery currently iterates through all its prefix-encoded terms
in order to build an array to pass back to its visitor when visit() is called.
This seems like a waste, particularly when the visitor is not actually
consuming the terms (for example, when doing a clause-count check
before executing a search). This commit changes TermInSetQuery to use
consumeTermsMatching(), and also changes the signature of this method so
that it takes a ByteRunAutomaton supplier to allow for lazy instantiation. In
addition, IndexSearcher's clause count check wasn't counting leaves that
called consumeTermsMatching().
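The benefit of passing a supplier can be sketched with a simplified model. The types below (`LazySupplierSketch`, `ExpensiveMatcher`, `Visitor`) are hypothetical stand-ins rather than Lucene's QueryVisitor API; the point is only that a visitor which never calls get() never pays for construction.

```java
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-ins; not Lucene's QueryVisitor or ByteRunAutomaton.
final class LazySupplierSketch {

    /** Counts how many matchers were actually constructed. */
    static int builds = 0;

    /** Stands in for an automaton that is costly to build. */
    static final class ExpensiveMatcher {
        private final Set<String> terms;

        ExpensiveMatcher(Set<String> terms) {
            builds++; // construction is the expensive step being avoided
            this.terms = terms;
        }

        boolean matches(String term) {
            return terms.contains(term);
        }
    }

    /** Simplified visitor: receives a supplier, not a built matcher. */
    interface Visitor {
        void consumeTermsMatching(String field, Supplier<ExpensiveMatcher> matcher);
    }

    /** The query hands over a supplier; nothing is built here. */
    static void visit(Set<String> terms, Visitor visitor) {
        visitor.consumeTermsMatching("field", () -> new ExpensiveMatcher(terms));
    }
}
```

A visitor that only counts leaves, as in a clause-count check, leaves `builds` untouched, while a visitor that inspects terms triggers exactly one construction per get() call.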