Fix for Jira issue 9365, where a search for `abc` doesn't match the doc `abcd` if prefix length = 3 and edit distance = 1.
The fix is to rewrite the FuzzyQuery as a regex when the prefix length equals the search string length.
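A minimal sketch of why a regex is sufficient in this case, not the actual Lucene change: when the prefix covers the whole search term, every candidate must start with the term, so its edit distance to the term is just the number of extra trailing characters. The class and method names below are hypothetical, and `java.util.regex` stands in for Lucene's own regex query.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Lucene.
final class FuzzyPrefixSketch {
    private FuzzyPrefixSketch() {}

    /** Builds a pattern accepting the term followed by at most maxEdits extra characters. */
    static Pattern wholeTermPrefixPattern(String term, int maxEdits) {
        if (maxEdits < 0) {
            throw new IllegalArgumentException("maxEdits must be >= 0");
        }
        // Quote the term so regex metacharacters in it are matched literally.
        return Pattern.compile(Pattern.quote(term) + ".{0," + maxEdits + "}", Pattern.DOTALL);
    }

    /** True if candidate is within maxEdits of term, given that the whole term is the prefix. */
    static boolean matches(String term, int maxEdits, String candidate) {
        return wholeTermPrefixPattern(term, maxEdits).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("abc", 1, "abcd"));  // one trailing insertion
        System.out.println(matches("abc", 1, "abcde")); // two trailing insertions
    }
}
```

With prefix length 3 and edit distance 1, `abcd` is accepted because it is `abc` plus one inserted character, while `abcde` needs two insertions and is rejected.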
LUCENE-9068 moved fuzzy automata construction into FuzzyQuery itself. However,
this has the nasty side-effect of blowing up query caches that expect queries to be
fairly small. This commit restores the previous behaviour of caching the large automata
on an AttributeSource shared between segments, while making the construction a
bit clearer by factoring it out into a package-private FuzzyAutomatonBuilder.
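The caching pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not Lucene code: `SharedAttributes` is a hypothetical stand-in for the shared AttributeSource, `SmallFuzzyQuery` is a hypothetical query class, and a string is used as a placeholder for the expensive automata.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; none of these classes exist in Lucene.
final class AutomatonCacheSketch {
    private AutomatonCacheSketch() {}

    /** Per-search holder, playing the role of the AttributeSource shared between segments. */
    static final class SharedAttributes {
        private final Map<String, Object> values = new HashMap<>();

        @SuppressWarnings("unchecked")
        <T> T getOrBuild(String key, Supplier<T> builder) {
            // The builder runs only the first time the key is requested.
            return (T) values.computeIfAbsent(key, k -> builder.get());
        }
    }

    /** Stays small and cache-friendly: it stores only parameters, never the automata. */
    static final class SmallFuzzyQuery {
        final String term;
        final int maxEdits;

        SmallFuzzyQuery(String term, int maxEdits) {
            this.term = term;
            this.maxEdits = maxEdits;
        }
    }

    /** Simulates visiting segmentCount segments and returns how often the automata were built. */
    static int buildsForSegments(int segmentCount) {
        AtomicInteger builds = new AtomicInteger();
        SmallFuzzyQuery query = new SmallFuzzyQuery("lucene", 2);
        SharedAttributes shared = new SharedAttributes();
        for (int segment = 0; segment < segmentCount; segment++) {
            String automata = shared.getOrBuild("automata", () -> {
                builds.incrementAndGet();
                // Placeholder for the expensive automaton construction.
                return "automata for " + query.term + "~" + query.maxEdits;
            });
            // Per-segment term enumeration would use `automata` here.
        }
        return builds.get();
    }
}
```

In this sketch the large object lives on the shared holder for the duration of one search, so the query itself remains cheap to keep in a query cache.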
The grouping module tests currently all try to test both grouping by term and
grouping by ValueSource. They are quite difficult to follow, however, and it is not
at all easy to add tests for a new grouping type. This commit adds a new
BaseGroupSelectorTestCase class which can be extended to test particular
GroupSelector implementations, and adds tests for TermGroupSelector and
ValueSourceGroupSelector. It also adds a separate test for Block grouping,
so that the distinct grouping types are tested separately.
* Sync French stop words with latest version from Snowball.
This new version removed some French homonyms from the list.
* Use latest master commit from snowball-website
* LUCENE-9354: regenerate with 'gradle snowball'
* LUCENE-9354: add CHANGES.txt entry
TermInSetQuery currently iterates through all its prefix-encoded terms
in order to build an array to pass back to its visitor when visit() is called.
This seems like a waste, particularly when the visitor is not actually
consuming the terms (for example, when doing a clause-count check
before executing a search). This commit changes TermInSetQuery to use
consumeTermsMatching(), and also changes the signature of this method so
that it takes a BytesRunAutomaton supplier to allow for lazy instantiation. In
addition, IndexSearcher's clause count check wasn't counting leaves that
called consumeTermsMatching().
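A rough sketch of the lazy-instantiation idea, using a deliberately simplified and hypothetical `Visitor` interface rather than Lucene's real visitor API. A string stands in for the automaton, and a counter records how often it is actually built.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical sketch; the interface below is a simplification, not Lucene's API.
final class LazyVisitSketch {
    private LazyVisitSketch() {}

    interface Visitor {
        void consumeTermsMatching(String field, Supplier<String> automaton);
    }

    /** Counts leaves only, as a clause-count check would; it never asks for the automaton. */
    static final class ClauseCounter implements Visitor {
        int leaves;

        @Override
        public void consumeTermsMatching(String field, Supplier<String> automaton) {
            leaves++;
        }
    }

    /** Actually needs the terms, so it triggers construction through the supplier. */
    static final class Materializer implements Visitor {
        String built;

        @Override
        public void consumeTermsMatching(String field, Supplier<String> automaton) {
            built = automaton.get();
        }
    }

    /** Plays the role of the query's visit(): hands the visitor a supplier, not a built object. */
    static void visit(Visitor visitor, AtomicInteger builds) {
        visitor.consumeTermsMatching("field", () -> {
            builds.incrementAndGet();
            return "automaton over all terms"; // placeholder for the expensive build
        });
    }

    static int buildsWhenCounting() {
        AtomicInteger builds = new AtomicInteger();
        visit(new ClauseCounter(), builds);
        return builds.get();
    }

    static int buildsWhenMaterializing() {
        AtomicInteger builds = new AtomicInteger();
        visit(new Materializer(), builds);
        return builds.get();
    }
}
```

The counting visitor still registers one leaf but never pays for construction, which is the saving the signature change is aiming for.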
Today it looks like the wild west inside IndexWriter and some of its
associated classes. This change makes sure all non-final members have
private visibility, and methods that are not used outside of IW today
are made private unless they were already public. This change also
removes some unused or unnecessary members where possible and deletes
some dead code left over from previous refactorings.