This commit adds a new static method, createManager, to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager, but its reduced result is a specialized RandomSamplingFacetsCollector.
This relates to [LUCENE-10002](https://issues.apache.org/jira/browse/LUCENE-10002). It allows users to use a collector manager instead of a collector when doing random sampling, as part of the effort to reduce usages of IndexSearcher#search(Query, Collector).
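The collector-manager pattern can be sketched in plain Java without Lucene on the classpath. Every name below (SamplingCollectorSketch, SamplingCollectorManagerSketch and their methods) is hypothetical and is not Lucene's API. The sketch assumes three things: each slice gets its own collector, reduce merges those collectors into one collector of the same specialized type, and sampling happens on the merged result. The seeded shuffle is a simple stand-in, not the sampling algorithm RandomSamplingFacetsCollector actually uses.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in for a per-slice collector; it only records matching doc IDs.
class SamplingCollectorSketch {
  final int sampleSize;
  final long seed;
  final List<Integer> matchingDocs = new ArrayList<>();

  SamplingCollectorSketch(int sampleSize, long seed) {
    this.sampleSize = sampleSize;
    this.seed = seed;
  }

  void collect(int doc) {
    matchingDocs.add(doc);
  }

  // Samples once, after all hits are known. The seeded shuffle is a stand-in,
  // not Lucene's sampling algorithm.
  List<Integer> sample() {
    List<Integer> copy = new ArrayList<>(matchingDocs);
    if (copy.size() <= sampleSize) {
      return copy;
    }
    Collections.shuffle(copy, new Random(seed));
    return new ArrayList<>(copy.subList(0, sampleSize));
  }
}

// Hypothetical manager: one collector per slice, reduced into a collector of the
// same specialized type so the caller can still ask it for a sample.
class SamplingCollectorManagerSketch {
  private final int sampleSize;
  private final long seed;

  SamplingCollectorManagerSketch(int sampleSize, long seed) {
    this.sampleSize = sampleSize;
    this.seed = seed;
  }

  SamplingCollectorSketch newCollector() {
    return new SamplingCollectorSketch(sampleSize, seed);
  }

  SamplingCollectorSketch reduce(Collection<SamplingCollectorSketch> collectors) {
    SamplingCollectorSketch merged = new SamplingCollectorSketch(sampleSize, seed);
    for (SamplingCollectorSketch c : collectors) {
      merged.matchingDocs.addAll(c.matchingDocs);
    }
    // Sort so the sample does not depend on the order in which slices finished.
    Collections.sort(merged.matchingDocs);
    return merged;
  }
}
```

Sorting the merged doc IDs keeps the sample independent of the order in which slices finish.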
As part of the effort to decrease usages of IndexSearcher#search(Query, Collector) in favor of the corresponding method that accepts a collector manager, this commit replaces many usages of FacetsCollector with its existing collector manager counterpart.
IndexSortSortedNumericDocValuesRangeQuery unconditionally assumed that the
SortField uses the LONG encoding. Using the numeric range query on a sorted
index with any other type resulted in a ClassCastException. Now the query
consults the numeric type of the `SortField` and performs the appropriate checks.
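Here is a minimal sketch of dispatching on the sort field's numeric type. For illustration it assumes that the range bounds are held as long values and must be boxed as the type the field's comparator expects. SortNumericType, IndexSortBoundsSketch and its methods are hypothetical stand-ins, not Lucene classes.

```java
// Hypothetical mirror of the numeric types an index sort field can declare.
enum SortNumericType { INT, LONG, FLOAT, DOUBLE }

final class IndexSortBoundsSketch {
  // In this sketch, the sorted-index shortcut is only attempted for integer encodings.
  static boolean supportsIndexSortShortcut(SortNumericType type) {
    return type == SortNumericType.INT || type == SortNumericType.LONG;
  }

  // Boxes the long bound as the type the field expects. Boxing it unconditionally
  // as Long is the kind of mistake that surfaces as a ClassCastException for INT fields.
  static Number boundFor(SortNumericType type, long bound) {
    switch (type) {
      case INT:
        // Throws ArithmeticException if the bound does not fit in an int; a real
        // implementation would need to handle out-of-range bounds explicitly.
        return Math.toIntExact(bound);
      case LONG:
        return bound;
      default:
        throw new IllegalArgumentException("unsupported sort type: " + type);
    }
  }
}
```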
This commit introduces a no-op implementation of HitsThresholdChecker that does no counting, to be used when early termination is disabled. It is used automatically when creating a TopFieldCollector or a TopScoreDocCollector. In that same scenario, MaxScoreAccumulator can be null, and scores are no longer accumulated when creating a shared collector manager.
With this, it is safe to replace the custom collector managers in DrillSideways with the ones returned by calling createSharedManager.
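The idea can be sketched with a small interface and two implementations. The names are hypothetical simplifications of HitsThresholdChecker. Treating a threshold of Integer.MAX_VALUE as "early termination disabled" is an assumption of this sketch.

```java
// Hypothetical, simplified version of a hits-threshold checker.
interface HitsThresholdSketch {
  void incrementHitCount();

  boolean isThresholdReached();
}

// Counts hits and reports once the count exceeds the threshold.
final class CountingThresholdSketch implements HitsThresholdSketch {
  private final int threshold;
  private int hitCount;

  CountingThresholdSketch(int threshold) {
    this.threshold = threshold;
  }

  @Override
  public void incrementHitCount() {
    hitCount++;
  }

  @Override
  public boolean isThresholdReached() {
    return hitCount > threshold;
  }
}

// Used when early termination is disabled: nothing is counted and the
// threshold is never reported as reached.
final class NoOpThresholdSketch implements HitsThresholdSketch {
  static final NoOpThresholdSketch INSTANCE = new NoOpThresholdSketch();

  private NoOpThresholdSketch() {}

  @Override
  public void incrementHitCount() {}

  @Override
  public boolean isThresholdReached() {
    return false;
  }
}

final class ThresholdSketchFactory {
  // Assumption: Integer.MAX_VALUE means "count all hits", i.e. no early termination.
  static HitsThresholdSketch create(int totalHitsThreshold) {
    return totalHitsThreshold == Integer.MAX_VALUE
        ? NoOpThresholdSketch.INSTANCE
        : new CountingThresholdSketch(totalHitsThreshold);
  }
}
```

Returning a shared, stateless instance means the disabled case does no per-hit bookkeeping at all.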
Some tests collect matching docs in a FixedBitSet. As part of moving such tests to IndexSearcher#search(Query, CollectorManager) under LUCENE-10002, this commit adds a new FixedBitSetCollector class that exposes this functionality, along with a createManager method that returns a corresponding CollectorManager.
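Here is a plain-Java sketch of the same shape, using java.util.BitSet in place of Lucene's FixedBitSet. BitSetCollectorSketch, its nested Manager and the method names are hypothetical. The sketch assumes the collected doc IDs are already index-wide; a real per-segment collector would add the segment's doc base.

```java
import java.util.BitSet;
import java.util.Collection;

// Hypothetical collector that records matching (index-wide) doc IDs in a bit set.
final class BitSetCollectorSketch {
  final BitSet bits;

  BitSetCollectorSketch(int maxDoc) {
    this.bits = new BitSet(maxDoc);
  }

  void collect(int doc) {
    bits.set(doc);
  }

  static Manager createManager(int maxDoc) {
    return new Manager(maxDoc);
  }

  // Hypothetical manager: each slice gets its own collector; reduce ORs the bits together.
  static final class Manager {
    private final int maxDoc;

    Manager(int maxDoc) {
      this.maxDoc = maxDoc;
    }

    BitSetCollectorSketch newCollector() {
      return new BitSetCollectorSketch(maxDoc);
    }

    BitSet reduce(Collection<BitSetCollectorSketch> collectors) {
      BitSet result = new BitSet(maxDoc);
      for (BitSetCollectorSketch c : collectors) {
        result.or(c.bits);
      }
      return result;
    }
  }
}
```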
Some of our checks relied on doc IDs corresponding to the order in which docs
were passed to IndexWriter. This is fragile and sometimes resulted in failures.
Now we check against an "id" field instead.
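Here is a sketch of the idea. A Map stands in for reading each hit's "id" field from the index; that is an assumption, since how the field is actually read is not shown. IdFieldCheckSketch and idsOf are hypothetical names.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IdFieldCheckSketch {
  // Translates matching doc IDs into their "id" field values. Doc IDs need not follow
  // the order in which documents were added (for example with index sorting), while
  // the "id" values stay the same, so assertions compare these instead.
  static Set<String> idsOf(int[] docIds, Map<Integer, String> idField) {
    Set<String> ids = new HashSet<>();
    for (int docId : docIds) {
      ids.add(idField.get(docId));
    }
    return ids;
  }
}
```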
As part of #716, I moved the test to use a collector manager but forgot to update one of the assertions.
We can't rely on totalHits being accurate when the search is executed by multiple threads and terminates early.
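Lucene's TotalHits pairs a count with a relation. The relation says whether the count is exact (EQUAL_TO) or only a lower bound (GREATER_THAN_OR_EQUAL_TO). The sketch below uses simplified, hypothetical stand-in types. It shows one way an assertion could respect that relation instead of assuming an exact count.

```java
// Hypothetical mirror of a hit count that is either exact or only a lower bound.
enum HitCountRelation { EQUAL_TO, GREATER_THAN_OR_EQUAL_TO }

final class HitCountSketch {
  final long value;
  final HitCountRelation relation;

  HitCountSketch(long value, HitCountRelation relation) {
    this.value = value;
    this.relation = relation;
  }
}

final class HitCountAssertions {
  // An exact count must match; a lower bound (e.g. after early termination across
  // several threads) only has to not exceed the true number of matches.
  static boolean isConsistent(HitCountSketch hits, long actualMatches) {
    if (hits.relation == HitCountRelation.EQUAL_TO) {
      return hits.value == actualMatches;
    }
    return hits.value <= actualMatches;
  }
}
```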
Add demo dependencies to third-party modules. Add an integration test (IT) that
checks whether the demo classes are loadable.
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
Co-authored-by: Julie Tibshirani <julietibs@apache.org>
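A loadability check of this kind can be sketched with Class.forName(String, boolean, ClassLoader), passing false so that static initializers do not run. LoadabilityCheckSketch is hypothetical. The actual list of demo classes the IT checks is not given in this message, so the usage below relies on a JDK class and a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

final class LoadabilityCheckSketch {
  // Returns the names that cannot be loaded by the given class loader.
  // initialize=false resolves the class without running its static initializers.
  static List<String> unloadable(List<String> classNames, ClassLoader loader) {
    List<String> missing = new ArrayList<>();
    for (String name : classNames) {
      try {
        Class.forName(name, false, loader);
      } catch (ClassNotFoundException | LinkageError e) {
        missing.add(name);
      }
    }
    return missing;
  }
}
```

An integration test would then assert that the returned list is empty for the classes it cares about.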
Instead of caching dictionary strings and building multiple redundant DictionaryLemmatizer objects.
Co-authored-by: Michael Gibney <michael@michaelgibney.net>
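The change amounts to caching the built object rather than its raw input. Here is a minimal sketch, assuming a single-argument builder keyed by something like a dictionary name. BuiltObjectCache is a hypothetical name, not the class this commit touches.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache that builds one expensive object per key and reuses it,
// instead of caching the raw input (e.g. dictionary text) and rebuilding each time.
final class BuiltObjectCache<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> builder;

  BuiltObjectCache(Function<K, V> builder) {
    this.builder = builder;
  }

  // ConcurrentHashMap.computeIfAbsent applies the builder at most once per key.
  V get(K key) {
    return cache.computeIfAbsent(key, builder);
  }
}
```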