TestMatchesIterator lives in core/tests and does various sanity checks
on the matches returned by various queries, including Span queries.
The Span-specific tests cannot stay here once Spans have been moved
out of core. This commit pulls various helper methods from this class
into a base class in the test framework, so that we can move the
Spans tests into their own class and keep coverage once things have
been migrated.
We have a number of helper classes in o.a.l.search that aid the
implementation of two-phase iteration over disjunctions. These have
some Spans-specific code, which will stop compiling once Spans
are moved into the queries module. This commit removes the
Spans references from the main code and duplicates the helper
code within the Spans package.
ConjunctionDISI is really an internal implementation of DocIdSetIterator,
and would ideally be package-private. However, it is used in a few
other places:
* directly in ConjunctionSpans
* as a utility in the facet and join modules
This commit adds a public helper class ConjunctionUtils that allows easy
intersection of iterators for use by other modules. This means that
ConjunctionDISI itself can become package-private. It also removes
a reference to Spans from core classes, which will make it easier to
migrate Spans to the queries module. ConjuctionSpans and
ConjunctionIntervalIterator now use the public Utils class, and intervals
no longer need their own ConjunctionDISI implementation.
We need to update the reading logic of the backward codec in Lucene 9
for LUCENE-9827 and LUCENE-9935 as we have backported them to Lucene 8.
Relates apache/lucene-solr#2495
Relates apache/lucene-solr#2494
This test assumes that there is no merging,
and was failing when there were merges.
This fixes the test but setting NoMergePolicy for
IndexWriter.
Relates to LUCENE-9334
Relates to #11
The public API in LongValueFacetCounts previously required the user to specify whether-or-not a field being counted should be single- or multi-valued (i.e., is it NumericDocValues or SortedNumericDocValues). Since we can detect this automatically, it seems unnecessary to ask users to specify.
Co-authored-by: Greg Miller <gmiller@amazon.com>
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.
* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
Before PR#11, during merging if any merging segment has payloads
for a certain field, the new merged segment will also has payloads
set up for this field.
PR #11 introduced a bug where the first segment among merging
segments will define if the new merged segment will have
payloads. If the first segment doesn't have payloads, and
others do, the new merged segment mistakenly will not
have payloads set up.
This PR fixes this bug.
Relates to #11
DisjunctionMaxQuery stores its disjuncts in a Query[], and uses
Arrays.equals() for comparisons in its equals() implementation.
This means that the order in which disjuncts are added to the query
matters for equality checks.
This commit changes DMQ to instead store its disjuncts in a Multiset,
meaning that ordering no longer matters. The getDisjuncts()
method now returns a Collection<Query> rather than a List, and
some tests are changed to use query equality checks rather than
iterating over disjuncts and expecting a particular order.
The UkrainianMorfologikAnalyzer was reloading its dictionary every
time it created a new TokenStreamComponents, which meant that
while the analyzer was open it would hold onto one copy of the
dictionary per thread.
This commit loads the dictionary in a lazy static initializer, alongside
its stopword set. It also makes the normalizer charmap a singleton
so that we do not rebuild the same immutable object on every call
to initReader.