This test assumes that there is no merging
and was failing when merges happened.
This fixes the test by setting NoMergePolicy on the
IndexWriter.
Relates to LUCENE-9334
Relates to #11
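A minimal sketch of the setup described above (the directory and config variables are illustrative, not the actual test code):

```java
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.NoMergePolicy;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

class NoMergeTestSetup {
  // Disable merging so the test's "no merges" assumption always holds.
  static IndexWriter newNonMergingWriter() throws Exception {
    Directory dir = new ByteBuffersDirectory();
    IndexWriterConfig iwc = new IndexWriterConfig();
    iwc.setMergePolicy(NoMergePolicy.INSTANCE);
    return new IndexWriter(dir, iwc);
  }
}
```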
The public API in LongValueFacetCounts previously required the user to specify whether a field being counted is single- or multi-valued (i.e., whether it is backed by NumericDocValues or SortedNumericDocValues). Since we can detect this automatically, it seems unnecessary to ask users to specify it.
Co-authored-by: Greg Miller <gmiller@amazon.com>
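A rough sketch of how the single- vs. multi-valued detection can work from field metadata (this helper is illustrative, not the actual LongValueFacetCounts code):

```java
import org.apache.lucene.index.DocValuesType;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.LeafReader;

class DocValuesKind {
  // true if the field holds SortedNumericDocValues (multi-valued),
  // false if it holds NumericDocValues (single-valued) or is absent.
  static boolean isMultiValued(LeafReader reader, String field) {
    FieldInfo fi = reader.getFieldInfos().fieldInfo(field);
    return fi != null && fi.getDocValuesType() == DocValuesType.SORTED_NUMERIC;
  }
}
```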
* LUCENE-9909: Some jflex regeneration tasks should have proper dependencies and also check the checksums of included files.
* Force a dependency on low-level spotless tasks so that they're always properly ordered (hell!). Update ASCIITLD and regenerate the remaining code. Add cross-dependencies between generation tasks that take includes as input.
Before PR #11, during merging, if any merging segment had payloads
for a certain field, the new merged segment would also have payloads
set up for this field.
PR #11 introduced a bug where the first segment among the merging
segments determines whether the new merged segment will have
payloads. If the first segment doesn't have payloads but
others do, the new merged segment mistakenly does not
have payloads set up.
This PR fixes this bug.
Relates to #11
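A simplified sketch of the corrected behavior (not the actual merge code; the per-segment FieldInfos list and field name are illustrative):

```java
import java.util.List;
import org.apache.lucene.index.FieldInfo;
import org.apache.lucene.index.FieldInfos;

class PayloadsMergeSketch {
  // The merged segment must report payloads if ANY merging segment has them
  // for the field, not only if the first segment does.
  static boolean mergedHasPayloads(List<FieldInfos> mergingSegments, String field) {
    boolean hasPayloads = false;
    for (FieldInfos perSegment : mergingSegments) {
      FieldInfo fi = perSegment.fieldInfo(field);
      if (fi != null) {
        hasPayloads |= fi.hasPayloads();
      }
    }
    return hasPayloads;
  }
}
```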
DisjunctionMaxQuery stores its disjuncts in a Query[], and uses
Arrays.equals() for comparisons in its equals() implementation.
This means that the order in which disjuncts are added to the query
matters for equality checks.
This commit changes DMQ to instead store its disjuncts in a Multiset,
meaning that ordering no longer matters. The getDisjuncts()
method now returns a Collection<Query> rather than a List, and
some tests are changed to use query equality checks rather than
iterating over disjuncts and expecting a particular order.
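For example, two queries built from the same disjuncts in a different order now compare equal (a minimal illustration with placeholder term queries):

```java
import java.util.List;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.DisjunctionMaxQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class DisjunctOrderExample {
  public static void main(String[] args) {
    Query q1 = new TermQuery(new Term("title", "lucene"));
    Query q2 = new TermQuery(new Term("body", "lucene"));
    // Same disjuncts, different insertion order.
    DisjunctionMaxQuery a = new DisjunctionMaxQuery(List.of(q1, q2), 0.1f);
    DisjunctionMaxQuery b = new DisjunctionMaxQuery(List.of(q2, q1), 0.1f);
    System.out.println(a.equals(b)); // true: ordering no longer matters
  }
}
```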
The UkrainianMorfologikAnalyzer was reloading its dictionary every
time it created a new TokenStreamComponents, which meant that
while the analyzer was open it would hold onto one copy of the
dictionary per thread.
This commit loads the dictionary in a lazy static initializer, alongside
its stopword set. It also makes the normalizer charmap a singleton
so that we do not rebuild the same immutable object on every call
to initReader.
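The lazy static initialization is essentially the standard holder idiom; a schematic version (the Dictionary type and load method are illustrative stand-ins, not the analyzer's exact code):

```java
class LazyDictionaryHolder {
  // Illustrative stand-in for the Morfologik dictionary type.
  static final class Dictionary {}

  // Loaded once, on first access, then shared across threads and token streams.
  private static class Holder {
    static final Dictionary DICTIONARY = loadDictionary();
  }

  static Dictionary getDictionary() {
    return Holder.DICTIONARY;
  }

  private static Dictionary loadDictionary() {
    // The real analyzer reads the dictionary resource from the classpath here.
    return new Dictionary();
  }
}
```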
This PR removes `VectorValues#search` in favor of exposing NN search through
`VectorReader#search` and `LeafReader#searchNearestVectors`. It also marks the
vector methods on `LeafReader` as experimental.
Compilation of the library is slow; disable optimization, as it doesn't speed up our usage of the gennorm2 tool.
Use a better heuristic for make parallelism (tests.jvms rather than a hardcoded value of four).
LUCENE-9334 requires that docs have the same schema for a field
across the whole index.
This fixes the test that attempts to change the schema of the "number" field
from DocValues and Points to DocValues only.
Relates to #11
Require consistency between data structures on a per-field basis.
A field must be indexed with the same index options and data structures across
all documents. For example, it is not allowed to have one document
where a certain field is indexed with doc values and points, and another document
where the same field is indexed with points only.
It is, however, still allowed for a document not to have a certain field at all.
As a consequence, doc values updates are only applicable to fields that are
indexed with doc values alone.
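To illustrate the constraint, a minimal sketch (the field name and values are made up; the second addDocument is expected to be rejected):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.index.IndexWriter;

class FieldSchemaConsistencySketch {
  static void indexDocs(IndexWriter writer) throws Exception {
    Document doc1 = new Document();
    doc1.add(new NumericDocValuesField("number", 1L));
    doc1.add(new LongPoint("number", 1L));
    writer.addDocument(doc1); // "number" indexed with doc values and points

    Document doc2 = new Document();
    doc2.add(new LongPoint("number", 2L)); // same field, points only
    writer.addDocument(doc2); // rejected: inconsistent schema for "number"
  }
}
```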
This commit removes `ramBytesUsed()` from `CodecReader` and all file formats
besides vectors, which is the only remaining file format that might use lots of
memory in the default codec. I left `ramBytesUsed()` on the `completion` format
too, which is another feature that could use lots of memory.
Other components that relied on being able to compute the memory usage of readers,
such as facets' TaxonomyReader and the analyzing suggester, now assume that readers
have a RAM usage of 0.