IntervalBuilder.NO_INTERVALS should return -1 when unpositioned,
not NO_MORE_DOCS. This can trigger exceptions when an empty
IntervalQuery is combined in a conjunction.
Fixes #11759
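As an illustration of the convention described above, here is a minimal sketch of an always-empty iterator. It is not the actual IntervalBuilder code: the class name `EmptyIteratorSketch` is hypothetical, and `NO_MORE_DOCS` is redefined locally (as Integer.MAX_VALUE) so the example runs without Lucene on the classpath.

```java
final class EmptyIteratorSketch {
  // Sentinel meaning "exhausted"; defined locally for this sketch.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // -1 means "unpositioned": nextDoc()/advance() has not been called yet.
  private int doc = -1;

  int docID() {
    return doc;
  }

  // An empty iterator has no documents, so the first move exhausts it.
  int nextDoc() {
    doc = NO_MORE_DOCS;
    return doc;
  }

  int advance(int target) {
    doc = NO_MORE_DOCS;
    return doc;
  }

  public static void main(String[] args) {
    EmptyIteratorSketch it = new EmptyIteratorSketch();
    System.out.println(it.docID());   // unpositioned
    System.out.println(it.nextDoc()); // exhausted
  }
}
```

A conjunction that inspects docID() before advancing its sub-iterators can then tell "not started" apart from "exhausted".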
This PR removes the recently added function on LeafReader to exhaustively search
through vectors, plus the helper function KnnVectorsReader#searchExhaustively.
Instead it performs the exact search within KnnVectorQuery, using a new helper
class called VectorScorer.
If ConcurrentMergeScheduler is used and a merge hits a fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.
The test is currently not prepared to handle this: its logic is only geared around exceptions coming from addDocument().
Closes #11755
When indexing term vectors for a very large document, the automatic computation
of the dictionary size based on the overall size of the block might yield a
size that exceeds the maximum window size that is supported by LZ4. This commit
addresses the issue by automatically taking the minimum of the result of this
computation and the maximum window size (64kB).
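The clamp described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the `dictSize` method and the divisor `DICT_RATIO` are hypothetical, since the message does not give the actual heuristic. Only the 64kB cap is taken from the text.

```java
final class DictSizeClampSketch {
  // Maximum LZ4 window size quoted in the message above: 64kB.
  static final int MAX_WINDOW_SIZE = 64 * 1024;

  // Hypothetical divisor; the real size heuristic is not stated in the message.
  static final int DICT_RATIO = 8;

  // Derive a dictionary size from the block size, then cap it at the window size.
  static int dictSize(int blockSize) {
    int computed = blockSize / DICT_RATIO;
    return Math.min(computed, MAX_WINDOW_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(dictSize(64 * 1024));        // small block: below the cap
    System.out.println(dictSize(16 * 1024 * 1024)); // very large block: capped
  }
}
```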
* Remove usages of System.currentTimeMillis() from tests
- Use Random from `RandomizedRunner` so that tests can be reproduced from a
seed, instead of using a seed coming from the wall clock.
- Replace time-based tests, which use the wall clock to determine periods,
with a counter of repetitions, to get consistent reproduction.
Closes: #11459
* address comments
* tune iterations
* tune iterations for nightly
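A minimal sketch of the idea behind this change, using plain java.util.Random with an explicit seed as a stand-in for the Random supplied by the test framework (that substitution, the class name and the `run` method are assumptions for illustration). The loop is bounded by an iteration count rather than by elapsed time, so the same seed always produces the same run.

```java
import java.util.Random;

final class SeededIterationSketch {
  // Runs a fixed number of iterations driven by a seeded Random and
  // folds the drawn values into a checksum that identifies the run.
  static long run(long seed, int iterations) {
    Random random = new Random(seed);
    long checksum = 0;
    for (int i = 0; i < iterations; i++) { // counter, not a wall-clock deadline
      checksum = 31 * checksum + random.nextInt(1000);
    }
    return checksum;
  }

  public static void main(String[] args) {
    // Same seed and iteration count: identical results on every run.
    System.out.println(run(42L, 100) == run(42L, 100));
  }
}
```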
These internal versions only make sense within a codec definition, and aren't
meant to be exposed and compared across codecs. Since this method is only used
in tests, we can move the check to the test classes instead.
This change folds the `RandomAccessVectorValuesProducer` interface into
`RandomAccessVectorValues`. This reduces the number of interfaces and clarifies
the cloning/copying behavior.
This is a small simplification related to LUCENE-9583, but does not address the
main issue.
The base spatial test case may create invalid self-crossing polygons. These
polygons are cleaned by the tessellator which may result in an inconsistent
bounding box between the tessellated shape and the original, invalid, geometry.
This commit fixes the shape doc value test case to compute the bounding box from
the cleaned geometry instead of relying on the potentially invalid original
geometry.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
We currently compute the partition point for a set of points by multiplying the number of nodes that need to be on
the left of the BKD tree by maxPointsInLeafNode. This multiplication is done in the integer space, so if the partition point is bigger than Integer.MAX_VALUE it will overflow. This commit moves the multiplication to the long space so it doesn't overflow.
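The overflow can be reproduced in isolation. The sketch below is not the BKD writer code; the class and method names are hypothetical, and only the int versus long multiplication is taken from the description above.

```java
final class PartitionPointSketch {
  // Both operands are int, so the product is computed in 32 bits and wraps silently.
  static int partitionPointInt(int numLeftLeafNodes, int maxPointsInLeafNode) {
    return numLeftLeafNodes * maxPointsInLeafNode;
  }

  // Widening one operand first makes the whole multiplication happen in 64 bits.
  static long partitionPointLong(int numLeftLeafNodes, int maxPointsInLeafNode) {
    return (long) numLeftLeafNodes * maxPointsInLeafNode;
  }

  public static void main(String[] args) {
    int leaves = 5_000_000;
    int pointsPerLeaf = 512;
    System.out.println(partitionPointInt(leaves, pointsPerLeaf));  // wrapped, negative
    System.out.println(partitionPointLong(leaves, pointsPerLeaf)); // 2560000000
  }
}
```

Note that `(long) (a * b)` would not help, because the cast would apply after the int product has already overflowed.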
Adds a new doc value field to support LatLonShape and XYShape doc values. The
implementation is inspired by ComponentTree. A binary tree of tessellated
components (point, line, or triangle) is created. This tree is then DFS
serialized to a variable compressed DataOutput buffer to keep the doc value
format as compact as possible.
DocValue queries are performed on the serialized tree using a similar component
relation logic as found in SpatialQuery for BKD indexed shapes. To make this
possible some of the relation logic is refactored to make it accessible to the
doc value query counterpart.
Note this does not support the following:
* Multi Geometries or Collections - This will be investigated by exploring
the addition of multi binary doc values.
* General Geometry Queries - This will be added in a follow-on improvement.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* add Comment on Lev & pretty the toDot
* use auto generate scripts to add comment
* update checksum
* update checksum
* restore toDot
* add removeDeadStates in levAutomata
Co-authored-by: tangdonghai <tangdonghai@meituan.com>