Commit Graph

36347 Commits

Author SHA1 Message Date
Dhiru Kholia 30b72ec364
Fix a typo affecting Luke (#11763) 2022-09-12 13:05:40 +02:00
Alan Woodward 41d03f69ce
Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (#11760)
IntervalBuilder.NO_INTERVALS should return -1 when unpositioned,
not NO_MORE_DOCS. This can trigger exceptions when an empty
IntervalQuery is combined in a conjunction.

Fixes #11759
2022-09-09 17:19:15 +01:00
Mayya Sharipova 0ea8035612
LUCENE-10592 Better estimate memory for HNSW graph (#11743)
Better estimate memory used for OnHeapHnswGraph,
as well as add tests.

Also don't overallocate arrays in NeighborArray

Relates to #992
2022-09-08 16:54:29 -04:00
Yuting Gan 49b596ef02
Added a top-n range faceting example (#1035) 2022-09-08 12:19:42 -07:00
Julie Tibshirani 09a13aeaf2
LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively (#11756)
This PR removes the recently added function on LeafReader to exhaustively search
through vectors, plus the helper function KnnVectorsReader#searchExhaustively.
Instead it performs the exact search within KnnVectorQuery, using a new helper
class called VectorScorer.
2022-09-08 12:15:02 -07:00
Robert Muir f4146a44e9
Fix TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull to handle IllegalStateException from startCommit() (#11757)
If ConcurrentMergeScheduler is used, and the merge hits fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes #11755
2022-09-08 13:35:54 -04:00
Adrien Grand f8285fd0fe
Prevent term vectors from exceeding the maximum dictionary size. (#11726)
When indexing term vectors for a very large document, the automatic computation
of the dictionary size based on the overall size of the block might yield a
size that exceeds the maximum window size that is supported by LZ4. This commit
addresses the issue by automatically taking the minimum of the result of this
computation and the maximum window size (64kB).
2022-09-08 13:44:21 +02:00
Marios Trivyzas dbffe3472b
LUCENE-10423: Remove usages of System.currentTimeMillis() from tests (#11749)
* Remove usages of System.currentTimeMillis() from tests

- Use Random from `RandomizedRunner` to be able to use a Seed to
  reproduce tests, instead of a seed coming from wall clock.
- Replace time based tests, using wall clock to determine periods
  with counter of repetitions, to have a consistent reproduction.

Closes: #11459

* address comments

* tune iterations

* tune iterations for nightly
2022-09-06 17:55:01 -04:00
Dawid Weiss d3460fa1bb
Add tidy after addVersion is called. (#11748) 2022-09-04 19:50:38 +02:00
Greg Miller 84cae4f27c
Simplify dense optimization check in TermInSetQuery (#11737) 2022-09-02 07:51:29 -07:00
Greg Miller 202dd809bd Ensure TermInSetQuery ScoreSupplier never returns null Scorer 2022-09-01 15:31:14 -07:00
Greg Miller 680f21dca5
LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery (#1058) 2022-09-01 14:04:43 -07:00
Michael Sokolov 0462a0ad73 fixed index order needed for TestKnnVectorQuery.testScoreEuclidean (#11732) 2022-09-01 09:53:57 -04:00
Michael Sokolov 1649964f07 Forward-port CHANGES entry for quantized HNSW vectors from 9.x branch 2022-09-01 09:53:46 -04:00
Tomoko Uchida fd86968fee
remove a link to old Jira in README. 2022-09-01 00:41:56 +09:00
Mayya Sharipova 554fabf682
LUCENE-10633 Disable sort optimization for SortedSetSortField (#3125)
Add ability to SortedSetSortField to disable sort optimization
2022-08-30 16:52:28 -04:00
Michael Sokolov 61ef031f7f
SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725) 2022-08-29 11:49:52 -04:00
Tomoko Uchida 29f94b0404
a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
Tomoko Uchida 6d664ccd95 adjast wording 2022-08-27 13:02:48 +09:00
Tomoko Uchida 09a7f9aa53 clarify the relation between CHANGES and Milestone 2022-08-27 12:58:33 +09:00
Tomoko Uchida 224953304c
Document about Milestone for release planning (#11723) 2022-08-27 12:29:40 +09:00
Tomoko Uchida e61958e4fd links to github should be '/issues' 2022-08-27 11:54:20 +09:00
Dawid Weiss 4f7543725c
#11720 Upgrade randomizedtesting to 2.8.1 (#11721) 2022-08-26 00:01:57 +02:00
Mike Drob dbc7a9764a
Add Integer awareness to RamUsageEstimator.sizeOf (#11715)
Additionally, update comments to reflect that we have not been VM cache-aware for a long time now.
2022-08-25 15:18:08 -05:00
Uwe Schindler 1d54299011
Fix classloading deadlock in analysis factories / AnalysisSPILoader initialization. This closes #11701 (#11718) 2022-08-25 18:16:04 +02:00
Tomoko Uchida 53b1ce7504
update contributing guide for GH issue (#11716) 2022-08-25 04:06:09 +09:00
Greg Miller 1529606763
Optimize TermInSetQuery for terms that match all docs in a segment (#1062) 2022-08-23 08:37:44 -07:00
Michael Sokolov 8021c2db4e Don't throw an exception for byte-encoded vectors in SimpleText codec 2022-08-22 08:29:58 -04:00
Julie Tibshirani df67223497 Disable byte encoding in TestSimpleTextKnnVectorsFormat 2022-08-21 17:00:57 -07:00
Julie Tibshirani 653d2ebf71
Remove KnnVectorsFormat#currentVersion (#1077)
These internal versions only make sense within a codec definition, and aren't
meant to be exposed and compared across codecs. Since this method is only used
in tests, we can move the check to the test classes instead.
2022-08-21 13:09:07 -07:00
Michael Sokolov daa56d30f0 Fix TestHnswGraph rare failure 2022-08-20 17:26:50 -04:00
Michael Sokolov 0a58318e16
Fix for bad cast when sorting a KnnVectors index over BytesRef (#1074) 2022-08-20 17:23:47 -04:00
Michael Sokolov 798c02dd70
fix VectorUtil.dotProductScore normalization (#1073) 2022-08-20 09:15:38 -04:00
Michael Sokolov 60fa19d509
don't call BitSet.cardinality() more than needed (#1075) 2022-08-20 08:40:50 -04:00
Michael Sokolov f9680c6807
Add safety checks to KnnVectorField; fixed issue with copying BytesRef (#1076) 2022-08-20 08:38:42 -04:00
Tomoko Uchida 9ae3498f82 add notes about labels' color code 2022-08-20 13:22:50 +09:00
Julie Tibshirani 8308688d78
LUCENE-9583: Remove RandomAccessVectorValuesProducer (#1071)
This change folds the `RandomAccessVectorValuesProducer` interface into
`RandomAccessVectorValues`. This reduces the number of interfaces and clarifies
the cloning/ copying behavior.

This is a small simplification related to LUCENE-9583, but does not address the
main issue.
2022-08-19 18:04:05 -07:00
Yuting Gan 0914b537db
LUCENE-10644: Facets#getAllChildren testing should ignore child order (#1013) 2022-08-18 10:38:49 -07:00
Julie Tibshirani 7912ed02c4 Move Lucene91HnswGraphBuilder to test folder
It's only used in unit tests so it can live in the backwards_codecs tests.
2022-08-17 17:10:38 -07:00
Tomoko Uchida 8b3303b25f .asf.yaml 2022-08-16 20:02:47 +09:00
Michael Sokolov bc214d4958 standardize exception text for vector dimension mismatch (in SimpleText codec) 2022-08-13 13:12:11 -04:00
Nick Knize 543910d900
LUCENE-10654: Fix ShapeDocValue Bounding Box failure (#1066)
The base spatial test case may create invalid self crossing polygons. These
polygons are cleaned by the tessellator which may result in an inconsistent
bounding box between the tessellated shape and the original, invalid, geometry.
This commit fixes the shape doc value test case to compute the bounding box from
the cleaned geometry instead of relying on the, potentially invalid, original
geometry.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-08-12 10:54:22 -05:00
Ignacio Vera fe8d11254a
LUCENE-10678: Fix potential overflow when computing the partition point on the BKD tree (#1065)
We currently compute the partition point for a set of points by multiplying the number of nodes that needs to be on
 the left of the BKD tree by the maxPointsInLeafNode. This multiplication is done on the integer space so if the partition point is bigger than Integer.MAX_VALUE it will overflow. This commit moves the multiplication to the long space so it doesn't overflow.
2022-08-11 15:25:53 +02:00
Michael Sokolov a693fe819b
LUCENE-10577: enable quantization of HNSW vectors to 8 bits (#1054)
* LUCENE-10577: enable supplying, storing, and comparing HNSW vectors with 8 bit precision
2022-08-10 17:09:07 -04:00
Vigya Sharma 59a0917e25
Fix typo in PostingsReaderBase docstring (#948)
* remove extra PostingsEnum from docstring
* add ImpactsEnum to docstring
2022-08-09 16:20:51 -07:00
Nick Knize d7fd48c950
LUCENE-10654: Add new ShapeDocValuesField for LatLonShape and XYShape (#1017)
Adds new doc value field to support LatLonShape and XYShape doc values. The
implementation is inspired by ComponentTree. A binary tree of tessellated
components (point, line, or triangle) is created. This tree is then DFS
serialized to a variable compressed DataOutput buffer to keep the doc value
format as compact as possible.

DocValue queries are performed on the serialized tree using a similar component
relation logic as found in SpatialQuery for BKD indexed shapes. To make this
possible some of the relation logic is refactored to make it accessible to the
doc value query counterpart.

Note this does not support the following:

* Multi Geometries or Collections - This will be investigated by exploring 
  the addition of multi binary doc values.
* General Geometry Queries - This will be added in a follow on improvement. 

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-08-09 12:51:45 -05:00
Tomoko Uchida 0eba72f625 disable GH issue 2022-08-09 15:26:55 +09:00
tang donghai b08e34722d
LUCENE-10646: Add some comment on LevenshteinAutomata (#1016)
* add Comment on Lev & pretty the toDot

* use auto generate scripts to add comment

* update checksum

* update checksum

* restore toDot

* add removeDeadStates in levAutomata

Co-authored-by: tangdonghai <tangdonghai@meituan.com>
2022-08-07 10:01:30 -04:00
Ignacio Vera bd0718f071
LUCENE-10673: Improve check of equality for latitudes for spatial3d GeoBoundingBox (#1056) 2022-08-04 06:47:27 +02:00
luyuncheng 34154736c6
LUCENE-10627: Using ByteBuffersDataInput reduce memory copy on compressing data (#987) 2022-08-01 18:34:41 +02:00