This change folds the `RandomAccessVectorValuesProducer` interface into
`RandomAccessVectorValues`. This reduces the number of interfaces and clarifies
the cloning/copying behavior.
This is a small simplification related to LUCENE-9583, but does not address the
main issue.
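A minimal sketch of what the merged interface could look like, with `copy()` taking over the producer's role (illustrative only, not the exact Lucene signatures):
```
import java.io.IOException;

// Illustrative sketch only: the producer's responsibility is absorbed into the
// values interface itself via a copy() method.
public interface RandomAccessVectorValues {

  /** Number of vectors. */
  int size();

  /** Dimension of the vectors. */
  int dimension();

  /** Random access to the vector at the given ordinal. */
  float[] vectorValue(int targetOrd) throws IOException;

  /**
   * An independent copy of these values. Callers that need to iterate
   * concurrently (e.g. graph construction) clone through this method instead
   * of asking a separate *Producer for a fresh instance.
   */
  RandomAccessVectorValues copy() throws IOException;
}
```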
The base spatial test case may create invalid self crossing polygons. These
polygons are cleaned by the tessellator which may result in an inconsistent
bounding box between the tessellated shape and the original, invalid geometry.
This commit fixes the shape doc value test case to compute the bounding box from
the cleaned geometry instead of relying on the potentially invalid original
geometry.
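A hedged sketch of the test-side fix, assuming `Tessellator.Triangle` exposes per-vertex `getX(i)`/`getY(i)` accessors; the point is that the expected bounding box is derived from the tessellation output rather than from the original polygon:
```
import java.util.List;
import org.apache.lucene.geo.Rectangle;
import org.apache.lucene.geo.Tessellator;

class CleanedBoundingBoxSketch {
  // Hedged sketch: derive the expected bounding box from the cleaned
  // (tessellated) geometry, not from the possibly self-crossing original.
  static Rectangle boundingBoxOf(List<Tessellator.Triangle> tessellation) {
    double minLat = Double.POSITIVE_INFINITY, maxLat = Double.NEGATIVE_INFINITY;
    double minLon = Double.POSITIVE_INFINITY, maxLon = Double.NEGATIVE_INFINITY;
    for (Tessellator.Triangle t : tessellation) {
      for (int i = 0; i < 3; i++) { // each tessellated triangle has 3 vertices
        minLat = Math.min(minLat, t.getY(i));
        maxLat = Math.max(maxLat, t.getY(i));
        minLon = Math.min(minLon, t.getX(i));
        maxLon = Math.max(maxLon, t.getX(i));
      }
    }
    return new Rectangle(minLat, maxLat, minLon, maxLon);
  }
}
```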
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
We currently compute the partition point for a set of points by multiplying the
number of nodes that need to be on the left of the BKD tree by
maxPointsInLeafNode. This multiplication is done in the integer space, so if the
partition point is bigger than Integer.MAX_VALUE it overflows. This commit moves
the multiplication to the long space so it doesn't overflow.
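A small sketch of the fix, using illustrative variable names taken from the description above:
```
// Hedged reconstruction with illustrative variable names.
static long partitionPoint(int leftNodes, int maxPointsInLeafNode) {
  // Before: the multiplication happened in int space and silently overflowed
  // once leftNodes * maxPointsInLeafNode exceeded Integer.MAX_VALUE:
  //   int partition = leftNodes * maxPointsInLeafNode;
  // After: cast first so the multiplication is performed in long space.
  return (long) leftNodes * maxPointsInLeafNode;
}
```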
Adds new doc value field to support LatLonShape and XYShape doc values. The
implementation is inspired by ComponentTree. A binary tree of tessellated
components (point, line, or triangle) is created. This tree is then
DFS-serialized to a variable-length compressed DataOutput buffer to keep the doc
value format as compact as possible.
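A rough sketch of the serialization idea, using a hypothetical `ComponentNode` type (the real encoding is more involved):
```
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

class ShapeDocValuesSketch {
  // Hypothetical node type for illustration only.
  static class ComponentNode {
    int type;                 // point, line, or triangle
    long encodedCoordinates;  // illustrative stand-in for the vertex encoding
    int leftSubtreeBytes;     // lets a reader skip a whole subtree
    ComponentNode left, right;
  }

  // DFS-serialize the binary tree of tessellated components using
  // variable-length (VInt/VLong) encoding to keep the format compact.
  static void writeNode(ComponentNode node, DataOutput out) throws IOException {
    out.writeVInt(node.type);
    out.writeVLong(node.encodedCoordinates);
    out.writeVInt(node.leftSubtreeBytes);
    if (node.left != null) {
      writeNode(node.left, out);   // depth first: left subtree...
    }
    if (node.right != null) {
      writeNode(node.right, out);  // ...then right subtree
    }
  }
}
```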
DocValue queries are performed on the serialized tree using component relation
logic similar to that found in SpatialQuery for BKD-indexed shapes. To make this
possible, some of the relation logic is refactored to make it accessible to the
doc value query counterpart.
Note this does not support the following:
* Multi Geometries or Collections - This will be investigated by exploring
the addition of multi binary doc values.
* General Geometry Queries - This will be added in a follow-on improvement.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
* add comment on Lev & prettify the toDot output
* use the auto-generation scripts to add the comment
* update checksum
* update checksum
* restore toDot
* add removeDeadStates in levAutomata
Co-authored-by: tangdonghai <tangdonghai@meituan.com>
Added `prefilter` and `filterSelectivity` arguments to KnnGraphTester to be
able to compare pre- and post-filtering benchmarks.
`filterSelectivity` expresses the selectivity of a filter as the proportion of
docs that pass; passing docs are randomly selected. We store them in a
FixedBitSet and use it both to compute true KNN results and in HNSW search.
In the post-filtering case, we over-select results as `topK / filterSelectivity`
so that the final number of hits is close to the actually requested `topK`. For
pre-filtering, we wrap the FixedBitSet in a query and pass it as the prefilter
argument to KnnVectorQuery.
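A small illustrative sketch of how the selectivity could be materialized and how the post-filter over-selection is computed (variable and method names are not the tester's actual ones):
```
import java.util.Random;
import org.apache.lucene.util.FixedBitSet;

class FilterSelectivitySketch {
  // Illustrative sketch: randomly mark roughly filterSelectivity * maxDoc docs
  // as passing the filter; the bit set is reused for true KNN and HNSW search.
  static FixedBitSet selectDocs(int maxDoc, float filterSelectivity, long seed) {
    FixedBitSet acceptDocs = new FixedBitSet(maxDoc);
    Random random = new Random(seed);
    for (int doc = 0; doc < maxDoc; doc++) {
      if (random.nextFloat() < filterSelectivity) {
        acceptDocs.set(doc);
      }
    }
    return acceptDocs;
  }

  // Post-filtering: over-select so that, after dropping filtered-out hits,
  // roughly topK results remain.
  static int adjustedTopK(int topK, float filterSelectivity) {
    return (int) Math.ceil(topK / filterSelectivity);
  }
}
```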
When there are multiple segments, KnnVectorQuery's explain has a bug in
locating the doc ID: the doc ID passed to explain is segment-relative and does
not include the segment's docBase, while the doc IDs stored in
KnnVectorQuery.DocAndScoreQuery are global, with each segment's docBase already
added. So DocAndScoreQuery.explain needs to add the segment's docBase to the
doc ID before looking it up.
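A hedged sketch of the fix, with approximate field and method names; `docs` and `scores` stand for the global doc IDs and scores cached by DocAndScoreQuery:
```
import java.util.Arrays;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Explanation;

// Hedged sketch: docs[] holds global doc IDs, but explain() receives a
// segment-relative doc, so the segment's docBase must be added before lookup.
class ExplainSketch {
  static Explanation explain(LeafReaderContext context, int doc, int[] docs, float[] scores) {
    int globalDoc = doc + context.docBase;    // translate to the global doc ID space
    int found = Arrays.binarySearch(docs, globalDoc);
    if (found < 0) {
      return Explanation.noMatch("not in top " + docs.length + " docs");
    }
    return Explanation.match(scores[found], "within top " + docs.length + " docs");
  }
}
```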
Co-authored-by: Julie Tibshirani <julietibs@apache.org>
This test occasionally fails when knn search returns only one document from
the index, as we have an assertion that the doc IDs returned from the sorted
and unsorted indexes must be different.
This patch ensures that we have many documents in the index, so
that knn search always returns enough results.
Currently, when indexing knn vectors, we buffer them in memory and
on flush, during segment construction, we build an HNSW graph.
As building an HNSW graph is very expensive, this makes the flush
operation take a lot of time. This also makes overall indexing
performance quite unpredictable – some indexing operations return
almost instantly while others that trigger flush take a lot of time.
This happens because flushes are unpredictable and are triggered by
memory usage, the presence of concurrent searches, etc.
Building an HNSW graph as we index documents avoids these problems,
as the load of HNSW graph construction is spread evenly during indexing.
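A very rough, hypothetical sketch of the shift (none of these names are actual Lucene classes): each added vector is inserted into an in-memory graph right away, so flush only has to serialize it:
```
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.store.DataOutput;

// Hypothetical sketch only; GraphBuilder is a stand-in, not a real Lucene class.
class IncrementalKnnVectorsWriterSketch {

  /** Stand-in for an in-memory HNSW graph builder. */
  interface GraphBuilder {
    void insert(int node, float[] vector);         // add one node to the graph
    void write(DataOutput out) throws IOException; // serialize the finished graph
  }

  private final List<float[]> vectors = new ArrayList<>();
  private final GraphBuilder graph;

  IncrementalKnnVectorsWriterSketch(GraphBuilder graph) {
    this.graph = graph;
  }

  void addValue(float[] vector) {
    vectors.add(vector);
    // Insert the node immediately, spreading graph-building cost over indexing
    // calls instead of paying for the whole graph at flush time.
    graph.insert(vectors.size() - 1, vector);
  }

  void flush(DataOutput out) throws IOException {
    // The graph already exists at this point; flush only serializes it.
    graph.write(out);
  }
}
```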
Co-authored-by: Adrien Grand <jpountz@gmail.com>
The abstract method copyBytes copies from the input into a temporary buffer and then writes into ByteBuffersDataOutput. I think this intermediate copy is unnecessary; we can override copyBytes to copy directly from the input into the output.
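A minimal sketch of the idea, assuming a hypothetical `currentBlock()` accessor that returns the heap ByteBuffer currently being filled (and allocates a new one when the current block is full):
```
// Hedged sketch: copy straight from the input into the output's current block,
// skipping the intermediate byte[] used by DataOutput's default copyBytes.
@Override
public void copyBytes(DataInput input, long numBytes) throws IOException {
  while (numBytes > 0) {
    ByteBuffer block = currentBlock();  // hypothetical: the block being written
    int chunk = (int) Math.min(numBytes, block.remaining());
    input.readBytes(block.array(), block.arrayOffset() + block.position(), chunk);
    block.position(block.position() + chunk);  // advance the write position
    numBytes -= chunk;
  }
}
```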
Fix an error in the comparison between the candidate's bytes and the max merge bytes.
It is wrong to use candidateSize rather than currentCandidateBytes when comparing with maxMergeBytes.
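A hedged before/after reconstruction of the comparison, using the names from the description; the surrounding control flow is hypothetical:
```
// Before (buggy): the wrong quantity was compared against the byte limit.
//   if (candidateSize > maxMergeBytes) { stopGrowingCandidate(); }
// After: the candidate's accumulated byte size is what must stay under the limit.
if (currentCandidateBytes > maxMergeBytes) {
  stopGrowingCandidate();  // hypothetical action; the point is the comparison
}
```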
This method is called from `addIndexes` and should be synchronized so that it
sees consistent data structures in case concurrent indexing is introducing new
fields.
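A minimal sketch of the kind of change, based on the methods visible in the stack trace below:
```
// Hedged sketch: synchronize the verification path so callers coming from
// addIndexes see a consistent view of the field-number/schema maps that
// indexing threads mutate when they introduce new fields.
synchronized void verifyFieldInfo(FieldInfo fi) {
  // Reads the shared name -> schema maps; without the lock, a concurrent
  // writer adding a new field can expose a partially initialized entry
  // (props == null), as in the NullPointerException below.
  verifySameSchema(fi);
}
```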
I hit a rare test failure of `TestIndexRearranger` that I can only explain by this lack of locking:
```
15:40:14 > java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null
15:40:14 > at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
15:40:14 > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.execute(IndexRearranger.java:98)
15:40:14 > at org.apache.lucene.misc.index.TestIndexRearranger.testRearrangeUsingBinaryDocValueSelector(TestIndexRearranger.java:97)
15:40:14 > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
15:40:14 > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
15:40:14 > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
15:40:14 > at java.base/java.lang.reflect.Method.invoke(Method.java:568)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
15:40:14 > at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
15:40:14 > at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
15:40:14 > at java.base/java.lang.Thread.run(Thread.java:833)
15:40:14 >
15:40:14 > Caused by:
15:40:14 > java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null
15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:459)
15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifyFieldInfo(FieldInfos.java:359)
15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3149)
15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.addOneSegment(IndexRearranger.java:139)
15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.lambda$execute$0(IndexRearranger.java:92)
15:40:14 > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
15:40:14 > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
15:40:14 > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
15:40:14 > ... 1 more
```