lucene

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	04e4f317cb	LUCENE-10629: Fix NullPointerException. I hit an NPE while running tests. `Weight#scorer` may return `null`, but not `Scorer#iterator`.	2022-08-01 14:13:22 +02:00
Shai Erera	7ac75135b9	[LUCENE-10629]: Add fast match query support to FacetSets (#1015)	2022-07-31 07:50:03 +03:00
Dawid Weiss	f93e52e5bb	LUCENE-10669: The build should be more helpful when generated resources are touched (#1053)	2022-07-30 20:45:32 +02:00
Adrien Grand	7c9d3cd6ff	LUCENE-10633: Fix handling of missing values in reverse sorts.	2022-07-29 21:36:35 +02:00
Kaival Parikh	1ad28a3136	LUCENE-10559: Add Prefilter Option to KnnGraphTester (#932) Added a `prefilter` and `filterSelectivity` argument to KnnGraphTester to be able to compare pre and post-filtering benchmarks. `filterSelectivity` expresses the selectivity of a filter as proportion of passing docs that are randomly selected. We store these in a FixedBitSet and use this to calculate true KNN as well as in HNSW search. In case of post-filter, we over-select results as `topK / filterSelectivity` to get final hits close to actual requested `topK`. For pre-filter, we wrap the FixedBitSet in a query and pass it as prefilter argument to KnnVectorQuery.	2022-07-29 11:21:34 -07:00
Adrien Grand	eb7b7791ba	LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023) This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields by using postings to filter competitive documents.	2022-07-29 11:12:32 +02:00
iverase	e1d2005df4	Add back-compat indices for 9.3.0	2022-07-29 10:13:20 +02:00
iverase	52a41d702f	DOAP changes for release 9.3.0	2022-07-29 09:40:53 +02:00
Greg Miller	4ebc249dbc	Add #scoreSupplier support to DocValuesRewriteMethod along with singleton doc value opto (#1020)	2022-07-28 11:12:21 -07:00
Shiming Li	bb752c774c	LUCENE-10663: Fix KnnVectorQuery explain with multiple segments (#1050) When there are multiple segments, KnnVectorQuery explain has a bug in locating the doc ID. The doc ID passed to explain is segment-relative, so it does not include the segment's docBase, whereas the doc IDs held by KnnVectorQuery.DocAndScoreQuery are offset by each segment's docBase. So `DocAndScoreQuery.explain` needs to add the segment's docBase to the doc ID. Co-authored-by: Julie Tibshirani <julietibs@apache.org>	2022-07-28 10:31:49 -07:00
Adrien Grand	0ff987562a	LUCENE-10661: Move CHANGES entry to 9.4.	2022-07-27 16:20:20 +02:00
luyuncheng	107747f359	LUCENE-10661: Reduce memory copy in BytesStore (#1047)	2022-07-27 16:17:08 +02:00
Weiming Wu	2cf12b8cdc	Cache decoded bytes for TFIDFSimilarity scorer. (#1042) Co-authored-by: Weiming Wu <wweiming@amazon.com>	2022-07-26 13:47:52 +02:00
tang donghai	94960a0aff	precompute maxlevel in LogMergePolicy (#1045)	2022-07-26 13:42:32 +02:00
Mayya Sharipova	2efc204a39	LUCENE-10592 Strengthen TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults This test occasionally fails if knn search returns only 1 document in the index, as we have an assertion that returned doc IDs from sorted and unsorted index must be different. This patch ensures that we have many documents in the index, so that knn search always returns enough results.	2022-07-25 09:48:43 -04:00
Greg Miller	f943a57ebe	Fix another TestDisiPriorityQueue bug	2022-07-22 14:32:08 -07:00
Mayya Sharipova	bd06cebfc2	Add change log for LUCENE-10592	2022-07-22 12:14:58 -04:00
Mayya Sharipova	fdbb76a8d7	Add next minor version 9.3.0	2022-07-22 12:01:08 -04:00
Mayya Sharipova	ba4bc04271	LUCENE-10592 Build HNSW Graph on indexing (#992) Currently, when indexing knn vectors, we buffer them in memory and on flush during a segment construction we build an HNSW graph. As building an HNSW graph is very expensive, this makes flush operation take a lot of time. This also makes overall indexing performance quite unpredictable – some indexing operations return almost instantly while others that trigger flush take a lot of time. This happens because flushes are unpredictable and triggered by memory used, presence of concurrent searches etc. Building an HNSW graph as we index documents avoids these problems, as the load of HNSW graph construction is spread evenly during indexing. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2022-07-22 11:29:28 -04:00
Mayya Sharipova	bd360f9b3e	Create Lucene94 Codec and move Lucene92 to backwards_codecs (#1041)	2022-07-22 10:04:10 -04:00
Michael Sokolov	6bdeb141b7	Revert "Create Lucene93 Codec and move Lucene92 to backwards_codecs (#924)" This reverts commit `f4f4a159b7`.	2022-07-21 12:52:42 -04:00
Vigya Sharma	25a842d871	LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963) * add locking warning to docstring * git tidy	2022-07-21 06:35:17 -04:00
Greg Miller	1bc38b7d1f	Fix TestDisiPriorityQueue test bug	2022-07-20 11:33:14 -07:00
Lu Xugang	39e7597f6e	LUCENE-10656: It is unnecessary to use `limit` to check the boundary (#1027)	2022-07-20 10:00:06 +08:00
Zach Chen	28ce8abb51	LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions (#1018)	2022-07-19 18:59:19 -07:00
Greg Miller	3d7d85f245	LUCENE-10653: Heapify in BMMScorer (#1022)	2022-07-19 13:49:31 -07:00
Greg Miller	a35dee5b27	Small tweak to IntervalQuery#visit logic (#1007)	2022-07-19 12:27:41 -07:00
Adrien Grand	11e7fe6618	LUCENE-10657: Move CHANGES entry to 9.3.	2022-07-19 09:39:18 +02:00
luyuncheng	e5bf76b843	LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput (#1034) The abstract method copyBytes needs to copy from the input to a buffer and then write into ByteBuffersDataOutput. This is unnecessary: we can override it to copy directly from the input into the output.	2022-07-19 09:37:07 +02:00
hcqs33	9f80fea502	Fix error in TieredMergePolicy (#1028) Fix an error in the comparison between the bytes of candidates and the max merge bytes. The comparison with maxMergeBytes should use currentCandidateBytes, not candidateSize.	2022-07-19 09:21:09 +02:00
Tomoko Uchida	781edf442b	LUCENE-10557: Refine issue label texts (#1036)	2022-07-19 13:41:42 +09:00
Adrien Grand	216e38a159	Synchronize FieldInfos#verifyFieldInfos. (#1019 ) This method is called from `addIndexes` and should be synchronized so that it would see consistent data structures in case of concurrent indexing that would be introducing new fields. I hit a rare test failure of `TestIndexRearranger` that I can only explain by this lack of locking: ``` 15:40:14 > java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null 15:40:14 > at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122) 15:40:14 > at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191) 15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.execute(IndexRearranger.java:98) 15:40:14 > at org.apache.lucene.misc.index.TestIndexRearranger.testRearrangeUsingBinaryDocValueSelector(TestIndexRearranger.java:97) 15:40:14 > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 15:40:14 > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) 15:40:14 > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 15:40:14 > at java.base/java.lang.reflect.Method.invoke(Method.java:568) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44) 
15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) 15:40:14 > at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) 
15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60) 15:40:14 > at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47) 15:40:14 > at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) 15:40:14 > at 
randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390) 15:40:14 > at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850) 15:40:14 > at java.base/java.lang.Thread.run(Thread.java:833) 15:40:14 > 15:40:14 > Caused by: 15:40:14 > java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null 15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:459) 15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifyFieldInfo(FieldInfos.java:359) 15:40:14 > at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3149) 15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.addOneSegment(IndexRearranger.java:139) 15:40:14 > at org.apache.lucene.misc.index.IndexRearranger.lambda$execute$0(IndexRearranger.java:92) 15:40:14 > at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) 15:40:14 > at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) 15:40:14 > at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) 15:40:14 > ... 1 more ```	2022-07-18 16:17:29 +02:00
Adrien Grand	0364402b30	Fix rare test failures of TestIndexSorting. Sometimes the final merge might not require sorting depending on the merge policy configuration.	2022-07-18 15:26:08 +02:00
Vigya Sharma	30a7c52e6c	LUCENE-10649: Fix failures in TestDemoParallelLeafReader (#1025)	2022-07-18 14:32:38 +02:00
Tomoko Uchida	8938e6a3fa	LUCENE-10557: Add GitHub issue templates (#1024)	2022-07-18 15:33:00 +09:00
Greg Miller	9b185b99c4	LUCENE-10603: Remove SSDV#NO_MORE_ORDS definition (#1021)	2022-07-13 20:02:17 -07:00
Vigya Sharma	ca7917472b	LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions (#1012) * Fix failures in TestAssertingPointsFormat.testWithExceptions * remove redundant finally block * tidy * remove TODO as it is done now	2022-07-13 13:55:08 -04:00
Christine Poerschke	56462b5f96	LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method (#821)	2022-07-13 18:43:31 +01:00
Greg Miller	7c35311f29	Specialize ordinal encoding for SortedSetDocValues (#1010)	2022-07-12 18:55:54 -07:00
tang donghai	d7c2def019	LUCENE-10619: Optimize the writeBytes in TermsHashPerField (#966)	2022-07-12 17:14:37 +02:00
Greg Miller	d6dbe4374a	Move LUCENE-10614 CHANGES entry to 10.0 and add MIGRATE entry	2022-07-11 09:10:58 -07:00
Yuting Gan	5ef7e5025d	LUCENE-10614: Properly support getTopChildren in RangeFacetCounts (#974)	2022-07-11 09:04:46 -07:00
Vigya Sharma	128869d63a	LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011) Ensure mergeScheduler.sync() gets called before we rollback the writer.	2022-07-11 11:23:17 +02:00
Vigya Sharma	c06b98262c	add comment for no pauses in writeBytes (#1014)	2022-07-11 11:12:00 +02:00
Stefan Vodita	dd4e8b82d7	LUCENE-10603: Stop using SortedSetDocValues.NO_MORE_ORDS in tests (#1004)	2022-07-07 09:54:41 -07:00
zacharymorn	da8143bfa3	LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve disjunction within conjunction (#1006)	2022-07-07 01:10:50 -07:00
Vigya Sharma	698f40ad51	LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges. (#633 ) * Use merge policy and merge scheduler to run addIndexes merges * wrapped reader does not see deletes - debug * Partially fixed tests in TestAddIndexes * Use writer object to invoke addIndexes merge * Use merge object info * Add javadocs for new methods * TestAddIndexes passing * verify field info schemas upfront from incoming readers * rename flag to track pooled readers * Keep addIndexes API transactional * Maintain transactionality - register segments with iw after all merges complete * fix checkstyle * PR comments * Fix pendingDocs - numDocs mismatch bug * Tests with 1-1 merges and partial merge failures * variable renaming and better comments * add test for partial merge failures. change tests to use 1-1 findmerges * abort pending merges gracefully * test null and empty merge specs * test interim files are deleted * test with empty readers * test cascading merges triggered * remove nocommits * gradle check errors * remove unused line * remove printf * spotless apply * update TestIndexWriterOnDiskFull to accept mergeException from failing addIndexes calls * return singleton reader mergespec in NoMergePolicy * rethrow exceptions seen in merge threads on failure * spotless apply * update test to new exception type thrown * spotlessApply * test for maxDoc limit in IndexWriter * spotlessApply * Use DocValuesIterator instead of DocValuesFieldExistsQuery for counting soft deletes * spotless apply * change exception message for closed IW * remove non-essential comments * update api doc string * doc string update * spotless * Changes file entry * simplify findMerges API, add 1-1 merges to MockRandomMergePolicy * update merge policies to new api * remove unused imports * spotless apply * move changes entry to end of list * fix testAddIndicesWithSoftDeletes * test with 1-1 merge policy always enabled * please spotcheck * tidy * test - never use 1-1 
merge policy * use 1-1 merge policy randomly * Remove concurrent addIndexes findMerges from MockRandomMergePolicy * Bug Fix: RuntimeException in addIndexes Aborted pending merges were slipping through the merge exception check in API, and getting caught later in the RuntimeException check. * tidy * Rebase on main. Move changes to 10.0 * Synchronize IW.AddIndexesMergeSource on outer class IW object * tidy	2022-07-06 18:15:47 -04:00
Peter Gromov	d537013e70	LUCENE-10626: Hunspell: add tools to aid dictionary editing: analysis introspection, stem expansion and stem/flag suggestion (#975)	2022-07-05 21:38:03 +02:00
Adrien Grand	3dd9a5487c	LUCENE-10636: Avoid computing the same scores multiple times. (#1005) `BlockMaxMaxscoreScorer` would previously compute the score twice for essential scorers. Co-authored-by: zacharymorn <zacharymorn@gmail.com>	2022-07-05 10:14:02 +02:00
Adrien Grand	81d4a7a69f	LUCENE-10151: Some fixes to query timeouts. (#996) I noticed some minor bugs in the original PR #927 that this PR should fix: - When a timeout is set, we would no longer catch `CollectionTerminatedException`. - I added randomization to `LuceneTestCase` to randomly set a timeout, which would have caught the above bug. - Fixed visibility of `TimeLimitingBulkScorer`.	2022-07-04 17:32:38 +02:00

36097 Commits