Commit Graph

36094 Commits

Author SHA1 Message Date
Adrien Grand 7c9d3cd6ff LUCENE-10633: Fix handling of missing values in reverse sorts. 2022-07-29 21:36:35 +02:00
Kaival Parikh 1ad28a3136
LUCENE-10559: Add Prefilter Option to KnnGraphTester (#932)
Added a `prefilter` and `filterSelectivity` argument to KnnGraphTester to be
able to compare pre and post-filtering benchmarks.

`filterSelectivity` expresses the selectivity of a filter as proportion of
passing docs that are randomly selected. We store these in a FixedBitSet and
use this to calculate true KNN as well as in HNSW search.

In case of post-filter, we over-select results as `topK / filterSelectivity` to
get final hits close to actual requested `topK`. For pre-filter, we wrap the
FixedBitSet in a query and pass it as prefilter argument to KnnVectorQuery.
2022-07-29 11:21:34 -07:00
Adrien Grand eb7b7791ba
LUCENE-10633: Dynamic pruning for sorting on SORTED(_SET) fields. (#1023)
This commit enables dynamic pruning for queries sorted on SORTED(_SET) fields
by using postings to filter competitive documents.
2022-07-29 11:12:32 +02:00
iverase e1d2005df4 Add back-compat indices for 9.3.0 2022-07-29 10:13:20 +02:00
iverase 52a41d702f DOAP changes for release 9.3.0 2022-07-29 09:40:53 +02:00
Greg Miller 4ebc249dbc
Add #scoreSupplier support to DocValuesRewriteMethod along with singleton doc value opto (#1020) 2022-07-28 11:12:21 -07:00
Shiming Li bb752c774c
LUCENE-10663: Fix KnnVectorQuery explain with multiple segments (#1050)
If there are multiple segments. KnnVectorQuery explain has a bug in locating
the doc ID. This is because the doc ID in explain is the docBase without the
segment.  In KnnVectorQuery.DocAndScoreQuery docs docid is increased in each
segment of the docBase. So, in the 'DocAndScoreQuery.explain', needs to be
added with the segment's docBase. 

Co-authored-by: Julie Tibshirani <julietibs@apache.org>
2022-07-28 10:31:49 -07:00
Adrien Grand 0ff987562a LUCENE-10661: Move CHANGES entry to 9.4. 2022-07-27 16:20:20 +02:00
luyuncheng 107747f359
LUCENE-10661: Reduce memory copy in BytesStore (#1047) 2022-07-27 16:17:08 +02:00
Weiming Wu 2cf12b8cdc
Cache decoded bytes for TFIDFSimilarity scorer. (#1042)
Co-authored-by: Weiming Wu <wweiming@amazon.com>
2022-07-26 13:47:52 +02:00
tang donghai 94960a0aff
precompute maxlevel in LogMergePolicy (#1045) 2022-07-26 13:42:32 +02:00
Mayya Sharipova 2efc204a39 LUCENE-10592 Strengthen TestHnswGraph::testSortedAndUnsortedIndicesReturnSameResults
This test occasionally fails if knn search returns only 1 document
in the index, as we have an assertion that returned doc IDs from
sorted and unsorted index must be different.

This patch ensures that we have many documents in the index, so
that knn search always returns enough results.
2022-07-25 09:48:43 -04:00
Greg Miller f943a57ebe Fix another TestDisiPriorityQueue bug 2022-07-22 14:32:08 -07:00
Mayya Sharipova bd06cebfc2 Add change log for LUCENE-10592 2022-07-22 12:14:58 -04:00
Mayya Sharipova fdbb76a8d7 Add next minor version 9.3.0 2022-07-22 12:01:08 -04:00
Mayya Sharipova ba4bc04271
LUCENE-10592 Build HNSW Graph on indexing (#992)
Currently, when indexing knn vectors, we buffer them in memory and
on flush during a segment construction we build an HNSW graph.
As building an HNSW graph is very expensive, this makes flush
operation take a lot of time. This also makes overall indexing
performance quite unpredictable – some indexing operations return
almost instantly while others that trigger flush take a lot of time.
This happens because flushes are unpredictable and trigged
by memory used, presence of concurrent searches etc.

Building an HNSW graph as we index documents avoid these problems,
as the load of HNSW graph construction is spread evenly during indexing.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2022-07-22 11:29:28 -04:00
Mayya Sharipova bd360f9b3e
Create Lucene94 Codec and move Lucene92 to backwards_codecs (#1041) 2022-07-22 10:04:10 -04:00
Michael Sokolov 6bdeb141b7 Revert "Create Lucene93 Codec and move Lucene92 to backwards_codecs (#924)"
This reverts commit f4f4a159b7.
2022-07-21 12:52:42 -04:00
Vigya Sharma 25a842d871
LUCENE-10583: Add docstring warning to not lock on Lucene objects (#963)
* add locking warning to docstring

* git tidy
2022-07-21 06:35:17 -04:00
Greg Miller 1bc38b7d1f Fix TestDisiPriorityQueue test bug 2022-07-20 11:33:14 -07:00
Lu Xugang 39e7597f6e
LUCENE-10656: It is unnecessary that using `limit` to check boundary (#1027) 2022-07-20 10:00:06 +08:00
Zach Chen 28ce8abb51
LUCENE-10480: Use BulkScorer to limit BMMScorer to only top-level disjunctions (#1018) 2022-07-19 18:59:19 -07:00
Greg Miller 3d7d85f245
LUCENE-10653: Heapify in BMMScorer (#1022) 2022-07-19 13:49:31 -07:00
Greg Miller a35dee5b27
Small tweak to IntervalQuery#visit logic (#1007) 2022-07-19 12:27:41 -07:00
Adrien Grand 11e7fe6618 LUCENE-10657: Move CHANGES entry to 9.3. 2022-07-19 09:39:18 +02:00
luyuncheng e5bf76b843
LUCENE-10657: CopyBytes now saves one memory copy on ByteBuffersDataOutput (#1034)
Abstract method copyBytes need to copy from input to a buffer and then write into ByteBuffersDataOutput, i think there is unnecessary, we can override it, copy directly from input into output
2022-07-19 09:37:07 +02:00
hcqs33 9f80fea502
Fix error in TieredMergePolicy (#1028)
Fix error in comparing between bytes of candidates and bytes of max merge.
It's wrong to use candidateSize rather than currentCandidateBytes comparing with maxMergeBytes.
2022-07-19 09:21:09 +02:00
Tomoko Uchida 781edf442b
LUCENE-10557: Refine issue label texts (#1036) 2022-07-19 13:41:42 +09:00
Adrien Grand 216e38a159
Synchronize FieldInfos#verifyFieldInfos. (#1019)
This method is called from `addIndexes` and should be synchronized so that it
would see consistent data structures in case of concurrent indexing that would
be introducing new fields.

I hit a rare test failure of `TestIndexRearranger` that I can only explain by this lack of locking:

```
15:40:14    >     java.util.concurrent.ExecutionException: java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null
15:40:14    >         at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
15:40:14    >         at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
15:40:14    >         at org.apache.lucene.misc.index.IndexRearranger.execute(IndexRearranger.java:98)
15:40:14    >         at org.apache.lucene.misc.index.TestIndexRearranger.testRearrangeUsingBinaryDocValueSelector(TestIndexRearranger.java:97)
15:40:14    >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
15:40:14    >         at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
15:40:14    >         at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
15:40:14    >         at java.base/java.lang.reflect.Method.invoke(Method.java:568)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1758)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:946)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$9.evaluate(RandomizedRunner.java:982)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$10.evaluate(RandomizedRunner.java:996)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:44)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:45)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
15:40:14    >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:843)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:490)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:955)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:840)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:891)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:902)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:43)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:44)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:60)
15:40:14    >         at org.apache.lucene.test_framework@10.0.0-SNAPSHOT/org.apache.lucene.tests.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:47)
15:40:14    >         at junit@4.13.1/org.junit.rules.RunRules.evaluate(RunRules.java:20)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:390)
15:40:14    >         at randomizedtesting.runner@2.8.0/com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:850)
15:40:14    >         at java.base/java.lang.Thread.run(Thread.java:833)
15:40:14    >
15:40:14    >         Caused by:
15:40:14    >         java.lang.NullPointerException: Cannot read field "numDimensions" because "props" is null
15:40:14    >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifySameSchema(FieldInfos.java:459)
15:40:14    >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.FieldInfos$FieldNumbers.verifyFieldInfo(FieldInfos.java:359)
15:40:14    >             at org.apache.lucene.core@10.0.0-SNAPSHOT/org.apache.lucene.index.IndexWriter.addIndexes(IndexWriter.java:3149)
15:40:14    >             at org.apache.lucene.misc.index.IndexRearranger.addOneSegment(IndexRearranger.java:139)
15:40:14    >             at org.apache.lucene.misc.index.IndexRearranger.lambda$execute$0(IndexRearranger.java:92)
15:40:14    >             at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
15:40:14    >             at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
15:40:14    >             at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
15:40:14    >             ... 1 more
```
2022-07-18 16:17:29 +02:00
Adrien Grand 0364402b30 Fix rare test failures of TestIndexSorting.
Sometimes the final merge might not require sorting depending on the merge
policy configuration.
2022-07-18 15:26:08 +02:00
Vigya Sharma 30a7c52e6c
LUCENE-10649: Fix failures in TestDemoParallelLeafReader (#1025) 2022-07-18 14:32:38 +02:00
Tomoko Uchida 8938e6a3fa
LUCENE-10557: Add GitHub issue templates (#1024) 2022-07-18 15:33:00 +09:00
Greg Miller 9b185b99c4
LUCENE-10603: Remove SSDV#NO_MORE_ORDS definition (#1021) 2022-07-13 20:02:17 -07:00
Vigya Sharma ca7917472b
LUCENE-10648: Fix failures in TestAssertingPointsFormat.testWithExceptions (#1012)
* Fix failures in TestAssertingPointsFormat.testWithExceptions

* remove redundant finally block

* tidy

* remove TODO as it is done now
2022-07-13 13:55:08 -04:00
Christine Poerschke 56462b5f96
LUCENE-10523: factor out UnifiedHighlighter.newFieldHighlighter() method (#821) 2022-07-13 18:43:31 +01:00
Greg Miller 7c35311f29
Specialize ordinal encoding for SortedSetDocValues (#1010) 2022-07-12 18:55:54 -07:00
tang donghai d7c2def019
LUCENE-10619: Optimize the writeBytes in TermsHashPerField (#966) 2022-07-12 17:14:37 +02:00
Greg Miller d6dbe4374a Move LUCENE-10614 CHANGES entry to 10.0 and add MIGRATE entry 2022-07-11 09:10:58 -07:00
Yuting Gan 5ef7e5025d
LUCENE-10614: Properly support getTopChildren in RangeFacetCounts (#974) 2022-07-11 09:04:46 -07:00
Vigya Sharma 128869d63a
LUCENE-10647: Fix TestMergeSchedulerExternal failures (#1011)
Ensure mergeScheduler.sync() gets called before we rollback the writer.
2022-07-11 11:23:17 +02:00
Vigya Sharma c06b98262c
add comment for no pauses in writeBytes (#1014) 2022-07-11 11:12:00 +02:00
Stefan Vodita dd4e8b82d7
LUCENE-10603: Stop using SortedSetDocValues.NO_MORE_ORDS in tests (#1004) 2022-07-07 09:54:41 -07:00
zacharymorn da8143bfa3
LUCENE-10480: Move scoring from advance to TwoPhaseIterator#matches to improve disjunction within conjunction (#1006) 2022-07-07 01:10:50 -07:00
Vigya Sharma 698f40ad51
LUCENE-10216: Use MergeScheduler and MergePolicy to run addIndexes(CodecReader[]) merges. (#633)
* Use merge policy and merge scheduler to run addIndexes merges

* wrapped reader does not see deletes - debug

* Partially fixed tests in TestAddIndexes

* Use writer object to invoke addIndexes merge

* Use merge object info

* Add javadocs for new methods

* TestAddIndexes passing

* verify field info schemas upfront from incoming readers

* rename flag to track pooled readers

* Keep addIndexes API transactional

* Maintain transactionality - register segments with iw after all merges complete

* fix checkstyle

* PR comments

* Fix pendingDocs - numDocs mismatch bug

* Tests with 1-1 merges and partial merge failures

* variable renaming and better comments

* add test for partial merge failures. change tests to use 1-1 findmerges

* abort pending merges gracefully

* test null and empty merge specs

* test interim files are deleted

* test with empty readers

* test cascading merges triggered

* remove nocommits

* gradle check errors

* remove unused line

* remove printf

* spotless apply

* update TestIndexWriterOnDiskFull to accept mergeException from failing addIndexes calls

* return singleton reader mergespec in NoMergePolicy

* rethrow exceptions seen in merge threads on failure

* spotless apply

* update test to new exception type thrown

* spotlessApply

* test for maxDoc limit in IndexWriter

* spotlessApply

* Use DocValuesIterator instead of DocValuesFieldExistsQuery for counting soft deletes

* spotless apply

* change exception message for closed IW

* remove non-essential comments

* update api doc string

* doc string update

* spotless

* Changes file entry

* simplify findMerges API, add 1-1 merges to MockRandomMergePolicy

* update merge policies to new api

* remove unused imports

* spotless apply

* move changes entry to end of list

* fix testAddIndicesWithSoftDeletes

* test with 1-1 merge policy always enabled

* please spotcheck

* tidy

* test - never use 1-1 merge policy

* use 1-1 merge policy randomly

* Remove concurrent addIndexes findMerges from MockRandomMergePolicy

* Bug Fix: RuntimeException in addIndexes

Aborted pending merges were slipping through the merge exception check in
API, and getting caught later in the RuntimeException check.

* tidy

* Rebase on main. Move changes to 10.0

* Synchronize IW.AddIndexesMergeSource on outer class IW object

* tidy
2022-07-06 18:15:47 -04:00
Peter Gromov d537013e70
LUCENE-10626: Hunspell: add tools to aid dictionary editing: analysis introspection, stem expansion and stem/flag suggestion (#975) 2022-07-05 21:38:03 +02:00
Adrien Grand 3dd9a5487c
LUCENE-10636: Avoid computing the same scores multiple times. (#1005)
`BlockMaxMaxscoreScorer` would previously compute the score twice for essential
scorers.

Co-authored-by: zacharymorn <zacharymorn@gmail.com>
2022-07-05 10:14:02 +02:00
Adrien Grand 81d4a7a69f
LUCENE-10151: Some fixes to query timeouts. (#996)
I noticed some minor bugs in the original PR #927 that this PR should fix:
 - When a timeout is set, we would no longer catch
   `CollectionTerminatedException`.
 - I added randomization to `LuceneTestCase` to randomly set a timeout, it
   would have caught the above bug.
 - Fixed visibility of `TimeLimitingBulkScorer`.
2022-07-04 17:32:38 +02:00
zacharymorn 503ec55973
LUCENE-10480: Use BMM scorer for 2 clauses disjunction (#972) 2022-07-02 13:26:16 -07:00
Julie Tibshirani 187f843e2a
LUCENE-10577: Add vectors format unit test and fix toString (#998)
We forgot to add this unit test when introducing the new 9.3 vectors format.
This commit adds the test and fixes issues it uncovered in toString.
2022-07-02 19:00:29 +02:00
Uwe Schindler fb617e29d0 Remove deprecations in main (#978) 2022-07-01 16:32:50 +02:00