Commit Graph

36782 Commits

Author SHA1 Message Date
Adrien Grand 8811f31b9c
Add a post-collection hook to LeafCollector. (#12380)
This adds `LeafCollector#finish` as a per-segment post-collection hook. While
it was already possible to do this sort of things on top of the collector API
before, a downside is that the last leaf would need to be post-collected in the
current thread instead of using the executor, which is a missed opportunity for
making queries concurrent.
2023-06-30 15:19:35 +02:00
Sagar 40ee6e583e
Assign a dummy simScorer in TermsWeight if score is not needed (#12383) 2023-06-30 15:14:33 +02:00
Sorabh 223eecca33
Add a thread safe CachingLeafSlicesSupplier to compute and cache the LeafSlices used with concurrent segment (#12374)
search. It uses the protected method `slices` by default to compute the slices which can be
overriden by the sub classes of IndexSearcher
2023-06-30 14:57:34 +02:00
zhangchao 01200b5804
Speed up NumericDocValuesWriter with index sorting (#12381) 2023-06-30 14:56:56 +02:00
Uwe Schindler e503805758
Remove usage and add some legacy java.util classes to forbiddenapis (Stack, Hashtable, Vector) (#12404) 2023-06-29 16:56:41 +02:00
Luca Cavanna f44cc45cf8
Share concurrent execution code into TaskExecutor (#12398)
Lucene has a non-public SliceExecutor abstraction that handles the execution of tasks when search
is executed concurrently across leaf slices. Knn query vector rewrite has similar code that runs
tasks concurrently and waits for them to be completed and handles
eventual exceptions.

This commit shares code among these two scenarios, to reduce code
duplicate as well as to ensure that furhter improvements can be shared among them.
2023-06-28 13:52:01 +02:00
Adrien Grand 4029cc37a7
Fix MaxScoreBulkScorer#score's return value. (#12400)
`AssertingBulkScorer` asserts that the return value of `BulkScorer#score` may
not be in `[maxDoc, NO_MORE_DOCS)`. While this is not part of the contract of
`BulkScorer#score`, a reasonable implementation should never have return values
in this range, as it would suggest that more matches need collecting when we're
already out of the range of valid doc IDs. So this generally indicates a bug.

`MaxScoreBulkScorer` failed this assertion, because it can sometimes skip the
requested window of doc IDs, when the sum of maximum scores would be less than
the minimum competitive score. In that case, the best information it has is
that there are no matches in the window, but it cannot give a good estimate of
the next potential match.

This assertion in `AssertingBulkScorer` looks sane to me, so I made a small
change to `MaxScoreBulkScorer` to make sure it meets `AssertingScorer`'s
expectations. This is done in a place that is only called once per scored
window, so it should not have a noticeable performance impact.
2023-06-28 09:17:41 +02:00
yixunx 6bb8cc0235
Let hard link wrapper fallback to delegate.copyFrom (#12384)
Co-authored-by: Yixun Xu <yixunx@palantir.com>
2023-06-28 09:17:20 +02:00
Stefan Vodita b88d3e1988
Catch offset overflows in byte pool (#9660) (#12392) 2023-06-28 09:16:56 +02:00
Tomas Eduardo Fernandez Lobbe b50b969d25
Update comment about IndexOptions ordinals (#12360)
FieldInfos no longer accepts changes in IndexOptions, however, different IndexOptions are still compared using their ordinals
2023-06-26 16:29:13 -07:00
Alan Woodward 79e8c9c8b9 Fix edge case in TestJoinUtil
TestJoinUtil.checkBoost() needs to check to see if there are
any results to validate, otherwise we can get an array-out-of-bounds
exception
2023-06-26 13:28:09 +01:00
Adrien Grand 4fec812cc5 Add back-compat indices for 9.7.0 2023-06-26 14:24:27 +02:00
Luca Cavanna 3e0dc2b572 Add missing change entires for 9.8 2023-06-26 11:23:36 +02:00
Adrien Grand edc2cf5cd1 DOAP changes for release 9.7.0 2023-06-26 11:05:46 +02:00
Alan Woodward edd799824f
Enable boosts on JoinUtil queries (#12388)
Boosts should not be ignored by queries returned from JoinUtil
2023-06-26 09:47:14 +01:00
Luca Cavanna 7f10dca1e5
Revert "Parallelize knn query rewrite across slices rather than segments (#12325)" (#12385)
This reverts commit 10bebde269.

Based on a recent discussion in
https://github.com/apache/lucene/pull/12183#discussion_r1235739084 we
agreed it makes more sense to parallelize knn query vector rewrite
across leaves rather than leaf slices.
2023-06-26 10:41:18 +02:00
Michael Sokolov cb195bd96e
github-12386: set java.io.tmpdir in replicator tests' forked processes (#12387) 2023-06-23 08:38:06 -04:00
Jonathan Ellis fe0278e36e
Reuse neighborqueue during hnsw index build (attempt 2) (#12372)
This changes HnswGraphBuilder to re-use the same candidates queues for adding nodes by allocating them in the Builder instance.

This saves about 2.5% of build time and takes memory allocations of NQ long[] from 25% of total to 0%. JFR runs are attached.

The difference from the first attempt (which actually made things slower for some graphs) is that it preserves the original code's behavior of using a 1-sized queue for the search in the levels above where the node actually gets added.

* Re-use NeighborQueue during build's search

* improve javadoc for OnHeapHnswGraphSearcher

* assert that results parameter is minheap as expected

* update CHANGES
2023-06-20 15:05:37 -04:00
Adrien Grand 8703e449ce
Change the MAXSCORE scorer to a bulk scorer. (#12361) 2023-06-20 18:55:03 +02:00
zhangchao 37b92adf6a
Avoid redundant loop for compute min value in DirectMonotonicWriter (#12377)
* Avoid redundant loop for get min value

* update CHANGES.txt
2023-06-20 09:12:15 -04:00
Alan Woodward 6d4314d46f Add back-compat indices for 9.6.0 2023-06-16 17:58:31 +02:00
Adrien Grand 6ee7b2b9f6 Add next minor version 9.8.0 2023-06-16 13:39:30 +02:00
Uwe Schindler 148236a50b
This allows VectorUtilProvider tests to be executed although hardware may not fully support vectorization or if C2 is not enabled (#12376) 2023-06-16 12:28:29 +02:00
Luca Cavanna bb6ec50d4c
Increased the likelihood of leveraging inter-segment concurrency in tests (#12369)
We have recently increased the likelihood of leveraging inter-segment search
concurrency in tests when `newSearcher` is used to create the index
searcher (see #959). When parallel execution is enabled though, it is
dependent on the number of documents and segments. That means
that out of 1000 test runs that use `RandomIndexWriter` to index a random
number of docs up to 100, we will effectively parallelize only a couple
of times.

This commit increases the likelihood of running concurrent searches by
randomly forcing 1 max segments per slice as well as 1 max doc per slice.
2023-06-15 11:35:10 +02:00
Alessandro Benedetti af1afc8cb6 * GITHUB#12252 CHANGES.txt fix 2023-06-14 16:00:41 +01:00
Elia Porciani 14c18d8624
GITHUB-12252: Add function queries for computing similarity scores between knn vectors (#12253)
Co-authored-by: Alessandro Benedetti <a.benedetti@sease.io>
2023-06-14 15:49:00 +01:00
Adrien Grand a8baa47733
Move TermAndBoost back to its original location. (#12366)
PR #12169 accidentally moved the `TermAndBoost` class to a different location,
which would break custom sub-classes of `QueryBuilder`. This commit moves it
back to its original location.
2023-06-14 11:54:10 +02:00
Chaitanya Gohel 65447c8388
Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector (#12368)
Signed-off-by: gashutos <gashutos@amazon.com>
2023-06-14 10:17:06 +02:00
Chris Hegarty 1090928c14
Implement VectorUtilProvider with Java 21 Project Pamana Vector API (#12363)
This commit enables the Panama Vector API for Java 21. The version of
VectorUtilPanamaProvider for Java 21 is identical to that of Java 20.
As such, there is no specific 21 version - the Java 20 version will be
loaded from the MRJAR.
2023-06-13 09:44:58 +01:00
Jonathan Ellis 071461ece5
Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors (#12281)
---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
2023-06-13 10:40:03 +02:00
gf2121 30eba6df56
Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance (#12324) 2023-06-13 14:24:26 +08:00
Uwe Schindler c8e05c8cd6
Implement MMapDirectory with Java 21 Project Panama Preview API (#12294) 2023-06-12 21:07:04 +02:00
Chris Fournier 41baf23ad9
Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth (#12249) 2023-06-12 18:20:10 +02:00
Uwe Schindler ef35e6edf4
Work around SecurityManager issues during initialization of vector api (JDK-8309727) (#12362) 2023-06-09 22:07:31 +02:00
Alan Woodward a51241e4c9
Better paging when random reads go backwards (#12357)
When reading data from outside the buffer, BufferedIndexInput always resets
its buffer to start at the new read position. If we are reading backwards (for example,
using an OffHeapFSTStore for a terms dictionary) then this can have the effect of
re-reading the same data over and over again.

This commit changes BufferedIndexInput to use paging when reading backwards,
so that if we ask for a byte that is before the current buffer, we read a block of data
of bufferSize that ends at the previous buffer start.

Fixes #12356
2023-06-09 11:57:59 +01:00
fudongying 2934899ca6
feat: soft delete optimize (#12339) 2023-06-09 11:41:28 +02:00
Ignacio Vera 9a2d19324f
[Tessellator] Improve the checks that validate the diagonal between two polygon nodes (#12353) 2023-06-09 08:10:33 +02:00
Peter Gromov 5b63a1879d
TestHunspell: reduce the flakiness probability (#12351)
* TestHunspell: reduce the flakiness probability

We need to check how the timeout interacts with custom exception-throwing checkCanceled.
The default timeout seems not enough for some CI agents, so let's increase it.

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2023-06-07 14:10:44 +02:00
Patrick Zhai 0c293909c0
Add updateDocuments API which accept a query (reopen) (#12346) 2023-06-03 20:16:16 -07:00
Greg Miller 52ace7eb35
Add "direct to binary" option for DaciukMihovAutomatonBuilder and use it in TermInSetQuery#visit (#12320) 2023-06-02 09:34:52 -07:00
Petr Portnov | PROgrm_JARvis 45110a6a46
Make memory fence in `ByteBufferGuard` explicit (#12290) 2023-06-01 13:41:06 +02:00
Uwe Schindler 40b582ab18
Revert "Add updateDocuments API which accept a query (#12341)" (#12344)
This reverts commit 52ab16731e.
2023-06-01 13:37:36 +02:00
Patrick Zhai 52ab16731e
Add updateDocuments API which accept a query (#12341) 2023-06-01 13:37:04 +02:00
Peter Gromov 4bf1b94209
hunspell (minor): reduce allocations when reading the dictionary's morphological data (#12323)
there can be many entries with morph data, so we'd better avoid compiling and matching regexes and even stream allocation
2023-06-01 11:37:38 +02:00
tang donghai ac8c1870fa
NeighborQueue set incomplemete false when call clear (#12322) 2023-05-31 19:55:21 -07:00
Greg Miller f79b316bd5 Add CHANGES entry for GH#12334 2023-05-31 15:18:34 -07:00
Chaitanya Gohel d44be24025
Fix searchafter high latency when after value is out of range for segment (#12334) 2023-05-31 15:07:53 -07:00
Daniele Antuzi da36c24cb9
Use thread-safe search version of HnswGraphSearcher (#12246)
Addressing comment received in the PR https://github.com/apache/lucene/pull/12246
2023-05-30 15:38:06 +01:00
Luca Cavanna 72b91156f3
Don't generate stacktrace for TimeExceededException (#12335)
The exception is package private and never rethrown, we can avoid
generating a stacktrace for it.
2023-05-30 10:29:46 +02:00
Patrick Zhai d1850e44f3
Update TestVectorUtilProviders.java (#12338) 2023-05-29 16:26:29 -07:00