This changes HnswGraphBuilder to re-use the same candidates queues for adding nodes by allocating them in the Builder instance.
This saves about 2.5% of build time and takes memory allocations of NQ long[] from 25% of total to 0%. JFR runs are attached.
The difference from the first attempt (which actually made things slower for some graphs) is that it preserves the original code's behavior of using a 1-sized queue for the search in the levels above where the node actually gets added.
* Re-use NeighborQueue during build's search
* improve javadoc for OnHeapHnswGraphSearcher
* assert that results parameter is minheap as expected
* update CHANGES
We have recently increased the likelihood of leveraging inter-segment search
concurrency in tests when `newSearcher` is used to create the index
searcher (see #959). When parallel execution is enabled though, it is
dependent on the number of documents and segments. That means
that out of 1000 test runs that use `RandomIndexWriter` to index a random
number of docs up to 100, we will effectively parallelize only a couple
of times.
This commit increases the likelihood of running concurrent searches by
randomly forcing 1 max segments per slice as well as 1 max doc per slice.
PR #12169 accidentally moved the `TermAndBoost` class to a different location,
which would break custom sub-classes of `QueryBuilder`. This commit moves it
back to its original location.
This commit enables the Panama Vector API for Java 21. The version of
VectorUtilPanamaProvider for Java 21 is identical to that of Java 20.
As such, there is no specific 21 version - the Java 20 version will be
loaded from the MRJAR.
When reading data from outside the buffer, BufferedIndexInput always resets
its buffer to start at the new read position. If we are reading backwards (for example,
using an OffHeapFSTStore for a terms dictionary) then this can have the effect of
re-reading the same data over and over again.
This commit changes BufferedIndexInput to use paging when reading backwards,
so that if we ask for a byte that is before the current buffer, we read a block of data
of bufferSize that ends at the previous buffer start.
Fixes#12356
* TestHunspell: reduce the flakiness probability
We need to check how the timeout interacts with custom exception-throwing checkCanceled.
The default timeout seems not enough for some CI agents, so let's increase it.
Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
This method is showing up as a little hot when profiling some queries.
Almost all the time spent in this method is just burnt on ceremony
around stream indirections that don't inline.
Moving this to iterators, simplifying the check for same doc id and also saving one iteration (for the min
cost) makes this method far cheaper and easier to read.
The concurrent query rewrite for knn vectory query introduced with #12160
requests one thread per segment to the executor. To align this with the
IndexSearcher parallel behaviour, we should rather parallelize across
slices. Also, we can reuse the same slice executor instance that the
index searcher already holds, in that way we are using a
QueueSizeBasedExecutor when a thread pool executor is provided.