* Rewrite Javascript expression compiler to use hidden classes and MethodHandles for functions
* Use dynamic constants for MethodHandles
* Remove invokestatic code and handle everything through dynamic constants
* Rewrite code to patch stack trace (keep Expressions class unmodified)
* Improve generating of constant names
* Remove classloader test (no longer needed)
* Add benchmark
* use better exception in benchmark
* Add documentation, migration guide and a utility method to convert legacy function maps
* also ignore SecurityException here while checking compatibility (if it happens only an imprecise error message is thrown)
* Use Map.copyOf to not clone the map each time we compile an expression
* Add another test with same method multiple times
* Update ASM to 9.6 and set classfile version to Java 17
* Cleanup classloader permissions, unfortunately "createClassLoader" is still needed for Jacoco for God knows what
This commit fixes the intermittently failing TestParallelLeafReader.
The ParallelLeafReader requires the document order to be consistent across indexes - each document contains the union of the fields of all documents with the same document number. The test asserts this. But now, with MockRandomMergePolicy potentially reversing the doc ID order while merging, this invalidates the assumption of the test indexes and assertions. The solution is to just ensure that no merging actually happens in these tiny test indexes.
This commit fixes the intermittently failing TestSortedSetFieldSource.
The test assertions depend on doc order which may be affected by merging. The fix is to trivially avoid merging for the very small index, with just two docs.
* Report the time it took for building the FST
* Update CHANGES
* Change ramBytesUsed to numBytes
* Report the verification time
* Rename to fstSizeInBytes
* CheckIndex - Making -fast the default behaviour
1. Making -fast the new default.
2. The previous -slow is moved to -slower
3. The previous default behavior (checksum + segment file content) is activated by -slow.
* gradlew tidy
* Add changes.txt
* Moved change to Lucene 10.0, now using -detailLevel param
* Fix failing test
* Add MIGRATE.md note and comment to remove deprecated params
* Fix failing unit test
* Changing detailLevel -> level
* catch invalid API calls
* Update lucene/core/src/java/org/apache/lucene/index/CheckIndex.java
Co-authored-by: Adrien Grand <jpountz@gmail.com>
---------
Co-authored-by: Adrien Grand <jpountz@gmail.com>
This adds `BPReorderingMergePolicy`, a merge policy wrapper that reorders doc
IDs on merge using a `BPIndexReorderer`.
- Reordering always run on forced merges.
- A `minNaturalMergeNumDocs` parameter helps only enable reordering on the
larger merged segments. This way, small merges retain all merging
optimizations like bulk copying of stored fields, and only the larger
segments - which are the most important for search performance - get
reordered.
- If not enough RAM is available to perform reordering, reordering is skipped.
To make this work, I had to add the ability for any merge to reorder doc IDs of
the merged segment via `OneMerge#reorder`. `MockRandomMergePolicy` from the
test framework randomly reverts the order of documents in a merged segment to
make sure this logic is properly exercised.
Instead of using a fixed number of bits per value, the group-varint benchmark
now tries to reproduce the distribution of the number of bits per values that
can be observed on tail postings of wikibigall.
When we moved to group-varint for tail postings, we stop interleaving docs and
freqs and instead wrote all docs first, then all freqs. This means that we can
now skip decoding frequencies when they are not needed.
Make the TaskExecutor public which is currently pkg-private. At indexing time we concurrently create the hnsw graph (Concurrent HNSW Merge #12660). We could use the TaskExecutor implementation to do this for us.
Use TaskExecutor#invokeAll in HnswConcurrentMergeBuilder#build to run the workers concurrently.
* hunspell: a couple micro-optimizations to speed up dictionary loading
1. sort by the whole entry without searching for separators first: WordStorage doesn't require strong lexicographic order (only something close to it), and the separators are anyway before any usual word characters
2. avoid stream overhead when adding an entry
I think that this optimization was introduced because `advanceShallow` may
advance skip lists and then never decode a block of postings. But actually
`IndexInput#seek` is cheap, including on `NIOFSDirectory`. So let's seek
immediately?
* #7820: add initial (failing) test case exposing the bug in CheckIndex
* #7820: initial fix to CheckIndex to detect 'on load' corruption of older segments files
* exclamation point police
* tidy
* add missing @Override in new test case
* fold feedback: merge SIS-loading logic into single for-loop; rename sis -> lastCommit
* tidy
* sidestep segments.gen file as well in case we are reading very old index
* tidy
* always load last commit point first so if it has an issue that will not be masked by issues with older commit points
When requesting for k >= numVectors, it doesn't make sense to go through the HNSW graph. Even without a user supplied filter, we should not explore the HNSW graph if it contains fewer than k vectors.
One scenario where we may still explore the graph if k >= numVectors is when not every document has a vector and there are deleted docs. But, this commit significantly improves things regardless.
This was found by `testRandomExceptions()`: if an exception occurs when opening
the meta file, then the `rawVectorsReader` that is passed to the constructor
never gets closed.