This updates file formats to compute prefix sums by summing up 8 deltas per
long at the same time if the number of bits per value is 4 or less, and 4
deltas per long at the same time if the number of bits per value is between 5
included and 11 included. Otherwise, we keep summing up 2 deltas per long like
today.
The `PostingDecodingUtil` was slightly modified due to the fact that more
numbers of bits per value now need to apply different shifts to the input data.
E.g. now that we store integers that require 5 bits per value as 16-bit
integers under the hood rather than 8, we extract the first values by shifting
by 16-5=11, 16-2*5=6 and 16-3*5=1 and then decode tail values from the
remaining bit per 16-bit integer.
* Only run the ide configuration block for eclipse when explicitly invoked. fix property access ordering here and there.
* Correct dependsOn task name.
* Correct crlf/encoding after versionCatalogFormatDeps finishes.
* Change java-library to java-base in the plugin applied within the eclipse task.
* use ant.fixcrlf to correct line endings.
* Changes entry.
* Simplify fixcrlf
---------
Co-authored-by: Uwe Schindler <uschindler@apache.org>
This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences:
- Skip data is inlined into postings to make the access pattern more sequential.
- There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4,096 docs).
In general, I found that the fact that skip data is inlined may slow down a bit queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance of consult impacts) and speed up a bit queries that advance by small intervals. The fact that the greatest level only allows skipping 4096 docs at once means that we're slower at advancing by large intervals, but data suggests that it doesn't significantly hurt performance.
This commit adds a separate option, tests.defaultvectorization, to allow running Panama Vectorization for all tests with suitable C2 defaults.
For example:
./gradlew :lucene:core:test -Ptests.defaultvectorization=true
---------
Co-authored-by: Uwe Schindler <uschindler@apache.org>
* upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03
* add back-compat-hack to the factory, too
* remove open of irish package now that we don't have our own stopwords file here anymore
* CHANGES / MIGRATE
* Simple patch to prevent the common zero-width code points in our source and some types of resource files
* Validate correct UTF-8 input and fix buggy CSS file (ISO-8859-x encoded)
* add a bit of context
* Add CHANGES.txt
This commit reflows the code in the method body of computeCommonPrefixLengthAndBuildHistogram, so as to avoid a JVM JIT crash. The purpose of this change is to workaround the JVM bug which is somewhat fragile, but the best that we can do for now and appears to be working well.
* Rewrite Javascript expression compiler to use hidden classes and MethodHandles for functions
* Use dynamic constants for MethodHandles
* Remove invokestatic code and handle everything through dynamic constants
* Rewrite code to patch stack trace (keep Expressions class unmodified)
* Improve generating of constant names
* Remove classloader test (no longer needed)
* Add benchmark
* use better exception in benchmark
* Add documentation, migration guide and a utility method to convert legacy function maps
* also ignore SecurityException here while checking compatibility (if it happens only an imprecise error message is thrown)
* Use Map.copyOf to not clone the map each time we compile an expression
* Add another test with same method multiple times
* Update ASM to 9.6 and set classfile version to Java 17
* Cleanup classloader permissions, unfortunately "createClassLoader" is still needed for Jacoco for God knows what
* Change Postings back to using FOR in Lucene99PostingsFormat
We are still keeping PFOR for positions only.
This is a partial revert of https://github.com/apache/lucene/pull/69 which brings back ForDeltaUtil.
* fix merge commit
* Add forgotten forDeltaUtil calls to reader
* Addressing comments: adding Lucene90RWPostingsFormat + more
Also:
* Change to Changes.txt
* Removal of dead code which was only used in unit tests
* Removal of test code from PForUtil
* Changes.txt edit in right place now
* Apply suggestions from code review: `90 -> 99 refactoring`
Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
* Remove decodeTo32 from ForUtil and regenerate
---------
Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
The default tests.multiplier passed from gradle was 1, but
LuceneTestCase tried to compute its default value from TESTS_NIGHTLY.
This could lead to subtle errors: nightly mode failures would not report
tests.multipler=1 and when started from the IDE, the tests.multiplier
would be set to 2 (leading to different randomness).