This updates file formats to compute prefix sums by summing 8 deltas per long
at once when the number of bits per value is 4 or less, and 4 deltas per long
at once when the number of bits per value is between 5 and 11 inclusive.
Otherwise, we keep summing 2 deltas per long as we do today.
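As a rough illustration of why wider groupings need fewer steps, here is a hedged sketch of the shift-and-add prefix-sum idea for the 8-deltas-per-long case (illustrative only, not the generated `ForDeltaUtil` code). With at most 4 bits per delta, the running total of 8 deltas is at most 8 * 15 = 120, so the 8-bit lanes can never overflow into each other:

```java
// Sketch only: assumes delta i sits in byte lane i of the long,
// least-significant byte first, with at most 4 bits per delta.
static long prefixSumWithin8Lanes(long packed) {
  packed += packed << 8;   // lane i += lane i-1
  packed += packed << 16;  // lane i += lanes i-2 and i-3
  packed += packed << 32;  // lane i += lanes i-4 .. i-7
  return packed;           // lane i now holds delta_0 + ... + delta_i
}
```

The 4-deltas-per-long case works the same way with 16-bit lanes and two shift-add steps (4 deltas of up to 11 bits sum to at most 8,188, which still fits in 16 bits), and the 2-deltas-per-long case needs a single 32-bit shift-add.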
The `PostingDecodingUtil` was slightly modified because more numbers of bits
per value now need different shifts applied to the input data. E.g. now that
integers that require 5 bits per value are stored as 16-bit integers under the
hood rather than 8-bit, we extract the first values by shifting by 16-5=11,
16-2*5=6 and 16-3*5=1, and then decode tail values from the remaining bit of
each 16-bit integer.
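Concretely, the shift pattern described above might look like the following sketch (illustrative, not the actual generated decoding code):

```java
// Each 16-bit lane holds three 5-bit values plus one leftover low bit.
static void decodeThree5BitValues(int lane16, int[] out, int off) {
  out[off]     = (lane16 >>> 11) & 0x1F; // shift by 16 - 1*5 = 11
  out[off + 1] = (lane16 >>> 6) & 0x1F;  // shift by 16 - 2*5 = 6
  out[off + 2] = (lane16 >>> 1) & 0x1F;  // shift by 16 - 3*5 = 1
  // The remaining low bit of each lane is gathered across lanes
  // to reconstruct the tail values.
}
```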
* Only run the IDE configuration block for eclipse when explicitly invoked. Fix property access ordering here and there.
* Correct dependsOn task name.
* Correct crlf/encoding after versionCatalogFormatDeps finishes.
* Change java-library to java-base in the plugin applied within the eclipse task.
* Use ant.fixcrlf to correct line endings.
* Changes entry.
* Simplify fixcrlf
---------
Co-authored-by: Uwe Schindler <uschindler@apache.org>
This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences:
- Skip data is inlined into postings to make the access pattern more sequential.
- There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4,096 docs).
In general, I found that inlining skip data may slightly slow down queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance or consult impacts) and slightly speed up queries that advance by small intervals. The fact that the highest level only allows skipping 4,096 docs at once means that we're slower at advancing by large intervals, but data suggests that this doesn't significantly hurt performance.
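To make the access pattern concrete, here is a toy, self-contained model of two-level skipping with the same 128/4,096 granularity (the real format serializes skip entries inline with the postings rather than recomputing anything from an array, and aligns them to block boundaries):

```java
// Toy model only, not the actual postings reader. For simplicity, windows
// are measured from the current position rather than aligned block boundaries.
class TwoLevelSkip {
  private final int[] docs; // sorted doc IDs
  private int pos = 0;

  TwoLevelSkip(int[] docs) { this.docs = docs; }

  int advance(int target) {
    // Level 1: skip 32 blocks (4,096 docs) at a time while they end before target.
    while (pos + 4096 <= docs.length && docs[pos + 4095] < target) pos += 4096;
    // Level 0: skip one block (128 docs) at a time.
    while (pos + 128 <= docs.length && docs[pos + 127] < target) pos += 128;
    // Scan linearly within the current block.
    while (pos < docs.length && docs[pos] < target) pos++;
    return pos < docs.length ? docs[pos] : Integer.MAX_VALUE; // NO_MORE_DOCS
  }
}
```

This also shows why very large advances get slower: each level-1 step covers at most 4,096 docs, so a long jump decomposes into many level-1 steps.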
* upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03
* add back-compat-hack to the factory, too
* remove the open of the Irish package now that we don't have our own stopwords file here anymore
* CHANGES / MIGRATE
* Change Postings back to using FOR in Lucene99PostingsFormat
We still keep PFOR for positions only (a rough sketch of the FOR encoding follows at the end of this entry).
This is a partial revert of https://github.com/apache/lucene/pull/69, which brings back ForDeltaUtil.
* fix merge commit
* Add forgotten forDeltaUtil calls to reader
* Addressing comments: adding Lucene90RWPostingsFormat + more
Also:
* Change to Changes.txt
* Removal of dead code which was only used in unit tests
* Removal of test code from PForUtil
* Changes.txt edit in right place now
* Apply suggestions from code review: `90 -> 99 refactoring`
Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
* Remove decodeTo32 from ForUtil and regenerate
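For context on the FOR vs. PFOR distinction: FOR (Frame Of Reference) bit-packs a whole block at the bit width of its largest value, while PFOR packs at a smaller width and "patches" a few large outliers as exceptions. A hedged sketch of plain FOR packing (illustrative; Lucene's `ForUtil` is code-generated and structured differently):

```java
// Pack a block of non-negative ints at the bit width of the largest value.
static int bitsRequired(int[] block) {
  int bits = 0;
  for (int v : block) bits |= v;
  return 32 - Integer.numberOfLeadingZeros(bits | 1); // at least 1 bit
}

static long[] pack(int[] block, int bitsPerValue) {
  long[] out = new long[(block.length * bitsPerValue + 63) / 64];
  for (int i = 0; i < block.length; i++) {
    int bitPos = i * bitsPerValue;
    int idx = bitPos >>> 6, shift = bitPos & 63;
    out[idx] |= (long) block[i] << shift;
    if (shift + bitsPerValue > 64) { // value straddles two longs
      out[idx + 1] |= (long) block[i] >>> (64 - shift);
    }
  }
  return out;
}
```

Decoding plain FOR skips the exception-patching step, which is the usual motivation for preferring it on hot doc and frequency blocks.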
---------
Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
This commit enables the Panama Vector API for Java 21. The version of
VectorUtilPanamaProvider for Java 21 is identical to that of Java 20.
As such, there is no specific 21 version - the Java 20 version will be
loaded from the MRJAR.
Leverage accelerated vector hardware instructions in Vector Search.
Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the preview Panama Foreign API. This change expands this mechanism to include the incubating Panama Vector API. When the jdk.incubator.vector module is present at run time, the Panamaized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously).
Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21.
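For illustration, a Panamaized low-level primitive of the kind this change enables looks roughly like this (a hedged sketch, not the actual VectorUtilPanamaProvider code; it must run with `--add-modules jdk.incubator.vector`):

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class PanamaDotProductSketch {
  private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

  // Assumes a.length == b.length.
  static float dotProduct(float[] a, float[] b) {
    FloatVector acc = FloatVector.zero(SPECIES);
    int i = 0;
    int bound = SPECIES.loopBound(a.length);
    for (; i < bound; i += SPECIES.length()) {
      FloatVector va = FloatVector.fromArray(SPECIES, a, i);
      FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
      acc = va.fma(vb, acc); // acc += va * vb, lane-wise
    }
    float sum = acc.reduceLanes(VectorOperators.ADD);
    for (; i < a.length; i++) { // scalar tail
      sum += a[i] * b[i];
    }
    return sum;
  }
}
```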
---------
Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Robert Muir <rmuir@apache.org>
* pass jvm args to javac #11925
* Update net.ltgt.errorprone to the latest version so that jvm args are not overwritten; add -XX:+PrintFlagsFinal for debugging
* speed up the pure javac case too
It does not help to fork additional VMs (although error-prone will do this
anyway, since it messes with the bootstrap classpath), so we avoid forking.
Instead, it is best to tune org.gradle.jvmargs.
* use the flags consistently everywhere (tests and doc)
* handle the different possible alt toolchain cases
The difference is invoking 'java' versus invoking 'javac', so the args must be fed differently.
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Robert Muir <rmuir@apache.org>
This uses Gradle's auto-provisioning to compile Java 19 classes and build a multi-release JAR from them. Please make sure to regenerate gradle.properties (delete it) or change `org.gradle.java.installations.auto-download` to `true`.
* add a comment on Lev & prettify the toDot output
* use the auto-generation scripts to add the comment
* update checksum
* update checksum
* restore toDot
* add removeDeadStates in levAutomata
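For reference, the public APIs this entry touches can be exercised roughly like this (a sketch; the commit itself edits the generated code and its comments):

```java
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.LevenshteinAutomata;
import org.apache.lucene.util.automaton.Operations;

public class LevDemo {
  public static void main(String[] args) {
    // Levenshtein automaton accepting terms within edit distance 1 of "lucene"
    Automaton lev = new LevenshteinAutomata("lucene", true).toAutomaton(1);
    Automaton pruned = Operations.removeDeadStates(lev); // as added in levAutomata
    System.out.println(pruned.toDot()); // Graphviz output, as prettified above
  }
}
```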
Co-authored-by: tangdonghai <tangdonghai@meituan.com>
* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces the old crazy E_base etc. properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones
Upgrade jflex.
The change doesn't alter the behavior of any of the analyzers (no Unicode
version bumps or grammar refactorings); it's just the minimum needed to get
the new tooling working.