Commit Graph

127 Commits

Author SHA1 Message Date
Adrien Grand 8ae03d66ad Move postings back to int[] to take advantage of having more lanes per vector. (#13968)
In Lucene 8.4, we updated postings to work on long[] arrays internally. This
allowed us to workaround the lack of explicit vectorization (auto-vectorization
doesn't detect all the scenarios that we would like to handle) support in the
JVM by summing up two integers in one operation for instance.

With explicit vectorization now available, it looks like we can get more
benefits from the ability to compare multiple intetgers in one operations than
from summing up two integers in one operation. Moving back to ints helps
compare 2x more integers at once vs. longs.
2024-11-04 18:40:53 +01:00
Dawid Weiss ea1441c81c
Upgrade to gradle 8.10 (#13700) 2024-08-30 12:36:56 +02:00
Adrien Grand 299d7f6721
Speed up prefix sums when decoding doc IDs. (#13658)
This updates file formats to compute prefix sums by summing up 8 deltas per
long at the same time if the number of bits per value is 4 or less, and 4
deltas per long at the same time if the number of bits per value is between 5
included and 11 included. Otherwise, we keep summing up 2 deltas per long like
today.

The `PostingDecodingUtil` was slightly modified due to the fact that more
numbers of bits per value now need to apply different shifts to the input data.
E.g. now that we store integers that require 5 bits per value as 16-bit
integers under the hood rather than 8, we extract the first values by shifting
by 16-5=11, 16-2*5=6 and 16-3*5=1 and then decode tail values from the
remaining bit per 16-bit integer.
2024-08-16 21:27:32 +02:00
Dawid Weiss 8b25c4dd0b Fix icu's regeneration script: instead of getVersion we can just pick the version from the catalog. 2024-08-15 22:28:49 +02:00
Dawid Weiss 1cfa697c06
Fix eclipse ide settings generation (#13649)
* Only run the ide configuration block for eclipse when explicitly invoked. fix property access ordering here and there.

* Correct dependsOn task name.

* Correct crlf/encoding after versionCatalogFormatDeps finishes.

* Change java-library to java-base in the plugin applied within the eclipse task.

* use ant.fixcrlf to correct line endings.

* Changes entry.

* Simplify fixcrlf

---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
2024-08-14 23:50:06 +02:00
Adrien Grand b4a8810b7a
Inline skip data into postings lists (#13585)
This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences:
 - Skip data is inlined into postings to make the access pattern more sequential.
 - There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4,096 docs).

In general, I found that the fact that skip data is inlined may slow down a bit queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance of consult impacts) and speed up a bit queries that advance by small intervals. The fact that the greatest level only allows skipping 4096 docs at once means that we're slower at advancing by large intervals, but data suggests that it doesn't significantly hurt performance.
2024-07-31 17:18:28 +02:00
Dawid Weiss dc287862dd
Gradle build: cleanup of dependency resolution and consolidation of dependency versions (#13484) 2024-06-17 09:49:21 +02:00
Dawid Weiss afe982b3ef
Schedule compileJava after the internal task if it affects source files (#13282) 2024-04-09 07:44:07 +02:00
Robert Muir a7e916223c
remove unnecessary chmod+x, file is marked executable in snowball git 2024-03-28 16:43:47 -04:00
Robert Muir 3553769463
remove now-unnecessary snowball mojibake hack (#13231) 2024-03-28 16:40:55 -04:00
Robert Muir 11712a3364
remove unsupported snowball algorithms (#13230) 2024-03-28 16:37:18 -04:00
Robert Muir ad8545151d
upgrade snowball to 34f3612e5e8c (round two) (#13227)
* upgrade snowball to 34f3612e5e8c (round two)

* disable forbidden-apis on snowball code (thanks @uschindler)
2024-03-27 17:51:48 -04:00
Robert Muir d54663ad76
upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03 (#13209)
* upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03

* add back-compat-hack to the factory, too

* remove open of irish package now that we don't have our own stopwords file here anymore

* CHANGES / MIGRATE
2024-03-27 10:05:57 -04:00
Uwe Schindler e446904c61
Remove ByteBufferIndexInput and update all Panama implementations (MMap and Vector) to Java 21 (#13146) 2024-02-29 19:38:37 +01:00
Uwe Schindler 178f5a7a7e
Enable MemorySegment in MMapDirectory for Java 22+ and Vectorization (incubation) for exact Java 22 (#12706) 2024-02-09 23:02:42 +01:00
sabi0 78b4f75a2c
Replace .collect(toList()) with .toList() and misc. code cleanups (#12978) 2023-12-30 17:04:11 +01:00
Uwe Schindler 10387f136f Fix encoding problem caused by invisible character with ExtractJdkApis.java 2023-12-12 15:00:01 +01:00
Jakub Slowinski 8ae598bae5
Remove patching for doc blocks. (#12741)
* Change Postings back to using FOR in Lucene99PostingsFormat

We are still keeping PFOR for positions only.
This is a partial revert of https://github.com/apache/lucene/pull/69 which brings back ForDeltaUtil.

* fix merge commit

* Add forgotten forDeltaUtil calls to reader

* Addressing comments: adding Lucene90RWPostingsFormat + more

Also:
* Change to Changes.txt
* Removal of dead code which was only used in unit tests
* Removal of test code from PForUtil

* Changes.txt edit in right place now

* Apply suggestions from code review: `90 -> 99 refactoring`

Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>

* Remove decodeTo32 from ForUtil and regenerate

---------

Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>
2023-11-06 10:46:03 -05:00
Uwe Schindler fad6653495
Fix Gradle toolchain download with Gradle 8.4 (#12655) (#12662) 2023-10-12 13:22:48 +02:00
Tyler Bertrand e43bea4fa4
Resolve CompileJava task cache miss (#12577)
Resolve CompileJava task cache miss. Implement CommandLineArgumentProvider to give relative path sensitivity to apiJar file
2023-09-21 10:42:20 +02:00
Uwe Schindler f668cfd1cd
Fix forUtil.gradle to actually execute python script and also fix type error in script (#12411) 2023-07-03 16:22:03 +02:00
Chris Hegarty 1090928c14
Implement VectorUtilProvider with Java 21 Project Pamana Vector API (#12363)
This commit enables the Panama Vector API for Java 21. The version of
VectorUtilPanamaProvider for Java 21 is identical to that of Java 20.
As such, there is no specific 21 version - the Java 20 version will be
loaded from the MRJAR.
2023-06-13 09:44:58 +01:00
Uwe Schindler c8e05c8cd6
Implement MMapDirectory with Java 21 Project Panama Preview API (#12294) 2023-06-12 21:07:04 +02:00
Uwe Schindler c188d47a8b
Handle jdk.internal classes mentioned in vector superclass or interfaces during extraction (#12329) 2023-05-25 17:21:03 +02:00
Uwe Schindler cf7245e38e Refactor loop to not addAll set to itsself on initial round (followup of #12311) 2023-05-25 09:06:45 +02:00
Chris Hegarty f756f90644
Integrate the Incubating Panama Vector API (#12311)
Leverage accelerated vector hardware instructions in Vector Search.

Lucene already has a mechanism that enables the use of non-final JDK APIs, currently used for the Previewing Pamana Foreign API. This change expands this mechanism to include the Incubating Pamana Vector API. When the jdk.incubator.vector module is present at run time the Panamaized version of the low-level primitives used by Vector Search is enabled. If not present, the default scalar version of these low-level primitives is used (as it was previously).

Currently, we're only targeting support for JDK 20. A subsequent PR should evaluate JDK 21.
---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
Co-authored-by: Robert Muir <rmuir@apache.org>
2023-05-25 07:59:50 +01:00
Uwe Schindler 84e2e3afc3
Make sure APIJAR reproduces with different timezone (unfortunately java encodes the date using local timezone) (#12315) 2023-05-19 18:42:55 +02:00
Uwe Schindler e4d8a5c5cb
Implement MMapDirectory with Java 20 Project Panama Preview API (#12188) 2023-03-09 21:27:31 +01:00
Uwe Schindler 8564da434d
Generate gradle.properties from gradlew (#12131)
* SOLR-16641 - Generate gradle.properties from gradlew (#1320)
* Adapt for Lucene
* Remove localSettings from smoker; thanks @colvinco
* Print properties at end for debugging
* Add CHANGES.txt entry

---------

Co-authored-by: Colvin Cowie <colvin.cowie.dev@gmail.com>
Co-authored-by: Colvin Cowie <51863265+colvinco@users.noreply.github.com>
2023-02-06 19:47:15 +01:00
Uwe Schindler c9401bf064
Patch class files for Java 19 code to no longer have the "preview" flag (this enables Java 19 memory segments by default) (#12033) 2022-12-26 10:07:44 +01:00
Dawid Weiss 3f6410b738
Implement source code regeneration for test-framework perl scripts (#11952) 2022-11-19 23:40:45 +01:00
Robert Muir 99aa6dd0eb
Lower gradle heap: 3GB is unnecessary (#11936) 2022-11-18 10:55:32 -05:00
Robert Muir b292151f21
reland: 11925 javac options (#11937)
* pass jvm args to javac  #11925

* Update net.ltgt.errorprone to the latest version so that jvm args are not overwritten, add -XX:+PrintFlagsFinal for debugging

* speed up the pure javac case too

It does not help to fork additional VMs (although error-prone will do
this since it messes with bootstrap classpath), so we avoid forking.

Instead it is best to tune org.gradle.jvmargs.

* use the flags consistently everywhere (tests and doc)

* handle the different possible alt toolchain cases

The difference is invoking 'java' versus invoking 'javac', so the args must be fed differently.

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-11-15 19:28:52 -05:00
Robert Muir 0753f48a80
Revert "speed up javac and error-prone tasks by using less resources (#11927)"
This reverts commit deab699188.

Having a shitty day: git lost changes :(
2022-11-15 19:08:43 -05:00
Robert Muir deab699188
speed up javac and error-prone tasks by using less resources (#11927)
* pass jvm args to javac  #11925
* Update net.ltgt.errorprone to the latest version so that jvm args are not overwritten, add -XX:+PrintFlagsFinal for debugging
* speed up the pure javac case too

It does not help to fork additional VMs (although error-prone will do
this since it messes with bootstrap classpath), so we avoid forking.

Instead it is best to tune org.gradle.jvmargs.

* use the flags consistently everywhere (tests and doc)

Co-authored-by: Dawid Weiss <dweiss@apache.org>
2022-11-15 19:04:17 -05:00
Robert Muir a02db83235
Revert "speed up javac and error-prone tasks by using less resources (#11927)"
This reverts commit b9c51d9a59.
2022-11-15 18:19:09 -05:00
Dawid Weiss b9c51d9a59
speed up javac and error-prone tasks by using less resources (#11927)
* pass jvm args to javac  #11925
* Update net.ltgt.errorprone to the latest version so that jvm args are not overwritten, add -XX:+PrintFlagsFinal for debugging
* speed up the pure javac case too

It does not help to fork additional VMs (although error-prone will do
this since it messes with bootstrap classpath), so we avoid forking.

Instead it is best to tune org.gradle.jvmargs.

* use the flags consistently everywhere (tests and doc)

Co-authored-by: Robert Muir <rmuir@apache.org>
2022-11-15 16:06:38 -05:00
Uwe Schindler 3b9c728ab5
MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23) (#912)
This uses Gradle's auto-provisioning to compile Java 19 classes and build a multi-release JAR from them. Please make sure to regenerate gradle.properties (delete it) or change "org.gradle.java.installations.auto-download" to "true"
2022-09-26 15:22:04 +02:00
Dawid Weiss 6b82be5f11
Regenerate sources after dependency updates. (#11817) 2022-09-25 18:09:30 +02:00
tang donghai b08e34722d
LUCENE-10646: Add some comment on LevenshteinAutomata (#1016)
* add Comment on Lev & pretty the toDot

* use auto generate scripts to add comment

* update checksum

* update checksum

* restore toDot

* add removeDeadStates in levAutomata

Co-authored-by: tangdonghai <tangdonghai@meituan.com>
2022-08-07 10:01:30 -04:00
Dawid Weiss f93e52e5bb
LUCENE-10669: The build should be more helpful when generated resources are touched (#1053) 2022-07-30 20:45:32 +02:00
Dawid Weiss ba1062620c
LUCENE-10510: Check module access prior to running gjf/spotless tasks (#802) 2022-04-10 20:35:45 +02:00
Tomoko Uchida 2a3e5ca07f
LUCENE-10475: Merge o.a.l.a.[ja|ko].util into o.a.l.a.[ja|ko].dict (#772) 2022-03-29 21:09:26 +09:00
Tomoko Uchida 76c9fd4e38 LUCENE-10416: Update Korean Dictionary to mecab-ko-dic-2.1.1-20180720 for Nori 2022-02-20 21:39:03 +09:00
Dawid Weiss b48cac0206
LUCENE-10285: try to force ordering of internal tasks, in spite of making top-level wrapper dependencies. (#549) 2021-12-17 19:12:09 +01:00
Robert Muir c8f5b9127d
LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)
* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces old crazy E_base etc properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones
2021-12-03 20:20:57 -05:00
Robert Muir af831d2810
LUCENE-10239: upgrade jflex (1.7.0 -> 1.8.2) (#452)
Upgrade jflex.

Change doesn't alter the behavior of any of the analyzers (unicode
version or grammar refactorings), just the minimal to get new tooling
working.
2021-11-19 09:24:27 -05:00
Dawid Weiss bae095ae48
LUCENE-10240: gradle regenerate fails on java 17 (#449) 2021-11-17 18:36:34 +01:00
Dawid Weiss 0eeba8d37c
LUCENE-10238: Update icu4j to 70.1. (#447) 2021-11-17 18:13:40 +01:00
Dawid Weiss 1a38cac68e LUCENE-10195: add commented-out org.gradle.caching=true to the generated local settings. 2021-11-02 12:18:51 +01:00