585 Commits

Author SHA1 Message Date
Chris Hegarty
8698dd85d8
Allow easier verification of the Panama Vectorization provider with newer Java versions (#13986)
This commit allows easier verification of the Panama Vectorization provider with newer Java versions.

The upper bound Java version of the Vectorization provider is hardcoded to the version that has been tested and is known to work. This is a bit inflexible when experimenting with and verifying newer JDK versions. This change adds a new system property that sets the upper bound of the range of supported Java versions.

With this change, and the accompanying small Gradle change, one can verify newer JDKs as follows:

CI=true; RUNTIME_JAVA_HOME=/Users/chegar/binaries/jdk-24.jdk-ea-b23/Contents/Home
./gradlew :lucene:core:test -Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24

This change helps with testing and verifying Early Access JDK builds, and allows overriding the upper bound when the JDK is known to work fine.
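As a rough illustration of the idea (not Lucene's actual implementation), a version-range check with a system-property override could look like the sketch below. The property name comes from the commit message; the class name, the minimum of 21, and the default upper bound of 23 are assumptions made for the example.

```java
// Hypothetical sketch of a Java feature-version range check with an override.
// Not Lucene's code: class name and default bounds are assumptions.
final class UpperBoundCheck {
  // Property name as given in the commit message.
  static final String UPPER_PROP =
      "org.apache.lucene.vectorization.upperJavaFeatureVersion";

  static final int MIN_FEATURE = 21;         // assumed lower bound
  static final int DEFAULT_MAX_FEATURE = 23; // assumed tested upper bound

  /** Returns the upper bound, using the override when it parses as an integer. */
  static int upperBound(String override) {
    if (override == null || override.isBlank()) {
      return DEFAULT_MAX_FEATURE;
    }
    try {
      return Integer.parseInt(override.trim());
    } catch (NumberFormatException e) {
      return DEFAULT_MAX_FEATURE; // this sketch ignores malformed values
    }
  }

  /** True if the runtime feature version falls inside [MIN_FEATURE, upperBound]. */
  static boolean isSupported(int runtimeFeature, String override) {
    return runtimeFeature >= MIN_FEATURE && runtimeFeature <= upperBound(override);
  }

  public static void main(String[] args) {
    int feature = Runtime.version().feature();
    String override = System.getProperty(UPPER_PROP);
    System.out.println("Java " + feature + " in range: " + isSupported(feature, override));
  }
}
```

Running it with `-Dorg.apache.lucene.vectorization.upperJavaFeatureVersion=24` would widen the accepted range in this sketch to include Java 24.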
2024-11-14 09:02:15 +00:00
Adrien Grand
cfdd20f5bc
Move postings back to int[] to take advantage of having more lanes per vector. (#13968)
In Lucene 8.4, we updated postings to work on long[] arrays internally. This
allowed us to work around the lack of explicit vectorization support in the
JVM (auto-vectorization doesn't detect all the scenarios that we would like to
handle), for instance by summing up two integers in one operation.

With explicit vectorization now available, it looks like we can get more
benefits from the ability to compare multiple integers in one operation than
from summing up two integers in one operation. Moving back to ints helps
compare 2x more integers at once vs. longs.
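The "two integers in one operation" trick mentioned above can be sketched as follows. This is an illustrative example with hypothetical names, not the code from the postings format: two ints are packed into the halves of one long so that a single 64-bit addition sums both pairs. It assumes non-negative values whose low-half sums do not overflow 32 bits.

```java
// Illustrative sketch: summing two int pairs with one 64-bit addition.
// Names are hypothetical; assumes non-negative values and no overflow
// out of the low 32 bits.
final class PackedSum {
  /** Packs hi into the upper 32 bits and lo into the lower 32 bits. */
  static long pack(int hi, int lo) {
    return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
  }

  static int hi(long packed) {
    return (int) (packed >>> 32);
  }

  static int lo(long packed) {
    return (int) packed;
  }

  public static void main(String[] args) {
    long a = pack(3, 10);
    long b = pack(4, 20);
    long sum = a + b; // one add computes 3 + 4 and 10 + 20
    System.out.println(hi(sum) + " " + lo(sum)); // prints "7 30"
  }
}
```

By contrast, a hardware vector of a fixed width holds twice as many 32-bit lanes as 64-bit lanes (for example, 8 ints versus 4 longs in 256 bits), which is the "2x more integers at once" referred to above.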
2024-11-01 07:49:09 +01:00
Rene Groeschke
5f0d1fbd0c
Make generated archive files reproducible (#13835)
* Make generated archive files reproducible

This should ensure deterministic archive files and fix issues with changing checksums even
though the codebase has not changed.
2024-10-10 11:49:41 +02:00
Dawid Weiss
d7dc57dd0d
jgit/ clean status check should ignore any 'untracked folders' (#13728)
* Ignore any 'untracked folders' #13719
* Upgrade jgit to 6.10.0.202406032230-r.
2024-09-06 09:01:15 +02:00
Dawid Weiss
ea1441c81c
Upgrade to gradle 8.10 (#13700) 2024-08-30 12:36:56 +02:00
Uwe Schindler
e45b5ebdf8
Fix Gradle build sometimes gives spurious "unreferenced license file" warnings (#13696)
Revert changes by #12150 in jar-checks.gradle, because tasks in this file share internal state without using files. Because of this, all tasks here must always execute together, so they cannot define task outputs.
2024-08-28 20:35:26 +02:00
Stefan Vodita
385f56ba22
Remove mention of SolrNamedThreadFactory (#13690) 2024-08-28 11:07:56 +01:00
Adrien Grand
299d7f6721
Speed up prefix sums when decoding doc IDs. (#13658)
This updates file formats to compute prefix sums by summing up 8 deltas per
long at the same time if the number of bits per value is 4 or less, and 4
deltas per long at the same time if the number of bits per value is between 5
and 11 inclusive. Otherwise, we keep summing up 2 deltas per long like
today.

The `PostingDecodingUtil` was slightly modified because more bit widths now
need different shifts applied to the input data.
E.g. now that we store integers that require 5 bits per value as 16-bit
integers under the hood rather than 8, we extract the first values by shifting
by 16-5=11, 16-2*5=6 and 16-3*5=1 and then decode tail values from the
remaining bit per 16-bit integer.
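The shift arithmetic in the example above (16-5=11, 16-2*5=6, 16-3*5=1) can be checked with a small sketch. It works on a single 16-bit word with an assumed layout (values stored from the most significant bits down, one tail bit left over); the class and method names are hypothetical, and the real decoder operates on many values at once rather than one word.

```java
// Illustrative sketch of extracting three 5-bit values from a 16-bit word.
// Layout and names are assumptions, not the actual PostingDecodingUtil code.
final class FiveBitShifts {
  static final int BITS_PER_VALUE = 5;
  static final int MASK = (1 << BITS_PER_VALUE) - 1; // 0x1F

  /** Packs three 5-bit values and one tail bit into the low 16 bits of an int. */
  static int pack(int v0, int v1, int v2, int tailBit) {
    return (v0 << 11) | (v1 << 6) | (v2 << 1) | (tailBit & 1);
  }

  /** Extracts value i (0..2) by shifting by 16 - (i + 1) * 5, i.e. 11, 6, 1. */
  static int extract(int word, int i) {
    int shift = 16 - (i + 1) * BITS_PER_VALUE;
    return (word >>> shift) & MASK;
  }

  /** The single bit left over after three 5-bit values. */
  static int tailBit(int word) {
    return word & 1;
  }

  public static void main(String[] args) {
    int word = pack(17, 3, 30, 1);
    // prints "17 3 30 tail=1"
    System.out.println(
        extract(word, 0) + " " + extract(word, 1) + " " + extract(word, 2)
            + " tail=" + tailBit(word));
  }
}
```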
2024-08-16 21:27:32 +02:00
Dawid Weiss
8b25c4dd0b Fix icu's regeneration script: instead of getVersion we can just pick the version from the catalog. 2024-08-15 22:28:49 +02:00
Dawid Weiss
1cfa697c06
Fix eclipse ide settings generation (#13649)
* Only run the ide configuration block for eclipse when explicitly invoked. fix property access ordering here and there.

* Correct dependsOn task name.

* Correct crlf/encoding after versionCatalogFormatDeps finishes.

* Change java-library to java-base in the plugin applied within the eclipse task.

* use ant.fixcrlf to correct line endings.

* Changes entry.

* Simplify fixcrlf

---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
2024-08-14 23:50:06 +02:00
Adrien Grand
b4a8810b7a
Inline skip data into postings lists (#13585)
This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences:
 - Skip data is inlined into postings to make the access pattern more sequential.
 - There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4,096 docs).

In general, I found that inlining skip data may slightly slow down queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance or consult impacts) and slightly speed up queries that advance by small intervals. The fact that the greatest level only allows skipping 4,096 docs at once means that we're slower at advancing by large intervals, but data suggests that it doesn't significantly hurt performance.
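To get a feel for the cost of large advances with only two skip levels, here is a back-of-the-envelope sketch. It counts skip steps over a number of postings, assuming the advance starts on a level-1 boundary; the names are hypothetical and this is plain arithmetic, not the postings reader's logic.

```java
// Back-of-the-envelope sketch: how many skip steps a large advance needs
// with two skip levels. Names are hypothetical; assumes the advance starts
// on a level-1 boundary.
final class SkipLevels {
  static final int BLOCK_SIZE = 128;                         // docs per block (level 0)
  static final int LEVEL1_BLOCKS = 32;                       // blocks per level-1 entry
  static final int LEVEL1_SIZE = BLOCK_SIZE * LEVEL1_BLOCKS; // 4,096 docs

  /** Number of level-1 skips needed to move past the given number of postings. */
  static long level1Skips(long postings) {
    return postings / LEVEL1_SIZE;
  }

  /** Number of level-0 (per-block) skips needed after the level-1 skips. */
  static long level0Skips(long postings) {
    return (postings % LEVEL1_SIZE) / BLOCK_SIZE;
  }

  public static void main(String[] args) {
    long n = 1_000_000L;
    // prints "244 level-1 skips, 4 level-0 skips"
    System.out.println(level1Skips(n) + " level-1 skips, " + level0Skips(n) + " level-0 skips");
  }
}
```

Under these assumptions, skipping a million postings takes a few hundred level-1 steps rather than a handful, which is the kind of overhead the paragraph above weighs against the more sequential access pattern.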
2024-07-31 17:18:28 +02:00
Dawid Weiss
dc287862dd
Gradle build: cleanup of dependency resolution and consolidation of dependency versions (#13484) 2024-06-17 09:49:21 +02:00
Dawid Weiss
06f86a5096
Silence odd test runner warnings after gradle upgrade (#13471) 2024-06-10 11:31:40 +02:00
Chris Hegarty
9a4caa935a
Update Gradle wrapper to 8.8 - supports Java 22 (#13453)
This commit updates the Gradle wrapper to 8.8, which has support for Java 22.
2024-06-06 08:46:18 +01:00
Chris Hegarty
8d7e4174af
Add a separate option to allow running Panama Vectorization for all tests with suitable C2 defaults (#13351)
This commit adds a separate option, tests.defaultvectorization, to allow running Panama Vectorization for all tests with suitable C2 defaults.

For example:

./gradlew :lucene:core:test -Ptests.defaultvectorization=true
---------

Co-authored-by: Uwe Schindler <uschindler@apache.org>
2024-05-09 11:00:51 +01:00
Dawid Weiss
afe982b3ef
Schedule compileJava after the internal task if it affects source files (#13282) 2024-04-09 07:44:07 +02:00
Robert Muir
a7e916223c
remove unnecessary chmod+x, file is marked executable in snowball git 2024-03-28 16:43:47 -04:00
Robert Muir
3553769463
remove now-unnecessary snowball mojibake hack (#13231) 2024-03-28 16:40:55 -04:00
Robert Muir
11712a3364
remove unsupported snowball algorithms (#13230) 2024-03-28 16:37:18 -04:00
Robert Muir
ad8545151d
upgrade snowball to 34f3612e5e8c (round two) (#13227)
* upgrade snowball to 34f3612e5e8c (round two)

* disable forbidden-apis on snowball code (thanks @uschindler)
2024-03-27 17:51:48 -04:00
Robert Muir
d54663ad76
upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03 (#13209)
* upgrade snowball to 26db1ab9adbf437f37a6facd3ee2aad1da9eba03

* add back-compat-hack to the factory, too

* remove open of irish package now that we don't have our own stopwords file here anymore

* CHANGES / MIGRATE
2024-03-27 10:05:57 -04:00
Uwe Schindler
26f5065e15
Add support for Github issue numbers in Markdown converter (e.g., MIGRATE.md file) (#13215) 2024-03-26 13:45:56 +01:00
Uwe Schindler
a4055dae62
Add support for posix_madvise to Java 21 MMapDirectory (#13196) 2024-03-25 18:44:33 +01:00
Dawid Weiss
1c77e2315c
An eye-gouging way to limit suppressAccessChecks to just the three JARs that need them. (#13164) 2024-03-08 08:10:49 +01:00
Dawid Weiss
3ce9ba9fd5 Correct typo #13148 2024-03-01 07:12:57 +01:00
Uwe Schindler
08325ac3e8 Fix successful tests counting not working in Gradle build by adding ReflectPermission back (see #13146) 2024-03-01 01:25:02 +01:00
Uwe Schindler
6910a4358c
Do not place Panama Java 21 class files in MR-JAR section of core.jar file (#13148) 2024-02-29 23:10:16 +01:00
Uwe Schindler
e446904c61
Remove ByteBufferIndexInput and update all Panama implementations (MMap and Vector) to Java 21 (#13146) 2024-02-29 19:38:37 +01:00
Uwe Schindler
dfce6ee8d2 Update the Javadoc package list to Java 21 2024-02-29 15:06:47 +01:00
Uwe Schindler
5aaaeaee39 Update link to javadocs for Java 21 2024-02-29 14:37:09 +01:00
Uwe Schindler
8f17f23acf Bump minimum required Java version to 21 (#12753)
Co-authored-by: ChrisHegarty <chegar999@gmail.com>
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Co-authored-by: Robert Muir <rmuir@apache.org>
2024-02-29 12:16:29 +01:00
Uwe Schindler
e7d2bd48a6 Revert "Merge branch 'java_21' of https://github.com/ChrisHegarty/lucene into main"
This reverts commit a356fc1e23ffc2f569b930f4c3431804df9a1e07, reversing
changes made to 7b01f2f516635eced934f3b950d58ad179bf0256.
2024-02-29 11:58:40 +01:00
Uwe Schindler
0ccb119495
Merge branch 'main' into java_21 2024-02-25 16:39:41 +01:00
Uwe Schindler
47021ae98f
Remove hardcoded "--release" from renderJavadoc task (#13132) 2024-02-25 16:32:30 +01:00
ChrisHegarty
07f4b5b19f Merge branch 'main' into java_21 2024-02-19 11:43:46 +00:00
Dawid Weiss
a270acae01
This reverts the addition of spotless:on/off regions and shows just one possible alternative that is formatter fool-proof. (#13098) 2024-02-13 19:00:11 +01:00
Uwe Schindler
178f5a7a7e
Enable MemorySegment in MMapDirectory for Java 22+ and Vectorization (incubation) for exact Java 22 (#12706) 2024-02-09 23:02:42 +01:00
Robert Muir
d7a16dc10a
Merge branch 'main' into java_21 2024-02-09 15:22:45 -05:00
Robert Muir
fefde0f721
java 17 -> java 21 2024-02-09 15:16:41 -05:00
Robert Muir
1f9545e830
remove java < 21 2024-02-09 15:15:52 -05:00
Robert Muir
784c331b68
java 17 -> java 21 2024-02-09 15:14:55 -05:00
Dawid Weiss
8c2c276c6c Modify getEnWikiRandomLines to fetch and decompress the zstd resource #13083 2024-02-06 22:08:09 +01:00
Shubham Chaudhary
4b5917029f
Forbidden Thread.sleep API (#13001)
Co-authored-by: Shubham Chaudhary <cshbha@amazon.com>
2024-02-05 17:23:52 +01:00
sabi0
78b4f75a2c
Replace .collect(toList()) with .toList() and misc. code cleanups (#12978) 2023-12-30 17:04:11 +01:00
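For readers unfamiliar with the replacement named in this commit title, the sketch below contrasts the two forms. It is a generic illustration with hypothetical names, not code from the commit. `Stream.toList()` (available since Java 16) returns an unmodifiable list, while `Collectors.toList()` makes no guarantee about the mutability of the list it returns, so the swap is only safe where callers do not modify the result.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Generic illustration of collect(Collectors.toList()) versus Stream.toList().
// Class and method names are hypothetical.
final class ToListDemo {
  static List<String> viaCollector() {
    return Stream.of("a", "b", "c").map(String::toUpperCase).collect(Collectors.toList());
  }

  static List<String> viaToList() {
    return Stream.of("a", "b", "c").map(String::toUpperCase).toList();
  }

  /** Returns true if the list accepts an add, false if it rejects modification. */
  static boolean isModifiable(List<String> list) {
    try {
      list.add("x");
      return true;
    } catch (UnsupportedOperationException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(viaCollector().equals(viaToList())); // same elements, same order
    System.out.println(isModifiable(viaToList()));          // Stream.toList() is unmodifiable
  }
}
```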
Dawid Weiss
6bb244a932
An improved check for ignoring the c2-crash test if running on a client compiler. (#12953) 2023-12-18 12:37:57 +01:00
Jakub Slowinski
3965319441
Attempting to clean up some remaining Solr references (#12939)
* Attempting to clean up some remaining Solr references

* Update gradle/help.gradle

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>

---------

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2023-12-14 06:02:16 -05:00
Uwe Schindler
16d0b822b3
Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files (#12937)
* Simple patch to prevent the common zero-width code points in our source and some types of resource files

* Validate correct UTF-8 input and fix buggy CSS file (ISO-8859-x encoded)

* add a bit of context

* Add CHANGES.txt
2023-12-13 17:27:05 +01:00
Robert Muir
98d2df17d5
enable error-prone's DisableUnicodeInCode check (#12936)
Closes #12931
2023-12-13 08:19:22 -05:00
Uwe Schindler
10387f136f Fix encoding problem caused by invisible character with ExtractJdkApis.java 2023-12-12 15:00:01 +01:00
Chris Hegarty
a6f70ad2bb
Reflow computeCommonPrefixLengthAndBuildHistogram to avoid crash (#12905)
This commit reflows the code in the method body of computeCommonPrefixLengthAndBuildHistogram, so as to avoid a JVM JIT crash. The purpose of this change is to work around the JVM bug. The workaround is somewhat fragile, but it is the best that we can do for now and appears to be working well.
2023-12-11 20:10:03 +00:00