lucene

Commit Graph

Author	SHA1	Message	Date
Adrien Grand	8ae03d66ad	Move postings back to int[] to take advantage of having more lanes per vector. (#13968) In Lucene 8.4, we updated postings to work on long[] arrays internally. This allowed us to work around the lack of explicit vectorization support in the JVM (auto-vectorization doesn't detect all the scenarios that we would like to handle), for instance by summing up two integers in one operation. With explicit vectorization now available, it looks like we can get more benefit from comparing multiple integers in one operation than from summing up two integers in one operation. Moving back to ints lets us compare 2x more integers at once than with longs.	2024-11-04 18:40:53 +01:00
Adrien Grand	299d7f6721	Speed up prefix sums when decoding doc IDs. (#13658) This updates file formats to compute prefix sums by summing up 8 deltas per long at the same time if the number of bits per value is 4 or less, and 4 deltas per long at the same time if the number of bits per value is between 5 and 11 inclusive. Otherwise, we keep summing up 2 deltas per long like today. The `PostingDecodingUtil` was slightly modified because more bits-per-value settings now need to apply different shifts to the input data. E.g. now that we store integers that require 5 bits per value as 16-bit integers under the hood rather than 8-bit integers, we extract the first values by shifting by 16-5=11, 16-2*5=6 and 16-3*5=1, and then decode tail values from the remaining bit of each 16-bit integer.	2024-08-16 21:27:32 +02:00
Adrien Grand	b4a8810b7a	Inline skip data into postings lists (#13585) This updates the postings format in order to inline skip data into postings. This format is generally similar to the current `Lucene99PostingsFormat`, e.g. it shares the same block encoding logic, but it has a few differences: - Skip data is inlined into postings to make the access pattern more sequential. - There are only 2 levels of skip data: on every block (128 docs) and every 32 blocks (4,096 docs). In general, I found that inlining skip data may slightly slow down queries that don't need skip data at all (e.g. `CountOrXXX` tasks that never advance or consult impacts) and slightly speed up queries that advance by small intervals. The fact that the highest level only allows skipping 4,096 docs at once means that we're slower at advancing by large intervals, but data suggests that it doesn't significantly hurt performance.	2024-07-31 17:18:28 +02:00
Dawid Weiss	afe982b3ef	Schedule compileJava after the internal task if it affects source files (#13282)	2024-04-09 07:44:07 +02:00
Jakub Slowinski	8ae598bae5	Remove patching for doc blocks. (#12741) * Change Postings back to using FOR in Lucene99PostingsFormat. We are still keeping PFOR for positions only. This is a partial revert of https://github.com/apache/lucene/pull/69 which brings back ForDeltaUtil. * fix merge commit * Add forgotten forDeltaUtil calls to reader * Addressing comments: adding Lucene90RWPostingsFormat + more Also: * Change to Changes.txt * Removal of dead code which was only used in unit tests * Removal of test code from PForUtil * Changes.txt edit in right place now * Apply suggestions from code review: `90 -> 99 refactoring` Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com> * Remove decodeTo32 from ForUtil and regenerate --------- Co-authored-by: gf2121 <52390227+gf2121@users.noreply.github.com>	2023-11-06 10:46:03 -05:00
Uwe Schindler	f668cfd1cd	Fix forUtil.gradle to actually execute python script and also fix type error in script (#12411)	2023-07-03 16:22:03 +02:00
Dawid Weiss	aac6581f6e	LUCENE-9915: Add generation/checksumming task for gen_ForUtil.py (#126)	2021-05-05 22:03:06 +02:00

7 Commits
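The #13968 message recalls that Lucene 8.4 moved postings to long[] so that a single operation could sum two integers. What follows is a minimal sketch of that trick, not Lucene's code. The class and method names are hypothetical, and the sketch assumes two non-negative 32-bit values per long whose low-lane sum never carries past bit 31.

```java
// Hypothetical sketch: two 32-bit integers packed into one long, added with
// a single 64-bit addition. Assumes non-negative lanes and that the low-lane
// sum stays below 2^32, so no carry crosses into the high lane.
class PackedPairAdd {

    // Pack hi into bits 32..63 and lo into bits 0..31.
    static long pack(int hi, int lo) {
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    static int high(long packed) {
        return (int) (packed >>> 32);
    }

    static int low(long packed) {
        return (int) packed;
    }

    // One 64-bit addition performs two 32-bit additions at once.
    static long addPairs(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        long sum = addPairs(pack(10, 1_000), pack(5, 234));
        System.out.println(high(sum) + " " + low(sum)); // prints "15 1234"
    }
}
```

Once the JVM exposes explicit vectors, the balance shifts. An int lane is half as wide as a long lane, so a vector register holds twice as many ints. A 256-bit vector, for example, holds 8 int lanes but only 4 long lanes, which is the 2x gain the commit cites for comparisons.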
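The #13658 message describes summing 8 deltas per long when values need at most 4 bits. Below is a sketch of that idea as a SWAR ("SIMD within a register") prefix sum over a 128-value block using 8-bit lanes. The column-major layout, where lane j of long i holds delta[j * 16 + i], is an assumption made for illustration, as are all names; this is not the actual Lucene file format.

```java
import java.util.Arrays;

// Hypothetical sketch of a SWAR prefix sum: 128 deltas, 8 per long in 8-bit
// lanes. Assumes every delta fits in 4 bits and that lane j of long i holds
// delta[j * 16 + i] (an illustrative layout, not Lucene's).
class SwarPrefixSum8 {

    static final int BLOCK = 128;
    static final int LANES = 8;             // 8-bit lanes per long
    static final int LONGS = BLOCK / LANES; // 16 longs per block

    static long[] pack(int[] deltas) {
        long[] packed = new long[LONGS];
        for (int j = 0; j < LANES; j++) {
            for (int i = 0; i < LONGS; i++) {
                packed[i] |= ((long) deltas[j * LONGS + i]) << (8 * j);
            }
        }
        return packed;
    }

    // Returns base plus the running sums of the 128 deltas.
    static int[] prefixSum(long[] packed, int base) {
        long[] acc = packed.clone();
        // Each 64-bit addition advances 8 independent running sums. A lane
        // accumulates at most 16 values <= 15, i.e. <= 240 < 256, so no carry
        // spills into the neighbouring lane.
        for (int i = 1; i < LONGS; i++) {
            acc[i] += acc[i - 1];
        }
        int[] out = new int[BLOCK];
        int offset = base;
        for (int j = 0; j < LANES; j++) {
            for (int i = 0; i < LONGS; i++) {
                out[j * LONGS + i] = offset + (int) ((acc[i] >>> (8 * j)) & 0xFF);
            }
            // The last value of this lane is the starting offset of the next.
            offset = out[j * LONGS + LONGS - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        int[] deltas = new int[BLOCK];
        Arrays.fill(deltas, 15); // worst case for 4-bit deltas
        int[] docs = prefixSum(pack(deltas), 100);
        System.out.println(docs[0] + " " + docs[127]); // prints "115 2020"
    }
}
```

The same no-overflow arithmetic is consistent with the commit's thresholds. With 8-bit lanes, each lane sums 16 values, and 16 × 15 = 240 fits in a byte, while 16 × 31 = 496 would not. With 16-bit lanes, each lane sums 32 values, and 32 × 2,047 = 65,504 fits in 16 bits, while 32 × 4,095 would not. This explains why 4 and 11 bits per value are the cut-offs.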
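The #13658 message also gives the shifts 16-5=11, 16-2*5=6 and 16-3*5=1 for 5-bit values stored in 16-bit integers. The sketch below shows why one shift plus one mask extracts a value from all four 16-bit lanes of a long at once. It assumes a hypothetical lane layout: three 5-bit values in the top 15 bits, first value highest, plus one spare bit at the bottom. The names are illustrative and are not the `PostingDecodingUtil` API. How the spare bits are reassembled into tail values is not shown, because the commit does not spell out that layout.

```java
// Hypothetical sketch: 5-bit values in 16-bit lanes, 4 lanes per long.
// Assumed lane layout: [v0:5][v1:5][v2:5][spare:1], from high to low bits.
class FiveBitLanes {

    static final long MASK5 = 0x001F001F001F001FL; // low 5 bits of each lane
    static final long MASK1 = 0x0001000100010001L; // low bit of each lane

    // Builds one 16-bit lane from three 5-bit values and one spare bit.
    static long lane(int v0, int v1, int v2, int spare) {
        return ((long) v0 << 11) | ((long) v1 << 6) | ((long) v2 << 1) | spare;
    }

    // Extracts value k (0, 1 or 2) from all 4 lanes with one shift and mask.
    // The shifts are 16 - 5 = 11, 16 - 2*5 = 6 and 16 - 3*5 = 1.
    static long extract(long packed, int k) {
        int shift = 16 - 5 * (k + 1);
        return (packed >>> shift) & MASK5;
    }

    // Reads the 16-bit field of the given lane (lane 0 = lowest bits).
    static int field(long word, int lane) {
        return (int) ((word >>> (16 * lane)) & 0xFFFF);
    }

    public static void main(String[] args) {
        long packed = 0;
        for (int l = 0; l < 4; l++) {
            packed |= lane(l, 10 + l, 20 + l, l & 1) << (16 * l);
        }
        long second = extract(packed, 1);
        System.out.println(field(second, 0) + " " + field(second, 3)); // prints "10 13"
        long spareBits = packed & MASK1; // leftover bits, input for tail values
        System.out.println(field(spareBits, 1)); // prints "1"
    }
}
```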
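The #13585 message limits skip data to two levels: one entry per 128-doc block and one per 32 blocks (4,096 docs). The sketch below illustrates how an advance could use those two levels. It assumes the skip data has been read into a hypothetical in-memory array of each block's last doc ID. Real inlined skip data is read sequentially from the postings file, so this captures only the search logic.

```java
// Hypothetical sketch of a two-level skip: level 1 steps over groups of 32
// blocks (4,096 docs), level 0 steps over single 128-doc blocks.
// lastDocInBlock is an illustrative stand-in for inlined skip data.
class TwoLevelSkip {

    static final int BLOCKS_PER_LEVEL1 = 32;

    final int[] lastDocInBlock; // largest doc ID in each block, ascending

    TwoLevelSkip(int[] lastDocInBlock) {
        this.lastDocInBlock = lastDocInBlock;
    }

    // Returns the first block at or after `from` whose last doc is >= target,
    // or -1 if the postings are exhausted.
    int advanceToBlock(int from, int target) {
        int numBlocks = lastDocInBlock.length;
        int b = from;
        // Level 1: skip whole 32-block groups that end before the target.
        while (true) {
            int groupEnd = (b / BLOCKS_PER_LEVEL1 + 1) * BLOCKS_PER_LEVEL1 - 1;
            if (groupEnd >= numBlocks || lastDocInBlock[groupEnd] >= target) {
                break;
            }
            b = groupEnd + 1;
        }
        // Level 0: skip single blocks within the current group.
        while (b < numBlocks && lastDocInBlock[b] < target) {
            b++;
        }
        return b < numBlocks ? b : -1;
    }

    public static void main(String[] args) {
        int[] last = new int[100];
        for (int b = 0; b < last.length; b++) {
            last[b] = b * 128 + 127; // dense postings: block b holds docs 128b..128b+127
        }
        TwoLevelSkip skip = new TwoLevelSkip(last);
        System.out.println(skip.advanceToBlock(0, 5_000));  // prints "39"
        System.out.println(skip.advanceToBlock(0, 12_800)); // prints "-1"
    }
}
```

With no level above 4,096 docs, a very large jump still walks level-1 entries one group at a time. This is consistent with the commit's remark that advancing by large intervals gets slower.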
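The #12741 message moves doc blocks back to FOR (frame of reference) and keeps PFOR (patched FOR) for positions only. The sketch below contrasts the bit width each scheme would choose for one block. The exception limit is a hypothetical parameter, and the sketch ignores the space PFOR spends on exception positions and high bits. PFOR buys a narrower width at the cost of extra work to apply patches when decoding.

```java
import java.util.Arrays;

// Hypothetical sketch: bit width chosen by FOR vs. a patched variant (PFOR)
// that stores up to maxExceptions of the largest values separately.
class ForVsPfor {

    // Bits needed to store v >= 0, with a minimum of 1 (illustrative choice).
    static int bitsRequired(long v) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(v));
    }

    // FOR: every value is packed with the width of the largest one.
    // OR-ing non-negative values has the same bit length as their maximum.
    static int forBits(long[] block) {
        long or = 0;
        for (long v : block) {
            or |= v;
        }
        return bitsRequired(or);
    }

    // PFOR: use the width of the (maxExceptions + 1)-th largest value; the
    // larger values would be patched in afterwards as exceptions.
    static int pforBits(long[] block, int maxExceptions) {
        long[] sorted = block.clone();
        Arrays.sort(sorted);
        int idx = Math.max(0, sorted.length - 1 - maxExceptions);
        return bitsRequired(sorted[idx]);
    }

    public static void main(String[] args) {
        long[] block = new long[128];
        Arrays.fill(block, 3);
        block[50] = 1000; // a single outlier
        System.out.println(forBits(block) + " " + pforBits(block, 3)); // prints "10 2"
    }
}
```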