36819 Commits

Author SHA1 Message Date
Zhang Chao 9dd068b4d4 Fix IntegerOverflow exception in postings encoding as group-varint (#13376)
The exception happened because the tail postings list block, which is encoded with GroupVInt, had a docID delta that was >= 1<<30 while the postings were also storing freqs.
2024-05-18 00:56:01 +08:00
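The overflow described in the commit above can be illustrated with a minimal sketch. It assumes, as an illustration only, that the doc ID delta is shifted left by one bit so the low bit can carry a "freq == 1" flag; the commit message does not spell out the exact encoding. The class and method names are hypothetical and are not Lucene APIs.

```java
final class DeltaPackingSketch {
    // Hypothetical helper: packs a doc ID delta and a "freq == 1" flag into one int.
    // For docDelta >= 1 << 30 the shift pushes a bit into the sign position.
    static int packUnchecked(int docDelta, boolean freqIsOne) {
        return (docDelta << 1) | (freqIsOne ? 1 : 0);
    }

    // Same packing done in 64-bit arithmetic, so no bits are lost.
    static long packWide(int docDelta, boolean freqIsOne) {
        return ((long) docDelta << 1) | (freqIsOne ? 1L : 0L);
    }

    // True when the packed value still fits in a non-negative int.
    static boolean fitsInInt(int docDelta) {
        return docDelta >= 0 && docDelta < (1 << 30);
    }
}
```

With this packing, a delta of `1 << 30` no longer fits in a non-negative int, which matches the threshold named in the commit message.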
Nhat Nguyen f12e4899bf Harden BaseDocValuesFormatTestCase (#13346) (#13348)
We hit a Codec bug in Elasticsearch, but it went unnoticed because our
tests extend from BaseDocValuesFormatTestCase, which doesn't attempt to
read the doc-values of the same document twice. This change strengthens
BaseDocValuesFormatTestCase checks and randomly inserts that access
pattern.
2024-05-07 10:21:37 -07:00
Zhang Chao 0723d4579b Fix test failure in TestReqOptSumScorer.testFilterRandomRareOpt (#13122) 2024-02-22 18:08:41 +08:00
Adrien Grand 6635dd52de Use NIO2 APIs. 2024-02-21 09:04:04 +01:00
Adrien Grand d2d02a0c8d Fix bw index generation logic. 2024-02-20 22:11:41 +01:00
Adrien Grand d689ac96a5 Add back-compat indices for 9.10.0 2024-02-20 22:07:27 +01:00
Adrien Grand f2369cc5a2 Only track released versions in `oldVersions`. (#13096) 2024-02-20 19:19:17 +01:00
Simon Willnauer 28b04c2cff Fix addBackcompatIndexes.py to properly generate missing versions (#13095)
In #13046, several changes broke the ability of the addBackcompatIndexes.py
script to properly add and test the unreleased version. This updates the
script so that it again properly adds the new version.

Closes #13094

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2024-02-20 18:56:40 +01:00
Adrien Grand cb2c09d5ce Add next bugfix version 9.10.1 2024-02-20 18:46:17 +01:00
Adrien Grand df56401def DOAP changes for release 9.10.0 2024-02-20 18:18:31 +01:00
Dawid Weiss 695c0ac845 Add the missing Version field for 8.11.3. (#13093) 2024-02-11 11:50:15 +01:00
Dawid Weiss c0e767c8dc Revert "upgrade to OpenNLP 2.3.2 (#12674)" (minimum java requirement not fulfilled)
This reverts commit bb32700205.
2024-02-11 11:11:40 +01:00
Uwe Schindler f4dbab4e10 fix typo in CHANGES.txt 2024-02-09 23:16:02 +01:00
Uwe Schindler f1942c7d21 Enable MemorySegment in MMapDirectory for Java 22+ and Vectorization (incubation) for exact Java 22 (#12706) 2024-02-09 23:04:07 +01:00
Zhang Chao 6a5dde6430 Add necessary assertion in CheckHits#doCheckMaxScores (#13088) 2024-02-09 18:28:37 +01:00
Christine Poerschke bb32700205 upgrade to OpenNLP 2.3.2 (#12674)
(cherry picked from commit 563fafd8ac)

Resolved Conflicts:
	lucene/MIGRATE.md
	lucene/licenses/opennlp-tools-1.9.4.jar.sha1
	versions.lock
	versions.props
2024-02-09 11:36:26 +00:00
Houston Putman 06ee710c3c Tidy for back compat tests 2024-02-08 19:17:51 -06:00
Houston Putman d858b1f999 Add back-compat indices for 8.11.3 2024-02-08 19:16:43 -06:00
Houston Putman 8aecb255f8 Add bugfix version 8.11.3 2024-02-08 18:18:04 -06:00
Houston Putman d8b7d7cf90 Add 8.11.3 release to DOAP RDF file 2024-02-08 18:45:01 -05:00
Benjamin Trent 477cd56ba9 Prevent humongous allocations when calculating scalar quantiles (#13090)
The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC & on the heap usage as a whole.

This commit adjusts this by only allocating a float array of 20*dimensions and averaging the discovered quantiles from there. 

Why does this work?

 - Quantiles based on confidence intervals are (generally) unbiased and doing an average gives statistically good results
 - The selector algorithm scales linearly, so the cost is just about the same
 - We need to do more than `1` vector at a time to prevent extreme confidence intervals interacting strangely with edge cases
2024-02-08 15:57:38 -05:00
Benjamin Trent c70c946d96 Moving quantization logic to make future quantizer work simpler (#13091) 2024-02-08 09:38:58 -05:00
Stefan Vodita 2d713d904d Index arbitrary fields in taxonomy docs (#12337) 2024-02-08 11:13:36 +00:00
Christine Poerschke 95dc751462 in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call (#13032) 2024-02-08 09:03:15 +01:00
Dzung Bui f3e2929a52 Make FSTCompiler.compile() to only return the FSTMetadata (#12831)
* Make FSTCompiler.compile() to only return the FSTMetadata

* tidy code
2024-02-07 13:15:32 -05:00
Benjamin Trent c059819a59 Fix knn vector visit limit fence post error (#13058)
I noticed while experimenting with brute-force search that our visitation limit is EXACTLY the number of filtered docs to hit. Consequently, if we happen to do brute force search and visit that exact number of vectors, we will fall back again to do brute-force a second time. This struck me as weird.

This commit adjusts the visit limit threshold for approximate search to account for this.
2024-02-07 12:58:53 -05:00
Benjamin Trent 4bbefda4f2 Fix test failure TestParentBlockJoinFloatKnnVectorQuery.testSkewedIndex (#13082) 2024-02-07 12:58:30 -05:00
Benjamin Trent d9accfd409 Fix TestTopFieldCollector.testTotalHits #13080 (#13081) 2024-02-07 12:58:21 -05:00
Zhang Chao a8b92f275a Do not use mock merge policy for TestFuzzyQuery#testFuzziness (#13070) 2024-02-07 18:05:50 +01:00
Mayya Sharipova a6c45ebe43 Speedup concurrent multi-segment HNSW graph search (#13086)
Speed up concurrent multi-segment HNSW graph search by exchanging
the global top candidates collected so far across segments. These global top
candidates set the minimum threshold that new candidates need to pass
to be considered. This allows earlier stopping for segments that don't have
good candidates.
2024-02-07 10:06:58 -05:00
Zhang Chao 29c14a75bb Fix NPE in TestReqOptSumScorer.testFilterRandomRareOpt #13069 2024-02-07 13:14:34 +01:00
Dawid Weiss c4df3e13ad Make date parsing more flexible for linedocsfile (europarl, enwiki) (#13075) 2024-02-05 19:05:42 +01:00
Uwe Schindler 0f33d86808 LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) (#888) 2024-02-05 18:14:19 +01:00
Zhang Chao 3da32a257b Use growNoCopy in some places (#12951) 2024-02-04 17:32:56 +01:00
Simon Willnauer 70bab56f6f Fix broken loop in TestDocumentsWriterStallControl.assertState() (#13062)
The loop in assertState prematurely exits due to a broken break statement.

Closes #13061
2024-02-02 16:15:58 +01:00
Dmitry Cherniachenko 4e73a4b2ac Fix normalization in TeluguAnalyzer (#13059)
DecimalDigitFilter and IndicNormalizationFilter were mistakenly omitted.
2024-02-01 14:45:47 +01:00
Adrien Grand caf90fd085 Fix formatting of some CHANGES entries. 2024-02-01 14:40:28 +01:00
Johannes Fredén c845bf02fe Optimize counts on two clause term disjunctions (#13036)
Calculate count(clause1 OR clause2) as count(clause1)
+ count(clause2) - count(clause1 AND clause2)
2024-02-01 14:39:05 +01:00
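The identity used in the commit above is inclusion-exclusion for two sets. A minimal sketch follows, using plain `java.util.Set` values to stand in for the documents matching each clause; the class and method names are hypothetical and this is not how Lucene represents postings.

```java
import java.util.HashSet;
import java.util.Set;

final class DisjunctionCountSketch {
    // count(a OR b) = count(a) + count(b) - count(a AND b)
    static int countOr(Set<Integer> a, Set<Integer> b) {
        Set<Integer> both = new HashSet<>(a);
        both.retainAll(b); // documents matching both clauses
        return a.size() + b.size() - both.size();
    }
}
```

For example, clauses matching documents {1, 2, 3} and {3, 4} share one document, so the disjunction count is 3 + 2 - 1 = 4.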
Simon Willnauer 3d8ad99039 Fix broken javadoc reference 2024-01-31 16:45:34 +01:00
Simon Willnauer e3d0af9e54 Modernize BWC testing with parameterized tests (#13046)
This change modernizes the BWC tests to leverage RandomizedRunner's Parameterized Tests, which allow us to have more structured and hopefully more extensible BWC tests in the future. This change doesn't add any new tests but tries to make the ones we have more structured and to support growth down the road.
Basically, every index type got its own Test class that doesn't need to loop over all the indices in each test. Each test case is run with all versions specified. Several sanity checks are applied in the base class to make individual tests smaller and much easier to read.

Co-authored-by: Michael McCandless <lucene@mikemccandless.com>
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2024-01-31 15:55:09 +01:00
Dmitry Cherniachenko 6f6fdbdf2e Clean up AnyQueryNode code (#13053) 2024-01-31 10:49:44 +01:00
Dmitry Cherniachenko 1e36b46147 Make static final Set immutable (#13055): EnumSet.of() returns a mutable Set that should not be used for static final constants.
# Conflicts:
#	lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanPartOfSpeechStopFilter.java
2024-01-30 11:57:41 +01:00
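The point of the commit above, that `EnumSet.of()` returns a mutable set, can be shown with a small sketch. Wrapping in `Collections.unmodifiableSet` is one possible fix and is an assumption here; the commit message does not say which wrapper was chosen. The enum and class names are hypothetical.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

final class ImmutableEnumSetSketch {
    // Hypothetical enum standing in for a part-of-speech tag.
    enum Tag { NOUN, VERB, ADJECTIVE }

    // A read-only view: callers cannot add to or remove from this constant.
    static final Set<Tag> DEFAULT_TAGS =
        Collections.unmodifiableSet(EnumSet.of(Tag.NOUN, Tag.VERB));

    // Returns true if the set refuses an attempt to add an element.
    static boolean rejectsMutation(Set<Tag> set) {
        try {
            set.add(Tag.ADJECTIVE);
            return false; // the add went through, so the set is mutable
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }
}
```

Passing a bare `EnumSet.of(...)` to `rejectsMutation` returns false because the add succeeds, while `DEFAULT_TAGS` rejects it.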
Uwe Schindler bb37626dbd add changes entry 2024-01-30 11:49:13 +01:00
Dmitry Cherniachenko fe6724786b Align instanceof check with type cast (#13039) 2024-01-30 11:49:11 +01:00
Dmitry Cherniachenko fc6c5d6e1f Change `set.removeAll(list)` to `list.forEach(set::remove)` (#13052) 2024-01-30 11:40:07 +01:00
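The two forms named in the commit above produce the same result. A likely motivation, stated here as an assumption since the commit message gives none, is that `AbstractSet.removeAll` iterates over the set and calls `contains` on the argument when the set is not larger than the argument, which is a linear scan per element for a `List`. The sketch below uses hypothetical names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RemoveAllSketch {
    // Removes each list element from a copy of the set, one hash lookup per element.
    static Set<String> removeEach(Set<String> set, List<String> toRemove) {
        Set<String> copy = new HashSet<>(set);
        toRemove.forEach(copy::remove);
        return copy;
    }

    // Same outcome via removeAll; may call toRemove.contains(...) repeatedly.
    static Set<String> removeAllAtOnce(Set<String> set, List<String> toRemove) {
        Set<String> copy = new HashSet<>(set);
        copy.removeAll(toRemove);
        return copy;
    }
}
```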
Uwe Schindler deac9c2651 Fix formatting (#13051) 2024-01-30 00:35:57 +01:00
Dmitry Cherniachenko b129e663f4 Use orElseGet() to avoid unnecessary object allocation (#13048) 2024-01-30 00:28:40 +01:00
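The allocation that the commit above avoids comes from `Optional.orElse` evaluating its argument even when a value is present, whereas `orElseGet` only invokes its supplier when the `Optional` is empty. A minimal sketch with hypothetical names, using a counter to make the difference observable:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

final class OrElseGetSketch {
    // Hypothetical fallback factory; the counter records how often it runs.
    static String buildFallback(AtomicInteger calls) {
        calls.incrementAndGet();
        return "default";
    }

    // The fallback is built before orElse is called, even if value is present.
    static String eager(Optional<String> value, AtomicInteger calls) {
        return value.orElse(buildFallback(calls));
    }

    // The fallback is built only if value is empty.
    static String lazy(Optional<String> value, AtomicInteger calls) {
        return value.orElseGet(() -> buildFallback(calls));
    }
}
```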
Dmitry Cherniachenko 6db232c20a Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` (#13051)
# Conflicts:
#	lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanPartOfSpeechStopFilter.java
2024-01-30 00:28:40 +01:00
Dmitry Cherniachenko 7b884b6ec6 Use String.isEmpty() instead of equals("") (#13050) 2024-01-30 00:27:02 +01:00
Dmitry Cherniachenko 2eb58ad0b6 Use String.indexOf(char) where possible (#13049) 2024-01-30 00:27:00 +01:00