lucene

Commit Graph

Author	SHA1	Message	Date
Zhang Chao	9dd068b4d4	Fix IntegerOverflow exception in postings encoding as group-varint (#13376) The exception happens because the tail postings list block, which is encoded with GroupVInt, had a docID delta that was >= 1<<30 when the postings also store freqs.	2024-05-18 00:56:01 +08:00
Nhat Nguyen	f12e4899bf	Harden BaseDocValuesFormatTestCase (#13346 ) (#13348 ) We hit a Codec bug in Elasticsearch, but it went unnoticed because our tests extend from BaseDocValuesFormatTestCase, which doesn't attempt to read the doc-values of the same document twice. This change strengthens BaseDocValuesFormatTestCase checks and randomly inserts that access pattern.	2024-05-07 10:21:37 -07:00
Zhang Chao	0723d4579b	Fix test failure in TestReqOptSumScorer.testFilterRandomRareOpt (#13122 )	2024-02-22 18:08:41 +08:00
Adrien Grand	6635dd52de	Use NIO2 APIs.	2024-02-21 09:04:04 +01:00
Adrien Grand	d2d02a0c8d	Fix bw index generation logic.	2024-02-20 22:11:41 +01:00
Adrien Grand	d689ac96a5	Add back-compat indices for 9.10.0	2024-02-20 22:07:27 +01:00
Adrien Grand	f2369cc5a2	Only track released versions in `oldVersions`. (#13096 )	2024-02-20 19:19:17 +01:00
Simon Willnauer	28b04c2cff	Fix addBackcompatIndexes.py to properly generate missing versions (#13095) In #13046 several changes broke the ability of the addBackcompatIndexes.py script to properly add and test the unreleased version. This updates the script to again properly add the new version. Closes #13094 Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>	2024-02-20 18:56:40 +01:00
Adrien Grand	cb2c09d5ce	Add next bugfix version 9.10.1	2024-02-20 18:46:17 +01:00
Adrien Grand	df56401def	DOAP changes for release 9.10.0	2024-02-20 18:18:31 +01:00
Dawid Weiss	695c0ac845	Add the missing Version field for 8.11.3. (#13093 )	2024-02-11 11:50:15 +01:00
Dawid Weiss	c0e767c8dc	Revert "upgrade to OpenNLP 2.3.2 (#12674 )" (minimum java requirement not fulfilled) This reverts commit `bb32700205`.	2024-02-11 11:11:40 +01:00
Uwe Schindler	f4dbab4e10	fix typo in CHANGES.txt	2024-02-09 23:16:02 +01:00
Uwe Schindler	f1942c7d21	Enable MemorySegment in MMapDirectory for Java 22+ and Vectorization (incubation) for exact Java 22 (#12706 )	2024-02-09 23:04:07 +01:00
Zhang Chao	6a5dde6430	Add necessary assertion in CheckHits#doCheckMaxScores (#13088 )	2024-02-09 18:28:37 +01:00
Christine Poerschke	bb32700205	upgrade to OpenNLP 2.3.2 (#12674 ) (cherry picked from commit `563fafd8ac`) Resolved Conflicts: lucene/MIGRATE.md lucene/licenses/opennlp-tools-1.9.4.jar.sha1 versions.lock versions.props	2024-02-09 11:36:26 +00:00
Houston Putman	06ee710c3c	Tidy for back compat tests	2024-02-08 19:17:51 -06:00
Houston Putman	d858b1f999	Add back-compat indices for 8.11.3	2024-02-08 19:16:43 -06:00
Houston Putman	8aecb255f8	Add bugfix version 8.11.3	2024-02-08 18:18:04 -06:00
Houston Putman	d8b7d7cf90	Add 8.11.3 release to DOAP RDF file	2024-02-08 18:45:01 -05:00
Benjamin Trent	477cd56ba9	Prevent humongous allocations when calculating scalar quantiles (#13090 ) The initial release of scalar quantization would periodically create a humongous allocation, which can put unwarranted pressure on the GC & on the heap usage as a whole. This commit adjusts this by only allocating a float array of 20*dimensions and averaging the discovered quantiles from there. Why does this work? - Quantiles based on confidence intervals are (generally) unbiased and doing an average gives statistically good results - The selector algorithm scales linearly, so the cost is just about the same - We need to do more than `1` vector at a time to prevent extreme confidence intervals interacting strangely with edge cases	2024-02-08 15:57:38 -05:00
Benjamin Trent	c70c946d96	Moving quantization logic to make future quantizer work simpler (#13091 )	2024-02-08 09:38:58 -05:00
Stefan Vodita	2d713d904d	Index arbitrary fields in taxonomy docs (#12337 )	2024-02-08 11:13:36 +00:00
Christine Poerschke	95dc751462	in BytesRefHash constructor avoid duplicate BytesStartArray.bytesUsed() call (#13032 )	2024-02-08 09:03:15 +01:00
Dzung Bui	f3e2929a52	Make FSTCompiler.compile() to only return the FSTMetadata (#12831 ) * Make FSTCompiler.compile() to only return the FSTMetadata * tidy code	2024-02-07 13:15:32 -05:00
Benjamin Trent	c059819a59	Fix knn vector visit limit fence post error (#13058 ) I noticed while experimenting with brute-force search that our visitation limit is EXACTLY the number of filtered docs to hit. Consequently, if we happen to do brute force search and visit that exact number of vectors, we will fall back again to do brute-force a second time. This struck me as weird. This commit adjusts the visit limit threshold for approximate search to account for this.	2024-02-07 12:58:53 -05:00
Benjamin Trent	4bbefda4f2	Fix test failure TestParentBlockJoinFloatKnnVectorQuery.testSkewedIndex (#13082 )	2024-02-07 12:58:30 -05:00
Benjamin Trent	d9accfd409	Fix TestTopFieldCollector.testTotalHits #13080 (#13081 )	2024-02-07 12:58:21 -05:00
Zhang Chao	a8b92f275a	Do not use mock merge policy for TestFuzzyQuery#testFuzziness (#13070 )	2024-02-07 18:05:50 +01:00
Mayya Sharipova	a6c45ebe43	Speed up concurrent multi-segment HNSW graph search (#13086) Speed up concurrent multi-segment HNSW graph search by exchanging the global top candidates collected so far across segments. These global top candidates set the minimum threshold that new candidates need to pass to be considered. This allows earlier stopping for segments that don't have good candidates.	2024-02-07 10:06:58 -05:00
Zhang Chao	29c14a75bb	Fix NPE in TestReqOptSumScorer.testFilterRandomRareOpt #13069	2024-02-07 13:14:34 +01:00
Dawid Weiss	c4df3e13ad	Make date parsing more flexible for linedocsfile (europarl, enwiki) (#13075 )	2024-02-05 19:05:42 +01:00
Uwe Schindler	0f33d86808	LUCENE-10572: Add support for varhandles in native byte order (still randomized during tests) (#888 )	2024-02-05 18:14:19 +01:00
Zhang Chao	3da32a257b	Use growNoCopy in some places (#12951 )	2024-02-04 17:32:56 +01:00
Simon Willnauer	70bab56f6f	Fix broken loop in TestDocumentsWriterStallControl.assertState() (#13062) The loop in assertState prematurely exits due to a broken break statement. Closes #13061	2024-02-02 16:15:58 +01:00
Dmitry Cherniachenko	4e73a4b2ac	Fix normalization in TeluguAnalyzer (#13059 ) DecimalDigitFilter and IndicNormalizationFilter were mistakenly omitted.	2024-02-01 14:45:47 +01:00
Adrien Grand	caf90fd085	Fix formatting of some CHANGES entries.	2024-02-01 14:40:28 +01:00
Johannes Fredén	c845bf02fe	Optimize counts on two clause term disjunctions (#13036 ) Calculate count(clause1 OR clause2) as count(clause1) + count(clause2) - count(clause1 AND clause2)	2024-02-01 14:39:05 +01:00
Simon Willnauer	3d8ad99039	Fix broken javadoc reference	2024-01-31 16:45:34 +01:00
Simon Willnauer	e3d0af9e54	Modernize BWC testing with parameterized tests (#13046) This change modernizes the BWC tests to leverage RandomizedRunner's parameterized tests, which allow us to have more structured and hopefully more extensible BWC tests in the future. This change doesn't add any new tests but tries to make the ones we have more structured and support growth down the road. Basically, every index type got its own test class that doesn't need to loop over all the indices in each test. Each test case is run with all versions specified. Several sanity checks are applied in the base class to make individual tests smaller and much easier to read. Co-authored-by: Michael McCandless <lucene@mikemccandless.com> Co-authored-by: Adrien Grand <jpountz@gmail.com>	2024-01-31 15:55:09 +01:00
Dmitry Cherniachenko	6f6fdbdf2e	Clean up AnyQueryNode code (#13053 )	2024-01-31 10:49:44 +01:00
Dmitry Cherniachenko	1e36b46147	Make static final Set immutable (#13055 ): EnumSet.of() returns a mutable Set that should not be used for static final constants. # Conflicts: # lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanPartOfSpeechStopFilter.java	2024-01-30 11:57:41 +01:00
Uwe Schindler	bb37626dbd	add changes entry	2024-01-30 11:49:13 +01:00
Dmitry Cherniachenko	fe6724786b	Align instanceof check with type cast (#13039 )	2024-01-30 11:49:11 +01:00
Dmitry Cherniachenko	fc6c5d6e1f	Change `set.removeAll(list)` to `list.forEach(set::remove)` (#13052 )	2024-01-30 11:40:07 +01:00
Uwe Schindler	deac9c2651	Fix formatting (#13051 )	2024-01-30 00:35:57 +01:00
Dmitry Cherniachenko	b129e663f4	Use orElseGet() to avoid unnecessary object allocation (#13048 )	2024-01-30 00:28:40 +01:00
Dmitry Cherniachenko	6db232c20a	Replace `new HashSet<>(Arrays.asList())` with `EnumSet.of()` (#13051 ) # Conflicts: # lucene/analysis/nori/src/java/org/apache/lucene/analysis/ko/KoreanPartOfSpeechStopFilter.java	2024-01-30 00:28:40 +01:00
Dmitry Cherniachenko	7b884b6ec6	Use String.isEmpty() instead of equals("") (#13050 )	2024-01-30 00:27:02 +01:00
Dmitry Cherniachenko	2eb58ad0b6	Use String.indexOf(char) where possible (#13049 )	2024-01-30 00:27:00 +01:00
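
Commit c845bf02fe above describes counting a two-clause disjunction as count(clause1) + count(clause2) - count(clause1 AND clause2). The following is a minimal sketch of that identity only, not the Lucene implementation: it models each clause's matching doc IDs with `java.util.BitSet`, and the class name `DisjunctionCountSketch` and method `countOr` are hypothetical names chosen for illustration.

```java
import java.util.BitSet;

public class DisjunctionCountSketch {
  /**
   * Inclusion-exclusion: count(a OR b) = count(a) + count(b) - count(a AND b).
   * The bit sets stand in for the doc IDs matched by each clause (an assumption
   * of this sketch; a real index would not materialize them this way).
   */
  static int countOr(BitSet a, BitSet b) {
    BitSet both = (BitSet) a.clone(); // copy so the caller's set is not modified
    both.and(b);                      // docs matching both clauses
    return a.cardinality() + b.cardinality() - both.cardinality();
  }

  public static void main(String[] args) {
    BitSet a = new BitSet();
    a.set(1);
    a.set(3);
    a.set(5);
    BitSet b = new BitSet();
    b.set(3);
    b.set(4);
    // Docs 1, 3, 4 and 5 match at least one clause: 3 + 2 - 1 = 4.
    System.out.println(countOr(a, b));
  }
}
```

The appeal of the identity is that each of the three terms may be cheaper to obtain than iterating over the union itself, for example when per-term counts are already known.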

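Commit 1e36b46147 above notes that `EnumSet.of()` returns a mutable Set that should not back a static final constant. The sketch below illustrates the difference using only JDK classes; the enum `Tag`, the class `ImmutableEnumSetSketch` and the helper `rejectsMutation` are hypothetical, and wrapping with `Collections.unmodifiableSet` is one possible fix, not necessarily the one the commit applied.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

public class ImmutableEnumSetSketch {
  // Hypothetical enum used only for this illustration.
  enum Tag { NOUN, VERB, SUFFIX }

  /** Returns true if the set refuses an add() call, false if the add went through. */
  static boolean rejectsMutation(Set<Tag> set) {
    try {
      set.add(Tag.SUFFIX);
      return false; // the set accepted the change, so it is mutable
    } catch (UnsupportedOperationException e) {
      return true; // read-only view: safe to expose as a constant
    }
  }

  public static void main(String[] args) {
    // A plain EnumSet can be modified by any code that holds a reference to it.
    Set<Tag> plain = EnumSet.of(Tag.NOUN, Tag.VERB);
    System.out.println(rejectsMutation(plain)); // false

    // Wrapping it gives a read-only view with the same contents.
    Set<Tag> readOnly = Collections.unmodifiableSet(EnumSet.of(Tag.NOUN, Tag.VERB));
    System.out.println(rejectsMutation(readOnly)); // true
  }
}
```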
36819 Commits All Branches Search