lucene

Commit Graph

Author	SHA1	Message	Date
Simon Willnauer	4d916a754b	Fix test to also take into account minor versions for BWC	2024-01-09 12:21:13 +01:00
Simon Willnauer	ea327220a8	Remove stale BWC tests (#12874 ) Both of these tests have been disabled for quite a long time. While `TestManyPointsInOldIndex` looks indeed stale, `TestIndexWriterOnOldIndex` is not a more general test.	2024-01-09 11:49:53 +01:00
sabi0	5442748995	Fix missing variable assignment in testAllVersionHaveCfsAndNocfs() and other minor code cleanups (#12969 )	2024-01-09 11:04:31 +01:00
sabi0	0fc1e2c2f7	Code cleanups in EscapeQuerySyntaxImpl (#12973 )	2024-01-08 22:18:37 +01:00
Jakub Slowinski	6d27c20579	Fix only use of .toLowerCase() with no Locale (#12856 )	2024-01-08 22:04:04 +01:00
sabi0	a32f6acadf	Remove unnecessary fields loop from extractWeightedSpanTerms() (#12965 )	2024-01-08 22:01:56 +01:00
Marc D'Mello	376bd24693	Improve code clarity for OrdinalMap (#11729 ) Closes #11728	2024-01-08 14:00:53 +01:00
Michael McCandless	3c235bb7b4	LockVerifyServer does not need to reuse addresses nor set accept timeout (#12535 )	2024-01-08 13:53:08 +01:00
gf2121	67be0189bc	clean up sleep (#12914 )	2024-01-08 13:48:26 +01:00
Adrien Grand	40060f8b70	Reduce contention on flushControl.isFullFlush(). (#12958 ) `flushControl.isFullFlush()` is a surprising source of contention with documents that are cheap to index and many indexing threads. If I slightly modify luceneutil's `IndexGeoNames` benchmark to configure a 4GB indexing buffer and disable `TextField` fields, which are more costly to index than `KeywordField` or `IntField` fields, this brings the time to load all the dataset in the `IndexWriter` buffers from 8.0s to 7.0s.	2024-01-08 13:23:05 +01:00
Stefan Vodita	115a30d462	Increase stale PRs action budget and mark not debug-only (#12998 )	2024-01-08 07:20:59 -05:00
Stefan Vodita	564b2ebecc	Introduce workflow for stale PRs (#12813 ) * Introduce stale workflow * Exempt draft PRs * Tune the action to our needs 1. Don't mark issues stale, only PRs. 2. Don't close anything automatically. 3. Keep the default Stale label. 4. Run in debug-only mode to start.	2024-01-08 06:22:19 -05:00
Dzung Bui	4c883a414c	Optimize FST on-heap BytesReader (#12879 ) * Move size() to FSTStore * Remove size() completely * Allow FST builder to use different DataOutput * access BytesStore byte[] directly for copying * Rename BytesStore * Change class to final * Reorder methods * Remove unused methods * Rename truncate to setPosition() and remove skipBytes() * Simplify the writing operations * Update comment * remove unused parameter * Simplify BytesStore operation * tidy code * Rename copyBytes to writeTo * Simplify BytesStore operations * Embed writeBytes() to FSTCompiler * Fix the write bytes method * Remove the default block bits constant * add assertion * Rename method parameter names * Move reverse to FSTCompiler * Revert setPosition call * Address comments * Return immediately when writing 0 bytes * Add comment & * Rename variables * Fix the compile error * Remove isReadable() * Remove isReadable() * Optimize ReadWriteDataOutput * tidy code * Freeze the DataOutput once finished() * Refactor * freeze the DataOutput before use * Improvement of ReadWriteDataOutput * tidy code * Address comments and add off-heap FST tests * Remove the hardcoded random * Ignore the Test2BFSTOffHeap test * Simplify ReadWriteDataOutput * Do not expose blockBits * tidy code * Remove 0 initialization * Add assertion and comment	2024-01-06 07:47:19 -05:00
sabi0	7b8aece125	Use Collections.addAll() instead of manual array copy and misc. code cleanups (#12977 )	2024-01-04 22:27:36 +01:00
sabi0	1a939410dd	Misc code cleanups (#12974 )	2024-01-04 08:37:49 +01:00
Kaival Parikh	248f067d52	Reduce number of dimensions for Test[Byte\|Float]VectorSimilarityQuery (#12988 ) ### Description Identified in #12955, where `TestFloatVectorSimilarityQuery.testVectorsAboveSimilarity` fails because of a disconnected HNSW graph. This is a bigger issue, but we can reduce intermittent failures by keeping the number of docs and dimensions the same as [`BaseKnnVectorQueryTestCase.testRandom`](`dc9f154aa5/lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java (L470)`) (a similar test for KNN with random vectors). ### Command to reproduce ``` ./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestFloatVectorSimilarityQuery.testVectorsAboveSimilarity" -Ptests.jvms=12 -Ptests.jvmargs= -Ptests.seed=1A1CDC0974AF361 ```	2024-01-02 13:06:12 -05:00
sabi0	78b4f75a2c	Replace .collect(toList()) with .toList() and misc. code cleanups (#12978 )	2023-12-30 17:04:11 +01:00
sabi0	ec9e593dc4	Remove obsolete 'mappingRules' in Tokenizer tests (#12972 )	2023-12-30 16:59:59 +01:00
sabi0	67d866c586	Minor code cleanups (intellij inspections).	2023-12-30 16:55:49 +01:00
Uwe Schindler	346f4ff7d2	Move changes entry to 9.10 (#12841 )	2023-12-29 13:08:49 +01:00
sabi0	64cf54a4bf	Replace "UTF-8" with StandardCharsets.UTF_8 and other typo and minor cleanups (#12979 )	2023-12-28 19:42:22 +01:00
sabi0	91272f45da	Replace println(String.format(...)) with printf(...) (#12976 )	2023-12-28 19:32:06 +01:00
sabi0	57b104e806	Get rid of inefficient Stream.count() (#12975 )	2023-12-28 19:30:01 +01:00
sabi0	9c9949b2bc	Remove unused imports (#12970 )	2023-12-28 19:28:24 +01:00
Patrick Zhai	948970be58	Fix bug where NFARunAutomaton#getTransition does not set Transition correctly (#12909 )	2023-12-27 22:49:35 -08:00
sabi0	02722eeb69	Add missing spaces in concatenated strings (#12967 )	2023-12-23 20:30:30 -05:00
Zhang Chao	dc9f154aa5	Move group-varint encoding/decoding logic to DataOutput/DataInput (#12841 )	2023-12-23 13:18:34 +01:00
sabi0	9359a9dcff	Update contributing guide: autocrlf and build dependencies (#12963 )	2023-12-22 09:28:53 +01:00
sabi0	f6b2006195	Fix typo in help/formatting.txt (#12960 )	2023-12-21 19:58:53 +01:00
Adrien Grand	91002d04d3	Fix CheckIndex to correctly flag the automaton as binary.	2023-12-20 14:39:32 +01:00
Zhang Chao	5152051f68	Improve Javadoc for DocValuesConsumer (#12952 )	2023-12-20 13:40:44 +01:00
Adrien Grand	bcc7e120ba	Modernize LineFileDocs. (#12929 ) This replaces `StringField`/`SortedDocValuesField` with `KeywordField` and `IntPoint`/`NumericDocValuesField` with `IntField`.	2023-12-19 11:25:26 +01:00
Adrien Grand	5c084fcd6e	Add a stored fields test that indexes LineFileDocs. (#12927 ) Real-world data exhibits patterns that are taken advantage of by the compression logic, but also hardly reproducible in a randomized way. This makes this new test introduce interesting coverage. It takes one second to run on my machine, so I did not mark it `@Nightly`.	2023-12-19 11:20:14 +01:00
Adrien Grand	bf45ab79ec	Beef up `Terms#intersect` checks in `CheckIndex`. (#12926 ) Now also testing what happens with a non-null `startTerm`. This found bugs in `DirectPostingsFormat`.	2023-12-19 11:17:38 +01:00
Lukáš Vlček	5d6086e199	Fix position increment in (Reverse)PathHierarchyTokenizer (#12875 ) * Fix PathHierarchyTokenizer positions PathHierarchyTokenizer was emitting multiple tokens in the same position with changing offsets. To be consistent with EdgeNGramTokenizer (which is conceptually similar -- it's emitting multiple prefixes/suffixes off the input string), we can output every token with length 1 with positions incrementing by 1. * Fix ReversePathHierarchyTokenizer positions Making ReversePathHierarchyTokenizer consistent with recent changes in PathHierarchyTokenizer. --------- Co-authored-by: Michael Froh <froh@amazon.com>	2023-12-18 08:48:22 -05:00
Dawid Weiss	6bb244a932	An improved check for ignoring the c2-crash test if running on a client compiler. (#12953 )	2023-12-18 12:37:57 +01:00
ChrisHegarty	f6582ce048	Add back-compat indices for 9.9.1	2023-12-17 09:39:46 +00:00
ChrisHegarty	08728bf202	Add bugfix version 9.9.1	2023-12-17 09:20:34 +00:00
ChrisHegarty	1f1d0735c8	DOAP changes for release 9.9.1	2023-12-16 22:55:20 +00:00
Michael Sokolov	49d521145d	Use hppc IntIntHashMap to avoid Integer box/unbox when remapping vector ordinals during merge (#12950 )	2023-12-15 13:24:05 -05:00
Benjamin Trent	423f8279f0	Fix flaky tests that are caused by small float vectors (#12943 ) While quantization generally works well, when the number of dimensions is tiny (just two like in our tests), and we are indexing a circle, and we have random merge policies, we can end up getting unexpected ordering on the resulting vectors. closes: https://github.com/apache/lucene/issues/12940	2023-12-14 14:38:22 -05:00
Michael McCandless	d1551da027	#12932 : get monsters tests compiling/running again (#12942 )	2023-12-14 10:14:45 -05:00
Stefan Vodita	b0ebb849f5	Introduce growInRange to reduce array overallocation (#12844 ) In cases where we know there is an upper limit to the potential size of an array, we can use `growInRange` to avoid allocating beyond that limit.	2023-12-14 23:00:26 +09:00
Michael McCandless	ebf9e29570	Ensure Nori/Kuromoji shipped binary FST is the latest version (#12933 ) * ensure Nori/Kuromoji shipped binary FST is the latest version (closes #12911) * fold feedback from @uschindler: sharpen test failure methods to give the specific gradlew command to regenerate the precise FST (not everything) * add javadoc for FSTMetadata.getVersion	2023-12-14 07:38:34 -05:00
Jakub Slowinski	3965319441	Attempting to clean up some remaining Solr references (#12939 ) * Attempting to clean up some remaining Solr references * Update gradle/help.gradle Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com> --------- Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>	2023-12-14 06:02:16 -05:00
Patrick Zhai	da69346257	Add CHANGES.txt entry for #12910	2023-12-14 09:14:18 +09:00
Patrick Zhai	f303d29baf	Refactor around NeighborArray (#12910 )	2023-12-14 09:03:44 +09:00
Uwe Schindler	16d0b822b3	Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files (#12937 ) * Simple patch to prevent the common zero-width code points in our source and some types of resource files * Validate correct UTF-8 input and fix buggy CSS file (ISO-8859-x encoded) * add a bit of context * Add CHANGES.txt	2023-12-13 17:27:05 +01:00
Kaival Parikh	6c5dcc1795	Fix failing BaseVectorSimilarityQueryTestCase#testApproximate (#12922 ) Discovered in #12921, and introduced in #12679. The first issue is that we weren't advancing the `VectorScorer` [here](`cf13a92950/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L257-L262)`) -- so it was still un-positioned while trying to compute the similarity score. Earlier in the PR, the underlying delegate of the `FilteredDocIdSetIterator` was `scorer.iterator()` (see [here](`cad565439b/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L107)`)) -- so we didn't need to explicitly advance it. Later, we decided to maintain parity with `AbstractKnnVectorQuery` and introduce filtering in `AbstractVectorSimilarityQuery` (see [this commit](`5096790f28`)) to determine the `visitLimit` of approximate search -- after which the underlying iterator changed to the accepted docs (see [here](`5096790f28/lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java (L255)`)) and I missed advancing the `VectorScorer` explicitly. After doing so, we no longer get the original `java.lang.ArrayIndexOutOfBoundsException` -- but `BaseVectorSimilarityQueryTestCase#testApproximate` starts failing because it falls back to exact search, as the limit of the prefilter is met during graph search. Relaxed the parameters of the test to fix this (making the filter less restrictive, and trying to visit fewer nodes so that approximate search completes without hitting its limit). Sorry for missing this earlier!	2023-12-13 10:11:45 -05:00
Robert Muir	98d2df17d5	enable error-prone's DisableUnicodeInCode check (#12936 ) Closes #12931	2023-12-13 08:19:22 -05:00
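
Commit b0ebb849f5 above describes `growInRange` as a way to avoid allocating beyond a known upper limit when growing an array. The idea can be illustrated with a minimal sketch. The class name `GrowCappedSketch`, the method `growCapped`, and the one-eighth over-allocation factor are all hypothetical choices made for this illustration; this is not Lucene's implementation, and the real method's signature and growth policy may differ.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Lucene's ArrayUtil code.
final class GrowCappedSketch {

  private GrowCappedSketch() {}

  /**
   * Returns an array with at least minLength slots and at most maxLength slots.
   * The over-allocation (about 1/8 extra, plus 3) is an assumed growth policy
   * chosen for this sketch.
   */
  static int[] growCapped(int[] array, int minLength, int maxLength) {
    if (minLength < 0 || minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength=" + minLength + " must be in [0, maxLength=" + maxLength + "]");
    }
    if (array.length >= minLength) {
      return array; // already large enough, nothing to allocate
    }
    // Over-allocate to amortize repeated growth; use long to avoid int overflow.
    long oversized = (long) minLength + (minLength >>> 3) + 3;
    // Clamp to the known upper limit so we never allocate beyond it.
    int newLength = (int) Math.min(oversized, maxLength);
    return Arrays.copyOf(array, newLength); // existing elements are preserved
  }
}
```

With this sketch, growing a 4-element array to hold 10 entries under a limit of 10 allocates exactly 10 slots, whereas an uncapped amortized policy would allocate more than the caller can ever use.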

37166 Commits All Branches Search
