lucene

Commit Graph

Author	SHA1	Message	Date
sabi0	d7a14257ce	Get rid of deprecated assertThat() usages (#12982)	2024-01-10 16:31:29 +01:00
Andrew Ross	872702d828	Remove outdated comment from TaskExecutor (#12993) A previous iteration of this code used an AtomicInteger and required this comment. The committed version uses a self-documenting boolean and the comment is not needed.	2024-01-10 10:35:23 +01:00
Simon Willnauer	4d916a754b	Fix test to also take into account minor versions for BWC	2024-01-09 12:21:13 +01:00
Simon Willnauer	ea327220a8	Remove stale BWC tests (#12874) Both of these tests have been disabled for quite a long time. While `TestManyPointsInOldIndex` looks indeed stale, `TestIndexWriterOnOldIndex` is not a more general test.	2024-01-09 11:49:53 +01:00
sabi0	5442748995	Fix missing variable assignment in testAllVersionHaveCfsAndNocfs() and other minor code cleanups (#12969)	2024-01-09 11:04:31 +01:00
sabi0	0fc1e2c2f7	Code cleanups in EscapeQuerySyntaxImpl (#12973)	2024-01-08 22:18:37 +01:00
Jakub Slowinski	6d27c20579	Fix only use of .toLowerCase() with no Locale (#12856)	2024-01-08 22:04:04 +01:00
sabi0	a32f6acadf	Remove unnecessary fields loop from extractWeightedSpanTerms() (#12965)	2024-01-08 22:01:56 +01:00
Marc D'Mello	376bd24693	Improve code clarity for OrdinalMap (#11729) Closes #11728	2024-01-08 14:00:53 +01:00
Michael McCandless	3c235bb7b4	LockVerifyServer does not need to reuse addresses nor set accept timeout (#12535)	2024-01-08 13:53:08 +01:00
gf2121	67be0189bc	clean up sleep (#12914)	2024-01-08 13:48:26 +01:00
Adrien Grand	40060f8b70	Reduce contention on flushControl.isFullFlush(). (#12958) `flushControl.isFullFlush()` is a surprising source of contention with documents that are cheap to index and many indexing threads. If I slightly modify luceneutil's `IndexGeoNames` benchmark to configure a 4GB indexing buffer and disable `TextField` fields, which are more costly to index than `KeywordField` or `IntField` fields, this brings the time to load all the dataset in the `IndexWriter` buffers from 8.0s to 7.0s.	2024-01-08 13:23:05 +01:00
Stefan Vodita	115a30d462	Increase stale PRs action budget and mark not debug-only (#12998)	2024-01-08 07:20:59 -05:00
Stefan Vodita	564b2ebecc	Introduce workflow for stale PRs (#12813) * Introduce stale workflow * Exempt draft PRs * Tune the action to our needs 1. Don't mark issues stale, only PRs. 2. Don't close anything automatically. 3. Keep the default Stale label. 4. Run in debug-only mode to start.	2024-01-08 06:22:19 -05:00
Dzung Bui	4c883a414c	Optimize FST on-heap BytesReader (#12879) * Move size() to FSTStore * Remove size() completely * Allow FST builder to use different DataOutput * access BytesStore byte[] directly for copying * Rename BytesStore * Change class to final * Reorder methods * Remove unused methods * Rename truncate to setPosition() and remove skipBytes() * Simplify the writing operations * Update comment * remove unused parameter * Simplify BytesStore operation * tidy code * Rename copyBytes to writeTo * Simplify BytesStore operations * Embed writeBytes() to FSTCompiler * Fix the write bytes method * Remove the default block bits constant * add assertion * Rename method parameter names * Move reverse to FSTCompiler * Revert setPosition call * Address comments * Return immediately when writing 0 bytes * Add comment & * Rename variables * Fix the compile error * Remove isReadable() * Remove isReadable() * Optimize ReadWriteDataOutput * tidy code * Freeze the DataOutput once finished() * Refactor * freeze the DataOutput before use * Improvement of ReadWriteDataOutput * tidy code * Address comments and add off-heap FST tests * Remove the hardcoded random * Ignore the Test2BFSTOffHeap test * Simplify ReadWriteDataOutput * Do not expose blockBits * tidy code * Remove 0 initialization * Add assertion and comment	2024-01-06 07:47:19 -05:00
sabi0	7b8aece125	Use Collections.addAll() instead of manual array copy and misc. code cleanups (#12977)	2024-01-04 22:27:36 +01:00
sabi0	1a939410dd	Misc code cleanups (#12974)	2024-01-04 08:37:49 +01:00
Kaival Parikh	248f067d52	Reduce number of dimensions for Test[Byte|Float]VectorSimilarityQuery (#12988) ### Description Identified in #12955, where `TestFloatVectorSimilarityQuery.testVectorsAboveSimilarity` fails because of a disconnected HNSW graph This is a bigger issue, but we can reduce intermittent failures by keeping the number of docs and dimensions same as [`BaseKnnVectorQueryTestCase.testRandom`](`dc9f154aa5/lucene/core/src/test/org/apache/lucene/search/BaseKnnVectorQueryTestCase.java (L470)`) (similar test for KNN with random vectors) ### Command to reproduce ``` ./gradlew :lucene:core:test --tests "org.apache.lucene.search.TestFloatVectorSimilarityQuery.testVectorsAboveSimilarity" -Ptests.jvms=12 -Ptests.jvmargs= -Ptests.seed=1A1CDC0974AF361 ```	2024-01-02 13:06:12 -05:00
sabi0	78b4f75a2c	Replace .collect(toList()) with .toList() and misc. code cleanups (#12978)	2023-12-30 17:04:11 +01:00
sabi0	ec9e593dc4	Remove obsolete 'mappingRules' in Tokenizer tests (#12972)	2023-12-30 16:59:59 +01:00
sabi0	67d866c586	Minor code cleanups (intellij inspections).	2023-12-30 16:55:49 +01:00
Uwe Schindler	346f4ff7d2	Move changes entry to 9.10 (#12841)	2023-12-29 13:08:49 +01:00
sabi0	64cf54a4bf	Replace "UTF-8" with StandardCharsets.UTF_8 and other typo and minor cleanups (#12979)	2023-12-28 19:42:22 +01:00
sabi0	91272f45da	Replace println(String.format(...)) with printf(...) (#12976)	2023-12-28 19:32:06 +01:00
sabi0	57b104e806	Get rid of inefficient Stream.count() (#12975)	2023-12-28 19:30:01 +01:00
sabi0	9c9949b2bc	Remove unused imports (#12970)	2023-12-28 19:28:24 +01:00
Patrick Zhai	948970be58	Fix bug where NFARunAutomaton#getTransition does not set Transition correctly (#12909)	2023-12-27 22:49:35 -08:00
sabi0	02722eeb69	Add missing spaces in concatenated strings (#12967)	2023-12-23 20:30:30 -05:00
Zhang Chao	dc9f154aa5	Move group-varint encoding/decoding logic to DataOutput/DataInput (#12841)	2023-12-23 13:18:34 +01:00
sabi0	9359a9dcff	Update contributing guide: autocrlf and build dependencies (#12963)	2023-12-22 09:28:53 +01:00
sabi0	f6b2006195	Fix typo in help/formatting.txt (#12960)	2023-12-21 19:58:53 +01:00
Adrien Grand	91002d04d3	Fix CheckIndex to correctly flag the automaton as binary.	2023-12-20 14:39:32 +01:00
Zhang Chao	5152051f68	Improve Javadoc for DocValuesConsumer (#12952)	2023-12-20 13:40:44 +01:00
Adrien Grand	bcc7e120ba	Modernize LineFileDocs. (#12929) This replaces `StringField`/`SortedDocValuesField` with `KeywordField` and `IntPoint`/`NumericDocValuesField` with `IntField`.	2023-12-19 11:25:26 +01:00
Adrien Grand	5c084fcd6e	Add a stored fields test that indexes LineFileDocs. (#12927) Real-world data exhibits patterns that are taken advantage of by the compression logic, but also hardly reproducible in a randomized way. This makes this new test introduce interesting coverage. It takes one second to run on my machine, so I did not mark it `@Nightly`.	2023-12-19 11:20:14 +01:00
Adrien Grand	bf45ab79ec	Beef up `Terms#intersect` checks in `CheckIndex`. (#12926) Now also testing what happens with a non-null `startTerm`. This found bugs in `DirectPostingsFormat`.	2023-12-19 11:17:38 +01:00
Lukáš Vlček	5d6086e199	Fix position increment in (Reverse)PathHierarchyTokenizer (#12875) * Fix PathHierarchyTokenizer positions PathHierarchyTokenizer was emitting multiple tokens in the same position with changing offsets. To be consistent with EdgeNGramTokenizer (which is conceptually similar -- it's emitting multiple prefixes/suffixes off the input string), we can output every token with length 1 with positions incrementing by 1. * Fix ReversePathHierarchyTokenizer positions Making ReversePathHierarchyTokenizer consistent with recent changes in PathHierarchyTokenizer. --------- Co-authored-by: Michael Froh <froh@amazon.com>	2023-12-18 08:48:22 -05:00
Dawid Weiss	6bb244a932	An improved check for ignoring the c2-crash test if running on a client compiler. (#12953)	2023-12-18 12:37:57 +01:00
ChrisHegarty	f6582ce048	Add back-compat indices for 9.9.1	2023-12-17 09:39:46 +00:00
ChrisHegarty	08728bf202	Add bugfix version 9.9.1	2023-12-17 09:20:34 +00:00
ChrisHegarty	1f1d0735c8	DOAP changes for release 9.9.1	2023-12-16 22:55:20 +00:00
Michael Sokolov	49d521145d	Use hppc IntIntHashMap to avoid Integer box/unbox when remapping vector ordinals during merge (#12950)	2023-12-15 13:24:05 -05:00
Benjamin Trent	423f8279f0	Fix flaky tests that are caused by small float vectors (#12943) While quantization generally works well, when the number of dimensions is tiny (just two like in our tests), and we are indexing a circle, and we have random merge policies, we can end up getting unexpected ordering on the resulting vectors. closes: https://github.com/apache/lucene/issues/12940	2023-12-14 14:38:22 -05:00
Michael McCandless	d1551da027	#12932: get monsters tests compiling/running again (#12942)	2023-12-14 10:14:45 -05:00
Stefan Vodita	b0ebb849f5	Introduce growInRange to reduce array overallocation (#12844) In cases where we know there is an upper limit to the potential size of an array, we can use `growInRange` to avoid allocating beyond that limit.	2023-12-14 23:00:26 +09:00
Michael McCandless	ebf9e29570	Ensure Nori/Kuromoji shipped binary FST is the latest version (#12933) * ensure Nori/Kuromoji shipped binary FST is the latest version (closes #12911) * fold feedback from @uschindler: sharpen test failure methods to give the specific gradlew command to regenerate the precise FST (not everything) * add javadoc for FSTMetadata.getVersion	2023-12-14 07:38:34 -05:00
Jakub Slowinski	3965319441	Attempting to clean up some remaining Solr references (#12939) * Attempting to clean up some remaining Solr references * Update gradle/help.gradle Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com> --------- Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>	2023-12-14 06:02:16 -05:00
Patrick Zhai	da69346257	Add CHANGES.txt entry for #12910	2023-12-14 09:14:18 +09:00
Patrick Zhai	f303d29baf	Refactor around NeighborArray (#12910)	2023-12-14 09:03:44 +09:00
Uwe Schindler	16d0b822b3	Prevent the common zero-width code points and detect invalid UTF-8 encoding in our sources and selected resource files (#12937) * Simple patch to prevent the common zero-width code points in our source and some types of resource files * Validate correct UTF-8 input and fix buggy CSS file (ISO-8859-x encoded) * add a bit of context * Add CHANGES.txt	2023-12-13 17:27:05 +01:00
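
The b0ebb849f5 entry above ("Introduce growInRange to reduce array overallocation") describes capping array growth at a known upper limit. A minimal sketch of that idea follows. It is not Lucene's implementation: the class name, the helper's signature, and the 1.5x growth factor are all assumptions made for illustration.

```java
import java.util.Arrays;

// Illustrative only: a hypothetical helper, not the code from the commit.
class GrowInRangeSketch {

  // Returns an array of at least minLength elements, never longer than maxLength.
  // The 1.5x oversizing factor is an assumption chosen for this sketch.
  static int[] growInRange(int[] array, int minLength, int maxLength) {
    if (minLength < 0 || minLength > maxLength) {
      throw new IllegalArgumentException(
          "minLength=" + minLength + " must be within [0, " + maxLength + "]");
    }
    if (array.length >= minLength) {
      return array; // already large enough, nothing to allocate
    }
    // Oversize to amortize future growth, but clamp to the known upper limit.
    long oversized = (long) minLength + (minLength >> 1);
    int newLength = (int) Math.min(oversized, maxLength);
    return Arrays.copyOf(array, newLength);
  }

  public static void main(String[] args) {
    int[] small = new int[4];
    // Unbounded growth would ask for 15 slots; the cap limits it to 12.
    System.out.println(growInRange(small, 10, 12).length);
    // With a generous cap, the usual oversizing applies.
    System.out.println(growInRange(small, 5, 100).length);
  }
}
```

The clamp matters most when the limit is close to the requested size: without it, every growth step would allocate slots that can never be used.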

37053 Commits