36137 Commits

Author SHA1 Message Date
Tomoko Uchida c89f8a7ea1
LUCENE-10493: factor out Viterbi algorithm and share it between kuromoji and nori (#805) 2022-04-25 20:09:46 +09:00
Adrien Grand 2a4c21bb58
LUCENE-8836: Speed up TermsEnum#lookupOrd on increasing sequences of ords. (#827) 2022-04-25 09:18:21 +02:00
Robert Muir 1089b482fc
LUCENE-10528: use Xvfb in test to avoid messing up user's desktop (#828)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2022-04-23 08:00:33 -04:00
gf2121 35ca2d79f7
LUCENE-10315: Speed up DocIdsWriter by ForUtil (#797) 2022-04-23 19:32:02 +08:00
Chris Hegarty 3bcc40efe9
LUCENE-10517: Improve performance of SortedSetDV faceting by iterating on class types (#812) 2022-04-21 18:39:53 +02:00
Chris Hegarty 08f848a582
Add two facet tests (#826) 2022-04-21 18:39:41 +02:00
Robert Muir c897aac077
fail clearly on too-new JDK (#819)
Gradle will give a very confusing error, let's make it absolutely clear.

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-04-21 09:22:26 -04:00
Robert Muir d6461eab0b
improve spotless error to suggest running 'gradlew tidy' (#817)
The current error isn't helpful as it suggests a per-module command. If
the user has modified multiple modules, they will be running gradle
commands to try to fix each one of them, when it would be easier to just
run 'gradlew tidy' a single time and fix everything.
2022-04-21 08:30:10 -04:00
Robert Muir 844bd88839
LUCENE-10526: add single method to mockfile to wrap a Path (#822)
Currently "new FilterPath" is called from everywhere, making it impossible for a mockfilesystem to use a custom subclass.
Add FilterFileSystemProvider.wrapPath(path), which subclasses can override. Fix tests to use it instead of juggling URI objects and passing FileSystems around.
2022-04-20 16:40:10 -04:00
Yuting Gan ec53a72a44
LUCENE-10495: Fix return statement of siblingsLoaded() in TaxonomyFacets (#778) 2022-04-20 12:56:43 -07:00
Adrien Grand 2d278a0efe
Clarify that terms dicts are per-field in block-tree's javadocs. (#823) 2022-04-20 17:19:51 +02:00
Robert Muir e390f33258
Fix incorrect docs in README.md: it must be java 17 exactly, java 18 does not work (#818) 2022-04-20 11:07:24 -04:00
Adrien Grand 7c173b0e1c LUCENE-10153: Make errorprone happy. 2022-04-20 16:47:34 +02:00
Ignacio Vera 4c133f435d
LUCENE-10514: Component2D#Within methods should return NOTWITHIN for triangles within the query geometry (#809)
This commit makes sure we always return NOTWITHIN for fully contained triangles in
Component2D#within* methods.
2022-04-20 16:30:29 +02:00
Adrien Grand 15ecf3c27f LUCENE-10503: Fix JIRA number in CHANGES. 2022-04-19 15:40:53 +02:00
Luca Cavanna 866bb86a1c
LUCENE-10506: change visibility of ProfilerCollector#deriveCollectorName to protected (#799)
This allows subclasses to extend how the inner collector name is derived.
2022-04-19 15:36:11 +02:00
Adrien Grand d9e37f3123
LUCENE-10153: Improve accuracy of scaled scores in WANDScorer. (#794) 2022-04-19 15:26:24 +02:00
Mike McCandless fb76d0b104 LUCENE-10482, LUCENE-10521: hrmph, put the @Ignore in the right place 2022-04-19 07:19:15 -04:00
Mike McCandless c388705855 LUCENE-10482: Ignore this test for now 2022-04-18 17:14:04 -04:00
Tomoko Uchida 872349cef9
Add some basic tasks to help/workflow (#811) 2022-04-18 11:34:28 +09:00
Gautam Worah d322be52f2
LUCENE-10482 Bug Fix: Don't use Instant.now() as prefix for the temp dir name (#814)
* Don't use Instant.now() as prefix for the temp dir name

* spotless
2022-04-17 21:18:08 -04:00
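A minimal sketch of the general idea behind the LUCENE-10482 fix above, not the actual test code: pass a fixed prefix to `Files.createTempDirectory` and let it pick an unused name, instead of building the prefix from `Instant.now()`. The class name, helper name, and prefix string are hypothetical. One relevant fact is that an ISO-8601 timestamp contains `:` characters, which Windows does not allow in file names; the commit message itself does not state its motivation.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;

final class TempDirSketch {
  /** Hypothetical helper: creates a temp directory from a fixed, file-system-safe prefix. */
  static Path createTempDir(String prefix) throws IOException {
    // createTempDirectory is responsible for choosing a name that is not in use yet,
    // so the caller does not need a timestamp to make the name unique.
    return Files.createTempDirectory(prefix);
  }

  public static void main(String[] args) throws IOException {
    // Instant#toString yields an ISO-8601 string such as 2022-04-17T21:18:08Z (note the ':').
    System.out.println("timestamp-based prefix would be: " + Instant.now());
    Path dir = createTempDir("taxo-sketch-");
    System.out.println("created: " + dir);
    Files.delete(dir); // clean up the empty directory
  }
}
```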
Gautam Worah 10ebc099c8
LUCENE-10482 Allow users to create their own DirectoryTaxonomyReaders with empty taxoArrays instead of letting the taxoEpoch decide (#762) 2022-04-15 10:45:02 -07:00
Mayya Sharipova 2e941fcfed
Make constructor for QueryOffsetRange public (#800)
QueryOffsetRange is a public class and is used in other classes
(e.g. FieldValueHighlighters needs it).
Make its constructor public as well so that it can be used in other packages.
2022-04-14 10:50:22 -04:00
Tomoko Uchida e6fb74f909
A bit of clarification, remove duplication, add link to help/workflow.txt 2022-04-13 12:35:30 +09:00
Rich Bowen e9789afb39
LUCENE-10513: Run `gradlew tidy` first (#808)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-04-12 21:22:55 +09:00
Ignacio Vera eb2df13bba
LUCENE-10508: Fixes some failures where a GeoArea is built with degenerated latitudes (#804)
Fixes some edge cases where GeoAreas were built in a way that
vertical planes could not evaluate their sign, either because the planes
were the same or the center between those planes was lying on top of one
of the planes.
2022-04-12 08:06:35 +02:00
Rich Bowen 0a069ed454
LUCENE-10512: Grammar: Remove incidents of "the the" in comments. (#807)
* Grammar: Remove incidents of "the the" in comments.

* fixes formatting, as per helpful comment from Mike

* Running ./gradlew :lucene:misc:spotlessApply again made more changes.

* It keeps finding new things ... what's up with this?

* Fixing more nits that gradlew finds. Sorry, folks. I am new at this.
2022-04-11 11:11:10 -04:00
Dawid Weiss 2c1f938139
LUCENE-10229: return -1 for unknown offsets in ExtendedIntervalsSource. Modify highlighting to work properly with or without offsets (depending on their availability). (#803)
Thanks @romseygeek
2022-04-11 11:52:31 +02:00
Dawid Weiss ba1062620c
LUCENE-10510: Check module access prior to running gjf/spotless tasks (#802) 2022-04-10 20:35:45 +02:00
Julie Tibshirani ab1394e840 Fix rare failures in TestVectorUtil cosine tests
If one of the vectors is zero, the cosine is not defined. This change makes sure
the test vectors are non-zero.
2022-04-08 09:36:23 -07:00
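The TestVectorUtil commit above rests on a small piece of arithmetic: cosine similarity divides the dot product by the product of the two norms, so a zero vector leads to 0/0. The following is a self-contained sketch of that point and of one way to generate non-zero test vectors. It is not Lucene's VectorUtil or its test code; the class and method names are hypothetical.

```java
import java.util.Random;

final class CosineSketch {
  /** Cosine of the angle between a and b; evaluates to NaN if either vector has zero norm. */
  static double cosine(float[] a, float[] b) {
    double dot = 0, squaredNormA = 0, squaredNormB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      squaredNormA += (double) a[i] * a[i];
      squaredNormB += (double) b[i] * b[i];
    }
    // With a zero vector this is 0.0 / 0.0, which is NaN in IEEE 754 arithmetic.
    return dot / Math.sqrt(squaredNormA * squaredNormB);
  }

  /** Hypothetical test helper: retries until the vector is non-zero. Requires dim > 0. */
  static float[] randomNonZeroVector(int dim, Random random) {
    float[] v = new float[dim];
    double squaredNorm;
    do {
      squaredNorm = 0;
      for (int i = 0; i < dim; i++) {
        v[i] = random.nextFloat() - 0.5f;
        squaredNorm += (double) v[i] * v[i];
      }
    } while (squaredNorm == 0);
    return v;
  }
}
```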
Chris Hostetter 5015dc6dbb LUCENE-10292: Suggest: Fix AnalyzingInfixSuggester / BlendedInfixSuggester to correctly return existing lookup() results during concurrent build()
Fix other FST-based suggesters so that getCount() returns results consistent with lookup() during concurrent build()
2022-04-08 09:25:33 -07:00
Tomoko Uchida 13630d361e
LUCENE-10493: Unify token Type enum in kuromoji and nori (#801) 2022-04-08 18:31:53 +09:00
Tomoko Uchida 9aa8ec9d06
LUCENE-10493: Unify TokenInfoFST in kuromoji and nori (#795) 2022-04-07 21:29:44 +09:00
Tomoko Uchida 4d2b08554a
LUCENE-10493: add 'backWordPos' array to JapaneseTokenizer.Position (#793) 2022-04-07 21:29:07 +09:00
zacharymorn 94fe7e314f
LUCENE-10436: Remove deprecated DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery (#790) 2022-04-07 00:53:29 -07:00
Greg Miller f870edf2fe
LUCENE-10444: Support alternate aggregation functions in association facets (#718) 2022-04-06 14:51:06 -07:00
Julie Tibshirani 9eeef080e5
Add release wizard step around build failures (#789)
This PR adds a preparation step to look at builds@lucene.apache.org and address
recurring failures. This helps make sure we catch and fix known bugs before
spinning the release candidate. It also prevents flaky tests from failing
during the release vote (which adds confusion).
2022-04-06 14:03:52 -07:00
Luca Cavanna 1cf1b301af
LUCENE-10002: replace more usages of search(Query, Collector) in tests (#787)
This commit replaces more usages of search(Query, Collector) with calls to the corresponding search(Query, CollectorManager). This round focuses on tests that implement a custom collector and therefore need a corresponding collector manager.
2022-04-06 11:06:10 +02:00
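For readers unfamiliar with the pattern behind the LUCENE-10002 commits, here is a library-free sketch of the collector-manager idea: each slice of the index gets its own collector from `newCollector()`, and the per-slice results are merged in `reduce(...)`. Those two method names mirror Lucene's `CollectorManager` interface; everything else (the classes, the `int[][]` stand-in for slices, the sequential loop) is a hypothetical simplification.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

final class CollectorManagerSketch {
  /** Hypothetical per-slice collector that only counts the hits it sees. */
  static final class CountingCollector {
    int count;

    void collect(int doc) {
      count++;
    }
  }

  /** Hypothetical manager: creates one collector per slice and merges their results. */
  static final class CountingManager {
    CountingCollector newCollector() {
      return new CountingCollector();
    }

    int reduce(Collection<CountingCollector> collectors) {
      int total = 0;
      for (CountingCollector c : collectors) {
        total += c.count;
      }
      return total;
    }
  }

  /**
   * Stand-in for a search: each slice has its own collector, so no state is shared
   * between slices. This sketch loops sequentially; a real searcher could hand each
   * slice to a different thread.
   */
  static int search(int[][] slices, CountingManager manager) {
    List<CountingCollector> collectors = new ArrayList<>();
    for (int[] slice : slices) {
      CountingCollector collector = manager.newCollector();
      for (int doc : slice) {
        collector.collect(doc);
      }
      collectors.add(collector);
    }
    return manager.reduce(collectors);
  }
}
```

Because no collector is shared across slices, a custom collector used this way needs no synchronization of its own, which is why each custom test collector needs a matching manager.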
Luca Cavanna 74e9716aec
LUCENE-10002: move MemoryIndex to search(Query, CollectorManager) (#785) 2022-04-06 11:02:25 +02:00
zacharymorn 91e29405d8
LUCENE-10436: Deprecate DocValuesFieldExistsQuery, NormsFieldExistsQuery and KnnVectorFieldExistsQuery with FieldExistsQuery (#767) 2022-04-05 23:07:20 -07:00
Luca Cavanna 796a19b457
LUCENE-10500: StringValueFacetCounts to not rely on sequential collection (#788)
StringValueFacetCounts should use the segment ordinal instead of the current index when looping through the matching hits, as when search is multi-threaded the order of the matching hits (one per segment) is not deterministic.
2022-04-05 22:42:06 +02:00
Quentin Pradet 6062ba0b3b
LUCENE-10085: Fix rare failure in TestDocValuesFieldExistsQuery (#784)
In rare cases, this test could delete all documents and cause a failure.
2022-04-05 10:30:01 -07:00
Greg Miller a071180a80 Add CHANGES entry for LUCENE-10467 2022-04-05 09:32:07 -07:00
Yuting Gan 6b82e600a8
LUCENE-10467: Throws IllegalArgumentException for getAllDims and getTopChildren if topN <= 0 (#751) 2022-04-05 09:28:59 -07:00
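The behavior described in the LUCENE-10467 commit above amounts to an argument check at the top of the method. A minimal sketch, assuming a hypothetical `validateTopN` helper; the actual Lucene code and error message may differ.

```java
final class TopNCheckSketch {
  /** Hypothetical helper: rejects non-positive topN values up front. */
  static void validateTopN(int topN) {
    if (topN <= 0) {
      throw new IllegalArgumentException("topN must be > 0 (got: " + topN + ")");
    }
  }

  public static void main(String[] args) {
    validateTopN(10); // accepted
    try {
      validateTopN(0);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```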
Tomoko Uchida bb4a0dc19b
LUCENE-10497: Add a base Token class to analysis-common (for kuromoji and nori) (#783) 2022-04-05 20:20:38 +09:00
Luca Cavanna ea52a84c7e
Replace TopFieldCollector usages in tests with collector manager (#761)
This commit replaces some usages of TopFieldCollector in tests with a corresponding collector manager created through TopFieldCollector#createSharedManager
2022-04-05 10:03:04 +02:00
Adrien Grand deb6170107 Fix CHANGES formatting. 2022-04-05 09:24:40 +02:00
xiaoping 898ec1659d
LUCENE-10456: Implement Weight#count for MultiRangeQuery (#731) 2022-04-05 09:23:59 +02:00
Adrien Grand f249046a1d LUCENE-10484: Move CHANGES entry to 9.2. 2022-04-05 08:53:57 +02:00
Luca Cavanna 7ed0f3d7ad
LUCENE-10484: Add support for concurrent facets random sampling (#765)
This commit adds a new createManager static method to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager but it exposes a specialized reduced RandomSamplingFacetsCollector.

This relates to [LUCENE-10002](https://issues.apache.org/jira/browse/LUCENE-10002). It allows users to use a collector manager instead of a collector when doing random sampling, in an effort to reduce usages of IndexSearcher#search(Query, Collector).
2022-04-05 08:51:57 +02:00