Commit Graph

35613 Commits

Author SHA1 Message Date
Dawid Weiss c3d3d0703b Revert "Reverting back to 9d19b49038b."
This reverts commit d2a022f49b.
2021-12-19 11:59:13 +01:00
Dawid Weiss d2a022f49b Reverting back to 9d19b49038. 2021-12-18 23:35:22 +01:00
Dawid Weiss 1114cf2c25 LUCENE-10308: sort input files for ecj so that module-info.java comes first. 2021-12-18 21:18:16 +01:00
Dawid Weiss 7e1f3fef69 LUCENE-10255: add unsynced providers to the module. 2021-12-18 21:06:11 +01:00
Dawid Weiss c1c27d4ff4 LUCENE-10255: initial support for Java Modules (squashed). 2021-12-18 20:51:55 +01:00
Dawid Weiss 9d19b49038 LUCENE-10285: try to force ordering of internal tasks, in spite of making top-level wrapper dependencies. (#549) 2021-12-17 19:14:29 +01:00
Greg Miller ff274fddb6 LUCENE-10321: Tweak MultiRangeQuery interval tree creation logic (#548) 2021-12-17 06:38:48 -08:00
Dawid Weiss 34d69fd160 LUCENE-10313: add missing javadoc. 2021-12-16 18:36:57 +01:00
Jan Høydahl 97106d37c9 Run tidy on Version file 2021-12-16 15:53:55 +01:00
Tomoko Uchida db26532154 LUCENE-10313: minor clean-ups and follow-ups (#546) 2021-12-16 23:43:21 +09:00
Jan Høydahl 3d5c3b5f8e Remove duplicated @deprecated line 2021-12-16 15:36:21 +01:00
Jan Høydahl 9534a08aa3 Add back-compat indices for 8.11.1 2021-12-16 13:28:02 +01:00
Jan Høydahl 91d4846df6 Fix bug in Version deprecation on stable branch 2021-12-16 13:13:55 +01:00
Jan Høydahl 5694fa840e Add bugfix version 8.11.1 2021-12-16 12:13:06 +01:00
Tomoko Uchida 5ddc7ebb83 LUCENE-10303: remove unnecessary changes entry 2021-12-16 19:40:51 +09:00
Dawid Weiss 629ab1c952 LUCENE-10313: drop log4j from luke (#544) 2021-12-16 11:19:57 +01:00
Quentin Pradet 352a6b68f0 LUCENE-10085: Fix flaky testQueryMatchesCount (#538)
Five times every 10 000 tests, we did not index any documents with i
between 0 and 10 (inclusive), which caused the deleted tests to fail.

With this commit, we make sure that we always index at least one
document between 0 and 10.
2021-12-14 10:50:48 +01:00
Ignacio Vera 5ba87f9efa LUCENE-10310: Fix test error in TestXYDocValuesQueries#testRandomDistanceHuge (#537)
We create random circles using ShapeTestUtils which is safe.
2021-12-13 12:02:13 +01:00
Tomoko Uchida e0a6e1c662 LUCENE-10309: Minimum KnnVector codec support in Luke (#535) 2021-12-12 15:32:10 +09:00
Tomoko Uchida 140f48e267 LUCENE-10303: Upgrade log4j to 2.15.0 2021-12-11 10:45:00 +09:00
Tomoko Uchida f046b59a5b LUCENE-10305: Ensure line endings of versions.props is LF 2021-12-11 10:12:34 +09:00
Dawid Weiss cf5a7337e2 LUCENE-10229: change the wording a bit. 2021-12-09 17:35:18 +01:00
Patrick Zhai 2a47bbe8be LUCENE-10229: Unify behaviour of match offsets for interval queries (#521) 2021-12-09 17:35:18 +01:00
Ignacio Vera 5e0d8dc87a Revert "LUCENE-10289: Change DocIdSetBuilder#grow() from taking an int to a long (#520)" (#532)
This reverts commit af1e68b891.
2021-12-09 13:55:36 +01:00
Dawid Weiss 17aaab654e LUCENE-10294: Avoid compiling javadocs twice in 'gradlew check'. 2021-12-09 09:56:32 +01:00
Julie Tibshirani 394472d4b8 LUCENE-10040: Add test for vector search with skewed deletions (#527)
This exercises a challenging case where the documents to skip all happen to
be closest to the query vector. In many cases, HNSW appears to be robust to this
case and maintains good recall.
2021-12-08 11:26:45 -08:00
Robert Muir c74642d9a7 remove unnecessary "dependencies" in versions.props (#526)
Looks like stray cats from back when it was shared with solr
2021-12-07 21:23:33 -05:00
Adrien Grand 85caa4364e DOAP changes for release 9.0.0 2021-12-07 14:38:56 +01:00
Ignacio Vera 1eb935229f LUCENE-10289: Change DocIdSetBuilder#grow() from taking an int to a long (#520) 2021-12-07 07:42:04 +01:00
Tomoko Uchida 3eadfd4596 LUCENE-10287: Re-add abstract FSDirectory class as a supported directory (#522) 2021-12-07 15:34:24 +09:00
gf2121 892e324d02 LUCENE-10280: Store BKD blocks with continuous ids more efficiently (#510) 2021-12-07 07:27:11 +01:00
Robert Muir 4d48dc87f7 speed up TestSimpleExplanationsWithFillerDocs (#516)
This is the slowest test suite, runs for ~ 60s, because between every
document it adds 2048 "filler docs". This just adds up to a ton of
indexing across all the test methods.

Use 2048 for Nightly, and instead a smaller number (4) for local builds.
2021-12-06 22:13:04 -05:00
Robert Muir 9000dfc382 simplify jflex grammars by using difference rather than negation (#515)
Jflex grammars now avoid using complement operator twice as a demorgan-workaround for "macros in char classes". With the latest version of jflex, we can just do the subtraction directly and avoid unnecessary NFA->DFA conversions. This speeds up `generateUAX29URLEmailTokenizer` around 3x.
2021-12-06 21:59:40 -05:00
Uwe Schindler d36c70cdd6 LUCENE-10287: Add changes entry 2021-12-06 20:28:46 +01:00
Uwe Schindler 8e7fbcaf5b LUCENE-10287: Fix startup script of module enabled Luke to pass jdk.unsupported as module (#517) 2021-12-06 20:24:56 +01:00
gf2121 ebee531df7 LUCENE-10233: fix Unit Test TestFixedBitSet#testAndNot (#512)
Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2021-12-06 07:34:41 +01:00
Robert Muir aa6a78c28c tone down TestIndexWriter.testMaxCompletedSequenceNumber in non-nightly (#506)
this test currently indexes up to 600 docs for each thread.
2021-12-04 14:23:03 -05:00
Robert Muir 3e06e2338e tone down BaseTermVectorsFormatTestCase.testLotsOfFields in non-nightly (#505)
This test runs across every IndexOptions, indexing hundreds of fields.
It is slow for some implementations (e.g. SimpleText).

Use fewer fields for normal runs.
2021-12-04 14:22:57 -05:00
Robert Muir 401d6209fb Make TestNRTReplication.testCrashReplica nightly (#504)
This test is forking and crashing JVMs, always runs over 10 seconds
2021-12-04 14:22:48 -05:00
Dawid Weiss d2563e6f1f LUCENE-10284: Upgrade morfologik-stemming to 2.1.8 (#514) 2021-12-04 09:57:02 +01:00
Robert Muir eff5430e58 LUCENE-10243: increase unicode versions of tokenizers to 12.1 (#465)
* Bump %unicode 9 -> %unicode 12.1 for the 3 unicode grammars
* regenerate emoji conformance tests for unicode 12.1
* modify wordbreak conformance tests to use emoji data (which replaces old crazy E_base etc properties)
* regenerate wordbreak conformance tests
* Simplify grammar files and word-break conformance test generator, now that full-width numbers are WordBreak=Numeric
* Use jflex emoji properties rather than ICU-generated ones
2021-12-03 20:34:29 -05:00
gf2121 eadc146e08 LUCENE-10233: Use AND NOT for inverse intersector (#499)
When docIds are stored as a BitSet, use andNot to speed up collecting them.
2021-12-03 09:24:54 +01:00
Ignacio Vera e2264cd7ef LUCENE-10279: add entry in CHANGES.txt and make RangeClause final (#507) 2021-12-03 07:24:40 +01:00
Misha Tiurin 074a233244 Remove duplicate entries in SpanishPluralStemmer invariants list (#508)
* Remove duplicate entries in SpanishPluralStemmer invariants list
Add assertion to prevent duplicates in the future

Co-authored-by: Xavier Sanchez <xavier.sanchez@wallapop.com>
2021-12-02 14:23:39 -05:00
Robert Muir cbd306f87b LUCENE-10278: don't write zero-sized array in this test (#501)
DocIdsWriter is not prepared for this.
2021-12-02 15:54:28 +01:00
gf2121 dd14424817 LUCENE-10233: Store docIds as bitset to speed up addAll (#438) 2021-12-02 15:54:18 +01:00
Adrien Grand 1b04807440 Make EndiannessReverser(Data|Index)Input always reverse byte order. (#502)
Currently EndiannessReverser(Data|Index)Input doesn't reverse the byte order for
`readLongs` and `readFloats`. The reasoning is that these two methods replaced
`readLELongs` and `readLEFloats`, so the byte order doesn't need changing.

However this is creating some confusing situations where new code expects a
consistent byte order on the write and read sides and gets longs in the wrong
byte order. So this commit suggests a different approach, where
EndiannessReverser(Data|Index)Input always changes the byte order, and former
call sites of `readLELongs` and `readLEFloats` are changed to manually reverse
the byte order on top of `readLongs` and `readFloats`.

This is making old codecs a bit slower, but I think it's fair since these are
old codecs. And this makes the endianness reversing backward compatibility layer
easier to reason about?
2021-12-02 14:01:37 +01:00
Ignacio Vera c0f0686f74 LUCENE-10279: Fix equals in MultiRangeQuery (#503) 2021-12-02 13:34:47 +01:00
Robert Muir f33ae4e81f improve term vector merging tests (#500)
Use fewer iterations locally so that term vector merging doesn't dominate
the list of slowest tests.

Split out deletes/no-deletes into separate methods to improve
debuggability.

Remove nightly from SimpleText term vectors merging tests, now that they
run much faster.
2021-12-02 05:40:36 -05:00
Ignacio Vera a580e29539 LUCENE-10275: Speed up MultiRangeQuery by using an interval tree 2021-12-02 09:54:32 +01:00