35020 Commits

Author SHA1 Message Date
Dawid Weiss
39071dbc54
LUCENE-9904: Port GenerateJflexTLDMacros.java regeneration to gradle and regenerate UAX tokenizer with up-to-date TLDs 2021-04-07 10:56:21 +02:00
Gautam Worah
efeea0b8ee
LUCENE-9902 Minor fixes to the faceting API (#62) 2021-04-06 14:50:23 -04:00
Robert Muir
be94a667f2
LUCENE-9827: avoid wasteful recompression for small segments (#28)
Require that the segment has enough dirty documents to create a clean
chunk before recompressing during merge: there must be at least maxChunkSize.

This prevents wasteful recompression with small flushes (e.g. every
document): we ensure recompression achieves some "permanent" progress.

Expose maxDocsPerChunk as a parameter for Term vectors too, matching the
stored fields format. This allows for easy testing.

Increment numDirtyDocs for partially optimized merges:
If segment N needs recompression, we have to flush any buffered docs
before bulk-copying segment N+1. Don't just increment numDirtyChunks,
also make sure numDirtyDocs is incremented, too.
This doesn't have a performance impact, and is unrelated to tooDirty()
improvements, but it is easier to reason about things with correct
statistics in the index.

Further tuning of how dirtiness is measured: for simplification just use percentage
of dirty chunks.

Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-04-06 14:18:48 -04:00
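
The dirtiness rule described in LUCENE-9827 above can be sketched roughly as follows. This is an illustration only, not the code from the commit: the class name, the method signature, and both thresholds (minDirtyDocs, dirtyChunkPercent) are assumptions made for the example. It combines the two conditions the message names: enough dirty documents to fill a clean chunk, and a sufficient percentage of dirty chunks.

```java
// Hypothetical sketch; names and thresholds are assumptions, not Lucene's API.
final class DirtinessSketch {

  /**
   * Decides whether a segment is worth recompressing during a merge.
   *
   * @param numDirtyDocs      documents sitting in incomplete ("dirty") chunks
   * @param numDirtyChunks    number of incomplete chunks in the segment
   * @param numChunks         total number of chunks in the segment
   * @param minDirtyDocs      assumed minimum of dirty docs needed to build one clean chunk
   * @param dirtyChunkPercent assumed percentage of dirty chunks that must be exceeded
   */
  static boolean tooDirty(long numDirtyDocs, long numDirtyChunks, long numChunks,
                          long minDirtyDocs, int dirtyChunkPercent) {
    // Recompression only makes "permanent" progress if a full clean chunk can be produced.
    boolean enoughDirtyDocs = numDirtyDocs >= minDirtyDocs;
    // Dirtiness itself is measured as a simple percentage of dirty chunks.
    boolean enoughDirtyChunks = numDirtyChunks * 100 > numChunks * dirtyChunkPercent;
    return enoughDirtyDocs && enoughDirtyChunks;
  }
}
```

With these assumed thresholds, a segment flushed after a single document has one fully dirty chunk but is still not recompressed, because it cannot yet yield a clean chunk.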
Adrien Grand
d991fefb49
Add an example to the CacheHelper docs. (#50) 2021-04-06 16:25:15 +02:00
Dawid Weiss
2662a74cab Correct some of the jdk17-offending javadocs. 2021-04-05 20:34:52 +02:00
Dawid Weiss
2773172455 Correct some of the jdk17-offending javadocs. 2021-04-05 20:21:52 +02:00
Dawid Weiss
baceb16904 Correct some of the jdk17-offending javadocs. 2021-04-05 20:19:56 +02:00
Dawid Weiss
fbf9191abf
LUCENE-9901: UnicodeData.java has no regeneration task (#63) 2021-04-05 20:12:56 +02:00
Ignacio Vera
67a0bd4b6d
LUCENE-9705: Final clean-up and entry in CHANGES.txt (#59) 2021-04-04 11:30:47 +02:00
Dawid Weiss
010e3a1ba9
LUCENE-9900: Regenerate/run ICU only if inputs changed (#61) 2021-04-02 11:46:43 +02:00
Dawid Weiss
e3ae57a3c1
LUCENE-9872: Make the most painful tasks in regenerate fully incremental (#60) 2021-04-02 09:56:47 +02:00
Tomoko Uchida
670bbf8b99
Ignore sdkmanrc file on Git (#58) 2021-04-02 01:04:14 +09:00
Ignacio Vera
8c9b9546cc
LUCENE-9705: Create Lucene90PointsFormat (#52) 2021-04-01 07:04:04 +02:00
Pieter van Boxtel
1d579b9448
LUCENE-9898 Remove no longer used scorePayload method from BM25Similarity (#57) 2021-04-01 09:06:03 +09:00
zacharymorn
79fcd99f4c
LUCENE-9883: Turn on ecj missingEnumCaseDespiteDefault setting (#56) 2021-03-31 15:50:52 +09:00
Dawid Weiss
32e891c60f LUCENE-9871: move dummy outputs aspect into a separate file. 2021-03-30 20:15:55 +02:00
Adrien Grand
10520185a9 LUCENE-9877: Move CHANGES entry under 8.9. 2021-03-30 15:13:00 +02:00
Greg Miller
fd79f9737a
LUCENE-9877: Allow up to 7 exceptions in PForUtil (instead of 3) (#48)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-03-30 15:11:33 +02:00
Dawid Weiss
39b8e97613 LUCENE-9896: Add 'quiet exec' utility suppressing exec output unless a failure occurs 2021-03-30 14:38:13 +02:00
Dawid Weiss
c7455ff561 LUCENE-9871: cleaning up the build system. Upgrade palantir. Remove all ant-related hacks. 2021-03-30 12:41:06 +02:00
Dawid Weiss
fd685682be This removes the last of ant-compatibility hacks - cross-project dependency on test classes. Replaced with gradle's test fixture artifact sharing. Cleaned up spatial3d classes a bit too. 2021-03-30 12:35:33 +02:00
Dawid Weiss
f83c9462bb Remove legacy ant hacks - add conf to test sourceSet. Correct jvm options hack (don't apply to benchmarks run). 2021-03-30 11:33:27 +02:00
Dawid Weiss
89024a466b Remove exceptional test exclusions for forked non-tests and inner classes. 2021-03-30 11:13:41 +02:00
Dawid Weiss
78bfbe0bad We don't need to exclude inner classes explicitly. 2021-03-30 10:57:15 +02:00
Dawid Weiss
3115797463 LUCENE-9871: clean up some old cruft and shuffle files around. Correct inputs/outputs on check broken links so that it's incremental. 2021-03-30 10:55:19 +02:00
Dawid Weiss
974e4bc5e8 LUCENE-9880: correct task ordering for clean. 2021-03-30 10:08:44 +02:00
Ignacio Vera
00e57f8c8a
LUCENE-9705: Create Lucene90SegmentInfoFormat (#30)
The existing Lucene86SegmentInfoFormat is moved to backwards-codecs.
2021-03-30 10:04:17 +02:00
iverase
c11a01ab61 Move LUCENE-9870 under Lucene 8.8.2 2021-03-30 10:00:39 +02:00
Michael McCandless
4d16ff21b2
LUCENE-9888: re-enable CheckIndex verification that indexSort is the same across all segments (#49) 2021-03-29 12:29:40 -04:00
liupanfeng
cce982146a LUCENE-9887: fix error param use in RadixSelector 2021-03-29 12:16:06 +02:00
Jørgen Nystad
06114459ee
LUCENE-9870: Fix Circle2D intersectsLine t-value (distance) range clamp (#41)
Fixes missing matches when line magnitudeAB < 1
2021-03-29 10:41:54 +02:00
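
To illustrate the kind of clamp LUCENE-9870 above refers to, here is a minimal sketch of a circle/segment intersection test. It is not the Circle2D implementation: the class and method names are hypothetical, and it assumes the projection parameter t is normalized by the squared segment length, so that clamping to [0, 1] is correct regardless of how short the segment AB is.

```java
// Hypothetical sketch; not Lucene's Circle2D code.
final class CircleSegmentSketch {

  /** Returns true if the segment A-B touches or crosses the circle with center C and radius r. */
  static boolean intersects(double cx, double cy, double r,
                            double ax, double ay, double bx, double by) {
    double dx = bx - ax;
    double dy = by - ay;
    double lenSq = dx * dx + dy * dy;
    // Projection of C onto the line through A and B, expressed as a fraction of AB.
    double t = lenSq == 0 ? 0 : ((cx - ax) * dx + (cy - ay) * dy) / lenSq;
    // Clamp to the segment itself; [0, 1] is valid for any segment length.
    t = Math.max(0, Math.min(1, t));
    // Closest point on the segment to the circle center.
    double px = ax + t * dx;
    double py = ay + t * dy;
    double ex = cx - px;
    double ey = cy - py;
    return ex * ex + ey * ey <= r * r;
  }
}
```

Clamping an unnormalized distance against a fixed upper bound of 1 would cut off valid closest points only when the bound and the segment length disagree, which is one plausible way short segments could be mishandled; the normalized form avoids that dependency.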
Mike McCandless
d5d6dc0793 LUCENE-9385: add CHANGES.txt entry 2021-03-27 12:40:06 -04:00
zacharymorn
3648a1020a
LUCENE-9385: Add FacetsConfig option to control which drill-down terms are indexed for a FacetLabel (#25) 2021-03-27 12:38:00 -04:00
Robert Muir
3596e05e5c
LUCENE-9878: enable redundantNullCheck in ecjLint (#44)
Detects common cases of unreachable/dead code.

For generated javacc code, the check is disabled via
SuppressWarnings("unused") because javacc generates strange/bad code such as:

  if ("" == null)

For TestStressNRTReplication's startNode() method, the check is also
disabled because analysis folds the "test evilness controls" which are
static final constants. This itself is a WTF, shouldn't we instead
randomize these evil things in our tests rather than hardcoding them to
specific values?
2021-03-27 11:43:47 -04:00
Uwe Schindler
3538709269 Improvement for LUCENE-9881 (#46): Completely disable the Eclipse plugin's eclipseJdt task and replace it with our own, which just copies the filtered config files. This now works correctly with inputs/outputs. 2021-03-27 12:08:12 +01:00
Robert Muir
690e256ec9
LUCENE-9881: synchronize ECJ linter with Eclipse IDE (#46)
Co-authored-by: Uwe Schindler <uschindler@apache.org>
2021-03-27 00:42:29 +01:00
Dawid Weiss
f02799c511
Skip errorprone on non-nightlies. (#45) 2021-03-26 21:42:15 +01:00
Mayya Sharipova
48715fe898
LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter (#32)
1. Add an option to supply a custom leaf sorter for IndexWriter.
A DirectoryReader opened from this IndexWriter will have its leaf
readers sorted with the provided leaf sorter. This is useful for
indices on which it is expected to run many queries with particular
sort criteria (e.g. for time-based indices this is usually a
descending sort on timestamp). Providing a leafSorter speeds up
early termination for queries that use this particular
sort.

2. Add an option to supply a custom sub-readers sorter for
BaseCompositeReader. In this case sub-readers will be sorted 
according to the provided leafSorter.

3. Add an option to supply a custom leaf sorter for
StandardDirectoryReader. The leaf readers of this
StandardDirectoryReader will be sorted according to
the provided leaf sorter.
2021-03-26 09:56:02 -04:00
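
The leaf-sorting idea in LUCENE-9507 above can be sketched without any Lucene dependency. In the sketch below, Leaf is a hypothetical stand-in for a leaf reader and maxTimestamp is an assumed per-segment statistic; the real leaf sorter works on Lucene's own reader types. The point is only that a comparator fixes the order in which leaves are visited, so a descending sort on timestamp sees the newest data first and can terminate early.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; Leaf and maxTimestamp are illustrative, not Lucene types.
final class LeafSorterSketch {

  /** Stand-in for a leaf reader, carrying the largest timestamp it contains. */
  static final class Leaf {
    final String name;
    final long maxTimestamp;

    Leaf(String name, long maxTimestamp) {
      this.name = name;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Orders leaves so the one holding the most recent documents comes first. */
  static final Comparator<Leaf> NEWEST_FIRST =
      Comparator.comparingLong((Leaf leaf) -> leaf.maxTimestamp).reversed();

  /** Returns leaf names in the order a reader using NEWEST_FIRST would visit them. */
  static List<String> visitOrder(List<Leaf> leaves) {
    List<Leaf> sorted = new ArrayList<>(leaves); // leave the caller's list untouched
    sorted.sort(NEWEST_FIRST);
    List<String> names = new ArrayList<>();
    for (Leaf leaf : sorted) {
      names.add(leaf.name);
    }
    return names;
  }
}
```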
Tomoko Uchida
b174ef45c4
Add CHANGES entry for gradle build. (#43) 2021-03-26 09:50:38 +09:00
Tomoko Uchida
8c61c6b561
Point to jdk.java.net instead of the OracleJDK page. (#42) 2021-03-26 08:37:52 +09:00
Tomoko Uchida
ea74ffb984
LUCENE-9853: Use CJKWidthCharFilter as the default character width normalizer in JapaneseAnalyzer (#26) 2021-03-26 08:32:42 +09:00
zacharymorn
3ed87c867a
LUCENE-9864: Enforce @Override annotation everywhere (#40)
Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors, preventing dead code from being accidentally left behind.

Co-authored-by: Robert Muir <rmuir@apache.org>
2021-03-25 17:50:38 -04:00
Dawid Weiss
a38713907d LUCENE-9866: regenerate kuromoji dict in regenerate 2021-03-25 11:43:37 +01:00
Uwe Schindler
3214e365e3
LUCENE-9856: Static analysis take 3: Remove redundant interfaces (#38)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-03-24 18:26:12 +01:00
Dawid Weiss
c23ea2f537
LUCENE-9865: Reduce unnecessary bla-bla-bla in top-level readme file (#39) 2021-03-24 17:17:53 +01:00
Dawid Weiss
285ca64ae3 LUCENE-9862: cleanup of all regenerate tasks. Leaving interim commits for reference. 2021-03-24 16:21:43 +01:00
Dawid Weiss
108cd85375 Avoid creating a circular dependency between shared subtasks. 2021-03-24 16:01:36 +01:00
Dawid Weiss
4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss
bb5db1e16d Correct snowball download/unzip sequence to be always consistent. 2021-03-24 15:39:45 +01:00
Dawid Weiss
34f589b0aa Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error). 2021-03-24 15:39:45 +01:00