Commit Graph

35033 Commits

Author SHA1 Message Date
Mayya Sharipova 48715fe898
LUCENE-9507 Custom order for leaves in IndexReader and IndexWriter (#32)
1. Add an option to supply a custom leaf sorter for IndexWriter.
A DirectoryReader opened from this IndexWriter will have its leaf
readers sorted with the provided leaf sorter. This is useful for
indices on which it is expected to run many queries with particular
sort criteria (e.g. for time-based indices this is usually a
descending sort on timestamp). Providing leafSorter allows
to speed up early termination for this particular type of
sort queries.

2. Add an option to supply a custom sub-readers sorter for
BaseCompositeReader. In this case sub-readers will be sorted 
according to the the provided leafSorter.

3. Add an option to supply a custom leaf sorter for
StandardDirectoryReader. The leaf readers of this
StandardDirectoryReader will be sorted according to
the the provided leaf sorter.
2021-03-26 09:56:02 -04:00
Tomoko Uchida b174ef45c4
Add CHANGES entry for gradle build. (#43) 2021-03-26 09:50:38 +09:00
Tomoko Uchida 8c61c6b561
Point jdk.java.net instead of OracleJDK page. (#42) 2021-03-26 08:37:52 +09:00
Tomoko Uchida ea74ffb984
LUCENE-9853: Use CJKWidthCharFilter as the default character width normalizer in JapaneseAnalyzer (#26) 2021-03-26 08:32:42 +09:00
zacharymorn 3ed87c867a
LUCENE-9864: Enforce @Override annotation everywhere (#40)
Requiring the annotation is helpful because if an abstract method is removed, the concrete methods will then show up as compile errors: preventing dead code from being accidentally left behind.

Co-authored-by: Robert Muir <rmuir@apache.org>
2021-03-25 17:50:38 -04:00
Dawid Weiss a38713907d LUCENE-9866: regenerate kuromoji dict in regenerate 2021-03-25 11:43:37 +01:00
Uwe Schindler 3214e365e3
LUCENE-9856: Static analysis take 3: Remove redundant interfaces (#38)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-03-24 18:26:12 +01:00
Dawid Weiss c23ea2f537
LUCENE-9865: Reduce unnecessary bla-bla-bla in top-level readme file (#39) 2021-03-24 17:17:53 +01:00
Dawid Weiss 285ca64ae3 LUCENE-9862: cleanup of all regenerate tasks. Leaving interim commits for reference. 2021-03-24 16:21:43 +01:00
Dawid Weiss 108cd85375 Avoid creating a circular dependency between shared subtasks. 2021-03-24 16:01:36 +01:00
Dawid Weiss 4c2de7ef43 Correct soft task ordering between tidy and any other dependency of regenerate. 2021-03-24 15:39:45 +01:00
Dawid Weiss bb5db1e16d Correct snowball download/unzip sequence to be always consistent. 2021-03-24 15:39:45 +01:00
Dawid Weiss 34f589b0aa Correct run order between tidy and regenerate's deps. Make snowball not fail on Windows (just emit an error). 2021-03-24 15:39:45 +01:00
Dawid Weiss 27510d5f2f LUCENE-9862: cleanup of all regenerate tasks; moved common code into shared bit. Added failOnError for ant.patch. Included jflexStandardTokenizerImpl. 2021-03-24 15:39:45 +01:00
Robert Muir 945b1cb872
LUCENE-9856: fail precommit on unused local variables, take two (#37)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 13:59:00 -04:00
Michael McCandless 53fd63dbb2
replace 'static enum' with 'enum' (#36) 2021-03-23 13:23:39 -04:00
Robert Muir e6c4956cf6
Revert "LUCENE-9856: fail precommit on unused local variables (#34)"
This reverts commit 20dba278bb.
2021-03-23 12:46:36 -04:00
Robert Muir 20dba278bb
LUCENE-9856: fail precommit on unused local variables (#34)
Enable ecj unused local variable, private instance and method detection. Allow SuppressWarnings("unused") to disable unused checks (e.g. for generated code or very special tests). Fix gradlew regenerate for python 3.9 SuppressWarnings("unused") for generated javacc and jflex code. Enable a few other easy ecj checks such as Deprecated annotation, hashcode/equals, equals across different types.

Co-authored-by: Mike McCandless <mikemccand@apache.org>
2021-03-23 11:09:24 -04:00
Dawid Weiss 078d0079d1
LUCENE-9861: pull tuned vm options into a separate aspect. (#33) 2021-03-23 10:39:09 +01:00
András Salamon 2678d68be8
SOLR-14024 Invalid html generated by changes2html.pl (#31) 2021-03-22 17:35:32 -04:00
Dawid Weiss 246c4beb22
LUCENE-9854: Clean up utilities to download and extract test/ benchmark data sets. (#27) 2021-03-22 12:22:39 +01:00
Dawid Weiss a5996dbecd Follow-up to help/validateLogCalls.txt removal. 2021-03-19 15:14:42 +01:00
Dawid Weiss c0852d1e9c Follow-up to help/ant.txt removal. 2021-03-19 15:13:55 +01:00
Dawid Weiss 1679076bde Nuke more unused/ obsolete refs. 2021-03-19 13:11:37 +01:00
Dawid Weiss f1299bca9f Nuke the obsolete ant.txt help. 2021-03-19 13:09:42 +01:00
Dawid Weiss ee59e4e1ac Add a link for Eclipse's users. 2021-03-19 13:08:15 +01:00
Dawid Weiss bf807c2a32 Correct header structure for jdk13+ 2021-03-19 08:30:15 +01:00
Christoph Büscher 7ed72972b8
LUCENE-9007: MockSynonymFilter should add TypeAttribute (#23)
The MockSynonymFilter should add the type TypeAttribute to the synonyms it
generates in order to make it a better stand-in for the real filter in tests.
2021-03-18 22:00:09 -04:00
Peter Gromov 28edbf8fc6
LUCENE-9852: Make Hunspell thread-safe (#24) 2021-03-18 21:57:03 -04:00
Michael Sokolov 5b36af3cd7
LUCENE-9844: document disk layout of Lucene90VectorFormat 2021-03-18 09:39:23 -04:00
Dawid Weiss 53bea54669
LUCENE-9375: cleaning up post-split conditional build logic and solr refs. (#22) 2021-03-18 11:04:45 +01:00
Dawid Weiss ca3de30aff
Don't cross-link between modules for interim snapshot builds. (#21) 2021-03-18 10:18:07 +01:00
András Salamon 0e245171f1
SOLR-15002 Upgrade httpclient to 4.5.13 and httpcore to 4.4.13 (#14) 2021-03-17 22:25:42 -04:00
Bruno Roustant d6a554138d
LUCENE-9663: Move to 8.9.0 section in CHANGES.txt. 2021-03-17 15:39:24 +01:00
Uwe Schindler 8e1d5e3dbf LUCENE-9836: cherry pick changes entry from 8.x 2021-03-16 17:12:34 +01:00
Mike McCandless 8b68bc744f remove accidental extra K 2021-03-15 17:34:25 -04:00
Dawid Weiss f8040c0ecf LUCENE-9650: errorprone plugin doesn't work on jdk16. A different workaround that keeps the dependency. 2021-03-15 10:19:27 +01:00
Peter Gromov cdff0accaa
Hunspell suggestions: speed up for some non-Latin scripts (#19) 2021-03-15 05:02:45 -04:00
Peter Gromov 8913a98379
LUCENE-9830: Hunspell: store word length for faster dictionary lookup/enumeration (#3) 2021-03-15 00:35:25 -04:00
Peter Gromov 42c6f780bf
LUCENE-9831: Hunspell GeneratingSuggester: faster flag & case checks, less allocations (#4) 2021-03-15 00:32:08 -04:00
Robert Muir d48193e8cf
LUCENE-9837: try to improve performance of VectorUtil.dotProduct (#17)
More loop unrolling for VectorUtil.dotProduct to eek out a bit more short-term performance.
2021-03-14 23:16:08 -04:00
Robert Muir f3a284ad83
LUCENE-9796: Fix SortedDocValues to no longer extend BinaryDocValues
SortedDocValues do not have a per-document binary value, they have a
per-document numeric `ordValue()`. The ordinal can then be dereferenced
to its binary form with `lookupOrd()`, but it was a performance trap to
implement a `binaryValue()` on the SortedDocValues api that does this
behind-the-scenes on every document.

You can replace calls of `binaryValue()` with `lookupOrd(ordValue())`
as a "quick fix", but it is better to use the ordinal alone
(integer-based datastructures) for per-document access, and only call
lookupOrd() a few times at the end (e.g. for the hits you want to display).
Otherwise, if you really don't want per-document ordinals, but instead a
per-document `byte[]`, use a BinaryDocValues field.

This change only addresses the API (slow `binaryValue()` trap), but
doesn't yet fix any slow algorithms that were discovered in the process,
so it doesn't yield any performance improvements.
2021-03-14 23:07:48 -04:00
Tomoko Uchida 471f38c031 LUCENE-9834: goodbye old friend - the classic luke logo 2021-03-12 23:17:32 +09:00
Dawid Weiss 4f5389bfa8 Flush output on javadoc emitting a failure. 2021-03-12 11:39:40 +01:00
Tomoko Uchida 7478b3fc17 LUCENE-9834: Adjast logo/colors in the Luke About dialog 2021-03-12 11:00:10 +09:00
Dawid Weiss 8bbcc39583 Always include errorprone dependency, even if we're not checking. This ensures consistent use patterns across JVMs. 2021-03-11 22:27:25 +01:00
Peter Gromov e784721e69
LUCENE-9833: Hunspell: AssertionError in WordStorage.lookupWord (#13) 2021-03-11 10:10:57 -05:00
Peter Gromov efa88a1790
LUCENE-9832: Hunspell: SIOOBE in GeneratingSuggester.expandRoot (#12) 2021-03-11 10:09:11 -05:00
Robert Muir 1b36406ec4
LUCENE-9827: Speed up merging of stored fields and term vectors for small segments
Stored Fields and Term Vectors are block-compressed. Decompressing and
recompressing all the documents on every merge is too slow, so we try to
avoid doing it unless it will actually improve the compression ratio. If
we can get away with it, we just bulk-copy existing compressed blocks to
the new segment.

Previously, small segments would always be considered dirty and
recompressed... the special optimized bulk merge wouldn't kick in until
segments were relatively large. But as block size and ratio (shared
dictionaries etc) have increased, "relatively large" has become a much
bigger number.

So try to avoid doing wasted work: if there's only 1 dirty chunk
(incompletely filled compression block), then don't recompress: it will
likely only give us 1 dirty chunk as a result, at the expense of cpu.

Require at least 2 dirty chunks to recompress: this way the
recompression actually buys us something (reduces 2 to 1).

The change also means that bulk merge will now happen often in
the unit test suite, increasing coverage.
2021-03-11 09:30:40 -05:00
Mike McCandless 12999d30f2 LUCENE-9791: add CHANGES.txt entry 2021-03-11 08:09:23 -05:00