35511 Commits

Author SHA1 Message Date
Nhat Nguyen 92a53d3601 LUCENE-10126: Add CHANGES entry 2021-10-04 15:44:11 -04:00
Nhat Nguyen 45e8f639b0 LUCENE-10119: Add CHANGES entry 2021-10-04 15:43:17 -04:00
Nhat Nguyen c18e623b9a LUCENE-10106: Add CHANGES entry 2021-10-04 15:42:55 -04:00
Adrien Grand 18fc6c1f3e
LUCENE-10145: Speed up byte[] comparisons using VarHandles. (#349) 2021-10-04 18:35:27 +02:00
Chris Hegarty 04fb8c059e
LUCENE-10118: Test fix
We need to collect messages in a thread-safe list, as we're writing from multiple
threads.
2021-10-04 12:46:30 +01:00
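The fix above can be illustrated with a minimal sketch. It assumes the messages are plain strings appended by several writer threads; the class and method names are hypothetical and are not taken from the Lucene test, and `Collections.synchronizedList` is only one of several thread-safe choices.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ThreadSafeMessages {
  /** Starts {@code threads} writers that each append {@code perThread} messages. */
  static List<String> collect(int threads, int perThread) {
    // The synchronized wrapper makes each add() atomic. A bare ArrayList
    // shared between writers could lose entries or throw.
    List<String> messages = Collections.synchronizedList(new ArrayList<>());
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      final int id = t;
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          messages.add("thread " + id + " message " + i);
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join(); // join() also makes the workers' writes visible here
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for writers", e);
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    System.out.println(collect(4, 1000).size()); // prints 4000
  }
}
```

The order of messages from different threads is not deterministic, so a test built this way should assert on the set of messages rather than on their positions.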
Dawid Weiss 2e57a40546
LUCENE-10139: ExternalRefSorter returns a covariant with a subtype of BytesRefIterator that is Closeable. (#340) 2021-10-04 09:21:09 +02:00
Robert Muir b4fcdd9770
LUCENE-10142: use a better RNG for HNSW vectors
This code makes extensive use of Random, but uses the old legacy
java.util.Random, which is slow. Swap in SplittableRandom for better
performance.
2021-10-02 15:23:28 -04:00
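As a rough illustration of the swap described above, the sketch below fills a vector from a `java.util.SplittableRandom`. The class name, method name, and the choice of values drawn from the unit interval are assumptions made for the example; they are not the HNSW code itself.

```java
import java.util.SplittableRandom;

final class RandomVectors {
  /** Returns a vector of {@code dim} pseudo-random components. */
  static float[] randomVector(SplittableRandom random, int dim) {
    float[] vector = new float[dim];
    for (int i = 0; i < dim; i++) {
      // nextDouble() returns a value in [0, 1); it is narrowed to float here
      // because nextFloat() is not available on every JDK version.
      vector[i] = (float) random.nextDouble();
    }
    return vector;
  }

  public static void main(String[] args) {
    SplittableRandom random = new SplittableRandom(42L); // fixed seed for reproducibility
    float[] v = randomVector(random, 4);
    System.out.println(v.length);
  }
}
```

Unlike `java.util.Random`, a `SplittableRandom` instance is not thread-safe. When several threads need random numbers, each one should get its own generator via `split()` rather than share a single instance.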
Robert Muir 3dee08a09a
LUCENE-10130: small optimizations to SparseFixedBitSet set() codepath
Don't spend so many cycles updating ramBytesUsed when setting each bit.
Avoid recomputing some shifts that the caller already computes.
2021-10-02 08:30:54 -04:00
Robert Muir d395435fa8
LUCENE-10130: HnswGraph could make use of a SparseFixedBitSet.getAndSet 2021-10-01 23:16:20 -04:00
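A `getAndSet` operation on a bit set reads the old value of a bit and sets it in a single pass, which saves a second lookup when tracking visited nodes. The sketch below shows the idea on a plain dense `long[]`; this is an assumption for illustration only and is not how `SparseFixedBitSet` stores its bits.

```java
final class DenseBits {
  /** Sets bit {@code index} and returns whether it was already set. */
  static boolean getAndSet(long[] bits, int index) {
    int word = index >> 6;    // 64 bits per long
    long mask = 1L << index;  // Java shifts a long by the low 6 bits of the count
    boolean wasSet = (bits[word] & mask) != 0;
    bits[word] |= mask;
    return wasSet;
  }

  public static void main(String[] args) {
    long[] visited = new long[2];
    System.out.println(getAndSet(visited, 70)); // prints false
    System.out.println(getAndSet(visited, 70)); // prints true
  }
}
```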
Nhat Nguyen 5748743d91
LUCENE-10126: Re-introduce chunk scoring logic in tests (#331)
This commit re-introduces the chunk scoring logic in AssertingBulkScorer 
and enables it in TestSortOptimization.
2021-10-01 10:02:28 -04:00
goankur cb366d04d4
LUCENE-10134: Move initialization of liveDocs bits outside the constructor to avoid AssertionError (#345)
Co-authored-by: Ankur Goel <goankur@amazon.com>
2021-10-01 08:57:11 +02:00
Timothy Potter 4c97b9e3f2
LUCENE-10131: Add backcompat indices for 8.10 and add LUCENE_8_10_0 to Version (#343) 2021-09-30 14:58:23 -06:00
Dawid Weiss 4d0fabf53b LUCENE-9713: we don't need those symbol-escape checks. They're valid adoc and we don't produce PDFs. 2021-09-30 15:27:56 +02:00
Dawid Weiss 93c66e1400 LUCENE-9713: exclude .idea/ (sync with Solr's version). 2021-09-30 15:19:19 +02:00
Dawid Weiss 3aa0676194
LUCENE-9713: apply source validation to txt files outside of src/* folders. Fix offenders. (#339) 2021-09-30 15:13:42 +02:00
Dawid Weiss 1bb4554832
LUCENE-10135: Correct passage selector behavior for long matching snippets (#334) 2021-09-30 15:05:41 +02:00
Chris Hegarty 797cfbf477
LUCENE-10118: Improve CMS infostream messages (#337)
Expand the log message when CMS.MergeThread completes its merge operation, 
to include additional useful diagnostic information, like the total-bytes-written,
the time taken, as well as rate limiter information. Also, while here, unify the 
thread start and end log output to help improve tracing.
2021-09-30 11:43:45 +01:00
Alan Woodward ca810e732d
LUCENE-10138: Use maven central to resolve third-party gradle plugins (#336)
The gradle plugin portal uses jcenter to resolve third-party plugins, which
can be flaky. This commit instructs gradle to look first in maven central,
and only use the plugin portal for gradle's own plugins.
2021-09-30 11:41:05 +01:00
Chris Hegarty 3e568b911f
Support addition of diagnostics by custom merge policies (#329)
This commit adds a new `addDiagnostics` method to `SegmentInfo` that
allows custom merge policies to add new diagnostic information to the
segment's diagnostic map.
2021-09-30 09:50:22 +01:00
Dawid Weiss 0c13a52df5 Correct test error that allowed an empty array. 2021-09-30 09:17:29 +02:00
Dawid Weiss d2b88b7a0b LUCENE-10134: clean up the test from leaking threads and resources if an error occurs somewhere - this obscures the original cause of the problem. 2021-09-30 09:14:58 +02:00
Mayya Sharipova 88b264a368
LUCENE-10126 Add extra test on _doc sort (#326)
Add extra test on _doc sort to test
that search with after collects all documents
2021-09-29 14:49:16 -04:00
Adrien Grand 84e4050269
LUCENE-10125: Speed up DirectWriter. (#327)
There was a regression introduced in
https://github.com/apache/lucene/pull/107/files#diff-49b11ced76acedf749c5a5a0ff6e7fe93b8fb64caf8697e487a56f4f7adbb510
where we moved from write logic that was optimized for every number of bits per
value to more general logic that had to work for every number of bits per value.

This PR doesn't restore as much specialization, but some middle ground that
makes flushes and merges of doc values noticeably faster (though not much
faster).
2021-09-29 19:21:14 +02:00
Nhat Nguyen e56995d85e LUCENE-10126: Remove chunk scoring in AssertingBulkScorer
Many tests are failing due to the newly introduced chunk scoring in
AssertingBulkScorer. This commit reverts that change and will
reintroduce it later.
2021-09-28 22:21:17 -04:00
Nhat Nguyen cb153886eb LUCENE-10126: Fix AssertingBulkScorer
AssertingBulkScorer can generate a backward sub-range.
2021-09-28 19:50:40 -04:00
Timothy Potter a73848cfab DOAP changes for release 8.10.0 2021-09-28 13:28:45 -06:00
Nhat Nguyen 5ab900e10b
LUCENE-10126: Fix competitiveIterator wrongly skipping documents (#324)
The competitive iterator can wrongly skip a document that is advanced 
but not collected in the previous scoreRange.
2021-09-28 15:26:30 -04:00
Adrien Grand 9f80b4d8fb
LUCENE-10125: Speed up computation of exceptions. (#322)
Even though it was not the driver for the slowdown, in LUCENE-10125 we
identified that the move to PFOR had slowed down indexing significantly
for fields indexed with indexOptions=DOCS. This patch gets some of the
performance back by using the `LongHeap` that we introduced for vectors
instead of sorting the same array over and over again.

On the NYC Taxis benchmark, I observed ~8% faster merges of postings
with this change.
2021-09-28 17:25:56 +02:00
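The general technique named above, keeping the largest values in a bounded heap instead of re-sorting an array, can be sketched as follows. This is a generic top-k selection using `java.util.PriorityQueue` as a stand-in; it does not use Lucene's `LongHeap`, and the class and method names are hypothetical.

```java
import java.util.PriorityQueue;

final class TopK {
  /** Returns the {@code k} largest values in descending order. */
  static long[] largest(long[] values, int k) {
    if (k <= 0) {
      return new long[0];
    }
    // Min-heap of at most k elements: the root is the smallest value kept so far.
    PriorityQueue<Long> heap = new PriorityQueue<>(k);
    for (long value : values) {
      if (heap.size() < k) {
        heap.add(value);
      } else if (value > heap.peek()) {
        heap.poll();     // evict the smallest kept value
        heap.add(value);
      }
    }
    long[] result = new long[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // smallest goes last, so the array is descending
    }
    return result;
  }

  public static void main(String[] args) {
    long[] top = largest(new long[] {5, 1, 9, 3, 7}, 2);
    System.out.println(top[0] + " " + top[1]); // prints 9 7
  }
}
```

Each element costs at most O(log k) heap work, so one pass over n values is O(n log k), compared with O(n log n) for every full sort of the same array.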
Adrien Grand 8f3f2ea4ab
LUCENE-10127: Minor speedup to doc values writes. (#325)
This slightly reduces the overhead of writing doc values. On the NYC Taxis
benchmark this resulted in ~10% faster merges for doc values.
2021-09-28 17:23:09 +02:00
Robert Muir 6ac311068f
LUCENE-10128: avoid costly reflection in SparseFixedBitSet ctor
Seems that VectorFormat merge creates A LOT of these bitsets. We don't
need to do any fancy reflection here via shallowSizeOf(Object), when we
can call sizeOf(long[]) which is fast.

We may want to revisit this RAMUsageEstimator api in the future to
prevent traps like this.
2021-09-28 09:39:36 -04:00
Adrien Grand 7357bdc272
LUCENE-10123: Handling of singletons in DocValuesConsumer. (#320)
This avoids double wrapping of doc values in `Lucene90DocValuesConsumer`.
2021-09-28 08:54:46 +02:00
Greg Miller 1ebd193fbe
Move CHANGES entry for LUCENE-10070 under 8.11 after backport (#323) 2021-09-27 12:15:52 -07:00
Uwe Schindler 849d5fc1ac
LUCENE-10125: Optimize primitive writes in OutputStreamIndexOutput (#321) 2021-09-27 19:04:03 +02:00
Julie Tibshirani eaa421094d
LUCENE-10109: Bump default beam width for HNSW (#312)
Lucene90HnswVectorsFormat has a default 'beam width' of 16. This is quite low
and produces poor recall on typical-sized datasets.

This commit bumps it to 100. This new default tries to balance good search
performance with indexing speed. Most runs in ann-benchmarks set the parameter
between ~400 and 800, but they are heavily optimizing search over index speed.
2021-09-24 18:02:34 -07:00
Greg Miller eb44d1e6ad
Add slightly more language in the README Contributing section (#318) 2021-09-24 12:06:06 -07:00
Nhat Nguyen 7390d1af51
LUCENE-10119: Do not set single sort with search after (#317)
We should not set single sort when the search_after is non-null; 
otherwise, we will incorrectly skip documents whose values are equal to
the value from the search_after and docIDs are greater than the docID
from the search_after.
2021-09-23 13:10:17 -04:00
Uwe Schindler fc475360a8 Only pass "--illegal-access=deny" up to JDK-15, later versions deprecate the option and default to "deny" 2021-09-22 19:41:59 +02:00
Lu Xugang ed7fb8dea0
LUCENE-10116: Missing calculating the bytes used of DocsWithFieldSet and currentValues in SortedSetDocValuesWriter (#316) 2021-09-22 14:25:40 +02:00
Lu Xugang a7bddfaacc
LUCENE-10111: Missing calculating the bytes used of DocsWithFieldSet in NormValuesWriter (#307) 2021-09-22 07:44:30 +02:00
Chris Hegarty a7578709a6
LUCENE-10115: Add a fuzzy parsing extension point for custom query parsers
This commit adds the QueryParserBase::getFuzzyDistance protected method, which 
can be overridden by subclasses to provide customisation of how the similarity distance 
is determined. The default implementation retains the current behaviour.
2021-09-21 13:25:09 +01:00
Julie Tibshirani b2a04a4bb4 LUCENE-10069: Adjust TestKnnVectorQuery#testRandom to stop failures
The test fails randomly because HNSW can sometimes miss results when k is close
to the number of total docs. While we wait for a fix, this commit decreases k to
prevent failures.
2021-09-20 14:16:47 -07:00
Uwe Schindler 5871ea7972
LUCENE-10112: Improve LZ4 Compression performance with direct primitive read/writes (#310)
Co-authored-by: Tim Brooks <tim@timbrooks.org>
2021-09-20 19:12:38 +02:00
Christine Poerschke 57524c6a5e
LUCENE-9809: replace 'master' with 'main' in release wizard (#305) 2021-09-20 17:51:41 +01:00
Uwe Schindler c57d6e5f8c
LUCENE-10113: Use VarHandles to access int/long/short types in byte arrays (e.g. ByteArrayDataInput) (#308)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-09-20 15:37:33 +02:00
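For readers unfamiliar with the approach, the sketch below shows how a `VarHandle` view over a `byte[]` reads and writes an `int` in one call, next to the equivalent manual byte assembly. The class name and the little-endian byte order are assumptions for illustration; the commit title does not state them, and this is not the code from `ByteArrayDataInput`.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;

final class ByteArrayInts {
  // A view that treats any 4 consecutive bytes of a byte[] as one little-endian int.
  private static final VarHandle INT_LE =
      MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);

  /** Reads an int with a single VarHandle access; the offset is in bytes. */
  static int readIntVarHandle(byte[] bytes, int offset) {
    return (int) INT_LE.get(bytes, offset);
  }

  /** Reads the same int by assembling it one byte at a time. */
  static int readIntManual(byte[] bytes, int offset) {
    return (bytes[offset] & 0xFF)
        | (bytes[offset + 1] & 0xFF) << 8
        | (bytes[offset + 2] & 0xFF) << 16
        | (bytes[offset + 3] & 0xFF) << 24;
  }

  /** Writes an int with a single VarHandle access. */
  static void writeInt(byte[] bytes, int offset, int value) {
    INT_LE.set(bytes, offset, value);
  }

  public static void main(String[] args) {
    byte[] buffer = new byte[8];
    writeInt(buffer, 1, 0x12345678); // plain get/set work at unaligned offsets
    System.out.println(readIntVarHandle(buffer, 1) == readIntManual(buffer, 1)); // prints true
  }
}
```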
Adrien Grand 4bcd64c5ed LUCENE-9620: Fix test bug. 2021-09-20 09:49:13 +02:00
Uwe Schindler 075d801abe
LUCENE-10114: Remove unused byte order mark in Lucene90PostingsWriter (#309)
Co-authored-by: Robert Muir <rmuir@apache.org>
2021-09-20 08:37:05 +02:00
Jim Ferenczi ccf0d5404d
LUCENE-10110: MultiCollector should conditionally wrap single leaf collector (#303)
MultiCollector should wrap single leaf collector that wants to skip low-scoring hits
but the combined score mode doesn't allow it.
2021-09-20 07:26:51 +02:00
Tomoko Uchida 6c1e5920d8 LUCENE-10102: do not call incrementToken() against already consumed input stream. 2021-09-20 10:58:39 +09:00
Robert Muir 8b95e51d70
Add additional docs refs (nightly, build system help/) to README.md (#302) 2021-09-19 20:24:13 -04:00
Uwe Schindler f3c3b90e35
LUCENE-9047: fix typo in javadocs (still referred to big endian) 2021-09-19 13:51:51 +02:00