Commit Graph

35255 Commits

Author SHA1 Message Date
Adrien Grand 803d131fd0 LUCENE-9535: Try to do larger flushes.
DWPTPool currently always returns the last DWPT that was added to the
pool. By returning the largest DWPT instead, we could try to do larger
flushes by finishing DWPTs that are close to being full instead of the
last one that was added to the pool, which might be close to being
empty.

When indexing wikimediumall, this change did not seem to improve the
indexing rate significantly, but it didn't slow things down either and
the number of flushes went from 224-226 to 216, about 4% fewer.

My expectation is that our nightly benchmarks are a best-case scenario
for DWPTPool as the same number of threads is dedicated to indexing over
time, but when e.g. a single fixed thread pool is responsible for
indexing into several indices, the number of indexing threads that
contribute to a given index might vary greatly over time.
2021-06-16 10:26:45 +02:00
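
The selection change described in this commit can be illustrated with a minimal sketch. This is not the actual DWPTPool code: `BufferedWriterSketch`, `WriterPoolSketch`, and the `bytesUsed` field are hypothetical names made up for illustration, and the real pool also deals with locking and flush control, which are omitted here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a per-thread writer; only its buffered size matters here.
final class BufferedWriterSketch {
  final String name;
  final long bytesUsed;

  BufferedWriterSketch(String name, long bytesUsed) {
    this.name = name;
    this.bytesUsed = bytesUsed;
  }
}

// Hypothetical pool contrasting the two selection policies from the commit message.
final class WriterPoolSketch {
  private final List<BufferedWriterSketch> free = new ArrayList<>();

  void release(BufferedWriterSketch writer) {
    free.add(writer);
  }

  // Old policy: hand out whichever writer was added last, regardless of size.
  BufferedWriterSketch pollLastAdded() {
    return free.isEmpty() ? null : free.remove(free.size() - 1);
  }

  // New policy: hand out the writer with the most buffered bytes,
  // so writers that are close to being full get finished first.
  BufferedWriterSketch pollLargest() {
    if (free.isEmpty()) {
      return null;
    }
    int best = 0;
    for (int i = 1; i < free.size(); i++) {
      if (free.get(i).bytesUsed > free.get(best).bytesUsed) {
        best = i;
      }
    }
    return free.remove(best);
  }
}
```

With a nearly full writer released first and a nearly empty one released second, `pollLastAdded()` would return the nearly empty one, while `pollLargest()` returns the nearly full one.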
kkewwei b7b834b756
LUCENE-9998: delete useless param fis in StoredFieldsWriter.finish() and TermVectorsWriter.finish() (#183) 2021-06-15 16:59:42 +02:00
Nhat Nguyen 6f5a413ec6
LUCENE-9935: Clone term vectors reader for merges (#182)
The newly added assertion in the bulk-merge logic doesn't always hold 
because we do not create a new instance of
Lucene90CompressingTermVectorsReader for merges and that reader can be
accessed in tests (as long as it happens on the same thread).

This change clones a new term vectors reader for merges.
2021-06-15 07:10:30 -04:00
Nhat Nguyen 50607e0fb9 LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)
This change enables bulk-merge for term vectors with index sort. The
algorithm used here is similar to the one that is used to merge stored
fields.

Relates #134
2021-06-14 11:39:38 -04:00
Dawid Weiss 3bedc0871e
LUCENE-9977: rat task corrections (proper up-to-date checks, cleanup and rewrite of the task itself). (#178) 2021-06-11 09:26:34 +02:00
Nhat Nguyen 69ab1447a7 Revert "LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)"
This reverts commit 54fb21e862.
2021-06-10 11:54:11 -04:00
Nhat Nguyen 54fb21e862
LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)
This change enables bulk-merge for term vectors with index sort. The 
algorithm used here is similar to the one that is used to merge stored
fields.

Relates #134
2021-06-10 11:03:17 -04:00
Jack Conradson 40f66a450a
LUCENE-9965: Add tooling to introspect query execution time (#144)
This change adds new IndexSearcher and Collector implementations to profile
search execution and break down the timings. The breakdown includes the total
time spent in each of the following categories along with the number of times
visited: create weight, build scorer, next doc, advance, score, match.

Co-authored-by: Julie Tibshirani <julietibs@gmail.com>
2021-06-09 13:25:15 -07:00
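
As a rough illustration of the kind of breakdown this commit describes (total time plus visit count per category), here is a minimal sketch. It does not reproduce the IndexSearcher or Collector classes added by the commit; `TimingBreakdownSketch` and its methods are hypothetical, and only the category names are taken from the commit message.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical timing breakdown; the category names follow the commit message.
final class TimingBreakdownSketch {
  enum Category { CREATE_WEIGHT, BUILD_SCORER, NEXT_DOC, ADVANCE, SCORE, MATCH }

  private final Map<Category, Long> nanos = new EnumMap<>(Category.class);
  private final Map<Category, Long> visits = new EnumMap<>(Category.class);

  // Runs the given work and charges its elapsed time to one category.
  void time(Category category, Runnable work) {
    long start = System.nanoTime();
    try {
      work.run();
    } finally {
      // Recorded even if the work throws, so partial runs are still counted.
      nanos.merge(category, System.nanoTime() - start, Long::sum);
      visits.merge(category, 1L, Long::sum);
    }
  }

  // Total nanoseconds charged to the category so far (0 if never visited).
  long nanosFor(Category category) {
    return nanos.getOrDefault(category, 0L);
  }

  // Number of times the category was timed (0 if never visited).
  long visitsFor(Category category) {
    return visits.getOrDefault(category, 0L);
  }
}
```

A caller would wrap each operation of interest, for example `breakdown.time(Category.SCORE, () -> computeScore())`, where `computeScore` stands for whatever work is being measured.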
Adrien Grand f5e050bd00 LUCENE-9992: Update expectations about vectors with no values. 2021-06-09 18:59:14 +02:00
Michael Sokolov 465cb17d2b
LUCENE-9992: write empty vector fields when merging (#172) 2021-06-09 07:56:50 -04:00
Dawid Weiss 332405e7ad LUCENE-9995: JDK17 generates wbr tags which make javadocs checker angry. 2021-06-09 10:45:01 +02:00
zacharymorn 8bcaf87a83
LUCENE-9976: Fix WANDScorer assertion error (#171)
LUCENE-9976: Fix WANDScorer assertion error as (tailMaxScore >= minCompetitiveScore) && (tailSize < minShouldMatch) are valid now
2021-06-09 00:11:10 -07:00
Julie Tibshirani d22af75686 Fix random failures in TestPerFieldVectorFormat#testMergeUsesNewFormat 2021-06-08 14:26:52 -07:00
Julie Tibshirani 300589433f Move some 9.0 changelog items to 8.x
These were backported so should appear in the later sections. This commit also
fixes some small typos.
2021-06-08 09:11:28 -07:00
Julie Tibshirani e9339253f5
LUCENE-9905: Make sure to use configured vector format when merging (#176)
Before when creating a VectorWriter for merging, we would always load the
default implementation. So if the format was configured with parameters, they
were ignored.

This issue was caught by `TestKnnGraph#testMergeProducesSameGraph`.
2021-06-08 08:07:35 -07:00
Christine Poerschke 1ec2a715a2 Fix 8.9.0 < 8.10.0 comparison in smokeTestRelease.py script. (#2509) 2021-06-08 15:54:57 +01:00
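
The commit message does not say what was wrong with the comparison. One common way a check like 8.9.0 < 8.10.0 goes wrong is comparing version strings lexicographically, and whether that was the cause here is an assumption. The sketch below (in Java rather than the script's Python, with a hypothetical `VersionCompareSketch` helper) shows a component-wise numeric comparison that orders the two versions correctly.

```java
// Hypothetical helper: compares dotted versions component by component as integers.
final class VersionCompareSketch {
  static int compare(String a, String b) {
    String[] left = a.split("\\.");
    String[] right = b.split("\\.");
    int n = Math.max(left.length, right.length);
    for (int i = 0; i < n; i++) {
      // Missing components count as 0, so "8.9" equals "8.9.0".
      int l = i < left.length ? Integer.parseInt(left[i]) : 0;
      int r = i < right.length ? Integer.parseInt(right[i]) : 0;
      if (l != r) {
        return Integer.compare(l, r);
      }
    }
    return 0;
  }
}
```

By contrast, a plain string comparison such as `"8.9.0".compareTo("8.10.0")` yields a positive result, because the character '9' sorts after '1'.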
Julie Tibshirani 84499732c1 Mute TestKnnGraph#testMergeProducesSameGraph while we prepare a fix 2021-06-07 16:50:46 -07:00
Julie Tibshirani 05ae738fc9
LUCENE-9905: Move HNSW build parameters to codec (#166)
Previously, the max connections and beam width parameters could be configured as
field type attributes. This PR moves them to be parameters on
Lucene90HnswVectorFormat, to avoid exposing details of the vector format
implementation in the API.
2021-06-07 12:51:59 -07:00
Alan Woodward dbb4c265d5
LUCENE-8143: Remove no-op SpanBoostQuery (#155)
Boosts are ignored on inner span queries, and top-level boosts can
be applied by using a normal BoostQuery, so SpanBoostQuery
itself is redundant and trappy. This commit removes it entirely.
2021-06-07 15:56:16 +01:00
Greg Miller 428d2d99d7
Fix typo in CHANGES.txt (#169) 2021-06-05 07:14:01 -07:00
Greg Miller 4404b19142
LUCENE-9991: Address bug in TestStringValueFacetCounts (#168) 2021-06-04 14:40:07 -07:00
Jan Høydahl d47b75395c
LUCENE-9985 Upgrade Jetty to 9.4.41 (#165) 2021-06-04 09:41:35 +02:00
Greg Miller 7a7003c51c
LUCENE-9988: Fix DrillSideways bug discovered in randomized testing (#167) 2021-06-03 15:03:09 -07:00
Chris Hostetter efb7b2a5e8 LUCENE-9970: Add TooManyNestedClauses extends TooManyClauses so that IndexSearcher.rewrite can distinguish how maxClauseCount is exceeded
This is an extension of the work done in LUCENE-8811 which added the two types of checks
2021-06-03 12:46:53 -07:00
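
The subclassing approach in this commit can be sketched as follows. The classes below are hypothetical stand-ins with a `Sketch` suffix, not the Lucene types themselves, and the `classify` helper and its return strings are made up for illustration.

```java
// Hypothetical stand-ins for the exception types named in the commit message.
class TooManyClausesSketch extends RuntimeException {
  TooManyClausesSketch(String message) {
    super(message);
  }
}

class TooManyNestedClausesSketch extends TooManyClausesSketch {
  TooManyNestedClausesSketch(String message) {
    super(message);
  }
}

final class ClauseLimitSketch {
  // Catching the subclass first tells the two failure modes apart;
  // callers that only catch the parent type keep working unchanged.
  static String classify(Runnable rewrite) {
    try {
      rewrite.run();
      return "ok";
    } catch (TooManyNestedClausesSketch e) {
      return "nested-limit";
    } catch (TooManyClausesSketch e) {
      return "other-limit";
    }
  }
}
```

Because the new type extends the old one, existing `catch` blocks for the parent type still see both cases, which is presumably why a subclass was chosen over an unrelated exception.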
Naoto MINAMI 89034ad8cf
LUCENE-9823: Prevent unsafe rewrites for SynonymQuery and CombinedFieldQuery. (#160)
Before, rewriting could slightly change the scoring when weights were
specified. We now rewrite less aggressively to avoid changing the query's
behavior.
2021-06-02 17:28:51 -07:00
Julie Tibshirani eecd1971fa
LUCENE-9905: Allow Lucene90Codec to be configured with a per-field vector format (#164)
Previously only AssertingCodec could handle a per-field vector format. This PR
also strengthens the checks in TestPerFieldVectorFormat.
2021-06-02 08:43:54 -07:00
Greg Miller 8b60641bca
LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach). (#142) 2021-06-02 06:27:48 -07:00
Greg Miller 3c7a76a148
LUCENE-9962: Allow DrillSideways sub-classes to provide their own "drill down" facet counting implementation (or null). (#143) 2021-06-01 12:25:34 -07:00
Mike McCandless c4cf7aa3e1 LUCENE-9981: more efficient getCommonSuffix/Prefix, and more accurate 'effort limit', instead of precise output state limit, during determinize, for throwing TooComplexToDeterminizeException 2021-06-01 13:58:47 -04:00
Gautam Worah 27b009c5d0
LUCENE-9956: Make getBaseQuery, getDrillDownQueries API from DrillDownQuery public (#138)
Co-authored-by: Gautam Worah <gauworah@amazon.com>
2021-06-01 09:54:18 -07:00
Nhat Nguyen c46bcf75cc
LUCENE-9980: Do not expose deleted commits (#158)
If we fail to delete files that belong to a commit point, then we will 
expose that deleted commit in the next calls of IndexDeletionPolicy#onCommit.
I think we should never expose those deleted commit points as 
some of their files might have been deleted already.
2021-05-31 11:03:48 -04:00
Greg Miller d76dd6454e
Add CHANGES.txt entry for LUCENE-9971 (#161) 2021-05-31 06:24:56 -07:00
Alexander Lukyanchikov 65842c5c4d
LUCENE-9971: Inconsistent SSDVFF and Taxonomy facet behavior in case of unseen dimension (#149) 2021-05-31 05:58:30 -07:00
Greg Miller d669ddebc5
LUCENE-9946: Support multi-value fields in range facet counting (#127) 2021-05-30 19:46:11 -07:00
Jan Høydahl 5fdff6eabb
LUCENE-9589 Swedish Minimal Stemmer (#136) 2021-05-28 14:20:11 +02:00
Dawid Weiss 0a316b2495
LUCENE-9975: don't require signing of 'unsignedJars' publication (maven artifacts published to the user's maven local repository, build folder and apache nexus). (#156) 2021-05-28 11:51:28 +02:00
Tomoko Uchida 2160d7239d Revert "LUCENE-9448: clean up unused start scripts for luke."
This reverts commit 16104090fb.
2021-05-27 19:22:29 +09:00
Alan Woodward 1e7d8146ff
LUCENE-9454: Remove version field on Analyzer (#154)
Version switching on Analyzer behaviour should be implemented
in the various component factories, rather than on a mutable
setting on Analyzer itself.
2021-05-26 17:34:01 +01:00
Tomoko Uchida 16104090fb LUCENE-9448: clean up unused start scripts for luke. 2021-05-26 23:32:52 +09:00
Alan Woodward 4464cd87cc
LUCENE-9204: Move SpanQuery and subclasses to the queries module (#152) 2021-05-26 10:12:14 +01:00
Dawid Weiss 5912e65434
LUCENE-9974: The test-framework module should apply the test ruleset for forbidden APIs. (#153) 2021-05-26 10:19:55 +02:00
Alan Woodward 93844d3846
LUCENE-9204: Move helper methods from TestMatchesIterator into a base class (#151)
TestMatchesIterator lives in core/tests and does various sanity checks
on the matches returned by various queries, including Span queries.
The Span-specific tests cannot stay here once Spans have been moved
out of core. This commit pulls various helper methods from this class
into a base class in the test framework, so that we can move the
Spans tests into their own class and keep coverage once things have
been migrated.
2021-05-25 14:16:05 +01:00
Alan Woodward 4b55ae5de4
LUCENE-9204: Remove Spans references from DisiWrapper (#150)
We have a number of helper classes in o.a.l.search that aid the
implementation of two-phase iteration over disjunctions. These have
some Spans-specific code, which will stop compiling once Spans
are moved into the queries module. This commit removes the
Spans references from the main code and duplicates the helper
code within the Spans package.
2021-05-25 14:14:47 +01:00
Alan Woodward 5e0e7a5479
LUCENE-9204: Make ConjunctionDISI package-private and add ConjunctionUtils factory class (#148)
ConjunctionDISI is really an internal implementation of DocIdSetIterator,
and would ideally be package-private. However, it is used in a few
other places:
* directly in ConjunctionSpans
* as a utility in the facet and join modules

This commit adds a public helper class ConjunctionUtils that allows easy
intersection of iterators for use by other modules. This means that
ConjunctionDISI itself can become package-private. It also removes
a reference to Spans from core classes, which will make it easier to
migrate Spans to the queries module. ConjunctionSpans and
ConjunctionIntervalIterator now use the public Utils class, and intervals
no longer need their own ConjunctionDISI implementation.
2021-05-25 12:07:20 +01:00
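
To illustrate what "intersection of iterators" means here, the sketch below intersects several ascending doc ID lists by advancing a cursor in each list up to the lead list's current candidate. It is a simplified model over plain `int[]` arrays, not the ConjunctionUtils API; `IntersectionSketch` and `intersect` are hypothetical names, and each input is assumed to be strictly ascending.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: intersects strictly ascending doc ID lists.
final class IntersectionSketch {
  static List<Integer> intersect(int[]... lists) {
    List<Integer> result = new ArrayList<>();
    if (lists.length == 0) {
      return result;
    }
    int[] pos = new int[lists.length]; // one cursor per list
    for (int candidate : lists[0]) {
      boolean inAll = true;
      for (int i = 1; i < lists.length && inAll; i++) {
        // Advance list i to its first value >= candidate.
        while (pos[i] < lists[i].length && lists[i][pos[i]] < candidate) {
          pos[i]++;
        }
        inAll = pos[i] < lists[i].length && lists[i][pos[i]] == candidate;
      }
      if (inAll) {
        result.add(candidate);
      }
    }
    return result;
  }
}
```

Cursors only move forward, so each list is scanned at most once regardless of how many candidates the lead list proposes.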
Mike McCandless 654e978190 LUCENE-9967: don't throw NullPointerException while handling a different root-cause exception in ReplicaNode.start 2021-05-24 10:51:26 -04:00
Dawid Weiss f7fbb9eda5 Add a small clarification about the required Java version for gradle. 2021-05-24 09:59:54 +02:00
Nhat Nguyen a12260eb95
LUCENE-9827: Update backward codec in Lucene 9.0 (#147)
We need to update the reading logic of the backward codec in Lucene 9 
for LUCENE-9827 and LUCENE-9935 as we have backported them to Lucene 8.

Relates apache/lucene-solr#2495
Relates apache/lucene-solr#2494
2021-05-20 08:49:43 -04:00
Houston Putman f919672647 LUCENE-9936: Add Gpg Signing help info to gradle help command 2021-05-19 10:43:31 -05:00
Greg Miller 693b6d3e34
move changes entry for backport to 8.9 (#145)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-19 07:04:23 -04:00
Greg Miller 65820e5170
LUCENE-9953: Make FacetResult#value accurate for LongValueFacetCounts multi-value doc cases (#131)
Co-authored-by: Greg Miller <gmiller@amazon.com>
2021-05-18 12:37:53 -04:00