Commit Graph

35290 Commits

Author SHA1 Message Date
Michael Wechner 489ba3e4f9 LUCENE-10024: Catch NoSuchFileException when opening index directory 2021-07-17 13:43:12 +09:00
Julie Tibshirani 982b95e38e Add missing changelog entry for LUCENE-10026 2021-07-16 21:09:48 -07:00
Jim Ferenczi f333b70dbf
LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields (#185)
This change fixes a bug in `MultiNormsLeafSimScorer` that assumes that each
field should have a norm for every term/document.

As part of the fix, it adds validation that the fields have consistent norms
settings.
2021-07-16 18:40:28 -07:00
Julie Tibshirani 30beb70ffa Small fix to CombinedFieldQuery#hashCode 2021-07-16 13:21:52 -07:00
Julie Tibshirani b9a70c28b6
LUCENE-10026: Fix CombinedFieldQuery equals and hashCode (#212)
The previous equals and hashCode methods only compared query terms. This meant
that queries on different fields, or with different field weights, were
considered the same.

During boolean query rewrites, duplicate clauses are removed. So because equals/
hashCode was incorrect, rewrites could accidentally drop CombinedFieldQuery
clauses.
2021-07-16 09:59:33 -07:00
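The effect described in the LUCENE-10026 entry above can be illustrated with a minimal sketch. `SketchQuery` and `distinctClauses` are hypothetical names used only for illustration; this is not Lucene's `CombinedFieldQuery` code. It shows that once `equals` and `hashCode` cover fields and weights as well as terms, a set-based deduplication step keeps queries that differ only in their fields.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class QueryEqualitySketch {
  /** Hypothetical stand-in for a multi-field query; not Lucene's CombinedFieldQuery. */
  public static final class SketchQuery {
    private final Map<String, Float> fieldWeights;
    private final List<String> terms;

    public SketchQuery(Map<String, Float> fieldWeights, List<String> terms) {
      this.fieldWeights = Map.copyOf(fieldWeights);
      this.terms = List.copyOf(terms);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof SketchQuery)) return false;
      SketchQuery other = (SketchQuery) o;
      // Compare fields and weights as well as terms.
      return fieldWeights.equals(other.fieldWeights) && terms.equals(other.terms);
    }

    @Override
    public int hashCode() {
      // Must be built from the same components that equals compares.
      return Objects.hash(fieldWeights, terms);
    }
  }

  /** Mimics duplicate-clause removal: clauses that are equal collapse into one. */
  public static int distinctClauses(List<SketchQuery> clauses) {
    return new HashSet<>(clauses).size();
  }

  public static void main(String[] args) {
    SketchQuery onTitle = new SketchQuery(Map.of("title", 2.0f), List.of("lucene"));
    SketchQuery onBody = new SketchQuery(Map.of("body", 1.0f), List.of("lucene"));
    System.out.println(distinctClauses(List.of(onTitle, onBody)));
  }
}
```

If `equals` compared only `terms`, the two clauses in `main` would collapse into one, which is the kind of accidental drop the commit message describes.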
Robert Muir e65941f9c5
Fix broken ICU license link to point to the new ICU github.
The previous svn-based link no longer works. Instead point at the
license file in github: it is for icu4c, but see the repo: user is
explicitly directed at this license file for both icu4c and icu4j.

Good case to have a correct link, as the ICU license is complicated. It
even has "if (version > X)" conditionals in the legalese!!!
2021-07-13 23:11:55 -04:00
Robert Muir 5cf142f972
LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode (#211)
Re-enable the randomized testing here, but with a separate test for each
mode rather than all in one method. It gives better testing and also easier-to-debug
testing.
2021-07-13 23:11:18 -04:00
Robert Muir c21b0adb14
reorder items in CHANGES.txt to better match branch_8x !
There are other abnormalities, but the order of entries is an easy thing
to fix.
2021-07-13 21:56:56 -04:00
Michael Gibney c3482c99ff
LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence of normalization-inert characters (#199)
Normalization-inert characters need not be required as boundaries
for incremental processing. It is sufficient to check `hasBoundaryAfter`
and `hasBoundaryBefore`, substantially improving worst-case performance.
2021-07-13 21:26:40 -04:00
Patrick Zhai caa822ff38
LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap to ...HashMap (#209) 2021-07-13 14:38:46 +02:00
zacharymorn 180cfa241b
LUCENE-9959: Add non thread local based API for term vector reader usage (#180) 2021-07-12 23:34:52 -07:00
Uwe Schindler 15034f6c90 LUCENE-10019: Add extra checks as suggested by Adrien 2021-07-12 22:33:10 +02:00
Julie Tibshirani 08e61a2201 LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery
It's possible to create a DisjunctionMaxQuery with no clauses. This is now
rewritten to MatchNoDocsQuery, matching the approach we take for BooleanQuery.
2021-07-09 11:38:41 -07:00
Uwe Schindler 0607ebed6b add changes 2021-07-09 18:54:20 +02:00
Uwe Schindler 69e85924b7
LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes) (#203) 2021-07-09 18:50:01 +02:00
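As a rough illustration of the 8-byte alignment mentioned in the LUCENE-10019 entry above, the sketch below rounds a write position up to the next multiple of a power-of-two alignment. `alignUp` is a hypothetical helper written for this example, not the method used in the commit, and the file lengths in `main` are made up.

```java
public class AlignmentSketch {
  /** Rounds offset up to the next multiple of alignment, which must be a power of two. */
  public static long alignUp(long offset, int alignment) {
    if (offset < 0 || alignment <= 0 || Integer.bitCount(alignment) != 1) {
      throw new IllegalArgumentException("offset must be >= 0 and alignment a power of two");
    }
    // Add (alignment - 1), then clear the low bits.
    return (offset + alignment - 1) & -(long) alignment;
  }

  public static void main(String[] args) {
    long position = 0;
    for (long length : new long[] {13, 8, 1}) {
      // A writer would pad with filler bytes from `position` up to `start`.
      long start = alignUp(position, 8);
      System.out.println("file of " + length + " bytes starts at offset " + start);
      position = start + length;
    }
  }
}
```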
Mayya Sharipova 64d9f8c587
LUCENE-10020 DocComparator don't skip docs of same docID (#204)
DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449
2021-07-06 14:59:57 -04:00
Christine Poerschke 167bd99c23
LUCENE-8638: remove deprecated Legacy*DocValues* classes in org.apache.lucene.codecs.memory package (#198) 2021-07-02 17:59:01 +01:00
Christine Poerschke 960e229df5
remove no-longer-accurate sentence in TopTermsScoringBooleanQueryRewrite javadocs (#197) 2021-07-02 17:42:17 +01:00
Gautam Worah a0d995d0c4
LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields (#191) 2021-07-01 13:34:00 -07:00
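One way to read the LUCENE-9964 entry above is that a value repeated within a single document should contribute one count for that document. The sketch below assumes each document's values arrive in ascending order, so a duplicate is always adjacent to its predecessor. `countDocsPerValue` and the array-based input are hypothetical and do not use Lucene's doc-values API.

```java
import java.util.Map;
import java.util.TreeMap;

public class DistinctValueCountSketch {
  /** docs[i] holds the values of document i in ascending order; duplicates are allowed. */
  public static Map<Long, Integer> countDocsPerValue(long[][] docs) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long[] values : docs) {
      for (int i = 0; i < values.length; i++) {
        // Values are sorted, so a duplicate always equals the value just before it.
        if (i > 0 && values[i] == values[i - 1]) {
          continue;
        }
        counts.merge(values[i], 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[][] docs = {{5, 5, 7}, {5}};
    System.out.println(countDocsPerValue(docs));
  }
}
```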
Geoffrey Lawson 834041f286
LUCENE-9963 Add tests for alternate path failures in FlattenGraphFilter (#146)
Co-authored-by: Lawson <geoffrl@amazon.com>
2021-06-28 12:04:04 -04:00
Mike McCandless 3d833fdb66 LUCENE-10009: fix longstanding cosmetic bug in IndexWriter's infoStream logging, falsely claiming term frequencies were not enabled when positions were not indexed (thank you @yangsongbai) 2021-06-25 10:56:18 -04:00
Greg Miller 578f5cf51b
Fix concurrency bug in DrillSidewaysQuery (#195) 2021-06-24 12:18:38 -07:00
Greg Miller 9942d59f0d
Move CHANGES entries to 8.10 for LUCENE-9962/9946/9944/9988 (#194) 2021-06-24 06:41:06 -07:00
balmukundblr f1d54f7c35
Parallel processing (#132)
* Added an explicit Flush Task to flush data at Thread level once it completes the processing

* Included explicit flush per Thread level

* Done changes for parallel processing

* Removed extra brace

* Removed unused variable

* Removed unused variable initialization

* Did the required formatting

* Refactored the code and added required comments & checks
2021-06-24 09:17:19 -04:00
Mike McCandless db26215f15 LUCENE-9902: move CHANGES entry to 8.10.0 2021-06-23 16:32:43 -04:00
Patrick Zhai 48ff29c8f3
LUCENE-9983: Stop sorting determinize powersets unnecessarily (#163)
2021-06-23 13:07:22 -04:00
Adrien Grand 1d5d458960
LUCENE-9613: Encode ordinals like numerics. (#186)
This helps simplify the code, and also adds some optimizations to ordinals like
better compression for long runs of equal values or fields that are used in
index sorts.
2021-06-23 15:37:50 +02:00
Michael Gibney 495bf6730f
For stability of DisjunctionIntervalsSource.toString(), sort subSources (#193)
Iterators over subSources of DisjunctionIntervalsSource may
return elements in indeterminate order, requiring special handling
to make toString() output stable across equivalent instances
2021-06-23 07:53:30 -04:00
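The idea in the entry above, sorting before printing so that equivalent instances produce the same string, can be sketched as follows. `describe` and the `or(...)` output format are assumptions made for this example; they are not taken from `DisjunctionIntervalsSource`.

```java
import java.util.Collection;
import java.util.stream.Collectors;

public class StableToStringSketch {
  /** Builds a string that does not depend on the iteration order of subSources. */
  public static String describe(Collection<?> subSources) {
    return subSources.stream()
        .map(Object::toString)
        .sorted() // fix the order regardless of how the collection iterates
        .collect(Collectors.joining(",", "or(", ")"));
  }

  public static void main(String[] args) {
    System.out.println(describe(java.util.Set.of("b", "a", "c")));
  }
}
```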
Mike McCandless 636d10be64 LUCENE-9981: move CHANGES.txt entry to the confusingly no-longer-a-proper-floating-point-number 8.10.0 section 2021-06-22 07:25:10 -04:00
Karl Wright e6ed1fb075 LUCENE-10012: Improve concurrency with path distance caching. 2021-06-22 05:18:30 -04:00
Mayya Sharipova a40d5a4258 Add back-compat indices for 8.9.0 2021-06-21 14:35:49 -04:00
Mayya Sharipova 8cca0290ff Add next minor version 8.9 2021-06-21 14:23:25 -04:00
Mayya Sharipova d821178bc0
Sync CHANGES for 8.9.0 (#189)
Move changes that were released in 8.9.0 from 9.0.0 to 8.9.0.
2021-06-18 17:25:13 -04:00
Adrien Grand 1365156fcd
LUCENE-9996: Reduce RAM usage of DWPT for a single document. (#184)
With this change, doc-value terms dictionaries use a shared `ByteBlockPool`
across all fields, and points, binary doc values and doc-value ordinals use
slightly smaller page sizes.
2021-06-18 09:17:50 +02:00
Mayya Sharipova 065026b74e DOAP changes for release 8.9.0 2021-06-17 16:01:03 -04:00
Adrien Grand 803d131fd0 LUCENE-9535: Try to do larger flushes.
DWPTPool currently always returns the last DWPT that was added to the
pool. By returning the largest DWPT instead, we could try to do larger
flushes by finishing DWPTs that are close to being full instead of the
last one that was added to the pool, which might be close to being
empty.

When indexing wikimediumall, this change did not seem to improve the
indexing rate significantly, but it didn't slow things down either and
the number of flushes went from 224-226 to 216, about 4% fewer.

My expectation is that our nightly benchmarks are a best-case scenario
for DWPTPool as the same number of threads is dedicated to indexing over
time, but in the case when you have e.g. a single fixed threadpool that
is responsible for indexing into several indices, the number of indexing
threads that contribute to a given index might greatly vary over time.
2021-06-16 10:26:45 +02:00
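The selection policy described in the LUCENE-9535 entry above, handing out the largest pooled writer instead of the most recently added one, can be sketched like this. `LargestFirstPoolSketch`, `Writer`, and `bytesUsed` are hypothetical names for illustration; the real DWPTPool also deals with locking and other concerns that are left out here.

```java
import java.util.ArrayList;
import java.util.List;

public class LargestFirstPoolSketch {
  /** Hypothetical stand-in for a per-thread writer with some buffered bytes. */
  public static final class Writer {
    public final String name;
    public final long bytesUsed;

    public Writer(String name, long bytesUsed) {
      this.name = name;
      this.bytesUsed = bytesUsed;
    }
  }

  private final List<Writer> free = new ArrayList<>();

  /** Returns a writer to the pool. */
  public void release(Writer writer) {
    free.add(writer);
  }

  /** Removes and returns the free writer with the most buffered bytes, or null if none. */
  public Writer acquireLargest() {
    int best = -1;
    for (int i = 0; i < free.size(); i++) {
      if (best < 0 || free.get(i).bytesUsed > free.get(best).bytesUsed) {
        best = i;
      }
    }
    return best < 0 ? null : free.remove(best);
  }

  public static void main(String[] args) {
    LargestFirstPoolSketch pool = new LargestFirstPoolSketch();
    pool.release(new Writer("a", 10));
    pool.release(new Writer("b", 90));
    pool.release(new Writer("c", 5));
    // A last-added policy would pick "c"; this policy picks the fullest writer.
    System.out.println(pool.acquireLargest().name);
  }
}
```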
kkewwei b7b834b756
LUCENE-9998: delete useless param fis in StoredFieldsWriter.finish() and TermVectorsWriter.finish() (#183) 2021-06-15 16:59:42 +02:00
Nhat Nguyen 6f5a413ec6
LUCENE-9935: Clone term vectors reader for merges (#182)
The newly added assertion in the bulk-merge logic doesn't always hold 
because we do not create a new instance of
Lucene90CompressingTermVectorsReader for merges and that reader can be
accessed in tests (as long as it happens on the same thread).

This change clones a new term vectors reader for merges.
2021-06-15 07:10:30 -04:00
Nhat Nguyen 50607e0fb9 LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)
This change enables bulk-merge for term vectors with index sort. The
algorithm used here is similar to the one that is used to merge stored
fields.

Relates #134
2021-06-14 11:39:38 -04:00
Dawid Weiss 3bedc0871e
LUCENE-9977: rat task corrections (proper up-to-date checks, cleanup and rewrite of the task itself). (#178) 2021-06-11 09:26:34 +02:00
Nhat Nguyen 69ab1447a7 Revert "LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)"
This reverts commit 54fb21e862.
2021-06-10 11:54:11 -04:00
Nhat Nguyen 54fb21e862
LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140)
This change enables bulk-merge for term vectors with index sort. The 
algorithm used here is similar to the one that is used to merge stored
fields.

Relates #134
2021-06-10 11:03:17 -04:00
Jack Conradson 40f66a450a
LUCENE-9965: Add tooling to introspect query execution time (#144)
This change adds new IndexSearcher and Collector implementations to profile
search execution and break down the timings. The breakdown includes the total
time spent in each of the following categories along with the number of times
visited: create weight, build scorer, next doc, advance, score, match.

Co-authored-by: Julie Tibshirani <julietibs@gmail.com>
2021-06-09 13:25:15 -07:00
Adrien Grand f5e050bd00 LUCENE-9992: Update expectations about vectors with no values. 2021-06-09 18:59:14 +02:00
Michael Sokolov 465cb17d2b
LUCENE-9992: write empty vector fields when merging (#172) 2021-06-09 07:56:50 -04:00
Dawid Weiss 332405e7ad LUCENE-9995: JDK17 generates wbr tags which make javadocs checker angry. 2021-06-09 10:45:01 +02:00
zacharymorn 8bcaf87a83
LUCENE-9976: Fix WANDScorer assertion error (#171)
LUCENE-9976: Fix WANDScorer assertion error as (tailMaxScore >= minCompetitiveScore) && (tailSize < minShouldMatch) are valid now
2021-06-09 00:11:10 -07:00
Julie Tibshirani d22af75686 Fix random failures in TestPerFieldVectorFormat#testMergeUsesNewFormat 2021-06-08 14:26:52 -07:00
Julie Tibshirani 300589433f Move some 9.0 changelog items to 8.x
These were backported so should appear in the later sections. This commit also
fixes some small typos.
2021-06-08 09:11:28 -07:00
Julie Tibshirani e9339253f5
LUCENE-9905: Make sure to use configured vector format when merging (#176)
Previously, when creating a VectorWriter for merging, we would always load the
default implementation. So if the format was configured with parameters, they
were ignored.

This issue was caught by `TestKnnGraph#testMergeProducesSameGraph`.
2021-06-08 08:07:35 -07:00