35221 Commits

Author SHA1 Message Date
Geoffrey Lawson 647255b4d2
LUCENE-9963 Improve FlattenGraphFilter's robustness when handling incoming token graphs with holes (#157)
6 main improvements:
    1) Iterate through all output.InputNodes since dest gaps can exist.
    2) freeBefore the minimum input node instead of the first input node (which was usually, but not always, the minimum).
    3) Don't freeBefore from a hole source node. Bookkeeping may not be correct and could result in an early free.
    4) When adding an output node after hole recovery, calculate its new position increment instead of adding it to the end of the output graph.
    5) Nodes after holes that have edges to their source will do the output re-mapping that the deleted node would have done.
    6) If a disconnected input node swaps order with another node in the output, then map them to the same output node.

Co-authored-by: Lawson <geoffrl@amazon.com>
2021-08-09 16:06:53 -04:00
Greg Miller a11457b4e6
LUCENE-10047: Fix value de-duping check in LongValueFacetCounts and RangeFacetCounts (#237) 2021-08-07 10:20:49 -07:00
Greg Miller e937e739f3
LUCENE-10046: Fix counting bug in StringValueFacetCounts (#236) 2021-08-07 07:32:50 -07:00
Greg Miller 3037e33025
Slight improvement/optimization to duplicate facet value checking (ref: LUCENE-9964) (#234) 2021-08-06 12:57:09 -07:00
Greg Miller 645b64ef4e Update CHANGES entry for LUCENE-9945 after backporting 2021-08-02 16:38:10 -07:00
Sejal Pawar a76f2f8072
LUCENE-9945: Extend DrillSidewaysResult to expose drillDowns and drillSideways (#159) 2021-08-02 16:01:08 -07:00
Greg Miller 7450a7e64b Update CHANGES entry for LUCENE-10030 after backporting 2021-08-01 12:39:11 -07:00
Dawid Weiss b016c8dc2a LUCENE-10042: JAR minimal manifest JDK entries are incorrectly set to build-JVM 2021-08-01 14:14:42 +02:00
Mayya Sharipova 597398439c LUCENE-10027 Changes for Dir Open with leafSorter
Adjust the changes to the Directory Open API from commit with
leafSorter in accordance with v. 8.10.

Relates to PR #214
2021-07-30 13:42:29 -04:00
Mayya Sharipova 1daf7e7c74
LUCENE-10027 provide leaf sorter from commit (#214)
Provide leaf sorter for directory readers opened from IndexCommit

LUCENE-9507 made it possible to provide a leaf sorter for directory readers.
One API that was missed is the one for providing a leaf sorter
for directory readers opened from an index commit.
This patch addresses this by adding an extra parameter, a custom
comparator for sorting leaf readers, to the directory reader open API
that takes an indexCommit and minSupportedMajorVersion.

Relates to PR #32
2021-07-30 09:15:21 -04:00
Gautam Worah 56eb76dbaf Simplify some code 2021-07-29 13:12:27 -04:00
Gautam Worah bd3174de10 PR fixes 1. Change negation to 2. Move statement inside if condition 2021-07-29 13:12:27 -04:00
Gautam Worah cec19125fa Fix minor logic 2021-07-29 13:12:27 -04:00
Gautam Worah be0a3e5721 Move the version check to a final variable that is initialized in the
constructor
2021-07-29 13:12:27 -04:00
Gautam Worah 162131ecf8 Use BDV or a StoredField based on the Lucene version that has created
the last index commit

If the Lucene version was < 9, then use a StringField; otherwise,
if the index is fresh or if the index was built using a
version >= 9, use a BDV field.
2021-07-29 13:12:27 -04:00
Gautam Worah 7cb696041c Category documents added in the Lucene 9.0 taxonomy index use a
BDV field with a different name

Using BDV fields with a different "$full_path_binary$" name
ensures that the earlier "$full_path$" StringField does not have the same name as the
BDV field and hence they don't violate the field type consistency check
(LUCENE-9334).

This commit also enables the back-compat check that was disabled
earlier.
2021-07-29 13:12:27 -04:00
Nhat Nguyen ba417b593f
LUCENE-10032: Remove leafDocMaps from MergeState (#222)
These maps are no longer useful after LUCENE-8505.
2021-07-29 08:28:39 -04:00
Adrien Grand 0e6c3146d7
LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields. (#221)
When sorting by low-cardinality fields, the same sub remains current for long
sequences of doc IDs. This speeds up SortedDocIdMerger a bit by extracting
the sub that leads iteration.
2021-07-29 08:46:10 +02:00
Shintaro Murakami 03b1db91f9
LUCENE-9304: Remove assertion in DocumentsWriterFlushControl (#228)
This assertion becomes obvious after LUCENE-9304.
2021-07-28 10:05:00 -04:00
Julie Tibshirani e8663b30b8
LUCENE-10039: Fix single-field scoring for CombinedFieldQuery (#229)
When there's only one field, CombinedFieldQuery will ignore its weight while
scoring. This makes the scoring inconsistent, since the field weight is supposed
to multiply its term frequency.

This PR removes the optimizations around single-field scoring to make sure the
weight is always taken into account. These optimizations are not critical since
it should be uncommon to use CombinedFieldQuery with only one field.
2021-07-28 15:43:56 +03:00
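
The scoring rule described in LUCENE-10039 (the field weight multiplies the field's term frequency) can be illustrated with a minimal sketch. This is not the CombinedFieldQuery implementation; the class and method names below are hypothetical, and the point is only that a single field goes through the same weighted path as several fields.

```java
// Hypothetical sketch, not Lucene code: combine per-field term frequencies
// by weight, with no special case for a single field.
final class WeightedFrequencySketch {

  /** Returns the sum over all fields of weight[i] * termFreq[i]. */
  static float combinedFrequency(float[] weights, int[] termFreqs) {
    if (weights.length != termFreqs.length) {
      throw new IllegalArgumentException("weights and termFreqs must have the same length");
    }
    float sum = 0f;
    for (int i = 0; i < weights.length; i++) {
      // The weight always multiplies the term frequency, even when
      // weights.length == 1.
      sum += weights[i] * termFreqs[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    // One field with weight 2 and term frequency 3.
    System.out.println(combinedFrequency(new float[] {2f}, new int[] {3}));
    // Two fields: 2 * 3 + 1 * 4.
    System.out.println(combinedFrequency(new float[] {2f, 1f}, new int[] {3, 4}));
  }
}
```

Dropping the single-field shortcut costs a multiplication per document, which matches the commit's remark that the optimization was not critical.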
Greg Miller e44636c280
LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur (#226) 2021-07-27 07:53:36 -07:00
Greg Miller 736d114901 Add CHANGES entry for LUCENE-10030 2021-07-26 13:11:32 -07:00
Grigoriy Troitskiy 61f8517000
LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring (#217) 2021-07-26 13:04:51 -07:00
Michael Sokolov 0ec93b632c LUCENE-10016: fix test case to use the same similarity in both cases 2021-07-24 15:22:39 -04:00
Tomoko Uchida df807dbe8f
LUCENE-9855: Rename knn search vector format (#218) 2021-07-24 12:03:15 +09:00
Greg Miller ad7746d6e3
LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector (#196)
MultiCollectorManager is now consistent with MultiCollector with respect to
early termination, min score setting and score caching.
2021-07-22 19:02:15 -07:00
Adrien Grand 28ba8b7797
LUCENE-10015: Remove VectorSimilarityFunction#NONE. (#219) 2021-07-21 10:06:27 +02:00
Adrien Grand acf45d8a31
LUCENE-10016: Remove VectorValues#getSimilarityFunction. (#213)
VectorValues is only about iterating over vectors in doc ID order, so it feels
wrong to tie it to the similarity function.
2021-07-19 09:48:09 +02:00
Michael Sokolov 9b5e233960
LUCENE-10016: remove fanout parameter from nearest neighbor vector search (#210) 2021-07-17 11:12:15 -04:00
Tomoko Uchida 2bd6924f07 add changes entry of LUCENE-10024 2021-07-17 14:41:43 +09:00
Tomoko Uchida 40038bcc92 LUCENE-10024: remove non-existing path from history file 2021-07-17 14:30:27 +09:00
Michael Wechner 489ba3e4f9 LUCENE-10024: Catch NoSuchFileException when opening index directory 2021-07-17 13:43:12 +09:00
Julie Tibshirani 982b95e38e Add missing changelog entry for LUCENE-10026 2021-07-16 21:09:48 -07:00
Jim Ferenczi f333b70dbf
LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields (#185)
This change fixes a bug in `MultiNormsLeafSimScorer` that assumes that each
field should have a norm for every term/document.

As part of the fix, it adds validation that the fields have consistent norms
settings.
2021-07-16 18:40:28 -07:00
Julie Tibshirani 30beb70ffa Small fix to CombinedFieldQuery#hashCode 2021-07-16 13:21:52 -07:00
Julie Tibshirani b9a70c28b6
LUCENE-10026: Fix CombinedFieldQuery equals and hashCode (#212)
The previous equals and hashCode methods only compared query terms. This meant
that queries on different fields, or with different field weights, were
considered the same.

During boolean query rewrites, duplicate clauses are removed. So because equals/
hashCode was incorrect, rewrites could accidentally drop CombinedFieldQuery
clauses.
2021-07-16 09:59:33 -07:00
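
The failure mode described in LUCENE-10026 can be sketched as follows. The class below is a hypothetical stand-in, not CombinedFieldQuery itself: it shows an equals/hashCode pair that covers fields and weights as well as terms, so that set-based de-duplication (similar in spirit to what a boolean rewrite does) keeps clauses that differ only by field.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Lucene code: a query-like value object whose
// identity includes fields, field weights and terms.
final class WeightedFieldsQuerySketch {
  private final Map<String, Float> fieldWeights;
  private final Set<String> terms;

  WeightedFieldsQuerySketch(Map<String, Float> fieldWeights, Collection<String> terms) {
    // Defensive, sorted copies so equality does not depend on insertion order.
    this.fieldWeights = new TreeMap<>(fieldWeights);
    this.terms = new TreeSet<>(terms);
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof WeightedFieldsQuerySketch)) return false;
    WeightedFieldsQuerySketch other = (WeightedFieldsQuerySketch) o;
    // Comparing terms alone would make queries on different fields collide.
    return fieldWeights.equals(other.fieldWeights) && terms.equals(other.terms);
  }

  @Override
  public int hashCode() {
    return Objects.hash(fieldWeights, terms);
  }

  public static void main(String[] args) {
    Set<WeightedFieldsQuerySketch> clauses = new HashSet<>();
    clauses.add(new WeightedFieldsQuerySketch(Map.of("title", 2f), List.of("lucene")));
    clauses.add(new WeightedFieldsQuerySketch(Map.of("body", 1f), List.of("lucene")));
    // Both clauses survive de-duplication because their fields differ.
    System.out.println(clauses.size());
  }
}
```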
Robert Muir e65941f9c5
Fix broken ICU license link to point to the new ICU github.
The previous svn-based link no longer works. Instead point at the
license file in github: it is for icu4c, but see the repo: user is
explicitly directed at this license file for both icu4c and icu4j.

Good case to have a correct link, as the ICU license is complicated. It
even has "if (version > X)" conditionals in the legalese!!!
2021-07-13 23:11:55 -04:00
Robert Muir 5cf142f972
LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode (#211)
Re-enable the randomized testing here, but with a separate test for each
mode rather than all in one method. It gives better testing and also easier-to-debug
testing.
2021-07-13 23:11:18 -04:00
Robert Muir c21b0adb14
reorder items in CHANGES.txt to better match branch_8x !
There are other abnormalities, but the order of entries is an easy thing
to fix.
2021-07-13 21:56:56 -04:00
Michael Gibney c3482c99ff
LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence of normalization-inert characters (#199)
Normalization-inert characters need not be required as boundaries
for incremental processing. It is sufficient to check `hasBoundaryAfter`
and `hasBoundaryBefore`, substantially improving worst-case performance.
2021-07-13 21:26:40 -04:00
Patrick Zhai caa822ff38
LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap to ...HashMap (#209) 2021-07-13 14:38:46 +02:00
zacharymorn 180cfa241b
LUCENE-9959: Add non thread local based API for term vector reader usage (#180) 2021-07-12 23:34:52 -07:00
Uwe Schindler 15034f6c90 LUCENE-10019: Add extra checks as suggested by Adrien 2021-07-12 22:33:10 +02:00
Julie Tibshirani 08e61a2201 LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery
It's possible to create a DisjunctionMaxQuery with no clauses. This is now
rewritten to MatchNoDocsQuery, matching the approach we take for BooleanQuery.
2021-07-09 11:38:41 -07:00
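
The rewrite rule from LUCENE-10022 can be pictured with a small sketch. All types here (Query, MatchNone, Term, DisMax) are hypothetical placeholders rather than Lucene classes; the sketch only shows the shape of the rule: a disjunction with no clauses is replaced by a query that matches nothing.

```java
import java.util.List;

// Hypothetical sketch, not Lucene code: rewrite an empty disjunction
// into a match-nothing query.
final class EmptyDisjunctionRewriteSketch {
  interface Query {}

  /** Placeholder for a query that matches no documents. */
  static final class MatchNone implements Query {}

  /** Placeholder for a single-term query. */
  static final class Term implements Query {
    final String text;

    Term(String text) {
      this.text = text;
    }
  }

  /** Placeholder for a disjunction over sub-queries. */
  static final class DisMax implements Query {
    final List<Query> clauses;

    DisMax(List<Query> clauses) {
      this.clauses = clauses;
    }
  }

  /** Returns a MatchNone for an empty DisMax, otherwise the query unchanged. */
  static Query rewrite(Query query) {
    if (query instanceof DisMax && ((DisMax) query).clauses.isEmpty()) {
      return new MatchNone();
    }
    return query;
  }

  public static void main(String[] args) {
    Query rewritten = rewrite(new DisMax(List.of()));
    System.out.println(rewritten instanceof MatchNone);
  }
}
```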
Uwe Schindler 0607ebed6b add changes 2021-07-09 18:54:20 +02:00
Uwe Schindler 69e85924b7
LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes) (#203) 2021-07-09 18:50:01 +02:00
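
The 8-byte alignment mentioned in LUCENE-10019 comes down to rounding a file offset up to the next multiple of 8. The sketch below shows that arithmetic only; the class and method names are hypothetical, and how Lucene actually writes the padding into a compound file is not covered here.

```java
// Hypothetical sketch, not Lucene code: round an offset up to a
// power-of-two alignment boundary.
final class AlignmentSketch {

  /** Returns the smallest multiple of alignmentBytes that is >= offset. */
  static long alignOffset(long offset, int alignmentBytes) {
    if (offset < 0) {
      throw new IllegalArgumentException("offset must be non-negative");
    }
    if (alignmentBytes <= 0 || Integer.bitCount(alignmentBytes) != 1) {
      throw new IllegalArgumentException("alignment must be a positive power of two");
    }
    // Add (alignment - 1), then clear the low bits.
    return (offset + alignmentBytes - 1L) & -(long) alignmentBytes;
  }

  public static void main(String[] args) {
    long position = 13;
    long aligned = alignOffset(position, Long.BYTES);
    // 13 rounds up to 16, so 3 padding bytes would be needed.
    System.out.println(aligned + " (padding: " + (aligned - position) + ")");
  }
}
```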
Mayya Sharipova 64d9f8c587
LUCENE-10020 DocComparator don't skip docs of same docID (#204)
DocComparator should not skip docs with the same docID on multiple
sorts with search after.

Because of the optimization introduced in LUCENE-9449, currently when
searching with sort on [_doc, other fields] with search after,
DocComparator can efficiently skip all docs before and including
the provided [search after docID]. This is a desirable behaviour
in a single index search. But in a distributed search, where multiple
indices have docs with the same docID, and when searching on
[_doc, other fields], the sort optimization should NOT skip
documents with the same docIDs.

This PR fixes this.

Relates to LUCENE-9449
2021-07-06 14:59:57 -04:00
Christine Poerschke 167bd99c23
LUCENE-8638: remove deprecated Legacy*DocValues* classes in org.apache.lucene.codecs.memory package (#198) 2021-07-02 17:59:01 +01:00
Christine Poerschke 960e229df5
remove no-longer-accurate sentence in TopTermsScoringBooleanQueryRewrite javadocs (#197) 2021-07-02 17:42:17 +01:00
Gautam Worah a0d995d0c4
LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields (#191) 2021-07-01 13:34:00 -07:00
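
The counting rule in LUCENE-9964 (a value that appears several times in one document's field contributes one count for that document) can be sketched as below. This is an illustration under assumptions, not the facet counting code: the class name is hypothetical, and each document's values are modelled as an ascending long[] so that duplicates are adjacent.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Lucene code: count each distinct value at most
// once per document, given per-document values in ascending order.
final class DistinctValueCountSketch {

  static Map<Long, Integer> countOncePerDoc(long[][] sortedValuesPerDoc) {
    Map<Long, Integer> counts = new TreeMap<>();
    for (long[] values : sortedValuesPerDoc) {
      for (int i = 0; i < values.length; i++) {
        // Values are sorted, so a duplicate always equals its predecessor.
        if (i > 0 && values[i] == values[i - 1]) {
          continue;
        }
        counts.merge(values[i], 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    long[][] docs = {
      {1, 1, 2}, // value 1 is duplicated within this document
      {2, 2, 2}, // value 2 is triplicated within this document
      {}         // a document without values
    };
    // Value 1 is counted for one document, value 2 for two documents.
    System.out.println(countOncePerDoc(docs));
  }
}
```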