lucene

Author	SHA1	Message	Date
Mayya Sharipova	597398439c	LUCENE-10027: Changes for Dir Open with leafSorter. Adjust changes to the Directory Open API from commit with leafSorter in line with v8.10. Relates to PR #214	2021-07-30 13:42:29 -04:00
Mayya Sharipova	1daf7e7c74	LUCENE-10027: Provide leaf sorter from commit (#214). Provide a leaf sorter for directory readers opened from an IndexCommit. LUCENE-9507 made it possible to provide a leaf sorter for directory readers, but one API was missed: directory readers opened from an index commit. This patch addresses this by adding an extra parameter, a custom comparator for sorting leaf readers, to the DirectoryReader open API that takes an indexCommit and minSupportedMajorVersion. Relates to PR #32	2021-07-30 09:15:21 -04:00
Gautam Worah	56eb76dbaf	Simplify some code	2021-07-29 13:12:27 -04:00
Gautam Worah	bd3174de10	PR fixes 1. Change negation to 2. Move statement inside if condition	2021-07-29 13:12:27 -04:00
Gautam Worah	cec19125fa	Fix minor logic	2021-07-29 13:12:27 -04:00
Gautam Worah	be0a3e5721	Move the version check to a final variable that is initialized in the constructor	2021-07-29 13:12:27 -04:00
Gautam Worah	162131ecf8	Use BDV or a StoredField based on the Lucene version that created the last index commit. If the Lucene version was < 9, use a StringField; if the index is fresh or was built using a version >= 9, use a BDV field.	2021-07-29 13:12:27 -04:00
Gautam Worah	7cb696041c	Category documents added in the Lucene 9.0 taxonomy index use a BDV field with a different name. Using a BDV field with the distinct name "$full_path_binary$" ensures that it does not share a name with the earlier "$full_path$" StringField, so the two fields do not violate the field type consistency check (LUCENE-9334). This commit also enables the back-compat check that was disabled earlier.	2021-07-29 13:12:27 -04:00
Nhat Nguyen	ba417b593f	LUCENE-10032: Remove leafDocMaps from MergeState (#222). These maps are no longer useful after LUCENE-8505.	2021-07-29 08:28:39 -04:00
Adrien Grand	0e6c3146d7	LUCENE-10031: Speed up SortedDocIdMerger on low-cardinality sort fields (#221). When sorting by low-cardinality fields, the same sub remains current for long sequences of doc IDs. This speeds up SortedDocIdMerger a bit by extracting the sub that leads iteration.	2021-07-29 08:46:10 +02:00
Shintaro Murakami	03b1db91f9	LUCENE-9304: Remove assertion in DocumentsWriterFlushControl (#228). This assertion became redundant after LUCENE-9304.	2021-07-28 10:05:00 -04:00
Julie Tibshirani	e8663b30b8	LUCENE-10039: Fix single-field scoring for CombinedFieldQuery (#229). When there is only one field, CombinedFieldQuery ignores that field's weight while scoring. This makes scoring inconsistent, since the field weight is supposed to multiply the field's term frequency. This PR removes the single-field scoring optimizations so that the weight is always taken into account. These optimizations are not critical, since using CombinedFieldQuery with only one field should be uncommon.	2021-07-28 15:43:56 +03:00
Greg Miller	e44636c280	LUCENE-10036: Add factory method to ScoreCachingWrappingScorer that ensures unnecessary wrapping doesn't occur (#226)	2021-07-27 07:53:36 -07:00
Greg Miller	736d114901	Add CHANGES entry for LUCENE-10030	2021-07-26 13:11:32 -07:00
Grigoriy Troitskiy	61f8517000	LUCENE-10030: Lazily evaluate score in DrillSidewaysScorer.doQueryFirstScoring (#217)	2021-07-26 13:04:51 -07:00
Michael Sokolov	0ec93b632c	LUCENE-10016: fix test case to use the same similarity in both cases	2021-07-24 15:22:39 -04:00
Tomoko Uchida	df807dbe8f	LUCENE-9855: Rename knn search vector format (#218)	2021-07-24 12:03:15 +09:00
Greg Miller	ad7746d6e3	LUCENE-10000: Make MultiCollectorManager consistent with MultiCollector (#196). MultiCollectorManager is now consistent with MultiCollector with respect to early termination, min score setting and score caching.	2021-07-22 19:02:15 -07:00
Adrien Grand	28ba8b7797	LUCENE-10015: Remove VectorSimilarityFunction#NONE (#219)	2021-07-21 10:06:27 +02:00
Adrien Grand	acf45d8a31	LUCENE-10016: Remove VectorValues#getSimilarityFunction (#213). VectorValues only iterates over vectors in doc ID order, so tying it to the similarity function feels wrong.	2021-07-19 09:48:09 +02:00
Michael Sokolov	9b5e233960	LUCENE-10016: Remove fanout parameter from nearest neighbor vector search (#210)	2021-07-17 11:12:15 -04:00
Tomoko Uchida	2bd6924f07	Add CHANGES entry for LUCENE-10024	2021-07-17 14:41:43 +09:00
Tomoko Uchida	40038bcc92	LUCENE-10024: remove non-existing path from history file	2021-07-17 14:30:27 +09:00
Michael Wechner	489ba3e4f9	LUCENE-10024: Catch NoSuchFileException when opening index directory	2021-07-17 13:43:12 +09:00
Julie Tibshirani	982b95e38e	Add missing changelog entry for LUCENE-10026	2021-07-16 21:09:48 -07:00
Jim Ferenczi	f333b70dbf	LUCENE-9999: CombinedFieldQuery can fail with an exception when a document is missing fields (#185). This change fixes a bug in `MultiNormsLeafSimScorer`, which assumed that each field has a norm for every term/document. As part of the fix, it adds validation that the fields have consistent norms settings.	2021-07-16 18:40:28 -07:00
Julie Tibshirani	30beb70ffa	Small fix to CombinedFieldQuery#hashCode	2021-07-16 13:21:52 -07:00
Julie Tibshirani	b9a70c28b6	LUCENE-10026: Fix CombinedFieldQuery equals and hashCode (#212). The previous equals and hashCode methods only compared query terms, so queries on different fields, or with different field weights, were considered equal. Boolean query rewrites remove duplicate clauses, so because equals/hashCode was incorrect, rewrites could accidentally drop CombinedFieldQuery clauses.	2021-07-16 09:59:33 -07:00
Robert Muir	e65941f9c5	Fix broken ICU license link to point to the new ICU GitHub repository. The previous svn-based link no longer works. Instead, point at the license file on GitHub: it is for icu4c, but the repo explicitly directs users to this license file for both icu4c and icu4j. A correct link matters here, as the ICU license is complicated; it even has "if (version > X)" conditionals in the legalese!	2021-07-13 23:11:55 -04:00
Robert Muir	5cf142f972	LUCENE-5595: Re-enable TestICUNormalizer2CharFilter random test, splitting by mode (#211). Re-enable the randomized testing here, but with a separate test for each mode rather than all in one method. This gives better coverage and is easier to debug.	2021-07-13 23:11:18 -04:00
Robert Muir	c21b0adb14	Reorder items in CHANGES.txt to better match branch_8x! There are other inconsistencies, but the order of entries is easy to fix.	2021-07-13 21:56:56 -04:00
Michael Gibney	c3482c99ff	LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on the presence of normalization-inert characters (#199). Normalization-inert characters need not be required as boundaries for incremental processing; it is sufficient to check `hasBoundaryAfter` and `hasBoundaryBefore`, which substantially improves worst-case performance.	2021-07-13 21:26:40 -04:00
Patrick Zhai	caa822ff38	LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap with ...HashMap (#209)	2021-07-13 14:38:46 +02:00
zacharymorn	180cfa241b	LUCENE-9959: Add non-thread-local API for term vector reader usage (#180)	2021-07-12 23:34:52 -07:00
Uwe Schindler	15034f6c90	LUCENE-10019: Add extra checks as suggested by Adrien	2021-07-12 22:33:10 +02:00
Julie Tibshirani	08e61a2201	LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery. It's possible to create a DisjunctionMaxQuery with no clauses. Such a query is now rewritten to MatchNoDocsQuery, matching the approach we take for BooleanQuery.	2021-07-09 11:38:41 -07:00
Uwe Schindler	0607ebed6b	add changes	2021-07-09 18:54:20 +02:00
Uwe Schindler	69e85924b7	LUCENE-10019: Align file starts in CFS files to 8-byte boundaries (#203)	2021-07-09 18:50:01 +02:00
Mayya Sharipova	64d9f8c587	LUCENE-10020: DocComparator: don't skip docs with the same docID (#204). DocComparator should not skip docs with the same docID on multi-field sorts with search after. Because of the optimization introduced in LUCENE-9449, when searching with a sort on [_doc, other fields] with search after, DocComparator currently skips all docs before and including the provided search-after docID. This is desirable behaviour in a single-index search. But in a distributed search, where multiple indices have docs with the same docID, the sort optimization should NOT skip documents with the same docID when sorting on [_doc, other fields]. This PR fixes this. Relates to LUCENE-9449	2021-07-06 14:59:57 -04:00
Christine Poerschke	167bd99c23	LUCENE-8638: Remove deprecated LegacyDocValues classes in the org.apache.lucene.codecs.memory package (#198)	2021-07-02 17:59:01 +01:00
Christine Poerschke	960e229df5	Remove no-longer-accurate sentence in TopTermsScoringBooleanQueryRewrite javadocs (#197)	2021-07-02 17:42:17 +01:00
Gautam Worah	a0d995d0c4	LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields (#191)	2021-07-01 13:34:00 -07:00
Geoffrey Lawson	834041f286	LUCENE-9963: Add tests for alternate path failures in FlattenGraphFilter (#146). Co-authored-by: Lawson <geoffrl@amazon.com>	2021-06-28 12:04:04 -04:00
Mike McCandless	3d833fdb66	LUCENE-10009: Fix longstanding cosmetic bug in IndexWriter's infoStream logging, which falsely claimed term frequencies were not enabled when positions were not indexed (thank you @yangsongbai)	2021-06-25 10:56:18 -04:00
Greg Miller	578f5cf51b	Fix concurrency bug in DrillSidewaysQuery (#195)	2021-06-24 12:18:38 -07:00
Greg Miller	9942d59f0d	Move CHANGES entries to 8.10 for LUCENE-9962/9946/9944/9988 (#194)	2021-06-24 06:41:06 -07:00
balmukundblr	f1d54f7c35	Parallel processing (#132). Added an explicit Flush Task to flush data at the thread level once a thread completes processing; included an explicit flush per thread; made changes for parallel processing; removed an extra brace; removed an unused variable; removed an unused variable initialization; applied the required formatting; refactored the code and added the required comments and checks.	2021-06-24 09:17:19 -04:00
Mike McCandless	db26215f15	LUCENE-9902: move CHANGES entry to 8.10.0	2021-06-23 16:32:43 -04:00
Patrick Zhai	48ff29c8f3	LUCENE-9983: Stop sorting determinize powersets unnecessarily (#163)	2021-06-23 13:07:22 -04:00
Adrien Grand	1d5d458960	LUCENE-9613: Encode ordinals like numerics (#186). This simplifies the code and also gives ordinals some of the numeric optimizations, such as better compression for long runs of equal values or for fields used in index sorts.	2021-06-23 15:37:50 +02:00

35213 Commits