lucene

Commit Graph

Author	SHA1	Message	Date
Tomoko Uchida	40038bcc92	LUCENE-10024: remove non-existing path from history file	2021-07-17 14:30:27 +09:00
Michael Wechner	489ba3e4f9	LUCENE-10024: Catch NoSuchFileException when opening index directory	2021-07-17 13:43:12 +09:00
Julie Tibshirani	982b95e38e	Add missing changelog entry for LUCENE-10026	2021-07-16 21:09:48 -07:00
Jim Ferenczi	f333b70dbf	LUCENE-9999: CombinedFieldQuery can fail with an exception when document is missing fields (#185 ) This change fixes a bug in `MultiNormsLeafSimScorer` that assumes that each field should have a norm for every term/document. As part of the fix, it adds validation that the fields have consistent norms settings.	2021-07-16 18:40:28 -07:00
Julie Tibshirani	30beb70ffa	Small fix to CombinedFieldQuery#hashCode	2021-07-16 13:21:52 -07:00
Julie Tibshirani	b9a70c28b6	LUCENE-10026: Fix CombinedFieldQuery equals and hashCode (#212 ) The previous equals and hashCode methods only compared query terms. This meant that queries on different fields, or with different field weights, were considered the same During boolean query rewrites, duplicate clauses are removed. So because equals/ hashCode was incorrect, rewrites could accidentally drop CombinedFieldQuery clauses.	2021-07-16 09:59:33 -07:00
Robert Muir	e65941f9c5	Fix broken ICU license link to point to the new ICU github. The previous svn-based link no longer works. Instead point at the license file in github: it is for icu4c, but see the repo: user is explicitly directed at this license file for both icu4c and icu4j. Good case to have a correct link, as the ICU license is complicated. It even has "if (version > X)" conditionals in the legalese!!!	2021-07-13 23:11:55 -04:00
Robert Muir	5cf142f972	LUCENE-5595: re-enable TestICUNormalizer2CharFilter random test, splitting by mode (#211 ) Re-enable the randomized testing here, but with a separate test for each mode rather than all in one method. It gives better testing and also easier-to-debug testing.	2021-07-13 23:11:18 -04:00
Robert Muir	c21b0adb14	reorder items in CHANGES.txt to better match branch_8x ! There are other abnormalities, but the order of entries is an easy thing to fix.	2021-07-13 21:56:56 -04:00
Michael Gibney	c3482c99ff	LUCENE-9177: ICUNormalizer2CharFilter streaming no longer depends on presence of normalization-inert characters (#199 ) Normalization-inert characters need not be required as boundaries for incremental processing. It is sufficient to check `hasBoundaryAfter` and `hasBoundaryBefore`, substantially improving worst-case performance.	2021-07-13 21:26:40 -04:00
Patrick Zhai	caa822ff38	LUCENE-10021: Upgrade HPPC to 0.9.0. Replace usage of ...ScatterMap to ...HashMap (#209 )	2021-07-13 14:38:46 +02:00
zacharymorn	180cfa241b	LUCENE-9959: Add non thread local based API for term vector reader usage (#180 )	2021-07-12 23:34:52 -07:00
Uwe Schindler	15034f6c90	LUCENE-10019: Add extra checks as suggested by Adrien	2021-07-12 22:33:10 +02:00
Julie Tibshirani	08e61a2201	LUCENE-10022: Rewrite empty DisjunctionMaxQuery to MatchNoDocsQuery It's possible to create a DisjunctionMaxQuery with no clauses. This is now rewritten to MatchNoDocsQuery, matching the approach we take for BooleanQuery.	2021-07-09 11:38:41 -07:00
Uwe Schindler	0607ebed6b	add changes	2021-07-09 18:54:20 +02:00
Uwe Schindler	69e85924b7	LUCENE-10019: Align file starts in CFS files to have proper alignment (8 bytes) (#203 )	2021-07-09 18:50:01 +02:00
Mayya Sharipova	64d9f8c587	LUCENE-10020 DocComparator don't skip docs of same docID (#204 ) DocComparator should not skip docs with the same docID on multiple sorts with search after. Because of the optimization introduced in LUCENE-9449, currently when searching with sort on [_doc, other fields] with search after, DocComparator can efficiently skip all docs before and including the provided [search after docID]. This is a desirable behaviour in a single index search. But in a distributed search, where multiple indices have docs with the same docID, and when searching on [_doc, other fields], the sort optimization should NOT skip documents with the same docIDs. This PR fixes this. Relates to LUCENE-9449	2021-07-06 14:59:57 -04:00
Christine Poerschke	167bd99c23	LUCENE-8638: remove deprecated LegacyDocValues classes in org.apache.lucene.codecs.memory package (#198 )	2021-07-02 17:59:01 +01:00
Christine Poerschke	960e229df5	remove no-longer-accurate sentence in TopTermsScoringBooleanQueryRewrite javadocs (#197 )	2021-07-02 17:42:17 +01:00
Gautam Worah	a0d995d0c4	LUCENE-9964: Duplicate long values in a field should only be counted once when using SortedNumericDocValuesFields (#191 )	2021-07-01 13:34:00 -07:00
Geoffrey Lawson	834041f286	LUCENE-9963 Add tests for alternate path failures in FlattenGraphFilter (#146 ) Co-authored-by: Lawson <geoffrl@amazon.com>	2021-06-28 12:04:04 -04:00
Mike McCandless	3d833fdb66	LUCENE-10009: fix longstanding cosmetic bug in IndexWriter's infoStream logging, falsely claiming term frequencies were not enabled when positions were not indexed (thank you @yangsongbai)	2021-06-25 10:56:18 -04:00
Greg Miller	578f5cf51b	Fix concurrency bug in DrillSidewaysQuery (#195 )	2021-06-24 12:18:38 -07:00
Greg Miller	9942d59f0d	Move CHANGES entries to 8.10 for LUCENE-9962/9946/9944/9988 (#194 )	2021-06-24 06:41:06 -07:00
balmukundblr	f1d54f7c35	Parallel processing (#132 ) * Added a explicit Flush Task to flush data at Thread level once it completes the processing * Included explicit flush per Thread level * Done changes for parallel processing * Removed extra brace * Removed unused variable * Removed unused variable initialization * Did the required formating * Refactored the code and added required comments & checks	2021-06-24 09:17:19 -04:00
Mike McCandless	db26215f15	LUCENE-9902: move CHANGES entry to 8.10.0	2021-06-23 16:32:43 -04:00
Patrick Zhai	48ff29c8f3	LUCENE-9983: Stop sorting determinize powersets unnecessarily (#163 ) * LUCENE-9983: Stop sorting determinize powersets unnecessarily	2021-06-23 13:07:22 -04:00
Adrien Grand	1d5d458960	LUCENE-9613: Encode ordinals like numerics. (#186 ) This helps simplify the code, and also adds some optimizations to ordinals like better compression for long runs of equal values or fields that are used in index sorts.	2021-06-23 15:37:50 +02:00
Michael Gibney	495bf6730f	For stability of DisjunctionIntervalsSource.toString(), sort subSources (#193 ) Iterators over subSources of DisjunctionIntervalsSource may return elements in indeterminate order, requiring special handling to make toString() output stable across equivalent instances	2021-06-23 07:53:30 -04:00
Mike McCandless	636d10be64	LUCENE-9981: move CHANGES.txt entry to the confusingly no-longer-a-proper-floating-point-number 8.10.0 section	2021-06-22 07:25:10 -04:00
Karl Wright	e6ed1fb075	LUCENE-10012: Improve concurrency with path distance caching.	2021-06-22 05:18:30 -04:00
Mayya Sharipova	a40d5a4258	Add back-compat indices for 8.9.0	2021-06-21 14:35:49 -04:00
Mayya Sharipova	8cca0290ff	Add next minor version 8.9	2021-06-21 14:23:25 -04:00
Mayya Sharipova	d821178bc0	Sync CHANGES for 8.9.0 (#189 ) Move changes that were released in 8.9.0 from 9.0.0 to 8.9.0.	2021-06-18 17:25:13 -04:00
Adrien Grand	1365156fcd	LUCENE-9996: Reduce RAM usage of DWPT for a single document. (#184 ) With this change, doc-value terms dictionaries use a shared `ByteBlockPool` across all fields, and points, binary doc values and doc-value ordinals use slightly smaller page sizes.	2021-06-18 09:17:50 +02:00
Mayya Sharipova	065026b74e	DOAP changes for release 8.9.0	2021-06-17 16:01:03 -04:00
Adrien Grand	803d131fd0	LUCENE-9535: Try to do larger flushes. DWPTPool currently always returns the last DWPT that was added to the pool. By returning the largest DWPT instead, we could try to do larger flushes by finishing DWPTs that are close to being full instead of the last one that was added to the pool, which might be close to being empty. When indexing wikimediumall, this change did not seem to improve the indexing rate significantly, but it didn't slow things down either and the number of flushes went from 224-226 to 216, about 4% less. My expectation is that our nightly benchmarks are a best-case scenario for DWPTPool as the same number of threads is dedicated to indexing over time, but in the case when you have e.g. a single fixed threadpool that is responsible for indexing into several indices, the number of indexing threads that contribute to a given index might greatly vary over time.	2021-06-16 10:26:45 +02:00
kkewwei	b7b834b756	LUCENE-9998: delete useless param fis in StoredFieldsWriter.finish() and TermVectorsWriter.finish() (#183 )	2021-06-15 16:59:42 +02:00
Nhat Nguyen	6f5a413ec6	LUCENE-9935: Clone term vectors reader for merges (#182 ) The newly added assertion in the bulk-merge logic doesn't always hold because we do not create a new instance of Lucene90CompressingTermVectorsReader for merges and that reader can be accessed in tests (as long as it happens on the same thread). This change clones a new term vectors reader for merges.	2021-06-15 07:10:30 -04:00
Nhat Nguyen	50607e0fb9	LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140 ) This change enables bulk-merge for term vectors with index sort. The algorithm used here is similar to the one that is used to merge stored fields. Relates #134	2021-06-14 11:39:38 -04:00
Dawid Weiss	3bedc0871e	LUCENE-9977: rat task corrections (proper up-to-date checks, cleanup and rewrite of the task itself). (#178 )	2021-06-11 09:26:34 +02:00
Nhat Nguyen	69ab1447a7	Revert "LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140 )" This reverts commit `54fb21e862`.	2021-06-10 11:54:11 -04:00
Nhat Nguyen	54fb21e862	LUCENE-9935: Enable bulk-merge for term vectors with index sort (#140 ) This change enables bulk-merge for term vectors with index sort. The algorithm used here is similar to the one that is used to merge stored fields. Relates #134	2021-06-10 11:03:17 -04:00
Jack Conradson	40f66a450a	LUCENE-9965: Add tooling to introspect query execution time (#144 ) This change adds new IndexSearcher and Collector implementations to profile search execution and break down the timings. The breakdown includes the total time spent in each of the following categories along with the number of times visited: create weight, build scorer, next doc, advance, score, match. Co-authored-by: Julie Tibshirani <julietibs@gmail.com>	2021-06-09 13:25:15 -07:00
Adrien Grand	f5e050bd00	LUCENE-9992: Update expectations about vectors with no values.	2021-06-09 18:59:14 +02:00
Michael Sokolov	465cb17d2b	LUCENE-9992: write empty vector fields when merging (#172 )	2021-06-09 07:56:50 -04:00
Dawid Weiss	332405e7ad	LUCENE-9995: JDK17 generates wbr tags which make javadocs checker angry.	2021-06-09 10:45:01 +02:00
zacharymorn	8bcaf87a83	LUCENE-9976: Fix WANDScorer assertion error (#171 ) LUCENE-9976: Fix WANDScorer assertion error as (tailMaxScore >= minCompetitiveScore) && (tailSize < minShouldMatch) are valid now	2021-06-09 00:11:10 -07:00
Julie Tibshirani	d22af75686	Fix random failures in TestPerFieldVectorFormat#testMergeUsesNewFormat	2021-06-08 14:26:52 -07:00
Julie Tibshirani	300589433f	Move some 9.0 changelog items to 8.x These were backported so should appear in the later sections. This commit also fixes some small typos.	2021-06-08 09:11:28 -07:00

... 2 3 4 5 6 ...

35341 Commits All Branches Search

35341 Commits

All Branches