This PR extends VectorReader#search to take a parameter specifying the live
docs. LeafReader#searchNearestVectors then always returns the k nearest
undeleted docs.
To implement this, the HNSW algorithm will only add a candidate to the result
set if it is a live doc. The graph search still visits and traverses deleted
docs as it gathers candidates.
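A minimal sketch of the approach, with the graph, scoring, and live-docs lookups abstracted behind functional interfaces (none of these names are Lucene's actual HNSW internals):

```java
import java.util.*;
import java.util.function.*;

// Sketch only: deleted docs are still visited and expanded as candidates,
// but only live docs are collected into the top-k result set.
final class FilteredGraphSearch {
  record Scored(int node, double score) {}

  static List<Scored> search(int entryPoint, int k,
                             IntFunction<int[]> neighbors,   // node -> neighbor ids
                             IntToDoubleFunction similarity, // node -> score vs. query
                             IntPredicate isLive) {          // node -> not deleted
    PriorityQueue<Scored> candidates =  // best candidate first
        new PriorityQueue<>(Comparator.comparingDouble(Scored::score).reversed());
    PriorityQueue<Scored> results =     // worst result first, capped at k
        new PriorityQueue<>(Comparator.comparingDouble(Scored::score));
    Set<Integer> visited = new HashSet<>();

    visited.add(entryPoint);
    candidates.add(new Scored(entryPoint, similarity.applyAsDouble(entryPoint)));

    while (!candidates.isEmpty()) {
      Scored c = candidates.poll();
      if (results.size() >= k && c.score() < results.peek().score()) {
        break; // no remaining candidate can improve the top-k
      }
      if (isLive.test(c.node())) { // deleted docs are traversed, never returned
        results.add(c);
        if (results.size() > k) {
          results.poll();
        }
      }
      for (int n : neighbors.apply(c.node())) {
        if (visited.add(n)) {
          candidates.add(new Scored(n, similarity.applyAsDouble(n)));
        }
      }
    }
    return new ArrayList<>(results);
  }
}
```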
In LUCENE-9002 we introduced logic to skip caching a clause if it would be too
expensive compared to the usual query cost. Specifically, we avoid caching a
clause if its estimated cost is more than 250x the lead iterator's.
We've found that the default of 250 is quite high and can lead to poor tail
latencies. This PR decreases it to 10 to cache more conservatively.
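Conceptually, the heuristic looks like the following sketch; the constant is the knob being changed, while the method and parameter names are illustrative, not the actual LRUQueryCache code:

```java
// Illustrative sketch of the skip-caching heuristic.
static final int SKIP_CACHE_FACTOR = 10; // previously 250

static boolean worthCaching(long clauseCost, long leadIteratorCost) {
  // Building a cache entry evaluates the clause against all of its matches,
  // so skip clauses that are far more expensive than the query's lead
  // iterator; caching them would dominate the query's latency.
  return clauseCost <= (long) SKIP_CACHE_FACTOR * leadIteratorCost;
}
```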
6 main improvements:
1) Iterate through all output.InputNodes since dest gaps can exist.
2) freeBefore the minimum input node instead of the first input node (which was usually, but not always, the minimum); see the sketch after this list.
3) Don't freeBefore from a hole source node. Bookkeeping may not be correct and could result in an early free.
4) When adding an output node after hole recovery, calculate its new position increment instead of adding it to the end of the output graph.
5) Nodes after holes that have edges to their source will do the output re-mapping that the deleted node would have done.
6) If a disconnected input node swaps order with another node in the output, then map them to the same output node.
Co-authored-by: Lawson <geoffrl@amazon.com>
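A rough sketch of improvements 1) and 2); `OutputNode` and `freeBefore` here are hypothetical stand-ins for FlattenGraphFilter's internal bookkeeping, not its real API:

```java
import java.util.List;

// Hypothetical sketch of points 1) and 2).
record OutputNode(List<Integer> inputNodes) {}

static int minReferencedInputNode(List<OutputNode> outputNodes) {
  int min = Integer.MAX_VALUE;
  for (OutputNode out : outputNodes) {   // 1) visit every output node, since
    for (int in : out.inputNodes()) {    //    dest gaps can exist
      min = Math.min(min, in);
    }
  }
  return min; // 2) the caller does freeBefore(min), not freeBefore(firstInputNode)
}
```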
Provide leaf sorter for directory readers opened from IndexCommit
LUCENE-9507 made it possible to provide a leaf sorter for directory readers.
One API that was missed is providing a leaf sorter for directory readers
opened from an index commit.
This patch addresses this by adding an extra parameter, a custom
comparator for sorting leaf readers, to the DirectoryReader open API
that takes an indexCommit and minSupportedMajorVersion.
Relates to PR #32
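Assuming the new overload mirrors the existing `DirectoryReader.open(IndexCommit, int)` plus the comparator, usage might look like this (the comparator choice is an arbitrary example):

```java
import java.io.IOException;
import java.util.Comparator;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.LeafReader;

static DirectoryReader openSorted(IndexCommit commit, int minSupportedMajorVersion)
    throws IOException {
  // Example order: largest segments first; any Comparator<LeafReader> works.
  Comparator<LeafReader> byMaxDoc =
      Comparator.comparingInt(LeafReader::maxDoc).reversed();
  return DirectoryReader.open(commit, minSupportedMajorVersion, byMaxDoc);
}
```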
Check the Lucene version of the last index commit: if the version was < 9,
use a StringField; otherwise, if the index is fresh or was built using a
version >= 9, use a BDV field.
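A sketch of that decision (the field names come from this commit; the `-1` sentinel for a fresh index is an assumption):

```java
import org.apache.lucene.document.BinaryDocValuesField;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexableField;
import org.apache.lucene.util.BytesRef;

static IndexableField fullPathField(String fullPath, int createdVersionMajor) {
  boolean fresh = createdVersionMajor == -1; // assumed sentinel: no commit yet
  if (fresh || createdVersionMajor >= 9) {
    return new BinaryDocValuesField("$full_path_binary$", new BytesRef(fullPath));
  }
  return new StringField("$full_path$", fullPath, Field.Store.YES); // pre-9 format
}
```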
BDV field with a different name
Using a different name, "$full_path_binary$", for the BDV field
ensures that it does not share a name with the earlier "$full_path$"
StringField, and hence they don't violate the field type consistency check
(LUCENE-9334).
This commit also enables the back-compat check that was disabled
earlier.
When sorting by low-cardinality fields, the same sub remains current for long
sequences of doc IDs. This speeds up SortedDocIdMerger a bit by extracting
the sub that leads iteration.
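The shape of the optimization, as a sketch with made-up types rather than the actual merger code: the leading sub stays out of the priority queue, so long runs of doc IDs from one sub skip the queue entirely.

```java
import java.util.*;

// Sketch: each Sub emits ascending doc IDs; peek() returns the next doc
// (Integer.MAX_VALUE when exhausted) and pop() emits it and advances.
interface Sub {
  int peek();
  int pop();
}

final class SortedMerger {
  private final PriorityQueue<Sub> queue =
      new PriorityQueue<>(Comparator.comparingInt(Sub::peek));
  private Sub current;

  SortedMerger(List<Sub> subs) {
    queue.addAll(subs);
    current = queue.poll();
  }

  int nextDoc() {
    Sub top = queue.peek();
    if (top == null || current.peek() < top.peek()) {
      return current.pop(); // fast path: the same sub keeps leading
    }
    queue.add(current);     // slow path: re-enter the queue and pull
    current = queue.poll(); // whichever sub leads now
    return current.pop();
  }
}
```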
When there's only one field, CombinedFieldQuery will ignore its weight while
scoring. This makes the scoring inconsistent, since the field weight is supposed
to multiply its term frequency.
This PR removes the optimizations around single-field scoring to make sure the
weight is always taken into account. These optimizations are not critical since
it should be uncommon to use CombinedFieldQuery with only one field.
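For context, the combined term frequency that CombinedFieldQuery feeds into the similarity is a weighted sum over fields, so even with one field the weight must be applied (plain arithmetic illustrating the model, not the actual scorer code):

```java
// Combined frequency: a weighted sum of per-field term frequencies.
// With a single field this must still be weights[0] * freqs[0], not freqs[0].
static double combinedFreq(double[] weights, double[] freqs) {
  double freq = 0;
  for (int i = 0; i < weights.length; i++) {
    freq += weights[i] * freqs[i];
  }
  return freq;
}
```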
This change fixes a bug in `MultiNormsLeafSimScorer` that assumed each
field has a norm for every term/document.
As part of the fix, it adds validation that the fields have consistent norms
settings.
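A sketch of the kind of validation added, assuming a hypothetical helper over the fields' `FieldInfo`s (the real check may live elsewhere):

```java
import java.util.Collection;
import org.apache.lucene.index.FieldInfo;

// Hypothetical validation: every field in the combined query must agree on
// whether it indexes norms, otherwise a document can have a norm for one
// field but not another and per-document norm lookups become inconsistent.
static void validateConsistentNorms(Collection<FieldInfo> fields) {
  Boolean omitNorms = null;
  for (FieldInfo fi : fields) {
    if (omitNorms == null) {
      omitNorms = fi.omitsNorms();
    } else if (omitNorms != fi.omitsNorms()) {
      throw new IllegalArgumentException(
          "All fields must have the same norms setting, got a mismatch on: " + fi.name);
    }
  }
}
```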
The previous equals and hashCode methods only compared query terms. This meant
that queries on different fields, or with different field weights, were
considered the same.
During boolean query rewrites, duplicate clauses are removed. Because
equals/hashCode was incorrect, rewrites could accidentally drop
CombinedFieldQuery clauses.
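A sketch of the fix, assuming the query stores a field-to-weight map and a term array (the field names here are illustrative):

```java
// Equality must cover the fields and their weights, not just the terms,
// or distinct queries are deduplicated away during BooleanQuery rewrite.
@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  CombinedFieldQuery other = (CombinedFieldQuery) o;
  return fieldAndWeights.equals(other.fieldAndWeights) // fields + weights
      && Arrays.equals(terms, other.terms);            // query terms
}

@Override
public int hashCode() {
  return 31 * fieldAndWeights.hashCode() + Arrays.hashCode(terms);
}
```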
The previous svn-based link no longer works. Instead, point at the
license file on GitHub: it is for icu4c, but per the repo, the user is
explicitly directed to this license file for both icu4c and icu4j.
It's a good case for having a correct link, as the ICU license is complicated. It
even has "if (version > X)" conditionals in the legalese!!!
Re-enable the randomized testing here, but with a separate test for each
mode rather than all in one method. This gives better coverage and makes
failures easier to debug.
Normalization-inert characters need not be required as boundaries
for incremental processing. It is sufficient to check `hasBoundaryAfter`
and `hasBoundaryBefore`, substantially improving worst-case performance.
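In ICU4J terms, `Normalizer2.hasBoundaryBefore`/`hasBoundaryAfter` are real methods; the loop below is one illustrative way to find a safe split point without demanding the stronger `isInert` property:

```java
import com.ibm.icu.text.Normalizer2;

// Find the last safe point to cut a buffer for incremental normalization.
// A normalization boundary is enough; requiring a normalization-inert
// character can force much longer scans in the worst case.
static int lastSafeBoundary(Normalizer2 norm, String buffer) {
  for (int i = buffer.length(); i > 0; ) {
    int c = buffer.codePointBefore(i);
    i -= Character.charCount(c);
    if (norm.hasBoundaryBefore(c)) {
      return i; // normalization cannot cross this point
    }
  }
  return 0; // no boundary yet; wait for more input
}
```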