lucene

Commit Graph

Author	SHA1	Message	Date
Dawid Weiss	2b0378cd4a	Use JavaInfo instead of toolchains. Internal but works and is free of toolchain's quirks.	2021-08-25 10:03:59 +02:00
Dawid Weiss	68cf86ba35	Experiments with the new apis.	2021-08-25 10:03:59 +02:00
Dawid Weiss	72f373791e	Upgrade palantir's plugin.	2021-08-25 10:03:59 +02:00
Dawid Weiss	3ff4263535	Upgrade gradle to 7.1.1	2021-08-25 10:03:59 +02:00
Dawid Weiss	523cea2c5d	Revert "Adding initial patch by Gautam Worah" (restore pristine main) This reverts commit 067ab4f503aabea59639e692e3ea9ee30750c68e.	2021-08-25 10:03:59 +02:00
Dawid Weiss	bac22d6116	Adding initial patch by Gautam Worah	2021-08-25 10:03:59 +02:00
Mayya Sharipova	fc67d6aa6e	Revert "LUCENE-10054 Make HnswGraph hierarchical (#250)" This reverts commit `257d256def`. We've decided to have a separate feature branch for HNSW, and put all related changes there.	2021-08-24 14:58:59 -04:00
Julie Tibshirani	782c3cca3a	LUCENE-10040: Relax TestKnnVectorQuery#testDeletes assertion (#251) TestKnnVectorQuery#testDeletes assumes that if there are n total documents, we can perform a kNN search with k=n and retrieve all documents. This isn't true with our implementation -- due to randomization we may select fewer than n entry points and never visit some vectors.	2021-08-24 11:15:27 -07:00
Adrien Grand	83ba5d859c	LUCENE-7020: Remove TieredMergePolicy#setMaxMergeAtOnceExplicit. (#230) TieredMergePolicy no longer bounds the number of segments that can be merged via a forced merge.	2021-08-24 10:27:00 +02:00
Mayya Sharipova	257d256def	LUCENE-10054 Make HnswGraph hierarchical (#250) Currently HNSW has only a single layer. This is the first part to make it multi-layered. To keep changes small, this PR only adds multiple layers in the HnswGraph class. TODO for following PRs: - modify graph construction and search algorithm for a hierarchical graph. - modify Lucene90HnswVectorsWriter and Lucene90HnswVectorsReader to write and read multiple layers	2021-08-23 15:54:26 -04:00
Greg Miller	46fa09d265	LUCENE-5309: Optimize facet counting for single-valued SSDV / StringValueFacetCounts (#255)	2021-08-23 10:01:23 -07:00
51search	191ee3ad3e	LUCENE-10058: fix gradle lucene:benchmark:run error (#253)	2021-08-23 10:36:33 -04:00
Uwe Schindler	5813292de2	LUCENE-10055: Update Subversion folder for Javadocs	2021-08-22 13:34:09 +02:00
Michael Sokolov	054b444c14	Fix off-by-one in TestDemo.testKnnVectorSearch	2021-08-21 14:22:47 -04:00
Mike Drob	c36495dce7	LUCENE-10017 Less verbose exception on IndexFormatTooOld (#200)	2021-08-20 15:40:52 -05:00
Dzung Bui	0c3c8ec09a	LUCENE-10059: Fix an AssertionError when JapaneseTokenizer tries to backtrace from and to the same position (#254) Co-authored-by: Anh Dung Bui <buidun@amazon.com>	2021-08-20 08:21:58 -04:00
Michael Sokolov	5896e5389a	LUCENE-10057: Use Lucene abstractions to store demo KnnVectorDict (Dawid Weiss)	2021-08-19 16:14:06 -04:00
Michael Sokolov	eeb296ce90	LUCENE-8638: remove LegacyBM25Similarity	2021-08-18 15:44:56 -04:00
Michael Sokolov	b8210dee7a	Close vector dictionary when exiting the demo	2021-08-18 15:43:33 -04:00
Michael Sokolov	d1d60e2db6	LUCENE-8638: remove unused deprecated methods and related tests (#248)	2021-08-18 08:19:49 -04:00
Michael Sokolov	666c7a2590	LUCENE-8638: remove deprecated FST get by output	2021-08-18 08:15:31 -04:00
Michael Sokolov	a37844aedd	LUCENE-10016: Added KnnVector index/query support to demo	2021-08-18 08:13:59 -04:00
Michael Sokolov	4213f9d3cd	LUCENE-8638: remove long-deprecated Jaspell suggester	2021-08-17 17:45:22 -04:00
Michael McCandless	65a53450dc	LUCENE-10052: first cut at LTC.newBytesRef methods, and switching a few test cases over (#245) * LUCENE-10052: first cut at LTC.newBytesRef methods, to randomize the offset/length of a BytesRef, and switching a few test cases over	2021-08-17 16:18:40 -04:00
Michael Sokolov	2d21a600ba	LUCENE-8638: remove deprecated code (#243)	2021-08-17 13:51:04 -04:00
Julie Tibshirani	29ed3908ea	LUCENE-9614: Small fixes to KnnVectorQuery hashCode and toString	2021-08-16 09:10:53 -07:00
Julie Tibshirani	e48be684b2	LUCENE-9614: Prevent TestKnnVectorQuery from using simple text codec (#244) The simple text codec doesn't support kNN searches, so the test will fail when we randomly choose to use it.	2021-08-16 09:11:03 -07:00
Julie Tibshirani	6993fb9a99	LUCENE-10040: Handle deletions in nearest vector search (#239) This PR extends VectorReader#search to take a parameter specifying the live docs. LeafReader#searchNearestVectors then always returns the k nearest undeleted docs. To implement this, the HNSW algorithm will only add a candidate to the result set if it is a live doc. The graph search still visits and traverses deleted docs as it gathers candidates.	2021-08-16 07:44:17 -07:00
Mike McCandless	19e5c00a4f	LUCENE-10014: fix performance bug: when writing doc values with block GCD compression we were unnecessarily wasting index storage by failing to take full advantage of the GCD compression	2021-08-16 08:40:02 -04:00
Mike McCandless	b18f714096	LUCENE-10008: add CHANGES entry	2021-08-13 14:47:53 -04:00
Vigya Sharma	cb4c8ae07f	LUCENE-10008: Respect ignoreCase flag in CommonGramsFilterFactory and factor out a common abstract base class AbstractWordsFileFilterFactory.java (#188)	2021-08-13 14:45:58 -04:00
Michael Sokolov	624560a3d7	LUCENE-9614: add KnnVectorQuery implementation	2021-08-13 12:15:40 -04:00
Julie Tibshirani	a9fb5a965d	LUCENE-10043: Decrease default LRUQueryCache#skipCacheFactor to 10 (#232) In LUCENE-9002 we introduced logic to skip caching a clause if it would be too expensive compared to the usual query cost. Specifically, we avoid caching a clause if its cost is estimated to be 250x higher than the lead iterator's. We've found that the default of 250 is quite high and can lead to poor tail latencies. This PR decreases it to 10 to cache more conservatively.	2021-08-11 13:29:12 +03:00
Mike McCandless	931ff63232	LUCENE-9963: add CHANGES entry	2021-08-09 16:11:31 -04:00
Geoffrey Lawson	647255b4d2	LUCENE-9963 Improve FlattenGraphFilter's robustness when handling incoming token graphs with holes (#157) 6 main improvements: 1) Iterate through all output.InputNodes since dest gaps can exist. 2) freeBefore the minimum input node instead of the first input node (which was usually, but not always, the minimum). 3) Don't freeBefore from a hole source node. Bookkeeping may not be correct and could result in an early free. 4) When adding an output node after hole recovery, calculate its new position increment instead of adding it to the end of the output graph. 5) Nodes after holes that have edges to their source will do the output re-mapping that the deleted node would have done. 6) If a disconnected input node swaps order with another node in the output, then map them to the same output node. Co-authored-by: Lawson <geoffrl@amazon.com>	2021-08-09 16:06:53 -04:00
Greg Miller	a11457b4e6	LUCENE-10047: Fix value de-duping check in LongValueFacetCounts and RangeFacetCounts (#237)	2021-08-07 10:20:49 -07:00
Greg Miller	e937e739f3	LUCENE-10046: Fix counting bug in StringValueFacetCounts (#236)	2021-08-07 07:32:50 -07:00
Greg Miller	3037e33025	Slight improvement/optimization to duplicate facet value checking (ref: LUCENE-9964) (#234)	2021-08-06 12:57:09 -07:00
Greg Miller	645b64ef4e	Update CHANGES entry for LUCENE-9945 after backporting	2021-08-02 16:38:10 -07:00
Sejal Pawar	a76f2f8072	LUCENE-9945: Extend DrillSidewaysResult to expose drillDowns and drillSideways (#159)	2021-08-02 16:01:08 -07:00
Greg Miller	7450a7e64b	Update CHANGES entry for LUCENE-10030 after backporting	2021-08-01 12:39:11 -07:00
Dawid Weiss	b016c8dc2a	LUCENE-10042: JAR minimal manifest JDK entries are incorrectly set to build-JVM	2021-08-01 14:14:42 +02:00
Mayya Sharipova	597398439c	LUCENE-10027 Changes for Dir Open with leafSorter Adjust changes to Directory Open API from commit with leafsorter in accordance with v. 8.10. Relates to PR #214	2021-07-30 13:42:29 -04:00
Mayya Sharipova	1daf7e7c74	LUCENE-10027 provide leaf sorter from commit (#214) Provide leaf sorter for directory readers opened from IndexCommit LUCENE-9507 allowed to provide a leaf sorter for directory readers. One API that was missed is to allow to provide a leaf sorter for directory readers opened from an index commit. This patch addresses this by adding an extra parameter: a custom comparator for sorting leaf readers to the Directory reader open API from indexCommit and minSupportedMajorVersion. Relates to PR #32	2021-07-30 09:15:21 -04:00
Gautam Worah	56eb76dbaf	Simplify some code	2021-07-29 13:12:27 -04:00
Gautam Worah	bd3174de10	PR fixes 1. Change negation to 2. Move statement inside if condition	2021-07-29 13:12:27 -04:00
Gautam Worah	cec19125fa	Fix minor logic	2021-07-29 13:12:27 -04:00
Gautam Worah	be0a3e5721	Move the version check to a final variable that is initialized in the constructor	2021-07-29 13:12:27 -04:00
Gautam Worah	162131ecf8	Use BDV or a StoredField based on the Lucene version that has created the last index commit If the Lucene version was < 9 then use a StringField or else if the index is fresh or if the index was built using a version >= 9, then use a BDV field.	2021-07-29 13:12:27 -04:00
Gautam Worah	7cb696041c	Category documents added in the Lucene 9.0 taxonomy index use a BDV field with a different name Using BDV fields with a different "$full_path_binary$" name ensures that the earlier "$full_path$" StringField does not have the same name as the BDV field and hence they don't violate the field type consistency check (LUCENE-9334). This commit also enables the back-compat check that was disabled earlier.	2021-07-29 13:12:27 -04:00

35255 Commits (All Branches)