lucene

Commit Graph

Author	SHA1	Message	Date
Julie Tibshirani	57d9515eff	LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641) A couple of the data structures used in HNSW search are pretty large and expensive to allocate. This commit creates a shared candidates queue and visited set that are reused across calls to HnswGraph#searchLevel. Now the same data structures are used for building the entire graph, which can cut down on allocations during indexing. For graph building it also switches the visited set to FixedBitSet for better performance.	2022-02-03 16:00:09 -08:00
Luca Cavanna	bade484998	LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery (#635) IndexSortSortedNumericDocValuesRangeQuery can implement its count method and compute the count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.	2022-02-03 17:19:05 +01:00
Luca Cavanna	ee7a8d6918	LUCENE-10002: Replace some IndexSearcher#search(Query, Collector) calls in tests (#639) Also use LuceneTestCase#newSearcher.	2022-02-03 17:17:02 +01:00
Dawid Weiss	9a28c91a5a	LUCENE-10283: bump the minimum source/release in javadoc settings.	2022-02-02 17:25:50 +01:00
Dawid Weiss	87bba4152c	LUCENE-10283: bump the minimum source/release in ecj linter settings.	2022-02-02 17:25:41 +01:00
Mike Drob	56f49257ed	null check on infoStream (#637)	2022-02-02 09:44:31 -06:00
Mayya Sharipova	c8e1c08cc8	Small fix for assertConsistentGraph (#631) TestKnnGraph.testMultipleVectorFields sometimes fails with the following message: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.codecs.lucene91.Lucene91HnswVectorsReader.getGraphValues(String)" because "vectorReader" is null. This happens in assertConsistentGraph. This patch ensures that for a segment and a field where no vectors are indexed, we don't run the graph consistency check.	2022-02-01 10:21:48 -05:00
Dawid Weiss	f103cca565	LUCENE-10255: Add the required unnamed modules in benchmarks subproject to module-info so that they are explicit.	2022-02-01 12:15:01 +01:00
Dawid Weiss	e7212fa47d	LUCENE-10283: bump minimum JDK version to 17 in buildSrc.	2022-02-01 12:09:35 +01:00
Mayya Sharipova	8dfdb261e7	LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#616) This patch adds KNN vectors for testing backward compatible indices: - Add a KnnVectorField to documents when creating a new backward compatible index - Add knn vector search and checks of vector values to the search tests of backward compatible indices - Add tests for knn vector search after changing backward compatible indices (merging them and adding new documents to them)	2022-01-31 09:20:53 -05:00
Luca Cavanna	df12e2b195	LUCENE-10395: Introduce TotalHitCountCollectorManager (#622)	2022-01-31 14:45:35 +01:00
Luca Cavanna	933c54fe87	Improve Weight#count and IndexSearcher#count javadocs (#625)	2022-01-28 16:47:25 +01:00
Robert Muir	61edacee5d	update javac flags for java 17 (#628) Previously -Xlint:text-blocks and -Xlint:synchronization were enabled conditionally, if the user had at least Java 15 or Java 16, respectively. Enable them unconditionally. Add new options so that the warnings list is fully configured: * -Xlint:module (new in Java 17) * -Xlint:strictfp (new in Java 17) Disable "path" with -Xlint:-path rather than commenting it out, for consistency. Disable "missing-explicit-ctor" (new in Java 17), as it is unlikely to succeed right now. Alphasort the flags and document how to get the updated list; this makes it easy to compare and keep up to date.	2022-01-28 05:48:58 -05:00
Adrien Grand	09ddac1fe5	Simplify HnswGraph#search. (#627) Currently the contract on `bound` is that it holds the score of the top of the `results` priority queue. It means that a candidate is only considered if its score is better than the bound or if fewer than `topK` results have been accumulated so far. I think it would be simpler if `bound` always held the minimum score that is required for a candidate to be considered. This would also be more consistent with how our WAND support works, by trusting `setMinCompetitiveScore` alone, instead of having to check whether the priority queue is full as well.	2022-01-27 18:08:06 +01:00
Greg Miller	4323848469	LUCENE-10368: Make IntTaxonomyFacets pkg-private (#600)	2022-01-27 08:56:42 -08:00
Mayya Sharipova	dcd9e3d6f7	LUCENE-10389 Adjust TestHnswGraph.testRandom (#626) Before PR #608, this test searched HnswGraph with numSeed (the search queue size) equal to 100. This patch restores the search queue size to 100 and takes the top topK results from it.	2022-01-27 09:06:48 -05:00
gf2121	eda9c29b8c	LUCENE-10388: Remove MultiLevelSkipListReader#SkipBuffer (#620)	2022-01-26 16:36:33 +08:00
gf2121	3ad4c1b3c9	clean lastPayloadByteUpto (#619)	2022-01-26 14:03:32 +08:00
Tomoko Uchida	e18dfea8bd	LUCENE-10389: temporarily disable TestHnswGraph.testRandom()	2022-01-26 10:45:55 +09:00
Mayya Sharipova	b0d6fe68d1	LUCENE-10054 Make HnswGraph hierarchical (#608) Currently HNSW has only a single layer. This patch makes the HNSW graph multi-layered. This PR is based on the following PRs: #250, #267, #287, #315, #536, #416. Main changes: - Multiple layers are introduced into HnswGraph and HnswGraphBuilder - A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader and Lucene91HnswVectorsWriter is introduced to encode the graph layers' information - Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to backward_codecs to support reading and searching of graphs built with versions before 9.1. Lucene90HnswVectorsWriter is deleted. - For backwards compatibility tests, the previous Lucene90 graph reading and writing logic was copied into the test files Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter, Lucene90HnswGraphBuilder and Lucene90HnswRWGraph. TODO: tests for KNN search on graphs built before 9.1; tests for merging pre-9.1 and current-version indices.	2022-01-25 13:53:55 -05:00
Mayya Sharipova	1a4f838fe2	LUCENE-10384: Simplify LongHeap small addition (#623) LUCENE-10384 and PR #615 introduced an encoding into NeighborQueue, but one function, `nodes()`, was not updated to apply this encoding. Also modify the test that would fail without this patch.	2022-01-25 11:43:40 -05:00
Luca Cavanna	11006fba59	LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612) When only the number of matching documents is collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that may have been set on the searcher. Calling `IndexSearcher#count(Query)` makes the code more concise and is also more correct, as it honours the executor that has been set on the searcher instance. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2022-01-25 16:11:19 +01:00
Tomoko Uchida	fd817b6fb1	LUCENE-8930: increase the timeout for the launched luke to 1 minute (it seems to occasionally take a long time on Windows VMs)	2022-01-25 22:56:11 +09:00
Tomoko Uchida	0aa526d256	LUCENE-10076: fix the assertion in the luke module to only check whether the Optional has a value	2022-01-25 22:05:27 +09:00
Adrien Grand	07fe46ff86	LUCENE-10384: Simplify LongHeap. (#615) The min/max ordering logic moves to NeighborQueue.	2022-01-25 09:04:52 +01:00
Greg Miller	eaf3cb6739	Fix minor bug that snuck in with LUCENE-9952	2022-01-24 06:58:31 -08:00
Greg Miller	9e560c1af1	LUCENE-9952: Fix dim count inaccuracies in SSDV faceting when a dim is multi-valued (#611)	2022-01-24 06:48:20 -08:00
Greg Miller	10ca531ddc	LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting (#613)	2022-01-24 06:46:22 -08:00
Julie Tibshirani	fb09ae1f7c	Undo accidental change to build.gradle	2022-01-23 16:26:16 -08:00
Julie Tibshirani	7ece8145bc	LUCENE-10375: Write vectors to file in flush (#617) In a previous commit, we updated HNSW merge to first write the combined segment vectors to a file, then use that file to build the graph. This commit applies the same strategy to flush, which lets us use the same logic for flush and merge.	2022-01-23 16:19:23 -08:00
Dawid Weiss	08d6633d94	LUCENE-8930: increase timeout for the launched luke.	2022-01-20 16:51:05 +01:00
Ignacio Vera	4ec8f865c8	LUCENE-10288: Check BKD tree shape for Lucene pre-8.6 1D indexes (#607) Adds efficient logic to compute whether a tree is balanced or unbalanced for indexes created before Lucene 8.6	2022-01-20 07:49:29 +01:00
Dawid Weiss	72ba7ae2ee	LUCENE-8930: script testing in the distribution (#550)	2022-01-20 00:09:15 +09:00
Julie Tibshirani	9b6d417d1c	LUCENE-10040: Update HnswGraph javadoc related to deletions. Previously it claimed the search method did not handle deletions.	2022-01-18 15:36:00 -08:00
Julie Tibshirani	dfca9a5608	LUCENE-10375: Write merged vectors to file before building graph (#601) When merging segments together, the `KnnVectorsWriter` creates a `VectorValues` instance with a merged view of all the segments' vectors. This merged instance is used when constructing the new HNSW graph. Graph building needs random access, and the merged VectorValues supports this by mapping from merged ordinals to segments and segment ordinals. This mapping can add significant overhead when building the graph. This change updates the HNSW merging logic to first write the combined segment vectors to a file, then use that file to build the graph. This helps speed up segment merging, and also lets us simplify `VectorValuesMerger`, which provides the merged view of vector values.	2022-01-18 13:53:05 -08:00
Alan Woodward	2e2c4818d1	LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator() (#603) The sort position parameter in SortField.getComparator() is only ever used to determine whether or not skipping should be enabled on a given comparator, so the parameter name should reflect that. This commit also explicitly disables skipping in a number of cases where it is never used, in particular CheckIndex and the grouping collectors.	2022-01-17 10:44:57 +00:00
Adrien Grand	457367e9b7	LUCENE-10168: Fix typo that would _not_ run nightly tests.	2022-01-14 13:51:16 +01:00
Greg Miller	2f5e3c323b	LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll (#605) Co-authored-by: guofeng.my <guofeng.my@bytedance.com>	2022-01-13 09:17:55 -08:00
Mayya Sharipova	bd2cc4124d	Small edits for KnnGraphTester (#575) 1. Correct the remaining size for input files larger than Integer.MAX_VALUE, as currently with every iteration we try to map the next blockSize bytes even if fewer than blockSize bytes are left in the file. 2. Fix a java.lang.ClassCastException when retrieving KnnGraphValues for stats printing. 3. Add an option for the Euclidean metric.	2022-01-12 17:23:10 -05:00
gf2121	8d9fa6dba1	revert LUCENE-10355 (#597) Trying to find the source of the taxo-facet performance regression. See also LUCENE-10374. Co-authored-by: guofeng.my <guofeng.my@bytedance.com>	2022-01-12 10:23:13 -08:00
Adrien Grand	71dfa9e9cd	addBackcompatIndexes.py should use Gradle, not Ant. (#531)	2022-01-12 18:55:59 +01:00
Uwe Schindler	636d42e032	Fix wrong project name	2022-01-11 17:42:21 +01:00
Nikola Grcevski	bad65c53c9	LUCENE-10369: Move DelegatingCacheHelper to FilterDirectoryReader (#596)	2022-01-11 15:22:06 +01:00
Adrien Grand	308ddd7502	Add documentation on file formats. (#598)	2022-01-11 15:16:05 +01:00
Adrien Grand	f81c760cc8	LUCENE-10370: Fix precommit.	2022-01-11 10:13:10 +01:00
Dawid Weiss	9b54fbaa01	LUCENE-10370: temporarily ignore TestStressNRTReplication	2022-01-11 09:25:31 +01:00
Greg Miller	82703757fe	LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities (#543)	2022-01-10 13:48:36 -08:00
Dawid Weiss	bff930c1bf	LUCENE-10370: temporarily ignore TestNRTReplication.	2022-01-10 22:18:12 +01:00
Greg Miller	cf12b46092	LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts (#585)	2022-01-10 10:23:46 -08:00
Greg Miller	eb0b1bf9f1	Add CHANGES entry for LUCENE-10250	2022-01-10 08:57:28 -08:00
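
LUCENE-10385 says that IndexSortSortedNumericDocValuesRangeQuery can compute its count with the same binary search it uses to execute the query. The sketch below shows this idea without using Lucene. It assumes that the per-document values of the index-sort field are available as a plain `long[]` in doc ID order, so they are ascending. The class and method names (`IndexSortedRangeCount`, `lowerBound`, `count`) are hypothetical. The real query works against doc values and only takes this path when its required conditions are met, which this sketch does not model.

```java
/**
 * Hypothetical sketch (not Lucene's implementation): when documents are sorted by a field's
 * value, the documents matching [lower, upper] form one contiguous run of doc IDs, so the
 * count is the distance between two binary-search boundaries.
 */
final class IndexSortedRangeCount {

  /** Returns the first index whose value is >= key, or values.length if there is none. */
  static int lowerBound(long[] values, long key) {
    int lo = 0;
    int hi = values.length; // exclusive end of the search window
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
      if (values[mid] < key) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Counts values v with lower <= v <= upper; values must be sorted ascending. */
  static int count(long[] values, long lower, long upper) {
    if (lower > upper) {
      return 0;
    }
    int first = lowerBound(values, lower);
    // One past the last match is the first index whose value exceeds upper.
    int end = upper == Long.MAX_VALUE ? values.length : lowerBound(values, upper + 1);
    return end - first;
  }

  public static void main(String[] args) {
    // valuesByDocId[i] plays the role of the sort field's value for doc ID i
    long[] valuesByDocId = {1, 3, 3, 5, 8, 8, 8, 13};
    System.out.println(count(valuesByDocId, 3, 8)); // prints 6
  }
}
```

In this sketch, both boundaries are found in O(log n) steps, so the count never visits the matching documents one by one.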

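The HnswGraph#search simplification (#627) proposes that `bound` always hold the minimum score a candidate needs. This replaces the current approach, which uses the score at the top of the `results` queue plus a separate check on how many results have been collected. Below is a minimal Java sketch of that contract. It uses `java.util.PriorityQueue` rather than Lucene's NeighborQueue. `BoundedTopK`, `isCompetitive` and `minAcceptedScore` are hypothetical names, not Lucene API.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical top-k collector (not Lucene's NeighborQueue) whose bound always holds the
 * minimum score a candidate must beat to be kept.
 */
final class BoundedTopK {

  /** A (node, score) pair; node IDs stand in for graph ordinals. */
  private static final class Candidate {
    final int node;
    final float score;

    Candidate(int node, float score) {
      this.node = node;
      this.score = score;
    }
  }

  private final int topK;
  // Min-heap on score: the head is the worst result currently retained.
  private final PriorityQueue<Candidate> results =
      new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));
  // Negative infinity until topK results exist, so every candidate is accepted at first.
  private float minAcceptedScore = Float.NEGATIVE_INFINITY;

  BoundedTopK(int topK) {
    if (topK < 1) {
      throw new IllegalArgumentException("topK must be >= 1");
    }
    this.topK = topK;
  }

  /** The only check callers need: no separate "is the queue full?" test. */
  boolean isCompetitive(float score) {
    return score > minAcceptedScore; // ties with the bound are rejected in this sketch
  }

  void offer(int node, float score) {
    if (!isCompetitive(score)) {
      return;
    }
    results.add(new Candidate(node, score));
    if (results.size() > topK) {
      results.poll(); // evict the current worst result
    }
    if (results.size() == topK) {
      // With topK results retained, a newcomer must beat the worst of them.
      minAcceptedScore = results.peek().score;
    }
  }

  float minAcceptedScore() {
    return minAcceptedScore;
  }

  public static void main(String[] args) {
    BoundedTopK top = new BoundedTopK(2);
    top.offer(1, 0.5f);
    top.offer(2, 0.9f);
    top.offer(3, 0.7f); // evicts node 1 (score 0.5)
    System.out.println(top.minAcceptedScore()); // prints 0.7
  }
}
```

This follows the `setMinCompetitiveScore` analogy in the commit message. Because the bound is maintained in one place, a search loop only compares each candidate against a single number.
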
35812 Commits