lucene

Commit Graph

Author	SHA1	Message	Date
Alan Woodward	2183756f1c	LUCENE-10413: Make default Ukrainian stopword set available (#665 ) This commit adds a new getDefaultStopwords() static method to UkrainianMorfologikAnalyzer, which makes it possible to create an analyzer with the default stop word set but a custom stem exclusion set.	2022-02-09 14:37:44 +00:00
Greg Miller	8178ffda00	LUCENE-10403: Add ArrayUtil#grow(T[]) (#644 )	2022-02-08 09:43:55 -08:00
Adrien Grand	ce93d45532	LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of matching clauses is a constant.	2022-02-08 17:25:53 +01:00
Nhat Nguyen	bcb70fd742	LUCENE-10190: Ensure changes are visible before advancing seqno (#640 ) DocumentsWriter#anyChanges() can return false after we process and generate a sequence number for an update operation, but before we adjust numDocsInRAM. In this window of time, refreshes are no-ops, although the maxCompletedSequenceNumber has advanced.	2022-02-08 10:29:20 -05:00
gf2121	5250186bd1	LUCENE-10410: Add more tests for legacy decoding logic in DocIdsWriter (#654 )	2022-02-08 16:59:32 +08:00
Tomoko Uchida	20f7f33c8d	LUCENE-10400: cleanup obsolete APIs in kuromoji (#655 )	2022-02-08 09:32:33 +09:00
Julie Tibshirani	eb5bdd7d15	Rename KnnGraphValues -> HnswGraph (#645 ) This PR proposes some renames to clarify the code structure. The top-level `KnnGraphValues` is renamed to `HnswGraph`, since it now represents a hierarchical graph. It's also moved from `org.apache.lucene.index` to the `hnsw` package. Other renames: * The old `HnswGraph` -> `OnHeapHnswGraph` * `IndexedKnnGraphValues` -> `OffHeapHnswGraph` (to match `OffHeapVectorValues`)	2022-02-07 13:21:15 -08:00
Tomoko Uchida	e7546c2427	LUCENE-10400: revise binary dictionaries' constructor in kuromoji (#643 )	2022-02-07 19:31:22 +09:00
gf2121	e93b08f471	LUCENE-10315: Add CHANGES for #541 (#653 )	2022-02-07 16:23:34 +08:00
gf2121	8c67a3816b	LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil (#541 )	2022-02-07 15:35:54 +08:00
Ignacio Vera	4c578017af	LUCENE-10405: binary and Sorted doc values are stored as BytesRef instead of BytesRefHash in memory index (#647 ) When using the MemoryIndex, binary and Sorted doc values are stored as BytesRef instead of BytesRefHash so they don't have a limit on size.	2022-02-07 07:33:07 +01:00
Greg Miller	deef3c704e	Update github hunspell regression test to use JDK 17 (#651 )	2022-02-06 08:00:31 -08:00
Gautam Worah	de4eccbb55	LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,CollectorManager) (#632 )	2022-02-04 15:25:52 -08:00
Mayya Sharipova	ff2189c477	Add changes item for LUCENE-10054	2022-02-04 14:51:48 -05:00
Mayya Sharipova	ea4ab26e52	LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#636 ) Update index.9.0.0-cfs.zip and index.9.0.0-nocfs.zip to include knn vector field.	2022-02-04 14:42:44 -05:00
Alan Woodward	6b64f4b556	LUCENE-10407: Set bpos flag to true when containing filter is exhausted (#648 ) ContainedByIntervalIterator and OverlappingIntervalIterator set their 'is the filter interval exhausted' flag to `false` once it has returned NO_MORE_POSITIONS on a document, so that subsequent calls to `startPosition()` will also return NO_MORE_POSITIONS. ContainingIntervalIterator omits to do this, and so it can incorrectly report matches, for example when used in a disjunction. This commit fixes that omission.	2022-02-04 16:44:57 +00:00
Alan Woodward	9ebee5a058	LUCENE-10402: Changes entry	2022-02-04 15:28:44 +00:00
Alan Woodward	e72d796e96	LUCENE-10402: Prefix interval automaton should be declared binary (#646 )	2022-02-04 15:27:03 +00:00
Adrien Grand	ed6c1b5aea	LUCENE-10401: Fix lookups on empty doc-values terms dictionaries. (#642 )	2022-02-04 09:28:35 +01:00
Julie Tibshirani	57d9515eff	LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641 ) A couple of the data structures used in HNSW search are pretty large and expensive to allocate. This commit creates a shared candidates queue and visited set that are reused across calls to HnswGraph#searchLevel. Now the same data structures are used for building the entire graph, which can cut down on allocations during indexing. For graph building it also switches the visited set to FixedBitSet for better performance.	2022-02-03 16:00:09 -08:00
Luca Cavanna	bade484998	LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery (#635 ) IndexSortSortedNumericDocValuesRangeQuery can implement its count method and compute the count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.	2022-02-03 17:19:05 +01:00
Luca Cavanna	ee7a8d6918	LUCENE-10002: Replace some IndexSearcher#search(Query, Collector) in tests (#639 ) Also use LuceneTestCase#newSearcher	2022-02-03 17:17:02 +01:00
Dawid Weiss	9a28c91a5a	LUCENE-10283: bump the minimum source/release in javadoc settings.	2022-02-02 17:25:50 +01:00
Dawid Weiss	87bba4152c	LUCENE-10283: bump the minimum source/release in ecj linter settings.	2022-02-02 17:25:41 +01:00
Mike Drob	56f49257ed	null check on infoStream (#637 )	2022-02-02 09:44:31 -06:00
Mayya Sharipova	c8e1c08cc8	Small fix for assertConsistentGraph (#631 ) TestKnnGraph.testMultipleVectorFields sometimes breaks with the following message: java.lang.NullPointerException: Cannot invoke "org.apache.lucene.codecs.lucene91.Lucene91HnswVectorsReader.getGraphValues(String)" because "vectorReader" is null. This happens in assertConsistentGraph. This patch ensures that for a segment and a field where there are no vectors indexed, we don't run a check on consistent graph.	2022-02-01 10:21:48 -05:00
Dawid Weiss	f103cca565	LUCENE-10255: Add the required unnamed modules in benchmarks subproject to module-info so that they are explicit.	2022-02-01 12:15:01 +01:00
Dawid Weiss	e7212fa47d	LUCENE-10283: bump minimum JDK version to 17 in buildSrc.	2022-02-01 12:09:35 +01:00
Mayya Sharipova	8dfdb261e7	LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#616 ) This patch adds KNN vectors for testing backward compatible indices - Add a KnnVectorField to documents when creating a new backward compatible index - Add knn vectors search and check for vector values to the testing of search of backward compatible indices - Add tests for knn vector search when changing backward compatible indices (merging them and adding new documents to them)	2022-01-31 09:20:53 -05:00
Luca Cavanna	df12e2b195	LUCENE-10395: Introduce TotalHitCountCollectorManager (#622 )	2022-01-31 14:45:35 +01:00
Luca Cavanna	933c54fe87	Improve Weight#count and IndexSearcher#count javadocs (#625 )	2022-01-28 16:47:25 +01:00
Robert Muir	61edacee5d	update javac flags for java 17 (#628 ) Previously -Xlint:text-blocks and -Xlint:synchronization were enabled conditionally, if the user had at least java 15 or java 16, respectively. Enable them always. Add new options so that the warnings list is fully configured: * -Xlint:module (new in java 17) * -Xlint:strictfp (new in java 17) Disable "path" with -Xlint:-path rather than commenting it out, for consistency. Disable "missing-explicit-ctor" (new in java 17), as it is unlikely to succeed right now. Alphasort the flags and doc how to get the updated list; this makes it easy to compare and keep up to date.	2022-01-28 05:48:58 -05:00
Adrien Grand	09ddac1fe5	Simplify HnswGraph#search. (#627 ) Currently the contract on `bound` is that it holds the score of the top of the `results` priority queue. It means that a candidate is only considered if its score is better than the bound or if fewer than `topK` results have been accumulated so far. I think it would be simpler if `bound` always held the minimum score that is required for a candidate to be considered? This would also be more consistent with how our WAND support works, by trusting `setMinCompetitiveScore` alone, instead of having to check whether the priority queue is full as well.	2022-01-27 18:08:06 +01:00
Greg Miller	4323848469	LUCENE-10368: Make IntTaxonomyFacets pkg-private (#600 )	2022-01-27 08:56:42 -08:00
Mayya Sharipova	dcd9e3d6f7	LUCENE-10389 Adjust TestHnswGraph.testRandom (#626 ) Before PR #608, this test used numSeed (the search queue size) equal to 100 when searching HnswGraph. This patch restores the search queue size to its original value of 100, and gets the top topK results from it.	2022-01-27 09:06:48 -05:00
gf2121	eda9c29b8c	LUCENE-10388: Remove MultiLevelSkipListReader#SkipBuffer (#620 )	2022-01-26 16:36:33 +08:00
gf2121	3ad4c1b3c9	clean lastPayloadByteUpto (#619 )	2022-01-26 14:03:32 +08:00
Tomoko Uchida	e18dfea8bd	LUCENE-10389: temporarily disable TestHnswGraph.testRandom()	2022-01-26 10:45:55 +09:00
Mayya Sharipova	b0d6fe68d1	LUCENE-10054 Make HnswGraph hierarchical (#608 ) Currently HNSW has only a single layer. This patch makes HNSW graph multi-layered. This PR is based on the following PRs: #250, #267, #287, #315, #536, #416 Main changes: - Multi layers are introduced into HnswGraph and HnswGraphBuilder - A new Lucene91HnswVectorsFormat with new Lucene91HnswVectorsReader and Lucene91HnswVectorsWriter are introduced to encode graph layers' information - Lucene90Codec, Lucene90HnswVectorsFormat, and the reading logic of Lucene90HnswVectorsReader and Lucene90HnswGraph are moved to backward_codecs to support reading and searching of graphs built in pre 9.1 version. Lucene90HnswVectorsWriter is deleted. - For backwards compatible tests, previous Lucene90 graph reading and writing logic was copied into test files of Lucene90RWHnswVectorsFormat, Lucene90HnswVectorsWriter, Lucene90HnswGraphBuilder and Lucene90HnswRWGraph. TODO: tests for KNN search for graphs built in pre 9.1 version; tests for merge of indices of pre 9.1 + current versions.	2022-01-25 13:53:55 -05:00
Mayya Sharipova	1a4f838fe2	LUCENE-10384: Simplify LongHeap small addition (#623 ) LUCENE-10384 and PR#615 introduced encoding f into NeighborQueue. But one function, `nodes()`, was left without this encoding. Also modify the test that would fail without this patch.	2022-01-25 11:43:40 -05:00
Luca Cavanna	11006fba59	LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612 ) When only the number of documents is collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that may have been set on the searcher. Calling `IndexSearcher#count(Query)` makes the code more concise and is also more correct, as it honours the executor that has been set on the searcher instance. Co-authored-by: Adrien Grand <jpountz@gmail.com>	2022-01-25 16:11:19 +01:00
Tomoko Uchida	fd817b6fb1	LUCENE-8930: increase timeout to 1 minute for the launched luke (it seems to occasionally take a long time on a Windows VM)	2022-01-25 22:56:11 +09:00
Tomoko Uchida	0aa526d256	LUCENE-10076: fix the assertion in luke module to only check if the optional has a value	2022-01-25 22:05:27 +09:00
Adrien Grand	07fe46ff86	LUCENE-10384: Simplify LongHeap. (#615 ) The min/max ordering logic moves to NeighborQueue.	2022-01-25 09:04:52 +01:00
Greg Miller	eaf3cb6739	Fix minor bug that snuck in with LUCENE-9952	2022-01-24 06:58:31 -08:00
Greg Miller	9e560c1af1	LUCENE-9952: Fix dim count inaccuracies in SSDV faceting when a dim is multi-valued (#611 )	2022-01-24 06:48:20 -08:00
Greg Miller	10ca531ddc	LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting (#613 )	2022-01-24 06:46:22 -08:00
Julie Tibshirani	fb09ae1f7c	Undo accidental change to build.gradle	2022-01-23 16:26:16 -08:00
Julie Tibshirani	7ece8145bc	LUCENE-10375: Write vectors to file in flush (#617 ) In a previous commit, we updated HNSW merge to first write the combined segment vectors to a file, then use that file to build the graph. This commit applies the same strategy to flush, which lets us use the same logic for flush and merge.	2022-01-23 16:19:23 -08:00
Dawid Weiss	08d6633d94	LUCENE-8930: increase timeout for the launched luke.	2022-01-20 16:51:05 +01:00
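
The message for commit bade484998 (LUCENE-10385) says a count can be computed through a binary search when the data is sorted. The idea can be sketched on a plain ascending `long[]`, as below. This is only an illustration under that assumption, not Lucene's implementation: `SortedRangeCount`, `lowerBound`, `upperBound`, and `countInRange` are hypothetical names, and the real query works on index-sorted doc values and applies only when its required conditions are met.

```java
// Hypothetical sketch: count values in [lower, upper] in an ascending array
// with two binary searches instead of visiting every element.
final class SortedRangeCount {
  private SortedRangeCount() {}

  /** Index of the first element >= target, or values.length if there is none. */
  static int lowerBound(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Index of the first element > target, or values.length if there is none. */
  static int upperBound(long[] values, long target) {
    int lo = 0;
    int hi = values.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (values[mid] <= target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Number of elements v with lower <= v <= upper; the array must be ascending. */
  static int countInRange(long[] sortedValues, long lower, long upper) {
    if (lower > upper) {
      return 0; // empty range
    }
    return upperBound(sortedValues, upper) - lowerBound(sortedValues, lower);
  }

  public static void main(String[] args) {
    long[] values = {1, 3, 3, 5, 8, 13};
    System.out.println(countInRange(values, 3, 8)); // prints 4
  }
}
```

Both searches run in logarithmic time, so the count never touches the matching elements themselves, which is the benefit the commit message points at.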

35731 Commits, All Branches
