Commit Graph

35738 Commits

Author SHA1 Message Date
Adrien Grand b443015f01 LUCENE-10384: Simplify LongHeap. (#615)
The min/max ordering logic moves to NeighborQueue.
2022-01-25 09:05:11 +01:00
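The LongHeap change above moves min/max ordering out of the heap and into NeighborQueue. One way a queue can offer either ordering on top of a heap that only supports min-first order is to transform each key before inserting it. The following is a minimal sketch of that idea, not NeighborQueue's actual code. The class and method names are hypothetical, and `java.util.PriorityQueue` stands in for a min-only long heap. Keys are mapped through `-1 - v` (bitwise NOT), which reverses the order of every `long`. Plain negation would not work at `Long.MIN_VALUE`, because `-Long.MIN_VALUE` overflows back to `Long.MIN_VALUE`.

```java
import java.util.PriorityQueue;

/**
 * Hypothetical sketch: a queue that offers min-first or max-first order
 * on top of a heap that only knows min-first order.
 */
final class OrderedQueueSketch {
  // Stand-in for a min-only long heap.
  private final PriorityQueue<Long> minHeap = new PriorityQueue<>();
  private final boolean maxFirst;

  OrderedQueueSketch(boolean maxFirst) {
    this.maxFirst = maxFirst;
  }

  // -1 - v (i.e. ~v) reverses the order of all longs without overflow,
  // unlike -v, which overflows at Long.MIN_VALUE. It is also its own inverse,
  // so the same function encodes on insert and decodes on removal.
  private long flip(long v) {
    return maxFirst ? -1 - v : v;
  }

  void add(long v) {
    minHeap.add(flip(v));
  }

  /** Removes and returns the top element; throws NoSuchElementException if empty. */
  long pop() {
    return flip(minHeap.remove());
  }
}
```

With `maxFirst = true`, adding 3, `Long.MIN_VALUE` and 7 pops 7, then 3, then `Long.MIN_VALUE`. The heap itself never needs to know which order the caller wants.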
Greg Miller 225fd1527c LUCENE-9952: Fix dim count inaccuracies in SSDV faceting when a dim is multi-valued (#621) 2022-01-24 11:33:37 -08:00
Greg Miller fa63c68857 LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting (#614) 2022-01-24 06:46:10 -08:00
Julie Tibshirani b1b3812bd0 LUCENE-10375: Write vectors to file in flush (#617)
In a previous commit, we updated HNSW merge to first write the combined segment
vectors to a file, then use that file to build the graph. This commit applies
the same strategy to flush, which lets us use the same logic for flush and
merge.
2022-01-23 16:31:18 -08:00
Dawid Weiss 77ee2a7c3c LUCENE-8930: increase timeout for the launched luke. 2022-01-20 16:54:24 +01:00
Dawid Weiss 5f1bcc6481 LUCENE-8930: script testing in the distribution (#550) 2022-01-20 21:35:18 +09:00
Ignacio Vera 73a3db90dd LUCENE-10288: Check BKD tree shape for lucene pre-8.6 1D indexes (#607)
Adds efficient logic to compute whether a tree is balanced or unbalanced for indexes
created before Lucene 8.6.
2022-01-20 07:50:22 +01:00
Julie Tibshirani 00a7d5f170 LUCENE-10040: Update HnswGraph javadoc related to deletions
Previously it claimed the search method did not handle deletions.
2022-01-18 15:36:47 -08:00
Julie Tibshirani f68cdd4c03 LUCENE-10375: Write merged vectors to file before building graph (#601)
When merging segments together, the `KnnVectorsWriter` creates a `VectorValues`
instance with a merged view of all the segments' vectors. This merged instance
is used when constructing the new HNSW graph. Graph building needs random
access, and the merged VectorValues support this by mapping from merged
ordinals to segments and segment ordinals. This mapping can add significant
overhead when building the graph.

This change updates the HNSW merging logic to first write the combined segment
vectors to a file, then use that file to build the graph. This helps speed
up segment merging, and also lets us simplify `VectorValuesMerger`, which
provides the merged view of vector values.
2022-01-18 14:11:33 -08:00
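The following minimal sketch makes the mapping overhead described in LUCENE-10375 concrete. In a merged view over several segments, every random access must first locate the segment that owns a merged ordinal before it can read the vector. Here that lookup is a binary search over segment start offsets. The names are hypothetical, and this is not VectorValuesMerger's implementation. Once the merged vectors are written to a single file, an ordinal can address the data directly without this indirection. Assuming fixed-size vectors, that address is `ord * bytesPerVector`.

```java
/** Hypothetical sketch: map ordinals of a merged view back to (segment, segment ordinal). */
final class MergedOrdinalMap {
  private final int[] starts; // starts[i] = first merged ordinal owned by segment i
  private final int total;    // total number of vectors across all segments

  MergedOrdinalMap(int[] segmentSizes) {
    starts = new int[segmentSizes.length];
    int next = 0;
    for (int i = 0; i < segmentSizes.length; i++) {
      starts[i] = next;
      next += segmentSizes[i];
    }
    total = next;
  }

  /** Returns {segment, segmentOrdinal}; costs a binary search on every call. */
  int[] resolve(int mergedOrd) {
    if (mergedOrd < 0 || mergedOrd >= total) {
      throw new IndexOutOfBoundsException("ordinal " + mergedOrd + " not in [0, " + total + ")");
    }
    // Find the last segment whose start is <= mergedOrd. Taking the last one
    // skips empty segments, which share their start with the following segment.
    int lo = 0, hi = starts.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (starts[mid] <= mergedOrd) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return new int[] {lo, mergedOrd - starts[lo]};
  }
}
```

Graph construction calls a lookup like this for every neighbor it visits, so the per-call cost adds up.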
zacharymorn af3a0bc4d5 LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues (#534) 2022-01-18 13:59:12 -08:00
Greg Miller c5c082e3e6 LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll
Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2022-01-17 17:12:28 -08:00
Greg Miller cfc6e54597 revert part of LUCENE-10350 that got missed in the earlier revert 2022-01-17 16:52:35 -08:00
Mayya Sharipova 90ce59e70b Small edits for KnnGraphTester (#575)
1. Correct the remaining size for input files larger
than Integer.MAX_VALUE: currently, on every
iteration we try to map the next blockSize bytes
even if fewer than blockSize bytes are left in the file.

2. Correct java.lang.ClassCastException when retrieving
KnnGraphValues for stats printing.

3. Add an option for euclidean metric
2022-01-17 11:52:35 -05:00
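Item 1 above amounts to clamping each mapped region to the bytes that actually remain in the file. The following minimal sketch reflects that reading. `FileChannel.map` accepts at most `Integer.MAX_VALUE` bytes per call, so large files are mapped in chunks. `ChunkedMapper` and its methods are hypothetical and are not KnnGraphTester's code.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: map a large file in chunks of at most blockSize bytes. */
final class ChunkedMapper {
  /** Lengths of successive chunks covering fileSize bytes; the last one may be short. */
  static List<Long> chunkLengths(long fileSize, int blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be > 0");
    }
    List<Long> lengths = new ArrayList<>();
    for (long pos = 0; pos < fileSize; ) {
      // Never ask for more than what is left in the file.
      long len = Math.min(blockSize, fileSize - pos);
      lengths.add(len);
      pos += len;
    }
    return lengths;
  }

  /** Maps the whole channel read-only, one chunk at a time. */
  static List<MappedByteBuffer> mapAll(FileChannel ch, int blockSize) throws IOException {
    List<MappedByteBuffer> buffers = new ArrayList<>();
    long pos = 0;
    for (long len : chunkLengths(ch.size(), blockSize)) {
      buffers.add(ch.map(FileChannel.MapMode.READ_ONLY, pos, len));
      pos += len;
    }
    return buffers;
  }
}
```

For a 10-byte file and a block size of 4, the chunks are 4, 4 and 2 bytes. Without the `Math.min`, the last request would be for 4 bytes starting at offset 8, past the end of the file.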
Alan Woodward f8c76198d2 LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator() (#603)
The sort position parameter in SortField.getComparator() is only ever used
to determine whether or not skipping should be enabled on a given comparator,
so the parameter name should reflect that.  This commit also explicitly disables
skipping in a number of cases where it is never used, in particular CheckIndex
and the grouping collectors.
2022-01-17 11:39:01 +00:00
Adrien Grand 81171690b8 LUCENE-10168: Fix typo that would _not_ run nightly tests. 2022-01-14 13:51:48 +01:00
Greg Miller 169785ff4d revert LUCENE-10350 (#604) 2022-01-13 09:14:59 -08:00
Adrien Grand 04856ee8c0 addBackcompatIndexes.py should use Gradle, not Ant. (#531) 2022-01-12 18:56:16 +01:00
Uwe Schindler 2f5a1d93f5 Fix wrong project name 2022-01-11 17:43:22 +01:00
Nikola Grcevski 842ef165e4 LUCENE-10369: Move DelegatingCacheHelper to FilterDirectoryReader (#596) 2022-01-11 15:22:21 +01:00
Adrien Grand 22a8e4cbfa Add documentation on file formats. (#598) 2022-01-11 15:16:22 +01:00
Adrien Grand bb0eaaa4d4 LUCENE-10370: Fix precommit. 2022-01-11 10:14:11 +01:00
Dawid Weiss 7e50d954f4 LUCENE-10370: temporarily ignore TestStressNRTReplication 2022-01-11 09:25:50 +01:00
Greg Miller 60b499b907 LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities (#543) 2022-01-10 13:50:14 -08:00
Dawid Weiss c13bcac2e6 LUCENE-10370: temporarily ignore TestNRTReplication. 2022-01-10 22:18:28 +01:00
Greg Miller 674eb5a490 LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts (#585) 2022-01-10 10:36:19 -08:00
Marc D'mello c7650cdec2 LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets (#509) 2022-01-10 09:24:53 -08:00
gf2121 12c526595c LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll() (#578) 2022-01-10 07:45:33 -08:00
Adrien Grand 78a1e6a8f7 Simplify some exception handling with try-with-resources. (#589) 2022-01-10 15:41:36 +01:00
Yannick Welsch 1998903a45 LUCENE-10291: Don't use CFS in testMinimalCodec (#593)
This test was occasionally failing on CI, as the test randomly installed a merge policy
that would force compound file creation while the goal of the test was not to do so.
2022-01-10 12:18:14 +00:00
Uwe Schindler 964006360c LUCENE-10364: Prepare and update errorprone plugin for Java 17 (#590) 2022-01-07 19:21:05 +01:00
Uwe Schindler 80d057fb03 Revert "Revert this change as module system work was not yet backported"
This reverts commit 336341ed71.
2022-01-06 19:06:30 +01:00
Dawid Weiss f568cfa15a LUCENE-10328: open up certain packages for junit and the test framework (reflective access). 2022-01-06 19:06:02 +01:00
Dawid Weiss bc6aa00c5f LUCENE-10328: Module path for compiling and running tests is wrong (#571) 2022-01-06 19:06:01 +01:00
Uwe Schindler 336341ed71 Revert this change as module system work was not yet backported 2022-01-06 18:53:32 +01:00
Robert Muir 0678611a50 LUCENE-10353: add random null injection to TestRandomChains (#586)
Co-authored-by: Uwe Schindler <uschindler@apache.org>, Robert Muir <rmuir@apache.org>
2022-01-06 16:58:37 +01:00
Adrien Grand a87cd9ae16 Fix path of docs for import into the website. (#524)
The current `svn import` looks for docs where they used to be produced by the
`Ant` build, but `Gradle` now puts them in a different place.
2022-01-06 09:27:03 +01:00
Adrien Grand 6c1b9f74e8 LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields. (#583) 2022-01-05 18:47:56 +01:00
Adrien Grand 7572352b79 LUCENE-10291: Bug fix. 2022-01-05 16:37:17 +01:00
Adrien Grand 5920486671 LUCENE-10291: CHANGES entry 2022-01-05 16:31:23 +01:00
Yannick Welsch 738247e78d LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) 2022-01-05 16:31:23 +01:00
Adrien Grand 2fd41b8ff2 Use CDN to download source release. (#529) 2022-01-05 15:54:49 +01:00
Adrien Grand 90cc570343 Modernize release announcement text. (#525)
It currently reads as if Lucene were only a full-text search library, when it can do much
more than that nowadays.
2022-01-05 15:54:09 +01:00
gf2121 76d83507be LUCENE-10343: Remove MyRandom in favor of test framework random (#573) 2022-01-05 15:54:06 +01:00
Uwe Schindler 75259417f1 LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582)
Co-authored-by: Robert Muir <rmuir@apache.org>
2022-01-05 15:37:17 +01:00
gf2121 3fda998fc7 LUCENE-10355: clean zeros (#584) 2022-01-05 15:23:34 +01:00
Mayya Sharipova 0a1cf31084 LUCENE-10351 Correct knn search failure with deleted docs (#580)
Currently, when doing a knn search on a segment where all documents
with the knn field were deleted, we get the following error:

maxSize must be > 0 and < 2147483630; got: 0
java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0
	at __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
	at org.apache.lucene.util.LongHeap.<init>(LongHeap.java:57)
	at org.apache.lucene.util.LongHeap$1.<init>(LongHeap.java:69)
	at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
	at org.apache.lucene.util.hnsw.NeighborQueue.<init>(NeighborQueue.java:41)
	at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)

This patch fixes this error and ensures empty TopDocs are returned when
the knn field doesn't have any documents left.
2022-01-04 16:56:49 -05:00
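The stack trace above comes from creating a heap with a maximum size of 0. A guard of the following shape avoids that failure. It clamps the requested k to the number of candidates and returns an empty result before allocating anything. This plain-Java sketch is not the actual patch. The names are hypothetical: `SafeTopK.topK` takes an array of scores for the documents that still have a vector. `java.util.PriorityQueue` likewise rejects an initial capacity below 1, so the early return matters here too.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: top-k selection that tolerates an empty candidate set. */
final class SafeTopK {
  /** Returns the indices of the k highest scores, best first; empty if there is nothing to search. */
  static int[] topK(float[] scores, int k) {
    int size = Math.min(k, scores.length);
    if (size <= 0) {
      // Nothing left to search: return an empty result instead of building a 0-sized heap.
      return new int[0];
    }
    // Min-heap on score keeps the best `size` candidates seen so far.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(size, (a, b) -> Float.compare(scores[a], scores[b]));
    for (int i = 0; i < scores.length; i++) {
      if (heap.size() < size) {
        heap.add(i);
      } else if (scores[i] > scores[heap.peek()]) {
        heap.poll(); // evict the current worst candidate
        heap.add(i);
      }
    }
    // Drain worst-first into the array from the back so the result is best-first.
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll();
    }
    return result;
  }
}
```

For scores `{0.1f, 0.9f, 0.5f}` and k = 2, this returns the indices 1 and 2. For an empty score array, it returns an empty array rather than throwing.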
Uwe Schindler fbc8923b0a Merge branch 'branch_9x' of https://gitbox.apache.org/repos/asf/lucene into branch_9x 2022-01-04 15:07:18 +01:00
Uwe Schindler d17c54b5ff LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader (#581) 2022-01-04 15:06:56 +01:00
Christine Poerschke 8a5b4a091c Update copyright year in NOTICE.txt file. 2022-01-04 10:45:20 +00:00
Dawid Weiss 78077d4d69 LUCENE-10347: add a helper task 'collectRuntimeJars' that assembles binary artifacts under each module's build 'runtimeJars' folder. (#576) 2022-01-03 21:12:46 +01:00