35879 Commits

Author SHA1 Message Date
Julie Tibshirani 9b6d417d1c LUCENE-10040: Update HnswGraph javadoc related to deletions
Previously it claimed the search method did not handle deletions.
2022-01-18 15:36:00 -08:00
Julie Tibshirani dfca9a5608
LUCENE-10375: Write merged vectors to file before building graph (#601)
When merging segments together, the `KnnVectorsWriter` creates a `VectorValues`
instance with a merged view of all the segments' vectors. This merged instance
is used when constructing the new HNSW graph. Graph building needs random
access, and the merged VectorValues support this by mapping from merged
ordinals to segments and segment ordinals. This mapping can add significant
overhead when building the graph.

This change updates the HNSW merging logic to first write the combined segment
vectors to a file, then use that file to build the graph. This helps speed
up segment merging, and also lets us simplify `VectorValuesMerger`, which
provides the merged view of vector values.
2022-01-18 13:53:05 -08:00
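To illustrate the overhead described in LUCENE-10375 above, here is a minimal sketch contrasting a merged view, which must map every merged ordinal back to a segment and a segment-local ordinal, with a flattened copy that supports direct random access. The class `MergedOrdinals` and its methods are hypothetical names invented for this illustration; they are not Lucene APIs, and in-memory arrays stand in for the file that the actual change writes.

```java
class MergedOrdinals {
  // starts[i] is the first merged ordinal that belongs to segment i (hypothetical layout).
  private final int[] starts;
  private final float[][][] segments;
  private final int size;

  MergedOrdinals(float[][][] segments) {
    this.segments = segments;
    this.starts = new int[segments.length];
    int total = 0;
    for (int i = 0; i < segments.length; i++) {
      starts[i] = total;
      total += segments[i].length;
    }
    this.size = total;
  }

  /** Merged view: every lookup pays for a search over the segment starts. */
  float[] vector(int mergedOrd) {
    if (mergedOrd < 0 || mergedOrd >= size) {
      throw new IndexOutOfBoundsException("ordinal " + mergedOrd);
    }
    // Find the last segment whose start is <= mergedOrd (this skips empty segments).
    int lo = 0;
    int hi = starts.length - 1;
    while (lo < hi) {
      int mid = (lo + hi + 1) >>> 1;
      if (starts[mid] <= mergedOrd) {
        lo = mid;
      } else {
        hi = mid - 1;
      }
    }
    return segments[lo][mergedOrd - starts[lo]];
  }

  /** Flattened copy: one pass up front, then lookups are plain array indexing. */
  float[][] flatten() {
    float[][] flat = new float[size][];
    int ord = 0;
    for (float[][] segment : segments) {
      for (float[] v : segment) {
        flat[ord++] = v;
      }
    }
    return flat;
  }
}
```

Graph construction reads vectors many times in random order, so paying the copy cost once can be cheaper than repeating the ordinal mapping on every access.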
Alan Woodward 2e2c4818d1
LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator() (#603)
The sort position parameter in SortField.getComparator() is only ever used
to determine whether or not skipping should be enabled on a given comparator,
so the parameter name should reflect that.  This commit also explicitly disables
skipping in a number of cases where it is never used, in particular CheckIndex
and the grouping collectors.
2022-01-17 10:44:57 +00:00
Adrien Grand 457367e9b7 LUCENE-10168: Fix typo that would _not_ run nightly tests. 2022-01-14 13:51:16 +01:00
Greg Miller 2f5e3c323b
LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll (#605)
Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2022-01-13 09:17:55 -08:00
Mayya Sharipova bd2cc4124d
Small edits for KnnGraphTester (#575)
1. Correct the remaining size for input files larger
than Integer.MAX_VALUE, as currently with every
iteration we try to map the next blockSize of bytes
even if fewer than blockSize bytes are left in the file.

2. Correct java.lang.ClassCastException when retrieving
KnnGraphValues for stats printing.

3. Add an option for euclidean metric
2022-01-12 17:23:10 -05:00
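Item 1 of the KnnGraphTester commit above concerns clamping each mapped block to the bytes that remain. A minimal sketch of that calculation follows, assuming a hypothetical helper `blockLengths` that is not part of KnnGraphTester; it only computes the lengths and does no file mapping.

```java
import java.util.ArrayList;
import java.util.List;

class BlockLengths {
  /** Returns the length of each block when totalBytes is read in chunks of at most blockSize. */
  static List<Long> blockLengths(long totalBytes, long blockSize) {
    if (blockSize <= 0) {
      throw new IllegalArgumentException("blockSize must be > 0");
    }
    List<Long> lengths = new ArrayList<>();
    long offset = 0;
    while (offset < totalBytes) {
      long remaining = totalBytes - offset;
      // Clamp the final block so it never extends past the end of the file.
      long length = Math.min(blockSize, remaining);
      lengths.add(length);
      offset += length;
    }
    return lengths;
  }
}
```

Using `long` for offsets and sizes keeps the arithmetic valid for inputs larger than Integer.MAX_VALUE bytes.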
gf2121 8d9fa6dba1
revert LUCENE-10355 (#597)
Trying to find the source of taxo-facet performance regression. See also LUCENE-10374

Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2022-01-12 10:23:13 -08:00
Adrien Grand 71dfa9e9cd
addBackcompatIndexes.py should use Gradle, not Ant. (#531) 2022-01-12 18:55:59 +01:00
Uwe Schindler 636d42e032 Fix wrong project name 2022-01-11 17:42:21 +01:00
Nikola Grcevski bad65c53c9
LUCENE-10369: Move DelegatingCacheHelper to FilterDirectoryReader (#596) 2022-01-11 15:22:06 +01:00
Adrien Grand 308ddd7502
Add documentation on file formats. (#598) 2022-01-11 15:16:05 +01:00
Adrien Grand f81c760cc8 LUCENE-10370: Fix precommit. 2022-01-11 10:13:10 +01:00
Dawid Weiss 9b54fbaa01 LUCENE-10370: temporarily ignore TestStressNRTReplication 2022-01-11 09:25:31 +01:00
Greg Miller 82703757fe
LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities (#543) 2022-01-10 13:48:36 -08:00
Dawid Weiss bff930c1bf LUCENE-10370: temporarily ignore TestNRTReplication. 2022-01-10 22:18:12 +01:00
Greg Miller cf12b46092
LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts (#585) 2022-01-10 10:23:46 -08:00
Greg Miller eb0b1bf9f1 Add CHANGES entry for LUCENE-10250 2022-01-10 08:57:28 -08:00
Marc D'mello b4e27f2c63
LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets (#509) 2022-01-10 08:52:14 -08:00
gf2121 e750f6cd37
LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll() (#578) 2022-01-10 07:43:09 -08:00
Adrien Grand 2ebc57a465
LUCENE-10283: Bump minimum required Java version to 17. (#579)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-01-10 15:42:15 +01:00
Adrien Grand 74698994a9
Simplify some exception handling with try-with-resources. (#589) 2022-01-10 15:40:47 +01:00
Yannick Welsch d9d65ab849
LUCENE-10291: Don't use CFS in testMinimalCodec (#593)
This test was occasionally failing on CI, as the test randomly installed a merge policy
that would force compound file creation while the goal of the test was not to do so.
2022-01-10 12:17:45 +00:00
Uwe Schindler 42fe2d5620
LUCENE-10364: Prepare and update errorprone plugin for Java 17 (#590) 2022-01-07 19:19:46 +01:00
zacharymorn d0ad9f5bfc
LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues (#534) 2022-01-06 22:14:41 -08:00
Robert Muir f2e00bb9e0
LUCENE-10353: add random null injection to TestRandomChains (#586)
Co-authored-by: Uwe Schindler <uschindler@apache.org>, Robert Muir <rmuir@apache.org>
2022-01-06 16:56:49 +01:00
Adrien Grand 603a43f668
Fix path of docs for import into the website. (#524)
The current `svn import` looks for docs where they used to be produced by the
`Ant` build, but `Gradle` now puts them in a different place.
2022-01-06 09:26:45 +01:00
Dawid Weiss b8da9f32c8 LUCENE-10328: open up certain packages for junit and the test framework (reflective access). 2022-01-05 21:02:51 +01:00
Dawid Weiss ff547e7bbd
LUCENE-10328: Module path for compiling and running tests is wrong (#571) 2022-01-05 20:42:02 +01:00
Adrien Grand c8651afde7
LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields. (#583) 2022-01-05 18:47:35 +01:00
Adrien Grand 7fdba36941 LUCENE-10291: Bug fix. 2022-01-05 16:37:37 +01:00
Adrien Grand f9ff620ec6 LUCENE-10291: CHANGES entry 2022-01-05 16:30:58 +01:00
Yannick Welsch 8fa7412dec
LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) 2022-01-05 16:28:00 +01:00
Adrien Grand 65296e5f84
Use CDN to download source release. (#529) 2022-01-05 15:54:33 +01:00
Adrien Grand 6149387f7c
Modernize release announcement text. (#525)
It currently reads as if Lucene were only a full-text search library, when it
can do much more than that nowadays.
2022-01-05 15:53:49 +01:00
Uwe Schindler 475fbd0bdd
LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582)
Co-authored-by: Robert Muir <rmuir@apache.org>
2022-01-05 15:35:02 +01:00
gf2121 238119224a
LUCENE-10343: Remove MyRandom in favor of test framework random (#573) 2022-01-05 15:31:00 +01:00
gf2121 60b80017cb
LUCENE-10355: clean zeros (#584) 2022-01-05 15:23:16 +01:00
Mayya Sharipova 78da703037
LUCENE-10351 Correct knn search failure with deleted docs (#580)
Currently, when doing a knn search on a segment where all documents
with a knn field were deleted, we get the following error:

maxSize must be > 0 and < 2147483630; got: 0
java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0
	at __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
	at org.apache.lucene.util.LongHeap.<init>(LongHeap.java:57)
	at org.apache.lucene.util.LongHeap$1.<init>(LongHeap.java:69)
	at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
	at org.apache.lucene.util.hnsw.NeighborQueue.<init>(NeighborQueue.java:41)
	at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)

This patch fixes this error and ensures empty TopDocs are returned when
the knn field doesn't have any documents left.
2022-01-04 15:59:30 -05:00
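The failure in LUCENE-10351 above comes from creating a bounded heap with a maximum size of 0. The sketch below shows the general guard pattern with the JDK's `java.util.PriorityQueue`, whose constructor likewise rejects a capacity below 1. The class `EmptySafeTopK` and its `topK` method are hypothetical and are not the actual Lucene patch.

```java
import java.util.PriorityQueue;

class EmptySafeTopK {
  /** Returns up to k of the highest scores in descending order, or an empty array if there are no candidates. */
  static float[] topK(float[] scores, int k) {
    int maxSize = Math.min(k, scores.length);
    if (maxSize <= 0) {
      // Nothing to collect: return an empty result instead of building a zero-sized heap.
      return new float[0];
    }
    // Min-heap holding the best maxSize scores seen so far.
    PriorityQueue<Float> heap = new PriorityQueue<>(maxSize);
    for (float score : scores) {
      if (heap.size() < maxSize) {
        heap.add(score);
      } else if (score > heap.peek()) {
        heap.poll();
        heap.add(score);
      }
    }
    // Poll returns the smallest first, so fill the result from the back.
    float[] result = new float[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll();
    }
    return result;
  }
}
```

Handling the empty case before any heap is allocated lets callers treat "no live documents" as an ordinary empty result.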
Uwe Schindler 4bacf93c7e
LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader (#581) 2022-01-04 15:05:29 +01:00
Christine Poerschke ef1a554204 Update copyright year in NOTICE.txt file. 2022-01-04 10:43:46 +00:00
Dawid Weiss 0f0d06ca28
LUCENE-10347: add a helper task 'collectRuntimeJars' that assembles binary artifacts under each module's build 'runtimeJars' folder. (#576) 2022-01-03 21:11:35 +01:00
Adrien Grand cc5634f0f1 Remove unused backward indices. 2022-01-03 15:17:47 +01:00
Uwe Schindler 305d9ebb86
LUCENE-10349: Cleanup WordListLoader to use try-with-resources and make the default stop words unmodifiable (#577) 2022-01-03 15:07:44 +01:00
Adrien Grand 835e821287 LUCENE-10346: Move CHANGES entry to 9.1. 2022-01-03 15:04:24 +01:00
Uwe Schindler 8b5887f244 LUCENE-10287: Remove obsolete changes entry (we now have a warning and won't rely on the module when starting luke) 2022-01-03 14:59:12 +01:00
gf2121 26713b3f57
LUCENE-10346: Specially treat SingletonSortedNumericDocValues in FastTaxonomyFacetCounts#countAll() (#574) 2022-01-03 14:44:05 +01:00
David Smiley 1ee11a8497
LUCENE-10252: ValueSource.asDoubleValues should not compute the score (#519)
ValueSource.asDoubleValues and asLongValues should not compute the score unless asked to -- typically never.  This fixes a performance regression since 7.3 LUCENE-8099 when some older boosting queries were replaced with this.
2022-01-03 08:26:50 -05:00
Uwe Schindler 1c7ad19bf0 LUCENE-10335: Fix ECJ warning 2022-01-03 11:34:39 +01:00
Uwe Schindler cc342ea740
LUCENE-10335: Deprecate helper methods for resource loading in IOUtils and StopwordAnalyzerBase that are not compatible with module system; add utility method IOUtils#requireResourceNonNull(T) and add ModuleResourceLoader as complement to ClasspathResourceLoader
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2022-01-03 10:38:19 +01:00
Uwe Schindler 0b517573a4
LUCENE-10342: Add logging to static initializers to warn users if unmapping or object size calculation does not work (#572)
Co-authored-by: Tomoko Uchida <tomoko.uchida.1111@gmail.com>
2021-12-29 18:18:21 +01:00