35737 Commits

Author SHA1 Message Date
Greg Miller
eaf3cb6739 Fix minor bug that snuck in with LUCENE-9952 2022-01-24 06:58:31 -08:00
Greg Miller
9e560c1af1
LUCENE-9952: Fix dim count inaccuracies in SSDV faceting when a dim is multi-valued (#611) 2022-01-24 06:48:20 -08:00
Greg Miller
10ca531ddc
LUCENE-10381: Require users to provide FacetsConfig for SSDV faceting (#613) 2022-01-24 06:46:22 -08:00
Julie Tibshirani
fb09ae1f7c Undo accidental change to build.gradle 2022-01-23 16:26:16 -08:00
Julie Tibshirani
7ece8145bc
LUCENE-10375: Write vectors to file in flush (#617)
In a previous commit, we updated HNSW merge to first write the combined segment
vectors to a file, then use that file to build the graph. This commit applies
the same strategy to flush, which lets us use the same logic for flush and
merge.
2022-01-23 16:19:23 -08:00
Dawid Weiss
08d6633d94 LUCENE-8930: increase timeout for the launched luke. 2022-01-20 16:51:05 +01:00
Ignacio Vera
4ec8f865c8
LUCENE-10288: Check BKD tree shape for lucene pre-8.6 1D indexes (#607)
Adds efficient logic to determine whether a tree is balanced or unbalanced for
indexes created before Lucene 8.6.
2022-01-20 07:49:29 +01:00
Dawid Weiss
72ba7ae2ee
LUCENE-8930: script testing in the distribution (#550) 2022-01-20 00:09:15 +09:00
Julie Tibshirani
9b6d417d1c LUCENE-10040: Update HnswGraph javadoc related to deletions
Previously it claimed the search method did not handle deletions.
2022-01-18 15:36:00 -08:00
Julie Tibshirani
dfca9a5608
LUCENE-10375: Write merged vectors to file before building graph (#601)
When merging segments together, the `KnnVectorsWriter` creates a `VectorValues`
instance with a merged view of all the segments' vectors. This merged instance
is used when constructing the new HNSW graph. Graph building needs random
access, and the merged VectorValues support this by mapping from merged
ordinals to segments and segment ordinals. This mapping can add significant
overhead when building the graph.

This change updates the HNSW merging logic to first write the combined segment
vectors to a file, then use that file to build the graph. This helps speed
up segment merging, and also lets us simplify `VectorValuesMerger`, which
provides the merged view of vector values.
2022-01-18 13:53:05 -08:00
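The LUCENE-10375 message above says that random access over a merged view has to map each merged ordinal to a segment and a segment-local ordinal. The sketch below is only an illustration of why that lookup costs something on every access. It is not Lucene's implementation: the class `MergedOrdinals` and its methods are hypothetical, and it assumes merged ordinals are simply the segments' ordinals concatenated in order.

```java
// Hypothetical illustration only; not Lucene's VectorValuesMerger.
final class MergedOrdinals {
  // starts[i] is the first merged ordinal of segment i;
  // the final entry holds the total number of vectors.
  private final int[] starts;

  MergedOrdinals(int[] segmentSizes) {
    starts = new int[segmentSizes.length + 1];
    for (int i = 0; i < segmentSizes.length; i++) {
      starts[i + 1] = starts[i] + segmentSizes[i];
    }
  }

  /** Returns the index of the segment that holds the given merged ordinal. */
  int segmentOf(int mergedOrd) {
    int total = starts[starts.length - 1];
    if (mergedOrd < 0 || mergedOrd >= total) {
      throw new IndexOutOfBoundsException("ordinal " + mergedOrd + " not in [0, " + total + ")");
    }
    // Binary search with invariant: starts[lo] <= mergedOrd < starts[hi].
    // Empty segments produce duplicate starts and are skipped naturally.
    int lo = 0;
    int hi = starts.length - 1;
    while (hi - lo > 1) {
      int mid = (lo + hi) >>> 1;
      if (starts[mid] <= mergedOrd) {
        lo = mid;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Returns the ordinal of the vector within its own segment. */
  int segmentOrdinal(int mergedOrd) {
    return mergedOrd - starts[segmentOf(mergedOrd)];
  }
}
```

Every random access pays for a search like this before it can read a vector. Once the combined vectors are written to a single file, a merged ordinal can be used as a direct index, which is the overhead the commit describes removing.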
Alan Woodward
2e2c4818d1
LUCENE-10377: Replace 'sortPos' with 'enableSkipping' in SortField.getComparator() (#603)
The sort position parameter in SortField.getComparator() is only ever used
to determine whether or not skipping should be enabled on a given comparator,
so the parameter name should reflect that. This commit also explicitly disables
skipping in a number of cases where it is never used, in particular CheckIndex
and the grouping collectors.
2022-01-17 10:44:57 +00:00
Adrien Grand
457367e9b7 LUCENE-10168: Fix typo that would _not_ run nightly tests. 2022-01-14 13:51:16 +01:00
Greg Miller
2f5e3c323b
LUCENE-10379: Count directly into the dense values array in FastTaxonomyFacetCounts#countAll (#605)
Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2022-01-13 09:17:55 -08:00
Mayya Sharipova
bd2cc4124d
Small edits for KnnGraphTester (#575)
1. Correct the remaining size for input files larger
than Integer.MAX_VALUE, as currently with every
iteration we try to map the next blockSize of bytes
even if fewer than blockSize bytes are left in the file.

2. Correct java.lang.ClassCastException when retrieving
KnnGraphValues for stats printing.

3. Add an option for the Euclidean metric.
2022-01-12 17:23:10 -05:00
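Item 1 of the KnnGraphTester commit above concerns reading a large file in fixed-size blocks without asking for more bytes than remain. A minimal sketch of that size calculation follows. The class `BlockSizes` and the method `blockLengths` are hypothetical names chosen for illustration and are not taken from KnnGraphTester.

```java
// Hypothetical illustration only; not the KnnGraphTester code.
final class BlockSizes {
  /**
   * Splits a file of fileSize bytes into consecutive blocks of at most
   * blockSize bytes and returns the length of each block. Only the last
   * block may be shorter than blockSize.
   */
  static long[] blockLengths(long fileSize, long blockSize) {
    if (fileSize < 0 || blockSize <= 0) {
      throw new IllegalArgumentException("fileSize must be >= 0 and blockSize must be > 0");
    }
    long count = fileSize / blockSize + (fileSize % blockSize == 0 ? 0 : 1);
    long[] lengths = new long[Math.toIntExact(count)];
    long offset = 0;
    for (int i = 0; i < lengths.length; i++) {
      long remaining = fileSize - offset;
      // The fix in spirit: never request more than what is left in the file.
      lengths[i] = Math.min(blockSize, remaining);
      offset += lengths[i];
    }
    return lengths;
  }
}
```

For example, `BlockSizes.blockLengths(10, 4)` yields blocks of 4, 4, and 2 bytes, where always requesting the full block size would overrun the end of the file on the last iteration.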
gf2121
8d9fa6dba1
revert LUCENE-10355 (#597)
Trying to find the source of taxo-facet performance regression. See also LUCENE-10374

Co-authored-by: guofeng.my <guofeng.my@bytedance.com>
2022-01-12 10:23:13 -08:00
Adrien Grand
71dfa9e9cd
addBackcompatIndexes.py should use Gradle, not Ant. (#531) 2022-01-12 18:55:59 +01:00
Uwe Schindler
636d42e032 Fix wrong project name 2022-01-11 17:42:21 +01:00
Nikola Grcevski
bad65c53c9
LUCENE-10369: Move DelegatingCacheHelper to FilterDirectoryReader (#596) 2022-01-11 15:22:06 +01:00
Adrien Grand
308ddd7502
Add documentation on file formats. (#598) 2022-01-11 15:16:05 +01:00
Adrien Grand
f81c760cc8 LUCENE-10370: Fix precommit. 2022-01-11 10:13:10 +01:00
Dawid Weiss
9b54fbaa01 LUCENE-10370: temporarily ignore TestStressNRTReplication 2022-01-11 09:25:31 +01:00
Greg Miller
82703757fe
LUCENE-10245: Addition of MultiDoubleValues(Source) and MultiLongValues(Source) along with faceting capabilities (#543) 2022-01-10 13:48:36 -08:00
Dawid Weiss
bff930c1bf LUCENE-10370: temporarily ignore TestNRTReplication. 2022-01-10 22:18:12 +01:00
Greg Miller
cf12b46092
LUCENE-10356: Further optimize facet counting for single-valued TaxonomyFacetCounts (#585) 2022-01-10 10:23:46 -08:00
Greg Miller
eb0b1bf9f1 Add CHANGES entry for LUCENE-10250 2022-01-10 08:57:28 -08:00
Marc D'mello
b4e27f2c63
LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets (#509) 2022-01-10 08:52:14 -08:00
gf2121
e750f6cd37
LUCENE-10350: Avoid some null checking for FastTaxonomyFacetCounts#countAll() (#578) 2022-01-10 07:43:09 -08:00
Adrien Grand
2ebc57a465
LUCENE-10283: Bump minimum required Java version to 17. (#579)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-01-10 15:42:15 +01:00
Adrien Grand
74698994a9
Simplify some exception handling with try-with-resources. (#589) 2022-01-10 15:40:47 +01:00
Yannick Welsch
d9d65ab849
LUCENE-10291: Don't use CFS in testMinimalCodec (#593)
This test was occasionally failing on CI, as the test randomly installed a merge policy
that would force compound file creation while the goal of the test was not to do so.
2022-01-10 12:17:45 +00:00
Uwe Schindler
42fe2d5620
LUCENE-10364: Prepare and update errorprone plugin for Java 17 (#590) 2022-01-07 19:19:46 +01:00
zacharymorn
d0ad9f5bfc
LUCENE-10183: KnnVectorsWriter#writeField to take KnnVectorsReader instead of VectorValues (#534) 2022-01-06 22:14:41 -08:00
Robert Muir
f2e00bb9e0
LUCENE-10353: add random null injection to TestRandomChains (#586)
Co-authored-by: Uwe Schindler <uschindler@apache.org>, Robert Muir <rmuir@apache.org>
2022-01-06 16:56:49 +01:00
Adrien Grand
603a43f668
Fix path of docs for import into the website. (#524)
The current `svn import` looks for docs where they used to be produced by the
`Ant` build, but `Gradle` now puts them in a different place.
2022-01-06 09:26:45 +01:00
Dawid Weiss
b8da9f32c8 LUCENE-10328: open up certain packages for junit and the test framework (reflective access). 2022-01-05 21:02:51 +01:00
Dawid Weiss
ff547e7bbd
LUCENE-10328: Module path for compiling and running tests is wrong (#571) 2022-01-05 20:42:02 +01:00
Adrien Grand
c8651afde7
LUCENE-10354: Clarify contract of codec APIs with missing/disabled fields. (#583) 2022-01-05 18:47:35 +01:00
Adrien Grand
7fdba36941 LUCENE-10291: Bug fix. 2022-01-05 16:37:37 +01:00
Adrien Grand
f9ff620ec6 LUCENE-10291: CHANGES entry 2022-01-05 16:30:58 +01:00
Yannick Welsch
8fa7412dec
LUCENE-10291: Only read/write postings when there is at least one indexed field (#539) 2022-01-05 16:28:00 +01:00
Adrien Grand
65296e5f84
Use CDN to download source release. (#529) 2022-01-05 15:54:33 +01:00
Adrien Grand
6149387f7c
Modernize release announcement text. (#525)
It currently reads as if Lucene were only a full-text search library, when it
can do much more than that nowadays.
2022-01-05 15:53:49 +01:00
Uwe Schindler
475fbd0bdd
LUCENE-10352: Convert TestAllAnalyzersHaveFactories and TestRandomChains to a global integration test and discover classes to check from module system (#582)
Co-authored-by: Robert Muir <rmuir@apache.org>
2022-01-05 15:35:02 +01:00
gf2121
238119224a
LUCENE-10343: Remove MyRandom in favor of test framework random (#573) 2022-01-05 15:31:00 +01:00
gf2121
60b80017cb
LUCENE-10355: clean zeros (#584) 2022-01-05 15:23:16 +01:00
Mayya Sharipova
78da703037
LUCENE-10351 Correct knn search failure with deleted docs (#580)
Currently, when doing a knn search on a segment where all documents
with a knn field were deleted, we get the following error:

maxSize must be > 0 and < 2147483630; got: 0
java.lang.IllegalArgumentException: maxSize must be > 0 and < 2147483630; got: 0
	at __randomizedtesting.SeedInfo.seed([43F1F124D7076A4E:1B860BFCCB9B0BB5]:0)
	at org.apache.lucene.util.LongHeap.<init>(LongHeap.java:57)
	at org.apache.lucene.util.LongHeap$1.<init>(LongHeap.java:69)
	at org.apache.lucene.util.LongHeap.create(LongHeap.java:69)
	at org.apache.lucene.util.hnsw.NeighborQueue.<init>(NeighborQueue.java:41)
	at org.apache.lucene.util.hnsw.HnswGraph.search(HnswGraph.java:105)

This patch fixes this error and ensures that empty TopDocs are returned when
the knn field has no documents left.
2022-01-04 15:59:30 -05:00
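The failure in LUCENE-10351 above comes from building a bounded queue of size 0 when no live documents remain. The sketch below shows the general shape of such a guard using only the JDK. It is an assumption-laden stand-in, not the actual patch: `KnnSearchSketch` and `topK` are hypothetical, a brute-force scan replaces the HNSW graph search, and `java.util.PriorityQueue` plays the role of Lucene's `LongHeap` (it also rejects an initial capacity below 1).

```java
import java.util.BitSet;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the LUCENE-10351 patch.
final class KnnSearchSketch {
  /** Returns up to k live doc IDs, ordered from highest to lowest score. */
  static int[] topK(float[] scores, BitSet liveDocs, int k) {
    // Restrict the live set to documents that actually have a score.
    BitSet live = liveDocs.get(0, scores.length);
    int size = Math.min(k, live.cardinality());
    if (size <= 0) {
      // Guard: with nothing to collect, return an empty result instead of
      // constructing a zero-capacity queue, which would throw.
      return new int[0];
    }
    // Min-heap on score, so the weakest collected hit sits at the top.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>(size, (a, b) -> Float.compare(scores[a], scores[b]));
    for (int doc = live.nextSetBit(0); doc >= 0; doc = live.nextSetBit(doc + 1)) {
      if (heap.size() < size) {
        heap.add(doc);
      } else if (scores[doc] > scores[heap.peek()]) {
        heap.poll();
        heap.add(doc);
      }
    }
    // Drain from weakest to strongest, filling the array back to front.
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll();
    }
    return result;
  }
}
```

Calling `topK` with an empty `BitSet` returns an empty array rather than throwing, which mirrors the commit's stated behavior of returning empty TopDocs.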
Uwe Schindler
4bacf93c7e
LUCENE-10348: Make stopwords resources from analyzers modules visible to ClasspathResourceLoader and ModuleResourceLoader (#581) 2022-01-04 15:05:29 +01:00
Christine Poerschke
ef1a554204 Update copyright year in NOTICE.txt file. 2022-01-04 10:43:46 +00:00
Dawid Weiss
0f0d06ca28
LUCENE-10347: add a helper task 'collectRuntimeJars' that assembles binary artifacts under each module's build 'runtimeJars' folder. (#576) 2022-01-03 21:11:35 +01:00
Adrien Grand
cc5634f0f1 Remove unused backward indices. 2022-01-03 15:17:47 +01:00