35476 Commits

Author SHA1 Message Date
Patrick Zhai
6b99f03cdd LUCENE-10122: Use NumericDocValues to store taxonomy parent array (#454) 2021-11-19 13:05:56 -05:00
Quentin Pradet
631d1ad749 LUCENE-10085: Implement Weight#count on DocValuesFieldExistsQuery (#445)
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-11-19 18:07:29 +01:00
Robert Muir
ee56d31425 LUCENE-10239: upgrade jflex (1.7.0 -> 1.8.2) (#452)
Upgrade jflex.

This change doesn't alter the behavior of any of the analyzers (no Unicode
version or grammar refactorings); it is just the minimum needed to get the
new tooling working.
2021-11-19 09:28:11 -05:00
Ignacio Vera
9adf7e27f9 LUCENE-9820: Separate logic for reading the BKD index from the logic for intersecting it (#7) (#457)
Extract BKD tree interface and move intersecting logic to the PointValues abstract class.
2021-11-19 08:39:28 +01:00
Jim Ferenczi
2e5c4bb5a5 LUCENE-10208: Ensure that the minimum competitive score does not decrease in concurrent search (#431)
Co-authored-by: Adrien Grand <jpountz@gmail.com>
2021-11-18 17:33:04 +01:00
Andriy Redko
42bee6f223 LUCENE-10242: The TopScoreDocCollector::createSharedManager should use ScoreDoc instead of FieldDoc (#450)
Signed-off-by: Andriy Redko <andriy.redko@aiven.io>
2021-11-18 16:36:32 +01:00
Dawid Weiss
8d07018050 LUCENE-10240: gradle regenerate fails on Java 17 (#449) 2021-11-17 18:36:58 +01:00
Dawid Weiss
4c22d30f80 LUCENE-10238: Update icu4j to 70.1. (#447) 2021-11-17 18:14:33 +01:00
Adrien Grand
7ce0cfa9c5 Add back-compat indices for 8.11.0 2021-11-17 11:51:18 +01:00
Bruno Roustant
02a63f688c LUCENE-10225: Improve IntroSelector with 3-way partitioning. 2021-11-17 11:31:11 +01:00
Adrien Grand
b6f456573a DOAP changes for release 8.11.0 2021-11-16 10:55:08 +01:00
Dawid Weiss
9d0eb88d2c LUCENE-10234: Add automatic module name to JAR manifests. (#440) 2021-11-15 17:03:08 +01:00
Quentin Pradet
e034a2d6e2 LUCENE-10085: Rename DocValuesFieldExistsQuery test (#441)
FieldValueQuery got renamed to DocValuesFieldExistsQuery but the test
wasn't renamed.
2021-11-15 16:24:57 +01:00
Julie Tibshirani
607b10dc2a LUCENE-10069: Document that kNN queries might not return all results (#434)
Performing a kNN search with a very large k may return fewer than k documents.
This is because the HNSW graph is not guaranteed to be connected.
This commit documents the behavior as part of a general warning that the results
of a kNN search may be approximate.
2021-11-12 14:20:09 -08:00
Julie Tibshirani
68be365283 LUCENE-10063: Fix score calculation in SimpleTextKnnVectorsFormat
The method VectorSimilarityFunction#convertToScore already reverses the
similarity, so we shouldn't reverse it again.
2021-11-11 11:36:50 -08:00
Julie Tibshirani
9c73562161 LUCENE-10228: Ensure PerFieldKnnVectorsFormat uses right format name (#432)
Before, when creating a KnnVectorsWriter for merging, we consulted the existing
"PER_FIELD_SUFFIX_KEY" attribute to determine the format's per-field suffix.
This isn't correct, since we could be using a new codec (one that produces
different formats/suffixes).

This commit modifies TestPerFieldDocValuesFormat#testMergeUsesNewFormat to
trigger the problem. Without the fix, it throws an error like
"java.nio.file.FileAlreadyExistsException: File
"_3_Lucene90HnswVectorsFormat_0.vem" was already written to."
2021-11-11 11:22:52 -08:00
Dawid Weiss
ff9ee28c60 LUCENE-10223: interval support in standard syntax parser (#429) 2021-11-11 08:56:48 +01:00
Dawid Weiss
238cd5fd0c LUCENE-10226: test target creates a weird folder (lazy property). 2021-11-09 08:38:42 +01:00
Dawid Weiss
ffe40d23e1 LUCENE-10222: Enable GitHub precommit check workflow on branch_9x 2021-11-05 09:01:45 +01:00
Dawid Weiss
5de05f3556 LUCENE-10220: Add a utility method to get IntervalSource from analyzed text (or token stream) (#427) 2021-11-05 08:58:37 +01:00
Uwe Schindler
6ccee3204f LUCENE-10218: Extend validateSourcePatterns task to scan for LTR/RTL Unicode to catch "Trojan Source" source code attacks (#425)
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2021-11-03 17:21:15 +01:00
Adrien Grand
5fa093bdba Format javadocs of new versions in a way that Spotless is happy with. 2021-11-02 13:23:45 +01:00
Adrien Grand
713385004f Add next minor version 9.1.0 2021-11-02 13:20:20 +01:00
Adrien Grand
cc2a31f2be LUCENE-10103: Move CHANGES entry to correct version. 2021-11-02 10:35:55 +01:00
Bruno Roustant
63b9e603e6 LUCENE-10196: Improve IntroSorter with 3-way partitioning. 2021-11-01 10:55:44 +01:00
Dawid Weiss
0544819b78 LUCENE-10200: store git revision in the release folder and read it back from buildAndPushRelease (#419) 2021-11-01 09:29:06 +01:00
Dawid Weiss
1d152c5f67 LUCENE-10192: drop jars from binary distribution and an aggregate merge of related minor tasks. 2021-10-31 10:50:11 +01:00
Dawid Weiss
98b17952f9 LUCENE-10213: Use Unicode escapes in message property files in Luke (remove hacks) 2021-10-31 10:41:54 +01:00
Dawid Weiss
ded915b29b LUCENE-10192: Use modules instead of classpath for binary distribution testing. 2021-10-31 10:41:49 +01:00
Dawid Weiss
01839da593 LUCENE-10192: Adjust checks to the new binary file structure. 2021-10-31 10:41:42 +01:00
Dawid Weiss
d23f37d02d LUCENE-10200: The branch does not have to be on origin remote. Replace this logic with a check whether the branch is up to date with the remote. 2021-10-31 10:41:32 +01:00
Dawid Weiss
6d8ea58ccd LUCENE-10200: Rename pddl-10.txt to reference glove. 2021-10-31 10:41:16 +01:00
Dawid Weiss
7f7007966e LUCENE-10192: No need for hacky classpath, add the log4j module to the root set. Automatic modules have access to all other modules by default. 2021-10-31 10:41:10 +01:00
Dawid Weiss
627ef4d469 LUCENE-9978: Integrate Luke with the binary release package. 2021-10-31 10:40:51 +01:00
Dawid Weiss
39d388330c LUCENE-10192: Move the test framework to a separate top-level folder. I'm not even sure it really needs to be in the binary distribution, but it is distinctly different from the rest of the modules. 2021-10-31 10:40:45 +01:00
Dawid Weiss
fda47a24f8 LUCENE-10192: Flatten the modules into a single jar folder to allow --module-path to be used. So much simpler. 2021-10-31 10:40:23 +01:00
Dawid Weiss
bcdfc4c8c9 LUCENE-10192: drop third party jars from the binary distribution. 2021-10-31 10:36:33 +01:00
Michael Sokolov
84a4797d14 Apply query score conversion to vector similarities in SimpleTextKnnVectorReader 2021-10-30 21:26:17 -04:00
David Smiley
c2c215d3a8 LUCENE-10201: Upgrade Spatial4j to 0.8 (#409)
Upgrades Spatial4j to 0.8, which improves a variety of minor things.
See release notes:
https://github.com/locationtech/spatial4j/releases/tag/spatial4j-0.8

The test-only dependency on JTS is upgraded to 1.17 as well.
2021-10-29 22:01:52 -04:00
Mike Drob
23256a30fa Replace deprecated Gradle 7.2 properties (#417) 2021-10-29 09:59:47 -05:00
Adrien Grand
53b40e0fb7 LUCENE-10145: Revert change to computeMinMax.
This part of the change would call `ArrayUtil#getUnsignedComparator` on a
length that is rarely 4 or 8. In such cases it's better to use
`Arrays#compareUnsigned`.
2021-10-28 16:29:05 +02:00
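The reasoning in the LUCENE-10145 entry above, that `Arrays#compareUnsigned` handles slices of any length, can be illustrated with a small standalone Java sketch. It assumes a hypothetical layout in which each packed point value stores `numDims` dimensions of `bytesPerDim` bytes, compared as unsigned big-endian bytes; `MinMaxSketch` and `computeMinMax` are illustrative names, not Lucene's actual implementation. It relies only on `java.util.Arrays.compareUnsigned(byte[], int, int, byte[], int, int)`, available since Java 9.

```java
import java.util.Arrays;

class MinMaxSketch {
  // Hypothetical helper: per-dimension min and max over packed values.
  // Each value holds numDims dimensions of bytesPerDim bytes, compared unsigned.
  static void computeMinMax(byte[][] values, int numDims, int bytesPerDim,
                            byte[] minPacked, byte[] maxPacked) {
    int packedLength = numDims * bytesPerDim;
    // Seed min and max with the first value.
    System.arraycopy(values[0], 0, minPacked, 0, packedLength);
    System.arraycopy(values[0], 0, maxPacked, 0, packedLength);
    for (int i = 1; i < values.length; i++) {
      for (int dim = 0; dim < numDims; dim++) {
        int from = dim * bytesPerDim;
        int to = from + bytesPerDim;
        // Works for any slice length, so 4- and 8-byte widths need no special case.
        if (Arrays.compareUnsigned(values[i], from, to, minPacked, from, to) < 0) {
          System.arraycopy(values[i], from, minPacked, from, bytesPerDim);
        }
        if (Arrays.compareUnsigned(values[i], from, to, maxPacked, from, to) > 0) {
          System.arraycopy(values[i], from, maxPacked, from, bytesPerDim);
        }
      }
    }
  }

  public static void main(String[] args) {
    // Two dimensions of 3 bytes each; unsigned order puts 0xFF above 0x01.
    byte[][] values = {
      {0x00, 0x00, 0x05, 0x00, 0x00, 0x01},
      {(byte) 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00},
      {0x00, 0x01, 0x00, 0x00, 0x00, 0x02},
    };
    byte[] min = new byte[6];
    byte[] max = new byte[6];
    computeMinMax(values, 2, 3, min, max);
    System.out.println(Arrays.toString(min)); // [0, 0, 5, 0, 0, 0]
    System.out.println(Arrays.toString(max)); // [-1, 0, 0, 0, 0, 2]
  }
}
```

Because the JDK comparison accepts arbitrary ranges, one loop serves every dimension width, which is why a width-specialized comparator buys little when the width is rarely 4 or 8 bytes.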
Mike McCandless
512cad0e01 LUCENE-9673: fix IntBlockPool's slice allocator to actually grow properly with larger and larger slice-chained int[]; excise RAM wasted by unused (over-allocated) int[] used to track in-memory postings 2021-10-28 09:37:36 -04:00
Dawid Weiss
727c6b1e0b LUCENE-10209: Temporarily comment out gradle validation. 2021-10-27 21:12:14 +02:00
Dawid Weiss
62eb9a809e LUCENE-10200: remove unused dangling license exclusions. Add references to the remaining ones. 2021-10-27 20:40:39 +02:00
Julie Tibshirani
abd5ec4ff0 LUCENE-9614: Fix KnnVectorQuery failure when numDocs is 0 (#413)
When the reader has no live docs, `KnnVectorQuery` can error out. This happens
because `IndexReader#numDocs` is 0, and we end up passing an illegal value of
`k = 0` to the search method.

This commit removes the problematic optimization in `KnnVectorQuery` and
replaces it with a lower-level one based on the total number of vectors in the segment.
2021-10-27 11:08:47 -07:00
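The fix described in the LUCENE-9614 entry can be sketched as follows, under loudly stated assumptions: a brute-force dot-product scan stands in for the HNSW graph search, and `SegmentKnnSketch`, `effectiveK` and `search` are hypothetical names, not Lucene APIs. What it shows is k being bounded by the number of vectors the segment actually holds, so an empty segment short-circuits instead of passing `k = 0` to the search routine.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class SegmentKnnSketch {
  // Hypothetical: clamp k to the segment's vector count rather than to
  // IndexReader#numDocs, which is 0 when every document has been deleted.
  static int effectiveK(int requestedK, int vectorsInSegment) {
    return Math.min(requestedK, vectorsInSegment);
  }

  // Brute-force stand-in for the graph search: ids of the top-k vectors
  // by dot product, best first.
  static int[] search(float[][] vectors, float[] query, int requestedK) {
    int k = effectiveK(requestedK, vectors.length);
    if (k <= 0) {
      return new int[0]; // nothing to search; never hand k = 0 to the search routine
    }
    float[] scores = new float[vectors.length];
    for (int i = 0; i < vectors.length; i++) {
      scores[i] = dot(vectors[i], query);
    }
    // Min-heap on score keeps the k best candidates seen so far.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((a, b) -> Float.compare(scores[a], scores[b]));
    for (int i = 0; i < vectors.length; i++) {
      heap.offer(i);
      if (heap.size() > k) {
        heap.poll(); // drop the current worst candidate
      }
    }
    int[] result = new int[heap.size()];
    for (int i = result.length - 1; i >= 0; i--) {
      result[i] = heap.poll(); // worst comes out first, so fill from the back
    }
    return result;
  }

  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[][] vectors = {{1f, 0f}, {0f, 1f}, {0.7f, 0.7f}};
    float[] query = {1f, 0f};
    System.out.println(Arrays.toString(search(vectors, query, 10))); // [0, 2, 1]
    System.out.println(search(new float[0][], query, 10).length); // 0
  }
}
```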
Nik Everett
941df98c3f LUCENE-10206: Implement O(1) count on query cache (#415)
When we load a query into the query cache we always calculate the count
of matching documents. This uses that count to power the new `O(1)`
`Weight#count` method.
2021-10-27 10:20:10 +02:00
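The idea behind LUCENE-10206 can be sketched in plain Java: pay for the count once, when an entry is loaded into the cache, and answer later count requests from the stored value. In this hypothetical sketch a `String` key and a `java.util.BitSet` stand in for Lucene's cache keys and cached doc ID sets, and `CountingQueryCacheSketch` is an illustrative name; returning -1 for an uncached query is assumed to mirror the "cannot count cheaply" convention of `Weight#count`.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CountingQueryCacheSketch {
  // Hypothetical cache entry: matching docs plus their count, computed once.
  static final class CacheEntry {
    final BitSet docs;
    final int count;

    CacheEntry(BitSet docs) {
      this.docs = docs;
      this.count = docs.cardinality(); // paid once, when the entry is cached
    }
  }

  private final Map<String, CacheEntry> cache = new HashMap<>();

  // Loads (or reuses) the cached doc set for a query key.
  BitSet getOrLoad(String queryKey, Supplier<BitSet> loader) {
    return cache.computeIfAbsent(queryKey, k -> new CacheEntry(loader.get())).docs;
  }

  // O(1) count for cached queries; -1 tells the caller to fall back to iterating matches.
  int count(String queryKey) {
    CacheEntry entry = cache.get(queryKey);
    return entry == null ? -1 : entry.count;
  }

  public static void main(String[] args) {
    CountingQueryCacheSketch cache = new CountingQueryCacheSketch();
    System.out.println(cache.count("field:value")); // -1
    BitSet matches = new BitSet();
    matches.set(3);
    matches.set(7);
    matches.set(42);
    cache.getOrLoad("field:value", () -> matches);
    System.out.println(cache.count("field:value")); // 3
  }
}
```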
Dawid Weiss
1613355149 LUCENE-10163: update smoke tester - README inside lucene/ is no longer there in the source release. 2021-10-26 21:58:20 +02:00
Dawid Weiss
4329450392 LUCENE-10198: remove debug statement that crept in. 2021-10-26 21:33:19 +02:00
Dawid Weiss
fb6aaa7b2c LUCENE-10199: drop binary .zip artifact. (#407) 2021-10-26 21:21:30 +02:00
Dawid Weiss
08c0356664 LUCENE-10163: clean up and remove some old cruft in readme files. Move the binary-release-only README.md to the distribution project so that it doesn't look weird in the source tree. (#406) 2021-10-26 21:20:42 +02:00