36149 Commits

Author SHA1 Message Date
Vigya Sharma
c132bbf677
LUCENE-10084: Rewrite DocValuesFieldExistsQuery to MatchAllDocsQuery when all docs have the field (#677)
Since all documents are required to use the same features (LUCENE-9334) we can
rewrite DocValuesFieldExistsQuery to a MatchAllDocsQuery whenever terms or
points have a docCount that is equal to maxDoc.
2022-02-17 11:20:06 -08:00
Greg Miller
00029f1ec4 Add CHANGES entry for LUCENE-10398 2022-02-17 09:26:11 -08:00
spike.liu
fc3c790ab4
LUCENE-10398: Add static method for getting Terms from LeafReader (#678)
Co-authored-by: chengliu@ctrip.com <chengliu@ctrip.com>
2022-02-17 09:21:51 -08:00
Mayya Sharipova
f8c5408be7
LUCENE-10408 Better encoding of doc Ids in vectors (#649)
Better encoding of doc Ids in Lucene91HnswVectorsFormat
for a dense case where all docs have vectors.

Currently, doc Ids of all documents that have vectors are not
written very efficiently.
This change improves their encoding: when all documents
have vectors, we don't write document IDs, but just write a
single short value – a dense marker.
2022-02-17 11:34:42 +01:00
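The dense-marker idea from LUCENE-10408 above can be sketched in isolation. This is a minimal illustration only, not the Lucene91HnswVectorsFormat on-disk layout: the class name, both marker values, and the sparse layout (marker, count, then one int per doc ID) are assumptions made for the example. It also assumes the doc IDs are unique and lie in [0, maxDoc).

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch of a dense marker; not Lucene's actual file format. */
final class DenseDocIdEncodingSketch {
  // Assumed marker values; the commit message only says "a single short value".
  static final short DENSE_MARKER = -1;
  static final short SPARSE_MARKER = 0;

  /** Encodes the doc IDs that have a vector, for a segment holding maxDoc documents. */
  static byte[] encode(int[] docsWithVector, int maxDoc) {
    if (docsWithVector.length == maxDoc) {
      // Dense case: every document has a vector, so the IDs 0..maxDoc-1 are implied
      // and only the marker is written.
      return ByteBuffer.allocate(Short.BYTES).putShort(DENSE_MARKER).array();
    }
    // Sparse case: marker, number of IDs, then each ID explicitly.
    ByteBuffer buf =
        ByteBuffer.allocate(Short.BYTES + Integer.BYTES * (1 + docsWithVector.length));
    buf.putShort(SPARSE_MARKER);
    buf.putInt(docsWithVector.length);
    for (int doc : docsWithVector) {
      buf.putInt(doc);
    }
    return buf.array();
  }

  public static void main(String[] args) {
    System.out.println("dense bytes:  " + encode(new int[] {0, 1, 2}, 3).length);
    System.out.println("sparse bytes: " + encode(new int[] {0, 2}, 3).length);
  }
}
```

In the dense case the encoded size stays constant regardless of how many documents the segment holds, which is where the saving comes from.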
Ignacio Vera
84e34dc468
LUCENE-10415: FunctionScoreQuery and IndexOrDocValuesQuery delegate Weight#count. (#685)
These query wrappers do not modify the set of matching documents so they can delegate Weight#count.
2022-02-17 08:03:47 +01:00
Gautam Worah
dd25fabb03
LUCENE-10378 Implement Weight#count for PointRangeQuery (#658)
Implement Weight#count for PointRangeQuery to provide a faster way to calculate
the number of matching range docs when each doc has at most one point and the
points are 1-dimensional.
2022-02-16 07:23:49 +01:00
Patrick Zhai
6157854523
LUCENE-10371 Make IndexRearranger able to arrange segment in a determined order (#630) 2022-02-15 10:52:40 -08:00
Uwe Schindler
70c152bf32
LUCENE-10420: Remove deprecated interfaces and methods in IOUtils in main (#680) 2022-02-14 17:05:34 +01:00
Tomoko Uchida
db8fcb84bb
LUCENE-10420: Move functional interfaces in IOUtils to top-level interfaces (#673)
Co-authored-by: Uwe Schindler <uschindler@apache.org>
2022-02-15 00:12:28 +09:00
Dawid Weiss
8aa4763070 LUCENE-10419: fix rat thread safety bug. 2022-02-13 18:43:13 +01:00
Dawid Weiss
a861ff8df2 LUCENE-10419: revert debugging changes. 2022-02-13 18:34:57 +01:00
Dawid Weiss
50b7e2970f LUCENE-10419: more debugging code. The message from AbstractStringBuilder suggests a concurrency issue somewhere, but I just can't see it! 2022-02-12 20:22:49 +01:00
Dawid Weiss
21c5b42063 LUCENE-10419: upgrade rat to 0.13. 2022-02-10 17:37:06 +01:00
Tomoko Uchida
4cb55a7e9c
trivial updates on github actions (#674) 2022-02-11 01:13:18 +09:00
Luca Cavanna
ea170c9fab
Avoid SimpleText codec in TestIndexSortSortedNumericDocValuesRangeQuery (#675)
The recently introduced testCount verifies that the Weight#count optimization kicks in. When SimpleText codec is used, `DocValues#unwrapSingleton` returns null which disables the optimization and makes the test fail.
2022-02-10 17:06:31 +01:00
Dawid Weiss
f6cebac333
LUCENE-10414: Add fn:fuzzyTerm interval function to flexible query parser (#668) 2022-02-10 12:18:13 +01:00
Dawid Weiss
1f1da12c89 LUCENE-10419: add debugging code. 2022-02-10 12:03:54 +01:00
Adrien Grand
69d3a1d6af
LUCENE-10412: Improve handling of MatchNoDocsQuery in rewrites. (#664) 2022-02-09 19:02:54 +01:00
Alan Woodward
2183756f1c
LUCENE-10413: Make default Ukrainian stopword set available (#665)
This commit adds a new getDefaultStopwords() static method to
UkrainianMorfologikAnalyzer, which makes it possible to create an
analyzer with the default stop word set but a custom stem exclusion
set.
2022-02-09 14:37:44 +00:00
Greg Miller
8178ffda00
LUCENE-10403: Add ArrayUtil#grow(T[]) (#644) 2022-02-08 09:43:55 -08:00
Adrien Grand
ce93d45532 LUCENE-10367: Optimize CoveringQuery for the case when the minimum number of matching clauses is a constant. 2022-02-08 17:25:53 +01:00
Nhat Nguyen
bcb70fd742
LUCENE-10190: Ensure changes are visible before advancing seqno (#640)
DocumentsWriter#anyChanges() can return false after we process and
generate a sequence number for an update operation, but before we adjust
numDocsInRAM. In this window of time, refreshes are no-ops, although
the maxCompletedSequenceNumber has advanced.
2022-02-08 10:29:20 -05:00
gf2121
5250186bd1
LUCENE-10410: Add more tests for legacy decoding logic in DocIdsWriter (#654) 2022-02-08 16:59:32 +08:00
Tomoko Uchida
20f7f33c8d
LUCENE-10400: cleanup obsolete APIs in kuromoji (#655) 2022-02-08 09:32:33 +09:00
Julie Tibshirani
eb5bdd7d15
Rename KnnGraphValues -> HnswGraph (#645)
This PR proposes some renames to clarify the code structure. The top-level
`KnnGraphValues` is renamed to `HnswGraph`, since it now represents a
hierarchical graph. It's also moved from `org.apache.lucene.index` to the
`hnsw` package.

Other renames:
* The old `HnswGraph` -> `OnHeapHnswGraph`
* `IndexedKnnGraphValues` -> `OffHeapHnswGraph` (to match
`OffHeapVectorValues`)
2022-02-07 13:21:15 -08:00
Tomoko Uchida
e7546c2427
LUCENE-10400: revise binary dictionaries' constructor in kuromoji (#643) 2022-02-07 19:31:22 +09:00
gf2121
e93b08f471
LUCENE-10315: Add CHANGES for #541 (#653) 2022-02-07 16:23:34 +08:00
gf2121
8c67a3816b
LUCENE-10315: Speed up BKD leaf block ids codec by a 512 ints ForUtil (#541) 2022-02-07 15:35:54 +08:00
Ignacio Vera
4c578017af
LUCENE-10405: binary and Sorted doc values are stored as BytesRef instead of BytesRefHash in memory index (#647)
When using the MemoryIndex, binary and Sorted doc values are stored 
as BytesRef instead of BytesRefHash so they don't have a limit on size.
2022-02-07 07:33:07 +01:00
Greg Miller
deef3c704e
Update github hunspell regression test to use JDK 17 (#651) 2022-02-06 08:00:31 -08:00
Gautam Worah
de4eccbb55
LUCENE-10050 Remove DrillSideways#search(DrillDownQuery,Collector) in favor of DrillSideways#search(DrillDownQuery,CollectorManager) (#632) 2022-02-04 15:25:52 -08:00
Mayya Sharipova
ff2189c477 Add changes item for LUCENE-10054 2022-02-04 14:51:48 -05:00
Mayya Sharipova
ea4ab26e52 LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#636)
Update index.9.0.0-cfs.zip and index.9.0.0-nocfs.zip
to include knn vector field.
2022-02-04 14:42:44 -05:00
Alan Woodward
6b64f4b556
LUCENE-10407: Set bpos flag to true when containing filter is exhausted (#648)
ContainedByIntervalIterator and OverlappingIntervalIterator set their 'is the filter
interval exhausted' flag to `false` once it has returned NO_MORE_POSITIONS on
a document, so that subsequent calls to `startPosition()` will also return
NO_MORE_POSITIONS. ContainingIntervalIterator omits to do this, and so it can
incorrectly report matches, for example when used in a disjunction.  This commit
fixes that omission.
2022-02-04 16:44:57 +00:00
Alan Woodward
9ebee5a058 LUCENE-10402: Changes entry 2022-02-04 15:28:44 +00:00
Alan Woodward
e72d796e96
LUCENE-10402: Prefix interval automaton should be declared binary (#646) 2022-02-04 15:27:03 +00:00
Adrien Grand
ed6c1b5aea
LUCENE-10401: Fix lookups on empty doc-values terms dictionaries. (#642) 2022-02-04 09:28:35 +01:00
Julie Tibshirani
57d9515eff
LUCENE-10391: Reuse data structures across HnswGraph#searchLevel calls (#641)
A couple of the data structures used in HNSW search are pretty large and
expensive to allocate. This commit creates a shared candidates queue and
visited set that are reused across calls to HnswGraph#searchLevel. Now the same
data structures are used for building the entire graph, which can cut down on
allocations during indexing. For graph building it also switches the visited
set to FixedBitSet for better performance.
2022-02-03 16:00:09 -08:00
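The allocation-saving pattern described in LUCENE-10391 above (keep the candidates queue and visited set as long-lived fields, and clear them per call instead of reallocating) might look roughly like the following. This is a hedged sketch on a plain adjacency-array graph with a breadth-first traversal, not HNSW search; the class and method names are hypothetical, and `java.util.BitSet` stands in for Lucene's FixedBitSet.

```java
import java.util.ArrayDeque;
import java.util.BitSet;

/** Hypothetical sketch of reusing scratch structures across searches; not HnswGraph code. */
final class ReusableSearchStateSketch {
  private final int[][] neighbors;
  // Allocated once and reused by every call to countReachable.
  private final BitSet visited;
  private final ArrayDeque<Integer> candidates = new ArrayDeque<>();

  ReusableSearchStateSketch(int[][] neighbors) {
    this.neighbors = neighbors;
    this.visited = new BitSet(neighbors.length);
  }

  /** Counts the nodes reachable from entryPoint, including entryPoint itself. */
  int countReachable(int entryPoint) {
    // Reset the shared state instead of allocating new structures.
    visited.clear();
    candidates.clear();
    visited.set(entryPoint);
    candidates.add(entryPoint);
    int reached = 0;
    while (!candidates.isEmpty()) {
      int node = candidates.poll();
      reached++;
      for (int next : neighbors[node]) {
        if (!visited.get(next)) {
          visited.set(next);
          candidates.add(next);
        }
      }
    }
    return reached;
  }

  public static void main(String[] args) {
    int[][] graph = {{1}, {0, 2}, {1}, {}};
    ReusableSearchStateSketch search = new ReusableSearchStateSketch(graph);
    System.out.println(search.countReachable(0));
    System.out.println(search.countReachable(3));
  }
}
```

The trade-off is that an instance holding shared scratch state is not safe to use from several threads at once; each thread would need its own instance.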
Luca Cavanna
bade484998
LUCENE-10385: Implement Weight#count on IndexSortSortedNumericDocValuesRangeQuery (#635)
IndexSortSortedNumericDocValuesRangeQuery can implement its count method and compute the count through a binary search, the same binary search that is used to execute the query itself, whenever all the required conditions are met.
2022-02-03 17:19:05 +01:00
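The counting idea in LUCENE-10385 above rests on a simple observation: when values are sorted, the number of values in a range is the distance between two binary-search positions. The sketch below shows that on a plain sorted `long[]`; it is an illustration under that assumption, not the query's actual implementation, and the class and method names are hypothetical.

```java
/** Hypothetical sketch of counting range matches in sorted values via binary search. */
final class SortedRangeCountSketch {

  /** Returns the first index whose value is >= target, or sorted.length if there is none. */
  static int lowerBound(long[] sorted, long target) {
    int lo = 0;
    int hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Counts the values in [min, max], both bounds inclusive, without visiting each match. */
  static int count(long[] sorted, long min, long max) {
    if (min > max) {
      return 0;
    }
    int from = lowerBound(sorted, min);
    // The exclusive end is the first value > max; avoid overflow when max is Long.MAX_VALUE.
    int to = (max == Long.MAX_VALUE) ? sorted.length : lowerBound(sorted, max + 1);
    return to - from;
  }

  public static void main(String[] args) {
    long[] values = {1, 3, 3, 7, 9};
    System.out.println(count(values, 3, 7));
    System.out.println(count(values, 4, 6));
  }
}
```

Each count costs two binary searches, so the work grows logarithmically with the number of values rather than linearly with the number of matches.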
Luca Cavanna
ee7a8d6918
LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639)
Also use LuceneTestCase#newSearcher
2022-02-03 17:17:02 +01:00
Dawid Weiss
9a28c91a5a LUCENE-10283: bump the minimum source/release in javadoc settings. 2022-02-02 17:25:50 +01:00
Dawid Weiss
87bba4152c LUCENE-10283: bump the minimum source/release in ecj linter settings. 2022-02-02 17:25:41 +01:00
Mike Drob
56f49257ed
null check on infoStream (#637) 2022-02-02 09:44:31 -06:00
Mayya Sharipova
c8e1c08cc8
Small fix for assertConsistentGraph (#631)
TestKnnGraph.testMultipleVectorFields sometimes breaks with
the following message:

java.lang.NullPointerException: Cannot invoke
 "org.apache.lucene.codecs.lucene91.Lucene91HnswVectorsReader.getGraphValues(String)"
because "vectorReader" is null

This happens in assertConsistentGraph.

This patch ensures that for a segment and a field where no vectors
are indexed, we don't run the consistent-graph check.
2022-02-01 10:21:48 -05:00
Dawid Weiss
f103cca565 LUCENE-10255: Add the required unnamed modules in benchmarks subproject to module-info so that they are explicit. 2022-02-01 12:15:01 +01:00
Dawid Weiss
e7212fa47d LUCENE-10283: bump minimum JDK version to 17 in buildSrc. 2022-02-01 12:09:35 +01:00
Mayya Sharipova
8dfdb261e7
LUCENE-9573 Add Vectors to TestBackwardsCompatibility (#616)
This patch adds KNN vectors for testing backward compatible indices

- Add a KnnVectorField to documents when creating a new backward
  compatible index
- Add knn vectors search and check for vector values to the testing
  of search of backward compatible indices
- Add tests for knn vector search when changing backward compatible
 indices (merging them and adding new documents to them)
2022-01-31 09:20:53 -05:00
Luca Cavanna
df12e2b195
LUCENE-10395: Introduce TotalHitCountCollectorManager (#622) 2022-01-31 14:45:35 +01:00
Luca Cavanna
933c54fe87
Improve Weight#count and IndexSearcher#count javadocs (#625) 2022-01-28 16:47:25 +01:00
Robert Muir
61edacee5d
update javac flags for java 17 (#628)
Previously -Xlint:text-blocks and -Xlint:synchronization were enabled
conditionally, if the user had at least java 15 or java 16,
respectively. Enable them always.

Add new options so that the warnings list is fully configured:
* -Xlint:module (new in java 17)
* -Xlint:strictfp (new in java 17)

Disable "path" with -Xlint:-path rather than commenting it out, for
consistency.

Disable "missing-explicit-ctor" (new in java 17), as it is unlikely to
succeed right now.

Sort the flags alphabetically and document how to get the updated list;
this makes it easy to compare and keep up to date.
2022-01-28 05:48:58 -05:00