Commit Graph

36374 Commits

Author SHA1 Message Date
Adrien Grand 432296d967
Fix codec name in index header for Lucene94FieldInfosFormat. (#11818) 2022-09-26 14:56:30 +02:00
Dawid Weiss 6b82be5f11
Regenerate sources after dependency updates. (#11817) 2022-09-25 18:09:30 +02:00
Dawid Weiss 5d121ce44c
Upgrade several build dependencies. (#11812)
* Upgrade several build dependencies.

* Update error prone rules (those are off but they do trigger warnings/errors)

* A few corrections I made before I turned off new warnings. Let's do another issue to fix them.
2022-09-25 17:10:22 +02:00
Robert Muir 15f3743f02
Remove Operations.isFinite (#11813)
This method is recursive: to avoid eating too much stack we apply a
small limit. This means it can't really be used on any largish automata
without hitting an exception.

But the benefit of knowing finite vs infinite in AutomatonTermsEnum is
minor: let's not auto-compute this. FuzzyQuery still gets the finite
optimization because it's finite by definition. PrefixQuery is always
infinite. Wildcard/Regex just assume infinite which is safe to do.

Remove the auto-computation and the "trillean" Boolean parameter. If you
don't know that your automaton is finite, pass false to
CompiledAutomaton, it is safe.

Move this method to AutomatonTestUtil so we can still use it in test
asserts.

Closes #11809
2022-09-24 10:51:04 -04:00
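
The stack-depth concern described in this commit can be illustrated with a minimal sketch. This is not Lucene's implementation: the class name `FiniteCheckSketch`, the `int[][]` adjacency representation, and the assumption that every state can reach an accept state are all hypothetical simplifications. Under those assumptions, the language is finite exactly when no cycle is reachable from the initial state, and the search can use an explicit stack on the heap instead of recursion:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not Lucene code: iterative cycle detection on a state graph.
final class FiniteCheckSketch {
  private static final int UNVISITED = 0;
  private static final int ON_STACK = 1;
  private static final int DONE = 2;

  /**
   * next[s] lists the target states of state s; state 0 is the initial state.
   * Assumes every state can reach an accept state, so any reachable cycle
   * means the accepted language is infinite.
   */
  static boolean isFinite(int[][] next) {
    if (next.length == 0) {
      return true;
    }
    int[] color = new int[next.length];
    int[] edge = new int[next.length]; // next outgoing edge to explore, per state
    Deque<Integer> stack = new ArrayDeque<>();
    stack.push(0);
    color[0] = ON_STACK;
    while (!stack.isEmpty()) {
      int s = stack.peek();
      if (edge[s] < next[s].length) {
        int t = next[s][edge[s]++];
        if (color[t] == ON_STACK) {
          return false; // back edge: a cycle is reachable
        }
        if (color[t] == UNVISITED) {
          color[t] = ON_STACK;
          stack.push(t);
        }
      } else {
        color[s] = DONE; // all outgoing edges explored
        stack.pop();
      }
    }
    return true;
  }
}
```

Because the pending states live in an `ArrayDeque`, the depth of the search is bounded by heap size rather than by the thread's call stack.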
Dawid Weiss 54fba99cb1
Upgrade google java format and apply tidy (#11811) 2022-09-24 15:40:27 +02:00
Dawid Weiss 8bdfa90ea9 Fix and simplify the test (#11734). 2022-09-24 12:51:01 +02:00
Alan Woodward 188a78d769
Don't try to highlight very long terms (#11808)
The UnifiedHighlighter can throw exceptions when highlighting terms that are longer
than the maximum size the DaciukMihovAutomatonBuilder accepts. Rather than throwing
a confusing exception, we can instead filter out the long terms when building the
MemoryIndexOffsetStrategy. Very long terms are likely to be junk input in any case.
2022-09-24 11:26:16 +01:00
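
The filtering approach described in this commit might look roughly like the following sketch. The class name, the method name, and the `MAX_TERM_LENGTH` value are hypothetical placeholders, not the highlighter's actual code or the builder's actual limit:

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: drop over-long terms before handing them to an automaton builder.
final class LongTermFilterSketch {
  // Placeholder value for illustration; the real limit is defined by the builder.
  static final int MAX_TERM_LENGTH = 1000;

  /** Returns the terms whose length does not exceed maxLength, in their original order. */
  static List<String> keepShortTerms(List<String> terms, int maxLength) {
    return terms.stream()
        .filter(term -> term.length() <= maxLength)
        .collect(Collectors.toList());
  }
}
```

Filtering up front means the builder never sees input it would reject, so no exception has to be caught or translated later.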
Luke Kot-Zaniewski 3a04aa44c2
Fix repeating token sentence boundary bug (#11734)
Signed-off-by: lkotzaniewsk <lkotzaniewsk@bloomberg.net>
Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2022-09-23 12:59:46 +02:00
jianping weng 5b24a233bd
LUCENE-10425: speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search (#687) 2022-09-22 08:51:13 +02:00
Shai Erera bcc116057d
Minor refactoring and cleanup to taxonomy index code (#11775) 2022-09-21 13:08:33 +03:00
Julie Tibshirani add309bb40 Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix 2022-09-20 15:48:56 -07:00
Luca Cavanna 4eaebee686
Guard FieldExistsQuery against null pointers (#11794)
FieldExistsQuery checks if there are points for a certain field, and then retrieves the
corresponding point values. When all documents that had points for a certain field have
been deleted from a certain segment, as well as merged away, field info may report
that there are points yet the corresponding point values are null.

With this change we add a null check in FieldExistsQuery. Long term, we will likely want
to prevent this situation from happening.

Relates #11393
2022-09-20 15:38:38 +02:00
Adrien Grand 6c46662b43
Fix handling of ghost fields in string sorts. (#11792)
Introduction of dynamic pruning for string sorts (#11669) introduced a bug with
string sorts and ghost fields, triggering a `NullPointerException` because the
code assumes that `LeafReader#terms` is not null if the field is indexed
according to field infos.

This commit fixes the issue and adds tests for ghost fields across all sort
types.

Hopefully we can simplify and remove the null check in the future when we
improve handling of ghost fields (#11393).
2022-09-20 13:49:52 +02:00
Jan Høydahl 00a8112d97
LUCENE-10365 Wizard changes contributed from Solr (#591) 2022-09-20 12:07:42 +02:00
Alex 26d6063ec3
GitHub Workflows security hardening (#11789) 2022-09-20 11:28:07 +02:00
Ignacio Vera ecb0ba542b
Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (#11786) 2022-09-20 07:15:38 +02:00
Michael Sokolov accc3bdcfa
update DOAP and releaseWizard to reflect migration to github (#11747) 2022-09-19 13:53:26 -04:00
Michael Sokolov 07af358f90
Diversity check bugfix (#11781)
* Fixes bug in HNSW diversity checks introduced in LUCENE-10577
2022-09-19 11:48:59 -04:00
Michael Sokolov e69c48b8d9 Fix rare bug in TestKnnVectorQuery when we have multiple segments 2022-09-18 20:21:39 +00:00
Namgyu Kim 451bab300e
GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (#11779) 2022-09-17 00:42:35 +09:00
Adrien Grand 155876a902 LUCENE-10674: Move changes entry to 9.4. 2022-09-16 16:59:42 +02:00
Dawid Weiss 9acc653995
GH-11172: remove WindowsDirectory and native subproject. (#11774) 2022-09-15 16:22:46 +02:00
John Mazanec 0587844742
LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (#1068)
Signed-off-by: John Mazanec <jmazane@amazon.com>
2022-09-15 11:21:39 +02:00
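
The contract this fix restores can be sketched with a self-contained conjunction of a sorted doc list and a bit set. This is an illustration only: `BitSetConjunctionSketch` and its fields are hypothetical, and it uses `java.util.BitSet` rather than Lucene's own classes. The point is that once the lead iterator runs out, the conjunction must report `NO_MORE_DOCS` and keep reporting it:

```java
import java.util.BitSet;

// Hypothetical sketch of a conjunction between a lead doc list and a bit set.
final class BitSetConjunctionSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private final int[] leadDocs; // sorted ascending doc IDs of the lead iterator
  private final BitSet bits; // docs accepted by the second clause
  private int idx = -1;
  private int doc = -1; // -1 means "not positioned yet"

  BitSetConjunctionSketch(int[] leadDocs, BitSet bits) {
    this.leadDocs = leadDocs;
    this.bits = bits;
  }

  int docID() {
    return doc;
  }

  int nextDoc() {
    while (idx + 1 < leadDocs.length) {
      idx++;
      if (bits.get(leadDocs[idx])) {
        doc = leadDocs[idx];
        return doc;
      }
    }
    // Lead iterator is exhausted: report exhaustion, now and on every later call.
    doc = NO_MORE_DOCS;
    return doc;
  }
}
```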
Alexander Münch 5de685cfba
Removed duplicate check in SpanGradientFormatter (#11762) 2022-09-14 13:37:31 +01:00
Adrien Grand a426c6fec3 Fix integer overflow in tests. 2022-09-13 17:08:17 +02:00
Greg Miller 4463a0b271
GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (#11764) 2022-09-13 06:50:52 -07:00
Dawid Weiss e491ef797c
Retry gradle wrapper download on http 500 and 503. (#11766) 2022-09-13 10:30:20 +02:00
Dhiru Kholia 30b72ec364
Fix a typo affecting Luke (#11763) 2022-09-12 13:05:40 +02:00
Alan Woodward 41d03f69ce
Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (#11760)
IntervalBuilder.NO_INTERVALS should return -1 when unpositioned,
not NO_MORE_DOCS. This can trigger exceptions when an empty
IntervalQuery is combined in a conjunction.

Fixes #11759
2022-09-09 17:19:15 +01:00
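
The distinction this commit draws between "unpositioned" and "exhausted" can be sketched as follows. `EmptyIteratorSketch` is a hypothetical stand-in, not the actual `IntervalBuilder.NO_INTERVALS` code: an iterator that matches nothing still starts at -1 and only moves to `NO_MORE_DOCS` once a caller advances it.

```java
// Hypothetical sketch of an always-empty iterator that honors the "unpositioned" state.
final class EmptyIteratorSketch {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  private int doc = -1; // unpositioned until the first nextDoc() or advance()

  int docID() {
    return doc;
  }

  int nextDoc() {
    doc = NO_MORE_DOCS; // there is never a matching document
    return doc;
  }

  int advance(int target) {
    doc = NO_MORE_DOCS; // same result regardless of target
    return doc;
  }
}
```

A consumer that inspects `docID()` before advancing sees -1 here, which is the same starting state as any non-empty iterator.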
Mayya Sharipova 0ea8035612
LUCENE-10592 Better estimate memory for HNSW graph (#11743)
Better estimate memory used for OnHeapHnswGraph,
as well as add tests.

Also don't overallocate arrays in NeighborArray

Relates to #992
2022-09-08 16:54:29 -04:00
Yuting Gan 49b596ef02
Added a top-n range faceting example (#1035) 2022-09-08 12:19:42 -07:00
Julie Tibshirani 09a13aeaf2
LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively (#11756)
This PR removes the recently added function on LeafReader to exhaustively search
through vectors, plus the helper function KnnVectorsReader#searchExhaustively.
Instead it performs the exact search within KnnVectorQuery, using a new helper
class called VectorScorer.
2022-09-08 12:15:02 -07:00
Robert Muir f4146a44e9
Fix TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull to handle IllegalStateException from startCommit() (#11757)
If ConcurrentMergeScheduler is used, and the merge hits a fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes #11755
2022-09-08 13:35:54 -04:00
Adrien Grand f8285fd0fe
Prevent term vectors from exceeding the maximum dictionary size. (#11726)
When indexing term vectors for a very large document, the automatic computation
of the dictionary size based on the overall size of the block might yield a
size that exceeds the maximum window size that is supported by LZ4. This commit
addresses the issue by automatically taking the minimum of the result of this
computation and the maximum window size (64kB).
2022-09-08 13:44:21 +02:00
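
The clamping described in this commit amounts to taking a minimum. In the sketch below, only the 64kB cap comes from the commit message; the class name, the method, and the `divisor` parameter standing in for the "automatic computation" are hypothetical:

```java
// Hypothetical sketch: cap a computed dictionary size at the maximum window size.
final class DictSizeSketch {
  // 64kB, the maximum window size stated in the commit message.
  static final int MAX_WINDOW_SIZE = 64 * 1024;

  /**
   * Derives a dictionary size from the block length (here simply blockLength / divisor,
   * an assumed placeholder formula) and never lets it exceed MAX_WINDOW_SIZE.
   */
  static int dictSize(int blockLength, int divisor) {
    int computed = blockLength / divisor;
    return Math.min(computed, MAX_WINDOW_SIZE);
  }
}
```

For small blocks the computed value passes through unchanged; only very large documents hit the cap.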
Marios Trivyzas dbffe3472b
LUCENE-10423: Remove usages of System.currentTimeMillis() from tests (#11749)
* Remove usages of System.currentTimeMillis() from tests

- Use Random from `RandomizedRunner` to be able to use a Seed to
  reproduce tests, instead of a seed coming from wall clock.
- Replace time based tests, using wall clock to determine periods
  with counter of repetitions, to have a consistent reproduction.

Closes: #11459

* address comments

* tune iterations

* tune iterations for nightly
2022-09-06 17:55:01 -04:00
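
The two ideas in this commit, a seed-driven random source and a fixed repetition count instead of a wall-clock deadline, can be sketched with plain `java.util.Random`. The class and method names are hypothetical, and the actual tests obtain their randomness from the test framework rather than constructing `Random` directly:

```java
import java.util.Random;

// Hypothetical sketch: reproducible test data from a seed and a fixed iteration count.
final class SeededTestSketch {
  /** Produces the same sequence for the same seed, independent of the wall clock. */
  static long[] generate(long seed, int iterations) {
    Random random = new Random(seed);
    long[] values = new long[iterations];
    // Loop a fixed number of times instead of "until N milliseconds have passed".
    for (int i = 0; i < iterations; i++) {
      values[i] = random.nextLong();
    }
    return values;
  }
}
```

Re-running with the same seed and iteration count replays exactly the same inputs, which is what makes a failing run reproducible.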
Dawid Weiss d3460fa1bb
Add tidy after addVersion is called. (#11748) 2022-09-04 19:50:38 +02:00
Greg Miller 84cae4f27c
Simplify dense optimization check in TermInSetQuery (#11737) 2022-09-02 07:51:29 -07:00
Greg Miller 202dd809bd Ensure TermInSetQuery ScoreSupplier never returns null Scorer 2022-09-01 15:31:14 -07:00
Greg Miller 680f21dca5
LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery (#1058) 2022-09-01 14:04:43 -07:00
Michael Sokolov 0462a0ad73 fixed index order needed for TestKnnVectorQuery.testScoreEuclidean (#11732) 2022-09-01 09:53:57 -04:00
Michael Sokolov 1649964f07 Forward-port CHANGES entry for quantized HNSW vectors from 9.x branch 2022-09-01 09:53:46 -04:00
Tomoko Uchida fd86968fee
remove a link to old Jira in README. 2022-09-01 00:41:56 +09:00
Mayya Sharipova 554fabf682
LUCENE-10633 Disable sort optimization for SortedSetSortField (#3125)
Add ability to SortedSetSortField to disable sort optimization
2022-08-30 16:52:28 -04:00
Michael Sokolov 61ef031f7f
SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725) 2022-08-29 11:49:52 -04:00
Tomoko Uchida 29f94b0404
a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
Tomoko Uchida 6d664ccd95 adjust wording 2022-08-27 13:02:48 +09:00
Tomoko Uchida 09a7f9aa53 clarify the relation between CHANGES and Milestone 2022-08-27 12:58:33 +09:00
Tomoko Uchida 224953304c
Document about Milestone for release planning (#11723) 2022-08-27 12:29:40 +09:00
Tomoko Uchida e61958e4fd links to github should be '/issues' 2022-08-27 11:54:20 +09:00
Dawid Weiss 4f7543725c
#11720 Upgrade randomizedtesting to 2.8.1 (#11721) 2022-08-26 00:01:57 +02:00