36167 Commits

Author SHA1 Message Date
Luke Kot-Zaniewski 3a04aa44c2
Fix repeating token sentence boundary bug (#11734)
Signed-off-by: lkotzaniewsk <lkotzaniewsk@bloomberg.net>
Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2022-09-23 12:59:46 +02:00
jianping weng 5b24a233bd
LUCENE-10425: speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search (#687) 2022-09-22 08:51:13 +02:00
Shai Erera bcc116057d
Minor refactoring and cleanup to taxonomy index code (#11775) 2022-09-21 13:08:33 +03:00
Julie Tibshirani add309bb40 Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix 2022-09-20 15:48:56 -07:00
Luca Cavanna 4eaebee686
Guard FieldExistsQuery against null pointers (#11794)
FieldExistsQuery checks if there are points for a certain field, and then retrieves the
corresponding point values. When all documents that had points for a certain field have
been deleted from a certain segment, as well as merged away, field info may report
that there are points yet the corresponding point values are null.

With this change we add a null check in FieldExistsQuery. Long term, we will likely want
to prevent this situation from happening.

Relates #11393
2022-09-20 15:38:38 +02:00
Adrien Grand 6c46662b43
Fix handling of ghost fields in string sorts. (#11792)
Introduction of dynamic pruning for string sorts (#11669) introduced a bug with
string sorts and ghost fields, triggering a `NullPointerException` because the
code assumes that `LeafReader#terms` is not null if the field is indexed
according to field infos.

This commit fixes the issue and adds tests for ghost fields across all sort
types.

Hopefully we can simplify and remove the null check in the future when we
improve handling of ghost fields (#11393).
2022-09-20 13:49:52 +02:00
Jan Høydahl 00a8112d97
LUCENE-10365 Wizard changes contributed from Solr (#591) 2022-09-20 12:07:42 +02:00
Alex 26d6063ec3
GitHub Workflows security hardening (#11789) 2022-09-20 11:28:07 +02:00
Ignacio Vera ecb0ba542b
Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (#11786) 2022-09-20 07:15:38 +02:00
Michael Sokolov accc3bdcfa
update DOAP and releaseWizard to reflect migration to github (#11747) 2022-09-19 13:53:26 -04:00
Michael Sokolov 07af358f90
Diversity check bugfix (#11781)
* Fixes bug in HNSW diversity checks introduced in LUCENE-10577
2022-09-19 11:48:59 -04:00
Michael Sokolov e69c48b8d9 Fix rare bug in TestKnnVectorQuery when we have multiple segments 2022-09-18 20:21:39 +00:00
Namgyu Kim 451bab300e
GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (#11779) 2022-09-17 00:42:35 +09:00
Adrien Grand 155876a902 LUCENE-10674: Move changes entry to 9.4. 2022-09-16 16:59:42 +02:00
Dawid Weiss 9acc653995
GH-11172: remove WindowsDirectory and native subproject. (#11774) 2022-09-15 16:22:46 +02:00
John Mazanec 0587844742
LUCENE-10674: Ensure BitSetConjDISI returns NO_MORE_DOCS when sub-iterator exhausts. (#1068)
Signed-off-by: John Mazanec <jmazane@amazon.com>
2022-09-15 11:21:39 +02:00
Alexander Münch 5de685cfba
Removed duplicate check in SpanGradientFormatter (#11762) 2022-09-14 13:37:31 +01:00
Adrien Grand a426c6fec3 Fix integer overflow in tests. 2022-09-13 17:08:17 +02:00
Greg Miller 4463a0b271
GITHUB#11742: MatchingFacetSetsCounts#getTopChildren now returns top children instead of all children (#11764) 2022-09-13 06:50:52 -07:00
Dawid Weiss e491ef797c
Retry gradle wrapper download on http 500 and 503. (#11766) 2022-09-13 10:30:20 +02:00
Dhiru Kholia 30b72ec364
Fix a typo affecting Luke (#11763) 2022-09-12 13:05:40 +02:00
Alan Woodward 41d03f69ce
Fix IntervalBuilder.NO_INTERVALS docId when unpositioned (#11760)
IntervalBuilder.NO_INTERVALS should return -1 when unpositioned,
not NO_MORE_DOCS. This can trigger exceptions when an empty
IntervalQuery is combined in a conjunction.

Fixes #11759
2022-09-09 17:19:15 +01:00
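
The iterator convention this fix relies on can be sketched without any Lucene dependency. This is only an illustration: the class `EmptyDocIterator` is hypothetical, the -1 "unpositioned" value comes from the commit message above, and `NO_MORE_DOCS` is assumed to match the value of Lucene's `DocIdSetIterator.NO_MORE_DOCS` (`Integer.MAX_VALUE`).

```java
// Hypothetical stand-in for an empty iterator; not Lucene's actual class.
final class EmptyDocIterator {
  // Sentinel for an exhausted iterator (same value as DocIdSetIterator.NO_MORE_DOCS).
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // -1 means "unpositioned": nextDoc() has not been called yet.
  private int doc = -1;

  int docID() {
    return doc;
  }

  // An empty iterator has no documents, so the first advance exhausts it.
  int nextDoc() {
    doc = NO_MORE_DOCS;
    return doc;
  }

  public static void main(String[] args) {
    EmptyDocIterator it = new EmptyDocIterator();
    System.out.println(it.docID());                    // -1 before any advance
    System.out.println(it.nextDoc() == NO_MORE_DOCS);  // true once exhausted
  }
}
```

A conjunction that sees `NO_MORE_DOCS` before advancing could plausibly treat the iterator as already exhausted or mis-positioned, which is consistent with the exceptions the commit message mentions.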
Mayya Sharipova 0ea8035612
LUCENE-10592 Better estimate memory for HNSW graph (#11743)
Better estimate memory used for OnHeapHnswGraph,
as well as add tests.

Also don't overallocate arrays in NeighborArray

Relates to #992
2022-09-08 16:54:29 -04:00
Yuting Gan 49b596ef02
Added a top-n range faceting example (#1035) 2022-09-08 12:19:42 -07:00
Julie Tibshirani 09a13aeaf2
LUCENE-10577: Remove LeafReader#searchNearestVectorsExhaustively (#11756)
This PR removes the recently added function on LeafReader to exhaustively search
through vectors, plus the helper function KnnVectorsReader#searchExhaustively.
Instead it performs the exact search within KnnVectorQuery, using a new helper
class called VectorScorer.
2022-09-08 12:15:02 -07:00
Robert Muir f4146a44e9
Fix TestIndexWriterOnDiskFull.testAddDocumentOnDiskFull to handle IllegalStateException from startCommit() (#11757)
If ConcurrentMergeScheduler is used, and the merge hits a fatal exception (such as disk full) after prepareCommit()'s ensureOpen() check, then startCommit() will throw IllegalStateException instead of AlreadyClosedException.

The test is currently not prepared to handle this: the logic is only geared around exceptions coming from addDocument()

Closes #11755
2022-09-08 13:35:54 -04:00
Adrien Grand f8285fd0fe
Prevent term vectors from exceeding the maximum dictionary size. (#11726)
When indexing term vectors for a very large document, the automatic computation
of the dictionary size based on the overall size of the block might yield a
size that exceeds the maximum window size that is supported by LZ4. This commit
addresses the issue by automatically taking the minimum of the result of this
computation and the maximum window size (64kB).
2022-09-08 13:44:21 +02:00
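
The capping described in the commit above amounts to taking a minimum. A minimal sketch follows, assuming the computed dictionary size is already available as an `int`. The class and method names are hypothetical and do not reflect Lucene's actual code; only the 64kB limit is taken from the commit message.

```java
// Hypothetical sketch: names are illustrative, not Lucene's actual API.
final class DictSizeClamp {
  // Maximum window size, 64kB, as stated in the commit message.
  static final int MAX_WINDOW_SIZE = 64 * 1024;

  // Caps an automatically computed dictionary size at the maximum window size.
  static int clampDictSize(int computedDictSize) {
    return Math.min(computedDictSize, MAX_WINDOW_SIZE);
  }

  public static void main(String[] args) {
    System.out.println(clampDictSize(4 * 1024));     // small blocks are unaffected
    System.out.println(clampDictSize(1024 * 1024));  // very large blocks are capped
  }
}
```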
Marios Trivyzas dbffe3472b
LUCENE-10423: Remove usages of System.currentTimeMillis() from tests (#11749)
* Remove usages of System.currentTimeMillis() from tests

- Use Random from `RandomizedRunner` to be able to use a Seed to
  reproduce tests, instead of a seed coming from wall clock.
- Replace time based tests, using wall clock to determine periods
  with counter of repetitions, to have a consistent reproduction.

Closes: #11459

* address comments

* tune iterations

* tune iterations for nightly
2022-09-06 17:55:01 -04:00
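
The two ideas in the commit above, a seeded random source instead of a wall-clock seed and a fixed repetition count instead of a time budget, can be illustrated with plain `java.util.Random`. This is a sketch only: the class `SeededRepetition` and its method are hypothetical, and Lucene's tests obtain their `Random` from the test framework rather than constructing one directly.

```java
import java.util.Random;

// Hypothetical sketch; not taken from Lucene's test code.
final class SeededRepetition {
  // Runs a fixed number of iterations driven by a seeded Random,
  // so the same seed always produces the same sequence of values.
  static long runIterations(long seed, int iterations) {
    Random random = new Random(seed);
    long checksum = 0;
    for (int i = 0; i < iterations; i++) {
      checksum += random.nextInt(1000);
    }
    return checksum;
  }

  public static void main(String[] args) {
    long first = runIterations(42L, 100);
    long second = runIterations(42L, 100);
    System.out.println(first == second);  // true: same seed, same iteration count
  }
}
```

Because neither the seed nor the loop bound depends on the clock, a failing run can be replayed exactly by reusing the seed.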
Dawid Weiss d3460fa1bb
Add tidy after addVersion is called. (#11748) 2022-09-04 19:50:38 +02:00
Greg Miller 84cae4f27c
Simplify dense optimization check in TermInSetQuery (#11737) 2022-09-02 07:51:29 -07:00
Greg Miller 202dd809bd Ensure TermInSetQuery ScoreSupplier never returns null Scorer 2022-09-01 15:31:14 -07:00
Greg Miller 680f21dca5
LUCENE-10207: TermInSetQuery now provides a ScoreSupplier with cost estimation for use in IndexOrDocValuesQuery (#1058) 2022-09-01 14:04:43 -07:00
Michael Sokolov 0462a0ad73 fixed index order needed for TestKnnVectorQuery.testScoreEuclidean (#11732) 2022-09-01 09:53:57 -04:00
Michael Sokolov 1649964f07 Forward-port CHANGES entry for quantized HNSW vectors from 9.x branch 2022-09-01 09:53:46 -04:00
Tomoko Uchida fd86968fee
remove a link to old Jira in README. 2022-09-01 00:41:56 +09:00
Mayya Sharipova 554fabf682
LUCENE-10633 Disable sort optimization for SortedSetSortField (#3125)
Add ability to SortedSetSortField to disable sort optimization
2022-08-30 16:52:28 -04:00
Michael Sokolov 61ef031f7f
SimpleText knn vectors; fix searchExhaustively and suppress a byte format test case (#11725) 2022-08-29 11:49:52 -04:00
Tomoko Uchida 29f94b0404
a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
Tomoko Uchida 6d664ccd95 adjust wording 2022-08-27 13:02:48 +09:00
Tomoko Uchida 09a7f9aa53 clarify the relation between CHANGES and Milestone 2022-08-27 12:58:33 +09:00
Tomoko Uchida 224953304c
Document about Milestone for release planning (#11723) 2022-08-27 12:29:40 +09:00
Tomoko Uchida e61958e4fd links to github should be '/issues' 2022-08-27 11:54:20 +09:00
Dawid Weiss 4f7543725c
#11720 Upgrade randomizedtesting to 2.8.1 (#11721) 2022-08-26 00:01:57 +02:00
Mike Drob dbc7a9764a
Add Integer awareness to RamUsageEstimator.sizeOf (#11715)
Additionally, update comments to reflect that we have not been VM cache-aware for a long time now.
2022-08-25 15:18:08 -05:00
Uwe Schindler 1d54299011
Fix classloading deadlock in analysis factories / AnalysisSPILoader initialization. This closes #11701 (#11718) 2022-08-25 18:16:04 +02:00
Tomoko Uchida 53b1ce7504
update contributing guide for GH issue (#11716) 2022-08-25 04:06:09 +09:00
Greg Miller 1529606763
Optimize TermInSetQuery for terms that match all docs in a segment (#1062) 2022-08-23 08:37:44 -07:00
Michael Sokolov 8021c2db4e Don't throw an exception for byte-encoded vectors in SimpleText codec 2022-08-22 08:29:58 -04:00
Julie Tibshirani df67223497 Disable byte encoding in TestSimpleTextKnnVectorsFormat 2022-08-21 17:00:57 -07:00
Julie Tibshirani 653d2ebf71
Remove KnnVectorsFormat#currentVersion (#1077)
These internal versions only make sense within a codec definition, and aren't
meant to be exposed and compared across codecs. Since this method is only used
in tests, we can move the check to the test classes instead.
2022-08-21 13:09:07 -07:00