Commit Graph

36339 Commits

Author SHA1 Message Date
Peter Gromov 6fbc5f73c3
hunspell: introduce FragmentChecker to speed up ModifyingSuggester (#11909)
add NGramFragmentChecker to quickly check whether insertions/replacements produce strings that are even possible in the language

Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2022-11-11 12:13:47 +01:00
Benjamin Trent c8d44acf20
Follow up to GITHUB#11916, remove deleted docs check (#11919) 2022-11-10 18:40:24 -05:00
Benjamin Trent 3a506ec87a
GITHUB#11911: improve checkindex to be more thorough for vectors (#11916)
search every N docs to get close to 64 tests
2022-11-10 16:45:47 -05:00
Uwe Schindler e9ef61ba39 Fix bug with set of strings since upgrade of Gradle -> explicit cast from GString to String 2022-11-10 17:18:30 +01:00
Benjamin Trent 1360baaee9
Fix integer overflow when seeking the vector index for connections (#11905)
* Fix integer overflow when seeking the vector index for connections
* Adding monster test to cause overflow failure
2022-11-10 08:24:32 -05:00
Peter Gromov f7417d5961
hunspell: allow for faster dictionary iteration during 'suggest' by using more memory (opt-in) (#11893)
2022-11-09 08:20:50 +01:00
Greg Miller c66a559050
Further optimize DrillSideways scoring (#11881) 2022-11-08 10:08:12 -08:00
Benjamin Trent f9c26ed501
Fix latent casting bug in BKDWriter (#11907) 2022-11-08 15:55:07 +01:00
Peter Gromov 682e5c94e8
[hunspell] speed up WordFormGenerator (#11904) 2022-11-07 19:41:17 +01:00
Lu Xugang a8120bcb32
Simplify the logic of matchAll() in IndexSortSortedNumericDocValuesRangeQuery (#11884)
* Simplify the logic of matchAll() in IndexSortSortedNumericDocValuesRangeQuery
2022-11-07 19:09:52 +08:00
Michael Sokolov 48aad5090f
#11896: reduce top k in test to avoid split-graph (#11899) 2022-11-04 09:30:46 -04:00
Nhat Nguyen 1a5ad61b9d
Document that bulkScorer method can return null (#11897)
Like Weight#scorer, we should warn users that Weight#bulkScorer can 
return null if the query matches no documents.
2022-11-02 15:12:43 -07:00
Robert Muir 4e207fed62
Tone down TestDocumentsWriterStallControl.testRandom, so it does not take minutes (#11894)
This test often takes several minutes with normal runs (no NIGHTLY/multiplier/etc). Tone it down so that it isn't slow: CI builds can work it harder by passing those parameters
2022-11-02 12:17:15 -04:00
Tim Stewart 7c130d2f07
Fix typo in CONTRIBUTING.md (#11879) 2022-11-01 20:10:05 +00:00
Peter Gromov 419ffd3974 [hunspell] perform a bit fewer checks after 2 suffixes have been removed 2022-10-31 10:09:54 +01:00
Marios Trivyzas 3210a42f09
Fix nanos to millis conversion for tests (#11856) 2022-10-29 09:05:17 +02:00
Patrick Zhai 26ec0dd44c
add gradle aggregated coverage console log and html location (#11882)
* jacoco/ coverage shouldn't trigger all test tasks as dependencies - instead, it should run after those test tasks that you choose to run. removed java plugin from top-level.

* Make coverage depend on the default test task.

* Update jacoco log plugin so that it doesn't make hard dependencies on test tasks.

Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
2022-10-28 23:33:37 -07:00
Robert Muir 8736c18747
Allow building with java 18 now that gradle supports it (#11889)
* Allow building with java 18 now that gradle supports it
* update the "generic error" in these scripts
2022-10-28 23:41:09 -04:00
Navneet Verma e7253f112d
Add interface to relate a LatLonShape with another shape represented as Component2D. (#11753)
Adds createLatLonShapeDocValues and createXYShapeDocValues factory methods
to LatLonShape and XYShape factory classes, respectively.

Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
2022-10-28 13:52:20 -05:00
Dawid Weiss 5c7edd7f38
Upgrade to gradle 7.5.1 (excluding launch scripts, which we have customized) (#11886) 2022-10-28 08:49:36 +02:00
Marc D'Mello 2793256682
GITHUB#11795: Add FilterDirectory to track write amplification factor (#11796)
* LUCENE-11795: Add FilterDirectory to track write amplification factor

* addressed feedback

* added optional temp output tracking and real time tracking

* addressed more feedback

* more improvements + added CHANGES.txt entry

* format edit to CHANGES.txt

* remove waf factor calculation

Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
2022-10-27 15:07:56 -04:00
Michael Sokolov b3bc59910f
When evaluating expressions, defer calling advanceExact on operands until doubleValue() is called (#11878) 2022-10-26 14:05:39 -04:00
gf2121 05bd83dfe1
Use ByteArrayComparator for PointInSetQuery#MergePointVisitor (#11876) 2022-10-26 13:39:32 +08:00
Dawid Weiss 50261de406
Update java version to 17 for Lucene 10 in the release wizard. (#11872) 2022-10-25 13:50:21 +02:00
gf2121 b1d1e488f2
Move LUCENE-10376 CHANGES entry to 10.0.0 (#11871) 2022-10-24 22:39:21 +08:00
iverase 976a38baa0 Add back-compat indices for 9.4.1 2022-10-24 15:20:44 +02:00
iverase 9ce6268cce Add bugfix version 9.4.1 2022-10-24 15:13:12 +02:00
iverase 70d0ec322b DOAP changes for release 9.4.1 2022-10-24 14:20:51 +02:00
gf2121 8cfbc18497
LUCENE-10376: Roll up the loop in vint/vlong in DataInput (#602) 2022-10-24 17:39:22 +08:00
Julie Tibshirani 0f525bfb14
Fix Lucene94HnswVectorsFormat validation on large segments (#11861)
When reading large segments, the vectors format can fail with a validation
error:

java.lang.IllegalStateException: Vector data length 3070061568 not matching
size=999369 * dim=768 * byteSize=4 = -1224905728

The problem is that we use an integer to represent the size, which is too small
to hold it. The bug snuck in during the work to enable int8 values, which
switched a long value to an int.
2022-10-19 13:49:59 -07:00
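The arithmetic in the error message above can be reproduced with a minimal sketch. The class and method names below are hypothetical and are not taken from Lucene's source; the sketch only illustrates how the same product wraps when computed as an `int` and stays exact when the first operand is widened to `long`.

```java
// Hypothetical illustration of the overflow; not Lucene's actual validation code.
public class VectorLengthCheck {

  // All operands are int, so the product is computed in 32 bits and wraps.
  public static int expectedBytesAsInt(int size, int dim, int byteSize) {
    return size * dim * byteSize;
  }

  // Widening the first operand forces the whole product into 64-bit arithmetic.
  public static long expectedBytesAsLong(int size, int dim, int byteSize) {
    return (long) size * dim * byteSize;
  }

  public static void main(String[] args) {
    // Values taken from the exception message quoted in the commit.
    System.out.println(expectedBytesAsInt(999369, 768, 4));  // prints -1224905728
    System.out.println(expectedBytesAsLong(999369, 768, 4)); // prints 3070061568
  }
}
```

The `long` result matches the actual vector data length reported in the exception (3070061568), which is why the validation only fails once the expected size is narrowed to an `int`.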
Patrick Zhai 6cde41c9fd
GITHUB-11838 Change API to allow concurrent query rewrite (#11840)
Replace Query#rewrite(IndexReader) with Query#rewrite(IndexSearcher)
2022-10-19 09:49:40 -07:00
Peter Gromov 05971b3315
hunspell: speed up GeneratingSuggester by not deserializing non-suggestible roots (#11859) 2022-10-19 13:17:43 +02:00
Steven Schlansker f3d85be476
PrimaryNode: add configurable timeout to waitForAllRemotesToClose (#11822) 2022-10-18 17:21:01 -07:00
Adrien Grand 2ed16c7846 Revert "Binary search the entries when all suffixes have the same length in a leaf block. (#11722)"
This reverts commit 3adec5b1ce.
2022-10-18 14:27:02 +02:00
zhouhui 3adec5b1ce
Binary search the entries when all suffixes have the same length in a leaf block. (#11722) 2022-10-18 11:07:52 +02:00
Benjamin Trent cd5e200f47
Fix failure to load larger data sets in KnnGraphTest (#11849)
When running the `reindex` task with KnnGraphTester, exceptionally large
datasets can be used. Since mmap is used to read the data, we need to know the
buffer size. This size is limited to Integer.MAX_VALUE, which is inadequate for
larger datasets.

So, this commit adjusts the reading to only read a single vector at a time.
2022-10-17 16:39:58 -07:00
Peter Gromov 2958f2ae9d
hunspell: speed up suggestions by caching speller and compound stemming requests (#11857)
2022-10-17 21:25:12 +02:00
Zach Chen 21e3f654fb
LUCENE-10635: Ensure test coverage for WANDScorer by using a test query (#1039) 2022-10-15 13:02:02 -07:00
Robert Muir ece8ea715c
Fix ExitableDirectoryReader sampling constants to be power-of-2 (#11850)
If it's performance sensitive enough that we should do sampling, then we should avoid integer division too.
2022-10-15 12:05:15 -04:00
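A rough sketch of why a power-of-2 sampling constant helps: the "every Nth call" test can then be a bit mask instead of an integer division or remainder. The class name and the interval of 16 below are assumptions chosen for illustration; they are not the constants used in ExitableDirectoryReader.

```java
// Hypothetical sketch of sampled checks with a power-of-2 interval.
public class SampledCheck {

  // Assumed interval for illustration only; it must be a power of 2.
  static final int SAMPLE_INTERVAL = 1 << 4; // 16
  static final int SAMPLE_MASK = SAMPLE_INTERVAL - 1;

  // Counts how many of the first totalCalls calls would trigger the expensive check.
  public static int countChecks(int totalCalls) {
    int checks = 0;
    for (int call = 0; call < totalCalls; call++) {
      // Same result as (call % SAMPLE_INTERVAL) == 0 for non-negative call,
      // but computed with a single AND rather than a division.
      if ((call & SAMPLE_MASK) == 0) {
        checks++;
      }
    }
    return checks;
  }

  public static void main(String[] args) {
    System.out.println(countChecks(64)); // prints 4 (calls 0, 16, 32, 48)
  }
}
```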
Benjamin Trent a7369d7f59
Remove cancellation check on every vector (#11843)
We recently introduced support for kNN vectors to `ExitableDirectoryReader`.
Previously, we checked for cancellation not only on sampled calls to `advance`,
but on every single call to `vectorValue`. This can cause significant overhead
when a query scans many vector values (for example the case where you're doing
an exact scan and computing a vector similarity for every matching document).

This PR removes the cancellation checks on `vectorValue`, since having them on
`advance` is already enough.
2022-10-13 09:29:33 -07:00
Marc D'Mello 3a608995a1
GITHUB-11761 (part 2): Fix unit tests to cleanly work with new TieredMergePolicy delete pct default (#11841)
Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
2022-10-13 15:18:50 +02:00
Robert Muir 83891d9a61
WrapperDownloader: add retries for network blips around connect(), too (#11846)
Add retries for common issues such as connect timeout, etc.

This won't solve the problem of read-timeouts happening around the actual
transferTo, but it is an easy incremental improvement.
2022-10-13 07:21:34 -04:00
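The general idea of retrying a failing connect step can be sketched as below. This is a hypothetical helper, not the WrapperDownloader code: the names `ConnectRetry`, `IOAction`, and `withRetries`, as well as the attempt count and pause, are assumptions made for illustration. It retries only on `IOException` and gives up after a fixed number of attempts.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical retry helper; not taken from Lucene's build scripts.
public class ConnectRetry {

  /** An action that may fail with an IOException, such as opening a connection. */
  public interface IOAction<T> {
    T run() throws IOException;
  }

  /** Runs the action up to maxAttempts times, pausing between failed attempts. */
  public static <T> T withRetries(IOAction<T> action, int maxAttempts, long pauseMillis) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.run();
      } catch (IOException e) {
        last = e; // remember the failure and try again if attempts remain
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(pauseMillis);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("interrupted while waiting to retry", ie);
        }
      }
    }
    throw new UncheckedIOException("all " + maxAttempts + " attempts failed", last);
  }
}
```

A caller would wrap only the connect step in `withRetries`. As the commit message notes, a read timeout during the transfer itself is a separate problem that this kind of wrapper does not address.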
Robert Muir 5e26b36ac8
Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory (#11844)
Closes #11842
2022-10-13 07:20:21 -04:00
Peter Gromov ab50fe640b [hunspell] fix TestPerformance measurement after millis->nanos conversion 2022-10-12 11:29:07 +02:00
Robert Muir 4c434b7089
make 'gradle coverage' print test coverage summaries. (#11837)
Currently, this task is too silent and just writes HTML reports. It is a
nice improvement to print the summary to the console.

Before:

```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
```

After:

```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.

> Task :lucene:analysis:icu:jacocoLogTestCoverage
Test Coverage:
    - Class Coverage: 100%
    - Method Coverage: 87.9%
    - Branch Coverage: 82.7%
    - Line Coverage: 92.8%
    - Instruction Coverage: 92.7%
    - Complexity Coverage: 78.8%
```
2022-10-05 21:46:20 -04:00
Marc D'Mello d966adcb62
GITHUB-11761: Move minimum TieredMergePolicy delete percentage and change default value (#11831)
Move minimum TieredMergePolicy delete percentage from 20% to 5% and change deletePctAllowed default to 20%

Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
2022-10-05 15:33:12 -07:00
Uwe Schindler f54fddc89f
GH-11819: Exclude MR-JAR sourceSet and folders from Idea Sync (#11836) 2022-10-04 11:49:39 +02:00
Alan Woodward 6bd8733fdb
No need to rewrite queries in unified highlighter (#11807)
Since QueryVisitor added the ability to signal multi-term queries, the query rewrite
call in UnifiedHighlighter has been essentially useless, and with more aggressive
rewriting this is now causing bugs like #11490. We can safely remove this call.

Fixes #11490
2022-10-03 10:15:40 +01:00
Uwe Schindler df94e6c005
Clean up MR-JAR build, so we do not have hardcoded "19" everywhere in validation tasks (#11835)
As long as sourceSets are named "mainXX", with XX a feature version, we check everything automatically:
- ECJ is disabled (we can't do a check without forking ECJ as a separate process using toolkit, we may support this later)
- forbiddenapis (we disable checks for missing classes)
- errorprone is disabled (errorprone does not work correctly at the moment with forked compiler)
2022-10-02 20:41:46 +02:00
Uwe Schindler e5a226ec7c For now only use bundled signatures from minJavaVersion (#11834)
2022-10-02 17:54:11 +02:00