Port generic exception handling from MemorySegmentIndexInput to ByteBufferIndexInput. This also adds the invalid position while seeking or reading to the exception message.
hunspell: introduce FragmentChecker to speed up ModifyingSuggester
add NGramFragmentChecker to quickly check whether insertions/replacements produce strings that are even possible in the language
Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
This test often takes several minutes with normal runs (no NIGHTLY/multiplier/etc). Tone it down so that it isn't slow: CI builds can work it harder by passing those parameters
* jacoco/ coverage shouldn't trigger all test tasks as dependencies - instead, it should run after those test tasks that you choose to run. removed java plugin from top-level.
* Make coverage depend on the default test task.
* Update jacoco log plugin so that it doesn't make hard dependencies on test tasks.
Co-authored-by: Dawid Weiss <dawid.weiss@carrotsearch.com>
Adds createLatLonShapeDocValues and createXYShapeDocValues factory methods
to LatLonShape and XYShape factory classes, respectively.
Signed-off-by: Nicholas Walter Knize <nknize@apache.org>
When reading large segments, the vectors format can fail with a validation
error:
java.lang.IllegalStateException: Vector data length 3070061568 not matching
size=999369 * dim=768 * byteSize=4 = -1224905728
The problem is that we use an integer to represent the size, which is too small
to hold it. The bug snuck in during the work to enable int8 values, which
switched a long value to an int.
When running the `reindex` task with KnnGraphTester, exceptionally large
datasets can be used. Since mmap is used to read the data, we need to know the
buffer size. This size is limited to Integer.MAX_VALUE, which is inadequate for
larger datasets.
So, this commit adjusts the reading to only read a single vector at a time.
We recently introduced support for kNN vectors to `ExitableDirectoryReader`.
Previously, we checked for cancellation not only on sampled calls `advance`,
but on every single call to `vectorValue`. This can cause significant overhead
when a query scans many vector values (for example the case where you're doing
an exact scan and computing a vector similarity for every matching document).
This PR removes the cancellation checks on `vectorValue`, since having them on
`advance` is already enough.
Add retries for common issues such as connect timeout, etc.
This won't solve the problem of read-timeouts happening around the actual
transferTo, but it is an easy incremental improvement.
Currently, this task is too silent and just writes HTML reports. It is a
nice improvement to print the summary to the console.
Before:
```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
```
After:
```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
> Task :lucene:analysis:icu:jacocoLogTestCoverage
Test Coverage:
- Class Coverage: 100%
- Method Coverage: 87.9%
- Branch Coverage: 82.7%
- Line Coverage: 92.8%
- Instruction Coverage: 92.7%
- Complexity Coverage: 78.8%
```
Move minimum TieredMergePolicy delete percentage from 20% to 5%
and change deletePctAllowed default to 20%
Co-authored-by: Marc D'Mello <dmellomd@amazon.com>