We recently introduced support for kNN vectors to `ExitableDirectoryReader`.
Previously, we checked for cancellation not only on sampled calls to `advance`,
but on every single call to `vectorValue`. This can cause significant overhead
when a query scans many vector values (for example the case where you're doing
an exact scan and computing a vector similarity for every matching document).
This PR removes the cancellation checks on `vectorValue`, since having them on
`advance` is already enough.
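
A minimal sketch of the idea, not the actual `ExitableDirectoryReader` code: the class name, the sampling interval and the `checks` counter are hypothetical and exist only for illustration. Cancellation is polled on a sample of `advance` calls, while `vectorValue` stays free of checks.

```java
import java.util.function.BooleanSupplier;

/** Toy stand-in for a wrapped vector iterator; not a Lucene class. */
final class SampledCancellationVectors {
  /** Hypothetical exception thrown once the cancellation flag is set. */
  static final class CancelledException extends RuntimeException {
    CancelledException() {
      super("query cancelled");
    }
  }

  // Hypothetical sampling rate: poll the flag on every 10th advance call.
  private static final int CHECK_INTERVAL = 10;

  private final float[][] vectors;
  private final BooleanSupplier cancelled;
  private int doc = -1;
  private int calls = 0;
  int checks = 0; // number of cancellation polls, exposed for illustration

  SampledCancellationVectors(float[][] vectors, BooleanSupplier cancelled) {
    this.vectors = vectors;
    this.cancelled = cancelled;
  }

  /** Moves to the target doc; polls for cancellation only on sampled calls. */
  int advance(int target) {
    if (calls++ % CHECK_INTERVAL == 0) {
      checks++;
      if (cancelled.getAsBoolean()) {
        throw new CancelledException();
      }
    }
    // Integer.MAX_VALUE plays the role of "no more docs" in this toy.
    doc = target < vectors.length ? target : Integer.MAX_VALUE;
    return doc;
  }

  /** Hot path: returns the current vector without any cancellation check. */
  float[] vectorValue() {
    return vectors[doc];
  }
}
```

In this sketch, an exact scan over 25 documents polls the flag 3 times instead of once per vector read.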
Add retries for common issues such as connect timeouts.
This won't solve the problem of read-timeouts happening around the actual
transferTo, but it is an easy incremental improvement.
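
A rough sketch of such a retry loop, assuming the retried action only establishes the connection. The helper name `withRetries` and its parameters are hypothetical and not taken from the actual change; failures during the later transfer are deliberately out of scope, as noted above.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/** Hypothetical helper; illustrates the retry idea only. */
final class RetrySketch {
  /**
   * Runs the action up to maxAttempts times, retrying only on connection
   * failures and timeouts. Any other exception propagates immediately.
   */
  static <T> T withRetries(Callable<T> action, int maxAttempts) throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (ConnectException | SocketTimeoutException e) {
        last = e; // remember the failure and try again
      }
    }
    throw last; // every attempt failed; rethrow the most recent failure
  }
}
```

A real implementation would probably also wait between attempts; the sketch omits that to stay short.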
Currently, this task is too silent and just writes HTML reports. It is a
nice improvement to print the summary to the console.
Before:
```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
```
After:
```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
> Task :lucene:analysis:icu:jacocoLogTestCoverage
Test Coverage:
- Class Coverage: 100%
- Method Coverage: 87.9%
- Branch Coverage: 82.7%
- Line Coverage: 92.8%
- Instruction Coverage: 92.7%
- Complexity Coverage: 78.8%
```
Move minimum TieredMergePolicy delete percentage from 20% to 5%
and change deletePctAllowed default to 20%
Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
Since QueryVisitor added the ability to signal multi-term queries, the query rewrite
call in UnifiedHighlighter has been essentially useless, and with more aggressive
rewriting this is now causing bugs like #11490. We can safely remove this call.
Fixes #11490
As long as sourceSets are named "mainXX", with XX a feature version, we check everything automatically:
- ECJ is disabled (we can't do a check without forking ECJ as a separate process using toolkit, we may support this later)
- forbiddenapis (we disable checks for missing classes)
- errorprone is disabled (errorprone does not work correctly at the moment with a forked compiler)
- by default, Lucene will only generate a config for Java 17 (or 11 in 9.x), without the MR-JAR sourceSets
- if passed -Peclipse.javaVersion=19, it will include the matching sourceSets and set the compiler version to the given version in the classpath
This uses Gradle's auto-provisioning to compile Java 19 classes and build a multi-release JAR from them. Please make sure to regenerate gradle.properties (delete it) or change "org.gradle.java.installations.auto-download" to "true".
* Upgrade several build dependencies.
* Update error prone rules (those are off but they do trigger warnings/errors).
* A few corrections I made before I turned off new warnings. Let's do another issue to fix them.
This method is recursive: to avoid eating too much stack we apply a
small limit. This means it can't really be used on any largish automata
without hitting an exception.
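
To illustrate the limitation, here is a simplified sketch of a recursive finiteness check with a depth cap. It is not Lucene's implementation: the adjacency-list representation, the `MAX_DEPTH` value and the class name are all assumptions, and the sketch assumes every state can still reach an accept state, so any reachable cycle means an infinite language.

```java
import java.util.List;

/** Toy finiteness check over an adjacency list; not Lucene's Automaton API. */
final class FiniteCheckSketch {
  // Hypothetical cap on recursion depth, to avoid exhausting the stack.
  static final int MAX_DEPTH = 1000;

  /**
   * Returns true if no cycle is reachable from state 0.
   * transitions.get(s) holds the target states of state s.
   */
  static boolean isFinite(List<int[]> transitions) {
    int n = transitions.size();
    return n == 0 || visit(transitions, 0, new boolean[n], new boolean[n], 0);
  }

  private static boolean visit(
      List<int[]> transitions, int state, boolean[] onPath, boolean[] done, int depth) {
    if (depth > MAX_DEPTH) {
      throw new IllegalArgumentException("automaton too deep to check recursively");
    }
    onPath[state] = true;
    for (int next : transitions.get(state)) {
      if (onPath[next]) {
        return false; // back edge to a state on the current path: cycle
      }
      if (!done[next] && !visit(transitions, next, onPath, done, depth + 1)) {
        return false;
      }
    }
    onPath[state] = false;
    done[state] = true; // fully explored, no cycle reachable from here
    return true;
  }
}
```

A plain chain of a few thousand states is already enough to trip the depth cap in this sketch, which is the kind of failure described above.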
But the benefit of knowing finite vs infinite in AutomatonTermsEnum is
minor: let's not auto-compute this. FuzzyQuery still gets the finite
optimization because it's finite by definition. PrefixQuery is always
infinite. Wildcard/Regex just assume infinite which is safe to do.
Remove the auto-computation and the "trillean" Boolean parameter. If you
don't know that your automaton is finite, pass false to
CompiledAutomaton; it is safe.
Move this method to AutomatonTestUtil so we can still use it in test
asserts.
Closes #11809
The UnifiedHighlighter can throw exceptions when highlighting terms that are longer
than the maximum size the DaciukMihovAutomatonBuilder accepts. Rather than throwing
a confusing exception, we can instead filter out the long terms when building the
MemoryIndexOffsetStrategy. Very long terms are likely to be junk input in any case.
FieldExistsQuery checks if there are points for a certain field, and then retrieves the
corresponding point values. When all documents that had points for a certain field have
been deleted from a certain segment, as well as merged away, field info may report
that there are points yet the corresponding point values are null.
With this change we add a null check in FieldExistsQuery. Long term, we will likely want
to prevent this situation from happening.
Relates #11393
Dynamic pruning for string sorts (#11669) introduced a bug with
string sorts and ghost fields, triggering a `NullPointerException` because the
code assumes that `LeafReader#terms` is not null if the field is indexed
according to field infos.
This commit fixes the issue and adds tests for ghost fields across all sort
types.
Hopefully we can simplify and remove the null check in the future when we
improve handling of ghost fields (#11393).