36253 Commits

Author SHA1 Message Date
Peter Gromov 2958f2ae9d
hunspell: speedup suggestions by caching speller and compound stemming requests (#11857)
hunspell: speed up suggestions by caching speller and compound stemming requests
2022-10-17 21:25:12 +02:00
Zach Chen 21e3f654fb
LUCENE-10635: Ensure test coverage for WANDScorer by using a test query (#1039) 2022-10-15 13:02:02 -07:00
Robert Muir ece8ea715c
Fix ExitableDirectoryReader sampling constants to be power-of-2 (#11850)
If it's performance sensitive enough that we should do sampling, then we should avoid integer division too.
2022-10-15 12:05:15 -04:00
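The reasoning in the commit above (sample with a power-of-2 interval so the check needs no integer division) can be sketched as follows. This is a minimal illustration, not the code from `ExitableDirectoryReader`: the class name `SampledCheck`, the method `shouldCheck`, and the interval of 16 are all assumptions made for the example.

```java
/** Hypothetical sketch: sample every Nth call using a bit mask instead of the % operator. */
final class SampledCheck {
  // Assumed interval; it must be a power of two so that (INTERVAL - 1) is a valid bit mask.
  static final int INTERVAL = 1 << 4; // 16

  private int calls;

  /** Returns true on every INTERVAL-th call, starting with the first call. */
  boolean shouldCheck() {
    // For a power-of-two INTERVAL, masking with (INTERVAL - 1) selects the low bits,
    // which matches calls % INTERVAL for non-negative values without a division.
    return (calls++ & (INTERVAL - 1)) == 0;
  }
}
```

A caller would invoke `shouldCheck()` on each sampled operation and run the expensive timeout check only when it returns true.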
Benjamin Trent a7369d7f59
Remove cancellation check on every vector (#11843)
We recently introduced support for kNN vectors to `ExitableDirectoryReader`.
Previously, we checked for cancellation not only on sampled calls to `advance`,
but on every single call to `vectorValue`. This can cause significant overhead
when a query scans many vector values (for example the case where you're doing
an exact scan and computing a vector similarity for every matching document).

This PR removes the cancellation checks on `vectorValue`, since having them on
`advance` is already enough.
2022-10-13 09:29:33 -07:00
Marc D'Mello 3a608995a1
GITHUB-11761 (part 2): Fix unit tests to cleanly work with new TieredMergePolicy delete pct default (#11841)
Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
2022-10-13 15:18:50 +02:00
Robert Muir 83891d9a61
WrapperDownloader: add retries for network blips around connect(), too (#11846)
Add retries for common issues such as connect timeout, etc.

This won't solve the problem of read-timeouts happening around the actual
transferTo, but it is an easy incremental improvement.
2022-10-13 07:21:34 -04:00
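The retry idea described in the commit above might look roughly like the sketch below. It is not the `WrapperDownloader` code: `RetrySketch`, `withRetries`, and the attempt and delay parameters are hypothetical names chosen for illustration, and retrying on every `IOException` is an assumption about what counts as a network blip.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical sketch of retrying an I/O action a bounded number of times. */
final class RetrySketch {
  /** Runs the action up to maxAttempts times, retrying only when it throws IOException. */
  static <T> T withRetries(int maxAttempts, long delayMillis, Callable<T> action)
      throws Exception {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return action.call();
      } catch (IOException e) {
        // Remember the failure and pause before the next attempt, if one remains.
        last = e;
        if (attempt < maxAttempts) {
          Thread.sleep(delayMillis);
        }
      }
    }
    // Every attempt failed, so surface the most recent I/O error.
    throw last;
  }
}
```

As the commit message notes, wrapping only the connect step this way would not help with read timeouts that occur later, during the transfer itself.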
Robert Muir 5e26b36ac8
Mark TestLongBitSet.testHugeCapacity @Monster as it requires a lot of memory (#11844)
Closes #11842
2022-10-13 07:20:21 -04:00
Peter Gromov ab50fe640b [hunspell] fix TestPerformance measurement after millis->nanos conversion 2022-10-12 11:29:07 +02:00
Robert Muir 4c434b7089
make 'gradle coverage' print test coverage summaries. (#11837)
Currently, this task is too silent and just writes HTML reports. It is a
nice improvement to print the summary to the console.

Before:

```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.
```

After:

```
> Task :lucene:analysis:icu:jacocoTestReport
Code coverage report at: /home/rmuir/workspace/lucene/lucene/analysis/icu/build/reports/jacoco/test/html.

> Task :lucene:analysis:icu:jacocoLogTestCoverage
Test Coverage:
    - Class Coverage: 100%
    - Method Coverage: 87.9%
    - Branch Coverage: 82.7%
    - Line Coverage: 92.8%
    - Instruction Coverage: 92.7%
    - Complexity Coverage: 78.8%
```
2022-10-05 21:46:20 -04:00
Marc D'Mello d966adcb62
GITHUB-11761: Move minimum TieredMergePolicy delete percentage and change default value (#11831)
Move minimum TieredMergePolicy delete percentage from 20% to 5%
and change deletePctAllowed default to 20%

Co-authored-by: Marc D'Mello <dmellomd@amazon.com>
2022-10-05 15:33:12 -07:00
Uwe Schindler f54fddc89f
GH-11819: Exclude MR-JAR sourceSet and folders from Idea Sync (#11836) 2022-10-04 11:49:39 +02:00
Alan Woodward 6bd8733fdb
No need to rewrite queries in unified highlighter (#11807)
Since QueryVisitor added the ability to signal multi-term queries, the query rewrite
call in UnifiedHighlighter has been essentially useless, and with more aggressive
rewriting this is now causing bugs like #11490. We can safely remove this call.

Fixes #11490
2022-10-03 10:15:40 +01:00
Uwe Schindler df94e6c005
Clean up MR-JAR build, so we do not have hardcoded "19" everywhere in validation tasks (#11835)
As long as sourceSets are named "mainXX", with XX a feature version, we check everything automatically:
- ECJ is disabled (we can't do a check without forking ECJ as a separate process using toolkit, we may support this later)
- forbiddenapis (we disable checks for missing classes)
- errorprone is disabled (errorprone does not work correctly at the moment with forked compiler)
2022-10-02 20:41:46 +02:00
Uwe Schindler e5a226ec7c For now only use bundled signatures from minJavaVersion (#11834)
# Conflicts:
#	gradle/validation/forbidden-apis.gradle
2022-10-02 17:54:11 +02:00
Uwe Schindler aae293437f
Upgrade forbiddenapis to 3.4 (#11834) 2022-10-02 16:42:36 +02:00
Uwe Schindler 7333f0329b Fix typo in log message (we only support exactly Java 19) 2022-10-02 11:09:58 +02:00
Michael Sokolov 9c12bec4a4 DOAP changes for release 9.4.0 2022-09-30 18:03:35 -04:00
Greg Miller 44b4602776
TermInSetQuery optimization when all docs in a field match a term (#11828) 2022-09-29 06:59:59 -07:00
Greg Miller 367cd2ea95 Associate correct PR with DrillSideways change in CHANGES 2022-09-29 05:48:29 -07:00
Greg Miller d02ba3134f
DrillSideways optimizations (#11803)
DrillSidewaysScorer now breaks up first- and second-phase matching and makes use of advance when possible over nextDoc.
2022-09-29 05:22:30 -07:00
Uwe Schindler 6f25c79db3 Update smoketester on main to optionally run with Java 19 2022-09-27 12:24:24 +02:00
Uwe Schindler c2058d71a1
Let smoketester initialize local settings before running any checks (like Github CI or Jenkins) (#11826) 2022-09-27 11:45:38 +02:00
Uwe Schindler 1f30800cb5
GH-11819: Fix the Eclipse part to support development of the MR-JAR: (#11823)
- by default, Lucene will only generate a config for Java 17 (or 11 in 9.x), without the MR-JAR sourceSets
- if passed -Peclipse.javaVersion=19, it will include matching sourcesets and set compiler version to given version in classpath
2022-09-27 11:11:49 +02:00
Ignacio Vera 78b58b8e2e
Build SpatialVisitor once per index (#11825)
Address a performance regression on polygon queries using LatLonPoint field.
2022-09-27 10:51:49 +02:00
Greg Miller 971ae01164
Fix tie-break bug in various Facets implementations (#11768) 2022-09-26 15:05:57 -07:00
Greg Miller 734841d6c0
Optimize MultiTermQueryConstantScoreWrapper for case when a term matches all docs in a segment. (#11738) 2022-09-26 10:39:47 -07:00
Greg Miller ac12cd9f17
FacetsCollector#collect is no longer final to allow extension (#11804) 2022-09-26 10:15:31 -07:00
Uwe Schindler d943b76215 GITHUB-912: Remove deprecated APIs; fix link 2022-09-26 18:36:09 +02:00
Uwe Schindler 3b9c728ab5
MR-JAR rewrite of MMapDirectory with JDK-19 preview Panama APIs (>= JDK-19-ea+23) (#912)
This uses Gradle's auto-provisioning to compile Java 19 classes and build a multi-release JAR from them. Please make sure to regenerate gradle.properties (delete it) or change "org.gradle.java.installations.auto-download" to "true"
2022-09-26 15:22:04 +02:00
Adrien Grand 432296d967
Fix codec name in index header for Lucene94FieldInfosFormat. (#11818) 2022-09-26 14:56:30 +02:00
Dawid Weiss 6b82be5f11
Regenerate sources after dependency updates. (#11817) 2022-09-25 18:09:30 +02:00
Dawid Weiss 5d121ce44c
Upgrade several build dependencies. (#11812)
* Upgrade several build dependencies.

* Update error prone rules (those are off but they do trigger warnings/ errors)

* A few corrections I made before I turned off new warnings. Let's do another issue to fix them.
2022-09-25 17:10:22 +02:00
Robert Muir 15f3743f02
Remove Operations.isFinite (#11813)
This method is recursive: to avoid eating too much stack we apply a
small limit. This means it can't really be used on any largish automata
without hitting an exception.

But the benefit of knowing finite vs infinite in AutomatonTermsEnum is
minor: let's not auto-compute this. FuzzyQuery still gets the finite
optimization because it's finite by definition. PrefixQuery is always
infinite. Wildcard/Regex just assume infinite which is safe to do.

Remove the auto-computation and the "trillean" Boolean parameter. If you
don't know that your automaton is finite, pass false to
CompiledAutomaton, it is safe.

Move this method to AutomatonTestUtil so we can still use it in test
asserts.

Closes #11809
2022-09-24 10:51:04 -04:00
Dawid Weiss 54fba99cb1
Upgrade google java format and apply tidy (#11811) 2022-09-24 15:40:27 +02:00
Dawid Weiss 8bdfa90ea9 Fix and simplify the test (#11734). 2022-09-24 12:51:01 +02:00
Alan Woodward 188a78d769
Don't try to highlight very long terms (#11808)
The UnifiedHighlighter can throw exceptions when highlighting terms that are longer
than the maximum size the DaciukMihovAutomatonBuilder accepts. Rather than throwing
a confusing exception, we can instead filter out the long terms when building the
MemoryIndexOffsetStrategy. Very long terms are likely to be junk input in any case.
2022-09-24 11:26:16 +01:00
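The filtering step described in the commit above can be sketched as follows. This is an illustration under stated assumptions rather than the `UnifiedHighlighter` change itself: `LongTermFilter`, `dropLongTerms`, and the `maxLength` parameter are hypothetical, and measuring length in UTF-8 bytes is an assumption, since the message does not say how the builder's limit is measured.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: drop over-long terms before handing them to an automaton builder. */
final class LongTermFilter {
  /** Returns the terms whose UTF-8 encoding is at most maxLength bytes, in input order. */
  static List<String> dropLongTerms(List<String> terms, int maxLength) {
    List<String> kept = new ArrayList<>();
    for (String term : terms) {
      // Assumption: the limit is expressed in UTF-8 bytes rather than chars.
      if (term.getBytes(StandardCharsets.UTF_8).length <= maxLength) {
        kept.add(term);
      }
    }
    return kept;
  }
}
```

Dropping the long terms silently, instead of failing, matches the commit's rationale that such terms are likely junk input.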
Luke Kot-Zaniewski 3a04aa44c2
Fix repeating token sentence boundary bug (#11734)
Signed-off-by: lkotzaniewsk <lkotzaniewsk@bloomberg.net>
Co-authored-by: Dawid Weiss <dawid.weiss@gmail.com>
2022-09-23 12:59:46 +02:00
jianping weng 5b24a233bd
LUCENE-10425: speed up IndexSortSortedNumericDocValuesRangeQuery#BoundedDocSetIdIterator construction using bkd binary search (#687) 2022-09-22 08:51:13 +02:00
Shai Erera bcc116057d
Minor refactoring and cleanup to taxonomy index code (#11775) 2022-09-21 13:08:33 +03:00
Julie Tibshirani add309bb40 Mute TestKnnVectorQuery#testFilterWithSameScore while we work on a fix 2022-09-20 15:48:56 -07:00
Luca Cavanna 4eaebee686
Guard FieldExistsQuery against null pointers (#11794)
FieldExistsQuery checks if there are points for a certain field, and then retrieves the
corresponding point values. When all documents that had points for a certain field have
been deleted from a certain segment, as well as merged away, field info may report
that there are points yet the corresponding point values are null.

With this change we add a null check in FieldExistsQuery. Long term, we will likely want
to prevent this situation from happening.

Relates #11393
2022-09-20 15:38:38 +02:00
Adrien Grand 6c46662b43
Fix handling of ghost fields in string sorts. (#11792)
Introduction of dynamic pruning for string sorts (#11669) introduced a bug with
string sorts and ghost fields, triggering a `NullPointerException` because the
code assumes that `LeafReader#terms` is not null if the field is indexed
according to field infos.

This commit fixes the issue and adds tests for ghost fields across all sort
types.

Hopefully we can simplify and remove the null check in the future when we
improve handling of ghost fields (#11393).
2022-09-20 13:49:52 +02:00
Jan Høydahl 00a8112d97
LUCENE-10365 Wizard changes contributed from Solr (#591) 2022-09-20 12:07:42 +02:00
Alex 26d6063ec3
GitHub Workflows security hardening (#11789) 2022-09-20 11:28:07 +02:00
Ignacio Vera ecb0ba542b
Improve tessellator performance by delaying calls to the method #isIntersectingPolygon (#11786) 2022-09-20 07:15:38 +02:00
Michael Sokolov accc3bdcfa
update DOAP and releaseWizard to reflect migration to github (#11747) 2022-09-19 13:53:26 -04:00
Michael Sokolov 07af358f90
Diversity check bugfix (#11781)
* Fixes bug in HNSW diversity checks introduced in LUCENE-10577
2022-09-19 11:48:59 -04:00
Michael Sokolov e69c48b8d9 Fix rare bug in TestKnnVectorQuery when we have multiple segments 2022-09-18 20:21:39 +00:00
Namgyu Kim 451bab300e
GITHUB#11778: Add detailed part-of-speech tag for particle and ending on Nori (#11779) 2022-09-17 00:42:35 +09:00
Adrien Grand 155876a902 LUCENE-10674: Move changes entry to 9.4. 2022-09-16 16:59:42 +02:00