Apache Lucene open-source search software
Adrien Grand 611bbbd951
Sometimes intersect the essential clause and the best non-essential clause. (#12589)
The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ...
essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and
more clauses from the essential list to the non-essential list as the minimum
competitive score increases. For instance, a query such as `the book of life`,
which I found in the Tantivy benchmark, ends up running as `+book the of life`
after some time, i.e. with one required clause and the other clauses optional.
This is because matching `the`, `of` and `life` alone is not good enough to
yield a match.

Here are some statistics in that case:
 - min competitive score: 3.4781857
 - max_window_score(book): 2.8796153
 - max_window_score(life): 2.037863
 - max_window_score(the): 0.103848875
 - max_window_score(of): 0.19427927
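
As a rough illustration of how these numbers lead to `+book the of life`, here
is a minimal sketch of the partitioning step. It assumes clauses are sorted by
increasing maximum window score and treated as non-essential for as long as
their cumulative maximum score stays below the minimum competitive score. The
class and method names are hypothetical and are not Lucene's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Lucene code: splits clauses into essential and
// non-essential ones from their per-window maximum scores.
final class MaxScorePartitionSketch {

  /** Returns the clauses that cannot produce a competitive match on their own. */
  static List<String> nonEssential(Map<String, Double> maxWindowScores, double minCompetitiveScore) {
    List<Map.Entry<String, Double>> clauses = new ArrayList<>(maxWindowScores.entrySet());
    // Lowest-scoring clauses first, so that as many clauses as possible become non-essential.
    clauses.sort(Map.Entry.comparingByValue());
    List<String> result = new ArrayList<>();
    double sum = 0;
    for (Map.Entry<String, Double> clause : clauses) {
      if (sum + clause.getValue() >= minCompetitiveScore) {
        break; // this clause and all higher-scoring ones are essential
      }
      sum += clause.getValue();
      result.add(clause.getKey());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Double> scores = new LinkedHashMap<>();
    scores.put("book", 2.8796153);
    scores.put("life", 2.037863);
    scores.put("the", 0.103848875);
    scores.put("of", 0.19427927);
    // Prints [the, of, life]: only `book` remains essential.
    System.out.println(nonEssential(scores, 3.4781857));
  }
}
```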

Looking at these statistics, we could do better: a match can only be
competitive if it matches both `book` and `life`, so this query could execute
as `+book +life the of`, which may help evaluate fewer documents than
`+book the of life`, especially if you enable recursive graph bisection.

This is what this PR tries to achieve: when there is a single essential clause
and matching all clauses but the best non-essential clause cannot produce a
competitive match, the scorer only evaluates documents that match the
intersection of the essential clause and the best non-essential clause.
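
The condition can be checked by hand with the statistics quoted in this
message. The sketch below is an illustration only: the class and method names
are hypothetical, and the actual scorer may implement the check differently.

```java
// Hypothetical sketch, not Lucene code: decides whether the best non-essential
// clause can be treated as required alongside the single essential clause.
final class IntersectionCheckSketch {

  /**
   * Returns true if a document that misses the best non-essential clause can
   * never reach the minimum competitive score.
   */
  static boolean canRequireBestNonEssential(
      double essentialMax, double[] nonEssentialMax, double minCompetitiveScore) {
    if (nonEssentialMax.length == 0) {
      return false; // nothing to intersect with
    }
    double total = essentialMax;
    double best = 0;
    for (double score : nonEssentialMax) {
      total += score;
      best = Math.max(best, score);
    }
    // Upper bound on the score of a document matching everything but the best non-essential clause.
    double withoutBest = total - best;
    return withoutBest < minCompetitiveScore;
  }

  public static void main(String[] args) {
    double book = 2.8796153; // the single essential clause
    double[] others = {2.037863, 0.103848875, 0.19427927}; // life, the, of
    // book + the + of is about 3.178, below 3.4781857, so `life` can be required.
    System.out.println(canRequireBestNonEssential(book, others, 3.4781857)); // true
  }
}
```

With a lower minimum competitive score, for example 3.0, the same call returns
false and the query would keep running as `+book the of life`.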

It's worth noting that this optimization would kick in very frequently on
two-clause disjunctions.
2023-10-24 17:54:23 +02:00
.github Add timeouts to github jobs. Estimates taken from empirical run times (actions history), with a generous buffer added. (#12687) 2023-10-17 08:02:53 +02:00
buildSrc GITHUB#12655: Upgrade to Gradle 8.4 2023-10-11 16:11:53 -04:00
dev-docs a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
dev-tools Add a little bit more hint to releaseWizard 2023-10-09 17:05:23 -03:00
gradle Add createClassLoader to replicator permissions (block specific to jacoco). (#12684) 2023-10-16 09:11:57 +02:00
help slight correction 2023-10-21 02:05:34 -04:00
lucene Sometimes intersect the essential clause and the best non-essential clause. (#12589) 2023-10-24 17:54:23 +02:00
.asf.yaml .asf.yaml 2022-08-16 20:02:47 +09:00
.dir-locals.el LUCENE-9322: Add Lucene90 codec, including VectorFormat 2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs GITHUB#12655: Add google java format upgrade tidy / regen to blame ignore 2023-10-11 16:15:42 -04:00
.gitattributes LUCENE-10305: Ensure line endings of versions.props is LF 2021-12-11 10:10:44 +09:00
.gitignore LUCENE-9920: Remove binary gradle-wrapper.jar from the repository 2021-04-10 16:08:39 +02:00
.hgignore LUCENE-2792: add FST impl 2010-12-12 15:36:08 +00:00
.lift.toml Disable liftbot, we have our own tools 2022-05-05 22:27:57 +02:00
CONTRIBUTING.md Fix type in CONTRIBUTING.md (#11879) 2022-11-01 20:10:05 +00:00
LICENSE.txt LUCENE-10163 Move LICENSE and NOTICE file to top level (#388) 2021-10-18 01:24:11 +02:00
NOTICE.txt Cleanup NOTICE.txt (#12227) 2023-04-18 15:58:09 -04:00
README.md Allow building with java 18 now that gradle supports it (#11889) 2022-10-28 23:41:09 -04:00
build.gradle GITHUB#12655: upgrade jacoco aggregation plugin (failed with gradle 8.x), change html output property. 2023-10-12 20:42:17 +02:00
gradlew GITHUB#12655: Upgrade to Gradle 8.4 2023-10-11 16:11:53 -04:00
gradlew.bat GITHUB#12655: Upgrade to Gradle 8.4 2023-10-11 16:11:53 -04:00
settings.gradle Added JMH micro-benchmarks submodule (#12663) 2023-10-12 20:25:34 +02:00
versions.lock Added JMH micro-benchmarks submodule (#12663) 2023-10-12 20:25:34 +02:00
versions.props Upgrade to errorprone 2.18 (#12086) 2023-01-14 14:39:23 -05:00

README.md

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building

Basic steps:

  1. Install OpenJDK 17 or 18.
  2. Clone Lucene's git repository (or download the source distribution).
  3. Run the Gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

See Contributing Guide for details.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support