mirror of https://github.com/apache/lucene.git
611bbbd951
The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ... essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and more clauses from the essential list to the non-essential list as the minimum competitive score increases. For instance, a query such as `the book of life`, which I found in the Tantivy benchmark, ends up running as `+book the of life` after some time, i.e. with one required clause and the other clauses optional. This is because matching `the`, `of` and `life` alone is not good enough to yield a competitive match.

Here are some statistics for that case:
- min competitive score: 3.4781857
- max_window_score(book): 2.8796153
- max_window_score(life): 2.037863
- max_window_score(the): 0.103848875
- max_window_score(of): 0.19427927

If you look at these statistics, we could actually do better: a match may only be competitive if it matches both `book` and `life`, so this query could execute as `+book +life the of`, which may help evaluate fewer documents than `+book the of life`, especially if you enable recursive graph bisection.

This is what this PR tries to achieve: when there is a single essential clause and matching all clauses but the best non-essential clause cannot produce a competitive match, the scorer only evaluates documents that match the intersection of the essential clause and the best non-essential clause.

It's worth noting that this optimization would kick in very frequently on 2-clause disjunctions.
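The partitioning described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code from the PR: `MaxScoreSketch`, `Clause` and `rewrite` are hypothetical names, clauses are simply sorted by maximum window score, and the result is returned as a query-like string instead of driving a real scorer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class MaxScoreSketch {
  /** Hypothetical holder: a term and the maximum score it can contribute in the current window. */
  record Clause(String term, double maxWindowScore) {}

  /** Returns how the disjunction would effectively run for the given minimum competitive score. */
  static String rewrite(List<Clause> clauses, double minCompetitiveScore) {
    List<Clause> sorted = new ArrayList<>(clauses);
    sorted.sort(Comparator.comparingDouble(Clause::maxWindowScore));

    // Non-essential clauses: the longest prefix (lowest max scores first) whose
    // summed max scores stay below the minimum competitive score.
    double nonEssentialSum = 0;
    int firstEssential = 0;
    while (firstEssential < sorted.size()
        && nonEssentialSum + sorted.get(firstEssential).maxWindowScore() < minCompetitiveScore) {
      nonEssentialSum += sorted.get(firstEssential).maxWindowScore();
      firstEssential++;
    }
    if (firstEssential == sorted.size()) {
      return ""; // even matching every clause cannot be competitive in this window
    }

    List<Clause> nonEssential = new ArrayList<>(sorted.subList(0, firstEssential));
    List<Clause> essential = sorted.subList(firstEssential, sorted.size());

    StringBuilder sb = new StringBuilder();
    if (essential.size() > 1) {
      // Several essential clauses: a hit must match at least one of them.
      sb.append("+(");
      for (int i = 0; i < essential.size(); i++) {
        if (i > 0) sb.append(' ');
        sb.append(essential.get(i).term());
      }
      sb.append(')');
    } else {
      Clause only = essential.get(0);
      sb.append('+').append(only.term());
      if (!nonEssential.isEmpty()) {
        // Best possible score when the best non-essential clause does NOT match.
        Clause best = nonEssential.get(nonEssential.size() - 1);
        double withoutBest = only.maxWindowScore();
        for (int i = 0; i < nonEssential.size() - 1; i++) {
          withoutBest += nonEssential.get(i).maxWindowScore();
        }
        if (withoutBest < minCompetitiveScore) {
          // Not competitive without it, so the best non-essential clause becomes required too.
          sb.append(" +").append(best.term());
          nonEssential.remove(nonEssential.size() - 1);
        }
      }
    }
    for (Clause c : nonEssential) {
      sb.append(' ').append(c.term());
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Clause> clauses = List.of(
        new Clause("the", 0.103848875),
        new Clause("book", 2.8796153),
        new Clause("of", 0.19427927),
        new Clause("life", 2.037863));
    System.out.println(rewrite(clauses, 3.4781857)); // prints "+book +life the of"
  }
}
```

With the statistics from the commit message, `book` plus `the` plus `of` tops out at about 3.178, below the minimum competitive score of 3.4781857, so `life` is promoted to a required clause. With a lower threshold such as 3.0, the same input stays at `+book the of life`.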
- .github
- buildSrc
- dev-docs
- dev-tools
- gradle
- help
- lucene
- .asf.yaml
- .dir-locals.el
- .git-blame-ignore-revs
- .gitattributes
- .gitignore
- .hgignore
- .lift.toml
- CONTRIBUTING.md
- LICENSE.txt
- NOTICE.txt
- README.md
- build.gradle
- gradlew
- gradlew.bat
- settings.gradle
- versions.lock
- versions.props
README.md
Apache Lucene
Apache Lucene is a high-performance, full-featured text search engine library written in Java.
Online Documentation
This README file only contains basic setup instructions. For more comprehensive documentation, visit:
- Latest Releases: https://lucene.apache.org/core/documentation.html
- Nightly: https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/
- Build System Documentation: help/
- Developer Documentation: dev-docs/
- Migration Guide: lucene/MIGRATE.md
Building
Basic steps:
- Install OpenJDK 17 or 18.
- Clone Lucene's git repository (or download the source distribution).
- Run gradle launcher script (`gradlew`).
We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.
See Contributing Guide for details.
Contributing
Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.
Discussion and Support
- Users Mailing List
- Developers Mailing List
- IRC: `#lucene` and `#lucene-dev` on freenode.net