Apache Lucene open-source search software

backend information-retrieval java lucene nosql search search-engine


Adrien Grand 611bbbd951 Sometimes intersect the essential clause and the best non-essential clause. (#12589)

The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ... essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and more clauses from the essential list to the non-essential list as the minimum competitive score increases. For instance, a query such as `the book of life`, which I found in the Tantivy benchmark, ends up running as `+book the of life` after some time, i.e. with one required clause and the other clauses optional. This is because matching `the`, `of` and `life` alone is not enough to yield a competitive match. Here are some statistics for that case:

- min competitive score: 3.4781857
- max_window_score(book): 2.8796153
- max_window_score(life): 2.037863
- max_window_score(the): 0.103848875
- max_window_score(of): 0.19427927

If you look at these statistics, we could actually do better: a match can only be competitive if it matches both `book` and `life`, so this query could execute as `+book +life the of`, which may help evaluate fewer documents compared to `+book the of life`, especially if you enable recursive graph bisection.

This is what this PR tries to achieve: when there is a single essential clause and matching all clauses but the best non-essential clause cannot produce a competitive match, the scorer only evaluates documents that match the intersection of the essential clause and the best non-essential clause.

It's worth noting that this optimization would kick in very frequently on 2-clause disjunctions.

2023-10-24 17:54:23 +02:00
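The decision described in this commit message can be sketched in a few lines of plain Java. This is a simplified illustration under stated assumptions, not Lucene's scorer code: the names `MaxScoreSketch`, `Clause` and `plan` are hypothetical, and the partitioning here considers only each clause's maximum window score, whereas the real implementation may take other factors into account. The scores fed to it are the ones quoted in the commit message.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Simplified illustration only; hypothetical names, not Lucene's actual scorer code. */
final class MaxScoreSketch {

  /** Hypothetical holder for a term and its maximum score within the current window. */
  static final class Clause {
    final String term;
    final double maxWindowScore;

    Clause(String term, double maxWindowScore) {
      this.term = term;
      this.maxWindowScore = maxWindowScore;
    }
  }

  /** Returns the rewritten query as text; an empty string means no hit can be competitive. */
  static String plan(List<Clause> clauses, double minCompetitiveScore) {
    // Sort by ascending maximum score so the weakest clauses come first.
    List<Clause> sorted = new ArrayList<>(clauses);
    sorted.sort(Comparator.comparingDouble((Clause c) -> c.maxWindowScore));

    double total = 0;
    for (Clause c : sorted) {
      total += c.maxWindowScore;
    }

    // Non-essential clauses: the longest prefix whose summed maximum scores
    // stays below the minimum competitive score.
    int firstEssential = 0;
    double nonEssentialSum = 0;
    while (firstEssential < sorted.size()
        && nonEssentialSum + sorted.get(firstEssential).maxWindowScore < minCompetitiveScore) {
      nonEssentialSum += sorted.get(firstEssential).maxWindowScore;
      firstEssential++;
    }
    if (firstEssential == sorted.size()) {
      return ""; // even a document matching every clause would not be competitive
    }

    List<Clause> essential = sorted.subList(firstEssential, sorted.size());
    List<Clause> used = new ArrayList<>(essential);
    StringBuilder sb = new StringBuilder();
    if (essential.size() == 1) {
      sb.append('+').append(essential.get(0).term);
      if (firstEssential > 0) {
        // Single essential clause: if all clauses except the best non-essential one
        // cannot reach the minimum competitive score, that clause is required too.
        Clause best = sorted.get(firstEssential - 1);
        if (total - best.maxWindowScore < minCompetitiveScore) {
          sb.append(" +").append(best.term);
          used.add(best);
        }
      }
    } else {
      // Several essential clauses: at least one of them must match.
      sb.append("+(");
      for (int i = 0; i < essential.size(); i++) {
        sb.append(i == 0 ? "" : " ").append(essential.get(i).term);
      }
      sb.append(')');
    }
    // Remaining clauses stay optional, listed in their original query order.
    for (Clause c : clauses) {
      if (!used.contains(c)) {
        sb.append(' ').append(c.term);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    List<Clause> query = List.of(
        new Clause("the", 0.103848875),
        new Clause("book", 2.8796153),
        new Clause("of", 0.19427927),
        new Clause("life", 2.037863));
    System.out.println(plan(query, 3.4781857)); // prints "+book +life the of"
  }
}
```

With these numbers, `book`, `the` and `of` together reach at most about 3.18, which is below the minimum competitive score of 3.4781857, so `life` becomes required as well. With a lower threshold such as 3.0, the same sketch keeps `life` optional and yields `+book the of life`.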
.github	Add timeouts to github jobs. Estimates taken from empirical run times (actions history), with a generous buffer added. (#12687)	2023-10-17 08:02:53 +02:00
buildSrc	GITHUB#12655: Upgrade to Gradle 8.4	2023-10-11 16:11:53 -04:00
dev-docs	a bit of clarification about GitHub Milestone	2022-08-28 13:52:58 +09:00
dev-tools	Add a little bit more hint to releaseWizard	2023-10-09 17:05:23 -03:00
gradle	Add createClassLoader to replicator permissions (block specific to jacoco). (#12684)	2023-10-16 09:11:57 +02:00
help	slight correction	2023-10-21 02:05:34 -04:00
lucene	Sometimes intersect the essential clause and the best non-essential clause. (#12589)	2023-10-24 17:54:23 +02:00
.asf.yaml	.asf.yaml	2022-08-16 20:02:47 +09:00
.dir-locals.el	LUCENE-9322: Add Lucene90 codec, including VectorFormat	2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs	GITHUB#12655: Add google java format upgrade tidy / regen to blame ignore	2023-10-11 16:15:42 -04:00
.gitattributes	LUCENE-10305: Ensure line endings of versions.props is LF	2021-12-11 10:10:44 +09:00
.gitignore	LUCENE-9920: Remove binary gradle-wrapper.jar from the repository	2021-04-10 16:08:39 +02:00
.hgignore	LUCENE-2792: add FST impl	2010-12-12 15:36:08 +00:00
.lift.toml	Disable liftbot, we have our own tools	2022-05-05 22:27:57 +02:00
CONTRIBUTING.md	Fix type in CONTRIBUTING.md (#11879 )	2022-11-01 20:10:05 +00:00
LICENSE.txt	LUCENE-10163 Move LICENSE and NOTICE file to top level (#388 )	2021-10-18 01:24:11 +02:00
NOTICE.txt	Cleanup NOTICE.txt (#12227 )	2023-04-18 15:58:09 -04:00
README.md	Allow building with java 18 now that gradle supports it (#11889 )	2022-10-28 23:41:09 -04:00
build.gradle	GITHUB#12655: upgrade jacoco aggregation plugin (failed with gradle 8.x), change html output property.	2023-10-12 20:42:17 +02:00
gradlew	GITHUB#12655: Upgrade to Gradle 8.4	2023-10-11 16:11:53 -04:00
gradlew.bat	GITHUB#12655: Upgrade to Gradle 8.4	2023-10-11 16:11:53 -04:00
settings.gradle	Added JMH micro-benchmarks submodule (#12663 )	2023-10-12 20:25:34 +02:00
versions.lock	Added JMH micro-benchmarks submodule (#12663 )	2023-10-12 20:25:34 +02:00
versions.props	Upgrade to errorprone 2.18 (#12086 )	2023-01-14 14:39:23 -05:00

README.md

Apache Lucene

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Latest Releases: https://lucene.apache.org/core/documentation.html
Nightly: https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/
Build System Documentation: help/
Developer Documentation: dev-docs/
Migration Guide: lucene/MIGRATE.md

Building

Basic steps:

Install OpenJDK 17 or 18.
Clone Lucene's git repository (or download the source distribution).
Run the Gradle launcher script (gradlew).

We assume that you know how to get and set up the JDK. If you don't, we suggest starting at https://jdk.java.net/ and learning more about Java before returning to this README.

See the Contributing Guide for details.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support

Users Mailing List
Developers Mailing List
IRC: #lucene and #lucene-dev on freenode.net