mirror of https://github.com/apache/lucene.git
5a5aa2c8fa
The current dot-product score scaling and similarity implementation assumes normalized vectors. This disregards information that the model may store within the magnitude. See: https://github.com/apache/lucene/issues/12342#issuecomment-1658640222 for a good explanation for the need. To prevent from breaking current scoring assumptions in Lucene, a new `MAXIMUM_INNER_PRODUCT` similarity function is added. Because the similarity from a `dotProduct` function call could be negative, this similarity scorer will scale negative dotProducts to between 0-1 and then all positive dotProduct values are from 1-MAX. One concern with adding this similarity function is that it breaks the triangle inequality. It is assumed that this is needed to build graph structures. But, there is conflicting research here when it comes to real-world data. See: - For: https://github.com/apache/lucene/issues/12342#issuecomment-1618258984 - Against: https://github.com/apache/lucene/issues/12342#issuecomment-1631577657, https://github.com/apache/lucene/issues/12342#issuecomment-1631808301 To check if any transformation of the input is required to satisfy the triangle inequality, many tests have been ran See: - https://github.com/apache/lucene/issues/12342#issuecomment-1653420640 - https://github.com/apache/lucene/issues/12342#issuecomment-1656112434 - https://github.com/apache/lucene/issues/12342#issuecomment-1656718447 If there are any additional tests, or issues with the provided tests & scripts, please let me know. We want to make sure this works well for our users. closes: https://github.com/apache/lucene/issues/12342 |
||
---|---|---|
.github | ||
buildSrc | ||
dev-docs | ||
dev-tools | ||
gradle | ||
help | ||
lucene | ||
.asf.yaml | ||
.dir-locals.el | ||
.git-blame-ignore-revs | ||
.gitattributes | ||
.gitignore | ||
.hgignore | ||
.lift.toml | ||
CONTRIBUTING.md | ||
LICENSE.txt | ||
NOTICE.txt | ||
README.md | ||
build.gradle | ||
gradlew | ||
gradlew.bat | ||
settings.gradle | ||
versions.lock | ||
versions.props |
README.md
Apache Lucene
Apache Lucene is a high-performance, full-featured text search engine library written in Java.
Online Documentation
This README file only contains basic setup instructions. For more comprehensive documentation, visit:
- Latest Releases: https://lucene.apache.org/core/documentation.html
- Nightly: https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/
- Build System Documentation: help/
- Developer Documentation: dev-docs/
- Migration Guide: lucene/MIGRATE.md
Building
Basic steps:
- Install OpenJDK 17 or 18.
- Clone Lucene's git repository (or download the source distribution).
- Run gradle launcher script (
gradlew
).
We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.
See Contributing Guide for details.
Contributing
Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.
Discussion and Support
- Users Mailing List
- Developers Mailing List
- IRC:
#lucene
and#lucene-dev
on freenode.net