Apache Lucene open-source search software
Go to file
Benjamin Trent 5a5aa2c8fa
GITHUB#12342 Add new maximum inner product vector similarity method (#12479)
The current dot-product score scaling and similarity implementation assumes normalized vectors. This disregards information that the model may store within the magnitude. 

See: https://github.com/apache/lucene/issues/12342#issuecomment-1658640222 for a good explanation for the need.

To prevent from breaking current scoring assumptions in Lucene, a new `MAXIMUM_INNER_PRODUCT` similarity function is added. 

Because the similarity from a `dotProduct` function call could be negative, this similarity scorer will scale negative dotProducts to between 0-1 and then all positive dotProduct values are from 1-MAX.

One concern with adding this similarity function is that it breaks the triangle inequality. It is assumed that this is needed to build graph structures. But, there is conflicting research here when it comes to real-world data.

See:
 - For: https://github.com/apache/lucene/issues/12342#issuecomment-1618258984
 - Against: https://github.com/apache/lucene/issues/12342#issuecomment-1631577657, https://github.com/apache/lucene/issues/12342#issuecomment-1631808301

To check if any transformation of the input is required to satisfy the triangle inequality, many tests have been ran

See:

 - https://github.com/apache/lucene/issues/12342#issuecomment-1653420640
 - https://github.com/apache/lucene/issues/12342#issuecomment-1656112434
 - https://github.com/apache/lucene/issues/12342#issuecomment-1656718447

If there are any additional tests, or issues with the provided tests & scripts, please let me know. We want to make sure this works well for our users.

closes: https://github.com/apache/lucene/issues/12342
2023-08-16 12:15:25 -04:00
.github Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
buildSrc Implement MMapDirectory with Java 21 Project Panama Preview API (#12294) 2023-06-12 21:07:04 +02:00
dev-docs a bit of clarification about GitHub Milestone 2022-08-28 13:52:58 +09:00
dev-tools DOAP changes for release 9.7.0 2023-06-26 11:05:46 +02:00
gradle Enable search for site javadocs (#12430) 2023-07-24 10:38:19 -04:00
help Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
lucene GITHUB#12342 Add new maximum inner product vector similarity method (#12479) 2023-08-16 12:15:25 -04:00
.asf.yaml .asf.yaml 2022-08-16 20:02:47 +09:00
.dir-locals.el LUCENE-9322: Add Lucene90 codec, including VectorFormat 2020-10-18 07:49:36 -04:00
.git-blame-ignore-revs LUCENE-9570: code reformatting [record rev]. 2021-01-05 13:44:42 +01:00
.gitattributes LUCENE-10305: Ensure line endings of versions.props is LF 2021-12-11 10:10:44 +09:00
.gitignore LUCENE-9920: Remove binary gradle-wrapper.jar from the repository 2021-04-10 16:08:39 +02:00
.hgignore LUCENE-2792: add FST impl 2010-12-12 15:36:08 +00:00
.lift.toml Disable liftbot, we have our own tools 2022-05-05 22:27:57 +02:00
CONTRIBUTING.md Fix type in CONTRIBUTING.md (#11879) 2022-11-01 20:10:05 +00:00
LICENSE.txt LUCENE-10163 Move LICENSE and NOTICE file to top level (#388) 2021-10-18 01:24:11 +02:00
NOTICE.txt Cleanup NOTICE.txt (#12227) 2023-04-18 15:58:09 -04:00
README.md Allow building with java 18 now that gradle supports it (#11889) 2022-10-28 23:41:09 -04:00
build.gradle Integrate the Incubating Panama Vector API (#12311) 2023-05-25 07:59:50 +01:00
gradlew Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
gradlew.bat Generate gradle.properties from gradlew (#12131) 2023-02-06 19:47:15 +01:00
settings.gradle GH-11172: remove WindowsDirectory and native subproject. (#11774) 2022-09-15 16:22:46 +02:00
versions.lock remove non-NRT replication support (#12038) 2023-01-14 11:14:46 -05:00
versions.props Upgrade to errorprone 2.18 (#12086) 2023-01-14 14:39:23 -05:00

README.md

Apache Lucene

Lucene Logo

Apache Lucene is a high-performance, full-featured text search engine library written in Java.

Build Status

Online Documentation

This README file only contains basic setup instructions. For more comprehensive documentation, visit:

Building

Basic steps:

  1. Install OpenJDK 17 or 18.
  2. Clone Lucene's git repository (or download the source distribution).
  3. Run gradle launcher script (gradlew).

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

See Contributing Guide for details.

Contributing

Bug fixes, improvements and new features are always welcome! Please review the Contributing to Lucene Guide for information on contributing.

Discussion and Support