lucene/lucene/luke
Benjamin Trent 5a5aa2c8fa
GITHUB#12342 Add new maximum inner product vector similarity method (#12479)
The current dot-product score scaling and similarity implementation assumes normalized vectors. This disregards information that the model may store within the magnitude. 

See: https://github.com/apache/lucene/issues/12342#issuecomment-1658640222 for a good explanation of the need.

To avoid breaking current scoring assumptions in Lucene, a new `MAXIMUM_INNER_PRODUCT` similarity function is added.

Because a `dotProduct` function call can return a negative value, this similarity scorer scales negative dot products into the range 0 to 1 and maps all positive dot products to the range from 1 to MAX.
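
The scaling described above can be sketched as follows. This is an illustration only: the class name `MaxInnerProductScoreSketch`, its method names, and the specific mapping (`1 / (1 - dot)` for negative values, `dot + 1` otherwise) are assumptions chosen to match the stated ranges, not a copy of the code in this commit.

```java
// Hypothetical sketch, not the Lucene implementation.
final class MaxInnerProductScoreSketch {

  // Plain dot product of two equal-length vectors.
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  // Assumed mapping: negative dot products land in (0, 1),
  // zero and positive dot products land in [1, MAX).
  // The mapping is monotonic, so ranking by raw dot product is preserved.
  static float scaleScore(float dot) {
    if (dot < 0) {
      return 1f / (1f - dot);
    }
    return dot + 1f;
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f};
    float[] near = {3f, 4f};
    float[] far = {-3f, -4f};
    System.out.println(scaleScore(dotProduct(query, near)));
    System.out.println(scaleScore(dotProduct(query, far)));
  }
}
```

Under this assumed mapping every score is strictly positive, and a vector with a larger raw dot product always receives a larger score.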

One concern with adding this similarity function is that it breaks the triangle inequality, which is assumed to be needed to build graph structures. However, the research conflicts on this point when it comes to real-world data.

See:
 - For: https://github.com/apache/lucene/issues/12342#issuecomment-1618258984
 - Against: https://github.com/apache/lucene/issues/12342#issuecomment-1631577657, https://github.com/apache/lucene/issues/12342#issuecomment-1631808301

To check whether any transformation of the input is required to satisfy the triangle inequality, many tests have been run.

See:

 - https://github.com/apache/lucene/issues/12342#issuecomment-1653420640
 - https://github.com/apache/lucene/issues/12342#issuecomment-1656112434
 - https://github.com/apache/lucene/issues/12342#issuecomment-1656718447

If there are any additional tests, or issues with the provided tests & scripts, please let me know. We want to make sure this works well for our users.

closes: https://github.com/apache/lucene/issues/12342
2023-08-16 12:15:25 -04:00
src GITHUB#12342 Add new maximum inner product vector similarity method (#12479) 2023-08-16 12:15:25 -04:00
README.md remove obsolete image/description from luke/README.md 2022-03-28 08:44:29 +09:00
build.gradle LUCENE-10328: Module path for compiling and running tests is wrong (#571) 2022-01-05 20:42:02 +01:00

README.md

Luke

Integrated desktop GUI tool: a utility for browsing, searching and maintaining indexes and documents.

Older releases

Older releases of Luke (prior to 8.1) can be found at https://github.com/DmitryKey/luke