mirror of https://github.com/apache/lucene.git
5a5aa2c8fa
The current dot-product score scaling and similarity implementation assumes normalized vectors. This disregards information that the model may store in the vector magnitude. See https://github.com/apache/lucene/issues/12342#issuecomment-1658640222 for a good explanation of why this is needed.

To avoid breaking current scoring assumptions in Lucene, a new `MAXIMUM_INNER_PRODUCT` similarity function is added. Because the result of a `dotProduct` function call can be negative, this similarity scorer scales negative dot products into the range 0-1, and all positive dot product values fall in the range 1-MAX.

One concern with adding this similarity function is that it breaks the triangle inequality, which is assumed to be needed to build graph structures. However, there is conflicting research on this when it comes to real-world data. See:

- For: https://github.com/apache/lucene/issues/12342#issuecomment-1618258984
- Against: https://github.com/apache/lucene/issues/12342#issuecomment-1631577657, https://github.com/apache/lucene/issues/12342#issuecomment-1631808301

To check whether any transformation of the input is required to satisfy the triangle inequality, many tests have been run. See:

- https://github.com/apache/lucene/issues/12342#issuecomment-1653420640
- https://github.com/apache/lucene/issues/12342#issuecomment-1656112434
- https://github.com/apache/lucene/issues/12342#issuecomment-1656718447

If there are any additional tests, or issues with the provided tests and scripts, please let me know. We want to make sure this works well for our users.

closes: https://github.com/apache/lucene/issues/12342
| Name |
|---|
| src |
| README.md |
| build.gradle |
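The score scaling described in the commit message above (negative dot products squeezed into 0-1, positive ones shifted to 1-MAX) can be illustrated with a minimal sketch. The class name `MaxInnerProductScaling` and the methods `dotProduct` and `scale` are hypothetical names chosen for this example, and the formula is an assumption that matches the description. It is not presented as Lucene's actual code.

```java
// Hypothetical illustration only: names and the exact formula are assumptions
// that match the commit message's description, not Lucene's API.
final class MaxInnerProductScaling {
  private MaxInnerProductScaling() {}

  /** Plain dot product of two equal-length vectors, with no normalization. */
  public static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Maps a raw dot product to a positive score while preserving order:
   * negative inputs land in (0, 1), non-negative inputs land in [1, MAX).
   */
  public static float scale(float dot) {
    if (dot < 0) {
      // The more negative the dot product, the closer the score gets to 0.
      return 1f / (1f - dot);
    }
    // Non-negative dot products are shifted up by 1 so they sort above all negatives.
    return dot + 1f;
  }

  public static void main(String[] args) {
    float[] query = {1f, 2f};
    float[] similar = {3f, 4f};
    float[] opposed = {-3f, -4f};
    System.out.println(scale(dotProduct(query, similar)));
    System.out.println(scale(dotProduct(query, opposed)));
  }
}
```

Because the mapping is monotonic, ranking documents by the scaled score gives the same order as ranking by the raw dot product, while every score stays positive.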
README.md
Luke
Integrated desktop GUI tool: a utility for browsing, searching and maintaining indexes and documents.
Older releases
Older releases of Luke (prior to 8.1) can be found at https://github.com/DmitryKey/luke