lucene/lucene
Kevin Rosendahl ddb01cacd4
Normalize written scalar quantized vectors when using cosine similarity (#12780)
### Description

When using cosine similarity, the `ScalarQuantizer` normalizes vectors when calculating quantiles, and `ScalarQuantizedRandomVectorScorer` normalizes query vectors before scoring them. However, `Lucene99ScalarQuantizedVectorsWriter` does not normalize vectors before quantizing them to produce the quantized vectors written to disk. This PR normalizes vectors before quantizing them when writing them to disk.
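
The effect of the missing normalization can be sketched in isolation. The snippet below is not Lucene's code: the class `NormalizeThenQuantize`, its helper methods, the quantile bounds `[-1, 1]`, and the 7-bit output range are all assumptions chosen for illustration. It shows how a vector that is quantized without normalization, against bounds that suit unit-length vectors, can saturate at the upper bound and lose its direction.

```java
import java.util.Arrays;

// Illustrative sketch only; names and ranges are hypothetical, not Lucene APIs.
class NormalizeThenQuantize {

  // Returns an L2-normalized copy of v; a zero vector is returned unchanged.
  public static float[] l2Normalize(float[] v) {
    double sumSq = 0;
    for (float x : v) {
      sumSq += (double) x * x;
    }
    if (sumSq == 0) {
      return v.clone();
    }
    float inv = (float) (1.0 / Math.sqrt(sumSq));
    float[] out = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      out[i] = v[i] * inv;
    }
    return out;
  }

  // Clamps each value to [min, max] and maps it linearly onto 0..127.
  // The 7-bit range is an assumption made for this sketch.
  public static byte[] quantize(float[] v, float min, float max) {
    float scale = 127f / (max - min);
    byte[] out = new byte[v.length];
    for (int i = 0; i < v.length; i++) {
      float clamped = Math.max(min, Math.min(max, v[i]));
      out[i] = (byte) Math.round((clamped - min) * scale);
    }
    return out;
  }

  public static void main(String[] args) {
    float[] raw = {3f, 4f};
    // Assumed bounds, as if they had been computed from normalized vectors.
    float min = -1f;
    float max = 1f;

    // Without normalization both components clamp to max.
    System.out.println(Arrays.toString(quantize(raw, min, max))); // [127, 127]

    // With normalization the components stay distinguishable.
    System.out.println(Arrays.toString(quantize(l2Normalize(raw), min, max))); // [102, 114]
  }
}
```

In this toy case the unnormalized vector collapses to `[127, 127]`, while the normalized one keeps its relative component sizes. Saturation of this kind would be consistent with the near-zero pre-patch recall reported below, although the sketch does not reproduce that measurement.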

Recall results on my M1 with the `glove-100-angular` data set (all using `maxConn`: 16, `beamWidth`: 100, `numCandidates`: 100, `k`: 10, single segment):

| Configuration | Recall | Average Query Duration |
|---------------|--------|------------------------|
| Pre-patch no quantization | 0.78762 | 0.68 ms |
| Pre-patch with quantization | 8.999999999999717E-5 | 0.45 ms |
| Post-patch no quantization | 0.78762 | 0.70 ms |
| Post-patch with quantization | 0.66742 | 0.66 ms |
2023-11-08 14:26:48 -05:00
analysis Remove usage of deprecated java.util.Locale constructor (#12761) 2023-11-06 08:06:45 +00:00
analysis.tests Introduced the Word2VecSynonymFilter (#12169) 2023-04-24 13:35:26 +02:00
backward-codecs Re-adding the backward_codecs.lucene90 TestPForUtil + TestForUtil (#12781) 2023-11-08 09:01:47 -05:00
benchmark Remove usage of deprecated java.util.Locale constructor (#12761) 2023-11-06 08:06:45 +00:00
benchmark-jmh stabilize vectorutil benchmark 2023-11-02 04:09:26 +00:00
classification Replace consecutive close() calls and close() calls with null checks with IOUtils.close() (#12428) 2023-08-16 17:12:34 -07:00
codecs Remove patching for doc blocks. (#12741) 2023-11-06 10:46:03 -05:00
core Normalize written scalar quantized vectors when using cosine similarity (#12780) 2023-11-08 14:26:48 -05:00
core.tests LUCENE-10328: Module path for compiling and running tests is wrong (#571) 2022-01-05 20:42:02 +01:00
demo GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
dev-docs
distribution LUCENE-10528: use Xvfb in test to avoid messing up user's desktop (#828) 2022-04-23 08:00:33 -04:00
distribution.tests Added JMH micro-benchmarks submodule (#12663) 2023-10-12 20:25:34 +02:00
documentation fix typo in documentation 2021-11-28 10:11:49 +09:00
expressions GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
facet Ensure LeafCollector#finish is only called once on the main collector during drill-sideways (#12642) 2023-10-13 07:24:40 -07:00
grouping GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
highlighter Scorer should sum up scores into a double (#12682) 2023-10-23 13:01:06 -04:00
join GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
licenses remove non-NRT replication support (#12038) 2023-01-14 11:14:46 -05:00
luke Remove usage of deprecated java.util.Locale constructor (#12761) 2023-11-06 08:06:45 +00:00
memory Record if block API has been used in SegmentInfo (#12685) 2023-10-23 09:46:12 +02:00
misc Speed up the sort when building forward index (#12712) 2023-10-25 13:36:52 +08:00
monitor Fix NullPointerException in Monitor.getQuery when query is not present (#12736) 2023-10-31 15:05:31 +00:00
queries Remove patching for doc blocks. (#12741) 2023-11-06 10:46:03 -05:00
queryparser Remove usage of deprecated java.util.Locale constructor (#12761) 2023-11-06 08:06:45 +00:00
replicator github-12386: set java.io.tmpdir in replicator tests' forked processes (#12387) 2023-06-23 08:38:06 -04:00
sandbox Bound the RAM used by the NodeHash (sharing common suffixes) during FST compilation (#12633) 2023-10-20 11:52:55 -04:00
spatial-extras GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
spatial-test-fixtures Upgrade google java format and apply tidy (#11811) 2022-09-24 15:40:27 +02:00
spatial3d GITHUB#12655: gradle tidy after google java format update for jdk 21 and regen 2023-10-11 16:12:09 -04:00
suggest Remove patching for doc blocks. (#12741) 2023-11-06 10:46:03 -05:00
test-framework Normalize written scalar quantized vectors when using cosine similarity (#12780) 2023-11-08 14:26:48 -05:00
CHANGES.txt TestIndexWriterOnVMError.testUnknownError times out (fixes potential IW deadlock on tragic exceptions). (#12751) 2023-11-08 09:47:19 +01:00
JRE_VERSION_MIGRATION.md LUCENE-10163: clean up and remove some old cruft in readme files. Move binary release only README.md to the distribution project so that it doesn't look weird in the source tree. (#406) 2021-10-26 21:20:42 +02:00
MIGRATE.md Remove deprecated IndexSearcher#getExecutor method (#12580) 2023-09-21 12:30:32 +02:00
SYSTEM_REQUIREMENTS.md LUCENE-10283: Bump minimum required Java version to 17. (#579) 2022-01-10 15:42:15 +01:00
build.gradle