Fix Lucene94HnswVectorsFormat validation on large segments (#11861)

When reading large segments, the vectors format can fail with a validation
error:

java.lang.IllegalStateException: Vector data length 3070061568 not matching
size=999369 * dim=768 * byteSize=4 = -1224905728

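The problem is that the expected size is computed with int arithmetic. On a
large segment the product size * dim * byteSize exceeds Integer.MAX_VALUE and
wraps around to a negative number, so it never matches the stored length. The
bug was introduced during the work to enable int8 values, which switched a long
value to an int. This commit computes the product as a long using
Math.multiplyExact.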
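
The wraparound can be reproduced in isolation. The following is a minimal
sketch, not the reader code itself: the class and method names
(VectorSizeOverflow, sizeAsInt, sizeAsLong) are hypothetical, and the inputs are
the values from the error message above.

```java
public class VectorSizeOverflow {
  // Mirrors the old computation: every operand is an int, so the product
  // silently wraps once it exceeds Integer.MAX_VALUE.
  static int sizeAsInt(int size, int dimension, int byteSize) {
    return size * dimension * byteSize;
  }

  // Mirrors the fixed computation: widen to long before multiplying, and let
  // Math.multiplyExact throw ArithmeticException if even a long would overflow.
  static long sizeAsLong(int size, int dimension, int byteSize) {
    long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
    return Math.multiplyExact(vectorBytes, size);
  }

  public static void main(String[] args) {
    int size = 999369, dimension = 768, byteSize = 4;
    System.out.println(sizeAsInt(size, dimension, byteSize));  // -1224905728
    System.out.println(sizeAsLong(size, dimension, byteSize)); // 3070061568
  }
}
```

Casting only the first operand to long is enough, because the remaining int
operands are widened before each multiplication.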
Julie Tibshirani 2022-10-19 13:49:59 -07:00 committed by GitHub
parent 6cde41c9fd
commit 0f525bfb14
2 changed files with 10 additions and 1 deletions


@@ -157,6 +157,14 @@ Other
 * LUCENE-10635: Ensure test coverage for WANDScorer by using a test query. (Zach Chen, Adrien Grand)
+======================== Lucene 9.4.1 =======================
+Bug Fixes
+---------------------
+* GITHUB#11858: Fix kNN vectors format validation on large segments. This
+  addresses a regression in 9.4.0 where validation could fail, preventing
+  further writes or searches on the index. (Julie Tibshirani)
 ======================== Lucene 9.4.0 =======================
 API Changes


@@ -175,7 +175,8 @@ public final class Lucene94HnswVectorsReader extends KnnVectorsReader {
           case BYTE -> Byte.BYTES;
           case FLOAT32 -> Float.BYTES;
         };
-    int numBytes = fieldEntry.size * dimension * byteSize;
+    long vectorBytes = Math.multiplyExact((long) dimension, byteSize);
+    long numBytes = Math.multiplyExact(vectorBytes, fieldEntry.size);
     if (numBytes != fieldEntry.vectorDataLength) {
       throw new IllegalStateException(
           "Vector data length "