Deduplicate bytes for `FieldReader#rootCode` (#13610)

Looking at how these instances are serialized to disk, the empty output
in the FST metadata appears to always be identical to the rootCode bytes.
Without changing the serialization, we can at least deduplicate the two
byte sequences here by keeping a single shared instance, which saves
hundreds of MB in some high-segment-count use cases I observed in ES.
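
The idea can be sketched outside of Lucene as follows. This is a minimal illustration, not the code from this commit: the class `RootCodeDedupSketch` and its `dedup` method are hypothetical names, and plain `byte[]` arrays stand in for Lucene's `BytesRef`. Unlike the commit, which asserts that the mismatch branch is never taken, the sketch simply falls back to the original array.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the deduplication idea; this is not Lucene's FieldReader. */
final class RootCodeDedupSketch {

  /**
   * Picks the array to retain as the root code. When both arrays hold the same
   * bytes, the already-retained emptyOutput array is reused, so the separate
   * rootCode copy becomes unreachable and can be garbage collected.
   */
  static byte[] dedup(byte[] rootCode, byte[] emptyOutput) {
    if (emptyOutput != null && Arrays.equals(rootCode, emptyOutput)) {
      return emptyOutput; // share one instance instead of keeping two equal copies
    }
    return rootCode; // contents differ (or nothing to share): keep the original
  }

  public static void main(String[] args) {
    byte[] emptyOutput = {0x12, 0x34, 0x56};
    byte[] rootCode = {0x12, 0x34, 0x56}; // equal contents, distinct object
    byte[] retained = dedup(rootCode, emptyOutput);
    System.out.println(retained == emptyOutput); // prints "true"
  }
}
```

The saving per field is only the size of one small byte array, which is why it adds up mainly when many segments and fields are open at once.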
Armin Braun 2024-07-31 20:31:13 +02:00 committed by GitHub
parent 255a2fcf9c
commit ca098e63b9
1 changed file with 8 additions and 1 deletion

@@ -78,7 +78,6 @@ public final class FieldReader extends Terms {
     this.sumTotalTermFreq = sumTotalTermFreq;
     this.sumDocFreq = sumDocFreq;
     this.docCount = docCount;
-    this.rootCode = rootCode;
     this.minTerm = minTerm;
     this.maxTerm = maxTerm;
     // if (DEBUG) {
@@ -100,6 +99,14 @@ public final class FieldReader extends Terms {
     w.close();
     }
     */
+    BytesRef emptyOutput = metadata.getEmptyOutput();
+    if (rootCode.equals(emptyOutput) == false) {
+      // TODO: this branch is never taken
+      assert false;
+      this.rootCode = rootCode;
+    } else {
+      this.rootCode = emptyOutput;
+    }
   }
   long readVLongOutput(DataInput in) throws IOException {