Speed up sorting on unique string fields. (#11903)

Since the number of hits retrieved in nightly benchmarks was increased from 10 to 100, the performance of sorting documents by title has dropped back to the level it had before dynamic pruning was introduced. This is not too surprising: the `title` field is a unique field, so the optimization only kicks in once the current 100th hit has an ordinal less than 128, which only happens after most hits have been collected. This change increases the threshold to 1024, so the optimization kicks in once the current 100th hit has an ordinal less than 1024, which happens a bit sooner.
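The effect of the threshold can be illustrated with a small simulation. This is a hypothetical sketch, not Lucene's `TermOrdValComparator` code. It assumes an ascending sort on a unique field (every document has a distinct ordinal) and documents visited in random ordinal order. It reports how many documents are collected before the bottom (k-th best) hit's ordinal drops below a `MAX_TERMS`-style threshold. The names `docsUntilPruning` and `shuffledOrds` are illustrative only.

```java
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical simulation, not Lucene code. Assumptions: ascending sort on a unique
// field (every doc has a distinct ordinal) and docs visited in random ordinal order.
class CompetitiveThresholdDemo {

  /**
   * Returns how many docs are collected before the bottom (k-th best) hit's ordinal is
   * below maxTerms, i.e. before fewer than maxTerms ordinals can still be competitive.
   * Returns -1 if that never happens. Assumes k >= 1.
   */
  static int docsUntilPruning(int[] ordsInDocOrder, int k, int maxTerms) {
    // Max-heap of the k smallest ordinals seen so far; its head is the bottom hit.
    PriorityQueue<Integer> topK = new PriorityQueue<>(k, (a, b) -> Integer.compare(b, a));
    for (int i = 0; i < ordsInDocOrder.length; i++) {
      int ord = ordsInDocOrder[i];
      if (topK.size() < k) {
        topK.add(ord);
      } else if (ord < topK.peek()) {
        // The new doc beats the bottom hit: evict the bottom and keep the new ordinal.
        topK.poll();
        topK.add(ord);
      }
      // With a full queue and unique ordinals, only ordinals 0..bottom-1 can still
      // compete, so the number of competitive terms equals the bottom ordinal.
      if (topK.size() == k && topK.peek() < maxTerms) {
        return i + 1;
      }
    }
    return -1;
  }

  /** Random permutation of 0..n-1 (Fisher-Yates shuffle): one distinct ordinal per doc. */
  static int[] shuffledOrds(int n, long seed) {
    int[] ords = new int[n];
    for (int i = 0; i < n; i++) {
      ords[i] = i;
    }
    Random random = new Random(seed);
    for (int i = n - 1; i > 0; i--) {
      int j = random.nextInt(i + 1);
      int tmp = ords[i];
      ords[i] = ords[j];
      ords[j] = tmp;
    }
    return ords;
  }

  public static void main(String[] args) {
    int[] ords = shuffledOrds(1_000_000, 42L);
    int k = 100; // number of hits retrieved, as in the nightly benchmark
    System.out.println("MAX_TERMS=128:  " + docsUntilPruning(ords, k, 128));
    System.out.println("MAX_TERMS=1024: " + docsUntilPruning(ords, k, 1024));
  }
}
```

Under these assumptions, all 100 top hits must come from the `MAX_TERMS` lowest ordinals, so in expectation roughly 100/128 ≈ 78% of documents must be collected with a threshold of 128, versus roughly 100/1024 ≈ 10% with 1024. Real indices are not random permutations, so actual gains depend on how documents are ordered in the index.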
commit 5b87a31556
parent 4b3f7662ce
2023-11-02 14:16:11 +01:00
2 changed files with 3 additions and 1 deletion
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -253,6 +253,8 @@ Optimizations
 * GITHUB#12719: Top-level conjunctions that are not sorted by score now have a
  specialized bulk scorer. (Adrien Grand)

+* GITHUB#11903: Faster sort on high-cardinality string fields. (Adrien Grand)
+
 Changes in runtime behavior
 ---------------------

--- a/lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
+++ b/lucene/core/src/java/org/apache/lucene/search/comparators/TermOrdValComparator.java
@@ -475,7 +475,7 @@ public class TermOrdValComparator extends FieldComparator<BytesRef> {

  private class CompetitiveIterator extends DocIdSetIterator {

-    private static final int MAX_TERMS = 128;
+    private static final int MAX_TERMS = 1024;

    private final LeafReaderContext context;
    private final int maxDoc;