From e19238a7bde174e015df35a19a3ed200d335a79a Mon Sep 17 00:00:00 2001
From: Adrien Grand
Date: Thu, 11 Apr 2024 21:21:28 +0200
Subject: [PATCH] Increase the default number of merge threads. (#13294)

You need as many merge threads as necessary to make sure that merges can
keep up with indexing. But this number depends on the data that you are
indexing: if you are only indexing stored fields, merges can copy
compressed data directly and merges are only a small fraction of the
total indexing+flushing+merging cost. But if you primarily index knn
vectors, merging N docs may require about as much work as flushing N
docs. If you add the fact that documents typically go through multiple
rounds of merging, the merging cost can end up being more than half of
the total indexing+flushing+merging cost.

This change proposes to update the default number of merge threads
assuming an intermediate scenario where merges perform about half of the
total indexing+flushing+merging work, i.e. it gives half the threads of
the system to merges.

One goal of this change is to no longer have to configure a custom
number of merge threads on nightly benchmarks, which run on a highly
concurrent machine.
---
 lucene/CHANGES.txt                                    | 3 +++
 .../apache/lucene/index/ConcurrentMergeScheduler.java | 9 ++++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/lucene/CHANGES.txt b/lucene/CHANGES.txt
index 74bfa5f4917..1b68f73b8f6 100644
--- a/lucene/CHANGES.txt
+++ b/lucene/CHANGES.txt
@@ -180,6 +180,9 @@ Changes in Runtime Behavior
 * GITHUB#13293: Auto I/O throttling is now disabled by default on ConcurrentMergeScheduler.
   (Adrien Grand)
 
+* GITHUB#13293: ConcurrentMergeScheduler now allows up to 50% of the threads of the host to be used
+  for merging. (Adrien Grand)
+
 Changes in Backwards Compatibility Policy
 -----------------------------------------
 
diff --git a/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java b/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java
index f27308f2aca..d0e79375ea6 100644
--- a/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java
+++ b/lucene/core/src/java/org/apache/lucene/index/ConcurrentMergeScheduler.java
@@ -181,7 +181,14 @@ public class ConcurrentMergeScheduler extends MergeScheduler {
           Throwable ignored) {
       }
 
-      maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
+      // If you are indexing at full throttle, how many merge threads do you need to keep up? It
+      // depends: for most data structures, merging is cheaper than indexing/flushing, but for knn
+      // vectors, merges can require about as much work as the initial indexing/flushing. Plus
+      // documents are indexed/flushed only once, but may be merged multiple times.
+      // Here, we assume an intermediate scenario where merging requires about as much work as
+      // indexing/flushing overall, so we give half the core count to merges.
+
+      maxThreadCount = Math.max(1, coreCount / 2);
       maxMergeCount = maxThreadCount + 5;
     }
   }
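
Reviewer note: to make the effect of the new default concrete, here is a minimal standalone sketch that evaluates the old and new formulas from the hunk above for a few core counts. The class name `MergeThreadDefaults`, its method names, and the sample core counts are hypothetical and are not part of this patch; only the two `Math.max(...)` expressions and the `+ 5` for `maxMergeCount` are taken from the diff.

```java
// Hypothetical helper, not part of Lucene: compares the two default formulas.
public class MergeThreadDefaults {

  // Default before this change: half the cores, at least 1, capped at 4.
  static int oldDefault(int coreCount) {
    return Math.max(1, Math.min(4, coreCount / 2));
  }

  // Default after this change: half the cores, at least 1, no upper cap.
  static int newDefault(int coreCount) {
    return Math.max(1, coreCount / 2);
  }

  public static void main(String[] args) {
    int[] coreCounts = {1, 2, 8, 16, 64}; // assumed machine sizes, for illustration only
    for (int cores : coreCounts) {
      int threads = newDefault(cores);
      // maxMergeCount is derived from maxThreadCount exactly as in the patched code.
      System.out.printf(
          "cores=%d old=%d new=%d maxMergeCount=%d%n",
          cores, oldDefault(cores), threads, threads + 5);
    }
  }
}
```

The two formulas agree up to 8 cores and diverge above that: on a 64-core host the default goes from 4 to 32 merge threads. Applications that want a different budget can still set one explicitly with `ConcurrentMergeScheduler#setMaxMergesAndThreads(maxMergeCount, maxThreadCount)`.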