mirror of https://github.com/apache/lucene.git
Increase the default number of merge threads. (#13294)
You need as many merge threads as necessary to make sure that merges can keep up with indexing. But this number depends on the data that you are indexing: if you are only indexing stored fields, merges can copy compressed data directly, so merging is only a small fraction of the total indexing+flushing+merging cost. But if you primarily index knn vectors, merging N docs may require about as much work as flushing N docs. Since documents typically go through multiple rounds of merging, the merging cost can end up being more than half of the total indexing+flushing+merging cost.

This change proposes to update the default number of merge threads assuming an intermediate scenario where merges perform about half of the total indexing+flushing+merging work, i.e. it gives half the threads of the system to merges.

One goal of this change is to no longer have to configure a custom number of merge threads on nightly benchmarks, which run on a highly concurrent machine.
parent fbea47b4f4
commit e19238a7bd
@@ -180,6 +180,9 @@ Changes in Runtime Behavior
 * GITHUB#13293: Auto I/O throttling is now disabled by default on ConcurrentMergeScheduler.
   (Adrien Grand)
 
+* GITHUB#13293: ConcurrentMergeScheduler now allows up to 50% of the threads of the host to be used
+  for merging. (Adrien Grand)
+
 Changes in Backwards Compatibility Policy
 -----------------------------------------
 
@@ -181,7 +181,14 @@ public class ConcurrentMergeScheduler extends MergeScheduler {
           Throwable ignored) {
       }
 
-      maxThreadCount = Math.max(1, Math.min(4, coreCount / 2));
+      // If you are indexing at full throttle, how many merge threads do you need to keep up? It
+      // depends: for most data structures, merging is cheaper than indexing/flushing, but for knn
+      // vectors, merges can require about as much work as the initial indexing/flushing. Plus
+      // documents are indexed/flushed only once, but may be merged multiple times.
+      // Here, we assume an intermediate scenario where merging requires about as much work as
+      // indexing/flushing overall, so we give half the core count to merges.
+
+      maxThreadCount = Math.max(1, coreCount / 2);
       maxMergeCount = maxThreadCount + 5;
     }
   }
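To make the effect of the one-line change concrete, here is a minimal standalone sketch that evaluates the old and new formulas from the diff for a few core counts. The class name `MergeThreadDefaults`, its method names, and the sample core counts are hypothetical and exist only for illustration; the sketch does not call Lucene and makes no claim about how `coreCount` is obtained in `ConcurrentMergeScheduler`.

```java
public class MergeThreadDefaults {
    // Formula from the removed line: half the cores, at least 1, capped at 4.
    static int oldMaxThreadCount(int coreCount) {
        return Math.max(1, Math.min(4, coreCount / 2));
    }

    // Formula from the added line: half the cores, at least 1, no upper cap.
    static int newMaxThreadCount(int coreCount) {
        return Math.max(1, coreCount / 2);
    }

    // Unchanged context line in the diff: maxMergeCount = maxThreadCount + 5.
    static int maxMergeCount(int maxThreadCount) {
        return maxThreadCount + 5;
    }

    public static void main(String[] args) {
        // Hypothetical core counts, chosen only to show where the two formulas diverge.
        int[] coreCounts = {1, 2, 8, 16, 64};
        for (int cores : coreCounts) {
            int oldThreads = oldMaxThreadCount(cores);
            int newThreads = newMaxThreadCount(cores);
            System.out.printf(
                "cores=%d old: threads=%d merges=%d | new: threads=%d merges=%d%n",
                cores, oldThreads, maxMergeCount(oldThreads),
                newThreads, maxMergeCount(newThreads));
        }
    }
}
```

Under these formulas the two defaults agree up to 8 cores and diverge above that, which is consistent with the commit message's note about nightly benchmarks running on a highly concurrent machine.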