Reduce contention on flushControl.isFullFlush(). (#12958)

`flushControl.isFullFlush()` is a surprising source of contention when
many threads index documents that are cheap to index. When I slightly
modify luceneutil's `IndexGeoNames` benchmark to configure a 4GB indexing
buffer and disable `TextField` fields, which are more costly to index than
`KeywordField` or `IntField` fields, this change brings the time to load
the whole dataset into the `IndexWriter` buffers down from 8.0s to 7.0s.
Adrien Grand 2024-01-08 13:23:05 +01:00 committed by GitHub
parent 115a30d462
commit 40060f8b70
2 changed files with 13 additions and 1 deletions

@@ -167,7 +167,11 @@ final class DocumentsWriter implements Closeable, Accountable {
   private boolean applyAllDeletes() throws IOException {
     final DocumentsWriterDeleteQueue deleteQueue = this.deleteQueue;
-    if (flushControl.isFullFlush() == false
+    // Check the applyAllDeletes flag first. This helps exit early most of the time without checking
+    // isFullFlush(), which takes a lock and introduces contention on small documents that are quick
+    // to index.
+    if (flushControl.getApplyAllDeletes()
+        && flushControl.isFullFlush() == false
         // never apply deletes during full flush this breaks happens before relationship.
         && deleteQueue.isOpen()
         // if it's closed then it's already fully applied and we have a new delete queue

@@ -509,6 +509,14 @@ final class DocumentsWriterFlushControl implements Accountable, Closeable {
     return flushDeletes.getAndSet(false);
   }
+  /**
+   * Check whether deletes need to be applied. This can be used as a pre-flight check before calling
+   * {@link #getAndResetApplyAllDeletes()} to make sure that a single thread applies deletes.
+   */
+  public boolean getApplyAllDeletes() {
+    return flushDeletes.get();
+  }
   public void setApplyAllDeletes() {
     flushDeletes.set(true);
   }
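
The pattern in this change can be illustrated outside Lucene with a minimal sketch. `FlushControlSketch`, its `lockAcquisitions` counter, and `PreflightDemo` are hypothetical names made up for this illustration; only the method names `getApplyAllDeletes`, `getAndResetApplyAllDeletes`, `setApplyAllDeletes` and `isFullFlush` mirror the diff, and the `synchronized` modifier here merely stands in for "takes a lock". The sketch reads a lock-free `AtomicBoolean` first, so the common case (no deletes pending) never reaches the lock-taking call.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a flush control; this is NOT Lucene's actual class. */
final class FlushControlSketch {
  private final AtomicBoolean flushDeletes = new AtomicBoolean(false);
  // Illustration only: counts how often the lock-taking method is entered.
  private final AtomicInteger lockAcquisitions = new AtomicInteger();
  private boolean fullFlush = false; // guarded by "this"

  /** Takes the object's monitor, standing in for a lock-taking isFullFlush(). */
  synchronized boolean isFullFlush() {
    lockAcquisitions.incrementAndGet();
    return fullFlush;
  }

  /** Lock-free pre-flight check: a plain volatile read of the flag. */
  boolean getApplyAllDeletes() {
    return flushDeletes.get();
  }

  /** Atomically clears the flag; returns true for exactly one caller per set. */
  boolean getAndResetApplyAllDeletes() {
    return flushDeletes.getAndSet(false);
  }

  void setApplyAllDeletes() {
    flushDeletes.set(true);
  }

  int lockAcquisitions() {
    return lockAcquisitions.get();
  }

  /** Cheap flag check first, lock-taking check second, atomic reset last. */
  boolean applyAllDeletes() {
    return getApplyAllDeletes()
        && isFullFlush() == false
        && getAndResetApplyAllDeletes();
  }
}

final class PreflightDemo {
  public static void main(String[] args) {
    FlushControlSketch control = new FlushControlSketch();

    // Common case: the flag is unset, so the lock is never taken.
    for (int i = 0; i < 1000; i++) {
      control.applyAllDeletes();
    }
    System.out.println("lock acquisitions, flag unset: " + control.lockAcquisitions());

    // Rare case: the flag is set, so the lock is taken and one caller applies deletes.
    control.setApplyAllDeletes();
    System.out.println("first call applies: " + control.applyAllDeletes());
    System.out.println("second call applies: " + control.applyAllDeletes());
    System.out.println("lock acquisitions, total: " + control.lockAcquisitions());
  }
}
```

The final `getAndResetApplyAllDeletes()` call is still what guarantees that a single thread applies deletes: the pre-flight read can be observed as true by several threads at once, but the atomic `getAndSet(false)` returns true to only one of them.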