Additional docs/log for direct memory usage (#4631)

* Additional docs/log for direct memory usage

* Tweak docs

* Doc rewording
This commit is contained in:
Jonathan Wei 2017-08-10 23:33:20 -07:00 committed by Gian Merlino
parent eb0deba54a
commit 1bddfc089c
2 changed files with 18 additions and 1 deletion


@@ -20,6 +20,21 @@ We recommend 250mb * (processing.numThreads) for the heap.
[Coordinator nodes](../design/coordinator.html) do not require off-heap memory and the heap is used for loading information about all segments to determine what segments need to be loaded, dropped, moved, or replicated.
## How much direct memory does Druid use?
Any Druid node that processes queries (brokers, ingestion workers, and historical nodes) uses two kinds of direct memory buffers with configurable size: processing buffers and merge buffers.
Each processing thread is allocated one processing buffer. Additionally, there is a shared pool of merge buffers (only used for GroupBy V2 queries currently).
Other sources of direct memory usage include:
- When a column is loaded for reading, a 64KB direct buffer is allocated for decompression.
- When a set of segments are merged during ingestion, a direct buffer is allocated for every String-typed column, for every segment in the set to be merged. The size of each buffer is equal to the cardinality of the String column within its segment, times 4 bytes (the buffers store integers). For example, if two segments are being merged, the first segment having a single String column with cardinality 1000, and the second segment having a String column with cardinality 500, the merge step would allocate (1000 + 500) * 4 = 6000 bytes of direct memory. These buffers are used for merging the value dictionaries of the String column across segments. These "dictionary merging buffers" are independent of the "merge buffers" configured by `druid.processing.numMergeBuffers`.
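The dictionary-merging buffer arithmetic above can be sketched as follows. This is an illustrative helper, not Druid's actual API; only the sizing rule (cardinality times 4 bytes, summed over the segments being merged) comes from the text above.

```java
// Sketch of the per-column dictionary-merging buffer sizing described above.
// The class and method names are illustrative, not part of Druid.
public class DictMergeBufferEstimate {
    static final int BYTES_PER_ENTRY = 4; // the buffers store 4-byte integers

    // Sum of (cardinality * 4 bytes) over all segments being merged,
    // for a single String column.
    static long estimate(int[] cardinalityPerSegment) {
        long total = 0;
        for (int cardinality : cardinalityPerSegment) {
            total += (long) cardinality * BYTES_PER_ENTRY;
        }
        return total;
    }

    public static void main(String[] args) {
        // The example from the text: cardinalities 1000 and 500.
        System.out.println(estimate(new int[]{1000, 500})); // 6000
    }
}
```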
A useful formula for estimating direct memory usage follows:
`druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`
The `+1` is a fuzzy parameter meant to account for the decompression and dictionary merging buffers and may need to be adjusted based on the characteristics of the data being ingested/queried.
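As a worked example of the formula, the sketch below plugs in sample configuration values. The numbers (512MB buffers, 2 merge buffers, 7 processing threads) are illustrative, not Druid defaults.

```java
// Sketch applying the direct memory estimation formula above.
// druid.processing.buffer.sizeBytes * (numMergeBuffers + numThreads + 1)
public class DirectMemoryEstimate {
    static long estimate(long bufferSizeBytes, int numMergeBuffers, int numThreads) {
        // The +1 accounts for decompression and dictionary merging buffers.
        return bufferSizeBytes * (numMergeBuffers + numThreads + 1);
    }

    public static void main(String[] args) {
        // Example: 512MB processing buffers, 2 merge buffers, 7 threads.
        long bytes = estimate(536870912L, 2, 7);
        System.out.println(bytes); // 5368709120 (5GB)
    }
}
```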
## What is the intermediate computation buffer?
The intermediate computation buffer specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data while smaller values can require more passes depending on the query that is being executed. The default size is 1073741824 bytes (1GB).
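As a minimal sketch, the buffer size is set via `druid.processing.buffer.sizeBytes` in the node's runtime properties; the value below (512MB) is illustrative, not a recommendation.

```properties
# Intermediate computation (processing) buffer size, in bytes.
# Default is 1073741824 (1GB); this example halves it to 536870912 (512MB).
druid.processing.buffer.sizeBytes=536870912
```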


@@ -445,7 +445,9 @@ public interface IndexMerger
}
Indexed<String> indexed = dimValueLookups[i];
if (useDirect) {
conversions[i] = ByteBuffer.allocateDirect(indexed.size() * Ints.BYTES).asIntBuffer();
int allocationSize = indexed.size() * Ints.BYTES;
log.info("Allocating dictionary merging direct buffer with size[%d]", allocationSize);
conversions[i] = ByteBuffer.allocateDirect(allocationSize).asIntBuffer();
} else {
conversions[i] = IntBuffer.allocate(indexed.size());
}