Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255)

2019-08-08 01:04:07 -05:00 · e88bbe71c0
parent 03a37936ec
commit e88bbe71c0
2 changed files with 9 additions and 3 deletions
--- a/docs/content/development/indexer.md
+++ b/docs/content/development/indexer.md
@@ -62,11 +62,17 @@ In addition to these two pools, 2 separate threads are allocated for lookup hand

 The Indexer uses the `druid.worker.globalIngestionHeapLimitBytes` configuration to impose a global heap limit across all of the tasks it is running. 

-This global limit is evenly divided across the number of task slots configured by `druid.worker.capacity`. 
+This global limit is evenly divided across the number of task slots configured by `druid.worker.capacity`.

 To apply the per-task heap limit, the Indexer will override `maxBytesInMemory` in task tuning configs (i.e., ignoring the default value or any user configured value). `maxRowsInMemory` will also be overridden to an essentially unlimited value: the Indexer does not support row limits.

-By default, `druid.worker.globalIngestionHeapLimitBytes` is set to 60% of the available JVM heap. The remaining portion of the heap is reserved for query processing and segment persist/merge operations, and miscellaneous heap usage.
+By default, `druid.worker.globalIngestionHeapLimitBytes` is set to 1/6th of the available JVM heap. This default is chosen to align with the default value of `maxBytesInMemory` in task tuning configs when using the MiddleManager/Peon system, which is also 1/6th of the JVM heap.
+
+The peak usage for rows held in heap memory relates to the interaction between the `maxBytesInMemory` and `maxPendingPersists` properties in the task tuning configs. When the amount of row data held in-heap by a task reaches the limit specified by `maxBytesInMemory`, a task will persist the in-heap row data. After the persist has been started, the task can again ingest up to `maxBytesInMemory` bytes worth of row data while the persist is running.
+
+This means that the peak in-heap usage for row data can be up to approximately `maxBytesInMemory` * (2 + `maxPendingPersists`). The default value of `maxPendingPersists` is 0, which allows for 1 persist to run concurrently with ingestion work.
+
+The remaining portion of the heap is reserved for query processing and segment persist/merge operations, and miscellaneous heap usage.

 #### Concurrent Segment Persist/Merge Limits

--- a/server/src/main/java/org/apache/druid/indexing/worker/config/WorkerConfig.java
+++ b/server/src/main/java/org/apache/druid/indexing/worker/config/WorkerConfig.java
@@ -50,7 +50,7 @@ public class WorkerConfig
  private Period intermediaryPartitionTimeout = new Period("P1D");

  @JsonProperty
-  private final long globalIngestionHeapLimitBytes = (long) (Runtime.getRuntime().maxMemory() * 0.6);
+  private final long globalIngestionHeapLimitBytes = (long) (Runtime.getRuntime().maxMemory() / 6);

  @JsonProperty
  private final int numConcurrentMerges = (int) Math.max(1, capacity / 2);
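
Note on the arithmetic in this change: the sketch below works through the sizing rules described in the updated indexer.md, namely the 1/6th default, the even split across `druid.worker.capacity`, and the approximate peak of `maxBytesInMemory` * (2 + `maxPendingPersists`). It is illustrative only and is not code from Druid. The class name `IngestionHeapSketch` and its method names are hypothetical, and the use of integer division for the "even" split is an assumption.

```java
// Hypothetical helper, not part of Druid: mirrors the sizing rules described in the docs above.
public final class IngestionHeapSketch {

    // Default global limit after this change: one sixth of the maximum JVM heap.
    static long defaultGlobalLimit(long maxHeapBytes) {
        return maxHeapBytes / 6;
    }

    // The global limit is divided evenly across task slots (druid.worker.capacity).
    // Integer division is an assumption made for this sketch.
    static long perTaskLimit(long globalLimitBytes, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        return globalLimitBytes / capacity;
    }

    // Approximate peak in-heap row data for one task, per the documented formula:
    // maxBytesInMemory * (2 + maxPendingPersists).
    static long approxPeakRowBytes(long maxBytesInMemory, int maxPendingPersists) {
        return maxBytesInMemory * (2L + maxPendingPersists);
    }

    public static void main(String[] args) {
        long heap = Runtime.getRuntime().maxMemory();
        int capacity = 4; // example value for druid.worker.capacity

        long global = defaultGlobalLimit(heap);
        long perTask = perTaskLimit(global, capacity);
        long peakPerTask = approxPeakRowBytes(perTask, 0); // default maxPendingPersists is 0

        System.out.println("global limit bytes:   " + global);
        System.out.println("per-task limit bytes: " + perTask);
        System.out.println("approx peak per task: " + peakPerTask);
    }
}
```

Under these assumptions, with `maxPendingPersists` left at 0, every slot hitting its peak at the same moment would hold roughly twice the global limit in row data, or about one third of the heap.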