diff --git a/docs/content/querying/scan-query.md b/docs/content/querying/scan-query.md index fd43fc092eb..1bb67158de9 100644 --- a/docs/content/querying/scan-query.md +++ b/docs/content/querying/scan-query.md @@ -158,7 +158,7 @@ The format of the result when resultFormat equals `compactedList`: The Scan query currently supports ordering based on timestamp for non-legacy queries. Note that using time ordering will yield results that do not indicate which segment rows are from (`segmentId` will show up as `null`). Furthermore, time ordering is only supported where the result set limit is less than `druid.query.scan.maxRowsQueuedForOrdering` -rows **or** fewer than `druid.query.scan.maxSegmentPartitionsOrderedInMemory` segments are scanned per Historical. The +rows **or** all segments scanned have fewer than `druid.query.scan.maxSegmentPartitionsOrderedInMemory` partitions. The reasoning behind these limitations is that the implementation of time ordering uses two strategies that can consume too much heap memory if left unbounded. These strategies (listed below) are chosen on a per-Historical basis depending on query result set limit and the number of segments being scanned. @@ -170,12 +170,12 @@ priority queue are streamed back to the Broker(s) in batches. Attempting to loa risk of Historical nodes running out of memory. The `druid.query.scan.maxRowsQueuedForOrdering` property protects from this by limiting the number of rows in the query result set when time ordering is used. -2. N-Way Merge: Each segment on a Historical is opened in parallel. Since each segment's rows are already -time-ordered, an n-way merge can be performed on the results from each segment. This approach doesn't persist the entire +2. N-Way Merge: For each segment, each partition is opened in parallel. Since each partition's rows are already +time-ordered, an n-way merge can be performed on the results from each partition. This approach doesn't persist the entire result set in memory (like the Priority Queue) as it streams back batches as they are returned from the merge function. -However, attempting to query too many segments could also result in high memory usage due to the need to open +However, attempting to query too many partition could also result in high memory usage due to the need to open decompression and decoding buffers for each. The `druid.query.scan.maxSegmentPartitionsOrderedInMemory` limit protects -from this by capping the number of segments opened per historical when time ordering is used. +from this by capping the number of partitions opened at any times when time ordering is used. Both `druid.query.scan.maxRowsQueuedForOrdering` and `druid.query.scan.maxSegmentPartitionsOrderedInMemory` are configurable and can be tuned based on hardware specs and number of dimensions being queried. diff --git a/processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java b/processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java index 4103b147697..7165de4134e 100644 --- a/processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java +++ b/processing/src/main/java/org/apache/druid/query/scan/ScanQueryLimitRowIterator.java @@ -97,9 +97,10 @@ public class ScanQueryLimitRowIterator implements CloseableIterator eventsToAdd = new ArrayList<>(batchSize); List columns = new ArrayList<>();