Fix docs and flipped boolean in ScanQueryLimitRowIterator

Justin Borromeo 2019-03-25 17:17:41 -07:00
parent 35692680fc
commit 8a6bb1127c
2 changed files with 10 additions and 9 deletions


@@ -158,7 +158,7 @@ The format of the result when resultFormat equals `compactedList`:
 The Scan query currently supports ordering based on timestamp for non-legacy queries. Note that using time ordering
 will yield results that do not indicate which segment rows are from (`segmentId` will show up as `null`). Furthermore,
 time ordering is only supported where the result set limit is less than `druid.query.scan.maxRowsQueuedForOrdering`
-rows **or** fewer than `druid.query.scan.maxSegmentPartitionsOrderedInMemory` segments are scanned per Historical. The
+rows **or** all segments scanned have fewer than `druid.query.scan.maxSegmentPartitionsOrderedInMemory` partitions. The
 reasoning behind these limitations is that the implementation of time ordering uses two strategies that can consume too
 much heap memory if left unbounded. These strategies (listed below) are chosen on a per-Historical basis depending on
 query result set limit and the number of segments being scanned.
@@ -170,12 +170,12 @@ priority queue are streamed back to the Broker(s) in batches. Attempting to loa
 risk of Historical nodes running out of memory. The `druid.query.scan.maxRowsQueuedForOrdering` property protects
 from this by limiting the number of rows in the query result set when time ordering is used.
-2. N-Way Merge: Each segment on a Historical is opened in parallel. Since each segment's rows are already
-time-ordered, an n-way merge can be performed on the results from each segment. This approach doesn't persist the entire
+2. N-Way Merge: For each segment, each partition is opened in parallel. Since each partition's rows are already
+time-ordered, an n-way merge can be performed on the results from each partition. This approach doesn't persist the entire
 result set in memory (like the Priority Queue) as it streams back batches as they are returned from the merge function.
-However, attempting to query too many segments could also result in high memory usage due to the need to open
+However, attempting to query too many partitions could also result in high memory usage due to the need to open
 decompression and decoding buffers for each. The `druid.query.scan.maxSegmentPartitionsOrderedInMemory` limit protects
-from this by capping the number of segments opened per historical when time ordering is used.
+from this by capping the number of partitions opened at any time when time ordering is used.
 Both `druid.query.scan.maxRowsQueuedForOrdering` and `druid.query.scan.maxSegmentPartitionsOrderedInMemory` are
 configurable and can be tuned based on hardware specs and number of dimensions being queried.
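
As a rough illustration of the two strategies the updated text describes, here is a minimal, self-contained Java sketch. All class and method names are hypothetical and rows are modeled as `long[]` with the timestamp at index 0; this is not Druid's implementation, only a sketch of why each strategy needs its cap: the bounded priority queue holds at most `limit` rows, while the n-way merge holds one head row per open partition.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of the two time-ordering strategies described above; not Druid code.
public class TimeOrderingSketch
{
  // Strategy 1: bounded priority queue (ascending time order assumed). The heap never holds
  // more than `limit` rows, which is why the limit must stay under a cap such as
  // druid.query.scan.maxRowsQueuedForOrdering.
  static List<long[]> priorityQueueOrdering(Iterator<long[]> rows, int limit)
  {
    // Max-heap on timestamp (index 0) so the latest row can be evicted once the heap is full.
    PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong((long[] r) -> r[0]).reversed());
    while (rows.hasNext()) {
      heap.offer(rows.next());
      if (heap.size() > limit) {
        heap.poll(); // evict the largest timestamp; memory stays bounded by `limit`
      }
    }
    List<long[]> result = new ArrayList<>(heap);
    result.sort(Comparator.comparingLong((long[] r) -> r[0]));
    return result;
  }

  // Strategy 2: n-way merge over partitions whose rows are already time-ordered. Memory grows
  // with the number of open partitions, hence a cap such as
  // druid.query.scan.maxSegmentPartitionsOrderedInMemory.
  static List<long[]> nWayMerge(List<Iterator<long[]>> partitions)
  {
    PriorityQueue<Source> heads = new PriorityQueue<>(Comparator.comparingLong((Source s) -> s.head[0]));
    for (Iterator<long[]> partition : partitions) {
      if (partition.hasNext()) {
        heads.offer(new Source(partition));
      }
    }
    List<long[]> merged = new ArrayList<>();
    while (!heads.isEmpty()) {
      Source s = heads.poll();
      merged.add(s.head);
      if (s.rows.hasNext()) {
        s.head = s.rows.next();
        heads.offer(s);
      }
    }
    return merged;
  }

  // One open partition plus the next row it will emit.
  private static class Source
  {
    final Iterator<long[]> rows;
    long[] head;

    Source(Iterator<long[]> rows)
    {
      this.rows = rows;
      this.head = rows.next();
    }
  }
}
```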


@@ -97,9 +97,10 @@ public class ScanQueryLimitRowIterator implements CloseableIterator<ScanResultVa
       throw new UOE(ScanQuery.ResultFormat.RESULT_FORMAT_VALUE_VECTOR + " is not supported yet");
     }
-    // We want to perform batching if we are not time-ordering or are at the outer level if we are re time-ordering
+    // We want to perform multi-event ScanResultValue limiting if we are not time-ordering or are at the
+    // outer level if we are time-ordering
     if (query.getOrder() == ScanQuery.Order.NONE ||
-        !query.getContextBoolean(ScanQuery.CTX_KEY_OUTERMOST, true)) {
+        query.getContextBoolean(ScanQuery.CTX_KEY_OUTERMOST, true)) {
       ScanResultValue batch = yielder.get();
       List events = (List) batch.getEvents();
       if (events.size() <= limit - count) {
@@ -114,8 +115,8 @@ public class ScanQueryLimitRowIterator implements CloseableIterator<ScanResultVa
         return new ScanResultValue(batch.getSegmentId(), batch.getColumns(), events.subList(0, numLeft));
       }
     } else {
-      // Perform single-event ScanResultValue batching. Each scan result value in this case will only have one event
-      // so there's no need to iterate through events.
+      // Perform single-event ScanResultValue batching. Each scan result value from the yielder in this case will only
+      // have one event so there's no need to iterate through events.
       int batchSize = query.getBatchSize();
       List<Object> eventsToAdd = new ArrayList<>(batchSize);
       List<String> columns = new ArrayList<>();
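
To make the two code paths above easier to follow, here is a small, self-contained sketch in which plain `List<Object>` values stand in for `ScanResultValue`; the names are hypothetical and this is not the actual `ScanQueryLimitRowIterator`. The first method mirrors the multi-event path, which truncates the batch that crosses the limit, and the second mirrors the single-event path, which re-groups one-event values into batches of `batchSize`. Per the corrected comment and condition, the multi-event path applies when the query is not time-ordered or when a time-ordered query is at the outermost level (`CTX_KEY_OUTERMOST` true); the removed `!` had sent time-ordered queries down the opposite branch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the two limiting paths shown in the diff above; not Druid code.
class ScanLimitSketch
{
  // Multi-event path: each incoming batch can carry many events, so only the batch that
  // crosses the limit needs to be truncated.
  static List<List<Object>> limitMultiEventBatches(Iterator<List<Object>> batches, int limit)
  {
    List<List<Object>> out = new ArrayList<>();
    int count = 0;
    while (batches.hasNext() && count < limit) {
      List<Object> events = batches.next();
      if (events.size() <= limit - count) {
        count += events.size();
        out.add(events);
      } else {
        out.add(events.subList(0, limit - count)); // final, truncated batch
        count = limit;
      }
    }
    return out;
  }

  // Single-event path: every incoming value carries exactly one event, so events are simply
  // re-grouped into batches of `batchSize` until the limit is reached.
  static List<List<Object>> rebatchSingleEvents(Iterator<Object> singleEvents, int limit, int batchSize)
  {
    List<List<Object>> out = new ArrayList<>();
    List<Object> eventsToAdd = new ArrayList<>(batchSize);
    int count = 0;
    while (singleEvents.hasNext() && count < limit) {
      eventsToAdd.add(singleEvents.next());
      count++;
      if (eventsToAdd.size() == batchSize) {
        out.add(eventsToAdd);
        eventsToAdd = new ArrayList<>(batchSize);
      }
    }
    if (!eventsToAdd.isEmpty()) {
      out.add(eventsToAdd); // flush the last, possibly partial batch
    }
    return out;
  }
}
```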