Commit Graph

27 Commits

Author SHA1 Message Date
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Jonathan Wei 74960e82bf Add more Apache branding to docs (#7515) 2019-04-19 15:52:26 -07:00
Gian Merlino 9178793ab5 Further improve caching documentation. (#7236)
Follow-up to #7223 that fixes a doc bug (a result-level cache property
was misspelled), changes the recommended "small cluster" threshold from
20 to 5 servers, and clarifies behavior of the various caching options.
2019-03-11 17:57:00 -07:00
Pierre-Emile Ferron a88fbcd5db Improve caching doc (#7223)
- Set correct default values for query context result cache parameters
- Add details about broker cache impact on local historical merging
2019-03-11 20:06:28 -04:00
Jonathan Wei 32c418fdd8 Reword 'node' to 'process' (#7172) 2019-02-28 18:10:39 -08:00
Jonathan Wei 82137874ea Add master/data/query server concepts to docs/packaging (#6916)
* Add master/data/query server concepts to docs/packaging

* PR comments

* TOC and markdown fix

* Update image legend

* PR comment

* More PR comments
2019-01-30 19:41:07 -08:00
David Lim f7bbee2e65 Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733) 2018-12-13 11:47:20 -08:00
Vadim Ogievetsky da4836f38c Added titles and harmonized docs to improve usability and SEO (#6731)
* added titles and harmonized docs

* manually fixed some titles
2018-12-12 20:42:12 -08:00
Jihoon Son cdae2fe7b5 Deprecate IntervalChunkingQueryRunner (#6591)
* Deprecate IntervalChunkingQueryRunner

* add doc

* deprecate metric

* fix doc
2018-11-14 06:33:27 +08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Gian Merlino d6cbdf86c2
Broker backpressure. (#6313)
* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.

* Fix query context doc.
2018-09-10 09:33:29 -07:00
Jonathan Wei 180e3ccfad
Docs consistency cleanup (#6259) 2018-09-04 12:54:41 -07:00
Atul Mohan ec17a44e09 Add result level caching to Brokers (#5028)
* Add result level caching to Brokers

* Minor doc changes

* Simplify sequences

*  Move etag execution

* Modify cacheLimit criteria

* Fix incorrect etag computation

* Fix docs

* Add separate query runner for result level caching

* Update docs

* Add post aggregated results to result level cache

* Fix indents

* Check byte size for exceeding cache limit

* Fix indents

* Fix indents

* Add flag for result caching

* Remove logs

* Make cache object generation synchronous

* Avoid saving intermediate cache results to list

* Fix changes that handle etag based response

* Release bytestream after use

*  Address PR comments

*  Discard resultcache stream after use

* Fix docs

* Address comments

* Add comment about fluent workflow issue
2018-03-23 19:11:52 -07:00
Daniel 22c49b0d33 docs: fix broken link to broker configuration (#5105) 2017-11-21 13:32:00 +09:00
Gian Merlino 4e1d0f49d8 Docs: Fix link to broker configuration. (#4934) 2017-10-10 11:18:46 -07:00
kaijianding 551a89bd67 serialize DateTime As Long to improve json serde performance (#4038) 2017-06-06 10:08:51 -07:00
Himanshu 136b2fae72 improve query timeout handling and limit max scatter-gather bytes (#4229)
* improve query timeout handling and limit max scatter-gather bytes

* address review comments
2017-05-16 12:47:32 -05:00
asrayousuf e4fbc2bc5b Updating the description of useCache (#4200)
Updating the description of useCache

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment

Updating query-context doc based on Gian's comment
2017-04-25 10:26:15 -07:00
Jihoon Son 5b69f2eff2 Make timeout behavior consistent to document (#4134)
* Make timeout behavior consistent to document

* Refactoring BlockingPool and add more methods to QueryContexts

* remove unused imports

* Addressed comments

* Address comments

* remove unused method

* Make default query timeout configurable

* Fix test failure

* Change timeout from period to millis
2017-04-19 09:47:53 +09:00
Gian Merlino 4ca5270e88 Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods. (#4004)
* Ignore chunkPeriod for groupBy v2, fix chunkPeriod for irregular periods.

Includes two fixes:
- groupBy v2 now ignores chunkPeriod, since it wouldn't have helped anyway (its mergeResults
returns a lazy sequence) and it generates incorrect results.
- Fix chunkPeriod handling for periods of irregular length, like "P1M" or "P1Y".

Also includes doc and test fixes:
- groupBy v1 was no longer being tested by GroupByQueryRunnerTest since #3953, now it
  is once again.
- chunkPeriod documentation was misleading due to its checkered past. Updated it to
  be more accurate.

* Remove unused import.

* Restore buffer size.
2017-03-06 12:27:02 -06:00
Gian Merlino 16ef513c7d SQL: Add context and contextual functions to planner. (#3919)
* SQL: Add context and contextual functions to planner.

Added support for context parameters specified as JDBC connection properties
or a JSON object for SQL-over-JSON-over-HTTP.

Also added features that depend on context functionality:

- Added CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP functions.
- Added support for time zones other than UTC via a "timeZone" context.
- Pass down query context to Druid queries too.

Also some bug fixes:

- Fix DATE handling, it was largely done incorrectly before.
- Fix CAST(__time TO DATE) which should do a floor-to-day.
- Fix non-equality comparisons to FLOOR(__time TO X).
- Fix maxQueryCount property.

* Pass down context to nested queries too.
2017-02-15 14:09:14 -08:00
Jonathan Wei ca2b04f0fd Add long/float ColumnSelectorStrategy implementations (#3838)
* Add long/float ColumnSelectorStrategy implementations

* Address PR comments

* Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests

* PR comments

* Use BaseSingleValueDimensionSelector for long/float wrapping

* remove unused import

* Address PR comments

* PR comments

* PR comments

* More PR comments

* Fix failing calcite histogram subquery tests

* ScanQuery test and comment about isInputRaw

* Add outputType to extractionDimensionSpec, tweak SQL tests

* Fix limit spec optimization for numerics

* Add cardinality sanity checks to TopN

* Fix import from merge

* Add tests for filtered dimension spec outputType

* Address PR comments

* Allow filtered dimspecs on numerics

* More comments
2017-02-08 20:39:29 -08:00
Himanshu 3cfd9c64c9 make singleThreaded groupBy query config overridable at query time (#2828)
* make isSingleThreaded groupBy query processing overridable at query time

* refactor code in GroupByMergedQueryRunner to make processing of single threaded and parallel merging of runners consistent
2016-04-21 17:12:58 -07:00
Himanshu Gupta ca5de3f583 only allow lowering maxResults and maxIntermediateRows from groupBy query context 2016-03-08 15:03:59 -06:00
Himanshu Gupta 099acb4966 allow groupBy max[Intermediate]Rows limit be overridable by context 2016-03-07 15:22:41 -06:00
binlijin 2751f785f8 add doc 2016-01-12 11:25:11 +08:00
Himanshu Gupta 8edc2aaca3 renaming all *.md filenames to only have lowercase and dashes
so that they are editable on case-insensitive os as well
2015-05-29 20:55:42 -05:00