1845 Commits

Author SHA1 Message Date
Roman Leventov
ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Magnus Henoch
179253a2fc Fix documentation formatting (#8079)
The Markdown dialect used when publishing the documentation to the web
site is much more sensitive than Github-flavoured Markdown.  In
particular, it requires an empty line before code blocks (unless the
code block starts right after a heading), otherwise the code block
gets formatted in-line with the previous paragraph.  Likewise for
bullet-point lists.
2019-07-15 09:55:18 -07:00
Gian Merlino
ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Chi Cao Minh
da3d141dd2 Add inline firehose (#8056)
* Add inline firehose

To allow users to quickly parsing and schema, add a firehose that reads
data that is inlined in its spec.

* Address review comments

* Remove suppression of sonar warnings
2019-07-11 21:43:46 -07:00
Atul Mohan
631cda649b Include replicated segment size property for datasources endpoint (#8039)
* Add replication size

* Summon comma
2019-07-11 01:10:38 -07:00
Himanshu
14aec7fcec
add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)
* disable all compression in intermediate segment persists while ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Sashidhar Thallam
3353da2974 Adding missing docs for druid.indexer.logs.disableAcl (#8046) 2019-07-09 16:11:25 -07:00
Eyal Yurman
2eee711653 Add missing reference to Materialized-View extension. (#8003)
* Reference Materialized View extension from extensions page.

* Add comma
2019-07-06 13:50:41 -07:00
Dinesh Sawant
9c7c7c58ae Fix overlord port in delete data tutorial (#8037)
In Single-Server Quickstart tutorial the overlord and coordinator
is started as one process on port 8081. But in delete data tutorial the kill
task is sent to 8090 port, which fails.
2019-07-06 08:50:01 -07:00
Chi Cao Minh
0ded0ce414 Add round support for DS-HLL (#8023)
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
2019-07-05 15:37:58 -07:00
Clint Wylie
42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Gian Merlino
613f09b45a SQL: Add TIME_CEIL function. (#8027)
Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to
share more code.
2019-07-04 15:40:03 -07:00
Clint Wylie
3b84246cd6 add SQL docs for multi-value string dimensions (#8011)
* add SQL docs for multi-value string dimensions

* formatting consistency

* fix typo

* adjust
2019-07-03 08:22:33 -07:00
Clint Wylie
c556d44a19
more sql support for expression array functions (#7974)
* more sql support for expression array functions

* prepend/slice

* doc fixes

* fix imports

* fix tests

* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval

* re-arrange

* imports :(

* add append/prepend test
2019-07-02 21:39:26 -07:00
Clint Wylie
f7283378ac
remove deprecated standalone realtime node (#7915)
* remove CliRealtime, RealtimeManager, etc

* add redirects for deleted page to page that explains the deleted thing

* adjust docs
2019-07-02 18:12:17 -07:00
Clint Wylie
93b738bbfa
expression language array constructor and sql multi-value string filtering support (#7973)
* expr array constructor and sql multi-value string support

* doc fix

* checkstyle

* change from feedback
2019-07-01 15:14:50 -07:00
Eyal Yurman
3650eed1aa Improve pull-deps reference in extensions page. (#8002) 2019-07-01 11:18:27 -07:00
Xue Yu
2831944056 support NVL sql function (#7965)
* sql nvl

* add nvl in sql doc
2019-06-30 13:14:30 -07:00
Alexander Saydakov
f38a62e949 theta sketch to string post agg (#7937) 2019-06-27 15:09:57 -07:00
Vadim Ogievetsky
ad45ef12ed fix SQL doc comment (#7981) 2019-06-27 15:05:45 -07:00
Clint Wylie
10d6b0318d clarify granularity docs (#7977) 2019-06-27 08:51:22 -07:00
Xue Yu
5464c8938f Add array_slice and array_unshift function expr (#7950)
* add array_slice and array_unshift function expr

* feedback address
2019-06-26 16:56:09 -07:00
Benedict Jin
16aafd5788 [ImgBot] Optimize images (#7873)
*Total -- 10,997.25kb -> 7,160.16kb (34.89%)

/publications/radstack/figures/precompute.png -- 54.20kb -> 16.97kb (68.69%)
/web-console/favicon.png -- 4.41kb -> 1.61kb (63.58%)
/docs/img/indexing_service.png -- 47.37kb -> 21.96kb (53.64%)
/docs/img/segmentPropagation.png -- 62.94kb -> 29.85kb (52.57%)
/docs/content/tutorials/img/tutorial-quickstart-01.png -- 55.62kb -> 29.13kb (47.62%)
/docs/content/tutorials/img/tutorial-deletion-02.png -- 791.43kb -> 429.30kb (45.76%)
/docs/content/tutorials/img/tutorial-deletion-03.png -- 786.79kb -> 427.05kb (45.72%)
/docs/content/tutorials/img/tutorial-retention-00.png -- 135.06kb -> 75.88kb (43.82%)
/docs/content/tutorials/img/tutorial-batch-data-loader-10.png -- 77.23kb -> 43.47kb (43.71%)
/docs/content/tutorials/img/tutorial-batch-data-loader-01.png -- 97.03kb -> 55.16kb (43.15%)
/docs/content/tutorials/img/tutorial-batch-data-loader-07.png -- 79.49kb -> 45.44kb (42.84%)
/docs/content/tutorials/img/tutorial-retention-02.png -- 401.30kb -> 234.68kb (41.52%)
/docs/content/tutorials/img/tutorial-compaction-06.png -- 343.27kb -> 201.87kb (41.19%)
/docs/content/tutorials/img/tutorial-batch-data-loader-09.png -- 105.14kb -> 61.86kb (41.16%)
/docs/content/tutorials/img/tutorial-retention-06.png -- 227.57kb -> 134.35kb (40.97%)
/docs/content/tutorials/img/tutorial-compaction-04.png -- 304.83kb -> 180.04kb (40.94%)
/docs/content/tutorials/img/tutorial-compaction-02.png -- 273.18kb -> 162.67kb (40.45%)
/docs/content/tutorials/img/tutorial-query-05.png -- 85.03kb -> 50.64kb (40.44%)
/publications/radstack/figures/druid_vs_bigquery.png -- 155.44kb -> 92.85kb (40.27%)
/docs/content/tutorials/img/tutorial-kafka-02.png -- 122.51kb -> 73.93kb (39.65%)
/docs/content/tutorials/img/tutorial-deletion-01.png -- 70.37kb -> 42.56kb (39.52%)
/docs/content/tutorials/img/tutorial-batch-data-loader-06.png -- 103.50kb -> 62.79kb (39.33%)
/docs/content/tutorials/img/tutorial-batch-submit-task-01.png -- 111.25kb -> 67.73kb (39.12%)
/docs/content/tutorials/img/tutorial-query-03.png -- 103.60kb -> 63.51kb (38.69%)
/docs/content/tutorials/img/tutorial-query-04.png -- 105.79kb -> 64.87kb (38.69%)
/docs/content/tutorials/img/tutorial-batch-data-loader-11.png -- 130.20kb -> 81.34kb (37.53%)
/docs/content/tutorials/img/tutorial-query-07.png -- 122.52kb -> 76.79kb (37.32%)
/docs/content/tutorials/img/tutorial-kafka-01.png -- 133.12kb -> 83.47kb (37.3%)
/docs/content/tutorials/img/tutorial-query-06.png -- 127.55kb -> 80.28kb (37.06%)
/docs/content/tutorials/img/tutorial-batch-submit-task-02.png -- 133.07kb -> 84.06kb (36.83%)
/docs/content/tutorials/img/tutorial-retention-05.png -- 60.19kb -> 38.08kb (36.74%)
/docs/content/tutorials/img/tutorial-batch-data-loader-03.png -- 211.92kb -> 134.22kb (36.66%)
/docs/content/tutorials/img/tutorial-batch-data-loader-05.png -- 250.36kb -> 158.68kb (36.62%)
/publications/radstack/figures/radstack.png -- 16.80kb -> 10.67kb (36.48%)
/docs/content/tutorials/img/tutorial-batch-data-loader-08.png -- 158.59kb -> 101.49kb (36%)
/docs/content/tutorials/img/tutorial-batch-data-loader-04.png -- 255.10kb -> 163.33kb (35.97%)
/docs/content/tutorials/img/tutorial-query-02.png -- 126.92kb -> 81.42kb (35.85%)
/docs/content/tutorials/img/tutorial-compaction-01.png -- 53.86kb -> 34.87kb (35.25%)
/docs/img/druid-architecture.png -- 202.23kb -> 130.97kb (35.24%)
/docs/content/tutorials/img/tutorial-retention-01.png -- 52.69kb -> 34.35kb (34.81%)
/docs/img/druid-timeline.png -- 35.87kb -> 23.59kb (34.22%)
/docs/content/tutorials/img/tutorial-query-01.png -- 149.53kb -> 98.56kb (34.08%)
/docs/content/tutorials/img/tutorial-retention-04.png -- 65.91kb -> 43.57kb (33.89%)
/docs/content/tutorials/img/tutorial-compaction-08.png -- 42.24kb -> 28.08kb (33.53%)
/docs/content/tutorials/img/tutorial-compaction-07.png -- 39.17kb -> 26.06kb (33.47%)
/docs/content/tutorials/img/tutorial-compaction-03.png -- 39.17kb -> 26.13kb (33.3%)
/docs/content/tutorials/img/tutorial-compaction-05.png -- 38.85kb -> 25.96kb (33.17%)
/publications/demo/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/radstack/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/whitepaper/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/docs/content/tutorials/img/tutorial-retention-03.png -- 43.11kb -> 29.33kb (31.97%)
/publications/radstack/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/whitepaper/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/demo/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/radstack/figures/joined.png -- 164.14kb -> 113.47kb (30.87%)
/docs/content/tutorials/img/tutorial-batch-data-loader-02.png -- 508.93kb -> 351.85kb (30.87%)
/publications/radstack/figures/imps_clicks.png -- 190.95kb -> 132.70kb (30.51%)
/publications/radstack/figures/shuffled.png -- 180.46kb -> 128.21kb (28.95%)
/publications/radstack/figures/pipeline.png -- 392.54kb -> 281.93kb (28.18%)
/docs/img/druid-manage-1.png -- 108.94kb -> 78.53kb (27.92%)
/publications/radstack/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/demo/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/whitepaper/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/docs/img/druid-production.png -- 50.00kb -> 39.18kb (21.63%)
/docs/img/druid-dataflow-3.png -- 88.25kb -> 69.75kb (20.96%)
/publications/demo/figures/realtime_flow.png -- 51.12kb -> 40.61kb (20.56%)
/publications/demo/figures/realtime_timeline.png -- 36.15kb -> 29.24kb (19.12%)
/publications/demo/figures/tpch_scaling.png -- 43.21kb -> 34.97kb (19.08%)
/publications/demo/figures/caching.png -- 35.26kb -> 29.09kb (17.49%)
/dev/intellij-sdk-config.jpg -- 1,019.35kb -> 864.37kb (15.2%)
/docs/img/druid-column-types.png -- 101.53kb -> 91.17kb (10.2%)
/docs/img/druid-dataflow-2x.png -- 138.30kb -> 127.11kb (8.09%)
2019-06-24 21:27:48 -07:00
Jonathan Wei
35601bb7a0 Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784)
* Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory

* Add finalizeAsBase64Binary option to ApproximateHistogramFactory

* Update approx histogram doc
2019-06-21 18:00:19 -07:00
Clint Wylie
494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
Clint Wylie
71997c16a2 switch links from druid.io to druid.apache.org (#7914)
* switch links from druid.io to druid.apache.org

* fix it
2019-06-18 09:06:27 -07:00
Vadim Ogievetsky
24dd4573da Added the web console to the quickstart tutorials and docs (#7863)
* added console to the quickstart tutorials

* feedback fixes

* feedback fixes

* more typo fixes

* moved reseting cluster section after load data

* update images

* stage -> step

* feedback fixes

* more feedback fixes
2019-06-17 18:00:54 -07:00
Himanshu
b3328b2785
endpoint to delete lookup tier and remove tier on last lookup deletion (#7852) 2019-06-15 17:55:50 -07:00
Justin Borromeo
8e5003b01c Scan Doc Change (#7903) 2019-06-15 01:21:34 -07:00
Xue Yu
456a3654ce add PolygonBound and missing extentions list doc (#7885) 2019-06-13 12:03:58 -07:00
Clint Wylie
8117222da3 use right port for kafka tutorial, reinfoce that tutorials assume you are using micro-quickstart single-server configuration (#7862) 2019-06-11 08:50:52 -07:00
Xue Yu
ce591d1457 Support var_pop, var_samp, stddev_pop and stddev_samp etc in sql (#7801)
* support var_pop, stddev_pop etc in sql

* fix sql compatible

* rebase on master

* update doc
2019-06-10 09:40:09 -07:00
Clint Wylie
3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Jihoon Son
61ec521135
Remove keepSegmentGranularity option for compaction (#7747)
* Remove keepSegmentGranularity option from compaction

* fix it test

* clean up

* remove from web console

* fix test
2019-06-03 12:59:15 -07:00
Eyal Yurman
69e9b8a464 Enables SQL by default. (#7808) 2019-05-31 20:53:42 -07:00
Justin Borromeo
8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jonathan Wei
83152a7a00 Fix performance-faq and remove insert-segment-to-db redirects (#7759) 2019-05-24 13:20:02 -07:00
Jonathan Wei
cfb7756c9b Fix references to removed performance FAQ page (#7755) 2019-05-24 11:52:40 -07:00
Jonathan Wei
eb0e1a056c Add limit to timeseries docs (#7750) 2019-05-23 19:41:52 -07:00
Jonathan Wei
f2e34a76bd Fix TOC clustering example link (#7749) 2019-05-23 19:41:27 -07:00
Jonathan Wei
ec4d09a02f Remove obsolete isExcluded config from Kerberos authenticator (#7745) 2019-05-23 16:00:05 -07:00
awelsh93
6964ac23a2 Adding influxdb emitter as a contrib extension (#7717)
* Adding influxdb emitter as a contrib extension

* addressing code review comments
2019-05-23 11:11:48 -07:00
Fangjin Yang
3dec5cd1e4
reorganizing the ToC (#7734) 2019-05-23 09:24:38 -07:00
gocho1
bd899b9224 add s3 authentication method informations (#7674)
* add s3 authentication method informations

* add druid.s3.fileSessionCredentials related content

* remove authentication parameters to avoid confusion as it is more detailed in S3 Deep Storage page

* streamline s3 docs
2019-05-22 11:46:02 -07:00
Gian Merlino
cbbce955de SQL: Allow NULLs in place of optional arguments in many functions. (#7709)
* SQL: Allow NULLs in place of optional arguments in many functions.

Also adjust SQL docs to describe how to make time literals using
TIME_PARSE (which is now possible in a nicer way).

* Be less forbidden.
2019-05-21 11:54:34 -07:00
Gian Merlino
b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
Xue Yu
dd7dace70a Add TIMESTAMPDIFF sql support (#7695)
* add timestampdiff sql support

* feedback address
2019-05-21 08:05:38 -07:00
Vadim Ogievetsky
156322932f Update Druid Console docs for 0.15.0 (#7697)
* Update Druid Console docs for 0.15.0

* SQL -> query

* added links and fix typos
2019-05-21 04:00:42 -07:00
andrewluotechnologies
1add566411 Fix typo (ComplexMetricSerde class name was spelled incorrectly) (#7694) 2019-05-19 09:49:54 -07:00
Clint Wylie
939b417379 Update tutorial-kafka.md (#7678) 2019-05-16 23:10:45 -07:00