Commit Graph

2280 Commits

Author SHA1 Message Date
Jonathan Wei c626452b47 Add nano-quickstart single server example configuration (#8390)
* Add nano-quickstart single server example configuration

* Use two workers

* Shrink processing buffers
2019-08-24 22:07:20 -07:00
Furkan KAMACI 02fe3db911 Zookeeper version is updated. (#8363)
* Zookeeper version is updated.

* Zookeeper version is updated at licenses.yaml

* licenses.yaml is updated and dependencies are fixed to make the project successfully build.

* Zookeeper versions are fixed at licenses.yaml
2019-08-24 22:00:43 -07:00
Jihoon Son 95fa609615 Fix wrong partitionsSpec type names in the document (#8297)
* Fix wrong type names for partitionsSpec

* add unit tests; add json properties for backward compatibility

* beautify conf names

* remove maxRowsPerSegment from hashed partitionsSpec

* fix doc build
2019-08-23 13:44:58 -07:00
Clint Wylie 7749571a7f order and add more ports to hadoop docker container in hadoop indexing tutorial (#8329)
LGTM
2019-08-23 15:43:06 -05:00
Surekha cf2a2dd917
Add group_id to the sys.tasks table (#8304)
* Add group_id to overlord tasks API and sys.tasks table

* adjust test

* modify docs

* Make groupId nullable

* fix integration test

* fix toString

* Remove groupId from TaskInfo

* Modify docs and tests

* modify TaskMonitorTest
2019-08-22 15:28:23 -07:00
Clint Wylie 010f70b371
autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306)
* migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time

* +x

* move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml

* add missing hdfs-storage dependencies

* revert to old syntax, fixes

* formatting

* update notices for recently updated dependencies
2019-08-21 12:46:27 -07:00
Gian Merlino d007477742
Docusaurus build framework + ingestion doc refresh. (#8311)
* Docusaurus build framework + ingestion doc refresh.

* stick to npm instead of yarn

* fix typos

* restore some _bin

* Adjustments.

* detect and fix redirect anchors

* update anchor lint

* Web-console: remove specific column filters (#8343)

* add clear filter

* update tool kit

* remove usless check

* auto run

* add %

* Fix resource leak (#8337)

* Fix resource leak

* Patch comments

* Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234)

* Fixes from PR review.

* Fix more anchors.

* Preamble nix.

* Fix more anchors, headers

* clean up placeholder page

* add to website lint to travis config

* better broken link checking

* travis fix

* Fixed more broken links

* better redirects

* unfancy catch

* fix LGTM error

* link fixes

* fix md issues

* Addl fixes
2019-08-20 21:48:59 -07:00
Fokko Driesprong d5a19675dd Remove fromPigAvroStorage from the docs (#8340)
This one has been deprecated a while ago
2019-08-20 16:34:55 -07:00
Jonathan Wei dd2e53baf4
Clarify Avro decoder docs (#8302) 2019-08-19 15:37:18 -05:00
Jihoon Son 31af4eb9ad
Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324) 2019-08-16 15:57:13 -07:00
Jihoon Son 5dac6375f3
Add support for parallel native indexing with shuffle for perfect rollup (#8257)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* Add support for parallel native indexing with shuffle for perfect rollup.

* Add comment about volatiles

* fix test

* fix test

* handling missing exceptions

* more clear javadoc for stopGracefully

* unused import

* update javadoc

* Add missing statement in javadoc

* address comments; fix doc

* add javadoc for isGuaranteedRollup

* Rename confusing variable name and fix typos

* fix typos; move fetch() to a better home; fix the expiration time

* add support https
2019-08-15 17:43:35 -07:00
Jihoon Son eeae5d9365 Add a warning about experimental segment locking (#8301)
* Add a warning about experimental segment locking

* fix typo
2019-08-15 16:07:59 -07:00
Jihoon Son a5c9c2950f Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274)
* Add missing tuningConfigs for auto compaciton

* Add doc

* add test
2019-08-13 14:10:26 -05:00
Alexandre Yang 6b4d028b96 [statsd-emitter] Add config to send Druid process/service as tag (#8238)
* [statsd-emitter] Add serviceAsTag option

* [statsd-emitter] Refactor serviceAsTag option

* [statsd-emitter] Update statsd.md

* [statsd-emitter] add default prefix

* [statsd-emitter] update statsd.md

* [statsd-emitter] Remove extra spaces

* [statsd-emitter] Improve docs for config `dogstatsdServiceAsTag`

* [statsd-emitter] Simplify equals() for StatsDEmitterConfig.java

* [statsd-emitter] Add @Nullable for StatsDEmitterConfig.java
2019-08-12 13:18:44 -07:00
Nathan b28e252d9a Minor Spelling Error (#8277)
* Minor Spelling Error

* Update mySQL password in docs

/extensions-core/mysql update druid.metadata.storage.connector.password
2019-08-09 16:06:02 -05:00
Jonathan Wei e88bbe71c0 Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255) 2019-08-07 23:04:07 -07:00
Jonathan Wei 5e57492298 Add docs for CliIndexer as an experimental feature (#8245)
* Experimental CliIndexer docs

* PR comments
2019-08-06 15:57:17 -07:00
Lucas Capistrant e252abedc5 Enable toggling request logging on/off for different query types (#7562)
* Enable ability to toggle SegmentMetadata request logging on/off

* Move SegmentMetadata query log filter to FilteredRequestLogger

* Update documentation to reflect the segment metadata flag moving to the filtered request logger

* Modify patch to allow blacklist of query types to not log to request logger

* Address styling and naming requests following latest code review

* Fix indentation on multiple locations per Druid style rules
2019-08-06 15:47:30 +03:00
Samarth Jain 93cf9d4ad4 SQL support for t-digest based sketch aggregators (#8100)
* SQL support for t-digest based sketch aggregators

* Fix teamcity errors

* Add missing dependencies

* Remove unused dependency

* Address code review comments

* Add checks for compression param
2019-08-05 12:01:42 -07:00
Jihoon Son 1ee828ff49
Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173)
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking

* add more test

* javadoc for missingIntervalsInOverwriteMode

* Fix test

* Address comments

* avoid spotbugs
2019-08-02 20:30:05 -07:00
Chi Cao Minh 4bd3bad8ba Add IPv4 SQL functions (#8223)
* Add IPv4 SQL functions

New SQL functions for filtering IPv4 addresses:
- IPV4_MATCH: Check if IP address belongs to a subnet
- IPV4_PARSE: Convert string IP address to integer
- IPV4_STRINGIFY: Convert integer IP address to string

These are the SQL analogs of the druid expressions with the same name.
Filtering is more efficient when operating on IP addresses as integers
instead of strings.

* Refactor operator conversions into named constants
2019-08-01 21:29:58 -07:00
Clint Wylie 01c8c82982 correct kerberos doc extension load list (#8224) 2019-08-01 17:03:25 -07:00
Chi Cao Minh 7783b31846 Add IPv4 druid expressions (#8197)
* Add IPv4 druid expressions

New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string

These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:

1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4adress_stringify to convert to (readable) string at query
time

* Fix licenses and null handling

* Simplify IPv4 expressions

* Fix tests

* Fix check for valid ipv4 address string
2019-08-01 11:45:04 -07:00
Surekha f0ecdfee30 Fix `is_realtime` column behavior in sys.segments table (#8154)
* Fix is_realtime flag

* make variable final

* minor changes

* Modify is_realtime behavior based on review comment

* Fix UT
2019-07-31 22:26:49 -06:00
Nathan 716ce7fdc7 Spelling Error (#8206) 2019-07-31 10:43:11 -07:00
Jihoon Son 385f492a55
Use PartitionsSpec for all task types (#8141)
* Use partitionsSpec for all task types

* fix doc

* fix typos and revert to use isPushRequired

* address comments

* move partitionsSpec to core

* remove hadoopPartitionsSpec
2019-07-30 17:24:39 -07:00
Clint Wylie 653b558134 sql firehose and firehose doc adjustments (#8067)
* firehose doc adjustments

* fix typo

* additional information on parser types in ingestion docs

* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages

* fixit

* make sql firehose more forgiving my always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers

* transforms

* fix tests
2019-07-30 15:28:10 -07:00
Jonathan Wei 640b7afc1c Add CliIndexer process type and initial task runner implementation (#8107)
* Add CliIndexer process type and initial task runner implementation

* Fix HttpRemoteTaskRunnerTest

* Remove batch sanity check on PeonAppenderatorsManager

* Fix paralle index tests

* PR comments

* Adjust Jersey resource logging

* Additional cleanup

* Fix SystemSchemaTest

* Add comment to LocalDataSegmentPusherTest absolute path test

* More PR comments

* Use Server annotated with RemoteChatHandler

* More PR comments

* Checkstyle

* PR comments

* Add task shutdown to stopGracefully

* Small cleanup

* Compile fix

* Address PR comments

* Adjust TaskReportFileWriter and fix nits

* Remove unnecessary closer

* More PR comments

* Minor adjustments

* PR comments

* ThreadingTaskRunner: cancel  task run future not shutdownFuture and remove thread from workitem
2019-07-29 17:06:33 -07:00
Jihoon Son 61f4abece4
Add more warning to the doc for resetOffsetAutomatically (#8153)
* Add more warnings to the doc for resetOffsetAutomatically

* fix kinesis doc

* fix typos

* revise the description

* capital

* capitalize
2019-07-24 17:37:32 -07:00
Magnus Henoch c87b47e0fa More documentation formatting fixes (#8149)
Add empty lines before bulleted lists and code blocks, to ensure that
they show up properly on the web site.  See also #8079.
2019-07-24 15:26:03 -07:00
Clint Wylie b8b22b7aaa fix references to bin/supervise in tutorial docs (#8087) 2019-07-23 15:05:01 -07:00
Clint Wylie 83514958db remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116)
* remove unecessary lock in ForegroundCachePopulator leading to a lot of contention

* mutableboolean, javadocs,document some cache configs that were missing

* more doc stuff

* adjustments

* remove background documentation
2019-07-23 10:57:59 -07:00
Sashidhar Thallam ea4bad7836 Druid SQL EXTRACT time function - adding support for additional Time Units (#8068)
* 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935

* 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds

* Fixing testInformationSchemaTables test

* Fixing failing tests in DruidAvaticaHandlerTest

* Adding cannotVectorize() call before the test

* Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes.

* Adding MILLISECOND in test case

* Adding support DECADE and MILLENNIUM, updating test case and documentation

* Fixing expression eval for DECADE and MILLENIUM
2019-07-19 20:38:32 -07:00
Roman Leventov ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Magnus Henoch 179253a2fc Fix documentation formatting (#8079)
The Markdown dialect used when publishing the documentation to the web
site is much more sensitive than Github-flavoured Markdown.  In
particular, it requires an empty line before code blocks (unless the
code block starts right after a heading), otherwise the code block
gets formatted in-line with the previous paragraph.  Likewise for
bullet-point lists.
2019-07-15 09:55:18 -07:00
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
Chi Cao Minh da3d141dd2 Add inline firehose (#8056)
* Add inline firehose

To allow users to quickly parsing and schema, add a firehose that reads
data that is inlined in its spec.

* Address review comments

* Remove suppression of sonar warnings
2019-07-11 21:43:46 -07:00
Atul Mohan 631cda649b Include replicated segment size property for datasources endpoint (#8039)
* Add replication size

* Summon comma
2019-07-11 01:10:38 -07:00
Himanshu 14aec7fcec
add config to optionally disable all compression in intermediate segment persists while ingestion (#7919)
* disable all compression in intermediate segment persists while ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Jihoon Son 0a3538b569 Fix license check in travis and make it optional (#8049)
* Fix license check in travis and make it optional

* debug

* fix build

* too loud maven

* move MAVEN_OPTS to top and add comments

* adjust script

* remove mvn option from python script
2019-07-09 19:35:29 -07:00
Sashidhar Thallam 3353da2974 Adding missing docs for druid.indexer.logs.disableAcl (#8046) 2019-07-09 16:11:25 -07:00
Jihoon Son 12f12676e3
Binary license management system (#7998)
* Binary license management system

* add missing file

* add comment

* Address comments

* print missing licenses

* print druid module name

* Add missing licenses and update versions

* fix library versions and add missing ones. also fix pom.xml

* testing multi thread

* Parallel report generation

* fix build error

* install pyyaml and use old api

* install python3

* fix travis script

* python3.6

* pip

* setuptools

* python3-setuptools

* address comment

* error on not found reports or registered licenses

* removed licenses

* debug

* travis debug

* add missing licenses

* travis debug

* debug

* remove debug code

* test build script

* travis debug

* still debug

* add missing python lib

* debug

* debug

* fix travis

* fix travis

* debug travis

* flush print

* print something more to keep travis alive

* adjust print

* single threaded

* single threaded

* debug

* debug

* remove debug

* remove deprecated-2017Q4 from travis conf

* remove comments and duplicate sudo
2019-07-08 12:24:51 -07:00
Eyal Yurman 2eee711653 Add missing reference to Materialized-View extension. (#8003)
* Reference Materialized View extension from extensions page.

* Add comma
2019-07-06 13:50:41 -07:00
Dinesh Sawant 9c7c7c58ae Fix overlord port in delete data tutorial (#8037)
In Single-Server Quickstart tutorial the overlord and coordinator
is started as one process on port 8081. But in delete data tutorial the kill
task is sent to 8090 port, which fails.
2019-07-06 08:50:01 -07:00
Chi Cao Minh 0ded0ce414 Add round support for DS-HLL (#8023)
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
2019-07-05 15:37:58 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Gian Merlino 613f09b45a SQL: Add TIME_CEIL function. (#8027)
Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to
share more code.
2019-07-04 15:40:03 -07:00
Clint Wylie 3b84246cd6 add SQL docs for multi-value string dimensions (#8011)
* add SQL docs for multi-value string dimensions

* formatting consistency

* fix typo

* adjust
2019-07-03 08:22:33 -07:00
Clint Wylie c556d44a19
more sql support for expression array functions (#7974)
* more sql support for expression array functions

* prepend/slice

* doc fixes

* fix imports

* fix tests

* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval

* re-arrange

* imports :(

* add append/prepend test
2019-07-02 21:39:26 -07:00
Clint Wylie f7283378ac
remove deprecated standalone realtime node (#7915)
* remove CliRealtime, RealtimeManager, etc

* add redirects for deleted page to page that explains the deleted thing

* adjust docs
2019-07-02 18:12:17 -07:00
Clint Wylie 93b738bbfa
expression language array constructor and sql multi-value string filtering support (#7973)
* expr array constructor and sql multi-value string support

* doc fix

* checkstyle

* change from feedback
2019-07-01 15:14:50 -07:00
Eyal Yurman 3650eed1aa Improve pull-deps reference in extensions page. (#8002) 2019-07-01 11:18:27 -07:00
Xue Yu 2831944056 support NVL sql function (#7965)
* sql nvl

* add nvl in sql doc
2019-06-30 13:14:30 -07:00
Jihoon Son f148249f64 Fix wrong redirect for orc extension (#7983) 2019-06-27 16:27:08 -07:00
Alexander Saydakov f38a62e949 theta sketch to string post agg (#7937) 2019-06-27 15:09:57 -07:00
Vadim Ogievetsky ad45ef12ed fix SQL doc comment (#7981) 2019-06-27 15:05:45 -07:00
Jihoon Son c4aaf26797 Add missing redirect for ORC extension document (#7979) 2019-06-27 14:23:44 -07:00
Clint Wylie 10d6b0318d clarify granularity docs (#7977) 2019-06-27 08:51:22 -07:00
Xue Yu 5464c8938f Add array_slice and array_unshift function expr (#7950)
* add array_slice and array_unshift function expr

* feedback address
2019-06-26 16:56:09 -07:00
Benedict Jin 16aafd5788 [ImgBot] Optimize images (#7873)
*Total -- 10,997.25kb -> 7,160.16kb (34.89%)

/publications/radstack/figures/precompute.png -- 54.20kb -> 16.97kb (68.69%)
/web-console/favicon.png -- 4.41kb -> 1.61kb (63.58%)
/docs/img/indexing_service.png -- 47.37kb -> 21.96kb (53.64%)
/docs/img/segmentPropagation.png -- 62.94kb -> 29.85kb (52.57%)
/docs/content/tutorials/img/tutorial-quickstart-01.png -- 55.62kb -> 29.13kb (47.62%)
/docs/content/tutorials/img/tutorial-deletion-02.png -- 791.43kb -> 429.30kb (45.76%)
/docs/content/tutorials/img/tutorial-deletion-03.png -- 786.79kb -> 427.05kb (45.72%)
/docs/content/tutorials/img/tutorial-retention-00.png -- 135.06kb -> 75.88kb (43.82%)
/docs/content/tutorials/img/tutorial-batch-data-loader-10.png -- 77.23kb -> 43.47kb (43.71%)
/docs/content/tutorials/img/tutorial-batch-data-loader-01.png -- 97.03kb -> 55.16kb (43.15%)
/docs/content/tutorials/img/tutorial-batch-data-loader-07.png -- 79.49kb -> 45.44kb (42.84%)
/docs/content/tutorials/img/tutorial-retention-02.png -- 401.30kb -> 234.68kb (41.52%)
/docs/content/tutorials/img/tutorial-compaction-06.png -- 343.27kb -> 201.87kb (41.19%)
/docs/content/tutorials/img/tutorial-batch-data-loader-09.png -- 105.14kb -> 61.86kb (41.16%)
/docs/content/tutorials/img/tutorial-retention-06.png -- 227.57kb -> 134.35kb (40.97%)
/docs/content/tutorials/img/tutorial-compaction-04.png -- 304.83kb -> 180.04kb (40.94%)
/docs/content/tutorials/img/tutorial-compaction-02.png -- 273.18kb -> 162.67kb (40.45%)
/docs/content/tutorials/img/tutorial-query-05.png -- 85.03kb -> 50.64kb (40.44%)
/publications/radstack/figures/druid_vs_bigquery.png -- 155.44kb -> 92.85kb (40.27%)
/docs/content/tutorials/img/tutorial-kafka-02.png -- 122.51kb -> 73.93kb (39.65%)
/docs/content/tutorials/img/tutorial-deletion-01.png -- 70.37kb -> 42.56kb (39.52%)
/docs/content/tutorials/img/tutorial-batch-data-loader-06.png -- 103.50kb -> 62.79kb (39.33%)
/docs/content/tutorials/img/tutorial-batch-submit-task-01.png -- 111.25kb -> 67.73kb (39.12%)
/docs/content/tutorials/img/tutorial-query-03.png -- 103.60kb -> 63.51kb (38.69%)
/docs/content/tutorials/img/tutorial-query-04.png -- 105.79kb -> 64.87kb (38.69%)
/docs/content/tutorials/img/tutorial-batch-data-loader-11.png -- 130.20kb -> 81.34kb (37.53%)
/docs/content/tutorials/img/tutorial-query-07.png -- 122.52kb -> 76.79kb (37.32%)
/docs/content/tutorials/img/tutorial-kafka-01.png -- 133.12kb -> 83.47kb (37.3%)
/docs/content/tutorials/img/tutorial-query-06.png -- 127.55kb -> 80.28kb (37.06%)
/docs/content/tutorials/img/tutorial-batch-submit-task-02.png -- 133.07kb -> 84.06kb (36.83%)
/docs/content/tutorials/img/tutorial-retention-05.png -- 60.19kb -> 38.08kb (36.74%)
/docs/content/tutorials/img/tutorial-batch-data-loader-03.png -- 211.92kb -> 134.22kb (36.66%)
/docs/content/tutorials/img/tutorial-batch-data-loader-05.png -- 250.36kb -> 158.68kb (36.62%)
/publications/radstack/figures/radstack.png -- 16.80kb -> 10.67kb (36.48%)
/docs/content/tutorials/img/tutorial-batch-data-loader-08.png -- 158.59kb -> 101.49kb (36%)
/docs/content/tutorials/img/tutorial-batch-data-loader-04.png -- 255.10kb -> 163.33kb (35.97%)
/docs/content/tutorials/img/tutorial-query-02.png -- 126.92kb -> 81.42kb (35.85%)
/docs/content/tutorials/img/tutorial-compaction-01.png -- 53.86kb -> 34.87kb (35.25%)
/docs/img/druid-architecture.png -- 202.23kb -> 130.97kb (35.24%)
/docs/content/tutorials/img/tutorial-retention-01.png -- 52.69kb -> 34.35kb (34.81%)
/docs/img/druid-timeline.png -- 35.87kb -> 23.59kb (34.22%)
/docs/content/tutorials/img/tutorial-query-01.png -- 149.53kb -> 98.56kb (34.08%)
/docs/content/tutorials/img/tutorial-retention-04.png -- 65.91kb -> 43.57kb (33.89%)
/docs/content/tutorials/img/tutorial-compaction-08.png -- 42.24kb -> 28.08kb (33.53%)
/docs/content/tutorials/img/tutorial-compaction-07.png -- 39.17kb -> 26.06kb (33.47%)
/docs/content/tutorials/img/tutorial-compaction-03.png -- 39.17kb -> 26.13kb (33.3%)
/docs/content/tutorials/img/tutorial-compaction-05.png -- 38.85kb -> 25.96kb (33.17%)
/publications/demo/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/radstack/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/whitepaper/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/docs/content/tutorials/img/tutorial-retention-03.png -- 43.11kb -> 29.33kb (31.97%)
/publications/radstack/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/whitepaper/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/demo/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/radstack/figures/joined.png -- 164.14kb -> 113.47kb (30.87%)
/docs/content/tutorials/img/tutorial-batch-data-loader-02.png -- 508.93kb -> 351.85kb (30.87%)
/publications/radstack/figures/imps_clicks.png -- 190.95kb -> 132.70kb (30.51%)
/publications/radstack/figures/shuffled.png -- 180.46kb -> 128.21kb (28.95%)
/publications/radstack/figures/pipeline.png -- 392.54kb -> 281.93kb (28.18%)
/docs/img/druid-manage-1.png -- 108.94kb -> 78.53kb (27.92%)
/publications/radstack/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/demo/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/whitepaper/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/docs/img/druid-production.png -- 50.00kb -> 39.18kb (21.63%)
/docs/img/druid-dataflow-3.png -- 88.25kb -> 69.75kb (20.96%)
/publications/demo/figures/realtime_flow.png -- 51.12kb -> 40.61kb (20.56%)
/publications/demo/figures/realtime_timeline.png -- 36.15kb -> 29.24kb (19.12%)
/publications/demo/figures/tpch_scaling.png -- 43.21kb -> 34.97kb (19.08%)
/publications/demo/figures/caching.png -- 35.26kb -> 29.09kb (17.49%)
/dev/intellij-sdk-config.jpg -- 1,019.35kb -> 864.37kb (15.2%)
/docs/img/druid-column-types.png -- 101.53kb -> 91.17kb (10.2%)
/docs/img/druid-dataflow-2x.png -- 138.30kb -> 127.11kb (8.09%)
2019-06-24 21:27:48 -07:00
Jonathan Wei 35601bb7a0 Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784)
* Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory

* Add finalizeAsBase64Binary option to ApproximateHistogramFactory

* Update approx histogram doc
2019-06-21 18:00:19 -07:00
Clint Wylie 494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
Clint Wylie 71997c16a2 switch links from druid.io to druid.apache.org (#7914)
* switch links from druid.io to druid.apache.org

* fix it
2019-06-18 09:06:27 -07:00
Vadim Ogievetsky 24dd4573da Added the web console to the quickstart tutorials and docs (#7863)
* added console to the quickstart tutorials

* feedback fixes

* feedback fixes

* more typo fixes

* moved reseting cluster section after load data

* update images

* stage -> step

* feedback fixes

* more feedback fixes
2019-06-17 18:00:54 -07:00
Himanshu b3328b2785
endpoint to delete lookup tier and remove tier on last lookup deletion (#7852) 2019-06-15 17:55:50 -07:00
Justin Borromeo 8e5003b01c Scan Doc Change (#7903) 2019-06-15 01:21:34 -07:00
Jihoon Son 3cd9a7507d Fix script for dependencies report for extensions (#7899) 2019-06-14 18:53:50 -07:00
Jihoon Son a648e1548d Add support of --exclude-extension argument for dependency report script (#7786) 2019-06-14 15:18:59 -07:00
Xue Yu 456a3654ce add PolygonBound and missing extentions list doc (#7885) 2019-06-13 12:03:58 -07:00
Clint Wylie 8117222da3 use right port for kafka tutorial, reinfoce that tutorials assume you are using micro-quickstart single-server configuration (#7862) 2019-06-11 08:50:52 -07:00
Xue Yu ce591d1457 Support var_pop, var_samp, stddev_pop and stddev_samp etc in sql (#7801)
* support var_pop, stddev_pop etc in sql

* fix sql compatible

* rebase on master

* update doc
2019-06-10 09:40:09 -07:00
Clint Wylie 3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Jihoon Son 61ec521135
Remove keepSegmentGranularity option for compaction (#7747)
* Remove keepSegmentGranularity option from compaction

* fix it test

* clean up

* remove from web console

* fix test
2019-06-03 12:59:15 -07:00
Jihoon Son e289820bbd Add a script to find missing backports (#7817) 2019-06-03 07:56:52 -07:00
Eyal Yurman 69e9b8a464 Enables SQL by default. (#7808) 2019-05-31 20:53:42 -07:00
Justin Borromeo 8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jonathan Wei 83152a7a00 Fix performance-faq and remove insert-segment-to-db redirects (#7759) 2019-05-24 13:20:02 -07:00
Jonathan Wei cfb7756c9b Fix references to removed performance FAQ page (#7755) 2019-05-24 11:52:40 -07:00
Jonathan Wei eb0e1a056c Add limit to timeseries docs (#7750) 2019-05-23 19:41:52 -07:00
Jonathan Wei f2e34a76bd Fix TOC clustering example link (#7749) 2019-05-23 19:41:27 -07:00
Jonathan Wei ec4d09a02f Remove obsolete isExcluded config from Kerberos authenticator (#7745) 2019-05-23 16:00:05 -07:00
awelsh93 6964ac23a2 Adding influxdb emitter as a contrib extension (#7717)
* Adding influxdb emitter as a contrib extension

* addressing code review comments
2019-05-23 11:11:48 -07:00
Fangjin Yang 3dec5cd1e4
reorganizing the ToC (#7734) 2019-05-23 09:24:38 -07:00
gocho1 bd899b9224 add s3 authentication method informations (#7674)
* add s3 authentication method informations

* add druid.s3.fileSessionCredentials related content

* remove authentication parameters to avoid confusion as it is more detailed in S3 Deep Storage page

* streamline s3 docs
2019-05-22 11:46:02 -07:00
Gian Merlino cbbce955de SQL: Allow NULLs in place of optional arguments in many functions. (#7709)
* SQL: Allow NULLs in place of optional arguments in many functions.

Also adjust SQL docs to describe how to make time literals using
TIME_PARSE (which is now possible in a nicer way).

* Be less forbidden.
2019-05-21 11:54:34 -07:00
Gian Merlino b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
Xue Yu dd7dace70a Add TIMESTAMPDIFF sql support (#7695)
* add timestampdiff sql support

* feedback address
2019-05-21 08:05:38 -07:00
Vadim Ogievetsky 156322932f Update Druid Console docs for 0.15.0 (#7697)
* Update Druid Console docs for 0.15.0

* SQL -> query

* added links and fix typos
2019-05-21 04:00:42 -07:00
andrewluotechnologies 1add566411 Fix typo (ComplexMetricSerde class name was spelled incorrectly) (#7694) 2019-05-19 09:49:54 -07:00
Jihoon Son 94721de141 Add auto tagging milestone script (#7677)
* Add auto tagging milestone script

* fix usage

* missing newline

* missing newline
2019-05-16 23:11:16 -07:00
Clint Wylie 939b417379 Update tutorial-kafka.md (#7678) 2019-05-16 23:10:45 -07:00
Jonathan Wei d99f77a01b
Add option to use YARN RM as fallback for JobHistory failure (#7673)
* Add option to use YARN RM as fallback for job status

* PR comments
2019-05-16 13:59:10 -07:00
Fangjin Yang dc85a5309e
some more doc improvements (#7675) 2019-05-16 13:17:21 -07:00
Jonathan Wei d667655871 Add basic tuning guide, getting started page, updated clustering docs (#7629)
* Add basic tuning guide, getting started page, updated clustering docs

* Add note about caching, fix tutorial paths

* Adjust hadoop wording

* Add license

* Tweak

* Shrink overlord heaps, fix tutorial urls

* Tweak xlarge peon, update peon sizing

* Update Data peon buffer size

* Fix cluster start scripts

* Add upper level _common to classpath

* Fix cluster data/query confs

* Address PR comments

* Elaborate on connection pools

* PR comments

* Increase druid.broker.http.maxQueuedBytes

* Add guidelines for broker backpressure

* PR comments
2019-05-16 11:13:48 -07:00
Benedict Jin 3df364c472 Fix broken links in api-reference.md (#7670) 2019-05-15 18:53:34 -07:00
Clint Wylie c2abbc24a7 minor web console doc fixes (#7668) 2019-05-15 18:52:51 -07:00
Surekha d3545f5086 Show all server types in sys.servers table (#7654)
* update sys.servers table to show all servers

* update docs

* Fix integration test

* modify test query for batch integration test

* fix case in test queries

* make the server_type lowercase

* Apply suggestions from code review

Co-Authored-By: Himanshu <g.himanshu@gmail.com>

* Fix compilation from git suggestion

* fix unit test
2019-05-15 16:54:02 -07:00
Gian Merlino 0352f450d7 Fix broken links in docs, add broken link checker. (#7658)
Also adds back insert-segment-to-db.md with some docs about why and
when it was removed (in #6911).
2019-05-15 14:49:50 -07:00
Surekha 917106985f Update tutorial to delete data (#7577)
* Update tutorial to delete data

* update tutorial, remove old ways to drop data

* PR comments
2019-05-15 14:40:06 -07:00
Jonathan Wei e874da7cea
Add simpler permissions option to BasicAuthorizer GET APIs (#7635)
* Add simpler permissions option to BasicAuthorizer GET APIs

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>
2019-05-15 12:59:32 -07:00
Clint Wylie b87c8f0314 fix lookup editor to use lookup tiers instead of historical tiers (#7647)
* fix lookup editor to use lookup tiers instead of historical tiers

* use default tier if empty response, fix if configured lookups is null

* fixes

* fix typo
2019-05-14 13:30:51 -07:00
Alexander Saydakov ca1a6649f6 Datasketches quantiles more post-aggs (#7550)
* rank and CDF post-aggs

* added post-aggs to the module

* added new post-aggs

* moved post-agg IDs

* moved post-agg IDs
2019-05-10 11:46:54 -07:00
Clint Wylie 402d76a10f make-redirects.py requires python3, explicitly specify it (#7625) 2019-05-09 21:32:58 -07:00
Clint Wylie 6a6c6d573d
Add plain text README.txt, use relative link from README.md to build.md (#7611)
* use relative link to build instructions from top level readme

* add textfile to readme

* formatting

* make README.BINARY plaintext, move LABELS.md to LABELS, README.txt to README

* exclude README.BINARY still

* remove jdk links/recommmendations

* add script to use DRUIDVERSION in textfile README instead of latest, add links to recommended jdk to build.md

* license

* better readme template, links to latest if does not detect an apache release version

* fix
2019-05-09 21:29:26 -07:00
Samarth Jain b542bb9f34 TDigest backed sketch aggregators (#7331)
* First set of changes for tDigest histogram

* Add license

* Address code review comments

* Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments.

* Remove synchronization from BufferAggregators. Address code review comments

* Fix typo
2019-05-09 17:22:55 -07:00
Magnus Henoch 2ac112151f Fix formatting in scan query documentation (#7622)
Escape underscores in `__time`, so they're not interpreted as bold
formatting.
2019-05-09 11:32:37 -07:00
Jinseon Lee 0ef435a16c add postgresql meta db table schema configuration property (#7137) (#7183)
* add postgresql meta db table schema configuration property (#7137)

If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public

* create postgresql metadb table schema configuration property (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java

* modify postgresql readme file. - metadb table schema (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java
2019-05-08 12:56:30 -07:00
Jonathan Wei dadf6a2f11
Add tool for migrating from local deep storage/Derby metadata (#7598)
* Add tool for migrating from local deep storage/Derby metadata

* Split deep storage and metadata migration docs

* Support import into Derby

* Fix create tables cmd

* Fix create tables cmd

* Fix commands

* PR comment

* Add -p
2019-05-06 23:39:40 -07:00
Jonathan Wei 7c2ca474da Add single-machine deployment example cfgs and scripts (#7590)
* Add single-machine deployment example cfgs and scripts

* Add (8u92+)

* Use combined coordinator-overlord for single machine confs

* RAT fix
2019-05-06 19:11:13 -07:00
Gian Merlino 727b65c7e5 Remove SQL experimental banner and other doc adjustments. (#7591)
* Remove SQL experimental banner and other doc adjustments.

Also,

- Adjust the ToC and other docs a bit so SQL and native queries are
  presented on more equal footing.
- De-emphasize querying historicals and peons directly in the
  native query docs. This is a really niche thing and may have been
  confusing to include prominently in the very first paragraph.
- Remove DataSketches and Kafka indexing service from the experimental
  features ToC. They are not experimental any longer and were there in
  error.

* More notes.

* Slight tweak.

* Remove extra extra word.

* Remove RT node from ToC.
2019-05-06 12:31:51 -07:00
Samarth Jain afbcb9c07f Improve parallelism of zookeeper based segment change processing (#7088)
* V1 - improve parallelism of zookeeper based segment change processing

* Create zk nodes in batches. Address code review comments.
Introduce various configs.

* Add documentation for the newly added configs

* Fix test failures

* Fix more test failures

* Remove prinstacktrace statements

* Address code review comments

* Use a single queue

* Address code review comments

Since we have a separate load peon for every historical, just having a single SegmentChangeProcessor
task per historical is enough. This commit also gets rid of the associated config druid.coordinator.loadqueuepeon.curator.numCreateThreads

* Resolve merge conflict

* Fix compilation failure

* Remove batching since we already have a dynamic config maxSegmentsInNodeLoadingQueue that provides that control

* Fix NPE in test

* Remove documentation for configs that are no longer needed

* Address code review comments

* Address more code review comments

* Fix checkstyle issue

* Address code review comments

* Code review comments

* Add back monitor node remove executor

* Cleanup code to isolate null checks  and minor refactoring

* Change param name since it conflicts with member variable name
2019-05-03 15:58:42 +02:00
Jonathan Wei a013350018 Adjust required permissions for system schema (#7579)
* Adjust required permissions for system schema

* PR comments, fix current_size handling

* Checkstyle

* Set curr_size instead of current_size

* Adjust information schema docs

* Fix merge conflict

* Update tests
2019-05-02 07:18:02 -07:00
Surekha 15d19f3059 Add is_overshadowed column to sys.segments table (#7425)
* Add is_overshadowed column to sys.segments table

* update docs

* Rename class and variables

* PR comments

* PR comments

* remove unused variables in MetadataResource

* move constants together

* add getFullyOvershadowedSegments method to ImmutableDruidDataSource

* Fix compareTo of SegmentWithOvershadowedStatus

* PR comment

* PR comments

* PR comments

* PR comments

* PR comments

* fix issue with already consumed stream

* minor refactoring

* PR comments
2019-05-01 18:00:57 +02:00
Gian Merlino c648775b5b SQL: Remove "useFallback" feature. (#7567)
This feature allows Calcite's Bindable interpreter to be bolted on
top of Druid queries and table scans. I think it should be removed for
a few reasons:

1. It is not recommended for production anyway, because it generates
unscalable query plans (e.g. it will plan a join into two table scans
and then try to do the entire join in memory on the broker).
2. It doesn't work with Druid-specific SQL functions, like TIME_FLOOR,
REGEXP_EXTRACT, APPROX_COUNT_DISTINCT, etc.
3. It makes the SQL planning code needlessly complicated.

With SQL coming out of experimental status soon, it's a good opportunity
to remove this feature.
2019-04-28 18:26:44 -07:00
Eyal Yurman f02251ab2d Contributing Moving-Average Query to open source. (#6430)
* Contributing Moving-Average Query to open source.

* Fix failing code inspections.

* See if explicit types will invoke the correct comparison function.

* Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter.

* Update styling and headers for complience.

* Refresh code with latest master changes:

* Remove NullDimensionSelector.
* Apply changes of RequestLogger.
* Apply changes of TimelineServerView.

* Small checkstyle fix.

* Checkstyle fixes.

* Fixing rat errors; Teamcity errors.

* Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved.

* Implements some of the review fixes.

* Contributing Moving-Average Query to open source.

* Fix failing code inspections.

* See if explicit types will invoke the correct comparison function.

* Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter.

* Update styling and headers for complience.

* Refresh code with latest master changes:

* Remove NullDimensionSelector.
* Apply changes of RequestLogger.
* Apply changes of TimelineServerView.

* Small checkstyle fix.

* Checkstyle fixes.

* Fixing rat errors; Teamcity errors.

* Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved.

* Implements some of the review fixes.

* More fixes for review.

* More fixes from review.

* MapBasedRow is Unmodifiable. Create new rows instead of modifying existing ones.

* Remove more changes related to datasketches support.

* Refactor BaseAverager startFrom field and add a comment.

* fakeEvents field: Refactor initialization and add comment.

* Rename parameters (tiny change).

* Fix variable name typo in test (JAN_4).

* Fix styling of non camelCase fields.

* Fix Preconditions.checkArgument for cycleSize.

* Add more documentation to RowBucketIterable and other classes.

* key/value comment on in MovingAverageIterable.

* Fix anonymous makeColumnValueSelector returning null.

* Replace IdentityYieldingAccumolator with Yielders.each().

* * internalNext() should return null instead of throwing exception.
* Remove unused variables/prarameters.

* Harden MovingAverageIterableTest (Switch anyOf to exact match).

* Change internalNext() from recursion to iteration; Simplify next() and hasNext().

* Remove unused imports.

* Address review comments.

* Rename fakeEvents to emptyEvents.

* Remove redundant parameter key from computeMovingAverage.

* Check yielder as well in RowBucketIterable#hasNext()

* Fix javadoc.
2019-04-26 17:07:48 -07:00
Adam Peck ebdf07b69f Add reload by interval API (#7490)
* Add reload by interval API
Implements the reload proposal of #7439
Added tests and updated docs

* PR updates

* Only build timeline with required segments
Use 404 with message when a segmentId is not found
Fix typo in doc
Return number of segments modified.

* Fix checkstyle errors

* Replace String.format with StringUtils.format

* Remove return value

* Expand timeline to segments that overlap for intervals
Restrict update call to only segments that need updating.

* Only add overlapping enabled segments to the timeline

* Some renames for clarity
Added comments

* Don't rely on cached poll data
Only fetch required information from DB

* Match error style

* Merge and cleanup doc

* Fix String.format call

* Add unit tests

* Fix unit tests that check for overshadowing
2019-04-26 16:01:50 -07:00
Clint Wylie 09b7700d13 fix docs (#7556) 2019-04-25 22:00:37 -07:00
Justin Borromeo 012ab02bf4 Update select doc disclaimer (#7554) 2019-04-25 19:23:39 -07:00
Surekha 8308ffef1f API to drop data by interval (#7494)
* Add api to drop data by interval

* update to address comments

* unused imports

* PR comments + add tests in SQLMetadataSegmentManagerTest

*  update tests and docs
2019-04-25 14:24:40 -07:00
Jonathan Wei 658fb2b062 Fix bugs in milestone contributor script (#7545)
* Only check PRs in milestone contributor script

* Fix no-pagination bug
2019-04-24 22:11:57 -07:00
Jonathan Wei 8b1a4e18dd Additional Apache branding doc updates (#7524) 2019-04-23 14:39:16 -07:00
Xue Yu 2c8a71f883 Support LPAD and RPAD sql function (#7388)
* lpad and rpad sql function

* feedback address

* feedback address

* add doc and format

* update docs
2019-04-22 14:51:32 -07:00
Jonathan Wei 3487663de9 Adjust approx agg deprecation wording (#7518) 2019-04-19 19:31:50 -07:00
Jonathan Wei 74960e82bf Add more Apache branding to docs (#7515) 2019-04-19 15:52:26 -07:00
Slim Bouguerra 5463ecb979 Fix broken link due to Typo. (#7513)
Change-Id: I5792f89ed6afe945f386058edd44f0400998460a
2019-04-19 09:58:54 -07:00
Jonathan Wei 8078f567aa Update kafka version in tutorials (#7500) 2019-04-17 14:56:29 -07:00
Kazuhito Takeuchi 7c19c92a81 Add ROUND function in druid-sql. (#7224)
* Implement round function in druid-sql

* Return value according to the type of argument

* Fix codes for abnoraml inputs, updated math-expr.md

* Fix assert text

* Fix error messages and refactor codes

* Fix compile error, update sql.md, refactor codes and format tests
2019-04-16 11:15:39 -07:00
Lucas Capistrant 8acad27d99 Enhance the Http Firehose to work with URIs requiring basic authentication (#7145)
* Enhnace the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication

* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials

* Fix checkstyle failure in HttpFirehoseFactory.java

* Update docs and fix TeamCity build with required noinspection

* Indentation cleanup and logic modification for HttpFirehose object stream

* Remove default Empty string password provider in http firehose

* Add JavaDoc for MixIn describing its intended use

* Reverting documentation notation for json code to be inline with rest of doc

* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs

* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
2019-04-15 14:29:01 -07:00
Justin Borromeo 85f10ed0d0 Support querying realtime segments using time-ordered scan queries and fix broken scan queries without time column (#7454)
* Update scan query runner factory to accept SpecificSegmentSpec

*  nit

* Sorry travis

* Improve logging and fix doc

* Bug fix

* Friendlier error msgs and tests to cover bug

* Address Gian's comments

* Fix doc

* Added tests for empty and null column list

* Style

* Fix checking wrong order (looking at query param when it should be
looking at the null-handled order)

* Add test case for null order

* Fix ScanQueryRunnerTest

* Forbidden APIs fixed
2019-04-12 19:08:34 -07:00
zhaojiandong 1d9450da81 Some docs optimization (#6890)
* some markdown docs optimization

* markdown escape
2019-04-12 17:30:57 -07:00
Gian Merlino 2470b3279f SQL: Fix docs for STRING_FORMAT. (#7455) 2019-04-11 21:57:28 -07:00
Gian Merlino a517f8ce49 Coordinator: Allow dropping all segments. (#7447)
Removes the coordinator sanity check that prevents it from dropping all
segments. It's useful to get rid of this, since the behavior is
unintuitive for dev/testing clusters where users might regularly want
to drop all their data to get back to a clean slate.

But the sanity check was there for a reason: to prevent a race condition
where the coordinator might drop all segments if it ran before the
first metadata store poll finished. This patch addresses that concern
differently, by allowing methods in MetadataSegmentManager to return
null if a poll has not happened yet, and canceling coordinator runs
in that case.

This patch also makes the "dataSources" reference in
SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile
before, but it seems necessary to me: it's not final, and it's dereferenced
from multiple threads without synchronization.
2019-04-11 08:45:38 -07:00
Justin Borromeo 408e3e1b2a Remove select execution code from SQL planner (#7416)
* Removed select execution code from SQL planner

* Update doc
2019-04-10 22:32:57 -07:00
Benjamin Hopp 78e6f6fb38 Updated Javascript Affinity config docs (#7441)
Updated with hostname:port rather than IP Address.
2019-04-10 21:44:50 -07:00
Benedict Jin 2f64414ade Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions (#7334)
* Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions

* Fix ImportOrder

* Use RuntimeException instead of OutOfMemoryError according to "Effective Java"

* Simplify

* Patch suggestions
2019-04-10 11:46:29 +08:00
Clint Wylie 89bb43f382 'core' ORC extension (#7138)
* orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs

* change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling

* better docs and tests

* fix it

* formatting

* doc fix

* fix it

* exclude redundant dependencies

* use latest orc-mapreduce, add hadoop jobProperties recommendations to docs

* doc fix

* review stuff and fix binaryAsString

* cache for root level fields

* more better
2019-04-09 09:03:26 -07:00
Justin Borromeo 799c66d9ac Allow max rows and max segments for time-ordered scans to be overridden using the scan query JSON spec (#7413)
* Initial changes

* Fixed NPEs

* Fixed failing spec test

* Fixed failing Calcite test

* Move configs to context

* Validated and added docs

* fixed weird indentation

* Update default context vals in doc

* Fixed allowable values
2019-04-07 20:12:52 -07:00
Clint Wylie e28a15f9f5 fix expressions docs operator table (#7420)
* fix expressions docs operator table

* Update math-expr.md
2019-04-07 20:12:00 -07:00
Justin Borromeo e23fd41fa7 Update SQL doc for planning change (#7415) 2019-04-05 15:14:07 -07:00
Jonathan Wei 0f6cb1e7e0 Update theta/hll sketch doc comparison (#7407) 2019-04-03 15:21:33 -07:00
Gian Merlino 8c104a115c
SQL: Add STRING_FORMAT function. (#7327) 2019-04-03 17:09:54 -04:00
David Glasser 4e23c11345 Make IngestSegmentFirehoseFactory splittable for parallel ingestion (#7048)
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion

* Code review feedback

- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)

* Add missing @JsonProperty

* Integration test passes

* Add unit test

* Remove two FIXME comments from CompactionTask

I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.

* Updates from code review
2019-04-02 14:59:17 -07:00
Xue Yu 78fd5aff21 support radians and degrees in sql (#7336)
* support radians and degrees in sql

* update test case
2019-04-02 12:47:49 -07:00
Qi Shu 134f71d1b4 Add documentation for Druid native query in SQL view of web console (#7381)
* Add docmentation for Druid native query in SQL view of web console

* Edit sentence
2019-04-02 12:20:51 -07:00
Michael Trelinski 347779b17a Zookeeper loss (#6740)
* Update init

Fix bin/init to source from proper directory.

* Fix for Proposal #6518: Shutdown druid processes upon complete loss of ZK connectivity

* Zookeeper Loss:

- Add feature documentation
- Cosmetic refactors
- Variable extractions
- Remove getter

* - Change config key name and reword documentation
- Switch from Function<Void,Void> to Runnable/Lambda
- try { … } finally { … }

* Fix line length too long

* - change to formatted string for logging
- use System.err.println after lifecycle stops

* commenting on makeEnsembleProvider()-created Zookeeper termination

* Add javadoc

* added java doc reference back to apache discussion thread.

* move comment to other class

* favor two-slash comments instead of multiline comments
2019-03-29 15:10:42 -07:00
Justin Borromeo ad7862c58a Time Ordering On Scans (#7133)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Stuff for time-ordered scan query

* Move ScanResultValue timestamp comparator to a separate class for testing

* Licensing stuff

* Change benchmark

* Remove todos

* Added TimestampComparator tests

* Change number of benchmark iterations

* Added time ordering to the scan benchmark

* Changed benchmark params

* More param changes

* Benchmark param change

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* nit

* Decrease segment size for less memory usage

* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order

* Wrote more tests for scan result value sort

* Committing a param change to kick teamcity

* Fixed codestyle and forbidden API errors

* .

* Improved conciseness

* nit

* Created an error message for when someone tries to time order a result
set > threshold limit

* Set to spaces over tabs

* Fixing tests WIP

* Fixed failing calcite tests

* Kicking travis with change to benchmark param

* added all query types to scan benchmark

* Fixed benchmark queries

* Renamed sort function

* Added javadoc on ScanResultValueTimestampComparator

* Unused import

* Added more javadoc

* improved doc

* Removed unused import to satisfy PMD check

* Small changes

* Changes based on Gian's comments

* Fixed failing test due to null resultFormat

* Added config and get # of segments

* Set up time ordering strategy decision tree

* Refactor and pQueue works

* Cleanup

* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues

* WIP

* Sequence stuff is so dirty :(

* Fixed bug introduced by replacing deque with list

* Wrote docs

* Multi-historical setup works

* WIP

* Change so batching only occurs on broker for time-ordered scans

Restricted batching to broker for time-ordered queries and adjusted
tests

Formatting

Cleanup

* Fixed mistakes in merge

* Fixed failing tests

* Reset config

* Wrote tests and added Javadoc

* Nit-change on javadoc

* Checkstyle fix

* Improved test and appeased TeamCity

* Sorry, checkstyle

* Applied Jon's recommended changes

* Checkstyle fix

* Optimization

* Fixed tests

* Updated error message

* Added error message for UOE

* Renaming

* Finish rename

* Smarter limiting for pQueue method

* Optimized n-way merge strategy

* Rename segment limit -> segment partitions limit

* Added a bit of docs

* More comments

* Fix checkstyle and test

* Nit comment

* Fixed failing tests -> allow usage of all types of segment spec

* Fixed failing tests -> allow usage of all types of segment spec

* Revert "Fixed failing tests -> allow usage of all types of segment spec"

This reverts commit ec470288c7.

* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"

This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.

* Check type of segment spec before using for time ordering

* Fix bug in numRowsScanned

* Fix bug messing up count of rows

* Fix docs and flipped boolean in ScanQueryLimitRowIterator

* Refactor n-way merge

* Added test for n-way merge

* Refixed regression

* Checkstyle and doc update

* Modified sequence limit to accept longs and added test for long limits

* doc fix

* Implemented Clint's recommendations
2019-03-28 14:37:09 -07:00
Surekha be318f4de3 Add column type to sys table docs (#7359)
* Add column type

* oops should be used=1
2019-03-27 20:21:57 -07:00
Charles Allen eeb3dbe79d Move GCP to a core extension (#6953)
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
2019-03-27 09:00:43 -07:00
Justin Borromeo c7fea6ac8f Added better QueryInterruptedException error message for UnsupportedOperationException (#7248)
* Added error message for UOE

* Updated docs

* Doc change

* Doc change
2019-03-26 15:20:24 -07:00
Gian Merlino 4ca5fe0f60 SQL: Add PARSE_LONG function. (#7326)
* SQL: Add PARSE_LONG function.

* Fix test.
2019-03-22 15:40:10 -07:00
Vadim Ogievetsky e4f2dcacf2 Druid console docs (#7300)
* console docs

* fix typo
2019-03-21 00:37:33 -07:00
Justin Borromeo ff94bd16e6 Fix conflicting information in configuration doc (#7299)
* Doc fix

* Fix typo
2019-03-19 14:55:58 -07:00
Qi Shu 5406aaa49d Add SQL auto complete in druid console (#7244)
* Add SQL auto complete in druid console

* Add comment in sql.md to alert user to change create-sql-function-doc if sql.md format gets changed
2019-03-16 01:45:53 -07:00
Jihoon Son 892d1d35d6
Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Atul Mohan 2daeb50008 Add support for optional client authentication on TLS (#7250)
* Add optional client auth

* Add docs
2019-03-15 15:14:34 -07:00
Hongze Zhang f9d99b245b Add missing doc link for operations/http-compression.html; Fix magic numbers in test cases using JettyServerInitUtils.wrapWithDefaultGzipHandler (#7110) 2019-03-13 14:09:19 -07:00
Clint Wylie 3895914aa2 consolidate CompressionUtils.java since now in the same jar (#6908) 2019-03-13 11:02:44 -04:00
Gian Merlino 9178793ab5 Further improve caching documentation. (#7236)
Follow-up to #7223 that fixes a doc bug (a result-level cache property
was misspelled), changes the recommended "small cluster" threshold from
20 to 5 servers, and clarifies behavior of the various caching options.
2019-03-11 17:57:00 -07:00
Pierre-Emile Ferron a88fbcd5db Improve caching doc (#7223)
- Set correct default values for query context result cache parameters
- Add details about broker cache impact on local historical merging
2019-03-11 20:06:28 -04:00
Venkatraman P 3118160387 Adding a tutorial in doc for using Kerberized Hadoop as deep storage. (#6863)
* Adding a tutorial in doc for using Kerberized Hadoop as deep storage.

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

Fixed - to ~ in Apache License section.

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md
2019-03-11 11:39:15 -07:00
Jonathan Wei e1d8c17746 Add commit ID milestone helper script (#7100)
* Add commit ID milestone helper script

* Filter on merged/closed in API call
2019-03-11 11:36:07 -07:00
Jonathan Wei 94463b5778 Add missing redirects and fix broken links (#7213)
* Add missing redirects

* Fix zookeeper redirect

* Fix broken links
2019-03-09 15:16:23 -08:00
jorbay-au 62f0de9b89 Remove outdated instruction for rule updates (#7205) 2019-03-08 16:42:08 -08:00
Clint Wylie a44df6522c rename maintenance mode to decommission (#7154)
* rename maintenance mode to decommission

* review changes

* missed one

* fix straggler, add doc about decommissioning stalling if no active servers

* fix missed typo, docs

* refine docs

* doc changes, replace generals

* add explicit comment to mention suppressed stats for balanceTier

* rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs

* fix precondition check

* decommissioningMaxPercentOfMaxSegmentsToMove

* fix test

* fix test

* fixes
2019-03-08 16:33:51 -08:00
Jihoon Son e48a9c138e Reduce default max # of subTasks to 1 for native parallel task (#7181)
* Reduce # of max subTasks to 2

* fix typo and add more doc

* add more doc and link

* change default and add warning

* fix doc

* add test

* fix it test
2019-03-05 22:06:36 -08:00
Jonathan Wei 9183e32876 Add more approximate algorithm docs (#7195) 2019-03-05 16:44:02 -08:00
Xue Yu 65118277a3 support sin cos etc trigonometric function in sql (#7182)
* support triangle function in sql

* feedback address
2019-03-04 19:18:22 -08:00
Jonathan Wei 5486c2abf8
Update LICENSE and NOTICE files (#7026)
* Update LICENSE and NOTICE files

* Update react-table version
2019-03-04 18:45:22 -08:00
Roman Leventov 10c9f6d708
Fix and document concurrency of EventReceiverFirehose and TimedShutoffFirehose; Refine concurrency specification of Firehose (#7038)
#### `EventReceiverFirehoseFactory`
Fixed several concurrency bugs in `EventReceiverFirehoseFactory`:
 - Race condition over putting an entry into `producerSequences` in `checkProducerSequence()`.
 - `Stopwatch` used to measure time across threads, but it's a non-thread-safe class.
 - Use `System.nanoTime()` instead of `System.currentTimeMillis()` because the latter are [not suitable](https://stackoverflow.com/a/351571/648955)  for measuring time intervals.
 - `close()` was not synchronized by could be called from multiple threads concurrently.

Removed unnecessary `readLock` (protecting `hasMore()` and `nextRow()` which are always called from a single thread). Removed unnecessary `volatile` modifiers.

Documented threading model and concurrent control flow of `EventReceiverFirehose` instances.

**Important:** please read the updated Javadoc for `EventReceiverFirehose.addAll()`. It allows events from different requests (batches) to be interleaved in the buffer. Is this OK?

#### `TimedShutoffFirehoseFactory`
- Fixed a race condition that was possible because `close()` that was not properly synchronized.

Documented threading model and concurrent control flow of `TimedShutoffFirehose` instances.

#### `Firehose`

Refined concurrency contract of `Firehose` based on `EventReceiverFirehose` implementation. Importantly, now it states that `close()` doesn't affect `hasMore()` and `nextRow()` and could be called concurrently with them. In other words, specified that `close()` is for "row supply" side rather than "row consume" side. However, I didn't check that other `Firehose` implementatations adhere to this contract.

<hr>

This issue is the result of reviewing `EventReceiverFirehose` and `TimedShutoffFirehose` using [this checklist](https://medium.com/@leventov/code-review-checklist-java-concurrency-49398c326154).
2019-03-04 18:50:03 -03:00
Jihoon Son ded03d9d4c Improve doc for auto compaction (#7117)
* Improve doc for auto compaction

* fix doc

* address comments
2019-03-02 12:21:50 -08:00
Jihoon Son 45f12de9ad Fix supported file formats for Hadoop vs Native batch doc (#7069)
* Fix supported file formats

* address comment
2019-02-28 19:44:45 -08:00
Jonathan Wei 32c418fdd8 Reword 'node' to 'process' (#7172) 2019-02-28 18:10:39 -08:00
Jonathan Wei a0afd7931d
Add web consoles doc page (#7123)
* Add web consoles doc page

* PR comments

* Remove 'unified'

* PR comments

* Fix TOC

* PR comments

* More revisions

* GUI -> UI

* Update router docs

* Reword router doc
2019-02-28 14:02:39 -08:00
Jonathan Wei 0b4f771062 Exclude hadoop-lzo from thrift-extensions build (#7151) 2019-02-27 19:57:53 -08:00
Jonathan Wei 3d247498ef Update tutorials for 0.14.0-incubating (#7157) 2019-02-27 19:50:31 -08:00
Jihoon Son 6b232d8195 Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true (#7079)
* Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true

* typo

* add a warning
2019-02-27 16:02:51 -08:00
Jihoon Son 4e2b085201
Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911)
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file

* delete descriptor.file when killing segments

* fix test

* Add doc for ha

* improve warning
2019-02-20 15:10:29 -08:00
Mingming Qiu dd34691004 Coordinator await initialization before finishing startup (#6847)
* Curator server inventory await initialization

* address comments

* print exception object in log

* remove throws ISE

* cachingCost awaitInitialization default to false
2019-02-20 11:56:23 -08:00
David Glasser a81b1b8c9c index_parallel: support !appendToExisting with no explicit intervals (#7046)
* index_parallel: support !appendToExisting with no explicit intervals

This enables ParallelIndexSupervisorTask to dynamically request locks at runtime
if it is run without explicit intervals in the granularity spec and with
appendToExisting set to false.  Previously, it behaved as if appendToExisting
was set to true, which was undocumented and inconsistent with IndexTask and
Hadoop indexing.

Also, when ParallelIndexSupervisorTask allocates segments in the explicit
interval case, fail if its locks on the interval have been revoked.

Also make a few other additions/clarifications to native ingestion docs.

Fixes #6989.

* Review feedback.

PR description on GitHub updated to match.

* Make native batch ingestion partitions start at 0

* Fix to previous commit

* Unit test. Verified to fail without the other commits on this branch.

* Another round of review

* Slightly scarier warning
2019-02-20 10:54:26 -08:00
Surekha 2b04e6d0bc add note on consistency of results for sys.segments queries (#7034)
* add doc

* change docs

* PR comments

* few more changes
2019-02-19 10:52:37 -08:00
Clint Wylie cadb6c5280 Missing Overlord and MiddleManager api docs (#7042)
* document middle manager api

* re-arrange

* correction

* document more missing overlord api calls, minor re-arrange of some code i was referencing

* fix it

* this will fix it

* fixup

* link to other docs
2019-02-19 10:52:05 -08:00
Surekha 80a2ef7be4 Support kafka transactional topics (#5404) (#6496)
* Support kafka transactional topics

* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests

* Remove unused import from test

* Fix compilation

* Invoke transaction api to fix a unit test

* temporary modification of travis.yml for debugging

* another attempt to get travis tasklogs

* update kafka to 2.0.1 at all places

* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes

* Add deprecated in docs for kafka-eight and kafka-simple extensions

* Remove skipOffsetGaps and code changes for transaction support

* Fix indentation

* remove skipOffsetGaps from kinesis

* Add transaction api to KafkaRecordSupplierTest

* Fix indent

* Fix test

* update kafka version to 2.1.0
2019-02-18 11:50:08 -08:00
scrawfor 0fa9000849 Add Postgresql SqlFirehose (#6813)
* Add Postgresql SqlFirehose

* Fix Code Style.

* Fix style.

* Fix Import Order.

* Add Line Break before package.
2019-02-14 22:52:03 -08:00
awelsh93 ee91e27fe7 Update api-reference.md doc (#7065)
- moving description of coordinator isLeader endpoint
2019-02-14 14:38:09 +00:00
Edward Gan 90c1a54b86 Moments Sketch custom aggregator (#6581)
* Moments Sketch Integration with Druid

* updates, add documentation, fix warnings

* nits

* disallowed base64

* update to druid 0.14
2019-02-13 14:03:47 -08:00
Jihoon Son 970308463d
Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044)
* Add doc for Hadoop-based ingestion vs Native batch ingestion

* add links

* add links
2019-02-13 11:23:08 -08:00
Jihoon Son b1c4a5de0d
Fix and improve doc for partitioning of local index (#7064) 2019-02-13 11:20:52 -08:00
Jihoon Son d42de574d6 Add an api to get all lookup specs (#7025)
* Add an api to get all lookup specs

* add doc
2019-02-08 11:05:59 -08:00
Jihoon Son 8e3a58f723
Improve druid.storage.sse.kms.keyId and druid.s3.protocol (#7012)
* Improve druid.storage.sse.kms.keyId and druid.s3.protocol

* fix article
2019-02-06 15:00:51 -08:00
Jihoon Son 75c70c2ccc Add doc for S3 permissions settings (#7011)
* Add doc for S3 permissions settings

* add a comment about additional settings
2019-02-05 11:52:09 -08:00
Egor Riashin 97b6407983 maintenance mode for Historical (#6349)
* maintenance mode for Historical

forbidden api fix, config deserialization fix

logging fix, unit tests

* addressed comments

* addressed comments

* a style fix

* addressed comments

* a unit-test fix due to recent code-refactoring

* docs & refactoring

* addressed comments

* addressed a LoadRule drop flaw

* post merge cleaning up
2019-02-04 18:11:00 -08:00
Jonathan Wei 953b96d0a4 Add more sketch aggregator support in Druid SQL (#6951)
* Add more sketch aggregator support in Druid SQL

* Add docs

* Tweak module serde register

* Fix tests

* Checkstyle

* Test fix

* PR comment

* PR comment

* PR comments
2019-02-02 22:34:53 -08:00
Surekha 7baa33049c Introduce published segment cache in broker (#6901)
* Add published segment cache in broker

* Change the DataSegment interner so it's not based on DataSEgment's equals only and size is preserved if set

* Added a trueEquals to DataSegment class

* Use separate interner for realtime and historical segments

* Remove trueEquals as it's not used anymore, change log message

* PR comments

* PR comments

* Fix tests

* PR comments

* Few more modification to

* change the coordinator api
* removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments
* Change the poll behaviour to avoid multiple poll execution at same time

* minor changes

* PR comments

* PR comments

* Make the segment cache in broker off by default

* Added a config to PlannerConfig
* Moved MetadataSegmentView to sql module

* Add doc for new planner config

* Update documentation

* PR comments

* some more changes

* PR comments

* fix test

* remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR

* minor changes

* some changes to initialization

* use pollPeriodInMS

* Add boolean cachePopulated to check if first poll succeeds

* Remove poll from start()

* take the log message out of condition in stop()
2019-02-02 22:27:13 -08:00
Justin Borromeo 6430ef8e1b lol (#6985) 2019-02-01 14:21:13 -08:00
Clint Wylie 7a5827e12e bloom filter sql aggregator (#6950)
* adds sql aggregator for bloom filter, adds complex value serde for sql results

* fix tests

* checkstyle

* fix copy-paste
2019-02-01 13:54:46 -08:00
lxqfy e45f9ea5e9 Update metrics.md (#6976) 2019-02-01 13:40:44 -08:00
jorbay-au 852fe86ea2 Remove repeated word in indexing-service.md (#6983) 2019-02-01 13:38:22 -08:00
Furkan KAMACI 185a7d4fc5 Updated definition and added link for Zookeeper connection string. (#6961)
* Updated definition and added link for Zookeeper connection string.

* Conflicts are merged.
2019-01-31 10:14:42 -08:00
Gian Merlino 54735a5ad1 Kafka indexing: Remove experimental notice. (#6970) 2019-01-31 09:54:22 -08:00
Surekha 4c211ab2b4 update sys table docs (#6955)
* update sys table docs

* Capitalize SQL
2019-01-31 08:51:39 -08:00
Jonathan Wei 82137874ea Add master/data/query server concepts to docs/packaging (#6916)
* Add master/data/query server concepts to docs/packaging

* PR comments

* TOC and markdown fix

* Update image legend

* PR comment

* More PR comments
2019-01-30 19:41:07 -08:00
Jihoon Son d4fbbb8deb Support protocol configuration for S3 (#6954)
* Support protocol configuration for S3

* Add doc
2019-01-30 19:32:00 -08:00
Gian Merlino edee576a7a Add doc for druid.storage.useS3aSchema. (#6964) 2019-01-30 10:26:37 -08:00
Clint Wylie a6d81c0d16 Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397)
* blooming aggs

* partially address review

* fix docs

* minor test refactor after rebase

* use copied bloomkfilter

* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests

* add methods to BloomKFilter to get number of set bits, use in comparator, fixes

* more docs

* fix

* fix style

* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets

* oof, more fixes

* more sane docs example

* fix it

* do the right thing in the right place

* formatting

* fix

* avoid conflict

* typo fixes, faster comparator, docs for comparator behavior

* unused imports

* use buffer comparator instead of deserializing

* striped readwrite lock for buffer agg, null handling comparator, other review changes

* style fixes

* style

* remove sync for now

* oops

* consistency

* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception

* CardinalityBufferAggregator inspect selectors instead of selectorPluses

* fix style

* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments

* adjustment

* fix teamcity error?

* rename nil aggs to empty, change empty agg constructor signature, add comments

* use stringutils base64 stuff to be chill with master

* add aggregate combiner, comment
2019-01-29 20:05:17 +07:00
Justin Borromeo 8d70ba69cf Fix broken link on select query doc page (#6933)
* Fixed broken link

* Typo fix
2019-01-28 17:02:21 -08:00
Clint Wylie af3cbc3687 add bloom filter druid expression (#6904)
* add "bloom_filter_test" druid expression to support bloom filters in ExpressionVirtualColumn and ExpressionDimFilter and sql expressions

* more docs

* use java.util.Base64, doc fixes
2019-01-28 08:41:45 -05:00
Navin Kumar ae4dba7785 Fix Configuration options (#6884)
Change `druid.metadata.postgres.*` to `druid.metadata.postgres.ssl.*`
2019-01-27 12:35:27 -08:00
Gian Merlino 7c5a06bb85
More docs on data modeling. (#6899)
* More docs on data modeling.

* Try to fix formatting.

* Fix indentation.

* More details and adjustments after feedback.
2019-01-27 11:33:21 -08:00
Janek Lasocki-Biczysko 89f2475369 Move ingest/kafka/* metrics into a separate section on the metrics docs (#6895)
The `ingest/kafka/*` metrics were grouped together with metrics relevant
to RealtimeMetricsMonitor, whereas they should be in their own section.
2019-01-28 00:11:53 +08:00
Jihoon Son 3b020fd81b Improve doc for auto compaction (#6782)
* Improve doc for auto compaction

* address comments

* address comments

* address comments
2019-01-23 16:21:45 -08:00
Justin Borromeo 86e171a234 Doc change and commands tested command on v5 and v8 (#6886) 2019-01-18 15:13:11 -08:00
Jonathan Wei 68f744ec0a
Fixed buckets histogram aggregator (#6638)
* Fixed buckets histogram aggregator

* PR comments

* More PR comments

* Checkstyle

* TeamCity

* More TeamCity

* PR comment

* PR comment

* Fix doc formatting
2019-01-17 14:51:16 -08:00
lxqfy f6dcd63084 Fixed the format of broker client configration (#6878) 2019-01-16 22:57:50 -08:00
Dayue Gao 5b8a221713 Add SQL id, request logs, and metrics (#6302)
* use SqlLifecyle to manage sql execution, add sqlId

* add sql request logger

* fix UT

* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

* add docs and more sql request logger impls

* add UT for http and jdbc

* fix forbidden use of com.google.common.base.Charsets

* fix UT in QuantileSqlAggregatorTest, supressed unused warning of getSqlQueryId

* do not use default method in QueryMetrics interface

* capitalize 'sql' everywhere in the non-property parts of the docs

* use RequestLogger interface to log sql query

* minor bugfixes and add switching request logger

* add filePattern configs for FileRequestLogger

* address review comments, adjust sql request log format

* fix inspection error

* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
2019-01-15 23:12:59 -08:00
Jonathan Wei 9a8bade2fb Update approximate aggregators docs (#6848) 2019-01-11 21:50:51 -08:00
Furkan KAMACI 55927bf8e3 Kafka version is updated (#6835)
Update Kafka version in tutorial from 0.10.2.0 to 0.10.2.2
2019-01-10 17:58:40 -08:00
Jihoon Son c35a39d70b
Add support maxRowsPerSegment for auto compaction (#6780)
* Add support maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00
Furkan KAMACI ea973fee6b Tranquility version is updated (#6824) 2019-01-10 09:46:58 +08:00
dongyifeng def823124c add version comparator for StringComparator (#6745)
* add version comparator for StringComparator

* add more test case and docs
2019-01-08 17:17:03 -08:00
Benjamin Hopp ef80c4e036 Update sql.md (#6821)
Corrected defaults for druid.sql.avatica.maxStatementsPerConnection and druid.sql.avatica.maxConnections
2019-01-08 10:15:12 -08:00
Janek Lasocki-Biczysko b88e6304c4 Fix broken link in ingestion/schema-design.md docs (#6810) 2019-01-06 18:20:53 -08:00
David Glasser c08f391605 statsd-emitter: support constant DogStatsD tags (#6791)
PR #6605 added support to the statsd emitter for DogStatsD tags. This commit
lets you specify "constant tags" in the config file which are included with
every event. This is helpful if you are running in an environment where you
cannot configure your datadog-agent with tags like "cluster name" --- eg, a
Kubernetes cluster with a datadog-agent on each node and different Druid
deployments in different namespaces but sharing the same datadog-agent
daemonset.

Also fix the name of an existing boolean getter to start with 'is'.
2019-01-04 15:35:37 +08:00
thomask 0e04acca43 Show how to include classpath in command (#6802)
Would have saved me some time
2019-01-03 18:31:55 -08:00
Jihoon Son 9ad6a733a5 Add support segmentGranularity for CompactionTask (#6758)
* Add support segmentGranularity

* add doc and fix combination of options

* improve doc
2019-01-03 17:50:45 -08:00
Mingming Qiu 6761663509 make kafka poll timeout can be configured (#6773)
* make kafka poll timeout can be configured

* add doc

* rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS
2019-01-03 12:16:02 +08:00
Mingming Qiu 114a9fc38f change propertyBase in ServerViewModule (#6774) 2019-01-02 16:44:02 +08:00
Clint Wylie 67f832957b add bloom filter operator to general sql docs (#6785) 2018-12-31 11:30:33 -08:00
Joshua Sun 7c7997e8a1 Add Kinesis Indexing Service to core Druid (#6431)
* created seekablestream classes

* created seekablestreamsupervisor class

* first attempt to integrate kafa indexing service to use SeekableStream

* seekablestream bug fixes

* kafkarecordsupplier

* integrated kafka indexing service with seekablestream

* implemented resume/suspend and refactored some package names

* moved kinesis indexing service into core druid extensions

* merged some changes from kafka supervisor race condition

* integrated kinesis-indexing-service with seekablestream

* unite tests for kinesis-indexing-service

* various bug fixes for kinesis-indexing-service

* refactored kinesisindexingtask

* finished up more kinesis unit tests

* more bug fixes for kinesis-indexing-service

* finsihed refactoring kinesis unit tests

* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions

* kinesis-indexing-service code cleanup and docs

* merge #6291

merge #6337

merge #6383

* added more docs and reordered methods

* fixd kinesis tests after merging master and added docs in seekablestream

* fix various things from pr comment

* improve recordsupplier and add unit tests

* migrated to aws-java-sdk-kinesis

* merge changes from master

* fix pom files and forbiddenapi checks

* checkpoint JavaType bug fix

* fix pom and stuff

* disable checkpointing in kinesis

* fix kinesis sequence number null in closed shard

* merge changes from master

* fixes for kinesis tasks

* capitalized <partitionType, sequenceType>

* removed abstract class loggers

* conform to guava api restrictions

* add docker for travis other modules test

* address comments

* improve RecordSupplier to supply records in batch

* fix strict compile issue

* add test scope for localstack dependency

* kinesis indexing task refactoring

* comments

* github comments

* minor fix

* removed unneeded readme

* fix deserialization bug

* fix various bugs

* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix

* minor changes to kinesis

* implement deaggregate for kinesis

* Merge remote-tracking branch 'upstream/master' into seekablestream

* fix kinesis offset discrepancy with kafka

* kinesis record supplier disable getPosition

* pr comments

* mock for kinesis tests and remove docker dependency for unit tests

* PR comments

* avg lag in kafkasupervisor #6587

* refacotred SequenceMetadata in taskRunners

* small fix

* more small fix

* recordsupplier resource leak

* revert .travis.yml formatting

* fix style

* kinesis docs

* doc part2

* more docs

* comments

* comments*2

* revert string replace changes

* comments

* teamcity

* comments part 1

* comments part 2

* comments part 3

* merge #6754

* fix injection binding

* comments

* KinesisRegion refactor

* comments part idk lol

* can't think of a commit msg anymore

* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner

* commmmmmmmmmments

* extra error handling in KinesisRecordSupplier getRecords

* comments

* quickfix

* typo

* oof
2018-12-21 12:49:24 -07:00
Gian Merlino 7a09cde4de
Broker: Await initialization before finishing startup. (#6742)
* Broker: Await initialization before finishing startup.

In particular, hold off on announcing the service and starting the
HTTP server until the server view and SQL metadata cache are finished
initializing. This closes a window of time where a Broker could return
partial results shortly after startup.

As part of this, some simplification of server-lifecycle service
announcements. This helps ensure that the two different kinds of
announcements we do (legacy and new-style) stay in sync.

* Remove unused imports.

* Fix NPE in ServerRunnable.
2018-12-18 20:32:31 -08:00
Jihoon Son 2c380e3a26 Fix doc for automatic compaction (#6749) 2018-12-17 11:44:33 -08:00
Jonathan Wei c713116a75 Use @Coordinator leader client in CoordinatorRuleManager (#6729) 2018-12-16 15:18:09 -08:00
Gian Merlino 04e7c7fbdc FilteredRequestLogger: Fix start/stop, invalid delegate behavior. (#6637)
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.

Fixes two bugs:

1) FilteredRequestLogger did not start/stop the delegate.

2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.

* Remove unneeded annotations.
2018-12-14 16:55:44 +08:00
Clint Wylie 4ec068642d move parquet extension input formats up a level to `org.apache.druid.data.input.parquet.DruidParquetInputFormat` for `parquet` and `org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat` for `parquet-avro` (#6727) 2018-12-13 16:33:42 -08:00
David Lim f7bbee2e65 Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733) 2018-12-13 11:47:20 -08:00
Vadim Ogievetsky da4836f38c Added titles and harmonized docs to improve usability and SEO (#6731)
* added titles and harmonized docs

* manually fixed some titles
2018-12-12 20:42:12 -08:00
Clint Wylie 55914687bb Fix broken link in docs toc (#6728)
Change 'peon.html' to the correct link, 'peons.html'. No redirect is needed because the file has always been 'peons', just an incorrect link was introduced in the toc here https://github.com/apache/incubator-druid/pull/6259/files#diff-45297643736c5fb6da0e92f2c3df5d68R89
2018-12-12 15:14:38 -08:00
Vincent Newkirk cc44a4a28f Correct Documentation for lowerStrict/upperStrict (#6707)
The documentation for Bound filter's lowerStrict/upperStrict is incorrect. It is not consistent with the examples provided and actual behaviour of the bound filter. Correct this.
2018-12-06 10:14:50 -08:00
Mingming Qiu 607339003b Add TaskCountStatsMonitor to monitor task count stats (#6657)
* Add TaskCountStatsMonitor to monitor task count stats

* address comments

* add file header

* tweak test
2018-12-04 13:37:17 -08:00
Clint Wylie a1c9d0add2 autosize processing buffers based on direct memory sizing by default (#6588)
* autosize processing buffers based on direct memory sizing

* remove oops, more test

* max 1gb autosize buffers, test, start of docs

* fix oops

* revert accidental change

* print buffer size in exception

* change the things
2018-12-03 18:40:02 -07:00
David Lim e2bedab665 fix links to use relative references (#6696) 2018-11-30 16:32:10 -08:00
David Lim b332021c49 remove extensions from default configs that have configuration/library dependencies and update docs (#6694) 2018-11-30 12:52:46 -08:00
rcgarcia74 9bf835b84f remove #658 doc reference for Schema-less design (#6693) 2018-11-30 12:53:57 -07:00
Jihoon Son d6539abd0a Fix overlord api and console (#6686)
* Fix overlord APIs and console

* remove getRunningTasksByDataSource

* add missing path to isApplicable
2018-11-29 23:45:28 -08:00
Mingming Qiu c5405bb592 emit maxLag/avgLag in KafkaSupervisor (#6587)
* emit maxLag/totalLag/avgLag in KafkaSupervisor

* modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility
2018-11-28 02:11:14 -08:00
Mingming Qiu 849ba867b2 fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory (#6656) 2018-11-27 15:59:58 -08:00
Clint Wylie efdec50847 bloom filter sql (#6502)
* bloom filter sql support

* docs

* style fix

* style fixes after rebase

* use copied/patched bloomkfilter

* remove context literal lookup function, changes from review

* fix build

* rename LookupOperatorConversion to QueryLookupOperatorConversion

* remove doc

* revert unintended change

* add internal exception to bloom filter deserialization exception
2018-11-27 14:11:18 +08:00
Evans Hauser 03df481c9c Docs: Fix wikipedia links in Ingestion:Rollup (#6659)
The rendered site doesn't have automatic link detection, so we need to add these links in explicitly. This also fixes the Measure link, which included an extra `)`

http://druid.io/docs/latest/ingestion/index.html#rollup
2018-11-23 16:28:05 -08:00
seoeun 22a5bf97a2 Fix issue that tasks tables in metadata storage are not cleared (#6592)
* tasks tables in metadata storage are not cleared

* address comments. remove tasklogs and revert obsolete changes

* address comments. change comment and update doc.

* address comments. update doc more detailed

* address comments. remove redundant log and update doc more detailed.

* address comments. update document
2018-11-22 11:50:31 +08:00
Jonathan Wei e285b1103d Use PasswordProvider for basic HTTP escalator (#6650) 2018-11-21 07:34:15 -08:00
Caroline1000 a438a9b99c fix typo in config page of docs (#6645) 2018-11-19 16:32:58 -08:00
Deiwin Sarjas e0d1dc5846 Support DogStatsD style tags in statsd-emitter (#6605)
* Replace StatsD client library

The [Datadog package][1] is a StatsD compatible drop-in replacement for the
client library, but it seems to be [better maintained][2] and has support for
Datadog DogStatsD specific features, which will be made use of in a subsequent
commit.

The `count`, `time`, and `gauge` methods are actually exactly compatible with
the previous library and the modifications shouldn't be required, but EasyMock
seems to have a hard time dealing with the variable arguments added by the
DogStatsD library and causes tests to fail if no arguments are provided for the
last String vararg. Passing an empty array fixes the test failures.

[1]: https://github.com/DataDog/java-dogstatsd-client
[2]: https://github.com/tim-group/java-statsd-client/issues/37#issuecomment-248698856

* Retain dimension key information for StatsD metrics

This doesn't change behavior, but allows separating dimensions from the metric
name in subsequent commits.

There is a possible order change for values from
`dimsBuilder.build().values()`, but from the tests it looks like it doesn't
affect actual behavior and the order of user dimensions is also retained.

* Support DogStatsD style tags in statsd-emitter

Datadog [doesn't support name-encoded dimensions and uses a concept of _tags_
instead.][1] This change allows Datadog users to send the metrics without
having to encode the various dimensions in the metric names. This enables
building graphs and monitors with and without aggregation across various
dimensions from the same data.

As tests in this commit verify, the behavior remains the same for users who
don't enable the `druid.emitter.statsd.dogstatsd` configuration flag.

[1]: https://www.datadoghq.com/blog/the-power-of-tagged-metrics/#tags-decouple-collection-and-reporting

* Disable convertRange behavior for DogStatsD users

DogStatsD, unlike regular StatsD, supports floating-point values, so this
behavior is unnecessary. It would be possible to still support `convertRange`,
even with `dogstatsd` enabled, but that would mean that people using the
default mapping would have some of the gauges unnecessarily converted.

`time` is in milliseconds and doesn't support floating-point values.
2018-11-19 09:47:57 -08:00
Gian Merlino 7cd457f41c Kafka: Add warning to doc for earlyMessageRejectionPeriod. (#6644) 2018-11-18 15:47:38 -07:00
Benjamin Hopp 8a258d3a6a Fix Hadoop Indexing doc to clarify segmentOutputPath only required for CLI indexer (#6636)
* Updated hadoop indexing doc to reflect segmentOutputPath is only required when using CLI indexer; otherwise it must be NULL
2018-11-17 12:19:20 +08:00
Niketh Sabbineni 2ebdce20b1 Fix smile query documentation (#6620) 2018-11-14 08:51:02 +08:00
Jihoon Son cdae2fe7b5 Deprecate IntervalChunkingQueryRunner (#6591)
* Deprecate IntervalChunkingQueryRunner

* add doc

* deprecate metric

* fix doc
2018-11-14 06:33:27 +08:00
Gian Merlino 154b6fbcef SQL: Add "POSITION" function. (#6596)
Also add a "fromIndex" argument to the strpos expression function. There
are some -1 and +1 adjustment terms due to the fact that the strpos
expression behaves like Java indexOf (0-indexed), but the POSITION SQL
function is 1-indexed.
2018-11-13 13:39:00 -08:00
Jihoon Son 7b262b7123 Remove unnecessary path param from auto compaction api (#6594)
* Remove unnecessary path param from auto compaction api

* fix ci
2018-11-13 09:46:13 -08:00
David Lim afb239b17a add missing license headers, in particular to MD files; clean up RAT … (#6563)
* add missing license headers, in particular to MD files; clean up RAT exclusions

* revert inadvertent doc changes

* docs

* cr changes

* fix modified druid-production.svg
2018-11-13 09:38:37 -08:00
Clint Wylie 1224d8b746 overhaul 'druid-parquet-extensions' module, promoting from 'contrib' to 'core' (#6360)
* move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql

* fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays

* remove leftover print

* convert micro timestamp to millis

* checkstyle

* add ignore for .parquet and .parq to rat exclude

* fix legit test failure from avro flattern behavior change

* fix rebase

* add exclusions to pom to cut down on redundant jars

* refactor tests, add support for unwrapping lists for parquet-avro, review comments

* more comment

* fix oops

* tweak parquet-avro list handling

* more docs

* fix style

* grr styles
2018-11-05 21:33:42 -08:00
David Lim 23ad3d214c fixup docs to download from Apache mirror, fixup tarball name and path, change references from quickstart/* to quickstart/tutorial/* (#6570) 2018-11-01 21:47:29 -07:00
Caroline1000 26d992840c correct default tier name (#6568) 2018-11-01 17:51:13 -07:00
QiuMM ddd15a6907 correct default value for maxTotalRows (#6566) 2018-11-01 16:53:15 -07:00
Jihoon Son a92c2a197b
Move supervisor APIs to api-reference (#6555)
* Move supervisor APIs to api-reference

* fix kafka-specific docs

* add ingestion stats report
2018-11-01 13:10:05 -07:00
QiuMM 7b34662462 Period load/drop/broadcast rules should include the future by default (#6414)
* Period load/drop/broadcast rules should include the future by default

* address comments

* adjust coordinator console and tweak docs

* address comments

* fix travis-ci
2018-11-01 09:43:34 -07:00
Jihoon Son d2a533c7c7 Add doc for missing balancerComputeThreads configuration (#6561)
* Add doc for missing balancerComputeThreads configuration

* remove duplicate
2018-10-31 18:43:12 -07:00
taiii b1159174b7 Update mysql.md (#6545) 2018-10-30 14:01:32 -07:00
Jonathan Wei 8382764900 Remove unused bin/init script, conf-quickstart reference (#6520) 2018-10-26 11:30:01 -07:00
Jonathan Wei b2d9b6f23d Allow custom TLS cert checks (#6432)
* Allow custom TLS cert checks

* PR comment

* Checkstyle, PR comment
2018-10-24 16:31:52 -07:00
QiuMM 601183b4c7 Add period drop before rule (#6415)
* Add period drop before rule

* add license header

* support period drop before rule in coordinator console

* address comments
2018-10-24 12:44:30 -07:00
David Lim 822e564f54 include mysql-metadata-storage extension in distribution, but without… (#6497)
* include mysql-metadata-storage extension in distribution, but without the GPL-licensed connector library

* Install mysql connector package

* use symlinks to avoid versioning issues

* add documentation for fetching the mysql connector
2018-10-20 18:18:58 -07:00
QiuMM f5f4171a45 QueryCountStatsMonitor: emit query/count (#6473)
Let `QueryCountStatsMonitor` emit `query/count`, then I can monitor QPS of my services, or I have to count it by myself.
2018-10-19 10:15:02 -03:00
patelh c780aacc03 Add ability to specify dbcp properties file (#6419)
* Add ability to specify dbcp properties file

* Address PR comments, use mock config, remove setter

* Add documentation

* APRC, updated docs with example file contents

* APRC, add @Nullable, @VisibileForTesting, update doc

* APRC, remove error log, use props directly as jackson binding

* Remove unused files
2018-10-16 12:27:19 -07:00
QiuMM 85a89e2703 make druid node bind address configurable (#6464)
* make druid node bind address configurable

* fix tests

* fix travis-ci
2018-10-15 14:19:40 -07:00
robertervin 95ab1ea737 Fix Empty InDimFilter Failure (#6330)
* fix empty InDimFilter failure (#6101)

* Add test case for empty values input

* Add documentation for empty values in InDimFilter
2018-10-14 20:43:16 -07:00
Clint Wylie 84598fba3b combine druid-api, druid-common, java-util into druid-core (#6443)
* combine druid-api, druid-common, java-util

* spacing
2018-10-14 20:37:37 -07:00
dongyifeng b06ac54a5e add PrefixFilteredDimensionSpec for multi-value dimensions (#6307)
* add PrefixFilteredDimensionSpec for multi-value dimensions

* add docs for PrefixFilteredDimensionSpec

* remove unnecessary null handling

* add null check to the result of NullHandling
2018-10-12 17:51:09 -07:00
vishnu rao 6567fff9e7 Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type' header (#4033)
* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header

* o- Query Response format to be based on http 'accept' header & Query Payload contenty type to be based on 'content-type' header
o- if Accept header is absent, it defaults to Content-Type header

* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type'  PR #4033
Minor change to a comment - restoring to previous wording

* Feature: Query Response format to be based on http 'accept' header & Query Payload content type to be based on 'content-type'  PR #4033
o- minor change to check for empty string
2018-10-12 14:29:14 -07:00
Atul Mohan ab7b4798cc Securing passwords used for SSL connections to Kafka (#6285)
* Secure credentials in consumer properties

* Merge master

* Refactor property population into separate method

* Fix property setter

* Fix tests
2018-10-11 10:03:01 -07:00
QiuMM f8f4526b16 Add suspend|resume|terminate all supervisors endpoints. (#6272)
* ability to showdown all supervisors

* add doc

* address comments

* fix code style

* address comments

* change ternary assignment to if statement

* better docs
2018-10-10 21:41:59 -07:00
Clint Wylie f7775d1db3 fixes for LookupReferencesManagerTest (#6444)
* some fixes for LookupReferencesManagerTest

* docs

* formatting

* more formatting fixes
2018-10-10 18:02:11 -07:00
Surekha 3a0a667fe0 Introduce SystemSchema tables (#5989) (#6094)
* Added SystemSchema with following tables (#5989)

* SEGMENTS table provides details on served and published segments
* SERVERS table provides details on data servers
* SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS
* TASKS table provides details on tasks

* Add documentation for system schema

* Fix static-analysis warnings

* Address PR comments

*Add unit tests

* Fix a test

* Try to fix a test

* Fix a bug around replica count

* rename io.druid to org.apache.druid

* Major change is to make tasks and segment queries streaming

* Made tasks/segments stream to calcite instead of storing it in memory
* Add num_rows to segments table
* Refactor JsonParserIterator
* Replace with closeable iterator

* Fix docs, make num_rows column nullable, some unit test changes

* make num_rows column type long, allow it to be null

fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler

* Filter null rows for segments table from Linq4j enumerable

* change num_replicas datatype to long in segments table

* Fix some tests and address comments

* Doc updates, other PR comments

* Update tests

* Address comments

* Add auth check
* Update docs
* Refactoring

* Fix teamcity warning, change the getQueryableServer in TimelineServerView

* Fix compilation after rebase

* Use the stream API from AuthorizationUtils

* Added LeaderClient interface and NoopDruidLeaderClient class

* Revert "Added LeaderClient interface and NoopDruidLeaderClient class"

This reverts commit 100fa46e39.

* Make the naming consistent to server_segments for the join table

* Add ForbiddenException on auth check failure
* Remove static block from SystemSchema

* Try to fix a test in CalciteQueryTest due to rename of server_segments

* Fix the json output format in the coordinator API

* Add auth check in the segments API
* Add null check to avoid NPE

* Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark

* Fix test failures, type long/BIGINT can be nullable

* Revert long nullability to fix tests

* Fix style for tests

* PR comments

* Address PR comments

* Add the missing BytesAccumulatingResponseHandler class

* Use Sequences.withBaggage in DruidPlanner

* Fix docs, add comments

* Close the iterator if hasNext returns false
2018-10-10 17:17:29 -07:00
QiuMM d559dfecb2 replace deprecated druid.port by druid.plaintextPort in docs (#6427) 2018-10-09 10:57:01 -07:00
Jihoon Son 88d23b77b7 Add support keepSegmentGranularity for automatic compaction (#6407)
* Add support keepSegmentGranularity for automatic compaction

* skip unknown dataSource

* ignore single semgnet to compact

* add doc

* address comments

* address comment
2018-10-07 16:48:58 -07:00
Jihoon Son 45aa51a00c Add support hash partitioning by a subset of dimensions to indexTask (#6326)
* Add support hash partitioning by a subset of dimensions to indexTask

* add doc

* fix style

* fix test

* fix doc

* fix build
2018-10-06 16:45:07 -07:00
Roman Leventov c5872bef41 Improve GC metrics documentation (#6423) 2018-10-05 14:57:01 -07:00
Gian Merlino 244046fda5 SQL: Fix too-long headers in http responses. (#6411)
Fixes #6409 by moving column name info from HTTP headers into the
result body.
2018-10-01 18:13:08 -07:00
Jihoon Son cb14a43038 Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask (#6393)
* Remove ConvertSegmentTask, HadoopConverterTask, and ConvertSegmentBackwardsCompatibleTask

* update doc and remove auto conversion

* remove remaining doc

* fix teamcity
2018-10-01 12:03:35 -07:00
Shiv Toolsidass a56ffe6ab2 Added backpressure metric to docs and defaultMetricDimensions (#6405)
* Added backpressure metric to docs and defaultMetricDimensions.json

* Reworded description for backpressure metric in docs
2018-09-29 17:57:29 -07:00
adursun 6f44e568db Add missing comma (#6399) 2018-09-28 09:02:36 -07:00
QiuMM 47a6cca013 Add TimestampSpec format for microsecond (#6395) 2018-09-27 09:38:44 -07:00
Jihoon Son 6fb503c073 Deprecate task audit logging (#6368)
* Deprecate task audit logging

* fix test

* fix it test
2018-09-26 16:28:02 -07:00
Nishant Bangarwa c9d281a2e9 Add ability to pass in Bloom filter from Hive Queries (#6222)
* Bloom filter initial implementation

fix checkstyle

review comments

Fix wierd failure

review comments

Revert "Fix wierd failure"

This reverts commit a13a83ad7887e679f6d539191b52aeaaea85b613.

* fix test

* review comment
2018-09-26 16:04:26 -07:00
Caroline1000 034d006d24 add to docs on including Drop rule with Load rule (#6378) 2018-09-25 20:13:52 -07:00
Benedict Jin e5d9fcfe8f Add maven.exec.xxx.skip option for exec-maven-plugin (#6162)
* Fix conflicts

* Modify io.druid into org.apache.druid
2018-09-25 10:05:26 -07:00
Jihoon Son 99428e20d2 Deprecate dimensions / metrics APIs on brokers (#6361)
* Deprecate dimensions / metrics APIs on brokers

* add segmentMetadataQuery link

* add more doc
2018-09-24 17:56:38 -07:00
Jonathan Wei ee7b565469 Docs for ingestion stat reports and new parse exception handling (#6373) 2018-09-24 17:45:05 -07:00
Alexander Saydakov 93345064b5 HllSketch module (#5712)
* HllSketch module

* updated license and imports

* updated package name

* implemented makeAggregateCombiner()

* removed json marks

* style fix

* added module

* removed unnecessary import, side effect of package renaming

* use TreadLocalRandom

* addressing code review points, mostly formatting and comments

* javadoc

* natural order with nulls

* typo

* factored out raw input value extraction

* singleton

* style fix

* style fix

* use Collections.singletonList instead of Arrays.asList

* suppress warning
2018-09-24 08:41:56 -07:00
Jonathan Wei f12ffd19a8
Add Kafka reset instructions for tutorial (#6362) 2018-09-21 14:18:31 -07:00
Jonathan Wei 8972244c68 Mutual TLS support (#6076)
* Mutual TLS support

* Kafka test fixes

* TeamCity fix

* Split integration tests

* Use localhost DOCKER_IP

* Increase server thread count

* Increase SSL handshake timeouts

* Add broken pipe retries, use injected client config params

* PR comments, Rat license check exclusion
2018-09-19 09:56:15 -07:00
Dayue Gao edf0c13807 add a sql option to force user to specify time condition (#6246)
* add a sql option to force user to specify time condition

* rename forceTimeCondition to requireTimeCondition, refine error message
2018-09-17 13:52:24 -07:00
QiuMM 288aa4d504 Add missing metadata table information in docs (#6309)
* Add missing metadata table information in doc file

* address review comment
2018-09-14 12:17:05 -07:00
QiuMM 85391e9fb3 fix opentsdb emitter always be running and fail sending tags whose value contains colon (#6251)
* fix opentsdb emitter always be running

* check if emitter started

* add more details about consumeDelay in doc

* fix possible thread unsafe

* fix fail sending tags whose value contain colon
2018-09-14 12:14:15 -07:00
QiuMM 87ccee05f7 Add ability to specify list of task ports and port range (#6263)
* support specify list of task ports

* fix typos

* address comments

* remove druid.indexer.runner.separateIngestionEndpoint config

* tweak doc

* fix doc

* code cleanup

* keep some useful comments
2018-09-13 19:36:04 -07:00
Jonathan Wei fd6786ac6c Fix endpoint permissions section in basic-security docs (#6331) 2018-09-13 15:23:41 -07:00
Clint Wylie 91a37c692d 'suspend' and 'resume' support for supervisors (kafka indexing service, materialized views) (#6234)
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'

* move overlord console status to own column in supervisor table so does not look like garbage

* spacing

* padding

* other kind of spacing

* fix rebase fail

* fix more better

* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests

* fix log
2018-09-13 14:42:18 -07:00
Gian Merlino d6cbdf86c2
Broker backpressure. (#6313)
* Broker backpressure.

Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.

Fixes #4933.

* Fix query context doc.
2018-09-10 09:33:29 -07:00
Gian Merlino 4669f0878f SQL: UNION ALL operator. (#6314)
* SQL: UNION ALL operator.

* Remove unused import.
2018-09-09 22:32:56 -07:00
Clint Wylie e6e068ce60 Add support for 'maxTotalRows' to incremental publishing kafka indexing task and appenderator based realtime task (#6129)
* resolves #5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask

* address review comments

* changes due to review

* merge fail
2018-09-07 13:17:49 -07:00
Jonathan Wei 60cbc64472
Use PasswordProvider, fix info on initial passwords in basic security extension docs (#6303)
* Fix info on initial passwords in basic security extension docs

* Use PasswordProvider

* Compile fix
2018-09-05 17:07:16 -07:00
Jonathan Wei 4caa61d8fa Fix tutorial sample data filename, fix logger classname in metrics docs (#6299) 2018-09-04 21:47:12 -07:00
Eyal Yurman 10ca290d64 Correct file name typo in Quickstart tutorial (#6297)
Correct name wikipedia-2015-09-12-sampled.json.gz to wikiticker-2015-09-12-sampled.json.gz
2018-09-04 14:20:17 -07:00
Jonathan Wei 180e3ccfad
Docs consistency cleanup (#6259) 2018-09-04 12:54:41 -07:00
QiuMM 9b04846e6b correct metric name in doc file (#6271) 2018-08-30 10:57:35 -07:00
Gian Merlino 431d3d8497
Rename io.druid to org.apache.druid. (#6266)
* Rename io.druid to org.apache.druid.

* Fix META-INF files and remove some benchmark results.

* MonitorsConfig update for metrics package migration.

* Reorder some dimensions in inner queries for some reason.

* Fix protobuf tests.
2018-08-30 09:56:26 -07:00
Himanshu 1fae6513e1 add "subtotalsSpec" attribute to groupBy query (#5280)
* add subtotalsSpec attribute to groupBy query

* dont sent subtotalsSpec to downstream nodes from broker and other updates

* address review comment

* fix checkstyle issues after merge to master

* add docs for subtotalsSpec feature

* address doc review comments
2018-08-28 17:46:38 -07:00
Jim Slattery d957295b98 spelling: storage (#6248) 2018-08-27 16:35:31 -07:00
Gian Merlino 0172326c62 SQL: Support more result formats, add columns header. (#6191)
* SQL: Support more result formats, add columns header.

- Add result formats for line-based JSON and CSV.
- Add X-Druid-Sql-Columns header with a list of all columns that
the response will contain.
- Add more comprehensive documentation on what callers should expect
when making Druid SQL queries.

* Fix some tests.

* Adjust tests.

* Adjust trailer, add types header.

* Fix trailers.
2018-08-26 23:00:14 -06:00
Susie 6e73ad6231 Fix bound query keys for Filtering on numeric values (#5881)
It is currently showing the use of `lowerBound` and `upperBound` instead of `lower` and `upper` for the range.
2018-08-23 14:07:10 -07:00
QiuMM ceb8f8e625 remove unnecessary tlsPortFinder to avoid potential port conflicts (#6194) 2018-08-23 10:41:49 -07:00
Ryan Plessner 9c500fb69f Add PostgreSQLConnectorConfig to expose SSL configuration options (#6181)
* Add PostgreSQLConnectorConfig to expose SSL configuration options for the Postgres Metadata Storage module.

* Fix checkstyle violations and add license header

* Convert properties in the postgres docs to be the full property path and fix typo

* Fix grammar in sslFactory docs
2018-08-21 16:45:27 -07:00
QiuMM 266f3dfbcb remove duplicate link to operations/recommendations.html (#6193) 2018-08-21 12:02:43 -07:00
QiuMM b0cf8d0252 'shutdownAllTasks' API for a dataSource (#6185)
* 'shutdownAllTasks' API for a dataSource

Change-Id: I30d14390457d39e0427d23a48f4f224223dc5777

* fix api path and return

Change-Id: Ib463f31ee2c4cb168cf2697f149be845b57c42e5

* optimize implementation

Change-Id: I50a8dcd44dd9d36c9ecbfa78e103eb9bff32eab9
2018-08-17 12:57:09 -04:00
Jonathan Wei 0c3bb47558 Change hybrid cache default types in docs to caffeine (#6182) 2018-08-17 12:17:43 -04:00
Caroline1000 f447b784de update sigar link (#6175) 2018-08-14 16:58:29 -07:00
QiuMM 69f555019b convert all time-intervals in ISO 8601 format to uppercase in doc files (#6118)
Change-Id: I904fed4cfb600a8a42664335557f611133a5078d
2018-08-13 12:58:47 -07:00
Jonathan Wei 94a937b5e8
New doc fixes (#6156) 2018-08-13 11:11:32 -07:00
Atul Mohan 064c22c937 Fix redirects (#6151) 2018-08-10 13:55:47 -07:00
Jonathan Wei b0805540af
Fix kafka tutorial typo (#6141) 2018-08-09 18:41:05 -07:00
Jonathan Wei af0557c1f7
Unified configuration doc page (#6127)
* Unified configuration doc page

* Rename to index.md, update redirects

* PR comments

* PR comments

* PR comment
2018-08-09 14:52:14 -07:00
Jonathan Wei fea2ab7094
New docs intro (#6122)
* New docs intro

* PR comments

* Fix arch diagram

* PR comment

* PR comment

* PR comment
2018-08-09 14:19:11 -07:00
pdeva c028d18d74 update redis-cache documentation (#6109)
* update redis-cache documentation

added clarifying info on setup and enablement

* added link
2018-08-09 13:44:59 -07:00
Jonathan Wei aa660b8751 Add docs for virtual columns and transform specs (#6119)
* Add docs for virtual columns and transform specs

* PR Comments

* PR comment
2018-08-09 14:42:52 -06:00
Jonathan Wei 2b64025eaf Separate hadoop and native batch docs more (#6120)
* Separate hadoop and native batch docs more

* Rebase with parallel batch

* PR comments
2018-08-09 14:40:20 -06:00
Jonathan Wei 24f2e8ba26 New quickstart and tutorials (#6126)
* New quickstart and tutorials

* PR comments

* Fix tranquility
2018-08-09 14:37:52 -06:00
Jonathan Wei 2b0f03acb9 Unified API doc page (#6128)
* Unified API doc page

* PR comments

* Fix metadata endpoint
2018-08-09 14:27:42 -06:00
Gian Merlino 3525d4059e
Cache: Add maxEntrySize config, make groupBy cacheable by default. (#5108)
* Cache: Add maxEntrySize config.

The idea is this makes it more feasible to cache query types that
can potentially generate large result sets, like groupBy and select,
without fear of writing too much to the cache per query.

Includes a refactor of cache population code in CachingQueryRunner and
CachingClusteredClient, such that they now use the same CachePopulator
interface with two implementations: one for foreground and one for
background.

The main reason for splitting the foreground / background impls is
that the foreground impl can have a more effective implementation of
maxEntrySize. It can stop retaining subvalues for the cache early.

* Add CachePopulatorStats.

* Fix whitespace.

* Fix docs.

* Fix various tests.

* Add tests.

* Fix tests.

* Better tests

* Remove conflict markers.

* Fix licenses.
2018-08-07 10:23:15 -07:00
Jihoon Son 56ab4363ea
Native parallel batch indexing without shuffle (#5492)
* Native parallel indexing without shuffle

* fix build

* fix ci

* fix ingestion without intervals

* fix retry

* fix retry

* add it test

* use chat handler

* fix build

* add docs

* fix ITUnionQueryTest

* fix failures

* disable metrics reporting

* working

* Fix split of static-s3 firehose

* Add endpoints to supervisor task and a unit test for endpoints

* increase timeout in test

* Added doc

* Address comments

* Fix overlapping locks

* address comments

* Fix static s3 firehose

* Fix test

* fix build

* fix test

* fix typo in docs

* add missing maxBytesInMemory to doc

* address comments

* fix race in test

* fix test

* Rename to ParallelIndexSupervisorTask

* fix teamcity

* address comments

* Fix license

* addressing comments

* addressing comments

* indexTaskClient-based segmentAllocator instead of CountingActionBasedSegmentAllocator

* Fix race in TaskMonitor and move HTTP endpoints to supervisorTask from runner

* Add more javadocs

* use StringUtils.nonStrictFormat for logging

* fix typo and remove unused class

* fix tests

* change package

* fix strict build

* tmp

* Fix overlord api according to the recent change in master

* Fix it test
2018-08-06 23:59:42 -07:00
Nishant Bangarwa 75c8a87ce1 Part 2 of changes for SQL Compatible Null Handling (#5958)
* Part 2 of changes for SQL Compatible Null Handling

* Review comments - break lines longer than 120 characters

* review comments

* review comments

* fix license

* fix test failure

* fix CalciteQueryTest failure

* Null Handling - Review comments

* review comments

* review comments

* fix checkstyle

* fix checkstyle

* remove unrelated change

* fix test failure

* fix failing test

* fix travis failures

* Make StringLast and StringFirst aggregators nullable and fix travis failures
2018-08-02 08:20:25 -07:00
Andrés Gómez e270362767 Add stringLast and stringFirst aggregators extension (#5789)
* Add lastString and firstString aggregators extension

* Remove duplicated class

* Move first-last-string doc page to extensions-contrib

* Fix ObjectStrategy compare method

* Fix doc bad aggregatos type name

* Create FoldingAggregatorFactory classes to fix SegmentMetadataQuery

* Add getMaxStringBytes() method to support JSON serialization

* Fix null pointer exception at segment creation phase when the string value is null

* Control the valueSelector object class on BufferAggregators

* Perform all improvements

* Add java doc on SerializablePairLongStringSerde

* Refactor ObjectStraty compare method

* Remove unused ;

* Add aggregateCombiner unit tests. Rename BufferAggregators unit tests

* Remove unused imports

* Add license header

* Add class name to java doc class serde

* Throw exception if value is unsupported class type

* Move first-last-string extension into druid core

* Update druid core docs

* Fix null pointer exception when pair->string is null

* Add null control unit tests

* Remove unused imports

* Add first/last string folding aggregator on AggregatorsModule to support segment metadata query

* Change SerializablePairLongString to extend SerializablePair

* Change vars from public to private

* Convert vars to primitive type

* Clarify compare comment

* Change IllegalStateException to ISE

* Remove TODO comments

* Control possible null pointer exception

* Add @Nullable annotation

* Remove empty line

* Remove unused parameter type

* Improve AggregatorCombiner javadocs

* Add filterNullValues option at StringLast and StringFirst aggregators

* Add filterNullValues option at agg documentation

* Fix checkstyle

* Update header license

* Fix StringFirstAggregatorFactory.VALUE_COMPARATOR

* Fix StringFirstAggregatorCombiner

* Fix if condition at StringFirstAggregateCombiner

* Remove filterNullValues from string first/last aggregators

* Add isReset flag in FirstAggregatorCombiner

* Change Arrays.asList to Collections.singletonList
2018-08-01 10:52:54 -07:00
Caroline1000 7f89c72932 Add definition of 'NONE' to queryGranularity in ingestion.index doc (#6073)
* Add meaning of granularity = None to queryGranularity

* Fix format
2018-07-30 14:07:33 -07:00
Gian Merlino 63be028cee
CompactionTask: Reject empty intervals on construction. (#6059)
* CompactionTask: Reject empty intervals on construction.

They don't make sense anyway, and it's better to fail fast.

* Switch API.
2018-07-30 08:52:50 -07:00
Eyal Yurman 94d6c9a0a5 Remove JDK 7 from build documentation. (#6031)
See issue #6030
2018-07-26 17:05:07 -07:00
Jonathan Wei efab3b0160 Add concat and textcat SQL functions (#6005) 2018-07-20 11:21:04 -07:00
Gian Merlino cd8ea3da8d
SQL: Add server-wide default time zone config. (#5993)
* SQL: Add server-wide default time zone config.

* Switch API.
2018-07-18 13:12:40 -07:00
Caroline1000 5f78a333ad show that flatten will also work with avro extension (#5874)
* show that flatten will also work with avro extension

* fix url
2018-07-11 16:47:03 -07:00
Gian Merlino 04ea3c9f8c
Update license headers. (#5976)
* Update license headers.

For compliance with http://www.apache.org/legal/src-headers.html.

* More license adjustments.

* Fix mistakenly edited package line.
2018-07-11 09:55:18 -07:00
Caroline1000 b3976050ad add definition of balancerComputeThreads (#5865) 2018-07-05 09:54:36 -07:00
Caroline1000 ee4a5aafb0 add config values for GCS deep storage (#5875)
* add config values for GCS deep storage

* fix config values for GCS deep storage
2018-07-05 09:53:41 -07:00
Dylan Wylie 10642ef9ca Fix filtered request logging docs (#5924)
- Setting druid.request.logging.delegate has no effect. 
- The provider is injected based on a type parameter & this looks to be scoped to delegate for filtered loggers
2018-07-05 09:51:10 -07:00
scrawfor bf2a31a5bc Add new 'true' filter which always returns true. (#5711)
* Add new 'true' filter which always returns true.

* Add support for bitmap index.

* Adds documentation.

* Removes No-op Filter
2018-06-28 11:52:45 -07:00
Gian Merlino a28314349c
Fix spelling of "propagate" in various places. (#5896)
One of these is a configuration parameter (introduced in #5429),
but it's never been in a release, so I think it's ok to rename it.
2018-06-25 09:18:08 -07:00
varaga b4b1b2a020 Provisioning support for ZooKeeper Authorization (#5701)
Review comments implemented
2018-06-15 14:02:01 -07:00
zhangxinyu e43e5ebbcd Materialized view implementation (#5556)
* implement materialized view

* modify code according to jihoonson's comments

* modify code according to jihoonson's comments - 2

* add documentation about materialized view

* use new HadoopTuningConfig in pr 5583

* add minDataLag and fix optimizer bug

* correct value of DEFAULT_MIN_DATA_LAG_MS

* modify code according to jihoonson's comments - 3

* use the boolean expression instead of if-else
2018-06-09 12:24:54 -07:00
Caroline1000 96feb479cd add order change needed for KIS in 0.12.0 (#5760) 2018-06-08 15:25:26 -07:00
Hongze Zhang cfa94b747b Update to jetty 9.4; Enable request decompression (#5624)
* Update to jetty 9.4; Enable request decompression; Add http compression config options

* Fix BadMessageException from jetty server at HttpGenerator.generateHeaders(...)
2018-06-08 14:53:08 -07:00
awelsh93 adbe22c05b Security - add anonymous authenticator (#5842)
* Anonymous authenticator that authenticates all requests and then directs them to an authorizer.

* Adding documentation

* Removed some fields from class AnonymousAuthenticator

* Updating docs
2018-06-07 10:17:54 -07:00
Siddharth Subramanian 37409dc2f4 Fix minor documentation error (#5851)
Adding a required `,` in the sample JSON
2018-06-06 12:51:56 -07:00
Ryan Plessner ee45ee6915 Fix docs to reflect the correct default max total row count for the IndexTuningConfig (#5845) 2018-06-05 13:15:12 -07:00
awelsh93 1a4707f09c Remove extra slash in endpoint (#5822) 2018-06-05 13:11:26 -07:00
Alexander Saydakov d1cdcd4895 Datasketches doc correction (#5816)
* func was renamed to operation during code review

* added missing descriptions, some cleanup
2018-06-05 17:52:37 +05:30
Atul Mohan 50ad7a45ff Fix authentication doc (#5813) 2018-05-30 11:10:48 -07:00
Jihoon Son 67ff7dacbd Support server-side encryption for s3 (#5740)
* Support server-side encryption for s3

* fix teamcity

* typo

* address comments

* Refactoring configuration injection

* fix doc

* fix doc
2018-05-28 20:22:08 -07:00
Joseph Glanville 5cbfb95e1f docs: Document inputFormat on Hadoop InputSpecs (#5784) 2018-05-24 21:44:37 -07:00
Gian Merlino bc0ff251a3 Docs: Clarify the meaning of maxSplitSize. (#5803) 2018-05-24 21:43:39 -07:00
Michael Schnupp 33b4eb624d fix freeSpacePercent in segmentCache.locations (#5765)
* fix freeSpacePercent in segmentCache.locations

* the check should probably test the other way around
* documentation should put the option in the right place
* examples have a superfluous backslash

* add test to verify correct behavior

* switch to Path and test with jimfs

Path allows to use different filesystems.
Jimfs provides an actual (in memory) filesystem.
This also allows more complex test scenarios.

The behavior should be unchanged by this commit.

* Revert "switch to Path and test with jimfs"

This reverts commit 8b9a418d65.
2018-05-24 11:15:30 +09:00
Atul Mohan 1b9611a60e Local indexing from RDBMS (#5441)
* Local indexing from RDBMS

*  Fix content

* Remove pom changes

* Remove extraneous space

* Add tests and update documentation

* Fix comments

* Fix docs

*  Fix build related issue

*  Handle invalid strings

* Make target database independent of metadata storage

* Add firehose connector

* Fix accessibility

* Add docs

* Remove unused def

* Remove lazy instantiation of jsoniterator

* Move unused changes

* Move unused changes

* Fix build

* Make Sqlfirehose method private
2018-05-22 12:33:01 +09:00
Caroline1000 c73e3ea4f5 Provide examples to havingSpec filters (#5774)
* expand examples

* expand examples for filtered havingSpecs

* expand other having examples

* remove blank code block

* add better AND/OR/NOT examples

* fix indentation
2018-05-14 13:43:42 -07:00
Abhishek Kaushik aa23fe6386 Typo fix in historical doc (#5753) 2018-05-08 11:08:27 -07:00
Kirill Kozlov 67d0b0ee42 Add taskType dimension to task metrics (#5664) 2018-05-07 09:42:26 -07:00
kaijianding c12c16385e support throw duplcate row during realtime ingestion in RealtimePlumber (#5693) 2018-05-04 10:12:25 -07:00
Dylan Wylie 2c5f0038fd Make lookup offheap buffer configurable (#5696)
* Make lookup offheap buffer configurable

Fixes #3663

* Address comments

* Update docs

* Update docs
2018-05-04 10:00:55 -07:00
Stuart McLean c2b5e5ec95 Default caffeine cache size (#5738)
* add default caffeine cache size based on runtime Xmx or max 1GB

* update docs for caffeine cache

* fix formatting

* test caffeine size should never be less than 0

* set caffeine max default size to 1G not 1M

* fix caffeine cache tests
2018-05-04 09:29:11 -07:00
Surekha 13c616ba24 'maxBytesInMemory' tuningConfig introduced for ingestion tasks (#5583)
* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Fix check style and remove a comment

* Add overlord unsecured paths to coordinator when using combined service (#5579)

* Add overlord unsecured paths to coordinator when using combined service

* PR comment

* More error reporting and stats for ingestion tasks (#5418)

* Add more indexing task status and error reporting

* PR comments, add support in AppenderatorDriverRealtimeIndexTask

* Use TaskReport instead of metrics/context

* Fix tests

* Use TaskReport uploads

* Refactor fire department metrics retrieval

* Refactor input row serde in hadoop task

* Refactor hadoop task loader names

* Truncate error message in TaskStatus, add errorMsg to task report

* PR comments

* Allow getDomain to return disjointed intervals (#5570)

* Allow getDomain to return disjointed intervals

* Indentation issues

* Adding feature thetaSketchConstant to do some set operation in PostAgg (#5551)

* Adding feature thetaSketchConstant to do some set operation in PostAggregator

* Updated review comments for PR #5551 - Adding thetaSketchConstant

* Fixed CI build issue

* Updated review comments 2 for PR #5551 - Adding thetaSketchConstant

* Fix taskDuration docs for KafkaIndexingService (#5572)

* With incremental handoff the changed line is no longer true.

* Add doc for automatic pendingSegments (#5565)

* Add missing doc for automatic pendingSegments

* address comments

* Fix indexTask to respect forceExtendableShardSpecs (#5509)

* Fix indexTask to respect forceExtendableShardSpecs

* add comments

* Deprecate spark2 profile in pom.xml (#5581)

Deprecated due to https://github.com/druid-io/druid/pull/5382

* CompressionUtils: Add support for decompressing xz, bz2, zip. (#5586)

Also switch various firehoses to the new method.

Fixes #5585.

* This commit introduces a new tuning config called 'maxBytesInMemory' for ingestion tasks

Currently a config called 'maxRowsInMemory' is present which affects how much memory gets
used for indexing.If this value is not optimal for your JVM heap size, it could lead
to OutOfMemoryError sometimes. A lower value will lead to frequent persists which might
be bad for query performance and a higher value will limit number of persists but require
more jvm heap space and could lead to OOM.
'maxBytesInMemory' is an attempt to solve this problem. It limits the total number of bytes
kept in memory before persisting.

 * The default value is 1/3(Runtime.maxMemory())
 * To maintain the current behaviour set 'maxBytesInMemory' to -1
 * If both 'maxRowsInMemory' and 'maxBytesInMemory' are present, both of them
   will be respected i.e. the first one to go above threshold will trigger persist

* Address code review comments

* Fix the coding style according to druid conventions
* Add more javadocs
* Rename some variables/methods
* Other minor issues

* Address more code review comments

* Some refactoring to put defaults in IndexTaskUtils
* Added check for maxBytesInMemory in AppenderatorImpl
* Decrement bytes in abandonSegment
* Test unit test for multiple sinks in single appenderator
* Fix some merge conflicts after rebase

* Fix some style checks

* Merge conflicts

* Fix failing tests

Add back check for 0 maxBytesInMemory in OnHeapIncrementalIndex

* Address PR comments

* Put defaults for maxRows and maxBytes in TuningConfig
* Change/add javadocs
* Refactoring and renaming some variables/methods

* Fix TeamCity inspection warnings

* Added maxBytesInMemory config to HadoopTuningConfig

* Updated the docs and examples

* Added maxBytesInMemory config in docs
* Removed references to maxRowsInMemory under tuningConfig in examples

* Set maxBytesInMemory to 0 until used

Set the maxBytesInMemory to 0 if user does not set it as part of tuningConfing
and set to part of max jvm memory when ingestion task starts

* Update toString in KafkaSupervisorTuningConfig

* Use correct maxBytesInMemory value in AppenderatorImpl

* Update DEFAULT_MAX_BYTES_IN_MEMORY to 1/6 max jvm memory

Experimenting with various defaults, 1/3 jvm memory causes OOM

* Update docs to correct maxBytesInMemory default value

* Minor to rename and add comment

* Add more details in docs

* Address new PR comments

* Address PR comments

* Fix spelling typo
2018-05-03 16:25:58 -07:00
Gian Merlino 739e347320 Allow Hadoop dataSource inputSpec to be specified multiple times. (#5717)
* Allow Hadoop dataSource inputSpec to be specified multiple times.

* Fix test
2018-05-03 13:51:57 -07:00
Stuart McLean d2b8d880ea include hybrid and caffeine in cache docs and show caffeine as default (#5737) 2018-05-03 09:52:05 -07:00
Jihoon Son d4311b4a5a Support enablePathStyleAccess, disableChunkedEncoding, and forceGlobalBucketAccessEnabled for aws client (#5702)
* Support enablePathStyleAccess and disableChunkedEncoding for aws client

* add an option for forceGlobalBucketAccessEnabled

* add missing doc
2018-05-02 10:45:38 -07:00
Jakub Kukul e2431ae161 Update defaultHadoopCoordinates in documentation. (#5720)
* Update defaultHadoopCoordinates in documentation.

To match changes applied in #5382.

* Remove a parameter with defaults from example configuration file.

If it has reasonable defaults, then why would it be in an example config file?

Also, it is yet another place that has been forgotten to be updated and will be forgotten in the future.

Also, if someone is running different hadoop version, then there's much more work to be done than just changing this property, so why give users false hopes?

* Fix typo in documentation.
2018-04-30 20:49:14 -07:00
Dylan Wylie 754c80e74a Fix quickstart docs to specify that Java 8 is required. (#5722)
See #4907 #5719
2018-04-30 13:25:59 -07:00
Gian Merlino 0f8493846e Replace dev list references in docs. (#5723) 2018-04-30 11:25:45 -07:00
David Lim 8ec2d2fe18 Use unique segment paths for Kafka indexing (#5692)
* support unique segment file paths

* forbiddenapis

* code review changes

* code review changes

* code review changes

* checkstyle fix
2018-04-29 21:59:48 -07:00
Gian Merlino 762f8829e4
Add task action metrics, add taskId metric dimension. (#5714)
* Add task action metrics, add taskId metric dimension.

Adds two new metrics: task/action/log/time and task/action/run/time. Also
adds taskId as a dimension, to give us the ability to drill down into metrics
for an individual task. Also standardizes metrics-attachment using two helper
methods in IndexTaskUtils.

* Fix typo
2018-04-29 21:24:06 -07:00
Joseph Glanville 90cd05696e Document processing properties required for Middlemanager (#5660) 2018-04-29 17:20:17 -07:00
Jihoon Son 86746f82d8 Use mergeBuffer instead of processingBuffer in parallelCombiner (#5634)
* Use mergeBuffer instead of processingBuffer in parallelCombiner

* Fix test

* address comments

* fix test

* Fix test

* Update comment

* address comments

* fix build

* Fix test failure
2018-04-27 18:14:37 -07:00
Gian Merlino f81855d607
Add unauthorized errorCode to query docs. (#5691) 2018-04-26 13:06:25 -07:00
Caroline1000 fd76af9737 remove old prod cluster config link (#5676) 2018-04-23 18:00:24 -07:00
scrawfor 15f4ab2b31 Expose noop filter to users (#5597) 2018-04-18 07:57:07 -07:00
Gian Merlino fbf3fc178e Timeseries: Add "grandTotal" option. (#5640)
* Timeseries: Add "grandTotal" option.

* Modify whitespace.

* Checkstyle workaround.
2018-04-16 18:22:19 -07:00
Jonathan Wei d0b66a6af5 Fix HTTP OPTIONS request auth handling (#5638)
* Fix HTTP OPTIONS request auth handling

* PR comment

* More PR comments

* Fix

* PR comment
2018-04-16 18:09:56 -07:00
Jihoon Son 6b3bde0143 Fix granularitySpec doc (#5647) 2018-04-16 14:24:39 -04:00
Jonathan Wei 882b172318
Revert "Fix HTTP OPTIONS request auth handling (#5615)" (#5637)
This reverts commit df51a7bcb7.
2018-04-12 16:43:54 -07:00
Jonathan Wei df51a7bcb7
Fix HTTP OPTIONS request auth handling (#5615)
* Fix HTTP OPTIONS request auth handling

* Flip configuration boolean
2018-04-12 14:02:20 -07:00
Caroline1000 48c1a1ef57 change header from Data Schema to Ingestion Spec (#5631) 2018-04-11 21:42:54 -07:00
Nishant Bangarwa e6efd75a3d Add config to allow setting up custom unsecured paths for druid nodes. (#5614)
* Add config to allow setting up custom unsecured paths for druid nodes.

* return all resources for Unsecured paths

* review comment - Add test

* fix tests

* fix test
2018-04-11 17:10:07 -07:00
Caroline1000 afa75e04b7 change header in overlord console; minor querydoc change (#5625)
* change header in overlord console; minor querydoc change

* remove change to overlord console

* address Gian comments
2018-04-11 12:57:22 -07:00
Nishant Bangarwa b32aad9ab4 Fix some broken links in druid docs (#5622)
* Fix some broken links in druid docs

* review comment
2018-04-11 10:27:33 -07:00
Nishant Bangarwa 80fa5094e8 Fix Kerberos Authentication failing requests without cookies and excludedPaths config. (#5596)
* Fix Kerberos Authentication failing requests without cookies.

KerberosAuthenticator was failing `First` request from the clients.
After authentication we were setting the cookie properly but not
setting the the authenticated flag in the request. This PR fixed that.

Additional Fixes -
* Removing of Unused SpnegoFilterConfig - replaced by
KerberosAuthenticator
* Unused internalClientKeytab and principal from KerberosAuthenticator
* Fix docs accordingly and add docs for configuring an escalated
client.

* Fix excluded path config behavior

* spelling correction

* Revert "spelling correction"

This reverts commit fb754b43d8.

* Revert "Fix excluded path config behavior"

This reverts commit 3901047769.
2018-04-09 20:45:35 -07:00
Alexander T ad6f234e1e Update lookups-cached-global.md (#5525)
Update lookup creation example to work with version 0.12.0
2018-04-06 16:13:17 -07:00
Jihoon Son 723857699c Add doc for automatic pendingSegments (#5565)
* Add missing doc for automatic pendingSegments

* address comments
2018-04-05 23:53:43 -07:00
Dylan Wylie ddd23a11e6 Fix taskDuration docs for KafkaIndexingService (#5572)
* With incremental handoff the changed line is no longer true.
2018-04-05 23:52:58 -07:00
Arup Malakar 0c4598c1fe Fix typo in avatica java client code documenation (#5553) 2018-03-29 16:36:40 -05:00