2529 Commits

Author SHA1 Message Date
Surekha d3497a6581
Filter on metrics doc (#10087)
* add note about filter on metrics to filter docs

* edit doc to include having and filtered aggregator links
2020-06-30 19:52:40 -07:00
Lee Rhodes 7b4edc93fc
Update web address to datasketches.apache.org (#10096) 2020-06-30 19:05:23 -07:00
Yuanli Han fc555980e8
Remove payload field from table sys.segments (#9883)
* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment
2020-06-29 22:20:23 -07:00
Clint Wylie 4a625751e8
Information schema doc update (#10081)
* add docs for IS_JOINABLE and IS_BROADCAST to INFORMATION_SCHEMA docs

* fixes

* oops

* revert noise

* missed one

* spellbot
2020-06-29 21:08:13 -07:00
BIGrey 69f2b1ef00
Correct the position of the double quotation in distinctcount.md file (#10094)
```
"dimensions": "[sample_dim]"
```
should be
```
"dimensions": ["sample_dim"]
```
2020-06-29 20:59:56 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that lets a user choose which HTTP methods to allow against their
Druid server.

Druid will only accept HTTP requests with the methods GET, PUT, POST, DELETE,
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config.

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
2020-06-29 16:59:31 -07:00
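The method filtering described in #10085 above can be pictured with a minimal sketch. `HttpMethodAllowList` is a hypothetical class, not Druid's implementation: the constructor arguments merely stand in for the `ServerConfig#allowedHttpMethods` and `AuthConfig#allowUnauthenticatedHttpOptions` settings named in the commit message, and treating the latter as a plain on/off switch for OPTIONS is a simplifying assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not a Druid class): decides whether a request method is accepted.
final class HttpMethodAllowList {
    // Methods the commit message lists as accepted by default; OPTIONS is handled separately.
    private static final List<String> DEFAULT_METHODS = Arrays.asList("GET", "PUT", "POST", "DELETE");

    private final Set<String> allowed = new HashSet<>(DEFAULT_METHODS);

    // extraMethods stands in for ServerConfig#allowedHttpMethods (assumption);
    // allowOptions stands in for AuthConfig#allowUnauthenticatedHttpOptions (assumption).
    HttpMethodAllowList(List<String> extraMethods, boolean allowOptions) {
        for (String method : extraMethods) {
            allowed.add(method.toUpperCase(Locale.ROOT));
        }
        if (allowOptions) {
            allowed.add("OPTIONS");
        }
    }

    // Case-insensitive membership check; a null method is never allowed.
    boolean isAllowed(String method) {
        return method != null && allowed.contains(method.toUpperCase(Locale.ROOT));
    }
}
```

A request filter built on such a check would reject any method for which `isAllowed` returns false before the request reaches a resource.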
Jian Wang 20fd72bd13
Fix NPE when brokers use custom priority list (#9878) 2020-06-26 17:28:54 -07:00
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioning is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* handle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Clint Wylie 0f51b3c190
fix dropwizard emitter jvm bufferpoolName metric (#10075)
* fix dropwizard emitter jvm bufferpoolName metric

* fixes
2020-06-25 12:20:25 -07:00
Maytas Monsereenusorn 9be5039f68
Enable query vectorization by default (#10065)
* Enable query vectorization by default

* update docs
2020-06-24 13:08:49 -07:00
sthetland 978b494b46
Druid user permissions (#10047)
* Druid user permissions apply in the console

* Update index.md

* noting user warning in console page; some minor shuffling

* noting user warning in console page; some minor shuffling 1

* touchups

* link checking fixes

* Updated per suggestions
2020-06-23 17:39:48 -07:00
Dylan Wylie 0470fcc9da
change default number of segment loading threads (#9856)
* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-06-23 13:56:44 -07:00
Jianhuan Liu 5600e1c204
fix docs error in hadoop-based part (#9907)
* fix docs error: google to azure and hdfs to http

* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part

* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part

* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part
2020-06-19 23:14:54 -10:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Suneet Saldanha b8a3223f24
Remove changes from #9114 (#10050) 2020-06-18 18:18:12 -07:00
litao a4bd144ebe
fix docs (#9114)
Co-authored-by: tomscut <tomscut@gmail.com>
2020-06-18 09:48:47 -07:00
Maytas Monsereenusorn 1a2620606d
API to verify a datasource has the latest ingested data (#9965)
* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checkstyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs
2020-06-16 20:48:30 -10:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
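The failure mode described in #10014 above is easy to reproduce in plain Java: constructing a `BigDecimal` from `NaN` or an infinity throws `NumberFormatException`. The sketch below guards against that before rounding. `SafeRound` is a hypothetical name, and returning the non-finite input unchanged is an assumption made for illustration; it is not necessarily the value Druid's `ROUND` produces.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical helper (not Druid code): rounds to a given number of decimal places
// without ever handing a non-finite double to BigDecimal.
final class SafeRound {
    static double round(double value, int scale) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            // BigDecimal cannot represent these values, so skip the conversion.
            // Returning the input as-is is an assumption of this sketch.
            return value;
        }
        return BigDecimal.valueOf(value).setScale(scale, RoundingMode.HALF_UP).doubleValue();
    }
}
```

Finite inputs still go through `BigDecimal`, so rounding to an arbitrary decimal place keeps working; only the special values take the early return.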
Suneet Saldanha 0035f39e25
lpad and rpad functions match PostgreSQL behavior in SQL compatible mode (#10006)
* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for a negative pad length

* address review comments
2020-06-15 10:47:57 -07:00
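As a rough illustration of the PostgreSQL-style padding rules that #10006 above targets, here is a minimal sketch. `PadSketch` is a hypothetical class, not Druid's expression code. It assumes PostgreSQL semantics as documented there: the input is truncated on the right when it is longer than the target, a negative length is treated as zero, and an empty pad string adds no padding. Druid's default (non-SQL-compatible) null handling may behave differently.

```java
// Hypothetical sketch of PostgreSQL-style lpad/rpad (not Druid code).
final class PadSketch {
    static String lpad(String base, int length, String pad) {
        if (base == null || pad == null) {
            return null; // null in, null out
        }
        int target = Math.max(length, 0); // negative length is treated as zero
        if (target <= base.length()) {
            return base.substring(0, target); // truncate on the right
        }
        if (pad.isEmpty()) {
            return base; // nothing to pad with
        }
        return fill(target - base.length(), pad) + base;
    }

    static String rpad(String base, int length, String pad) {
        if (base == null || pad == null) {
            return null;
        }
        int target = Math.max(length, 0);
        if (target <= base.length()) {
            return base.substring(0, target);
        }
        if (pad.isEmpty()) {
            return base;
        }
        return base + fill(target - base.length(), pad);
    }

    // Repeats the pad string cyclically until `count` characters are produced.
    private static String fill(int count, String pad) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(pad.charAt(i % pad.length()));
        }
        return sb.toString();
    }
}
```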
Jonathan Wei fe2f656427
Fix broadcast rule drop and docs (#10019)
* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services
2020-06-12 02:33:28 -07:00
danc 5da78d13af
Update password-provider.md (#9857) 2020-06-10 09:32:49 -07:00
Atul Mohan 17cf8ea8f2
Add Sql InputSource (#9449)
* Add Sql InputSource

* Add spelling

* Use separate DruidModule

* Change module name

* Fix docs

* Use sqltestutils for tests

* Add additional tests

* Fix inspection

* Add module test

* Fix md in docs

* Remove annotation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-09 12:55:20 -07:00
Lucas Capistrant 2ae1d26aa8
small fixes to configuration documentation (#9975) 2020-06-09 10:31:08 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Maytas Monsereenusorn 0d22462e07
Document unsupported Join on multi-value column (#9948)
* Document Unsupported Join on multi-value column

* Document Unsupported Join on multi-value column

* address comments

* Add unit tests

* address comments

* add tests
2020-06-03 09:55:52 -10:00
sthetland a33705f0e3
Querying doc refresh tutorial (#9879)
* Update tutorial-query.md

* First full pass complete

* Smoothing over, a bit

* link and spell checking

* Update querying.md

* Review comments; screenshot fixes

* Making ports consistent, pending confirmation 

Switching to the Router port, to make this be consistent with the tutorial ports, but can switch back here and there if it should be 8082 instead.

* Resizing screenshot

* Update querying.md

* Review feedback incorporated.
2020-05-29 14:32:21 -07:00
Surekha ff551ae412
Modify information schema doc to specify correct value of TABLE_CATALOG (#9950) 2020-05-29 10:10:28 -07:00
Maytas Monsereenusorn 6130a834c2
Update doc on tmp dir (java.io.tmpdir) best practice (#9910)
* Update doc on tmp dir best practice

* remove local recommendation
2020-05-26 09:37:01 -07:00
frank chen b91d50044e
add some details to the build doc (#9885)
* update initial build command

* add some details for building

* fix spelling check errors

* fix spelling check warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-05-21 12:35:54 -07:00
Jianhuan Liu 2050f2b00a
fix docs error: google to azure and hdfs to http (#9881) 2020-05-20 10:17:39 -07:00
Joseph Glanville 793f386d6a
Add support for Avro OCF using InputFormat (#9671)
* Add AvroOCFInputFormat

* Support supplying a reader schema in AvroOCFInputFormat

* Add docs for Avro OCF input format

* Address review comments

* Address second round of review
2020-05-16 14:09:12 -07:00
Maytas Monsereenusorn 0a8bf83bc5
Bad plan for table-lookup-lookup join with filter on first lookup and outer limit (#9773)
* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* Bad plan for table-lookup-lookup join with filter on first lookup and outer limit

* address comments

* address comments

* fix checkstyle

* address comments

* address comments
2020-05-14 16:56:40 -07:00
awelsh93 6f25a84d2e
Add TaskCountStatsMonitor to config docs (#9447) 2020-05-11 14:08:46 -07:00
sthetland ce03f31a73
Clarifying workerThreads and a few other nits (#9804)
* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Clarifying accepted values for URI lookup

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* druid-caffeine-cache ext previously removed

* Sample MaxDirectMemorySize value unrealistic

* Review comments

* fixing links

* spell checking gymnastics

* workerThreads desc slightly expanded

* typo

* Typo

* Reversing Kafka config order

* Changing order of configs for Kinesis

* Trying this again: ioConfig then tuningConfig
2020-05-06 09:05:18 -07:00
Alexander Saydakov 844d626738
added number of bins parameter (#9436)
* added number of bins parameter

* addressed review points

* test equals

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2020-05-04 16:53:09 -07:00
Jian Wang 85dfbb64cb
Update documentation for metricCompression (#9811) 2020-05-03 12:56:48 -07:00
sthetland c61365c1e0
Druid Quickstart refactor and update (#9766)
* Update data-formats.md

Per Suneet, "Since you're editing this file can you also fix the json on line 177 please - it's missing a comma after the }"

* Light text cleanup

* Removing discussion of sample data, since it's repeated in the data loading tutorial, and not immediately relevant here.

* Update index.md

* original quickstart full first pass

* original quickstart full first pass

* first pass all the way through

* straggler

* image touchups and finished old tutorial

* a bit of finishing up

* Review comments

* fixing links

* spell checking gymnastics
2020-04-30 12:07:28 -07:00
Aleksei Chumagin 0642f778fa
changed Preview to Apply (#9757) 2020-04-29 09:53:25 -07:00
James Dalton b279e04a31
table fix (#9769) 2020-04-28 11:23:24 -07:00
Francesco Nidito e7e41e3a36
Adding support for autoscaling in GCE (#8987)
* Adding support for autoscaling in GCE

* adding extra google deps also in gce pom

* fix link in doc

* remove unused deps

* adding terms to spelling file

* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT

* GCEXyz -> GceXyz in naming for consistency

* add preconditions

* add VisibleForTesting annotation

* typos in comments

* use StringUtils.format instead of String.format

* use custom exception instead of exit

* factorize interval time between retries

* making literal value a constant

* iter all network interfaces

* use provided on google (non api) deps

* adding missing dep

* removing unneeded this and use Objects methods instead of 3-way if in hash and comparison

* adding import

* adding retries around getRunningInstances and adding limit for operation end waiting

* refactor GceEnvironmentConfig.hashCode

* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT

* removing unused config

* adding tests to hash and equals

* adding nullable to waitForOperationEnd

* adding testTerminate

* adding unit tests for createComputeService

* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)

* reverting queryResponseTemplate change

* adding comment for Compute.Builder.build() returning null
2020-04-28 03:13:39 -07:00
Gian Merlino 4087a015e8
Datasource doc structure adjustments. (#9716)
- Reorder both the datasource and query-execution page orderings to
table, lookup, union, inline, query, join. (Roughly increasing order
of conceptual "fanciness".)
- Add more crosslinks from datasource page to query-execution page:
one per datasource type.
2020-04-23 16:04:59 -07:00
Clint Wylie e677c62484
document useFilterCNF query context parameter (#9647)
* document useFilterCNF query context parameter

* move context key to QueryContexts

* Update .spelling
2020-04-16 22:12:20 -07:00
Clint Wylie b89ad49396
disable group by config applyLimitPushDownToSegment by default (#9711)
* disable group by config applyLimitPushDownToSegment by default

* document
2020-04-16 03:03:35 -07:00
Gian Merlino 42590ae64b
Refresh query docs. (#9704)
* Refresh query docs.

Larger changes:

- New doc: querying/datasource.md describes the various kinds of
datasources you can use, and has examples for both SQL and native.
- New doc: querying/query-execution.md describes how native queries
are executed at a high level. It doesn't go into the details of specific
query engines or how queries run at a per-segment level. But I think it
would be good to add or link that content here in the future.
- Refreshed doc: querying/sql.md updated to refer to joins, reformatted
a bit, added a new "Query translation" section that explains how
queries are translated from SQL to native, and removed configuration
details (moved to configuration/index.md).
- Refreshed doc: querying/joins.md updated to refer to join datasources.

Smaller changes:

- Add helpful banners to the top of query documentation pages telling
people whether a given page describes SQL, native, or both.
- Add SQL metrics to operations/metrics.md.
- Add some color and cross-links in various places.
- Add native query component docs to the sidebar, and renamed them so
they look nicer.
- Remove Select query from the sidebar.
- Fix Broker SQL configs in configuration/index.md. Remove them from
querying/sql.md.
- Combined querying/searchquery.md and querying/searchqueryspec.md.

* Updates.

* Fix numbering.

* Fix glitches.

* Add new words to spellcheck file.

* Assorted changes.

* Further adjustments.

* Add missing punctuation.
2020-04-15 16:12:20 -07:00
Maytas Monsereenusorn 8328d91b30
Add missing integration tests for the compaction by the coordinator (#9644)
* Add API to trigger a compaction by the coordinator for integration tests

* Add missing integration tests for the compaction by the coordinator

* address comments
2020-04-15 14:27:33 -07:00
Will Salisbury cda9f41e69
s/S3/GCS/g (#9700)
fix typo [ at least I hope this was a typo… ]
2020-04-14 18:39:54 -07:00
Himanshu ca369e5768
druid-pac4j: add ability to use custom ssl trust store while talking to auth server (#9637)
* druid-pac4j: add ability for custom ssl trust store for talking to auth
server

* fix nimbusds DefaultResourceRetriever name in comment
2020-04-10 18:01:59 -07:00
bolkedebruin ab5ac7f890
Document possible vulnerabilities for the druid-ranger-security (#9649)
* Document possible vulnerabilities for the druid-ranger-security

In certain configurations the ranger plugin can expose vulnerabilities due
to some of its dependencies having CVEs.

* Spelling checker is a bit tight
2020-04-09 10:43:11 -07:00
bolkedebruin 2d99966933
Add Apache Ranger Authorization (#9579) 2020-04-04 18:02:24 +02:00
Maytas Monsereenusorn 1852bf33ea
Add Integration Test for functionality of kinesis ingestion (#9576)
* kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* Kinesis IT

* fix kinesis timeout

* Kinesis IT

* Kinesis IT

* fix checkstyle

* Kinesis IT

* address comments

* fix checkstyle
2020-04-03 09:45:22 -07:00
Neil Volungis 0ac875a8b4
Update docker.md readme to note memory requirements (#9529)
* Update docker.md readme to note memory requirements

* Fix grammatical error

Co-Authored-By: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>

Co-authored-by: Suneet Saldanha <44787917+suneet-s@users.noreply.github.com>
2020-03-24 03:33:29 -07:00
Clint Wylie bf85ea19b2
roaring bitmaps by default (#9548)
* it is finally time

* fix it

* more docs

* fix doc
2020-03-23 18:15:57 -07:00
Himanshu 5604ac7963
druid extension for OpenID Connect auth using pac4j lib (#8992)
* druid pac4j security extension for OpenID Connect OAuth 2.0 authentication

* update version in druid-pac4j pom

* introducing unauthorized resource filter

* authenticated but authorized /unified-webconsole.html

* use httpReq.getRequestURI() for matching callback path

* add documentation

* minor doc addition

* license file updates

* make dependency analyze succeed

* fix doc build

* hopefully fixes doc build

* hopefully fixes license check build

* yet another try on fixing license build

* revert unintentional changes to website folder

* update version to 0.18.0-SNAPSHOT

* check session and its expiry on each request

* add crypto service

* code for encrypting the cookie

* update doc with cookiePassphrase

* update license yaml

* make sessionstore in Pac4jFilter private non static

* make Pac4jFilter fields final

* okta: use sha256 for hmac

* remove incubating

* add UTs for crypto util and session store impl

* use standard charsets

* add license header

* remove unused file

* add org.objenesis.objenesis to license.yaml

* a bit of nit changes in CryptoService and embedding EncryptionResult for clarity

* rename alg to cipherAlgName

* take cipher alg name, mode and padding as input

* add java doc for CryptoService and make it more understandable

* another UT for CryptoService

* cache pac4j Config

* use generics clearly in Pac4jSessionStore

* update cookiePassphrase doc to mention PasswordProvider

* mark stuff Nullable where appropriate in Pac4jSessionStore

* update doc to mention jdbc

* add error log on reaching callback resource

* javadoc for Pac4jCallbackResource

* introduce NOOP_HTTP_ACTION_ADAPTER

* add correct module name in license file

* correct extensions folder name in licenses.yaml

* replace druid-kubernetes-extensions to druid-pac4j

* cache SecureRandom instance

* rename UnauthorizedResourceFilter to AuthenticationOnlyResourceFilter
2020-03-23 18:15:45 -07:00
Clint Wylie d8833316c4
fix broken links (#9537)
* fix broken links

* missing /

* adjustment
2020-03-22 17:41:18 -07:00
Gian Merlino 54c9325256
SQL support for joins on subqueries. (#9545)
* SQL support for joins on subqueries.

Changes to SQL module:

- DruidJoinRule: Allow joins on subqueries (left/right are no longer
  required to be scans or mappings).
- DruidJoinRel: Add cost estimation code for joins on subqueries.
- DruidSemiJoinRule, DruidSemiJoinRel: Removed, since DruidJoinRule can
  handle this case now.
- DruidRel: Remove Nullable annotation from toDruidQuery, because
  it is no longer needed (it was used by DruidSemiJoinRel).
- Update Rules constants to reflect new rules available in our current
  version of Calcite. Some of these are useful for optimizing joins on
  subqueries.
- Rework cost estimation to be in terms of cost per row, and place all
  relevant constants in CostEstimates.

Other changes:

- RowBasedColumnSelectorFactory: Don't set hasMultipleValues. The lack
  of isComplete is enough to let callers know that columns might have
  multiple values, and explicitly setting it to true causes
  ExpressionSelectors to think it definitely has multiple values, and
  treat the inputs as arrays. This behavior interfered with some of the
  new tests that involved queries on lookups.
- QueryContexts: Add maxSubqueryRows parameter, and use it in druid-sql
  tests.

* Fixes for tests.

* Adjustments.
2020-03-22 16:43:55 -07:00
Clint Wylie 68013fbc64
fix issue where total limit was being applied even when not configured (#9534)
* fix issue where total limit was being applied even when not configured

* fix inspection

* add reserved lane name check to manual laning strategy
2020-03-18 18:05:59 -07:00
Chi Cao Minh e7b3dd9cd1
Update to mysql connector 5.1.48 (#9514) 2020-03-16 10:38:31 -07:00
Clint Wylie 69af760a19
add manual laning strategy, integration test (#9492)
* add manual laning strategy, integration test, json config test

* share percent conversion method

* wrong assert

* review stuffs

* doc adjustments

* more tests

* test adjustment

* adjust docs

* Update index.md
2020-03-13 20:06:55 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Chi Cao Minh 6b02991464
Match GREATEST/LEAST function behavior to other DBs (#9488)
* Match GREATEST/LEAST function behavior

Change the behavior of the GREATEST / LEAST functions to be similar to
how it is implemented in other databases (as functions instead of
aggregators). The GREATEST/LEAST functions are not in the SQL standard,
but users will expect behavior similar to what other databases provide.

* Match postgres behavior & handle more SQL types

* Fix imports
2020-03-12 15:10:11 -07:00
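To make the distinction in #9488 above concrete: as plain functions, GREATEST and LEAST compare the arguments of a single call rather than aggregating across rows. The sketch below follows the PostgreSQL rule that NULL arguments are ignored and the result is NULL only when every argument is NULL. `GreatestLeastSketch` is a hypothetical class restricted to `Long` values, and whether Druid follows this rule in every null-handling mode is not confirmed by the commit message.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Objects;

// Hypothetical sketch (not Druid code): GREATEST/LEAST as per-call functions.
final class GreatestLeastSketch {
    // Largest non-null argument, or null when all arguments are null.
    static Long greatest(Long... values) {
        return Arrays.stream(values)
                     .filter(Objects::nonNull)
                     .max(Comparator.<Long>naturalOrder())
                     .orElse(null);
    }

    // Smallest non-null argument, or null when all arguments are null.
    static Long least(Long... values) {
        return Arrays.stream(values)
                     .filter(Objects::nonNull)
                     .min(Comparator.<Long>naturalOrder())
                     .orElse(null);
    }
}
```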
Maytas Monsereenusorn e9888f41cb
Modify check java version script to indicate experimental support for Java 11 (#9455)
* Modify check java version script to indicate experimental support for Java 11

* update docs
2020-03-11 09:22:39 -07:00
Himanshu 75a5591448
remove old unused zookeeper dependent lookups code (#9480)
* remove old unused zookeeper dependent lookups code

* make intellij inspector happy
2020-03-10 12:12:48 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Maytas Monsereenusorn 814f5a9717
add password provider reference to s3 optional cred docs (#9439) 2020-03-09 17:56:42 -07:00
Julian Jaffe eda03630d0
Add OnHeapMemorySegmentWriteOutMediumFactory (#9454)
* Add OnHeapMemorySegmentWriteOutMediumFactory

Add a factory for OnHeapMemorySegmentWriteOutMedium to support direct writing via Spark.

* Register OnHeapMemorySegmentWriteOutMediumFactory.

Register OnHeapMemorySegmentWriteOutMediumFactory with SegmentWriteOutMediumFactory.

* Remove unnecessary throws

The base `makeSegmentWriteOutMedium` throws an IOException, but the particular implementation of OnHeapMemorySegmentWriteOutMediumFactory does not throw a checked exception.

* Update SegmentWriteOutMedium docs to include onHeapMemory

Update the SegmentWriteOutMedium section of the indexing docs to include a description of the new onHeapMemory option.
2020-03-05 22:34:08 -08:00
Jihoon Son 3016057178
Make Transform an ExtensionPoint (#9319)
* Make Transform an ExtensionPoint

* Add transform to the list of documented extensions

* Add example transform implementation
2020-03-04 12:13:14 -08:00
Jihoon Son 9466ac7c9b
Skip empty files for local, hdfs, and cloud input sources (#9450)
* Skip empty files for local, hdfs, and cloud input sources

* split hint spec doc

* doc for skipping empty files

* fix typo; adjust tests

* unnecessary fluent iterable

* address comments

* fix test

* use the right lists

* fix test

* fix test
2020-03-03 20:51:06 -08:00
Gian Merlino c9faf3e148
Add SQL GROUPING SETS support. (#9122)
* Add SQL GROUPING SETS support.

Built on top of the subtotalsSpec feature in the groupBy query. This also involves
two changes to subtotalsSpec:

- Alter behavior so limitSpec is applied after subtotalsSpec, rather than applied to
  each grouping set. This is more in line with SQL standard behavior. I think it is okay
  to make this change, since the old behavior was not documented, so users should
  hopefully not be depending on it.
- Fix a bug where virtual columns were included in the subtotal queries, but they
  should not have been.

Also fixes two bugs in query equality checking:

- BaseQuery: Use getDuration() instead of "duration" in equals and hashCode, since the
  latter is lazily initialized and might be null in one query but not the other.
- GroupByQuery: Include subtotalsSpec in equals and hashCode.

* Fix bugs.

* Fix tests.

* PR updates.

* Grouping class hygiene.
2020-02-26 08:52:39 -08:00
Maytas Monsereenusorn 92fb83726b
Add support for optional aws credentials for s3 for ingestion (#9375)
* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* Add support for optional cloud (aws, gcs, etc.) credentials for s3 for ingestion

* fix build failure

* fix failing build

* fix failing build

* Code cleanup

* fix failing test

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* Removed CloudConfigProperties and make specific class for each cloudInputSource

* pass s3ConfigProperties for split

* lazy init s3client

* update docs

* fix docs check

* address comments

* add ServerSideEncryptingAmazonS3.Builder

* fix failing checkstyle

* fix typo

* wrap the ServerSideEncryptingAmazonS3.Builder in a provider

* added java docs for S3InputSource constructor

* added java docs for S3InputSource constructor

* remove wrap the ServerSideEncryptingAmazonS3.Builder in a provider
2020-02-25 20:59:53 -08:00
zachjsh d771b42ed1
Move Azure extension into Core (#9394)
* Move Azure extension into Core

Moving the azure extension into Core.

* Fix build failure

* Add The MIT License (MIT) to list of compatible licenses

* Address review comments

* change reference to contrib azure to core azure

* Fix spelling mistakes.
2020-02-25 17:49:16 -08:00
als-sdin f619903403
Updated the configuration documentation on coordinator kill tasks to clarify whether they delete only unused segments. (#9400) 2020-02-25 13:15:55 -08:00
Chi Cao Minh 7fc99ee206
Add common optional dependencies for extensions (#9399)
* Add common optional dependencies for extensions

Include hadoop-aws and postgres JDBC connector jar to improve
out-of-the-box experience for extensions. The mysql JDBC connector jar
is not bundled as it is GPL.

* Update docs

* Fix typo
2020-02-25 00:04:00 -08:00
Jihoon Son 3bc7ae782c
Create splits of multiple files for parallel indexing (#9360)
* Create splits of multiple files for parallel indexing

* fix wrong import and npe in test

* use the single file split in tests

* rename

* import order

* Remove specific local input source

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* Update docs/ingestion/native-batch.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc and error msg

* fix build

* fix a test and address comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-02-24 17:34:39 -08:00
Clint Wylie 6d8dd5ec10
string -> expression -> string -> expression (#9367)
* add Expr.stringify which produces parseable expression strings, parser support for null values in arrays, and parser support for empty numeric arrays

* oops, macros are expressions too

* style

* spotbugs

* qualified type arrays

* review stuffs

* simplify grammar

* more permissive array parsing

* reuse expr joiner

* fix it
2020-02-21 15:43:02 -08:00
zachjsh f707064bed
Add Azure config options for segment prefix and max listing length (#9356)
* Add Azure config options for segment prefix and max listing length

Added configuration options to allow the user to specify the prefix
within the segment container to store the segment files. Also
added a configuration option to allow the user to specify the
maximum number of input files to stream for each iteration.

* * Fix test failures

* * Address review comments

* * add dependency explicitly to pom

* * update docs

* * Address review comments

* * Address review comments
2020-02-21 14:12:03 -08:00
Jihoon Son 141d8dd875
Enable druid.coordinator.kill.pendingSegments.on by default (#9385)
* Enable druid.coordinator.kill.pendingSegments.on by default

* checkstyle
2020-02-21 13:13:49 -08:00
Björn Zettergren 30c24df4d3
Add config option for namespacePrefix (#9372)
* Add config option for namespacePrefix

The opentsdb emitter sends metric names to opentsdb verbatim, exactly as
druid names them (for example "query.count"). This doesn't fit well with a
central opentsdb server which might have namespaced metrics, for example
"druid.query.count". This adds support for an optional prefix.

The prefix is followed by a dot (.), so the metric name
becomes <namespacePrefix>.<metricname>

Configurable as "druid.emitter.opentsdb.namespacePrefix", as
documented.
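
A minimal sketch of the naming rule described above, not the extension's actual Java code. The function name `build_metric` echoes the helper mentioned later in this commit, but its signature here is an assumption, as is treating an empty prefix the same as a missing one.

```python
from typing import Optional


def build_metric(namespace_prefix: Optional[str], metric_name: str) -> str:
    """Return '<namespace_prefix>.<metric_name>', or the bare name if no prefix is set.

    Hypothetical helper for illustration; an empty prefix is treated as "no prefix".
    """
    if not namespace_prefix:
        return metric_name
    return f"{namespace_prefix}.{metric_name}"


print(build_metric("druid", "query.count"))
print(build_metric(None, "query.count"))
```

With the prefix "druid", the metric "query.count" would be emitted as "druid.query.count"; without a prefix the name is sent unchanged.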

Co-authored-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Martin Gerholm <martin.gerholm@deltaprojects.com>
Signed-off-by: Björn Zettergren <bjorn.zettergren@deltaprojects.com>

* Spelling for PR #9372

Added "namespacePrefix" to .spelling exceptions, it's a variable name
used in documentation for opentsdb-emitter.

* fixing tests for PR #9372

changed naming of variables to be more descriptive
added test of prefix being an empty string: "".
added a conditional to buildNamespacePrefix to check for empty string
being fed if EventConverter called without OpentsdbEmitterConfig
instance.

* fixing checkstyle errors for PR #9372

used == to compare literal string, should be equals()

* cleaned up and updated PR #9372

Created a buildMetric function as suggested by clintropolis, and
removed redundant tests for empty strings as they're only used when
calling EventConverter directly without going through
OpentsdbEmitterConfig.

* consistent naming of tests PR #9372

Changed names of tests in files to match better with what it was
actually testing

changed check for Strings.isNullOrEmpty to just check for `null`, as
empty string valued `namespacePrefix` is handled in
OpentsdbEmitterConfig.

Co-authored-by: Martin Gerholm <inspector-martin@users.noreply.github.com>
2020-02-20 14:01:41 -08:00
Clint Wylie b408a6d774
sql support for dynamic parameters (#6974)
* sql support for dynamic parameters

* fixup

* javadocs

* fixup from merge

* formatting

* fixes

* fix it

* doc fix

* remove druid fallback self-join parameterized test

* unused imports

* ignore test for now

* fix imports

* fixup

* fix merge

* merge fixup

* fix test that cannot vectorize

* fixup and more better

* dependency thingo

* fix docs

* tweaks

* fix docs

* spelling

* unused imports after merge

* review stuffs

* add comment

* add ignore text

* review stuffs
2020-02-19 13:09:20 -08:00
Clint Wylie 2e54755a03
add docker tutorial, friendlier docker-compose.yml, experimental java 11 dockerfile (#9262)
* add docker tutorial, experimental java 11 dockerfile

* fix typo

* spelling

* doc adjustments
2020-02-13 21:24:45 -08:00
Maytas Monsereenusorn c30579e47b
ANY Aggregator should not skip null values implementation (#9317)
* ANY Aggregator should not skip null values implementation

* add tests

* add more tests

* Update documentation

* add more tests

* address review comments

* optimize StringAnyBufferAggregator

* fix failing tests

* address pr comments
2020-02-12 14:01:41 -08:00
Clint Wylie c3ebb5eb65
variance aggregator support for double columns (#9076)
* variance aggregator support for double column instead of casting to float

* docs

* everything in its right place

* checkstyle

* adjustments
2020-02-12 09:32:42 -08:00
Dusan Maric ebd199da73
docs: extensions-core: mysql: fix MySQL connector library Maven Central URL (#9344) 2020-02-10 21:54:46 -08:00
Atul Mohan 7968524b01
Add Pig-specific file handling to Avro parser (#9258)
* Add processing for data files from AvroStorage

* Add words to spellings file
2020-02-10 21:53:11 -08:00
Clint Wylie b55657cc26
fix protobuf extension packaging and docs (#9320)
* fix protobuf extension packaging and docs

* fix paths

* Update protobuf.md

* Update protobuf.md
2020-02-07 09:26:52 -08:00
Lucas Capistrant 2e1dbe598c
Create new dynamic config to pause coordinator helpers when needed (#9224)
* Create new dynamic config to pause coordinator helpers when needed

* Fix spelling mistakes flagged in Travis build

* Add an integration test for coordinator pause dynamic config

* Improve documentation for new dynamic coordinator config and remove un-needed info logs in favor of debug

* address naming convention of 'deep store' vs 'deep storage' in new configs doc line

* Fix newline at end of configuration index.md

* Last try to resolve newline issue in configuration readme

* fix spell checks from travis build

* Fix another flagged spelling error from Travis
2020-02-05 15:33:42 -08:00
Aditya 868fdeb384
GREATEST/LEAST post-aggregators in SQL (#8719)
* implement shell for greatest sql aggregator with hardcoded long values

* implement functional long greatest aggregator for direct access columns

* implement greatest & least sql aggregators for long & double types using abstract base class

* add javadocs, unit tests & handling for floats for greatest/least postaggregations

* minor checkstyle fix

* improve naming for the test cases

* make inner class static

* remove blank lines to retest travis build

* change trivial text to rerun travis build

* implement suggested updates for greatest/least sql aggs & fix checkstyle issues

* fix stale comments in greatest/least sql aggs abstract base

* Update sql.md

* improve sql function definitions for greatest/least sql aggs

* add more tests for greatest/least sql aggs

* add tests to cover invalid greatest/least sql expressions

* rename & reorder greatest least sql tests
2020-02-04 17:08:53 -08:00
sthetland 556a3861ed
Make docs on reset supervisor operation scarier (#9288)
* Update kafka-ingestion.md

Companion doc update to #9253, intended to make a supervisor reset scarier

* Update kinesis-ingestion.md
2020-02-04 15:30:31 -08:00
Roman Leventov b9186f8f9f Reconcile terminology and method naming to 'used/unused segments'; Rename MetadataSegmentManager to MetadataSegmentsManager (#7306)
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused to return server error instead of an empty response in case of error

* Fix brace

* Import order

* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill

* Fix tests

* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY

* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters

* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig

* More variable and method renames

* Rename MetadataSegments to SegmentsMetadata

* Javadoc update

* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs

* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()

* Reorder imports

* Rename SegmentsMetadata.tryMark... methods to mark... and make them to return boolean and the numbers of segments changed and relay exceptions to callers

* Complete merge

* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests

* Remove MetadataSegmentManager

* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments

* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder

* Fix inspections

* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest

* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods

* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator

* Unused import

* Optimize imports

* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()

* Unused import

* Update terminology in datasource-view.tsx

* Fix label in datasource-view.spec.tsx.snap

* Fix lint errors in datasource-view.tsx

* Doc improvements

* Another attempt to please TSLint

* Another attempt to please TSLint

* Style fixes

* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)

* Try to fix docs build issue

* Javadoc and spelling fixes

* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments

* Address more comments
2020-01-27 11:24:29 -08:00
Zhenxiao Luo 479c09751c Add MostAvailableSizeStorageLocationSelectorStrategy (#8879)
* Add MostAvailableSize LocationSelectorStrategy

* Add doc for mostAvailableSize strategy

* Fix docs for mostAvailableSize
2020-01-23 13:42:03 -08:00
sthetland 83ddc8de1e Update data-formats.md (#9238)
* Update data-formats.md

Field error and light rewording of new Avro material (and working through the doc authoring process).

* Update data-formats.md

Make default statements consistent. Future change: s/=/is.
2020-01-22 15:00:53 -08:00
Clint Wylie 8011211a0c first/last aggregators and nulls (#9161)
* null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs

* initially null or not based on config

* review stuff, make string first/last consistent with null handling of numeric columns, more tests

* docs

* handle nil selectors, revert to primitive first/last types so groupby v1 works...
2020-01-20 11:51:54 -08:00
Suneet Saldanha 180c622e0f Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Gian Merlino d21054f7c5
Remove the deprecated interval-chunking stuff. (#9216)
* Remove the deprecated interval-chunking stuff.

See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details.

* Remove unused import.

* Remove chunkInterval too.
2020-01-19 17:14:23 -08:00
Suneet Saldanha 93167188ea Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:49:33 -08:00
Jihoon Son 153495068b Doc update for the new input source and the new input format (#9171)
* Doc update for new input source and input format.

- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are located in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md

* parquet

* add warning for range partitioning with sequential mode

* hdfs + s3, gs

* add fs impl for gs

* address comments

* address comments

* gcs
2020-01-17 15:52:05 -08:00
singh 936b9bdfd0 add deets about the keyfile (#9209) 2020-01-17 11:24:49 -08:00
Maytas Monsereenusorn 42359c93dd Implement ANY aggregator (#9187)
* Implement ANY aggregator

* Add copyright headers

* Add unit tests

* fix BufferAggregator

* Fix bug in BufferAggregator

* hook up the SQL command

* add check for buffer aggregator

* Address comment

* address comments

* add docs

* Address comments

* add more tests for numeric columns that have null values when run in sql compatible null mode

* fix checkstyle errors

* fix failing tests

* fix failing tests
2020-01-16 14:40:32 -08:00
Suneet Saldanha 92ac22d060 Link javaOpts to middlemanager runtime.properties docs (#9101)
* Link javaOpts to middlemanager runtime.properties docs

* fix broken link

* reword config links
2020-01-15 21:22:49 -08:00
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel
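
A rough sketch of the spec shape described above, written as a Python dict so it can be dumped to JSON. The field names follow the Druid native batch docs as I understand them (note `dimensionsSpec` with an "s") and should be checked against those docs; the data source name, directory, filter, and the helper `build_index_parallel_spec` are made up for illustration.

```python
import json


def build_index_parallel_spec(data_source, base_dir, file_filter, dimensions):
    """Build a native batch spec that uses inputSource + inputFormat (no parser/firehose)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # timestampSpec and dimensionsSpec sit directly in dataSchema,
                # not inside a parser.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": list(dimensions)},
            },
            "ioConfig": {
                "type": "index_parallel",
                # Where the data lives (replaces the firehose).
                "inputSource": {"type": "local", "baseDir": base_dir, "filter": file_filter},
                # How the data is encoded (replaces the parser's format).
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                # One subtask: described above as equivalent to a plain index task.
                "maxNumConcurrentSubTasks": 1,
            },
        },
    }


spec = build_index_parallel_spec("example_datasource", "path/to/data", "*.json.gz", ["page", "user"])
print(json.dumps(spec, indent=2))
```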

* fix typo
2020-01-15 14:08:29 -08:00
Jonathan Wei d1500c1328 Update Kinesis resharding information about task failures (#9104) 2020-01-07 15:44:48 -08:00
Jonathan Wei 58d337186b
Graduation update for ASF release process guide and download links (#9126)
* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml
2020-01-06 15:00:33 -06:00
Jonathan Wei aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Jihoon Son 3c31493772 Add missing docs for http client configurations (#9054)
* Add missing docs for http client configurations

* fix typo

* backticks
2019-12-19 17:41:04 -08:00
Chi Cao Minh 6178f05da6 Fail superbatch range partition multi dim values (#9058)
* Fail superbatch range partition multi dim values

Change the behavior of parallel indexing range partitioning to fail
ingestion if any row had multiple values for the partition dimension.
After this change, the behavior matches that of hadoop indexing.
(Previously, rows with multiple dimension values would be skipped.)
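
The new behavior could be sketched roughly as follows. This is an illustration only, not Druid's implementation: rows are plain dicts, a multi-valued dimension is modeled as a list, and `check_single_valued` is a hypothetical name.

```python
def check_single_valued(rows, partition_dimension):
    """Raise if any row has more than one value for the partition dimension.

    The previous behavior would have skipped such rows instead of failing.
    """
    for index, row in enumerate(rows):
        value = row.get(partition_dimension)
        if isinstance(value, (list, tuple)) and len(value) > 1:
            raise ValueError(
                f"Row {index} has multiple values for partition dimension "
                f"'{partition_dimension}': {list(value)}"
            )
    return rows


check_single_valued([{"country": "SE"}, {"country": ["US"]}], "country")
```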

* Improve err msg, rename method, rename test class
2019-12-18 10:14:03 -08:00
Clint Wylie 6881535b48
docs - clarify cache parameters (#9020) 2019-12-13 16:53:45 -08:00
Suneet Saldanha 3325da1718 Allow startup scripts to specify java home (#9021)
* Allow startup scripts to specify java home

The startup scripts now look for java in 3 locations. The order is from
most related to druid to least, ie
    ${DRUID_JAVA_HOME}
    ${JAVA_HOME}
    ${PATH}
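
The lookup order can be sketched like this. The real startup scripts are shell; this Python version only illustrates the precedence. Assumptions: each `*_HOME` variable points at a directory containing `bin/java`, and a set-but-unusable variable falls through to the next location. `find_java` and its parameters are hypothetical.

```python
import os
import shutil


def _is_executable(path):
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_java(env=None, which=shutil.which, is_executable=_is_executable):
    """Return the java binary to use, checking DRUID_JAVA_HOME, JAVA_HOME, then PATH."""
    env = os.environ if env is None else env
    for variable in ("DRUID_JAVA_HOME", "JAVA_HOME"):
        home = env.get(variable)
        if home:
            candidate = os.path.join(home, "bin", "java")
            if is_executable(candidate):
                return candidate
    # Least Druid-specific location: whatever "java" resolves to on the PATH.
    return which("java")


print(find_java())
```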

* Update fn names and clean up code

* final round of fixes

* fix spellcheck
2019-12-12 21:36:00 -08:00
Himanshu 9236dd9467
optionally enable Jetty ForwardedRequestCustomizer (#9010)
* optionally enable Jetty ForwardedRequestCustomizer

* fix doc build
2019-12-12 17:00:08 -08:00
Benjamin Hopp 13c33c1766 Update architecture.md (#9015) 2019-12-11 19:05:50 -08:00
Jihoon Son e5e1e9c4ee
Fix broken master (#9005)
* Multibinding for NodeRole

* Fix endpoints

* fix doc

* fix test
2019-12-11 15:56:36 -08:00
Parag Jain 24fe824055 add readiness endpoints to processes having initialization delays (#8841) 2019-12-10 17:26:13 -08:00
Chi Cao Minh 3de7ab8523 DataSketches jars in core (#9003)
Having DataSketches jars in core will allow potential improvements, for
example:
- Provide an alternative implementation of HLL:
  https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html
- Range partitioning for native parallel batch indexing without having
  the user load extensions on the classpath

Dev mailing list discussion:
https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E
2019-12-10 14:02:34 -08:00
Chi Cao Minh bab78fc80e Parallel indexing single dim partitions (#8925)
* Parallel indexing single dim partitions

Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.

The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:

1) In parallel, determine the distribution of dimension values for each
   input source split.

   `PartialDimensionDistributionTask` uses `StringSketch` to generate
   the approximate distribution of dimension values for each input
   source split. If the rows are ungrouped,
   `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
   uses a Bloom filter to skip rows that would be grouped. The final
   distribution is sent back to the supervisor via
   `DimensionDistributionReport`.

2) The range partitions are determined.

   In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
   supervisor uses `StringSketchMerger` to merge the individual
   `StringSketch`es created in the preceding phase. The merged sketch is
   then used to create the range partitions.

3) In parallel, generate partial range-partitioned segments.

   `PartialRangeSegmentGenerateTask` uses the range partitions
   determined in the preceding phase and
   `RangePartitionCachingLocalSegmentAllocator` to generate
   `SingleDimensionShardSpec`s.  The partition information is sent back
   to the supervisor via `GeneratedGenericPartitionsReport`.

4) The partial range segments are grouped.

   In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
   the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
   necessary for the next phase.

5) In parallel, merge partial range-partitioned segments.

   `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
   retrieve the partial range-partitioned segments generated earlier and
   then merges and publishes them.
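
Phases 1 and 2 above can be illustrated with a much simplified sketch. It swaps the approximate `StringSketch` for exact sorted lists, ignores the Bloom-filter rollup handling and null values, and uses hypothetical names throughout (`split_distribution`, `determine_range_partitions`, `assign_partition`, `target_rows_per_partition`); it is not the Druid implementation.

```python
def split_distribution(rows, dimension):
    """Phase 1 (per split): collect the dimension values seen in this split."""
    return sorted(row[dimension] for row in rows)


def determine_range_partitions(distributions, target_rows_per_partition):
    """Phase 2 (supervisor): merge per-split distributions and cut them into ranges.

    Returns a list of (start, end) pairs; start is inclusive, end is exclusive,
    and None means unbounded.
    """
    merged = sorted(value for dist in distributions for value in dist)
    boundaries = []
    for i in range(target_rows_per_partition, len(merged), target_rows_per_partition):
        boundary = merged[i]
        # Skip repeated boundaries so every range is non-degenerate.
        if not boundaries or boundary > boundaries[-1]:
            boundaries.append(boundary)
    return list(zip([None] + boundaries, boundaries + [None]))


def assign_partition(value, partitions):
    """Phase 3 helper: find the range partition that a dimension value falls into."""
    for index, (start, end) in enumerate(partitions):
        if (start is None or value >= start) and (end is None or value < end):
            return index
    raise ValueError(f"no partition for {value!r}")


splits = [
    [{"host": "a"}, {"host": "c"}, {"host": "e"}],
    [{"host": "b"}, {"host": "d"}, {"host": "f"}],
]
distributions = [split_distribution(rows, "host") for rows in splits]
partitions = determine_range_partitions(distributions, target_rows_per_partition=2)
print(partitions)
```

Each worker only ships its distribution back to the supervisor, so the supervisor can pick boundaries without reading the input itself; the sketch-based version trades exactness for bounded memory.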

* Fix dependencies & forbidden apis

* Fixes for integration test

* Address review comments

* Fix docs, strict compile, sketch check, rollup check

* Fix first shard spec, partition serde, single subtask

* Fix first partition check in test

* Misc rewording/refactoring to address code review

* Fix doc link

* Split batch index integration test

* Do not run parallel-batch-index twice

* Adjust last partition

* Split ITParallelIndexTest to reduce runtime

* Rename test class

* Allow null values in range partitions

* Indicate which phase failed

* Improve asserts in tests
2019-12-09 23:05:49 -08:00
Vadim Ogievetsky 0330744793 Docs: bold Java 8 requirement (#8996)
* bold Java 8 req

* add warning box
2019-12-09 20:23:07 -08:00
Roman Leventov 1c62987783
Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702)
* Add SelfDiscoveryResource

* Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource listen to updates only about a single node (itself).

* Extended docs

* Fix brace

* Remove redundant throws in Lifecycle.Handler.stop()

* Import order

* Remove unresolvable link

* Address comments

* tmp

* tmp

* Rollback docker changes

* Remove extra .sh files

* Move filter

* Fix SecurityResourceFilterTest
2019-12-08 18:47:58 +03:00
Clint Wylie 441515cb50 update dump-segment docs so example command works (#8998)
* update dump-segment docs so example command works

* not everyone uses bash
2019-12-07 06:36:46 -08:00
Jonathan Wei c949a25210
Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982)
* Add Druid input source and format

* Inherit dims/metrics from segment

* Add ingest segment firehose reindexing test

* Remove unnecessary module

* Fix unit tests, checkstyle

* Add doc entry

* Fix dimensionExclusions handling, add parallel index integration test

* Add spelling exclusion

* Address some PR comments

* Checkstyle

* wip

* Address rest of PR comments

* Address PR comments
2019-12-05 16:50:00 -08:00
Clint Wylie 5ecdf94d83
add 'prefixes' support to google input source (#8930)
* add prefixes support to google input source, making it symmetrical-ish with s3

* docs

* more better, and tests

* unused

* formatting

* javadoc

* dependencies

* oops

* review comments

* better javadoc
2019-12-04 21:01:10 -08:00
Lucas Capistrant 8dd9a8cb15 Small doc fix for baseTaskDir conf (#8978) 2019-12-04 14:07:03 -08:00
Clint Wylie a48784a1fd dropwizard-emitter doc fixes (#8988) 2019-12-04 12:52:58 -08:00
Fangyuan Deng 187cf0dd3f [Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988)
* historical fast restart by lazy load columns metadata

* delete repeated code

* add documentation for druid.segmentCache.lazyLoadOnStart

* fix unit test fail

* fix spellcheck

* update docs

* update docs mentioning a catch
2019-12-03 09:47:01 -08:00
Jonathan Wei 00ce18a0ea
Additional Kinesis resharding fixes (#8870)
* Additional Kinesis resharding fixes

* Address PR comments

* Remove unused method

* Adjust SegmentTransactionalInsertAction null handling

* Check for unchanged metadata on empty publish

* Add logs for empty publish

* Fix javadoc

* Clear offset when invalid endOffsets are seen

* Fix LGTM alert

* Fix build

* Add resharding note to Kinesis docs

* Checkstyle

* Spelling

* Address PR comments

* Checkstyle
2019-11-28 12:59:01 -08:00
Clint Wylie 4458113375
S3 input source (#8903)
* add s3 input source for native batch ingestion

* add docs

* fixes

* checkstyle

* lazy splits

* fixes and hella tests

* fix it

* re-use better iterator

* use key

* javadoc and checkstyle

* exception

* oops

* refactor to use S3Coords instead of URI

* remove unused code, add retrying stream to handle s3 stream

* remove unused parameter

* update to latest master

* use list of objects instead of object

* serde test

* refactor and such

* now with the ability to compile

* fix signature and javadocs

* fix conflicts yet again, fix S3 uri stuffs

* more tests, enforce uri for bucket

* javadoc

* oops

* abstract class instead of interface

* null or empty

* better error
2019-11-25 22:31:19 -08:00
Jihoon Son a2e6de4b16 Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924)
* Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor

* Fix docs and javadoc

* Add unit tests for large or small estimated num splits

* add override
2019-11-23 01:38:08 -08:00
Clint Wylie 7250010388 add parquet support to native batch (#8883)
* add parquet support to native batch

* cleanup

* implement toJson for sampler support

* better binaryAsString test

* docs

* i hate spellcheck

* refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest

* add comment, fix some stuff

* adjustments

* fix accident

* tweaks
2019-11-22 10:49:16 -08:00
SeKing 9955107e8e RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461) 2019-11-22 03:46:54 -08:00
Surekha d628bebbd7 Make supervisor API similar to submit task API (#8810)
* accept spec or dataSchema, tuningConfig, ioConfig while submitting task json

* fix test

* update docs

* lgtm warning

* Add original constructor back to IndexTask to minimize changes

* fix indentation in docs

* Allow spec to be specified in supervisor schema

* undo IndexTask spec changes

* update docs

* Add Nullable and deprecated annotations

* remove deprecated configs from SeekableStreamSupervisorSpec

* remove nullable annotation
2019-11-20 10:04:41 -08:00
Clint Wylie d67c3c7aed document SQL compatible null handling mode (#8894)
* document SQL compatible null handling mode

* adjustments

* fix docs

* review changes
2019-11-20 06:52:20 -08:00
Clint Wylie 074a45219d add google cloud storage InputSource for native batch (#8907)
* add google cloud storage InputSource for native batch

* rename

* checkstyle

* fix

* fix spelling

* review comments
2019-11-19 19:49:43 -08:00
Chi Cao Minh 8365bdf62a Address security vulnerabilities (#8878)
* Address security vulnerabilities

Security vulnerabilities addressed by upgrading 3rd party libs:

- Upgrade avro-ipc to 1.9.1
  - sonatype-2019-0115
- Upgrade caffeine to 2.8.0
  - sonatype-2019-0282
- Upgrade commons-beanutils to 1.9.4
  - CVE-2014-0114
- Upgrade commons-codec to 1.13
  - sonatype-2012-0050
- Upgrade commons-compress to 1.19
  - CVE-2019-12402
  - sonatype-2018-0293
- Upgrade hadoop-common to 2.8.5
  - CVE-2018-11767
- Upgrade hadoop-mapreduce-client-core to 2.8.5
  - CVE-2017-3166
- Upgrade hibernate-validator to 5.2.5
  - CVE-2017-7536
- Upgrade httpclient to 4.5.10
  - sonatype-2017-0359
- Upgrade icu4j to 55.1
  - CVE-2014-8147
- Upgrade jackson-databind to 2.6.7.3
  - CVE-2017-7525
- Upgrade jetty-http to 9.4.12
  - CVE-2017-7657
  - CVE-2017-7658
  - CVE-2017-7656
  - CVE-2018-12545
- Upgrade log4j-core to 2.8.2
  - CVE-2017-5645
- Upgrade netty to 3.10.6
  - CVE-2015-2156
- Upgrade netty-common to 4.1.42
  - CVE-2019-9518
- Upgrade netty-codec-http to 4.1.42
  - CVE-2019-16869
- Upgrade nimbus-jose-jwt to 4.41.1
  - CVE-2017-12972
  - CVE-2017-12974
- Upgrade plexus-utils to 3.0.24
  - CVE-2017-1000487
  - sonatype-2015-0173
  - sonatype-2016-0398
- Upgrade postgresql to 42.2.8
  - CVE-2018-10936

Note that if users are using JDBC lookups with postgres, they may need
to update the JDBC jar used by the lookup extension.

* Fix license for postgresql
2019-11-19 09:14:33 -08:00
Chi Cao Minh d60978343a Improve missing JDBC driver error for lookups (#8872)
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
2019-11-18 11:42:38 -08:00
Jihoon Son 1611792855
Add InputSource and InputFormat interfaces (#8823)
* Add InputSource and InputFormat interfaces

* revert orc dependency

* fix dimension exclusions and failing unit tests

* fix tests

* fix test

* fix test

* fix firehose and inputSource for parallel indexing task

* fix tc

* fix tc: remove unused method

* Formattable

* add needsFormat(); renamed to ObjectSource; pass metricsName for reader

* address comments

* fix closing resource

* fix checkstyle

* fix tests

* remove verify from csv

* Revert "remove verify from csv"

This reverts commit 1ea7758489.

* address comments

* fix import order and javadoc

* flatMap

* sampleLine

* Add IntermediateRowParsingReader

* Address comments

* move csv reader test

* remove test for verify

* adjust comments

* Fix InputEntityIteratingReader

* rename source -> entity

* address comments
2019-11-15 09:22:09 -08:00
Clint Wylie cc54b2a9df support for array expressions in TransformSpec with ExpressionTransform (#8744)
* transformSpec + array expressions

changes:
* added array expression support to transformSpec
* removed ParseSpec.verify since its only use afaict was preventing transform expr that did not replace their input from functioning
* hijacked index task test to test changes

* remove docs about being unsupported

* re-arrange test assert

* unused imports

* imports

* fix tests

* preserve types

* suppress warning, fixes, add test

* formatting

* cleanup

* better list to array type conversion and tests

* fix oops
2019-11-13 11:04:37 -08:00
fst0 80dbf44fca Add reference to druid.storage.type (#8857)
* Add reference to `druid.storage.type`

This should be in here. Without setting storage type to S3 globally it will obviously not be used, even if all other parameters are correct.

* Update s3.md

Add global storage parameter to knob table.

* Update s3.md
2019-11-13 10:03:41 -08:00
Lucas Capistrant a066cc5648 Fix groupMapping endpoint URIs in druid-basic-security doc (#8847) 2019-11-12 21:12:34 +05:30
Jonathan Wei 75ea0d592a Add more datasketches doubles sketch SQL functions (#8843)
* Add more datasketches doubles sketch SQL postaggs

* style and lgtm
2019-11-08 18:05:06 -08:00
Gian Merlino 0e8c3f74d0 SQL: EARLIEST, LATEST aggregators. (#8815)
* SQL: EARLIEST, LATEST aggregators.

I chose these names instead of FIRST, LAST because those are already
reserved functions in Calcite that mean something different. I think
these are also better names anyway.

* Finalify.

* SQL updates.

* Adjust aggregator calls.

* Validations, test updates.

* Review docs.
2019-11-08 16:29:25 -08:00
Clint Wylie 7aafcf8bca parallel broker merges on fork join pool (#8578)
* sketch of broker parallel merges done in small batches on fork join pool

* fix non-terminating sequences, auto compute parallelism

* adjust benches

* adjust benchmarks

* now hella more faster, fixed dumb

* fix

* remove comments

* log.info for debug

* javadoc

* safer block for sequence to yielder conversion

* refactor LifecycleForkJoinPool into LifecycleForkJoinPoolProvider which wraps a ForkJoinPool

* smooth yield rate adjustment, more logs to help tune

* cleanup, less logs

* error handling, bug fixes, on by default, more parallel, more tests

* remove unused var

* comments

* timeboundary mergeFn

* simplify, more javadoc

* formatting

* pushdown config

* use nanos consistently, move logs back to debug level, bit more javadoc

* static terminal result batch

* javadoc for nullability of createMergeFn

* cleanup

* oops

* fix race, add docs

* spelling, remove todo, add unhandled exception log

* cleanup, revert unintended change

* another unintended change

* review stuff

* add ParallelMergeCombiningSequenceBenchmark, fixes

* hyper-threading is the enemy

* fix initial start delay, lol

* parallelism computer now balances partition sizes to partition counts using sqrt of sequence count instead of sequence count by 2

* fix those important style issues with the benchmarks code

* lazy sequence creation for benchmarks

* more benchmark comments

* stable sequence generation time

* update defaults to use 100ms target time, 4096 batch size, 16384 initial yield, also update user docs

* add jmh thread based benchmarks, cleanup some stuff

* oops

* style

* add spread to jmh thread benchmark start range, more comments to benchmarks parameters and purpose

* retool benchmark to allow modeling more typical heterogenous heavy workloads

* spelling

* fix

* refactor benchmarks

* formatting

* docs

* add maxThreadStartDelay parameter to threaded benchmark

* why does catch need to be on its own line but else doesnt
2019-11-07 11:58:46 -08:00
Jad Naous ce3c0dae4d Add note on JDBC libs for lookups (#8825)
* Add note on JDBC libs for lookups

* Fix directory and additional "the"
2019-11-06 13:31:26 -08:00
Himanshu 5adc8212b4
add documentation for druid docker and k8s operator (#8802)
* add documentation for druid docker and k8s operator

* address review comment and add Kubernetes to spelling file
2019-11-06 12:56:21 -08:00
Tijo Thomas 27acdbd2b8 'hadoop fs' command is deprecated. The new approach is to use the hdfs command. Replacing 'hadoop fs' command with 'hdfs dfs' (#8762) 2019-11-01 04:42:10 +05:30
Giuseppe Martino 9c171e2b1f Message rejection absolute date (#8656)
* Add option lateMessageRejectionStartDate

* Use option lateMessageRejectionStartDate

* Fix tests

* Add lateMessageRejectionStartDate to kafka indexing service

* Update tests kafka indexing service

* Fix tests for KafkaSupervisorTest

* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig

* Fix var name

* Update documentation

* Add check for lateMessageRejectionStartDateTime and lateMessageRejectionPeriod; fail if both are specified.
2019-10-31 15:13:02 -07:00
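The mutual-exclusion check named in the last bullet of #8656 can be sketched as follows. This is only an illustration in Python, not Druid's code: the function name, argument names, and error text are hypothetical and merely mirror the option names from the commit message.

```python
from datetime import datetime, timedelta
from typing import Optional


def validate_late_message_rejection(
    start_date: Optional[datetime],
    rejection_period: Optional[timedelta],
) -> None:
    """Hypothetical helper: reject configs that set both options at once."""
    if start_date is not None and rejection_period is not None:
        raise ValueError(
            "lateMessageRejectionStartDateTime and lateMessageRejectionPeriod "
            "cannot both be specified"
        )
```

Setting either option alone, or neither, passes; setting both raises `ValueError`.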
Clint Wylie 3ff5e02237 remove select query (#8739)
* remove select query

* thanks teamcity

* oops

* oops

* add back a SelectQuery class that throws RuntimeExceptions linking to docs

* adjust text

* update docs per review

* deprecated
2019-10-30 19:29:56 -07:00
Gian Merlino 7605c23354 Remove Tranquility configs and certain doc references. (#8793)
Since it hasn't received updates or community interest in a while, it makes sense
to de-emphasize it in the distribution and most documentation (outside of simple
mentions of its existence).
2019-10-30 16:30:16 -07:00
Gian Merlino c922d2c3c9 Use bundled ZooKeeper in tutorials. (#8792) 2019-10-30 16:17:28 -07:00
Gian Merlino aa81253cf4 Fix typos. (#8767) 2019-10-28 12:47:01 -07:00
Gian Merlino b65d2ac648 Add HDFS firehose (#8754)
* Add HDFS firehose.

* Tests, support for lists of paths.

* Fixups.

* Update list of firehoses.

* Wildcards is a word.
2019-10-28 08:07:38 -07:00
Vadim Ogievetsky f9b94a5db1
Docs: remove self link (#8760)
This section links to itself in the description. I tried to follow that link and spit hot tea all over my monitor from laughter.
2019-10-27 22:33:22 -07:00
Clint Wylie 09f92818d4 update druid expression docs to indicate that array functions do not work at indexing time (#8734)
* update druid expression docs to indicate that array functions are not supported in transformSpec

* fix unrelated spelling check
2019-10-24 22:04:08 -07:00
Eyal Yurman 14e33428f0 Moving Average extension: Add Sum averagers (#8511)
* Add sum averagers.

* avoid casting double to long.
2019-10-24 16:37:24 -07:00
Vadim Ogievetsky cc3650ee3b fix doc headers (#8729) 2019-10-24 11:17:39 -07:00
Jihoon Son f5b9bf5525 Cluster-wide configuration for query vectorization (#8657)
* Cluster-wide configuration for query vectorization

* add doc

* fix build

* fix doc

* rename to QueryConfig and add javadoc

* fix checkstyle

* fix variable names
2019-10-23 21:44:28 +08:00
David Glasser b453fda251 docs: clarify native batch ingestion w/ overlapping segments (#8720)
I was confused by a paragraph in the docs that I myself wrote!
2019-10-22 21:01:56 -07:00
Jad Naous 2ab43aa688 Update tutorial-kerberos-hadoop.md (#8689)
* Update tutorial-kerberos-hadoop.md

Fix up what looks like a bad merge.

* Update tutorial-kerberos-hadoop.md

Fix spelling issues
2019-10-22 14:40:41 -07:00
Abhishek Radhakrishnan 42cfe679f1 Update query result timestamp to match query intervals. (#8717) 2019-10-22 14:39:47 -07:00
Surekha e919eccc4b Update docs to add metadataSegment configs (#8708)
* Add metadataSegment configs to docs

* rearrange in alphabetical order
2019-10-22 01:19:36 -07:00
Kamal Gurala 3ed5f9698a gcs prefix doc fix (#8699) 2019-10-21 08:29:54 -07:00
Surekha 98f59ddd7e Add `sys.supervisors` table to system tables (#8547)
* Add supervisors table to SystemSchema

* Add docs

* fix checkstyle

* fix test

* fix CI

* Add comments

* Fix javadoc teamcity error

* comments

* fix links in docs

* fix links

* rename fullStatus query param to system and remove it from docs
2019-10-18 15:16:42 -07:00
Jonathan Wei d88075237a
Add initial SQL support for non-expression sketch postaggs (#8487)
* Add initial SQL support for non-expression sketch postaggs

* Checkstyle, spotbugs

* checkstyle

* imports

* Update SQL docs

* Checkstyle

* Fix theta sketch operator docs

* PR comments

* Checkstyle fixes

* Add missing entries for HLL sketch module

* PR comments, add round param to HLL estimate operator, fix optional HLL param
2019-10-18 14:59:44 -07:00
Jihoon Son 30c15900be
Auto compaction based on parallel indexing (#8570)
* Auto compaction based on parallel indexing

* javadoc and doc

* typo

* update spell

* addressing comments

* address comments

* fix log

* fix build

* fix test

* increase default max input segment bytes per task

* fix test
2019-10-18 13:24:14 -07:00
Mingming Qiu 2c758ef5ff Support assigning tasks to run on different categories of MiddleManagers (#7066)
* Support assigning tasks to run on different tiers of MiddleManagers

* address comments

* address comments

* rename tier to category and docs

* doc

* fix doc

* fix spelling errors

* docs
2019-10-17 12:57:19 -07:00
Jad Naous d54d2e1627 Update segments.md (#8693)
Make bullet numbers clearer with parentheses, fix last reference to 2 being interpreted as a bullet point.
2019-10-17 11:55:23 -07:00
Jad Naous 9f4e11df32 Update tutorial-rollup.md (#8687)
At this point there hasn't yet been an explanation in the tutorial of what "segments" are
2019-10-16 20:08:09 -06:00
Jonathan Wei 89ce6384f5
More Kinesis resharding adjustments (#8671)
* More Kinesis resharding adjustments

* Fix TC inspection

* Fix comment

* Adjust comment, small refactor

* Make repartition transition time configurable

* Add spellcheck exclusion

* Spelling fix
2019-10-15 23:19:17 -07:00
Jihoon Son 4046c86d62
Stateful auto compaction (#8573)
* Stateful auto compaction

* javadoc

* add removed test back

* fix test

* adding indexSpec to compactionState

* fix build

* add lastCompactionState

* address comments

* extract CompactionState

* fix doc

* fix build and test

* Add a task context to store compaction state; add javadoc

* fix it test
2019-10-15 22:57:42 -07:00
Mitch Lloyd 1a78a0c98a Add credentials for ECS (#8651)
* Add credentials for ECS

* Fix import order

* Update S3 authentication methods table

* Update .spelling for new documentation
2019-10-12 09:12:14 -07:00
Abhishek Radhakrishnan d87840d894 Minor updates to documentation. (#8665) 2019-10-12 09:11:03 -07:00
Jihoon Son 96d8523ecb Use hash of Segment IDs instead of a list of explicit segments in auto compaction (#8571)
* IOConfig for compaction task

* add javadoc, doc, unit test

* fix webconsole test

* add spelling

* address comments

* fix build and test

* address comments
2019-10-09 11:12:00 -07:00
Clint Wylie 8bda3afea4 fix spelling errors triggered by another doc PR (#8653) 2019-10-08 23:43:58 -07:00
Nishant Bangarwa 0853273091 Add tier based usage metrics for historical nodes to help with autoscaling (#8636)
* Add tier based usage metrics for historical nodes to help with druid historical autoscaling

Add tier based usage metrics for historical nodes to help druid cluster orchestration systems understand the historical node usage and requirements. Following metrics would be helpful -

tier/required/capacity- total capacity in bytes required in each tier. Dimensions - tier
tier/total/capacity - total capacity in bytes available in a given tier. Dimension - tier
tier/historical/count - no. of historical nodes available in each tier. Dimension - tier
tier/replication/factor - configured maximum replication factor in given tier. Dimension - tier

* fix unit test failures
2019-10-08 19:55:32 -07:00
Mohammad J. Khan 18758f5228 Support LDAP authentication/authorization (#6972)
* Support LDAP authentication/authorization

* fixed integration-tests

* fixed Travis CI build errors related to druid-security module

* fixed failing test

* fixed failing test header

* added comments, force build

* fixes for strict compilation spotbugs checks

* removed authenticator rolling credential update feature

* removed escalator rolling credential update feature

* fixed teamcity inspection deprecated API usage error

* fixed checkstyle execution error, removed unused import

* removed cached config as part of removing authenticator rolling credential update feature

* removed config bundle entity as part of removing authenticator rolling credential update feature

* refactored LDAP configuration

* added support for SSLContext configuration and TLSCertificateChecker

* removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer

* Separate out authorizer checks between metadata-backed store user and LDAP user/groups

* refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks

* fixes build issue

* final review comments updates

* final review comments updates

* fixed LGTM and spellcheck alerts

* Fixed Avatica auth failure error message check

* Updated metadata credentials validator exception message string, replaced DB with metadata store
2019-10-08 17:08:27 -07:00
Clint Wylie 2f20799868 merge recommendations into basic-cluster-tuning, add additional info (#8649)
* merge recommendations into basic-cluster-tuning, add additional info

* stupid sidebar
2019-10-08 16:33:54 -07:00
Himanshu c078ed40fd
groupBy query: optional limit push down to segment scan (#8426)
* groupBy query: optional limit push down to segment scan

* make segment level limit push down configurable

* fix teamcity errors

* fix segment limit pushdown flag handling on query level config override

* use equals for comparator check

* fix sql and null handling

* fix unused imports

* handle null offset in NullableValueGroupByColumnSelectorStrategy for buffer comparator similar to RowBasedGrouperHelper.NullableRowBasedKeySerdeHelper
2019-10-08 15:35:07 -07:00
Lucas Capistrant d801ce2f29 Update rollup table to properly reflect 0.16.0 (#8638)
This table stated that `index_parallel` tasks were best-effort only. However, this changed with #8061 and this documentation update was simply missed.
2019-10-07 12:37:15 -07:00
Xavier Léauté 1d42551d95 Fix statsd types (#8628)
* fix segment underReplicated/unavailable counts to be gauges instead of counters

* fix jvm/gc/cpu to be a counter instead of timer

jvm/gc/cpu represents the total cpu time spent for multiple gc
invocations, not the time spent in each gc cycle.

the number needs to be divided by jvm/gc/count to get the average gc
time per cycle

* update docs

* fix spellcheck
2019-10-06 14:14:09 -07:00
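The division described in #8628 (cumulative `jvm/gc/cpu` divided by `jvm/gc/count`) can be sketched as below. The function and parameter names are hypothetical, and the commit message does not state a time unit, so the result is simply in whatever unit `jvm/gc/cpu` is reported in.

```python
def average_gc_time_per_cycle(gc_cpu_total: float, gc_count: int) -> float:
    """Average time per GC cycle from two counter values (hypothetical helper).

    gc_cpu_total: summed jvm/gc/cpu over the reporting window.
    gc_count:     summed jvm/gc/count over the same window.
    """
    if gc_count <= 0:
        # No GC cycles in the window, so there is no meaningful average.
        return 0.0
    return gc_cpu_total / gc_count
```

Returning 0.0 for an empty window is a choice made for this sketch; a dashboard might prefer to show no data point instead.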
Parag Jain f0d74b240d password provider for basic authentication of HttpEmitterConfig (#8618) 2019-10-02 15:59:17 -07:00
Nishant Bangarwa 8537fbeca7 Implementing dropwizard emitter for druid (#7363)
* Implementing dropwizard emitter for druid

making metric manager and alert emitters as optional

* Refactor and make things work

more improvements

improve docs

refactorings

* Fix teamcity inspections

* review comments

* more review comments

* add limit to max number of gauges

* update pom version

* fix pom

* review comments

* review comment

* review comments

* fix broken doc link

review comments

review comments

* review comments

* fix checkstyle

* more spell check fixes

* fix travis failures
2019-10-01 14:59:30 -07:00
pdeva db65068c42 add reference to indexer nodes (#8607) 2019-09-30 16:45:33 -06:00
Sashidhar Thallam 51a7235ebc Making optimal usage of multiple segment cache locations (#8038)
* #7641 - Changing segment distribution algorithm to distribute segments to multiple segment cache locations

* Fixing indentation

* WIP

* Adding interface for location strategy selection, least bytes used strategy impl, round-robin strategy impl, locationSelectorStrategy config with least bytes used strategy as the default strategy

* fixing code style

* Fixing test

* Adding a method visible only for testing, fixing tests

* 1. Changing the method contract to return an iterator of locations instead of a single best location. 2. Check style fixes

* fixing the conditional statement

* Added testSegmentDistributionUsingLeastBytesUsedStrategy, fixed testSegmentDistributionUsingRoundRobinStrategy

* to trigger CI build

* Add documentation for the selection strategy configuration

* to re trigger CI build

* updated docs as per review comments, made LeastBytesUsedStorageLocationSelectorStrategy.getLocations a synchronized method, other minor fixes

* In checkLocationConfigForNull method, using getLocations() to check for null instead of directly referring to the locations variable so that tests overriding getLocations() method do not fail

* Implementing review comments. Added tests for StorageLocationSelectorStrategy

* Checkstyle fixes

* Adding java doc comments for StorageLocationSelectorStrategy interface

* checkstyle

* empty commit to retrigger build

* Empty commit

* Adding suppressions for words leastBytesUsed and roundRobin of ../docs/configuration/index.md file

* Impl review comments including updating docs as suggested

* Removing checkLocationConfigForNull(), @NotEmpty annotation serves the purpose

* Round robin iterator to keep track of the no. of iterations, impl review comments, added tests for round robin strategy

* Fixing the round robin iterator

* Removed numLocationsToTry, updated java docs

* changing property attribute value from tier to type

* Fixing assert messages
2019-09-28 00:17:44 -06:00
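The two selection strategies named in #8038 (least bytes used and round robin, each returning an iterator of candidate locations rather than a single best one) might look roughly like this. It is a Python illustration under assumptions: `Location`, `least_bytes_used`, and `RoundRobin` are hypothetical names, and the real Java classes may differ in behavior and thread-safety.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Location:
    """Hypothetical stand-in for one segment cache location."""
    path: str
    bytes_used: int


def least_bytes_used(locations: List[Location]) -> Iterator[Location]:
    """Yield locations ordered from least to most bytes used."""
    return iter(sorted(locations, key=lambda loc: loc.bytes_used))


class RoundRobin:
    """Each call starts one position later than the previous call."""

    def __init__(self, locations: List[Location]) -> None:
        self._locations = list(locations)
        self._start = 0

    def get_locations(self) -> Iterator[Location]:
        n = len(self._locations)
        start = self._start
        # Advance the starting point for the next caller.
        self._start = (self._start + 1) % n if n else 0
        # Visit every location exactly once, wrapping around the list.
        return (self._locations[(start + i) % n] for i in range(n))
```

Returning an iterator lets the caller fall back to the next candidate when the first location cannot hold the segment.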
Himanshu 9f1f5e115c
doubleMean aggregator to be used at query time (#8459)
* doubleMean aggregator for computing mean

* make docs

* build fixes

* address review comment: handle null args
2019-09-26 08:04:33 -07:00
Nishant Bangarwa a75ddaad9e Add TrustedDomain Authenticator (#8248)
* Add TrustedDomain Authenticator

update javadoc

Add nullable annotations

Add cautionary note

fix travis failure

* add IP to spell checker
2019-09-25 11:25:03 -07:00
Rye f2a444321b Added live reports for Kafka and Native batch task (#8557)
* Added live reports for Kafka and Native batch task

* Removed unused local variables

* Added the missing unit test

* Refine unit test logic, add implementation for HttpRemoteTaskRunner

* checkstyle fixes

* Update doc descriptions for updated API

* remove unnecessary files

* Fix spellcheck complaints

* More details for api descriptions
2019-09-23 21:08:36 -07:00
Vadim Ogievetsky 52f3f2c229 fix docs version interpolation (#8568) 2019-09-22 17:38:55 -07:00
Vadim Ogievetsky 94298f7809 Update Kafka loading docs to use the streaming data loader (#8544)
* fix redirects

* remove useless page

* fix Single server reference configurations formatting

* update batch data loading

* update Kafka docs

* fix typos and tests

* add more links

* fix spelling
2019-09-22 15:00:52 -07:00
Chi Cao Minh aeac0d4fd3 Adjust defaults for hashed partitioning (#8565)
* Adjust defaults for hashed partitioning

If neither the partition size nor the number of shards are specified,
default to partitions of 5,000,000 rows (similar to the behavior of
dynamic partitions). Previously, both could be null and cause incorrect
behavior.

Specifying both a partition size and a number of shards now results in
an error instead of ignoring the partition size in favor of using the
number of shards. This is a behavior change that makes it more apparent
to the user that only one of the two properties will be honored
(previously, a message was just logged when the specified partition size
was ignored).

* Fix test

* Handle -1 as null

* Add -1 as null tests for single dim partitioning

* Simplify logic to handle -1 as null

* Address review comments
2019-09-21 20:57:40 -07:00
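The defaulting rules described in #8565 can be sketched as follows: treat -1 as null, default to 5,000,000 rows when neither value is given, and raise an error when both are given. The function and constant names are hypothetical; only the 5,000,000 default and the -1 handling come from the commit message.

```python
from typing import Optional, Tuple

DEFAULT_ROWS_PER_PARTITION = 5_000_000  # default stated in the commit message


def _as_optional(value: Optional[int]) -> Optional[int]:
    """Treat -1 the same as an unset value."""
    return None if value is None or value == -1 else value


def resolve_hashed_partitioning(
    partition_size: Optional[int],
    num_shards: Optional[int],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (partition_size, num_shards) with exactly one of them set."""
    size = _as_optional(partition_size)
    shards = _as_optional(num_shards)
    if size is not None and shards is not None:
        raise ValueError(
            "specify either a partition size or a number of shards, not both"
        )
    if size is None and shards is None:
        size = DEFAULT_ROWS_PER_PARTITION
    return size, shards
```

Failing fast on the ambiguous case replaces the earlier behavior, where the partition size was ignored and only a log message said so.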
Chi Cao Minh 99b6eedab5 Rename partition spec fields (#8507)
* Rename partition spec fields

Rename partition spec fields to be consistent across the various types
(hashed, single_dim, dynamic). Specifically, use targetNumRowsPerSegment
and maxRowsPerSegment in favor of targetPartitionSize and
maxSegmentSize. Consistent and clearer names are easier for users to
understand and use.

Also fix various IntelliJ inspection warnings and doc spelling mistakes.

* Fix test

* Improve docs

* Add targetRowsPerSegment to HashedPartitionsSpec
2019-09-20 14:59:18 -06:00
Xavier Léauté e184d24a74
add support for dogstatsd events in statsd-emitter (#8546)
* add support for dogstatsd events in statsd-emitter
* add option to turn on alert events (off by default)
* updated docs
2019-09-19 08:12:30 -07:00
Chi Cao Minh 7dcbaca658 Spellcheck docs (#8548)
* Spellcheck docs

Fix spelling mistakes in docs and add CI job for running spellcheck on
docs.

* Add missing license header
2019-09-17 12:47:30 -07:00
Vadim Ogievetsky 0490909ab3 Web console: Update web console docs for 0.16.0 (#8530)
* Update webconsole docs

* home view

* fix annotation typo
2019-09-13 09:09:36 -07:00
Clint Wylie 75978e5b98 move google ext docs from contrib to core (#8512)
* move google ext docs from contrib to core

* fix links

* revert unintended change

* more links, add note to example ext doc that it was removed, unlink from sidebar
2019-09-12 09:40:39 -07:00
Jonathan Wei 0145642d8b Move router/indexer config/API docs to main pages (#8510)
* Move router/indexer config/API docs to main pages

* Restore missing properties, fix typo

* Use sentence casing

* Fix broken link
2019-09-11 21:42:58 -07:00
Clint Wylie fb078eea1e
fix web-console build in src distribution, fix kafka doc minimum version (#8502) 2019-09-10 21:01:07 -07:00
Chi Cao Minh 14a8613d69 Exit JVM on curator unhandled errors (#8458)
* Exit JVM on curator unhandled errors

If an unhandled error occurs when curator is talking to ZooKeeper, exit
the JVM in addition to stopping the lifecycle to prevent the process
from being left in a zombie state. With this change,
BoundedExponentialBackoffRetryWithQuit is no longer needed as when
curator exceeds the configured retries, it triggers its unhandled error
listeners. A new "connectionTimeoutMs" CuratorConfig setting is added
mostly to facilitate testing curator unhandled errors, but it may be
useful for users as well.

* Address review comments
2019-09-06 16:43:59 -07:00
Clint Wylie fd58fbc8d3
fix statsd dogstatsdServiceAsTag docs example to match behavior (#8477) 2019-09-05 19:05:25 -07:00
SeKing 6a6893b406 Fix operator mistake of expression OR (#8452)
* Add realization for updating version of derived segments in MaterializedView

* add unit test, and change code style for the sake of ease of understanding

* fix document's mistake of expression
2019-09-04 21:27:18 -07:00
Lucas Capistrant bfb02f09f8 Add druid.segmentCache.numBootstrapThreads back to the docs (#8462) 2019-09-04 20:27:17 -07:00
legendtkl 0be4a41c06 Website Doc: fix bash command (#8442)
* fix "gunzip -k" to "gunzip -c"
2019-08-30 22:22:09 -07:00
Clint Wylie 3baf31e9a8 add documentation for group by array based result format (#8416) 2019-08-28 08:30:31 -07:00
Jonathan Wei c626452b47 Add nano-quickstart single server example configuration (#8390)
* Add nano-quickstart single server example configuration

* Use two workers

* Shrink processing buffers
2019-08-24 22:07:20 -07:00
Furkan KAMACI 02fe3db911 Zookeeper version is updated. (#8363)
* Zookeeper version is updated.

* Zookeeper version is updated at licenses.yaml

* licenses.yaml is updated and dependencies are fixed to make the project successfully build.

* Zookeeper versions are fixed at licenses.yaml
2019-08-24 22:00:43 -07:00
Jihoon Son 95fa609615 Fix wrong partitionsSpec type names in the document (#8297)
* Fix wrong type names for partitionsSpec

* add unit tests; add json properties for backward compatibility

* beautify conf names

* remove maxRowsPerSegment from hashed partitionsSpec

* fix doc build
2019-08-23 13:44:58 -07:00
Clint Wylie 7749571a7f order and add more ports to hadoop docker container in hadoop indexing tutorial (#8329)
LGTM
2019-08-23 15:43:06 -05:00
Surekha cf2a2dd917
Add group_id to the sys.tasks table (#8304)
* Add group_id to overlord tasks API and sys.tasks table

* adjust test

* modify docs

* Make groupId nullable

* fix integration test

* fix toString

* Remove groupId from TaskInfo

* Modify docs and tests

* modify TaskMonitorTest
2019-08-22 15:28:23 -07:00
Clint Wylie 010f70b371
autogenerate NOTICE.BINARY from NOTICE and licenses.yaml (#8306)
* migrate binary notice entries to live in licenses.yaml, use licenses.yaml and NOTICE to generate NOTICE.BINARY at distribution time

* +x

* move release scripts to distribution/bin, fixup notice script, trim dependencies for avro and kerberos in licenses.yaml

* add missing hdfs-storage dependencies

* revert to old syntax, fixes

* formatting

* update notices for recently updated dependencies
2019-08-21 12:46:27 -07:00
Gian Merlino d007477742
Docusaurus build framework + ingestion doc refresh. (#8311)
* Docusaurus build framework + ingestion doc refresh.

* stick to npm instead of yarn

* fix typos

* restore some _bin

* Adjustments.

* detect and fix redirect anchors

* update anchor lint

* Web-console: remove specific column filters (#8343)

* add clear filter

* update tool kit

* remove useless check

* auto run

* add %

* Fix resource leak (#8337)

* Fix resource leak

* Patch comments

* Enable Spotbugs NP_NONNULL_RETURN_VIOLATION (#8234)

* Fixes from PR review.

* Fix more anchors.

* Preamble nix.

* Fix more anchors, headers

* clean up placeholder page

* add to website lint to travis config

* better broken link checking

* travis fix

* Fixed more broken links

* better redirects

* unfancy catch

* fix LGTM error

* link fixes

* fix md issues

* Addl fixes
2019-08-20 21:48:59 -07:00
Fokko Driesprong d5a19675dd Remove fromPigAvroStorage from the docs (#8340)
This one was deprecated a while ago
2019-08-20 16:34:55 -07:00
Jonathan Wei dd2e53baf4
Clarify Avro decoder docs (#8302) 2019-08-19 15:37:18 -05:00
Jihoon Son 31af4eb9ad
Rename maxNumSubTasks to maxNumConcurrentSubTasks for native parallel index task (#8324) 2019-08-16 15:57:13 -07:00
Jihoon Son 5dac6375f3
Add support for parallel native indexing with shuffle for perfect rollup (#8257)
* Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks

* kill runner when it's ready

* add comment

* kill run thread

* fix test

* Take closeable out of Appenderator

* add javadoc

* fix test

* fix test

* update javadoc

* add javadoc about killed task

* address comment

* Add support for parallel native indexing with shuffle for perfect rollup.

* Add comment about volatiles

* fix test

* fix test

* handling missing exceptions

* more clear javadoc for stopGracefully

* unused import

* update javadoc

* Add missing statement in javadoc

* address comments; fix doc

* add javadoc for isGuaranteedRollup

* Rename confusing variable name and fix typos

* fix typos; move fetch() to a better home; fix the expiration time

* add support https
2019-08-15 17:43:35 -07:00
Jihoon Son eeae5d9365 Add a warning about experimental segment locking (#8301)
* Add a warning about experimental segment locking

* fix typo
2019-08-15 16:07:59 -07:00
Jihoon Son a5c9c2950f Add missing maxBytesInMemory in tuningConfig for auto compaction (#8274)
* Add missing tuningConfigs for auto compaction

* Add doc

* add test
2019-08-13 14:10:26 -05:00
Alexandre Yang 6b4d028b96 [statsd-emitter] Add config to send Druid process/service as tag (#8238)
* [statsd-emitter] Add serviceAsTag option

* [statsd-emitter] Refactor serviceAsTag option

* [statsd-emitter] Update statsd.md

* [statsd-emitter] add default prefix

* [statsd-emitter] update statsd.md

* [statsd-emitter] Remove extra spaces

* [statsd-emitter] Improve docs for config `dogstatsdServiceAsTag`

* [statsd-emitter] Simplify equals() for StatsDEmitterConfig.java

* [statsd-emitter] Add @Nullable for StatsDEmitterConfig.java
2019-08-12 13:18:44 -07:00
Nathan b28e252d9a Minor Spelling Error (#8277)
* Minor Spelling Error

* Update mySQL password in docs

/extensions-core/mysql update druid.metadata.storage.connector.password
2019-08-09 16:06:02 -05:00
Jonathan Wei e88bbe71c0 Adjust default globalIngestionHeapLimitBytes for indexer, add more docs (#8255) 2019-08-07 23:04:07 -07:00
Jonathan Wei 5e57492298 Add docs for CliIndexer as an experimental feature (#8245)
* Experimental CliIndexer docs

* PR comments
2019-08-06 15:57:17 -07:00
Lucas Capistrant e252abedc5 Enable toggling request logging on/off for different query types (#7562)
* Enable ability to toggle SegmentMetadata request logging on/off

* Move SegmentMetadata query log filter to FilteredRequestLogger

* Update documentation to reflect the segment metadata flag moving to the filtered request logger

* Modify patch to allow blacklist of query types to not log to request logger

* Address styling and naming requests following latest code review

* Fix indentation on multiple locations per Druid style rules
2019-08-06 15:47:30 +03:00
Samarth Jain 93cf9d4ad4 SQL support for t-digest based sketch aggregators (#8100)
* SQL support for t-digest based sketch aggregators

* Fix teamcity errors

* Add missing dependencies

* Remove unused dependency

* Address code review comments

* Add checks for compression param
2019-08-05 12:01:42 -07:00
Jihoon Son 1ee828ff49
Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking (#8173)
* Add a cluster-wide configuration to force timeChunk lock and add a doc for segment locking

* add more test

* javadoc for missingIntervalsInOverwriteMode

* Fix test

* Address comments

* avoid spotbugs
2019-08-02 20:30:05 -07:00
Chi Cao Minh 4bd3bad8ba Add IPv4 SQL functions (#8223)
* Add IPv4 SQL functions

New SQL functions for filtering IPv4 addresses:
- IPV4_MATCH: Check if IP address belongs to a subnet
- IPV4_PARSE: Convert string IP address to integer
- IPV4_STRINGIFY: Convert integer IP address to string

These are the SQL analogs of the druid expressions with the same name.
Filtering is more efficient when operating on IP addresses as integers
instead of strings.

* Refactor operator conversions into named constants
2019-08-01 21:29:58 -07:00
Clint Wylie 01c8c82982 correct kerberos doc extension load list (#8224) 2019-08-01 17:03:25 -07:00
Chi Cao Minh 7783b31846 Add IPv4 druid expressions (#8197)
* Add IPv4 druid expressions

New druid expressions for filtering IPv4 addresses:
- ipv4address_match: Check if IP address belongs to a subnet
- ipv4address_parse: Convert string IP address to long
- ipv4address_stringify: Convert long IP address to string

These expressions operate on IP addresses represented as either strings
or longs, so that they can be applied to dimensions with mixed
representation of IP addresses. The filtering is more efficient when
operating on IP addresses as longs. In other words, the intended use
case is:

1) Use ipv4address_parse to convert to long at ingestion time
2) Use ipv4address_match to filter (on longs) at query time
3) Use ipv4address_stringify to convert to (readable) string at query
time

* Fix licenses and null handling

* Simplify IPv4 expressions

* Fix tests

* Fix check for valid ipv4 address string
2019-08-01 11:45:04 -07:00
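The parse / match / stringify workflow described in #8197 can be illustrated with Python's standard `ipaddress` module. This sketches the idea only, not the Druid expressions themselves: the function names are hypothetical, and invalid input raises an exception here, which may not match how the Druid expressions handle it.

```python
import ipaddress
from typing import Union


def ipv4_parse(address: Union[str, int]) -> int:
    """Convert a dotted-quad string (or an integer) to its integer form."""
    return int(ipaddress.IPv4Address(address))


def ipv4_stringify(address: Union[str, int]) -> str:
    """Convert an integer (or a string) to a readable dotted-quad string."""
    return str(ipaddress.IPv4Address(address))


def ipv4_match(address: Union[str, int], subnet: str) -> bool:
    """Check whether an address, in either form, belongs to a CIDR subnet."""
    return ipaddress.IPv4Address(address) in ipaddress.IPv4Network(subnet)
```

Like the expressions in the commit message, each function accepts either representation, so a column with mixed string and integer addresses could be handled uniformly.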
Surekha f0ecdfee30 Fix `is_realtime` column behavior in sys.segments table (#8154)
* Fix is_realtime flag

* make variable final

* minor changes

* Modify is_realtime behavior based on review comment

* Fix UT
2019-07-31 22:26:49 -06:00
Nathan 716ce7fdc7 Spelling Error (#8206) 2019-07-31 10:43:11 -07:00
Jihoon Son 385f492a55
Use PartitionsSpec for all task types (#8141)
* Use partitionsSpec for all task types

* fix doc

* fix typos and revert to use isPushRequired

* address comments

* move partitionsSpec to core

* remove hadoopPartitionsSpec
2019-07-30 17:24:39 -07:00
Clint Wylie 653b558134 sql firehose and firehose doc adjustments (#8067)
* firehose doc adjustments

* fix typo

* additional information on parser types in ingestion docs

* clarify ingest segment firehose docs, add sql firehose examples to sql extension pages

* fixit

* make sql firehose more forgiving by always constructing a MapInputRowParser from the parseSpec of whatever actual InputRowParser impl is provided, remove doc references to map based parsers

* transforms

* fix tests
2019-07-30 15:28:10 -07:00
Jonathan Wei 640b7afc1c Add CliIndexer process type and initial task runner implementation (#8107)
* Add CliIndexer process type and initial task runner implementation

* Fix HttpRemoteTaskRunnerTest

* Remove batch sanity check on PeonAppenderatorsManager

* Fix parallel index tests

* PR comments

* Adjust Jersey resource logging

* Additional cleanup

* Fix SystemSchemaTest

* Add comment to LocalDataSegmentPusherTest absolute path test

* More PR comments

* Use Server annotated with RemoteChatHandler

* More PR comments

* Checkstyle

* PR comments

* Add task shutdown to stopGracefully

* Small cleanup

* Compile fix

* Address PR comments

* Adjust TaskReportFileWriter and fix nits

* Remove unnecessary closer

* More PR comments

* Minor adjustments

* PR comments

* ThreadingTaskRunner: cancel  task run future not shutdownFuture and remove thread from workitem
2019-07-29 17:06:33 -07:00
Jihoon Son 61f4abece4
Add more warning to the doc for resetOffsetAutomatically (#8153)
* Add more warnings to the doc for resetOffsetAutomatically

* fix kinesis doc

* fix typos

* revise the description

* capital

* capitalize
2019-07-24 17:37:32 -07:00
Magnus Henoch c87b47e0fa More documentation formatting fixes (#8149)
Add empty lines before bulleted lists and code blocks, to ensure that
they show up properly on the web site.  See also #8079.
2019-07-24 15:26:03 -07:00
Clint Wylie b8b22b7aaa fix references to bin/supervise in tutorial docs (#8087) 2019-07-23 15:05:01 -07:00
Clint Wylie 83514958db remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention (#8116)
* remove unnecessary lock in ForegroundCachePopulator leading to a lot of contention

* mutableboolean, javadocs, document some cache configs that were missing

* more doc stuff

* adjustments

* remove background documentation
2019-07-23 10:57:59 -07:00
Sashidhar Thallam ea4bad7836 Druid SQL EXTRACT time function - adding support for additional Time Units (#8068)
* 1. Added TimestampExtractExprMacro.Unit for MILLISECOND 2. expr eval for MILLISECOND 3. Added a test case to test extracting millisecond from expression. #7935

* 1. Adding DATASOURCE4 in tests. 2. Adding test TimeExtractWithMilliseconds

* Fixing testInformationSchemaTables test

* Fixing failing tests in DruidAvaticaHandlerTest

* Adding cannotVectorize() call before the test

* Extract time function - Adding support for MICROSECOND, ISODOW, ISOYEAR and CENTURY time units, documentation changes.

* Adding MILLISECOND in test case

* Adding support DECADE and MILLENNIUM, updating test case and documentation

* Fixing expression eval for DECADE and MILLENNIUM
2019-07-19 20:38:32 -07:00
Roman Leventov ceb969903f
Refactor SQLMetadataSegmentManager; Change contract of REST met… (#7653)
* Refactor SQLMetadataSegmentManager; Change contract of REST methods in DataSourcesResource

* Style fixes

* Unused imports

* Fix tests

* Fix style

* Comments

* Comment fix

* Remove unresolvable Javadoc references; address comments

* Add comments to ImmutableDruidDataSource

* Merge with master

* Fix bad web-console merge

* Fixes in api-reference.md

* Rename in DruidCoordinatorRuntimeParams

* Fix compilation

* Residual changes
2019-07-17 17:18:48 +03:00
Magnus Henoch 179253a2fc Fix documentation formatting (#8079)
The Markdown dialect used when publishing the documentation to the web
site is much more sensitive than Github-flavoured Markdown.  In
particular, it requires an empty line before code blocks (unless the
code block starts right after a heading), otherwise the code block
gets formatted in-line with the previous paragraph.  Likewise for
bullet-point lists.
2019-07-15 09:55:18 -07:00
Gian Merlino ffa25b7832
Query vectorization. (#6794)
* Benchmarks: New SqlBenchmark, add caching & vectorization to some others.

- Introduce a new SqlBenchmark geared towards benchmarking a wide
  variety of SQL queries. Rename the old SqlBenchmark to
  SqlVsNativeBenchmark.
- Add (optional) caching to SegmentGenerator to enable easier
  benchmarking of larger segments.
- Add vectorization to FilteredAggregatorBenchmark and GroupByBenchmark.

* Query vectorization.

This patch includes vectorized timeseries and groupBy engines, as well
as some analogs of your favorite Druid classes:

- VectorCursor is like Cursor. (It comes from StorageAdapter.makeVectorCursor.)
- VectorColumnSelectorFactory is like ColumnSelectorFactory, and it has
  methods to create analogs of the column selectors you know and love.
- VectorOffset and ReadableVectorOffset are like Offset and ReadableOffset.
- VectorAggregator is like BufferAggregator.
- VectorValueMatcher is like ValueMatcher.

There are some noticeable differences between vectorized and regular
execution:

- Unlike regular cursors, vector cursors do not understand time
  granularity. They expect query engines to handle this on their own,
  which a new VectorCursorGranularizer class helps with. This is to
  avoid too much batch-splitting and to respect the fact that vector
  selectors are somewhat more heavyweight than regular selectors.
- Unlike FilteredOffset, FilteredVectorOffset does not leverage indexes
  for filters that might partially support them (like an OR of one
  filter that supports indexing and another that doesn't). I'm not sure
  that this behavior is desirable anyway (it is potentially too eager)
  but, at any rate, it'd be better to harmonize it between the two
  classes. Potentially they should both do some different thing that
  is smarter than what either of them is doing right now.
- When vector cursors are created by QueryableIndexCursorSequenceBuilder,
  they use a morphing binary-then-linear search to find their start and
  end rows, rather than linear search.

Limitations in this patch are:

- Only timeseries and groupBy have vectorized engines.
- GroupBy doesn't handle multi-value dimensions yet.
- Vector cursors cannot handle virtual columns or descending order.
- Only some filters have vectorized matchers: "selector", "bound", "in",
  "like", "regex", "search", "and", "or", and "not".
- Only some aggregators have vectorized implementations: "count",
  "doubleSum", "floatSum", "longSum", "hyperUnique", and "filtered".
- Dimension specs other than "default" don't work yet (no extraction
  functions or filtered dimension specs).

Currently, the testing strategy includes adding vectorization-enabled
tests to TimeseriesQueryRunnerTest, GroupByQueryRunnerTest,
GroupByTimeseriesQueryRunnerTest, CalciteQueryTest, and all of the
filtering tests that extend BaseFilterTest. In all of those classes,
there are some test cases that don't support vectorization. They are
marked by special function calls like "cannotVectorize" or "skipVectorize"
that tell the test harness to either expect an exception or to skip the
test case.

Testing should be expanded in the future -- a project in and of itself.

Related to #3011.

* WIP

* Adjustments for unused things.

* Adjust javadocs.

* DimensionDictionarySelector adjustments.

* Add "clone" to BatchIteratorAdapter.

* ValueMatcher javadocs.

* Fix benchmark.

* Fixups post-merge.

* Expect exception on testGroupByWithStringVirtualColumn for IncrementalIndex.

* BloomDimFilterSqlTest: Tag two non-vectorizable tests.

* Minor adjustments.

* Update surefire, bump up Xmx in Travis.

* Some more adjustments.

* Javadoc adjustments

* AggregatorAdapters adjustments.

* Additional comments.

* Remove switching search.

* Only missiles.
2019-07-12 12:54:07 -07:00
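The commit above mentions a "binary-then-linear" search for locating start and end rows. As an illustration of the general idea only (this is not Druid's code; the function name and the `linear_threshold` parameter are assumptions), a search can narrow the range by bisection and then finish with a short linear scan:

```python
def find_first_at_or_after(timestamps, target, linear_threshold=8):
    """Return the index of the first element >= target in a sorted list,
    or len(timestamps) if there is none (illustrative sketch)."""
    lo, hi = 0, len(timestamps)
    # Binary phase: halve the candidate range until it is small.
    while hi - lo > linear_threshold:
        mid = (lo + hi) // 2
        if timestamps[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    # Linear phase: scan the few remaining candidates in order.
    for i in range(lo, hi):
        if timestamps[i] >= target:
            return i
    return hi
```

The linear tail is cheap when the remaining range is small, which is one plausible reason to combine the two strategies.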
Chi Cao Minh da3d141dd2 Add inline firehose (#8056)
* Add inline firehose

To allow users to quickly test parsing and schema, add a firehose that reads
data that is inlined in its spec.

* Address review comments

* Remove suppression of sonar warnings
2019-07-11 21:43:46 -07:00
Atul Mohan 631cda649b Include replicated segment size property for datasources endpoint (#8039)
* Add replication size

* Summon comma
2019-07-11 01:10:38 -07:00
Himanshu 14aec7fcec
add config to optionally disable all compression in intermediate segment persists during ingestion (#7919)
* disable all compression in intermediate segment persists during ingestion

* more changes and build fix

* by default retain existing indexingSpec for intermediate persisted segments

* document indexSpecForIntermediatePersists index tuning config

* fix build issues

* update serde tests
2019-07-10 12:22:24 -07:00
Jihoon Son 0a3538b569 Fix license check in travis and make it optional (#8049)
* Fix license check in travis and make it optional

* debug

* fix build

* too loud maven

* move MAVEN_OPTS to top and add comments

* adjust script

* remove mvn option from python script
2019-07-09 19:35:29 -07:00
Sashidhar Thallam 3353da2974 Adding missing docs for druid.indexer.logs.disableAcl (#8046) 2019-07-09 16:11:25 -07:00
Jihoon Son 12f12676e3
Binary license management system (#7998)
* Binary license management system

* add missing file

* add comment

* Address comments

* print missing licenses

* print druid module name

* Add missing licenses and update versions

* fix library versions and add missing ones. also fix pom.xml

* testing multi thread

* Parallel report generation

* fix build error

* install pyyaml and use old api

* install python3

* fix travis script

* python3.6

* pip

* setuptools

* python3-setuptools

* address comment

* error on not found reports or registered licenses

* removed licenses

* debug

* travis debug

* add missing licenses

* travis debug

* debug

* remove debug code

* test build script

* travis debug

* still debug

* add missing python lib

* debug

* debug

* fix travis

* fix travis

* debug travis

* flush print

* print something more to keep travis alive

* adjust print

* single threaded

* single threaded

* debug

* debug

* remove debug

* remove deprecated-2017Q4 from travis conf

* remove comments and duplicate sudo
2019-07-08 12:24:51 -07:00
Eyal Yurman 2eee711653 Add missing reference to Materialized-View extension. (#8003)
* Reference Materialized View extension from extensions page.

* Add comma
2019-07-06 13:50:41 -07:00
Dinesh Sawant 9c7c7c58ae Fix overlord port in delete data tutorial (#8037)
In the Single-Server Quickstart tutorial, the overlord and coordinator
are started as one process on port 8081. But in the delete data tutorial the kill
task is sent to port 8090, which fails.
2019-07-06 08:50:01 -07:00
Chi Cao Minh 0ded0ce414 Add round support for DS-HLL (#8023)
* Add round support for DS-HLL

Since the Cardinality aggregator has a "round" option to round off estimated
values generated from the HyperLogLog algorithm, add the same "round" option to
the DataSketches HLL Sketch module aggregators to be consistent.

* Fix checkstyle errors

* Change HllSketchSqlAggregator to do rounding

* Fix test for standard-compliant null handling mode
2019-07-05 15:37:58 -07:00
Clint Wylie 42a7b8849a remove FirehoseV2 and realtime node extensions (#8020)
* remove firehosev2 and realtime node extensions

* revert intellij stuff

* rat exclusion
2019-07-04 15:40:22 -07:00
Gian Merlino 613f09b45a SQL: Add TIME_CEIL function. (#8027)
Also simplify conversions for CEIL, FLOOR, and TIME_FLOOR by allowing them to
share more code.
2019-07-04 15:40:03 -07:00
Clint Wylie 3b84246cd6 add SQL docs for multi-value string dimensions (#8011)
* add SQL docs for multi-value string dimensions

* formatting consistency

* fix typo

* adjust
2019-07-03 08:22:33 -07:00
Clint Wylie c556d44a19
more sql support for expression array functions (#7974)
* more sql support for expression array functions

* prepend/slice

* doc fixes

* fix imports

* fix tests

* add null numeric expr for proper conversions between ExprEval and Expr and back to ExprEval

* re-arrange

* imports :(

* add append/prepend test
2019-07-02 21:39:26 -07:00
Clint Wylie f7283378ac
remove deprecated standalone realtime node (#7915)
* remove CliRealtime, RealtimeManager, etc

* add redirects for deleted page to page that explains the deleted thing

* adjust docs
2019-07-02 18:12:17 -07:00
Clint Wylie 93b738bbfa
expression language array constructor and sql multi-value string filtering support (#7973)
* expr array constructor and sql multi-value string support

* doc fix

* checkstyle

* change from feedback
2019-07-01 15:14:50 -07:00
Eyal Yurman 3650eed1aa Improve pull-deps reference in extensions page. (#8002) 2019-07-01 11:18:27 -07:00
Xue Yu 2831944056 support NVL sql function (#7965)
* sql nvl

* add nvl in sql doc
2019-06-30 13:14:30 -07:00
Jihoon Son f148249f64 Fix wrong redirect for orc extension (#7983) 2019-06-27 16:27:08 -07:00
Alexander Saydakov f38a62e949 theta sketch to string post agg (#7937) 2019-06-27 15:09:57 -07:00
Vadim Ogievetsky ad45ef12ed fix SQL doc comment (#7981) 2019-06-27 15:05:45 -07:00
Jihoon Son c4aaf26797 Add missing redirect for ORC extension document (#7979) 2019-06-27 14:23:44 -07:00
Clint Wylie 10d6b0318d clarify granularity docs (#7977) 2019-06-27 08:51:22 -07:00
Xue Yu 5464c8938f Add array_slice and array_unshift function expr (#7950)
* add array_slice and array_unshift function expr

* feedback address
2019-06-26 16:56:09 -07:00
Benedict Jin 16aafd5788 [ImgBot] Optimize images (#7873)
*Total -- 10,997.25kb -> 7,160.16kb (34.89%)

/publications/radstack/figures/precompute.png -- 54.20kb -> 16.97kb (68.69%)
/web-console/favicon.png -- 4.41kb -> 1.61kb (63.58%)
/docs/img/indexing_service.png -- 47.37kb -> 21.96kb (53.64%)
/docs/img/segmentPropagation.png -- 62.94kb -> 29.85kb (52.57%)
/docs/content/tutorials/img/tutorial-quickstart-01.png -- 55.62kb -> 29.13kb (47.62%)
/docs/content/tutorials/img/tutorial-deletion-02.png -- 791.43kb -> 429.30kb (45.76%)
/docs/content/tutorials/img/tutorial-deletion-03.png -- 786.79kb -> 427.05kb (45.72%)
/docs/content/tutorials/img/tutorial-retention-00.png -- 135.06kb -> 75.88kb (43.82%)
/docs/content/tutorials/img/tutorial-batch-data-loader-10.png -- 77.23kb -> 43.47kb (43.71%)
/docs/content/tutorials/img/tutorial-batch-data-loader-01.png -- 97.03kb -> 55.16kb (43.15%)
/docs/content/tutorials/img/tutorial-batch-data-loader-07.png -- 79.49kb -> 45.44kb (42.84%)
/docs/content/tutorials/img/tutorial-retention-02.png -- 401.30kb -> 234.68kb (41.52%)
/docs/content/tutorials/img/tutorial-compaction-06.png -- 343.27kb -> 201.87kb (41.19%)
/docs/content/tutorials/img/tutorial-batch-data-loader-09.png -- 105.14kb -> 61.86kb (41.16%)
/docs/content/tutorials/img/tutorial-retention-06.png -- 227.57kb -> 134.35kb (40.97%)
/docs/content/tutorials/img/tutorial-compaction-04.png -- 304.83kb -> 180.04kb (40.94%)
/docs/content/tutorials/img/tutorial-compaction-02.png -- 273.18kb -> 162.67kb (40.45%)
/docs/content/tutorials/img/tutorial-query-05.png -- 85.03kb -> 50.64kb (40.44%)
/publications/radstack/figures/druid_vs_bigquery.png -- 155.44kb -> 92.85kb (40.27%)
/docs/content/tutorials/img/tutorial-kafka-02.png -- 122.51kb -> 73.93kb (39.65%)
/docs/content/tutorials/img/tutorial-deletion-01.png -- 70.37kb -> 42.56kb (39.52%)
/docs/content/tutorials/img/tutorial-batch-data-loader-06.png -- 103.50kb -> 62.79kb (39.33%)
/docs/content/tutorials/img/tutorial-batch-submit-task-01.png -- 111.25kb -> 67.73kb (39.12%)
/docs/content/tutorials/img/tutorial-query-03.png -- 103.60kb -> 63.51kb (38.69%)
/docs/content/tutorials/img/tutorial-query-04.png -- 105.79kb -> 64.87kb (38.69%)
/docs/content/tutorials/img/tutorial-batch-data-loader-11.png -- 130.20kb -> 81.34kb (37.53%)
/docs/content/tutorials/img/tutorial-query-07.png -- 122.52kb -> 76.79kb (37.32%)
/docs/content/tutorials/img/tutorial-kafka-01.png -- 133.12kb -> 83.47kb (37.3%)
/docs/content/tutorials/img/tutorial-query-06.png -- 127.55kb -> 80.28kb (37.06%)
/docs/content/tutorials/img/tutorial-batch-submit-task-02.png -- 133.07kb -> 84.06kb (36.83%)
/docs/content/tutorials/img/tutorial-retention-05.png -- 60.19kb -> 38.08kb (36.74%)
/docs/content/tutorials/img/tutorial-batch-data-loader-03.png -- 211.92kb -> 134.22kb (36.66%)
/docs/content/tutorials/img/tutorial-batch-data-loader-05.png -- 250.36kb -> 158.68kb (36.62%)
/publications/radstack/figures/radstack.png -- 16.80kb -> 10.67kb (36.48%)
/docs/content/tutorials/img/tutorial-batch-data-loader-08.png -- 158.59kb -> 101.49kb (36%)
/docs/content/tutorials/img/tutorial-batch-data-loader-04.png -- 255.10kb -> 163.33kb (35.97%)
/docs/content/tutorials/img/tutorial-query-02.png -- 126.92kb -> 81.42kb (35.85%)
/docs/content/tutorials/img/tutorial-compaction-01.png -- 53.86kb -> 34.87kb (35.25%)
/docs/img/druid-architecture.png -- 202.23kb -> 130.97kb (35.24%)
/docs/content/tutorials/img/tutorial-retention-01.png -- 52.69kb -> 34.35kb (34.81%)
/docs/img/druid-timeline.png -- 35.87kb -> 23.59kb (34.22%)
/docs/content/tutorials/img/tutorial-query-01.png -- 149.53kb -> 98.56kb (34.08%)
/docs/content/tutorials/img/tutorial-retention-04.png -- 65.91kb -> 43.57kb (33.89%)
/docs/content/tutorials/img/tutorial-compaction-08.png -- 42.24kb -> 28.08kb (33.53%)
/docs/content/tutorials/img/tutorial-compaction-07.png -- 39.17kb -> 26.06kb (33.47%)
/docs/content/tutorials/img/tutorial-compaction-03.png -- 39.17kb -> 26.13kb (33.3%)
/docs/content/tutorials/img/tutorial-compaction-05.png -- 38.85kb -> 25.96kb (33.17%)
/publications/demo/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/radstack/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/publications/whitepaper/figures/throughput_vs_cardinality.png -- 73.49kb -> 49.31kb (32.9%)
/docs/content/tutorials/img/tutorial-retention-03.png -- 43.11kb -> 29.33kb (31.97%)
/publications/radstack/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/whitepaper/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/demo/figures/throughput_vs_num_dims.png -- 72.86kb -> 49.72kb (31.76%)
/publications/radstack/figures/joined.png -- 164.14kb -> 113.47kb (30.87%)
/docs/content/tutorials/img/tutorial-batch-data-loader-02.png -- 508.93kb -> 351.85kb (30.87%)
/publications/radstack/figures/imps_clicks.png -- 190.95kb -> 132.70kb (30.51%)
/publications/radstack/figures/shuffled.png -- 180.46kb -> 128.21kb (28.95%)
/publications/radstack/figures/pipeline.png -- 392.54kb -> 281.93kb (28.18%)
/docs/img/druid-manage-1.png -- 108.94kb -> 78.53kb (27.92%)
/publications/radstack/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/demo/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/publications/whitepaper/figures/throughput_vs_num_metrics.png -- 85.25kb -> 61.80kb (27.51%)
/docs/img/druid-production.png -- 50.00kb -> 39.18kb (21.63%)
/docs/img/druid-dataflow-3.png -- 88.25kb -> 69.75kb (20.96%)
/publications/demo/figures/realtime_flow.png -- 51.12kb -> 40.61kb (20.56%)
/publications/demo/figures/realtime_timeline.png -- 36.15kb -> 29.24kb (19.12%)
/publications/demo/figures/tpch_scaling.png -- 43.21kb -> 34.97kb (19.08%)
/publications/demo/figures/caching.png -- 35.26kb -> 29.09kb (17.49%)
/dev/intellij-sdk-config.jpg -- 1,019.35kb -> 864.37kb (15.2%)
/docs/img/druid-column-types.png -- 101.53kb -> 91.17kb (10.2%)
/docs/img/druid-dataflow-2x.png -- 138.30kb -> 127.11kb (8.09%)
2019-06-24 21:27:48 -07:00
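The percentages in the ImgBot listing above follow from the before and after sizes. A small sketch of the arithmetic, assuming the reported figure is the relative size reduction (the function name `percent_saved` is hypothetical):

```python
def percent_saved(before_kb, after_kb):
    """Relative size reduction in percent, rounded to two decimals."""
    return round((before_kb - after_kb) / before_kb * 100, 2)


# Total line of the listing: 10,997.25kb -> 7,160.16kb
total = percent_saved(10997.25, 7160.16)
# First file in the listing: 54.20kb -> 16.97kb
precompute = percent_saved(54.20, 16.97)
```

Because the listed sizes are themselves rounded, recomputing a per-file percentage from them may differ slightly from the figure in the log.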
Jonathan Wei 35601bb7a0 Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784)
* Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory

* Add finalizeAsBase64Binary option to ApproximateHistogramFactory

* Update approx histogram doc
2019-06-21 18:00:19 -07:00
Clint Wylie 494b8ebe56 multi-value string column support for expressions (#7588)
* array support for expression language for multi-value string columns

* fix tests?

* fixes

* more tests

* fixes

* cleanup

* more better, more test

* ignore inspection

* license

* license fix

* inspection

* remove dumb import

* more better

* some comments

* add expr rewrite for arrayfn args for more magic, tests

* test stuff

* more tests

* fix test

* fix test

* castfunc can deal with arrays

* needs more empty array

* more tests, make cast to long array more forgiving

* refactor

* simplify ExprMacro Expr implementations with base classes in core

* oops

* more test

* use Shuttle for Parser.flatten, javadoc, cleanup

* fixes and more tests

* unused import

* fixes

* javadocs, cleanup, refactors

* fix imports

* more javadoc

* more javadoc

* more

* more javadocs, nonnullbydefault, minor refactor

* markdown fix

* adjustments

* more doc

* move initial filter out

* docs

* map empty arg lambda, apply function argument validation

* check function args at parse time instead of eval time

* more immutable

* more more immutable

* clarify grammar

* fix docs

* empty array is string test, we need a way to make arrays better maybe in the future, or define empty arrays as other types..
2019-06-19 13:57:37 -07:00
Clint Wylie 71997c16a2 switch links from druid.io to druid.apache.org (#7914)
* switch links from druid.io to druid.apache.org

* fix it
2019-06-18 09:06:27 -07:00
Vadim Ogievetsky 24dd4573da Added the web console to the quickstart tutorials and docs (#7863)
* added console to the quickstart tutorials

* feedback fixes

* feedback fixes

* more typo fixes

* moved resetting cluster section after load data

* update images

* stage -> step

* feedback fixes

* more feedback fixes
2019-06-17 18:00:54 -07:00
Himanshu b3328b2785
endpoint to delete lookup tier and remove tier on last lookup deletion (#7852) 2019-06-15 17:55:50 -07:00
Justin Borromeo 8e5003b01c Scan Doc Change (#7903) 2019-06-15 01:21:34 -07:00
Jihoon Son 3cd9a7507d Fix script for dependencies report for extensions (#7899) 2019-06-14 18:53:50 -07:00
Jihoon Son a648e1548d Add support of --exclude-extension argument for dependency report script (#7786) 2019-06-14 15:18:59 -07:00
Xue Yu 456a3654ce add PolygonBound and missing extensions list doc (#7885) 2019-06-13 12:03:58 -07:00
Clint Wylie 8117222da3 use right port for kafka tutorial, reinforce that tutorials assume you are using micro-quickstart single-server configuration (#7862) 2019-06-11 08:50:52 -07:00
Xue Yu ce591d1457 Support var_pop, var_samp, stddev_pop and stddev_samp etc in sql (#7801)
* support var_pop, stddev_pop etc in sql

* fix sql compatible

* rebase on master

* update doc
2019-06-10 09:40:09 -07:00
Clint Wylie 3fbb0a5e00 Supervisor list api with states and health (#7839)
* allow optionally listing all supervisors with their state and health

* docs

* add state to full

* clean

* casing

* format

* spelling
2019-06-07 16:26:33 -07:00
Jihoon Son 61ec521135
Remove keepSegmentGranularity option for compaction (#7747)
* Remove keepSegmentGranularity option from compaction

* fix it test

* clean up

* remove from web console

* fix test
2019-06-03 12:59:15 -07:00
Jihoon Son e289820bbd Add a script to find missing backports (#7817) 2019-06-03 07:56:52 -07:00
Eyal Yurman 69e9b8a464 Enables SQL by default. (#7808) 2019-05-31 20:53:42 -07:00
Justin Borromeo 8032c4add8 Add errors and state to stream supervisor status API endpoint (#7428)
* Add state and error tracking for seekable stream supervisors

* Fixed nits in docs

* Made inner class static and updated spec test with jackson inject

* Review changes

* Remove redundant config param in supervisor

* Style

* Applied some of Jon's recommendations

* Add transience field

* write test

* implement code review changes except for reconsidering logic of markRunFinishedAndEvaluateHealth()

* remove transience reporting and fix SeekableStreamSupervisorStateManager impl

* move call to stateManager.markRunFinished() from RunNotice to runInternal() for tests

* remove stateHistory because it wasn't adding much value, some fixes, and add more tests

* fix tests

* code review changes and add HTTP health check status

* fix test failure

* refactor to split into a generic SupervisorStateManager and a specific SeekableStreamSupervisorStateManager

* fixup after merge

* code review changes - add additional docs

* cleanup KafkaIndexTaskTest

* add additional documentation for Kinesis indexing

* remove unused throws class
2019-05-31 17:16:01 -07:00
Jonathan Wei 83152a7a00 Fix performance-faq and remove insert-segment-to-db redirects (#7759) 2019-05-24 13:20:02 -07:00
Jonathan Wei cfb7756c9b Fix references to removed performance FAQ page (#7755) 2019-05-24 11:52:40 -07:00
Jonathan Wei eb0e1a056c Add limit to timeseries docs (#7750) 2019-05-23 19:41:52 -07:00
Jonathan Wei f2e34a76bd Fix TOC clustering example link (#7749) 2019-05-23 19:41:27 -07:00
Jonathan Wei ec4d09a02f Remove obsolete isExcluded config from Kerberos authenticator (#7745) 2019-05-23 16:00:05 -07:00
awelsh93 6964ac23a2 Adding influxdb emitter as a contrib extension (#7717)
* Adding influxdb emitter as a contrib extension

* addressing code review comments
2019-05-23 11:11:48 -07:00
Fangjin Yang 3dec5cd1e4
reorganizing the ToC (#7734) 2019-05-23 09:24:38 -07:00
gocho1 bd899b9224 add s3 authentication method information (#7674)
* add s3 authentication method information

* add druid.s3.fileSessionCredentials related content

* remove authentication parameters to avoid confusion as it is more detailed in S3 Deep Storage page

* streamline s3 docs
2019-05-22 11:46:02 -07:00
Gian Merlino cbbce955de SQL: Allow NULLs in place of optional arguments in many functions. (#7709)
* SQL: Allow NULLs in place of optional arguments in many functions.

Also adjust SQL docs to describe how to make time literals using
TIME_PARSE (which is now possible in a nicer way).

* Be less forbidden.
2019-05-21 11:54:34 -07:00
Gian Merlino b6941551ae Upgrade various build and doc links to https. (#7722)
* Upgrade various build and doc links to https.

Where it wasn't possible to upgrade build-time dependencies to https,
I kept http in place but used hardcoded checksums or GPG keys to ensure
that artifacts fetched over http are verified properly.

* Switch to https://apache.org.
2019-05-21 11:30:14 -07:00
Xue Yu dd7dace70a Add TIMESTAMPDIFF sql support (#7695)
* add timestampdiff sql support

* feedback address
2019-05-21 08:05:38 -07:00
Vadim Ogievetsky 156322932f Update Druid Console docs for 0.15.0 (#7697)
* Update Druid Console docs for 0.15.0

* SQL -> query

* added links and fix typos
2019-05-21 04:00:42 -07:00
andrewluotechnologies 1add566411 Fix typo (ComplexMetricSerde class name was spelled incorrectly) (#7694) 2019-05-19 09:49:54 -07:00
Jihoon Son 94721de141 Add auto tagging milestone script (#7677)
* Add auto tagging milestone script

* fix usage

* missing newline

* missing newline
2019-05-16 23:11:16 -07:00
Clint Wylie 939b417379 Update tutorial-kafka.md (#7678) 2019-05-16 23:10:45 -07:00
Jonathan Wei d99f77a01b
Add option to use YARN RM as fallback for JobHistory failure (#7673)
* Add option to use YARN RM as fallback for job status

* PR comments
2019-05-16 13:59:10 -07:00
Fangjin Yang dc85a5309e
some more doc improvements (#7675) 2019-05-16 13:17:21 -07:00
Jonathan Wei d667655871 Add basic tuning guide, getting started page, updated clustering docs (#7629)
* Add basic tuning guide, getting started page, updated clustering docs

* Add note about caching, fix tutorial paths

* Adjust hadoop wording

* Add license

* Tweak

* Shrink overlord heaps, fix tutorial urls

* Tweak xlarge peon, update peon sizing

* Update Data peon buffer size

* Fix cluster start scripts

* Add upper level _common to classpath

* Fix cluster data/query confs

* Address PR comments

* Elaborate on connection pools

* PR comments

* Increase druid.broker.http.maxQueuedBytes

* Add guidelines for broker backpressure

* PR comments
2019-05-16 11:13:48 -07:00
Benedict Jin 3df364c472 Fix broken links in api-reference.md (#7670) 2019-05-15 18:53:34 -07:00
Clint Wylie c2abbc24a7 minor web console doc fixes (#7668) 2019-05-15 18:52:51 -07:00
Surekha d3545f5086 Show all server types in sys.servers table (#7654)
* update sys.servers table to show all servers

* update docs

* Fix integration test

* modify test query for batch integration test

* fix case in test queries

* make the server_type lowercase

* Apply suggestions from code review

Co-Authored-By: Himanshu <g.himanshu@gmail.com>

* Fix compilation from git suggestion

* fix unit test
2019-05-15 16:54:02 -07:00
Gian Merlino 0352f450d7 Fix broken links in docs, add broken link checker. (#7658)
Also adds back insert-segment-to-db.md with some docs about why and
when it was removed (in #6911).
2019-05-15 14:49:50 -07:00
Surekha 917106985f Update tutorial to delete data (#7577)
* Update tutorial to delete data

* update tutorial, remove old ways to drop data

* PR comments
2019-05-15 14:40:06 -07:00
Jonathan Wei e874da7cea
Add simpler permissions option to BasicAuthorizer GET APIs (#7635)
* Add simpler permissions option to BasicAuthorizer GET APIs

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>

* Adjust log message

Co-Authored-By: Himanshu <g.himanshu@gmail.com>
2019-05-15 12:59:32 -07:00
Clint Wylie b87c8f0314 fix lookup editor to use lookup tiers instead of historical tiers (#7647)
* fix lookup editor to use lookup tiers instead of historical tiers

* use default tier if empty response, fix if configured lookups is null

* fixes

* fix typo
2019-05-14 13:30:51 -07:00
Alexander Saydakov ca1a6649f6 Datasketches quantiles more post-aggs (#7550)
* rank and CDF post-aggs

* added post-aggs to the module

* added new post-aggs

* moved post-agg IDs

* moved post-agg IDs
2019-05-10 11:46:54 -07:00
Clint Wylie 402d76a10f make-redirects.py requires python3, explicitly specify it (#7625) 2019-05-09 21:32:58 -07:00
Clint Wylie 6a6c6d573d
Add plain text README.txt, use relative link from README.md to build.md (#7611)
* use relative link to build instructions from top level readme

* add textfile to readme

* formatting

* make README.BINARY plaintext, move LABELS.md to LABELS, README.txt to README

* exclude README.BINARY still

* remove jdk links/recommendations

* add script to use DRUIDVERSION in textfile README instead of latest, add links to recommended jdk to build.md

* license

* better readme template, links to latest if does not detect an apache release version

* fix
2019-05-09 21:29:26 -07:00
Samarth Jain b542bb9f34 TDigest backed sketch aggregators (#7331)
* First set of changes for tDigest histogram

* Add license

* Address code review comments

* Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments.

* Remove synchronization from BufferAggregators. Address code review comments

* Fix typo
2019-05-09 17:22:55 -07:00
Magnus Henoch 2ac112151f Fix formatting in scan query documentation (#7622)
Escape underscores in `__time`, so they're not interpreted as bold
formatting.
2019-05-09 11:32:37 -07:00
Jinseon Lee 0ef435a16c add postgresql meta db table schema configuration property (#7137) (#7183)
* add postgresql meta db table schema configuration property (#7137)

If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public

* create postgresql metadb table schema configuration property (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java

* modify postgresql readme file. - metadb table schema (#7137)
If the postgresql db schema changes, you must set the configuration
values.
You do not need to set it if there is no change from the default schema
'public'.
druid.metadata.postgres.dbTableSchema=public
check PostgreSQLTablesConfig.java
2019-05-08 12:56:30 -07:00
Jonathan Wei dadf6a2f11
Add tool for migrating from local deep storage/Derby metadata (#7598)
* Add tool for migrating from local deep storage/Derby metadata

* Split deep storage and metadata migration docs

* Support import into Derby

* Fix create tables cmd

* Fix create tables cmd

* Fix commands

* PR comment

* Add -p
2019-05-06 23:39:40 -07:00
Jonathan Wei 7c2ca474da Add single-machine deployment example cfgs and scripts (#7590)
* Add single-machine deployment example cfgs and scripts

* Add (8u92+)

* Use combined coordinator-overlord for single machine confs

* RAT fix
2019-05-06 19:11:13 -07:00
Gian Merlino 727b65c7e5 Remove SQL experimental banner and other doc adjustments. (#7591)
* Remove SQL experimental banner and other doc adjustments.

Also,

- Adjust the ToC and other docs a bit so SQL and native queries are
  presented on more equal footing.
- De-emphasize querying historicals and peons directly in the
  native query docs. This is a really niche thing and may have been
  confusing to include prominently in the very first paragraph.
- Remove DataSketches and Kafka indexing service from the experimental
  features ToC. They are not experimental any longer and were there in
  error.

* More notes.

* Slight tweak.

* Remove extra extra word.

* Remove RT node from ToC.
2019-05-06 12:31:51 -07:00
Samarth Jain afbcb9c07f Improve parallelism of zookeeper based segment change processing (#7088)
* V1 - improve parallelism of zookeeper based segment change processing

* Create zk nodes in batches. Address code review comments.
Introduce various configs.

* Add documentation for the newly added configs

* Fix test failures

* Fix more test failures

* Remove printStackTrace statements

* Address code review comments

* Use a single queue

* Address code review comments

Since we have a separate load peon for every historical, just having a single SegmentChangeProcessor
task per historical is enough. This commit also gets rid of the associated config druid.coordinator.loadqueuepeon.curator.numCreateThreads

* Resolve merge conflict

* Fix compilation failure

* Remove batching since we already have a dynamic config maxSegmentsInNodeLoadingQueue that provides that control

* Fix NPE in test

* Remove documentation for configs that are no longer needed

* Address code review comments

* Address more code review comments

* Fix checkstyle issue

* Address code review comments

* Code review comments

* Add back monitor node remove executor

* Cleanup code to isolate null checks and minor refactoring

* Change param name since it conflicts with member variable name
2019-05-03 15:58:42 +02:00
Jonathan Wei a013350018 Adjust required permissions for system schema (#7579)
* Adjust required permissions for system schema

* PR comments, fix current_size handling

* Checkstyle

* Set curr_size instead of current_size

* Adjust information schema docs

* Fix merge conflict

* Update tests
2019-05-02 07:18:02 -07:00
Surekha 15d19f3059 Add is_overshadowed column to sys.segments table (#7425)
* Add is_overshadowed column to sys.segments table

* update docs

* Rename class and variables

* PR comments

* PR comments

* remove unused variables in MetadataResource

* move constants together

* add getFullyOvershadowedSegments method to ImmutableDruidDataSource

* Fix compareTo of SegmentWithOvershadowedStatus

* PR comment

* PR comments

* PR comments

* PR comments

* PR comments

* fix issue with already consumed stream

* minor refactoring

* PR comments
2019-05-01 18:00:57 +02:00
Gian Merlino c648775b5b SQL: Remove "useFallback" feature. (#7567)
This feature allows Calcite's Bindable interpreter to be bolted on
top of Druid queries and table scans. I think it should be removed for
a few reasons:

1. It is not recommended for production anyway, because it generates
unscalable query plans (e.g. it will plan a join into two table scans
and then try to do the entire join in memory on the broker).
2. It doesn't work with Druid-specific SQL functions, like TIME_FLOOR,
REGEXP_EXTRACT, APPROX_COUNT_DISTINCT, etc.
3. It makes the SQL planning code needlessly complicated.

With SQL coming out of experimental status soon, it's a good opportunity
to remove this feature.
2019-04-28 18:26:44 -07:00
Eyal Yurman f02251ab2d Contributing Moving-Average Query to open source. (#6430)
* Contributing Moving-Average Query to open source.

* Fix failing code inspections.

* See if explicit types will invoke the correct comparison function.

* Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter.

* Update styling and headers for compliance.

* Refresh code with latest master changes:

* Remove NullDimensionSelector.
* Apply changes of RequestLogger.
* Apply changes of TimelineServerView.

* Small checkstyle fix.

* Checkstyle fixes.

* Fixing rat errors; Teamcity errors.

* Removing support for theta sketches. Will be added back in this PR or a following one once DI conflicts with datasketches are resolved.

* Implements some of the review fixes.

* More fixes for review.

* More fixes from review.

* MapBasedRow is Unmodifiable. Create new rows instead of modifying existing ones.

* Remove more changes related to datasketches support.

* Refactor BaseAverager startFrom field and add a comment.

* fakeEvents field: Refactor initialization and add comment.

* Rename parameters (tiny change).

* Fix variable name typo in test (JAN_4).

* Fix styling of non camelCase fields.

* Fix Preconditions.checkArgument for cycleSize.

* Add more documentation to RowBucketIterable and other classes.

* key/value comment in MovingAverageIterable.

* Fix anonymous makeColumnValueSelector returning null.

* Replace IdentityYieldingAccumolator with Yielders.each().

* internalNext() should return null instead of throwing exception.
* Remove unused variables/parameters.

* Harden MovingAverageIterableTest (Switch anyOf to exact match).

* Change internalNext() from recursion to iteration; Simplify next() and hasNext().

* Remove unused imports.

* Address review comments.

* Rename fakeEvents to emptyEvents.

* Remove redundant parameter key from computeMovingAverage.

* Check yielder as well in RowBucketIterable#hasNext()

* Fix javadoc.
2019-04-26 17:07:48 -07:00
Adam Peck ebdf07b69f Add reload by interval API (#7490)
* Add reload by interval API
Implements the reload proposal of #7439
Added tests and updated docs

* PR updates

* Only build timeline with required segments
Use 404 with message when a segmentId is not found
Fix typo in doc
Return number of segments modified.

* Fix checkstyle errors

* Replace String.format with StringUtils.format

* Remove return value

* Expand timeline to segments that overlap for intervals
Restrict update call to only segments that need updating.

* Only add overlapping enabled segments to the timeline

* Some renames for clarity
Added comments

* Don't rely on cached poll data
Only fetch required information from DB

* Match error style

* Merge and cleanup doc

* Fix String.format call

* Add unit tests

* Fix unit tests that check for overshadowing
2019-04-26 16:01:50 -07:00
Clint Wylie 09b7700d13 fix docs (#7556) 2019-04-25 22:00:37 -07:00
Justin Borromeo 012ab02bf4 Update select doc disclaimer (#7554) 2019-04-25 19:23:39 -07:00
Surekha 8308ffef1f API to drop data by interval (#7494)
* Add api to drop data by interval

* update to address comments

* unused imports

* PR comments + add tests in SQLMetadataSegmentManagerTest

* update tests and docs
2019-04-25 14:24:40 -07:00
Jonathan Wei 658fb2b062 Fix bugs in milestone contributor script (#7545)
* Only check PRs in milestone contributor script

* Fix no-pagination bug
2019-04-24 22:11:57 -07:00
Jonathan Wei 8b1a4e18dd Additional Apache branding doc updates (#7524) 2019-04-23 14:39:16 -07:00
Xue Yu 2c8a71f883 Support LPAD and RPAD sql function (#7388)
* lpad and rpad sql function

* feedback address

* feedback address

* add doc and format

* update docs
2019-04-22 14:51:32 -07:00
Jonathan Wei 3487663de9 Adjust approx agg deprecation wording (#7518) 2019-04-19 19:31:50 -07:00
Jonathan Wei 74960e82bf Add more Apache branding to docs (#7515) 2019-04-19 15:52:26 -07:00
Slim Bouguerra 5463ecb979 Fix broken link due to Typo. (#7513)
Change-Id: I5792f89ed6afe945f386058edd44f0400998460a
2019-04-19 09:58:54 -07:00
Jonathan Wei 8078f567aa Update kafka version in tutorials (#7500) 2019-04-17 14:56:29 -07:00
Kazuhito Takeuchi 7c19c92a81 Add ROUND function in druid-sql. (#7224)
* Implement round function in druid-sql

* Return value according to the type of argument

* Fix code for abnormal inputs, updated math-expr.md

* Fix assert text

* Fix error messages and refactor codes

* Fix compile error, update sql.md, refactor codes and format tests
2019-04-16 11:15:39 -07:00
Lucas Capistrant 8acad27d99 Enhance the Http Firehose to work with URIs requiring basic authentication (#7145)
* Enhance the HttpFirehose to work with both insecure URIs and URIs requiring basic authentication

* Improve security of enhanced HttpFirehoseFactory by not logging auth credentials

* Fix checkstyle failure in HttpFirehoseFactory.java

* Update docs and fix TeamCity build with required noinspection

* Indentation cleanup and logic modification for HttpFirehose object stream

* Remove default Empty string password provider in http firehose

* Add JavaDoc for MixIn describing its intended use

* Reverting documentation notation for json code to be inline with rest of doc

* Improve instantiation of ObjectMappers that require MixIn for redacting password from task logs

* Add comment to clarify fully qualified references of Objects in SQLMetadataStorageActionHandler
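
The bullets above mention basic authentication and keeping credentials out of logs without showing either. As a rough sketch only: the class and method names below (`BasicAuthSketch`, `authorizationHeader`, `describeForLog`) are hypothetical and are not taken from `HttpFirehoseFactory`, and the PR itself redacts passwords through a Jackson MixIn rather than the string formatting used here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, for illustration only; not the PR's implementation.
class BasicAuthSketch {
    // Builds the value of an HTTP "Authorization" header for basic authentication:
    // the literal "Basic " followed by base64("user:password").
    static String authorizationHeader(String user, String password) {
        String credentials = user + ":" + password;
        byte[] raw = credentials.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    // Produces a log-safe description: the password is never included.
    static String describeForLog(String uri, String user) {
        return "uri=" + uri + ", user=" + user + ", password=<redacted>";
    }
}
```

A caller would set the returned header value on the outgoing request and pass only the output of `describeForLog` to the logger.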
2019-04-15 14:29:01 -07:00
Justin Borromeo 85f10ed0d0 Support querying realtime segments using time-ordered scan queries and fix broken scan queries without time column (#7454)
* Update scan query runner factory to accept SpecificSegmentSpec

*  nit

* Sorry travis

* Improve logging and fix doc

* Bug fix

* Friendlier error msgs and tests to cover bug

* Address Gian's comments

* Fix doc

* Added tests for empty and null column list

* Style

* Fix checking wrong order (looking at query param when it should be
looking at the null-handled order)

* Add test case for null order

* Fix ScanQueryRunnerTest

* Forbidden APIs fixed
2019-04-12 19:08:34 -07:00
zhaojiandong 1d9450da81 Some docs optimization (#6890)
* some markdown docs optimization

* markdown escape
2019-04-12 17:30:57 -07:00
Gian Merlino 2470b3279f SQL: Fix docs for STRING_FORMAT. (#7455) 2019-04-11 21:57:28 -07:00
Gian Merlino a517f8ce49 Coordinator: Allow dropping all segments. (#7447)
Removes the coordinator sanity check that prevents it from dropping all
segments. It's useful to get rid of this, since the behavior is
unintuitive for dev/testing clusters where users might regularly want
to drop all their data to get back to a clean slate.

But the sanity check was there for a reason: to prevent a race condition
where the coordinator might drop all segments if it ran before the
first metadata store poll finished. This patch addresses that concern
differently, by allowing methods in MetadataSegmentManager to return
null if a poll has not happened yet, and canceling coordinator runs
in that case.

This patch also makes the "dataSources" reference in
SQLMetadataSegmentManager volatile. I'm not sure why it wasn't volatile
before, but it seems necessary to me: it's not final, and it's dereferenced
from multiple threads without synchronization.
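
The idea described above (return null until the first poll completes, cancel the run in that case, and publish the reference through a volatile field) might be sketched roughly as follows. `PolledSegmentsSketch`, `onPoll`, `getSegments`, and `canRun` are hypothetical names chosen for illustration and do not mirror the real `MetadataSegmentManager` API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a metadata segment manager; illustration only.
class PolledSegmentsSketch {
    // volatile: written by the polling thread, read by the coordinator thread
    // without any other synchronization.
    private volatile List<String> segments = null;

    // Called by the (assumed) polling thread after each metadata store poll.
    void onPoll(List<String> polled) {
        segments = Collections.unmodifiableList(new ArrayList<>(polled));
    }

    // Returns null until the first poll has finished.
    List<String> getSegments() {
        return segments;
    }

    // A coordinator run should be canceled when no poll has happened yet.
    // An empty list is a valid answer: it means "all segments may be dropped".
    static boolean canRun(PolledSegmentsSketch manager) {
        return manager.getSegments() != null;
    }
}
```

The distinction that matters here is between null (nothing known yet) and an empty list (known to be empty), which is what lets the old sanity check go away.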
2019-04-11 08:45:38 -07:00
Justin Borromeo 408e3e1b2a Remove select execution code from SQL planner (#7416)
* Removed select execution code from SQL planner

* Update doc
2019-04-10 22:32:57 -07:00
Benjamin Hopp 78e6f6fb38 Updated Javascript Affinity config docs (#7441)
Updated with hostname:port rather than IP Address.
2019-04-10 21:44:50 -07:00
Benedict Jin 2f64414ade Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions (#7334)
* Add "REVERSE" / "REPEAT" / "RIGHT" / "LEFT" functions

* Fix ImportOrder

* Use RuntimeException instead of OutOfMemoryError according to "Effective Java"

* Simplify

* Patch suggestions
2019-04-10 11:46:29 +08:00
Clint Wylie 89bb43f382 'core' ORC extension (#7138)
* orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs

* change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling

* better docs and tests

* fix it

* formatting

* doc fix

* fix it

* exclude redundant dependencies

* use latest orc-mapreduce, add hadoop jobProperties recommendations to docs

* doc fix

* review stuff and fix binaryAsString

* cache for root level fields

* more better
2019-04-09 09:03:26 -07:00
Justin Borromeo 799c66d9ac Allow max rows and max segments for time-ordered scans to be overridden using the scan query JSON spec (#7413)
* Initial changes

* Fixed NPEs

* Fixed failing spec test

* Fixed failing Calcite test

* Move configs to context

* Validated and added docs

* fixed weird indentation

* Update default context vals in doc

* Fixed allowable values
2019-04-07 20:12:52 -07:00
Clint Wylie e28a15f9f5 fix expressions docs operator table (#7420)
* fix expressions docs operator table

* Update math-expr.md
2019-04-07 20:12:00 -07:00
Justin Borromeo e23fd41fa7 Update SQL doc for planning change (#7415) 2019-04-05 15:14:07 -07:00
Jonathan Wei 0f6cb1e7e0 Update theta/hll sketch doc comparison (#7407) 2019-04-03 15:21:33 -07:00
Gian Merlino 8c104a115c SQL: Add STRING_FORMAT function. (#7327) 2019-04-03 17:09:54 -04:00
David Glasser 4e23c11345 Make IngestSegmentFirehoseFactory splittable for parallel ingestion (#7048)
* Make IngestSegmentFirehoseFactory splittable for parallel ingestion

* Code review feedback

- Get rid of WindowedSegment
- Don't document 'segments' parameter or support splitting firehoses that use it
- Require 'intervals' in WindowedSegmentId (since it won't be written by hand)

* Add missing @JsonProperty

* Integration test passes

* Add unit test

* Remove two FIXME comments from CompactionTask

I'd like to leave this PR in a potentially mergeable state, but I still would
appreciate reviewer eyes on the questions I'm removing here.

* Updates from code review
2019-04-02 14:59:17 -07:00
Xue Yu 78fd5aff21 support radians and degrees in sql (#7336)
* support radians and degrees in sql

* update test case
2019-04-02 12:47:49 -07:00
Qi Shu 134f71d1b4 Add documentation for Druid native query in SQL view of web console (#7381)
* Add documentation for Druid native query in SQL view of web console

* Edit sentence
2019-04-02 12:20:51 -07:00
Michael Trelinski 347779b17a Zookeeper loss (#6740)
* Update init

Fix bin/init to source from proper directory.

* Fix for Proposal #6518: Shutdown druid processes upon complete loss of ZK connectivity

* Zookeeper Loss:

- Add feature documentation
- Cosmetic refactors
- Variable extractions
- Remove getter

* - Change config key name and reword documentation
- Switch from Function<Void,Void> to Runnable/Lambda
- try { … } finally { … }

* Fix line length too long

* - change to formatted string for logging
- use System.err.println after lifecycle stops

* commenting on makeEnsembleProvider()-created Zookeeper termination

* Add javadoc

* added java doc reference back to apache discussion thread.

* move comment to other class

* favor two-slash comments instead of multiline comments
2019-03-29 15:10:42 -07:00
Justin Borromeo ad7862c58a Time Ordering On Scans (#7133)
* Moved Scan Builder to Druids class and started on Scan Benchmark setup

* Need to form queries

* It runs.

* Stuff for time-ordered scan query

* Move ScanResultValue timestamp comparator to a separate class for testing

* Licensing stuff

* Change benchmark

* Remove todos

* Added TimestampComparator tests

* Change number of benchmark iterations

* Added time ordering to the scan benchmark

* Changed benchmark params

* More param changes

* Benchmark param change

* Made Jon's changes and removed TODOs

* Broke some long lines into two lines

* nit

* Decrease segment size for less memory usage

* Wrote tests for heapsort scan result values and fixed bug where iterator
wasn't returning elements in correct order

* Wrote more tests for scan result value sort

* Committing a param change to kick teamcity

* Fixed codestyle and forbidden API errors

* .

* Improved conciseness

* nit

* Created an error message for when someone tries to time order a result
set > threshold limit

* Set to spaces over tabs

* Fixing tests WIP

* Fixed failing calcite tests

* Kicking travis with change to benchmark param

* added all query types to scan benchmark

* Fixed benchmark queries

* Renamed sort function

* Added javadoc on ScanResultValueTimestampComparator

* Unused import

* Added more javadoc

* improved doc

* Removed unused import to satisfy PMD check

* Small changes

* Changes based on Gian's comments

* Fixed failing test due to null resultFormat

* Added config and get # of segments

* Set up time ordering strategy decision tree

* Refactor and pQueue works

* Cleanup

* Ordering is correct on n-way merge -> still need to batch events into
ScanResultValues

* WIP

* Sequence stuff is so dirty :(

* Fixed bug introduced by replacing deque with list

* Wrote docs

* Multi-historical setup works

* WIP

* Change so batching only occurs on broker for time-ordered scans

Restricted batching to broker for time-ordered queries and adjusted
tests

Formatting

Cleanup

* Fixed mistakes in merge

* Fixed failing tests

* Reset config

* Wrote tests and added Javadoc

* Nit-change on javadoc

* Checkstyle fix

* Improved test and appeased TeamCity

* Sorry, checkstyle

* Applied Jon's recommended changes

* Checkstyle fix

* Optimization

* Fixed tests

* Updated error message

* Added error message for UOE

* Renaming

* Finish rename

* Smarter limiting for pQueue method

* Optimized n-way merge strategy

* Rename segment limit -> segment partitions limit

* Added a bit of docs

* More comments

* Fix checkstyle and test

* Nit comment

* Fixed failing tests -> allow usage of all types of segment spec

* Fixed failing tests -> allow usage of all types of segment spec

* Revert "Fixed failing tests -> allow usage of all types of segment spec"

This reverts commit ec470288c7.

* Revert "Merge branch '6088-Time-Ordering-On-Scans-N-Way-Merge' of github.com:justinborromeo/incubator-druid into 6088-Time-Ordering-On-Scans-N-Way-Merge"

This reverts commit 57033f36df, reversing
changes made to 8f01d8dd16.

* Check type of segment spec before using for time ordering

* Fix bug in numRowsScanned

* Fix bug messing up count of rows

* Fix docs and flipped boolean in ScanQueryLimitRowIterator

* Refactor n-way merge

* Added test for n-way merge

* Refixed regression

* Checkstyle and doc update

* Modified sequence limit to accept longs and added test for long limits

* doc fix

* Implemented Clint's recommendations
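
The n-way merge strategy named in the bullets above is not spelled out in this message. As a rough illustration only, and assuming each segment already yields rows in timestamp order, a priority-queue merge could look like the following. `NWayMergeSketch`, `Cursor`, and `merge` are hypothetical names, and plain `long` timestamps stand in for scan result rows; this is not the code from the PR.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge of time-ordered inputs; illustration only.
class NWayMergeSketch {
    // Pairs an input iterator with the element currently at its head.
    private static final class Cursor {
        final Iterator<Long> rest;
        long head;

        Cursor(Iterator<Long> nonEmpty) {
            this.rest = nonEmpty;
            this.head = nonEmpty.next();
        }
    }

    // Merges inputs that are each sorted ascending, stopping after `limit` rows.
    static List<Long> merge(List<List<Long>> sortedInputs, long limit) {
        PriorityQueue<Cursor> queue =
            new PriorityQueue<>((a, b) -> Long.compare(a.head, b.head));
        for (List<Long> input : sortedInputs) {
            Iterator<Long> it = input.iterator();
            if (it.hasNext()) {
                queue.add(new Cursor(it));
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!queue.isEmpty() && merged.size() < limit) {
            Cursor smallest = queue.poll();
            merged.add(smallest.head);
            if (smallest.rest.hasNext()) {
                // Advance this input and put it back so the queue stays ordered.
                smallest.head = smallest.rest.next();
                queue.add(smallest);
            }
        }
        return merged;
    }
}
```

Only one element per input is held in the queue at a time, so memory grows with the number of inputs rather than with the number of rows.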
2019-03-28 14:37:09 -07:00
Surekha be318f4de3 Add column type to sys table docs (#7359)
* Add column type

* oops should be used=1
2019-03-27 20:21:57 -07:00
Charles Allen eeb3dbe79d Move GCP to a core extension (#6953)
* Move GCP to a core extension

* Don't provide druid-core >.<

* Keep AWS and GCP modules separate

* Move AWSModule to its own module

* Add aws ec2 extension and more modules in more places

* Fix bad imports

* Fix test jackson module

* Include AWS and GCP core in server

* Add simple empty method comment

* Update version to 15

* One more 0.13.0-->0.15.0 change

* Fix multi-binding problem

* Grep for s3-extensions and update docs

* Update extensions.md
2019-03-27 09:00:43 -07:00
Justin Borromeo c7fea6ac8f Added better QueryInterruptedException error message for UnsupportedOperationException (#7248)
* Added error message for UOE

* Updated docs

* Doc change

* Doc change
2019-03-26 15:20:24 -07:00
Gian Merlino 4ca5fe0f60 SQL: Add PARSE_LONG function. (#7326)
* SQL: Add PARSE_LONG function.

* Fix test.
2019-03-22 15:40:10 -07:00
Vadim Ogievetsky e4f2dcacf2 Druid console docs (#7300)
* console docs

* fix typo
2019-03-21 00:37:33 -07:00
Justin Borromeo ff94bd16e6 Fix conflicting information in configuration doc (#7299)
* Doc fix

* Fix typo
2019-03-19 14:55:58 -07:00
Qi Shu 5406aaa49d Add SQL auto complete in druid console (#7244)
* Add SQL auto complete in druid console

* Add comment in sql.md to alert user to change create-sql-function-doc if sql.md format gets changed
2019-03-16 01:45:53 -07:00
Jihoon Son 892d1d35d6 Deprecate NoneShardSpec and drop support for automatic segment merge (#6883)
* Deprecate noneShardSpec

* clean up noneShardSpec constructor

* revert unnecessary change

* Deprecate mergeTask

* add more doc

* remove convert from indexMerger

* Remove mergeTask

* remove HadoopDruidConverterConfig

* fix build

* fix build

* fix teamcity

* fix teamcity

* fix ServerModule

* fix compilation

* fix compilation
2019-03-15 23:29:25 -07:00
Atul Mohan 2daeb50008 Add support for optional client authentication on TLS (#7250)
* Add optional client auth

* Add docs
2019-03-15 15:14:34 -07:00
Hongze Zhang f9d99b245b Add missing doc link for operations/http-compression.html; Fix magic numbers in test cases using JettyServerInitUtils.wrapWithDefaultGzipHandler (#7110) 2019-03-13 14:09:19 -07:00
Clint Wylie 3895914aa2 consolidate CompressionUtils.java since now in the same jar (#6908) 2019-03-13 11:02:44 -04:00
Gian Merlino 9178793ab5 Further improve caching documentation. (#7236)
Follow-up to #7223 that fixes a doc bug (a result-level cache property
was misspelled), changes the recommended "small cluster" threshold from
20 to 5 servers, and clarifies behavior of the various caching options.
2019-03-11 17:57:00 -07:00
Pierre-Emile Ferron a88fbcd5db Improve caching doc (#7223)
- Set correct default values for query context result cache parameters
- Add details about broker cache impact on local historical merging
2019-03-11 20:06:28 -04:00
Venkatraman P 3118160387 Adding a tutorial in doc for using Kerberized Hadoop as deep storage. (#6863)
* Adding a tutorial in doc for using Kerberized Hadoop as deep storage.

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md

Fixed - to ~ in Apache License section.

* Update tutorial-kerberos-hadoop.md

* Update tutorial-kerberos-hadoop.md
2019-03-11 11:39:15 -07:00
Jonathan Wei e1d8c17746 Add commit ID milestone helper script (#7100)
* Add commit ID milestone helper script

* Filter on merged/closed in API call
2019-03-11 11:36:07 -07:00
Jonathan Wei 94463b5778 Add missing redirects and fix broken links (#7213)
* Add missing redirects

* Fix zookeeper redirect

* Fix broken links
2019-03-09 15:16:23 -08:00
jorbay-au 62f0de9b89 Remove outdated instruction for rule updates (#7205) 2019-03-08 16:42:08 -08:00
Clint Wylie a44df6522c rename maintenance mode to decommission (#7154)
* rename maintenance mode to decommission

* review changes

* missed one

* fix straggler, add doc about decommissioning stalling if no active servers

* fix missed typo, docs

* refine docs

* doc changes, replace generals

* add explicit comment to mention suppressed stats for balanceTier

* rename decommissioningVelocity to decommissioningMaxSegmentsToMovePercent and update docs

* fix precondition check

* decommissioningMaxPercentOfMaxSegmentsToMove

* fix test

* fix test

* fixes
2019-03-08 16:33:51 -08:00
Jihoon Son e48a9c138e Reduce default max # of subTasks to 1 for native parallel task (#7181)
* Reduce # of max subTasks to 2

* fix typo and add more doc

* add more doc and link

* change default and add warning

* fix doc

* add test

* fix it test
2019-03-05 22:06:36 -08:00
Jonathan Wei 9183e32876 Add more approximate algorithm docs (#7195) 2019-03-05 16:44:02 -08:00
Xue Yu 65118277a3 support sin cos etc trigonometric function in sql (#7182)
* support triangle function in sql

* feedback address
2019-03-04 19:18:22 -08:00
Jonathan Wei 5486c2abf8 Update LICENSE and NOTICE files (#7026)
* Update LICENSE and NOTICE files

* Update react-table version
2019-03-04 18:45:22 -08:00
Roman Leventov 10c9f6d708 Fix and document concurrency of EventReceiverFirehose and TimedShutoffFirehose; Refine concurrency specification of Firehose (#7038)
#### `EventReceiverFirehoseFactory`
Fixed several concurrency bugs in `EventReceiverFirehoseFactory`:
 - Race condition over putting an entry into `producerSequences` in `checkProducerSequence()`.
 - `Stopwatch` used to measure time across threads, but it's a non-thread-safe class.
 - Use `System.nanoTime()` instead of `System.currentTimeMillis()` because the latter is [not suitable](https://stackoverflow.com/a/351571/648955) for measuring time intervals.
 - `close()` was not synchronized but could be called from multiple threads concurrently.
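
Two of the points above, measuring intervals with `System.nanoTime()` across threads and making `close()` safe under concurrent calls, can be illustrated with a minimal sketch. `IdleTrackerSketch` and its methods are hypothetical names and are not the actual `EventReceiverFirehose` code; the sketch assumes one thread records requests while another checks idle time.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the EventReceiverFirehose implementation.
class IdleTrackerSketch {
    // volatile so that a checker thread sees the latest value written by a request thread.
    // System.nanoTime() is monotonic within a JVM, which makes it suitable for intervals.
    private volatile long lastRequestNanos = System.nanoTime();

    // Guarded by "this".
    private boolean closed = false;

    // Called by request-handling threads.
    void recordRequest() {
        lastRequestNanos = System.nanoTime();
    }

    // Called by a separate checker thread; subtraction handles nanoTime overflow correctly.
    long idleMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastRequestNanos);
    }

    // Safe to call from several threads; only the first call performs the close.
    synchronized boolean close() {
        if (closed) {
            return false;
        }
        closed = true;
        return true;
    }
}
```

Returning a boolean from `close()` is a choice made for this sketch so the idempotence is observable; a real implementation could just as well return void.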

Removed unnecessary `readLock` (protecting `hasMore()` and `nextRow()` which are always called from a single thread). Removed unnecessary `volatile` modifiers.

Documented threading model and concurrent control flow of `EventReceiverFirehose` instances.

**Important:** please read the updated Javadoc for `EventReceiverFirehose.addAll()`. It allows events from different requests (batches) to be interleaved in the buffer. Is this OK?

#### `TimedShutoffFirehoseFactory`
- Fixed a race condition that was possible because `close()` was not properly synchronized.

Documented threading model and concurrent control flow of `TimedShutoffFirehose` instances.

#### `Firehose`

Refined concurrency contract of `Firehose` based on `EventReceiverFirehose` implementation. Importantly, now it states that `close()` doesn't affect `hasMore()` and `nextRow()` and could be called concurrently with them. In other words, specified that `close()` is for "row supply" side rather than "row consume" side. However, I didn't check that other `Firehose` implementations adhere to this contract.

<hr>

This issue is the result of reviewing `EventReceiverFirehose` and `TimedShutoffFirehose` using [this checklist](https://medium.com/@leventov/code-review-checklist-java-concurrency-49398c326154).
2019-03-04 18:50:03 -03:00
Jihoon Son ded03d9d4c Improve doc for auto compaction (#7117)
* Improve doc for auto compaction

* fix doc

* address comments
2019-03-02 12:21:50 -08:00
Jihoon Son 45f12de9ad Fix supported file formats for Hadoop vs Native batch doc (#7069)
* Fix supported file formats

* address comment
2019-02-28 19:44:45 -08:00
Jonathan Wei 32c418fdd8 Reword 'node' to 'process' (#7172) 2019-02-28 18:10:39 -08:00
Jonathan Wei a0afd7931d Add web consoles doc page (#7123)
* Add web consoles doc page

* PR comments

* Remove 'unified'

* PR comments

* Fix TOC

* PR comments

* More revisions

* GUI -> UI

* Update router docs

* Reword router doc
2019-02-28 14:02:39 -08:00
Jonathan Wei 0b4f771062 Exclude hadoop-lzo from thrift-extensions build (#7151) 2019-02-27 19:57:53 -08:00
Jonathan Wei 3d247498ef Update tutorials for 0.14.0-incubating (#7157) 2019-02-27 19:50:31 -08:00
Jihoon Son 6b232d8195 Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true (#7079)
* Improve compaction tutorial to demonstrate compaction with keepSegmentGranularity = true

* typo

* add a warning
2019-02-27 16:02:51 -08:00
Jihoon Son 4e2b085201 Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911)
* Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file

* delete descriptor.file when killing segments

* fix test

* Add doc for ha

* improve warning
2019-02-20 15:10:29 -08:00
Mingming Qiu dd34691004 Coordinator await initialization before finishing startup (#6847)
* Curator server inventory await initialization

* address comments

* print exception object in log

* remove throws ISE

* cachingCost awaitInitialization default to false
2019-02-20 11:56:23 -08:00
David Glasser a81b1b8c9c index_parallel: support !appendToExisting with no explicit intervals (#7046)
* index_parallel: support !appendToExisting with no explicit intervals

This enables ParallelIndexSupervisorTask to dynamically request locks at runtime
if it is run without explicit intervals in the granularity spec and with
appendToExisting set to false.  Previously, it behaved as if appendToExisting
was set to true, which was undocumented and inconsistent with IndexTask and
Hadoop indexing.

Also, when ParallelIndexSupervisorTask allocates segments in the explicit
interval case, fail if its locks on the interval have been revoked.

Also make a few other additions/clarifications to native ingestion docs.

Fixes #6989.

* Review feedback.

PR description on GitHub updated to match.

* Make native batch ingestion partitions start at 0

* Fix to previous commit

* Unit test. Verified to fail without the other commits on this branch.

* Another round of review

* Slightly scarier warning
2019-02-20 10:54:26 -08:00
Surekha 2b04e6d0bc add note on consistency of results for sys.segments queries (#7034)
* add doc

* change docs

* PR comments

* few more changes
2019-02-19 10:52:37 -08:00
Clint Wylie cadb6c5280 Missing Overlord and MiddleManager api docs (#7042)
* document middle manager api

* re-arrange

* correction

* document more missing overlord api calls, minor re-arrange of some code i was referencing

* fix it

* this will fix it

* fixup

* link to other docs
2019-02-19 10:52:05 -08:00
Surekha 80a2ef7be4 Support kafka transactional topics (#5404) (#6496)
* Support kafka transactional topics

* update kafka to version 2.0.0
* Remove the skipOffsetGaps option since it's not used anymore
* Adjust kafka consumer to use transactional semantics
* Update tests

* Remove unused import from test

* Fix compilation

* Invoke transaction api to fix a unit test

* temporary modification of travis.yml for debugging

* another attempt to get travis tasklogs

* update kafka to 2.0.1 at all places

* Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes

* Add deprecated in docs for kafka-eight and kafka-simple extensions

* Remove skipOffsetGaps and code changes for transaction support

* Fix indentation

* remove skipOffsetGaps from kinesis

* Add transaction api to KafkaRecordSupplierTest

* Fix indent

* Fix test

* update kafka version to 2.1.0
2019-02-18 11:50:08 -08:00
scrawfor 0fa9000849 Add Postgresql SqlFirehose (#6813)
* Add Postgresql SqlFirehose

* Fix Code Style.

* Fix style.

* Fix Import Order.

* Add Line Break before package.
2019-02-14 22:52:03 -08:00
awelsh93 ee91e27fe7 Update api-reference.md doc (#7065)
- moving description of coordinator isLeader endpoint
2019-02-14 14:38:09 +00:00
Edward Gan 90c1a54b86 Moments Sketch custom aggregator (#6581)
* Moments Sketch Integration with Druid

* updates, add documentation, fix warnings

* nits

* disallowed base64

* update to druid 0.14
2019-02-13 14:03:47 -08:00
Jihoon Son 970308463d Add doc for Hadoop-based ingestion vs Native batch ingestion (#7044)
* Add doc for Hadoop-based ingestion vs Native batch ingestion

* add links

* add links
2019-02-13 11:23:08 -08:00
Jihoon Son b1c4a5de0d Fix and improve doc for partitioning of local index (#7064) 2019-02-13 11:20:52 -08:00
Jihoon Son d42de574d6 Add an api to get all lookup specs (#7025)
* Add an api to get all lookup specs

* add doc
2019-02-08 11:05:59 -08:00
Jihoon Son 8e3a58f723 Improve druid.storage.sse.kms.keyId and druid.s3.protocol (#7012)
* Improve druid.storage.sse.kms.keyId and druid.s3.protocol

* fix article
2019-02-06 15:00:51 -08:00
Jihoon Son 75c70c2ccc Add doc for S3 permissions settings (#7011)
* Add doc for S3 permissions settings

* add a comment about additional settings
2019-02-05 11:52:09 -08:00
Egor Riashin 97b6407983 maintenance mode for Historical (#6349)
* maintenance mode for Historical

forbidden api fix, config deserialization fix

logging fix, unit tests

* addressed comments

* addressed comments

* a style fix

* addressed comments

* a unit-test fix due to recent code-refactoring

* docs & refactoring

* addressed comments

* addressed a LoadRule drop flaw

* post merge cleaning up
2019-02-04 18:11:00 -08:00
Jonathan Wei 953b96d0a4 Add more sketch aggregator support in Druid SQL (#6951)
* Add more sketch aggregator support in Druid SQL

* Add docs

* Tweak module serde register

* Fix tests

* Checkstyle

* Test fix

* PR comment

* PR comment

* PR comments
2019-02-02 22:34:53 -08:00
Surekha 7baa33049c Introduce published segment cache in broker (#6901)
* Add published segment cache in broker

* Change the DataSegment interner so it's not based on DataSegment's equals only and size is preserved if set

* Added a trueEquals to DataSegment class

* Use separate interner for realtime and historical segments

* Remove trueEquals as it's not used anymore, change log message

* PR comments

* PR comments

* Fix tests

* PR comments

* Few more modification to

* change the coordinator api
* removeall segments at once from MetadataSegmentView in order to serve a more consistent view of published segments
* Change the poll behaviour to avoid multiple poll execution at same time

* minor changes

* PR comments

* PR comments

* Make the segment cache in broker off by default

* Added a config to PlannerConfig
* Moved MetadataSegmentView to sql module

* Add doc for new planner config

* Update documentation

* PR comments

* some more changes

* PR comments

* fix test

* remove unintentional change, whether to synchronize on lifecycleLock is still in discussion in PR

* minor changes

* some changes to initialization

* use pollPeriodInMS

* Add boolean cachePopulated to check if first poll succeeds

* Remove poll from start()

* take the log message out of condition in stop()
2019-02-02 22:27:13 -08:00
Justin Borromeo 6430ef8e1b lol (#6985) 2019-02-01 14:21:13 -08:00
Clint Wylie 7a5827e12e bloom filter sql aggregator (#6950)
* adds sql aggregator for bloom filter, adds complex value serde for sql results

* fix tests

* checkstyle

* fix copy-paste
2019-02-01 13:54:46 -08:00
lxqfy e45f9ea5e9 Update metrics.md (#6976) 2019-02-01 13:40:44 -08:00
jorbay-au 852fe86ea2 Remove repeated word in indexing-service.md (#6983) 2019-02-01 13:38:22 -08:00
Furkan KAMACI 185a7d4fc5 Updated definition and added link for Zookeeper connection string. (#6961)
* Updated definition and added link for Zookeeper connection string.

* Conflicts are merged.
2019-01-31 10:14:42 -08:00
Gian Merlino 54735a5ad1 Kafka indexing: Remove experimental notice. (#6970) 2019-01-31 09:54:22 -08:00
Surekha 4c211ab2b4 update sys table docs (#6955)
* update sys table docs

* Capitalize SQL
2019-01-31 08:51:39 -08:00
Jonathan Wei 82137874ea Add master/data/query server concepts to docs/packaging (#6916)
* Add master/data/query server concepts to docs/packaging

* PR comments

* TOC and markdown fix

* Update image legend

* PR comment

* More PR comments
2019-01-30 19:41:07 -08:00
Jihoon Son d4fbbb8deb Support protocol configuration for S3 (#6954)
* Support protocol configuration for S3

* Add doc
2019-01-30 19:32:00 -08:00
Gian Merlino edee576a7a Add doc for druid.storage.useS3aSchema. (#6964) 2019-01-30 10:26:37 -08:00
Clint Wylie a6d81c0d16 Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397)
* blooming aggs

* partially address review

* fix docs

* minor test refactor after rebase

* use copied bloomkfilter

* add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests

* add methods to BloomKFilter to get number of set bits, use in comparator, fixes

* more docs

* fix

* fix style

* simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets

* oof, more fixes

* more sane docs example

* fix it

* do the right thing in the right place

* formatting

* fix

* avoid conflict

* typo fixes, faster comparator, docs for comparator behavior

* unused imports

* use buffer comparator instead of deserializing

* striped readwrite lock for buffer agg, null handling comparator, other review changes

* style fixes

* style

* remove sync for now

* oops

* consistency

* inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception

* CardinalityBufferAggregator inspect selectors instead of selectorPluses

* fix style

* refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments

* adjustment

* fix teamcity error?

* rename nil aggs to empty, change empty agg constructor signature, add comments

* use stringutils base64 stuff to be chill with master

* add aggregate combiner, comment
2019-01-29 20:05:17 +07:00
Justin Borromeo 8d70ba69cf Fix broken link on select query doc page (#6933)
* Fixed broken link

* Typo fix
2019-01-28 17:02:21 -08:00
Clint Wylie af3cbc3687 add bloom filter druid expression (#6904)
* add "bloom_filter_test" druid expression to support bloom filters in ExpressionVirtualColumn and ExpressionDimFilter and sql expressions

* more docs

* use java.util.Base64, doc fixes
2019-01-28 08:41:45 -05:00
Navin Kumar ae4dba7785 Fix Configuration options (#6884)
Change `druid.metadata.postgres.*` to `druid.metadata.postgres.ssl.*`
2019-01-27 12:35:27 -08:00
Gian Merlino 7c5a06bb85 More docs on data modeling. (#6899)
* More docs on data modeling.

* Try to fix formatting.

* Fix indentation.

* More details and adjustments after feedback.
2019-01-27 11:33:21 -08:00
Janek Lasocki-Biczysko 89f2475369 Move ingest/kafka/* metrics into a separate section on the metrics docs (#6895)
The `ingest/kafka/*` metrics were grouped together with metrics relevant
to RealtimeMetricsMonitor, whereas they should be in their own section.
2019-01-28 00:11:53 +08:00
Jihoon Son 3b020fd81b Improve doc for auto compaction (#6782)
* Improve doc for auto compaction

* address comments

* address comments

* address comments
2019-01-23 16:21:45 -08:00
Justin Borromeo 86e171a234 Doc change and commands tested command on v5 and v8 (#6886) 2019-01-18 15:13:11 -08:00
Jonathan Wei 68f744ec0a Fixed buckets histogram aggregator (#6638)
* Fixed buckets histogram aggregator

* PR comments

* More PR comments

* Checkstyle

* TeamCity

* More TeamCity

* PR comment

* PR comment

* Fix doc formatting
2019-01-17 14:51:16 -08:00
lxqfy f6dcd63084 Fixed the format of broker client configuration (#6878) 2019-01-16 22:57:50 -08:00
Dayue Gao 5b8a221713 Add SQL id, request logs, and metrics (#6302)
* use SqlLifecyle to manage sql execution, add sqlId

* add sql request logger

* fix UT

* rename sqlId to sqlQueryId, sql/time to sqlQuery/time, etc

* add docs and more sql request logger impls

* add UT for http and jdbc

* fix forbidden use of com.google.common.base.Charsets

* fix UT in QuantileSqlAggregatorTest, suppressed unused warning of getSqlQueryId

* do not use default method in QueryMetrics interface

* capitalize 'sql' everywhere in the non-property parts of the docs

* use RequestLogger interface to log sql query

* minor bugfixes and add switching request logger

* add filePattern configs for FileRequestLogger

* address review comments, adjust sql request log format

* fix inspection error

* try SuppressWarnings("RedundantThrows") to fix inspection error on ComposingRequestLoggerProvider
2019-01-15 23:12:59 -08:00
Jonathan Wei 9a8bade2fb Update approximate aggregators docs (#6848) 2019-01-11 21:50:51 -08:00
Furkan KAMACI 55927bf8e3 Kafka version is updated (#6835)
Update Kafka version in tutorial from 0.10.2.0 to 0.10.2.2
2019-01-10 17:58:40 -08:00
Jihoon Son c35a39d70b Add support maxRowsPerSegment for auto compaction (#6780)
* Add support maxRowsPerSegment for auto compaction

* fix build

* fix build

* fix teamcity

* add test

* fix test

* address comment
2019-01-10 09:50:14 -08:00
Furkan KAMACI ea973fee6b Tranquility version is updated (#6824) 2019-01-10 09:46:58 +08:00
dongyifeng def823124c add version comparator for StringComparator (#6745)
* add version comparator for StringComparator

* add more test case and docs
2019-01-08 17:17:03 -08:00
Benjamin Hopp ef80c4e036 Update sql.md (#6821)
Corrected defaults for druid.sql.avatica.maxStatementsPerConnection and druid.sql.avatica.maxConnections
2019-01-08 10:15:12 -08:00
Janek Lasocki-Biczysko b88e6304c4 Fix broken link in ingestion/schema-design.md docs (#6810) 2019-01-06 18:20:53 -08:00
David Glasser c08f391605 statsd-emitter: support constant DogStatsD tags (#6791)
PR #6605 added support to the statsd emitter for DogStatsD tags. This commit
lets you specify "constant tags" in the config file which are included with
every event. This is helpful if you are running in an environment where you
cannot configure your datadog-agent with tags like "cluster name" --- eg, a
Kubernetes cluster with a datadog-agent on each node and different Druid
deployments in different namespaces but sharing the same datadog-agent
daemonset.

Also fix the name of an existing boolean getter to start with 'is'.
2019-01-04 15:35:37 +08:00
thomask 0e04acca43 Show how to include classpath in command (#6802)
Would have saved me some time
2019-01-03 18:31:55 -08:00
Jihoon Son 9ad6a733a5 Add support segmentGranularity for CompactionTask (#6758)
* Add support segmentGranularity

* add doc and fix combination of options

* improve doc
2019-01-03 17:50:45 -08:00
Mingming Qiu 6761663509 make kafka poll timeout configurable (#6773)
* make kafka poll timeout configurable

* add doc

* rename DEFAULT_POLL_TIMEOUT to DEFAULT_POLL_TIMEOUT_MILLIS
2019-01-03 12:16:02 +08:00
Mingming Qiu 114a9fc38f change propertyBase in ServerViewModule (#6774) 2019-01-02 16:44:02 +08:00
Clint Wylie 67f832957b add bloom filter operator to general sql docs (#6785) 2018-12-31 11:30:33 -08:00
Joshua Sun 7c7997e8a1 Add Kinesis Indexing Service to core Druid (#6431)
* created seekablestream classes

* created seekablestreamsupervisor class

* first attempt to integrate kafka indexing service to use SeekableStream

* seekablestream bug fixes

* kafkarecordsupplier

* integrated kafka indexing service with seekablestream

* implemented resume/suspend and refactored some package names

* moved kinesis indexing service into core druid extensions

* merged some changes from kafka supervisor race condition

* integrated kinesis-indexing-service with seekablestream

* unit tests for kinesis-indexing-service

* various bug fixes for kinesis-indexing-service

* refactored kinesisindexingtask

* finished up more kinesis unit tests

* more bug fixes for kinesis-indexing-service

* finished refactoring kinesis unit tests

* removed KinesisParititons and KafkaPartitions to use SeekableStreamPartitions

* kinesis-indexing-service code cleanup and docs

* merge #6291

merge #6337

merge #6383

* added more docs and reordered methods

* fixed kinesis tests after merging master and added docs in seekablestream

* fix various things from pr comment

* improve recordsupplier and add unit tests

* migrated to aws-java-sdk-kinesis

* merge changes from master

* fix pom files and forbiddenapi checks

* checkpoint JavaType bug fix

* fix pom and stuff

* disable checkpointing in kinesis

* fix kinesis sequence number null in closed shard

* merge changes from master

* fixes for kinesis tasks

* capitalized <partitionType, sequenceType>

* removed abstract class loggers

* conform to guava api restrictions

* add docker for travis other modules test

* address comments

* improve RecordSupplier to supply records in batch

* fix strict compile issue

* add test scope for localstack dependency

* kinesis indexing task refactoring

* comments

* github comments

* minor fix

* removed unneeded readme

* fix deserialization bug

* fix various bugs

* KinesisRecordSupplier unable to catch up to earliest position in stream bug fix

* minor changes to kinesis

* implement deaggregate for kinesis

* Merge remote-tracking branch 'upstream/master' into seekablestream

* fix kinesis offset discrepancy with kafka

* kinesis record supplier disable getPosition

* pr comments

* mock for kinesis tests and remove docker dependency for unit tests

* PR comments

* avg lag in kafkasupervisor #6587

* refactored SequenceMetadata in taskRunners

* small fix

* more small fix

* recordsupplier resource leak

* revert .travis.yml formatting

* fix style

* kinesis docs

* doc part2

* more docs

* comments

* comments*2

* revert string replace changes

* comments

* teamcity

* comments part 1

* comments part 2

* comments part 3

* merge #6754

* fix injection binding

* comments

* KinesisRegion refactor

* comments part idk lol

* can't think of a commit msg anymore

* remove possiblyResetDataSourceMetadata() for IncrementalPublishingTaskRunner

* commmmmmmmmmments

* extra error handling in KinesisRecordSupplier getRecords

* comments

* quickfix

* typo

* oof
2018-12-21 12:49:24 -07:00
Gian Merlino 7a09cde4de Broker: Await initialization before finishing startup. (#6742)
* Broker: Await initialization before finishing startup.

In particular, hold off on announcing the service and starting the
HTTP server until the server view and SQL metadata cache are finished
initializing. This closes a window of time where a Broker could return
partial results shortly after startup.

As part of this, some simplification of server-lifecycle service
announcements. This helps ensure that the two different kinds of
announcements we do (legacy and new-style) stay in sync.

* Remove unused imports.

* Fix NPE in ServerRunnable.
2018-12-18 20:32:31 -08:00
Jihoon Son 2c380e3a26 Fix doc for automatic compaction (#6749) 2018-12-17 11:44:33 -08:00
Jonathan Wei c713116a75 Use @Coordinator leader client in CoordinatorRuleManager (#6729) 2018-12-16 15:18:09 -08:00
Gian Merlino 04e7c7fbdc FilteredRequestLogger: Fix start/stop, invalid delegate behavior. (#6637)
* FilteredRequestLogger: Fix start/stop, invalid delegate behavior.

Fixes two bugs:

1) FilteredRequestLogger did not start/stop the delegate.

2) FilteredRequestLogger would ignore an invalid delegate type, and
instead silently substitute the "noop" logger. This was due to a larger
problem with RequestLoggerProvider setup in general; the fix here is
to remove "defaultImpl" from the RequestLoggerProvider interface, and
instead have JsonConfigurator be responsible for creating the
default implementations. It is stricter about things than the old system
was, and is only willing to make a noop logger if it doesn't see any
request logger configs. Otherwise, it'll raise a provision error.

* Remove unneeded annotations.
2018-12-14 16:55:44 +08:00
Clint Wylie 4ec068642d move parquet extension input formats up a level to `org.apache.druid.data.input.parquet.DruidParquetInputFormat` for `parquet` and `org.apache.druid.data.input.parquet.DruidParquetAvroInputFormat` for `parquet-avro` (#6727) 2018-12-13 16:33:42 -08:00
David Lim f7bbee2e65 Front Matter header needs to be on the first line for md to be rendered properly by jekyll (#6733) 2018-12-13 11:47:20 -08:00
Vadim Ogievetsky da4836f38c Added titles and harmonized docs to improve usability and SEO (#6731)
* added titles and harmonized docs

* manually fixed some titles
2018-12-12 20:42:12 -08:00
Clint Wylie 55914687bb Fix broken link in docs toc (#6728)
Change 'peon.html' to the correct link, 'peons.html'. No redirect is needed because the file has always been 'peons', just an incorrect link was introduced in the toc here https://github.com/apache/incubator-druid/pull/6259/files#diff-45297643736c5fb6da0e92f2c3df5d68R89
2018-12-12 15:14:38 -08:00
Vincent Newkirk cc44a4a28f Correct Documentation for lowerStrict/upperStrict (#6707)
The documentation for Bound filter's lowerStrict/upperStrict is incorrect. It is not consistent with the examples provided and actual behaviour of the bound filter. Correct this.
2018-12-06 10:14:50 -08:00
Mingming Qiu 607339003b Add TaskCountStatsMonitor to monitor task count stats (#6657)
* Add TaskCountStatsMonitor to monitor task count stats

* address comments

* add file header

* tweak test
2018-12-04 13:37:17 -08:00
Clint Wylie a1c9d0add2 autosize processing buffers based on direct memory sizing by default (#6588)
* autosize processing buffers based on direct memory sizing

* remove oops, more test

* max 1gb autosize buffers, test, start of docs

* fix oops

* revert accidental change

* print buffer size in exception

* change the things
2018-12-03 18:40:02 -07:00
David Lim e2bedab665 fix links to use relative references (#6696) 2018-11-30 16:32:10 -08:00
David Lim b332021c49 remove extensions from default configs that have configuration/library dependencies and update docs (#6694) 2018-11-30 12:52:46 -08:00
rcgarcia74 9bf835b84f remove #658 doc reference for Schema-less design (#6693) 2018-11-30 12:53:57 -07:00
Jihoon Son d6539abd0a Fix overlord api and console (#6686)
* Fix overlord APIs and console

* remove getRunningTasksByDataSource

* add missing path to isApplicable
2018-11-29 23:45:28 -08:00
Mingming Qiu c5405bb592 emit maxLag/avgLag in KafkaSupervisor (#6587)
* emit maxLag/totalLag/avgLag in KafkaSupervisor

* modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility
2018-11-28 02:11:14 -08:00
Mingming Qiu 849ba867b2 fix missing property in JsonTypeInfo of SegmentWriteOutMediumFactory (#6656) 2018-11-27 15:59:58 -08:00
Clint Wylie efdec50847 bloom filter sql (#6502)
* bloom filter sql support

* docs

* style fix

* style fixes after rebase

* use copied/patched bloomkfilter

* remove context literal lookup function, changes from review

* fix build

* rename LookupOperatorConversion to QueryLookupOperatorConversion

* remove doc

* revert unintended change

* add internal exception to bloom filter deserialization exception
2018-11-27 14:11:18 +08:00
Evans Hauser 03df481c9c Docs: Fix wikipedia links in Ingestion:Rollup (#6659)
The rendered site doesn't have automatic link detection, so we need to add these links in explicitly. This also fixes the Measure link, which included an extra `)`

http://druid.io/docs/latest/ingestion/index.html#rollup
2018-11-23 16:28:05 -08:00
seoeun 22a5bf97a2 Fix issue that tasks tables in metadata storage are not cleared (#6592)
* tasks tables in metadata storage are not cleared

* address comments. remove tasklogs and revert obsolete changes

* address comments. change comment and update doc.

* address comments. update doc more detailed

* address comments. remove redundant log and update doc more detailed.

* address comments. update document
2018-11-22 11:50:31 +08:00
Jonathan Wei e285b1103d Use PasswordProvider for basic HTTP escalator (#6650) 2018-11-21 07:34:15 -08:00
Caroline1000 a438a9b99c fix typo in config page of docs (#6645) 2018-11-19 16:32:58 -08:00
Deiwin Sarjas e0d1dc5846 Support DogStatsD style tags in statsd-emitter (#6605)
* Replace StatsD client library

The [Datadog package][1] is a StatsD compatible drop-in replacement for the
client library, but it seems to be [better maintained][2] and has support for
Datadog DogStatsD specific features, which will be made use of in a subsequent
commit.

The `count`, `time`, and `gauge` methods are actually exactly compatible with
the previous library and the modifications shouldn't be required, but EasyMock
seems to have a hard time dealing with the variable arguments added by the
DogStatsD library and causes tests to fail if no arguments are provided for the
last String vararg. Passing an empty array fixes the test failures.

[1]: https://github.com/DataDog/java-dogstatsd-client
[2]: https://github.com/tim-group/java-statsd-client/issues/37#issuecomment-248698856

* Retain dimension key information for StatsD metrics

This doesn't change behavior, but allows separating dimensions from the metric
name in subsequent commits.

There is a possible order change for values from
`dimsBuilder.build().values()`, but from the tests it looks like it doesn't
affect actual behavior and the order of user dimensions is also retained.

* Support DogStatsD style tags in statsd-emitter

Datadog [doesn't support name-encoded dimensions and uses a concept of _tags_
instead.][1] This change allows Datadog users to send the metrics without
having to encode the various dimensions in the metric names. This enables
building graphs and monitors with and without aggregation across various
dimensions from the same data.

As tests in this commit verify, the behavior remains the same for users who
don't enable the `druid.emitter.statsd.dogstatsd` configuration flag.

[1]: https://www.datadoghq.com/blog/the-power-of-tagged-metrics/#tags-decouple-collection-and-reporting

* Disable convertRange behavior for DogStatsD users

DogStatsD, unlike regular StatsD, supports floating-point values, so this
behavior is unnecessary. It would be possible to still support `convertRange`,
even with `dogstatsd` enabled, but that would mean that people using the
default mapping would have some of the gauges unnecessarily converted.

`time` is in milliseconds and doesn't support floating-point values.
2018-11-19 09:47:57 -08:00