
10718 Commits

Author SHA1 Message Date
zhangyue19921010 1884c35698
Do Integrate test for Druid base on K8s cluster (#10669)
* add a travis job to do integration test on K8s

* revert build_run_cluster.sh

* revert misc

* run IT test

* ready to test

* modify before/after script

* done

* change mod for script

* done

* add env DRUID_OPERATOR_VERSION=0.0.3

* change version

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-16 16:00:42 -08:00
Vadim Ogievetsky 55b8cc428a
remove extra word (#10682) 2020-12-15 23:02:33 -08:00
Abhishek Agarwal 7a8e9bb156
Fix hadoop docker copy script (#10671) 2020-12-14 23:08:50 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior.

Also added example config files for indexer process that can be used to spin up the indexer process.
2020-12-14 19:04:14 -08:00
zhangyue19921010 0ad27c06da
Historical load Segments enhancement (#10650)
* load segments with segment files check

* add more java docs

* done

* add java docs

* revert misc

* resolve ci failures

* resolve ci failures

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-14 13:56:03 -08:00
Clint Wylie 64f97e7003
fix DruidSchema incorrectly listing tables with no segments (#10660)
* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild

* rework to see if it passes analysis

* more better

* maybe this

* re-arrange and comments
2020-12-11 14:14:00 -08:00
Gian Merlino 753fa6b3bd
IdUtils: Forbid characters that cannot be used in znodes. (#10659)
* IdUtils: Forbid characters that cannot be used in znodes.

* Fix whitespace.
2020-12-10 10:49:40 -08:00
Himanshu be019760bb
document DynamicConfigProvider for kafka consumer properties (#10658)
* document DynamicConfigProvider for kafka consumer properties

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

* fix doc build

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-12-10 08:24:33 -08:00
Jihoon Son abcf624a2e
Bump up jackson-databind to 2.10.5.1 (#10655)
* Bump up jackson version to 2.10.5.1

* only jackson-databind

* license
2020-12-09 13:54:47 -08:00
Vadim Ogievetsky 577cd66002
Web console: reflect the changes to interval requirement in the data loader flow (#10647)
* no need for intervals

* don't set redundant fields

* fix tests

* better filter control

* work with not

* wrap callout with form group

* update snapshot

* add split hint

* highlight issues with spec

* fixes

* fix default value

* move intervals back to partition step

* work with all sorts of chars

* fix enabled view
2020-12-09 10:18:42 -08:00
Harini Rajendran 9d2df506ea
Return appropriate config directory path for indexer process. (#10657)
Currently `getConfPath` function returns an empty string for `indexer` service. So the config directory path for this service is not set properly when installed using docker environment.
2020-12-09 09:30:36 -08:00
Atul Mohan 44df05b8b2
Clarify split hint spec behavior (#10656) 2020-12-09 08:24:32 -06:00
Abhishek Agarwal 4ea1ab8531
Fix links in the grouping function doc (#10654) 2020-12-09 14:56:32 +08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
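The second bug fix in #10631 (batch tasks that assumed "isReady" runs before "run") can be illustrated with a minimal Java sketch. All names below (BatchTaskSketch, lockHelper, runTask, EchoTask) are hypothetical and are not taken from the Druid source. The sketch only shows one way a base class could guarantee the initialization order, not how AbstractBatchIndexTask actually does it.

```java
public class ReadyBeforeRunSketch {
    /** Hypothetical base class; this is NOT Druid's AbstractBatchIndexTask. */
    public abstract static class BatchTaskSketch {
        // Stand-in for the lazily initialized "taskLockHelper" from the commit message.
        private Object lockHelper;

        /** Initializes the lock helper as a side effect, then reports readiness. */
        public boolean isReady() {
            if (lockHelper == null) {
                lockHelper = new Object();
            }
            return true;
        }

        /** Entry point: makes sure isReady() has run even if the caller skipped it. */
        public final String run() {
            if (lockHelper == null && !isReady()) {
                throw new IllegalStateException("task is not ready");
            }
            return runTask();
        }

        public boolean lockHelperInitialized() {
            return lockHelper != null;
        }

        /** Subclasses put their work here and may rely on the lock helper being set. */
        protected abstract String runTask();
    }

    /** Trivial task used to exercise the sketch. */
    public static final class EchoTask extends BatchTaskSketch {
        @Override
        protected String runTask() {
            return lockHelperInitialized() ? "ran with lock helper" : "ran without lock helper";
        }
    }

    public static void main(String[] args) {
        // run() is called directly, without a prior isReady() call.
        System.out.println(new EchoTask().run());
    }
}
```

Making run() final and routing subclasses through runTask() means a caller cannot bypass the readiness check, which is the property the commit message describes.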
Vyatcheslav Mogilevsky 5324785eac
integration tests fix: update base image for hadoop containers to centos 7 (#10638)
LGTM
2020-12-08 11:00:51 -08:00
frank chen c410648630
fix injection failure of StorageLocationSelectorStrategy objects (#10363)
* fix to allow custom storage location selector strategy

* add test cases to check instance of selector strategy

* update doc

* code format

* resolve code review comments

* inject StorageLocation

* fix CI

* fix mismatched license item reported by CI

* change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy

* using a helper method to bind to correct property path
2020-12-08 09:48:31 -08:00
Vadim Ogievetsky e3f7217546
Web console: Improve the handling of extreme data (funky datasources, longs) (#10641)
* better API escape

* fix escaping issue, bigints

* update licenses

* fix align

* do not show Query with SQL if no SQL

* add prettify script

* update dev readme

* add ordering to the datasource list

* add ordering to supervisor table
2020-12-08 09:25:14 -08:00
Gian Merlino 9acab0b646
DruidInputSource: Sort segments by ID before grouping into splits. (#10646)
This is useful because it groups up segments for the same time chunk
into the same splits, which in turn is useful because it minimizes the
number of time chunks that each task will have to deal with.
2020-12-07 13:48:24 -08:00
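The effect described in #10646 can be sketched in Java as follows. This is an illustration under stated assumptions, not the DruidInputSource code: segments are reduced to plain ID strings in a made-up `dataSource_day_version_partition` form, and splits are formed by a fixed segment count (`maxSegmentsPerSplit`, a hypothetical parameter).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SegmentSplitSketch {
    /**
     * Sorts the segment IDs, then cuts the sorted list into consecutive splits.
     * Because IDs for the same time chunk share a prefix, sorting places them
     * next to each other, so each split touches as few time chunks as possible.
     */
    public static List<List<String>> toSplits(List<String> segmentIds, int maxSegmentsPerSplit) {
        if (maxSegmentsPerSplit <= 0) {
            throw new IllegalArgumentException("maxSegmentsPerSplit must be positive");
        }
        List<String> sorted = new ArrayList<>(segmentIds);
        Collections.sort(sorted);

        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += maxSegmentsPerSplit) {
            int end = Math.min(i + maxSegmentsPerSplit, sorted.size());
            splits.add(new ArrayList<>(sorted.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Hypothetical IDs: two time chunks with two partitions each, in arbitrary order.
        List<String> ids = new ArrayList<>();
        ids.add("wiki_2020-01-02_v1_0");
        ids.add("wiki_2020-01-01_v1_0");
        ids.add("wiki_2020-01-02_v1_1");
        ids.add("wiki_2020-01-01_v1_1");
        System.out.println(toSplits(ids, 2));
    }
}
```

Without the sort, the same input cut into pairs would give two splits that each span both days. With the sort, each split covers a single day.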
Abhishek Agarwal 26d74b3580
Add grouping_id function (#10518)
* First draft of grouping_id function

* Add more tests and documentation

* Add calcite tests

* Fix travis failures

* bit of a change

* Add documentation

* Fix typos

* typo fix
2020-12-07 11:46:29 -08:00
Gian Merlino b681861f05
Speed up integration tests in two ways. (#10648)
1) Accelerate coordinator runs to speed up segment load after publishing.

2) For streaming ingestion tests, instead of waiting 3 minutes for data to
   load, wait until the expected number of rows is loaded.

Also updates segment-count check in ITCompactionTaskTest to eliminate a
race condition (it was looking for 6 segments, which only exist together
briefly, until the older 4 are marked unused).
2020-12-07 10:59:29 -08:00
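The second speed-up in #10648 replaces a fixed wait with a poll on the row count. A minimal Java sketch of that pattern follows. The method name `waitForRows` and its parameters are hypothetical, and the row-count supplier stands in for whatever query the integration tests actually issue.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class WaitForRowsSketch {
    /**
     * Polls rowCount until it reaches expectedRows or the timeout elapses.
     * Returns true as soon as enough rows are visible, false on timeout or interrupt.
     */
    public static boolean waitForRows(LongSupplier rowCount, long expectedRows,
                                      long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (rowCount.getAsLong() >= expectedRows) {
                return true;
            }
            // Subtraction keeps the comparison correct even if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated ingestion: every poll sees 10 more rows than the previous one.
        AtomicLong loaded = new AtomicLong();
        boolean ok = waitForRows(() -> loaded.addAndGet(10), 30, 1_000, 1);
        System.out.println(ok ? "rows loaded" : "timed out");
    }
}
```

The test finishes as soon as the data is available, and the timeout still bounds the wait when ingestion is stuck.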
egor-ryashin f46cc4faaf Revert "fixed input source sampler buildReader exp"
This reverts commit e688db8
2020-12-07 18:34:59 +03:00
egor-ryashin e688db8503 fixed input source sampler buildReader exp 2020-12-07 18:28:25 +03:00
Gian Merlino b7641f644c
Two fixes related to encoding of % symbols. (#10645)
* Two fixes related to encoding of % symbols.

1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments()
   returns already-decoded strings. Applying StringUtils.urlDecode on
   top of that causes erroneous behavior with '%' characters.

2) Update various ThreadFactoryBuilder name formats to escape '%'
   characters. This fixes situations where substrings starting with '%'
   are erroneously treated as format specifiers.

ITs are updated to include a '%' in extra.datasource.name.suffix.

* Avoid String.replace.

* Work around surefire bug.

* Fix xml encoding.

* Another try at the proper encoding.

* Give up on the emojis.

* Less ambitious testing.

* Fix an additional problem.

* Adjust encodeForFormat to return null if the input is null.
2020-12-06 22:35:11 -08:00
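Both '%' problems fixed in #10645 can be reproduced with the standard library alone. The Java sketch below is illustrative: `encodeForFormat` here is a hypothetical re-implementation written for this example (the commit names a method of that name but its body is not shown), and the task id `task%41` is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PercentSketch {
    /**
     * Escapes '%' as "%%" so the result can be embedded in a format string.
     * Returns null for null input, matching the last bullet of the commit message.
     */
    public static String encodeForFormat(String s) {
        if (s == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%') {
                sb.append("%%");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** URL-decodes a string exactly once. */
    public static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // 1) A name containing '%' used as part of a format pattern.
        String pattern = encodeForFormat("task-100%-done") + "-%d";
        System.out.println(String.format(pattern, 3));

        // 2) A task id "task%41" arrives URL-encoded as "task%2541".
        String once = decodeOnce("task%2541");
        String twice = decodeOnce(once);
        System.out.println(once + " vs " + twice);
    }
}
```

Decoding once recovers the original id, while decoding the already-decoded value a second time turns `%41` into `A` and yields a different id.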
Gian Merlino 17f39ab91e
Fix misspellings in druid-forbidden-apis. (#10634)
These caused certain APIs to not actually be properly forbidden.

Also removed two MoreExecutors entries for methods that don't exist in
our version of Guava.
2020-12-05 15:26:57 -08:00
Maytas Monsereenusorn 7eb5f59a9a
Fix string byte calculation in StringDimensionIndexer (#10623)
* fix string byte calculation

* fix tests

* fix test
2020-12-04 00:51:48 -08:00
Liran Funaro 52d46cebc3
Move common configurations to TuningConfig (#10478)
* Move common methods that are used in HadoopTuningConfig and in AppenderatorConfig to TuningConfig
* Rename rowFlushBoundary in HadoopTuningConfig to maxRowsInMemory to match TuningConfig API
2020-12-03 18:13:32 -08:00
zhangyue19921010 229b5f359f
Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better (#10551)
* remove built-in kafka consumer config

* modify druid docs of kafka indexing service

* yuezhang

* modify doc

* modify docs

* fix kafkaindexTaskTest.java

* revert unnecessary change

* add more logs and modify docs

* revert jdk version

* modify docs

* modify-kafka-version v2

* modify docs

* modify docs

* modify docs

* modify docs

* modify docs

* done

* remove useless import

* change code and add UT

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-03 17:37:59 -08:00
Jihoon Son ae6c43de71
Add an integration test for HTTP inputSource (#10620) 2020-12-03 15:51:56 -08:00
Valdemar 2cd017b7aa
Fix the config initialization on container restart (#10458) 2020-12-03 12:03:00 -08:00
Himanshu 813e18774e
make dimension column extensible with COMPLEX type (#10277)
* make dimension column extensible with COMPLEX type

* more changes

Change-Id: I9707dd644b8d71030b74a8c1d6fff0c0020d960d

* processing module changes for build fix

Change-Id: I146f95a41b79d20edb1721be13f0e9641f788e0e

* rename ColumnCapabilities.getTypeName() to getComplexTypeName()

* rename ColumnBuilder.setTypeName(..) -> ColumnBuilder.setComplexTypeName(..)
2020-12-03 08:58:17 -08:00
Suneet Saldanha c94be8a945
Revert "Update google client libraries (#10536)" (#10599)
This reverts commit 4537016cad.
2020-12-03 20:14:52 +05:30
Himanshu 7e9522870f
introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309)
* introduce DynamicConfigProvider interface and make kafka consumer props extensible

* fix intellij inspection error

* make DynamicConfigProvider generic

Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3

* deprecate PasswordProvider
2020-12-02 16:38:27 -08:00
Atul Mohan f965464f36
Fix empty directory handling (#10319)
Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-12-02 10:37:08 -08:00
Lucas Capistrant 2e02eebd9d
Add context dimension to DefaultQueryMetrics (#10578)
* Add context dimension to DefaultQueryMetrics

* remove redundant addition of context dimension from DruidMetrics now that QueryMetrics adds it by default

* update SearchQueryMetrics to reflect the same pattern as other default dimensions in QueryMetrics

* add PublicApi annotation for context in QueryMetrics Interface
2020-12-01 18:34:03 -08:00
zhangyue19921010 e7e07eab11
[Improve Doc] : Modify the disadvantages of the lazyLoadOnStart feature. (#10608)
* modify docs

* modify docs

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-01 18:33:22 -08:00
frank chen 24f1e35b5d
fix desc of 'required' for granularity property (#10616) 2020-12-01 18:29:51 -08:00
Vadim Ogievetsky 5b06c7a3a9
Web console: improve how code is imported, use API instance (#10597)
* fix imports

* clean up imports

* update DQT to fix escaping
2020-12-01 13:16:14 -08:00
Jihoon Son d47d6cf081
Add time-to-first-result benchmark for groupBy (#10612) 2020-12-01 10:32:37 -08:00
Lucas Capistrant 2560bf0a19
Add new coordinator metrics for coordinator duty runtimes (#10603)
* Add new coordinator metrics for duty runtimes

* fix spelling for a constant variable value

* add comment clarifying why the global runtime metric is emitted where it is

* Remove duty alias in lieu of using the class name for metrics

* fix docs

* CoordinatorStats tests + add duty stats to accumulate() logic
2020-11-29 14:47:35 -08:00
Himanshu 30bcb0fd74
DataSourcesSnapshotBenchmark to measure iterateAllUsedSegmentsInSnapshot perf (#10604) 2020-11-29 14:42:14 -08:00
Jihoon Son 7462b0b953
Allow missing intervals for Parallel task with hash/range partitioning (#10592)
* Allow missing intervals for Parallel task

* fix row filter

* fix tests

* fix log
2020-11-25 14:50:22 -08:00
Ayush Kulshrestha d0c2ede50c
Added CronScheduler support as a proof to clock drift while emitting metrics (#10448)
Co-authored-by: Ayush Kulshrestha <ayush.kulshrestha@miqdigital.com>
2020-11-25 12:31:38 +01:00
frank chen fe693a4f01
Improve doc and exception message for invalid user configurations (#10598)
* improve doc and exception message

* add spelling check rules and remove unused import

* add a test to improve test coverage
2020-11-23 15:03:13 -08:00
zhangyue19921010 31740b3b29
Fix : Druid throws java.util.concurrent.RejectedExecutionException when ingest task is stopping. (#10555)
* check exec status before return Signal

* add more log

* change log level to debug and add UT

* change log level to warn and merge master

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-11-23 14:52:03 -08:00
Nishant Bangarwa 4537016cad
Update google client libraries (#10536)
modify license.yaml

Update google oauth client version
2020-11-20 15:23:30 -08:00
Atul Mohan 111b431c07
Introduce query/timeout/count metric (#10567)
* Add timeout metric

* Add tests
2020-11-20 15:17:26 -08:00
David Palmer 3cafd531de
fix issue causing incorrect config in Docker (#10595)
Previously, when the Docker entrypoint script generated the config
files, it would append the configuration without including a newline.
This could result in incorrect configuration. This has been fixed by
always appending a newline before any configuration.

Co-authored-by: Bryson Chen <brysonjackychen@gmail.com>
2020-11-20 14:52:38 -08:00
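The failure mode fixed in #10595 is easy to reproduce: appending to a file whose last line lacks a trailing newline glues two settings together. The actual fix lives in the Docker entrypoint shell script. The Java sketch below only illustrates the same idea, and the class, method names, and property keys (`a.key`, `b.key`) are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class AppendConfigSketch {
    /** Appends one config entry, always writing a newline before it. */
    public static void appendEntry(Path file, String entry) {
        try {
            Files.write(file, ("\n" + entry).getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes initialContent to a temp file, appends entry, and returns the resulting lines. */
    public static List<String> roundTrip(String initialContent, String entry) {
        try {
            Path file = Files.createTempFile("runtime", ".properties");
            try {
                Files.write(file, initialContent.getBytes(StandardCharsets.UTF_8));
                appendEntry(file, entry);
                return Files.readAllLines(file, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The initial content deliberately has no trailing newline.
        System.out.println(roundTrip("a.key=1", "b.key=2"));
    }
}
```

If the file already ends with a newline, this approach leaves a blank line between entries, which a properties-style parser ignores.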
Himanshu 2201ffa2f0
druid-docker-image: add DRUID_DIRS_TO_CREATE variable to customize directories created on startup (#10591)
* druid-docker-image: add DRUID_DIRS_TO_CREATE variable to customize directories created on startup

* address review comment

* remove unintentional change
2020-11-20 14:46:19 -08:00
sthetland ba915b7f56
Security overview documentation (#10339)
* initial file

* initial file

* security overview added

* ldap added

* spacing adjustments

* nits

* security graphics and doc review

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-user-auth.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/operations/security-overview.md

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* updates from review

* review comments

* finish up review and light edits

* broken links

* spell check

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
2020-11-19 15:24:58 -08:00