Commit Graph

10902 Commits

Author SHA1 Message Date
Xavier Léauté 118b50195e
Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730)
Today Kafka message support in streaming indexing tasks is limited to
message values, and does not provide a way to expose Kafka headers,
timestamps, or keys, which may be of interest to more specialized
Druid input formats. For instance, Kafka headers may be used to indicate
payload format/encoding or additional metadata, and timestamps are often
omitted from values in Kafka streams applications, since they are
included in the record.

This change proposes to introduce KafkaRecordEntity as InputEntity,
which would give input formats full access to the underlying Kafka record,
including headers, key, timestamps. It would also open access to low-level
information such as topic, partition, offset if needed.

KafkaEntity is a subclass of ByteEntity for backwards compatibility with
existing input formats, and to avoid introducing unnecessary complexity
for Kinesis indexing tasks.
2021-01-08 16:04:37 -08:00
zhangyue19921010 2837a9b62f
[Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661)
* done

* done

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:23:08 -08:00
zhangyue19921010 d5192640cb
remove extra comma (#10670)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:15:08 -08:00
Yi Yuan 3624acbcf8
fix web-console show json bug (#10710)
* fix web-console show json bug

* replace all JSON.stringify

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-01-08 14:55:55 -08:00
秦臻 c62b7c19c3
javascript filter result convert to java boolean (#10721)
* javascript filter result convert to java boolean

* use type convert replace script convert, and add more unit test

Co-authored-by: qinzhen <qinzhen@kuaishou.com>
2021-01-08 14:30:09 -08:00
Abhishek Agarwal f66fdbfa5d
add offsetFetchPeriod to kinesis ingestion doc (#10734) 2021-01-08 14:19:26 -08:00
Gian Merlino 6eef0e4c9f
Fix collision between #10689 and #10593. (#10738) 2021-01-08 09:52:27 -08:00
Aleksey Plekhanov 26bcd47e51
Thread-safety for ResponseContext.REGISTERED_KEYS (#9667) 2021-01-08 00:37:49 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
kaijianding 01e25f1e69
reuse DataSegment object when a segment found on another server (#10715) 2021-01-07 21:55:25 -08:00
Jonathan Wei c7f2d3fbb5
Update deps for CVE-2020-28168 and CVE-2020-28052 (#10733)
* Update deps for CVE-2020-28168 and CVE-2020-28052

* Make BC runtime scope
2021-01-07 20:31:44 -08:00
Makdon 1905b80ec3
Update badge for travis in README.md (#10717)
* Update README for updating travis badge

Update README cause
> This repository was migrated and is now building on travis-ci.com

* Update README.md
2021-01-07 18:39:58 -08:00
Himanshu c7b1212a43
AWS RDS token based password provider (#9518)
* refresh db pwd

* aws iam token password provider

* fix analyze-dependencies build

* fix doc build

* add  ut for BasicDataSourceExt

* more doc updates

* more  doc update

* moving aws  token password  provider to new extension

* remove duplicate changes

* make  all config inline

* extension docs

* refresh db  password  in SQL Firehose code path as well

* add ut

* fix build

* add new extension to distribution

* rds lib is not provided

* fix license build

* add version to license

* change parent version to 0.19.0-snapshot

* address review comments

* fix core/ code coverage

* Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* address review comments

* fix spellchecker

* remove inadvertant website file change

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-01-06 21:15:29 -08:00
Gian Merlino 48e576a307
Scan query: More accurate error message when segment per time chunk limit is exceeded. (#10630)
* Scan query: More accurate error message when segment per time chunk limit is exceeded.

* Add guardrail test.
2021-01-06 14:11:28 -08:00
Makdon f9fc1892d1
Typo: missing comma in json (#10711) 2021-01-06 13:49:50 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Himanshu d2e6240cac
k8s-int-test-build: zk-less druid cluster and http based segment/task managment (#10686)
* zk-less druid cluster in k8s build

* attempt to fix build and use http based remote task management

* mm/router logs for debugging

* add default account k8s role and binding for pod, configMap access

* fix issue

* change router port to 8088 for common readinessProbe

* break build_run_k8s_cluster.sh into separate scripts

* revert changes to K8sDruidNodeAnnouncer.java

* k8s extension doc update

* add license to new file

* address review comments

* do not try to load lookups at startup to improve cluster startup time
2021-01-05 18:51:47 -08:00
Jihoon Son ea2d51d61f
Better error message for compaction task when it sees no segments for compaction (#10728) 2021-01-05 16:49:57 -08:00
Jonathan Wei 769c21cc87
Add sample method to IndexingServiceClient (#10729)
* Add sample method to IndexingServiceClient

* Add unit test

* Fix LGTM
2021-01-05 15:02:44 -08:00
Franklyn Dsouza 045b29fa95
Correctly handle null values in time column results (#10642)
* handle null case

* test this case

* test sql resource

* fix style
2021-01-04 22:22:46 -08:00
Lucas Capistrant 26b911a384
Make some additions to IT suite to make Hadoop related testing more understandable (#10667)
* Make some additions to IT suite to make Hadoop related testing more understandable

* add start.hadoop.docker to mvn arg tips in doc

* fix issues preventing ITIndexHadoopTest from running in local mode
2020-12-28 12:25:47 -06:00
Clint Wylie edfbdbfc97
fix NPE when calling TaskLocation.hashCode with null host (#10708) 2020-12-24 15:30:54 -08:00
Clint Wylie 74fbdd322d
refactor NodeRole so extensions can participate in disco and announcement (#10700)
* refactor NodeRole so extensions can participate in disco and announcement

* fixes, maybe

* retries

* javadoc

* fix

* spelling
2020-12-24 15:29:32 -08:00
Charles Smith 797371598d
update syntax for golbal cached uri lookups (#10629) 2020-12-24 09:49:01 -08:00
Xavier Léauté b7a16d08a6
Update Apache Kafka to 2.7.0 (#10701)
- align scala versions to match Kafka
2020-12-22 13:56:00 -08:00
Lucas Capistrant 58ce2e55d8
Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284)
* dynamic coord config adding more balancing control

add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.

* fix checkstyle failure

* Make doc more detailed for admin to understand when/why to use new config

* refactor PR to use a % of segments instead of raw number

* update the docs

* remove bad doc line

* fix typo in name of new dynamic config

* update RservoirSegmentSampler to gracefully deal with values > 100%

* add handler for <= 0 in ReservoirSegmentSampler

* fixup CoordinatorDynamicConfigTest naming and argument ordering

* fix items in docs after spellcheck flags

* Fix lgtm flag on missing space in string literal

* improve documentation for new config

* Add default value to config docs and add advice in cluster tuning doc

* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog

* update jest snapshot after console change

* fix spell checker errors

* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider

* add new config back to web console module after merge with master

* fix ReservoirSegmentSamplerTest

* fix line breaks in coordinator console dialog

* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove

* Make improvements based off of feedback in review

* additional cleanup coming from review

* Add a warning log if limit on segments to consider for move can't be calcluated

* remove unused import

* fix tests for CoordinatorDynamicConfig

* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
2020-12-22 08:27:55 -08:00
Maytas Monsereenusorn 5bd7924296
Fix kinesis integration test (#10696)
* fix kinesis IT

* fix checkstyle
2020-12-21 12:57:40 -08:00
Clint Wylie 92e5700e1e
fix integration test override config which requires environment variables before calling compose (#10694) 2020-12-18 17:57:07 -08:00
keefe roedersheimer ca3b925133
allow server selection to be aware of query (#10428)
* add query through to server selector

* add nullable extensions, deprecate old methods with defaults

* style changes

* add nullable to ServerSelectorStrategy

* fix test coverage

* missing override in test

* add null check
2020-12-18 13:56:19 -08:00
Maytas Monsereenusorn 6f2ce8f0a5
fix Kinesis It (#10692) 2020-12-18 13:47:00 -08:00
Gian Merlino 57ee8ce4e7
CompressionUtils: Read the entire stream when unzipping from a stream. (#10664)
* CompressionUtils: Read the entire stream when unzipping from a stream.

Should fix #6905 by making sure we avoid closing partially-read streams.

* CHECKSTYLE!
2020-12-17 22:52:04 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix thing that must not matter since was already wrong ip, log when test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00
Abhishek Agarwal 796c25532e
Fix post-aggregator computation when used with subtotals (#10653)
* Fix post-aggregator computation

* remove commented code

* Fix numeric null handling

* Add test when subquery returns null long
2020-12-17 20:10:26 -08:00
sthetland 6ae8059c09
cleaning up and fixing links (#10528)
* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
2020-12-17 13:37:43 -08:00
zhangyue19921010 1884c35698
Do Integrate test for Druid base on K8s cluster (#10669)
* add a travls job to do integrate test on K8s

* revert build_run_cluster.sh

* revert msic

* run IT test

* ready to test

* modify before/after script

* done

* change mod for script

* done

* add env DRUID_OPERATOR_VERSION=0.0.3

* change version

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-16 16:00:42 -08:00
Vadim Ogievetsky 55b8cc428a
remove extra word (#10682) 2020-12-15 23:02:33 -08:00
Abhishek Agarwal 7a8e9bb156
Fix hadoop docker copy script (#10671) 2020-12-14 23:08:50 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior.

Also added example config files for indexer process that can be used to spin up the indexer process.
2020-12-14 19:04:14 -08:00
zhangyue19921010 0ad27c06da
Historical load Segments enhancement (#10650)
* load segments with segment files check

* add more java docs

* done

* add java docs

* revert misc

* resolve ci failures

* resolve ci failures

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2020-12-14 13:56:03 -08:00
Clint Wylie 64f97e7003
fix DruidSchema incorrectly listing tables with no segments (#10660)
* fix race condition with DruidSchema tables and dataSourcesNeedingRebuild

* rework to see if it passes analysis

* more better

* maybe this

* re-arrange and comments
2020-12-11 14:14:00 -08:00
Gian Merlino 753fa6b3bd
IdUtils: Forbid characters that cannot be used in znodes. (#10659)
* IdUtils: Forbid characters that cannot be used in znodes.

* Fix whitespace.
2020-12-10 10:49:40 -08:00
Himanshu be019760bb
document DynamicConfigProvider for kafka consumer properties (#10658)
* document DynamicConfigProvider for kafka consumer properties

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

* fix doc build

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-12-10 08:24:33 -08:00
Jihoon Son abcf624a2e
Bump up jackson-databind to 2.10.5.1 (#10655)
* Bump up jackson version to 2.10.5.1

* only jackson-databind

* license
2020-12-09 13:54:47 -08:00
Vadim Ogievetsky 577cd66002
Web console: reflect the changes to interval requirement in the data loader flow (#10647)
* no need for intervals

* don't set redundant fields

* fix tests

* better filter control

* work with not

* wrap callout with form group

* update snapshot

* add split hint

* highlight issues with spec

* fixes

* fix default value

* move intervals back to partition step

* work with all sorts of chars

* fix enabled view
2020-12-09 10:18:42 -08:00
Harini Rajendran 9d2df506ea
Return appropriate config directory path for indexer process. (#10657)
Currently `getConfPath` function returns an empty string for `indexer` service. So the config directory path for this service is not set properly when installed using docker environment.
2020-12-09 09:30:36 -08:00
Atul Mohan 44df05b8b2
Clarify split hint spec behavior (#10656) 2020-12-09 08:24:32 -06:00
Abhishek Agarwal 4ea1ab8531
Fix links in the grouping function doc (#10654) 2020-12-09 14:56:32 +08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
Vyatcheslav Mogilevsky 5324785eac
integration tests fix: update base image for hadoop containers to centos 7 (#10638)
LGTM
2020-12-08 11:00:51 -08:00