Commit Graph

2253 Commits

Author SHA1 Message Date
Parag Jain b35486fa81
request logs through kafka emitter (#11036)
* request logs through kafka emitter

* travis fixes

* review comments

* kafka emitter unit test

* new line

* travis checks

* checkstyle fix

* count request lost when request topic is null
2021-04-01 11:31:32 +05:30
Lasse Krogh Mammen 782a1d4e6c
Add Calcite Avatica protobuf handler (#10543) 2021-03-31 12:46:25 -07:00
Tushar Raj 6789ed0a05
Update reset-cluster.md (#10990)
fixed Error: Could not find or load main class org.apache.druid.cli.Main
2021-03-29 20:38:35 -07:00
Charles Smith 8544d29bc7
remove experimental from Kinesis with caveats (#10998)
* remove experimental from Kinesis with caveats

* add suggested known issue

* spelling fixes
2021-03-29 13:57:58 -07:00
Parag Jain 2fdc313e4d
GCS lookup support (#11026)
* GCS lookup support

* checkstyle fix

* review comments

* review comments

* remove unused import
2021-03-30 01:40:41 +05:30
Gian Merlino bf20f9e979
DruidInputSource: Fix issues in column projection, timestamp handling. (#10267)
* DruidInputSource: Fix issues in column projection, timestamp handling.

DruidInputSource, DruidSegmentReader changes:

1) Remove "dimensions" and "metrics". They are not necessary, because we
   can compute which columns we need to read based on what is going to
   be used by the timestamp, transform, dimensions, and metrics.
2) Start using ColumnsFilter (see below) to decide which columns we need
   to read.
3) Actually respect the "timestampSpec". Previously, it was ignored, and
   the timestamp of the returned InputRows was set to the `__time` column
   of the input datasource.

(1) and (2) together fix a bug in which the DruidInputSource would not
properly read columns that are used as inputs to a transformSpec.

(3) fixes a bug where the timestampSpec would be ignored if you attempted
to set the column to something other than `__time`.

(1) and (3) are breaking changes.

Web console changes:

1) Remove "Dimensions" and "Metrics" from the Druid input source.
2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for
   compatibility with the new behavior.

Other changes:

1) Add ColumnsFilter, a new class that allows input readers to determine
   which columns they need to read. Currently, it's only used by the
   DruidInputSource, but it could be used by other columnar input sources
   in the future.
2) Add a ColumnsFilter to InputRowSchema.
3) Remove the metric names from InputRowSchema (they were unused).
4) Add InputRowSchemas.fromDataSchema method that computes the proper
   ColumnsFilter for given timestamp, dimensions, transform, and metrics.
5) Add "getRequiredColumns" method to TransformSpec to support the above.

* Various fixups.

* Uncomment incorrectly commented lines.

* Move TransformSpecTest to the proper module.

* Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting.

* Fix.

* Fix build.

* Checkstyle.

* Misc fixes.

* Fix test.

* Move config.

* Fix imports.

* Fixup.

* Fix ShuffleResourceTest.

* Add import.

* Smarter exclusions.

* Fixes based on tests.

Also, add TIME_COLUMN constant in the web console.

* Adjustments for tests.

* Reorder test data.

* Update docs.

* Update docs to say Druid 0.22.0 instead of 0.21.0.

* Fix test.

* Fix ITAutoCompactionTest.

* Changes from review & from merging.
2021-03-25 10:32:21 -07:00
Charles Smith d69533dbd9
First refactor of compaction (#10935)
* first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc

* fix links, typos, some reorganization

* fix spelling. TBD still there for work in progress

* updates tutorial examples, adds more clarification around compaction use cases

* add granularity spec to automatic compaction config

* final edits

* spelling fixes

* apply suggestions from review

* upadtes from review

* last edits

* move note

* clarify null

* fix links & spelling

* latest review

* edits to auto-compaction config

* add back rollup

* fix links & spelling

* Update compaction.md

add granularityspec to example
2021-03-24 11:41:44 -07:00
Atul Mohan 3d7e7c2c83
Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213)
* Skip queue removal on timeout

* Clarify error

* Add new config to control replication

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2021-03-17 11:34:05 -07:00
Mohammadamin Karbasforushan dfad38d561
Fix unclear documentation of human readable byte (#10825)
* Fix unclear documentation of human readable byte

Follows https://github.com/apache/druid/pull/10203 ;
See https://github.com/apache/druid/pull/10203#issuecomment-771080634 .

* Fix sentence style

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-11 00:01:38 -08:00
benkrug 7f96ca8f5e
Update topnquery.md (#10944)
minor edits of the English, no meanings changed (imo)
2021-03-09 15:19:02 -08:00
Yi Yuan 36e86a2880
Add protobuf schema registry (#10839)
* dd_protobuf_schema_registry

* change licese

* delete some annotation

* nodify tests

* delete extra exception

* add licenses

* add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version

* seperate kafka-protobuf-provider

* modify protobuf.md

* refine protobuf.md

* add config and header

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-03-09 15:15:51 -08:00
Tianxin Zhao a57c28e9ce
prometheus metric exporter (#10412)
* prometheus-emitter

* use existing jetty server to expose prometheus collection endpoint

* unused variables

* better variable names

* removed unused dependencies

* more metric definitions

* reorganize

* use prometheus HTTPServer instead of hooking into Jetty server

* temporary empty help string

* temporary non-empty help.  fix incorrect dimension value in JSON (also updated statsd json)

* added full help text.  added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation

* added documentation for prometheus emitter

* safety for invalid labelNames

* fix travis checks

* Unit test and better sanitization of metrics names and label values

* add precondition to check namespace against regex

* use precompiled regex

* remove static imports. fix metric types

* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()

* Update regex for label-value replacements to allow internal numeric values.  Additional tests

* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName

* fixes version in extensions-contrib/prometheus-emitter

* fix style guide errors

* update import ordering

* add another word to website/.spelling

* remove unthrown declared exception

* remove unused import

* Pushgateway strategy for metrics

* typo

* Format fix and nullable strategy

* Update pom file for prometheus-emitter

* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway

* Syntax fix

* Dimension label regex include numeric character back, fix previous commit

* bump prometheus-emitter pom dev version

* Remove scheduled task inside poen that push metrics

* Fix checkstyle

* Unit test coverage

* Unit test coverage

* Spelling

* Doc fix

* spelling

Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
2021-03-09 14:37:31 -08:00
Abhishek Agarwal c66951a59e
Add flag in SQL to disable left base filter optimization for joins (#10947)
* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
2021-03-09 13:07:34 -08:00
Charles Smith 0f81ce32a0
refactor query caching docs (#10848)
* refactor query caching

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* add description for context link

* accept suggestions

* reword, rework some awkward language

* incorporate feedback, fix errors

* add back perf considerations

* Apply suggestions from code review

applying @suneet-s 's changes

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update caching.md

fix link

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-03-08 22:25:48 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
zhangyue19921010 bddacbb1c3
Dynamic auto scale Kafka-Stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test fiel function

* change codes and add docs based on capistrant reviewed

* midify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there &&  Make autoscaling algorithm configurable and scalable.

* fix ci failed

* revert msic.xml

* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NLP

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when create allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-06 14:36:52 +05:30
Jihoon Son 16acd6686a
Remove stale 'namespace' config for JDBC lookups from doc (#10886)
* Remove stale 'namespace' config for JDBC lookups from doc and web-console

* revert webconsole change

* address comments
2021-03-04 17:16:34 -08:00
Atul Mohan be2ac8d6ce
Document type inference issues with dynamic params in SQL (#10801)
* Clarify docs

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-04 03:48:11 -08:00
spinatelli 99198c02af
Add config and header support for confluent schema registry. (#10314)
* Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096)

* Add Eclipse Public License 2.0 to license check

* Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests

* Add spelling exception and remove unused dependency

* Use non-deprecated getSchemaById() and remove duplicated license entry

* Update docs/ingestion/data-formats.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Added check for schema being null, as per Confluent code

* Missing imports and whitespace

* Updated unit tests with AvroSchema

Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de>
Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-02-27 14:25:35 -08:00
Charles Smith 573de3bc0d
clarify security requirements around HTTPInputSource (#10914)
* clarify security requirements around HTTPInputSource

* explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-02-26 09:37:47 -08:00
zachjsh 67eff4110d
Improve Druid ldap auth documentation (#10915)
* Improve Druid ldap auth documentation

Improved the ldap auth docs by clarifying that the object classes and
attributes noted are specific to Microsoft Active Directory, and could
be different depending on the specific ldap server being used. Also
emphasized the importance of the memberOf field and noted that the
step about adding users to roles is only needed in certain circumstances.

* * add another note

* Apply suggestions from code review

Co-authored-by: sthetland <steve.hetland@imply.io>

* * simplify

* * Address review comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-02-24 15:28:41 -08:00
Clint Wylie f34c6eb3c0
add druid jdbc handler config for minimum number of rows per frame (#10880)
* add druid jdbc handler config for minimum number of rows per frame

* javadocs and docs adjustments

* spelling

* adjust docs per review with minor tweaks

* adjust more
2021-02-23 02:11:04 -08:00
Clint Wylie cbbef80c7f
add SQL operators for bitwise expressions (#10823)
* add SQL operators for bitwise expressions

* more test

* fix spelling

* more tests
2021-02-18 20:56:33 -08:00
sthetland 1e40f51e65
Fix example names of security artifacts in docs (#10882)
* replacing example names

* unrelated typos

* unintended changes

* a few more typo fixes
2021-02-16 14:58:50 -08:00
Jihoon Son 1ec3f0bd73
Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535)" (#10871)
This reverts commit 6b14bdb3a5.
2021-02-09 17:51:26 -08:00
Charles Smith 41a0c1d4b5
reword appendToExisting for clarity. add failure behavior (#10748) 2021-02-08 13:00:01 -08:00
Jihoon Son ac41e41232
Update doc for query errors and add unit tests for JsonParserIterator (#10833)
* Update doc for query errors and add unit tests for JsonParserIterator

* static constructor for convenience

* rename method
2021-02-05 02:55:32 -08:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Clint Wylie 2ce7b3dcf4
bitwise math function expressions (#10605)
* expressions: adding bitwise expressions

* double handling and vectorization

* move conversion to Evals

* revert unintended changes

* less magic, split convert functions, fix parser for funny exponent doubles

* fix spelling exceptions list

* more spelling

* fix grammar, add more test, fix docs

* fix docs

Co-authored-by: Max Kaplan <max@maxkaplan.me>
2021-01-28 11:16:53 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
Himadri Singh 1c1b396eaa
AWS Web Identity / IRSA Support (#10541)
* AWS Web Identity Support

required for AWS IRSA

* Update kinesis-ingestion.md

* disabling coverage tests

https://github.com/apache/druid/pull/10541#issuecomment-737558213

* exclude coverage

* Update licenses.yaml
2021-01-25 18:44:02 +05:30
Charles Smith 99494e3d16
suggest index parallel for native batch reindexing > 1GB (#10788) 2021-01-22 21:54:28 -08:00
zhangyue19921010 bf1d1d583b
modify (#10778)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 09:20:13 -08:00
zhangyue19921010 2837a9b62f
[Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661)
* done

* done

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:23:08 -08:00
Abhishek Agarwal f66fdbfa5d
add offsetFetchPeriod to kinesis ingestion doc (#10734) 2021-01-08 14:19:26 -08:00
Himanshu c7b1212a43
AWS RDS token based password provider (#9518)
* refresh db pwd

* aws iam token password provider

* fix analyze-dependencies build

* fix doc build

* add  ut for BasicDataSourceExt

* more doc updates

* more  doc update

* moving aws  token password  provider to new extension

* remove duplicate changes

* make  all config inline

* extension docs

* refresh db  password  in SQL Firehose code path as well

* add ut

* fix build

* add new extension to distribution

* rds lib is not provided

* fix license build

* add version to license

* change parent version to 0.19.0-snapshot

* address review comments

* fix core/ code coverage

* Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* address review comments

* fix spellchecker

* remove inadvertant website file change

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-01-06 21:15:29 -08:00
Makdon f9fc1892d1
Typo: missing comma in json (#10711) 2021-01-06 13:49:50 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Himanshu d2e6240cac
k8s-int-test-build: zk-less druid cluster and http based segment/task managment (#10686)
* zk-less druid cluster in k8s build

* attempt to fix build and use http based remote task management

* mm/router logs for debugging

* add default account k8s role and binding for pod, configMap access

* fix issue

* change router port to 8088 for common readinessProbe

* break build_run_k8s_cluster.sh into separate scripts

* revert changes to K8sDruidNodeAnnouncer.java

* k8s extension doc update

* add license to new file

* address review comments

* do not try to load lookups at startup to improve cluster startup time
2021-01-05 18:51:47 -08:00
Charles Smith 797371598d
update syntax for golbal cached uri lookups (#10629) 2020-12-24 09:49:01 -08:00
Xavier Léauté b7a16d08a6
Update Apache Kafka to 2.7.0 (#10701)
- align scala versions to match Kafka
2020-12-22 13:56:00 -08:00
Lucas Capistrant 58ce2e55d8
Add dynamic coordinator config that allows control over how many segments are considered when picking a segment to move. (#10284)
* dynamic coord config adding more balancing control

add new dynamic coordinator config, maxSegmentsToConsiderPerMove. This
config caps the number of segments that are iterated over when selecting
a segment to move. The default value combined with current balancing
strategies will still iterate over all provided segments. However,
setting this value to something > 0 will cap the number of segments
visited. This could make sense in cases where a cluster has a very large
number of segments and the admins prefer less iterations vs a thorough
consideration of all segments provided.

* fix checkstyle failure

* Make doc more detailed for admin to understand when/why to use new config

* refactor PR to use a % of segments instead of raw number

* update the docs

* remove bad doc line

* fix typo in name of new dynamic config

* update RservoirSegmentSampler to gracefully deal with values > 100%

* add handler for <= 0 in ReservoirSegmentSampler

* fixup CoordinatorDynamicConfigTest naming and argument ordering

* fix items in docs after spellcheck flags

* Fix lgtm flag on missing space in string literal

* improve documentation for new config

* Add default value to config docs and add advice in cluster tuning doc

* Add percentOfSegmentsToConsiderPerMove to web console coord config dialog

* update jest snapshot after console change

* fix spell checker errors

* Improve debug logging in getRandomSegmentBalancerHolder to cover all bad inputs for % of segments to consider

* add new config back to web console module after merge with master

* fix ReservoirSegmentSamplerTest

* fix line breaks in coordinator console dialog

* Add a test that helps ensure not regressions for percentOfSegmentsToConsiderPerMove

* Make improvements based off of feedback in review

* additional cleanup coming from review

* Add a warning log if limit on segments to consider for move can't be calcluated

* remove unused import

* fix tests for CoordinatorDynamicConfig

* remove precondition test that is redundant in CoordinatorDynamicConfig Builder class
2020-12-22 08:27:55 -08:00
Clint Wylie da0eabaa01
integration test for coordinator and overlord leadership client (#10680)
* integration test for coordinator and overlord leadership, added sys.servers is_leader column

* docs

* remove not needed

* fix comments

* fix compile heh

* oof

* revert unintended

* fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster

* fixes

* style

* dang

* heh

* scripts are hard

* fix spelling

* fix thing that must not matter since was already wrong ip, log when test fails

* needs more heap

* fix merge

* less aggro
2020-12-17 22:50:12 -08:00
sthetland 6ae8059c09
cleaning up and fixing links (#10528)
* cleaning up and fixing links

* reverting local link

* Update indexer.md

* link checking

* Fixing one more stale link for PostgreSQL
2020-12-17 13:37:43 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Himanshu be019760bb
document DynamicConfigProvider for kafka consumer properties (#10658)
* document DynamicConfigProvider for kafka consumer properties

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Jihoon Son <jihoonson@apache.org>

* Update docs/development/extensions-core/kafka-ingestion.md

* fix doc build

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-12-10 08:24:33 -08:00
Atul Mohan 44df05b8b2
Clarify split hint spec behavior (#10656) 2020-12-09 08:24:32 -06:00
Abhishek Agarwal 4ea1ab8531
Fix links in the grouping function doc (#10654) 2020-12-09 14:56:32 +08:00
Gian Merlino 96a387d972
Fixes and tests related to the Indexer process. (#10631)
* Fixes and tests related to the Indexer process.

Three bugs fixed:

1) Indexers would not announce themselves as segment servers if they
   did not have storage locations defined. This used to work, but was
   broken in #9971. Fixed this by adding an "isSegmentServer" method
   to ServerType and updating SegmentLoadDropHandler to always announce
   if this method returns true.

2) Certain batch task types were written in a way that assumed "isReady"
   would be called before "run", which is not guaranteed. In particular,
   they relied on it in order to initialize "taskLockHelper". Fixed this
   by updating AbstractBatchIndexTask to ensure "isReady" is called
   before "run" for these tasks.

3) UnifiedIndexerAppenderatorsManager did not properly handle complex
   datasources. Introduced DataSourceAnalysis in order to fix this.

Test changes:

1) Add a new "docker-compose.cli-indexer.yml" config that spins up an
   Indexer instead of a MiddleManager.

2) Introduce a "USE_INDEXER" environment variable that determines if
   docker-compose will start up an Indexer or a MiddleManager.

3) Duplicate all the jdk8 tests and run them in both MiddleManager and
   Indexer mode.

4) Various adjustments to encourage fail-fast errors in the Docker
   build scripts.

5) Various adjustments to speed up integration tests and reduce memory
   usage.

6) Add another Mac-specific approach to determining a machine's own IP.
   This was useful on my development machine.

7) Update segment-count check in ITCompactionTaskTest to eliminate a
   race condition (it was looking for 6 segments, which only exist
   together briefly, until the older 4 are marked unused).

Javadoc updates:

1) AbstractBatchIndexTask: Added javadocs to determineLockGranularityXXX
   that make it clear when taskLockHelper will be initialized as a side
   effect. (Related to the second bug above.)

2) Task: Clarified that "isReady" is not guaranteed to be called before
   "run". It was already implied, but now it's explicit.

3) ZkCoordinator: Clarified deprecation message.

4) DataSegmentServerAnnouncer: Clarified deprecation message.

* Fix stop_cluster script.

* Fix sanity check in script.

* Fix hashbang lines.

* Test and doc adjustments.

* Additional tests, and adjustments for tests.

* Split ITs back out.

* Revert change to druid_coordinator_period_indexingPeriod.

* Set Indexer capacity to match MM.

* Bump up Historical memory.

* Bump down coordinator, overlord memory.

* Bump up Broker memory.
2020-12-08 16:02:26 -08:00
frank chen c410648630
fix injection failure of StorageLocationSelectorStrategy objects (#10363)
* fix to allow customer storage location selector strategy

* add test cases to check instance of selector strategy

* update doc

* code format

* resolve code review comments

* inject StorageLocation

* fix CI

* fix mismatched license item reported by CI

* change property path from druid.segmentCache.locationSelectorStrategy.type to druid.segmentCache.locationSelector.strategy

* using a helper method to bind to correct property path
2020-12-08 09:48:31 -08:00