10906 Commits

Author SHA1 Message Date
Clint Wylie 58294329b7
fix SQL issue for group by queries with time filter that gets optimized to false (#10968)
* fix SQL issue for group by queries with time filter that gets optimized to false

* short circuit always false in CombineAndSimplifyBounds

* adjust

* javadocs

* add preconditions for and/or filters to ensure they have children

* add comments, remove preconditions
2021-03-09 19:41:16 -08:00
Jonathan Wei 9c083783c9
Don't fail on invalid views in InformationSchema (#10960)
* Don't fail on invalid views in InformationSchema

* Fix test
2021-03-09 16:19:59 -08:00
benkrug 7f96ca8f5e
Update topnquery.md (#10944)
minor edits of the English, no meanings changed (imo)
2021-03-09 15:19:02 -08:00
Yi Yuan 36e86a2880
Add protobuf schema registry (#10839)
* dd_protobuf_schema_registry

* change license

* delete some annotation

* modify tests

* delete extra exception

* add licenses

* add descriptor and protoMessageType in ProtobufInputRowParser to adapt to old versions

* separate kafka-protobuf-provider

* modify protobuf.md

* refine protobuf.md

* add config and header

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-03-09 15:15:51 -08:00
Tianxin Zhao a57c28e9ce
prometheus metric exporter (#10412)
* prometheus-emitter

* use existing jetty server to expose prometheus collection endpoint

* unused variables

* better variable names

* removed unused dependencies

* more metric definitions

* reorganize

* use prometheus HTTPServer instead of hooking into Jetty server

* temporary empty help string

* temporary non-empty help.  fix incorrect dimension value in JSON (also updated statsd json)

* added full help text.  added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation

* added documentation for prometheus emitter

* safety for invalid labelNames

* fix travis checks

* Unit test and better sanitization of metrics names and label values

* add precondition to check namespace against regex

* use precompiled regex

* remove static imports. fix metric types

* better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start()

* Update regex for label-value replacements to allow internal numeric values.  Additional tests

* Adds missing license header
updates website/.spelling to add words used in prometheus-emitter docs.
updates docs/operations/metrics.md to correct the spelling of
bufferPoolName

* fixes version in extensions-contrib/prometheus-emitter

* fix style guide errors

* update import ordering

* add another word to website/.spelling

* remove unthrown declared exception

* remove unused import

* Pushgateway strategy for metrics

* typo

* Format fix and nullable strategy

* Update pom file for prometheus-emitter

* code review comments. Counter to gauge for cache metrics, periodical task to pushGateway

* Syntax fix

* Dimension label regex include numeric character back, fix previous commit

* bump prometheus-emitter pom dev version

* Remove scheduled task inside peon that pushes metrics

* Fix checkstyle

* Unit test coverage

* Unit test coverage

* Spelling

* Doc fix

* spelling

Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com>
Co-authored-by: Michael Schiff <schiff.michael@gmail.com>
Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com>
Co-authored-by: Tianxin Zhao <tizhao@adobe.com>
2021-03-09 14:37:31 -08:00
Abhishek Agarwal c66951a59e
Add flag in SQL to disable left base filter optimization for joins (#10947)
* Add flag to disable left base filter

* code coverage

* Draft

* Review comments

* code coverage

* add docs

* Add old tests
2021-03-09 13:07:34 -08:00
Maytas Monsereenusorn 4dd22a850b
Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962)
* Fix streaming ingestion failing and halting if it encounters empty rows

* address comments
2021-03-09 12:11:58 -08:00
frank chen 80ec28578a
show leader in Services Tab (#10951)
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-03-09 08:03:56 -08:00
Charles Smith 0f81ce32a0
refactor query caching docs (#10848)
* refactor query caching

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* Update docs/querying/using-caching.md

Co-authored-by: sthetland <steve.hetland@imply.io>

* add description for context link

* accept suggestions

* reword, rework some awkward language

* incorporate feedback, fix errors

* add back perf considerations

* Apply suggestions from code review

applying @suneet-s 's changes

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update caching.md

fix link

Co-authored-by: sthetland <steve.hetland@imply.io>
Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-03-08 22:25:48 -08:00
Abhishek Agarwal 489f5b1a03
Avoid expensive findEntry call in segment metadata query (#10892)
* Avoid expensive findEntry call in segment metadata query

* other places

* Remove findEntry

* Fix add cost

* Refactor a bit

* Add performance test

* Add comment

* Review comments

* intellij
2021-03-08 22:08:33 -08:00
Abhishek Agarwal ae620921df
Fix classCastException when inputs to union are join (#10950)
* Fix union queries

* Add tests
2021-03-08 21:20:26 -08:00
Suneet Saldanha 756ac6ef30
Remove flaky arm64 test job (#10953)
2021-03-08 14:09:33 -08:00
Clint Wylie 96889cdebc
add avro + kafka + schema registry integration test (#10929)
* add avro + schema registry integration test

* style

* retry init

* maybe this

* oops heh

* this will fix it

* review stuffs

* fix comment
2021-03-08 08:12:12 -08:00
Jihoon Son 9946306d4b
Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830)
* Allow only HTTP and HTTPS protocols for the HTTP inputSource

* rename

* Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* fix http firehose and update doc

* HDFS inputSource

* add configs for allowed protocols

* fix checkstyle and doc

* more checkstyle

* remove stale doc

* remove more doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* update hdfs address in docs

* fix test

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-06 11:43:00 -08:00
zhangyue19921010 bddacbb1c3
Dynamic auto scale Kafka-Stream ingest tasks (#10524)
* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* druid task auto scale based on kafka lag

* fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig

* test dynamic auto scale done

* auto scale tasks tested on prd cluster

* auto scale tasks tested on prd cluster

* modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20

* rename test fiel function

* change code and add docs based on capistrant's review

* modify test docs

* modify docs

* modify docs

* modify docs

* merge from master

* Extract the autoScale logic out of SeekableStreamSupervisor to avoid adding more to that class, and make the autoscaling algorithm configurable and scalable.

* fix ci failed

* revert misc.xml

* add uts to test autoscaler create && scale out/in and kafka ingest with scale enable

* add more uts

* fix inner class check

* add IT for kafka ingestion with autoscaler

* add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler

* review change

* code review

* remove unused imports

* fix NLP

* fix docs and UTs

* revert misc.xml

* use jackson to build autoScaleConfig with default values

* add uts

* use jackson to init AutoScalerConfig in IOConfig instead of Map<>

* autoscalerConfig interface and provide a defaultAutoScalerConfig

* modify uts

* modify docs

* fix checkstyle

* revert misc.xml

* modify uts

* reviewed code change

* reviewed code change

* code reviewed

* code review

* log changed

* do StringUtils.encodeForFormat when create allocationExec

* code review && limit taskCountMax to partitionNumbers

* modify docs

* code review

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-03-06 14:36:52 +05:30
Jihoon Son 16acd6686a
Remove stale 'namespace' config for JDBC lookups from doc (#10886)
* Remove stale 'namespace' config for JDBC lookups from doc and web-console

* revert webconsole change

* address comments
2021-03-04 17:16:34 -08:00
Jihoon Son 2c30f8b3b7
Migrate bitmap benchmarks to JMH (#10936)
* Migrate bitmap benchmarks to JMH

* add concise
2021-03-04 12:50:55 -08:00
Abhishek Agarwal 1a15987432
Supporting filters in the left base table for join datasources (#10697)
* where filter left first draft

* Revert changes in calcite test

* Refactor a bit

* Fixing the Tests

* Changes

* Adding tests

* Add tests for correlated queries

* Add comment

* Fix typos
2021-03-04 10:39:21 -08:00
Atul Mohan 6040c30fcd
Upgrade jetty to latest version (#10937)
* Upgrade jetty

* Fix license
2021-03-04 08:28:50 -06:00
Atul Mohan be2ac8d6ce
Document type inference issues with dynamic params in SQL (#10801)
* Clarify docs

* Apply suggestions from code review

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-03-04 03:48:11 -08:00
Gian Merlino 87a2abff79
Fix runtime error when IndexedTableJoinMatcher matches long selector to unique string index. (#10942)
* Fix runtime error when IndexedTableJoinMatcher matches long selector to unique string index.

The issue arises when matching against a long selector on the left-hand side to a string
typed Index on the right-hand side, and when that Index also returns true from areKeysUnique.
In this case, IndexedTableJoinMatcher would generate a ConditionMatcher that implements
matchSingleRow by calling findUniqueLong on the Index. This is inappropriate because the Index
is actually string typed. The fix is to check the type of the Index before deciding how to
implement the ConditionMatcher.

The patch adds "testMatchSingleRowToUniqueStringIndex" to IndexedTableJoinMatcherTest, which
explores this case.

* Update tests.
2021-03-04 00:57:59 -08:00
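The fix described in commit 87a2abff79 above (check the index's key type before choosing how to match a long selector) can be illustrated with a minimal sketch. This is not the Druid code: `UniqueIndex`, `KeyType`, `singleRowMatcher`, and the choice to convert the long to a string for a string-typed index are all hypothetical simplifications; only the names `findUniqueLong` and `areKeysUnique` come from the commit message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongToIntFunction;

final class JoinMatcherSketch {
    static final int NO_MATCH = -1;

    enum KeyType { LONG, STRING }

    /** Hypothetical, simplified index whose keys are unique: key -> row number. */
    static final class UniqueIndex {
        private final KeyType keyType;
        private final Map<Object, Integer> rows = new HashMap<>();

        UniqueIndex(KeyType keyType) { this.keyType = keyType; }

        void put(Object key, int row) { rows.put(key, row); }

        KeyType keyType() { return keyType; }

        /** Only valid for long-typed indexes; calling it on a string index is the bug being avoided. */
        int findUniqueLong(long key) {
            if (keyType != KeyType.LONG) {
                throw new UnsupportedOperationException("index is not long typed");
            }
            Integer row = rows.get(key); // the long is boxed to a Long key
            return row == null ? NO_MATCH : row;
        }

        /** Generic lookup by key object. */
        int findUnique(Object key) {
            Integer row = rows.get(key);
            return row == null ? NO_MATCH : row;
        }
    }

    /** Picks the lookup strategy from the index type instead of assuming a long-typed index. */
    static LongToIntFunction singleRowMatcher(UniqueIndex index) {
        if (index.keyType() == KeyType.LONG) {
            return index::findUniqueLong;
        }
        // Assumption for this sketch: match a long selector to a string index via its decimal string form.
        return key -> index.findUnique(String.valueOf(key));
    }

    public static void main(String[] args) {
        UniqueIndex stringIndex = new UniqueIndex(KeyType.STRING);
        stringIndex.put("42", 7);
        System.out.println(singleRowMatcher(stringIndex).applyAsInt(42L));
    }
}
```

The point of the sketch is only the branch in `singleRowMatcher`: the matcher is chosen after inspecting the index type, so the long-specific lookup is never invoked on a string-typed index.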
Jihoon Son d6e591220b
Add instructions for updating licenses (#10894)
* Add instructions for updating licenses

* updating license

* pull request template
2021-03-03 00:57:09 -08:00
Jihoon Son 8831c0d057
Update PR template to include the contributing doc; suggestion to not use force push (#10769)
* Update PR template to include the contributing doc; suggestion to not use force push

* make it a comment

* moved it to top

* address review comment
2021-03-02 22:26:49 -08:00
Maytas Monsereenusorn 23333914c7
add javadoc and test (#10938)
2021-03-03 11:34:00 +08:00
Abhishek Agarwal 7d9a61cf7f
Suppress CVE-2017-15288 and upgrade bcprov-ext-jdk15o (#10933)
2021-03-02 16:18:27 -08:00
Maytas Monsereenusorn b7b0ee8362
Add query granularity to compaction task (#10900)
* add query granularity to compaction task

* fix checkstyle

* fix checkstyle

* fix test

* fix test

* add tests

* fix test

* fix test

* cleanup

* rename class

* fix test

* fix test

* add test

* fix test
2021-03-02 11:23:52 -08:00
Gian Merlino 05e8f8fe06
CsvInputFormat: Create a parser per InputEntityReader. (#10923)
RFC4180Parser is not thread safe and cannot be shared across readers.
2021-02-27 18:37:05 -08:00
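The reasoning in commit 05e8f8fe06 above (a parser that is not thread safe must not be shared across readers) can be sketched as follows. Everything here is a hypothetical stand-in: `StatefulLineParser` is not RFC4180Parser (it does not handle quoting), and `Format` and `Reader` only mimic the roles of an input format and its per-entity readers.

```java
import java.util.ArrayList;
import java.util.List;

final class PerReaderParserSketch {

    /** Hypothetical parser with mutable scratch state, so it is unsafe to share across threads. */
    static final class StatefulLineParser {
        private final StringBuilder field = new StringBuilder();

        List<String> parseLine(String line) {
            List<String> out = new ArrayList<>();
            field.setLength(0);
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == ',') {
                    out.add(field.toString());
                    field.setLength(0);
                } else {
                    field.append(c);
                }
            }
            out.add(field.toString()); // last field has no trailing comma
            return out;
        }
    }

    /** Each reader owns its parser, so readers running on different threads never share state. */
    static final class Reader {
        private final StatefulLineParser parser = new StatefulLineParser();

        List<String> read(String line) {
            return parser.parseLine(line);
        }
    }

    /** The format holds only configuration; it creates a fresh reader (and parser) per call. */
    static final class Format {
        Reader createReader() {
            return new Reader();
        }
    }

    public static void main(String[] args) {
        Format format = new Format();
        System.out.println(format.createReader().read("a,b,c"));
    }
}
```

The alternative of one parser held as a field on `Format` would let two concurrent readers interleave writes to the same scratch buffer, which is the hazard the commit message describes.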
spinatelli 99198c02af
Add config and header support for confluent schema registry. (#10314)
* Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096)

* Add Eclipse Public License 2.0 to license check

* Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests

* Add spelling exception and remove unused dependency

* Use non-deprecated getSchemaById() and remove duplicated license entry

* Update docs/ingestion/data-formats.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Added check for schema being null, as per Confluent code

* Missing imports and whitespace

* Updated unit tests with AvroSchema

Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de>
Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-02-27 14:25:35 -08:00
Charles Smith 573de3bc0d
clarify security requirements around HTTPInputSource (#10914)
* clarify security requirements around HTTPInputSource

* explicitly mention write/datasource in best practices. clarify that the ingestion task is the risk

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-02-26 09:37:47 -08:00
Alexander Saydakov f930cf14d6
Use the latest Apache DataSketches release 2.0.0 (#10917)
* use the latest Apache DataSketches release 2.0.0

* updated datasketches version

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2021-02-26 07:52:00 -06:00
Gian Merlino 07902f607b
Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904)
* Granularity: Introduce primitive-typed bucketStart, increment methods.

Saves creation of unnecessary DateTime objects in timestamp_floor and
timestamp_ceil expressions.

* Fix style.

* Amp up the test coverage.
2021-02-25 07:59:20 -08:00
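To illustrate the idea in commit 07902f607b above (bucketing on primitive longs instead of allocating DateTime objects), here is a minimal sketch for a fixed-duration granularity. The class name, fields, and `ceil` helper are hypothetical; only the method names `bucketStart` and `increment` come from the commit message, and calendar-based granularities such as months would need more than this arithmetic.

```java
final class DurationGranularitySketch {
    private final long durationMillis;
    private final long originMillis;

    DurationGranularitySketch(long durationMillis, long originMillis) {
        if (durationMillis <= 0) {
            throw new IllegalArgumentException("duration must be positive");
        }
        this.durationMillis = durationMillis;
        this.originMillis = originMillis;
    }

    /** Start of the bucket containing the timestamp; floorMod keeps pre-origin timestamps correct. */
    long bucketStart(long timestampMillis) {
        return timestampMillis - Math.floorMod(timestampMillis - originMillis, durationMillis);
    }

    /** Same offset in the next bucket. */
    long increment(long timestampMillis) {
        return timestampMillis + durationMillis;
    }

    /** Ceiling built from the two primitives, with no object allocation. */
    long ceil(long timestampMillis) {
        long start = bucketStart(timestampMillis);
        return start == timestampMillis ? timestampMillis : increment(start);
    }

    public static void main(String[] args) {
        DurationGranularitySketch hour = new DurationGranularitySketch(3_600_000L, 0L);
        System.out.println(hour.bucketStart(3_600_001L));
        System.out.println(hour.ceil(3_600_001L));
    }
}
```

Because every operation works on `long` values, a floor or ceil expression evaluated once per row creates no temporary objects.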
zachjsh 67eff4110d
Improve Druid ldap auth documentation (#10915)
* Improve Druid ldap auth documentation

Improved the ldap auth docs by clarifying that the object classes and
attributes noted are specific to Microsoft Active Directory, and could
be different depending on the specific ldap server being used. Also
emphasized the importance of the memberOf field and noted that the
step about adding users to roles is only needed in certain circumstances.

* * add another note

* Apply suggestions from code review

Co-authored-by: sthetland <steve.hetland@imply.io>

* * simplify

* * Address review comments

Co-authored-by: sthetland <steve.hetland@imply.io>
2021-02-24 15:28:41 -08:00
Clint Wylie 0ecc90142e
basic security extension ignore permissions that use unknown ResourceType or Action (#10896)
* suppress unknown ResourceType and Action for basic-security authorizer stuff

* fix pom

* print failed role, test logs
2021-02-23 14:49:09 -08:00
zachjsh 553f5c8570
Ldap integration tests (#10901)
* Add integration tests for ldap extension

* * refactor

* * add ldap-security integration test to travis

* * fix license error

* * Fix failing other integration test

* * break up large tests
* refactor
* address review comments

* * fix intellij inspections failure

* * remove dead code
2021-02-23 13:29:57 -08:00
Clint Wylie f34c6eb3c0
add druid jdbc handler config for minimum number of rows per frame (#10880)
* add druid jdbc handler config for minimum number of rows per frame

* javadocs and docs adjustments

* spelling

* adjust docs per review with minor tweaks

* adjust more
2021-02-23 02:11:04 -08:00
Abhishek Agarwal 3a0a0c033f
Reload segment usage when starting the process (#10884)
* Reload segment usage when starting the process

* doc

* Add more tests

* remove forbidden method

* Add alert
2021-02-22 00:08:44 -08:00
Gian Merlino b7e9f5bc85
BoundDimFilter: Simplify the various DruidLongPredicates. (#10906)
They all use Long.compare, but they don't need to. Changing to
regular comparisons simplifies the code and also removes branches.
(Internally, Long.compare has two branches.)
2021-02-19 16:44:56 -08:00
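The simplification described in commit b7e9f5bc85 above can be sketched by writing the same bound check both ways. This is not the BoundDimFilter code: `java.util.function.LongPredicate` stands in for DruidLongPredicate, and the method names are hypothetical.

```java
import java.util.function.LongPredicate;

final class BoundPredicateSketch {

    /** Comparator style: goes through Long.compare and then tests its result. */
    static LongPredicate lowerBoundViaCompare(long lower, boolean strict) {
        return strict
            ? value -> Long.compare(value, lower) > 0
            : value -> Long.compare(value, lower) >= 0;
    }

    /** Direct style: a plain primitive comparison with the same result. */
    static LongPredicate lowerBoundDirect(long lower, boolean strict) {
        return strict
            ? value -> value > lower
            : value -> value >= lower;
    }

    /** Two-sided, inclusive bound written with direct comparisons. */
    static LongPredicate rangeDirect(long lower, long upper) {
        return value -> value >= lower && value <= upper;
    }

    public static void main(String[] args) {
        System.out.println(lowerBoundDirect(10L, true).test(11L));
        System.out.println(rangeDirect(0L, 5L).test(6L));
    }
}
```

Both styles accept exactly the same values; the direct form just skips the intermediate three-way comparison result.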
Maytas Monsereenusorn f5bfccc720
Fix maxBytesInMemory for heap overhead of all sinks and hydrants check (#10891)
* fix maxBytesInMemory

* fix maxBytesInMemory check

* fix maxBytesInMemory check

* fix test
2021-02-18 21:48:57 -08:00
Clint Wylie cbbef80c7f
add SQL operators for bitwise expressions (#10823)
* add SQL operators for bitwise expressions

* more test

* fix spelling

* more tests
2021-02-18 20:56:33 -08:00
Jonathan Wei 84341737d5
Add property for binding view manager type (#10895)
* Add property for binding view manager type

* Checkstyle

* Fix constructor

* Add @Test
2021-02-18 15:57:45 -08:00
Agustin Gonzalez eabad0fb35
Keep query granularity of compacted segments after compaction (#10856)
* Keep query granularity of compacted segments after compaction

* Protect against null isRollup

* Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment

* Make sure that NONE is also included when comparing for the finer granularity

* Update integration test check for segment size due to query granularity propagation affecting size

* Minor code cleanup

* Added functional test to verify queryGranularity after compaction

* Minor style fix

* Update unit tests
2021-02-18 01:35:10 -08:00
Vadim Ogievetsky 1a4c43f9fd
Web console: remove namespace prop that does not exist from JDBC lookup (#10888)
* remove namespace prop that does not exist from JDBC lookup

* remove namespace from tests
2021-02-17 17:07:32 -08:00
Benedict Jin 32e801ceab
Bump Apache Parquet from 1.11.0 to 1.11.1 (#10889)
2021-02-17 12:18:17 +08:00
Suneet Saldanha bc7004006f
Update dependency-check plugin (#10883)
* Use dependency-check aggregate

* oops
2021-02-16 19:22:04 -08:00
Jonathan Wei 8ad68135c8
Filter unauthorized views in InformationSchema (#10874)
* Filter unauthorized views in InformationSchema

* Use fixed name for view schema

* Remove unused string
2021-02-16 17:36:45 -08:00
sthetland 1e40f51e65
Fix example names of security artifacts in docs (#10882)
* replacing example names

* unrelated typos

* unintended changes

* a few more typo fixes
2021-02-16 14:58:50 -08:00
Will Xu c8d2654605
Use native git for git-commit-id-plugin to speed up build (#10881)
* Segment timeline doesn't show results older than 3 months

* Adoption testing patch for web segment timeline view and also refactoring default time config

* Changing git-commit-id-plugin to use native git, shaving 15% off build time

Co-authored-by: dev <dev@dev.minitoken.com>
2021-02-12 09:31:07 -08:00
Maytas Monsereenusorn 6541178c21
Support segmentGranularity for auto-compaction (#10843)
* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* resolve conflict

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* fix tests

* fix more tests

* fix checkstyle

* add unit tests

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add unit tests

* add integration tests

* fix checkstyle

* fix checkstyle

* fix failing tests

* address comments

* address comments

* fix tests

* fix tests

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test
2021-02-12 03:03:20 -08:00
misqos e684b83e29
Add the ability to supply client certificate to dsql command line tool. (#10765)
2021-02-11 20:16:47 -08:00
zachjsh 64774037c1
Add config option to specify zk version in integration tests (#10870)
* Update integration-tests README

Updated the integration-tests README file to include instructions
for setting the `ZK_VERSION` property which is now required to be
set prior to executing any integration test. Also added a note
about the importance of setting the test group parameter when
running integration tests, even when running single tests.

* * revert change made to DOCKER_IP doc

* * Add default value for zk version

* * update travis config to use new zk.version property when running
  integration tests

* Remove doc about needing to set ZK_VERSION variable when running
  integration tests
2021-02-11 10:31:49 -08:00