331 Commits

Author SHA1 Message Date
Jihoon Son 0cc9eb4903
Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288)
* Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided

* query context

* fix tests; add more test

* javadoc

* docs and more tests

* remove default and hadoop tests

* consistent name and fix javadoc

* spelling and field name

* default function for partitionsSpec

* other comments

* address comments

* fix tests and spelling

* test

* doc
2020-09-24 16:32:56 -07:00
Jonathan Wei cb30b1fe23
Automatically determine numShards for parallel ingestion hash partitioning (#10419)
* Automatically determine numShards for parallel ingestion hash partitioning

* Fix inspection, tests, coverage

* Docs and some PR comments

* Adjust locking

* Use HllSketch instead of HyperLogLogCollector

* Fix tests

* Address some PR comments

* Fix granularity bug

* Small doc fix
2020-09-24 13:47:53 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Atul Mohan b6ad790dc7
Support combining inputsource for parallel ingestion (#10387)
* Add combining inputsource

* Fix documentation

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-09-15 16:25:35 -07:00
Jihoon Son 8657b23ab2
Integration tests and docs for auto compaction with different partitioning (#10354)
* Working

* add test

* doc

* fix test

* split other integration test

* exclude other-index from other tests

* doc anchor fix

* adjust task slots and number of merge tasks

* spell check

* reduce maxNumConcurrentSubTasks to 1

* maxNumConcurrentSubTasks for range partitioning

* reduce memory for historical

* change group name
2020-09-15 11:28:09 -07:00
Joy Kent e5f0da30ae
Fix stringFirst/stringLast rollup during ingestion (#10332)
* Add IndexMergerRollupTest

This changelist adds a test to merge indexes with StringFirst/StringLast aggregator.

* Fix StringFirstAggregateCombiner/StringLastAggregateCombiner

The segment-level type for stringFirst/stringLast is SerializablePairLongString,
not String. This changelist fixes it.

* Fix EarliestLatestAnySqlAggregator to handle COMPLEX type

This changelist allows EarliestLatestAnySqlAggregator to accept COMPLEX
type as an operand. For its return type, we set it to VARCHAR, since
COMPLEX column is only generated by stringFirst/stringLast during ingestion
rollup.

* Return value with smaller timestamp in StringFirstAggregatorFactory.combine function

* Add integration tests for stringFirst/stringLast during ingestion

* Use one EarliestLatestReturnTypeInference instance

Co-authored-by: Joy Kent <joy@automonic.ai>
2020-09-08 17:36:04 -07:00
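
As a rough illustration of the combine behaviour described in the stringFirst/stringLast commit above (#10332), the sketch below keeps the pair with the smaller timestamp for "first" and the larger one for "last". `TimestampedString` is a hypothetical stand-in for `SerializablePairLongString`, and the tie-breaking and `None` handling are assumptions made for this example, not taken from the Druid source.

```python
from typing import NamedTuple, Optional


class TimestampedString(NamedTuple):
    # Hypothetical stand-in for a (timestamp, value) pair stored at segment level.
    timestamp: int
    value: Optional[str]


def combine_first(lhs: Optional[TimestampedString],
                  rhs: Optional[TimestampedString]) -> Optional[TimestampedString]:
    """Return the pair with the smaller timestamp (assumption: ties keep lhs)."""
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    return lhs if lhs.timestamp <= rhs.timestamp else rhs


def combine_last(lhs: Optional[TimestampedString],
                 rhs: Optional[TimestampedString]) -> Optional[TimestampedString]:
    """Return the pair with the larger timestamp (assumption: ties keep lhs)."""
    if lhs is None:
        return rhs
    if rhs is None:
        return lhs
    return lhs if lhs.timestamp >= rhs.timestamp else rhs
```

The point of combining pairs rather than bare strings is that the timestamp must survive rollup so that later merges can still pick the correct value.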
Gian Merlino 21703d81ac
Fix handling of 'join' on top of 'union' datasources. (#10318)
* Fix handling of 'join' on top of 'union' datasources.

The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.

The main changes are in UnionQueryRunner:

1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".

Together, these enable UnionQueryRunner to "see through" a join.

* Tests.

* Adjust heap sizes for integration tests.

* Different approach, more tests.

* Tweak.

* Styling.
2020-08-26 14:23:54 -07:00
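
The "see through a join" idea in the commit above (#10318) can be sketched as follows. This is a simplified model, not Druid code: `Table`, `UnionSource`, `Join`, `base_data_source`, `with_base_data_source`, and `split_union` are hypothetical names, and the sketch assumes the base datasource is found by following the left side of each join.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class UnionSource:
    tables: Tuple[Table, ...]


@dataclass(frozen=True)
class Join:
    left: object   # Table, UnionSource, or another Join
    right: object
    condition: str


def base_data_source(ds: object) -> object:
    """Walk down the left side of nested joins to the base datasource."""
    while isinstance(ds, Join):
        ds = ds.left
    return ds


def with_base_data_source(ds: object, new_base: object) -> object:
    """Rebuild ds with its base replaced, keeping every join layer intact."""
    if isinstance(ds, Join):
        return Join(with_base_data_source(ds.left, new_base), ds.right, ds.condition)
    return new_base


def split_union(ds: object) -> List[object]:
    """Rewrite a union (possibly wrapped in joins) into one datasource per table."""
    base = base_data_source(ds)
    if not isinstance(base, UnionSource):
        return [ds]
    return [with_base_data_source(ds, table) for table in base.tables]
```

Replacing only the top-level datasource would drop the join layers, which is why the rewrite has to rebuild the joins around each underlying table.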
Jihoon Son b5b3e6ecce
Add maxNumFiles to splitHintSpec (#10243)
* Add maxNumFiles to splitHintSpec

* missing link

* fix build failure; use maxNumFiles for integration tests

* spelling

* lower default

* Update docs/ingestion/native-batch.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* address comments; change default maxSplitSize

* spelling

* typos and doc

* same change for segments splitHintSpec

* fix build

* fix build

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2020-08-21 09:43:58 -07:00
Clint Wylie 7620b0c54e
Segment backed broadcast join IndexedTable (#10224)
* Segment backed broadcast join IndexedTable

* fix comments

* fix tests

* sharing is caring

* fix test

* i hope this doesnt fix it

* filter by schema to maybe fix test

* changes

* close join stuffs so it does not leak, allow table to directly make selector factory

* oops

* update comment

* review stuffs

* better check
2020-08-20 14:12:39 -07:00
Atul Mohan 618c04a99e
Fix CombiningFirehose compatibility (#10264)
* Fix CombiningFirehose

* Add integration test

* Fix path

* Add full datasource name

* Fix input location

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-20 10:37:38 -07:00
Clint Wylie b36dab0fe6
fix connectionId issue with JDBC prepared statement queries and router (#10272)
* fix router jdbc prepared statement connectionId issue

* column metadata too

* style

* remove tls

* try tls again

* add keystore stuffs

* use keyManager password

* add unit test

* simplify
2020-08-19 00:18:06 -07:00
Xavier Léauté 225490474d
Update Kafka dependencies to 2.6.0 (#10286)
* update Kafka dependencies to Kafka 2.6.0
* switch to Scala 2.13 build of Kafka
* update integration tests
* update Kafka tutorial
2020-08-15 07:56:40 -07:00
Clint Wylie e053348f74
add hasNulls to ColumnCapabilities, ColumnAnalysis (#10219)
* add isNullable to ColumnCapabilities, ColumnAnalysis

* better builder

* fix segment metadata queries in integration tests

* adjustments

* cleanup

* fix spotbugs

* treat unknown as true in segmentmetadata

* rename to hasNulls, add docs

* fixup

* test the dim indexer selector isNull fix for numeric columns

* fixes

* oof
2020-08-13 14:55:32 -07:00
Atul Mohan 06539bc828
Set default server.maxsize to the sum of segment cache (#10255)
* Default server.maxsize

* Remove maxsize refs from config

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-08-10 09:21:22 -07:00
Jihoon Son 6fdce36e41
Add integration tests for query retry on missing segments (#10171)
* Add integration tests for query retry on missing segments

* add missing dependencies; fix travis conf

* address comments

* Integration tests extension

* remove unused dependency

* remove druid_main

* fix java agent port
2020-07-22 22:30:35 -07:00
Jihoon Son 26d099f39b
Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments (#10183)
* Fix sys.servers table to not throw NPE and handle brokers/indexers/peons properly for broadcast segments

* fix tests and add missing tests

* revert null handling fix

* unused import

* move out util methods from DiscoveryDruidNode
2020-07-21 17:52:51 -07:00
Maytas Monsereenusorn dd7a32ad48
Fix ITSqlInputSourceTest (#10194)
* Fix ITSqlInputSourceTest.java

* Fix ITSqlInputSourceTest.java

* Fix ITSqlInputSourceTest.java

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2020-07-21 09:52:13 -07:00
Maytas Monsereenusorn 0cabc53bd5
Add integration tests for Appends (#10186)
* append test

* add append IT

* fix checkstyle

* fix checkstyle

* Remove parallel

* fix checkstyle

* fix

* fix

* address comments

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix

* fix
2020-07-20 13:43:13 -07:00
Suneet Saldanha e6c9142129
Add validation for authenticator and authorizer name (#10106)
* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix
2020-07-13 21:15:54 -07:00
Nishant Bangarwa 2b48de074a
Add additional properties for Kafka AdminClient and consumer from test config file (#10137)
* Add kafka test configs from file for AdminClient and consumer

* review comment
2020-07-13 18:16:05 +05:30
Suneet Saldanha 58f2e51161
Do not echo back username on auth failure (#10097)
* Do not echo back username on auth failure

* use bad username

* Remove username from exception messages

* fix tests

* fix the tests

* hopefully this time

* this time the tests work

* fixed this time

* fix

* upgrade to Jetty 9.4.30

* Unknown users echo back Unauthorized

* fix
2020-07-10 12:19:10 -07:00
Maytas Monsereenusorn 4e8570b71b
Add integration tests for all InputFormat (#10088)
* Add integration tests for Avro OCF InputFormat

* Add integration tests for Avro OCF InputFormat

* add tests

* fix bug

* fix bug

* fix failing tests

* add comments

* address comments

* address comments

* address comments

* fix test data

* reduce resource needed for IT

* remove bug fix

* fix checkstyle

* add bug fix
2020-07-08 12:50:29 -07:00
Maytas Monsereenusorn 859ff6e9c0
Reduce memory footprint of integration test by not starting unneeded containers (#10150)
* Reduce memory footprint of integration test

* fix README

* fix README

* fix error in script

* fix security IT
2020-07-08 09:46:18 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124)
2020-07-06 15:08:32 -07:00
frank chen 60c6bd5b4c
support Aliyun OSS service as deep storage (#9898)
* init commit, all tests passed

* fix format

Signed-off-by: frank chen <frank.chen021@outlook.com>

* data stored successfully

* modify config path

* add doc

* add aliyun-oss extension to project

* remove descriptor deletion code to avoid warning message output by aliyun client

* fix warnings reported by lgtm-com

* fix ci warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix errors reported by IntelliJ inspection check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc spelling check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix dependency warnings reported by ci

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix warnings reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add package configuration to support showing extension info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add IT test cases and fix bugs

Signed-off-by: frank chen <frank.chen021@outlook.com>

* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add license info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* exclude execution of IT testcases of OSS extension from CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* put the extensions under contrib group and add to distribution

* fix names in test cases

* add unit test to cover OssInputSource

* fix names in test cases

* fix dependency problem reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-01 22:20:53 -07:00
Yuanli Han fc555980e8
Remove payload field from table sys.segments (#9883)
* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment
2020-06-29 22:20:23 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that lets a user choose which HTTP methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
2020-06-29 16:59:31 -07:00
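
A minimal sketch of the allow-list check described in the commit above (#10085), assuming a fixed base set of methods plus extra methods supplied through configuration. `ALWAYS_ALLOWED` and `make_method_filter` are hypothetical names for this example. The sketch leaves OPTIONS out of the base set, following the later bullet in that commit, and does not model how OPTIONS or rejected requests are actually handled in Druid.

```python
from typing import Callable, Iterable

# Assumption: the methods accepted without any extra configuration.
ALWAYS_ALLOWED = frozenset({"GET", "PUT", "POST", "DELETE"})


def make_method_filter(extra_allowed: Iterable[str] = ()) -> Callable[[str], bool]:
    """Build a predicate that accepts the base methods plus configured extras."""
    allowed = ALWAYS_ALLOWED | frozenset(extra_allowed)

    def is_allowed(method: str) -> bool:
        # Exact match only; HTTP method names are case-sensitive.
        return method in allowed

    return is_allowed
```

For instance, `make_method_filter(["HEAD"])` accepts HEAD in addition to the base methods, in the same way the commit adds HEAD for the web console e2e tests.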
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
agricenko cad9eea15d
Integration test docker compose readme (#10016)
* Integration Tests. Docker-compose readme part

* Readme updates. PR fixes

Co-authored-by: agritsenko <agritsenko@provectus.com>
2020-06-15 14:48:34 -10:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs implement ReferenceCountedObject instead of Closeable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
agricenko e72f490be0
Integration Tests. Small fixes for CI. (#9988)
Co-authored-by: agritsenko <agritsenko@provectus.com>
2020-06-04 17:10:56 -07:00
agricenko 56a9cad532
Integration Tests. (#9854)
* Integration Tests.
Added docker-compose with druid-cluster configuration.
Refactored shell scripts. split code in a few files

* Integration Tests.
Added environment variable: DRUID_INTEGRATION_TEST_GROUP

* Integration Tests. Removed nit

* Integration Tests. Updated if block in docker_run_cluster.sh.

* Integration Tests. Readme. Added Docker-compose section.

* Integration Tests. removed yml files for s3, gcs, azure.
Renamed variables for skip start/stop/build docker.
Updated readme.
Rollback maven profile: int-tests-config-file

* Integration Tests. Removed docker-compose.test-env.yml file.
Added DRUID_INTEGRATION_TEST_GROUP variable to docker-compose.yml

* Integration Tests. Readme. Added details about docker-compose

* Integration Tests. cleanup shell scripts

Co-authored-by: agritsenko <agritsenko@provectus.com>
2020-06-02 09:38:53 -07:00
Xavier Léauté 65280a6953
update kafka client version to 2.5.0 (#9902)
- remove dependency on deprecated internal Kafka classes
- keep LZ4 version in line with the version shipped with Kafka
2020-05-27 13:20:32 -07:00
Maytas Monsereenusorn 9db29b93bf
Fix Hadoop IT Legacy test query json was not parameterized (#9901)
2020-05-20 21:09:17 -07:00
Jihoon Son c06d3f14b1
Add javadoc for stream ingestion integration tests (#9795)
2020-05-12 08:56:43 -07:00
Jonathan Wei 61295bd002
More Hadoop integration tests (#9714)
* More Hadoop integration tests

* Add missing s3 instructions

* Address PR comments

* Address PR comments

* PR comments

* Fix typo
2020-04-30 14:33:01 -07:00
Jihoon Son 39722bd064
Integration tests for stream ingestion with various data formats (#9783)
* Integration tests for stream ingestion with various data formats

* fix npe

* better logging; fix tsv

* fix tsv

* exclude kinesis from travis

* some readme
2020-04-29 13:18:01 -07:00
Maytas Monsereenusorn 6bc64b731f
Improve "waiting for tasks complete" logic in integration tests (#9759)
* improve waiting for tasks complete logic in integration tests

* improve waiting for tasks complete logic in integration tests

* fix forbidden check
2020-04-29 08:53:45 -07:00
Maytas Monsereenusorn a107ee3ed2
Fix problem when running single integration test using -Dit.test= (#9778)
* fix running single it

* fix checkstyle
2020-04-29 08:53:25 -07:00
Maytas Monsereenusorn 16f5ae4405
Add integration tests for kafka ingestion (#9724)
* add kafka admin and kafka writer

* refactor kinesis IT

* fix typo refactor

* parallel

* parallel

* parallel

* parallel works now

* add kafka it

* add doc to readme

* fix tests

* fix failing test

* test

* test

* test

* test

* address comments

* addressed comments
2020-04-22 10:43:34 -07:00
Maytas Monsereenusorn cff39892ba
Fixes intermittent failure in ITAutoCompactionTest (#9739)
* fix intermittent failure in ITAutoCompactionTest

* fix typo

* update javadoc
2020-04-21 20:56:17 -07:00
Maytas Monsereenusorn 8328d91b30
Add missing integration tests for the compaction by the coordinator (#9644)
* Add API to trigger a compaction by the coordinator for integration tests

* Add missing integration tests for the compaction by the coordinator

* address comments
2020-04-15 14:27:33 -07:00
Maytas Monsereenusorn d930f04e6a
Test file format extensions for inputSource (orc, parquet) (#9632)
* Test file format extensions for inputSource (orc, parquet)

* Test file format extensions for inputSource (orc, parquet)

* fix path

* resolve merge conflict

* fix typo
2020-04-13 13:03:56 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Maytas Monsereenusorn 73a6baaeb6
change hadoop inputSource IT to use parallel batch ingestion (#9616)
2020-04-07 11:37:37 -07:00
Clint Wylie d267b1c414
check paths used for shuffle intermediary data manager get and delete (#9630)
* check paths used for shuffle intermediary data manager get and delete

* add test

* newline

* meh
2020-04-07 09:47:18 -07:00
Aleksei Chumagin 79522f3e25
Integration-tests: typo (#9624)
* QA-57: change $ to # as comment

* QA-57: fix haddop to hadoop
2020-04-06 17:40:05 -07:00
Clint Wylie 4d277dbf99
Fix double count ssl connection metrics (#9594)
* fix double counted jetty/numOpenConnections metric for ssl connections

* tests

* more better

* style
2020-04-03 23:29:23 -07:00