Commit Graph

10583 Commits

Author SHA1 Message Date
Chi Cao Minh 33a37d85d7
Fix native batch range partition segment sizing (#10089)
* Fix native batch range partition segment sizing

Fixes #10057.

Native batch range partitioning was only considering the partition
dimension value when grouping rows instead of using all of the row's
partition values. Thus, for schemas with multiple dimensions, the rollup
was overestimated, which would cause too many dimension values to be
packed into the same range partition. The resulting segments would then
be overly large (and not honor the target or max partition sizes).

Main changes:

- PartialDimensionDistributionTask: Consider all dimension values when
  grouping rows

- RangePartitionMultiPhaseParallelIndexingTest: Regression test by
  having input with rows that should roll up and rows that should not
  roll up

* Use hadoop & native hash ingestion row group key
2020-06-29 17:49:52 -07:00
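
The grouping problem described in #10089 can be illustrated with a minimal sketch. This is not the code from PartialDimensionDistributionTask; the class name, method names, and the convention that the first value in each row is the partition dimension are all assumptions made for illustration. It compares how many distinct groups result when rows are keyed on the partition dimension alone versus on all dimension values.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RowGroupKeySketch {
    // Hypothetical row model: each row is a list of dimension values, and
    // index 0 is assumed to be the partition dimension.

    // Behaviour described as buggy: rows are grouped on the partition dimension only,
    // so rows that differ in other dimensions look as if they roll up together.
    static int countGroupsByPartitionDimension(List<List<String>> rows) {
        Set<String> keys = new HashSet<>();
        for (List<String> row : rows) {
            keys.add(row.get(0));
        }
        return keys.size();
    }

    // Behaviour described as the fix: the group key is every dimension value of the row.
    static int countGroupsByAllDimensions(List<List<String>> rows) {
        Set<List<String>> keys = new HashSet<>(rows);
        return keys.size();
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("a", "x"),
            Arrays.asList("a", "x"),   // rolls up with the row above
            Arrays.asList("a", "y"),   // same partition value, different second dimension
            Arrays.asList("b", "x"));
        System.out.println(countGroupsByPartitionDimension(rows)); // 2
        System.out.println(countGroupsByAllDimensions(rows));      // 3
    }
}
```

Keying on the partition dimension alone reports fewer groups than really exist after rollup, which matches the overestimated rollup and oversized segments the commit message describes.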
Jihoon Son 8ef3598c05
Move shardSpec tests to core (#10079)
* Move shardSpec tests to core

* checkstyle

* inject object mapper for testing

* unused import
2020-06-29 17:31:37 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that lets a user specify which HTTP methods to allow against
their Druid server.

Druid will only accept HTTP requests that use one of the methods GET, PUT, POST,
DELETE, or OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config.

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
2020-06-29 16:59:31 -07:00
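
The allow-list idea from #10085 can be sketched as follows. This is a hypothetical stand-alone class, not Druid's servlet filter: the class name, constructor, and `isAllowed` method are invented for illustration, and `extraMethods` merely stands in for whatever an admin configures through ServerConfig#allowedHttpMethods. The base set is the one listed in the first part of the commit message; the later bullets that change how OPTIONS and HEAD are treated are not modeled.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class HttpMethodFilterSketch {
    // Methods the commit message lists as accepted without extra configuration.
    static final List<String> BASE_METHODS =
        Arrays.asList("GET", "PUT", "POST", "DELETE", "OPTIONS");

    private final Set<String> allowed = new HashSet<>(BASE_METHODS);

    // extraMethods is a hypothetical stand-in for the admin-configured additions.
    HttpMethodFilterSketch(List<String> extraMethods) {
        allowed.addAll(extraMethods);
    }

    // A request passes only if its method is in the combined allow-list.
    boolean isAllowed(String method) {
        return allowed.contains(method);
    }

    public static void main(String[] args) {
        HttpMethodFilterSketch defaults =
            new HttpMethodFilterSketch(Collections.emptyList());
        HttpMethodFilterSketch withHead =
            new HttpMethodFilterSketch(Arrays.asList("HEAD"));
        System.out.println(defaults.isAllowed("GET"));   // true
        System.out.println(defaults.isAllowed("TRACE")); // false
        System.out.println(withHead.isAllowed("HEAD"));  // true
    }
}
```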
Will Xu 35c7c0ec25
Segment timeline doesn't show results older than 3 months (#9956)
* Segment timeline doesn't show results older than 3 months

* Adoption testing patch for web segment timeline view and also refactoring default time config
2020-06-28 01:45:05 -07:00
chenyuzhi459 a4c6d5f37e
fix query memory leak (#10027)
* fix query memory leak

* rollup ./idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancel method use

* format code

* roll back code

* optimize cancelAll

* clean code

* add comment
2020-06-26 23:30:59 -07:00
Jian Wang 20fd72bd13
Fix NPE when brokers use custom priority list (#9878) 2020-06-26 17:28:54 -07:00
xiangqiao123 405ebdcaaf
fix MaterializedView groupby query return array result by default (#9936)
* fix bug:MaterializedView groupby query return map result by default

* add unit test

* add unit test

* add unit test

* fix bug:MaterializedView groupby query return map result by default

* add unit test

* add unit test

* add unit test

* update pr

* update pr

Co-authored-by: xiangqiao <xiangqiao@kuaishou.com>
2020-06-26 16:52:04 -07:00
Clint Wylie 4b99c6d3ef
ensure ParallelMergeCombiningSequence closes its closeables (#10076)
* ensure close for all closeables of ParallelMergeCombiningSequence

* revert unneeded change

* consolidate methods

* catch throwable instead of exception
2020-06-26 14:37:20 -07:00
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son c591ff8ea8
Add NonnullPair (#10013)
* Add NonnullPair

* new line

* test

* make it consistent
2020-06-26 09:52:06 -07:00
Suneet Saldanha b7d771f633
More prominent instructions on code coverage failure (#10060)
* More prominent instructions on code coverage failure

* Update .travis.yml
2020-06-25 19:48:30 -07:00
morrifeldman f6594fff60
Fix missing temp dir for native single_dim (#10046)
* Fix missing temp dir for native single_dim

Native single dim indexing throws a file not found exception from
InputEntityIteratingReader.java:81.  This MR creates the required
temporary directory when setting up the
PartialDimensionDistributionTask.  The change was tested on a Druid
cluster. After installing the change, native single_dim indexing
completes successfully.

* Fix indentation

* Use SinglePhaseSubTask as example for creating the temp dir

* Move temporary indexing dir creation into TaskToolbox

* Remove unused dependency

Co-authored-by: Morri Feldman <morri@appsflyer.com>
2020-06-25 14:41:22 -07:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioning is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* handle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Clint Wylie 0f51b3c190
fix dropwizard emitter jvm bufferpoolName metric (#10075)
* fix dropwizard emitter jvm bufferpoolName metric

* fixes
2020-06-25 12:20:25 -07:00
Parag Jain 422a8af14e
Fix balancer strategy (#10070)
* fix server overassignment

* fix random balancer strategy, add more tests

* comment

* added more tests

* fix forbidden apis

* fix typo
2020-06-25 16:45:00 +05:30
Clint Wylie ec1f443a5c
update avatica to handle additional character sets over jdbc (#10074)
* update avatica to handle additional character sets over jdbc

* update license yaml, fix test

* oops
2020-06-24 19:58:34 -07:00
Xavier Léauté 572cd16e6f
fix dimension names for jvm monitor metrics (#10071) 2020-06-24 19:56:16 -07:00
xhl0726 1596b3eacd
Optimize protobuf parsing for flatten data (#9999)
* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
2020-06-24 18:01:31 -07:00
Maytas Monsereenusorn 9be5039f68
Enable query vectorization by default (#10065)
* Enable query vectorization by default

* update docs
2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn f80c02da02
Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063)
* fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator

* address comments

* address comments
2020-06-23 15:54:37 -10:00
sthetland 978b494b46
Druid user permissions (#10047)
* Druid user permissions apply in the console

* Update index.md

* noting user warning in console page; some minor shuffling

* noting user warning in console page; some minor shuffling 1

* touchups

* link checking fixes

* Updated per suggestions
2020-06-23 17:39:48 -07:00
Harshpreet Singh d96aa1586a
retry 500 and 503 errors against kinesis (#10059)
* retry 500 and 503 errors against kinesis

* add test that exercises retry logic

* more branch coverage

* retry 500 and 503 on getRecords request when fetching sequence number

Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>
2020-06-23 15:49:34 -07:00
Dylan Wylie 0470fcc9da
change default number of segment loading threads (#9856)
* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-06-23 13:56:44 -07:00
Clint Wylie eee99ff0d5
minor rework of topn algorithm selection for clarity and more javadocs (#10058)
* minor refactor of topn engine algorithm selection for clarity

* adjust

* more javadoc
2020-06-22 09:08:50 -07:00
Jianhuan Liu 5600e1c204
fix docs error in hadoop-based part (#9907)
* fix docs error: google to azure and hdfs to http

* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part

* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part

* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part
2020-06-19 23:14:54 -10:00
Maytas Monsereenusorn 9bab6b6371
SketchAggregator.updateUnion should handle null inside List update object (#10055) 2020-06-19 20:29:25 -07:00
Maytas Monsereenusorn 191572ad5e
Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API (#10054)
* Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API

* address comments

* address comments

* add tests
2020-06-19 20:18:56 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Suneet Saldanha b8a3223f24
Remove changes from #9114 (#10050) 2020-06-18 18:18:12 -07:00
Maytas Monsereenusorn 857e5204bf
Coordinator loadstatus API full format does not consider Broadcast rules (#10048)
* Coordinator loadstatus API full format does not consider Broadcast rules

* address comments

* fix checkstyle

* minor optimization

* address comments
2020-06-18 17:52:33 -07:00
Clint Wylie b5e6569d2c
global table only if joinable (#10041)
* global table only if joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-06-18 17:32:10 -07:00
litao a4bd144ebe
fix docs (#9114)
Co-authored-by: tomscut <tomscut@gmail.com>
2020-06-18 09:48:47 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
Samarth Jain 3527458f85
Druid Avatica - Handle escaping of search characters correctly (#10040)
Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions
2020-06-17 20:01:31 -07:00
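
Why the ESCAPE clause in #10040 matters can be shown with a small LIKE matcher. This is a generic sketch of SQL LIKE semantics written for illustration, not Druid or Avatica code; the class and method names are hypothetical, and it assumes that no escape character is in effect unless one is passed explicitly.

```java
import java.util.regex.Pattern;

public class LikeEscapeSketch {
    // Translates a SQL LIKE pattern into a regex. '%' matches any run of characters,
    // '_' matches exactly one, and a character preceded by the escape character
    // (when one is given) is matched literally.
    static Pattern likeToRegex(String like, Character escape) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < like.length(); i++) {
            char c = like.charAt(i);
            if (escape != null && c == escape && i + 1 < like.length()) {
                // Escaped character: take the next character literally.
                regex.append(Pattern.quote(String.valueOf(like.charAt(++i))));
            } else if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String pattern, Character escape) {
        return likeToRegex(pattern, escape).matcher(value).matches();
    }

    public static void main(String[] args) {
        // Unescaped '_' is a wildcard, so it also matches unrelated names.
        System.out.println(like("fooXbar", "foo_bar", null));     // true
        // With ESCAPE '\', the pattern foo\_bar matches only the literal underscore.
        System.out.println(like("foo_bar", "foo\\_bar", '\\'));   // true
        System.out.println(like("fooXbar", "foo\\_bar", '\\'));   // false
        // Without an escape character the backslash is an ordinary character,
        // so the escaped pattern no longer matches the intended name.
        System.out.println(like("foo_bar", "foo\\_bar", null));   // false
    }
}
```

The last case shows the failure mode: a client that escapes search characters with a backslash gets no match unless the LIKE expression also declares that backslash as its escape character.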
Maytas Monsereenusorn 7569ee3ec6
All aggregators should check if column can be vectorized (#10026)
* All aggregators should use vectorization-aware column processor

* All aggregators should use vectorization-aware column processor

* fix canVectorize

* fix canVectorize

* add tests

* revert back default

* address comment

* address comments

* address comment

* address comment
2020-06-17 01:52:02 -10:00
Maytas Monsereenusorn 1a2620606d
API to verify a datasource has the latest ingested data (#9965)
* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checkstyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs
2020-06-16 20:48:30 -10:00
Clint Wylie 68aa384190
global table datasource for broadcast segments (#10020)
* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode
2020-06-16 17:58:05 -07:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they cannot be converted to BigDecimal; attempting the
conversion throws a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224, where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that cannot be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
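
The BigDecimal limitation described in #10014 can be reproduced with the JDK alone. The sketch below is not Druid's ROUND implementation: the class and method names are hypothetical, and returning the special value unchanged is an assumption chosen for illustration (the commit message mentions a `Math.round` fallback plus later adjustments for infinity that are not modeled here).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundSketch {
    // Returns false for NaN and the infinities, because the BigDecimal(double)
    // constructor throws NumberFormatException for them.
    static boolean canConvertToBigDecimal(double value) {
        try {
            new BigDecimal(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    // Rounds to the given number of decimal places, skipping the BigDecimal
    // path for values that cannot be represented as a BigDecimal.
    static double round(double value, int scale) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            // Assumption for this sketch: pass special values through unchanged.
            return value;
        }
        return BigDecimal.valueOf(value)
            .setScale(scale, RoundingMode.HALF_UP)
            .doubleValue();
    }

    public static void main(String[] args) {
        System.out.println(canConvertToBigDecimal(Double.POSITIVE_INFINITY)); // false
        System.out.println(round(2.345, 2));      // 2.35
        System.out.println(round(Double.NaN, 2)); // NaN
    }
}
```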
Gian Merlino 9330ca9717
Remove LegacyDataSource. (#10037)
* Remove LegacyDataSource.

Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.

* Slight test improvement.
2020-06-16 14:40:35 -07:00
Clint Wylie 9468df4721
make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035) 2020-06-15 19:56:49 -07:00
agricenko cad9eea15d
Integration test docker compose readme (#10016)
* Integration Tests. Docker-compose readme part

* Readme updates. PR fixes

Co-authored-by: agritsenko <agritsenko@provectus.com>
2020-06-15 14:48:34 -10:00
Suneet Saldanha 0035f39e25
lpad and rpad functions match Postgres behavior in SQL compatible mode (#10006)
* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for negative pad length

* address review comments
2020-06-15 10:47:57 -07:00
Jihoon Son 9a10f8352b
Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle
2020-06-12 21:39:37 -07:00
Clint Wylie 8a7e7e773a
fix balancer + broadcast segments npe (#10021) 2020-06-12 13:09:22 -07:00
Jonathan Wei fe2f656427
Fix broadcast rule drop and docs (#10019)
* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services
2020-06-12 02:33:28 -07:00
Chi Cao Minh 67669b4ad4
Fix CVE-2020-13602 (#10024)
Upgrade postgres jdbc driver to latest version to address CVE, which was
fixed in 42.2.13.
2020-06-11 17:30:13 -07:00
Maytas Monsereenusorn 5d35f3e080
Remove colocated datasources from web console for broadcast indexed tables (#10018) 2020-06-11 14:08:03 -10:00
Stefan Birkner 369ed2503e
Remove duplicate parameters from test (#10022)
Commit 771870ae2d removed constructor
arguments from the rules. Therefore multiple parameters of the test are
now the same and can be removed.
2020-06-11 14:15:02 -07:00