Commit Graph

10476 Commits

Author SHA1 Message Date
Clint Wylie 4b99c6d3ef
ensure ParallelMergeCombiningSequence closes its closeables (#10076)
* ensure close for all closeables of ParallelMergeCombiningSequence

* revert unneeded change

* consolidate methods

* catch throwable instead of exception
2020-06-26 14:37:20 -07:00
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son c591ff8ea8
Add NonnullPair (#10013)
* Add NonnullPair

* new line

* test

* make it consistent
2020-06-26 09:52:06 -07:00
Suneet Saldanha b7d771f633
More prominent instructions on code coverage failure (#10060)
* More prominent instructions on code coverage failure

* Update .travis.yml
2020-06-25 19:48:30 -07:00
morrifeldman f6594fff60
Fix missing temp dir for native single_dim (#10046)
* Fix missing temp dir for native single_dim

Native single dim indexing throws a file not found exception from
InputEntityIteratingReader.java:81.  This MR creates the required
temporary directory when setting up the
PartialDimensionDistributionTask.  The change was tested on a Druid
cluster.  After installing the change native single_dim indexing
completes successfully.

* Fix indentation

* Use SinglePhaseSubTask as example for creating the temp dir

* Move temporary indexing dir creation in to TaskToolbox

* Remove unused dependency

Co-authored-by: Morri Feldman <morri@appsflyer.com>
2020-06-25 14:41:22 -07:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioing is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* hansle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Clint Wylie 0f51b3c190
fix dropwizard emitter jvm bufferpoolName metric (#10075)
* fix dropwizard emitter jvm bufferpoolName metric

* fixes
2020-06-25 12:20:25 -07:00
Parag Jain 422a8af14e
Fix balancer strategy (#10070)
* fix server overassignment

* fix random balancer strategy, add more tests

* comment

* added more tests

* fix forbidden apis

* fix typo
2020-06-25 16:45:00 +05:30
Clint Wylie ec1f443a5c
update avatica to handle additional character sets over jdbc (#10074)
* update avatica to handle additional character sets over jdbc

* update license yaml, fix test

* oops
2020-06-24 19:58:34 -07:00
Xavier Léauté 572cd16e6f
fix dimension names for jvm monitor metrics (#10071) 2020-06-24 19:56:16 -07:00
xhl0726 1596b3eacd
Optimize protobuf parsing for flatten data (#9999)
* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
2020-06-24 18:01:31 -07:00
Maytas Monsereenusorn 9be5039f68
Enable query vectorization by default (#10065)
* Enable query vectorization by default

* update docs
2020-06-24 13:08:49 -07:00
Maytas Monsereenusorn f80c02da02
Fix HyperUniquesAggregatorFactory.estimateCardinality null handling to respect output type (#10063)
* fix return type from HyperUniquesAggregator/HyperUniquesVectorAggregator

* address comments

* address comments
2020-06-23 15:54:37 -10:00
sthetland 978b494b46
Druid user permissions (#10047)
* Druid user permissions apply in the console

* Update index.md

* noting user warning in console page; some minor shuffling

* noting user warning in console page; some minor shuffling 1

* touchups

* link checking fixes

* Updated per suggestions
2020-06-23 17:39:48 -07:00
Harshpreet Singh d96aa1586a
retry 500 and 503 errors against kinesis (#10059)
* retry 500 and 503 errors against kinesis

* add test that exercises retry logic

* more branch coverage

* retry 500 and 503 on getRecords request when fetching sequence numberu

Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>
2020-06-23 15:49:34 -07:00
Dylan Wylie 0470fcc9da
change default number of segment loading threads (#9856)
* change default number of segment loading threads

* fix docs

* missed file

* min -> max for segment loading threads

Co-authored-by: Dylan <dwylie@spotx.tv>
2020-06-23 13:56:44 -07:00
Clint Wylie eee99ff0d5
minor rework of topn algorithm selection for clarity and more javadocs (#10058)
* minor refactor of topn engine algorithm selection for clarity

* adjust

* more javadoc
2020-06-22 09:08:50 -07:00
Jianhuan Liu 5600e1c204
fix docs error in hadoop-based part (#9907)
* fix docs error: google to azure and hdfs to http

* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part

* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part

* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part
2020-06-19 23:14:54 -10:00
Maytas Monsereenusorn 9bab6b6371
SketchAggregator.updateUnion should handle null inside List update object (#10055) 2020-06-19 20:29:25 -07:00
Maytas Monsereenusorn 191572ad5e
Add safeguard to make sure new Rules added are aware of Rule usage in loadstatus API (#10054)
* Add safeguard to make sure new Rules added are aware of Rule usuage in loadstatus API

* address comments

* address comments

* add tests
2020-06-19 20:18:56 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Jihoon Son d644a27f1a
Create packed core partitions for hash/range-partitioned segments in native batch ingestion (#10025)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports
2020-06-18 18:40:43 -07:00
Suneet Saldanha b8a3223f24
Remove changes from #9114 (#10050) 2020-06-18 18:18:12 -07:00
Maytas Monsereenusorn 857e5204bf
Coordinator loadstatus API full format does not consider Broadcast rules (#10048)
* Coordinator loadstatus API full format does not consider Broadcast rules

* address comments

* fix checkstyle

* minor optimization

* address comments
2020-06-18 17:52:33 -07:00
Clint Wylie b5e6569d2c
global table only if joinable (#10041)
* global table if only joinable

* oops

* fix style, add more tests

* Update sql/src/test/java/org/apache/druid/sql/calcite/schema/DruidSchemaTest.java

* better information schema columns, distinguish broadcast from joinable

* fix javadoc

* fix mistake

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-06-18 17:32:10 -07:00
litao a4bd144ebe
fix docs (#9114)
Co-authored-by: tomscut <tomscut@gmail.com>
2020-06-18 09:48:47 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
Samarth Jain 3527458f85
Druid Avatica - Handle escaping of search characters correctly (#10040)
Fix Avatica based metadata queries by appending ESCAPE '\' clause to the LIKE expressions
2020-06-17 20:01:31 -07:00
Maytas Monsereenusorn 7569ee3ec6
All aggregators should check if column can be vectorize (#10026)
* All aggregators should use vectorization-aware column processor

* All aggregators should use vectorization-aware column processor

* fix canVectorize

* fix canVectorize

* add tests

* revert back default

* address comment

* address comments

* address comment

* address comment
2020-06-17 01:52:02 -10:00
Maytas Monsereenusorn 1a2620606d
API to verify a datasource has the latest ingested data (#9965)
* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix checksyle

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* API to verify a datasource has the latest ingested data

* fix spelling

* address comments

* fix checkstyle

* update docs

* fix tests

* fix doc

* address comments

* fix typo

* fix spelling

* address comments

* address comments

* fix typo in docs
2020-06-16 20:48:30 -10:00
Clint Wylie 68aa384190
global table datasource for broadcast segments (#10020)
* global table datasource for broadcast segments

* tests

* fix

* fix test

* comments and javadocs

* review stuffs

* use generated equals and hashcode
2020-06-16 17:58:05 -07:00
Suneet Saldanha 4e483a70b4
ROUND and having comparators correctly handle special double values (#10014)
* ROUND and having comparators correctly handle doubles

Double.NaN, Double.POSITIVE_INFINITY and Double.NEGATIVE_INFINITY are not real
numbers. Because of this, they can not be converted to BigDecimal and instead
throw a NumberFormatException.

This change adds support for calculations that produce these numbers either
for use in the `ROUND` function or the HavingSpecMetricComparator by not
attempting to convert the number to a BigDecimal.

The bug in ROUND was first introduced in #7224 where we added the ability to
round to any decimal place. This PR changes the behavior back to using
`Math.round` if we recognize a number that can not be converted to a
BigDecimal.

* Add tests and fix spellcheck

* update error message in ExpressionsTest

* Address comments

* fix up round for infinity

* round non numeric doubles returns a double

* fix spotbugs

* Update docs/misc/math-expr.md

* Update docs/querying/sql.md
2020-06-16 16:09:46 -07:00
Gian Merlino 9330ca9717
Remove LegacyDataSource. (#10037)
* Remove LegacyDataSource.

Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.

* Slight test improvement.
2020-06-16 14:40:35 -07:00
Clint Wylie 9468df4721
make phaser of ReferenceCountingCloseableObject protected instead of private so subclasses can do stuff with it (#10035) 2020-06-15 19:56:49 -07:00
agricenko cad9eea15d
Integration test docker compose readme (#10016)
* Integration Tests. Docker-compose readme part

* Readme updates. PR fixes

Co-authored-by: agritsenko <agritsenko@provectus.com>
2020-06-15 14:48:34 -10:00
Suneet Saldanha 0035f39e25
lpad and rpad functions match postrges behavior in SQL compatible mode (#10006)
* lpad and rpad functions deal with empty pad

Return null if the pad string used by the `lpad` and `rpad` functions is
an empty string

* Fix rpad

* Match PostgreSQL behavior in SQL compliant null handling mode

* Match PostgreSQL behavior for pad -ve len

* address review comments
2020-06-15 10:47:57 -07:00
Jihoon Son 9a10f8352b
Set the core partition set size properly for batch ingestion with dynamic partitioning (#10012)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle
2020-06-12 21:39:37 -07:00
Clint Wylie 8a7e7e773a
fix balancer + broadcast segments npe (#10021) 2020-06-12 13:09:22 -07:00
Jonathan Wei fe2f656427
Fix broadcast rule drop and docs (#10019)
* Fix broadcast rule drop and docs

* Remove racy test check

* Don't drop non-broadcast segments on tasks, add overshadowing handling

* Don't use realtimes for overshadowing

* Fix dropping for ingestion services
2020-06-12 02:33:28 -07:00
Chi Cao Minh 67669b4ad4
Fix CVE-2020-13602 (#10024)
Upgrade postgres jdbc driver to latest version to address CVE, which was
fixed in 42.2.13.
2020-06-11 17:30:13 -07:00
Maytas Monsereenusorn 5d35f3e080
Remove colocated datasources from web console for broadcast indexed tables (#10018) 2020-06-11 14:08:03 -10:00
Stefan Birkner 369ed2503e
Remove duplicate parameters from test (#10022)
Commit 771870ae2d removed constructor
arguments from the rules. Therefore multiple parameters of the test are
now the same and can be removed.
2020-06-11 14:15:02 -07:00
Maytas Monsereenusorn c6d2b94d7b
Add instruction for code coverage checks (#9995)
* Add instruction for code coverage checks

* address comments
2020-06-10 18:17:36 -10:00
Clint Wylie 96eb69e475
ignore brokers in broker views (#10017) 2020-06-10 12:29:30 -07:00
danc 5da78d13af
Update password-provider.md (#9857) 2020-06-10 09:32:49 -07:00
Stefan Birkner 7282e2f2f9
Simplify CompressedVSizeColumnarIntsSupplierTest (#10003)
The parameters generator uses CompressionStrategy.noNoneValues() instead
of CompressionStrategyTest.compressionStrategies() which wrapped each
strategy in a single element array. This improves readability of the
test.
2020-06-10 09:32:00 -07:00
BIGrey d4d0004338
Fix failed tests in TimestampParserTest when running locally (#9997)
* fix failed tests in TimestampPaserTest due to timezone

* remove unneeded -Duser.country=US

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>
2020-06-10 09:19:38 -07:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
Clint Wylie 1c9ca55247
remove incorrect and unnecessary overrides from BooleanVectorValueMatcher (#9994)
* remove incorrect and unnecessary overrides from BooleanVectorValueMatcher

* add test case

* add unit tests for ... part of VectorValueMatcherColumnProcessorFactory

* Update VectorValueMatcherColumnProcessorFactoryTest.java
2020-06-09 19:32:16 -07:00