Commit Graph

792 Commits

Author SHA1 Message Date
frank chen e83d5cb59e
Fix ingestion failure of pretty-formatted JSON message (#10383)
* support multi-line text

* add test cases

* split json text into lines case by case

* improve exception handle

* fix CI

* use IntermediateRowParsingReader as base of JsonReader

* update doc

* ignore the non-immutable field in test case

* add more test cases

* mark `lineSplittable` as final

* fix testcases

* fix doc

* add a test case for SqlReader

* return all raw columns when exception occurs

* fix CI

* fix test cases

* resolve review comments

* handle ParseException returned by index.add

* apply Iterables.getOnlyElement

* fix CI

* fix test cases

* improve code in more graceful way

* fix test cases

* fix test cases

* add a test case to check multiple json string in one text block

* fix inspection check
2020-11-13 13:59:23 -08:00
Clint Wylie 6563599de4
modify access to protected SQLMetadataConnector methods to allow extensions to create SQL metadata tables using implementation specific constructs (payload type, serial type, etc) (#10573) 2020-11-12 23:20:01 +05:30
Clint Wylie d0821de854
support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499)
* support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions

* inspector

* changes

* more test

* clean
2020-10-26 19:55:24 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Joseph Glanville 7ce9ac4548
Fix Avro support in Web Console (#10232)
* Fix Avro OCF detection prefix and run formation detection on raw input

* Support Avro Fixed and Enum types correctly

* Check Avro version byte in format detection

* Add test for AvroOCFReader.sample

Ensures that the Sampler doesn't receive raw input that it can't
serialize into JSON.

* Document Avro type handling

* Add TS unit tests for guessInputFormat
2020-10-07 21:08:22 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Lasse Krogh Mammen 20ca9aaaf7
Allow using jsonpath predicates with AvroFlattener (#10330) 2020-10-02 10:14:48 +01:00
Abhishek Agarwal d057c5149f
Fix the offset setting in GoogleStorage#get (#10449)
* Fix the offset in get of GCP object

* upgrade compute dependency

* fix version

* review comments

* missed
2020-10-01 08:38:58 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Tarun 49a09302f3
Issue fix for CSV loading with header and skip header not parsing well. (#10398) 2020-09-21 15:14:22 -07:00
Igor Dvorzhak d0ee2e3a48
Upgrade ORC to 1.5.10 version (#10291) 2020-09-18 13:38:45 -07:00
belugabehr 74368d95af
Remove JODA Time Dependency from Avro Extensions (#10010) 2020-09-18 12:41:42 -07:00
Suneet Saldanha 0b4c897fbe
Vectorized variance aggregators (#10390)
* wip vectorize

* close but not quite

* faster

* unit tests

* fix complex types for variance
2020-09-17 15:05:40 -07:00
Clint Wylie 184b202411
add computed Expr output types (#10370)
* push down ValueType to ExprType conversion, tidy up

* determine expr output type for given input types

* revert unintended name change

* add nullable

* tidy up

* fixup

* more better

* fix signatures

* naming things is hard

* fix inspection

* javadoc

* make default implementation of Expr.getOutputType that returns null

* rename method

* more test

* add output for contains expr macro, split operation and function auto conversion
2020-09-14 18:18:56 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Abhishek Agarwal a5c46dc84b
Add vectorization for druid-histogram extension (#10304)
* First draft

* Remove redundant code from FixedBucketsHistogramAggregator classes

* Add test cases for new classes

* Fix tests in sql compatible mode

* Typo fix

* Fix comment

* Add spelling

* Vectorize only for supported types

* Rename internal aggregator files

* Fix tests
2020-09-09 13:56:33 -07:00
Suneet Saldanha a5cd5f1e84
Fix VARIANCE aggregator comparator (#10340)
* Fix VARIANCE aggregator comparator

The comparator for the variance aggregator used to compare values using the
count. This is now fixed to compare values using the variance. If the variance
is equal, the count and sum are used as tie breakers.

* fix tests + sql compatible mode

* code review

* more tests

* fix last test
2020-09-03 17:38:37 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00
Jihoon Son f82fd22fa7
Move tools for indexing to TaskToolbox instead of injecting them in constructor (#10308)
* Move tools for indexing to TaskToolbox instead of injecting them in constructor

* oops, other changes

* fix test

* unnecessary new file

* fix test

* fix build
2020-08-26 17:08:12 -07:00
Abhishek Agarwal d4ac62f284
Handle internal kinesis sequence numbers when reporting lag (#10315)
* Handle internal kinesis sequence numbers when reporting lag

* add unit test
2020-08-26 11:27:37 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Jihoon Son b5b3e6ecce
Add maxNumFiles to splitHintSpec (#10243)
* Add maxNumFiles to splitHintSpec

* missing link

* fix build failure; use maxNumFiles for integration tests

* spelling

* lower default

* Update docs/ingestion/native-batch.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* address comments; change default maxSplitSize

* spelling

* typos and doc

* same change for segments splitHintSpec

* fix build

* fix build

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2020-08-21 09:43:58 -07:00
Xavier Léauté 225490474d
Update Kafka dependencies to 2.6.0 (#10286)
* update Kafka dependencies to Kafka 2.6.0
* switch to Scala 2.13 build of Kafka
* update integration tests
* update Kafka tutorial
2020-08-15 07:56:40 -07:00
Clint Wylie cfb7a893e7
fill out missing test coverage for druid-datasketches postaggs (#9730)
* fill out missing test coverage for druid-datasketches postaggs

* fixup

* fixup merge

* oops

* oops again
2020-07-31 10:08:07 -07:00
Suneet Saldanha e6c9142129
Add validation for authenticator and authorizer name (#10106)
* Add validation for authorizer name

* fix deps

* add javadocs

* Do not use resource filters

* Fix BasicAuthenticatorResource as well

* Add integration tests

* fix test

* fix
2020-07-13 21:15:54 -07:00
Suneet Saldanha 58f2e51161
Do not echo back username on auth failure (#10097)
* Do not echo back username on auth failure

* use bad username

* Remove username from exception messages

* fix tests

* fix the tests

* hopefully this time

* this time the tests work

* fixed this time

* fix

* upgrade to Jetty 9.4.30

* Unknown users echo back Unauthorized

* fix
2020-07-10 12:19:10 -07:00
Maytas Monsereenusorn 4e8570b71b
Add integration tests for all InputFormat (#10088)
* Add integration tests for Avro OCF InputFormat

* Add integration tests for Avro OCF InputFormat

* add tests

* fix bug

* fix bug

* fix failing tests

* add comments

* address comments

* address comments

* address comments

* fix test data

* reduce resource needed for IT

* remove bug fix

* fix checkstyle

* add bug fix
2020-07-08 12:50:29 -07:00
Franklyn Dsouza 1b9aacb1cd
Fix avg sql aggregator (#10135)
* new average aggregator

* method to create count aggregator factory

* test everything

* update other usages

* fix style

* fix more tests

* fix datasketches tests
2020-07-08 08:38:56 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jonathan Wei ed981ef88e
Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056)
* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter

* Remove aggressive equality check that was used for testing

* Use Suppliers.memoize

* Checkstyle
2020-07-01 22:26:17 -07:00
Clint Wylie 477335abb4
update links datasketches.github.io to datasketches.apache.org (#10107)
* update links datasketches.github.io to datasketches.apache.org

* now with more apache

* oops

* oops
2020-07-01 14:56:17 -07:00
Suneet Saldanha 363d0d86be
QueryCountStatsMonitor can be injected in the Peon (#10092)
* QueryCountStatsMonitor can be injected in the Peon

This change fixes a dependency injection bug where there is a circular
dependency on getting the MonitorScheduler when a user configures the
QueryCountStatsMonitor to be used.

* fix tests

* Actually fix the tests this time
2020-06-29 21:03:07 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that allows a user which http methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore icode coverage for nitialization classes

* code review
2020-06-29 16:59:31 -07:00
xhl0726 1596b3eacd
Optimize protobuf parsing for flatten data (#9999)
* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
2020-06-24 18:01:31 -07:00
Harshpreet Singh d96aa1586a
retry 500 and 503 errors against kinesis (#10059)
* retry 500 and 503 errors against kinesis

* add test that exercises retry logic

* more branch coverage

* retry 500 and 503 on getRecords request when fetching sequence numberu

Co-authored-by: Harshpreet Singh <hrshpr@twitch.tv>
2020-06-23 15:49:34 -07:00
Maytas Monsereenusorn 9bab6b6371
SketchAggregator.updateUnion should handle null inside List update object (#10055) 2020-06-19 20:29:25 -07:00
Aleksey Plekhanov 2c384b61ff
IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()" (#9690)
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"

* Reverted checkstyle rule

* Added tests to pass CI

* Codestyle
2020-06-18 09:47:07 -07:00
Jonathan Wei 771870ae2d
Load broadcast datasources on broker and tasks (#9971)
* Load broadcast datasources on broker and tasks

* Add javadocs

* Support HTTP segment management

* Fix indexer maxSize

* inspection fix

* Make segment cache optional on non-historicals

* Fix build

* Fix inspections, some coverage, failed tests

* More tests

* Add CliIndexer to MainTest

* Fix inspection

* Rename UnprunedDataSegment to LoadableDataSegment

* Address PR comments

* Fix
2020-06-08 20:15:59 -07:00
Maytas Monsereenusorn 790e9482ea
Fix Subquery could not be converted to groupBy query (#9959)
* Fix join

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* Fix Subquery could not be converted to groupBy query

* add tests

* address comments

* fix failing tests
2020-06-03 16:46:28 -07:00
Gian Merlino 3dfd7c30c0
Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT. (#9893)
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.

- Add REGEXP_LIKE function that returns a boolean, and is useful in
  WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
  filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
  (previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
  non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.

* Changes based on PR review.

* Fix arg check.

* Important fixes!

* Add speller.

* wip

* Additional tests.

* Fix up tests.

* Add validation error tests.

* Additional tests.

* Remove useless call.
2020-06-03 14:31:37 -07:00
Xavier Léauté a934b2664c
remove ListenableFutures and revert to using the Guava implementation (#9944)
This change removes ListenableFutures.transformAsync in favor of the
existing Guava Futures.transform implementation. Our own implementation
had a bug which did not fail the future if the applied function threw an
exception, resulting in the future never completing.

An attempt was made to fix this bug, however when running againts Guava's own
tests, our version failed another half dozen tests, so it was decided to not
continue down that path and scrap our own implementation.

Explanation for how was this bug manifested itself:

An exception thrown in BaseAppenderatorDriver.publishInBackground when
invoked via transformAsync in StreamAppenderatorDriver.publish will
cause the resulting future to never complete.

This explains why when encountering https://github.com/apache/druid/issues/9845
the task will never complete, forever waiting for the publishFuture to
register the handoff. As a result, the corresponding "Error while
publishing segments ..." message only gets logged once the index task
times out and is forcefully shutdown when the future is force-cancelled
by the executor.
2020-06-03 10:46:03 -07:00
Clint Wylie c2c38f6ac2
only close exec if it exists (#9952) 2020-05-29 20:09:34 -07:00
Xavier Léauté 65280a6953
update kafka client version to 2.5.0 (#9902)
- remove dependency on deprecated internal Kafka classes
- keep LZ4 version in line with the version shipped with Kafka
2020-05-27 13:20:32 -07:00
Clint Wylie 2e9548d93d
refactor SeekableStreamSupervisor usage of RecordSupplier (#9819)
* refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting

* fix style and test

* cleanup, refactor, javadocs, test

* fixes

* keep collecting current offsets and lag if unhealthy in background reporting thread

* review stuffs

* add comment
2020-05-16 14:09:39 -07:00
Alexander Saydakov 522df300c2
Datasketches 1 3 0 (#9880)
* use the latest datasketches release

* new sketch debug print

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2020-05-16 14:09:23 -07:00
Joseph Glanville 793f386d6a
Add support for Avro OCF using InputFormat (#9671)
* Add AvroOCFInputFormat

* Support supplying a reader schema in AvroOCFInputFormat

* Add docs for Avro OCF input format

* Address review comments

* Address second round of review
2020-05-16 14:09:12 -07:00
Jihoon Son 46beaa0640
Fix potential resource leak in ParquetReader (#9852)
* Fix potential resource leak in ParquetReader

* add test

* never thrown exception

* catch potential exceptions
2020-05-16 09:57:12 -07:00
zachjsh 80b212fe43
druid.storage.maxListingLength should default to 1000 for s3 (#9858)
* druid.storage.maxListingLength should default to 1000 for s3

* * Address review comments

* * Address review comments

* * Address comments
2020-05-14 07:00:51 -07:00
mcbrewster 28be107a1c
add flag to flattenSpec to keep null columns (#9814)
* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColmns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns

* Add equals verifier tests
2020-05-08 21:53:39 -07:00
Clint Wylie 339876b69d
fill out missing test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch postaggs (#9740)
* postagg test coverage for druid-stats, druid-momentsketch, druid-tdigestsketch and fixes

* style fixes

* fix comparator for TDigestQuantilePostAggregator
2020-05-07 13:48:33 -07:00