10469 Commits

Author SHA1 Message Date
Maytas Monsereenusorn 4e8570b71b
Add integration tests for all InputFormat (#10088)
* Add integration tests for Avro OCF InputFormat

* Add integration tests for Avro OCF InputFormat

* add tests

* fix bug

* fix bug

* fix failing tests

* add comments

* address comments

* address comments

* address comments

* fix test data

* reduce resource needed for IT

* remove bug fix

* fix checkstyle

* add bug fix
2020-07-08 12:50:29 -07:00
Maytas Monsereenusorn 859ff6e9c0
Reduce memory footprint of integration test by not starting unneeded containers (#10150)
* Reduce memory footprint of integration test

* fix README

* fix README

* fix error in script

* fix security IT
2020-07-08 09:46:18 -07:00
Franklyn Dsouza 1b9aacb1cd
Fix avg sql aggregator (#10135)
* new average aggregator

* method to create count aggregator factory

* test everything

* update other usages

* fix style

* fix more tests

* fix datasketches tests
2020-07-08 08:38:56 -07:00
Jihoon Son c776e412e0
Update dictionary for spell check (#10152) 2020-07-07 23:12:39 -07:00
Gian Merlino 11c0da8097
Add availability and consistency docs. (#10149)
* Add availability and consistency docs.

Describes transactional ingestion and atomic replacement. Also, this patch
deletes some bad advice from the javadocs for SegmentTransactionalInsertAction.

* Fix missing word.
2020-07-07 15:22:52 -07:00
Egor Riashin d54a5e009f
ui: fix missing columns during Transform step (#10086)
Co-authored-by: egor-ryashin <egor.ryashin@metamarkets.com>
2020-07-07 15:16:52 -07:00
Gian Merlino eeaf609fc0
Update Jetty to 9.4.30.v20200611. (#10098)
* Update Jetty to 9.4.30.v20200611.

This is the latest version currently available in the 9.4.x line.

* Various adjustments.

* Class name fixes.

* Remove unused HttpClientModule code.

* Add coverage suppressions.

* Another coverage suppression.

* Fix wildcards.
2020-07-07 14:24:02 -07:00
Parag Jain 98ac7dfeff
mask secrets in MM task command log (#10128)
* mask secrets in MM task command log

* unit test for masked iterator

* checkstyle fix
2020-07-07 10:25:15 -07:00
Clint Wylie 010fe047e1
AbstractOptimizableDimFilter should be public (#10142) 2020-07-06 15:19:32 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Gian Merlino ddda2a4f18
VersionedIntervalTimeline: Fix thread-unsafe call to "lookup". (#10130) 2020-07-05 09:32:18 -07:00
Fullstop000 bcf41922ce
Remove unsupported task types in doc (#10111) 2020-07-04 18:13:53 -07:00
Jihoon Son 2b93dc6019
Fix CachingClusteredClient when querying specific segments (#10125)
* Fix CachingClusteredClient when querying specific segments

* delete useless test

* roll back timeout
2020-07-02 17:50:50 -07:00
Jonathan Wei ed981ef88e
Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters (#10056)
* Ensure that join filter pre-analysis operates on optimized filters, add DimFilter.toOptimizedFilter

* Remove aggressive equality check that was used for testing

* Use Suppliers.memoize

* Checkstyle
2020-07-01 22:26:17 -07:00
Atul Mohan 367eaedbb4
Clarify change in behavior for druid.server.maxSize (#10105)
* Clarify maxSize docs

* Add info about maxSize

Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-07-01 22:22:18 -07:00
frank chen 60c6bd5b4c
support Aliyun OSS service as deep storage (#9898)
* init commit, all tests passed

* fix format

Signed-off-by: frank chen <frank.chen021@outlook.com>

* data stored successfully

* modify config path

* add doc

* add aliyun-oss extension to project

* remove descriptor deletion code to avoid warning message output by aliyun client

* fix warnings reported by lgtm-com

* fix ci warnings

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix errors reported by IntelliJ inspection check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc spelling check

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix dependency warnings reported by ci

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix warnings reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add package configuration to support showing extension info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add IT test cases and fix bugs

Signed-off-by: frank chen <frank.chen021@outlook.com>

* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'

Signed-off-by: frank chen <frank.chen021@outlook.com>

* add license info

Signed-off-by: frank chen <frank.chen021@outlook.com>

* fix doc

Signed-off-by: frank chen <frank.chen021@outlook.com>

* exclude execution of IT testcases of OSS extension from CI

Signed-off-by: frank chen <frank.chen021@outlook.com>

* put the extensions under contrib group and add to distribution

* fix names in test cases

* add unit test to cover OssInputSource

* fix names in test cases

* fix dependency problem reported by CI

Signed-off-by: frank chen <frank.chen021@outlook.com>
2020-07-01 22:20:53 -07:00
Samarth Jain e2c5bcc22d
Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning … (#10123)
* Fix UnknownComplexTypeColumn#makeVectorObjectSelector. Add a warning message to indicate failure in deserializing.
2020-07-01 20:06:23 -07:00
Clint Wylie c5540f46ed
fixes for ranger docs (#10109) 2020-07-01 18:26:41 -07:00
Maytas Monsereenusorn 1676ba22e3
Fix Stack overflow with infinite loop in ReduceExpressionsRule of HepProgram (#10120)
* Fix Stack overflow with SELECT ARRAY ['Hello', NULL]

* address comments
2020-07-01 17:48:09 -07:00
Clint Wylie 477335abb4
update links datasketches.github.io to datasketches.apache.org (#10107)
* update links datasketches.github.io to datasketches.apache.org

* now with more apache

* oops

* oops
2020-07-01 14:56:17 -07:00
Samarth Jain 3e92cdf1cf
Revert "Fix UnknownTypeComplexColumn#makeVectorObjectSelector" (#10121)
This reverts commit 7bb7489afc.
2020-07-01 14:33:17 -07:00
Clint Wylie a337ef351c
Closing yielder from ParallelMergeCombiningSequence should trigger cancellation (#10117)
* cancel parallel merge combine sequence on yielder close

* finish incomplete comment

* Update core/src/test/java/org/apache/druid/java/util/common/guava/ParallelMergeCombiningSequenceTest.java

Fixes checkstyle

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2020-07-01 14:07:44 -07:00
Jihoon Son 657f8ee80f
Fix RetryQueryRunner to actually do the job (#10082)
* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate
2020-07-01 14:02:21 -07:00
samarthjain 7bb7489afc Fix UnknownTypeComplexColumn#makeVectorObjectSelector 2020-07-01 12:02:23 -07:00
Surekha d3497a6581
Filter on metrics doc (#10087)
* add note about filter on metrics to filter docs

* edit doc to include having and filtered aggregator links
2020-06-30 19:52:40 -07:00
Gian Merlino 5faa897a34
Join filter pre-analysis simplifications and sanity checks. (#10104)
* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.
2020-06-30 19:14:22 -07:00
Lee Rhodes 7b4edc93fc
Update web address to datasketches.apache.org (#10096) 2020-06-30 19:05:23 -07:00
Samarth Jain 2c1b45842f
Prevent unknown complex types from breaking DruidSchema refresh (#9422) 2020-06-30 14:06:17 -07:00
Mohammad Shoaib 84290a2332
Enabling Static Imports for Unit Testing DSLs (#331) (#9764)
* Enabling Static Imports for Unit Testing DSLs (#331)

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Feature 8885 - Enabling Static Imports for Unit Testing DSLs (#435)

* Enabling Static Imports for Unit Testing DSLs

* Using suppressions checkstyle to allow static imports only in the UTs

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>

* Removing the changes in the checkstyle because those are not needed

Co-authored-by: mohammadshoaib <mohammadshoaib@miqdigital.com>
2020-06-30 13:59:35 -07:00
Vadim Ogievetsky c01fd56182
Web console: allow link overrides for docs, and more (#10100)
* link overrides

* change doc version

* fix snapshots
2020-06-30 12:46:50 -07:00
Yuanli Han fc555980e8
Remove payload field from table sys.segments (#9883)
* remove payload field from table sys.segments

* update doc

* fix test

* fix CI failure

* add necessary fields

* fix doc

* fix comment
2020-06-29 22:20:23 -07:00
Clint Wylie 4a625751e8
Information schema doc update (#10081)
* add docs for IS_JOINABLE and IS_BROADCAST to INFORMATION_SCHEMA docs

* fixes

* oops

* revert noise

* missed one

* spellbot
2020-06-29 21:08:13 -07:00
Suneet Saldanha 363d0d86be
QueryCountStatsMonitor can be injected in the Peon (#10092)
* QueryCountStatsMonitor can be injected in the Peon

This change fixes a dependency injection bug where there is a circular
dependency on getting the MonitorScheduler when a user configures the
QueryCountStatsMonitor to be used.

* fix tests

* Actually fix the tests this time
2020-06-29 21:03:07 -07:00
BIGrey 69f2b1ef00
Correct the position of the double quotation in distinctcount.md file (#10094)
```
"dimensions": "[sample_dim]"
```
should be
```
"dimensions": ["sample_dim"]
```
2020-06-29 20:59:56 -07:00
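To illustrate why the quote placement in #10094 matters, here is a minimal sketch using Python's standard `json` module. The surrounding braces are added here only so that each snippet parses as a complete JSON object; they are not part of the original doc fix.

```python
import json

# The two snippets quoted in the commit message, wrapped in braces
# (an addition for this sketch) so they parse as JSON objects.
before = json.loads('{"dimensions": "[sample_dim]"}')
after = json.loads('{"dimensions": ["sample_dim"]}')

# With the misplaced quotes the value is a single string,
# not a list of dimension names.
print(type(before["dimensions"]).__name__)
print(type(after["dimensions"]).__name__)
```

Both forms are valid JSON, which is presumably why the misplaced quote is easy to overlook: only the second yields a list containing one dimension name.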
Suneet Saldanha b91a16943b
Make 0.19 brokers compatible with 0.18 router (#10091)
* Make brokers backwards compatible

In 0.19, Brokers gained the ability to serve segments. To support this change,
a `BROKER` ServerType was added to `druid.server.coordination`.

Druid nodes prior to this change do not know of this new server type and so
they would fail to deserialize this node's announcement.

This change makes it so that the broker only announces itself if the segment
cache is configured on the broker. It is expected that a Druid admin will only
configure the segment cache on the broker once the cluster has been upgraded
to a version that supports a broker using the segment cache.

* make code nicer

* Add tests

* Ignore code coverage for initialization classes

* Revert "Ignore code coverage for initialization classes"

This reverts commit aeec0c2ac2.

* code review
2020-06-29 20:57:33 -07:00
Atul Mohan 0841c89df6
Fix nullhandling exception (#10095)
Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>
2020-06-29 20:55:38 -07:00
Chi Cao Minh 33a37d85d7
Fix native batch range partition segment sizing (#10089)
* Fix native batch range partition segment sizing

Fixes #10057.

Native batch range partitioning was only considering the partition
dimension value when grouping rows instead of using all of the row's
partition values. Thus, for schemas with multiple dimensions, the rollup
was overestimated, which would cause too many dimension values to be
packed into the same range partition. The resulting segments would then
be overly large (and not honor the target or max partition sizes).

Main changes:

- PartialDimensionDistributionTask: Consider all dimension values when
  grouping rows

- RangePartitionMultiPhaseParallelIndexingTest: Regression test by
  having input with rows that should roll up and rows that should not
  roll up

* Use hadoop & native hash ingestion row group key
2020-06-29 17:49:52 -07:00
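The grouping problem described in #10089 above can be sketched roughly as follows. This is an illustration only, not Druid's code: `count_groups`, the sample rows, and the dimension names are all hypothetical, and timestamps are left out for brevity.

```python
def count_groups(rows, key_dims):
    """Count the distinct group keys produced when grouping rows on key_dims."""
    return len({tuple(row[d] for d in key_dims) for row in rows})


# Hypothetical input: "country" is the partition dimension.
rows = [
    {"country": "US", "city": "NYC"},
    {"country": "US", "city": "SF"},
    {"country": "US", "city": "SF"},   # rolls up with the row above
    {"country": "FR", "city": "Paris"},
]

# Grouping on the partition dimension alone suggests the rows
# collapse into fewer groups than they really do.
partition_only = count_groups(rows, ["country"])

# Grouping on all dimension values reflects the actual rollup.
all_dims = count_groups(rows, ["country", "city"])

print(partition_only, all_dims)
```

Under these assumptions, keying on the partition dimension alone undercounts the rows that remain after rollup, which matches the commit's explanation of why too many values were packed into one range partition.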
Jihoon Son 8ef3598c05
Move shardSpec tests to core (#10079)
* Move shardSpec tests to core

* checkstyle

* inject object mapper for testing

* unused import
2020-06-29 17:31:37 -07:00
Suneet Saldanha 15a0b4ffe2
Filter http requests by http method (#10085)
* Filter http requests by http method

Add a config that lets a user specify which HTTP methods to allow against their
Druid server.

Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.

If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config

* Exclude OPTIONS from always supported HTTP methods

Add HEAD as an allowed method for web console e2e tests

* fix docs

* fix security IT

* Actually fix the web console e2e tests

* Ignore code coverage for initialization classes

* code review
2020-06-29 16:59:31 -07:00
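The allow-list behavior described in the first bullet of #10085 above might look roughly like this. It is a hedged sketch, not Druid's implementation: `ALWAYS_ALLOWED`, `is_method_allowed`, and `extra_allowed` are hypothetical names, with `extra_allowed` standing in for the `ServerConfig#allowedHttpMethods` setting. A later bullet in the same commit removes OPTIONS from the always-supported set, so the default set here follows the initial description only.

```python
# Default methods named in the commit description (assumption: this mirrors
# the first bullet only; a later bullet excludes OPTIONS from this set).
ALWAYS_ALLOWED = frozenset({"GET", "PUT", "POST", "DELETE", "OPTIONS"})


def is_method_allowed(method, extra_allowed=()):
    """Return True if the method is in the default set or in the extra list."""
    return method in ALWAYS_ALLOWED or method in set(extra_allowed)


print(is_method_allowed("GET"))
print(is_method_allowed("TRACE"))
print(is_method_allowed("HEAD", extra_allowed=["HEAD"]))
```

Passing `extra_allowed=["HEAD"]` corresponds to the commit's note that HEAD was added as an allowed method for the web console e2e tests.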
Will Xu 35c7c0ec25
Segment timeline doesn't show results older than 3 months (#9956)
* Segment timeline doesn't show results older than 3 months

* Adoption testing patch for web segment timeline view and also refactoring default time config
2020-06-28 01:45:05 -07:00
chenyuzhi459 a4c6d5f37e
fix query memory leak (#10027)
* fix query memory leak

* rollup ./idea

* roll up .idea

* clean code

* optimize style

* optimize cancel function

* optimize style

* add concurrentGroupTest test case

* add test case

* add unit test

* fix code style

* optimize cancel method use

* format code

* reback code

* optimize cancelAll

* clean code

* add comment
2020-06-26 23:30:59 -07:00
Jian Wang 20fd72bd13
Fix NPE when brokers use custom priority list (#9878) 2020-06-26 17:28:54 -07:00
xiangqiao123 405ebdcaaf
fix MaterializedView groupBy query return array result by default (#9936)
* fix bug:MaterializedView groupBy query return map result by default

* add unit test

* add unit test

* add unit test

* fix bug:MaterializedView groupBy query return map result by default

* add unit test

* add unit test

* add unit test

* update pr

* update pr

Co-authored-by: xiangqiao <xiangqiao@kuaishou.com>
2020-06-26 16:52:04 -07:00
Clint Wylie 4b99c6d3ef
ensure ParallelMergeCombiningSequence closes its closeables (#10076)
* ensure close for all closeables of ParallelMergeCombiningSequence

* revert unneeded change

* consolidate methods

* catch throwable instead of exception
2020-06-26 14:37:20 -07:00
Maytas Monsereenusorn ec46d82c71
Add integration tests for SqlInputSource (#10080)
* Add integration tests for SqlInputSource

* make it faster
2020-06-26 10:32:42 -10:00
Jihoon Son c591ff8ea8
Add NonnullPair (#10013)
* Add NonnullPair

* new line

* test

* make it consistent
2020-06-26 09:52:06 -07:00
Suneet Saldanha b7d771f633
More prominent instructions on code coverage failure (#10060)
* More prominent instructions on code coverage failure

* Update .travis.yml
2020-06-25 19:48:30 -07:00
morrifeldman f6594fff60
Fix missing temp dir for native single_dim (#10046)
* Fix missing temp dir for native single_dim

Native single dim indexing throws a file not found exception from
InputEntityIteratingReader.java:81.  This MR creates the required
temporary directory when setting up the
PartialDimensionDistributionTask.  The change was tested on a Druid
cluster.  After installing the change native single_dim indexing
completes successfully.

* Fix indentation

* Use SinglePhaseSubTask as example for creating the temp dir

* Move temporary indexing dir creation in to TaskToolbox

* Remove unused dependency

Co-authored-by: Morri Feldman <morri@appsflyer.com>
2020-06-25 14:41:22 -07:00
Jihoon Son aaee72c781
Allow append to existing datasources when dynamic partitioning is used (#10033)
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning

* incomplete javadoc

* Address comments

* fix tests

* fix json serde, add tests

* checkstyle

* Set core partition set size for hash-partitioned segments properly in
batch ingestion

* test for both parallel and single-threaded task

* unused variables

* fix test

* unused imports

* add hash/range buckets

* some test adjustment and missing json serde

* centralized partition id allocation in parallel and simple tasks

* remove string partition chunk

* revive string partition chunk

* fill numCorePartitions for hadoop

* clean up hash stuffs

* resolved todos

* javadocs

* Fix tests

* add more tests

* doc

* unused imports

* Allow append to existing datasources when dynamic partitioning is used

* fix test

* checkstyle

* checkstyle

* fix test

* fix test

* fix other tests..

* checkstyle

* handle unknown core partitions size in overlord segment allocation

* fail to append when numCorePartitions is unknown

* log

* fix comment; rename to be more intuitive

* double append test

* cleanup complete(); add tests

* fix build

* add tests

* address comments

* checkstyle
2020-06-25 13:37:31 -07:00
Clint Wylie 0f51b3c190
fix dropwizard emitter jvm bufferpoolName metric (#10075)
* fix dropwizard emitter jvm bufferpoolName metric

* fixes
2020-06-25 12:20:25 -07:00