Commit Graph

11123 Commits

Author SHA1 Message Date
Peter Marshall 0de1837ff7
Docs - partitioning note re: skew / dim concatenation + nav update (#11488)
* Update native-batch.md

Knowledge from https://the-asf.slack.com/archives/CJ8D1JTB8/p1595434977062400

* Update native-batch.md

* Fixed broken link + some grammar
2021-07-27 09:17:01 -07:00
Kashif Faraz 8a4e27f51d
Select broker based on query context parameter `brokerService` (#11495)
This change allows the selection of a specific broker service (or broker tier) by the Router.

The newly added ManualTieredBrokerSelectorStrategy works as follows:

Check for the parameter brokerService in the query context. If this is a valid broker service, use it.
Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it.
Move on to the next strategy
2021-07-27 20:56:05 +05:30
Peter Marshall 60fdf7a734
Rollup measurement query amended (#11479)
By user request from https://groups.google.com/g/druid-user/c/bFkOtE-1eQg - gives the measure as a floating point instead of an integer.
2021-07-27 06:29:29 -07:00
Jonathan Wei 676efb1b3f
Fix integration test credential resource path handling (#11487)
This PR fixes an issue with the integration test copy_resources.sh script.

The "install druid jars" portion was removing the $SHARED_DIR/docker directory, which wipes out the $SHARED_DIR/docker/extensions and $SHARED_DIR/docker/credentials directories created just before, which leads to issues later in the script when copying resources to the $SHARED_DIR/docker/credentials/ dir.
2021-07-27 12:32:34 +05:30
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Suneet Saldanha 3f456fe305
Address CVE-2021-35515 CVE-2021-36090 (#11496)
* Address CVE-2021-35515 CVE-2021-36090

Bump commons-compress to deal with new CVEs

* fix licenses
2021-07-26 14:54:32 -07:00
Peter Marshall 973e5bf7d0
Docs - HLL lgK tip and slight layout change (#11482)
* HLL lgK and a tip

Knowledge transfer from https://the-asf.slack.com/archives/CJ8D1JTB8/p1600699967024200.  Attempted to make a connection between the SQL HLL function and the HLL underneath without getting too complicated.  Also added a note about using K over 16 being pretty much pointless.

* Corrected spelling

* Create datasketches-hll.md

Put roll-up back to rollup

* Update docs/development/extensions-core/datasketches-hll.md

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2021-07-26 12:28:53 -07:00
Abhishek Agarwal fcb908d505
Make buildQueryRunnerForSegment protected in ServerManager(#11493)
This is a minor change in ServerManager. Any sub-class can access the buildQueryRunnerForSegment in an extension if required.
2021-07-24 11:08:28 +05:30
Rohan Garg c98e7c3aa3
Fix left join SQL queries with IS NOT NULL filter (#11434)
This PR fixes the incorrect results for query : 

SELECT dim1, l1.k FROM foo LEFT JOIN (select k || '' as k from lookup.lookyloo group by 1) l1 ON foo.dim1 = l1.k WHERE l1.k IS NOT NULL (in CalciteQueryTests)
In the current code, the WHERE clause gets removed from the top of the left join and is pushed to the table foo
leading to incorrect results.
The fix for such a situation is done by :

Converting such left joins into inner joins (since logically the mentioned left join query is equivalent to an inner join) using Calcite while maintaining that the druid execution layer can execute such inner joins.
Preferring converted inner joins over original left joins in our cost model
2021-07-23 20:57:19 +05:30
Maytas Monsereenusorn 161f4dbc0e
Add integration tests for S3 Assume Role ingestion feature (#11472)
* add IT for S3 assume role

* fix checkstyle

* fix test

* fix pom

* fix test
2021-07-23 10:09:09 +07:00
Lucas Capistrant 9767b42e85
Add a new metric query/segments/count that is not emitted by default (#11394)
* Add a new metric query/segments/count that is not emitted by default

* docs

* test the default implementation of the metric

* fix spelling error in docs

* document the fact that query retries will result in additional metric emissions

* update using recommended text from @jihoonson
2021-07-22 17:57:35 -07:00
Abhishek Agarwal ce1faa5635
Make SegmentLoader extensible and customizable (#11398)
This PR refactors the code related to segment loading specifically SegmentLoader and SegmentLoaderLocalCacheManager. SegmentLoader is marked UnstableAPI which means, it can be extended outside core druid in custom extensions. Here is a summary of changes

SegmentLoader returns an instance of ReferenceCountingSegment instead of Segment. Earlier, SegmentManager was wrapping Segment objects inside ReferenceCountingSegment. That is now moved to SegmentLoader. With this, a custom implementation can track the references of segments. It also allows them to create custom ReferenceCountingSegment implementations. For this reason, the constructor visibility in ReferenceCountingSegment is changed from private to protected.
SegmentCacheManager has two additional methods called - reserve(DataSegment) and release(DataSegment). These methods let the caller reserve or release space without calling SegmentLoader#getSegment. We already had similar methods in StorageLocation and now they are available in SegmentCacheManager too which wraps multiple locations.
Refactoring to simplify the code in SegmentCacheManager wherever possible. There is no change in the functionality.
2021-07-22 18:00:49 +05:30
benkrug 167c45260c
Update druid-vs-kudu.md (#11470)
small typo - "need" to "needed"
2021-07-21 22:58:14 +08:00
Maytas Monsereenusorn 6ce3b6ca2d
Improve documentation for druid.indexer.autoscale.workerCapacityHint config (#11444)
* fix doc

* address comments

* Update docs/configuration/index.md

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-07-21 12:48:56 +07:00
Jihoon Son 0453e461f6
Add errorMessage in taskStatus for task failures in middleManagers/indexers/peons (#11446)
* Add error message; add unit tests for ForkingTaskRunner

* add tests

* fix comment

* unused import

* add exit code in error message

* fix test
2021-07-20 21:34:53 -07:00
Jihoon Son 84c957f541
Add more sql tests for groupby queries (#11454)
* Add more sql tests for simple groupby queries

* unused import

* fix tests

* javadocs

* unused import
2021-07-20 21:05:11 -07:00
zachjsh a2538d264d
Add back missing unit test coverage in AvroFlattenerMakerTest (#11451)
* Add back missing unit test coverage in AvroFlattenerMakerTest

Adds back test coverage for Avro flattener that was mistakenly removed in https://github.com/apache/druid/pull/10505. Recfactored the tests a bit too.

* resolve checkstyle warnings
2021-07-20 18:27:00 -07:00
Paul Rogers aa8c615ac2
Updates to source and doc build pages (#11464)
* Updates to source and doc build pages.

Clarifies a few points for newbies.

* Fixed spelling error

And added spellcheck info to website README file.
2021-07-20 18:07:34 -07:00
Vadim Ogievetsky aee2f2e24f
Web console: better handle BigInt math (#11450)
* better handle BigInt math

* correctly brace bigint

* feedback fixes and tests
2021-07-20 17:17:19 -07:00
Suneet Saldanha 1937b5c0da
Pin Jetty version to 9.4.x (#11453)
Major version bumps in jetty are too scary for now. So let's keep up to date with the latest 9.4.x
2021-07-20 14:50:39 -07:00
Abhishek Agarwal 94c1671eaf
Split SegmentLoader into SegmentLoader and SegmentCacheManager (#11466)
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.

SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects.

SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.
2021-07-21 00:14:19 +05:30
Junegunn Choi 69b0c6a47b
"druid.request.logging.type" should allow "noop" value (#10774) 2021-07-20 09:10:46 -07:00
jerryleooo c7fdf1d685
Fix typo in ingestion spec sample (#11433)
* Update index.md

Fix typo in the ingestion spec sample

* fixed more typos
2021-07-19 22:02:21 -07:00
Dongjoon Hyun 5037493e45
Bump commons-io to 2.11.0 (#11460)
* Bump commons-io to 2.11.0

* Address comments

* Remove try catch

* Fix checkstyle
2021-07-19 15:47:14 -07:00
Clint Wylie 2705fe98fa
Fix avro json serde issues (#11455) 2021-07-20 00:32:05 +08:00
Jihoon Son 8729b40893
Add the error message in taskStatus for task failures in overlord (#11419)
* add error messages in taskStatus for task failures in overlord

* unused imports

* add helper message for logs to look up

* fix tests

* fix counting the same task failures more than once

* same fix for HttpRemoteTaskRunner
2021-07-15 13:14:28 -07:00
sthetland a366753ba5
Consolidate multi-value dimension doc and highlight configurability (#11428)
* Clarify options for multi-value dims
* Add first example
2021-07-15 10:19:10 -07:00
Maytas Monsereenusorn 8d7d60d18e
Improve Auto scaler pendingTaskBased provisioning strategy to handle when there are no currently running worker node better (#11440)
* fix pendingTaskBased

* fix doc

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments
2021-07-15 06:52:25 +07:00
Maytas Monsereenusorn d3e82b1114
speed up test (#11442) 2021-07-14 21:14:38 +07:00
zachjsh ace4b807f4
update dependency-check cron job to purge cache before checking (#11436)
The dependency-check cron job now purges any caches NVD before performing dependency check. Without this, a high CVE vulernability was reported in this job a few months after the nvd was updated for it.
2021-07-13 01:43:31 -07:00
zachjsh 73711a456a
Suppress CVE-2021-27568 from json-smart 2.3 dependency (#11438)
Dependency on hadoop 2.8.5 is preventing us form updating this dependency to a later version. We don't believe that this is a major concern since Druid eats uncaught exceptions, and only displays them in logs. This issue also should only affect ingestion jobs, which can only be run by admin type users.
2021-07-12 22:58:06 -04:00
Maytas Monsereenusorn 05d5dd9289
compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded (#11426)
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded

* fix test

* fix test
2021-07-13 09:48:06 +07:00
Maytas Monsereenusorn f5d53569ca
Supervisor metadata auto cleanup failing as missing Guice injection (#11424)
* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* fix IT

* fix IT

* Update services/src/main/java/org/apache/druid/cli/CliCoordinator.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fix

* fix

* fix

* fix

* fix

* fix

* fix

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-07-13 09:47:49 +07:00
frank chen 2236cf2234
eliminate extra object instantiation (#11345) 2021-07-12 18:31:39 -07:00
Sandeep 18b8ac5349
removes unnecessary checks (#11431)
* removes unnecessary checks

* removes unnecessary checks
2021-07-12 18:21:16 -07:00
Clint Wylie d0b4e55a6f
fix pom version reference (#11435) 2021-07-13 08:37:57 +08:00
kaijianding e39ff44481
improve groupBy query granularity translation with 2x query performance improve when issued from sql layer (#11379)
* improve groupBy query granularity translation when issued from sql layer

* fix style

* use virtual column to determine timestampResult granularity

* dont' apply postaggregators on compute nodes

* relocate constants

* fix order by correctness issue

* fix ut

* use more easier understanding code in DefaultLimitSpec

* address comment

* rollback use virtual column to determine timestampResult granularity

* fix style

* fix style

* address the comment

* add more detail document to explain the tradeoff

* address the comment

* address the comment
2021-07-11 10:22:47 -07:00
Abhishek Agarwal e228a84d91
Fix retry sleep when callable throws exception (#11430)
If the callable throws an exception, we neither increase the retry count nor sleep the thread.
2021-07-11 15:06:10 +05:30
Vadim Ogievetsky 377b5e708c
Web console: Data loading walkthrough fixes (#11416)
* fix quotes

* fix sql doc parsing

* prevent array-input from losing position while the user is typing

* make group filter click-to-filterable

* fix casing bug in exact table search

* do not sort columns in smaples

* can bypass transform step

* fixed string json parsing

* improve PartitionMessage

* better error messages

* feedback fixes

* tool to order dimensions in schema view
2021-07-10 07:56:50 -07:00
Suneet Saldanha 49e8732e4f
Display errors for invalid timezones in TIME_FORMAT (#11423)
Users sometimes make typos when picking timezones - like `America/Los Angeles`
instead of `America/Los_Angeles` instead of defaulting to UTC, this change
makes it so that an error is thrown instead notifying the user of their mistake.
2021-07-09 06:07:13 -07:00
Agustin Gonzalez 7e61042794
Bound memory utilization for dynamic partitioning (i.e. memory growth is constant) (#11294)
* Bound memory in native batch ingest create segments

* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that

* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged

* Changed name from RealtimeAppenderator to StreamAppenderator

* Style

* Incorporating tests from StreamAppenderatorTest

* Keep totalRows and cleanup code

* Added missing dep

* Fix unit test

* Checkstyle

* allowIncrementalPersists should always be true for batch

* Added sinks metadata

* clear sinks metadata when closing appenderator

* Style + minor edits to log msgs

* Update sinks metadata & totalRows when dropping a sink (segment)

* Remove max

* Intelli-j check

* Keep a count of hydrants persisted by sink for sanity check before merge

* Move out sanity

* Add previous hydrant count to sink metadata

* Remove redundant field from SinkMetadata

* Remove unneeded functions

* Cleanup unused code

* Removed unused code

* Remove unused field

* Exclude it from jacoco because it is very hard to get branch coverage

* Remove segment announcement and some other minor cleanup

* Add fallback flag

* Minor code cleanup

* Checkstyle

* Code review changes

* Update batchMemoryMappedIndex name

* Code review comments

* Exclude class from coverage, will include again when packaging gets fixed

* Moved test classes to server module

* More BatchAppenderator cleanup

* Fix bug in wrong counting of totalHydrants plus minor cleanup in add

* Removed left over comments

* Have BatchAppenderator follow the Appenderator contract for push & getSegments

* Fix LGTM violations

* Review comments

* Add stats after push is done

* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)

* Update javadocs

* Add thread safety notice to BatchAppenderator

* Further cleanup config

* More config cleanup
2021-07-09 00:10:29 -07:00
Harini Rajendran 4c90c0c21d
Adding more debug logs to increase visibility into StreamSupervisor notices queue size and processing time. (#11415) 2021-07-08 16:15:15 -07:00
Clint Wylie 63fcd77c38
support using mariadb connector with mysql extensions (#11402)
* support using mariadb connector with mysql extensions

* cleanup and more tests

* fix test

* javadocs, more tests, etc

* style and more test

* more test more better

* missing pom

* more pom
2021-07-08 12:25:37 -07:00
Abhishek Agarwal 3481bb0440
Better error message for unsupported double values (#11409)
A constant expression may evaluate to Double.NEGATIVE_INFINITY/Double.POSITIVE_INFINITY/Double.NAN e.g. log10(0). When using such an expression in native queries, the user will get the corresponding value without any error. In SQL, however, the user will run into NumberFormatException because we convert the double to big-decimal while constructing a literal numeric expression. This probably should be fixed in calcite - see https://issues.apache.org/jira/browse/CALCITE-2067. This PR adds a verbose error message so that users can take corrective action without scratching their heads.
2021-07-08 16:55:17 +05:30
Joseph Glanville d5e8d4d680
Avro union support (#10505)
* Avro union support

* Document new union support

* Add support for AvroStreamInputFormat and fix checkstyle

* Extend multi-member union test schema and format

* Some additional docs and add Enums to spelling

* Rename explodeUnions -> extractUnions

* explode -> extract

* ByType

* Correct spelling error
2021-07-06 22:05:41 -07:00
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
Daniel Koepke 497f2a1051
Allow spaces in java home. (#11407)
Quote the $java_exec var in examples/bin/verify-java to support spaces in DRUID_JAVA_HOME/JAVA_HOME. At present, the steps before and after the version check properly quote the path, but the version check spuriously fails when pointing to a Java 8 install that has a space in its path.
2021-07-05 18:50:36 +05:30
Jason Koch d3220693f3
perf: improve concurrency and improve perf for task query in HeapMemoryTaskStorage (#11272)
* perf: improve concurrency and reduce algorithmic cost for task querying in HeapMemoryTaskStorage

* fix: address intellij linter concern regarding use of ConcurrentMap interface

* nit: document thread safety of HeapMemoryTaskStorage

* empty to trigger ci
2021-07-05 10:07:26 +08:00
Agustin Gonzalez a9c4b478ab
Fix expiration logic for ldap internal credential cache (#11395)
* Fix expiration logic for ldap internal credential cache

* Removed sleeps from tests

* Make method package scoped so it can be used in unit tests

* Removed unused thrown exceptions
2021-07-01 14:34:29 -07:00
Abhishek Agarwal 03a6a6d6e1
Replace Processing ExecutorService with QueryProcessingPool (#11382)
This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.
2021-07-01 16:03:08 +05:30