Commit Graph

11325 Commits

Author SHA1 Message Date
Kashif Faraz 2d77e1a3c6
Add support for multi dimension range partitioning (#11848)
This PR adds support for range partitioning on multiple dimensions. It extends the
concept and implementation of single dimension range partitioning.

The new partition type added is range, which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range partition with a single partition dimension.

The start and end values of a DimensionRangeShardSpec are represented
by StringTuples, where each String in the tuple is the value of a partition dimension.
2021-11-06 12:50:17 +05:30
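For #11848, the shard-membership rule can be sketched as follows. The sketch assumes that a row's partition-dimension values form a tuple compared element by element, that start is inclusive and end is exclusive, and that a null bound means unbounded. RangeShardSketch, compareTuples, and inShard are hypothetical names, not the DimensionRangeShardSpec or StringTuple API, and null values inside a tuple are not handled.

```java
import java.util.List;

/** Hypothetical sketch of multi-dimension range shard membership (not Druid's API). */
final class RangeShardSketch {
  // Lexicographic comparison: first dimension decides, ties fall through to the next.
  // Assumes no null elements inside either tuple.
  static int compareTuples(List<String> a, List<String> b) {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int c = a.get(i).compareTo(b.get(i));
      if (c != 0) {
        return c;
      }
    }
    // If one tuple is a prefix of the other, the shorter one sorts first.
    return Integer.compare(a.size(), b.size());
  }

  // True if the row tuple lies in [start, end); a null start or end is unbounded.
  static boolean inShard(List<String> row, List<String> start, List<String> end) {
    boolean afterStart = start == null || compareTuples(row, start) >= 0;
    boolean beforeEnd = end == null || compareTuples(row, end) < 0;
    return afterStart && beforeEnd;
  }
}
```

With one-element tuples this reduces to the familiar single-value range check, which is consistent with single_dim being treated as the one-dimension case.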
Gian Merlino 1c12dd97dc
Add javadocs to StringUtils.fromUtf8. (#11881)
They clarify that the methods advance the position of the buffer.
2021-11-05 15:27:24 -07:00
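For #11881, here is a minimal sketch of what "advances the position" means in practice. It assumes a decoder that reads a given number of bytes starting at the buffer's current position. Utf8Sketch.fromUtf8 is hypothetical and does not reproduce Druid's StringUtils signature. It relies only on ByteBuffer.get(byte[]), which is a relative bulk read.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing a UTF-8 decode that moves the buffer's position. */
final class Utf8Sketch {
  // Reads numBytes from the current position and decodes them as UTF-8.
  // ByteBuffer.get(byte[]) is relative, so position advances by numBytes.
  static String fromUtf8(ByteBuffer buffer, int numBytes) {
    byte[] bytes = new byte[numBytes];
    buffer.get(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```

A caller that needs its own position left unchanged can decode from buffer.duplicate(), which shares the bytes but has an independent position.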
Gian Merlino 8971056763
Properly count segment references in tests. (#11870) 2021-11-05 12:49:10 -07:00
Clint Wylie 907e4ca0c5
use correct DimensionSpec for column value selectors created from dictionary encoded column indexers (#11873)
* use correct dimension spec for column value selectors of dictionary encoded column indexers
2021-11-05 01:51:15 -07:00
zachjsh 1d6df48145
Warn if cache size of lookup is beyond max size (#11863)
Enhanced the ExtractionNamespace interface in the lookups-cached-global core extension with the ability to set a maxHeapPercentage for the cache of the respective namespace. The reason for adding this functionality is to make it easier to detect when a lookup table grows to a size that the underlying service cannot handle because it does not have enough memory.

The default value of maxHeapPercentage for the interface is -1, which indicates that no maxHeapPercentage has been set. For the JdbcExtractionNamespace and UriExtractionNamespace implementations, the default value is null, which causes the service that the lookup is loaded in to warn when its cache is beyond maxHeapPercentage of the service's configured max heap size. If a positive non-null value is set for the namespace's maxHeapPercentage config, this value is honored for all services that the lookup is loaded onto, and they log warning messages when the cache of the lookup grows beyond that percentage of the service's configured max heap size. Warnings are logged every time either URI-based or JDBC-based lookups are regenerated, if the maxHeapPercentage constraint is violated. No other implementations log warnings at this time. No error is thrown when the size exceeds maxHeapPercentage, as doing so could break functionality for existing users.

Previously, the JdbcCacheGenerator generated its cache by materializing all rows of the underlying table in memory at once; this made it difficult to log warning messages when the results of the JDBC query were very large and caused the service to run out of memory. To help with this, this PR streams the JDBC query results through an iterator instead.
2021-11-03 21:32:22 -04:00
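The warning rule in #11863 can be sketched as below. The sketch makes three assumptions: the cache size in bytes has already been estimated, a null namespace setting falls back to a service-level default, and a non-positive value such as -1 means no limit is configured. LookupHeapCheckSketch, its methods, and the printed message are hypothetical. Only Runtime.maxMemory() is a standard JDK call.

```java
/** Hypothetical sketch of a maxHeapPercentage check for a lookup cache (not Druid's code). */
final class LookupHeapCheckSketch {
  // Null defers to the (hypothetical) service-wide default; otherwise the namespace value wins.
  static int resolvePercentage(Integer namespaceValue, int serviceDefault) {
    return namespaceValue == null ? serviceDefault : namespaceValue;
  }

  // True if the cache exceeds the given percentage of maxHeapBytes.
  // A non-positive percentage (e.g. -1) means no limit, so it never triggers.
  static boolean exceedsLimit(long cacheSizeBytes, int percentage, long maxHeapBytes) {
    if (percentage <= 0) {
      return false;
    }
    long limitBytes = (long) (maxHeapBytes * (percentage / 100.0));
    return cacheSizeBytes > limitBytes;
  }

  // Logs a warning (here: to stderr) but never throws, matching the "warn only" behavior.
  static void warnIfTooLarge(String namespace, long cacheSizeBytes, int percentage) {
    long maxHeapBytes = Runtime.getRuntime().maxMemory();
    if (exceedsLimit(cacheSizeBytes, percentage, maxHeapBytes)) {
      System.err.println("Lookup " + namespace + " cache (" + cacheSizeBytes
          + " bytes) exceeds " + percentage + "% of max heap (" + maxHeapBytes + " bytes)");
    }
  }
}
```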
Abhishek Agarwal 652e1491e0
Update default values for tuning parameters in kinesis data loader (#11867) 2021-11-02 23:51:28 +05:30
Karan Kumar cf27366b35
Fixing typos in docker build scripts (#11866) 2021-11-02 23:50:52 +05:30
andreacyc 88bbc8e9e1
Add info for compaction config dialog (#11847)
* add-info-for-compation-config-dialog

* correct the info

* remove space typo

* Revert "remove space typo"

This reverts commit 28b28733ae.

* remove typo space

* update snapshots for jest-test
2021-11-02 10:03:29 -07:00
Kashif Faraz a22687ecbe
Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732)
The new config extends the concept of "watchedTiers", where
the Broker can choose to add only the info of the specified tiers to its timeline.
Similarly, with this config, the Broker can choose to skip realtime nodes,
so that it queries only Historical processes for any given segment.
2021-11-02 12:38:42 +05:30
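For #11732, the sketch below shows how a Broker-side filter might combine watchedTiers with the new flag. It assumes that a null tier set means every tier is watched. The Kind enum, the Server class, and serversToWatch are hypothetical stand-ins, not Druid's server inventory API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of Broker server filtering by tier and realtime-ness. */
final class BrokerServerFilterSketch {
  // Hypothetical server kinds; not Druid's ServerType enum.
  enum Kind { HISTORICAL, REALTIME }

  static final class Server {
    final String name;
    final Kind kind;
    final String tier;

    Server(String name, Kind kind, String tier) {
      this.name = name;
      this.kind = kind;
      this.tier = tier;
    }
  }

  // Keep a server if its tier is watched (null set = all tiers) and, when
  // watchRealtimeNodes is false, only if it is not a realtime server.
  static List<String> serversToWatch(List<Server> servers, Set<String> watchedTiers, boolean watchRealtimeNodes) {
    List<String> kept = new ArrayList<>();
    for (Server s : servers) {
      boolean tierOk = watchedTiers == null || watchedTiers.contains(s.tier);
      boolean kindOk = watchRealtimeNodes || s.kind != Kind.REALTIME;
      if (tierOk && kindOk) {
        kept.add(s.name);
      }
    }
    return kept;
  }
}
```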
Katya Macedo 5e1dc843d1
Fix quickstart link (#11864) 2021-11-02 13:27:53 +08:00
Nolan Emirot cd6867844f
docs: update helm flag (#11721)
In Helm v3, the --name flag no longer exists.
2021-11-02 13:25:49 +08:00
Sandeep 52539de521
fix data validation error by using the correct way to comment the license in templates (#11839) 2021-11-02 09:32:47 +08:00
Maytas Monsereenusorn ba2874ee1f
Support changing query granularity in Auto Compaction (#11856)
* add queryGranularity

* fix checkstyle

* fix test
2021-11-01 15:18:44 -07:00
Clint Wylie 9bd2ccbb9b
SqlAggregationModuleTest now extends CalciteTestBase to ensure consistent string encoding (#11861) 2021-11-01 15:11:40 -07:00
Will Xu 7af36fecff
Fix travis' link behind build badge (#11858) 2021-11-01 07:26:30 -07:00
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles. Most of the details are captured in #11791.
We use a combination of maven profiles and resource filtering to achieve this. Hadoop 2 is supported by default, and a new maven profile named hadoop3 is created. This allows users to choose the profile best suited to their use case.
2021-10-30 22:46:24 +05:30
Maytas Monsereenusorn 33d9d9bd74
Add rollup config to auto and manual compaction (#11850)
* add rollup to auto and manual compaction

* add unit tests

* add unit tests

* add IT

* fix checkstyle
2021-10-29 10:22:25 -07:00
Jonathan Wei a96aed021e
Fix indefinite WAITING batch task when lock is revoked (#11788)
* Fix indefinite WAITING batch task when lock is revoked

* Use revoked property on TaskLock

* Update TimeChunkLockAcquireAction to return TaskLock for revoked locks
2021-10-27 17:49:15 -05:00
Liran Funaro 9ca8f1ec97
Remove IncrementalIndex template modifier (#11160)
Co-authored-by: Liran Funaro <liran.funaro@verizonmedia.com>
2021-10-27 13:10:37 -07:00
Gian Merlino fc95c92806
Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124)
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.

This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not
  required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that
  had it. (Not all of them did, and that's OK, because it wasn't necessary
  anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default)
  in "useOffheap" mode (also non-default). So you have to go pretty deep
  into the wilderness to get this code to activate in production. It is
  never used during ingestion.
- Its existence complicates developer efforts to reason about how
  aggregators get used, because the way it uses buffer aggregators is so
  different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I believe that, given the way the offheap incremental index
works, it actually didn't require buffer aggregators to be thread-safe.
It synchronizes on "aggregate" and doesn't call "get" until it has
stopped calling "aggregate". Nevertheless, this is a bother to think about,
and for the above reasons I think it makes sense to remove the code anyway.

* Remove things that are now unused.

* Revert removal of getFloat, getLong, getDouble from BufferAggregator.

* OAK-related warnings, suppressions.

* Unused item suppressions.
2021-10-26 08:05:56 -07:00
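To illustrate the thread-safety contract described in #11124, here is a hypothetical pair of long-sum aggregators. The on-heap one synchronizes because, as an assumption here, one thread may call aggregate while another calls get (as can happen while an incremental index is being queried during ingestion). The buffer-based one keeps its state in a caller-owned buffer slot and relies on the caller for exclusive access. Neither class implements Druid's actual Aggregator or BufferAggregator interfaces.

```java
import java.nio.ByteBuffer;

/** Hypothetical contrast between a thread-safe and a caller-synchronized aggregator. */
final class AggregatorThreadSafetySketch {
  // On-heap aggregator: state guarded by the object's monitor so concurrent
  // aggregate() and get() calls see a consistent value.
  static final class SyncLongSum {
    private long sum;

    synchronized void aggregate(long value) {
      sum += value;
    }

    synchronized long get() {
      return sum;
    }
  }

  // Buffer aggregator: state lives at buf[position..position+8); no locking,
  // because the caller is assumed to guarantee single-threaded access per slot.
  static final class BufferLongSum {
    void init(ByteBuffer buf, int position) {
      buf.putLong(position, 0L);
    }

    void aggregate(ByteBuffer buf, int position, long value) {
      buf.putLong(position, buf.getLong(position) + value);
    }

    long get(ByteBuffer buf, int position) {
      return buf.getLong(position);
    }
  }
}
```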
Vadim Ogievetsky 8ea9309168
Web console: update to TypeScript 4.4 for faster build speeds (#11725)
* update typescript

* do not show pagination when there is only one page

* update snapshots

* fix pagination
2021-10-25 21:53:38 -07:00
Đặng Minh Dũng 4baebb231b
add `prometheus-emitter` to distribution (#11812)
* add `prometheus-emitter` to distribution

Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>

* add `druid-momentsketch` to distribution

Signed-off-by: Đặng Minh Dũng <dungdm93@live.com>
2021-10-25 21:16:17 -07:00
Jihoon Son 07a232d7b4
Bump netty4 to 4.1.68; suppress CVE-2021-37136 and CVE-2021-37137 for netty3 (#11844)
* bump netty4 to 4.1.68

* suppress CVE-2021-37136 and CVE-2021-37137 for netty3

* license
2021-10-25 21:09:15 -07:00
Vadim Ogievetsky f2106d7621
Web console: Add segment size in bytes column and hide it by default (#11797)
* add segment size column

* allow hidden default column

* fix tests

* update e2e tests
2021-10-25 13:24:44 -07:00
Sergio Ferragut 000a5551fa
docker mem reqs (#11827)
* docker mem reqs

* Update docs/tutorials/docker.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Sergio Ferragut <sergio.ferragut@imply.io>
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-10-25 12:23:25 -07:00
Gian Merlino 8276c031c5
Add druid.sql.approxCountDistinct.function property. (#11181)
* Add druid.sql.approxCountDistinct.function property.

The new property allows admins to configure the implementation for
APPROX_COUNT_DISTINCT and COUNT(DISTINCT expr) in approximate mode.

The motivation for adding this setting is to enable site admins to
switch the default HLL implementation to DataSketches.

For example, an admin can set:

  druid.sql.approxCountDistinct.function = APPROX_COUNT_DISTINCT_DS_HLL

* Fixes

* Fix tests.

* Remove erroneous cannotVectorize.

* Remove unused import.

* Remove unused test imports.
2021-10-25 12:16:21 -07:00
Lucas Capistrant 43383c73a8
refactor BalanceSegments#balanceServers to exit early if there is no work to be done (#11768)
* remove useless call to balanceServers for move from decom servers when there are no decom servers

* refactor approach to this PR but accomplish the same thing
2021-10-25 10:06:35 -05:00
Kashif Faraz abac9e39ed
Revert permission changes to Supervisor and Task APIs (#11819)
* Revert "Require Datasource WRITE authorization for Supervisor and Task access (#11718)"

This reverts commit f2d6100124.

* Revert "Require DATASOURCE WRITE access in SupervisorResourceFilter and TaskResourceFilter (#11680)"

This reverts commit 6779c4652d.

* Fix docs for the reverted commits

* Fix and restore deleted tests

* Fix and restore SystemSchemaTest
2021-10-25 14:50:38 +05:30
Charles Smith 10c5fa93f1
remove dupe sentence (#11821) 2021-10-25 14:48:20 +05:30
Vadim Ogievetsky 4354e43983
Use existing queryId if it exists (#11834) 2021-10-23 19:02:39 -07:00
Gian Merlino d4cace385f
SQL: Allow Scans to be used as outer queries. (#11831)
* SQL: Allow Scans to be used as outer queries.

This has been possible in the native query system for a while, but the capability
hasn't yet propagated into the SQL layer. One example of where this is useful is
a query like:

  SELECT * FROM (... LIMIT X) WHERE <filter>

Because this expands the kinds of subquery structures the SQL layer will consider,
it was also necessary to improve the cost calculations. These changes appear in
PartialDruidQuery and DruidOuterQueryRel. The ideas are:

- Attach per-column penalties to the output signature of each query, instead of to
  the initial projection that starts a query. This encourages moving projections
  into subqueries instead of leaving them on outer queries.
- Only attach penalties to projections if there are actually expressions happening.
  So, now, projections that simply reorder or remove fields are free.
- Attach a constant penalty to every outer query. This discourages creating them
  when they are not needed.

The changes are generally beneficial to the test cases we have in CalciteQueryTest.
Most plans are unchanged, or are changed in purely cosmetic ways. Two have changed
for the better:

- testUsingSubqueryWithLimit now returns a constant from the subquery, instead of
  returning every column.
- testJoinOuterGroupByAndSubqueryHasLimit returns a minimal set of columns from
  the innermost subquery; two unnecessary columns are no longer there.

* Fix various DS operator conversions.

These were all implemented as direct conversions, which isn't appropriate
because they do not actually map onto native functions. These are only
usable as post-aggregations.

* Test case adjustment.
2021-10-23 17:18:43 -07:00
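The three costing ideas in #11831 can be expressed as a toy cost function. All constants and names below are hypothetical. They were chosen only to show the direction of each penalty and are not the actual values in PartialDruidQuery or DruidOuterQueryRel.

```java
/** Toy cost model illustrating the three penalties; constants are hypothetical. */
final class QueryCostSketch {
  static final double COST_PER_OUTPUT_COLUMN = 0.05;
  static final double COST_PER_EXPRESSION_COLUMN = 0.5;
  static final double COST_OUTER_QUERY = 10.0;

  static double cost(double baseCost, int outputColumns, int expressionColumns, boolean isOuterQuery) {
    double cost = baseCost;
    // 1) Per-column penalty on the query's output signature, so narrower subqueries win.
    cost += outputColumns * COST_PER_OUTPUT_COLUMN;
    // 2) Projections cost only for computed columns; pure reorder/remove is free.
    cost += expressionColumns * COST_PER_EXPRESSION_COLUMN;
    // 3) Constant penalty per outer query, discouraging unnecessary nesting.
    if (isOuterQuery) {
      cost += COST_OUTER_QUERY;
    }
    return cost;
  }
}
```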
Gian Merlino 98ecbb21cd
Remove CloseQuietly and migrate its usages to other methods. (#10247)
* Remove CloseQuietly and migrate its usages to other methods.

These other methods include:

1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions
   in RuntimeExceptions for callers that just want to avoid dealing with
   checked exceptions. Most usages were migrated to this method, because it
   looks like they were mainly attempts to avoid declaring a throws clause,
   and perhaps were unintentionally suppressing IOExceptions.
2) New method CloseableUtils.closeInCatch, designed to properly close something
   in a catch block without losing exceptions. Some usages from catch blocks
   were migrated here, when it seemed that they were intended to avoid checked
   exception handling, and did not really intend to also suppress IOExceptions.
3) New method CloseableUtils.closeAndSuppressExceptions, which sends all
   exceptions to a "chomper" that consumes them. Nothing is thrown or returned.
   The behavior is slightly different: with this method, _all_ exceptions are
   suppressed, not just IOExceptions. Calls that seemed like they had good
   reason to suppress exceptions were migrated here.
4) Some calls were migrated to try-with-resources, in cases where it appeared
   that CloseQuietly was being used to avoid throwing an exception in a finally
   block.

🎵 You don't have to go home, but you can't stay here... 🎵

* Remove unused import.

* Fix up various issues.

* Adjustments to tests.

* Fix null handling.

* Additional test.

* Adjustments from review.

* Fixup style stuff.

* Fix NPE caused by holder starting out null.

* Fix spelling.

* Chomp Throwables too.
2021-10-23 17:03:21 -07:00
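The closeInCatch idea in #10247 follows the standard suppressed-exception pattern. Here is a minimal sketch, assuming the goal is to close a resource from a catch block without losing either the original exception or a failure from close(). CloseInCatchSketch is hypothetical, and its signature may differ from Druid's CloseableUtils.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical helper: close in a catch block without losing exceptions. */
final class CloseInCatchSketch {
  // Closes the resource; any failure from close() is attached to the original
  // exception via addSuppressed. The original is returned so the caller rethrows it.
  static <T extends Throwable> T closeInCatch(T original, Closeable closeable) {
    if (closeable != null) {
      try {
        closeable.close();
      } catch (Throwable t) {
        original.addSuppressed(t);
      }
    }
    return original;
  }
}
```

Typical use is `catch (RuntimeException e) { throw CloseInCatchSketch.closeInCatch(e, resource); }`. Returning the original lets the call site keep an explicit throw, so the compiler still sees the catch block terminate.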
Clint Wylie 44a7b09190
Revert "Missing Loader parameter in generate-binary-license and generate-binary-notice py scripts (#11815)" (#11832)
This reverts commit a7ee646927.
2021-10-23 08:34:26 -07:00
Gian Merlino b7a4c79314
Null handling fixes for DS HLL and Theta sketches. (#11830)
* Null handling fixes for DS HLL and Theta sketches.

For HLL, this fixes an NPE when processing a null in a multi-value dimension.

For both, empty strings are now properly treated as nulls (and ignored) in
replace-with-default mode. Behavior in SQL-compatible mode is unchanged.

* Fix expectation.
2021-10-22 19:09:00 -07:00
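The null-handling rule in #11830 can be sketched with an exact set standing in for an HLL or Theta sketch. The sketch assumes a boolean flag models replace-with-default mode. Nulls, including a null inside a multi-value row, are always skipped. Empty strings are skipped only in replace-with-default mode. DistinctNullHandlingSketch is hypothetical and does not use the DataSketches API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical distinct counter showing the null/empty-string rules (exact set, not a sketch). */
final class DistinctNullHandlingSketch {
  static int countDistinct(List<String> values, boolean replaceWithDefault) {
    Set<String> seen = new HashSet<>();
    for (String v : values) {
      if (v == null) {
        continue; // nulls are never counted (and must not cause an NPE)
      }
      if (replaceWithDefault && v.isEmpty()) {
        continue; // in replace-with-default mode, "" is treated as null
      }
      seen.add(v);
    }
    return seen.size();
  }
}
```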
Gian Merlino cb9bc15e95
Fix task report streaming in https setups. (#11739)
* Fix task report streaming in https setups.

* Trivial change to re-trigger ITs.
2021-10-22 19:07:29 -07:00
Clint Wylie 02b2057371
extract generic dictionary encoded column indexing and merging stuffs (#11829)
* extract generic dictionary encoded column indexing and merging stuffs to pave the path towards supporting other types of dictionary encoded columns

* spotbugs and inspections fixes

* friendlier

* javadoc

* better name

* adjust
2021-10-22 17:31:22 -07:00
Victoria Lim 43103632fb
Docs - add description on time origin (#11826)
* add description on time origin

* reorder parameter descriptions

* add example of origin value
2021-10-22 14:57:13 -07:00
Clint Wylie 741b4ed516
add output type information to ExpressionPostAggregator (#11818)
* add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types

* add test for coverage

* simplify

* Remove unused imports.

Co-authored-by: Gian Merlino <gian@imply.io>
2021-10-22 13:52:51 -07:00
Arun Ramani df4894afff
Fallback to /sys/fs root when looking for cgroups (#11810)
ProcCgroupDiscoverer builds the cgroup directory by concatenating the proc mounts and proc cgroup paths together. This doesn't seem to work in Kubernetes if the execution context is within the container, and it isn't consistent across all Linux OSes either. The fix is to fall back to / as the root, which seems to work empirically.
2021-10-21 09:51:16 +05:30
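The fallback in #11810 can be sketched as follows. The sketch assumes the mount point comes from /proc/mounts and the process's cgroup path comes from /proc/self/cgroup. CgroupDirSketch is hypothetical and is not ProcCgroupDiscoverer's code. It checks whether the concatenated directory exists and otherwise uses the mount point itself, that is, / within the hierarchy.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical sketch of cgroup directory resolution with a root fallback. */
final class CgroupDirSketch {
  // mountPoint: cgroup controller mount (e.g. /sys/fs/cgroup/cpu), from /proc/mounts.
  // cgroupPath: the process's path within the hierarchy, from /proc/self/cgroup.
  static Path resolve(Path mountPoint, String cgroupPath) {
    // Strip leading slashes so resolve() treats the path as relative to the mount.
    Path candidate = mountPoint.resolve(cgroupPath.replaceFirst("^/+", ""));
    if (Files.isDirectory(candidate)) {
      return candidate;
    }
    // Fallback: inside a container the concatenated path may not exist,
    // so use the mount point itself ("/" within the hierarchy).
    return mountPoint;
  }
}
```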
Alexander Saydakov 8cf1cbc4a9
latest datasketches-java and datasketches-memory (#11773)
* latest datasketches-java and datasketches-memory

* updated versions of datasketches-java and datasketches-memory

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2021-10-19 23:42:30 -07:00
David Ferlay a7ee646927
Missing Loader parameter in generate-binary-license and generate-binary-notice py scripts (#11815) 2021-10-20 00:25:17 +05:30
Clint Wylie 187df58e30
better types (#11713)
* better type system

* needle in a haystack

* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support

* fixup merge

* more test

* fixup

* intern

* fix

* oops

* oops again

* ...

* more test coverage

* fix error message

* adjust interning, more javadocs

* oops

* more docs more better
2021-10-19 01:47:25 -07:00
Sandeep 17459a84d3
Update link to helm chart quickstart guide (#11801) 2021-10-19 14:10:40 +05:30
David Bar 7d4841471f
Optimize supervisor history retrieval for specific id (#11807)
Optimization. Fetch from the metadata store only the relevant history items for the requested supervisor id.
2021-10-19 14:08:25 +05:30
TSFenwick 9c15f938fd
fix test issue where JettyTest would fail if JettyWithResponseFilterEnabledTest ran before it (#11803)
This change ensures that JettyTest sets the properties it needs, in case some other test overwrites them.
It also changes the ordering of the setProperties calls so that super's is called first, in case super sets the same property.
2021-10-18 12:42:41 -07:00
Charles Smith 938c1493e5
edits to kafka inputFormat (#11796)
* edits to kafka inputFormat

* revise conflict resolution description

* tweak for clarity

* Update docs/ingestion/data-formats.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* style fixes

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
2021-10-15 14:01:10 -07:00
Charles Smith 6089a168ea
Docs - update dynamic config provider topic (#11795)
* update dynamic config provider

* update topic

* add examples for dynamic config provider:

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/development/extensions-core/kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update docs/operations/dynamic-config-provider.md

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* Update kafka-ingestion.md

Co-authored-by: Katya Macedo  <38017980+ektravel@users.noreply.github.com>
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-10-14 17:51:32 -07:00
Abhishek Agarwal 4f62905be0
Fix the travis build (#11799) 2021-10-14 16:31:51 +05:30
Agustin Gonzalez 887cecf29e
Simplify ITHttpInputSourceTest to mitigate flakiness (#11751)
* Increment retry count to add more time for tests to pass

* Re-enable ITHttpInputSourceTest

* Restore original count

* This test is about the input source; hash partitioning takes longer and is not required, so change to dynamic partitioning

* Further simplify by removing sketches
2021-10-12 11:51:27 -05:00
andreacyc adb2237628
Fix CVE-2021-3749 reported in security vulnerabilities job (#11786)
* Fix CVE-2021-3749 reported in security vulnerabilities job

* test why test fail

* update axios

* remove console log for testing
2021-10-08 23:02:58 -07:00