Commit Graph

10786 Commits

Author SHA1 Message Date
Gian Merlino 6c0c6e60b3
Vectorized theta sketch aggregator + rework of VectorColumnProcessorFactory. (#10767)
* Vectorized theta sketch aggregator.

Also a refactoring of BufferAggregator and VectorAggregator such that
they share a common interface, BaseBufferAggregator. This allows
implementing both in the same file with an abstract + dual subclass
structure.

* Rework implementation to use composition instead of inheritance.

* Rework things to enable working properly for both complex types and
regular types.

Involved finally moving makeVectorProcessor from DimensionHandlerUtils
into ColumnProcessors and harmonizing the two things.

* Add missing method.

* Style and name changes.

* Fix issues from inspections.

* Fix style issue.
2021-01-29 09:30:09 -08:00
Agustin Gonzalez 0e4750bac2
Granularity interval materialization (#10742)
* Prevent interval materialization for UniformGranularitySpec inside the overlord

* Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval>

* Javadoc update, respect inputIntervals contract

* Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec

* Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity

* Fix Travis style & other checks

* Refactor TreeSet to facilitate re-use in UniformGranularitySpec

* Make sure intervals are unique when there is no segment granularity

* Style/bugspot fixes...

* More travis checks

* Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method

* Style & PR feedback

* Fixed failing test

* Fixed bug in IntervalsByGranularity iterator that caused it to return repeated elements (see added unit tests that were broken before this change)

* Refactor so that we can get the condensed buckets without materializing the intervals

* Get rid of GranularitySpec::condensedInputIntervals ... not needed

* Travis failures fixes

* Travis checkstyle fix

* Edited/added javadoc comments and a method name (code review feedback)

* Fixed jacoco coverage by moving class and adding more coverage

* Avoid materializing the condensed intervals when locking

* Deal with overlapping intervals

* Remove code and use library code instead

* Refactor intervals by granularity using the FluentIterable, add sanity checks

* Change !hasNext() to inputIntervals().isEmpty()

* Remove redundant lambda

* Use materialized intervals here since this is outside the overlord (for performance)

* Name refactor to reflect the fact that bucket intervals are sorted.

* Style fixes

* Removed redundant method and have condensedIntervalIterator throw IAE when element is null, for consistency with other methods in this class (also, a null interval does not make sense when condensing)

* Remove forbidden api

* Move helper class inside common base class to reduce public space pollution
2021-01-29 06:02:10 -08:00
Suneet Saldanha f773497e2c
Do not run integration tests in cron stage (#10814) 2021-01-28 20:11:36 -08:00
Abhishek Agarwal 0080e333cc
Fix cardinality estimation (#10762)
* Fix cardinality estimation

* Add unit test

* code coverage

* fix typo
2021-01-28 15:06:10 -08:00
Vadim Ogievetsky 2a1e47afc3
Web console: Remove first / last suggestions (#10794)
* Remove first / last suggestions

* remove commented out code
2021-01-28 13:37:10 -08:00
Clint Wylie 2ce7b3dcf4
bitwise math function expressions (#10605)
* expressions: adding bitwise expressions

* double handling and vectorization

* move conversion to Evals

* revert unintended changes

* less magic, split convert functions, fix parser for funny exponent doubles

* fix spelling exceptions list

* more spelling

* fix grammar, add more test, fix docs

* fix docs

Co-authored-by: Max Kaplan <max@maxkaplan.me>
2021-01-28 11:16:53 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
Suneet Saldanha 5efaaab561
Run integration tests in a second stage (#10791)
* Run integration tests in a second stage

* maybe run the integration tests

* better

* drop sudo
2021-01-26 23:11:54 -08:00
Suneet Saldanha a7542652ff
Fix dependabot warnings (#10796)
* Bump http-proxy from 1.18.0 to 1.18.1 in /web-console (#7)

Bumps [http-proxy](https://github.com/http-party/node-http-proxy) from 1.18.0 to 1.18.1.
- [Release notes](https://github.com/http-party/node-http-proxy/releases)
- [Changelog](https://github.com/http-party/node-http-proxy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/http-party/node-http-proxy/compare/1.18.0...1.18.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump elliptic from 6.5.1 to 6.5.3 in /web-console (#6)

Bumps [elliptic](https://github.com/indutny/elliptic) from 6.5.1 to 6.5.3.
- [Release notes](https://github.com/indutny/elliptic/releases)
- [Commits](https://github.com/indutny/elliptic/compare/v6.5.1...v6.5.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump dot-prop from 4.2.0 to 4.2.1 in /web-console (#5)

Bumps [dot-prop](https://github.com/sindresorhus/dot-prop) from 4.2.0 to 4.2.1.
- [Release notes](https://github.com/sindresorhus/dot-prop/releases)
- [Commits](https://github.com/sindresorhus/dot-prop/compare/v4.2.0...v4.2.1)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump bl from 1.2.2 to 1.2.3 in /website (#4)

Bumps [bl](https://github.com/rvagg/bl) from 1.2.2 to 1.2.3.
- [Release notes](https://github.com/rvagg/bl/releases)
- [Commits](https://github.com/rvagg/bl/compare/v1.2.2...v1.2.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump prismjs from 1.20.0 to 1.23.0 in /website (#3)

Bumps [prismjs](https://github.com/PrismJS/prism) from 1.20.0 to 1.23.0.
- [Release notes](https://github.com/PrismJS/prism/releases)
- [Changelog](https://github.com/PrismJS/prism/blob/master/CHANGELOG.md)
- [Commits](https://github.com/PrismJS/prism/compare/v1.20.0...v1.23.0)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-01-26 23:11:26 -08:00
Himadri Singh 1c1b396eaa
AWS Web Identity / IRSA Support (#10541)
* AWS Web Identity Support

required for AWS IRSA

* Update kinesis-ingestion.md

* disabling coverage tests

https://github.com/apache/druid/pull/10541#issuecomment-737558213

* exclude coverage

* Update licenses.yaml
2021-01-25 18:44:02 +05:30
Vadim Ogievetsky 8c227bc566
use new example manifest (#10787) 2021-01-24 12:38:13 -08:00
Charles Smith 99494e3d16
suggest index parallel for native batch reindexing > 1GB (#10788) 2021-01-22 21:54:28 -08:00
Clint Wylie cd6af93274
add leftover tests from #10743 (#10766) 2021-01-22 09:20:48 -08:00
zhangyue19921010 bf1d1d583b
modify (#10778)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 09:20:13 -08:00
zhangyue19921010 8c6153d511
[Bug Fix] Broker does not wait for its SQL metadata view to fully initialize before starting up, even though awaitInitializationOnStart is set to true (#10779)
* enhance the logic of Start up DruidSchema immediately if there are no segments.

* add UT to test DruidSchema init

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-22 08:48:21 -08:00
Gian Merlino 8b808c4879
Retain order of AND, OR filter children. (#10758)
* Retain order of AND, OR filter children.

If we retain the order, it enables short-circuiting. People can put a
more selective filter earlier in the list and lower the chance that
later filters will need to be evaluated.

Short-circuiting was working before #9608, which switched to unordered
sets to solve a different problem. This patch tries to solve that
problem a different way.

This patch moves filter simplification logic from "optimize" to
"toFilter", because that allows the code to be shared with Filters.and
and Filters.or. The simplification has become more complicated and so
it's useful to share it.

This patch also removes code from CalciteCnfHelper that is no longer
necessary because Filters.and and Filters.or are now doing the work.

* Fixes for inspections.

* Fix tests.

* Back to a Set.
2021-01-20 08:59:20 -08:00
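
The short-circuiting argument in the commit above can be illustrated with a minimal sketch. This is not Druid's filter code: `OrderedAndSketch`, `counting`, and `matchesAll` are hypothetical names, and `LinkedHashSet` is used here only as an assumed example of a set that keeps insertion order.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.IntPredicate;

public class OrderedAndSketch {
    // Counts how many child predicates were actually evaluated.
    static int evaluations = 0;

    // Wraps a predicate so that each evaluation is counted.
    static IntPredicate counting(IntPredicate p) {
        return v -> {
            evaluations++;
            return p.test(v);
        };
    }

    // AND over the children in iteration order; stops at the first child that fails.
    static boolean matchesAll(Set<IntPredicate> children, int value) {
        for (IntPredicate child : children) {
            if (!child.test(value)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // LinkedHashSet iterates in insertion order, so the selective child runs first.
        Set<IntPredicate> children = new LinkedHashSet<>();
        children.add(counting(v -> v > 1000));   // more selective, placed first
        children.add(counting(v -> v % 2 == 0)); // less selective, placed second

        boolean result = matchesAll(children, 5);
        System.out.println(result + " after " + evaluations + " evaluation(s)");
    }
}
```

With an unordered set the iteration order is unspecified, so the caller cannot rely on the selective child being tested first.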
zhangyue19921010 2590ad4f67
Historical unloads damaged segments automatically when lazy on start. (#10688)
* ready to test

* tested on dev cluster

* tested

* code review

* add UTs

* add UTs

* ut passed

* ut passed

* opti imports

* done

* done

* fix checkstyle

* modify uts

* modify logs

* changing the package of SegmentLazyLoadFailCallback.java to org.apache.druid.segment

* merge from master

* modify import orders

* merge from master

* merge from master

* modify logs

* modify docs

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

* modify logs to rerun ci

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-16 19:53:30 -08:00
Gian Merlino 2b24dc3764
SegmentAnalyzer: Properly close column after retrieving it. (#10772) 2021-01-16 19:26:34 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Xavier Léauté c5ecbf6794
fix task metric types in statsd emitter (#10764)
except success and failure stats, task count metrics should all be
gauges, since they represent the current state and not some aggregate
counter over time.
2021-01-15 11:39:51 -08:00
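
A simplified model of the distinction drawn in the commit above, assuming the usual StatsD semantics (a counter accumulates reported values, a gauge keeps the last reported value). The `Counter` and `Gauge` classes are hypothetical and are not part of the statsd emitter.

```java
public class MetricTypeSketch {
    // Counter: every reported value is added to a running total.
    static final class Counter {
        private long total;

        void record(long delta) {
            total += delta;
        }

        long value() {
            return total;
        }
    }

    // Gauge: every reported value replaces the previous one.
    static final class Gauge {
        private long current;

        void record(long observed) {
            current = observed;
        }

        long value() {
            return current;
        }
    }

    public static void main(String[] args) {
        // Hypothetical "running task count" observed at three emission periods.
        long[] runningTasks = {3, 5, 2};

        Counter counter = new Counter();
        Gauge gauge = new Gauge();
        for (long n : runningTasks) {
            counter.record(n);
            gauge.record(n);
        }

        // The counter sums the snapshots, which is not a meaningful task count;
        // the gauge reports the latest snapshot.
        System.out.println("counter=" + counter.value() + " gauge=" + gauge.value());
    }
}
```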
Gian Merlino a82910e065
OrFilter: Properly handle child matchers that return the original mask. (#10754)
* OrFilter: Properly handle child matchers that return the original mask.

This happens when a child matcher is literally true (for example,
BooleanVectorValueMatcher). In this case, OrFilter would throw this
exception from its call to removeAll while processing the next filter:

  java.lang.IllegalStateException: 'other' must be a different instance from 'this'

Also update the javadocs for VectorValueMatcher to call out that the
returned object may be the same as the input mask.

* Fix style.
2021-01-14 23:28:13 -08:00
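
The failure mode described in the commit above can be reproduced in miniature. The sketch below uses a hypothetical `Mask` class backed by `java.util.BitSet`, not Druid's VectorMatch, and the identity check in `remainingAfter` is only one possible guard, not necessarily the one the patch uses.

```java
import java.util.BitSet;

public class SelfMaskSketch {
    // Hypothetical selection mask: the set of row indexes still under consideration.
    static final class Mask {
        final BitSet rows = new BitSet();

        Mask(int... selected) {
            for (int r : selected) {
                rows.set(r);
            }
        }

        // Mirrors the contract quoted above: the argument must not be this same instance.
        void removeAll(Mask other) {
            if (other == this) {
                throw new IllegalStateException("'other' must be a different instance from 'this'");
            }
            rows.andNot(other.rows);
        }
    }

    // A matcher returns the subset of the mask that matched; it may return the mask itself.
    interface Matcher {
        Mask match(Mask mask);
    }

    // A literally-true matcher: hands back the input mask unchanged.
    static final Matcher ALWAYS_TRUE = m -> m;

    // Matches only even row indexes and returns a fresh mask.
    static final Matcher EVEN_ROWS = m -> {
        Mask out = new Mask();
        for (int r = m.rows.nextSetBit(0); r >= 0; r = m.rows.nextSetBit(r + 1)) {
            if (r % 2 == 0) {
                out.rows.set(r);
            }
        }
        return out;
    };

    // Returns how many rows remain unmatched after applying one OR child.
    static int remainingAfter(Matcher child, Mask mask) {
        Mask matched = child.match(mask);
        if (matched == mask) {
            // Everything matched; calling removeAll(mask) on itself would throw.
            return 0;
        }
        mask.removeAll(matched);
        return mask.rows.cardinality();
    }

    public static void main(String[] args) {
        System.out.println(remainingAfter(ALWAYS_TRUE, new Mask(0, 1, 2)));
        System.out.println(remainingAfter(EVEN_ROWS, new Mask(0, 1, 2, 3)));
    }
}
```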
Gian Merlino 7354953b1b
VectorMatch: Disallow "copyFrom", "addAll" on self; improve tests. (#10755)
No existing code relies on being able to call these methods in this way.

The new tests exhaustively test all vectors up to size 7, and also test
the run-on-self behavior that has been adjusted by this patch.
2021-01-14 18:29:13 -08:00
Gian Merlino 2bbf89db81
Remove FalseVectorMatcher, TrueVectorMatcher in favor of BooleanVectorValueMatcher. (#10757) 2021-01-14 18:28:25 -08:00
Vadim Ogievetsky e52db19823
treat null as not defined (#10751) 2021-01-14 18:22:59 -08:00
kaijianding 4437c6af60
use actual dataInterval in cache key (#10714)
* use actual dataInterval in cache key

* fix ut fail

* fix segmentMaxTime exclusive
2021-01-13 18:31:36 -08:00
Jihoon Son b3325c1601
Add a config for monitorScheduler type (#10732)
* Add a config for monitorScheduler type

* check interrupted

* null check

* do not schedule monitor if the previous one is still running

* checkstyle

* clean up names

* change default back to basic

* fix test
2021-01-13 17:20:43 -08:00
Jihoon Son 149306c9db
Tidy up HTTP status codes for query errors (#10746)
* Tidy up query error codes

* fix tests

* Restore query exception type in JsonParserIterator

* address review comments; add a comment explaining the ugly switch

* fix test
2021-01-13 17:20:00 -08:00
Clint Wylie 8c3c9b4060
fix limited queries with subtotals (#10743)
* i put my thing down, flip it and reverse it

* oops
2021-01-13 12:55:24 -08:00
Clint Wylie 9362dc7968
re-use expression vector evaluation results for the same offset in expression vector selectors (#10614)
* cache expression selector results by associating vector expression bindings to underlying vector offset

* better coverage, fix floats

* style

* stupid bot

* stupid me

* more test

* intellij threw me under the bus when it generated those junit methods

* narrow interface instead of passing around offset
2021-01-13 12:44:56 -08:00
Vadim Ogievetsky 2fc2938b01
Web console: fix bad results if there is not native bigint (#10741)
* fix bigint when it does not exist

* add test
2021-01-12 16:32:23 -08:00
Lucas Capistrant aecc9e5e7e
Remove legacy code from LogUsedSegments duty (#10287)
* allow the LogUsedSegments duty to be skipped

* Fixes for TravisCI coverage checks and documentation spell checking

* parameterize DruidCoordinatorTest in order to achieve coverage

* update config name to remove duty ref and improve documentation

* refine documentation for new config with reviewer advice

* add default column to docs for new config

* remove legacy code in LogUsedSegments and remove config to disable duty

* fix makeHistoricalMangementDuties now that the returned list is always the same
2021-01-12 14:09:19 -08:00
Jihoon Son ca32652932
Fix potential deadlock in batch ingestion (#10736)
* Fix potential deadlock in batch ingestion

* fix checkstyle and comment

* this is better
2021-01-12 12:50:45 -08:00
Jihoon Son 3984457e5b
Add missing unit tests for segment loading in historicals (#10737)
* Add missing unit tests for segment loading in historicals

* unused import
2021-01-11 18:20:13 -06:00
Lucas Capistrant fe0511b16a
Coordinator Dynamic Config changes to ease upgrading with new config value (#10724)
* Coordinator Dynamic Config changes to ease upgrading with new config value

* change a log to debug level following review

* changes based on review feedback

* fix checkstyle
2021-01-10 20:05:39 -08:00
Xavier Léauté 118b50195e
Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730)
Today Kafka message support in streaming indexing tasks is limited to
message values, and does not provide a way to expose Kafka headers,
timestamps, or keys, which may be of interest to more specialized
Druid input formats. For instance, Kafka headers may be used to indicate
payload format/encoding or additional metadata, and timestamps are often
omitted from values in Kafka streams applications, since they are
included in the record.

This change proposes to introduce KafkaRecordEntity as InputEntity,
which would give input formats full access to the underlying Kafka record,
including headers, key, timestamps. It would also open access to low-level
information such as topic, partition, offset if needed.

KafkaRecordEntity is a subclass of ByteEntity for backwards compatibility with
existing input formats, and to avoid introducing unnecessary complexity
for Kinesis indexing tasks.
2021-01-08 16:04:37 -08:00
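
The backwards-compatibility point in the commit above can be sketched as follows. All types here are hypothetical stand-ins (a minimal `ByteEntity`, a `Record` holding headers, timestamp, and value); they are not Druid's or Kafka's classes and only show why subclassing lets existing value-only readers keep working.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Map;

public class RecordEntitySketch {
    // Stand-in for a value-only entity that existing input formats already understand.
    static class ByteEntity {
        private final byte[] bytes;

        ByteEntity(byte[] bytes) {
            this.bytes = bytes;
        }

        byte[] getBytes() {
            return bytes;
        }
    }

    // Stand-in for a stream record carrying more than just the value.
    static final class Record {
        final Map<String, String> headers;
        final long timestamp;
        final byte[] value;

        Record(Map<String, String> headers, long timestamp, byte[] value) {
            this.headers = headers;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Subclass: still usable as a ByteEntity, but also exposes the full record.
    static final class RecordEntity extends ByteEntity {
        private final Record record;

        RecordEntity(Record record) {
            super(record.value);
            this.record = record;
        }

        Record getRecord() {
            return record;
        }
    }

    // A reader that works on any ByteEntity and opts in to record metadata when present.
    static String describe(ByteEntity entity) {
        String payload = new String(entity.getBytes(), StandardCharsets.UTF_8);
        if (entity instanceof RecordEntity) {
            Record r = ((RecordEntity) entity).getRecord();
            return payload + " headers=" + r.headers + " ts=" + r.timestamp;
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[] value = "hello".getBytes(StandardCharsets.UTF_8);
        Record record = new Record(Collections.singletonMap("encoding", "json"), 42L, value);

        System.out.println(describe(new ByteEntity(value)));
        System.out.println(describe(new RecordEntity(record)));
    }
}
```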
zhangyue19921010 2837a9b62f
[Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661)
* done

* done

* done

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:23:08 -08:00
zhangyue19921010 d5192640cb
remove extra comma (#10670)
Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-01-08 15:15:08 -08:00
Yi Yuan 3624acbcf8
fix web-console show json bug (#10710)
* fix web-console show json bug

* replace all JSON.stringify

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-01-08 14:55:55 -08:00
秦臻 c62b7c19c3
javascript filter result convert to java boolean (#10721)
* javascript filter result convert to java boolean

* use type convert replace script convert, and add more unit test

Co-authored-by: qinzhen <qinzhen@kuaishou.com>
2021-01-08 14:30:09 -08:00
Abhishek Agarwal f66fdbfa5d
add offsetFetchPeriod to kinesis ingestion doc (#10734) 2021-01-08 14:19:26 -08:00
Gian Merlino 6eef0e4c9f
Fix collision between #10689 and #10593. (#10738) 2021-01-08 09:52:27 -08:00
Aleksey Plekhanov 26bcd47e51
Thread-safety for ResponseContext.REGISTERED_KEYS (#9667) 2021-01-08 00:37:49 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
kaijianding 01e25f1e69
reuse DataSegment object when a segment found on another server (#10715) 2021-01-07 21:55:25 -08:00
Jonathan Wei c7f2d3fbb5
Update deps for CVE-2020-28168 and CVE-2020-28052 (#10733)
* Update deps for CVE-2020-28168 and CVE-2020-28052

* Make BC runtime scope
2021-01-07 20:31:44 -08:00
Makdon 1905b80ec3
Update badge for travis in README.md (#10717)
* Update README for updating travis badge

Update README cause
> This repository was migrated and is now building on travis-ci.com

* Update README.md
2021-01-07 18:39:58 -08:00
Himanshu c7b1212a43
AWS RDS token based password provider (#9518)
* refresh db pwd

* aws iam token password provider

* fix analyze-dependencies build

* fix doc build

* add ut for BasicDataSourceExt

* more doc updates

* more doc update

* moving aws token password provider to new extension

* remove duplicate changes

* make all config inline

* extension docs

* refresh db password in SQL Firehose code path as well

* add ut

* fix build

* add new extension to distribution

* rds lib is not provided

* fix license build

* add version to license

* change parent version to 0.19.0-snapshot

* address review comments

* fix core/ code coverage

* Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* address review comments

* fix spellchecker

* remove inadvertent website file change

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-01-06 21:15:29 -08:00
Gian Merlino 48e576a307
Scan query: More accurate error message when segment per time chunk limit is exceeded. (#10630)
* Scan query: More accurate error message when segment per time chunk limit is exceeded.

* Add guardrail test.
2021-01-06 14:11:28 -08:00
Makdon f9fc1892d1
Typo: missing comma in json (#10711) 2021-01-06 13:49:50 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00