10968 Commits

Author SHA1 Message Date
Gian Merlino 6d82c3cbf1
StringComparators: No need to convert to UTF-8 for lexicographic comparison. (#11171)
Lexicographic ordering of UTF-8 byte sequences and in-memory UTF-16
strings are equivalent. So, we can skip the (expensive) conversion and
get an equivalent result. Thank you, Unicode!
2021-04-30 10:54:20 -07:00
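As a rough illustration of the claim in the commit above, the sketch below compares two strings both ways: by encoding to UTF-8 and comparing unsigned bytes, and by comparing the in-memory strings directly. The class and method names are hypothetical and are not taken from StringComparators. The sketch deliberately sticks to strings without surrogate pairs; `String.compareTo` works on UTF-16 code units, so supplementary characters deserve a separate check.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual StringComparators code.
class Utf8OrderSketch {
    // Lexicographic comparison of two byte arrays, treating each byte as unsigned.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        // All shared positions are equal, so the shorter array sorts first.
        return Integer.compare(a.length, b.length);
    }

    // The expensive path: convert both strings to UTF-8, then compare bytes.
    static int compareViaUtf8(String a, String b) {
        return compareUnsignedBytes(
            a.getBytes(StandardCharsets.UTF_8),
            b.getBytes(StandardCharsets.UTF_8)
        );
    }

    // The cheap path: compare the in-memory (UTF-16) strings directly.
    static int compareDirect(String a, String b) {
        return a.compareTo(b);
    }
}
```

Only the sign of the two results is expected to agree, not the magnitude, so callers should compare `Integer.signum` of each.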
benkrug fdab95ea99
Update index.md (#11174)
tiny change for readability
2021-04-30 09:40:19 -07:00
frank chen aec65242fc
fix docker volume permissions (#11167)
Docker volume directory was accidentally removed due to reordering of statements.
This causes ownership and permissions on the volume directory to be reset, preventing startup.

fixes #11166
Signed-off-by: frank chen <frank.chen021@outlook.com>
2021-04-30 08:51:53 -07:00
Benedict Jin ed81548ff7
Add helm chart for Apache Druid (#11163)
* Add helm chart for Apache Druid

* Add license headers
2021-04-29 12:38:50 +08:00
Jihoon Son 941440afb6
Suggest using svn mv; clarify the process for doc updating (#11162) 2021-04-29 10:08:23 +08:00
Gian Merlino 7d808e357c
InDimFilter: Fix cache key computation to avoid collisions. (#11168)
The prior code did not include separation between values, and encoded
null ambiguously. This patch fixes both of those issues by encoding
strings as length + value instead of just value.

I think cache key computation was OK prior to #9800. Prior to that
patch, the cache key was computed using CacheKeyBuilder.appendStrings,
which encodes strings as UTF-8 and inserts a separator byte (0xff)
between them that cannot appear in a UTF-8 stream.
2021-04-28 17:28:29 -07:00
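The collision described in the commit above can be sketched as follows. This is a minimal illustration, not the InDimFilter code: the class name, both method names, and the choice of `-1` as the length marker for null are assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical illustration only; names and the null marker are assumptions.
class LengthPrefixedKeySketch {
    // Ambiguous encoding: values are concatenated with no separation,
    // and null is written the same way as the empty string.
    static byte[] naiveKey(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            byte[] b = (v == null ? "" : v).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Unambiguous encoding: each value is written as length + bytes,
    // and null gets a length of -1 that no real string can have.
    static byte[] lengthPrefixedKey(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String v : values) {
            if (v == null) {
                writeInt(out, -1);
            } else {
                byte[] b = v.getBytes(StandardCharsets.UTF_8);
                writeInt(out, b.length);
                out.write(b, 0, b.length);
            }
        }
        return out.toByteArray();
    }

    // Writes a 4-byte big-endian int.
    private static void writeInt(ByteArrayOutputStream out, int x) {
        byte[] b = ByteBuffer.allocate(4).putInt(x).array();
        out.write(b, 0, b.length);
    }
}
```

With the naive encoding, the value lists `["ab", "c"]` and `["a", "bc"]` produce identical keys; with the length prefix they do not.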
Gian Merlino ad028de538
InDimFilter: Fix NPE involving certain Set types. (#11169)
* InDimFilter: Fix NPE involving certain Set types.

Normally, InDimFilters that come from JSON have HashSets for "values".
However, programmatically-generated filters (like the ones from #11068)
may use other set types. Some set types, like TreeSets with natural
ordering, will throw NPE on "contains(null)", which causes the
InDimFilter's ValueMatcher to throw NPE if it encounters a null value.

This patch adds code to detect if the values set can support
contains(null), and if not, wrap that in a null-checking lambda.

Also included:

- Remove unneeded NullHandling.needsEmptyToNull method.
- Update IndexedTableJoinable to generate a TreeSet that does not
  require lambda-wrapping. (This particular TreeSet is how I noticed
  the bug in the first place.)

* Test fixes.

* Improve test coverage
2021-04-28 14:13:42 -07:00
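The null-handling approach described in the commit above might look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, and a plain `Predicate<String>` stands in for the filter's ValueMatcher.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration only; not the actual InDimFilter code.
class NullSafeContainsSketch {
    // Returns a matcher that never throws on null input, regardless of
    // which Set implementation backs the values.
    static Predicate<String> nullSafeMatcher(Set<String> values) {
        boolean supportsNull;
        try {
            // Probe once: sets such as a TreeSet with natural ordering
            // throw NullPointerException here.
            values.contains(null);
            supportsNull = true;
        } catch (NullPointerException e) {
            supportsNull = false;
        }

        if (supportsNull) {
            // Safe to delegate directly, with no extra check per call.
            return values::contains;
        }
        // Wrap in a null-checking lambda; a set that rejects null
        // cannot contain it, so null never matches.
        return v -> v != null && values.contains(v);
    }
}
```

Probing once up front keeps the common case (a HashSet from JSON) free of any per-row overhead.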
Jeet Patel 7139c60868
Change the `id` for `kubernetes` doc link to work (#11176)
* Change the `id` for doc link to work

* Added `druid-kubernetes-extensions` to the list
2021-04-28 10:12:28 -07:00
Jihoon Son 8215cc3238
Unit test for DefaultOperandTypeChecker (#11152)
* Less strict operand type check and implicit casting

* fix ci

* Clean up unnecessary changes

* more cleanup

* unused import
2021-04-27 18:47:38 -07:00
Jihoon Son 261c1f271f
Keep traitSet of logicalValues (#11138) 2021-04-27 18:45:23 -07:00
Clint Wylie 57ddae782e
fix serde issues with time-min-max extension (#11146)
* fix serde issues with time-min-max extension

* fix pom dependencies
2021-04-27 10:33:13 -07:00
Vadim Ogievetsky 1b07d554a8
Misc QueryView UX improvements (#11158) 2021-04-27 10:26:01 -07:00
Xavier Léauté 0296f20551
upgrade Apache Kafka to 2.8.0 (#11139)
* upgrade to Apache Kafka 2.8.0 (release notes:
  https://downloads.apache.org/kafka/2.8.0/RELEASE_NOTES.html)
* pass Kafka version as a Docker argument in integration tests
  to keep in sync with maven version
* fix use of internal Kafka APIs in integration tests
2021-04-24 08:27:07 -07:00
Jeet Patel 31042cddf5
Fix `defaultMetricDimensions.json` path link (#11156) 2021-04-24 11:08:03 +08:00
Gian Merlino a47c0d2579
Clarify meaning of "root-level fields" in the documentation. (#11143) 2021-04-24 11:06:08 +08:00
Harini Rajendran 8a3be6bccc
Fix TimeSeriesUnionQueryRunnerTest by extending InitializedNullHandlingTest (#11154) 2021-04-23 08:56:03 -07:00
John Gozde 9745d9e1c3
Web console: Switch to ESLint (#11142)
* Initial eslint config

* I guess eslint sorts underscores differently

* Trim curlies (in jsx)

* Re-organize rules

* Use consistent quote props

* Restructure eslint rules as additions/overrides to recommended configs

* Fix the 'recommended' stuff

* Add prefer-readonly

* Add prefer-object-spread

* Prettify

* Add eslint-plugin-react-hooks

* Switch to eslint-plugin-simple-sort-order

So much better

* Add no-extraneous-dependencies

* ban-tslint-comment for funzies

* If we enabled no-shadow, we'd probably want this option

* Not prefer-for-of

* no-confusing-void-expression, no-confusing-non-null-assertion

* Add some no-unnecessary-* rules

* non-nullable-type-assertion-style!

* prefer-includes

* Reorganize

* prefer-things

* switch-exhaustiveness-check

* We don't need the jsdoc plugin, prettier has our backs

* Remove a useless rule

* Drop TSLint and (temporarily) awesome-code-style

* Removing Object.assign revealed a type issue

* Bring back awesome-code-style for sasslint config

* Disable react/jsx-no-target-blank

* Add prettify-check script

* Add license to eslint config

* Format readme

* Update README for eslint, IDE settings

* Add 'autofix' script

* Switch to @awesome-code-style
2021-04-22 19:33:03 -07:00
Clint Wylie 57ff1f9cdb
expression aggregator (#11104)
* add experimental expression aggregator

* add test

* fix lgtm

* fix test

* adjust test

* use not null constant

* array_set_concat docs

* add equals and hashcode and tostring

* fix it

* spelling

* do multi-value magic for expression agg, more javadocs, tests

* formatting

* fix inspection

* more better

* nullable
2021-04-22 18:30:16 -07:00
Jonathan Wei 49a9c3ffb7
Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)
This reverts commit a2892d9c40.
2021-04-22 15:33:27 -07:00
zachjsh a2892d9c40
Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)
* Do stuff

* Do more stuff

* * Do more stuff

* * Do more stuff

* * working

* * cleanup

* * more cleanup

* * more cleanup

* * add license header

* * Add unit tests

* * add java docs

* * add more unit tests

* * Cleanup test

* * Move removing of workingPath to index task rather than in hadoop job.

* * Address review comments

* * remove unused import

* * Address review comments

* Do not overwrite segment descriptor for segment if it already exists.

* * add comments to FileSystemHelper class

* * fix local hadoop integration test
2021-04-21 12:24:31 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Charles Smith 09dcf6aa36
fix syntax error for loadstatus api (#11136) 2021-04-20 14:17:20 +08:00
Vadim Ogievetsky 4caa221d72
Web console: Better inline docs (#11128)
* better highlight

* better highlighting

* add spec
2021-04-19 14:36:53 -07:00
John Gozde fdc3c2f362
Web console: update dev dependencies (#11119)
* Update some dev dependencies, prettify, tslint-fix

* Sort tsconfig keys for easy comparison

* Set noImplicitThis

* Slightly more accurate types

* Bump Jest and related

* Bump react to latest on v16

* Bump node-sass, sass-loader for node14 support

* Remove node-sass-chokidar (unused)

* More unused dependencies

* Fix blueprint imports

* Webpack 5

* Update webpack config for 'process' usage

* Update playwright-chromium

* Emit esnext modules for tree shaking

* Enable source maps in development

* Dedupe

* Bump babel and things

* npm audit fix

* Add .editorconfig file to match prettier settings

* Update licenses (tslib is 0BSD as of 1.11.2)

https://github.com/microsoft/tslib/pull/96

* Require node >= 10

* Use Node 10 to run e2e tests

* Use 'ws' transport mode for dev server (will be default in next version)

* Remove an 'any'

* No sourcemaps in prod

* Exclude .editorconfig from license checks

* Try nvm for setting node version
2021-04-16 20:15:19 -07:00
Gian Merlino f2b54de205
Vectorized versions of HllSketch aggregators. (#11115)
* Vectorized versions of HllSketch aggregators.

The patch uses the same "helper" approach as #10767 and #10304, and
extends the tests to run in both vectorized and non-vectorized modes.

Also includes some minor changes to the theta sketch vector aggregator:

- Cosmetic changes to make the hll and theta implementations look
  more similar.
- Extends the theta SQL tests to run in vectorized mode.

* Updates post-code-review.

* Fix javadoc.
2021-04-16 18:45:46 -07:00
Sandeep 26d1074ade
[Security] Bump netty4.version from 4.1.48.Final to 4.1.63.Final (#11117) 2021-04-16 10:32:22 +08:00
Gian Merlino cb7c6ac314
Doc updates for union datasources. (#11103)
The main one is updating datasources.md to talk about SQL. (It still said
that table unions are not supported in SQL.) Also, this doc update adds
some clarifying details on limitations.
2021-04-14 18:18:14 -07:00
Gian Merlino 202c78c8f3
Enable rewriting certain inner joins as filters. (#11068)
* Enable rewriting certain inner joins as filters.

The main logic for doing the rewrite is in JoinableFactoryWrapper's
segmentMapFn method. The requirements are:

- It must be an inner equi-join.
- The right-hand columns referenced by the condition must not contain any
  duplicate values. (If they did, the inner join would not be guaranteed
  to return at most one row for each left-hand-side row.)
- No columns from the right-hand side can be used by anything other than
  the join condition itself.

HashJoinSegmentStorageAdapter is also modified to pass through to
the base adapter (even allowing vectorization!) in the case where 100%
of join clauses could be rewritten as filters.

In support of this goal:

- Add Query getRequiredColumns() method to help us figure out whether
  the right-hand side of a join datasource is being used or not.
- Add JoinConditionAnalysis getRequiredColumns() method to help us
  figure out if the right-hand side of a join is being used by later
  join clauses acting on the same base.
- Add Joinable getNonNullColumnValuesIfAllUnique method to enable
  retrieving the set of values that will form the "in" filter.
- Add LookupExtractor canGetKeySet() and keySet() methods to support
  LookupJoinable in its efforts to implement the new Joinable method.
- Add "enableRewriteJoinToFilter" feature flag to
  JoinFilterRewriteConfig. The default is disabled.

* Test improvements.

* Test fixes.

* Avoid slow size() call.

* Remove invalid test.

* Fix style.

* Fix mistaken default.

* Small fixes.

* Fix logic error.
2021-04-14 10:49:27 -07:00
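The uniqueness requirement listed in the commit above can be illustrated with a small sketch. Everything here is hypothetical: the class, the method names, and the use of plain string lists in place of Druid's Joinable and segment abstractions. Skipping null keys is also an assumption, chosen to mirror the "non-null" wording in the commit message.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical illustration only; not the JoinableFactoryWrapper logic.
class JoinToFilterSketch {
    // Returns the set of non-null right-hand key values if every value is
    // unique, or empty if a duplicate makes the rewrite unsafe.
    static Optional<Set<String>> nonNullValuesIfAllUnique(List<String> rightKeys) {
        Set<String> seen = new HashSet<>();
        for (String v : rightKeys) {
            if (v == null) {
                continue; // assumption: null keys are skipped
            }
            if (!seen.add(v)) {
                // A duplicate could produce more than one output row per
                // left-hand row, so an "in" filter would not be equivalent.
                return Optional.empty();
            }
        }
        return Optional.of(seen);
    }

    // Applies the rewritten join: keep the left-hand keys that appear in
    // the right-hand value set, as an "in" filter would.
    static List<String> filterLeft(List<String> leftKeys, Set<String> inValues) {
        List<String> out = new ArrayList<>();
        for (String k : leftKeys) {
            if (k != null && inValues.contains(k)) {
                out.add(k);
            }
        }
        return out;
    }
}
```

A caller would fall back to the regular join path whenever `nonNullValuesIfAllUnique` returns an empty Optional.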
Charles Smith b51632b0bf
Update security overview with additional recommendations (#11016)
* update security overview with additional recommendations for improved security

* address first set of review questions

* Update docs/operations/security-overview.md

* Update docs/operations/security-overview.md

* apply changes from review

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update docs/operations/security-overview.md

Co-authored-by: Suneet Saldanha <suneet@apache.org>

* Update security-overview.md

fix additional comments & typos cc: @suneet-s, @jihoonsoon

Co-authored-by: Suneet Saldanha <suneet@apache.org>
2021-04-14 08:58:17 -07:00
Maytas Monsereenusorn f968400170
Introduce a new configuration that skips storing the audit payload if the payload size exceeds the limit, and skips storing null fields for the audit payload (#11078)
* Add config to skip storing audit payload if exceed limit

* fix checkstyle

* change config name

* skip null fields for audit payload

* fix checkstyle

* address comments

* fix guice

* fix test

* add tests

* address comments

* address comments

* address comments

* fix checkstyle

* address comments

* fix test

* fix test

* address comments

* Address comments

Co-authored-by: Jihoon Son <jihoonson@apache.org>

Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-04-13 20:18:28 -07:00
Clint Wylie 08d3786738
improve bitmap vector offset to report contiguous groups (#11039)
* improve bitmap vector offset to report contiguous groups

* benchmark style

* check for contiguous in getOffsets, tests for exceptions
2021-04-13 11:47:01 -07:00
chenyuzhi459 b8423a38df
add round test (#11088)
* add round test

* code style

* handle null val for round function

* handle null val for round function

* support null for round

* fix compatibility

* fix test

* fix test

* code style

* optimize format
2021-04-13 11:36:32 -07:00
Gian Merlino c158207ab6
Rename BitmapOperationTest base class to avoid flaky test. (#11102)
PR #10936 renamed BitmapBenchmark, the parent of a couple of bitmap tests, to
BitmapOperationTest. This patch renames it to BitmapOperationTestBase so JUnit
doesn't pick it up as a test case. When JUnit picks it up, it becomes a flaky
test, since its behavior and correctness depend on whether it runs before
or after its subclasses.
2021-04-13 08:01:15 -07:00
Gian Merlino c8e394015d
LongsLongEncodingReader: Implement "duplicate", fixing concurrency bug. (#11098)
Regression introduced in #11004 due to overzealous optimization. Even though
we replaced stateful usage of ByteBuffer with stateless usage of Memory, we
still need to create a new object on "duplicate" due to semantics of setBuffer.
2021-04-13 08:01:01 -07:00
bergmt2000 f60d8ea1c3
Update index.md (#11105)
Fix json typo in readme for granularitySpec in compaction config example
2021-04-13 16:26:36 +08:00
Jihoon Son 25db8787b3
Fix CAST being ignored when aggregating on strings after cast (#11083)
* Fix CAST being ignored when aggregating on strings after cast

* fix checkstyle and dependency

* unused import
2021-04-12 22:21:24 -07:00
Yi Yuan 0e0c1a1aaf
add protobuf inputformat (#11018)
* add protobuf inputformat

* repair pom

* alter intermediateRow to type of Dynamicmessage

* add document

* refine test

* fix document

* add protoBytesDecoder

* refine document and add ser test

* add hash

* add schema registry ser test

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-04-12 22:03:13 -07:00
Yi Yuan d0a94a8c14
add avro stream input format (#11040)
* add avro stream input format

* bug fixed

* add document

* doc fix

* change doc

* add integration test

* bug fixed

* bug fixed

* add string as binary getter

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-04-12 21:53:41 -07:00
Gian Merlino c3faa24f26
DataSchema: Improve duplicate-column error message. (#11082)
* DataSchema: Improve duplicate-column error message.

Now, when duplicate columns are specified, the error message will include
information about where those duplicate columns were seen. Also, if there
are multiple duplicate columns, all will be listed in the error message
instead of just the first one encountered.

* Fix style for checkstyle.

* Further improve error message.
2021-04-12 19:03:15 -07:00
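The improved error reporting described in the commit above could be sketched like this. The class name, method name, input shape (a map from the place a column was declared to the column names found there), and message wording are all assumptions for illustration; they are not taken from DataSchema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the actual DataSchema code.
class DuplicateColumnSketch {
    // Builds one message listing every duplicated column together with
    // the places it was seen. Returns "" when there are no duplicates.
    static String describeDuplicates(Map<String, List<String>> columnsBySource) {
        // Column name -> sources where it appears (sorted by column name
        // so the message is deterministic).
        Map<String, List<String>> seenIn = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : columnsBySource.entrySet()) {
            for (String column : e.getValue()) {
                seenIn.computeIfAbsent(column, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Collect every duplicate rather than stopping at the first one.
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : seenIn.entrySet()) {
            if (e.getValue().size() > 1) {
                problems.add("[" + e.getKey() + "] seen in " + String.join(", ", e.getValue()));
            }
        }
        return problems.isEmpty() ? "" : "Duplicate columns: " + String.join("; ", problems);
    }
}
```

Passing an insertion-ordered map (for example a `LinkedHashMap`) keeps the listed sources in the order they were declared.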
Suneet Saldanha c86178aaeb
Suppress CVE in libthrift (#11093) 2021-04-12 18:13:42 -07:00
Jihoon Son a6a2758095
More unit tests for JsonParserIterator; Integration tests for query errors (#11091)
* unit tests for timeout exception in init

* integration tests

* run integration test on travis

* fix inspection
2021-04-12 15:08:50 -07:00
Vadim Ogievetsky 8432d82c48
Web console: Do not put __time in the dimensions list (#11085)
* Do not make time dimensions

* update e2e test
2021-04-12 09:48:10 -07:00
BIGrey d33fdd093b
Nested GroupBy query got wrong/empty result when using virtual column and filter (#11081)
* fix nested groupby got empty result when using virtual column

* move to query.getVirtualColumns().wrap instead of new VirtualizedColumnSelectorFactory

* move test to GroupByQueryRunnerTest

* Update processing/src/test/java/org/apache/druid/query/groupby/GroupByQueryRunnerTest.java

Co-authored-by: huagnhui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Jihoon Son <jihoonson@apache.org>
2021-04-10 21:29:41 -07:00
zhangyue19921010 95b82dd325
Add missing API references for coordinator (#10967)
* add miss API references for coordinator

* add miss API references for coordinator

* add miss API references for coordinator

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-04-09 18:20:47 -07:00
Makdon d939420f23
Update SketchAggregator.java for removing duplicated parentheses (#11021)
* Update SketchAggregator.java

* Add test for sketches aggregator update union with double
2021-04-09 17:11:25 -07:00
Jonathan Wei e7b2ecd0fd
Add retry around query loop in ITWikipediaQueryTest.testQueryLaningLaneIsLimited (#11077) 2021-04-09 10:54:34 -07:00
Maytas Monsereenusorn 4576152e4a
Make dropExisting flag for Compaction configurable and add warning documentations (#11070)
* Make dropExisting flag for Compaction configurable

* fix checkstyle

* fix checkstyle

* fix test

* add tests

* fix spelling

* fix docs

* add IT

* fix test

* fix doc

* fix doc
2021-04-09 00:12:28 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compaction TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erroneous newline
2021-04-08 21:03:00 -07:00
Clint Wylie 338886fd5f
vector group by support for string expressions (#11010)
* vector group by support for string expressions

* fix test

* comments, javadoc
2021-04-08 19:23:39 -07:00
zhangyue19921010 de691808ce
[Bug]Kinesis-data-format IT can not work (#11071)
* start schema-registry and replace json template

* add docs

Co-authored-by: yuezhang <yuezhang@freewheel.tv>
2021-04-08 15:50:04 -07:00