Commit Graph

306 Commits

Author SHA1 Message Date
Gian Merlino babf00f8e3
Migrate File.mkdirs to FileUtils.mkdirp. (#11879)
* Migrate File.mkdirs to FileUtils.mkdirp.

* Remove unused imports.

* Fix LookupReferencesManager.

* Simplify.

* Also migrate usages of forceMkdir.

* Fix var name.

* Fix incorrect call.

* Update test.
2021-11-09 11:10:49 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Clint Wylie 907e4ca0c5
use correct DimensionSpec with for column value selectors created from dictionary encoded column indexers (#11873)
* use correct dimension spec for column value selectors of dictionary encoded column indexers
2021-11-05 01:51:15 -07:00
Liran Funaro 9ca8f1ec97
Remove IncrementalIndex template modifier (#11160)
Co-authored-by: Liran Funaro <liran.funaro@verizonmedia.com>
2021-10-27 13:10:37 -07:00
Gian Merlino fc95c92806
Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs. (#11124)
* Remove OffheapIncrementalIndex and clarify aggregator thread-safety needs.

This patch does the following:

- Removes OffheapIncrementalIndex.
- Clarifies that Aggregators are required to be thread safe.
- Clarifies that BufferAggregators and VectorAggregators are not
  required to be thread safe.
- Removes thread safety code from some DataSketches aggregators that
  had it. (Not all of them did, and that's OK, because it wasn't necessary
  anyway.)
- Makes enabling "useOffheap" with groupBy v1 an error.

Rationale for removing the offheap incremental index:

- It is only used in one rare scenario: groupBy v1 (which is non-default)
  in "useOffheap" mode (also non-default). So you have to go pretty deep
  into the wilderness to get this code to activate in production. It is
  never used during ingestion.
- Its existence complicates developer efforts to reason about how
  aggregators get used, because the way it uses buffer aggregators is so
  different from how every other query engine uses them.
- It doesn't have meaningful testing.

By the way, I do believe that the given way the offheap incremental index
works, it actually didn't require buffer aggregators to be thread-safe.
It synchronizes on "aggregate" and doesn't call "get" until it has
stopped calling "aggregate". Nevertheless, this is a bother to think about,
and for the above reasons I think it makes sense to remove the code anyway.

* Remove things that are now unused.

* Revert removal of getFloat, getLong, getDouble from BufferAggregator.

* OAK-related warnings, suppressions.

* Unused item suppressions.
2021-10-26 08:05:56 -07:00
Alexander Saydakov 8cf1cbc4a9
latest datasketches-java and datasketches-memory (#11773)
* latest datasketches-java and datasketches-memory

* updated versions of datasketches-java and datasketches-memory

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2021-10-19 23:42:30 -07:00
Clint Wylie 187df58e30
better types (#11713)
* better type system

* needle in a haystack

* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support

* fixup merge

* more test

* fixup

* intern

* fix

* oops

* oops again

* ...

* more test coverage

* fix error message

* adjust interning, more javadocs

* oops

* more docs more better
2021-10-19 01:47:25 -07:00
Clint Wylie 392f0ca1b5
refactor sql authorization to get resource type from schema, resource type to be string (#11692)
* refactor sql authorization to get resource type from schema, refactor resource type from enum to string

* information schema auth filtering adjustments

* refactor

* minor stuff

* Update SqlResourceCollectorShuttle.java
2021-09-17 09:53:25 -07:00
Clint Wylie fe1d8c206a
bump version to 0.23.0-SNAPSHOT (#11670) 2021-09-08 15:56:04 -07:00
Jihoon Son 7e90d00cc0
Configurable maxStreamLength for doubles sketches (#11574)
* Configurable maxStreamLength for doubles sketches

* fix equals/hashcode and it test failure

* fix test

* fix it test

* benchmark

* doc

* grouping key

* fix comment

* dependency check

* Update docs/development/extensions-core/datasketches-quantiles.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

* Update docs/querying/sql.md

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-08-31 14:56:37 -07:00
dependabot[bot] cf674c833c
Bump maven-resources-plugin from 3.1.0 to 3.2.0 (#11525)
Bumps [maven-resources-plugin](https://github.com/apache/maven-resources-plugin) from 3.1.0 to 3.2.0.
- [Release notes](https://github.com/apache/maven-resources-plugin/releases)
- [Commits](https://github.com/apache/maven-resources-plugin/compare/maven-resources-plugin-3.1.0...maven-resources-plugin-3.2.0)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-resources-plugin
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2021-08-02 09:38:34 -07:00
Jihoon Son 98312d54cf
Fix CI for master (#11522) 2021-07-30 15:41:21 -07:00
Yuanli Han b83742179a
Reduce method invocation of reservoir sampling (#11257)
* reduce method invocation of reservoir sampling

* add a dynamic parameter and add benchmark

* rebase
2021-07-30 22:09:50 +08:00
Lucas Capistrant 9767b42e85
Add a new metric query/segments/count that is not emitted by default (#11394)
* Add a new metric query/segments/count that is not emitted by default

* docs

* test the default implementation of the metric

* fix spelling error in docs

* document the fact that query retries will result in additional metric emissions

* update using recommended text from @jihoonson
2021-07-22 17:57:35 -07:00
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
frank chen 906a704c55
Eliminate ambiguities of KB/MB/GB in the doc (#11333)
* GB ---> GiB

* suppress spelling check

* MB --> MiB, KB --> KiB

* Use IEC binary prefix

* Add reference link

* Fix doc style
2021-06-30 13:42:45 -07:00
Xavier Léauté 712f2a5d00
upgrade error-prone to 2.7.1 and support checks with Java 11+ (#11363)
* upgrade error-prone to 2.7.1 and support checks with Java 11+

- upgrade error-prone to 2.7.1
- support running error-prone with Java 11 and above using -Xplugin
  instead of custom compiler
- add compiler arguments to ignore warnings/errors in Java 15/16
- introduce strictCompile property to enable strict profiles since we
  now need multiple strict profiles for Java 8
- properly exclude all generated source files from error-prone
- fix druid-processing overriding annotation processors from parent pom
- fix druid-core disabling most non-default checks
- align plugin and annotation errorprone versions
- fix / suppress additional issues found by error-prone:
  * fix bug in SeekableStreamSupervisor initializing ArrayList size with
    the taskGroupdId
  * fix missing @Override annotations
- remove outdated compiler plugin in benchmarks
- remove deleted ParameterPackage error-prone rule
- re-enable checks on benchmark module as well

* fix IntelliJ inspections

* disable LongFloatConversion due to bug in error-prone with JDK 8

* add comment about InsecureCrypto
2021-06-16 12:55:34 -07:00
dependabot[bot] 167044f715
Bump fastutil from 8.2.3 to 8.5.4 (#11347)
* Bump fastutil from 8.2.3 to 8.5.4

Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4.
- [Release notes](https://github.com/vigna/fastutil/releases)
- [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES)
- [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4)

---
updated-dependencies:
- dependency-name: it.unimi.dsi:fastutil
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml
* update maven dependency list for -core and -extra libraries to pass maven dependency checks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2021-06-10 07:43:18 -07:00
Gian Merlino 202c78c8f3
Enable rewriting certain inner joins as filters. (#11068)
* Enable rewriting certain inner joins as filters.

The main logic for doing the rewrite is in JoinableFactoryWrapper's
segmentMapFn method. The requirements are:

- It must be an inner equi-join.
- The right-hand columns referenced by the condition must not contain any
  duplicate values. (If they did, the inner join would not be guaranteed
  to return at most one row for each left-hand-side row.)
- No columns from the right-hand side can be used by anything other than
  the join condition itself.

HashJoinSegmentStorageAdapter is also modified to pass through to
the base adapter (even allowing vectorization!) in the case where 100%
of join clauses could be rewritten as filters.

In support of this goal:

- Add Query getRequiredColumns() method to help us figure out whether
  the right-hand side of a join datasource is being used or not.
- Add JoinConditionAnalysis getRequiredColumns() method to help us
  figure out if the right-hand side of a join is being used by later
  join clauses acting on the same base.
- Add Joinable getNonNullColumnValuesIfAllUnique method to enable
  retrieving the set of values that will form the "in" filter.
- Add LookupExtractor canGetKeySet() and keySet() methods to support
  LookupJoinable in its efforts to implement the new Joinable method.
- Add "enableRewriteJoinToFilter" feature flag to
  JoinFilterRewriteConfig. The default is disabled.

* Test improvements.

* Test fixes.

* Avoid slow size() call.

* Remove invalid test.

* Fix style.

* Fix mistaken default.

* Small fixes.

* Fix logic error.
2021-04-14 10:49:27 -07:00
Clint Wylie 08d3786738
improve bitmap vector offset to report contiguous groups (#11039)
* improve bitmap vector offset to report contiguous groups

* benchmark style

* check for contiguous in getOffsets, tests for exceptions
2021-04-13 11:47:01 -07:00
Jihoon Son 25db8787b3
Fix CAST being ignored when aggregating on strings after cast (#11083)
* Fix CAST being ignored when aggregating on strings after cast

* fix checkstyle and dependency

* unused import
2021-04-12 22:21:24 -07:00
Maytas Monsereenusorn 4576152e4a
Make dropExisting flag for Compaction configurable and add warning documentations (#11070)
* Make dropExisting flag for Compaction configurable

* fix checkstyle

* fix checkstyle

* fix test

* add tests

* fix spelling

* fix docs

* add IT

* fix test

* fix doc

* fix doc
2021-04-09 00:12:28 -07:00
Clint Wylie 338886fd5f
vector group by support for string expressions (#11010)
* vector group by support for string expressions

* fix test

* comments, javadoc
2021-04-08 19:23:39 -07:00
Clint Wylie c0e6d1c7f8
vectorize 'auto' long decoding (#11004)
* Vectorize LongDeserializers.

Also, add many more tests.

* more faster

* more more faster

* more cleanup

* fixes

* forbidden

* benchmark style

* idk why

* adjust

* add preconditions for value >= 0 for writers

* add 64 bit exception

Co-authored-by: Gian Merlino <gian@imply.io>
2021-03-26 18:39:13 -07:00
Xavier Léauté 7a68cd8b86
use maven enforcer to check maven version (#10977)
* removes a warning about prerequisites only being allowed for plugins
* update maven enforcer plugin to the latest version (3.0.0-M3)
2021-03-11 07:30:10 -08:00
Yi Yuan 36e86a2880
Add protobuf schema registry (#10839)
* dd_protobuf_schema_registry

* change licese

* delete some annotation

* nodify tests

* delete extra exception

* add licenses

* add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version

* seperate kafka-protobuf-provider

* modify protobuf.md

* refine protobuf.md

* add config and header

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-03-09 15:15:51 -08:00
Jihoon Son 2c30f8b3b7
Migrate bitmap benchmarks to JMH (#10936)
* Migrate bitmap benchmarks to JMH

* add concise
2021-03-04 12:50:55 -08:00
Abhishek Agarwal 1a15987432
Supporting filters in the left base table for join datasources (#10697)
* where filter left first draft

* Revert changes in calcite test

* Refactor a bit

* Fixing the Tests

* Changes

* Adding tests

* Add tests for correlated queries

* Add comment

* Fix typos
2021-03-04 10:39:21 -08:00
Maytas Monsereenusorn 6541178c21
Support segmentGranularity for auto-compaction (#10843)
* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* resolve conflict

* Support segmentGranularity for auto-compaction

* Support segmentGranularity for auto-compaction

* fix tests

* fix more tests

* fix checkstyle

* add unit tests

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add unit tests

* add integration tests

* fix checkstyle

* fix checkstyle

* fix failing tests

* address comments

* address comments

* fix tests

* fix tests

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test

* fix test
2021-02-12 03:03:20 -08:00
Clint Wylie fe30f4b414
refactor sql lifecycle, druid planner, views, and view permissions (#10812)
* before i leaped i should've seen, the view from halfway down

* fixes

* fixes, more test

* rename

* fix style

* further refactoring

* review stuffs

* rename

* more javadoc and comments
2021-02-05 12:56:55 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Liran Funaro 08ab82f55c
IncrementalIndex Tests and Benchmarks Parametrization (#10593)
* Remove redundant IncrementalIndex.Builder

* Parametrize incremental index tests and benchmarks

- Reveal and fix a bug in OffheapIncrementalIndex

* Fix forbiddenapis error: Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]

* Fix Intellij errors: declared exception is never thrown

* Add documentation and validate before closing objects on tearDown.

* Add documentation to OffheapIncrementalIndexTestSpec

* Doc corrections and minor changes.

* Add logging for generated rows.

* Refactor new tests/benchmarks.

* Improve IncrementalIndexCreator documentation

* Add required tests for DataGenerator

* Revert "rollupOpportunity" to be a string
2021-01-07 22:18:47 -08:00
Jonathan Wei 68bb038b31
Multiphase segment merge for IndexMergerV9 (#10689)
* Multiphase merge for IndexMergerV9

* JSON fix

* Cleanup temp files

* Docs

* Address logging and add IT

* Fix spelling and test unloader datasource name
2021-01-05 22:19:09 -08:00
Jihoon Son d47d6cf081
Add time-to-first-result benchmark for groupBy (#10612) 2020-12-01 10:32:37 -08:00
Himanshu 30bcb0fd74
DataSourcesSnapshotBenchmark to measure iterateAllUsedSegmentsInSnapshot perf (#10604) 2020-11-29 14:42:14 -08:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Suneet Saldanha 0b4c897fbe
Vectorized variance aggregators (#10390)
* wip vectorize

* close but not quite

* faster

* unit tests

* fix complex types for variance
2020-09-17 15:05:40 -07:00
Clint Wylie 084b23deed
benchmark for indexed table experiments (#10327)
* benchmark for indexed table experiments

* fix style

* teardown outside of measurement
2020-09-14 15:14:38 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Suneet Saldanha a9de00d43a
Remove NUMERIC_HASHING_THRESHOLD (#10313)
* Make NUMERIC_HASHING_THRESHOLD configurable

Change the default numeric hashing threshold to 1 and make it configurable.

Benchmarks attached to this PR show that binary searches are not more faster
than doing a set contains check. The attached flamegraph shows the amount of
time a query spent in the binary search. Given the benchmarks, we can expect
to see roughly a 2x speed up in this part of the query which works out to
~ a 10% faster query in this instance.

* Remove NUMERIC_HASHING_THRESHOLD

* Remove stale docs
2020-08-25 20:05:39 -07:00
Clint Wylie c72f96a4ba
fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248)
* fix bug with realtime expressions on sparse string columns

* fix test

* add comment back

* push capabilities for dimensions to dimension indexers since they know things

* style

* style

* fixes

* getting a bit carried away

* missed one

* fix it

* benchmark build fix

* review stuffs

* javadoc and comments

* add comment

* more strict check

* fix missed usaged of impl instead of interface
2020-08-11 11:07:17 -07:00
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jihoon Son 657f8ee80f
Fix RetryQueryRunner to actually do the job (#10082)
* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate
2020-07-01 14:02:21 -07:00
Gian Merlino 5faa897a34
Join filter pre-analysis simplifications and sanity checks. (#10104)
* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.
2020-06-30 19:14:22 -07:00
xhl0726 1596b3eacd
Optimize protobuf parsing for flatten data (#9999)
* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
2020-06-24 18:01:31 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00