Commit Graph

156 Commits

Author SHA1 Message Date
Clint Wylie 17efa6f556
add single input string expression dimension vector selector and better expression planning (#11213)
* add single input string expression dimension vector selector and better expression planning

* better

* fixes

* oops

* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing

* oops

* javadocs, renaming

* more javadocs

* benchmarks

* use string expression vector processor with vector size 1 instead of expr.eval

* better logging

* javadocs, surprising number of the the

* more

* simplify
2021-07-06 11:20:49 -07:00
dependabot[bot] 167044f715
Bump fastutil from 8.2.3 to 8.5.4 (#11347)
* Bump fastutil from 8.2.3 to 8.5.4

Bumps [fastutil](https://github.com/vigna/fastutil) from 8.2.3 to 8.5.4.
- [Release notes](https://github.com/vigna/fastutil/releases)
- [Changelog](https://github.com/vigna/fastutil/blob/master/CHANGES)
- [Commits](https://github.com/vigna/fastutil/compare/8.2.3...8.5.4)

---
updated-dependencies:
- dependency-name: it.unimi.dsi:fastutil
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* update licenses.yaml
* update maven dependency list for -core and -extra libraries to pass maven dependency checks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xavier Léauté <xvrl@apache.org>
2021-06-10 07:43:18 -07:00
Clint Wylie f6662b4893
fix count and average SQL aggregators on constant virtual columns (#11208)
* fix count and average SQL aggregators on constant virtual columns

* style

* even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry

* oops missed a few

* remove unused

* this will fix it
2021-05-10 13:41:48 -07:00
Clint Wylie 691d7a1d54
SQL timeseries no longer skip empty buckets with all granularity (#11188)
* SQL timeseries no longer skip empty buckets with all granularity

* add comment, fix tests

* the ol switcheroo

* revert unintended change

* docs and more tests

* style

* make checkstyle happy

* docs fixes and more tests

* add docs, tests for array_agg

* fixes

* oops

* doc stuffs

* fix compile, match doc style
2021-05-10 10:13:37 -07:00
Jihoon Son 25db8787b3
Fix CAST being ignored when aggregating on strings after cast (#11083)
* Fix CAST being ignored when aggregating on strings after cast

* fix checkstyle and dependency

* unused import
2021-04-12 22:21:24 -07:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Abhishek Agarwal a5c46dc84b
Add vectorization for druid-histogram extension (#10304)
* First draft

* Remove redundant code from FixedBucketsHistogramAggregator classes

* Add test cases for new classes

* Fix tests in sql compatible mode

* Typo fix

* Fix comment

* Add spelling

* Vectorize only for supported types

* Rename internal aggregator files

* Fix tests
2020-09-09 13:56:33 -07:00
Clint Wylie ab60661008
refactor internal type system (#9638)
* better type tracking: add typed postaggs, finalized types for agg factories

* more javadoc

* adjustments

* transition to getTypeName to be used exclusively for complex types

* remove unused fn

* adjust

* more better

* rename getTypeName to getComplexTypeName

* setup expression post agg for type inference existing

* more javadocs

* fixup

* oops

* more test

* more test

* more comments/javadoc

* nulls

* explicitly handle only numeric and complex aggregators for incremental index

* checkstyle

* more tests

* adjust

* more tests to showcase difference in behavior

* timeseries longsum array
2020-08-26 10:53:44 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Clint Wylie 2c0746cfab
increase druid-histogram postagg test coverage (#9732) 2020-05-07 00:10:29 -07:00
Suneet Saldanha 332ca19621
Fix potential integer overflow issues (#9609)
ApproximateHistogram - seems unlikely
SegmentAnalyzer - unclear if this is an actual issue
GenericIndexedWriter - unclear if this is an actual issue
IncrementalIndexRow and OnheapIncrementalIndex are non-issues becaus it's very
unlikely for the number of dims to be large enough to hit the overflow
condition
2020-04-10 11:47:08 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Gian Merlino 1ef25a438f
Broker: Add ability to inline subqueries. (#9533)
* Broker: Add ability to inline subqueries.

The main changes:

- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
  ClientQuerySegmentWalker to limit how many rows can be inlined per
  query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
  by assuming they are strings. This is useful because not all types are
  known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
  more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
  SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
  druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.

* Adjustments from CI.

* Fix integration test.
2020-03-18 15:06:45 -07:00
Gian Merlino ff59d2e78b
Move RowSignature from druid-sql to druid-processing and make use of it. (#9508)
* Move RowSignature from druid-sql to druid-processing and make use of it.

1) Moved (most of) RowSignature from sql to processing. Left behind the SQL-specific
   stuff in a RowSignatures utility class. It also picked up some new convenience
   methods along the way.
2) There were a lot of places in the code where Map<String, ValueType> was used to
   associate columns with type info. These are now all replaced with RowSignature.
3) QueryToolChest's resultArrayFields method is replaced with resultArraySignature,
   and it now provides type info.

* Fix up extensions.

* Various fixes
2020-03-12 11:06:44 -07:00
Gian Merlino c6c2282b59
Harmonization and bug-fixing for selector and filter behavior on unknown types. (#9484)
* Harmonization and bug-fixing for selector and filter behavior on unknown types.

- Migrate ValueMatcherColumnSelectorStrategy to newer ColumnProcessorFactory
  system, and set defaultType COMPLEX so unknown types can be dynamically matched.
- Remove ValueGetters in favor of ColumnComparisonFilter doing its own thing.
- Switch various methods to use convertObjectToX when casting to numbers, rather
  than ad-hoc and inconsistent logic.
- Fix bug in RowBasedExpressionColumnValueSelector: isBindingArray should return
  true even for 0- or 1- element arrays.
- Adjust various javadocs.

* Add throwParseExceptions option to Rows.objectToNumber, switch back to that.

* Update tests.

* Adjust moment sketch tests.
2020-03-10 07:15:57 -07:00
Clint Wylie b408a6d774
sql support for dynamic parameters (#6974)
* sql support for dynamic parameters

* fixup

* javadocs

* fixup from merge

* formatting

* fixes

* fix it

* doc fix

* remove druid fallback self-join parameterized test

* unused imports

* ignore test for now

* fix imports

* fixup

* fix merge

* merge fixup

* fix test that cannot vectorize

* fixup and more better

* dependency thingo

* fix docs

* tweaks

* fix docs

* spelling

* unused imports after merge

* review stuffs

* add comment

* add ignore text

* review stuffs
2020-02-19 13:09:20 -08:00
Suneet Saldanha 51d7864935
Codestyle - use java style array declaration (#9338)
* Codestyle - use java style array declaration

Replaced C-style array declarations with java style declarations and marked
the intelliJ inspection as an error

* cleanup test code
2020-02-10 14:25:26 -08:00
Clint Wylie 5c541f556b
remove log.info from FixedBucketsHistogramAggregator aggregate method (#9309) 2020-02-04 11:52:50 -08:00
Suneet Saldanha 33a97dfaae
Guicify druid sql module (#9279)
* Guicify druid sql module

Break up the SQLModule in to smaller modules and provide a binding that
modules can use to register schemas with druid sql.

* fix some tests

* address code review

* tests compile

* Working tests

* Add all the tests

* fix up licenses and dependencies

* add calcite dependency to druid-benchmarks

* tests pass

* rename the schemas
2020-02-04 11:33:48 -08:00
Gian Merlino b411443d22
SQL join support for lookups. (#9294)
* SQL join support for lookups.

1) Add LookupSchema to SQL, so lookups show up in the catalog.
2) Add join-related rels and rules to SQL, allowing joins to be planned into
   native Druid queries.

* Add two missing LookupSchema calls in tests.

* Fix tests.

* Fix typo.
2020-01-31 23:51:16 -08:00
Gian Merlino d21054f7c5
Remove the deprecated interval-chunking stuff. (#9216)
* Remove the deprecated interval-chunking stuff.

See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details.

* Remove unused import.

* Remove chunkInterval too.
2020-01-19 17:14:23 -08:00
Clint Wylie 85219ece13 fix null handling for arithmetic post aggregator comparator (#9159)
* fix null handling for arithmetic postagg comparator, add test for comparator for min/max/quantile postaggs in histogram ext

* fix
2020-01-10 13:49:19 -08:00
Jonathan Wei 4e8368a5d9 Set version to 0.18.0-SNAPSHOT (#9109) 2020-01-02 17:55:10 -05:00
Jonathan Wei 8af41d7cd0 Update version to 0.18.0-incubating-SNAPSHOT (#9009) 2019-12-11 14:04:03 -08:00
jon-wei dfbc066163 Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1"
This reverts commit a0f21d9b07.
2019-11-27 23:22:43 -08:00
jon-wei 0402ff85b8 Revert "[maven-release-plugin] prepare for next development iteration"
This reverts commit 8ffa71e7e6.
2019-11-27 23:22:32 -08:00
jon-wei 8ffa71e7e6 [maven-release-plugin] prepare for next development iteration 2019-11-27 23:18:48 -08:00
jon-wei a0f21d9b07 [maven-release-plugin] prepare release druid-0.16.1-incubating-rc1 2019-11-27 23:18:37 -08:00
Clint Wylie 3fcaa1a61b
fix sql compatible null handling config work with runtime.properties (#8876)
* fix sql compatible null handling config work with runtime.properties

* fix npe

* fix tests

* add friendly error

* comment, and friendlier still

* fix compile

* fix from merges
2019-11-20 03:55:29 -08:00
Roman Leventov 5c0fc0a13a Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564)
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals()

* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever

* Fix tests

* More fixes

* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code

* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase

* More test fixes

* More test fixes

* Add a comment to VersionedIntervalTimelineTestBase

* Fix tests

* Set DataSegment.size(0) in more tests

* Specify DataSegment.size(0) in more places in tests

* Fix more tests

* Fix DruidSchemaTest

* Set DataSegment's size in more tests and benchmarks

* Fix HdfsDataSegmentPusherTest

* Doc changes addressing comments

* Extended doc for visibility

* Typo

* Typo 2

* Address comment
2019-11-06 11:07:04 -08:00
Chi Cao Minh 5f61374cb3 Fix dependency analyze warnings (#8230)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.

* Address review comments

* Adjust scope for org.glassfish.jaxb:jaxb-runtime

* Fix dependencies for hdfs-storage

* Consolidate netty4 versions
2019-09-09 14:37:21 -07:00
Clint Wylie c73a489335
bump master version to 0.17.0-incubating-SNAPSHOT (#8421) 2019-08-28 01:58:36 -07:00
SandishKumarHN 33f0753a70 Add Checkstyle for constant name static final (#8060)
* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* check ctyle for constant field name

* merging with upstream

* review-1

* unknow changes

* unknow changes

* review-2

* merging with master

* review-2 1 changes

* review changes-2 2

* bug fix
2019-08-23 13:13:54 +03:00
Clint Wylie 1054d85171
add mechanism to control filter optimization in historical query processing (#8209)
* add support for mechanism to control filter optimization in historical query processing

* oops

* adjust

* woo

* javadoc

* review comments

* fix

* default

* oops

* oof

* this will fix it

* more nullable, refactor DimFilter.getRequiredColumns to use Set, formatting

* extract class DimFilterToStringBuilder with common code from custom DimFilter toString implementations

* adjust variable naming

* missing nullable

* more nullable

* fix javadocs

* nullable

* address review comments

* javadocs, precondition

* nullable

* rename method to be consistent

* review comments

* remove tuning from ColumnComparisonFilter/ColumnComparisonDimFilter
2019-08-09 16:36:18 -07:00
Fokko Driesprong 91743eeebe Spotbugs: NP_NONNULL_PARAM_VIOLATION (#8129) 2019-08-02 19:20:22 +03:00
Gian Merlino 77297f4e6f GroupBy array-based result rows. (#8196)
* GroupBy array-based result rows.

Fixes #8118; see that proposal for details.

Other than the GroupBy changes, the main other "interesting" classes are:

- ResultRow: The array-based result type.
- BaseQuery: T is no longer required to be Comparable.
- QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization
  and deserialization of result rows (necessary due to their positional nature).
- QueryResource: Uses the new decoration functionality.
- DirectDruidClient: Also uses the new decoration functionality.
- QueryMaker (in Druid SQL): Modifications to read ResultRows.

These classes weren't changed, but got some new javadocs:

- BySegmentQueryRunner
- FinalizeResultsQueryRunner
- Query

* Adjustments for TC stuff.
2019-07-31 16:15:12 -07:00
Chi Cao Minh ab71a2e1e4 Revert "Fix dependency analyze warnings (#8128)" (#8189)
This reverts commit 5dd0d8e873.
2019-07-29 11:42:16 -07:00
Chi Cao Minh 5dd0d8e873 Fix dependency analyze warnings (#8128)
* Fix dependency analyze warnings

Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports.

* Fix licenses and dependencies

* Fix licenses and dependencies again

* Fix integration test dependency

* Address review comments

* Fix unit test dependencies

* Fix integration test dependency

* Fix integration test dependency again

* Fix integration test dependency third time

* Fix integration test dependency fourth time

* Fix compile error

* Fix assert package
2019-07-26 10:49:03 -07:00
Eugene Sevastianov 799d20249f Response context refactoring (#8110)
* Response context refactoring

* Serialization/Deserialization of ResponseContext

* Added java doc comments

* Renamed vars related to ResponseContext

* Renamed empty() methods to createEmpty()

* Fixed ResponseContext usage

* Renamed multiple ResponseContext static fields

* Added PublicApi annotations

* Renamed QueryResponseContext class to ResourceIOReaderWriter

* Moved the protected method below public static constants

* Added createEmpty method to ResponseContext with DefaultResponseContext creation

* Fixed inspection error

* Added comments to the ResponseContext length limit and ResponseContext
http header name

* Added a comment of possible future refactoring

* Removed .gitignore file of indexing-service

* Removed a never-used method

* VisibleForTesting method reducing boilerplate

Co-Authored-By: Clint Wylie <cjwylie@gmail.com>

* Reduced boilerplate

* Renamed the method serialize to serializeWith

* Removed unused import

* Fixed incorrectly refactored test method

* Added comments for ResponseContext keys

* Fixed incorrectly refactored test method

* Fixed IntervalChunkingQueryRunnerTest mocks
2019-07-24 18:29:03 +03:00
Clint Wylie 03e55d30eb
add CachingClusteredClient benchmark, refactor some stuff (#8089)
* add CachingClusteredClient benchmark, refactor some stuff

* revert WeightedServerSelectorStrategy to ConnectionCountServerSelectorStrategy and remove getWeight since felt artificial, default mergeResults in toolchest implementation for topn, search, select

* adjust javadoc

* adjustments

* oops

* use it

* use BinaryOperator, remove CombiningFunction, use Comparator instead of Ordering, other review adjustments

* rename createComparator to createResultComparator, fix typo, firstNonNull nullable parameters
2019-07-18 13:16:28 -07:00
Clint Wylie abf9843e2a fail complex type 'serde' registration when registered type does not match expected type (#7985)
* make ComplexMetrics.registerSerde type check on register, resolves #7959

* add test

* simplify

* unused imports :/

* simplify

* burned by imports yet again

* remove unused constructor

* switch to getName

* heh oops
2019-07-11 23:03:15 -07:00
Xue Yu b9c6a26c0e Use ComplexMetrics.registerSerde() across the codebase (#7925)
* refactor complexmetric registerserde

* fix error

* feedback address
2019-06-25 11:39:04 +03:00
Jonathan Wei 35601bb7a0 Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory (#7784)
* Add finalizeAsBase64Binary option to FixedBucketsHistogramAggregatorFactory

* Add finalizeAsBase64Binary option to ApproximateHistogramFactory

* Update approx histogram doc
2019-06-21 18:00:19 -07:00
Jihoon Son 7abfbb066a Bump up snapshot version to 0.16.0 (#7802) 2019-05-30 17:17:33 -07:00
Gian Merlino 8649b8ab4c
SQL: Allow select-sort-project query shapes. (#7769)
* SQL: Allow select-sort-project query shapes.

Fixes #7768.

Design changes:

- In PartialDruidQuery, allow projection after select + sort by removing
  the SELECT_SORT query stage and instead allowing the SORT and
  SORT_PROJECT stages to apply either after aggregation or after a plain
  non-aggregating select. This is different from prior behavior, where
  SORT and SORT_PROJECT were only considered valid after aggregation
  stages. This logic change is in the "canAccept" method.
- In DruidQuery, represent either kind of sorting with a single "Sorting"
  class (instead of DefaultLimitSpec). The Sorting class is still
  convertible into a DefaultLimitSpec, but is also convertible into the
  sorting parameters accepted by a Scan query.
- In DruidQuery, represent post-select and post-sorting projections with
  a single "Projection" class. This obsoletes the SortProject and
  SelectProjection classes, and simplifies the DruidQuery by allowing us
  to move virtual-column and post-aggregator-creation logic into the
  new Projection class.
- Split "DruidQuerySignature" into RowSignature and VirtualColumnRegistry.
  This effectively means that instead of having mutable and immutable
  versions of DruidQuerySignature, we instead of RowSignature (always
  immutable) and VirtualColumnRegistry (always mutable, but sometimes
  null). This change wasn't required, but IMO it this makes the logic
  involving them easier to follow, and makes it more clear when the
  virtual column registry is active and when it's not.

Other changes:

- ConvertBoundsToSelectors now just accepts a RowSignature, but we
  use the VirtualColumnRegistry.getFullRowSignature() method to get
  a signature that includes all columns, and therefore allows us to
  simplify the logic (no need to special-case virtual columns).
- Add `__time` to the Scan column list if the query is ordering by time.

* Remove unused import.
2019-05-30 12:56:29 -07:00
Roman Leventov 782863ed0f Fix some problems reported by PVS-Studio (#7738)
* Fix some problems reported by PVS-Studio

* Address comments
2019-05-29 11:20:45 -07:00
Clint Wylie ffc2397bcd fix AggregatorFactory.finalizeComputation implementations to be ok with null inputs (#7731)
* AggregatorFactory finalizeComputation is nullable with nullable input, make implementations honor this

* fixes
2019-05-22 21:13:09 -07:00
Roman Leventov bca40dcdaf
Fix some IntelliJ inspections (#7273)
Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.
2019-03-25 21:11:01 -03:00