Commit Graph

273 Commits

Author SHA1 Message Date
Jihoon Son d47d6cf081
Add time-to-first-result benchmark for groupBy (#10612) 2020-12-01 10:32:37 -08:00
Himanshu 30bcb0fd74
DataSourcesSnapshotBenchmark to measure iterateAllUsedSegmentsInSnapshot perf (#10604) 2020-11-29 14:42:14 -08:00
Abhishek Agarwal 4d2a92f46a
Add caching support to join queries (#10366)
* Proposed changes for making joins cacheable

* Add unit tests

* Fix tests

* simplify logic

* Pull empty byte array logic out of CachingQueryRunner

* remove useless null check

* Minor refactor

* Fix tests

* Fix segment caching on Broker

* Move join cache key computation in Broker

Move join cache key computation in Broker from ResultLevelCachingQueryRunner to CachingClusteredClient

* Fix compilation

* Review comments

* Add more tests

* Fix inspection errors

* Pushed condition analysis to JoinableFactory

* review comments

* Disable join caching for broker and add prefix key to BroadcastSegmentIndexedTable

* Remove commented lines

* Fix populateCache

* Disable caching for selective datasources

Refactored the code so that we can decide at the data source level, whether to enable cache for broker or data nodes
2020-10-09 17:42:30 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Clint Wylie 1d6cb624f4
add vectorizeVirtualColumns query context parameter (#10432)
* add vectorizeVirtualColumns query context parameter

* oops

* spelling

* default to false, more docs

* fix test

* fix spelling
2020-09-28 18:48:34 -07:00
Clint Wylie 19c4b16640
vectorized expressions and expression virtual columns (#10401)
* vectorized expression virtual columns

* cleanup

* fixes

* preserve float if explicitly specified

* oops

* null handling fixes, more tests

* what is an expression planner?

* better names

* remove unused method, add pi

* move vector processor builders into static methods

* reduce boilerplate

* oops

* more naming adjustments

* changes

* nullable

* missing hex

* more
2020-09-23 13:56:38 -07:00
Suneet Saldanha 0b4c897fbe
Vectorized variance aggregators (#10390)
* wip vectorize

* close but not quite

* faster

* unit tests

* fix complex types for variance
2020-09-17 15:05:40 -07:00
Clint Wylie 084b23deed
benchmark for indexed table experiments (#10327)
* benchmark for indexed table experiments

* fix style

* teardown outside of measurement
2020-09-14 15:14:38 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Suneet Saldanha a9de00d43a
Remove NUMERIC_HASHING_THRESHOLD (#10313)
* Make NUMERIC_HASHING_THRESHOLD configurable

Change the default numeric hashing threshold to 1 and make it configurable.

Benchmarks attached to this PR show that binary searches are not more faster
than doing a set contains check. The attached flamegraph shows the amount of
time a query spent in the binary search. Given the benchmarks, we can expect
to see roughly a 2x speed up in this part of the query which works out to
~ a 10% faster query in this instance.

* Remove NUMERIC_HASHING_THRESHOLD

* Remove stale docs
2020-08-25 20:05:39 -07:00
Clint Wylie c72f96a4ba
fix bug with expressions on sparse string realtime columns without explicit null valued rows (#10248)
* fix bug with realtime expressions on sparse string columns

* fix test

* add comment back

* push capabilities for dimensions to dimension indexers since they know things

* style

* style

* fixes

* getting a bit carried away

* missed one

* fix it

* benchmark build fix

* review stuffs

* javadoc and comments

* add comment

* more strict check

* fix missed usaged of impl instead of interface
2020-08-11 11:07:17 -07:00
Maytas Monsereenusorn 574b062f1f
Cluster wide default query context setting (#10208)
* Cluster wide default query context setting

* Cluster wide default query context setting

* Cluster wide default query context setting

* add docs

* fix docs

* update props

* fix checkstyle

* fix checkstyle

* fix checkstyle

* update docs

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix NPE
2020-07-29 15:19:18 -07:00
Clint Wylie c86e7ce30b
bump version to 0.20.0-SNAPSHOT (#10124) 2020-07-06 15:08:32 -07:00
Jihoon Son 657f8ee80f
Fix RetryQueryRunner to actually do the job (#10082)
* Fix RetryQueryRunner to actually do the job

* more javadoc

* fix test and checkstyle

* don't combine for testing

* address comments

* fix unit tests

* always initialize response context in cachingClusteredClient

* fix subquery

* address comments

* fix test

* query id for builders

* make queryId optional in the builders and ClusterQueryResult

* fix test

* suppress tests and unused methods

* exclude groupBy builder

* fix jacoco exclusion

* add tests for builders

* address comments

* don't truncate
2020-07-01 14:02:21 -07:00
Gian Merlino 5faa897a34
Join filter pre-analysis simplifications and sanity checks. (#10104)
* Join filter pre-analysis simplifications and sanity checks.

- At pre-analysis time, only compute pre-analysis for the innermost
  root query, since this is the one that will run on the join that involves
  the base datasource. Previously, pre-analyses were computed for multiple
  levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
  since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
  sanity check at processing time that the correct pre-analysis was done.

Tangentially related changes:

- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
  The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
  of the proper subquery for filter pre-analysis.

* Fix compilation errors.

* Adjust tests.
2020-06-30 19:14:22 -07:00
xhl0726 1596b3eacd
Optimize protobuf parsing for flatten data (#9999)
* optimize for protobuf parsing

* fix import error and maven dependency

* add unit test in protobufInputrowParserTest for flatten data

* solve code duplication (remove the log and main())

* rename 'flatten' to 'flat' to make it clearer

Co-authored-by: xionghuilin <xionghuilin@bytedance.com>
2020-06-24 18:01:31 -07:00
Clint Wylie c2f5d453f8
fix topn on string columns with non-sorted or non-unique dictionaries (#10053)
* fix topn on string columns with non-sorted or non-unique dictionaries

* fix metadata tests

* refactor, clarify comments and code, fix ci failures
2020-06-19 11:35:18 -07:00
Jonathan Wei 37e150c075
Fix join filter rewrites with nested queries (#10015)
* Fix join filter rewrites with nested queries

* Fix test, inspection, coverage

* Remove clauses from group key

* Fix import order

Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2020-06-18 21:32:29 -07:00
Clint Wylie f8b643ec72
make joinables closeable (#9982)
* make joinables closeable

* tests and adjustments

* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests

* fixes

* javadocs and stuff

* fix bugs

* more test

* fix lgtm alert

* simplify

* fixup javadoc

* review stuffs

* safeguard against exceptions

* i hate this checkstyle rule

* make IndexedTable extend Closeable
2020-06-09 20:12:36 -07:00
Clint Wylie c5d6163c76
add a GeneratorInputSource to fill up a cluster with generated data for testing (#9946)
* move benchmark data generator into druid-processing, add a GeneratorInputSource to fill up a cluster with data

* newlines

* make test coverage not fail maybe

* remove useless test

* Update pom.xml

* Update GeneratorInputSourceTest.java

* less passive aggressive test names
2020-06-09 19:31:04 -07:00
Clint Wylie 77dd5b06ae
ColumnCapabilities.hasMultipleValues refactor (#9731)
* transition ColumnCapabilities.hasMultipleValues to Capable enum, remove ColumnCapabilities.isComplete

* remove artifical, always multi-value capabilities from IncrementalIndexStorageAdapter and fix up fallout from that, fix ColumnCapabilities merge in index merger

* fix typo

* remove unused method

* review stuffs, revert IncrementalIndexStorageAdapater capabilities change, plumb lame workaround to SegmentAnalyzer

* more comment

* use volatile booleans

* fix line length

* correctly handle missing columns for vector processors

* return ColumnCapabilities.Capable for BitmapIndexSelector.hasMultipleValues, fix vector processor selection for complex

* false on non-existent
2020-06-04 23:52:37 -07:00
Suneet Saldanha 9c40bebc02
Refactor JoinFilterAnalyzer - part 2 (#9929)
* Refactor JoinFilterAnalyzer

This patch attempts to make it easier to follow the join filter analysis code
with the hope of making it easier to add rewrite optimizations in the future.

To keep the patch small and easy to review, this is the first of at least 2
patches that are planned.

This patch adds a builder to the Pre-Analysis, so that it is easier to
instantiate the preAnalysis. It also moves some of the filter normalization
code out to Fitlers with associated tests.

* fix tests

* Refactor JoinFilterAnalyzer - part 2

This change introduces the following components:
 * RhsRewriteCandidates - a wrapper for a list of candidates and associated
     functions to operate on the set of candidates.
 * JoinableClauses - a wrapper for the list of JoinableClause that represent
     a join condition and the associated functions to operate on the clauses.
 * Equiconditions - a wrapper representing the equiconditions that are used
     in the join condition.

And associated test changes.

This refactoring surfaced 2 bugs:
 - Missing equals and hashcode implementation for RhsRewriteCandidate, thus
   allowing potential duplicates in the rhs rewrite candidates
 - Missing Filter#supportsRequiredColumnRewrite check in
   analyzeJoinFilterClause, which could result in UnsupportedOperationException
   being thrown by the filter

* fix compile error

* remove unused class
2020-05-29 15:03:35 -07:00
Chi Cao Minh 427239f451
Enforce code coverage (#9863)
* Enforce code coverage

Add an automated way of checking if new code has adequate unit tests,
since merging code coverage reports and check coverage thresholds via
coveralls or codecov is unreliable.

The following minimum unit test code coverage is now enforced:
- 80% functions
- 65% branch
- 65% line

Branch and line coverage thresholds are slightly lower for now as they
are harder to achieve.

After the code coverage check looks reliable, the thresholds can be
increased later if needed.

* Add comments
2020-05-20 09:31:37 -07:00
mcbrewster 28be107a1c
add flag to flattenSpec to keep null columns (#9814)
* add flag to flattenSpec to keep null columns

* remove changes to inputFormat interface

* add comment

* change comment message

* update web console e2e test

* move keepNullColmns to JSONParseSpec

* fix merge conflicts

* fix tests

* set keepNullColumns to false by default

* fix lgtm

* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns

* Add equals verifier tests
2020-05-08 21:53:39 -07:00
BIGrey ee9a721acc
fix npe in IncrementalIndexReadBenchmark (#9754)
Co-authored-by: 黄辉 <huanghui.bigrey@bytedance.com>
2020-05-03 12:52:50 -07:00
Clint Wylie 9a293d554d
remove UnionMergeRule rules from SQL planner (#9797) 2020-05-01 12:50:11 -07:00
BIGrey c5bfe36011
Optimize FileWriteOutBytes to avoid high system cpu usage (#9722)
* optimize FileWriteOutBytes to avoid high sys cpu

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException

* optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException in writeOutBytes.size"

This reverts commit 965f7421

* Revert "optimize FileWriteOutBytes to avoid high sys cpu -- remove IOException"

This reverts commit 149e08c0

* optimize FileWriteOutBytes to avoid high sys cpu -- avoid IOEception never thrown check

* Fix size counting to handle IOE in FileWriteOutBytes + tests

* remove unused throws IOException in WriteOutBytes.size()

* Remove redundant throws IOExcpetion clauses

* Parameterize IndexMergeBenchmark

Co-authored-by: huanghui.bigrey <huanghui.bigrey@bytedance.com>
Co-authored-by: Suneet Saldanha <suneet.saldanha@imply.io>
2020-04-23 20:18:42 -07:00
Suneet Saldanha 1ced3b33fb
IntelliJ inspections cleanup (#9339)
* IntelliJ inspections cleanup

* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type

* fix warnings in test code

* more test fixes

* remove string concatenation inspection error

* fix extra curly brace

* cleanup AzureTestUtils

* fix charsets for RangerAdminClient

* review comments
2020-04-10 10:04:40 -07:00
Jihoon Son a6790ff22a
More optimize CNF conversion of filters (#9634)
* More optimize CNF conversion of filters

* update license

* fix build

* checkstyle

* remove unnecessary code

* split helper

* license

* checkstyle

* add comments on cnf conversion
2020-04-08 21:31:17 -07:00
Jihoon Son 40e84a171b
Eliminate common subfilters when converting it to a CNF (#9608) 2020-04-05 22:29:41 -07:00
Jihoon Son 0da8ffc3ff
Bump up development version to 0.19.0-SNAPSHOT (#9586) 2020-03-30 16:24:04 -07:00
Clint Wylie fa5da6693c
add lane enforcement for joinish queries (#9563)
* add lane enforcement for joinish queries

* oops

* style

* review stuffs
2020-03-30 11:58:16 -07:00
Clint Wylie 2c49f6d89a
error on value counter overflow instead of writing sad segments (#9559) 2020-03-26 16:54:48 -07:00
Gian Merlino 1ef25a438f
Broker: Add ability to inline subqueries. (#9533)
* Broker: Add ability to inline subqueries.

The main changes:

- ClientQuerySegmentWalker: Add ability to inline queries.
- Query: Add "getSubQueryId" and "withSubQueryId" methods.
- QueryMetrics: Add "subQueryId" dimension.
- ServerConfig: Add new "maxSubqueryRows" parameter, which is used by
  ClientQuerySegmentWalker to limit how many rows can be inlined per
  query.
- IndexedTableJoinMatcher: Allow creating keys on top of unknown types,
  by assuming they are strings. This is useful because not all types are
  known for fields in query results.
- InlineDataSource: Store RowSignature rather than component parts. Add
  more zealous "equals" and "hashCode" methods to ease testing.
- Moved QuerySegmentWalker test code from CalciteTests and
  SpecificSegmentsQueryWalker in druid-sql to QueryStackTests in
  druid-server. Use this to spin up a new ClientQuerySegmentWalkerTest.

* Adjustments from CI.

* Fix integration test.
2020-03-18 15:06:45 -07:00
Jonathan Wei b1847364b0
More efficient join filter rewrites (#9516)
* More efficient join filter rewrites

* Rebase

* Remove unused functions

* PR comments, fix compile

* Adjust comment

* Allow filter rewrite when join condition has LHS expression

* Fix inspections

* Fix tests
2020-03-16 22:16:14 -07:00
Clint Wylie 6afd55c8f4
threshold based automatic query prioritization (#9493)
* threshold based automatic query prioritization

* fixes

* spelling and fixes

* fix docs

* spelling

* checkstyle

* adjustments

* doc fix
2020-03-13 01:41:54 -07:00
Jihoon Son 7401bb3f93
Improve OvershadowableManager performance (#9441)
* Use the iterator instead of higherKey(); use the iterator API instead of stream

* Fix tests; fix a concurrency bug in timeline

* fix test

* add tests for findNonOvershadowedObjectsInInterval

* fix test

* add missing tests; fix a bug in QueueEntry

* equals tests

* fix test
2020-03-10 13:22:19 -07:00
Clint Wylie 8b9fe6f584
query laning and load shedding (#9407)
* prototype

* merge QueryScheduler and QueryManager

* everything in its right place

* adjustments

* docs

* fixes

* doc fixes

* use resilience4j instead of semaphore

* more tests

* simplify

* checkstyle

* spelling

* oops heh

* remove unused

* simplify

* concurrency tests

* add SqlResource tests, refactor error response

* add json config tests

* use LongAdder instead of AtomicLong

* remove test only stuffs from scheduler

* javadocs, etc

* style

* partial review stuffs

* adjust

* review stuffs

* more javadoc

* error response documentation

* spelling

* preserve user specified lane for NoSchedulingStrategy

* more test, why not

* doc adjustment

* style

* missed review for make a thing a constant

* fixes and tests

* fix test

* Update docs/configuration/index.md

Co-Authored-By: sthetland <steve.hetland@imply.io>

* doc update

Co-authored-by: sthetland <steve.hetland@imply.io>
2020-03-10 02:57:16 -07:00
Jihoon Son 75e2051195
Convert array_contains() and array_overlaps() into native filters if possible (#9487)
* Convert array_contains() and array_overlaps() into native filters if
possible

* make spotbugs happy and fix null results when null compatible
2020-03-09 22:50:38 -07:00
Jonathan Wei 0136dba95d
Add option to control join filter rewrites (#9472)
* Add option to control join filter rewrites

* Fix inspections
2020-03-09 17:36:07 -07:00
Jihoon Son b924161086
Add main method to VersionedIntervalTimelineBenchmark (#9404) 2020-02-26 12:01:02 -08:00
Clint Wylie b408a6d774
sql support for dynamic parameters (#6974)
* sql support for dynamic parameters

* fixup

* javadocs

* fixup from merge

* formatting

* fixes

* fix it

* doc fix

* remove druid fallback self-join parameterized test

* unused imports

* ignore test for now

* fix imports

* fixup

* fix merge

* merge fixup

* fix test that cannot vectorize

* fixup and more better

* dependency thingo

* fix docs

* tweaks

* fix docs

* spelling

* unused imports after merge

* review stuffs

* add comment

* add ignore text

* review stuffs
2020-02-19 13:09:20 -08:00
Adam Peck e9aebd994a
Fix for building in Eclipse & VS Code. (#7481)
Fixes #6866
Reverse dependencies from /main/ to /test/
Add generated-test-sources to source path for Eclipse
2020-02-13 14:58:32 -08:00
Jonathan Wei b2c00b3a79
Add query context option to disable join filter push down (#9335) 2020-02-11 15:31:34 -08:00
Lucas Capistrant 53bb45fc9a
Forbid easily misused HashSet and HashMap constructors (#9165)
* Forbid easily misused HashSet and HashMap constructors

* Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them

* Fix visibility of constant in CollectionUtils.java

* Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used

* revert changes to sql module tests that should be in separate PR

* Finish reverting changes to sql module tests that were flagged in checkstyle during CI

* Add netty dependency resulting from SupressForbidden
2020-02-07 10:44:09 +03:00
Gian Merlino 3ef5c2f2e8
Add MemoryOpenHashTable, a table similar to ByteBufferHashTable. (#9308)
* Add MemoryOpenHashTable, a table similar to ByteBufferHashTable.

With some key differences to improve speed and design simplicity:

1) Uses Memory rather than ByteBuffer for its backing storage.
2) Uses faster hashing and comparison routines (see HashTableUtils).
3) Capacity is always a power of two, allowing simpler design and more
   efficient implementation of findBucket.
4) Does not implement growability; instead, leaves that to its callers.
   The idea is this removes the need for subclasses, while still giving
   callers flexibility in how to handle table-full scenarios.

* Fix LGTM warnings.

* Adjust dependencies.

* Remove easymock from druid-benchmarks.

* Adjustments from review.

* Fix datasketches unit tests.

* Fix checkstyle.
2020-02-04 19:57:59 -08:00
Suneet Saldanha 33a97dfaae
Guicify druid sql module (#9279)
* Guicify druid sql module

Break up the SQLModule in to smaller modules and provide a binding that
modules can use to register schemas with druid sql.

* fix some tests

* address code review

* tests compile

* Working tests

* Add all the tests

* fix up licenses and dependencies

* add calcite dependency to druid-benchmarks

* tests pass

* rename the schemas
2020-02-04 11:33:48 -08:00
Gian Merlino b411443d22
SQL join support for lookups. (#9294)
* SQL join support for lookups.

1) Add LookupSchema to SQL, so lookups show up in the catalog.
2) Add join-related rels and rules to SQL, allowing joins to be planned into
   native Druid queries.

* Add two missing LookupSchema calls in tests.

* Fix tests.

* Fix typo.
2020-01-31 23:51:16 -08:00
Gian Merlino 204ba9966f
Add LookupJoinableFactory. (#9281)
* Add LookupJoinableFactory.

Enables joins where the right-hand side is a lookup. Includes an
integration test.

Also, includes changes to LookupExtractorFactoryContainerProvider:

1) Add "getAllLookupNames", which will be needed to eventually connect
   lookups to Druid's SQL catalog.
2) Convert "get" from nullable to Optional return.
3) Swap out most usages of LookupReferencesManager in favor of the
   simpler LookupExtractorFactoryContainerProvider interface.

* Fixes for tests.

* Fix another test.

* Java 11 message fix.

* Fixups.

* Fixup benchmark class.
2020-01-30 14:46:21 -08:00
Chi Cao Minh a1494c30e0
Join microbenchmark (#9267)
Add microbenchmark for joins. Enabling the column cache improves
performance by ~70% for the benchmarks for joins with string keys.
Adjusting LookupJoinMatcher.matchCondition() to have fewer branches,
improves performance by ~10% for the benchmarks for joins with lookups.
2020-01-29 14:08:19 -08:00