* SQL support for union datasources.
Exposed via the "UNION ALL" operator. This means that there are now two
different implementations of UNION ALL: one at the top level of a query
that works by concatenating subquery results, and one at the table level
that works by creating a UnionDataSource.
The SQL documentation is updated to discuss these two use cases and how
they behave.
Future work could unify these by building support for a native datasource
that represents the union of multiple subqueries. (Today, UnionDataSource
can only represent the union of tables, not subqueries.)
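
The difference between the two UNION ALL forms can be illustrated with a small model. This is only a sketch: the in-memory tables, `count_rows`, `top_level_union_all`, and `table_level_union` are hypothetical stand-ins, not Druid classes, and the SQL in the comments is illustrative.

```python
from itertools import chain

# Hypothetical in-memory "tables"; each row is a dict.
TABLES = {
    "a": [{"x": 1}, {"x": 2}],
    "b": [{"x": 3}],
}


def count_rows(rows):
    """Stand-in for a query: SELECT COUNT(*) over one datasource."""
    return [{"cnt": sum(1 for _ in rows)}]


def top_level_union_all(*subquery_results):
    """Top-level UNION ALL: run each subquery, then concatenate the results."""
    return list(chain.from_iterable(subquery_results))


def table_level_union(*table_names):
    """Table-level UNION ALL: a single datasource that reads every listed table."""
    return chain.from_iterable(TABLES[name] for name in table_names)


# SELECT COUNT(*) FROM a UNION ALL SELECT COUNT(*) FROM b
top_level = top_level_union_all(count_rows(TABLES["a"]), count_rows(TABLES["b"]))

# SELECT COUNT(*) FROM (SELECT * FROM a UNION ALL SELECT * FROM b)
table_level = count_rows(table_level_union("a", "b"))

print(top_level)    # [{'cnt': 2}, {'cnt': 1}]
print(table_level)  # [{'cnt': 3}]
```

The top-level form yields one result row per subquery, while the table-level form aggregates across both tables in a single query.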
* Fixes.
* Error message for sanity check.
* Additional test fixes.
* Add some error messages.
* Fix handling of 'join' on top of 'union' datasources.
The problem is that unions are typically rewritten into a series of
individual queries on the underlying tables, but this isn't done when
the union is wrapped in a join.
The main changes are in UnionQueryRunner:
1) Replace an instanceof UnionQueryRunner check with DataSourceAnalysis.
2) Replace a "query.withDataSource" call with a new function, "Queries.withBaseDataSource".
Together, these enable UnionQueryRunner to "see through" a join.
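
A rough model of what "seeing through" a join might look like is given here. The `Table`, `UnionOf`, and `Join` classes and the helper names are hypothetical, and the sketch assumes the base datasource is always the leftmost input of a chain of joins; it is not the actual UnionQueryRunner or Queries code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Table:
    name: str


@dataclass(frozen=True)
class UnionOf:
    tables: Tuple[Table, ...]


@dataclass(frozen=True)
class Join:
    left: object   # Table, UnionOf, or another Join
    right: object


def base_of(ds):
    """Walk down the left side of any joins to find the base datasource."""
    while isinstance(ds, Join):
        ds = ds.left
    return ds


def with_base(ds, new_base):
    """Rebuild ds with its base datasource swapped out, keeping the joins intact."""
    if isinstance(ds, Join):
        return Join(with_base(ds.left, new_base), ds.right)
    return new_base


def split_union(ds):
    """Rewrite a (possibly joined) union into one datasource per underlying table."""
    base = base_of(ds)
    if not isinstance(base, UnionOf):
        return [ds]
    return [with_base(ds, table) for table in base.tables]


joined = Join(UnionOf((Table("a"), Table("b"))), Table("lookup"))
for piece in split_union(joined):
    print(piece)
```

Checking only the outermost datasource type would miss the union in `joined`; inspecting the base and replacing it in place yields one joined query per table.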
* Tests.
* Adjust heap sizes for integration tests.
* Different approach, more tests.
* Tweak.
* Styling.
* support redis cluster
* add 'password', 'database' properties
* test cases passed
* update doc
* some improvements
* fix CI
* add more test cases to improve branch coverage
* fix dependency check for test
* resolve review comments
* Add SQL "OFFSET" clause.
Under the hood, this uses the new offset features from #10233 (Scan)
and #10235 (GroupBy). Since Timeseries and TopN queries do not currently
have an offset feature, SQL planning will switch from one of those to
Scan or GroupBy if users add an OFFSET.
Includes a refactoring to harmonize offset and limit planning using an
OffsetLimit wrapper class. This is useful because it ensures that the
various places that need to deal with offset and limit collapsing all
behave the same way, using its "andThen" method.
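
The collapsing that "andThen" performs can be sketched as follows. This is a minimal model written for illustration, assuming a missing limit means "unlimited"; the Python class and its `apply` helper are hypothetical and not the OffsetLimit class from the patch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OffsetLimit:
    offset: int = 0
    limit: Optional[int] = None  # None means "no limit"

    def and_then(self, nxt):
        """Collapse 'apply self, then apply nxt' into a single offset/limit pair."""
        if self.limit is None:
            new_limit = nxt.limit
        else:
            # Rows left over after nxt skips its offset inside our window.
            remaining = max(0, self.limit - nxt.offset)
            new_limit = remaining if nxt.limit is None else min(remaining, nxt.limit)
        return OffsetLimit(self.offset + nxt.offset, new_limit)

    def apply(self, rows):
        """Apply the offset and limit to a list of rows."""
        end = None if self.limit is None else self.offset + self.limit
        return rows[self.offset:end]


inner = OffsetLimit(offset=2, limit=5)
outer = OffsetLimit(offset=1, limit=3)
print(inner.and_then(outer))  # OffsetLimit(offset=3, limit=3)
```

Applying the collapsed pair once gives the same rows as applying the inner pair and then the outer pair.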
* Fix test and add another test.
* Add note about aggregations on floats
Floating-point arithmetic is not associative, so results depend on the order
of operations. Because of the way aggregators work across segments, the same
query operating on the same data can produce slightly different results.
The same problem exists with any aggregator whose result depends on merge
order, since the merge order across segments is not guaranteed.
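
A small illustration of the effect, assuming three hypothetical per-segment partial sums that are merged in two different orders:

```python
from functools import reduce


def merge(partial_sums):
    """Merge per-segment partial sums left to right, as a double sum would."""
    return reduce(lambda a, b: a + b, partial_sums)


# Same three partial sums, two merge orders.
order_a = merge([1e17, -1e17, 1.0])  # (1e17 + -1e17) + 1.0
order_b = merge([1.0, 1e17, -1e17])  # (1.0 + 1e17) + -1e17

print(order_a)  # 1.0
print(order_b)  # 0.0
```

In the second order, 1.0 is smaller than the spacing between adjacent doubles near 1e17, so it is lost before the large values cancel.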
* Also talk about doubles
* Apply suggestions from code review
* Add "offset" parameter to the Scan query.
It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.
* Fix constructor call.
* Fix up JSONs.
* Fix call to ScanQuery.
* Doc update.
* Fix javadocs.
* Spotbugs, LGTM suppressions.
* Javadocs.
* Fix suppression.
* Stabilize Scan query result order, add tests.
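
The broker-side behavior of the Scan "offset" parameter can be modeled roughly as below. The function name and the per-node lists are hypothetical, and the sketch assumes each data node is asked for up to offset + limit rows; it only pages consistently if the merged order is stable across runs.

```python
from itertools import chain, islice


def broker_scan(per_node_rows, offset, limit):
    """Merge per-node results, then discard the first `offset` rows."""
    # Assumption: each node may need to return up to offset + limit rows,
    # because the broker cannot know in advance which rows it will discard.
    fetch = offset + limit
    merged = chain.from_iterable(islice(rows, fetch) for rows in per_node_rows)
    # Run the query as normal, then throw away the first `offset` rows.
    return list(islice(merged, offset, offset + limit))


nodes = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(broker_scan(nodes, offset=3, limit=3))  # [4, 5, 6]
```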
* Update LGTM comment.
* Fixup.
* Test different batch sizes too.
* Nicer tests.
* Fix comment.
* LongMaxVectorAggregator support and test case.
* DoubleMinVectorAggregator and test cases.
* DoubleMaxVectorAggregator and unit test.
* FloatMinVectorAggregator and FloatMaxVectorAggregator.
* Documentation update to include the other vector aggregators.
* Bug fix.
* checkstyle formatting fixes.
* CalciteQueryTest cases update.
* Separate test classes for FloatMaxAggregation and FloatMinAggregation.
* remove the cannotVectorize for float max/min aggregator in test.
* Tests in GroupByQueryRunner, GroupByTimeseriesQueryRunner and TimeseriesQueryRunner.
* Add "offset" parameter to GroupBy query.
It works by doing the query as normal and then throwing away the first
"offset" number of rows on the broker.
* Stabilize GroupBy sorts.
* Fix inspections.
* Fix suppression.
* Fixups.
* Move TopNSequence to druid-core.
* Addl comments.
* NumberedElement equals verification.
* Changes from review.
* Fix minor formatting in docs.
* Add Nullhandling initialization for test to run from IDE.
* Vectorize longMin aggregator.
- A new vectorized class for the vectorized long min aggregator.
- Changes to AggregatorFactory to support vectorize functionality.
- Few changes to schema evolution test to add LongMinAggregatorFactory.
* Add longSum to the supported vectorized aggregator implementations.
* Add MIN() long min to calcite query test that can vectorize.
* Add simple long aggregations test.
* Fixup formatting per checkstyle guide.
* fixup and add more tests for long min aggregator.
* Override test for groupBy since timestamps are handled differently.
* Null compatibility check in test.
* Review comment: Add a test case to LongMinAggregationTest.
* support unit suffix on byte-related properties
* add doc
* change default value of byte-related properties in example files
* fix coding style
* fix doc
* fix CI
* suppress spelling errors
* improve code according to comments
* rename Bytes to HumanReadableBytes
* add getBytesInInt to get value safely
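
As a rough illustration of unit-suffix parsing with a safe int conversion: the accepted suffixes, the case-insensitivity, and the function names below are assumptions made for this sketch and are not taken from HumanReadableBytes itself.

```python
import re

# Assumed unit table: plain suffixes are powers of 1000, "i" suffixes powers of 1024.
_MULTIPLIERS = {
    "": 1,
    "k": 1000, "m": 1000**2, "g": 1000**3, "t": 1000**4, "p": 1000**5,
    "ki": 1024, "mi": 1024**2, "gi": 1024**3, "ti": 1024**4, "pi": 1024**5,
}
_PATTERN = re.compile(r"^(\d+)(ki|mi|gi|ti|pi|k|m|g|t|p)?b?$", re.IGNORECASE)
INT_MAX = 2**31 - 1  # largest value of a signed 32-bit int


def parse_bytes(text):
    """Convert a string such as '64MiB' or '1G' into a number of bytes."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"Invalid byte size: {text!r}")
    unit = (match.group(2) or "").lower()
    return int(match.group(1)) * _MULTIPLIERS[unit]


def get_bytes_in_int(text):
    """Like parse_bytes, but fail loudly instead of overflowing a 32-bit int."""
    value = parse_bytes(text)
    if value > INT_MAX:
        raise ValueError(f"{text!r} does not fit in a 32-bit int")
    return value


print(parse_bytes("64MiB"))      # 67108864
print(get_bytes_in_int("512K"))  # 512000
```

Failing on overflow, rather than silently truncating, is the point of a separate int accessor for properties that are consumed as 32-bit values.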
* improve doc
* fix problem reported by CI
* fix problem reported by CI
* resolve code review comments
* improve error message
* improve code & doc according to comments
* fix CI problem
* improve doc
* suppress spelling check errors
* Add segment pruning for hash based partitioning
* Update doc
* Add additional test
* Address comments
* Fix unit test failure
Co-authored-by: Jian Wang <jwang@pinterest.com>
* Add availability and consistency docs.
Describes transactional ingestion and atomic replacement. Also, this patch
deletes some bad advice from the javadocs for SegmentTransactionalInsertAction.
* Fix missing word.
* init commit, all tests passed
* fix format
Signed-off-by: frank chen <frank.chen021@outlook.com>
* data stored successfully
* modify config path
* add doc
* add aliyun-oss extension to project
* remove descriptor deletion code to avoid warning message output by aliyun client
* fix warnings reported by lgtm-com
* fix ci warnings
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix errors reported by IntelliJ inspection check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc spelling check
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix dependency warnings reported by ci
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix warnings reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add package configuration to support showing extension info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add IT test cases and fix bugs
Signed-off-by: frank chen <frank.chen021@outlook.com>
* 1. code review comments adopted
2. change schema from 'aliyun-oss' to 'oss'
Signed-off-by: frank chen <frank.chen021@outlook.com>
* add license info
Signed-off-by: frank chen <frank.chen021@outlook.com>
* fix doc
Signed-off-by: frank chen <frank.chen021@outlook.com>
* exclude execution of IT testcases of OSS extension from CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* put the extensions under contrib group and add to distribution
* fix names in test cases
* add unit test to cover OssInputSource
* fix names in test cases
* fix dependency problem reported by CI
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Filter http requests by http method
Add a config that lets a user specify which HTTP methods to allow against
their Druid server.
By default, Druid only accepts HTTP requests with the methods GET, PUT, POST,
DELETE, and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config.
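
The allowlist idea can be sketched as follows. The default set mirrors the methods listed above; the function name, the `extra_allowed` parameter, and the 405 response code are assumptions made for illustration, and the authentication-related handling of OPTIONS is left out.

```python
# Methods accepted without extra configuration, per the description above.
DEFAULT_ALLOWED = frozenset({"GET", "PUT", "POST", "DELETE", "OPTIONS"})


def check_method(method, extra_allowed=()):
    """Return an HTTP status: 200 if the method is allowed, 405 otherwise."""
    allowed = DEFAULT_ALLOWED | {m.upper() for m in extra_allowed}
    return 200 if method.upper() in allowed else 405


print(check_method("GET"))                           # 200
print(check_method("TRACE"))                         # 405
print(check_method("HEAD", extra_allowed=["HEAD"]))  # 200
```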
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore code coverage for initialization classes
* code review
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* Allow append to existing datasources when dynamic partitioning is used
* fix test
* checkstyle
* checkstyle
* fix test
* fix test
* fix other tests..
* checkstyle
* handle unknown core partitions size in overlord segment allocation
* fail to append when numCorePartitions is unknown
* log
* fix comment; rename to be more intuitive
* double append test
* cleanup complete(); add tests
* fix build
* add tests
* address comments
* checkstyle
* Druid user permissions apply in the console
* Update index.md
* noting user warning in console page; some minor shuffling
* noting user warning in console page; some minor shuffling 1
* touchups
* link checking fixes
* Updated per suggestions
* change default number of segment loading threads
* fix docs
* missed file
* min -> max for segment loading threads
Co-authored-by: Dylan <dwylie@spotx.tv>
* fix docs error: google to azure and hdfs to http
* fix docs error: indexSpecForIntermediatePersists of tuningConfig in hadoop-based batch part
* fix docs error: logParseExceptions of tuningConfig in hadoop-based batch part
* fix docs error: maxParseExceptions of tuningConfig in hadoop-based batch part