* Join filter pre-analysis simplifications and sanity checks.
- At pre-analysis time, only compute pre-analysis for the innermost
root query, since this is the one that will run on the join that involves
the base datasource. Previously, pre-analyses were computed for multiple
levels of the query, some of which were unnecessary.
- Remove JoinFilterPreAnalysisGroup and join query level gathering code,
since they existed to support precomputation of multiple pre-analyses.
- Embed JoinFilterPreAnalysisKey into JoinFilterPreAnalysis and use it to
sanity check at processing time that the correct pre-analysis was done.
Tangentially related changes:
- Remove prioritizeAndLaneQuery functionality from LocalQuerySegmentWalker.
The computed priority and lanes were not being used.
- Add "getBaseQuery" method to DataSourceAnalysis to support identification
of the proper subquery for filter pre-analysis.
* Fix compilation errors.
* Adjust tests.
* Make brokers backwards compatible
In 0.19, Brokers gained the ability to serve segments. To support this change,
a `BROKER` ServerType was added to `druid.server.coordination`.
Druid nodes prior to this change do not know of this new server type and so
they would fail to deserialize this node's announcement.
This change makes it so that the broker only announces itself if the segment
cache is configured on the broker. It is expected that a Druid admin will only
configure the segment cache on the broker once the cluster has been upgraded
to a version that supports a broker using the segment cache.
* make code nicer
* Add tests
* Ignore icode coverage for nitialization classes
* Revert "Ignore icode coverage for nitialization classes"
This reverts commit aeec0c2ac2.
* code review
* Filter http requests by http method
Add a config that allows a user which http methods to allow against their
Druid server.
Druid will only accept http requests with the method: GET, PUT, POST, DELETE
and OPTIONS.
If a Druid admin wants to allow other methods, they can do so by using the
ServerConfig#allowedHttpMethods config.
If a Druid user would like to disallow OPTIONS, this can be done by changing
the AuthConfig#allowUnauthenticatedHttpOptions config
* Exclude OPTIONS from always supported HTTP methods
Add HEAD as an allowed method for web console e2e tests
* fix docs
* fix security IT
* Actually fix the web console e2e tests
* Ignore icode coverage for nitialization classes
* code review
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* Allow append to existing datasources when dynamic partitioing is used
* fix test
* checkstyle
* checkstyle
* fix test
* fix test
* fix other tests..
* checkstyle
* hansle unknown core partitions size in overlord segment allocation
* fail to append when numCorePartitions is unknown
* log
* fix comment; rename to be more intuitive
* double append test
* cleanup complete(); add tests
* fix build
* add tests
* address comments
* checkstyle
* change default number of segment loading threads
* fix docs
* missed file
* min -> max for segment loading threads
Co-authored-by: Dylan <dwylie@spotx.tv>
* Fill in the core partition set size properly for batch ingestion with
dynamic partitioning
* incomplete javadoc
* Address comments
* fix tests
* fix json serde, add tests
* checkstyle
* Set core partition set size for hash-partitioned segments properly in
batch ingestion
* test for both parallel and single-threaded task
* unused variables
* fix test
* unused imports
* add hash/range buckets
* some test adjustment and missing json serde
* centralized partition id allocation in parallel and simple tasks
* remove string partition chunk
* revive string partition chunk
* fill numCorePartitions for hadoop
* clean up hash stuffs
* resolved todos
* javadocs
* Fix tests
* add more tests
* doc
* unused imports
* IntelliJ inspection and checkstyle rule for "Collection.EMPTY_* field accesses replaceable with Collections.empty*()"
* Reverted checkstyle rule
* Added tests to pass CI
* Codestyle
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* fix checksyle
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* API to verify a datasource has the latest ingested data
* fix spelling
* address comments
* fix checkstyle
* update docs
* fix tests
* fix doc
* address comments
* fix typo
* fix spelling
* address comments
* address comments
* fix typo in docs
* Remove LegacyDataSource.
Its purpose was to enable deserialization of strings into TableDataSources.
But we can do this more straightforwardly with Jackson annotations.
* Slight test improvement.
* Fix broadcast rule drop and docs
* Remove racy test check
* Don't drop non-broadcast segments on tasks, add overshadowing handling
* Don't use realtimes for overshadowing
* Fix dropping for ingestion services
* make joinables closeable
* tests and adjustments
* refactor to make join stuffs impelement ReferenceCountedObject instead of Closable, more tests
* fixes
* javadocs and stuff
* fix bugs
* more test
* fix lgtm alert
* simplify
* fixup javadoc
* review stuffs
* safeguard against exceptions
* i hate this checkstyle rule
* make IndexedTable extend Closeable
* Add REGEXP_LIKE, fix empty-pattern bug in REGEXP_EXTRACT.
- Add REGEXP_LIKE function that returns a boolean, and is useful in
WHERE clauses.
- Fix REGEXP_EXTRACT return type (should be nullable; causes incorrect
filter elision).
- Fix REGEXP_EXTRACT behavior for empty patterns: should always match
(previously, they threw errors).
- Improve error behavior when REGEXP_EXTRACT and REGEXP_LIKE are passed
non-literal patterns.
- Improve documentation of REGEXP_EXTRACT.
* Changes based on PR review.
* Fix arg check.
* Important fixes!
* Add speller.
* wip
* Additional tests.
* Fix up tests.
* Add validation error tests.
* Additional tests.
* Remove useless call.
This change removes ListenableFutures.transformAsync in favor of the
existing Guava Futures.transform implementation. Our own implementation
had a bug which did not fail the future if the applied function threw an
exception, resulting in the future never completing.
An attempt was made to fix this bug, however when running againts Guava's own
tests, our version failed another half dozen tests, so it was decided to not
continue down that path and scrap our own implementation.
Explanation for how was this bug manifested itself:
An exception thrown in BaseAppenderatorDriver.publishInBackground when
invoked via transformAsync in StreamAppenderatorDriver.publish will
cause the resulting future to never complete.
This explains why when encountering https://github.com/apache/druid/issues/9845
the task will never complete, forever waiting for the publishFuture to
register the handoff. As a result, the corresponding "Error while
publishing segments ..." message only gets logged once the index task
times out and is forcefully shutdown when the future is force-cancelled
by the executor.
during segment publishing we do streaming operations on a collection not
safe for concurrent modification. To guarantee correct results we must
also guard any operations on the stream itself.
This may explain the issue seen in https://github.com/apache/druid/issues/9845
* refactor SeekableStreamSupervisor usage of RecordSupplier to reduce contention between background threads and main thread, refactor KinesisRecordSupplier, refactor Kinesis lag metric collection and emitting
* fix style and test
* cleanup, refactor, javadocs, test
* fixes
* keep collecting current offsets and lag if unhealthy in background reporting thread
* review stuffs
* add comment
* add flag to flattenSpec to keep null columns
* remove changes to inputFormat interface
* add comment
* change comment message
* update web console e2e test
* move keepNullColmns to JSONParseSpec
* fix merge conflicts
* fix tests
* set keepNullColumns to false by default
* fix lgtm
* change Boolean to boolean, add keepNullColumns to hash, add tests for keepKeepNullColumns false + true with no nuulul columns
* Add equals verifier tests
* Adding support for autoscaling in GCE
* adding extra google deps also in gce pom
* fix link in doc
* remove unused deps
* adding terms to spelling file
* version in pom 0.17.0-incubating-SNAPSHOT --> 0.18.0-SNAPSHOT
* GCEXyz -> GceXyz in naming for consistency
* add preconditions
* add VisibleForTesting annotation
* typos in comments
* use StringUtils.format instead of String.format
* use custom exception instead of exit
* factorize interval time between retries
* making literal value a constant
* iter all network interfaces
* use provided on google (non api) deps
* adding missing dep
* removing unneded this and use Objects methods instead o 3-way if in hash and comparison
* adding import
* adding retries around getRunningInstances and adding limit for operation end waiting
* refactor GceEnvironmentConfig.hashCode
* 0.18.0-SNAPSHOT -> 0.19.0-SNAPSHOT
* removing unused config
* adding tests to hash and equals
* adding nullable to waitForOperationEnd
* adding testTerminate
* adding unit tests for createComputeService
* increasing retries in unrelated integration-test to prevent sporadic failure (hopefully)
* reverting queryResponseTemplate change
* adding comment for Compute.Builder.build() returning null
* fixes for inline subqueries when multi-value dimension is present
* fix test
* allow missing capabilities for vectorized group by queries to be treated as single dims since it means that column doesnt exist
* add comment
* Add API to trigger a compaction by the coordinator for integration tests
* Add missing integration tests for the compaction by the coordinator
* address comments
* Fix off-by-one in IndexedTableJoinMatcher.getCardinality.
It would report a cardinality that is one lower than the actual cardinality.
The missing value is the phantom null that can be generated by outer joins.
* Fix tests.
* Remove no-op assert statement
The assert statement in ClientQuerySegmentWalker will always be true because
of the preceeding while loop which has the same condition.
This change removes dead code to fix an error reported by LGTM
* Suppress lgtm
* cleanup whitespace
* IntelliJ inspections cleanup
* Standard Charset object can be used
* Redundant Collection.addAll() call
* String literal concatenation missing whitespace
* Statement with empty body
* Redundant Collection operation
* StringBuilder can be replaced with String
* Type parameter hides visible type
* fix warnings in test code
* more test fixes
* remove string concatenation inspection error
* fix extra curly brace
* cleanup AzureTestUtils
* fix charsets for RangerAdminClient
* review comments