* Code cleanup from query profile project
* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse
* Reverted change due to lack of tests
* Use intermediate-persist IndexSpec during multiphase merge.
The main change is the addition of an intermediate-persist IndexSpec
to the main "merge" method in IndexMerger. There are also a few minor
adjustments to the IndexMerger interface to encourage more harmonious
usage of its methods in the future.
* Additional changes inspired by the test coverage checker.
- Remove unused-in-production IndexMerger methods "append" and "convert".
- Add additional unit tests to UnifiedIndexerAppenderatorsManager.
* Additional adjustments.
* Even more additional adjustments.
* Test fixes.
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.
This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.
In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.
So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!
* Fix test compiling.
* Fixes and improvements.
* Fix forbidden API.
* Additional fixes.
* add impl
* fix checkstyle
* add test
* add test
* add unit tests
* fix unit tests
* fix unit tests
* fix unit tests
* add IT
* add IT
* add comments
* fix spelling
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.
Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.
I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.
* Sort files in LocalFirehoseFactory.
* Add worker category as dimension in TaskSlotCountStatsMonitor
* Change description
* Add workerConfig as field
* Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics
* Fixing tests
* Fixing alerts
* Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs
* Resolving false positive spell check
* addressing comments
* throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner
Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>
* Fix bug Unrecognized token 'No': was expecting (JSON String,...) when calling the API /druid/indexer/v1/task/taskId/reports and the report is not found
* Also log other non-OK statuses
* Use a simple class to sanitize sanitizable errors and log them
The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface
add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransfromStrategy
* return less information as part of too many connections, and instead only log specific details
This is so an end user gets relevant information but not too much info since they might now how
many brokers they have
* return only runtime exceptions
added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.
* dont reqrewite exceptions unless necessary for checked exceptions
add docs
avoid blanket turning all exceptions into runtime exceptions
* address comments, to fix up docs.
add more javadocs
add support UOE sanitization
* use try catch instead and sanitize at public methods
* checkstyle fixes
* throw noSuchStatement and NoSuchConnection as Avatica is affected by those
* address comments. move log error back to druid meta
clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explainging not wanting to santize avatica exceptions
* alter test to reflect new error message
There are 3 types of query IDs - id, subQueryId, sqlQueryId. Currently, whenever a query generates subqueries, the subquery's subQueryId is populated randomly. Also, subquery's Id is not set to the parent query Id. Therefore there is no way of linking the subqueries to the parent query, and one loses the ability to look at end to end view of the query.
This PR aims to implement following couple of things:
Populate the subqueries with it's parent's id (and sqlQueryId if present)
Populate the subqueryId such that it forms a hierarchical relationship amongs themselves. For example, if there is a query which launches a subquery, which in turn launches a couple of subqueries, then the ids and subQueryIds should have following structure.
* RowBasedSegment: Use Sequence instead of Iterable.
The main reason this is good is that Sequences can include baggage that
must be closed after iteration is finished. This enables creating
RowBasedSegments on top of closeable sequences of rows.
To preserve the optimization that allows reversing a List without
copying it, this patch also makes SimpleSequence its own class and allows
extracting the Iterable that was used to create it.
* Fix tests.
* Remove StorageAdapter.getColumnTypeName.
It was only used by SegmentAnalyzer, and isn't necessary anymore due to
the recent improvements to ColumnCapabilities.
Also: tidy ColumnDescriptor.read slightly by removing an instanceof
check, and moving the relevant logic into ComplexColumnPartSerde.
* Fix spellings.
* complex typed expressions
* add built-in hll collector expressions to get coverage on druid-processing, more types, more better
* rampage!!!
* more javadoc
* adjustments
* oops
* lol
* remove unused dependency
* contradiction?
* more test
This PR adds support for range partitioning on multiple dimensions. It extends on the
concept and implementation of single dimension range partitioning.
The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension.
The start and end values of a DimensionRangeShardSpec are represented
by StringTuples, where each String in the tuple is the value of a partition dimension.
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to skip the realtime nodes and
thus it would query only Historical processes for any given segment.
* remove useless call to balanceServers for move from decom servers when there are no decom servers
* refactor approach to this PR but accomplish the same thing
* Remove CloseQuietly and migrate its usages to other methods.
These other methods include:
1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions
in RuntimeExceptions for callers that just want to avoid dealing with
checked exceptions. Most usages were migrated to this method, because it
looks like they were mainly attempts to avoid declaring a throws clause,
and perhaps were unintentionally suppressing IOExceptions.
2) New method CloseableUtils.closeInCatch, designed to properly close something
in a catch block without losing exceptions. Some usages from catch blocks
were migrated here, when it seemed that they were intended to avoid checked
exception handling, and did not really intend to also suppress IOExceptions.
3) New method CloseableUtils.closeAndSuppressExceptions, which sends all
exceptions to a "chomper" that consumes them. Nothing is thrown or returned.
The behavior is slightly different: with this method, _all_ exceptions are
suppressed, not just IOExceptions. Calls that seemed like they had good
reason to suppress exceptions were migrated here.
4) Some calls were migrated to try-with-resources, in cases where it appeared
that CloseQuietly was being used to avoid throwing an exception in a finally
block.
🎵 You don't have to go home, but you can't stay here... 🎵
* Remove unused import.
* Fix up various issues.
* Adjustments to tests.
* Fix null handling.
* Additional test.
* Adjustments from review.
* Fixup style stuff.
* Fix NPE caused by holder starting out null.
* Fix spelling.
* Chomp Throwables too.
* better type system
* needle in a haystack
* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support
* fixup merge
* more test
* fixup
* intern
* fix
* oops
* oops again
* ...
* more test coverage
* fix error message
* adjust interning, more javadocs
* oops
* more docs more better
this change ensures that JettyTest is setting the properties it needs in case some other test overwrites them
this also changes up the ordering of the call for setProperties to call super's first in case super is setting the same property
* Add the ability to add a context to internally generated druid broker queries
* fix docs
* changes after first CI failure
* cleanup after merge with master
* change default to empty map and improve unit tests
* add doc info and fix checkstyle
* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.
This config is useful when you want a completely isolated tier amongst many other tiers.
Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm
If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]
* Fix issue of duplicate key under certain conditions when loading late data in streaming. Also fixes a documentation issue with skipSegmentLineageCheck.
* maxId may be null at this point, need to check for that
* Remove hypothetical case (it cannot happen)
* Revert compaction is simply "killing" the compacted segment and previously, used, overshadowed segments are visible again
* Add comments
* refactor sql authorization to get resource type from schema, refactor resource type from enum to string
* information schema auth filtering adjustments
* refactor
* minor stuff
* Update SqlResourceCollectorShuttle.java
* Make persists concurrent with ingestion
* Remove semaphore but keep concurrent persists (with add) and add push in the backround as well
* Go back to documented default persists (zero)
* Move to debug
* Remove unnecessary Atomics
* Comments on synchronization (or not) for sinks & sinkMetadata
* Some cleanup for unit tests but they still need further work
* Shutdown & wait for persists and push on close
* Provide support for three existing batch appenderators using batchProcessingMode flag
* Fix reference to wrong appenderator
* Fix doc typos
* Add BatchAppenderators class test coverage
* Add log message to batchProcessingMode final value, fix typo in enum name
* Another typo and minor fix to log message
* LEGACY->OPEN_SEGMENTS, Edit docs
* Minor update legacy->open segments log message
* More code comments, mostly small adjustments to naming etc
* fix spelling
* Exclude BtachAppenderators from Jacoco since it is fully tested but Jacoco still refuses to ack coverage
* Coverage for Appenderators & BatchAppenderators, name change of a method that was still using "legacy" rather than "openSegments"
Co-authored-by: Clint Wylie <cjwylie@gmail.com>
* initial work
* reduce lock in sqlLifecycle
* Integration test for sql canceling
* javadoc, cleanup, more tests
* log level to debug
* fix test
* checkstyle
* fix flaky test; address comments
* rowTransformer
* cancelled state
* use lock
* explode instead of noop
* oops
* unused import
* less aggressive with state
* fix calcite charset
* don't emit metrics when you are not authorized
Fixes#11297.
Description
Description and design in the proposal #11297
Key changed/added classes in this PR
*DataSegmentPusher
*ShuffleClient
*PartitionStat
*PartitionLocation
*IntermediaryDataManager
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.
This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.
To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
* Improve concurrency between DruidSchema and BrokerServerView
* unused imports and workaround for error prone faiure
* count only known segments
* add comments
* Update default maxSegmentsInNodeLoadingQueue
Update the default maxSegmentsInNodeLoadingQueue from 0 (unbounded) to 100.
An unbounded maxSegmentsInNodeLoadingQueue can cause cluster instability.
Since this is the default druid operators need to run into this instability
and then look through the docs to see that the recommended value for a large
cluster is 1000. This change makes it so the default will prevent clusters
from falling over as they grow over time.
* update tests
* codestyle
* Allow kill task to mark segments as unused
* Add IndexerSQLMetadataStorageCoordinator test
* Update docs/ingestion/data-management.md
Co-authored-by: Jihoon Son <jihoonson@apache.org>
* Add warning to kill task doc
Co-authored-by: Jihoon Son <jihoonson@apache.org>
This change allows the selection of a specific broker service (or broker tier) by the Router.
The newly added ManualTieredBrokerSelectorStrategy works as follows:
Check for the parameter brokerService in the query context. If this is a valid broker service, use it.
Check if the field defaultManualBrokerService has been set in the strategy. If this is a valid broker service, use it.
Move on to the next strategy
* Add a new metric query/segments/count that is not emitted by default
* docs
* test the default implementation of the metric
* fix spelling error in docs
* document the fact that query retries will result in additional metric emissions
* update using recommended text from @jihoonson
This PR refactors the code related to segment loading specifically SegmentLoader and SegmentLoaderLocalCacheManager. SegmentLoader is marked UnstableAPI which means, it can be extended outside core druid in custom extensions. Here is a summary of changes
SegmentLoader returns an instance of ReferenceCountingSegment instead of Segment. Earlier, SegmentManager was wrapping Segment objects inside ReferenceCountingSegment. That is now moved to SegmentLoader. With this, a custom implementation can track the references of segments. It also allows them to create custom ReferenceCountingSegment implementations. For this reason, the constructor visibility in ReferenceCountingSegment is changed from private to protected.
SegmentCacheManager has two additional methods called - reserve(DataSegment) and release(DataSegment). These methods let the caller reserve or release space without calling SegmentLoader#getSegment. We already had similar methods in StorageLocation and now they are available in SegmentCacheManager too which wraps multiple locations.
Refactoring to simplify the code in SegmentCacheManager wherever possible. There is no change in the functionality.
This PR splits current SegmentLoader into SegmentLoader and SegmentCacheManager.
SegmentLoader - this class is responsible for building the segment object but does not expose any methods for downloading, cache space management, etc. Default implementation delegates the download operations to SegmentCacheManager and only contains the logic for building segments once downloaded. . This class will be used in SegmentManager to construct Segment objects.
SegmentCacheManager - this class manages the segment cache on the local disk. It fetches the segment files to the local disk, can clean up the cache, and in the future, support reserve and release on cache space. [See https://github.com/Make SegmentLoader extensible and customizable #11398]. This class will be used in ingestion tasks such as compaction, re-indexing where segment files need to be downloaded locally.
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* compaction/status API retains status for datasources that no longer existed causing in-memory used to grow unbounded
* fix test
* fix test
* Bound memory in native batch ingest create segments
* Move BatchAppenderatorDriverTest to indexing service... note that we had to put the sink back in sinks in mergeandpush since the persistent data needs to be dropped and the sink is required for that
* Remove sinks from memory and clean up intermediate persists dirs manually after sink has been merged
* Changed name from RealtimeAppenderator to StreamAppenderator
* Style
* Incorporating tests from StreamAppenderatorTest
* Keep totalRows and cleanup code
* Added missing dep
* Fix unit test
* Checkstyle
* allowIncrementalPersists should always be true for batch
* Added sinks metadata
* clear sinks metadata when closing appenderator
* Style + minor edits to log msgs
* Update sinks metadata & totalRows when dropping a sink (segment)
* Remove max
* Intelli-j check
* Keep a count of hydrants persisted by sink for sanity check before merge
* Move out sanity
* Add previous hydrant count to sink metadata
* Remove redundant field from SinkMetadata
* Remove unneeded functions
* Cleanup unused code
* Removed unused code
* Remove unused field
* Exclude it from jacoco because it is very hard to get branch coverage
* Remove segment announcement and some other minor cleanup
* Add fallback flag
* Minor code cleanup
* Checkstyle
* Code review changes
* Update batchMemoryMappedIndex name
* Code review comments
* Exclude class from coverage, will include again when packaging gets fixed
* Moved test classes to server module
* More BatchAppenderator cleanup
* Fix bug in wrong counting of totalHydrants plus minor cleanup in add
* Removed left over comments
* Have BatchAppenderator follow the Appenderator contract for push & getSegments
* Fix LGTM violations
* Review comments
* Add stats after push is done
* Code review comments (cleanup, remove rest of synchronization constructs in batch appenderator, reneame feature flag,
remove real time flag stuff from stream appenderator, etc.)
* Update javadocs
* Add thread safety notice to BatchAppenderator
* Further cleanup config
* More config cleanup
* support using mariadb connector with mysql extensions
* cleanup and more tests
* fix test
* javadocs, more tests, etc
* style and more test
* more test more better
* missing pom
* more pom
* add single input string expression dimension vector selector and better expression planning
* better
* fixes
* oops
* rework how vector processor factories choose string processors, fix to be less aggressive about vectorizing
* oops
* javadocs, renaming
* more javadocs
* benchmarks
* use string expression vector processor with vector size 1 instead of expr.eval
* better logging
* javadocs, surprising number of the the
* more
* simplify
This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.
* upgrade error-prone to 2.7.1 and support checks with Java 11+
- upgrade error-prone to 2.7.1
- support running error-prone with Java 11 and above using -Xplugin
instead of custom compiler
- add compiler arguments to ignore warnings/errors in Java 15/16
- introduce strictCompile property to enable strict profiles since we
now need multiple strict profiles for Java 8
- properly exclude all generated source files from error-prone
- fix druid-processing overriding annotation processors from parent pom
- fix druid-core disabling most non-default checks
- align plugin and annotation errorprone versions
- fix / suppress additional issues found by error-prone:
* fix bug in SeekableStreamSupervisor initializing ArrayList size with
the taskGroupdId
* fix missing @Override annotations
- remove outdated compiler plugin in benchmarks
- remove deleted ParameterPackage error-prone rule
- re-enable checks on benchmark module as well
* fix IntelliJ inspections
* disable LongFloatConversion due to bug in error-prone with JDK 8
* add comment about InsecureCrypto
* enrich expression cache key information to support expressions which depend on external state such as lookups
* cache rules everything around me
* low carb
* rename
With this change, Druid will only support ZooKeeper 3.5.x and later.
In order to support Java 15 we need to switch to ZK 3.5.x client libraries and drop support for ZK 3.4.x
(see #10780 for the detailed reasons)
* remove ZooKeeper 3.4.x compatibility
* exclude additional ZK 3.5.x netty dependencies to ensure we use our version
* keep ZooKeeper version used for integration tests in sync with client library version
* remove the need to specify ZK version at runtime for docker
* add support to run integration tests with JDK 15
* build and run unit tests with Java 15 in travis
* Avoid mapping hydrants in create segments phase for native ingestion
* Drop queriable indices after a given sink is fully merged
* Do not drop memory mappings for realtime ingestion
* Style fixes
* Renamed to match use case better
* Rollback memoization code and use the real time flag instead
* Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations
* Style
* Log some count stats
* Make sure sinks size is obtained at the right time
* BatchAppenderator unit test
* Fix comment typos
* Renamed methods to make them more readable
* Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator
* Missing dependency
* Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments.
* Replaced concurrent variables with normal ones
* Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path.
* Style fix.
* Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment.
* Forgot to commit this edited documentation message
* lay the groundwork for throttling replicant loads per RunRules execution
* Add dynamic coordinator config to control new replicant threshold.
* remove redundant line
* add some unit tests
* fix checkstyle error
* add documentation for new dynamic config
* improve docs and logs
* Alter how null is handled for new config. If null, manually set as default
* ARRAY_AGG sql aggregator function
* add javadoc
* spelling
* review stuff, return null instead of empty when nil input
* review stuff
* Update sql.md
* use type inference for finalize, refactor some things
* Add feature to automatically remove rules based on retention period
* Add feature to automatically remove rules based on retention period
* address comments
* DataSchema: Improve duplicate-column error message.
Now, when duplicate columns are specified, the error message will include
information about where those duplicate columns were seen. Also, if there
are multiple duplicate columns, all will be listed in the error message
instead of just the first one encountered.
* Fix style for checkstyle.
* Further improve error message.