* use BloomFilter instead of BloomKFilter since the latters test method is not threadsafe
* fix formatting
* style and forbidden api
* remove redundant hive notice entry
* add todo with note to delete copied implementation and link to related hive jira
* better fix for masks than ThreadLocal
* move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql
* fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays
* remove leftover print
* convert micro timestamp to millis
* checkstyle
* add ignore for .parquet and .parq to rat exclude
* fix legit test failure from avro flattern behavior change
* fix rebase
* add exclusions to pom to cut down on redundant jars
* refactor tests, add support for unwrapping lists for parquet-avro, review comments
* more comment
* fix oops
* tweak parquet-avro list handling
* more docs
* fix style
* grr styles
* Prohibit some guava collection APIs and use JDK APIs directly
* reset files that changed by accident
* sort codestyle/druid-forbidden-apis.txt alphabetically
* remove consumer.listTopics() method
* add consumerLock and exception handling for consumer.partitionFor() and remove some useless checks
* add check in case consumer.partitionsFor() returns null
* fix CI failure
* fix failed UT
* Revert "fix CI failure"
This reverts commit f839d09e1e.
* revert unless commit and re-commit the useful part to fix failed UT
* use a sha512 hash of bloom filter for cache key instead of filter bytes
* make serde private, BloomDimFilter.toString and BloomDimFilter.equals use hash instead of bloomKFilter which has no tostring or equals of its own
* keep and use HashCode object instead of converting to bytes up front
* uneeded imports oops
* tweaks from review
* refactor dupe code
* refactor
* include mysql-metadata-storage extension in distribution, but without the GPL-licensed connector library
* Install mysql connector package
* use symlinks to avoid versioning issues
* add documentation for fetching the mysql connector
* Use NodeType enum instead of Strings
* Make NodeType constants uppercase
* Fix CommonCacheNotifier and NodeType/ServerType comments
* Reconsidering comment
* Fix import
* Add a comment to CommonCacheNotifier.NODE_TYPES
* Added SystemSchema with following tables (#5989)
* SEGMENTS table provides details on served and published segments
* SERVERS table provides details on data servers
* SERVERSEGMETS table is the JOIN of SEGMENTS and SERVERS
* TASKS table provides details on tasks
* Add documentation for system schema
* Fix static-analysis warnings
* Address PR comments
*Add unit tests
* Fix a test
* Try to fix a test
* Fix a bug around replica count
* rename io.druid to org.apache.druid
* Major change is to make tasks and segment queries streaming
* Made tasks/segments stream to calcite instead of storing it in memory
* Add num_rows to segments table
* Refactor JsonParserIterator
* Replace with closeable iterator
* Fix docs, make num_rows column nullable, some unit test changes
* make num_rows column type long, allow it to be null
fix a compile error after merge, add TrafficCop param to InputStreamResponseHandler
* Filter null rows for segments table from Linq4j enumerable
* change num_replicas datatype to long in segments table
* Fix some tests and address comments
* Doc updates, other PR comments
* Update tests
* Address comments
* Add auth check
* Update docs
* Refactoring
* Fix teamcity warning, change the getQueryableServer in TimelineServerView
* Fix compilation after rebase
* Use the stream API from AuthorizationUtils
* Added LeaderClient interface and NoopDruidLeaderClient class
* Revert "Added LeaderClient interface and NoopDruidLeaderClient class"
This reverts commit 100fa46e39.
* Make the naming consistent to server_segments for the join table
* Add ForbiddenException on auth check failure
* Remove static block from SystemSchema
* Try to fix a test in CalciteQueryTest due to rename of server_segments
* Fix the json output format in the coordinator API
* Add auth check in the segments API
* Add null check to avoid NPE
* Use annonymous class object instead of mock for DruidLeaderClient in SqlBenchmark
* Fix test failures, type long/BIGINT can be nullable
* Revert long nullability to fix tests
* Fix style for tests
* PR comments
* Address PR comments
* Add the missing BytesAccumulatingResponseHandler class
* Use Sequences.withBaggage in DruidPlanner
* Fix docs, add comments
* Close the iterator if hasNext returns false
* Prevent failed KafkaConsumer creation from blocking overlord startup
* PR comments
* Fix random task ID length
* Adjust test timer
* Use Integer.SIZE
This PR accumulates many refactorings and small improvements that I did while preparing the next change set of https://github.com/druid-io/druid/projects/2. I finally decided to make them a separate PR to minimize the volume of the main PR.
Some of the changes:
- Renamed confusing "Generic Column" term to "Numeric Column" (what it actually implies) in many class names.
- Generified `ComplexMetricExtractor`
* 'suspend' and 'resume' support for kafka indexing service
changes:
* introduces `SuspendableSupervisorSpec` interface to describe supervisors which support suspend/resume functionality controlled through the `SupervisorManager`, which will gracefully shutdown the supervisor and it's tasks, update it's `SupervisorSpec` with either a suspended or running state, and update with the toggled spec. Spec updates are provided by `SuspendableSupervisorSpec.createSuspendedSpec` and `SuspendableSupervisorSpec.createRunningSpec` respectively.
* `KafkaSupervisorSpec` extends `SuspendableSupervisorSpec` and now supports suspend/resume functionality. The difference in behavior between 'running' and 'suspended' state is whether the supervisor will attempt to ensure that indexing tasks are or are not running respectively. Behavior is identical otherwise.
* `SupervisorResource` now provides `/druid/indexer/v1/supervisor/{id}/suspend` and `/druid/indexer/v1/supervisor/{id}/resume` which are used to suspend/resume suspendable supervisors
* Deprecated `/druid/indexer/v1/supervisor/{id}/shutdown` and moved it's functionality to `/druid/indexer/v1/supervisor/{id}/terminate` since 'shutdown' is ambiguous verbage for something that effectively stops a supervisor forever
* Added ability to get all supervisor specs from `/druid/indexer/v1/supervisor` by supplying the 'full' query parameter `/druid/indexer/v1/supervisor?full` which will return a list of json objects of the form `{"id":<id>, "spec":<SupervisorSpec>}`
* Updated overlord console ui to enable suspend/resume, and changed 'shutdown' to 'terminate'
* move overlord console status to own column in supervisor table so does not look like garbage
* spacing
* padding
* other kind of spacing
* fix rebase fail
* fix more better
* all supervisors now suspendable, updated materialized view supervisor to support suspend, more tests
* fix log
* Broker backpressure.
Adds a new property "druid.broker.http.maxQueuedBytes" and a new context
parameter "maxQueuedBytes". Both represent a maximum number of bytes queued
per query before exerting backpressure on the channel to the data server.
Fixes#4933.
* Fix query context doc.
* resolves#5898 by adding maxTotalRows to incremental publishing kafka index task and appenderator based realtime indexing task, as available in IndexTask
* address review comments
* changes due to review
* merge fail
* Rename io.druid to org.apache.druid.
* Fix META-INF files and remove some benchmark results.
* MonitorsConfig update for metrics package migration.
* Reorder some dimensions in inner queries for some reason.
* Fix protobuf tests.