Commit Graph

1388 Commits

Author SHA1 Message Date
Kashif Faraz 50461c3bd5
Enable smartSegmentLoading on the Coordinator (#13197)
This commit does a complete revamp of the coordinator to address problem areas:
- Stability: Fix several bugs, add capabilities to prioritize and cancel load queue items
- Visibility: Add new metrics, improve logs, revamp `CoordinatorRunStats`
- Configuration: Add dynamic config `smartSegmentLoading` to automatically set
optimal values for all segment loading configs such as `maxSegmentsToMove`,
`replicationThrottleLimit` and `maxSegmentsInNodeLoadingQueue`.

Changed classes:
- Add `StrategicSegmentAssigner` to make assignment decisions for load, replicate and move
- Add `SegmentAction` to distinguish between load, replicate, drop and move operations
- Add `SegmentReplicationStatus` to capture current state of replication of all used segments
- Add `SegmentLoadingConfig` to contain recomputed dynamic config values
- Simplify classes `LoadRule`, `BroadcastRule`
- Simplify the `BalancerStrategy` and `CostBalancerStrategy`
- Add several new methods to `ServerHolder` to track loaded and queued segments
- Refactor `DruidCoordinator`

Impact:
- Enable `smartSegmentLoading` by default. With this enabled, none of the following
dynamic configs need to be set: `maxSegmentsToMove`, `replicationThrottleLimit`,
`maxSegmentsInNodeLoadingQueue`, `useRoundRobinSegmentAssignment`,
`emitBalancingStats` and `replicantLifetime`.
- Coordinator reports richer metrics and produces cleaner and more informative logs
- Coordinator uses an unlimited load queue for all serves, and makes better assignment decisions
2023-06-19 14:27:35 +05:30
Abhishek Agarwal 139156cf6b
Reduce the spam in broker logs (#14368) 2023-06-05 18:56:34 +05:30
Katya Macedo 269137c682
Update Ingestion section (#14023)
Co-authored-by: Charles Smith <techdocsmith@gmail.com>
Co-authored-by: Victoria Lim <vtlim@users.noreply.github.com>
Co-authored-by: Victoria Lim <lim.t.victoria@gmail.com>
2023-05-19 09:42:27 -07:00
Paul Rogers 3c0983c8e9
Extend the IT framework to allow tests in extensions (#13877)
The "new" IT framework provides a convenient way to package and run integration tests (ITs), but only for core modules. We have a use case to run an IT for a contrib extension: the proposed gRPC query extension. This PR provides the IT framework functionality to allow non-core ITs.
2023-05-15 20:29:51 +05:30
imply-cheddar f9861808bc
Be able to load segments on Peons (#14239)
* Be able to load segments on Peons

This change introduces a new config on WorkerConfig
that indicates how many bytes of each storage
location to use for storage of a task.  Said config
is divided up amongst the locations and slots
and then used to set TaskConfig.tmpStorageBytesPerTask

The Peons use their local task dir and
tmpStorageBytesPerTask as their StorageLocations for
the SegmentManager such that they can accept broadcast
segments.
2023-05-12 16:51:00 -07:00
Kashif Faraz ba11b3d462
Refactor: Add OverlordDuty to replace OverlordHelper and align with CoordinatorDuty (#14235)
Changes:
- Replace `OverlordHelper` with `OverlordDuty` to align with `CoordinatorDuty`
  - Each duty has a `run()` method and defines a `Schedule` with an initial delay and period.
  - Update existing duties `TaskLogAutoCleaner` and `DurableStorageCleaner`
- Add utility class `Configs`
- Update log, error messages and javadocs
- Other minor style improvements
2023-05-12 22:39:56 +05:30
Clint Wylie 8805d8d7db
fix issues with filtering nulls on values coerced to numeric types (#14139)
* fix issues with filtering nulls on values coerced to numeric types
* fix issues with 'auto' type numeric columns in default value mode
* optimize variant typed columns without nested data
* more tests for 'auto' type column ingestion
2023-05-08 13:19:02 -07:00
Abhishek Agarwal edfd46ed45
Better actionable error message when druid services are not running (#14202)
We have seen that the first-time users often don't know the next steps if druid services are unresponsive for some reason. This PR makes some of those messages a bit more clear.
2023-05-04 18:03:59 +05:30
Clint Wylie 90ea192d9c
fix bugs with auto encoded long vector deserializers (#14186)
This PR fixes an issue when using 'auto' encoded LONG typed columns and the 'vectorized' query engine. These columns use a delta based bit-packing mechanism, and errors in the vectorized reader would cause it to incorrectly read column values for some bit sizes (1 through 32 bits). This is a regression caused by #11004, which added the optimized readers to improve performance, so impacts Druid versions 0.22.0+.

While writing the test I finally got sad enough about IndexSpec not having a "builder", so I made one, and switched all the things to use it. Apologies for the noise in this bug fix PR, the only real changes are in VSizeLongSerde, and the tests that have been modified to cover the buggy behavior, VSizeLongSerdeTest and ExpressionVectorSelectorsTest. Everything else is just cleanup of IndexSpec usage.
2023-05-01 11:49:27 +05:30
Gian Merlino 0884a22c41
MSQ: Support for querying lookup and inline data directly. (#14048)
* MSQ: Support for querying lookup and inline data directly.

Main changes:

1) Add of LookupInputSpec and DataSourcePlan.forLookup.

2) Add InlineInputSpec, and modify of DataSourcePlan.forInline to use
   this instead of an ExternalInputSpec with JSON. This allows the inline
   data to act as the right-hand side of a join, if needed.

Supporting changes:

1) Modify JoinDataSource's leftFilter validation to be a little less
   strict: it's now OK with leftFilter being attached to any concrete
   leaf (no children) datasource, rather than requiring it be a table.
   This allows MSQ to create JoinDataSource with InputNumberDataSource
   as the base.

2) Add SegmentWranglerModule to CliIndexer, CliPeon. This allows them to
   query lookups and inline data directly.

* Updates based on CI.

* Additional tests.

* Style fix.

* Remove unused import.
2023-04-14 14:04:02 -07:00
Atul Mohan e3c160f2f2
Add start_time column to sys.servers (#13358)
Adds a new column start_time to sys.servers that captures the time at which the server was added to the cluster.
2023-04-14 15:23:34 +05:30
imply-cheddar aaa6cc1883
Make the tasks run with only a single directory (#14063)
* Make the tasks run with only a single directory

There was a change that tried to get indexing to run on multiple disks
It made a bunch of changes to how tasks run, effectively hiding the
"safe" directory for tasks to write files into from the task code itself
making it extremely difficult to do anything correctly inside of a task.

This change reverts those changes inside of the tasks and makes it so that
only the task runners are the ones that make decisions about which
mount points should be used for storing task-related files.

It adds the config druid.worker.baseTaskDirs which can be used by the
task runners to know which directories they should schedule tasks inside of.
The TaskConfig remains the authoritative source of configuration for where
and how an individual task should be operating.
2023-04-13 00:45:02 -07:00
Clint Wylie 1aef72aa7e
Bump up the version in pom to 27.0.0 in preparation of release (#14051) 2023-04-10 14:56:59 +05:30
Clint Wylie d21babc5b8
remix nested columns (#14014)
changes:
* introduce ColumnFormat to separate physical storage format from logical type. ColumnFormat is now used instead of ColumnCapabilities to get column handlers for segment creation
* introduce new 'auto' type indexer and merger which produces a new common nested format of columns, which is the next logical iteration of the nested column stuff. Essentially this is an automatic type column indexer that produces the most appropriate column for the given inputs, making either STRING, ARRAY<STRING>, LONG, ARRAY<LONG>, DOUBLE, ARRAY<DOUBLE>, or COMPLEX<json>.
* revert NestedDataColumnIndexer, NestedDataColumnMerger, NestedDataColumnSerializer to their version pre #13803 behavior (v4) for backwards compatibility
* fix a bug in RoaringBitmapSerdeFactory if anything actually ever wrote out an empty bitmap using toBytes and then later tried to read it (the nerve!)
2023-04-04 17:51:59 -07:00
Clint Wylie d5b1b5bc8e
nested columns + arrays = array columns! (#13803)
array columns!
changes:
* add support for storing nested arrays of string, long, and double values as specialized nested columns instead of breaking them into separate element columns
* nested column type mimic behavior means that columns ingested with only root arrays of primitive values will be ARRAY typed columns
* neat test refactor stuff
* add v4 segment test
* add array element indexes
* add tests for unnest and array columns
* fix unnest column value selector cursor handling of null and empty arrays
2023-03-27 12:42:35 -07:00
Rishabh Singh 598eaad7e1
Fix HSTS for middle manager (#13975)
Fix HSTS for middle manager
2023-03-25 14:01:09 +05:30
Gian Merlino fe9d0c46d5
Improve memory efficiency of WrappedRoaringBitmap. (#13889)
* Improve memory efficiency of WrappedRoaringBitmap.

Two changes:

1) Use an int[] for sizes 4 or below.
2) Remove the boolean compressRunOnSerialization. Doesn't save much
   space, but it does save a little, and it isn't adding a ton of value
   to have it be configurable. It was originally configurable in case
   anything broke when enabling it, but it's been a while and nothing
   has broken.

* Slight adjustment.

* Adjust for inspection.

* Updates.

* Update snaps.

* Update test.

* Adjust test.

* Fix snaps.
2023-03-09 15:48:02 -08:00
Karan Kumar 65c3954942
Adding forbidden api for Properties#get() and Properties#getOrDefault() (#13882)
Properties#getOrDefault method does not check the default map for values where as Properties#getProperty() does.
2023-03-06 10:42:04 +05:30
Tejaswini Bandlamudi 7103cb4b9d
Removes FiniteFirehoseFactory and its implementations (#12852)
The FiniteFirehoseFactory and InputRowParser classes were deprecated in 0.17.0 (#8823) in favor of InputSource & InputFormat. This PR removes the FiniteFirehoseFactory and all its implementations along with classes solely used by them like Fetcher (Used by PrefetchableTextFilesFirehoseFactory). Refactors classes including tests using FiniteFirehoseFactory to use InputSource instead.
Removing InputRowParser may not be as trivial as many classes that aren't deprecated depends on it (with no alternatives), like EventReceiverFirehoseFactory. Hence FirehoseFactory, EventReceiverFirehoseFactory, and Firehose are marked deprecated.
2023-03-02 18:07:17 +05:30
Suneet Saldanha 31c7de1087
Make CompactionSearchPolicy injectable (#13842)
* Make CompactionSearchPolicy injectable

A small refactoring that makes the search policy for compaction injectable.

Future changes can introduce new search policies that can be configured and
injected so that operators can choose which search policy is best suited for
their cluster.

This will also allow us to de-couple the scheduling of compaction jobs from
the CompactSegments duty, allowing the co-ordinator to schedule compaction
jobs faster than the duty lifecycle.

This PR is made so that it easy to review the future changes.

* fix tests
2023-02-27 07:57:03 -08:00
Clint Wylie 08b5951cc5
merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything (#13698)
* merge druid-core, extendedset, and druid-hll into druid-processing to simplify everything
* fix poms and license stuff
* mockito is evil
* allow reset of JvmUtils RuntimeInfo if tests used static injection to override
2023-02-17 14:27:41 -08:00
Paul Rogers 333196d207
Code cleanup & message improvements (#13778)
* Misc cleanup edits

Correct spacing
Add type parameters
Add toString() methods to formats so tests compare correctly
IT doc revisions
Error message edits
Display UT query results when tests fail

* Edit

* Build fix

* Build fixes
2023-02-15 15:22:54 +05:30
AmatyaAvadhanula 0cf1fc3d55
Indexing on multiple disks (#13476)
* Initial commit

* Simple UTs

* Parameterize tests

* Parameterized tests for k8s task runner

* Fix restore bug

* Refactor TaskStorageDirTracker

* Change CliPeon args
2023-02-08 11:31:34 +05:30
Abhishek Agarwal 17936e2920
Add an option to enable HSTS in druid services (#13489)
* Add an option to enable HSTS

* Fix code and add docs

* Deduplicate headers

* unused import

* Fix spelling
2023-01-10 22:31:51 +05:30
imply-cheddar 0efd0879a8
Unify the handling of HTTP between SQL and Native (#13564)
* Unify the handling of HTTP between SQL and Native

The SqlResource and QueryResource have been
using independent logic for things like error
handling and response context stuff.  This
became abundantly clear and painful during a
change I was making for Window Functions, so
I unified them into using the same code for
walking the response and serializing it.

Things are still not perfectly unified (it would
be the absolute best if the SqlResource just
took SQL, planned it and then delegated the
query run entirely to the QueryResource), but
this refactor doesn't take that fully on.

The new code leverages async query processing
from our jetty container, the different
interaction model with the Resource means that
a lot of tests had to be adjusted to align with
the async query model.  The semantics of the
tests remain the same with one exception: the
SqlResource used to not log requests that failed
authorization checks, now it does.
2022-12-19 00:25:33 -08:00
Kashif Faraz 7cf761cee4
Prepare master branch for next release, 26.0.0 (#13401)
* Prepare master branch for next release, 26.0.0

* Use docker image for druid 24.0.1

* Fix version in druid-it-cases pom.xml
2022-11-22 15:31:01 +05:30
Clint Wylie 1231ce3b75
dump-segment tool support for examining nested columns (#13356)
* add nested mode to dump segment tool to dump nested columns

* docs

* more test

* fix it
2022-11-14 16:08:47 -08:00
Dr. Sizzles e5ad24ff9f
Support for middle manager less druid, tasks launch as k8s jobs (#13156)
* Support for middle manager less druid, tasks launch as k8s jobs

* Fixing forking task runner test

* Test cleanup, dependency cleanup, intellij inspections cleanup

* Changes per PR review

Add configuration option to disable http/https proxy for the k8s client
Update the docs to provide more detail about sidecar support

* Removing un-needed log lines

* Small changes per PR review

* Upon task completion we callback to the overlord to update the status / locaiton, for slower k8s clusters, this reduces locking time significantly

* Merge conflict fix

* Fixing tests and docs

* update tiny-cluster.yaml 

changed `enableTaskLevelLogPush` to `encapsulatedTask`

* Apply suggestions from code review

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>

* Minor changes per PR request

* Cleanup, adding test to AbstractTask

* Add comment in peon.sh

* Bumping code coverage

* More tests to make code coverage happy

* Doh a duplicate dependnecy

* Integration test setup is weird for k8s, will do this in a different PR

* Reverting back all integration test changes, will do in anotbher PR

* use StringUtils.base64 instead of Base64

* Jdk is nasty, if i compress in jdk 11 in jdk 17 the decompressed result is different

Co-authored-by: Rahul Gidwani <r_gidwani@apple.com>
Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-11-02 19:44:47 -07:00
Paul Rogers b34b4353f4
Async reads for JDBC (#13196)
Async reads for JDBC:
Prevents JDBC timeouts on long queries by returning empty batches
when a batch fetch takes too long. Uses an async model to run the
result fetch concurrently with JDBC requests.

Fixed race condition in Druid's Avatica server-side handler
Fixed issue with no-user connections
2022-10-18 11:40:57 -07:00
Paul Rogers f4dcc52dac
Redesign QueryContext class (#13071)
We introduce two new configuration keys that refine the query context security model controlled by druid.auth.authorizeQueryContextParams. When that value is set to true then two other configuration options become available:

druid.auth.unsecuredContextKeys: The set of query context keys that do not require a security check. Use this for the "white-list" of key to allow. All other keys go through the existing context key security checks.
druid.auth.securedContextKeys: The set of query context keys that do require a security check. Use this when you want to allow all but a specific set of keys: only these keys go through the existing context key security checks.
Both are set using JSON list format:

druid.auth.securedContextKeys=["secretKey1", "secretKey2"]
You generally set one or the other values. If both are set, unsecuredContextKeys acts as exceptions to securedContextKeys.

In addition, Druid defines two query context keys which always bypass checks because Druid uses them internally:

sqlQueryId
sqlStringifyArrays
2022-10-15 11:02:11 +05:30
AmatyaAvadhanula 41e51b21c3
Make http options the default configurations (#13092)
Druid currently uses Zookeeper dependent options as the default.
This commit updates the following to use HTTP as the default instead.
- task runner. `druid.indexer.runner.type=remote -> httpRemote`
- load queue peon. `druid.coordinator.loadqueuepeon.type=curator -> http`
- server inventory view. `druid.serverview.type=curator -> http`
2022-10-05 05:35:17 +05:30
Rohan Garg 7aa8d7f987
Add query/time metric for SQL queries from router (#12867)
* Add query/time metric for SQL queries from router

* Fix query cancel bug when user has overriden native query-id in a SQL query
2022-09-07 13:54:46 +05:30
Abhishek Agarwal 618757352b
Bump up the version to 25.0.0 (#12975)
* Bump up the version to 25.0.0

* Fix the version in console
2022-08-29 11:27:38 +05:30
Karan Kumar 275f834b2a
Race in Task report/log streamer (#12931)
* Fixing RACE in HTTP remote task Runner

* Changes in the interface

* Updating documentation

* Adding test cases to SwitchingTaskLogStreamer

* Adding more tests
2022-08-25 17:56:01 -07:00
Paul Rogers cfed036091
Add the new integration test framework (#12368)
This commit is a first draft of the revised integration test framework which provides:
- A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.)
- Maven module druid-it-tools to hold code placed into the Docker image.
- Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.)
- Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.

The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.
2022-08-24 17:03:23 +05:30
Paul Rogers 8ad8582dc8
Refactor DruidSchema & DruidTable (#12835)
Refactors the DruidSchema and DruidTable abstractions to prepare for the Druid Catalog.

As we add the catalog, we’ll want to combine physical segment metadata information with “hints” provided by the catalog. This is best done if we tidy up the existing code to more clearly separate responsibilities.

This PR is purely a refactoring move: no functionality changed. There is no difference to user functionality or external APIs. Functionality changes will come later as we add the catalog itself.

DruidSchema
In the present code, DruidSchema does three tasks:

Holds the segment metadata cache
Interfaces with an external schema manager
Acts as a schema to Calcite
This PR splits those responsibilities.

DruidSchema holds the Calcite schema for the druid namespace, combining information fro the segment metadata cache, from the external schema manager and (later) from the catalog.
SegmentMetadataCache holds the segment metadata cache formerly in DruidSchema.
DruidTable
The present DruidTable class is a bit of a kitchen sink: it holds all the various kinds of tables which Druid supports, and uses if-statements to handle behavior that differs between types. Yet, any given DruidTable will handle only one such table type. To more clearly model the actual table types, we split DruidTable into several classes:

DruidTable becomes an abstract base class to hold Druid-specific methods.
DatasourceTable represents a datasource.
ExternalTable represents an external table, such as from EXTERN or (later) from the catalog.
InlineTable represents the internal case in which we attach data directly to a table.
LookupTable represents Druid’s lookup table mechanism.
The new subclasses are more focused: they can be selective about the data they hold and the various predicates since they represent just one table type. This will be important as the catalog information will differ depending on table type and the new structure makes adding that logic cleaner.

DatasourceMetadata
Previously, the DruidSchema segment cache would work with DruidTable objects. With the catalog, we need a layer between the segment metadata and the table as presented to Calcite. To fix this, the new SegmentMetadataCache class uses a new DatasourceMetadata class as its cache entry to hold only the “physical” segment metadata information: it is up to the DruidTable to combine this with the catalog information in a later PR.

More Efficient Table Resolution
Calcite provides a convenient base class for schema objects: AbstractSchema. However, this class is a bit too convenient: all we have to do is provide a map of tables and Calcite does the rest. This means that, to resolve any single datasource, say, foo, we need to cache segment metadata, external schema information, and catalog information for all tables. Just so Calcite can do a map lookup.

There is nothing special about AbstractSchema. We can handle table lookups ourselves. The new AbstractTableSchema does this. In fact, all the rest of Calcite wants is to resolve individual tables by name, and, for commands we don’t use, to provide a list of table names.

DruidSchema now extends AbstractTableSchema. SegmentMetadataCache resolves individual tables (and provides table names.)

DruidSchemaManager
DruidSchemaManager provides a way to specify table schemas externally. In this sense, it is similar to the catalog, but only for datasources. It originally followed the AbstractSchema pattern: it implements provide a map of tables. This PR provides new optional methods for the table lookup and table names operations. The default implementations work the same way that AbstractSchema works: we get the entire map and pick out the information we need. Extensions that use this API should be revised to support the individual operations instead. Druid code no longer calls the original getTables() method.

The PR has one breaking change: since the DruidSchemaManager map is read-only to the rest of Druid, we should return a Map, not a ConcurrentMap.
2022-08-10 10:24:04 +05:30
Paul Rogers a618458bf0
Tidy up construction of the Guice Injectors (#12816)
* Refactor Guice initialization

Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder

* Revisions from review comments

* Build fixes

* Revisions from review comments
2022-08-04 00:05:07 -07:00
Gian Merlino 2b330186e2
Mid-level service client and updated high-level clients. (#12696)
* Mid-level service client and updated high-level clients.

Our servers talk to each other over HTTP. We have a low-level HTTP
client (HttpClient) that is super-asynchronous and super-customizable
through its handlers. It's also proven to be quite robust: we use it
for Broker -> Historical communication over the wide variety of query
types and workloads we support.

But the low-level client has no facilities for service location or
retries, which means we have a variety of high-level clients that
implement these in their own ways. Some high-level clients do a better
job than others. This patch adds a mid-level ServiceClient that makes
it easier for high-level clients to be built correctly and harmoniously,
and migrates some of the high-level logic to use ServiceClients.

Main changes:

1) Add ServiceClient org.apache.druid.rpc package. That package also
   contains supporting stuff like ServiceLocator and RetryPolicy
   interfaces, and a DiscoveryServiceLocator based on
   DruidNodeDiscoveryProvider.

2) Add high-level OverlordClient in org.apache.druid.rpc.indexing.

3) Indexing task client creator in TaskServiceClients. It uses
   SpecificTaskServiceLocator to find the tasks. This improves on
   ClientInfoTaskProvider by caching task locations for up to 30 seconds
   across calls, reducing load on the Overlord.

4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient
   instead of extending IndexTaskClient.

5) Rework RemoteTaskActionClient to use a ServiceClient instead of
   DruidLeaderClient.

6) Rework LocalIntermediaryDataManager, TaskMonitor, and
   ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and
   Overlord no longer need IndexingServiceClient (which internally used
   DruidLeaderClient).

There are some concrete benefits over the prior logic, namely:

- DruidLeaderClient does retries in its "go" method, but only retries
  exactly 5 times, does not sleep between retries, and does not retry
  retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.)
  ServiceClient handles retries in a more reasonable way.

- DruidLeaderClient's methods are all synchronous, whereas ServiceClient
  methods are asynchronous. This is used in one place so far: the
  SpecificTaskServiceLocator, so we don't need to block a thread trying
  to locate a task. It can be used in other places in the future.

- HttpIndexingServiceClient does not properly handle all server errors.
  In some cases, it tries to parse a server error as a successful
  response (for example: in getTaskStatus).

- IndexTaskClient currently makes an Overlord call on every task-to-task
  HTTP request, as a way to find where the target task is. ServiceClient,
  through SpecificTaskServiceLocator, caches these target locations
  for a period of time.

* Style adjustments.

* For the coverage.

* Adjustments.

* Better behaviors.

* Fixes.
2022-07-05 09:43:26 -07:00
Gian Merlino d29343cbe3
Disable autokill of segments by default. (#12693)
Also add clarifying commentary to the documentation about how durationToRetain works.
2022-06-23 17:17:11 -07:00
Paul Rogers ffcb996468
Cleanup changes pulled out of PR #12368 (#12672)
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
2022-06-23 23:19:50 +05:30
Gian Merlino 70f3b13621
ForkingTaskRunner: Set ActiveProcessorCount for tasks. (#12592)
* ForkingTaskRunner: Set ActiveProcessorCount for tasks.

This prevents various automatically-sized thread pools from being unreasonably
large (we don't want each task to size its pools as if it is the only thing on
the entire machine).

* Fix tests.

* Add missing LifecycleStart annotation.

* ForkingTaskRunner needs ManageLifecycle.
2022-06-15 15:56:32 -07:00
Rohan Garg 28f2c8e112
Support LoadScope for Peons + Access Modifier Updates (#12640)
* Support LoadScope for Peons

* Update access modifiers for GroupByEngineV2
2022-06-14 21:52:50 -07:00
Gian Merlino 1f6e888472
Add QoSFilters first in the chain. (#12625)
* Add QoSFilters first in the chain.

When a request is suspended and later resumed due to QoS constraints,
its filter chain is restarted. Placing QoSFilters first in the chain
avoids double-execution of other filters.

Fixes an issue where requests deferred by QoS would report 403 Forbidden
due to double-execution of SecuritySanityCheckFilter.

* Smaller changes.

* Add QoS filters in BaseJettyTest.

* Remove unused parameter.
2022-06-14 13:37:00 -07:00
Gian Merlino a503683a4a
Add caching and CSP response headers. (#12609)
* Add caching and CSP response headers.

* Fix tests.

* Fix checkstyle issues

Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com>
2022-06-04 21:46:49 +05:30
Clint Wylie 9e5a940cf1
remake column indexes and query processing of filters (#12388)
Following up on #12315, which pushed most of the logic of building ImmutableBitmap into BitmapIndex in order to hide the details of how column indexes are implemented from the Filter implementations, this PR totally refashions how Filter consume indexes. The end result, while a rather dramatic reshuffling of the existing code, should be extraordinarily flexible, eventually allowing us to model any type of index we can imagine, and providing the machinery to build the filters that use them, while also allowing for other column implementations to implement the built-in index types to provide adapters to make use indexing in the current set filters that Druid provides.
2022-05-11 11:57:08 +05:30
Abhishek Agarwal 2fe053c5cb
Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
zachjsh 564d6defd4
Worker level task metrics (#12446)
* * fix metric name inconsistency

* * add task slot metrics for middle managers

* * add new WorkerTaskCountStatsMonitor to report task count metrics
  from worker

* * more stuff

* * remove unused variable

* * more stuff

* * add javadocs

* * fix checkstyle

* * fix hadoop test failure

* * cleanup

* * add more code coverage in tests

* * fix test failure

* * add docs

* * increase code coverage

* * fix spelling

* * fix failing tests

* * remove dead code

* * fix spelling
2022-04-26 11:44:44 -05:00
Xavier Léauté 1434197ee1
update airline dependency to 2.x (#12270)
* upgrade Airline to Airline 2
  https://github.com/airlift/airline is no longer maintained, updating to
  https://github.com/rvesse/airline (Airline 2) to use an actively
  maintained version, while minimizing breaking changes.

  Note, this is a backwards incompatible change, and extensions relying on
  the CliCommandCreator extension point will also need to be updated.

* fix dependency checks where jakarta.inject is now resolved first instead
  of javax.inject, due to Airline 2 using jakarta
2022-02-27 15:19:28 -08:00
Suneet Saldanha ced1389d4c
Enable auto kill segments by default (#12187)
* Enable auto-kill by default

* tests

* wip

* test

* fix IT

* fix it

* remove from docs

* make coverage bot happy
2022-02-07 06:57:54 -08:00
Maytas Monsereenusorn fac6a48a8f
add impl (#12201) 2022-01-27 11:39:59 -08:00
Suneet Saldanha 2b32d86f3b
Enable automatic metdata cleanup by default (#12188) 2022-01-24 20:04:17 -08:00
somu-imply c267b65f97
Removing unused processing threadpool on broker (#12070)
* Thread pool for broker

* Updating two tests to improve coverage for new method added

* Updating druidProcessingConfigTest to cover coverage

* Adding missed spelling errors caused in doc

* Adding test to cover lines of new function added
2021-12-21 13:07:53 -08:00
Clint Wylie e53c3e80ca
set log4j2.is.webapp to false if not set so that shutdown hooks are run (#12056)
* set log4j2.is.webapp to false if not set so that shutdown hooks are run
2021-12-10 21:55:35 -08:00
Jihoon Son fc9513b6cd
Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012)
* Make nodeRole available during binding; add support for dynamic registration of DruidService

* fix checkstyle and test

* fix customRole test

* address comments

* add more javadoc
2021-12-03 11:59:00 -08:00
Gian Merlino 6f6e88e02e
SQL: Add type headers to response formats. (#11914)
This allows clients to interpret the results of SQL queries without having to guess types.
2021-11-13 11:30:57 +05:30
Gian Merlino babf00f8e3
Migrate File.mkdirs to FileUtils.mkdirp. (#11879)
* Migrate File.mkdirs to FileUtils.mkdirp.

* Remove unused imports.

* Fix LookupReferencesManager.

* Simplify.

* Also migrate usages of forceMkdir.

* Fix var name.

* Fix incorrect call.

* Update test.
2021-11-09 11:10:49 -08:00
Karan Kumar 90640bb316
Support for hadoop 3 via maven profiles (#11794)
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
2021-10-30 22:46:24 +05:30
Lucas Capistrant 1930ad1f47
Implement configurable internally generated query context (#11429)
* Add the ability to add a context to internally generated druid broker queries

* fix docs

* changes after first CI failure

* cleanup after merge with master

* change default to empty map and improve unit tests

* add doc info and fix checkstyle

* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
2021-10-06 09:02:41 -07:00
Clint Wylie 11017ef00a
support jdbc even if trailing / is missing (#11737)
* support jdbc even if trailing / is missing

* fix tests
2021-09-29 13:59:26 -07:00
Maytas Monsereenusorn a04b08e45c
Add new config to filter internal Druid-related messages from Query API response (#11711)
* add impl

* add impl

* add tests

* add unit test

* fix checkstyle

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* address comments

* address comments

* address comments

* fix test

* fix test

* fix test

* fix test

* fix test

* change config name

* change config name

* change config name

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* fix compile

* fix compile

* change config

* add more tests

* fix IT
2021-09-29 12:55:49 +07:00
Kashif Faraz c641657bae
Fix router documentation for `druid.router.sql.enable` (#11716)
* Rename field, fix router documentation

* Add more lines to doc

* Apply doc suggestions from code review

Co-authored-by: Charles Smith <techdocsmith@gmail.com>

Co-authored-by: Charles Smith <techdocsmith@gmail.com>
2021-09-28 22:54:13 +05:30
Clint Wylie 3044372fc1
improved JDBC logging (#11676)
* improve jdbc and router query debug logging

* log errors too

* no stacktrace

* trace those stacks
2021-09-16 01:28:16 -07:00
Rohan Garg 3c46577eec
Fix moving average extension loading in middle manager and overlord (#11662) 2021-09-08 22:09:22 -07:00
Clint Wylie fe1d8c206a
bump version to 0.23.0-SNAPSHOT (#11670) 2021-09-08 15:56:04 -07:00
Jihoon Son 82049bbf0a
Cancel API for sqls (#11643)
* initial work

* reduce lock in sqlLifecycle

* Integration test for sql canceling

* javadoc, cleanup, more tests

* log level to debug

* fix test

* checkstyle

* fix flaky test; address comments

* rowTransformer

* cancelled state

* use lock

* explode instead of noop

* oops

* unused import

* less aggressive with state

* fix calcite charset

* don't emit metrics when you are not authorized
2021-09-05 10:57:45 -07:00
Maytas Monsereenusorn fc86a7a97f
fix custom coordinator duty (#11641) 2021-08-31 14:04:00 +07:00
Maytas Monsereenusorn ce4dd48bb8
Support custom coordinator duties (#11601)
* impl

* fix checkstyle

* fix checkstyle

* fix checkstyle

* add test

* add test

* add test

* add integration tests

* add integration tests

* add more docs

* address comments

* address comments

* address comments

* add test

* fix checkstyle

* fix test
2021-08-19 11:54:11 +07:00
Parag Jain c7b46671b3
option to use deep storage for storing shuffle data (#11507)
Fixes #11297.
Description

Description and design in the proposal #11297
Key changed/added classes in this PR

    *DataSegmentPusher
    *ShuffleClient
    *PartitionStat
    *PartitionLocation
    *IntermediaryDataManager
2021-08-13 16:40:25 -04:00
Kashif Faraz aaf0aaad8f
Enable routing of SQL queries at Router (#11566)
This PR adds a new property druid.router.sql.enable which allows the
Router to handle SQL queries when set to true.

This change does not affect Avatica JDBC requests and they are still routed
by hashing the Connection ID.

To allow parsing of the request object as a SqlQuery (contained in module druid-sql),
some classes have been moved from druid-server to druid-services with
the same package name.
2021-08-13 18:44:39 +05:30
Maytas Monsereenusorn c068906fca
Make intermediate store for shuffle tasks an extension point (#11492)
* add interface

* add docs

* fix errors

* fix injection

* fix injection

* update javadoc
2021-07-27 11:29:43 +07:00
Maytas Monsereenusorn f5d53569ca
Supervisor metadata auto cleanup failing as missing Guice injection (#11424)
* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* Fix Supervisor metadata auto cleanup failing as missing Guice injection

* fix IT

* fix IT

* Update services/src/main/java/org/apache/druid/cli/CliCoordinator.java

Co-authored-by: Clint Wylie <cjwylie@gmail.com>

* fix

* fix

* fix

* fix

* fix

* fix

* fix

Co-authored-by: Clint Wylie <cjwylie@gmail.com>
2021-07-13 09:47:49 +07:00
Abhishek Agarwal 03a6a6d6e1
Replace Processing ExecutorService with QueryProcessingPool (#11382)
This PR refactors the code for QueryRunnerFactory#mergeRunners to accept a new interface called QueryProcessingPool instead of ExecutorService for concurrent execution of query runners. This interface will let custom extensions inject their own implementation for deciding which query-runner to prioritize first. The default implementation is the same as today that takes the priority of query into account. QueryProcessingPool can also be used as a regular executor service. It has a dedicated method for accepting query execution work so implementations can differentiate between regular async tasks and query execution tasks. This dedicated method also passes the QueryRunner object as part of the task information. This hook will let custom extensions carry any state from QuerySegmentWalker to QueryProcessingPool#mergeRunners which is not possible currently.
2021-07-01 16:03:08 +05:30
Yi Yuan de8daf8139
Delete buildV9Directly in Kafka and Kinesis Indexing Service (#11351)
* delete_buildV9Directly_in_kafka_and_kinesis_indexing_service

* delete

* delete them from server

* delete buildV9Directly from hadoop indexing

* bug fixed

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-23 16:36:46 -07:00
Yi Yuan 8de0d36c52
Allow query through router when load moving average extension (#11276)
* init commit

* change NoopQuerySegmentWalker name

* change doc

* move NoopQuerySegmentWalker and add document

* fix doc

Co-authored-by: yuanyi <yuanyi@freewheel.tv>
2021-06-10 18:46:53 +08:00
Maytas Monsereenusorn 3455352241
Add feature to automatically remove compaction configurations for inactive datasources (#11232)
* add auto cleanup

* add auto cleanup

* add auto cleanup

* add tests

* add tests

* use retryutils

* use retryutils

* use retryutils

* address comments
2021-05-11 18:49:18 -07:00
Maytas Monsereenusorn 4326e699bd
Add feature to automatically remove datasource metadata based on retention period (#11227)
* add auto clean up datasource metadata

* add test

* fix checkstyle

* add comments

* fix error

* address comments

* Address comments

* fix test

* fix test

* fix typo

* add comment

* fix test

* fix test
2021-05-11 01:22:33 -07:00
Jihoon Son 2df42143ae
Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189)
* Fix idempotence of segment allocation and task report apis in native
batch ingestion

* better error and javadoc

* checkstyle and dependency

* fix tests and add more tests

* task config instead of context; add doc

* unused import and dependency

* typo in doc

* fix unintended changes

* fix wrong import

* remove unnecessary error handling

* add task context back

* default task context

* fix test and doc

* address comments

* unused imports
2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn d73f72e508
Add feature to automatically remove supervisor based on retention period (#11200)
* add auto clean up

* add test

* add test

* fix test

* Address comments

* Address comments
2021-05-06 22:25:23 -07:00
zachjsh 99f39c7202
Hadoop segment index file rename (#11194)
* Do stuff

* Do more stuff

* * Do more stuff

* * Do more stuff

* * working

* * cleanup

* * more cleanup

* * more cleanup

* * add license header

* * Add unit tests

* * add java docs

* * add more unit tests

* * Cleanup test

* * Move removing of workingPath to index task rather than in hadoop job.

* * Address review comments

* * remove unused import

* * Address review comments

* Do not overwrite segment descriptor for segment if it already exists.

* * add comments to FileSystemHelper class

* * fix local hadoop integration test

* * Fix failing test failures when running with java11

* Revert "Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)"

This reverts commit 49a9c3ffb7.

* * remove JobHelperPowerMockTest

* * remove FileSystemHelper class
2021-05-04 20:22:18 -04:00
Maytas Monsereenusorn 84aac4832d
Add feature to automatically remove rules based on retention period (#11164)
* Add feature to automatically remove rules based on retention period

* Add feature to automatically remove rules based on retention period

* address comments
2021-05-03 11:50:45 -07:00
Jonathan Wei 49a9c3ffb7
Revert "Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)" (#11151)
This reverts commit a2892d9c40.
2021-04-22 15:33:27 -07:00
zachjsh a2892d9c40
Adjust HadoopIndexTask temp segment renaming to avoid potential race conditions (#11075)
* Do stuff

* Do more stuff

* * Do more stuff

* * Do more stuff

* * working

* * cleanup

* * more cleanup

* * more cleanup

* * add license header

* * Add unit tests

* * add java docs

* * add more unit tests

* * Cleanup test

* * Move removing of workingPath to index task rather than in hadoop job.

* * Address review comments

* * remove unused import

* * Address review comments

* Do not overwrite segment descriptor for segment if it already exists.

* * add comments to FileSystemHelper class

* * fix local hadoop integration test
2021-04-21 12:24:31 -07:00
Maytas Monsereenusorn 6d2b5cdd7e
Add feature to automatically remove audit logs based on retention period (#11084)
* add docs

* add impl

* fix checkstyle

* fix test

* add test

* fix checkstyle

* fix checkstyle

* fix test

* Address comments

* Address comments

* fix spelling

* fix docs
2021-04-20 17:10:43 -07:00
Jihoon Son a6a2758095
More unit tests for JsonParserIterator; Integration tests for query errors (#11091)
* unit tests for timeout exception in init

* integration tests

* run integraion test on travis

* fix inspection
2021-04-12 15:08:50 -07:00
Lucas Capistrant 8264203cee
Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676)
* Add ability to wait for segment availability for batch jobs

* IT updates

* fix queries in legacy hadoop IT

* Fix broken indexing integration tests

* address an lgtm flag

* spell checker still flagging for hadoop doc. adding under that file header too

* fix compaction IT

* Updates to wait for availability method

* improve unit testing for patch

* fix bad indentation

* refactor waitForSegmentAvailability

* Fixes based off of review comments

* cleanup to get compile after merging with master

* fix failing test after previous logic update

* add back code that must have gotten deleted during conflict resolution

* update some logging code

* fixes to get compilation working after merge with master

* reset interrupt flag in catch block after code review pointed it out

* small changes following self-review

* fixup some issues brought on by merge with master

* small changes after review

* cleanup a little bit after merge with master

* Fix potential resource leak in AbstractBatchIndexTask

* syntax fix

* Add a Compcation TuningConfig type

* add docs stipulating the lack of support by Compaction tasks for the new config

* Fixup compilation errors after merge with master

* Remove erreneous newline
2021-04-08 21:03:00 -07:00
Lasse Krogh Mammen 782a1d4e6c
Add Calcite Avatica protobuf handler (#10543) 2021-03-31 12:46:25 -07:00
Abhishek Agarwal 96d26e5338
Fix kinesis ingestion bugs (#10761)
* add offsetFetchPeriod to kinesis ingestion doc

* Remove jackson dependencies from extensions

* Use fixed delay for lag collection

* Metrics reset after finishing processing

* comments

* Broaden the list of exceptions to retry for

* Unit tests

* Add more tests

* Refactoring

* re-order metrics

* Doc suggestions

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>

* Add tests

Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>
2021-02-05 02:49:58 -08:00
Maytas Monsereenusorn a46d561bd7
Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740)
* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix checkstyle

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* fix test

* fix test

* add log

* Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

* address comments

* fix checkstyle

* fix checkstyle

* add config to skip overhead memory calculation

* add test for the skipBytesInMemoryOverheadCheck config

* add docs

* fix checkstyle

* fix checkstyle

* fix spelling

* address comments

* fix travis

* address comments
2021-01-27 00:34:56 -08:00
Jihoon Son 95065bdf1a
Bump dev version to 0.22.0-SNAPSHOT (#10759) 2021-01-15 13:16:23 -08:00
Clint Wylie 74fbdd322d
refactor NodeRole so extensions can participate in disco and announcement (#10700)
* refactor NodeRole so extensions can participate in disco and announcement

* fixes, maybe

* retries

* javadoc

* fix

* spelling
2020-12-24 15:29:32 -08:00
Himanshu ac1882bf74
kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544)
* honor zk enablement config in more places in druid code

* kubernetes based discovery module

* fix spotbugs check

* fix intellij checks error

* fix doc link to kubernetes.md from extension

* make spellchecker happy

* update license.yaml

* fix dependency check errors

* update extension coverage

* UTs for BaseNodeRoleWatcher

* fix forbidden-api check

* update k8s module coverage ignores

* add Bouncy Castle License being same as MIT License for license checking purposes

* further update licenses.yaml

* label/annotation pre-existence assumption

* address review comment
2020-12-14 21:10:31 -08:00
Harini Rajendran c2e26d2e1c
Add status/selfDiscovered endpoint to indexer for self discovery of indexer (#10679)
Added the status/selfDiscovered endpoint to indexer. Per the api-reference doc, all services support status/selfDiscovered endpoint. So this change would fix that expected behavior.

Also added example config files for indexer process that can be used to spin up the indexer process.
2020-12-14 19:04:14 -08:00
Himanshu ee136303bb
optionally disable all of hardcoded zookeeper use (#9507)
* optionally disable all of hardcoded zookeeper use

* fix DruidCoordinatorTest compilation

* fix test in DruidCoordinatorTest

* fix strict compilation

Co-authored-by: Himanshu Gupta <fill email>
2020-10-26 22:35:59 -07:00
Liran Funaro f3a2903218
Configurable Index Type (#10335)
* Introduce a Configurable Index Type

* Change to @UnstableApi

* Add AppendableIndexSpecTest

* Update doc

* Add spelling exception

* Add tests coverage

* Revert some of the changes to reduce diff

* Minor fixes

* Update getMaxBytesInMemoryOrDefault() comment

* Fix typo, remove redundant interface

* Remove off-heap spec (postponed to a later PR)

* Add javadocs to AppendableIndexSpec

* Describe testCreateTask()

* Add tests for AppendableIndexSpec within TuningConfig

* Modify hashCode() to conform with equals()

* Add comment where building incremental-index

* Add "EqualsVerifier" tests

* Revert some of the API back to AppenderatorConfig

* Don't use multi-line comments

* Remove knob documentation (deferred)
2020-10-23 18:34:26 -07:00
Jihoon Son ad437dd655
Add shuffle metrics for parallel indexing (#10359)
* Add shuffle metrics for parallel indexing

* javadoc and concurrency test

* concurrency

* fix javadoc

* Feature flag

* doc

* fix doc and add a test

* checkstyle

* add tests

* fix build and address comments
2020-10-10 19:35:17 -07:00
Jonathan Wei 65c0d64676
Update version to 0.21.0-SNAPSHOT (#10450)
* [maven-release-plugin] prepare release druid-0.21.0

* [maven-release-plugin] prepare for next development iteration

* Update web-console versions
2020-10-03 16:08:34 -07:00
Mainak Ghosh 8168e14e92
Adding task slot count metrics to Druid Overlord (#10379)
* Adding more worker metrics to Druid Overlord

* Changing the nomenclature from worker to peon as that represents the metrics that we want to monitor better

* Few more instance of worker usage replaced with peon

* Modifying the peon idle count logic to only use eligible workers available capacity

* Changing the naming to task slot count instead of peon

* Adding some unit test coverage for the new test runner apis

* Addressing Review Comments

* Modifying the TaskSlotCountStatsProvider apis so that overlords which are not leader do not emit these metrics

* Fixing the spelling issue in the docs

* Setting the annotation Nullable on the TaskSlotCountStatsProvider methods
2020-09-28 23:50:38 -07:00
Maytas Monsereenusorn 72f1b55f56
Add last_compaction_state to sys.segments table (#10413)
* Add is_compacted to sys.segments table

* change is_compacted to last_compaction_state

* fix tests

* fix tests

* address comments
2020-09-23 15:29:36 -07:00
Jihoon Son 8f14ac814e
More structured way to handle parse exceptions (#10336)
* More structured way to handle parse exceptions

* checkstyle; add more tests

* forbidden api; test

* address comment; new test

* address review comments

* javadoc for parseException; remove redundant parseException in streaming ingestion

* fix tests

* unnecessary catch

* unused imports

* appenderator test

* unused import
2020-09-11 16:31:10 -07:00
Gian Merlino 8ab1979304
Remove implied profanity from error messages. (#10270)
i.e. WTF, WTH.
2020-08-28 11:38:50 -07:00