The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API)
Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid )
The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
Often users are submitting queries, and ingestion specs that work only if the relevant extension is not loaded. However, the error is too technical for the users and doesn't suggest them to check for missing extensions. This PR modifies the error message so users can at least check their settings before assuming that the error is because of a bug.
* Ensure ByteBuffers allocated in tests get freed.
Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.
To fix this:
1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
This makes it easy to free the direct buffer. Currently, it's only used
in tests, because production code seems OK.
2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
reason for the buffer to be off-heap). Make sure to close all direct
holders when done.
* Changes based on CI results.
* A different approach.
* Roll back BitmapOperationTest stuff.
* Try additional surefire memory.
* Revert "Roll back BitmapOperationTest stuff."
This reverts commit 49f846d9e3.
* Add TestBufferPool.
* Revert Xmx change in tests.
* Better behaved NestedQueryPushDownTest. Exit tests on OOME.
* Fix TestBufferPool.
* Remove T1C from ARM tests.
* Somewhat safer.
* Fix tests.
* Fix style stuff.
* Additional debugging.
* Reset null / expr configs better.
* ExpressionLambdaAggregatorFactory thread-safety.
* Alter forkNode to try to get better info when a JVM crashes.
* Fix buffer retention in ExpressionLambdaAggregatorFactory.
* Remove unused import.
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.
Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, any context params are allowed to users. This can cause
1) a bad UX if the context param is not matured yet or
2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows.
This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.
{
"resourceAction" : {
"resource" : {
"name" : "maxSubqueryRows",
"type" : "QUERY_CONTEXT"
},
"action" : "WRITE"
},
"resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.
When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,
HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.
The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
The latest version of Error Prone now requires Java 11. Upgrading means we can
remove a lot of the maven profile complexity required to run checks with Java 8.
This also requires switching our strict build to use Java 11.
* update error-prone to 2.11
* remove need for specific maven profiles for Java 8 and Java 15
* fix additional Error Prone warnings with Java 11
* update strict build to use Java 11
* remove use of mocks for ServiceMetricEvent
* simplify KafkaEmitterTests by moving to Mockito
* speed up KafkaEmitterTest by adjusting reporting frequency in tests
* remove unnecessary easymock and JUnitParams dependencies
* rework sql planner expression and virtual column handling
* simplify a bit
* add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests
* spotbugs
* fix bugs with multi-value string array expression handling
* javadocs and adjust test
* better
* fix tests
* working
* Lazily load segmentKillers, segmentMovers, and segmentArchivers
* more tests
* test-jar plugin
* more coverage
* lazy client
* clean up changes
* checkstyle
* i did not change the branch condition
* adjust failure rate to run tests faster
* javadocs
* checkstyle
* Refactor ResponseContext
Fixes a number of issues in preparation for request trailers
and the query profile.
* Converts keys from an enum to classes for smaller code
* Wraps stored values in functions for easier capture for other uses
* Reworks the "header squeezer" to handle types other than arrays.
* Uses metadata for visibility, and ability to compress,
to replace ad-hoc code.
* Cleans up JSON serialization for the response context.
* Other miscellaneous cleanup.
* Handle unknown keys in deserialization
Also, make "Visibility" into a boolean.
* Revised comment
* Renamd variable
Druid currently has 2 serverViews, regular serverView and filtered serverView. The regular serverView is used to monitor all segment announcements from all data nodes (historicals, tasks, indexers). The filtered serverView is used when you want to watch segment announcements from particular tiers. Since these server views keep track of different sets of druidServers and segments in memory, they should be maintained separately. However, they currently share the same name for their executorService, which can cause confusion and make debugging harder especially in the broker since it is using both serverViews, the filtered view for normal query processing and the regular view to serve the servers table (I'm unsure whether this is intended or whether this is a good behavior). This PR changes it to a more obvious name.
This PR also removes SingleServerInventoryView. This view was deprecated a long time ago and has not been documented at least since 0.13 (#6127). I also don't think this can be better in any case than BatchServerInventoryView. Finally, I merged AbstractCuratorServerInventoryView and BatchServerInventoryView as we no longer need AbstractCuratorServerInventoryView after SingleServerInventoryView is removed.
* Make nodeRole available during binding; add support for dynamic registration of DruidService
* fix checkstyle and test
* fix customRole test
* address comments
* add more javadoc
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.
This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.
In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.
So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!
* Fix test compiling.
* Fixes and improvements.
* Fix forbidden API.
* Additional fixes.
* add back and deprecate aggregator factory methods so i can say i told you so when i delete these later
* rename to make less ambiguous, fix fill method
* adjust
* Add worker category as dimension in TaskSlotCountStatsMonitor
* Change description
* Add workerConfig as field
* Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics
* Fixing tests
* Fixing alerts
* Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs
* Resolving false positive spell check
* addressing comments
* throw UnsupportedOperationException for tasklotmetrics APIs in SingleTaskBackgroundRunner
Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>
* add missing json type for ListFilteredVirtualColumn, and tests to try to avoid this happening again
* fixes
* ugly, but maybe this
* oops
* too many mappers
Add support for hadoop 3 profiles . Most of the details are captured in #11791 .
We use a combination of maven profiles and resource filtering to achieve this. Hadoop2 is supported by default and a new maven profile with the name hadoop3 is created. This will allow the user to choose the profile which is best suited for the use case.
* add ColumnInspector argument to PostAggregator.getType to allow post-aggs to compute their output type based on input types
* add test for test for coverage
* simplify
* Remove unused imports.
Co-authored-by: Gian Merlino <gian@imply.io>
* better type system
* needle in a haystack
* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support
* fixup merge
* more test
* fixup
* intern
* fix
* oops
* oops again
* ...
* more test coverage
* fix error message
* adjust interning, more javadocs
* oops
* more docs more better
* Redis mget problem in cluster mode
* Format code
* push down implementation of getBulk to sub-classes
* Add tests
* revert some changes
* Fix intelllij inspections
* Fix comments
Signed-off-by: frank chen <frank.chen021@outlook.com>
* Update extensions-contrib/redis-cache/src/main/java/org/apache/druid/client/cache/RedisClusterCache.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update extensions-contrib/redis-cache/src/test/java/org/apache/druid/client/cache/RedisClusterCacheTest.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* Update extensions-contrib/redis-cache/src/main/java/org/apache/druid/client/cache/AbstractRedisCache.java
Co-authored-by: Benedict Jin <asdf2014@apache.org>
* returns empty map in case of internal exception
Co-authored-by: Benedict Jin <asdf2014@apache.org>
Fixes#11297.
Description
Description and design in the proposal #11297
Key changed/added classes in this PR
*DataSegmentPusher
*ShuffleClient
*PartitionStat
*PartitionLocation
*IntermediaryDataManager
* Add error msg to parallel task's TaskStatus
* Consolidate failure block
* Add failure test
* Make it fail
* Add fail while stopped
* Simplify hash task test using a runner that fails after so many runs (parameter)
* Remove unthrown exception
* Use runner names to identify phase
* Added range partition kill test & fixed a timing bug with the custom runner
* Forbidden api
* Style
* Unit test code cleanup
* Added message to invalid state exception and improved readability of the phase error messages for the parallel task failure unit tests
* Add a new metric query/segments/count that is not emitted by default
* docs
* test the default implementation of the metric
* fix spelling error in docs
* document the fact that query retries will result in additional metric emissions
* update using recommended text from @jihoonson
Switching to the bom dependency declaration simplifies managing jackson
dependencies. It also removes the need to override individual library
versions for CVE fixes, since the bom takes care of that internally.
This change aligns our jackson dependency versions on 2.10.5(.x):
- updates jackson libraries from 2.10.2 to 2.10.5
- jackson-databind remains at 2.10.5.1 as defined in the bom
Release notes: https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.10
* fix count and average SQL aggregators on constant virtual columns
* style
* even better, why are we tracking virtual columns in aggregations at all if we have a virtual column registry
* oops missed a few
* remove unused
* this will fix it
* SQL timeseries no longer skip empty buckets with all granularity
* add comment, fix tests
* the ol switcheroo
* revert unintended change
* docs and more tests
* style
* make checkstyle happy
* docs fixes and more tests
* add docs, tests for array_agg
* fixes
* oops
* doc stuffs
* fix compile, match doc style