Commit Graph

3816 Commits

Author SHA1 Message Date
Jihoon Son cc2ffc6c0f
Fix node discovery to ignore unknown DruidServices (#12157)
* Fix node discovery to ignore unknown DruidServices

* ignore all runtime exceptions

* fix test

* add custom deserializer

* custom serializer

* log host for unparseable druidService
2022-01-18 22:08:59 -08:00
Maytas Monsereenusorn bd7fe45da0
Support adding metrics in Auto Compaction (#12125)
* add impl

* add impl

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add integration tests

* add integration tests

* fix LGTM

* fix test

* remove doc
2022-01-17 20:19:31 -08:00
Marcelo R Costa c28b2834a1
Add http response status code to org.eclipse.jetty.server.RequestLog (#12116)
* Add http response status code to org.eclipse.jetty.server.RequestLog

* http response code is expressed as an int. Set log msg interpolation based on digit

* trying to add a unit test to verify if the logger.debug method is called

* trying to add a unit test to verify if the logger.debug method is called

* fix compilation issues

* remove test
2022-01-06 20:10:01 +08:00
Maytas Monsereenusorn b53e7f4d12
Support overlapping segment intervals in auto compaction (#12062)
* add impl

* add impl

* fix more bugs

* add tests

* fix checkstyle

* address comments

* address comments

* fix test
2022-01-04 11:47:38 -08:00
somu-imply c267b65f97
Removing unused processing threadpool on broker (#12070)
* Thread pool for broker

* Updating two tests to improve coverage for new method added

* Updating druidProcessingConfigTest to cover coverage

* Adding missed spelling errors caused in doc

* Adding test to cover lines of new function added
2021-12-21 13:07:53 -08:00
lokesh-lingarajan 60a3a802b6
Modifying index from druid_segments(datasource, used, end) to druid_segments(datasource, used, end, start) to support kill task (#11894)
This index speeds up the kill task's query that lists unused segments by interval. That query can become a bottleneck under some production loads, causing the coordinator to wait longer for metadata DB replies and impacting Kafka ingestion. The modified index has helped reduce the query times for such queries.
2021-12-16 10:28:20 -08:00
Jonathan Wei 229f82a6f0
Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961)
* Add parse error list API for stream supervisors, simplify parse exception message

* Add input string to parse exception

* Use structured ParseExceptionReport

* Fix tests

* Add test

* PR comments, add ParseExceptionReport equals verifier

* Fix test
2021-12-09 15:42:55 -06:00
Lucas Capistrant 150902b95c
clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960)
* clean up the balancing code around the batched vs deprecated way of sampling segments to balance

* fix docs, clarify comments, add deprecated annotations to legacy code

* remove unused variable

* update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated

* fix dynamic config text for percentOfSegmentsToConsiderPerMove

* run prettier to cleanup coordinator-dynamic-config.tsx changes

* update jest snapshot

* update documentation per review feedback
2021-12-07 14:47:46 -08:00
Clint Wylie a8815f671e
Fix druid client timeout zero (#12023)
* fix bug where queries fail immediately when timeout is 0 instead of using default timeout

* fix to use serverside max

* more better

* less flaky test

* oops
2021-12-07 12:41:01 -08:00
zachjsh 65cadbe42a
Fix bad lookup config fails task (#12021)
This PR fixes an issue in which a lookup that is configured incorrectly, so that it does not serialize properly when pulled by a peon node, causes the task to fail. The failure occurs because the peon and other leaf nodes (broker, historical) have retry logic that keeps retrying the lookup loading for 3 minutes by default. The HTTP listener thread on the peon task is not started until lookup loading completes. By default, the overlord waits 1 minute to communicate with the peon task to get the task status, after which it orders the task to shut down, causing the ingestion task to fail.

To fix the issue, we catch the serialization exception and do not retry. Also fixed an issue in which a bad lookup config prevents other good lookup configs from being loaded.
2021-12-07 00:55:34 -05:00
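The retry behavior described in #12021 above (keep retrying transient failures, but give up at once on an error that can never succeed) can be sketched as follows. This is a minimal illustration, not Druid's lookup-loading code: the class `RetrySketch`, the `retry` method, and the `shouldRetry` predicate are hypothetical names, and backoff between attempts is omitted for brevity.

```java
import java.util.concurrent.Callable;
import java.util.function.Predicate;

// Hypothetical helper, not part of Druid: retries a task only while the
// failure is classified as retryable.
final class RetrySketch {
    static <T> T retry(Callable<T> task, Predicate<Throwable> shouldRetry, int maxTries)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return task.call();
            } catch (Exception e) {
                // Stop when attempts are exhausted or the error is permanent,
                // such as a config that can never be deserialized.
                if (attempt >= maxTries || !shouldRetry.test(e)) {
                    throw e;
                }
            }
        }
    }
}
```

A caller would pass a predicate that returns false for the serialization failure, so a bad lookup config fails on the first attempt instead of blocking startup for the whole retry window.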
Abhishek Agarwal 834aae096a
Human-readable and actionable SQL error messages (#11911)
This PR does two things

1. It adds the capability to surface missing SQL features to users. The Calcite planner explores multiple rules to convert a logical SQL query to a Druid native query. Some rules change the shape of the query or optimize it, and some rules are responsible for translating the query into a Druid native query. The latter include DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule, etc. These rules look at the SQL and do the necessary transformation. But if a rule can't transform the query, it returns control to the Calcite planner without recording why it was not able to transform it. For example, given a join query with a non-equal join condition, DruidJoinRule will look at the condition, see that it is not supported, and return control. The rationale is that a query can be planned in many different ways, so if one rule can't handle it, the query may still be plannable by other rules. In this PR, we intercept these gaps and pass them back to the user if the query could not be planned at all.

2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.
2021-12-07 09:44:08 +05:30
Paul Rogers 34a3d45737
Refactor ResponseContext (#11828)
* Refactor ResponseContext

Fixes a number of issues in preparation for request trailers
and the query profile.

* Converts keys from an enum to classes for smaller code
* Wraps stored values in functions for easier capture for other uses
* Reworks the "header squeezer" to handle types other than arrays.
* Uses metadata for visibility, and ability to compress,
  to replace ad-hoc code.
* Cleans up JSON serialization for the response context.
* Other miscellaneous cleanup.

* Handle unknown keys in deserialization

Also, make "Visibility" into a boolean.

* Revised comment

* Renamed variable
2021-12-06 17:03:12 -08:00
Karan Kumar 2539b7a748
Adding ToString() to ExceptionEvent (#12027)
For readable output for exception events, while generating the report in SeekableStreamSupervisor
2021-12-06 13:37:16 +05:30
Jihoon Son 1f052b43c5
Better serverView exec name; remove SingleServerInventoryView (#11770)
Druid currently has 2 serverViews, regular serverView and filtered serverView. The regular serverView is used to monitor all segment announcements from all data nodes (historicals, tasks, indexers). The filtered serverView is used when you want to watch segment announcements from particular tiers. Since these server views keep track of different sets of druidServers and segments in memory, they should be maintained separately. However, they currently share the same name for their executorService, which can cause confusion and make debugging harder especially in the broker since it is using both serverViews, the filtered view for normal query processing and the regular view to serve the servers table (I'm unsure whether this is intended or whether this is a good behavior). This PR changes it to a more obvious name.

This PR also removes SingleServerInventoryView. This view was deprecated a long time ago and has not been documented at least since 0.13 (#6127). I also don't think this can be better in any case than BatchServerInventoryView. Finally, I merged AbstractCuratorServerInventoryView and BatchServerInventoryView as we no longer need AbstractCuratorServerInventoryView after SingleServerInventoryView is removed.
2021-12-04 18:43:05 +05:30
Jihoon Son fc9513b6cd
Make NodeRole available during binding; add support for dynamic registration of DruidService (#12012)
* Make nodeRole available during binding; add support for dynamic registration of DruidService

* fix checkstyle and test

* fix customRole test

* address comments

* add more javadoc
2021-12-03 11:59:00 -08:00
Gian Merlino e0e05aad99
Enhancements to IndexTaskClient. (#12011)
* Enhancements to IndexTaskClient.

1) Ability to use handlers other than StringFullResponseHandler. This
   functionality is not used in production code yet, but is useful
   because it will allow tasks to communicate with each other in
   non-string-based formats and in streaming fashion. In the future,
   we'll be able to use this to make task-to-task communication
   more efficient.

2) Truncate server errors at 1KB, so long errors do not pollute logs.

3) Change error log level for retryable errors from WARN to INFO. (The
   final error is still WARN.)

4) Harmonize log and exception messages to have a more consistent format.

* Additional tests and improvements.
2021-12-03 09:14:32 -08:00
Paul Rogers a66f10eea1
Code cleanup from query profile project (#11822)
* Code cleanup from query profile project

* Fix spelling errors
* Fix Javadoc formatting
* Abstract out repeated test code
* Reuse constants in place of some string literals
* Fix up some parameterized types
* Reduce warnings reported by Eclipse

* Reverted change due to lack of tests
2021-11-30 11:35:38 -08:00
Gian Merlino f6e6ca2893
Use intermediate-persist IndexSpec during multiphase merge. (#11940)
* Use intermediate-persist IndexSpec during multiphase merge.

The main change is the addition of an intermediate-persist IndexSpec
to the main "merge" method in IndexMerger. There are also a few minor
adjustments to the IndexMerger interface to encourage more harmonious
usage of its methods in the future.

* Additional changes inspired by the test coverage checker.

- Remove unused-in-production IndexMerger methods "append" and "convert".
- Add additional unit tests to UnifiedIndexerAppenderatorsManager.

* Additional adjustments.

* Even more additional adjustments.

* Test fixes.
2021-11-29 15:08:49 -08:00
Sandeep 9bc18a93a2
warn when segment cannot be loaded by Historical nodes (#11849)
2021-11-26 17:27:17 +08:00
Gian Merlino 3d72e66f56
Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs. (#11582)
* Consolidate a bunch of ad-hoc segments metadata SQL; fix some bugs.

This patch gathers together a variety of SQL from SqlSegmentsMetadataManager
and IndexerSQLMetadataStorageCoordinator into a new class SqlSegmentsMetadataQuery.
It focuses on SQL related to retrieving segment payloads and marking
segments used and unused.

In addition to cleaning up the code a bit, this patch also fixes a bug
with years before 0 or after 9999. The prior SQL did not work properly
because dates outside this range cannot be compared as strings. The new
code does work for these far-past and far-future years.

So, if you're ever interested in using Druid to analyze things from
ancient Babylon, you better apply this patch first!

* Fix test compiling.

* Fixes and improvements.

* Fix forbidden API.

* Additional fixes.
2021-11-24 14:51:53 -08:00
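The date-range bug mentioned in #11582 above comes from comparing timestamps as strings. A minimal sketch of why that breaks outside years 0 to 9999, assuming ISO-style date strings as produced by `java.time.LocalDate.toString()` (the exact string format Druid stores in its metadata tables is not shown in this log, and `DateStringOrderDemo` is a hypothetical name):

```java
import java.time.LocalDate;

// Hypothetical demo class: checks whether lexicographic order of the
// ISO date strings agrees with chronological order of the dates.
final class DateStringOrderDemo {
    static boolean stringOrderMatchesTimeOrder(LocalDate a, LocalDate b) {
        int byString = Integer.signum(a.toString().compareTo(b.toString()));
        int byTime = Integer.signum(a.compareTo(b));
        return byString == byTime;
    }

    public static void main(String[] args) {
        // Four-digit years: string order and time order agree.
        System.out.println(stringOrderMatchesTimeOrder(
            LocalDate.of(2021, 11, 24), LocalDate.of(2022, 1, 18)));
        // Year 10000 needs a fifth digit, so its string sorts before "9999-...".
        System.out.println(stringOrderMatchesTimeOrder(
            LocalDate.of(9999, 12, 31), LocalDate.of(10000, 1, 1)));
        // Negative years: a larger digit string means an earlier date.
        System.out.println(stringOrderMatchesTimeOrder(
            LocalDate.of(-2, 1, 1), LocalDate.of(-1, 1, 1)));
    }
}
```

Within the four-digit range the fixed-width format makes string comparison safe, which is presumably why the older SQL worked for ordinary data.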
Maytas Monsereenusorn bb3d2a433a
Support filtering data in Auto Compaction (#11922)
* add impl

* fix checkstyle

* add test

* add test

* add unit tests

* fix unit tests

* fix unit tests

* fix unit tests

* add IT

* add IT

* add comments

* fix spelling
2021-11-24 10:56:38 -08:00
Agustin Gonzalez 311d9a2370
Log correct hydrant count (#11976)
2021-11-23 08:22:17 -08:00
Gian Merlino b13f07a057
Harmonize local input sources; fix batch index integration test. (#11965)
* Make LocalInputSource.files a List instead of Set and adjust wikipedia_index_task to use file list.

Rationale: the behavior of wikipedia_index_task.json is order-dependent with regard to its input
files; some orders produce 4 segments and some produce 5 segments. Some integration tests, like
ITSystemTableBatchIndexTaskTest and ITAutoCompactionTest, are written assuming that the
4-segment case will always happen. Providing the file list in a specific order ensures that this
will happen as expected by the tests.

I didn't see a specific reason why the LocalInputSource.files parameter needed to be a Set, so
changing it to a List was the simplest way to achieve the consistent ordering. I think it will
also make the behavior make more sense if someone does specify the same input file multiple
times in a spec: I think they'd expect it to be loaded multiple times instead of deduped. This
is consistent with the behavior of other input sources like S3, GCS, HTTP.

* Sort files in LocalFirehoseFactory.
2021-11-21 22:26:31 -08:00
Nikhil Navadiya 3c51136098
Add worker category dimension (#11554)
* Add worker category as dimension in TaskSlotCountStatsMonitor

* Change description

* Add workerConfig as field

* Modify HttpRemoteTaskRunnerTest to test worker category in taskslot metrics

* Fixing tests

* Fixing alerts

* Adding unit test in SingleTaskBackgroundRunnerTest for task slot metrics APIs

* Resolving false positive spell check

* addressing comments

* throw UnsupportedOperationException for task slot metrics APIs in SingleTaskBackgroundRunner

Co-authored-by: Nikhil Navadiya <nnavadiya@twitter.com>
2021-11-18 22:59:07 -08:00
Agustin Gonzalez a4353aa1f4
Fix bug Unrecognized token 'No': was expecting (JSON String,...) when… (#11934)
* Fix bug Unrecognized token 'No': was expecting (JSON String,...) when calling the API /druid/indexer/v1/task/taskId/reports and the report is not found

* Also log other non-OK statuses
2021-11-18 10:29:28 -07:00
Gian Merlino a04f99a950
Indexer: Demote WARN to DEBUG for tasks that don't register Appenderators. (#11939)
2021-11-18 07:54:43 -08:00
TSFenwick 1487f558b1
Use a simple class to sanitize JDBC exceptions and also log them (#11843)
* Use a simple class to sanitize sanitizable errors and log them

The purpose of this is to sanitize JDBC errors, but can sanitize other errors
if they implement SanitizableError Interface

add a class to log errors and sanitize them
added a simple test that tests out that the error gets sanitized
add @NonNull annotation to serverconfig's ErrorResponseTransformStrategy

* return less information as part of too many connections, and instead only log specific details

This is so an end user gets relevant information but not too much info since they might know how
many brokers they have

* return only runtime exceptions

added new error types that need to be sanitized
also sanitize deprecated and unsupported exceptions.

* don't rewrite exceptions unless necessary for checked exceptions

add docs
avoid blanket turning all exceptions into runtime exceptions

* address comments, to fix up docs.

add more javadocs
add support UOE sanitization

* use try catch instead and sanitize at public methods

* checkstyle fixes

* throw noSuchStatement and NoSuchConnection as Avatica is affected by those

* address comments. move log error back to druid meta

clean up bad formatting and commented code. add missed catch for NoSuchStatementException
clean up comments for error handler and add comment explaining not wanting to sanitize avatica exceptions

* alter test to reflect new error message
2021-11-16 13:13:03 -08:00
Laksh Singla 57ed5127a7
Make subquery IDs more comprehensive (#11809)
There are 3 types of query IDs: id, subQueryId, and sqlQueryId. Currently, whenever a query generates subqueries, the subquery's subQueryId is populated randomly. Also, the subquery's id is not set to the parent query's id. Therefore there is no way of linking the subqueries to the parent query, and one loses the ability to get an end-to-end view of the query.

This PR aims to implement the following two things:

Populate the subqueries with their parent's id (and sqlQueryId if present).
Populate the subQueryId such that the subqueries form a hierarchical relationship among themselves. For example, if there is a query which launches a subquery, which in turn launches a couple of subqueries, then the ids and subQueryIds should have the following structure.
2021-11-11 16:31:56 +05:30
Gian Merlino 14b0b4aee2
RowBasedSegment: Use Sequence instead of Iterable. (#11886)
* RowBasedSegment: Use Sequence instead of Iterable.

The main reason this is good is that Sequences can include baggage that
must be closed after iteration is finished. This enables creating
RowBasedSegments on top of closeable sequences of rows.

To preserve the optimization that allows reversing a List without
copying it, this patch also makes SimpleSequence its own class and allows
extracting the Iterable that was used to create it.

* Fix tests.
2021-11-10 06:06:52 -08:00
Gian Merlino 6c196a5ea2
Remove StorageAdapter.getColumnTypeName. (#11893)
* Remove StorageAdapter.getColumnTypeName.

It was only used by SegmentAnalyzer, and isn't necessary anymore due to
the recent improvements to ColumnCapabilities.

Also: tidy ColumnDescriptor.read slightly by removing an instanceof
check, and moving the relevant logic into ComplexColumnPartSerde.

* Fix spellings.
2021-11-09 15:18:07 -08:00
Gian Merlino babf00f8e3
Migrate File.mkdirs to FileUtils.mkdirp. (#11879)
* Migrate File.mkdirs to FileUtils.mkdirp.

* Remove unused imports.

* Fix LookupReferencesManager.

* Simplify.

* Also migrate usages of forceMkdir.

* Fix var name.

* Fix incorrect call.

* Update test.
2021-11-09 11:10:49 -08:00
Maytas Monsereenusorn ddc68c6a81
Support changing dimension schema in Auto Compaction (#11874)
* add impl

* add unit tests

* fix checkstyle

* add impl

* add impl

* add impl

* add impl

* add impl

* add impl

* fix test

* add IT

* add IT

* fix docs

* add test

* address comments

* fix conflict
2021-11-08 21:17:08 -08:00
Clint Wylie 7237dc837c
complex typed expressions (#11853)
* complex typed expressions

* add built-in hll collector expressions to get coverage on druid-processing, more types, more better

* rampage!!!

* more javadoc

* adjustments

* oops

* lol

* remove unused dependency

* contradiction?

* more test
2021-11-08 00:33:06 -08:00
Jian Wang 8e7e679984
Add more metrics for Jetty server thread pool usage (#11113)
Add more metrics for jetty server thread pool usage so we know if we have allocated enough http threads to handle requests.
2021-11-07 16:51:44 +05:30
Kashif Faraz 2d77e1a3c6
Add support for multi dimension range partitioning (#11848)
This PR adds support for range partitioning on multiple dimensions. It extends the
concept and implementation of single dimension range partitioning.

The new partition type added is range which corresponds to a set of Dimension Range Partition classes. single_dim is now treated as a range type partition with a single partition dimension.

The start and end values of a DimensionRangeShardSpec are represented
by StringTuples, where each String in the tuple is the value of a partition dimension.
2021-11-06 12:50:17 +05:30
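The idea in #11848 above, that a partition's start and end are tuples with one string per partition dimension, can be illustrated with a lexicographic tuple comparison. This is only a sketch under stated assumptions: `TupleRangeSketch` is a hypothetical class rather than Druid's StringTuple or DimensionRangeShardSpec, tuples are modeled as plain `List<String>`, and the range is assumed to be start-inclusive and end-exclusive.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: range membership over tuples of dimension values.
final class TupleRangeSketch {
    // Compare element by element; the first differing dimension decides.
    static int compareTuples(List<String> a, List<String> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    // Assumed convention: start is inclusive, end is exclusive.
    static boolean inRange(List<String> start, List<String> end, List<String> value) {
        return compareTuples(start, value) <= 0 && compareTuples(value, end) < 0;
    }

    public static void main(String[] args) {
        List<String> start = Arrays.asList("DE", "berlin");
        List<String> end = Arrays.asList("US", "austin");
        System.out.println(inRange(start, end, Arrays.asList("FR", "paris")));
        System.out.println(inRange(start, end, Arrays.asList("US", "boston")));
    }
}
```

With a single-element tuple this reduces to ordinary single-dimension range partitioning, which matches the commit's note that single_dim is treated as a range partition with one partition dimension.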
Gian Merlino 8971056763
Properly count segment references in tests. (#11870)
2021-11-05 12:49:10 -07:00
Kashif Faraz a22687ecbe
Add Broker config `druid.broker.segment.watchRealtimeNodes` (#11732)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to skip the realtime nodes and
thus it would query only Historical processes for any given segment.
2021-11-02 12:38:42 +05:30
Maytas Monsereenusorn ba2874ee1f
Support changing query granularity in Auto Compaction (#11856)
* add queryGranularity

* fix checkstyle

* fix test
2021-11-01 15:18:44 -07:00
Maytas Monsereenusorn 33d9d9bd74
Add rollup config to auto and manual compaction (#11850)
* add rollup to auto and manual compaction

* add unit tests

* add unit tests

* add IT

* fix checkstyle
2021-10-29 10:22:25 -07:00
Lucas Capistrant 43383c73a8
refactor BalanceSegments#balanceServers to exit early if there is no work to be done (#11768)
* remove useless call to balanceServers for move from decom servers when there are no decom servers

* refactor approach to this PR but accomplish the same thing
2021-10-25 10:06:35 -05:00
Gian Merlino 98ecbb21cd
Remove CloseQuietly and migrate its usages to other methods. (#10247)
* Remove CloseQuietly and migrate its usages to other methods.

These other methods include:

1) New method CloseableUtils.closeAndWrapExceptions, which wraps IOExceptions
   in RuntimeExceptions for callers that just want to avoid dealing with
   checked exceptions. Most usages were migrated to this method, because it
   looks like they were mainly attempts to avoid declaring a throws clause,
   and perhaps were unintentionally suppressing IOExceptions.
2) New method CloseableUtils.closeInCatch, designed to properly close something
   in a catch block without losing exceptions. Some usages from catch blocks
   were migrated here, when it seemed that they were intended to avoid checked
   exception handling, and did not really intend to also suppress IOExceptions.
3) New method CloseableUtils.closeAndSuppressExceptions, which sends all
   exceptions to a "chomper" that consumes them. Nothing is thrown or returned.
   The behavior is slightly different: with this method, _all_ exceptions are
   suppressed, not just IOExceptions. Calls that seemed like they had good
   reason to suppress exceptions were migrated here.
4) Some calls were migrated to try-with-resources, in cases where it appeared
   that CloseQuietly was being used to avoid throwing an exception in a finally
   block.

🎵 You don't have to go home, but you can't stay here... 🎵

* Remove unused import.

* Fix up various issues.

* Adjustments to tests.

* Fix null handling.

* Additional test.

* Adjustments from review.

* Fixup style stuff.

* Fix NPE caused by holder starting out null.

* Fix spelling.

* Chomp Throwables too.
2021-10-23 17:03:21 -07:00
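The three closing strategies listed in #10247 above can be sketched roughly as follows. The class `CloseSketch` and its method signatures are hypothetical stand-ins written for illustration; they are not copied from Druid's CloseableUtils and may differ from it in detail.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Consumer;

// Hypothetical sketch of three ways to close a resource.
final class CloseSketch {
    // 1) Close and rethrow IOException as an unchecked exception, so callers
    //    need no throws clause but failures are not silently dropped.
    static void closeAndWrap(Closeable c) {
        try {
            c.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // 2) Close from inside a catch block: the original exception stays
    //    primary and any close failure is attached as suppressed.
    static <E extends Throwable> E closeInCatch(E caught, Closeable c) {
        try {
            c.close();
        } catch (Throwable t) {
            caught.addSuppressed(t);
        }
        return caught;
    }

    // 3) Close and hand every failure to a "chomper"; nothing is thrown.
    static void closeAndSuppress(Closeable c, Consumer<Throwable> chomper) {
        try {
            c.close();
        } catch (Throwable t) {
            chomper.accept(t);
        }
    }
}
```

A typical use of the second form would be `throw CloseSketch.closeInCatch(e, resource);` inside a catch block, so the close failure travels with the original exception. The fourth option in the commit message, try-with-resources, needs no helper at all.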
Clint Wylie 187df58e30
better types (#11713)
* better type system

* needle in a haystack

* ColumnCapabilities is a TypeSignature instead of having one, INFORMATION_SCHEMA support

* fixup merge

* more test

* fixup

* intern

* fix

* oops

* oops again

* ...

* more test coverage

* fix error message

* adjust interning, more javadocs

* oops

* more docs more better
2021-10-19 01:47:25 -07:00
David Bar 7d4841471f
Optimize supervisor history retrieval for specific id (#11807)
Optimization. Fetch from the metadata store only the relevant history items for the requested supervisor id.
2021-10-19 14:08:25 +05:30
TSFenwick 9c15f938fd
fix test issue where JettyTest would fail if JettyWithResponseFilterEnabledTest ran before it (#11803)
this change ensures that JettyTest is setting the properties it needs in case some other test overwrites them
this also changes up the ordering of the call for setProperties to call super's first in case super is setting the same property
2021-10-18 12:42:41 -07:00
Lucas Capistrant 1930ad1f47
Implement configurable internally generated query context (#11429)
* Add the ability to add a context to internally generated druid broker queries

* fix docs

* changes after first CI failure

* cleanup after merge with master

* change default to empty map and improve unit tests

* add doc info and fix checkstyle

* refactor DruidSchema#runSegmentMetadataQuery and add a unit test
2021-10-06 09:02:41 -07:00
Kashif Faraz b688db790b
Add Broker config `druid.broker.segment.ignoredTiers` (#11766)
The new config is an extension of the concept of "watchedTiers" where
the Broker can choose to add the info of only the specified tiers to its timeline.
Similarly, with this config, Broker can choose to ignore the segments being served
by the specified historical tiers. By default, no tier is ignored.

This config is useful when you want a completely isolated tier amongst many other tiers.

Say there are several tiers of historicals Tier T1, Tier T2 ... Tier Tn
and there are several brokers Broker B1, Broker B2 .... Broker Bm

If we want only Broker B1 to query Tier T1, instead of setting a long list of watchedTiers
on each of the other Brokers B2 ... Bm, we could just set druid.broker.segment.ignoredTiers=["T1"]
for these Brokers, while Broker B1 could have druid.broker.segment.watchedTiers=["T1"]
2021-10-06 10:06:32 +05:30
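The B1/T1 isolation example in #11766 above can be modeled with a small visibility check. This is a sketch only: `TierFilterSketch` and `isTierVisible` are hypothetical names, the sets stand in for the values of druid.broker.segment.watchedTiers and druid.broker.segment.ignoredTiers, and the sketch assumes a broker sets at most one of the two (how Druid treats a broker that sets both is not stated in this log).

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical model of which historical tiers a broker adds to its timeline.
final class TierFilterSketch {
    static boolean isTierVisible(String tier, Set<String> watchedTiers, Set<String> ignoredTiers) {
        // A non-empty watch list is an allow list: only those tiers are seen.
        if (!watchedTiers.isEmpty()) {
            return watchedTiers.contains(tier);
        }
        // Otherwise every tier is seen except the ignored ones.
        return !ignoredTiers.contains(tier);
    }

    public static void main(String[] args) {
        Set<String> none = Collections.emptySet();
        Set<String> t1 = Collections.singleton("T1");
        // Broker B1: watchedTiers=["T1"]
        System.out.println(isTierVisible("T1", t1, none));
        // Brokers B2..Bm: ignoredTiers=["T1"]
        System.out.println(isTierVisible("T1", none, t1));
        System.out.println(isTierVisible("T2", none, t1));
    }
}
```

The point of the ignore list is visible here: the other brokers need one short setting rather than a watch list naming every tier except T1.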
Maytas Monsereenusorn a04b08e45c
Add new config to filter internal Druid-related messages from Query API response (#11711)
* add impl

* add impl

* add tests

* add unit test

* fix checkstyle

* address comments

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* fix checkstyle

* address comments

* address comments

* address comments

* fix test

* fix test

* fix test

* fix test

* fix test

* change config name

* change config name

* change config name

* address comments

* address comments

* address comments

* address comments

* address comments

* address comments

* fix compile

* fix compile

* change config

* add more tests

* fix IT
2021-09-29 12:55:49 +07:00
Agustin Gonzalez 2355a60419
Avoid primary key violation in segment tables under certain conditions when appending data to same interval (#11714)
* Fix issue of duplicate key under certain conditions when loading late data in streaming. Also fixes a documentation issue with skipSegmentLineageCheck.

* maxId may be null at this point, need to check for that

* Remove hypothetical case (it cannot happen)

* Revert compaction is simply "killing" the compacted segment and previously, used, overshadowed segments are visible again

* Add comments
2021-09-22 19:21:48 -05:00
Clint Wylie 5de26cf6d9
add optional system schema authorization (#11720)
* add optional system schema authorization

* remove unused

* adjust docs

* doc fixes, missing ldap config change for integration tests

* style
2021-09-21 13:28:26 -07:00
Clint Wylie 392f0ca1b5
refactor sql authorization to get resource type from schema, resource type to be string (#11692)
* refactor sql authorization to get resource type from schema, refactor resource type from enum to string

* information schema auth filtering adjustments

* refactor

* minor stuff

* Update SqlResourceCollectorShuttle.java
2021-09-17 09:53:25 -07:00