Commit Graph

3853 Commits

Author SHA1 Message Date
Rohan Garg 95694b5afa
Convert simple min/max SQL queries on __time to timeBoundary queries (#12472)
* Support array based results in timeBoundary query

* Fix bug with query interval in timeBoundary

* Convert min(__time) and max(__time) SQL queries to timeBoundary

* Add tests for timeBoundary backed SQL queries

* Fix query plans for existing tests

* fixup! Convert min(__time) and max(__time) SQL queries to timeBoundary

* fixup! Add tests for timeBoundary backed SQL queries

* fixup! Fix bug with query interval in timeBoundary
2022-04-25 08:18:58 -07:00
Gian Merlino b7621226d2
QueryScheduler: Log per-query message at DEBUG level. (#12467)
We generally want to avoid having any routine per-query messages at
INFO level, because they pollute logs.
2022-04-22 11:22:34 -07:00
Jihoon Son 73ce5df22d
Add support for authorizing query context params (#12396)
The query context is a way for the user to give hints to the Druid query engine, either to enforce a certain behavior or to let the query engine prefer a certain plan during query planning. Today, there are three types of query context params, as below.

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, all context params are allowed for all users. This can cause 
1) a bad UX if the context param is not yet mature, or 
2) even query failure or a system fault in the worst case if a sensitive param (e.g. maxSubqueryRows) is abused.

This PR adds the ability to limit context params per user role. That means a query will fail if it sets a context param that the user is not allowed to use. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has the name of the context param (such as maxSubqueryRows) and the type QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params; each permission applies to a different context param.

When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,

HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to a specific broker. Because the authorization is done not in the router but in the broker, if you have brokerService set in your query without the proper permission, the query will fail in the broker after routing is done. Technically, this is not quite right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK: the query will still fail if the user doesn't have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. It is disabled by default to avoid surprises when a cluster is upgraded without reading the release notes.
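For illustration only, a minimal Java sketch of the per-key check described above: every context key in the query is authorized as a WRITE action on a QUERY_CONTEXT-typed resource named after the key, and the query is rejected if any key is denied. The class and method names below are hypothetical, not Druid's actual authorization classes.

import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the per-context-key authorization described above.
final class QueryContextAuthorizer
{
  // Stands in for the caller's grants: names of context params with WRITE permission.
  private final Set<String> writableContextParams;

  QueryContextAuthorizer(Set<String> writableContextParams)
  {
    this.writableContextParams = writableContextParams;
  }

  // Returns true only if every context key in the query is allowed for this user.
  boolean authorize(Map<String, Object> queryContext)
  {
    for (String key : queryContext.keySet()) {
      if (!writableContextParams.contains(key)) {
        return false; // HTTP endpoints return 403; JDBC throws ForbiddenException
      }
    }
    return true;
  }
}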
2022-04-21 14:21:16 +05:30
Maytas Monsereenusorn c25a556827
Fix bug in auto compaction preserveExistingMetrics feature (#12438)
* fix bug

* fix test

* fix IT
2022-04-15 15:47:47 -07:00
Agustin Gonzalez 0460d45e92
Make tombstones ingestible by having them return an empty result set. (#12392)
* Make tombstones ingestible by having them return an empty result set.

* Spotbug

* Coverage

* Coverage

* Remove unnecessary exception (checkstyle)

* Fix integration test and add one more to test dropExisting set to false over tombstones

* Force dropExisting to true in auto-compaction when the interval contains only tombstones

* Checkstyle, fix unit test

* Changed flag by mistake, fixing it

* Remove method from interface since this method is specific only to DruidSegmentInputEntity

* Fix typo

* Adapt to latest code

* Update comments when only tombstones to compact

* Move empty iterator to a new DruidTombstoneSegmentReader

* Code review feedback

* Checkstyle

* Review feedback

* Coverage
2022-04-15 09:08:06 -07:00
Parag Jain 2c79d28bb7
Copy of #11309 with fixes (#12402)
* Optionally load segment index files into page cache on bootstrap and new segment download

* Fix unit test failure

* Fix test case

* fix spelling

* fix spelling

* fix test and test coverage issues

Co-authored-by: Jian Wang <wjhypo@gmail.com>
2022-04-11 21:05:24 +05:30
Maytas Monsereenusorn 36e17a20ea
Improve metrics for Auto Compaction (#12413)
* add impl

* add docs

* fix
2022-04-08 20:14:36 -07:00
Maytas Monsereenusorn 8edea5a82d
Add a new flag for ingestion to preserve existing metrics (#12185)
* add impl

* add impl

* fix checkstyle

* add impl

* add unit test

* fix stuff

* fix stuff

* fix stuff

* add unit test

* add more unit tests

* add more unit tests

* add IT

* add IT

* add IT

* add IT

* add ITs

* address comments

* fix test

* fix test

* fix test

* address comments

* address comments

* address comments

* fix conflict

* fix checkstyle

* address comments

* fix test

* fix checkstyle

* fix test

* fix test

* fix IT
2022-04-08 11:02:02 -07:00
Paul Rogers 2cc2088720
Method to specify eternity in the scan query builder (#12223)
* Method to specify eternity in the scan query builder

* Fix checkstyle issue

* Renamed eternity() to eternityInterval()

* Minor fixes
2022-04-04 15:11:32 -07:00
Tejaswini Bandlamudi 984904779b
Increase default DatasourceCompactionConfig.inputSegmentSizeBytes to Long.MAX_VALUE (#12381)
The current default value of inputSegmentSizeBytes is 400MB, which is pretty
low for most compaction use cases. Thus most users are forced to override the
default.

The default value is now increased to Long.MAX_VALUE.
2022-04-04 16:28:53 +05:30
Yuanli Han f2495a67d2
fix messageGap metric (#12337) 2022-03-28 09:21:06 -07:00
Maytas Monsereenusorn ea51d8a16c
Duties in the Indexing group (such as Auto Compaction) do not report metrics (#12352)
* add impl

* add unit tests

* fix checkstyle

* address comments

* fix checkstyle
2022-03-23 18:18:28 -07:00
Jihoon Son b6eeef31e5
Store null columns in the segments (#12279)
* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments
2022-03-23 16:54:04 -07:00
Maytas Monsereenusorn dbb9518f50
Fix auto compaction by adjusting compaction task's interval to align with segmentGranularity when segmentGranularity is set (#12334)
* add impl

* add ITs

* address comments

* address comments

* address comments

* fix failure

* fix checkstyle

* fix checkstyle
2022-03-18 12:46:16 -07:00
Jihoon Son 5e23674fe5
Fix a race condition in the '/tasks' Overlord API (#12330)
* finds complete and active tasks from the same snapshot

* overlord resource

* unit test

* integration test

* javadoc and cleanup

* more cleanup

* fix test and add more
2022-03-17 10:47:45 +09:00
AmatyaAvadhanula 7bf1d8c5c0
Facilitate lazy initialization of connections to mitigate overwhelming of Coordinator (#12298)
Add config for eager / lazy connection initialization in ResourcePool

Description
Currently, when multiple tasks are launched, each of them eagerly initializes a full pool's worth of connections to the coordinator.

While this is acceptable when the parameter for number of eagerConnections (== maxSize) is small, this can be problematic in environments where it's a large value (say 1000) and multiple tasks are launched simultaneously, which can cause a large number of connections to be created to the coordinator, thereby overwhelming it.

Patch
Nodes like the broker may require eager initialization of resources and do not create connections with the Coordinator.
It is unnecessary to do this with other types of nodes.

A config parameter eagerInitialization is added, which when set to true, initializes the max permissible connections when ResourcePool is initialized.

If set to false, lazy initialization of connection resources takes place.

NOTE: All nodes except the broker have this new parameter set to false in the quickstart as part of this PR

Algorithm
The current implementation relies on the creation of maxSize resources eagerly.

The new implementation's behaviour is as follows:

If a resource has been previously created and is available, lend it.
Else if the number of created resources is less than the allowed parameter, create and lend it.
Else, wait for one of the lent resources to be returned.
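A minimal sketch of that lending behaviour, using hypothetical names rather than Druid's actual ResourcePool:

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical sketch of the lazy lending algorithm described above.
final class LazyPool<T>
{
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private final int maxSize;
  private int created = 0;

  LazyPool(Supplier<T> factory, int maxSize)
  {
    this.factory = factory;
    this.maxSize = maxSize;
  }

  synchronized T take() throws InterruptedException
  {
    while (true) {
      if (!idle.isEmpty()) {
        return idle.pop();    // a previously created resource is available: lend it
      }
      if (created < maxSize) {
        created++;
        return factory.get(); // under the cap: create and lend
      }
      wait();                 // otherwise wait for a lent resource to be returned
    }
  }

  synchronized void giveBack(T resource)
  {
    idle.push(resource);
    notifyAll();
  }
}

With eager initialization the pool would instead create all maxSize resources up front; the lazy variant only pays for connections that are actually used.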
2022-03-09 23:17:43 +05:30
Agustin Gonzalez abe76ccb90
Batch ingestion replace (#12137)
* Tombstone support for replace functionality

* A used segment interval is the interval of a current used segment that overlaps any of the input intervals for the spec

* Update compaction test to match replace behavior

* Adapt ITAutoCompactionTest to work with tombstones rather than dropping segments. Add support for tombstones in the broker.

* Style plus simple queriableindex test

* Add segment cache loader tombstone test

* Add more tests

* Add a method to the LogicalSegment to test whether it has any data

* Test filter with some empty logical segments

* Refactor more compaction/dropexisting tests

* Code coverage

* Support for all empty segments

* Skip tombstones when looking-up broker's timeline. Discard changes made to tool chest to avoid empty segments since they will no longer have empty segments after lookup because we are skipping over them.

* Fix null ptr when segment does not have a queriable index

* Add support for empty replace interval (all input data has been filtered out)

* Fixed coverage & style

* Find tombstone versions from lock versions

* Test failures & style

* Interner was making this fail since the two segments were considered equal due to their IDs being equal

* Cleanup tombstone version code

* Force timeChunkLock whenever replace (i.e. dropExisting=true) is being used

* Reject replace spec when input intervals are empty

* Documentation

* Style and unit test

* Restore test code deleted by mistake

* Allocate forces TIME_CHUNK locking and uses lock versions. TombstoneShardSpec added.

* Unused imports. Dead code. Test coverage.

* Coverage.

* Prevent killer from throwing an exception for tombstones. This is the killer used in the peon for killing segments.

* Fix OmniKiller + more test coverage.

* Tombstones are now marked using a shard spec

* Drop a segment factory.json in the segment cache for tombstones

* Style

* Style + coverage

* style

* Add TombstoneLoadSpec.class to mapper in test

* Update core/src/main/java/org/apache/druid/segment/loading/TombstoneLoadSpec.java

Typo

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Update docs/configuration/index.md

Missing

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>

* Typo

* Integrated replace with an existing test since the replace part was redundant and, more importantly, the test file was very close to or exceeding the 10-minute default "no output" CI Travis threshold.

* Range does not work with multi-dim

Co-authored-by: Jonathan Wei <jon-wei@users.noreply.github.com>
2022-03-08 20:07:02 -07:00
Gian Merlino 28f8bcce9b
Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)
* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.

When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.

The main changes here are:

- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
  into a single condition. Also, make it required, since every usage
  is passing in a specific condition anyway.

* Test fixes.

* Fix read impl.
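A simplified sketch of the reopen-on-failure idea (hypothetical helper, not the actual FileUtils/RetryingInputStream code): the copy takes a stream supplier instead of a single stream, so a fresh stream can be opened and the copy resumed whenever a read throws.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch: copy from a reopenable source, reopening after any read failure.
final class ReopeningCopy
{
  interface InputStreamSupplier
  {
    InputStream open() throws IOException;
  }

  static long copy(InputStreamSupplier source, OutputStream out, int maxAttempts) throws IOException
  {
    long copied = 0;
    for (int attempt = 1; ; attempt++) {
      try (InputStream in = source.open()) {
        long toSkip = copied;                 // resume after the bytes already written
        while (toSkip > 0) {
          long skipped = in.skip(toSkip);
          if (skipped <= 0) {
            throw new IOException("could not skip to resume position");
          }
          toSkip -= skipped;
        }
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
          out.write(buf, 0, n);
          copied += n;
        }
        return copied;
      }
      catch (IOException e) {
        if (attempt >= maxAttempts) {
          throw e;                            // give up after the allowed number of reopens
        }
        // the stream is assumed bad after a failed read; loop around and reopen it
      }
    }
  }
}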
2022-03-05 14:39:14 -08:00
Sandeep 61e1ffc7f7
add new query laning metrics to visualize lane assignment (#12111)
* add new query laning metrics to visualize lane assignment

* fixes :spotbugs check

* Update docs/operations/metrics.md

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

* Update server/src/main/java/org/apache/druid/server/QueryScheduler.java

Co-authored-by: Benedict Jin <asdf2014@apache.org>

Co-authored-by: Benedict Jin <asdf2014@apache.org>
2022-03-04 15:21:17 +08:00
Laksh Singla 3f709db173
Make ParseExceptions more informative (#12259)
This PR aims to make the ParseExceptions in Druid more informative by adding metadata to the ParseException, which can carry additional information about the error, for example the path of the file generating the issue, or the line number (where it can be easily fetched, as in CsvReader).

Following changes are addressed in this PR:

A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
TextReader (and its subclasses), which is a specific implementation of IntermediateRowParsingReader, also includes the line number which caused the generation of the error.
This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).
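Roughly, the new iterator has the following shape (a sketch of what is described above; the exact Druid signatures may differ):

import java.io.Closeable;
import java.util.Iterator;
import java.util.Map;

// Sketch: a closeable iterator that can also report context (e.g. file path,
// record number, line number) about the element most recently returned by next().
interface CloseableIteratorWithMetadata<T> extends Iterator<T>, Closeable
{
  Map<String, Object> currentMetadata();
}

Readers can then copy this metadata into the ParseException they throw, which is what lets the error point at a specific InputEntity and record.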
2022-02-28 22:31:15 +05:30
Xavier Léauté d105519558
Replace use of PowerMock with Mockito (#12282)
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. 

* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
2022-02-27 22:47:09 -08:00
Xavier Léauté 1434197ee1
update airline dependency to 2.x (#12270)
* upgrade Airline to Airline 2
  https://github.com/airlift/airline is no longer maintained, updating to
  https://github.com/rvesse/airline (Airline 2) to use an actively
  maintained version, while minimizing breaking changes.

  Note, this is a backwards incompatible change, and extensions relying on
  the CliCommandCreator extension point will also need to be updated.

* fix dependency checks where jakarta.inject is now resolved first instead
  of javax.inject, due to Airline 2 using jakarta
2022-02-27 15:19:28 -08:00
Jihoon Son e5ad862665
A new includeAllDimension flag for dimensionsSpec (#12276)
* includeAllDimensions in dimensionsSpec

* doc

* address comments

* unused import and doc spelling
2022-02-25 18:27:48 -08:00
Maytas Monsereenusorn 6e2eded277
Allow coordinator run auto compaction duty period to be configured separately from other indexing duties (#12263)
* add impl

* add impl

* add unit tests

* add impl

* add impl

* add serde test

* add tests

* add docs

* fix test

* fix test

* fix docs

* fix docs

* fix spelling
2022-02-18 23:02:57 -08:00
tejaswini-imply 70c40c4281
Fix long overflow in SegmentCostCache.Bucket.toLocalInterval (#12257)
Problem:
When using a `CachingCostBalancerStrategy` with segments of granularity ALL,
no segment gets loaded.
- With granularity ALL, segments of eternity interval are created which have 
  `start = Long.MIN_VALUE / 2` and `end = Long.MAX_VALUE / 2`.
- For cost calculation in the balancer strategy, `toLocalInterval()` method is invoked where
  `Long.MIN_VALUE / 2` or `Long.MAX_VALUE / 2` cause an overflow thus resulting in no overlap.
- The strategy is unable to find any eligible server for loading a given segment.

Fix:
- Reverse order of operations to divide by `MILLIS_FACTOR` (~10^8) first,
   then do the subtraction to prevent Long overflow.
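A minimal sketch of the reordering (illustrative constants and method names, not the actual SegmentCostCache code):

// Illustrative sketch of the overflow described above.
final class LocalIntervalExample
{
  private static final long MILLIS_FACTOR = 100_000_000L; // ~10^8, as in the description above

  // Overflow-prone: for eternity segments whose endpoints are Long.MIN_VALUE / 2 and
  // Long.MAX_VALUE / 2, the long subtraction can exceed the Long range before the division runs.
  static double overflowing(long millis, long bucketOrigin)
  {
    return (millis - bucketOrigin) / (double) MILLIS_FACTOR;
  }

  // Fixed ordering: divide each term first, then subtract, so intermediate values stay in range.
  static double safe(long millis, long bucketOrigin)
  {
    return millis / (double) MILLIS_FACTOR - bucketOrigin / (double) MILLIS_FACTOR;
  }
}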
2022-02-17 15:13:51 +05:30
Jihoon Son ab3d994a17
Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207)
* working

* Lazily load segmentKillers, segmentMovers, and segmentArchivers

* more tests

* test-jar plugin

* more coverage

* lazy client

* clean up changes

* checkstyle

* i did not change the branch condition

* adjust failure rate to run tests faster

* javadocs

* checkstyle
2022-02-08 13:02:06 -08:00
Suneet Saldanha ced1389d4c
Enable auto kill segments by default (#12187)
* Enable auto-kill by default

* tests

* wip

* test

* fix IT

* fix it

* remove from docs

* make coverage bot happy
2022-02-07 06:57:54 -08:00
Maytas Monsereenusorn 2b8e7fc0b4
Add a flag to allow auto compaction task slot ratio to consider auto scaler slots (#12228)
* add impl

* fix checkstyle

* add unit tests

* checkstyle

* add IT

* fix IT

* add comments

* fix checkstyle
2022-02-06 20:46:05 -08:00
Suneet Saldanha 159f97dcb0
Update docs for druid.processing.numThreads in brokers (#12231)
* Update docs for druid.processing.numThreads

* error msg

* one more reference
2022-02-04 17:34:21 -08:00
Clint Wylie 8fd587b28c
remove duplicate Broker ServerInventoryView, improve HttpServerInventoryView logging (#12209)
* changes:
* remove SystemSchema duplicate ServerInventoryView in broker
* suppress duplicate segment added/removed warnings in HttpServerInventoryView when doing a full sync

* fixes
2022-02-03 12:57:34 -08:00
Kashif Faraz e648b01afb
Improve memory estimates in Aggregator and DimensionIndexer (#12073)
Fixes #12022  

### Description
The current implementations of memory estimation in `OnHeapIncrementalIndex` and `StringDimensionIndexer` tend to over-estimate which leads to more persistence cycles than necessary.

This PR replaces the max estimation mechanism with getting the incremental memory used by the aggregator or indexer at each invocation of `aggregate` or `encode` respectively.

### Changes
- Add new flag `useMaxMemoryEstimates` in the task context. This overrides the same flag in DefaultTaskConfig i.e. `druid.indexer.task.default.context` map
- Add method `AggregatorFactory.factorizeWithSize()` that returns an `AggregatorAndSize` which contains
  the aggregator instance and the estimated initial size of the aggregator
- Add method `Aggregator.aggregateWithSize()` which returns the incremental memory used by this aggregation step
- Update the method `DimensionIndexer.processRowValsToKeyComponent()` to return the encoded key component as well as its effective size in bytes
- Update `OnHeapIncrementalIndex` to use the new estimations only if `useMaxMemoryEstimates = false`
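The shape of the new incremental-estimate API described above is roughly the following; the signatures are simplified and illustrative rather than Druid's exact ones.

// Simplified/illustrative shapes of the methods described above.
final class AggregatorAndSize
{
  final Object aggregator;      // the aggregator instance returned by factorizeWithSize()
  final long initialSizeBytes;  // estimated initial footprint of that instance

  AggregatorAndSize(Object aggregator, long initialSizeBytes)
  {
    this.aggregator = aggregator;
    this.initialSizeBytes = initialSizeBytes;
  }
}

interface SizeAwareAggregator
{
  // Performs one aggregation step and returns the incremental heap memory (in bytes)
  // used by that step, so the index can track actual growth instead of a worst-case
  // maximum when useMaxMemoryEstimates = false.
  long aggregateWithSize();
}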
2022-02-03 10:34:02 +05:30
Rohan Garg c4fa3ccfc4
Fix load-drop-load sequence for same segment and historical in http loadqueue peon (#11717)
Fixes an issue where a load-drop-load sequence for a segment and historical doesn't work correctly for the http based load queue peon. The first load-drop cycle works fine; the problem comes when there is an attempt to reload the segment. The historical caches the load success for some recent segments and treats the reload as a no-op, but it doesn't consider the fact that the segment was also dropped in between the load requests.
This change invalidates the cache after a client tries to fetch a success result.
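Conceptually (hypothetical names, not the actual load queue peon code), the fix amounts to a result cache whose successful entries are removed once a client has fetched them, so a later reload request after a drop is processed again instead of being short-circuited:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: cache recent load results, but invalidate a SUCCESS entry
// once it has been fetched, so a subsequent load after a drop is not treated as a no-op.
final class LoadResultCache
{
  enum Status { SUCCESS, FAILED }

  private final Map<String, Status> recentResults = new ConcurrentHashMap<>();

  void record(String segmentId, Status status)
  {
    recentResults.put(segmentId, status);
  }

  Status fetch(String segmentId)
  {
    Status status = recentResults.get(segmentId);
    if (status == Status.SUCCESS) {
      recentResults.remove(segmentId); // invalidate after the client reads the success
    }
    return status;
  }
}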
2022-01-31 13:16:58 +05:30
Clint Wylie 5d2291991e
use reflection to check for mysql transient exception type (#12205)
* use reflection to check for mysql transient exception type

* better

* oops
2022-01-27 13:13:16 -08:00
zachjsh f906f2f577
Fix HttpRemoteTaskRunner LifecycleStart / LifecycleStop race condition (#12184)
* * stop workers, remove listener, and call exitStop() on HttpRemoteTaskRunner @LifecycleStop

* * fix test failure
2022-01-27 13:15:14 -06:00
TSFenwick a813816fb1
add module test for QueryableModule to allow for better runtime.properties testing (#12202)
added a default GetRequestLoggerProviderTest and GetEmitterRequestLoggerProviderTest
2022-01-25 22:26:11 -08:00
Karan Kumar 96b3498a40
Grouping on arrays as arrays (#12078)
* init multiValue column group by

* Changing sorting to Lexicographic as default

* Adding initial tests

* 1.Fixing test cases adding
2.Optimized inmem structs

* Linking SQL layer to native layer

* Adding multiDimension support to group by column strategy

* 1. Removing array coercion in Calcite layer
2. Removing ResultRowDeserializer

* 1. Supporting all primitive array types
2. Removing dimension spec as part of columnSelector

* 1. Supporting all primitive array types
2. Removing dimension spec as part of columnSelector

* 1. Checkstyle things
2. Removing flag

* Minor naming things

* CheckStyle Things

* Fixing test case

* Fixing hashing

* 1. Adding the MV function
2. Added few test cases

* 1. Adding MV function test cases

* Adding Selector strategy function test cases

* Fixing ClientQuerySegmentWalkerTest

* Adding GroupByQueryRunnerTest test cases

* Fixing test cases

* Adding few more test cases

* Fixing exception assert statement and IntelliJ inspection

* Adding null compatibility tests

* Review comments

* Fixing few failing tests

* Fixing few failing tests

* Do not convert to topN query in case of group by on array

* Fixing checkstyle

* Fixing differences between jdk's class cast exception message

* 1. Fixing ordering if the grouping key is an array

* Fixing DefaultLimitSpec

* Fixing CalciteArraysQueryTest

* Dummy commit for LGTM

* changes:
* only coerce multi-value string null values when `ExpressionPlan.Trait.NEEDS_APPLIED` is set
* correct return type inference for ARRAY_APPEND,ARRAY_PREPEND,ARRAY_SLICE,ARRAY_CONCAT
* fix bug with ExprEval.ofType when actual type of object from binding doesn't match its claimed type

* Review comments

* Fixing test cases

* Fixing spot bugs

* Fixing strict compile

Co-authored-by: Clint Wylie <cwylie@apache.org>
2022-01-25 20:30:56 -08:00
Suneet Saldanha 2b32d86f3b
Enable automatic metadata cleanup by default (#12188) 2022-01-24 20:04:17 -08:00
Jihoon Son cc2ffc6c0f
Fix node discovery to ignore unknown DruidServices (#12157)
* Fix node discovery to ignore unknown DruidServices

* ignore all runtime exceptions

* fix test

* add custom deserializer

* custom serializer

* log host for unparseable druidService
2022-01-18 22:08:59 -08:00
Maytas Monsereenusorn bd7fe45da0
Support adding metrics in Auto Compaction (#12125)
* add impl

* add impl

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add unit tests

* add integration tests

* add integration tests

* fix LGTM

* fix test

* remove doc
2022-01-17 20:19:31 -08:00
Marcelo R Costa c28b2834a1
Add http response status code to org.eclipse.jetty.server.RequestLog (#12116)
* Add http response status code to org.eclipse.jetty.server.RequestLog

* http response code is expressed as an int. Set log msg interpolation based on digit

* trying to add an unit test to verify if the logger.debug method is called

* trying to add an unit test to verify if the logger.debug method is called

* fix compilation issues

* remove test
2022-01-06 20:10:01 +08:00
Maytas Monsereenusorn b53e7f4d12
Support overlapping segment intervals in auto compaction (#12062)
* add impl

* add impl

* fix more bugs

* add tests

* fix checkstyle

* address comments

* address comments

* fix test
2022-01-04 11:47:38 -08:00
somu-imply c267b65f97
Removing unused processing threadpool on broker (#12070)
* Thread pool for broker

* Updating two tests to improve coverage for new method added

* Updating druidProcessingConfigTest to cover coverage

* Adding missed spelling errors caused in doc

* Adding test to cover lines of new function added
2021-12-21 13:07:53 -08:00
lokesh-lingarajan 60a3a802b6
Modifying index from druid_segments(datasource, used, end) to druid_segments(datasource, used, end, start) to support kill task (#11894)
This index helps produce faster query results for the kill task's query that lists unused segments by interval. That query can become a bottleneck under some production loads, causing the coordinator to wait longer for metadata DB replies and impacting Kafka ingestion. The modified index has helped reduce the query times for such queries.
2021-12-16 10:28:20 -08:00
Jonathan Wei 229f82a6f0
Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961)
* Add parse error list API for stream supervisors, simplify parse exception message

* Add input string to parse exception

* Use structured ParseExceptionReport

* Fix tests

* Add test

* PR comments, add ParseExceptionReport equals verifier

* Fix test
2021-12-09 15:42:55 -06:00
Lucas Capistrant 150902b95c
clean up the balancing code around the batched vs deprecated way of sampling segments to balance (#11960)
* clean up the balancing code around the batched vs deprecated way of sampling segments to balance

* fix docs, clarify comments, add deprecated annotations to legacy code

* remove unused variable

* update dynamic config dialog in console to state percentOfSegmentsToConsiderPerMove deprecated

* fix dynamic config text for percentOfSegmentsToConsiderPerMove

* run prettier to cleanup coordinator-dynamic-config.tsx changes

* update jest snapshot

* update documentation per review feedback
2021-12-07 14:47:46 -08:00
Clint Wylie a8815f671e
Fix druid client timeout zero (#12023)
* fix bug where queries fail immediately when timeout is 0 instead of using default timeout

* fix to use serverside max

* more better

* less flaky test

* oops
2021-12-07 12:41:01 -08:00
zachjsh 65cadbe42a
Fix bad lookup config fails task (#12021)
This PR fixes an issue in which a lookup that is configured incorrectly (it does not serialize properly when being pulled by the peon node) causes the task to fail. The failure occurs because the peon and other leaf nodes (broker, historical) have retry logic that continues to retry the lookup loading for 3 minutes by default. The http listener thread on the peon task is not started until lookup loading completes; the overlord waits 1 minute by default to communicate with the peon task to get the task status, after which it orders the task to shut down, causing the ingestion task to fail.

To fix the issue, we catch the serialization error and do not retry. Also fixed an issue in which a bad lookup config prevented other, good lookup configs from being loaded.
2021-12-07 00:55:34 -05:00
Abhishek Agarwal 834aae096a
Human-readable and actionable SQL error messages (#11911)
This PR does two things

1. It adds the capability to surface missing features in SQL to users - The Calcite planner will explore multiple rules to convert a logical SQL query to a Druid native query. Some rules change the shape of the query itself and optimize it, and some rules are responsible for translating the query into a Druid native query. These are DruidQueryRule, DruidOuterQueryRule, DruidJoinRule, DruidUnionDataSourceRule, DruidUnionRule etc. These rules will look at the SQL and do the necessary transformation. But if a rule can't transform the query, it returns control to the Calcite planner without recording why it was not able to transform it. E.g. there is a join query with a non-equal join condition. DruidJoinRule will look at the condition, see that it is not supported, and return control. The reason is that a query can be planned in many different ways, so if one rule can't handle it, the query may still be plannable by other rules. In this PR, we are intercepting these gaps and passing them back to the user if the query could not be planned at all.

2. The said capability has been used to generate actionable errors for some common unsupported SQL features. However, not all possible errors are covered and we can keep adding more in the future.
2021-12-07 09:44:08 +05:30
Paul Rogers 34a3d45737
Refactor ResponseContext (#11828)
* Refactor ResponseContext

Fixes a number of issues in preparation for request trailers
and the query profile.

* Converts keys from an enum to classes for smaller code
* Wraps stored values in functions for easier capture for other uses
* Reworks the "header squeezer" to handle types other than arrays.
* Uses metadata for visibility, and ability to compress,
  to replace ad-hoc code.
* Cleans up JSON serialization for the response context.
* Other miscellaneous cleanup.

* Handle unknown keys in deserialization

Also, make "Visibility" into a boolean.

* Revised comment

* Renamed variable
2021-12-06 17:03:12 -08:00
Karan Kumar 2539b7a748
Adding ToString() to ExceptionEvent (#12027)
Provides readable output for exception events while generating the report in SeekableStreamSupervisor
2021-12-06 13:37:16 +05:30