Commit Graph

966 Commits

Author SHA1 Message Date
Clint Wylie e25ba00470
fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null in web-console sampler UI (#12785)
* fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null
* fix parquet test that expected wrong behavior, my bad heh
2022-07-14 16:52:01 -07:00
Gian Merlino e82890fde4
Mark specific nimbus.lang.tag.version. (#12751)
* Mark specific nimbus.lang.tag.version.

* Add ignoredUnusedDeclaredDependencies.
2022-07-07 09:58:35 +05:30
Gian Merlino 2b330186e2
Mid-level service client and updated high-level clients. (#12696)
* Mid-level service client and updated high-level clients.

Our servers talk to each other over HTTP. We have a low-level HTTP
client (HttpClient) that is super-asynchronous and super-customizable
through its handlers. It's also proven to be quite robust: we use it
for Broker -> Historical communication over the wide variety of query
types and workloads we support.

But the low-level client has no facilities for service location or
retries, which means we have a variety of high-level clients that
implement these in their own ways. Some high-level clients do a better
job than others. This patch adds a mid-level ServiceClient that makes
it easier for high-level clients to be built correctly and harmoniously,
and migrates some of the high-level logic to use ServiceClients.

Main changes:

1) Add ServiceClient org.apache.druid.rpc package. That package also
   contains supporting stuff like ServiceLocator and RetryPolicy
   interfaces, and a DiscoveryServiceLocator based on
   DruidNodeDiscoveryProvider.

2) Add high-level OverlordClient in org.apache.druid.rpc.indexing.

3) Indexing task client creator in TaskServiceClients. It uses
   SpecificTaskServiceLocator to find the tasks. This improves on
   ClientInfoTaskProvider by caching task locations for up to 30 seconds
   across calls, reducing load on the Overlord.

4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient
   instead of extending IndexTaskClient.

5) Rework RemoteTaskActionClient to use a ServiceClient instead of
   DruidLeaderClient.

6) Rework LocalIntermediaryDataManager, TaskMonitor, and
   ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and
   Overlord no longer need IndexingServiceClient (which internally used
   DruidLeaderClient).

There are some concrete benefits over the prior logic, namely:

- DruidLeaderClient does retries in its "go" method, but only retries
  exactly 5 times, does not sleep between retries, and does not retry
  retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.)
  ServiceClient handles retries in a more reasonable way.

- DruidLeaderClient's methods are all synchronous, whereas ServiceClient
  methods are asynchronous. This is used in one place so far: the
  SpecificTaskServiceLocator, so we don't need to block a thread trying
  to locate a task. It can be used in other places in the future.

- HttpIndexingServiceClient does not properly handle all server errors.
  In some cases, it tries to parse a server error as a successful
  response (for example: in getTaskStatus).

- IndexTaskClient currently makes an Overlord call on every task-to-task
  HTTP request, as a way to find where the target task is. ServiceClient,
  through SpecificTaskServiceLocator, caches these target locations
  for a period of time.

* Style adjustments.

* For the coverage.

* Adjustments.

* Better behaviors.

* Fixes.
2022-07-05 09:43:26 -07:00
Tejaswini Bandlamudi d559773a0e
sets Hadoop conf ClassLoader (#12738) 2022-07-04 17:07:39 +05:30
Rui Chen 068bea6334
deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
Paul Rogers f7caee3b25
Revert changes from #12672 (#12703)
* Revert changes from #12672

* Reverted more conflicting changes

Changes are not needed given previous reversions.
2022-06-25 09:10:44 +05:30
William Hyun 2aadd69f54
Update ORC to 1.7.5 (#12667) 2022-06-24 16:08:42 -07:00
Gian Merlino d5abd06b96
Fix flaky KafkaIndexTaskTest. (#12657)
* Fix flaky KafkaIndexTaskTest.

The testRunTransactionModeRollback case had many race conditions. Most notably,
it would commit a transaction and then immediately check to see that the results
were *not* indexed. This is racey because it relied on the indexing thread being
slower than the test thread.

Now, the case waits for the transaction to be processed by the indexing thread
before checking the results.

* Changes from review.
2022-06-24 13:53:51 -07:00
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable.

Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files.

I am using the glob notation to be consistent with the LocalFirehose syntax.
2022-06-24 11:40:08 +05:30
Paul Rogers ffcb996468
Cleanup changes pulled out of PR #12368 (#12672)
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
2022-06-23 23:19:50 +05:30
Tejaswini Bandlamudi a85b1d8985
Lazy Initialisation of Orc extensions module (#12663)
* Lazy initialization of Orc extension

* nit

* moving intialize method to OrcInputFormat
2022-06-21 11:13:10 +05:30
AmatyaAvadhanula f970757efc
Optimize overlord GET /tasks memory usage (#12404)
The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API)

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid )

The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
2022-06-16 22:30:37 +05:30
Dongjoon Hyun 79f86a0511
Upgrade ORC to 1.7.4 (#12572)
This commit upgrades Apache ORC library from 1.7.2 to 1.7.4.
Apache ORC 1.7.4 is the maintenance release with the following bug fixes.

https://orc.apache.org/news/2022/04/15/ORC-1.7.4/
https://github.com/apache/orc/releases/tag/v1.7.4
2022-05-28 17:44:36 +05:30
Gian Merlino 4631cff2a9
Free ByteBuffers in tests and fix some bugs. (#12521)
* Ensure ByteBuffers allocated in tests get freed.

Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.

To fix this:

1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
   This makes it easy to free the direct buffer. Currently, it's only used
   in tests, because production code seems OK.

2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
   to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
   ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
   reason for the buffer to be off-heap). Make sure to close all direct
   holders when done.

* Changes based on CI results.

* A different approach.

* Roll back BitmapOperationTest stuff.

* Try additional surefire memory.

* Revert "Roll back BitmapOperationTest stuff."

This reverts commit 49f846d9e3.

* Add TestBufferPool.

* Revert Xmx change in tests.

* Better behaved NestedQueryPushDownTest. Exit tests on OOME.

* Fix TestBufferPool.

* Remove T1C from ARM tests.

* Somewhat safer.

* Fix tests.

* Fix style stuff.

* Additional debugging.

* Reset null / expr configs better.

* ExpressionLambdaAggregatorFactory thread-safety.

* Alter forkNode to try to get better info when a JVM crashes.

* Fix buffer retention in ExpressionLambdaAggregatorFactory.

* Remove unused import.
2022-05-19 07:42:29 -07:00
Kashif Faraz 7ab2170802
Use datasketches version 3.2.0 (#12509)
Changes:
- Use apache datasketches version 3.2.0.
- Remove unsafe reflection-based usage of datasketch internals added in #12022
2022-05-13 11:28:15 +05:30
Lucas Capistrant 39e7191f03
Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030)
* Add authentication call before cleaning up intermediate files in hadoop ingestions

* fix checkstyle

* remove debug log
2022-05-02 08:40:44 -05:00
MC-JY bb080693a9
Improve build performance of modules (#12486)
* improve build performance of modules

* improve build performance of modules

* Update pom.xml

* improve build performance of modules
2022-05-01 22:43:11 +08:00
Gian Merlino 529b983ad0
GroupBy: Reduce allocations by reusing entry and key holders. (#12474)
* GroupBy: Reduce allocations by reusing entry and key holders.

Two main changes:

1) Reuse Entry objects returned by various implementations of
   Grouper.iterator.

2) Reuse key objects contained within those Entry objects.

This is allowed by the contract, which states that entries must be
processed and immediately discarded. However, not all call sites
respected this, so this patch also updates those call sites.

One particularly sneaky way that the old code retained entries too long
is due to Guava's MergingIterator and CombiningIterator. Internally,
these both advance to the next value prior to returning the current
value. So, this patch addresses that in two ways:

1) For merging, we have our own implementation MergeIterator already,
   although it had the same problem. So, this patch updates our
   implementation to return the current item prior to advancing to the
   next item. It also adds a forbidden-api entry to ensure that this
   safer implementation is used instead of Guava's.

2) For combining, we address the problem in a different way: by copying
   the key when creating the new, combined entry.

* Attempt to fix test.

* Remove unused import.
2022-04-28 23:21:13 -07:00
Abhishek Agarwal 2fe053c5cb
Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
Jihoon Son 73ce5df22d
Add support for authorizing query context params (#12396)
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, any context params are allowed to users. This can cause 
1) a bad UX if the context param is not matured yet or 
2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows.

This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.

When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,

HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
2022-04-21 14:21:16 +05:30
PJ Fanning 341c65738d
issue-12426 upgrade k8s client due to cve (#12427)
* issue-12426 upgrade k8s client due to cve

* compile issues

* try to fix license check
2022-04-21 10:11:55 +08:00
Rohan Garg de9f12b5c6
Fail fast incase a lookup load fails (#12397)
Currently while loading a lookup for the first time, loading threads blocks
for `waitForFirstRunMs` incase the lookup failed to load. If the `waitForFirstRunMs`
is long (like 10 minutes), such blocking can slow down the loading of other lookups.

This commit allows the thread to progress as soon as the loading of the lookup fails.
2022-04-18 13:14:02 +05:30
AmatyaAvadhanula 067254b778
Package kinesis client jar within the extension (#12370)
amazon-kinesis-client was not covered undered the apache license and required separate insertion in the kinesis extension.
This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities.

Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.
2022-04-04 21:31:18 +05:30
AmatyaAvadhanula c5531be553
Add feature flag for Kinesis listShards API usage (#12383)
listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161.

However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).

A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.
2022-04-04 14:58:10 +05:30
Jihoon Son b6eeef31e5
Store null columns in the segments (#12279)
* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments
2022-03-23 16:54:04 -07:00
Kyle Larose db91961af7
kubernetes: restart watch on null response (#12233)
* kubernetes: restart watch on null response

Kubernetes watches allow a client to efficiently processes changes to
resources. However, they have some idiosyncrasies. In particular, they
can error out for various reasons leading to what would normally be seen
as an invalid result.

The Druid kubernetes node discovery subsystem does not handle a certain
case properly. The watch can return an item with a null object.  These
leads to a null pointer exception. When this happens, the provider needs
to restart the watch, because rerunning the watch from the same resource
version leads to the same result: yet another null pointer exception.

This commit changes the provider to handle null objects by restarting
the watch.

* review: add more coverage

This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch
loop, and removes an unnecessay return.

* kubernetes: reduce logging verbosity

The log messages about items being NULL don't really deserve to be at a
level other than DEBUG since they are not actionable, particularly since
we automatically recover now. Move them to the DEBUG level.
2022-03-10 12:56:40 -08:00
Parag Jain 2efb74ff1e
fix supervisor auto scaler config serde bug (#12317) 2022-03-09 16:17:12 -08:00
Abhishek Agarwal 6346b9561d
Reuse the InputEntityReader in SettableByteEntityReader (#12269)
* Reuse the InputEntityReader in SettableByteEntityReader

* Fix logic

* Fix kafka streaming ingestion

* Add Tests for kafka input format change

* Address review comments
2022-03-09 14:38:31 -08:00
Clint Wylie 0600772cce
use a non-concurrent map for lookups-cached-global unless incremental updates are actually required (#12293)
* use a non-concurrent map for lookups-cached-global unless incremental updates are actually required
* adjustments
* fix test
2022-03-08 21:54:25 -08:00
Gian Merlino 28f8bcce9b
Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307)
* Always reopen stream in FileUtils.copyLarge, RetryingInputStream.

When an InputStream throws an exception from one of its read methods,
we should assume it's bad and reopen it.

The main changes here are:

- In FileUtils.copyLarge, replace InputStream with InputStreamSupplier.
- In RetryingInputStream, collapse retryCondition and resetCondition
  into a single condition. Also, make it required, since every usage
  is passing in a specific condition anyway.

* Test fixes.

* Fix read impl.
2022-03-05 14:39:14 -08:00
Frank Chen 36bc41855d
Set Content-Type for String based response (#12295) 2022-03-04 15:17:03 +08:00
Alexander Saydakov 50038d9344
latest datasketches-java-3.1.0 (#12224)
These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544.

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
2022-03-01 17:14:42 -08:00
Laksh Singla 3f709db173
Make ParseExceptions more informative (#12259)
This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader)

Following changes are addressed in this PR:

A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next().
IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number").
TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error.
This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).
2022-02-28 22:31:15 +05:30
Xavier Léauté d105519558
Replace use of PowerMock with Mockito (#12282)
Mockito now supports all our needs and plays much better with recent Java versions.
Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. 

* replace all uses of powermock with mockito-inline
* upgrade mockito to 4.3.1 and fix use of deprecated methods
* import mockito bom to align all our mockito dependencies
* add powermock to forbidden-apis to avoid accidentally reintroducing it in the future
2022-02-27 22:47:09 -08:00
Xavier Léauté 4c61878f9c
Reduce use of mocking and simplify some tests (#12283)
* remove use of mocks for ServiceMetricEvent
* simplify KafkaEmitterTests by moving to Mockito
* speed up KafkaEmitterTest by adjusting reporting frequency in tests
* remove unnecessary easymock and JUnitParams dependencies
2022-02-26 17:23:09 -08:00
Jihoon Son e5ad862665
A new includeAllDimension flag for dimensionsSpec (#12276)
* includeAllDimensions in dimensionsSpec

* doc

* address comments

* unused import and doc spelling
2022-02-25 18:27:48 -08:00
Karan Kumar b86f2d4c2e
Performance fixes in proto readers (#12267) 2022-02-24 23:21:48 +05:30
Xavier Léauté 009dd9e09a
upgrade core Apache Kafka dependencies to 3.1.0 (#12203)
Announcement: https://blogs.apache.org/kafka/entry/what-s-new-in-apache7
Release notes: https://dist.apache.org/repos/dist/release/kafka/3.1.0/RELEASE_NOTES.html

* upgrade core Apache Kafka dependencies to 3.1.0
* fix use of private Kafka APIs
* remove deprecated test rules
* remove mock calls that weren't verified in the first place
* remove the need for powermock in KafkaLookupExtractorFactoryTest
* align curator-test version with curator itself
* update easymock to 4.3.0
2022-02-23 18:42:51 -08:00
Karan Kumar b94390ba33
Adding Shared Access resource support for azure (#12266)
Azure Blob storage has multiple modes of authentication. One of them is Shared access resource
. This is very useful in cases when we do not want to add the account key in the druid properties .
2022-02-22 18:27:43 +05:30
AmatyaAvadhanula 1ec57cb935
Improve kinesis task assignment after resharding (#12235)
Problem:
- When a kinesis stream is resharded, the original shards are closed.
   Any intermediate shard created in the process is eventually closed as well.
- If a shard is closed before any record is put into it, it can be safely ignored for ingestion.
- It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster.

Changes:
- Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor`
- Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig`
- The caches are used and updated only when `skipIgnorableShards = true`
2022-02-18 12:37:06 +05:30
William Hyun 34bc361953
Update ORC to 1.7.2 (#12084) 2022-02-15 10:04:12 -08:00
Clint Wylie 3ee66bb492
allow optimizing sql expressions and virtual columns (#12241)
* rework sql planner expression and virtual column handling

* simplify a bit

* add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests

* spotbugs

* fix bugs with multi-value string array expression handling

* javadocs and adjust test

* better

* fix tests
2022-02-09 14:55:50 -08:00
Jihoon Son ab3d994a17
Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207)
* working

* Lazily load segmentKillers, segmentMovers, and segmentArchivers

* more tests

* test-jar plugin

* more coverage

* lazy client

* clean up changes

* checkstyle

* i did not change the branch condition

* adjust failure rate to run tests faster

* javadocs

* checkstyle
2022-02-08 13:02:06 -08:00
Gian Merlino de82c611de
Harmonize implementations of "visit" for Exprs from ExprMacros. (#12230)
* Harmonize implementations of "visit" for Exprs from ExprMacros.

Many of them had bugs where they would not visit all of the original
arguments. I don't think this has user-visible consequences right now,
but it's possible it would in a future world where "visit" is used
for more stuff than it is today.

So, this patch all updates all implementations to a more consistent
style that emphasizes reapplying the macro to the shuttled args.

* Test fixes, test coverage, PR review comments.
2022-02-04 08:08:54 -08:00
Kashif Faraz e648b01afb
Improve memory estimates in Aggregator and DimensionIndexer (#12073)
Fixes #12022  

### Description
The current implementations of memory estimation in `OnHeapIncrementalIndex` and `StringDimensionIndexer` tend to over-estimate which leads to more persistence cycles than necessary.

This PR replaces the max estimation mechanism with getting the incremental memory used by the aggregator or indexer at each invocation of `aggregate` or `encode` respectively.

### Changes
- Add new flag `useMaxMemoryEstimates` in the task context. This overrides the same flag in DefaultTaskConfig i.e. `druid.indexer.task.default.context` map
- Add method `AggregatorFactory.factorizeWithSize()` that returns an `AggregatorAndSize` which contains
  the aggregator instance and the estimated initial size of the aggregator
- Add method `Aggregator.aggregateWithSize()` which returns the incremental memory used by this aggregation step
- Update the method `DimensionIndexer.processRowValsToKeyComponent()` to return the encoded key component as well as its effective size in bytes
- Update `OnHeapIncrementalIndex` to use the new estimations only if `useMaxMemoryEstimates = false`
2022-02-03 10:34:02 +05:30
Clint Wylie 978b8f7dde
do not explode if mysql transient exception class does not exist (#12213)
Follow up to #12205 to allow druid-mysql-extensions to work with mysql connector/j 8.x again, which does not contain MySQLTransientException, and while would have had the same problem as mariadb if a transient exception was checked, the new check eagerly loads the class when starting up, causing immediate failure.
2022-02-01 09:06:24 +05:30
Clint Wylie 5d2291991e
use reflection to check for mysql transient exception type (#12205)
* use reflection to check for mysql transient exception type

* better

* oops
2022-01-27 13:13:16 -08:00
AmatyaAvadhanula 1f63b447c4
Mitigate Kinesis stream LimitExceededException by using listShards API (#12161)
Makes kinesis ingestion resilient to `LimitExceededException` caused by resharding.
Replace `describeStream` with `listShards` (recommended) to get shard related info.
`describeStream` has a limit (100) to the number of shards returned per call and a low default TPS limit of 10.
`listShards` returns the info for at most 1000 shards and has a higher TPS limit of 100 as well.

Key changed/added classes in this PR
 * `KinesisRecordSupplier`
 * `KinesisAdminClient`
2022-01-21 10:15:51 +05:30
Abhishek Agarwal 53c0e489c2
Fix infinite retrying during task pausing (#12167)
This fixes a bug that causes TaskClient in overlord to continuously retry to pause tasks. This can happen when a task is not responding to the pause command. Ideally, in such a case when the task is unresponsive, the overlord would have given up after a few retries and would have killed the task. However, due to this bug, retries go on forever.
2022-01-19 09:03:36 +05:30
Jonathan Wei 74c876e578
Throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder (#12080)
* Add option to throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder

* Remove option
2022-01-13 12:36:51 -06:00