Commit Graph

989 Commits

Author SHA1 Message Date
Abhishek Agarwal 618757352b
Bump up the version to 25.0.0 (#12975)
* Bump up the version to 25.0.0

* Fix the version in console
2022-08-29 11:27:38 +05:30
Alexander Saydakov 7e2371bbde
KLL sketch (#12498)
* KLL sketch

* added documentation

* direct static refs

* direct static refs

* fixed test

* addressed review points

* added KLL sketch related terms

* return a copy from get

* Copy unions when returning them from "get".

* Remove redundant "final".

Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>
Co-authored-by: Gian Merlino <gianmerlino@gmail.com>
2022-08-26 21:19:24 -07:00
Junge e476e75462
fix #12945 - type conversion exception occurs during the variance query (#12967)
Co-authored-by: gejun <gejun@tingyun.com>
2022-08-25 18:10:58 -07:00
Karan Kumar 275f834b2a
Race in Task report/log streamer (#12931)
* Fixing RACE in HTTP remote task Runner

* Changes in the interface

* Updating documentation

* Adding test cases to SwitchingTaskLogStreamer

* Adding more tests
2022-08-25 17:56:01 -07:00
Clint Wylie 8ee8786d3c
add maxBytesInMemory and maxClientResponseBytes to SamplerConfig (#12947)
* add maxBytesInMemory and maxClientResponseBytes to SamplerConfig
2022-08-25 00:50:41 -07:00
Clint Wylie 82ad927087
tighten up array handling, fix bug with array_slice output type inference (#12914) 2022-08-25 00:48:49 -07:00
Karan Kumar 31db3beed8
Fixing json creator for s3 storage connector provider (#12948)
* Fixing json creator for s3 storage connector provider

* Adding guice tests
2022-08-25 11:08:57 +05:30
Paul Rogers cfed036091
Add the new integration test framework (#12368)
This commit is a first draft of the revised integration test framework which provides:
- A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.)
- Maven module druid-it-tools to hold code placed into the Docker image.
- Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.)
- Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.

The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.
2022-08-24 17:03:23 +05:30
Adarsh Sanjeev 3b58a01c7c
Correct spelling in messages and variable names. (#12932) 2022-08-24 11:06:31 +05:30
Gian Merlino d7d15ba51f
Add druid-multi-stage-query extension. (#12918)
* Add druid-multi-stage-query extension.

* Adjustments from CI.

* Task ID validation.

* Various changes from code review.

* Remove unnecessary code.

* LGTM-related.
2022-08-23 18:44:01 -07:00
William Hyun a1c4eab522
Update ORC to 1.7.6 (#12928) 2022-08-23 01:09:38 -07:00
AmatyaAvadhanula 379df5f103
Kinesis docs and logs improvements (#12886)
Going ahead with the merge. CI is failing because of a code coverage change in the log line.
2022-08-22 14:49:42 +05:30
imply-cheddar 536415b948
Stop leaking Avro objects from parser (#12828)
The Avro parsing code leaks some "object" representations.
We need to convert them into Maps/Lists so that other code
can understand and expect good things.  Previously, these
objects were handled with .toString(), but that's not a
good contract in terms of how to work with objects.
2022-08-18 03:16:20 +05:30
Paul Rogers 41712b7a3a
Refactor SqlLifecycle into statement classes (#12845)
* Refactor SqlLifecycle into statement classes

Create direct & prepared statements
Remove redundant exceptions from tests
Tidy up Calcite query tests
Make PlannerConfig more testable

* Build fixes

* Added builder to SqlQueryPlus

* Moved Calcites system properties to saffron.properties

* Build fix

* Resolve merge conflict

* Fix IntelliJ inspection issue

* Revisions from reviews

Backed out a revision to Calcite tests that didn't work out as planned

* Build fix

* Fixed spelling errors

* Fixed failed test

Prepare now enforces security; before it did not.

* Rebase and fix IntelliJ inspections issue

* Clean up exception handling

* Fix handling of JDBC auth errors

* Build fix

* More tweaks to security messages
2022-08-14 00:44:08 -07:00
Karan Kumar 2f2d8ded5a
Introducing Storage connector Interface (#12874)
In the current druid code base, we have the interface DataSegmentPusher which allows us to push segments to the appropriate deep storage without the extension being worried about the semantics of how to push too deep storage.

While working on #12262, whose some part of the code will go as an extension, I realized that we do not have an interface that allows us to do basic "write, get, delete, deleteAll" operations on the appropriate deep storage without let's say pulling the s3-storage-extension dependency in the custom extension.

Hence, the idea of StorageConnector was born where the storage connector sits inside the druid core so all extensions have access to it.

Each deep storage implementation, for eg s3, GCS, will implement this interface.
Now with some Jackson magic, we bind the implementation of the correct deep storage implementation on runtime using a type variable.
2022-08-12 16:11:49 +05:30
David Palmer 2855fb6ff8
Change Kafka Lookup Extractor to not register consumer group (#12842)
* change kafka lookups module to not commit offsets

The current behaviour of the Kafka lookup extractor is to not commit
offsets by assigning a unique ID to the consumer group and setting
auto.offset.reset to earliest. This does the job but also pollutes the
Kafka broker with a bunch of "ghost" consumer groups that will never again be
used.

To fix this, we now set enable.auto.commit to false, which prevents the
ghost consumer groups being created in the first place.

* update docs to include new enable.auto.commit setting behaviour

* update kafka-lookup-extractor documentation

Provide some additional detail on functionality and configuration.
Hopefully this will make it clearer how the extractor works for
developers who aren't so familiar with Kafka.

* add comments better explaining the logic of the code

* add spelling exceptions for kafka lookup docs
2022-08-09 16:14:22 +05:30
Hamish Ball abd7a9748d
Remove kafka lookup records when a record is tombstoned (#12819)
* remove kafka lookup records from factory when record tombstoned

* update kafka lookup docs to include tombstone behaviour

* change test wait time down to 10ms

Co-authored-by: David Palmer <david.palmer@adscale.co.nz>
2022-08-09 10:42:51 +05:30
Karan Kumar 607b0b9310
Adding withName implementation to AggregatorFactory (#12862)
* Adding agg factory with name impl

* Adding test cases

* Fixing test case

* Fixing test case

* Updated java docs.
2022-08-08 18:31:56 +05:30
AmatyaAvadhanula d294404924
Kinesis ingestion with empty shards (#12792)
Kinesis ingestion requires all shards to have at least 1 record at the required position in druid.
Even if this is satisified initially, resharding the stream can lead to empty intermediate shards. A significant delay in writing to newly created shards was also problematic.

Kinesis shard sequence numbers are big integers. Introduce two more custom sequence tokens UNREAD_TRIM_HORIZON and UNREAD_LATEST to indicate that a shard has not been read from and that it needs to be read from the start or the end respectively.
These values can be used to avoid the need to read at least one record to obtain a sequence number for ingesting a newly discovered shard.

If a record cannot be obtained immediately, use a marker to obtain the relevant shardIterator and use this shardIterator to obtain a valid sequence number. As long as a valid sequence number is not obtained, continue storing the token as the offset.

These tokens (UNREAD_TRIM_HORIZON and UNREAD_LATEST) are logically ordered to be earlier than any valid sequence number.

However, the ordering requires a few subtle changes to the existing mechanism for record sequence validation:

The sequence availability check ensures that the current offset is before the earliest available sequence in the shard. However, current token being an UNREAD token indicates that any sequence number in the shard is valid (despite the ordering)

Kinesis sequence numbers are inclusive i.e if current sequence == end sequence, there are more records left to read.
However, the equality check is exclusive when dealing with UNREAD tokens.
2022-08-05 22:38:58 +05:30
Paul Rogers a618458bf0
Tidy up construction of the Guice Injectors (#12816)
* Refactor Guice initialization

Builders for various module collections
Revise the extensions loader
Injector builders for server startup
Move Hadoop init to indexer
Clean up server node role filtering
Calcite test injector builder

* Revisions from review comments

* Build fixes

* Revisions from review comments
2022-08-04 00:05:07 -07:00
Gian Merlino ef6811ef88
Improved Java 17 support and Java runtime docs. (#12839)
* Improved Java 17 support and Java runtime docs.

1) Add a "Java runtime" doc page with information about supported
   Java versions, garbage collection, and strong encapsulation..

2) Update asm and equalsverifier to versions that support Java 17.

3) Add additional "--add-opens" lines to surefire configuration, so
   tests can pass successfully under Java 17.

4) Switch openjdk15 tests to openjdk17.

5) Update FrameFile to specifically mention Java runtime incompatibility
   as the cause of not being able to use Memory.map.

6) Update SegmentLoadDropHandler to log an error for Errors too, not
   just Exceptions. This is important because an IllegalAccessError is
   encountered when the correct "--add-opens" line is not provided,
   which would otherwise be silently ignored.

7) Update example configs to use druid.indexer.runner.javaOptsArray
   instead of druid.indexer.runner.javaOpts. (The latter is deprecated.)

* Adjustments.

* Use run-java in more places.

* Add run-java.

* Update .gitignore.

* Exclude hadoop-client-api.

Brought in when building on Java 17.

* Swap one more usage of java.

* Fix the run-java script.

* Fix flag.

* Include link to Temurin.

* Spelling.

* Update examples/bin/run-java

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>

Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
2022-08-03 23:16:05 -07:00
Tejaswini Bandlamudi cceb2e849e
Perform lazy initialization of parquet extensions module (#12827)
Historicals and middle managers crash with an `UnknownHostException` on trying
to load `druid-parquet-extensions` with an ephemeral Hadoop cluster. This happens
because the `fs.defaultFS` URI value cannot be resolved at start up time as the
hadoop cluster may not exist at startup time.

This commit fixes the error by performing initialization of the filesystem in
`ParquetInputFormat.createReader()` whenever a new reader is requested.
2022-08-02 13:41:12 +05:30
Atul Mohan 75045970cd
S3 Ingestion from non-default endpoints (#11798)
* Add endpoint support for s3inputsource

* Changes to tests

* Fix docs

* Fix config

* Fix inspections

* Fix spelling

* Remove password from toString
2022-07-15 11:03:34 -07:00
Clint Wylie e25ba00470
fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null in web-console sampler UI (#12785)
* fix bug in ObjectFlatteners.toMap which caused null values in avro-stream/avro-ocf/parquet/orc to be converted to {} instead of null
* fix parquet test that expected wrong behavior, my bad heh
2022-07-14 16:52:01 -07:00
Gian Merlino e82890fde4
Mark specific nimbus.lang.tag.version. (#12751)
* Mark specific nimbus.lang.tag.version.

* Add ignoredUnusedDeclaredDependencies.
2022-07-07 09:58:35 +05:30
Gian Merlino 2b330186e2
Mid-level service client and updated high-level clients. (#12696)
* Mid-level service client and updated high-level clients.

Our servers talk to each other over HTTP. We have a low-level HTTP
client (HttpClient) that is super-asynchronous and super-customizable
through its handlers. It's also proven to be quite robust: we use it
for Broker -> Historical communication over the wide variety of query
types and workloads we support.

But the low-level client has no facilities for service location or
retries, which means we have a variety of high-level clients that
implement these in their own ways. Some high-level clients do a better
job than others. This patch adds a mid-level ServiceClient that makes
it easier for high-level clients to be built correctly and harmoniously,
and migrates some of the high-level logic to use ServiceClients.

Main changes:

1) Add ServiceClient org.apache.druid.rpc package. That package also
   contains supporting stuff like ServiceLocator and RetryPolicy
   interfaces, and a DiscoveryServiceLocator based on
   DruidNodeDiscoveryProvider.

2) Add high-level OverlordClient in org.apache.druid.rpc.indexing.

3) Indexing task client creator in TaskServiceClients. It uses
   SpecificTaskServiceLocator to find the tasks. This improves on
   ClientInfoTaskProvider by caching task locations for up to 30 seconds
   across calls, reducing load on the Overlord.

4) Rework ParallelIndexSupervisorTaskClient to use a ServiceClient
   instead of extending IndexTaskClient.

5) Rework RemoteTaskActionClient to use a ServiceClient instead of
   DruidLeaderClient.

6) Rework LocalIntermediaryDataManager, TaskMonitor, and
   ParallelIndexSupervisorTask. As a result, MiddleManager, Peon, and
   Overlord no longer need IndexingServiceClient (which internally used
   DruidLeaderClient).

There are some concrete benefits over the prior logic, namely:

- DruidLeaderClient does retries in its "go" method, but only retries
  exactly 5 times, does not sleep between retries, and does not retry
  retryable HTTP codes like 502, 503, 504. (It only retries IOExceptions.)
  ServiceClient handles retries in a more reasonable way.

- DruidLeaderClient's methods are all synchronous, whereas ServiceClient
  methods are asynchronous. This is used in one place so far: the
  SpecificTaskServiceLocator, so we don't need to block a thread trying
  to locate a task. It can be used in other places in the future.

- HttpIndexingServiceClient does not properly handle all server errors.
  In some cases, it tries to parse a server error as a successful
  response (for example: in getTaskStatus).

- IndexTaskClient currently makes an Overlord call on every task-to-task
  HTTP request, as a way to find where the target task is. ServiceClient,
  through SpecificTaskServiceLocator, caches these target locations
  for a period of time.

* Style adjustments.

* For the coverage.

* Adjustments.

* Better behaviors.

* Fixes.
2022-07-05 09:43:26 -07:00
Tejaswini Bandlamudi d559773a0e
sets Hadoop conf ClassLoader (#12738) 2022-07-04 17:07:39 +05:30
Rui Chen 068bea6334
deps: upgrade mysql-connector-java to v5.1.49 (#12704) 2022-06-29 23:15:46 +08:00
Paul Rogers f7caee3b25
Revert changes from #12672 (#12703)
* Revert changes from #12672

* Reverted more conflicting changes

Changes are not needed given previous reversions.
2022-06-25 09:10:44 +05:30
William Hyun 2aadd69f54
Update ORC to 1.7.5 (#12667) 2022-06-24 16:08:42 -07:00
Gian Merlino d5abd06b96
Fix flaky KafkaIndexTaskTest. (#12657)
* Fix flaky KafkaIndexTaskTest.

The testRunTransactionModeRollback case had many race conditions. Most notably,
it would commit a transaction and then immediately check to see that the results
were *not* indexed. This is racey because it relied on the indexing thread being
slower than the test thread.

Now, the case waits for the transaction to be processed by the indexing thread
before checking the results.

* Changes from review.
2022-06-24 13:53:51 -07:00
Didip Kerabat 6ddb828c7a
Able to filter Cloud objects with glob notation. (#12659)
In a heterogeneous environment, sometimes you don't have control over the input folder. Upstream can put any folder they want. In this situation the S3InputSource.java is unusable.

Most people like me solved it by using Airflow to fetch the full list of parquet files and pass it over to Druid. But doing this explodes the JSON spec. We had a situation where 1 of the JSON spec is 16MB and that's simply too much for Overlord.

This patch allows users to pass {"filter": "*.parquet"} and let Druid performs the filtering of the input files.

I am using the glob notation to be consistent with the LocalFirehose syntax.
2022-06-24 11:40:08 +05:30
Paul Rogers ffcb996468
Cleanup changes pulled out of PR #12368 (#12672)
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
2022-06-23 23:19:50 +05:30
Tejaswini Bandlamudi a85b1d8985
Lazy Initialisation of Orc extensions module (#12663)
* Lazy initialization of Orc extension

* nit

* moving intialize method to OrcInputFormat
2022-06-21 11:13:10 +05:30
AmatyaAvadhanula f970757efc
Optimize overlord GET /tasks memory usage (#12404)
The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API)

Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid )

The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.
2022-06-16 22:30:37 +05:30
Dongjoon Hyun 79f86a0511
Upgrade ORC to 1.7.4 (#12572)
This commit upgrades Apache ORC library from 1.7.2 to 1.7.4.
Apache ORC 1.7.4 is the maintenance release with the following bug fixes.

https://orc.apache.org/news/2022/04/15/ORC-1.7.4/
https://github.com/apache/orc/releases/tag/v1.7.4
2022-05-28 17:44:36 +05:30
Gian Merlino 4631cff2a9
Free ByteBuffers in tests and fix some bugs. (#12521)
* Ensure ByteBuffers allocated in tests get freed.

Many tests had problems where a direct ByteBuffer would be allocated
and then not freed. This is bad because it causes flaky tests.

To fix this:

1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder.
   This makes it easy to free the direct buffer. Currently, it's only used
   in tests, because production code seems OK.

2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either
   to ByteBuffer.allocate (on-heap, which are garbaged collected), or to
   ByteBufferUtils.allocateDirect (wherever it seemed like there was a good
   reason for the buffer to be off-heap). Make sure to close all direct
   holders when done.

* Changes based on CI results.

* A different approach.

* Roll back BitmapOperationTest stuff.

* Try additional surefire memory.

* Revert "Roll back BitmapOperationTest stuff."

This reverts commit 49f846d9e3.

* Add TestBufferPool.

* Revert Xmx change in tests.

* Better behaved NestedQueryPushDownTest. Exit tests on OOME.

* Fix TestBufferPool.

* Remove T1C from ARM tests.

* Somewhat safer.

* Fix tests.

* Fix style stuff.

* Additional debugging.

* Reset null / expr configs better.

* ExpressionLambdaAggregatorFactory thread-safety.

* Alter forkNode to try to get better info when a JVM crashes.

* Fix buffer retention in ExpressionLambdaAggregatorFactory.

* Remove unused import.
2022-05-19 07:42:29 -07:00
Kashif Faraz 7ab2170802
Use datasketches version 3.2.0 (#12509)
Changes:
- Use apache datasketches version 3.2.0.
- Remove unsafe reflection-based usage of datasketch internals added in #12022
2022-05-13 11:28:15 +05:30
Lucas Capistrant 39e7191f03
Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030)
* Add authentication call before cleaning up intermediate files in hadoop ingestions

* fix checkstyle

* remove debug log
2022-05-02 08:40:44 -05:00
MC-JY bb080693a9
Improve build performance of modules (#12486)
* improve build performance of modules

* improve build performance of modules

* Update pom.xml

* improve build performance of modules
2022-05-01 22:43:11 +08:00
Gian Merlino 529b983ad0
GroupBy: Reduce allocations by reusing entry and key holders. (#12474)
* GroupBy: Reduce allocations by reusing entry and key holders.

Two main changes:

1) Reuse Entry objects returned by various implementations of
   Grouper.iterator.

2) Reuse key objects contained within those Entry objects.

This is allowed by the contract, which states that entries must be
processed and immediately discarded. However, not all call sites
respected this, so this patch also updates those call sites.

One particularly sneaky way that the old code retained entries too long
is due to Guava's MergingIterator and CombiningIterator. Internally,
these both advance to the next value prior to returning the current
value. So, this patch addresses that in two ways:

1) For merging, we have our own implementation MergeIterator already,
   although it had the same problem. So, this patch updates our
   implementation to return the current item prior to advancing to the
   next item. It also adds a forbidden-api entry to ensure that this
   safer implementation is used instead of Guava's.

2) For combining, we address the problem in a different way: by copying
   the key when creating the new, combined entry.

* Attempt to fix test.

* Remove unused import.
2022-04-28 23:21:13 -07:00
Abhishek Agarwal 2fe053c5cb
Bump up the versions (#12480) 2022-04-27 14:28:20 +05:30
Jihoon Son 73ce5df22d
Add support for authorizing query context params (#12396)
The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below.

Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params.
User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters.
System context params. They are set by the Druid query engine during query processing. These params override other context params.
Today, any context params are allowed to users. This can cause 
1) a bad UX if the context param is not matured yet or 
2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows.

This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission.

{
  "resourceAction" : {
    "resource" : {
      "name" : "maxSubqueryRows",
      "type" : "QUERY_CONTEXT"
    },
    "action" : "WRITE"
  },
  "resourceNamePattern" : "maxSubqueryRows"
}
Each role can have multiple permissions for context params. Each permission should be set for different context params.

When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case,

HTTP endpoints will return 403 response code.
JDBC will throw ForbiddenException.
Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService.

The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.
2022-04-21 14:21:16 +05:30
PJ Fanning 341c65738d
issue-12426 upgrade k8s client due to cve (#12427)
* issue-12426 upgrade k8s client due to cve

* compile issues

* try to fix license check
2022-04-21 10:11:55 +08:00
Rohan Garg de9f12b5c6
Fail fast incase a lookup load fails (#12397)
Currently while loading a lookup for the first time, loading threads blocks
for `waitForFirstRunMs` incase the lookup failed to load. If the `waitForFirstRunMs`
is long (like 10 minutes), such blocking can slow down the loading of other lookups.

This commit allows the thread to progress as soon as the loading of the lookup fails.
2022-04-18 13:14:02 +05:30
AmatyaAvadhanula 067254b778
Package kinesis client jar within the extension (#12370)
amazon-kinesis-client was not covered undered the apache license and required separate insertion in the kinesis extension.
This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities.

Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.
2022-04-04 21:31:18 +05:30
AmatyaAvadhanula c5531be553
Add feature flag for Kinesis listShards API usage (#12383)
listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161.

However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html).

A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.
2022-04-04 14:58:10 +05:30
Jihoon Son b6eeef31e5
Store null columns in the segments (#12279)
* Store null columns in the segments

* fix test

* remove NullNumericColumn and unused dependency

* fix compile failure

* use guava instead of apache commons

* split new tests

* unused imports

* address comments
2022-03-23 16:54:04 -07:00
Kyle Larose db91961af7
kubernetes: restart watch on null response (#12233)
* kubernetes: restart watch on null response

Kubernetes watches allow a client to efficiently processes changes to
resources. However, they have some idiosyncrasies. In particular, they
can error out for various reasons leading to what would normally be seen
as an invalid result.

The Druid kubernetes node discovery subsystem does not handle a certain
case properly. The watch can return an item with a null object.  These
leads to a null pointer exception. When this happens, the provider needs
to restart the watch, because rerunning the watch from the same resource
version leads to the same result: yet another null pointer exception.

This commit changes the provider to handle null objects by restarting
the watch.

* review: add more coverage

This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch
loop, and removes an unnecessay return.

* kubernetes: reduce logging verbosity

The log messages about items being NULL don't really deserve to be at a
level other than DEBUG since they are not actionable, particularly since
we automatically recover now. Move them to the DEBUG level.
2022-03-10 12:56:40 -08:00
Parag Jain 2efb74ff1e
fix supervisor auto scaler config serde bug (#12317) 2022-03-09 16:17:12 -08:00