druid

Commit Graph

Author	SHA1	Message	Date
Tejaswini Bandlamudi	a85b1d8985	Lazy Initialisation of Orc extensions module (#12663 ) * Lazy initialization of Orc extension * nit * moving intialize method to OrcInputFormat	2022-06-21 11:13:10 +05:30
AmatyaAvadhanula	f970757efc	Optimize overlord GET /tasks memory usage (#12404 ) The web-console (indirectly) calls the Overlord’s GET tasks API to fetch the tasks' summary which in turn queries the metadata tasks table. This query tries to fetch several columns, including payload, of all the rows at once. This introduces a significant memory overhead and can cause unresponsiveness or overlord failure when the ingestion tab is opened multiple times (due to several parallel calls to this API) Another thing to note is that the task table (the payload column in particular) can be very large. Extracting large payloads from such tables can be very slow, leading to slow UI. While we are fixing the memory pressure in the overlord, we can also fix the slowness in UI caused by fetching large payloads from the table. Fetching large payloads also puts pressure on the metadata store as reported in the community (Metadata store query performance degrades as the tasks in druid_tasks table grows · Issue #12318 · apache/druid ) The task summaries returned as a response for the API are several times smaller and can fit comfortably in memory. So, there is an opportunity here to fix the memory usage, slow ingestion, and under-pressure metadata store by removing the need to handle large payloads in every layer we can. Of course, the solution becomes complex as we try to fix more layers. With that in mind, this page captures two approaches. They vary in complexity and also in the degree to which they fix the aforementioned problems.	2022-06-16 22:30:37 +05:30
Dongjoon Hyun	79f86a0511	Upgrade ORC to 1.7.4 (#12572 ) This commit upgrades Apache ORC library from 1.7.2 to 1.7.4. Apache ORC 1.7.4 is the maintenance release with the following bug fixes. https://orc.apache.org/news/2022/04/15/ORC-1.7.4/ https://github.com/apache/orc/releases/tag/v1.7.4	2022-05-28 17:44:36 +05:30
Gian Merlino	4631cff2a9	Free ByteBuffers in tests and fix some bugs. (#12521 ) * Ensure ByteBuffers allocated in tests get freed. Many tests had problems where a direct ByteBuffer would be allocated and then not freed. This is bad because it causes flaky tests. To fix this: 1) Add ByteBufferUtils.allocateDirect(size), which returns a ResourceHolder. This makes it easy to free the direct buffer. Currently, it's only used in tests, because production code seems OK. 2) Update all usages of ByteBuffer.allocateDirect (off-heap) in tests either to ByteBuffer.allocate (on-heap, which are garbaged collected), or to ByteBufferUtils.allocateDirect (wherever it seemed like there was a good reason for the buffer to be off-heap). Make sure to close all direct holders when done. * Changes based on CI results. * A different approach. * Roll back BitmapOperationTest stuff. * Try additional surefire memory. * Revert "Roll back BitmapOperationTest stuff." This reverts commit `49f846d9e3`. * Add TestBufferPool. * Revert Xmx change in tests. * Better behaved NestedQueryPushDownTest. Exit tests on OOME. * Fix TestBufferPool. * Remove T1C from ARM tests. * Somewhat safer. * Fix tests. * Fix style stuff. * Additional debugging. * Reset null / expr configs better. * ExpressionLambdaAggregatorFactory thread-safety. * Alter forkNode to try to get better info when a JVM crashes. * Fix buffer retention in ExpressionLambdaAggregatorFactory. * Remove unused import.	2022-05-19 07:42:29 -07:00
Kashif Faraz	7ab2170802	Use datasketches version 3.2.0 (#12509 ) Changes: - Use apache datasketches version 3.2.0. - Remove unsafe reflection-based usage of datasketch internals added in #12022	2022-05-13 11:28:15 +05:30
Lucas Capistrant	39e7191f03	Add authentication call before cleaning up intermediate files in hadoop ingestions (#12030 ) * Add authentication call before cleaning up intermediate files in hadoop ingestions * fix checkstyle * remove debug log	2022-05-02 08:40:44 -05:00
MC-JY	bb080693a9	Improve build performance of modules (#12486 ) * improve build performance of modules * improve build performance of modules * Update pom.xml * improve build performance of modules	2022-05-01 22:43:11 +08:00
Gian Merlino	529b983ad0	GroupBy: Reduce allocations by reusing entry and key holders. (#12474 ) * GroupBy: Reduce allocations by reusing entry and key holders. Two main changes: 1) Reuse Entry objects returned by various implementations of Grouper.iterator. 2) Reuse key objects contained within those Entry objects. This is allowed by the contract, which states that entries must be processed and immediately discarded. However, not all call sites respected this, so this patch also updates those call sites. One particularly sneaky way that the old code retained entries too long is due to Guava's MergingIterator and CombiningIterator. Internally, these both advance to the next value prior to returning the current value. So, this patch addresses that in two ways: 1) For merging, we have our own implementation MergeIterator already, although it had the same problem. So, this patch updates our implementation to return the current item prior to advancing to the next item. It also adds a forbidden-api entry to ensure that this safer implementation is used instead of Guava's. 2) For combining, we address the problem in a different way: by copying the key when creating the new, combined entry. * Attempt to fix test. * Remove unused import.	2022-04-28 23:21:13 -07:00
Abhishek Agarwal	2fe053c5cb	Bump up the versions (#12480 )	2022-04-27 14:28:20 +05:30
Jihoon Son	73ce5df22d	Add support for authorizing query context params (#12396 ) The query context is a way that the user gives a hint to the Druid query engine, so that they enforce a certain behavior or at least let the query engine prefer a certain plan during query planning. Today, there are 3 types of query context params as below. Default context params. They are set via druid.query.default.context in runtime properties. Any user context params can be default params. User context params. They are set in the user query request. See https://druid.apache.org/docs/latest/querying/query-context.html for parameters. System context params. They are set by the Druid query engine during query processing. These params override other context params. Today, any context params are allowed to users. This can cause 1) a bad UX if the context param is not matured yet or 2) even query failure or system fault in the worst case if a sensitive param is abused, ex) maxSubqueryRows. This PR adds an ability to limit context params per user role. That means, a query will fail if you have a context param set in the query that is not allowed to you. To do that, this PR adds a new built-in resource type, QUERY_CONTEXT. The resource to authorize has a name of the context param (such as maxSubqueryRows) and the type of QUERY_CONTEXT. To allow a certain context param for a user, the user should be granted WRITE permission on the context param resource. Here is an example of the permission. { "resourceAction" : { "resource" : { "name" : "maxSubqueryRows", "type" : "QUERY_CONTEXT" }, "action" : "WRITE" }, "resourceNamePattern" : "maxSubqueryRows" } Each role can have multiple permissions for context params. Each permission should be set for different context params. When a query is issued with a query context X, the query will fail if the user who issued the query does not have WRITE permission on the query context X. In this case, HTTP endpoints will return 403 response code. JDBC will throw ForbiddenException. Note: there is a context param called brokerService that is used only by the router. This param is used to pin your query to run it in a specific broker. Because the authorization is done not in the router, but in the broker, if you have brokerService set in your query without a proper permission, your query will fail in the broker after routing is done. Technically, this is not right because the authorization is checked after the context param takes effect. However, this should not cause any user-facing issue and thus should be OK. The query will still fail if the user doesn’t have permission for brokerService. The context param authorization can be enabled using druid.auth.authorizeQueryContextParams. This is disabled by default to avoid any hassle when someone upgrades his cluster blindly without reading release notes.	2022-04-21 14:21:16 +05:30
PJ Fanning	341c65738d	issue-12426 upgrade k8s client due to cve (#12427 ) * issue-12426 upgrade k8s client due to cve * compile issues * try to fix license check	2022-04-21 10:11:55 +08:00
Rohan Garg	de9f12b5c6	Fail fast incase a lookup load fails (#12397 ) Currently while loading a lookup for the first time, loading threads blocks for `waitForFirstRunMs` incase the lookup failed to load. If the `waitForFirstRunMs` is long (like 10 minutes), such blocking can slow down the loading of other lookups. This commit allows the thread to progress as soon as the loading of the lookup fails.	2022-04-18 13:14:02 +05:30
AmatyaAvadhanula	067254b778	Package kinesis client jar within the extension (#12370 ) amazon-kinesis-client was not covered undered the apache license and required separate insertion in the kinesis extension. This can now be avoided since it is covered, and including it within druid helps prevent incompatibilities. Allows enabling of deaggregation out of the box by packaging amazon-kinesis-client (1.14.4) with druid for kinesis ingestion.	2022-04-04 21:31:18 +05:30
AmatyaAvadhanula	c5531be553	Add feature flag for Kinesis listShards API usage (#12383 ) listShards API was used to get all the shards for kinesis ingestion to improve its resiliency as part of #12161. However, this may require additional permissions in the IAM policy where the stream is present. (Please refer to: https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListShards.html). A dynamic configuration useListShards has been added to KinesisSupervisorTuningConfig to control the usage of this API and prevent issues upon upgrade. It can be safely turned on (and is recommended when using kinesis ingestion) by setting this configuration to true.	2022-04-04 14:58:10 +05:30
Jihoon Son	b6eeef31e5	Store null columns in the segments (#12279 ) * Store null columns in the segments * fix test * remove NullNumericColumn and unused dependency * fix compile failure * use guava instead of apache commons * split new tests * unused imports * address comments	2022-03-23 16:54:04 -07:00
Kyle Larose	db91961af7	kubernetes: restart watch on null response (#12233 ) * kubernetes: restart watch on null response Kubernetes watches allow a client to efficiently processes changes to resources. However, they have some idiosyncrasies. In particular, they can error out for various reasons leading to what would normally be seen as an invalid result. The Druid kubernetes node discovery subsystem does not handle a certain case properly. The watch can return an item with a null object. These leads to a null pointer exception. When this happens, the provider needs to restart the watch, because rerunning the watch from the same resource version leads to the same result: yet another null pointer exception. This commit changes the provider to handle null objects by restarting the watch. * review: add more coverage This adds a bit more coverage to the K8sDruidNodeDiscoveryProvider watch loop, and removes an unnecessay return. * kubernetes: reduce logging verbosity The log messages about items being NULL don't really deserve to be at a level other than DEBUG since they are not actionable, particularly since we automatically recover now. Move them to the DEBUG level.	2022-03-10 12:56:40 -08:00
Parag Jain	2efb74ff1e	fix supervisor auto scaler config serde bug (#12317 )	2022-03-09 16:17:12 -08:00
Abhishek Agarwal	6346b9561d	Reuse the InputEntityReader in SettableByteEntityReader (#12269 ) * Reuse the InputEntityReader in SettableByteEntityReader * Fix logic * Fix kafka streaming ingestion * Add Tests for kafka input format change * Address review comments	2022-03-09 14:38:31 -08:00
Clint Wylie	0600772cce	use a non-concurrent map for lookups-cached-global unless incremental updates are actually required (#12293 ) * use a non-concurrent map for lookups-cached-global unless incremental updates are actually required * adjustments * fix test	2022-03-08 21:54:25 -08:00
Gian Merlino	28f8bcce9b	Always reopen stream in FileUtils.copyLarge, RetryingInputStream. (#12307 ) * Always reopen stream in FileUtils.copyLarge, RetryingInputStream. When an InputStream throws an exception from one of its read methods, we should assume it's bad and reopen it. The main changes here are: - In FileUtils.copyLarge, replace InputStream with InputStreamSupplier. - In RetryingInputStream, collapse retryCondition and resetCondition into a single condition. Also, make it required, since every usage is passing in a specific condition anyway. * Test fixes. * Fix read impl.	2022-03-05 14:39:14 -08:00
Frank Chen	36bc41855d	Set Content-Type for String based response (#12295 )	2022-03-04 15:17:03 +08:00
Alexander Saydakov	50038d9344	latest datasketches-java-3.1.0 (#12224 ) These changes are to use the latest datasketches-java-3.1.0 and also to restore support for quantile and HLL4 sketches to be able to grow larger than a given buffer in a buffer aggregator and move to heap in rare cases. This was discussed in #11544. Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2022-03-01 17:14:42 -08:00
Laksh Singla	3f709db173	Make ParseExceptions more informative (#12259 ) This PR aims to make the ParseExceptions in Druid more informative, by adding additional information (metadata) to the ParseException, which can contain additional information about the exception. For example - the path of the file generating the issue, the line number (where it can be easily fetched - like CsvReader) Following changes are addressed in this PR: A new class CloseableIteratorWithMetadata has been created which is like CloseableIterator but also has a metadata method that returns a context Map<String, Object> about the current element returned by next(). IntermediateRowParsingReader#read() now attaches the InputEntity and the "record number" which created the exception (while parsing them), and IntermediateRowParsingReader#sample attaches the InputEntity (but not the "record number"). TextReader (and its subclasses), which is a specific implementation of the IntermediateRowParsingReader also include the line number which caused the generation of the error. This will also help in triaging the issues when InputSourceReader generates ParseException because it can point to the specific InputEntity which caused the exception (while trying to read it).	2022-02-28 22:31:15 +05:30
Xavier Léauté	d105519558	Replace use of PowerMock with Mockito (#12282 ) Mockito now supports all our needs and plays much better with recent Java versions. Migrating to Mockito also simplifies running the kind of tests that required PowerMock in the past. * replace all uses of powermock with mockito-inline * upgrade mockito to 4.3.1 and fix use of deprecated methods * import mockito bom to align all our mockito dependencies * add powermock to forbidden-apis to avoid accidentally reintroducing it in the future	2022-02-27 22:47:09 -08:00
Xavier Léauté	4c61878f9c	Reduce use of mocking and simplify some tests (#12283 ) * remove use of mocks for ServiceMetricEvent * simplify KafkaEmitterTests by moving to Mockito * speed up KafkaEmitterTest by adjusting reporting frequency in tests * remove unnecessary easymock and JUnitParams dependencies	2022-02-26 17:23:09 -08:00
Jihoon Son	e5ad862665	A new includeAllDimension flag for dimensionsSpec (#12276 ) * includeAllDimensions in dimensionsSpec * doc * address comments * unused import and doc spelling	2022-02-25 18:27:48 -08:00
Karan Kumar	b86f2d4c2e	Performance fixes in proto readers (#12267 )	2022-02-24 23:21:48 +05:30
Xavier Léauté	009dd9e09a	upgrade core Apache Kafka dependencies to 3.1.0 (#12203 ) Announcement: https://blogs.apache.org/kafka/entry/what-s-new-in-apache7 Release notes: https://dist.apache.org/repos/dist/release/kafka/3.1.0/RELEASE_NOTES.html * upgrade core Apache Kafka dependencies to 3.1.0 * fix use of private Kafka APIs * remove deprecated test rules * remove mock calls that weren't verified in the first place * remove the need for powermock in KafkaLookupExtractorFactoryTest * align curator-test version with curator itself * update easymock to 4.3.0	2022-02-23 18:42:51 -08:00
Karan Kumar	b94390ba33	Adding Shared Access resource support for azure (#12266 ) Azure Blob storage has multiple modes of authentication. One of them is Shared access resource . This is very useful in cases when we do not want to add the account key in the druid properties .	2022-02-22 18:27:43 +05:30
AmatyaAvadhanula	1ec57cb935	Improve kinesis task assignment after resharding (#12235 ) Problem: - When a kinesis stream is resharded, the original shards are closed. Any intermediate shard created in the process is eventually closed as well. - If a shard is closed before any record is put into it, it can be safely ignored for ingestion. - It is expensive to determine if a closed shard is empty, since it requires a call to the Kinesis cluster. Changes: - Maintain a cache of closed empty and closed non-empty shards in `KinesisSupervisor` - Add config `skipIngorableShards` to `KinesisSupervisorTuningConfig` - The caches are used and updated only when `skipIgnorableShards = true`	2022-02-18 12:37:06 +05:30
William Hyun	34bc361953	Update ORC to 1.7.2 (#12084 )	2022-02-15 10:04:12 -08:00
Clint Wylie	3ee66bb492	allow optimizing sql expressions and virtual columns (#12241 ) * rework sql planner expression and virtual column handling * simplify a bit * add back and deprecate old methods, more tests, fix multi-value string coercion bug and associated tests * spotbugs * fix bugs with multi-value string array expression handling * javadocs and adjust test * better * fix tests	2022-02-09 14:55:50 -08:00
Jihoon Son	ab3d994a17	Lazy instantiation for segmentKillers, segmentMovers, and segmentArchivers (#12207 ) * working * Lazily load segmentKillers, segmentMovers, and segmentArchivers * more tests * test-jar plugin * more coverage * lazy client * clean up changes * checkstyle * i did not change the branch condition * adjust failure rate to run tests faster * javadocs * checkstyle	2022-02-08 13:02:06 -08:00
Gian Merlino	de82c611de	Harmonize implementations of "visit" for Exprs from ExprMacros. (#12230 ) * Harmonize implementations of "visit" for Exprs from ExprMacros. Many of them had bugs where they would not visit all of the original arguments. I don't think this has user-visible consequences right now, but it's possible it would in a future world where "visit" is used for more stuff than it is today. So, this patch all updates all implementations to a more consistent style that emphasizes reapplying the macro to the shuttled args. * Test fixes, test coverage, PR review comments.	2022-02-04 08:08:54 -08:00
Kashif Faraz	e648b01afb	Improve memory estimates in Aggregator and DimensionIndexer (#12073 ) Fixes #12022 ### Description The current implementations of memory estimation in `OnHeapIncrementalIndex` and `StringDimensionIndexer` tend to over-estimate which leads to more persistence cycles than necessary. This PR replaces the max estimation mechanism with getting the incremental memory used by the aggregator or indexer at each invocation of `aggregate` or `encode` respectively. ### Changes - Add new flag `useMaxMemoryEstimates` in the task context. This overrides the same flag in DefaultTaskConfig i.e. `druid.indexer.task.default.context` map - Add method `AggregatorFactory.factorizeWithSize()` that returns an `AggregatorAndSize` which contains the aggregator instance and the estimated initial size of the aggregator - Add method `Aggregator.aggregateWithSize()` which returns the incremental memory used by this aggregation step - Update the method `DimensionIndexer.processRowValsToKeyComponent()` to return the encoded key component as well as its effective size in bytes - Update `OnHeapIncrementalIndex` to use the new estimations only if `useMaxMemoryEstimates = false`	2022-02-03 10:34:02 +05:30
Clint Wylie	978b8f7dde	do not explode if mysql transient exception class does not exist (#12213 ) Follow up to #12205 to allow druid-mysql-extensions to work with mysql connector/j 8.x again, which does not contain MySQLTransientException, and while would have had the same problem as mariadb if a transient exception was checked, the new check eagerly loads the class when starting up, causing immediate failure.	2022-02-01 09:06:24 +05:30
Clint Wylie	5d2291991e	use reflection to check for mysql transient exception type (#12205 ) * use reflection to check for mysql transient exception type * better * oops	2022-01-27 13:13:16 -08:00
AmatyaAvadhanula	1f63b447c4	Mitigate Kinesis stream LimitExceededException by using listShards API (#12161 ) Makes kinesis ingestion resilient to `LimitExceededException` caused by resharding. Replace `describeStream` with `listShards` (recommended) to get shard related info. `describeStream` has a limit (100) to the number of shards returned per call and a low default TPS limit of 10. `listShards` returns the info for at most 1000 shards and has a higher TPS limit of 100 as well. Key changed/added classes in this PR * `KinesisRecordSupplier` * `KinesisAdminClient`	2022-01-21 10:15:51 +05:30
Abhishek Agarwal	53c0e489c2	Fix infinite retrying during task pausing (#12167 ) This fixes a bug that causes TaskClient in overlord to continuously retry to pause tasks. This can happen when a task is not responding to the pause command. Ideally, in such a case when the task is unresponsive, the overlord would have given up after a few retries and would have killed the task. However, due to this bug, retries go on forever.	2022-01-19 09:03:36 +05:30
Jonathan Wei	74c876e578	Throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder (#12080 ) * Add option to throw parse exceptions on schema get errors for SchemaRegistryBasedAvroBytesDecoder * Remove option	2022-01-13 12:36:51 -06:00
imply-cheddar	b153cb2342	Add a small LRU cache and use utf8 bytes in ArrayOfDoubles (#12130 ) * Add a small LRU cache and use utf8 bytes in ArrayOfDoubles * Add tests for extra branches * Even more tests for branch coverage * Fix Style	2022-01-11 13:04:11 -08:00
somu-imply	08fea7a46a	input type validation for datasketches hll "build" aggregator factory (#12131 ) * Ingestion will fail for HLLSketchBuild instead of creating with incorrect values * Addressing review comments for HLL< updated error message introduced test case	2022-01-11 12:00:14 -08:00
Suneet Saldanha	25ac04e067	MySqlFirehoseDatabaseConnector uses configured driver class name (#12049 )	2021-12-09 20:58:55 -08:00
Frank Chen	58245b4617	Support JsonPath functions in JsonPath expressions (#11722 ) * Add jsonPath functions support * Add jsonPath function test for Avro * Add jsonPath function length() to Orc * Add jsonPath function length() to Parquet * Add more tests to ORC format * update doc * Fix exception during ingestion * Add IT test case * Revert "Fix exception during ingestion" This reverts commit `5a5484b9ea`. * update IT test case * Add 'keys()' * Commit IT test case * Fix UT	2021-12-10 10:53:23 +08:00
Jonathan Wei	229f82a6f0	Add parse error list API for stream supervisors, use structured object for parse exceptions, simplify parse exception message (#11961 ) * Add parse error list API for stream supervisors, simplify parse exception message * Add input string to parse exception * Use structured ParseExceptionReport * Fix tests * Add test * PR comments, add ParseExceptionReport equals verifier * Fix test	2021-12-09 15:42:55 -06:00
zachjsh	65cadbe42a	Fix bad lookup config fails task (#12021 ) This PR fixes an issue in which if a lookup is configured incorreclty; does not serialize properly when being pulled by peon node, it causes the task to fail. The failure occurs because the peon and other leaf nodes (broker, historical), have retry logic that continues to retry the lookup loading for 3 minutes by default. The http listener thread on the peon task is not started until lookup loading completes, by default, the overlord waits 1 minute by default, to communicate with the peon task to get the task status, after which is orders the task to shut down, causing the ingestion task to fail. To fix the issue, we catch the exception serialization error, and do not retry. Also fixed an issue in which a bad lookup config interferes with any other good lookup configs from being loaded.	2021-12-07 00:55:34 -05:00
Gian Merlino	e0e05aad99	Enhancements to IndexTaskClient. (#12011 ) * Enhancements to IndexTaskClient. 1) Ability to use handlers other than StringFullResponseHandler. This functionality is not used in production code yet, but is useful because it will allow tasks to communicate with each other in non-string-based formats and in streaming fashion. In the future, we'll be able to use this to make task-to-task communication more efficient. 2) Truncate server errors at 1KB, so long errors do not pollute logs. 3) Change error log level for retryable errors from WARN to INFO. (The final error is still WARN.) 4) Harmonize log and exception messages to have a more consistent format. * Additional tests and improvements.	2021-12-03 09:14:32 -08:00
Frank Chen	c2cea25a6b	Improve exception message when loading data from web-console (#11723 ) * Improve exception handling * Revert some changes * Resolve comments * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * Update indexing-service/src/main/java/org/apache/druid/indexing/overlord/sampler/SamplerExceptionMapper.java Co-authored-by: Karan Kumar <karankumar1100@gmail.com> * Address review comments Co-authored-by: Karan Kumar <karankumar1100@gmail.com>	2021-12-03 21:33:49 +08:00
Abhishek Agarwal	503384569a	Fix classNotFoundException when connecting to secure LDAP (#11978 ) This PR fixes a problem where the com.sun.jndi.ldap.Connection tries to build BasicSecuritySSLSocketFactory when calling LDAPCredentialsValidator.validateCredentials since BasicSecuritySSLSocketFactory is in extension class loader and not visible to system classloader.	2021-12-03 12:08:19 +05:30
Clint Wylie	84b4bf56d8	vectorize logical operators and boolean functions (#11184 ) changes: * adds new config, druid.expressions.useStrictBooleans which make longs the official boolean type of all expressions * vectorize logical operators and boolean functions, some only if useStrictBooleans is true	2021-12-02 16:40:23 -08:00

1 2 3 4 5 ...

956 Commits