druid

Commit Graph

Author	SHA1	Message	Date
Himanshu	a0d52c3def	k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value (#10961 ) * k8s discovery module: fix issue for druid.host being more than 63chars not permitted as k8s resource label value * update doc * fix test	2021-04-07 17:45:28 -07:00
Jihoon Son	cc12a57034	Enforce allow list for JDBC properties by default (#11063 ) * Enforce allow list for JDBC properties by default * fix tests	2021-04-06 19:46:19 -07:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Parag Jain	2fdc313e4d	GCS lookup support (#11026 ) * GCS lookup support * checkstyle fix * review comments * review comments * remove unused import	2021-03-30 01:40:41 +05:30
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
zhangyue19921010	8b4f966708	[BUG FIX]Kinesis lag keep increasing when there is no more new data for kinesis stream (#11006 ) * fix kinesis lag metrics bug and modify current UT * done * revert misc.xml * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-19 07:47:27 -07:00
Maytas Monsereenusorn	ed91a2bb38	Fix Kinesis should not increment throwAway count on EOS record (#10976 ) * fix Kinesis increament throwAway on EOS record * fix checkstyle * fix IT * fix test * fix IT * fix IT * fix IT * fix IT	2021-03-11 22:04:58 -08:00
Yi Yuan	36e86a2880	Add protobuf schema registry (#10839 ) * dd_protobuf_schema_registry * change licese * delete some annotation * nodify tests * delete extra exception * add licenses * add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version * seperate kafka-protobuf-provider * modify protobuf.md * refine protobuf.md * add config and header * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-03-09 15:15:51 -08:00
Maytas Monsereenusorn	4dd22a850b	Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962 ) * Fix streaming ingestion fails and halt if it encounters empty rows * address comments	2021-03-09 12:11:58 -08:00
Clint Wylie	96889cdebc	add avro + kafka + schema registry integration test (#10929 ) * add avro + schema registry integration test * style * retry init * maybe this * oops heh * this will fix it * review stuffs * fix comment	2021-03-08 08:12:12 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Abhishek Agarwal	7d9a61cf7f	Suppress CVE-2017-15288 and upgrade bcprov-ext-jdk15o (#10933 )	2021-03-02 16:18:27 -08:00
spinatelli	99198c02af	Add config and header support for confluent schema registry. (#10314 ) * Add config and header support for confluent schema registry. (porting code from https://github.com/apache/druid/pull/9096) * Add Eclipse Public License 2.0 to license check * Update licenses.yaml, revert changes to check-licenses.py and dependencies for integration-tests * Add spelling exception and remove unused dependency * Use non-deprecated getSchemaById() and remove duplicated license entry * Update docs/ingestion/data-formats.md Co-authored-by: Clint Wylie <cjwylie@gmail.com> * Added check for schema being null, as per Confluent code * Missing imports and whitespace * Updated unit tests with AvroSchema Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@7-tv.de> Co-authored-by: Sergio Spinatelli <sergio.spinatelli.extern@joyn.de> Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-02-27 14:25:35 -08:00
Alexander Saydakov	f930cf14d6	Use the latest Apache DataSketches release 2.0.0 (#10917 ) * use the latest Apache DataSketches release 2.0.0 * updated datasketches version Co-authored-by: AlexanderSaydakov <AlexanderSaydakov@users.noreply.github.com>	2021-02-26 07:52:00 -06:00
Clint Wylie	0ecc90142e	basic security extension ignore permissions that use unknown ResourceType or Action (#10896 ) * suppress unknown ResourceType and Action for basic-security authorizer stuff * fix pom * print failed role, test logs	2021-02-23 14:49:09 -08:00
Benedict Jin	32e801ceab	Bump Apache Parquet from 1.11.0 to 1.11.1 (#10889 )	2021-02-17 12:18:17 +08:00
Clint Wylie	fe30f4b414	refactor sql lifecycle, druid planner, views, and view permissions (#10812 ) * before i leaped i should've seen, the view from halfway down * fixes * fixes, more test * rename * fix style * further refactoring * review stuffs * rename * more javadoc and comments	2021-02-05 12:56:55 -08:00
Abhishek Agarwal	96d26e5338	Fix kinesis ingestion bugs (#10761 ) * add offsetFetchPeriod to kinesis ingestion doc * Remove jackson dependencies from extensions * Use fixed delay for lag collection * Metrics reset after finishing processing * comments * Broaden the list of exceptions to retry for * Unit tests * Add more tests * Refactoring * re-order metrics * Doc suggestions Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * Add tests Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-02-05 02:49:58 -08:00
Jonathan Wei	a1a49811d9	Address CVE-2020-8570, suppress CVE-2020-8554 (#10826 ) * Address CVE-2020-8570, suppress CVE-2020-8554 * Update licenses.yaml	2021-02-03 15:17:06 -08:00
Gian Merlino	6c0c6e60b3	Vectorized theta sketch aggregator + rework of VectorColumnProcessorFactory. (#10767 ) * Vectorized theta sketch aggregator. Also a refactoring of BufferAggregator and VectorAggregator such that they share a common interface, BaseBufferAggregator. This allows implementing both in the same file with an abstract + dual subclass structure. * Rework implementation to use composition instead of inheritance. * Rework things to enable working properly for both complex types and regular types. Involved finally moving makeVectorProcessor from DimensionHandlerUtils into ColumnProcessors and harmonizing the two things. * Add missing method. * Style and name changes. * Fix issues from inspections. * Fix style issue.	2021-01-29 09:30:09 -08:00
Maytas Monsereenusorn	a46d561bd7	Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead (#10740 ) * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix checkstyle * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * fix test * fix test * add log * Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead * address comments * fix checkstyle * fix checkstyle * add config to skip overhead memory calculation * add test for the skipBytesInMemoryOverheadCheck config * add docs * fix checkstyle * fix checkstyle * fix spelling * address comments * fix travis * address comments	2021-01-27 00:34:56 -08:00
Himadri Singh	1c1b396eaa	AWS Web Identity / IRSA Support (#10541 ) * AWS Web Identity Support required for AWS IRSA * Update kinesis-ingestion.md * disabling coverage tests https://github.com/apache/druid/pull/10541#issuecomment-737558213 * exclude coverage * Update licenses.yaml	2021-01-25 18:44:02 +05:30
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Xavier Léauté	118b50195e	Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730 ) Today Kafka message support in streaming indexing tasks is limited to message values, and does not provide a way to expose Kafka headers, timestamps, or keys, which may be of interest to more specialized Druid input formats. For instance, Kafka headers may be used to indicate payload format/encoding or additional metadata, and timestamps are often omitted from values in Kafka streams applications, since they are included in the record. This change proposes to introduce KafkaRecordEntity as InputEntity, which would give input formats full access to the underlying Kafka record, including headers, key, timestamps. It would also open access to low-level information such as topic, partition, offset if needed. KafkaEntity is a subclass of ByteEntity for backwards compatibility with existing input formats, and to avoid introducing unnecessary complexity for Kinesis indexing tasks.	2021-01-08 16:04:37 -08:00
Jonathan Wei	c7f2d3fbb5	Update deps for CVE-2020-28168 and CVE-2020-28052 (#10733 ) * Update deps for CVE-2020-28168 and CVE-2020-28052 * Make BC runtime scope	2021-01-07 20:31:44 -08:00
Himanshu	c7b1212a43	AWS RDS token based password provider (#9518 ) * refresh db pwd * aws iam token password provider * fix analyze-dependencies build * fix doc build * add ut for BasicDataSourceExt * more doc updates * more doc update * moving aws token password provider to new extension * remove duplicate changes * make all config inline * extension docs * refresh db password in SQL Firehose code path as well * add ut * fix build * add new extension to distribution * rds lib is not provided * fix license build * add version to license * change parent version to 0.19.0-snapshot * address review comments * fix core/ code coverage * Update server/src/main/java/org/apache/druid/metadata/BasicDataSourceExt.java Co-authored-by: Clint Wylie <cjwylie@gmail.com> * address review comments * fix spellchecker * remove inadvertant website file change Co-authored-by: Clint Wylie <cjwylie@gmail.com>	2021-01-06 21:15:29 -08:00
Jonathan Wei	769c21cc87	Add sample method to IndexingServiceClient (#10729 ) * Add sample method to IndexingServiceClient * Add unit test * Fix LGTM	2021-01-05 15:02:44 -08:00
Xavier Léauté	b7a16d08a6	Update Apache Kafka to 2.7.0 (#10701 ) - align scala versions to match Kafka	2020-12-22 13:56:00 -08:00
Clint Wylie	da0eabaa01	integration test for coordinator and overlord leadership client (#10680 ) * integration test for coordinator and overlord leadership, added sys.servers is_leader column * docs * remove not needed * fix comments * fix compile heh * oof * revert unintended * fix tests, split out docker-compose file selection from starting cluster, use docker-compose down to stop cluster * fixes * style * dang * heh * scripts are hard * fix spelling * fix thing that must not matter since was already wrong ip, log when test fails * needs more heap * fix merge * less aggro	2020-12-17 22:50:12 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
zhangyue19921010	229b5f359f	Remove hard limitation that druid(after 0.15.0) only can consume Kafka version 0.11.x or better (#10551 ) * remove build in kafka consumer config : * modify druid docs of kafka indexing service * yuezhang * modify doc * modify docs * fix kafkaindexTaskTest.java * revert uncessary change * add more logs and modify docs * revert jdk version * modify docs * modify-kafka-version v2 * modify docs * modify docs * modify docs * modify docs * modify docs * done * remove useless import * change code and add UT Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-12-03 17:37:59 -08:00
Himanshu	7e9522870f	introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309 ) * introduce DynamicConfigProvider interface and make kafka consumer props extensible * fix intellij inspection error * make DynamicConfigProvider generic Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3 * deprecate PasswordProvider	2020-12-02 16:38:27 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Clint Wylie	6563599de4	modify access to protected SQLMetadataConnector methods to allow extensions to create SQL metadata tables using implementation specific constructs (payload type, serial type, etc) (#10573 )	2020-11-12 23:20:01 +05:30
Clint Wylie	d0821de854	support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499 ) * support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions * inspector * changes * more test * clean	2020-10-26 19:55:24 -07:00
Liran Funaro	f3a2903218	Configurable Index Type (#10335 ) * Introduce a Configurable Index Type * Change to @UnstableApi * Add AppendableIndexSpecTest * Update doc * Add spelling exception * Add tests coverage * Revert some of the changes to reduce diff * Minor fixes * Update getMaxBytesInMemoryOrDefault() comment * Fix typo, remove redundant interface * Remove off-heap spec (postponed to a later PR) * Add javadocs to AppendableIndexSpec * Describe testCreateTask() * Add tests for AppendableIndexSpec within TuningConfig * Modify hashCode() to conform with equals() * Add comment where building incremental-index * Add "EqualsVerifier" tests * Revert some of the API back to AppenderatorConfig * Don't use multi-line comments * Remove knob documentation (deferred)	2020-10-23 18:34:26 -07:00
Joseph Glanville	7ce9ac4548	Fix Avro support in Web Console (#10232 ) * Fix Avro OCF detection prefix and run formation detection on raw input * Support Avro Fixed and Enum types correctly * Check Avro version byte in format detection * Add test for AvroOCFReader.sample Ensures that the Sampler doesn't receive raw input that it can't serialize into JSON. * Document Avro type handling * Add TS unit tests for guessInputFormat	2020-10-07 21:08:22 -07:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Lasse Krogh Mammen	20ca9aaaf7	Allow using jsonpath predicates with AvroFlattener (#10330 )	2020-10-02 10:14:48 +01:00
Abhishek Agarwal	d057c5149f	Fix the offset setting in GoogleStorage#get (#10449 ) * Fix the offset in get of GCP object * upgrade compute dependency * fix version * review comments * missed	2020-10-01 08:38:58 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Tarun	49a09302f3	Issue fix for CSV loading with header and skip header not parsing well. (#10398 )	2020-09-21 15:14:22 -07:00
Igor Dvorzhak	d0ee2e3a48	Upgrade ORC to 1.5.10 version (#10291 )	2020-09-18 13:38:45 -07:00
belugabehr	74368d95af	Remove JODA Time Dependency from Avro Extensions (#10010 )	2020-09-18 12:41:42 -07:00
Suneet Saldanha	0b4c897fbe	Vectorized variance aggregators (#10390 ) * wip vectorize * close but not quite * faster * unit tests * fix complex types for variance	2020-09-17 15:05:40 -07:00
Clint Wylie	184b202411	add computed Expr output types (#10370 ) * push down ValueType to ExprType conversion, tidy up * determine expr output type for given input types * revert unintended name change * add nullable * tidy up * fixup * more better * fix signatures * naming things is hard * fix inspection * javadoc * make default implementation of Expr.getOutputType that returns null * rename method * more test * add output for contains expr macro, split operation and function auto conversion	2020-09-14 18:18:56 -07:00
Jihoon Son	8f14ac814e	More structured way to handle parse exceptions (#10336 ) * More structured way to handle parse exceptions * checkstyle; add more tests * forbidden api; test * address comment; new test * address review comments * javadoc for parseException; remove redundant parseException in streaming ingestion * fix tests * unnecessary catch * unused imports * appenderator test * unused import	2020-09-11 16:31:10 -07:00
Abhishek Agarwal	a5c46dc84b	Add vectorization for druid-histogram extension (#10304 ) * First draft * Remove redundant code from FixedBucketsHistogramAggregator classes * Add test cases for new classes * Fix tests in sql compatible mode * Typo fix * Fix comment * Add spelling * Vectorize only for supported types * Rename internal aggregator files * Fix tests	2020-09-09 13:56:33 -07:00

826 Commits