druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	efc5d7d112	Suppress CVEs for Solr and org.codehaus.jackson (#11030 ) * Suppress CVEs for Solr and org.codehaus.jackson * add a comment	2021-03-24 16:44:05 -07:00
Charles Smith	d69533dbd9	First refactor of compaction (#10935 ) * first pass compaction refactor. includes updated behavior for queryGranularity. removes duplicated doc * fix links, typos, some reorganization * fix spelling. TBD still there for work in progress * updates tutorial examples, adds more clarification around compaction use cases * add granularity spec to automatic compaction config * final edits * spelling fixes * apply suggestions from review * upadtes from review * last edits * move note * clarify null * fix links & spelling * latest review * edits to auto-compaction config * add back rollup * fix links & spelling * Update compaction.md add granularityspec to example	2021-03-24 11:41:44 -07:00
Maytas Monsereenusorn	c87ac0823f	Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap (#11019 ) * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * address comments	2021-03-24 11:37:29 -07:00
Jonathan Wei	8296123d89	Add resources used to EXPLAIN PLAN FOR output (#11024 )	2021-03-23 17:21:15 -07:00
Jihoon Son	6aec8f0c1b	allow multiple ldap bootstrap files for integration tests (#11023 )	2021-03-23 13:18:36 -07:00
Jihoon Son	a041933017	Allow overlapping intervals for the compaction task (#10912 ) * Allow overlapping intervals for the compaction task * unused import * line indentation Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2021-03-23 11:21:54 -07:00
Maytas Monsereenusorn	51d2c61f1c	Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity (#11009 ) * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * address comments * address comments * address comments * address comments * address comments	2021-03-19 17:38:28 -07:00
Samarth Jain	5fae7dfcf2	Fix regression introduced by #11008 (#11013 ) * Fix regression introduced by #11008 * Add back and tweak the check to not inspect resources for authorization when AllowAllAuthorizer is configured. Add a unit test to validate that the change doesn't introduce new behavior.	2021-03-19 17:15:03 -07:00
Benedict Jin	82c4d9dd92	Fix a resource leak in JobHelper (#10913 ) Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-03-19 17:02:31 -07:00
zhangyue19921010	8b4f966708	[BUG FIX]Kinesis lag keep increasing when there is no more new data for kinesis stream (#11006 ) * fix kinesis lag metrics bug and modify current UT * done * revert misc.xml * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-19 07:47:27 -07:00
Maytas Monsereenusorn	f19c2e9ce4	If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948 ) * add IT * add IT * add the fix * fix checkstyle * fix compile * fix compile * fix test * fix test * address comments	2021-03-18 17:04:28 -07:00
Samarth Jain	83fcab1d0f	Improve performance of queries against SYSTEM.SEGMENT table. (#11008 ) Size HashMap and HashSet appropriately. Perf analysis of the queries revealed that over 25% of the query time was spent in resizing HashMap and HashSet collections. Also, prevent the need to examine and authorize all resources when AllowAllAuthorizer is the configured authorizer.	2021-03-17 22:24:02 -07:00
Atul Mohan	3d7e7c2c83	Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213 ) * Skip queue removal on timeout * Clarify error * Add new config to control replication Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2021-03-17 11:34:05 -07:00
Xavier Léauté	1061faa6ba	prefer string concatenation over String.format in performance sensitive code (#10997 ) String.format relies on regex parsing, which makes these calls expensive at higher request volumes.	2021-03-16 22:06:26 -07:00
Clint Wylie	694605e815	suppress (#11002 )	2021-03-16 18:17:57 -07:00
Suneet Saldanha	6b0c2e8996	CompactionTask throws exception on conflicting segmentGranularity (#10996 ) * CompactionTask throws exception on conflicting segmentGranularity * add comment	2021-03-16 12:51:50 -07:00
Maytas Monsereenusorn	f37713dc6d	Fix auto compaction with mixed versions in the same time chunk based on new segment granularity (#11000 )	2021-03-16 12:48:19 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Xavier Léauté	68781a0d20	update testing frameworks for Java 15 support (#10984 ) * update jacoco to 0.8.6 * update easymock to 4.2 * update equalsverifier to 3.5.5 * update mockito to 3.8.0 * update powermock to 2.0.9 * update assertj-core to 3.19.0 * update testng to 7.3.0 - fix DTD url security for testng 7.x - fix backwards incompatibility in testng 7.x	2021-03-12 20:18:13 -08:00
Maytas Monsereenusorn	ed91a2bb38	Fix Kinesis should not increment throwAway count on EOS record (#10976 ) * fix Kinesis increament throwAway on EOS record * fix checkstyle * fix IT * fix test * fix IT * fix IT * fix IT * fix IT	2021-03-11 22:04:58 -08:00
zhangyue19921010	3277479ff7	[Minor]Add metadata-related logs and missing UT for kill tasks. (#10956 ) * logs more info when delete segments && add deleteSegments-related UT * revert msic.xml * code review * use log.debugSegments Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-11 18:00:52 -08:00
Vyatcheslav Mogilevsky	b0432be07a	Apache archive mirror (#10979 ) * Ability to use mirror of archive.apache.org * Ability to use mirror of archive.apache.org: documentation * Ability to use mirror of archive.apache.org: fix int test Dockerfile: missing COPY instruction	2021-03-11 09:07:51 -08:00
Xavier Léauté	d26e1bc70d	update code check plugins for Java 15 support (#10978 ) * update maven-forbidden-api plugin to 3.1 * update maven-pmd-plugin to 3.14 * update spotbugs to 4.2.2 * fixes validation failures newly caught by those updates - fix SpotBugs NP_NONNULL_PARAM_VIOLATION - fix PMD UnnecessaryFullyQualifiedName	2021-03-11 07:31:41 -08:00
Xavier Léauté	7a68cd8b86	use maven enforcer to check maven version (#10977 ) * removes a warning about prerequisites only being allowed for plugins * update maven enforcer plugin to the latest version (3.0.0-M3)	2021-03-11 07:30:10 -08:00
frank chen	b808fd2ef9	Fix NPE in the constructor of TopNQuery (#10969 ) * fix NPE * Add unit tests to cover parameter checking	2021-03-11 00:04:49 -08:00
Mohammadamin Karbasforushan	dfad38d561	Fix unclear documentation of human readable byte (#10825 ) * Fix unclear documentation of human readable byte Follows https://github.com/apache/druid/pull/10203 ; See https://github.com/apache/druid/pull/10203#issuecomment-771080634 . * Fix sentence style Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-11 00:01:38 -08:00
frank chen	b79b7e6dfb	Improve exception handling in IT to reduce excessive stack trace messages (#10955 ) * Suppress logging for some exceptions to reduce excessive stack trace messages Signed-off-by: frank chen <frank.chen021@outlook.com> * log message for channel disconnected exception Signed-off-by: frank chen <frank.chen021@outlook.com>	2021-03-10 21:27:55 -08:00
Vadim Ogievetsky	4897731e37	Make web console fast around sys.segments (#10909 ) * do not load all the segments * fix filtering * update datasource view * updated tests * remove trimmedSegments * Availability detail * be smart about when showing smart modes * fix tests * add coordinator overlord mode	2021-03-10 19:59:50 -08:00
Himanshu	43638cc6f9	license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 (#10885 ) * license.yaml fixes for code introduced related to AWS RDS token based password provider in PR #9518 * add notice for commons-dbcp in license file * add version and update NOTICE file	2021-03-10 12:59:25 -08:00
Vadim Ogievetsky	c0fb326788	Web console: fix service view actions when grouping (#10898 ) * fix service view actions when grouping * fix test	2021-03-09 21:38:56 -08:00
Clint Wylie	58294329b7	fix SQL issue for group by queries with time filter that gets optimized to false (#10968 ) * fix SQL issue for group by queries with time filter that gets optimized to false * short circuit always false in CombineAndSimplifyBounds * adjust * javadocs * add preconditions for and/or filters to ensure they have children * add comments, remove preconditions	2021-03-09 19:41:16 -08:00
Jonathan Wei	9c083783c9	Don't fail on invalid views in InformationSchema (#10960 ) * Don't fail on invalid views in InformationSchema * Fix test	2021-03-09 16:19:59 -08:00
benkrug	7f96ca8f5e	Update topnquery.md (#10944 ) minor edits of the English, no meanings changed (imo)	2021-03-09 15:19:02 -08:00
Yi Yuan	36e86a2880	Add protobuf schema registry (#10839 ) * dd_protobuf_schema_registry * change licese * delete some annotation * nodify tests * delete extra exception * add licenses * add descriptor and protoMessageType in ProtobufInputRowParser for adopt to old version * seperate kafka-protobuf-provider * modify protobuf.md * refine protobuf.md * add config and header * bug fixed Co-authored-by: yuanyi <yuanyi@freewheel.tv>	2021-03-09 15:15:51 -08:00
Tianxin Zhao	a57c28e9ce	prometheus metric exporter (#10412 ) * prometheus-emitter * use existing jetty server to expose prometheus collection endpoint * unused variables * better variable names * removed unused dependencies * more metric definitions * reorganize * use prometheus HTTPServer instead of hooking into Jetty server * temporary empty help string * temporary non-empty help. fix incorrect dimension value in JSON (also updated statsd json) * added full help text. added metric conversion factor for timers that are not using seconds. Correct metric dimension name in documentation * added documentation for prometheus emitter * safety for invalid labelNames * fix travis checks * Unit test and better sanitization of metrics names and label values * add precondition to check namespace against regex * use precompiled regex * remove static imports. fix metric types * better docs. fix possible NPE in PrometheusEmitterConfig. Guard against multiple calls to PrometheusEmitter.start() * Update regex for label-value replacements to allow internal numeric values. Additional tests * Adds missing license header updates website/.spelling to add words used in prometheus-emitter docs. updates docs/operations/metrics.md to correct the spelling of bufferPoolName * fixes version in extensions-contrib/prometheus-emitter * fix style guide errors * update import ordering * add another word to website/.spelling * remove unthrown declared exception * remove unused import * Pushgateway strategy for metrics * typo * Format fix and nullable strategy * Update pom file for prometheus-emitter * code review comments. * Counter to gauge for cache metrics, periodical task to pushGateway * Syntax fix * Dimension label regex include numeric character back, fix previous commit * bump prometheus-emitter pom dev version * Remove scheduled task inside poen that push metrics * Fix checkstyle * Unit test coverage * Unit test coverage * Spelling * Doc fix * spelling Co-authored-by: Michael Schiff <michael.schiff@tubemogul.com> Co-authored-by: Michael Schiff <schiff.michael@gmail.com> Co-authored-by: Tianxin Zhao <tianxin.zhao@tubemogul.com> Co-authored-by: Tianxin Zhao <tizhao@adobe.com>	2021-03-09 14:37:31 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Maytas Monsereenusorn	4dd22a850b	Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962 ) * Fix streaming ingestion fails and halt if it encounters empty rows * address comments	2021-03-09 12:11:58 -08:00
frank chen	80ec28578a	show leader in Services Tab (#10951 ) Signed-off-by: frank chen <frank.chen021@outlook.com>	2021-03-09 08:03:56 -08:00
Charles Smith	0f81ce32a0	refactor query caching docs (#10848 ) * refactor query caching * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * Update docs/querying/using-caching.md Co-authored-by: sthetland <steve.hetland@imply.io> * add description for context link * accept suggestions * reword, rework some awkward language * incorporate feedback, fix errors * add back perf considerations * Apply suggestions from code review applying @suneet-s 's changes Co-authored-by: Suneet Saldanha <suneet@apache.org> * Update caching.md fix link Co-authored-by: sthetland <steve.hetland@imply.io> Co-authored-by: Suneet Saldanha <suneet@apache.org>	2021-03-08 22:25:48 -08:00
Abhishek Agarwal	489f5b1a03	Avoid expensive findEntry call in segment metadata query (#10892 ) * Avoid expensive findEntry call in segment metadata query * other places * Remove findEntry * Fix add cost * Refactor a bit * Add performance test * Add comment * Review comments * intellij	2021-03-08 22:08:33 -08:00
Abhishek Agarwal	ae620921df	Fix classCastException when inputs to union are join (#10950 ) * Fix union queries * Add tests	2021-03-08 21:20:26 -08:00
Suneet Saldanha	756ac6ef30	Remove flaky arm64 test job (#10953 )	2021-03-08 14:09:33 -08:00
Clint Wylie	96889cdebc	add avro + kafka + schema registry integration test (#10929 ) * add avro + schema registry integration test * style * retry init * maybe this * oops heh * this will fix it * review stuffs * fix comment	2021-03-08 08:12:12 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Jihoon Son	16acd6686a	Remove stale 'namespace' config for JDBC lookups from doc (#10886 ) * Remove stale 'namespace' config for JDBC lookups from doc and web-console * revert webconsole change * address comments	2021-03-04 17:16:34 -08:00
Jihoon Son	2c30f8b3b7	Migrate bitmap benchmarks to JMH (#10936 ) * Migrate bitmap benchmarks to JMH * add concise	2021-03-04 12:50:55 -08:00
Abhishek Agarwal	1a15987432	Supporting filters in the left base table for join datasources (#10697 ) * where filter left first draft * Revert changes in calcite test * Refactor a bit * Fixing the Tests * Changes * Adding tests * Add tests for correlated queries * Add comment * Fix typos	2021-03-04 10:39:21 -08:00
Atul Mohan	6040c30fcd	Upgrade jetty to latest version (#10937 ) * Upgrade jetty * Fix license	2021-03-04 08:28:50 -06:00

11037 Commits

All Branches