druid

Commit Graph

Author	SHA1	Message	Date
Maytas Monsereenusorn	3455352241	Add feature to automatically remove compaction configurations for inactive datasources (#11232 ) * add auto cleanup * add auto cleanup * add auto cleanup * add tests * add tests * use retryutils * use retryutils * use retryutils * address comments	2021-05-11 18:49:18 -07:00
Agustin Gonzalez	8e5048e643	Avoid memory mapping hydrants after they are persisted & after they are merged for native batch ingestion (#11123 ) * Avoid mapping hydrants in create segments phase for native ingestion * Drop queriable indices after a given sink is fully merged * Do not drop memory mappings for realtime ingestion * Style fixes * Renamed to match use case better * Rollback memoization code and use the real time flag instead * Null ptr fix in FireHydrant toString plus adjustments to memory pressure tracking calculations * Style * Log some count stats * Make sure sinks size is obtained at the right time * BatchAppenderator unit test * Fix comment typos * Renamed methods to make them more readable * Move persisted metadata from FireHydrant class to AppenderatorImpl. Removed superfluous differences and fix comment typo. Removed custom comparator * Missing dependency * Make persisted hydrant metadata map concurrent and better reflect the fact that keys are Java references. Maintain persisted metadata when dropping/closing segments. * Replaced concurrent variables with normal ones * Added batchMemoryMappedIndex "fallback" flag with default "false". Set this to "true" make code fallback to previous code path. * Style fix. * Added note to new setting in doc, using Iterables.size (and removing a dependency), and fixing a typo in a comment. * Forgot to commit this edited documentation message	2021-05-11 14:34:26 -07:00
Maytas Monsereenusorn	4326e699bd	Add feature to automatically remove datasource metadata based on retention period (#11227 ) * add auto clean up datasource metadata * add test * fix checkstyle * add comments * fix error * address comments * Address comments * fix test * fix test * fix typo * add comment * fix test * fix test	2021-05-11 01:22:33 -07:00
Maytas Monsereenusorn	3a660bc6ee	Make sure updating coordinator config is protected against race condition (#11144 ) * Make sure changing coordinator config is protected against concurrent updates * Make sure updating coordinator config is protected against race condition * add retry * fix checkstyle * add tests * add tests * add more tests * add tests * fix * fix checkstyle	2021-05-10 13:58:08 -07:00
Jihoon Son	2df42143ae	Fix idempotence of segment allocation and task report apis in native batch ingestion (#11189 ) * Fix idempotence of segment allocation and task report apis in native batch ingestion * better error and javadoc * checkstyle and dependency * fix tests and add more tests * task config instead of context; add doc * unused import and dependency * typo in doc * fix unintended changes * fix wrong import * remove unnecessary error handling * add task context back * default task context * fix test and doc * address comments * unused imports	2021-05-07 14:29:48 -07:00
Maytas Monsereenusorn	d73f72e508	Add feature to automatically remove supervisor based on retention period (#11200 ) * add auto clean up * add test * add test * fix test * Address comments * Address comments	2021-05-06 22:25:23 -07:00
Lucas Capistrant	bb3c810b36	Create dynamic config that can limit number of non-primary replicants loaded per coordination cycle (#11135 ) * lay the groundwork for throttling replicant loads per RunRules execution * Add dynamic coordinator config to control new replicant threshold. * remove redundant line * add some unit tests * fix checkstyle error * add documentation for new dynamic config * improve docs and logs * Alter how null is handled for new config. If null, manually set as default	2021-05-05 07:39:36 -05:00
Clint Wylie	6f0b4d90d8	clarify that appenderator persist byte sizes are estimations (#11198 )	2021-05-05 17:44:04 +08:00
frank chen	204901a602	Fix Smile encoding for HTTP response (#10980 ) * fix Smile encoding bug Signed-off-by: frank chen <frank.chen021@outlook.com> * Add unit tests * Add IT for smile encoding * Fix cases * Update javadoc Co-authored-by: Jihoon Son <jihoonson@apache.org> * resolve comments Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-05-03 22:43:47 -07:00
Clint Wylie	554f1ffeee	ARRAY_AGG sql aggregator function (#11157 ) * ARRAY_AGG sql aggregator function * add javadoc * spelling * review stuff, return null instead of empty when nil input * review stuff * Update sql.md * use type inference for finalize, refactor some things	2021-05-03 22:17:10 -07:00
Maytas Monsereenusorn	84aac4832d	Add feature to automatically remove rules based on retention period (#11164 ) * Add feature to automatically remove rules based on retention period * Add feature to automatically remove rules based on retention period * address comments	2021-05-03 11:50:45 -07:00
Maytas Monsereenusorn	6d2b5cdd7e	Add feature to automatically remove audit logs based on retention period (#11084 ) * add docs * add impl * fix checkstyle * fix test * add test * fix checkstyle * fix checkstyle * fix test * Address comments * Address comments * fix spelling * fix docs	2021-04-20 17:10:43 -07:00
Maytas Monsereenusorn	f968400170	Introduce a new configuration that skip storing audit payload if payload size exceed limit and skip storing null fields for audit payload (#11078 ) * Add config to skip storing audit payload if exceed limit * fix checkstyle * change config name * skip null fields for audit payload * fix checkstyle * address comments * fix guice * fix test * add tests * address comments * address comments * address comments * fix checkstyle * address comments * fix test * fix test * address comments * Address comments Co-authored-by: Jihoon Son <jihoonson@apache.org> Co-authored-by: Jihoon Son <jihoonson@apache.org>	2021-04-13 20:18:28 -07:00
Gian Merlino	c3faa24f26	DataSchema: Improve duplicate-column error message. (#11082 ) * DataSchema: Improve duplicate-column error message. Now, when duplicate columns are specified, the error message will include information about where those duplicate columns were seen. Also, if there are multiple duplicate columns, all will be listed in the error message instead of just the first one encountered. * Fix style for checkstyle. * Further improve error message.	2021-04-12 19:03:15 -07:00
Jihoon Son	a6a2758095	More unit tests for JsonParserIterator; Integration tests for query errors (#11091 ) * unit tests for timeout exception in init * integration tests * run integraion test on travis * fix inspection	2021-04-12 15:08:50 -07:00
Maytas Monsereenusorn	4576152e4a	Make dropExisting flag for Compaction configurable and add warning documentations (#11070 ) * Make dropExisting flag for Compaction configurable * fix checkstyle * fix checkstyle * fix test * add tests * fix spelling * fix docs * add IT * fix test * fix doc * fix doc	2021-04-09 00:12:28 -07:00
Lucas Capistrant	8264203cee	Allow client to configure batch ingestion task to wait to complete until segments are confirmed to be available by other (#10676 ) * Add ability to wait for segment availability for batch jobs * IT updates * fix queries in legacy hadoop IT * Fix broken indexing integration tests * address an lgtm flag * spell checker still flagging for hadoop doc. adding under that file header too * fix compaction IT * Updates to wait for availability method * improve unit testing for patch * fix bad indentation * refactor waitForSegmentAvailability * Fixes based off of review comments * cleanup to get compile after merging with master * fix failing test after previous logic update * add back code that must have gotten deleted during conflict resolution * update some logging code * fixes to get compilation working after merge with master * reset interrupt flag in catch block after code review pointed it out * small changes following self-review * fixup some issues brought on by merge with master * small changes after review * cleanup a little bit after merge with master * Fix potential resource leak in AbstractBatchIndexTask * syntax fix * Add a Compcation TuningConfig type * add docs stipulating the lack of support by Compaction tasks for the new config * Fixup compilation errors after merge with master * Remove erreneous newline	2021-04-08 21:03:00 -07:00
Xavier Léauté	15bdd6bc2f	Fix unit tests and GC settings for Java 15 (#11074 ) * JavaScript script engine support was removed in JDK 15: skip those tests for JDKs without it * Fix flaky HTTP client tests with Java 15 * Switch from CMS to G1GC in integration tests, since CMS is no longer available in JDK 15	2021-04-08 10:33:37 -07:00
Abhishek Agarwal	0df0bff44b	Enable multiple distinct aggregators in same query (#11014 ) * Enable multiple distinct count * Add more tests * fix sql test * docs fix * Address nits	2021-04-07 00:52:19 -07:00
Jihoon Son	cc12a57034	Enforce allow list for JDBC properties by default (#11063 ) * Enforce allow list for JDBC properties by default * fix tests	2021-04-06 19:46:19 -07:00
zachjsh	8cf1e83543	Add paramter to loadstatus API to compute underdeplication against cluster view (#11056 ) * Add paramter to loadstatus API to compute underdeplication against cluster view This change adds a query parameter `computeUsingClusterView` to loadstatus apis that if specified have the coordinator compute undereplication for segments based on the number of services available within cluster that the segment can be replicated on, instead of the configured replication count configured in load rule. A default load rule is created in all clusters that specified that all segments should be replicated 2 times. As replicas are forced to be on separate nodes in the cluster, this causes the loadstatus api to report that there are under-replicated segments when there is only 1 data server in the cluster. In this case, calling loadstatus api without this new query parameter will always result in a response indicating under-replication of segments * * fix exception mapper * * Address review comments * * update external API docs * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * * update more external docs * * update javadoc * Apply suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-04-05 00:02:43 -04:00
Jihoon Son	cfcebc40f6	Allow list for JDBC connection properties to address CVE-2021-26919 (#11047 ) * Allow list for JDBC connection properties to address CVE-2021-26919 * fix tests for java 11	2021-04-01 17:30:47 -07:00
Maytas Monsereenusorn	d7f5293364	Add an option for ingestion task to drop (mark unused) all existing segments that are contained by interval in the ingestionSpec (#11025 ) * Auto-Compaction can run indefinitely when segmentGranularity is changed from coarser to finer. * Add option to drop segments after ingestion * fix checkstyle * add tests * add tests * add tests * fix test * add tests * fix checkstyle * fix checkstyle * add docs * fix docs * address comments * address comments * fix spelling	2021-04-01 12:29:36 -07:00
Parag Jain	b35486fa81	request logs through kafka emitter (#11036 ) * request logs through kafka emitter * travis fixes * review comments * kafka emitter unit test * new line * travis checks * checkstyle fix * count request lost when request topic is null	2021-04-01 11:31:32 +05:30
Lasse Krogh Mammen	782a1d4e6c	Add Calcite Avatica protobuf handler (#10543 )	2021-03-31 12:46:25 -07:00
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Maytas Monsereenusorn	c87ac0823f	Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap (#11019 ) * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * Fix Auto-compaction with segment granularity retrieve incomplete segments from timeline when interval overlap * address comments	2021-03-24 11:37:29 -07:00
Jihoon Son	a041933017	Allow overlapping intervals for the compaction task (#10912 ) * Allow overlapping intervals for the compaction task * unused import * line indentation Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2021-03-23 11:21:54 -07:00
Maytas Monsereenusorn	51d2c61f1c	Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity (#11009 ) * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * Auto-compaction with segment granularity should skip segments that already have the configured segmentGranularity * address comments * address comments * address comments * address comments * address comments	2021-03-19 17:38:28 -07:00
Samarth Jain	5fae7dfcf2	Fix regression introduced by #11008 (#11013 ) * Fix regression introduced by #11008 * Add back and tweak the check to not inspect resources for authorization when AllowAllAuthorizer is configured. Add a unit test to validate that the change doesn't introduce new behavior.	2021-03-19 17:15:03 -07:00
Maytas Monsereenusorn	f19c2e9ce4	If ingested data has sparse columns, the ingested data with forceGuaranteedRollup=true can result in imperfect rollup and final dimension ordering can be different from dimensionSpec ordering in the ingestionSpec (#10948 ) * add IT * add IT * add the fix * fix checkstyle * fix compile * fix compile * fix test * fix test * address comments	2021-03-18 17:04:28 -07:00
Samarth Jain	83fcab1d0f	Improve performance of queries against SYSTEM.SEGMENT table. (#11008 ) Size HashMap and HashSet appropriately. Perf analysis of the queries revealed that over 25% of the query time was spent in resizing HashMap and HashSet collections. Also, prevent the need to examine and authorize all resources when AllowAllAuthorizer is the configured authorizer.	2021-03-17 22:24:02 -07:00
Atul Mohan	3d7e7c2c83	Avoid deletion of load/drop entry from CuratorLoadQueuePeon in case of load timeout (#10213 ) * Skip queue removal on timeout * Clarify error * Add new config to control replication Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2021-03-17 11:34:05 -07:00
Xavier Léauté	1061faa6ba	prefer string concatenation over String.format in performance sensitive code (#10997 ) String.format relies on regex parsing, which makes these calls expensive at higher request volumes.	2021-03-16 22:06:26 -07:00
Maytas Monsereenusorn	f37713dc6d	Fix auto compaction with mixed versions in the same time chunk based on new segment granularity (#11000 )	2021-03-16 12:48:19 -07:00
zhangyue19921010	3277479ff7	[Minor]Add metadata-related logs and missing UT for kill tasks. (#10956 ) * logs more info when delete segments && add deleteSegments-related UT * revert msic.xml * code review * use log.debugSegments Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-11 18:00:52 -08:00
Xavier Léauté	d26e1bc70d	update code check plugins for Java 15 support (#10978 ) * update maven-forbidden-api plugin to 3.1 * update maven-pmd-plugin to 3.14 * update spotbugs to 4.2.2 * fixes validation failures newly caught by those updates - fix SpotBugs NP_NONNULL_PARAM_VIOLATION - fix PMD UnnecessaryFullyQualifiedName	2021-03-11 07:31:41 -08:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Abhishek Agarwal	489f5b1a03	Avoid expensive findEntry call in segment metadata query (#10892 ) * Avoid expensive findEntry call in segment metadata query * other places * Remove findEntry * Fix add cost * Refactor a bit * Add performance test * Add comment * Review comments * intellij	2021-03-08 22:08:33 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
zhangyue19921010	bddacbb1c3	Dynamic auto scale Kafka-Stream ingest tasks (#10524 ) * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * druid task auto scale based on kafka lag * fix kafkaSupervisorIOConfig and KinesisSupervisorIOConfig * test dynamic auto scale done * auto scale tasks tested on prd cluster * auto scale tasks tested on prd cluster * modify code style to solve 29055.10 29055.9 29055.17 29055.18 29055.19 29055.20 * rename test fiel function * change codes and add docs based on capistrant reviewed * midify test docs * modify docs * modify docs * modify docs * merge from master * Extract the autoScale logic out of SeekableStreamSupervisor to minimize putting more stuff inside there && Make autoscaling algorithm configurable and scalable. * fix ci failed * revert msic.xml * add uts to test autoscaler create && scale out/in and kafka ingest with scale enable * add more uts * fix inner class check * add IT for kafka ingestion with autoscaler * add new IT in groups=kafka-index named testKafkaIndexDataWithWithAutoscaler * review change * code review * remove unused imports * fix NLP * fix docs and UTs * revert misc.xml * use jackson to build autoScaleConfig with default values * add uts * use jackson to init AutoScalerConfig in IOConfig instead of Map<> * autoscalerConfig interface and provide a defaultAutoScalerConfig * modify uts * modify docs * fix checkstyle * revert misc.xml * modify uts * reviewed code change * reviewed code change * code reviewed * code review * log changed * do StringUtils.encodeForFormat when create allocationExec * code review && limit taskCountMax to partitionNumbers * modify docs * code review Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2021-03-06 14:36:52 +05:30
Abhishek Agarwal	1a15987432	Supporting filters in the left base table for join datasources (#10697 ) * where filter left first draft * Revert changes in calcite test * Refactor a bit * Fixing the Tests * Changes * Adding tests * Add tests for correlated queries * Add comment * Fix typos	2021-03-04 10:39:21 -08:00
Maytas Monsereenusorn	23333914c7	add javadoc and test (#10938 )	2021-03-03 11:34:00 +08:00
Maytas Monsereenusorn	b7b0ee8362	Add query granularity to compaction task (#10900 ) * add query granularity to compaction task * fix checkstyle * fix checkstyle * fix test * fix test * add tests * fix test * fix test * cleanup * rename class * fix test * fix test * add test * fix test	2021-03-02 11:23:52 -08:00
Gian Merlino	07902f607b	Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904 ) * Granularity: Introduce primitive-typed bucketStart, increment methods. Saves creation of unnecessary DateTime objects in timestamp_floor and timestamp_ceil expressions. * Fix style. * Amp up the test coverage.	2021-02-25 07:59:20 -08:00
Abhishek Agarwal	3a0a0c033f	Reload segment usage when starting the process (#10884 ) * Reload segment usage when starting the process * doc * Add more tests * remove forbidden method * Add alert	2021-02-22 00:08:44 -08:00
Maytas Monsereenusorn	f5bfccc720	Fix maxBytesInMemory for heap overhead of all sinks and hydrants check (#10891 ) * fix maxBytesInMemory * fix maxBytesInMemory check * fix maxBytesInMemory check * fix test	2021-02-18 21:48:57 -08:00
Jonathan Wei	8ad68135c8	Filter unauthorized views in InformationSchema (#10874 ) * Filter unauthorized views in InformationSchema * Use fixed name for view schema * Remove unused string	2021-02-16 17:36:45 -08:00
Maytas Monsereenusorn	6541178c21	Support segmentGranularity for auto-compaction (#10843 ) * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * resolve conflict * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * fix tests * fix more tests * fix checkstyle * add unit tests * fix checkstyle * fix checkstyle * fix checkstyle * add unit tests * add integration tests * fix checkstyle * fix checkstyle * fix failing tests * address comments * address comments * fix tests * fix tests * fix test * fix test * fix test * fix test * fix test * fix test * fix test * fix test	2021-02-12 03:03:20 -08:00
Jihoon Son	1ec3f0bd73	Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535 )" (#10871 ) This reverts commit `6b14bdb3a5`.	2021-02-09 17:51:26 -08:00

1 2 3 4 5 ...

3715 Commits