druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	bf20f9e979	DruidInputSource: Fix issues in column projection, timestamp handling. (#10267 ) * DruidInputSource: Fix issues in column projection, timestamp handling. DruidInputSource, DruidSegmentReader changes: 1) Remove "dimensions" and "metrics". They are not necessary, because we can compute which columns we need to read based on what is going to be used by the timestamp, transform, dimensions, and metrics. 2) Start using ColumnsFilter (see below) to decide which columns we need to read. 3) Actually respect the "timestampSpec". Previously, it was ignored, and the timestamp of the returned InputRows was set to the `__time` column of the input datasource. (1) and (2) together fix a bug in which the DruidInputSource would not properly read columns that are used as inputs to a transformSpec. (3) fixes a bug where the timestampSpec would be ignored if you attempted to set the column to something other than `__time`. (1) and (3) are breaking changes. Web console changes: 1) Remove "Dimensions" and "Metrics" from the Druid input source. 2) Set timestampSpec to `{"column": "__time", "format": "millis"}` for compatibility with the new behavior. Other changes: 1) Add ColumnsFilter, a new class that allows input readers to determine which columns they need to read. Currently, it's only used by the DruidInputSource, but it could be used by other columnar input sources in the future. 2) Add a ColumnsFilter to InputRowSchema. 3) Remove the metric names from InputRowSchema (they were unused). 4) Add InputRowSchemas.fromDataSchema method that computes the proper ColumnsFilter for given timestamp, dimensions, transform, and metrics. 5) Add "getRequiredColumns" method to TransformSpec to support the above. * Various fixups. * Uncomment incorrectly commented lines. * Move TransformSpecTest to the proper module. * Add druid.indexer.task.ignoreTimestampSpecForDruidInputSource setting. * Fix. * Fix build. * Checkstyle. * Misc fixes. * Fix test. * Move config. * Fix imports. * Fixup. * Fix ShuffleResourceTest. * Add import. * Smarter exclusions. * Fixes based on tests. Also, add TIME_COLUMN constant in the web console. * Adjustments for tests. * Reorder test data. * Update docs. * Update docs to say Druid 0.22.0 instead of 0.21.0. * Fix test. * Fix ITAutoCompactionTest. * Changes from review & from merging.	2021-03-25 10:32:21 -07:00
Jihoon Son	a041933017	Allow overlapping intervals for the compaction task (#10912 ) * Allow overlapping intervals for the compaction task * unused import * line indentation Co-authored-by: Maytas Monsereenusorn <maytasm@apache.org>	2021-03-23 11:21:54 -07:00
Xavier Léauté	1061faa6ba	prefer string concatenation over String.format in performance sensitive code (#10997 ) String.format relies on regex parsing, which makes these calls expensive at higher request volumes.	2021-03-16 22:06:26 -07:00
Clint Wylie	4cd4a22f87	expression filter support for vectorized query engines (#10613 ) * expression filter support for vectorized query engines * remove unused codes * more tests * refactor, more tests * suppress * more * more * more * oops, i was wrong * comment * remove decorate, object dimension selector, more javadocs * style	2021-03-16 11:46:50 -07:00
Abhishek Agarwal	c66951a59e	Add flag in SQL to disable left base filter optimization for joins (#10947 ) * Add flag to disable left base filter * code coverage * Draft * Review comments * code coverage * add docs * Add old tests	2021-03-09 13:07:34 -08:00
Maytas Monsereenusorn	4dd22a850b	Fix streaming ingestion fails if it encounters empty rows (Regression) (#10962 ) * Fix streaming ingestion fails and halt if it encounters empty rows * address comments	2021-03-09 12:11:58 -08:00
Abhishek Agarwal	489f5b1a03	Avoid expensive findEntry call in segment metadata query (#10892 ) * Avoid expensive findEntry call in segment metadata query * other places * Remove findEntry * Fix add cost * Refactor a bit * Add performance test * Add comment * Review comments * intellij	2021-03-08 22:08:33 -08:00
Jihoon Son	9946306d4b	Add configurations for allowed protocols for HTTP and HDFS inputSources/firehoses (#10830 ) * Allow only HTTP and HTTPS protocols for the HTTP inputSource * rename * Update core/src/main/java/org/apache/druid/data/input/impl/HttpInputSource.java Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> * fix http firehose and update doc * HDFS inputSource * add configs for allowed protocols * fix checkstyle and doc * more checkstyle * remove stale doc * remove more doc * Apply doc suggestions from code review Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com> * update hdfs address in docs * fix test Co-authored-by: Abhishek Agarwal <1477457+abhishekagarwal87@users.noreply.github.com> Co-authored-by: Charles Smith <38529548+techdocsmith@users.noreply.github.com>	2021-03-06 11:43:00 -08:00
Gian Merlino	05e8f8fe06	CsvInputFormat: Create a parser per InputEntityReader. (#10923 ) RFC4180Parser is not thread safe and cannot be shared across readers.	2021-02-27 18:37:05 -08:00
Gian Merlino	07902f607b	Granularity: Introduce primitive-typed bucketStart, increment methods. (#10904 ) * Granularity: Introduce primitive-typed bucketStart, increment methods. Saves creation of unnecessary DateTime objects in timestamp_floor and timestamp_ceil expressions. * Fix style. * Amp up the test coverage.	2021-02-25 07:59:20 -08:00
Clint Wylie	cbbef80c7f	add SQL operators for bitwise expressions (#10823 ) * add SQL operators for bitwise expressions * more test * fix spelling * more tests	2021-02-18 20:56:33 -08:00
Agustin Gonzalez	eabad0fb35	Keep query granularity of compacted segments after compaction (#10856 ) * Keep query granularity of compacted segments after compaction * Protect against null isRollup * Fix bugspot check RC_REF_COMPARISON_BAD_PRACTICE_BOOLEAN & edit an existing comment * Make sure that NONE is also included when comparing for the finer granularity * Update integration test check for segment size due to query granularity propagation affecting size * Minor code cleanup * Added functional test to verify queryGranlarity after compaction * Minor style fix * Update unit tests	2021-02-18 01:35:10 -08:00
Maytas Monsereenusorn	6541178c21	Support segmentGranularity for auto-compaction (#10843 ) * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * resolve conflict * Support segmentGranularity for auto-compaction * Support segmentGranularity for auto-compaction * fix tests * fix more tests * fix checkstyle * add unit tests * fix checkstyle * fix checkstyle * fix checkstyle * add unit tests * add integration tests * fix checkstyle * fix checkstyle * fix failing tests * address comments * address comments * fix tests * fix tests * fix test * fix test * fix test * fix test * fix test * fix test * fix test * fix test	2021-02-12 03:03:20 -08:00
Abhishek Agarwal	8718155f8f	Allow for empty keys in hash map (#10869 ) * allow for empty keys in hash map * fix serde test	2021-02-10 11:19:57 -08:00
Jihoon Son	1ec3f0bd73	Revert "Add support for Blacklisting some domains for HTTPInputSource (#10535 )" (#10871 ) This reverts commit `6b14bdb3a5`.	2021-02-09 17:51:26 -08:00
Agustin Gonzalez	3785ad5812	Add log message when local input's filter does not match any files (#10837 ) * Add log message when local input's filter does not match any files * Re-use previously defined fileIterator	2021-02-05 11:35:19 -06:00
Jihoon Son	ac41e41232	Update doc for query errors and add unit tests for JsonParserIterator (#10833 ) * Update doc for query errors and add unit tests for JsonParserIterator * static constructor for convenience * rename method	2021-02-05 02:55:32 -08:00
Jihoon Son	3f8f00a231	Fix CVE-2021-25646 (#10818 )	2021-02-04 11:21:43 -08:00
Agustin Gonzalez	0e4750bac2	Granularity interval materialization (#10742 ) * Prevent interval materialization for UniformGranularitySpec inside the overlord * Change API of bucketIntervals in GranularitySpec to return an Iterable<Interval> * Javadoc update, respect inputIntervals contract * Eliminate dependency on wrappedspec (i.e. ArbitraryGranularity) in UniformGranularitySpec * Added one boundary condition test to UniformGranularityTest and fixed Travis forbidden method errors in IntervalsByGranularity * Fix Travis style & other checks * Refactor TreeSet to facilitate re-use in UniformGranularitySpec * Make sure intervals are unique when there is no segment granularity * Style/bugspot fixes... * More travis checks * Add condensedIntervals method to GranularitySpec and pass it as needed to the lock method * Style & PR feedback * Fixed failing test * Fixed bug in IntervalsByGranularity iterator that it would return repeated elements (see added unit tests that were broken before this change) * Refactor so that we can get the condensed buckets without materializing the intervals * Get rid of GranularitySpec::condensedInputIntervals ... not needed * Travis failures fixes * Travis checkstyle fix * Edited/added javadoc comments and a method name (code review feedback) * Fixed jacoco coverage by moving class and adding more coverage * Avoid materializing the condensed intervals when locking * Deal with overlapping intervals * Remove code and use library code instead * Refactor intervals by granularity using the FluentIterable, add sanity checks * Change !hasNext() to inputIntervals().isEmpty() * Remove redundant lambda * Use materialized intervals here since this is outside the overlord (for performance) * Name refactor to reflect the fact that bucket intervals are sorted. * Style fixes * Removed redundant method and have condensedIntervalIterator throw IAE when element is null for consistency with other methods in this class (as well that null interval when condensing does not make sense) * Remove forbidden api * Move helper class inside common base class to reduce public space pollution	2021-01-29 06:02:10 -08:00
Clint Wylie	2ce7b3dcf4	bitwise math function expressions (#10605 ) * expressions: adding bitwise expressions * double handling and vectorization * move conversion to Evals * revert unintended changes * less magic, split convert functions, fix parser for funny exponent doubles * fix spelling exceptions list * more spelling * fix grammar, add more test, fix docs * fix docs Co-authored-by: Max Kaplan <max@maxkaplan.me>	2021-01-28 11:16:53 -08:00
Jihoon Son	95065bdf1a	Bump dev version to 0.22.0-SNAPSHOT (#10759 )	2021-01-15 13:16:23 -08:00
Jihoon Son	b3325c1601	Add a config for monitorScheduler type (#10732 ) * Add a config for monitorScheduler type * check interrupted * null check * do not schedule monitor if the previous one is still running * checkstyle * clean up names * change default back to basic * fix test	2021-01-13 17:20:43 -08:00
Jihoon Son	149306c9db	Tidy up HTTP status codes for query errors (#10746 ) * Tidy up query error codes * fix tests * Restore query exception type in JsonParserIterator * address review comments; add a comment explaining the ugly switch * fix test	2021-01-13 17:20:00 -08:00
Clint Wylie	9362dc7968	re-use expression vector evaluation results for the same offset in expression vector selectors (#10614 ) * cache expression selector results by associating vector expression bindings to underlying vector offset * better coverage, fix floats * style * stupid bot * stupid me * more test * intellij threw me under the bus when it generated those junit methods * narrow interface instead of passing around offset	2021-01-13 12:44:56 -08:00
Xavier Léauté	118b50195e	Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730 ) Today Kafka message support in streaming indexing tasks is limited to message values, and does not provide a way to expose Kafka headers, timestamps, or keys, which may be of interest to more specialized Druid input formats. For instance, Kafka headers may be used to indicate payload format/encoding or additional metadata, and timestamps are often omitted from values in Kafka streams applications, since they are included in the record. This change proposes to introduce KafkaRecordEntity as InputEntity, which would give input formats full access to the underlying Kafka record, including headers, key, timestamps. It would also open access to low-level information such as topic, partition, offset if needed. KafkaEntity is a subclass of ByteEntity for backwards compatibility with existing input formats, and to avoid introducing unnecessary complexity for Kinesis indexing tasks.	2021-01-08 16:04:37 -08:00
Clint Wylie	edfbdbfc97	fix NPE when calling TaskLocation.hashCode with null host (#10708 )	2020-12-24 15:30:54 -08:00
Gian Merlino	57ee8ce4e7	CompressionUtils: Read the entire stream when unzipping from a stream. (#10664 ) * CompressionUtils: Read the entire stream when unzipping from a stream. Should fix #6905 by making sure we avoid closing partially-read streams. * CHECKSTYLE!	2020-12-17 22:52:04 -08:00
Himanshu	ac1882bf74	kubernetes based discovery druid extension to run Druid on K8S without Zookeeper (#10544 ) * honor zk enablement config in more places in druid code * kubernetes based discovery module * fix spotbugs check * fix intellij checks error * fix doc link to kubernetes.md from extension * make spellchecker happy * update license.yaml * fix dependency check errors * update extension coverage * UTs for BaseNodeRoleWatcher * fix forbidden-api check * update k8s module coverage ignores * add Bouncy Castle License being same as MIT License for license checking purposes * further update licenses.yaml * label/annotation pre-existence assumption * address review comment	2020-12-14 21:10:31 -08:00
Gian Merlino	753fa6b3bd	IdUtils: Forbid characters that cannot be used in znodes. (#10659 ) * IdUtils: Forbid characters that cannot be used in znodes. * Fix whitespace.	2020-12-10 10:49:40 -08:00
Gian Merlino	b7641f644c	Two fixes related to encoding of % symbols. (#10645 ) * Two fixes related to encoding of % symbols. 1) TaskResourceFilter: Don't double-decode task ids. request.getPathSegments() returns already-decoded strings. Applying StringUtils.urlDecode on top of that causes erroneous behavior with '%' characters. 2) Update various ThreadFactoryBuilder name formats to escape '%' characters. This fixes situations where substrings starting with '%' are erroneously treated as format specifiers. ITs are updated to include a '%' in extra.datasource.name.suffix. * Avoid String.replace. * Work around surefire bug. * Fix xml encoding. * Another try at the proper encoding. * Give up on the emojis. * Less ambitious testing. * Fix an additional problem. * Adjust encodeForFormat to return null if the input is null.	2020-12-06 22:35:11 -08:00
Himanshu	7e9522870f	introduce DynamicConfigProvider interface and make kafka consumer props extensible (#10309 ) * introduce DynamicConfigProvider interface and make kafka consumer props extensible * fix intellij inspection error * make DynamicConfigProvider generic Change-Id: I2e3e89f8617b6fe7fc96859deca4011f609dc5a3 * deprecate PasswordProvider	2020-12-02 16:38:27 -08:00
Ayush Kulshrestha	d0c2ede50c	Added CronScheduler support as a proof to clock drift while emitting metrics (#10448 ) Co-authored-by: Ayush Kulshrestha <ayush.kulshrestha@miqdigital.com>	2020-11-25 12:31:38 +01:00
frank chen	fe693a4f01	Improve doc and exception message for invalid user configurations (#10598 ) * improve doc and exception message * add spelling check rules and remove unused import * add a test to improve test coverage	2020-11-23 15:03:13 -08:00
zhangyue19921010	31740b3b29	Fix : Druid throws java.util.concurrent.RejectedExecutionException when ingest task is stopping. (#10555 ) * check exec status before return Signal * add more log * change log level to debug and add UT * change log leverl to warn and merge master Co-authored-by: yuezhang <yuezhang@freewheel.tv>	2020-11-23 14:52:03 -08:00
frank chen	e83d5cb59e	Fix ingestion failure of pretty-formatted JSON message (#10383 ) * support multi-line text * add test cases * split json text into lines case by case * improve exception handle * fix CI * use IntermediateRowParsingReader as base of JsonReader * update doc * ignore the non-immutable field in test case * add more test cases * mark `lineSplittable` as final * fix testcases * fix doc * add a test case for SqlReader * return all raw columns when exception occurs * fix CI * fix test cases * resolve review comments * handle ParseException returned by index.add * apply Iterables.getOnlyElement * fix CI * fix test cases * improve code in more graceful way * fix test cases * fix test cases * add a test case to check multiple json string in one text block * fix inspection check	2020-11-13 13:59:23 -08:00
Atul Mohan	6ccddedb7a	Improved exception handling in case of query timeouts (#10464 ) * Separate timeout exceptions * Add more tests Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-11-03 09:00:33 -06:00
Nishant Bangarwa	6b14bdb3a5	Add support for Blacklisting some domains for HTTPInputSource (#10535 ) fix inspections refactor class name change name add allowList as well distinguish between empty and null list Fix CI	2020-11-02 21:47:25 +05:30
Clint Wylie	d0821de854	support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions (#10499 ) * support for vectorizing expressions with non-existent inputs, more consistent type handling for non-vectorized expressions * inspector * changes * more test * clean	2020-10-26 19:55:24 -07:00
Jihoon Son	ad437dd655	Add shuffle metrics for parallel indexing (#10359 ) * Add shuffle metrics for parallel indexing * javadoc and concurrency test * concurrency * fix javadoc * Feature flag * doc * fix doc and add a test * checkstyle * add tests * fix build and address comments	2020-10-10 19:35:17 -07:00
Atul Mohan	0ab8b6e0a9	Improve test (#10480 )	2020-10-07 08:40:02 -05:00
Jonathan Wei	65c0d64676	Update version to 0.21.0-SNAPSHOT (#10450 ) * [maven-release-plugin] prepare release druid-0.21.0 * [maven-release-plugin] prepare for next development iteration * Update web-console versions	2020-10-03 16:08:34 -07:00
Clint Wylie	9ec5c08e2a	fix array types from escaping into wider query engine (#10460 ) * fix array types from escaping into wider query engine * oops * adjust * fix lgtm	2020-10-03 15:30:34 -07:00
Gian Merlino	599aacce0f	Remove Expr.visit. (#10437 ) * Remove Expr.visit. It isn't used and doesn't have tests. * Remove Visitor too.	2020-09-28 22:13:10 -07:00
Clint Wylie	3d700a5e31	vectorize remaining math expressions (#10429 ) * vectorize remaining math expressions * fixes * remove cannotVectorize() where no longer true * disable vectorized groupby for numeric columns with nulls * fixes	2020-09-26 23:30:14 -07:00
Jihoon Son	0cc9eb4903	Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided (#10288 ) * Store hash partition function in dataSegment and allow segment pruning only when hash partition function is provided * query context * fix tests; add more test * javadoc * docs and more tests * remove default and hadoop tests * consistent name and fix javadoc * spelling and field name * default function for partitionsSpec * other comments * address comments * fix tests and spelling * test * doc	2020-09-24 16:32:56 -07:00
Jonathan Wei	cb30b1fe23	Automatically determine numShards for parallel ingestion hash partitioning (#10419 ) * Automatically determine numShards for parallel ingestion hash partitioning * Fix inspection, tests, coverage * Docs and some PR comments * Adjust locking * Use HllSketch instead of HyperLogLogCollector * Fix tests * Address some PR comments * Fix granularity bug * Small doc fix	2020-09-24 13:47:53 -07:00
Maytas Monsereenusorn	72f1b55f56	Add last_compaction_state to sys.segments table (#10413 ) * Add is_compacted to sys.segments table * change is_compacted to last_compaction_state * fix tests * fix tests * address comments	2020-09-23 15:29:36 -07:00
Clint Wylie	19c4b16640	vectorized expressions and expression virtual columns (#10401 ) * vectorized expression virtual columns * cleanup * fixes * preserve float if explicitly specified * oops * null handling fixes, more tests * what is an expression planner? * better names * remove unused method, add pi * move vector processor builders into static methods * reduce boilerplate * oops * more naming adjustments * changes * nullable * missing hex * more	2020-09-23 13:56:38 -07:00
Atul Mohan	b6ad790dc7	Support combining inputsource for parallel ingestion (#10387 ) * Add combining inputsource * Fix documentation Co-authored-by: Atul Mohan <atulmohan@yahoo-inc.com>	2020-09-15 16:25:35 -07:00
Clint Wylie	184b202411	add computed Expr output types (#10370 ) * push down ValueType to ExprType conversion, tidy up * determine expr output type for given input types * revert unintended name change * add nullable * tidy up * fixup * more better * fix signatures * naming things is hard * fix inspection * javadoc * make default implementation of Expr.getOutputType that returns null * rename method * more test * add output for contains expr macro, split operation and function auto conversion	2020-09-14 18:18:56 -07:00

1 2 3 4 5 ...

330 Commits