druid

mirror of https://github.com/apache/druid.git synced 2025-02-21 09:46:21 +00:00

Author	SHA1	Message	Date
Suneet Saldanha	a2939bbd1a	Optimize JoinCondition matching (#9200 ) * Optimize JoinCondition matching The LookupJoinMatcher needs to check if a condition is always true or false multiple times. This can be pre-computed to speed up the match checking. This change reduces the time it takes to join on a long key from ~36 ms/op to 23 ms/op * Rename variables * fix typo	2020-01-21 09:11:50 -08:00
Gian Merlino	f511af1306	Fix DOCKER_HOST_IP handling for multihomed machines. (#9225 ) By picking one. Otherwise, when a machine has multiple IP addresses, DOCKER_HOST_IP would have a newline in the middle, causing havoc in configuration files.	2020-01-21 09:01:19 -08:00
Clint Wylie	8011211a0c	first/last aggregators and nulls (#9161 ) * null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs * initially null or not based on config * review stuff, make string first/last consistent with null handling of numeric columns, more tests * docs * handle nil selectors, revert to primitive first/last types so groupby v1 works...	2020-01-20 11:51:54 -08:00
Suneet Saldanha	180c622e0f	Minor doc updates (#9217 ) * update string first last aggs * update kafka ingestion specs in docs * remove unnecessary parser spec	2020-01-20 11:34:37 -08:00
Gian Merlino	d21054f7c5	Remove the deprecated interval-chunking stuff. (#9216 ) * Remove the deprecated interval-chunking stuff. See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details. * Remove unused import. * Remove chunkInterval too.	2020-01-19 17:14:23 -08:00
Suneet Saldanha	d64bed79f0	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:55:45 -08:00
Suneet Saldanha	df3c1075a8	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:55:01 -08:00
Suneet Saldanha	bade2c802b	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:53:21 -08:00
Suneet Saldanha	f98b664bb0	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:52:49 -08:00
Suneet Saldanha	de231d3c80	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:50:05 -08:00
Suneet Saldanha	93167188ea	Update docs for extensions (#9218 ) * Update docs for s3 and avro extensions * More doc updates - google + cleanup	2020-01-19 12:49:33 -08:00
Clint Wylie	f0dddaa51a	fix topn aggregation on numeric columns with null values (#9183 ) * fix topn issue with aggregating on numeric columns with null values * adjustments * rename * add more tests * fix comments * more javadocs * computeIfAbsent	2020-01-17 18:12:24 -08:00
Jihoon Son	153495068b	Doc update for the new input source and the new input format (#9171 ) * Doc update for new input source and input format. - The input source and input format are promoted in all docs under docs/ingestion - All input sources including core extension ones are located in docs/ingestion/native-batch.md - All input formats and parsers including core extension ones are located in docs/ingestion/data-formats.md - New behavior of the parallel task with different partitionsSpecs is documented in docs/ingestion/native-batch.md * parquet * add warning for range partitioning with sequential mode * hdfs + s3, gs * add fs impl for gs * address comments * address comments * gcs	2020-01-17 15:52:05 -08:00
Jihoon Son	84ff0d2352	Fix TSV bugs (#9199 ) * working * - support multi-char delimiter for tsv - respect "delimiter" property for tsv * default value check for findColumnsFromHeader * remove CSVParser to have a true and only CSVParser * fix tests * fix another test	2020-01-17 15:35:14 -08:00
singh	936b9bdfd0	add deets about the keyfile (#9209 )	2020-01-17 11:24:49 -08:00
Fokko Driesprong	12b84cfb33	Bump Jackson to 2.10.2 (#9173 )	2020-01-17 11:39:32 +01:00
Vadim Ogievetsky	ab2672514b	allow empty values to be set in the auto form (#9198 )	2020-01-16 21:06:51 -08:00
Maytas Monsereenusorn	68ed2a2c8f	Fix LATEST / EARLIEST Buffer Aggregator does not work on String column (#9197 ) * fix buff limit bug * add tests * add test * add tests * fix checkstyle	2020-01-16 21:02:37 -08:00
Gian Merlino	448da78765	Speed up String first/last aggregators when folding isn't needed. (#9181 ) * Speed up String first/last aggregators when folding isn't needed. Examines the value column, and disables fold checking via a needsFoldCheck flag if that column can't possibly contain SerializableLongStringPairs. This is helpful because it avoids calling getObject on the value selector when unnecessary; say, because the time selector didn't yield an earlier or later value. * PR comments. * Move fastLooseChop to StringUtils.	2020-01-16 21:02:02 -08:00
Fokko Driesprong	486c0fd149	Bump Apache Parquet to 1.11.0 (#9129 ) * Bump Parquet to 1.11.0 * Update licenses.yaml * Add parquet-format-structures	2020-01-16 16:24:25 -08:00
Gian Merlino	bd49ec03bc	Move result-to-array logic from SQL layer into QueryToolChests. (#9130 ) * Move result-to-array logic from SQL layer into QueryToolChests. * Checkstyle adjustment. * Fix typo.	2020-01-16 15:42:10 -08:00
Gian Merlino	bfcb30e48f	Add javadocs and small improvements to join code. (#9196 ) A follow-up to #9111.	2020-01-16 15:25:38 -08:00
Maytas Monsereenusorn	42359c93dd	Implement ANY aggregator (#9187 ) * Implement ANY aggregator * Add copyright headers * Add unit tests * fix BufferAggregator * Fix bug in BufferAggregator * hook up the SQL command * add check for buffer aggregator * Address comment * address comments * add docs * Address comments * add more tests for numeric columns that have null values when run in sql compatible null mode * fix checkstyle errors * fix failing tests * fix failing tests	2020-01-16 14:40:32 -08:00
Gian Merlino	a87db7f353	Add HashJoinSegment, a virtual segment for joins. (#9111 ) * Add HashJoinSegment, a virtual segment for joins. An initial step towards #8728. This patch adds enough functionality to implement a joining cursor on top of a normal datasource. It does not include enough to actually do a query. For that, future patches will need to wire this low-level functionality into the query language. * Fixups. * Fix missing format argument. * Various tests and minor improvements. * Changes. * Remove or add tests for unused stuff. * Fix up package locations.	2020-01-16 13:14:20 -08:00
Vadim Ogievetsky	09efd20b42	fix refresh button (#9195 )	2020-01-16 10:13:47 -08:00
Suneet Saldanha	92ac22d060	Link javaOpts to middlemanager runtime.properties docs (#9101 ) * Link javaOpts to middlemanager runtime.properties docs * fix broken link * reword config links	2020-01-15 21:22:49 -08:00
Suneet Saldanha	85a3d416b0	Tutorials use new ingestion spec where possible (#9155 ) * Tutorials use new ingestion spec where possible There are 2 main changes * Use task type index_parallel instead of index * Remove the use of parser + firehose in favor of inputFormat + inputSource index_parallel is the preferred method starting in 0.17. Setting the job to index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent of an index task Instead of using a parserSpec, dimensionSpec and timestampSpec have been promoted to the dataSchema. The format is described in the ioConfig as the inputFormat. There are a few cases where the new format is not supported * Hadoop must use firehoses instead of the inputSource and inputFormat * There is no equivalent of a combining firehose as an inputSource * A Combining firehose does not support index_parallel * fix typo	2020-01-15 14:08:29 -08:00
Lucas Capistrant	4716e0b585	Fix concurrency of ComplexMetrics.java (#9134 )	2020-01-15 17:19:45 +03:00
Chi Cao Minh	b2877119d0	Suppress CVE-2019-20330 for htrace-core-4.0.1 (#9189 ) CVE-2019-20330 was updated on 14 Jan 2020, which now gets flagged by the security vulnerability scan. Since the CVE is for jackson-databind, via htrace-core-4.0.1, it can be added to the existing list of security vulnerability suppressions for that dependency.	2020-01-14 21:15:24 -08:00
Chi Cao Minh	1fd05bef9a	Add jackson-mapper-asl for hdfs-storage extension (#9178 ) Previously jackson-mapper-asl was excluded to remove a security vulnerability; however, it is required for functionality (e.g., org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator).	2020-01-14 09:50:45 -08:00
Atul Mohan	ea51bc45bf	Fix nullhandling in tests (#9119 )	2020-01-12 20:19:12 -08:00
Atul Mohan	b642b1aa5b	Fix deserialization of maxBytesInMemory (#9092 ) * Fix deserialization of maxBytesInMemory * Add maxBytes check	2020-01-12 20:08:07 -08:00
Clint Wylie	85219ece13	fix null handling for arithmetic post aggregator comparator (#9159 ) * fix null handling for arithmetic postagg comparator, add test for comparator for min/max/quantile postaggs in histogram ext * fix	2020-01-10 13:49:19 -08:00
Jonathan Wei	8c53818fa9	Add numeric nulls to sample data, fix some numeric null handling issues (#9154 ) * Fix LongSumAggregator comparator null handling * Remove unneeded GroupBy test change * Checkstyle * Update other processing tests for new sample data * Remove unused code * Fix SearchQueryRunner column selectors * Fix DimensionIndexer null handling and ScanQueryRunnerTest * Fix TeamCity errors	2020-01-10 13:49:06 -08:00
Clint Wylie	f245292e5d	add middle manager and indexer worker category to tier column of services view (#9158 )	2020-01-09 12:20:42 -08:00
Jihoon Son	e27a1e8604	Fix handling nullable writableComparable in OrcStructConverter (#9138 ) * Handle nullable writableComparable in OrcStructConverter * add missing dependency	2020-01-08 13:40:24 -08:00
Clint Wylie	7439f73c23	web console services tab treat indexer as a real service (#9139 )	2020-01-07 18:14:04 -08:00
Clint Wylie	28edd3b44e	data loader style fix for double typed columns (#9137 )	2020-01-07 16:07:30 -08:00
Jonathan Wei	d1500c1328	Update Kinesis resharding information about task failures (#9104 )	2020-01-07 15:44:48 -08:00
Clint Wylie	f540216931	fix InputFormat serde issue with SeekableStream based supervisors (#9136 )	2020-01-07 16:18:54 -06:00
Clint Wylie	c248e00984	fix moment sketch null handling (#9075 )	2020-01-07 14:15:59 -06:00
Clint Wylie	7af85250cb	null handling for doubles sketch and array of doubles sketch aggs (#9112 ) * doubles sketch and array of doubles sketch aggs now skip rows with nulls in sql compatible null handling mode * formatting	2020-01-07 14:15:32 -06:00
Clint Wylie	14702429a0	fix web console data loader dimension types (#9135 )	2020-01-06 20:56:58 -08:00
Jonathan Wei	58d337186b	Graduation update for ASF release process guide and download links (#9126 ) * Graduation update for ASF release process guide and download links * Fix release vote thread typo * Fix pom.xml	2020-01-06 15:00:33 -06:00
Gian Merlino	66657012bf	Replace CaseFilteredAggregatorRule with Calcite equivalent. (#9113 ) AggregateCaseToFilterRule was added to Calcite in https://issues.apache.org/jira/browse/CALCITE-3144, and was originally copied from Druid's CaseFilteredAggregatorRule. So there isn't a good reason to keep using our version.	2020-01-04 19:11:18 -08:00
Suneet Saldanha	bdd0d0d8a5	Add avro dependency to parquet extension (#9124 ) * Add avro dependency to parquet extension If the parquet extension is loaded and an ingestionSpec uses the older format specifying a 'parser' instead of using an 'inputFormat' the job fails with the following error java.lang.TypeNotPresentException: Type org.apache.avro.generic.GenericRecord not present This change removes the exclusion of the avro package so that the missing class can be found. * Address review comments and add dependency version	2020-01-03 20:11:13 -06:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Gian Merlino	eb124a3068	Fix DistinctCountGroupByQueryTest Y2020 bug. (#9120 ) It used data with the current timestamp alongside a query that had an end instant of 2020-01-01.	2020-01-02 21:10:32 -05:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Gian Merlino	18eb456fe6	S3: Improvements to prefix listing (including fix for an infinite loop) (#9098 ) * S3: Improvements to prefix listing (including fix for an infinite loop) 1) Fixes #9097, an infinite loop that occurs when more than one batch of objects is retrieved during a prefix listing. 2) Removes the Access Denied fallback code added in #4444. I don't think the behavior is reasonable: its purpose is to fall back from a prefix listing to a single-object access, but it's only activated when the end user supplied a prefix, so it would be better to simply fail, so the end user knows that their request for a prefix-based load is not going to work. Presumably the end user can switch from supplying 'prefixes' to supplying 'uris' if desired. 3) Filters out directory placeholders when walking prefixes. 4) Splits LazyObjectSummariesIterator into its own class and adds tests. * Adjust S3InputSourceTest. * Changes from review. * Include hamcrest-core.	2019-12-31 19:06:49 -05:00

10051 Commits