10052 Commits

Author SHA1 Message Date
Chi Cao Minh
0b0056b77f More tests for range partition parallel indexing (#9232)
Add more unit tests for range partition native batch parallel indexing.

Also, fix a bug where ParallelIndexPhaseRunner incorrectly thinks that
identical collected DimensionDistributionReports are not equal due to
not overriding equals() in DimensionDistributionReport.
2020-01-21 12:59:43 -08:00
Suneet Saldanha
a2939bbd1a Optimize JoinCondition matching (#9200)
* Optimize JoinCondition matching

The LookupJoinMatcher needs to check if a condition is always true or false
multiple times. This can be pre-computed to speed up the match checking

This change reduces the time it takes to perform a for joining on a long key
from ~ 36 ms/op to 23 ms/ op

* Rename variables

* fix typo
2020-01-21 09:11:50 -08:00
Gian Merlino
f511af1306 Fix DOCKER_HOST_IP handling for multihomed machines. (#9225)
By picking one. Otherwise, when a machine has multiple IP addresses, DOCKER_HOST_IP
would have a newline in the middle, causing havoc in configuration files.
2020-01-21 09:01:19 -08:00
Clint Wylie
8011211a0c first/last aggregators and nulls (#9161)
* null handling for numeric first/last aggregators, refactor to not extend nullable numeric agg since they are complex typed aggs

* initially null or not based on config

* review stuff, make string first/last consistent with null handling of numeric columns, more tests

* docs

* handle nil selectors, revert to primitive first/last types so groupby v1 works...
2020-01-20 11:51:54 -08:00
Suneet Saldanha
180c622e0f Minor doc updates (#9217)
* update string first last aggs

* update kafka ingestion specs in docs

* remove unnecessary parser spec
2020-01-20 11:34:37 -08:00
Gian Merlino
d21054f7c5
Remove the deprecated interval-chunking stuff. (#9216)
* Remove the deprecated interval-chunking stuff.

See https://github.com/apache/druid/pull/6591, https://github.com/apache/druid/pull/4004#issuecomment-284171911 for details.

* Remove unused import.

* Remove chunkInterval too.
2020-01-19 17:14:23 -08:00
Suneet Saldanha
d64bed79f0 Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:55:45 -08:00
Suneet Saldanha
df3c1075a8 Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:55:01 -08:00
Suneet Saldanha
bade2c802b Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:53:21 -08:00
Suneet Saldanha
f98b664bb0 Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:52:49 -08:00
Suneet Saldanha
de231d3c80 Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:50:05 -08:00
Suneet Saldanha
93167188ea Update docs for extensions (#9218)
* Update docs for s3 and avro extensions

* More doc updates - google + cleanup
2020-01-19 12:49:33 -08:00
Clint Wylie
f0dddaa51a fix topn aggregation on numeric columns with null values (#9183)
* fix topn issue with aggregating on numeric columns with null values

* adjustments

* rename

* add more tests

* fix comments

* more javadocs

* computeIfAbsent
2020-01-17 18:12:24 -08:00
Jihoon Son
153495068b Doc update for the new input source and the new input format (#9171)
* Doc update for new input source and input format.

- The input source and input format are promoted in all docs under docs/ingestion
- All input sources including core extension ones are located in docs/ingestion/native-batch.md
- All input formats and parsers including core extension ones are localted in docs/ingestion/data-formats.md
- New behavior of the parallel task with different partitionsSpecs are documented in docs/ingestion/native-batch.md

* parquet

* add warning for range partitioning with sequential mode

* hdfs + s3, gs

* add fs impl for gs

* address comments

* address comments

* gcs
2020-01-17 15:52:05 -08:00
Jihoon Son
84ff0d2352
Fix TSV bugs (#9199)
* working

* - support multi-char delimiter for tsv
- respect "delimiter" property for tsv

* default value check for findColumnsFromHeader

* remove CSVParser to have a true and only CSVParser

* fix tests

* fix another test
2020-01-17 15:35:14 -08:00
singh
936b9bdfd0 add deets about the keyfile (#9209) 2020-01-17 11:24:49 -08:00
Fokko Driesprong
12b84cfb33
Bump Jackson to 2.10.2 (#9173) 2020-01-17 11:39:32 +01:00
Vadim Ogievetsky
ab2672514b allow empty values to be set in the auto form (#9198) 2020-01-16 21:06:51 -08:00
Maytas Monsereenusorn
68ed2a2c8f Fix LATEST / EARLIEST Buffer Aggregator does not work on String column (#9197)
* fix buff limit bug

* add tests

* add test

* add tests

* fix checkstyle
2020-01-16 21:02:37 -08:00
Gian Merlino
448da78765 Speed up String first/last aggregators when folding isn't needed. (#9181)
* Speed up String first/last aggregators when folding isn't needed.

Examines the value column, and disables fold checking via a needsFoldCheck
flag if that column can't possibly contain SerializableLongStringPairs. This
is helpful because it avoids calling getObject on the value selector when
unnecessary; say, because the time selector didn't yield an earlier or later
value.

* PR comments.

* Move fastLooseChop to StringUtils.
2020-01-16 21:02:02 -08:00
Fokko Driesprong
486c0fd149 Bump Apache Parquet to 1.11.0 (#9129)
* Bump Parquet to 1.11.0

* Update licenses.yaml

* Add parquet-format-structures
2020-01-16 16:24:25 -08:00
Gian Merlino
bd49ec03bc
Move result-to-array logic from SQL layer into QueryToolChests. (#9130)
* Move result-to-array logic from SQL layer into QueryToolChests.

* Checkstyle adjustment.

* Fix typo.
2020-01-16 15:42:10 -08:00
Gian Merlino
bfcb30e48f
Add javadocs and small improvements to join code. (#9196)
A follow-up to #9111.
2020-01-16 15:25:38 -08:00
Maytas Monsereenusorn
42359c93dd Implement ANY aggregator (#9187)
* Implement ANY aggregator

* Add copyright headers

* Add unit tests

* fix BufferAggregator

* Fix bug in BufferAggregator

* hook up the SQL command

* add check for buffer aggregator

* Address comment

* address comments

* add docs

* Address comments

* add more tests for numeric columns that have null values when run in sql compatible null mode

* fix checkstyle errors

* fix failing tests

* fix failing tests
2020-01-16 14:40:32 -08:00
Gian Merlino
a87db7f353
Add HashJoinSegment, a virtual segment for joins. (#9111)
* Add HashJoinSegment, a virtual segment for joins.

An initial step towards #8728. This patch adds enough functionality to implement a joining
cursor on top of a normal datasource. It does not include enough to actually do a query. For
that, future patches will need to wire this low-level functionality into the query language.

* Fixups.

* Fix missing format argument.

* Various tests and minor improvements.

* Changes.

* Remove or add tests for unused stuff.

* Fix up package locations.
2020-01-16 13:14:20 -08:00
Vadim Ogievetsky
09efd20b42
fix refresh button (#9195) 2020-01-16 10:13:47 -08:00
Suneet Saldanha
92ac22d060 Link javaOpts to middlemanager runtime.properties docs (#9101)
* Link javaOpts to middlemanager runtime.properties docs

* fix broken link

* reword config links
2020-01-15 21:22:49 -08:00
Suneet Saldanha
85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks(1) is the equivalent
of an index task

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
Lucas Capistrant
4716e0b585 Fix concurrency of ComplexMetrics.java (#9134) 2020-01-15 17:19:45 +03:00
Chi Cao Minh
b2877119d0 Suppress CVE-2019-20330 for htrace-core-4.0.1 (#9189)
CVE-2019-20330 was updated on 14 Jan 2020, which now gets flagged by the
security vulnerability scan. Since the CVE is for jackson-databind, via
htrace-core-4.0.1, it can be added to the existing list of security
vulnerability suppressions for that dependency.
2020-01-14 21:15:24 -08:00
Chi Cao Minh
1fd05bef9a Add jackson-mapper-asl for hdfs-storage extension (#9178)
Previously jackson-mapper-asl was excluded to remove a security
vulnerability; however, it is required for functionality (e.g.,
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator).
2020-01-14 09:50:45 -08:00
Atul Mohan
ea51bc45bf Fix nullhandling in tests (#9119) 2020-01-12 20:19:12 -08:00
Atul Mohan
b642b1aa5b Fix deserialization of maxBytesInMemory (#9092)
* Fix deserialization of maxBytesInMemory

* Add maxBytes check
2020-01-12 20:08:07 -08:00
Clint Wylie
85219ece13 fix null handling for arithmetic post aggregator comparator (#9159)
* fix null handling for arithmetic postagg comparator, add test for comparator for min/max/quantile postaggs in histogram ext

* fix
2020-01-10 13:49:19 -08:00
Jonathan Wei
8c53818fa9
Add numeric nulls to sample data, fix some numeric null handling issues (#9154)
* Fix LongSumAggregator comparator null handling

* Remove unneeded GroupBy test change

* Checkstyle

* Update other processing tests for new sample data

* Remove unused code

* Fix SearchQueryRunner column selectors

* Fix DimensionIndexer null handling and ScanQueryRunnerTest

* Fix TeamCity errors
2020-01-10 13:49:06 -08:00
Clint Wylie
f245292e5d add middle manager and indexer worker category to tier column of services view (#9158) 2020-01-09 12:20:42 -08:00
Jihoon Son
e27a1e8604
Fix handling nullable writableComparable in OrcStructConverter (#9138)
* Handle nullable writableComparable in OrcStructConverter

* add missing dependency
2020-01-08 13:40:24 -08:00
Clint Wylie
7439f73c23 web console services tab treat indexer as a real service (#9139) 2020-01-07 18:14:04 -08:00
Clint Wylie
28edd3b44e data loader style fix for double typed columns (#9137) 2020-01-07 16:07:30 -08:00
Jonathan Wei
d1500c1328 Update Kinesis resharding information about task failures (#9104) 2020-01-07 15:44:48 -08:00
Clint Wylie
f540216931 fix InputFormat serde issue with SeekableStream based supervisors (#9136) 2020-01-07 16:18:54 -06:00
Clint Wylie
c248e00984 fix moment sketch null handling (#9075) 2020-01-07 14:15:59 -06:00
Clint Wylie
7af85250cb null handling for doubles sketch and array of doubles sketch aggs (#9112)
* doubles sketch and array of doubles sketch aggs now skip rows with nulls in sql compatible null handling mode

* formatting
2020-01-07 14:15:32 -06:00
Clint Wylie
14702429a0 fix web console data loader dimension types (#9135) 2020-01-06 20:56:58 -08:00
Jonathan Wei
58d337186b
Graduation update for ASF release process guide and download links (#9126)
* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml
2020-01-06 15:00:33 -06:00
Gian Merlino
66657012bf Replace CaseFilteredAggregatorRule with Calcite equivalent. (#9113)
AggregateCaseToFilterRule was added to Calcite in https://issues.apache.org/jira/browse/CALCITE-3144,
and was originally copied from Druid's CaseFilteredAggregatorRule. So there isn't a good reason to
keep using our version.
2020-01-04 19:11:18 -08:00
Suneet Saldanha
bdd0d0d8a5 Add avro dependency to parquet extension (#9124)
* Add avro dependency to parquet extension

If the parquet extension is loaded and an ingestionSpec uses the older format
specifying a 'parser' instead of using an 'inputFormat' the job fails
with the following error

java.lang.TypeNotPresentException: Type org.apache.avro.generic.GenericRecord not present

This change removes the exclusion of the avro package so that the missing
class can be found.

* Address review comments and add dependency version
2020-01-03 20:11:13 -06:00
Jonathan Wei
aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Gian Merlino
eb124a3068
Fix DistinctCountGroupByQueryTest Y2020 bug. (#9120)
It used data with the current timestamp alongside a query that had an end
instant of 2020-01-01.
2020-01-02 21:10:32 -05:00
Jonathan Wei
4e8368a5d9 Set version to 0.18.0-SNAPSHOT (#9109) 2020-01-02 17:55:10 -05:00