10025 Commits

Author SHA1 Message Date
Suneet Saldanha 85a3d416b0 Tutorials use new ingestion spec where possible (#9155)
* Tutorials use new ingestion spec where possible

There are 2 main changes
  * Use task type index_parallel instead of index
  * Remove the use of parser + firehose in favor of inputFormat + inputSource

index_parallel is the preferred method starting in 0.17. Setting the job to
index_parallel with the default maxNumConcurrentSubTasks (1) is equivalent
to an index task.

Instead of using a parserSpec, dimensionSpec and timestampSpec have been
promoted to the dataSchema. The format is described in the ioConfig as the
inputFormat.

There are a few cases where the new format is not supported
 * Hadoop must use firehoses instead of the inputSource and inputFormat
 * There is no equivalent of a combining firehose as an inputSource
 * A Combining firehose does not support index_parallel

* fix typo
2020-01-15 14:08:29 -08:00
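
The spec layout described in #9155 above can be sketched as follows. This is a minimal illustration, not a copy of the tutorial files: the datasource name, column names, directory, and file filter are placeholder assumptions, and the helper `build_index_parallel_spec` is hypothetical. Note that the field is spelled `dimensionsSpec` in the spec itself.

```python
import json


def build_index_parallel_spec(datasource, base_dir, file_filter):
    """Build a native batch spec in the inputSource + inputFormat style.

    Hypothetical helper for illustration; all values passed in are placeholders.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # timestampSpec and dimensionsSpec live directly in the
                # dataSchema instead of inside a parser.
                "timestampSpec": {"column": "time", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["channel", "page", "user"]},
            },
            "ioConfig": {
                "type": "index_parallel",
                # inputSource says where the data is; inputFormat says how to read it.
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                # With one concurrent subtask this behaves like a plain index task.
                "maxNumConcurrentSubTasks": 1,
            },
        },
    }


spec = build_index_parallel_spec("example_datasource", "path/to/data", "*.json")
print(json.dumps(spec, indent=2))
```

The resulting dictionary has no `parser` or `firehose` key anywhere, which is the main difference from the older format.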
Lucas Capistrant 4716e0b585 Fix concurrency of ComplexMetrics.java (#9134) 2020-01-15 17:19:45 +03:00
Chi Cao Minh b2877119d0 Suppress CVE-2019-20330 for htrace-core-4.0.1 (#9189)
CVE-2019-20330 was updated on 14 Jan 2020, which now gets flagged by the
security vulnerability scan. Since the CVE is for jackson-databind, via
htrace-core-4.0.1, it can be added to the existing list of security
vulnerability suppressions for that dependency.
2020-01-14 21:15:24 -08:00
Chi Cao Minh 1fd05bef9a Add jackson-mapper-asl for hdfs-storage extension (#9178)
Previously jackson-mapper-asl was excluded to remove a security
vulnerability; however, it is required for functionality (e.g.,
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator).
2020-01-14 09:50:45 -08:00
Atul Mohan ea51bc45bf Fix nullhandling in tests (#9119) 2020-01-12 20:19:12 -08:00
Atul Mohan b642b1aa5b Fix deserialization of maxBytesInMemory (#9092)
* Fix deserialization of maxBytesInMemory

* Add maxBytes check
2020-01-12 20:08:07 -08:00
Clint Wylie 85219ece13 fix null handling for arithmetic post aggregator comparator (#9159)
* fix null handling for arithmetic postagg comparator, add test for comparator for min/max/quantile postaggs in histogram ext

* fix
2020-01-10 13:49:19 -08:00
Jonathan Wei 8c53818fa9 Add numeric nulls to sample data, fix some numeric null handling issues (#9154)
* Fix LongSumAggregator comparator null handling

* Remove unneeded GroupBy test change

* Checkstyle

* Update other processing tests for new sample data

* Remove unused code

* Fix SearchQueryRunner column selectors

* Fix DimensionIndexer null handling and ScanQueryRunnerTest

* Fix TeamCity errors
2020-01-10 13:49:06 -08:00
Clint Wylie f245292e5d add middle manager and indexer worker category to tier column of services view (#9158) 2020-01-09 12:20:42 -08:00
Jihoon Son e27a1e8604 Fix handling nullable writableComparable in OrcStructConverter (#9138)
* Handle nullable writableComparable in OrcStructConverter

* add missing dependency
2020-01-08 13:40:24 -08:00
Clint Wylie 7439f73c23 web console services tab treat indexer as a real service (#9139) 2020-01-07 18:14:04 -08:00
Clint Wylie 28edd3b44e data loader style fix for double typed columns (#9137) 2020-01-07 16:07:30 -08:00
Jonathan Wei d1500c1328 Update Kinesis resharding information about task failures (#9104) 2020-01-07 15:44:48 -08:00
Clint Wylie f540216931 fix InputFormat serde issue with SeekableStream based supervisors (#9136) 2020-01-07 16:18:54 -06:00
Clint Wylie c248e00984 fix moment sketch null handling (#9075) 2020-01-07 14:15:59 -06:00
Clint Wylie 7af85250cb null handling for doubles sketch and array of doubles sketch aggs (#9112)
* doubles sketch and array of doubles sketch aggs now skip rows with nulls in sql compatible null handling mode

* formatting
2020-01-07 14:15:32 -06:00
Clint Wylie 14702429a0 fix web console data loader dimension types (#9135) 2020-01-06 20:56:58 -08:00
Jonathan Wei 58d337186b Graduation update for ASF release process guide and download links (#9126)
* Graduation update for ASF release process guide and download links

* Fix release vote thread typo

* Fix pom.xml
2020-01-06 15:00:33 -06:00
Gian Merlino 66657012bf Replace CaseFilteredAggregatorRule with Calcite equivalent. (#9113)
AggregateCaseToFilterRule was added to Calcite in https://issues.apache.org/jira/browse/CALCITE-3144,
and was originally copied from Druid's CaseFilteredAggregatorRule. So there isn't a good reason to
keep using our version.
2020-01-04 19:11:18 -08:00
Suneet Saldanha bdd0d0d8a5 Add avro dependency to parquet extension (#9124)
* Add avro dependency to parquet extension

If the parquet extension is loaded and an ingestionSpec uses the older format,
specifying a 'parser' instead of an 'inputFormat', the job fails
with the following error:

java.lang.TypeNotPresentException: Type org.apache.avro.generic.GenericRecord not present

This change removes the exclusion of the avro package so that the missing
class can be found.

* Address review comments and add dependency version
2020-01-03 20:11:13 -06:00
Jonathan Wei aa539177ec De-incubation cleanup in code, docs, packaging (#9108)
* De-incubation cleanup in code, docs, packaging

* remove unused docs script
2020-01-03 12:33:19 -05:00
Gian Merlino eb124a3068 Fix DistinctCountGroupByQueryTest Y2020 bug. (#9120)
It used data with the current timestamp alongside a query that had an end
instant of 2020-01-01.
2020-01-02 21:10:32 -05:00
Jonathan Wei 4e8368a5d9 Set version to 0.18.0-SNAPSHOT (#9109) 2020-01-02 17:55:10 -05:00
Gian Merlino 18eb456fe6 S3: Improvements to prefix listing (including fix for an infinite loop) (#9098)
* S3: Improvements to prefix listing (including fix for an infinite loop)

1) Fixes #9097, an infinite loop that occurs when more than one batch
   of objects is retrieved during a prefix listing.

2) Removes the Access Denied fallback code added in #4444. I don't think
   the behavior is reasonable: its purpose is to fall back from a prefix
   listing to a single-object access, but it's only activated when the
   end user supplied a prefix, so it would be better to simply fail, so
   the end user knows that their request for a prefix-based load is not
   going to work. Presumably the end user can switch from supplying
   'prefixes' to supplying 'uris' if desired.

3) Filters out directory placeholders when walking prefixes.

4) Splits LazyObjectSummariesIterator into its own class and adds tests.

* Adjust S3InputSourceTest.

* Changes from review.

* Include hamcrest-core.
2019-12-31 19:06:49 -05:00
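
Points 1 and 3 of #9098 above concern a prefix listing that arrives in more than one batch. The sketch below is not the Druid implementation and does not reproduce the actual bug. It only illustrates, under stated assumptions, one general shape for a batched listing loop that terminates: each request carries forward the continuation token from the previous response, and the loop stops when no token is returned. `Page`, `iter_prefix`, and `fake_list_page` are hypothetical names, and treating any key that ends in `/` as a directory placeholder is a simplifying assumption.

```python
from typing import Callable, Iterator, List, NamedTuple, Optional


class Page(NamedTuple):
    """One batch of a listing: the keys plus a token for the next batch, if any."""
    keys: List[str]
    next_token: Optional[str]


def iter_prefix(
    list_page: Callable[[str, Optional[str]], Page], prefix: str
) -> Iterator[str]:
    """Yield every object key under prefix, following continuation tokens."""
    token = None
    while True:
        page = list_page(prefix, token)
        for key in page.keys:
            # Assumption: a key ending in "/" is a directory placeholder.
            if key.endswith("/"):
                continue
            yield key
        if page.next_token is None:
            return
        # Advance the token; reusing the old one would fetch the same batch forever.
        token = page.next_token


def fake_list_page(prefix: str, token: Optional[str]) -> Page:
    """Stand-in for a storage client that returns at most two keys per call."""
    objects = ["logs/", "logs/a.json", "logs/b.json", "logs/sub/", "logs/sub/c.json"]
    matching = [k for k in objects if k.startswith(prefix)]
    start = int(token) if token else 0
    end = start + 2
    return Page(matching[start:end], str(end) if end < len(matching) else None)


keys = list(iter_prefix(fake_list_page, "logs/"))
print(keys)
```

With the fake client above, the listing takes three batches and the two placeholder keys are filtered out.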
Suneet Saldanha dec619ebf4 Optimize CachingLocalSegmentAllocator#getSequenceName (#8909)
* Optimize CachingLocalSegmentAllocator#getSequenceName

Replace StringUtils#format with string addition to generate the sequence
name for an interval and partition. This is faster because format uses a
Matcher under the covers to replace the string format with the variables.

* fix imports and add test

* Add comment about optimization

* Use renamed function for TaskToolbox

* Move tests after refactor

* Rename tests
2019-12-23 18:33:22 -08:00
Vadim Ogievetsky 320c50d24a Web console: fix spec reset (#9081)
* extract spec type

* better text

* better copy

* de incubate the console

* fix status dialog scss
2019-12-23 18:23:14 -08:00
Samarth Jain 9ec9619143 Handle null values for metrics in TDigest aggregators. (#9073)
Add support for rollup during ingestion.
2019-12-23 17:49:06 -08:00
Vadim Ogievetsky a24e2f347f make supervisor statistics dialog more robust (#9089) 2019-12-23 17:43:08 -08:00
Benedict Jin 7a7c948595 Exclude .asf.yaml from the configuration of the rat plugin (#9088) 2019-12-23 13:08:23 -08:00
Fangjin Yang 2231e69b7f Update README.md 2019-12-20 20:56:53 -08:00
Chi Cao Minh 513bb1f6da Get proper Kinesis index task AWS credentials (#9082)
Previously, the configured S3 credentials would be used instead of the
ones configured for Kinesis for Kinesis index tasks.
2019-12-20 19:35:05 -08:00
Gian Merlino 342107b4c2 Add .asf.yaml. (#9083)
Based on the docs at https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories.
2019-12-20 16:45:38 -08:00
Clint Wylie 8ccce9857a fix vectorized query engine numeric filter matchers against null values (#9063)
* fix druid-sql issue with filtering numeric columns by null values

* fix vector numeric column matchers to check null vector for null matches
2019-12-20 13:15:48 -08:00
Fangjin Yang 60d896a67c Update README.md 2019-12-19 22:32:08 -08:00
Clint Wylie c2e9ab8100 benchmark schema with numeric dimensions and null column values (#9036)
* benchmark schema with null column values

* oops

* adjustments

* rename again, use different null percentages so rows have more variety

* more schema
2019-12-19 17:45:19 -08:00
Jihoon Son 3c31493772 Add missing docs for http client configurations (#9054)
* Add missing docs for http client configurations

* fix typo

* backticks
2019-12-19 17:41:04 -08:00
Suneet Saldanha 3c13444167 Fix flaky ITBasicAuthConfigurationTest (#9072)
This test was failing to authenticate using the admin credentials. These
should be available by default in the metadata store. This indicates that
the credentials are not successfully being synced before the test is run.

This change increases the number of retries to 20 so that the services
are synced before the test runs.
2019-12-19 17:38:55 -08:00
Suneet Saldanha 176bc8fd97 Remove resolve-ip dependency for integration-tests (#9065)
* Remove resolve-ip dependency for integration-tests

* use host hostname and fallback to dscacheutil

* better shell script comparisons
2019-12-19 14:53:36 -08:00
Fangjin Yang 256b8f69b6 Update README.md (#9078) 2019-12-19 13:00:27 -08:00
Fangjin Yang d20d2ff71d Update README.md (#9077) 2019-12-19 11:54:14 -08:00
Fangjin Yang de18f76c8b Update README.md (#9074)
Updates to readme
2019-12-19 11:39:27 -08:00
Clint Wylie 84ef8b819e fix druid-sql issue with filtering numeric columns by null values (#9061)
* fix druid-sql issue with filtering numeric columns by null values

* fix tests

* fix tests for reals
2019-12-18 13:30:34 -08:00
Jihoon Son 94a23fb17e Fix flaky realtime index task tests (#8999)
* Fix flaky realtime index task tests

* fix ITAppenderatorDriverRealtimeIndexTaskTest

* fix comment

* address comments
2019-12-18 13:25:00 -08:00
Jonathan Wei 15884f6d10 Fix hadoop ingestion property handling when using indexers (#9059) 2019-12-18 12:13:19 -08:00
Jonathan Wei b1547a76b1 Update GPG key instructions for ASF release guide (#9006) 2019-12-18 12:12:48 -08:00
Suneet Saldanha 1fb93d56c3 Add instructions to backport a PR (#9052)
* Add instructions to backport a PR

* Clearer image

* Add period in backport instructions
2019-12-18 11:57:01 -08:00
Chi Cao Minh 6178f05da6 Fail superbatch range partition multi dim values (#9058)
* Fail superbatch range partition multi dim values

Change the behavior of parallel indexing range partitioning to fail
ingestion if any row had multiple values for the partition dimension.
After this change, the behavior matches that of hadoop indexing.
(Previously, rows with multiple dimension values would be skipped.)

* Improve err msg, rename method, rename test class
2019-12-18 10:14:03 -08:00
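
The behavior change in #9058 above (fail instead of skip) can be illustrated with a small sketch. This is not the Druid code: `partition_value` is a hypothetical function, rows are modeled as plain dictionaries, and the error type and message are assumptions.

```python
def partition_value(row, partition_dimension):
    """Return the single partition-dimension value for a row.

    Raises ValueError if the row has more than one value for the dimension,
    rather than silently skipping the row.
    """
    value = row.get(partition_dimension)
    if isinstance(value, (list, tuple)):
        if len(value) > 1:
            raise ValueError(
                "Cannot partition on multi-value dimension "
                f"'{partition_dimension}': {list(value)}"
            )
        # An empty list is treated as a missing value; one element is unwrapped.
        return value[0] if value else None
    return value


print(partition_value({"country": "US"}, "country"))
print(partition_value({"country": ["US"]}, "country"))
try:
    partition_value({"country": ["US", "CA"]}, "country")
except ValueError as err:
    print("ingestion would fail:", err)
```

Failing fast makes the problem visible to whoever submitted the job, whereas skipping would quietly drop rows.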
Jonathan Wei 131b3f13be Skip non-Apache repo PRs in milestone tagging script (#9064) 2019-12-17 18:28:11 -08:00
Vadim Ogievetsky e7b1653d88 add button to reapply retention rules (#9055) 2019-12-17 18:08:57 -08:00
Benedict Jin 24be558347 Fix NPE for subquery with limit (#8775)
* Fix NPE for subquery with limit

* Mark it as unplannable by returning null

* Migrate testcases from SqlResourceTest to CalciteQueryTest

* Throw CannotBuildQueryException

* Fix typo

* Patch comments
2019-12-17 10:21:12 -08:00