druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	66657012bf	Replace CaseFilteredAggregatorRule with Calcite equivalent. (#9113 ) AggregateCaseToFilterRule was added to Calcite in https://issues.apache.org/jira/browse/CALCITE-3144, and was originally copied from Druid's CaseFilteredAggregatorRule. So there isn't a good reason to keep using our version.	2020-01-04 19:11:18 -08:00
Suneet Saldanha	bdd0d0d8a5	Add avro dependency to parquet extension (#9124 ) * Add avro dependency to parquet extension If the parquet extension is loaded and an ingestionSpec uses the older format specifying a 'parser' instead of using an 'inputFormat' the job fails with the following error java.lang.TypeNotPresentException: Type org.apache.avro.generic.GenericRecord not present This change removes the exclusion of the avro package so that the missing class can be found. * Address review comments and add dependency version	2020-01-03 20:11:13 -06:00
Jonathan Wei	aa539177ec	De-incubation cleanup in code, docs, packaging (#9108 ) * De-incubation cleanup in code, docs, packaging * remove unused docs script	2020-01-03 12:33:19 -05:00
Gian Merlino	eb124a3068	Fix DistinctCountGroupByQueryTest Y2020 bug. (#9120 ) It used data with the current timestamp alongside a query that had an end instant of 2020-01-01.	2020-01-02 21:10:32 -05:00
Jonathan Wei	4e8368a5d9	Set version to 0.18.0-SNAPSHOT (#9109 )	2020-01-02 17:55:10 -05:00
Gian Merlino	18eb456fe6	S3: Improvements to prefix listing (including fix for an infinite loop) (#9098 ) * S3: Improvements to prefix listing (including fix for an infinite loop) 1) Fixes #9097, an infinite loop that occurs when more than one batch of objects is retrieved during a prefix listing. 2) Removes the Access Denied fallback code added in #4444. I don't think the behavior is reasonable: its purpose is to fall back from a prefix listing to a single-object access, but it's only activated when the end user supplied a prefix, so it would be better to simply fail, so the end user knows that their request for a prefix-based load is not going to work. Presumably the end user can switch from supplying 'prefixes' to supplying 'uris' if desired. 3) Filters out directory placeholders when walking prefixes. 4) Splits LazyObjectSummariesIterator into its own class and adds tests. * Adjust S3InputSourceTest. * Changes from review. * Include hamcrest-core.	2019-12-31 19:06:49 -05:00
Suneet Saldanha	dec619ebf4	Optimize CachingLocalSegmentAllocator#getSequenceName (#8909 ) * Optimize CachingLocalSegmentAllocator#getSequenceName Replace StringUtils#format with string addition to generate the sequence name for an interval and partition. This is faster because format uses a Matcher under the covers to replace the string format with the variables. * fix imports and add test * Add comment about optimization * Use renamed function for TaskToolbox * Move tests after refactor * Rename tests	2019-12-23 18:33:22 -08:00
Vadim Ogievetsky	320c50d24a	Web console: fix spec reset (#9081 ) * extract spec type * better text * better copy * de incubate the console * fix status dialog scss	2019-12-23 18:23:14 -08:00
Samarth Jain	9ec9619143	Handle null values for metrics in TDigest aggregators. (#9073 ) Add support for rollup during ingestion.	2019-12-23 17:49:06 -08:00
Vadim Ogievetsky	a24e2f347f	make supervisor statistics dialog more robust (#9089 )	2019-12-23 17:43:08 -08:00
Benedict Jin	7a7c948595	Exclude .asf.yaml from the configuration of the rat plugin (#9088 )	2019-12-23 13:08:23 -08:00
Fangjin Yang	2231e69b7f	Update README.md	2019-12-20 20:56:53 -08:00
Chi Cao Minh	513bb1f6da	Get proper Kinesis index task AWS credentials (#9082 ) Previously, the configured S3 credentials would be used instead of the ones configured for Kinesis for Kinesis index tasks.	2019-12-20 19:35:05 -08:00
Gian Merlino	342107b4c2	Add .asf.yaml. (#9083 ) Based on the docs at https://cwiki.apache.org/confluence/display/INFRA/.asf.yaml+features+for+git+repositories.	2019-12-20 16:45:38 -08:00
Clint Wylie	8ccce9857a	fix vectorized query engine numeric filter matchers against null values (#9063 ) * fix druid-sql issue with filtering numeric columns by null values * fix vector numeric column matchers to check null vector for null matches	2019-12-20 13:15:48 -08:00
Fangjin Yang	60d896a67c	Update README.md	2019-12-19 22:32:08 -08:00
Clint Wylie	c2e9ab8100	benchmark schema with numeric dimensions and null column values (#9036 ) * benchmark schema with null column values * oops * adjustments * rename again, different null percentage so rows more variety * more schema	2019-12-19 17:45:19 -08:00
Jihoon Son	3c31493772	Add missing docs for http client configurations (#9054 ) * Add missing docs for http client configurations * fix typo * backticks	2019-12-19 17:41:04 -08:00
Suneet Saldanha	3c13444167	Fix flaky ITBasicAuthConfigurationTest (#9072 ) This test was failing to authenticate using the admin credentials. These should be available by default in the metadata store. This indicates that the credentials are not successfully being syncd before the test is run. This change increases the number of retries to 20 so that the services are syncd before the test runs	2019-12-19 17:38:55 -08:00
Suneet Saldanha	176bc8fd97	Remove resolve-ip dependency for integration-tests (#9065 ) * Remove resolve-ip dependency for integration-tests * use host hostname and fallback to dscacheutil * better shell script comparisons	2019-12-19 14:53:36 -08:00
Fangjin Yang	256b8f69b6	Update README.md (#9078 )	2019-12-19 13:00:27 -08:00
Fangjin Yang	d20d2ff71d	Update README.md (#9077 )	2019-12-19 11:54:14 -08:00
Fangjin Yang	de18f76c8b	Update README.md (#9074 ) Updates to readme	2019-12-19 11:39:27 -08:00
Clint Wylie	84ef8b819e	fix druid-sql issue with filtering numeric columns by null values (#9061 ) * fix druid-sql issue with filtering numeric columns by null values * fix tests * fix tests for reals	2019-12-18 13:30:34 -08:00
Jihoon Son	94a23fb17e	Fix flaky realtime index task tests (#8999 ) * Fix flaky realtime index task tests * fix ITAppenderatorDriverRealtimeIndexTaskTest * fix comment * address comments	2019-12-18 13:25:00 -08:00
Jonathan Wei	15884f6d10	Fix hadoop ingestion property handling when using indexers (#9059 )	2019-12-18 12:13:19 -08:00
Jonathan Wei	b1547a76b1	Update GPG key instructions for ASF release guide (#9006 )	2019-12-18 12:12:48 -08:00
Suneet Saldanha	1fb93d56c3	Add instructions to backport a PR (#9052 ) * Add instructions to backport a PR * Clearer image * Add period in backport instructions	2019-12-18 11:57:01 -08:00
Chi Cao Minh	6178f05da6	Fail superbatch range partition multi dim values (#9058 ) * Fail superbatch range partition multi dim values Change the behavior of parallel indexing range partitioning to fail ingestion if any row had multiple values for the partition dimension. After this change, the behavior matches that of hadoop indexing. (Previously, rows with multiple dimension values would be skipped.) * Improve err msg, rename method, rename test class	2019-12-18 10:14:03 -08:00
Jonathan Wei	131b3f13be	Skip non-Apache repo PRs in milestone tagging script (#9064 )	2019-12-17 18:28:11 -08:00
Vadim Ogievetsky	e7b1653d88	add button to reapply retention rules (#9055 )	2019-12-17 18:08:57 -08:00
Benedict Jin	24be558347	Fix NPE for subquery with limit (#8775 ) * Fix NPE for subquery with limit * Mark it as unplannable by returning null * Migrate testcases from SqlResourceTest to CalciteQueryTest * Throw CannotBuildQueryException * Fix typo * Patch comments	2019-12-17 10:21:12 -08:00
Suneet Saldanha	301c0649a7	Fix equalsAndHashCode in ClientCompactQueryTuningConfig (#9035 ) * Fix equalsAndHashCode in ClientCompactQueryTuningConfig This change introduces a dependency to EqualsVerifier for the test scope. The dependency is licensed under Apache 2. The library makes it trivial to add equals and hashCode checks to prevent bugs like this from happening in the future * fix checkstyle * fix test name	2019-12-16 14:33:00 -08:00
Jihoon Son	298425a33a	Fix handling interruptedException in resource pool (#9044 )	2019-12-16 09:41:13 -08:00
Clint Wylie	bc16ff5e7c	sql auto limit wrapping fix (#9043 ) * sql auto limit wrapping fix * fix tests and style * remove setImportance	2019-12-16 01:38:24 -08:00
Clint Wylie	6881535b48	docs - clarify cache parameters (#9020 )	2019-12-13 16:53:45 -08:00
Gian Merlino	d452cbbb82	GenericIndexedWriter: Fix issue when writing large values to large columns. (#9029 )	2019-12-13 15:33:14 -08:00
Suneet Saldanha	3325da1718	Allow startup scripts to specify java home (#9021 ) * Allow startup scripts to specify java home The startup scripts now look for java in 3 locations. The order is from most related to druid to least, ie ${DRUID_JAVA_HOME} ${JAVA_HOME} ${PATH} * Update fn names and clean up code * final round of fixes * fix spellcheck	2019-12-12 21:36:00 -08:00
Fangyuan Deng	41f30e53a6	[bugfix]fix getAvgSizePerGranularity logic in DerivativeDataSourceManager(materializedview) (#8929 ) * fix getAvgSizePerGranularity in DerivativeDataSourceManager * revert * redo	2019-12-12 17:27:02 -08:00
Himanshu	9236dd9467	optionally enable Jetty ForwardedRequestCustomizer (#9010 ) * optionally enable Jetty ForwardedRequestCustomizer * fix doc build	2019-12-12 17:00:08 -08:00
Himanshu	45101183bc	HRTR: make pending task execution handling to go through all tasks on not finding worker slots (#8697 ) * HRTR: make pending task execution handling to go through all tasks on not finding worker slots * make HRTR methods package private that are meant to be used only in HttpRemoteTaskRunnerResource * mark HttpRemoteTaskRunnerWorkItem.State global variables final * hrtr: move immutableWorker NULL check outside of try-catch or finally block could have NPE * add some explanatory comments * add comment on explaining mechanics around hand off of pending tasks from submission to it getting picked up by a task execution thread * fix spelling	2019-12-12 14:58:52 -08:00
Xavier Léauté	810b85a352	allow druid.host to be undefined to use canonical hostname (#9019 ) It is currently not possible to unset the druid.host property in the docker image to let Druid default to the canonical hostname. It always gets set to the container's IP address. Passing the override environment variable druid_host= unfortunately does not solve the problem, as this gets interpreted as empty string and does not let the default kick in. This change adds the option to pass DRUID_SET_HOST=0 as environment variable to disable the default behavior, and allows passing a common runtime.properties file without druid.host.	2019-12-12 13:51:57 -08:00
Benjamin Hopp	13c33c1766	Update architecture.md (#9015 )	2019-12-11 19:05:50 -08:00
Jihoon Son	66056b2826	Using annotation to distinguish Hadoop Configuration in each module (#9013 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test * Using annotation to distinguish Hadoop Configuration in each module	2019-12-11 17:30:44 -08:00
Jihoon Son	e5e1e9c4ee	Fix broken master (#9005 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test	2019-12-11 15:56:36 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Parag Jain	24fe824055	add readiness endpoints to processes having initialization delays (#8841 )	2019-12-10 17:26:13 -08:00
Chi Cao Minh	3de7ab8523	DataSketches jars in core (#9003 ) Having DataSketches jars in core will allow potential improvements, for example: - Provide an alternative implementation of HLL: https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html - Range partitioning for native parallel batch indexing without having the user load extensions on the classpath Dev mailing list discussion: https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E	2019-12-10 14:02:34 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Vadim Ogievetsky	a6dcc99962	better input format detection (#9007 )	2019-12-09 22:31:28 -08:00

10007 Commits