druid

Commit Graph

Author	SHA1	Message	Date
Jihoon Son	66056b2826	Using annotation to distinguish Hadoop Configuration in each module (#9013 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test * Using annotation to distinguish Hadoop Configuration in each module	2019-12-11 17:30:44 -08:00
Jihoon Son	e5e1e9c4ee	Fix broken master (#9005 ) * Multibinding for NodeRole * Fix endpoints * fix doc * fix test	2019-12-11 15:56:36 -08:00
Jonathan Wei	8af41d7cd0	Update version to 0.18.0-incubating-SNAPSHOT (#9009 )	2019-12-11 14:04:03 -08:00
Parag Jain	24fe824055	add readiness endpoints to processes having initialization delays (#8841 )	2019-12-10 17:26:13 -08:00
Chi Cao Minh	3de7ab8523	DataSketches jars in core (#9003 ) Having DataSketches jars in core will allow potential improvements, for example: - Provide an alternative implementation of HLL: https://datasketches.github.io/docs/HLL/HllSketchVsDruidHyperLogLogCollector.html - Range partitioning for native parallel batch indexing without having the user load extensions on the classpath Dev mailing list discussion: https://lists.apache.org/thread.html/301410d71ff799cf616bf17c4ebcf9999fc30829f5fa62909f403e6c%40%3Cdev.druid.apache.org%3E	2019-12-10 14:02:34 -08:00
Chi Cao Minh	bab78fc80e	Parallel indexing single dim partitions (#8925 ) * Parallel indexing single dim partitions Implements single dimension range partitioning for native parallel batch indexing as described in #8769. This initial version requires the druid-datasketches extension to be loaded. The algorithm has 5 phases that are orchestrated by the supervisor in `ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`. These phases and the main classes involved are described below: 1) In parallel, determine the distribution of dimension values for each input source split. `PartialDimensionDistributionTask` uses `StringSketch` to generate the approximate distribution of dimension values for each input source split. If the rows are ungrouped, `PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter` uses a Bloom filter to skip rows that would be grouped. The final distribution is sent back to the supervisor via `DimensionDistributionReport`. 2) The range partitions are determined. In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the supervisor uses `StringSketchMerger` to merge the individual `StringSketch`es created in the preceding phase. The merged sketch is then used to create the range partitions. 3) In parallel, generate partial range-partitioned segments. `PartialRangeSegmentGenerateTask` uses the range partitions determined in the preceding phase and `RangePartitionCachingLocalSegmentAllocator` to generate `SingleDimensionShardSpec`s. The partition information is sent back to the supervisor via `GeneratedGenericPartitionsReport`. 4) The partial range segments are grouped. In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`, the supervisor creates the `PartialGenericSegmentMergeIOConfig`s necessary for the next phase. 5) In parallel, merge partial range-partitioned segments. `PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to retrieve the partial range-partitioned segments generated earlier and then merges and publishes them. * Fix dependencies & forbidden apis * Fixes for integration test * Address review comments * Fix docs, strict compile, sketch check, rollup check * Fix first shard spec, partition serde, single subtask * Fix first partition check in test * Misc rewording/refactoring to address code review * Fix doc link * Split batch index integration test * Do not run parallel-batch-index twice * Adjust last partition * Split ITParallelIndexTest to reduce runtime * Rename test class * Allow null values in range partitions * Indicate which phase failed * Improve asserts in tests	2019-12-09 23:05:49 -08:00
Vadim Ogievetsky	a6dcc99962	better input format detection (#9007 )	2019-12-09 22:31:28 -08:00
Clint Wylie	4327892b84	modify multi-value expression transformation behavior to not treat re-use of the same input as a candidate for cartesian mapping (#8957 )	2019-12-09 20:38:15 -08:00
Vadim Ogievetsky	0330744793	Docs: bold Java 8 requirement (#8996 ) * bold Java 8 req * add warning box	2019-12-09 20:23:07 -08:00
Parag Jain	9640f9649a	fix npe while logging sql/query request (#9001 ) * fix npe while logging sql/query request * forbid forbidden DateTime API	2019-12-09 12:02:11 -08:00
Rye	ca77d576c6	add customize separator for TSV inputFormat (#8993 ) * add customize separator for TSV inputFormat * fix spotbug * code refactor * code refactor * add argument check for delimiter * refine null check * add check for delimiter and listdelimiter can not be same * add unit tests	2019-12-09 11:24:09 -08:00
Roman Leventov	1c62987783	Add SelfDiscoveryResource; rename org.apache.druid.discovery.No… (#6702 ) * Add SelfDiscoveryResource * Rename org.apache.druid.discovery.NodeType to NodeRole. Refactor CuratorDruidNodeDiscoveryProvider. Make SelfDiscoveryResource to listen to updates only about a single node (itself). * Extended docs * Fix brace * Remove redundant throws in Lifecycle.Handler.stop() * Import order * Remove unresolvable link * Address comments * tmp * tmp * Rollback docker changes * Remove extra .sh files * Move filter * Fix SecurityResourceFilterTest	2019-12-08 18:47:58 +03:00
Clint Wylie	441515cb50	update dump-segment docs so example command works (#8998 ) * update dump-segment docs so example command works * not everyone uses bash	2019-12-07 06:36:46 -08:00
Clint Wylie	06cd30460e	add query metrics for broker parallel merges, off by default (#8981 ) * add a bunch of metrics for broker parallel merges, off by default, and tests * fix tests * review stuffs * propogateIfPossible	2019-12-06 13:42:53 -08:00
Clint Wylie	cefcfe26dc	update web-console data loader to support unified s3 and google input sources (#8994 ) * update web-console data loader to support unified s3 and google input source * fixes * add placeholder for objects * only show objects if it already exists	2019-12-06 07:25:26 -08:00
Clint Wylie	ca2a7a1f08	more flush timeout for emitter tests (#8991 ) * more flush timeout for emitter tests * share constant	2019-12-05 16:52:35 -08:00
Jonathan Wei	c949a25210	Add DruidInputSource (replacement for IngestSegmentFirehose) (#8982 ) * Add Druid input source and format * Inherit dims/metrics from segment * Add ingest segment firehose reindexing test * Remove unnecessary module * Fix unit tests, checkstyle * Add doc entry * Fix dimensionExclusions handling, add parallel index integration test * Add spelling exclusion * Address some PR comments * Checkstyle * wip * Address rest of PR comments * Address PR comments	2019-12-05 16:50:00 -08:00
Chi Cao Minh	af74acaa85	Address security vulnerabilities CVSS >= 7 (#8980 ) * Address security vulnerabilities CVSS >= 7 Update dependencies to address security vulnerabilities with CVSS scores of 7 or higher. A new Travis CI job is added to prevent new high/critical security vulnerabilities from being added. Updated dependencies: - api-util 1.0.0 -> 1.0.3 - jackson 2.9.10 -> 2.10.1 - kafka 2.1.0 -> 2.1.1 - libthrift 0.10.0 -> 0.13.0 - protobuf 3.2.0 -> 3.11.0 The following high/critical security vulnerabilities are currently suppressed (so that the new Travis CI job can be added now) and are left as future work to fix: - hibernate-validator:5.2.5 - jackson-mapper-asl:1.9.13 - libthrift:0.6.1 - netty:3.10.6 - nimbus-jose-jwt:4.41.1 * Rename EDL1 license file * Fix inspection errors	2019-12-05 14:34:35 -08:00
Clint Wylie	5ecdf94d83	add 'prefixes' support to google input source (#8930 ) * add prefixes support to google input source, making it symmetrical-ish with s3 * docs * more better, and tests * unused * formatting * javadoc * dependencies * oops * review comments * better javadoc	2019-12-04 21:01:10 -08:00
Vadim Ogievetsky	1cff73f3e0	Web console: support new ingest spec format (#8828 ) * converter v1 * working v1 * update tests * update tests * upgrades * adjust to new API * remove hack * fwd * step * neo cache * fix time selection * smart reset * parquest autodetection * add binaryAsString option * partitionsSpec * add ORC support * ingestSegment -> druid * remove index tasks * better min * load data works * remove downgrade * filter on group_id * fix group_id in test * update auto form for new props * add dropBeforeByPeriod rule * simplify * prettify json	2019-12-04 20:21:07 -08:00
Lucas Capistrant	8dd9a8cb15	Small doc fix for baseTaskDir conf (#8978 )	2019-12-04 14:07:03 -08:00
Clint Wylie	a48784a1fd	dropwizard-emitter doc fixes (#8988 )	2019-12-04 12:52:58 -08:00
Q	391646123e	Fix double-checked locking in predicate suppliers in BoundDimFi… (#8974 ) * Fix double-checked locking in predicate suppliers in BoundDimFilter * Fix double-checked locking in predicate suppliers in BoundDimFilter * 1. Use Suppliers.memoize() to initialize and publish singleton. 2. Fix coding style. * Fix coding style * Fix double-checked locking bug for predicate suppliers in InDimFilter	2019-12-04 20:01:52 +03:00
Clint Wylie	d0a6fe7f12	fix bug with sqlOuterLimit, use sqlOuterLimit in web console (#8919 ) * fix bug with sqlOuterLimit, use sqlOuterLimit instead of wrapping sql query for web console * fixes, refactors, tests * meh * better name * fix comment location * fix copy and paste	2019-12-03 18:36:28 -08:00
Fangyuan Deng	187cf0dd3f	[Improvement] historical fast restart by lazy load columns metadata(20X faster) (#6988 ) * historical fast restart by lazy load columns metadata * delete repeated code * add documentation for druid.segmentCache.lazyLoadOnStart * fix unit test fail * fix spellcheck * update docs * update docs mentioning a catch	2019-12-03 09:47:01 -08:00
Clint Wylie	b4efaa698b	unexclude necessary jackson mapper-asl jars (#8977 )	2019-12-02 17:01:11 -08:00
Chi Cao Minh	4b7e79a4e6	Exclude unneeded hadoop transitive dependencies (#8962 ) * Exclude unneeded hadoop transitive dependencies These dependencies are provided by core: - com.squareup.okhttp:okhttp - commons-beanutils:commons-beanutils - org.apache.commons:commons-compress - org.apache.zookepper:zookeeper These dependencies are not needed and are excluded because they contain security vulnerabilities: - commons-beanutils:commons-beanutils-core - org.codehaus.jackson:jackson-mapper-asl * Simplify exclusions + separate unneeded/vulnerable * Do not exclude jackson-mapper-asl	2019-12-02 16:08:21 -08:00
Clint Wylie	6997b167b1	add hdfs client dependency for native batch parquet when using hdfs (#8964 )	2019-11-28 13:12:45 -08:00
Jonathan Wei	00ce18a0ea	Additional Kinesis resharding fixes (#8870 ) * Additional Kinesis resharding fixes * Address PR comments * Remove unused method * Adjust SegmentTransactionalInsertAction null handling * Check for unchanged metadata on empty publish * Add logs for empty publish * Fix javadoc * Clear offset when invalid endOffsets are seen * Fix LGTM alert * Fix build * Add resharding note to Kinesis docs * Checkstyle * Spelling * Address PR comments * Checkstyle	2019-11-28 12:59:01 -08:00
Jihoon Son	86e8903523	Support orc format for native batch ingestion (#8950 ) * Support orc format for native batch ingestion * fix pom and remove wrong comment * fix unnecessary condition check * use flatMap back to handle exception properly * move exceptionThrowingIterator to intermediateRowParsingReader * runtime	2019-11-28 12:45:24 -08:00
Jonathan Wei	55ecaafff0	Add licenses.yaml entry for Wikipedia sample data (#8968 )	2019-11-28 11:41:42 -08:00
jon-wei	dfbc066163	Revert "[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1" This reverts commit `a0f21d9b07`.	2019-11-27 23:22:43 -08:00
jon-wei	0402ff85b8	Revert "[maven-release-plugin] prepare for next development iteration" This reverts commit `8ffa71e7e6`.	2019-11-27 23:22:32 -08:00
jon-wei	8ffa71e7e6	[maven-release-plugin] prepare for next development iteration	2019-11-27 23:18:48 -08:00
jon-wei	a0f21d9b07	[maven-release-plugin] prepare release druid-0.16.1-incubating-rc1	2019-11-27 23:18:37 -08:00
Clint Wylie	923c003213	add flush timeout to emitter test (#8963 )	2019-11-27 19:30:09 -08:00
Atul Mohan	a5b40a6099	Remove null handling check (#8960 )	2019-11-27 12:09:33 -08:00
Chi Cao Minh	fba876b607	Update jackson to 2.9.10 (#8940 ) Addresses security vulnerabilities: - sonatype-2016-0397: https://github.com/FasterXML/jackson-core/issues/315 - sonatype-2017-0355: https://github.com/FasterXML/jackson-core/pull/322	2019-11-26 21:41:14 -08:00
Gian Merlino	adb72fe8d5	Improve verify-default-ports to check both INADDR_ANY and 127.0.0.1. (#8942 )	2019-11-26 16:05:15 -08:00
Vadim Ogievetsky	50f7cf6947	Adding quick links to readme (#8946 ) * quick links * added download * add twitter also	2019-11-26 16:04:54 -08:00
Clint Wylie	52ef043be1	add license for tutorial wiki data (#8944 ) * add license for tutorial wiki data * tweaks	2019-11-26 13:33:24 -08:00
Clint Wylie	4458113375	S3 input source (#8903 ) * add s3 input source for native batch ingestion * add docs * fixes * checkstyle * lazy splits * fixes and hella tests * fix it * re-use better iterator * use key * javadoc and checkstyle * exception * oops * refactor to use S3Coords instead of URI * remove unused code, add retrying stream to handle s3 stream * remove unused parameter * update to latest master * use list of objects instead of object * serde test * refactor and such * now with the ability to compile * fix signature and javadocs * fix conflicts yet again, fix S3 uri stuffs * more tests, enforce uri for bucket * javadoc * oops * abstract class instead of interface * null or empty * better error	2019-11-25 22:31:19 -08:00
Vadim Ogievetsky	282b838b3f	fix home view tabs (#8938 )	2019-11-26 12:21:32 +08:00
Alexander Saydakov	4a9da3f3fc	use the latest release of datasketches (#8647 ) * use the latest release of datasketches * added datasketches-memory dependency * updated datasketches entries * use datasketches-memory-1.2.0 * updated dependencies * fixed tests	2019-11-25 19:45:51 -08:00
Clint Wylie	cd31bcc093	un-exclude necessary parquet jackson dependencies instead of relying on curator (#8939 )	2019-11-25 15:57:34 -08:00
Jihoon Son	a2e6de4b16	Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor (#8924 ) * Fix the potential race SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor * Fix docs and javadoc * Add unit tests for large or small estimated num splits * add override	2019-11-23 01:38:08 -08:00
Gian Merlino	e0eb85ace7	Add FileUtils.createTempDir() and enforce its usage. (#8932 ) * Add FileUtils.createTempDir() and enforce its usage. The purpose of this is to improve error messages. Previously, the error message on a nonexistent or unwritable temp directory would be "Failed to create directory within 10,000 attempts". * Further updates. * Another update. * Remove commons-io from benchmark. * Fix tests.	2019-11-22 19:48:49 -08:00
Rye	0514e5686e	add TsvInputFormat (#8915 ) * add TsvInputFormat * refactor code * fix grammar * use enum replace string literal * code refactor * code refactor * mark abstract for base class meant not to be instantiated * remove constructor for test	2019-11-22 18:01:40 -08:00
Clint Wylie	7250010388	add parquet support to native batch (#8883 ) * add parquet support to native batch * cleanup * implement toJson for sampler support * better binaryAsString test * docs * i hate spellcheck * refactor toMap conversion so can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest * add comment, fix some stuff * adjustments * fix accident * tweaks	2019-11-22 10:49:16 -08:00
SeKing	9955107e8e	RandomLocationSelectorStrategy to Choose an available disk(location) to store a segment. With unit tests. (#8461 )	2019-11-22 03:46:54 -08:00
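Commit bab78fc80e (#8925) describes a five-phase flow whose core step is phase 2, where the merged per-split value distribution is turned into single-dimension range boundaries. The sketch below illustrates only that idea, under stated assumptions: it sorts the sampled values exactly instead of using the approximate, mergeable DataSketches-based `StringSketch`/`StringSketchMerger`, it assumes no null values, and the names `RangePartitionSketch`, `boundaries`, and `partitionOf` are hypothetical, not Druid APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of single-dimension range boundaries; not a Druid class. */
public final class RangePartitionSketch {

  private RangePartitionSketch() {}

  /**
   * Returns the start value of every partition after the first. Partition 0 holds values
   * below boundaries[0]; partition k holds values in [boundaries[k-1], boundaries[k]).
   * Exact sorting stands in for the approximate sketch used by the real indexing task.
   */
  public static List<String> boundaries(List<String> values, int targetRowsPerPartition) {
    if (targetRowsPerPartition <= 0) {
      throw new IllegalArgumentException("targetRowsPerPartition must be positive");
    }
    List<String> sorted = new ArrayList<>(values); // assumption: no null values
    Collections.sort(sorted);
    List<String> result = new ArrayList<>();
    for (int i = targetRowsPerPartition; i < sorted.size(); i += targetRowsPerPartition) {
      String candidate = sorted.get(i);
      String previous = result.isEmpty() ? sorted.get(0) : result.get(result.size() - 1);
      // Skip a boundary equal to the previous start value: it would create an empty partition.
      if (!candidate.equals(previous)) {
        result.add(candidate);
      }
    }
    return result;
  }

  /** Returns the index of the partition a value falls into, given strictly increasing boundaries. */
  public static int partitionOf(String value, List<String> boundaries) {
    int idx = Collections.binarySearch(boundaries, value);
    // A hit means the value starts partition idx + 1; a miss encodes the insertion point.
    return idx >= 0 ? idx + 1 : -(idx + 1);
  }

  public static void main(String[] args) {
    List<String> bounds = boundaries(Arrays.asList("e", "a", "d", "b", "f", "c"), 2);
    System.out.println(bounds); // [c, e]
    for (String v : Arrays.asList("a", "c", "d", "f")) {
      System.out.println(v + " -> partition " + partitionOf(v, bounds));
    }
  }
}
```

Skipping duplicate boundaries keeps every row with the same dimension value in a single partition, which is what a range-partitioned shard spec requires; the trade-off is that a heavily repeated value can make one partition larger than the target.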

10114 Commits All Branches Search