* Forbid easily misused HashSet and HashMap constructors
* Add two LinkedHashMap constructors to forbidden-apis and create utility method as replacement for them
* Fix visibility of constant in CollectionUtils.java
* Make an exception for an instance of LinkedHashMap#<init>(int) because proper sizing is used
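To illustrate what "proper sizing" means here, below is a minimal sketch of the kind of replacement utility these commits describe; the class and method names are hypothetical, not the actual CollectionUtils API.

```java
import java.util.LinkedHashMap;

// Hypothetical sizing helper: chooses an initial capacity large enough that the
// map never rehashes while holding expectedSize entries at the default 0.75 load
// factor, which the raw LinkedHashMap(int) constructor does not guarantee.
public final class MapSizingExample
{
  private MapSizingExample() {}

  public static <K, V> LinkedHashMap<K, V> newLinkedHashMapWithExpectedSize(int expectedSize)
  {
    int initialCapacity = (int) Math.ceil(expectedSize / 0.75) + 1;
    return new LinkedHashMap<>(initialCapacity);
  }
}
```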
* revert changes to sql module tests that should be in separate PR
* Finish reverting changes to sql module tests that were flagged in checkstyle during CI
* Add netty dependency resulting from SuppressForbidden
* Add MemoryOpenHashTable, a table similar to ByteBufferHashTable.
With some key differences to improve speed and design simplicity:
1) Uses Memory rather than ByteBuffer for its backing storage.
2) Uses faster hashing and comparison routines (see HashTableUtils).
3) Capacity is always a power of two, allowing a simpler design and a more
efficient implementation of findBucket (see the sketch after this list).
4) Does not implement growability; instead, leaves that to its callers.
The idea is this removes the need for subclasses, while still giving
callers flexibility in how to handle table-full scenarios.
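As a rough illustration of point 3, here is a minimal open-addressing sketch (backed by plain arrays rather than Memory, and not the actual MemoryOpenHashTable code) showing why a power-of-two capacity simplifies findBucket:

```java
// Minimal sketch only: with a power-of-two capacity, "hash % capacity" reduces to
// a bitmask, and linear probing wraps around with the same mask instead of a
// modulo or bounds check.
class OpenHashTableSketch
{
  private final int[] keys;
  private final boolean[] used;

  OpenHashTableSketch(int capacity)       // capacity must be a power of two
  {
    this.keys = new int[capacity];
    this.used = new boolean[capacity];
  }

  /** Returns the bucket holding "key", or the free bucket where it would be inserted. */
  int findBucket(int key)
  {
    int mask = keys.length - 1;
    int bucket = Integer.hashCode(key) & mask;
    while (used[bucket] && keys[bucket] != key) {
      bucket = (bucket + 1) & mask;       // wrap around without division
    }
    return bucket;
  }
}
```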
* Fix LGTM warnings.
* Adjust dependencies.
* Remove easymock from druid-benchmarks.
* Adjustments from review.
* Fix datasketches unit tests.
* Fix checkstyle.
By default, native batch ingestion was only getting a batch of 10
files at a time when used with Google Cloud. The default for other
cloud providers is 1024, and Google Cloud should be similar. The low
batch size was caused by a typo. This change updates the batch size
to 1024 when using Google Cloud.
* Guicify druid sql module
Break up the SQLModule in to smaller modules and provide a binding that
modules can use to register schemas with druid sql.
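A minimal sketch of the registration pattern this describes, assuming a standard Guice set binding; the schema interface and implementation class below are placeholders, not the actual Druid types:

```java
import com.google.inject.Binder;
import com.google.inject.Module;
import com.google.inject.multibindings.Multibinder;

// Placeholder types standing in for the real schema interface and implementation.
interface SqlSchema {}
class LookupSqlSchema implements SqlSchema {}

// Each module contributes its schema to the set binding; the SQL layer can then
// inject Set<SqlSchema> and pick up every registered schema.
public class ExampleSqlSchemaModule implements Module
{
  @Override
  public void configure(Binder binder)
  {
    Multibinder.newSetBinder(binder, SqlSchema.class)
               .addBinding()
               .to(LookupSqlSchema.class);
  }
}
```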
* fix some tests
* address code review
* tests compile
* Working tests
* Add all the tests
* fix up licenses and dependencies
* add calcite dependency to druid-benchmarks
* tests pass
* rename the schemas
* SQL join support for lookups.
1) Add LookupSchema to SQL, so lookups show up in the catalog.
2) Add join-related rels and rules to SQL, allowing joins to be planned into
native Druid queries.
* Add two missing LookupSchema calls in tests.
* Fix tests.
* Fix typo.
This is important because if a user has the hdfs extension loaded, but is not
using hdfs deep storage, then they will not have storageDirectory set and will
get the following error:
IllegalArgumentException: Can not create a Path from an empty string
at io.druid.storage.hdfs.HdfsDataSegmentKiller.<init>(HdfsDataSegmentKiller.java:47)
This scenario is realistic: it comes up when someone has the hdfs extension
loaded because they want to use HdfsInputSource, but don't want to use hdfs for
deep storage.
Fixes #4694.
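One hypothetical way to avoid this class of failure, sketched here purely for illustration (not necessarily the fix that was applied), is to defer Path construction until the storage directory is actually needed:

```java
import org.apache.hadoop.fs.Path;

// Hypothetical sketch: loading the extension no longer constructs a Path from an
// unset storageDirectory; a failure can only happen if HDFS deep storage is used.
class LazyStorageDirectory
{
  private final String storageDirectory;   // may be null/empty when HDFS deep storage is unused

  LazyStorageDirectory(String storageDirectory)
  {
    this.storageDirectory = storageDirectory;   // note: no Path created here
  }

  Path getStoragePath()
  {
    if (storageDirectory == null || storageDirectory.isEmpty()) {
      throw new IllegalStateException(
          "storageDirectory is not set; configure it to use HDFS deep storage"
      );
    }
    return new Path(storageDirectory);
  }
}
```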
* Add LookupJoinableFactory.
Enables joins where the right-hand side is a lookup. Includes an
integration test.
Also, includes changes to LookupExtractorFactoryContainerProvider:
1) Add "getAllLookupNames", which will be needed to eventually connect
lookups to Druid's SQL catalog.
2) Convert "get" from nullable to Optional return.
3) Swap out most usages of LookupReferencesManager in favor of the
simpler LookupExtractorFactoryContainerProvider interface.
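The shape of the provider interface after changes (1) and (2) is roughly as below; this is a simplified sketch, with the container type reduced to a placeholder:

```java
import java.util.Optional;
import java.util.Set;

// Simplified sketch of LookupExtractorFactoryContainerProvider after the changes above.
interface LookupProviderSketch
{
  /** Names of all known lookups; used to expose lookups in Druid's SQL catalog. */
  Set<String> getAllLookupNames();

  /** Previously returned null for a missing lookup; now returns an empty Optional instead. */
  Optional<LookupContainer> get(String lookupName);
}

class LookupContainer {}   // placeholder for LookupExtractorFactoryContainer
```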
* Fixes for tests.
* Fix another test.
* Java 11 message fix.
* Fixups.
* Fixup benchmark class.
* intelliJ inspections cleanup
- remove redundant escapes
- performance warnings
- access static member via instance reference
- static method declared final
- inner class may be static
Most of these changes are aesthetic, however, they will allow inspections to
be enabled as part of CI checks going forward
The valuable changes in this delta are:
- using StringBuilder instead of string addition in a loop (see the sketch after this list)
indexing-hadoop/.../Utils.java
processing/.../ByteBufferMinMaxOffsetHeap.java
- Use class variables instead of static variables for parameterized test
processing/src/.../ScanQueryLimitRowIteratorTest.java
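For the StringBuilder item above, the before/after pattern looks roughly like this (simplified, not the actual Utils.java code):

```java
import java.util.List;

class StringBuildingExample
{
  // Before: each "+=" copies the whole accumulated string, so the loop is O(n^2).
  static String joinSlow(List<String> parts)
  {
    String out = "";
    for (String part : parts) {
      out += part + ",";
    }
    return out;
  }

  // After: StringBuilder appends in amortized constant time, so the loop is O(n).
  static String joinFast(List<String> parts)
  {
    StringBuilder out = new StringBuilder();
    for (String part : parts) {
      out.append(part).append(',');
    }
    return out.toString();
  }
}
```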
* Add intelliJ inspection warnings as errors to druid profile
* one more static inner class
* Reconcile terminology and method naming to 'used/unused segments'; Don't use terms 'enable/disable data source'; Rename MetadataSegmentManager to MetadataSegments; Make REST API methods which mark segments as used/unused return a server error instead of an empty response in case of error
* Fix brace
* Import order
* Rename withKillDataSourceWhitelist to withSpecificDataSourcesToKill
* Fix tests
* Fix tests by adding proper methods without interval parameters to IndexerMetadataStorageCoordinator instead of hacking with Intervals.ETERNITY
* More aligned names of DruidCoordinatorHelpers, rename several CoordinatorDynamicConfig parameters
* Rename ClientCompactTaskQuery to ClientCompactionTaskQuery for consistency with CompactionTask; ClientCompactQueryTuningConfig to ClientCompactionTaskQueryTuningConfig
* More variable and method renames
* Rename MetadataSegments to SegmentsMetadata
* Javadoc update
* Simplify SegmentsMetadata.getUnusedSegmentIntervals(), more javadocs
* Update Javadoc of VersionedIntervalTimeline.iterateAllObjects()
* Reorder imports
* Rename SegmentsMetadata.tryMark... methods to mark..., make them return booleans and the numbers of segments changed, and relay exceptions to callers
* Complete merge
* Add CollectionUtils.newTreeSet(); Refactor DruidCoordinatorRuntimeParams creation in tests
* Remove MetadataSegmentManager
* Rename millisLagSinceCoordinatorBecomesLeaderBeforeCanMarkAsUnusedOvershadowedSegments to leadingTimeMillisBeforeCanMarkAsUnusedOvershadowedSegments
* Fix tests, refactor DruidCluster creation in tests into DruidClusterBuilder
* Fix inspections
* Fix SQLMetadataSegmentManagerEmptyTest and rename it to SqlSegmentsMetadataEmptyTest
* Rename SegmentsAndMetadata to SegmentsAndCommitMetadata to reduce the similarity with SegmentsMetadata; Rename some methods
* Rename DruidCoordinatorHelper to CoordinatorDuty, refactor DruidCoordinator
* Unused import
* Optimize imports
* Rename IndexerSQLMetadataStorageCoordinator.getDataSourceMetadata() to retrieveDataSourceMetadata()
* Unused import
* Update terminology in datasource-view.tsx
* Fix label in datasource-view.spec.tsx.snap
* Fix lint errors in datasource-view.tsx
* Doc improvements
* Another attempt to please TSLint
* Another attempt to please TSLint
* Style fixes
* Fix IndexerSQLMetadataStorageCoordinator.createUsedSegmentsSqlQueryForIntervals() (wrong merge)
* Try to fix docs build issue
* Javadoc and spelling fixes
* Rename SegmentsMetadata to SegmentsMetadataManager, address other comments
* Address more comments
* Add JoinableFactory interface and use it in the query stack.
Also includes InlineJoinableFactory, which enables joining against
inline datasources. This is the first patch where a basic join query
actually works. It includes integration tests.
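A rough sketch of the concept (simplified placeholders, not the exact Druid interface): a factory is asked to build a Joinable for the right-hand side of a join and may decline, which lets multiple factories coexist (inline datasources here, lookups in a later patch):

```java
import java.util.Optional;

// Placeholder types for the sketch.
interface Joinable {}
interface DataSource {}
interface JoinCondition {}

// Simplified sketch of the factory idea described above.
interface JoinableFactorySketch
{
  /** Returns a Joinable for the given right-hand datasource, or empty if this factory can't handle it. */
  Optional<Joinable> build(DataSource dataSource, JoinCondition condition);
}
```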
* Fix test issues.
* Adjustments from code review.
* Add HashJoinSegment, a virtual segment for joins.
An initial step towards #8728. This patch adds enough functionality to implement a joining
cursor on top of a normal datasource. It does not include enough to actually do a query. For
that, future patches will need to wire this low-level functionality into the query language.
* Fixups.
* Fix missing format argument.
* Various tests and minor improvements.
* Changes.
* Remove or add tests for unused stuff.
* Fix up package locations.
Previously jackson-mapper-asl was excluded to remove a security
vulnerability; however, it is required for functionality (e.g.,
org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator).
* Add avro dependency to parquet extension
If the parquet extension is loaded and an ingestionSpec uses the older format,
specifying a 'parser' instead of an 'inputFormat', the job fails
with the following error:
java.lang.TypeNotPresentException: Type org.apache.avro.generic.GenericRecord not present
This change removes the exclusion of the avro package so that the missing
class can be found.
* Address review comments and add dependency version
* S3: Improvements to prefix listing (including fix for an infinite loop)
1) Fixes #9097, an infinite loop that occurs when more than one batch
of objects is retrieved during a prefix listing (see the sketch after this list).
2) Removes the Access Denied fallback code added in #4444. I don't think
the behavior is reasonable: its purpose is to fall back from a prefix
listing to a single-object access, but it's only activated when the
end user supplied a prefix. In that case it is better to simply fail, so
the end user knows that their request for a prefix-based load is not
going to work. Presumably the end user can switch from supplying
'prefixes' to supplying 'uris' if desired.
3) Filters out directory placeholders when walking prefixes.
4) Splits LazyObjectSummariesIterator into its own class and adds tests.
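For points 1 and 3, the essential shape of a correct multi-batch prefix listing is sketched below (simplified, using the AWS SDK directly rather than the actual LazyObjectSummariesIterator); the loop terminates only because each request carries forward the continuation token from the previous result:

```java
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import java.util.ArrayList;
import java.util.List;

class PrefixListingSketch
{
  static List<S3ObjectSummary> listPrefix(AmazonS3 s3, String bucket, String prefix)
  {
    List<S3ObjectSummary> summaries = new ArrayList<>();
    ListObjectsV2Request request = new ListObjectsV2Request()
        .withBucketName(bucket)
        .withPrefix(prefix);
    ListObjectsV2Result result;
    do {
      result = s3.listObjectsV2(request);
      for (S3ObjectSummary summary : result.getObjectSummaries()) {
        if (summary.getSize() > 0) {
          // Skip zero-byte directory placeholders (point 3); the real check may differ.
          summaries.add(summary);
        }
      }
      // Advance to the next batch; forgetting this is what causes the infinite loop.
      request.setContinuationToken(result.getNextContinuationToken());
    } while (result.isTruncated());
    return summaries;
  }
}
```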
* Adjust S3InputSourceTest.
* Changes from review.
* Include hamcrest-core.
* Parallel indexing single dim partitions
Implements single dimension range partitioning for native parallel batch
indexing as described in #8769. This initial version requires the
druid-datasketches extension to be loaded.
The algorithm has 5 phases that are orchestrated by the supervisor in
`ParallelIndexSupervisorTask#runRangePartitionMultiPhaseParallel()`.
These phases and the main classes involved are described below:
1) In parallel, determine the distribution of dimension values for each
input source split.
`PartialDimensionDistributionTask` uses `StringSketch` to generate
the approximate distribution of dimension values for each input
source split. If the rows are ungrouped,
`PartialDimensionDistributionTask.UngroupedRowDimensionValueFilter`
uses a Bloom filter to skip rows that would be grouped. The final
distribution is sent back to the supervisor via
`DimensionDistributionReport`.
2) The range partitions are determined.
In `ParallelIndexSupervisorTask#determineAllRangePartitions()`, the
supervisor uses `StringSketchMerger` to merge the individual
`StringSketch`es created in the preceding phase. The merged sketch is
then used to create the range partitions.
3) In parallel, generate partial range-partitioned segments.
`PartialRangeSegmentGenerateTask` uses the range partitions
determined in the preceding phase and
`RangePartitionCachingLocalSegmentAllocator` to generate
`SingleDimensionShardSpec`s. The partition information is sent back
to the supervisor via `GeneratedGenericPartitionsReport`.
4) The partial range segments are grouped.
In `ParallelIndexSupervisorTask#groupGenericPartitionLocationsPerPartition()`,
the supervisor creates the `PartialGenericSegmentMergeIOConfig`s
necessary for the next phase.
5) In parallel, merge partial range-partitioned segments.
`PartialGenericSegmentMergeTask` uses `GenericPartitionLocation` to
retrieve the partial range-partitioned segments generated earlier and
then merges and publishes them.
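The overall control flow of the five phases can be sketched as below; every type and method in this snippet is a placeholder standing in for the Druid classes named above, not a real API:

```java
import java.util.List;

class RangePartitionOrchestrationSketch
{
  static class DistributionReport {}     // stands in for DimensionDistributionReport
  static class PartitionBoundaries {}    // derived from the merged StringSketch
  static class PartialSegmentsReport {}  // stands in for GeneratedGenericPartitionsReport
  static class MergeIOConfig {}          // stands in for PartialGenericSegmentMergeIOConfig

  void run()
  {
    // Phase 1 (parallel): sketch the dimension-value distribution of each input split.
    List<DistributionReport> distributions = runPartialDimensionDistributionTasks();
    // Phase 2: merge the sketches and derive the range partition boundaries.
    PartitionBoundaries boundaries = determineAllRangePartitions(distributions);
    // Phase 3 (parallel): generate partial range-partitioned segments using the boundaries.
    List<PartialSegmentsReport> partials = runPartialRangeSegmentGenerateTasks(boundaries);
    // Phase 4: group the partial segments by partition, one group per merge task.
    List<MergeIOConfig> mergeConfigs = groupPartitionsPerMergeTask(partials);
    // Phase 5 (parallel): merge and publish each partition's partial segments.
    runPartialSegmentMergeTasks(mergeConfigs);
  }

  // Placeholder bodies so the sketch compiles.
  List<DistributionReport> runPartialDimensionDistributionTasks() { return List.of(); }
  PartitionBoundaries determineAllRangePartitions(List<DistributionReport> reports) { return new PartitionBoundaries(); }
  List<PartialSegmentsReport> runPartialRangeSegmentGenerateTasks(PartitionBoundaries boundaries) { return List.of(); }
  List<MergeIOConfig> groupPartitionsPerMergeTask(List<PartialSegmentsReport> reports) { return List.of(); }
  void runPartialSegmentMergeTasks(List<MergeIOConfig> configs) {}
}
```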
* Fix dependencies & forbidden apis
* Fixes for integration test
* Address review comments
* Fix docs, strict compile, sketch check, rollup check
* Fix first shard spec, partition serde, single subtask
* Fix first partition check in test
* Misc rewording/refactoring to address code review
* Fix doc link
* Split batch index integration test
* Do not run parallel-batch-index twice
* Adjust last partition
* Split ITParallelIndexTest to reduce runtime
* Rename test class
* Allow null values in range partitions
* Indicate which phase failed
* Improve asserts in tests
* Address security vulnerabilities CVSS >= 7
Update dependencies to address security vulnerabilities with CVSS scores
of 7 or higher. A new Travis CI job is added to prevent new
high/critical security vulnerabilities from being added.
Updated dependencies:
- api-util 1.0.0 -> 1.0.3
- jackson 2.9.10 -> 2.10.1
- kafka 2.1.0 -> 2.1.1
- libthrift 0.10.0 -> 0.13.0
- protobuf 3.2.0 -> 3.11.0
The following high/critical security vulnerabilities are currently
suppressed (so that the new Travis CI job can be added now) and are left
as future work to fix:
- hibernate-validator:5.2.5
- jackson-mapper-asl:1.9.13
- libthrift:0.6.1
- netty:3.10.6
- nimbus-jose-jwt:4.41.1
* Rename EDL1 license file
* Fix inspection errors
* add prefixes support to google input source, making it symmetrical-ish with s3
* docs
* more better, and tests
* unused
* formatting
* javadoc
* dependencies
* oops
* review comments
* better javadoc
* Exclude unneeded hadoop transitive dependencies
These dependencies are provided by core:
- com.squareup.okhttp:okhttp
- commons-beanutils:commons-beanutils
- org.apache.commons:commons-compress
- org.apache.zookeeper:zookeeper
These dependencies are not needed and are excluded because they contain
security vulnerabilities:
- commons-beanutils:commons-beanutils-core
- org.codehaus.jackson:jackson-mapper-asl
* Simplify exclusions + separate unneeded/vulnerable
* Do not exclude jackson-mapper-asl
* Support orc format for native batch ingestion
* fix pom and remove wrong comment
* fix unnecessary condition check
* go back to using flatMap to handle exceptions properly
* move exceptionThrowingIterator to intermediateRowParsingReader
* runtime
* add s3 input source for native batch ingestion
* add docs
* fixes
* checkstyle
* lazy splits
* fixes and hella tests
* fix it
* re-use better iterator
* use key
* javadoc and checkstyle
* exception
* oops
* refactor to use S3Coords instead of URI
* remove unused code, add retrying stream to handle s3 stream
* remove unused parameter
* update to latest master
* use list of objects instead of object
* serde test
* refactor and such
* now with the ability to compile
* fix signature and javadocs
* fix conflicts yet again, fix S3 uri stuffs
* more tests, enforce uri for bucket
* javadoc
* oops
* abstract class instead of interface
* null or empty
* better error
* Fix the potential race between SplittableInputSource.getNumSplits() and SplittableInputSource.createSplits() in TaskMonitor
* Fix docs and javadoc
* Add unit tests for large or small estimated num splits
* add override
* Add FileUtils.createTempDir() and enforce its usage.
The purpose of this is to improve error messages. Previously, the error
message on a nonexistent or unwritable temp directory would be
"Failed to create directory within 10,000 attempts".
* Further updates.
* Another update.
* Remove commons-io from benchmark.
* Fix tests.
* add parquet support to native batch
* cleanup
* implement toJson for sampler support
* better binaryAsString test
* docs
* i hate spellcheck
* refactor toMap conversion so it can be shared through flattenerMaker, default impls should be good enough for orc+avro, fixup for merge with latest
* add comment, fix some stuff
* adjustments
* fix accident
* tweaks
* HDFS input source
Add support for using HDFS as an input source. In this version, commas
or globs are not supported in HDFS paths.
* Fix forbidden api
* Address review comments
* Tidy up lifecycle, query, and ingestion logging.
The goal of this patch is to improve the clarity and usefulness of
Druid's logging for cluster operators. For more information, see
https://twitter.com/cowtowncoder/status/1195469299814555648.
Concretely, this patch does the following:
- Changes a lot of INFO logs to DEBUG, and DEBUG to TRACE, with the
goal of reducing redundancy and improving clarity by avoiding
showing rarely-useful log messages. This includes most "starting"
and "stopping" messages, and most messages related to individual
columns.
- Adds new log4j2 templates that show operators how to enable DEBUG
logging for certain important packages.
- Eliminates stack traces for query errors unless the log level is DEBUG
or more verbose (see the sketch after this list). This is useful because
query errors often indicate user error rather than system error, but
dumping a stack trace often gave operators the impression that there was
a system failure.
- Adds task id to Appenderator, AppenderatorDriver thread names. In
the default log4j2 configuration, this will put them in log lines
as well. It's very useful if a user is using the Indexer, where
multiple tasks run in the same JVM.
- More consistent terminology when it comes to "sequences" (sets of
segments that are handed-off together by Kafka ingestion) and
"offsets" (cursors in partitions). These terms had been confused in
some log messages due to the fact that Kinesis calls offsets
"sequence numbers".
- Replaces some ugly toString calls with either JSONification or
something more operator-accessible (like a URL or a segment identifier,
instead of a JSON object representing the same).
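The stack-trace behavior for query errors amounts to something like the following (an illustrative sketch using SLF4J, not the actual Druid logging code):

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class QueryErrorLoggingSketch
{
  private static final Logger log = LoggerFactory.getLogger(QueryErrorLoggingSketch.class);

  void logQueryError(String queryId, Exception e)
  {
    if (log.isDebugEnabled()) {
      // Operators who have turned on DEBUG get the full stack trace.
      log.debug("Query [" + queryId + "] failed", e);
    } else {
      // Default: a one-line message, since query errors are usually user errors and a
      // stack trace makes them look like system failures.
      log.warn("Query [{}] failed: {}", queryId, e.toString());
    }
  }
}
```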
* Adjustments.
* Adjust integration test.
If the JDBC drivers are missing from the lookup extensions, throw an
exception that directs the user how to resolve the issue. This change is
a follow up to #8825.
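Illustrative sketch of the kind of check described (class and message are examples, not the actual code): detect the missing driver up front and replace the bare ClassNotFoundException with actionable guidance.

```java
class JdbcDriverCheckSketch
{
  static void verifyDriverPresent(String driverClassName)
  {
    try {
      Class.forName(driverClassName);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(
          "JDBC driver [" + driverClassName + "] not found on the classpath. "
          + "Add the driver jar to the lookup extension's directory and restart the service.",
          e
      );
    }
  }
}
```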
* Use earliest offset on newly discovered Kafka partitions
* resolve conflicts
* remove redundant check cases
* simplified unit tests
* change test case
* rewrite comments
* add regression test
* add junit ignore annotation
* minor modifications
* indent
* override testableKafkaSupervisor and KafkaRecordSupplier to make the test runnable
* modified test constructor of kafkaRecordSupplier
* simplify
* delegated constructor
There is a class of bugs due to the fact that BaseObjectColumnValueSelector
has both "getObject" and "isNull" methods, but in most selector implementations
and most call sites, it is clear that the intent of "isNull" is only to apply
to the primitive getters, not the object getter. This makes sense, because the
purpose of isNull is to enable detection of nulls in otherwise-primitive columns.
Imagine a string column with a numeric selector built on top of it. You would
want it to return isNull = true, so numeric aggregators don't treat it as
all zeroes.
Sometimes this design leads people to accidentally guard non-primitive get
methods with "selector.isNull" checks, which is improper.
This patch has three goals:
1) Fix null-handling bugs that already exist in this class.
2) Make interface and doc changes that reduce the probability of future bugs.
3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast
aggregators while fixing null-handling bugs. I thought about splitting this
into its own patch, but it ended up being tough to split from the
null-handling fixes.
For (1) the fixes are,
- Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject
calls on isNull, by no longer extending NullableAggregatorFactory. Now uses
-1 as a sentinel value for null, to differentiate nulls from empty strings.
- Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use
eval.asBoolean() to avoid calling getLong on the selector after already
calling getObject.
- Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls
on isNull. Also, refactored slightly to avoid the overhead of calling
getObject followed by another getter (see BloomFilterAggregatorFactory for
part of this).
For (2) the main changes are,
- Remove the "isNull" method from BaseObjectColumnValueSelector.
- Clarify "isNull" doc on BaseNullableColumnValueSelector.
- Rename NullableAggregatorFactory -> NullableNumericAggregatorFactory to emphasize
that it only works on aggregators that take numbers as input.
- Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner.
- Similar naming changes to helper methods for groupBy, ValueMatchers, etc.
For (3) the other fixes for StringFirst and StringLastAggregatorFactory are,
- Fixed buffer overrun in the buffer aggregators when some characters in the string
encode into more than one byte (the old code used "substring" to apply a byte limit,
which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method.
- Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp.
- Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex
behavior.
- Refactored to share code between the four aggregators.
- Improved test coverage.
- Made the base stringFirst, stringLast aggregators adaptive, and streamlined the
xFold versions into aliases. The adaptiveness is similar to how other aggregators
like hyperUnique work.
* IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle(): don't fetch abutting intervals; simplify getUsedSegmentsForIntervals()
* Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segments, or visible and overshadowed segments, should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmentListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever
* Fix tests
* More fixes
* Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code
* Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase
* More test fixes
* More test fixes
* Add a comment to VersionedIntervalTimelineTestBase
* Fix tests
* Set DataSegment.size(0) in more tests
* Specify DataSegment.size(0) in more places in tests
* Fix more tests
* Fix DruidSchemaTest
* Set DataSegment's size in more tests and benchmarks
* Fix HdfsDataSegmentPusherTest
* Doc changes addressing comments
* Extended doc for visibility
* Typo
* Typo 2
* Address comment
* Add option lateMessageRejectionStartDate
* Use option lateMessageRejectionStartDate
* Fix tests
* Add lateMessageRejectionStartDate to kafka indexing service
* Update tests kafka indexing service
* Fix tests for KafkaSupervisorTest
* Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig
* Fix var name
* Update documentation
* Add a check that lateMessageRejectionStartDateTime and lateMessageRejectionPeriod are not both set; fail if both were specified.
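A minimal sketch of such a check (field names come from the commit message; the surrounding class is illustrative):

```java
import org.joda.time.DateTime;
import org.joda.time.Period;

class LateMessageRejectionConfigCheck
{
  static void validate(DateTime lateMessageRejectionStartDateTime, Period lateMessageRejectionPeriod)
  {
    // The two options describe the same cutoff in different ways, so at most one may be set.
    if (lateMessageRejectionStartDateTime != null && lateMessageRejectionPeriod != null) {
      throw new IllegalArgumentException(
          "lateMessageRejectionStartDateTime and lateMessageRejectionPeriod are mutually exclusive; specify at most one"
      );
    }
  }
}
```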
* remove select query
* thanks teamcity
* oops
* oops
* add back a SelectQuery class that throws RuntimeExceptions linking to docs
* adjust text
* update docs per review
* deprecated
* Stateful auto compaction
* javadoc
* add removed test back
* fix test
* adding indexSpec to compactionState
* fix build
* add lastCompactionState
* address comments
* extract CompactionState
* fix doc
* fix build and test
* Add a task context to store compaction state; add javadoc
* fix it test
* Fix missing jackson jars for hadoop ingestion
* PR comments
* pom ordering
* New approach
* Remove all jackson-core/mapper-asl exclusions from hdfs storage
* Support LDAP authentication/authorization
* fixed integration-tests
* fixed Travis CI build errors related to druid-security module
* fixed failing test
* fixed failing test header
* added comments, force build
* fixes for strict compilation spotbugs checks
* removed authenticator rolling credential update feature
* removed escalator rolling credential update feature
* fixed teamcity inspection deprecated API usage error
* fixed checkstyle execution error, removed unused import
* removed cached config as part of removing authenticator rolling credential update feature
* removed config bundle entity as part of removing authenticator rolling credential update feature
* refactored LDAP configuration
* added support for SSLContext configuration and TLSCertificateChecker
* removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer
* Separate out authorizer checks between metadata-backed store user and LDAP user/groups
* refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks
* fixes build issue
* final review comments updates
* final review comments updates
* fixed LGTM and spellcheck alerts
* Fixed Avatica auth failure error message check
* Updated metadata credentials validator exception message string, replaced DB with metadata store
* get active tasks by datasource when supervisor discovers tasks
* fix ut
* fix ut
* fix ut
* remove unnecessary condition check
* fix ut
* remove stream in hot loop
* Added live reports for Kafka and native batch tasks
* Removed unused local variables
* Added the missing unit test
* Refine unit test logic, add implementation for HttpRemoteTaskRunner
* checkstyle fixes
* Update doc descriptions for updated API
* remove unnecessary files
* Fix spellcheck complaints
* More details for api descriptions
* Fix dependency analyze warnings
Update the maven dependency plugin to the latest version and fix all
warnings for unused declared and used undeclared dependencies in the
compile scope. Added new travis job to add the check to CI. Also fixed
some source code files to use the correct packages for their imports and
updated druid-forbidden-apis to prevent regressions.
* Address review comments
* Adjust scope for org.glassfish.jaxb:jaxb-runtime
* Fix dependencies for hdfs-storage
* Consolidate netty4 versions
* LoggingEmitter: print event as json
* use DefaultRequestLogEventBuilderFactory in emitting request logger by default
* print context in query metric as json
* removed unused jsonMapper from DefaultQueryMetrics
* add comment
* remove change to DefaultRequestLogEventBuilderFactory.java
* checkstyle fix for constant field names
* merging with upstream
* review-1
* unknown changes
* review-2
* merging with master
* review-2 1 changes
* review changes-2 2
* bug fix