druid

Commit Graph

Author	SHA1	Message	Date
Jonathan Wei	75ea0d592a	Add more datasketches doubles sketch SQL functions (#8843 ) * Add more datasketches doubles sketch SQL postaggs * style and lgtm	2019-11-08 18:05:06 -08:00
Gian Merlino	c204d68376	Fixes, adjustments to numeric null handling and string first/last aggregators. (#8834 ) There is a class of bugs due to the fact that BaseObjectColumnValueSelector has both "getObject" and "isNull" methods, but in most selector implementations and most call sites, it is clear that the intent of "isNull" is only to apply to the primitive getters, not the object getter. This makes sense, because the purpose of isNull is to enable detection of nulls in otherwise-primitive columns. Imagine a string column with a numeric selector built on top of it. You would want it to return isNull = true, so numeric aggregators don't treat it as all zeroes. Sometimes this design leads people to accidentally guard non-primitive get methods with "selector.isNull" checks, which is improper. This patch has three goals: 1) Fix null-handling bugs that already exist in this class. 2) Make interface and doc changes that reduce the probability of future bugs. 3) Fix other, unrelated bugs I noticed in the stringFirst and stringLast aggregators while fixing null-handling bugs. I thought about splitting this into its own patch, but it ended up being tough to split from the null-handling fixes. For (1) the fixes are, - Fix StringFirst and StringLastAggregatorFactory to stop guarding getObject calls on isNull, by no longer extending NullableAggregatorFactory. Now uses -1 as a sigil value for null, to differentiate nulls and empty strings. - Fix ExpressionFilter to stop guarding getObject calls on isNull. Also, use eval.asBoolean() to avoid calling getLong on the selector after already calling getObject. - Fix ObjectBloomFilterAggregator to stop guarding DimensionSelector calls on isNull. Also, refactored slightly to avoid the overhead of calling getObject followed by another getter (see BloomFilterAggregatorFactory for part of this). For (2) the main changes are, - Remove the "isNull" method from BaseObjectColumnValueSelector. - Clarify "isNull" doc on BaseNullableColumnValueSelector. - Rename NullableAggregatorFactory -> NullbleNumericAggregatorFactory to emphasize that it only works on aggregators that take numbers as input. - Similar naming changes to the Aggregator, BufferAggregator, and AggregateCombiner. - Similar naming changes to helper methods for groupBy, ValueMatchers, etc. For (3) the other fixes for StringFirst and StringLastAggregatorFactory are, - Fixed buffer overrun in the buffer aggregators when some characters in the string code into more than one byte (the old code used "substring" to apply a byte limit, which is bad). I did this by introducing a new StringUtils.toUtf8WithLimit method. - Fixed weird IncrementalIndex logic that led to reading nulls for the timestamp. - Adjusted weird StringFirst/Last logic that worked around the weird IncrementalIndex behavior. - Refactored to share code between the four aggregators. - Improved test coverage. - Made the base stringFirst, stringLast aggregators adaptive, and streamlined the xFold versions into aliases. The adaptiveness is similar to how other aggregators like hyperUnique work.	2019-11-07 17:46:59 -08:00
Roman Leventov	5c0fc0a13a	Fix ambiguity about IndexerSQLMetadataStorageCoordinator.getUsedSegmentsForInterval() returning only non-overshadowed or all used segments (#8564 ) * IndexerSQLMetadataStorageCoordinator.getTimelineForIntervalsWithHandle() don't fetch abutting intervals; simplify getUsedSegmentsForIntervals() * Add VersionedIntervalTimeline.findNonOvershadowedObjectsInInterval() method; Propagate the decision about whether only visible segmetns or visible and overshadowed segments should be returned from IndexerMetadataStorageCoordinator's methods to the user logic; Rename SegmentListUsedAction to RetrieveUsedSegmentsAction, SegmetnListUnusedAction to RetrieveUnusedSegmentsAction, and UsedSegmentLister to UsedSegmentsRetriever * Fix tests * More fixes * Add javadoc notes about returning Collection instead of Set. Add JacksonUtils.readValue() to reduce boilerplate code * Fix KinesisIndexTaskTest, factor out common parts from KinesisIndexTaskTest and KafkaIndexTaskTest into SeekableStreamIndexTaskTestBase * More test fixes * More test fixes * Add a comment to VersionedIntervalTimelineTestBase * Fix tests * Set DataSegment.size(0) in more tests * Specify DataSegment.size(0) in more places in tests * Fix more tests * Fix DruidSchemaTest * Set DataSegment's size in more tests and benchmarks * Fix HdfsDataSegmentPusherTest * Doc changes addressing comments * Extended doc for visibility * Typo * Typo 2 * Address comment	2019-11-06 11:07:04 -08:00
Giuseppe Martino	9c171e2b1f	Message rejection absolute date (#8656 ) * Add option lateMessageRejectionStartDate * Use option lateMessageRejectionStartDate * Fix tests * Add lateMessageRejectionStartDate to kafka indexing service * Update tests kafka indexing service * Fix tests for KafkaSupervisorTest * Add lateMessageRejectionStartDate to KinesisSupervisorIOConfig * Fix var name * Update documentation * Add check lateMessageRejectionStartDateTime and lateMessageRejectionPeriod, fails if both were specified.	2019-10-31 15:13:02 -07:00
yuanli	bca649e492	Case sensitive comparison of nonbinary string in MySQL metadata storage (#8758 )	2019-10-30 20:48:08 -07:00
Clint Wylie	3ff5e02237	remove select query (#8739 ) * remove select query * thanks teamcity * oops * oops * add back a SelectQuery class that throws RuntimeExceptions linking to docs * adjust text * update docs per review * deprecated	2019-10-30 19:29:56 -07:00
karthikbhat13	b8ceee4eee	Removed 'if' condition. (#8768 )	2019-10-28 13:40:03 -07:00
Gian Merlino	b65d2ac648	Add HDFS firehose (#8754 ) * Add HDFS firehose. * Tests, support for lists of paths. * Fixups. * Update list of firehoses. * Wildcards is a word.	2019-10-28 08:07:38 -07:00
Jihoon Son	2518478b20	Remove deprecated parameter for Checkpoint request (#8707 ) * Remove deprecated parameter for Checkpoint request * fix wrong doc	2019-10-23 16:51:16 -07:00
Surekha	98f59ddd7e	Add `sys.supervisors` table to system tables (#8547 ) * Add supervisors table to SystemSchema * Add docs * fix checkstyle * fix test * fix CI * Add comments * Fix javadoc teamcity error * comments * fix links in docs * fix links * rename fullStatus query param to system and remove it from docs	2019-10-18 15:16:42 -07:00
Jonathan Wei	d88075237a	Add initial SQL support for non-expression sketch postaggs (#8487 ) * Add initial SQL support for non-expression sketch postaggs * Checkstyle, spotbugs * checkstyle * imports * Update SQL docs * Checkstyle * Fix theta sketch operator docs * PR comments * Checkstyle fixes * Add missing entries for HLL sketch module * PR comments, add round param to HLL estimate operator, fix optional HLL param	2019-10-18 14:59:44 -07:00
Jihoon Son	30c15900be	Auto compaction based on parallel indexing (#8570 ) * Auto compaction based on parallel indexing * javadoc and doc * typo * update spell * addressing comments * address comments * fix log * fix build * fix test * increase default max input segment bytes per task * fix test	2019-10-18 13:24:14 -07:00
Mohammad J. Khan	1ca859584f	Issue 8678 Non-coordinator services are repeatedly logging JsonMappingException when using druid-basic-security extension with an authenticator that has no users setup (#8692 )	2019-10-18 11:09:53 -07:00
Jonathan Wei	89ce6384f5	More Kinesis resharding adjustments (#8671 ) * More Kinesis resharding adjustments * Fix TC inspection * Fix comment' * Adjust comment, small refactor * Make repartition transition time configurable * Add spellcheck exclusion * Spelling fix	2019-10-15 23:19:17 -07:00
Jihoon Son	4046c86d62	Stateful auto compaction (#8573 ) * Stateful auto compaction * javaodc * add removed test back * fix test * adding indexSpec to compactionState * fix build * add lastCompactionState * address comments * extract CompactionState * fix doc * fix build and test * Add a task context to store compaction state; add javadoc * fix it test	2019-10-15 22:57:42 -07:00
Jonathan Wei	0c387c1d47	Fix Kinesis resharding issues (#8644 ) * Fix Kinesis resharding issues * PR comments * Adjust metadata error message * Remove unused method * Use sha1 for shard id hashing * Add metadata sanity check, add comment * Only use shard ID hashing for group mapping * Style fix * Fix unused import * update comment * Fix teamcity inspection	2019-10-10 00:16:44 -07:00
Jonathan Wei	526f04c47c	Fix missing jackson jars for hadoop ingestion (#8652 ) * Fix missing jackson jars for hadoop ingestion * PR comments * pom ordering * New approach * Remove all jackson-core/mapper-asl exclusions from hdfs storage	2019-10-08 23:54:55 -07:00
Mohammad J. Khan	18758f5228	Support LDAP authentication/authorization (#6972 ) * Support LDAP authentication/authorization * fixed integration-tests * fixed Travis CI build errors related to druid-security module * fixed failing test * fixed failing test header * added comments, force build * fixes for strict compilation spotbugs checks * removed authenticator rolling credential update feature * removed escalator rolling credential update feature * fixed teamcity inspection deprecated API usage error * fixed checkstyle execution error, removed unused import * removed cached config as part of removing authenticator rolling credential update feature * removed config bundle entity as part of removing authenticator rolling credential update feature * refactored ldao configuration * added support for SSLContext configuration and TLSCertificateChecker * removed check to return authentication failure when user has no group assigned, will be checked and handled by the authorizer * Separate out authorizer checks between metadata-backed store user and LDAP user/groups * refactored BasicSecuritySSLSocketFactory usage to fix strict compilation spotbugs checks * fixes build issue * final review comments updates * final review comments updates * fixed LGTM and spellcheck alerts * Fixed Avatica auth failure error message check * Updated metadata credentials validator exception message string, replaced DB with metadata store	2019-10-08 17:08:27 -07:00
Fokko Driesprong	a2363b6b61	Remove commons-httpclient (#8407 )	2019-09-27 02:14:58 -07:00
elloooooo	7f2b6577ef	get active task by datasource when supervisor discover tasks (#8450 ) * get active task by datasource when supervisor discover tasks * fix ut * fix ut * fix ut * remove unnecessary condition check * fix ut * remove stream in hot loop	2019-09-26 16:15:24 -07:00
Rye	f2a444321b	Added live reports for Kafka and Native batch task (#8557 ) * Added live reports for Kafka and Native batch task * Removed unused local variables * Added the missing unit test * Refine unit test logic, add implementation for HttpRemoteTaskRunner * checksytle fixes * Update doc descriptions for updated API * remove unnecessary files * Fix spellcheck complaints * More details for api descriptions	2019-09-23 21:08:36 -07:00
Benedict Jin	c6f4f09557	Fix missing space in string literal and spurious Javadoc @param tags from LGTM (#8491 ) * Fix missing space in string literal * Fix spurious Javadoc @param tags	2019-09-16 14:37:47 +05:30
Kamal Gurala	61761bd0b1	kafka version update (#8525 )	2019-09-12 18:56:47 -07:00
Chi Cao Minh	5f61374cb3	Fix dependency analyze warnings (#8230 ) * Fix dependency analyze warnings Update the maven dependency plugin to the latest version and fix all warnings for unused declared and used undeclared dependencies in the compile scope. Added new travis job to add the check to CI. Also fixed some source code files to use the correct packages for their imports and updated druid-forbidden-apis to prevent regressions. * Address review comments * Adjust scope for org.glassfish.jaxb:jaxb-runtime * Fix dependencies for hdfs-storage * Consolidate netty4 versions	2019-09-09 14:37:21 -07:00
Benedict Jin	de18840412	Fix inconsistent equals and hashCode (#8381 ) * Fix inconsistent equals and hashCode * Patch comments * Remove equals and hashCode from InsensitiveContainsSearchQuerySpec	2019-09-04 13:48:08 +08:00
Fokko Driesprong	abd86467f8	Bump ORC library to 1.5.6 (#8405 ) Changelog at: https://orc.apache.org/docs/releases.html#current-release---156	2019-09-02 02:24:31 -07:00
Clint Wylie	c73a489335	bump master version to 0.17.0-incubating-SNAPSHOT (#8421 )	2019-08-28 01:58:36 -07:00
Himanshu	4d87a19547	Logging emitter to publish query and other metric events as valid json objects (#8359 ) * LoggingEmitter: print event as json * use DefaultRequestLogEventBuilderFactory in emitting request logger by default * print context in query metric as json * removed unused jsonMapper from DefaultQueryMetrics * add comment * remove change to DefaultRequestLogEventBuilderFactory.java	2019-08-27 15:00:23 -07:00
Jihoon Son	e5ef5ddafa	Fix the shuffle with TLS enabled for parallel indexing; add an integration test; improve unit tests (#8350 ) * Fix shuffle with tls enabled; add an integration test; improve unit tests * remove debug log * fix tests * unused import * add javadoc * rename to getContent	2019-08-26 19:27:41 -07:00
Xavier Léauté	5c7803fe6b	fix powermock classloader issues with Java 9 and above	2019-08-24 18:20:52 -04:00
SandishKumarHN	33f0753a70	Add Checkstyle for constant name static final (#8060 ) * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * check ctyle for constant field name * merging with upstream * review-1 * unknow changes * unknow changes * review-2 * merging with master * review-2 1 changes * review changes-2 2 * bug fix	2019-08-23 13:13:54 +03:00
Atul Mohan	661976f266	Reset sketch combiner in AggregatorCombiner (#8368 ) * Reset union in AggregateCombiner * Use newer sketch objects for test * Add empty sketch objects	2019-08-23 00:22:40 -07:00
Jihoon Son	fba92ae469	Fix to always use end sequenceNumber for reset (#8305 ) * Fix to always use end sequenceNumber for reset * fix checkstyle * fix style and add log	2019-08-22 16:51:25 -05:00
Aaron Bossert	a4d1219184	Removed hard-coded Kafka Deserializer in Web-UI Kafka data import such that users can supply a custom deserializer in the UI as well as in hand-built ingestion specs. (#8364 )	2019-08-21 18:03:57 -07:00
Jihoon Son	22d6384d36	Fix unrealistic test variables in KafkaSupervisorTest and tidy up unused variable in checkpointing process (#7319 ) * Fix unrealistic test arguments in KafkaSupervisorTest * remove currentCheckpoint from checkpoint action * rename variable	2019-08-21 10:58:22 -07:00
Benedict Jin	781873ba53	Fix resource leak (#8337 ) * Fix resource leak * Patch comments	2019-08-20 12:55:41 +03:00
Clint Wylie	7362b1d8fc	exclude avro extension dependencies that are already included in druid libs (#8309 )	2019-08-16 17:42:06 -05:00
Clint Wylie	a34cfd3e68	exclude kerberos extension dependencies that are already included in druid core libraries (#8310 ) * exclude kerberos extension dependencies that are already included in druid libs * missing net * exclude json-smart * eh might as well go aggro and remove all the ones it looks like we do not need * guess we actually need this one	2019-08-16 17:41:47 -05:00
Jihoon Son	b654096194	Fix equals for ArrayOfDoublesSketchAggregatorFactory (#8326 )	2019-08-16 14:47:37 -07:00
Jihoon Son	5dac6375f3	Add support for parallel native indexing with shuffle for perfect rollup (#8257 ) * Add TaskResourceCleaner; fix a couple of concurrency bugs in batch tasks * kill runner when it's ready * add comment * kill run thread * fix test * Take closeable out of Appenderator * add javadoc * fix test * fix test * update javadoc * add javadoc about killed task * address comment * Add support for parallel native indexing with shuffle for perfect rollup. * Add comment about volatiles * fix test * fix test * handling missing exceptions * more clear javadoc for stopGracefully * unused import * update javadoc * Add missing statement in javadoc * address comments; fix doc * add javadoc for isGuaranteedRollup * Rename confusing variable name and fix typos * fix typos; move fetch() to a better home; fix the expiration time * add support https	2019-08-15 17:43:35 -07:00
Sayat	1f3a99616d	Upgrade Kafka library for kafka-lookup module (#8078 ) * Upgrade Kafka library for kafka-lookup module * Update licenes.yaml * Adopt class workaround from KafkaRecordSupplier#getKafkaConsumer * Update lisences for kafka clients	2019-08-14 13:46:25 -07:00
Clint Wylie	1054d85171	add mechanism to control filter optimization in historical query processing (#8209 ) * add support for mechanism to control filter optimization in historical query processing * oops * adjust * woo * javadoc * review comments * fix * default * oops * oof * this will fix it * more nullable, refactor DimFilter.getRequiredColumns to use Set, formatting * extract class DimFilterToStringBuilder with common code from custom DimFilter toString implementations * adjust variable naming * missing nullable * more nullable * fix javadocs * nullable * address review comments * javadocs, precondition * nullable * rename method to be consistent * review comments * remove tuning from ColumnComparisonFilter/ColumnComparisonDimFilter	2019-08-09 16:36:18 -07:00
Jihoon Son	8a16a8e97f	Teach tasks what machine they are running on (#8190 ) * Teach the middleManager port to tasks * parent annotation * Bind parent for indexer	2019-08-02 15:34:44 -07:00
Fokko Driesprong	91743eeebe	Spotbugs: NP_NONNULL_PARAM_VIOLATION (#8129 )	2019-08-02 19:20:22 +03:00
Gian Merlino	77297f4e6f	GroupBy array-based result rows. (#8196 ) * GroupBy array-based result rows. Fixes #8118; see that proposal for details. Other than the GroupBy changes, the main other "interesting" classes are: - ResultRow: The array-based result type. - BaseQuery: T is no longer required to be Comparable. - QueryToolChest: Adds "decorateObjectMapper" to enable query-aware serialization and deserialization of result rows (necessary due to their positional nature). - QueryResource: Uses the new decoration functionality. - DirectDruidClient: Also uses the new decoration functionality. - QueryMaker (in Druid SQL): Modifications to read ResultRows. These classes weren't changed, but got some new javadocs: - BySegmentQueryRunner - FinalizeResultsQueryRunner - Query * Adjustments for TC stuff.	2019-07-31 16:15:12 -07:00
Gian Merlino	63461311f8	HllSketch Merge/Build BufferAggregators: Speed up init with prebuilt sketch. (#8194 ) * HllSketchMergeBufferAggregator: Speed up init by copying prebuilt sketch. * Remove useless writableRegion call. * POM variables. * Fix missing reposition. * Apply similar optimization to HllSketchBuildBufferAggregator. * Rename emptySketch -> emptyUnion in merge flavor. * Adjustments based on review. * Comment update. * Additional updates. * Comment push.	2019-07-31 08:18:42 -07:00
Jihoon Son	385f492a55	Use PartitionsSpec for all task types (#8141 ) * Use partitionsSpec for all task types * fix doc * fix typos and revert to use isPushRequired * address comments * move partitionsSpec to core * remove hadoopPartitionsSpec	2019-07-30 17:24:39 -07:00
Aaron Bossert	aba65bb675	removed hard-coded Kafka key and value deserializer (#8112 ) * removed hard-coded Kafka key and value deserializer, leaving default deserializer as org.apache.kafka.common.serialization.ByteArrayDeserializer. Also added checks to ensure that any provided deserializer class extends org.apache.kafka.serialization.Deserializer and outputs a byte array. * Addressed all comments from original pull request and also added a unit test. * Added additional test that uses "poll" to ensure that custom deserializer works properly.	2019-07-30 16:25:32 -07:00
Jonathan Wei	640b7afc1c	Add CliIndexer process type and initial task runner implementation (#8107 ) * Add CliIndexer process type and initial task runner implementation * Fix HttpRemoteTaskRunnerTest * Remove batch sanity check on PeonAppenderatorsManager * Fix paralle index tests * PR comments * Adjust Jersey resource logging * Additional cleanup * Fix SystemSchemaTest * Add comment to LocalDataSegmentPusherTest absolute path test * More PR comments * Use Server annotated with RemoteChatHandler * More PR comments * Checkstyle * PR comments * Add task shutdown to stopGracefully * Small cleanup * Compile fix * Address PR comments * Adjust TaskReportFileWriter and fix nits * Remove unnecessary closer * More PR comments * Minor adjustments * PR comments * ThreadingTaskRunner: cancel task run future not shutdownFuture and remove thread from workitem	2019-07-29 17:06:33 -07:00
Chi Cao Minh	ab71a2e1e4	Revert "Fix dependency analyze warnings (#8128 )" (#8189 ) This reverts commit `5dd0d8e873`.	2019-07-29 11:42:16 -07:00


630 Commits