druid

mirror of https://github.com/apache/druid.git synced 2025-02-28 06:19:13 +00:00

Author	SHA1	Message	Date
awelsh93	6964ac23a2	Adding influxdb emitter as a contrib extension (#7717 ) * Adding influxdb emitter as a contrib extension * addressing code review comments	2019-05-23 11:11:48 -07:00
Merlin Lee	26fad7e06a	Add checkstyle for "Local variable names shouldn't start with capital" (#7681 ) * Add checkstyle for "Local variable names shouldn't start with capital" * Adjust some local variables to constants * Replace StringUtils.LINE_SEPARATOR with System.lineSeparator()	2019-05-23 18:40:28 +02:00
Clint Wylie	ffc2397bcd	fix AggregatorFactory.finalizeComputation implementations to be ok with null inputs (#7731 ) * AggregatorFactory finalizeComputation is nullable with nullable input, make implementations honor this * fixes	2019-05-22 21:13:09 -07:00
Gian Merlino	b6941551ae	Upgrade various build and doc links to https. (#7722 ) * Upgrade various build and doc links to https. Where it wasn't possible to upgrade build-time dependencies to https, I kept http in place but used hardcoded checksums or GPG keys to ensure that artifacts fetched over http are verified properly. * Switch to https://apache.org.	2019-05-21 11:30:14 -07:00
Merlin Lee	5f08b0b474	Add checkstyle for "Prohibit @author tags in Javadoc" (#7682 ) * Add checkstyle for "Prohibit @author tags in Javadoc" * Add "Do not use author tags/information in the code" back to CONTRIBUTING.md	2019-05-20 00:09:51 -07:00
Jonathan Wei	d99f77a01b	Add option to use YARN RM as fallback for JobHistory failure (#7673 ) * Add option to use YARN RM as fallback for job status * PR comments	2019-05-16 13:59:10 -07:00
Denis Boulas	0b739a5787	Added more mappings for new metrics. (#7665 ) Signed-off-by: Denis Boulas <dene14@gmail.com>	2019-05-15 15:04:46 -07:00
Fokko Driesprong	2aa9613bed	Bump Checkstyle to 8.20 (#7651 ) * Bump Checkstyle to 8.20 Moderate severity vulnerability that affects: com.puppycrawl.tools:checkstyle Checkstyle prior to 8.18 loads external DTDs by default, which can potentially lead to denial of service attacks or the leaking of confidential information. Affected versions: < 8.18 * Oops, missed one * Oops, missed a few	2019-05-14 11:53:37 -07:00
Samarth Jain	b542bb9f34	TDigest backed sketch aggregators (#7331 ) * First set of changes for tDigest histogram * Add license * Address code review comments * Add a doc page for new T-Digest sketch aggregators. Minor code cleanup and comments. * Remove synchronization from BufferAggregators. Address code review comments * Fix typo	2019-05-09 17:22:55 -07:00
Xavier Léauté	f7bfe8f269	Update mocking libraries for Java 11 support (#7596 ) * update easymock / powermock for to 4.0.2 / 2.0.2 for JDK11 support * update tests to use new easymock interfaces * fix tests failing due to easymock fixes * remove dependency on jmockit * fix race condition in ResourcePoolTest	2019-05-06 12:28:56 -07:00
Eyal Yurman	f02251ab2d	Contributing Moving-Average Query to open source. (#6430 ) * Contributing Moving-Average Query to open source. * Fix failing code inspections. * See if explicit types will invoke the correct comparison function. * Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter. * Update styling and headers for complience. * Refresh code with latest master changes: * Remove NullDimensionSelector. * Apply changes of RequestLogger. * Apply changes of TimelineServerView. * Small checkstyle fix. * Checkstyle fixes. * Fixing rat errors; Teamcity errors. * Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved. * Implements some of the review fixes. * Contributing Moving-Average Query to open source. * Fix failing code inspections. * See if explicit types will invoke the correct comparison function. * Explicitly remove support for druid.generic.useDefaultValueForNull configuration parameter. * Update styling and headers for complience. * Refresh code with latest master changes: * Remove NullDimensionSelector. * Apply changes of RequestLogger. * Apply changes of TimelineServerView. * Small checkstyle fix. * Checkstyle fixes. * Fixing rat errors; Teamcity errors. * Removing support theta sketches. Will be added back in this pr or a following once DI conflicts with datasketches are resolved. * Implements some of the review fixes. * More fixes for review. * More fixes from review. * MapBasedRow is Unmodifiable. Create new rows instead of modifying existing ones. * Remove more changes related to datasketches support. * Refactor BaseAverager startFrom field and add a comment. * fakeEvents field: Refactor initialization and add comment. * Rename parameters (tiny change). * Fix variable name typo in test (JAN_4). * Fix styling of non camelCase fields. * Fix Preconditions.checkArgument for cycleSize. * Add more documentation to RowBucketIterable and other classes. * key/value comment on in MovingAverageIterable. * Fix anonymous makeColumnValueSelector returning null. * Replace IdentityYieldingAccumolator with Yielders.each(). * * internalNext() should return null instead of throwing exception. * Remove unused variables/prarameters. * Harden MovingAverageIterableTest (Switch anyOf to exact match). * Change internalNext() from recursion to iteration; Simplify next() and hasNext(). * Remove unused imports. * Address review comments. * Rename fakeEvents to emptyEvents. * Remove redundant parameter key from computeMovingAverage. * Check yielder as well in RowBucketIterable#hasNext() * Fix javadoc.	2019-04-26 17:07:48 -07:00
Clint Wylie	89bb43f382	'core' ORC extension (#7138 ) * orc extension reworked to use apache orc map-reduce lib, moved to core extensions, support for flattenSpec, tests, docs * change binary handling to be compatible with avro and parquet, Rows.objectToStrings now converts byte[] to base64, change date handling * better docs and tests * fix it * formatting * doc fix * fix it * exclude redundant dependencies * use latest orc-mapreduce, add hadoop jobProperties recommendations to docs * doc fix * review stuff and fix binaryAsString * cache for root level fields * more better	2019-04-09 09:03:26 -07:00
Clint Wylie	a99f0ff450	prefix no-op aggs with "Noop" (#6960 )	2019-04-02 15:05:07 -07:00
Charles Allen	eeb3dbe79d	Move GCP to a core extension (#6953 ) * Move GCP to a core extension * Don't provide druid-core >.< * Keep AWS and GCP modules separate * Move AWSModule to its own module * Add aws ec2 extension and more modules in more places * Fix bad imports * Fix test jackson module * Include AWS and GCP core in server * Add simple empty method comment * Update version to 15 * One more 0.13.0-->0.15.0 change * Fix multi-binding problem * Grep for s3-extensions and update docs * Update extensions.md	2019-03-27 09:00:43 -07:00
Roman Leventov	bca40dcdaf	Fix some IntelliJ inspections (#7273 ) Prepare TeamCity for IntelliJ 2018.3.1 upgrade. Mostly removed redundant exceptions declarations in `throws` clauses.	2019-03-25 21:11:01 -03:00
Jihoon Son	0c5dcf5586	Fix exclusivity for start offset in kinesis indexing service & check exclusivity properly in IndexerSQLMetadataStorageCoordinator (#7291 ) * Fix exclusivity for start offset in kinesis indexing service * some adjustment * Fix SeekableStreamDataSourceMetadata * Add missing javadocs * Add missing comments and unit test * fix SeekableStreamStartSequenceNumbers.plus and add comments * remove extra exclusivePartitions in KafkaIOConfig and fix downgrade issue * Add javadocs * fix compilation * fix test * remove unused variable	2019-03-21 13:12:22 -07:00
Roman Leventov	dfd27e00c0	Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality (#7185 ) * Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality * Fix DruidCoordinatorTest; Renamed DruidCoordinator.getReplicationStatus() to computeUnderReplicationCountsPerDataSourcePerTier() * More Javadocs, typos, refactor DruidCoordinatorRuntimeParams.createAvailableSegmentsSet() * Style * typo * Disable StaticPseudoFunctionalStyleMethod inspection because of too much false positives * Fixes	2019-03-19 18:22:56 -03:00
Jihoon Son	892d1d35d6	Deprecate NoneShardSpec and drop support for automatic segment merge (#6883 ) * Deprecate noneShardSpec * clean up noneShardSpec constructor * revert unnecessary change * Deprecate mergeTask * add more doc * remove convert from indexMerger * Remove mergeTask * remove HadoopDruidConverterConfig * fix build * fix build * fix teamcity * fix teamcity * fix ServerModule * fix compilation * fix compilation	2019-03-15 23:29:25 -07:00
Furkan KAMACI	7ada1c49f9	Prohibit Throwables.propagate() (#7121 ) * Throw caught exception. * Throw caught exceptions. * Related checkstyle rule is added to prevent further bugs. * RuntimeException() is used instead of Throwables.propagate(). * Missing import is added. * Throwables are propogated if possible. * Throwables are propogated if possible. * Throwables are propogated if possible. * Throwables are propogated if possible. * * Checkstyle definition is improved. * Throwables.propagate() usages are removed. * Checkstyle pattern is changed for only scanning "Throwables.propagate(" instead of checking lookbehind. * Throwable is kept before firing a Runtime Exception. * Fix unused assignments.	2019-03-14 18:28:33 -03:00
Clint Wylie	3895914aa2	consolidate CompressionUtils.java since now in the same jar (#6908 )	2019-03-13 11:02:44 -04:00
Clint Wylie	4d3987c1dd	lifecycle stage refactor to ensure proper start and stop ordering of servers and announcements (#7234 ) * lifecycle stage refactor to ensure proper ordering of servers and announcements * move DerivativeDataSourceManager to Lifecycle.Stage.NORMAL	2019-03-12 07:09:03 -07:00
Jonathan Wei	0b4f771062	Exclude hadoop-lzo from thrift-extensions build (#7151 )	2019-02-27 19:57:53 -08:00
Jihoon Son	4e2b085201	Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file in deep storage (#6911 ) * Remove DataSegmentFinder, InsertSegmentToDb, and descriptor.json file * delete descriptor.file when killing segments * fix test * Add doc for ha * improve warning	2019-02-20 15:10:29 -08:00
Furkan KAMACI	9a521526c7	Since notify might not wake up the right thread, notifyAll should be used instead. (#6931 ) * Since notify might not wake up the right thread, notifyAll should be used instead. * Comment is added about why notifyAll() is not used.	2019-02-20 09:02:58 -08:00
Fangyuan Deng	7d1e8f353e	bugfix: when building materialized-view, if taskCount>1, may cause concurrentModificationException (#6690 ) * bugfix: when building materialized-view, if taskCount >1, may cause ConcurrentModificationException * remove entry after iteration instead of using ConcurrentMap, and add unit test * small change * modify unit test for coverage * remove unused method	2019-02-19 13:10:55 -08:00
Surekha	80a2ef7be4	Support kafka transactional topics (#5404 ) (#6496 ) * Support kafka transactional topics * update kafka to version 2.0.0 * Remove the skipOffsetGaps option since it's not used anymore * Adjust kafka consumer to use transactional semantics * Update tests * Remove unused import from test * Fix compilation * Invoke transaction api to fix a unit test * temporary modification of travis.yml for debugging * another attempt to get travis tasklogs * update kafka to 2.0.1 at all places * Remove druid-kafka-eight dependency from integration-tests, remove the kafka firehose test and deprecate kafka-eight classes * Add deprecated in docs for kafka-eight and kafka-simple extensions * Remove skipOffsetGaps and code changes for transaction support * Fix indentation * remove skipOffsetGaps from kinesis * Add transaction api to KafkaRecordSupplierTest * Fix indent * Fix test * update kafka version to 2.1.0	2019-02-18 11:50:08 -08:00
Jonathan Wei	1f29940811	Fix momentsketch build issues (#7074 ) * Fix momentsketch build issues * Remove unused section in pom * Fix test * Remove unused method * Checkstyle	2019-02-13 21:32:43 -08:00
Edward Gan	90c1a54b86	Moments Sketch custom aggregator (#6581 ) * Moments Sketch Integration with Druid * updates, add documentation, fix warnings * nits * disallowed base64 * update to druid 0.14	2019-02-13 14:03:47 -08:00
Jonathan Wei	fafbc4a80e	Set version to 0.15.0-incubating-SNAPSHOT (#7014 )	2019-02-07 14:02:52 -08:00
Jonathan Wei	8bc5eaa908	Set version to 0.14.0-incubating-SNAPSHOT (#7003 )	2019-02-04 19:36:20 -08:00
Roman Leventov	0e926e8652	Prohibit assigning concurrent maps into Map-typed variables and fields and fix a race condition in CoordinatorRuleManager (#6898 ) * Prohibit assigning concurrent maps into Map-types variables and fields; Fix a race condition in CoordinatorRuleManager; improve logic in DirectDruidClient and ResourcePool * Enforce that if compute(), computeIfAbsent(), computeIfPresent() or merge() is called on a ConcurrentHashMap, it's stored in a ConcurrentHashMap-typed variable, not ConcurrentMap; add comments explaining get()-before-computeIfAbsent() optimization; refactor Counters; fix a race condition in Intialization.java * Remove unnecessary comment * Checkstyle * Fix getFromExtensions() * Add a reference to the comment about guarded computeIfAbsent() optimization; IdentityHashMap optimization * Fix UriCacheGeneratorTest * Workaround issue with MaterializedViewQueryQueryToolChest * Strengthen Appenderator's contract regarding concurrency	2019-02-04 09:18:12 -08:00
David Glasser	9eaf8f5304	google-storage: retry GoogleTaskLogs inserts (#6918 ) This is an extension of PR #5750 by @drcrallen which added retry to a variety of GCS operations, but not to GoogleTaskLogs, which we have found to occasionally fail in our cluster. Also fixes a typo in a variable name and removes an unused private method parameter. Fixes #6912.	2019-01-31 01:21:35 -08:00
Clint Wylie	a6d81c0d16	Adds bloom filter aggregator to 'druid-bloom-filters' extension (#6397 ) * blooming aggs * partially address review * fix docs * minor test refactor after rebase * use copied bloomkfilter * add ByteBuffer methods to BloomKFilter to allow agg to use in place, simplify some things, more tests * add methods to BloomKFilter to get number of set bits, use in comparator, fixes * more docs * fix * fix style * simplify bloomfilter bytebuffer merge, change methods to allow passing buffer offsets * oof, more fixes * more sane docs example * fix it * do the right thing in the right place * formatting * fix * avoid conflict * typo fixes, faster comparator, docs for comparator behavior * unused imports * use buffer comparator instead of deserializing * striped readwrite lock for buffer agg, null handling comparator, other review changes * style fixes * style * remove sync for now * oops * consistency * inspect runtime shape of selector instead of selector plus, static comparator, add inner exception on serde exception * CardinalityBufferAggregator inspect selectors instead of selectorPluses * fix style * refactor away from using ColumnSelectorPlus and ColumnSelectorStrategyFactory to instead use specialized aggregators for each supported column type, other review comments * adjustment * fix teamcity error? * rename nil aggs to empty, change empty agg constructor signature, add comments * use stringutils base64 stuff to be chill with master * add aggregate combiner, comment	2019-01-29 20:05:17 +07:00
Benedict Jin	72a571fbf7	For performance reasons, use `java.util.Base64` instead of Base64 in Apache Commons Codec and Guava (#6913 ) * * Add few methods about base64 into StringUtils * Use `java.util.Base64` instead of others * Add org.apache.commons.codec.binary.Base64 & com.google.common.io.BaseEncoding into druid-forbidden-apis * Rename encodeBase64String & decodeBase64String * Update druid-forbidden-apis	2019-01-25 17:32:29 -08:00
Roman Leventov	8eae26fd4e	Introduce SegmentId class (#6370 ) * Introduce SegmentId class * tmp * Fix SelectQueryRunnerTest * Fix indentation * Fixes * Remove Comparators.inverse() tests * Refinements * Fix tests * Fix more tests * Remove duplicate DataSegmentTest, fixes #6064 * SegmentDescriptor doc * Fix SQLMetadataStorageUpdaterJobHandler * Fix DataSegment deserialization for ignoring id * Add comments * More comments * Address more comments * Fix compilation * Restore segment2 in SystemSchemaTest according to a comment * Fix style * fix testServerSegmentsTable * Fix compilation * Add comments about why SegmentId and SegmentIdWithShardSpec are separate classes * Fix SystemSchemaTest * Fix style * Compare SegmentDescriptor with SegmentId in Javadoc and comments rather than with DataSegment * Remove a link, see https://youtrack.jetbrains.com/issue/IDEA-205164 * Fix compilation	2019-01-21 11:11:10 -08:00
Jonathan Wei	b18d681551	Use kafka_2.12-0.10.2.2 (#6846 )	2019-01-10 20:52:55 -08:00
pzhdfy	ea3f426171	bugfix: Materialized view not support post-aggregator (#6689 ) * bugfix: Materialized view not support post post-aggregator * add unit test	2019-01-10 14:25:09 -08:00
David Glasser	c08f391605	statsd-emitter: support constant DogStatsD tags (#6791 ) PR #6605 added support to the statsd emitter for DogStatsD tags. This commit lets you specify "constant tags" in the config file which are included with every event. This is helpful if you are running in an environment where you cannot configure your datadog-agent with tags like "cluster name" --- eg, a Kubernetes cluster with a datadog-agent on each node and different Druid deployments in different namespaces but sharing the same datadog-agent daemonset. Also fix the name of an existing boolean getter to start with 'is'.	2019-01-04 15:35:37 +08:00
David Lim	3443f9a008	add missing contrib extensions to distribution packaging, fix influx package name, fix checkstyle plugin config to support Maven < 3.3.9 (#6717 )	2018-12-10 17:56:32 -08:00
Mingming Qiu	e8dd3716b8	add close method in Cache interface (#6540 ) * add close method in Cache interface * address comments * address comments and fix travis-ci * use try-finally	2018-12-06 17:28:41 +08:00
Mingming Qiu	c5405bb592	emit maxLag/avgLag in KafkaSupervisor (#6587 ) * emit maxLag/totalLag/avgLag in KafkaSupervisor * modify ingest/kafka/totalLag to ingest/kafka/lag for backwards compatibility	2018-11-28 02:11:14 -08:00
Roman Leventov	87b96fb1fd	Add checkstyle rules about imports and empty lines between members (#6543 ) * Add checkstyle rules about imports and empty lines between members * Add suppressions * Update Eclipse import order * Add empty line * Fix StatsDEmitter	2018-11-20 12:42:15 +01:00
Deiwin Sarjas	e0d1dc5846	Support DogStatsD style tags in statsd-emitter (#6605 ) * Replace StatsD client library The [Datadog package][1] is a StatsD compatible drop-in replacement for the client library, but it seems to be [better maintained][2] and has support for Datadog DogStatsD specific features, which will be made use of in a subsequent commit. The `count`, `time`, and `gauge` methods are actually exactly compatible with the previous library and the modifications shouldn't be required, but EasyMock seems to have a hard time dealing with the variable arguments added by the DogStatsD library and causes tests to fail if no arguments are provided for the last String vararg. Passing an empty array fixes the test failures. [1]: https://github.com/DataDog/java-dogstatsd-client [2]: https://github.com/tim-group/java-statsd-client/issues/37#issuecomment-248698856 * Retain dimension key information for StatsD metrics This doesn't change behavior, but allows separating dimensions from the metric name in subsequent commits. There is a possible order change for values from `dimsBuilder.build().values()`, but from the tests it looks like it doesn't affect actual behavior and the order of user dimensions is also retained. * Support DogStatsD style tags in statsd-emitter Datadog [doesn't support name-encoded dimensions and uses a concept of _tags_ instead.][1] This change allows Datadog users to send the metrics without having to encode the various dimensions in the metric names. This enables building graphs and monitors with and without aggregation across various dimensions from the same data. As tests in this commit verify, the behavior remains the same for users who don't enable the `druid.emitter.statsd.dogstatsd` configuration flag. [1]: https://www.datadoghq.com/blog/the-power-of-tagged-metrics/#tags-decouple-collection-and-reporting * Disable convertRange behavior for DogStatsD users DogStatsD, unlike regular StatsD, supports floating-point values, so this behavior is unnecessary. It would be possible to still support `convertRange`, even with `dogstatsd` enabled, but that would mean that people using the default mapping would have some of the gauges unnecessarily converted. `time` is in milliseconds and doesn't support floating-point values.	2018-11-19 09:47:57 -08:00
Mingming Qiu	93b0d58571	optimize input row parsers (#6590 ) * optimize input row parsers * address comments	2018-11-16 11:48:32 +08:00
Jihoon Son	d738ce4d2a	Enforce logging when killing a task (#6621 ) * Enforce logging when killing a task * fix test * address comment * address comment	2018-11-16 10:01:56 +08:00
Roman Leventov	8f3fe9cd02	Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies (#6607 ) * Prohibit String.replace() and String.replaceAll(), fix and prohibit some toString()-related redundancies * Fix bug * Replace checkstyle regexp with IntelliJ inspection	2018-11-15 13:21:34 -08:00
David Lim	afb239b17a	add missing license headers, in particular to MD files; clean up RAT … (#6563 ) * add missing license headers, in particular to MD files; clean up RAT exclusions * revert inadvertent doc changes * docs * cr changes * fix modified druid-production.svg	2018-11-13 09:38:37 -08:00
Roman Leventov	54351a5c75	Fix various bugs; Enable more IntelliJ inspections and update error-prone (#6490 ) * Fix various bugs; Enable more IntelliJ inspections and update error-prone * Fix NPE * Fix inspections * Remove unused imports	2018-11-06 14:38:08 -08:00
Clint Wylie	1224d8b746	overhaul 'druid-parquet-extensions' module, promoting from 'contrib' to 'core' (#6360 ) * move parquet-extensions from contrib to core, adds new hadoop parquet parser that does not convert to avro first and supports flattenSpec and int96 columns, add support for flattenSpec for parquet-avro conversion parser, much test with a bunch of files lifted from spark-sql * fix avro flattener to support nullable primitives for auto discovery and now only supports primitive arrays instead of all arrays * remove leftover print * convert micro timestamp to millis * checkstyle * add ignore for .parquet and .parq to rat exclude * fix legit test failure from avro flattern behavior change * fix rebase * add exclusions to pom to cut down on redundant jars * refactor tests, add support for unwrapping lists for parquet-avro, review comments * more comment * fix oops * tweak parquet-avro list handling * more docs * fix style * grr styles	2018-11-05 21:33:42 -08:00
Roman Leventov	2cdce2e2a6	Add RequestLogEventBuilderFactory (#6477 ) This PR allows to control the fields in `RequestLogEvent`, emitted in `EmittingRequestLogger`. In our case, we want to get rid of the `intervals` fields of the query objects that are a part of `DefaultRequestLogEvent`. They are enormous (thousands of segments) and not useful. Related to #5522, FYI @a2l007.	2018-10-31 22:24:37 +01:00

1 2 3 4 5 ...

262 Commits