druid

Commit Graph

Author	SHA1	Message	Date
Fokko Driesprong	13143f9376	Update to Parquet 1.8.2 (#4210 ) Hi guys, Since Spark 2.x uses Parquet 1.8.2, we would like to update Druid's parquet library from 1.8.1 to 1.8.2 as well. It includes a lot of patches, performance improvements and better compatibility: `4aba4da...c652278` Cheers, Fokko	2017-04-27 15:34:30 +09:00
Roman Leventov	ee9b5a619a	Fix bugs in query builders and in TimeBoundaryQuery.getFilter() (#4131 ) * Add queryMetrics property to Query interface; Fix bugs and removed unused code in Druids * Fix a bug in TimeBoundaryQuery.getFilter() and remove TimeBoundaryQuery.getDimensionsFilter() * Don't reassign query's queryMetrics if already present in CPUTimeMetricQueryRunner and MetricsEmittingQueryRunner * Add compatibility constructor to BaseQuery * Remove Query.queryMetrics property * Move nullToNoopLimitSpec() method to LimitSpec interface * Rename GroupByQuery.applyLimit() to postProcess(); Fix inconsistencies in GroupByQuery.Builder	2017-04-25 16:32:02 -05:00
Gian Merlino	2ca7b00346	Update versions to 0.10.1-SNAPSHOT. (#4191 )	2017-04-20 18:12:28 -07:00
Jihoon Son	5b69f2eff2	Make timeout behavior consistent to document (#4134 ) * Make timeout behavior consistent to document * Refactoring BlockingPool and add more methods to QueryContexts * remove unused imports * Addressed comments * Address comments * remove unused method * Make default query timeout configurable * Fix test failure * Change timeout from period to millis	2017-04-19 09:47:53 +09:00
kaijianding	db656c5a88	fix kafka8 unparsable message halt job issue (#4164 )	2017-04-18 11:23:02 -07:00
Dongkyu Hwangbo	0d2e91ed50	Adding Kafka-emitter (#3860 ) * Initial commit * Apply another config: clustername * Rename variable * Fix bug * Add retry logic * Edit retry logic * Upgrade kafka-clients version to the most recent release * Make callback single object * Write documentation * Rewrite error message and emit logic * Handling AlertEvent * Override toString() * make clusterName more optional * bump up druid version * add producer.config option which make user can apply another optional config value of kafka producer * remove potential blocking in emit() * using MemoryBoundLinkedBlockingQueue * Fixing coding convention * Remove logging every exception and just increment counting * refactoring * trivial modification * logging when callback has exception * Replace kafka-clients 0.10.1.1 with 0.10.2.0 * Resolve the problem related of classloader * adopt try statement * code reformatting * make variables final * rewrite toString	2017-04-04 14:07:43 -07:00
Jason Banich	117c698c59	Update StatsDEmitter.java (#4111 ) This was mentioned in the original pull (https://github.com/druid-io/druid/pull/2410) by @sascha-coenen, and the original author (@michaelschiff) agreed that it seemed reasonable This commit fixes issue https://github.com/druid-io/druid/issues/3960	2017-03-27 10:32:21 -07:00
Roman Leventov	4b5ae31207	QueryMetrics: abstraction layer of query metrics emitting (part of #3798 ) (#3954 ) * QueryMetrics: abstraction layer of query metrics emitting * Minor fixes * QueryMetrics.emit() for bulk emit and improve Javadoc * Fixes * Fix * Javadoc fixes * Typo * Use DefaultObjectMapper * Add tests * Address PR comments * Remove QueryMetrics.userDimensions(); Rename QueryMetric.register() to report() * Dedicated TopNQueryMetricsFactory, GroupByQueryMetricsFactory and TimeseriesQueryMetricsFactory * Typo * More elaborate Javadoc of QueryMetrics * Formatting * Replace QueryMetric enum with lambdas * Add comments and VisibleForTesting annotations	2017-03-23 17:23:59 -07:00
Karol Woźniak	8510a52e02	scan-query: Use long as limit. (#4081 ) * scan-query: Use long instead of int as limit type * Use MAX_INSTANT queryTimeout, if timeout == 0	2017-03-20 14:19:35 -07:00
Roman Leventov	84fe91ba0b	Monomorphic processing of TopN queries with 1 and 2 aggregators (key part of #3798 ) (#3889 ) * Monomorphic processing: add HotLoopCallee, CalledFromHotLoop, RuntimeShapeInspector, SpecializationService. Specialize topN queries with 1 or 2 aggregators. Add Cursor.advanceUninterruptibly() and isDoneOrInterrupted() for exception-free query processing. * Use Execs.singleThreaded() * RuntimeShapeInspector to support nullable fields * Make CalledFromHotLoop annotation Inherited * Remove unnecessary conversion of array of ColumnSelectorPluses to list and back to array in CardinalityAggregatorFactory * Close InputStream in SpecializationService * Formatting * Test specialized PooledTopNScanners * Set flags in PooledTopNAlgorithm directly * Fix tests, dependent on CountAggragatorFactory toString() form * Fix * Revert CountAggregatorFactory changes * Implement inspectRuntimeShape() for LongWrappingDimensionSelector and FloatWrappingDimensionSelector * Remove duplicate RoaringBitmap dependency in the extendedset pom.xml * Fix * Treat ByteBuffers specially in StringRuntimeShape * Doc fix * Annotate BufferAggregator.init() with CalledFromHotLoop * Make triggerSpecializationIterationsThreshold an int * Remove SpecializationService.PerPrototypeClassState.of() * Add comments * Limit the amount of specializations that SpecializationService could make * Add default implementation for BufferAggregator.inspectRuntimeShape(), for compatibility with extensions * Use more efficient ConcurrentMap's idioms in SpecializationService	2017-03-17 14:44:36 -05:00
Roman Leventov	81a5f9851f	TmpFileIOPeons to create files under the merging output directory, instead of java.io.tmpdir (#3990 ) * In IndexMerger and IndexMergerV9, create temporary files under the output directory/tmpPeonFiles, instead of java.io.tmpdir * Use FileUtils.forceMkdir() across the codebase and remove some unused code * Fix test * Fix PullDependencies.run() * Unused import	2017-03-02 14:05:12 -08:00
Jonathan Wei	5fb1638534	Add default configuration for select query 'fromNext' parameter (#3986 ) * Add default configuration for select query 'fromNext' parameter * PR comments * Fix PagingSpec config injection * Injection fix for test	2017-03-01 17:05:35 -08:00
Gian Merlino	cc20133e70	Checkstyle rule to outlaw tabs. (#3988 ) Tabs are the worst.	2017-02-28 23:52:53 -08:00
Jonathan Wei	a08660a9ca	Support ingestion of long/float dimensions (#3966 ) * Support ingestion for long/float dimensions * Allow non-arrays for key components in indexing type strategy interfaces * Add numeric index merge test, fixes * Docs for numeric dims at ingestion * Remove unused import * Adjust docs, add aggregate on numeric dims tests * remove unused imports * Throw exception for bitmap method on numerics * Move typed selector creation to DimensionIndexer interface * unused imports * Fix * Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter * Remove spaces	2017-02-28 19:04:41 -08:00
praveev	5ccfdcc48b	Fix testDeadlock timeout delay (#3979 ) * No more singleton. Reduce iterations * Granularities * Fix the delay in the test * Add license header * Remove unused imports * Lot more unused imports from all the rearranging * CR feedback * Move javadoc to constructor	2017-02-28 12:51:41 -06:00
praveev	c3bf40108d	One granularity (#3850 ) * Refactor Segment Granularity * Beginning of one granularity * Copy the fix for custom periods in segment-grunalrity over here. * Remove the custom serialization for now. * Compilation cleanup * Reformat code * Fixing unit tests * Unify to use a single iterable * Backward compatibility for rolling upgrade * Minor check style. Cosmetic changes. * Rename length and millis to duration * CR feedback * Minor changes.	2017-02-25 01:02:29 -06:00
michaelschiff	e5fb0e1ff5	New property for each metric that tells the StatsDEmitter to convert metric values from range 0-1 to 0-100. This (#3936 ) prevents rates and percentages expressed as Doubles (0.xx) from being rounded down to 0.	2017-02-16 13:55:56 -08:00
Akash Dwivedi	8854ce018e	File.deleteOnExit() (#3923 ) * Less use of File.deleteOnExit() * removed deleteOnExit from most of the tests/benchmarks/iopeon * Made IOpeon closable * Formatting. * Revert DeterminePartitionsJobTest, remove cleanup method from IOPeon	2017-02-13 15:12:14 -08:00
michaelschiff	c1eee9bbf3	modified "end" column to `end` (#3903 ) * modified "end" column to `end`. "end" is interpretted as a string rather than dereferencing the column value * SQLMetadataConnector.getQuoteString defines the string that should be used to quote string fields * positional arguments for String.format * for Connectors that use " need to include the \ escape as well	2017-02-13 12:36:27 -08:00
Jonathan Wei	ca2b04f0fd	Add long/float ColumnSelectorStrategy implementations (#3838 ) * Add long/float ColumnSelectorStrategy implementations * Address PR comments * Add String strategy with internal dictionary to V2 groupby, remove dict from numeric wrapping selectors, more tests * PR comments * Use BaseSingleValueDimensionSelector for long/float wrapping * remove unused import * Address PR comments * PR comments * PR comments * More PR comments * Fix failing calcite histogram subquery tests * ScanQuery test and comment about isInputRaw * Add outputType to extractionDimensionSpec, tweak SQL tests * Fix limit spec optimization for numerics * Add cardinality sanity checks to TopN * Fix import from merge * Add tests for filtered dimension spec outputType * Address PR comments * Allow filtered dimspecs on numerics * More comments	2017-02-08 20:39:29 -08:00
Gian Merlino	12317fd001	Bump version to 0.10.0-SNAPSHOT. (#3913 )	2017-02-06 17:54:35 -08:00
Jonathan Wei	e6b95e80aa	Remove deprecated Aggregator/AggregatorFactory methods (#3894 )	2017-02-01 14:43:18 -08:00
Gian Merlino	d3a3b7ba0c	Add virtual column types, holder serde, and safety features. (#3823 ) * Add virtual column types, holder serde, and safety features. Virtual columns: - add long, float, dimension selectors - put cache IDs in VirtualColumnCacheHelper - adjust serde so VirtualColumns can be the holder object for Jackson - add fail-fast validation for cycle detection and duplicates - add expression virtual column in core Storage adapters: - move virtual column hooks before checking base columns, to prevent surprises when a new base column is added that happens to have the same name as a virtual column. * Fix ExtractionDimensionSpecs with virtual dimensions. * Fix unused imports. * CR comments * Merge one more time, with feeling.	2017-01-26 18:15:51 -08:00
Roman Leventov	75d9e5e7a7	DimensionSelector-related bug fixes and optimizations (fixes #3799 , part of #3798 ) (#3858 ) * * Add DimensionSelector.idLookup() and nameLookupPossibleInAdvance() to allow better inspection of features DimensionSelectors supports, and safer code working with DimensionSelectors in BaseTopNAlgorithm, BaseFilteredDimensionSpec, DimensionSelectorUtils; * Add PredicateFilteringDimensionSelector, to make BaseFilteredDimensionSpec to be able to decorate DimensionSelectors with unknown cardinality; * Add DimensionSelector.makeValueMatcher() (two kinds) for DimensionSelector-side specifics-aware optimization of ValueMatchers; * Optimize getRow() in BaseFilteredDimensionSpec's DimensionSelector, StringDimensionIndexer's DimensionSelector and SingleScanTimeDimSelector; * Use two static singletons, TrueValueMatcher and FalseValueMatcher, instead of BooleanValueMatcher; * Add NullStringObjectColumnSelector singleton and use it in MapVirtualColumn * Rename DimensionSelectorUtils.makeNonDictionaryEncodedIndexedIntsBasedValueMatcher to makeNonDictionaryEncodedRowBasedValueMatcher * Make ArrayBasedIndexedInts constructor private, replace it's usages with of() static factory method * Cache baseIdLookup in ForwardingFilteredDimensionSelector * Fix a bug in DimensionSelectorUtils.makeRowBasedValueMatcher(selector, predicate, matchNull) * Employ precomputed BitSet optimization in DimensionSelector.makeValueMatcher(value, matchNull) when lookupId() is not available, but cardinality is known and lookupName() is available * Doc fixes * Addressed comments * Fix * Fix * Adjust javadoc of DimensionSelector.nameLookupPossibleInAdvance() for SingleScanTimeDimSelector * throw UnsupportedOperationException instead of IAE in BaseTopNAlgorithm	2017-01-25 15:28:27 -08:00
Himanshu	efb1b40fe0	build sqlserver-metadata-storage contrib extension (#3871 )	2017-01-20 14:39:15 -08:00
kaijianding	33ae9dd485	streaming version of select query (#3307 ) * streaming version of select query * use columns instead of dimensions and metrics;prepare for valueVector;remove granularity * respect query limit within historical * use constant * fix thread name corrupted bug when using jetty qtp thread rather than processing thread while working with SpecificSegmentQueryRunner * add some test for scan query * add scan query document * fix merge conflicts * add compactedList resultFormat, this format is better for json ser/der * respect query timeout * respect query limit on broker * use static consts and remove unused code	2017-01-19 16:09:53 -06:00
Jihoon Son	d80bec83cc	Enable auto license checking (#3836 ) * Enable license checking * Clean duplicated license headers	2017-01-10 18:13:47 -08:00
Roman Leventov	33800122ad	Don't return leaked Objects back to StupidPool, because this is dangerous. Reuse Cleaners in StupidPool. Make StupidPools named. Add StupidPool.leakedObjectCount(). Minor fixes (#3631 )	2016-12-26 00:35:35 -06:00
Himanshu	4ca3b7f1e4	overlord helpers framework and tasklog auto cleanup (#3677 ) * overlord helpers framework and tasklog auto cleanup * review comment changes * further review comments addressed	2016-12-21 15:18:55 -08:00
Gian Merlino	6440ddcbca	Fix #3795 (Java 7 compatibility). (#3796 ) * Fix #3795 (Java 7 compatibility). Also introduce Animal Sniffer checks during build, which would have caught the original problems. * Add Animal Sniffer on caffeine-cache for JDK8.	2016-12-21 10:19:13 -08:00
Nishant	f576a0ff14	Contrib Extension for Ambari Metrics Emitter (#3767 ) * Contrib Extension for Ambari Metrics Emitter extension to enable druid to send metrics to ambari metrics server (https://cwiki.apache.org/confluence/display/AMBARI/Metrics) review comments switch to public repo * review comments * add docs * fix pom version * Add link for doc page in extensions.md * remove unused imports * review comments review comments remove unused dependency review comment	2016-12-19 11:12:47 -08:00
Slim	7b18fb79e0	as per @itaiy suggested will close then try to connect. (#3669 ) * as per @itaiy suggested will close then try to connect. * use close instead of flush * git fix comments * break the loop in case of interrupted	2016-12-13 23:50:28 +05:30
Ninglin Du	469ab21091	[Feature] Thrift support for realtime and batch ingestion (#3418 ) * Thrift ingestion plugin 1. thrift binary is platform dependent, use scrooge to generate java files to avoid style check failure 2. stream and hadoop ingesion are both supported, input format can be sequence file and lzo thrift block file. 3. base64 and protocol aware change header * fix conlicts in pom	2016-12-13 10:05:15 -08:00
Navis Ryu	c74d267f50	Support virtual column for select query (#2511 ) * Support virtual column for select query * Addressed comments	2016-12-05 15:14:35 -08:00
Roman Leventov	7b56cec3b9	Fix resource leaks (#3702 )	2016-11-18 21:21:36 +05:30
Gian Merlino	7e80d1045a	Exercise v2 engine in the groupBy aggregator and multi-value dimension tests. (#3698 ) This also involved some other test changes: - Added a factory.mergeRunners step to AggregationTestHelper's groupBy chain, since the v2 engine does merging there. - Changed test byteBuffer pools from on-heap to off-heap to work around https://github.com/DataSketches/sketches-core/pull/116 for datasketches tests.	2016-11-16 20:02:25 -08:00
Erik Dubbelboer	7d36f540e8	WIP: Add Google Storage support (#2458 ) Also excludes the correct artifacts from #2741	2016-11-16 14:06:45 +05:30
Keuntae Park	094f5b851b	Support Min/Max for Timestamp (#3299 ) * Min/Max aggregator for Timestamp * remove unused imports and method * rebase and zip the test data * add docs	2016-11-14 23:00:21 -08:00
Akash Dwivedi	3e408497b3	Migrating bytebuffercollections from Metamarkets. (#3647 ) * Migrating bytebuffercollections from Metamarkets. * resolving code conflicts and removing <p> from bytebuffer-collections.	2016-11-11 10:51:07 -08:00
praveev	52a74cf84f	Use timestamp in millis as Map key instead of DateTime object (#3674 ) * Use Long timestamp as key instead of DateTime. DateTime representation is screwed up when you store with an obj and read with a different DateTime obj. For example: The code below fails when you use DateTime as key ``` DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles"))); HashMap<DateTime, String> map = new HashMap<>(); map.put(odt, "abc"); DateTime dt = new DateTime(odt.getMillis()); System.out.println(map.get(dt)); ``` * Respect timezone when creating the file. * Update docs with timezone caveat in granularity spec * Remove unused imports	2016-11-11 10:20:20 -08:00
Himanshu	b76b3f8d85	reset-cluster command to clean up druid state stored on metadata and deep storage (#3670 )	2016-11-09 11:07:01 -06:00
Mark	575aeb843a	Metadata Storage extension for Microsoft SqlServer (sqlserver-metadata-storage) (#3421 )	2016-11-08 14:56:52 -08:00
Gian Merlino	89d9c61894	Deprecate Aggregator.getName and AggregatorFactory.getAggregatorStartValue. (#3572 )	2016-10-31 15:24:30 -07:00
Akash Dwivedi	4b3bd8bd63	Migrating java-util from Metamarkets. (#3585 ) * Migrating java-util from Metamarkets. * checkstyle and updated license on java-util files. * Removed unused imports from whole project. * cherry pick metamx/java-util@826021f. * Copyright changes on java-util pom, address review comments.	2016-10-21 14:57:07 -07:00
Roman Leventov	5dc95389f7	Add Checkstyle framework (#3551 ) * Add Checkstyle framework * Avoid star import * Need braces for control flow statements * Redundant imports * Add NewLineAtEndOfFile check	2016-10-13 13:37:47 -07:00
Akash Dwivedi	078de4fcf9	Use explicit version from HadoopIngestionSpec. (#3554 )	2016-10-07 13:59:14 -07:00
Gian Merlino	40f2fe7893	Bump versions to 0.9.3-SNAPSHOT (#3524 )	2016-09-29 13:53:32 -07:00
Gian Merlino	27bd5cb13a	Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473 ) Fixes #3241.	2016-09-21 13:40:04 -06:00
Slim	6a1cd7fc66	avoid throwing exceptions fix#3389 (#3441 ) * avoid throwing exceptions * log alert * fix comments	2016-09-09 16:19:50 -07:00
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
kaijianding	df89f25b15	fix can't get latest offset in KafkaEightSimpleConsumerFirehoseFactory (#3355 )	2016-08-11 18:00:24 -07:00
kaijianding	b21a98e2f6	fix NPE if queueBufferLength is null in KafkaEightSimpleConsumerFirehoseFactory (#3345 )	2016-08-10 07:59:17 -07:00
Keuntae Park	95a58097e2	Hadoop InputRowParser for Orc file (#3019 ) * InputRowParser to decode OrcStruct from OrcNewInputFormat * add unit test for orc hadoop indexing * update docs and fix test code bug * doc updated * resove maven dependency conflict * remove unused imports * fix returning array type from Object[] to correct primitive array type * fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list * rebase and updated based on comments * updated based on comments * on reflecting review comments * fix bug in typeStringFromParseSpec() and add unit test * add license header	2016-07-26 09:42:56 -07:00
Xavier Léauté	485e381387	remove datasource from hadoop output path (#3196 ) fixes #2083, follow-up to #1702	2016-06-29 08:53:45 -07:00
du00cs	bf53490d70	fix: no split file will throw IndexOutOfBounds Exception (#3179 )	2016-06-26 12:50:18 -07:00
Hyukjin Kwon	45f553fc28	Replace the deprecated usage of NoneShardSpec (#3166 )	2016-06-25 10:27:25 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
du00cs	ebd654228b	fix: avro types exception in sketch (#3167 )	2016-06-22 15:54:20 -05:00
Gian Merlino	ebf890fe79	Update master version to 0.9.2-SNAPSHOT. (#3133 )	2016-06-13 13:10:38 -07:00
Charles Allen	15ccf451f9	Move QueryGranularity static fields to QueryGranularities (#2980 ) * Move QueryGranularity static fields to QueryGranularityUtil * Fixes #2979 * Add test showing #2979 * change name to QueryGranularities	2016-05-17 16:23:48 -07:00
Slim	abf64a13b0	reconnect to the graphite after transient disconnect (#2952 ) * reconnect to the graphite after transient disconnect * catch the socket exception and retry	2016-05-12 11:32:36 -07:00
Slim	035134d070	fix for file not found execption at the graphite extension module (#2917 )	2016-05-04 15:37:10 -07:00
michaelschiff	2203a812bc	statsd-emitter (#2410 )	2016-04-28 18:41:02 -07:00
Slim	58510d826b	fix emit wait time (#2869 )	2016-04-26 17:07:03 -07:00
du00cs	639d0630b8	jackson conflict workaround in hadooop ingestio & parquet extension coordinate update (#2817 )	2016-04-13 14:20:33 -07:00
DuNinglin [杜宁林]	0f67ff7dfb	reoganize code folder according to recent upstream folder changes, seperate it from avro code and take it into extensions-conrib. docs rewite too	2016-03-30 11:21:41 +08:00
Gian Merlino	9877c570d3	RabbitMQ: Swap SIRP to BBIRP so it can use more kinds of parsers.	2016-03-29 08:42:06 -07:00
fjy	c418a55638	cleanup distinct count agg	2016-03-28 17:29:41 -07:00
Fangjin Yang	62c1dc7a09	Merge pull request #2602 from binlijin/distinctcount implement special distinctcount	2016-03-28 17:20:17 -07:00
Gian Merlino	7e7a886f65	Move druid-api into the druid repo. This is from druid-api-0.3.17, as of commit 51884f1d05d5512cacaf62cedfbb28c6ab2535cf in the druid-api repo.	2016-03-24 11:04:34 -07:00
binlijin	2729efca71	implement special distinctcount	2016-03-24 11:11:11 +08:00
fjy	943cbe6e76	refactor extensions into their own docs	2016-03-22 18:54:10 -07:00
Gian Merlino	738dcd8cd9	Update version to 0.9.1-SNAPSHOT. Fixes #2462	2016-03-17 10:34:20 -07:00
Nishant	ba1185963b	Fix a bunch of dependencies * Eliminate exclusion groups from pull-deps * Only consider dependency nodes in pull-deps if they are not in the following scopes * provided * test * system * Fix a bunch of `<scope>provided</scope>` missing tags * Better exclusions for a couple of problematic libs	2016-03-10 10:18:08 -08:00
fjy	e3e932a4d4	refactor extensions into core and contrib	2016-03-08 17:12:09 -08:00

... 4 5 6 7 8

375 Commits