druid

Commit Graph

Author	SHA1	Message	Date
Jonathan Wei	1e3979a5e8	Add variance aggregator from hive to NOTICE (#3327 )	2016-08-04 17:43:55 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
Gleb Smirnov	33dbe0800c	Makes kafka lookup extraction factory's replace() behavior consistent with other lookup extraction factories (#3326 )	2016-08-04 10:24:19 -07:00
Gian Merlino	9437a7a313	HLL: Avoid some allocations when possible. (#3314 ) - HLLC.fold avoids duplicating the other buffer by saving and restoring its position. - HLLC.makeCollector(buffer) no longer duplicates incoming BBs. - Updated call sites where appropriate to duplicate BBs passed to HLLC.	2016-08-03 18:08:52 -07:00
Himanshu	be79b095ba	fixing expected result for segmentMetadata query in integration tests (#3318 )	2016-08-03 12:13:27 -07:00
Gian Merlino	a4b95af839	Fix grouper closing in GroupByMergingQueryRunnerV2. (#3316 ) The grouperHolder should be closed on failure, not the grouper.	2016-08-02 21:02:30 -07:00
Gian Merlino	0299ac73b8	Fix FilteredAggregators at ingestion time and in groupBy v2 nested queries. (#3312 ) The common theme between the two is they both create "fake" DimensionSelectors that work on top of Rows. They both do it because there isn't really any dictionary for the underlying Rows, they're just a stream of data. The fix for both is to allow a DimensionSelector to tell callers that it has no dictionary by returning CARDINALITY_UNKNOWN from getValueCardinality. The callers, in turn, can avoid using it in ways that assume it has a dictionary. Fixes #3311.	2016-08-02 17:39:40 -07:00
Gian Merlino	ae3e0015b6	Fix ClassCastException in nested v2 groupBys with timeouts. (#3310 ) Add tests for the CCE and for a bunch of other groupBy stuff. Also avoids setting the interrupted flag when InterruptedExceptions happen, since this might interfere with resource closing, no other query does it, and is probably pointless anyway since the thread is likely to be a jetty thread that we don't actually want to set an interrupt flag on. Also fixes toString on OrderByColumnSpec.	2016-08-02 16:02:44 -06:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
Jonathan Wei	0bdaaa224b	Use Long.compare for NumericComparator when possible (#3309 )	2016-08-01 20:36:56 -07:00
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Fangjin Yang	d51ec398d4	fix parquet docs (#3304 )	2016-08-01 07:54:48 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Charles Allen	d04af6aee4	Add `slf4j` request logger (#3146 ) * Add `slf4j` request logger * Address comments * Fix conflicts with master * Fix removed map value	2016-07-29 15:15:41 -07:00
Gian Merlino	e5397ed316	Link up Hadoop class loading docs better. (#3302 )	2016-07-29 10:19:54 -07:00
kaijianding	1fa681934c	fix ConcurrentModificationException in CachingClusteredClient.run() (#3278 ) * fix ConcurrentModificationException in CachingClusteredClient.run() * obtain new copy of PartitionHolder to avoid potential multi-threads read/write issue	2016-07-28 19:52:50 -07:00
Navis Ryu	884017d981	"all" type search query spec (#3300 ) * "all" type search query spec * addressed comments * added unit test	2016-07-28 18:16:15 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Charles Allen	546e4f79b0	Add size of pending deletes to historical metrics (#3295 ) * Add size of pending deletes to historical metrics	2016-07-27 11:30:47 -07:00
Charles Allen	b1e3fe77f5	More logging around how the coordinator balancer is happening (#3279 ) * More logging around how the coordinator balancer is happening * Address comments * Address code review comments and add actual logging	2016-07-27 13:24:32 +05:30
David Lim	9a068e1ba6	fix broken link and use of pipes in table (#3290 )	2016-07-26 15:46:51 -07:00
Gian Merlino	2f275497b6	Fix caching of extension classloaders. (#3289 )	2016-07-26 15:19:15 -07:00
Himanshu	b0fa274481	fix segmentMetadata query results in integration tests (#3288 )	2016-07-26 14:05:14 -07:00
Gian Merlino	8030f1cb67	Be more respectful of maxRowsInMemory. (#3284 ) - Appenderator: Respect maxRowsInMemory across all sinks. - KafkaIndexTask: Respect maxRowsInMemory across all partitions.	2016-07-26 15:02:35 -06:00
Gian Merlino	9b5523add3	Reference counting, better error handling for resources in groupBy v2. (#3268 ) Refcounting prevents releasing the merge buffer, or closing the concurrent grouper, before the processing threads have all finished. The better error handling prevents an avalanche of per-runner exceptions when grouping resources are exhausted, by grouping those all up into a single merged exception.	2016-07-27 01:59:02 +05:30
Charles Allen	188a4bc89a	Revert "Optionally intern ServerInventoryView inventory objects. (#3238 )" (#3286 ) This reverts commit `a931debf79`. Fixes #3283 The core issue here is that realtime nodes announce their size as 0, so a coordinator which interns the realtime version of the data segment will not be able to see the new sized announcement when handoff occurs. This is caused by the `equals` method on a `DataSegment` only evaluating the identifier. The `equals` method should be correct for object equivalence, and things which need to check equivalence of some sub-portion of the object should do so explicitly.	2016-07-26 11:47:34 -07:00
Keuntae Park	95a58097e2	Hadoop InputRowParser for Orc file (#3019 ) * InputRowParser to decode OrcStruct from OrcNewInputFormat * add unit test for orc hadoop indexing * update docs and fix test code bug * doc updated * resolve maven dependency conflict * remove unused imports * fix returning array type from Object[] to correct primitive array type * fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list * rebase and updated based on comments * updated based on comments * on reflecting review comments * fix bug in typeStringFromParseSpec() and add unit test * add license header	2016-07-26 09:42:56 -07:00
Erik Dubbelboer	76fabcfdb2	Fix #2782 , Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231 ) On systems with only one processor this test fails.	2016-07-25 15:51:09 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
David Lim	d5ed3f1347	change expected response from ACCEPTED to OK (#3280 )	2016-07-23 19:48:30 -07:00
Gian Merlino	b316cde554	Appenderator tests for disjoint query intervals. (#3281 )	2016-07-23 19:48:15 -07:00
Charles Allen	c58bbfa0c6	Intern DataSegments in SQLMetadataSegmentManager (#3267 ) * Helps with heap pressure on coordinator	2016-07-21 16:46:08 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Parag Jain	fd798d32bc	fix testSecuredGetServer ut (#3262 )	2016-07-20 10:20:13 -07:00
Gian Merlino	06624c40c0	Share query handling between Appenderator and RealtimePlumber. (#3248 ) Fixes inconsistent metric handling between the two implementations. Formerly, RealtimePlumber only emitted query/segmentAndCache/time and query/wait and Appenderator only emitted query/partial/time and query/wait (all per sink). Now they both do the same thing: - query/segmentAndCache/time, query/segment/time are the time spent per sink. - query/cpu/time is the CPU time spent per query. - query/wait/time is the executor waiting time per sink. These generally match historical metrics, except segmentAndCache & segment mean the same thing here, because one Sink may be partially cached and partially uncached and we aren't splitting that out.	2016-07-19 22:15:13 -05:00
Gian Merlino	50db86cb17	Quickstart: Use hadoopyString for batch indexing instead of string. (#3263 )	2016-07-19 10:18:10 -07:00
Nishant	47894c4eff	add comment for default hadoop coordinates (#3257 ) 1) Modify CliHadoopIndexer to share constant from `TaskConfig.DEFAULT_DEFAULT_HADOOP_COORDINATES` 2) add comment to pom.xml as discussed in https://github.com/druid-io/druid/pull/3044 fix name	2016-07-18 15:23:11 -07:00
Emanuele Cesena	a9a73c5f71	Distribution: pull-deps compiled hadoop version (#3044 )	2016-07-18 09:39:15 -07:00
Gian Merlino	13d8d96bc6	Update to guice-4.1.0. (#3222 )	2016-07-18 08:08:43 -07:00
Gian Merlino	dd4ec751d0	Update docs for working with Hadoop dependencies. (#3252 ) - Attempt to make things clearer in general - Point out that HDFS deep storage and MR jobs don't use the same loading mechanism - Recommend using mapreduce.job.classloader = true when possible	2016-07-18 07:47:58 -05:00
Himanshu	3f82108d15	optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250 )	2016-07-17 18:47:52 -07:00
Nishant	7995818220	Increase test timeout to prevent failing on slow machines (#3224 ) constantly timing out on one of slow build machines, increasing the timeout fixed it. Running io.druid.granularity.QueryGranularityTest Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776 sec - in io.druid.granularity.QueryGranularityTest	2016-07-17 18:44:48 -07:00
Gian Merlino	90f5d8cd17	Fix path in cluster.md. (#3253 )	2016-07-17 08:30:20 -07:00
Gian Merlino	6cd1f5375b	Better harmonized dimensions for query metrics. (#3245 ) All query metrics now start with toolChest.makeMetricBuilder, and all of those now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id" moved to common code, since all query metrics added it anyway. In particular this will add query-type specific dimensions like "threshold" and "numDimensions" to servlet-originated metrics like query/time.	2016-07-14 11:55:51 -07:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215 )	2016-07-14 09:19:17 -07:00
Nishant	a1715c8cda	fix-3237 (#3244 ) DruidBroker use FilteredServerInventoryView instead of ServerInventoryView	2016-07-13 22:30:35 -07:00
Gian Merlino	6a03a0cfec	Fix ingest/persist/backPressure docs. (#3243 )	2016-07-13 21:56:28 -07:00
Gian Merlino	c622a25236	BenchmarkDataGenerator: Don't generate timestamps at the end instant of the interval. (#3242 ) Because timestamps at the end instant are not actually part of the interval. This affected benchmark numbers, since it meant some data points would not be queried (the interval for the query was based on getDataInterval) and also the TimestampCheckingOffsets could not use the allWithinThreshold optimization.	2016-07-14 10:20:10 +05:30
Charles Allen	a931debf79	Optionally intern ServerInventoryView inventory objects. (#3238 )	2016-07-14 08:49:26 +05:30

7712 Commits