druid

mirror of https://github.com/apache/druid.git synced 2025-02-06 10:08:26 +00:00

Author	SHA1	Message	Date
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Fangjin Yang	d51ec398d4	fix parquet docs (#3304 )	2016-08-01 07:54:48 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Charles Allen	d04af6aee4	Add `slf4j` request logger (#3146 ) * Add `slf4j` request logger * Address comments * Fix conflicts with master * Fix removed map value	2016-07-29 15:15:41 -07:00
Gian Merlino	e5397ed316	Link up Hadoop class loading docs better. (#3302 )	2016-07-29 10:19:54 -07:00
kaijianding	1fa681934c	fix ConcurrentModificationException in CachingClusteredClient.run() (#3278 ) * fix ConcurrentModificationException in CachingClusteredClient.run() * obtain new copy of PartitionHolder to avoid potential multi-threads read/write issue	2016-07-28 19:52:50 -07:00
Navis Ryu	884017d981	"all" type search query spec (#3300 ) * "all" type search query spec * addressed comments * added unit test	2016-07-28 18:16:15 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Charles Allen	546e4f79b0	Add size of pending deletes to historical metrics (#3295 ) * Add size of pending deletes to historical metrics	2016-07-27 11:30:47 -07:00
Charles Allen	b1e3fe77f5	More logging around how the coordinator balancer is happening (#3279 ) * More logging around how the coordinator balancer is happening * Address comments * Address code review comments and add actual logging	2016-07-27 13:24:32 +05:30
David Lim	9a068e1ba6	fix broken link and use of pipes in table (#3290 )	2016-07-26 15:46:51 -07:00
Gian Merlino	2f275497b6	Fix caching of extension classloaders. (#3289 )	2016-07-26 15:19:15 -07:00
Himanshu	b0fa274481	fix segmentMetadata query results in integration tests (#3288 )	2016-07-26 14:05:14 -07:00
Gian Merlino	8030f1cb67	Be more respectful of maxRowsInMemory. (#3284 ) - Appenderator: Respect maxRowsInMemory across all sinks. - KafkaIndexTask: Respect maxRowsInMemory across all partitions.	2016-07-26 15:02:35 -06:00
Gian Merlino	9b5523add3	Reference counting, better error handling for resources in groupBy v2. (#3268 ) Refcounting prevents releasing the merge buffer, or closing the concurrent grouper, before the processing threads have all finished. The better error handling prevents an avalanche of per-runner exceptions when grouping resources are exhausted, by grouping those all up into a single merged exception.	2016-07-27 01:59:02 +05:30
Charles Allen	188a4bc89a	Revert "Optionally intern ServerInventoryView inventory objects. (#3238 )" (#3286 ) This reverts commit a931debf790eaf4454ae13f35e11ba9d39765645. Fixes #3283. The core issue here is that realtime nodes announce their size as 0, so a coordinator which interns the realtime version of the data segment will not be able to see the new sized announcement when handoff occurs. This is caused by the `equals` method on a `DataSegment` only evaluating the identifier. The `equals` method should be correct for object equivalence, and things which need to check equivalence of some sub-portion of the object should do so explicitly.	2016-07-26 11:47:34 -07:00
Keuntae Park	95a58097e2	Hadoop InputRowParser for Orc file (#3019 ) * InputRowParser to decode OrcStruct from OrcNewInputFormat * add unit test for orc hadoop indexing * update docs and fix test code bug * doc updated * resolve maven dependency conflict * remove unused imports * fix returning array type from Object[] to correct primitive array type * fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list * rebase and updated based on comments * updated based on comments * on reflecting review comments * fix bug in typeStringFromParseSpec() and add unit test * add license header	2016-07-26 09:42:56 -07:00
Erik Dubbelboer	76fabcfdb2	Fix #2782 , Unit test failed for DruidProcessingConfigTest.testDeserialization (#3231 ) On systems with only one processor this test fails.	2016-07-25 15:51:09 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
David Lim	d5ed3f1347	change expected response from ACCEPTED to OK (#3280 )	2016-07-23 19:48:30 -07:00
Gian Merlino	b316cde554	Appenderator tests for disjoint query intervals. (#3281 )	2016-07-23 19:48:15 -07:00
Charles Allen	c58bbfa0c6	Intern DataSegments in SQLMetadataSegmentManager (#3267 ) * Helps with heap pressure on coordinator	2016-07-21 16:46:08 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Parag Jain	fd798d32bc	fix testSecuredGetServer ut (#3262 )	2016-07-20 10:20:13 -07:00
Gian Merlino	06624c40c0	Share query handling between Appenderator and RealtimePlumber. (#3248 ) Fixes inconsistent metric handling between the two implementations. Formerly, RealtimePlumber only emitted query/segmentAndCache/time and query/wait and Appenderator only emitted query/partial/time and query/wait (all per sink). Now they both do the same thing: - query/segmentAndCache/time, query/segment/time are the time spent per sink. - query/cpu/time is the CPU time spent per query. - query/wait/time is the executor waiting time per sink. These generally match historical metrics, except segmentAndCache & segment mean the same thing here, because one Sink may be partially cached and partially uncached and we aren't splitting that out.	2016-07-19 22:15:13 -05:00
Gian Merlino	50db86cb17	Quickstart: Use hadoopyString for batch indexing instead of string. (#3263 )	2016-07-19 10:18:10 -07:00
Nishant	47894c4eff	add comment for default hadoop coordinates (#3257 ) 1) Modify CliHadoopIndexer to share constant from `TaskConfig.DEFAULT_DEFAULT_HADOOP_COORDINATES` 2) add comment to pom.xml as discussed in https://github.com/druid-io/druid/pull/3044 fix name	2016-07-18 15:23:11 -07:00
Emanuele Cesena	a9a73c5f71	Distribution: pull-deps compiled hadoop version (#3044 )	2016-07-18 09:39:15 -07:00
Gian Merlino	13d8d96bc6	Update to guice-4.1.0. (#3222 )	2016-07-18 08:08:43 -07:00
Gian Merlino	dd4ec751d0	Update docs for working with Hadoop dependencies. (#3252 ) - Attempt to make things clearer in general - Point out that HDFS deep storage and MR jobs don't use the same loading mechanism - Recommend using mapreduce.job.classloader = true when possible	2016-07-18 07:47:58 -05:00
Himanshu	3f82108d15	optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250 )	2016-07-17 18:47:52 -07:00
Nishant	7995818220	Increase test timeout to prevent failing on slow machines (#3224 ) constantly timing out on one of slow build machines, increasing the timeout fixed it. Running io.druid.granularity.QueryGranularityTest Tests run: 33, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 11.776 sec - in io.druid.granularity.QueryGranularityTest	2016-07-17 18:44:48 -07:00
Gian Merlino	90f5d8cd17	Fix path in cluster.md. (#3253 )	2016-07-17 08:30:20 -07:00
Gian Merlino	6cd1f5375b	Better harmonized dimensions for query metrics. (#3245 ) All query metrics now start with toolChest.makeMetricBuilder, and all of those now start with DruidMetrics.makePartialQueryTimeMetric. Also, "id" moved to common code, since all query metrics added it anyway. In particular this will add query-type specific dimensions like "threshold" and "numDimensions" to servlet-originated metrics like query/time.	2016-07-14 11:55:51 -07:00
Hyukjin Kwon	55e7a52475	Replace deprecated usage for StringInputRowParser and JSONParseSpec (#3215 )	2016-07-14 09:19:17 -07:00
Nishant	a1715c8cda	fix-3237 (#3244 ) DruidBroker use FilteredServerInventoryView instead of ServerInventoryView	2016-07-13 22:30:35 -07:00
Gian Merlino	6a03a0cfec	Fix ingest/persist/backPressure docs. (#3243 )	2016-07-13 21:56:28 -07:00
Gian Merlino	c622a25236	BenchmarkDataGenerator: Don't generate timestamps at the end instant of the interval. (#3242 ) Because timestamps at the end instant are not actually part of the interval. This affected benchmark numbers, since it meant some data points would not be queried (the interval for the query was based on getDataInterval) and also the TimestampCheckingOffsets could not use the allWithinThreshold optimization.	2016-07-14 10:20:10 +05:30
Charles Allen	a931debf79	Optionally intern ServerInventoryView inventory objects. (#3238 )	2016-07-14 08:49:26 +05:30
Gian Merlino	3ab4a4efbc	Fix formatting in granularities doc. (#3229 )	2016-07-08 09:29:58 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Charles Allen	5d9fd0a713	Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming (#3043 ) * Migrate IndexerSQLMetadataStorageCoordinator.getUnusedSegmentsForInterval to streaming * Missed query from #2859 * Make inReadOnlyTransaction part of SQLMetadataConnector	2016-07-06 16:55:27 -07:00
Charles Allen	3f1681c16c	Caffeine cache extension (#3028 ) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Gian Merlino	b8a4f4ea7b	DumpSegment: Add --dump bitmaps option. (#3221 ) Also make --dump metadata respect --column.	2016-07-06 12:42:50 -07:00
Gian Merlino	fdc7e88a7d	Allow queries with no aggregators. (#3216 ) This is actually reasonable for a groupBy or lexicographic topNs that is being used to do a "COUNT DISTINCT" kind of query. No aggregators are needed for that query, and including a dummy aggregator wastes 8 bytes per row. It's kind of silly for timeseries, but why not.	2016-07-06 20:38:54 +05:30
Charles Allen	bfa5c05aaa	Make global lookup cache introspector class public (#3199 ) * Make global lookup cache introspector class public * Fixes #3187 * Make KafkaLookupExtractorIntrospectionHandler a public static class	2016-07-01 15:50:57 -07:00
Himanshu	e1313e4b90	add log msg when event recvr firehose buffer is full (#3209 )	2016-07-01 17:35:30 -05:00
Fangjin Yang	8eeae2e844	remove bad docs on setting up clusters (#3188 )	2016-07-01 15:41:40 -05:00
Parag Jain	99844dfeb5	remove need for tmp extensions dir (#3211 ) correct lib path relative to base distribution dir	2016-07-01 12:55:57 -07:00

7352 Commits