druid

Commit Graph

Author	SHA1	Message	Date
Gian Merlino	e9050c2b4c	TimeFormatExtractionFn: Allow null formats (equivalent to ISO8601) and granular bucketing. (#3411 )	2016-08-31 20:58:53 +05:30
Stéphane Derosiaux	48dce88aab	Add flag binaryAsString for parquet ingestion (#3381 )	2016-08-30 17:30:50 -07:00
Dave Li	c4e8440c22	Adds long compression methods (#3148 ) * add read * update deprecated guava calls * add write and vsizeserde * add benchmark * separate encoding and compression * add header and reformat * update doc * address PR comment * fix buffer order * generate benchmark files * separate encoding strategy and format * fix benchmark * modify supplier write to channel * add float NONE handling * address PR comment * address PR comment 2	2016-08-30 16:17:46 -07:00
Jonathan Wei	4e91330a17	Use DimensionSpec in CardinalityAggregatorFactory (#3406 ) * Use DimensionSpec in CardinalityAggregatorFactory * Address PR comments * Fix requiredFields()	2016-08-30 15:54:02 -07:00
Ashish	6b40bf8b32	doc: added note to README, about necessary hdfs config after insert-segment-to-db (#3402 )	2016-08-28 16:39:33 -07:00
Chanh Le	d624037698	Pull-deps: correct the library directory in the document (#3361 ) * Pull-deps: correct the library directory in the document * Pull-deps: correct the library directory in the document in the last example command	2016-08-16 17:18:15 -07:00
Fangjin Yang	edb0eca3a9	fix docs (#3370 )	2016-08-16 16:25:50 -07:00
Fangjin Yang	6beb8ac342	fix some docs and add new content (#3369 )	2016-08-16 15:00:18 -07:00
rajk-tetration	362b9266f8	Adding filters for TimeBoundary on backend (#3168 ) * Adding filters for TimeBoundary on backend Signed-off-by: Balachandar Kesavan <raj.ksvn@gmail.com> * updating TimeBoundaryQuery constructor in QueryHostFinderTest * add filter helpers * update filterSegments + test * Conditional filterSegment depending on whether a filter exists * Style changes * Trigger rebuild * Adding documentation for timeboundaryquery filtering * added filter serialization to timeboundaryquery cache * code style changes	2016-08-15 10:25:24 -07:00
Himanshu	03cfcf002b	fix the race described in #3174 (#3205 )	2016-08-10 11:29:50 -07:00
Himanshu	46da682231	avro-extensions -- feature to specify avro reader schema inline in the task json for all events (#3249 )	2016-08-10 10:49:26 -07:00
Nishant	8035c73409	Implement EnvironmentVariablePasswordProvider (#3329 ) * Implement EnvironmentVariablePasswordProvider * Review Comment : rename passwordKey to passwordVariable * add docs * improve doc layout * review comment: rename property for variable	2016-08-10 05:33:51 +08:00
Gian Merlino	8899affe48	Introduce standardized "Resource limit exceeded" error. (#3338 ) Fixes #3336.	2016-08-09 10:50:56 -07:00
Gian Merlino	21bce96c4c	More useful query errors. (#3335 ) Follow-up to #1773, which meant to add more useful query errors but did not actually do so. Since that patch, any error other than interrupt/cancel/timeout was reported as `{"error":"Unknown exception"}`. With this patch, the error fields are: - error, one of the specific strings "Query interrupted", "Query timeout", "Query cancelled", or "Unknown exception" (same behavior as before). - errorMessage, the message of the topmost non-QueryInterruptedException in the causality chain. - errorClass, the class of the topmost non-QueryInterruptedException in the causality chain. - host, the host that failed the query.	2016-08-09 16:14:52 +08:00
Navis Ryu	39351fb8d2	Mask properties from logging (#3332 ) * Mask properties from logging * mask "password" by default	2016-08-08 21:36:10 +05:30
Himanshu	ed5b92d612	document how to check MM enabled/disabled (#3331 )	2016-08-06 05:56:51 +08:00
Jonathan Wei	decefb7477	Add time interval dim filter and retention analysis example (#3315 ) * Add time interval dim filter and retention analysis example * Use closed-open matching for intervals, update cache key generation * Fix time filtering tests for interval boundary change	2016-08-05 07:25:04 -07:00
Navis Ryu	5b3f0ccb1f	Support variance and standard deviation (#2525 ) * Support variance and standard deviation * addressed comments	2016-08-04 17:32:58 -07:00
kaijianding	50d52a24fc	ability to not rollup at index time, make pre aggregation an option (#3020 ) * ability to not rollup at index time, make pre aggregation an option * rename getRowIndexForRollup to getPriorIndex * fix doc misspelling * test query using no-rollup indexes * fix benchmark fail due to jmh bug	2016-08-02 11:13:05 -07:00
Dave Li	bc20658239	groupBy nested query using v2 strategy (#3269 ) * changed v2 nested query strategy * add test for #3239 * update for new ValueMatcher interface and add benchmarks * enable time filtering * address PR comments * add failing test for outer filter aggregator * add helper class for sharing code * update nested groupby doc * move temporary storage instantiation * address PR comment * address PR comment 2	2016-08-01 18:30:39 -07:00
Fangjin Yang	d51ec398d4	fix parquet docs (#3304 )	2016-08-01 07:54:48 -07:00
Jonathan Wei	a6105cbb86	Add numeric StringComparator (#3270 ) * Add numeric StringComparator * Only use direct long comparison for numeric ordering in BoundFilter, add time filtering benchmark query * Address PR comments, add multithreaded BoundDimFilter test * Add comment on strlen tie handling * Add timeseries interval filter benchmark * Adjust docs * Use jackson for StringComparator, address PR comments * Add new TopNMetricSpec and SearchSortSpec with tests (WIP) * More TopNMetricSpec and SearchSortSpec tests * Fix NewSearchSortSpec serde * Update docs for new DimensionTopNMetricSpec * Delete NumericDimensionTopNMetricSpec * Delete old SearchSortSpec * Rename NewSearchSortSpec to SearchSortSpec * Add TopN numeric comparator benchmark, address PR comments * Refactor OrderByColumnSpec * Add null checks to NumericComparator and String->BigDecimal conversion function * Add more OrderByColumnSpec serde tests	2016-07-29 15:44:16 -07:00
Charles Allen	d04af6aee4	Add `slf4j` requst logger (#3146 ) * Add `slf4j` requst logger * Address comments * Fix conflicts with master * Fix removed map value	2016-07-29 15:15:41 -07:00
Gian Merlino	e5397ed316	Link up Hadoop class loading docs better. (#3302 )	2016-07-29 10:19:54 -07:00
Gian Merlino	2553997200	Associate groupBy v2 resources with the Sequence lifecycle. (#3296 ) This fixes a potential issue where groupBy resources could be allocated to create a Sequence, but then the Sequence is never used, and thus the resources are never freed. Also simplifies how groupBy handles config overrides (this made the new unit test easier to write).	2016-07-27 18:44:19 -07:00
Charles Allen	546e4f79b0	Add size of pending deletes to historical metrics (#3295 ) * Add size of pending deletes to historical metrics	2016-07-27 11:30:47 -07:00
Charles Allen	b1e3fe77f5	More logging around how the coordinator balancer is happening (#3279 ) * More logging around how the coordinator balancer is happening * Address comments * Address code review comments and add actual logging	2016-07-27 13:24:32 +05:30
David Lim	9a068e1ba6	fix broken link and use of pipes in table (#3290 )	2016-07-26 15:46:51 -07:00
Keuntae Park	95a58097e2	Hadoop InputRowParser for Orc file (#3019 ) * InputRowParser to decode OrcStruct from OrcNewInputFormat * add unit test for orc hadoop indexing * update docs and fix test code bug * doc updated * resove maven dependency conflict * remove unused imports * fix returning array type from Object[] to correct primitive array type * fix to support getDimension() of MapBasedRow : changing return type of orc list from array to list * rebase and updated based on comments * updated based on comments * on reflecting review comments * fix bug in typeStringFromParseSpec() and add unit test * add license header	2016-07-26 09:42:56 -07:00
kaijianding	3dc2974894	Add timestampSpec to metadata.drd and SegmentMetadataQuery (#3227 ) * save TimestampSpec in metadata.drd * add timestampSpec info in SegmentMetadataQuery	2016-07-25 15:45:30 -07:00
Jonathan Wei	a42ccb6d19	Support filtering on long columns (including __time) (#3180 ) * Support filtering on __time column * Rename DruidPredicate * Add docs for ValueMatcherFactory, add comment on getColumnCapabilities * Combine ValueMatcherFactory predicate methods to accept DruidCompositePredicate * Address PR comments (support filter on all long columns) * Use predicate factory instead of composite predicate * Address PR comments * Lazily initialize long handling in selector/in filter * Move long value parsing from InFilter to InDimFilter, make long value parsing thread-safe * Add multithreaded selector/in filter test * Fix non-final lock object in SelectorDimFilter	2016-07-20 17:08:49 -07:00
Navis Ryu	cd7337fc8a	Calculate max split size based on numMapTask in DatasourceInputFormat (#2882 ) * Calculate max split size based on numMapTask * updated docs & fixed possible ArithmeticException	2016-07-20 16:53:51 -07:00
Gian Merlino	dd4ec751d0	Update docs for working with Hadoop dependencies. (#3252 ) - Attempt to make things clearer in general - Point out that HDFS deep storage and MR jobs don't use the same loading mechanism - Recommend using mapreduce.job.classloader = true when possible	2016-07-18 07:47:58 -05:00
Himanshu	3f82108d15	optionally enable coordinator auto kill tasks on all dataSources via dynamic config (#3250 )	2016-07-17 18:47:52 -07:00
Gian Merlino	90f5d8cd17	Fix path in cluster.md. (#3253 )	2016-07-17 08:30:20 -07:00
Gian Merlino	6a03a0cfec	Fix ingest/persist/backPressure docs. (#3243 )	2016-07-13 21:56:28 -07:00
Gian Merlino	3ab4a4efbc	Fix formatting in granularities doc. (#3229 )	2016-07-08 09:29:58 -07:00
Gian Merlino	ea03906fcf	Configurable compressRunOnSerialization for Roaring bitmaps. (#3228 ) Defaults to true, which is a change in behavior (this used to be false and unconfigurable).	2016-07-08 10:24:19 +05:30
Charles Allen	3f1681c16c	Caffeine cache extension (#3028 ) * Initial commit of caffeine cache * Address code comments * Move and fixup README.md a bit * Improve caffeine readme information * Cleanup caffeine pom * Address review comments * Bump caffeine to 2.3.1 * Bump druid version to 0.9.2-SNAPSHOT * Make test not fail randomly. See https://github.com/ben-manes/caffeine/pull/93#issuecomment-227617998 for an explanation * Fix distribution and documentation * Add caffeine to extensions.md * Fix links in extensions.md * Lexicographic	2016-07-06 15:42:54 -07:00
Gian Merlino	b8a4f4ea7b	DumpSegment: Add --dump bitmaps option. (#3221 ) Also make --dump metadata respect --column.	2016-07-06 12:42:50 -07:00
Gian Merlino	fdc7e88a7d	Allow queries with no aggregators. (#3216 ) This is actually reasonable for a groupBy or lexicographic topNs that is being used to do a "COUNT DISTINCT" kind of query. No aggregators are needed for that query, and including a dummy aggregator wastes 8 bytes per row. It's kind of silly for timeseries, but why not.	2016-07-06 20:38:54 +05:30
Fangjin Yang	8eeae2e844	remove bad docs on setting up clusters (#3188 )	2016-07-01 15:41:40 -05:00
Parag Jain	99844dfeb5	remove need for tmp extensions dir (#3211 ) correct lib path relative to base distribution dir	2016-07-01 12:55:57 -07:00
Charles Allen	8b7d9750ee	Update extension docs for global lookup module (#3206 )	2016-06-29 12:51:52 -07:00
David Lim	b24425a280	update docs with new behavior (#3200 )	2016-06-28 16:17:04 -07:00
jaehong choi	efbcbf5315	Support alphanumeric sort in search query (#2593 ) * support alphanumeric sort in search query * address a comment about handling equals() and hashCode() * address comments * add Ut for string comparators * address a comment about space indentations.	2016-06-28 15:06:18 -07:00
Gian Merlino	4cc39b2ee7	Alternative groupBy strategy. (#2998 ) This patch introduces a GroupByStrategy concept and two strategies: "v1" is the current groupBy strategy and "v2" is a new one. It also introduces a merge buffers concept in DruidProcessingModule, to try to better manage memory used for merging. Both of these are described in more detail in #2987. There are two goals of this patch: 1. Make it possible for historical/realtime nodes to return larger groupBy result sets, faster, with better memory management. 2. Make it possible for brokers to merge streams when there are no order-by columns, avoiding materialization. This patch does not do anything to help with memory management on the broker when there are order-by columns or when there are nested queries. That could potentially be done in a future patch.	2016-06-24 18:06:09 -07:00
michaelschiff	66d8ad36d7	adds new coordinator metrics 'segment/unavailable/count' and (#3176 ) 'segment/underReplicated/count' (#3173)	2016-06-23 14:53:15 -07:00
Gian Merlino	da660bb592	DumpSegment tool. (#3182 ) Fixes #2723.	2016-06-23 14:37:50 -07:00
Dave Li	12be1c0a4b	Add bucket extraction function (#3033 ) * add bucket extraction function * add doc and header * updated doc and test	2016-06-17 09:24:27 -07:00

1 2 3 4 5 ...

1247 Commits