Commit Graph

167 Commits

Slim 71e7a4c054 Adding double columns support (#4491)
* add double columns support

* Fix numbers and expected results in UTs

* adding float aggregators

* fix IT expected test results

* fix comments

* more fixes

* fix comp

* fix test

* refactor double and float aggregator factories

* fix

* fix UTs

* fix comments

* clean unused code

* fix more comments

* undo unnecessary changes

* fix null issue

* refactor TopNColumnSelectorStrategyFactory

* fix docs

* refactor NumericTopNColumnSelectorStrategy

* fix return

* fix comments

* handle the null case in DimensionIndexer

* more null fixing

* cosmetic changes
2017-07-20 10:14:14 +03:00
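
A minimal illustration (not taken from the PR) of why dedicated double aggregators matter: accumulating many values in a float drifts noticeably, while a double accumulator stays close to the exact sum.

```
public class FloatVsDoubleSum
{
  public static void main(String[] args)
  {
    float floatSum = 0f;
    double doubleSum = 0d;
    for (int i = 0; i < 10_000_000; i++) {
      floatSum += 0.1f;
      doubleSum += 0.1f; // same inputs, wider accumulator
    }
    System.out.println("float  sum: " + floatSum);  // noticeably far from 1,000,000
    System.out.println("double sum: " + doubleSum); // ~1,000,000 (up to float-input rounding)
  }
}
```
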
Jihoon Son cc20260078 Early publishing segments in the middle of data ingestion (#4238)
* Early publishing segments in the middle of data ingestion

* Remove unnecessary logs

* Address comments

* Refactoring the patch according to #4292 and addressing comments

* Set the total shard number of NumberedShardSpec to 0

* refactoring

* Address comments

* Fix tests

* Address comments

* Fix sync problem of committer and retry push only

* Fix doc

* Fix build failure

* Address comments

* Fix compilation failure

* Fix transient test failure
2017-07-10 22:35:36 -07:00
Roman Leventov 05d58689ad Remove the ability to create segments in v8 format (#4420)
* Remove ability to create segments in v8 format

* Fix IndexGeneratorJobTest

* Fix parameterized test name in IndexMergerTest

* Remove extra legacy merging stuff

* Remove legacy serializer builders

* Remove ConciseBitmapIndexMergerTest and RoaringBitmapIndexMergerTest
2017-06-26 13:21:39 -07:00
Jihoon Son 1150bf7a2c Refactoring Appenderator Driver (#4292)
* Refactoring Appenderator

1) Added publishExecutor and handoffExecutor for background publishing and handing segments off
2) Change add() so that it no longer moves segments out

* Address comments

1) Remove publishTimeout for KafkaIndexTask
2) Simplifying registerHandoff()
3) Add incremental handoff test

* Remove unused variable

* Add persist() to Appenderator and more tests for AppenderatorDriver

* Remove unused imports

* Fix strict build

* Address comments
2017-06-02 07:09:11 +09:00
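
A rough sketch of the executor split described in the entry above, assuming only what the commit message states (a publishExecutor and a handoffExecutor doing the work off the ingestion thread); the class and method names here are illustrative, not Druid's actual AppenderatorDriver API.

```
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class SegmentPublishSketch
{
  private final ExecutorService publishExecutor = Executors.newSingleThreadExecutor();
  private final ExecutorService handoffExecutor = Executors.newSingleThreadExecutor();

  CompletableFuture<Void> publishInBackground(Runnable publishSegments, Runnable registerHandoff)
  {
    // Publishing runs off the ingestion thread, and handoff registration is chained after it.
    return CompletableFuture.runAsync(publishSegments, publishExecutor)
        .thenRunAsync(registerHandoff, handoffExecutor);
  }

  void shutdown()
  {
    publishExecutor.shutdown();
    handoffExecutor.shutdown();
  }
}
```
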
Kenji Noguchi 3400f601db Protobuf extension (#4039)
* move ProtoBufInputRowParser from processing module to protobuf extensions

* Ported PR #3509

* add DynamicMessage

* fix local test stuff that slipped in

* add license header

* removed redundant type name

* removed commented code

* fix code style

* rename ProtoBuf -> Protobuf

* pom.xml: shade protobuf classes, handle .desc resource file as binary file

* clean up error messages

* pick first message type from descriptor if not specified

* fix protoMessageType null check. add test case

* move protobuf-extension from contrib to core

* document: add new configuration keys, and descriptions

* update document. add examples

* move protobuf-extension from contrib to core (2nd try)

* touch

* include protobuf extensions in the distribution

* fix whitespace

* include protobuf example in the distribution

* example: create a new pb obj every time

* document: use properly quoted json

* fix whitespace

* bump parent version to 0.10.1-SNAPSHOT

* ignore Override check

* touch
2017-05-30 13:11:58 -07:00
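
A standalone sketch of the DynamicMessage approach the extension entry above refers to, using only the stock protobuf-java API: load a compiled `.desc` descriptor set, fall back to the first message type when none is configured, and parse records dynamically. The file name is hypothetical, and the single-file `buildFrom` call assumes the `.proto` has no imports; this is not the extension's parser code.

```
import com.google.protobuf.DescriptorProtos.FileDescriptorSet;
import com.google.protobuf.Descriptors.Descriptor;
import com.google.protobuf.Descriptors.FileDescriptor;
import com.google.protobuf.DynamicMessage;
import java.io.FileInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FirstMessageTypeExample
{
  public static void main(String[] args) throws Exception
  {
    // Load a compiled descriptor set produced by `protoc --descriptor_set_out=metrics.desc ...`
    // (the file name is hypothetical).
    FileDescriptorSet descriptorSet;
    try (InputStream in = new FileInputStream("metrics.desc")) {
      descriptorSet = FileDescriptorSet.parseFrom(in);
    }
    // buildFrom with no dependencies assumes the .proto file has no imports.
    FileDescriptor file = FileDescriptor.buildFrom(descriptorSet.getFile(0), new FileDescriptor[0]);
    // "pick first message type from descriptor if not specified"
    Descriptor messageType = file.getMessageTypes().get(0);

    byte[] payload = Files.readAllBytes(Paths.get(args[0])); // one serialized record
    DynamicMessage message = DynamicMessage.parseFrom(messageType, payload);
    message.getAllFields().forEach((field, value) -> System.out.println(field.getName() + " = " + value));
  }
}
```
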
Jihoon Son 11b7b1bea6 Add support for HttpFirehose (#4297)
* Add support for HttpFirehose

* Fix document

* Add documents
2017-05-25 16:13:04 -05:00
Jihoon Son 733dfc9b30 Add PrefetchableTextFilesFirehoseFactory for cloud storage types (#4193)
* Add PrefetchableTextFilesFirehoseFactory

* fix comment

* exception handling

* Fix wrong json property

* Remove ReplayableFirehoseFactory and fix misspelling

* Defer object initialization

* Add a temporaryDirectory parameter to FirehoseFactory.connect()

* fix when cache and fetch are disabled

* Address comments

* Add more test

* Increase timeout for test

* Add wrapObjectStream

* Move methods to Firehose from PrefetchableFirehoseFactory

* Cleanup comment

* add directory listing to s3 firehose

* Rename a variable

* Addressing comments

* Update document

* Support disabling prefetch

* Fix race condition

* Add fetchLock

* Remove ReplayableFirehoseFactoryTest

* Fix compilation error

* Fix test failure

* Address comments

* Add default implementation for new method
2017-05-18 15:37:18 +09:00
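
A conceptual sketch of the prefetching idea behind the entry above, under the assumption that remote objects are copied into the temporaryDirectory handed to connect() and then read as ordinary local files; the real factory also caches, serializes fetches behind a lock, and can disable prefetch entirely. Class and method names are illustrative.

```
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

class PrefetchSketch
{
  List<Path> prefetch(List<URI> objects, Path temporaryDirectory, boolean prefetchEnabled) throws IOException
  {
    List<Path> localFiles = new ArrayList<>();
    if (!prefetchEnabled) {
      return localFiles; // caller would stream the remote objects directly instead
    }
    for (URI object : objects) {
      Path target = temporaryDirectory.resolve(Integer.toHexString(object.hashCode()) + ".tmp");
      // openStream() works for http/file URIs; cloud stores need their own client.
      try (InputStream in = object.toURL().openStream()) {
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
      }
      localFiles.add(target);
    }
    return localFiles;
  }
}
```
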
Jihoon Son 50a4ec2b0b Add support for headers and skipping thereof for CSV and TSV (#4254)
* initial commit

* small fixes

* fix bug

* fix bug

* address code review

* more cr

* more cr

* more cr

* fix

* Skip head rows for CSV and TSV

* Move checking skipHeadRows to FileIteratingFirehose

* Remove checking null iterators

* Remove unused imports

* Address comments

* Fix compilation error

* Address comments

* Add more tests

* Add a comment to ReplayableFirehose

* Addressing comments

* Add docs and fix typos
2017-05-15 22:57:31 -07:00
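
A small sketch of the header handling this entry describes, assuming two knobs named after the commit wording (a skipHeadRows count and a header-row flag); the actual parseSpec option names in the docs may differ.

```
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;

public class CsvHeaderSketch
{
  // Skip the first skipHeadRows lines, then optionally take the next line as the column header.
  static List<String> readColumns(BufferedReader reader, int skipHeadRows, boolean hasHeaderRow,
                                  List<String> configuredColumns) throws IOException
  {
    for (int i = 0; i < skipHeadRows; i++) {
      reader.readLine(); // discard leading junk rows
    }
    if (hasHeaderRow) {
      String header = reader.readLine();
      return Arrays.asList(header.split(","));
    }
    return configuredColumns; // fall back to columns declared in the parseSpec
  }

  public static void main(String[] args) throws IOException
  {
    BufferedReader reader = new BufferedReader(new StringReader("# generated 2017\nts,page,count\n2017-01-01,home,3\n"));
    System.out.println(readColumns(reader, 1, true, null)); // [ts, page, count]
    System.out.println(reader.readLine());                  // first data row
  }
}
```
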
Himanshu de081c711b RealtimeIndexTask to support alertTimeout in context (#4089)
* RealtimeIndexTask to support alertTimeout in context and raise an alert if the task process still exists after the timeout

* move alertTimeout config to tuningConfig and document
2017-03-24 12:48:12 -07:00
Gian Merlino b4289c0004 Remove "granularity" from IngestSegmentFirehose. (#4110)
It wasn't doing anything useful (the sequences were being concatenated, and
cursor.getTime() wasn't being called) and it defaulted to Granularities.NONE.
Changing it to Granularities.ALL gave me a 700x+ performance boost on a
small dataset I was reindexing (2m27s to 365ms). Most of that was from avoiding
making a lot of unnecessary column selectors.
2017-03-24 10:28:54 -07:00
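
A rough illustration, not the firehose code itself, of why the defaulted granularity was so costly: the number of cursors (and therefore column-selector sets) tracks the number of granularity buckets, and Granularities.NONE buckets by raw timestamp while Granularities.ALL yields a single bucket.

```
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import java.util.function.LongUnaryOperator;

public class GranularityBucketsSketch
{
  static int countBuckets(List<Long> rowTimestamps, LongUnaryOperator truncate)
  {
    TreeMap<Long, Integer> buckets = new TreeMap<>();
    for (long ts : rowTimestamps) {
      buckets.merge(truncate.applyAsLong(ts), 1, Integer::sum);
    }
    return buckets.size();
  }

  public static void main(String[] args)
  {
    List<Long> timestamps = Arrays.asList(1000L, 2000L, 3000L, 4000L);
    System.out.println(countBuckets(timestamps, ts -> ts));             // NONE-like: one bucket per distinct timestamp
    System.out.println(countBuckets(timestamps, ts -> Long.MIN_VALUE)); // ALL-like: a single bucket
  }
}
```
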
Gian Merlino cab2e2f5d5 Add docs about filtering and indexes on numeric columns. (#4035) 2017-03-10 12:48:59 -08:00
kaijianding 19ac1c7c2c Add SameIntervalMergeTask for easier usage of MergeTask (#3981)
* Add SameIntervalMergeTask for easier usage of MergeTask

* fix a bug and add ut

* remove same_interval_merge_sub from Task.java and remove other unneeded code
2017-03-06 11:21:32 -06:00
Jonathan Wei a08660a9ca Support ingestion of long/float dimensions (#3966)
* Support ingestion for long/float dimensions

* Allow non-arrays for key components in indexing type strategy interfaces

* Add numeric index merge test, fixes

* Docs for numeric dims at ingestion

* Remove unused import

* Adjust docs, add aggregate on numeric dims tests

* remove unused imports

* Throw exception for bitmap method on numerics

* Move typed selector creation to DimensionIndexer interface

* unused imports

* Fix

* Remove unused DimensionSpec from indexer methods, check for dims first in inc index storage adapter

* Remove spaces
2017-02-28 19:04:41 -08:00
kaijianding ef6a19c81b buildV9Directly in MergeTask and AppendTask (#3976)
* buildV9Directly in MergeTask and AppendTask

* add doc
2017-02-28 10:04:32 -08:00
praveev c3bf40108d One granularity (#3850)
* Refactor Segment Granularity

* Beginning of one granularity

* Copy the fix for custom periods in segment-granularity over here.

* Remove the custom serialization for now.

* Compilation cleanup

* Reformat code

* Fixing unit tests

* Unify to use a single iterable

* Backward compatibility for rolling upgrade

* Minor check style. Cosmetic changes.

* Rename length and millis to duration

* CR feedback

* Minor changes.
2017-02-25 01:02:29 -06:00
Jihoon Son ebd100cbb0 Set default query granularity for null value (#3965) 2017-02-22 17:38:43 -08:00
Himanshu 9dfcf0763a disable javascript execution by default (#3818) 2017-02-13 15:11:18 -08:00
Gian Merlino 151ff6d064 flattenSpec: Document that "expr" is ignored for type "root". (#3884) 2017-01-31 10:27:20 -08:00
David Lim ff52581bd3 IndexTask improvements (#3611)
* index task improvements

* code review changes

* add null check
2017-01-18 14:24:37 -08:00
Gian Merlino bcd20441be Make buildV9Directly the default. (#3688) 2016-11-14 09:29:32 -08:00
praveev 52a74cf84f Use timestamp in millis as Map key instead of DateTime object (#3674)
* Use Long timestamp as key instead of DateTime.

DateTime map lookups break when you store a value under one DateTime object
and read it back with a different DateTime object for the same instant,
because DateTime equality depends on the chronology (time zone) as well as
the millis.

For example: The code below fails when you use DateTime as key
```
        DateTime odt = DateTime.now(DateTimeUtils.getZone(DateTimeZone.forID("America/Los_Angeles")));
        HashMap<DateTime, String> map = new HashMap<>();
        map.put(odt, "abc");
        // A DateTime rebuilt from only the millis uses the default chronology/zone, so
        // equals()/hashCode() differ and the lookup below prints "null" (unless the
        // default zone happens to be America/Los_Angeles).
        DateTime dt = new DateTime(odt.getMillis());
        System.out.println(map.get(dt));
```

* Respect timezone when creating the file.

* Update docs with timezone caveat in granularity spec

* Remove unused imports
2016-11-11 10:20:20 -08:00
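
A minimal sketch of the fix named in the title above (keying maps on the epoch millis rather than the DateTime object), using plain Joda-Time and the JDK.

```
import java.util.HashMap;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class MillisKeyExample
{
  public static void main(String[] args)
  {
    DateTime odt = DateTime.now(DateTimeZone.forID("America/Los_Angeles"));
    // Keying on the epoch millis sidesteps DateTime's chronology-sensitive equals()/hashCode().
    HashMap<Long, String> map = new HashMap<>();
    map.put(odt.getMillis(), "abc");

    DateTime dt = new DateTime(odt.getMillis()); // different zone/chronology, same instant
    System.out.println(map.get(dt.getMillis())); // prints "abc"
  }
}
```
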
Akash Dwivedi 3a83e0513e Doc update(batch-ingestion) to include useExplicitVersion. (#3557) 2016-10-07 14:48:00 -07:00
praveev 43cdc675c7 Add support for timezone in segment granularity (#3528)
* Add support for timezone in segment granularity

* CR feedback. Handle null timezone during equals check.

* Include timezone in docs.
Add timezone for ArbitraryGranularitySpec.
2016-10-03 08:15:42 -07:00
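
A worked example in plain Joda-Time of why segment granularity needs a time zone: the same instant truncates to different day buckets in UTC and America/Los_Angeles.

```
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;

public class TimezoneTruncationExample
{
  public static void main(String[] args)
  {
    long instant = new DateTime("2016-10-03T05:30:00Z").getMillis();

    DateTime utcDay = new DateTime(instant, DateTimeZone.UTC).withTimeAtStartOfDay();
    DateTime laDay  = new DateTime(instant, DateTimeZone.forID("America/Los_Angeles")).withTimeAtStartOfDay();

    // The same instant lands in different day-granularity buckets depending on the zone.
    System.out.println(utcDay); // 2016-10-03T00:00:00.000Z
    System.out.println(laDay);  // 2016-10-02T00:00:00.000-07:00
  }
}
```
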
Gian Merlino 27bd5cb13a Add forceExtendableShardSpecs option to Hadoop indexing, IndexTask. (#3473)
Fixes #3241.
2016-09-21 13:40:04 -06:00
Gian Merlino e0e28866ee JavaScript docs: Fix links and typos, add to TOC. (#3457) 2016-09-13 15:26:44 -07:00
Gian Merlino 76a24054e3 JavaScript docs, including docs for globals. (#3454) 2016-09-13 13:46:55 -07:00
Slim ba6ddf307e Adding Hadoop Kerberos authentication. (#3419)
* adding kerberos authentication

* make the 2 functions identical
2016-09-13 10:42:50 -07:00
Dave Li c4e8440c22 Adds long compression methods (#3148)
* add read

* update deprecated guava calls

* add write and vsizeserde

* add benchmark

* separate encoding and compression

* add header and reformat

* update doc

* address PR comment

* fix buffer order

* generate benchmark files

* separate encoding strategy and format

* fix benchmark

* modify supplier write to channel

* add float NONE handling

* address PR comment

* address PR comment 2
2016-08-30 16:17:46 -07:00
kaijianding 50d52a24fc ability to not rollup at index time, make pre-aggregation an option (#3020)
* ability to not rollup at index time, make pre-aggregation an option

* rename getRowIndexForRollup to getPriorIndex

* fix doc misspelling

* test query using no-rollup indexes

* fix benchmark failure due to jmh bug
2016-08-02 11:13:05 -07:00
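
An illustration of what the new rollup option controls, under the assumption that rollup means pre-aggregating rows that share a truncated timestamp and identical dimension values at index time; the data and key encoding here are made up.

```
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RollupSketch
{
  public static void main(String[] args)
  {
    // Each entry is {hourBucket, page, count} for an input row.
    List<Object[]> input = Arrays.asList(
        new Object[]{0L, "home", 1L},
        new Object[]{0L, "home", 2L},
        new Object[]{0L, "about", 1L});

    Map<String, Long> rolledUp = new HashMap<>();
    for (Object[] row : input) {
      rolledUp.merge(row[0] + "|" + row[1], (Long) row[2], Long::sum);
    }

    System.out.println("rollup=true : " + rolledUp.size() + " stored rows " + rolledUp);
    System.out.println("rollup=false: " + input.size() + " stored rows");
  }
}
```
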
Gian Merlino e5397ed316 Link up Hadoop class loading docs better. (#3302) 2016-07-29 10:19:54 -07:00
Navis Ryu cd7337fc8a Calculate max split size based on numMapTask in DatasourceInputFormat (#2882)
* Calculate max split size based on numMapTask

* updated docs & fixed possible ArithmeticException
2016-07-20 16:53:51 -07:00
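
A back-of-the-envelope version of the calculation named in the entry above; the numbers and variable names are illustrative.

```
public class SplitSizeSketch
{
  public static void main(String[] args)
  {
    // Derive the split size from the total input size and the requested number of map tasks
    // instead of using a fixed value.
    long totalInputSizeBytes = 64L * 1024 * 1024 * 1024; // 64 GiB of segment input
    int numMapTasks = 128;
    long maxSplitSize = totalInputSizeBytes / numMapTasks;
    System.out.println(maxSplitSize + " bytes per split"); // 536870912 (512 MiB)
  }
}
```
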
Gian Merlino ea03906fcf Configurable compressRunOnSerialization for Roaring bitmaps. (#3228)
Defaults to true, which is a change in behavior (this used to be false and unconfigurable).
2016-07-08 10:24:19 +05:30
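
A standalone RoaringBitmap example of the run compression this option enables, assuming compressRunOnSerialization maps to the library's runOptimize() call, which converts eligible containers to run-length form before serialization.

```
import org.roaringbitmap.RoaringBitmap;

public class RoaringRunExample
{
  public static void main(String[] args)
  {
    // A long consecutive run of set bits compresses well as run-length containers.
    RoaringBitmap bitmap = new RoaringBitmap();
    for (int i = 0; i < 100_000; i++) {
      bitmap.add(i);
    }
    System.out.println("before runOptimize: " + bitmap.serializedSizeInBytes() + " bytes");
    boolean converted = bitmap.runOptimize();
    System.out.println("converted to runs: " + converted);
    System.out.println("after runOptimize:  " + bitmap.serializedSizeInBytes() + " bytes");
  }
}
```
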
Jonathan Wei c5dbf364e3 Fix JSON flatten docs, add link to path expression tester (#3105) 2016-06-07 14:39:57 -07:00
Nishant 0ac1b27d53 Allow manually setting shutoffTime for EventReceiverFirehose (#2803)
* Allow dynamically setting shutoffTime for EventReceiverFirehose

review comments and tests

* shut down exec on close
2016-05-24 07:24:00 -07:00
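
A conceptual sketch of a settable shutoff time in plain Java: a scheduler closes the firehose once the deadline passes and is itself shut down on close, echoing the "shut down exec on close" note above. Names are illustrative, not the EventReceiverFirehose API.

```
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.joda.time.DateTime;

class ShutoffSketch implements AutoCloseable
{
  private final ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();

  void setShutoffTime(DateTime shutoffTime)
  {
    long delayMillis = Math.max(0, shutoffTime.getMillis() - System.currentTimeMillis());
    exec.schedule(this::close, delayMillis, TimeUnit.MILLISECONDS);
  }

  @Override
  public void close()
  {
    // ... stop accepting events ...
    exec.shutdownNow(); // "shut down exec on close"
  }
}
```
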
Gian Merlino fffa9c8265 Fix flattenSpec docs, "nested" should be "path". (#2924) 2016-05-05 08:59:41 -07:00
David Lim 890bdb543d doc fixes (#2897) 2016-04-28 15:34:58 -07:00
Fangjin Yang abd951df1a Document how to use roaring bitmaps (#2824)
* Document how to use roaring bitmaps

This fixes #2408.
While not all indexSpec properties are explained, it does explain how roaring bitmaps can be turned on.

* fix

* fix

* fix

* fix
2016-04-12 19:28:02 -07:00
Sébastien Launay 37d2ab623e Merge pull request #2815 from slaunay/documentation/hadoop-classpath-issue-fix-with-configuration
Doc for mapreduce.job.user.classpath.first=true
2016-04-12 10:51:51 -07:00
Himanshu Gupta 004b00bb96 config to explicitly specify classpath for hadoop container during hadoop ingestion 2016-03-25 10:51:28 -05:00
Gian Merlino 2dfd3877c0 Fix a bunch of broken links in the docs. 2016-03-23 10:21:28 -07:00
fjy 943cbe6e76 refactor extensions into their own docs 2016-03-22 18:54:10 -07:00
binlijin bce600f5d5 Single dimension hash-based partitioning 2016-03-22 13:15:33 +08:00
Gian Merlino a2b1652787 Clarify parser docs.
- Clarify what parseSpecs are used for.
- Avro, Protobuf should use timeAndDims parseSpecs.
- Hadoop jobs should use hadoopyString string parsers.
2016-03-10 08:45:04 -08:00
fjy e3e932a4d4 refactor extensions into core and contrib 2016-03-08 17:12:09 -08:00
Fangjin Yang 8e36e6fa43 Merge pull request #2610 from dclim/add-combineText-doc
add combineText property and cleanup batch ingestion doc
2016-03-08 12:54:16 -08:00
dclim df29667a89 add combineText property and cleanup batch ingestion doc 2016-03-08 13:10:34 -07:00
Himanshu Gupta 0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
Slim Bouguerra 623e89aa54 skip corrupt message 2016-03-04 08:30:40 -06:00
Björn Zettergren 2462c82c0e New defaults for maxRowsInMemory rowFlushBoundary
To bring consistency to docs and source, this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000 after
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000 and it's lower now on the grounds that
it's better for a default to be somewhat less efficient, and work,
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
Charles Allen 1fe277ee29 Merge pull request #2367 from se7entyse7en/feature-rackspace-cloud-files-static-firehose
Adds support to use Rackspace's cloudfiles as a static firehose
2016-02-25 17:31:06 -08:00
Gian Merlino 3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
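
A small sketch of the counter semantics spelled out above (not Druid's actual classes): rows with any salvageable fields count as processed, fully unparseable rows count as unparseable, and reportParseExceptions=true turns any parse problem into a thrown exception. The row format and failure markers are made up, and the rejection-policy counter is not exercised.

```
import java.util.Arrays;
import java.util.List;

public class ParseCountersSketch
{
  static class ParseException extends RuntimeException
  {
    ParseException(String msg) { super(msg); }
  }

  long processed, thrownAway, unparseable; // thrownAway (rejection policy) unused in this sketch

  void ingest(List<String> rows, boolean reportParseExceptions)
  {
    for (String row : rows) {
      boolean fullyUnparseable = row.isEmpty();            // stand-in: nothing salvageable
      boolean partialFailure = row.contains("?");          // stand-in: some fields unparseable
      if (reportParseExceptions && (fullyUnparseable || partialFailure)) {
        throw new ParseException("cannot parse row: " + row);
      }
      if (fullyUnparseable) {
        unparseable++; // no fields salvageable at all
      } else {
        processed++;   // indexed, possibly with some unparseable fields suppressed
      }
    }
  }

  public static void main(String[] args)
  {
    ParseCountersSketch counters = new ParseCountersSketch();
    counters.ingest(Arrays.asList("a,1", "b,?", ""), false);
    System.out.println(counters.processed + " processed, " + counters.unparseable + " unparseable");
  }
}
```
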
Himanshu Gupta 21b0b8a07d new coordinator endpoint to get the list of used segments given a dataSource and a list of intervals 2016-02-21 23:17:58 -06:00
Himanshu Gupta 09ffcae4ae give user the option to specify the segments for dataSource inputSpec 2016-02-21 23:15:31 -06:00
Fangjin Yang 083f019a48 Merge pull request #2465 from druid-io/more-doc-fix
more doc fixes
2016-02-17 11:00:38 -08:00
fjy 7da6594bfe more doc fixes 2016-02-17 09:43:47 -08:00
Gian Merlino 3a996216bd Multivalued dimensions can be compressed since 0.8.0. 2016-02-17 08:33:21 -08:00
Himanshu f6eebf5884 Merge pull request #2422 from rasahner/docMinorFixes
some minor doc changes
2016-02-09 10:03:22 -06:00
Robin 1d57e3267d some minor doc changes 2016-02-09 08:20:53 -06:00
fjy 6fc5bcb1ef fix docs 2016-02-08 13:40:53 -08:00
fjy 003f54e268 add doc rendering 2016-02-04 14:21:59 -08:00
fjy 1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
Lou Marvin Caraig 9de57eb1c8 Added documentation 2016-02-02 14:32:12 +01:00
Björn Zettergren d373573c25 DOCs: Missing 'type' for leaveIntermediate
Added the missing 'Boolean' type for the leaveIntermediate row in the TuningConfig table
2016-01-29 14:42:19 +01:00
Himanshu Gupta b3437825f0 add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found 2016-01-26 17:23:55 -06:00
binlijin cd1c71ceb4 rename persistBackgroundCount to numBackgroundPersistThreads 2016-01-22 14:29:41 +08:00
Nishant dcb7830330 Merge pull request #984 from drcrallen/thread-priority-rebase
Use thread priorities. (aka set `nice` values for background-like tasks)
2016-01-21 15:02:34 +05:30
Charles Allen 2a69a58570 Merge pull request #2149 from binlijin/master
Do persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Charles Allen 2e1d6aaf3d Use thread priorities. (aka set `nice` values for background-like tasks)
* Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
 * Each exec service has its own Task Factory which is assigned a priority for spawned tasks. Therefore each priority class has its own exec service
 * Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers
 * Add options to ForkingTaskRunner
    * Add "-XX:+UseThreadPriorities" default option
    * Add "-XX:ThreadPriorityPolicy=42" default option
 * AbstractTask - Removed unneeded @JsonIgnore on priority
 * Added priority to RealtimePlumber executors. All sub-executors (non query runners) get Thread.MIN_PRIORITY
 * Add persistThreadPriority and mergeThreadPriority to realtime tuning config
2016-01-20 14:00:31 -08:00
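
A sketch of mapping a taskPriority context value (<0 low, 0 default, >0 high) onto Java thread priorities via a ThreadFactory, in the spirit of the change above; this is not the actual ForkingTaskRunner/AbstractTask code.

```
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class PriorityExecSketch
{
  static int toThreadPriority(int taskPriority)
  {
    if (taskPriority < 0) {
      return Thread.MIN_PRIORITY;
    }
    if (taskPriority > 0) {
      return Thread.MAX_PRIORITY;
    }
    return Thread.NORM_PRIORITY; // 0 means "take the default"
  }

  static ExecutorService executorWithPriority(int taskPriority)
  {
    ThreadFactory factory = runnable -> {
      Thread thread = new Thread(runnable);
      thread.setDaemon(true);
      // Only honored by the JVM when started with -XX:+UseThreadPriorities
      // (and a suitable -XX:ThreadPriorityPolicy) on most platforms.
      thread.setPriority(toThreadPriority(taskPriority));
      return thread;
    };
    return Executors.newSingleThreadExecutor(factory);
  }

  public static void main(String[] args)
  {
    ExecutorService lowPriorityExec = executorWithPriority(-1);
    lowPriorityExec.submit(() -> System.out.println(Thread.currentThread().getPriority()));
    lowPriorityExec.shutdown();
  }
}
```
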
Logan Linn c3bdaefe1f Update batch-ingestion.md
Fix documented type of the `dataGranularity` config
2016-01-19 17:20:47 -08:00
binlijin 8e43e2c446 Do persist IncrementalIndex in another thread in IndexGeneratorReducer 2016-01-20 09:20:09 +08:00
Kurt Young 82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
Zhao Weinan 5e57ddb8cc Adding avro support to realtime & hadoop batch indexing. 2016-01-05 10:21:27 +08:00
Robin 0961c0b703 trivial documentation fix 2016-01-04 12:39:10 -06:00
fjy 88f6b9b5ad Multiple improvements for docs 2016-01-02 21:54:54 -08:00
Himanshu Gupta 48de9dfafa doc update to make it easy to find how to do re-indexing or delta ingestion 2015-12-30 23:58:09 -06:00
fjy 398a3ec620 add docs for more specs 2015-12-17 18:06:30 -08:00
jon-wei c53bf85d83 Add docs and benchmark for JSON flattening parser 2015-12-09 16:13:30 -08:00
Himanshu Gupta efe3c9f4a5 update the examples for batch reindexing/delta ingestion to use "intervals" instead of deprecated "interval" 2015-12-06 00:22:20 -06:00
Himanshu Gupta 61aaa09012 support multiple intervals in dataSource input spec 2015-12-03 21:28:04 -06:00
jon-wei 95dca4440f Update data formats doc with info about JSON multi-value dimensions 2015-11-24 14:38:06 -08:00
sahner a4ed2ce2d1 fix formatting in schema-design 2015-11-17 16:50:53 -06:00
fjy 8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Nishant efc49da073 fix doc - correct default value for maxRowsInMemory 2015-11-01 22:09:24 -08:00
Bingkun Guo 4914925d65 New extension loading mechanism
1) Remove the Maven client that downloaded extensions at runtime.
2) Provide a way to load Druid extensions and hadoop dependencies through file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documents on how to use this new extension loading mechanism.
5) Change the way the Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0
are packaged within the Druid tarball.
2015-10-21 14:22:36 -05:00
Gian Merlino 933cbdf780 Adjust realtime constraints in the docs. 2015-10-09 10:52:52 -07:00
Gian Merlino b29cbf97a6 Docs: Suggest hadoopyString parser for Hadoop. 2015-09-16 10:19:42 -07:00
Himanshu Gupta 075b6d4385 update ingestion faq to mention dataSource inputSpec as an option of reindexing via hadoop 2015-09-10 14:41:13 -05:00
Xavier Léauté d89b0fa76a Merge pull request #1662 from qix/pathFormat-doc
Add documentation for pathFormat in batch ingestion
2015-08-31 11:14:54 -07:00
Josh Yudaken 29c29b42d3 Add default value and link to joda docs 2015-08-31 11:09:54 -07:00
lvjq 2237a8cf0f kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Bingkun ae1f104c10 Fix batch ingestion doc 2015-08-26 15:16:21 -05:00
Gian Merlino 10946610f4 Merge pull request #1656 from druid-io/all-the-docs
more docs for common questions
2015-08-25 17:49:47 -07:00
fjy 4055f9ca48 more docs for common questions 2015-08-25 17:49:04 -07:00
sahner 3def847e28 add documentation about TimedShutoff firehose 2015-08-24 20:41:42 -05:00
Josh Yudaken 5e42aee49e Add documentation for pathFormat in batch ingestion 2015-08-24 14:39:57 -07:00
Himanshu Gupta cfd81bfac7 updating the docs on how to do hadoop batch re-ingestion and delta ingestion 2015-08-16 14:07:35 -05:00
fjy 012fff6616 fix firehose docs 2015-08-04 09:52:23 -07:00
Himanshu Gupta 7ee509bcd0 fix mysql references in tutorial docs 2015-07-30 22:05:05 -05:00
pdeva ef0439229d Specify dynamic dimension schema
Document how Druid can dynamically infer dimension columns
2015-07-27 20:20:53 -07:00
sahner 4801de62a2 make "announce" the chathandler default in realtime node,
remove doc references to chathandler type "announce" since it is the default now. 2015-07-27 12:14:28 -05:00
2015-07-27 12:14:28 -05:00