121 Commits

Author SHA1 Message Date
Himanshu Gupta
0402636598 configurable handoffConditionTimeout in realtime tasks for segment handoff wait 2016-03-05 10:14:54 -06:00
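For context, a hedged sketch of where this option sits: a realtime tuningConfig with the new timeout (surrounding field values are illustrative; the timeout is in milliseconds, and 0 is assumed to mean wait indefinitely, the prior behavior):

```json
{
  "tuningConfig": {
    "type": "realtime",
    "intermediatePersistPeriod": "PT10M",
    "windowPeriod": "PT10M",
    "handoffConditionTimeout": 600000
  }
}
```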
Slim Bouguerra
623e89aa54 skip corrupt message 2016-03-04 08:30:40 -06:00
Björn Zettergren
2462c82c0e New defaults for maxRowsInMemory and rowFlushBoundary
To bring consistency to the docs and source, this commit changes the default
values for maxRowsInMemory and rowFlushBoundary to 75000, following
discussion in PR https://github.com/druid-io/druid/pull/2457.

The previous default was 500000; it is lower now on the grounds that
it is better for a default to be somewhat less efficient and work
than to reach for the stars and possibly result in
"OutOfMemoryError: java heap space" errors.
2016-03-01 13:50:28 +01:00
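For reference, a minimal Hadoop tuningConfig sketch pinning the new default explicitly (illustrative; leaving rowFlushBoundary unset is assumed to now yield the 75000 default described above):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "rowFlushBoundary": 75000
  }
}
```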
Charles Allen
1fe277ee29 Merge pull request #2367 from se7entyse7en/feature-rackspace-cloud-files-static-firehose
Adds support to use Rackspace's Cloud Files as a static firehose
2016-02-25 17:31:06 -08:00
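A firehose spec for this extension might look roughly like the sketch below; the "static-cloudfiles" type name and the blob fields are assumptions modeled on the other static firehoses, not confirmed by this log:

```json
{
  "firehose": {
    "type": "static-cloudfiles",
    "blobs": [
      {
        "region": "DFW",
        "container": "example-container",
        "path": "/path/to/data.json"
      }
    ]
  }
}
```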
Gian Merlino
3534483433 Better handling of ParseExceptions.
Two changes:
- Allow IncrementalIndex to suppress ParseExceptions on "aggregate".
- Add "reportParseExceptions" option to realtime tuning configs. By default this is "false".

Behavior of the counters should now be:

- processed: Number of rows indexed, including rows where some fields could be parsed and some could not.
- thrownAway: Number of rows thrown away due to rejection policy.
- unparseable: Number of rows thrown away due to being completely unparseable (no fields salvageable at all).

If "reportParseExceptions" is true then "unparseable" will always be zero (because a parse error would
cause an exception to be thrown). In addition, "processed" will only include fully parseable rows
(because even partial parse failures will cause exceptions to be thrown).

Fixes #2510.
2016-02-23 10:11:43 -08:00
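A minimal sketch of opting into strict parsing per the commit above (the tuningConfig placement is as stated in the message; the surrounding structure is illustrative):

```json
{
  "tuningConfig": {
    "type": "realtime",
    "reportParseExceptions": true
  }
}
```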
Himanshu Gupta
21b0b8a07d new coordinator endpoint to get the list of used segments given a dataSource and a list of intervals 2016-02-21 23:17:58 -06:00
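The request shape is assumed to be a POST with the intervals as a JSON array in the body; the path /druid/coordinator/v1/metadata/datasources/{dataSource}/segments is an assumption based on the coordinator's metadata API, not confirmed by this log:

```json
[
  "2016-01-01T00:00:00.000Z/2016-02-01T00:00:00.000Z",
  "2016-02-15T00:00:00.000Z/2016-03-01T00:00:00.000Z"
]
```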
Himanshu Gupta
09ffcae4ae give users the option to specify the segments for the dataSource inputSpec 2016-02-21 23:15:31 -06:00
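A hedged sketch of the dataSource inputSpec with explicit segments (the descriptor field names are assumptions; in practice the segment descriptors would come from a metadata query such as the coordinator endpoint above rather than be written by hand):

```json
{
  "inputSpec": {
    "type": "dataSource",
    "ingestionSpec": {
      "dataSource": "example_datasource",
      "intervals": ["2016-01-01/2016-02-01"],
      "segments": [
        {
          "dataSource": "example_datasource",
          "interval": "2016-01-01T00:00:00.000Z/2016-01-02T00:00:00.000Z",
          "version": "2016-01-02T00:00:00.000Z",
          "size": 12345
        }
      ]
    }
  }
}
```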
Fangjin Yang
083f019a48 Merge pull request #2465 from druid-io/more-doc-fix
more doc fixes
2016-02-17 11:00:38 -08:00
fjy
7da6594bfe more doc fixes 2016-02-17 09:43:47 -08:00
Gian Merlino
3a996216bd Multivalued dimensions can be compressed since 0.8.0. 2016-02-17 08:33:21 -08:00
Himanshu
f6eebf5884 Merge pull request #2422 from rasahner/docMinorFixes
some minor doc changes
2016-02-09 10:03:22 -06:00
Robin
1d57e3267d some minor doc changes 2016-02-09 08:20:53 -06:00
fjy
6fc5bcb1ef fix docs 2016-02-08 13:40:53 -08:00
fjy
003f54e268 add doc rendering 2016-02-04 14:21:59 -08:00
fjy
1aa363cea7 new quickstart 2016-02-04 09:37:38 -08:00
Lou Marvin Caraig
9de57eb1c8 Added documentation 2016-02-02 14:32:12 +01:00
Björn Zettergren
d373573c25 Docs: Missing 'type' for leaveIntermediate
Added missing 'Boolean' as the type for the leaveIntermediate row in the TuningConfig table
2016-01-29 14:42:19 +01:00
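A sketch of the documented field in context (illustrative; leaveIntermediate is a Boolean that is assumed to keep intermediate files under the working path after the job finishes, mainly useful for debugging):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "workingPath": "/tmp/druid-indexing",
    "leaveIntermediate": true
  }
}
```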
Himanshu Gupta
b3437825f0 add ignoreWhenNoSegments flag to optionally ignore the dataSource inputSpec when no segments were found 2016-01-26 17:23:55 -06:00
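A hedged sketch of the flag in a dataSource inputSpec (its placement inside the ingestionSpec is an assumption):

```json
{
  "inputSpec": {
    "type": "dataSource",
    "ingestionSpec": {
      "dataSource": "example_datasource",
      "intervals": ["2016-01-01/2016-02-01"],
      "ignoreWhenNoSegments": true
    }
  }
}
```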
binlijin
cd1c71ceb4 rename persistBackgroundCount to numBackgroundPersistThreads 2016-01-22 14:29:41 +08:00
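After the rename, a Hadoop tuningConfig would use the new name (sketch; a value of 0 is assumed to keep persists on the reduce thread, while 1 moves them to a background thread as in the persist commit below):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "numBackgroundPersistThreads": 1
  }
}
```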
Nishant
dcb7830330 Merge pull request #984 from drcrallen/thread-priority-rebase
Use thread priorities. (aka set `nice` values for background-like tasks)
2016-01-21 15:02:34 +05:30
Charles Allen
2a69a58570 Merge pull request #2149 from binlijin/master
Persist IncrementalIndex in another thread in IndexGeneratorReducer
2016-01-20 17:06:42 -08:00
Charles Allen
2e1d6aaf3d Use thread priorities. (aka set nice values for background-like tasks)
* Defaults the thread priority to java.util.Thread.NORM_PRIORITY in io.druid.indexing.common.task.AbstractTask
 * Each exec service has its own Task Factory, which is assigned a priority for spawned tasks; therefore each priority class has a unique exec service
 * Added priority to tasks as taskPriority in the task context. <0 means low, 0 means take default, >0 means high. It is up to any particular implementation to determine how to handle these numbers
 * Add options to ForkingTaskRunner
    * Add "-XX:+UseThreadPriorities" default option
    * Add "-XX:ThreadPriorityPolicy=42" default option
 * AbstractTask - Removed unneeded @JsonIgnore on priority
 * Added priority to RealtimePlumber executors. All sub-executors (non query runners) get Thread.MIN_PRIORITY
 * Add persistThreadPriority and mergeThreadPriority to realtime tuning config
2016-01-20 14:00:31 -08:00
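A sketch of a task opting into a lower priority via the context key described above (the index_realtime task type is illustrative; per the commit, <0 means low, 0 means default, >0 means high, and persistThreadPriority/mergeThreadPriority live in the realtime tuning config):

```json
{
  "type": "index_realtime",
  "context": {
    "taskPriority": -1
  }
}
```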
Logan Linn
c3bdaefe1f Update batch-ingestion.md
Fix documented type of the `dataGranularity` config
2016-01-19 17:20:47 -08:00
binlijin
8e43e2c446 Persist IncrementalIndex in another thread in IndexGeneratorReducer 2016-01-20 09:20:09 +08:00
Kurt Young
82ff98c2bf add config for build v9 directly and update docs 2016-01-16 11:26:34 +08:00
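A hedged sketch of the flag this commit documents (the buildV9Directly name is assumed from the docs of that era; when true, the indexer is assumed to build v9 segments directly instead of building v8 and converting):

```json
{
  "tuningConfig": {
    "type": "hadoop",
    "buildV9Directly": true
  }
}
```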
Zhao Weinan
5e57ddb8cc Adding Avro support to realtime & Hadoop batch indexing. 2016-01-05 10:21:27 +08:00
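A rough sketch of what a batch Avro parser spec might look like; the "avro_hadoop" parser type and "timeAndDims" parseSpec format are assumptions drawn from the druid-avro-extensions docs of that era, not confirmed by this log:

```json
{
  "parser": {
    "type": "avro_hadoop",
    "parseSpec": {
      "format": "timeAndDims",
      "timestampSpec": {"column": "timestamp", "format": "auto"},
      "dimensionsSpec": {"dimensions": ["channel", "user"]}
    }
  }
}
```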
Robin
0961c0b703 trivial documentation fix 2016-01-04 12:39:10 -06:00
fjy
88f6b9b5ad Multiple improvements for docs 2016-01-02 21:54:54 -08:00
Himanshu Gupta
48de9dfafa doc update to make it easier to find how to do re-indexing or delta ingestion 2015-12-30 23:58:09 -06:00
fjy
398a3ec620 add docs for more specs 2015-12-17 18:06:30 -08:00
jon-wei
c53bf85d83 Add docs and benchmark for JSON flattening parser 2015-12-09 16:13:30 -08:00
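A hedged sketch of the flattening feature those docs cover: a JSON parseSpec with a flattenSpec that lifts a nested field into a top-level dimension (field names illustrative; the "path" field type is assumed to take a JsonPath-style expr):

```json
{
  "parseSpec": {
    "format": "json",
    "flattenSpec": {
      "useFieldDiscovery": true,
      "fields": [
        {"type": "path", "name": "userFirstName", "expr": "$.user.name.first"}
      ]
    },
    "timestampSpec": {"column": "timestamp", "format": "auto"},
    "dimensionsSpec": {"dimensions": ["userFirstName"]}
  }
}
```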
Himanshu Gupta
efe3c9f4a5 update the examples for batch reindexing/delta ingestion to use "intervals" instead of the deprecated "interval" 2015-12-06 00:22:20 -06:00
Himanshu Gupta
61aaa09012 support multiple intervals in dataSource input spec 2015-12-03 21:28:04 -06:00
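Together with the "intervals" rename two entries above, a dataSource inputSpec can now cover several non-contiguous ranges; a minimal sketch (datasource name and dates illustrative):

```json
{
  "inputSpec": {
    "type": "dataSource",
    "ingestionSpec": {
      "dataSource": "example_datasource",
      "intervals": [
        "2015-11-01/2015-11-08",
        "2015-11-15/2015-11-22"
      ]
    }
  }
}
```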
jon-wei
95dca4440f Update data formats doc with info about JSON multi-value dimensions 2015-11-24 14:38:06 -08:00
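For illustration, an input row of the kind those docs describe: a JSON field whose value is an array is ingested as a multi-value dimension (field names are made up for the example):

```json
{"timestamp": "2015-11-24T00:00:00Z", "page": "Druid", "tags": ["fast", "analytics"]}
```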
sahner
a4ed2ce2d1 fix formatting in schema-design 2015-11-17 16:50:53 -06:00
fjy
8f231fd3e3 cleanup druid codebase 2015-11-04 13:59:53 -08:00
Nishant
efc49da073 fix doc - correct default value for maxRowsInMemory 2015-11-01 22:09:24 -08:00
Bingkun Guo
4914925d65 New extension loading mechanism
1) Remove the Maven client previously used to download extensions at runtime.
2) Provide a way to load Druid extensions and hadoop dependencies through file system.
3) Refactor pull-deps so that it can download extensions into extension directories.
4) Add documents on how to use this new extension loading mechanism.
5) Change the way the Druid tarball is generated. Now all the extensions + hadoop-client 2.3.0
are packaged within the Druid tarball.
2015-10-21 14:22:36 -05:00
Gian Merlino
933cbdf780 Adjust realtime constraints in the docs. 2015-10-09 10:52:52 -07:00
Gian Merlino
b29cbf97a6 Docs: Suggest hadoopyString parser for Hadoop. 2015-09-16 10:19:42 -07:00
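A minimal sketch of the suggested parser in a Hadoop ingestion spec (surrounding fields illustrative; hadoopyString wraps a string parser so it can read Hadoop Text values, which is assumed to be why it is preferred for Hadoop indexing):

```json
{
  "parser": {
    "type": "hadoopyString",
    "parseSpec": {
      "format": "json",
      "timestampSpec": {"column": "timestamp", "format": "auto"},
      "dimensionsSpec": {"dimensions": []}
    }
  }
}
```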
Himanshu Gupta
075b6d4385 update ingestion FAQ to mention the dataSource inputSpec as an option for reindexing via Hadoop 2015-09-10 14:41:13 -05:00
Xavier Léauté
d89b0fa76a Merge pull request #1662 from qix/pathFormat-doc
Add documentation for pathFormat in batch ingestion
2015-08-31 11:14:54 -07:00
Josh Yudaken
29c29b42d3 Add default value and link to joda docs 2015-08-31 11:09:54 -07:00
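A hedged sketch of pathFormat in a granularity-type inputSpec (the Joda-time pattern shown is assumed to be the documented default layout; other fields illustrative):

```json
{
  "inputSpec": {
    "type": "granularity",
    "dataGranularity": "hour",
    "inputPath": "/base/path",
    "filePattern": ".*",
    "pathFormat": "'y'=yyyy/'m'=MM/'d'=dd/'H'=HH"
  }
}
```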
lvjq
2237a8cf0f Kafka 8 simple consumer firehose 2015-08-27 20:50:46 -05:00
Bingkun
ae1f104c10 Fix batch ingestion doc 2015-08-26 15:16:21 -05:00
Gian Merlino
10946610f4 Merge pull request #1656 from druid-io/all-the-docs
more docs for common questions
2015-08-25 17:49:47 -07:00
fjy
4055f9ca48 more docs for common questions 2015-08-25 17:49:04 -07:00
sahner
3def847e28 add documentation about TimedShutoff firehose 2015-08-24 20:41:42 -05:00
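A rough sketch of the firehose being documented: a "timed" wrapper that shuts off a delegate firehose at a fixed time (the "receiver" delegate and its fields are illustrative assumptions):

```json
{
  "firehose": {
    "type": "timed",
    "shutoffTime": "2015-08-25T01:26:00.000Z",
    "delegate": {
      "type": "receiver",
      "serviceName": "example_service",
      "bufferSize": 100000
    }
  }
}
```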
Josh Yudaken
5e42aee49e Add documentation for pathFormat in batch ingestion 2015-08-24 14:39:57 -07:00
Himanshu Gupta
cfd81bfac7 updating the docs on how to do Hadoop batch re-ingestion and delta ingestion 2015-08-16 14:07:35 -05:00