diff --git a/docs/content/Best-Practices.md b/docs/content/Best-Practices.md
index f4cd8f32595..9688a52f966 100644
--- a/docs/content/Best-Practices.md
+++ b/docs/content/Best-Practices.md
@@ -21,3 +21,8 @@ SSDs are highly recommended for historical and real-time nodes if you are not ru
 
 Although Druid supports schemaless ingestion of dimensions, because of https://github.com/metamx/druid/issues/658, you may sometimes get bigger segments than necessary. To ensure segments are as compact as possible, providing dimension names in lexicographic order is recommended. This may require some ETL processing on your data, however.
 
+# Read FAQs
+
+You should read about the common problems people run into here:
+1) [Ingestion-FAQ](Ingestion-FAQ.html)
+2) [Performance-FAQ](Performance-FAQ.html)
\ No newline at end of file
diff --git a/docs/content/Coordinator.md b/docs/content/Coordinator.md
index 9021cbe1dff..42d1d17041d 100644
--- a/docs/content/Coordinator.md
+++ b/docs/content/Coordinator.md
@@ -20,9 +20,7 @@ io.druid.cli.Main server coordinator
 
 Rules
 -----
-Segments are loaded and dropped from the cluster based on a set of rules. Rules indicate how segments should be assigned to different historical node tiers and how many replicants of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The coordinator loads a set of rules from the database. Rules may be specific to a certain datasource and/or a default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The coordinator will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule.
-
-For more information on rules, see [Rule Configuration](Rule-Configuration.html).
+Segments can be automatically loaded and dropped from the cluster based on a set of rules. For more information on rules, see [Rule Configuration](Rule-Configuration.html).
 
 Cleaning Up Segments
 --------------------
diff --git a/docs/content/Ingestion-FAQ.md b/docs/content/Ingestion-FAQ.md
index fc83907c47a..ecf6b2ccdac 100644
--- a/docs/content/Ingestion-FAQ.md
+++ b/docs/content/Ingestion-FAQ.md
@@ -21,6 +21,12 @@ druid.storage.bucket=druid
 druid.storage.baseKey=sample
 ```
+Other common reasons that hand-off fails are as follows:
+
+1) Historical nodes are out of capacity and cannot download any more segments. You'll see exceptions in the coordinator logs if this occurs.
+2) Segments are corrupt and cannot be downloaded. You'll see exceptions in the historical node logs if this occurs.
+3) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the coordinator logs have no errors.
+
 ## How do I get HDFS to work?
 
 Make sure to include the `druid-hdfs-storage` module as one of your extensions and set `druid.storage.type=hdfs`.
@@ -50,6 +56,9 @@ To do this use the IngestSegmentFirehose and run an indexer task. The IngestSegm
 
 Typically the above will be run as a batch job that, say, feeds in and aggregates one day's worth of data at a time.
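+
+A minimal sketch of what such a firehose spec might look like (assuming the JSON `type` for the IngestSegmentFirehose is `ingestSegment`; the data source name and interval here are illustrative values):
+
+```json
+{
+  "type": "ingestSegment",
+  "dataSource": "wikipedia",
+  "interval": "2014-01-01/2014-01-02"
+}
+```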
+
+## Real-time ingestion seems to be stuck
+
+There are a few ways this can occur. Druid will throttle ingestion to prevent out-of-memory problems if intermediate persists or hand-off are taking too long. If your node logs indicate that certain columns are taking a very long time to build (for example, if your segment granularity is hourly but creating a single column takes 30 minutes), you should re-evaluate your configuration or scale up your real-time ingestion.
+
 ## More information
 
diff --git a/docs/content/Rule-Configuration.md b/docs/content/Rule-Configuration.md
index bf8b8a9792d..c25c9e62b70 100644
--- a/docs/content/Rule-Configuration.md
+++ b/docs/content/Rule-Configuration.md
@@ -2,12 +2,34 @@ layout: doc_page
 ---
 # Configuring Rules for Coordinator Nodes
+
+Rules indicate how segments should be assigned to different historical node tiers and how many replicas of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The coordinator loads a set of rules from the metadata storage. Rules may be specific to a certain datasource, and/or a default set of rules can be configured. Rules are read in order, so the ordering of rules is important. The coordinator will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule.
+
 Note: It is recommended that the coordinator console be used to configure rules. However, the coordinator node does have HTTP endpoints to programmatically configure rules.
+
 Load Rules
 ----------
-Load rules indicate how many replicants of a segment should exist in a server tier.
+Load rules indicate how many replicas of a segment should exist in a server tier.
+
+### Forever Load Rule
+
+Forever load rules are of the form:
+
+```json
+{
+  "type" : "loadForever",
+  "tieredReplicants": {
+    "hot": 1,
+    "_default_tier" : 1
+  }
+}
+```
+
+* `type` - this should always be "loadForever"
+* `tieredReplicants` - A JSON Object where the keys are the tier names and values are the number of replicas for that tier.
+
 
 ### Interval Load Rule
 
@@ -16,14 +38,17 @@ Interval load rules are of the form:
 ```json
 {
   "type" : "loadByInterval",
-  "interval" : "2012-01-01/2013-01-01",
-  "tier" : "hot"
+  "interval": "2012-01-01/2013-01-01",
+  "tieredReplicants": {
+    "hot": 1,
+    "_default_tier" : 1
+  }
 }
 ```
 
 * `type` - this should always be "loadByInterval"
 * `interval` - A JSON Object representing ISO-8601 Intervals
-* `tier` - the configured historical node tier
+* `tieredReplicants` - A JSON Object where the keys are the tier names and values are the number of replicas for that tier.
 
 ### Period Load Rule
 
@@ -33,13 +58,16 @@ Period load rules are of the form:
 ```json
 {
   "type" : "loadByPeriod",
   "period" : "P1M",
-  "tier" : "hot"
+  "tieredReplicants": {
+    "hot": 1,
+    "_default_tier" : 1
+  }
 }
 ```
 
 * `type` - this should always be "loadByPeriod"
 * `period` - A JSON Object representing ISO-8601 Periods
-* `tier` - the configured historical node tier
+* `tieredReplicants` - A JSON Object where the keys are the tier names and values are the number of replicas for that tier.
 
 The interval of a segment will be compared against the specified period. The rule matches if the period overlaps the interval.
 
@@ -48,6 +76,21 @@ Drop Rules
 ----------
 
 Drop rules indicate when segments should be dropped from the cluster.
 
+### Forever Drop Rule
+
+Forever drop rules are of the form:
+
+```json
+{
+  "type" : "dropForever"
+}
+```
+
+* `type` - this should always be "dropForever"
+
+All segments that match this rule are dropped from the cluster.
+
+
 ### Interval Drop Rule
 
 Interval drop rules are of the form:
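+
+A sketch of the likely shape, assuming the interval drop rule mirrors the interval load rule above with a `dropByInterval` type (the interval value is illustrative):
+
+```json
+{
+  "type" : "dropByInterval",
+  "interval" : "2012-01-01/2013-01-01"
+}
+```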