From 08d00cc80fe6d70bfc7e919a2f7f9be03a79d33e Mon Sep 17 00:00:00 2001
From: fjy
Date: Thu, 2 Jul 2015 17:57:10 -0700
Subject: [PATCH] rework the realtime examples a bit; add more faq

---
 docs/content/ingestion/faq.md                | 14 ++++++++++++++
 docs/content/ingestion/realtime-ingestion.md |  4 ++--
 docs/content/operations/recommendations.md   |  4 ----
 examples/config/realtime/runtime.properties  |  6 ++++++
 4 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/docs/content/ingestion/faq.md b/docs/content/ingestion/faq.md
index 08f94ad1b3f..e832eadc980 100644
--- a/docs/content/ingestion/faq.md
+++ b/docs/content/ingestion/faq.md
@@ -2,6 +2,20 @@
 layout: doc_page
 ---
 
+## My Data isn't being loaded
+
+### Realtime Ingestion
+
+If you are trying to stream in historical (not current time) data into Druid and you are using the [serverTime](../ingestion/realtime-ingestion.html) rejection policy in your ingestion spec (the default rejection policy), Druid will not ingest this data as it is outside of the acceptable window period. You can verify this is what is happening by looking at the logs of your real-time process for log lines containing "ingest/events/*". These metrics will indicate the events ingested, rejected, etc. We recommend using batch ingestion methods for historical data in production.
+
+If you are doing a POC, you can use the [messageTime](../ingestion/realtime-ingestion.html) rejection policy, but please be aware of the hand-off caveats. This rejection policy is not recommended in production.
+
+If you are experimenting with realtime ingestion, you can also use the [none](../ingestion/realtime-ingestion.html) rejection policy to load all incoming events, but hand-off will never occur.
+
+### Batch Ingestion
+
+If you are trying to batch load historical data but no events are being loaded, make sure the interval of your ingestion spec actually encapsulates the interval of your data. Events outside this interval are dropped.
+
 ## What types of data does Druid support?
 
 Druid can ingest JSON, CSV, TSV and other delimited data out of the box. Druid supports single dimension values, or multiple dimension values (an array of strings). Druid supports long and float numeric columns.
diff --git a/docs/content/ingestion/realtime-ingestion.md b/docs/content/ingestion/realtime-ingestion.md
index 6a499615bb1..33f3a6d2391 100644
--- a/docs/content/ingestion/realtime-ingestion.md
+++ b/docs/content/ingestion/realtime-ingestion.md
@@ -148,8 +148,8 @@ The tuningConfig is optional and default parameters will be used if no tuningCon
 The following policies are available:
 
 * `serverTime` – The recommended policy for "current time" data, it is optimal for current data that is generated and ingested in real time. Uses `windowPeriod` to accept only those events that are inside the window looking forward and back.
-* `messageTime` – Can be used for non-"current time" as long as that data is relatively in sequence. Events are rejected if they are less than `windowPeriod` from the event with the latest timestamp. Hand off only occurs if an event is seen after the segmentGranularity and `windowPeriod`.
-* `none` – Never hands off data unless shutdown() is called on the configured firehose.
+* `messageTime` – Can be used for non-"current time" as long as that data is relatively in sequence. Events are rejected if they are less than `windowPeriod` from the event with the latest timestamp. Hand off only occurs if an event is seen after the segmentGranularity and `windowPeriod` (hand off will not periodically occur unless you have a constant stream of data).
+* `none` – All events are accepted. Never hands off data unless shutdown() is called on the configured firehose.
 
 #### Sharding
 
diff --git a/docs/content/operations/recommendations.md b/docs/content/operations/recommendations.md
index df1d01a5b1c..670266fea10 100644
--- a/docs/content/operations/recommendations.md
+++ b/docs/content/operations/recommendations.md
@@ -9,10 +9,6 @@ Recommendations
 
 We recommend using UTC timezone for all your events and across on your nodes, not just for Druid, but for all data infrastructure. This can greatly mitigate potential query problems with inconsistent timezones.
 
-# Use Lowercase Strings for Column Names
-
-Druid is not perfect in how it handles mix-cased dimension and metric names. This will hopefully change very soon but for the time being, lower casing your column names is recommended.
-
 # SSDs
 
 SSDs are highly recommended for historical and real-time nodes if you are not running a cluster that is entirely in memory. SSDs can greatly mitigate the time required to page data in and out of memory.
diff --git a/examples/config/realtime/runtime.properties b/examples/config/realtime/runtime.properties
index 2e89c391478..8c0dda16de0 100644
--- a/examples/config/realtime/runtime.properties
+++ b/examples/config/realtime/runtime.properties
@@ -26,5 +26,11 @@ druid.service=realtime
 druid.processing.buffer.sizeBytes=100000000
 druid.processing.numThreads=1
 
+# Override emitter to print logs about events ingested, rejected, etc
+druid.emitter=logging
+
 # Enable Real monitoring
+druid.monitoring.monitors=["io.druid.segment.realtime.RealtimeMetricsMonitor"]
+# Enable all monitors
 # druid.monitoring.monitors=["com.metamx.metrics.SysMonitor","com.metamx.metrics.JvmMonitor","io.druid.segment.realtime.RealtimeMetricsMonitor"]
+
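Reviewer note: the three rejection policies described in the realtime-ingestion.md hunk can be illustrated with a minimal Python sketch. This is not Druid's implementation; the function and class names (`accepts_server_time`, `MessageTimePolicy`, `accepts_none`) are hypothetical, and the `messageTime` rule reflects one reading of the doc text, namely that an event is rejected when it is more than `windowPeriod` older than the latest timestamp seen so far.

```python
from datetime import datetime, timedelta


def accepts_server_time(event_time, now, window):
    """serverTime (sketch): keep events within `window` of the server clock,
    looking both forward and back."""
    return abs(event_time - now) <= window


class MessageTimePolicy:
    """messageTime (sketch): keep events that are no more than `window`
    behind the latest event timestamp seen so far (assumed interpretation)."""

    def __init__(self, window):
        self.window = window
        self.latest = None  # latest event timestamp observed

    def accepts(self, event_time):
        # Advance the high-water mark before testing the event against it.
        if self.latest is None or event_time > self.latest:
            self.latest = event_time
        return event_time >= self.latest - self.window


def accepts_none(event_time):
    """none (sketch): every event is accepted."""
    return True
```

Under these assumptions, a 2014 event replayed in July 2015 is rejected by the `serverTime` sketch but accepted by the `messageTime` sketch as long as the replayed stream stays roughly in timestamp order, which matches the FAQ advice added by this patch.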