From ee392b6064c7ec04783f348de068cf513249d65b Mon Sep 17 00:00:00 2001
From: fjy
Date: Tue, 21 Oct 2014 16:26:17 -0700
Subject: [PATCH] more updates to docs

---
 docs/content/Ingestion-FAQ.md                  |  9 ++++++++-
 .../{Best-Practices.md => Recommendations.md}  | 15 +++++++++++----
 docs/content/index.md                          | 13 +------------
 docs/content/toc.textile                       |  2 +-
 4 files changed, 21 insertions(+), 18 deletions(-)
 rename docs/content/{Best-Practices.md => Recommendations.md} (53%)

diff --git a/docs/content/Ingestion-FAQ.md b/docs/content/Ingestion-FAQ.md
index ecf6b2ccdac..972e62a6a23 100644
--- a/docs/content/Ingestion-FAQ.md
+++ b/docs/content/Ingestion-FAQ.md
@@ -1,6 +1,11 @@
 ---
 layout: doc_page
 ---
+
+## What types of data does Druid support?
+
+Druid can ingest JSON, CSV, TSV, and other delimited data out of the box. Druid supports single dimension values or multiple dimension values (an array of strings), as well as long and float numeric columns.
+
 ## Where do my Druid segments end up after ingestion?
 
 Depending on what `druid.storage.type` is set to, Druid will upload segments to some [Deep Storage](Deep-Storage.html). Local disk is used as the default deep storage.
@@ -24,7 +29,9 @@ druid.storage.baseKey=sample
 Other common reasons that hand-off fails are as follows:
 
 1) Historical nodes are out of capacity and cannot download any more segments. You'll see exceptions in the coordinator logs if this occurs.
+
 2) Segments are corrupt and cannot download. You'll see exceptions in your historical nodes if this occurs.
+
 3) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the coordinator logs have no errors.
 
 ## How do I get HDFS to work?
@@ -41,7 +48,7 @@ You can check the coordinator console located at `:/cluste
 
 ## My queries are returning empty results
 
-You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>?interval=0/3000` for the dimensions and metrics that have been created for your datasource. Make sure that the name of the aggregators you use in your query match one of these metrics. Also make sure that the query interval you specify match a valid time range where data exists. Note: the broker endpoint will only return valid results on historical segments.
+You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>?interval=0/3000` for the dimensions and metrics that have been created for your datasource. Make sure that the names of the aggregators you use in your query match one of these metrics, and that the query interval you specify matches a valid time range where data exists. Note: the broker endpoint will only return valid results on historical segments and not segments served by real-time nodes.
 
 ## How can I Reindex existing data in Druid with schema changes?
 
diff --git a/docs/content/Best-Practices.md b/docs/content/Recommendations.md
similarity index 53%
rename from docs/content/Best-Practices.md
rename to docs/content/Recommendations.md
index 9688a52f966..bf764ffe6c2 100644
--- a/docs/content/Best-Practices.md
+++ b/docs/content/Recommendations.md
@@ -2,8 +2,8 @@
 layout: doc_page
 ---
 
-Best Practices
-==============
+Recommendations
+===============
 
 # Use UTC Timezone
 
@@ -17,12 +17,19 @@ Druid is not perfect in how it handles mix-cased dimension and metric names. Thi
 
 SSDs are highly recommended for historical and real-time nodes if you are not running a cluster that is entirely in memory. SSDs can greatly mitigate the time required to page data in and out of memory.
 
-# Provide Columns Names in Lexicographic Order for Best Results
+# Provide Column Names in Lexicographic Order
 
-Although Druid supports schemaless ingestion of dimensions, because of https://github.com/metamx/druid/issues/658, you may sometimes get bigger segments than necessary. To ensure segments are as compact as possible, providing dimension names in lexicographic order is recommended. This may require some ETL processing on your data however.
+Although Druid supports schema-less ingestion of dimensions, because of [https://github.com/metamx/druid/issues/658](https://github.com/metamx/druid/issues/658), you may sometimes get bigger segments than necessary. To ensure segments are as compact as possible, providing dimension names in lexicographic order is recommended.
+
+
+# Use Timeseries and TopN Queries Instead of GroupBy Where Possible
+
+Timeseries and topN queries are much better optimized and significantly faster than groupBy queries for their designed use cases. Issuing multiple topN or timeseries queries from your application can potentially be more efficient than a single groupBy query.
 
 # Read FAQs
 
 You should read common problems people have here:
+
 1) [Ingestion-FAQ](Ingestion-FAQ.html)
+
 2) [Performance-FAQ](Performance-FAQ.html)
\ No newline at end of file
diff --git a/docs/content/index.md b/docs/content/index.md
index 529a2325436..3c236cc81be 100644
--- a/docs/content/index.md
+++ b/docs/content/index.md
@@ -37,17 +37,6 @@ When Druid?
 * You want to do your analysis on data as it’s happening (in real-time)
 * You need a data store that is always available, 24x7x365, and years into the future.
 
-
-Not Druid?
-----------
-
-* The amount of data you have can easily be handled by MySQL
-* You're querying for individual entries or doing lookups (not analytics)
-* Batch ingestion is good enough
-* Canned queries are good enough
-* Downtime is no big deal
-
-
 Druid vs…
 ---------
 
@@ -60,7 +49,7 @@
 About This Page
 ----------
 
-The data store world is vast, confusing and constantly in flux. This page is meant to help potential evaluators decide whether Druid is a good fit for the problem one needs to solve. If anything about it is incorrect please provide that feedback on the mailing list or via some other means so we can fix it.
+The data infrastructure world is vast, confusing and constantly in flux. This page is meant to help potential evaluators decide whether Druid is a good fit for the problem one needs to solve. If anything about it is incorrect please provide that feedback on the mailing list or via some other means so we can fix it.
 
 
diff --git a/docs/content/toc.textile b/docs/content/toc.textile
index 21f867bca36..29c08ab4d0e 100644
--- a/docs/content/toc.textile
+++ b/docs/content/toc.textile
@@ -19,7 +19,7 @@ h2. Booting a Druid Cluster
 * "Production Cluster Configuration":Production-Cluster-Configuration.html
 * "Production Hadoop Configuration":Hadoop-Configuration.html
 * "Rolling Cluster Updates":Rolling-Updates.html
-* "Best Practices":Best-Practices.html
+* "Recommendations":Recommendations.html
 
 h2. Configuration
 * "Common Configuration":Configuration.html
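
To illustrate the "What types of data does Druid support?" answer added above, a minimal sketch of a single input event follows. The field names (`page`, `tags`, `added`, `deleted`) are hypothetical and only stand in for a real schema: `tags` shows a multi-value dimension (an array of strings), while `added` and `deleted` show numeric columns that could be ingested as long or float values.

```json
{
  "timestamp": "2014-10-21T16:26:17Z",
  "page": "some_wiki_page",
  "tags": ["analytics", "olap"],
  "added": 57,
  "deleted": 12.0
}
```

The same record could equally be supplied as a CSV or TSV row, with the multi-value `tags` field expressed using the delimiter configured in the parse spec.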
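
Similarly, a minimal sketch of the kind of topN query recommended above, assuming a hypothetical `wikipedia_edits` datasource with a `page` dimension and an `added` column; it returns the five pages with the highest summed edit count over the interval.

```json
{
  "queryType": "topN",
  "dataSource": "wikipedia_edits",
  "dimension": "page",
  "threshold": 5,
  "metric": "edit_count",
  "granularity": "all",
  "aggregations": [
    { "type": "longSum", "name": "edit_count", "fieldName": "added" }
  ],
  "intervals": ["2014-10-01/2014-10-22"]
}
```

Expressed as a groupBy (`"queryType": "groupBy"` with `"dimensions": ["page"]` and a `limitSpec`), the same question would require materializing every group before sorting and limiting, which is why topN, or timeseries when no dimension is needed, tends to be the cheaper choice.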