more updates to docs

fjy 2014-10-21 16:26:17 -07:00
parent 2d96bc5f1f
commit ee392b6064
4 changed files with 21 additions and 18 deletions


@@ -1,6 +1,11 @@
---
layout: doc_page
---
## What types of data does Druid support?
Druid can ingest JSON, CSV, TSV, and other delimited data out of the box. Druid supports single-value dimensions and multi-value dimensions (an array of strings), as well as long and float numeric columns.
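A minimal Python sketch of what single-value and multi-value dimensions look like in a JSON event. The event, its field names, and the helper `dimension_kind` are hypothetical illustrations, not part of Druid. The helper only checks the shape described above: a plain string or an array of strings.

```python
import json

# Hypothetical event: "page" is a single-value dimension, "tags" is a
# multi-value dimension (an array of strings), and "added" is a numeric
# field that would normally be ingested as a long metric column.
EVENT = json.loads(
    '{"timestamp": "2014-10-21T00:00:00Z", "page": "Druid",'
    ' "tags": ["database", "analytics"], "added": 57}'
)


def dimension_kind(value):
    """Classify a raw value as a 'single' or 'multi' string dimension, else None."""
    if isinstance(value, str):
        return "single"
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return "multi"
    return None  # neither a string nor a list of strings


for field in ("page", "tags", "added"):
    print(field, dimension_kind(EVENT[field]))
```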
## Where do my Druid segments end up after ingestion?
Druid uploads segments to the [Deep Storage](Deep-Storage.html) selected by `druid.storage.type`. If this property is not set, local disk is used as the default deep storage.
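As a rough illustration of that lookup, here is a Python sketch that reads `druid.storage.type` from the text of a `runtime.properties` file and falls back to `local` when the property is absent. The helper `deep_storage_type` is hypothetical and deliberately simplified. It handles only `key=value` lines and ignores `:` separators and line continuations, so it is not a full Java properties parser. The example values `s3` and `druid.storage.baseKey=sample` are assumptions for illustration.

```python
def deep_storage_type(properties_text):
    """Return druid.storage.type from runtime.properties text ('local' if unset)."""
    props = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        # Simplified: only key=value lines, no ':' separators or continuations.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props.get("druid.storage.type", "local")


print(deep_storage_type("druid.storage.type=s3\ndruid.storage.baseKey=sample"))
print(deep_storage_type("# no deep storage configured\n"))
```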
@@ -24,7 +29,9 @@ druid.storage.baseKey=sample
Other common reasons that hand-off fails are as follows:
1) Historical nodes are out of capacity and cannot download any more segments. You'll see exceptions in the coordinator logs if this occurs.
2) Segments are corrupt and cannot be downloaded. You'll see exceptions in your historical node logs if this occurs.
3) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the coordinator logs have no errors.
## How do I get HDFS to work?
@@ -41,7 +48,7 @@ You can check the coordinator console located at `<COORDINATOR_IP>:<PORT>/cluste
## My queries are returning empty results
-You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>?interval=0/3000` for the dimensions and metrics that have been created for your datasource. Make sure that each aggregator in your query refers to one of these metrics. Also make sure that the query interval you specify matches a valid time range where data exists. Note: the broker endpoint will only return valid results on historical segments.
+You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>?interval=0/3000` for the dimensions and metrics that have been created for your datasource. Make sure that each aggregator in your query refers to one of these metrics. Also make sure that the query interval you specify matches a valid time range where data exists. Note: the broker endpoint will only return valid results on historical segments, not on segments served by real-time nodes.
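A minimal Python sketch of that check, assuming the broker endpoint returns a JSON object with `dimensions` and `metrics` lists. The helper names, the `localhost:8080` address, and the `wikipedia` datasource are hypothetical. The sketch compares each aggregator's `fieldName` (the column it reads) against the reported metrics. Only `fetch_metadata` touches the network.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def datasource_metadata_url(broker, datasource, interval="0/3000"):
    """Build the broker URL that lists a datasource's dimensions and metrics."""
    return "http://{}/druid/v2/datasources/{}?interval={}".format(
        broker, quote(datasource, safe=""), interval)


def fetch_metadata(broker, datasource):
    """GET the metadata; assumes a JSON body with 'dimensions' and 'metrics'."""
    with urlopen(datasource_metadata_url(broker, datasource)) as resp:
        return json.load(resp)


def unknown_aggregator_fields(query, metadata):
    """Return aggregator fieldNames in the query that are not known metrics."""
    metrics = set(metadata.get("metrics", []))
    return [agg["fieldName"] for agg in query.get("aggregations", [])
            if "fieldName" in agg and agg["fieldName"] not in metrics]


print(datasource_metadata_url("localhost:8080", "wikipedia"))
```

An empty list from `unknown_aggregator_fields` means every aggregator points at an existing metric. A non-empty list names the fields worth double-checking before looking at the query interval.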
## How can I reindex existing data in Druid with schema changes?


@@ -2,8 +2,8 @@
layout: doc_page
---
-Best Practices
-==============
+Recommendations
+===============
# Use UTC Timezone
@@ -17,12 +17,19 @@ Druid is not perfect in how it handles mix-cased dimension and metric names. Thi
SSDs are highly recommended for historical and real-time nodes if you are not running a cluster that is entirely in memory. SSDs can greatly reduce the time required to page data in and out of memory.
-# Provide Column Names in Lexicographic Order for Best Results
+# Provide Column Names in Lexicographic Order
-Although Druid supports schemaless ingestion of dimensions, because of https://github.com/metamx/druid/issues/658, you may sometimes get bigger segments than necessary. To ensure segments are as compact as possible, providing dimension names in lexicographic order is recommended. This may, however, require some ETL processing on your data.
+Although Druid supports schema-less ingestion of dimensions, because of [https://github.com/metamx/druid/issues/658](https://github.com/metamx/druid/issues/658), you may sometimes get bigger segments than necessary. To ensure segments are as compact as possible, providing dimension names in lexicographic order is recommended.
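If your pipeline emits JSON, one way to do that ordering is a small ETL step that re-serializes each event with sorted keys. This is a minimal Python sketch under the assumption that plain case-sensitive code-point order is what "lexicographic" means here. The helpers `sort_event_keys` and `sorted_dimensions` and the sample field names are hypothetical. Note that uppercase names sort before lowercase ones under this order.

```python
import json


def sort_event_keys(line):
    """Re-serialize one JSON event with its keys in lexicographic order."""
    return json.dumps(json.loads(line), sort_keys=True)


def sorted_dimensions(dimensions):
    """Return dimension names sorted by Unicode code point (lexicographic order)."""
    return sorted(dimensions)


print(sort_event_keys('{"page": "Druid", "country": "US", "timestamp": "2014-10-21"}'))
print(sorted_dimensions(["page", "country", "Language"]))
```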
# Use Timeseries and TopN Queries Instead of GroupBy Where Possible
Timeseries and TopN queries are much more optimized and significantly faster than groupBy queries for their designed use cases. Issuing multiple topN or timeseries queries from your application can potentially be more efficient than a single groupBy query.
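For example, "top 10 pages by edits" can be sent as a topN query rather than a groupBy over `page` followed by a sort and limit. This Python sketch only builds the JSON body. The datasource `wikipedia`, the dimension `page`, the metric `added`, and the interval are hypothetical, as is the helper `top_n_query`. The field names follow the standard topN query shape.

```python
import json


def top_n_query(datasource, dimension, metric, threshold, interval):
    """Build a topN query body: top `threshold` values of `dimension` by `metric`."""
    return {
        "queryType": "topN",
        "dataSource": datasource,
        "dimension": dimension,
        "threshold": threshold,
        # The metric to rank by; it must match an aggregator name below.
        "metric": metric,
        "granularity": "all",
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric},
        ],
        "intervals": [interval],
    }


print(json.dumps(top_n_query("wikipedia", "page", "added", 10,
                             "2014-10-01/2014-10-22"), indent=2))
```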
# Read FAQs
Read about the common problems people run into:
1) [Ingestion-FAQ](Ingestion-FAQ.html)
2) [Performance-FAQ](Performance-FAQ.html)


@@ -37,17 +37,6 @@ When Druid?
* You want to do your analysis on data as it's happening (in real-time)
* You need a data store that is always available, 24x7x365, and years into the future.
-Not Druid?
------------
-* The amount of data you have can easily be handled by MySQL
-* You're querying for individual entries or doing lookups (not analytics)
-* Batch ingestion is good enough
-* Canned queries are good enough
-* Downtime is no big deal
Druid vs…
----------
@@ -60,7 +49,7 @@ Druid vs…
About This Page
----------
-The data store world is vast, confusing and constantly in flux. This page is meant to help potential evaluators decide whether Druid is a good fit for the problem they need to solve. If anything on this page is incorrect, please provide feedback on the mailing list or via some other means so we can fix it.
+The data infrastructure world is vast, confusing and constantly in flux. This page is meant to help potential evaluators decide whether Druid is a good fit for the problem they need to solve. If anything on this page is incorrect, please provide feedback on the mailing list or via some other means so we can fix it.


@@ -19,7 +19,7 @@ h2. Booting a Druid Cluster
* "Production Cluster Configuration":Production-Cluster-Configuration.html
* "Production Hadoop Configuration":Hadoop-Configuration.html
* "Rolling Cluster Updates":Rolling-Updates.html
-* "Best Practices":Best-Practices.html
+* "Recommendations":Recommendations.html
h2. Configuration
* "Common Configuration":Configuration.html