---
layout: doc_page
---

## What types of data does Druid support?

Druid can ingest JSON, CSV, TSV, and other delimited data out of the box. Dimensions can be single-valued or multi-valued (an array of strings), and numeric columns can be of type long or float.
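
To make the single-value versus multi-value distinction concrete, here is a minimal sketch that writes one event as a JSON line and as a TSV line. The column names (`page`, `tags`, `added`) are hypothetical. The `|` list delimiter is also an assumption: in delimited formats, a multi-value dimension has to be flattened with whatever list delimiter your parser is configured to split on.

```python
import json

def to_json_line(event):
    """Serialize one event as a single JSON line; lists stay JSON arrays."""
    return json.dumps(event, separators=(",", ":"))

def to_tsv_line(event, columns, list_delimiter="|"):
    """Serialize one event as a TSV line in the given column order.

    A multi-value dimension (a Python list) is joined with list_delimiter,
    which must match the list delimiter your parser is configured to use.
    """
    fields = []
    for column in columns:
        value = event[column]
        if isinstance(value, list):
            fields.append(list_delimiter.join(value))
        else:
            fields.append(str(value))
    return "\t".join(fields)

event = {
    "timestamp": "2014-01-01T00:00:00Z",
    "page": "Druid",          # single-value dimension
    "tags": ["docs", "faq"],  # multi-value dimension
    "added": 10,              # numeric (long) column
}
print(to_json_line(event))
print(to_tsv_line(event, ["timestamp", "page", "tags", "added"]))
```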

## Where do my Druid segments end up after ingestion?

Depending on what `druid.storage.type` is set to, Druid uploads segments to the corresponding [Deep Storage](Deep-Storage.html). Local disk is the default deep storage.

## My real-time node is not handing segments off

Make sure that `druid.publish.type` on your real-time nodes is set to "metadata". Also make sure that `druid.storage.type` is set to a deep storage that your historical nodes can also read from. Some example configs:

```
druid.publish.type=metadata

druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://localhost\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=diurd

druid.storage.type=s3
druid.storage.bucket=druid
druid.storage.baseKey=sample
```
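
As a quick sanity check before digging through logs, the settings above can be checked mechanically. This is a minimal sketch that assumes your node reads a plain `runtime.properties` file. It parses only simple `key=value` lines (not the full Java properties syntax, so there are no line continuations, and an escape such as `\t` is not turned into a tab). The warning texts and the S3 bucket check are illustrative and are not part of any Druid tool.

```python
import re

def parse_properties(text):
    """Parse simple key=value lines (a subset of the Java .properties format)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # Drop backslash escapes such as "\:" so "jdbc\:mysql" becomes "jdbc:mysql".
        props[key.strip()] = re.sub(r"\\(.)", r"\1", value.strip())
    return props

def check_handoff_config(props):
    """Return warnings for settings that commonly block hand-off."""
    warnings = []
    if props.get("druid.publish.type") != "metadata":
        warnings.append("druid.publish.type should be 'metadata'")
    storage = props.get("druid.storage.type", "local")  # local disk is the default
    if storage == "local":
        warnings.append("local deep storage only works if all nodes share one filesystem")
    elif storage == "s3" and "druid.storage.bucket" not in props:
        warnings.append("S3 deep storage needs druid.storage.bucket")
    return warnings

sample = """
druid.publish.type=metadata
druid.storage.type=s3
druid.storage.bucket=druid
"""
print(check_handoff_config(parse_properties(sample)))
```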

Other common reasons that hand-off fails are as follows:

1) Historical nodes are out of capacity and cannot download any more segments. You'll see exceptions in the coordinator logs if this occurs.

2) Segments are corrupt and cannot be downloaded. You'll see exceptions in your historical node logs if this occurs.

3) Deep storage is improperly configured. Make sure that your segment actually exists in deep storage and that the coordinator logs have no errors.

## How do I get HDFS to work?

Make sure to include the `druid-hdfs-storage` module as one of your extensions and set `druid.storage.type=hdfs`.

## I don't see my Druid segments on my historical nodes

You can check the coordinator console located at `<COORDINATOR_IP>:<PORT>/cluster.html`. Make sure that your segments have actually loaded on [historical nodes](Historical.html). If your segments are not present, check the coordinator logs for messages about capacity or replication errors. One common reason segments are not downloaded is that the historical nodes' `maxSize` settings are too small, leaving them unable to download more data. You can raise these limits with, for example:

```
-Ddruid.segmentCache.locations=[{"path":"/tmp/druid/storageLocation","maxSize":"500000000000"}]
-Ddruid.server.maxSize=500000000000
```
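
In the example above, both limits are 500000000000 bytes (500 GB). The sketch below generates the two flags from a list of cache locations and sets `druid.server.maxSize` to their total so the values stay in sync. Tying the two values together this way is a convention this sketch assumes, matching the example, and is not a Druid requirement. The sketch writes `maxSize` as a plain JSON number, whereas the example above quotes it as a string. If you pass the flags on a shell command line, quote them (for example with `shlex.quote`) so the shell does not interpret the brackets and double quotes.

```python
import json

def segment_cache_flags(locations):
    """Build historical-node JVM flags from (path, max_bytes) pairs.

    druid.server.maxSize is set to the total of all cache locations,
    mirroring the single-location example where both values are equal.
    """
    cache = [{"path": path, "maxSize": max_bytes} for path, max_bytes in locations]
    total = sum(max_bytes for _, max_bytes in locations)
    return [
        "-Ddruid.segmentCache.locations=" + json.dumps(cache, separators=(",", ":")),
        "-Ddruid.server.maxSize=%d" % total,
    ]

for flag in segment_cache_flags([("/tmp/druid/storageLocation", 500000000000)]):
    print(flag)
```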

## My queries are returning empty results

You can check `<BROKER_IP>:<PORT>/druid/v2/datasources/<YOUR_DATASOURCE>?interval=0/3000` for the dimensions and metrics that have been created for your datasource. Make sure that the metric each aggregator in your query reads (its `fieldName`) matches one of these metrics. Also make sure that the query interval you specify matches a time range where data actually exists. Note: this broker endpoint only reports on segments served by historical nodes, not on segments served by real-time nodes.
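
The check above can be scripted. This minimal sketch assumes that the broker endpoint returns a JSON object with `dimensions` and `metrics` lists. The host, port 8080, datasource `wikipedia`, and aggregator names are all hypothetical. `fetch_metadata` performs a real HTTP GET and is not called below. The comparison itself runs on a metadata dict you have already retrieved.

```python
import json
import urllib.parse
import urllib.request

def datasource_metadata_url(host, port, datasource, interval="0/3000"):
    """Build the broker URL that lists a datasource's dimensions and metrics."""
    path = "/druid/v2/datasources/" + urllib.parse.quote(datasource, safe="")
    return "http://%s:%d%s?interval=%s" % (host, port, path, interval)

def fetch_metadata(url):
    """GET the metadata from a running broker (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def unknown_metric_fields(aggregators, metadata):
    """Return aggregator fieldNames that are not metrics of the datasource."""
    metrics = set(metadata.get("metrics", []))
    return [agg["fieldName"] for agg in aggregators
            if "fieldName" in agg and agg["fieldName"] not in metrics]

metadata = {"dimensions": ["page"], "metrics": ["count", "added"]}
aggregators = [
    {"type": "longSum", "name": "edits", "fieldName": "count"},
    {"type": "doubleSum", "name": "added", "fieldName": "add"},  # typo: no such metric
]
print(datasource_metadata_url("localhost", 8080, "wikipedia"))
print(unknown_metric_fields(aggregators, metadata))
```

An aggregator's `name` is only the output label, so a query can use any `name` it likes. The `fieldName` is what has to exist in the datasource.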

## How can I reindex existing data in Druid with schema changes?

You can use the IngestSegmentFirehose with an index task to re-ingest existing Druid segments under a new schema, changing the datasource name, dimensions, metrics, rollup, and so on.
See [Firehose](Firehose.html) for more details on IngestSegmentFirehose.
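
As a rough illustration, this sketch builds only the firehose portion of such a task as a dict. The field names (`type: "ingestSegment"`, `dataSource`, `interval`, `dimensions`, `metrics`, `filter`) are our reading of the IngestSegmentFirehose spec, and the `not`/`selector` filter follows Druid's filter syntax. Verify all of them against the Firehose page for your Druid version. The datasource, interval, and column names are hypothetical. The dict would go wherever your index task expects its firehose. The rest of the task JSON differs between versions and is not shown.

```python
import json

def ingest_segment_firehose(datasource, interval, dimensions=None, metrics=None, row_filter=None):
    """Build an IngestSegmentFirehose spec as a plain dict.

    Optional fields that are None are left out of the spec entirely.
    """
    spec = {"type": "ingestSegment", "dataSource": datasource, "interval": interval}
    if dimensions is not None:
        spec["dimensions"] = dimensions  # dimensions to carry into the new schema
    if metrics is not None:
        spec["metrics"] = metrics  # metric columns to read from the old segments
    if row_filter is not None:
        spec["filter"] = row_filter  # Druid filter applied while reading rows
    return spec

# Hypothetical: keep two dimensions and drop every row where page == "Sandbox".
firehose = ingest_segment_firehose(
    "wikipedia",
    "2014-01-01/2014-02-01",
    dimensions=["page", "language"],
    metrics=["count", "added"],
    row_filter={"type": "not",
                "field": {"type": "selector", "dimension": "page", "value": "Sandbox"}},
)
print(json.dumps(firehose, indent=2))
```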

## How can I change the granularity of existing data in Druid?

In many situations you may want to lower the granularity of older data. For example, data older than one month is kept at hour-level granularity while newer data keeps minute-level granularity.

To do this, run an index task that uses the IngestSegmentFirehose. The firehose reads existing segments from Druid so the task can re-aggregate them and write them back into Druid. It can also filter the data in those segments as it is read, so any rows you want to delete can simply be filtered out during re-ingestion.

Typically this runs as a periodic batch job, for example once a day, that re-ingests and aggregates one chunk of data.
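
To show what re-aggregating does to the data, here is a conceptual model in plain Python. It is not Druid's implementation. The sketch truncates each minute-level timestamp to the hour, groups rows by (hour, dimension values), and sums one metric. Rows rejected by `keep` are dropped, which mirrors filtering during re-ingestion. The column names and the "spam" filter are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def reaggregate_hourly(rows, dims, metric, keep=lambda row: True):
    """Roll minute-level rows up to hour-level rows, summing one metric."""
    totals = defaultdict(int)
    for row in rows:
        if not keep(row):
            continue  # filtered rows never reach the new segments
        hour = row["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(row[d] for d in dims)
        totals[key] += row[metric]
    return [dict(zip(["timestamp"] + dims, key), **{metric: total})
            for key, total in sorted(totals.items())]

rows = [
    {"timestamp": datetime(2014, 9, 1, 10, 5), "page": "A", "count": 1},
    {"timestamp": datetime(2014, 9, 1, 10, 45), "page": "A", "count": 2},
    {"timestamp": datetime(2014, 9, 1, 11, 0), "page": "A", "count": 4},
    {"timestamp": datetime(2014, 9, 1, 10, 30), "page": "spam", "count": 100},
]
hourly = reaggregate_hourly(rows, ["page"], "count", keep=lambda r: r["page"] != "spam")
for row in hourly:
    print(row)
```

Four minute-level rows become two hour-level rows. This is the storage saving that lowering granularity buys, at the cost of sub-hour detail.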

## Real-time ingestion seems to be stuck

There are a few ways this can occur. Druid throttles ingestion to prevent out-of-memory problems if intermediate persists or hand-off are taking too long. If your node logs indicate that certain columns take a very long time to build (for example, your segment granularity is hourly but creating a single column takes 30 minutes), you should re-evaluate your configuration or scale up your real-time ingestion.

## More information

Getting data into Druid can be difficult for first-time users. Please don't hesitate to ask questions in our IRC channel or on our [Google Groups page](https://groups.google.com/forum/#!forum/druid-development).