refactored hadoop intro: the replacement instructions page also wasn't right, and there are quite a few correct ones available

Igal Levy 2014-02-21 10:43:52 -08:00
parent 0e3224bd1e
commit 21299a6f81
1 changed file with 2 additions and 2 deletions

@@ -205,9 +205,9 @@ Issuing a [TimeBoundaryQuery](TimeBoundaryQuery.html) to the real-time node shou
Batch Ingestion
---------------
-Druid is designed for large data volumes, and most real-world data sets require batch indexing be done through a Hadoop job.
+Druid is designed for large data volumes, and most real-world data sets require batch indexing be done through a Hadoop job.
-The setup for a single node, 'standalone' Hadoop cluster is available [here](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html).
+For this tutorial, we used [Hadoop 1.0.3](https://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/). There are many pages on the Internet showing how to set up a single-node (standalone) Hadoop cluster, which is all that's needed for this example.
For the purposes of this tutorial, we are going to use our very small and simple Wikipedia data set. This data can directly be ingested via other means as shown in the previous [tutorial](Tutorial%3A-Loading-Your-Data-Part-1.html), but we are going to use Hadoop here for demonstration purposes.
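
For reference, the single-node setup the revised intro alludes to is short. Below is a minimal sketch of a pseudo-distributed Hadoop 1.0.3 install in the spirit of the Apache Single Node Setup guide; the download filename, JDK path, and the port/replication values are stock defaults and illustrative assumptions, not anything this commit specifies:

```bash
# Sketch only: single-node ("pseudo-distributed") Hadoop 1.0.3.
# Fetch and unpack a release from the archive linked in the docs
# (assumed tarball name; check the directory listing).
wget https://archive.apache.org/dist/hadoop/core/hadoop-1.0.3/hadoop-1.0.3.tar.gz
tar -xzf hadoop-1.0.3.tar.gz && cd hadoop-1.0.3

# Point Hadoop at a JDK in conf/hadoop-env.sh (path is an assumption).
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

# Edit the Hadoop 1.x config files with the usual single-node values:
#   conf/core-site.xml   -> fs.default.name  = hdfs://localhost:9000
#   conf/hdfs-site.xml   -> dfs.replication  = 1
#   conf/mapred-site.xml -> mapred.job.tracker = localhost:9001

# Format HDFS once, then start the NameNode, DataNode,
# JobTracker, and TaskTracker daemons.
bin/hadoop namenode -format
bin/start-all.sh

# Sanity check: list the (empty) HDFS root.
bin/hadoop fs -ls /
```

With the daemons running, the tutorial's small Wikipedia data set can be batch-indexed through a Hadoop job as the new intro text describes.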