mirror of https://github.com/apache/druid.git
commit
484b91db5a
|
@ -17,10 +17,7 @@ You can provision individual servers, loading Druid onto each machine (or buildi
|
|||
You'll need an AWS account, S3 Bucket and an EC2 key pair from that account so that Whirr can connect to the cloud via the EC2 API. If you haven't generated a key pair, see the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) or see this [Whirr FAQ](http://whirr.apache.org/faq.html#how-do-i-find-my-cloud-credentials).
|
||||
|
||||
|
||||
### Installing Whirr
|
||||
You must use a version of Whirr that includes and supports a Druid recipe. You can do it so in one of two ways:
|
||||
|
||||
#### Build the Following Version of Whirr
|
||||
### Install Whirr
|
||||
Clone the code from [https://github.com/druid-io/whirr](https://github.com/druid-io/whirr) and build Whirr:
|
||||
|
||||
git clone git@github.com:druid-io/whirr.git
|
||||
|
@ -28,23 +25,23 @@ Clone the code from [https://github.com/druid-io/whirr](https://github.com/druid
|
|||
git checkout trunk
|
||||
mvn clean install -Dmaven.test.failure.ignore=true
|
||||
|
||||
#### Build the Latest Version of Whirr
|
||||
Clone the code from the Whirr repository:
|
||||
In order to run the test below, you'll also need two files that available only from a [standard install of Druid](http://druid.io/downloads.html) or the [Druid repo](https://github.com/metamx/druid/tree/master/examples/bin/examples):
|
||||
|
||||
git clone git://git.apache.org/whirr.git
|
||||
* `druid/examples/bin/examples/wikipedia/wikipedia_realtime.spec`
|
||||
* `druid/examples/bin/examples/indexing/wikipedia_realtime_task.json`
|
||||
|
||||
Then run `mvn install` from the root directory.
|
||||
|
||||
### Configure Whirr
|
||||
The Whirr recipe for Druid is the configuration file `$WHIRR_HOME/recipies/druid.properties`. You can edit this file to suit your needs -- it is annotated and self-explanatory. Here are some hints about that file:
|
||||
The Whirr recipe for Druid is the configuration file `$WHIRR_HOME/recipies/druid.properties`. You can edit this file to suit your needs; it is annotated and self-explanatory. Here are some hints about that file:
|
||||
|
||||
* Set `whirr.location-id` to a specific AWS region (e.g., us-east-1) if desired, else one will be chosen for you.
|
||||
* Set `whirr.location-id` to a specific AWS region if desired. If this is left blank, a region is chosen for you. The default value is `us-east-1`.
|
||||
* You can choose the hardware used with `whirr.hardware-id` to a specific instance type (e.g., m1.large). By default druid.properties, m3.2xlarge (broker, historical, middle manager), m1.xlarge (coordinator, overlord), and m1.small (zookeeper, mysql) are used.
|
||||
* If you don't choose an image via `whirr.image-id` (image must be compatible with hardware), you'll get plain vanilla Linux. Default druid.properties uses ami-018c9568 (Ubuntu 12.04).
|
||||
* SSH keys (not password protected) must exist for the local user. If they are in the default locations, `${sys:user.home}/.ssh/id_rsa` and `${sys:user.home}/.ssh/id_rsa.pub`, Whirr will find them. Otherwise, you'll have to specify them with `whirr.private-key-file` and `whirr.public-key-file`.
|
||||
* Be sure to specify the absolute path of the Druid realtime spec file `realtime.spec` in `whirr.druid.realtime.spec.path`.
|
||||
* Also make sure to specify the correct S3 bucket. Otherwise the cluster won't be able to process tasks.
|
||||
* Two Druid cluster templates (see `whirr.instance-templates`) are provided: a small cluster running on a single EC2 instance, and a larger cluster running on multiple instances.
|
||||
* You must specify the path to an S3 bucket. Otherwise the cluster won't be able to process tasks.
|
||||
* To successfully submit the test task below, you'll need to specify the location of the `wikipedia_realtime.spec` in the property `whirr.druid.realtime.spec.path`.
|
||||
* Specify Druid version only if [Druid extenions](Modules.html) are being used.
|
||||
|
||||
The following AWS information must be set in `druid.properties`, as environment variables, or in the file `$WHIRR_HOME/conf/credentials`:
|
||||
|
||||
|
@ -54,7 +51,7 @@ The following AWS information must be set in `druid.properties`, as environment
|
|||
|
||||
How to get the IDENTITY and CREDENTIAL keys is discussed above.
|
||||
|
||||
In order to configure each node, you can edit `services/druid/src/main/resources/functions/start_druid.sh` for JVM configuration and `services/druid/src/main/resources/functions/configure_[NODE_NAME].sh` for specific node configuration. For more information on configuration, read the Druid documentations about it (http://druid.io/docs/0.6.116/Configuration.html).
|
||||
In order to configure each node, you can edit `services/druid/src/main/resources/functions/start_druid.sh` for JVM configuration and `services/druid/src/main/resources/functions/configure_[NODE_NAME].sh` for specific node configuration. For more information on configuration, see the [Druid configuration documentation](Configuration.html).
|
||||
|
||||
### Start a Test Cluster With Whirr
|
||||
Run the following command:
|
||||
|
@ -66,14 +63,14 @@ If Whirr starts without any errors, you should see the following message:
|
|||
|
||||
Running on provider aws-ec2 using identity <your-aws-id-here>
|
||||
|
||||
You can then use the EC2 dashboard to locate the instance and confirm that it has started up.
|
||||
You can then use the EC2 dashboard to locate the instances and confirm that they have started up.
|
||||
|
||||
If both the instance and the Druid cluster launch successfully, a few minutes later other messages to STDOUT should follow with information returned from EC2, including the instance ID:
|
||||
If both the instances and the Druid cluster launch successfully, a few minutes later other messages to STDOUT should follow with information returned from EC2, including the instance ID:
|
||||
|
||||
Started cluster of 1 instances
|
||||
Started cluster of 8 instances
|
||||
Cluster{instances=[Instance{roles=[zookeeper, druid-mysql, druid-coordinator, druid-broker, druid-historical, druid-realtime], publicIp= ...
|
||||
|
||||
The final message will contain login information for the instance.
|
||||
The final message will contain login information for the instances.
|
||||
|
||||
Note that Whirr will return an exception if any of the nodes fail to launch, and the cluster will be destroyed. To destroy the cluster manually, run the following command:
|
||||
|
||||
|
@ -85,9 +82,11 @@ Note that Whirr will return an exception if any of the nodes fail to launch, and
|
|||
Now you can run an indexing task and a simple query to see if all the nodes have launched correctly. We are going to use a Wikipedia example again. For a realtime indexing task, run the following command:
|
||||
|
||||
```bash
|
||||
curl -X 'POST' -H 'Content-Type:application/json' -d @#{YOUR_DRUID_DIRECTORY}/examples/indexing/wikipedia_realtime_task.json #{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/druid/indexer/v1/task
|
||||
curl -X 'POST' -H 'Content-Type:application/json' -d @#{PATH_TO}/wikipedia_realtime_task.json #{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/druid/indexer/v1/task
|
||||
```
|
||||
Issuing the request should return a task ID.
|
||||
where OVERLORD_PUBLIC_IP_ADDR should be available from the EC2 information logged to STDOUT, the Overlord port is 8080 by default, and `wikipedia_realtime_task.json` is discussed above.
|
||||
|
||||
Issuing this request should return a task ID.
|
||||
|
||||
To check the state of the overlord, open up your browser and go to `#{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/console.html`.
|
||||
|
||||
|
|
Loading…
Reference in New Issue