From 262d43ae2e20e65a9666fcce1f0a567b397c0aaf Mon Sep 17 00:00:00 2001
From: Igal Levy
Date: Fri, 20 Jun 2014 17:03:21 -0700
Subject: [PATCH] removed build section that didn't work (direct from whirr repo); added info on resource files needed to run test; tweaked whirr config section to match current requirements; changed cluster-start section to reflect fact that we're starting multiple instances; other minor edits

---
 docs/content/Booting-a-production-cluster.md | 39 ++++++++++----------
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/docs/content/Booting-a-production-cluster.md b/docs/content/Booting-a-production-cluster.md
index 77c22ba5f63..235df64a2ad 100644
--- a/docs/content/Booting-a-production-cluster.md
+++ b/docs/content/Booting-a-production-cluster.md
@@ -17,10 +17,7 @@ You can provision individual servers, loading Druid onto each machine (or buildi
 
 You'll need an AWS account, S3 Bucket and an EC2 key pair from that account so that Whirr can connect to the cloud via the EC2 API. If you haven't generated a key pair, see the [AWS documentation](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) or see this [Whirr FAQ](http://whirr.apache.org/faq.html#how-do-i-find-my-cloud-credentials).
 
-### Installing Whirr
-You must use a version of Whirr that includes and supports a Druid recipe.
You can do it so in one of two ways:
-
-#### Build the Following Version of Whirr
+### Install Whirr
 Clone the code from [https://github.com/druid-io/whirr](https://github.com/druid-io/whirr) and build Whirr:
 
     git clone git@github.com:druid-io/whirr.git
@@ -28,23 +25,23 @@ Clone the code from [https://github.com/druid-io/whirr](https://github.com/druid
     git checkout trunk
     mvn clean install -Dmaven.test.failure.ignore=true
 
-#### Build the Latest Version of Whirr
-Clone the code from the Whirr repository:
+In order to run the test below, you'll also need two files that are available only from a [standard install of Druid](http://druid.io/downloads.html) or the [Druid repo](https://github.com/metamx/druid/tree/master/examples/bin/examples):
+
+* `druid/examples/bin/examples/wikipedia/wikipedia_realtime.spec`
+* `druid/examples/bin/examples/indexing/wikipedia_realtime_task.json`
 
-    git clone git://git.apache.org/whirr.git
-
-Then run `mvn install` from the root directory.
 
 ### Configure Whirr
-The Whirr recipe for Druid is the configuration file `$WHIRR_HOME/recipies/druid.properties`. You can edit this file to suit your needs -- it is annotated and self-explanatory. Here are some hints about that file:
+The Whirr recipe for Druid is the configuration file `$WHIRR_HOME/recipies/druid.properties`. You can edit this file to suit your needs; it is annotated and self-explanatory. Here are some hints about that file:
 
-* Set `whirr.location-id` to a specific AWS region (e.g., us-east-1) if desired, else one will be chosen for you.
+* Set `whirr.location-id` to a specific AWS region if desired. If this is left blank, a region is chosen for you. The default value is `us-east-1`.
 * You can choose the hardware used with `whirr.hardware-id` to a specific instance type (e.g., m1.large). By default druid.properties, m3.2xlarge (broker, historical, middle manager), m1.xlarge (coordinator, overlord), and m1.small (zookeeper, mysql) are used.
 * If you don't choose an image via `whirr.image-id` (image must be compatible with hardware), you'll get plain vanilla Linux. Default druid.properties uses ami-018c9568 (Ubuntu 12.04).
 * SSH keys (not password protected) must exist for the local user. If they are in the default locations, `${sys:user.home}/.ssh/id_rsa` and `${sys:user.home}/.ssh/id_rsa.pub`, Whirr will find them. Otherwise, you'll have to specify them with `whirr.private-key-file` and `whirr.public-key-file`.
-* Be sure to specify the absolute path of the Druid realtime spec file `realtime.spec` in `whirr.druid.realtime.spec.path`.
-* Also make sure to specify the correct S3 bucket. Otherwise the cluster won't be able to process tasks.
 * Two Druid cluster templates (see `whirr.instance-templates`) are provided: a small cluster running on a single EC2 instance, and a larger cluster running on multiple instances.
+* You must specify the path to an S3 bucket. Otherwise the cluster won't be able to process tasks.
+* To successfully submit the test task below, you'll need to specify the location of the `wikipedia_realtime.spec` in the property `whirr.druid.realtime.spec.path`.
+* Specify Druid version only if [Druid extensions](Modules.html) are being used.
 
 The following AWS information must be set in `druid.properties`, as environment variables, or in the file `$WHIRR_HOME/conf/credentials`:
 
@@ -54,7 +51,7 @@ The following AWS information must be set in `druid.properties`, as environment
 
 How to get the IDENTITY and CREDENTIAL keys is discussed above.
 
-In order to configure each node, you can edit `services/druid/src/main/resources/functions/start_druid.sh` for JVM configuration and `services/druid/src/main/resources/functions/configure_[NODE_NAME].sh` for specific node configuration. For more information on configuration, read the Druid documentations about it (http://druid.io/docs/0.6.116/Configuration.html).
+In order to configure each node, you can edit `services/druid/src/main/resources/functions/start_druid.sh` for JVM configuration and `services/druid/src/main/resources/functions/configure_[NODE_NAME].sh` for specific node configuration. For more information on configuration, see the [Druid configuration documentation](Configuration.html).
 
 ### Start a Test Cluster With Whirr
 Run the following command:
@@ -66,14 +63,14 @@ If Whirr starts without any errors, you should see the following message:
 
     Running on provider aws-ec2 using identity
 
-You can then use the EC2 dashboard to locate the instance and confirm that it has started up.
+You can then use the EC2 dashboard to locate the instances and confirm that they have started up.
 
-If both the instance and the Druid cluster launch successfully, a few minutes later other messages to STDOUT should follow with information returned from EC2, including the instance ID:
+If both the instances and the Druid cluster launch successfully, a few minutes later other messages to STDOUT should follow with information returned from EC2, including the instance ID:
 
-    Started cluster of 1 instances
+    Started cluster of 8 instances
     Cluster{instances=[Instance{roles=[zookeeper, druid-mysql, druid-coordinator, druid-broker, druid-historical, druid-realtime], publicIp= ...
 
-The final message will contain login information for the instance.
+The final message will contain login information for the instances.
 
 Note that Whirr will return an exception if any of the nodes fail to launch, and the cluster will be destroyed. To destroy the cluster manually, run the following command:
 
@@ -85,9 +82,11 @@ Note that Whirr will return an exception if any of the nodes fail to launch, and
 Now you can run an indexing task and a simple query to see if all the nodes have launched correctly. We are going to use a Wikipedia example again.
For a realtime indexing task, run the following command:
 
 ```bash
-curl -X 'POST' -H 'Content-Type:application/json' -d @#{YOUR_DRUID_DIRECTORY}/examples/indexing/wikipedia_realtime_task.json #{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/druid/indexer/v1/task
+curl -X 'POST' -H 'Content-Type:application/json' -d @#{PATH_TO}/wikipedia_realtime_task.json #{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/druid/indexer/v1/task
 ```
-Issuing the request should return a task ID.
+where OVERLORD_PUBLIC_IP_ADDR should be available from the EC2 information logged to STDOUT, the Overlord port is 8080 by default, and `wikipedia_realtime_task.json` is discussed above.
+
+Issuing this request should return a task ID.
 
 To check the state of the overlord, open up your browser and go to `#{OVERLORD_PUBLIC_IP_ADDR}:#{PORT}/console.html`.