diff --git a/docs/Batch-ingestion.md b/docs/Batch-ingestion.md
index 6511b85b452..42a42ac7b29 100644
--- a/docs/Batch-ingestion.md
+++ b/docs/Batch-ingestion.md
@@ -9,7 +9,7 @@ There are two choices for batch data ingestion to your Druid cluster, you can us
 Which should I use?
 -------------------
 
-The [Indexing service](Indexing service.html) is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it provides for the encapsulation of things like the Database that is used for segment metadata and other things, so that your indexing tasks do not need to include such information. Long-term, the indexing service is going to be the preferred method of ingesting data.
+The [Indexing service](Indexing-service.html) is a node that can run as part of your Druid cluster and can accomplish a number of different types of indexing tasks. Even if all you care about is batch indexing, it encapsulates details such as the database used for segment metadata, so that your indexing tasks do not need to carry that information themselves. Long-term, the indexing service will be the preferred method of ingesting data.
 
 The `HadoopDruidIndexerMain` runs hadoop jobs in order to separate and index data segments. It takes advantage of Hadoop as a job scheduling and distributed job execution platform. It is a simple method if you already have Hadoop running and don’t want to spend the time configuring and deploying the [Indexing service](Indexing service.html) just yet.
diff --git a/docs/Booting-a-production-cluster.md b/docs/Booting-a-production-cluster.md
index d5fc38c8ce5..f7e5444ab8e 100644
--- a/docs/Booting-a-production-cluster.md
+++ b/docs/Booting-a-production-cluster.md
@@ -3,7 +3,7 @@ layout: default
 ---
 # Booting a Single Node Cluster #
 
-[Loading Your Data](Loading Your Data.html) and [Querying Your Data](Querying Your Data.html) contain recipes to boot a small druid cluster on localhost. Here we will boot a small cluster on EC2. You can checkout the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.5.51-SNAPSHOT-bin.tar.gz).
+[Loading Your Data](Loading-Your-Data.html) and [Querying Your Data](Querying-Your-Data.html) contain recipes to boot a small Druid cluster on localhost. Here we will boot a small cluster on EC2. You can check out the code, or download a tarball from [here](http://static.druid.io/artifacts/druid-services-0.5.51-SNAPSHOT-bin.tar.gz).
 
 The [ec2 run script](https://github.com/metamx/druid/blob/master/examples/bin/run_ec2.sh), run_ec2.sh, is located at 'examples/bin' if you have checked out the code, or at the root of the project if you've downloaded a tarball. The scripts rely on the [Amazon EC2 API Tools](http://aws.amazon.com/developertools/351), and you will need to set three environment variables:
diff --git a/docs/Configuration.md b/docs/Configuration.md
index 544b9ea4f55..4042d02d825 100644
--- a/docs/Configuration.md
+++ b/docs/Configuration.md
@@ -91,7 +91,7 @@ These properties are for connecting with S3 and using it to pull down segments.
 
 ### JDBC connection
 
-These properties specify the jdbc connection and other configuration around the “segments table” database. The only processes that connect to the DB with these properties are the [Master](Master.html) and [Indexing service](Indexing service.html). This is tested on MySQL.
+These properties specify the JDBC connection and other configuration around the “segments table” database. The only processes that connect to the DB with these properties are the [Master](Master.html) and [Indexing service](Indexing-service.html). This is tested on MySQL.
 
 |Property|Description|Default|
 |--------|-----------|-------|
diff --git a/docs/Druid-Personal-Demo-Cluster.md b/docs/Druid-Personal-Demo-Cluster.md
index 498f8ff8e14..0ef9834f198 100644
--- a/docs/Druid-Personal-Demo-Cluster.md
+++ b/docs/Druid-Personal-Demo-Cluster.md
@@ -3,7 +3,7 @@ layout: default
 ---
 # Druid Personal Demo Cluster (DPDC)
 
-Note, there are currently some issues with the CloudFormation. We are working through them and will update the documentation here when things work properly. In the meantime, the simplest way to get your feet wet with a cluster setup is to run through the instructions at [housejester/druid-test-harness](https://github.com/housejester/druid-test-harness), though it is based on an older version. If you just want to get a feel for the types of data and queries that you can issue, check out [Realtime Examples](Realtime Examples.html)
+Note: there are currently some issues with the CloudFormation template. We are working through them and will update the documentation here when things work properly. In the meantime, the simplest way to get your feet wet with a cluster setup is to run through the instructions at [housejester/druid-test-harness](https://github.com/housejester/druid-test-harness), though it is based on an older version. If you just want to get a feel for the types of data and queries that you can issue, check out [Realtime Examples](Realtime-Examples.html).
 
 ## Introduction
 To make it easy for you to get started with Druid, we created an AWS (Amazon Web Services) [CloudFormation](http://aws.amazon.com/cloudformation/) Template that allows you to create a small pre-configured Druid cluster using your own AWS account. The cluster contains a pre-loaded sample workload, the Wikipedia edit stream, and a basic query interface that gets you familiar with Druid capabilities like drill-downs and filters.
diff --git a/docs/Examples.md b/docs/Examples.md
index 2f48f60b1b5..4207911464b 100644
--- a/docs/Examples.md
+++ b/docs/Examples.md
@@ -34,7 +34,7 @@ Clone Druid and build it:
 Twitter Example
 ---------------
 
-For a full tutorial based on the twitter example, check out this [Twitter Tutorial](Twitter Tutorial.html).
+For a full tutorial based on the Twitter example, check out this [Twitter Tutorial](Twitter-Tutorial.html).
 
 This Example uses a feature of Twitter that allows for sampling of it’s stream. We sample the Twitter stream via our [TwitterSpritzerFirehoseFactory](https://github.com/metamx/druid/blob/master/examples/src/main/java/druid/examples/twitter/TwitterSpritzerFirehoseFactory.java) class and use it to simulate the kinds of data you might ingest into Druid. Then, with the client part, the sample shows what kinds of analytics explorations you can do during and after the data is loaded.
diff --git a/docs/GroupByQuery.md b/docs/GroupByQuery.md
index 7e95ebcbdee..01edc6bdc7e 100644
--- a/docs/GroupByQuery.md
+++ b/docs/GroupByQuery.md
@@ -98,7 +98,7 @@ There are 9 main parts to a groupBy query:
 |granularity|Defines the granularity of the query. See [Granularities](Granularities.html)|yes|
 |filter|See [Filters](Filters.html)|no|
 |aggregations|See [Aggregations](Aggregations.html)|yes|
-|postAggregations|See [Post Aggregations](Post Aggregations.html)|no|
+|postAggregations|See [Post Aggregations](Post-Aggregations.html)|no|
 |intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
 |context|An additional JSON Object which can be used to specify certain flags.|no|
diff --git a/docs/Libraries.md b/docs/Libraries.md
index 75bc17c633c..0c57ffab3e8 100644
--- a/docs/Libraries.md
+++ b/docs/Libraries.md
@@ -13,6 +13,9 @@ Some great folks have written their own libraries to interact with Druid
 #### Ruby
 \* [madvertise/ruby-druid](https://github.com/madvertise/ruby-druid) - A ruby client for Druid
 
+#### Python
+\* [metamx/pydruid](https://github.com/metamx/pydruid) - A Python client for Druid
+
 #### Helper Libraries
 
 - [madvertise/druid-dumbo](https://github.com/madvertise/druid-dumbo) - Scripts to help generate batch configs for the ingestion of data into Druid
diff --git a/docs/Loading-Your-Data.md b/docs/Loading-Your-Data.md
index a5edd9d65ea..2e27fad8303 100644
--- a/docs/Loading-Your-Data.md
+++ b/docs/Loading-Your-Data.md
@@ -165,7 +165,7 @@ curl -X POST "http://localhost:8080/druid/v2/?pretty" \
 } } ]
 ```
 
-Now you're ready for [Querying Your Data](Querying Your Data.html)!
+Now you're ready for [Querying Your Data](Querying-Your-Data.html)!
 
 ## Loading Data with the HadoopDruidIndexer ##
 
@@ -367,4 +367,4 @@ Now its time to run the Hadoop [Batch-ingestion](Batch-ingestion.html) job, Hado
 java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Ddruid.realtime.specFile=realtime.spec -classpath lib/* com.metamx.druid.indexer.HadoopDruidIndexerMain batchConfig.json
 ```
 
-You can now move on to [Querying Your Data](Querying Your Data.html)!
\ No newline at end of file
+You can now move on to [Querying Your Data](Querying-Your-Data.html)!
\ No newline at end of file
diff --git a/docs/Master.md b/docs/Master.md
index c96af56dea9..eb86a3e81fd 100644
--- a/docs/Master.md
+++ b/docs/Master.md
@@ -15,7 +15,7 @@ Rules
 
 Segments are loaded and dropped from the cluster based on a set of rules. Rules indicate how segments should be assigned to different compute node tiers and how many replicants of a segment should exist in each tier. Rules may also indicate when segments should be dropped entirely from the cluster. The master loads a set of rules from the database. Rules may be specific to a certain datasource and/or a default set of rules can be configured. Rules are read in order and hence the ordering of rules is important. The master will cycle through all available segments and match each segment with the first rule that applies. Each segment may only match a single rule
 
-For more information on rules, see [Rule Configuration](Rule Configuration.html).
+For more information on rules, see [Rule Configuration](Rule-Configuration.html).
 
 Cleaning Up Segments
 --------------------
diff --git a/docs/MySQL.md b/docs/MySQL.md
index 88ef75006cf..713ad0ab18d 100644
--- a/docs/MySQL.md
+++ b/docs/MySQL.md
@@ -44,4 +44,4 @@ The config table is used to store runtime configuration objects. We do not have
 Task-related Tables
 -------------------
 
-There are also a number of tables created and used by the [Indexing Service](Indexing Service.html) in the course of its work.
+There are also a number of tables created and used by the [Indexing Service](Indexing-Service.html) in the course of its work.
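The groupBy query parts tabulated in the GroupByQuery.md hunk above compose into a single JSON object posted to Druid. The following is a minimal sketch for orientation only, not text from the patched docs; the datasource, dimension, and metric names (`sample_datasource`, `country`, `language`, `edit_count`) are hypothetical.

```
{
  "queryType": "groupBy",
  "dataSource": "sample_datasource",
  "granularity": "day",
  "dimensions": ["country"],
  "aggregations": [
    { "type": "count", "name": "rows" },
    { "type": "longSum", "name": "edit_count", "fieldName": "count" }
  ],
  "postAggregations": [
    {
      "type": "arithmetic",
      "name": "avg_edits_per_row",
      "fn": "/",
      "fields": [
        { "type": "fieldAccess", "fieldName": "edit_count" },
        { "type": "fieldAccess", "fieldName": "rows" }
      ]
    }
  ],
  "filter": { "type": "selector", "dimension": "language", "value": "en" },
  "intervals": ["2013-01-01T00:00/2013-01-08T00:00"]
}
```

Together with `queryType`, `dataSource`, and `dimensions`, this covers the nine parts the table lists; `context` is the optional tenth field and is omitted here.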
diff --git a/docs/Querying-your-data.md b/docs/Querying-your-data.md
index 5bf72a6fa54..dc3e04d645c 100644
--- a/docs/Querying-your-data.md
+++ b/docs/Querying-your-data.md
@@ -3,7 +3,7 @@ layout: default
 ---
 # Setup #
 
-Before we start querying druid, we're going to finish setting up a complete cluster on localhost. In [Loading Your Data](Loading Your Data.html) we setup a [Realtime](Realtime.html), [Compute](Compute.html) and [Master](Master.html) node. If you've already completed that tutorial, you need only follow the directions for 'Booting a Broker Node'.
+Before we start querying Druid, we're going to finish setting up a complete cluster on localhost. In [Loading Your Data](Loading-Your-Data.html) we set up a [Realtime](Realtime.html), [Compute](Compute.html) and [Master](Master.html) node. If you've already completed that tutorial, you need only follow the directions for 'Booting a Broker Node'.
 
 ## Booting a Broker Node ##
 
@@ -98,7 +98,7 @@ com.metamx.druid.http.ComputeMain
 
 # Querying Your Data #
 
-Now that we have a complete cluster setup on localhost, we need to load data. To do so, refer to [Loading Your Data](Loading Your Data.html). Having done that, its time to query our data! For a complete specification of queries, see [Querying](Querying.html).
+Now that we have a complete cluster set up on localhost, we need to load data. To do so, refer to [Loading Your Data](Loading-Your-Data.html). Having done that, it's time to query our data! For a complete specification of queries, see [Querying](Querying.html).
 
 ## Querying Different Nodes ##
 
@@ -363,4 +363,4 @@ Check out [Filters](Filters.html) for more.
 
 ## Learn More ##
 
-You can learn more about querying at [Querying](Querying.html)! Now check out [Booting a production cluster](Booting a production cluster.html)!
\ No newline at end of file
+You can learn more about querying at [Querying](Querying.html)! Now check out [Booting a production cluster](Booting-a-production-cluster.html)!
\ No newline at end of file
diff --git a/docs/TimeseriesQuery.md b/docs/TimeseriesQuery.md
index 9ea79fcfa75..62ebcee59f1 100644
--- a/docs/TimeseriesQuery.md
+++ b/docs/TimeseriesQuery.md
@@ -87,7 +87,7 @@ There are 7 main parts to a timeseries query:
 |granularity|Defines the granularity of the query. See [Granularities](Granularities.html)|yes|
 |filter|See [Filters](Filters.html)|no|
 |aggregations|See [Aggregations](Aggregations.html)|yes|
-|postAggregations|See [Post Aggregations](Post Aggregations.html)|no|
+|postAggregations|See [Post Aggregations](Post-Aggregations.html)|no|
 |intervals|A JSON Object representing ISO-8601 Intervals. This defines the time ranges to run the query over.|yes|
 |context|An additional JSON Object which can be used to specify certain flags.|no|
diff --git a/docs/Tutorial:-A-First-Look-at-Druid.md b/docs/Tutorial:-A-First-Look-at-Druid.md
index c3de0df1d91..987cf89fa28 100644
--- a/docs/Tutorial:-A-First-Look-at-Druid.md
+++ b/docs/Tutorial:-A-First-Look-at-Druid.md
@@ -357,9 +357,9 @@ Feel free to tweak other query parameters to answer other questions you may have
 Next Steps
 ----------
 
-What to know even more information about the Druid Cluster? Check out [Tutorial: The Druid Cluster](Tutorial: The Druid Cluster.html)
+Want to know even more about the Druid Cluster? Check out [Tutorial: The Druid Cluster](Tutorial:-The-Druid-Cluster.html)
 
-Druid is even more fun if you load your own data into it! To learn how to load your data, see [Loading Your Data](Loading Your Data.html).
+Druid is even more fun if you load your own data into it! To learn how to load your data, see [Loading Your Data](Loading-Your-Data.html).
 
 Additional Information
 ----------------------
diff --git a/docs/Tutorial:-The-Druid-Cluster.md b/docs/Tutorial:-The-Druid-Cluster.md
index 286447cc3cd..282ec9fa7f8 100644
--- a/docs/Tutorial:-The-Druid-Cluster.md
+++ b/docs/Tutorial:-The-Druid-Cluster.md
@@ -19,7 +19,7 @@ tar -zxvf druid-services-*-bin.tar.gz
 cd druid-services-*
 ```
 
-You can also [Build From Source](Build From Source.html).
+You can also [Build From Source](Build-From-Source.html).
 
 ## External Dependencies ##
 
diff --git a/docs/Tutorial:-Webstream.md b/docs/Tutorial:-Webstream.md
index bbfb42450fd..bfb7ed73bed 100644
--- a/docs/Tutorial:-Webstream.md
+++ b/docs/Tutorial:-Webstream.md
@@ -346,8 +346,8 @@ Feel free to tweak other query parameters to answer other questions you may have
 Next Steps
 ----------
 
-What to know even more information about the Druid Cluster? Check out [Tutorial: The Druid Cluster](Tutorial: The Druid Cluster.html)
-Druid is even more fun if you load your own data into it! To learn how to load your data, see [Loading Your Data](Loading Your Data.html).
+Want to know even more about the Druid Cluster? Check out [Tutorial: The Druid Cluster](Tutorial:-The-Druid-Cluster.html)
+Druid is even more fun if you load your own data into it! To learn how to load your data, see [Loading Your Data](Loading-Your-Data.html).
 
 Additional Information
 ----------------------
diff --git a/docs/contents.md b/docs/contents.md
index 0d3f7f9cb62..963f88926e1 100644
--- a/docs/contents.md
+++ b/docs/contents.md
@@ -9,20 +9,20 @@ Contents
 ========================
 
 Getting Started
-\* [Tutorial: A First Look at Druid](Tutorial: A First Look at Druid.html)
-\* [Tutorial: The Druid Cluster](Tutorial: The Druid Cluster.html)
-\* [Loading Your Data](Loading Your Data.html)
-\* [Querying Your Data](Querying Your Data.html)
-\* [Booting a Production Cluster](Booting a Production Cluster.html)
+\* [Tutorial: A First Look at Druid](Tutorial:-A-First-Look-at-Druid.html)
+\* [Tutorial: The Druid Cluster](Tutorial:-The-Druid-Cluster.html)
+\* [Loading Your Data](Loading-Your-Data.html)
+\* [Querying Your Data](Querying-Your-Data.html)
+\* [Booting a Production Cluster](Booting-a-production-cluster.html)
 \* [Examples](Examples.html)
-\* [Cluster Setup](Cluster Setup.html)
+\* [Cluster Setup](Cluster-Setup.html)
 \* [Configuration](Configuration.html)
 --------------------------------------
 
 Data Ingestion
 \* [Realtime](Realtime.html)
-\* [Batch|Batch Ingestion](Batch|Batch Ingestion.html)
-\* [Indexing Service](Indexing Service.html)
+\* [Batch Ingestion](Batch-ingestion.html)
+\* [Indexing Service](Indexing-Service.html)
 ----------------------------
 
 Querying
@@ -57,12 +57,12 @@ Architecture
 **\* ]
 **\* [MySQL](MySQL.html)
 **\* ]
-** [Concepts and Terminology](Concepts and Terminology.html)
+** [Concepts and Terminology](Concepts-and-Terminology.html)
 -------------------------------
 
 Development
 \* [Versioning](Versioning.html)
-\* [Build From Source](Build From Source.html)
+\* [Build From Source](Build-From-Source.html)
 \* [Libraries](Libraries.html)
 ------------------------
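The tutorial and querying pages renamed above all converge on the same workflow: POST a JSON query to a broker node. As a minimal sketch, assuming the `http://localhost:8080/druid/v2/?pretty` endpoint that appears in Loading-Your-Data.md and a hypothetical datasource name (`example_datasource`), a timeseries query can be issued with curl:

```
curl -X POST "http://localhost:8080/druid/v2/?pretty" \
  -H 'content-type: application/json' \
  -d '{
    "queryType": "timeseries",
    "dataSource": "example_datasource",
    "granularity": "hour",
    "aggregations": [
      { "type": "count", "name": "rows" }
    ],
    "intervals": ["2013-01-01T00:00/2013-01-02T00:00"]
  }'
```

The response is a JSON array with one entry per granularity bucket in the requested interval, each carrying the aggregated values.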