diff --git a/docs/content/ingestion/overview.md b/docs/content/ingestion/overview.md
index 64e46f4243e..94330880763 100644
--- a/docs/content/ingestion/overview.md
+++ b/docs/content/ingestion/overview.md
@@ -14,7 +14,7 @@ If you have a continuous stream of data, there are a few options to get your dat
 
 ### Ingest from a Stream Processor
 
-If you process your data using a stream processor such as Apache Samza or Apache Storm, you can use the [Tranquility](https://github.com/metamx/tranquility) library to manage
+If you process your data using a stream processor such as Apache Samza or Apache Storm, you can use the [Tranquility](https://github.com/druid-io/tranquility) library to manage
 your real-time ingestion. This setup requires using the indexing service for ingestion, which is what is used in production by many organizations that use Druid.
 
 ### Ingest from Apache Kafka
diff --git a/docs/content/ingestion/realtime-ingestion.md b/docs/content/ingestion/realtime-ingestion.md
index b17782a6dca..6a499615bb1 100644
--- a/docs/content/ingestion/realtime-ingestion.md
+++ b/docs/content/ingestion/realtime-ingestion.md
@@ -10,7 +10,7 @@ For Real-time Node Configuration, see [Realtime Configuration](../configuration/
 
 For writing your own plugins to the real-time node, see [Firehose](../ingestion/firehose.html).
 
-There are two ways of ingesting real-time data. This can be achieved with a standalone real-time node, or using the [Tranquility](https://github.com/metamx/tranquility) client library as part of the [Indexing Service](../design/indexing-service.html). For a full explanation of why there are two methods, please see [this link](https://groups.google.com/forum/#!searchin/druid-development/fangjin$20yang$20%22thoughts%22/druid-development/aRMmNHQGdhI/muBGl0Xi_wgJ). If you are comfortable with the limitations of standalone real-time nodes, you can use them as they are easier to set up. The indexing service is a more robust and highly available solution but will also require more effort to set up.
+There are two ways of ingesting real-time data. This can be achieved with a standalone real-time node, or using the [Tranquility](https://github.com/druid-io/tranquility) client library as part of the [Indexing Service](../design/indexing-service.html). For a full explanation of why there are two methods, please see [this link](https://groups.google.com/forum/#!searchin/druid-development/fangjin$20yang$20%22thoughts%22/druid-development/aRMmNHQGdhI/muBGl0Xi_wgJ). If you are comfortable with the limitations of standalone real-time nodes, you can use them as they are easier to set up. The indexing service is a more robust and highly available solution but will also require more effort to set up.
 
 ## Realtime Node Ingestion
 
@@ -238,7 +238,7 @@ You can use type `numbered` similarly. Note that type `none` is essentially type
 
 ## Realtime Ingestion using the Indexing Service
 
-We strongly recommend using the client library [Tranquility](https://github.com/metamx/tranquility) for this use case. Please read the documentation on the Tranquility web page.
+We strongly recommend using the client library [Tranquility](https://github.com/druid-io/tranquility) for this use case. Please read the documentation on the Tranquility web page.
 
 ## Constraints
 
diff --git a/docs/content/misc/tasks.md b/docs/content/misc/tasks.md
index 8c49002e299..be4127b38ab 100644
--- a/docs/content/misc/tasks.md
+++ b/docs/content/misc/tasks.md
@@ -157,7 +157,7 @@ If you are having trouble with any extensions in HadoopIndexTask, it may be the
 
 ### Realtime Index Task
 
-The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. We recommend you use the library [tranquility](https://github.com/metamx/tranquility) to programmatically manage generating real-time index tasks. The grammar for the real-time task is as follows:
+The indexing service can also run real-time tasks. These tasks effectively transform a middle manager into a real-time node. We introduced real-time tasks as a way to programmatically add new real-time data sources without needing to manually add nodes. We recommend you use the library [tranquility](https://github.com/druid-io/tranquility) to programmatically manage generating real-time index tasks. The grammar for the real-time task is as follows:
 
 ```json
 {
diff --git a/docs/content/tutorials/tutorial-loading-streaming-data.md b/docs/content/tutorials/tutorial-loading-streaming-data.md
index d76ae0862fb..4db4d1f8c7a 100644
--- a/docs/content/tutorials/tutorial-loading-streaming-data.md
+++ b/docs/content/tutorials/tutorial-loading-streaming-data.md
@@ -144,7 +144,7 @@ download](http://static.druid.io/artifacts/releases/druid-services-0.7.1-bin.tar
 
 Druid offers an additional method of ingesting streaming data via the indexing service. You may be wondering why a second method is needed. Standalone real-time nodes are sufficient for certain volumes of data and availability tolerances. They pull data from a message queue like Kafka or Rabbit, index data locally, and periodically finalize segments for handoff to historical nodes. They are fairly straightforward to scale, simply taking advantage of the innate scalability of the backing message queue. But they are difficult to make highly available with Kafka, the most popular supported message queue, because its high-level consumer doesn’t provide a way to scale out two replicated consumer groups such that each one gets the same data in the same shard. They also become difficult to manage once you have a lot of them, since every machine needs a unique configuration.
 
-Druid solved the availability problem by switching from a pull-based model to a push-based model; rather than Druid indexers pulling data from Kafka, another process pulls data and pushes the data into Druid. Since with the push based model, we can ensure that the same data makes it into the same shard, we can replicate data. The [indexing service](../design/indexing-service.html) encapsulates this functionality, where a task-and-resources model replaces a standalone machine model. In addition to simplifying machine configuration, the model also allows nodes to run in the cloud with an elastic number of machines. If you are interested in this form of real-time ingestion, please check out the client library [Tranquility](https://github.com/metamx/tranquility).
+Druid solved the availability problem by switching from a pull-based model to a push-based model; rather than Druid indexers pulling data from Kafka, another process pulls data and pushes the data into Druid. Since with the push based model, we can ensure that the same data makes it into the same shard, we can replicate data. The [indexing service](../design/indexing-service.html) encapsulates this functionality, where a task-and-resources model replaces a standalone machine model. In addition to simplifying machine configuration, the model also allows nodes to run in the cloud with an elastic number of machines. If you are interested in this form of real-time ingestion, please check out the client library [Tranquility](https://github.com/druid-io/tranquility).
 
 Additional Information
 ----------------------