From 6d05f4fe5e704ecfc9bb37f5777a267fb869450e Mon Sep 17 00:00:00 2001
From: fjy
Date: Wed, 18 Feb 2015 14:07:10 -0800
Subject: [PATCH] Add more compare docs

---
 docs/content/Druid-vs-Elasticsearch.md | 12 ++++++++++++
 docs/content/Druid-vs-Spark.md         | 21 +++++++++++++++++++++
 docs/content/index.md                  |  2 ++
 3 files changed, 35 insertions(+)
 create mode 100644 docs/content/Druid-vs-Elasticsearch.md
 create mode 100644 docs/content/Druid-vs-Spark.md

diff --git a/docs/content/Druid-vs-Elasticsearch.md b/docs/content/Druid-vs-Elasticsearch.md
new file mode 100644
index 00000000000..362af084a5f
--- /dev/null
+++ b/docs/content/Druid-vs-Elasticsearch.md
@@ -0,0 +1,12 @@
+---
+layout: doc_page
+---
+
+Druid vs Elasticsearch
+======================
+
+We are not experts on Elasticsearch, if anything is incorrect about our portrayal, please let us know on the mailing list or via some other means.
+
+Elasticsearch is a search server based on Apache Lucene. It provides full text search for schema-free documents and provides access to raw event level data. Elasticsearch also provides support for analytics and aggregations. Based on [user testimony](https://groups.google.com/forum/#!msg/druid-development/nlpwTHNclj8/sOuWlKOzPpYJ), the resource requirements for data ingestion and aggregation in Elasticsearch are higher than those of Druid.
+
+Druid focuses on OLAP work flows. Druid is optimized for high performance (fast aggregation and ingestion) at low cost, and supports a wide range of analytic operations. Druid has some basic search support for structured event data.
diff --git a/docs/content/Druid-vs-Spark.md b/docs/content/Druid-vs-Spark.md
new file mode 100644
index 00000000000..9032c4e65d6
--- /dev/null
+++ b/docs/content/Druid-vs-Spark.md
@@ -0,0 +1,21 @@
+---
+layout: doc_page
+---
+
+Druid vs Spark
+==============
+
+We are not experts on Spark, if anything is incorrect about our portrayal, please let us know on the mailing list or via some other means.
+
+Spark is a cluster computing framework built around the concept of Resilient Distributed Datasets (RDDs) and
+can be viewed as a back-office analytics platform. RDDs enable data reuse by persisting intermediate results
+in memory and enable Spark to provide fast computations for iterative algorithms.
+This is especially beneficial for certain work flows such as machine
+learning, where the same operation may be applied over and over
+again until some result is converged upon. Spark provides analysts with
+the ability to run queries and analyze large amounts of data with a
+wide array of different algorithms.
+
+Druid is designed to power analytic applications and focuses on the latencies to ingest data and serve queries
+over that data. If you were to build a web UI where users could
+arbitrarily explore data, the latencies seen by using Spark may be too slow for interactive use cases.
diff --git a/docs/content/index.md b/docs/content/index.md
index cda5af3ab16..811ca3f6bd1 100644
--- a/docs/content/index.md
+++ b/docs/content/index.md
@@ -45,6 +45,8 @@ Druid vs…
 * [Druid-vs-Vertica](Druid-vs-Vertica.html)
 * [Druid-vs-Cassandra](Druid-vs-Cassandra.html)
 * [Druid-vs-Hadoop](Druid-vs-Hadoop.html)
+* [Druid-vs-Spark](Druid-vs-Spark.html)
+* [Druid-vs-Elasticsearch](Druid-vs-Elasticsearch.html)
 
 About This Page