mirror of https://github.com/apache/druid.git
remove unneeded
This commit is contained in: parent 8a8bb0369e, commit 46bf1ba5ef
@@ -1,25 +0,0 @@
---
layout: doc_page
---

Druid vs Hadoop (HDFS/MapReduce)
================================

Hadoop has shown the world that it’s possible to house your data warehouse on commodity hardware for a fraction of the price
of typical solutions. As people adopt Hadoop for their data warehousing needs, they find two things.

1. They can now query all of their data in a fairly flexible manner and answer any question they have
2. The queries take a long time

The first is the joy that everyone feels the first time they get Hadoop running. The second is what they realize after they have used Hadoop interactively for a while: Hadoop is optimized for throughput, not latency.

Druid is a complementary addition to Hadoop. Hadoop is great at storing and making accessible large amounts of individually low-value data.
Unfortunately, Hadoop is not great at providing query speed guarantees on top of that data, nor does it have very good
operational characteristics for a customer-facing production system. Druid, on the other hand, excels at taking high-value
summaries of the low-value data on Hadoop, making it available in a fast and always-on fashion, such that it could be exposed directly to a customer.

Druid also requires some infrastructure to exist for [deep storage](../dependencies/deep-storage.html).
HDFS is one of the implemented options for this [deep storage](../dependencies/deep-storage.html).

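
As a concrete illustration, wiring Druid to HDFS deep storage involves properties along these lines in the common runtime configuration (the storage directory is a placeholder; see the linked deep storage page for the authoritative settings):

```properties
# Load the HDFS deep-storage extension and point segment storage at HDFS.
druid.extensions.loadList=["druid-hdfs-storage"]
druid.storage.type=hdfs
druid.storage.storageDirectory=/druid/segments
```
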
Please note that we are only comparing Druid to base Hadoop here; we welcome comparisons of Druid vs. other systems or combinations of systems in the Hadoop ecosystem.

@@ -1,30 +0,0 @@
---
layout: doc_page
---

Druid vs Storage Formats (Parquet/Kudu)
=======================================

The biggest difference between Druid and existing storage formats is that Druid includes an execution engine that can run queries, along with a time-optimized data management system for advanced data retention and distribution. The Druid segment is a custom column format designed for fast aggregates and filters. Below we compare Druid's segment format to other existing formats.
## Druid vs Parquet
Druid's storage format is highly optimized for linear scans. Although Druid has support for nested data, Parquet's storage format is much more hierarchical and is designed more for binary chunking. In theory, this should lead to faster scans in Druid.
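
A toy sketch (plain Python, not Druid's actual segment code) of why linear scans favor a columnar layout: aggregating one metric touches a single flat array instead of every full row.

```python
# Toy illustration, not Druid's segment format: aggregating one field
# from row-oriented storage walks every full record, while a columnar
# layout scans a single contiguous array.

rows = [{"ts": i, "page": "a", "clicks": i % 3} for i in range(1000)]

# Row-oriented: visit every dict just to read one field.
row_sum = sum(r["clicks"] for r in rows)

# Column-oriented: the "clicks" column is one flat list.
columns = {"clicks": [r["clicks"] for r in rows]}
col_sum = sum(columns["clicks"])

assert row_sum == col_sum == 999
```
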
## Druid vs Kudu
Kudu's storage format enables single-row updates, whereas updates to existing Druid segments require recreating the segment, so the process of updating old values is slower in Druid. Kudu's requirement to maintain extra head space for storing updates, and its organization of data by id instead of time, can introduce extra latency and cause data that is not needed to answer a query to be accessed at query time. Druid summarizes/rolls up data at ingestion time, which in practice reduces the raw data that needs to be stored by an average factor of 40, and significantly increases the performance of scanning raw data. This summarization process loses information about individual events, however. Druid segments also contain bitmap indexes for fast filtering, which Kudu does not currently support. Druid's segment architecture is heavily geared towards fast aggregates and filters, and towards OLAP workflows. Appends are very fast in Druid, whereas updates of older data are slower. This is by design, as the data Druid is good for is typically event data that does not need to be updated too frequently.
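
The rollup described above can be sketched in a few lines of plain Python (the dimension and metric names here are illustrative, not Druid APIs); the actual storage savings depend on how many events share a timestamp bucket and dimension values:

```python
from collections import defaultdict

# Hedged sketch of ingestion-time rollup: events that share the same
# dimension value within the same truncated timestamp bucket collapse
# into one pre-aggregated row.

def rollup(events, granularity=3600):
    aggregated = defaultdict(int)
    for ts, page, clicks in events:
        bucket = ts - ts % granularity        # truncate timestamp to the bucket
        aggregated[(bucket, page)] += clicks  # sum the metric per (time, dims)
    return [(b, p, total) for (b, p), total in sorted(aggregated.items())]

events = [(10, "home", 1), (20, "home", 1), (30, "about", 1), (3700, "home", 1)]
print(rollup(events))
# → [(0, 'about', 1), (0, 'home', 2), (3600, 'home', 1)]
```

The two "home" events in the first hour collapse into a single stored row; queries then scan the pre-aggregated rows instead of the raw events.
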
Kudu supports arbitrary primary keys with uniqueness constraints, and efficient lookup by ranges of those keys.
Kudu chooses not to include an execution engine, but supports sufficient operations to allow node-local processing by external execution engines.
This means that Kudu can support multiple frameworks on the same data (e.g., MapReduce, Spark, and SQL).
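
The bitmap indexes mentioned earlier can be sketched very loosely (outside Druid, which uses compressed bitmap formats) as one bit per row per dimension value; combining two values' bitmaps answers a filter without scanning the rows themselves:

```python
# Toy bitmap index (illustrative; real systems use compressed bitmaps):
# one Python int per dimension value, with bit i set when row i holds
# that value. Combining filters becomes bitwise AND/OR on integers.

rows = ["us", "us", "de", "us", "de", "fr"]

index = {}
for i, country in enumerate(rows):
    index[country] = index.get(country, 0) | (1 << i)

def matching_rows(bitmap):
    # Decode the set bits back into row ids.
    return [i for i in range(len(rows)) if bitmap >> i & 1]

# Filter: country = 'us' OR country = 'fr'
print(matching_rows(index["us"] | index["fr"]))
# → [0, 1, 3, 5]
```
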