more edits

2014-03-23 14:44:46 -07:00 · e863afd375
parent aec12ee3cc
commit e863afd375
2 changed files with 17 additions and 13 deletions
--- a/publications/demo/druid_demo.pdf
+++ b/publications/demo/druid_demo.pdf
--- a/publications/demo/druid_demo.tex
+++ b/publications/demo/druid_demo.tex
@@ -332,12 +332,9 @@ we also include results from synthetic workloads on TPC-H data.

 \subsection{Query Performance}
 Query latencies are shown in Figure~\ref{fig:query_latency} for a cluster
-holding 10TB of data across several hundred nodes. The average queries per
-minute during this time was approximately 1000. The number of dimensions the
-various data sources vary from 25 to 78 dimensions, and 8 to 35 metrics. Across
-all the various data sources, average query latency is approximately 550
-milliseconds, with 90\% of queries returning in less than 1 second, 95\% in
-under 2 seconds, and 99\% of queries returning in less than 10 seconds.  
+hosting approximately 10.5TB of data using 1302 processing threads and 672
+total cores (hyperthreaded). There are approximately 50 billion rows of data in
+this cluster.

 \begin{figure}
 \centering
@@ -346,6 +343,20 @@ under 2 seconds, and 99\% of queries returning in less than 10 seconds.
 \label{fig:query_latency}
 \end{figure}

+\begin{figure}
+\centering
+\includegraphics[width = 2.3in]{tpch_100gb}
+\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
+\label{fig:tpch_100gb}
+\end{figure}
+
+The average query rate during this period was approximately
+1000 queries per minute. The data sources have between 25 and 78
+dimensions and between 8 and 35 metrics. Across all data sources, the average
+query latency is approximately 550 milliseconds, with 90\% of queries returning
+in less than 1 second, 95\% in under 2 seconds, and 99\% of queries returning
+in less than 10 seconds.
+
 Approximately 30\% of the queries are standard
 aggregates involving different types of metrics and filters, 60\% of queries
 are ordered group bys over one or more dimensions with aggregates, and 10\% of
@@ -354,13 +365,6 @@ columns scanned in aggregate queries roughly follows an exponential
 distribution. Queries involving a single column are very frequent, and queries
 involving all columns are very rare.

-\begin{figure}
-\centering
-\includegraphics[width = 2.3in]{tpch_100gb}
-\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
-\label{fig:tpch_100gb}
-\end{figure}
-
 We also present Druid benchmarks on TPC-H data.  Most TPC-H queries do
 not directly apply to Druid, so we selected queries more typical of Druid's
 workload to demonstrate query performance. As a comparison, we also provide the
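The revised Query Performance paragraph summarizes latency as a mean plus the fractions of queries finishing under 1, 2, and 10 seconds (roughly the 90th, 95th, and 99th percentiles). The paper does not say how these figures were computed. The following is a minimal Python sketch that assumes the nearest-rank percentile definition and a hypothetical list of per-query latencies in milliseconds. The function names are illustrative only and are not part of Druid.

```python
import math
from statistics import mean


def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples (nearest-rank method, an assumption)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest rank: the smallest value such that at least pct% of samples are <= it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Mean plus 90th, 95th and 99th percentile latencies, in milliseconds."""
    return {
        "mean": mean(latencies_ms),
        "p90": nearest_rank_percentile(latencies_ms, 90),
        "p95": nearest_rank_percentile(latencies_ms, 95),
        "p99": nearest_rank_percentile(latencies_ms, 99),
    }


def fraction_under(latencies_ms, threshold_ms):
    """Fraction of queries that returned in strictly less than threshold_ms."""
    return sum(1 for x in latencies_ms if x < threshold_ms) / len(latencies_ms)
```

A statement such as "90\% of queries return in less than 1 second" corresponds to `fraction_under(latencies_ms, 1000) >= 0.9` on the (hypothetical) sample list.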