diff --git a/publications/demo/druid_demo.pdf b/publications/demo/druid_demo.pdf
index 883d31e00a6..b8cbb7fc9c0 100644
Binary files a/publications/demo/druid_demo.pdf and b/publications/demo/druid_demo.pdf differ
diff --git a/publications/demo/druid_demo.tex b/publications/demo/druid_demo.tex
index 88479da1c82..fc83501b0b1 100644
--- a/publications/demo/druid_demo.tex
+++ b/publications/demo/druid_demo.tex
@@ -332,12 +332,9 @@ we also include results from synthetic workloads on TPC-H data.
 
 \subsection{Query Performance}
 Query latencies are shown in Figure~\ref{fig:query_latency} for a cluster
-holding 10TB of data across several hundred nodes. The average queries per
-minute during this time was approximately 1000. The number of dimensions the
-various data sources vary from 25 to 78 dimensions, and 8 to 35 metrics. Across
-all the various data sources, average query latency is approximately 550
-milliseconds, with 90\% of queries returning in less than 1 second, 95\% in
-under 2 seconds, and 99\% of queries returning in less than 10 seconds.
+hosting approximately 10.5TB of data using 1302 processing threads and 672
+total cores (hyperthreaded). There are approximately 50 billion rows of data in
+this cluster.
 
 \begin{figure}
 \centering
@@ -346,6 +343,20 @@ under 2 seconds, and 99\% of queries returning in less than 10 seconds.
 \label{fig:query_latency}
 \end{figure}
 
+\begin{figure}
+\centering
+\includegraphics[width = 2.3in]{tpch_100gb}
+\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
+\label{fig:tpch_100gb}
+\end{figure}
+
+The average queries per minute during this time was approximately
+1000. The number of dimensions the various data sources vary from 25 to 78
+dimensions, and 8 to 35 metrics. Across all the various data sources, average
+query latency is approximately 550 milliseconds, with 90\% of queries returning
+in less than 1 second, 95\% in under 2 seconds, and 99\% of queries returning
+in less than 10 seconds.
+
 Approximately 30\% of the queries are standard aggregates involving different
 types of metrics and filters, 60\% of queries are ordered group bys over one or
 more dimensions with aggregates, and 10\% of
@@ -354,13 +365,6 @@ columns scanned in aggregate queries roughly follows an exponential
 distribution. Queries involving a single column are very frequent, and queries
 involving all columns are very rare.
 
-\begin{figure}
-\centering
-\includegraphics[width = 2.3in]{tpch_100gb}
-\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
-\label{fig:tpch_100gb}
-\end{figure}
-
 We also present Druid benchmarks on TPC-H data. Most TPC-H queries do not
 directly apply to Druid, so we selected queries more typical of Druid's
 workload to demonstrate query performance. As a comparison, we also provide the