more edits

fjy 2014-03-23 14:44:46 -07:00
parent aec12ee3cc
commit e863afd375
2 changed files with 17 additions and 13 deletions

@@ -332,12 +332,9 @@ we also include results from synthetic workloads on TPC-H data.
\subsection{Query Performance}
Query latencies are shown in Figure~\ref{fig:query_latency} for a cluster
holding 10TB of data across several hundred nodes. The average query rate
during this time was approximately 1000 queries per minute. The data sources
vary from 25 to 78 dimensions and from 8 to 35 metrics. Across all data
sources, average query latency is approximately 550 milliseconds, with 90\% of
queries returning in less than 1 second, 95\% in under 2 seconds, and 99\% in
less than 10 seconds.
hosting approximately 10.5TB of data using 1302 processing threads and 672
total cores (hyperthreaded). There are approximately 50 billion rows of data in
this cluster.
\begin{figure}
\centering
@@ -346,6 +343,20 @@ under 2 seconds, and 99\% of queries returning in less than 10 seconds.
\label{fig:query_latency}
\end{figure}
\begin{figure}
\centering
\includegraphics[width = 2.3in]{tpch_100gb}
\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
\label{fig:tpch_100gb}
\end{figure}
The average query rate during this time was approximately 1000 queries
per minute. The data sources vary from 25 to 78 dimensions and from 8 to
35 metrics. Across all data sources, average query latency is
approximately 550 milliseconds, with 90\% of queries returning in less
than 1 second, 95\% in under 2 seconds, and 99\% in less than 10
seconds.
Approximately 30\% of the queries are standard
aggregates involving different types of metrics and filters, 60\% of queries
are ordered group bys over one or more dimensions with aggregates, and 10\% of
@@ -354,13 +365,6 @@ columns scanned in aggregate queries roughly follows an exponential
distribution. Queries involving a single column are very frequent, and queries
involving all columns are very rare.
\begin{figure}
\centering
\includegraphics[width = 2.3in]{tpch_100gb}
\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
\label{fig:tpch_100gb}
\end{figure}
We also present Druid benchmarks on TPC-H data. Most TPC-H queries do
not directly apply to Druid, so we selected queries more typical of Druid's
workload to demonstrate query performance. As a comparison, we also provide the