tidy up things a bit

This commit is contained in:
Xavier Léauté 2014-03-12 22:03:39 -07:00
parent 65a5dcaa3c
commit c8f37efc95
1 changed file with 20 additions and 13 deletions


@ -611,7 +611,7 @@ van2011memory} and often utilize run-length encoding. Druid opted to use the
Concise algorithm \cite{colantonio2010concise} as it can outperform WAH by
reducing compressed bitmap size by up to 50\%. Figure~\ref{fig:concise_plot}
illustrates the number of bytes using Concise compression versus using an
integer array. The results were generated on a \texttt{cc2.8xlarge} system with a single
thread, 2G heap, 512m young gen, and a forced GC between each run. The data set
is a single day's worth of data collected from the Twitter garden hose
\cite{twitter2013} data stream. The data set contains 2,272,295 rows and 12
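To make the comparison above concrete, here is a minimal sketch (emphatically not the Concise algorithm itself, just a crude run-length stand-in) of why a compressed bitmap can be far smaller than a raw integer array when a dimension value's matching rows are dense or clustered:

```python
# Illustrative only: not Concise/WAH, just a naive run-length encoding
# to show why RLE bitmaps beat raw integer arrays on clustered row ids.

def int_array_bytes(row_ids):
    # one 32-bit integer per matching row id
    return 4 * len(row_ids)

def rle_bitmap_bytes(row_ids, num_rows):
    # build an uncompressed bitmap, then charge one 32-bit word per run
    # of identical bits (a crude stand-in for WAH/Concise fill words)
    bitmap = [0] * num_rows
    for r in row_ids:
        bitmap[r] = 1
    runs = 1
    for i in range(1, num_rows):
        if bitmap[i] != bitmap[i - 1]:
            runs += 1
    return 4 * runs

dense = list(range(500))              # rows 0..499 all match
print(int_array_bytes(dense))         # 2000 bytes
print(rle_bitmap_bytes(dense, 1000))  # 2 runs -> 8 bytes
```

On adversarial (uniformly random, sparse) data the run count approaches the row count and the advantage disappears, which is why real measurements like Figure~\ref{fig:concise_plot} matter.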
@ -713,15 +713,18 @@ requiring joins on your queries often limits the performance you can achieve.
\label{sec:benchmarks}
Druid runs in production at several organizations, and to demonstrate its
performance, we have chosen to share some real-world numbers for the main production
cluster running at Metamarkets in early 2014. For comparison with other databases,
we also include results from synthetic workloads on TPC-H data.

\subsection{Query Performance in Production}
Druid query performance can vary significantly depending on the query
being issued. For example, sorting the values of a high-cardinality dimension
based on a given metric is much more expensive than a simple count over a time
range. To showcase the average query latencies in a production Druid cluster,
we selected 8 of our most queried data sources, described in
Table~\ref{tab:datasources}.
Approximately 30\% of the queries are standard
aggregates involving different types of metrics and filters, 60\% of queries
are ordered group bys over one or more dimensions with aggregates, and 10\%
are search queries and metadata retrieval queries. The number of
@ -786,10 +789,13 @@ that across the various data sources, the average query latency is approximately
\label{fig:queries_per_min}
\end{figure}
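The production query mix described above can be sketched in Druid's native JSON query API. Field names below follow Druid's documented query types of that era, but treat the exact schema as illustrative; the data source, dimension, and metric names are hypothetical:

```python
# Sketches of two of the production query families: a filtered aggregate
# (~30% of queries) and an ordered group by via topN (~60% of queries).
import json

# a standard aggregate with a filter over a time range
timeseries_count = {
    "queryType": "timeseries",
    "dataSource": "edits",            # hypothetical data source name
    "granularity": "hour",
    "filter": {"type": "selector", "dimension": "country", "value": "US"},
    "aggregations": [{"type": "count", "name": "rows"}],
    "intervals": ["2014-01-01/2014-01-02"],
}

# an ordered group by: topN sorts one dimension's values by an aggregate
top_pages = {
    "queryType": "topN",
    "dataSource": "edits",
    "dimension": "page",
    "metric": "added",                # order results by this aggregate
    "threshold": 10,
    "granularity": "all",
    "aggregations": [{"type": "longSum", "name": "added", "fieldName": "added"}],
    "intervals": ["2014-01-01/2014-01-02"],
}

print(json.dumps(timeseries_count, indent=2))
```

The topN form is the expensive case mentioned earlier: sorting a high-cardinality dimension by a metric touches every value's aggregate before ranking.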
\subsection{Query Benchmarks on TPC-H Data}
We also present Druid benchmarks on TPC-H data. Our setup used Amazon EC2
\texttt{m3.2xlarge} (Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz) instances for
historical nodes and \texttt{c3.2xlarge} (Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz) instances for broker
nodes.
Most TPC-H queries do not directly apply to Druid, so we
selected queries more typical of Druid's workload to demonstrate query performance. As a
comparison, we also provide the results of the same queries using MySQL with the
MyISAM engine (InnoDB was slower in our experiments). Our MySQL setup was an Amazon
@ -807,14 +813,14 @@ and 36,246,530 rows/second/core for a \texttt{select sum(float)} type query.
\begin{figure}
\centering
\includegraphics[width = 2.8in]{tpch_1gb}
\caption{Druid \& MySQL benchmarks -- 1GB TPC-H data.}
\label{fig:tpch_1gb}
\end{figure}

\begin{figure}
\centering
\includegraphics[width = 2.8in]{tpch_100gb}
\caption{Druid \& MySQL benchmarks -- 100GB TPC-H data.}
\label{fig:tpch_100gb}
\end{figure}
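The exact benchmark queries are not reproduced here; the following is an illustrative example of the join-free scan-and-aggregate shape that is "typical of Druid's workload," written as the SQL one would run on the MySQL side of the comparison (table and column names follow the TPC-H schema):

```python
# A representative single-table aggregate over a time range: the kind of
# query that maps directly onto Druid's timeseries/groupBy model, unlike
# the multi-way joins that dominate the standard TPC-H query set.
sql = """
SELECT l_returnflag,
       COUNT(*)             AS row_count,
       SUM(l_extendedprice) AS revenue
FROM lineitem
WHERE l_shipdate >= '1995-01-01'
  AND l_shipdate <  '1996-01-01'
GROUP BY l_returnflag
ORDER BY revenue DESC
"""
# no JOIN clause: one fact table, a time-range filter, grouped aggregates
print("JOIN" in sql.upper())
```

Queries requiring joins across TPC-H tables were excluded precisely because, as noted earlier, Druid operates on fully denormalized data.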
@ -832,7 +838,7 @@ well.
\begin{figure}
\centering
\includegraphics[width = 2.8in]{tpch_scaling}
\caption{Druid scaling benchmarks -- 100GB TPC-H data.}
\label{fig:tpch_scaling}
\end{figure}
@ -860,7 +866,7 @@ characteristics.
\label{tab:ingest_datasources}
\begin{tabular}{| l | l | l | l |}
\hline
\scriptsize\textbf{Data Source} & \scriptsize\textbf{Dimensions} & \scriptsize\textbf{Metrics} & \scriptsize\textbf{Peak events/s} \\ \hline
\texttt{s} & 7 & 2 & 28334.60 \\ \hline
\texttt{t} & 10 & 7 & 68808.70 \\ \hline
\texttt{u} & 5 & 1 & 49933.93 \\ \hline
@ -882,7 +888,7 @@ Figure~\ref{fig:ingestion_rate}. We define throughput as the number of events a
real-time node can ingest and also make queryable. If too many events are sent
to the real-time node, those events are blocked until the real-time node has
capacity to accept them. The peak ingestion throughput we measured in production
was 22914.43 events/s/core on a datasource with 30 dimensions and 19 metrics,
running on an Amazon \texttt{cc2.8xlarge} instance.

\begin{figure}
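The blocking behavior described above is classic backpressure. A minimal sketch (not Druid's implementation) using a bounded queue: producers stall whenever the stand-in for the real-time node has no capacity to accept more events.

```python
# Backpressure sketch: a bounded buffer models the real-time node's
# ingest capacity; put() blocks the producer until a slot frees up.
import queue
import threading

events = queue.Queue(maxsize=4)   # bounded buffer = ingest capacity

def ingest(event):
    # blocks the caller while the queue is full
    events.put(event, block=True)

def indexer():
    # stand-in for the real-time node draining events and making
    # them queryable; None is a shutdown sentinel
    while True:
        e = events.get()
        if e is None:
            break

t = threading.Thread(target=indexer)
t.start()
for i in range(100):
    ingest(i)                     # stalls whenever the buffer is full
events.put(None)
t.join()
```

Measured peak throughput under this scheme is then simply the drain rate of the consumer, which is why throughput falls as the per-event work (more dimensions and metrics) grows.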
@ -932,7 +938,7 @@ factor of cost. It is extremely rare to see more than 2 nodes fail at once and
never recover and hence, we leave enough capacity to completely reassign the
data from 2 historical nodes.

\paragraph{Data Center Outages}
Complete cluster failures are possible, but extremely rare. When running
in a single data center, it is possible for the entire data center to fail. In
such a case, a new cluster needs to be created. As long as deep storage is
@ -960,8 +966,9 @@ made in production and what users are most interested in.
\subsection{Pairing Druid with a Stream Processor}
At the time of writing, Druid can only understand fully denormalized data
streams. In order to provide full business logic in production, Druid can be
paired with a stream processor such as Apache Storm \cite{marz2013storm}.
A Storm topology consumes events from a data stream, retains only those that are
``on-time,'' and applies any relevant business logic. This could range from
simple transformations, such as id-to-name lookups, up to complex operations
such as multi-stream joins. The Storm topology forwards the processed event
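The on-time filter and id-to-name lookup described above can be sketched as plain functions; the window size and lookup table here are hypothetical, and a real Storm topology would express these stages as bolts rather than Python functions.

```python
# Sketch of the pre-Druid stream-processing stage: drop late events,
# enrich the rest with a simple id-to-name lookup.
import time

WINDOW_SECONDS = 10 * 60                  # assumed "on-time" window
ID_TO_NAME = {7: "sports", 12: "news"}    # hypothetical lookup table

def process(event, now=None):
    """Return the enriched event, or None if it arrived too late."""
    now = time.time() if now is None else now
    if now - event["timestamp"] > WINDOW_SECONDS:
        return None                       # late event: dropped, not forwarded
    enriched = dict(event)
    enriched["category"] = ID_TO_NAME.get(event["category_id"], "unknown")
    return enriched

now = 1_000_000.0
on_time = {"timestamp": now - 30, "category_id": 7}
late = {"timestamp": now - 3600, "category_id": 7}
print(process(on_time, now)["category"])  # sports
print(process(late, now))                 # None
```

Only the non-None results would be forwarded on to Druid's real-time nodes for ingestion.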