fix typos

Xavier Léauté 2014-03-21 14:33:44 -07:00
parent e523bfc237
commit 7093495a9f
1 changed file with 8 additions and 8 deletions

@@ -43,7 +43,7 @@ created a surge in machine-generated events. Individually, these
 events contain minimal useful information and are of low value. Given the
 time and resources required to extract meaning from large collections of
 events, many companies were willing to discard this data instead. Although
-infrastructure has been built to handle event based data (e.g. IBM's
+infrastructure has been built to handle event-based data (e.g. IBM's
 Netezza\cite{singh2011introduction}, HP's Vertica\cite{bear2012vertica}, and EMC's
 Greenplum\cite{miner2012unified}), they are largely sold at high price points
 and are only targeted towards those companies who can afford the offering.
@@ -146,7 +146,7 @@ Relational Database Management Systems (RDBMS) and NoSQL key/value stores were
 unable to provide a low latency data ingestion and query platform for
 interactive applications \cite{tschetter2011druid}. In the early days of
 Metamarkets, we were focused on building a hosted dashboard that would allow
-users to arbitrary explore and visualize event streams. The data store
+users to arbitrarily explore and visualize event streams. The data store
 powering the dashboard needed to return queries fast enough that the data
 visualizations built on top of it could provide users with an interactive
 experience.
@@ -198,7 +198,7 @@ Figure~\ref{fig:cluster}.
 Real-time nodes encapsulate the functionality to ingest and query event
 streams. Events indexed via these nodes are immediately available for querying.
 The nodes are only concerned with events for some small time range and
-periodically hand off immutable batches of events they've collected over this
+periodically hand off immutable batches of events they have collected over this
 small time range to other nodes in the Druid cluster that are specialized in
 dealing with batches of immutable events. Real-time nodes leverage Zookeeper
 \cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
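
The hunk above describes the real-time node lifecycle: index incoming events so they are queryable immediately, then periodically hand off an immutable batch covering a small time range to nodes that serve immutable data. A minimal Java sketch of that pattern follows; the class and method names (RealtimeNode, handOff, HistoricalStore) are hypothetical stand-ins for the described behavior, not Druid's actual classes.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative sketch only: names are hypothetical, not Druid's real API.
    class Event {
        final long timestamp;
        Event(long timestamp) { this.timestamp = timestamp; }
    }

    interface HistoricalStore {
        void accept(List<Event> immutableBatch);
    }

    class RealtimeNode {
        // In-memory buffer for the current small time window; events placed
        // here are visible to queries as soon as ingest() returns.
        private final ConcurrentLinkedQueue<Event> currentWindow =
            new ConcurrentLinkedQueue<>();

        void ingest(Event e) {
            currentWindow.add(e);
        }

        List<Event> query(long startMillis, long endMillis) {
            List<Event> result = new ArrayList<>();
            for (Event e : currentWindow) {
                if (e.timestamp >= startMillis && e.timestamp < endMillis) {
                    result.add(e);
                }
            }
            return result;
        }

        // Invoked periodically: drain the window into an immutable batch and
        // hand it off to nodes specialized in serving immutable segments.
        void handOff(HistoricalStore store) {
            List<Event> batch = new ArrayList<>();
            Event e;
            while ((e = currentWindow.poll()) != null) {
                batch.add(e);
            }
            store.accept(List.copyOf(batch));
        }
    }
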
@@ -789,7 +789,7 @@ approximately 10TB of segments loaded. Collectively,
 there are about 50 billion Druid rows in this tier. Results for
 every data source are not shown.
-\item The hot tier uses Xeon E5-2670 processors and consists of 1302 processing
+\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
 threads and 672 total cores (hyperthreaded).
 \item A memory-mapped storage engine was used (the machine was configured to
@@ -828,7 +828,7 @@ comparison, we also provide the results of the same queries using MySQL using th
 MyISAM engine (InnoDB was slower in our experiments).
 We selected MySQL to benchmark
-against because of its universal popularity. We choose not to select another
+against because of its universal popularity. We chose not to select another
 open source column store because we were not confident we could correctly tune
 it for optimal performance.
@@ -933,9 +933,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
 \label{fig:ingestion_rate}
 \end{figure}
-The latency measurements we presented are sufficient to address the our stated
+The latency measurements we presented are sufficient to address the stated
 problems of interactivity. We would prefer the variability in the latencies to
-be less. It is still very possible to possible to decrease latencies by adding
+be less. It is still very possible to decrease latencies by adding
 additional hardware, but we have not chosen to do so because infrastructure
 costs are still a consideration to us.
@@ -1017,7 +1017,7 @@ data centers as well. The tier configuration in Druid coordinator nodes allow
 for segments to be replicated across multiple tiers. Hence, segments can be
 exactly replicated across historical nodes in multiple data centers.
 Similarily, query preference can be assigned to different tiers. It is possible
-to have nodes in one data center act as a primary cluster (and recieve all
+to have nodes in one data center act as a primary cluster (and receive all
 queries) and have a redundant cluster in another data center. Such a setup may
 be desired if one data center is situated much closer to users.
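
As a rough illustration of the tier configuration this last hunk refers to, the sketch below models per-tier replicant counts in Java. The tier names and the TierReplicationRule type are hypothetical; Druid's coordinator exposes this idea through its own rule configuration, which is not reproduced here.

    import java.util.Map;

    // Hedged sketch: models assigning a replicant count per tier, in the
    // spirit of coordinator load rules. All names are illustrative only.
    class TierReplicationRule {
        final Map<String, Integer> tieredReplicants;

        TierReplicationRule(Map<String, Integer> tieredReplicants) {
            this.tieredReplicants = tieredReplicants;
        }
    }

    public class CrossDataCenterExample {
        public static void main(String[] args) {
            // One replica of every segment in each data center's tier, so the
            // redundant cluster can serve all queries if the primary fails.
            TierReplicationRule rule = new TierReplicationRule(
                Map.of("primary_dc", 1, "redundant_dc", 1));
            System.out.println(rule.tieredReplicants);
        }
    }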