fix typos

2014-03-21 14:33:44 -07:00 · 7093495a9f
parent e523bfc237
commit 7093495a9f
1 changed file with 8 additions and 8 deletions
--- a/publications/whitepaper/druid.tex
+++ b/publications/whitepaper/druid.tex
@@ -43,7 +43,7 @@ created a surge in machine-generated events.  Individually, these
 events contain minimal useful information and are of low value.  Given the
 time and resources required to extract meaning from large collections of
 events, many companies were willing to discard this data instead.  Although
-infrastructure has been built to handle event based data (e.g. IBM's
+infrastructure has been built to handle event-based data (e.g. IBM's
 Netezza\cite{singh2011introduction}, HP's Vertica\cite{bear2012vertica}, and EMC's
 Greenplum\cite{miner2012unified}), they are largely sold at high price points
 and are only targeted towards those companies who can afford the offering.
@@ -146,7 +146,7 @@ Relational Database Management Systems (RDBMS) and NoSQL key/value stores were
 unable to provide a low latency data ingestion and query platform for
 interactive applications \cite{tschetter2011druid}. In the early days of
 Metamarkets, we were focused on building a hosted dashboard that would allow
-users to arbitrary explore and visualize event streams.  The data store
+users to arbitrarily explore and visualize event streams.  The data store
 powering the dashboard needed to return queries fast enough that the data
 visualizations built on top of it could provide users with an interactive
 experience. 
@@ -198,7 +198,7 @@ Figure~\ref{fig:cluster}.
 Real-time nodes encapsulate the functionality to ingest and query event
 streams. Events indexed via these nodes are immediately available for querying.
 The nodes are only concerned with events for some small time range and
-periodically hand off immutable batches of events they've collected over this
+periodically hand off immutable batches of events they have collected over this
 small time range to other nodes in the Druid cluster that are specialized in
 dealing with batches of immutable events. Real-time nodes leverage Zookeeper
 \cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
@@ -789,7 +789,7 @@ approximately 10TB of segments loaded. Collectively,
 there are about 50 billion Druid rows in this tier. Results for
 every data source are not shown.

-\item The hot tier uses Xeon E5-2670 processors and consists of 1302 processing
+\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
 threads and 672 total cores (hyperthreaded).

 \item A memory-mapped storage engine was used (the machine was configured to
@@ -828,7 +828,7 @@ comparison, we also provide the results of the same queries using MySQL using th
 MyISAM engine (InnoDB was slower in our experiments).

 We selected MySQL to benchmark
-against because of its universal popularity. We choose not to select another
+against because of its universal popularity. We chose not to select another
 open source column store because we were not confident we could correctly tune
 it for optimal performance.

@@ -933,9 +933,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
 \label{fig:ingestion_rate}
 \end{figure}

-The latency measurements we presented are sufficient to address the our stated
+The latency measurements we presented are sufficient to address the stated
 problems of interactivity. We would prefer the variability in the latencies to
-be less. It is still very possible to possible to decrease latencies by adding
+be less. It is still very possible to decrease latencies by adding
 additional hardware, but we have not chosen to do so because infrastructure
 costs are still a consideration to us.

@@ -1017,7 +1017,7 @@ data centers as well. The tier configuration in Druid coordinator nodes allow
 for segments to be replicated across multiple tiers. Hence, segments can be
 exactly replicated across historical nodes in multiple data centers.
 Similarily, query preference can be assigned to different tiers. It is possible
-to have nodes in one data center act as a primary cluster (and recieve all
+to have nodes in one data center act as a primary cluster (and receive all
 queries) and have a redundant cluster in another data center. Such a setup may
 be desired if one data center is situated much closer to users.