fix typos

This commit is contained in:
Xavier Léauté 2014-03-21 14:33:44 -07:00
parent e523bfc237
commit 7093495a9f
1 changed files with 8 additions and 8 deletions

View File

@ -43,7 +43,7 @@ created a surge in machine-generated events. Individually, these
events contain minimal useful information and are of low value. Given the
time and resources required to extract meaning from large collections of
events, many companies were willing to discard this data instead. Although
infrastructure has been built to handle event based data (e.g. IBM's
infrastructure has been built to handle event-based data (e.g. IBM's
Netezza\cite{singh2011introduction}, HP's Vertica\cite{bear2012vertica}, and EMC's
Greenplum\cite{miner2012unified}), they are largely sold at high price points
and are only targeted towards those companies who can afford the offering.
@ -146,7 +146,7 @@ Relational Database Management Systems (RDBMS) and NoSQL key/value stores were
unable to provide a low latency data ingestion and query platform for
interactive applications \cite{tschetter2011druid}. In the early days of
Metamarkets, we were focused on building a hosted dashboard that would allow
users to arbitrary explore and visualize event streams. The data store
users to arbitrarily explore and visualize event streams. The data store
powering the dashboard needed to return queries fast enough that the data
visualizations built on top of it could provide users with an interactive
experience.
@ -198,7 +198,7 @@ Figure~\ref{fig:cluster}.
Real-time nodes encapsulate the functionality to ingest and query event
streams. Events indexed via these nodes are immediately available for querying.
The nodes are only concerned with events for some small time range and
periodically hand off immutable batches of events they've collected over this
periodically hand off immutable batches of events they have collected over this
small time range to other nodes in the Druid cluster that are specialized in
dealing with batches of immutable events. Real-time nodes leverage Zookeeper
\cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
@ -789,7 +789,7 @@ approximately 10TB of segments loaded. Collectively,
there are about 50 billion Druid rows in this tier. Results for
every data source are not shown.
\item The hot tier uses Xeon E5-2670 processors and consists of 1302 processing
\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
threads and 672 total cores (hyperthreaded).
\item A memory-mapped storage engine was used (the machine was configured to
@ -828,7 +828,7 @@ comparison, we also provide the results of the same queries using MySQL using th
MyISAM engine (InnoDB was slower in our experiments).
We selected MySQL to benchmark
against because of its universal popularity. We choose not to select another
against because of its universal popularity. We chose not to select another
open source column store because we were not confident we could correctly tune
it for optimal performance.
@ -933,9 +933,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
\label{fig:ingestion_rate}
\end{figure}
The latency measurements we presented are sufficient to address the our stated
The latency measurements we presented are sufficient to address the stated
problems of interactivity. We would prefer the variability in the latencies to
be less. It is still very possible to possible to decrease latencies by adding
be less. It is still very possible to decrease latencies by adding
additional hardware, but we have not chosen to do so because infrastructure
costs are still a consideration to us.
@ -1017,7 +1017,7 @@ data centers as well. The tier configuration in Druid coordinator nodes allow
for segments to be replicated across multiple tiers. Hence, segments can be
exactly replicated across historical nodes in multiple data centers.
Similarily, query preference can be assigned to different tiers. It is possible
to have nodes in one data center act as a primary cluster (and recieve all
to have nodes in one data center act as a primary cluster (and receive all
queries) and have a redundant cluster in another data center. Such a setup may
be desired if one data center is situated much closer to users.