mirror of https://github.com/apache/druid.git
fix typos
commit 7093495a9f
parent e523bfc237
@@ -43,7 +43,7 @@ created a surge in machine-generated events. Individually, these
events contain minimal useful information and are of low value. Given the
time and resources required to extract meaning from large collections of
events, many companies were willing to discard this data instead. Although
-infrastructure has been built to handle event based data (e.g. IBM's
+infrastructure has been built to handle event-based data (e.g. IBM's
Netezza\cite{singh2011introduction}, HP's Vertica\cite{bear2012vertica}, and EMC's
Greenplum\cite{miner2012unified}), they are largely sold at high price points
and are only targeted towards those companies who can afford the offering.
@@ -146,7 +146,7 @@ Relational Database Management Systems (RDBMS) and NoSQL key/value stores were
unable to provide a low latency data ingestion and query platform for
interactive applications \cite{tschetter2011druid}. In the early days of
Metamarkets, we were focused on building a hosted dashboard that would allow
-users to arbitrary explore and visualize event streams. The data store
+users to arbitrarily explore and visualize event streams. The data store
powering the dashboard needed to return queries fast enough that the data
visualizations built on top of it could provide users with an interactive
experience.
@@ -198,7 +198,7 @@ Figure~\ref{fig:cluster}.
Real-time nodes encapsulate the functionality to ingest and query event
streams. Events indexed via these nodes are immediately available for querying.
The nodes are only concerned with events for some small time range and
-periodically hand off immutable batches of events they've collected over this
+periodically hand off immutable batches of events they have collected over this
small time range to other nodes in the Druid cluster that are specialized in
dealing with batches of immutable events. Real-time nodes leverage Zookeeper
\cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
@@ -789,7 +789,7 @@ approximately 10TB of segments loaded. Collectively,
there are about 50 billion Druid rows in this tier. Results for
every data source are not shown.

-\item The hot tier uses Xeon E5-2670 processors and consists of 1302 processing
+\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
threads and 672 total cores (hyperthreaded).

\item A memory-mapped storage engine was used (the machine was configured to
@@ -828,7 +828,7 @@ comparison, we also provide the results of the same queries using MySQL using the
MyISAM engine (InnoDB was slower in our experiments).

We selected MySQL to benchmark
-against because of its universal popularity. We choose not to select another
+against because of its universal popularity. We chose not to select another
open source column store because we were not confident we could correctly tune
it for optimal performance.
@@ -933,9 +933,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
\label{fig:ingestion_rate}
\end{figure}

-The latency measurements we presented are sufficient to address the our stated
+The latency measurements we presented are sufficient to address the stated
problems of interactivity. We would prefer the variability in the latencies to
-be less. It is still very possible to possible to decrease latencies by adding
+be less. It is still very possible to decrease latencies by adding
additional hardware, but we have not chosen to do so because infrastructure
costs are still a consideration to us.
@@ -1017,7 +1017,7 @@ data centers as well. The tier configuration in Druid coordinator nodes allow
for segments to be replicated across multiple tiers. Hence, segments can be
exactly replicated across historical nodes in multiple data centers.
Similarily, query preference can be assigned to different tiers. It is possible
-to have nodes in one data center act as a primary cluster (and recieve all
+to have nodes in one data center act as a primary cluster (and receive all
queries) and have a redundant cluster in another data center. Such a setup may
be desired if one data center is situated much closer to users.