mirror of https://github.com/apache/druid.git
fix typos
parent e523bfc237
commit 7093495a9f
@@ -43,7 +43,7 @@ created a surge in machine-generated events. Individually, these
 events contain minimal useful information and are of low value. Given the
 time and resources required to extract meaning from large collections of
 events, many companies were willing to discard this data instead. Although
-infrastructure has been built to handle event based data (e.g. IBM's
+infrastructure has been built to handle event-based data (e.g. IBM's
 Netezza\cite{singh2011introduction}, HP's Vertica\cite{bear2012vertica}, and EMC's
 Greenplum\cite{miner2012unified}), they are largely sold at high price points
 and are only targeted towards those companies who can afford the offering.
@@ -146,7 +146,7 @@ Relational Database Management Systems (RDBMS) and NoSQL key/value stores were
 unable to provide a low latency data ingestion and query platform for
 interactive applications \cite{tschetter2011druid}. In the early days of
 Metamarkets, we were focused on building a hosted dashboard that would allow
-users to arbitrary explore and visualize event streams. The data store
+users to arbitrarily explore and visualize event streams. The data store
 powering the dashboard needed to return queries fast enough that the data
 visualizations built on top of it could provide users with an interactive
 experience.
@@ -198,7 +198,7 @@ Figure~\ref{fig:cluster}.
 Real-time nodes encapsulate the functionality to ingest and query event
 streams. Events indexed via these nodes are immediately available for querying.
 The nodes are only concerned with events for some small time range and
-periodically hand off immutable batches of events they've collected over this
+periodically hand off immutable batches of events they have collected over this
 small time range to other nodes in the Druid cluster that are specialized in
 dealing with batches of immutable events. Real-time nodes leverage Zookeeper
 \cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
@@ -789,7 +789,7 @@ approximately 10TB of segments loaded. Collectively,
 there are about 50 billion Druid rows in this tier. Results for
 every data source are not shown.
 
-\item The hot tier uses Xeon E5-2670 processors and consists of 1302 processing
+\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
 threads and 672 total cores (hyperthreaded).
 
 \item A memory-mapped storage engine was used (the machine was configured to
@@ -828,7 +828,7 @@ comparison, we also provide the results of the same queries using MySQL using th
 MyISAM engine (InnoDB was slower in our experiments).
 
 We selected MySQL to benchmark
-against because of its universal popularity. We choose not to select another
+against because of its universal popularity. We chose not to select another
 open source column store because we were not confident we could correctly tune
 it for optimal performance.
 
@@ -933,9 +933,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
 \label{fig:ingestion_rate}
 \end{figure}
 
-The latency measurements we presented are sufficient to address the our stated
+The latency measurements we presented are sufficient to address the stated
 problems of interactivity. We would prefer the variability in the latencies to
-be less. It is still very possible to possible to decrease latencies by adding
+be less. It is still very possible to decrease latencies by adding
 additional hardware, but we have not chosen to do so because infrastructure
 costs are still a consideration to us.
 
@@ -1017,7 +1017,7 @@ data centers as well. The tier configuration in Druid coordinator nodes allow
 for segments to be replicated across multiple tiers. Hence, segments can be
 exactly replicated across historical nodes in multiple data centers.
 Similarily, query preference can be assigned to different tiers. It is possible
-to have nodes in one data center act as a primary cluster (and recieve all
+to have nodes in one data center act as a primary cluster (and receive all
 queries) and have a redundant cluster in another data center. Such a setup may
 be desired if one data center is situated much closer to users.
 