mirror of https://github.com/apache/druid.git
typos, wording, fix overfull hboxes
This commit is contained in:
parent
bff902150e
commit
35099e4970
|
@ -73,7 +73,7 @@ aggregations, flexible filters, and low latency data ingestion.
|
||||||
% A category with the (minimum) three required fields
|
% A category with the (minimum) three required fields
|
||||||
\category{H.2.4}{Database Management}{Systems}[Distributed databases]
|
\category{H.2.4}{Database Management}{Systems}[Distributed databases]
|
||||||
% \category{D.2.8}{Software Engineering}{Metrics}[complexity measures, performance measures]
|
% \category{D.2.8}{Software Engineering}{Metrics}[complexity measures, performance measures]
|
||||||
\keywords{distributed; real-time; fault-tolerant; analytics; column-oriented; OLAP}
|
\keywords{distributed; real-time; fault-tolerant; highly available; open source; analytics; column-oriented; OLAP}
|
||||||
|
|
||||||
|
|
||||||
\section{Introduction}
|
\section{Introduction}
|
||||||
|
@ -96,7 +96,7 @@ large amounts of log data. Hadoop has contributed much to helping companies
|
||||||
convert their low-value event streams into high-value aggregates for a variety
|
convert their low-value event streams into high-value aggregates for a variety
|
||||||
of applications such as business intelligence and A-B testing.
|
of applications such as business intelligence and A-B testing.
|
||||||
|
|
||||||
As with a lot of great systems, Hadoop has opened our eyes to a new space of
|
As with many great systems, Hadoop has opened our eyes to a new space of
|
||||||
problems. Specifically, Hadoop excels at storing and providing access to large
|
problems. Specifically, Hadoop excels at storing and providing access to large
|
||||||
amounts of data, however, it does not make any performance guarantees around
|
amounts of data, however, it does not make any performance guarantees around
|
||||||
how quickly that data can be accessed. Furthermore, although Hadoop is a
|
how quickly that data can be accessed. Furthermore, although Hadoop is a
|
||||||
|
@ -113,7 +113,7 @@ needs. We explored different solutions in the space, and after
|
||||||
trying both Relational Database Management Systems and NoSQL architectures, we
|
trying both Relational Database Management Systems and NoSQL architectures, we
|
||||||
came to the conclusion that there was nothing in the open source world that
|
came to the conclusion that there was nothing in the open source world that
|
||||||
could be fully leveraged for our requirements. We ended up creating Druid, an
|
could be fully leveraged for our requirements. We ended up creating Druid, an
|
||||||
open-source, distributed, column-oriented, real-time analytical data store. In
|
open source, distributed, column-oriented, real-time analytical data store. In
|
||||||
many ways, Druid shares similarities with other OLAP systems
|
many ways, Druid shares similarities with other OLAP systems
|
||||||
\cite{oehler2012ibm, schrader2009oracle, lachev2005applied},
|
\cite{oehler2012ibm, schrader2009oracle, lachev2005applied},
|
||||||
interactive query systems \cite{melnik2010dremel}, main-memory databases
|
interactive query systems \cite{melnik2010dremel}, main-memory databases
|
||||||
|
@ -196,7 +196,7 @@ system is unavailable in the face of software upgrades or network failure.
|
||||||
Downtime for startups, who often lack proper internal operations management, can
|
Downtime for startups, who often lack proper internal operations management, can
|
||||||
determine business success or failure.
|
determine business success or failure.
|
||||||
|
|
||||||
Finally, another key problem that Metamarkets faced in its early days was to
|
Finally, another challenge that Metamarkets faced in its early days was to
|
||||||
allow users and alerting systems to be able to make business decisions in
|
allow users and alerting systems to be able to make business decisions in
|
||||||
``real-time". The time from when an event is created to when that event is
|
``real-time". The time from when an event is created to when that event is
|
||||||
queryable determines how fast interested parties are able to react to
|
queryable determines how fast interested parties are able to react to
|
||||||
|
@ -240,11 +240,11 @@ periodically hand off immutable batches of events they have collected over this
|
||||||
small time range to other nodes in the Druid cluster that are specialized in
|
small time range to other nodes in the Druid cluster that are specialized in
|
||||||
dealing with batches of immutable events. Real-time nodes leverage Zookeeper
|
dealing with batches of immutable events. Real-time nodes leverage Zookeeper
|
||||||
\cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
|
\cite{hunt2010zookeeper} for coordination with the rest of the Druid cluster.
|
||||||
The nodes announce their online state and the data they are serving in
|
The nodes announce their online state and the data they serve in
|
||||||
Zookeeper.
|
Zookeeper.
|
||||||
|
|
||||||
Real-time nodes maintain an in-memory index buffer for all incoming events.
|
Real-time nodes maintain an in-memory index buffer for all incoming events.
|
||||||
These indexes are incrementally populated as new events are ingested and the
|
These indexes are incrementally populated as events are ingested and the
|
||||||
indexes are also directly queryable. Druid behaves as a row store
|
indexes are also directly queryable. Druid behaves as a row store
|
||||||
for queries on events that exist in this JVM heap-based buffer. To avoid heap
|
for queries on events that exist in this JVM heap-based buffer. To avoid heap
|
||||||
overflow problems, real-time nodes persist their in-memory indexes to disk
|
overflow problems, real-time nodes persist their in-memory indexes to disk
|
||||||
|
@ -269,7 +269,7 @@ Queries will hit both the in-memory and persisted indexes.
|
||||||
On a periodic basis, each real-time node will schedule a background task that
|
On a periodic basis, each real-time node will schedule a background task that
|
||||||
searches for all locally persisted indexes. The task merges these indexes
|
searches for all locally persisted indexes. The task merges these indexes
|
||||||
together and builds an immutable block of data that contains all the events
|
together and builds an immutable block of data that contains all the events
|
||||||
that have ingested by a real-time node for some span of time. We refer to this
|
that have been ingested by a real-time node for some span of time. We refer to this
|
||||||
block of data as a ``segment". During the handoff stage, a real-time node
|
block of data as a ``segment". During the handoff stage, a real-time node
|
||||||
uploads this segment to a permanent backup storage, typically a distributed
|
uploads this segment to a permanent backup storage, typically a distributed
|
||||||
file system such as S3 \cite{decandia2007dynamo} or HDFS
|
file system such as S3 \cite{decandia2007dynamo} or HDFS
|
||||||
|
@ -337,7 +337,7 @@ multiple real-time nodes can read events. Multiple real-time nodes can ingest
|
||||||
the same set of events from the bus, creating a replication of events. In a
|
the same set of events from the bus, creating a replication of events. In a
|
||||||
scenario where a node completely fails and loses disk, replicated streams
|
scenario where a node completely fails and loses disk, replicated streams
|
||||||
ensure that no data is lost. A single ingestion endpoint also allows for data
|
ensure that no data is lost. A single ingestion endpoint also allows for data
|
||||||
streams for be partitioned such that multiple real-time nodes each ingest a
|
streams to be partitioned such that multiple real-time nodes each ingest a
|
||||||
portion of a stream. This allows additional real-time nodes to be seamlessly
|
portion of a stream. This allows additional real-time nodes to be seamlessly
|
||||||
added. In practice, this model has allowed one of the largest production Druid
|
added. In practice, this model has allowed one of the largest production Druid
|
||||||
clusters to be able to consume raw data at approximately 500 MB/s (150,000
|
clusters to be able to consume raw data at approximately 500 MB/s (150,000
|
||||||
|
@ -394,9 +394,9 @@ can also be created with much less powerful backing hardware. The
|
||||||
|
|
||||||
\subsubsection{Availability}
|
\subsubsection{Availability}
|
||||||
Historical nodes depend on Zookeeper for segment load and unload instructions.
|
Historical nodes depend on Zookeeper for segment load and unload instructions.
|
||||||
If Zookeeper becomes unavailable, historical nodes are no longer able to serve
|
Should Zookeeper become unavailable, historical nodes are no longer able to serve
|
||||||
new data and drop outdated data, however, because the queries are served over
|
new data or drop outdated data, however, because the queries are served over
|
||||||
HTTP, historical nodes are still be able to respond to query requests for
|
HTTP, historical nodes are still able to respond to query requests for
|
||||||
the data they are currently serving. This means that Zookeeper outages do not
|
the data they are currently serving. This means that Zookeeper outages do not
|
||||||
impact current data availability on historical nodes.
|
impact current data availability on historical nodes.
|
||||||
|
|
||||||
|
@ -500,7 +500,7 @@ source, recency, and size. The exact details of the algorithm are beyond the
|
||||||
scope of this paper and may be discussed in future literature.
|
scope of this paper and may be discussed in future literature.
|
||||||
|
|
||||||
\subsubsection{Replication}
|
\subsubsection{Replication}
|
||||||
Coordinator nodes may tell different historical nodes to load copies of the
|
Coordinator nodes may tell different historical nodes to load a copy of the
|
||||||
same segment. The number of replicates in each tier of the historical compute
|
same segment. The number of replicates in each tier of the historical compute
|
||||||
cluster is fully configurable. Setups that require high levels of fault
|
cluster is fully configurable. Setups that require high levels of fault
|
||||||
tolerance can be configured to have a high number of replicas. Replicated
|
tolerance can be configured to have a high number of replicas. Replicated
|
||||||
|
@ -513,7 +513,7 @@ cluster. Over the last two years, we have never taken downtime in our Druid
|
||||||
cluster for software upgrades.
|
cluster for software upgrades.
|
||||||
|
|
||||||
\subsubsection{Availability}
|
\subsubsection{Availability}
|
||||||
Druid coordinator nodes have two external dependencies: Zookeeper and MySQL.
|
Druid coordinator nodes have Zookeeper and MySQL as external dependencies.
|
||||||
Coordinator nodes rely on Zookeeper to determine what historical nodes already
|
Coordinator nodes rely on Zookeeper to determine what historical nodes already
|
||||||
exist in the cluster. If Zookeeper becomes unavailable, the coordinator will no
|
exist in the cluster. If Zookeeper becomes unavailable, the coordinator will no
|
||||||
longer be able to send instructions to assign, balance, and drop segments.
|
longer be able to send instructions to assign, balance, and drop segments.
|
||||||
|
@ -558,7 +558,7 @@ range.
|
||||||
|
|
||||||
Druid segments are stored in a column orientation. Given that Druid is best
|
Druid segments are stored in a column orientation. Given that Druid is best
|
||||||
used for aggregating event streams (all data going into Druid must have a
|
used for aggregating event streams (all data going into Druid must have a
|
||||||
timestamp), the advantages storing aggregate information as columns rather than
|
timestamp), the advantages of storing aggregate information as columns rather than
|
||||||
rows are well documented \cite{abadi2008column}. Column storage allows for more
|
rows are well documented \cite{abadi2008column}. Column storage allows for more
|
||||||
efficient CPU usage as only what is needed is actually loaded and scanned. In a
|
efficient CPU usage as only what is needed is actually loaded and scanned. In a
|
||||||
row oriented data store, all columns associated with a row must be scanned as
|
row oriented data store, all columns associated with a row must be scanned as
|
||||||
|
@ -573,7 +573,7 @@ contain strings. Storing strings directly is unnecessarily costly and string
|
||||||
columns can be dictionary encoded instead. Dictionary encoding is a common
|
columns can be dictionary encoded instead. Dictionary encoding is a common
|
||||||
method to compress data and has been used in other data stores such as
|
method to compress data and has been used in other data stores such as
|
||||||
PowerDrill \cite{hall2012processing}. In the example in
|
PowerDrill \cite{hall2012processing}. In the example in
|
||||||
Table~\ref{tab:sample_data}, we can map each page to an unique integer
|
Table~\ref{tab:sample_data}, we can map each page to a unique integer
|
||||||
identifier.
|
identifier.
|
||||||
{\small\begin{verbatim}
|
{\small\begin{verbatim}
|
||||||
Justin Bieber -> 0
|
Justin Bieber -> 0
|
||||||
|
@ -607,7 +607,7 @@ representations.
|
||||||
In many real world OLAP workflows, queries are issued for the aggregated
|
In many real world OLAP workflows, queries are issued for the aggregated
|
||||||
results of some set of metrics where some set of dimension specifications are
|
results of some set of metrics where some set of dimension specifications are
|
||||||
met. An example query is: ``How many Wikipedia edits were done by users in
|
met. An example query is: ``How many Wikipedia edits were done by users in
|
||||||
San Francisco who are also male?". This query is filtering the Wikipedia data
|
San Francisco who are also male?" This query is filtering the Wikipedia data
|
||||||
set in Table~\ref{tab:sample_data} based on a Boolean expression of dimension
|
set in Table~\ref{tab:sample_data} based on a Boolean expression of dimension
|
||||||
values. In many real world data sets, dimension columns contain strings and
|
values. In many real world data sets, dimension columns contain strings and
|
||||||
metric columns contain numeric values. Druid creates additional lookup
|
metric columns contain numeric values. Druid creates additional lookup
|
||||||
|
@ -734,7 +734,7 @@ equal to ``Ke\$ha". The results will be bucketed by day and will be a JSON array
|
||||||
} ]
|
} ]
|
||||||
\end{verbatim}}
|
\end{verbatim}}
|
||||||
|
|
||||||
Druid supports many types of aggregations including double sums, long sums,
|
Druid supports many types of aggregations including sums on floating-point and integer types,
|
||||||
minimums, maximums, and complex aggregations such as cardinality estimation and
|
minimums, maximums, and complex aggregations such as cardinality estimation and
|
||||||
approximate quantile estimation. The results of aggregations can be combined
|
approximate quantile estimation. The results of aggregations can be combined
|
||||||
in mathematical expressions to form other aggregations. It is beyond the scope
|
in mathematical expressions to form other aggregations. It is beyond the scope
|
||||||
|
@ -756,8 +756,8 @@ The reasons for this decision are generally two-fold.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
A join query is essentially the merging of two or more streams of data based on
|
A join query is essentially the merging of two or more streams of data based on
|
||||||
a shared set of keys. The primary high-level strategies for join queries the
|
a shared set of keys. The primary high-level strategies for join queries we
|
||||||
authors are aware of are a hash-based strategy or a sorted-merge strategy. The
|
are aware of are a hash-based strategy or a sorted-merge strategy. The
|
||||||
hash-based strategy requires that all but one data set be available as
|
hash-based strategy requires that all but one data set be available as
|
||||||
something that looks like a hash table, a lookup operation is then performed on
|
something that looks like a hash table, a lookup operation is then performed on
|
||||||
this hash table for every row in the ``primary" stream. The sorted-merge
|
this hash table for every row in the ``primary" stream. The sorted-merge
|
||||||
|
@ -770,16 +770,15 @@ When all sides of the join are significantly large tables (> 1 billion records),
|
||||||
materializing the pre-join streams requires complex distributed memory
|
materializing the pre-join streams requires complex distributed memory
|
||||||
management. The complexity of the memory management is only amplified by
|
management. The complexity of the memory management is only amplified by
|
||||||
the fact that we are targeting highly concurrent, multitenant workloads.
|
the fact that we are targeting highly concurrent, multitenant workloads.
|
||||||
This is, as far as the authors are aware, an active academic research
|
This is, as far as we are aware, an active academic research
|
||||||
problem that we would be more than willing to engage with the academic
|
problem that we would be willing to help resolve in a scalable manner.
|
||||||
community to help resolving in a scalable manner.
|
|
||||||
|
|
||||||
|
|
||||||
\section{Performance}
|
\section{Performance}
|
||||||
\label{sec:benchmarks}
|
\label{sec:benchmarks}
|
||||||
Druid runs in production at several organizations, and to demonstrate its
|
Druid runs in production at several organizations, and to demonstrate its
|
||||||
performance, we have chosen to share some real world numbers for the main production
|
performance, we have chosen to share some real world numbers for the main production
|
||||||
cluster running at Metamarkets in early 2014. For comparison with other databases
|
cluster running at Metamarkets as of early 2014. For comparison with other databases
|
||||||
we also include results from synthetic workloads on TPC-H data.
|
we also include results from synthetic workloads on TPC-H data.
|
||||||
|
|
||||||
\subsection{Query Performance in Production}
|
\subsection{Query Performance in Production}
|
||||||
|
@ -789,7 +788,7 @@ based on a given metric is much more expensive than a simple count over a time
|
||||||
range. To showcase the average query latencies in a production Druid cluster,
|
range. To showcase the average query latencies in a production Druid cluster,
|
||||||
we selected 8 of our most queried data sources, described in Table~\ref{tab:datasources}.
|
we selected 8 of our most queried data sources, described in Table~\ref{tab:datasources}.
|
||||||
|
|
||||||
Approximately 30\% of the queries are standard
|
Approximately 30\% of queries are standard
|
||||||
aggregates involving different types of metrics and filters, 60\% of queries
|
aggregates involving different types of metrics and filters, 60\% of queries
|
||||||
are ordered group bys over one or more dimensions with aggregates, and 10\% of
|
are ordered group bys over one or more dimensions with aggregates, and 10\% of
|
||||||
queries are search queries and metadata retrieval queries. The number of
|
queries are search queries and metadata retrieval queries. The number of
|
||||||
|
@ -827,7 +826,7 @@ approximately 10TB of segments loaded. Collectively,
|
||||||
there are about 50 billion Druid rows in this tier. Results for
|
there are about 50 billion Druid rows in this tier. Results for
|
||||||
every data source are not shown.
|
every data source are not shown.
|
||||||
|
|
||||||
\item The hot tier uses Intel Xeon E5-2670 processors and consists of 1302 processing
|
\item The hot tier uses Intel\textsuperscript{\textregistered} Xeon\textsuperscript{\textregistered} E5-2670 processors and consists of 1302 processing
|
||||||
threads and 672 total cores (hyperthreaded).
|
threads and 672 total cores (hyperthreaded).
|
||||||
|
|
||||||
\item A memory-mapped storage engine was used (the machine was configured to
|
\item A memory-mapped storage engine was used (the machine was configured to
|
||||||
|
@ -871,8 +870,8 @@ open source column store because we were not confident we could correctly tune
|
||||||
it for optimal performance.
|
it for optimal performance.
|
||||||
|
|
||||||
Our Druid setup used Amazon EC2
|
Our Druid setup used Amazon EC2
|
||||||
\texttt{m3.2xlarge} (Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz) instances for
|
\texttt{m3.2xlarge} instance types (Intel\textsuperscript{\textregistered} Xeon\textsuperscript{\textregistered} E5-2680 v2 @ 2.80GHz) for
|
||||||
historical nodes and \texttt{c3.2xlarge} (Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz) instances for broker
|
historical nodes and \texttt{c3.2xlarge} instances (Intel\textsuperscript{\textregistered} Xeon\textsuperscript{\textregistered} E5-2670 v2 @ 2.50GHz) for broker
|
||||||
nodes. Our MySQL setup was an Amazon RDS instance that ran on the same \texttt{m3.2xlarge} instance type.
|
nodes. Our MySQL setup was an Amazon RDS instance that ran on the same \texttt{m3.2xlarge} instance type.
|
||||||
|
|
||||||
The results for the 1 GB TPC-H data set are shown
|
The results for the 1 GB TPC-H data set are shown
|
||||||
|
@ -918,7 +917,7 @@ well.
|
||||||
To showcase Druid's data ingestion latency, we selected several production
|
To showcase Druid's data ingestion latency, we selected several production
|
||||||
datasources of varying dimensions, metrics, and event volumes. Our production
|
datasources of varying dimensions, metrics, and event volumes. Our production
|
||||||
ingestion setup consists of 6 nodes, totalling 360GB of RAM and 96 cores
|
ingestion setup consists of 6 nodes, totalling 360GB of RAM and 96 cores
|
||||||
(12 x Intel Xeon E5-2670).
|
(12 x Intel\textsuperscript\textregistered Xeon\textsuperscript\textregistered E5-2670).
|
||||||
|
|
||||||
Note that in this setup, several other data sources were being ingested and
|
Note that in this setup, several other data sources were being ingested and
|
||||||
many other Druid related ingestion tasks were running concurrently on the machines.
|
many other Druid related ingestion tasks were running concurrently on the machines.
|
||||||
|
@ -974,9 +973,9 @@ running an Amazon \texttt{cc2.8xlarge} instance.
|
||||||
|
|
||||||
The latency measurements we presented are sufficient to address the stated
|
The latency measurements we presented are sufficient to address the stated
|
||||||
problems of interactivity. We would prefer the variability in the latencies to
|
problems of interactivity. We would prefer the variability in the latencies to
|
||||||
be less. It is still very possible to decrease latencies by adding
|
be less. It is still possible to decrease latencies by adding
|
||||||
additional hardware, but we have not chosen to do so because infrastructure
|
additional hardware, but we have not chosen to do so because infrastructure
|
||||||
costs are still a consideration to us.
|
costs are still a consideration for us.
|
||||||
|
|
||||||
\section{Druid in Production}\label{sec:production}
|
\section{Druid in Production}\label{sec:production}
|
||||||
Over the last few years, we have gained tremendous knowledge about handling
|
Over the last few years, we have gained tremendous knowledge about handling
|
||||||
|
@ -988,8 +987,8 @@ explore use case, the number of queries issued by a single user are much higher
|
||||||
than in the reporting use case. Exploratory queries often involve progressively
|
than in the reporting use case. Exploratory queries often involve progressively
|
||||||
adding filters for the same time range to narrow down results. Users tend to
|
adding filters for the same time range to narrow down results. Users tend to
|
||||||
explore short time intervals of recent data. In the generate report use case,
|
explore short time intervals of recent data. In the generate report use case,
|
||||||
users query for much longer data intervals, but users also already know the
|
users query for much longer data intervals, but those queries are generally few
|
||||||
queries they want to issue.
|
and pre-determined.
|
||||||
|
|
||||||
\paragraph{Multitenancy}
|
\paragraph{Multitenancy}
|
||||||
Expensive concurrent queries can be problematic in a multitenant
|
Expensive concurrent queries can be problematic in a multitenant
|
||||||
|
@ -1005,7 +1004,7 @@ interactivity in this use case as when they are exploring data.
|
||||||
\paragraph{Node failures}
|
\paragraph{Node failures}
|
||||||
Single node failures are common in distributed environments, but many nodes
|
Single node failures are common in distributed environments, but many nodes
|
||||||
failing at once are not. If historical nodes completely fail and do not
|
failing at once are not. If historical nodes completely fail and do not
|
||||||
recover, their segments need to reassigned, which means we need excess cluster
|
recover, their segments need to be reassigned, which means we need excess cluster
|
||||||
capacity to load this data. The amount of additional capacity to have at any
|
capacity to load this data. The amount of additional capacity to have at any
|
||||||
time contributes to the cost of running a cluster. From our experiences, it is
|
time contributes to the cost of running a cluster. From our experiences, it is
|
||||||
extremely rare to see more than 2 nodes completely fail at once and hence, we
|
extremely rare to see more than 2 nodes completely fail at once and hence, we
|
||||||
|
@ -1016,10 +1015,10 @@ historical nodes.
|
||||||
Complete cluster failures are possible, but extremely rare. If Druid is
|
Complete cluster failures are possible, but extremely rare. If Druid is
|
||||||
only deployed in a single data center, it is possible for the entire data
|
only deployed in a single data center, it is possible for the entire data
|
||||||
center to fail. In such cases, new machines need to be provisioned. As long as
|
center to fail. In such cases, new machines need to be provisioned. As long as
|
||||||
deep storage is still available, cluster recovery time is network bound as
|
deep storage is still available, cluster recovery time is network bound, as
|
||||||
historical nodes simply need to redownload every segment from deep storage. We
|
historical nodes simply need to redownload every segment from deep storage. We
|
||||||
have experienced such failures in the past, and the recovery time was around
|
have experienced such failures in the past, and the recovery time was
|
||||||
several hours in the AWS ecosystem for several TBs of data.
|
several hours in the Amazon AWS ecosystem for several terabytes of data.
|
||||||
|
|
||||||
\subsection{Operational Monitoring}
|
\subsection{Operational Monitoring}
|
||||||
Proper monitoring is critical to run a large scale distributed cluster.
|
Proper monitoring is critical to run a large scale distributed cluster.
|
||||||
|
@ -1035,16 +1034,16 @@ performance and stability of the production cluster. This dedicated metrics
|
||||||
cluster has allowed us to find numerous production problems, such as gradual
|
cluster has allowed us to find numerous production problems, such as gradual
|
||||||
query speed degregations, less than optimally tuned hardware, and various other
|
query speed degregations, less than optimally tuned hardware, and various other
|
||||||
system bottlenecks. We also use a metrics cluster to analyze what queries are
|
system bottlenecks. We also use a metrics cluster to analyze what queries are
|
||||||
made in production and what users are most interested in.
|
made in production and what aspects of the data users are most interested in.
|
||||||
|
|
||||||
\subsection{Pairing Druid with a Stream Processor}
|
\subsection{Pairing Druid with a Stream Processor}
|
||||||
At the time of writing, Druid can only understand fully denormalized data
|
Currently, Druid can only understand fully denormalized data
|
||||||
streams. In order to provide full business logic in production, Druid can be
|
streams. In order to provide full business logic in production, Druid can be
|
||||||
paired with a stream processor such as Apache Storm \cite{marz2013storm}.
|
paired with a stream processor such as Apache Storm \cite{marz2013storm}.
|
||||||
|
|
||||||
A Storm topology consumes events from a data stream, retains only those that are
|
A Storm topology consumes events from a data stream, retains only those that are
|
||||||
“on-time”, and applies any relevant business logic. This could range from
|
“on-time”, and applies any relevant business logic. This could range from
|
||||||
simple transformations, such as id to name lookups, up to complex operations
|
simple transformations, such as id to name lookups, to complex operations
|
||||||
such as multi-stream joins. The Storm topology forwards the processed event
|
such as multi-stream joins. The Storm topology forwards the processed event
|
||||||
stream to Druid in real-time. Storm handles the streaming data processing work,
|
stream to Druid in real-time. Storm handles the streaming data processing work,
|
||||||
and Druid is used for responding to queries for both real-time and
|
and Druid is used for responding to queries for both real-time and
|
||||||
|
@ -1075,14 +1074,14 @@ Although Druid builds on many of the same principles as other distributed
|
||||||
columnar data stores \cite{fink2012distributed}, many of these data stores are
|
columnar data stores \cite{fink2012distributed}, many of these data stores are
|
||||||
designed to be more generic key-value stores \cite{lakshman2010cassandra} and do not
|
designed to be more generic key-value stores \cite{lakshman2010cassandra} and do not
|
||||||
support computation directly in the storage layer. There are also other data
|
support computation directly in the storage layer. There are also other data
|
||||||
stores designed for some of the same of the data warehousing issues that Druid
|
stores designed for some of the same data warehousing issues that Druid
|
||||||
is meant to solve. These systems include include in-memory databases such as
|
is meant to solve. These systems include in-memory databases such as
|
||||||
SAP’s HANA \cite{farber2012sap} and VoltDB \cite{voltdb2010voltdb}. These data
|
SAP’s HANA \cite{farber2012sap} and VoltDB \cite{voltdb2010voltdb}. These data
|
||||||
stores lack Druid's low latency ingestion characteristics. Druid also has
|
stores lack Druid's low latency ingestion characteristics. Druid also has
|
||||||
native analytical features baked in, similar to \cite{paraccel2013}, however,
|
native analytical features baked in, similar to ParAccel \cite{paraccel2013}, however,
|
||||||
Druid allows system wide rolling software updates with no downtime.
|
Druid allows system wide rolling software updates with no downtime.
|
||||||
|
|
||||||
Druid is similiar to \cite{stonebraker2005c, cipar2012lazybase} in that it has
|
Druid is similiar to C-Store \cite{stonebraker2005c} and LazyBase \cite{cipar2012lazybase} in that it has
|
||||||
two subsystems, a read-optimized subsystem in the historical nodes and a
|
two subsystems, a read-optimized subsystem in the historical nodes and a
|
||||||
write-optimized subsystem in real-time nodes. Real-time nodes are designed to
|
write-optimized subsystem in real-time nodes. Real-time nodes are designed to
|
||||||
ingest a high volume of append heavy data, and do not support data updates.
|
ingest a high volume of append heavy data, and do not support data updates.
|
||||||
|
@ -1090,7 +1089,7 @@ Unlike the two aforementioned systems, Druid is meant for OLAP transactions and
|
||||||
not OLTP transactions.
|
not OLTP transactions.
|
||||||
|
|
||||||
Druid's low latency data ingestion features share some similarities with
|
Druid's low latency data ingestion features share some similarities with
|
||||||
Trident/Storm \cite{marz2013storm} and Streaming Spark
|
Trident/Storm \cite{marz2013storm} and Spark Streaming
|
||||||
\cite{zaharia2012discretized}, however, both systems are focused on stream
|
\cite{zaharia2012discretized}, however, both systems are focused on stream
|
||||||
processing whereas Druid is focused on ingestion and aggregation. Stream
|
processing whereas Druid is focused on ingestion and aggregation. Stream
|
||||||
processors are great complements to Druid as a means of pre-processing the data
|
processors are great complements to Druid as a means of pre-processing the data
|
||||||
|
@ -1111,7 +1110,7 @@ stores \cite{macnicol2004sybase}.
|
||||||
|
|
||||||
\section{Conclusions}
|
\section{Conclusions}
|
||||||
\label{sec:conclusions}
|
\label{sec:conclusions}
|
||||||
In this paper, we presented Druid, a distributed, column-oriented, real-time
|
In this paper we presented Druid, a distributed, column-oriented, real-time
|
||||||
analytical data store. Druid is designed to power high performance applications
|
analytical data store. Druid is designed to power high performance applications
|
||||||
and is optimized for low query latencies. Druid supports streaming data
|
and is optimized for low query latencies. Druid supports streaming data
|
||||||
ingestion and is fault-tolerant. We discussed Druid benchmarks and
|
ingestion and is fault-tolerant. We discussed Druid benchmarks and
|
||||||
|
|
Loading…
Reference in New Issue