mirror of https://github.com/apache/druid.git
Added titles and harmonized docs to improve usability and SEO (#6731)
* added titles and harmonized docs
* manually fixed some titles
parent 55914687bb
commit da4836f38c
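Nearly every hunk below follows the same pattern: a `title` entry is added to the page's Jekyll front matter, and the page's top-level heading is harmonized to a single ATX `#` heading (replacing an old setext-style heading, or an inconsistent heading level or wording). A rough before/after sketch of that pattern, using the first file below as the example (exact blank-line placement varies per file):

Before:

    ---
    layout: doc_page
    ---
    Druid vs Elasticsearch
    ======================

After:

    ---
    layout: doc_page
    title: "Druid vs Elasticsearch"
    ---

    # Druid vs Elasticsearch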
@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs Elasticsearch"
---

Druid vs Elasticsearch
======================
# Druid vs Elasticsearch

We are not experts on search systems, if anything is incorrect about our portrayal, please let us know on the mailing list or via some other means.

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)"
---

Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)
====================================================
# Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)

Druid is highly optimized for scans and aggregations, it supports arbitrarily deep drill downs into data sets. This same functionality
is supported in key/value stores in 2 ways:

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs Kudu"
---

Druid vs Kudu
=============
# Druid vs Kudu

Kudu's storage format enables single row updates, whereas updates to existing Druid segments requires recreating the segment, so theoretically
the process for updating old values should be higher latency in Druid. However, the requirements in Kudu for maintaining extra head space to store

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs Redshift"
---

Druid vs Redshift
=================
# Druid vs Redshift

### How does Druid compare to Redshift?

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs Spark"
---

Druid vs Spark
==============
# Druid vs Spark

Druid and Spark are complementary solutions as Druid can be used to accelerate OLAP queries in Spark.

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid vs SQL-on-Hadoop"
---

Druid vs SQL-on-Hadoop (Impala/Drill/Spark SQL/Presto)
===========================================================
# Druid vs SQL-on-Hadoop (Impala/Drill/Spark SQL/Presto)

SQL-on-Hadoop engines provide an
execution engine for various data formats and data stores, and
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Configuration Reference"
---

# Configuration Reference

This page documents all of the configuration properties for each Druid service type.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Logging"
---

Logging
==========================
# Logging

Druid nodes will emit logs that are useful for debugging to the console. Druid nodes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.html#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.

@@ -19,10 +19,10 @@
---
layout: doc_page
title: "Realtime Node Configuration"
---

# Realtime Node Configuration

Realtime Node Configuration
==============================
For general Realtime Node information, see [here](../design/realtime.html).

Runtime Configuration
@@ -19,9 +19,12 @@
---
layout: doc_page
title: "Cassandra Deep Storage"
---

# Cassandra Deep Storage

## Introduction

Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
`index_storage` and `descriptor_storage`. Underneath the hood, the Cassandra integration leverages Astyanax. The
index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Deep Storage"
---

# Deep Storage

Deep storage is where segments are stored. It is a storage mechanism that Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid nodes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Metadata Storage"
---

# Metadata Storage

The Metadata Storage is an external dependency of Druid. Druid uses it to store

@@ -19,8 +19,10 @@
---
layout: doc_page
title: "ZooKeeper"
---

# ZooKeeper

Druid uses [ZooKeeper](http://zookeeper.apache.org/) (ZK) for management of current cluster state. The operations that happen over ZK are

1. [Coordinator](../design/coordinator.html) leader election
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Authentication and Authorization"
---

# Authentication and Authorization

|Property|Type|Description|Default|Required|

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Broker"
---

Broker
======
# Broker

### Configuration

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Coordinator Node"
---

Coordinator Node
================
# Coordinator Node

### Configuration

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Historical Node"
---

Historical Node
===============
# Historical Node

### Configuration
@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Design"
---

# What is Druid?<a id="what-is-druid"></a>

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Indexing Service"
---

Indexing Service
================
# Indexing Service

The indexing service is a highly-available, distributed service that runs indexing related tasks.

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "MiddleManager Node"
---

Middle Manager Node
------------------
# MiddleManager Node

### Configuration

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Overlord Node"
---

Overlord Node
-------------
# Overlord Node

### Configuration

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Peons"
---

Peons
-----
# Peons

### Configuration
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Druid Plumbers"
---

# Druid Plumbers

The plumber handles generated segments both while they are being generated and when they are "done". This is also technically a pluggable interface and there are multiple implementations. However, plumbers handle numerous complex details, and therefore an advanced understanding of Druid is recommended before implementing your own.

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Real-time Node"
---

Real-time Node
==============
# Real-time Node

<div class="note info">
NOTE: Realtime nodes are deprecated. Please use the <a href="../development/extensions-core/kafka-ingestion.html">Kafka Indexing Service</a> for stream pull use cases instead.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Segments"
---

Segments
========
# Segments

Druid stores its index in *segment files*, which are partitioned by
time. In a basic setup, one segment file is created for each time

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Build from Source"
---

### Build from Source
# Build from Source

You can build Druid directly from source. Please note that these instructions are for building the latest stable version of Druid.
For building the latest code in master, follow the instructions [here](https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md).

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Experimental Features"
---

# About Experimental Features
# Experimental Features

Experimental features are features we have developed but have not fully tested in a production environment. If you choose to try them out, there will likely be edge cases that we have not covered. We would love feedback on any of these features, whether they are bug reports, suggestions for improvement, or letting us know they work as intended.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Ambari Metrics Emitter"
---

# Ambari Metrics Emitter

To use this extension, make sure to [include](../../operations/including-extensions.html) `ambari-metrics-emitter` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Microsoft Azure"
---

# Microsoft Azure

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-azure-extensions` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Apache Cassandra"
---

# Apache Cassandra

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-cassandra-storage` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Rackspace Cloud Files"
---

# Rackspace Cloud Files

## Deep Storage

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DistinctCount Aggregator"
---

# DistinctCount aggregator
# DistinctCount Aggregator

To use this extension, make sure to [include](../../operations/including-extensions.html) the `druid-distinctcount` extension.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Google Cloud Storage"
---

# Google Cloud Storage

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-google-extensions` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Graphite Emitter"
---

# Graphite Emitter

To use this extension, make sure to [include](../../operations/including-extensions.html) `graphite-emitter` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "InfluxDB Line Protocol Parser"
---

# InfluxDB Line Protocol Parser

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-influx-extensions`.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Kafka Emitter"
---

# Kafka Emitter

To use this extension, make sure to [include](../../operations/including-extensions.html) `kafka-emitter` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Kafka Simple Consumer"
---

# Kafka Simple Consumer

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight-simpleConsumer` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Materialized View"
---

# Materialized View

To use this feature, make sure to only load materialized-view-selection on broker and load materialized-view-maintenance on overlord. In addtion, this feature currently requires a hadoop cluster.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "OpenTSDB Emitter"
---

# Opentsdb Emitter
# OpenTSDB Emitter

To use this extension, make sure to [include](../../operations/including-extensions.html) `opentsdb-emitter` extension.
@@ -19,15 +19,15 @@
---
layout: doc_page
title: "ORC"
---

# Orc
# ORC

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-orc-extensions`.

This extension enables Druid to ingest and understand the Apache Orc data format offline.
This extension enables Druid to ingest and understand the Apache ORC data format offline.

## Orc Hadoop Parser
## ORC Hadoop Parser

This is for batch ingestion using the HadoopDruidIndexer. The inputFormat of inputSpec in ioConfig must be set to `"org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat"`.

@@ -35,7 +35,7 @@ This is for batch ingestion using the HadoopDruidIndexer. The inputFormat of inp
|----------|-------------|----------------------------------------------------------------------------------------|---------|
|type | String | This should say `orc` | yes|
|parseSpec | JSON Object | Specifies the timestamp and dimensions of the data. Any parse spec that extends ParseSpec is possible but only their TimestampSpec and DimensionsSpec are used. | yes|
|typeString| String | String representation of Orc struct type info. If not specified, auto constructed from parseSpec but all metric columns are dropped | no|
|typeString| String | String representation of ORC struct type info. If not specified, auto constructed from parseSpec but all metric columns are dropped | no|
|mapFieldNameFormat| String | String format for resolving the flatten map fields. Default is `<PARENT>_<CHILD>`. | no |

For example of `typeString`, string column col1 and array of string column col2 is represented by `"struct<col1:string,col2:array<string>>"`.
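As an aside, the parser fields listed in the table above (`type`, `parseSpec`, `typeString`) combine with the `inputFormat` mentioned earlier into a Hadoop ingestion spec roughly like the sketch below. This example is not part of this commit; the `index_hadoop`/`ioConfig`/`dataSchema` wrapping, the data source name, the path, and the column names are illustrative assumptions only:

```json
{
  "type": "index_hadoop",
  "spec": {
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "static",
        "inputFormat": "org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat",
        "paths": "/path/to/example.orc"
      }
    },
    "dataSchema": {
      "dataSource": "example_orc",
      "parser": {
        "type": "orc",
        "parseSpec": {
          "format": "timeAndDims",
          "timestampSpec": {"column": "timestamp", "format": "auto"},
          "dimensionsSpec": {"dimensions": ["col1", "col2"]}
        },
        "typeString": "struct<timestamp:string,col1:string,col2:array<string>>"
      }
    }
  }
}
```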
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "RabbitMQ"
---

# RabbitMQ

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-rabbitmq` extension.

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Druid Redis Cache"
---

Druid Redis Cache
--------------------
# Druid Redis Cache

A cache implementation for Druid based on [Redis](https://github.com/antirez/redis).

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "RocketMQ"
---

# RocketMQ

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-rocketmq` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Microsoft SQLServer"
---

# Microsoft SQLServer

Make sure to [include](../../operations/including-extensions.html) `sqlserver-metadata-storage` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "StatsD Emitter"
---

# StatsD Emitter

To use this extension, make sure to [include](../../operations/including-extensions.html) `statsd-emitter` extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Thrift"
---

# Thrift

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-thrift-extensions`.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Timestamp Min/Max aggregators"
---

# Timestamp Min/Max aggregators

To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-time-min-max`.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Approximate Histogram aggregator"
---

# Approximate Histogram aggregator

Make sure to [include](../../operations/including-extensions.html) `druid-histogram` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Avro"
---

# Avro

This extension enables Druid to ingest and understand the Apache Avro data format. Make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` as an extension.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Bloom Filter"
---

# Druid Bloom Filter
# Bloom Filter

Make sure to [include](../../operations/including-extensions.html) `druid-bloom-filter` as an extension.

@@ -37,7 +37,8 @@ Following are some characterstics of BloomFilter
Internally, this implementation of bloom filter uses Murmur3 fast non-cryptographic hash algorithm.

### Json Representation of Bloom Filter
### JSON Representation of Bloom Filter

```json
{
  "type" : "bloom",
@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DataSketches extension"
---

## DataSketches extension
# DataSketches extension

Druid aggregators based on [datasketches](http://datasketches.github.io/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DataSketches HLL Sketch module"
---

## DataSketches HLL Sketch module
# DataSketches HLL Sketch module

This module provides Druid aggregators for distinct counting based on HLL sketch from [datasketches](http://datasketches.github.io/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
You can use the HLL sketch aggregator on columns of any identifiers. It will return estimated cardinality of the column.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DataSketches Quantiles Sketch module"
---

## DataSketches Quantiles Sketch module
# DataSketches Quantiles Sketch module

This module provides Druid aggregators based on numeric quantiles DoublesSketch from [datasketches](http://datasketches.github.io/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cummulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.github.io/docs/Quantiles/QuantilesOverview.html).

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DataSketches Theta Sketch module"
---

## DataSketches Theta Sketch module
# DataSketches Theta Sketch module

This module provides Druid aggregators based on Theta sketch from [datasketches](http://datasketches.github.io/) library. Note that sketch algorithms are approximate; see details in the "Accuracy" section of the datasketches doc.
At ingestion time, this aggregator creates the Theta sketch objects which get stored in Druid segments. Logically speaking, a Theta sketch object can be thought of as a Set data structure. At query time, sketches are read and aggregated (set unioned) together. In the end, by default, you receive the estimate of the number of unique entries in the sketch object. Also, you can use post aggregators to do union, intersection or difference on sketch columns in the same row.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "DataSketches Tuple Sketch module"
---

## DataSketches Tuple Sketch module
# DataSketches Tuple Sketch module

This module provides Druid aggregators based on Tuple sketch from [datasketches](http://datasketches.github.io/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Basic Security"
---

# Druid Basic Security

This extension adds:

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Kerberos"
---

# Druid-Kerberos
# Kerberos

Druid Extension to enable Authentication for Druid Nodes using Kerberos.
This extension adds an Authenticator which is used to protect HTTP Endpoints using the simple and protected GSSAPI negotiation mechanism [SPNEGO](https://en.wikipedia.org/wiki/SPNEGO).

@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Cached Lookup Module"
---

# Cached Lookup Module

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Extension Examples"
---

# Druid examples
# Extension Examples

## TwitterSpritzerFirehose
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "HDFS"
---

# HDFS

Make sure to [include](../../operations/including-extensions.html) `druid-hdfs-storage` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Kafka Eight Firehose"
---

# Kafka Eight Firehose

Make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Kafka Lookups"
---

# Kafka Lookups

<div class="note caution">

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Kafka Indexing Service"
---

# Kafka Indexing Service

The Kafka indexing service enables the configuration of *supervisors* on the Overlord, which facilitate ingestion from

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Globally Cached Lookups"
---

# Globally Cached Lookups

<div class="note caution">

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "MySQL Metadata Store"
---

# MySQL Metadata Store

Make sure to [include](../../operations/including-extensions.html) `mysql-metadata-storage` as an extension.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Druid Parquet Extension"
---

# Druid Parquet Extension

This module extends [Druid Hadoop based indexing](../../ingestion/hadoop.html) to ingest data directly from offline

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "PostgreSQL Metadata Store"
---

# PostgreSQL Metadata Store

Make sure to [include](../../operations/including-extensions.html) `postgresql-metadata-storage` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Protobuf"
---

# Protobuf

This extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../operations/including-extensions.html) `druid-protobuf-extensions` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "S3-compatible"
---

# S3-compatible

Make sure to [include](../../operations/including-extensions.html) `druid-s3-extensions` as an extension.

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Simple SSLContext Provider Module"
---

## Simple SSLContext Provider Module
# Simple SSLContext Provider Module

This module contains a simple implementation of [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html)
that will be injected to be used with HttpClient that Druid nodes use internally to communicate with each other. To learn more about
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Stats aggregator"
---

# Stats aggregator

Includes stat-related aggregators, including variance and standard deviations, etc. Make sure to [include](../../operations/including-extensions.html) `druid-stats` as an extension.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Test Stats Aggregators"
---

# Test Stats Aggregators

Incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Druid extensions"
---

# Druid extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions

@@ -19,8 +19,10 @@
---
layout: doc_page
title: "Geographic Queries"
---

# Geographic Queries

Druid supports filtering specially spatially indexed columns based on an origin and a bound.

# Spatial Indexing
@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Integrating Druid With Other Technologies"
---

# Integrating Druid With Other Technologies

@@ -19,6 +19,7 @@
---
layout: doc_page
title: "JavaScript Programming Guide"
---

# JavaScript Programming Guide

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Extending Druid With Custom Modules"
---

# Extending Druid With Custom Modules

Druid uses a module system that allows for the addition of extensions at runtime.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Developing on Druid"
---

# Developing on Druid

Druid's codebase consists of several major components. For developers interested in learning the code, this document provides

@@ -19,10 +19,9 @@
---
layout: doc_page
title: "Router Node"
---

Router Node
===========
# Router Node

You should only ever need the router node if you have a Druid cluster well into the terabyte range. The router node can be used to route queries to different broker nodes. By default, the broker routes queries based on how [Rules](../operations/rule-configuration.html) are set up. For example, if 1 month of recent data is loaded into a `hot` cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This set up provides query isolation such that queries for more important data are not impacted by queries for less important data.
@@ -19,8 +19,10 @@
---
layout: doc_page
title: "Versioning Druid"
---

# Versioning Druid

This page discusses how we do versioning and provides information on our stable releases.

Versioning Strategy

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Batch Data Ingestion"
---

# Batch Data Ingestion

Druid can load data from static files through a variety of methods described here.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Command Line Hadoop Indexer"
---

# Command Line Hadoop Indexer

To run:

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Compaction Task"
---

# Compaction Task

Compaction tasks merge all segments of the given interval. The syntax is:
@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Data Formats for Ingestion"
---

Data Formats for Ingestion
==========================
# Data Formats for Ingestion

Druid can ingest denormalized data in JSON, CSV, or a delimited form such as TSV, or any custom format. While most examples in the documentation use data in JSON format, it is not difficult to configure Druid to ingest any other delimited data.
We welcome any contributions to new formats.

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Deleting Data"
---

# Deleting Data

Permanent deletion of a Druid segment has two steps:

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "My Data isn't being loaded"
---

## My Data isn't being loaded
# My Data isn't being loaded

### Realtime Ingestion

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Druid Firehoses"
---

# Druid Firehoses

Firehoses are used in [native batch ingestion tasks](../ingestion/native_tasks.html), stream push tasks automatically created by [Tranquility](../ingestion/stream-push.html), and the [stream-pull (deprecated)](../ingestion/stream-pull.html) ingestion model.
@@ -19,8 +19,8 @@
---
layout: doc_page
title: "JSON Flatten Spec"
---

# JSON Flatten Spec

| Field | Type | Description | Required |

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Hadoop-based Batch Ingestion"
---

# Hadoop-based Batch Ingestion

Hadoop-based batch ingestion in Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Ingestion"
---

# Ingestion

## Overview

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Ingestion Spec"
---

# Ingestion Spec

A Druid ingestion spec consists of 3 components:

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Task Locking & Priority"
---

# Task Locking & Priority

## Locking

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Miscellaneous Tasks"
---

# Miscellaneous Tasks

## Noop Task
@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Native Index Tasks"
---

# Native Index Tasks

@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Ingestion Reports"
---

# Ingestion Reports

@@ -19,6 +19,7 @@
---
layout: doc_page
title: "Schema Changes"
---

# Schema Changes

@@ -19,8 +19,8 @@
---
layout: doc_page
title: "Schema Design"
---

# Schema Design

This page is meant to assist users in designing a schema for data to be ingested in Druid. Druid intakes denormalized data

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Loading Streams"
---

# Loading streams
# Loading Streams

Streams can be ingested in Druid using either [Tranquility](https://github.com/druid-io/tranquility) (a Druid-aware
client) or the [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.html).
@@ -19,14 +19,14 @@
---
layout: doc_page
title: "Stream Pull Ingestion"
---

<div class="note info">
NOTE: Realtime nodes are deprecated. Please use the <a href="../development/extensions-core/kafka-ingestion.html">Kafka Indexing Service</a> for stream pull use cases instead.
</div>

Stream Pull Ingestion
=====================
# Stream Pull Ingestion

If you have an external service that you want to pull data from, you have two options. The simplest
option is to set up a "copying" service that reads from the data source and writes to Druid using

@@ -19,9 +19,9 @@
---
layout: doc_page
title: "Stream Push"
---

## Stream Push
# Stream Push

Druid can connect to any streaming data source through
[Tranquility](https://github.com/druid-io/tranquility/blob/master/README.md), a package for pushing
Some files were not shown because too many files have changed in this diff.