Add more Apache branding to docs (#7515)

Jonathan Wei 2019-04-19 15:52:26 -07:00 committed by Fangjin Yang
parent 9929f8b022
commit 74960e82bf
167 changed files with 195 additions and 189 deletions

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs Elasticsearch"
+title: "Apache Druid (incubating) vs Elasticsearch"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)"
+title: "Apache Druid (incubating) vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs Kudu"
+title: "Apache Druid (incubating) vs Kudu"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs Redshift"
+title: "Apache Druid (incubating) vs Redshift"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs Spark"
+title: "Apache Druid (incubating) vs Spark"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid vs SQL-on-Hadoop"
+title: "Apache Druid (incubating) vs SQL-on-Hadoop"
---
<!--

@@ -24,7 +24,7 @@ title: "Configuration Reference"
# Configuration Reference
-This page documents all of the configuration properties for each Druid service type.
+This page documents all of the configuration properties for each Apache Druid (incubating) service type.
## Table of Contents
* [Recommended Configuration File Organization](#recommended-configuration-file-organization)

@@ -24,7 +24,7 @@ title: "Logging"
# Logging
-Druid processes will emit logs that are useful for debugging to the console. Druid processes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.html#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.
+Apache Druid (incubating) processes will emit logs that are useful for debugging to the console. Druid processes also emit periodic metrics about their state. For more about metrics, see [Configuration](../configuration/index.html#enabling-metrics). Metric logs are printed to the console by default, and can be disabled with `-Ddruid.emitter.logging.logLevel=debug`.
Druid uses [log4j2](http://logging.apache.org/log4j/2.x/) for logging. Logging can be configured with a log4j2.xml file. Add the path to the directory containing the log4j2.xml file (e.g. the _common/ dir) to your classpath if you want to override default Druid log configuration. Note that this directory should be earlier in the classpath than the druid jars. The easiest way to do this is to prefix the classpath with the config dir.
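A sketch of the classpath-prefix approach described above, reusing the server command that appears later in this diff (the `conf/druid/_common` directory name is an assumption; substitute wherever your log4j2.xml actually lives):
```
java -classpath conf/druid/_common:lib/* org.apache.druid.cli.Main server broker
```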

@@ -24,7 +24,7 @@ title: "Realtime Process Configuration"
# Realtime Process Configuration
-For general Realtime Process information, see [here](../design/realtime.html).
+For general Apache Druid (incubating) Realtime Process information, see [here](../design/realtime.html).
Runtime Configuration
---------------------

@@ -26,7 +26,7 @@ title: "Cassandra Deep Storage"
## Introduction
-Druid can use Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
+Apache Druid (incubating) can use Apache Cassandra as a deep storage mechanism. Segments and their metadata are stored in Cassandra in two tables:
`index_storage` and `descriptor_storage`. Under the hood, the Cassandra integration leverages Astyanax. The
index storage table is a [Chunked Object](https://github.com/Netflix/astyanax/wiki/Chunked-Object-Store) repository. It contains
compressed segments for distribution to Historical processes. Since segments can be large, the Chunked Object storage allows the integration to multi-thread

@@ -24,7 +24,7 @@ title: "Deep Storage"
# Deep Storage
-Deep storage is where segments are stored. It is a storage mechanism that Druid does not provide. This deep storage infrastructure defines the level of durability of your data, as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.
+Deep storage is where segments are stored. It is a storage mechanism that Apache Druid (incubating) does not provide. This deep storage infrastructure defines the level of durability of your data; as long as Druid processes can see this storage infrastructure and get at the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, then you will lose whatever data those segments represented.
## Local Mount
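For the local mount case, deep storage is pointed at a directory via two properties in `common.runtime.properties`; a minimal sketch (the path is a placeholder):
```
druid.storage.type=local
druid.storage.storageDirectory=/mnt/druid/segments
```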

@@ -24,7 +24,7 @@ title: "Metadata Storage"
# Metadata Storage
-The Metadata Storage is an external dependency of Druid. Druid uses it to store
+The Metadata Storage is an external dependency of Apache Druid (incubating). Druid uses it to store
various metadata about the system, but not to store the actual data. There are
a number of tables used for various purposes described below.

@@ -24,7 +24,7 @@ title: "ZooKeeper"
# ZooKeeper
-Druid uses [ZooKeeper](http://zookeeper.apache.org/) (ZK) for management of current cluster state. The operations that happen over ZK are
+Apache Druid (incubating) uses [Apache ZooKeeper](http://zookeeper.apache.org/) (ZK) for management of current cluster state. The operations that happen over ZK are
1. [Coordinator](../design/coordinator.html) leader election
2. Segment "publishing" protocol from [Historical](../design/historical.html) and [Realtime](../design/realtime.html)

@@ -24,6 +24,8 @@ title: "Authentication and Authorization"
# Authentication and Authorization
+This document describes non-extension specific Apache Druid (incubating) authentication and authorization configurations.
|Property|Type|Description|Default|Required|
|--------|-----------|--------|--------|--------|
|`druid.auth.authenticatorChain`|JSON List of Strings|List of Authenticator type names|["allowAll"]|no|
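As a sketch of how the chain property above is used, a single named authenticator might be declared as follows (the `MyBasicAuthenticator` name is hypothetical, and the `basic` type assumes the `druid-basic-security` extension is loaded):
```
druid.auth.authenticatorChain=["MyBasicAuthenticator"]
druid.auth.authenticator.MyBasicAuthenticator.type=basic
```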

@@ -26,7 +26,7 @@ title: "Broker"
### Configuration
-For Broker Process Configuration, see [Broker Configuration](../configuration/index.html#broker).
+For Apache Druid (incubating) Broker Process Configuration, see [Broker Configuration](../configuration/index.html#broker).
### HTTP endpoints
@@ -45,7 +45,7 @@ org.apache.druid.cli.Main server broker
### Forwarding Queries
-Most druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [Segments](../design/segments.html) are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple processes, and hence, the query will likely hit multiple processes.
+Most Druid queries contain an interval object that indicates a span of time for which data is requested. Likewise, Druid [Segments](../design/segments.html) are partitioned to contain data for some interval of time and segments are distributed across a cluster. Consider a simple datasource with 7 segments where each segment contains data for a given day of the week. Any query issued to the datasource for more than one day of data will hit more than one segment. These segments will likely be distributed across multiple processes, and hence, the query will likely hit multiple processes.
To determine which processes to forward queries to, the Broker process first builds a view of the world from information in Zookeeper. Zookeeper maintains information about [Historical](../design/historical.html) and streaming ingestion [Peon](../design/peons.html) processes and the segments they are serving. For every datasource in Zookeeper, the Broker process builds a timeline of segments and the processes that serve them. When queries are received for a specific datasource and interval, the Broker process performs a lookup into the timeline associated with the query datasource for the query interval and retrieves the processes that contain data for the query. The Broker process then forwards down the query to the selected processes.

@@ -26,7 +26,7 @@ title: "Coordinator Process"
### Configuration
-For Coordinator Process Configuration, see [Coordinator Configuration](../configuration/index.html#coordinator).
+For Apache Druid (incubating) Coordinator Process Configuration, see [Coordinator Configuration](../configuration/index.html#coordinator).
### HTTP endpoints

@@ -26,7 +26,7 @@ title: "Historical Process"
### Configuration
-For Historical Process Configuration, see [Historical Configuration](../configuration/index.html#historical).
+For Apache Druid (incubating) Historical Process Configuration, see [Historical Configuration](../configuration/index.html#historical).
### HTTP Endpoints

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Design"
+title: "Apache Druid (incubating) Design"
---
<!--
@@ -24,7 +24,7 @@ title: "Design"
# What is Druid?<a id="what-is-druid"></a>
-Druid is a data store designed for high-performance slice-and-dice analytics
+Apache Druid (incubating) is a data store designed for high-performance slice-and-dice analytics
("[OLAP](http://en.wikipedia.org/wiki/Online_analytical_processing)"-style) on large data sets. Druid is most often
used as a data store for powering GUI analytical applications, or as a backend for highly-concurrent APIs that need
fast aggregations. Common application areas for Druid include:

@@ -24,7 +24,7 @@ title: "Indexing Service"
# Indexing Service
-The indexing service is a highly-available, distributed service that runs indexing related tasks.
+The Apache Druid (incubating) indexing service is a highly-available, distributed service that runs indexing related tasks.
Indexing [tasks](../ingestion/tasks.html) create (and sometimes destroy) Druid [segments](../design/segments.html). The indexing service has a master/slave-like architecture.

@@ -26,7 +26,7 @@ title: "MiddleManager Process"
### Configuration
-For Middlemanager Process Configuration, see [Indexing Service Configuration](../configuration/index.html#middlemanager-and-peons).
+For Apache Druid (incubating) Middlemanager Process Configuration, see [Indexing Service Configuration](../configuration/index.html#middlemanager-and-peons).
### HTTP Endpoints

@@ -26,7 +26,7 @@ title: "Overlord Process"
### Configuration
-For Overlord Process Configuration, see [Overlord Configuration](../configuration/index.html#overlord).
+For Apache Druid (incubating) Overlord Process Configuration, see [Overlord Configuration](../configuration/index.html#overlord).
### HTTP Endpoints

@@ -26,7 +26,7 @@ title: "Peons"
### Configuration
-For Peon Configuration, see [Peon Query Configuration](../configuration/index.html#peon-query-configuration) and [Additional Peon Configuration](../configuration/index.html#additional-peon-configuration).
+For Apache Druid (incubating) Peon Configuration, see [Peon Query Configuration](../configuration/index.html#peon-query-configuration) and [Additional Peon Configuration](../configuration/index.html#additional-peon-configuration).
### HTTP Endpoints

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid Plumbers"
+title: "Apache Druid (incubating) Plumbers"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid Processes and Servers"
+title: "Apache Druid (incubating) Processes and Servers"
---
<!--

@@ -28,7 +28,7 @@ title: "Real-time Process"
NOTE: Realtime processes are deprecated. Please use the <a href="../development/extensions-core/kafka-ingestion.html">Kafka Indexing Service</a> for stream pull use cases instead.
</div>
-For Real-time Process Configuration, see [Realtime Configuration](../configuration/realtime.html).
+For Apache Druid (incubating) Real-time Process Configuration, see [Realtime Configuration](../configuration/realtime.html).
For Real-time Ingestion, see [Realtime Ingestion](../ingestion/stream-ingestion.html).

@@ -24,7 +24,7 @@ title: "Segments"
# Segments
-Druid stores its index in *segment files*, which are partitioned by
+Apache Druid (incubating) stores its index in *segment files*, which are partitioned by
time. In a basic setup, one segment file is created for each time
interval, where the time interval is configurable in the
`segmentGranularity` parameter of the `granularitySpec`, which is

@@ -24,7 +24,7 @@ title: "Build from Source"
# Build from Source
-You can build Druid directly from source. Please note that these instructions are for building the latest stable version of Druid.
+You can build Apache Druid (incubating) directly from source. Please note that these instructions are for building the latest stable version of Druid.
For building the latest code in master, follow the instructions [here](https://github.com/apache/incubator-druid/blob/master/docs/content/development/build.md).

@@ -36,4 +36,4 @@ To enable experimental features, include their artifacts in the configuration ru
druid.extensions.loadList=["druid-histogram"]
```
-The configuration files for all the Druid processes need to be updated with this.
+The configuration files for all the Apache Druid (incubating) processes need to be updated with this.

@@ -24,11 +24,11 @@ title: "Ambari Metrics Emitter"
# Ambari Metrics Emitter
-To use this extension, make sure to [include](../../operations/including-extensions.html) `ambari-metrics-emitter` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `ambari-metrics-emitter` extension.
## Introduction
-This extension emits druid metrics to a ambari-metrics carbon server.
+This extension emits Druid metrics to an ambari-metrics carbon server.
Events are sent after being [pickled](http://ambari-metrics.readthedocs.org/en/latest/feeding-carbon.html#the-pickle-protocol); the size of the batch is configurable.
## Configuration
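A hedged sketch of wiring the emitter in `common.runtime.properties`; the `hostname` and `port` property names follow the pattern of Druid's other emitters and should be verified against this extension's configuration table (the collector address is a placeholder):
```
druid.emitter=ambari-metrics
druid.emitter.ambari-metrics.hostname=ambari-collector.example.com
druid.emitter.ambari-metrics.port=6188
```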

@@ -24,11 +24,11 @@ title: "Microsoft Azure"
# Microsoft Azure
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-azure-extensions` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-azure-extensions` extension.
## Deep Storage
-[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional druid configuration.
+[Microsoft Azure Storage](http://azure.microsoft.com/en-us/services/storage/) is another option for deep storage. This requires some additional Druid configuration.
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|

@@ -24,8 +24,8 @@ title: "Apache Cassandra"
# Apache Cassandra
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-cassandra-storage` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-cassandra-storage` extension.
[Apache Cassandra](http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra) can also
-be leveraged for deep storage. This requires some additional druid configuration as well as setting up the necessary
+be leveraged for deep storage. This requires some additional Druid configuration as well as setting up the necessary
schema within a Cassandra keystore.

@@ -24,9 +24,11 @@ title: "Rackspace Cloud Files"
# Rackspace Cloud Files
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-cloudfiles-extensions` extension.
## Deep Storage
-[Rackspace Cloud Files](http://www.rackspace.com/cloud/files/) is another option for deep storage. This requires some additional druid configuration.
+[Rackspace Cloud Files](http://www.rackspace.com/cloud/files/) is another option for deep storage. This requires some additional Druid configuration.
|Property|Possible Values|Description|Default|
|--------|---------------|-----------|-------|

@@ -24,7 +24,7 @@ title: "DistinctCount Aggregator"
# DistinctCount Aggregator
-To use this extension, make sure to [include](../../operations/including-extensions.html) the `druid-distinctcount` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) the `druid-distinctcount` extension.
Additionally, follow these steps:

@@ -24,7 +24,7 @@ title: "Google Cloud Storage"
# Google Cloud Storage
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-google-extensions` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-google-extensions` extension.
## Deep Storage

@@ -24,7 +24,7 @@ title: "Graphite Emitter"
# Graphite Emitter
-To use this extension, make sure to [include](../../operations/including-extensions.html) `graphite-emitter` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `graphite-emitter` extension.
## Introduction

@@ -24,7 +24,7 @@ title: "InfluxDB Line Protocol Parser"
# InfluxDB Line Protocol Parser
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-influx-extensions`.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-influx-extensions`.
This extension enables Druid to parse the [InfluxDB Line Protocol](https://docs.influxdata.com/influxdb/v1.5/write_protocols/line_protocol_tutorial/), a popular text-based timeseries metric serialization format.

@@ -24,11 +24,11 @@ title: "Kafka Emitter"
# Kafka Emitter
-To use this extension, make sure to [include](../../operations/including-extensions.html) `kafka-emitter` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `kafka-emitter` extension.
## Introduction
-This extension emits Druid metrics to a [Kafka](https://kafka.apache.org) directly with JSON format.<br>
+This extension emits Druid metrics to [Apache Kafka](https://kafka.apache.org) directly in JSON format.<br>
Kafka has a rich ecosystem, and its consumer API is readily available.
So if you already use Kafka, it's easy to integrate various tools or UIs
to monitor the status of your Druid cluster with this extension.

@@ -24,11 +24,11 @@ title: "Kafka Simple Consumer"
# Kafka Simple Consumer
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight-simpleConsumer` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight-simpleConsumer` extension.
## Firehose
-This is an experimental firehose to ingest data from kafka using kafka simple consumer api. Currently, this firehose would only work inside standalone realtime processes.
+This is an experimental firehose to ingest data from Apache Kafka using the Kafka simple consumer API. Currently, this firehose only works inside standalone realtime processes.
The configuration for KafkaSimpleConsumerFirehose is similar to the Kafka Eight Firehose, except `firehose` should be replaced with `firehoseV2` like this:
```json

@@ -24,7 +24,7 @@ title: "Materialized View"
# Materialized View
-To use this feature, make sure to only load `materialized-view-selection` on Broker and load `materialized-view-maintenance` on Overlord. In addtion, this feature currently requires a Hadoop cluster.
+To use this Apache Druid (incubating) feature, make sure to only load `materialized-view-selection` on Broker and load `materialized-view-maintenance` on Overlord. In addition, this feature currently requires a Hadoop cluster.
This feature enables Druid to greatly improve the query performance, especially when the query dataSource has a very large number of dimensions but the query only requires several dimensions. This feature includes two parts. One is `materialized-view-maintenance`, and the other is `materialized-view-selection`.

@@ -24,10 +24,10 @@ title: "Moment Sketches for Approximate Quantiles module"
# MomentSketch Quantiles Sketch module
-This module provides Druid aggregators for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library.
+This module provides aggregators for approximate quantile queries using the [momentsketch](https://github.com/stanford-futuredata/momentsketch) library.
The momentsketch provides coarse quantile estimates with less space and aggregation time overheads than traditional sketches, approaching the performance of counts and sums by reconstructing distributions from computed statistics.
-To use this aggregator, make sure you [include](../../operations/including-extensions.html) the extension in your config file:
+To use this Apache Druid (incubating) extension, make sure you [include](../../operations/including-extensions.html) the extension in your config file:
```
druid.extensions.loadList=["druid-momentsketch"]

@@ -24,7 +24,7 @@ title: "OpenTSDB Emitter"
# OpenTSDB Emitter
-To use this extension, make sure to [include](../../operations/including-extensions.html) `opentsdb-emitter` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `opentsdb-emitter` extension.
## Introduction

@@ -24,7 +24,7 @@ title: "RabbitMQ"
# RabbitMQ
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-rabbitmq` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-rabbitmq` extension.
## Firehose

@@ -24,6 +24,6 @@ title: "Druid Redis Cache"
# Druid Redis Cache
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-redis-cache` extension.
A cache implementation for Druid based on [Redis](https://github.com/antirez/redis).
# Configuration
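A sketch of enabling the cache (the host and port values are placeholders; the property names are assumptions based on the extension's cache configuration):
```
druid.cache.type=redis
druid.cache.host=127.0.0.1
druid.cache.port=6379
```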

@@ -24,6 +24,6 @@ title: "RocketMQ"
# RocketMQ
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-rocketmq` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-rocketmq` extension.
Original author: [https://github.com/lizhanhui](https://github.com/lizhanhui).

@@ -24,7 +24,7 @@ title: "Microsoft SQLServer"
# Microsoft SQLServer
-Make sure to [include](../../operations/including-extensions.html) `sqlserver-metadata-storage` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `sqlserver-metadata-storage` as an extension.
## Setting up SQLServer

@@ -24,7 +24,7 @@ title: "StatsD Emitter"
# StatsD Emitter
-To use this extension, make sure to [include](../../operations/including-extensions.html) `statsd-emitter` extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `statsd-emitter` extension.
## Introduction
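A hedged sketch of wiring this emitter in `common.runtime.properties` (the hostname and port values are placeholders for your StatsD server):
```
druid.emitter=statsd
druid.emitter.statsd.hostname=127.0.0.1
druid.emitter.statsd.port=8125
```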

@@ -24,7 +24,7 @@ title: "Thrift"
# Thrift
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-thrift-extensions`.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-thrift-extensions`.
This extension enables Druid to ingest thrift compact data online (`ByteBuffer`) and offline (SequenceFile of type `<Writable, BytesWritable>` or LzoThriftBlock File).

@@ -24,7 +24,7 @@ title: "Timestamp Min/Max aggregators"
# Timestamp Min/Max aggregators
-To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-time-min-max`.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-time-min-max`.
These aggregators enable a more precise calculation of the min and max time of given events than the `__time` column, whose granularity is sparse (the same as the query granularity).
To use this feature, a "timeMin" or "timeMax" aggregator must be included at indexing time.
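As a sketch, such an aggregator in an ingestion spec could look like this (the output name `minTime` is hypothetical):
```json
{ "type": "timeMin", "name": "minTime", "fieldName": "__time" }
```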

@@ -24,7 +24,7 @@ title: "Approximate Histogram aggregators"
# Approximate Histogram aggregators
-Make sure to [include](../../operations/including-extensions.html) `druid-histogram` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-histogram` as an extension.
The `druid-histogram` extension provides an approximate histogram aggregator and a fixed buckets histogram aggregator.

@@ -24,7 +24,7 @@ title: "Avro"
# Avro
-This extension enables Druid to ingest and understand the Apache Avro data format. Make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` as an extension.
+This Apache Druid (incubating) extension enables Druid to ingest and understand the Apache Avro data format. Make sure to [include](../../operations/including-extensions.html) `druid-avro-extensions` as an extension.
### Avro Stream Parser

@@ -24,7 +24,7 @@ title: "Bloom Filter"
# Bloom Filter
-This extension adds the ability to both construct bloom filters from query results, and filter query results by testing
+This Apache Druid (incubating) extension adds the ability to both construct bloom filters from query results, and filter query results by testing
against a bloom filter. Make sure to [include](../../operations/including-extensions.html) `druid-bloom-filter` as an
extension.

@@ -24,7 +24,7 @@ title: "DataSketches extension"
# DataSketches extension
-Druid aggregators based on [datasketches](http://datasketches.github.io/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.
+Apache Druid (incubating) aggregators based on [datasketches](http://datasketches.github.io/) library. Sketches are data structures implementing approximate streaming mergeable algorithms. Sketches can be ingested from the outside of Druid or built from raw data at ingestion time. Sketches can be stored in Druid segments as additive metrics.
To use the datasketches aggregators, make sure you [include](../../operations/including-extensions.html) the extension in your config file:
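A sketch of that config line (`druid-datasketches` is the extension's artifact name):
```
druid.extensions.loadList=["druid-datasketches"]
```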

@@ -24,7 +24,7 @@ title: "DataSketches HLL Sketch module"
# DataSketches HLL Sketch module
-This module provides Druid aggregators for distinct counting based on HLL sketch from [datasketches](http://datasketches.github.io/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use post aggregator to produce a union of sketch columns in the same row.
+This module provides Apache Druid (incubating) aggregators for distinct counting based on HLL sketch from [datasketches](http://datasketches.github.io/) library. At ingestion time, this aggregator creates the HLL sketch objects to be stored in Druid segments. At query time, sketches are read and merged together. In the end, by default, you receive the estimate of the number of distinct values presented to the sketch. Also, you can use a post aggregator to produce a union of sketch columns in the same row.
You can use the HLL sketch aggregator on columns of any identifiers. It will return the estimated cardinality of the column.
To use this aggregator, make sure you [include](../../operations/including-extensions.html) the extension in your config file:

@@ -24,7 +24,7 @@ title: "DataSketches Quantiles Sketch module"
# DataSketches Quantiles Sketch module
-This module provides Druid aggregators based on numeric quantiles DoublesSketch from [datasketches](http://datasketches.github.io/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cummulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.github.io/docs/Quantiles/QuantilesOverview.html).
+This module provides Apache Druid (incubating) aggregators based on numeric quantiles DoublesSketch from [datasketches](http://datasketches.github.io/) library. Quantiles sketch is a mergeable streaming algorithm to estimate the distribution of values, and approximately answer queries about the rank of a value, probability mass function of the distribution (PMF) or histogram, cumulative distribution function (CDF), and quantiles (median, min, max, 95th percentile and such). See [Quantiles Sketch Overview](https://datasketches.github.io/docs/Quantiles/QuantilesOverview.html).
There are three major modes of operation:

@@ -24,7 +24,7 @@ title: "DataSketches Theta Sketch module"
# DataSketches Theta Sketch module
-This module provides Druid aggregators based on Theta sketch from [datasketches](http://datasketches.github.io/) library. Note that sketch algorithms are approximate; see details in the "Accuracy" section of the datasketches doc.
+This module provides Apache Druid (incubating) aggregators based on Theta sketch from [datasketches](http://datasketches.github.io/) library. Note that sketch algorithms are approximate; see details in the "Accuracy" section of the datasketches doc.
At ingestion time, this aggregator creates the Theta sketch objects which get stored in Druid segments. Logically speaking, a Theta sketch object can be thought of as a Set data structure. At query time, sketches are read and aggregated (set unioned) together. In the end, by default, you receive the estimate of the number of unique entries in the sketch object. Also, you can use post aggregators to do union, intersection or difference on sketch columns in the same row.
Note that you can use the `thetaSketch` aggregator on columns which were not ingested using the same aggregator; it will return the estimated cardinality of the column. It is recommended to use it at ingestion time as well to make querying faster.

@@ -24,7 +24,7 @@ title: "DataSketches Tuple Sketch module"
# DataSketches Tuple Sketch module
-This module provides Druid aggregators based on Tuple sketch from [datasketches](http://datasketches.github.io/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
+This module provides Apache Druid (incubating) aggregators based on Tuple sketch from [datasketches](http://datasketches.github.io/) library. ArrayOfDoublesSketch sketches extend the functionality of the count-distinct Theta sketches by adding arrays of double values associated with unique keys.
To use this aggregator, make sure you [include](../../operations/including-extensions.html) the extension in your config file:

@@ -24,7 +24,7 @@ title: "Basic Security"
# Druid Basic Security
-This extension adds:
+This Apache Druid (incubating) extension adds:
- an Authenticator which supports [HTTP Basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication)
- an Authorizer which implements basic role-based access control

@@ -24,7 +24,7 @@ title: "Kerberos"
# Kerberos
-Druid Extension to enable Authentication for Druid Processes using Kerberos.
+Apache Druid (incubating) extension to enable authentication for Druid processes using Kerberos.
This extension adds an Authenticator which is used to protect HTTP Endpoints using the simple and protected GSSAPI negotiation mechanism [SPNEGO](https://en.wikipedia.org/wiki/SPNEGO).
Make sure to [include](../../operations/including-extensions.html) `druid-kerberos` as an extension.

@@ -27,7 +27,7 @@ title: "Cached Lookup Module"
<div class="note info">Please note that this is an experimental module and development/testing is still at an early stage. Feel free to try it and give us your feedback.</div>
## Description
-This module provides a per-lookup caching mechanism for JDBC data sources.
+This Apache Druid (incubating) module provides a per-lookup caching mechanism for JDBC data sources.
The main goal of this cache is to speed up access to high-latency lookup sources and to provide caching isolation for every lookup source.
Thus users can define various caching strategies or implementations per lookup, even if the source is the same.
This module can be used side by side with other lookup modules like the global cached lookup module.

@@ -24,7 +24,7 @@ title: "HDFS"
# HDFS
-Make sure to [include](../../operations/including-extensions.html) `druid-hdfs-storage` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-hdfs-storage` as an extension.
## Deep Storage
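A minimal sketch of HDFS deep storage in `common.runtime.properties` (the namenode address and path are placeholders):
```
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs://namenode:8020/druid/segments
```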

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Kafka Eight Firehose"
+title: "Apache Kafka Eight Firehose"
---
<!--
@@ -24,7 +24,7 @@ title: "Kafka Eight Firehose"
# Kafka Eight Firehose
-Make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-kafka-eight` as an extension.
This firehose acts as a Kafka 0.8.x consumer and ingests data from Kafka.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Kafka Lookups"
+title: "Apache Kafka Lookups"
---
<!--
@@ -28,7 +28,7 @@ title: "Kafka Lookups"
Lookups are an <a href="../experimental.html">experimental</a> feature.
</div>
-Make sure to [include](../../operations/including-extensions.html) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-lookups-cached-global` and `druid-kafka-extraction-namespace` as an extension.
If you need updates to populate as promptly as possible, it is possible to plug into a Kafka topic whose key is the old value and whose message is the desired new value (both in UTF-8) as a LookupExtractorFactory.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Kafka Indexing Service"
+title: "Apache Kafka Indexing Service"
---
<!--
@@ -31,7 +31,7 @@ able to read non-recent events from Kafka and are not subject to the window peri
ingestion mechanisms using Tranquility. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures,
and ensure that the scalability and replication requirements are maintained.
-This service is provided in the `druid-kafka-indexing-service` core extension (see
+This service is provided in the `druid-kafka-indexing-service` core Apache Druid (incubating) extension (see
[Including Extensions](../../operations/including-extensions.html)).
<div class="note info">

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Kinesis Indexing Service"
+title: "Amazon Kinesis Indexing Service"
---
<!--
@@ -31,7 +31,7 @@ able to read non-recent events from Kinesis and are not subject to the window pe
ingestion mechanisms using Tranquility. The supervisor oversees the state of the indexing tasks to coordinate handoffs, manage failures,
and ensure that the scalability and replication requirements are maintained.
-The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core extension (see
+The Kinesis indexing service is provided as the `druid-kinesis-indexing-service` core Apache Druid (incubating) extension (see
[Including Extensions](../../operations/including-extensions.html)). Please note that this is
currently designated as an *experimental feature* and is subject to the usual
[experimental caveats](../experimental.html).

@@ -28,7 +28,7 @@ title: "Globally Cached Lookups"
Lookups are an <a href="../experimental.html">experimental</a> feature.
</div>
-Make sure to [include](../../operations/including-extensions.html) `druid-lookups-cached-global` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-lookups-cached-global` as an extension.
## Configuration
<div class="note caution">

@@ -24,7 +24,7 @@ title: "MySQL Metadata Store"
# MySQL Metadata Store
-Make sure to [include](../../operations/including-extensions.html) `mysql-metadata-storage` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `mysql-metadata-storage` as an extension.
<div class="note caution">
The MySQL extension requires the MySQL Connector/J library which is not included in the Druid distribution.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid ORC Extension"
+title: "ORC Extension"
---
<!--
@@ -22,9 +22,9 @@ title: "Druid ORC Extension"
~ under the License.
-->
-# Druid ORC Extension
+# ORC Extension
-This module extends [Druid Hadoop based indexing](../../ingestion/hadoop.html) to ingest data directly from offline
+This Apache Druid (incubating) module extends [Druid Hadoop based indexing](../../ingestion/hadoop.html) to ingest data directly from offline
Apache ORC files.
To use this extension, make sure to [include](../../operations/including-extensions.html) `druid-orc-extensions`.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid Parquet Extension"
+title: "Apache Parquet Extension"
---
<!--
@@ -22,9 +22,9 @@ title: "Druid Parquet Extension"
~ under the License.
-->
-# Druid Parquet Extension
+# Apache Parquet Extension
-This module extends [Druid Hadoop based indexing](../../ingestion/hadoop.html) to ingest data directly from offline
+This Apache Druid (incubating) module extends [Druid Hadoop based indexing](../../ingestion/hadoop.html) to ingest data directly from offline
Apache Parquet files.
Note: `druid-parquet-extensions` depends on the `druid-avro-extensions` module, so be sure to

@@ -24,7 +24,7 @@ title: "PostgreSQL Metadata Store"
# PostgreSQL Metadata Store
-Make sure to [include](../../operations/including-extensions.html) `postgresql-metadata-storage` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `postgresql-metadata-storage` as an extension.
## Setting up PostgreSQL
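Once PostgreSQL itself is set up, pointing Druid at it takes a few metadata-storage properties; a sketch with placeholder host and credentials:
```
druid.metadata.storage.type=postgresql
druid.metadata.storage.connector.connectURI=jdbc:postgresql://db.example.com:5432/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=changeme
```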

@@ -24,7 +24,7 @@ title: "Protobuf"
# Protobuf
-This extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../operations/including-extensions.html) `druid-protobuf-extensions` as an extension.
+This Apache Druid (incubating) extension enables Druid to ingest and understand the Protobuf data format. Make sure to [include](../../operations/including-extensions.html) `druid-protobuf-extensions` as an extension.
## Protobuf Parser

@@ -24,7 +24,7 @@ title: "S3-compatible"
# S3-compatible
-Make sure to [include](../../operations/including-extensions.html) `druid-s3-extensions` as an extension.
+To use this Apache Druid (incubating) extension, make sure to [include](../../operations/including-extensions.html) `druid-s3-extensions` as an extension.
## Deep Storage
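A minimal sketch of S3 deep storage in `common.runtime.properties` (the bucket and prefix are placeholders; credential configuration is omitted here):
```
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments
```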

@@ -24,7 +24,7 @@ title: "Simple SSLContext Provider Module"
# Simple SSLContext Provider Module
-This module contains a simple implementation of [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html)
+This Apache Druid (incubating) module contains a simple implementation of [SSLContext](http://docs.oracle.com/javase/8/docs/api/javax/net/ssl/SSLContext.html)
that will be injected to be used with HttpClient that Druid processes use internally to communicate with each other. To learn more about
Java's SSL support, please refer to [this](http://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/JSSERefGuide.html) guide.

@@ -24,7 +24,7 @@ title: "Stats aggregator"
# Stats aggregator
-Includes stat-related aggregators, including variance and standard deviations, etc. Make sure to [include](../../operations/including-extensions.html) `druid-stats` as an extension.
+This Apache Druid (incubating) extension includes stat-related aggregators, including variance and standard deviations, etc. Make sure to [include](../../operations/including-extensions.html) `druid-stats` as an extension.
## Variance aggregator

@@ -24,7 +24,7 @@ title: "Test Stats Aggregators"
# Test Stats Aggregators
-Incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details.
+This Apache Druid (incubating) extension incorporates test statistics related aggregators, including z-score and p-value. Please refer to [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/) for math background and details.
Make sure to include the `druid-stats` extension in order to use these aggregators.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid extensions"
+title: "Apache Druid (incubating) extensions"
---
<!--

@@ -24,7 +24,7 @@ title: "Geographic Queries"
# Geographic Queries
-Druid supports filtering specially spatially indexed columns based on an origin and a bound.
+Apache Druid (incubating) supports filtering on spatially indexed columns based on an origin and a bound.
# Spatial Indexing
In any of the data specs, there is the option of providing spatial dimensions. For example, for a JSON data spec, spatial dimensions can be specified as follows:
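A sketch of what such a spec fragment can look like (the dimension and field names are hypothetical):
```json
"spatialDimensions": [
  {
    "dimName": "coordinates",
    "dims": ["lat", "long"]
  }
]
```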

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Integrating Druid With Other Technologies"
+title: "Integrating Apache Druid (incubating) With Other Technologies"
---
<!--

@@ -24,7 +24,7 @@ title: "JavaScript Programming Guide"
# JavaScript Programming Guide
-This page discusses how to use JavaScript to extend Druid.
+This page discusses how to use JavaScript to extend Apache Druid (incubating).
## Examples
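One place JavaScript appears is in query specs, for example as an extraction function; a sketch (the function body is hypothetical):
```json
{
  "type": "javascript",
  "function": "function(str) { return str.substring(0, 3); }"
}
```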

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Extending Druid With Custom Modules"
+title: "Extending Apache Druid (incubating) With Custom Modules"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Developing on Druid"
+title: "Developing on Apache Druid (incubating)"
---
<!--

@@ -24,7 +24,7 @@ title: "Router Process"
# Router Process
-The Router process can be used to route queries to different Broker processes. By default, the broker routes queries based on how [Rules](../operations/rule-configuration.html) are set up. For example, if 1 month of recent data is loaded into a `hot` cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This set up provides query isolation such that queries for more important data are not impacted by queries for less important data.
+The Apache Druid (incubating) Router process can be used to route queries to different Broker processes. By default, the broker routes queries based on how [Rules](../operations/rule-configuration.html) are set up. For example, if 1 month of recent data is loaded into a `hot` cluster, queries that fall within the recent month can be routed to a dedicated set of brokers. Queries outside this range are routed to another set of brokers. This setup provides query isolation such that queries for more important data are not impacted by queries for less important data.
For query routing purposes, you should only ever need the Router process if you have a Druid cluster well into the terabyte range.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Versioning Druid"
+title: "Versioning Apache Druid (incubating)"
---
<!--

@@ -24,7 +24,7 @@ title: "Batch Data Ingestion"
# Batch Data Ingestion
-Druid can load data from static files through a variety of methods described here.
+Apache Druid (incubating) can load data from static files through a variety of methods described here.
## Native Batch Ingestion

@@ -32,7 +32,7 @@ java -Xmx256m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -classpath lib/*:<hadoop
## Options
-- "--coordinate" - provide a version of Hadoop to use. This property will override the default Hadoop coordinates. Once specified, Druid will look for those Hadoop dependencies from the location specified by `druid.extensions.hadoopDependenciesDir`.
+- "--coordinate" - provide a version of Apache Hadoop to use. This property will override the default Hadoop coordinates. Once specified, Apache Druid (incubating) will look for those Hadoop dependencies from the location specified by `druid.extensions.hadoopDependenciesDir`.
- "--no-default-hadoop" - don't pull down the default Hadoop version
## Spec file

@@ -90,7 +90,7 @@ data segments loaded in it (or if the interval you specify is empty).
The output segment can have different metadata from the input segments unless all input segments have the same metadata.
-- Dimensions: since Druid supports schema change, the dimensions can be different across segments even if they are a part of the same dataSource.
+- Dimensions: since Apache Druid (incubating) supports schema change, the dimensions can be different across segments even if they are a part of the same dataSource.
If the input segments have different dimensions, the output segment basically includes all dimensions of the input segments.
However, even if the input segments have the same set of dimensions, the dimension order or the data type of dimensions can be different. For example, the data type of some dimensions can be
changed from `string` to primitive types, or the order of dimensions can be changed for better locality.

@@ -24,7 +24,7 @@ title: "Data Formats for Ingestion"
# Data Formats for Ingestion
-Druid can ingest denormalized data in JSON, CSV, or a delimited form such as TSV, or any custom format. While most examples in the documentation use data in JSON format, it is not difficult to configure Druid to ingest any other delimited data.
+Apache Druid (incubating) can ingest denormalized data in JSON, CSV, or a delimited form such as TSV, or any custom format. While most examples in the documentation use data in JSON format, it is not difficult to configure Druid to ingest any other delimited data.
We welcome any contributions to new formats.
For additional data formats, please see our [extensions list](../development/extensions.html).

@@ -24,7 +24,7 @@ title: "Deleting Data"
# Deleting Data
-Permanent deletion of a Druid segment has two steps:
+Permanent deletion of a segment in Apache Druid (incubating) has two steps:
1. The segment must first be marked as "unused". This occurs when a segment is dropped by retention rules, and when a user manually disables a segment through the Coordinator API.
2. After segments have been marked as "unused", a Kill Task will delete any "unused" segments from Druid's metadata store as well as deep storage.

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "My Data isn't being loaded"
+title: "Apache Druid (incubating) FAQ"
---
<!--

@@ -1,6 +1,6 @@
---
layout: doc_page
-title: "Druid Firehoses"
+title: "Apache Druid (incubating) Firehoses"
---
<!--

@@ -24,7 +24,7 @@ title: "Hadoop-based Batch Ingestion VS Native Batch Ingestion"
# Comparison of Batch Ingestion Methods
-Druid basically supports three types of batch ingestion: Hadoop-based
+Apache Druid (incubating) basically supports three types of batch ingestion: Apache Hadoop-based
batch ingestion, native parallel batch ingestion, and native local batch
ingestion. The below table shows what features are supported by each
ingestion method.

@@ -24,7 +24,7 @@ title: "Hadoop-based Batch Ingestion"
# Hadoop-based Batch Ingestion
-Hadoop-based batch ingestion in Druid is supported via a Hadoop-ingestion task. These tasks can be posted to a running
+Apache Hadoop-based batch ingestion in Apache Druid (incubating) is supported via a Hadoop-ingestion task. These tasks can be posted to a running
instance of a Druid [Overlord](../design/overlord.html).
Please check [Hadoop-based Batch Ingestion VS Native Batch Ingestion](./hadoop-vs-native-batch.html) for differences between native batch ingestion and Hadoop-based ingestion.

@@ -30,7 +30,7 @@ title: "Ingestion"
### Datasources and segments
-Druid data is stored in "datasources", which are similar to tables in a traditional RDBMS. Each datasource is
+Apache Druid (incubating) data is stored in "datasources", which are similar to tables in a traditional RDBMS. Each datasource is
partitioned by time and, optionally, further partitioned by other attributes. Each time range is called a "chunk" (for
example, a single day, if your datasource is partitioned by day). Within a chunk, data is partitioned into one or more
"segments". Each segment is a single file, typically comprising up to a few million rows of data. Since segments are

@@ -24,7 +24,7 @@ title: "Ingestion Spec"
# Ingestion Spec
-A Druid ingestion spec consists of 3 components:
+An Apache Druid (incubating) ingestion spec consists of 3 components:
```json
{

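The three components are the `dataSchema`, the `ioConfig`, and the `tuningConfig`; a skeletal sketch (the datasource name and type values are placeholders):
```json
{
  "dataSchema": { "dataSource": "wikipedia" },
  "ioConfig": { "type": "index" },
  "tuningConfig": { "type": "index" }
}
```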
@@ -43,7 +43,7 @@ Tasks are also part of a "task group", which is a set of tasks that can share in
## Priority
-Druid's indexing tasks use locks for atomic data ingestion. Each lock is acquired for the combination of a dataSource and an interval. Once a task acquires a lock, it can write data for the dataSource and the interval of the acquired lock unless the lock is released or preempted. Please see [the below Locking section](#locking)
+Apache Druid (incubating)'s indexing tasks use locks for atomic data ingestion. Each lock is acquired for the combination of a dataSource and an interval. Once a task acquires a lock, it can write data for the dataSource and the interval of the acquired lock unless the lock is released or preempted. Please see [the below Locking section](#locking)
Each task has a priority which is used for lock acquisition. The locks of higher-priority tasks can preempt the locks of lower-priority tasks if they try to acquire for the same dataSource and interval. If some locks of a task are preempted, the behavior of the preempted task depends on the task implementation. Usually, most tasks finish as failed if they are preempted.
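Priorities are passed through the task context; a minimal sketch of the relevant spec fragment (the value 90 is arbitrary):
```json
"context": {
  "priority": 90
}
```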

@@ -24,7 +24,7 @@ title: "Native Index Tasks"
# Native Index Tasks
-Druid currently has two types of native batch indexing tasks, `index_parallel` which runs tasks
+Apache Druid (incubating) currently has two types of native batch indexing tasks, `index_parallel` which runs tasks
in parallel on multiple MiddleManager processes, and `index` which will run a single indexing task locally on a single
MiddleManager.

@@ -24,7 +24,7 @@ title: "Schema Changes"
# Schema Changes
-Schemas for datasources can change at any time and Druid supports different schemas among segments.
+Schemas for datasources can change at any time and Apache Druid (incubating) supports different schemas among segments.
## Replacing Segments

@@ -24,7 +24,7 @@ title: "Schema Design"
# Schema Design
-This page is meant to assist users in designing a schema for data to be ingested in Druid. Druid offers a unique data
+This page is meant to assist users in designing a schema for data to be ingested in Apache Druid (incubating). Druid offers a unique data
modeling system that bears similarity to both relational and timeseries models. The key factors are:
* Druid data is stored in [datasources](index.html#datasources), which are similar to tables in a traditional RDBMS.

@@ -24,7 +24,7 @@ title: "Loading Streams"
# Loading Streams
-Streams can be ingested in Druid using either [Tranquility](https://github.com/druid-io/tranquility) (a Druid-aware
+Streams can be ingested in Apache Druid (incubating) using either [Tranquility](https://github.com/druid-io/tranquility) (a Druid-aware
client) or the [Kafka Indexing Service](../development/extensions-core/kafka-ingestion.html).
## Tranquility (Stream Push)

@@ -29,7 +29,7 @@ NOTE: Realtime processes are deprecated. Please use the <a href="../development/
# Stream Pull Ingestion
If you have an external service that you want to pull data from, you have two options. The simplest
-option is to set up a "copying" service that reads from the data source and writes to Druid using
+option is to set up a "copying" service that reads from the data source and writes to Apache Druid (incubating) using
the [stream push method](stream-push.html).
Another option is *stream pull*. With this approach, a Druid Realtime Process ingests data from a

@@ -24,7 +24,7 @@ title: "Stream Push"
# Stream Push
-Druid can connect to any streaming data source through
+Apache Druid (incubating) can connect to any streaming data source through
[Tranquility](https://github.com/druid-io/tranquility/blob/master/README.md), a package for pushing
streams to Druid in real-time. Druid does not come bundled with Tranquility, and you will have to download the distribution.

Some files were not shown because too many files have changed in this diff.