mirror of https://github.com/apache/druid.git

More documentation formatting fixes (#8149)

Add empty lines before bulleted lists and code blocks, to ensure that they show up properly on the web site. See also #8079.

This commit is contained in:
parent 0695e487e7
commit c87b47e0fa

@@ -37,6 +37,7 @@ The Approximate Histogram aggregator is deprecated. Please use <a href="../exten
 This aggregator is based on
 [http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)
 to compute approximate histograms, with the following modifications:
+
 - some tradeoffs in accuracy were made in the interest of speed (see below)
 - the sketch maintains the exact original data as long as the number of
   distinct data points is fewer than the resolution (number of centroids),

@@ -33,6 +33,7 @@ to use with Druid for cases where an explicit filter is impossible, e.g. filteri
 values.
 
 Following are some characteristics of BloomFilters:
+
 - BloomFilters are highly space efficient when compared to using a HashSet.
 - Because of the probabilistic nature of bloom filters, false positive results are possible (element was not actually
   inserted into a bloom filter during construction, but `test()` says true)

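For reference, the extension's filter is attached to a query roughly as follows. This is a sketch patterned on the extension's documented filter spec; the dimension name is illustrative, and `bloomKFilter` carries a base64-serialized `BloomKFilter`:

```
{
  "type": "bloom",
  "dimension": "userid",
  "bloomKFilter": "<base64-serialized BloomKFilter>"
}
```
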
@@ -25,6 +25,7 @@ title: "Basic Security"
 # Druid Basic Security
 
 This Apache Druid (incubating) extension adds:
+
 - an Authenticator which supports [HTTP Basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication)
 - an Authorizer which implements basic role-based access control

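For these to take effect, the extension must be loaded; a minimal sketch of the relevant common runtime property, assuming no other extensions are needed:

```
druid.extensions.loadList=["druid-basic-security"]
```
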
@@ -342,6 +343,7 @@ Unassign role {roleName} from user {userName}
 Set the permissions of {roleName}. This replaces the previous set of permissions on the role.
 
 Content: List of JSON Resource-Action objects, e.g.:
+
 ```
 [
   {

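The example above is cut off at the hunk boundary; for reference, a complete Resource-Action object looks roughly like this (a sketch; the datasource name and action are illustrative):

```
[
  {
    "resource": {
      "name": "wikipedia",
      "type": "DATASOURCE"
    },
    "action": "READ"
  }
]
```
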
@@ -55,6 +55,7 @@ the implementation of splittable firehoses. Please note that multiple tasks can
 if one of them fails.
 
 You may want to consider the following points:
+
 - Since this task doesn't shuffle intermediate data, it isn't available for [perfect rollup](../ingestion/index.html#roll-up-modes).
 - The number of tasks for parallel ingestion is decided by `maxNumSubTasks` in the tuningConfig.
   Since the supervisor task creates up to `maxNumSubTasks` worker tasks regardless of the available task slots,

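For orientation, `maxNumSubTasks` sits directly in the task's tuningConfig; a minimal sketch, with an illustrative value and the rest of the ingestion spec elided:

```
"tuningConfig": {
  "type": "index_parallel",
  "maxNumSubTasks": 10
}
```
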
@@ -37,6 +37,7 @@ If you have questions on tuning Druid for specific use cases, or questions on co
 #### Heap sizing
 
 The biggest contributions to heap usage on Historicals are:
+
 - Partial unmerged query results from segments
 - The stored maps for [lookups](../querying/lookups.html).

@@ -63,6 +64,7 @@ Be sure to add `(2 * total size of all loaded lookups)` to your heap size in add
 Please see the [General Guidelines for Processing Threads and Buffers](#general-guidelines-for-processing-threads-and-buffers) section for an overview of processing thread/buffer configuration.
 
 On Historicals:
+
 - `druid.processing.numThreads` should generally be set to `(number of cores - 1)`: a smaller value can result in CPU underutilization, while going over the number of cores can result in unnecessary CPU contention.
 - `druid.processing.buffer.sizeBytes` can be set to 500MB.
 - For `druid.processing.numMergeBuffers`, a 1:4 ratio of merge buffers to processing threads is a reasonable choice for general use.

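Applied to a hypothetical 16-core Historical, the guidelines above translate to something like the following sketch (not a tuned recommendation):

```
# numThreads = cores - 1
druid.processing.numThreads=15
# roughly 500MB processing buffer
druid.processing.buffer.sizeBytes=500000000
# roughly a 1:4 ratio of merge buffers to processing threads
druid.processing.numMergeBuffers=4
```
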
@@ -28,6 +28,7 @@ If you have been running an evaluation Druid cluster using local deep storage an
 more production-capable deep storage system such as S3 or HDFS, this document describes the necessary steps.
 
 Migration of deep storage involves the following steps at a high level:
+
 - Copying segments from local deep storage to the new deep storage
 - Exporting Druid's segments table from metadata
 - Rewriting the load specs in the exported segment data to reflect the new deep storage location

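The copy step is an ordinary recursive copy that preserves the segment directory layout. A sketch using the AWS CLI, where the bucket name is illustrative and `var/druid/segments` is the default local deep storage path:

```
aws s3 cp --recursive var/druid/segments s3://my-druid-bucket/druid/segments
```
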
@@ -27,6 +27,7 @@ title: "Export Metadata Tool"
 Druid includes an `export-metadata` tool for assisting with migration of cluster metadata and deep storage.
 
 This tool exports the contents of the following Druid metadata tables:
+
 - segments
 - rules
 - config

@@ -37,6 +38,7 @@ Additionally, the tool can rewrite the local deep storage location descriptors i
 to point to new deep storage locations (S3, HDFS, and local rewrite paths are supported).
 
 The tool has the following limitations:
+
 - Only exporting from Derby metadata is currently supported
 - If rewriting load specs for deep storage migration, only migrating from local deep storage is currently supported.

@@ -46,20 +48,19 @@ The `export-metadata` tool provides the following options:
 
 ### Connection Properties
 
-`--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
-`--user`: Username
-`--password`: Password
-`--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
+- `--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
+- `--user`: Username
+- `--password`: Password
+- `--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
 
 ### Output Path
 
-`--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
+- `--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
 
 ### Export Format Options
 
-`--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
-
-`--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
+- `--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
+- `--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
 
 ### Deep Storage Migration
 

@@ -69,8 +70,8 @@ By setting the options below, the tool will rewrite the segment load specs to po
 
 This helps users migrate segments stored in local deep storage to S3.
 
-`--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
-`--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
+- `--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
+- `--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
 
 When copying the local deep storage segments to S3, the rewrite performed by this tool requires that the directory structure of the segments be unchanged.

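Putting the options together, a migration invocation might look roughly like the following sketch. The URI, paths, and bucket are illustrative, and the log4j/extensions JVM flags shown in the full example later in this diff are elided:

```
java -classpath "lib/*" org.apache.druid.cli.Main tools export-metadata \
  --connectURI "jdbc:derby://localhost:1527/var/druid/metadata.db" \
  -o /tmp/csv -x -t \
  --s3bucket my-druid-bucket --s3baseKey druid/segments
```
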
@@ -142,6 +143,7 @@ java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log
 ```
 
 In the example command above:
+
 - `lib` is the Druid lib directory
 - `extensions` is the Druid extensions directory
 - `/tmp/csv` is the output directory. Please make sure that this directory exists.

|
@ -61,6 +61,7 @@ Update your Druid runtime properties with the new metadata configuration.
|
|||
Druid provides a `metadata-init` tool for creating Druid's metadata tables. After initializing the Druid database, you can run the commands shown below from the root of the Druid package to initialize the tables.
|
||||
|
||||
In the example commands below:
|
||||
|
||||
- `lib` is the the Druid lib directory
|
||||
- `extensions` is the Druid extensions directory
|
||||
- `base` corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
|
||||
|
|
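A sketch of one such command for a hypothetical MySQL metadata store, patterned after the `export-metadata` invocation elsewhere in this diff; the loaded extension and connection details are placeholders:

```
java -classpath "lib/*" \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.loadList='["mysql-metadata-storage"]' \
  -Ddruid.metadata.storage.type=mysql \
  org.apache.druid.cli.Main tools metadata-init \
  --connectURI="jdbc:mysql://localhost:3306/druid" --user <user> --password <password> --base druid
```
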
|
@ -59,6 +59,7 @@ JVM Flags:
|
|||
Please note that above flags are general guidelines only. Be cautious and feel free to change them if necessary for the specific deployment.
|
||||
|
||||
Additionally, for large jvm heaps, here are a few Garbage Collection efficiency guidelines that have been known to help in some cases.
|
||||
|
||||
- Mount /tmp on tmpfs ( See http://www.evanjones.ca/jvm-mmap-pause.html )
|
||||
- On Disk-IO intensive processes (e.g. Historical and MiddleManager), GC and Druid logs should be written to a different disk than where data is written.
|
||||
- Disable Transparent Huge Pages ( See https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp )
|
||||
|
|
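The first and third items are OS-level changes. A sketch of the usual commands, assuming common defaults (run as root; the tmpfs size is illustrative, and the changes should be made persistent via /etc/fstab and boot configuration respectively):

```
mount -t tmpfs -o size=1g tmpfs /tmp
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
```
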
|
@ -337,6 +337,7 @@ The [Approximate Histogram](../development/extensions-core/approximate-histogram
|
|||
The algorithm used by this deprecated aggregator is highly distribution-dependent and its output is subject to serious distortions when the input does not fit within the algorithm's limitations.
|
||||
|
||||
A [study published by the DataSketches team](https://datasketches.github.io/docs/Quantiles/DruidApproxHistogramStudy.html) demonstrates some of the known failure modes of this algorithm:
|
||||
|
||||
- The algorithm's quantile calculations can fail to provide results for a large range of rank values (all ranks less than 0.89 in the example used in the study), returning all zeroes instead.
|
||||
- The algorithm can completely fail to record spikes in the tail ends of the distribution
|
||||
- In general, the histogram produced by the algorithm can deviate significantly from the true histogram, with no bounds on the errors.
|
||||
|
|
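For comparison, the DataSketches quantiles aggregator recommended as the replacement is specified roughly like this (a sketch; the names are illustrative, and `k` trades off sketch size against accuracy):

```
{
  "type": "quantilesDoublesSketch",
  "name": "value_sketch",
  "fieldName": "value",
  "k": 128
}
```
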
|
@ -30,6 +30,7 @@ In this document, we'll set up a simple cluster and discuss how it can be furthe
|
|||
your needs.
|
||||
|
||||
This simple cluster will feature:
|
||||
|
||||
- A Master server to host the Coordinator and Overlord processes
|
||||
- Two scalable, fault-tolerant Data servers running Historical and MiddleManager processes
|
||||
- A query server, hosting the Druid Broker and Router processes
|
||||
|
@@ -49,6 +50,7 @@ The Coordinator and Overlord processes are responsible for handling the metadata
 In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
 
 This hardware offers:
+
 - 8 vCPUs
 - 31 GB RAM

@@ -77,6 +79,7 @@ in-memory query cache. These servers benefit greatly from CPU and RAM.
 In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
 
 This hardware offers:
+
 - 8 vCPUs
 - 31 GB RAM

@@ -323,6 +326,7 @@ You can copy your existing `coordinator-overlord` configs from the single-server
 Suppose we are migrating from a single-server deployment that had 32 CPU and 256GB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
 
 Historical (Single-server)
+
 ```
 druid.processing.buffer.sizeBytes=500000000
 druid.processing.numMergeBuffers=8

@@ -330,6 +334,7 @@ druid.processing.numThreads=31
 ```
 
 MiddleManager (Single-server)
+
 ```
 druid.worker.capacity=8
 druid.indexer.fork.property.druid.processing.numMergeBuffers=2

@@ -340,11 +345,13 @@ druid.indexer.fork.property.druid.processing.numThreads=1
 In the clustered deployment, we can choose a split factor (2 in this example), and deploy 2 Data servers with 16 CPU and 128GB RAM each. The areas to scale are the following:
 
 Historical
+
 - `druid.processing.numThreads`: Set to `(num_cores - 1)` based on the new hardware
 - `druid.processing.numMergeBuffers`: Divide the old value from the single-server deployment by the split factor
 - `druid.processing.buffer.sizeBytes`: Keep this unchanged
 
 MiddleManager:
+
 - `druid.worker.capacity`: Divide the old value from the single-server deployment by the split factor
 - `druid.indexer.fork.property.druid.processing.numMergeBuffers`: Keep this unchanged
 - `druid.indexer.fork.property.druid.processing.buffer.sizeBytes`: Keep this unchanged

@@ -353,6 +360,7 @@ MiddleManager:
 The resulting configs after the split:
 
 New Historical (on 2 Data servers)
+
 ```
 druid.processing.buffer.sizeBytes=500000000
 druid.processing.numMergeBuffers=8

@@ -360,6 +368,7 @@ New Historical (on 2 Data servers)
 ```
 
 New MiddleManager (on 2 Data servers)
+
 ```
 druid.worker.capacity=4
 druid.indexer.fork.property.druid.processing.numMergeBuffers=2