More documentation formatting fixes (#8149)

Add empty lines before bulleted lists and code blocks, to ensure that
they show up properly on the web site.  See also #8079.
Magnus Henoch 2019-07-24 23:26:03 +01:00 committed by Himanshu
parent 0695e487e7
commit c87b47e0fa
11 changed files with 32 additions and 10 deletions

@ -37,6 +37,7 @@ The Approximate Histogram aggregator is deprecated. Please use <a href="../exten
This aggregator is based on
[http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf)
to compute approximate histograms, with the following modifications:
- some tradeoffs in accuracy were made in the interest of speed (see below)
- the sketch maintains the exact original data as long as the number of
distinct data points is fewer than the resolution (number of centroids),

@ -33,6 +33,7 @@ to use with Druid for cases where an explicit filter is impossible, e.g. filteri
values.
Following are some characteristics of BloomFilters:
- BloomFilters are highly space efficient when compared to using a HashSet.
- Because of the probabilistic nature of bloom filters, false positive results are possible (element was not actually
inserted into a bloom filter during construction, but `test()` says true)
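To make the false-positive behavior concrete, here is a minimal, generic Bloom filter sketch in Python. It is not Druid's BloomFilter implementation; the class, the SHA-256-based hashing scheme, and the sizes (64 bits, 3 hash functions) are illustrative assumptions. Values that were inserted always test true, while some values that were never inserted can also test true.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Generic Bloom filter sketch: a fixed-size bit array plus k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # storage size is fixed up front

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting SHA-256 with the hash index (an assumption).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def test(self, item: str) -> bool:
        # True means "possibly inserted"; False means "definitely not inserted".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


if __name__ == "__main__":
    bf = BloomFilter(num_bits=64, num_hashes=3)
    inserted = [f"user-{n}" for n in range(20)]
    for value in inserted:
        bf.add(value)
    # No false negatives: every inserted value tests true.
    assert all(bf.test(v) for v in inserted)
    # Values that were never inserted may still test true (false positives).
    candidates = [f"other-{n}" for n in range(1000)]
    false_positives = sum(bf.test(v) for v in candidates)
    print(f"{false_positives} of {len(candidates)} non-inserted values tested true")
```

Here the whole filter occupies 8 bytes no matter how long the inserted strings are, which is where the space advantage over a HashSet comes from; the price is a false-positive rate that grows as more values are added to a fixed-size bit array.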

@ -25,6 +25,7 @@ title: "Basic Security"
# Druid Basic Security
This Apache Druid (incubating) extension adds:
- an Authenticator which supports [HTTP Basic authentication](https://en.wikipedia.org/wiki/Basic_access_authentication)
- an Authorizer which implements basic role-based access control
@ -342,6 +343,7 @@ Unassign role {roleName} from user {userName}
Set the permissions of {roleName}. This replaces the previous set of permissions on the role.
Content: List of JSON Resource-Action objects, e.g.:
```
[
{

@ -55,6 +55,7 @@ the implementation of splittable firehoses. Please note that multiple tasks can
if one of them fails.
You may want to consider the below points:
- Since this task doesn't shuffle intermediate data, it isn't available for [perfect rollup](../ingestion/index.html#roll-up-modes).
- The number of tasks for parallel ingestion is decided by `maxNumSubTasks` in the tuningConfig.
Since the supervisor task creates up to `maxNumSubTasks` worker tasks regardless of the available task slots,

@ -37,6 +37,7 @@ If you have questions on tuning Druid for specific use cases, or questions on co
#### Heap sizing
The biggest contributions to heap usage on Historicals are:
- Partial unmerged query results from segments
- The stored maps for [lookups](../querying/lookups.html).
@ -63,6 +64,7 @@ Be sure to add `(2 * total size of all loaded lookups)` to your heap size in add
Please see the [General Guidelines for Processing Threads and Buffers](#general-guidelines-for-processing-threads-and-buffers) section for an overview of processing thread/buffer configuration.
On Historicals:
- `druid.processing.numThreads` should generally be set to `(number of cores - 1)`: a smaller value can result in CPU underutilization, while going over the number of cores can result in unnecessary CPU contention.
- `druid.processing.buffer.sizeBytes` can be set to 500MB.
- `druid.processing.numMergeBuffers`: a 1:4 ratio of merge buffers to processing threads is a reasonable choice for general use.
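The three Historical guidelines above can be combined into a small helper. This is a sketch under stated assumptions: the function name is hypothetical, `500MB` is taken as 500,000,000 bytes (the same value used in the configs elsewhere in these docs), and the 1:4 ratio is rounded down with a minimum of one merge buffer, a rounding policy the text does not specify.

```python
def historical_processing_config(num_cores: int) -> dict:
    """Hypothetical helper applying the Historical processing guidelines.

    Assumes 500MB means 500,000,000 bytes and rounds the 1:4
    merge-buffer-to-thread ratio down, keeping at least one merge buffer.
    """
    if num_cores < 2:
        raise ValueError("the (number of cores - 1) guideline needs at least 2 cores")
    num_threads = num_cores - 1  # leave one core free to avoid CPU contention
    num_merge_buffers = max(1, num_threads // 4)  # roughly 1 merge buffer per 4 threads
    return {
        "druid.processing.numThreads": num_threads,
        "druid.processing.buffer.sizeBytes": 500_000_000,
        "druid.processing.numMergeBuffers": num_merge_buffers,
    }


if __name__ == "__main__":
    # Print the suggestions in runtime.properties style.
    for key, value in historical_processing_config(16).items():
        print(f"{key}={value}")
```

For a 16-core Historical this prints `druid.processing.numThreads=15`, `druid.processing.buffer.sizeBytes=500000000` and `druid.processing.numMergeBuffers=3`. Treat the output as a starting point to adjust for the actual workload, not a fixed rule.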

@ -28,6 +28,7 @@ If you have been running an evaluation Druid cluster using local deep storage an
more production-capable deep storage system such as S3 or HDFS, this document describes the necessary steps.
Migration of deep storage involves the following steps at a high level:
- Copying segments from local deep storage to the new deep storage
- Exporting Druid's segments table from metadata
- Rewriting the load specs in the exported segment data to reflect the new deep storage location

@ -27,6 +27,7 @@ title: "Export Metadata Tool"
Druid includes an `export-metadata` tool for assisting with migration of cluster metadata and deep storage.
This tool exports the contents of the following Druid metadata tables:
- segments
- rules
- config
@ -37,6 +38,7 @@ Additionally, the tool can rewrite the local deep storage location descriptors i
to point to new deep storage locations (S3, HDFS, and local rewrite paths are supported).
The tool has the following limitations:
- Only exporting from Derby metadata is currently supported
- If rewriting load specs for deep storage migration, only migrating from local deep storage is currently supported.
@ -46,20 +48,19 @@ The `export-metadata` tool provides the following options:
### Connection Properties
`--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
`--user`: Username
`--password`: Password
`--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
- `--connectURI`: The URI of the Derby database, e.g. `jdbc:derby://localhost:1527/var/druid/metadata.db;create=true`
- `--user`: Username
- `--password`: Password
- `--base`: corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.
### Output Path
`--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
- `--output-path`, `-o`: The output directory of the tool. CSV files for the Druid segments, rules, config, datasource, and supervisors tables will be written to this directory.
### Export Format Options
`--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
`--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
- `--use-hex-blobs`, `-x`: If set, export BLOB payload columns as hexadecimal strings. This needs to be set if importing back into Derby. Default is false.
- `--booleans-as-strings`, `-t`: If set, write boolean values as "true" or "false" instead of "1" and "0". This needs to be set if importing back into Derby. Default is false.
### Deep Storage Migration
@ -69,8 +70,8 @@ By setting the options below, the tool will rewrite the segment load specs to po
This helps users migrate segments stored in local deep storage to S3.
`--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
`--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
- `--s3bucket`, `-b`: The S3 bucket that will hold the migrated segments
- `--s3baseKey`, `-k`: The base S3 key where the migrated segments will be stored
When copying the local deep storage segments to S3, the rewrite performed by this tool requires that the directory structure of the segments be unchanged.
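The directory-structure requirement can be illustrated with a hypothetical path-mapping helper: a segment's location relative to the local deep storage root is appended unchanged to the `--s3baseKey` prefix inside the `--s3bucket` bucket. This sketches the idea only and is not the `export-metadata` tool's actual rewrite logic; the function name, the local root, the bucket name and the example segment path are all assumptions.

```python
from pathlib import PurePosixPath


def local_path_to_s3_uri(segment_path: str, local_root: str,
                         s3_bucket: str, s3_base_key: str) -> str:
    """Hypothetical mapping from a local deep storage file to its S3 location.

    The part of segment_path below local_root is reused verbatim under
    s3_base_key, so the copied objects must keep the same directory layout.
    """
    # relative_to raises ValueError if the segment is not under local_root.
    relative = PurePosixPath(segment_path).relative_to(local_root)
    key = PurePosixPath(s3_base_key) / relative
    return f"s3://{s3_bucket}/{key}"


if __name__ == "__main__":
    # Illustrative paths only; real segment directories will differ.
    print(local_path_to_s3_uri(
        "/var/druid/segments/wikipedia/2016-06-27/v1/0/index.zip",
        local_root="/var/druid/segments",
        s3_bucket="my-druid-bucket",
        s3_base_key="migrated/segments",
    ))
```

If the segments were reorganized while copying them to S3, the rewritten locations computed this way would point at objects that do not exist, which is why the layout has to be preserved.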
@ -142,6 +143,7 @@ java -classpath "lib/*" -Dlog4j.configurationFile=conf/druid/cluster/_common/log
```
In the example command above:
- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `/tmp/csv` is the output directory. Please make sure that this directory exists.

@ -61,6 +61,7 @@ Update your Druid runtime properties with the new metadata configuration.
Druid provides a `metadata-init` tool for creating Druid's metadata tables. After initializing the Druid database, you can run the commands shown below from the root of the Druid package to initialize the tables.
In the example commands below:
- `lib` is the Druid lib directory
- `extensions` is the Druid extensions directory
- `base` corresponds to the value of `druid.metadata.storage.tables.base` in the configuration, `druid` by default.

@ -59,6 +59,7 @@ JVM Flags:
Please note that the above flags are general guidelines only. Be cautious and feel free to change them if necessary for the specific deployment.
Additionally, for large JVM heaps, here are a few garbage collection efficiency guidelines that have been known to help in some cases.
- Mount /tmp on tmpfs (see http://www.evanjones.ca/jvm-mmap-pause.html)
- On disk-I/O-intensive processes (e.g. Historical and MiddleManager), GC and Druid logs should be written to a different disk than where data is written.
- Disable Transparent Huge Pages (see https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp)

@ -337,6 +337,7 @@ The [Approximate Histogram](../development/extensions-core/approximate-histogram
The algorithm used by this deprecated aggregator is highly distribution-dependent and its output is subject to serious distortions when the input does not fit within the algorithm's limitations.
A [study published by the DataSketches team](https://datasketches.github.io/docs/Quantiles/DruidApproxHistogramStudy.html) demonstrates some of the known failure modes of this algorithm:
- The algorithm's quantile calculations can fail to provide results for a large range of rank values (all ranks less than 0.89 in the example used in the study), returning all zeroes instead.
- The algorithm can completely fail to record spikes in the tail ends of the distribution
- In general, the histogram produced by the algorithm can deviate significantly from the true histogram, with no bounds on the errors.

@ -30,6 +30,7 @@ In this document, we'll set up a simple cluster and discuss how it can be furthe
your needs.
This simple cluster will feature:
- A Master server to host the Coordinator and Overlord processes
- Two scalable, fault-tolerant Data servers running Historical and MiddleManager processes
- A Query server, hosting the Druid Broker and Router processes
@ -49,6 +50,7 @@ The Coordinator and Overlord processes are responsible for handling the metadata
In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
This hardware offers:
- 8 vCPUs
- 31 GB RAM
@ -77,6 +79,7 @@ in-memory query cache. These servers benefit greatly from CPU and RAM.
In this example, we will be deploying the equivalent of one AWS [m5.2xlarge](https://aws.amazon.com/ec2/instance-types/m5/) instance.
This hardware offers:
- 8 vCPUs
- 31 GB RAM
@ -323,6 +326,7 @@ You can copy your existing `coordinator-overlord` configs from the single-server
Suppose we are migrating from a single-server deployment that had 32 CPU and 256GB RAM. In the old deployment, the following configurations for Historicals and MiddleManagers were applied:
Historical (Single-server)
```
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=8
@ -330,6 +334,7 @@ druid.processing.numThreads=31
```
MiddleManager (Single-server)
```
druid.worker.capacity=8
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
@ -340,11 +345,13 @@ druid.indexer.fork.property.druid.processing.numThreads=1
In the clustered deployment, we can choose a split factor (2 in this example), and deploy 2 Data servers with 16CPU and 128GB RAM each. The areas to scale are the following:
Historical
- `druid.processing.numThreads`: Set to `(num_cores - 1)` based on the new hardware
- `druid.processing.numMergeBuffers`: Divide the old value from the single-server deployment by the split factor
- `druid.processing.buffer.sizeBytes`: Keep this unchanged
MiddleManager:
- `druid.worker.capacity`: Divide the old value from the single-server deployment by the split factor
- `druid.indexer.fork.property.druid.processing.numMergeBuffers`: Keep this unchanged
- `druid.indexer.fork.property.druid.processing.buffer.sizeBytes`: Keep this unchanged
@ -353,6 +360,7 @@ MiddleManager:
The resulting configs after the split:
New Historical (on 2 Data servers)
```
druid.processing.buffer.sizeBytes=500000000
druid.processing.numMergeBuffers=8
@ -360,6 +368,7 @@ New Historical (on 2 Data servers)
```
New MiddleManager (on 2 Data servers)
```
druid.worker.capacity=4
druid.indexer.fork.property.druid.processing.numMergeBuffers=2