---
layout: doc_page
title: "Apache Druid (incubating) Recommendations"
---

<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

# Recommendations

# Some General guidelines

JVM Flags:

```
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=<something other than /tmp which might be mounted to volatile tmpfs file system>
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
-Dorg.jboss.logging.provider=slf4j
-Dnet.spy.log.LoggerImpl=net.spy.memcached.compat.log.SLF4JLogger
-Dlog4j.shutdownCallbackRegistry=org.apache.druid.common.config.Log4jShutdown
-Dlog4j.shutdownHookEnabled=true
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-Xloggc:/var/logs/druid/historical.gc.log
-XX:+UseGCLogFileRotation
-XX:NumberOfGCLogFiles=50
-XX:GCLogFileSize=10m
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/var/logs/druid/historical.hprof
-XX:MaxDirectMemorySize=10240g
```

The `ExitOnOutOfMemoryError` flag is only supported starting with JDK 8u92. For older versions, `-XX:OnOutOfMemoryError='kill -9 %p'` can be used.

`MaxDirectMemorySize` restricts the JVM from allocating more direct (off-heap) memory than the specified limit. Setting it to an effectively unlimited value, as above, lifts the JVM-level restriction, while OS-level memory limits still apply. It is still important to make sure that Druid is not configured to allocate more off-heap memory than your machine has available. Important settings here include `druid.processing.numThreads`, `druid.processing.numMergeBuffers`, and `druid.processing.buffer.sizeBytes`.
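
As a rough illustration of how the three settings above interact, the sketch below estimates the direct memory a process may need. It assumes the commonly cited rule of thumb `sizeBytes * (numThreads + numMergeBuffers + 1)`; the function names and the example values are hypothetical, and the estimate ignores other off-heap users such as memory-mapped segments.

```python
# Rough off-heap (direct memory) estimate for a single Druid process.
# Assumption: rule of thumb sizeBytes * (numThreads + numMergeBuffers + 1).

GIB = 1024 ** 3
MIB = 1024 ** 2


def required_direct_memory_bytes(buffer_size_bytes, num_threads, num_merge_buffers):
    """Estimate the direct memory used by processing and merge buffers."""
    return buffer_size_bytes * (num_threads + num_merge_buffers + 1)


def fits_on_machine(heap_bytes, direct_bytes, machine_bytes):
    """Return True if heap plus direct memory stays within physical memory."""
    return heap_bytes + direct_bytes <= machine_bytes


# Hypothetical values, for illustration only.
direct = required_direct_memory_bytes(
    buffer_size_bytes=512 * MIB,  # druid.processing.buffer.sizeBytes
    num_threads=7,                # druid.processing.numThreads
    num_merge_buffers=2,          # druid.processing.numMergeBuffers
)
print(direct / GIB)                                 # 5.0
print(fits_on_machine(8 * GIB, direct, 64 * GIB))   # True
```

A check like this is only a lower bound on memory pressure: Historicals also rely on free memory for the OS page cache, so leaving headroom beyond heap plus direct memory is usually sensible.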

Please note that the above flags are general guidelines only. Be cautious and feel free to change them if necessary for your specific deployment.

Additionally, for large JVM heaps, here are a few garbage collection efficiency guidelines that have been known to help in some cases.
- Mount /tmp on tmpfs (see http://www.evanjones.ca/jvm-mmap-pause.html).
- On disk-I/O-intensive processes (e.g. Historical and MiddleManager), GC and Druid logs should be written to a different disk than the one where data is written.
- Disable Transparent Huge Pages (see https://blogs.oracle.com/linux/performance-issues-with-transparent-huge-pages-thp).
- Try disabling biased locking by using the `-XX:-UseBiasedLocking` JVM flag (see https://dzone.com/articles/logging-stop-world-pauses-jvm).
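
To verify the Transparent Huge Pages guideline on a host, a small check like the following may help. It is a minimal sketch that assumes the mainline Linux sysfs path `/sys/kernel/mm/transparent_hugepage/enabled`, where the active mode is shown in square brackets; some distributions expose the setting under a different path, and the helper names are hypothetical.

```python
import re

# Assumption: mainline Linux sysfs location; some distributions differ.
THP_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_setting(text):
    """Return the active mode from text such as 'always madvise [never]'."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_thp_setting(path=THP_PATH):
    """Read the active THP mode, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return parse_thp_setting(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    setting = read_thp_setting()
    if setting is None:
        print("THP setting not found at", THP_PATH)
    elif setting != "never":
        print("THP is active (mode: %s); consider disabling it" % setting)
    else:
        print("THP is disabled")
```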

# Use UTC Timezone

We recommend using the UTC timezone for all your events and across your hosts, not just for Druid, but for all data infrastructure. This can greatly mitigate potential query problems caused by inconsistent timezones. To query in a non-UTC timezone, see [query granularities](../querying/granularities.html#period-granularities).
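
For illustration, a native timeseries query that buckets results by day in a non-UTC timezone might be built as follows. This is a sketch only: the data source name, interval, and aggregator name are hypothetical, and the granularity object follows the period granularity form described on the linked page.

```python
import json

# Period granularity: one-day buckets aligned to a non-UTC timezone.
daily_in_la = {
    "type": "period",
    "period": "P1D",
    "timeZone": "America/Los_Angeles",
}

# Hypothetical data source and interval, for illustration only.
query = {
    "queryType": "timeseries",
    "dataSource": "example_events",
    "granularity": daily_in_la,
    "aggregations": [{"type": "count", "name": "rows"}],
    "intervals": ["2019-01-01T00:00:00-08:00/2019-01-08T00:00:00-08:00"],
}

# The JSON body that would be sent to the query endpoint.
print(json.dumps(query, indent=2))
```

The stored data stays in UTC; only the bucketing of results shifts to the requested timezone.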

# SSDs

SSDs are highly recommended for Historical and real-time processes if you are not running a cluster that is entirely in memory. SSDs can greatly reduce the time required to page data in and out of memory.

# JBOD vs RAID
Historical processes store a large number of segments on disk and support specifying multiple paths for storing them. Typically, hosts have multiple disks configured with RAID, which makes them look like a single disk to the OS. RAID might add overhead, especially if it is software-based rather than hardware-controller-based. So, Historicals might get improved disk throughput with JBOD.
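
With JBOD, each disk is typically listed as its own entry in `druid.segmentCache.locations`. The sketch below renders such a property line from a list of mount points; the mount paths, the per-disk size, and the helper name are hypothetical and would need to match your hosts.

```python
import json


def segment_cache_property(paths, max_size_bytes):
    """Render a druid.segmentCache.locations line with one entry per disk."""
    locations = [{"path": p, "maxSize": max_size_bytes} for p in paths]
    return "druid.segmentCache.locations=" + json.dumps(locations)


# Hypothetical mount points, one per physical disk.
disks = [
    "/mnt/disk1/druid/segment-cache",
    "/mnt/disk2/druid/segment-cache",
]
print(segment_cache_property(disks, 300 * 10 ** 9))
```

Whether JBOD actually outperforms a given RAID setup depends on the hardware, so it is worth benchmarking both before committing.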

# Use Timeseries and TopN Queries Instead of GroupBy Where Possible

Timeseries and TopN queries are much more optimized and significantly faster than groupBy queries for their designed use cases. Issuing multiple topN or timeseries queries from your application can potentially be more efficient than a single groupBy query.
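
As an example of the substitution, a "top 10 values of one dimension by one metric" question that could be written as a groupBy can often be expressed as a topN instead. The following is a minimal sketch of such a native query; the data source, dimension, metric, and interval are hypothetical.

```python
import json


def top_n_query(data_source, dimension, metric, threshold, interval):
    """Build a native topN query ranking one dimension by a summed metric."""
    return {
        "queryType": "topN",
        "dataSource": data_source,
        "dimension": dimension,
        "metric": metric,
        "threshold": threshold,
        "granularity": "all",
        "aggregations": [
            {"type": "longSum", "name": metric, "fieldName": metric}
        ],
        "intervals": [interval],
    }


# Hypothetical names, for illustration only.
query = top_n_query(
    "example_events", "country", "clicks", 10, "2019-01-01/2019-01-08"
)
print(json.dumps(query, indent=2))
```

Queries that group on several dimensions at once do not map directly onto a single topN; in those cases, issuing a few topN or timeseries queries from the application is the pattern described above.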

# Segment sizes matter

Segments should generally be between 300 MB and 700 MB in size. Too many small segments result in inefficient CPU utilization, and too many large segments impact query performance, most notably with TopN queries.
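
A simple way to audit this is to bucket segment sizes against the recommended range. The sketch below assumes sizes are available in bytes from whatever source you use to list segments, and treats 1 MB as 1,000,000 bytes; the function name and sample sizes are hypothetical.

```python
from collections import Counter

MB = 1_000_000  # Assumption: decimal megabytes.
MIN_BYTES = 300 * MB
MAX_BYTES = 700 * MB


def classify_segment(size_bytes):
    """Label a segment relative to the 300 MB to 700 MB guideline."""
    if size_bytes < MIN_BYTES:
        return "too_small"
    if size_bytes > MAX_BYTES:
        return "too_large"
    return "ok"


# Hypothetical segment sizes in bytes.
sizes = [40 * MB, 350 * MB, 650 * MB, 1200 * MB]
summary = Counter(classify_segment(s) for s in sizes)
print(dict(summary))  # {'too_small': 1, 'ok': 2, 'too_large': 1}
```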

# Read FAQs

For common problems and their solutions, read the following:

1) [Ingestion-FAQ](../ingestion/faq.html)

2) [Performance-FAQ](../operations/performance-faq.html)