Historical Node Configuration

For general Historical Node information, see here.

Runtime Configuration

The Historical node uses several of the global configs in Configuration, and also has the following node-specific configurations:

Node Configs

| Property | Description | Default |
|---|---|---|
| druid.host | The host for the current node. This is used to advertise the current process's location as reachable from another node and should generally be specified such that http://${druid.host}/ could actually talk to this process. | InetAddress.getLocalHost().getCanonicalHostName() |
| druid.plaintextPort | The port to actually listen on; unless port mapping is used, this will be the same port as is on druid.host. | 8083 |
| druid.tlsPort | TLS port for the HTTPS connector. This config is used if druid.enableTlsPort is set. If druid.host contains a port, that port is ignored. This should be a non-negative integer. | 8283 |
| druid.service | The name of the service. This is used as a dimension when emitting metrics and alerts to differentiate between the various services. | druid/historical |
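
As an illustration, the node configs above might appear in a Historical's runtime.properties roughly as follows. This is a minimal sketch: the hostname historical-1.example.com is a hypothetical placeholder, and the port and service name simply restate the defaults from the table.

```properties
# Hypothetical hostname; other nodes must be able to reach this process through it.
druid.host=historical-1.example.com
# Defaults from the table above, written out explicitly.
druid.plaintextPort=8083
druid.service=druid/historical
```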

General Configuration

| Property | Description | Default |
|---|---|---|
| druid.server.maxSize | The maximum number of bytes-worth of segments that the node wants assigned to it. This is not a limit that Historical nodes actually enforce, just a value published to the Coordinator node so it can plan accordingly. | 0 |
| druid.server.tier | A string to name the distribution tier that the storage node belongs to. Many of the rules Coordinator nodes use to manage segments can be keyed on tiers. | _default_tier |
| druid.server.priority | In a tiered architecture, the priority of the tier, which controls which nodes are queried. Higher numbers mean higher priority. The default (no priority) works for architectures with no cross replication (tiers that have no data-storage overlap). Data centers typically have equal priority. | 0 |
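
A sketch of how these properties might be combined for a node in a dedicated tier. The tier name hot, the priority value, and the capacity figure are assumptions chosen for illustration, not recommendations.

```properties
# Advertise roughly 300 GB of segment capacity to the Coordinator (not enforced locally).
druid.server.maxSize=300000000000
# Hypothetical tier name; Coordinator rules can be keyed on it.
druid.server.tier=hot
# A higher number gives this tier priority over tiers left at the default of 0.
druid.server.priority=10
```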

Storing Segments

| Property | Description | Default |
|---|---|---|
| druid.segmentCache.locations | Segments assigned to a Historical node are first stored on the local file system (in a disk cache) and then served by the Historical node. These locations define where that local cache resides. This value cannot be null or empty. Example: druid.segmentCache.locations=[{"path": "/mnt/druidSegments", "maxSize": 10000, "freeSpacePercent": 1.0}]. "freeSpacePercent" is optional; if provided, it enforces that much free disk partition space while storing segments. It depends on the File.getTotalSpace() and File.getFreeSpace() methods, so enable it only if they work for your file system. | none |
| druid.segmentCache.deleteOnRemove | Delete segment files from cache once a node is no longer serving a segment. | true |
| druid.segmentCache.dropSegmentDelayMillis | How long a node delays before completely dropping a segment. | 30000 (30 seconds) |
| druid.segmentCache.infoDir | Historical nodes keep track of the segments they are serving so that when the process is restarted they can reload the same segments without waiting for the Coordinator to reassign them. This path defines where this metadata is kept. The directory will be created if needed. | ${first_location}/info_dir |
| druid.segmentCache.announceIntervalMillis | How frequently to announce segments while segments are loading from cache. Set this value to zero to wait for all segments to be loaded before announcing. | 5000 (5 seconds) |
| druid.segmentCache.numLoadingThreads | How many segments to drop or load concurrently from deep storage. | 10 |
| druid.segmentCache.numBootstrapThreads | How many segments to load concurrently from local storage at startup. | Same as numLoadingThreads |

In druid.segmentCache.locations, freeSpacePercent was added because the maxSize setting is only a theoretical limit: it assumes that much space will always be available for storing segments. If a Druid bug leaves unaccounted segment files on disk, or some other process writes to the same disk, this check makes segment loading fail early, before the disk fills up completely, which leaves the host otherwise usable.
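
A sketch of a segment cache location that uses freeSpacePercent. The path comes from the example above; the sizes, the 5.0 threshold, and the thread count are hypothetical, and maxSize is assumed to be expressed in bytes.

```properties
# Allow up to roughly 300 GB of segments under /mnt/druidSegments (hypothetical size),
# and require 5% of the disk partition to stay free while storing segments.
druid.segmentCache.locations=[{"path": "/mnt/druidSegments", "maxSize": 300000000000, "freeSpacePercent": 5.0}]
# Optional: load or drop more segments in parallel than the default of 10.
druid.segmentCache.numLoadingThreads=16
```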

Query Configs

Concurrent Requests

Druid uses Jetty to serve HTTP requests.

| Property | Description | Default |
|---|---|---|
| druid.server.http.numThreads | Number of threads for HTTP requests. | max(10, (Number of cores * 17) / 16 + 2) + 30 |
| druid.server.http.queueSize | Size of the worker queue used by the Jetty server to temporarily store incoming client connections. If this value is set and a request is rejected by Jetty because the queue is full, the client observes a request failure: the TCP connection is closed immediately with a completely empty response from the server. | Unbounded |
| druid.server.http.maxIdleTime | The Jetty max idle time for a connection. | PT5m |
| druid.server.http.enableRequestLimit | If enabled, no requests are queued in the Jetty queue and an "HTTP 429 Too Many Requests" error response is sent instead. | false |
| druid.server.http.defaultQueryTimeout | Query timeout in milliseconds, beyond which unfinished queries will be cancelled. | 300000 |
| druid.server.http.gracefulShutdownTimeout | The maximum amount of time Jetty waits after receiving a shutdown signal. After this timeout the threads will be forcefully shut down. This allows any queries that are executing to complete. | PT0s (do not wait) |
| druid.server.http.unannouncePropogationDelay | How long to wait for ZooKeeper unannouncements to propagate before shutting down Jetty. This is a minimum, and druid.server.http.gracefulShutdownTimeout does not start counting down until after this period elapses. | PT0s (do not wait) |
| druid.server.http.maxQueryTimeout | Maximum allowed value (in milliseconds) for the timeout parameter. See query-context to learn more about timeout. A query is rejected if the query context timeout is greater than this value. | Long.MAX_VALUE |
| druid.server.http.maxRequestHeaderSize | Maximum size of a request header in bytes. Larger headers consume more memory and can make a server more vulnerable to denial-of-service attacks. | 8 * 1024 |
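
A sketch of how the timeout and shutdown settings might fit together. All values are hypothetical, and the durations are assumed to use the same ISO 8601 period format as the defaults in the table.

```properties
# Reject excess requests with HTTP 429 instead of queueing them in Jetty.
druid.server.http.enableRequestLimit=true
# Cancel unfinished queries after 60 seconds (value is in milliseconds).
druid.server.http.defaultQueryTimeout=60000
# Reject queries whose context timeout exceeds 5 minutes.
druid.server.http.maxQueryTimeout=300000
# On shutdown, wait 5 seconds for ZooKeeper unannouncements to propagate,
# then give running queries up to 30 seconds to complete.
druid.server.http.unannouncePropogationDelay=PT5S
druid.server.http.gracefulShutdownTimeout=PT30S
```

For reference, if the numThreads default formula uses integer division, a 16-core machine would get max(10, (16 * 17) / 16 + 2) + 30 = max(10, 19) + 30 = 49 threads.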

Processing

| Property | Description | Default |
|---|---|---|
| druid.processing.buffer.sizeBytes | This specifies a buffer size for the storage of intermediate results. The computation engine in both the Historical and Realtime nodes will use a scratch buffer of this size to do all of their intermediate computations off-heap. Larger values allow for more aggregations in a single pass over the data, while smaller values can require more passes depending on the query that is being executed. | 1073741824 (1GB) |
| druid.processing.buffer.poolCacheMaxCount | The processing buffer pool caches buffers for later use; this is the maximum number of buffers the cache will grow to. Note that the pool can create more buffers than it can cache if necessary. | Integer.MAX_VALUE |
| druid.processing.formatString | Realtime and Historical nodes use this format string to name their processing threads. | processing-%s |
| druid.processing.numMergeBuffers | The number of direct memory buffers available for merging query results. The buffers are sized by druid.processing.buffer.sizeBytes. This property is effectively a concurrency limit for queries that require merging buffers. If you are using any queries that require merge buffers (currently, just groupBy v2) then you should have at least two of these. | max(2, druid.processing.numThreads / 4) |
| druid.processing.numThreads | The number of processing threads to have available for parallel processing of segments. Our rule of thumb is num_cores - 1, which means that even under heavy load there will still be one core available to do background tasks like talking with ZooKeeper and pulling down segments. If only one core is available, this property defaults to the value 1. | Number of cores - 1 (or 1) |
| druid.processing.columnCache.sizeBytes | Maximum size in bytes for the dimension value lookup cache. Any value greater than 0 enables the cache. It is currently disabled by default. Enabling the lookup cache can significantly improve the performance of aggregators operating on dimension values, such as the JavaScript aggregator or the cardinality aggregator, but can slow things down if the cache hit rate is low (i.e. dimensions with few repeating values). Enabling it may also require additional garbage collection tuning to avoid long GC pauses. | 0 (disabled) |
| druid.processing.fifo | Whether the processing queue should treat tasks of equal priority in a FIFO manner. | false |
| druid.processing.tmpDir | Path where temporary files created while processing a query should be stored. If specified, this configuration takes priority over the default java.io.tmpdir path. | path represented by java.io.tmpdir |

The amount of direct memory needed by Druid is at least druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1). You can ensure at least this amount of direct memory is available by providing -XX:MaxDirectMemorySize=<VALUE> at the command line.
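
As a worked example of the formula above, consider a hypothetical 8-core machine with the default buffer size. The thread and merge buffer counts below are the values the default formulas in the table would yield for 8 cores, written out explicitly.

```properties
# Assumed values for an 8-core machine.
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=7
druid.processing.numMergeBuffers=2
# Minimum direct memory = 1073741824 * (2 + 7 + 1) = 10737418240 bytes (10 GiB),
# so the JVM would need at least -XX:MaxDirectMemorySize=10g on its command line.
```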

General Query Configuration

GroupBy Query Config

See groupBy server configuration.

Search Query Config

| Property | Description | Default |
|---|---|---|
| druid.query.search.maxSearchLimit | Maximum number of search results to return. | 1000 |

Caching

You can optionally enable caching on the Historical only, by setting the caching configs here.

| Property | Possible Values | Description | Default |
|---|---|---|---|
| druid.historical.cache.useCache | true, false | Enable the cache on the Historical. | false |
| druid.historical.cache.populateCache | true, false | Populate the cache on the Historical. | false |
| druid.historical.cache.unCacheable | All Druid query types | Query types that should not be cached. | ["groupBy", "select"] |

See cache configuration for how to configure cache settings.
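
A minimal sketch of enabling the cache on a Historical with the properties listed above. The unCacheable line only restates the default; the cache implementation itself is configured separately and is not shown here.

```properties
# Read from and write to the cache on this Historical.
druid.historical.cache.useCache=true
druid.historical.cache.populateCache=true
# Restates the default list of query types that are not cached.
druid.historical.cache.unCacheable=["groupBy", "select"]
```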