Druid Metrics
Druid generates metrics related to queries, ingestion, and coordination.
Metrics are emitted as JSON objects to a runtime log file or over HTTP (to a service such as Apache Kafka). Metric emission is disabled by default.
All Druid metrics share a common set of fields:
timestamp - the time the metric was created
metric - the name of the metric
service - the service name that emitted the metric
host - the host name that emitted the metric
value - some numeric value associated with the metric
Metrics may have additional dimensions beyond those listed above.
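As a rough illustration of the field layout described above, the following Python sketch separates the common fields of one emitted JSON object from its additional dimensions. The sample event, its values, and the helper name split_metric_event are hypothetical, and treating every non-common key as a dimension is a simplifying assumption.

```python
import json

# Field names taken from the list above; any other key is treated as a dimension.
COMMON_FIELDS = ("timestamp", "metric", "service", "host", "value")


def split_metric_event(raw):
    """Split one JSON metric event into its common fields and its extra dimensions."""
    event = json.loads(raw)
    missing = [name for name in COMMON_FIELDS if name not in event]
    if missing:
        raise ValueError("metric event is missing common fields: " + ", ".join(missing))
    common = {name: event[name] for name in COMMON_FIELDS}
    dimensions = {key: val for key, val in event.items() if key not in COMMON_FIELDS}
    return common, dimensions


# Hypothetical sample event; all values are made up for illustration.
sample = json.dumps({
    "timestamp": "2015-01-01T00:00:00.000Z",
    "metric": "query/time",
    "service": "druid/broker",
    "host": "broker.example.com:8082",
    "value": 42,
    "dataSource": "example_datasource",
    "type": "timeseries",
})

common, dimensions = split_metric_event(sample)
print(common["metric"], common["value"], sorted(dimensions))
```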
Most metric values reset each emission period. By default, the Druid emission period is 1 minute; this can be changed by setting the property druid.monitoring.emissionPeriod.
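Because most values cover a single emission period, a consumer that wants per-second rates has to divide by the period length. A minimal sketch, assuming the default 1-minute period and a hypothetical helper name per_second_rate:

```python
from datetime import timedelta

# Assumed emission period; 1 minute is the default stated above.
EMISSION_PERIOD = timedelta(minutes=1)


def per_second_rate(delta_value, period=EMISSION_PERIOD):
    """Convert a value accumulated over one emission period into a per-second rate."""
    seconds = period.total_seconds()
    if seconds <= 0:
        raise ValueError("emission period must be positive")
    return delta_value / seconds


# Hypothetical reading: 120000 events counted during one 1-minute period.
print(per_second_rate(120000))  # 2000.0
```

If the emission period is changed in the configuration, the same value has to be passed as the period argument, otherwise the computed rates will be off by a constant factor.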
Available Metrics
Query Metrics
Broker
Metric | Description | Dimensions | Normal Value |
query/time | Milliseconds taken to complete a query. | Common: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id. Aggregation Queries: numMetrics, numComplexMetrics. GroupBy: numDimensions. TopN: threshold, dimension. | < 1s |
query/bytes | Number of bytes returned in the query response. | Common: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id. Aggregation Queries: numMetrics, numComplexMetrics. GroupBy: numDimensions. TopN: threshold, dimension. | |
query/node/time | Milliseconds taken to query individual historical/realtime nodes. | id, status, server. | < 1s |
query/node/bytes | Number of bytes returned from querying individual historical/realtime nodes. | id, status, server. | |
query/node/ttfb | Time to first byte. Milliseconds elapsed until the broker starts receiving the response from individual historical/realtime nodes. | id, status, server. | < 1s |
query/intervalChunk/time | Only emitted if interval chunking is enabled. Milliseconds required to query an interval chunk. | id, status, chunkInterval (if interval chunking is enabled). | < 1s |
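The "Normal Value" column can be turned into a simple check on the consumer side. The sketch below counts query/time events per dataSource that reach the 1-second mark; the event dictionaries, the data source names, and the helper name slow_query_counts are hypothetical, and the events are assumed to be already parsed from JSON.

```python
from collections import defaultdict

SLOW_QUERY_MS = 1000  # the "< 1s" normal value from the table, in milliseconds


def slow_query_counts(events, threshold_ms=SLOW_QUERY_MS):
    """Count query/time events per dataSource whose value reaches the threshold."""
    counts = defaultdict(int)
    for event in events:
        if event.get("metric") != "query/time":
            continue  # ignore every other metric
        if event["value"] >= threshold_ms:
            counts[event.get("dataSource", "unknown")] += 1
    return dict(counts)


# Hypothetical, already-parsed events.
events = [
    {"metric": "query/time", "value": 250, "dataSource": "example_a"},
    {"metric": "query/time", "value": 1800, "dataSource": "example_a"},
    {"metric": "query/time", "value": 1200, "dataSource": "example_b"},
    {"metric": "query/bytes", "value": 5000, "dataSource": "example_a"},
]
print(slow_query_counts(events))
```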
Historical
Metric | Description | Dimensions | Normal Value |
query/time | Milliseconds taken to complete a query. | Common: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id. Aggregation Queries: numMetrics, numComplexMetrics. GroupBy: numDimensions. TopN: threshold, dimension. | < 1s |
query/segment/time | Milliseconds taken to query an individual segment. Includes time to page in the segment from disk. | id, status, segment. | several hundred milliseconds |
query/wait/time | Milliseconds spent waiting for a segment to be scanned. | id, segment. | < several hundred milliseconds |
segment/scan/pending | Number of segments in queue waiting to be scanned. | | Close to 0 |
query/segmentAndCache/time | Milliseconds taken to query an individual segment or hit the cache (if it is enabled on the historical node). | id, segment. | several hundred milliseconds |
query/cpu/time | Microseconds of CPU time taken to complete a query. | Common: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id. Aggregation Queries: numMetrics, numComplexMetrics. GroupBy: numDimensions. TopN: threshold, dimension. | Varies |
Real-time
Metric | Description | Dimensions | Normal Value |
query/time | Milliseconds taken to complete a query. | Common: dataSource, type, interval, hasFilters, duration, context, remoteAddress, id. Aggregation Queries: numMetrics, numComplexMetrics. GroupBy: numDimensions. TopN: threshold, dimension. | < 1s |
query/wait/time | Milliseconds spent waiting for a segment to be scanned. | id, segment. | several hundred milliseconds |
segment/scan/pending | Number of segments in queue waiting to be scanned. | | Close to 0 |
Jetty
Metric | Description | Normal Value |
jetty/numOpenConnections | Number of open jetty connections. | Not much higher than number of jetty threads. |
Cache
Metric | Description | Normal Value |
query/cache/delta/* | Cache metrics since the last emission. | |
query/cache/total/* | Total cache metrics. | |
Metric | Description | Dimensions | Normal Value |
*/numEntries | Number of cache entries. | | Varies. |
*/sizeBytes | Size in bytes of cache entries. | | Varies. |
*/hits | Number of cache hits. | | Varies. |
*/misses | Number of cache misses. | | Varies. |
*/evictions | Number of cache evictions. | | Varies. |
*/hitRate | Cache hit rate. | | ~40% |
*/averageByte | Average cache entry byte size. | | Varies. |
*/timeouts | Number of cache timeouts. | | 0 |
*/errors | Number of cache errors. | | 0 |
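To relate the hit and miss counters to the hit rate, the sketch below recomputes a hit rate from query/cache/delta/hits and query/cache/delta/misses events. It assumes the hit rate is defined as hits divided by hits plus misses, which this page does not state explicitly; the helper names and sample values are hypothetical.

```python
HITS = "query/cache/delta/hits"
MISSES = "query/cache/delta/misses"


def cache_hit_rate(hits, misses):
    """Hit rate as hits / (hits + misses); 0.0 when there were no lookups."""
    lookups = hits + misses
    if lookups == 0:
        return 0.0
    return hits / lookups


def delta_hit_rate(events):
    """Sum the delta hit and miss counters in the events and derive a hit rate."""
    totals = {HITS: 0, MISSES: 0}
    for event in events:
        if event["metric"] in totals:
            totals[event["metric"]] += event["value"]
    return cache_hit_rate(totals[HITS], totals[MISSES])


# Hypothetical events covering one emission period.
events = [
    {"metric": HITS, "value": 40},
    {"metric": MISSES, "value": 60},
    {"metric": "query/cache/delta/evictions", "value": 3},
]
print(delta_hit_rate(events))  # 0.4
```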
Memcached only metrics
Memcached client metrics are reported as follows. These metrics come directly from the client rather than from the cache retrieval layer.
Metric | Description | Dimensions | Normal Value |
query/cache/memcached/total | Cache metrics unique to memcached (only if druid.cache.type=memcached), reported as their actual values. | Variable | N/A |
query/cache/memcached/delta | Cache metrics unique to memcached (only if druid.cache.type=memcached), reported as their delta from the prior event emission. | Variable | N/A |
Ingestion Metrics
These metrics are only available if the RealtimeMetricsMonitor is included in the monitors list for the Realtime node. These metrics are deltas for each emission period.
Metric | Description | Dimensions | Normal Value |
ingest/events/thrownAway | Number of events rejected because they are outside the windowPeriod. | dataSource. | 0 |
ingest/events/unparseable | Number of events rejected because the events are unparseable. | dataSource. | 0 |
ingest/events/processed | Number of events successfully processed per emission period. | dataSource. | Equal to your # of events per emission period. |
ingest/rows/output | Number of Druid rows persisted. | dataSource. | Your # of events with rollup. |
ingest/persists/count | Number of times persist occurred. | dataSource. | Depends on configuration. |
ingest/persists/time | Milliseconds spent doing intermediate persist. | dataSource. | Depends on configuration. Generally a few minutes at most. |
ingest/persists/cpu | CPU time in nanoseconds spent doing intermediate persist. | dataSource. | Depends on configuration. Generally a few minutes at most. |
ingest/persists/backPressure | Number of persists pending. | dataSource. | 0 |
ingest/persists/failed | Number of persists that failed. | dataSource. | 0 |
ingest/handoff/failed | Number of handoffs that failed. | dataSource. | 0 |
ingest/merge/time | Milliseconds spent merging intermediate segments. | dataSource. | Depends on configuration. Generally a few minutes at most. |
ingest/merge/cpu | CPU time in nanoseconds spent merging intermediate segments. | dataSource. | Depends on configuration. Generally a few minutes at most. |
ingest/handoff/count | Number of handoffs that happened. | dataSource. | Varies. Generally greater than 0 once every segment granularity period if the cluster is operating normally. |
Note: If the JVM does not support CPU time measurement for the current thread, ingest/merge/cpu and ingest/persists/cpu will be 0.
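One way to use the ingestion deltas is to estimate how much rollup compresses the input, by comparing ingest/events/processed with ingest/rows/output. The sketch below is an approximation only: persists happen at their own cadence, so the events should span a window much longer than one emission period. The helper name rollup_ratio and the sample values are hypothetical.

```python
def rollup_ratio(events, data_source):
    """Approximate events-per-row for one dataSource; None if no rows were output."""
    processed = 0
    output = 0
    for event in events:
        if event.get("dataSource") != data_source:
            continue
        if event["metric"] == "ingest/events/processed":
            processed += event["value"]
        elif event["metric"] == "ingest/rows/output":
            output += event["value"]
    if output == 0:
        return None  # nothing persisted in this window, ratio is undefined
    return processed / output


# Hypothetical deltas collected over several emission periods.
events = [
    {"metric": "ingest/events/processed", "value": 600, "dataSource": "example_a"},
    {"metric": "ingest/events/processed", "value": 400, "dataSource": "example_a"},
    {"metric": "ingest/rows/output", "value": 250, "dataSource": "example_a"},
    {"metric": "ingest/events/processed", "value": 999, "dataSource": "example_b"},
]
print(rollup_ratio(events, "example_a"))  # 4.0
```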
Indexing Service
Metric | Description | Dimensions | Normal Value |
task/run/time | Milliseconds taken to run task. | dataSource, taskType, taskStatus. | Varies. |
segment/added/bytes | Size in bytes of new segments created. | dataSource, taskType, interval. | Varies. |
segment/moved/bytes | Size in bytes of segments moved/archived via the Move Task. | dataSource, taskType, interval. | Varies. |
segment/nuked/bytes | Size in bytes of segments deleted via the Kill Task. | dataSource, taskType, interval. | Varies. |
Coordination
These metrics are for the Druid coordinator and are reset each time the coordinator runs the coordination logic.
Metric | Description | Dimensions | Normal Value |
segment/assigned/count | Number of segments assigned to be loaded in the cluster. | tier. | Varies. |
segment/moved/count | Number of segments moved in the cluster. | tier. | Varies. |
segment/dropped/count | Number of segments dropped due to being overshadowed. | tier. | Varies. |
segment/deleted/count | Number of segments dropped due to rules. | tier. | Varies. |
segment/unneeded/count | Number of segments dropped due to being marked as unused. | tier. | Varies. |
segment/cost/raw | Used in cost balancing. The raw cost of hosting segments. | tier. | Varies. |
segment/cost/normalization | Used in cost balancing. The normalization of hosting segments. | tier. | Varies. |
segment/cost/normalized | Used in cost balancing. The normalized cost of hosting segments. | tier. | Varies. |
segment/loadQueue/size | Size in bytes of segments to load. | server. | Varies. |
segment/loadQueue/failed | Number of segments that failed to load. | server. | 0 |
segment/loadQueue/count | Number of segments to load. | server. | Varies. |
segment/dropQueue/count | Number of segments to drop. | server. | Varies. |
segment/size | Size in bytes of available segments. | dataSource. | Varies. |
segment/count | Number of available segments. | dataSource. | < max |
segment/overShadowed/count | Number of overshadowed segments. | | Varies. |
General Health
Historical
Metric | Description | Dimensions | Normal Value |
segment/max | Maximum byte limit available for segments. | | Varies. |
segment/used | Bytes used for served segments. | dataSource, tier, priority. | < max |
segment/usedPercent | Percentage of space used by served segments. | dataSource, tier, priority. | < 100% |
segment/count | Number of served segments. | dataSource, tier, priority. | Varies. |
JVM
These metrics are only available if the JVMMonitor module is included.
Metric | Description | Dimensions | Normal Value |
jvm/pool/committed | Committed pool. | poolKind, poolName. | close to max pool |
jvm/pool/init | Initial pool. | poolKind, poolName. | Varies. |
jvm/pool/max | Max pool. | poolKind, poolName. | Varies. |
jvm/pool/used | Pool used. | poolKind, poolName. | < max pool |
jvm/bufferpool/count | Bufferpool count. | bufferPoolName. | Varies. |
jvm/bufferpool/used | Bufferpool used. | bufferPoolName. | close to capacity |
jvm/bufferpool/capacity | Bufferpool capacity. | bufferPoolName. | Varies. |
jvm/mem/init | Initial memory. | memKind. | Varies. |
jvm/mem/max | Max memory. | memKind. | Varies. |
jvm/mem/used | Used memory. | memKind. | < max memory |
jvm/mem/committed | Committed memory. | memKind. | close to max memory |
jvm/gc/count | Garbage collection count. | gcName. | < 100 |
jvm/gc/time | Garbage collection time. | gcName. | < 1s |
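The "< max memory" guideline for jvm/mem/used can be checked by pairing it with jvm/mem/max for the same memKind. A minimal sketch, assuming already-parsed events and that an undefined maximum may be reported as a non-positive number (such kinds are skipped); the helper name and sample values are hypothetical.

```python
def memory_utilization(events):
    """Return used/max per memKind from jvm/mem/used and jvm/mem/max events."""
    used, maximum = {}, {}
    for event in events:
        kind = event.get("memKind")
        if event["metric"] == "jvm/mem/used":
            used[kind] = event["value"]
        elif event["metric"] == "jvm/mem/max":
            maximum[kind] = event["value"]
    # Skip kinds whose max is missing or non-positive (assumed to mean "undefined").
    return {
        kind: used[kind] / maximum[kind]
        for kind in used
        if maximum.get(kind, 0) > 0
    }


# Hypothetical events from one emission period.
events = [
    {"metric": "jvm/mem/used", "memKind": "heap", "value": 512},
    {"metric": "jvm/mem/max", "memKind": "heap", "value": 1024},
    {"metric": "jvm/mem/used", "memKind": "nonheap", "value": 100},
    {"metric": "jvm/mem/max", "memKind": "nonheap", "value": -1},
]
print(memory_utilization(events))  # {'heap': 0.5}
```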
EventReceiverFirehose
The following metrics are only available if the EventReceiverFirehoseMonitor module is included.
Metric | Description | Dimensions | Normal Value |
ingest/events/buffered | Number of events queued in the EventReceiverFirehose's buffer. | serviceName, dataSource, taskId, bufferCapacity. | Equal to current # of events in the buffer queue. |
ingest/bytes/received | Number of bytes received by the EventReceiverFirehose. | serviceName, dataSource, taskId. | Varies. |
Sys
These metrics are only available if the SysMonitor module is included.
Metric | Description | Dimensions | Normal Value |
sys/swap/free | Free swap. | | Varies. |
sys/swap/max | Max swap. | | Varies. |
sys/swap/pageIn | Paged in swap. | | Varies. |
sys/swap/pageOut | Paged out swap. | | Varies. |
sys/disk/write/count | Writes to disk. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | Varies. |
sys/disk/read/count | Reads from disk. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | Varies. |
sys/disk/write/size | Bytes written to disk. Can be used to determine how much paging is occurring with regard to segments. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | Varies. |
sys/disk/read/size | Bytes read from disk. Can be used to determine how much paging is occurring with regard to segments. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | Varies. |
sys/net/write/size | Bytes written to the network. | netName, netAddress, netHwaddr. | Varies. |
sys/net/read/size | Bytes read from the network. | netName, netAddress, netHwaddr. | Varies. |
sys/fs/used | Filesystem bytes used. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | < max |
sys/fs/max | Filesystem bytes max. | fsDevName, fsDirName, fsTypeName, fsSysTypeName, fsOptions. | Varies. |
sys/mem/used | Memory used. | | < max |
sys/mem/max | Memory max. | | Varies. |
sys/storage/used | Disk space used. | fsDirName. | Varies. |
sys/cpu | CPU used. | cpuName, cpuTime. | Varies. |