opensearch-docs-cn/_monitoring-your-cluster/pa/rca/shard-hotspot.md

Add documentation for Shard Hotspot Identification RCA (#3741) * Shard Hotspot RCA Signed-off-by: ariamarble <armarble@amazon.com> * small update Signed-off-by: ariamarble <armarble@amazon.com> * API and content update Signed-off-by: ariamarble <armarble@amazon.com> * additional updates Signed-off-by: ariamarble <armarble@amazon.com> * Apply suggestions from doc review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * further doc review changes Signed-off-by: ariamarble <armarble@amazon.com> * Apply suggestions from doc review Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Added response field table Signed-off-by: ariamarble <armarble@amazon.com> * Add more details Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented tech review comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Last tech review update Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Small change Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * small typo Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Implemented doc review comments Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> * Update _monitoring-your-cluster/pa/rca/shard-hotspot.md Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> * Apply suggestions from code review Co-authored-by: Nathan Bower <nbower@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> --------- Signed-off-by: ariamarble <armarble@amazon.com> Signed-off-by: Fanit Kolchina <kolchfa@amazon.com> Signed-off-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: kolchfa-aws <105444904+kolchfa-aws@users.noreply.github.com> Co-authored-by: Fanit Kolchina <kolchfa@amazon.com> Co-authored-by: Melissa Vagi <vagimeli@amazon.com> Co-authored-by: Nathan Bower <nbower@amazon.com>
2023-05-02 11:03:11 -04:00
---
layout: default
title: Hot shard identification
parent: Root Cause Analysis
grand_parent: Performance Analyzer
nav_order: 30
---
# Hot shard identification
Hot shard identification root cause analysis (RCA) lets you identify a hot shard within an index. A hot shard is an outlier that consumes more resources than other shards and may lead to poor indexing and search performance. The hot shard identification RCA monitors the following metrics:

- CPU utilization
- Heap allocation rate

Shards may become hot because of the nature of your workload. For example, when you use a `_routing` parameter or a custom document ID, a specific shard or several shards within the cluster may receive more frequent updates than the others, consuming more CPU and heap resources than other shards.

The hot shard identification RCA compares the CPU utilization and heap allocation rates against their threshold values. If the usage for either metric is greater than the threshold, the shard is considered to be _hot_.
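
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RCA's actual implementation: the function name `is_hot` and its parameters are hypothetical, and the CPU value in the second call is invented for the example.

```python
def is_hot(cpu_usage, cpu_threshold, heap_alloc_rate, heap_threshold):
    """Return True if either metric exceeds its threshold (hypothetical helper)."""
    return cpu_usage > cpu_threshold or heap_alloc_rate > heap_threshold


# Both metrics exceed their thresholds, so the shard is hot.
print(is_hot(0.034449630200405396, 0.027397981341796683,
             10872119.748328414, 7605441.367010161))

# Only the heap allocation rate exceeds its threshold; the shard is still hot.
# The CPU usage value 0.01 is a made-up number for illustration.
print(is_hot(0.01, 0.027397981341796683,
             8019622.354388569, 7605441.367010161))
```

Because the two checks are combined with a logical OR, a shard that stays within its CPU threshold is still reported when its heap allocation rate is too high.
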
For more information about the hot shard identification RCA implementation, see [Hot Shard RCA](https://github.com/opensearch-project/performance-analyzer-rca/blob/main/src/main/java/org/opensearch/performanceanalyzer/rca/store/rca/hotshard/docs/README.md).
#### Example request
The following query requests hot shard identification:
```bash
GET _plugins/_performanceanalyzer/rca?name=HotShardClusterRca
```
{% include copy-curl.html %}
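
If you prefer to send the request from a script, the following Python sketch builds the same request URL and decodes the JSON response using only the standard library. The host `localhost`, port `9600`, plain HTTP, and the absence of authentication are assumptions about your environment, and the helper names `build_rca_url` and `fetch_rca` are hypothetical. Adjust them to match your cluster.

```python
import json
import urllib.parse
import urllib.request


def build_rca_url(host="localhost", port=9600, rca_name="HotShardClusterRca"):
    """Build the RCA request URL. Host and port are assumptions, not fixed values."""
    query = urllib.parse.urlencode({"name": rca_name})
    return f"http://{host}:{port}/_plugins/_performanceanalyzer/rca?{query}"


def fetch_rca(url, timeout=10):
    """Send the GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Print the URL only; call fetch_rca(build_rca_url()) against a running cluster.
print(build_rca_url())
```
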
#### Example response
The response contains a list of unhealthy shards:
```json
"HotShardClusterRca": [{
  "rca_name": "HotShardClusterRca",
  "timestamp": 1680721367563,
  "state": "unhealthy",
  "HotClusterSummary": [
    {
      "number_of_nodes": 3,
      "number_of_unhealthy_nodes": 1,
      "HotNodeSummary": [
        {
          "node_id": "7kosAbpASsqBoHmHkVXxmw",
          "host_address": "192.168.80.4",
          "HotResourceSummary": [
            {
              "resource_type": "cpu usage",
              "resource_metric": "cpu usage(num of cores)",
              "threshold": 0.027397981341796683,
              "value": 0.034449630200405396,
              "time_period_seconds": 60,
              "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"
            },
            {
              "resource_type": "heap",
              "resource_metric": "heap alloc rate(heap alloc rate in bytes per second)",
              "threshold": 7605441.367010161,
              "value": 10872119.748328414,
              "time_period_seconds": 60,
              "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"
            },
            {
              "resource_type": "heap",
              "resource_metric": "heap alloc rate(heap alloc rate in bytes per second)",
              "threshold": 7605441.367010161,
              "value": 8019622.354388569,
              "time_period_seconds": 60,
              "meta_data": "QRF4rBM7SNCDr1g3KU6HyA index9 0"
            }
          ]
        }
      ]
    }
  ]
}]
```
## Response fields
The following table lists the response fields.
Field | Type | Description
:--- | :--- | :---
rca_name | String | The name of the RCA. In this case, "HotShardClusterRca".
timestamp | Integer | The timestamp of the RCA, in milliseconds since the Unix epoch.
state | String | The state of the cluster determined by the RCA. The `state` can be `healthy`, `unhealthy`, or `unknown`.
HotClusterSummary.number_of_nodes | Integer | The number of nodes in the cluster.
HotClusterSummary.number_of_unhealthy_nodes | Integer | The number of nodes found to be in an `unhealthy` state.
HotClusterSummary.HotNodeSummary.node_id | String | The ID of the unhealthy node.
HotClusterSummary.HotNodeSummary.host_address | String | The host address of the unhealthy node.
HotClusterSummary.HotNodeSummary.HotResourceSummary.resource_type | String | The type of resource causing the unhealthy state, either "cpu usage" or "heap".
HotClusterSummary.HotNodeSummary.HotResourceSummary.resource_metric | String | The definition of the `resource_type`. Either "cpu usage(num of cores)" or "heap alloc rate(heap alloc rate in bytes per second)".
HotClusterSummary.HotNodeSummary.HotResourceSummary.threshold | Float | The value that determines whether a resource is contended.
HotClusterSummary.HotNodeSummary.HotResourceSummary.value | Float | The current value of the resource.
HotClusterSummary.HotNodeSummary.HotResourceSummary.time_period_seconds | Integer | The amount of time, in seconds, that a shard was monitored before its state was declared to be healthy or unhealthy.
HotClusterSummary.HotNodeSummary.HotResourceSummary.meta_data | String | The metadata associated with the `resource_type`.

In the preceding example response, the `meta_data` value of the last `HotResourceSummary` entry is `QRF4rBM7SNCDr1g3KU6HyA index9 0`. The `meta_data` string consists of three space-separated fields:

- Node name: `QRF4rBM7SNCDr1g3KU6HyA`
- Index name: `index9`
- Shard ID: `0`

This means that shard `0` of index `index9` on node `QRF4rBM7SNCDr1g3KU6HyA` is hot.
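
The three-field layout of `meta_data` makes it straightforward to extract hot shards from a response programmatically. The following Python sketch walks the nested summaries and splits each `meta_data` string. The helper names `parse_meta_data` and `hot_shards` are hypothetical, the sample is a trimmed-down copy of the example response, and the code assumes that the fields are always separated by single spaces.

```python
def parse_meta_data(meta_data):
    """Split a meta_data string into its three space-separated fields."""
    node, index, shard = meta_data.split(" ")
    return {"node": node, "index": index, "shard_id": int(shard)}


def hot_shards(rca_entries):
    """Yield (index, shard_id, resource_type) for every hot resource entry."""
    for rca in rca_entries:
        for cluster in rca.get("HotClusterSummary", []):
            for node in cluster.get("HotNodeSummary", []):
                for resource in node.get("HotResourceSummary", []):
                    fields = parse_meta_data(resource["meta_data"])
                    yield fields["index"], fields["shard_id"], resource["resource_type"]


# Trimmed-down copy of the example response; only the fields used here are kept.
sample = [{
    "rca_name": "HotShardClusterRca",
    "state": "unhealthy",
    "HotClusterSummary": [{
        "HotNodeSummary": [{
            "HotResourceSummary": [
                {"resource_type": "cpu usage", "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"},
                {"resource_type": "heap", "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"},
                {"resource_type": "heap", "meta_data": "QRF4rBM7SNCDr1g3KU6HyA index9 0"},
            ]
        }]
    }]
}]

for index, shard_id, resource_type in hot_shards(sample):
    print(f"{index} shard {shard_id}: {resource_type}")
```

A shard can appear more than once in the output when it exceeds the threshold for more than one resource, as shard `4` of `index9` does in the example.
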