Performance analyzer documentation updates (#1318)

* updated incorrect links

Signed-off-by: ariamarble <armarble@amazon.com>

* performance analyzer documentation update

Signed-off-by: ariamarble <armarble@amazon.com>

* draft update

Signed-off-by: ariamarble <armarble@amazon.com>

* converting to ready for review

Signed-off-by: ariamarble <armarble@amazon.com>

* tech review updates

Signed-off-by: ariamarble <armarble@amazon.com>

* made suggested changes

Signed-off-by: ariamarble <armarble@amazon.com>

* made further changes and fixed header

Signed-off-by: ariamarble <armarble@amazon.com>

* editorial changes

Signed-off-by: ariamarble <armarble@amazon.com>

Signed-off-by: ariamarble <armarble@amazon.com>
This commit is contained in:
Aria Marble 2022-10-12 09:22:07 -07:00 committed by GitHub
parent 8bd6150d20
commit 15c76088fe
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 224 additions and 125 deletions

View File

@ -7,127 +7,56 @@ redirect_from:
- /monitoring-plugins/pa/
---
# Performance Analyzer
# Performance analyzer
Performance Analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM). PerfTop is the default command line interface (CLI) for displaying those metrics.
Performance analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics.
To download PerfTop, see [Download](https://github.com/opensearch-project/perftop/releases) on the PerfTop release page.
You can also install it using [npm](https://www.npmjs.com/):
```bash
npm install -g @aws/opensearch-perftop
```
![PerfTop screenshot]({{site.url}}{{site.baseurl}}/images/perftop.png)
For enabling Performance Analyzer with tarball installations of OpenSearch, see [Configure Performance Analyzer for Tarball Installation](#configure-performance-analyzer-for-tarball-installations).
## Get started with PerfTop
The basic syntax is:
```bash
./opensearch-perf-top-<operating_system> --dashboard <dashboard>.json --endpoint <endpoint>
```
If you're using npm, the syntax is similar:
```bash
opensearch-perf-top --dashboard <dashboard> --endpoint <endpoint>
```
If you're running PerfTop from a node (i.e. locally), specify port 9600:
```bash
./opensearch-perf-top-linux --dashboard dashboards/<dashboard>.json --endpoint localhost:9600
```
Otherwise, just specify the OpenSearch endpoint:
```bash
./opensearch-perf-top-macos --dashboard dashboards/<dashboard>.json --endpoint my-cluster.my-domain.com
```
PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/dashboards/).
You can also load the pre-built dashboards (ClusterOverview, ClusterNetworkMemoryAnalysis, ClusterThreadAnalysis, or NodeAnalysis) without the JSON files, such as `--dashboard ClusterThreadAnalysis`.
PerfTop has no interactivity. Start the application, monitor the dashboard, and press Esc, Q, or Ctrl + C to quit.
The performance analyzer plugin is installed by default in OpenSearch version 2.0 and higher.
{: .note }
## Performance analyzer installation and configuration
### Other options
The following sections provide the steps for installing and configuring the performance analyzer plugin.
- For NodeAnalysis and similar custom dashboards, you can add the `--nodename <node_name>` argument if you want your dashboard to display metrics for only a single node.
- For troubleshooting, add the `--logfile <log-file>.txt` argument.
### Install performance analyzer
The performance analyzer plugin is included in the installation for [Docker]({{site.url}}{{site.baseurl}}/opensearch/install/docker/) and [tarball]({{site.url}}{{site.baseurl}}/opensearch/install/tar/). If you need to install the performance analyzer plugin manually, download the plugin from [Maven](https://search.maven.org/search?q=org.opensearch.plugin) and install the plugin using the standard [plugins install]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) process. Performance analyzer will run on each node in a cluster.
## Performance Analyzer configuration
To start the performance analyzer root cause analysis (RCA) agent on a tarball installation, run the following command:
### Storage
````bash
OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
````
Performance Analyzer uses `/dev/shm` for temporary storage. During heavy workloads on a cluster, Performance Analyzer can use up to 1 GB of space.
The following command enables the performance analyzer plugin and performance analyzer RCA agent:
Docker, however, has a default `/dev/shm` size of 64 MB. To change this value, you can use the `docker run --shm-size 1gb` flag or [a similar setting in Docker Compose](https://docs.docker.com/compose/compose-file#shm_size).
````bash
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
````
If you're not using Docker, check the size of `/dev/shm` using `df -h`. The default value is probably plenty, but if you need to change its size, add the following line to `/etc/fstab`:
To shut down the performance analyzer RCA agent, run the following command:
```bash
tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0
```
````bash
kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
````
Then remount the file system:
To disable the performance analyzer plugin, run the following command:
```bash
mount -o remount /dev/shm
```
````bash
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
````
To uninstall the performance analyzer plugin, run the following command:
### Security
````bash
bin/opensearch-plugin remove opensearch-performance-analyzer
````
Performance Analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$OPENSEARCH_HOME` directory:
### Configure performance analyzer
```bash
vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties
```
To configure the performance analyzer plugin, you will need to edit the `performance-analyzer.properties` configuration file in the `config/opensearch-performance-analyzer/` directory. Make sure to uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`. You can reference the following example configuration.
Change the following lines to configure encryption in transit. Note that `certificate-file-path` must be a certificate for the server, not a root CA:
```
https-enabled = true
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
```
## Enable Performance Analyzer for RPM/YUM installations
If you installed OpenSearch from an RPM distribution, you can start and stop Performance Analyzer with `systemctl`:
```bash
# Start OpenSearch Performance Analyzer
sudo systemctl start opensearch-performance-analyzer.service
# Stop OpenSearch Performance Analyzer
sudo systemctl stop opensearch-performance-analyzer.service
```
## Configure Performance Analyzer for tarball installations
In a tarball installation, Performance Analyzer collects data when it is enabled. But in order to read that data using the REST API on port 9600, you must first manually launch the associated reader agent process:
1. Make Performance Analyzer accessible outside of the host machine
```bash
cd /usr/share/opensearch # navigate to the OpenSearch home directory
cd config/opensearch-performance-analyzer/
vi performance-analyzer.properties
```
Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
```
````bash
# ======================== OpenSearch performance analyzer plugin config =========================
# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
@ -156,49 +85,219 @@ In a tarball installation, Performance Analyzer collects data when it is enabled
https-enabled = false
#Setup the correct path for certificates
certificate-file-path = specify_path
#certificate-file-path = specify_path
private-key-file-path = specify_path
#private-key-file-path = specify_path
# Plugin Stats Metadata file name, expected to be in the same location
plugin-stats-metadata = plugin-stats-metadata
# Agent Stats Metadata file name, expected to be in the same location
agent-stats-metadata = agent-stats-metadata
```
````
To start the performance analyzer RCA agent, run the following command.
1. Make the CLI executable:
````bash
OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
````
### Storage
Performance analyzer uses `/dev/shm` for temporary storage. During heavy workloads on a cluster, performance analyzer can use up to 1 GB of space.
Docker, however, has a default `/dev/shm` size of 64 MB. To change this value, you can use the `docker run --shm-size 1gb` flag or [a similar setting in Docker Compose](https://docs.docker.com/compose/compose-file#shm_size).
If you're not using Docker, check the size of `/dev/shm` using `df -h`. The default value is probably plenty, but if you need to change its size, add the following line to `/etc/fstab`:
```bash
sudo chmod +x ./bin/performance-analyzer-agent-cli
tmpfs /dev/shm tmpfs defaults,noexec,nosuid,size=1G 0 0
```
1. Launch the agent CLI:
Then remount the file system:
```bash
OPENSEARCH_HOME="$PWD" OPENSEARCH_PATH_CONF="$PWD/config" ./bin/performance-analyzer-agent-cli
mount -o remount /dev/shm
```
1. In a separate window, enable the Performance Analyzer plugin:
### Security
Performance analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$OPENSEARCH_HOME` directory.
```bash
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties
```
If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. Modify the following command to use your username and password:
Change the following lines to configure encryption in transit. Note that `certificate-file-path` must be a certificate for the server, not a root certificate authority (CA).
````bash
https-enabled = true
#Setup the correct path for certificates
certificate-file-path = specify_path
private-key-file-path = specify_path
````
### Enable performance analyzer for RPM/YUM installations
If you installed OpenSearch from an RPM distribution, you can start and stop performance analyzer with `systemctl`.
```bash
curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
# Start OpenSearch Performance Analyzer
sudo systemctl start opensearch-performance-analyzer.service
# Stop OpenSearch Performance Analyzer
sudo systemctl stop opensearch-performance-analyzer.service
```
1. Finally, enable the Root Cause Analyzer (RCA) framework
## Example API query and response
The following is an example API query:
````bash
GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
````
The following is an example response:
````json
{"Disk_Utilization":"%","Cache_Request_Hit":"count",
"Refresh_Time":"ms","ThreadPool_QueueLatency":"count",
"Merge_Time":"ms","ClusterApplierService_Latency":"ms",
"PublishClusterState_Latency":"ms",
"Cache_Request_Size":"B","LeaderCheck_Failure":"count",
"ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B",
"Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count",
"Thread_Blocked_Time":"s/event","VersionMap_Memory":"B",
"Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s",
"Indexing_Pressure_Current_Bytes":"B",
"Indexing_Pressure_Last_Successful_Timestamp":"ms",
"Net_PacketRate6":"packets/s","Cache_Query_Hit":"count",
"IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count",
"ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms",
"IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s",
"Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B",
"Indexing_Pressure_Rejection_Count":"count",
"CPU_Utilization":"cores","Cache_Query_Size":"B",
"Merge_Event":"count","Cache_FieldData_Eviction":"count",
"IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s",
"Paging_RSS":"pages",
"AdmissionControl_ThresholdValue":"count",
"Indexing_Pressure_Average_Window_Throughput":"count/s",
"Cache_MaxSize":"B","IndexWriter_Memory":"B",
"Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s",
"LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count",
"HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow",
"GC_Collection_Event":"count","Sched_CtxRate":"count/s",
"AdmissionControl_RejectionCount":"count","Heap_Max":"B",
"ClusterApplierService_Failure":"count",
"PublishClusterState_Failure":"count",
"Merge_CurrentEvent":"count","Indexing_Buffer":"B",
"Bitset_Memory":"B","Net_PacketDropRate4":"packets/s",
"Heap_Committed":"B","Net_PacketDropRate6":"packets/s",
"Thread_Blocked_Event":"count","GC_Collection_Time":"ms",
"Cache_Query_Miss":"count","Latency":"ms",
"Shard_State":"count","Thread_Waited_Event":"count",
"CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count",
"CB_TrippedEvents":"count","Disk_WaitTime":"ms",
"Data_RetryingPendingTasksCount":"count",
"AdmissionControl_CurrentValue":"count",
"Flush_Event":"count","Net_TCP_RxQ":"segments/flow",
"Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event",
"HTTP_TotalRequests":"count",
"ThreadPool_ActiveThreads":"count",
"Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow",
"Cache_Request_Eviction":"count","Segments_Total":"count",
"FollowerCheck_Latency":"ms","Heap_Used":"B",
"Master_ThrottledPendingTasksCount":"count",
"CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms",
"Master_PendingQueueSize":"count",
"Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s",
"ThreadPool_TotalThreads":"count","ShardEvents":"count",
"Net_TCP_NumFlows":"count","Election_Term":"count"}
````
## Root cause analysis
The [root cause analysis]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/rca/index/) (RCA) framework uses the information from performance analyzer to inform administrators of the root cause of performance and availability issues that their clusters might be experiencing.
### Enable the RCA framework
To enable the RCA framework, run the following command:
```bash
curl -XPOST localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
curl -XPOST http://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
```
Similar to step 4, if you run into `curl: (52) Empty reply from server`, run the command below to enable RCA
If you encounter the `curl: (52) Empty reply from server` response, run the following command to enable RCA:
```bash
curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
```
### Example API query and response
To request all available RCAs, run the following command:
````bash
GET localhost:9600/_plugins/_performanceanalyzer/rca
````
To request a specific RCA, run the following command:
````bash
GET localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA
````
The following is an example response:
```json
{
"HighHeapUsageClusterRCA": [{
"RCA_name": "HighHeapUsageClusterRCA",
"state": "unhealthy",
"timestamp": 1587426650942,
"HotClusterSummary": [{
"number_of_nodes": 2,
"number_of_unhealthy_nodes": 1,
"HotNodeSummary": [{
"host_address": "192.168.144.2",
"node_id": "JtlEoRowSI6iNpzpjlbp_Q",
"HotResourceSummary": [{
"resource_type": "old gen",
"threshold": 0.65,
"value": 0.81827232588145373,
"avg": NaN,
"max": NaN,
"min": NaN,
"unit_type": "heap usage in percentage",
"time_period_seconds": 600,
"TopConsumerSummary": [{
"name": "CACHE_FIELDDATA_SIZE",
"value": 590702564
},
{
"name": "CACHE_REQUEST_SIZE",
"value": 28375
},
{
"name": "CACHE_QUERY_SIZE",
"value": 12687
}
],
}]
}]
}]
}]
}
```
## Performance analyzer and RCA API references
### Related links
Further documentation on the use of performance analyzer and RCA can be found at the following links:
- [Performance analyzer API guide]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/api/).
- [Root cause analysis]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/rca/index/).
- [Root cause analysis API guide]({{site.url}}{{site.baseurl}}/latest/monitoring-plugins/pa/rca/api/).
- [RFC: Root cause analysis](https://github.com/opensearch-project/performance-analyzer-rca/blob/main/docs/rfc-rca.pdf).