From 15c76088fe913cb4874c192288096b658f6ea492 Mon Sep 17 00:00:00 2001
From: Aria Marble <111301581+ariamarble@users.noreply.github.com>
Date: Wed, 12 Oct 2022 09:22:07 -0700
Subject: [PATCH] Performance analyzer documentation updates (#1318)

* updated incorrect links

Signed-off-by: ariamarble

* performance analyzer documentation update

Signed-off-by: ariamarble

* draft update

Signed-off-by: ariamarble

* converting to ready for review

Signed-off-by: ariamarble

* tech review updates

Signed-off-by: ariamarble

* made suggested changes

Signed-off-by: ariamarble

* made further changes and fixed header

Signed-off-by: ariamarble

* editorial changes

Signed-off-by: ariamarble
Signed-off-by: ariamarble
---
 _monitoring-plugins/pa/index.md | 349 ++++++++++++++++++++------------
 1 file changed, 224 insertions(+), 125 deletions(-)

diff --git a/_monitoring-plugins/pa/index.md b/_monitoring-plugins/pa/index.md
index 3490ea76..7a5bfeec 100644
--- a/_monitoring-plugins/pa/index.md
+++ b/_monitoring-plugins/pa/index.md
@@ -7,67 +7,103 @@ redirect_from:
   - /monitoring-plugins/pa/
 ---
-# Performance Analyzer
+# Performance analyzer
-Performance Analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics, independent of the Java Virtual Machine (JVM). PerfTop is the default command line interface (CLI) for displaying those metrics.
+Performance analyzer is an agent and REST API that allows you to query numerous performance metrics for your cluster, including aggregations of those metrics.
-To download PerfTop, see [Download](https://github.com/opensearch-project/perftop/releases) on the PerfTop release page.
-
-You can also install it using [npm](https://www.npmjs.com/):
-
-```bash
-npm install -g @aws/opensearch-perftop
-```
-
-![PerfTop screenshot]({{site.url}}{{site.baseurl}}/images/perftop.png)
-
-For enabling Performance Analyzer with tarball installations of OpenSearch, see [Configure Performance Analyzer for Tarball Installation](#configure-performance-analyzer-for-tarball-installations).
-
-## Get started with PerfTop
-
-The basic syntax is:
-
-```bash
-./opensearch-perf-top- --dashboard .json --endpoint
-```
-
-If you're using npm, the syntax is similar:
-
-```bash
-opensearch-perf-top --dashboard --endpoint
-```
-
-If you're running PerfTop from a node (i.e. locally), specify port 9600:
-
-```bash
-./opensearch-perf-top-linux --dashboard dashboards/.json --endpoint localhost:9600
-```
-
-Otherwise, just specify the OpenSearch endpoint:
-
-```bash
-./opensearch-perf-top-macos --dashboard dashboards/.json --endpoint my-cluster.my-domain.com
-```
-
-PerfTop has four pre-built dashboards in the `dashboards` directory, but you can also [create your own]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/dashboards/).
-
-You can also load the pre-built dashboards (ClusterOverview, ClusterNetworkMemoryAnalysis, ClusterThreadAnalysis, or NodeAnalysis) without the JSON files, such as `--dashboard ClusterThreadAnalysis`.
-
-PerfTop has no interactivity. Start the application, monitor the dashboard, and press Esc, Q, or Ctrl + C to quit.
+The performance analyzer plugin is installed by default in OpenSearch version 2.0 and higher.
 {: .note }
+## Performance analyzer installation and configuration
-### Other options
+The following sections provide the steps for installing and configuring the performance analyzer plugin.
-- For NodeAnalysis and similar custom dashboards, you can add the `--nodename ` argument if you want your dashboard to display metrics for only a single node.
-- For troubleshooting, add the `--logfile .txt` argument.
+### Install performance analyzer
+The performance analyzer plugin is included in the installation for [Docker]({{site.url}}{{site.baseurl}}/opensearch/install/docker/) and [tarball]({{site.url}}{{site.baseurl}}/opensearch/install/tar/). If you need to install the performance analyzer plugin manually, download the plugin from [Maven](https://search.maven.org/search?q=org.opensearch.plugin) and install the plugin using the standard [plugins install]({{site.url}}{{site.baseurl}}/opensearch/install/plugins/) process. Performance analyzer will run on each node in a cluster.
-## Performance Analyzer configuration
+To start the performance analyzer root cause analysis (RCA) agent on a tarball installation, run the following command:
+
+````bash
+OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
+````
+
+The following command enables the performance analyzer plugin and performance analyzer RCA agent:
+
+````bash
+curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
+````
+
+To shut down the performance analyzer RCA agent, run the following command:
+
+````bash
+kill $(ps aux | grep -i 'PerformanceAnalyzerApp' | grep -v grep | awk '{print $2}')
+````
+
+To disable the performance analyzer plugin, run the following command:
+
+````bash
+curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": false}'
+````
+
+To uninstall the performance analyzer plugin, run the following command:
+
+````bash
+bin/opensearch-plugin remove opensearch-performance-analyzer
+````
+
+### Configure performance analyzer
+
+To configure the performance analyzer plugin, you will need to edit the `performance-analyzer.properties` configuration file in the `config/opensearch-performance-analyzer/` directory. Make sure to uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`. You can reference the following example configuration.
+
+````bash
+# ======================== OpenSearch performance analyzer plugin config =========================
+
+# NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
+
+# WebService bind host; default to all interfaces
+webservice-bind-host = 0.0.0.0
+
+# Metrics data location
+metrics-location = /dev/shm/performanceanalyzer/
+
+# Metrics deletion interval (minutes) for metrics data.
+# Interval should be between 1 to 60.
+metrics-deletion-interval = 1
+
+# If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
+# metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
+# the files and wouldn't like for them to be cleaned up.
+cleanup-metrics-db-files = true
+
+# WebService exposed by App's port
+webservice-listener-port = 9600
+
+# Metric DB File Prefix Path location
+metrics-db-file-prefix-path = /tmp/metricsdb_
+
+https-enabled = false
+
+#Setup the correct path for certificates
+#certificate-file-path = specify_path
+
+#private-key-file-path = specify_path
+
+# Plugin Stats Metadata file name, expected to be in the same location
+plugin-stats-metadata = plugin-stats-metadata
+
+# Agent Stats Metadata file name, expected to be in the same location
+agent-stats-metadata = agent-stats-metadata
+````
+To start the performance analyzer RCA agent, run the following command.
+
+````bash
+OPENSEARCH_HOME=~/opensearch-2.2.1 OPENSEARCH_JAVA_HOME=~/opensearch-2.2.1/jdk OPENSEARCH_PATH_CONF=~/opensearch-2.2.1/bin ./performance-analyzer-agent-cli
+````
 ### Storage
-Performance Analyzer uses `/dev/shm` for temporary storage. During heavy workloads on a cluster, Performance Analyzer can use up to 1 GB of space.
+Performance analyzer uses `/dev/shm` for temporary storage. During heavy workloads on a cluster, performance analyzer can use up to 1 GB of space.
 Docker, however, has a default `/dev/shm` size of 64 MB. To change this value, you can use the `docker run --shm-size 1gb` flag or [a similar setting in Docker Compose](https://docs.docker.com/compose/compose-file#shm_size).
@@ -83,29 +119,29 @@ Then remount the file system:
 mount -o remount /dev/shm
 ```
-
 ### Security
-Performance Analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$OPENSEARCH_HOME` directory:
+Performance analyzer supports encryption in transit for requests. It currently does *not* support client or server authentication for requests. To enable encryption in transit, edit `performance-analyzer.properties` in your `$OPENSEARCH_HOME` directory.
 ```bash
 vi $OPENSEARCH_HOME/config/opensearch-performance-analyzer/performance-analyzer.properties
 ```
-Change the following lines to configure encryption in transit. Note that `certificate-file-path` must be a certificate for the server, not a root CA:
+Change the following lines to configure encryption in transit. Note that `certificate-file-path` must be a certificate for the server, not a root certificate authority (CA).
-```
+````bash
 https-enabled = true
 #Setup the correct path for certificates
 certificate-file-path = specify_path
 private-key-file-path = specify_path
-```
+````
-## Enable Performance Analyzer for RPM/YUM installations
+### Enable performance analyzer for RPM/YUM installations
+
+If you installed OpenSearch from an RPM distribution, you can start and stop performance analyzer with `systemctl`.
-If you installed OpenSearch from an RPM distribution, you can start and stop Performance Analyzer with `systemctl`:
 ```bash
 # Start OpenSearch Performance Analyzer
 sudo systemctl start opensearch-performance-analyzer.service
@@ -113,92 +149,155 @@ sudo systemctl start opensearch-performance-analyzer.service
 sudo systemctl stop opensearch-performance-analyzer.service
 ```
-## Configure Performance Analyzer for tarball installations
+## Example API query and response
-In a tarball installation, Performance Analyzer collects data when it is enabled. But in order to read that data using the REST API on port 9600, you must first manually launch the associated reader agent process:
+The following is an example API query:
+
+````bash
+GET localhost:9600/_plugins/_performanceanalyzer/metrics/units
+````
-1. Make Performance Analyzer accessible outside of the host machine
+The following is an example response:
-   ```bash
-   cd /usr/share/opensearch # navigate to the OpenSearch home directory
-   cd config/opensearch-performance-analyzer/
-   vi performance-analyzer.properties
-   ```
+````json
+{"Disk_Utilization":"%","Cache_Request_Hit":"count",
+"Refresh_Time":"ms","ThreadPool_QueueLatency":"count",
+"Merge_Time":"ms","ClusterApplierService_Latency":"ms",
+"PublishClusterState_Latency":"ms",
+"Cache_Request_Size":"B","LeaderCheck_Failure":"count",
+"ThreadPool_QueueSize":"count","Sched_Runtime":"s/ctxswitch","Disk_ServiceRate":"MB/s","Heap_AllocRate":"B/s","Indexing_Pressure_Current_Limits":"B",
+"Sched_Waittime":"s/ctxswitch","ShardBulkDocs":"count",
+"Thread_Blocked_Time":"s/event","VersionMap_Memory":"B",
+"Master_Task_Queue_Time":"ms","IO_TotThroughput":"B/s",
+"Indexing_Pressure_Current_Bytes":"B",
+"Indexing_Pressure_Last_Successful_Timestamp":"ms",
+"Net_PacketRate6":"packets/s","Cache_Query_Hit":"count",
+"IO_ReadSyscallRate":"count/s","Net_PacketRate4":"packets/s","Cache_Request_Miss":"count",
+"ThreadPool_RejectedReqs":"count","Net_TCP_TxQ":"segments/flow","Master_Task_Run_Time":"ms",
+"IO_WriteSyscallRate":"count/s","IO_WriteThroughput":"B/s",
+"Refresh_Event":"count","Flush_Time":"ms","Heap_Init":"B",
+"Indexing_Pressure_Rejection_Count":"count",
+"CPU_Utilization":"cores","Cache_Query_Size":"B",
+"Merge_Event":"count","Cache_FieldData_Eviction":"count",
+"IO_TotalSyscallRate":"count/s","Net_Throughput":"B/s",
+"Paging_RSS":"pages",
+"AdmissionControl_ThresholdValue":"count",
+"Indexing_Pressure_Average_Window_Throughput":"count/s",
+"Cache_MaxSize":"B","IndexWriter_Memory":"B",
+"Net_TCP_SSThresh":"B/flow","IO_ReadThroughput":"B/s",
+"LeaderCheck_Latency":"ms","FollowerCheck_Failure":"count",
+"HTTP_RequestDocs":"count","Net_TCP_Lost":"segments/flow",
+"GC_Collection_Event":"count","Sched_CtxRate":"count/s",
+"AdmissionControl_RejectionCount":"count","Heap_Max":"B",
+"ClusterApplierService_Failure":"count",
+"PublishClusterState_Failure":"count",
+"Merge_CurrentEvent":"count","Indexing_Buffer":"B",
+"Bitset_Memory":"B","Net_PacketDropRate4":"packets/s",
+"Heap_Committed":"B","Net_PacketDropRate6":"packets/s",
+"Thread_Blocked_Event":"count","GC_Collection_Time":"ms",
+"Cache_Query_Miss":"count","Latency":"ms",
+"Shard_State":"count","Thread_Waited_Event":"count",
+"CB_ConfiguredSize":"B","ThreadPool_QueueCapacity":"count",
+"CB_TrippedEvents":"count","Disk_WaitTime":"ms",
+"Data_RetryingPendingTasksCount":"count",
+"AdmissionControl_CurrentValue":"count",
+"Flush_Event":"count","Net_TCP_RxQ":"segments/flow",
+"Shard_Size_In_Bytes":"B","Thread_Waited_Time":"s/event",
+"HTTP_TotalRequests":"count",
+"ThreadPool_ActiveThreads":"count",
+"Paging_MinfltRate":"count/s","Net_TCP_SendCWND":"B/flow",
+"Cache_Request_Eviction":"count","Segments_Total":"count",
+"FollowerCheck_Latency":"ms","Heap_Used":"B",
+"Master_ThrottledPendingTasksCount":"count",
+"CB_EstimatedSize":"B","Indexing_ThrottleTime":"ms",
+"Master_PendingQueueSize":"count",
+"Cache_FieldData_Size":"B","Paging_MajfltRate":"count/s",
+"ThreadPool_TotalThreads":"count","ShardEvents":"count",
+"Net_TCP_NumFlows":"count","Election_Term":"count"}
+````
-   Uncomment the line `#webservice-bind-host` and set it to `0.0.0.0`:
+## Root cause analysis
-   ```
-   # ======================== OpenSearch performance analyzer plugin config =========================
+The [root cause analysis]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/rca/index/) (RCA) framework uses the information from performance analyzer to inform administrators of the root cause of performance and availability issues that their clusters might be experiencing.
-   # NOTE: this is an example for Linux. Please modify the config accordingly if you are using it under other OS.
+### Enable the RCA framework
-   # WebService bind host; default to all interfaces
-   webservice-bind-host = 0.0.0.0
+To enable the RCA framework, run the following command:
-   # Metrics data location
-   metrics-location = /dev/shm/performanceanalyzer/
+```bash
+curl -XPOST http://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
+```
-   # Metrics deletion interval (minutes) for metrics data.
-   # Interval should be between 1 to 60.
-   metrics-deletion-interval = 1
+If you encounter the `curl: (52) Empty reply from server` response, run the following command to enable RCA:
-   # If set to true, the system cleans up the files behind it. So at any point, we should expect only 2
-   # metrics-db-file-prefix-path files. If set to false, no files are cleaned up. This can be useful, if you are archiving
-   # the files and wouldn't like for them to be cleaned up.
-   cleanup-metrics-db-files = true
+```bash
+curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
+```
-   # WebService exposed by App's port
-   webservice-listener-port = 9600
+### Example API query and response
-   # Metric DB File Prefix Path location
-   metrics-db-file-prefix-path = /tmp/metricsdb_
+To request all available RCAs, run the following command:
-   https-enabled = false
+````bash
+GET localhost:9600/_plugins/_performanceanalyzer/rca
+````
-   #Setup the correct path for certificates
-   certificate-file-path = specify_path
+To request a specific RCA, run the following command:
-   private-key-file-path = specify_path
+````bash
+GET localhost:9600/_plugins/_performanceanalyzer/rca?name=HighHeapUsageClusterRCA
+````
-   # Plugin Stats Metadata file name, expected to be in the same location
-   plugin-stats-metadata = plugin-stats-metadata
+The following is an example response:
-   # Agent Stats Metadata file name, expected to be in the same location
-   agent-stats-metadata = agent-stats-metadata
-   ```
+```json
+{
+  "HighHeapUsageClusterRCA": [{
+    "RCA_name": "HighHeapUsageClusterRCA",
+    "state": "unhealthy",
+    "timestamp": 1587426650942,
+    "HotClusterSummary": [{
+      "number_of_nodes": 2,
+      "number_of_unhealthy_nodes": 1,
+      "HotNodeSummary": [{
+        "host_address": "192.168.144.2",
+        "node_id": "JtlEoRowSI6iNpzpjlbp_Q",
+        "HotResourceSummary": [{
+          "resource_type": "old gen",
+          "threshold": 0.65,
+          "value": 0.81827232588145373,
+          "avg": NaN,
+          "max": NaN,
+          "min": NaN,
+          "unit_type": "heap usage in percentage",
+          "time_period_seconds": 600,
+          "TopConsumerSummary": [{
+              "name": "CACHE_FIELDDATA_SIZE",
+              "value": 590702564
+            },
+            {
+              "name": "CACHE_REQUEST_SIZE",
+              "value": 28375
+            },
+            {
+              "name": "CACHE_QUERY_SIZE",
+              "value": 12687
+            }
+          ],
+        }]
+      }]
+    }]
+  }]
+}
+```
-1. Make the CLI executable:
+## Performance analyzer and RCA API references
-   ```bash
-   sudo chmod +x ./bin/performance-analyzer-agent-cli
-   ```
+### Related links
-1. Launch the agent CLI:
+Further documentation on the use of performance analyzer and RCA can be found at the following links:
-   ```bash
-   OPENSEARCH_HOME="$PWD" OPENSEARCH_PATH_CONF="$PWD/config" ./bin/performance-analyzer-agent-cli
-   ```
-
-1. In a separate window, enable the Performance Analyzer plugin:
-
-   ```bash
-   curl -XPOST localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
-   ```
-
-   If you receive the `curl: (52) Empty reply from server` error, you are likely protecting your cluster with the security plugin and you need to provide credentials. Modify the following command to use your username and password:
-
-   ```bash
-   curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
-   ```
-
-1. Finally, enable the Root Cause Analyzer (RCA) framework
-
-   ```bash
-   curl -XPOST localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}'
-   ```
-
-   Similar to step 4, if you run into `curl: (52) Empty reply from server`, run the command below to enable RCA
-
-   ```bash
-   curl -XPOST https://localhost:9200/_plugins/_performanceanalyzer/rca/cluster/config -H 'Content-Type: application/json' -d '{"enabled": true}' -u 'admin:admin' -k
-   ```
\ No newline at end of file
+- [Performance analyzer API guide]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/api/).
+- [Root cause analysis]({{site.url}}{{site.baseurl}}/monitoring-plugins/pa/rca/index/).
+- [Root cause analysis API guide]({{site.url}}{{site.baseurl}}/latest/monitoring-plugins/pa/rca/api/).
+- [RFC: Root cause analysis](https://github.com/opensearch-project/performance-analyzer-rca/blob/main/docs/rfc-rca.pdf).
\ No newline at end of file
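
Review note (not part of the patch): the example `performance-analyzer.properties` file added above uses plain `key = value` lines with `#` comments. As a rough illustration of the two configuration points the new text calls out (setting `webservice-bind-host` to `0.0.0.0`, and supplying certificate paths when `https-enabled` is `true`), the following Python sketch parses such a file and reports anything that looks unset. The helper names `parse_properties` and `check_settings` are hypothetical and are not part of the plugin. The sample text is a shortened copy of the example configuration, and the parser is an assumption about the file format based only on that example.

```python
def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and `#` comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # commented-out keys are treated as unset
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def check_settings(settings):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if settings.get("webservice-bind-host") != "0.0.0.0":
        problems.append("webservice-bind-host is not set to 0.0.0.0")
    if settings.get("https-enabled") == "true":
        # `specify_path` is the placeholder value used in the example file.
        for key in ("certificate-file-path", "private-key-file-path"):
            if settings.get(key, "specify_path") == "specify_path":
                problems.append(f"{key} must be set when https-enabled is true")
    return problems


# Shortened copy of the example configuration from the patch.
EXAMPLE = """\
# WebService bind host; default to all interfaces
webservice-bind-host = 0.0.0.0
webservice-listener-port = 9600
https-enabled = false
#certificate-file-path = specify_path
#private-key-file-path = specify_path
"""

settings = parse_properties(EXAMPLE)
problems = check_settings(settings)
print(problems or "no problems found")
```

To check a real installation, read the file from the `config/opensearch-performance-analyzer/` directory and pass its contents to `parse_properties`. The sketch only inspects values; it does not validate that a certificate path exists or that the certificate is a server certificate rather than a root CA.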