HBASE-24405 : Document hbase:slowlog related operations (#1747)
Signed-off-by: ramkrish86 <ramkrishna@apache.org> Signed-off-by: Anoop Sam John <anoopsamjohn@apache.org>
This commit is contained in:
parent
b144d17044
commit
2383cc4242
|
@ -1961,6 +1961,18 @@ possible configurations would overwhelm and obscure the important.
|
||||||
responses with complete data.
|
responses with complete data.
|
||||||
</description>
|
</description>
|
||||||
</property>
|
</property>
|
||||||
|
<property>
|
||||||
|
<name>hbase.regionserver.slowlog.systable.enabled</name>
|
||||||
|
<value>false</value>
|
||||||
|
<description>
|
||||||
|
Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled. If enabled
|
||||||
|
(true), all slow/large RPC logs would be persisted to system table hbase:slowlog (in addition
|
||||||
|
to in-memory ring buffer at each RegionServer). The records are stored in increasing
|
||||||
|
order of time. Operators can scan the table with various combination of ColumnValueFilter.
|
||||||
|
More details are provided in the doc section:
|
||||||
|
"Get Slow/Large Response Logs from System table hbase:slowlog"
|
||||||
|
</description>
|
||||||
|
</property>
|
||||||
<property>
|
<property>
|
||||||
<name>hbase.rpc.rows.size.threshold.reject</name>
|
<name>hbase.rpc.rows.size.threshold.reject</name>
|
||||||
<value>false</value>
|
<value>false</value>
|
||||||
|
|
|
@ -2110,6 +2110,24 @@ A comma-separated list of
|
||||||
`false`
|
`false`
|
||||||
|
|
||||||
|
|
||||||
|
[[hbase.regionserver.slowlog.systable.enabled]]
|
||||||
|
*`hbase.regionserver.slowlog.systable.enabled`*::
|
||||||
|
+
|
||||||
|
.Description
|
||||||
|
|
||||||
|
Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled.
|
||||||
|
If enabled (true), all slow/large RPC logs would be persisted to system table
|
||||||
|
hbase:slowlog (in addition to in-memory ring buffer at each RegionServer).
|
||||||
|
The records are stored in increasing order of time.
|
||||||
|
Operators can scan the table with various combination of ColumnValueFilter and
|
||||||
|
time range.
|
||||||
|
More details are provided in the doc section:
|
||||||
|
"Get Slow/Large Response Logs from System table hbase:slowlog"
|
||||||
|
|
||||||
|
+
|
||||||
|
.Default
|
||||||
|
`false`
|
||||||
|
|
||||||
|
|
||||||
[[hbase.region.replica.replication.enabled]]
|
[[hbase.region.replica.replication.enabled]]
|
||||||
*`hbase.region.replica.replication.enabled`*::
|
*`hbase.region.replica.replication.enabled`*::
|
||||||
|
|
|
@ -1955,6 +1955,9 @@ Examples:
|
||||||
|
|
||||||
----
|
----
|
||||||
|
|
||||||
|
include::slow_log_responses_from_systable.adoc[]
|
||||||
|
|
||||||
|
|
||||||
=== Block Cache Monitoring
|
=== Block Cache Monitoring
|
||||||
|
|
||||||
Starting with HBase 0.98, the HBase Web UI includes the ability to monitor and report on the performance of the block cache.
|
Starting with HBase 0.98, the HBase Web UI includes the ability to monitor and report on the performance of the block cache.
|
||||||
|
|
|
@ -0,0 +1,118 @@
|
||||||
|
////
|
||||||
|
/**
|
||||||
|
*
|
||||||
|
* Licensed to the Apache Software Foundation (ASF) under one
|
||||||
|
* or more contributor license agreements. See the NOTICE file
|
||||||
|
* distributed with this work for additional information
|
||||||
|
* regarding copyright ownership. The ASF licenses this file
|
||||||
|
* to you under the Apache License, Version 2.0 (the
|
||||||
|
* "License"); you may not use this file except in compliance
|
||||||
|
* with the License. You may obtain a copy of the License at
|
||||||
|
*
|
||||||
|
* http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
*
|
||||||
|
* Unless required by applicable law or agreed to in writing, software
|
||||||
|
* distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
* See the License for the specific language governing permissions and
|
||||||
|
* limitations under the License.
|
||||||
|
*/
|
||||||
|
////
|
||||||
|
|
||||||
|
[[slow_log_responses_from_systable]]
|
||||||
|
==== Get Slow/Large Response Logs from System table hbase:slowlog
|
||||||
|
|
||||||
|
The above section provides details about Admin APIs:
|
||||||
|
|
||||||
|
* get_slowlog_responses
|
||||||
|
* get_largelog_responses
|
||||||
|
* clear_slowlog_responses
|
||||||
|
|
||||||
|
All of the above APIs access online in-memory ring buffers from
|
||||||
|
individual RegionServers and accumulate logs from ring buffers to display
|
||||||
|
to end user. However, since the logs are stored in memory, after RegionServer is
|
||||||
|
restarted, all the objects held in memory of that RegionServer will be cleaned up
|
||||||
|
and previous logs are lost. What if we want to persist all these logs forever?
|
||||||
|
What if we want to store them in such a manner that operator can get all historical
|
||||||
|
records with some filters? e.g get me all large/slow RPC logs that are triggered by
|
||||||
|
user1 and are related to region:
|
||||||
|
cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf. ?
|
||||||
|
|
||||||
|
If we have a system table that stores such logs in increasing (not so strictly though)
|
||||||
|
order of time, it can definitely help operators debug some historical events
|
||||||
|
(scan, get, put, compaction, flush etc) with detailed inputs.
|
||||||
|
|
||||||
|
Config which enabled system table to be created and store all log events is
|
||||||
|
`hbase.regionserver.slowlog.systable.enabled`.
|
||||||
|
|
||||||
|
The default value for this config is `false`. If provided `true`
|
||||||
|
(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`),
|
||||||
|
a cron job running in every RegionServer will persist the slow/large logs into
|
||||||
|
table hbase:slowlog. By default cron job runs every 10 min. Duration can be configured
|
||||||
|
with key: `hbase.slowlog.systable.chore.duration`. By default, RegionServer will
|
||||||
|
store upto 1000(config key: `hbase.regionserver.slowlog.systable.queue.size`)
|
||||||
|
slow/large logs in an internal queue and the chore will retrieve these logs
|
||||||
|
from the queue and perform batch insertion in hbase:slowlog.
|
||||||
|
|
||||||
|
hbase:slowlog has single ColumnFamily: `info`
|
||||||
|
`info` contains multiple qualifiers which are the same attributes present as
|
||||||
|
part of `get_slowlog_responses` API response.
|
||||||
|
|
||||||
|
* info:call_details
|
||||||
|
* info:client_address
|
||||||
|
* info:method_name
|
||||||
|
* info:param
|
||||||
|
* info:processing_time
|
||||||
|
* info:queue_time
|
||||||
|
* info:region_name
|
||||||
|
* info:response_size
|
||||||
|
* info:server_class
|
||||||
|
* info:start_time
|
||||||
|
* info:type
|
||||||
|
* info:username
|
||||||
|
|
||||||
|
And example of 2 rows from hbase:slowlog scan result:
|
||||||
|
[source]
|
||||||
|
----
|
||||||
|
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle
|
||||||
|
vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks
|
||||||
|
: true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_
|
||||||
|
handles_heartbeats: true track_scan_metrics: false
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL
|
||||||
|
\x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a
|
||||||
|
ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2
|
||||||
|
097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met
|
||||||
|
rics: false
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
|
||||||
|
\x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1
|
||||||
|
----
|
||||||
|
|
||||||
|
Operator can use ColumnValueFilter to filter records based on region_name, username,
|
||||||
|
client_address etc.
|
||||||
|
|
||||||
|
Time range based queries will also be very useful.
|
||||||
|
Example:
|
||||||
|
[source]
|
||||||
|
----
|
||||||
|
scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] }
|
||||||
|
----
|
Loading…
Reference in New Issue