HBASE-24405 : Document hbase:slowlog related operations (#1747)
Signed-off-by: ramkrish86 <ramkrishna@apache.org>
Signed-off-by: Anoop Sam John <anoopsamjohn@apache.org>
parent b144d17044
commit 2383cc4242

@@ -1961,6 +1961,18 @@ possible configurations would overwhelm and obscure the important.
      responses with complete data.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.slowlog.systable.enabled</name>
    <value>false</value>
    <description>
      Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled. If enabled
      (true), all slow/large RPC logs would be persisted to the system table hbase:slowlog (in addition
      to the in-memory ring buffer at each RegionServer). The records are stored in increasing
      order of time. Operators can scan the table with various combinations of ColumnValueFilter.
      More details are provided in the doc section:
      "Get Slow/Large Response Logs from System table hbase:slowlog"
    </description>
  </property>
  <property>
    <name>hbase.rpc.rows.size.threshold.reject</name>
    <value>false</value>

@@ -2110,6 +2110,24 @@ A comma-separated list of
`false`


[[hbase.regionserver.slowlog.systable.enabled]]
*`hbase.regionserver.slowlog.systable.enabled`*::
+
.Description

Should be enabled only if hbase.regionserver.slowlog.buffer.enabled is enabled.
If enabled (true), all slow/large RPC logs would be persisted to the system table
hbase:slowlog (in addition to the in-memory ring buffer at each RegionServer).
The records are stored in increasing order of time.
Operators can scan the table with various combinations of ColumnValueFilter and
time range.
More details are provided in the doc section:
"Get Slow/Large Response Logs from System table hbase:slowlog"

+
.Default
`false`


[[hbase.region.replica.replication.enabled]]
*`hbase.region.replica.replication.enabled`*::

@@ -1955,6 +1955,9 @@ Examples:
----

include::slow_log_responses_from_systable.adoc[]


=== Block Cache Monitoring

Starting with HBase 0.98, the HBase Web UI includes the ability to monitor and report on the performance of the block cache.

@@ -0,0 +1,118 @@
////
/**
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
////

[[slow_log_responses_from_systable]]
==== Get Slow/Large Response Logs from System table hbase:slowlog

The above section provides details about the following Admin APIs:

* get_slowlog_responses
* get_largelog_responses
* clear_slowlog_responses
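
These operations are typically invoked from the HBase shell. A minimal usage sketch follows;
the argument forms shown are illustrative, and the shell help of each command describes the
full set of options:
[source]
----
# retrieve slow-log records from the ring buffers of all RegionServers
hbase> get_slowlog_responses '*'

# retrieve large-log records from two specific RegionServers
hbase> get_largelog_responses ['SERVER_NAME1', 'SERVER_NAME2']

# clear the in-memory slow/large log ring buffers on every RegionServer
hbase> clear_slowlog_responses
----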

All of the above APIs access the online in-memory ring buffers of
individual RegionServers and accumulate logs from those ring buffers to display
to the end user. However, since the logs are stored in memory, once a RegionServer is
restarted, all the objects held in memory of that RegionServer are cleaned up
and the previous logs are lost. What if we want to persist all these logs forever?
What if we want to store them in such a manner that an operator can get all historical
records with some filters? For example, get all large/slow RPC logs that were triggered by
user1 and are related to the region
cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.

If we have a system table that stores such logs in increasing (though not strictly increasing)
order of time, it can definitely help operators debug historical events
(scan, get, put, compaction, flush etc) with detailed inputs.

The config that enables the system table to be created and to store all log events is
`hbase.regionserver.slowlog.systable.enabled`.

The default value for this config is `false`. If set to `true`
(Note: `hbase.regionserver.slowlog.buffer.enabled` should also be `true`),
a chore running in every RegionServer will persist the slow/large logs into
the table hbase:slowlog. By default the chore runs every 10 min. The period can be configured
with the key `hbase.slowlog.systable.chore.duration`. By default, a RegionServer will
store up to 1000 (config key: `hbase.regionserver.slowlog.systable.queue.size`)
slow/large logs in an internal queue, and the chore will retrieve these logs
from the queue and perform a batch insertion into hbase:slowlog.
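
Putting these keys together, a minimal hbase-site.xml sketch that enables system-table
persistence could look like the following. The chore period and queue size entries simply
restate the defaults described above (the period is assumed to be in milliseconds) and can be
omitted if the defaults are acceptable:
[source,xml]
----
<!-- prerequisite: keep slow/large RPC logs in the in-memory ring buffer -->
<property>
  <name>hbase.regionserver.slowlog.buffer.enabled</name>
  <value>true</value>
</property>
<!-- additionally persist slow/large RPC logs to the hbase:slowlog system table -->
<property>
  <name>hbase.regionserver.slowlog.systable.enabled</name>
  <value>true</value>
</property>
<!-- chore period: 10 minutes, assumed to be expressed in milliseconds -->
<property>
  <name>hbase.slowlog.systable.chore.duration</name>
  <value>600000</value>
</property>
<!-- size of the internal queue drained by the chore -->
<property>
  <name>hbase.regionserver.slowlog.systable.queue.size</name>
  <value>1000</value>
</property>
----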

hbase:slowlog has a single ColumnFamily: `info`.
`info` contains multiple qualifiers, which are the same attributes present as
part of the `get_slowlog_responses` API response:

* info:call_details
* info:client_address
* info:method_name
* info:param
* info:processing_time
* info:queue_time
* info:region_name
* info:response_size
* info:server_class
* info:start_time
* info:type
* info:username
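
Since these are ordinary qualifiers, a plain shell scan can project just the columns of
interest; for instance (the column selection below is only an illustration):
[source]
----
scan 'hbase:slowlog', { COLUMNS => ['info:username', 'info:region_name', 'info:processing_time'] }
----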

An example of 2 rows from a hbase:slowlog scan result:
[source]
----

\x024\xC1\x03\xE9\x04\xF5@ column=info:call_details, timestamp=2020-05-16T14:58:14.211Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x03\xE9\x04\xF5@ column=info:client_address, timestamp=2020-05-16T14:58:14.211Z, value=172.20.10.2:57347
\x024\xC1\x03\xE9\x04\xF5@ column=info:method_name, timestamp=2020-05-16T14:58:14.211Z, value=Scan
\x024\xC1\x03\xE9\x04\xF5@ column=info:param, timestamp=2020-05-16T14:58:14.211Z, value=region { type: REGION_NAME value: "hbase:meta,,1" } scan { column { family: "info" } attribute { name: "_isolationle
vel_" value: "\x5C000" } start_row: "cluster_test,33333333,99999999999999" stop_row: "cluster_test,," time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks
: true max_result_size: 2097152 reversed: true caching: 10 include_stop_row: true readType: PREAD } number_of_rows: 10 close_scanner: false client_handles_partials: true client_
handles_heartbeats: true track_scan_metrics: false
\x024\xC1\x03\xE9\x04\xF5@ column=info:processing_time, timestamp=2020-05-16T14:58:14.211Z, value=18
\x024\xC1\x03\xE9\x04\xF5@ column=info:queue_time, timestamp=2020-05-16T14:58:14.211Z, value=0
\x024\xC1\x03\xE9\x04\xF5@ column=info:region_name, timestamp=2020-05-16T14:58:14.211Z, value=hbase:meta,,1
\x024\xC1\x03\xE9\x04\xF5@ column=info:response_size, timestamp=2020-05-16T14:58:14.211Z, value=1575
\x024\xC1\x03\xE9\x04\xF5@ column=info:server_class, timestamp=2020-05-16T14:58:14.211Z, value=HRegionServer
\x024\xC1\x03\xE9\x04\xF5@ column=info:start_time, timestamp=2020-05-16T14:58:14.211Z, value=1589640743732
\x024\xC1\x03\xE9\x04\xF5@ column=info:type, timestamp=2020-05-16T14:58:14.211Z, value=ALL
\x024\xC1\x03\xE9\x04\xF5@ column=info:username, timestamp=2020-05-16T14:58:14.211Z, value=user2
\x024\xC1\x06X\x81\xF6\xEC column=info:call_details, timestamp=2020-05-16T14:59:58.764Z, value=Scan(org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ScanRequest)
\x024\xC1\x06X\x81\xF6\xEC column=info:client_address, timestamp=2020-05-16T14:59:58.764Z, value=172.20.10.2:57348
\x024\xC1\x06X\x81\xF6\xEC column=info:method_name, timestamp=2020-05-16T14:59:58.764Z, value=Scan
\x024\xC1\x06X\x81\xF6\xEC column=info:param, timestamp=2020-05-16T14:59:58.764Z, value=region { type: REGION_NAME value: "cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf." } scan { a
ttribute { name: "_isolationlevel_" value: "\x5C000" } start_row: "cccccccc" time_range { from: 0 to: 9223372036854775807 } max_versions: 1 cache_blocks: true max_result_size: 2
097152 caching: 2147483647 include_stop_row: false } number_of_rows: 2147483647 close_scanner: false client_handles_partials: true client_handles_heartbeats: true track_scan_met
rics: false
\x024\xC1\x06X\x81\xF6\xEC column=info:processing_time, timestamp=2020-05-16T14:59:58.764Z, value=24
\x024\xC1\x06X\x81\xF6\xEC column=info:queue_time, timestamp=2020-05-16T14:59:58.764Z, value=0
\x024\xC1\x06X\x81\xF6\xEC column=info:region_name, timestamp=2020-05-16T14:59:58.764Z, value=cluster_test,cccccccc,1589635796466.aa45e1571d533f5ed0bb31cdccaaf9cf.
\x024\xC1\x06X\x81\xF6\xEC column=info:response_size, timestamp=2020-05-16T14:59:58.764Z, value=211227
\x024\xC1\x06X\x81\xF6\xEC column=info:server_class, timestamp=2020-05-16T14:59:58.764Z, value=HRegionServer
\x024\xC1\x06X\x81\xF6\xEC column=info:start_time, timestamp=2020-05-16T14:59:58.764Z, value=1589640743932
\x024\xC1\x06X\x81\xF6\xEC column=info:type, timestamp=2020-05-16T14:59:58.764Z, value=ALL
\x024\xC1\x06X\x81\xF6\xEC column=info:username, timestamp=2020-05-16T14:59:58.764Z, value=user1
----

Operators can use ColumnValueFilter to filter records based on region_name, username,
client_address etc.
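
As a sketch (assuming an HBase 2.x shell, where a Java filter object can be passed via
FILTER), restricting a scan to the entries produced by user1 might look like:
[source]
----
import org.apache.hadoop.hbase.filter.ColumnValueFilter
import org.apache.hadoop.hbase.CompareOperator
import org.apache.hadoop.hbase.util.Bytes

# keep only cells whose info:username value equals 'user1'
scan 'hbase:slowlog', { FILTER => ColumnValueFilter.new(
  Bytes.toBytes('info'), Bytes.toBytes('username'),
  CompareOperator::EQUAL, Bytes.toBytes('user1')) }
----
Note that ColumnValueFilter returns only the matched cell; to retrieve whole rows that
satisfy the condition, SingleColumnValueFilter can be used instead.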

Time range based queries will also be very useful.
Example:
[source]
----
scan 'hbase:slowlog', { TIMERANGE => [1589621394000, 1589637999999] }
----