HBASE-17840 Update hbase book to space quotas on snapshots
parent c7a64a8313
commit 4dc805145b
@ -1964,6 +1964,51 @@ In these cases, the user may configure the system to not delete any space quota
</property>
----

=== HBase Snapshots with Space Quotas

One common source of unintended filesystem use with HBase is HBase snapshots. Because snapshots
exist outside of the management of HBase tables, it is not uncommon for administrators to suddenly
realize that hundreds of gigabytes or terabytes of space are being used by HBase snapshots which were
forgotten and never removed.

link:https://issues.apache.org/jira/browse/HBASE-17748[HBASE-17748] is the umbrella JIRA issue which
expands on the original space quota functionality to also include HBase snapshots. Although this can be
a confusing subject, the implementation attempts to present this support to administrators in as simple
a manner as possible. This feature does not change how administrators interact with space quotas; it
changes only the internal computation of table and namespace usage. Table and namespace usage will
automatically incorporate the size taken by a snapshot per the rules defined below.

As a review of a snapshot's lifecycle: a snapshot is metadata which points to
a list of HFiles on the filesystem. This is why creating a snapshot is a very cheap operation; no HBase
table data is actually copied to perform a snapshot. Cloning a snapshot into a new table or restoring
a table is a cheap operation for the same reason; the new table references the files which already exist
on the filesystem without a copy. To include snapshots in space quotas, we need to define which table
"owns" a file when a snapshot references the file (to "own" a file is to have that file's filesystem
usage counted against the table).

Consider a snapshot which was made against a table. When the snapshot refers to a file and the table no
longer refers to that file, the "originating" table "owns" that file. When multiple snapshots refer to
the same file and no table refers to that file, the snapshot with the lowest-sorting name (lexicographically)
is chosen and the table which that snapshot was created from "owns" that file. HFiles are not "double-counted"
when a table and one or more snapshots refer to that HFile.

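The ownership rules above can be sketched in a few lines of Python. This is only an illustration of the
rules as stated in this section, not HBase's implementation: the function name `owning_table`, the
dictionary layout, and the table, snapshot, and file names are all hypothetical, and Python's default string
ordering stands in for the lexicographic comparison.

```python
def owning_table(hfile, table_files, snapshots):
    """Return the table that "owns" hfile under the rules above, or None.

    table_files: table name -> set of HFile names the table still references
    snapshots:   snapshot name -> (originating table, set of HFile names)
    Assumption: each file is referenced by at most one table.
    """
    # A file that a table still references is counted against that table only,
    # so it is never double-counted for the snapshots that also reference it.
    for table in sorted(table_files):
        if hfile in table_files[table]:
            return table
    # Otherwise the lowest-sorting snapshot name decides, and the table that
    # snapshot was created from owns the file.
    names = sorted(n for n, (_, files) in snapshots.items() if hfile in files)
    return snapshots[names[0]][0] if names else None


# Hypothetical state: t1 still references f2; f1 is referenced only by snapshots.
tables = {"t1": {"f2"}, "t2": set()}
snaps = {
    "b_snap": ("t1", {"f1", "f2"}),
    "a_snap": ("t2", {"f1"}),
}

owner_f1 = owning_table("f1", tables, snaps)  # decided by "a_snap", so t2
owner_f2 = owning_table("f2", tables, snaps)  # t1 still references f2
```
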
When a table is "rematerialized" (via `clone_snapshot` or `restore_snapshot`), a similar problem of file
ownership arises. In this case, while the rematerialized table references a file which a snapshot also
references, the table does not "own" the file. The table from which the snapshot was created still "owns"
that file. When the rematerialized table is compacted or the snapshot is deleted, the rematerialized table
will uniquely refer to a new file and "own" the usage of that file. Similarly, when a table is duplicated via a snapshot
and `restore_snapshot`, the new table will not consume any quota size until the original table stops referring
to the files, either due to a compaction on the original table, a compaction on the new table, or the
original table being deleted.

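To make the rematerialization case concrete, the following Python sketch charges each file to exactly one
table and compares usage before and after the cloned table is compacted. It is one reading of the rules in
this section, not HBase's actual accounting code: `table_usage`, the file sizes, and the names `t1`,
`t1_clone`, `s1`, `f1`, and `f2` are hypothetical.

```python
from collections import defaultdict


def table_usage(table_files, snapshots, file_sizes):
    """Charge every file's size to exactly one table.

    table_files: table name -> set of file names the table references
    snapshots:   snapshot name -> (originating table, set of file names)
    file_sizes:  file name -> size in bytes
    """
    usage = defaultdict(int)
    all_files = set().union(
        *table_files.values(), *(files for _, files in snapshots.values())
    )
    for hfile in sorted(all_files):
        holders = sorted(n for n, (_, files) in snapshots.items() if hfile in files)
        if holders:
            # A snapshot references the file: the table the lowest-sorting
            # snapshot was created from owns it, even if a clone references it too.
            owner = snapshots[holders[0]][0]
        else:
            # Only a table references the file (assumed to be exactly one table).
            owner = next(t for t in sorted(table_files) if hfile in table_files[t])
        usage[owner] += file_sizes[hfile]
    return dict(usage)


sizes = {"f1": 100, "f2": 40}
snaps = {"s1": ("t1", {"f1"})}

# Right after cloning s1 into t1_clone, both tables point at the same file.
before = table_usage({"t1": {"f1"}, "t1_clone": {"f1"}}, snaps, sizes)

# After t1_clone is compacted it references a new file, f2, which it alone owns.
after = table_usage({"t1": {"f1"}, "t1_clone": {"f2"}}, snaps, sizes)
```

In this sketch the clone is charged nothing in `before`, and only for its own new file in `after`, while
the file still held by the snapshot stays with the originating table.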
A new HBase shell command, `list_snapshot_sizes`, was added to inspect the computed size of each snapshot
in an HBase instance.

----
hbase> list_snapshot_sizes
SNAPSHOT                                      SIZE
 t1.s1                                        1159108
----

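For administrators who want to post-process this output, for example to flag large snapshots, a small
script can parse the captured text. The sketch below assumes the output has exactly the two-column shape
shown above and that the size column is in bytes; `parse_snapshot_sizes` and the threshold value are
hypothetical and not part of HBase.

```python
def parse_snapshot_sizes(output):
    """Parse captured `list_snapshot_sizes` text into {snapshot name: size}."""
    sizes = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only "<snapshot> <integer>" rows; this skips the prompt line,
        # the SNAPSHOT/SIZE header, and any blank lines.
        if len(parts) == 2 and parts[1].isdigit():
            sizes[parts[0]] = int(parts[1])
    return sizes


sample = """hbase> list_snapshot_sizes
SNAPSHOT SIZE
t1.s1 1159108
"""

sizes = parse_snapshot_sizes(sample)

# Hypothetical threshold: report snapshots of at least 1 MiB (assuming bytes).
threshold = 1024 * 1024
large = sorted(name for name, size in sizes.items() if size >= threshold)
```
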
[[ops.backup]]
== HBase Backup
