HBASE-4931 [docs] CopyTable instructions could be improved (Misty Stanley-Jones)
parent 21d37b3a59
commit 95ef3acdd3
|
@@ -159,8 +159,12 @@ public class CopyTable extends Configured implements Tool {
|
||||||
System.err.println(" $ bin/hbase " +
|
System.err.println(" $ bin/hbase " +
|
||||||
"org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 " +
|
"org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 " +
|
||||||
"--peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable ");
|
"--peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable ");
|
||||||
System.err.println("For performance consider the following general options:\n"
|
System.err.println("For performance consider the following general option:\n"
|
||||||
|
+ " It is recommended that you set the following to >=100. A higher value uses more memory but\n"
|
||||||
|
+ " decreases the round trip time to the server and may increase performance.\n"
|
||||||
+ " -Dhbase.client.scanner.caching=100\n"
|
+ " -Dhbase.client.scanner.caching=100\n"
|
||||||
|
+ " The following should always be set to false, to prevent writing data twice, which may produce \n"
|
||||||
|
+ " inaccurate results.\n"
|
||||||
+ " -Dmapreduce.map.speculative=false");
|
+ " -Dmapreduce.map.speculative=false");
|
||||||
}
|
}
|
||||||
|
|
||||||
|
|
|
@@ -188,20 +188,50 @@ private static final int ERROR_EXIT_CODE = 4;</programlisting>
|
||||||
<section
|
<section
|
||||||
xml:id="driver">
|
xml:id="driver">
|
||||||
<title>Driver</title>
|
<title>Driver</title>
|
||||||
<para>There is a <code>Driver</code> class that is executed by the HBase jar can be used to
|
<para>Several frequently-accessed utilities are provided as <code>Driver</code> classes, and executed by
|
||||||
invoke frequently accessed utilities. For example,</para>
|
the <filename>bin/hbase</filename> command. These utilities represent MapReduce jobs which
|
||||||
<screen>HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar
|
run on your cluster. They are run in the following way, replacing
|
||||||
|
<replaceable>UtilityName</replaceable> with the utility you want to run. This command
|
||||||
An example program must be given as the first argument.
|
assumes you have set the environment variable <literal>HBASE_HOME</literal> to the directory
|
||||||
Valid program names are:
|
where HBase is unpacked on your server.</para>
|
||||||
completebulkload: Complete a bulk data load.
|
<screen>
|
||||||
copytable: Export a table from local cluster to peer cluster
|
${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.mapreduce.<replaceable>UtilityName</replaceable>
|
||||||
export: Write table data to HDFS.
|
|
||||||
import: Import data written by Export.
|
|
||||||
importtsv: Import data in TSV format.
|
|
||||||
rowcounter: Count rows in HBase table
|
|
||||||
verifyrep: Compare the data from tables in two different clusters. WARNING: It doesn't work for incrementColumnValues'd cells since the timestamp is chan
|
|
||||||
</screen>
|
</screen>
|
||||||
|
<para>The following utilities are available:</para>
|
||||||
|
<variablelist>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>LoadIncrementalHFiles</command></term>
|
||||||
|
<listitem><para>Complete a bulk data load.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>CopyTable</command></term>
|
||||||
|
<listitem><para>Export a table from the local cluster to a peer cluster.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>Export</command></term>
|
||||||
|
<listitem><para>Write table data to HDFS.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>Import</command></term>
|
||||||
|
<listitem><para>Import data written by a previous <command>Export</command> operation.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>ImportTsv</command></term>
|
||||||
|
<listitem><para>Import data in TSV format.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>RowCounter</command></term>
|
||||||
|
<listitem><para>Count rows in an HBase table.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><command>replication.VerifyReplication</command></term>
|
||||||
|
<listitem><para>Compare the data from tables in two different clusters. WARNING: It
|
||||||
|
doesn't work for incrementColumnValues'd cells since the timestamp is changed. Note that
|
||||||
|
this command is in a different package than the others.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
</variablelist>
|
||||||
|
<para>Each command except <command>RowCounter</command> accepts a single
|
||||||
|
<literal>--help</literal> argument to print usage instructions.</para>
|
||||||
</section>
|
</section>
|
||||||
<section
|
<section
|
||||||
xml:id="hbck">
|
xml:id="hbck">
|
||||||
|
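The invocation pattern described in the new Driver text can be sketched as a small dry-run helper. This is only an illustration: the function name `utility_command` and the fallback path `/opt/hbase` are assumptions, not part of the commit, and the script prints the command line instead of launching a MapReduce job.

```shell
#!/bin/sh
# Dry-run sketch: print the command line for an HBase MapReduce utility.
# /opt/hbase is a placeholder default (assumption); set HBASE_HOME to your install.
HBASE_HOME="${HBASE_HOME:-/opt/hbase}"

# utility_command is a hypothetical helper, not an HBase command.
# $1 is the class name relative to org.apache.hadoop.hbase.mapreduce;
# any remaining arguments are appended unchanged.
utility_command() {
  utility="$1"
  shift
  printf '%s' "${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.mapreduce.${utility}"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Print usage for CopyTable.
utility_command CopyTable --help
# VerifyReplication lives in the replication sub-package, so the prefix is part of the name.
utility_command replication.VerifyReplication --help
```

To actually run a utility, the printed line would be executed in a shell on a machine that can reach the cluster.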
@@ -266,66 +296,45 @@ Valid program names are:
|
||||||
<para> CopyTable is a utility that can copy part or all of a table, either to the same
|
<para> CopyTable is a utility that can copy part or all of a table, either to the same
|
||||||
cluster or another cluster. The target table must first exist. The usage is as
|
cluster or another cluster. The target table must first exist. The usage is as
|
||||||
follows:</para>
|
follows:</para>
|
||||||
<screen>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
|
|
||||||
</screen>
|
|
||||||
|
|
||||||
<variablelist>
|
<screen>
|
||||||
<title>Options</title>
|
$ <userinput>./bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help </userinput>
|
||||||
<varlistentry>
|
/bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --help
|
||||||
<term>starttime</term>
|
Usage: CopyTable [general options] [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] <tablename>
|
||||||
<listitem>
|
|
||||||
<para>Beginning of the time range. Without endtime means starttime to forever.</para>
|
Options:
|
||||||
</listitem>
|
rs.class hbase.regionserver.class of the peer cluster,
|
||||||
</varlistentry>
|
specify if different from current cluster
|
||||||
<varlistentry>
|
rs.impl hbase.regionserver.impl of the peer cluster,
|
||||||
<term>endtime</term>
|
startrow the start row
|
||||||
<listitem>
|
stoprow the stop row
|
||||||
<para>End of the time range. Without endtime means starttime to forever.</para>
|
starttime beginning of the time range (unixtime in millis)
|
||||||
</listitem>
|
without endtime means from starttime to forever
|
||||||
</varlistentry>
|
endtime end of the time range. Ignored if no starttime specified.
|
||||||
<varlistentry>
|
versions number of cell versions to copy
|
||||||
<term>versions</term>
|
new.name new table's name
|
||||||
<listitem>
|
peer.adr Address of the peer cluster given in the format
|
||||||
<para>Number of cell versions to copy.</para>
|
hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
|
||||||
</listitem>
|
families comma-separated list of families to copy
|
||||||
</varlistentry>
|
To copy from cf1 to cf2, give sourceCfName:destCfName.
|
||||||
<varlistentry>
|
To keep the same name, just give "cfName"
|
||||||
<term>new.name</term>
|
all.cells also copy delete markers and deleted cells
|
||||||
<listitem>
|
|
||||||
<para>New table's name.</para>
|
Args:
|
||||||
</listitem>
|
tablename Name of the table to copy
|
||||||
</varlistentry>
|
|
||||||
<varlistentry>
|
Examples:
|
||||||
<term>peer.adr</term>
|
To copy 'TestTable' to a cluster that uses replication for a 1 hour window:
|
||||||
<listitem>
|
$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable --starttime=1265875194289 --endtime=1265878794289 --peer.adr=server1,server2,server3:2181:/hbase --families=myOldCf:myNewCf,cf2,cf3 TestTable
|
||||||
<para>Address of the peer cluster given in the format
|
|
||||||
hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent</para>
|
For performance consider the following general options:
|
||||||
</listitem>
|
It is recommended that you set the following to >=100. A higher value uses more memory but
|
||||||
</varlistentry>
|
decreases the round trip time to the server and may increase performance.
|
||||||
<varlistentry>
|
-Dhbase.client.scanner.caching=100
|
||||||
<term>families</term>
|
The following should always be set to false, to prevent writing data twice, which may produce
|
||||||
<listitem>
|
inaccurate results.
|
||||||
<para>Comma-separated list of ColumnFamilies to copy.</para>
|
-Dmapred.map.tasks.speculative.execution=false
|
||||||
</listitem>
|
</screen>
|
||||||
</varlistentry>
|
|
||||||
<varlistentry>
|
|
||||||
<term>all.cells</term>
|
|
||||||
<listitem>
|
|
||||||
<para>Also copy delete markers and uncollected deleted cells (advanced option).</para>
|
|
||||||
</listitem>
|
|
||||||
</varlistentry>
|
|
||||||
</variablelist>
|
|
||||||
<itemizedlist>
|
|
||||||
<title>Args:</title>
|
|
||||||
<listitem>
|
|
||||||
<para>tablename Name of table to copy.</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
<para>Example of copying 'TestTable' to a cluster that uses replication for a 1 hour
|
|
||||||
window:</para>
|
|
||||||
<screen>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable
|
|
||||||
--starttime=1265875194289 --endtime=1265878794289
|
|
||||||
--peer.adr=server1,server2,server3:2181:/hbase TestTable</screen>
|
|
||||||
<note>
|
<note>
|
||||||
<title>Scanner Caching</title>
|
<title>Scanner Caching</title>
|
||||||
<para>Caching for the input Scan is configured via <code>hbase.client.scanner.caching</code>
|
<para>Caching for the input Scan is configured via <code>hbase.client.scanner.caching</code>
|
||||||
|
|
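As a hedged illustration of the usage text in this hunk, the sketch below assembles a CopyTable command line with the general `-D` options placed before the tool options, as the `Usage:` line indicates, and derives the end of the one-hour window from the start time. The helper name `copytable_command` and the variable names are assumptions. The sketch uses the `mapreduce.map.speculative` property from the Java usage string; the XML `<screen>` in this commit shows the older `mapred.map.tasks.speculative.execution` name instead. Nothing is executed against a cluster; the command is only printed.

```shell
#!/bin/sh
# Start of the copy window, Unix time in milliseconds (value from the usage example).
START_MS=1265875194289
# End of the window: one hour (60 * 60 * 1000 ms) after the start.
END_MS=$((START_MS + 60 * 60 * 1000))

# copytable_command is a hypothetical helper that prints, but does not run, the command.
# $1 = starttime, $2 = endtime, $3 = peer cluster address, $4 = table name
copytable_command() {
  printf '%s\n' "bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable -Dhbase.client.scanner.caching=100 -Dmapreduce.map.speculative=false --starttime=$1 --endtime=$2 --peer.adr=$3 $4"
}

copytable_command "$START_MS" "$END_MS" "server1,server2,server3:2181:/hbase" TestTable
```

The computed end time matches the `--endtime` value in the commit's own example, which confirms that the example window is exactly one hour.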