HDFS-8284. Update documentation about how to use HTrace with HDFS (Masatake Iwasaki via Colin P. McCabe)
(cherry picked from commit 8f7c2364d7)
parent f8807839c6
commit d83ae68bb0
@@ -1758,18 +1758,6 @@ for ldap providers in the same way as above does.
   </description>
 </property>
 
-<property>
-  <name>hadoop.htrace.spanreceiver.classes</name>
-  <value></value>
-  <description>
-    A comma separated list of the fully-qualified class name of classes
-    implementing SpanReceiver. The tracing system works by collecting
-    information in structs called 'Spans'. It is up to you to choose
-    how you want to receive this information by implementing the
-    SpanReceiver interface.
-  </description>
-</property>
-
 <property>
   <name>ipc.server.max.connections</name>
   <value>0</value>
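Note: the key removed above is superseded by HDFS-scoped keys introduced elsewhere in this commit (see the documentation hunks and what appears to be the hdfs-default.xml hunk at the end). As a rough migration sketch only, built from the key names that appear later in this diff, the equivalent server-side setting would now live in hdfs-site.xml:

```xml
<!-- sketch: replaces the removed hadoop.htrace.spanreceiver.classes -->
<property>
  <name>dfs.htrace.spanreceiver.classes</name>
  <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
</property>
<property>
  <name>dfs.htrace.local-file-span-receiver.path</name>
  <value>/var/log/hadoop/htrace.out</value>
</property>
```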
@@ -18,12 +18,12 @@ Enabling Dapper-like Tracing in Hadoop
 * [Enabling Dapper-like Tracing in Hadoop](#Enabling_Dapper-like_Tracing_in_Hadoop)
 * [Dapper-like Tracing in Hadoop](#Dapper-like_Tracing_in_Hadoop)
 * [HTrace](#HTrace)
-* [Samplers](#Samplers)
 * [SpanReceivers](#SpanReceivers)
-* [Setting up ZipkinSpanReceiver](#Setting_up_ZipkinSpanReceiver)
 * [Dynamic update of tracing configuration](#Dynamic_update_of_tracing_configuration)
 * [Starting tracing spans by HTrace API](#Starting_tracing_spans_by_HTrace_API)
 * [Sample code for tracing](#Sample_code_for_tracing)
+* [Starting tracing spans by configuration for HDFS client](#Starting_tracing_spans_by_configuration_for_HDFS_client)
 
 Dapper-like Tracing in Hadoop
 -----------------------------
@@ -32,83 +32,51 @@ Dapper-like Tracing in Hadoop
 
 [HDFS-5274](https://issues.apache.org/jira/browse/HDFS-5274) added support for tracing requests through HDFS,
 using the open source tracing library,
-[Apache HTrace](https://git-wip-us.apache.org/repos/asf/incubator-htrace.git).
+[Apache HTrace](http://htrace.incubator.apache.org/).
 Setting up tracing is quite simple, however it requires some very minor changes to your client code.
 
-### Samplers
-
-Configure the samplers in `core-site.xml` property: `hadoop.htrace.sampler`.
-The value can be NeverSampler, AlwaysSampler or ProbabilitySampler.
-NeverSampler: HTrace is OFF for all spans;
-AlwaysSampler: HTrace is ON for all spans;
-ProbabilitySampler: HTrace is ON for some percentage% of top-level spans.
-
-    <property>
-      <name>hadoop.htrace.sampler</name>
-      <value>NeverSampler</value>
-    </property>
-
 ### SpanReceivers
 
 The tracing system works by collecting information in structs called 'Spans'.
 It is up to you to choose how you want to receive this information
-by implementing the SpanReceiver interface, which defines one method:
-
-    public void receiveSpan(Span span);
-
-Configure what SpanReceivers you'd like to use
+by using an implementation of the [SpanReceiver](http://htrace.incubator.apache.org/#Span_Receivers)
+interface bundled with HTrace, or by implementing it yourself.
 
+[HTrace](http://htrace.incubator.apache.org/) provides options such as
+
+* FlumeSpanReceiver
+* HBaseSpanReceiver
+* HTracedRESTReceiver
+* ZipkinSpanReceiver
+
+In order to set up SpanReceivers for HDFS servers,
+configure what SpanReceivers you'd like to use
 by putting a comma separated list of the fully-qualified class name of classes implementing SpanReceiver
-in `core-site.xml` property: `hadoop.htrace.spanreceiver.classes`.
+in `hdfs-site.xml` property: `dfs.htrace.spanreceiver.classes`.
 
+```xml
 <property>
-  <name>hadoop.htrace.spanreceiver.classes</name>
+  <name>dfs.htrace.spanreceiver.classes</name>
   <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
 </property>
 <property>
-  <name>hadoop.htrace.local-file-span-receiver.path</name>
+  <name>dfs.htrace.local-file-span-receiver.path</name>
   <value>/var/log/hadoop/htrace.out</value>
 </property>
+```
 
 You can omit package name prefix if you use span receiver bundled with HTrace.
 
+```xml
 <property>
-  <name>hadoop.htrace.spanreceiver.classes</name>
+  <name>dfs.htrace.spanreceiver.classes</name>
   <value>LocalFileSpanReceiver</value>
 </property>
+```
 
-### Setting up ZipkinSpanReceiver
-
-Instead of implementing SpanReceiver by yourself,
-you can use `ZipkinSpanReceiver` which uses
-[Zipkin](https://github.com/twitter/zipkin) for collecting and displaying tracing data.
-
-In order to use `ZipkinSpanReceiver`,
-you need to download and setup [Zipkin](https://github.com/twitter/zipkin) first.
-
-you also need to add the jar of `htrace-zipkin` to the classpath of Hadoop on each node.
-Here is example setup procedure.
-
-    $ git clone https://github.com/cloudera/htrace
-    $ cd htrace/htrace-zipkin
-    $ mvn compile assembly:single
-    $ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HADOOP_HOME/share/hadoop/common/lib/
-
-The sample configuration for `ZipkinSpanReceiver` is shown below.
-By adding these to `core-site.xml` of NameNode and DataNodes, `ZipkinSpanReceiver` is initialized on the startup.
-You also need this configuration on the client node in addition to the servers.
-
-    <property>
-      <name>hadoop.htrace.spanreceiver.classes</name>
-      <value>ZipkinSpanReceiver</value>
-    </property>
-    <property>
-      <name>hadoop.htrace.zipkin.collector-hostname</name>
-      <value>192.168.1.2</value>
-    </property>
-    <property>
-      <name>hadoop.htrace.zipkin.collector-port</name>
-      <value>9410</value>
-    </property>
-
+You also need to add the jar bundling SpanReceiver to the classpath of Hadoop
+on each node. (LocalFileSpanReceiver in the example above is included in the
+jar of htrace-core which is bundled with Hadoop.)
+
 ### Dynamic update of tracing configuration
 
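As a quick sanity check that the `LocalFileSpanReceiver` configuration in the hunk above is taking effect, you can watch the configured output file on a NameNode or DataNode host once traced requests arrive; spans are appended to that file. This is only a sketch assuming the example path from the hunk above:

    $ tail -f /var/log/hadoop/htrace.out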
@@ -136,8 +104,8 @@ You need to run the command against all servers if you want to update the config
 You need to specify the class name of span receiver as argument of `-class` option.
 You can specify the configuration associated with span receiver by `-Ckey=value` options.
 
-      $ hadoop trace -add -class LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
-      Added trace span receiver 2 with configuration hadoop.htrace.local-file-span-receiver.path = /tmp/htrace.out
+      $ hadoop trace -add -class LocalFileSpanReceiver -Cdfs.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
+      Added trace span receiver 2 with configuration dfs.htrace.local-file-span-receiver.path = /tmp/htrace.out
 
       $ hadoop trace -list -host 192.168.56.2:9000
       ID CLASS
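A receiver added this way can later be detached by ID using the same `hadoop trace` admin command; the ID comes from the `-list` output above. Sketch:

    $ hadoop trace -remove 2 -host 192.168.56.2:9000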
@@ -149,8 +117,9 @@ In order to trace, you will need to wrap the traced logic with **tracing span**
 When there is running tracing spans,
 the tracing information is propagated to servers along with RPC requests.
 
-In addition, you need to initialize `SpanReceiver` once per process.
+In addition, you need to initialize `SpanReceiverHost` once per process.
 
+```java
 import org.apache.hadoop.hdfs.HdfsConfiguration;
 import org.apache.hadoop.tracing.SpanReceiverHost;
 import org.apache.htrace.Sampler;
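The hunk above shows only the imports of this snippet, and the next hunk shows only its tail. For orientation, here is a self-contained sketch of the pattern being described, using the HTrace 3 API and the `SpanReceiverHost.get` call that appear elsewhere in this diff; the class name, span name, and the middle lines are assumptions, not the exact text of the committed page:

```java
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.tracing.SpanReceiverHost;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class TracingExample {
  public static void main(String[] args) throws Exception {
    HdfsConfiguration conf = new HdfsConfiguration();
    // Initialize the span receivers configured in hdfs-site.xml once per process.
    SpanReceiverHost.get(conf, DFSConfigKeys.DFS_SERVER_HTRACE_PREFIX);

    // Wrap the traced logic in a span; Sampler.ALWAYS samples every span.
    TraceScope ts = null;
    try {
      ts = Trace.startSpan("testTracing", Sampler.ALWAYS);
      // ... HDFS calls issued here propagate tracing information with their RPCs ...
    } finally {
      if (ts != null) ts.close();
    }
  }
}
```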
@@ -169,14 +138,17 @@ In addition, you need to initialize `SpanReceiver` once per process.
     } finally {
       if (ts != null) ts.close();
     }
+```
 
-### Sample code for tracing
+### Sample code for tracing by HTrace API
 
 The `TracingFsShell.java` shown below is the wrapper of FsShell
 which start tracing span before invoking HDFS shell command.
 
+```java
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FsShell;
+import org.apache.hadoop.hdfs.DFSConfigKeys;
 import org.apache.hadoop.tracing.SpanReceiverHost;
 import org.apache.hadoop.util.ToolRunner;
 import org.apache.htrace.Sampler;
@@ -189,7 +161,7 @@ which start tracing span before invoking HDFS shell command.
     FsShell shell = new FsShell();
     conf.setQuietMode(false);
     shell.setConf(conf);
-    SpanReceiverHost.getInstance(conf);
+    SpanReceiverHost.get(conf, DFSConfigKeys.DFS_SERVER_HTRACE_PREFIX);
     int res = 0;
     TraceScope ts = null;
     try {
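The hunks around this point each show only a slice of `TracingFsShell.java`. Stitched together, the sketch below is consistent with those fragments; the lines not visible in this diff (the class and `main` declarations, the `Trace.startSpan` call, and the span name) are assumptions for illustration rather than the committed text:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.tracing.SpanReceiverHost;
import org.apache.hadoop.util.ToolRunner;
import org.apache.htrace.Sampler;
import org.apache.htrace.Trace;
import org.apache.htrace.TraceScope;

public class TracingFsShell {
  public static void main(String[] argv) throws Exception {
    Configuration conf = new Configuration();
    FsShell shell = new FsShell();
    conf.setQuietMode(false);
    shell.setConf(conf);
    // Initialize the configured span receivers once for this process.
    SpanReceiverHost.get(conf, DFSConfigKeys.DFS_SERVER_HTRACE_PREFIX);
    int res = 0;
    TraceScope ts = null;
    try {
      // Start a span so the shell command below is traced end to end.
      ts = Trace.startSpan("FsShell", Sampler.ALWAYS);
      res = ToolRunner.run(shell, argv);
    } finally {
      if (ts != null) ts.close();
    }
    System.exit(res);
  }
}
```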
@@ -202,8 +174,40 @@ which start tracing span before invoking HDFS shell command.
     System.exit(res);
   }
 }
+```
 
 You can compile and execute this code as shown below.
 
     $ javac -cp `hadoop classpath` TracingFsShell.java
     $ java -cp .:`hadoop classpath` TracingFsShell -ls /
+
+### Starting tracing spans by configuration for HDFS client
+
+The DFSClient can enable tracing internally. This allows you to use HTrace with
+your client without modifying the client source code.
+
+Configure the span receivers and samplers in `hdfs-site.xml`
+by properties `dfs.client.htrace.spanreceiver.classes` and `dfs.client.htrace.sampler`.
+The value of `dfs.client.htrace.sampler` can be NeverSampler, AlwaysSampler or ProbabilitySampler.
+
+* NeverSampler: HTrace is OFF for all requests to namenodes and datanodes;
+* AlwaysSampler: HTrace is ON for all requests to namenodes and datanodes;
+* ProbabilitySampler: HTrace is ON for a configurable percentage of requests to namenodes and datanodes
+
+You do not need to enable this if your client program has been modified
+to use HTrace.
+
+```xml
+<property>
+  <name>dfs.client.htrace.spanreceiver.classes</name>
+  <value>LocalFileSpanReceiver</value>
+</property>
+<property>
+  <name>dfs.client.htrace.sampler</name>
+  <value>ProbabilitySampler</value>
+</property>
+<property>
+  <name>dfs.client.htrace.sampler.fraction</name>
+  <value>0.5</value>
+</property>
+```
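With the client-side properties above in place, tracing is driven purely by configuration: running any stock HDFS client command should result in roughly half of its requests to namenodes and datanodes being sampled (fraction 0.5), with the spans delivered to the configured receiver. For example:

    $ hdfs dfs -ls /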
@@ -197,6 +197,9 @@ Release 2.8.0 - UNRELEASED
 
     HDFS-5640. Add snapshot methods to FileContext. (Rakesh R via cnauroth)
 
+    HDFS-8284. Update documentation about how to use HTrace with HDFS (Masatake
+    Iwasaki via Colin P. McCabe)
+
   OPTIMIZATIONS
 
     HDFS-8026. Trace FSOutputSummer#writeChecksumChunks rather than
@@ -2315,4 +2315,22 @@
     the delay time will increase exponentially(double) for each retry.
   </description>
 </property>
+
+<property>
+  <name>dfs.htrace.spanreceiver.classes</name>
+  <value></value>
+  <description>
+    The class name of the HTrace SpanReceiver for the NameNode and DataNode.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.htrace.spanreceiver.classes</name>
+  <value></value>
+  <description>
+    The class name of the HTrace SpanReceiver for the HDFS client. You do not
+    need to enable this if your client program has been modified to use HTrace.
+  </description>
+</property>
+
 </configuration>