MAPREDUCE-3991. Streaming FAQ has some wrong instructions about input files splitting. (harsh)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1299045 13f79535-47bb-0310-9956-ffa450edef68
Harsh J 2012-03-09 21:07:33 +00:00
parent ad3d3f54d5
commit 1c534d0f9d
2 changed files with 3 additions and 1 deletion


@@ -113,6 +113,8 @@ Release 0.23.3 - UNRELEASED
MAPREDUCE-3885. Avoid an unnecessary copy for all requests/responses in
MRs ProtoOverHadoopRpcEngine. (Devaraj Das via sseth)
+MAPREDUCE-3991. Streaming FAQ has some wrong instructions about input files splitting. (harsh)
OPTIMIZATIONS
BUG FIXES


@@ -750,7 +750,7 @@ You can use Hadoop Streaming to do this.
As an example, consider the problem of zipping (compressing) a set of files across the Hadoop cluster. You can achieve this using either of these methods:
</p><ol>
<li> Hadoop Streaming and custom mapper script:<ul>
<li> Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.</li>
<li> Generate files listing the full HDFS paths of the files to be processed. Each list file is the input for an individual map task which processes the files listed.</li>
<li> Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory</li>
</ul></li>
<li>The existing Hadoop Framework:<ul>