HADOOP-8911. CRLF characters in source and text files (trunk equivalent patch). Contributed Raja Aluri.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2@1397435 13f79535-47bb-0310-9956-ffa450edef68
Suresh Srinivas 2012-10-12 04:48:40 +00:00
parent 5fa60ed91f
commit 3a9aadc3c7
51 changed files with 8326 additions and 8314 deletions
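Background, not part of the patch itself: a minimal sketch of how a text file with Windows-style CRLF line endings can be detected and rewritten with LF-only endings. The input path is a placeholder; the actual cleanup in this commit is done by the patch to the affected files.
{code}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: report a file that contains CRLF line endings and normalize it to LF.
public class CrlfNormalizer {
  public static void main(String[] args) throws IOException {
    Path file = Paths.get(args[0]);                  // placeholder: any text file from the tree
    String text = new String(Files.readAllBytes(file), "UTF-8");
    if (text.contains("\r\n")) {
      System.out.println(file + " contains CRLF line endings");
      // Rewrite the file with LF-only endings.
      Files.write(file, text.replace("\r\n", "\n").getBytes("UTF-8"));
    }
  }
}
{code}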


@@ -41,6 +41,9 @@ Release 2.0.3-alpha - Unreleased
HADOOP-8909. Hadoop Common Maven protoc calls must not depend on external
sh script. (Chris Nauroth via suresh)
HADOOP-8911. CRLF characters in source and text files.
(Raja Aluri via suresh)
OPTIMIZATIONS
HADOOP-8866. SampleQuantiles#query is O(N^2) instead of O(N). (Andrew Wang


@@ -15,8 +15,8 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-137">YARN-137</a>.
Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (scheduler)<br>
<b>Change the default scheduler to the CapacityScheduler</b><br>
<blockquote>There's some bugs in the FifoScheduler atm - doesn't distribute tasks across nodes and some headroom (available resource) issues.
That's not the best experience for users trying out the 2.0 branch. The CS with the default configuration of a single queue behaves the same as the FifoScheduler and doesn't have these issues.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-108">YARN-108</a>.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
@@ -45,73 +45,73 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-79">YARN-79</a>.
Major bug reported by Bikas Saha and fixed by Vinod Kumar Vavilapalli (client)<br>
<b>Calling YarnClientImpl.close throws Exception</b><br>
<blockquote>The following exception is thrown
===========
*org.apache.hadoop.HadoopIllegalArgumentException: Cannot close proxy - is not Closeable or does not provide closeable invocation handler class org.apache.hadoop.yarn.api.impl.pb.client.ClientRMProtocolPBClientImpl*
*at org.apache.hadoop.ipc.RPC.stopProxy(RPC.java:624)*
*at org.hadoop.yarn.client.YarnClientImpl.stop(YarnClientImpl.java:102)*
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.UnmanagedAMLauncher.run(UnmanagedAMLauncher.java:336)
at org.apache.hadoop.yarn.applications.unmanagedamlauncher.TestUnmanagedAMLauncher.testDSShell(TestUnmanagedAMLauncher.java:156)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.BlockJUnit4ClassRunner.runNotIgnored(BlockJUnit4ClassRunner.java:79)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:71)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:49)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:52)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:191)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.ParentRunner.run(ParentRunner.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:236)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:134)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:189)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:165)
at org.apache.maven.surefire.booter.ProviderFactory.invokeProvider(ProviderFactory.java:85)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:103)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:74)
===========</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-75">YARN-75</a>.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth <br>
<b>RMContainer should handle a RELEASE event while RUNNING</b><br>
<blockquote>An AppMaster can send a container release at any point. Currently this results in an exception, if this is done while the RM considers the container to be RUNNING.
The event not being processed correctly also implies that these containers do not show up in the Completed Container List seen by the AM (AMRMProtocol). MR-3902 depends on this set being complete. </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-68">YARN-68</a>.
Major bug reported by patrick white and fixed by Daryn Sharp (nodemanager)<br>
<b>NodeManager will refuse to shutdown indefinitely due to container log aggregation</b><br>
<blockquote>The nodemanager is able to get into a state where containermanager.logaggregation.AppLogAggregatorImpl will apparently wait
indefinitely for log aggregation to complete for an application, even if that application has abnormally terminated and is no longer present.
Observed behavior is that an attempt to stop the nodemanager daemon will return but have no effect, the nm log continually displays messages similar to this:
[Thread-1]2012-08-21 17:44:07,581 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
Waiting for aggregation to complete for application_1345221477405_2733
The only recovery we found to work was to 'kill -9' the nm process.
What exactly causes the NM to enter this state is unclear but we do see this behavior reliably when the NM has run a task which failed, for example when debugging oozie distcp actions and having a distcp map task fail, the NM that was running the container will now enter this state where a shutdown on said NM will never complete, 'never' in this case was waiting for 2 hours before killing the nodemanager process.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-66">YARN-66</a>.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (nodemanager)<br>
<b>aggregated logs permissions not set properly</b><br>
<blockquote>If the default file permissions are set to something restrictive - like 700, application logs get aggregated and created with those restrictive file permissions which doesn't allow the history server to serve them up.
They need to be created with group readable similar to how log aggregation sets up the directory permissions.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-63">YARN-63</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)<br>
@@ -128,47 +128,47 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-42">YARN-42</a>.
Major bug reported by Devaraj K and fixed by Devaraj K (nodemanager)<br>
<b>Node Manager throws NPE on startup</b><br>
<blockquote>NM throws NPE on startup if it doesn't have permissions on the NM local dirs
{code:xml}
2012-05-14 16:32:13,468 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to initialize LocalizationService
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:202)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:183)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:166)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:268)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:284)
Caused by: java.io.IOException: mkdir of /mrv2/tmp/nm-local-dir/usercache failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:907)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:188)
... 6 more
2012-05-14 16:32:13,472 INFO org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.stop(NonAggregatingLogHandler.java:82)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stop(ContainerManagerImpl.java:266)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:182)
at org.apache.hadoop.yarn.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:122)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
{code}
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-39">YARN-39</a>.
Critical sub-task reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli <br>
<b>RM-NM secret-keys should be randomly generated and rolled every so often</b><br>
<blockquote> - RM should generate the master-key randomly
- The master-key should roll every so often
- NM should remember old expired keys so that already doled out container-requests can be satisfied.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-37">YARN-37</a>.
Minor bug reported by Jason Lowe and fixed by Mayank Bansal (resourcemanager)<br>
@@ -177,42 +177,42 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-36">YARN-36</a>.
Blocker bug reported by Eli Collins and fixed by Radim Kolar <br>
<b>branch-2.1.0-alpha doesn't build</b><br>
<blockquote>branch-2.1.0-alpha doesn't build due to the following. Per YARN-1 I updated the mvn version to be 2.1.0-SNAPSHOT, before I hit this issue it didn't compile due to the bogus version.
{noformat}
hadoop-branch-2.1.0-alpha $ mvn compile
[INFO] Scanning for projects...
[ERROR] The build could not read 1 project -&gt; [Help 1]
[ERROR]
[ERROR] The project org.apache.hadoop:hadoop-yarn-project:2.1.0-SNAPSHOT (/home/eli/src/hadoop-branch-2.1.0-alpha/hadoop-yarn-project/pom.xml) has 1 error
[ERROR] 'dependencies.dependency.version' for org.hsqldb:hsqldb:jar is missing. @ line 160, column 17
{noformat}</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-31">YARN-31</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
<b>TestDelegationTokenRenewer fails on jdk7</b><br>
<blockquote>TestDelegationTokenRenewer fails when run with jdk7.
With JDK7, test methods run in an undefined order. Here it is expecting that testDTRenewal runs first but it no longer is.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-29">YARN-29</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (client)<br>
<b>Add a yarn-client module</b><br>
<blockquote>I see that we are duplicating (some) code for talking to RM via client API. In this light, a yarn-client module will be useful so that clients of all frameworks can use/extend it.
And that same module can be the destination for all the YARN's command line tools.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-27">YARN-27</a>.
Major bug reported by Ramya Sunil and fixed by Arun C Murthy <br>
<b>Failed refreshQueues due to misconfiguration prevents further refreshing of queues</b><br>
<blockquote>Stumbled upon this problem while refreshing queues with incorrect configuration. The exact scenario was:
1. Added a new queue "newQueue" without defining its capacity.
2. "bin/mapred queue -refreshQueues" fails correctly with "Illegal capacity of -1 for queue root.newQueue"
3. However, after defining the capacity of "newQueue" followed by a second "bin/mapred queue -refreshQueues" throws "org.apache.hadoop.metrics2.MetricsException: Metrics source QueueMetrics,q0=root,q1=newQueue already exists!" Also see Hadoop:name=QueueMetrics,q0=root,q1=newQueue,service=ResourceManager metrics being available even though the queue was not added.
The expected behavior would be to refresh the queues correctly and allow addition of "newQueue". </blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-25">YARN-25</a>.
Major bug reported by Thomas Graves and fixed by Robert Joseph Evans <br>
<b>remove old aggregated logs</b><br>
<blockquote>Currently the aggregated user logs under NM_REMOTE_APP_LOG_DIR are never removed. We should have mechanism to remove them after certain period.
It might make sense for job history server to remove them.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-22">YARN-22</a>.
Minor bug reported by Eli Collins and fixed by Mayank Bansal <br>
@@ -221,29 +221,29 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-15">YARN-15</a>.
Critical bug reported by Alejandro Abdelnur and fixed by Arun C Murthy (nodemanager)<br>
<b>YarnConfiguration DEFAULT_YARN_APPLICATION_CLASSPATH should be updated</b><br>
<blockquote>
{code}
/**
* Default CLASSPATH for YARN applications. A comma-separated list of
* CLASSPATH entries
*/
public static final String[] DEFAULT_YARN_APPLICATION_CLASSPATH = {
"$HADOOP_CONF_DIR", "$HADOOP_COMMON_HOME/share/hadoop/common/*",
"$HADOOP_COMMON_HOME/share/hadoop/common/lib/*",
"$HADOOP_HDFS_HOME/share/hadoop/hdfs/*",
"$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*",
"$YARN_HOME/share/hadoop/mapreduce/*",
"$YARN_HOME/share/hadoop/mapreduce/lib/*"};
{code}
It should have {{share/yarn/}} and MR should add the {{share/mapreduce/}} (another JIRA?)</blockquote></li>
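Illustration only, not part of the YARN-15 patch: a sketch of how a client typically consumes this constant when assembling a container's CLASSPATH environment value, falling back to the defaults above when yarn.application.classpath is unset. The manual ':' joining and the "CLASSPATH" key are simplifications for the example.
{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: build a CLASSPATH value for a container launch context from the
// configured application classpath, defaulting to DEFAULT_YARN_APPLICATION_CLASSPATH.
public class ClasspathExample {
  public static Map<String, String> buildEnv(Configuration conf) {
    StringBuilder classpath = new StringBuilder("./*");
    for (String entry : conf.getStrings(
        YarnConfiguration.YARN_APPLICATION_CLASSPATH,
        YarnConfiguration.DEFAULT_YARN_APPLICATION_CLASSPATH)) {
      classpath.append(':').append(entry.trim());
    }
    Map<String, String> env = new HashMap<String, String>();
    env.put("CLASSPATH", classpath.toString());
    return env;
  }
}
{code}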
<li> <a href="https://issues.apache.org/jira/browse/YARN-14">YARN-14</a>.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)<br>
<b>Symlinks to peer distributed cache files no longer work</b><br>
<blockquote>Trying to create a symlink to another file that is specified for the distributed cache will fail to create the link. For example:
hadoop jar ... -files "x,y,x#z"
will localize the files x and y as x and y, but the z symlink for x will not be created. This is a regression from 1.x behavior.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-13">YARN-13</a>.
Critical bug reported by Todd Lipcon and fixed by <br>
@@ -252,13 +252,13 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/YARN-12">YARN-12</a>.
Major bug reported by Junping Du and fixed by Junping Du (scheduler)<br>
<b>Several Findbugs issues with new FairScheduler in YARN</b><br>
<blockquote>The good feature of FairScheduler is added recently to YARN. As recently PreCommit test from MAPREDUCE-4309, there are several bugs found by Findbugs related to FairScheduler:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerEventLog.shutdown() might ignore java.lang.Exception
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairSchedulerEventLog.logDisabled; locked 50% of time
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.queueMaxAppsDefault; locked 50% of time
Inconsistent synchronization of org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.QueueManager.userMaxAppsDefault; locked 50% of time
The details are in:https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2612//artifact/trunk/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html#DE_MIGHT_IGNORE
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/YARN-10">YARN-10</a>.
Major improvement reported by Arun C Murthy and fixed by Hitesh Shah <br>
@@ -991,18 +991,18 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3812">MAPREDUCE-3812</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Harsh J (mrv2 , performance)<br>
<b>Lower default allocation sizes, fix allocation configurations and document them</b><br>
<blockquote>Removes two sets of previously available config properties:
1. ( yarn.scheduler.fifo.minimum-allocation-mb and yarn.scheduler.fifo.maximum-allocation-mb ) and,
2. ( yarn.scheduler.capacity.minimum-allocation-mb and yarn.scheduler.capacity.maximum-allocation-mb )
In favor of two new, generically named properties:
1. yarn.scheduler.minimum-allocation-mb - This acts as the floor value of memory resource requests for containers.
2. yarn.scheduler.maximum-allocation-mb - This acts as the ceiling value of memory resource requests for containers.
Both these properties need to be set at the ResourceManager (RM) to take effect, as the RM is where the scheduler resides.
Also changes the default minimum and maximums to 128 MB and 10 GB respectively.</blockquote></li>
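Illustrative sketch, not part of the MAPREDUCE-3812 patch: reading the two new scheduler-wide limits on the RM side. Plain string keys are used instead of the YarnConfiguration constants, and the fallback values shown simply mirror the defaults described in this note.
{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

// Sketch: read the floor and ceiling for container memory requests.
public class AllocationLimits {
  public static void main(String[] args) {
    Configuration conf = new YarnConfiguration();
    int minMb = conf.getInt("yarn.scheduler.minimum-allocation-mb", 128);    // floor
    int maxMb = conf.getInt("yarn.scheduler.maximum-allocation-mb", 10240);  // ceiling
    System.out.println("Container requests are clamped to [" + minMb + ", " + maxMb + "] MB");
  }
}
{code}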
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3782">MAPREDUCE-3782</a>.
Critical bug reported by Arpit Gupta and fixed by Jason Lowe (mrv2)<br>
@@ -1043,8 +1043,8 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3543">MAPREDUCE-3543</a>.
Critical bug reported by Mahadev konar and fixed by Thomas Graves (mrv2)<br>
<b>Mavenize Gridmix.</b><br>
<blockquote>Note that to apply this you should first run the script - ./MAPREDUCE-3543v3.sh svn, then apply the patch.
If this is merged to more than trunk, the version inside of hadoop-tools/hadoop-gridmix/pom.xml will need to be updated accordingly.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3506">MAPREDUCE-3506</a>.
Minor bug reported by Ratandeep Ratti and fixed by Jason Lowe (client , mrv2)<br>
@@ -1613,10 +1613,10 @@ <h2>Changes since Hadoop 2.0.1-alpha</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3475">HDFS-3475</a>.
Trivial improvement reported by Harsh J and fixed by Harsh J <br>
<b>Make the replication and invalidation rates configurable</b><br>
<blockquote>This change adds two new configuration parameters.
# {{dfs.namenode.invalidate.work.pct.per.iteration}} for controlling deletion rate of blocks.
# {{dfs.namenode.replication.work.multiplier.per.iteration}} for controlling replication rate. This in turn allows controlling the time it takes for decommissioning.
Please see hdfs-default.xml for detailed description.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-3474">HDFS-3474</a>.
Major sub-task reported by Ivan Kelly and fixed by Ivan Kelly <br>
@@ -4769,8 +4769,8 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3720">MAPREDUCE-3720</a>.
Major bug reported by Vinod Kumar Vavilapalli and fixed by Vinod Kumar Vavilapalli (client , mrv2)<br>
<b>Command line listJobs should not visit each AM</b><br>
<blockquote>Changed bin/mapred job -list to not print job-specific information not available at RM.
Very minor incompatibility in cmd-line output, inevitable due to MRv2 architecture.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3718">MAPREDUCE-3718</a>.
Major sub-task reported by Vinod Kumar Vavilapalli and fixed by Hitesh Shah (mrv2 , performance)<br>
@@ -4819,8 +4819,8 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3703">MAPREDUCE-3703</a>.
Critical bug reported by Eric Payne and fixed by Eric Payne (mrv2 , resourcemanager)<br>
<b>ResourceManager should provide node lists in JMX output</b><br>
<blockquote>New JMX Bean in ResourceManager to provide list of live node managers:
Hadoop:service=ResourceManager,name=RMNMInfo LiveNodeManagers</blockquote></li>
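Illustration only: one way to read the LiveNodeManagers attribute of this bean over remote JMX. The host, port, and the assumption that the RM JVM was started with remote JMX enabled are placeholders, not something MAPREDUCE-3703 configures.
{code}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Sketch: fetch the live node manager list exposed by the RMNMInfo MBean.
public class RmNmInfoClient {
  public static void main(String[] args) throws Exception {
    JMXServiceURL url = new JMXServiceURL(
        "service:jmx:rmi:///jndi/rmi://rm-host:8025/jmxrmi");  // placeholder RM JMX address
    JMXConnector connector = JMXConnectorFactory.connect(url);
    try {
      MBeanServerConnection mbs = connector.getMBeanServerConnection();
      ObjectName name = new ObjectName("Hadoop:service=ResourceManager,name=RMNMInfo");
      Object liveNodes = mbs.getAttribute(name, "LiveNodeManagers");
      System.out.println(liveNodes);  // listing of live node managers
    } finally {
      connector.close();
    }
  }
}
{code}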
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3702">MAPREDUCE-3702</a>.
Critical bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)<br>
@@ -5037,12 +5037,12 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3549">MAPREDUCE-3549</a>.
Blocker bug reported by Thomas Graves and fixed by Thomas Graves (mrv2)<br>
<b>write api documentation for web service apis for RM, NM, mapreduce app master, and job history server</b><br>
<blockquote>new files added: A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/WebServicesIntro.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/NodeManagerRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/ResourceManagerRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/MapredAppMasterRest.apt.vm
A hadoop-mapreduce-project/hadoop-yarn/hadoop-yarn-site/src/site/apt/HistoryServerRest.apt.vm
The hadoop-project/src/site/site.xml is split into separate patch.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3548">MAPREDUCE-3548</a>.
Critical sub-task reported by Thomas Graves and fixed by Thomas Graves (mrv2)<br>
@@ -5471,7 +5471,7 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3297">MAPREDUCE-3297</a>.
Major task reported by Siddharth Seth and fixed by Siddharth Seth (mrv2)<br>
<b>Move Log Related components from yarn-server-nodemanager to yarn-common</b><br>
<blockquote>Moved log related components into yarn-common so that HistoryServer and clients can use them without depending on the yarn-server-nodemanager module.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3291">MAPREDUCE-3291</a>.
Blocker bug reported by Ramya Sunil and fixed by Robert Joseph Evans (mrv2)<br>
@@ -5504,17 +5504,17 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3219">MAPREDUCE-3219</a>.
Minor sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2 , test)<br>
<b>ant test TestDelegationToken failing on trunk</b><br>
<blockquote>Reenabled and fixed bugs in the failing test TestDelegationToken.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3217">MAPREDUCE-3217</a>.
Minor sub-task reported by Hitesh Shah and fixed by Devaraj K (mrv2 , test)<br>
<b>ant test TestAuditLogger fails on trunk</b><br>
<blockquote>Reenabled and fixed bugs in the failing ant test TestAuditLogger.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3215">MAPREDUCE-3215</a>.
Minor sub-task reported by Hitesh Shah and fixed by Hitesh Shah (mrv2)<br>
<b>org.apache.hadoop.mapreduce.TestNoJobSetupCleanup failing on trunk</b><br>
<blockquote>Reenabled and fixed bugs in the failing test TestNoJobSetupCleanup.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3194">MAPREDUCE-3194</a>.
Major bug reported by Siddharth Seth and fixed by Jason Lowe (mrv2)<br>
@@ -5875,12 +5875,12 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2246">HDFS-2246</a>.
Major improvement reported by Sanjay Radia and fixed by Jitendra Nath Pandey <br>
<b>Shortcut a local client reads to a Datanodes files directly</b><br>
<blockquote>1. New configurations
a. dfs.block.local-path-access.user is the key in datanode configuration to specify the user allowed to do short circuit read.
b. dfs.client.read.shortcircuit is the key to enable short circuit read at the client side configuration.
c. dfs.client.read.shortcircuit.skip.checksum is the key to bypass checksum check at the client side.
2. By default none of the above are enabled and short circuit read will not kick in.
3. If security is on, the feature can be used only for user that has kerberos credentials at the client, therefore map reduce tasks cannot benefit from it in general.
</blockquote></li>
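Illustrative sketch of the keys described in the HDFS-2246 note, set programmatically for clarity; in practice they are placed in hdfs-site.xml, and the user name shown on the datanode side is a placeholder.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: client-side and datanode-side settings for short-circuit local reads.
public class ShortCircuitReadConfig {
  public static Configuration clientSide() {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // Optionally skip checksum verification on local reads (kept on here).
    conf.setBoolean("dfs.client.read.shortcircuit.skip.checksum", false);
    return conf;
  }

  public static Configuration datanodeSide() {
    Configuration conf = new Configuration();
    // User permitted to read block files directly (placeholder name).
    conf.set("dfs.block.local-path-access.user", "hbase");
    return conf;
  }
}
{code}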
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2178">HDFS-2178</a>.
Major improvement reported by Alejandro Abdelnur and fixed by Alejandro Abdelnur <br>
@@ -6161,7 +6161,7 @@ <h2>Changes since Hadoop 0.23.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7802">HADOOP-7802</a>.
Major bug reported by Bruno Mah&#233; and fixed by Bruno Mah&#233; <br>
<b>Hadoop scripts unconditionally source "$bin"/../libexec/hadoop-config.sh.</b><br>
<blockquote>Here is a patch to enable this behavior
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7801">HADOOP-7801</a>.
Major bug reported by Bruno Mah&#233; and fixed by Bruno Mah&#233; (build)<br>
@@ -6486,9 +6486,9 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3186">MAPREDUCE-3186</a>.
Blocker bug reported by Ramgopal N and fixed by Eric Payne (mrv2)<br>
<b>User jobs are getting hanged if the Resource manager process goes down and comes up while job is getting executed.</b><br>
<blockquote>New Yarn configuration property:
Name: yarn.app.mapreduce.am.scheduler.connection.retries
Description: Number of times AM should retry to contact RM if connection is lost.</blockquote></li>
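Illustration only: reading the new property named in the MAPREDUCE-3186 note. The fallback value of 3 is an assumption for this example, not the documented default.
{code}
import org.apache.hadoop.conf.Configuration;

// Sketch: number of times the AM retries contacting the RM after a lost connection.
public class AmSchedulerRetries {
  public static int getRetries(Configuration conf) {
    return conf.getInt("yarn.app.mapreduce.am.scheduler.connection.retries", 3);
  }
}
{code}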
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3185">MAPREDUCE-3185</a>.
Critical bug reported by Mahadev konar and fixed by Jonathan Eagles (mrv2)<br>
@@ -6641,7 +6641,7 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3112">MAPREDUCE-3112</a>.
Major bug reported by Eric Yang and fixed by Eric Yang (contrib/streaming)<br>
<b>Calling hadoop cli inside mapreduce job leads to errors</b><br>
<blockquote>Removed inheritance of certain server environment variables (HADOOP_OPTS and HADOOP_ROOT_LOGGER) in task attempt process.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-3110">MAPREDUCE-3110</a>.
Major bug reported by Devaraj K and fixed by Vinod Kumar Vavilapalli (mrv2 , test)<br>
@@ -7114,16 +7114,16 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2858">MAPREDUCE-2858</a>.
Blocker sub-task reported by Luke Lu and fixed by Robert Joseph Evans (applicationmaster , mrv2 , security)<br>
<b>MRv2 WebApp Security</b><br>
<blockquote>A new server has been added to yarn. It is a web proxy that sits in front of the AM web UI. The server is controlled by the yarn.web-proxy.address config. If that config is set, and it points to an address that is different than the RM web interface, then a separate proxy server needs to be launched.
This can be done by running
yarn-daemon.sh start proxyserver
If a separate proxy server is needed other configs also may need to be set, if security is enabled.
yarn.web-proxy.principal
yarn.web-proxy.keytab
The proxy server is stateless and should be able to support a VIP or other load balancing sitting in front of multiple instances of this server.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2854">MAPREDUCE-2854</a>.
Major bug reported by Thomas Graves and fixed by Thomas Graves <br>
@@ -8061,12 +8061,12 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2037">MAPREDUCE-2037</a>.
     Major new feature reported by Dick King and fixed by Dick King <br>
     <b>Capturing interim progress times, CPU usage, and memory usage, when tasks reach certain progress thresholds</b><br>
     <blockquote>Capture intermediate task resource consumption information:
* Time taken so far
* CPU load [either at the time the data are taken, or exponentially smoothed]
* Memory load [also either at the time the data are taken, or exponentially smoothed]
This would be taken at intervals that depend on the task progress plateaus. For example, reducers have three progress ranges - [0-1/3], (1/3-2/3], and (2/3-3/3] - where fundamentally different activities happen. Mappers have different boundaries that are not symmetrically placed [0-9/10], (9/10-1]. Data capture boundaries should coincide with activity boundaries. For the state information capture [CPU and memory] we should average over the covered interval.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-2033">MAPREDUCE-2033</a>.
     Major task reported by Vinay Kumar Thota and fixed by Vinay Kumar Thota (contrib/gridmix)<br>
@ -8175,24 +8175,24 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/MAPREDUCE-279">MAPREDUCE-279</a>.
     Major improvement reported by Arun C Murthy and fixed by (mrv2)<br>
     <b>Map-Reduce 2.0</b><br>
     <blockquote>MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have what we call MapReduce 2.0 (MRv2).
The fundamental idea of MRv2 is to split up the two major functionalities of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. The idea is to have a global ResourceManager (RM) and per-application ApplicationMaster (AM). An application is either a single job in the classical sense of Map-Reduce jobs or a DAG of jobs. The ResourceManager and per-node slave, the NodeManager (NM), form the data-computation framework. The ResourceManager is the ultimate authority that arbitrates resources among all the applications in the system. The per-application ApplicationMaster is, in effect, a framework-specific library and is tasked with negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the tasks.
The ResourceManager has two main components:
* Scheduler (S)
* ApplicationsManager (ASM)
The Scheduler is responsible for allocating resources to the various running applications subject to familiar constraints of capacities, queues etc. The Scheduler is a pure scheduler in the sense that it performs no monitoring or tracking of status for the application. Also, it offers no guarantees on restarting failed tasks either due to application failure or hardware failures. The Scheduler performs its scheduling function based on the resource requirements of the applications; it does so based on the abstract notion of a Resource Container which incorporates elements such as memory, cpu, disk, network etc.
The Scheduler has a pluggable policy plug-in, which is responsible for partitioning the cluster resources among the various queues, applications etc. The current Map-Reduce schedulers such as the CapacityScheduler and the FairScheduler would be some examples of the plug-in.
The CapacityScheduler supports hierarchical queues to allow for more predictable sharing of cluster resources.
The ApplicationsManager is responsible for accepting job-submissions, negotiating the first container for executing the application-specific ApplicationMaster, and provides the service for restarting the ApplicationMaster container on failure.
The NodeManager is the per-machine framework agent that is responsible for launching the applications' containers, monitoring their resource usage (cpu, memory, disk, network) and reporting the same to the Scheduler.
The per-application ApplicationMaster has the responsibility of negotiating appropriate resource containers from the Scheduler, tracking their status and monitoring for progress.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2540">HDFS-2540</a>.
     Major sub-task reported by Tsz Wo (Nicholas), SZE and fixed by Tsz Wo (Nicholas), SZE <br>
@ -8253,10 +8253,10 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2465">HDFS-2465</a>.
     Major improvement reported by Todd Lipcon and fixed by Todd Lipcon (data-node , performance)<br>
     <b>Add HDFS support for fadvise readahead and drop-behind</b><br>
     <blockquote>HDFS now has the ability to use posix_fadvise and sync_data_range syscalls to manage the OS buffer cache. This support is currently considered experimental, and may be enabled by configuring the following keys:
dfs.datanode.drop.cache.behind.writes - set to true to drop data out of the buffer cache after writing
dfs.datanode.drop.cache.behind.reads - set to true to drop data out of the buffer cache when performing sequential reads
dfs.datanode.sync.behind.writes - set to true to trigger dirty page writeback immediately after writing data
dfs.datanode.readahead.bytes - set to a non-zero value to trigger readahead for sequential reads</blockquote></li>
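A hedged sketch of enabling these experimental keys programmatically through the Configuration API; normally they would go into hdfs-site.xml on the datanodes, and the readahead byte count below is an arbitrary illustrative value, not a recommended setting.

import org.apache.hadoop.conf.Configuration;

public class FadviseConfSketch {
  public static Configuration fadviseConf() {
    Configuration conf = new Configuration();
    // Drop data from the OS buffer cache after writes and sequential reads.
    conf.setBoolean("dfs.datanode.drop.cache.behind.writes", true);
    conf.setBoolean("dfs.datanode.drop.cache.behind.reads", true);
    // Trigger dirty page writeback immediately after writing.
    conf.setBoolean("dfs.datanode.sync.behind.writes", true);
    // Non-zero value enables readahead for sequential reads; 4 MB is illustrative.
    conf.setLong("dfs.datanode.readahead.bytes", 4L * 1024 * 1024);
    return conf;
  }
}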
<li> <a href="https://issues.apache.org/jira/browse/HDFS-2453">HDFS-2453</a>.
     Major sub-task reported by Arpit Gupta and fixed by Tsz Wo (Nicholas), SZE (webhdfs)<br>
@ -9331,7 +9331,7 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1594">HDFS-1594</a>.
     Major bug reported by Devaraj K and fixed by Aaron T. Myers (name-node)<br>
     <b>When the disk becomes full Namenode is getting shutdown and not able to recover</b><br>
     <blockquote>Implemented a daemon thread to periodically monitor the disk usage and, if the usage reaches the threshold value, put the name node into safe mode so that no modifications to the file system will occur. Once the disk usage drops below the threshold, the name node is taken out of safe mode. Both the threshold value and the interval at which the disk usage is checked are configurable.
</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1592">HDFS-1592</a>.
     Major bug reported by Bharath Mundlapudi and fixed by Bharath Mundlapudi <br>
@ -9376,9 +9376,9 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1547">HDFS-1547</a>.
     Major improvement reported by Suresh Srinivas and fixed by Suresh Srinivas (name-node)<br>
     <b>Improve decommission mechanism</b><br>
     <blockquote>Summary of changes to the decommissioning process:
# After nodes are decommissioned, they are not shut down. The decommissioned nodes are not used for writes. For reads, the decommissioned nodes are given as the last location to read from.
# The numbers of live and dead decommissioned nodes are displayed in the namenode web UI.
# Decommissioned nodes' free capacity is not counted towards the cluster free capacity.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1541">HDFS-1541</a>.
     Major sub-task reported by Hairong Kuang and fixed by Hairong Kuang (name-node)<br>
@ -9491,10 +9491,10 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1448">HDFS-1448</a>.
     Major new feature reported by Erik Steffl and fixed by Erik Steffl (tools)<br>
     <b>Create multi-format parser for edits logs file, support binary and XML formats initially</b><br>
     <blockquote>Offline edits viewer feature adds oev tool to hdfs script. Oev makes it possible to convert edits logs to/from native binary and XML formats. It uses the same framework as Offline image viewer.
Example usage:
$HADOOP_HOME/bin/hdfs oev -i edits -o output.xml</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HDFS-1445">HDFS-1445</a>.
     Major sub-task reported by Matt Foley and fixed by Matt Foley (data-node)<br>
@ -9762,7 +9762,7 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7681">HADOOP-7681</a>.
     Minor bug reported by Arpit Gupta and fixed by Arpit Gupta (conf)<br>
     <b>log4j.properties is missing properties for security audit and hdfs audit should be changed to info</b><br>
     <blockquote>HADOOP-7681. Fixed security and hdfs audit log4j properties
(Arpit Gupta via Eric Yang)</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7671">HADOOP-7671</a>.
     Major bug reported by Ravi Prakash and fixed by Ravi Prakash <br>
@ -10363,8 +10363,8 @@ <h2>Changes since Hadoop 1.0.0</h2>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7227">HADOOP-7227</a>.
     Major improvement reported by Jitendra Nath Pandey and fixed by Jitendra Nath Pandey (ipc)<br>
     <b>Remove protocol version check at proxy creation in Hadoop RPC.</b><br>
     <blockquote>1. The protocol version check is removed from proxy creation; instead, the version check is performed at the server in every rpc call.
2. This change is backward incompatible because the format of the rpc messages is changed to include the client version, client method hash and rpc version.
3. An rpc version is introduced which should change when the format of rpc messages is changed.</blockquote></li>
<li> <a href="https://issues.apache.org/jira/browse/HADOOP-7223">HADOOP-7223</a>.
     Major bug reported by Suresh Srinivas and fixed by Suresh Srinivas (fs)<br>
@ -1,211 +1,211 @@
/*
 * ContextFactory.java
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.metrics;

import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Properties;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.metrics.spi.NullContext;

/**
 * Factory class for creating MetricsContext objects.  To obtain an instance
 * of this class, use the static <code>getFactory()</code> method.
 */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving
public class ContextFactory {

  private static final String PROPERTIES_FILE =
    "/hadoop-metrics.properties";
  private static final String CONTEXT_CLASS_SUFFIX =
    ".class";
  private static final String DEFAULT_CONTEXT_CLASSNAME =
    "org.apache.hadoop.metrics.spi.NullContext";

  private static ContextFactory theFactory = null;

  private Map<String,Object> attributeMap = new HashMap<String,Object>();
  private Map<String,MetricsContext> contextMap =
    new HashMap<String,MetricsContext>();

  // Used only when contexts, or the ContextFactory itself, cannot be
  // created.
  private static Map<String,MetricsContext> nullContextMap =
    new HashMap<String,MetricsContext>();

  /** Creates a new instance of ContextFactory */
  protected ContextFactory() {
  }

  /**
   * Returns the value of the named attribute, or null if there is no
   * attribute of that name.
   *
   * @param attributeName the attribute name
   * @return the attribute value
   */
  public Object getAttribute(String attributeName) {
    return attributeMap.get(attributeName);
  }

  /**
   * Returns the names of all the factory's attributes.
   *
   * @return the attribute names
   */
  public String[] getAttributeNames() {
    String[] result = new String[attributeMap.size()];
    int i = 0;
    // for (String attributeName : attributeMap.keySet()) {
    Iterator it = attributeMap.keySet().iterator();
    while (it.hasNext()) {
      result[i++] = (String) it.next();
    }
    return result;
  }

  /**
   * Sets the named factory attribute to the specified value, creating it
   * if it did not already exist.  If the value is null, this is the same as
   * calling removeAttribute.
   *
   * @param attributeName the attribute name
   * @param value the new attribute value
   */
  public void setAttribute(String attributeName, Object value) {
    attributeMap.put(attributeName, value);
  }

  /**
   * Removes the named attribute if it exists.
   *
   * @param attributeName the attribute name
   */
  public void removeAttribute(String attributeName) {
    attributeMap.remove(attributeName);
  }

  /**
   * Returns the named MetricsContext instance, constructing it if necessary
   * using the factory's current configuration attributes. <p/>
   *
   * When constructing the instance, if the factory property
   * <i>contextName</i>.class</code> exists,
   * its value is taken to be the name of the class to instantiate.  Otherwise,
   * the default is to create an instance of
   * <code>org.apache.hadoop.metrics.spi.NullContext</code>, which is a
   * dummy "no-op" context which will cause all metric data to be discarded.
   *
   * @param contextName the name of the context
   * @return the named MetricsContext
   */
  public synchronized MetricsContext getContext(String refName, String contextName)
      throws IOException, ClassNotFoundException,
             InstantiationException, IllegalAccessException {
    MetricsContext metricsContext = contextMap.get(refName);
    if (metricsContext == null) {
      String classNameAttribute = refName + CONTEXT_CLASS_SUFFIX;
      String className = (String) getAttribute(classNameAttribute);
      if (className == null) {
        className = DEFAULT_CONTEXT_CLASSNAME;
      }
      Class contextClass = Class.forName(className);
      metricsContext = (MetricsContext) contextClass.newInstance();
      metricsContext.init(contextName, this);
      contextMap.put(contextName, metricsContext);
    }
    return metricsContext;
  }

  public synchronized MetricsContext getContext(String contextName)
      throws IOException, ClassNotFoundException, InstantiationException,
             IllegalAccessException {
    return getContext(contextName, contextName);
  }

  /**
   * Returns all MetricsContexts built by this factory.
   */
  public synchronized Collection<MetricsContext> getAllContexts() {
    // Make a copy to avoid race conditions with creating new contexts.
    return new ArrayList<MetricsContext>(contextMap.values());
  }

  /**
   * Returns a "null" context - one which does nothing.
   */
  public static synchronized MetricsContext getNullContext(String contextName) {
    MetricsContext nullContext = nullContextMap.get(contextName);
    if (nullContext == null) {
      nullContext = new NullContext();
      nullContextMap.put(contextName, nullContext);
    }
    return nullContext;
  }

  /**
   * Returns the singleton ContextFactory instance, constructing it if
   * necessary. <p/>
   *
   * When the instance is constructed, this method checks if the file
   * <code>hadoop-metrics.properties</code> exists on the class path.  If it
   * exists, it must be in the format defined by java.util.Properties, and all
   * the properties in the file are set as attributes on the newly created
   * ContextFactory instance.
   *
   * @return the singleton ContextFactory instance
   */
  public static synchronized ContextFactory getFactory() throws IOException {
    if (theFactory == null) {
      theFactory = new ContextFactory();
      theFactory.setAttributes();
    }
    return theFactory;
  }

  private void setAttributes() throws IOException {
    InputStream is = getClass().getResourceAsStream(PROPERTIES_FILE);
    if (is != null) {
      try {
        Properties properties = new Properties();
        properties.load(is);
        //for (Object propertyNameObj : properties.keySet()) {
        Iterator it = properties.keySet().iterator();
        while (it.hasNext()) {
          String propertyName = (String) it.next();
          String propertyValue = properties.getProperty(propertyName);
          setAttribute(propertyName, propertyValue);
        }
      } finally {
        is.close();
      }
    }
  }
}
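To make the factory contract above concrete, here is a minimal usage sketch; it is not part of this patch, the "dfs" context name and "dfs.class" attribute key are illustrative, and getContext falls back to NullContext when no matching <contextName>.class attribute is configured.

import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.MetricsContext;

public class ContextFactorySketch {
  public static void main(String[] args) throws Exception {
    // Loads attributes from hadoop-metrics.properties if it is on the class path.
    ContextFactory factory = ContextFactory.getFactory();

    // Attribute keys follow the "<contextName>.class" convention; this key is illustrative.
    System.out.println("dfs.class = " + factory.getAttribute("dfs.class"));

    // Instantiates the configured context class, or NullContext when none is configured.
    MetricsContext context = factory.getContext("dfs");
    System.out.println("period = " + context.getPeriod() + "s, monitoring = "
        + context.isMonitoring());
  }
}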
@ -1,122 +1,122 @@
/*
 * MetricsContext.java
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.metrics;

import java.io.IOException;
import java.util.Collection;
import java.util.Map;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.metrics.spi.OutputRecord;

/**
 * The main interface to the metrics package.
 */
@InterfaceAudience.Private
@InterfaceStability.Evolving
public interface MetricsContext {

  /**
   * Default period in seconds at which data is sent to the metrics system.
   */
  public static final int DEFAULT_PERIOD = 5;

  /**
   * Initialize this context.
   * @param contextName The given name for this context
   * @param factory The creator of this context
   */
  public void init(String contextName, ContextFactory factory);

  /**
   * Returns the context name.
   *
   * @return the context name
   */
  public abstract String getContextName();

  /**
   * Starts or restarts monitoring, the emitting of metrics records as they are
   * updated.
   */
  public abstract void startMonitoring()
    throws IOException;

  /**
   * Stops monitoring.  This does not free any data that the implementation
   * may have buffered for sending at the next timer event.  It
   * is OK to call <code>startMonitoring()</code> again after calling
   * this.
   * @see #close()
   */
  public abstract void stopMonitoring();

  /**
   * Returns true if monitoring is currently in progress.
   */
  public abstract boolean isMonitoring();

  /**
   * Stops monitoring and also frees any buffered data, returning this
   * object to its initial state.
   */
  public abstract void close();

  /**
   * Creates a new MetricsRecord instance with the given <code>recordName</code>.
   * Throws an exception if the metrics implementation is configured with a fixed
   * set of record names and <code>recordName</code> is not in that set.
   *
   * @param recordName the name of the record
   * @throws MetricsException if recordName conflicts with configuration data
   */
  public abstract MetricsRecord createRecord(String recordName);

  /**
   * Registers a callback to be called at regular time intervals, as
   * determined by the implementation-class specific configuration.
   *
   * @param updater object to be run periodically; it should update
   * some metrics records and then return
   */
  public abstract void registerUpdater(Updater updater);

  /**
   * Removes a callback, if it exists.
   *
   * @param updater object to be removed from the callback list
   */
  public abstract void unregisterUpdater(Updater updater);

  /**
   * Returns the timer period.
   */
  public abstract int getPeriod();

  /**
   * Retrieves all the records managed by this MetricsContext.
   * Useful for monitoring systems that are polling-based.
   *
   * @return A non-null map from all record names to the records managed.
   */
  Map<String, Collection<OutputRecord>> getAllRecords();
}
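The registerUpdater/getPeriod contract above is easiest to see with a small periodic-callback sketch. This assumes the existing Updater callback interface from org.apache.hadoop.metrics; the record and metric names are hypothetical.

import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.MetricsRecord;
import org.apache.hadoop.metrics.Updater;

public class QueueLengthUpdater implements Updater {
  private final MetricsRecord record;
  private volatile int queueLength;

  public QueueLengthUpdater(MetricsContext context) {
    this.record = context.createRecord("queueStats"); // hypothetical record name
    context.registerUpdater(this);                    // called back every getPeriod() seconds
  }

  public void setQueueLength(int len) {
    this.queueLength = len;
  }

  @Override
  public void doUpdates(MetricsContext unused) {
    record.setMetric("queueLength", queueLength);     // hypothetical metric name
    record.update();                                  // buffers the row for the next timer period
  }
}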
@ -1,47 +1,47 @@
/*
 * MetricsException.java
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.metrics;

import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

/**
 * General-purpose, unchecked metrics exception.
 */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving
public class MetricsException extends RuntimeException {

  private static final long serialVersionUID = -1643257498540498497L;

  /** Creates a new instance of MetricsException */
  public MetricsException() {
  }

  /** Creates a new instance of MetricsException
   *
   * @param message an error message
   */
  public MetricsException(String message) {
    super(message);
  }
}
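Because MetricsException is unchecked, callers that probe optional record names typically guard the call themselves; a small hedged sketch follows, with a hypothetical record name.

import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.MetricsException;
import org.apache.hadoop.metrics.MetricsRecord;

public class RecordLookupSketch {
  // Returns null instead of propagating the unchecked MetricsException when the
  // implementation restricts record names and "optionalStats" is not among them.
  public static MetricsRecord tryCreate(MetricsContext context) {
    try {
      return context.createRecord("optionalStats"); // hypothetical record name
    } catch (MetricsException e) {
      return null;
    }
  }
}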
@ -1,251 +1,251 @@
/* /*
* MetricsRecord.java * MetricsRecord.java
* *
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.metrics; package org.apache.hadoop.metrics;
import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.classification.InterfaceStability;
/** /**
* A named and optionally tagged set of records to be sent to the metrics * A named and optionally tagged set of records to be sent to the metrics
* system. <p/> * system. <p/>
* *
* A record name identifies the kind of data to be reported. For example, a * A record name identifies the kind of data to be reported. For example, a
* program reporting statistics relating to the disks on a computer might use * program reporting statistics relating to the disks on a computer might use
* a record name "diskStats".<p/> * a record name "diskStats".<p/>
* *
* A record has zero or more <i>tags</i>. A tag has a name and a value. To * A record has zero or more <i>tags</i>. A tag has a name and a value. To
* continue the example, the "diskStats" record might use a tag named * continue the example, the "diskStats" record might use a tag named
* "diskName" to identify a particular disk. Sometimes it is useful to have * "diskName" to identify a particular disk. Sometimes it is useful to have
* more than one tag, so there might also be a "diskType" with value "ide" or * more than one tag, so there might also be a "diskType" with value "ide" or
* "scsi" or whatever.<p/> * "scsi" or whatever.<p/>
* *
* A record also has zero or more <i>metrics</i>. These are the named * A record also has zero or more <i>metrics</i>. These are the named
* values that are to be reported to the metrics system. In the "diskStats" * values that are to be reported to the metrics system. In the "diskStats"
* example, possible metric names would be "diskPercentFull", "diskPercentBusy", * example, possible metric names would be "diskPercentFull", "diskPercentBusy",
* "kbReadPerSecond", etc.<p/> * "kbReadPerSecond", etc.<p/>
* *
* The general procedure for using a MetricsRecord is to fill in its tag and * The general procedure for using a MetricsRecord is to fill in its tag and
* metric values, and then call <code>update()</code> to pass the record to the * metric values, and then call <code>update()</code> to pass the record to the
* client library. * client library.
* Metric data is not immediately sent to the metrics system * Metric data is not immediately sent to the metrics system
* each time that <code>update()</code> is called. * each time that <code>update()</code> is called.
* An internal table is maintained, identified by the record name. This * An internal table is maintained, identified by the record name. This
* table has columns * table has columns
* corresponding to the tag and the metric names, and rows * corresponding to the tag and the metric names, and rows
* corresponding to each unique set of tag values. An update * corresponding to each unique set of tag values. An update
* either modifies an existing row in the table, or adds a new row with a set of * either modifies an existing row in the table, or adds a new row with a set of
* tag values that are different from all the other rows. Note that if there * tag values that are different from all the other rows. Note that if there
* are no tags, then there can be at most one row in the table. <p/> * are no tags, then there can be at most one row in the table. <p/>
* *
* Once a row is added to the table, its data will be sent to the metrics system * Once a row is added to the table, its data will be sent to the metrics system
* on every timer period, whether or not it has been updated since the previous * on every timer period, whether or not it has been updated since the previous
* timer period. If this is inappropriate, for example if metrics were being * timer period. If this is inappropriate, for example if metrics were being
* reported by some transient object in an application, the <code>remove()</code> * reported by some transient object in an application, the <code>remove()</code>
* method can be used to remove the row and thus stop the data from being * method can be used to remove the row and thus stop the data from being
* sent.<p/> * sent.<p/>
* *
* Note that the <code>update()</code> method is atomic. This means that it is * Note that the <code>update()</code> method is atomic. This means that it is
* safe for different threads to be updating the same metric. More precisely, * safe for different threads to be updating the same metric. More precisely,
* it is OK for different threads to call <code>update()</code> on MetricsRecord instances * it is OK for different threads to call <code>update()</code> on MetricsRecord instances
* with the same set of tag names and tag values. Different threads should * with the same set of tag names and tag values. Different threads should
* <b>not</b> use the same MetricsRecord instance at the same time. * <b>not</b> use the same MetricsRecord instance at the same time.
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
@InterfaceStability.Evolving @InterfaceStability.Evolving
public interface MetricsRecord { public interface MetricsRecord {
/** /**
* Returns the record name. * Returns the record name.
* *
* @return the record name * @return the record name
*/ */
public abstract String getRecordName(); public abstract String getRecordName();
/** /**
* Sets the named tag to the specified value. The tagValue may be null, * Sets the named tag to the specified value. The tagValue may be null,
* which is treated the same as an empty String. * which is treated the same as an empty String.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public abstract void setTag(String tagName, String tagValue); public abstract void setTag(String tagName, String tagValue);
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public abstract void setTag(String tagName, int tagValue); public abstract void setTag(String tagName, int tagValue);
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public abstract void setTag(String tagName, long tagValue); public abstract void setTag(String tagName, long tagValue);
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public abstract void setTag(String tagName, short tagValue); public abstract void setTag(String tagName, short tagValue);
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public abstract void setTag(String tagName, byte tagValue); public abstract void setTag(String tagName, byte tagValue);
/** /**
* Removes any tag of the specified name. * Removes any tag of the specified name.
* *
* @param tagName name of a tag * @param tagName name of a tag
*/ */
public abstract void removeTag(String tagName); public abstract void removeTag(String tagName);
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void setMetric(String metricName, int metricValue); public abstract void setMetric(String metricName, int metricValue);
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void setMetric(String metricName, long metricValue); public abstract void setMetric(String metricName, long metricValue);
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void setMetric(String metricName, short metricValue); public abstract void setMetric(String metricName, short metricValue);
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void setMetric(String metricName, byte metricValue); public abstract void setMetric(String metricName, byte metricValue);
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void setMetric(String metricName, float metricValue); public abstract void setMetric(String metricName, float metricValue);
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void incrMetric(String metricName, int metricValue); public abstract void incrMetric(String metricName, int metricValue);
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void incrMetric(String metricName, long metricValue); public abstract void incrMetric(String metricName, long metricValue);
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void incrMetric(String metricName, short metricValue); public abstract void incrMetric(String metricName, short metricValue);
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void incrMetric(String metricName, byte metricValue); public abstract void incrMetric(String metricName, byte metricValue);
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public abstract void incrMetric(String metricName, float metricValue); public abstract void incrMetric(String metricName, float metricValue);
/** /**
* Updates the table of buffered data which is to be sent periodically. * Updates the table of buffered data which is to be sent periodically.
* If the tag values match an existing row, that row is updated; * If the tag values match an existing row, that row is updated;
* otherwise, a new row is added. * otherwise, a new row is added.
*/ */
public abstract void update(); public abstract void update();
/** /**
* Removes, from the buffered data table, all rows having tags * Removes, from the buffered data table, all rows having tags
* that equal the tags that have been set on this record. For example, * that equal the tags that have been set on this record. For example,
* if there are no tags on this record, all rows for this record name * if there are no tags on this record, all rows for this record name
* would be removed. Or, if there is a single tag on this record, then * would be removed. Or, if there is a single tag on this record, then
* just rows containing a tag with the same name and value would be removed. * just rows containing a tag with the same name and value would be removed.
*/ */
public abstract void remove(); public abstract void remove();
} }
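The update()/remove() contract documented above is easiest to see from a caller's point of view. Below is a minimal, illustrative sketch (not part of this patch) of an Updater that maintains a single buffered row per period; the class name, tag value and metric names are invented for the example, and the context is assumed to have been initialized elsewhere.

import java.io.IOException;

import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.MetricsRecord;
import org.apache.hadoop.metrics.Updater;

/** Illustrative only: an updater that feeds one buffered row per period. */
public class RequestCountUpdater implements Updater {
  private long requestsSinceLastPeriod = 0;

  /** Wires this updater to an already-initialized context. */
  public void attach(MetricsContext context) throws IOException {
    context.registerUpdater(this);
    context.startMonitoring();
  }

  public synchronized void noteRequest() {
    requestsSinceLastPeriod++;
  }

  /** Called by the context once per period, just before records are emitted. */
  @Override
  public synchronized void doUpdates(MetricsContext context) {
    MetricsRecord record = context.createRecord("requests");
    record.setTag("hostName", "worker-01");            // hypothetical tag value
    record.incrMetric("requestCount", requestsSinceLastPeriod);
    record.update();   // merges into the buffered row keyed by the tag values
    requestsSinceLastPeriod = 0;
    // record.remove() would instead drop every buffered row whose tags
    // match the tags set on this record.
  }
}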
View File
@@ -1,154 +1,154 @@
/* /*
* FileContext.java * FileContext.java
* *
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.metrics.file; package org.apache.hadoop.metrics.file;
import java.io.BufferedOutputStream; import java.io.BufferedOutputStream;
import java.io.File; import java.io.File;
import java.io.FileWriter; import java.io.FileWriter;
import java.io.IOException; import java.io.IOException;
import java.io.PrintWriter; import java.io.PrintWriter;
import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.metrics.ContextFactory; import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.spi.AbstractMetricsContext; import org.apache.hadoop.metrics.spi.AbstractMetricsContext;
import org.apache.hadoop.metrics.spi.OutputRecord; import org.apache.hadoop.metrics.spi.OutputRecord;
/** /**
* Metrics context for writing metrics to a file.<p/> * Metrics context for writing metrics to a file.<p/>
* *
* This class is configured by setting ContextFactory attributes which in turn * This class is configured by setting ContextFactory attributes which in turn
* are usually configured through a properties file. All the attributes are * are usually configured through a properties file. All the attributes are
* prefixed by the contextName. For example, the properties file might contain: * prefixed by the contextName. For example, the properties file might contain:
* <pre> * <pre>
* myContextName.fileName=/tmp/metrics.log * myContextName.fileName=/tmp/metrics.log
* myContextName.period=5 * myContextName.period=5
* </pre> * </pre>
* @see org.apache.hadoop.metrics2.sink.FileSink for metrics 2.0. * @see org.apache.hadoop.metrics2.sink.FileSink for metrics 2.0.
*/ */
@InterfaceAudience.Public @InterfaceAudience.Public
@InterfaceStability.Evolving @InterfaceStability.Evolving
@Deprecated @Deprecated
public class FileContext extends AbstractMetricsContext { public class FileContext extends AbstractMetricsContext {
/* Configuration attribute names */ /* Configuration attribute names */
@InterfaceAudience.Private @InterfaceAudience.Private
protected static final String FILE_NAME_PROPERTY = "fileName"; protected static final String FILE_NAME_PROPERTY = "fileName";
@InterfaceAudience.Private @InterfaceAudience.Private
protected static final String PERIOD_PROPERTY = "period"; protected static final String PERIOD_PROPERTY = "period";
private File file = null; // file for metrics to be written to private File file = null; // file for metrics to be written to
private PrintWriter writer = null; private PrintWriter writer = null;
/** Creates a new instance of FileContext */ /** Creates a new instance of FileContext */
@InterfaceAudience.Private @InterfaceAudience.Private
public FileContext() {} public FileContext() {}
@InterfaceAudience.Private @InterfaceAudience.Private
public void init(String contextName, ContextFactory factory) { public void init(String contextName, ContextFactory factory) {
super.init(contextName, factory); super.init(contextName, factory);
String fileName = getAttribute(FILE_NAME_PROPERTY); String fileName = getAttribute(FILE_NAME_PROPERTY);
if (fileName != null) { if (fileName != null) {
file = new File(fileName); file = new File(fileName);
} }
parseAndSetPeriod(PERIOD_PROPERTY); parseAndSetPeriod(PERIOD_PROPERTY);
} }
/** /**
* Returns the configured file name, or null. * Returns the configured file name, or null.
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
public String getFileName() { public String getFileName() {
if (file == null) { if (file == null) {
return null; return null;
} else { } else {
return file.getName(); return file.getName();
} }
} }
/** /**
* Starts or restarts monitoring, by opening in append-mode, the * Starts or restarts monitoring, by opening in append-mode, the
* file specified by the <code>fileName</code> attribute, * file specified by the <code>fileName</code> attribute,
* if specified. Otherwise the data will be written to standard * if specified. Otherwise the data will be written to standard
* output. * output.
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
public void startMonitoring() public void startMonitoring()
throws IOException throws IOException
{ {
if (file == null) { if (file == null) {
writer = new PrintWriter(new BufferedOutputStream(System.out)); writer = new PrintWriter(new BufferedOutputStream(System.out));
} else { } else {
writer = new PrintWriter(new FileWriter(file, true)); writer = new PrintWriter(new FileWriter(file, true));
} }
super.startMonitoring(); super.startMonitoring();
} }
/** /**
* Stops monitoring, closing the file. * Stops monitoring, closing the file.
* @see #close() * @see #close()
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
public void stopMonitoring() { public void stopMonitoring() {
super.stopMonitoring(); super.stopMonitoring();
if (writer != null) { if (writer != null) {
writer.close(); writer.close();
writer = null; writer = null;
} }
} }
/** /**
* Emits a metrics record to a file. * Emits a metrics record to a file.
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
public void emitRecord(String contextName, String recordName, OutputRecord outRec) { public void emitRecord(String contextName, String recordName, OutputRecord outRec) {
writer.print(contextName); writer.print(contextName);
writer.print("."); writer.print(".");
writer.print(recordName); writer.print(recordName);
String separator = ": "; String separator = ": ";
for (String tagName : outRec.getTagNames()) { for (String tagName : outRec.getTagNames()) {
writer.print(separator); writer.print(separator);
separator = ", "; separator = ", ";
writer.print(tagName); writer.print(tagName);
writer.print("="); writer.print("=");
writer.print(outRec.getTag(tagName)); writer.print(outRec.getTag(tagName));
} }
for (String metricName : outRec.getMetricNames()) { for (String metricName : outRec.getMetricNames()) {
writer.print(separator); writer.print(separator);
separator = ", "; separator = ", ";
writer.print(metricName); writer.print(metricName);
writer.print("="); writer.print("=");
writer.print(outRec.getMetric(metricName)); writer.print(outRec.getMetric(metricName));
} }
writer.println(); writer.println();
} }
/** /**
* Flushes the output writer, forcing updates to disk. * Flushes the output writer, forcing updates to disk.
*/ */
@InterfaceAudience.Private @InterfaceAudience.Private
public void flush() { public void flush() {
writer.flush(); writer.flush();
} }
} }
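For readers unfamiliar with the deprecated metrics v1 API, the configuration described in the class comment can also be supplied programmatically. The sketch below is illustrative only: the context name "demo" and the file path are invented, and ContextFactory.getFactory()/setAttribute are assumed from the same org.apache.hadoop.metrics package (they are not shown in this diff).

import java.io.IOException;

import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.file.FileContext;

public class FileContextDemo {
  public static void main(String[] args) throws IOException {
    // Attributes are read back by FileContext.init() via getAttribute(),
    // so they must be prefixed with the context name ("demo" here).
    ContextFactory factory = ContextFactory.getFactory();
    factory.setAttribute("demo.fileName", "/tmp/metrics.log");
    factory.setAttribute("demo.period", "5");

    FileContext context = new FileContext();
    context.init("demo", factory);
    context.startMonitoring();   // opens /tmp/metrics.log in append mode

    // ... register Updaters and create records here ...

    context.stopMonitoring();    // closes the writer
    context.close();
  }
}

Each period the context then writes one line per buffered row, in the contextName.recordName: tag=value, ..., metric=value format produced by emitRecord above.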
View File
@@ -1,481 +1,481 @@
/* /*
* AbstractMetricsContext.java * AbstractMetricsContext.java
* *
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.metrics.spi; package org.apache.hadoop.metrics.spi;
import java.io.IOException; import java.io.IOException;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.Collection; import java.util.Collection;
import java.util.HashMap; import java.util.HashMap;
import java.util.HashSet; import java.util.HashSet;
import java.util.Iterator; import java.util.Iterator;
import java.util.List; import java.util.List;
import java.util.Map; import java.util.Map;
import java.util.Set; import java.util.Set;
import java.util.Timer; import java.util.Timer;
import java.util.TimerTask; import java.util.TimerTask;
import java.util.TreeMap; import java.util.TreeMap;
import java.util.Map.Entry; import java.util.Map.Entry;
import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.metrics.ContextFactory; import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.MetricsContext; import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.MetricsException; import org.apache.hadoop.metrics.MetricsException;
import org.apache.hadoop.metrics.MetricsRecord; import org.apache.hadoop.metrics.MetricsRecord;
import org.apache.hadoop.metrics.Updater; import org.apache.hadoop.metrics.Updater;
/** /**
* The main class of the Service Provider Interface. This class should be * The main class of the Service Provider Interface. This class should be
* extended in order to integrate the Metrics API with a specific metrics * extended in order to integrate the Metrics API with a specific metrics
* client library. <p/> * client library. <p/>
* *
* This class implements the internal table of metric data, and the timer * This class implements the internal table of metric data, and the timer
* on which data is to be sent to the metrics system. Subclasses must * on which data is to be sent to the metrics system. Subclasses must
* override the abstract <code>emitRecord</code> method in order to transmit * override the abstract <code>emitRecord</code> method in order to transmit
* the data. <p/> * the data. <p/>
*/ */
@InterfaceAudience.Public @InterfaceAudience.Public
@InterfaceStability.Evolving @InterfaceStability.Evolving
public abstract class AbstractMetricsContext implements MetricsContext { public abstract class AbstractMetricsContext implements MetricsContext {
private int period = MetricsContext.DEFAULT_PERIOD; private int period = MetricsContext.DEFAULT_PERIOD;
private Timer timer = null; private Timer timer = null;
private Set<Updater> updaters = new HashSet<Updater>(1); private Set<Updater> updaters = new HashSet<Updater>(1);
private volatile boolean isMonitoring = false; private volatile boolean isMonitoring = false;
private ContextFactory factory = null; private ContextFactory factory = null;
private String contextName = null; private String contextName = null;
@InterfaceAudience.Private @InterfaceAudience.Private
public static class TagMap extends TreeMap<String,Object> { public static class TagMap extends TreeMap<String,Object> {
private static final long serialVersionUID = 3546309335061952993L; private static final long serialVersionUID = 3546309335061952993L;
TagMap() { TagMap() {
super(); super();
} }
TagMap(TagMap orig) { TagMap(TagMap orig) {
super(orig); super(orig);
} }
/** /**
* Returns true if this tagmap contains every tag in other. * Returns true if this tagmap contains every tag in other.
*/ */
public boolean containsAll(TagMap other) { public boolean containsAll(TagMap other) {
for (Map.Entry<String,Object> entry : other.entrySet()) { for (Map.Entry<String,Object> entry : other.entrySet()) {
Object value = get(entry.getKey()); Object value = get(entry.getKey());
if (value == null || !value.equals(entry.getValue())) { if (value == null || !value.equals(entry.getValue())) {
// either key does not exist here, or the value is different // either key does not exist here, or the value is different
return false; return false;
} }
} }
return true; return true;
} }
} }
@InterfaceAudience.Private @InterfaceAudience.Private
public static class MetricMap extends TreeMap<String,Number> { public static class MetricMap extends TreeMap<String,Number> {
private static final long serialVersionUID = -7495051861141631609L; private static final long serialVersionUID = -7495051861141631609L;
MetricMap() { MetricMap() {
super(); super();
} }
MetricMap(MetricMap orig) { MetricMap(MetricMap orig) {
super(orig); super(orig);
} }
} }
static class RecordMap extends HashMap<TagMap,MetricMap> { static class RecordMap extends HashMap<TagMap,MetricMap> {
private static final long serialVersionUID = 259835619700264611L; private static final long serialVersionUID = 259835619700264611L;
} }
private Map<String,RecordMap> bufferedData = new HashMap<String,RecordMap>(); private Map<String,RecordMap> bufferedData = new HashMap<String,RecordMap>();
/** /**
* Creates a new instance of AbstractMetricsContext * Creates a new instance of AbstractMetricsContext
*/ */
protected AbstractMetricsContext() { protected AbstractMetricsContext() {
} }
/** /**
* Initializes the context. * Initializes the context.
*/ */
public void init(String contextName, ContextFactory factory) public void init(String contextName, ContextFactory factory)
{ {
this.contextName = contextName; this.contextName = contextName;
this.factory = factory; this.factory = factory;
} }
/** /**
* Convenience method for subclasses to access factory attributes. * Convenience method for subclasses to access factory attributes.
*/ */
protected String getAttribute(String attributeName) { protected String getAttribute(String attributeName) {
String factoryAttribute = contextName + "." + attributeName; String factoryAttribute = contextName + "." + attributeName;
return (String) factory.getAttribute(factoryAttribute); return (String) factory.getAttribute(factoryAttribute);
} }
/** /**
* Returns an attribute-value map derived from the factory attributes * Returns an attribute-value map derived from the factory attributes
* by finding all factory attributes that begin with * by finding all factory attributes that begin with
* <i>contextName</i>.<i>tableName</i>. The returned map consists of * <i>contextName</i>.<i>tableName</i>. The returned map consists of
* those attributes with the contextName and tableName stripped off. * those attributes with the contextName and tableName stripped off.
*/ */
protected Map<String,String> getAttributeTable(String tableName) { protected Map<String,String> getAttributeTable(String tableName) {
String prefix = contextName + "." + tableName + "."; String prefix = contextName + "." + tableName + ".";
Map<String,String> result = new HashMap<String,String>(); Map<String,String> result = new HashMap<String,String>();
for (String attributeName : factory.getAttributeNames()) { for (String attributeName : factory.getAttributeNames()) {
if (attributeName.startsWith(prefix)) { if (attributeName.startsWith(prefix)) {
String name = attributeName.substring(prefix.length()); String name = attributeName.substring(prefix.length());
String value = (String) factory.getAttribute(attributeName); String value = (String) factory.getAttribute(attributeName);
result.put(name, value); result.put(name, value);
} }
} }
return result; return result;
} }
/** /**
* Returns the context name. * Returns the context name.
*/ */
public String getContextName() { public String getContextName() {
return contextName; return contextName;
} }
/** /**
* Returns the factory by which this context was created. * Returns the factory by which this context was created.
*/ */
public ContextFactory getContextFactory() { public ContextFactory getContextFactory() {
return factory; return factory;
} }
/** /**
* Starts or restarts monitoring, the emitting of metrics records. * Starts or restarts monitoring, the emitting of metrics records.
*/ */
public synchronized void startMonitoring() public synchronized void startMonitoring()
throws IOException { throws IOException {
if (!isMonitoring) { if (!isMonitoring) {
startTimer(); startTimer();
isMonitoring = true; isMonitoring = true;
} }
} }
/** /**
* Stops monitoring. This does not free buffered data. * Stops monitoring. This does not free buffered data.
* @see #close() * @see #close()
*/ */
public synchronized void stopMonitoring() { public synchronized void stopMonitoring() {
if (isMonitoring) { if (isMonitoring) {
stopTimer(); stopTimer();
isMonitoring = false; isMonitoring = false;
} }
} }
/** /**
* Returns true if monitoring is currently in progress. * Returns true if monitoring is currently in progress.
*/ */
public boolean isMonitoring() { public boolean isMonitoring() {
return isMonitoring; return isMonitoring;
} }
/** /**
* Stops monitoring and frees buffered data, returning this * Stops monitoring and frees buffered data, returning this
* object to its initial state. * object to its initial state.
*/ */
public synchronized void close() { public synchronized void close() {
stopMonitoring(); stopMonitoring();
clearUpdaters(); clearUpdaters();
} }
/** /**
* Creates a new AbstractMetricsRecord instance with the given <code>recordName</code>. * Creates a new AbstractMetricsRecord instance with the given <code>recordName</code>.
* Throws an exception if the metrics implementation is configured with a fixed * Throws an exception if the metrics implementation is configured with a fixed
* set of record names and <code>recordName</code> is not in that set. * set of record names and <code>recordName</code> is not in that set.
* *
* @param recordName the name of the record * @param recordName the name of the record
* @throws MetricsException if recordName conflicts with configuration data * @throws MetricsException if recordName conflicts with configuration data
*/ */
public final synchronized MetricsRecord createRecord(String recordName) { public final synchronized MetricsRecord createRecord(String recordName) {
if (bufferedData.get(recordName) == null) { if (bufferedData.get(recordName) == null) {
bufferedData.put(recordName, new RecordMap()); bufferedData.put(recordName, new RecordMap());
} }
return newRecord(recordName); return newRecord(recordName);
} }
/** /**
* Subclasses should override this if they subclass MetricsRecordImpl. * Subclasses should override this if they subclass MetricsRecordImpl.
* @param recordName the name of the record * @param recordName the name of the record
* @return newly created instance of MetricsRecordImpl or subclass * @return newly created instance of MetricsRecordImpl or subclass
*/ */
protected MetricsRecord newRecord(String recordName) { protected MetricsRecord newRecord(String recordName) {
return new MetricsRecordImpl(recordName, this); return new MetricsRecordImpl(recordName, this);
} }
/** /**
* Registers a callback to be called at time intervals determined by * Registers a callback to be called at time intervals determined by
* the configuration. * the configuration.
* *
* @param updater object to be run periodically; it should update * @param updater object to be run periodically; it should update
* some metrics records * some metrics records
*/ */
public synchronized void registerUpdater(final Updater updater) { public synchronized void registerUpdater(final Updater updater) {
if (!updaters.contains(updater)) { if (!updaters.contains(updater)) {
updaters.add(updater); updaters.add(updater);
} }
} }
/** /**
* Removes a callback, if it exists. * Removes a callback, if it exists.
* *
* @param updater object to be removed from the callback list * @param updater object to be removed from the callback list
*/ */
public synchronized void unregisterUpdater(Updater updater) { public synchronized void unregisterUpdater(Updater updater) {
updaters.remove(updater); updaters.remove(updater);
} }
private synchronized void clearUpdaters() { private synchronized void clearUpdaters() {
updaters.clear(); updaters.clear();
} }
/** /**
* Starts timer if it is not already started * Starts timer if it is not already started
*/ */
private synchronized void startTimer() { private synchronized void startTimer() {
if (timer == null) { if (timer == null) {
timer = new Timer("Timer thread for monitoring " + getContextName(), timer = new Timer("Timer thread for monitoring " + getContextName(),
true); true);
TimerTask task = new TimerTask() { TimerTask task = new TimerTask() {
public void run() { public void run() {
try { try {
timerEvent(); timerEvent();
} }
catch (IOException ioe) { catch (IOException ioe) {
ioe.printStackTrace(); ioe.printStackTrace();
} }
} }
}; };
long millis = period * 1000; long millis = period * 1000;
timer.scheduleAtFixedRate(task, millis, millis); timer.scheduleAtFixedRate(task, millis, millis);
} }
} }
/** /**
* Stops timer if it is running * Stops timer if it is running
*/ */
private synchronized void stopTimer() { private synchronized void stopTimer() {
if (timer != null) { if (timer != null) {
timer.cancel(); timer.cancel();
timer = null; timer = null;
} }
} }
/** /**
* Timer callback. * Timer callback.
*/ */
private void timerEvent() throws IOException { private void timerEvent() throws IOException {
if (isMonitoring) { if (isMonitoring) {
Collection<Updater> myUpdaters; Collection<Updater> myUpdaters;
synchronized (this) { synchronized (this) {
myUpdaters = new ArrayList<Updater>(updaters); myUpdaters = new ArrayList<Updater>(updaters);
} }
// Run all the registered updates without holding a lock // Run all the registered updates without holding a lock
// on this context // on this context
for (Updater updater : myUpdaters) { for (Updater updater : myUpdaters) {
try { try {
updater.doUpdates(this); updater.doUpdates(this);
} }
catch (Throwable throwable) { catch (Throwable throwable) {
throwable.printStackTrace(); throwable.printStackTrace();
} }
} }
emitRecords(); emitRecords();
} }
} }
/** /**
* Emits the records. * Emits the records.
*/ */
private synchronized void emitRecords() throws IOException { private synchronized void emitRecords() throws IOException {
for (String recordName : bufferedData.keySet()) { for (String recordName : bufferedData.keySet()) {
RecordMap recordMap = bufferedData.get(recordName); RecordMap recordMap = bufferedData.get(recordName);
synchronized (recordMap) { synchronized (recordMap) {
Set<Entry<TagMap, MetricMap>> entrySet = recordMap.entrySet(); Set<Entry<TagMap, MetricMap>> entrySet = recordMap.entrySet();
for (Entry<TagMap, MetricMap> entry : entrySet) { for (Entry<TagMap, MetricMap> entry : entrySet) {
OutputRecord outRec = new OutputRecord(entry.getKey(), entry.getValue()); OutputRecord outRec = new OutputRecord(entry.getKey(), entry.getValue());
emitRecord(contextName, recordName, outRec); emitRecord(contextName, recordName, outRec);
} }
} }
} }
flush(); flush();
} }
/** /**
* Retrieves all the records managed by this MetricsContext. * Retrieves all the records managed by this MetricsContext.
* Useful for monitoring systems that are polling-based. * Useful for monitoring systems that are polling-based.
* @return A non-null collection of all monitoring records. * @return A non-null collection of all monitoring records.
*/ */
public synchronized Map<String, Collection<OutputRecord>> getAllRecords() { public synchronized Map<String, Collection<OutputRecord>> getAllRecords() {
Map<String, Collection<OutputRecord>> out = new TreeMap<String, Collection<OutputRecord>>(); Map<String, Collection<OutputRecord>> out = new TreeMap<String, Collection<OutputRecord>>();
for (String recordName : bufferedData.keySet()) { for (String recordName : bufferedData.keySet()) {
RecordMap recordMap = bufferedData.get(recordName); RecordMap recordMap = bufferedData.get(recordName);
synchronized (recordMap) { synchronized (recordMap) {
List<OutputRecord> records = new ArrayList<OutputRecord>(); List<OutputRecord> records = new ArrayList<OutputRecord>();
Set<Entry<TagMap, MetricMap>> entrySet = recordMap.entrySet(); Set<Entry<TagMap, MetricMap>> entrySet = recordMap.entrySet();
for (Entry<TagMap, MetricMap> entry : entrySet) { for (Entry<TagMap, MetricMap> entry : entrySet) {
OutputRecord outRec = new OutputRecord(entry.getKey(), entry.getValue()); OutputRecord outRec = new OutputRecord(entry.getKey(), entry.getValue());
records.add(outRec); records.add(outRec);
} }
out.put(recordName, records); out.put(recordName, records);
} }
} }
return out; return out;
} }
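A polling consumer only needs getAllRecords(); the sketch below (invented class name, not part of Hadoop) dumps every buffered row without waiting for the timer.

import java.util.Collection;
import java.util.Map;

import org.apache.hadoop.metrics.spi.AbstractMetricsContext;
import org.apache.hadoop.metrics.spi.OutputRecord;

public final class MetricsPoller {
  private MetricsPoller() {}

  /** Dumps every buffered row, for contexts that are polled rather than pushed. */
  public static void dump(AbstractMetricsContext context) {
    Map<String, Collection<OutputRecord>> all = context.getAllRecords();
    for (Map.Entry<String, Collection<OutputRecord>> e : all.entrySet()) {
      for (OutputRecord rec : e.getValue()) {
        System.out.println(e.getKey() + " tags=" + rec.getTagNames()
            + " metrics=" + rec.getMetricNames());
      }
    }
  }
}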
/** /**
* Sends a record to the metrics system. * Sends a record to the metrics system.
*/ */
protected abstract void emitRecord(String contextName, String recordName, protected abstract void emitRecord(String contextName, String recordName,
OutputRecord outRec) throws IOException; OutputRecord outRec) throws IOException;
/** /**
* Called each period after all records have been emitted, this method does nothing. * Called each period after all records have been emitted, this method does nothing.
* Subclasses may override it in order to perform some kind of flush. * Subclasses may override it in order to perform some kind of flush.
*/ */
protected void flush() throws IOException { protected void flush() throws IOException {
} }
/** /**
* Called by MetricsRecordImpl.update(). Creates or updates a row in * Called by MetricsRecordImpl.update(). Creates or updates a row in
* the internal table of metric data. * the internal table of metric data.
*/ */
protected void update(MetricsRecordImpl record) { protected void update(MetricsRecordImpl record) {
String recordName = record.getRecordName(); String recordName = record.getRecordName();
TagMap tagTable = record.getTagTable(); TagMap tagTable = record.getTagTable();
Map<String,MetricValue> metricUpdates = record.getMetricTable(); Map<String,MetricValue> metricUpdates = record.getMetricTable();
RecordMap recordMap = getRecordMap(recordName); RecordMap recordMap = getRecordMap(recordName);
synchronized (recordMap) { synchronized (recordMap) {
MetricMap metricMap = recordMap.get(tagTable); MetricMap metricMap = recordMap.get(tagTable);
if (metricMap == null) { if (metricMap == null) {
metricMap = new MetricMap(); metricMap = new MetricMap();
TagMap tagMap = new TagMap(tagTable); // clone tags TagMap tagMap = new TagMap(tagTable); // clone tags
recordMap.put(tagMap, metricMap); recordMap.put(tagMap, metricMap);
} }
Set<Entry<String, MetricValue>> entrySet = metricUpdates.entrySet(); Set<Entry<String, MetricValue>> entrySet = metricUpdates.entrySet();
for (Entry<String, MetricValue> entry : entrySet) { for (Entry<String, MetricValue> entry : entrySet) {
String metricName = entry.getKey(); String metricName = entry.getKey();
MetricValue updateValue = entry.getValue(); MetricValue updateValue = entry.getValue();
Number updateNumber = updateValue.getNumber(); Number updateNumber = updateValue.getNumber();
Number currentNumber = metricMap.get(metricName); Number currentNumber = metricMap.get(metricName);
if (currentNumber == null || updateValue.isAbsolute()) { if (currentNumber == null || updateValue.isAbsolute()) {
metricMap.put(metricName, updateNumber); metricMap.put(metricName, updateNumber);
} }
else { else {
Number newNumber = sum(updateNumber, currentNumber); Number newNumber = sum(updateNumber, currentNumber);
metricMap.put(metricName, newNumber); metricMap.put(metricName, newNumber);
} }
} }
} }
} }
private synchronized RecordMap getRecordMap(String recordName) { private synchronized RecordMap getRecordMap(String recordName) {
return bufferedData.get(recordName); return bufferedData.get(recordName);
} }
/** /**
* Adds two numbers, coercing the second to the type of the first. * Adds two numbers, coercing the second to the type of the first.
* *
*/ */
private Number sum(Number a, Number b) { private Number sum(Number a, Number b) {
if (a instanceof Integer) { if (a instanceof Integer) {
return Integer.valueOf(a.intValue() + b.intValue()); return Integer.valueOf(a.intValue() + b.intValue());
} }
else if (a instanceof Float) { else if (a instanceof Float) {
return new Float(a.floatValue() + b.floatValue()); return new Float(a.floatValue() + b.floatValue());
} }
else if (a instanceof Short) { else if (a instanceof Short) {
return Short.valueOf((short)(a.shortValue() + b.shortValue())); return Short.valueOf((short)(a.shortValue() + b.shortValue()));
} }
else if (a instanceof Byte) { else if (a instanceof Byte) {
return Byte.valueOf((byte)(a.byteValue() + b.byteValue())); return Byte.valueOf((byte)(a.byteValue() + b.byteValue()));
} }
else if (a instanceof Long) { else if (a instanceof Long) {
return Long.valueOf((a.longValue() + b.longValue())); return Long.valueOf((a.longValue() + b.longValue()));
} }
else { else {
// should never happen // should never happen
throw new MetricsException("Invalid number type"); throw new MetricsException("Invalid number type");
} }
} }
/** /**
* Called by MetricsRecordImpl.remove(). Removes all matching rows in * Called by MetricsRecordImpl.remove(). Removes all matching rows in
* the internal table of metric data. A row matches if it has the same * the internal table of metric data. A row matches if it has the same
* tag names and values as record, but it may also have additional * tag names and values as record, but it may also have additional
* tags. * tags.
*/ */
protected void remove(MetricsRecordImpl record) { protected void remove(MetricsRecordImpl record) {
String recordName = record.getRecordName(); String recordName = record.getRecordName();
TagMap tagTable = record.getTagTable(); TagMap tagTable = record.getTagTable();
RecordMap recordMap = getRecordMap(recordName); RecordMap recordMap = getRecordMap(recordName);
synchronized (recordMap) { synchronized (recordMap) {
Iterator<TagMap> it = recordMap.keySet().iterator(); Iterator<TagMap> it = recordMap.keySet().iterator();
while (it.hasNext()) { while (it.hasNext()) {
TagMap rowTags = it.next(); TagMap rowTags = it.next();
if (rowTags.containsAll(tagTable)) { if (rowTags.containsAll(tagTable)) {
it.remove(); it.remove();
} }
} }
} }
} }
/** /**
* Returns the timer period. * Returns the timer period.
*/ */
public int getPeriod() { public int getPeriod() {
return period; return period;
} }
/** /**
* Sets the timer period * Sets the timer period
*/ */
protected void setPeriod(int period) { protected void setPeriod(int period) {
this.period = period; this.period = period;
} }
/** /**
* If a period is set in the attribute passed in, override * If a period is set in the attribute passed in, override
* the default with it. * the default with it.
*/ */
protected void parseAndSetPeriod(String attributeName) { protected void parseAndSetPeriod(String attributeName) {
String periodStr = getAttribute(attributeName); String periodStr = getAttribute(attributeName);
if (periodStr != null) { if (periodStr != null) {
int period = 0; int period = 0;
try { try {
period = Integer.parseInt(periodStr); period = Integer.parseInt(periodStr);
} catch (NumberFormatException nfe) { } catch (NumberFormatException nfe) {
} }
if (period <= 0) { if (period <= 0) {
throw new MetricsException("Invalid period: " + periodStr); throw new MetricsException("Invalid period: " + periodStr);
} }
setPeriod(period); setPeriod(period);
} }
} }
} }
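To make the subclassing contract concrete, here is a minimal sketch of a hypothetical context built on AbstractMetricsContext. Only the hooks visible above (init, parseAndSetPeriod, emitRecord, flush) are used; the class itself is invented and not part of Hadoop.

import java.io.IOException;

import org.apache.hadoop.metrics.ContextFactory;
import org.apache.hadoop.metrics.spi.AbstractMetricsContext;
import org.apache.hadoop.metrics.spi.OutputRecord;

/** Illustrative only: a context that emits records to standard error. */
public class StderrContext extends AbstractMetricsContext {

  @Override
  public void init(String contextName, ContextFactory factory) {
    super.init(contextName, factory);
    // Honour an optional "<contextName>.period" attribute, as FileContext does.
    parseAndSetPeriod("period");
  }

  /** Called once per buffered row each period by emitRecords(). */
  @Override
  protected void emitRecord(String contextName, String recordName,
                            OutputRecord outRec) throws IOException {
    StringBuilder line = new StringBuilder(contextName + "." + recordName);
    for (String tag : outRec.getTagNames()) {
      line.append(' ').append(tag).append('=').append(outRec.getTag(tag));
    }
    for (String metric : outRec.getMetricNames()) {
      line.append(' ').append(metric).append('=').append(outRec.getMetric(metric));
    }
    System.err.println(line);
  }

  @Override
  protected void flush() throws IOException {
    System.err.flush();
  }
}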
View File
@@ -1,281 +1,281 @@
/* /*
* MetricsRecordImpl.java * MetricsRecordImpl.java
* *
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.metrics.spi; package org.apache.hadoop.metrics.spi;
import java.util.LinkedHashMap; import java.util.LinkedHashMap;
import java.util.Map; import java.util.Map;
import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.metrics.MetricsException; import org.apache.hadoop.metrics.MetricsException;
import org.apache.hadoop.metrics.MetricsRecord; import org.apache.hadoop.metrics.MetricsRecord;
import org.apache.hadoop.metrics.spi.AbstractMetricsContext.TagMap; import org.apache.hadoop.metrics.spi.AbstractMetricsContext.TagMap;
/** /**
* An implementation of MetricsRecord. Keeps a back-pointer to the context * An implementation of MetricsRecord. Keeps a back-pointer to the context
* from which it was created, and delegates back to it on <code>update</code> * from which it was created, and delegates back to it on <code>update</code>
* and <code>remove()</code>. * and <code>remove()</code>.
*/ */
@InterfaceAudience.Public @InterfaceAudience.Public
@InterfaceStability.Evolving @InterfaceStability.Evolving
public class MetricsRecordImpl implements MetricsRecord { public class MetricsRecordImpl implements MetricsRecord {
private TagMap tagTable = new TagMap(); private TagMap tagTable = new TagMap();
private Map<String,MetricValue> metricTable = new LinkedHashMap<String,MetricValue>(); private Map<String,MetricValue> metricTable = new LinkedHashMap<String,MetricValue>();
private String recordName; private String recordName;
private AbstractMetricsContext context; private AbstractMetricsContext context;
/** Creates a new instance of FileRecord */ /** Creates a new instance of FileRecord */
protected MetricsRecordImpl(String recordName, AbstractMetricsContext context) protected MetricsRecordImpl(String recordName, AbstractMetricsContext context)
{ {
this.recordName = recordName; this.recordName = recordName;
this.context = context; this.context = context;
} }
/** /**
* Returns the record name. * Returns the record name.
* *
* @return the record name * @return the record name
*/ */
public String getRecordName() { public String getRecordName() {
return recordName; return recordName;
} }
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public void setTag(String tagName, String tagValue) { public void setTag(String tagName, String tagValue) {
if (tagValue == null) { if (tagValue == null) {
tagValue = ""; tagValue = "";
} }
tagTable.put(tagName, tagValue); tagTable.put(tagName, tagValue);
} }
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public void setTag(String tagName, int tagValue) { public void setTag(String tagName, int tagValue) {
tagTable.put(tagName, Integer.valueOf(tagValue)); tagTable.put(tagName, Integer.valueOf(tagValue));
} }
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public void setTag(String tagName, long tagValue) { public void setTag(String tagName, long tagValue) {
tagTable.put(tagName, Long.valueOf(tagValue)); tagTable.put(tagName, Long.valueOf(tagValue));
} }
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public void setTag(String tagName, short tagValue) { public void setTag(String tagName, short tagValue) {
tagTable.put(tagName, Short.valueOf(tagValue)); tagTable.put(tagName, Short.valueOf(tagValue));
} }
/** /**
* Sets the named tag to the specified value. * Sets the named tag to the specified value.
* *
* @param tagName name of the tag * @param tagName name of the tag
* @param tagValue new value of the tag * @param tagValue new value of the tag
* @throws MetricsException if the tagName conflicts with the configuration * @throws MetricsException if the tagName conflicts with the configuration
*/ */
public void setTag(String tagName, byte tagValue) { public void setTag(String tagName, byte tagValue) {
tagTable.put(tagName, Byte.valueOf(tagValue)); tagTable.put(tagName, Byte.valueOf(tagValue));
} }
/** /**
* Removes any tag of the specified name. * Removes any tag of the specified name.
*/ */
public void removeTag(String tagName) { public void removeTag(String tagName) {
tagTable.remove(tagName); tagTable.remove(tagName);
} }
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void setMetric(String metricName, int metricValue) { public void setMetric(String metricName, int metricValue) {
setAbsolute(metricName, Integer.valueOf(metricValue)); setAbsolute(metricName, Integer.valueOf(metricValue));
} }
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void setMetric(String metricName, long metricValue) { public void setMetric(String metricName, long metricValue) {
setAbsolute(metricName, Long.valueOf(metricValue)); setAbsolute(metricName, Long.valueOf(metricValue));
} }
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void setMetric(String metricName, short metricValue) { public void setMetric(String metricName, short metricValue) {
setAbsolute(metricName, Short.valueOf(metricValue)); setAbsolute(metricName, Short.valueOf(metricValue));
} }
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void setMetric(String metricName, byte metricValue) { public void setMetric(String metricName, byte metricValue) {
setAbsolute(metricName, Byte.valueOf(metricValue)); setAbsolute(metricName, Byte.valueOf(metricValue));
} }
/** /**
* Sets the named metric to the specified value. * Sets the named metric to the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue new value of the metric * @param metricValue new value of the metric
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void setMetric(String metricName, float metricValue) { public void setMetric(String metricName, float metricValue) {
setAbsolute(metricName, new Float(metricValue)); setAbsolute(metricName, new Float(metricValue));
} }
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void incrMetric(String metricName, int metricValue) { public void incrMetric(String metricName, int metricValue) {
setIncrement(metricName, Integer.valueOf(metricValue)); setIncrement(metricName, Integer.valueOf(metricValue));
} }
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void incrMetric(String metricName, long metricValue) { public void incrMetric(String metricName, long metricValue) {
setIncrement(metricName, Long.valueOf(metricValue)); setIncrement(metricName, Long.valueOf(metricValue));
} }
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void incrMetric(String metricName, short metricValue) { public void incrMetric(String metricName, short metricValue) {
setIncrement(metricName, Short.valueOf(metricValue)); setIncrement(metricName, Short.valueOf(metricValue));
} }
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void incrMetric(String metricName, byte metricValue) { public void incrMetric(String metricName, byte metricValue) {
setIncrement(metricName, Byte.valueOf(metricValue)); setIncrement(metricName, Byte.valueOf(metricValue));
} }
/** /**
* Increments the named metric by the specified value. * Increments the named metric by the specified value.
* *
* @param metricName name of the metric * @param metricName name of the metric
* @param metricValue incremental value * @param metricValue incremental value
* @throws MetricsException if the metricName or the type of the metricValue * @throws MetricsException if the metricName or the type of the metricValue
* conflicts with the configuration * conflicts with the configuration
*/ */
public void incrMetric(String metricName, float metricValue) { public void incrMetric(String metricName, float metricValue) {
setIncrement(metricName, new Float(metricValue)); setIncrement(metricName, new Float(metricValue));
} }
private void setAbsolute(String metricName, Number metricValue) { private void setAbsolute(String metricName, Number metricValue) {
metricTable.put(metricName, new MetricValue(metricValue, MetricValue.ABSOLUTE)); metricTable.put(metricName, new MetricValue(metricValue, MetricValue.ABSOLUTE));
} }
private void setIncrement(String metricName, Number metricValue) { private void setIncrement(String metricName, Number metricValue) {
metricTable.put(metricName, new MetricValue(metricValue, MetricValue.INCREMENT)); metricTable.put(metricName, new MetricValue(metricValue, MetricValue.INCREMENT));
} }
/** /**
* Updates the table of buffered data which is to be sent periodically. * Updates the table of buffered data which is to be sent periodically.
* If the tag values match an existing row, that row is updated; * If the tag values match an existing row, that row is updated;
* otherwise, a new row is added. * otherwise, a new row is added.
*/ */
public void update() { public void update() {
context.update(this); context.update(this);
} }
/** /**
* Removes the row, if it exists, in the buffered data table having tags * Removes the row, if it exists, in the buffered data table having tags
* that equal the tags that have been set on this record. * that equal the tags that have been set on this record.
*/ */
public void remove() { public void remove() {
context.remove(this); context.remove(this);
} }
TagMap getTagTable() { TagMap getTagTable() {
return tagTable; return tagTable;
} }
Map<String, MetricValue> getMetricTable() { Map<String, MetricValue> getMetricTable() {
return metricTable; return metricTable;
} }
} }
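As a usage sketch (record, tag and metric names invented, not part of this patch), the difference between absolute and incremental values, and the way buffered rows are keyed by their tags, looks like this from calling code:

import org.apache.hadoop.metrics.MetricsContext;
import org.apache.hadoop.metrics.MetricsRecord;

public final class RecordSemanticsDemo {
  private RecordSemanticsDemo() {}

  /** Buffers two rows for the "rpc" record, keyed by the "port" tag. */
  public static void bufferSamples(MetricsContext context) {
    MetricsRecord r1 = context.createRecord("rpc");
    r1.setTag("port", 8020);
    r1.setMetric("queueLength", 3);   // ABSOLUTE: replaces any previous value
    r1.incrMetric("callCount", 10);   // INCREMENT: added to the buffered value
    r1.update();

    MetricsRecord r2 = context.createRecord("rpc");
    r2.setTag("port", 50020);         // different tag value => separate row
    r2.incrMetric("callCount", 4);
    r2.update();

    // r1.remove() would delete every buffered "rpc" row whose tags include
    // port=8020 (rows may carry additional tags and still match).
  }
}

When the context's timer fires, each buffered row is turned into an OutputRecord and handed to emitRecord().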
View File
@@ -1,460 +1,460 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.util; package org.apache.hadoop.util;
import java.io.DataInputStream; import java.io.DataInputStream;
import java.io.DataOutputStream; import java.io.DataOutputStream;
import java.io.IOException; import java.io.IOException;
import java.nio.ByteBuffer; import java.nio.ByteBuffer;
import java.util.zip.Checksum; import java.util.zip.Checksum;
import org.apache.hadoop.classification.InterfaceAudience; import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability; import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.fs.ChecksumException; import org.apache.hadoop.fs.ChecksumException;
/** /**
* This class provides an interface and utilities for processing checksums for * This class provides an interface and utilities for processing checksums for
* DFS data transfers. * DFS data transfers.
*/ */
@InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"}) @InterfaceAudience.LimitedPrivate({"HDFS", "MapReduce"})
@InterfaceStability.Evolving @InterfaceStability.Evolving
public class DataChecksum implements Checksum {

  // Misc constants
  public static final int HEADER_LEN = 5; /// 1 byte type and 4 byte len

  // checksum types
  public static final int CHECKSUM_NULL    = 0;
  public static final int CHECKSUM_CRC32   = 1;
  public static final int CHECKSUM_CRC32C  = 2;
  public static final int CHECKSUM_DEFAULT = 3;
  public static final int CHECKSUM_MIXED   = 4;

  /** The checksum types */
  public static enum Type {
    NULL  (CHECKSUM_NULL, 0),
    CRC32 (CHECKSUM_CRC32, 4),
    CRC32C(CHECKSUM_CRC32C, 4),
    DEFAULT(CHECKSUM_DEFAULT, 0), // This cannot be used to create DataChecksum
    MIXED (CHECKSUM_MIXED, 0);    // This cannot be used to create DataChecksum

    public final int id;
    public final int size;

    private Type(int id, int size) {
      this.id = id;
      this.size = size;
    }

    /** @return the type corresponding to the id. */
    public static Type valueOf(int id) {
      if (id < 0 || id >= values().length) {
        throw new IllegalArgumentException("id=" + id
            + " out of range [0, " + values().length + ")");
      }
      return values()[id];
    }
  }

  public static DataChecksum newDataChecksum(Type type, int bytesPerChecksum ) {
    if ( bytesPerChecksum <= 0 ) {
      return null;
    }

    switch ( type ) {
    case NULL :
      return new DataChecksum(type, new ChecksumNull(), bytesPerChecksum );
    case CRC32 :
      return new DataChecksum(type, new PureJavaCrc32(), bytesPerChecksum );
    case CRC32C:
      return new DataChecksum(type, new PureJavaCrc32C(), bytesPerChecksum);
    default:
      return null;
    }
  }

  /**
   * Creates a DataChecksum from HEADER_LEN bytes from arr[offset].
   * @return DataChecksum of the type in the array or null in case of an error.
   */
  public static DataChecksum newDataChecksum( byte bytes[], int offset ) {
    if ( offset < 0 || bytes.length < offset + HEADER_LEN ) {
      return null;
    }

    // like readInt():
    int bytesPerChecksum = ( (bytes[offset+1] & 0xff) << 24 ) |
                           ( (bytes[offset+2] & 0xff) << 16 ) |
                           ( (bytes[offset+3] & 0xff) << 8 )  |
                           ( (bytes[offset+4] & 0xff) );
    return newDataChecksum( Type.valueOf(bytes[0]), bytesPerChecksum );
  }
  /**
   * This constructs a DataChecksum by reading HEADER_LEN bytes from
   * input stream <i>in</i>
   */
  public static DataChecksum newDataChecksum( DataInputStream in )
                                 throws IOException {
    int type = in.readByte();
    int bpc = in.readInt();
    DataChecksum summer = newDataChecksum(Type.valueOf(type), bpc );
    if ( summer == null ) {
      throw new IOException( "Could not create DataChecksum of type " +
                             type + " with bytesPerChecksum " + bpc );
    }
    return summer;
  }
  /**
   * Writes the checksum header to the output stream <i>out</i>.
   */
  public void writeHeader( DataOutputStream out )
                           throws IOException {
    out.writeByte( type.id );
    out.writeInt( bytesPerChecksum );
  }

  public byte[] getHeader() {
    byte[] header = new byte[DataChecksum.HEADER_LEN];
    header[0] = (byte) (type.id & 0xff);
    // Writing in buffer just like DataOutput.WriteInt()
    header[1+0] = (byte) ((bytesPerChecksum >>> 24) & 0xff);
    header[1+1] = (byte) ((bytesPerChecksum >>> 16) & 0xff);
    header[1+2] = (byte) ((bytesPerChecksum >>> 8) & 0xff);
    header[1+3] = (byte) (bytesPerChecksum & 0xff);
    return header;
  }

  /**
   * Writes the current checksum to the stream.
   * If <i>reset</i> is true, then resets the checksum.
   * @return number of bytes written. Will be equal to getChecksumSize();
   */
  public int writeValue( DataOutputStream out, boolean reset )
                         throws IOException {
    if ( type.size <= 0 ) {
      return 0;
    }

    if ( type.size == 4 ) {
      out.writeInt( (int) summer.getValue() );
    } else {
      throw new IOException( "Unknown Checksum " + type );
    }

    if ( reset ) {
      reset();
    }

    return type.size;
  }

  /**
   * Writes the current checksum to a buffer.
   * If <i>reset</i> is true, then resets the checksum.
   * @return number of bytes written. Will be equal to getChecksumSize();
   */
  public int writeValue( byte[] buf, int offset, boolean reset )
                         throws IOException {
    if ( type.size <= 0 ) {
      return 0;
    }

    if ( type.size == 4 ) {
      int checksum = (int) summer.getValue();
      buf[offset+0] = (byte) ((checksum >>> 24) & 0xff);
      buf[offset+1] = (byte) ((checksum >>> 16) & 0xff);
      buf[offset+2] = (byte) ((checksum >>> 8) & 0xff);
      buf[offset+3] = (byte) (checksum & 0xff);
    } else {
      throw new IOException( "Unknown Checksum " + type );
    }

    if ( reset ) {
      reset();
    }

    return type.size;
  }

  /**
   * Compares the checksum located at buf[offset] with the current checksum.
   * @return true if the checksum matches and false otherwise.
   */
  public boolean compare( byte buf[], int offset ) {
    if ( type.size == 4 ) {
      int checksum = ( (buf[offset+0] & 0xff) << 24 ) |
                     ( (buf[offset+1] & 0xff) << 16 ) |
                     ( (buf[offset+2] & 0xff) << 8 )  |
                     ( (buf[offset+3] & 0xff) );
      return checksum == (int) summer.getValue();
    }
    return type.size == 0;
  }

  private final Type type;
  private final Checksum summer;
  private final int bytesPerChecksum;
  private int inSum = 0;

  private DataChecksum( Type type, Checksum checksum, int chunkSize ) {
    this.type = type;
    summer = checksum;
    bytesPerChecksum = chunkSize;
  }

  // Accessors
  public Type getChecksumType() {
    return type;
  }
  public int getChecksumSize() {
    return type.size;
  }
  public int getBytesPerChecksum() {
    return bytesPerChecksum;
  }
  public int getNumBytesInSum() {
    return inSum;
  }

  public static final int SIZE_OF_INTEGER = Integer.SIZE / Byte.SIZE;
  static public int getChecksumHeaderSize() {
    return 1 + SIZE_OF_INTEGER; // type byte, bytesPerChecksum int
  }

  //Checksum Interface. Just a wrapper around member summer.
  @Override
  public long getValue() {
    return summer.getValue();
  }
  @Override
  public void reset() {
    summer.reset();
    inSum = 0;
  }
  @Override
  public void update( byte[] b, int off, int len ) {
    if ( len > 0 ) {
      summer.update( b, off, len );
      inSum += len;
    }
  }
  @Override
  public void update( int b ) {
    summer.update( b );
    inSum += 1;
  }
  /**
   * Verify that the given checksums match the given data.
   *
   * The 'mark' of the ByteBuffer parameters may be modified by this function,
   * but the position is maintained.
   *
   * @param data the DirectByteBuffer pointing to the data to verify.
   * @param checksums the DirectByteBuffer pointing to a series of stored
   *                  checksums
   * @param fileName the name of the file being read, for error-reporting
   * @param basePos the file position to which the start of 'data' corresponds
   * @throws ChecksumException if the checksums do not match
   */
  public void verifyChunkedSums(ByteBuffer data, ByteBuffer checksums,
      String fileName, long basePos)
  throws ChecksumException {
    if (type.size == 0) return;

    if (data.hasArray() && checksums.hasArray()) {
      verifyChunkedSums(
          data.array(), data.arrayOffset() + data.position(), data.remaining(),
          checksums.array(), checksums.arrayOffset() + checksums.position(),
          fileName, basePos);
      return;
    }
    if (NativeCrc32.isAvailable()) {
      NativeCrc32.verifyChunkedSums(bytesPerChecksum, type.id, checksums, data,
          fileName, basePos);
      return;
    }

    int startDataPos = data.position();
    data.mark();
    checksums.mark();
    try {
      byte[] buf = new byte[bytesPerChecksum];
      byte[] sum = new byte[type.size];
      while (data.remaining() > 0) {
        int n = Math.min(data.remaining(), bytesPerChecksum);
        checksums.get(sum);
        data.get(buf, 0, n);
        summer.reset();
        summer.update(buf, 0, n);
        int calculated = (int)summer.getValue();
        int stored = (sum[0] << 24 & 0xff000000) |
          (sum[1] << 16 & 0xff0000) |
          (sum[2] << 8 & 0xff00) |
          sum[3] & 0xff;
        if (calculated != stored) {
          long errPos = basePos + data.position() - startDataPos - n;
          throw new ChecksumException(
              "Checksum error: "+ fileName + " at "+ errPos +
              " exp: " + stored + " got: " + calculated, errPos);
        }
      }
    } finally {
      data.reset();
      checksums.reset();
    }
  }

  /**
   * Implementation of chunked verification specifically on byte arrays. This
   * is to avoid the copy when dealing with ByteBuffers that have array backing.
   */
  private void verifyChunkedSums(
      byte[] data, int dataOff, int dataLen,
      byte[] checksums, int checksumsOff, String fileName,
      long basePos) throws ChecksumException {

    int remaining = dataLen;
    int dataPos = 0;
    while (remaining > 0) {
      int n = Math.min(remaining, bytesPerChecksum);

      summer.reset();
      summer.update(data, dataOff + dataPos, n);
      dataPos += n;
      remaining -= n;

      int calculated = (int)summer.getValue();
      int stored = (checksums[checksumsOff] << 24 & 0xff000000) |
        (checksums[checksumsOff + 1] << 16 & 0xff0000) |
        (checksums[checksumsOff + 2] << 8 & 0xff00) |
        checksums[checksumsOff + 3] & 0xff;
      checksumsOff += 4;

      if (calculated != stored) {
        long errPos = basePos + dataPos - n;
        throw new ChecksumException(
            "Checksum error: "+ fileName + " at "+ errPos +
            " exp: " + stored + " got: " + calculated, errPos);
      }
    }
  }
  /**
   * Calculate checksums for the given data.
   *
   * The 'mark' of the ByteBuffer parameters may be modified by this function,
   * but the position is maintained.
   *
   * @param data the DirectByteBuffer pointing to the data to checksum.
   * @param checksums the DirectByteBuffer into which checksums will be
   *                  stored. Enough space must be available in this
   *                  buffer to put the checksums.
   */
  public void calculateChunkedSums(ByteBuffer data, ByteBuffer checksums) {
    if (type.size == 0) return;

    if (data.hasArray() && checksums.hasArray()) {
      calculateChunkedSums(data.array(), data.arrayOffset() + data.position(), data.remaining(),
          checksums.array(), checksums.arrayOffset() + checksums.position());
      return;
    }

    data.mark();
    checksums.mark();
    try {
      byte[] buf = new byte[bytesPerChecksum];
      while (data.remaining() > 0) {
        int n = Math.min(data.remaining(), bytesPerChecksum);
        data.get(buf, 0, n);
        summer.reset();
        summer.update(buf, 0, n);
        checksums.putInt((int)summer.getValue());
      }
    } finally {
      data.reset();
      checksums.reset();
    }
  }

  /**
   * Implementation of chunked calculation specifically on byte arrays. This
   * is to avoid the copy when dealing with ByteBuffers that have array backing.
   */
  private void calculateChunkedSums(
      byte[] data, int dataOffset, int dataLength,
      byte[] sums, int sumsOffset) {

    int remaining = dataLength;
    while (remaining > 0) {
      int n = Math.min(remaining, bytesPerChecksum);
      summer.reset();
      summer.update(data, dataOffset, n);
      dataOffset += n;
      remaining -= n;
      long calculated = summer.getValue();
      sums[sumsOffset++] = (byte) (calculated >> 24);
      sums[sumsOffset++] = (byte) (calculated >> 16);
      sums[sumsOffset++] = (byte) (calculated >> 8);
      sums[sumsOffset++] = (byte) (calculated);
    }
  }

  @Override
  public boolean equals(Object other) {
    if (!(other instanceof DataChecksum)) {
      return false;
    }
    DataChecksum o = (DataChecksum)other;
    return o.bytesPerChecksum == this.bytesPerChecksum &&
      o.type == this.type;
  }

  @Override
  public int hashCode() {
    return (this.type.id + 31) * this.bytesPerChecksum;
  }

  @Override
  public String toString() {
    return "DataChecksum(type=" + type +
      ", chunkSize=" + bytesPerChecksum + ")";
  }
  /**
   * This just provides a dummy implementation for the Checksum class.
   * This is used when there is no checksum available or required for
   * data.
   */
  static class ChecksumNull implements Checksum {

    public ChecksumNull() {}

    //Dummy interface
    @Override
    public long getValue() { return 0; }
    @Override
    public void reset() {}
    @Override
    public void update(byte[] b, int off, int len) {}
    @Override
    public void update(int b) {}
  };
}
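To make the header and chunked-sum API above concrete, a minimal usage sketch follows. It is not part of this patch: the org.apache.hadoop.util package, the org.apache.hadoop.fs.ChecksumException import, the 512-byte chunk size, and the file name are assumptions made for illustration only.

import java.nio.ByteBuffer;

import org.apache.hadoop.fs.ChecksumException;
import org.apache.hadoop.util.DataChecksum;

public class DataChecksumSketch {
  public static void main(String[] args) throws ChecksumException {
    // A CRC32C summer producing one 4-byte checksum per 512-byte chunk.
    DataChecksum summer =
        DataChecksum.newDataChecksum(DataChecksum.Type.CRC32C, 512);

    // The 5-byte header: 1 byte type id followed by bytesPerChecksum as an int.
    byte[] header = summer.getHeader();

    // Two 512-byte chunks of data and room for two 4-byte checksums.
    ByteBuffer data = ByteBuffer.wrap(new byte[1024]);
    ByteBuffer sums = ByteBuffer.allocate(2 * summer.getChecksumSize());

    summer.calculateChunkedSums(data, sums);                 // fill 'sums'
    summer.verifyChunkedSums(data, sums, "sketch.dat", 0L);  // throws on mismatch

    System.out.println(summer + ", header bytes: " + header.length);
  }
}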


@@ -48,6 +48,9 @@ Release 2.0.3-alpha - Unreleased
HDFS-4041. Hadoop HDFS Maven protoc calls must not depend on external
sh script. (Chris Nauroth via suresh)
HADOOP-8911. CRLF characters in source and text files.
(Raja Aluri via suresh)
OPTIMIZATIONS
BUG FIXES


@@ -1,110 +1,110 @@
<?xml version="1.0"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements. See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
          "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
  <title>C API libhdfs</title>
  <meta name="http-equiv">Content-Type</meta>
  <meta name="content">text/html;</meta>
  <meta name="charset">utf-8</meta>
</header>
<body>
  <section>
    <title>Overview</title>
    <p>
      libhdfs is a JNI based C API for Hadoop's Distributed File System (HDFS).
      It provides C APIs to a subset of the HDFS APIs to manipulate HDFS files and
      the filesystem. libhdfs is part of the Hadoop distribution and comes
      pre-compiled in ${HADOOP_PREFIX}/libhdfs/libhdfs.so .
    </p>
  </section>
  <section>
    <title>The APIs</title>
    <p>
      The libhdfs APIs are a subset of: <a href="api/org/apache/hadoop/fs/FileSystem.html" >hadoop fs APIs</a>.
    </p>
    <p>
      The header file for libhdfs describes each API in detail and is available in ${HADOOP_PREFIX}/src/c++/libhdfs/hdfs.h
    </p>
  </section>
  <section>
    <title>A Sample Program</title>
    <source>
    #include "hdfs.h"

    int main(int argc, char **argv) {
      hdfsFS fs = hdfsConnect("default", 0);
      const char* writePath = "/tmp/testfile.txt";
      hdfsFile writeFile = hdfsOpenFile(fs, writePath, O_WRONLY|O_CREAT, 0, 0, 0);
      if(!writeFile) {
        fprintf(stderr, "Failed to open %s for writing!\n", writePath);
        exit(-1);
      }
      char* buffer = "Hello, World!";
      tSize num_written_bytes = hdfsWrite(fs, writeFile, (void*)buffer, strlen(buffer)+1);
      if (hdfsFlush(fs, writeFile)) {
        fprintf(stderr, "Failed to 'flush' %s\n", writePath);
        exit(-1);
      }
      hdfsCloseFile(fs, writeFile);
    }
    </source>
  </section>
  <section>
    <title>How To Link With The Library</title>
    <p>
      See the Makefile for hdfs_test.c in the libhdfs source directory (${HADOOP_PREFIX}/src/c++/libhdfs/Makefile) or something like:<br />
      gcc above_sample.c -I${HADOOP_PREFIX}/src/c++/libhdfs -L${HADOOP_PREFIX}/libhdfs -lhdfs -o above_sample
    </p>
  </section>
  <section>
    <title>Common Problems</title>
    <p>
      The most common problem is that the CLASSPATH is not set properly when calling a program that uses libhdfs.
      Make sure you set it to all the Hadoop jars needed to run Hadoop itself. Currently, there is no way to
      programmatically generate the classpath, but a good bet is to include all the jar files in ${HADOOP_PREFIX}
      and ${HADOOP_PREFIX}/lib as well as the right configuration directory containing hdfs-site.xml
    </p>
  </section>
  <section>
    <title>Thread Safe</title>
    <p>libhdfs is thread safe.</p>
    <ul>
      <li>Concurrency and Hadoop FS "handles"
        <br />The Hadoop FS implementation includes a FS handle cache which caches based on the URI of the
        namenode along with the user connecting. So, all calls to hdfsConnect will return the same handle but
        calls to hdfsConnectAsUser with different users will return different handles. But, since HDFS client
        handles are completely thread safe, this has no bearing on concurrency.
      </li>
      <li>Concurrency and libhdfs/JNI
        <br />The libhdfs calls to JNI should always be creating thread local storage, so (in theory), libhdfs
        should be as thread safe as the underlying calls to the Hadoop FS.
      </li>
    </ul>
  </section>
</body>
</document>


@@ -17,6 +17,9 @@ Release 2.0.3-alpha - Unreleased
MAPREDUCE-4616. Improve javadoc for MultipleOutputs. (Tony Burton via
acmurthy)
HADOOP-8911. CRLF characters in source and text files.
(Raja Aluri via suresh)
OPTIMIZATIONS
BUG FIXES


@@ -1,120 +1,120 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapreduce;

import java.io.IOException;

import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.LocalJobRunner;
import org.apache.hadoop.mapreduce.server.jobtracker.JTConfig;
import org.junit.Test;

public class TestClientProtocolProviderImpls extends TestCase {

  @Test
  public void testClusterWithLocalClientProvider() throws Exception {
    Configuration conf = new Configuration();

    try {
      conf.set(MRConfig.FRAMEWORK_NAME, "incorrect");
      new Cluster(conf);
      fail("Cluster should not be initialized with incorrect framework name");
    } catch (IOException e) {
    }

    try {
      conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
      conf.set(JTConfig.JT_IPC_ADDRESS, "127.0.0.1:0");
      new Cluster(conf);
      fail("Cluster with Local Framework name should use local JT address");
    } catch (IOException e) {
    }

    try {
      conf.set(JTConfig.JT_IPC_ADDRESS, "local");
      Cluster cluster = new Cluster(conf);
      assertTrue(cluster.getClient() instanceof LocalJobRunner);
      cluster.close();
    } catch (IOException e) {
    }
  }

  @Test
  public void testClusterWithJTClientProvider() throws Exception {
    Configuration conf = new Configuration();

    try {
      conf.set(MRConfig.FRAMEWORK_NAME, "incorrect");
      new Cluster(conf);
      fail("Cluster should not be initialized with incorrect framework name");
    } catch (IOException e) {
    }

    try {
      conf.set(MRConfig.FRAMEWORK_NAME, "classic");
      conf.set(JTConfig.JT_IPC_ADDRESS, "local");
      new Cluster(conf);
      fail("Cluster with classic Framework name shouldnot use local JT address");
    } catch (IOException e) {
    }

    try {
      conf = new Configuration();
      conf.set(MRConfig.FRAMEWORK_NAME, "classic");
      conf.set(JTConfig.JT_IPC_ADDRESS, "127.0.0.1:0");
      Cluster cluster = new Cluster(conf);
      cluster.close();
    } catch (IOException e) {
    }
  }

  @Test
  public void testClusterException() {
    Configuration conf = new Configuration();
    conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.CLASSIC_FRAMEWORK_NAME);
    conf.set(JTConfig.JT_IPC_ADDRESS, "local");

    // initializing a cluster with this conf should throw an error.
    // However the exception thrown should not be specific to either
    // the job tracker client provider or the local provider
    boolean errorThrown = false;
    try {
      Cluster cluster = new Cluster(conf);
      cluster.close();
      fail("Not expected - cluster init should have failed");
    } catch (IOException e) {
      errorThrown = true;
      assert(e.getMessage().contains("Cannot initialize Cluster. Please check"));
    }
    assert(errorThrown);
  }
}
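The framework-name lookup this test exercises can also be driven directly from client code. Below is a hypothetical sketch, not part of this patch, under the assumptions that it compiles in the same org.apache.hadoop.mapreduce package as the test (so getClient() is accessible regardless of its visibility) and that the configuration keys behave as the test above expects.

package org.apache.hadoop.mapreduce;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.LocalJobRunner;
import org.apache.hadoop.mapreduce.server.jobtracker.JTConfig;

public class LocalClusterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The same keys the test sets: the "local" framework name plus a "local"
    // JT address make Cluster resolve its client to a LocalJobRunner.
    conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
    conf.set(JTConfig.JT_IPC_ADDRESS, "local");

    Cluster cluster = new Cluster(conf);   // provider lookup happens here
    System.out.println("local runner? "
        + (cluster.getClient() instanceof LocalJobRunner));
    cluster.close();
  }
}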


@@ -1,129 +1,129 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.mapreduce;

import static org.mockito.Matchers.any;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.io.IOException;
import java.nio.ByteBuffer;

import junit.framework.TestCase;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.LocalJobRunner;
import org.apache.hadoop.mapred.ResourceMgrDelegate;
import org.apache.hadoop.mapred.YARNRunner;
import org.apache.hadoop.mapreduce.protocol.ClientProtocol;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.yarn.api.ClientRMProtocol;
import org.apache.hadoop.yarn.api.protocolrecords.GetDelegationTokenRequest;
import org.apache.hadoop.yarn.api.protocolrecords.GetDelegationTokenResponse;
import org.apache.hadoop.yarn.api.records.DelegationToken;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.factories.RecordFactory;
import org.apache.hadoop.yarn.factory.providers.RecordFactoryProvider;
import org.junit.Test;

public class TestYarnClientProtocolProvider extends TestCase {

  private static final RecordFactory recordFactory = RecordFactoryProvider.
      getRecordFactory(null);

  @Test
  public void testClusterWithYarnClientProtocolProvider() throws Exception {
    Configuration conf = new Configuration(false);
    Cluster cluster = null;

    try {
      cluster = new Cluster(conf);
    } catch (Exception e) {
      throw new Exception(
          "Failed to initialize a local runner w/o a cluster framework key", e);
    }

    try {
      assertTrue("client is not a LocalJobRunner",
          cluster.getClient() instanceof LocalJobRunner);
    } finally {
      if (cluster != null) {
        cluster.close();
      }
    }

    try {
      conf = new Configuration();
      conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.YARN_FRAMEWORK_NAME);
      cluster = new Cluster(conf);
      ClientProtocol client = cluster.getClient();
      assertTrue("client is a YARNRunner", client instanceof YARNRunner);
    } catch (IOException e) {
    } finally {
      if (cluster != null) {
        cluster.close();
      }
    }
  }

  @Test
  public void testClusterGetDelegationToken() throws Exception {
    Configuration conf = new Configuration(false);
    Cluster cluster = null;
    try {
      conf = new Configuration();
      conf.set(MRConfig.FRAMEWORK_NAME, MRConfig.YARN_FRAMEWORK_NAME);
      cluster = new Cluster(conf);
      YARNRunner yrunner = (YARNRunner) cluster.getClient();
      GetDelegationTokenResponse getDTResponse =
          recordFactory.newRecordInstance(GetDelegationTokenResponse.class);
      DelegationToken rmDTToken = recordFactory.newRecordInstance(
          DelegationToken.class);
      rmDTToken.setIdentifier(ByteBuffer.wrap(new byte[2]));
      rmDTToken.setKind("Testclusterkind");
      rmDTToken.setPassword(ByteBuffer.wrap("testcluster".getBytes()));
      rmDTToken.setService("0.0.0.0:8032");
      getDTResponse.setRMDelegationToken(rmDTToken);
      final ClientRMProtocol cRMProtocol = mock(ClientRMProtocol.class);
      when(cRMProtocol.getDelegationToken(any(
          GetDelegationTokenRequest.class))).thenReturn(getDTResponse);
      ResourceMgrDelegate rmgrDelegate = new ResourceMgrDelegate(
          new YarnConfiguration(conf)) {
        @Override
        public synchronized void start() {
          this.rmClient = cRMProtocol;
        }
      };
      yrunner.setResourceMgrDelegate(rmgrDelegate);
      Token t = cluster.getDelegationToken(new Text(" "));
      assertTrue("Token kind is instead " + t.getKind().toString(),
          "Testclusterkind".equals(t.getKind().toString()));
    } finally {
      if (cluster != null) {
        cluster.close();
      }
    }
  }
}


@@ -1,196 +1,196 @@
package org.apache.hadoop.examples;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordMean extends Configured implements Tool {

  private double mean = 0;

  private final static Text COUNT = new Text("count");
  private final static Text LENGTH = new Text("length");
  private final static LongWritable ONE = new LongWritable(1);

  /**
   * Maps words from line of text into 2 key-value pairs; one key-value pair for
   * counting the word, another for counting its length.
   */
  public static class WordMeanMapper extends
      Mapper<Object, Text, Text, LongWritable> {

    private LongWritable wordLen = new LongWritable();

    /**
     * Emits 2 key-value pairs for counting the word and its length. Outputs are
     * (Text, LongWritable).
     *
     * @param value
     *          This will be a line of text coming in from our input file.
     */
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        String string = itr.nextToken();
        this.wordLen.set(string.length());
        context.write(LENGTH, this.wordLen);
        context.write(COUNT, ONE);
      }
    }
  }

  /**
   * Performs integer summation of all the values for each key.
   */
  public static class WordMeanReducer extends
      Reducer<Text, LongWritable, Text, LongWritable> {

    private LongWritable sum = new LongWritable();

    /**
     * Sums all the individual values within the iterator and writes them to the
     * same key.
     *
     * @param key
     *          This will be one of 2 constants: LENGTH_STR or COUNT_STR.
     * @param values
     *          This will be an iterator of all the values associated with that
     *          key.
     */
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      int theSum = 0;
      for (LongWritable val : values) {
        theSum += val.get();
      }
      sum.set(theSum);
      context.write(key, sum);
    }
  }

  /**
   * Reads the output file and parses the summation of lengths, and the word
   * count, to perform a quick calculation of the mean.
   *
   * @param path
   *          The path to find the output file in. Set in main to the output
   *          directory.
   * @throws IOException
   *           If it cannot access the output directory, we throw an exception.
   */
  private double readAndCalcMean(Path path, Configuration conf)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(path, "part-r-00000");

    if (!fs.exists(file))
      throw new IOException("Output not found!");

    BufferedReader br = null;

    // average = total sum / number of elements;
    try {
      br = new BufferedReader(new InputStreamReader(fs.open(file)));

      long count = 0;
      long length = 0;

      String line;
      while ((line = br.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(line);

        // grab type
        String type = st.nextToken();

        // differentiate
        if (type.equals(COUNT.toString())) {
          String countLit = st.nextToken();
          count = Long.parseLong(countLit);
        } else if (type.equals(LENGTH.toString())) {
          String lengthLit = st.nextToken();
          length = Long.parseLong(lengthLit);
        }
      }

      double theMean = (((double) length) / ((double) count));
      System.out.println("The mean is: " + theMean);
      return theMean;
    } finally {
      br.close();
    }
  }

  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new WordMean(), args);
  }

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: wordmean <in> <out>");
      return 0;
    }

    Configuration conf = getConf();

    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "word mean");
    job.setJarByClass(WordMean.class);
    job.setMapperClass(WordMeanMapper.class);
    job.setCombinerClass(WordMeanReducer.class);
    job.setReducerClass(WordMeanReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    Path outputpath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outputpath);
    boolean result = job.waitForCompletion(true);
    mean = readAndCalcMean(outputpath, conf);

    return (result ? 0 : 1);
  }
  /**
   * Only valid after run() has been called.
   *
   * @return the mean value.
   */
  public double getMean() {
    return mean;
  }
}
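A hypothetical driver (not part of this patch; the input and output paths are invented for illustration) shows how run() and getMean() above are meant to be used together:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.WordMean;
import org.apache.hadoop.util.ToolRunner;

public class WordMeanDriverSketch {
  public static void main(String[] args) throws Exception {
    WordMean wordMean = new WordMean();
    // ToolRunner handles generic options and then invokes WordMean.run(<in>, <out>).
    int rc = ToolRunner.run(new Configuration(), wordMean,
        new String[] { "/tmp/words-in", "/tmp/words-mean-out" });
    // getMean() only holds the computed value after run() has completed.
    System.out.println("exit=" + rc + ", mean word length=" + wordMean.getMean());
  }
}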


@@ -1,208 +1,208 @@
package org.apache.hadoop.examples;

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordMedian extends Configured implements Tool {

  private double median = 0;
  private final static IntWritable ONE = new IntWritable(1);

  /**
   * Maps words from line of text into a key-value pair; the length of the word
   * as the key, and 1 as the value.
   */
  public static class WordMedianMapper extends
      Mapper<Object, Text, IntWritable, IntWritable> {

    private IntWritable length = new IntWritable();

    /**
     * Emits a key-value pair for counting the word. Outputs are (IntWritable,
     * IntWritable).
     *
     * @param value
     *          This will be a line of text coming in from our input file.
     */
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        String string = itr.nextToken();
        length.set(string.length());
        context.write(length, ONE);
      }
    }
  }

  /**
   * Performs integer summation of all the values for each key.
   */
  public static class WordMedianReducer extends
      Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    private IntWritable val = new IntWritable();

    /**
     * Sums all the individual values within the iterator and writes them to the
     * same key.
     *
     * @param key
     *          This will be a length of a word that was read.
     * @param values
     *          This will be an iterator of all the values associated with that
     *          key.
     */
    public void reduce(IntWritable key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable value : values) {
        sum += value.get();
      }
      val.set(sum);
      context.write(key, val);
    }
  }

  /**
   * This is a standard program to read and find a median value based on a file
   * of word counts such as: 1 456, 2 132, 3 56... Where the first values are
   * the word lengths and the following values are the number of times that
   * words of that length appear.
   *
   * @param path
   *          The path to read the HDFS file from (part-r-00000...00001...etc).
   * @param medianIndex1
   *          The first length value to look for.
   * @param medianIndex2
   *          The second length value to look for (will be the same as the first
   *          if there are an even number of words total).
   * @throws IOException
   *           If file cannot be found, we throw an exception.
   * */
  private double readAndFindMedian(String path, int medianIndex1,
      int medianIndex2, Configuration conf) throws IOException {
    FileSystem fs = FileSystem.get(conf);
Path file = new Path(path, "part-r-00000"); Path file = new Path(path, "part-r-00000");
if (!fs.exists(file)) if (!fs.exists(file))
throw new IOException("Output not found!"); throw new IOException("Output not found!");
BufferedReader br = null; BufferedReader br = null;
try { try {
br = new BufferedReader(new InputStreamReader(fs.open(file))); br = new BufferedReader(new InputStreamReader(fs.open(file)));
int num = 0; int num = 0;
String line; String line;
while ((line = br.readLine()) != null) { while ((line = br.readLine()) != null) {
StringTokenizer st = new StringTokenizer(line); StringTokenizer st = new StringTokenizer(line);
// grab length // grab length
String currLen = st.nextToken(); String currLen = st.nextToken();
// grab count // grab count
String lengthFreq = st.nextToken(); String lengthFreq = st.nextToken();
int prevNum = num; int prevNum = num;
num += Integer.parseInt(lengthFreq); num += Integer.parseInt(lengthFreq);
if (medianIndex2 >= prevNum && medianIndex1 <= num) { if (medianIndex2 >= prevNum && medianIndex1 <= num) {
System.out.println("The median is: " + currLen); System.out.println("The median is: " + currLen);
br.close(); br.close();
return Double.parseDouble(currLen); return Double.parseDouble(currLen);
} else if (medianIndex2 >= prevNum && medianIndex1 < num) { } else if (medianIndex2 >= prevNum && medianIndex1 < num) {
String nextCurrLen = st.nextToken(); String nextCurrLen = st.nextToken();
double theMedian = (Integer.parseInt(currLen) + Integer double theMedian = (Integer.parseInt(currLen) + Integer
.parseInt(nextCurrLen)) / 2.0; .parseInt(nextCurrLen)) / 2.0;
System.out.println("The median is: " + theMedian); System.out.println("The median is: " + theMedian);
br.close(); br.close();
return theMedian; return theMedian;
} }
} }
} finally { } finally {
br.close(); br.close();
} }
// error, no median found // error, no median found
return -1; return -1;
} }
public static void main(String[] args) throws Exception { public static void main(String[] args) throws Exception {
ToolRunner.run(new Configuration(), new WordMedian(), args); ToolRunner.run(new Configuration(), new WordMedian(), args);
} }
@Override @Override
public int run(String[] args) throws Exception { public int run(String[] args) throws Exception {
if (args.length != 2) { if (args.length != 2) {
System.err.println("Usage: wordmedian <in> <out>"); System.err.println("Usage: wordmedian <in> <out>");
return 0; return 0;
} }
setConf(new Configuration()); setConf(new Configuration());
Configuration conf = getConf(); Configuration conf = getConf();
@SuppressWarnings("deprecation") @SuppressWarnings("deprecation")
Job job = new Job(conf, "word median"); Job job = new Job(conf, "word median");
job.setJarByClass(WordMedian.class); job.setJarByClass(WordMedian.class);
job.setMapperClass(WordMedianMapper.class); job.setMapperClass(WordMedianMapper.class);
job.setCombinerClass(WordMedianReducer.class); job.setCombinerClass(WordMedianReducer.class);
job.setReducerClass(WordMedianReducer.class); job.setReducerClass(WordMedianReducer.class);
job.setOutputKeyClass(IntWritable.class); job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class); job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0])); FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1])); FileOutputFormat.setOutputPath(job, new Path(args[1]));
boolean result = job.waitForCompletion(true); boolean result = job.waitForCompletion(true);
// Wait for JOB 1 -- get middle value to check for Median // Wait for JOB 1 -- get middle value to check for Median
long totalWords = job.getCounters() long totalWords = job.getCounters()
.getGroup(TaskCounter.class.getCanonicalName()) .getGroup(TaskCounter.class.getCanonicalName())
.findCounter("MAP_OUTPUT_RECORDS", "Map output records").getValue(); .findCounter("MAP_OUTPUT_RECORDS", "Map output records").getValue();
int medianIndex1 = (int) Math.ceil((totalWords / 2.0)); int medianIndex1 = (int) Math.ceil((totalWords / 2.0));
int medianIndex2 = (int) Math.floor((totalWords / 2.0)); int medianIndex2 = (int) Math.floor((totalWords / 2.0));
median = readAndFindMedian(args[1], medianIndex1, medianIndex2, conf); median = readAndFindMedian(args[1], medianIndex1, medianIndex2, conf);
return (result ? 0 : 1); return (result ? 0 : 1);
} }
public double getMedian() { public double getMedian() {
return median; return median;
} }
} }
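
Note (illustrative sketch, not part of this patch): WordMedian is normally driven through ToolRunner and exposes the computed value via getMedian(). A minimal hypothetical driver, with placeholder HDFS paths, would look like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.WordMedian;
import org.apache.hadoop.util.ToolRunner;

public class WordMedianDriver {
  public static void main(String[] args) throws Exception {
    WordMedian wm = new WordMedian();
    // Roughly the same effect as running the example jar's "wordmedian" program.
    // The two paths below are placeholders, not paths taken from this patch.
    int rc = ToolRunner.run(new Configuration(), wm,
        new String[] { "/user/example/in", "/user/example/median_out" });
    System.out.println("median = " + wm.getMedian());
    System.exit(rc);
  }
}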

View File

@@ -1,210 +1,210 @@
package org.apache.hadoop.examples;
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class WordStandardDeviation extends Configured implements Tool {
  private double stddev = 0;
  private final static Text LENGTH = new Text("length");
  private final static Text SQUARE = new Text("square");
  private final static Text COUNT = new Text("count");
  private final static LongWritable ONE = new LongWritable(1);
  /**
   * Maps words from line of text into 3 key-value pairs; one key-value pair for
   * counting the word, one for counting its length, and one for counting the
   * square of its length.
   */
  public static class WordStandardDeviationMapper extends
      Mapper<Object, Text, Text, LongWritable> {
    private LongWritable wordLen = new LongWritable();
    private LongWritable wordLenSq = new LongWritable();
    /**
     * Emits 3 key-value pairs for counting the word, its length, and the
     * squares of its length. Outputs are (Text, LongWritable).
     *
     * @param value
     *          This will be a line of text coming in from our input file.
     */
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        String string = itr.nextToken();
        this.wordLen.set(string.length());
        // the square of an integer is an integer...
        this.wordLenSq.set((long) Math.pow(string.length(), 2.0));
        context.write(LENGTH, this.wordLen);
        context.write(SQUARE, this.wordLenSq);
        context.write(COUNT, ONE);
      }
    }
  }
  /**
   * Performs integer summation of all the values for each key.
   */
  public static class WordStandardDeviationReducer extends
      Reducer<Text, LongWritable, Text, LongWritable> {
    private LongWritable val = new LongWritable();
    /**
     * Sums all the individual values within the iterator and writes them to the
     * same key.
     *
     * @param key
     *          This will be one of 2 constants: LENGTH_STR, COUNT_STR, or
     *          SQUARE_STR.
     * @param values
     *          This will be an iterator of all the values associated with that
     *          key.
     */
    public void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (LongWritable value : values) {
        sum += value.get();
      }
      val.set(sum);
      context.write(key, val);
    }
  }
  /**
   * Reads the output file and parses the summation of lengths, the word count,
   * and the lengths squared, to perform a quick calculation of the standard
   * deviation.
   *
   * @param path
   *          The path to find the output file in. Set in main to the output
   *          directory.
   * @throws IOException
   *           If it cannot access the output directory, we throw an exception.
   */
  private double readAndCalcStdDev(Path path, Configuration conf)
      throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path(path, "part-r-00000");
    if (!fs.exists(file))
      throw new IOException("Output not found!");
    double stddev = 0;
    BufferedReader br = null;
    try {
      br = new BufferedReader(new InputStreamReader(fs.open(file)));
      long count = 0;
      long length = 0;
      long square = 0;
      String line;
      while ((line = br.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(line);
        // grab type
        String type = st.nextToken();
        // differentiate
        if (type.equals(COUNT.toString())) {
          String countLit = st.nextToken();
          count = Long.parseLong(countLit);
        } else if (type.equals(LENGTH.toString())) {
          String lengthLit = st.nextToken();
          length = Long.parseLong(lengthLit);
        } else if (type.equals(SQUARE.toString())) {
          String squareLit = st.nextToken();
          square = Long.parseLong(squareLit);
        }
      }
      // average = total sum / number of elements;
      double mean = (((double) length) / ((double) count));
      // standard deviation = sqrt((sum(lengths ^ 2)/count) - (mean ^ 2))
      mean = Math.pow(mean, 2.0);
      double term = (((double) square / ((double) count)));
      stddev = Math.sqrt((term - mean));
      System.out.println("The standard deviation is: " + stddev);
    } finally {
      br.close();
    }
    return stddev;
  }
  public static void main(String[] args) throws Exception {
    ToolRunner.run(new Configuration(), new WordStandardDeviation(),
        args);
  }
  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: wordstddev <in> <out>");
      return 0;
    }
    Configuration conf = getConf();
    @SuppressWarnings("deprecation")
    Job job = new Job(conf, "word stddev");
    job.setJarByClass(WordStandardDeviation.class);
    job.setMapperClass(WordStandardDeviationMapper.class);
    job.setCombinerClass(WordStandardDeviationReducer.class);
    job.setReducerClass(WordStandardDeviationReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    Path outputpath = new Path(args[1]);
    FileOutputFormat.setOutputPath(job, outputpath);
    boolean result = job.waitForCompletion(true);
    // read output and calculate standard deviation
    stddev = readAndCalcStdDev(outputpath, conf);
    return (result ? 0 : 1);
  }
  public double getStandardDeviation() {
    return stddev;
  }
}
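
Note (illustrative, not part of this patch): readAndCalcStdDev above relies on the identity stddev = sqrt(sum(len^2)/count - (sum(len)/count)^2). A self-contained sketch of the same arithmetic, with made-up counter totals:

public class StdDevIdentityDemo {
  public static void main(String[] args) {
    // Hypothetical totals standing in for the reducer output ("count", "length", "square").
    long count = 4;    // number of words
    long length = 20;  // sum of word lengths
    long square = 120; // sum of squared word lengths
    double mean = (double) length / count;                            // 5.0
    double stddev = Math.sqrt((double) square / count - mean * mean); // sqrt(30 - 25), about 2.236
    System.out.println("The standard deviation is: " + stddev);
  }
}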

View File

@@ -1,272 +1,272 @@
package org.apache.hadoop.examples;
import static org.junit.Assert.assertEquals;
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.StringTokenizer;
import java.util.TreeMap;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.ToolRunner;
import org.junit.Before;
import org.junit.Test;
public class TestWordStats {
  private final static String INPUT = "src/test/java/org/apache/hadoop/examples/pi/math";
  private final static String MEAN_OUTPUT = "build/data/mean_output";
  private final static String MEDIAN_OUTPUT = "build/data/median_output";
  private final static String STDDEV_OUTPUT = "build/data/stddev_output";
  /**
   * Modified internal test class that is designed to read all the files in the
   * input directory, and find the standard deviation between all of the word
   * lengths.
   */
  public static class WordStdDevReader {
    private long wordsRead = 0;
    private long wordLengthsRead = 0;
    private long wordLengthsReadSquared = 0;
    public WordStdDevReader() {
    }
    public double read(String path) throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus[] files = fs.listStatus(new Path(path));
      for (FileStatus fileStat : files) {
        if (!fileStat.isFile())
          continue;
        BufferedReader br = null;
        try {
          br = new BufferedReader(new InputStreamReader(fs.open(fileStat.getPath())));
          String line;
          while ((line = br.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line);
            String word;
            while (st.hasMoreTokens()) {
              word = st.nextToken();
              this.wordsRead++;
              this.wordLengthsRead += word.length();
              this.wordLengthsReadSquared += (long) Math.pow(word.length(), 2.0);
            }
          }
        } catch (IOException e) {
          System.out.println("Output could not be read!");
          throw e;
        } finally {
          br.close();
        }
      }
      double mean = (((double) this.wordLengthsRead) / ((double) this.wordsRead));
      mean = Math.pow(mean, 2.0);
      double term = (((double) this.wordLengthsReadSquared / ((double) this.wordsRead)));
      double stddev = Math.sqrt((term - mean));
      return stddev;
    }
  }
  /**
   * Modified internal test class that is designed to read all the files in the
   * input directory, and find the median length of all the words.
   */
  public static class WordMedianReader {
    private long wordsRead = 0;
    private TreeMap<Integer, Integer> map = new TreeMap<Integer, Integer>();
    public WordMedianReader() {
    }
    public double read(String path) throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus[] files = fs.listStatus(new Path(path));
      int num = 0;
      for (FileStatus fileStat : files) {
        if (!fileStat.isFile())
          continue;
        BufferedReader br = null;
        try {
          br = new BufferedReader(new InputStreamReader(fs.open(fileStat.getPath())));
          String line;
          while ((line = br.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line);
            String word;
            while (st.hasMoreTokens()) {
              word = st.nextToken();
              this.wordsRead++;
              if (this.map.get(word.length()) == null) {
                this.map.put(word.length(), 1);
              } else {
                int count = this.map.get(word.length());
                this.map.put(word.length(), count + 1);
              }
            }
          }
        } catch (IOException e) {
          System.out.println("Output could not be read!");
          throw e;
        } finally {
          br.close();
        }
      }
      int medianIndex1 = (int) Math.ceil((this.wordsRead / 2.0));
      int medianIndex2 = (int) Math.floor((this.wordsRead / 2.0));
      for (Integer key : this.map.navigableKeySet()) {
        int prevNum = num;
        num += this.map.get(key);
        if (medianIndex2 >= prevNum && medianIndex1 <= num) {
          return key;
        } else if (medianIndex2 >= prevNum && medianIndex1 < num) {
          Integer nextCurrLen = this.map.navigableKeySet().iterator().next();
          double median = (key + nextCurrLen) / 2.0;
          return median;
        }
      }
      return -1;
    }
  }
  /**
   * Modified internal test class that is designed to read all the files in the
   * input directory, and find the mean length of all the words.
   */
  public static class WordMeanReader {
    private long wordsRead = 0;
    private long wordLengthsRead = 0;
    public WordMeanReader() {
    }
    public double read(String path) throws IOException {
      FileSystem fs = FileSystem.get(new Configuration());
      FileStatus[] files = fs.listStatus(new Path(path));
      for (FileStatus fileStat : files) {
        if (!fileStat.isFile())
          continue;
        BufferedReader br = null;
        try {
          br = new BufferedReader(new InputStreamReader(fs.open(fileStat.getPath())));
          String line;
          while ((line = br.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line);
            String word;
            while (st.hasMoreTokens()) {
              word = st.nextToken();
              this.wordsRead++;
              this.wordLengthsRead += word.length();
            }
          }
        } catch (IOException e) {
          System.out.println("Output could not be read!");
          throw e;
        } finally {
          br.close();
        }
      }
      double mean = (((double) this.wordLengthsRead) / ((double) this.wordsRead));
      return mean;
    }
  }
  /**
   * Internal class designed to delete the output directory. Meant solely for
   * use before and after the test is run; this is so next iterations of the
   * test do not encounter a "file already exists" error.
   *
   * @param dir
   *          The directory to delete.
   * @return Returns whether the deletion was successful or not.
   */
  public static boolean deleteDir(File dir) {
    if (dir.isDirectory()) {
      String[] children = dir.list();
      for (int i = 0; i < children.length; i++) {
        boolean success = deleteDir(new File(dir, children[i]));
        if (!success) {
          System.out.println("Could not delete directory after test!");
          return false;
        }
      }
    }
    // The directory is now empty so delete it
    return dir.delete();
  }
  @Before public void setup() throws Exception {
    deleteDir(new File(MEAN_OUTPUT));
    deleteDir(new File(MEDIAN_OUTPUT));
    deleteDir(new File(STDDEV_OUTPUT));
  }
  @Test public void testGetTheMean() throws Exception {
    String args[] = new String[2];
    args[0] = INPUT;
    args[1] = MEAN_OUTPUT;
    WordMean wm = new WordMean();
    ToolRunner.run(new Configuration(), wm, args);
    double mean = wm.getMean();
    // outputs MUST match
    WordMeanReader wr = new WordMeanReader();
    assertEquals(mean, wr.read(INPUT), 0.0);
  }
  @Test public void testGetTheMedian() throws Exception {
    String args[] = new String[2];
    args[0] = INPUT;
    args[1] = MEDIAN_OUTPUT;
    WordMedian wm = new WordMedian();
    ToolRunner.run(new Configuration(), wm, args);
    double median = wm.getMedian();
    // outputs MUST match
    WordMedianReader wr = new WordMedianReader();
    assertEquals(median, wr.read(INPUT), 0.0);
  }
  @Test public void testGetTheStandardDeviation() throws Exception {
    String args[] = new String[2];
    args[0] = INPUT;
    args[1] = STDDEV_OUTPUT;
    WordStandardDeviation wsd = new WordStandardDeviation();
    ToolRunner.run(new Configuration(), wsd, args);
    double stddev = wsd.getStandardDeviation();
    // outputs MUST match
    WordStdDevReader wr = new WordStdDevReader();
    assertEquals(stddev, wr.read(INPUT), 0.0);
  }
}

View File

@@ -1,10 +1,10 @@
0 ins apache dot org
1 ins apache
2 ins apache
3 ins apache
4 ins apache
5 ins apache
6 ins apache
7 ins apache
8 ins apache
9 ins apache

View File

@@ -1,10 +1,10 @@
0 del
1 upd hadoop
2 del
3 upd hadoop
4 del
5 upd hadoop
6 del
7 upd hadoop
8 del
9 upd hadoop

View File

@@ -1,56 +1,56 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.contrib.index.example;
import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.contrib.index.mapred.IDistributionPolicy;
import org.apache.hadoop.contrib.index.mapred.Shard;
/**
 * Choose a shard for each insert or delete based on document id hashing. Do
 * NOT use this distribution policy when the number of shards changes.
 */
public class HashingDistributionPolicy implements IDistributionPolicy {
  private int numShards;
  /* (non-Javadoc)
   * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#init(org.apache.hadoop.contrib.index.mapred.Shard[])
   */
  public void init(Shard[] shards) {
    numShards = shards.length;
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForInsert(org.apache.hadoop.contrib.index.mapred.DocumentID)
   */
  public int chooseShardForInsert(DocumentID key) {
    int hashCode = key.hashCode();
    return hashCode >= 0 ? hashCode % numShards : (-hashCode) % numShards;
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForDelete(org.apache.hadoop.contrib.index.mapred.DocumentID)
   */
  public int chooseShardForDelete(DocumentID key) {
    int hashCode = key.hashCode();
    return hashCode >= 0 ? hashCode % numShards : (-hashCode) % numShards;
  }
}
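
Note (illustrative, not part of this patch): both choose methods above reduce to a non-negative modulo of the DocumentID hash code. A small stand-alone sketch of the same formula, using plain Strings and a made-up shard count:

public class ShardChoiceDemo {
  // Same formula as HashingDistributionPolicy above; assumes hashCode() never returns Integer.MIN_VALUE.
  static int chooseShard(String docId, int numShards) {
    int hashCode = docId.hashCode();
    return hashCode >= 0 ? hashCode % numShards : (-hashCode) % numShards;
  }
  public static void main(String[] args) {
    int numShards = 3; // hypothetical shard count
    for (String id : new String[] { "doc-0", "doc-1", "doc-2" }) {
      System.out.println(id + " -> shard " + chooseShard(id, numShards));
    }
  }
}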

View File

@@ -1,57 +1,57 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.contrib.index.example;
import java.io.IOException;
import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.contrib.index.mapred.ILocalAnalysis;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
/**
 * Identity local analysis maps inputs directly into outputs.
 */
public class IdentityLocalAnalysis implements
    ILocalAnalysis<DocumentID, DocumentAndOp> {
  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.Mapper#map(java.lang.Object, java.lang.Object, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
   */
  public void map(DocumentID key, DocumentAndOp value,
      OutputCollector<DocumentID, DocumentAndOp> output, Reporter reporter)
      throws IOException {
    output.collect(key, value);
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.JobConfigurable#configure(org.apache.hadoop.mapred.JobConf)
   */
  public void configure(JobConf job) {
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Closeable#close()
   */
  public void close() throws IOException {
  }
}

View File

@@ -1,46 +1,46 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.contrib.index.example;
import java.io.IOException;
import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
/**
 * An InputFormat for LineDoc for plain text files where each line is a doc.
 */
public class LineDocInputFormat extends
    FileInputFormat<DocumentID, LineDocTextAndOp> {
  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.FileInputFormat#getRecordReader(org.apache.hadoop.mapred.InputSplit, org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.Reporter)
   */
  public RecordReader<DocumentID, LineDocTextAndOp> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    reporter.setStatus(split.toString());
    return new LineDocRecordReader(job, (FileSplit) split);
  }
}

View File

@@ -1,80 +1,80 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.contrib.index.example;
import java.io.IOException;
import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.contrib.index.mapred.ILocalAnalysis;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
/**
 * Convert LineDocTextAndOp to DocumentAndOp as required by ILocalAnalysis.
 */
public class LineDocLocalAnalysis implements
    ILocalAnalysis<DocumentID, LineDocTextAndOp> {
  private static String docidFieldName = "id";
  private static String contentFieldName = "content";
  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.Mapper#map(java.lang.Object, java.lang.Object, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
   */
  public void map(DocumentID key, LineDocTextAndOp value,
      OutputCollector<DocumentID, DocumentAndOp> output, Reporter reporter)
      throws IOException {
    DocumentAndOp.Op op = value.getOp();
    Document doc = null;
    Term term = null;
    if (op == DocumentAndOp.Op.INSERT || op == DocumentAndOp.Op.UPDATE) {
      doc = new Document();
      doc.add(new Field(docidFieldName, key.getText().toString(),
          Field.Store.YES, Field.Index.UN_TOKENIZED));
      doc.add(new Field(contentFieldName, value.getText().toString(),
          Field.Store.NO, Field.Index.TOKENIZED));
    }
    if (op == DocumentAndOp.Op.DELETE || op == DocumentAndOp.Op.UPDATE) {
      term = new Term(docidFieldName, key.getText().toString());
    }
    output.collect(key, new DocumentAndOp(op, doc, term));
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.JobConfigurable#configure(org.apache.hadoop.mapred.JobConf)
   */
  public void configure(JobConf job) {
  }
  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Closeable#close()
   */
  public void close() throws IOException {
  }
}
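
Note (illustrative, not part of this patch): the map method above turns an insert into a Lucene Document, a delete into a delete Term, and an update into both. A tiny stand-alone sketch of that branching, with a plain enum standing in for DocumentAndOp.Op:

public class OpBranchDemo {
  enum Op { INSERT, DELETE, UPDATE }
  public static void main(String[] args) {
    for (Op op : Op.values()) {
      boolean buildsDocument = (op == Op.INSERT || op == Op.UPDATE);
      boolean buildsDeleteTerm = (op == Op.DELETE || op == Op.UPDATE);
      System.out.println(op + " -> new document: " + buildsDocument
          + ", delete term: " + buildsDeleteTerm);
    }
  }
}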

View File

@@ -1,231 +1,231 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.example; package org.apache.hadoop.contrib.index.example;
import java.io.BufferedInputStream; import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream; import java.io.ByteArrayOutputStream;
import java.io.IOException; import java.io.IOException;
import java.io.InputStream; import java.io.InputStream;
import java.io.OutputStream; import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.contrib.index.mapred.DocumentAndOp; import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
import org.apache.hadoop.contrib.index.mapred.DocumentID; import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.fs.FSDataInputStream; import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileSplit; import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.RecordReader; import org.apache.hadoop.mapred.RecordReader;
/** /**
* A simple RecordReader for LineDoc for plain text files where each line is a * A simple RecordReader for LineDoc for plain text files where each line is a
* doc. Each line is as follows: documentID<SPACE>op<SPACE>content<EOF>, * doc. Each line is as follows: documentID<SPACE>op<SPACE>content<EOF>,
* where op can be "i", "ins" or "insert" for insert, "d", "del" or "delete" * where op can be "i", "ins" or "insert" for insert, "d", "del" or "delete"
* for delete, or "u", "upd" or "update" for update. * for delete, or "u", "upd" or "update" for update.
*/ */
public class LineDocRecordReader implements public class LineDocRecordReader implements
RecordReader<DocumentID, LineDocTextAndOp> { RecordReader<DocumentID, LineDocTextAndOp> {
private static final char SPACE = ' '; private static final char SPACE = ' ';
private static final char EOL = '\n'; private static final char EOL = '\n';
private long start; private long start;
private long pos; private long pos;
private long end; private long end;
private BufferedInputStream in; private BufferedInputStream in;
private ByteArrayOutputStream buffer = new ByteArrayOutputStream(256); private ByteArrayOutputStream buffer = new ByteArrayOutputStream(256);
/** /**
* Provide a bridge to get the bytes from the ByteArrayOutputStream without * Provide a bridge to get the bytes from the ByteArrayOutputStream without
* creating a new byte array. * creating a new byte array.
*/ */
private static class TextStuffer extends OutputStream { private static class TextStuffer extends OutputStream {
public Text target; public Text target;
public void write(int b) { public void write(int b) {
throw new UnsupportedOperationException("write(byte) not supported"); throw new UnsupportedOperationException("write(byte) not supported");
} }
public void write(byte[] data, int offset, int len) throws IOException { public void write(byte[] data, int offset, int len) throws IOException {
target.set(data, offset, len); target.set(data, offset, len);
} }
} }
private TextStuffer bridge = new TextStuffer(); private TextStuffer bridge = new TextStuffer();
/** /**
* Constructor * Constructor
* @param job * @param job
* @param split * @param split
* @throws IOException * @throws IOException
*/ */
public LineDocRecordReader(Configuration job, FileSplit split) public LineDocRecordReader(Configuration job, FileSplit split)
throws IOException { throws IOException {
long start = split.getStart(); long start = split.getStart();
long end = start + split.getLength(); long end = start + split.getLength();
final Path file = split.getPath(); final Path file = split.getPath();
// open the file and seek to the start of the split // open the file and seek to the start of the split
FileSystem fs = file.getFileSystem(job); FileSystem fs = file.getFileSystem(job);
FSDataInputStream fileIn = fs.open(split.getPath()); FSDataInputStream fileIn = fs.open(split.getPath());
InputStream in = fileIn; InputStream in = fileIn;
boolean skipFirstLine = false; boolean skipFirstLine = false;
if (start != 0) { if (start != 0) {
skipFirstLine = true; // defer the skip until the BufferedInputStream is created skipFirstLine = true; // defer the skip until the BufferedInputStream is created
--start; --start;
fileIn.seek(start); fileIn.seek(start);
} }
this.in = new BufferedInputStream(in); this.in = new BufferedInputStream(in);
if (skipFirstLine) { // skip first line and re-establish "start". if (skipFirstLine) { // skip first line and re-establish "start".
start += LineDocRecordReader.readData(this.in, null, EOL); start += LineDocRecordReader.readData(this.in, null, EOL);
} }
this.start = start; this.start = start;
this.pos = start; this.pos = start;
this.end = end; this.end = end;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#close() * @see org.apache.hadoop.mapred.RecordReader#close()
*/ */
public void close() throws IOException { public void close() throws IOException {
in.close(); in.close();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#createKey() * @see org.apache.hadoop.mapred.RecordReader#createKey()
*/ */
public DocumentID createKey() { public DocumentID createKey() {
return new DocumentID(); return new DocumentID();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#createValue() * @see org.apache.hadoop.mapred.RecordReader#createValue()
*/ */
public LineDocTextAndOp createValue() { public LineDocTextAndOp createValue() {
return new LineDocTextAndOp(); return new LineDocTextAndOp();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#getPos() * @see org.apache.hadoop.mapred.RecordReader#getPos()
*/ */
public long getPos() throws IOException { public long getPos() throws IOException {
return pos; return pos;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#getProgress() * @see org.apache.hadoop.mapred.RecordReader#getProgress()
*/ */
public float getProgress() throws IOException { public float getProgress() throws IOException {
if (start == end) { if (start == end) {
return 0.0f; return 0.0f;
} else { } else {
return Math.min(1.0f, (pos - start) / (float) (end - start)); return Math.min(1.0f, (pos - start) / (float) (end - start));
} }
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.RecordReader#next(java.lang.Object, java.lang.Object) * @see org.apache.hadoop.mapred.RecordReader#next(java.lang.Object, java.lang.Object)
*/ */
public synchronized boolean next(DocumentID key, LineDocTextAndOp value) public synchronized boolean next(DocumentID key, LineDocTextAndOp value)
throws IOException { throws IOException {
if (pos >= end) { if (pos >= end) {
return false; return false;
} }
// key is the document id: the bytes up to the first space // key is the document id: the bytes up to the first space
if (!readInto(key.getText(), SPACE)) { if (!readInto(key.getText(), SPACE)) {
return false; return false;
} }
// read operation: i/d/u, or ins/del/upd, or insert/delete/update // read operation: i/d/u, or ins/del/upd, or insert/delete/update
Text opText = new Text(); Text opText = new Text();
if (!readInto(opText, SPACE)) { if (!readInto(opText, SPACE)) {
return false; return false;
} }
String opStr = opText.toString(); String opStr = opText.toString();
DocumentAndOp.Op op; DocumentAndOp.Op op;
if (opStr.equals("i") || opStr.equals("ins") || opStr.equals("insert")) { if (opStr.equals("i") || opStr.equals("ins") || opStr.equals("insert")) {
op = DocumentAndOp.Op.INSERT; op = DocumentAndOp.Op.INSERT;
} else if (opStr.equals("d") || opStr.equals("del") } else if (opStr.equals("d") || opStr.equals("del")
|| opStr.equals("delete")) { || opStr.equals("delete")) {
op = DocumentAndOp.Op.DELETE; op = DocumentAndOp.Op.DELETE;
} else if (opStr.equals("u") || opStr.equals("upd") } else if (opStr.equals("u") || opStr.equals("upd")
|| opStr.equals("update")) { || opStr.equals("update")) {
op = DocumentAndOp.Op.UPDATE; op = DocumentAndOp.Op.UPDATE;
} else { } else {
// default is insert // default is insert
op = DocumentAndOp.Op.INSERT; op = DocumentAndOp.Op.INSERT;
} }
value.setOp(op); value.setOp(op);
if (op == DocumentAndOp.Op.DELETE) { if (op == DocumentAndOp.Op.DELETE) {
return true; return true;
} else { } else {
// read rest of the line // read rest of the line
return readInto(value.getText(), EOL); return readInto(value.getText(), EOL);
} }
} }
private boolean readInto(Text text, char delimiter) throws IOException { private boolean readInto(Text text, char delimiter) throws IOException {
buffer.reset(); buffer.reset();
long bytesRead = readData(in, buffer, delimiter); long bytesRead = readData(in, buffer, delimiter);
if (bytesRead == 0) { if (bytesRead == 0) {
return false; return false;
} }
pos += bytesRead; pos += bytesRead;
bridge.target = text; bridge.target = text;
buffer.writeTo(bridge); buffer.writeTo(bridge);
return true; return true;
} }
private static long readData(InputStream in, OutputStream out, char delimiter) private static long readData(InputStream in, OutputStream out, char delimiter)
throws IOException { throws IOException {
long bytes = 0; long bytes = 0;
while (true) { while (true) {
int b = in.read(); int b = in.read();
if (b == -1) { if (b == -1) {
break; break;
} }
bytes += 1; bytes += 1;
byte c = (byte) b; byte c = (byte) b;
if (c == EOL || c == delimiter) { if (c == EOL || c == delimiter) {
break; break;
} }
if (c == '\r') { if (c == '\r') {
in.mark(1); in.mark(1);
byte nextC = (byte) in.read(); byte nextC = (byte) in.read();
if (nextC != EOL || c == delimiter) { if (nextC != EOL || c == delimiter) {
in.reset(); in.reset();
} else { } else {
bytes += 1; bytes += 1;
} }
break; break;
} }
if (out != null) { if (out != null) {
out.write(c); out.write(c);
} }
} }
return bytes; return bytes;
} }
} }
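
The reader can also be driven directly, outside of a MapReduce task. The fragment below is a minimal sketch, not part of the patch: it assumes it runs in the same package, inside a method that may throw IOException, and uses a hypothetical input file /tmp/docs.txt whose lines follow the documentID op content format described in the class comment.

    Configuration conf = new Configuration();
    Path input = new Path("/tmp/docs.txt");                    // hypothetical input file
    long length = input.getFileSystem(conf).getFileStatus(input).getLen();
    FileSplit split = new FileSplit(input, 0, length, (String[]) null);
    LineDocRecordReader reader = new LineDocRecordReader(conf, split);
    try {
      DocumentID key = reader.createKey();
      LineDocTextAndOp value = reader.createValue();
      while (reader.next(key, value)) {                        // one record per input line
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }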


@ -1,92 +1,92 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.example; package org.apache.hadoop.contrib.index.example;
import java.io.DataInput; import java.io.DataInput;
import java.io.DataOutput; import java.io.DataOutput;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.contrib.index.mapred.DocumentAndOp; import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable; import org.apache.hadoop.io.Writable;
/** /**
* This class represents an operation. The operation can be an insert, a delete * This class represents an operation. The operation can be an insert, a delete
* or an update. If the operation is an insert or an update, a (new) document, * or an update. If the operation is an insert or an update, a (new) document,
* which is in the form of text, is specified. * which is in the form of text, is specified.
*/ */
public class LineDocTextAndOp implements Writable { public class LineDocTextAndOp implements Writable {
private DocumentAndOp.Op op; private DocumentAndOp.Op op;
private Text doc; private Text doc;
/** /**
* Constructor * Constructor
*/ */
public LineDocTextAndOp() { public LineDocTextAndOp() {
doc = new Text(); doc = new Text();
} }
/** /**
* Set the type of the operation. * Set the type of the operation.
* @param op the type of the operation * @param op the type of the operation
*/ */
public void setOp(DocumentAndOp.Op op) { public void setOp(DocumentAndOp.Op op) {
this.op = op; this.op = op;
} }
/** /**
* Get the type of the operation. * Get the type of the operation.
* @return the type of the operation * @return the type of the operation
*/ */
public DocumentAndOp.Op getOp() { public DocumentAndOp.Op getOp() {
return op; return op;
} }
/** /**
* Get the text that represents a document. * Get the text that represents a document.
* @return the text that represents a document * @return the text that represents a document
*/ */
public Text getText() { public Text getText() {
return doc; return doc;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Object#toString() * @see java.lang.Object#toString()
*/ */
public String toString() { public String toString() {
return this.getClass().getName() + "[op=" + op + ", text=" + doc + "]"; return this.getClass().getName() + "[op=" + op + ", text=" + doc + "]";
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#write(java.io.DataOutput) * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
*/ */
public void write(DataOutput out) throws IOException { public void write(DataOutput out) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".write should never be called"); + ".write should never be called");
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput) * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
*/ */
public void readFields(DataInput in) throws IOException { public void readFields(DataInput in) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".readFields should never be called"); + ".readFields should never be called");
} }
} }
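
A short usage sketch, not part of the patch: the value object is normally populated by LineDocRecordReader, and, as the write/readFields implementations above make explicit, it is never serialized directly.

    LineDocTextAndOp value = new LineDocTextAndOp();
    value.setOp(DocumentAndOp.Op.INSERT);
    value.getText().set("body of the document");
    System.out.println(value);   // prints the class name together with the op and text fields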


@ -1,58 +1,58 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.example; package org.apache.hadoop.contrib.index.example;
import org.apache.hadoop.contrib.index.mapred.DocumentID; import org.apache.hadoop.contrib.index.mapred.DocumentID;
import org.apache.hadoop.contrib.index.mapred.IDistributionPolicy; import org.apache.hadoop.contrib.index.mapred.IDistributionPolicy;
import org.apache.hadoop.contrib.index.mapred.Shard; import org.apache.hadoop.contrib.index.mapred.Shard;
/** /**
* Choose a shard for each insert in a round-robin fashion. Choose all the * Choose a shard for each insert in a round-robin fashion. Choose all the
* shards for each delete because we don't know where the document is stored. * shards for each delete because we don't know where the document is stored.
*/ */
public class RoundRobinDistributionPolicy implements IDistributionPolicy { public class RoundRobinDistributionPolicy implements IDistributionPolicy {
private int numShards; private int numShards;
private int rr; // round-robin implementation private int rr; // round-robin implementation
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#init(org.apache.hadoop.contrib.index.mapred.Shard[]) * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#init(org.apache.hadoop.contrib.index.mapred.Shard[])
*/ */
public void init(Shard[] shards) { public void init(Shard[] shards) {
numShards = shards.length; numShards = shards.length;
rr = 0; rr = 0;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForInsert(org.apache.hadoop.contrib.index.mapred.DocumentID) * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForInsert(org.apache.hadoop.contrib.index.mapred.DocumentID)
*/ */
public int chooseShardForInsert(DocumentID key) { public int chooseShardForInsert(DocumentID key) {
int chosen = rr; int chosen = rr;
rr = (rr + 1) % numShards; rr = (rr + 1) % numShards;
return chosen; return chosen;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForDelete(org.apache.hadoop.contrib.index.mapred.DocumentID) * @see org.apache.hadoop.contrib.index.mapred.IDistributionPolicy#chooseShardForDelete(org.apache.hadoop.contrib.index.mapred.DocumentID)
*/ */
public int chooseShardForDelete(DocumentID key) { public int chooseShardForDelete(DocumentID key) {
// -1 represents all the shards // -1 represents all the shards
return -1; return -1;
} }
} }
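
Because init() only reads the array length and the round-robin counter ignores the key, the behaviour is easy to exercise in isolation. The sketch below is not part of the patch; the uninitialised Shard[3] is just a placeholder for three shards.

    IDistributionPolicy policy = new RoundRobinDistributionPolicy();
    policy.init(new Shard[3]);                               // only the length is inspected
    DocumentID key = new DocumentID();
    System.out.println(policy.chooseShardForInsert(key));    // 0
    System.out.println(policy.chooseShardForInsert(key));    // 1
    System.out.println(policy.chooseShardForInsert(key));    // 2
    System.out.println(policy.chooseShardForInsert(key));    // wraps back to 0
    System.out.println(policy.chooseShardForDelete(key));    // -1, i.e. all shards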


@ -1,55 +1,55 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter; import org.apache.hadoop.fs.PathFilter;
import org.apache.lucene.index.IndexFileNameFilter; import org.apache.lucene.index.IndexFileNameFilter;
/** /**
* A wrapper class to convert an IndexFileNameFilter which implements * A wrapper class to convert an IndexFileNameFilter which implements
* java.io.FilenameFilter to an org.apache.hadoop.fs.PathFilter. * java.io.FilenameFilter to an org.apache.hadoop.fs.PathFilter.
*/ */
class LuceneIndexFileNameFilter implements PathFilter { class LuceneIndexFileNameFilter implements PathFilter {
private static final LuceneIndexFileNameFilter singleton = private static final LuceneIndexFileNameFilter singleton =
new LuceneIndexFileNameFilter(); new LuceneIndexFileNameFilter();
/** /**
* Get a static instance. * Get a static instance.
* @return the static instance * @return the static instance
*/ */
public static LuceneIndexFileNameFilter getFilter() { public static LuceneIndexFileNameFilter getFilter() {
return singleton; return singleton;
} }
private final IndexFileNameFilter luceneFilter; private final IndexFileNameFilter luceneFilter;
private LuceneIndexFileNameFilter() { private LuceneIndexFileNameFilter() {
luceneFilter = IndexFileNameFilter.getFilter(); luceneFilter = IndexFileNameFilter.getFilter();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.fs.PathFilter#accept(org.apache.hadoop.fs.Path) * @see org.apache.hadoop.fs.PathFilter#accept(org.apache.hadoop.fs.Path)
*/ */
public boolean accept(Path path) { public boolean accept(Path path) {
return luceneFilter.accept(null, path.getName()); return luceneFilter.accept(null, path.getName());
} }
} }
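
Since the class is package-private, callers live in the same package. The sketch below is illustrative only; the file names are made up, and the accept/reject decision is whatever Lucene's IndexFileNameFilter returns for them.

    PathFilter filter = LuceneIndexFileNameFilter.getFilter();
    System.out.println(filter.accept(new Path("/index/segments_2")));  // Lucene index file: accepted
    System.out.println(filter.accept(new Path("/index/_0.cfs")));      // Lucene index file: accepted
    System.out.println(filter.accept(new Path("/index/README.txt")));  // unrelated file: rejected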


@ -1,112 +1,112 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import java.io.IOException; import java.io.IOException;
import org.apache.lucene.store.Directory; import org.apache.lucene.store.Directory;
/** /**
* This class copies some methods from Lucene's SegmentInfos since that class * This class copies some methods from Lucene's SegmentInfos since that class
* is not public. * is not public.
*/ */
public final class LuceneUtil { public final class LuceneUtil {
static final class IndexFileNames { static final class IndexFileNames {
/** Name of the index segment file */ /** Name of the index segment file */
static final String SEGMENTS = "segments"; static final String SEGMENTS = "segments";
/** Name of the generation reference file name */ /** Name of the generation reference file name */
static final String SEGMENTS_GEN = "segments.gen"; static final String SEGMENTS_GEN = "segments.gen";
} }
/** /**
* Check if the file is a segments_N file * Check if the file is a segments_N file
* @param name * @param name
* @return true if the file is a segments_N file * @return true if the file is a segments_N file
*/ */
public static boolean isSegmentsFile(String name) { public static boolean isSegmentsFile(String name) {
return name.startsWith(IndexFileNames.SEGMENTS) return name.startsWith(IndexFileNames.SEGMENTS)
&& !name.equals(IndexFileNames.SEGMENTS_GEN); && !name.equals(IndexFileNames.SEGMENTS_GEN);
} }
/** /**
* Check if the file is the segments.gen file * Check if the file is the segments.gen file
* @param name * @param name
* @return true if the file is the segments.gen file * @return true if the file is the segments.gen file
*/ */
public static boolean isSegmentsGenFile(String name) { public static boolean isSegmentsGenFile(String name) {
return name.equals(IndexFileNames.SEGMENTS_GEN); return name.equals(IndexFileNames.SEGMENTS_GEN);
} }
/** /**
* Get the generation (N) of the current segments_N file in the directory. * Get the generation (N) of the current segments_N file in the directory.
* *
* @param directory -- directory to search for the latest segments_N file * @param directory -- directory to search for the latest segments_N file
*/ */
public static long getCurrentSegmentGeneration(Directory directory) public static long getCurrentSegmentGeneration(Directory directory)
throws IOException { throws IOException {
String[] files = directory.list(); String[] files = directory.list();
if (files == null) if (files == null)
throw new IOException("cannot read directory " + directory throw new IOException("cannot read directory " + directory
+ ": list() returned null"); + ": list() returned null");
return getCurrentSegmentGeneration(files); return getCurrentSegmentGeneration(files);
} }
/** /**
* Get the generation (N) of the current segments_N file from a list of * Get the generation (N) of the current segments_N file from a list of
* files. * files.
* *
* @param files -- array of file names to check * @param files -- array of file names to check
*/ */
public static long getCurrentSegmentGeneration(String[] files) { public static long getCurrentSegmentGeneration(String[] files) {
if (files == null) { if (files == null) {
return -1; return -1;
} }
long max = -1; long max = -1;
for (int i = 0; i < files.length; i++) { for (int i = 0; i < files.length; i++) {
String file = files[i]; String file = files[i];
if (file.startsWith(IndexFileNames.SEGMENTS) if (file.startsWith(IndexFileNames.SEGMENTS)
&& !file.equals(IndexFileNames.SEGMENTS_GEN)) { && !file.equals(IndexFileNames.SEGMENTS_GEN)) {
long gen = generationFromSegmentsFileName(file); long gen = generationFromSegmentsFileName(file);
if (gen > max) { if (gen > max) {
max = gen; max = gen;
} }
} }
} }
return max; return max;
} }
/** /**
* Parse the generation off the segments file name and return it. * Parse the generation off the segments file name and return it.
*/ */
public static long generationFromSegmentsFileName(String fileName) { public static long generationFromSegmentsFileName(String fileName) {
if (fileName.equals(IndexFileNames.SEGMENTS)) { if (fileName.equals(IndexFileNames.SEGMENTS)) {
return 0; return 0;
} else if (fileName.startsWith(IndexFileNames.SEGMENTS)) { } else if (fileName.startsWith(IndexFileNames.SEGMENTS)) {
return Long.parseLong( return Long.parseLong(
fileName.substring(1 + IndexFileNames.SEGMENTS.length()), fileName.substring(1 + IndexFileNames.SEGMENTS.length()),
Character.MAX_RADIX); Character.MAX_RADIX);
} else { } else {
throw new IllegalArgumentException("fileName \"" + fileName throw new IllegalArgumentException("fileName \"" + fileName
+ "\" is not a segments file"); + "\" is not a segments file");
} }
} }
} }
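
The generation handling can be checked without a real index; the expected values below follow directly from the code above, with generations encoded in base 36 (Character.MAX_RADIX) after the segments_ prefix.

    System.out.println(LuceneUtil.isSegmentsFile("segments_2"));       // true
    System.out.println(LuceneUtil.isSegmentsFile("segments.gen"));     // false
    System.out.println(LuceneUtil.isSegmentsGenFile("segments.gen"));  // true
    System.out.println(LuceneUtil.generationFromSegmentsFileName("segments"));    // 0
    System.out.println(LuceneUtil.generationFromSegmentsFileName("segments_a"));  // 10 (base 36)
    System.out.println(LuceneUtil.getCurrentSegmentGeneration(
        new String[] { "segments_2", "segments_a", "_0.cfs" }));       // 10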


@ -1,49 +1,49 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import java.io.IOException; import java.io.IOException;
import java.util.List; import java.util.List;
import org.apache.lucene.index.IndexCommitPoint; import org.apache.lucene.index.IndexCommitPoint;
import org.apache.lucene.index.IndexDeletionPolicy; import org.apache.lucene.index.IndexDeletionPolicy;
/** /**
* For mixed directory. Use KeepAllDeletionPolicy for the read-only directory * For mixed directory. Use KeepAllDeletionPolicy for the read-only directory
* (keep all from init) and use KeepOnlyLastCommitDeletionPolicy for the * (keep all from init) and use KeepOnlyLastCommitDeletionPolicy for the
* writable directory (initially empty, keep latest after init). * writable directory (initially empty, keep latest after init).
*/ */
class MixedDeletionPolicy implements IndexDeletionPolicy { class MixedDeletionPolicy implements IndexDeletionPolicy {
private int keepAllFromInit = 0; private int keepAllFromInit = 0;
public void onInit(List commits) throws IOException { public void onInit(List commits) throws IOException {
keepAllFromInit = commits.size(); keepAllFromInit = commits.size();
} }
public void onCommit(List commits) throws IOException { public void onCommit(List commits) throws IOException {
int size = commits.size(); int size = commits.size();
assert (size > keepAllFromInit); assert (size > keepAllFromInit);
// keep all from init and the latest, delete the rest // keep all from init and the latest, delete the rest
for (int i = keepAllFromInit; i < size - 1; i++) { for (int i = keepAllFromInit; i < size - 1; i++) {
((IndexCommitPoint) commits.get(i)).delete(); ((IndexCommitPoint) commits.get(i)).delete();
} }
} }
} }
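
The policy only does something once it is handed to a Lucene IndexWriter. The helper below is a sketch, not part of the patch (the method name openWriter is made up); it mirrors how ShardWriter, later in this patch, picks between the two deletion policies.

    static IndexWriter openWriter(Directory dir, long initGeneration)
        throws IOException {
      IndexDeletionPolicy policy = initGeneration < 0
          ? new KeepOnlyLastCommitDeletionPolicy()   // fresh shard: keep only the last commit
          : new MixedDeletionPolicy();               // existing shard: keep init commits plus the latest
      // null analyzer, as in ShardWriter: only addIndexes() is used, never addDocument()
      return new IndexWriter(dir, false, null, policy);
    }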


@ -1,185 +1,185 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
import org.apache.lucene.store.Directory; import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory; import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.IndexInput; import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput; import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.NoLockFactory; import org.apache.lucene.store.NoLockFactory;
/** /**
* The initial version of an index is stored in a read-only FileSystem dir * The initial version of an index is stored in a read-only FileSystem dir
* (FileSystemDirectory). Index files created by newer versions are written to * (FileSystemDirectory). Index files created by newer versions are written to
* a writable local FS dir (Lucene's FSDirectory). We should use the general * a writable local FS dir (Lucene's FSDirectory). We should use the general
* FileSystemDirectory for the writable dir as well, but we have to use Lucene's * FileSystemDirectory for the writable dir as well, but we have to use Lucene's
* FSDirectory because Lucene currently does random writes and * FSDirectory because Lucene currently does random writes and
* FileSystemDirectory only supports sequential write. * FileSystemDirectory only supports sequential write.
* *
* Note: We may delete files from the read-only FileSystem dir because there * Note: We may delete files from the read-only FileSystem dir because there
* can be some segment files from an uncommitted checkpoint. For the same * can be some segment files from an uncommitted checkpoint. For the same
* reason, we may create files in the writable dir which already exist in the * reason, we may create files in the writable dir which already exist in the
* read-only dir and logically they overwrite the ones in the read-only dir. * read-only dir and logically they overwrite the ones in the read-only dir.
*/ */
class MixedDirectory extends Directory { class MixedDirectory extends Directory {
private final Directory readDir; // FileSystemDirectory private final Directory readDir; // FileSystemDirectory
private final Directory writeDir; // Lucene's FSDirectory private final Directory writeDir; // Lucene's FSDirectory
// take advantage of the fact that Lucene's FSDirectory.fileExists is faster // take advantage of the fact that Lucene's FSDirectory.fileExists is faster
public MixedDirectory(FileSystem readFs, Path readPath, FileSystem writeFs, public MixedDirectory(FileSystem readFs, Path readPath, FileSystem writeFs,
Path writePath, Configuration conf) throws IOException { Path writePath, Configuration conf) throws IOException {
try { try {
readDir = new FileSystemDirectory(readFs, readPath, false, conf); readDir = new FileSystemDirectory(readFs, readPath, false, conf);
// check writeFS is a local FS? // check writeFS is a local FS?
writeDir = FSDirectory.getDirectory(writePath.toString()); writeDir = FSDirectory.getDirectory(writePath.toString());
} catch (IOException e) { } catch (IOException e) {
try { try {
close(); close();
} catch (IOException e1) { } catch (IOException e1) {
// ignore this one, throw the original one // ignore this one, throw the original one
} }
throw e; throw e;
} }
lockFactory = new NoLockFactory(); lockFactory = new NoLockFactory();
} }
// for debugging // for debugging
MixedDirectory(Directory readDir, Directory writeDir) throws IOException { MixedDirectory(Directory readDir, Directory writeDir) throws IOException {
this.readDir = readDir; this.readDir = readDir;
this.writeDir = writeDir; this.writeDir = writeDir;
lockFactory = new NoLockFactory(); lockFactory = new NoLockFactory();
} }
@Override @Override
public String[] list() throws IOException { public String[] list() throws IOException {
String[] readFiles = readDir.list(); String[] readFiles = readDir.list();
String[] writeFiles = writeDir.list(); String[] writeFiles = writeDir.list();
if (readFiles == null || readFiles.length == 0) { if (readFiles == null || readFiles.length == 0) {
return writeFiles; return writeFiles;
} else if (writeFiles == null || writeFiles.length == 0) { } else if (writeFiles == null || writeFiles.length == 0) {
return readFiles; return readFiles;
} else { } else {
String[] result = new String[readFiles.length + writeFiles.length]; String[] result = new String[readFiles.length + writeFiles.length];
System.arraycopy(readFiles, 0, result, 0, readFiles.length); System.arraycopy(readFiles, 0, result, 0, readFiles.length);
System.arraycopy(writeFiles, 0, result, readFiles.length, System.arraycopy(writeFiles, 0, result, readFiles.length,
writeFiles.length); writeFiles.length);
return result; return result;
} }
} }
@Override @Override
public void deleteFile(String name) throws IOException { public void deleteFile(String name) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
writeDir.deleteFile(name); writeDir.deleteFile(name);
} }
if (readDir.fileExists(name)) { if (readDir.fileExists(name)) {
readDir.deleteFile(name); readDir.deleteFile(name);
} }
} }
@Override @Override
public boolean fileExists(String name) throws IOException { public boolean fileExists(String name) throws IOException {
return writeDir.fileExists(name) || readDir.fileExists(name); return writeDir.fileExists(name) || readDir.fileExists(name);
} }
@Override @Override
public long fileLength(String name) throws IOException { public long fileLength(String name) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
return writeDir.fileLength(name); return writeDir.fileLength(name);
} else { } else {
return readDir.fileLength(name); return readDir.fileLength(name);
} }
} }
@Override @Override
public long fileModified(String name) throws IOException { public long fileModified(String name) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
return writeDir.fileModified(name); return writeDir.fileModified(name);
} else { } else {
return readDir.fileModified(name); return readDir.fileModified(name);
} }
} }
@Override @Override
public void renameFile(String from, String to) throws IOException { public void renameFile(String from, String to) throws IOException {
throw new UnsupportedOperationException(); throw new UnsupportedOperationException();
} }
@Override @Override
public void touchFile(String name) throws IOException { public void touchFile(String name) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
writeDir.touchFile(name); writeDir.touchFile(name);
} else { } else {
readDir.touchFile(name); readDir.touchFile(name);
} }
} }
@Override @Override
public IndexOutput createOutput(String name) throws IOException { public IndexOutput createOutput(String name) throws IOException {
return writeDir.createOutput(name); return writeDir.createOutput(name);
} }
@Override @Override
public IndexInput openInput(String name) throws IOException { public IndexInput openInput(String name) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
return writeDir.openInput(name); return writeDir.openInput(name);
} else { } else {
return readDir.openInput(name); return readDir.openInput(name);
} }
} }
@Override @Override
public IndexInput openInput(String name, int bufferSize) throws IOException { public IndexInput openInput(String name, int bufferSize) throws IOException {
if (writeDir.fileExists(name)) { if (writeDir.fileExists(name)) {
return writeDir.openInput(name, bufferSize); return writeDir.openInput(name, bufferSize);
} else { } else {
return readDir.openInput(name, bufferSize); return readDir.openInput(name, bufferSize);
} }
} }
@Override @Override
public void close() throws IOException { public void close() throws IOException {
try { try {
if (readDir != null) { if (readDir != null) {
readDir.close(); readDir.close();
} }
} finally { } finally {
if (writeDir != null) { if (writeDir != null) {
writeDir.close(); writeDir.close();
} }
} }
} }
public String toString() { public String toString() {
return this.getClass().getName() + "@" + readDir + "&" + writeDir; return this.getClass().getName() + "@" + readDir + "&" + writeDir;
} }
} }
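
The package-private debugging constructor makes the read/write layering easy to try out. In the sketch below (not part of the patch, assumed to run in the same package inside a method that may throw IOException) two RAMDirectory instances stand in for the read-only FileSystemDirectory and the writable local FSDirectory, and the file names are arbitrary.

    Directory readDir = new RAMDirectory();
    Directory writeDir = new RAMDirectory();
    IndexOutput shipped = readDir.createOutput("_0.cfs");   // pretend this came with the shard
    shipped.writeByte((byte) 1);
    shipped.close();

    MixedDirectory mixed = new MixedDirectory(readDir, writeDir);
    System.out.println(mixed.fileExists("_0.cfs"));         // true, served from the read side
    IndexOutput fresh = mixed.createOutput("_1.cfs");       // new files always go to the write side
    fresh.writeByte((byte) 2);
    fresh.close();
    System.out.println(mixed.list().length);                // 2: names merged from both sides
    mixed.close();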


@ -1,119 +1,119 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import java.io.DataInput; import java.io.DataInput;
import java.io.DataOutput; import java.io.DataOutput;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Text;
import org.apache.lucene.store.IndexInput; import org.apache.lucene.store.IndexInput;
import org.apache.lucene.store.IndexOutput; import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RAMDirectory; import org.apache.lucene.store.RAMDirectory;
/** /**
* A utility class which writes an index in a ram dir into a DataOutput and * A utility class which writes an index in a ram dir into a DataOutput and
* read from a DataInput an index into a ram dir. * read from a DataInput an index into a ram dir.
*/ */
public class RAMDirectoryUtil { public class RAMDirectoryUtil {
private static final int BUFFER_SIZE = 1024; // RAMOutputStream.BUFFER_SIZE; private static final int BUFFER_SIZE = 1024; // RAMOutputStream.BUFFER_SIZE;
/** /**
* Write a number of files from a ram directory to a data output. * Write a number of files from a ram directory to a data output.
* @param out the data output * @param out the data output
* @param dir the ram directory * @param dir the ram directory
* @param names the names of the files to write * @param names the names of the files to write
* @throws IOException * @throws IOException
*/ */
public static void writeRAMFiles(DataOutput out, RAMDirectory dir, public static void writeRAMFiles(DataOutput out, RAMDirectory dir,
String[] names) throws IOException { String[] names) throws IOException {
out.writeInt(names.length); out.writeInt(names.length);
for (int i = 0; i < names.length; i++) { for (int i = 0; i < names.length; i++) {
Text.writeString(out, names[i]); Text.writeString(out, names[i]);
long length = dir.fileLength(names[i]); long length = dir.fileLength(names[i]);
out.writeLong(length); out.writeLong(length);
if (length > 0) { if (length > 0) {
// can we avoid the extra copy? // can we avoid the extra copy?
IndexInput input = null; IndexInput input = null;
try { try {
input = dir.openInput(names[i], BUFFER_SIZE); input = dir.openInput(names[i], BUFFER_SIZE);
int position = 0; int position = 0;
byte[] buffer = new byte[BUFFER_SIZE]; byte[] buffer = new byte[BUFFER_SIZE];
while (position < length) { while (position < length) {
int len = int len =
position + BUFFER_SIZE <= length ? BUFFER_SIZE position + BUFFER_SIZE <= length ? BUFFER_SIZE
: (int) (length - position); : (int) (length - position);
input.readBytes(buffer, 0, len); input.readBytes(buffer, 0, len);
out.write(buffer, 0, len); out.write(buffer, 0, len);
position += len; position += len;
} }
} finally { } finally {
if (input != null) { if (input != null) {
input.close(); input.close();
} }
} }
} }
} }
} }
/** /**
* Read a number of files from a data input to a ram directory. * Read a number of files from a data input to a ram directory.
* @param in the data input * @param in the data input
* @param dir the ram directory * @param dir the ram directory
* @throws IOException * @throws IOException
*/ */
public static void readRAMFiles(DataInput in, RAMDirectory dir) public static void readRAMFiles(DataInput in, RAMDirectory dir)
throws IOException { throws IOException {
int numFiles = in.readInt(); int numFiles = in.readInt();
for (int i = 0; i < numFiles; i++) { for (int i = 0; i < numFiles; i++) {
String name = Text.readString(in); String name = Text.readString(in);
long length = in.readLong(); long length = in.readLong();
if (length > 0) { if (length > 0) {
// can we avoid the extra copy? // can we avoid the extra copy?
IndexOutput output = null; IndexOutput output = null;
try { try {
output = dir.createOutput(name); output = dir.createOutput(name);
int position = 0; int position = 0;
byte[] buffer = new byte[BUFFER_SIZE]; byte[] buffer = new byte[BUFFER_SIZE];
while (position < length) { while (position < length) {
int len = int len =
position + BUFFER_SIZE <= length ? BUFFER_SIZE position + BUFFER_SIZE <= length ? BUFFER_SIZE
: (int) (length - position); : (int) (length - position);
in.readFully(buffer, 0, len); in.readFully(buffer, 0, len);
output.writeBytes(buffer, 0, len); output.writeBytes(buffer, 0, len);
position += len; position += len;
} }
} finally { } finally {
if (output != null) { if (output != null) {
output.close(); output.close();
} }
} }
} }
} }
} }
} }
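
A round trip through an in-memory byte stream shows the intended use. The sketch is not part of the patch; the file name and contents are arbitrary, and the java.io stream wrappers simply provide the DataOutput/DataInput arguments.

    RAMDirectory src = new RAMDirectory();
    IndexOutput out = src.createOutput("_0.prx");
    out.writeString("hello");
    out.close();

    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    RAMDirectoryUtil.writeRAMFiles(new DataOutputStream(bytes), src,
        new String[] { "_0.prx" });

    RAMDirectory dest = new RAMDirectory();
    RAMDirectoryUtil.readRAMFiles(
        new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())), dest);
    System.out.println(dest.fileLength("_0.prx") == src.fileLength("_0.prx"));  // true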


@ -1,233 +1,233 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.lucene; package org.apache.hadoop.contrib.index.lucene;
import java.io.IOException; import java.io.IOException;
import java.util.Iterator; import java.util.Iterator;
import org.apache.commons.logging.Log; import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory; import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration; import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration;
import org.apache.hadoop.contrib.index.mapred.IntermediateForm; import org.apache.hadoop.contrib.index.mapred.IntermediateForm;
import org.apache.hadoop.contrib.index.mapred.Shard; import org.apache.hadoop.contrib.index.mapred.Shard;
import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter; import org.apache.hadoop.fs.PathFilter;
import org.apache.lucene.index.IndexWriter; import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy; import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.Term; import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory; import org.apache.lucene.store.Directory;
/** /**
* The initial version of an index is stored in the perm dir. Index files * The initial version of an index is stored in the perm dir. Index files
* created by newer versions are written to a temp dir on the local FS. After * created by newer versions are written to a temp dir on the local FS. After
* successfully creating the new version in the temp dir, the shard writer * successfully creating the new version in the temp dir, the shard writer
* moves the new files to the perm dir and deletes the temp dir in close(). * moves the new files to the perm dir and deletes the temp dir in close().
*/ */
public class ShardWriter { public class ShardWriter {
static final Log LOG = LogFactory.getLog(ShardWriter.class); static final Log LOG = LogFactory.getLog(ShardWriter.class);
private final FileSystem fs; private final FileSystem fs;
private final FileSystem localFs; private final FileSystem localFs;
private final Path perm; private final Path perm;
private final Path temp; private final Path temp;
private final Directory dir; private final Directory dir;
private final IndexWriter writer; private final IndexWriter writer;
private int maxNumSegments; private int maxNumSegments;
private long numForms = 0; private long numForms = 0;
/** /**
* Constructor * Constructor
* @param fs * @param fs
* @param shard * @param shard
* @param tempDir * @param tempDir
* @param iconf * @param iconf
* @throws IOException * @throws IOException
*/ */
public ShardWriter(FileSystem fs, Shard shard, String tempDir, public ShardWriter(FileSystem fs, Shard shard, String tempDir,
IndexUpdateConfiguration iconf) throws IOException { IndexUpdateConfiguration iconf) throws IOException {
LOG.info("Construct a shard writer"); LOG.info("Construct a shard writer");
this.fs = fs; this.fs = fs;
localFs = FileSystem.getLocal(iconf.getConfiguration()); localFs = FileSystem.getLocal(iconf.getConfiguration());
perm = new Path(shard.getDirectory()); perm = new Path(shard.getDirectory());
temp = new Path(tempDir); temp = new Path(tempDir);
long initGeneration = shard.getGeneration(); long initGeneration = shard.getGeneration();
if (!fs.exists(perm)) { if (!fs.exists(perm)) {
assert (initGeneration < 0); assert (initGeneration < 0);
fs.mkdirs(perm); fs.mkdirs(perm);
} else { } else {
restoreGeneration(fs, perm, initGeneration); restoreGeneration(fs, perm, initGeneration);
} }
dir = dir =
new MixedDirectory(fs, perm, localFs, fs.startLocalOutput(perm, temp), new MixedDirectory(fs, perm, localFs, fs.startLocalOutput(perm, temp),
iconf.getConfiguration()); iconf.getConfiguration());
// analyzer is null because we only use addIndexes, not addDocument // analyzer is null because we only use addIndexes, not addDocument
writer = writer =
new IndexWriter(dir, false, null, new IndexWriter(dir, false, null,
initGeneration < 0 ? new KeepOnlyLastCommitDeletionPolicy() initGeneration < 0 ? new KeepOnlyLastCommitDeletionPolicy()
: new MixedDeletionPolicy()); : new MixedDeletionPolicy());
setParameters(iconf); setParameters(iconf);
} }
/** /**
* Process an intermediate form by carrying out, on the Lucene instance of * Process an intermediate form by carrying out, on the Lucene instance of
* the shard, the deletes and the inserts (a ram index) in the form. * the shard, the deletes and the inserts (a ram index) in the form.
* @param form the intermediate form containing deletes and a ram index * @param form the intermediate form containing deletes and a ram index
* @throws IOException * @throws IOException
*/ */
public void process(IntermediateForm form) throws IOException { public void process(IntermediateForm form) throws IOException {
// first delete // first delete
Iterator<Term> iter = form.deleteTermIterator(); Iterator<Term> iter = form.deleteTermIterator();
while (iter.hasNext()) { while (iter.hasNext()) {
writer.deleteDocuments(iter.next()); writer.deleteDocuments(iter.next());
} }
// then insert // then insert
writer.addIndexesNoOptimize(new Directory[] { form.getDirectory() }); writer.addIndexesNoOptimize(new Directory[] { form.getDirectory() });
numForms++; numForms++;
} }
/** /**
* Close the shard writer. Optimize the Lucene instance of the shard before * Close the shard writer. Optimize the Lucene instance of the shard before
* closing if necessary, and copy the files created in the temp directory * closing if necessary, and copy the files created in the temp directory
* to the permanent directory after closing. * to the permanent directory after closing.
* @throws IOException * @throws IOException
*/ */
public void close() throws IOException { public void close() throws IOException {
LOG.info("Closing the shard writer, processed " + numForms + " forms"); LOG.info("Closing the shard writer, processed " + numForms + " forms");
try { try {
try { try {
if (maxNumSegments > 0) { if (maxNumSegments > 0) {
writer.optimize(maxNumSegments); writer.optimize(maxNumSegments);
LOG.info("Optimized the shard into at most " + maxNumSegments LOG.info("Optimized the shard into at most " + maxNumSegments
+ " segments"); + " segments");
} }
} finally { } finally {
writer.close(); writer.close();
LOG.info("Closed Lucene index writer"); LOG.info("Closed Lucene index writer");
} }
moveFromTempToPerm(); moveFromTempToPerm();
LOG.info("Moved new index files to " + perm); LOG.info("Moved new index files to " + perm);
} finally { } finally {
dir.close(); dir.close();
LOG.info("Closed the shard writer"); LOG.info("Closed the shard writer");
} }
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Object#toString() * @see java.lang.Object#toString()
*/ */
public String toString() { public String toString() {
return this.getClass().getName() + "@" + perm + "&" + temp; return this.getClass().getName() + "@" + perm + "&" + temp;
} }
private void setParameters(IndexUpdateConfiguration iconf) { private void setParameters(IndexUpdateConfiguration iconf) {
int maxFieldLength = iconf.getIndexMaxFieldLength(); int maxFieldLength = iconf.getIndexMaxFieldLength();
if (maxFieldLength > 0) { if (maxFieldLength > 0) {
writer.setMaxFieldLength(maxFieldLength); writer.setMaxFieldLength(maxFieldLength);
} }
writer.setUseCompoundFile(iconf.getIndexUseCompoundFile()); writer.setUseCompoundFile(iconf.getIndexUseCompoundFile());
maxNumSegments = iconf.getIndexMaxNumSegments(); maxNumSegments = iconf.getIndexMaxNumSegments();
if (maxFieldLength > 0) { if (maxFieldLength > 0) {
LOG.info("sea.max.field.length = " + writer.getMaxFieldLength()); LOG.info("sea.max.field.length = " + writer.getMaxFieldLength());
} }
LOG.info("sea.use.compound.file = " + writer.getUseCompoundFile()); LOG.info("sea.use.compound.file = " + writer.getUseCompoundFile());
LOG.info("sea.max.num.segments = " + maxNumSegments); LOG.info("sea.max.num.segments = " + maxNumSegments);
} }
// in case a previous reduce task fails, restore the generation to // in case a previous reduce task fails, restore the generation to
// the original starting point by deleting the segments.gen file // the original starting point by deleting the segments.gen file
// and the segments_N files whose generations are greater than the // and the segments_N files whose generations are greater than the
// starting generation; rest of the unwanted files will be deleted // starting generation; rest of the unwanted files will be deleted
// once the unwanted segments_N files are deleted // once the unwanted segments_N files are deleted
private void restoreGeneration(FileSystem fs, Path perm, long startGen) private void restoreGeneration(FileSystem fs, Path perm, long startGen)
throws IOException { throws IOException {
FileStatus[] fileStatus = fs.listStatus(perm, new PathFilter() { FileStatus[] fileStatus = fs.listStatus(perm, new PathFilter() {
public boolean accept(Path path) { public boolean accept(Path path) {
return LuceneUtil.isSegmentsFile(path.getName()); return LuceneUtil.isSegmentsFile(path.getName());
} }
}); });
// remove the segments_N files whose generation are greater than // remove the segments_N files whose generation are greater than
// the starting generation // the starting generation
for (int i = 0; i < fileStatus.length; i++) { for (int i = 0; i < fileStatus.length; i++) {
Path path = fileStatus[i].getPath(); Path path = fileStatus[i].getPath();
if (startGen < LuceneUtil.generationFromSegmentsFileName(path.getName())) { if (startGen < LuceneUtil.generationFromSegmentsFileName(path.getName())) {
fs.delete(path, true); fs.delete(path, true);
} }
} }
// always remove segments.gen in case last failed try removed segments_N // always remove segments.gen in case last failed try removed segments_N
// but not segments.gen, and segments.gen will be overwritten anyway. // but not segments.gen, and segments.gen will be overwritten anyway.
Path segmentsGenFile = new Path(LuceneUtil.IndexFileNames.SEGMENTS_GEN); Path segmentsGenFile = new Path(LuceneUtil.IndexFileNames.SEGMENTS_GEN);
if (fs.exists(segmentsGenFile)) { if (fs.exists(segmentsGenFile)) {
fs.delete(segmentsGenFile, true); fs.delete(segmentsGenFile, true);
} }
} }
// move the files created in the temp dir into the perm dir // move the files created in the temp dir into the perm dir
// and then delete the temp dir from the local FS // and then delete the temp dir from the local FS
private void moveFromTempToPerm() throws IOException { private void moveFromTempToPerm() throws IOException {
try { try {
FileStatus[] fileStatus = FileStatus[] fileStatus =
localFs.listStatus(temp, LuceneIndexFileNameFilter.getFilter()); localFs.listStatus(temp, LuceneIndexFileNameFilter.getFilter());
Path segmentsPath = null; Path segmentsPath = null;
Path segmentsGenPath = null; Path segmentsGenPath = null;
// move the files created in temp dir except segments_N and segments.gen // move the files created in temp dir except segments_N and segments.gen
for (int i = 0; i < fileStatus.length; i++) { for (int i = 0; i < fileStatus.length; i++) {
Path path = fileStatus[i].getPath(); Path path = fileStatus[i].getPath();
String name = path.getName(); String name = path.getName();
if (LuceneUtil.isSegmentsGenFile(name)) { if (LuceneUtil.isSegmentsGenFile(name)) {
assert (segmentsGenPath == null); assert (segmentsGenPath == null);
segmentsGenPath = path; segmentsGenPath = path;
} else if (LuceneUtil.isSegmentsFile(name)) { } else if (LuceneUtil.isSegmentsFile(name)) {
assert (segmentsPath == null); assert (segmentsPath == null);
segmentsPath = path; segmentsPath = path;
} else { } else {
fs.completeLocalOutput(new Path(perm, name), path); fs.completeLocalOutput(new Path(perm, name), path);
} }
} }
// move the segments_N file // move the segments_N file
if (segmentsPath != null) { if (segmentsPath != null) {
fs.completeLocalOutput(new Path(perm, segmentsPath.getName()), fs.completeLocalOutput(new Path(perm, segmentsPath.getName()),
segmentsPath); segmentsPath);
} }
// move the segments.gen file // move the segments.gen file
if (segmentsGenPath != null) { if (segmentsGenPath != null) {
fs.completeLocalOutput(new Path(perm, segmentsGenPath.getName()), fs.completeLocalOutput(new Path(perm, segmentsGenPath.getName()),
segmentsGenPath); segmentsGenPath);
} }
} finally { } finally {
// finally delete the temp dir (files should have been deleted) // finally delete the temp dir (files should have been deleted)
localFs.delete(temp, true); localFs.delete(temp, true);
} }
} }
} }
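The recovery rule in restoreGeneration() is easier to see with a concrete case. The sketch below is not part of the patch; it assumes the usual Lucene naming convention, where a segments file "segments_N" encodes its generation N in base 36, which is what LuceneUtil.generationFromSegmentsFileName is expected to parse.

    // Sketch only: a reduce attempt that started at generation 3 failed after writing segments_a.
    public class GenerationExample {
      public static void main(String[] args) {
        long startGen = 3;                                        // generation when the attempt started
        long leftover = Long.parseLong("a", Character.MAX_RADIX); // "segments_a" -> generation 10
        System.out.println(leftover > startGen);                  // true: segments_a (and segments.gen) are deleted
      }
    }

On retry the permanent directory is back at generation 3, so the new attempt starts from the same state as the failed one.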


@ -1,276 +1,276 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.main; package org.apache.hadoop.contrib.index.main;
import java.io.IOException; import java.io.IOException;
import java.text.NumberFormat; import java.text.NumberFormat;
import java.util.Arrays; import java.util.Arrays;
import org.apache.commons.logging.Log; import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory; import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration; import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration;
import org.apache.hadoop.contrib.index.mapred.IIndexUpdater; import org.apache.hadoop.contrib.index.mapred.IIndexUpdater;
import org.apache.hadoop.contrib.index.mapred.Shard; import org.apache.hadoop.contrib.index.mapred.Shard;
import org.apache.hadoop.fs.FileStatus; import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat; import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.FileInputFormat; import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.util.ReflectionUtils; import org.apache.hadoop.util.ReflectionUtils;
/** /**
* A distributed "index" is partitioned into "shards". Each shard corresponds * A distributed "index" is partitioned into "shards". Each shard corresponds
* to a Lucene instance. This class contains the main() method which uses a * to a Lucene instance. This class contains the main() method which uses a
* Map/Reduce job to analyze documents and update Lucene instances in parallel. * Map/Reduce job to analyze documents and update Lucene instances in parallel.
* *
* The main() method in UpdateIndex requires the following information for * The main() method in UpdateIndex requires the following information for
* updating the shards: * updating the shards:
* - Input formatter. This specifies how to format the input documents. * - Input formatter. This specifies how to format the input documents.
* - Analysis. This defines the analyzer to use on the input. The analyzer * - Analysis. This defines the analyzer to use on the input. The analyzer
* determines whether a document is being inserted, updated, or deleted. * determines whether a document is being inserted, updated, or deleted.
* For inserts or updates, the analyzer also converts each input document * For inserts or updates, the analyzer also converts each input document
* into a Lucene document. * into a Lucene document.
* - Input paths. This provides the location(s) of updated documents, * - Input paths. This provides the location(s) of updated documents,
* e.g., HDFS files or directories, or HBase tables. * e.g., HDFS files or directories, or HBase tables.
* - Shard paths, or index path with the number of shards. Either specify * - Shard paths, or index path with the number of shards. Either specify
* the path for each shard, or specify an index path and the shards are * the path for each shard, or specify an index path and the shards are
* the sub-directories of the index directory. * the sub-directories of the index directory.
* - Output path. When the update to a shard is done, a message is put here. * - Output path. When the update to a shard is done, a message is put here.
* - Number of map tasks. * - Number of map tasks.
* *
* All of the information can be specified in a configuration file. All but * All of the information can be specified in a configuration file. All but
* the first two can also be specified as command line options. Check out * the first two can also be specified as command line options. Check out
* conf/index-config.xml.template for other configurable parameters. * conf/index-config.xml.template for other configurable parameters.
* *
* Note: Because of the parallel nature of Map/Reduce, the behaviour of * Note: Because of the parallel nature of Map/Reduce, the behaviour of
* multiple inserts, deletes or updates to the same document is undefined. * multiple inserts, deletes or updates to the same document is undefined.
*/ */
public class UpdateIndex { public class UpdateIndex {
public static final Log LOG = LogFactory.getLog(UpdateIndex.class); public static final Log LOG = LogFactory.getLog(UpdateIndex.class);
private static final NumberFormat NUMBER_FORMAT = NumberFormat.getInstance(); private static final NumberFormat NUMBER_FORMAT = NumberFormat.getInstance();
static { static {
NUMBER_FORMAT.setMinimumIntegerDigits(5); NUMBER_FORMAT.setMinimumIntegerDigits(5);
NUMBER_FORMAT.setGroupingUsed(false); NUMBER_FORMAT.setGroupingUsed(false);
} }
private static long now() { private static long now() {
return System.currentTimeMillis(); return System.currentTimeMillis();
} }
private static void printUsage(String cmd) { private static void printUsage(String cmd) {
System.err.println("Usage: java " + UpdateIndex.class.getName() + "\n" System.err.println("Usage: java " + UpdateIndex.class.getName() + "\n"
+ " -inputPaths <inputPath,inputPath>\n" + " -inputPaths <inputPath,inputPath>\n"
+ " -outputPath <outputPath>\n" + " -outputPath <outputPath>\n"
+ " -shards <shardDir,shardDir>\n" + " -shards <shardDir,shardDir>\n"
+ " -indexPath <indexPath>\n" + " -indexPath <indexPath>\n"
+ " -numShards <num>\n" + " -numShards <num>\n"
+ " -numMapTasks <num>\n" + " -numMapTasks <num>\n"
+ " -conf <confPath>\n" + " -conf <confPath>\n"
+ "Note: Do not use both -shards option and -indexPath option."); + "Note: Do not use both -shards option and -indexPath option.");
} }
private static String getIndexPath(Configuration conf) { private static String getIndexPath(Configuration conf) {
return conf.get("sea.index.path"); return conf.get("sea.index.path");
} }
private static int getNumShards(Configuration conf) { private static int getNumShards(Configuration conf) {
return conf.getInt("sea.num.shards", 1); return conf.getInt("sea.num.shards", 1);
} }
private static Shard[] createShards(String indexPath, int numShards, private static Shard[] createShards(String indexPath, int numShards,
Configuration conf) throws IOException { Configuration conf) throws IOException {
String parent = Shard.normalizePath(indexPath) + Path.SEPARATOR; String parent = Shard.normalizePath(indexPath) + Path.SEPARATOR;
long versionNumber = -1; long versionNumber = -1;
long generation = -1; long generation = -1;
FileSystem fs = FileSystem.get(conf); FileSystem fs = FileSystem.get(conf);
Path path = new Path(indexPath); Path path = new Path(indexPath);
if (fs.exists(path)) { if (fs.exists(path)) {
FileStatus[] fileStatus = fs.listStatus(path); FileStatus[] fileStatus = fs.listStatus(path);
String[] shardNames = new String[fileStatus.length]; String[] shardNames = new String[fileStatus.length];
int count = 0; int count = 0;
for (int i = 0; i < fileStatus.length; i++) { for (int i = 0; i < fileStatus.length; i++) {
if (fileStatus[i].isDirectory()) { if (fileStatus[i].isDirectory()) {
shardNames[count] = fileStatus[i].getPath().getName(); shardNames[count] = fileStatus[i].getPath().getName();
count++; count++;
} }
} }
Arrays.sort(shardNames, 0, count); Arrays.sort(shardNames, 0, count);
Shard[] shards = new Shard[count >= numShards ? count : numShards]; Shard[] shards = new Shard[count >= numShards ? count : numShards];
for (int i = 0; i < count; i++) { for (int i = 0; i < count; i++) {
shards[i] = shards[i] =
new Shard(versionNumber, parent + shardNames[i], generation); new Shard(versionNumber, parent + shardNames[i], generation);
} }
int number = count; int number = count;
for (int i = count; i < numShards; i++) { for (int i = count; i < numShards; i++) {
String shardPath; String shardPath;
while (true) { while (true) {
shardPath = parent + NUMBER_FORMAT.format(number++); shardPath = parent + NUMBER_FORMAT.format(number++);
if (!fs.exists(new Path(shardPath))) { if (!fs.exists(new Path(shardPath))) {
break; break;
} }
} }
shards[i] = new Shard(versionNumber, shardPath, generation); shards[i] = new Shard(versionNumber, shardPath, generation);
} }
return shards; return shards;
} else { } else {
Shard[] shards = new Shard[numShards]; Shard[] shards = new Shard[numShards];
for (int i = 0; i < shards.length; i++) { for (int i = 0; i < shards.length; i++) {
shards[i] = shards[i] =
new Shard(versionNumber, parent + NUMBER_FORMAT.format(i), new Shard(versionNumber, parent + NUMBER_FORMAT.format(i),
generation); generation);
} }
return shards; return shards;
} }
} }
/** /**
* The main() method * The main() method
* @param argv * @param argv
*/ */
public static void main(String[] argv) { public static void main(String[] argv) {
if (argv.length == 0) { if (argv.length == 0) {
printUsage(""); printUsage("");
System.exit(-1); System.exit(-1);
} }
String inputPathsString = null; String inputPathsString = null;
Path outputPath = null; Path outputPath = null;
String shardsString = null; String shardsString = null;
String indexPath = null; String indexPath = null;
int numShards = -1; int numShards = -1;
int numMapTasks = -1; int numMapTasks = -1;
Configuration conf = new Configuration(); Configuration conf = new Configuration();
String confPath = null; String confPath = null;
// parse the command line // parse the command line
for (int i = 0; i < argv.length; i++) { // parse command line for (int i = 0; i < argv.length; i++) { // parse command line
if (argv[i].equals("-inputPaths")) { if (argv[i].equals("-inputPaths")) {
inputPathsString = argv[++i]; inputPathsString = argv[++i];
} else if (argv[i].equals("-outputPath")) { } else if (argv[i].equals("-outputPath")) {
outputPath = new Path(argv[++i]); outputPath = new Path(argv[++i]);
} else if (argv[i].equals("-shards")) { } else if (argv[i].equals("-shards")) {
shardsString = argv[++i]; shardsString = argv[++i];
} else if (argv[i].equals("-indexPath")) { } else if (argv[i].equals("-indexPath")) {
indexPath = argv[++i]; indexPath = argv[++i];
} else if (argv[i].equals("-numShards")) { } else if (argv[i].equals("-numShards")) {
numShards = Integer.parseInt(argv[++i]); numShards = Integer.parseInt(argv[++i]);
} else if (argv[i].equals("-numMapTasks")) { } else if (argv[i].equals("-numMapTasks")) {
numMapTasks = Integer.parseInt(argv[++i]); numMapTasks = Integer.parseInt(argv[++i]);
} else if (argv[i].equals("-conf")) { } else if (argv[i].equals("-conf")) {
// add as a local FS resource // add as a local FS resource
confPath = argv[++i]; confPath = argv[++i];
conf.addResource(new Path(confPath)); conf.addResource(new Path(confPath));
} else { } else {
System.out.println("Unknown option " + argv[i] + " w/ value " System.out.println("Unknown option " + argv[i] + " w/ value "
+ argv[++i]); + argv[++i]);
} }
} }
LOG.info("inputPaths = " + inputPathsString); LOG.info("inputPaths = " + inputPathsString);
LOG.info("outputPath = " + outputPath); LOG.info("outputPath = " + outputPath);
LOG.info("shards = " + shardsString); LOG.info("shards = " + shardsString);
LOG.info("indexPath = " + indexPath); LOG.info("indexPath = " + indexPath);
LOG.info("numShards = " + numShards); LOG.info("numShards = " + numShards);
LOG.info("numMapTasks= " + numMapTasks); LOG.info("numMapTasks= " + numMapTasks);
LOG.info("confPath = " + confPath); LOG.info("confPath = " + confPath);
Path[] inputPaths = null; Path[] inputPaths = null;
Shard[] shards = null; Shard[] shards = null;
JobConf jobConf = new JobConf(conf); JobConf jobConf = new JobConf(conf);
IndexUpdateConfiguration iconf = new IndexUpdateConfiguration(jobConf); IndexUpdateConfiguration iconf = new IndexUpdateConfiguration(jobConf);
if (inputPathsString != null) { if (inputPathsString != null) {
jobConf.set(org.apache.hadoop.mapreduce.lib.input. jobConf.set(org.apache.hadoop.mapreduce.lib.input.
FileInputFormat.INPUT_DIR, inputPathsString); FileInputFormat.INPUT_DIR, inputPathsString);
} }
inputPaths = FileInputFormat.getInputPaths(jobConf); inputPaths = FileInputFormat.getInputPaths(jobConf);
if (inputPaths.length == 0) { if (inputPaths.length == 0) {
inputPaths = null; inputPaths = null;
} }
if (outputPath == null) { if (outputPath == null) {
outputPath = FileOutputFormat.getOutputPath(jobConf); outputPath = FileOutputFormat.getOutputPath(jobConf);
} }
if (inputPaths == null || outputPath == null) { if (inputPaths == null || outputPath == null) {
System.err.println("InputPaths and outputPath must be specified."); System.err.println("InputPaths and outputPath must be specified.");
printUsage(""); printUsage("");
System.exit(-1); System.exit(-1);
} }
if (shardsString != null) { if (shardsString != null) {
iconf.setIndexShards(shardsString); iconf.setIndexShards(shardsString);
} }
shards = Shard.getIndexShards(iconf); shards = Shard.getIndexShards(iconf);
if (shards != null && shards.length == 0) { if (shards != null && shards.length == 0) {
shards = null; shards = null;
} }
if (indexPath == null) { if (indexPath == null) {
indexPath = getIndexPath(conf); indexPath = getIndexPath(conf);
} }
if (numShards <= 0) { if (numShards <= 0) {
numShards = getNumShards(conf); numShards = getNumShards(conf);
} }
if (shards == null && indexPath == null) { if (shards == null && indexPath == null) {
System.err.println("Either shards or indexPath must be specified."); System.err.println("Either shards or indexPath must be specified.");
printUsage(""); printUsage("");
System.exit(-1); System.exit(-1);
} }
if (numMapTasks <= 0) { if (numMapTasks <= 0) {
numMapTasks = jobConf.getNumMapTasks(); numMapTasks = jobConf.getNumMapTasks();
} }
try { try {
// create shards and set their directories if necessary // create shards and set their directories if necessary
if (shards == null) { if (shards == null) {
shards = createShards(indexPath, numShards, conf); shards = createShards(indexPath, numShards, conf);
} }
long startTime = now(); long startTime = now();
try { try {
IIndexUpdater updater = IIndexUpdater updater =
(IIndexUpdater) ReflectionUtils.newInstance( (IIndexUpdater) ReflectionUtils.newInstance(
iconf.getIndexUpdaterClass(), conf); iconf.getIndexUpdaterClass(), conf);
LOG.info("sea.index.updater = " LOG.info("sea.index.updater = "
+ iconf.getIndexUpdaterClass().getName()); + iconf.getIndexUpdaterClass().getName());
updater.run(conf, inputPaths, outputPath, numMapTasks, shards); updater.run(conf, inputPaths, outputPath, numMapTasks, shards);
LOG.info("Index update job is done"); LOG.info("Index update job is done");
} finally { } finally {
long elapsedTime = now() - startTime; long elapsedTime = now() - startTime;
LOG.info("Elapsed time is " + (elapsedTime / 1000) + "s"); LOG.info("Elapsed time is " + (elapsedTime / 1000) + "s");
System.out.println("Elapsed time is " + (elapsedTime / 1000) + "s"); System.out.println("Elapsed time is " + (elapsedTime / 1000) + "s");
} }
} catch (Exception e) { } catch (Exception e) {
e.printStackTrace(System.err); e.printStackTrace(System.err);
} }
} }
} }
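For reference, a concrete invocation matching the printUsage() template above; the paths and counts are placeholders, and -shards is omitted because it must not be combined with -indexPath:

    java org.apache.hadoop.contrib.index.main.UpdateIndex \
      -inputPaths /user/alice/docs1,/user/alice/docs2 \
      -outputPath /user/alice/index-out \
      -indexPath /user/alice/index \
      -numShards 4 \
      -numMapTasks 8 \
      -conf conf/index-config.xml

Anything not given on the command line falls back to the configuration file or the job defaults, as the option handling in main() above shows.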


@ -1,208 +1,208 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import java.io.DataInput; import java.io.DataInput;
import java.io.DataOutput; import java.io.DataOutput;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.io.Writable; import org.apache.hadoop.io.Writable;
import org.apache.lucene.document.Document; import org.apache.lucene.document.Document;
import org.apache.lucene.index.Term; import org.apache.lucene.index.Term;
/** /**
* This class represents an indexing operation. The operation can be an insert, * This class represents an indexing operation. The operation can be an insert,
* a delete or an update. If the operation is an insert or an update, a (new) * a delete or an update. If the operation is an insert or an update, a (new)
* document must be specified. If the operation is a delete or an update, a * document must be specified. If the operation is a delete or an update, a
* delete term must be specified. * delete term must be specified.
*/ */
public class DocumentAndOp implements Writable { public class DocumentAndOp implements Writable {
/** /**
* This class represents the type of an operation - an insert, a delete or * This class represents the type of an operation - an insert, a delete or
* an update. * an update.
*/ */
public static final class Op { public static final class Op {
public static final Op INSERT = new Op("INSERT"); public static final Op INSERT = new Op("INSERT");
public static final Op DELETE = new Op("DELETE"); public static final Op DELETE = new Op("DELETE");
public static final Op UPDATE = new Op("UPDATE"); public static final Op UPDATE = new Op("UPDATE");
private String name; private String name;
private Op(String name) { private Op(String name) {
this.name = name; this.name = name;
} }
public String toString() { public String toString() {
return name; return name;
} }
} }
private Op op; private Op op;
private Document doc; private Document doc;
private Term term; private Term term;
/** /**
* Constructor for no operation. * Constructor for no operation.
*/ */
public DocumentAndOp() { public DocumentAndOp() {
} }
/** /**
* Constructor for an insert operation. * Constructor for an insert operation.
* @param op * @param op
* @param doc * @param doc
*/ */
public DocumentAndOp(Op op, Document doc) { public DocumentAndOp(Op op, Document doc) {
assert (op == Op.INSERT); assert (op == Op.INSERT);
this.op = op; this.op = op;
this.doc = doc; this.doc = doc;
this.term = null; this.term = null;
} }
/** /**
* Constructor for a delete operation. * Constructor for a delete operation.
* @param op * @param op
* @param term * @param term
*/ */
public DocumentAndOp(Op op, Term term) { public DocumentAndOp(Op op, Term term) {
assert (op == Op.DELETE); assert (op == Op.DELETE);
this.op = op; this.op = op;
this.doc = null; this.doc = null;
this.term = term; this.term = term;
} }
/** /**
* Constructor for an insert, a delete or an update operation. * Constructor for an insert, a delete or an update operation.
* @param op * @param op
* @param doc * @param doc
* @param term * @param term
*/ */
public DocumentAndOp(Op op, Document doc, Term term) { public DocumentAndOp(Op op, Document doc, Term term) {
if (op == Op.INSERT) { if (op == Op.INSERT) {
assert (doc != null); assert (doc != null);
assert (term == null); assert (term == null);
} else if (op == Op.DELETE) { } else if (op == Op.DELETE) {
assert (doc == null); assert (doc == null);
assert (term != null); assert (term != null);
} else { } else {
assert (op == Op.UPDATE); assert (op == Op.UPDATE);
assert (doc != null); assert (doc != null);
assert (term != null); assert (term != null);
} }
this.op = op; this.op = op;
this.doc = doc; this.doc = doc;
this.term = term; this.term = term;
} }
/** /**
* Set the instance to be an insert operation. * Set the instance to be an insert operation.
* @param doc * @param doc
*/ */
public void setInsert(Document doc) { public void setInsert(Document doc) {
this.op = Op.INSERT; this.op = Op.INSERT;
this.doc = doc; this.doc = doc;
this.term = null; this.term = null;
} }
/** /**
* Set the instance to be a delete operation. * Set the instance to be a delete operation.
* @param term * @param term
*/ */
public void setDelete(Term term) { public void setDelete(Term term) {
this.op = Op.DELETE; this.op = Op.DELETE;
this.doc = null; this.doc = null;
this.term = term; this.term = term;
} }
/** /**
* Set the instance to be an update operation. * Set the instance to be an update operation.
* @param doc * @param doc
* @param term * @param term
*/ */
public void setUpdate(Document doc, Term term) { public void setUpdate(Document doc, Term term) {
this.op = Op.UPDATE; this.op = Op.UPDATE;
this.doc = doc; this.doc = doc;
this.term = term; this.term = term;
} }
/** /**
* Get the type of operation. * Get the type of operation.
* @return the type of the operation. * @return the type of the operation.
*/ */
public Op getOp() { public Op getOp() {
return op; return op;
} }
/** /**
* Get the document. * Get the document.
* @return the document * @return the document
*/ */
public Document getDocument() { public Document getDocument() {
return doc; return doc;
} }
/** /**
* Get the term. * Get the term.
* @return the term * @return the term
*/ */
public Term getTerm() { public Term getTerm() {
return term; return term;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Object#toString() * @see java.lang.Object#toString()
*/ */
public String toString() { public String toString() {
StringBuilder buffer = new StringBuilder(); StringBuilder buffer = new StringBuilder();
buffer.append(this.getClass().getName()); buffer.append(this.getClass().getName());
buffer.append("[op="); buffer.append("[op=");
buffer.append(op); buffer.append(op);
buffer.append(", doc="); buffer.append(", doc=");
if (doc != null) { if (doc != null) {
buffer.append(doc); buffer.append(doc);
} else { } else {
buffer.append("null"); buffer.append("null");
} }
buffer.append(", term="); buffer.append(", term=");
if (term != null) { if (term != null) {
buffer.append(term); buffer.append(term);
} else { } else {
buffer.append("null"); buffer.append("null");
} }
buffer.append("]"); buffer.append("]");
return buffer.toString(); return buffer.toString();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#write(java.io.DataOutput) * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
*/ */
public void write(DataOutput out) throws IOException { public void write(DataOutput out) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".write should never be called"); + ".write should never be called");
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput) * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
*/ */
public void readFields(DataInput in) throws IOException { public void readFields(DataInput in) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".readFields should never be called"); + ".readFields should never be called");
} }
} }
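A minimal sketch (not from the patch) of how the three operation kinds are built with the constructors above; the "id" field name is illustrative, and the Lucene Document is left empty because its field layout is application specific:

    // Sketch only: one object per operation kind.
    import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.Term;

    public class DocumentAndOpExample {
      public static void main(String[] args) {
        Document doc = new Document();       // fields would be added by local analysis
        Term id = new Term("id", "doc-42");  // the delete term identifies the document

        DocumentAndOp insert = new DocumentAndOp(DocumentAndOp.Op.INSERT, doc);
        DocumentAndOp delete = new DocumentAndOp(DocumentAndOp.Op.DELETE, id);
        DocumentAndOp update = new DocumentAndOp(DocumentAndOp.Op.UPDATE, doc, id);

        System.out.println(insert + "\n" + delete + "\n" + update);
      }
    }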


@ -1,89 +1,89 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import java.io.DataInput; import java.io.DataInput;
import java.io.DataOutput; import java.io.DataOutput;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.io.Text; import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable; import org.apache.hadoop.io.WritableComparable;
/** /**
* The class represents a document id, which is of type text. * The class represents a document id, which is of type text.
*/ */
public class DocumentID implements WritableComparable { public class DocumentID implements WritableComparable {
private final Text docID; private final Text docID;
/** /**
* Constructor. * Constructor.
*/ */
public DocumentID() { public DocumentID() {
docID = new Text(); docID = new Text();
} }
/** /**
* The text of the document id. * The text of the document id.
* @return the text * @return the text
*/ */
public Text getText() { public Text getText() {
return docID; return docID;
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Comparable#compareTo(java.lang.Object) * @see java.lang.Comparable#compareTo(java.lang.Object)
*/ */
public int compareTo(Object obj) { public int compareTo(Object obj) {
if (this == obj) { if (this == obj) {
return 0; return 0;
} else { } else {
return docID.compareTo(((DocumentID) obj).docID); return docID.compareTo(((DocumentID) obj).docID);
} }
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Object#hashCode() * @see java.lang.Object#hashCode()
*/ */
public int hashCode() { public int hashCode() {
return docID.hashCode(); return docID.hashCode();
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see java.lang.Object#toString() * @see java.lang.Object#toString()
*/ */
public String toString() { public String toString() {
return this.getClass().getName() + "[" + docID + "]"; return this.getClass().getName() + "[" + docID + "]";
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#write(java.io.DataOutput) * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
*/ */
public void write(DataOutput out) throws IOException { public void write(DataOutput out) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".write should never be called"); + ".write should never be called");
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput) * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
*/ */
public void readFields(DataInput in) throws IOException { public void readFields(DataInput in) throws IOException {
throw new IOException(this.getClass().getName() throw new IOException(this.getClass().getName()
+ ".readFields should never be called"); + ".readFields should never be called");
} }
} }


@ -1,50 +1,50 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
/** /**
* A distribution policy decides, given a document with a document id, which * A distribution policy decides, given a document with a document id, which
* one shard the request should be sent to if the request is an insert, and * one shard the request should be sent to if the request is an insert, and
* which shard(s) the request should be sent to if the request is a delete. * which shard(s) the request should be sent to if the request is a delete.
*/ */
public interface IDistributionPolicy { public interface IDistributionPolicy {
/** /**
* Initialization. It must be called before any chooseShard() is called. * Initialization. It must be called before any chooseShard() is called.
* @param shards * @param shards
*/ */
void init(Shard[] shards); void init(Shard[] shards);
/** /**
* Choose a shard to send an insert request. * Choose a shard to send an insert request.
* @param key * @param key
* @return the index of the chosen shard * @return the index of the chosen shard
*/ */
int chooseShardForInsert(DocumentID key); int chooseShardForInsert(DocumentID key);
/** /**
* Choose a shard or all shards to send a delete request. E.g. a round-robin * Choose a shard or all shards to send a delete request. E.g. a round-robin
* distribution policy would send a delete request to all the shards. * distribution policy would send a delete request to all the shards.
* -1 represents all the shards. * -1 represents all the shards.
* @param key * @param key
* @return the index of the chosen shard, -1 if all the shards are chosen * @return the index of the chosen shard, -1 if all the shards are chosen
*/ */
int chooseShardForDelete(DocumentID key); int chooseShardForDelete(DocumentID key);
} }
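To make the contract concrete, here is a minimal sketch of an implementation (it is not the HashingDistributionPolicy shipped with the contrib): a deterministic hash policy can route a delete to the single shard that received the insert, so it never needs the -1 "all shards" answer that a round-robin policy would return.

    // Sketch only: modulo-hash routing over the shard array passed to init().
    import org.apache.hadoop.contrib.index.mapred.DocumentID;
    import org.apache.hadoop.contrib.index.mapred.IDistributionPolicy;
    import org.apache.hadoop.contrib.index.mapred.Shard;

    public class ModuloDistributionPolicy implements IDistributionPolicy {
      private int numShards;

      public void init(Shard[] shards) {
        numShards = shards.length;
      }

      public int chooseShardForInsert(DocumentID key) {
        int hash = key.getText().hashCode();
        return (hash & Integer.MAX_VALUE) % numShards;  // non-negative shard index
      }

      public int chooseShardForDelete(DocumentID key) {
        return chooseShardForInsert(key);               // same shard as the insert
      }
    }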


@ -1,46 +1,46 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import java.io.IOException; import java.io.IOException;
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path; import org.apache.hadoop.fs.Path;
/** /**
 * A class that implements the index updater interface should create a Map/Reduce job * A class that implements the index updater interface should create a Map/Reduce job
* configuration and run the Map/Reduce job to analyze documents and update * configuration and run the Map/Reduce job to analyze documents and update
* Lucene instances in parallel. * Lucene instances in parallel.
*/ */
public interface IIndexUpdater { public interface IIndexUpdater {
/** /**
* Create a Map/Reduce job configuration and run the Map/Reduce job to * Create a Map/Reduce job configuration and run the Map/Reduce job to
* analyze documents and update Lucene instances in parallel. * analyze documents and update Lucene instances in parallel.
* @param conf * @param conf
* @param inputPaths * @param inputPaths
* @param outputPath * @param outputPath
* @param numMapTasks * @param numMapTasks
* @param shards * @param shards
* @throws IOException * @throws IOException
*/ */
void run(Configuration conf, Path[] inputPaths, Path outputPath, void run(Configuration conf, Path[] inputPaths, Path outputPath,
int numMapTasks, Shard[] shards) throws IOException; int numMapTasks, Shard[] shards) throws IOException;
} }


@ -1,32 +1,32 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import org.apache.hadoop.io.Writable; import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable; import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.Mapper; import org.apache.hadoop.mapred.Mapper;
/** /**
* Application specific local analysis. The output type must be (DocumentID, * Application specific local analysis. The output type must be (DocumentID,
* DocumentAndOp). * DocumentAndOp).
*/ */
public interface ILocalAnalysis<K extends WritableComparable, V extends Writable> public interface ILocalAnalysis<K extends WritableComparable, V extends Writable>
extends Mapper<K, V, DocumentID, DocumentAndOp> { extends Mapper<K, V, DocumentID, DocumentAndOp> {
} }
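Since the interface body is empty, a sketch helps show what an implementation has to provide: the old mapred Mapper contract plus configure() and close(). Everything below is illustrative, including the LongWritable/Text input types and the "id" field name; it simply turns each input line into a delete request.

    // Sketch only: a trivial local analysis that emits one DELETE per input line.
    import java.io.IOException;

    import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
    import org.apache.hadoop.contrib.index.mapred.DocumentID;
    import org.apache.hadoop.contrib.index.mapred.ILocalAnalysis;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.lucene.index.Term;

    public class DeleteOnlyLocalAnalysis implements ILocalAnalysis<LongWritable, Text> {

      public void map(LongWritable offset, Text line,
          OutputCollector<DocumentID, DocumentAndOp> output, Reporter reporter)
          throws IOException {
        DocumentID key = new DocumentID();
        key.getText().set(line);   // document id = the input line
        output.collect(key,
            new DocumentAndOp(DocumentAndOp.Op.DELETE, new Term("id", line.toString())));
      }

      public void configure(JobConf job) {
        // no job-level setup needed for this sketch
      }

      public void close() throws IOException {
        // nothing to release
      }
    }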


@ -1,111 +1,111 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import java.io.IOException; import java.io.IOException;
import java.util.Iterator; import java.util.Iterator;
import org.apache.commons.logging.Log; import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory; import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.mapred.JobConf; import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase; import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector; import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer; import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter; import org.apache.hadoop.mapred.Reporter;
/** /**
* This combiner combines multiple intermediate forms into one intermediate * This combiner combines multiple intermediate forms into one intermediate
* form. More specifically, the input intermediate forms are a single-document * form. More specifically, the input intermediate forms are a single-document
* ram index and/or a single delete term. An output intermediate form contains * ram index and/or a single delete term. An output intermediate form contains
* a multi-document ram index and/or multiple delete terms. * a multi-document ram index and/or multiple delete terms.
*/ */
public class IndexUpdateCombiner extends MapReduceBase implements public class IndexUpdateCombiner extends MapReduceBase implements
Reducer<Shard, IntermediateForm, Shard, IntermediateForm> { Reducer<Shard, IntermediateForm, Shard, IntermediateForm> {
static final Log LOG = LogFactory.getLog(IndexUpdateCombiner.class); static final Log LOG = LogFactory.getLog(IndexUpdateCombiner.class);
IndexUpdateConfiguration iconf; IndexUpdateConfiguration iconf;
long maxSizeInBytes; long maxSizeInBytes;
long nearMaxSizeInBytes; long nearMaxSizeInBytes;
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.Reducer#reduce(java.lang.Object, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter) * @see org.apache.hadoop.mapred.Reducer#reduce(java.lang.Object, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
*/ */
public void reduce(Shard key, Iterator<IntermediateForm> values, public void reduce(Shard key, Iterator<IntermediateForm> values,
OutputCollector<Shard, IntermediateForm> output, Reporter reporter) OutputCollector<Shard, IntermediateForm> output, Reporter reporter)
throws IOException { throws IOException {
String message = key.toString(); String message = key.toString();
IntermediateForm form = null; IntermediateForm form = null;
while (values.hasNext()) { while (values.hasNext()) {
IntermediateForm singleDocForm = values.next(); IntermediateForm singleDocForm = values.next();
long formSize = form == null ? 0 : form.totalSizeInBytes(); long formSize = form == null ? 0 : form.totalSizeInBytes();
long singleDocFormSize = singleDocForm.totalSizeInBytes(); long singleDocFormSize = singleDocForm.totalSizeInBytes();
if (form != null && formSize + singleDocFormSize > maxSizeInBytes) { if (form != null && formSize + singleDocFormSize > maxSizeInBytes) {
closeForm(form, message); closeForm(form, message);
output.collect(key, form); output.collect(key, form);
form = null; form = null;
} }
if (form == null && singleDocFormSize >= nearMaxSizeInBytes) { if (form == null && singleDocFormSize >= nearMaxSizeInBytes) {
output.collect(key, singleDocForm); output.collect(key, singleDocForm);
} else { } else {
if (form == null) { if (form == null) {
form = createForm(message); form = createForm(message);
} }
form.process(singleDocForm); form.process(singleDocForm);
} }
} }
if (form != null) { if (form != null) {
closeForm(form, message); closeForm(form, message);
output.collect(key, form); output.collect(key, form);
} }
} }
private IntermediateForm createForm(String message) throws IOException { private IntermediateForm createForm(String message) throws IOException {
LOG.info("Construct a form writer for " + message); LOG.info("Construct a form writer for " + message);
IntermediateForm form = new IntermediateForm(); IntermediateForm form = new IntermediateForm();
form.configure(iconf); form.configure(iconf);
return form; return form;
} }
private void closeForm(IntermediateForm form, String message) private void closeForm(IntermediateForm form, String message)
throws IOException { throws IOException {
form.closeWriter(); form.closeWriter();
LOG.info("Closed the form writer for " + message + ", form = " + form); LOG.info("Closed the form writer for " + message + ", form = " + form);
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.MapReduceBase#configure(org.apache.hadoop.mapred.JobConf) * @see org.apache.hadoop.mapred.MapReduceBase#configure(org.apache.hadoop.mapred.JobConf)
*/ */
public void configure(JobConf job) { public void configure(JobConf job) {
iconf = new IndexUpdateConfiguration(job); iconf = new IndexUpdateConfiguration(job);
maxSizeInBytes = iconf.getMaxRAMSizeInBytes(); maxSizeInBytes = iconf.getMaxRAMSizeInBytes();
nearMaxSizeInBytes = maxSizeInBytes - (maxSizeInBytes >>> 3); // 7/8 of max nearMaxSizeInBytes = maxSizeInBytes - (maxSizeInBytes >>> 3); // 7/8 of max
} }
/* (non-Javadoc) /* (non-Javadoc)
* @see org.apache.hadoop.mapred.MapReduceBase#close() * @see org.apache.hadoop.mapred.MapReduceBase#close()
*/ */
public void close() throws IOException { public void close() throws IOException {
} }
} }
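A combiner only runs if the job registers it. The actual wiring lives in IndexUpdater, which is not part of this hunk, so the snippet below is a sketch of how the old mapred API would hook it up rather than a copy of that code:

    // Sketch only: registering the combiner and the map output types on a JobConf.
    import org.apache.hadoop.contrib.index.mapred.IndexUpdateCombiner;
    import org.apache.hadoop.contrib.index.mapred.IntermediateForm;
    import org.apache.hadoop.contrib.index.mapred.Shard;
    import org.apache.hadoop.mapred.JobConf;

    public class CombinerWiringExample {
      public static void main(String[] args) {
        JobConf job = new JobConf();
        job.setMapOutputKeyClass(Shard.class);              // keys are target shards
        job.setMapOutputValueClass(IntermediateForm.class); // values are intermediate forms
        job.setCombinerClass(IndexUpdateCombiner.class);    // merge forms on the map side
      }
    }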


@ -1,256 +1,256 @@
/** /**
* Licensed to the Apache Software Foundation (ASF) under one * Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file * or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information * distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file * regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the * to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance * "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at * with the License. You may obtain a copy of the License at
* *
* http://www.apache.org/licenses/LICENSE-2.0 * http://www.apache.org/licenses/LICENSE-2.0
* *
* Unless required by applicable law or agreed to in writing, software * Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS, * distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and * See the License for the specific language governing permissions and
* limitations under the License. * limitations under the License.
*/ */
package org.apache.hadoop.contrib.index.mapred; package org.apache.hadoop.contrib.index.mapred;
import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.contrib.index.example.HashingDistributionPolicy; import org.apache.hadoop.contrib.index.example.HashingDistributionPolicy;
import org.apache.hadoop.contrib.index.example.LineDocInputFormat; import org.apache.hadoop.contrib.index.example.LineDocInputFormat;
import org.apache.hadoop.contrib.index.example.LineDocLocalAnalysis; import org.apache.hadoop.contrib.index.example.LineDocLocalAnalysis;
import org.apache.hadoop.mapred.InputFormat; import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapreduce.MRConfig; import org.apache.hadoop.mapreduce.MRConfig;
import org.apache.hadoop.mapreduce.MRJobConfig; import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer; import org.apache.lucene.analysis.standard.StandardAnalyzer;
/** /**
* This class provides the getters and the setters to a number of parameters. * This class provides the getters and the setters to a number of parameters.
* Most of the parameters are related to the index update and the rest are * Most of the parameters are related to the index update and the rest are
* from the existing Map/Reduce parameters. * from the existing Map/Reduce parameters.
*/ */
public class IndexUpdateConfiguration { public class IndexUpdateConfiguration {
final Configuration conf; final Configuration conf;
/** /**
* Constructor * Constructor
* @param conf * @param conf
*/ */
public IndexUpdateConfiguration(Configuration conf) { public IndexUpdateConfiguration(Configuration conf) {
this.conf = conf; this.conf = conf;
} }
/** /**
* Get the underlying configuration object. * Get the underlying configuration object.
* @return the configuration * @return the configuration
*/ */
public Configuration getConfiguration() { public Configuration getConfiguration() {
return conf; return conf;
} }
// //
// existing map/reduce properties // existing map/reduce properties
// //
// public int getIOFileBufferSize() { // public int getIOFileBufferSize() {
// return getInt("io.file.buffer.size", 4096); // return getInt("io.file.buffer.size", 4096);
// } // }
/** /**
* Get the IO sort space in MB. * Get the IO sort space in MB.
* @return the IO sort space in MB * @return the IO sort space in MB
*/ */
public int getIOSortMB() { public int getIOSortMB() {
return conf.getInt(MRJobConfig.IO_SORT_MB, 100); return conf.getInt(MRJobConfig.IO_SORT_MB, 100);
} }
/** /**
* Set the IO sort space in MB. * Set the IO sort space in MB.
* @param mb the IO sort space in MB * @param mb the IO sort space in MB
*/ */
public void setIOSortMB(int mb) { public void setIOSortMB(int mb) {
conf.setInt(MRJobConfig.IO_SORT_MB, mb); conf.setInt(MRJobConfig.IO_SORT_MB, mb);
} }
/** /**
* Get the Map/Reduce temp directory. * Get the Map/Reduce temp directory.
* @return the Map/Reduce temp directory * @return the Map/Reduce temp directory
*/ */
public String getMapredTempDir() { public String getMapredTempDir() {
    return conf.get(MRConfig.TEMP_DIR);
  }

  //
  // properties for index update
  //

  /**
   * Get the distribution policy class.
   * @return the distribution policy class
   */
  public Class<? extends IDistributionPolicy> getDistributionPolicyClass() {
    return conf.getClass("sea.distribution.policy",
        HashingDistributionPolicy.class, IDistributionPolicy.class);
  }

  /**
   * Set the distribution policy class.
   * @param theClass the distribution policy class
   */
  public void setDistributionPolicyClass(
      Class<? extends IDistributionPolicy> theClass) {
    conf.setClass("sea.distribution.policy", theClass,
        IDistributionPolicy.class);
  }

  /**
   * Get the analyzer class.
   * @return the analyzer class
   */
  public Class<? extends Analyzer> getDocumentAnalyzerClass() {
    return conf.getClass("sea.document.analyzer", StandardAnalyzer.class,
        Analyzer.class);
  }

  /**
   * Set the analyzer class.
   * @param theClass the analyzer class
   */
  public void setDocumentAnalyzerClass(Class<? extends Analyzer> theClass) {
    conf.setClass("sea.document.analyzer", theClass, Analyzer.class);
  }

  /**
   * Get the index input format class.
   * @return the index input format class
   */
  public Class<? extends InputFormat> getIndexInputFormatClass() {
    return conf.getClass("sea.input.format", LineDocInputFormat.class,
        InputFormat.class);
  }

  /**
   * Set the index input format class.
   * @param theClass the index input format class
   */
  public void setIndexInputFormatClass(Class<? extends InputFormat> theClass) {
    conf.setClass("sea.input.format", theClass, InputFormat.class);
  }

  /**
   * Get the index updater class.
   * @return the index updater class
   */
  public Class<? extends IIndexUpdater> getIndexUpdaterClass() {
    return conf.getClass("sea.index.updater", IndexUpdater.class,
        IIndexUpdater.class);
  }

  /**
   * Set the index updater class.
   * @param theClass the index updater class
   */
  public void setIndexUpdaterClass(Class<? extends IIndexUpdater> theClass) {
    conf.setClass("sea.index.updater", theClass, IIndexUpdater.class);
  }

  /**
   * Get the local analysis class.
   * @return the local analysis class
   */
  public Class<? extends ILocalAnalysis> getLocalAnalysisClass() {
    return conf.getClass("sea.local.analysis", LineDocLocalAnalysis.class,
        ILocalAnalysis.class);
  }

  /**
   * Set the local analysis class.
   * @param theClass the local analysis class
   */
  public void setLocalAnalysisClass(Class<? extends ILocalAnalysis> theClass) {
    conf.setClass("sea.local.analysis", theClass, ILocalAnalysis.class);
  }

  /**
   * Get the string representation of a number of shards.
   * @return the string representation of a number of shards
   */
  public String getIndexShards() {
    return conf.get("sea.index.shards");
  }

  /**
   * Set the string representation of a number of shards.
   * @param shards the string representation of a number of shards
   */
  public void setIndexShards(String shards) {
    conf.set("sea.index.shards", shards);
  }

  /**
   * Get the max field length for a Lucene instance.
   * @return the max field length for a Lucene instance
   */
  public int getIndexMaxFieldLength() {
    return conf.getInt("sea.max.field.length", -1);
  }

  /**
   * Set the max field length for a Lucene instance.
   * @param maxFieldLength the max field length for a Lucene instance
   */
  public void setIndexMaxFieldLength(int maxFieldLength) {
    conf.setInt("sea.max.field.length", maxFieldLength);
  }

  /**
   * Get the max number of segments for a Lucene instance.
   * @return the max number of segments for a Lucene instance
   */
  public int getIndexMaxNumSegments() {
    return conf.getInt("sea.max.num.segments", -1);
  }

  /**
   * Set the max number of segments for a Lucene instance.
   * @param maxNumSegments the max number of segments for a Lucene instance
   */
  public void setIndexMaxNumSegments(int maxNumSegments) {
    conf.setInt("sea.max.num.segments", maxNumSegments);
  }

  /**
   * Check whether to use the compound file format for a Lucene instance.
   * @return true if using the compound file format for a Lucene instance
   */
  public boolean getIndexUseCompoundFile() {
    return conf.getBoolean("sea.use.compound.file", false);
  }

  /**
   * Set whether use the compound file format for a Lucene instance.
   * @param useCompoundFile whether to use the compound file format
   */
  public void setIndexUseCompoundFile(boolean useCompoundFile) {
    conf.setBoolean("sea.use.compound.file", useCompoundFile);
  }

  /**
   * Get the max ram index size in bytes. The default is 50M.
   * @return the max ram index size in bytes
   */
  public long getMaxRAMSizeInBytes() {
    return conf.getLong("sea.max.ramsize.bytes", 50L << 20);
  }

  /**
   * Set the max ram index size in bytes.
   * @param b the max ram index size in bytes
   */
  public void setMaxRAMSizeInBytes(long b) {
    conf.setLong("sea.max.ramsize.bytes", b);
  }
}
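The getters and setters above are thin wrappers over "sea.*" keys in the job configuration. The following is a minimal, illustrative sketch (not part of this commit) of how a driver might set a few of them before submitting an index update job; the JobConf instance and the chosen values are assumptions for illustration only.

import org.apache.hadoop.contrib.index.example.HashingDistributionPolicy;
import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration;
import org.apache.hadoop.mapred.JobConf;

public class IndexUpdateConfigurationSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    IndexUpdateConfiguration iconf = new IndexUpdateConfiguration(job);
    // Pin the distribution policy explicitly (this is also the default shown above).
    iconf.setDistributionPolicyClass(HashingDistributionPolicy.class);
    iconf.setIndexUseCompoundFile(false);   // matches the documented default
    iconf.setMaxRAMSizeInBytes(50L << 20);  // matches the documented 50M default
    iconf.setIndexMaxFieldLength(10000);    // assumed value; the default -1 keeps Lucene's own limit
  }
}

Values left unset simply fall back to the defaults visible in the getters, for example -1 for the max field length, which leaves Lucene's built-in limit in place.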
@@ -1,199 +1,199 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.ReflectionUtils;
import org.apache.lucene.analysis.Analyzer;

/**
 * This class applies local analysis on a key-value pair and then convert the
 * result docid-operation pair to a shard-and-intermediate form pair.
 */
public class IndexUpdateMapper<K extends WritableComparable, V extends Writable>
    extends MapReduceBase implements Mapper<K, V, Shard, IntermediateForm> {
  static final Log LOG = LogFactory.getLog(IndexUpdateMapper.class);

  /**
   * Get the map output key class.
   * @return the map output key class
   */
  public static Class<? extends WritableComparable> getMapOutputKeyClass() {
    return Shard.class;
  }

  /**
   * Get the map output value class.
   * @return the map output value class
   */
  public static Class<? extends Writable> getMapOutputValueClass() {
    return IntermediateForm.class;
  }

  IndexUpdateConfiguration iconf;
  private Analyzer analyzer;
  private Shard[] shards;
  private IDistributionPolicy distributionPolicy;
  private ILocalAnalysis<K, V> localAnalysis;

  private DocumentID tmpKey;
  private DocumentAndOp tmpValue;

  private OutputCollector<DocumentID, DocumentAndOp> tmpCollector =
      new OutputCollector<DocumentID, DocumentAndOp>() {
        public void collect(DocumentID key, DocumentAndOp value)
            throws IOException {
          tmpKey = key;
          tmpValue = value;
        }
      };

  /**
   * Map a key-value pair to a shard-and-intermediate form pair. Internally,
   * the local analysis is first applied to map the key-value pair to a
   * document id-and-operation pair, then the docid-and-operation pair is
   * mapped to a shard-intermediate form pair. The intermediate form is of the
   * form of a single-document ram index and/or a single delete term.
   */
  public void map(K key, V value,
      OutputCollector<Shard, IntermediateForm> output, Reporter reporter)
      throws IOException {

    synchronized (this) {
      localAnalysis.map(key, value, tmpCollector, reporter);

      if (tmpKey != null && tmpValue != null) {
        DocumentAndOp doc = tmpValue;
        IntermediateForm form = new IntermediateForm();
        form.configure(iconf);
        form.process(doc, analyzer);
        form.closeWriter();

        if (doc.getOp() == DocumentAndOp.Op.INSERT) {
          int chosenShard = distributionPolicy.chooseShardForInsert(tmpKey);
          if (chosenShard >= 0) {
            // insert into one shard
            output.collect(shards[chosenShard], form);
          } else {
            throw new IOException("Chosen shard for insert must be >= 0");
          }

        } else if (doc.getOp() == DocumentAndOp.Op.DELETE) {
          int chosenShard = distributionPolicy.chooseShardForDelete(tmpKey);
          if (chosenShard >= 0) {
            // delete from one shard
            output.collect(shards[chosenShard], form);
          } else {
            // broadcast delete to all shards
            for (int i = 0; i < shards.length; i++) {
              output.collect(shards[i], form);
            }
          }

        } else { // UPDATE
          int insertToShard = distributionPolicy.chooseShardForInsert(tmpKey);
          int deleteFromShard =
              distributionPolicy.chooseShardForDelete(tmpKey);

          if (insertToShard >= 0) {
            if (insertToShard == deleteFromShard) {
              // update into one shard
              output.collect(shards[insertToShard], form);
            } else {
              // prepare a deletion form
              IntermediateForm deletionForm = new IntermediateForm();
              deletionForm.configure(iconf);
              deletionForm.process(new DocumentAndOp(DocumentAndOp.Op.DELETE,
                  doc.getTerm()), analyzer);
              deletionForm.closeWriter();

              if (deleteFromShard >= 0) {
                // delete from one shard
                output.collect(shards[deleteFromShard], deletionForm);
              } else {
                // broadcast delete to all shards
                for (int i = 0; i < shards.length; i++) {
                  output.collect(shards[i], deletionForm);
                }
              }

              // prepare an insertion form
              IntermediateForm insertionForm = new IntermediateForm();
              insertionForm.configure(iconf);
              insertionForm.process(new DocumentAndOp(DocumentAndOp.Op.INSERT,
                  doc.getDocument()), analyzer);
              insertionForm.closeWriter();

              // insert into one shard
              output.collect(shards[insertToShard], insertionForm);
            }
          } else {
            throw new IOException("Chosen shard for insert must be >= 0");
          }
        }
      }
    }
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.MapReduceBase#configure(org.apache.hadoop.mapred.JobConf)
   */
  public void configure(JobConf job) {
    iconf = new IndexUpdateConfiguration(job);

    analyzer =
        (Analyzer) ReflectionUtils.newInstance(
            iconf.getDocumentAnalyzerClass(), job);

    localAnalysis =
        (ILocalAnalysis) ReflectionUtils.newInstance(
            iconf.getLocalAnalysisClass(), job);
    localAnalysis.configure(job);

    shards = Shard.getIndexShards(iconf);

    distributionPolicy =
        (IDistributionPolicy) ReflectionUtils.newInstance(
            iconf.getDistributionPolicyClass(), job);
    distributionPolicy.init(shards);

    LOG.info("sea.document.analyzer = " + analyzer.getClass().getName());
    LOG.info("sea.local.analysis = " + localAnalysis.getClass().getName());
    LOG.info(shards.length + " shards = " + iconf.getIndexShards());
    LOG.info("sea.distribution.policy = "
        + distributionPolicy.getClass().getName());
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.MapReduceBase#close()
   */
  public void close() throws IOException {
    localAnalysis.close();
  }
}
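Given the fixed output types exposed by the two static helpers above, a driver would typically wire this mapper into an old-API job roughly as follows. This is an illustrative sketch under that assumption, not code from this commit.

import org.apache.hadoop.contrib.index.mapred.IndexUpdateMapper;
import org.apache.hadoop.mapred.JobConf;

public class IndexUpdateMapperWiringSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    job.setMapperClass(IndexUpdateMapper.class);
    // Shard keys and IntermediateForm values, as returned by the helpers above.
    job.setMapOutputKeyClass(IndexUpdateMapper.getMapOutputKeyClass());
    job.setMapOutputValueClass(IndexUpdateMapper.getMapOutputValueClass());
  }
}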
@@ -1,60 +1,60 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

/**
 * This partitioner class puts the values of the same key - in this case the
 * same shard - in the same partition.
 */
public class IndexUpdatePartitioner implements
    Partitioner<Shard, IntermediateForm> {

  private Shard[] shards;
  private Map<Shard, Integer> map;

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.Partitioner#getPartition(java.lang.Object, java.lang.Object, int)
   */
  public int getPartition(Shard key, IntermediateForm value, int numPartitions) {
    int partition = map.get(key).intValue();
    if (partition < numPartitions) {
      return partition;
    } else {
      return numPartitions - 1;
    }
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.JobConfigurable#configure(org.apache.hadoop.mapred.JobConf)
   */
  public void configure(JobConf job) {
    shards = Shard.getIndexShards(new IndexUpdateConfiguration(job));
    map = new HashMap<Shard, Integer>();
    for (int i = 0; i < shards.length; i++) {
      map.put(shards[i], i);
    }
  }
}
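Because the reducer for a shard must see every intermediate form addressed to that shard, a driver would also register this partitioner. A tiny illustrative sketch (assumed driver code, not part of this commit):

import org.apache.hadoop.contrib.index.mapred.IndexUpdatePartitioner;
import org.apache.hadoop.mapred.JobConf;

public class IndexUpdatePartitionerWiringSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Keep every IntermediateForm for a given Shard key in one reduce partition.
    job.setPartitionerClass(IndexUpdatePartitioner.class);
  }
}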
@@ -1,143 +1,143 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.io.IOException;
import java.util.Iterator;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.contrib.index.lucene.ShardWriter;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Closeable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

/**
 * This reducer applies to a shard the changes for it. A "new version" of
 * a shard is created at the end of a reduce. It is important to note that
 * the new version of the shard is not derived from scratch. By leveraging
 * Lucene's update algorithm, the new version of each Lucene instance will
 * share as many files as possible as the previous version.
 */
public class IndexUpdateReducer extends MapReduceBase implements
    Reducer<Shard, IntermediateForm, Shard, Text> {
  static final Log LOG = LogFactory.getLog(IndexUpdateReducer.class);
  static final Text DONE = new Text("done");

  /**
   * Get the reduce output key class.
   * @return the reduce output key class
   */
  public static Class<? extends WritableComparable> getOutputKeyClass() {
    return Shard.class;
  }

  /**
   * Get the reduce output value class.
   * @return the reduce output value class
   */
  public static Class<? extends Writable> getOutputValueClass() {
    return Text.class;
  }

  private IndexUpdateConfiguration iconf;
  private String mapredTempDir;

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.Reducer#reduce(java.lang.Object, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)
   */
  public void reduce(Shard key, Iterator<IntermediateForm> values,
      OutputCollector<Shard, Text> output, Reporter reporter)
      throws IOException {

    LOG.info("Construct a shard writer for " + key);
    FileSystem fs = FileSystem.get(iconf.getConfiguration());
    String temp =
        mapredTempDir + Path.SEPARATOR + "shard_" + System.currentTimeMillis();
    final ShardWriter writer = new ShardWriter(fs, key, temp, iconf);

    // update the shard
    while (values.hasNext()) {
      IntermediateForm form = values.next();
      writer.process(form);
      reporter.progress();
    }

    // close the shard
    final Reporter fReporter = reporter;
    new Closeable() {
      volatile boolean closed = false;

      public void close() throws IOException {
        // spawn a thread to give progress heartbeats
        Thread prog = new Thread() {
          public void run() {
            while (!closed) {
              try {
                fReporter.setStatus("closing");
                Thread.sleep(1000);
              } catch (InterruptedException e) {
                continue;
              } catch (Throwable e) {
                return;
              }
            }
          }
        };

        try {
          prog.start();

          if (writer != null) {
            writer.close();
          }
        } finally {
          closed = true;
        }
      }
    }.close();
    LOG.info("Closed the shard writer for " + key + ", writer = " + writer);

    output.collect(key, DONE);
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.MapReduceBase#configure(org.apache.hadoop.mapred.JobConf)
   */
  public void configure(JobConf job) {
    iconf = new IndexUpdateConfiguration(job);
    mapredTempDir = iconf.getMapredTempDir();
    mapredTempDir = Shard.normalizePath(mapredTempDir);
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.mapred.MapReduceBase#close()
   */
  public void close() throws IOException {
  }
}
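A matching illustrative sketch for the reduce side (again assumed driver code, not part of this commit); the reducer count reflects an assumption of roughly one reducer per shard.

import org.apache.hadoop.contrib.index.mapred.IndexUpdateReducer;
import org.apache.hadoop.mapred.JobConf;

public class IndexUpdateReducerWiringSketch {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    job.setReducerClass(IndexUpdateReducer.class);
    // Final output is (Shard, Text "done"), as exposed by the helpers above.
    job.setOutputKeyClass(IndexUpdateReducer.getOutputKeyClass());
    job.setOutputValueClass(IndexUpdateReducer.getOutputValueClass());
    job.setNumReduceTasks(3);  // assumption: roughly one reducer per shard
  }
}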
@@ -1,252 +1,252 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Collection;
import java.util.Iterator;
import java.util.concurrent.ConcurrentLinkedQueue;

import org.apache.hadoop.contrib.index.lucene.RAMDirectoryUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

/**
 * An intermediate form for one or more parsed Lucene documents and/or
 * delete terms. It actually uses Lucene file format as the format for
 * the intermediate form by using RAM dir files.
 *
 * Note: If process(*) is ever called, closeWriter() should be called.
 * Otherwise, no need to call closeWriter().
 */
public class IntermediateForm implements Writable {

  private IndexUpdateConfiguration iconf = null;
  private final Collection<Term> deleteList;
  private RAMDirectory dir;
  private IndexWriter writer;
  private int numDocs;

  /**
   * Constructor
   * @throws IOException
   */
  public IntermediateForm() throws IOException {
    deleteList = new ConcurrentLinkedQueue<Term>();
    dir = new RAMDirectory();
    writer = null;
    numDocs = 0;
  }

  /**
   * Configure using an index update configuration.
   * @param iconf the index update configuration
   */
  public void configure(IndexUpdateConfiguration iconf) {
    this.iconf = iconf;
  }

  /**
   * Get the ram directory of the intermediate form.
   * @return the ram directory
   */
  public Directory getDirectory() {
    return dir;
  }

  /**
   * Get an iterator for the delete terms in the intermediate form.
   * @return an iterator for the delete terms
   */
  public Iterator<Term> deleteTermIterator() {
    return deleteList.iterator();
  }

  /**
   * This method is used by the index update mapper and process a document
   * operation into the current intermediate form.
   * @param doc input document operation
   * @param analyzer the analyzer
   * @throws IOException
   */
  public void process(DocumentAndOp doc, Analyzer analyzer) throws IOException {
    if (doc.getOp() == DocumentAndOp.Op.DELETE
        || doc.getOp() == DocumentAndOp.Op.UPDATE) {
      deleteList.add(doc.getTerm());
    }

    if (doc.getOp() == DocumentAndOp.Op.INSERT
        || doc.getOp() == DocumentAndOp.Op.UPDATE) {
      if (writer == null) {
        // analyzer is null because we specify an analyzer with addDocument
        writer = createWriter();
      }

      writer.addDocument(doc.getDocument(), analyzer);
      numDocs++;
    }
  }

  /**
   * This method is used by the index update combiner and process an
   * intermediate form into the current intermediate form. More specifically,
   * the input intermediate forms are a single-document ram index and/or a
   * single delete term.
   * @param form the input intermediate form
   * @throws IOException
   */
  public void process(IntermediateForm form) throws IOException {
    if (form.deleteList.size() > 0) {
      deleteList.addAll(form.deleteList);
    }

    if (form.dir.sizeInBytes() > 0) {
      if (writer == null) {
        writer = createWriter();
      }

      writer.addIndexesNoOptimize(new Directory[] { form.dir });
      numDocs++;
    }
  }

  /**
   * Close the Lucene index writer associated with the intermediate form,
   * if created. Do not close the ram directory. In fact, there is no need
   * to close a ram directory.
   * @throws IOException
   */
  public void closeWriter() throws IOException {
    if (writer != null) {
      writer.close();
      writer = null;
    }
  }

  /**
   * The total size of files in the directory and ram used by the index writer.
   * It does not include memory used by the delete list.
   * @return the total size in bytes
   */
  public long totalSizeInBytes() throws IOException {
    long size = dir.sizeInBytes();
    if (writer != null) {
      size += writer.ramSizeInBytes();
    }
    return size;
  }

  /* (non-Javadoc)
   * @see java.lang.Object#toString()
   */
  public String toString() {
    StringBuilder buffer = new StringBuilder();
    buffer.append(this.getClass().getSimpleName());
    buffer.append("[numDocs=");
    buffer.append(numDocs);
    buffer.append(", numDeletes=");
    buffer.append(deleteList.size());
    if (deleteList.size() > 0) {
      buffer.append("(");
      Iterator<Term> iter = deleteTermIterator();
      while (iter.hasNext()) {
        buffer.append(iter.next());
        buffer.append(" ");
      }
      buffer.append(")");
    }
    buffer.append("]");
    return buffer.toString();
  }

  private IndexWriter createWriter() throws IOException {
    IndexWriter writer =
        new IndexWriter(dir, false, null,
            new KeepOnlyLastCommitDeletionPolicy());
    writer.setUseCompoundFile(false);

    if (iconf != null) {
      int maxFieldLength = iconf.getIndexMaxFieldLength();
      if (maxFieldLength > 0) {
        writer.setMaxFieldLength(maxFieldLength);
      }
    }

    return writer;
  }

  private void resetForm() throws IOException {
    deleteList.clear();
    if (dir.sizeInBytes() > 0) {
      // it's ok if we don't close a ram directory
      dir.close();
      // an alternative is to delete all the files and reuse the ram directory
      dir = new RAMDirectory();
    }
    assert (writer == null);
    numDocs = 0;
  }

  // ///////////////////////////////////
  // Writable
  // ///////////////////////////////////

  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Writable#write(java.io.DataOutput)
   */
  public void write(DataOutput out) throws IOException {
    out.writeInt(deleteList.size());
    for (Term term : deleteList) {
      Text.writeString(out, term.field());
      Text.writeString(out, term.text());
    }

    String[] files = dir.list();
    RAMDirectoryUtil.writeRAMFiles(out, dir, files);
  }

  /* (non-Javadoc)
   * @see org.apache.hadoop.io.Writable#readFields(java.io.DataInput)
   */
  public void readFields(DataInput in) throws IOException {
    resetForm();

    int numDeleteTerms = in.readInt();
    for (int i = 0; i < numDeleteTerms; i++) {
      String field = Text.readString(in);
      String text = Text.readString(in);
      deleteList.add(new Term(field, text));
    }

    RAMDirectoryUtil.readRAMFiles(in, dir);
  }
}
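The note in the class comment above amounts to a small usage contract: once process(*) has been called, closeWriter() must be called before the form is serialized or its size is reported. A minimal illustrative sketch of that sequence follows; the document, analyzer, and configuration objects are assumptions for illustration, not taken from this commit.

import org.apache.hadoop.contrib.index.mapred.DocumentAndOp;
import org.apache.hadoop.contrib.index.mapred.IndexUpdateConfiguration;
import org.apache.hadoop.contrib.index.mapred.IntermediateForm;
import org.apache.hadoop.mapred.JobConf;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class IntermediateFormSketch {
  public static void main(String[] args) throws Exception {
    Document doc = new Document();  // assumed single-field document
    doc.add(new Field("content", "apache", Field.Store.NO, Field.Index.TOKENIZED));

    IntermediateForm form = new IntermediateForm();
    form.configure(new IndexUpdateConfiguration(new JobConf()));
    form.process(new DocumentAndOp(DocumentAndOp.Op.INSERT, doc),
        new StandardAnalyzer());
    form.closeWriter();  // required once process(*) has been called
    System.out.println(form + ", ~" + form.totalSizeInBytes() + " bytes");
  }
}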
@@ -1,105 +1,105 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.lucene;

import java.io.IOException;

import junit.framework.TestCase;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexDeletionPolicy;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RAMDirectory;

public class TestMixedDirectory extends TestCase {
  private int numDocsPerUpdate = 10;
  private int maxBufferedDocs = 2;

  public void testMixedDirectoryAndPolicy() throws IOException {
    Directory readDir = new RAMDirectory();
    updateIndex(readDir, 0, numDocsPerUpdate,
        new KeepOnlyLastCommitDeletionPolicy());
    verify(readDir, numDocsPerUpdate);

    IndexOutput out =
        readDir.createOutput("_" + (numDocsPerUpdate / maxBufferedDocs + 2)
            + ".cfs");
    out.writeInt(0);
    out.close();

    Directory writeDir = new RAMDirectory();
    Directory mixedDir = new MixedDirectory(readDir, writeDir);
    updateIndex(mixedDir, numDocsPerUpdate, numDocsPerUpdate,
        new MixedDeletionPolicy());
    verify(readDir, numDocsPerUpdate);
    verify(mixedDir, 2 * numDocsPerUpdate);
  }

  public void updateIndex(Directory dir, int base, int numDocs,
      IndexDeletionPolicy policy) throws IOException {
    IndexWriter writer =
        new IndexWriter(dir, false, new StandardAnalyzer(), policy);
    writer.setMaxBufferedDocs(maxBufferedDocs);
    writer.setMergeFactor(1000);
    for (int i = 0; i < numDocs; i++) {
      addDoc(writer, base + i);
    }
    writer.close();
  }

  private void addDoc(IndexWriter writer, int id) throws IOException {
    Document doc = new Document();
    doc.add(new Field("id", String.valueOf(id), Field.Store.YES,
        Field.Index.UN_TOKENIZED));
    doc.add(new Field("content", "apache", Field.Store.NO,
        Field.Index.TOKENIZED));
    writer.addDocument(doc);
  }

  private void verify(Directory dir, int expectedHits) throws IOException {
    IndexSearcher searcher = new IndexSearcher(dir);
    Hits hits = searcher.search(new TermQuery(new Term("content", "apache")));
    int numHits = hits.length();

    assertEquals(expectedHits, numHits);

    int[] docs = new int[numHits];
    for (int i = 0; i < numHits; i++) {
      Document hit = hits.doc(i);
      docs[Integer.parseInt(hit.get("id"))]++;
    }
    for (int i = 0; i < numHits; i++) {
      assertEquals(1, docs[i]);
    }

    searcher.close();
  }
}
@@ -1,234 +1,234 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.io.File;
import java.io.IOException;
import java.text.NumberFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.contrib.index.example.HashingDistributionPolicy;
import org.apache.hadoop.contrib.index.example.RoundRobinDistributionPolicy;
import org.apache.hadoop.contrib.index.lucene.FileSystemDirectory;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;

import junit.framework.TestCase;

public class TestDistributionPolicy extends TestCase {

  private static final NumberFormat NUMBER_FORMAT = NumberFormat.getInstance();
  static {
    NUMBER_FORMAT.setMinimumIntegerDigits(5);
    NUMBER_FORMAT.setGroupingUsed(false);
  }

  // however, "we only allow 0 or 1 reducer in local mode" - from
  // LocalJobRunner
  private Configuration conf;
  private Path localInputPath = new Path(System.getProperty("build.test") + "/sample/data.txt");
  private Path localUpdatePath =
      new Path(System.getProperty("build.test") + "/sample/data2.txt");
  private Path inputPath = new Path("/myexample/data.txt");
  private Path updatePath = new Path("/myexample/data2.txt");
  private Path outputPath = new Path("/myoutput");
  private Path indexPath = new Path("/myindex");
  private int numShards = 3;
  private int numMapTasks = 5;

  private int numDataNodes = 3;
  private int numTaskTrackers = 3;

  private int numDocsPerRun = 10; // num of docs in local input path

  private FileSystem fs;
  private MiniDFSCluster dfsCluster;
  private MiniMRCluster mrCluster;

  public TestDistributionPolicy() throws IOException {
    super();
    if (System.getProperty("hadoop.log.dir") == null) {
      String base = new File(".").getPath(); // getAbsolutePath();
      System.setProperty("hadoop.log.dir", new Path(base).toString() + "/logs");
    }
    conf = new Configuration();
  }

  protected void setUp() throws Exception {
    super.setUp();
    try {
      dfsCluster =
          new MiniDFSCluster(conf, numDataNodes, true, (String[]) null);

      fs = dfsCluster.getFileSystem();
      if (fs.exists(inputPath)) {
        fs.delete(inputPath, true);
      }
      fs.copyFromLocalFile(localInputPath, inputPath);
      if (fs.exists(updatePath)) {
        fs.delete(updatePath, true);
      }
      fs.copyFromLocalFile(localUpdatePath, updatePath);

      if (fs.exists(outputPath)) {
        // do not create, mapred will create
        fs.delete(outputPath, true);
      }

      if (fs.exists(indexPath)) {
        fs.delete(indexPath, true);
      }

      mrCluster =
          new MiniMRCluster(numTaskTrackers, fs.getUri().toString(), 1);

    } catch (IOException e) {
      if (dfsCluster != null) {
        dfsCluster.shutdown();
        dfsCluster = null;
      }

      if (fs != null) {
        fs.close();
        fs = null;
      }

      if (mrCluster != null) {
        mrCluster.shutdown();
        mrCluster = null;
      }

      throw e;
    }
  }

  protected void tearDown() throws Exception {
    if (dfsCluster != null) {
      dfsCluster.shutdown();
      dfsCluster = null;
    }

    if (fs != null) {
      fs.close();
      fs = null;
    }

    if (mrCluster != null) {
      mrCluster.shutdown();
      mrCluster = null;
    }

    super.tearDown();
  }

  public void testDistributionPolicy() throws IOException {
    IndexUpdateConfiguration iconf = new IndexUpdateConfiguration(conf);

    // test hashing distribution policy
    iconf.setDistributionPolicyClass(HashingDistributionPolicy.class);
    onetest();

    if (fs.exists(indexPath)) {
      fs.delete(indexPath, true);
    }

    // test round-robin distribution policy
    iconf.setDistributionPolicyClass(RoundRobinDistributionPolicy.class);
    onetest();
  }

  private void onetest() throws IOException {
    long versionNumber = -1;
    long generation = -1;

    Shard[] shards = new Shard[numShards];
    for (int j = 0; j < shards.length; j++) {
      shards[j] =
          new Shard(versionNumber,
              new Path(indexPath, NUMBER_FORMAT.format(j)).toString(),
              generation);
    }

    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true);
    }

    IIndexUpdater updater = new IndexUpdater();
    updater.run(conf, new Path[] { inputPath }, outputPath, numMapTasks,
        shards);

    if (fs.exists(outputPath)) {
      fs.delete(outputPath, true);
    }

    // delete docs w/ even docids, update docs w/ odd docids
    updater.run(conf, new Path[] { updatePath }, outputPath, numMapTasks,
        shards);

    verify(shards);
  }

  private void verify(Shard[] shards) throws IOException {
// verify the index // verify the index
IndexReader[] readers = new IndexReader[shards.length]; IndexReader[] readers = new IndexReader[shards.length];
for (int i = 0; i < shards.length; i++) { for (int i = 0; i < shards.length; i++) {
Directory dir = Directory dir =
new FileSystemDirectory(fs, new Path(shards[i].getDirectory()), new FileSystemDirectory(fs, new Path(shards[i].getDirectory()),
false, conf); false, conf);
readers[i] = IndexReader.open(dir); readers[i] = IndexReader.open(dir);
} }
IndexReader reader = new MultiReader(readers); IndexReader reader = new MultiReader(readers);
IndexSearcher searcher = new IndexSearcher(reader); IndexSearcher searcher = new IndexSearcher(reader);
Hits hits = searcher.search(new TermQuery(new Term("content", "apache"))); Hits hits = searcher.search(new TermQuery(new Term("content", "apache")));
assertEquals(0, hits.length()); assertEquals(0, hits.length());
hits = searcher.search(new TermQuery(new Term("content", "hadoop"))); hits = searcher.search(new TermQuery(new Term("content", "hadoop")));
assertEquals(numDocsPerRun / 2, hits.length()); assertEquals(numDocsPerRun / 2, hits.length());
int[] counts = new int[numDocsPerRun]; int[] counts = new int[numDocsPerRun];
for (int i = 0; i < hits.length(); i++) { for (int i = 0; i < hits.length(); i++) {
Document doc = hits.doc(i); Document doc = hits.doc(i);
counts[Integer.parseInt(doc.get("id"))]++; counts[Integer.parseInt(doc.get("id"))]++;
} }
for (int i = 0; i < numDocsPerRun; i++) { for (int i = 0; i < numDocsPerRun; i++) {
if (i % 2 == 0) { if (i % 2 == 0) {
assertEquals(0, counts[i]); assertEquals(0, counts[i]);
} else { } else {
assertEquals(1, counts[i]); assertEquals(1, counts[i]);
} }
} }
searcher.close(); searcher.close();
reader.close(); reader.close();
} }
} }
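Note that the visible content of TestDistributionPolicy is identical before and after this commit; HADOOP-8911 only converts the file's CRLF line endings to LF, a change a plain text diff cannot show. As an illustration of what the patch normalizes, the following is a minimal, hypothetical Java sketch (not part of the patch) that flags source and text files still containing carriage-return bytes; the scan root and the extensions checked are assumptions made for the example.

// Illustrative sketch only (not part of this patch): recursively flag files that
// still contain carriage-return (0x0D) bytes, i.e. CRLF or bare-CR line endings.
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class CrlfScanner {

  public static void main(String[] args) throws IOException {
    // The scan root defaults to the current directory; both it and the
    // extension filter below are assumptions for the example.
    scan(new File(args.length > 0 ? args[0] : "."));
  }

  private static void scan(File f) throws IOException {
    if (f.isDirectory()) {
      File[] children = f.listFiles();
      if (children != null) {
        for (File child : children) {
          scan(child);
        }
      }
    } else if (f.getName().endsWith(".java") || f.getName().endsWith(".txt")) {
      if (containsCarriageReturn(f)) {
        System.out.println("CRLF line endings: " + f.getPath());
      }
    }
  }

  private static boolean containsCarriageReturn(File f) throws IOException {
    FileInputStream in = new FileInputStream(f);
    try {
      int b;
      while ((b = in.read()) != -1) {
        if (b == '\r') {
          return true;
        }
      }
      return false;
    } finally {
      in.close();
    }
  }
}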

View File

@ -1,258 +1,258 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.hadoop.contrib.index.mapred;

import java.io.File;
import java.io.IOException;
import java.text.NumberFormat;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.contrib.index.lucene.FileSystemDirectory;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;
import org.apache.hadoop.mapred.MiniMRCluster;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;

import junit.framework.TestCase;

public class TestIndexUpdater extends TestCase {

  private static final NumberFormat NUMBER_FORMAT = NumberFormat.getInstance();
  static {
    NUMBER_FORMAT.setMinimumIntegerDigits(5);
    NUMBER_FORMAT.setGroupingUsed(false);
  }

  // however, "we only allow 0 or 1 reducer in local mode" - from
  // LocalJobRunner
  private Configuration conf;
  private Path localInputPath = new Path(System.getProperty("build.test") + "/sample/data.txt");
  private Path inputPath = new Path("/myexample/data.txt");
  private Path outputPath = new Path("/myoutput");
  private Path indexPath = new Path("/myindex");

  private int initNumShards = 3;
  private int numMapTasks = 5;

  private int numDataNodes = 3;
  private int numTaskTrackers = 3;

  private int numRuns = 3;
  private int numDocsPerRun = 10; // num of docs in local input path

  private FileSystem fs;
  private MiniDFSCluster dfsCluster;
  private MiniMRCluster mrCluster;

  public TestIndexUpdater() throws IOException {
    super();
    if (System.getProperty("hadoop.log.dir") == null) {
      String base = new File(".").getPath(); // getAbsolutePath();
      System.setProperty("hadoop.log.dir", new Path(base).toString() + "/logs");
    }
    conf = new Configuration();
    //See MAPREDUCE-947 for more details. Setting to false prevents the creation of _SUCCESS.
    conf.setBoolean("mapreduce.fileoutputcommitter.marksuccessfuljobs", false);
  }

  protected void setUp() throws Exception {
    super.setUp();
    try {
      dfsCluster =
          new MiniDFSCluster(conf, numDataNodes, true, (String[]) null);

      fs = dfsCluster.getFileSystem();
      if (fs.exists(inputPath)) {
        fs.delete(inputPath, true);
      }
      fs.copyFromLocalFile(localInputPath, inputPath);

      if (fs.exists(outputPath)) {
        // do not create, mapred will create
        fs.delete(outputPath, true);
      }

      if (fs.exists(indexPath)) {
        fs.delete(indexPath, true);
      }

      mrCluster =
          new MiniMRCluster(numTaskTrackers, fs.getUri().toString(), 1);

    } catch (IOException e) {
      if (dfsCluster != null) {
        dfsCluster.shutdown();
        dfsCluster = null;
      }

      if (fs != null) {
        fs.close();
        fs = null;
      }

      if (mrCluster != null) {
        mrCluster.shutdown();
        mrCluster = null;
      }

      throw e;
    }
  }

  protected void tearDown() throws Exception {
    if (dfsCluster != null) {
      dfsCluster.shutdown();
      dfsCluster = null;
    }

    if (fs != null) {
      fs.close();
      fs = null;
    }

    if (mrCluster != null) {
      mrCluster.shutdown();
      mrCluster = null;
    }

    super.tearDown();
  }

  public void testIndexUpdater() throws IOException {
    IndexUpdateConfiguration iconf = new IndexUpdateConfiguration(conf);
    // max field length, compound file and number of segments will be checked
    // later
    iconf.setIndexMaxFieldLength(2);
    iconf.setIndexUseCompoundFile(true);
    iconf.setIndexMaxNumSegments(1);
    iconf.setMaxRAMSizeInBytes(20480);

    long versionNumber = -1;
    long generation = -1;

    for (int i = 0; i < numRuns; i++) {
      if (fs.exists(outputPath)) {
        fs.delete(outputPath, true);
      }

      Shard[] shards = new Shard[initNumShards + i];
      for (int j = 0; j < shards.length; j++) {
        shards[j] =
            new Shard(versionNumber, new Path(indexPath,
                NUMBER_FORMAT.format(j)).toString(), generation);
      }
      run(i + 1, shards);
    }
  }

  private void run(int numRuns, Shard[] shards) throws IOException {
    IIndexUpdater updater = new IndexUpdater();
    updater.run(conf, new Path[] { inputPath }, outputPath, numMapTasks,
        shards);

    // verify the done files
    Path[] doneFileNames = new Path[shards.length];
    int count = 0;
    FileStatus[] fileStatus = fs.listStatus(outputPath);
    for (int i = 0; i < fileStatus.length; i++) {
      FileStatus[] doneFiles = fs.listStatus(fileStatus[i].getPath());
      for (int j = 0; j < doneFiles.length; j++) {
        doneFileNames[count++] = doneFiles[j].getPath();
      }
    }
    assertEquals(shards.length, count);
    for (int i = 0; i < count; i++) {
      assertTrue(doneFileNames[i].getName().startsWith(
          IndexUpdateReducer.DONE.toString()));
    }

    // verify the index
    IndexReader[] readers = new IndexReader[shards.length];
    for (int i = 0; i < shards.length; i++) {
      Directory dir =
          new FileSystemDirectory(fs, new Path(shards[i].getDirectory()),
              false, conf);
      readers[i] = IndexReader.open(dir);
    }

    IndexReader reader = new MultiReader(readers);
    IndexSearcher searcher = new IndexSearcher(reader);
    Hits hits = searcher.search(new TermQuery(new Term("content", "apache")));
    assertEquals(numRuns * numDocsPerRun, hits.length());

    int[] counts = new int[numDocsPerRun];
    for (int i = 0; i < hits.length(); i++) {
      Document doc = hits.doc(i);
      counts[Integer.parseInt(doc.get("id"))]++;
    }
    for (int i = 0; i < numDocsPerRun; i++) {
      assertEquals(numRuns, counts[i]);
    }

    // max field length is 2, so "dot" is also indexed but not "org"
    hits = searcher.search(new TermQuery(new Term("content", "dot")));
    assertEquals(numRuns, hits.length());
    hits = searcher.search(new TermQuery(new Term("content", "org")));
    assertEquals(0, hits.length());

    searcher.close();
    reader.close();

    // open and close an index writer with KeepOnlyLastCommitDeletionPolicy
    // to remove earlier checkpoints
    for (int i = 0; i < shards.length; i++) {
      Directory dir =
          new FileSystemDirectory(fs, new Path(shards[i].getDirectory()),
              false, conf);
      IndexWriter writer =
          new IndexWriter(dir, false, null,
              new KeepOnlyLastCommitDeletionPolicy());
      writer.close();
    }

    // verify the number of segments, must be done after an writer with
    // KeepOnlyLastCommitDeletionPolicy so that earlier checkpoints are removed
    for (int i = 0; i < shards.length; i++) {
      PathFilter cfsFilter = new PathFilter() {
        public boolean accept(Path path) {
          return path.getName().endsWith(".cfs");
        }
      };
      FileStatus[] cfsFiles =
          fs.listStatus(new Path(shards[i].getDirectory()), cfsFilter);
      assertEquals(1, cfsFiles.length);
    }
  }

}

View File

@ -44,6 +44,9 @@ Release 2.0.3-alpha - Unreleased
    YARN-146. Add unit tests for computing fair share in the fair scheduler.
    (Sandy Ryza via tomwhite)

    HADOOP-8911. CRLF characters in source and text files.
    (Raja Aluri via suresh)

  OPTIMIZATIONS

  BUG FIXES
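The CHANGES.txt entry above is the only textual change in this hunk; every other file in the commit is modified solely by converting CRLF line endings to LF. As a rough illustration of that transformation (this is a sketch, not the tool the patch used, and the file names are placeholders), a normalization pass could look like the following; keeping the endings stable afterwards is usually handled by svn:eol-style=native on an SVN tree or a .gitattributes rule on a git mirror.

// Illustrative sketch only (not part of this patch): rewrite a text file in place
// with every carriage return removed, so CRLF (and bare CR) become LF-only lines.
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class CrlfToLf {

  public static void normalize(File file) throws IOException {
    // Buffer the LF-only content in memory; fine for source and text files.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    FileInputStream in = new FileInputStream(file);
    try {
      int b;
      while ((b = in.read()) != -1) {
        if (b != '\r') {          // drop every CR; LF alone terminates the line
          out.write(b);
        }
      }
    } finally {
      in.close();
    }
    FileOutputStream fos = new FileOutputStream(file);
    try {
      out.writeTo(fos);           // write the normalized bytes back in place
    } finally {
      fos.close();
    }
  }

  public static void main(String[] args) throws IOException {
    for (String name : args) {    // file names are supplied by the caller
      normalize(new File(name));
    }
  }
}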