HBASE-23779 Up the default fork count; make count relative to CPU count (#1108)

Set the fork count for the first and second parts to 0.5C. Add a bit of doc on this too, as well as some qualification of our test categories. Also add -T0.5C to MAVEN_ARGS in the hbase personality.
2020-02-04 20:47:02 -08:00
commit b49ec58073
parent 3a1a39d40d
3 changed files with 28 additions and 16 deletions
--- a/dev-support/hbase-personality.sh
+++ b/dev-support/hbase-personality.sh
@@ -81,6 +81,13 @@ function personality_globals

  # Override the maven options
  MAVEN_OPTS="${MAVEN_OPTS:-"-Xms4G -Xmx4G"}"
+  # Pass maven a -T argument. Should make it run faster. Pass conservative value.
+  # Default is one thread. 0.5C on an apache box of 24 cores and 2 executors should
+  # make for 6 threads? Let's see. Setting this here for yetus to pick up. See
+  # https://yetus.apache.org/documentation/0.11.1/precommit-advanced/#global-definitions
+  # See below for more on -T:
+  # https://cwiki.apache.org/confluence/display/MAVEN/Parallel+builds+in+Maven+3
+  export MAVEN_ARGS="-T0.5C ${MAVEN_ARGS}"

  # Yetus 0.7.0 enforces limits. Default proclimit is 1000.
  # Up it. See HBASE-19902 for how we arrived at this number.
--- a/pom.xml
+++ b/pom.xml
@@ -1568,8 +1568,13 @@
    <!-- default: run small & medium, medium with 2 threads -->
    <surefire.skipFirstPart>false</surefire.skipFirstPart>
    <surefire.skipSecondPart>false</surefire.skipSecondPart>
-    <surefire.firstPartForkCount>1</surefire.firstPartForkCount>
-    <surefire.secondPartForkCount>2</surefire.secondPartForkCount>
+    <!-- Fork count varies w/ CPU count. Setting is conservative. Up this
+      value if you want to burn through tests faster (could make for more failures
+      if more contention around resources). There is a matching MAVEN_ARGS setting
+      in our yetus personality where we set the maven -T option to 0.5C too.
+    -->
+    <surefire.firstPartForkCount>0.5C</surefire.firstPartForkCount>
+    <surefire.secondPartForkCount>0.5C</surefire.secondPartForkCount>
    <surefire.firstPartGroups>org.apache.hadoop.hbase.testclassification.SmallTests</surefire.firstPartGroups>
    <surefire.secondPartGroups>org.apache.hadoop.hbase.testclassification.MediumTests</surefire.secondPartGroups>
    <surefire.testFailureIgnore>false</surefire.testFailureIgnore>
@@ -3323,7 +3328,6 @@
        <activeByDefault>false</activeByDefault>
      </activation>
      <properties>
-        <surefire.firstPartForkCount>1</surefire.firstPartForkCount>
        <surefire.skipFirstPart>false</surefire.skipFirstPart>
        <surefire.skipSecondPart>true</surefire.skipSecondPart>
        <surefire.firstPartGroups>org.apache.hadoop.hbase.testclassification.SmallTests</surefire.firstPartGroups>
@@ -3377,8 +3381,6 @@
        <activeByDefault>false</activeByDefault>
      </activation>
      <properties>
-        <surefire.firstPartForkCount>1</surefire.firstPartForkCount>
-        <surefire.secondPartForkCount>5</surefire.secondPartForkCount>
        <surefire.skipFirstPart>false</surefire.skipFirstPart>
        <surefire.skipSecondPart>false</surefire.skipSecondPart>
        <surefire.firstPartGroups>org.apache.hadoop.hbase.testclassification.SmallTests</surefire.firstPartGroups>
@@ -3391,8 +3393,6 @@
        <activeByDefault>false</activeByDefault>
      </activation>
      <properties>
-        <surefire.firstPartForkCount>1</surefire.firstPartForkCount>
-        <surefire.secondPartForkCount>1</surefire.secondPartForkCount>
        <surefire.skipFirstPart>false</surefire.skipFirstPart>
        <surefire.skipSecondPart>true</surefire.skipSecondPart>
        <surefire.firstPartGroups>org.apache.hadoop.hbase.testclassification.MiscTests
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -1272,12 +1272,17 @@ Small Tests (((SmallTests)))::
  _Small_ test cases are executed in a shared JVM and each test suite/test class should
   run in 15 seconds or less; i.e. a link:https://en.wikipedia.org/wiki/JUnit[junit test fixture], a java object made
   up of test methods, should finish in under 15 seconds, no matter how many or how few test methods
-   it has. These test cases should not use a minicluster.
+   it has. These test cases should not use a minicluster as a minicluster starts many services,
+   most unrelated to what is being tested. Multiple start/stops may leak resources or just overwhelm
+   the single JVM context.

 Medium Tests (((MediumTests)))::
  _Medium_ test cases are executed in separate JVM and individual test suites or test classes or in
  junit parlance, link:https://en.wikipedia.org/wiki/JUnit[test fixture], should run in 50 seconds
-   or less. These test cases can use a mini cluster.
+   or less. These test cases can use a mini cluster. Since we start up a JVM per test fixture (and
+   often a cluster too), be sure to make the startup pay by writing test fixtures that do a lot of
+   testing, running for tens of seconds and perhaps combining tests, rather than spinning up a JVM
+   (and cluster) per test method; this practice will help with overall test times.

 Large Tests (((LargeTests)))::
  _Large_ test cases are everything else. They are typically large-scale tests, regression tests
@@ -1339,13 +1344,13 @@ For convenience, you can run `mvn test -P runDevTests` to execute both small and
 [[hbase.unittests.test.faster]]
 ==== Running tests faster

-By default, `$ mvn test -P runAllTests` runs 5 tests in parallel.
-It can be increased on a developer's machine.
-Allowing that you can have 2 tests in parallel per core, and you need about 2GB of memory per test (at the extreme), if you have an 8 core, 24GB box, you can have 16 tests in parallel.
-but the memory available limits it to 12 (24/2), To run all tests with 12 tests in parallel, do this: +mvn test -P runAllTests
-                        -Dsurefire.secondPartForkCount=12+.
-If using a version earlier than  2.0, do: +mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12
-                    +.
+By default, `$ mvn test -P runAllTests` runs all small tests in 1 forked instance and the medium and large tests in 5 parallel forked instances. Up these counts to get the build to run faster (you may run into
+rare issues of mutual interference between tests). For example,
+allowing 2 tests in parallel per core and about 2GB of memory per test (at the extreme), an 8 core, 24GB box could run 16 tests in parallel by CPU count,
+but the available memory limits it to 12 (24/2). To run all tests with 12 tests in parallel, do this: +mvn test -P runAllTests -Dsurefire.secondPartForkCount=12+.
+If using a version earlier than 2.0, do: +mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12+.
+You can also increase the fork count for the first part by setting -Dsurefire.firstPartForkCount to a value > 1.
+The values passed as fork counts can be specified as a multiple of the CPU count as follows: for two forks per available CPU, set the value to 2.0C; for a fork for every two CPUs, set it to 0.5C.
 To increase the speed, you can as well use a ramdisk.
 You will need 2GB  of memory to run all tests.
 You will also need to delete the files between two  test run.
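
The sizing example in the "Running tests faster" text can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function names `max_forks` and `forks_for_multiplier` are hypothetical, the 2-tests-per-core and 2GB-per-test figures are the rule of thumb quoted in the doc text, and the truncate-then-floor-at-one rounding for the `C` suffix is an assumption about how the multiplier resolves, not something this commit states.

```shell
#!/usr/bin/env bash

# Hypothetical helper: upper bound on parallel forks for a box,
# using the doc's rule of thumb (2 tests per core, ~2GB per test).
max_forks() {
  local cores=$1 mem_gb=$2
  local by_cpu=$(( cores * 2 ))   # CPU-bound limit
  local by_mem=$(( mem_gb / 2 ))  # memory-bound limit
  echo $(( by_cpu < by_mem ? by_cpu : by_mem ))
}

# Hypothetical helper: resolve a value such as 0.5C against a core count.
# Assumption: the product is truncated to an integer and never drops below 1.
forks_for_multiplier() {
  local value=$1 cores=$2
  awk -v m="${value%C}" -v c="$cores" \
    'BEGIN { n = int(m * c); if (n < 1) n = 1; print n }'
}

# Build the command line for the 8 core, 24GB box used in the doc text.
forks=$(max_forks 8 24)
echo "mvn test -P runAllTests -Dsurefire.secondPartForkCount=${forks}"
```

For the 8 core, 24GB box from the text, `max_forks 8 24` yields the memory-bound limit rather than the CPU-bound one, which is why the quoted command passes 12 and not 16.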