HBASE-24180 Edit test doc around forkcount and speeding up test runs (#1505)

Signed-off-by: Jan Hentschel <janh@apache.org>
2020-04-14 10:23:53 -07:00 · 2020-04-14 10:23:53 -07:00 · ce29147dca
parent 92b30f2638
commit ce29147dca
1 changed files with 61 additions and 25 deletions
--- a/src/main/asciidoc/_chapters/developer.adoc
+++ b/src/main/asciidoc/_chapters/developer.adoc
@@ -1298,22 +1298,44 @@ Integration Tests (((IntegrationTests)))::
 [[hbase.unittests.cmds]]
 === Running tests

+The state of tests on the hbase branches varies. Some branches keep good test hygiene and all tests pass
+reliably, with perhaps an unlucky sporadic flaky test failure. Other branches are in worse shape, with
+frequent flaky tests and even broken tests in need of attention that fail 100% of the time. Try to figure out
+the state of tests on the branch you are currently interested in; the current state of the nightly
+link:https://builds.apache.org/view/H-L/view/HBase/job/HBase%20Nightly/[apache jenkins builds] is a good
+place to start. Tests on the master branch are generally not in the best of condition as releases
+are less frequent off master. This can make it hard to land patches, especially given our dictum that
+patches land on master branch first.
+
+The full test suite can take from 5-6 hours on an anemic VM with 4 CPUs and minimal
+parallelism to 50 minutes or less on a linux machine with dozens of CPUs.
+
+When you go to run the full test suite, make sure you raise the limits for the test runner user: nproc
+(`ulimit -u`; make sure it is at least ~6000, or even more) and the number of open files (`ulimit -n`; make
+sure it is at least 10240 or so). Errors caused by a test run hitting these
+limits are often only opaquely related to the constraint. You can see the current
+user settings by running `ulimit -a`.
+
 [[hbase.unittests.cmds.test]]
 ==== Default: small and medium category tests

-Running `mvn test` will execute all small tests in a single JVM (no fork) and then medium tests in a separate JVM for each test instance.
-Medium tests are NOT executed if there is an error in a small test. Large tests are NOT executed.
+Running `mvn test` will execute all small tests in a single JVM (no fork) and then medium tests in a
+forked, separate JVM for each test instance (for the definition of a 'small' test and so on, see
+<<hbase.unittests>>). Medium tests are NOT executed if there is an error in a
+small test. Large tests are NOT executed.

 [[hbase.unittests.cmds.test.runalltests]]
 ==== Running all tests

-Running `mvn test -P runAllTests` will execute small tests in a single JVM then medium and large tests in a separate JVM for each test.
-Medium and large tests are NOT executed if there is an error in a small test.
+Running `mvn test -P runAllTests` will execute small tests in a single JVM, then medium and large tests
+in a forked, separate JVM for each test. Medium and large tests are NOT executed if there is an error in
+a small test.

 [[hbase.unittests.cmds.test.localtests.mytest]]
 ==== Running a single test or all tests in a package

-To run an individual test, e.g. `MyTest`, rum `mvn test -Dtest=MyTest` You can also pass multiple, individual tests as a comma-delimited list:
+To run an individual test, e.g. `MyTest`, run `mvn test -Dtest=MyTest`. You can also pass multiple,
+individual tests as a comma-delimited list:
 [source,bash]
 ----
 mvn test  -Dtest=MyTest1,MyTest2,MyTest3
@@ -1325,10 +1347,12 @@ mvn test '-Dtest=org.apache.hadoop.hbase.client.*'
 ----

 When `-Dtest` is specified, the `localTests` profile will be used.
-Each junit test is executed in a separate JVM (A fork per test class). There is no parallelization when tests are running in this mode.
+Each junit test is executed in a separate JVM (a fork per test class).
+There is no parallelization when tests are running in this mode.
 You will see a new message at the end of the -report: `"[INFO] Tests are skipped"`.
-It's harmless.
-However, you need to make sure the sum of `Tests run:` in the `Results:` section of test reports matching the number of tests you specified because no error will be reported when a non-existent test case is specified.
+It's harmless. However, you need to make sure the sum of
+`Tests run:` in the `Results:` section of the test reports matches the number of tests
+you specified, because no error will be reported when a non-existent test case is specified.

 [[hbase.unittests.cmds.test.profiles]]
 ==== Other test invocation permutations
@@ -1344,17 +1368,33 @@ For convenience, you can run `mvn test -P runDevTests` to execute both small and
 [[hbase.unittests.test.faster]]
 ==== Running tests faster

-By default, `$ mvn test -P runAllTests` runs all small tests in 1 forked instance and the medium and large tests in 5 parallel forked instances. Up these counts to get the build to run faster (you may run into
-rare issues of test mutual interference). For example,
-allowing that you want to have 2 tests in parallel per core, and you need about 2GB of memory per test (at the extreme), if you have an 8 core, 24GB box, you can have 16 tests in parallel.
-but the memory available limits it to 12 (24/2), To run all tests with 12 tests in parallel, do this: +mvn test -P runAllTests -Dsurefire.secondPartForkCount=12+.
-If using a version earlier than  2.0, do: +mvn test -P runAllTests -Dsurefire.secondPartThreadCount=12 +.
-You can also increase the fork count for the first party by setting -Dsurefire.firstPartForkCount to a value > 1.
-The values passed as fork counts can be specified as a fraction of CPU as follows: for two forks per available CPU, set the value to 2.0C; for a fork for every two CPUs, set it to 0.5C.
-To increase the speed, you can as well use a ramdisk.
-You will need 2GB  of memory to run all tests.
-You will also need to delete the files between two  test run.
-The typical way to configure a ramdisk on Linux is:
+By default, `$ mvn test -P runAllTests` runs all tests using a quarter of the CPUs available on the machine
+hosting the test run (see `surefire.firstPartForkCount` and `surefire.secondPartForkCount` in the top-level
+hbase `pom.xml`). Raise these counts to get the build to run faster. You can also have hbase modules
+run their tests in parallel when the dependency graph allows by passing `--threads=N` when you invoke
+maven, where `N` is the amount of parallelism wanted.
+
+For example, allowing that you want to use all cores on a machine to run tests,
+you could start up the maven test run with:
+
+----
+  $ x="1.0C";  mvn -Dsurefire.firstPartForkCount=$x -Dsurefire.secondPartForkCount=$x test -PrunAllTests
+----
+
+On a 32 core machine, you should see periods during which 32 forked jvms appear in your process listing, each running unit tests.
+Your mileage may vary. Depending on hardware, overcommitment of CPU, memory, etc., can bring the test suite crashing down,
+usually complaining of system exit and incomplete test report xml files. Start gently, with the default fork setting, which
+uses a quarter of the available CPUs.
+
+When you add `--threads=N`, maven will run N modules in parallel when dependencies allow. Be aware that if you have
+set the forkcount to `1.0C` and the threads count to `2`, the number of concurrent test runners can approach
+2 * CPU count, likely overcommitting the host machine.
+
+You will need ~2.2GB of memory per forked JVM plus the memory used by maven itself (3-4G).
+
+
+To increase the speed further, you can also use a ramdisk. 2-3G should be sufficient. Be sure to
+delete the files between test runs. The typical way to configure a ramdisk on Linux is:

 ----
 $ sudo mkdir /ram2G
@@ -1364,17 +1404,13 @@ sudo mount -t tmpfs -o size=2048M tmpfs /ram2G
 You can then use it to run all HBase tests on 2.0 with the command:

 ----
-mvn test
-                        -P runAllTests -Dsurefire.secondPartForkCount=12
-                        -Dtest.build.data.basedirectory=/ram2G
+mvn test -PrunAllTests -Dtest.build.data.basedirectory=/ram2G
 ----

 On earlier versions, use:

 ----
-mvn test
-                        -P runAllTests -Dsurefire.secondPartThreadCount=12
-                        -Dtest.build.data.basedirectory=/ram2G
+mvn test -P runAllTests -Dtest.build.data.basedirectory=/ram2G
 ----

 [[hbase.unittests.cmds.test.hbasetests]]