diff --git a/xdocs/userguide/stat.xml b/xdocs/userguide/stat.xml
index 098db56f1..8ff01f14e 100644
--- a/xdocs/userguide/stat.xml
+++ b/xdocs/userguide/stat.xml
@@ -17,7 +17,7 @@
 -->
-
+
 The Commons Math User Guide - Statistics
@@ -57,7 +57,7 @@
 all statistics, consists of evaluate() methods that take double[] arrays as arguments and return the value of the statistic. This interface is extended by
- org.apache.commons.math.stat.univariate.StorelessUnivariateStatistic, which adds increment(),
+ StorelessUnivariateStatistic, which adds increment(),
 getResult() and associated methods to support "storageless" implementations that maintain counters, sums or other state information as values are added using the increment() method.
@@ -65,29 +65,110 @@

 Abstract implementations of the top level interfaces are provided in
- org.apache.commons.math.stat.univariate.AbstractUnivariateStatistic and
+ AbstractUnivariateStatistic and
- org.apache.commons.math.stat.univariate.AbstractStorelessUnivariateStatistic respectively.
+ AbstractStorelessUnivariateStatistic respectively.

 Each statistic is implemented as a separate class, in one of the subpackages (moment, rank, summary) and each extends one of the abstract classes above (depending on whether or not value storage is required to compute the statistic). There are several ways to instantiate and use statistics. Statistics can be instantiated and used directly, but it is
- generally more convenient to access them using the provided aggregates:
-
-
-
-
- Aggregate | Statistics Included | Values stored?
- org.apache.commons.math.stat.DescriptiveStatistics | All | Yes
- org.apache.commons.math.stat.SummaryStatistics | min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance | No
- TODO: add code sample
- There is also a utility class,
- org.apache.commons.math.stat.StatUtils, that provides static methods for computing statistics
- from double[] arrays.
+ generally more convenient (and efficient) to access them using the provided aggregates,
+ DescriptiveStatistics and
+ SummaryStatistics. DescriptiveStatistics maintains the input data in memory and has the capability
+ of producing "rolling" statistics computed from a "window" consisting of the most recently added values. SummaryStatistics
+ does not store the input data values in memory, so the statistics included in this aggregate are limited to those that can be
+ computed in one pass through the data without access to the full array of values.

+

+
+
+
+
+ Aggregate | Statistics Included | Values stored? | "Rolling" capability?
+ DescriptiveStatistics | min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median | Yes | Yes
+ SummaryStatistics | min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance | No | No
+
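To make the "Values stored?" distinction concrete, here is a minimal sketch of a one-pass accumulator in plain Java. It is not the Commons Math implementation: the class name `RunningSummary` is hypothetical, the update rule is Welford's method (chosen for this illustration), and the variance is assumed to be the bias-corrected sample variance. It shows why mean, variance, min and max can be maintained without keeping the data, while a median cannot.

```java
// Hypothetical illustration of a "storeless" aggregate; NOT the Commons Math code.
final class RunningSummary {
    private long n = 0;
    private double mean = 0.0;
    private double m2 = 0.0;   // running sum of squared deviations from the mean
    private double min = Double.NaN;
    private double max = Double.NaN;

    /** Folds one value into the running state (Welford's update); nothing is stored. */
    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
        min = (n == 1) ? x : Math.min(min, x);
        max = (n == 1) ? x : Math.max(max, x);
    }

    long getN() { return n; }
    double getMean() { return n > 0 ? mean : Double.NaN; }
    double getMin() { return min; }
    double getMax() { return max; }

    /** Sample variance dividing by n - 1 (an assumption of this sketch). */
    double getVariance() {
        if (n == 0) return Double.NaN;
        return n == 1 ? 0.0 : m2 / (n - 1);
    }

    double getStandardDeviation() { return Math.sqrt(getVariance()); }

    public static void main(String[] args) {
        RunningSummary s = new RunningSummary();
        for (double v : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            s.addValue(v);
        }
        System.out.println("mean=" + s.getMean() + " std=" + s.getStandardDeviation());
    }
}
```

A median or other percentile needs the sorted values, so an accumulator like this one cannot offer it; that is the trade-off the table describes.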

+

+ There is also a utility class,
+ StatUtils, that provides static methods for computing statistics
+ directly from double[] arrays.
+

+

+ Here are some examples showing how to compute univariate statistics. +

+
Compute summary statistics for a list of double values
+

+
Using the DescriptiveStatistics aggregate (values are stored in memory):
+
+// Get a DescriptiveStatistics instance using factory method
+DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
+
+// Add the data from the array -- assume inputArray is a double[] array
+for (int i = 0; i < inputArray.length; i++) {
+    stats.addValue(inputArray[i]);
+}
+
+// Compute some statistics
+double mean = stats.getMean();
+double std = stats.getStandardDeviation();
+double median = stats.getMedian();
+
+
+
Using the SummaryStatistics aggregate (values are not stored in memory):
+
+// Get a SummaryStatistics instance using factory method
+SummaryStatistics stats = SummaryStatistics.newInstance();
+
+// Read data from an input stream, adding values and updating sums, counters, etc. necessary for stats
+// -- assume in is a BufferedReader and line is a String declared earlier
+while ((line = in.readLine()) != null) {
+    stats.addValue(Double.parseDouble(line.trim()));
+}
+in.close();
+
+// Compute the statistics
+double mean = stats.getMean();
+double std = stats.getStandardDeviation();
+//double median = stats.getMedian(); <-- NOT AVAILABLE in SummaryStatistics
+
+
+
Using the StatUtils utility class:
+
+// Compute statistics directly from the array -- assume values is a double[] array
+double mean = StatUtils.mean(values);
+// Standard deviation is the square root of the variance
+double std = Math.sqrt(StatUtils.variance(values));
+double median = StatUtils.percentile(values, 50);
+// Compute the mean of the first three values in the array
+mean = StatUtils.mean(values, 0, 3);
+
+
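For readers without the library at hand, the following is a rough sketch of what array-based static helpers of this kind compute. The class `ArrayStats` and its methods are hypothetical stand-ins, not StatUtils itself; the `(values, begin, length)` argument order is an assumption that mirrors the call above, and the variance is assumed to be the bias-corrected sample variance.

```java
// Hypothetical stand-in for array-based static helpers; NOT StatUtils itself.
final class ArrayStats {
    private ArrayStats() { }

    /** Mean of values[begin] .. values[begin + length - 1]; NaN for an empty range. */
    static double mean(double[] values, int begin, int length) {
        if (length <= 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum / length;
    }

    /** Mean of the whole array. */
    static double mean(double[] values) {
        return mean(values, 0, values.length);
    }

    /** Two-pass sample variance dividing by n - 1 (an assumption of this sketch). */
    static double variance(double[] values) {
        int n = values.length;
        if (n == 0) return Double.NaN;
        if (n == 1) return 0.0;
        double m = mean(values);
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (n - 1);
    }

    /** Standard deviation is the square root of the variance. */
    static double standardDeviation(double[] values) {
        return Math.sqrt(variance(values));
    }

    public static void main(String[] args) {
        double[] values = {1, 2, 3, 4, 5};
        System.out.println(mean(values, 0, 3));   // mean of the first three values
        System.out.println(standardDeviation(values));
    }
}
```

Because the whole array is available, a two-pass variance is possible here, which a storeless aggregate cannot do.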
+
Maintain a "rolling mean" of the most recent 100 values from an input stream
+

+
Use a DescriptiveStatistics instance with window size set to 100:
+
+// Create a DescriptiveStatistics instance and set the window size to 100
+DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
+stats.setWindowSize(100);
+// Read data from an input stream, displaying the mean of the most recent 100 observations
+// after every 100 observations -- assume in is a BufferedReader and line is a String declared earlier
+long nLines = 0;
+while ((line = in.readLine()) != null) {
+    stats.addValue(Double.parseDouble(line.trim()));
+    nLines++;
+    if (nLines == 100) {
+        nLines = 0;
+        System.out.println(stats.getMean()); // "rolling" mean of most recent 100 values
+    }
+}
+in.close();
+
+
+
+
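The windowing idea in the example above can be sketched without the library. The class `RollingMean` below is hypothetical and is not how DescriptiveStatistics is implemented; it only illustrates one way a fixed-size window can evict the oldest value as each new one arrives.

```java
import java.util.ArrayDeque;

// Hypothetical illustration of a fixed-size "rolling" window; NOT the library code.
final class RollingMean {
    private final int windowSize;
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private double sum = 0.0;

    RollingMean(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds a value; once the window is full, the oldest value is dropped. */
    void addValue(double x) {
        window.addLast(x);
        sum += x;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
    }

    /** Mean of the values currently in the window; NaN if the window is empty. */
    double getMean() {
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }

    public static void main(String[] args) {
        RollingMean rolling = new RollingMean(3);
        for (double v : new double[] {1, 2, 3, 4, 5}) {
            rolling.addValue(v);
        }
        System.out.println(rolling.getMean()); // mean of the last three values: 3, 4, 5
    }
}
```

Keeping a running sum makes each update constant time, at the cost of some floating-point drift over very long streams; recomputing the sum from the window contents avoids the drift but costs time proportional to the window size.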

+

This is yet to be written. Any contributions will be gratefully accepted!