From e6c5757f99825ca09967b03eb4119b634e3575e2 Mon Sep 17 00:00:00 2001
From: Phil Steitz
Abstract implementations of the top level interfaces are provided in
- org.apache.commons.math.stat.univariate.AbstractUnivariateStatistic and
+ AbstractUnivariateStatistic and
- org.apache.commons.math.stat.univariate.AbstractStorelessUnivariateStatistic respectively.
+ AbstractStorelessUnivariateStatistic respectively.
Each statistic is implemented as a separate class, in one of the subpackages (moment, rank, summary) and
each extends one of the abstract classes above (depending on whether or not value storage is required to
compute the statistic).
There are several ways to instantiate and use statistics. Statistics can be instantiated and used directly, but it is
- generally more convenient to access them using the provided aggregates:
- evaluate()
methods that take double[] arrays as arguments and return
the value of the statistic. This interface is extended by
- org.apache.commons.math.stat.univariate.StorelessUnivariateStatistic, which adds increment(),
+ StorelessUnivariateStatistic, which adds increment(),
getResult()
and associated methods to support "storageless" implementations that
maintain counters, sums or other state information as values are added using the increment()
method.
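The split between the two interfaces can be made concrete with a small Java sketch. Here `UnivariateStatistic` and `StorelessUnivariateStatistic` are simplified stand-ins, not the library's declarations. They keep `evaluate()`, `increment()` and `getResult()` from the text and add an assumed `clear()`. `SimpleMean` is a hypothetical class. It shows how a statistic that needs only a count and a running sum can serve both the array-based style and the incremental style.

```java
// Simplified stand-in for the array-based interface (not the library's declaration).
interface UnivariateStatistic {
    /** Computes the statistic over a complete array of values. */
    double evaluate(double[] values);
}

// Simplified stand-in for the storeless extension; clear() is an assumption.
interface StorelessUnivariateStatistic extends UnivariateStatistic {
    /** Folds one more value into the running state. */
    void increment(double d);

    /** Returns the statistic for all values added so far. */
    double getResult();

    /** Resets the running state. */
    void clear();
}

/** Hypothetical storeless mean: keeps only a count and a running sum, never the values. */
class SimpleMean implements StorelessUnivariateStatistic {
    private long n = 0;
    private double sum = 0.0;

    @Override
    public void increment(double d) {
        n++;
        sum += d;
    }

    @Override
    public double getResult() {
        // No values yet: the mean is undefined.
        return n == 0 ? Double.NaN : sum / n;
    }

    @Override
    public void clear() {
        n = 0;
        sum = 0.0;
    }

    /** Array evaluation reuses the incremental path on a fresh instance. */
    @Override
    public double evaluate(double[] values) {
        SimpleMean m = new SimpleMean();
        for (double v : values) {
            m.increment(v);
        }
        return m.getResult();
    }
}
```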
@@ -65,29 +65,110 @@
-
- TODO: add code sample
- There is also a utility class,
- org.apache.commons.math.stat.StatUtils, that provides static methods for computing statistics
- from double[] arrays.
+ generally more convenient (and efficient) to access them using the provided aggregates,
+ DescriptiveStatistics and
+ SummaryStatistics.
- Aggregate Statistics Included Values stored?
-
- org.apache.commons.math.stat.DescriptiveStatistics All Yes
-
- org.apache.commons.math.stat.SummaryStatistics min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance No
+ DescriptiveStatistics maintains the input data in memory and has the capability
+ of producing "rolling" statistics computed from a "window" consisting of the most recently added values. SummaryStatistics
+ does not store the input data values in memory, so the statistics included in this aggregate are limited to those that can be
+ computed in one pass through the data without access to the full array of values.
+
+ Aggregate | Statistics Included | Values stored? | "Rolling" capability? |
+ ---|---|---|---|
+ DescriptiveStatistics | min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median | Yes | Yes |
+ SummaryStatistics | min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance | No | No |
+ There is also a utility class,
+ StatUtils, that provides static methods for computing statistics
+ directly from double[] arrays.
+
+ Here are some examples showing how to compute univariate statistics.
+
+ DescriptiveStatistics aggregate (values are stored in memory):
+
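The stored-values approach can be sketched with a hypothetical `StoredStats` class. It is not the library's `DescriptiveStatistics`, and its method names are illustrative assumptions. Because every value is kept, statistics that need the whole data set, such as the median, can be computed on demand. The standard deviation here divides by n - 1 (sample standard deviation).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stored-values aggregate: keeps every value so order statistics remain available. */
class StoredStats {
    private final List<Double> values = new ArrayList<>();

    void addValue(double v) {
        values.add(v);
    }

    long getN() {
        return values.size();
    }

    double getMean() {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size(); // NaN when empty (0.0 / 0)
    }

    /** Sample standard deviation (divides by n - 1), using a second pass over the stored values. */
    double getStandardDeviation() {
        double mean = getMean();
        double ss = 0.0;
        for (double v : values) {
            ss += (v - mean) * (v - mean);
        }
        return Math.sqrt(ss / (values.size() - 1));
    }

    /** Median: sorts a copy of the stored values; impossible without storing them. */
    double getMedian() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double[] sorted = new double[values.size()];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = values.get(i);
        }
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Odd count: middle element; even count: average of the two middle elements.
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this gives a mean of 5.0 and a median of 4.5.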
+ SummaryStatistics aggregate (values are not stored in memory):
+
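A one-pass aggregate needs only a fixed amount of state per statistic. The hypothetical `RunningStats` class below illustrates this. It uses Welford's update for the mean and variance, chosen here as one common numerically stable one-pass method. This choice is an assumption for illustration and is not necessarily how `SummaryStatistics` computes its values. Memory use stays constant however many values are added, but order statistics such as the median cannot be produced.

```java
/** Hypothetical one-pass aggregate: O(1) memory, each value is discarded after addValue(). */
class RunningStats {
    private long n = 0;
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean
    private double sum = 0.0;
    private double min = Double.NaN;
    private double max = Double.NaN;

    void addValue(double x) {
        n++;
        sum += x;
        if (n == 1) {
            min = x;
            max = x;
        } else {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        // Welford's update: adjust the mean, then accumulate using old and new deviations.
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() { return n; }

    double getSum() { return sum; }

    double getMin() { return min; }

    double getMax() { return max; }

    double getMean() { return n == 0 ? Double.NaN : mean; }

    /** Sample variance (divides by n - 1); undefined for fewer than two values. */
    double getVariance() { return n < 2 ? Double.NaN : m2 / (n - 1); }

    double getStandardDeviation() { return Math.sqrt(getVariance()); }
}
```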
+ StatUtils utility class:
+
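Static helpers over a `double[]` need no object state. `ArrayStats` is a hypothetical class written in that style. It is not `StatUtils` itself, and its method names are assumptions. It covers a few of the statistics listed in the table, including sum of squares and geometric mean.

```java
/** Hypothetical static helpers over double[] arrays, in the style of a utility class. */
final class ArrayStats {
    private ArrayStats() {} // no instances: static methods only

    static double sum(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v;
        }
        return s;
    }

    static double sumSq(double[] values) {
        double s = 0.0;
        for (double v : values) {
            s += v * v;
        }
        return s;
    }

    static double mean(double[] values) {
        return sum(values) / values.length; // NaN for an empty array
    }

    /** Sample variance (divides by n - 1), computed in two passes over the array. */
    static double variance(double[] values) {
        if (values.length < 2) {
            return Double.NaN;
        }
        double m = mean(values);
        double ss = 0.0;
        for (double v : values) {
            ss += (v - m) * (v - m);
        }
        return ss / (values.length - 1);
    }

    /** Geometric mean via the mean of logarithms; assumes strictly positive values. */
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }
}
```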
+ DescriptiveStatistics instance with window size set to 100
+
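A rolling window can be sketched with a bounded deque. Once the window is full, each new value evicts the oldest one. `WindowedMean` is a hypothetical illustration of that behavior for the mean only and is not the library's windowing code. The mean is recomputed over the window on each call. This costs O(window size) work per call but avoids the drift that a running sum with repeated subtractions can accumulate.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical rolling mean over the most recent windowSize values. */
class WindowedMean {
    private final int windowSize;
    private final Deque<Double> window = new ArrayDeque<>();

    WindowedMean(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    void addValue(double v) {
        if (window.size() == windowSize) {
            window.removeFirst(); // evict the oldest value
        }
        window.addLast(v);
    }

    int getN() {
        return window.size();
    }

    /** Mean of the values currently in the window; NaN if none have been added. */
    double getMean() {
        if (window.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : window) {
            sum += v;
        }
        return sum / window.size();
    }
}
```

Feeding the values 1 through 250 into `new WindowedMean(100)` leaves 151 through 250 in the window, so `getMean()` returns 200.5.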
+ This is yet to be written. Any contributions will be gratefully accepted!