Filled in missing content in univariate statistics section.

git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@141115 13f79535-47bb-0310-9956-ffa450edef68
Phil Steitz 2004-03-03 02:32:25 +00:00
parent be15008b64
commit e6c5757f99
1 changed files with 97 additions and 16 deletions

@ -17,7 +17,7 @@
-->
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 1.9 $ $Date: 2004/02/29 21:25:08 $ -->
<!-- $Revision: 1.10 $ $Date: 2004/03/03 02:32:25 $ -->
<document url="stat.html">
<properties>
<title>The Commons Math User Guide - Statistics</title>
@ -57,7 +57,7 @@
all statistics, consists of <code>evaluate()</code> methods that take double[] arrays as arguments and return
the value of the statistic. This interface is extended by
<a href="../apidocs/org/apache/commons/math/stat/univariate/StorelessUnivariateStatistic.html">
org.apache.commons.math.stat.univariate.StorelessUnivariateStatistic,</a> which adds <code>increment(),</code>
StorelessUnivariateStatistic,</a> which adds <code>increment(),</code>
<code>getResult()</code> and associated methods to support "storageless" implementations that
maintain counters, sums or other state information as values are added using the <code>increment()</code>
method.
@ -65,29 +65,110 @@
<p>
Abstract implementations of the top level interfaces are provided in
<a href="../apidocs/org/apache/commons/math/stat/univariate/AbstractUnivariateStatistic.html">
org.apache.commons.math.stat.univariate.AbstractUnivariateStatistic</a> and
AbstractUnivariateStatistic</a> and
<a href="../apidocs/org/apache/commons/math/stat/univariate/AbstractStorelessUnivariateStatistic.html">
org.apache.commons.math.stat.univariate.AbstractStorelessUnivariateStatistic</a> respectively.
AbstractStorelessUnivariateStatistic</a> respectively.
</p>
<p>
Each statistic is implemented as a separate class, in one of the subpackages (moment, rank, summary) and
each extends one of the abstract classes above (depending on whether or not value storage is required to
compute the statistic).
There are several ways to instantiate and use statistics. Statistics can be instantiated and used directly, but it is
generally more convenient to access them using the provided aggregates:
<table>
<tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th></tr>
<tr><td><a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
org.apache.commons.math.stat.DescriptiveStatistics</a></td><td>All</td><td>Yes</td></tr>
<tr><td><a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
org.apache.commons.math.stat.SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td><td>No</td></tr>
</table>
TODO: add code sample
There is also a utility class, <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
org.apache.commons.math.stat.StatUtils,</a> that provides static methods for computing statistics
from double[] arrays.
generally more convenient (and efficient) to access them using the provided aggregates, <a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
DescriptiveStatistics</a> and <a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
SummaryStatistics.</a> <code>DescriptiveStatistics</code> maintains the input data in memory and has the capability
of producing "rolling" statistics computed from a "window" consisting of the most recently added values. <code>SummaryStatistics</code>
does not store the input data values in memory, so the statistics included in this aggregate are limited to those that can be
computed in one pass through the data without access to the full array of values.
</p>
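<p>
To make the one-pass idea concrete, here is a minimal, self-contained sketch of a "storeless" accumulator that
updates a count, a running mean and a running sum of squared deviations as each value arrives (Welford's update).
The class name <code>RunningStats</code> and its methods are hypothetical and exist only for illustration; this is
not the Commons Math implementation, and the choice of the bias-corrected (n - 1) variance is an assumption.
</p>

```java
// Hypothetical illustration of one-pass ("storeless") statistics.
// This is NOT the Commons Math implementation.
class RunningStats {
    private long n = 0;        // number of values seen so far
    private double mean = 0.0; // running mean
    private double m2 = 0.0;   // running sum of squared deviations from the mean

    /** Updates the counters with one new value (Welford's update). */
    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() {
        return n;
    }

    /** Returns NaN when no values have been added. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Bias-corrected sample variance (divides by n - 1); an assumption of this sketch. */
    double getVariance() {
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        return m2 / (n - 1);
    }

    double getStandardDeviation() {
        return Math.sqrt(getVariance());
    }

    public static void main(String[] args) {
        RunningStats stats = new RunningStats();
        double[] data = {2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0};
        for (double x : data) {
            stats.addValue(x); // no value is retained after this call
        }
        System.out.println("n = " + stats.getN());
        System.out.println("mean = " + stats.getMean());
        System.out.println("variance = " + stats.getVariance());
    }
}
```

<p>
Because no values are retained, statistics that need the full data set, such as the median or other percentiles,
cannot be produced by an accumulator of this kind.
</p>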
<p>
<table>
<tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th><th>"Rolling" capability?</th></tr>
<tr><td><a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
DescriptiveStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median</td><td>Yes</td><td>Yes</td></tr>
<tr><td><a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td><td>No</td><td>No</td></tr>
</table>
</p>
<p>
There is also a utility class, <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
StatUtils,</a> that provides static methods for computing statistics
directly from double[] arrays.
</p>
<p>
Here are some examples showing how to compute univariate statistics.
<dl>
<dt>Compute summary statistics for a list of double values</dt>
<br></br>
<dd>Using the <code>DescriptiveStatistics</code> aggregate (values are stored in memory):
<source>
// Get a DescriptiveStatistics instance using factory method
DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
// Add the data from the array
for (int i = 0; i &lt; inputArray.length; i++) {
stats.addValue(inputArray[i]);
}
// Compute some statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
double median = stats.getMedian();
</source>
</dd>
<dd>Using the <code>SummaryStatistics</code> aggregate (values are <strong>not</strong> stored in memory):
<source>
// Get a SummaryStatistics instance using factory method
SummaryStatistics stats = SummaryStatistics.newInstance();
// Read data from an input stream, adding values and updating sums, counters, etc. necessary for stats
// (assumes in is an open BufferedReader and line is a declared String)
while ((line = in.readLine()) != null) {
stats.addValue(Double.parseDouble(line.trim()));
}
in.close();
// Compute the statistics
double mean = stats.getMean();
double std = stats.getStandardDeviation();
//double median = stats.getMedian(); &lt;-- NOT AVAILABLE in SummaryStatistics
</source>
</dd>
<dd>Using the <code>StatUtils</code> utility class:
<source>
// Compute statistics directly from the array -- assume values is a double[] array
double mean = StatUtils.mean(values);
double std = Math.sqrt(StatUtils.variance(values));
double median = StatUtils.percentile(values, 50);
// Compute the mean of the first three values in the array
mean = StatUtils.mean(values, 0, 3);
</source>
</dd>
<dt>Maintain a "rolling mean" of the most recent 100 values from an input stream</dt>
<br></br>
<dd>Use a <code>DescriptiveStatistics</code> instance with window size set to 100:
<source>
// Create a DescriptiveStatistics instance and set the window size to 100
DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
stats.setWindowSize(100);
// Read data from an input stream, displaying the mean of the most recent 100 observations
// after every 100 observations
// (assumes in is an open BufferedReader and line is a declared String)
long nLines = 0;
while ((line = in.readLine()) != null) {
stats.addValue(Double.parseDouble(line.trim()));
nLines++;
if (nLines == 100) {
nLines = 0;
System.out.println(stats.getMean()); // "rolling" mean of most recent 100 values
}
}
in.close();
</source>
</dd>
</dl>
</p>
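<p>
For readers curious how a "rolling" statistic can work, the following self-contained sketch keeps the most recent
values in a fixed-size circular buffer and maintains their sum. The class name <code>RollingMean</code> and its
methods are hypothetical; this illustrates the windowing idea only and is not how <code>DescriptiveStatistics</code>
is implemented.
</p>

```java
// Hypothetical illustration of a rolling mean over a fixed-size window.
// This is NOT the DescriptiveStatistics implementation.
class RollingMean {
    private final double[] window; // circular buffer holding the most recent values
    private int count = 0;         // number of valid values currently in the window
    private int next = 0;          // index where the next value will be written
    private double sum = 0.0;      // sum of the values currently in the window

    RollingMean(int windowSize) {
        if (!(windowSize > 0)) {
            throw new IllegalArgumentException("window size must be positive");
        }
        window = new double[windowSize];
    }

    /** Adds a value, discarding the oldest one once the window is full. */
    void addValue(double x) {
        if (count == window.length) {
            sum -= window[next]; // window is full: drop the oldest value
        } else {
            count++;
        }
        window[next] = x;
        sum += x;
        next = (next + 1) % window.length;
    }

    /** Mean of the values currently in the window, or NaN if it is empty. */
    double getMean() {
        return count == 0 ? Double.NaN : sum / count;
    }

    public static void main(String[] args) {
        RollingMean rolling = new RollingMean(3);
        double[] data = {1.0, 2.0, 3.0, 4.0, 5.0};
        for (double x : data) {
            rolling.addValue(x);
            System.out.println("rolling mean = " + rolling.getMean());
        }
    }
}
```

<p>
Maintaining a running sum keeps each update cheap, but rounding error can accumulate over very long streams;
recomputing the sum from the buffer periodically is one way to limit that drift.
</p>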
</subsection>
<subsection name="1.3 Frequency distributions" href="frequency">
<p>This is yet to be written. Any contributions will be gratefully
accepted!</p>