Filled in missing content in univariate statistics section.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@141115 13f79535-47bb-0310-9956-ffa450edef68
parent be15008b64
commit e6c5757f99
@@ -17,7 +17,7 @@
|
|||
-->
|
||||
|
||||
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
|
||||
<!-- $Revision: 1.9 $ $Date: 2004/02/29 21:25:08 $ -->
|
||||
<!-- $Revision: 1.10 $ $Date: 2004/03/03 02:32:25 $ -->
|
||||
<document url="stat.html">
|
||||
<properties>
|
||||
<title>The Commons Math User Guide - Statistics</title>
|
||||
|
@@ -57,7 +57,7 @@
|
|||
all statistics, consists of <code>evaluate()</code> methods that take double[] arrays as arguments and return
|
||||
the value of the statistic. This interface is extended by
|
||||
<a href="../apidocs/org/apache/commons/math/stat/univariate/StorelessUnivariateStatistic.html">
|
||||
org.apache.commons.math.stat.univariate.StorelessUnivariateStatistic,</a> which adds <code>increment(),</code>
|
||||
StorelessUnivariateStatistic,</a> which adds <code>increment(),</code>
|
||||
<code>getResult()</code> and associated methods to support "storageless" implementations that
|
||||
maintain counters, sums or other state information as values are added using the <code>increment()</code>
|
||||
method.
|
||||
|
@@ -65,29 +65,110 @@
|
|||
<p>
|
||||
Abstract implementations of the top level interfaces are provided in
|
||||
<a href="../apidocs/org/apache/commons/math/stat/univariate/AbstractUnivariateStatistic.html">
|
||||
org.apache.commons.math.stat.univariate.AbstractUnivariateStatistic</a> and
|
||||
AbstractUnivariateStatistic</a> and
|
||||
<a href="../apidocs/org/apache/commons/math/stat/univariate/AbstractStorelessUnivariateStatistic.html">
|
||||
org.apache.commons.math.stat.univariate.AbstractStorelessUnivariateStatistic</a> respectively.
|
||||
AbstractStorelessUnivariateStatistic</a> respectively.
|
||||
</p>
|
||||
<p>
|
||||
Each statistic is implemented as a separate class, in one of the subpackages (moment, rank, summary) and
|
||||
each extends one of the abstract classes above (depending on whether or not value storage is required to
|
||||
compute the statistic).
|
||||
There are several ways to instantiate and use statistics. Statistics can be instantiated and used directly, but it is
|
||||
generally more convenient to access them using the provided aggregates:
|
||||
<table>
|
||||
<tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th></tr>
|
||||
<tr><td><a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
|
||||
org.apache.commons.math.stat.DescriptiveStatistics</a></td><td>All</td><td>Yes</td></tr>
|
||||
<tr><td><a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
|
||||
org.apache.commons.math.stat.SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td><td>No</td></tr>
|
||||
</table>
|
||||
TODO: add code sample
|
||||
There is also a utility class, <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
|
||||
org.apache.commons.math.stat.StatUtils,</a> that provides static methods for computing statistics
|
||||
from double[] arrays.
|
||||
generally more convenient (and efficient) to access them using the provided aggregates, <a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
|
||||
DescriptiveStatistics</a> and <a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
|
||||
SummaryStatistics.</a> <code>DescriptiveStatistics</code> maintains the input data in memory and has the capability
|
||||
of producing "rolling" statistics computed from a "window" consisting of the most recently added values. <code>SummaryStatistics</code>
|
||||
does not store the input data values in memory, so the statistics included in this aggregate are limited to those that can be
|
||||
computed in one pass through the data without access to the full array of values.
|
||||
</p>
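To make the "one pass, no stored values" idea concrete, here is a minimal sketch of a storeless accumulator. The class name RunningStats and its methods are hypothetical and are not the Commons Math implementation; the update rule is Welford's algorithm, swapped in here as one common way to maintain a mean and variance incrementally.

```java
/**
 * Hypothetical storeless accumulator (illustration only, not part of Commons Math).
 * It keeps a count, a running mean and a running sum of squared deviations,
 * so no input value needs to be retained after it has been added.
 */
final class RunningStats {
    private long n;       // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Updates the internal state with one new value (Welford's update). */
    void increment(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() {
        return n;
    }

    /** Returns NaN if no values have been added. */
    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Sample variance (n - 1 denominator); NaN for fewer than two values in this sketch. */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    double getStandardDeviation() {
        return Math.sqrt(getVariance());
    }
}
```

A statistic such as the median cannot be maintained this way, because it needs access to the full set of values; that is why it appears only in the aggregate that stores its data.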
|
||||
<p>
|
||||
<table>
|
||||
<tr><th>Aggregate</th><th>Statistics Included</th><th>Values stored?</th><th>"Rolling" capability?</th></tr>
|
||||
<tr><td><a href="../apidocs/org/apache/commons/math/stat/DescriptiveStatistics.html">
|
||||
DescriptiveStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance, percentiles, skewness, kurtosis, median</td><td>Yes</td><td>Yes</td></tr>
|
||||
<tr><td><a href="../apidocs/org/apache/commons/math/stat/SummaryStatistics.html">
|
||||
SummaryStatistics</a></td><td>min, max, mean, geometric mean, n, sum, sum of squares, standard deviation, variance</td><td>No</td><td>No</td></tr>
|
||||
</table>
|
||||
</p>
|
||||
<p>
|
||||
There is also a utility class, <a href="../apidocs/org/apache/commons/math/stat/StatUtils.html">
|
||||
StatUtils,</a> that provides static methods for computing statistics
|
||||
directly from double[] arrays.
|
||||
</p>
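As a rough illustration of what a static, array-based helper of this kind looks like, the sketch below computes a mean over a whole array or over a sub-range given by a start index and a length. ArrayStats is a hypothetical name used only for this example; it is not the StatUtils source, and its handling of empty input (returning NaN) is an assumption.

```java
/** Hypothetical array helper (illustration only, not the StatUtils implementation). */
final class ArrayStats {
    private ArrayStats() {
        // static methods only
    }

    /** Mean of all values in the array. */
    static double mean(double[] values) {
        return mean(values, 0, values.length);
    }

    /** Mean of the {@code length} values starting at index {@code begin}. */
    static double mean(double[] values, int begin, int length) {
        if (begin < 0 || length < 0 || begin + length > values.length) {
            throw new IllegalArgumentException("range is outside the array");
        }
        if (length == 0) {
            return Double.NaN; // assumption: an empty range has no mean
        }
        double sum = 0.0;
        for (int i = begin; i < begin + length; i++) {
            sum += values[i];
        }
        return sum / length;
    }
}
```

With this sketch, ArrayStats.mean(values, 0, 3) averages only the first three entries of values.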
|
||||
<p>
|
||||
Here are some examples showing how to compute univariate statistics.
|
||||
<dl>
|
||||
<dt>Compute summary statistics for a list of double values</dt>
|
||||
<br></br>
|
||||
<dd>Using the <code>DescriptiveStatistics</code> aggregate (values are stored in memory):
|
||||
<source>
|
||||
// Get a DescriptiveStatistics instance using factory method
|
||||
DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
|
||||
|
||||
// Add the data from the array
|
||||
for (int i = 0; i < inputArray.length; i++) {
    stats.addValue(inputArray[i]);
}
|
||||
|
||||
// Compute some statistics
|
||||
double mean = stats.getMean();
|
||||
double std = stats.getStandardDeviation();
|
||||
double median = stats.getMedian();
|
||||
</source>
|
||||
</dd>
|
||||
<dd>Using the <code>SummaryStatistics</code> aggregate (values are <strong>not</strong> stored in memory):
|
||||
<source>
|
||||
// Get a SummaryStatistics instance using factory method
|
||||
SummaryStatistics stats = SummaryStatistics.newInstance();
|
||||
|
||||
// Read data from an input stream, adding values and updating sums, counters, etc. necessary for stats
|
||||
// Assume in is a BufferedReader; read the first line before testing for end of input
String line = in.readLine();
while (line != null) {
    stats.addValue(Double.parseDouble(line.trim()));
    line = in.readLine();
}
|
||||
in.close();
|
||||
|
||||
// Compute the statistics
|
||||
double mean = stats.getMean();
|
||||
double std = stats.getStandardDeviation();
|
||||
//double median = stats.getMedian(); <-- NOT AVAILABLE in SummaryStatistics
|
||||
</source>
|
||||
</dd>
|
||||
<dd>Using the <code>StatUtils</code> utility class:
|
||||
<source>
|
||||
// Compute statistics directly from the array -- assume values is a double[] array
|
||||
double mean = StatUtils.mean(values);
double std = Math.sqrt(StatUtils.variance(values));
double median = StatUtils.percentile(values, 50);
// Compute the mean of the first three values in the array
mean = StatUtils.mean(values, 0, 3);
|
||||
</source>
|
||||
</dd>
|
||||
<dt>Maintain a "rolling mean" of the most recent 100 values from an input stream</dt>
|
||||
<br></br>
|
||||
<dd>Use a <code>DescriptiveStatistics</code> instance with window size set to 100:
|
||||
<source>
|
||||
// Create a DescriptiveStatistics instance and set the window size to 100
|
||||
DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
|
||||
stats.setWindowSize(100);
|
||||
// Read data from an input stream, displaying the mean of the most recent 100 observations
|
||||
// after every 100 observations
|
||||
long nLines = 0;
// Assume in is a BufferedReader; read the first line before testing for end of input
String line = in.readLine();
while (line != null) {
    stats.addValue(Double.parseDouble(line.trim()));
    nLines++;
    if (nLines == 100) {
        nLines = 0;
        System.out.println(stats.getMean()); // "rolling" mean of most recent 100 values
    }
    line = in.readLine();
}
|
||||
in.close();
|
||||
</source>
|
||||
</dd>
|
||||
</dl>
|
||||
</p>
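The "window" behaviour used in the rolling mean example can be sketched independently of the library. The RollingMean class below is hypothetical and is not how DescriptiveStatistics is implemented; it only shows the idea of discarding the oldest value once the window is full.

```java
import java.util.ArrayDeque;

/** Hypothetical fixed-size window mean (illustration only, not Commons Math code). */
final class RollingMean {
    private final int windowSize;
    private final ArrayDeque<Double> window = new ArrayDeque<>();
    private double sum; // sum of the values currently in the window

    RollingMean(int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
    }

    /** Adds a value, evicting the oldest one if the window is already full. */
    void addValue(double x) {
        window.addLast(x);
        sum += x;
        if (window.size() > windowSize) {
            sum -= window.removeFirst();
        }
    }

    /** Mean of the values currently in the window; NaN if it is empty. */
    double getMean() {
        return window.isEmpty() ? Double.NaN : sum / window.size();
    }
}
```

Keeping a running sum makes each update constant time, at the cost of accumulating floating-point rounding error over very long streams; recomputing the sum from the stored window on each call avoids that drift.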
|
||||
</subsection>
|
||||
|
||||
<subsection name="1.3 Frequency distributions" href="frequency">
|
||||
<p>This is yet to be written. Any contributions will be gratefully
|
||||
accepted!</p>
|
||||