diff --git a/src/site/xdoc/userguide/stat.xml b/src/site/xdoc/userguide/stat.xml index 747d0801f..bceaa20a9 100644 --- a/src/site/xdoc/userguide/stat.xml +++ b/src/site/xdoc/userguide/stat.xml @@ -25,20 +25,22 @@
- +

The statistics package provides frameworks and implementations for basic Descriptive statistics, frequency distributions, bivariate regression, and t-, chi-square and ANOVA test statistics.

- Descriptive statistics

- Frequency distributions

- Simple Regression

- Statistical Tests

+ Descriptive statistics

+ Frequency distributions

+ Simple Regression

+ Multiple Regression

+ Covariance and correlation

+ Statistical Tests

- +

The stat package includes a framework and default implementations for the following Descriptive statistics: @@ -217,7 +219,7 @@ DescriptiveStatistics stats = DescriptiveStatistics.newInstance(SynchronizedDesc

- +

org.apache.commons.math.stat.descriptive.Frequency @@ -281,7 +283,7 @@ System.out.println(f.getCumPct("z")); // displays 1

- +

org.apache.commons.math.stat.regression.SimpleRegression @@ -398,7 +400,7 @@ System.out.println(regression.getSlopeStdErr());

- +

org.apache.commons.math.stat.regression.MultipleLinearRegression @@ -492,7 +494,121 @@ regression.addData(y, x, omega); // we do need covariance

- + +

+ The + org.apache.commons.math.stat.correlation package computes covariances + and correlations for pairs of arrays or columns of a matrix. + + Covariance computes covariances, and + + PearsonsCorrelation provides Pearson's product-moment correlation coefficients. +

+

+ Implementation Notes +

    +
  • + Unbiased covariances are given by the formula

    + $\mathrm{cov}(X, Y) = \sum_{i=1}^{n} (x_i - E(X))(y_i - E(Y)) / (n - 1)$ + where E(X) and E(Y) are the means of X and Y and n is the number of + observations. Non-bias-corrected estimates use + n in place of n - 1. Whether or not covariances are + bias-corrected is determined by the optional parameter + biasCorrected, which defaults to true. +
  • +
  • + + PearsonsCorrelation computes correlations defined by the formula

    + $\mathrm{cor}(X, Y) = \sum_{i=1}^{n} (x_i - E(X))(y_i - E(Y)) / [(n - 1)\,s(X)\,s(Y)]$ + where E(X) and E(Y) are the means of X and Y + and s(X), s(Y) are their bias-corrected standard deviations (computed with n - 1 in the denominator). +
  • +
+
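To make the two formulas above concrete, here is a minimal plain-Java sketch that evaluates them term by term. The class `PairwiseStats` and its methods are hypothetical illustrations and are not part of Commons Math. The sketch assumes two arrays of equal length with at least two observations, and it is not meant to reflect how the library implements these calculations internally.

```java
/**
 * Hypothetical helper (not part of Commons Math) that evaluates the
 * covariance and Pearson's correlation formulas term by term.
 */
final class PairwiseStats {

    private PairwiseStats() { }

    /** Arithmetic mean, E(X) in the formulas above. */
    static double mean(double[] v) {
        double sum = 0.0;
        for (double d : v) {
            sum += d;
        }
        return sum / v.length;
    }

    /**
     * Covariance of x and y: the sum of cross-products of deviations from the
     * means, divided by n - 1 if biasCorrected is true and by n otherwise.
     */
    static double covariance(double[] x, double[] y, boolean biasCorrected) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("arrays must have the same length, at least 2");
        }
        int n = x.length;
        double meanX = mean(x);
        double meanY = mean(y);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (biasCorrected ? n - 1 : n);
    }

    /**
     * Pearson's r: the cross-product sum divided by (n - 1) s(X) s(Y), which
     * equals the unbiased covariance divided by the two bias-corrected
     * standard deviations.
     */
    static double pearson(double[] x, double[] y) {
        double sX = Math.sqrt(covariance(x, x, true));
        double sY = Math.sqrt(covariance(y, y, true));
        return covariance(x, y, true) / (sX * sY);
    }
}
```

Calling `covariance(x, x, true)` returns the unbiased variance of `x`. That is why its square root serves as s(X) in `pearson`.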

+

+ Examples: +

+
Covariance of two arrays
+

+
To compute the unbiased covariance between two double arrays, + x and y, use: + +new Covariance().covariance(x, y) + + For non-bias-corrected covariances, use + +covariance(x, y, false) + +
+
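The following small hand-worked example uses made-up arrays x = (1, 2, 3) and y = (2, 4, 7). It shows that the unbiased and non-bias-corrected estimates differ only in the divisor:

```latex
\begin{aligned}
E(X) &= 2, \qquad E(Y) = \tfrac{13}{3} \\
\sum_{i=1}^{3} (x_i - E(X))(y_i - E(Y))
  &= (-1)\left(-\tfrac{7}{3}\right) + 0\left(-\tfrac{1}{3}\right) + (1)\left(\tfrac{8}{3}\right) = 5 \\
\text{unbiased:}\quad \frac{5}{n - 1} &= \frac{5}{2} = 2.5 \\
\text{non-bias-corrected:}\quad \frac{5}{n} &= \frac{5}{3} \approx 1.667
\end{aligned}
```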

+
Covariance matrix
+

+
A covariance matrix over the columns of a source matrix data + can be computed using + +new Covariance().computeCovarianceMatrix(data) + + The (i, j) entry of the returned matrix is the unbiased covariance of the ith and jth + columns of data. As above, to get non-bias-corrected covariances, + use + +computeCovarianceMatrix(data, false) + +
+
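The shape of the result can be illustrated with a plain-Java sketch. It assumes that data is a rectangular `double[][]` holding one observation per row and one variable per column. `CovarianceMatrixSketch` is a hypothetical helper, not the library's implementation.

```java
/**
 * Hypothetical sketch (not the Commons Math implementation) of a covariance
 * matrix over the columns of a rectangular double[][]: rows are observations,
 * columns are variables.
 */
final class CovarianceMatrixSketch {

    private CovarianceMatrixSketch() { }

    static double[][] covarianceMatrix(double[][] data, boolean biasCorrected) {
        int n = data.length;      // number of observations (rows)
        if (n < 2) {
            throw new IllegalArgumentException("need at least 2 observations");
        }
        int k = data[0].length;   // number of variables (columns); rows assumed equal length

        // Column means E(X_j).
        double[] means = new double[k];
        for (double[] row : data) {
            for (int j = 0; j < k; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < k; j++) {
            means[j] /= n;
        }

        double divisor = biasCorrected ? n - 1 : n;
        double[][] cov = new double[k][k];
        // Entry (i, j) is the covariance of columns i and j. The matrix is
        // symmetric, so compute the upper triangle and mirror it.
        for (int i = 0; i < k; i++) {
            for (int j = i; j < k; j++) {
                double sum = 0.0;
                for (double[] row : data) {
                    sum += (row[i] - means[i]) * (row[j] - means[j]);
                }
                cov[i][j] = sum / divisor;
                cov[j][i] = cov[i][j];
            }
        }
        return cov;
    }
}
```

The diagonal entries are the column variances, and the off-diagonal entries are the pairwise covariances.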

+
Pearson's correlation of two arrays
+

+
To compute Pearson's product-moment correlation between two double arrays + x and y, use: + +new PearsonsCorrelation().correlation(x, y) + +
+
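As a hand check, take made-up arrays x = (1, 2, 3) and y = (2, 4, 7). These have E(X) = 2 and E(Y) = 13/3, and their sum of cross-products of deviations is 5. With n = 3, the bias-corrected standard deviations and the coefficient work out as follows:

```latex
\begin{aligned}
s(X) &= \sqrt{\tfrac{(-1)^2 + 0^2 + 1^2}{2}} = 1 \\
s(Y) &= \sqrt{\tfrac{(-7/3)^2 + (-1/3)^2 + (8/3)^2}{2}} = \sqrt{\tfrac{57}{9}} = \tfrac{\sqrt{57}}{3} \approx 2.517 \\
\mathrm{cor}(X, Y) &= \frac{5}{(3 - 1)\, s(X)\, s(Y)} = \frac{15}{2\sqrt{57}} \approx 0.9934
\end{aligned}
```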

+
Pearson's correlation matrix
+

+
A (Pearson's) correlation matrix over the columns of a source matrix data + can be computed using + +new PearsonsCorrelation().computeCorrelationMatrix(data) + + The (i, j) entry of the returned matrix is Pearson's product-moment correlation between the + ith and jth columns of data. +
+
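Correlation and covariance share the same numerator, so a correlation matrix can also be derived from a covariance matrix using $r_{ij} = \mathrm{cov}_{ij} / \sqrt{\mathrm{cov}_{ii}\,\mathrm{cov}_{jj}}$. The n - 1 (or n) divisor cancels in this ratio, so either covariance flavor yields the same correlations. The plain-Java sketch below shows that identity. `CorrelationFromCovariance` is a hypothetical name, and this is not necessarily how PearsonsCorrelation builds its matrix.

```java
/**
 * Hypothetical sketch: converts a square, symmetric covariance matrix into a
 * correlation matrix via r_ij = cov_ij / sqrt(cov_ii * cov_jj).
 * Assumes every variable has nonzero variance.
 */
final class CorrelationFromCovariance {

    private CorrelationFromCovariance() { }

    static double[][] toCorrelation(double[][] cov) {
        int k = cov.length;
        double[] sd = new double[k];
        for (int i = 0; i < k; i++) {
            sd[i] = Math.sqrt(cov[i][i]);   // standard deviation of variable i
        }
        double[][] cor = new double[k][k];
        for (int i = 0; i < k; i++) {
            cor[i][i] = 1.0;                // each variable is perfectly correlated with itself
            for (int j = i + 1; j < k; j++) {
                cor[i][j] = cov[i][j] / (sd[i] * sd[j]);
                cor[j][i] = cor[i][j];      // the correlation matrix is symmetric
            }
        }
        return cor;
    }
}
```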

+
Pearson's correlation significance and standard errors
+

+
To compute standard errors and/or significances of Pearson's correlation + coefficients, start by creating a PearsonsCorrelation + instance from the source data using + +PearsonsCorrelation correlation = new PearsonsCorrelation(data); + + where data is either a rectangular array or a RealMatrix. + The matrix of standard errors is then returned by + +correlation.getCorrelationStandardErrors(); + + The formula used to compute the standard error is
+ $SE_r = \sqrt{(1 - r^2) / (n - 2)}$
+ where r is the estimated correlation coefficient and + n is the number of observations in the source dataset.

+ P-values for the null hypothesis that the respective coefficients are zero (also known as + significances) populate the RealMatrix returned by + +correlation.getCorrelationPValues(); + + getCorrelationPValues().getEntry(i,j) is the probability + that a random variable distributed as $t_{n-2}$ (Student's t with n - 2 degrees of freedom) takes + a value with absolute value greater than or equal to

+ $|r| \sqrt{(n - 2) / (1 - r^2)}$, where r + is the estimated correlation coefficient. +
+
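Below is a minimal plain-Java sketch of the two quantities above for a single coefficient r estimated from n observations. `StandardErrorSketch` is a hypothetical helper. It stops at the test statistic, because the two-sided p-value, 2 P(T ≤ -t) for T distributed as Student's t with n - 2 degrees of freedom, requires a t cumulative distribution function, and the JDK does not provide one.

```java
/**
 * Hypothetical sketch: standard error and t statistic for a Pearson
 * correlation coefficient r computed from n observations.
 */
final class StandardErrorSketch {

    private StandardErrorSketch() { }

    /** SE_r = sqrt((1 - r^2) / (n - 2)); requires n >= 3. */
    static double standardError(double r, int n) {
        if (n < 3) {
            throw new IllegalArgumentException("need at least 3 observations");
        }
        return Math.sqrt((1.0 - r * r) / (n - 2));
    }

    /**
     * |r| sqrt((n - 2) / (1 - r^2)), the value compared against t with
     * n - 2 degrees of freedom. Equivalent to |r| / SE_r; returns
     * Infinity when |r| == 1.
     */
    static double tStatistic(double r, int n) {
        if (n < 3) {
            throw new IllegalArgumentException("need at least 3 observations");
        }
        return Math.abs(r) * Math.sqrt((n - 2) / (1.0 - r * r));
    }
}
```

For example, r = 0.6 with n = 11 gives SE_r = sqrt(0.64 / 9) ≈ 0.267 and t = 0.6 × 3 / 0.8 = 2.25.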

+
+

+
+

The interfaces and implementations in the