Fixed errors in multiple regression section. JIRA: MATH-407.

git-svn-id: https://svn.apache.org/repos/asf/commons/proper/math/trunk@998761 13f79535-47bb-0310-9956-ffa450edef68
Phil Steitz 2010-09-20 01:57:03 +00:00
parent dbb5bb4968
commit eb504310ee
1 changed file with 52 additions and 41 deletions


@@ -473,37 +473,47 @@ System.out.println(regression.getSlopeStdErr());
 </subsection>
 <subsection name="1.5 Multiple linear regression">
 <p>
-<a href="../apidocs/org/apache/commons/math/stat/regression/MultipleLinearRegression.html">
-MultipleLinearRegression</a> provides ordinary least squares regression
-with a generic multiple variable linear model, which in matrix notation
-can be expressed as:
+<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+OLSMultipleLinearRegression</a> and
+<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
+GLSMultipleLinearRegression</a> provide least squares regression to fit the linear model:
 </p>
 <p>
-<code> y=X*b+u </code>
+<code> Y=X*b+u </code>
 </p>
 <p>
-where y is an <code>n-vector</code> <b>regressand</b>, X is a <code>[n,k]</code> matrix whose <code>k</code> columns are called
-<b>regressors</b>, b is <code>k-vector</code> of <b>regression parameters</b> and <code>u</code> is an <code>n-vector</code>
-of <b>error terms</b> or <b>residuals</b>. The notation is quite standard in literature,
-cf eg <a href="http://www.econ.queensu.ca/ETM">Davidson and MacKinnon, Econometrics Theory and Methods, 2004</a>.
+where Y is an n-vector <b>regressand</b>, X is a [n,k] matrix whose k columns are called
+<b>regressors</b>, b is a k-vector of <b>regression parameters</b> and u is an n-vector
+of <b>error terms</b> or <b>residuals</b>.
 </p>
 <p>
-Two implementations are provided: <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
-OLSMultipleLinearRegression</a> and
+<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
+OLSMultipleLinearRegression</a> provides Ordinary Least Squares Regression, and
 <a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
-GLSMultipleLinearRegression</a>
+GLSMultipleLinearRegression</a> implements Generalized Least Squares. See the javadoc for these
+classes for details on the algorithms and formulas used.
 </p>
 <p>
-Observations (x,y and covariance data matrices) can be added to the model via the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method.
-The observations are stored in memory until the next time the addData method is invoked.
+Data for OLS models can be loaded in a single double[] array, consisting of concatenated rows of data, each containing
+the regressand (Y) value, followed by regressor values; or using a double[][] array with rows corresponding to
+observations. GLS models also require a double[][] array representing the covariance matrix of the error terms. See
+<a href="../apidocs/org/apache/commons/math/stat/regression/AbstractMultipleLinearRegression.html#newSampleData(double[], int, int)">
+AbstractMultipleLinearRegression#newSampleData(double[],int,int)</a>,
+<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html#newSampleData(double[], double[][])">
+OLSMultipleLinearRegression#newSampleData(double[], double[][])</a> and
+<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html#newSampleData(double[], double[][], double[][])">
+GLSMultipleLinearRegression#newSampleData(double[],double[][],double[][])</a> for details.
 </p>
 <p>
 <strong>Usage Notes</strong>: <ul>
-<li> Data is validated when invoking the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method and
-<code>IllegalArgumentException</code> is thrown when inappropriate.
+<li> Data are validated when invoking any of the newSample, newX, newY or newCovariance methods and
+<code>IllegalArgumentException</code> is thrown when input data arrays do not have matching dimensions
+or do not contain sufficient data to estimate the model.
 </li>
-<li> Only the GLS regressions require the covariance matrix, so in the OLS regression it is ignored and can be safely
-inputted as <code>null</code>.</li>
+<li> By default, regression models are estimated with intercept terms. In the notation above, this implies that the
+X matrix contains an initial column identically equal to 1. X data supplied to the newX or newSample methods should not
+include this column - the data loading methods will create it automatically. To estimate a model without an intercept
+term, set the <code>noIntercept</code> property to <code>true</code>.</li>
 </ul>
 </p>
 <p>
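
For reference, the least-squares estimators implemented by the two classes are the standard textbook formulas (quoted here for convenience, not taken from the javadoc; the javadoc linked in the added paragraph describes the algorithms actually used, for example a QR decomposition in the OLS case):

\hat{b}_{OLS} = (X^T X)^{-1} X^T Y
\hat{b}_{GLS} = (X^T \Omega^{-1} X)^{-1} X^T \Omega^{-1} Y

where \Omega is the covariance matrix of the error terms u; OLS corresponds to \Omega = \sigma^2 I.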
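
The data-loading paragraph and the intercept usage note added above can be illustrated with a short sketch. It is not part of the commit: the data values are invented, and it assumes the newSampleData(double[], int, int) loader referenced above together with a setNoIntercept(boolean) setter for the noIntercept property.

import org.apache.commons.math.stat.regression.OLSMultipleLinearRegression;

// Six observations of {y, x1, x2}, concatenated row by row.
double[] data = new double[]{
    1.0,  1.0,  2.0,
    2.0,  2.0,  3.0,
    3.0,  3.0,  5.0,
    4.0,  4.0,  7.0,
    5.0,  5.0,  8.0,
    6.0,  6.0, 11.0
};
OLSMultipleLinearRegression ols = new OLSMultipleLinearRegression();
// ols.setNoIntercept(true);   // assumed setter; call before loading data to drop the intercept
ols.newSampleData(data, 6, 2); // 6 observations, 2 regressors per observation
double[] beta = ols.estimateRegressionParameters(); // {intercept, b1, b2} with the default intercept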
@@ -511,44 +521,48 @@ System.out.println(regression.getSlopeStdErr());
 <dl>
 <dt>OLS regression</dt>
 <br></br>
-<dd>Instantiate an OLS regression object and load dataset
+<dd>Instantiate an OLS regression object and load a dataset:
 <source>
-MultipleLinearRegression regression = new OLSMultipleLinearRegression();
+OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
 double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
-regression.addData(y, x, null); // we don't need covariance
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
+regression.newSampleData(y, x);
 </source>
 </dd>
-<dd>Estimate of regression values honours the <code>MultipleLinearRegression</code> interface:
+<dd>Get regression parameters and diagnostics:
 <source>
 double[] beta = regression.estimateRegressionParameters();
 double[] residuals = regression.estimateResiduals();
 double[][] parametersVariance = regression.estimateRegressionParametersVariance();
 double regressandVariance = regression.estimateRegressandVariance();
+double rSquared = regression.calculateRSquared();
+double sigma = regression.estimateRegressionStandardError();
 </source>
 </dd>
 <dt>GLS regression</dt>
 <br></br>
-<dd>Instantiate an GLS regression object and load dataset
+<dd>Instantiate a GLS regression object and load a dataset:
 <source>
-MultipleLinearRegression regression = new GLSMultipleLinearRegression();
+GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
 double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
 double[][] x = new double[6][];
-x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
-x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
-x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
-x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
-x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
-x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
+x[0] = new double[]{0, 0, 0, 0, 0};
+x[1] = new double[]{2.0, 0, 0, 0, 0};
+x[2] = new double[]{0, 3.0, 0, 0, 0};
+x[3] = new double[]{0, 0, 4.0, 0, 0};
+x[4] = new double[]{0, 0, 0, 5.0, 0};
+x[5] = new double[]{0, 0, 0, 0, 6.0};
 double[][] omega = new double[6][];
 omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
 omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
@@ -556,12 +570,9 @@ omega[2] = new double[]{0, 0, 3.3, 0, 0, 0};
 omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
 omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
 omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
-regression.addData(y, x, omega); // we do need covariance
+regression.newSampleData(y, x, omega);
 </source>
 </dd>
-<dd>Estimate of regression values honours the same <code>MultipleLinearRegression</code> interface as
-the OLS regression.
-</dd>
 </dl>
 </p>
 </subsection>
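
The removed <dd> above noted that GLS estimates are read through the same interface as OLS estimates. That remains true of the new classes, since both extend AbstractMultipleLinearRegression. A brief sketch, not part of the commit, reusing the data from the GLS example and only accessor methods that appear in the documentation above:

import org.apache.commons.math.stat.regression.GLSMultipleLinearRegression;

double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
double[][] x = new double[][]{
    {0, 0, 0, 0, 0}, {2.0, 0, 0, 0, 0}, {0, 3.0, 0, 0, 0},
    {0, 0, 4.0, 0, 0}, {0, 0, 0, 5.0, 0}, {0, 0, 0, 0, 6.0}
};
double[][] omega = new double[][]{
    {1.1, 0, 0, 0, 0, 0}, {0, 2.2, 0, 0, 0, 0}, {0, 0, 3.3, 0, 0, 0},
    {0, 0, 0, 4.4, 0, 0}, {0, 0, 0, 0, 5.5, 0}, {0, 0, 0, 0, 0, 6.6}
};
GLSMultipleLinearRegression gls = new GLSMultipleLinearRegression();
gls.newSampleData(y, x, omega);

// Same accessors as in the OLS example, inherited from AbstractMultipleLinearRegression:
double[] beta = gls.estimateRegressionParameters();
double[] residuals = gls.estimateResiduals();
double[][] parametersVariance = gls.estimateRegressionParametersVariance();
double sigma = gls.estimateRegressionStandardError();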