Fixed errors in multiple regression section. JIRA: MATH-407.
git-svn-id: https://svn.apache.org/repos/asf/commons/proper/math/trunk@998761 13f79535-47bb-0310-9956-ffa450edef68
parent
dbb5bb4968
commit
eb504310ee
|
@@ -473,37 +473,47 @@ System.out.println(regression.getSlopeStdErr());
|
||||||
</subsection>
|
</subsection>
|
||||||
<subsection name="1.5 Multiple linear regression">
|
<subsection name="1.5 Multiple linear regression">
|
||||||
<p>
|
<p>
|
||||||
<a href="../apidocs/org/apache/commons/math/stat/regression/MultipleLinearRegression.html">
|
<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
|
||||||
MultipleLinearRegression</a> provides ordinary least squares regression
|
OLSMultipleLinearRegression</a> and
|
||||||
with a generic multiple variable linear model, which in matrix notation
|
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
|
||||||
can be expressed as:
|
GLSMultipleLinearRegression</a> provide least squares regression to fit the linear model:
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
<code> y=X*b+u </code>
|
<code> Y=X*b+u </code>
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
where y is an <code>n-vector</code> <b>regressand</b>, X is a <code>[n,k]</code> matrix whose <code>k</code> columns are called
|
where Y is an n-vector <b>regressand</b>, X is a [n,k] matrix whose k columns are called
|
||||||
<b>regressors</b>, b is <code>k-vector</code> of <b>regression parameters</b> and <code>u</code> is an <code>n-vector</code>
|
<b>regressors</b>, b is a k-vector of <b>regression parameters</b> and u is an n-vector
|
||||||
of <b>error terms</b> or <b>residuals</b>. The notation is quite standard in literature,
|
of <b>error terms</b> or <b>residuals</b>.
|
||||||
cf eg <a href="http://www.econ.queensu.ca/ETM">Davidson and MacKinnon, Econometrics Theory and Methods, 2004</a>.
|
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Two implementations are provided: <a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
|
<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html">
|
||||||
OLSMultipleLinearRegression</a> and
|
OLSMultipleLinearRegression</a> provides Ordinary Least Squares Regression, and
|
||||||
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
|
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html">
|
||||||
GLSMultipleLinearRegression</a>
|
GLSMultipleLinearRegression</a> implements Generalized Least Squares. See the javadoc for these
|
||||||
|
classes for details on the algorithms and formulas used.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
Observations (x,y and covariance data matrices) can be added to the model via the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method.
|
Data for OLS models can be loaded in a single double[] array, consisting of concatenated rows of data, each containing
|
||||||
The observations are stored in memory until the next time the addData method is invoked.
|
the regressand (Y) value, followed by regressor values; or using a double[][] array with rows corresponding to
|
||||||
|
observations. GLS models also require a double[][] array representing the covariance matrix of the error terms. See
|
||||||
|
<a href="../apidocs/org/apache/commons/math/stat/regression/AbstractMultipleLinearRegression.html#newSampleData(double[], int, int)">
|
||||||
|
AbstractMultipleLinearRegression#newSampleData(double[],int,int)</a>,
|
||||||
|
<a href="../apidocs/org/apache/commons/math/stat/regression/OLSMultipleLinearRegression.html#newSampleData(double[], double[][])">
|
||||||
|
OLSMultipleLinearRegression#newSampleData(double[], double[][])</a> and
|
||||||
|
<a href="../apidocs/org/apache/commons/math/stat/regression/GLSMultipleLinearRegression.html#newSampleData(double[], double[][], double[][])">
|
||||||
|
GLSMultipleLinearRegression#newSampleData(double[],double[][],double[][])</a> for details.
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
<strong>Usage Notes</strong>: <ul>
|
<strong>Usage Notes</strong>: <ul>
|
||||||
<li> Data is validated when invoking the <code>addData(double[] y, double[][] x, double[][] covariance)</code> method and
|
<li> Data are validated when invoking any of the newSample, newX, newY or newCovariance methods and
|
||||||
<code>IllegalArgumentException</code> is thrown when inappropriate.
|
<code>IllegalArgumentException</code> is thrown when input data arrays do not have matching dimensions
|
||||||
|
or do not contain sufficient data to estimate the model.
|
||||||
</li>
|
</li>
|
||||||
<li> Only the GLS regressions require the covariance matrix, so in the OLS regression it is ignored and can be safely
|
<li> By default, regression models are estimated with intercept terms. In the notation above, this implies that the
|
||||||
inputted as <code>null</code>.</li>
|
X matrix contains an initial column identically equal to 1. X data supplied to the newX or newSample methods should not
|
||||||
|
include this column - the data loading methods will create it automatically. To estimate a model without an intercept
|
||||||
|
term, set the <code>noIntercept</code> property to <code>true</code>.</li>
|
||||||
</ul>
|
</ul>
|
||||||
</p>
|
</p>
|
||||||
<p>
|
<p>
|
||||||
|
@@ -511,44 +521,48 @@ System.out.println(regression.getSlopeStdErr());
|
||||||
<dl>
|
<dl>
|
||||||
<dt>OLS regression</dt>
|
<dt>OLS regression</dt>
|
||||||
<br></br>
|
<br></br>
|
||||||
<dd>Instantiate an OLS regression object and load dataset
|
<dd>Instantiate an OLS regression object and load a dataset:
|
||||||
<source>
|
<source>
|
||||||
MultipleLinearRegression regression = new OLSMultipleLinearRegression();
|
OLSMultipleLinearRegression regression = new OLSMultipleLinearRegression();
|
||||||
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
|
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
|
||||||
double[] x = new double[6][];
|
double[][] x = new double[6][];
|
||||||
x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
|
x[0] = new double[]{0, 0, 0, 0, 0};
|
||||||
x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
|
x[1] = new double[]{2.0, 0, 0, 0, 0};
|
||||||
x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
|
x[2] = new double[]{0, 3.0, 0, 0, 0};
|
||||||
x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
|
x[3] = new double[]{0, 0, 4.0, 0, 0};
|
||||||
x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
|
x[4] = new double[]{0, 0, 0, 5.0, 0};
|
||||||
x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
|
x[5] = new double[]{0, 0, 0, 0, 6.0};
|
||||||
regression.addData(y, x, null); // we don't need covariance
|
regression.newSampleData(y, x);
|
||||||
</source>
|
</source>
|
||||||
</dd>
|
</dd>
|
||||||
<dd>Estimate of regression values honours the <code>MultipleLinearRegression</code> interface:
|
<dd>Get regression parameters and diagnostics:
|
||||||
<source>
|
<source>
|
||||||
double[] beta = regression.estimateRegressionParameters();
|
double[] beta = regression.estimateRegressionParameters();
|
||||||
|
|
||||||
double[] residuals = regression.estimateResiduals();
|
double[] residuals = regression.estimateResiduals();
|
||||||
|
|
||||||
double[][] parametersVariance = regression.estimateRegressionParametersVariance();
|
double[][] parametersVariance = regression.estimateRegressionParametersVariance();
|
||||||
|
|
||||||
double regressandVariance = regression.estimateRegressandVariance();
|
double regressandVariance = regression.estimateRegressandVariance();
|
||||||
|
|
||||||
|
double rSquared = regression.calculateRSquared();
|
||||||
|
|
||||||
|
double sigma = regression.estimateRegressionStandardError();
|
||||||
</source>
|
</source>
|
||||||
</dd>
|
</dd>
|
||||||
<dt>GLS regression</dt>
|
<dt>GLS regression</dt>
|
||||||
<br></br>
|
<br></br>
|
||||||
<dd>Instantiate an GLS regression object and load dataset
|
<dd>Instantiate a GLS regression object and load a dataset:
|
||||||
<source>
|
<source>
|
||||||
MultipleLinearRegression regression = new GLSMultipleLinearRegression();
|
GLSMultipleLinearRegression regression = new GLSMultipleLinearRegression();
|
||||||
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
|
double[] y = new double[]{11.0, 12.0, 13.0, 14.0, 15.0, 16.0};
|
||||||
double[] x = new double[6][];
|
double[][] x = new double[6][];
|
||||||
x[0] = new double[]{1.0, 0, 0, 0, 0, 0};
|
x[0] = new double[]{0, 0, 0, 0, 0};
|
||||||
x[1] = new double[]{1.0, 2.0, 0, 0, 0, 0};
|
x[1] = new double[]{2.0, 0, 0, 0, 0};
|
||||||
x[2] = new double[]{1.0, 0, 3.0, 0, 0, 0};
|
x[2] = new double[]{0, 3.0, 0, 0, 0};
|
||||||
x[3] = new double[]{1.0, 0, 0, 4.0, 0, 0};
|
x[3] = new double[]{0, 0, 4.0, 0, 0};
|
||||||
x[4] = new double[]{1.0, 0, 0, 0, 5.0, 0};
|
x[4] = new double[]{0, 0, 0, 5.0, 0};
|
||||||
x[5] = new double[]{1.0, 0, 0, 0, 0, 6.0};
|
x[5] = new double[]{0, 0, 0, 0, 6.0};
|
||||||
double[][] omega = new double[6][];
|
double[][] omega = new double[6][];
|
||||||
omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
|
omega[0] = new double[]{1.1, 0, 0, 0, 0, 0};
|
||||||
omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
|
omega[1] = new double[]{0, 2.2, 0, 0, 0, 0};
|
||||||
|
@@ -556,12 +570,9 @@ omega[2] = new double[]{0, 0, 3.3, 0, 0, 0};
|
||||||
omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
|
omega[3] = new double[]{0, 0, 0, 4.4, 0, 0};
|
||||||
omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
|
omega[4] = new double[]{0, 0, 0, 0, 5.5, 0};
|
||||||
omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
|
omega[5] = new double[]{0, 0, 0, 0, 0, 6.6};
|
||||||
regression.addData(y, x, omega); // we do need covariance
|
regression.newSampleData(y, x, omega);
|
||||||
</source>
|
</source>
|
||||||
</dd>
|
</dd>
|
||||||
<dd>Estimate of regression values honours the same <code>MultipleLinearRegression</code> interface as
|
|
||||||
the OLS regression.
|
|
||||||
</dd>
|
|
||||||
</dl>
|
</dl>
|
||||||
</p>
|
</p>
|
||||||
</subsection>
|
</subsection>
|
||||||
|
|
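The revised text describes two conventions that are easy to get wrong: the flat data layout (each row is the Y value followed by the regressor values) and the intercept column of ones that the loading methods add automatically. The sketch below illustrates both on the OLS example data from this change. It is a standalone illustration, not Commons Math code: the class `OlsSketch` and its methods `designMatrix`, `regressand` and `fit` are hypothetical names, and the solver uses the plain normal equations with Gaussian elimination, which is an assumption made for brevity and not necessarily what the library does internally.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of Commons Math): OLS via the normal equations. */
final class OlsSketch {

    /** Unpacks flat rows {y, x1, ..., x_nvars} into X, prepending a column of ones. */
    static double[][] designMatrix(double[] data, int nobs, int nvars) {
        double[][] x = new double[nobs][nvars + 1];
        for (int i = 0; i < nobs; i++) {
            x[i][0] = 1.0; // intercept column, created here rather than supplied by the caller
            for (int j = 0; j < nvars; j++) {
                x[i][j + 1] = data[i * (nvars + 1) + 1 + j];
            }
        }
        return x;
    }

    /** Extracts the regressand: the first value of each flat row. */
    static double[] regressand(double[] data, int nobs, int nvars) {
        double[] y = new double[nobs];
        for (int i = 0; i < nobs; i++) {
            y[i] = data[i * (nvars + 1)];
        }
        return y;
    }

    /** Solves (X'X) b = X'y by Gaussian elimination with partial pivoting. */
    static double[] fit(double[][] x, double[] y) {
        int n = x.length;
        int k = x[0].length;
        double[][] a = new double[k][k + 1]; // augmented matrix [X'X | X'y]
        for (int r = 0; r < k; r++) {
            for (int c = 0; c < k; c++) {
                for (int i = 0; i < n; i++) {
                    a[r][c] += x[i][r] * x[i][c];
                }
            }
            for (int i = 0; i < n; i++) {
                a[r][k] += x[i][r] * y[i];
            }
        }
        for (int p = 0; p < k; p++) {
            int best = p; // row with the largest pivot candidate in column p
            for (int r = p + 1; r < k; r++) {
                if (Math.abs(a[r][p]) > Math.abs(a[best][p])) {
                    best = r;
                }
            }
            double[] tmp = a[p];
            a[p] = a[best];
            a[best] = tmp;
            if (Math.abs(a[p][p]) < 1e-12) {
                throw new IllegalArgumentException("X'X is singular");
            }
            for (int r = p + 1; r < k; r++) {
                double f = a[r][p] / a[p][p];
                for (int c = p; c <= k; c++) {
                    a[r][c] -= f * a[p][c];
                }
            }
        }
        double[] b = new double[k]; // back substitution
        for (int r = k - 1; r >= 0; r--) {
            double s = a[r][k];
            for (int c = r + 1; c < k; c++) {
                s -= a[r][c] * b[c];
            }
            b[r] = s / a[r][r];
        }
        return b;
    }

    public static void main(String[] args) {
        // Same observations as the OLS example, in flat layout: y first, then 5 regressors.
        double[] data = {
            11.0, 0, 0, 0, 0, 0,
            12.0, 2.0, 0, 0, 0, 0,
            13.0, 0, 3.0, 0, 0, 0,
            14.0, 0, 0, 4.0, 0, 0,
            15.0, 0, 0, 0, 5.0, 0,
            16.0, 0, 0, 0, 0, 6.0
        };
        double[] beta = fit(designMatrix(data, 6, 5), regressand(data, 6, 5));
        System.out.println(Arrays.toString(beta));
    }
}
```

With the intercept column included, the design matrix for this dataset is square and invertible, so the fit is exact: the intercept comes out at approximately 11 and the remaining coefficients at approximately 1/2, 2/3, 3/4, 4/5 and 5/6. This also shows why the leading 1.0 entries were dropped from the `x` rows in the example: supplying them as well would duplicate the intercept column.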