Removed boolean equalVariances flag from t-test API.

git-svn-id: https://svn.apache.org/repos/asf/jakarta/commons/proper/math/trunk@141418 13f79535-47bb-0310-9956-ffa450edef68
Phil Steitz 2004-08-02 04:20:09 +00:00
parent b134bf41f7
commit 77b718485c
4 changed files with 786 additions and 365 deletions
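
For callers migrating off the removed flag, the mapping from the old overloads to the new method names can be sketched as follows. This is a minimal illustration, not code from the commit: `MigrationSketch`, `TwoSampleT`, and `legacyT` are hypothetical names, and the nested interface is a stand-in that mirrors only the two array-based statistic signatures of `TTest` (without the declared exceptions).

```java
// Hypothetical migration helper; none of these names exist in Commons Math.
final class MigrationSketch {

    /** Stand-in for the two array-based statistic methods of the new TTest API. */
    interface TwoSampleT {
        /** t-statistic without the equal variances assumption. */
        double t(double[] sample1, double[] sample2);

        /** t-statistic under the equal variances assumption (pooled variance). */
        double homoscedasticT(double[] sample1, double[] sample2);
    }

    private MigrationSketch() { }

    /**
     * Reproduces the old t(sample1, sample2, equalVariances) call shape by
     * dispatching to the method that replaced each value of the flag.
     */
    static double legacyT(TwoSampleT test, double[] sample1, double[] sample2,
                          boolean equalVariances) {
        return equalVariances
            ? test.homoscedasticT(sample1, sample2)
            : test.t(sample1, sample2);
    }
}
```

The same pattern applies to the test methods in this diff: an old `tTest(..., true)` call corresponds to `homoscedasticTTest(...)`, and an old `tTest(..., false)` call corresponds to `tTest(...)` without the flag.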


@ -20,12 +20,30 @@ import org.apache.commons.math.stat.univariate.StatisticalSummary;
/**
* An interface for Student's t-tests.
* <p>
* Tests can be:<ul>
* <li>One-sample or two-sample</li>
* <li>One-sided or two-sided</li>
* <li>Paired or unpaired (for two-sample tests)</li>
* <li>Homoscedastic (equal variance assumption) or heteroscedastic
* (for two-sample tests)</li>
* <li>Fixed significance level (boolean-valued) or returning p-values.
* </li></ul>
* <p>
* Test statistics are available for all tests. Methods including "Test" in
* their names perform tests; all other methods return t-statistics. Among
* the "Test" methods, <code>double-</code>valued methods return p-values;
* <code>boolean-</code>valued methods perform fixed significance level tests.
* Significance levels are always specified as numbers between 0 and 0.5
* (e.g. tests at the 95% level use <code>alpha=0.05</code>).
* <p>
* Input to tests can be either <code>double[]</code> arrays or
* {@link StatisticalSummary} instances.
*
*
* @version $Revision: 1.6 $ $Date: 2004/06/23 16:26:14 $
* @version $Revision: 1.7 $ $Date: 2004/08/02 04:20:08 $
*/
public interface TTest {
/**
* Computes a paired, 2-sample t-statistic based on the data in the input
* arrays. The t-statistic returned is equivalent to what would be returned by
@ -46,13 +64,11 @@ public interface TTest {
* @throws MathException if the statistic cannot be computed due to a
* convergence or other numerical error.
*/
double pairedT(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException;
public abstract double pairedT(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a paired, two-sample, two-tailed t-test
* <i> p-value</i>, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
* <p>
* The number returned is the smallest significance level
@ -83,11 +99,10 @@ public interface TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
double pairedTTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException;
public abstract double pairedTTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException;
/**
* Performs a paired t-test</a> evaluating the null hypothesis that the
* Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between <code>sample1</code> and
* <code>sample2</code> is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@ -118,9 +133,11 @@ public interface TTest {
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
boolean pairedTTest(double[] sample1, double[] sample2, double alpha)
throws IllegalArgumentException, MathException;
public abstract boolean pairedTTest(
double[] sample1,
double[] sample2,
double alpha)
throws IllegalArgumentException, MathException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula">
* t statistic </a> given observed values and a comparison constant.
@ -136,9 +153,8 @@ public interface TTest {
* @return t statistic
* @throws IllegalArgumentException if input array length is less than 2
*/
double t(double mu, double[] observed)
throws IllegalArgumentException;
public abstract double t(double mu, double[] observed)
throws IllegalArgumentException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc22.htm#formula">
* t statistic </a> to use in comparing the mean of the dataset described by
@ -155,19 +171,19 @@ public interface TTest {
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
double t(double mu, StatisticalSummary sampleStats)
throws IllegalArgumentException;
public abstract double t(double mu, StatisticalSummary sampleStats)
throws IllegalArgumentException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* 2-sample t statistic. </a>
* Computes a 2-sample t statistic, under the hypothesis of equal
* subpopulation variances. To compute a t-statistic without the
* equal variances hypothesis, use {@link #t(double[], double[])}.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* This statistic can be used to perform a (homoscedastic) two-sample
* t-test to compare sample means.
* <p>
* If <code>equalVariances</code> is <code>true</code>, the t-statistic is
* The t-statistic is
* <p>
* (1) &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
@ -181,52 +197,67 @@ public interface TTest {
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
* If <code>equalVariances</code> is <code>false</code>, the t-statistic is
* <p>
* (2) &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if the statistic cannot be computed due to a
* convergence or other numerical error.
*/
double t(double[] sample1, double[] sample2, boolean equalVariances)
throws IllegalArgumentException, MathException;
public abstract double homoscedasticT(double[] sample1, double[] sample2)
throws IllegalArgumentException;
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* 2-sample t statistic </a>, comparing the means of the datasets described
* by two {@link StatisticalSummary} instances.
* Computes a 2-sample t statistic, without the hypothesis of equal
* subpopulation variances. To compute a t-statistic assuming equal
* variances, use {@link #homoscedasticT(double[], double[])}.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
* If <code>equalVariances</code> is <code>true</code>, the t-statistic is
* The t-statistic is
* <p>
* (1) &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
* <strong><code> m2</code></strong> is the mean of second sample</li>
* </ul>
* and <strong><code>var</code></strong> is the pooled variance estimate:
* where <strong><code>n1</code></strong> is the size of the first sample;
* <strong><code> n2</code></strong> is the size of the second sample;
* <strong><code> m1</code></strong> is the mean of the first sample;
* <strong><code> m2</code></strong> is the mean of the second sample;
* <strong><code> var1</code></strong> is the variance of the first sample;
* <strong><code> var2</code></strong> is the variance of the second sample;
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
* <p>
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public abstract double t(double[] sample1, double[] sample2)
throws IllegalArgumentException;
/**
* Computes a 2-sample t statistic, comparing the means of the datasets
* described by two {@link StatisticalSummary} instances, without the
* assumption of equal subpopulation variances. Use
* {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
* compute a t-statistic under the equal variances assumption.
* <p>
* If <code>equalVariances</code> is <code>false</code>, the t-statistic is
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
* (2) &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* The returned t-statistic is
* <p>
* &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* where <strong><code>n1</code></strong> is the size of the first sample;
* <strong><code> n2</code></strong> is the size of the second sample;
* <strong><code> m1</code></strong> is the mean of the first sample;
* <strong><code> m2</code></strong> is the mean of the second sample;
* <strong><code> var1</code></strong> is the variance of the first sample;
* <strong><code> var2</code></strong> is the variance of the second sample.
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
@ -235,18 +266,55 @@ public interface TTest {
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
boolean equalVariances)
throws IllegalArgumentException;
public abstract double t(
StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException;
/**
* Computes a 2-sample t statistic, comparing the means of the datasets
* described by two {@link StatisticalSummary} instances, under the
* assumption of equal subpopulation variances. To compute a t-statistic
* without the equal variances assumption, use
* {@link #t(StatisticalSummary, StatisticalSummary)}.
* <p>
* This statistic can be used to perform a (homoscedastic) two-sample
* t-test to compare sample means.
* <p>
* The t-statistic returned is
* <p>
* &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
* <strong><code> m2</code></strong> is the mean of second sample
* and <strong><code>var</code></strong> is the pooled variance estimate:
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
* <p>
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
* at least 2 observations.
* </li></ul>
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public abstract double homoscedasticT(
StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException;
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a one-sample, two-tailed t-test
* <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the input array with the constant <code>mu</code>.
* <p>
* The number returned is the smallest significance level
@ -270,13 +338,12 @@ public interface TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
double tTest(double mu, double[] sample)
throws IllegalArgumentException, MathException;
public abstract double tTest(double mu, double[] sample)
throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
* which <code>sample</code> is drawn equals <code>mu</code>.
* which <code>sample</code> is drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
@ -308,13 +375,11 @@ public interface TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
boolean tTest(double mu, double[] sample, double alpha)
throws IllegalArgumentException, MathException;
public abstract boolean tTest(double mu, double[] sample, double alpha)
throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a one-sample, two-tailed t-test
* <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by <code>sampleStats</code>
* with the constant <code>mu</code>.
* <p>
@ -327,7 +392,8 @@ public interface TTest {
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The sample must contain at least 2 observations.
@ -339,17 +405,17 @@ public interface TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
double tTest(double mu, StatisticalSummary sampleStats)
throws IllegalArgumentException, MathException;
public abstract double tTest(double mu, StatisticalSummary sampleStats)
throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
* which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
* two-sided t-test</a> evaluating the null hypothesis that the mean of the
* population from which the dataset described by <code>stats</code> is
* drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* Returns <code>true</code> iff the null hypothesis can be rejected with
* confidence <code>1 - alpha</code>. To perform a 1-sided test, use
* <code>alpha / 2.</code>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
@ -377,13 +443,14 @@ public interface TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
boolean tTest(double mu, StatisticalSummary sampleStats, double alpha)
throws IllegalArgumentException, MathException;
public abstract boolean tTest(
double mu,
StatisticalSummary sampleStats,
double alpha)
throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a two-sample, two-tailed t-test
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* <p>
* The number returned is the smallest significance level
@ -391,19 +458,15 @@ public interface TTest {
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* sample data to compute the p-value. The t-statistic used is as defined in
* {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
* to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* here.</a> To perform the test under the assumption of equal subpopulation
* variances, use {@link #homoscedasticTTest(double[], double[])}.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@ -417,47 +480,78 @@ public interface TTest {
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
double tTest(double[] sample1, double[] sample2, boolean equalVariances)
throws IllegalArgumentException, MathException;
public abstract double tTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* Returns the <i>observed significance level</i>, or
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays, under the assumption that
* the two samples are drawn from subpopulations with equal variances.
* To perform the test without the equal variances assumption, use
* {@link #tTest(double[], double[])}.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* A pooled variance estimate is used to compute the t-statistic. See
* {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
* minus 2 is used as the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public abstract double homoscedasticTTest(
double[] sample1,
double[] sample2)
throws IllegalArgumentException, MathException;
/**
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
* with significance level <code>alpha</code>.
* with significance level <code>alpha</code>. This test does not assume
* that the subpopulation variances are equal. To perform the test assuming
* equal variances, use
* {@link #homoscedasticTTest(double[], double[], double)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* See {@link #t(double[], double[])} for the formula used to compute the
* t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level, under the assumption of equal subpopulation variances,
* use <br><code>tTest(sample1, sample2, 0.05, true) </code>
* the 95% level, use
* <br><code>tTest(sample1, sample2, 0.05). </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
* at the 99% level without assuming equal variances, first verify that the measured
* mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
* at the 99% level, first verify that the measured mean of
* <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use <br><code>tTest(sample1, sample2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@ -475,40 +569,126 @@ public interface TTest {
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
* @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
boolean tTest(double[] sample1, double[] sample2, double alpha,
boolean equalVariances)
throws IllegalArgumentException, MathException;
public abstract boolean tTest(
double[] sample1,
double[] sample2,
double alpha)
throws IllegalArgumentException, MathException;
/**
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
* with significance level <code>alpha</code>, assuming that the
* subpopulation variances are equal. Use
* {@link #tTest(double[], double[], double)} to perform the test without
* the assumption of equal variances.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2.</code>
* <p>
* A pooled variance estimate is used to compute the t-statistic. See
* {@link #homoscedasticT(double[], double[])} for the formula. The sum of
* the sample sizes minus 2 is used as the degrees of freedom.
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level, use
* <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2, </code>
* at the 99% level, first verify that the measured mean of
* <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use
* <br><code>homoscedasticTTest(sample1, sample2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li>
* <li> <code> 0 < alpha < 0.5 </code>
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
public abstract boolean homoscedasticTTest(
double[] sample1,
double[] sample2,
double alpha)
throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two Univariates.
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two StatisticalSummary
* instances.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* sample data to compute the p-value. To perform the test assuming
* equal variances, use
* {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
* at least 2 observations.
* </li></ul>
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public abstract double tTest(
StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException;
/**
* Returns the <i>observed significance level</i>, or
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two StatisticalSummary
* instances, under the hypothesis of equal subpopulation variances. To
* perform a test without the equal variances assumption, use
* {@link #tTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* See {@link #homoscedasticT(double[], double[])} for the formula used to
* compute the t-statistic. The sum of the sample sizes minus 2 is used as
* the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@ -522,49 +702,44 @@ public interface TTest {
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
boolean equalVariances)
throws IllegalArgumentException, MathException;
public abstract double homoscedasticTTest(
StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException;
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
* and <code>sampleStats2</code> describe datasets drawn from populations with the
* same mean, with significance level <code>alpha</code>.
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that
* <code>sampleStats1</code> and <code>sampleStats2</code> describe
* datasets drawn from populations with the same mean, with significance
* level <code>alpha</code>. This test does not assume that the
* subpopulation variances are equal. To perform the test under the equal
* variances assumption, use
* {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* See {@link #t(double[], double[])} for the formula used to compute the
* t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level under the assumption of equal subpopulation variances, use
* <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
* the 95% level, use
* <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
* at the 99% level without assuming that subpopulation variances are equal,
* first verify that the measured mean of <code>sample 1</code> is less than
* the mean of <code>sample 2</code> and then use
* <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
* at the 99% level, first verify that the measured mean of
* <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use
* <br><code>tTest(sampleStats1, sampleStats2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@ -583,13 +758,14 @@ public interface TTest {
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
* @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
double alpha, boolean equalVariances)
throws IllegalArgumentException, MathException;
}
public abstract boolean tTest(
StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2,
double alpha)
throws IllegalArgumentException, MathException;
}
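
The two statistics that this interface now separates by method name (t versus homoscedasticT) can be sketched without commons-math. This is only an illustration under stated assumptions: the class and method names below (TwoSampleTSketch, welchT, pooledT, welchDf, pooledDf) are hypothetical and not part of the library, the inputs are summary statistics rather than arrays, and the degrees-of-freedom formula is the standard Welch-Satterthwaite form rather than code taken from this commit. The pooled variance is used without an extra square root, since the statistic already takes sqrt(var).

```java
/**
 * Sketch only: hypothetical helper, not part of commons-math.
 * Inputs are summary statistics: means m1, m2; sample variances v1, v2;
 * sample sizes n1, n2 (each assumed to be at least 2).
 */
final class TwoSampleTSketch {

    private TwoSampleTSketch() { }

    /** Heteroscedastic statistic: t = (m1 - m2) / sqrt(v1/n1 + v2/n2). */
    static double welchT(double m1, double m2, double v1, double v2,
                         double n1, double n2) {
        return (m1 - m2) / Math.sqrt(v1 / n1 + v2 / n2);
    }

    /** Pooled variance estimate used under the equal variances assumption. */
    static double pooledVariance(double v1, double v2, double n1, double n2) {
        return ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
    }

    /** Homoscedastic statistic: t = (m1 - m2) / (sqrt(1/n1 + 1/n2) sqrt(var)). */
    static double pooledT(double m1, double m2, double v1, double v2,
                          double n1, double n2) {
        double var = pooledVariance(v1, v2, n1, n2);
        return (m1 - m2) / (Math.sqrt(1 / n1 + 1 / n2) * Math.sqrt(var));
    }

    /** Welch-Satterthwaite approximation to the degrees of freedom. */
    static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return (a + b) * (a + b) / (a * a / (n1 - 1) + b * b / (n2 - 1));
    }

    /** Degrees of freedom for the pooled test: n1 + n2 - 2. */
    static double pooledDf(double n1, double n2) {
        return n1 + n2 - 2;
    }
}
```

When the two sample sizes are equal, welchT and pooledT return the same value and only the degrees of freedom differ, which is why the choice between the two test families mainly matters for unbalanced samples with unequal variances.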

View File

@ -23,8 +23,11 @@ import org.apache.commons.math.stat.univariate.StatisticalSummary;
/**
* Implements t-test statistics defined in the {@link TTest} interface.
* <p>
* Uses commons-math {@link org.apache.commons.math.distribution.TDistribution}
* implementation to estimate exact p-values.
*
* @version $Revision: 1.8 $ $Date: 2004/06/23 16:26:14 $
* @version $Revision: 1.9 $ $Date: 2004/08/02 04:20:08 $
*/
public class TTestImpl implements TTest {
@ -72,8 +75,7 @@ public class TTestImpl implements TTest {
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a paired, two-sample, two-tailed t-test
* <i> p-value</i>, associated with a paired, two-sample, two-tailed t-test
* based on the data in the input arrays.
* <p>
* The number returned is the smallest significance level
@ -113,7 +115,7 @@ public class TTestImpl implements TTest {
}
/**
* Performs a paired t-test</a> evaluating the null hypothesis that the
* Performs a paired t-test evaluating the null hypothesis that the
* mean of the paired differences between <code>sample1</code> and
* <code>sample2</code> is 0 in favor of the two-sided alternative that the
* mean paired difference is not equal to 0, with significance level
@ -172,7 +174,8 @@ public class TTestImpl implements TTest {
if ((observed == null) || (observed.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return t(StatUtils.mean(observed), mu, StatUtils.variance(observed), observed.length);
return t(StatUtils.mean(observed), mu, StatUtils.variance(observed),
observed.length);
}
/**
@ -196,19 +199,21 @@ public class TTestImpl implements TTest {
if ((sampleStats == null) || (sampleStats.getN() < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return t(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
return t(sampleStats.getMean(), mu, sampleStats.getVariance(),
sampleStats.getN());
}
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* 2-sample t statistic. </a>
* Computes a 2-sample t statistic, under the hypothesis of equal
* subpopulation variances. To compute a t-statistic without the
* equal variances hypothesis, use {@link #t(double[], double[])}.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* This statistic can be used to perform a (homoscedastic) two-sample
* t-test to compare sample means.
* <p>
* If <code>equalVariances</code> is <code>true</code>, the t-statistic is
* The t-statistic is
* <p>
* (1) &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
@ -222,58 +227,85 @@ public class TTestImpl implements TTest {
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
* If <code>equalVariances</code> is <code>false</code>, the t-statistic is
* <p>
* (2) &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public double t(double[] sample1, double[] sample2, boolean equalVariances)
public double homoscedasticT(double[] sample1, double[] sample2)
throws IllegalArgumentException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return t(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
StatUtils.variance(sample2), (double) sample1.length,
(double) sample2.length, equalVariances);
return homoscedasticT(StatUtils.mean(sample1), StatUtils.mean(sample2),
StatUtils.variance(sample1), StatUtils.variance(sample2),
(double) sample1.length, (double) sample2.length);
}
/**
* Computes a <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* 2-sample t statistic </a>, comparing the means of the datasets described
* by two {@link StatisticalSummary} instances.
* Computes a 2-sample t statistic, without the hypothesis of equal
* subpopulation variances. To compute a t-statistic assuming equal
* variances, use {@link #homoscedasticT(double[], double[])}.
* <p>
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
* If <code>equalVariances</code> is <code>true</code>, the t-statistic is
* The t-statistic is
* <p>
* (1) &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
* <strong><code> m2</code></strong> is the mean of second sample</li>
* </ul>
* and <strong><code>var</code></strong> is the pooled variance estimate:
* where <strong><code>n1</code></strong> is the size of the first sample;
* <strong><code> n2</code></strong> is the size of the second sample;
* <strong><code> m1</code></strong> is the mean of the first sample;
* <strong><code> m2</code></strong> is the mean of the second sample;
* <strong><code> var1</code></strong> is the variance of the first sample;
* <strong><code> var2</code></strong> is the variance of the second sample;
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
* <p>
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public double t(double[] sample1, double[] sample2)
throws IllegalArgumentException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return t(StatUtils.mean(sample1), StatUtils.mean(sample2),
StatUtils.variance(sample1), StatUtils.variance(sample2),
(double) sample1.length, (double) sample2.length);
}
/**
* Computes a 2-sample t statistic, comparing the means of the datasets
* described by two {@link StatisticalSummary} instances, without the
* assumption of equal subpopulation variances. Use
* {@link #homoscedasticT(StatisticalSummary, StatisticalSummary)} to
* compute a t-statistic under the equal variances assumption.
* <p>
* If <code>equalVariances</code> is <code>false</code>, the t-statistic is
* This statistic can be used to perform a two-sample t-test to compare
* sample means.
* <p>
* (2) &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* The returned t-statistic is
* <p>
* &nbsp;&nbsp; <code> t = (m1 - m2) / sqrt(var1/n1 + var2/n2)</code>
* <p>
* where <strong><code>n1</code></strong> is the size of the first sample;
* <strong><code> n2</code></strong> is the size of the second sample;
* <strong><code> m1</code></strong> is the mean of the first sample;
* <strong><code> m2</code></strong> is the mean of the second sample
* <strong><code> var1</code></strong> is the variance of the first sample;
* <strong><code> var2</code></strong> is the variance of the second sample
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
@ -282,27 +314,73 @@ public class TTestImpl implements TTest {
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @param equalVariances are the sample variances assumed equal?
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public double t(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
boolean equalVariances)
public double t(StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException {
if ((sampleStats1 == null) ||
(sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return t(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
sampleStats2.getVariance(), (double) sampleStats1.getN(),
(double) sampleStats2.getN(), equalVariances);
return t(sampleStats1.getMean(), sampleStats2.getMean(),
sampleStats1.getVariance(), sampleStats2.getVariance(),
(double) sampleStats1.getN(), (double) sampleStats2.getN());
}
/**
* Computes a 2-sample t statistic, comparing the means of the datasets
* described by two {@link StatisticalSummary} instances, under the
* assumption of equal subpopulation variances. To compute a t-statistic
* without the equal variances assumption, use
* {@link #t(StatisticalSummary, StatisticalSummary)}.
* <p>
* This statistic can be used to perform a (homoscedastic) two-sample
* t-test to compare sample means.
* <p>
* The t-statistic returned is
* <p>
* &nbsp;&nbsp;<code> t = (m1 - m2) / (sqrt(1/n1 +1/n2) sqrt(var))</code>
* <p>
* where <strong><code>n1</code></strong> is the size of first sample;
* <strong><code> n2</code></strong> is the size of second sample;
* <strong><code> m1</code></strong> is the mean of first sample;
* <strong><code> m2</code></strong> is the mean of second sample
* and <strong><code>var</code></strong> is the pooled variance estimate:
* <p>
* <code>var = ((n1 - 1)var1 + (n2 - 1)var2) / ((n1-1) + (n2-1))</code>
* <p>
* with <strong><code>var1</code></strong> the variance of the first sample and
* <strong><code>var2</code></strong> the variance of the second sample.
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
* at least 2 observations.
* </li></ul>
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @return t statistic
* @throws IllegalArgumentException if the precondition is not met
*/
public double homoscedasticT(StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException {
if ((sampleStats1 == null) ||
(sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return homoscedasticT(sampleStats1.getMean(), sampleStats2.getMean(),
sampleStats1.getVariance(), sampleStats2.getVariance(),
(double) sampleStats1.getN(), (double) sampleStats2.getN());
}
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a one-sample, two-tailed t-test
* <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the input array with the constant <code>mu</code>.
* <p>
* The number returned is the smallest significance level
@ -331,13 +409,14 @@ public class TTestImpl implements TTest {
if ((sample == null) || (sample.length < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample), sample.length);
return tTest( StatUtils.mean(sample), mu, StatUtils.variance(sample),
sample.length);
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
* which <code>sample</code> is drawn equals <code>mu</code>.
* which <code>sample</code> is drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
@ -379,8 +458,7 @@ public class TTestImpl implements TTest {
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a one-sample, two-tailed t-test
* <i>p-value</i>, associated with a one-sample, two-tailed t-test
* comparing the mean of the dataset described by <code>sampleStats</code>
* with the constant <code>mu</code>.
* <p>
@ -393,7 +471,8 @@ public class TTestImpl implements TTest {
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">here</a>
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The sample must contain at least 2 observations.
@ -410,17 +489,19 @@ public class TTestImpl implements TTest {
if ((sampleStats == null) || (sampleStats.getN() < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(), sampleStats.getN());
return tTest(sampleStats.getMean(), mu, sampleStats.getVariance(),
sampleStats.getN());
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that the mean of the population from
* which the dataset described by <code>stats</code> is drawn equals <code>mu</code>.
* two-sided t-test</a> evaluating the null hypothesis that the mean of the
* population from which the dataset described by <code>stats</code> is
* drawn equals <code>mu</code>.
* <p>
* Returns <code>true</code> iff the null hypothesis can be
* rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* Returns <code>true</code> iff the null hypothesis can be rejected with
* confidence <code>1 - alpha</code>. To perform a 1-sided test, use
* <code>alpha / 2.</code>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>sample mean = mu </code> at
@ -448,7 +529,8 @@ public class TTestImpl implements TTest {
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public boolean tTest( double mu, StatisticalSummary sampleStats, double alpha)
public boolean tTest( double mu, StatisticalSummary sampleStats,
double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
@ -458,8 +540,7 @@ public class TTestImpl implements TTest {
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a two-sample, two-tailed t-test
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays.
* <p>
* The number returned is the smallest significance level
@ -467,19 +548,15 @@ public class TTestImpl implements TTest {
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* sample data to compute the p-value. The t-statistic used is as defined in
* {@link #t(double[], double[])} and the Welch-Satterthwaite approximation
* to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* here.</a> To perform the test under the assumption of equal subpopulation
* variances, use {@link #homoscedasticTTest(double[], double[])}.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@ -493,55 +570,96 @@ public class TTestImpl implements TTest {
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public double tTest(double[] sample1, double[] sample2, boolean equalVariances)
public double tTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data");
}
return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2), StatUtils.variance(sample1),
StatUtils.variance(sample2), (double) sample1.length,
(double) sample2.length, equalVariances);
return tTest(StatUtils.mean(sample1), StatUtils.mean(sample2),
StatUtils.variance(sample1), StatUtils.variance(sample2),
(double) sample1.length, (double) sample2.length);
}
/**
* Returns the <i>observed significance level</i>, or
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the input arrays, under the assumption that
* the two samples are drawn from subpopulations with equal variances.
* To perform the test without the equal variances assumption, use
* {@link #tTest(double[], double[])}.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* A pooled variance estimate is used to compute the t-statistic. See
* {@link #homoscedasticT(double[], double[])}. The sum of the sample sizes
* minus 2 is used as the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public double homoscedasticTTest(double[] sample1, double[] sample2)
throws IllegalArgumentException, MathException {
if ((sample1 == null) || (sample2 == null ||
Math.min(sample1.length, sample2.length) < 2)) {
throw new IllegalArgumentException("insufficient data");
}
return homoscedasticTTest(StatUtils.mean(sample1),
StatUtils.mean(sample2), StatUtils.variance(sample1),
StatUtils.variance(sample2), (double) sample1.length,
(double) sample2.length);
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
* with significance level <code>alpha</code>.
* with significance level <code>alpha</code>. This test does not assume
* that the subpopulation variances are equal. To perform the test assuming
* equal variances, use
* {@link #homoscedasticTTest(double[], double[], double)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* See {@link #t(double[], double[])} for the formula used to compute the
* t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level, under the assumption of equal subpopulation variances,
* use <br><code>tTest(sample1, sample2, 0.05, true) </code>
* the 95% level, use
* <br><code>tTest(sample1, sample2, 0.05). </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
* at the 99% level without assuming equal variances, first verify that the measured
* mean of <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use <br><code>tTest(sample1, sample2, 0.005, false) </code>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>,
* first verify that the measured mean of <code>sample 1</code> is less
* than the mean of <code>sample 2</code> and then use
* <br><code>tTest(sample1, sample2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@ -559,45 +677,141 @@ public class TTestImpl implements TTest {
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
* @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
public boolean tTest(double[] sample1, double[] sample2, double alpha,
boolean equalVariances)
public boolean tTest(double[] sample1, double[] sample2,
double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
return (tTest(sample1, sample2, equalVariances) < alpha);
return (tTest(sample1, sample2) < alpha);
}
/**
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sample1</code>
* and <code>sample2</code> are drawn from populations with the same mean,
* with significance level <code>alpha</code>, assuming that the
* subpopulation variances are equal. Use
* {@link #tTest(double[], double[], double)} to perform the test without
* the assumption of equal variances.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2.</code> To perform the test
* without the assumption of equal subpopulation variances, use
* {@link #tTest(double[], double[], double)}.
* <p>
* A pooled variance estimate is used to compute the t-statistic. See
* {@link #homoscedasticT(double[], double[])} for the formula. The sum of the sample
* sizes minus 2 is used as the degrees of freedom.
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level, use <br><code>homoscedasticTTest(sample1, sample2, 0.05). </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2, </code>
* at the 99% level, first verify that the measured mean of
* <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use
* <br><code>homoscedasticTTest(sample1, sample2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the test depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The observed array lengths must both be at least 2.
* </li>
* <li> <code> 0 < alpha < 0.5 </code>
* </li></ul>
*
* @param sample1 array of sample data values
* @param sample2 array of sample data values
* @param alpha significance level of the test
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
public boolean homoscedasticTTest(double[] sample1, double[] sample2,
double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
return (homoscedasticTTest(sample1, sample2) < alpha);
}
/**
* Returns the <i>observed significance level</i>, or
* <a href="http://www.cas.lancs.ac.uk/glossary_v1.1/hyptest.html#pvalue">
* p-value</a>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two Univariates.
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two StatisticalSummary
* instances.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying population variances are
* The test does not assume that the underlying population variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* sample data to compute the p-value. To perform the test assuming
* equal variances, use
* {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
* t-test procedure, as discussed
* <a href="http://www.basic.nwu.edu/statguidefiles/ttest_unpaired_ass_viol.html">
* here</a>
* <p>
* <strong>Preconditions</strong>: <ul>
* <li>The datasets described by the two StatisticalSummary instances must each contain
* at least 2 observations.
* </li></ul>
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException {
if ((sampleStats1 == null) || (sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
sampleStats2.getVariance(), (double) sampleStats1.getN(),
(double) sampleStats2.getN());
}
/**
* Returns the <i>observed significance level</i>, or
* <i>p-value</i>, associated with a two-sample, two-tailed t-test
* comparing the means of the datasets described by two StatisticalSummary
* instances, under the hypothesis of equal subpopulation variances. To
* perform a test without the equal variances assumption, use
* {@link #tTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* The number returned is the smallest significance level
* at which one can reject the null hypothesis that the two means are
* equal in favor of the two-sided alternative that they are different.
* For a one-sided test, divide the returned value by 2.
* <p>
* See {@link #homoscedasticT(double[], double[])} for the formula used to
* compute the t-statistic. The sum of the sample sizes minus 2 is used as
* the degrees of freedom.
* <p>
* <strong>Usage Note:</strong><br>
* The validity of the p-value depends on the assumptions of the parametric
@ -611,57 +825,53 @@ public class TTestImpl implements TTest {
*
* @param sampleStats1 StatisticalSummary describing data from the first sample
* @param sampleStats2 StatisticalSummary describing data from the second sample
* @param equalVariances are sample variances assumed to be equal?
* @return p-value for t-test
* @throws IllegalArgumentException if the precondition is not met
* @throws MathException if an error occurs computing the p-value
*/
public double tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
boolean equalVariances)
public double homoscedasticTTest(StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2)
throws IllegalArgumentException, MathException {
if ((sampleStats1 == null) || (sampleStats2 == null ||
Math.min(sampleStats1.getN(), sampleStats2.getN()) < 2)) {
throw new IllegalArgumentException("insufficient data for t statistic");
}
return tTest(sampleStats1.getMean(), sampleStats2.getMean(), sampleStats1.getVariance(),
return homoscedasticTTest(sampleStats1.getMean(),
sampleStats2.getMean(), sampleStats1.getVariance(),
sampleStats2.getVariance(), (double) sampleStats1.getN(),
(double) sampleStats2.getN(), equalVariances);
(double) sampleStats2.getN());
}
/**
* Performs a <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that <code>sampleStats1</code>
* and <code>sampleStats2</code> describe datasets drawn from populations with the
* same mean, with significance level <code>alpha</code>.
* Performs a
* <a href="http://www.itl.nist.gov/div898/handbook/eda/section3/eda353.htm">
* two-sided t-test</a> evaluating the null hypothesis that
* <code>sampleStats1</code> and <code>sampleStats2</code> describe
* datasets drawn from populations with the same mean, with significance
* level <code>alpha</code>. This test does not assume that the
* subpopulation variances are equal. To perform the test under the equal
* variances assumption, use
* {@link #homoscedasticTTest(StatisticalSummary, StatisticalSummary)}.
* <p>
* Returns <code>true</code> iff the null hypothesis that the means are
* equal can be rejected with confidence <code>1 - alpha</code>. To
* perform a 1-sided test, use <code>alpha / 2</code>
* <p>
* If the <code>equalVariances</code> parameter is <code>false,</code>
* the test does not assume that the underlying popuation variances are
* equal and it uses approximated degrees of freedom computed from the
* sample data to compute the p-value. In this case, formula (1) for the
* {@link #t(double[], double[], boolean)} statistic is used
* and the Welch-Satterthwaite approximation to the degrees of freedom is used,
* as described
* See {@link #t(double[], double[])} for the formula used to compute the
* t-statistic. Degrees of freedom are approximated using the
* <a href="http://www.itl.nist.gov/div898/handbook/prc/section3/prc31.htm">
* here.</a>
* <p>
* If <code>equalVariances</code> is <code>true</code>, a pooled variance
* estimate is used to compute the t-statistic (formula (2)) and the sum of the
* sample sizes minus 2 is used as the degrees of freedom.
* Welch-Satterthwaite approximation.</a>
* <p>
* <strong>Examples:</strong><br><ol>
* <li>To test the (2-sided) hypothesis <code>mean 1 = mean 2 </code> at
* the 95% level under the assumption of equal subpopulation variances, use
* <br><code>tTest(sampleStats1, sampleStats2, 0.05, true) </code>
* the 95% level, use
* <br><code>tTest(sampleStats1, sampleStats2, 0.05) </code>
* </li>
* <li>To test the (one-sided) hypothesis <code> mean 1 < mean 2 </code>
* at the 99% level without assuming that subpopulation variances are equal,
* first verify that the measured mean of <code>sample 1</code> is less than
* the mean of <code>sample 2</code> and then use
* <br><code>tTest(sampleStats1, sampleStats2, 0.005, false) </code>
* at the 99% level, first verify that the measured mean of
* <code>sample 1</code> is less than the mean of <code>sample 2</code>
* and then use
* <br><code>tTest(sampleStats1, sampleStats2, 0.005) </code>
* </li></ol>
* <p>
* <strong>Usage Note:</strong><br>
@ -680,19 +890,18 @@ public class TTestImpl implements TTest {
* @param sampleStats1 StatisticalSummary describing sample data values
* @param sampleStats2 StatisticalSummary describing sample data values
* @param alpha significance level of the test
* @param equalVariances are sample variances assumed to be equal?
* @return true if the null hypothesis can be rejected with
* confidence 1 - alpha
* @throws IllegalArgumentException if the preconditions are not met
* @throws MathException if an error occurs performing the test
*/
public boolean tTest(StatisticalSummary sampleStats1, StatisticalSummary sampleStats2,
double alpha, boolean equalVariances)
public boolean tTest(StatisticalSummary sampleStats1,
StatisticalSummary sampleStats2, double alpha)
throws IllegalArgumentException, MathException {
if ((alpha <= 0) || (alpha > 0.5)) {
throw new IllegalArgumentException("bad significance level: " + alpha);
}
return (tTest(sampleStats1, sampleStats2, equalVariances) < alpha);
return (tTest(sampleStats1, sampleStats2) < alpha);
}
//----------------------------------------------- Protected methods
@ -738,8 +947,8 @@ public class TTestImpl implements TTest {
/**
* Computes t test statistic for 2-sample t-test.
* If equalVariance is true, the pooled variance
* estimate is computed and used.
* <p>
* Does not assume that subpopulation variances are equal.
*
* @param m1 first sample mean
* @param m2 second sample mean
@ -747,17 +956,29 @@ public class TTestImpl implements TTest {
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
* @param equalVariances are variances assumed equal?
* @return t test statistic
*/
protected double t(double m1, double m2, double v1, double v2, double n1,
double n2, boolean equalVariances) {
if (equalVariances) {
double pooledVariance = ((n1 - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2);
return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
} else {
double n2) {
return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
}
}
/**
* Computes t test statistic for 2-sample t-test under the hypothesis
* of equal subpopulation variances.
*
* @param m1 first sample mean
* @param m2 second sample mean
* @param v1 first sample variance
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
* @return t test statistic
*/
protected double homoscedasticT(double m1, double m2, double v1,
double v2, double n1, double n2) {
double pooledVariance = ((n1 - 1) * v1 + (n2 -1) * v2 ) / (n1 + n2 - 2);
return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
}
/**
@ -780,8 +1001,9 @@ public class TTestImpl implements TTest {
/**
* Computes p-value for 2-sided, 2-sample t-test.
* If equalVariances is true, the sum of the sample sizes minus 2
* is used as df; otherwise df is approximated from the data.
* <p>
* Does not assume subpopulation variances are equal. Degrees of freedom
* are estimated from the data.
*
* @param m1 first sample mean
* @param m2 second sample mean
@ -789,20 +1011,41 @@ public class TTestImpl implements TTest {
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
* @param equalVariances are variances assumed equal?
* @return p-value
* @throws MathException if an error occurs computing the p-value
*/
protected double tTest(double m1, double m2, double v1, double v2,
double n1, double n2, boolean equalVariances)
double n1, double n2)
throws MathException {
double t = Math.abs(t(m1, m2, v1, v2, n1, n2, equalVariances));
double t = Math.abs(t(m1, m2, v1, v2, n1, n2));
double degreesOfFreedom = 0;
degreesOfFreedom= df(v1, v2, n1, n2);
TDistribution tDistribution =
getDistributionFactory().createTDistribution(degreesOfFreedom);
return 1.0 - tDistribution.cumulativeProbability(-t, t);
}
/**
* Computes p-value for 2-sided, 2-sample t-test, under the assumption
* of equal subpopulation variances.
* <p>
* The sum of the sample sizes minus 2 is used as degrees of freedom.
*
* @param m1 first sample mean
* @param m2 second sample mean
* @param v1 first sample variance
* @param v2 second sample variance
* @param n1 first sample n
* @param n2 second sample n
* @return p-value
* @throws MathException if an error occurs computing the p-value
*/
protected double homoscedasticTTest(double m1, double m2, double v1,
double v2, double n1, double n2)
throws MathException {
double t = Math.abs(homoscedasticT(m1, m2, v1, v2, n1, n2));
double degreesOfFreedom = 0;
if (equalVariances) {
degreesOfFreedom = (double) (n1 + n2 - 2);
} else {
degreesOfFreedom= df(v1, v2, n1, n2);
}
TDistribution tDistribution =
getDistributionFactory().createTDistribution(degreesOfFreedom);
return 1.0 - tDistribution.cumulativeProbability(-t, t);


@ -23,7 +23,7 @@ import org.apache.commons.math.stat.univariate.SummaryStatistics;
/**
* Test cases for the TTestImpl class.
*
* @version $Revision: 1.5 $ $Date: 2004/06/02 13:08:55 $
* @version $Revision: 1.6 $ $Date: 2004/08/02 04:20:09 $
*/
public final class TTestTest extends TestCase {
@ -166,73 +166,73 @@ public final class TTestTest extends TestCase {
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample heteroscedastic t stat", 1.603717,
testStatistic.t(sample1, sample2, false), 1E-6);
testStatistic.t(sample1, sample2), 1E-6);
assertEquals("two sample heteroscedastic t stat", 1.603717,
testStatistic.t(sampleStats1, sampleStats2, false), 1E-6);
testStatistic.t(sampleStats1, sampleStats2), 1E-6);
assertEquals("two sample heteroscedastic p value", 0.1288394,
testStatistic.tTest(sample1, sample2, false), 1E-7);
testStatistic.tTest(sample1, sample2), 1E-7);
assertEquals("two sample heteroscedastic p value", 0.1288394,
testStatistic.tTest(sampleStats1, sampleStats2, false), 1E-7);
testStatistic.tTest(sampleStats1, sampleStats2), 1E-7);
assertTrue("two sample heteroscedastic t-test reject",
testStatistic.tTest(sample1, sample2, 0.2, false));
testStatistic.tTest(sample1, sample2, 0.2));
assertTrue("two sample heteroscedastic t-test reject",
testStatistic.tTest(sampleStats1, sampleStats2, 0.2, false));
testStatistic.tTest(sampleStats1, sampleStats2, 0.2));
assertTrue("two sample heteroscedastic t-test accept",
!testStatistic.tTest(sample1, sample2, 0.1, false));
!testStatistic.tTest(sample1, sample2, 0.1));
assertTrue("two sample heteroscedastic t-test accept",
!testStatistic.tTest(sampleStats1, sampleStats2, 0.1, false));
!testStatistic.tTest(sampleStats1, sampleStats2, 0.1));
try {
testStatistic.tTest(sample1, sample2, .95, false);
testStatistic.tTest(sample1, sample2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// exptected
// expected
}
try {
testStatistic.tTest(sampleStats1, sampleStats2, .95, false);
testStatistic.tTest(sampleStats1, sampleStats2, .95);
fail("alpha out of range, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.tTest(sample1, tooShortObs, .01, false);
testStatistic.tTest(sample1, tooShortObs, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.tTest(sampleStats1, tooShortStats, .01, false);
testStatistic.tTest(sampleStats1, tooShortStats, .01);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.tTest(sample1, tooShortObs, false);
testStatistic.tTest(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.tTest(sampleStats1, tooShortStats, false);
testStatistic.tTest(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.t(sample1, tooShortObs, false);
testStatistic.t(sample1, tooShortObs);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
}
try {
testStatistic.t(sampleStats1, tooShortStats, false);
testStatistic.t(sampleStats1, tooShortStats);
fail("insufficient data, IllegalArgumentException expected");
} catch (IllegalArgumentException ex) {
// expected
@ -252,13 +252,13 @@ public final class TTestTest extends TestCase {
// Target comparison values computed using R version 1.8.1 (Linux version)
assertEquals("two sample homoscedastic t stat", -1.120897,
testStatistic.t(sample1, sample2, true), 10E-6);
testStatistic.homoscedasticT(sample1, sample2), 10E-6);
assertEquals("two sample homoscedastic p value", 0.2948490,
testStatistic.tTest(sampleStats1, sampleStats2, true), 1E-6);
testStatistic.homoscedasticTTest(sampleStats1, sampleStats2), 1E-6);
assertTrue("two sample homoscedastic t-test reject",
testStatistic.tTest(sample1, sample2, 0.3, true));
testStatistic.homoscedasticTTest(sample1, sample2, 0.3));
assertTrue("two sample homoscedastic t-test accept",
!testStatistic.tTest(sample1, sample2, 0.2, true));
!testStatistic.homoscedasticTTest(sample1, sample2, 0.2));
}
public void testSmallSamples() throws Exception {
@ -266,8 +266,8 @@ public final class TTestTest extends TestCase {
double[] sample2 = {4d, 5d};
// Target values computed using R, version 1.8.1 (linux version)
assertEquals(-2.2361, testStatistic.t(sample1, sample2, false), 1E-4);
assertEquals(0.1987, testStatistic.tTest(sample1, sample2, false), 1E-4);
assertEquals(-2.2361, testStatistic.t(sample1, sample2), 1E-4);
assertEquals(0.1987, testStatistic.tTest(sample1, sample2), 1E-4);
}
public void testPaired() throws Exception {


@ -17,7 +17,7 @@
-->
<?xml-stylesheet type="text/xsl" href="./xdoc.xsl"?>
<!-- $Revision: 1.19 $ $Date: 2004/06/23 16:26:16 $ -->
<!-- $Revision: 1.20 $ $Date: 2004/08/02 04:20:09 $ -->
<document url="stat.html">
<properties>
<title>The Commons Math User Guide - Statistics</title>
@ -411,7 +411,10 @@ System.out.println(regression.getSlopeStdErr());
Welch-Satterthwaite approximation</a> is used to compute the degrees
of freedom. Methods to return t-statistics and p-values are provided in each
case, as well as boolean-valued methods to perform fixed significance
level tests. See the examples below and the API documentation for
level tests. The names of test or test-statistic methods that assume
equal subpopulation variances always start with "homoscedastic". Test or
test-statistic methods whose names start with just "t" do not assume equal
variances. See the examples below and the API documentation for
more details.</li>
<li>The validity of the p-values returned by the t-test depends on the
assumptions of the parametric t-test procedure, as discussed
@ -536,26 +539,25 @@ testStatistic.pairedTTest(sample1, sample2, .05);
To compute the t-statistic:
<source>
TTestImpl testStatistic = new TTestImpl();
testStatistic.t(summary1, summary2, false);
testStatistic.t(summary1, summary2);
</source>
</p>
<p>
To compute the (two-sided) p-value:
<source>
testStatistic.tTest(sample1, sample2, false);
testStatistic.tTest(sample1, sample2);
</source>
</p>
<p>
To perform a fixed significance level test with alpha = .05:
<source>
testStatistic.tTest(sample1, sample2, .05, false);
testStatistic.tTest(sample1, sample2, .05);
</source>
</p>
<p>
In each case above, the last (boolean) parameter determines
whether or not the test should assume that subpopulation variances
are equal. Replacing this with <code>true</code> will result in
homoscedastic (equal variances) tests / test statistics.
In each case above, the test does not assume that the subpopulation
variances are equal. To perform the tests under this assumption,
replace "t" at the beginning of the method name with "homoscedasticT".
</p>
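The difference between the two families of methods can be illustrated with a small standalone sketch. It is not the Commons Math implementation: the class name TStatSketch and its method names are hypothetical, and the degrees-of-freedom method uses the textbook Welch-Satterthwaite formula, which is assumed here rather than taken from the library source. The two t-statistic formulas mirror the ones used by the protected t and homoscedasticT methods in TTestImpl.

```java
/**
 * Illustrative sketch only; not part of Commons Math.
 * All names in this class are hypothetical.
 */
public class TStatSketch {

    /** Two-sample t-statistic that does not assume equal variances. */
    public static double welchT(double m1, double m2, double v1, double v2,
                                double n1, double n2) {
        return (m1 - m2) / Math.sqrt((v1 / n1) + (v2 / n2));
    }

    /** Two-sample t-statistic based on a pooled variance estimate. */
    public static double pooledT(double m1, double m2, double v1, double v2,
                                 double n1, double n2) {
        double pooledVariance =
            ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2);
        return (m1 - m2) / Math.sqrt(pooledVariance * (1d / n1 + 1d / n2));
    }

    /** Textbook Welch-Satterthwaite approximation (assumed formula). */
    public static double welchDf(double v1, double v2, double n1, double n2) {
        double a = v1 / n1;
        double b = v2 / n2;
        return ((a + b) * (a + b))
            / ((a * a) / (n1 - 1) + (b * b) / (n2 - 1));
    }

    /** Degrees of freedom under the equal variances assumption. */
    public static double pooledDf(double n1, double n2) {
        return n1 + n2 - 2;
    }

    public static void main(String[] args) {
        // Unequal sample sizes and variances: the two statistics differ.
        System.out.println(welchT(5, 3, 1, 9, 10, 2));
        System.out.println(pooledT(5, 3, 1, 9, 10, 2));
        System.out.println(welchDf(1, 9, 10, 2));
        System.out.println(pooledDf(10, 2));
    }
}
```

When both samples have the same size, the two t-statistics coincide and only the degrees of freedom can differ, so a comparison that is meant to tell the two variants apart should use samples of unequal size.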
</dd>
<dt>Computing <code>chi-square</code> test statistics</dt>