<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

---
layout: doc_page
title: "Test Stats Aggregators"
---
# Test Stats Aggregators

This extension provides post aggregators for test statistics, including the z-score and the p-value. For the mathematical background and details, see [https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/](https://www.paypal-engineering.com/2017/06/29/democratizing-experimentation-data-for-product-innovations/).

To use these post aggregators, make sure to include the `druid-stats` extension.

## Z-score post aggregator for two-sample z-tests

Please refer to [https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/](https://www.isixsigma.com/tools-templates/hypothesis-testing/making-sense-two-proportions-test/) and [http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf](http://www.ucs.louisiana.edu/~jcb0773/Berry_statbook/Berry_statbook_chpt6.pdf) for more details.

z = (p1 - p2) / S.E.  (assuming the null hypothesis is true)

where p1 and p2 are the two sample proportions (defined below), n1 and n2 are the two sample sizes, and S.E. is the standard error:

S.E. = sqrt( p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2 )

The numerator (p1 - p2) is the observed difference between the two sample proportions.
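As a minimal sketch of the formula above (not Druid's implementation), the z-score can be computed in Python from the two proportions and sample sizes. The function name `z_score_two_sample` is hypothetical, and raising an error when the standard error is zero is an assumption of this sketch rather than documented Druid behavior.

```python
import math


def z_score_two_sample(p1: float, n1: int, p2: float, n2: int) -> float:
    """Hypothetical helper: z = (p1 - p2) / S.E., where
    S.E. = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)."""
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    # Unpooled standard error, as in the formula above.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    if se == 0:
        # Assumption of this sketch: treat a zero standard error as undefined.
        raise ValueError("standard error is zero; z-score is undefined")
    return (p1 - p2) / se
```

For instance, `z_score_two_sample(0.6, 500, 0.75, 600)` returns roughly -5.33.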

### zscore2sample post aggregator
* **`zscore2sample`**: calculates the z-score of a two-sample z-test, converting binary variables (***e.g.*** success or failure) into continuous variables (***e.g.*** conversion rate).

```json
{
  "type": "zscore2sample",
  "name": "<output_name>",
  "successCount1": <post_aggregator for the success count of sample 1>,
  "sample1Size": <post_aggregator for the size of sample 1>,
  "successCount2": <post_aggregator for the success count of sample 2>,
  "sample2Size": <post_aggregator for the size of sample 2>
}
```

The post aggregator converts the binary outcomes of the two samples into continuous variables, namely the proportions of the two samples:

p1 = successCount1 / sample1Size

p2 = successCount2 / sample2Size
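To illustrate this conversion, the hypothetical Python helper `to_proportion` below turns a list of binary outcomes (1 for success, 0 for failure) into a success count, a sample size, and their ratio, which play the roles of `successCount1`, `sample1Size`, and p1. This is a sketch for intuition only; in a Druid query these inputs are supplied by other post aggregators.

```python
from typing import Sequence, Tuple


def to_proportion(outcomes: Sequence[int]) -> Tuple[int, int, float]:
    """Hypothetical helper: binary outcomes -> (success_count, sample_size, proportion)."""
    if not outcomes:
        raise ValueError("sample must not be empty")
    if any(o not in (0, 1) for o in outcomes):
        raise ValueError("outcomes must be 0 (failure) or 1 (success)")
    success_count = sum(outcomes)   # number of successes
    sample_size = len(outcomes)     # number of observations
    return success_count, sample_size, success_count / sample_size
```

For example, `to_proportion([1, 0, 1, 1, 0])` returns `(3, 5, 0.6)`.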

### pvalue2tailedZtest post aggregator

* **`pvalue2tailedZtest`**: calculates the p-value of a two-sided z-test from a z-score.
    - ***pvalue2tailedZtest(zscore)***: the input is a z-score, which can be calculated with the `zscore2sample` post aggregator.

```json
{
  "type": "pvalue2tailedZtest",
  "name": "<output_name>",
  "zScore": <zscore post_aggregator>
}
```
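A minimal Python sketch of the two-sided p-value, assuming the standard definition p = 2 * (1 - Phi(|z|)), where Phi is the standard normal CDF. The function name `p_value_two_tailed` is hypothetical and this is not the Druid source code. It uses `math.erfc`, because 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)), which stays accurate for large |z|.

```python
import math


def p_value_two_tailed(z: float) -> float:
    """Hypothetical helper: two-sided p-value 2 * (1 - Phi(|z|)) of a z-test."""
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2, so 2 * (1 - Phi(x)) = erfc(x / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, `p_value_two_tailed(1.96)` returns approximately 0.05, and `p_value_two_tailed(0.0)` returns 1.0.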
  
## Example Usage

In this example, the `zscore2sample` post aggregator calculates a z-score, which is then fed into the `pvalue2tailedZtest` post aggregator to calculate the p-value. Constant post aggregators supply the four inputs.

The relevant part of the JSON query looks as follows:

```json
{
  ...
  "postAggregations": [
    {
      "type": "pvalue2tailedZtest",
      "name": "pvalue",
      "zScore": {
        "type": "zscore2sample",
        "name": "zscore",
        "successCount1": {
          "type": "constant",
          "name": "successCountFromPopulation1Sample",
          "value": 300
        },
        "sample1Size": {
          "type": "constant",
          "name": "sampleSizeOfPopulation1",
          "value": 500
        },
        "successCount2": {
          "type": "constant",
          "name": "successCountFromPopulation2Sample",
          "value": 450
        },
        "sample2Size": {
          "type": "constant",
          "name": "sampleSizeOfPopulation2",
          "value": 600
        }
      }
    }
  ]
}
```
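To sanity-check the output of the example query, the same numbers can be reproduced in plain Python. This sketch assumes the unpooled standard error given above and the standard two-sided p-value definition; small floating-point differences from Druid's results are possible.

```python
import math

# Constant inputs from the example query.
success_count1, sample1_size = 300, 500
success_count2, sample2_size = 450, 600

p1 = success_count1 / sample1_size  # 0.6
p2 = success_count2 / sample2_size  # 0.75

# Unpooled standard error and z-score, as in the formula above.
se = math.sqrt(p1 * (1 - p1) / sample1_size + p2 * (1 - p2) / sample2_size)
zscore = (p1 - p2) / se

# Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
pvalue = math.erfc(abs(zscore) / math.sqrt(2))

print(f"zscore = {zscore:.2f}, pvalue = {pvalue:.1e}")
```

With these inputs, p1 = 0.6 and p2 = 0.75, giving a z-score of roughly -5.33 and a p-value well below 1e-6.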