PolarSPARC |
Introduction to Statistics - Part 7
Bhaskar S | 09/17/2021 |
In Part 6 of the series, we explored Hypothesis Testing for proportions, the two independent samples z-test, and the two independent samples t-test.
In this part of the series, we will continue our journey with Hypothesis Testing using the two dependent samples t-test and the two independent samples z-test for the difference of proportions.
Dependent Samples - Two-Means t-Test
Many statistical applications collect data samples from either the same population or from two populations that have a natural pairing relationship. These samples are known as paired or matched samples. The use of data pairs occurs naturally in situations where the samples are measured both before and after some event. For example, one may want to make inferences about the mean weight loss for members of a health club after they have gone through a weight loss program for a certain period of time.
The two-means paired t-test method is used to compare the means of two dependent samples and determine whether there is a difference between these means. The following are the requirements for performing the two-means paired t-test for dependent samples:
The samples are randomly selected
The samples are dependent (paired) and each is of size n
The populations are EITHER normally distributed OR the number of pairs is \(n \ge 30\)
When the above conditions are satisfied, the following are the steps to perform the two-means dependent samples t-test:
State the significance level \(\alpha\)
State the null hypothesis \(H_0\) about the population mean \(\mu_d\) of paired differences
State the alternate hypothesis \(H_a\) about the population mean \(\mu_d\) of paired differences
Compute the differences between the paired entries of the samples: d = (entry from the first sample) - (corresponding entry from the second sample)
Compute the mean of the paired differences \(\bar{d}\) = \(\Large{\frac{\Sigma{d}}{n}}\)
Compute the standard deviation of the paired differences \(s_d = \Large{\sqrt{\frac{\Sigma{(d - \bar{d})^2}}{n - 1}}}\)
Compute the test statistic \(t = \Large{\frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}}\)
Compute the Degrees of Freedom d.f. = n - 1
Determine the critical value(s) corresponding to the stated significance level \(\alpha\)
If the computed t falls in the rejection region, then reject the null hypothesis
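The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a definitive implementation; the function names `paired_t_statistic` and `reject_two_tailed` and the sample numbers are hypothetical, and the critical value is assumed to be looked up in a t-table by the reader.

```python
import math


def paired_t_statistic(first, second, mu_d=0.0):
    """Return (d_bar, s_d, t, df) for two paired samples of equal length."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    n = len(first)
    if n < 2:
        raise ValueError("need at least two pairs")
    # d = (entry from first sample) - (corresponding entry from second sample)
    d = [a - b for a, b in zip(first, second)]
    # Mean of the paired differences
    d_bar = sum(d) / n
    # Sample standard deviation of the paired differences (divisor n - 1)
    s_d = math.sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))
    if s_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    # Test statistic and degrees of freedom
    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t, n - 1


def reject_two_tailed(t, t_critical):
    """Two-tailed decision: reject H0 when |t| exceeds the critical value."""
    return abs(t) > t_critical


# Hypothetical paired readings, for illustration only
d_bar, s_d, t, df = paired_t_statistic([12, 15, 11, 14], [10, 14, 12, 11])
print(f"d_bar={d_bar:.3f} s_d={s_d:.3f} t={t:.3f} df={df}")
```

For a left-tailed or right-tailed test, the comparison in `reject_two_tailed` would change to `t < -t_critical` or `t > t_critical` respectively.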
Let us now solve a problem for the two-means paired t-test.
Example-1 | A doctor wishes to see if a patient's cholesterol level will change after prescribing a certain medication. Six subjects are pretested and their readings are 210, 235, 208, 190, 172, and 244 respectively. They are prescribed the medication for a 6-week period, after which they are tested again and their readings are 190, 170, 210, 188, 173, and 228 respectively. Can it be concluded, at \(\alpha = 0.10\), that the cholesterol level has changed? |
---|---|

Null hypothesis (the cholesterol medication has no effect): \(H_0: \mu_d = 0\)
Alternate hypothesis (the cholesterol medication has an effect): \(H_a: \mu_d \ne 0\)
Compute the differences between the paired entries as follows:

Before | After | d = Before - After |
---|---|---|
210 | 190 | 20 |
235 | 170 | 65 |
208 | 210 | -2 |
190 | 188 | 2 |
172 | 173 | -1 |
244 | 228 | 16 |

Compute the mean of the paired differences \(\bar{d}\) = \(\Large{\frac{\Sigma{d}}{n}}\) = \(\Large{\frac{(20 + 65 - 2 + 2 - 1 + 16)}{6}}\) = \(\Large{\frac{100}{6}}\) \(\approx\) 16.7
Compute the squared deviations \((d - \bar{d})^2\) using \(\bar{d} \approx 16.7\) as follows:

d | \(d - \bar{d}\) | \((d - \bar{d})^2\) |
---|---|---|
20 | 3.3 | 10.89 |
65 | 48.3 | 2332.89 |
-2 | -18.7 | 349.69 |
2 | -14.7 | 216.09 |
-1 | -17.7 | 313.29 |
16 | -0.7 | 0.49 |

Compute the standard deviation of the paired differences \(s_d = \Large{\sqrt{\frac{\Sigma{(d - \bar{d})^2}}{n - 1}}}\) = \(\Large{\sqrt{\frac{(10.89 + 2332.89 + 349.69 + 216.09 + 313.29 + 0.49)}{(6 - 1)}}}\) = \(\Large{\sqrt{\frac{3223.34}{5}}}\) \(\approx\) 25.4
Compute the test statistic \(t = \Large{\frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}}\) = \(\Large{\frac{16.7 - 0}{25.4/\sqrt{6}}}\) \(\approx\) 1.610
In this situation, the hypothesis test is deciding if there is a difference in the mean. Hence, this is a two-tailed test.
The significance level is \(\alpha = 0.10\) and d.f. = n - 1 = 6 - 1 = 5. For a two-tailed test at \(\alpha = 0.10\), the critical values from the t-table for d.f. = 5 are \(\pm t_c \approx \pm 2.015\).
Since the computed test statistic \(t \approx 1.610\) lies between \(-t_c\) and \(t_c\), it does not fall in the rejection region, and we FAIL to reject the null hypothesis \(H_0\).
Therefore, at the 0.10 significance level, the sample data does not provide enough evidence to support the claim that the prescribed medication changes a patient's cholesterol level.
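As a cross-check of Example-1, the following sketch recomputes the statistic from the raw readings without intermediate rounding, using Python's standard `statistics` module. The critical value 2.015 is taken from the t-table as quoted in the example, not computed; the variable names are illustrative only.

```python
import math
import statistics

# Cholesterol readings from Example-1
before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]

# Paired differences: d = before - after
d = [b - a for b, a in zip(before, after)]
n = len(d)

d_bar = statistics.mean(d)    # mean of the paired differences
s_d = statistics.stdev(d)     # sample standard deviation (divisor n - 1)
t = (d_bar - 0) / (s_d / math.sqrt(n))  # mu_d = 0 under H0

t_critical = 2.015            # t-table value for a two-tailed test, alpha = 0.10, d.f. = 5
reject = abs(t) > t_critical  # two-tailed rejection rule

print(f"d = {d}")
print(f"d_bar = {d_bar:.2f}, s_d = {s_d:.2f}, t = {t:.3f}, d.f. = {n - 1}")
print("reject H0" if reject else "fail to reject H0")
```

Without rounding \(\bar{d}\) and \(s_d\) first, t comes out near 1.61, slightly different in the third decimal from the hand computation, and the decision is unchanged.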
Hypothesis Testing - Two Population Proportions
Often, we conduct hypothesis tests to determine whether two population proportions differ. In the following sections, we will use the following notation:
\(n_1\) - the size of the sample from the first population
\(n_2\) - the size of the sample from the second population
\(p_1\) - the proportion of successes in the first population
\(p_2\) - the proportion of successes in the second population
\(x_1\) - the number of successes in the first sample
\(x_2\) - the number of successes in the second sample
\(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) - the proportion of successes in the first sample
\(\hat{p_2} = \Large{\frac{x_2}{n_2}}\) - the proportion of successes in the second sample
\(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) - the weighted (pooled) estimate of the population proportion of successes
\(\bar{q} = 1 - \bar{p}\) - the weighted (pooled) estimate of the population proportion of failures
Independent Samples - Two-Proportion z-Test
A two-sample z-test is used to test the difference between two population proportions using two independent samples. The following are some of the requirements for performing the two-proportion z-test:
The two samples are randomly selected
The two samples are independent
The samples are large enough, that is, \(n_1\bar{p} \ge 5\), \(n_1\bar{q} \ge 5\), \(n_2\bar{p} \ge 5\), and \(n_2\bar{q} \ge 5\)
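The sample-size requirement can be checked mechanically. The sketch below assumes the condition is evaluated with the weighted estimates \(\bar{p}\) and \(\bar{q}\) from the notation list; the function name `large_enough` and the `threshold` parameter are hypothetical names chosen for illustration.

```python
def large_enough(x1, n1, x2, n2, threshold=5):
    """Check that n1*p_bar, n1*q_bar, n2*p_bar and n2*q_bar are all >= threshold."""
    # Weighted estimates of the proportions of successes and failures
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    products = (n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar)
    return all(value >= threshold for value in products)


# 100 successes out of 500 and 68 out of 400: all four products are well above 5
print(large_enough(100, 500, 68, 400))
# Two very small samples with few successes: the check fails
print(large_enough(1, 10, 1, 12))
```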
When the above conditions are satisfied, the following are the steps to perform the two-proportion z-test:
State the significance level \(\alpha\)
State the null hypothesis \(H_0\) about the difference between the two population proportions
State the alternate hypothesis \(H_a\) about the difference between the two population proportions
Compute the sample proportions \(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) and \(\hat{p_2} = \Large{\frac{x_2}{n_2}}\)
Compute the weighted estimates of the population proportions \(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) and \(\bar{q} = 1 - \bar{p}\)
Compute the test statistic z = \(\Large{\frac{(\hat{p_1} - \hat{p_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}(1/n_1+1/n_2)}}}\)
Determine the critical value(s) corresponding to the stated significance level \(\alpha\)
If the computed z falls in the rejection region, then reject the null hypothesis
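The computational steps above might be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function names, the `null_diff` parameter (the hypothesized value of \(p_1 - p_2\)), and the `tail` labels are hypothetical, and the critical value is assumed to be supplied as a positive number from a z-table.

```python
import math


def two_proportion_z(x1, n1, x2, n2, null_diff=0.0):
    """Return the z statistic for the difference between two sample proportions."""
    # Sample proportions
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    # Weighted estimates of the proportions of successes and failures
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar
    # Standard error of the difference under the null hypothesis
    std_err = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - null_diff) / std_err


def in_rejection_region(z, z_critical, tail):
    """Decide rejection for tail in {'left', 'right', 'two'}; z_critical is positive."""
    if tail == "left":
        return z < -z_critical
    if tail == "right":
        return z > z_critical
    if tail == "two":
        return abs(z) > z_critical
    raise ValueError("tail must be 'left', 'right', or 'two'")


# Hypothetical counts, for illustration only
z = two_proportion_z(60, 100, 50, 100)
print(f"z = {z:.3f}, reject = {in_rejection_region(z, 1.96, 'two')}")
```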
Let us now solve a problem for the two-proportion z-test.
Example-2 | A researcher wants to estimate the difference between the percentages of users of two toothpastes who will never switch to another toothpaste. In a sample of 500 users of Toothpaste A, 100 said that they will never switch to another toothpaste. In another sample of 400 users of Toothpaste B, 68 said that they will never switch to another toothpaste. At a 1% significance level, can we conclude that the proportion of users of Toothpaste A who will never switch to another toothpaste is greater than the proportion of users of Toothpaste B who will never switch to another toothpaste? |
---|---|

Given facts: \(n_1 = 500\), \(n_2 = 400\), \(x_1 = 100\), and \(x_2 = 68\).
Sample proportion for Toothpaste A: \(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) = \(\Large{\frac{100}{500}}\) = 0.20.
Sample proportion for Toothpaste B: \(\hat{p_2} = \Large{\frac{x_2}{n_2}}\) = \(\Large{\frac{68}{400}}\) = 0.17.
Null hypothesis: \(H_0: p_1 = p_2\) OR \(p_1 - p_2 = 0\).
Alternate hypothesis: \(H_a: p_1 \gt p_2\) OR \(p_1 - p_2 \gt 0\).
In this situation, the hypothesis test is deciding if the proportion for Toothpaste A is greater than the proportion for Toothpaste B. Hence, this is a right-tailed test with significance level \(\alpha = 0.01\).
Compute the weighted estimate of the population proportions \(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) = \(\Large{\frac{100 + 68}{500 + 400}}\) \(\approx\) 0.187. Also, \(\bar{q} = 1 - \bar{p}\) = 1 - 0.187 = 0.813.
Compute the test statistic z = \(\Large{\frac{(\hat{p_1} - \hat{p_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}(1/n_1+1/n_2)}}}\) = \(\Large{\frac{(0.20 - 0.17) - (0)}{\sqrt{0.187 * 0.813 * (1/500+1/400)}}}\) \(\approx 1.15\).
For \(\alpha = 0.01\), the right-tailed critical value from the z-table is \(z_c \approx 2.33\).
Since the computed standardized test statistic (z) is below the critical value \(z_c\), it does not fall in the rejection region, and we FAIL to reject the null hypothesis \(H_0\).
Therefore, at the 1% significance level, the sample data does not provide sufficient evidence to indicate that the proportion of users of Toothpaste A who will never switch to another toothpaste is greater than the proportion of users of Toothpaste B who will never switch to another toothpaste.
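The numbers in Example-2 can be re-derived with a short script. This sketch uses only Python's standard library; as an assumption of the sketch, the right-tailed critical value is obtained from `statistics.NormalDist().inv_cdf` (available in Python 3.8 and later) instead of being read from a z-table.

```python
import math
from statistics import NormalDist

# Given facts from Example-2
n1, x1 = 500, 100   # Toothpaste A
n2, x2 = 400, 68    # Toothpaste B
alpha = 0.01

# Sample proportions
p1_hat = x1 / n1
p2_hat = x2 / n2

# Weighted estimates of the population proportions
p_bar = (x1 + x2) / (n1 + n2)
q_bar = 1 - p_bar

# Test statistic under H0: p1 - p2 = 0
z = ((p1_hat - p2_hat) - 0) / math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Right-tailed critical value: the point with area alpha to its right
z_critical = NormalDist().inv_cdf(1 - alpha)
reject = z > z_critical

print(f"p_bar = {p_bar:.3f}, q_bar = {q_bar:.3f}")
print(f"z = {z:.3f}, z_c = {z_critical:.3f}")
print("reject H0" if reject else "fail to reject H0")
```

The computed z is close to 1.15 and the critical value close to 2.33, matching the hand calculation and the conclusion that the null hypothesis is not rejected.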
References
Introduction to Statistics - Part 6
Introduction to Statistics - Part 5
Introduction to Statistics - Part 4
Introduction to Statistics - Part 3
Introduction to Statistics - Part 2