Introduction to Statistics

Many statistical applications collect data samples from either the same population or from two populations that have a natural pairing relationship. These samples are known as paired or matched samples. The use of data pairs occurs naturally in situations where the samples are measured both before and after some event. For example, one may want to make inferences about the mean weight loss for members of a health club after they have gone through a weight loss program for a certain period of time.

The two-means paired t-test is used to compare the means of two dependent samples and determine whether there is a difference between these means. The following are some of the requirements for performing the two-means paired t-test for dependent samples:

The samples are randomly selected
The samples are dependent and of size n
The sampled populations are EITHER normally distributed OR the sample size is \(n \ge 30\)

When the above conditions are satisfied, the following are the steps to perform the two-means dependent samples t-test:

State the significance level \(\alpha\)
State the null hypothesis \(H_0\) about the population mean \(\mu_d\) of paired differences
State the alternate hypothesis \(H_a\) about the population mean \(\mu_d\) of paired differences
Compute the differences between the paired entries from the samples: d = (entry from the first sample) - (corresponding entry from the second sample)
Compute the mean of the paired differences \(\bar{d}\) = \(\Large{\frac{\Sigma{d}}{n}}\)
Compute the standard deviation of the paired differences \(s_d = \Large{\sqrt{\frac{\Sigma{(d - \bar{d})^2}}{n - 1}}}\)
Compute the test statistic \(t = \Large{\frac{\bar{d} - \mu_d}{s_d/\sqrt{n}}}\)
Compute the Degrees of Freedom d.f. = n - 1
Determine the critical value(s) corresponding to the stated significance level \(\alpha\)
If the computed t falls in the rejection region, then reject the null hypothesis
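
The computational steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `paired_t_statistic` and its parameters are hypothetical, and only the standard library is used. The critical value is not computed here because the standard library has no t-distribution; it would come from a t-table or, if SciPy is available, from something like `scipy.stats.t.ppf`.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second, mu_d=0.0):
    """Return (t, degrees_of_freedom) for two dependent samples.

    first, second: equal-length sequences of paired measurements.
    mu_d: hypothesized mean of the paired differences under H0
          (0.0 is an assumed default, not something fixed by the text).
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same size n")
    n = len(first)
    if n < 2:
        raise ValueError("at least two pairs are needed")

    # d = (entry from first sample) - (corresponding entry from second sample)
    d = [a - b for a, b in zip(first, second)]
    d_bar = mean(d)   # mean of the paired differences
    s_d = stdev(d)    # sample standard deviation (divides by n - 1)

    t = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return t, n - 1
```

The returned t is then compared against the critical value(s) for the chosen \(\alpha\) and d.f. = n - 1 to decide whether it falls in the rejection region.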

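| Before | After | \(d\) = Before - After |
|---|---|---|
| 210 | 190 | 20 |
| 235 | 170 | 65 |
| 208 | 210 | -2 |
| 190 | 188 | 2 |
| 172 | 173 | -1 |
| 244 | 228 | 16 |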

| \(d - \bar{d}\) | \((d - \bar{d})^2\) |
|---|---|
| 20 - 16.7 | \((+3.3)^2\) = 10.89 |
| 65 - 16.7 | \((+48.3)^2\) = 2332.89 |
| -2 - 16.7 | \((-18.7)^2\) = 349.69 |
| 2 - 16.7 | \((-14.7)^2\) = 216.09 |
| -1 - 16.7 | \((-17.7)^2\) = 313.29 |
| 16 - 16.7 | \((-0.7)^2\) = 0.49 |
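
The arithmetic for the six Before/After pairs listed above can be reproduced with a short Python sketch. The variable names are illustrative. Note one loud assumption: the hypothesized value \(\mu_d = 0\) is chosen here only for illustration, since the hypotheses for this data set are not stated in the text. The code also uses the unrounded mean rather than 16.7, so individual squared deviations may differ slightly from hand-rounded values.

```python
import math

# Before/After readings for the six pairs.
before = [210, 235, 208, 190, 172, 244]
after = [190, 170, 210, 188, 173, 228]

n = len(before)

# Paired differences d = Before - After.
d = [b - a for b, a in zip(before, after)]

# Mean of the paired differences.
d_bar = sum(d) / n

# Squared deviations (d - d_bar)^2 and the sample standard deviation s_d.
squared_devs = [(x - d_bar) ** 2 for x in d]
s_d = math.sqrt(sum(squared_devs) / (n - 1))

# ASSUMPTION: the null hypothesis claims mu_d = 0 (not given in the text).
mu_d = 0.0
t = (d_bar - mu_d) / (s_d / math.sqrt(n))

print(f"d = {d}")
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.2f}, t = {t:.3f}, d.f. = {n - 1}")
```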

Often, we conduct hypothesis tests to determine whether two population proportions differ. In the following section(s), we will use the following notation:

\(n_1\) - the size of the sample from the first population
\(n_2\) - the size of the sample from the second population
\(p_1\) - the proportion of successes in the first population
\(p_2\) - the proportion of successes in the second population
\(x_1\) - the number of successes in the first sample
\(x_2\) - the number of successes in the second sample
\(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) - the proportion of successes in the first sample
\(\hat{p_2} = \Large{\frac{x_2}{n_2}}\) - the proportion of successes in the second sample
\(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) - the weighted estimate of the population proportion of successes
\(\bar{q} = 1 - \bar{p}\) - the weighted estimate of the population proportion of failures

A two-sample z-test is used to test the difference between two population proportions using two independent samples. The following are some of the requirements for performing the two-proportion z-test:

The two samples are randomly selected
The two samples are independent
The samples are large enough that \(n_1\bar{p} \ge 5\), \(n_1\bar{q} \ge 5\), \(n_2\bar{p} \ge 5\), and \(n_2\bar{q} \ge 5\)

When the above conditions are satisfied, the following are the steps to perform the two-proportion z-test:

State the significance level \(\alpha\)
State the null hypothesis \(H_0\) about the difference between the two population proportions
State the alternate hypothesis \(H_a\) about the difference between the two population proportions
Compute the sample proportions \(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) and \(\hat{p_2} = \Large{\frac{x_2}{n_2}}\)
Compute the weighted estimates of the population proportions \(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) and \(\bar{q} = 1 - \bar{p}\)
Compute the test statistic z = \(\Large{\frac{(\hat{p_1} - \hat{p_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}(1/n_1+1/n_2)}}}\)
Determine the critical value(s) corresponding to the stated significance level \(\alpha\)
If the computed z falls in the rejection region, then reject the null hypothesis
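
The computational steps above can be sketched as a small Python function. The name `two_proportion_z` and its parameters are hypothetical, and the sample-size check uses the weighted estimates \(\bar{p}\) and \(\bar{q}\) defined in the notation list; treat this as an illustration under those assumptions rather than a reference implementation.

```python
import math


def two_proportion_z(x1, n1, x2, n2, null_diff=0.0):
    """Return the z statistic for the difference between two proportions.

    x1, x2: number of successes in each sample.
    n1, n2: sample sizes.
    null_diff: hypothesized value of p1 - p2 under H0 (assumed 0 by default).
    """
    # Sample proportions.
    p1_hat = x1 / n1
    p2_hat = x2 / n2

    # Weighted estimates of the population proportions.
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar

    # Sample-size requirement: each expected count must be at least 5.
    if min(n1 * p_bar, n1 * q_bar, n2 * p_bar, n2 * q_bar) < 5:
        raise ValueError("samples are too small for the z-test")

    standard_error = math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))
    return ((p1_hat - p2_hat) - null_diff) / standard_error
```

The returned z is then compared against the critical value(s) for the stated \(\alpha\) to decide whether it falls in the rejection region.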

Example-2

A researcher wants to estimate the difference between the percentages of users of two toothpastes who will never switch to another toothpaste. In a sample of 500 users of Toothpaste A, 100 said that they will never switch to another toothpaste. In another sample of 400 users of Toothpaste B, 68 said that they will never switch to another toothpaste. At a 1% significance level, can we conclude that the proportion of users of Toothpaste A who will never switch to another toothpaste is greater than the proportion of users of Toothpaste B who will never switch to another toothpaste?

Given facts: \(n_1 = 500\), \(n_2 = 400\), \(x_1 = 100\), and \(x_2 = 68\).

Sample proportion for Toothpaste A: \(\hat{p_1} = \Large{\frac{x_1}{n_1}}\) = \(\Large{\frac{100}{500}}\) = 0.20.

Sample proportion for Toothpaste B: \(\hat{p_2} = \Large{\frac{x_2}{n_2}}\) = \(\Large{\frac{68}{400}}\) = 0.17.

Null hypothesis: \(H_0: p_1 = p_2\) OR \(p_1 - p_2 = 0\).

Alternate hypothesis: \(H_a: p_1 \gt p_2\) OR \(p_1 - p_2 \gt 0\).

In this situation, the hypothesis test is deciding if the proportion for Toothpaste A is greater than the proportion for Toothpaste B. Hence, this is a right-tailed test.

Given the significance level \(\alpha = 0.01\).

Compute the weighted estimate of the population proportions \(\bar{p} = \Large{\frac{x_1 + x_2}{n_1 + n_2}}\) = \(\Large{\frac{100 + 68}{500 + 400}}\) \(\approx 0.187\). Also, \(\bar{q} = 1 - \bar{p}\) \(\approx\) 1 - 0.187 = 0.813.

Compute the test statistic z = \(\Large{\frac{(\hat{p_1} - \hat{p_2}) - (p_1 - p_2)}{\sqrt{\bar{p}\bar{q}(1/n_1+1/n_2)}}}\) = \(\Large{\frac{(0.20 - 0.17) - (0)}{\sqrt{0.187 * 0.813 * (1/500+1/400)}}}\) \(\approx 1.15\).

For \(\alpha = 0.01\), the critical value from the z-table is \(z_c \approx 2.33\).

Since the computed standardized test statistic (z) is below the critical value \(z_c\), we FAIL to reject the null hypothesis \(H_0\).

Therefore, at the 1% significance level, the sample data does not provide sufficient evidence to indicate that the proportion of users of Toothpaste A who will never switch to another toothpaste is greater than the proportion of users of Toothpaste B who will never switch to another toothpaste.
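
As a cross-check, the toothpaste example can be reproduced in Python using only the standard library. This is a sketch, with illustrative variable names; `statistics.NormalDist` (Python 3.8+) stands in for the z-table lookup, and the p-value is an extra quantity not used in the text above.

```python
import math
from statistics import NormalDist

# Given facts from the example.
n1, x1 = 500, 100   # Toothpaste A
n2, x2 = 400, 68    # Toothpaste B
alpha = 0.01

# Sample proportions and weighted estimates.
p1_hat = x1 / n1
p2_hat = x2 / n2
p_bar = (x1 + x2) / (n1 + n2)
q_bar = 1 - p_bar

# Test statistic under H0: p1 - p2 = 0.
z = (p1_hat - p2_hat) / math.sqrt(p_bar * q_bar * (1 / n1 + 1 / n2))

# Right-tailed test: the critical value cuts off an area of alpha in the upper tail.
z_c = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

reject_h0 = z > z_c
print(f"z = {z:.2f}, z_c = {z_c:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```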