Suppose we record the high temperature, in degrees Fahrenheit, on 25 randomly chosen days this winter. We want to figure out what proportion of days had a high under 30 degrees, so we take the number of days we observed to be under 30 and divide by the sample size of 25. We denote this sample proportion as p^. The proportion of days where the high temperature was 30 degrees or above we denote as q^. Can we say that the true proportion of days this winter with a high under 30 is exactly p^? We first have to think about how different the sample proportion might have been if we’d taken another random 25 days out of all the days during the winter. We aren’t going to actually take more samples; instead, we look at the sampling distribution of the sample proportion.
We know the sampling distribution is approximately Normal under certain conditions. The Normal model says that about 68 percent of all samples of winter days will have p^ within one standard deviation of p (the true proportion of days under 30 degrees), and about 95 percent of all samples will have p^ within two standard deviations of p.
Suppose that in our sample of 25 days, we find that 8 days have a high temperature under 30 degrees. Then p^ = 8/25 = 0.32 and q^ = 17/25 = 0.68. Notice that p^ + q^ = 1. The standard deviation of the sampling distribution is sqrt(pq/n) (note that sqrt means “square root”), but since we don’t know the true proportion p, we substitute p^ and q^ and compute the standard error, which is sqrt(p^q^/n). Calculating this value, we get sqrt[(0.32)(0.68)/25] = 0.0933, which we round to 0.09.
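The standard error calculation above can be checked with a few lines of Python. This is only a minimal sketch; the function name standard_error is an illustrative choice, not something from the article.

```python
import math

def standard_error(successes, n):
    """Standard error of a sample proportion: sqrt(p^ q^ / n)."""
    p_hat = successes / n      # sample proportion, 8/25 = 0.32 in the example
    q_hat = 1 - p_hat          # complement, 17/25 = 0.68 in the example
    return math.sqrt(p_hat * q_hat / n)

# 8 of the 25 sampled days had a high under 30 degrees
se = standard_error(8, 25)
print(round(se, 4))  # about 0.0933, which the article rounds to 0.09
```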
This information tells us that, standing at p^, we can expect p to be no more than two standard errors away in about 95 percent of samples. In that sense we are 95 percent confident that p lies within that span. In our example, p^ = 0.32 and the standard error is 0.09. Two standard errors is 0.18, so the interval runs from 0.32 - 0.18 = 0.14 to 0.32 + 0.18 = 0.50. We can say that we are 95 percent confident that between 14 percent and 50 percent of the days this winter have a high temperature under 30 degrees.
In this example we used two standard errors to give us a 95 percent confidence interval. To change the confidence level, we’d need to change the number of standard errors so that the size of the margin of error corresponds to the new level. This number is called the critical value, denoted z* since we are using the Normal distribution. The critical value for a 95 percent confidence interval is 1.96, meaning that 95 percent of a Normal model is found within 1.96 standard deviations of the mean. We used z* = 2 above because it was easier to work with in the example; the more precise value is 1.96. For a 90 percent confidence interval, z* is 1.645. For a 99 percent confidence interval, z* is 2.58. You can find the critical values z* in a Z table in most statistics books.
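Instead of a Z table, the critical values can also be computed directly. The sketch below assumes Python 3.8 or later, where the standard library provides statistics.NormalDist; the helper name critical_value is hypothetical.

```python
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided critical value z* for a given confidence level (e.g. 0.95)."""
    # Leave (1 - confidence) / 2 of the probability in each tail
    upper_tail_cutoff = 1 - (1 - confidence) / 2
    return NormalDist().inv_cdf(upper_tail_cutoff)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_value(level), 3))
```

The 99 percent value comes out near 2.576, which the article rounds to 2.58.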
Adjusting the intervals for the more precise values of z*, we get the 90 percent confidence interval to be 0.32 - 1.645(0.09) to 0.32 + 1.645(0.09), which is 0.17195 to 0.46805. The 95 percent confidence interval is 0.1436 to 0.4964, and the 99 percent confidence interval is 0.0878 to 0.5522. Notice how the intervals get wider as the confidence level goes up. That makes sense: we can be more confident that the true proportion falls in a wider range than in a narrower one.
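The interval arithmetic can be reproduced with a short Python sketch. It assumes the rounded standard error of 0.09 and the rounded critical values quoted in the text; the function name proportion_interval is an illustrative choice.

```python
def proportion_interval(p_hat, se, z_star):
    """Return (low, high) for the interval p^ +/- z* x SE."""
    margin = z_star * se       # margin of error
    return p_hat - margin, p_hat + margin

# Rounded values taken from the example: p^ = 0.32, SE = 0.09
for label, z_star in (("90%", 1.645), ("95%", 1.96), ("99%", 2.58)):
    low, high = proportion_interval(0.32, 0.09, z_star)
    print(label, round(low, 5), round(high, 5))
```

Using the unrounded standard error (about 0.0933) would shift each endpoint slightly, so treat the printed digits as tied to the rounding used here.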
The size of the sample is also important in some cases. Suppose a candidate is planning a poll and wants to estimate voter support to within 2.5 percentage points with 99 percent confidence. What size sample is needed? First we look at the margin of error. We don’t know p^ because we don’t have a sample yet, so we guess a value. The value that makes p^q^ the largest is p^ = 0.5, so we use that to be completely safe. We then use the formula for margin of error, ME = z* sqrt(p^q^/n). In our example, this is 0.025 = 2.58 sqrt[(0.5)(0.5)/n]. Solving for n, we get n = (2.58/0.025)^2 (0.25) = 2,662.56, which we round up to 2,663.
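Solving the margin-of-error formula for n gives n = (z*/ME)^2 p^q^, which is easy to check in code. The following is a minimal Python sketch; the function name required_sample_size and the default guess of 0.5 are illustrative choices that follow the reasoning in the text.

```python
import math

def required_sample_size(margin, z_star, p_guess=0.5):
    """Smallest n for which z* x sqrt(p q / n) is at most the margin of error."""
    n = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    return math.ceil(n)        # round up: a fractional respondent is not possible

# Poll example: margin of 0.025 at 99 percent confidence (z* = 2.58)
print(required_sample_size(0.025, 2.58))  # 2663
```

Rounding up rather than to the nearest integer keeps the margin of error at or below the target.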
There are many other applications of proportions, confidence intervals and hypothesis testing, which may be a topic for a future article. This guide should give a student a basic understanding of proportions and how to use them in confidence intervals involving real-world situations.