This guest blog post comes to us from superstar AP Stats teacher Leigh Nataro.
Sampling distributions are at the heart of statistical inference, which includes both confidence intervals and hypothesis testing. Although it is easy to tell students about the properties of sampling distributions, it is better when students experience these properties for themselves. In this activity, students download Major League Baseball salary data from 2016 and use an online tool called StatKey to create sampling distributions for sample means based on various sample sizes. Because the original population of salaries is severely skewed to the right, students see the Central Limit Theorem in action when n ≥ 30.
Objectives
Describe how the shape of the sampling distribution for a sample mean changes as n increases, when the original population distribution is severely skewed.
Describe the shape, center, and spread of the sampling distribution for a sample mean, when n is greater than or equal to 30.
Explain the relationship between sample size and standard deviation for the sampling distribution for a sample mean.
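For reference, the relationship between sample size and standard deviation is usually summarized by the standard formulas below. Here μ and σ stand for the population mean and standard deviation. The post does not report σ for the salary data, so no numerical value is filled in. The formula for the standard deviation assumes the sample is a small fraction of the population (commonly under 10%), which holds in this activity since 30 of 862 players is about 3.5%.

```latex
% Center and spread of the sampling distribution of the sample mean
\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}

% Example: quadrupling the sample size (n = 10 to n = 40) halves the spread
\frac{\sigma / \sqrt{40}}{\sigma / \sqrt{10}} = \sqrt{\frac{10}{40}} = \frac{1}{2}
```

Students often expect doubling n to halve the spread. The square root is why it takes four times the sample size.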
Note: A complete lesson plan, student notes with an answer key, and the file of baseball salaries can be found at the end of this blog post.
Here is the original population distribution of baseball salaries. Salaries have been rounded to the nearest million. The mean of the population distribution is 4.393 million dollars with a median of 1.5 million dollars.
To estimate the mean salary of all 862 baseball players, students will create sampling distributions with various sample sizes, noting how the shape, center, and spread change as n increases. Eventually, they will create a sampling distribution for the sample mean based on a sample size of n = 30. A screenshot of what students may see is shown below.
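For readers who want to see the idea outside of StatKey, here is a minimal Python sketch of the same process: draw many random samples of size n, record each sample mean, and summarize the resulting distribution. It is not StatKey's implementation. The population is a synthetic right-skewed stand-in (an exponential model with a mean near 4.4 million), not the real salary file, and the names `sample_means`, `skewness`, and `results` are made up for this illustration.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: about 0 for symmetric data, positive for right skew."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])


def sample_means(population, n, num_samples, rng):
    """Means of num_samples simple random samples of size n, drawn without replacement."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(num_samples)]


# ASSUMPTION: synthetic right-skewed stand-in for the 862 salaries (in millions).
# These are NOT the real 2016 MLB figures; load the actual file to reproduce the activity.
rng = random.Random(2016)
population = [rng.expovariate(1 / 4.4) for _ in range(862)]

results = {}
for n in (2, 10, 30):
    means = sample_means(population, n, num_samples=2000, rng=rng)
    results[n] = {
        "center": statistics.fmean(means),  # should stay near the population mean
        "spread": statistics.stdev(means),  # should shrink as n grows
        "skew": skewness(means),            # should move toward 0 as n grows
    }
    print(n, {name: round(value, 2) for name, value in results[n].items()})

print("population mean:", round(statistics.fmean(population), 2))
```

Swapping the synthetic list for the actual salary values should let students compare the printed center and spread with what they see in StatKey.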
Click on the image below to see how the population distribution of MLB salaries is used in StatKey to create the sampling distribution for sample means based on various sample sizes.
Common Student Errors
As students work through this activity, watch for several common errors. These errors have appeared many times on the AP exam and have been noted in many scoring commentaries.
Be sure students describe the shape of the sampling distribution as approximately normal. The word approximately is important, because a normal distribution is a theoretical model that has the x-axis as an asymptote. Clearly no distribution of data or population distribution is exactly normal.
Students should reference the Central Limit Theorem only in relation to distributions of sample means. If a problem involves distributions of sample proportions, the student should not refer to the Central Limit Theorem or n ≥ 30. If a student does so on a free response question, the response may receive a lower score.
If the original population distribution is assumed to be approximately normal, a sample size of n ≥ 30 is not needed for the sampling distribution of the sample mean to be approximately normal.
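A quick simulation can make this last point concrete. The sketch below compares sample means for a small sample size (n = 5) drawn from a normal population and from a right-skewed one. Both populations are hypothetical models chosen for illustration (a normal model with mean 4.4 and standard deviation 1.0, and an exponential model with mean 4.4), not the salary data, and the helper names are invented here.

```python
import random
import statistics


def skewness(values):
    """Mean cubed z-score: about 0 for symmetric data, positive for right skew."""
    center = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return statistics.fmean([((v - center) / spread) ** 3 for v in values])


def means_of_samples(draw, n, num_samples):
    """Means of num_samples samples, each built from n independent calls to draw()."""
    return [statistics.fmean([draw() for _ in range(n)]) for _ in range(num_samples)]


rng = random.Random(14)
n = 5  # deliberately far below 30

# ASSUMPTION: both populations are hypothetical models, not the salary data.
normal_means = means_of_samples(lambda: rng.gauss(4.4, 1.0), n, 5000)
skewed_means = means_of_samples(lambda: rng.expovariate(1 / 4.4), n, 5000)

print("normal population, skew of sample means:", round(skewness(normal_means), 2))
print("skewed population, skew of sample means:", round(skewness(skewed_means), 2))
```

The first value should land near 0 and the second clearly above it, which matches the idea that a normal population gives approximately normal sample means even when n is small.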