Chapter 4 - Day 1 - Lesson 4.1
Identify the population and sample in a statistical study.
Identify voluntary response sampling and convenience sampling and explain how these sampling methods can lead to bias.
Describe how to select a simple random sample with technology or a table of random digits.
Activity: Does Beyonce Write Her Own Lyrics?
For 10 years, I have used the Gettysburg Address Activity to teach this lesson. This activity brilliantly reveals the need for a simple random sample when making inferences for a population. But the context never got the buy-in from students that I wanted. Except for a few history nerds, there were few students who cared to find out the true author of the Gettysburg Address.
Enter Beyonce. When her hit “Crazy in Love” came out, people started questioning whether or not she had written the lyrics. In a Vanity Fair article, Beyonce came back at them:
“‘Crazy in Love’ was really hard to write because there was so much going on … I mean, I had written — what? —seven, eight number one songs with Destiny’s Child, in a row.”
So how do we use statistics to determine if Beyonce wrote the lyrics to “Crazy in Love”?
It is well known that different authors have different styles and word choices. It turns out that the average word length is fairly consistent for each author and can be used to distinguish one author from another. Since we know for sure that Beyonce wrote the lyrics for all of the Destiny’s Child songs (average word length 3.64), we should be able to assess her possible authorship of “Crazy in Love” by finding its average word length and comparing that value to 3.64.
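For anyone who wants to check an average word length with technology, here is a minimal Python sketch. It assumes that a "word" is any whitespace-separated chunk of text with leading and trailing ASCII punctuation removed. That definition, the function name `average_word_length`, and the placeholder sentence are all assumptions for illustration, not the rule used to get 3.64.

```python
import string


def average_word_length(text: str) -> float:
    """Return the mean number of characters per word in text."""
    # Assumption: split on whitespace, then strip ASCII punctuation from the
    # ends of each token. An apostrophe inside a word ("don't") still counts
    # as a character, and curly quotes are not stripped.
    tokens = [token.strip(string.punctuation) for token in text.split()]
    words = [word for word in tokens if word]
    if not words:
        raise ValueError("text contains no words")
    return sum(len(word) for word in words) / len(words)


# Placeholder sentence, not song lyrics.
placeholder = "The quick brown fox jumps over the lazy dog."
print(round(average_word_length(placeholder), 2))  # prints 3.89
```

A different rule for contractions or hyphenated words would shift the average slightly, so whatever rule you choose should be applied the same way to every text being compared.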
But we don’t have time to find the actual word length for all of the words in “Crazy in Love” (population). Instead, we will pick 5 words (sample). The first sampling method will be the “quickly circle 5 words” method (convenience sample), which will produce terrible overestimates, leading us to want to use a better sampling method (simple random sample!).
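Selecting the simple random sample with technology can be sketched in Python as follows. The idea is the same as with a calculator or a table of random digits: number the words 1 to N, pick 5 distinct numbers at random, and average the lengths of the words carrying those numbers. The word list below is a placeholder rather than the real lyrics, and the function name `simple_random_sample` and the seed value are assumptions made for illustration.

```python
import random


def simple_random_sample(words, n, seed=None):
    """Pick n distinct word labels (1..N) at random; return labels and words."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    # random.sample never repeats a label, so every set of n words is
    # equally likely to be chosen.
    labels = rng.sample(range(1, len(words) + 1), n)
    return labels, [words[label - 1] for label in labels]


# Placeholder population of words (not the actual lyrics).
words = "the quick brown fox jumps over the lazy dog and then runs far away home".split()

labels, sample = simple_random_sample(words, n=5, seed=2024)
sample_mean = sum(len(word) for word in sample) / len(sample)
print("labels:", labels)
print("words:", sample)
print("sample mean word length:", sample_mean)
```

Each student would use a different seed (or none at all), so each gets a different sample and a different sample mean.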
Simple Random Sample:
Rush students into their first sample of 5 words. The more you rush them, the worse they will mess it up.
Play the song on your speakers while students are finding their samples.
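If students are going to use randInt on their calculators to find the simple random sample, be sure that they seed their calculators first. Calculators that are new or have just been reset may otherwise produce the same sequence of random numbers.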
Use sticker dots on a poster board so that you can save the results. You can refer back to these posters throughout the remainder of the year.
The true average word length for “Crazy in Love” is 3.53.
If you have time, a possible extension is to have students take simple random samples of size 10 and make a third dotplot. This will show them that increasing the sample size decreases the variability of the sampling distribution.
This might be the first time that students are experiencing a sampling distribution, which will be an important concept when we get to inference. Point at one of the dots and ask “What does this dot represent?” The answer is “A sample of 5 words, and an average calculated from that sample.” Then point at a different dot and ask “What does this dot represent?” The answer is “A different sample of 5 words, and an average calculated from that sample.” So the dotplot represents many, many samples and an average calculated for each of those samples (the sampling distribution of a sample mean).
This is the perfect context to discuss how poor sampling methods can lead to bias. The convenience sample has high bias because most of the estimates are overestimates of the truth (most dots are above 3.53). The simple random sample has low bias because its estimates are centered on the truth: about half are overestimates and about half are underestimates.
The true average word length for “Crazy in Love” is 3.53, which is not very far from the known Beyonce average word length of 3.64. This means that we do not have convincing evidence that Beyonce did not write the lyrics. This is not the same as saying “Beyonce did write the lyrics.” The latter statement is equivalent to accepting the null hypothesis, which is a conclusion we avoid.
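The dotplot ideas in the tips above (overestimates from the convenience sample, a simple random sample centered on the truth, less variability with samples of size 10) can also be explored with a quick simulation. This is only a sketch under stated assumptions: the population of word lengths is hypothetical, not the real lyrics, and the convenience sample is modeled by assuming that a rushed student's eye lands on a word with probability proportional to its length. The function name `eye_catching_sample` is made up for this illustration.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the simulation is reproducible

# Hypothetical population of word lengths (NOT the real lyrics).
population = ([1] * 10 + [2] * 40 + [3] * 50 + [4] * 50
              + [5] * 30 + [6] * 12 + [7] * 6 + [8] * 2)
true_mean = statistics.mean(population)


def eye_catching_sample(lengths, n, rng):
    """Assumed model of 'quickly circle n words': longer words are more
    likely to be picked, and no word is picked twice."""
    pool = list(lengths)
    picked = []
    for _ in range(n):
        # Choose an index with probability proportional to word length.
        index = rng.choices(range(len(pool)), weights=pool, k=1)[0]
        picked.append(pool.pop(index))
    return picked


reps = 2000  # number of simulated students (dots) per dotplot
convenience_5 = [statistics.mean(eye_catching_sample(population, 5, rng))
                 for _ in range(reps)]
srs_5 = [statistics.mean(rng.sample(population, 5)) for _ in range(reps)]
srs_10 = [statistics.mean(rng.sample(population, 10)) for _ in range(reps)]

print("true mean:", true_mean)
for name, means in [("convenience, n=5", convenience_5),
                    ("SRS, n=5", srs_5),
                    ("SRS, n=10", srs_10)]:
    print(name,
          "| center:", round(statistics.mean(means), 2),
          "| spread (SD):", round(statistics.stdev(means), 2))
```

Under these assumptions, the convenience estimates should be centered above the true mean, both sets of simple random sample estimates should be centered near it, and the size-10 estimates should show the smaller spread.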