Testing the Distribution of a Categorical Variable (Lesson 11.1)

Chapter 11 - Day 1

Learning Targets
  • State hypotheses for a test about the distribution of a categorical variable.

  • Calculate expected counts for a test about the distribution of a categorical variable.

  • Calculate the test statistic for a test about the distribution of a categorical variable.

Activity: Which Color M&M is the Most Common?

Experience First

For this lesson, we will be taking a random sample of M&Ms to evaluate a claim made by the company about the distribution of color. We do this by giving each student one snack-size bag of M&Ms (we buy these after Halloween and then let them sit in a closet until we need them).

The first task is to aggregate the data from all of the snack-size bags into one class sample. We do this by having each student record their counts on the whiteboard, but you could also have students enter the data into a spreadsheet.
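If students enter their counts into a spreadsheet or a short script instead of the whiteboard, combining the bags is just a color-by-color sum. This is a minimal Python sketch with three hypothetical bags; the counts are made up for illustration and are not real class data.

```python
from collections import Counter

# Hypothetical counts from three students (one snack-size bag each).
# In class, these would come from the whiteboard or a shared spreadsheet.
student_counts = [
    {"blue": 4, "orange": 3, "green": 2, "yellow": 2, "red": 2, "brown": 2},
    {"blue": 3, "orange": 4, "green": 3, "yellow": 1, "red": 2, "brown": 3},
    {"blue": 5, "orange": 2, "green": 2, "yellow": 3, "red": 1, "brown": 2},
]

# Add the bags together color by color to form one class sample.
class_totals = Counter()
for bag in student_counts:
    class_totals.update(bag)  # update() adds counts from a mapping

sample_size = sum(class_totals.values())
print(dict(class_totals))
print("Sample size:", sample_size)
```

Summing per color (rather than averaging bag proportions) keeps the class sample as one large set of counts, which is what the expected-count and chi-square calculations need.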

We suggest leading the whole group through a discussion about hypotheses and expected counts, and then letting them work in small groups on questions #4-7.
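For the whole-group discussion, it may help to see the arithmetic end to end. This Python sketch assumes hypothetical claimed proportions and a hypothetical class sample of 400 candies; neither comes from this lesson, and the real proportions depend on the factory and the type of M&M, as the notes in this lesson point out.

```python
# Hypothetical claimed proportions, for illustration only.
# Check the company's published figures for your factory and M&M type.
claimed = {"blue": 0.24, "orange": 0.20, "green": 0.16,
           "yellow": 0.14, "red": 0.13, "brown": 0.13}

# Hypothetical aggregated class sample (made-up counts, n = 400).
observed = {"blue": 85, "orange": 88, "green": 70,
            "yellow": 50, "red": 55, "brown": 52}

n = sum(observed.values())

# H0: The color distribution matches the company's claimed proportions.
# Ha: The color distribution differs from the company's claimed proportions.

# Expected count for each color = n * claimed proportion (assuming H0 is true).
expected = {color: n * p for color, p in claimed.items()}

# Chi-square statistic: add up (Observed - Expected)^2 / Expected over all colors.
chi_square = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in claimed)

for color in claimed:
    print(f"{color:>6}: observed {observed[color]:3d}, expected {expected[color]:6.1f}")
print(f"chi-square = {chi_square:.2f}")
```

With these made-up numbers the statistic is about 3.44, and printing each color's term (Observed - Expected)^2 / Expected would show that blue contributes the most.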

Note 1: There are two M&M factories with different distributions. More info here.

Note 2: The color distribution depends on the type of M&M (milk chocolate, almond, etc.). More info here.

Formalize Later

Be prepared to answer and explain the following two questions about the calculation of the chi-square test statistic: 

Why do we square (Observed - Expected)?

Sometimes the observed count is greater than the expected count and sometimes it is less. If we simply added up the differences, the positive and negative values would cancel each other out. Squaring each difference makes every value nonnegative, so they cannot cancel. We used a similar approach back in Chapter 1, when we calculated standard deviation. This part of the formula also explains why the chi-square distribution starts at 0 and never takes negative values.
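A quick way to show students why the squaring matters: because the observed and expected counts share the same total, the raw differences always add up to exactly 0. This Python sketch uses hypothetical counts for a sample of 400 (made up for illustration).

```python
# Hypothetical observed and expected counts; both lists total 400.
observed = [85, 88, 70, 50, 55, 52]
expected = [96, 80, 64, 56, 52, 52]

differences = [o - e for o, e in zip(observed, expected)]
squared = [d ** 2 for d in differences]

# The raw differences cancel because the totals match.
print("Sum of differences:", sum(differences))
# The squared differences are never negative, so they cannot cancel.
print("Sum of squared differences:", sum(squared))
```

Having students change one observed count (and adjust another so the total stays 400) shows that the raw differences still sum to 0, no matter how far off the sample is.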

Why do we divide by Expected? 

We use an example to help explain.

Scenario 1: The expected number of red M&M’S is 6 and we get 16 red M&M’S.

Scenario 2: The expected number of red M&M’S is 500 and we get 510 red M&M’S.

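Which scenario provides more convincing evidence against the company’s claim? In both scenarios, the observed count is 10 more than the expected count, but Scenario 1 provides much more convincing evidence: getting 16 when we expect 6 is a huge difference relative to what we expected, while getting 510 when we expect 500 is a small one. The important idea is how far the observed count is from the expected count relative to the size of the expected count. Dividing by Expected puts each squared difference on that relative scale.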
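Working out each scenario's contribution to the chi-square statistic, (Observed - Expected)^2 / Expected, makes the point concrete:

```latex
\begin{aligned}
\text{Scenario 1:}\quad \frac{(16-6)^2}{6} &= \frac{100}{6} \approx 16.67\\
\text{Scenario 2:}\quad \frac{(510-500)^2}{500} &= \frac{100}{500} = 0.2
\end{aligned}
```

The same gap of 10 candies contributes about 83 times as much to the statistic in Scenario 1 as in Scenario 2.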

This lesson is setting students up for Lesson 11.2, where they will perform a full 4-step significance test using the data collected in this lesson. 