Chi-Square Goodness of Fit Day 1 (Topic 8.2)
Chapter 12 - Day 1
Learning Targets

State appropriate hypotheses and compute the expected counts and chi-square test statistic for a chi-square test for goodness of fit.

State and check the Random, 10%, and Large Counts conditions for performing a chi-square test for goodness of fit.

Calculate the degrees of freedom and P-value for a chi-square test for goodness of fit.
Activity: Which Color M&M is the Most Common?
Stats Medic / Skew the Script Collaboration Lesson: Does Harvard Discriminate Against Asian Applicants?
We start this lesson by telling students that we emailed the company that makes M&Ms asking about the color distribution. The company replied, claiming the following distribution:
Brown 13%, Yellow 14%, Orange 20%, Green 16%, Blue 24%, and Red 13%.
We are going to take a sample to look for evidence against this claim. We buy one large bag of M&Ms and tell students to think of this bag as a random sample from the entire population of M&Ms. We give each student a small handful of candies until the bag is empty, then we collect the totals for each color on the front whiteboard. Students will use the class totals for all of their calculations.
Note 1: There are two M&M factories with different distributions. More info here.
Note 2: The color distribution depends on the type of M&M (milk chocolate, almond, etc). More info here.
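To make the expected-count step concrete, here is a minimal sketch in Python. The claimed percentages come from the company's reply above; the observed class totals are hypothetical numbers made up for illustration, so swap in your own class data. Each expected count is the sample size times the claimed proportion, and the sketch also checks the Large Counts condition (all expected counts at least 5).

```python
# Claimed color distribution from the company, in percent (from the lesson).
claimed_percent = {
    "Brown": 13, "Yellow": 14, "Orange": 20,
    "Green": 16, "Blue": 24, "Red": 13,
}

# HYPOTHETICAL class totals, made up for illustration only.
observed = {
    "Brown": 30, "Yellow": 25, "Orange": 42,
    "Green": 33, "Blue": 44, "Red": 26,
}

n = sum(observed.values())  # total number of candies in the bag

# Expected count = sample size x claimed proportion for that color.
expected = {color: n * pct / 100 for color, pct in claimed_percent.items()}

# Large Counts condition: every expected count should be at least 5.
large_counts_ok = all(count >= 5 for count in expected.values())

for color in claimed_percent:
    print(f"{color:<7} observed={observed[color]:>3}  expected={expected[color]:.1f}")
print("Large Counts condition met:", large_counts_ok)
```

Note that expected counts do not have to be whole numbers, even though observed counts always are.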
Why Do We Square (Observed – Expected)?
Sometimes the observed count is greater than the expected count and sometimes it is less. We square each difference so that all of our values are positive. We used a similar approach back in Chapter 1, when we calculated standard deviation. This part of the formula explains why the chi-square distribution starts at 0.
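A short Python sketch can show students why squaring is needed. The counts below are hypothetical (200 candies, with expected counts computed from the company's claimed percentages). Without squaring, the positive and negative differences cancel out; after squaring, every term is zero or positive.

```python
# HYPOTHETICAL observed counts and the matching expected counts
# (n = 200 candies under the company's claimed percentages).
observed = [30, 25, 42, 33, 44, 26]
expected = [26.0, 28.0, 40.0, 32.0, 48.0, 26.0]

differences = [o - e for o, e in zip(observed, expected)]
squared = [d ** 2 for d in differences]

sum_of_differences = sum(differences)  # positives and negatives cancel
sum_of_squares = sum(squared)          # every term is >= 0

print("Differences:", differences)
print("Sum of differences:", sum_of_differences)
print("Sum of squared differences:", sum_of_squares)
```

The raw differences always add to 0 because the observed and expected counts share the same total, so their sum cannot measure how far the sample is from the claim.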
Why Do We Divide by Expected?
We use an example to help explain.
Scenario 1: The expected number of red M&M’S is 6 and we get 16 red M&M’S.
Scenario 2: The expected number of red M&M’S is 500 and we get 510 red M&M’S.
Which scenario provides more convincing evidence against the company’s claim? In both scenarios, the observed count is 10 away from the expected count. But Scenario 1 provides much more convincing evidence: a difference of 10 is large compared with an expected count of 6 but small compared with an expected count of 500. The important idea is how far the observed count is from the expected count relative to the size of the expected count.
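The two scenarios can be compared numerically with a small Python sketch. The function name chi_square_component is our own label for illustration; it computes one term of the chi-square statistic, (observed - expected)^2 / expected.

```python
def chi_square_component(observed, expected):
    """Contribution of one category: (observed - expected)^2 / expected."""
    return (observed - expected) ** 2 / expected

# Scenario 1: expected 6 red, observed 16 red.
scenario_1 = chi_square_component(16, 6)

# Scenario 2: expected 500 red, observed 510 red.
scenario_2 = chi_square_component(510, 500)

print(f"Scenario 1 contribution: {scenario_1:.2f}")
print(f"Scenario 2 contribution: {scenario_2:.2f}")
```

Both scenarios have the same squared difference of 100, but dividing by the expected count gives about 16.67 for Scenario 1 and 0.20 for Scenario 2, which matches the intuition that Scenario 1 is much stronger evidence against the claim.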