# Tell the Whole Story: Evidence for Ha by Josh Tabor

*Today we have Josh Tabor as our guest blogger. Josh is a high school statistics teacher at Canyon del Oro High School in Arizona. He is also the co-author of *__Statistical Reasoning in Sports__*, *__Statistics and Probability with Applications__* (SPA), and *__The Practice of Statistics__* (TPS). Josh is a question leader at the AP Statistics Exam Reading each year, an AP Statistics consultant for the College Board, and also one of my greatest mentors. Josh has transformed, and continues to transform, the way I teach statistics.*

By the end of the year, my students can write a conclusion to a significance test with great proficiency: “Because the *p*-value of 0.03 is less than α = 0.05, we reject *H*0. There is convincing evidence that…” However, if you ask them to interpret the *p*-value, they will often look at you with a mixture of fear and confusion. I am convinced that the root of this problem is that students aren’t thinking about the question the *p*-value is trying to answer.

**How I know this matters**

This really hit home as I graded __Question #5 on the 2013 AP exam__.

Here is the short version of this item: An observational study was conducted to see if there was a relationship between meditation and blood pressure among men who lived in a retirement community. Of the 11 men who meditated daily, 0 had high blood pressure while 8 of the 17 who didn’t meditate had high blood pressure.

Part (a) of the item asked students to recognize that __correlation doesn’t imply causation__, as this was an observational study.

Part (b) asked students to explain why a two-sample *z* test for a difference in proportions was not appropriate (the large counts condition isn’t met).
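As a quick sketch of why the condition fails, here is a check in Python. The “at least 10 successes and failures in each group” threshold follows one common textbook convention (some texts use 5); the threshold is an assumption here, not something the item specifies:

```python
# Check the large counts condition for a two-sample z test,
# using the common ">= 10 successes and failures in each group" rule.
groups = {
    "meditators": (11, 0),       # (sample size, number with high blood pressure)
    "non-meditators": (17, 8),
}

for name, (n, successes) in groups.items():
    failures = n - successes
    ok = successes >= 10 and failures >= 10
    print(f"{name}: {successes} successes, {failures} failures -> "
          f"{'condition met' if ok else 'condition NOT met'}")
```

With 0 successes among the meditators (and only 8 successes and 9 failures among the non-meditators), the condition fails for both groups, so the normal approximation is not trustworthy here.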

Because the two-sample *z* test is inappropriate, Part (c) asked students to use the results of a simulation to draw a conclusion about the study. The item presented a graph that showed the simulated sampling distribution of the difference in proportions under the assumption that the null hypothesis was true (*p*med = *p*not). Here are the results of 100 trials of this simulation, using the __One Categorical Variable__ applet.

**The problem**

On the AP Exam, what did over 99% of the students do at this point? *They completely ignored the data from the study!* That is, fewer than 1% of students thought to compare the proportion of meditators who had high blood pressure (0/11 = 0) to the proportion of non-meditators who had high blood pressure (8/17 = 0.47).

Here’s what students should have been thinking: “In the actual study, the difference in proportions was –0.47. Wow! This seems like a pretty big difference. But, maybe there is no difference in the true proportions, and the researchers got a difference this big by chance alone. Hmmm…I wonder how likely it is to get a difference like this if there is no difference between meditators and non-meditators?”

The answer to this last question is the *p*-value, which can be easily estimated from the simulation. In the graph above, there is only one dot less than or equal to –0.47, so the *p*-value is approximately 1/100. But like I said earlier, very few students even considered the difference at all. Very disappointing—especially because I suspect a majority of students could make very good progress on a traditional significance test.
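The applet’s simulation can be sketched in a few lines of Python. This version re-randomizes the group labels (which may not match the applet’s exact mechanics, an assumption on my part), but it targets the same null hypothesis of no difference between meditators and non-meditators:

```python
import random

# Observed data from the study described above
med_n, med_high = 11, 0      # meditators: 0 of 11 with high blood pressure
non_n, non_high = 17, 8      # non-meditators: 8 of 17 with high blood pressure

observed_diff = med_high / med_n - non_high / non_n   # about -0.47

# Pool all 28 outcomes, then repeatedly re-randomize who is "a meditator,"
# mimicking the null hypothesis of no association (p_med = p_not)
pool = [1] * (med_high + non_high) + [0] * (med_n + non_n - med_high - non_high)

random.seed(1)  # for reproducibility
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pool)
    diff = sum(pool[:med_n]) / med_n - sum(pool[med_n:]) / non_n
    if diff <= observed_diff + 1e-9:   # as extreme or more extreme
        count += 1

p_value = count / trials
print(f"observed diff = {observed_diff:.2f}, estimated p-value = {p_value:.4f}")
```

With many more than 100 trials, the estimate lands near 0.008, consistent with the rough 1/100 estimate from the applet’s dotplot.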

**The solution: evidence vs. convincing evidence**

Significance test questions typically ask, “Is there convincing evidence that…?” However, it is hard for students to decide whether the evidence is convincing if they never consider the evidence in the first place. So, from the beginning of the year I try to develop a framework that contrasts *evidence* with *convincing evidence*. I use this framework in my first-day activity (__Hiring Discrimination__ from *The Practice of Statistics*) and keep revisiting it throughout the year.

**How do I change my instruction?**

When we get to formal significance testing, I continue to expect my students to think about the evidence *before* they decide whether or not the evidence is convincing. To make this happen, I have students write down the evidence for the alternative hypothesis immediately after they state hypotheses and define parameter(s).

On a test I gave recently, the hypotheses were *H*0: *p* = 0.20 and *Ha*: *p* < 0.20, with *p* = the proportion of all cereal boxes with a voucher for a free video rental (__2005 #4__, for those who recognize the context). Here is an example of what I expect from students:

After they state the evidence for *Ha* (line 3 in the student work above), I’d like them to think about (but not write down) the two explanations for the evidence they identified:

1. The proportion of boxes with vouchers really is 0.20, and we got a value of 0.169 by chance alone.

2. The proportion of boxes with vouchers really is less than 0.20 (evil cereal company!).

To determine if the first explanation is plausible, we want to know how likely it is to get a proportion of 0.169 or smaller by chance alone if the company is telling the truth. *The answer is the p-value.* Once they know what question they are trying to answer, the *p*-value makes a lot more sense!

For this item, the *p*-value is 0.268. The interpretation is just the next part of the story: *Assuming that the true proportion of boxes with vouchers is p = 0.20, there is a 0.268 probability of getting a sample proportion less than or equal to 0.169 by chance alone.* Because this probability is larger than 0.05, we don’t have convincing evidence that the true proportion of boxes with vouchers is less than 0.20.
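That *p*-value can be reproduced with a quick normal calculation. The post gives only the sample proportion 0.169, but the 15/65 variation mentioned below suggests *n* = 65, so *x* = 11 (11/65 ≈ 0.169) is assumed here:

```python
from math import sqrt, erf

# One-sample z test for H0: p = 0.20 vs. Ha: p < 0.20.
# n = 65 and x = 11 are assumed (the post reports only p-hat = 0.169).
n, x, p0 = 65, 11, 0.20
p_hat = x / n                            # about 0.169

se = sqrt(p0 * (1 - p0) / n)             # standard error assuming H0 is true
z = (p_hat - p0) / se                    # about -0.62
p_value = 0.5 * (1 + erf(z / sqrt(2)))   # left-tail normal probability P(Z <= z)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")  # p-value = 0.268
```

The `erf` line is just the standard normal CDF written with the error function, so no external statistics library is needed.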

**Final thoughts**

Even though it isn’t required on the AP exam, it’s worth having students state the evidence for the alternative hypothesis. It helps students interpret *p*-values and draw conclusions. Students are less tempted to “accept the null hypothesis” because they already know there is evidence against it.

If you aren’t convinced yet, try the following: Give students the cereal box problem from earlier, but make the sample proportion 15/65 (0.231). A student who thinks about the *whole* story won’t waste time going through the full significance test procedure. If the sample proportion is greater than 0.20, there isn’t convincing evidence for *Ha* because there isn’t *any* evidence for *Ha*. And that’s the end of this (short) story.