
Assessing a Regression Model (Lesson 3.7)

Chapter 3 - Day 8

Learning Targets
  • Use a residual plot to determine whether a regression model is appropriate.
  • Interpret the standard deviation of the residuals.
  • Interpret r^2.

Activity: How Many iPhones Will be Sold?

Experience First

In this activity, students will be introduced to the residual plot. It might be worth reminding them before the activity that a residual is the difference between the actual y value and the predicted y value (residual = actual y - predicted y). On the scatterplot, the residual is the vertical distance between the point and the least-squares regression line; it is positive when the point lies above the line and negative when it lies below.
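For teachers who want to check the reminder above by hand, here is a minimal sketch of the residual calculation. The five data points are made up purely for illustration (they are not the iPhone data), and the function names are hypothetical helpers, not part of the applet.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def residuals(xs, ys, slope, intercept):
    """Residual = actual y - predicted y, one value per data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustrative data (NOT the iPhone data from the activity).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(xs, ys)
res = residuals(xs, ys, slope, intercept)

for x, y, e in zip(xs, ys, res):
    position = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"x={x}  actual y={y}  residual={e:+.2f}  (point is {position} the line)")
```

For a least-squares line the residuals always sum to 0 (up to rounding), which is a quick sanity check on the arithmetic.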

For the activity, students will use the 2 Quantitative Variable applet. The iPhone data for this activity will be the first nonlinear data they see: there is a clear curved pattern in iPhone sales over time. The overall purpose of this lesson is to decide which of several competing models would do the best job of making predictions. We will use the residual plot, the standard deviation of the residuals (s), and the coefficient of determination (r^2) to help decide which model is best.

Formalize Later

A mathematical model (equation) is a poor choice for making predictions when its residual plot shows a leftover curved pattern. We tell students to look out for residual plots that have a “smiley face” or “frown face” pattern. Ideally, the residual plot has no leftover curved pattern at all, just an even and random scatter of points above and below 0.
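To see where a “smiley face” comes from, here is a small sketch that fits a straight line to data that are actually curved and prints the residuals. The data are made up (y = x^2 exactly, not the iPhone data), and the crude text display is only a stand-in for a real residual plot.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Made-up curved data: y = x^2 for x = 0, 1, ..., 6 (illustration only).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Fit a LINEAR model even though the relationship is curved.
slope, intercept = least_squares_line(xs, ys)
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Crude text version of a residual plot: one row per x value.
for x, e in zip(xs, res):
    sign = "+" if e > 0 else "-" if e < 0 else "0"
    print(f"x={x}  residual={e:+6.1f}  {sign * max(1, round(abs(e)))}")
```

The residuals are positive at both ends and negative in the middle, which is the leftover curved pattern that signals the linear model is not appropriate.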

We try to relate s (the standard deviation of the residuals) back to the scatterplot and the residual plot by recognizing that s tells us roughly how far a typical point is from the model in the scatterplot or, equivalently, how far a typical residual is from the horizontal line at 0 in the residual plot. We like the use of “typical” distance in the interpretation in the text because it is consistent with the interpretation of standard deviation from Lesson 1.7 (after all, s is a standard deviation).
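As a rough sketch of why “typical” is a safer word than “average,” the code below computes s for a least-squares line using the usual formula, the square root of (sum of squared residuals) / (n - 2), and compares it with the plain average of the absolute residuals. The data are made up for illustration, and the n - 2 denominator assumes a straight-line model with two estimated parameters.

```python
import math

# Made-up illustrative data (NOT the iPhone data from the activity).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least-squares line.
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = actual y - predicted y.
res = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals for a linear model (n - 2 in the denominator).
s = math.sqrt(sum(e ** 2 for e in res) / (n - 2))

# Plain average distance from the line, for comparison only.
mean_abs = sum(abs(e) for e in res) / n

print(f"s = {s:.3f}")
print(f"average |residual| = {mean_abs:.3f}")
```

The two numbers are close but not equal, so s is best described as the typical size of a prediction error rather than the literal average distance.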

Students already have some good knowledge about the correlation (r), notably that an r close to -1 or +1 suggests a strong linear relationship. When we square the correlation to get the coefficient of determination, the result is always between 0 and 1, and we are looking for a value as close to 1 as possible: r^2 measures the proportion of the variability in y that is accounted for by the model.
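The connection between r and r^2 can be checked numerically. The sketch below, using made-up data, computes r^2 two ways for a least-squares line: by squaring the correlation, and as 1 minus (sum of squared residuals) / (total sum of squares about the mean of y). The variable names are illustrative only.

```python
import math

# Made-up illustrative data (NOT the iPhone data from the activity).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)   # total variability in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation r, then squared.
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Same quantity from the residuals of the least-squares line.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_residuals = 1 - ss_res / syy

print(f"r = {r:.4f}")
print(f"r^2 (squaring r)       = {r_squared_from_r:.4f}")
print(f"r^2 (1 - SSres/SStot)  = {r_squared_from_residuals:.4f}")
```

For these numbers both routes give 0.6, which would be read as “about 60% of the variability in y is accounted for by the least-squares line.”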

It is important that students use precise statistical language when trying to choose the best model for a set of data. All too often, students say “it is quadratic because it shows no leftover pattern.” Here “it” is vague and needs to be clarified by saying “the scatterplot of iPhone sales versus year is quadratic because the residual plot for a quadratic model shows no leftover pattern.” Students should know the distinction between a scatterplot and a residual plot. We are trying to find a model for the scatterplot. The residual plot tells us whether or not our model is a good one.
