# Unit 2: Exploring Two-Variable Data

Updated: Sep 22, 2019

The new __College Board Course and Exam Description__ (CED) sets a new standard for content and pacing in AP Statistics, and we have been making some adjustments to our __daily lesson plans__. Just as we made some changes in Unit 1: Exploring One-Variable Data (__mosaic plots__ and a __new definition of percentile__), the College Board's changes to Unit 2: Exploring Two-Variable Data require us to make a few tweaks.

**Nonlinear Data**

In the past, nonlinear data was the caboose of our course, taught just after inference for linear regression. Now we will teach these lessons alongside the rest of the two-variable analysis. Here is our new pacing guide:

One of the bonuses of this new schedule is that students now have the option to use a nonlinear model to make their final prediction for the Barbie Bungee Finale.

**Outliers, Influential Points, and High-Leverage Points, Oh My!**

Holy moly. There is a lot of vocabulary to keep track of here. Let's look at the CED definition for each:

Outlier: An outlier in regression is a point that does not follow the general trend shown in the rest of the data and has a large residual when the Least Squares Regression Line (LSRL) is calculated.

Notice that the CED is requiring an outlier "has a large residual."

Influential point: An influential point in regression is any point that, if removed, changes the relationship substantially. Examples include much different slope, y-intercept, and/or correlation.

This one feels pretty familiar. Usually we have students find the slope, y-intercept, and correlation of a set of data, then remove a point and re-calculate each of these values. If any of the values changes substantially (what does this even mean?), then the point is influential.

High-leverage point: A high-leverage point in regression has a substantially larger or smaller x-value than the other observations have.

So here we are looking at only one variable (x). If one of the points has an x-value that is substantially larger or smaller (doesn't this sound like the one-variable definition of outlier?), then the point is considered high-leverage.
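Since the high-leverage definition looks only at x, one way to make "substantially larger or smaller" concrete is to borrow the one-variable 1.5 × IQR outlier rule and apply it to the x-values. The Python sketch below does that; using the 1.5 × IQR rule here is our assumption (the CED does not name a cutoff), the function names are hypothetical, and the quartiles are computed as medians of the lower and upper halves, leaving out the overall median when n is odd.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    When n is odd, the middle value is excluded from both halves.
    Requires at least two values.
    """
    v = sorted(values)
    half = len(v) // 2
    lower = v[:half]
    upper = v[half + 1:] if len(v) % 2 else v[half:]
    return median(lower), median(upper)


def high_leverage_indices(xs, k=1.5):
    """Indices of points whose x-value lies outside Q1 - k*IQR or Q3 + k*IQR."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, x in enumerate(xs) if x < low or x > high]


# Hypothetical x-values: one point sits far to the right of the others.
print(high_leverage_indices([1, 2, 3, 4, 5, 20]))  # → [5], the x-value 20
```

Only the x-values are passed in, which mirrors the definition: a point can be high-leverage no matter what its y-value is.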

**When Should I Teach Unit 2?**

Some teachers prefer to save this unit until the end of the course and pair it with __inference for linear regression__. This is a good option if you are trying to save some days, since you won't need to review anything before jumping into inference for linear regression. We prefer to teach this unit early in the year, let students forget everything, and then give them a refresher at the end of the course.