Question

Graphing Scatterplots and Trend Lines
Before data is entered into the regression calculator, some data may be excluded. Explain why the data pair (0, 100) was excluded for the comparison of hours slept the night before and the test score. Describe what this data point represents. If it was included, would the trend line describe the data well?

GPT-5 mini · Answer

The point (0, 100) was excluded because it is an outlier with high leverage in the sleep-score relationship, so it would distort the regression. It represents a student who reported 0 hours of sleep the night before the test yet scored 100.

Why that justifies exclusion
- It’s unlikely to be representative of the process we want to model (most students sleep some positive amount); the point may reflect a data-entry error, cheating, or an exceptional case (e.g., already mastered the material).
- Statistically it is a high-leverage outlier: at x = 0 it lies far from the x-values of the main cluster, so it can strongly pull the fitted line away from where it would lie for the bulk of the data.
- A single point like this can make the assumptions of the linear model (e.g., linearity, constant residual variance) look violated even when they hold for the main group.
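
The leverage claim above can be checked numerically. The sketch below uses the standard leverage formula for simple linear regression, h_i = 1/n + (x_i - mean)^2 / Sxx, and the common 2p/n rule of thumb (p = 2 coefficients). The `hours` values are hypothetical, made up for illustration; they are not the data from the exercise.

```python
# Hypothetical hours-slept values for illustration only (not the exercise data).
# The last entry is the x-value of the (0, 100) point.
hours = [5, 6, 6, 7, 7, 8, 8, 9, 0]

def leverages(xs):
    """Leverage of each point in a simple linear regression on xs.

    Leverage depends only on the x-values: 1/n + (x - mean)^2 / Sxx.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

h = leverages(hours)
threshold = 2 * 2 / len(hours)  # rule of thumb: 2p/n with p = 2 coefficients

for x, h_i in zip(hours, h):
    flag = "high leverage" if h_i > threshold else ""
    print(f"x = {x}: leverage = {h_i:.3f} {flag}")
```

With these made-up values, only the x = 0 point exceeds the threshold, because it sits far from the mean of the other x-values.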

What happens if it’s included
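- The regression line will be pulled toward that point, so its slope and intercept can misrepresent the majority of observations. If scores otherwise rise with hours slept, a perfect score at 0 hours flattens or even reverses the slope, and R² and the residuals change substantially.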
- In short, the trend line would not describe the main body of the data well if that point is a nonrepresentative outlier.
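
To see the effect described above, one can fit the trend line twice, once without and once with the point. This is a minimal sketch using an ordinary least-squares fit written by hand; the `hours` and `scores` values are hypothetical and stand in for the exercise's actual table, and `fit_line` is an illustrative helper, not part of any regression calculator.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical data for illustration only (not the exercise data).
hours = [5, 6, 6, 7, 7, 8, 8, 9]
scores = [62, 68, 71, 75, 78, 83, 85, 90]

# Fit once without the (0, 100) point and once with it appended.
slope_wo, intercept_wo, r2_wo = fit_line(hours, scores)
slope_w, intercept_w, r2_w = fit_line(hours + [0], scores + [100])

print(f"without (0, 100): slope = {slope_wo:.2f}, intercept = {intercept_wo:.2f}, R^2 = {r2_wo:.3f}")
print(f"with (0, 100):    slope = {slope_w:.2f}, intercept = {intercept_w:.2f}, R^2 = {r2_w:.3f}")
```

With these made-up numbers, the slope is positive without the point and negative with it, and R² falls sharply, so the line that includes (0, 100) no longer describes the main cluster.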

Recommended approach
- Verify the point (check for data-entry error or special circumstances).
- If it is a valid but exceptional observation, report results both with and without it (or use robust regression) and justify whichever choice you make.
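
As one possible form of robust regression, the sketch below uses the Theil-Sen estimator (the median of all pairwise slopes), which is less sensitive to a single extreme point than least squares. This is only one robust method among several, chosen here for its simplicity; the data are hypothetical and `theil_sen` is an illustrative helper.

```python
from itertools import combinations
from statistics import median

def theil_sen(xs, ys):
    """Theil-Sen fit: median pairwise slope, then median intercept."""
    points = list(zip(xs, ys))
    # Slopes between every pair of points with distinct x-values.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(points, 2)
        if x1 != x2
    ]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in points)
    return slope, intercept

# Hypothetical data for illustration only; the last pair is the (0, 100) point.
hours = [5, 6, 6, 7, 7, 8, 8, 9, 0]
scores = [62, 68, 71, 75, 78, 83, 85, 90, 100]

robust_slope, robust_intercept = theil_sen(hours, scores)
print(f"Theil-Sen with (0, 100) included: slope = {robust_slope:.2f}, intercept = {robust_intercept:.2f}")
```

With these made-up numbers, the robust slope stays positive even though (0, 100) is included, because the eight slopes involving that point are outvoted by the slopes within the main cluster.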
