To find the correlation coefficient \( r \) for the dataset given by the points \((-3, -40)\), \( (1, 12)\), \( (5, 72)\), and \( (7, 137)\), we can use the formula for the Pearson correlation coefficient, which is:
\[ r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n \sum x^2 - (\sum x)^2][n \sum y^2 - (\sum y)^2]}} \]
Where:
- \( n \) is the number of data points
- \( x \) and \( y \) are the paired values of the two variables being correlated, and each \( \sum \) runs over all \( n \) points
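The formula above translates directly into code. Below is a minimal Python sketch of that raw-sum formula; the function name `pearson_r` is a hypothetical choice for illustration, and the sketch assumes two equal-length sequences with at least two points and with neither variable constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson r using the raw-sum (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    # Each bracket equals n^2 times the population variance of that variable,
    # so it is zero exactly when all of that variable's values are equal.
    bracket_x = n * sum_x2 - sum_x ** 2
    bracket_y = n * sum_y2 - sum_y ** 2
    if bracket_x == 0 or bracket_y == 0:
        raise ValueError("r is undefined when x or y is constant")
    return numerator / math.sqrt(bracket_x * bracket_y)


xs = [-3, 1, 5, 7]
ys = [-40, 12, 72, 137]
print(round(pearson_r(xs, ys), 3))  # prints 0.982
```

If Python 3.10 or later is available, the standard library's `statistics.correlation(xs, ys)` computes the same Pearson coefficient and makes a convenient cross-check.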
Step 1: Calculate the required sums
- Calculate \( n \): \[ n = 4 \quad (\text{since there are 4 points}) \]
- Calculate \( \sum x \), \( \sum y \), \( \sum xy \), \( \sum x^2 \), and \( \sum y^2 \):
- Points:
  - \( (-3, -40) \)
  - \( (1, 12) \)
  - \( (5, 72) \)
  - \( (7, 137) \)
- Calculations:
  \[ \sum x = -3 + 1 + 5 + 7 = 10 \]
  \[ \sum y = -40 + 12 + 72 + 137 = 181 \]
  \[ \sum xy = (-3)(-40) + (1)(12) + (5)(72) + (7)(137) = 120 + 12 + 360 + 959 = 1451 \]
  \[ \sum x^2 = (-3)^2 + (1)^2 + (5)^2 + (7)^2 = 9 + 1 + 25 + 49 = 84 \]
  \[ \sum y^2 = (-40)^2 + (12)^2 + (72)^2 + (137)^2 = 1600 + 144 + 5184 + 18769 = 25697 \]
Step 2: Substitute into the formula
Now we can substitute these values into the formula for \( r \):
\[ r = \frac{4(1451) - (10)(181)}{\sqrt{[4(84) - (10)^2][4(25697) - (181)^2]}} \]
Calculating the components step-by-step:
- Calculating \( n \sum xy - (\sum x)(\sum y) \): \[ = 4(1451) - (10)(181) = 5804 - 1810 = 3994 \]
- Calculating \( n \sum x^2 - (\sum x)^2 \): \[ = 4(84) - (10)^2 = 336 - 100 = 236 \]
- Calculating \( n \sum y^2 - (\sum y)^2 \): \[ = 4(25697) - (181)^2 = 102788 - 32761 = 70027 \]
Now, substituting these into the formula for \( r \):
\[ r = \frac{3994}{\sqrt{236 \cdot 70027}} \]
Calculating the denominator:
\[ \sqrt{236 \cdot 70027} = \sqrt{16526372} \approx 4065.26 \]
So,
\[ r \approx \frac{3994}{4065.26} \approx 0.982 \]
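Because every intermediate quantity in Steps 1 and 2 is an integer, the hand arithmetic can be audited exactly. The Python sketch below recomputes each sum and bracket with integer arithmetic, then cross-checks \( r \) against the mean-centered (deviation) form \( r = S_{xy} / \sqrt{S_{xx} S_{yy}} \), which is algebraically the same coefficient. The variable names are illustrative only.

```python
import math

points = [(-3, -40), (1, 12), (5, 72), (7, 137)]
n = len(points)

# Step 1: raw sums (all exact integers)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)
sum_y2 = sum(y * y for _, y in points)

# Step 2: numerator and the two bracketed terms under the square root
numerator = n * sum_xy - sum_x * sum_y
bracket_x = n * sum_x2 - sum_x ** 2
bracket_y = n * sum_y2 - sum_y ** 2
r_raw = numerator / math.sqrt(bracket_x * bracket_y)

# Cross-check with the deviation form, which works with distances from the means
mean_x = sum_x / n
mean_y = sum_y / n
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
s_xx = sum((x - mean_x) ** 2 for x, _ in points)
s_yy = sum((y - mean_y) ** 2 for _, y in points)
r_dev = s_xy / math.sqrt(s_xx * s_yy)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 10 181 1451 84 25697
print(numerator, bracket_x, bracket_y)       # 3994 236 70027
print(round(r_raw, 4), round(r_dev, 4))      # 0.9825 0.9825
```

The two forms agree because each deviation sum is the corresponding bracket divided by \( n \) (for example, \( S_{xx} = 236 / 4 = 59 \)), so the factors of \( n \) cancel in the ratio.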
Step 3: Interpret the correlation coefficient
With \( r \approx 0.982 \), we can describe the correlation as follows:
- The correlation coefficient \( r = 0.982 \) indicates a very strong positive linear relationship between the two variables represented by the points.
- This means that as the x-values increase, the y-values also tend to increase, and the data points lie close to the line of best fit.
In summary, a correlation coefficient of about \( 0.982 \) indicates a strong, highly predictable linear association between the variables. Squaring the unrounded value \( r \approx 0.9825 \) gives \( r^2 \approx 0.965 \), so approximately 96.5% of the variability in y can be explained by the linear relationship with x.
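The "share of variability explained" reading of \( r^2 \) can be checked directly: for a simple least-squares line with an intercept, the coefficient of determination \( R^2 = 1 - SS_{res}/SS_{tot} \) equals \( r^2 \). The Python sketch below fits the line with the standard closed-form slope and intercept and then computes \( R^2 \); the variable names are illustrative, not from any particular library.

```python
points = [(-3, -40), (1, 12), (5, 72), (7, 137)]
n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n

# Least-squares slope b1 = Sxy / Sxx and intercept b0 = mean_y - b1 * mean_x
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
s_xx = sum((x - mean_x) ** 2 for x, _ in points)
b1 = s_xy / s_xx
b0 = mean_y - b1 * mean_x

# Residual sum of squares (around the fitted line) and total sum of squares (around the mean)
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in points)
ss_tot = sum((y - mean_y) ** 2 for _, y in points)
r_squared = 1 - ss_res / ss_tot

print(f"fitted line: y = {b0:.2f} + {b1:.2f}x, R^2 = {r_squared:.3f}")
```

Here \( SS_{tot} \) measures the total spread of y around its mean and \( SS_{res} \) is the spread that remains after fitting the line, so \( R^2 \) is the fraction of that spread the line accounts for.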