To calculate the linear regression, residuals, and correlation between the number of people and the cost of drinks, you can follow these steps:
1. First, create a table of the given data:
Number of People: 12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22
Cost of Drinks: 24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85
2. Calculate the mean (average) of the number of people and the cost of drinks. Let's denote them as num_mean and cost_mean, respectively.
num_mean = (12 + 14 + 15 + 18 + 20 + 16 + 14 + 13 + 18 + 19 + 20 + 22) / 12
= 201 / 12
= 16.75
cost_mean = (24 + 30 + 36 + 38 + 65 + 44 + 36 + 30 + 39 + 76 + 80 + 85) / 12
= 583 / 12
= 48.58 (rounded to two decimal places)
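As a quick check on the arithmetic in step 2, the two means can be recomputed in a few lines of Python. This is only a minimal sketch; the list names people and cost are chosen here for illustration and are not part of the original problem.

```python
# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)

# Mean = sum of the values divided by the number of values.
num_mean = sum(people) / n
cost_mean = sum(cost) / n

print(f"sum of people = {sum(people)}, num_mean  = {num_mean:.2f}")
print(f"sum of cost   = {sum(cost)}, cost_mean = {cost_mean:.2f}")
```

Printing the raw sums alongside the means makes it easy to spot an addition slip before it propagates into the later steps.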
3. Calculate the deviations from the mean for both variables (number of people and cost of drinks) and create a new table:
Number of People: X | Deviation from Mean (X - num_mean) | Cost of Drinks: Y | Deviation from Mean (Y - cost_mean)
-----|---------------------------------------| -----|-------------------------------------------
12 | 12 - 16.75 = -4.75 | 24 | 24 - 48.58 = -24.58
14 | 14 - 16.75 = -2.75 | 30 | 30 - 48.58 = -18.58
15 | 15 - 16.75 = -1.75 | 36 | 36 - 48.58 = -12.58
18 | 18 - 16.75 = 1.25 | 38 | 38 - 48.58 = -10.58
20 | 20 - 16.75 = 3.25 | 65 | 65 - 48.58 = 16.42
16 | 16 - 16.75 = -0.75 | 44 | 44 - 48.58 = -4.58
14 | 14 - 16.75 = -2.75 | 36 | 36 - 48.58 = -12.58
13 | 13 - 16.75 = -3.75 | 30 | 30 - 48.58 = -18.58
18 | 18 - 16.75 = 1.25 | 39 | 39 - 48.58 = -9.58
19 | 19 - 16.75 = 2.25 | 76 | 76 - 48.58 = 27.42
20 | 20 - 16.75 = 3.25 | 80 | 80 - 48.58 = 31.42
22 | 22 - 16.75 = 5.25 | 85 | 85 - 48.58 = 36.42
4. Calculate the product of the deviations (X - num_mean) * (Y - cost_mean) for each row and add them up:
Sum of (X - num_mean) * (Y - cost_mean) = (12 - 16.75) * (24 - 48.58) +
(14 - 16.75) * (30 - 48.58) +
(15 - 16.75) * (36 - 48.58) +
(18 - 16.75) * (38 - 48.58) +
(20 - 16.75) * (65 - 48.58) +
(16 - 16.75) * (44 - 48.58) +
(14 - 16.75) * (36 - 48.58) +
(13 - 16.75) * (30 - 48.58) +
(18 - 16.75) * (39 - 48.58) +
(19 - 16.75) * (76 - 48.58) +
(20 - 16.75) * (80 - 48.58) +
(22 - 16.75) * (85 - 48.58)
= (-4.75) * (-24.58) +
(-2.75) * (-18.58) +
(-1.75) * (-12.58) +
(1.25) * (-10.58) +
(3.25) * (16.42) +
(-0.75) * (-4.58) +
(-2.75) * (-12.58) +
(-3.75) * (-18.58) +
(1.25) * (-9.58) +
(2.25) * (27.42) +
(3.25) * (31.42) +
(5.25) * (36.42)
= 680.75
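The sum of products in step 4 is easy to get wrong by hand, so a short Python sketch can serve as a cross-check. The names people, cost, products, and sum_xy are illustrative assumptions. The code uses the unrounded means, so individual products can differ from hand-computed ones in the second decimal place, while the total should agree.

```python
# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)
num_mean = sum(people) / n
cost_mean = sum(cost) / n

# One product of deviations per row of the table in step 3.
products = [(x - num_mean) * (y - cost_mean) for x, y in zip(people, cost)]
sum_xy = sum(products)

for x, y, p in zip(people, cost, products):
    print(f"X={x:2d}  Y={y:2d}  product={p:8.2f}")
print(f"Sum of products = {sum_xy:.2f}")
```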
5. Calculate the sum of squares of deviations of X (Number of People) and Y (Cost of Drinks):
Sum of (X - num_mean)^2 = (12 - 16.75)^2 + (14 - 16.75)^2 + (15 - 16.75)^2 + (18 - 16.75)^2 + (20 - 16.75)^2 +
(16 - 16.75)^2 + (14 - 16.75)^2 + (13 - 16.75)^2 + (18 - 16.75)^2 + (19 - 16.75)^2 +
(20 - 16.75)^2 + (22 - 16.75)^2
= 112.25
Sum of (Y - cost_mean)^2 = (24 - 48.58)^2 + (30 - 48.58)^2 + (36 - 48.58)^2 + (38 - 48.58)^2 + (65 - 48.58)^2 +
(44 - 48.58)^2 + (36 - 48.58)^2 + (30 - 48.58)^2 + (39 - 48.58)^2 + (76 - 48.58)^2 +
(80 - 48.58)^2 + (85 - 48.58)^2
= 5170.92 (rounded to two decimal places)
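The two sums of squares in step 5 can be checked the same way. This is a minimal sketch with illustrative names (sum_xx and sum_yy are not from the original problem).

```python
# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)
num_mean = sum(people) / n
cost_mean = sum(cost) / n

# Sum of squared deviations from the mean for each variable.
sum_xx = sum((x - num_mean) ** 2 for x in people)
sum_yy = sum((y - cost_mean) ** 2 for y in cost)

print(f"Sum of (X - num_mean)^2  = {sum_xx:.2f}")
print(f"Sum of (Y - cost_mean)^2 = {sum_yy:.2f}")
```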
6. Calculate the slope (b) of the linear regression line:
b = Sum of (X - num_mean) * (Y - cost_mean) / Sum of (X - num_mean)^2
= 680.75 / 112.25
= 6.0646 (rounded to four decimal places, to limit rounding error in the later steps)
7. Calculate the y-intercept (a) of the linear regression line:
a = cost_mean - (b * num_mean)
= 48.58 - (6.0646 * 16.75)
= -53.00 (rounded to two decimal places)
8. The linear regression equation will be in the form of:
cost_of_drinks = a + b * number_of_people
Substituting the values of a and b, we get:
cost_of_drinks = -53.00 + 6.0646 * number_of_people
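Steps 6 to 8 can be sketched in Python as follows. The helper predict_cost is a hypothetical name introduced only for this illustration; the slope and intercept are computed from the unrounded sums, so they may differ slightly from hand-rounded values.

```python
# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)
num_mean = sum(people) / n
cost_mean = sum(cost) / n

sum_xy = sum((x - num_mean) * (y - cost_mean) for x, y in zip(people, cost))
sum_xx = sum((x - num_mean) ** 2 for x in people)

# Slope: sum of products of deviations over sum of squared X deviations.
b = sum_xy / sum_xx
# Intercept: the fitted line passes through (num_mean, cost_mean).
a = cost_mean - b * num_mean


def predict_cost(number_of_people):
    """Hypothetical helper: predicted cost of drinks from the fitted line."""
    return a + b * number_of_people


print(f"b = {b:.4f}")
print(f"a = {a:.2f}")
print(f"cost_of_drinks = {a:.2f} + {b:.4f} * number_of_people")
```

A useful sanity check is that the slope should be on the order of a few cost units per person, since cost rises from about 24 to 85 as the group grows from 12 to 22.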
9. Calculate the residuals (differences between actual data points and predicted values):
For each data point (X, Y), calculate Y_pred using the linear regression equation, and then calculate the difference (residual).
For example, let's calculate the residual for the first data point (12, 24):
Y_pred = -53.00 + 6.0646 * 12
= -53.00 + 72.78
= 19.78 (rounded to two decimal places)
Residual = 24 - 19.78
= 4.22 (rounded to two decimal places)
Calculate residuals for each data point in a similar manner.
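Rather than repeating the calculation twelve times by hand, the residuals for all data points can be computed in a loop. The sketch below refits the line from the data so that it is self-contained; all variable names are illustrative assumptions.

```python
# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)
num_mean = sum(people) / n
cost_mean = sum(cost) / n

# Least-squares slope and intercept, as in steps 6 and 7.
b = (sum((x - num_mean) * (y - cost_mean) for x, y in zip(people, cost))
     / sum((x - num_mean) ** 2 for x in people))
a = cost_mean - b * num_mean

# Residual = actual value minus predicted value.
predictions = [a + b * x for x in people]
residuals = [y - y_pred for y, y_pred in zip(cost, predictions)]

for x, y, y_pred, r in zip(people, cost, predictions, residuals):
    print(f"X={x:2d}  Y={y:2d}  Y_pred={y_pred:6.2f}  residual={r:7.2f}")
```

For a least-squares line with an intercept, the residuals should sum to zero up to floating-point error, which gives a simple check on the whole calculation.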
10. Finally, to calculate the correlation between the two variables, divide the sum of (X - num_mean) * (Y - cost_mean) by the square root of the product of the sum of (X - num_mean)^2 and the sum of (Y - cost_mean)^2.
correlation = (Sum of (X - num_mean) * (Y - cost_mean)) / sqrt((Sum of (X - num_mean)^2) * (Sum of (Y - cost_mean)^2))
= 680.75 / sqrt(112.25 * 5170.92)
= 0.8935 (rounded to four decimal places)
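The correlation formula in step 10 translates directly into Python. This is a minimal sketch using only the standard math module; the names sum_xy, sum_xx, sum_yy, and r are illustrative assumptions.

```python
import math

# Data from step 1 (list names are illustrative assumptions).
people = [12, 14, 15, 18, 20, 16, 14, 13, 18, 19, 20, 22]
cost = [24, 30, 36, 38, 65, 44, 36, 30, 39, 76, 80, 85]

n = len(people)
num_mean = sum(people) / n
cost_mean = sum(cost) / n

sum_xy = sum((x - num_mean) * (y - cost_mean) for x, y in zip(people, cost))
sum_xx = sum((x - num_mean) ** 2 for x in people)
sum_yy = sum((y - cost_mean) ** 2 for y in cost)

# Pearson correlation coefficient; always lies between -1 and 1.
r = sum_xy / math.sqrt(sum_xx * sum_yy)

print(f"correlation = {r:.4f}")
```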
Therefore, the linear regression equation is cost_of_drinks = -53.00 + 6.0646 * number_of_people, the residual for each data point is its actual cost minus the cost predicted by this equation, and the correlation between the number of people and the cost of drinks is 0.8935, which indicates a strong positive linear relationship.