A medical statistician wanted to examine the relationship between the amount of sunshine (x) and incidence of skin cancer (y). As an experiment, he found the number of skin cancers detected per 100,000 of population and the average daily sunshine in eight counties around the country. These data are shown below.

Average Daily Sunshine (x):   5   7   6   7   8   6   4   3
Skin Cancer per 100,000 (y):  7  11   9  12  15  10   7   5

Find the least squares regression line.

a) ŷ = 4.522 - 11.465x
b) ŷ = -6.165 + 6.211x
c) ŷ = -1.115 + 1.846x
d) ŷ = 15.498 - 14.355x

1 answer

I'll give you a hint. It's either b) or c). Use a regression formula if you need to do this by hand.

Here is one you might use:

predicted y = a + bx
...where a represents the y-intercept and b the slope.

To get to that point, here are some formulas to calculate along the way.

To find a:
a = (Ey/n) - b(Ex/n)

Note: E here means to add up or to find the total.

To find b:
b = SSxy/SSxx

To find SSxy:
SSxy = Exy - [(Ex)(Ey)]/n

To find SSxx:
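SSxx = Ex^2 - [(Ex)(Ex)]/n

Note: Ex^2 means to square each x value first and then add up the squares; (Ex)(Ex) is the total of the x values multiplied by itself.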
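If you would rather check your hand calculation with a few lines of code, here is a minimal Python sketch of the formulas above. The function name least_squares_line and the variable names are my own choices for illustration, and the four sample points are made up (they lie exactly on y = 1 + 2x) so the result is easy to verify by eye.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for predicted y = a + b*x using the formulas above."""
    n = len(xs)
    sum_x = sum(xs)                                 # Ex
    sum_y = sum(ys)                                 # Ey
    sum_xy = sum(x * y for x, y in zip(xs, ys))     # Exy
    sum_x2 = sum(x * x for x in xs)                 # Ex^2

    ss_xy = sum_xy - (sum_x * sum_y) / n            # SSxy
    ss_xx = sum_x2 - (sum_x * sum_x) / n            # SSxx

    b = ss_xy / ss_xx                               # slope
    a = (sum_y / n) - b * (sum_x / n)               # y-intercept
    return a, b


# Made-up points that lie exactly on y = 1 + 2x, used only as a sanity check.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"predicted y = {a:.3f} + {b:.3f}x")  # prints: predicted y = 1.000 + 2.000x
```

Passing the sunshine values as xs and the skin cancer values as ys should give you an intercept and slope to compare against the answer choices.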

It may help to set up a table of values. Make columns for the X values, Y values, X^2 values (^2 means squared), Y^2 values, and XY values, then total each column. XY means to take X * Y for each pair of values; X^2 and Y^2 mean to square each X value and each Y value. (The Y^2 column is not needed for the formulas above.) The value n in all formulas is 8, the sample size. Once you have the totals, plug them into the formulas in this order: SSxy and SSxx first, then b, then a, since a depends on b.
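As a rough sketch of that table, the short Python script below builds the five columns from the data in the question and totals each one. The variable names (rows, totals, and so on) are just my own labels, not anything standard.

```python
# Data from the question: sunshine (x) and skin cancers per 100,000 (y).
xs = [5, 7, 6, 7, 8, 6, 4, 3]
ys = [7, 11, 9, 12, 15, 10, 7, 5]

# One row per county: X, Y, X^2, Y^2, XY.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

# Column totals, in the same order: Ex, Ey, Ex^2, Ey^2, Exy.
totals = tuple(sum(col) for col in zip(*rows))
n = len(rows)  # sample size

header = ("X", "Y", "X^2", "Y^2", "XY")
print("".join(f"{h:>6}" for h in header))
for row in rows:
    print("".join(f"{v:>6}" for v in row))
print("".join(f"{t:>6}" for t in totals), " <- totals")
print("n =", n)
```

The last row of the printout holds the totals that go into the SSxy and SSxx formulas.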