Question 3 [17 Marks]
Five (5) datasets, A to E, are provided for this question. Each group will be assigned ONE dataset
to analyze and must answer the questions below. The assigned dataset will be announced by your
lecturer in a separate announcement in MMLS.
Below are the descriptions of each dataset:
Dataset A: A high level of serum cholesterol may increase the risk of cardiovascular disease.
Several studies have found that body weight is closely related to cardiovascular disease.
In this study, we would like to investigate whether body weight (in pounds) influences the level
of serum cholesterol (in mg), in order to identify prevention strategies for cardiovascular
disease.
Dataset B: This dataset gives the values of a food price index and a house price measure for a particular
state in the UK from 1971 to 1989. In this study, we would like to investigate whether the food price index
(in %) has a significant effect on the average price (in $100) of houses in the state.
Dataset C: This dataset lists the standardized mortality ratio (SMR) of deaths from lung cancer for 25
job groups among male employees. Smoking ratios for each job group are also given. In this study, we
would like to identify whether the smoking ratio influences the lung cancer SMR.
Dataset D: This dataset comes from a study in Britain of possible influences on the extent to which parents
consult a doctor when their child is ill. The parents were asked how often each child had been ill in
the past two weeks and how often the illness was reported to the doctor. The age of each child was also
recorded. Does age have an effect on the rate of reporting illness to doctors?
Dataset E: This dataset consists of 36 consecutive months of sales and advertising expenditures for a
dietary weight control product. This study investigates whether total advertising expenditure has a
significant effect on total sales.
Using the dataset in the EXCEL spreadsheet, answer the following questions.
a. Determine the dependent variable, Y, and the independent variable, X, for your dataset based on the
description provided. [2 marks]
b. Using an EXCEL function, generate the simple linear regression model between the dependent
variable, Y, and the independent variable, X. [3 marks]
c. Determine the estimated regression equation from the output in (b) and interpret the coefficients.
[3 marks]
d. Can we infer that the dependent variable has significant association with the independent
variable? [4 marks]
e. State the coefficient of determination and interpret its value. [2 marks]
f. Graph the relationship between the two variables as a scatter plot. Discuss the relationship between
the variables based on how closely the points follow a straight-line pattern. [3 marks]
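
For groups who want to cross-check their EXCEL output for parts (b) to (e), the quantities involved can be sketched in Python as below. This is only an illustrative sketch and not part of the required answer: the function name simple_linear_regression and the five (x, y) pairs are hypothetical and do not come from Datasets A to E. The slope, intercept and R-squared it returns correspond to what the EXCEL functions SLOPE, INTERCEPT and RSQ report, and the t statistic is the one used to test whether the slope differs from zero.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x (hypothetical helper for illustration)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length, with at least 3 points")

    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx                      # b1: change in Y per unit change in X
    intercept = y_mean - slope * x_mean    # b0: predicted Y when X = 0

    # Residual sum of squares and coefficient of determination
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    r_squared = 1 - sse / syy

    # Standard error of the slope and t statistic for H0: slope = 0 (n - 2 d.f.)
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    t_stat = slope / se_slope if se_slope > 0 else math.inf

    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": r_squared,
        "t_stat": t_stat,
        "df": n - 2,
    }


# Made-up data for illustration only; replace with the X and Y columns of your dataset.
x_demo = [1, 2, 3, 4, 5]
y_demo = [2.1, 3.9, 6.2, 7.8, 10.1]
result = simple_linear_regression(x_demo, y_demo)
print(result)
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom at the chosen significance level, which is the same test reported for the X coefficient in a regression output table.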