A computer system administrator notices that computers running a particular operating system seem to crash more often as the installation of the operating system ages. She measures the time (in minutes) before crash for five computers one month after installation, and for six computers seven months after installation. The results are as follows:

One month after:
209
217
230
221
243

Seven months after:
85
129
201
176
105
149

(a) Explain why it is necessary to check whether the populations are approximately normal before constructing a confidence interval.

Since the sample size is [small/large?], it is necessary to check that the populations are approximately normal.


It is necessary to check whether the populations are approximately normal before constructing a confidence interval because the validity of many statistical methods and procedures, including the construction of confidence intervals, relies on the assumption of normality, especially when sample sizes are small.

When the sample size is small (a common rule of thumb is fewer than 30 observations), the Central Limit Theorem cannot be relied on to make the sampling distribution of the mean approximately normal. The sample mean is then approximately normal only if the population it is drawn from is itself approximately normal. If the underlying populations are far from normal (for example, strongly skewed or containing outliers), the confidence intervals may not achieve the stated confidence level, leading to incorrect conclusions about the data.

In your case, since the sample sizes are small (five computers one month after installation and six computers seven months after installation), it is crucial to verify that the populations from which these samples are drawn are approximately normal. This could be done through graphical methods (such as histograms or Q-Q plots) or statistical tests for normality (like the Shapiro-Wilk test).
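As a rough numerical companion to a Q-Q plot, the sketch below computes the correlation between each sorted sample and the matching standard normal quantiles; values close to 1 are consistent with approximate normality. The helper name `qq_correlation` and the plotting positions (i - 0.5)/n are assumptions made for illustration, and with only five or six points this is a sanity check, not a formal test.

```python
from math import sqrt
from statistics import NormalDist, mean

one_month = [209, 217, 230, 221, 243]
seven_months = [85, 129, 201, 176, 105, 149]

def qq_correlation(data):
    """Pearson correlation between sorted data and standard normal quantiles."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5) / n: one common convention (an assumption here).
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)

print("one month:   ", round(qq_correlation(one_month), 3))
print("seven months:", round(qq_correlation(seven_months), 3))
```

Both correlations come out close to 1, so neither sample shows an obvious departure from normality.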

Thus, the sentence would read as:

Since the sample size is small, it is necessary to check that the populations are approximately normal.

Construct a 90% confidence interval. Round to 1 decimal place.

To construct a 90% confidence interval for the mean time before crash in each group (one month and seven months), we first calculate the mean and standard deviation of each sample, then use them to find the intervals.

Step 1: Calculate the Means and Standard Deviations

One Month After:

  • Data: 209, 217, 230, 221, 243
  • Mean (\( \bar{x}_1 \)): \[ \bar{x}_1 = \frac{209 + 217 + 230 + 221 + 243}{5} = \frac{1120}{5} = 224 \]
  • Standard Deviation (\( s_1 \)): \[ s_1 = \sqrt{\frac{\sum (x_i - \bar{x}_1)^2}{n - 1}} \]
    • First, calculate \( (x_i - \bar{x}_1)^2 \):

      • (209 - 224)² = 225
      • (217 - 224)² = 49
      • (230 - 224)² = 36
      • (221 - 224)² = 9
      • (243 - 224)² = 361
    • Sum of squares: \[ 225 + 49 + 36 + 9 + 361 = 680 \]

    • Now calculate \( s_1 \): \[ s_1 = \sqrt{\frac{680}{5 - 1}} = \sqrt{\frac{680}{4}} = \sqrt{170} \approx 13.0 \]

Seven Months After:

  • Data: 85, 129, 201, 176, 105, 149
  • Mean (\( \bar{x}_2 \)): \[ \bar{x}_2 = \frac{85 + 129 + 201 + 176 + 105 + 149}{6} = \frac{845}{6} \approx 140.8 \]
  • Standard Deviation (\( s_2 \)): \[ s_2 = \sqrt{\frac{\sum (x_i - \bar{x}_2)^2}{n - 1}} \]
    • First, calculate \( (x_i - \bar{x}_2)^2 \):

      • (85 - 140.8)² ≈ 3113.64
      • (129 - 140.8)² ≈ 139.24
      • (201 - 140.8)² ≈ 3624.04
      • (176 - 140.8)² ≈ 1239.04
      • (105 - 140.8)² ≈ 1281.64
      • (149 - 140.8)² ≈ 67.24
    • Sum of squares: \[ 3113.64 + 139.24 + 3624.04 + 1239.04 + 1281.64 + 67.24 \approx 9464.84 \]

    • Now calculate \( s_2 \): \[ s_2 = \sqrt{\frac{9464.84}{6 - 1}} = \sqrt{\frac{9464.84}{5}} \approx \sqrt{1892.968} \approx 43.5 \]
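The hand calculations in Step 1 can be cross-checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration. Note that `stdev` uses the n - 1 denominator, matching the sample standard deviation formula above.

```python
from statistics import mean, stdev

one_month = [209, 217, 230, 221, 243]
seven_months = [85, 129, 201, 176, 105, 149]

for label, data in (("one month", one_month), ("seven months", seven_months)):
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{label}: mean = {mean(data):.1f}, s = {stdev(data):.1f}")
```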

Step 2: Find the Confidence Intervals

To find the 90% confidence interval for each group, we use the t-distribution because the population standard deviations are unknown and the sample sizes are small.

  • For \( n = 5 \) (one-month data), \( \alpha = 0.10 \) gives \( \frac{\alpha}{2} = 0.05 \). The critical value \( t_{0.05, 4} \) (df = n-1 = 4) is approximately 2.132.

  • For \( n = 6 \) (seven-month data), \( \alpha = 0.10 \) gives \( \frac{\alpha}{2} = 0.05 \). The critical value \( t_{0.05, 5} \) (df = n-1 = 5) is approximately 2.015.
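The critical values are normally read from a t-table or obtained from statistical software. As a standard-library-only sketch, they can also be approximated by integrating the t density numerically and bisecting on the upper-tail area. The function names `t_pdf`, `t_cdf`, and `t_critical` are hypothetical helpers written for this example, not part of any library.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(upper_tail, df):
    """Value t such that P(T > t) = upper_tail, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > upper_tail:
            lo = mid  # tail area still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

print(round(t_critical(0.05, 4), 3))
print(round(t_critical(0.05, 5), 3))
```

The results agree with the table values 2.132 and 2.015 to three decimal places.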

One Month: \[ \text{Margin of Error} = t_{0.05, 4} \cdot \frac{s_1}{\sqrt{n_1}} = 2.132 \cdot \frac{13.0}{\sqrt{5}} \approx 2.132 \cdot 5.8 \approx 12.4 \]

90% CI for one month: \[ (\bar{x}_1 - \text{Margin of Error}, \bar{x}_1 + \text{Margin of Error}) = (224 - 12.4, 224 + 12.4) = (211.6, 236.4) \] Rounded: \[ (211.6, 236.4) \]

Seven Months: \[ \text{Margin of Error} = t_{0.05, 5} \cdot \frac{s_2}{\sqrt{n_2}} = 2.015 \cdot \frac{43.5}{\sqrt{6}} \approx 2.015 \cdot 17.76 \approx 35.8 \]

90% CI for seven months: \[ (\bar{x}_2 - \text{Margin of Error}, \bar{x}_2 + \text{Margin of Error}) = (140.8 - 35.8, 140.8 + 35.8) = (105.0, 176.6) \] Rounded: \[ (105.0, 176.6) \]

Summary of Confidence Intervals

  • One month after installation: (211.6, 236.4)
  • Seven months after installation: (105.0, 176.6)

Each interval estimates the mean time before crash (in minutes) for its group at the 90% confidence level.
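The interval calculation in Step 2 can be reproduced end to end with a short script. This is a sketch under the same assumptions as the worked answer: one-sample t-intervals for each group, with the critical values 2.132 and 2.015 taken from the text. The function name `t_interval` is a hypothetical helper introduced for illustration.

```python
from math import sqrt
from statistics import mean, stdev

one_month = [209, 217, 230, 221, 243]
seven_months = [85, 129, 201, 176, 105, 149]

def t_interval(data, t_crit):
    """Return (lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    n = len(data)
    margin = t_crit * stdev(data) / sqrt(n)
    center = mean(data)
    return center - margin, center + margin

# Critical values for 90% confidence with df = 4 and df = 5, as given in the text.
for label, data, t_crit in (("one month", one_month, 2.132),
                            ("seven months", seven_months, 2.015)):
    lower, upper = t_interval(data, t_crit)
    print(f"{label}: ({lower:.1f}, {upper:.1f})")
```

Because the script works from the unrounded mean and standard deviation, it avoids the small errors that build up when intermediate values are rounded by hand.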