The following data, expressed as an air pollution index, give the air quality of a city for 10 randomly selected days.

57.3, 58.1, 58.7, 66.7, 58.6, 61.9, 59.0, 64.4, 62.6, 64.9
Draw a probability plot and test for normality.

1 answer

To assess the normality of the given air pollution index data, we can perform the following steps:

  1. Descriptive Statistics: Calculate basic statistics like mean, median, and standard deviation.
  2. Visual Inspection: Create a probability plot (Q-Q plot) to visually assess normality.
  3. Statistical Test: Perform a formal test for normality, such as the Shapiro-Wilk test.

Step 1: Descriptive Statistics

Let's calculate the mean, median, and standard deviation of the data.

Given data:

  • \( Data = [57.3, 58.1, 58.7, 66.7, 58.6, 61.9, 59.0, 64.4, 62.6, 64.9] \)

Mean: \[ \text{Mean} = \frac{57.3 + 58.1 + 58.7 + 66.7 + 58.6 + 61.9 + 59.0 + 64.4 + 62.6 + 64.9}{10} = \frac{612.2}{10} = 61.22 \]

Median: To find the median, we first sort the data:

  • Sorted data: \( [57.3, 58.1, 58.6, 58.7, 59.0, 61.9, 62.6, 64.4, 64.9, 66.7] \)
  • The median is the average of the 5th and 6th observations: \[ \text{Median} = \frac{59.0 + 61.9}{2} = 60.45 \]

Standard Deviation: \[ \text{SD} = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2} \] Calculating the squared differences:

  • Mean = 61.22
  • Squared differences (rounded to two decimals): \[ \begin{aligned} (57.3 - 61.22)^2 & = 15.37 \\ (58.1 - 61.22)^2 & = 9.73 \\ (58.7 - 61.22)^2 & = 6.35 \\ (66.7 - 61.22)^2 & = 30.03 \\ (58.6 - 61.22)^2 & = 6.86 \\ (61.9 - 61.22)^2 & = 0.46 \\ (59.0 - 61.22)^2 & = 4.93 \\ (64.4 - 61.22)^2 & = 10.11 \\ (62.6 - 61.22)^2 & = 1.90 \\ (64.9 - 61.22)^2 & = 13.54 \end{aligned} \]

The sum of squared differences is approximately 99.30, and dividing by \( 9 \) (since \( n-1 = 10-1 \)): \[ \text{Variance} = \frac{99.30}{9} \approx 11.03 \quad \text{thus} \quad \text{SD} = \sqrt{11.03} \approx 3.32 \]
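The hand arithmetic in this step is easy to get wrong, so it may be worth cross-checking with a short script. The following is a minimal sketch using only Python's standard `statistics` module; the variable names are illustrative choices, not part of the original problem.

```python
import statistics

data = [57.3, 58.1, 58.7, 66.7, 58.6, 61.9, 59.0, 64.4, 62.6, 64.9]

mean = statistics.mean(data)      # arithmetic mean
median = statistics.median(data)  # average of the two middle values when n is even
sd = statistics.stdev(data)       # sample standard deviation (divides by n - 1)

print(f"mean={mean:.2f}, median={median:.2f}, sd={sd:.2f}")
```

Note that `statistics.stdev` uses the \( n-1 \) denominator, matching the formula above, whereas `statistics.pstdev` would divide by \( n \).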

Step 2: Visual Inspection - Q-Q Plot

To create a Q-Q plot, the quantiles of the data are plotted against the quantiles of a normal distribution. This is typically done with statistical software or programming languages such as Python or R.

Since I cannot produce graphical content directly, here's how you can generate it:

In Python:

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

data = np.array([57.3, 58.1, 58.7, 66.7, 58.6, 61.9, 59.0, 64.4, 62.6, 64.9])
stats.probplot(data, dist="norm", plot=plt)
plt.title('Q-Q Plot')
plt.show()
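To see what the plot is built from, the coordinates can also be computed by hand. The sketch below is one way to do it with the standard library only; it assumes the plotting positions \( (i - 0.5)/n \), which is one common convention and not necessarily the one `probplot` uses internally, so the theoretical quantiles may differ slightly from the plotted ones.

```python
from statistics import NormalDist, mean, stdev

data = [57.3, 58.1, 58.7, 66.7, 58.6, 61.9, 59.0, 64.4, 62.6, 64.9]
n = len(data)
ordered = sorted(data)

# Assumed plotting positions (i - 0.5) / n; other conventions exist.
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Standard normal quantiles corresponding to each plotting position.
theoretical = [NormalDist().inv_cdf(p) for p in probs]

# Under normality, each ordered value should be close to mean + sd * z.
m, s = mean(data), stdev(data)
for z, x in zip(theoretical, ordered):
    print(f"z={z:6.3f}  observed={x:5.1f}  expected={m + s * z:6.2f}")
```

If the observed and expected columns track each other closely, the points on the Q-Q plot will fall near a straight line.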

Step 3: Perform Normality Test

For the Shapiro-Wilk test for normality:

In Python:

# Reuses `stats` and `data` from the Q-Q plot snippet above.
shapiro_test = stats.shapiro(data)
print(f'Statistic: {shapiro_test.statistic:.4f}, p-value: {shapiro_test.pvalue:.4f}')

Interpret the results:

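  • The null hypothesis is that the data come from a normal distribution.
  • A p-value > 0.05 means we fail to reject the null hypothesis; the data are consistent with a normal distribution, although this does not prove normality.
  • A p-value ≤ 0.05 means we reject the null hypothesis; there is evidence that the data are not normally distributed.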
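The decision rule can be written as a small helper. This is only a sketch: the function name `interpret_normality` and the default significance level of 0.05 are assumptions made for illustration, and the function works on any p-value you pass in.

```python
def interpret_normality(p_value: float, alpha: float = 0.05) -> str:
    """Apply the usual decision rule to a normality-test p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    if p_value > alpha:
        # Failing to reject is not proof of normality.
        return "fail to reject H0: no evidence against normality"
    return "reject H0: evidence against normality"
```

For example, `interpret_normality(shapiro_test.pvalue)` would classify the Shapiro-Wilk result obtained above.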

Conclusion

Run the code above to generate the Q-Q plot and obtain the Shapiro-Wilk result. If the points lie close to the reference line and the p-value exceeds 0.05, there is no evidence against normality for the air pollution index data. With only 10 observations the test has limited power, so the plot and the test should be read together.