A manufacturer of aluminum cans measured the level of impurities in 24 cans. The amounts of impurities, in percent, were as follows.

7.1
2.6
8.3
4.6
9.3
5.1
8.3
1.7
6
9.2
7.3
6.8
4.4
8.7
7.9
8
6.2
8.3
4.8
8.6
9.1
6.5
5.7
9.4
Construct a boxplot for the given sample.

To construct a boxplot for the impurity data, follow these steps:

  1. Order the Data: Arrange the data in ascending order.
  2. Find the Minimum and Maximum: Identify the smallest and largest values in the data set.
  3. Calculate the Quartiles: Determine the first quartile (Q1), median (Q2), and third quartile (Q3).
  4. Identify Outliers: Calculate the interquartile range (IQR) and determine any potential outliers.
  5. Create the Boxplot.

Step 1: Order the Data

Here are the values in ascending order:

  • 1.7
  • 2.6
  • 4.4
  • 4.6
  • 4.8
  • 5.1
  • 5.7
  • 6
  • 6.2
  • 6.5
  • 6.8
  • 7.1
  • 7.3
  • 7.9
  • 8
  • 8.3
  • 8.3
  • 8.3
  • 8.6
  • 8.7
  • 9.1
  • 9.2
  • 9.3
  • 9.4

Step 2: Find the Minimum and Maximum

  • Minimum: 1.7
  • Maximum: 9.4

Step 3: Calculate the Quartiles

  • Median (Q2): This is the middle value. Since there are 24 data points (even number), the median will be the average of the 12th and 13th values: \[ Q2 = \frac{7.1 + 7.3}{2} = 7.2 \]

  • First Quartile (Q1): This is the median of the first half of the data (1.7 to 7.1). The first half, in order, is:

    • 1.7, 2.6, 4.4, 4.6, 4.8, 5.1, 5.7, 6, 6.2, 6.5, 6.8, 7.1
    The median of those 12 values is the average of the 6th and 7th: \[ Q1 = \frac{5.1 + 5.7}{2} = 5.4 \]
  • Third Quartile (Q3): This is the median of the second half of the data (7.3 to 9.4). The second half, in order, is:

    • 7.3, 7.9, 8, 8.3, 8.3, 8.3, 8.6, 8.7, 9.1, 9.2, 9.3, 9.4
    The median of those 12 values is the average of the 6th and 7th: \[ Q3 = \frac{8.3 + 8.6}{2} = 8.45 \]

Step 4: Identify Outliers

  • Interquartile Range (IQR): \[ IQR = Q3 - Q1 = 8.45 - 5.4 = 3.05 \]
  • Lower Bound: \( Q1 - 1.5 \times IQR = 5.4 - 1.5 \times 3.05 = 5.4 - 4.575 = 0.825 \)
  • Upper Bound: \( Q3 + 1.5 \times IQR = 8.45 + 1.5 \times 3.05 = 8.45 + 4.575 = 13.025 \)

Any data points outside of this range are considered outliers. The smallest value (1.7) is above the lower bound of 0.825 and the largest value (9.4) is below the upper bound of 13.025, so there are no outliers.

Step 5: Create the Boxplot

Now we can summarize the findings for the boxplot:

  • Minimum: 1.7
  • Q1: 5.4
  • Median (Q2): 7.2
  • Q3: 8.45
  • Maximum: 9.4

Because there are no outliers, the boxplot has a box ranging from Q1 (5.4) to Q3 (8.45), a line at the median (7.2), and "whiskers" extending to the minimum (1.7) and maximum (9.4).

To draw this boxplot:

  • Draw a box from 5.4 to 8.45.
  • Draw a line at 7.2 within the box.
  • Extend the left whisker from 5.4 down to 1.7 and the right whisker from 8.45 up to 9.4.

This boxplot gives a visual summary of the data distribution and highlights the central tendency and variability.
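
The quartile and outlier steps can be checked with a short script. This is a minimal sketch using only Python's standard library; the function name `five_number_summary` and the variable names are illustrative choices, not part of the original problem. It assumes the median-of-halves convention for quartiles used in this answer; statistics packages often use other conventions and can report slightly different Q1 and Q3.

```python
from statistics import median

impurities = [7.1, 2.6, 8.3, 4.6, 9.3, 5.1, 8.3, 1.7, 6, 9.2, 7.3, 6.8,
              4.4, 8.7, 7.9, 8, 6.2, 8.3, 4.8, 8.6, 9.1, 6.5, 5.7, 9.4]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


low, q1, med, q3, high = five_number_summary(impurities)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr      # values above this are flagged as outliers
outliers = [x for x in impurities if x < lower_fence or x > upper_fence]

print(f"min={low}, Q1={q1:.2f}, median={med:.2f}, Q3={q3:.2f}, max={high}")
print(f"IQR={iqr:.2f}, fences=({lower_fence:.3f}, {upper_fence:.3f})")
print("outliers:", outliers)
```

The five values returned are the ones a plotting tool would need to draw the box, the median line, and the whiskers.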

Is it reasonable to treat the sample as coming from an approximately normal population? Explain.

To determine whether it is reasonable to treat the sample of impurity levels as coming from an approximately normal population, we can consider the following factors:

1. Visual Inspection:

  • Constructing a histogram or a normal probability plot (Q-Q plot) would help visually assess the distribution. A normal distribution would appear bell-shaped in a histogram, and points would fall close to a straight line in a Q-Q plot.
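
As a rough illustration of the Q-Q idea, the sketch below pairs each ordered observation with a standard normal quantile and measures how close the pairs are to a straight line. It uses only the standard library. The helper names `qq_points` and `pearson_r` are hypothetical, and the plotting positions (i - 0.5)/n are one common convention assumed here, not something the problem specifies.

```python
from math import sqrt
from statistics import NormalDist, mean

impurities = [7.1, 2.6, 8.3, 4.6, 9.3, 5.1, 8.3, 1.7, 6, 9.2, 7.3, 6.8,
              4.4, 8.7, 7.9, 8, 6.2, 8.3, 4.8, 8.6, 9.1, 6.5, 5.7, 9.4]


def qq_points(values):
    """Pair each ordered observation with a standard normal quantile."""
    data = sorted(values)
    n = len(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Assumed plotting positions: (i - 0.5) / n for i = 1..n.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, data))


def pearson_r(pairs):
    """Pearson correlation between the two coordinates of the pairs."""
    xs, ys = zip(*pairs)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


points = qq_points(impurities)
r = pearson_r(points)
print(f"Q-Q correlation: {r:.3f}")
```

A correlation close to 1 means the points lie near a line. Plotting `points` and looking for curvature is usually more informative than the single number.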

2. Descriptive Statistics:

  • Compute descriptive statistics like the mean and median. For a normal distribution, these values are expected to be close. In this case:
    • Mean: \[ \text{Mean} = \frac{\sum \text{values}}{n} = \frac{7.1 + 2.6 + 8.3 + 4.6 + 9.3 + 5.1 + 8.3 + 1.7 + 6 + 9.2 + 7.3 + 6.8 + 4.4 + 8.7 + 7.9 + 8 + 6.2 + 8.3 + 4.8 + 8.6 + 9.1 + 6.5 + 5.7 + 9.4}{24} = \frac{ 163.9 }{ 24 } \approx 6.83 \]
    • Median: 7.2 (as calculated earlier).
  • The mean (about 6.83) is lower than the median (7.2), which suggests the distribution is skewed to the left.
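
The mean and median comparison is easy to verify directly. This is a minimal sketch with the standard library; the variable names are illustrative.

```python
from statistics import mean, median

impurities = [7.1, 2.6, 8.3, 4.6, 9.3, 5.1, 8.3, 1.7, 6, 9.2, 7.3, 6.8,
              4.4, 8.7, 7.9, 8, 6.2, 8.3, 4.8, 8.6, 9.1, 6.5, 5.7, 9.4]

sample_mean = mean(impurities)      # sum of the 24 values divided by 24
sample_median = median(impurities)  # average of the 12th and 13th ordered values

print(f"mean   = {sample_mean:.2f}")
print(f"median = {sample_median:.2f}")
print("mean below median:", sample_mean < sample_median)
```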

3. Skewness and Kurtosis:

  • Calculate skewness and kurtosis values. For a normal distribution, skewness should be around 0, and kurtosis should be around 3. Significant deviations from these values would suggest a non-normal distribution.
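
One way to compute these two quantities is from the central moments of the sample. The sketch below assumes the simple moment-based (population-style) estimators, with kurtosis reported on the scale where a normal distribution scores 3; statistical packages may apply small-sample corrections and report slightly different numbers. The function name `skewness_and_kurtosis` is hypothetical.

```python
from statistics import mean

impurities = [7.1, 2.6, 8.3, 4.6, 9.3, 5.1, 8.3, 1.7, 6, 9.2, 7.3, 6.8,
              4.4, 8.7, 7.9, 8, 6.2, 8.3, 4.8, 8.6, 9.1, 6.5, 5.7, 9.4]


def skewness_and_kurtosis(values):
    """Moment-based skewness and (non-excess) kurtosis of a sample."""
    n = len(values)
    m = mean(values)
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - m) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


skew, kurt = skewness_and_kurtosis(impurities)
print(f"skewness = {skew:.2f}, kurtosis = {kurt:.2f}")
```

A negative skewness indicates a longer left tail, and a positive one a longer right tail.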

4. Sample Size:

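  • The sample size of 24 is fairly small. With so few observations, plots and tests have limited power to detect departures from normality. The Central Limit Theorem concerns the sampling distribution of the mean and offers less protection here than it would for a large sample, so the shape of the population matters more.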

5. Statistical Tests for Normality:

  • Conduct normality tests such as the Shapiro-Wilk test or Kolmogorov-Smirnov test. These tests provide p-values that indicate whether to reject the null hypothesis of normality. A p-value lower than a chosen alpha level (commonly 0.05) suggests that the data do not come from a normal distribution.
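
To make the Kolmogorov-Smirnov idea concrete, the sketch below computes the K-S distance between the sample's empirical distribution and a normal distribution fitted with the sample mean and standard deviation. It uses only the standard library, and the function name `ks_statistic_normal` is hypothetical. Because the normal parameters are estimated from the same data, the usual K-S p-values do not apply; the statistic would need to be compared against Lilliefors critical values, which are not reproduced here. A dedicated statistics package is the more practical route for Shapiro-Wilk.

```python
from statistics import NormalDist, mean, stdev

impurities = [7.1, 2.6, 8.3, 4.6, 9.3, 5.1, 8.3, 1.7, 6, 9.2, 7.3, 6.8,
              4.4, 8.7, 7.9, 8, 6.2, 8.3, 4.8, 8.6, 9.1, 6.5, 5.7, 9.4]


def ks_statistic_normal(values):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    data = sorted(values)
    n = len(data)
    fitted = NormalDist(mean(data), stdev(data))  # parameters estimated from the data
    d = 0.0
    for i, x in enumerate(data, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


d_stat = ks_statistic_normal(impurities)
print(f"K-S distance to fitted normal: {d_stat:.3f}")
```

A larger distance is stronger evidence against normality.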

Conclusion:

  • The evidence already in hand points toward a left-skewed distribution: the mean is below the median, and in the boxplot the lower whisker (reaching down to 1.7) is several times longer than the upper whisker (ending at 9.4). A definitive answer would still require a histogram or Q-Q plot and a normality test.
  • If the histogram shows significant skewness or the results of the Shapiro-Wilk test indicate non-normality, it would not be reasonable to treat the sample as coming from an approximately normal population.

In summary, we can't settle the question of normality without further analysis, but the descriptive statistics and the shape of the boxplot give reason to be cautious. Before relying on methods that assume normality, the suggested plots and tests should be carried out.