Let X_1, \ldots , X_ n be iid samples with cdf F, and let F^0 denote the cdf of \textsf{Unif}(0,1). Recall that
F^0(t) = t \cdot \, \mathbf{1}(t \in [0,1]) + 1 \cdot \mathbf{1}(t > 1) .
We want to use goodness of fit testing to determine whether or not X_1, \ldots , X_ n \stackrel{iid}{\sim } \textsf{Unif}(0,1). To do so, we will test between the hypotheses
H_0 : F = F^0
H_1 : F \neq F^0.
To make computation of the test statistic easier, let us first reorder the samples from smallest to largest, so that
X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}
is the reordered sample. In this set-up, the Kolmogorov-Smirnov test statistic is given by the formula
T_ n = \sqrt{n} \max _{i = 1, \ldots , n} \left\{ \max \left(\bigg| \frac{i -1}{n} - X_{(i)} \mathbf{1}\left( X_{(i)} \in [0,1]\right) \bigg|, \bigg| \frac{i }{n} - X_{(i)} \mathbf{1}\left( X_{(i)} \in [0,1]\right) \bigg|\right) \right\} .
You observe the data set \mathbf{x} consisting of 5 samples:
\mathbf{x}= 0.8, 0.7, 0.4, 0.7, 0.2
Using the formula above, what is the value of T_{5} for this data set? (You are encouraged to use computational tools.)
First, sort the samples from smallest to largest:
\mathbf{x}= 0.2, 0.4, 0.7, 0.7, 0.8
Next, let's plug the values into the formula for T_5. All five samples lie in [0,1], so every indicator equals 1. For each i, the formula needs both |(i-1)/5 - X_(i)| and |i/5 - X_(i)|:
i = 1: |0/5 - 0.2| = 0.2 and |1/5 - 0.2| = 0, so the inner max is 0.2
i = 2: |1/5 - 0.4| = 0.2 and |2/5 - 0.4| = 0, so the inner max is 0.2
i = 3: |2/5 - 0.7| = 0.3 and |3/5 - 0.7| = 0.1, so the inner max is 0.3
i = 4: |3/5 - 0.7| = 0.1 and |4/5 - 0.7| = 0.1, so the inner max is 0.1
i = 5: |4/5 - 0.8| = 0 and |5/5 - 0.8| = 0.2, so the inner max is 0.2
Taking the maximum over i, we have:
T_5 = sqrt(5) * max{0.2, 0.2, 0.3, 0.1, 0.2}
T_5 = sqrt(5) * 0.3
T_5 ≈ 0.671
Therefore, the value of T_5 for this data set is approximately 0.671.
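Since the problem encourages computational tools, here is a minimal Python sketch that evaluates the formula for T_n as it is written in the question. The function name ks_statistic_uniform is my own choice, not something from the problem. The indicator is implemented literally, so a sample outside [0,1] contributes 0 in place of X_(i).

```python
import math


def ks_statistic_uniform(samples):
    """Evaluate T_n from the question's formula for a test against Unif(0,1)."""
    xs = sorted(samples)  # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    max_dev = 0.0
    for i, x in enumerate(xs, start=1):
        # X_(i) * 1(X_(i) in [0,1]), exactly as written in the formula
        fx = x if 0.0 <= x <= 1.0 else 0.0
        # Both terms are needed: the empirical cdf jumps from (i-1)/n to i/n at X_(i)
        dev = max(abs((i - 1) / n - fx), abs(i / n - fx))
        max_dev = max(max_dev, dev)
    return math.sqrt(n) * max_dev


if __name__ == "__main__":
    x = [0.8, 0.7, 0.4, 0.7, 0.2]
    print(round(ks_statistic_uniform(x), 4))  # 0.6708
```

The largest deviation comes from the (i-1)/n term at i = 3, which is easy to miss if only the i/n term is checked.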