Let X_1, \ldots , X_ n be iid samples with cdf F, and let F^0 denote the cdf of \textsf{Unif}(0,1). Recall that

F^0(t) = t \cdot \, \mathbf{1}(t \in [0,1]) + 1 \cdot \mathbf{1}(t > 1) .

We want to use goodness of fit testing to determine whether or not X_1, \ldots , X_ n \stackrel{iid}{\sim } \textsf{Unif}(0,1). To do so, we will test between the hypotheses

H_0 : F = F^0
H_1 : F \neq F^0.
To make computation of the test statistic easier, let us first reorder the samples from smallest to largest, so that

X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}

is the reordered sample. In this set-up, the Kolmogorov-Smirnov test statistic is given by the formula

T_ n = \sqrt{n} \max _{i = 1, \ldots , n} \left\{ \max \left(\bigg| \frac{i -1}{n} - X_{(i)} \mathbf{1}\left( X_{(i)} \in [0,1]\right) \bigg|, \bigg| \frac{i }{n} - X_{(i)} \mathbf{1}\left( X_{(i)} \in [0,1]\right) \bigg|\right) \right\} .

You observe the data set \mathbf{x} consisting of 5 samples:

\mathbf{x}= 0.8, 0.7, 0.4, 0.7, 0.2

Using the formula above, what is the value of T_{5} for this data set? (You are encouraged to use computational tools.)

1 answer

To compute the value of T_5 for this data set, we first need to reorder the samples from smallest to largest:

\mathbf{x}= 0.2, 0.4, 0.7, 0.7, 0.8

Next, let's plug the values into the formula for T_5. For each i, the formula takes the larger of two distances, |(i-1)/5 - X_(i)| and |i/5 - X_(i)|, because the empirical cdf jumps from (i-1)/5 to i/5 at X_(i). All samples lie in [0,1], so every indicator equals 1:

T_5 = sqrt(5) * max over i = 1, ..., 5 of max( |(i-1)/5 - X_(i)|, |i/5 - X_(i)| )

Calculating each component individually:

i = 1: max( |0/5 - 0.2|, |1/5 - 0.2| ) = max(0.2, 0) = 0.2

i = 2: max( |1/5 - 0.4|, |2/5 - 0.4| ) = max(0.2, 0) = 0.2

i = 3: max( |2/5 - 0.7|, |3/5 - 0.7| ) = max(0.3, 0.1) = 0.3

i = 4: max( |3/5 - 0.7|, |4/5 - 0.7| ) = max(0.1, 0.1) = 0.1

i = 5: max( |4/5 - 0.8|, |5/5 - 0.8| ) = max(0, 0.2) = 0.2

Taking the maximum of these values, we have:

T_5 = sqrt(5) * max{0.2, 0.2, 0.3, 0.1, 0.2}

T_5 = sqrt(5) * 0.3

T_5 ≈ 0.671

Therefore, the value of T_5 for this data set is approximately 0.671.
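
Since the question encourages computational tools, here is a minimal Python sketch that evaluates the formula from the question directly. The function name ks_statistic_uniform is only illustrative (it is not from the question), and the code follows the formula literally, including the indicator on X_(i).

```python
import math

def ks_statistic_uniform(samples):
    """Evaluate T_n from the question's formula for a Unif(0,1) null."""
    xs = sorted(samples)          # order statistics X_(1) <= ... <= X_(n)
    n = len(xs)
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        # X_(i) * 1(X_(i) in [0,1]), exactly as written in the formula
        f0 = x if 0.0 <= x <= 1.0 else 0.0
        # compare against the empirical cdf just before and at the jump
        worst = max(worst, abs((i - 1) / n - f0), abs(i / n - f0))
    return math.sqrt(n) * worst

t5 = ks_statistic_uniform([0.8, 0.7, 0.4, 0.7, 0.2])
print(round(t5, 3))  # 0.671
```

The largest gap, 0.3, occurs at i = 3, where the empirical cdf is still at 2/5 just before the tied observations at 0.7.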