Let us compute the t-SNE output for the isosceles triangle as in the previous problem.
Recall we are given 3 points in 2 dimensions, as shown in the figure below:
\mathbf{x}^{(1)} is the center node, and \mathbf{x}^{(2)} and \mathbf{x}^{(3)} are the boundary nodes. The 2 boundary nodes are each at distance A from the center node, and are farther from each other, at distance B (B>A), than they are from the center node. Here, we are given A=1 and B=\sqrt{2}.
The distribution \mathbf{P} in 2D remains the same as before, and for this small example, we will again try to minimize \text {KL}(\mathbf{P}||\mathbf{Q}) by solving for \text {KL}(\mathbf{P}||\mathbf{Q})=0.
What is different from before is that we will use the t-distribution to define \mathbf{Q} in the target space.
Assume again that the projection onto 1D is as follows, with unknown distance a between the center and each boundary node, and a distance 2a between the boundary nodes.
Write down q_{ij} in terms of the distance a and find a such that q_{12}=p_{12}, giving \text {KL}(\mathbf{P}||\mathbf{Q})=0.
(Enter an answer accurate to at least 3 decimal places.)
a=\quad
1 answer
First, let's write down q_{ij} in terms of the distance "a":
q_{ij} = \frac{(1 + ||y_i - y_j||^2)^{-1}}{\sum_{k\neq l}(1 + ||y_k - y_l||^2)^{-1}}
Since we are projecting onto 1D, the Euclidean distance in the target space is simply the absolute difference between the projections of the points.
Based on the given information, we know that the distance between the center and boundary nodes is "a" and the distance between the boundary nodes is 2a.
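To make the definition concrete, here is a minimal sketch that evaluates q_{ij} for a 1D layout. It assumes the coordinates y_1 = 0 (center), y_2 = -a and y_3 = +a, which is one placement consistent with the distances described above; the function name t_sne_q and the value a = 0.5 are illustrative choices, not part of the problem.

```python
def t_sne_q(y):
    """Student-t (one degree of freedom) affinities q_ij for 1D points y."""
    n = len(y)
    # Unnormalized kernel (1 + |y_i - y_j|^2)^(-1); diagonal entries stay 0.
    w = [
        [0.0 if i == j else 1.0 / (1.0 + (y[i] - y[j]) ** 2) for j in range(n)]
        for i in range(n)
    ]
    # Normalizer: sum over all ordered pairs k != l.
    total = sum(sum(row) for row in w)
    return [[w_ij / total for w_ij in row] for row in w]


a = 0.5  # arbitrary illustrative value, NOT the solution of the problem
# Assumed layout: y_1 = 0 (center), y_2 = -a, y_3 = +a.
q = t_sne_q([0.0, -a, a])
print("q_12 =", q[0][1])  # center to boundary (distance a)
print("q_23 =", q[1][2])  # boundary to boundary (distance 2a)
```

Because the sum runs over ordered pairs, the normalizer contains four center-boundary terms and two boundary-boundary terms, so only two distinct values of q_{ij} appear.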
Let's consider q_{12}:
q_{12} = \frac{(1 + ||y_1 - y_2||^2)^{-1}}{\sum_{k\neq l}(1 + ||y_k - y_l||^2)^{-1}}
Since y_1 is the projection of the center node x^{(1)} and y_2 is the projection of the boundary node x^{(2)}, we can place y_1 = 0, y_2 = -a and y_3 = a, so that:
||y_1 - y_2|| = ||y_1 - y_3|| = a, \quad ||y_2 - y_3|| = 2a
The sum in the denominator runs over all ordered pairs k \neq l, so it contains four center-boundary terms and two boundary-boundary terms:
\sum_{k\neq l}(1 + ||y_k - y_l||^2)^{-1} = \frac{4}{1 + a^2} + \frac{2}{1 + 4a^2}
Substituting this into the expressions for q_{12} and q_{23}, we get:
q_{12} = \frac{(1 + a^2)^{-1}}{4(1 + a^2)^{-1} + 2(1 + 4a^2)^{-1}}, \quad q_{23} = \frac{(1 + 4a^2)^{-1}}{4(1 + a^2)^{-1} + 2(1 + 4a^2)^{-1}}
Now, we need to find the value of "a" such that q_{12} = p_{12}. By symmetry, p_{12} = p_{13} and q_{12} = q_{13}, and both distributions sum to 1 (4p_{12} + 2p_{23} = 1 and 4q_{12} + 2q_{23} = 1), so q_{12} = p_{12} also forces q_{23} = p_{23}. It is therefore enough to match the ratio:
\frac{q_{23}}{q_{12}} = \frac{1 + a^2}{1 + 4a^2} = \frac{p_{23}}{p_{12}}
The distribution \mathbf{P} is defined by the kernel used in the previous problem, which is not restated here. Assuming the Gaussian kernel p_{ij} \propto \exp(-||x^{(i)} - x^{(j)}||^2), with A = 1 and B = \sqrt{2} we have:
\frac{p_{23}}{p_{12}} = \frac{e^{-B^2}}{e^{-A^2}} = e^{-(B^2 - A^2)} = e^{-1}
Therefore, we need to solve the equation:
\frac{1 + a^2}{1 + 4a^2} = e^{-1}
Multiplying through by e(1 + 4a^2), we get:
e(1 + a^2) = 1 + 4a^2
Rearranging terms, we get:
a^2 = \frac{e - 1}{4 - e} ≈ 1.341
Taking the positive square root, we get:
a ≈ 1.158
Therefore, under the assumed kernel, the value of "a" that satisfies q_{12} = p_{12} is approximately 1.158. If the previous problem instead used \exp(-||x^{(i)} - x^{(j)}||^2/2) (that is, \sigma = 1 in the exponent 2\sigma^2), the ratio becomes e^{-1/2} and the same steps give a = \sqrt{(\sqrt{e} - 1)/(4 - \sqrt{e})} ≈ 0.525.
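As a numerical cross-check of the condition q_{12} = p_{12}, here is a minimal sketch that solves for a by bisection. It rests on two assumptions that are not spelled out in the problem text: the 1D layout is y_1 = 0 (center), y_2 = -a, y_3 = +a, and \mathbf{P} is built from the Gaussian kernel \exp(-d^2) with sums over ordered pairs. The function names q12, p12 and solve_a are hypothetical helpers introduced for illustration.

```python
import math


def q12(a):
    """q_12 under the Student-t kernel for the layout y = (0, -a, +a)."""
    w_cb = 1.0 / (1.0 + a * a)        # center-boundary pairs, distance a
    w_bb = 1.0 / (1.0 + 4.0 * a * a)  # boundary-boundary pair, distance 2a
    return w_cb / (4.0 * w_cb + 2.0 * w_bb)


def p12(A, B):
    """p_12 in 2D. ASSUMPTION: Gaussian kernel exp(-d^2), ordered pairs."""
    g_cb = math.exp(-A * A)
    g_bb = math.exp(-B * B)
    return g_cb / (4.0 * g_cb + 2.0 * g_bb)


def solve_a(target, lo=0.0, hi=100.0, iters=200):
    """Bisection: q12(a) increases from 1/6 at a = 0 toward 2/9 as a grows."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q12(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = solve_a(p12(1.0, math.sqrt(2.0)))
print("a =", round(a, 3))
```

Under these assumptions the result agrees with the closed form a = \sqrt{(e - 1)/(4 - e)} ≈ 1.158; swapping the body of p12 for a different kernel shows how the answer depends on how \mathbf{P} was defined.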