Horseshoes in multi-dimensional scaling

webR demo


This page contains R code that you can edit and run interactively in your browser – there is no server-side computing involved.

Points on a straight line

Let’s consider points on a straight line in \({\mathbb R}^d\) parameterized by

\[ x = ua \]

where the slope \(a\) is a vector in \({\mathbb R}^d\) and \(u\in\,]-\infty,+\infty[\) is the real-valued path parameter. We choose \(d=24\) for no particular reason, and sample 41 equally spaced points from \(u=-20\) to \(u=+20\).
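A minimal sketch of this setup (the random slope vector a and the variable name x are choices of this illustration, not fixed by the text):

```r
d <- 24                              # ambient dimension
u <- seq(-20, 20, length.out = 41)   # 41 equally spaced path parameters

set.seed(1)                          # assumption: any nonzero slope works
a <- rnorm(d)                        # slope vector in R^d

x <- outer(u, a)                     # one point u * a per row: 41 x 24
```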

We compute the pairwise distance matrix d1 between all points and use cmdscale to perform classical multidimensional scaling.
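Concretely, this could look as follows (the setup is repeated so the chunk runs on its own; the random slope is an assumption of this sketch):

```r
d <- 24
u <- seq(-20, 20, length.out = 41)
set.seed(1)
x <- outer(u, rnorm(d))              # points on the line, one per row

d1 <- dist(x)                        # pairwise Euclidean distances
mds1 <- cmdscale(d1, k = 2)          # classical MDS into 2 dimensions
plot(mds1, asp = 1, xlab = "MDS 1", ylab = "MDS 2")
```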

So far, so good. The embedding method faithfully reproduces the points’ relative positions.

With saturation

Now let’s consider that the distances aren’t measured perfectly, but saturate: small distances are faithful, but large distances are reported as smaller than they are. We can use the following function to model this:

\[ s(x) = x_0 \left(1-e^{-x/x_0}\right). \]

Here, \(x_0\) is the distance scale at which distances become “large”. For \(x\ll x_0\), \(s(x)\approx x\), but for larger \(x\), the value \(s(x)\) is capped at \(x_0\). We can plot this function, for a particular choice of \(x_0\):
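For instance, with x0 = 40 (an arbitrary choice for this sketch), we can draw the saturating map described above together with its two asymptotes:

```r
x0 <- 40                              # assumed distance scale for illustration
s  <- function(x) x0 * (1 - exp(-x / x0))

curve(s(x), from = 0, to = 200, xlab = "x", ylab = "s(x)")
abline(a = 0, b = 1, lty = 2)         # s(x) is close to x for small x
abline(h = x0, lty = 3)               # s(x) is capped at x0 for large x
```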

Let’s run multidimensional scaling again:
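A sketch of this step (earlier setup repeated so the chunk runs standalone; x0 = 40 and the random slope are choices of this illustration):

```r
d <- 24
u <- seq(-20, 20, length.out = 41)
set.seed(1)
x <- outer(u, rnorm(d))
d1 <- dist(x)

x0 <- 40
s  <- function(x) x0 * (1 - exp(-x / x0))

d2 <- s(as.matrix(d1))               # saturated distance matrix
mds2 <- cmdscale(d2, k = 2)          # MDS on the saturated distances
plot(mds2, asp = 1, xlab = "MDS 1", ylab = "MDS 2")
```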

There is a horseshoe!

Diaconis, Goel and Holmes

In their paper “Horseshoes in multidimensional scaling and local kernel methods”, Diaconis, Goel, and Holmes (2008) looked at such situations. More precisely, they considered distance matrices that are concentrated along the diagonal. They set out from the example of the voting records of the members of the 2005 United States House of Representatives. From these records, they computed all pairwise distances (or dissimilarities), applied classical multidimensional scaling, and observed a horseshoe pattern.

For good measure, let us see what happens if we increase the saturation:
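In the model above, stronger saturation means a smaller \(x_0\); the value 10 here, like the rest of the setup, is a choice of this sketch:

```r
u <- seq(-20, 20, length.out = 41)
set.seed(1)
x <- outer(u, rnorm(24))
d1 <- dist(x)

x0 <- 10                              # smaller scale: stronger saturation
s  <- function(x) x0 * (1 - exp(-x / x0))

mds3 <- cmdscale(s(as.matrix(d1)), k = 2)
plot(mds3, asp = 1, xlab = "MDS 1", ylab = "MDS 2")
```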

The horseshoe becomes even bendier.

What is going on?

Let’s look at some eigenvectors of the doubly centered matrix of squared entries of d2. This is the matrix that cmdscale actually diagonalizes.
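One way to reproduce that matrix and its eigenvectors (setup repeated so the chunk runs standalone; x0 = 40 and the random slope are assumptions of this sketch):

```r
u <- seq(-20, 20, length.out = 41)
set.seed(1)
x <- outer(u, rnorm(24))
x0 <- 40
d2 <- x0 * (1 - exp(-as.matrix(dist(x)) / x0))   # saturated distances

# double centering as in classical MDS: B = -(1/2) J d2^2 J
n <- nrow(d2)
J <- diag(n) - 1 / n                  # centering matrix
B <- -0.5 * J %*% d2^2 %*% J

# the leading eigenvectors give the MDS coordinates
e <- eigen(B, symmetric = TRUE)
matplot(e$vectors[, 1:4], type = "l", lty = 1,
        xlab = "point index", ylab = "eigenvector")
```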

If we think of the doubly centered matrix as a discretized version of a linear operator, we see that a dominant component is the negative of the second derivative, \(-d^2/dx^2\). Therefore, we see eigenfunctions that resemble harmonic functions. These, in turn, create the horseshoe pattern.
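To see why harmonic functions appear, one can inspect a standard tridiagonal discretization of \(-d^2/dx^2\) directly (this construction is an illustration, not taken from the original code); its smoothest eigenvectors are sine waves:

```r
n <- 41
D2 <- diag(2, n)                       # tridiagonal second-difference matrix:
D2[cbind(1:(n - 1), 2:n)] <- -1        # 2 on the diagonal,
D2[cbind(2:n, 1:(n - 1))] <- -1        # -1 on the off-diagonals

e <- eigen(D2, symmetric = TRUE)
# eigen() sorts by decreasing eigenvalue, so the smoothest eigenvectors
# (smallest eigenvalues) are the last columns
matplot(e$vectors[, n - (0:2)], type = "l", lty = 1,
        xlab = "point index", ylab = "eigenvector")
```

The eigenvalues of this matrix are \(2 - 2\cos\bigl(k\pi/(n+1)\bigr)\), with sine-wave eigenvectors, which is why harmonic-looking functions dominate.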


Embeddings of high-dimensional data into lower-dimensional spaces are useful. But they can also create apparent patterns that have little to do with the data-generating process. Be careful.

Session info


Diaconis, Persi, Sharad Goel, and Susan Holmes. 2008. “Horseshoes in Multidimensional Scaling and Local Kernel Methods.” Annals of Applied Statistics 2 (3): 777–807.