Keeping Distances

July 2019

View from the top of a taxi. Photo by Clem Onojeghuo on Unsplash.

When was the last time you went more than a day without thinking about distance? It might seem like a strange question. But notions of distance, whether literal or metaphorical, appear frequently in our daily lives.

Here are some distance-related thoughts that regularly creep into my mind:

Where should I go for lunch today? What's nearby and delicious?
What are my goals for today? How close am I to meeting them?
Is the world still on fire today? What can I do to help bring society closer to a more empathetic, thoughtful, and sustainable ideal?

With distance comes a notion of proximity: when things are close, and when they are far away. Sometimes this proximity is objective and easy to measure. Other times, it is more abstract. Regardless, nearly everyone has an intuition for what distance means.

Mathematician Morris Kline once wrote that "Mathematicians create by acts of insights and intuition." So it is for distance. In what follows, I'd like to explore different examples of distances in mathematics. We'll start with the most well-known formulation, and from there will move on to more exotic examples. We'll see what ties all of these distances together, and what makes each one of them unique.

Taxicab Calculations

In geometry class, every student learns that the shortest distance between two points is a straight line. And if you know the coordinates of those points, with the help of the Pythagorean theorem you can even calculate this minimal distance.

Connect the points with a horizontal and a vertical line segment; these segments will form two sides of a right triangle. The hypotenuse of this triangle represents the shortest distance.

Figure 1: Adjust the positions of the two points to see the shortest distance between them (orange).

When looking at a map, this shortest distance is often described by saying it's the distance "as the crow flies." Unfortunately, if you're anything like me, you are not a crow. For those of us tethered to the ground, this shortest distance is typically unattainable. More often, you're constrained by the roads in the town or city you live in.

Enter the manhattan distance, also known as the taxicab geometry. This is a way of measuring distance that assumes you're constrained to grid. Such a notion may seem strange in a math classroom, but in the real world we measure distances in this way all the time (just ask Google Maps). We often talk about distances between city locations in terms of blocks: how many over, and how many up or down. This is just the manhattan distance.

In this geometry, there are no diagonals. From every point on the grid, you can move in one of four directions: up, down, left, or right. Such a restriction still leads to a perfectly good notion of distance. However, it comes with some slightly surprising properties.

For one, unlike the notion of distance you learn about in high school geometry, when it comes to connecting two points in the taxicab geometry, there is typically no unique shortest path.

Here's a demonstration that lets you explore how many different paths will connect two points when using the manhattan distance. (As a bonus, can you figure out how many possible paths there are of minimal distance between two points on the grid?)

Path 1 of 6

Figure 2: Click on a point on the grid, then adjust the slider to see all possible shortest paths to that point.

For another fun bit of trivia, let's explore what circles look like in this geometry. You can define a circle to simply be the set of points that are a fixed distance from some central point. As you may recall, this fixed distance is called the radius of the circle.

With the manhattan distance on a grid, circles always consist of a finite number of points. For instance, a circle of radius 1 centered at any point will consist of the four points on either side of the center.

Here's how a circle looks in the taxicab geometry as you increase the radius:

Circle radius: 1

Figure 3: Move the slider to adjust the size of a circle in the taxicab geometry. The red dot is at the center of the circle.

As you can see, circles in this geometry become closer and closer to rotated squares! Moreover, the circumference of any circle in this geometry is just the radius r multiplied by 8 (since in each quadrant, you need to move up by r units and over by r units). As a consequence, the ratio of a circle's circumference to twice its radius, which in traditional geometry has the value π ≈ 3.14159..., has a distinctly less mysterious flavor here. As far as the manhattan distance is concerned, π is simply equal to 4.

Hamming it Up

Whenever I write a story, I make typos. Fortunately, my computer has a spell checker, and can easily help me correct my mistakes.

But how does the computer know when I've made a mistake in the first place? As it turns out, the idea boils down to distance.

Intuitively, this makes sense. If I type matehmatics when I mean to type mathematics, we can agree that the misspelled word is close to the correctly spelled word. But in order to make this intuition rigorous, we need a more precise understanding of what "close" means in this context.

Let's take a look at three notions of distance between words instead of points.

Hamming Distance

The Hamming distance is a measurement you can use on words of the same length. It counts up the number of positions at which the corresponding letters in the word are different.

For example, "ghost" and "roast" have a hamming distance of 3, because the first three characters in the two words are different, while the last two are the same.

Note the order of letters here is important! the "o" in "ghost" doesn't count as a match with the "o" in "roast", because "o" is the third letter in ghost but the second letter in roast.

Since the words you're comparing must have the same length, the Hamming distance isn't practical as a spell-checking tool. See the references at the end for more common applications of this distance function.

Levenshtein Distance

If we want to compare words of different lengths, we need a different way to measure distance. The Levenshtein distance accomplishes this goal by generalizing, in some sense, the Hamming distance.

The Levenshtein distance calculates the distance between words as being the number of single-character edits needed to transform one word to the other. An edit is defined as the addition of a letter, the removal of a letter, or the exchange of a letter at a given position for a different letter.

For example, the Levenshtein distance between "friend" and "foe" is 4: change the "r" to an "o" and then delete the "i", "n", and "d."

When the two strings are of equal length, the Hamming distance is an upper bound for the Levenshtein distance, but sometimes the Levenshtein will be shorter. For example, "tree" and "reed" have a Hamming distance of 4, but a Levenshtein distance of only 2, since you can delete the "t" from the start of "tree" and then add a "d" to the end.

Damerau-Levenshtein Distance

Let's return to my original typo: matehmatics instead of mathematics. In this case, the Hamming distance is well-defined, and like the Levenshtein distance it equals 2.

In a 1964 paper titled "A technique for computer detection and correction of spelling errors," Fred Damerau added a slight modification to the Levenshtein distance. In what's now known as the Damerau-Levenshtein distance, swapping of two adjacent characters counts as a single edit, rather than two. According to this distance, matehmatics is only one edit away from mathematics.

Damerau motivated this change by observing that more than 80% of typos were caused by one of the four edit types that the Damerau-Levenshtein distance tracks: a missing letter, an extra letter, an incorrect letter, or two adjacent letters out of order.

Here's a demonstration for you to play around with these three notions of word distance. (Bonus question: what would a circle look like in the geometries determined by these distances?)

First Word:

Second Word:

Hamming distance	2
Levenshtein Distance	2
Damerau-Levenshtein Distance	1

Figure 4: Type words into the boxes to compare three different distance measures.

Function Farness

When it comes to math, we often think about distances between points. But points aren't the only mathematical objects that can be compared for closeness.

We can also define a notion of distance on functions. In fact, we can define infinitely many such notions!

What might it mean for two functions to be close together or far apart? If you think about it for a minute or two, you can probably come up with some ideas. Here we'll explore two of them.

One way might be to look at the area between the two functions. If the area is large, then the functions must, in some sense, be far apart. Conversely, if the area is 0, then under some reasonable assumptions on the functions, they must be equal.

For the mathematicians in the room, I'm intentionally omitting technical details. If it makes you feel better, you can assume that all of the functions I discuss in this section are continuous real-valued functions, defined on a closed and bounded interval.

Another way to measure distance is to look at the largest difference between the function values for a given input. For instance, if one function evaluates to 100 while another one evaluates to 25 at the same input, the difference at this input would be 75, and so the distance between the functions would be at least 75. Once again, under some reasonable assumptions on the functions, if this distance is 0, the functions are the same.

Here's a demonstration where you can adjust some simple functions and see both of these distances change.

Largest Diff: 3.00Area: 15.00

Figure 4: You can adjust a few points on the orange and green function to explore how our notions of distance change.

A question for you to ponder: if the largest difference distance is small, what (if anything) can you say about the area distance? If the area distance is small, what (if anything) can you say about the largest difference distance?

What's in a Distance?

We've now seen several examples of distances. However, we haven't yet defined precisely how all of our examples fit into the same definition.

With these examples at the forefront of our minds, I'd now like to tie them together with a mathematical definition of a distance function.

Suppose you have a set of objects. These objects could be points, words, functions, or any other collection of things that you'd like to be able to describe in terms of proximity. A distance function for this set is function that takes any two objects, returns a number, and has the following properties:

The distance function never returns negative numbers. For example, it doesn't make sense to talk about the distance between two cities as having a negative value.
If the distance between two objects is zero, those two objects are the same. For example, regardless of the distance function you choose, the only word with a distance of zero from "mathematics" is the word "mathematics" itself.
The order in which you pass objects to the distance function doesn't matter. For example, the distance between San Francisco and Chicago must be the same as the distance between Chicago and San Francisco.
Adding destinations can never decrease the total length of a trip. In other words, the distance between A and C will never be more than the total distance from A to B and then from B to C, no matter what B you choose. This requirement is known as the triangle inequality, because in a triangle, the sum of any two sides will always exceed the length of the third side.
Intuitively, this should make sense: if you need to fly from San Francisco to Chicago, then barring flight delays, you're never going to get a shorter trip by having a layover somewhere instead of taking a direct flight.

If you're interested, I'll leave it to you to prove that all of the examples we've seen so far qualify as bonafide mathematical distances. (In general, the triangle inequality is the hardest property to verify.)

Prime Time

Now that we have a more formalized definition of distance, we can extend our set of examples beyond where our intuition might naturally take us.

We'll begin with the first (and often only) distance function people typically see: the absolute value function. This function measures the distance between two numbers. For example, 2 and 5 are a distance of 3 from each other; put another way, | 2 - 5 | = | 5 - 2 | = 3.

You can verify for yourself that the absolute value function satisfies all of the properties of a distance function.

But there's another way we could define a distance, one that has to do with the prime factors of whole numbers. The distance is weird looking, but bear with me.

To start, recall that a prime number is a whole number larger than 1, whose only prime factors are 1 and itself. The first few primes are 2, 3, 5, 7, and 11. 4 isn't prime because 4 = 2 × 2. 6 isn't prime because 6 = 2 × 3.

Let's choose a prime number, say 3. Our new distance function will take any two whole numbers, calculate their difference, and determine the maximum power of 3 that divides this difference. The distance is then defined as the reciprocal of three to this maximal power. (In order to have a true distance, we define it to equal zero if the difference equals zero.)

It's a mouthful, right? Let's look at an example. Imagine that we're trying to find the distance between the numbers 11 and 2. The difference is 9, which is divisible by 3 two times. With our new notion of distance, then, we'd say that the distance between 11 and 2 is 3^-2 = 1/9, or around 0.111.

For a given prime p, this distance is called the p-adic distance.

Here's a demonstration that will allow you to play around with these ideas in more detail. You can select a prime and then calculate the p-adic distance between any two whole numbers within the range from -1,000,000 to 1,000,000.

Selected prime: 2

Number 1:

Number 2:

Figure 5: Use this tool to calculating the p-adic difference between any numbers within the range from -1,000,000 to 1,000,000.

Why in the world would you want to do this? I'm afraid a historical motivation would take us too far afield, so if nothing else, just take it as a mathematical curiosity for now.

Instead of calculating these distances one pair at a time, here's a heat map with p-adic distance for every pair of numbers between 1 and 25. As you can see, distances are equal along diagonals that move up and to the right. This is because adding the same value to two numbers has no effect on their p-adic distance: 10 - 2 has a 2-adic distance of 0.125, just as (10 + 5) - (2 + 5) = 15 - 7 does.

Selected prime: 2

Figure 6: A heat chart of p-adic distances for pairs of numbers between 1 and 25. The darker the square, the closer the distance to 1. The lighter the square, the closer the distance to 0. Hover over (or tap on) a square to see the numbers and their distance.

While the chart above can show us several distance calculations at a glance, it also has a flaw. By putting numbers along the top and right in increasing order from 1 to 25, it implicitly relies on the standard notion of distance we've been taught all of our lives. In other words, 25 is far away from 1 because the standard distance (as measured by absolute value) is 24, even though the 2-adic distance between 1 and 25 is much smaller.

Here's another way to visualize numbers based on their p-adic distance, adapted from Visualizing the p-adic Integers and The p-adic integers: General introduction and visual representation. In this representation, numbers form a fractal-like structure. The numbers clump together quickly, so this demonstration only supports seeing a handful of them for the primes 3, 5, and 7:

Number of generations: 1

Selected prime: 3

Figure 7: Another representation of nearby numbers. Choose a prime, and then use the slider to see all numbers less than that prime (black), less then the square of that prime (black and blue), or less than the cube of that prime (black, blue, and orange). Points within a cluster are 'close' in the p-adic sense.

Classifying Closeness

These p-adic distances give us an infinite family of distance functions in addition to the more well-known absolute value function. But they also might seem like little more than a curiosity. Here are some results that I hope will convince you otherwise.

First, it's possible to extend the p-adic distance to differences between rational numbers, not just integers. And when considered in this larger universe of numbers, p-adic distance functions represent emerge more naturally. In fact, according to Ostrowski's Theorem, any distance function on pairs of rational numbers must be equivalent to one of the following:

The trivial distance function (i.e. one that sets the distance between any two numbers equal to 0 if the numbers are the same, and 1 if they are different),
The absolute value function,
The p-adic distance function for some prime p.

In other words, even though p-adic distances fall outside of our everyday experience, mathematically they are of fundamental importance.

Second, since there are infinitely prime numbers, in some sense it is the absolute value function that's the outlier, not the notion of p-adic distance. In fact, there's active research at the intersection of physics and p-adic distance functions (see the resources at the end if you're curious).

Third, there are some fun and counter-intuitive results in geometry that emerge when working with p-adic distance. Here are two to break your brain a little. (These examples are taken from pages 5-6 in Neal Koblitz's p-adic Numbers, p-adic Analysis, and Zeta-Functions.)

Suppose you pick any three points with rational coordinates. As long as these points don't fall along the same line, they'll determine the corners of a triangle. If you calculate the distance between points using a p-adic distance function instead of the usual notion of distance, the triangle will always have two sides of equal length. In other words, every triangle is an isosceles triangle!
You can define a circle of radius r in the same way that you do with well-known Euclidian geometry. Here's where things get weird: if you calculate distance using a p-adic distance function, every point inside the circle is a center of the circle!
Put another way, just like there's no unique shortest path between two points when using the manhattan distance, there's no unique center of a circle when using a p-adic distance.

Conclusion

Distance may not seem like a very interesting topic. After all, it's something we encounter every time we pull up Google Maps. But even in our everyday lives, how we calculate distance depends on context. Are we trying to navigate through crowded city streets, or find the shortest travel distance for an international vacation?

By formalizing our intuition into a definition, we can begin to explore a rich universe of geometries based on different ways to measure distance. Some of these are close to our experience, while others are quite foreign. But despite their differences, they all share the same definition, and the potential for rich and interesting mathematics.

Sources: