Known Closest Pair Algorithms are nearly optimal, even in approximation
Algorithms, Reductions, Complexity, Error Correcting Codes, Merlin-Arthur Protocols, Theory, Fun!
The Closest Pair (CP) Problem is defined as follows: given two sets of N vectors each, find a pair of vectors, one vector from the first set and another vector from the second set, which are as close to each other as possible in Euclidean space.1
Example
Imagine having a bunch of 2D vectors in Euclidean space, and the goal is to find the two which are closest to each other:
In the image above, the two red points are the ones that are closest to each other.2
Why Study This Problem?
This problem is a fundamental problem in computer science and is related to computational geometry, matching problems, computational biology and more.
As if this weren’t enough, in the field of fine-grained complexity we have been finding computational connections between problems in P. The CP problem, for instance, has been shown to be related to (at least) the following problems: Nearest Neighbor Search, Edit Distance, Closest Pair in other metrics, Furthest Pair, etc.
In this two-part article…
we will explore how to show that the existing algorithms known for Closest Pair (CP) are (nearly) optimal. This is considered an advanced topic in computer science, but fear not: I will keep things at surface level as much as possible, so even if you are not familiar with these kinds of works, you should be able to get something new out of this article. Moreover, whenever certain aspects of a prerequisite (listed below) are required, I will give a brief introduction to what is needed to keep us motivated. In fact, for today’s article you won’t need any of the prerequisites. I’ve decided to split this into two articles in the interest of keeping the information digestible and short.
Helpful (but not required) prerequisites*:
*for this article today, you won’t need any of the prerequisites.
The notion of a reduction (for example, like those used in the P vs NP framework).
Error Correcting Codes (ECC): We do not require familiarity with any specific error correcting code for the sake of this article.3
Communication Protocols, more specifically Merlin-Arthur (MA) Protocols.
By the end of this article series, you will have:
Learned some important results about CP, a widely studied computer science problem.
Seen an example of a reduction in fine-grained complexity, a field of theoretical computer science.
Seen how error correcting codes (ECC) can be useful in proving hardness of approximation.
Gained an intuitive understanding of a line of work, including my paper titled “Finer-grained Reductions in Fine-grained Hardness of Approximation”.
This article is inspired by my research during my master’s degree, under the supervision of Prof. Noga Ron-Zewi. Together we published a paper titled “Finer-grained Reductions in Fine-grained Hardness of Approximation”, which was accepted to ICALP 2024. While we won’t go into the details of my paper today, as that would make this post too long, I hope this post introduces new people to the topic and gives other folks the motivation to read the paper. Stay tuned for the next post for more information!
Vector Notation
Before approaching the problem, let us introduce some notation.
Dimension of a vector
The dimension of a vector, usually denoted d, is the number of entries in this vector.
Example of vectors with dimension d=2
Example of vectors with dimension d=5
Inner Product between two vectors
A related notion we need is the Inner Product (IP) between two vectors, which is simply defined as the sum of the products of their corresponding entries: a·b = a₁·b₁ + a₂·b₂ + … + a_d·b_d.
Example
We compute the inner product between the two 5-dimensional vectors given in a previous example:
Thus, the inner product between these two vectors is 1. Had the inner product been equal to 0, we would call vectors a and b orthogonal.
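The original example vectors appear only in an image, so here is a small sketch using hypothetical 5-dimensional vectors chosen to reproduce the same result (an inner product of 1):

```python
# Hypothetical vectors standing in for the ones shown in the image above;
# they are chosen so that the inner product comes out to 1, as in the text.
a = [1, 0, 1, 0, 1]
b = [1, 1, 0, 1, 0]

def inner_product(a, b):
    """Sum of the products of corresponding entries."""
    return sum(x * y for x, y in zip(a, b))

print(inner_product(a, b))  # → 1
```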
Euclidean Distance between two vectors
Recall that the distance between two vectors, say vector a and vector b, in Euclidean space is given by the square root of the sum of their squared entry-wise differences: ‖a−b‖ = √((a₁−b₁)² + (a₂−b₂)² + … + (a_d−b_d)²).
Example
We compute the distance between the two 5-dimensional vectors given in a previous example:
Thus, the distance between vector a and vector b is equal to 2.
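As before, the example vectors live in an image, so here is a sketch with hypothetical 5-dimensional vectors chosen so that the distance comes out to 2, matching the text:

```python
import math

# Hypothetical vectors (the article's originals are shown in an image);
# the entry-wise differences are (0, -1, 1, -1, 1), so the squared sum is 4.
a = [1, 0, 1, 0, 1]
b = [1, 1, 0, 1, 0]

def euclidean_distance(a, b):
    """Square root of the sum of squared entry-wise differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean_distance(a, b))  # → 2.0
```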
Approaching the problem, even in approximation
The most straightforward algorithm to solve CP is simply to go through all possible pairs (one vector from the first set, one from the second), compute their distance from one another, and output the pair with the smallest distance. Since there are N·N = N² such pairs, this algorithm takes O(N²·d) time.
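The brute-force approach above can be sketched in a few lines (the toy vectors below are made up for illustration):

```python
import math

def euclidean_distance(a, b):
    """Distance between two d-dimensional vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest_pair(A, B):
    """Brute force over all N*N cross pairs: O(N^2 * d) time."""
    best_pair, best_dist = None, math.inf
    for a in A:
        for b in B:
            d = euclidean_distance(a, b)
            if d < best_dist:
                best_pair, best_dist = (a, b), d
    return best_pair, best_dist

# Toy instance: one vector from A and one from B per output pair.
A = [(0, 0), (5, 5), (2, 1)]
B = [(9, 9), (2, 2), (7, 0)]
pair, dist = closest_pair(A, B)
print(pair, dist)  # → ((2, 1), (2, 2)) 1.0
```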
You might be thinking: well, that’s not too bad; we can solve CP in polynomial time. However, in fine-grained complexity we are interested in pinning down the precise polynomial running time required to solve a problem. So we are interested in showing, for example, that we can’t do much better than N² time.
We will be working with the version of the problem where the dimension is not too large, namely d=O(log(N)).
In fact, to beat the bottleneck of N² time, we will even allow ourselves to solve this problem only in approximation (Approximate Closest Pair, Apx-CP). This means we do not require the output vectors to be the actual closest pair, but allow outputting any pair of vectors that is “close enough” to the actual closest pair. To be more precise, if the closest pair has distance 2, like in the example above, we are allowed to output any pair of vectors whose distance is anywhere between
2, and
(1+δ)·2
For some δ>0, which we are required to set.
This may sound like a complication, but take a moment to appreciate why it actually gives us hope to solve CP faster in this version:
Before this complication, we were allowed to output only the pair of vectors with distance precisely 2 (the actual closest pair).
After this complication, we are allowed to output more pairs of vectors. These pairs may have the optimal distance of 2, but may also have a slightly worse distance, no larger than (1+δ)·2.
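The acceptance criterion above is simple to state in code; this sketch just checks whether a candidate pair’s distance falls within the allowed (1+δ) factor of the optimum:

```python
def acceptable_for_apx_cp(dist, opt, delta):
    """A returned pair is valid for Apx-CP if its distance is at most
    (1 + delta) times the optimal (closest-pair) distance."""
    return dist <= (1 + delta) * opt

# With optimal distance 2 (as in the example) and delta = 0.1:
print(acceptable_for_apx_cp(2.0, 2, 0.1))  # → True  (the actual closest pair)
print(acceptable_for_apx_cp(2.2, 2, 0.1))  # → True  (slightly worse, still allowed)
print(acceptable_for_apx_cp(2.5, 2, 0.1))  # → False
```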
Known Algorithms for Approximate Closest Pair
Approximate Closest Pair (Apx-CP) can be solved faster than the naïve N² algorithm. If we wish to compute the closest pair in O(N^(2−ε)) time, then we have algorithms achieving an approximation of δ ≈ ε³.
I won’t delve into the details of how this algorithm works, but I invite you to read about it in the paper “Polynomial Representations of Threshold Functions and Algorithmic Applications”.
Our Results
We show that this algorithm is nearly optimal. More precisely, we prove that in order to achieve the (smaller, closer-to-optimal) approximation of δ ≈ ε⁴, we require at least N^(2−ε) time, which is very close to the current known algorithm with δ ≈ ε³.
Future Post
In the interest of keeping this article digestible and short, I’ve decided to split it up into two articles. In the next article, I will show more details of such results.
Stay tuned!
Conclusion
In this article, we explored the fundamental Closest-Pair Problem. By introducing the basics of vector notation, inner products, and Euclidean distances, we've laid the groundwork for tackling this problem. We've also touched upon the concept of approximate solutions, demonstrating why allowing slight deviations can lead to more efficient algorithms. As we conclude this first part, we should take a moment to recognize the progress made in algorithm design for the CP problem. In the next article, I will show the main ideas required for proving such results.
Questions & Discussion
Feel free to comment to discuss anything together!
In literature, this is usually referred to as the “Bichromatic Closest Pair Problem” because we have two sets of vectors. The “Monochromatic Closest Pair Problem” has only one set of vectors.
Again, as per the first footnote, recall that in the CP problem we have two sets of vectors. This means some points in the illustration belong to the first set and the rest belong to the second set. Note that in this image the two red points (the closest pair of vectors) belong to different sets.
Although knowing specific error correcting codes is required to completely understand how previous works achieve their results.