Metric Embeddings, High Dimensional Geometry, Vector Databases

Logistics

Date: Friday, November 14th 2025

Location: Northwestern University, 3rd floor Mudd library (Room: 3514) 2233 Tech Dr, Evanston, IL 60208.

Parking: For those driving to the workshop, attendees can park in the North Campus garage 2311 N Campus Dr #2300, Evanston, IL 60208. https://maps.northwestern.edu/txt/facility/646 You’ll exit the garage on the opposite side from the car entrance and you’ll see Mudd Library directly in front of you across a grassy lawn area. Take the elevator to your right in the library lobby to the 3rd floor.

Registration

Description:

This one-day workshop, which is part of the Fall 2025 IDEAL Special Program on High Dimensional and Complex Data Analysis, will explore the interplay between metric embeddings, high-dimensional geometry, and algorithmic problems in data science and theoretical computer science. Topics include geometric and probabilistic methods for understanding metric spaces, embeddings with low distortion, and the implications of high-dimensional phenomena for efficient computation and data representation. The workshop will bring together researchers from mathematics, computer science, and related areas to foster exchange across these connected fields.

Schedule:

Time | Event

9:00am: Breakfast

9:20am: Opening remarks

9:30am – 10:15 am: Michal Derezinski (University of Michigan) Optimal Subspace Embeddings for Faster Numerical Linear Algebra

10:15 am – 10:45 am Break

10:45am – 11:30 am: Sepideh Mahabadi (Microsoft Research) Graph-Based Algorithms for Diverse Similarity Search

11:30 am – 11:35 am: Break

11:35 am – 12:00 pm: Prashanti Anderson (MIT) Additive Approximation Schemes for Low-Dimensional Embeddings

12:00 pm – 1:30 pm: Lunch

1:30 pm – 2:15 pm: Christopher Musco (NYU) Navigability and Graph-based Vector Search

2:15 pm – 2:45 pm: Break

2:45 pm – 3:10 pm: Vaidehi Srinivas (Northwestern University) The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound

3:10 pm – 3:15 pm: Break

3:15 pm – 4:00 pm: Konstantin Tikhomirov (CMU) Embedding graphs into metric spaces

Abstracts:

Speaker: Michal Derezinski (University of Michigan)

Title: Optimal Subspace Embeddings for Faster Numerical Linear Algebra

Abstract: A subspace embedding is a random linear transformation that maps high-dimensional vectors to a lower dimension that with high probability preserves the norms of all vectors in a low-dimensional subspace up to a small relative error. Due to their simplicity and computational efficiency, subspace embeddings have been the central dimensionality reduction technique in numerical linear algebra problems such as least squares regression and low-rank approximation for the past two decades. Yet, despite their popularity, there remains a fundamental theory-practice gap in our understanding of subspace embeddings, which is characterized by a conjecture of Nelson and Nguyen (FOCS 2013) about the optimal embedding dimension that can be attained efficiently. In this talk, I will discuss recent efforts in closing this theory-practice gap by analyzing the universality properties of sparse random matrices, which has led to resolving the conjecture up to sub-polylogarithmic factors (SODA 2026). Based on joint works with Shabarish Chenakkod, Xiaoyu Dong, and Mark Rudelson.

Speaker: Sepideh Mahabadi (Microsoft Research)

Title: Graph-Based Algorithms for Diverse Similarity Search

Abstract: Nearest neighbor search is a fundamental data structure problem with many applications in machine learning, computer vision, recommendation systems and other fields. Although the main objective of the data structure is to quickly report data points that are closest to a given query, it has long been noted (Carbonell and Goldstein, 1998) that without additional constraints the reported answers can be redundant and/or duplicative. This issue is typically addressed in two stages: in the first stage, the algorithm retrieves a (large) number r of points closest to the query, while in the second stage, the r points are post-processed and a small subset is selected to maximize the desired diversity objective. Although popular, this method suffers from a fundamental efficiency bottleneck, as the set of points retrieved in the first stage often needs to be much larger than the final output.

In this paper we present efficient algorithms for approximate nearest neighbor search with diversity constraints that bypass this two stage process. Our algorithms are based on popular graph-based methods, which allows us to “piggy-back” on the existing efficient implementations. These are the first graph-based algorithms for nearest neighbor search with diversity constraints. For data sets with low intrinsic dimension, our data structures report a diverse set of k points approximately closest to the query, in time that only depends on k and \log \Delta, where \Delta is the ratio of the diameter to the closest pair distance in the data set. This bound is qualitatively similar to the best known bounds for standard (non-diverse) graph-based algorithms. Our experiments show that the search time of our algorithms is substantially lower than that using the standard two-stage approach.

This is a joint work with Piyush Anand, Piotr Indyk, Ravishankar Krishnaswamy, Vikas Raykar, Kiran Shiragur, and Haike Xu.

Speaker: Prashanti Anderson (MIT)

Title: Additive Approximation Schemes for Low-Dimensional Embeddings

Abstract: We consider the task of fitting low-dimensional embeddings to high-dimensional data. In particular, we study the k-Euclidean Metric Violation problem (k-EMV), where the input is a vector of non-negative pairwise dissimilarities D between data points and the goal is to find a k-dimensional Euclidean embedding of the data whose pairwise distances vector X is as close as possible (in ℓ₂ error) to the input dissimilarities vector D. Although k-EMV has been studied in the statistics community for over 70 years, under the name “multi-dimensional scaling”, prior to our work there were no known efficient approximation algorithms for k > 1. We provide the first polynomial-time additive approximation scheme for k-EMV. Our key technical contribution is a new analysis of correlation rounding for Sherali-Adams / Sum-of-Squares relaxations, tailored to low-dimensional embeddings. Based on joint work with A. Bakshi and S. Hopkins to appear at SODA 2026.

Speaker: Christopher Musco (NYU)

Title: Navigability and Graph-based Vector Search

Abstract: In the last few years, we have seen a surge of interest in industry and academia around so-called “graph-based” methods for high-dimensional near-neighbor search. Such methods include HNSW, DiskANN, and many others. While these methods differ in details, at a high level, they all find candidate near-neighbors by performing greedy search in a sparse graph constructed over the underlying vector data. While this approach works extremely well in practice, our theoretical understanding of why graph-based methods perform well is limited. In this talk, I will discuss recent efforts to place graph-based methods on firmer theoretical ground by studying “graph navigability”, a widely adopted minimal necessary condition to ensure that greedy search finds good near-neighbors in the search graph. I will discuss existence results (when does a dataset admit a sparse navigable graph?) and algorithmic results (can we efficiently construct near-optimal navigable graphs?). I will also discuss open questions on navigability and graph-based search in general. The talk will be based on papers that will appear at NeurIPS 2025 and SODA 2026.

Speaker: Vaidehi Srinivas (Northwestern University)

Title: The Burer-Monteiro SDP method can fail even above the Barvinok-Pataki bound

Abstract: Semidefinite programs (SDPs) are powerful algorithmic tool with wide-ranging applications in combinatorial optimization, control theory, machine learning, and operations research. They are convex problems, so they can be solved in polynomial time. However, in practice, algorithms with worst-case guarantees face a large memory bottleneck, so SDPs are usually solved using the low-memory non-convex Burer-Monteiro method. Previous work has given guarantees for the Burer-Monteiro method in the paradigm of smoothed analysis. In this work, we show that the Burer-Monteiro method can fail in the worst-case, justifying the use of beyond worst-case paradigms like smoothed analysis. We will discuss the techniques used to reason about the optimization landscape, which harness the geometry of the problem quite differently than previous approaches. We hope that these techniques can be extended more broadly to other important non-convex optimization problems. This is joint work with Liam O’Carroll and Aravindan Vijayaraghavan.

Speaker: Konstantin Tikhomirov (CMU)

Title: Embedding graphs into metric spaces

Abstract: I will discuss some recent results regarding embeddings of random regular graphs of bounded degree into metric spaces, including resolution of a problem of Kleinberg in which the target metric space is itself a random graph. Based on joint works with D.Altschuler, P.Dodos, and K.Tyros.

Organizers:

Metric Embeddings, High Dimensional Geometry, Vector Databases

Join Our Newsletter

Success!

Special Program Announcement

Click here to view the exciting series of workshops, courses, seminars and other activities!