April 9th and 10th

Description: 

Real-world networks often exhibit a hidden structure, which we wish to infer. For example, many networks exhibit community structure. Inferring communities is a valuable tool in network analysis; community detection has been used in a wide array of applications including recommender systems (e.g. Netflix), webpage sorting, fraud detection, and neurobiology. Inspired by these real-world networks, researchers in probability, statistics, information theory, and machine learning have studied structure recovery problems in random graph models. In addition to community detection, problems of this type include graph matching, recovery of planted subgraphs, and inference of graph properties. This workshop will bring together leading experts in the field, and both local and external participants, with the goal of sharing the latest advances and launching new collaborations.

Keynote Speakers:
 

Logistics:

  • Date: April 9th–10th
  • In-person Location: Northwestern University, Mudd Library 3rd floor, 2233 Tech Drive, Evanston

Schedule:

The workshop will be held in Mudd 3514, in the Northwestern Computer Science Department.

  • Tuesday, April 9

    8:45 – Breakfast
    9:20 – Opening remarks
    9:30 – Tselil Schramm (Stanford University) on “Spectral clustering in high-dimensional Gaussian mixture block models”
    10:30 – Break
    11:00 – Chao Gao (University of Chicago) on “Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials”
    11:30 – Erasmo Tani (University of Chicago) on “Learning partitions with bounded-error same-cluster oracle”
    12:00 – Lunch
    1:30 – Jiaming Xu (Duke University) on “Recent advances on random graph matching”
    2:30 – Lightning talks
    3:15–4:30 – Poster session in Mudd 3501 and nearby hallway

  • Wednesday, April 10

    8:45 – Breakfast
    9:30 – Elchanan Mossel (MIT) on “Some progress on Learning and Optimization in Ising Models”
    10:30 – Break
    11:00 – Open problem session
    12:00 – Lunch (and continued discussion of open problems)
    1:30 – Alex Wein (University of California, Davis) on “Fine-Grained Extensions of the Low-Degree Testing Framework”
    2:30 – Break
    3:00 – Cong Ma (University of Chicago) on “Top-K Ranking with a Monotone Adversary”
    3:30 – Concluding remarks, reception

Titles and Abstracts:

Tselil Schramm

Title: Spectral clustering in high-dimensional Gaussian mixture block models

Abstract: The Gaussian mixture block model is a simple generative model for networks: to generate a sample, we associate each node with a latent feature vector sampled from a mixture of Gaussians, and we add an edge between nodes if and only if their feature vectors are sufficiently similar. The different components of the Gaussian mixture represent the fact that there may be several types of nodes with different distributions over features; for example, in a social network, each component may represent a distinct community with its own distribution over attributes. In this talk I will discuss recent results on the performance of spectral clustering algorithms on networks sampled from high-dimensional Gaussian mixture block models, where the dimension of the latent feature vectors grows as the size of the network goes to infinity. Our results merely begin to sketch out the information-computation landscape for clustering in these models, and I will make an effort to emphasize open questions.

Based on joint work with Shuangping Li.
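As a concrete illustration of the generative process described in the abstract, here is a minimal simulation sketch; the dimension, mixture means, and similarity threshold below are illustrative choices, not parameters from the talk.

```python
import numpy as np

# Minimal sketch of a Gaussian mixture block model with two components.
# All parameter values here are assumptions for illustration only.
rng = np.random.default_rng(0)

n, d = 200, 50                       # number of nodes, latent dimension
labels = rng.integers(0, 2, size=n)  # hidden community of each node
means = np.stack([np.ones(d) / np.sqrt(d), -np.ones(d) / np.sqrt(d)])

# Latent feature vector for each node: its community mean plus Gaussian noise.
X = means[labels] + rng.standard_normal((n, d))

# Add an edge iff the feature vectors are sufficiently similar
# (here: inner product above a threshold tau).
tau = 0.5
A = (X @ X.T >= tau).astype(int)
np.fill_diagonal(A, 0)               # no self-loops
```

Spectral clustering would then estimate `labels` from the top eigenvectors of `A` alone, without observing `X`.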

Bio: Tselil Schramm is an assistant professor of statistics at Stanford University. She obtained her PhD in computer science at UC Berkeley, advised by Prasad Raghavendra and Satish Rao. Her research sits at the intersection of theoretical computer science and statistics, including a variety of topics such as information-computation trade-offs, sum-of-squares algorithms for statistical problems, and random graphs.

 

Chao Gao

Title: Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials

Abstract: Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al. (2015) for both stochastic block model (SBM) and nonparametric graphon estimation. The statistically optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. It is natural to wonder whether such a gap is essential. The computational optimality of the USVT and the existence of a computational barrier in graphon estimation have been long-standing open problems. In this work, we take the first step toward answering them and provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. Specifically, in both SBM and nonparametric graphon estimation, we show that for low-degree polynomial estimators, the estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities, and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery. Finally, we extend our computational lower bounds to sparse graphon estimation and biclustering with additive Gaussian noise, and discuss the optimality of our results.
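The USVT estimator mentioned above is simple to state; the sketch below applies it to a synthetic graphon. The graphon, sample size, and threshold constant are illustrative choices in the spirit of USVT, not values from the talk.

```python
import numpy as np

# Sketch of universal singular value thresholding (USVT) for graphon
# estimation: discard singular values of the adjacency matrix below a
# constant multiple of sqrt(n). All parameters are illustrative.
rng = np.random.default_rng(3)

n = 300
x = rng.random(n)                                # latent positions
f = lambda u, v: 0.2 + 0.5 * np.minimum(u, v)    # an illustrative graphon
P = f(x[:, None], x[None, :])                    # edge probability matrix

A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                      # symmetric adjacency, no loops

U, s, Vt = np.linalg.svd(A)
s_hat = np.where(s >= 2.01 * np.sqrt(n), s, 0.0)  # threshold small singular values
P_hat = np.clip(U @ np.diag(s_hat) @ Vt, 0.0, 1.0)

mse = np.mean((P_hat - P) ** 2)                  # estimation error vs. truth
```

The point of the abstract is that no low-degree polynomial estimator can significantly beat the error rate of this kind of spectral truncation in a wide range of regimes.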

Bio: Chao Gao is Professor of Statistics at the University of Chicago. He received his PhD from Yale University, advised by Harry Zhou. His research lies in nonparametric and high-dimensional statistics, network analysis, Bayes theory, and robust statistics.

 

Erasmo Tani

Title: Learning partitions with bounded-error same-cluster oracle

Abstract: Learning cluster structure is a fundamental task in machine learning. In this talk, we focus on partitioning a finite set. We consider the active setting in which one gains information about the clusters exclusively by asking potentially expensive questions of the form: “Are u and v part of the same cluster?” The main question of interest is: how many queries are needed to guarantee full recovery of the underlying partition if a fixed number of the answers may be erroneous? We will discuss recent progress on the problem, including an algorithm that achieves provably optimal query complexity up to constant factors. Based on joint work with Adela Frances DePavia and Olga Medrano Martín del Campo.
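To make the query model concrete, here is a minimal sketch that recovers a partition using only same-cluster queries, by comparing each item against one representative per discovered cluster. Note this sketch assumes a noiseless oracle; the talk's algorithm additionally tolerates a bounded number of erroneous answers.

```python
# Recover a hidden partition using only same-cluster queries.
# This is the noiseless special case of the query model in the abstract.
def recover_partition(items, same_cluster):
    reps = []      # one representative per discovered cluster
    clusters = []
    for u in items:
        for i, r in enumerate(reps):
            if same_cluster(u, r):   # one oracle query per comparison
                clusters[i].append(u)
                break
        else:                        # u matches no representative: new cluster
            reps.append(u)
            clusters.append([u])
    return clusters

# Usage with a hypothetical hidden ground-truth partition:
truth = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}
oracle = lambda u, v: truth[u] == truth[v]
print(recover_partition(range(5), oracle))  # [[0, 1], [2, 3], [4]]
```

With n items and k clusters this uses O(nk) queries; handling adversarially erroneous answers with optimal query complexity is the substance of the talk.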

Bio: Erasmo Tani is a final-year PhD candidate at the University of Chicago working on the algorithmic foundations of data science.

 

Jiaming Xu

Title: Recent advances on random graph matching

Abstract: Random graph matching aims to recover the hidden vertex correspondence between two random graphs from correlated edge connections. This is a ubiquitous problem arising in a variety of applications across diverse fields such as network privacy, computational biology, computer vision, and natural language processing. The problem is also deep and rich in theory, involving the delicate interplay of algorithms, complexity, and information limits. Recently, extensive efforts have been devoted to the study of matching two correlated Erdős–Rényi graphs, and exciting progress has been made thanks to collective efforts from a wide research community. This talk will present an overview, recent results, and important future directions on this topic. Based on joint work with Cheng Mao (Georgia Tech), Yihong Wu (Yale), and Sophie H. Yu (Stanford).
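The correlated Erdős–Rényi model underlying this line of work can be simulated in a few lines: sample a parent graph, subsample its edges twice, and relabel one copy by a hidden permutation. The parameters below are illustrative.

```python
import numpy as np

# Sketch of the correlated Erdos-Renyi graph matching model: the task is
# to recover the hidden permutation pi from the pair (A, B) alone.
rng = np.random.default_rng(1)
n, p, s = 100, 0.3, 0.9      # vertices, parent edge prob., subsampling prob.

parent = np.triu(rng.random((n, n)) < p, 1)
parent = parent | parent.T   # parent G(n, p) graph, symmetric, no loops

# Two correlated children: keep each parent edge independently w.p. s.
keep_a = np.triu(rng.random((n, n)) < s, 1)
keep_b = np.triu(rng.random((n, n)) < s, 1)
A = parent & (keep_a | keep_a.T)
B0 = parent & (keep_b | keep_b.T)

pi = rng.permutation(n)      # hidden vertex correspondence
B = B0[np.ix_(pi, pi)]       # second graph observed under relabeling
```

Each child is marginally G(n, ps), and pairs of corresponding edges are positively correlated, which is the signal a matching algorithm exploits.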

Bio: Jiaming Xu is an associate professor at the Fuqua School of Business at Duke University. He received a Ph.D. degree from UIUC in 2014, an M.S. degree from UT-Austin in 2011, and a B.E. degree from Tsinghua University in 2009, all in Electrical and Computer Engineering. His research interests include high-dimensional statistics, networks, information theory, convex and non-convex optimization, and queueing theory. He received a Simons-Berkeley Fellowship in 2016 and an NSF Career Award in 2022.

 

Elchanan Mossel

Title: Some progress on Learning and Optimization in Ising Models

Abstract: I will discuss some recent work on learning Ising models from samples, including in dynamic and adversarial settings, along with related problems dealing with optimization in Ising models.

Based on the following papers:

  1. https://arxiv.org/pdf/2311.09197.pdf (with Jason Gaitonde; to appear in STOC 2024)
  2. https://arxiv.org/html/2309.05206v2 (with Zongchen Chen; ITCS 2024)
  3. https://arxiv.org/pdf/2302.10841.pdf (with B. Chin, A. Moitra, and C. Sandon; preprint)
  4. https://arxiv.org/abs/2101.06178 (with A. Moitra and C. Sandon; COLT 2021)
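For readers unfamiliar with the model, a standard way to generate approximate samples from an Ising model (the input to the learning problems above) is Glauber dynamics. The graph, coupling, and temperature below are illustrative assumptions, not taken from the papers.

```python
import numpy as np

# Glauber-dynamics sampler for an Ising model on a random graph:
# repeatedly resample one spin from its conditional distribution
# given its neighbors. Parameters are illustrative.
rng = np.random.default_rng(5)
n, beta, steps = 20, 0.3, 5000

J = np.triu(rng.random((n, n)) < 0.2, 1).astype(float)
J = J + J.T                              # interaction graph, coupling 1 on edges

sigma = rng.choice([-1, 1], size=n)      # initial spin configuration
for _ in range(steps):
    i = rng.integers(n)                  # pick a uniformly random site
    field = beta * (J[i] @ sigma)        # local field at site i
    p_plus = 1 / (1 + np.exp(-2 * field))  # P(sigma_i = +1 | rest)
    sigma[i] = 1 if rng.random() < p_plus else -1
```

Learning the interaction matrix `J` from many such samples, including when the samples come from non-stationary dynamics or are adversarially perturbed, is the theme of the talk.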

Bio: Elchanan Mossel is a professor of mathematics at the Massachusetts Institute of Technology. His primary research fields are probability theory, combinatorics, and statistical inference. Elchanan was the Ph.D. advisor of two of the organizers of this workshop and a postdoc mentor of a third organizer.

Alex Wein

Title: Fine-Grained Extensions of the Low-Degree Testing Framework

Abstract: The low-degree polynomial framework has emerged as a versatile tool for probing the computational complexity of statistical problems by studying the power and limitations of a restricted class of algorithms: low-degree polynomials. Focusing on the setting of hypothesis testing, I will discuss some extensions of this method that allow us to tackle finer-grained questions than the standard approach.

First, for the task of detecting a planted clique in a random graph, we ask not merely when this can be done in polynomial time O(n^c), but seek the optimal exponent c as a function of the clique size. To this end, we consider algorithms that make non-adaptive edge queries and then apply a low-degree test, and we determine the number of queries required. This is joint work with Jay Mardia and Kabir Verchand.

Second, in the spiked Wigner model with any iid spike prior, we seek the precise optimal tradeoff curve between type I and type II error rates. Conditional on an appropriate strengthening of the “low-degree conjecture,” we show that tests based on the spectrum achieve the best possible tradeoff curve among poly-time algorithms, while exponential-time non-spectral tests can do better. This is joint work with Ankur Moitra.

Bio: Alex Wein is an Assistant Professor of Mathematics at the University of California, Davis. He received his PhD from MIT in 2018, advised by Ankur Moitra. His main research interests are in statistical-computational tradeoffs.

 

Cong Ma

Title: Top-K Ranking with a Monotone Adversary

Abstract: In this talk, we address the top-$K$ ranking problem with a monotone adversary. We consider the scenario where a comparison graph is randomly generated and the adversary is allowed to add arbitrary edges. The statistician’s goal is then to accurately identify the top-$K$ preferred items based on pairwise comparisons derived from this semi-random comparison graph. The main contribution of this work is to develop a weighted maximum likelihood estimator (MLE) that achieves near-optimal sample complexity, up to a $\log^2(n)$ factor, where $n$ denotes the number of items under comparison. This is made possible through a combination of analytical and algorithmic innovations. On the analytical front, we provide a refined $\ell_\infty$ error analysis of the weighted MLE that is more explicit and tighter than existing analyses; it relates the $\ell_\infty$ error to the spectral properties of the weighted comparison graph. Motivated by this, our algorithmic innovation is a semidefinite-program-based (SDP-based) approach that reweights the semi-random graph to meet the specified spectral properties. Additionally, we propose a first-order method based on the Matrix Multiplicative Weight Update (MMWU) framework. This method efficiently solves the resulting SDP in nearly-linear time relative to the size of the semi-random comparison graph.
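As background, the observations in top-K ranking are typically pairwise comparisons generated from a parametric model such as Bradley-Terry-Luce (a standard assumption in this literature, not stated in the abstract). The sketch below simulates such data on a random comparison graph and ranks by raw win counts, a crude stand-in for the weighted MLE developed in the talk.

```python
import numpy as np

# Toy top-K ranking data: Bradley-Terry-Luce comparisons on a random
# comparison graph. All parameters are illustrative.
rng = np.random.default_rng(2)
n, K, L = 30, 5, 200                        # items, target top-K, comparisons/pair
theta = np.sort(rng.normal(size=n))[::-1]   # latent scores, item 0 strongest

wins = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < 0.5:              # pair (i, j) absent from graph
            continue
        p_ij = 1 / (1 + np.exp(theta[j] - theta[i]))  # P(i beats j) under BTL
        w = rng.binomial(L, p_ij)
        wins[i, j], wins[j, i] = w, L - w

# Naive ranking by total wins (a crude stand-in for the weighted MLE).
score = wins.sum(axis=1)
top_k = np.argsort(score)[::-1][:K]
```

A monotone adversary may then add extra comparison edges; the talk's contribution is an SDP-based reweighting that keeps the MLE near-optimal despite such additions.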

Bio: Cong Ma is currently an assistant professor in the Department of Statistics at the University of Chicago. He obtained his PhD from Princeton University in 2020, advised by Professor Yuxin Chen and Professor Jianqing Fan. After that, he did a one-year postdoc at UC Berkeley advised by Professor Martin Wainwright. Cong is broadly interested in the mathematics of data science, with a focus on the interplay between statistics and optimization. Recently he is excited about problems arising from transfer learning, reinforcement learning, and low-rank matrix recovery.

 

Poster Session Abstracts

Shenduo Zhang (Georgia Institute of Technology)

Title: Information-Theoretic Thresholds for Planted Dense Cycles

Abstract: We study a random graph model for small-world networks, which are ubiquitous in social and biological sciences. In this model, a dense cycle of expected bandwidth nτ, representing the hidden one-dimensional geometry of vertices, is planted in an ambient random graph on n vertices. For both detection and recovery of the planted dense cycle, we characterize the information-theoretic thresholds in terms of n, τ, and an edge-wise signal-to-noise ratio λ. In particular, the information-theoretic thresholds differ from the computational thresholds established in a recent work for low-degree polynomial algorithms, thereby justifying the existence of statistical-to-computational gaps for this problem.

 

Yuzhou Gu (Institute for Advanced Study)

Title: Community detection in the hypergraph stochastic block model

Abstract: The weak recovery threshold for the 2-community stochastic block model was established by (Massoulié, 2014) and (Mossel-Neeman-Sly, 2015), showing that it matches the Kesten-Stigum (KS) threshold. We consider a generalization of this model to hypergraphs, where every hyperedge can contain more than two vertices. (Pal-Zhu, 2021) established that weak recovery is always possible above the KS threshold. We study the tightness of the KS threshold for the hypergraph stochastic block model (HSBM), and prove a few interesting phase transitions.

 

We show that the KS threshold is tight when

  • r = 3, 4 and any lambda (r is the hyperedge size and lambda is the strength parameter), or
  • r = 5 and lambda >= lambda_5^*, for some lambda_5^* < 0, or
  • r = 6 and |lambda| is small enough, or
  • any r and lambda >= lambda_r^*, for some lambda_r^* <= 1/5.

Unlike the SBM, the KS threshold is not always tight for the HSBM. We show that the KS threshold is not tight when

  • r = 5, 6 and lambda <= lambda_r^**, for some lambda_r^** < 0, or
  • r >= 7 and lambda <= lambda_r^**, for some lambda_r^** > 0.

This suggests the existence of an information-computation gap in these regimes.

Based on joint works with Aaradhya Pandey and Yury Polyanskiy.

Charlie Guan and Xiaochun Niu (Northwestern University)

Title: Exact Recovery in the Geometric Hidden Community Model

Abstract: In this work, we propose a family of label recovery problems on weighted Euclidean random graphs. The vertices of a graph are embedded in $\mathbb{R}^d$ according to a Poisson point process and assigned discrete community labels. Our goal is to infer the vertex labels, given edge weights whose distributions depend on the vertex labels as well as their geometric positions. Our general model provides a geometric extension of popular graph and matrix problems, including submatrix localization and $\mathbb{Z}_2$-synchronization, and includes the Geometric Stochastic Block Model (proposed by Baccelli and Sankararaman) as a special case.

We study the fundamental limits of exact recovery of the vertex labels. Under a mild distinctness-of-distributions assumption, we determine the information-theoretic threshold for exact label recovery in terms of a Chernoff–Hellinger divergence criterion. Impossibility of recovery below the threshold is proven by a unified analysis using a Cramér lower bound. Achievability above the threshold is proven via an efficient two-phase algorithm, where the first phase computes an almost-exact labeling through a local propagation scheme and the second phase refines the labels. The family of models exhibits local-to-global amplification: the information-theoretic threshold is dictated by the performance of the so-called genie estimator, which decodes the label of a single vertex given all the other labels.

Tong Xu (Northwestern University)

Title: Integer Programming for Learning Directed Acyclic Graphs from Non-identifiable Gaussian Models

Abstract: We study the problem of learning directed acyclic graphs from continuous observational data, generated according to a linear Gaussian structural equation model. State-of-the-art structure learning methods for this setting have at least one of the following shortcomings: i) they cannot provide optimality guarantees and can suffer from learning sub-optimal models; ii) they rely on the stringent assumption that the noise is homoscedastic, and hence the underlying model is fully identifiable. We overcome these shortcomings and develop a computationally efficient mixed-integer programming framework for learning medium-sized problems that accounts for arbitrary heteroscedastic noise. We present an early stopping criterion under which we can terminate the branch-and-bound procedure to achieve an asymptotically optimal solution and establish the consistency of this approximate solution. In addition, we show via numerical experiments that our method outperforms three state-of-the-art algorithms and is robust to noise heteroscedasticity, whereas the performance of the competing methods deteriorates under strong violations of the identifiability assumption. The software implementation of our method is available as the Python package micodag.

Taha Ameen (University of Illinois Urbana-Champaign)

Title: Graph Matching with Multiple Graphs

Abstract: We study the fundamental limits of exactly matching m correlated Erdős–Rényi graphs. This problem is well understood when m=2; we study the case m>2. Specifically, we derive a necessary condition for exactly matching all m graphs and present an algorithm for which the same condition is sufficient. This condition is less stringent than that for exactly matching two graphs. When it holds, we show that it is possible to match subsets of vertices among pairs of graphs and then combine the information from each matching so that all vertices are matched across all graphs with high probability. Along the way, we explore an interesting relationship between the degree of a vertex in an Erdős–Rényi graph and its inclusion in the k-core of the graph. The poster is based on joint work with Bruce Hajek.

Anirudh Sridhar (Massachusetts Institute of Technology)

Title: Finding Super-spreaders in Network Cascades

Abstract: Suppose that a cascade (e.g., an epidemic) spreads on an unknown graph, and only the infection times of vertices are observed. What can be learned about the graph from the infection times caused by multiple distinct cascades? Most of the literature on this topic focuses on the task of recovering the entire graph, which requires $\Omega(\log n)$ cascades for an n-vertex bounded degree graph. Here we ask a different question: can the important parts of the graph be estimated from just a few (i.e., a constant number of) cascades, even as n grows large?

In this work, we focus on identifying super-spreaders (i.e., high-degree vertices) from infection times caused by a Susceptible-Infected process on a graph. Our first main result shows that vertices of degree greater than $n^{3/4}$ can indeed be estimated from a constant number of cascades. Our algorithm for doing so leverages a novel connection between vertex degrees and the second derivative of the cumulative infection curve. Conversely, we show that estimating vertices of degree smaller than $n^{1/2}$ requires at least $\log(n) / \log \log (n)$ cascades. Surprisingly, this matches (up to $\log \log n$ factors) the number of cascades needed to learn the entire graph if it is a tree. Based on joint work with Elchanan Mossel.
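A minimal discrete-time simulation of the Susceptible-Infected observations described above; the contact graph, seed vertex, and transmission probability are illustrative assumptions.

```python
import numpy as np

# Discrete-time Susceptible-Infected cascade on a random graph, recording
# the infection times that serve as the observations in this problem.
rng = np.random.default_rng(4)
n, q = 200, 0.2                         # vertices, per-edge transmission prob.

A = np.triu(rng.random((n, n)) < 0.05, 1)
A = A | A.T                             # Erdos-Renyi contact graph

t_inf = np.full(n, -1)                  # infection times; -1 = never infected
t_inf[0] = 0                            # seed vertex infected at time 0
frontier = [0]
t = 0
while frontier:
    t += 1
    new = []
    for u in frontier:                  # each newly infected vertex tries to
        for v in np.flatnonzero(A[u]):  # infect each susceptible neighbor
            if t_inf[v] == -1 and rng.random() < q:
                t_inf[v] = t
                new.append(v)
    frontier = new

# Cumulative infection curve; per the result above, its second derivative
# carries information about vertex degrees.
curve = [int(((t_inf >= 0) & (t_inf <= s)).sum()) for s in range(t + 1)]
```

The estimation task is to identify the high-degree vertices from a constant number of such `t_inf` vectors, without observing `A`.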

Suqi Liu (Harvard University)

Title: Random Geometric Graph Alignment with Graph Neural Networks

Abstract: We characterize the performance of graph neural networks for graph alignment problems in the presence of vertex feature information. More specifically, given two graphs that are independent perturbations of a single random geometric graph with noisy sparse features, the task is to recover an unknown one-to-one mapping between the vertices of the two graphs. We show that, under certain conditions on the sparsity and noise level of the feature vectors, a carefully designed one-layer graph neural network can, with high probability, recover the correct alignment between the vertices with the help of the graph structure. We also prove that our conditions on the noise level are tight up to logarithmic factors. Finally, we compare the performance of the graph neural network to directly solving an assignment problem on the noisy vertex features. We demonstrate that when the noise level is at least a constant, this direct matching fails to achieve perfect recovery, while the graph neural network can tolerate a noise level growing as fast as a power of the size of the graph. Based on joint work with Morgane Austern.

Organizers:
