Center for Stochastic Dynamics Seminar by James Murphy




Online seminar



James Murphy, assistant professor of mathematics at Tufts University


Geometric Structures in High-Dimensional Data: Graphs, Manifolds, and W_2-Barycenters


The curse of dimensionality renders statistical and machine learning in high dimensions intractable without additional assumptions on the underlying data. We consider geometric models for data that allow for mathematical performance guarantees and efficient algorithms that deflect the curse. The first part of the talk develops a family of data-driven metrics that balance between density and geometry in the underlying data. We consider discrete graph operators based on these metrics, and prove performance guarantees for clustering with them in the spectral graph paradigm. Fast algorithms based on Euclidean nearest-neighbor graphs are proposed and connections with partial differential equations on Riemannian manifolds are developed.

In the second part of the talk, we move away from Euclidean spaces and focus on representation learning of probability distributions in Wasserstein space. We introduce a general barycentric coding model in which data are represented as Wasserstein-2 (W_2) barycenters of a set of fixed reference measures. Leveraging the Riemannian structure of W_2-space, we develop a tractable optimization program to learn the barycentric coordinates when given access to the densities of the underlying measures. We provide a consistent statistical procedure for learning these coordinates when the measures are accessed only by i.i.d. samples. Our consistency results and algorithms exploit entropic regularization of optimal transport maps, thereby allowing our barycentric modeling approach to scale efficiently. Throughout the talk, applications to image and natural language processing demonstrate the efficacy of our geometric methods.

Zoom Meeting ID: 974 8665 9782
Passcode: 423168

