
Colloquium - 3/18, 3:00PM, Duncan Hall 1064

Michael Trosset

Department of Statistics
Indiana University

"Out-of-Sample Embedding"

Various problems in statistics and machine learning necessitate embedding new objects in an existing Euclidean space without disturbing a previously embedded configuration of points. In machine learning, this activity is widely known as out-of-sample embedding.

If embedding is performed by a method that makes use of Cartesian coordinates, then the out-of-sample problem is easy to formulate and the challenges are purely computational. After reviewing some possibilities, I will turn to the less intuitive case in which embedding is performed by classical multidimensional scaling (CMDS), which recovers principal component representations from Euclidean inner products. This is the case that has been emphasized in the machine learning literature.
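For readers unfamiliar with CMDS, the following is a minimal sketch of the textbook procedure, not code from the talk. It assumes the input is a matrix of squared Euclidean distances; the function name `classical_mds` and the variable names are illustrative. The squared distances are double-centered to obtain inner products about the centroid, and the leading eigenpairs of that matrix give the principal component coordinates.

```python
import numpy as np

def classical_mds(D2, d):
    """Embed n objects in R^d from an n-by-n matrix of squared distances."""
    n = D2.shape[0]
    # Centering matrix: J = I - (1/n) * ones * ones^T
    J = np.eye(n) - np.ones((n, n)) / n
    # Double centering turns squared distances into inner products
    # taken about the centroid of the configuration.
    B = -0.5 * J @ D2 @ J
    # eigh returns eigenvalues in ascending order; keep the d largest.
    evals, evecs = np.linalg.eigh(B)
    idx = np.argsort(evals)[::-1][:d]
    # Clip tiny negative eigenvalues that arise from rounding.
    lam = np.clip(evals[idx], 0.0, None)
    return evecs[:, idx] * np.sqrt(lam)

# Usage: recover a planar configuration (up to rotation/reflection).
rng = np.random.default_rng(0)
P = rng.normal(size=(10, 2))
D2 = ((P[:, None, :] - P[None, :, :]) ** 2).sum(axis=-1)
X = classical_mds(D2, 2)
```

The recovered coordinates `X` reproduce the original interpoint distances but not the original axes, which is why adding a new point to such a configuration is less direct than in a coordinate-based method.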

I will review several ideas for out-of-sample extensions of CMDS, then describe a principled solution in which the out-of-sample extension is formulated as an unconstrained nonlinear least squares problem. The objective function is a fourth-order polynomial, easily minimized by standard gradient-based methods for numerical optimization. More importantly, this formulation provides deeper insight into what earlier proposals accomplish.
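One way to read this formulation can be sketched as follows. This is an assumption-laden illustration, not necessarily the speaker's exact method. It assumes the original configuration `X` is centered at the origin, that the new object's squared distances `a2` to the original objects are available, and that the goal is to choose coordinates `y` whose inner products with the fixed points, and with itself, match those implied by the distances. All function names are hypothetical. The objective below is a fourth-order polynomial in `y`, and a simple gradient descent with backtracking stands in for a standard gradient-based optimizer.

```python
import numpy as np

def out_of_sample_inner_products(D2, a2):
    """Inner products of the new point with the old points (b) and with
    itself (beta), centered at the centroid of the original objects."""
    row_mean, grand_mean = D2.mean(axis=1), D2.mean()
    b = -0.5 * (a2 - a2.mean() - row_mean + grand_mean)
    beta = a2.mean() - 0.5 * grand_mean
    return b, beta

def objective(y, X, b, beta):
    # Squared error in the inner products; quartic in y via (beta - y.y)^2.
    r, s = b - X @ y, beta - y @ y
    return 2.0 * (r @ r) + s * s

def gradient(y, X, b, beta):
    r, s = b - X @ y, beta - y @ y
    return -4.0 * (X.T @ r) - 4.0 * s * y

def embed_new_point(X, D2, a2, max_iter=500, tol=1e-10):
    """Place one new point without moving the rows of X."""
    b, beta = out_of_sample_inner_products(D2, a2)
    # Start from the linear least squares fit of X y to b.
    y = np.linalg.lstsq(X, b, rcond=None)[0]
    for _ in range(max_iter):
        g = gradient(y, X, b, beta)
        if np.linalg.norm(g) < tol:
            break
        f, t = objective(y, X, b, beta), 1.0
        # Backtracking (Armijo) line search along the negative gradient.
        while t > 1e-12 and objective(y - t * g, X, b, beta) > f - 1e-4 * t * (g @ g):
            t *= 0.5
        if t <= 1e-12:
            break
        y = y - t * g
    return y

# Usage: a noise-free example in which the new point is exactly recoverable.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
X -= X.mean(axis=0)                      # centered original configuration
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
y_true = np.array([0.5, -1.0])
a2 = ((X - y_true) ** 2).sum(axis=1)     # squared distances to the new point
y_hat = embed_new_point(X, D2, a2)
```

When the dissimilarities are not exactly Euclidean in the chosen dimension, the linear starting point and the minimizer of the quartic objective generally differ, which is where the nonlinear term matters.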

Department of Computational and Applied Mathematics
6100 Main MS-134   Houston, TX 77005   713.348.4805
