Statistical foundations of stochastic neighbor embedding
Abstract : Stochastic Neighbor Embedding is a nonlinear dimension reduction method that is widely used for exploratory analysis in data science, and particularly in single-cell genomics. The method was proposed by van der Maaten and Hinton (2008) and is characterized by heuristic arguments and an optimization algorithm, but the statistical foundations of t-SNE have not yet been investigated. In this presentation I will show what kinds of statistical models can be associated with t-SNE, from Markov processes on graphs to multivariate Gaussian models. We propose a hierarchical generative model and investigate the associated estimation strategy, which is based on graph coupling. We will conclude with open questions about the statistical challenges we will investigate in the near future. This is ongoing joint work with Julien Chiquet (AgroParisTech, INRAE), Thibault Espinasse (Univ Lyon 1), Hugues van Assel (X, LBMC), and Francois Gindraud (AgroParisTech, LBMC).
van der Maaten, L. and G. Hinton (2008). Visualizing data using t-SNE. Journal of Machine Learning Research 9(Nov), 2579–260
Contact : https://franckpicard.github.io
Online seminar : https://lpensl.my.webex.com/
Room : webconf3.physique