UMR 5672

Seminars: Machine Learning and Signal Processing

On the Robustness of Text Vectorizers

Damien Garreau (MCF at J. A. Dieudonné laboratory, Université Côte d'Azur).
When: 18 April 2023, from 13:00 to 14:00
Participants: Damien Garreau

Speaker: Damien Garreau (MCF at J. A. Dieudonné laboratory, Université Côte d'Azur).

Title: On the Robustness of Text Vectorizers


A fundamental issue in natural language processing pipelines is their robustness with respect to changes in the input. One critical step in this process is the embedding of documents, which transforms sequences of words or tokens into vector representations. In this talk, I will show how popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance. I will present quantitative bounds and demonstrate how the constants involved are affected by the length of the document. This is joint work with Rémi Catellier and Samuel Vaiter.
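
To make the notion of Lipschitz robustness with respect to the Hamming distance concrete, here is a minimal sketch. It is not the construction or the bounds from the talk: it assumes a simplified TF-IDF-style embedding (term frequency scaled by fixed IDF weights), and the helper names (`hamming`, `embed`, `euclidean`), the IDF values, and the two toy documents are all hypothetical. The bound checked at the end is an elementary one: replacing k tokens changes the count vector by at most 2k in l1 norm, so the Euclidean distance between the embeddings is at most 2 * max(idf) * k / T for documents of length T.

```python
import math
from collections import Counter


def hamming(doc_a, doc_b):
    """Count the positions at which two equal-length token lists differ."""
    if len(doc_a) != len(doc_b):
        raise ValueError("Hamming distance requires documents of equal length")
    return sum(a != b for a, b in zip(doc_a, doc_b))


def embed(doc, vocabulary, idf):
    """Term frequency (count / document length) scaled by a fixed IDF weight."""
    counts = Counter(doc)
    return [idf[word] * counts[word] / len(doc) for word in vocabulary]


def euclidean(u, v):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical IDF weights, treated as constants from some reference corpus.
idf = {"the": 0.1, "cat": 1.2, "sat": 1.0, "on": 0.3, "mat": 1.5, "dog": 1.3}
vocabulary = sorted(idf)

doc_a = "the cat sat on the mat".split()
doc_b = "the dog sat on the mat".split()  # one token replaced

k = hamming(doc_a, doc_b)
distance = euclidean(embed(doc_a, vocabulary, idf), embed(doc_b, vocabulary, idf))

# Elementary bound (an assumption of this sketch, not a result from the talk):
# k replacements move the count vector by at most 2k in l1 norm.
bound = 2 * max(idf.values()) * k / len(doc_a)

print(k, distance, bound)
assert distance <= bound
```

In this toy setting the constant in front of the Hamming distance scales like 1/T, so doubling the document length halves the bound, which gives a rough feel for how document length can enter such constants.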

More information:

Talk in room M7 101 (Campus Monod, ENS de Lyon)