
On the Robustness of Text Vectorizers

Damien Garreau (MCF at J. A. Dieudonné laboratory, Université Côte d'Azur).
When: Apr 18, 2023, from 01:00 to 02:00
Attendees: Damien Garreau

Abstract:

A fundamental issue in natural language processing pipelines is their robustness with respect to changes in the input. One critical step in this process is the embedding of documents, which transforms sequences of words or tokens into vector representations. In this talk, I will show how popular embedding schemes, such as concatenation, TF-IDF, and Paragraph Vector (a.k.a. doc2vec), exhibit robustness in the Hölder or Lipschitz sense with respect to the Hamming distance. I will present quantitative bounds and demonstrate how the constants involved are affected by the length of the document. This is joint work with Rémi Catellier and Samuel Vaiter.
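
As a rough illustration of the kind of statement in the abstract, a vectorizer φ is Lipschitz with respect to the Hamming distance if ‖φ(d) − φ(d')‖ ≤ C · d_H(d, d') for all documents d, d' of equal length. The sketch below is not taken from the talk: the toy corpus, the smoothed IDF formula, the L2 normalisation, and all function names are assumptions chosen for illustration. It changes one token of a document and compares the Hamming distance with the Euclidean distance between the two TF-IDF vectors.

```python
import math
from collections import Counter


def hamming(doc_a, doc_b):
    """Number of positions at which two equal-length token lists differ."""
    if len(doc_a) != len(doc_b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(a != b for a, b in zip(doc_a, doc_b))


def tfidf(doc, vocab, idf):
    """L2-normalised TF-IDF vector of a token list over a fixed vocabulary."""
    counts = Counter(doc)
    raw = [counts[w] * idf[w] for w in vocab]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm > 0 else raw


def euclidean(u, v):
    """Euclidean distance between two vectors of the same dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical toy corpus, used only to fit the IDF weights.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "a cat and a dog".split(),
]
vocab = sorted({w for doc in corpus for w in doc})
n_docs = len(corpus)

# Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1.
doc_freq = {w: sum(w in doc for doc in corpus) for w in vocab}
idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in vocab}

# Perturb a single token of the first document.
original = corpus[0]
perturbed = list(original)
perturbed[1] = "dog"

d_h = hamming(original, perturbed)
d_e = euclidean(tfidf(original, vocab, idf), tfidf(perturbed, vocab, idf))
print(f"Hamming distance: {d_h}, TF-IDF distance: {d_e:.3f}")
```

Repeating the experiment with longer documents and more replaced tokens gives an empirical feel for how the ratio between the two distances varies with document length, which is the dependence the abstract says the bounds quantify.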
 

More information: https://sites.google.com/view/damien-garreau/

Talk in room M7 101 (Campus Monod, ENS de Lyon)