UMR 5672

How Smooth Is Attention?

Valérie Castin (PhD student, ENS PSL)

When?	21 January 2025, from 13:00 to 14:00
Where?	M7 101
Participants	Valérie Castin

Abstract: Self-attention and masked self-attention are at the heart of Transformers' outstanding success. Still, our mathematical understanding of attention, in particular of its Lipschitz properties — which are key when it comes to analyzing robustness and expressive power — is incomplete. We provide a detailed study of the Lipschitz constant of self-attention in several practical scenarios, discussing the impact of the sequence length and layer normalization on the local Lipschitz constant of both unmasked and masked self-attention. We complement our theoretical findings with experiments on pretrained and randomly initialized BERT and GPT-2.
If there is time, I will also present our recent work on modelling Transformers as partial differential equations, which makes it possible to study the dynamics of tokens passing through an infinitely deep Transformer.

Website: https://scholar.google.com/citations?user=Qb_cpC8AAAAJ&hl=fr

In Room M7 101, 1st floor, Monod campus, ENSL.
