
I’m an AI researcher at HuggingFace.
Between 2023 and 2025, I led our evaluation efforts and collabs (on LLMs/agents); my team’s best-known projects include lighteval, the Evaluation Guidebook, and the (former) Open LLM Leaderboard. We also built a couple of evaluations (including GAIA, with Meta) and helped around 50 teams build their own.
I enjoy programming, making science open and understandable, books, and delicious food. My motto would likely be: “So much to do, so little time”.
You’ll find me as clefourrier over the web (Twitter, LinkedIn), or you can reach me at myfirstname at 🤗 dot co. I’m currently on sabbatical till Dec 2026 to hike and do fun non-AI stuff, so mostly unavailable! Open to both collabs and mentoring, within available bandwidth.
2026: ⭐ On sabbatical!
12/2025: ⚙️ Release: LLM Evaluation Guidebook v2
11/2025: 🎓 Presentation to the UVT FI team seminar: Panorama of LLM evaluations, Winter 2025
11/2025: 🎓 PhD Examiner for the thesis of Grgur Kovac: congrats!
10/2025: 📝 Blog/Book: The Smol Training Playbook: The Secrets to Building World-Class LLMs ✨
10/2025: ⭐ 2025’s “AI 100” by H2O.ai: Top AI Leaders Driving Real-World Impact
10/2025: 🗞️ Nature: AI bots wrote and reviewed all papers at this conference
09/2025: 🎓 Invited to the Wallenberg Advanced Scientific Forum 2025: Measuring What Matters: Evaluation as a Driver of Generative AI
09/2025: 🎧 France Culture, La Science CQFD: AI evaluation: blowing into the algo-test
08/2025: 🗞️ MITTR: GPT-5 is here, now what?
07/2025: 📜 ACL 2025: La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
06/2025: 🎧 Underscore: AI has quietly reached a historic milestone ✨
05/2025: 🎓 Reviewer for ACL 2025
04/2025: 🗞️ Business Insider: Figuring out which AI model is right for you is harder than you think
04/2025: 🗞️ VentureBeat: Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data
04/2025: 📜 COLM 2025: YourBench: Easy Custom Evaluation Sets for Everyone
03/2025: 🎤 Keynote for the CNRS NLP working group: Panorama of LLM evaluations ✨
03/2025: 📝 Blog: Fixing the Open LLM Leaderboard with Math-Verify
03/2025: 🗞️ Epsiloon Magazine: AI: the ultimate quiz
02/2025: 📺 France 24 TV: Tech24
02/2025: ⭐ 2025 “French Innovators” Award: 100 French scientists whose research changes our lives, by the French journal Le Point
02/2025: 📜 COLM 2025: SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Language Model
01/2025: 📝 Blog: CO2 emissions and model performance: Insights from the Open LLM Leaderboard
??/2025: 🎓 Expert evaluator for Horizon project funding for the European Commission
12/2024: 📜 ACL 2025: Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
11/2024: 🎓 Reviewer for COLING 2025
10/2024: ⚙️ Release: The LLM Evaluation Guidebook ✨
07/2024: 🗞️ The Economist: How to tell which AI model is best
07/2024: 🎧 Latent Space: Benchmarks 201 ✨
06/2024: 🗞️ VentureBeat: Hugging Face’s updated leaderboard shakes up the AI evaluation game
06/2024: 🗞️ La Tribune: To counter the AI evaluation crisis, Hugging Face raises its standards
06/2024: 📝 Blog: Performances are plateauing, let’s make the leaderboard steep again ✨
05/2024: 📝 Blog: Let’s talk about LLM evaluation
05/2024: ⭐ “France’s top AI talents”: gathering at the Élysée
04/2024: 🗞️ La Recherche: 2023, the year of open large language models
04/2024: 🗞️ TechCrunch: Hugging Face releases a benchmark for testing generative AI on health tasks
04/2024: 📜 arXiv: The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models
02/2024: ⚙️ Release: Lighteval ✨
??/2023: 🎓 Reviewer for the LChange Workshop
12/2023: 📝 Blog: 2023, year of Open LLMs ✨
12/2023: 📝 Blog: Open LLM Leaderboard: DROP deep dive
12/2023: 📜 ICLR 2024: GAIA: a benchmark for General AI Assistants ✨
10/2023: 📜 CoRR 2023: Zephyr: Direct Distillation of LM Alignment
06/2023: 📝 Blog: What’s going on with the Open LLM Leaderboard?
05/2023: 🎓 Mentored two teams for the Responsible AI Challenge at Mozilla
04/2023: ⚙️ Release: Open LLM Leaderboard
03/2023: 🎧 Parlons Tech: Generative AI under the magnifying glass
01/2023: 📝 Blog: Introduction to Graph Machine Learning
??/2022: 🎓 Reviewer for the LChange Workshop
11/2022: 📜 arXiv: BLOOM: A 176B-parameter open-access multilingual language model
10/2022: 📜 PhD: Neural Approaches to Historical Word Reconstruction ✨
05/2022: 📜 ACL 2022: Probing Multilingual Cognate Prediction Models ✨
04/2022: 📜 arXiv: Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
??/2021: 🎓 Reviewer for ACL 2021
08/2021: 📜 ACL 2021: Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
??/2020: 🎓 Reviewer for EMNLP 2020
??/2020: 🎓 Reviewer for ACL 2020
05/2020: 📜 LREC 2020: Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
02/2020: 📜 arXiv: The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up
evaluating LLMs and agents (2022-2025)
investigating graph machine learning (2022)
reconstructing dead languages using ML (2019-2022)
predicting neurodegenerative diseases from patient data through time (2017-2018)
using 3D meshes and grids to visualize geology (structural modeling) (2014-2015)
This site is deliberately static and very lightweight, for ecology, accessibility, and this. I use markdown and pandoc. My logo was made by Alix Chagué.