
I’m an AI researcher at HuggingFace.
Between 2023 and 2025, I led our evaluation efforts and collabs (on LLMs/agents); my team’s best-known projects include lighteval, the Evaluation Guidebook, and the (former) Open LLM Leaderboard. We also built a couple of evaluations (including GAIA, with Meta) and helped around 50 teams build their own.
I enjoy programming, making science open and understandable, books, and delicious food. My motto would likely be: “So much to do, so little time”.
You’ll find me as clefourrier over the web (Twitter, LinkedIn), or you can reach me at myfirstname at 🤗 dot co. I’m currently on sabbatical till Dec 2026 to hike and do fun non-AI stuff, so mostly unavailable! Open to both collabs and mentoring, within available bandwidth.
2026: ⭐ On sabbatical!
12/2025: ⚙️ Release: LLM Evaluation Guidebook v2
11/2025: 🎓 Presentation to the UVT FI team seminar: Panorama of LLM evaluations, Winter 2025
11/2025: 🎓 PhD Examiner for the thesis of Grgur Kovac: congrats!
10/2025: 📝 Blog/Book: The Smol Training Playbook: The Secrets to Building World-Class LLMs ✨
10/2025: ⭐ 2025’s “AI 100” by H2O.ai: Top AI Leaders Driving Real-World Impact
10/2025: 🗞️ Nature: AI bots wrote and reviewed all papers at this conference
09/2025: 🎓 Invited to the Wallenberg Advanced Scientific Forum 2025: Measuring What Matters: Evaluation as a Driver of Generative AI
09/2025: 🎧 France Culture, La Science CQFD: AI evaluation: blowing into the algo-test
08/2025: 🗞️ MITTR: GPT-5 is here, now what?
07/2025: 📜 ACL 2025: La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
06/2025: 🎧 Underscore: AI has quietly reached a historic milestone ✨
05/2025: 🎓 Reviewer for ACL 2025
04/2025: 🗞️ Business Insider: Figuring out which AI model is right for you is harder than you think
04/2025: 🗞️ VentureBeat: Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data
04/2025: 📜 COLM 2025: YourBench: Easy Custom Evaluation Sets for Everyone
03/2025: 🎤 Keynote for the CNRS NLP working group: Panorama of LLM evaluations ✨
03/2025: 📝 Blog: Fixing the Open LLM Leaderboard with Math-Verify
03/2025: 🗞️ Epsiloon Magazine: AI: the ultimate quiz
02/2025: 📺 France 24 TV: Tech24
02/2025: ⭐ 2025 “French Innovators” Award: 100 French scientists whose research changes our lives, by the French journal Le Point
02/2025: 📜 COLM 2025: SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Language Model
01/2025: 📝 Blog: CO2 emissions and model performance: Insights from the Open LLM Leaderboard
??/2025: 🎓 Expert evaluator for Horizon project funding for the European Commission
12/2024: 📜 ACL 2025: Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
11/2024: 🎓 Reviewer for COLING 2025
10/2024: ⚙️ Release: The LLM Evaluation Guidebook ✨
07/2024: 🗞️ The Economist: How to tell which AI model is best
07/2024: 🎧 Latent Space: Benchmarks 201 ✨
06/2024: 🗞️ VentureBeat: Hugging Face’s updated leaderboard shakes up the AI evaluation game
06/2024: 🗞️ La Tribune: To counter the AI evaluation crisis, Hugging Face raises its standards
06/2024: 📝 Blog: Performances are plateauing, let’s make the leaderboard steep again ✨
05/2024: 📝 Blog: Let’s talk about LLM evaluation
05/2024: ⭐ “France’s top AI talents”: gathering at the Élysée
04/2024: 🗞️ La Recherche: 2023, the year of open large language models
04/2024: 🗞️ TechCrunch: Hugging Face releases a benchmark for testing generative AI on health tasks
04/2024: 📜 arXiv: The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models
02/2024: ⚙️ Release: Lighteval ✨
??/2023: 🎓 Reviewer for the LChange Workshop
12/2023: 📝 Blog: 2023, year of Open LLMs ✨
12/2023: 📝 Blog: Open LLM Leaderboard: DROP deep dive
12/2023: 📜 ICLR 2024: GAIA: a benchmark for General AI Assistants ✨
10/2023: 📜 CoRR 2023: Zephyr: Direct Distillation of LM Alignment
06/2023: 📝 Blog: What’s going on with the Open LLM Leaderboard?
05/2023: 🎓 Mentored two teams for the Responsible AI Challenge at Mozilla
04/2023: ⚙️ Release: Open LLM Leaderboard
03/2023: 🎧 Parlons Tech: Generative AI under the magnifying glass
01/2023: 📝 Blog: Introduction to Graph Machine Learning
??/2022: 🎓 Reviewer for the LChange Workshop
11/2022: 📜 arXiv: BLOOM: A 176B-parameter open-access multilingual language model
10/2022: 📜 PhD: Neural Approaches to Historical Word Reconstruction ✨
05/2022: 📜 ACL 2022: Probing Multilingual Cognate Prediction Models ✨
04/2022: 📜 arXiv: Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
??/2021: 🎓 Reviewer for ACL 2021
08/2021: 📜 ACL 2021: Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
??/2020: 🎓 Reviewer for EMNLP 2020
??/2020: 🎓 Reviewer for ACL 2020
05/2020: 📜 LREC 2020: Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
02/2020: 📜 arXiv: The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up
evaluating LLMs and agents (2022-2025)
investigating graph machine learning (2022)
reconstructing dead languages using ML (2019-2022)
predicting neurodegenerative diseases from patient data through time (2017-2018)
using 3D meshes and grids to visualize geology (structural modeling) (2014-2015)
This site is deliberately static and very lightweight, for ecology, accessibility, and this. I use markdown and pandoc. My logo was made by Alix Chagué.