Clémentine Fourrier

Welcome!

I’m an AI researcher at Hugging Face, leading our evaluation efforts and collabs (on LLMs/agents). The OpenEvals team maintains lighteval and the evaluation guidebook, and builds (or helps the community build) cool evaluations. We previously worked on the Open LLM Leaderboard.

On the side, I lend a hand to our AI for good/AI for science initiatives.

I enjoy programming, making science open and understandable, books, and delicious food. My motto would likely be: “So much to do, so little time”.

Contact:

You’ll find me as clefourrier over the web (Twitter, LinkedIn, BlueSky, …), or you can reach me at myfirstname at 🤗 dot co. Open to both collabs and mentoring, within available bandwidth.

If you want a fast answer, better make it short and to the point ^^

Timeline

Jun 2025: 🗞️ Underscore Twitch: GAIA and the future of agents evaluation
Apr 2025: 🗞️ BusinessInsider: Figuring out which AI model is right for you is harder than you think
Apr 2025: 🗞️ VentureBeat : Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data
Apr 2025: 📜 Arxiv : YourBench: Easy Custom Evaluation Sets for Everyone
Mar 2025: 🎤 CNRS NLP working group : Panorama of LLM evaluations ✨
Mar 2025: 📝 Blog : Fixing the Open LLM Leaderboard with Math-Verify
Mar 2025: 🗞️ Epsiloon Magazine : IA: le quiz ultime
Feb 2025: 🗞️ France 24 TV : Tech24 on AI
Feb 2025: 🗞️ French AI Summit Conclusions: French LLM Leaderboard showcase
Feb 2025: ⭐ Finalist of the 2025 French Innovators Awards, AI section : 100 French scientists whose research changes our lives, by the French journal Le Point
Feb 2025: 📜 Arxiv : SmolLM2: When Smol Goes Big – Data-Centric Training of a Small Language Model
Jan 2025: 📝 Blog : CO2 emissions and model performance: Insights from the Open LLM Leaderboard
Dec 2024: 📜 Arxiv : Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation
Oct 2024: ⚙️ Release : Evaluation Guidebook✨
Jul 2024: 🗞️ The Economist : How to tell which AI model is best
Jul 2024: 🎧 Latent Space Benchmarks 201 ✨
Jun 2024: 🗞️ VentureBeat : Hugging Face’s updated leaderboard shakes up the AI evaluation game
Jun 2024: 🗞️ La Tribune : Pour contrer la crise de l’évaluation des IA, Hugging Face rehausse les exigences
Jun 2024: 📝 Blog : Performances are plateauing, let’s make the leaderboard steep again
May 2024: 📝 Blog : Let’s talk about LLM evaluation
May 2024: ⭐ Invited to France’s top AI talents gathering at Elysée Event
May 2024: 📜 ICLR : GAIA: a benchmark for General AI Assistants ✨
Apr 2024: 🗞️ La Recherche : 2023, l’année des grands modèles de langue ouverts
Apr 2024: 🗞️ TechCrunch : Hugging Face releases a benchmark for testing generative AI on health tasks
Apr 2024: 📜 Arxiv : The Hallucinations Leaderboard – An Open Effort to Measure Hallucinations in Large Language Models
Feb 2024: ⚙️ Release : Lighteval ✨
Dec 2023: 📝 Blog : 2023, year of Open LLMs ✨
Dec 2023: 📝 Blog : Open LLM Leaderboard: DROP deep dive
Oct 2023: 📜 Arxiv : Zephyr: Direct Distillation of LM Alignment
Jun 2023: 📝 Blog : What’s going on with the Open LLM Leaderboard?
Apr 2023: ⚙️ Release : Open LLM Leaderboard
Mar 2023: 🎧 Parlons Tech : L’IA générative à la Loupe
Jan 2023: 📝 Blog : Introduction to Graph Machine Learning
Nov 2022: 📜 Arxiv : Bloom: A 176b-parameter open-access multilingual language model
Oct 2022: 📜 PhD : Neural Approaches to Historical Word Reconstruction
May 2022: 📜 ACL : Probing Multilingual Cognate Prediction Models ✨
Apr 2022: 📜 Arxiv : Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
Aug 2021: 📜 ACL : Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
May 2020: 📜 LREC : Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB-2.0
Feb 2020: 📜 Arxiv : The Alzheimer’s Disease Prediction Of Longitudinal Evolution (TADPOLE) Challenge: Results after 1 Year Follow-up

Previous research/interest topics

atm, evaluation of LLMs and agents (2023-now)
graph machine learning (2022)
reconstructing dead languages using neural networks (2019-2022)
neurodegenerative disease prediction from longitudinal data (2017-2018)
using 3D meshes and grids for geology and structural modeling (2014-2015)

It’s likely I’ll learn new things again! (robotics maybe? ¯\_(ツ)_/¯ )

About this site

This site is deliberately static and very lightweight, for ecology, accessibility, and this. I use markdown and pandoc. My logo was made by Alix Chagué.