Evaluations

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Machine Learning

Many suspect that LLMs are 'cheating' on evaluations; to what extent is that accurate?

Jul 24, 2024

© 2024 Jacob Haimes. This work is licensed under CC BY-SA 4.0.