Evaluations
Machine Learning
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts
Many suspect that LLMs are 'cheating' on evaluations; to what extent is that accurate?
Jul 24, 2024