My Work

Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts

Many suspect that LLMs are 'cheating' on evaluations; to what extent is that accurate?

Jul 24, 2024

Open Source AI is a lie, but it doesn't have to be

Big Tech is attempting to redefine "Open Source" to their advantage; at the very least, we should know about it.

Apr 30, 2024

Detecting Implicit Gaming through Retrospective Evaluation Sets

2-day hackathon project, awarded first place by peer review in the Evaluations Apart Hackathon in Nov '23.

Nov 23, 2023

Into AI Safety Podcast

The podcast was started as a tool for others shifting into the field of AI safety.

Oct 23, 2023

HoPE Against HoPE

Exploring pigeon flocking under predation through multi-agent simulation and evolutionary algorithms.

May 1, 2022

Stretching the Boundary: Shell Finite Elements for Pneumatic Soft Actuators

Leveraging shell finite elements for faster simulation without compromising accuracy.

Apr 8, 2022

Automated Synthesis of Bending Pneumatic Soft Actuators

A seamless workflow for the design and fabrication of pneumatic soft actuators.

Apr 8, 2022

Optimal Design

Reports from a course on optimal design, covering least squares classification, topology optimization, support vector machines, and an optimization-based Sudoku solver.

Apr 1, 2022

CPPN2Sim

MATLAB library for converting Compositional Pattern Producing Networks into lightweight simulations of actuator behavior.

Nov 2, 2021

Flow Visualization

Reports from a course on flow visualization: Viscosity Dynamics, Propelled Paint, and Bad Water Rising.

Nov 1, 2021