Why this is here: 90% signal strength, source-backed, and published this week.
VQV Signal
StaminaBench: Stress-Testing Coding Agents over 100 Interaction Turns
We introduce StaminaBench, a benchmark that measures the stamina of coding agents: how many consecutive interaction turns (change requests) they can handle before failing. Unlike the prevailing fraction-of-tasks-solved metric, this matches real vibe-coding wh...
Score: 60
Source Type: arXiv
Reposts: 0
Topic Quality: 45