Paper Club: A Moore's Law for AI Capabilities
- Date: Thursday 19 June 2025
- Time: 19:00 - 21:00
- Location: Singapore
About the event
Technical Note: This event is intended for participants with a technical background. We strongly encourage reading the paper ahead of time so you can fully engage with the discussion.

Join us as we explore "Measuring AI Ability to Complete Long Tasks," a fascinating paper that introduces a new way to track AI progress using an intuitive, human-centered metric. Instead of relying on traditional benchmarks, which often saturate quickly, the researchers propose measuring AI capabilities by the "task completion time horizon". The question it asks is: how long are the tasks, measured by how long they take human experts, that AI models can complete with 50% reliability? By combining three diverse task suites (HCAST, RE-Bench, and a new suite called SWAA), they build an evaluation spanning everything from 2-second decisions to 8-hour software projects.
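To make the metric concrete before the discussion, here is a minimal sketch of one way a 50% time horizon could be estimated. It fits a logistic curve of success against log2 of human completion time, then solves for the task length where predicted success is exactly 50%. The data points and the function name `fit_time_horizon` are hypothetical. The plain gradient-descent fit is an illustrative choice and not necessarily the paper's exact procedure.

```python
import math

# Hypothetical results (not from the paper):
# (human completion time in minutes, 1 if the model succeeded else 0).
RESULTS = [
    (1, 1), (1, 1), (2, 1), (2, 1), (4, 1), (4, 0),
    (8, 1), (8, 1), (16, 1), (16, 0), (32, 0), (32, 1),
    (64, 0), (64, 0), (128, 0), (128, 1), (256, 0), (480, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_time_horizon(results, lr=0.05, steps=20000):
    """Fit P(success) = sigmoid(b0 + b1 * log2(minutes)) by gradient descent
    on the mean log-loss, then return (b0, b1, horizon_minutes), where the
    horizon is the task length at which predicted success is exactly 50%."""
    xs = [math.log2(t) for t, _ in results]
    ys = [y for _, y in results]
    n = len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    # sigmoid(z) = 0.5 exactly when z = 0, i.e. when log2(t) = -b0 / b1.
    horizon = 2 ** (-b0 / b1)
    return b0, b1, horizon


if __name__ == "__main__":
    b0, b1, horizon = fit_time_horizon(RESULTS)
    print(f"slope={b1:.3f}, 50% time horizon ~ {horizon:.1f} minutes")
```

The fit uses log2 of task length because durations in a suite like this span several orders of magnitude, from seconds to hours. A negative slope means success becomes less likely as tasks get longer.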