AI Safety: Can incomplete preferences keep artificial agents shutdownable?
- Date: Thursday 20 March 2025
- Time: 19:00 - 21:00
- Location: Singapore
About the event
Speaker Bio: Elliott Thornley is a Research Fellow at Oxford University. He uses ideas from decision theory to design and train safer artificial agents.
Session Summary: In this event, Elliott will explain the shutdown problem: the problem of ensuring that advanced artificial agents never resist shutdown. He will then propose a solution: training agents to have incomplete preferences. Specifically, he proposes training agents to lack a preference between every pair of different-length trajectories. He will suggest a method for training such agents using reinforcement learning and present experimental evidence in favour of the method. Finally, he will explain how work on the shutdown problem fits into a larger project called ‘constructive decision theory’: using ideas from decision theory to design and train artificial agents.