AI Safety: Can incomplete preferences keep artificial agents shutdownable?

Start: 2025-03-20T11:00:00.000Z
End: 2025-03-20T13:00:00.000Z
Location: Singapore

Date: Thursday 20 March 2025
Time: 19:00 - 21:00 (Singapore time, UTC+8)

About the event

Speaker Bio: Elliott Thornley is a Research Fellow at Oxford University. He uses ideas from decision theory to design and train safer artificial agents.

Session Summary: In this event, Elliott will explain the shutdown problem: the problem of ensuring that advanced artificial agents never resist shutdown. He will then propose a solution: training agents to have incomplete preferences. Specifically, he proposes that we train agents to lack a preference between every pair of different-length trajectories. He will suggest a method for training such agents using reinforcement learning and present experimental evidence in favour of the method. Finally, he will explain how work on the shutdown problem fits into a larger project called ‘constructive decision theory’: using ideas from decision theory to design and train artificial agents.