Shared representations (such as roles or intents) that capture the information sufficient for collaboration.
AI agents need to collaborate and interact with humans in many different settings, such as autonomous vehicles driving alongside humans, robots assisting people in homes, and AI assistants learning and leveraging human preferences.
Humans, on the other hand, collaborate surprisingly well with one another, even in complex tasks, by adapting to each other through repeated interactions.
Given humans' computational constraints (being boundedly rational, with limited memory and time), we believe the reason humans can interact with each other so easily is that interactions, despite their apparent complexity, are inherently structured. What emerges from these repeated interactions is shared knowledge about the interaction history, which enables partners to trust each other.
Understanding the repeated, long-term interaction of learning agents with humans introduces a set of theoretical and applied challenges for developing more effective AI agents that can coordinate with, collaborate with, or even positively influence humans. Specifically, we focus on two fundamental research directions: (1) developing representation learning algorithms that capture the core of an interaction, enabling better coordination, collaboration, and influence; and (2) effectively adapting to human partners over repeated interactions.
Game-Theoretic Approaches for Formalizing Interaction
An autonomous car nudges in front of a human-driven car to influence the driver to slow down, as computed by a game-theoretic planner.
In our work [AURO 2018], we have focused on a game-theoretic, dynamical-systems approach for modeling the interaction between humans and robots. Specifically, we formalize the interaction between autonomous cars and human-driven cars as an underactuated dynamical system. This goes beyond simplistic models of other drivers on the road, e.g., models that treat human-driven cars as moving obstacles, and instead incorporates expressive, learning-based models of human actions and of how humans respond to the robot. We demonstrate that the autonomous car can plan to influence human-driven cars when optimizing for safety, efficiency, and coordination, and that it can actively gather information about the driving styles of other vehicles to discover their policies and influence them toward more desirable strategies.
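To make this concrete, below is a minimal sketch of a nested, best-response planner in this spirit. Everything here is an illustrative assumption rather than the paper's model: 1-D point-mass dynamics, hand-designed costs, a short horizon, and a human modeled as best-responding to the robot's plan.

```python
import numpy as np
from scipy.optimize import minimize

DT, HORIZON = 0.5, 4

def rollout(x0, accels):
    """Integrate 1-D [position, velocity] states under an acceleration sequence."""
    xs, x = [], np.array(x0, dtype=float)
    for a in accels:
        x = x + DT * np.array([x[1], a])
        xs.append(x)
    return np.array(xs)

def human_cost(u_h, xh0, robot_traj):
    """Illustrative human model: track a nominal speed, but keep a gap."""
    traj = rollout(xh0, u_h)
    speed_term = np.sum((traj[:, 1] - 1.0) ** 2)        # track speed 1.0
    gap = robot_traj[:, 0] - traj[:, 0]
    proximity_term = np.sum(np.exp(-5.0 * np.abs(gap)))  # avoid closeness
    return speed_term + 10.0 * proximity_term + 0.1 * np.sum(u_h ** 2)

def best_response(xh0, robot_traj):
    """Predict the human's actions as a best response to the robot's plan."""
    res = minimize(human_cost, np.zeros(HORIZON), args=(xh0, robot_traj))
    return res.x

def robot_cost(u_r, xr0, xh0):
    """Robot makes progress and gains reward if the modeled human slows down."""
    robot_traj = rollout(xr0, u_r)
    u_h = best_response(xh0, robot_traj)     # human reacts to this candidate plan
    human_traj = rollout(xh0, u_h)
    progress = -robot_traj[-1, 0]            # further ahead is better
    influence = np.sum(human_traj[:, 1])     # reward lower human speeds
    return progress + influence + 0.1 * np.sum(u_r ** 2)

# Robot starts slightly behind at higher speed; plan accelerations that nudge
# ahead and induce the (modeled) human driver to slow down.
plan = minimize(robot_cost, np.zeros(HORIZON), args=([0.0, 1.2], [0.2, 1.0]))
print("planned robot accelerations:", np.round(plan.x, 2))
```

The key structural point is the nesting: the robot's cost evaluates each candidate plan under the human's predicted response to that plan, which is what lets the optimizer discover influencing behaviors such as nudging in front of the other driver.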
Representations for Repeated and Continual Interactions
We build upon the important insight that humans and robots need to coordinate with each other over long-term, repeated interactions, and that game-theoretic techniques for building models of the partner do not scale to such continual interactions.
We have thus developed a new and orthogonal paradigm: learning low-dimensional representations in Markov games---which we refer to as conventions---that capture the core of the interaction. Conventions are approximations of the sufficient statistics needed for multi-agent coordination, and they enable long-term, adaptive interactive behavior in a scalable fashion.
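As a rough sketch of what learning such a convention can look like (the dimensions, architecture, and predictive objective below are all assumptions for illustration), one can encode the recent interaction history into a low-dimensional latent z and train it so that z, together with the current state, predicts the partner's next action---i.e., so that z approximates a sufficient statistic for coordination:

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM, HISTORY_LEN = 8, 2, 3, 10

class ConventionEncoder(nn.Module):
    """Compress the recent interaction history into a low-dimensional z."""
    def __init__(self):
        super().__init__()
        step_dim = OBS_DIM + 2 * ACT_DIM   # state + both agents' actions per step
        self.rnn = nn.GRU(step_dim, 32, batch_first=True)
        self.head = nn.Linear(32, LATENT_DIM)

    def forward(self, history):            # (batch, HISTORY_LEN, step_dim)
        _, h = self.rnn(history)
        return self.head(h[-1])            # the convention z

class PartnerPredictor(nn.Module):
    """If z approximates a sufficient statistic, (state, z) should predict the
    partner's next action about as well as the full history does."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + LATENT_DIM, 32), nn.ReLU(),
            nn.Linear(32, ACT_DIM))

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

encoder, predictor = ConventionEncoder(), PartnerPredictor()
opt = torch.optim.Adam([*encoder.parameters(), *predictor.parameters()], lr=1e-3)

# One illustrative gradient step on random stand-in data.
history = torch.randn(64, HISTORY_LEN, OBS_DIM + 2 * ACT_DIM)
state, partner_action = torch.randn(64, OBS_DIM), torch.randn(64, ACT_DIM)
loss = nn.functional.mse_loss(predictor(state, encoder(history)), partner_action)
opt.zero_grad(); loss.backward(); opt.step()
print("partner-prediction loss:", float(loss))
```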
The learned low-dimensional representation can correspond to a diverse set of entities, such as the assignment of leading and following roles in multi-robot games [AURO 2021]; the listening and speaking roles in dyadic interactions, e.g., when two robots collaboratively transport an object [CoRL 2019]; a latent action space for teleoperating an assistive robot [ICRA 2020, RSS 2020, IROS 2020, AURO 2021, L4DC 2021, CoRL 2021]; a latent strategy or intent of partner policies [CoRL 2020, ICLR 2021, CoRL 2021]; or even conventions developed through linguistic communication [CoNLL 2020].
LILI: Learning and Influencing Latent Intent. We capture a low-dimensional latent strategy of the other agent's policy and leverage it for better coordination in a game of air hockey.
We demonstrate that a deep reinforcement learning policy can leverage these learned representations to better model a non-stationary partner strategy and, further, to plan around and even influence the partner toward more effective long-term outcomes. Our algorithm, LILI (Learning and Influencing Latent Intent), can play air hockey in real time with a partner---robot or human---without any prior knowledge of the partner's strategy [CoRL 2020].
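The sketch below illustrates a LILI-style interface under assumed shapes and a simplified objective (this is not the paper's exact architecture): an encoder predicts the partner's latent strategy for the next interaction from the current episode, a decoder forces that latent to explain the next episode's transitions, and the ego policy conditions on the predicted latent.

```python
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, LATENT_DIM = 8, 2, 2

encoder = nn.GRU(OBS_DIM + ACT_DIM, 32, batch_first=True)
latent_head = nn.Linear(32, LATENT_DIM)               # z for episode k+1
decoder = nn.Sequential(                              # explains episode k+1
    nn.Linear(OBS_DIM + ACT_DIM + LATENT_DIM, 32),    # transitions from z_hat
    nn.ReLU(), nn.Linear(32, OBS_DIM))
policy = nn.Sequential(                               # ego policy pi(a | s, z_hat)
    nn.Linear(OBS_DIM + LATENT_DIM, 32), nn.ReLU(), nn.Linear(32, ACT_DIM))

def predict_next_latent(episode):                     # (1, T, OBS_DIM + ACT_DIM)
    _, h = encoder(episode)
    return latent_head(h[-1])

# Representation loss: z_hat must explain the next episode's dynamics, which
# is what makes it capture the partner's (non-stationary) strategy.
episode_k = torch.randn(1, 20, OBS_DIM + ACT_DIM)
s, a = torch.randn(1, OBS_DIM), torch.randn(1, ACT_DIM)
s_next = torch.randn(1, OBS_DIM)                      # stand-in transition data
z_hat = predict_next_latent(episode_k)
recon = decoder(torch.cat([s, a, z_hat], dim=-1))
repr_loss = nn.functional.mse_loss(recon, s_next)

# Acting: the policy sees the predicted latent alongside the state, so an RL
# algorithm can exploit, and even influence, the partner's strategy.
action = policy(torch.cat([s, z_hat.detach()], dim=-1))
print(repr_loss.item(), action.shape)
```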
Building on this work, we have studied how to reduce non-stationarity in multi-agent reinforcement learning by stabilizing these learned representations. Our algorithm, SILI (Stable Influencing of Latent Intent), stabilizes partner strategies in a way that induces role assignments and yields an easier learning problem in multi-agent collaboration [CoRL 2021]. In addition, we have studied how conventions adapt over repeated interactions and have proposed a modular approach that learns the conventions and their evolution separately [ICLR 2021].
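One way to picture the stabilizing idea (the shaping form below is our assumption, not necessarily the exact reward in the paper) is as a bonus that rewards the ego agent when the partner's predicted latent stops changing between consecutive interactions:

```python
import torch

def stability_bonus(z_prev, z_next, scale=1.0):
    """Larger (less negative) when consecutive predicted latents agree."""
    return -scale * torch.norm(z_next - z_prev, dim=-1)

z_k = torch.tensor([0.30, -0.10])    # predicted latent after interaction k
z_k1 = torch.tensor([0.31, -0.12])   # predicted latent after interaction k+1
task_reward = 1.0                    # environment reward (stand-in value)
shaped_reward = task_reward + stability_bonus(z_k, z_k1).item()
print(shaped_reward)
```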
An Application of Learned Representations: Assistive Teleoperation
For almost one million American adults living with physical disabilities, picking up a bite of food or pouring a glass of water presents a significant challenge. Wheelchair-mounted robotic arms---and other physically assistive devices---hold the promise of increasing user autonomy, reducing reliance on caregivers, and improving quality of life. Unfortunately, the very dexterity that makes these robotic assistants useful also makes them hard for humans to control. Today's users must teleoperate their assistive robots throughout entire tasks. For instance, to eat with an assistive robot, a user must carefully orchestrate the position and orientation of the end-effector to move a fork to the plate, spear a morsel of food, and then guide the food back toward their mouth. These challenges are often prohibitive: users living with disabilities have reported that they choose not to use their assistive robot when eating because of the associated difficulty. Our key insight is that controlling high-dimensional robots becomes easier when we learn and leverage low-dimensional representations of actions, which let users convey their intentions, goals, and plans to the robot through simple, intuitive, low-dimensional inputs.
Imagine that you are working with an assistive robot to grab food from your plate. Here, we placed three marshmallows on a table in front of the user, who must make the robot grab one of them using a joystick.
Importantly, the robot does not know which marshmallow the human wants! Ideally, the robot will make this task easier by learning a simple mapping between the person's inputs and their desired marshmallow.
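A minimal sketch of how such a mapping can be learned (shapes, data, and the training loop are illustrative stand-ins): a conditional autoencoder compresses demonstrated high-dimensional arm actions into a 2-D latent, so that at teleoperation time the two joystick axes serve as the latent input and the decoder reconstructs a full arm action in context.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, LATENT_DIM = 12, 7, 2   # e.g., 7-DoF arm, 2-axis joystick

encoder = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                        nn.Linear(64, LATENT_DIM))
decoder = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
                        nn.Linear(64, ACTION_DIM))
opt = torch.optim.Adam([*encoder.parameters(), *decoder.parameters()], lr=1e-3)

# Train on task demonstrations (random stand-ins here): reconstruct the
# demonstrated action from the state and the action's latent embedding.
states, actions = torch.randn(256, STATE_DIM), torch.randn(256, ACTION_DIM)
for _ in range(100):
    z = encoder(torch.cat([states, actions], dim=-1))
    recon = decoder(torch.cat([states, z], dim=-1))
    loss = nn.functional.mse_loss(recon, actions)
    opt.zero_grad(); loss.backward(); opt.step()

# Teleoperation: treat the joystick deflection as the latent input.
state = torch.randn(1, STATE_DIM)              # current robot + scene state
joystick = torch.tensor([[1.0, 0.0]])          # user pushes the stick left
arm_action = decoder(torch.cat([state, joystick], dim=-1))
print(arm_action.shape)                        # torch.Size([1, 7])
```

The design choice that matters is conditioning the decoder on the state: the same joystick deflection can then decode to different arm motions in different contexts, e.g., reaching toward the left marshmallow when near the plate.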
Woodrow Zhouyuan Wang, Andy Shih, Annie Xie, Dorsa Sadigh. Influencing Towards Stable Multi-Agent Interactions. Proceedings of the 5th Conference on Robot Learning (CoRL), 2021. [PDF]
Annie Xie, Dylan Losey, Ryan Tolsma, Chelsea Finn, Dorsa Sadigh. Learning Latent Representations to Influence Multi-Agent Interaction. Proceedings of the 4th Conference on Robot Learning (CoRL), 2020. [PDF]
Andy Shih, Arjun Sawhney, Jovana Kondic, Stefano Ermon, Dorsa Sadigh. On the Critical Role of Conventions in Adaptive Human-AI Collaboration. 9th International Conference on Learning Representations (ICLR), 2021. [PDF]
Siddharth Karamcheti*, Megha Srivastava*, Percy Liang, Dorsa Sadigh. LILA: Language-Informed Latent Actions. Proceedings of the 5th Conference on Robot Learning (CoRL), 2021. [PDF]
Dylan Losey, Hong Jun Jeon, Mengxi Li, Krishnan Srinivasan, Ajay Mandlekar, Animesh Garg, Jeannette Bohg, Dorsa Sadigh. Learning Latent Actions to Control Assistive Robots. Autonomous Robots (AURO), 2021. [PDF]
Hong Jun Jeon, Dylan Losey, Dorsa Sadigh. Shared Autonomy with Learned Latent Actions. Proceedings of Robotics: Science and Systems (RSS), July 2020. [PDF]
Mengxi Li*, Minae Kwon*, Dorsa Sadigh. Influencing Leading and Following in Human-Robot Teams. Autonomous Robots (AURO), 2021. [PDF]
Dorsa Sadigh, Nick Landolfi, S. Shankar Sastry, Sanjit A. Seshia, Anca D. Dragan. Planning for Cars that Coordinate with People: Leveraging Effects on Human Actions for Planning and Active Information Gathering over Human Internal State. Autonomous Robots (AURO), October 2018. [PDF]