Understanding the Behaviour of Learning Algorithms in Zero-sum Games

In economic and game theory, zero-sum games are settings of pure competition in which one player's gain is exactly the other player's loss. But what happens in environments where many intelligent agents – human or artificial – interact with one another? Do these systems attain a state of equilibrium, or do they become chaotic? And what conditions influence these outcomes?
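The definition can be made concrete with the standard textbook example of matching pennies. The sketch below (an illustrative toy, not taken from the research described here) shows that in a zero-sum game the two players' payoffs cancel for every pair of actions:

```python
# Matching pennies: each player picks Heads (0) or Tails (1). The row
# player wins 1 if the choices match and loses 1 otherwise.
row_payoff = [[+1, -1],
              [-1, +1]]

# "Zero-sum" means the column player's payoff is exactly the negation
# of the row player's, so the payoffs sum to zero in every outcome.
col_payoff = [[-p for p in row] for row in row_payoff]

for i in range(2):
    for j in range(2):
        assert row_payoff[i][j] + col_payoff[i][j] == 0
```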

These are some of the questions that Georgios Piliouras, assistant professor of Engineering Systems and Design at the Singapore University of Technology and Design (SUTD), is trying to answer through his work on multi-agent reinforcement learning in games.

For Prof Piliouras, his focus on this research area stems from a fascination with how complex phenomena emerge from simple components: neurons coming together to form the brain, an ant colony self-organising to build complex structures, or countless individual decisions making up the global economy.

“In every one of these cases we can create pretty reasonable models of the behaviour of the individual constituents of these networks,” he noted. “But when we scale them up, the global emergent behaviour can, in many cases, be unexpected.” 

Unexpected chaos

Prof Piliouras’s objective is to create a robust and scalable theory of how learning algorithms behave in general decentralised environments. One of the standard classes of these environments is the zero-sum game which lies at the core of many recent artificial intelligence (AI) architectures.

An example is Generative Adversarial Networks (GANs), in which two neural networks compete against each other. One of them, the Generator, tries to create realistic-looking images, whereas the other, the Discriminator, tries to predict whether the images presented to it are real-world images or synthetic ones. “By having the networks compete against each other, we can create AI that produces very realistic looking images,” explained Prof Piliouras.
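In schematic form, a GAN is exactly this kind of two-player competition. The toy sketch below is a hypothetical one-dimensional example (not DeepMind's or Prof Piliouras's code): a linear generator tries to imitate samples from a Gaussian, while a logistic discriminator tries to tell real samples from generated ones, each updated by gradient steps on its own objective:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))

# Generator: g(z) = a*z + b, with noise z ~ N(0, 1).
# Discriminator: D(x) = sigmoid(w*x + c), an estimate of P(x is real).
a, b = 1.0, 0.0          # generator parameters
w, c = 0.1, 0.0          # discriminator parameters
lr = 0.05

for step in range(2000):
    real = rng.normal(4.0, 0.5, size=64)   # "real" data to imitate
    z = rng.normal(size=64)
    fake = a * z + b

    # Discriminator ascends log D(real) + log(1 - D(fake)).
    p_real = sigmoid(w * real + c)
    p_fake = sigmoid(w * fake + c)
    w += lr * np.mean((1 - p_real) * real - p_fake * fake)
    c += lr * np.mean((1 - p_real) - p_fake)

    # Generator ascends log D(fake) (the non-saturating objective),
    # i.e. it tries to make its samples look real to the discriminator.
    p_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - p_fake) * w * z)
    b += lr * np.mean((1 - p_fake) * w)

print(f"generated mean ≈ {b:.2f} (real mean 4.0)")
```

In a run like this the generator's output mean typically drifts toward the real data mean, illustrating how the adversarial pressure from the discriminator shapes the generator.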

The same mathematical concept lies at the core of AlphaGo and AlphaZero, the AI systems produced by DeepMind, which learned to master the game of Go through self-play. 

However, Prof Piliouras’s research found that many standard learning dynamics, such as gradient descent (an optimisation algorithm used to update the parameters of a machine learning model), are unstable and in fact chaotic in zero-sum games. This suggests that zero-sum games and other similar multi-agent settings can be more complex than standard economic theory predicts.
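The instability shows up even in the simplest bilinear zero-sum game, f(x, y) = x·y, where one player minimises over x and the other maximises over y. The sketch below (an illustrative toy, not the specific dynamics analysed in the papers) runs simultaneous gradient descent/ascent and shows the players spiralling away from the equilibrium rather than converging to it:

```python
import numpy as np

# f(x, y) = x * y: the x-player descends (minimises), the y-player
# ascends (maximises). Simultaneous gradient steps:
#   x <- x - eta * y,   y <- y + eta * x
eta = 0.1
x, y = 1.0, 0.0
norms = []
for _ in range(100):
    x, y = x - eta * y, y + eta * x
    norms.append(np.hypot(x, y))

# Each step multiplies x^2 + y^2 by exactly (1 + eta^2), so the
# trajectory spirals outward from the equilibrium at (0, 0).
print(f"distance from equilibrium: {norms[0]:.3f} -> {norms[-1]:.3f}")
```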

New multi-agent AI architectures

To improve the performance of self-learning systems, Prof Piliouras is working to create learning algorithms that behave predictably and converge to equilibrium instead of behaving chaotically.
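One classic stabilising idea, shown here purely for illustration (the extragradient method, not necessarily the specific algorithms developed in this line of work), is to update each player using gradients evaluated at a look-ahead point. On the same bilinear game f(x, y) = x·y where plain gradient descent/ascent spirals outward, this converges to the equilibrium:

```python
import numpy as np

# Extragradient on f(x, y) = x * y: take a provisional gradient step,
# then update from the original point using gradients at the
# look-ahead point.
eta = 0.1
x, y = 1.0, 0.0
for _ in range(1000):
    x_mid, y_mid = x - eta * y, y + eta * x   # look-ahead step
    x, y = x - eta * y_mid, y + eta * x_mid   # update with look-ahead gradients

# The distance to the equilibrium (0, 0) shrinks every step,
# so the players converge instead of spiralling outward.
print(f"distance from equilibrium: {np.hypot(x, y):.4f}")
```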

To date, he has co-authored several joint papers with researchers from DeepMind to leverage these ideas and create new multi-agent AI architectures.

His research group also published five papers at the Conference on Neural Information Processing Systems (NeurIPS) in 2019, two of which were selected for spotlight presentations. The same year, the team received a best-paper award nomination at the International Conference on Autonomous Agents and Multiagent Systems, the premier conference on multi-agent systems.

“Publishing in these top ML conferences provides a great opportunity for communicating our ideas to a wide audience and getting some valuable feedback,” said Prof Piliouras, who plans to keep probing deeper into the structure of multi-agent reinforcement learning in games.

“There are a lot of challenges and questions that we still do not quite understand, especially when we have a large number of users and complex action spaces,” he said. “There is definitely a lot of exciting work to be done on both the theoretical and the experimental front.”

Prof Piliouras counts himself lucky to have had the opportunity to collaborate with many brilliant researchers around the globe in the course of his work. “My research journey so far has been very rewarding,” he said. “I am happy with the progress we have made already on some of the fundamental questions in the area, and at the same time I am excited about where we are going next.”