MixLight: Mixed-Agent Cooperative Reinforcement Learning for Traffic Light Control

Ming YANG, Yiming WANG, Yang YU, Mingliang ZHOU*, Leong Hou U*

*Corresponding author for this work

Research output: Journal Publications > Journal Article (refereed) > peer-review


Optimizing traffic light configuration is viewed as a way to increase traffic throughput in cities. Recent studies have employed reinforcement learning to optimize the traffic light configuration. However, these studies make the oversimplified assumption that all traffic lights are controlled by one unified policy. In the real world, the situation is more complicated, as a city may deploy more than one traffic light policy owing to the different development stages of the city. In this work, we propose a novel multiagent reinforcement learning method, called MixLight, which aims to learn the traffic light configuration in an environment of mixed policies. Our contribution is twofold. First, we propose an executor-guide dual network, in which the guide network steers the executor network's optimization direction via reward shaping. Second, we improve the centralized training and decentralized execution framework for the traffic light environment, which reduces the exploration space of the agents and decreases nonstationarity during the training process. This helps the agents achieve a cooperative strategy based on their local observations during execution. Experiments on real-world and synthetic datasets verify the superiority of our proposed method.
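
The abstract does not say how the guide network shapes the executor's reward. As a rough illustration of the general idea only, the sketch below uses potential-based reward shaping, a standard technique swapped in here and not confirmed to be the MixLight formulation. The names `guide_potential` and `shaped_reward`, the queue-length observation, and the discount value are all hypothetical; in a real system the potential would presumably come from a learned network.

```python
from typing import Callable, Sequence

# Assumption: an observation is the list of queue lengths on incoming lanes.
Observation = Sequence[float]


def guide_potential(obs: Observation) -> float:
    """Hypothetical stand-in for a guide network's output.

    Shorter total queues receive a higher potential.
    """
    return -float(sum(obs))


def shaped_reward(
    env_reward: float,
    obs: Observation,
    next_obs: Observation,
    potential: Callable[[Observation], float] = guide_potential,
    gamma: float = 0.99,
) -> float:
    """Potential-based shaping: r' = r + gamma * phi(s') - phi(s).

    This is an illustrative choice, not the paper's stated method.
    """
    return env_reward + gamma * potential(next_obs) - potential(obs)


if __name__ == "__main__":
    # Queues shrink from 5 vehicles to 3, so the shaping term is positive.
    print(shaped_reward(-5.0, [3.0, 2.0], [1.0, 2.0], gamma=1.0))
```

An executor agent trained on `shaped_reward` instead of the raw environment reward is nudged toward states the guide rates highly. Potential-based shaping is chosen for the sketch because it is known to leave the optimal policy of the underlying task unchanged.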

Original language: English
Pages (from-to): 2653-2661
Number of pages: 9
Journal: IEEE Transactions on Industrial Informatics
Issue number: 2
Early online date: 27 Jul 2023
Publication status: Published - Feb 2024
Externally published: Yes

Bibliographical note

Publisher Copyright:
© 2005-2012 IEEE.


  • Cooperative systems
  • deep neural networks
  • multiagent systems
  • reinforcement learning (RL)
  • traffic light control

