An Ensemble Optimization Algorithm Based on MADDPG

Conference: ISCTT 2022 - 7th International Conference on Information Science, Computer Technology and Transportation
27 May 2022 - 29 May 2022 in Xishuangbanna, China

Proceedings: ISCTT 2022

Pages: 4
Language: English
Type: PDF

Authors:
Zhou, Ruiyu (School of Management, Guangdong University of Technology, Guangzhou, Guangdong Province, China)

Abstract:
Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is a multi-agent deep reinforcement learning algorithm that was published more than four years ago. It has gradually become a classic multi-agent algorithm and one of the basic algorithms that many beginners learn. However, MADDPG sometimes needs many episodes before the reward rises significantly, and the reward reached within a given number of episodes is sometimes not high enough. This paper therefore proposes an algorithm that uses an ensemble method to optimize MADDPG. The basic idea is to train multiple models simultaneously, compare the rewards obtained by all models, and select the model with the highest reward to train all models at this stage. Tests in different environments provided by the Multi-Agent Particle Environment (MPE) show that this ensemble optimization algorithm based on MADDPG is effective.
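
The train, compare, and select loop described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not specify how the best model is used to train the others, so the sketch assumes the simplest reading, in which the best model's parameters are copied to every ensemble member at the end of each stage. The names `toy_reward`, `toy_train_step`, `ensemble_round`, and `run` are hypothetical, and a noisy hill-climbing update on a two-parameter toy problem stands in for a real MADDPG training phase.

```python
import copy
import random


def toy_reward(params, target=(1.0, -2.0)):
    """Hypothetical stand-in for an episode return: higher is better, maximum is 0."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def toy_train_step(params, rng, step=0.1):
    """Hypothetical stand-in for one MADDPG training phase.

    Proposes a noisy update and keeps it only if the reward improves.
    """
    candidate = [p + rng.gauss(0.0, step) for p in params]
    return candidate if toy_reward(candidate) > toy_reward(params) else params


def ensemble_round(models, rng):
    """Train every model, compare rewards, and copy the best model to all members.

    Copying the best parameters is an assumption; the abstract only says the
    best model is selected "to train all models at this stage".
    """
    trained = [toy_train_step(m, rng) for m in models]
    rewards = [toy_reward(m) for m in trained]
    best = max(range(len(trained)), key=rewards.__getitem__)
    synced = [copy.deepcopy(trained[best]) for _ in trained]
    return synced, rewards[best]


def run(num_models=4, rounds=50, seed=0):
    """Run several ensemble rounds and record the best reward of each round."""
    rng = random.Random(seed)
    models = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(num_models)]
    history = []
    for _ in range(rounds):
        models, best_reward = ensemble_round(models, rng)
        history.append(best_reward)
    return models, history


if __name__ == "__main__":
    final_models, history = run()
    print("first-round best reward:", history[0])
    print("last-round best reward:", history[-1])
```

In this toy setting the best reward cannot decrease from one round to the next, because every member restarts from the previous best and only accepts improving updates. Whether a comparable benefit appears with real MADDPG agents in MPE is the empirical claim of the paper and is not shown by this sketch.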