Optimizing exploration parameter in dueling deep Q-networks for complex gaming environment
Reinforcement Learning is being used to solve various tasks. A Complex Environment is a recent problem at hand for Reinforcement Learning, which employs an Agent who interacts with the surroundings and learns to solve whatever task has to be done. To solve a Complex Environment efficiently using a R...
Saved in:
Main Author: | |
---|---|
Format: | Thesis |
Language: | English |
Published: |
2019
|
Subjects: | |
Online Access: | http://eprints.utm.my/id/eprint/96670/1/MuhammadShehryarKhanMSC2019.pdf.pdf http://eprints.utm.my/id/eprint/96670/ http://dms.library.utm.my:8080/vital/access/manager/Repository/vital:143073 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Reinforcement Learning is being used to solve various tasks. A Complex Environment is a recent problem at hand for Reinforcement Learning, which employs an Agent who interacts with the surroundings and learns to solve whatever task has to be done. To solve a Complex Environment efficiently using a Reinforcement Learning Agent, a lot of parameters are to be kept in perspective. Every action that the Agent takes has a consequence in the form of a Reward Function. Based on the value of this Reward Function, our Agent develops a Policy to solve the Environment. The Policy is generally developed to maximize the Reward Functions. The Optimal Policy employs an Exploration Strategy which is used by the Agent. Reinforcement Learning Architectures are relying on the Policy and Exploration Strategy of the Agent to solve the Environment efficiently. This research is based upon two parts. Firstly, the optimization of a Deep Reinforcement Learning Architecture “Dueling Deep Q-Network” is conducted by improving its Exploration strategy. It combines a recent and novel Exploration technique, Curiosity Driven Intrinsic Motivation, with the Dueling DQN. The performance of this Curious Dueling DQN is checked by comparing it with the existing Dueling DQN. Secondly, the performance of the Curious Dueling DQN is validated against Noisy Dueling DQN, a combination of Dueling DQN with another recent exploration strategy called Noisy Nets, hence, finding an optimal exploration strategy. The performance of both solutions is evaluated in the environment of Super Mario Bros based on Mean Score and Estimation Loss. The proposed model improves the Mean Score by 3 folds, while the loss is increased by 28%. |
---|