Analysis and Improvement of Deep Reinforcement Learning-Based Mobile Robot Navigation in Dynamic Environments
Description
The thesis is available as full text in PDF format.
Autonomous mobile robot navigation in dynamic environments has been studied widely because of
the increasing necessity for safe and efficient movement in complex settings such as warehouses,
hospitals, transport terminals, and public spaces. Although classical navigation approaches perform
well in controlled static environments, in continuous real-world operation they
require repeated replanning, map updates, and simplifying assumptions about obstacle positions and
dynamics. These requirements often limit the applicability of classical approaches in complex scenarios.
Deep reinforcement learning is therefore increasingly adopted as a promising approach for
navigation tasks that require continuous control and adaptation to changing environments.
In this thesis, deep reinforcement learning-based mobile robot navigation in dynamic environments
has been analyzed and improved with a focus on safety, stability, task completion, and generalization.
The research problem concerns the difficulty of learning reliable navigation policies in
dynamic environments, where performance can degrade because of sparse rewards, unsafe
exploration, and limited transferability to unseen settings. The study is grounded in reinforcement
learning theory, Markov decision processes, continuous control, and safety-aware reward function
design. DDPG, TD3, and SAC are the continuous-control algorithms considered.
Using structured navigation environments, a simulation-based experimental framework has been
developed. The baseline DDPG agent was evaluated first, and its failure behavior was analyzed both
qualitatively and quantitatively. The reward function was then refined iteratively
to improve collision avoidance, goal reaching, and motion stability. Next, as an algorithmic
enhancement, TD3 and SAC were compared under controlled static and dynamic settings, and
SAC was selected for curriculum-based training because of its generalization behavior. Finally,
curriculum-based training was implemented with progressively more complex environments, followed
by three generalization assessments: zero-shot evaluation, warm-up fine-tuning, and an
out-of-distribution (OOD) stress sweep.
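As an illustration of the kind of reward refinement described above, the following is a minimal sketch of a shaped navigation reward that combines goal progress, collision avoidance, and motion smoothness. It is not the reward function used in the thesis: the function name, the arguments, and every constant are hypothetical and chosen only to make the idea concrete.

```python
def shaped_reward(prev_dist, dist, min_obstacle_dist, ang_vel, prev_ang_vel,
                  goal_radius=0.3, collision_radius=0.2, safety_margin=0.6):
    """Return (reward, done) for one navigation step.

    All thresholds and weights are hypothetical values for illustration.
    Distances are in metres, angular velocities in rad/s.
    """
    # Terminal penalty: the robot touched an obstacle.
    if min_obstacle_dist <= collision_radius:
        return -100.0, True
    # Terminal bonus: the robot reached the goal region.
    if dist <= goal_radius:
        return 100.0, True

    # Dense progress term: positive when the robot moved closer to the goal,
    # which counteracts the sparse-reward problem.
    reward = 10.0 * (prev_dist - dist)

    # Safety term: a penalty that grows linearly as the robot moves deeper
    # into the safety margin around the nearest obstacle.
    if min_obstacle_dist < safety_margin:
        depth = (safety_margin - min_obstacle_dist) / (safety_margin - collision_radius)
        reward -= 2.0 * depth

    # Stability term: discourage abrupt changes in angular velocity.
    reward -= 0.1 * abs(ang_vel - prev_ang_vel)

    # Small per-step cost to encourage short paths.
    reward -= 0.01
    return reward, False
```

In a sketch like this, each term can be tuned or switched off independently, which is one way an iterative refinement loop could isolate the effect of a single change on collisions, goal reaching, or oscillation.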
The results show that the improved model is more stable and navigates more effectively. The
curriculum-trained policy completed tasks consistently and showed reasonable
transferability to unseen environments, although performance degrades in more difficult
out-of-distribution cases. Overall, the study demonstrates that a combination of reward engineering,
stable learning, and curriculum-based training can improve safe navigation in dynamic environments.
