Study Notes

Study notes organized by topic (RL, control, planning, perception, Sim2Real). The posts in this category are listed below.

PPO Tricks for Stable Training


Key Concepts: Clip ratio and value loss balance. Advantage normalization and reward scaling. Early stopping based on KL divergence.
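The concepts listed above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the post's own implementation: the function names, the defaults (`clip_eps=0.2`, `vf_coef=0.5`, `target_kl=0.015`) and the 1.5x stopping margin are hypothetical choices borrowed from common practice, and reward scaling is simplified to dividing by the batch standard deviation rather than a running estimate.

```python
import math
from statistics import fmean, pstdev


def normalize_advantages(adv, eps=1e-8):
    """Shift and scale a batch of advantages to zero mean and unit std."""
    mean, std = fmean(adv), pstdev(adv)
    return [(a - mean) / (std + eps) for a in adv]


def scale_rewards(rewards, eps=1e-8):
    """Simplified reward scaling (assumption): divide by the batch std."""
    std = pstdev(rewards)
    return [r / (std + eps) for r in rewards]


def clipped_policy_loss(new_logp, old_logp, adv, clip_eps=0.2):
    """PPO clipped surrogate objective, negated so it can be minimized."""
    terms = []
    for nl, ol, a in zip(new_logp, old_logp, adv):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * a, clipped_ratio * a))
    return -fmean(terms)


def value_loss(values, returns):
    """Half mean squared error between value predictions and returns."""
    return 0.5 * fmean([(v - r) ** 2 for v, r in zip(values, returns)])


def total_loss(policy_loss, v_loss, vf_coef=0.5):
    """Balance the policy and value terms with a value coefficient."""
    return policy_loss + vf_coef * v_loss


def approx_kl(new_logp, old_logp):
    """Sample estimate of KL(old || new) from log-probabilities."""
    return fmean([ol - nl for nl, ol in zip(new_logp, old_logp)])


def should_stop_early(new_logp, old_logp, target_kl=0.015):
    """Stop the current round of update epochs once the KL estimate drifts too far."""
    return approx_kl(new_logp, old_logp) > 1.5 * target_kl
```

In a training loop, `should_stop_early` would typically be checked after each minibatch or epoch of updates on the same batch of rollouts, breaking out of the update loop when it returns `True`.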