Prof. Dr. Sebastian Peitz
Chair of Safe Autonomous Systems, TU Dortmund
| Chapter | Topic | Content |
|---|---|---|
| Basics & tabular methods | ||
| 1-5 | Bandits, MDPs, Dynamic Programming, Monte Carlo, TD Learning | RL basics in finite dimensions |
| Deep-learning-based methods | ||
| 6 | Brief introduction to deep learning | The basics for what comes next |
| 7 | Value function approximation | Value estimation with function approximation |
| 8 | Deep Q-learning | Q-learning with neural networks |
| 9 | Policy gradients | Direct optimization of the policy |
| 10 | Actor-critic algorithms | Improved policy gradients via value functions |
| 11 | Advanced algorithms (Part I): From policy gradient to PPO | The evolution of modern RL algorithms |
| 12 | Advanced algorithms (Part II): From \(Q\)-learning to Soft Actor-Critic | The evolution of modern RL algorithms |
| Advanced topics | ||