From 0 to 1: Code the Classic Reinforcement Learning Algorithms
Programming Course, Bilibili, 2025
This coding series takes you from basic to advanced reinforcement learning through hands-on Python implementations of the classic algorithms. Each method is explained and written line by line, starting with tabular Q-Learning and Double Q-Learning and progressing to modern deep RL algorithms: DQN, DDPG, MADDPG, PPO, and SAC.
Q-Learning
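Click the web link and start learning: Bilibili - 强化学习 Q-learning玩21点纸牌 纯白板逐行代码Python实现 (Reinforcement Learning: Q-Learning Plays Blackjack, a Line-by-Line Python Implementation from Scratch)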
*Note: This course is in Chinese, but it won’t affect learning.
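The heart of tabular Q-Learning is one update rule, Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)], paired with epsilon-greedy exploration. Below is a minimal plain-Python sketch of that rule, not the course's own code; the function names and the Blackjack-style state tuple (player sum, dealer card, usable ace) with actions 0 = stick and 1 = hit are illustrative assumptions.

```python
import random
from collections import defaultdict

N_ACTIONS = 2  # assumption: Blackjack-style actions, 0 = stick, 1 = hit


def make_q_table(n_actions=N_ACTIONS):
    """Q-table mapping any hashable state to a list of action values, all starting at 0."""
    return defaultdict(lambda: [0.0] * n_actions)


def epsilon_greedy(q, state, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    values = q[state]
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def q_learning_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=1.0):
    """Move Q(s, a) toward the TD target r + gamma * max_a' Q(s', a')."""
    bootstrap = 0.0 if done else max(q[next_state])  # no future value after a terminal step
    td_target = reward + gamma * bootstrap
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    q = make_q_table()
    s, s_next = (12, 5, False), (13, 5, False)  # hypothetical (player sum, dealer card, usable ace)
    q[s_next] = [1.0, 2.0]
    print(q_learning_update(q, s, 1, 0.0, s_next, done=False, alpha=0.5, gamma=0.9))  # -> 0.9
    print(epsilon_greedy(q, s_next, epsilon=0.0))  # greedy choice -> 1
```

Because the update uses the max over next actions regardless of which action is actually taken next, Q-Learning is off-policy: the agent can explore with epsilon-greedy while still learning the greedy policy's values.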
Double Q-Learning
Click the web link and start learning: Bilibili - 强化学习 Double Q-learning 纯白板逐行代码Python实现 (Reinforcement Learning: Double Q-Learning, a Line-by-Line Python Implementation from Scratch)
*Note: This course is in Chinese, but it won’t affect learning.
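Double Q-Learning keeps two Q-tables. On each step one table is chosen at random to be updated: it picks the greedy next action, and the other table supplies that action's value. Decoupling selection from evaluation reduces the overestimation that comes from taking a max over noisy estimates. The sketch below is illustrative, not the course's code; the function name and the optional update_a flag (added only so the coin flip can be fixed for a demonstration) are hypothetical.

```python
import random
from collections import defaultdict
from typing import Optional


def double_q_update(q_a, q_b, state, action, reward, next_state, done,
                    alpha=0.1, gamma=0.99, update_a: Optional[bool] = None, rng=random):
    """One Double Q-learning step on two tabular Q-functions (dicts of action-value lists).

    A fair coin (or the explicit update_a flag) decides which table learns; that table
    selects the greedy next action and the other table evaluates it.
    """
    if update_a is None:
        update_a = rng.random() < 0.5
    learner, evaluator = (q_a, q_b) if update_a else (q_b, q_a)
    if done:
        bootstrap = 0.0
    else:
        next_values = learner[next_state]
        best = max(range(len(next_values)), key=lambda a: next_values[a])  # argmax from the learner
        bootstrap = evaluator[next_state][best]  # value from the other table
    td_target = reward + gamma * bootstrap
    learner[state][action] += alpha * (td_target - learner[state][action])
    return learner[state][action]


if __name__ == "__main__":
    q_a = defaultdict(lambda: [0.0, 0.0])
    q_b = defaultdict(lambda: [0.0, 0.0])
    q_a["s1"], q_b["s1"] = [1.0, 5.0], [3.0, 2.0]
    # Q_A prefers action 1 in s1, but Q_B values it at only 2.0, so the target is 2.0, not 5.0.
    print(double_q_update(q_a, q_b, "s0", 0, 0.0, "s1", False, alpha=1.0, gamma=1.0, update_a=True))
```

For acting, a common choice is to run epsilon-greedy on the sum or average of the two tables.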
Deep Q-Network (DQN)
This is a step-by-step tutorial on implementing a Deep Q-Network (DQN) in Python.
Click the web link and start learning: Bilibili - 深度强化学习 DQN 纯白板逐行代码Python实现 (Deep Reinforcement Learning: DQN, a Line-by-Line Python Implementation from Scratch)
*Note: This course is in Chinese, but it won’t affect learning.
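Two ingredients set DQN apart from tabular Q-Learning: an experience replay buffer, which breaks the correlation between consecutive samples, and a separate, periodically synced target network used to compute the regression targets y = r + gamma * max_a' Q_target(s', a') (with no bootstrap on terminal steps). The sketch below shows both in plain Python so it runs without a deep-learning framework; the target-network outputs are passed in as plain lists, standing in for what a PyTorch (or similar) network would return. Class and function names are illustrative, not taken from the course.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions; the oldest are dropped when it is full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Uniformly sample a batch, regrouped as (states, actions, rewards, next_states, dones)."""
        batch = random.sample(list(self.buffer), batch_size)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.buffer)


def dqn_targets(rewards, dones, next_q_target, gamma=0.99):
    """y_i = r_i + gamma * max_a' Q_target(s'_i, a'), with no bootstrap on terminal steps.

    next_q_target[i] holds the action values for next_state i; in a real DQN these
    come from the frozen target network (assumption: already computed here).
    """
    return [r + (0.0 if d else gamma * max(q)) for r, d, q in zip(rewards, dones, next_q_target)]


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=3)
    for t in range(5):
        buf.push(t, 0, 1.0, t + 1, t == 4)
    print(len(buf))  # only the 3 most recent transitions are kept -> 3
    states, actions, rewards, next_states, dones = buf.sample(2)
    print(states, dqn_targets(rewards, dones, [[0.5, 2.0], [1.0, 0.0]], gamma=0.5))
```

The online network is then regressed toward these targets, and its weights are copied into the target network every fixed number of steps.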
Deep Deterministic Policy Gradient (DDPG)
This video walks you through implementing the Deep Deterministic Policy Gradient (DDPG) method step by step in Python.
Click the web link and start learning: Bilibili - 深度强化学习 DDPG 纯白板逐行代码Python实现 (Deep Reinforcement Learning: DDPG, a Line-by-Line Python Implementation from Scratch)
*Note: This course is in Chinese, but it won’t affect learning.
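DDPG pairs a deterministic actor mu(s) with a critic Q(s, a). The critic is trained toward y = r + gamma * Q'(s', mu'(s')), where Q' and mu' are slowly tracking target copies updated by Polyak averaging, and exploration comes from noise added to the actor's output. A minimal sketch, assuming network parameters are flat lists of floats and the networks are plain Python callables (real implementations use tensors); function names are hypothetical. The original DDPG paper used Ornstein-Uhlenbeck noise; the Gaussian noise below is a common simplification.

```python
import random


def ddpg_critic_target(reward, done, next_state, target_actor, target_critic, gamma=0.99):
    """y = r + gamma * Q'(s', mu'(s')): the target actor supplies the next action."""
    if done:
        return reward
    next_action = target_actor(next_state)
    return reward + gamma * target_critic(next_state, next_action)


def soft_update(target_params, source_params, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta + (1 - tau) * theta_target."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target_params, source_params)]


def explore(action, sigma, low, high, rng=random):
    """Add Gaussian exploration noise to a deterministic action and clip it to the bounds."""
    return min(max(action + rng.gauss(0.0, sigma), low), high)


if __name__ == "__main__":
    actor = lambda s: 2 * s         # hypothetical stand-in for the target actor
    critic = lambda s, a: s + a     # hypothetical stand-in for the target critic
    print(ddpg_critic_target(1.0, False, 1.0, actor, critic, gamma=0.5))  # -> 2.5
    print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
    print(explore(0.3, sigma=0.1, low=-1.0, high=1.0))
```

A small tau keeps the targets moving slowly, which stabilizes the bootstrapped critic target.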
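Multi-Agent Deep Deterministic Policy Gradient (MADDPG)
Click the web link and start learning: Bilibili - 多智能体深度强化学习 MADDPG 纯白板逐行代码Python实现 (Multi-Agent Deep Reinforcement Learning: MADDPG, a Line-by-Line Python Implementation from Scratch)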
*Note: This course is in Chinese, but it won’t affect learning.
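MADDPG extends DDPG with centralized training and decentralized execution: each agent's actor sees only its own observation, while each agent's critic is trained on all agents' observations and actions. The sketch below shows how the critic target for agent i is assembled under that scheme; it assumes actors and critics are plain Python callables and that rewards and done flags are per-agent lists. All names are hypothetical, not the course's code.

```python
def maddpg_critic_target(i, rewards, dones, next_obs, target_actors, target_critics, gamma=0.95):
    """Target for agent i's centralized critic.

    Each target actor acts on its own next observation (decentralized execution),
    but agent i's critic sees every observation and every action (centralized training).
    """
    if dones[i]:
        return rewards[i]
    next_actions = [mu(o) for mu, o in zip(target_actors, next_obs)]
    return rewards[i] + gamma * target_critics[i](next_obs, next_actions)


if __name__ == "__main__":
    actors = [lambda o: o, lambda o: o]                      # hypothetical identity policies
    critic = lambda obs, acts: sum(obs) + sum(acts)          # hypothetical joint critic
    print(maddpg_critic_target(1, [0.5, 1.0], [False, False], [1.0, 2.0],
                               actors, [critic, critic], gamma=0.5))  # -> 4.0
```

Only the critics need the joint information; at execution time each agent runs its own actor on local observations.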
Proximal Policy Optimization (PPO)
Click the web link and start learning: Bilibili - 深度强化学习 PPO 纯白板逐行代码Python实现 (Deep Reinforcement Learning: PPO, a Line-by-Line Python Implementation from Scratch)
*Note: This course is in Chinese, but it won’t affect learning.
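PPO limits how far each policy update can move by clipping the probability ratio r = pi_new(a|s) / pi_old(a|s) inside the surrogate objective min(r * A, clip(r, 1 - eps, 1 + eps) * A). The sketch below computes that loss from log-probabilities, together with Generalized Advantage Estimation (GAE), a common way to obtain the advantages A in PPO implementations (an assumption here; the course may estimate advantages differently). Plain Python lists stand in for tensors, and the function names are hypothetical.

```python
import math


def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate: -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # ratio from log-probs for numerical stability
        clipped_ratio = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped_ratio * adv)  # pessimistic of the two terms
    return -total / len(advantages)  # negated so gradient descent maximizes the objective


def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout, computed backwards in time.

    dones[t] is True if the episode ended after step t, which cuts off bootstrapping.
    """
    advantages = [0.0] * len(rewards)
    next_value, running = last_value, 0.0
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * mask - values[t]  # one-step TD error
        running = delta + gamma * lam * mask * running
        advantages[t] = running
        next_value = values[t]
    return advantages


if __name__ == "__main__":
    print(ppo_clip_loss([math.log(2.0)], [0.0], [1.0]))  # ratio 2 is clipped to 1.2
    print(gae([1.0, 1.0], [0.0, 0.0], [False, True], 5.0, gamma=1.0, lam=1.0))  # -> [2.0, 1.0]
```

Because the objective takes the minimum of the clipped and unclipped terms, the policy gains nothing from pushing the ratio beyond the clip range, which is what keeps updates conservative.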
Soft Actor-Critic (SAC)
Click the web link and start learning: Bilibili - 深度强化学习 SAC 纯白板逐行代码Python实现 (Deep Reinforcement Learning: SAC, a Line-by-Line Python Implementation from Scratch)
*Note: This course is in Chinese, but it won’t affect learning.
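SAC adds an entropy bonus to the return, so its critic target subtracts alpha * log pi(a'|s') from the next-state value, and it takes the minimum of two target critics to curb overestimation. Its actor samples actions with the reparameterization trick through a tanh-squashed Gaussian, which needs a log-probability correction for the squashing. The sketch below covers both pieces in one action dimension, assuming the SAC variant without a separate value network and a fixed temperature alpha (some versions tune alpha automatically); function names are hypothetical, not the course's code.

```python
import math

LOG_2PI = math.log(2.0 * math.pi)


def squashed_gaussian(mu, log_std, eps):
    """Reparameterized sample a = tanh(mu + std * eps) and its log-probability.

    eps ~ N(0, 1) is drawn by the caller; the -log(1 - tanh(u)^2) term corrects the
    Gaussian density for the tanh squashing, and 1e-6 guards against log(0).
    """
    std = math.exp(log_std)
    u = mu + std * eps
    action = math.tanh(u)
    log_prob = -0.5 * eps * eps - log_std - 0.5 * LOG_2PI  # log N(u; mu, std)
    log_prob -= math.log(1.0 - action * action + 1e-6)
    return action, log_prob


def sac_critic_target(reward, done, q1_next, q2_next, next_log_prob, alpha=0.2, gamma=0.99):
    """Soft Bellman target: r + gamma * (min(Q1', Q2') - alpha * log pi(a'|s')).

    q1_next and q2_next are the two target critics evaluated at (s', a'), a' ~ pi(.|s').
    """
    if done:
        return reward
    soft_value = min(q1_next, q2_next) - alpha * next_log_prob
    return reward + gamma * soft_value


if __name__ == "__main__":
    a, lp = squashed_gaussian(mu=0.0, log_std=0.0, eps=0.0)
    print(a, lp)
    print(sac_critic_target(1.0, False, 3.0, 2.0, -1.0, alpha=0.5, gamma=0.5))  # -> 2.25
```

A low log-probability (a high-entropy action) raises the target, which is how SAC rewards the policy for staying stochastic.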
