Math in Reinforcement Learning

Theoretical Course, Bilibili, 2025

Unbiased and Biased Estimation

In reinforcement learning, the concept of “unbiased estimation” is ubiquitous: one of the core tasks of RL is to estimate expected values, such as the expected return from a state, using samples generated by interacting with the environment. This lecture first introduces the basic concept of an unbiased estimator (one whose expected value equals the true quantity being estimated), and then contrasts two common RL methods: the Monte Carlo method, whose target is unbiased, and the temporal difference method, whose bootstrapped target is biased.
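The contrast between the two targets can be seen in a minimal Python sketch. It assumes a hypothetical two-step episode A → B → terminal with γ = 1, where the first reward is 0 and the last reward is 1 with probability 0.5, so the true value is V(A) = 0.5. The Monte Carlo target is the observed return, whose mean matches V(A); the TD(0) target bootstraps on a current estimate V̂(B), which is assumed here to be initialized to 0. All function names are illustrative only.

```python
import random

# Hypothetical episode A -> B -> terminal with gamma = 1.
# Reward on A -> B is 0; reward on B -> terminal is 1 with prob. 0.5, else 0.
# Hence the true values are V(A) = V(B) = 0.5.
GAMMA = 1.0
TRUE_V_A = 0.5


def sample_episode(rng):
    """Return the rewards [r_1, r_2] of one episode starting in A."""
    r1 = 0.0
    r2 = 1.0 if rng.random() < 0.5 else 0.0
    return [r1, r2]


def mc_target(rewards):
    """Monte Carlo target for V(A): the full discounted return G_0."""
    return sum(GAMMA ** k * r for k, r in enumerate(rewards))


def td_target(rewards, v_hat_b):
    """TD(0) target for V(A): r_1 + gamma * V_hat(B), bootstrapping on an estimate."""
    return rewards[0] + GAMMA * v_hat_b


def mean_targets(n, v_hat_b, seed=0):
    """Average the MC and TD targets for V(A) over n sampled episodes."""
    rng = random.Random(seed)
    mc, td = [], []
    for _ in range(n):
        episode = sample_episode(rng)
        mc.append(mc_target(episode))
        td.append(td_target(episode, v_hat_b))
    return sum(mc) / n, sum(td) / n


if __name__ == "__main__":
    mc_mean, td_mean = mean_targets(100_000, v_hat_b=0.0)
    print(f"MC target mean: {mc_mean:.3f} (true V(A) = {TRUE_V_A})")
    print(f"TD target mean: {td_mean:.3f} (biased while V_hat(B) = 0)")
```

With `v_hat_b=0.5` (the true V(B)) the TD target becomes unbiased, which shows that TD's bias comes from bootstrapping on an imperfect estimate. In exchange, the TD target here has zero variance, while each Monte Carlo return is either 0 or 1.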

Click the web link to start this lecture: Bilibili - 无偏估计vs有偏估计 (Unbiased vs. Biased Estimation)

Markov Decision Process

Reinforcement learning problems are typically modeled as Markov Decision Processes (MDPs), so what exactly is an MDP? This lecture builds up the answer step by step in four stages:

  1. Markov property
  2. Markov process/Markov chain
  3. Markov reward process
  4. Markov decision process
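The four layers can be made concrete with a small Python sketch, assuming a hypothetical three-state, two-action MDP given as plain dictionaries for (S, A, P, R, γ). Fixing a policy π(a|s) averages out the actions and yields a Markov reward process with P_π and R_π; ignoring its rewards leaves the underlying Markov chain P_π. All state names, actions, and numbers are made up for illustration.

```python
import random

# Hypothetical toy MDP (S, A, P, R, gamma); every name and number is illustrative.
STATES = ["s0", "s1", "s2"]
ACTIONS = ["stay", "go"]
GAMMA = 0.9

# P[s][a] = {s_next: probability}. The next state depends only on (s, a),
# not on earlier history: this is the Markov property.
P = {
    "s0": {"stay": {"s0": 1.0}, "go": {"s1": 0.8, "s0": 0.2}},
    "s1": {"stay": {"s1": 1.0}, "go": {"s2": 0.9, "s1": 0.1}},
    "s2": {"stay": {"s2": 1.0}, "go": {"s2": 1.0}},
}
# R[s][a] = expected immediate reward for taking action a in state s.
R = {
    "s0": {"stay": 0.0, "go": -1.0},
    "s1": {"stay": 0.0, "go": -1.0},
    "s2": {"stay": 1.0, "go": 1.0},
}


def induced_mrp(policy):
    """Fix a policy pi(a|s) to turn the MDP into a Markov reward process.

    P_pi(s'|s) = sum_a pi(a|s) P(s'|s,a)   (dropping rewards: a Markov chain)
    R_pi(s)    = sum_a pi(a|s) R(s,a)
    """
    p_pi, r_pi = {}, {}
    for s in STATES:
        p_pi[s] = {s2: 0.0 for s2 in STATES}
        r_pi[s] = 0.0
        for a, prob_a in policy[s].items():
            r_pi[s] += prob_a * R[s][a]
            for s2, prob_s2 in P[s][a].items():
                p_pi[s][s2] += prob_a * prob_s2
    return p_pi, r_pi


def step(s, a, rng):
    """Sample one MDP transition: s' ~ P(.|s,a); return (s', R(s,a))."""
    next_states, probs = zip(*P[s][a].items())
    s_next = rng.choices(next_states, weights=probs)[0]
    return s_next, R[s][a]


if __name__ == "__main__":
    uniform = {s: {a: 0.5 for a in ACTIONS} for s in STATES}
    p_pi, r_pi = induced_mrp(uniform)
    print(p_pi["s0"], r_pi["s0"])
    print(step("s0", "go", random.Random(0)))
```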

Click the web link to start this lecture: Bilibili - 马尔可夫决策过程 (Markov Decision Process)

What Exactly Is the Policy π?

π shows up everywhere in reinforcement learning, but what exactly is it mathematically? Let’s dive into this lecture and find out.
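In standard RL notation, a stochastic policy is a conditional probability distribution π(a|s) over actions given the current state, and a deterministic policy is the special case that puts probability 1 on a single action. A minimal Python sketch of that view, using a hypothetical policy table `PI` and illustrative helper names:

```python
import math
import random

# Hypothetical policy table: PI[s][a] = pi(a|s). Each row must be a probability
# distribution over actions.
PI = {
    "s0": {"left": 0.3, "right": 0.7},
    "s1": {"left": 1.0, "right": 0.0},  # deterministic policy: a one-hot row
}


def is_valid_policy(pi, tol=1e-9):
    """Check that every row pi(.|s) is non-negative and sums to 1."""
    return all(
        all(p >= 0.0 for p in row.values())
        and math.isclose(sum(row.values()), 1.0, abs_tol=tol)
        for row in pi.values()
    )


def sample_action(pi, s, rng):
    """Draw an action a ~ pi(.|s)."""
    actions, probs = zip(*pi[s].items())
    return rng.choices(actions, weights=probs)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    draws = [sample_action(PI, "s0", rng) for _ in range(10_000)]
    # The empirical frequency of "right" should be close to pi(right|s0) = 0.7.
    print(is_valid_policy(PI), draws.count("right") / len(draws))
```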

Click the web link to start this lecture: Bilibili - 策略π到底是什么东西 (What Exactly Is the Policy π)

Value Function

Value functions come in two forms: the state value function and the action value function. Let’s investigate their basic definitions and see how they are constructed in RL. This lecture mainly covers:

  1. State value function V(s)
  2. Action value function Q(s,a)
  3. The mathematical relationship between V(s) and Q(s,a)
  4. Why Q(s,a) is more commonly used than V(s) in reinforcement learning
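The relationship between V(s) and Q(s,a) can be checked numerically. The sketch below assumes a hypothetical two-state, two-action MDP with a uniform random policy and uses the Bellman expectation relations V(s) = Σ_a π(a|s) Q(s,a) and Q(s,a) = R(s,a) + γ Σ_{s'} P(s'|s,a) V(s'), applying them repeatedly until V stops changing (iterative policy evaluation). All numbers and function names are illustrative.

```python
# Hypothetical two-state, two-action MDP; numbers are illustrative only.
STATES = ["s0", "s1"]
ACTIONS = ["a0", "a1"]
GAMMA = 0.9
P = {  # P[s][a][s'] = transition probability
    "s0": {"a0": {"s0": 1.0, "s1": 0.0}, "a1": {"s0": 0.2, "s1": 0.8}},
    "s1": {"a0": {"s0": 0.5, "s1": 0.5}, "a1": {"s0": 0.0, "s1": 1.0}},
}
R = {  # R[s][a] = expected immediate reward
    "s0": {"a0": 0.0, "a1": 1.0},
    "s1": {"a0": 2.0, "a1": 0.0},
}
PI = {s: {"a0": 0.5, "a1": 0.5} for s in STATES}  # uniform random policy


def q_from_v(v):
    """Q(s,a) = R(s,a) + gamma * sum_{s'} P(s'|s,a) V(s')."""
    return {
        s: {a: R[s][a] + GAMMA * sum(p * v[s2] for s2, p in P[s][a].items())
            for a in ACTIONS}
        for s in STATES
    }


def v_from_q(q):
    """V(s) = sum_a pi(a|s) Q(s,a)."""
    return {s: sum(PI[s][a] * q[s][a] for a in ACTIONS) for s in STATES}


def evaluate_policy(tol=1e-10):
    """Iterative policy evaluation: repeat V <- v_from_q(q_from_v(V)) to convergence."""
    v = {s: 0.0 for s in STATES}
    while True:
        v_new = v_from_q(q_from_v(v))
        if max(abs(v_new[s] - v[s]) for s in STATES) < tol:
            return v_new
        v = v_new


def greedy_from_q(q):
    """Pick argmax_a Q(s,a) in each state; this uses only Q, not P or R."""
    return {s: max(ACTIONS, key=lambda a: q[s][a]) for s in STATES}


if __name__ == "__main__":
    v = evaluate_policy()
    q = q_from_v(v)
    print(v)
    print(v_from_q(q))  # agrees with v up to the tolerance
    print(greedy_from_q(q))
```

Note that `greedy_from_q` reads only the Q table, whereas choosing an action greedily from V alone would require P and R to look one step ahead. This model-free action selection is one commonly cited reason why Q(s,a) is preferred in practice.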

Click the web link to start this lecture: Bilibili - 价值函数 (Value Function)

Monte Carlo Methods

The main idea of the Monte Carlo method is as follows: when the quantity to be computed is the expected value of a random variable, draw a large number of independent random samples and use their sample mean to approximate that expected value. Mathematically, it relies on the law of large numbers from probability theory to approximate integrals or expectations that are difficult to compute directly. This lecture covers:

  1. What is the Monte Carlo method
  2. The Monte Carlo method in reinforcement learning
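Both ideas fit in a short Python sketch. `mc_expectation` approximates E[X²] for X ~ Uniform(0, 1), i.e. the integral ∫₀¹ x² dx = 1/3, by a sample mean. `first_visit_mc` applies the same averaging to returns in RL (first-visit Monte Carlo prediction). It assumes a hypothetical five-state random walk that starts in the middle state, moves left or right with probability 0.5, gives reward 1 for exiting on the right and 0 otherwise, with γ = 1, so the true values are V(s) = s/6. The environment and function names are illustrative.

```python
import random
import statistics


def mc_expectation(f, sampler, n, rng):
    """Approximate E[f(X)] by the sample mean of f over n i.i.d. draws X = sampler(rng)."""
    return statistics.fmean(f(sampler(rng)) for _ in range(n))


def random_walk_episode(rng, n_states=5):
    """Hypothetical random walk on states 1..n_states with terminals 0 and n_states + 1.

    Returns [(s_t, r_{t+1}), ...]; reward is 1 only on exiting to the right.
    """
    s = (n_states + 1) // 2
    episode = []
    while True:
        s_next = s + (1 if rng.random() < 0.5 else -1)
        r = 1.0 if s_next == n_states + 1 else 0.0
        episode.append((s, r))
        if s_next in (0, n_states + 1):
            return episode
        s = s_next


def first_visit_mc(n_episodes, rng, gamma=1.0, n_states=5):
    """First-visit MC prediction: V(s) = mean return following the first visit to s."""
    returns = {s: [] for s in range(1, n_states + 1)}
    for _ in range(n_episodes):
        episode = random_walk_episode(rng, n_states)
        g = 0.0
        first_return = {}
        # Walk backwards accumulating G_t = r_{t+1} + gamma * G_{t+1}; the last
        # write for each state comes from its earliest time step, i.e. its first visit.
        for s, r in reversed(episode):
            g = r + gamma * g
            first_return[s] = g
        for s, g_s in first_return.items():
            returns[s].append(g_s)
    return {s: statistics.fmean(gs) for s, gs in returns.items() if gs}


if __name__ == "__main__":
    est = mc_expectation(lambda x: x * x, lambda r: r.random(), 100_000, random.Random(0))
    print(f"E[X^2] estimate: {est:.4f} (exact 1/3)")
    v = first_visit_mc(5000, random.Random(1))
    for s in sorted(v):
        print(s, round(v[s], 3), "true:", round(s / 6, 3))
```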

Click the web link to start this lecture: Bilibili - 蒙特卡洛方法 (Monte Carlo Method)