learning-to-reinforcement-learning

Reinforcement learning notes, based on ZhouBolei's introRL course.

Foundation

  • Lecture 1: Overview (course overview and RL basics)
    • Keywords: Agent, Environment, Action, Reward
    • Supervised learning: Annotated data (e.g. labeled images); samples are i.i.d. (independent and identically distributed).
    • Reinforcement learning: Data are not i.i.d. but form a correlated time series (consecutive samples depend on each other); there is no instant feedback or label telling the agent which action was correct.
    • Features of RL: Trial-and-error exploration; delayed reward; time matters (sequential, non-i.i.d. data); the agent's actions change the environment, and therefore the data it receives later.
    • Rewards: A scalar feedback signal that indicates how well the agent is doing at step t; RL is based on maximizing the expected cumulative reward.
    • Sequential decision making: The agent must trade off immediate reward against long-term reward.
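    • Components of an RL agent: Policy, value function, model.
    • Agent types: Value-based, policy-based, and actor-critic agents.
    • Exploration vs. exploitation: Trying new actions to gather information versus choosing the actions currently believed to give the highest reward.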
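A minimal sketch of the reward and sequential-decision bullets above, assuming the common discounted-return formulation G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … with a discount factor γ (`gamma`) in [0, 1]. These notes do not fix a value of γ, and `discounted_return` is a hypothetical helper name used only for illustration.

```python
def discounted_return(rewards, gamma):
    """Hypothetical helper: G_0 = r_1 + gamma*r_2 + gamma^2*r_3 + ... for one episode.

    `rewards` lists the rewards received after each step, in order.
    """
    g = 0.0
    # Accumulate backwards using the recursion G_t = r_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Immediate reward of 1 vs. a reward of 2 that only arrives two steps later.
immediate = [1.0, 0.0, 0.0]
delayed = [0.0, 0.0, 2.0]

for gamma in (0.5, 0.9):
    print(gamma, discounted_return(immediate, gamma), discounted_return(delayed, gamma))
```

With gamma = 0.5 the immediate reward has the higher return (1.0 vs 0.5); with gamma = 0.9 the delayed reward wins (about 1.62 vs 1.0), so the discount factor controls how strongly the agent values long-term reward.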
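A sketch of the agent-environment loop from the keywords and feature bullets above, under stated assumptions: `Corridor` is a hypothetical toy environment (positions 0 to 4, reward 1 only on reaching the right end), and `random_policy` and `run_episode` are illustrative names, not part of any library. It shows the agent's action changing the state, a reward that is delayed until the goal, and consecutive states that are correlated rather than i.i.d.

```python
import random


class Corridor:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):
        # action is -1 (left) or +1 (right); the agent's action changes the state
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0  # delayed reward: only given at the goal
        return self.pos, reward, done


def random_policy(state, rng):
    """Illustrative stochastic policy pi(a|s) that ignores the state."""
    return rng.choice([-1, +1])


def run_episode(env, policy, rng, max_steps=100):
    """Agent-environment loop; returns the trajectory as (state, action, reward) tuples."""
    trajectory = []
    state = env.reset()
    for _ in range(max_steps):
        action = policy(state, rng)                  # agent acts
        next_state, reward, done = env.step(action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory


rng = random.Random(0)
traj = run_episode(Corridor(), random_policy, rng)
print(len(traj), sum(r for _, _, r in traj))
```

Each recorded state differs from the previous one by at most one position, which is the kind of temporally correlated data that breaks the i.i.d. assumption of supervised learning.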
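Exploration vs. exploitation, together with the idea of a value-based agent, can be sketched with an epsilon-greedy agent on a multi-armed bandit. This assumes Gaussian rewards and sample-average value estimates; `epsilon_greedy_bandit`, `true_means`, and `noise_sd` are hypothetical names chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, noise_sd=1.0, seed=0):
    """Hypothetical sample-average epsilon-greedy agent on a Gaussian bandit.

    true_means: assumed expected reward of each arm (unknown to the agent).
    Returns the value estimates Q and the pull counts N.
    """
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k  # value estimates Q(a), initialised to 0
    n = [0] * k    # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: uniformly random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: greedy arm (ties -> lowest index)
        reward = rng.gauss(true_means[a], noise_sd)  # noisy reward from the environment
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]             # incremental sample average
    return q, n


q, n = epsilon_greedy_bandit([0.2, 0.5, 1.0], epsilon=0.1)
print(q, n)
```

With `epsilon=0.0` and `noise_sd=0.0`, the agent pulls arm 0 first (ties go to the lowest index), sees a positive reward, and never tries the better arms again; a small `epsilon` is what lets it discover that arm 2 is best.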

Advanced Topics