Reinforcement learning study notes; the study material is ZhouBolei's introRL course.
Foundation
- Lecture1: Overview (course overview and RL basics)
- Keywords: Agent, Environment, Action, Reward (a toy interaction-loop sketch follows this list).
- Supervised learning: annotated data (e.g. labeled images); samples follow the i.i.d. assumption (independent and identically distributed).
- Reinforcement learning: data are not i.i.d. but form a correlated time series (samples are temporally dependent); there is no instant feedback or label for the correct action (feedback is delayed).
- Features of RL: Trial-and-error exploration; Delayed reward; Time matters (sequential, non-i.i.d. data); The agent's actions change the environment.
- Rewards: A scalar feedback signal; Indicates how well the agent is doing at step t; RL is based on the maximization of cumulative reward.
- Sequential Decision Making: Trade-off between immediate reward and long-term reward (see the discounted-return sketch after this list).
- Components of an RL agent: Policy, Value function, Model (a tabular sketch follows this list).
- Agent types: Value-based agent, Policy-based agent, Actor-Critic agent.
- Exploration and Exploitation (see the ε-greedy sketch below).
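
A few small Python sketches for the bullets above. First, the agent–environment loop behind the keywords (Agent, Environment, Action, Reward): a minimal sketch with a made-up `ToyEnv` and a random policy, not code from the course.

```python
import random

class ToyEnv:
    """Hypothetical 1-D environment: the agent walks left/right on a line
    and is rewarded for reaching position +3 within 10 steps."""

    def reset(self):
        self.pos, self.t = 0, 0
        return self.pos  # initial observation (state)

    def step(self, action):
        # action: -1 (left) or +1 (right); the action changes the environment state
        self.pos += action
        self.t += 1
        done = self.pos == 3 or self.t >= 10
        reward = 1.0 if self.pos == 3 else 0.0  # scalar feedback signal
        return self.pos, reward, done

env = ToyEnv()
obs, done, total_reward = env.reset(), False, 0.0
while not done:
    action = random.choice([-1, 1])        # a (random) policy picks an action
    obs, reward, done = env.step(action)   # environment returns next state and reward
    total_reward += reward
print("episode return:", total_reward)
```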
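
The immediate vs. long-term trade-off is usually handled with a discount factor gamma; a minimal sketch of the discounted return G_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …, using a made-up reward sequence:

```python
def discounted_return(rewards, gamma=0.9):
    """Sum rewards backwards: each step adds gamma times the return of the future."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A myopic agent (gamma = 0) only values the immediate reward;
# a far-sighted agent (gamma close to 1) also values the delayed reward.
rewards = [0.0, 0.0, 0.0, 1.0]          # toy episode with a delayed reward at the last step
print(discounted_return(rewards, 0.0))  # 0.0
print(discounted_return(rewards, 0.9))  # ~0.729
```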
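
A toy tabular sketch of the three agent components (Policy, Value function, Model); the tables are hypothetical placeholders assuming a tiny discrete state/action space:

```python
# Policy: maps a state to an action (here a deterministic lookup table).
policy = {0: "right", 1: "right", 2: "right"}

# Value function: expected (discounted) return from each state under the policy.
value = {0: 0.81, 1: 0.9, 2: 1.0}

# Model: the agent's prediction of the environment, i.e. next state and reward
# for each (state, action) pair.
model = {(0, "right"): (1, 0.0), (1, "right"): (2, 0.0), (2, "right"): (3, 1.0)}

state = 0
action = policy[state]                        # policy-based agents act directly from the policy
print("V(s) =", value[state])                 # value-based agents act from value estimates
print("predicted:", model[(state, action)])   # model-based agents can plan with the model
```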
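
Exploration vs. exploitation is commonly balanced with an ε-greedy rule (one standard technique, not necessarily the one used in the lecture); a minimal sketch with placeholder Q-values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon explore a random action,
    otherwise exploit the action with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])   # exploitation

q = [0.2, 0.5, 0.1]            # toy action-value estimates Q(s, a)
print(epsilon_greedy(q, 0.1))  # usually 1, occasionally a random action
```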