Lecture 25
- Shattering
- 3 kinds of classifiers
- threshold = decision stump
- interval classifier
Reinforcement learning
- MDP = Markov decision process
- Exploration vs exploitation
- learning through interaction with environment
- e.g. child learning to walk
- do something
- get feedback from the environment
- whether you are doing right or wrong
- take action
- action is given a reward/penalty
- state and action spaces
- how many states possible
- what actions
- episodic and continual tasks
- being in a maze and getting out = episodic task (it ends)
- learning to walk = continual task
- it is continuous/ongoing
- reward and delayed reward
- delayed
- you do not immediately know whether a move will take you closer to the goal or not
- I only learn whether a move was good or not when the opponent now makes their move
- so the reward is delayed
- policy
- MDP and SMDP
- Q-learning algorithm
Learning types
- supervised
- sample input/output pairs of the function to be learned are given
- there is a teacher
- training data is given
- predict the output, minimizing some loss
- e.g.: regression, classification
- unsupervised
- only data points are given (features only)
- find similar Xs
- clustering
- reinforcement
- agent acts on its environment
- receives an evaluation of its action
- not told which action is correct to achieve the goal
- training data: (S, A, R) = states, actions, rewards
- develop an optimal policy (sequence of decision rules) for the learner so as to maximize its long-term reward
- e.g. robotics, board games, etc. (the data shapes are contrasted in the sketch below)
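As a rough illustration (values made up, not from the lecture), the three settings differ in the shape of the training data they see:

```python
# Illustrative data only: what each learning type gets as input.

supervised_data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]    # (x, y) pairs: teacher supplies y
unsupervised_data = [[1.0, 0.2], [0.9, 0.3], [5.1, 4.8]]  # feature vectors only, no labels
rl_trajectory = [("s0", "right", 0.0),                    # (state, action, reward) triples
                 ("s1", "up", 0.0),                       # collected by interacting with
                 ("s2", "up", 1.0)]                       # the environment
```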
Sequential decision-making problems
- I do not have to take a single decision, but a sequence of decisions
- like game play
- moves form a sequence of decisions
- that is what reinforcement learning is all about
Noisy movement
- say I am in a grid
- can move in some directions
- I assign probabilities of moving in each direction
- the low-probability, unintended directions are the noisy movement (sketched below)
- sequential decision making is modelled as a Markov decision process
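A minimal sketch of noisy movement, assuming the classic gridworld split of 0.8 for the intended direction and 0.1 for each perpendicular slip (the exact numbers are an assumption, not from the lecture):

```python
import random

# Map each intended direction to its two perpendicular "slip" directions.
PERPENDICULAR = {
    "up": ("left", "right"), "down": ("left", "right"),
    "left": ("up", "down"), "right": ("up", "down"),
}

def noisy_move(intended_direction):
    """Return the direction actually moved: intended with prob 0.8,
    each perpendicular direction with prob 0.1 (assumed split)."""
    r = random.random()
    if r < 0.8:
        return intended_direction
    side_a, side_b = PERPENDICULAR[intended_direction]
    return side_a if r < 0.9 else side_b
```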
MDP
- Set of states, S
- set of actions, A
- transition probability Pss'(a)
- I am in state s, take action a, move to s': what is the probability of this
- reward Rss'(a)
- discount factor (γ)
- e.g. a grid with 9 cells
- bot could be in any one
- so 9 states
- Pss'(a) gives the movement probabilities between cells
- discount factor
- a dollar today
- its value goes down with time
- this is the concept of discounting
- a dollar today is better than a dollar tomorrow
- policy
- what action you should take when you are in state s
- e.g. if I am in the top 3 cells, I will move right; in the remaining cells, move up
- this is a policy
- whether it is good or bad, I don't know yet (a toy MDP with a policy is sketched below)
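Putting the elements together, here is a toy sketch of an MDP as plain Python dictionaries; a 3-state chain with made-up numbers stands in for the 9-cell grid:

```python
states = ["s0", "s1", "s2"]
actions = ["left", "right"]
gamma = 0.9  # discount factor: a reward tomorrow is worth less than one today

# Transition probabilities Pss'(a), stored as P[s][a] = {s': prob}.
# "right" succeeds with prob 0.8 and slips (stays put) with prob 0.2.
P = {
    "s0": {"right": {"s1": 0.8, "s0": 0.2}, "left": {"s0": 1.0}},
    "s1": {"right": {"s2": 0.8, "s1": 0.2}, "left": {"s0": 1.0}},
    "s2": {"right": {"s2": 1.0}, "left": {"s1": 1.0}},
}

# Rewards Rss'(a), stored as R[s][a] = {s': reward}: only the
# transition into the goal state s2 pays off.
R = {
    "s0": {"right": {"s1": 0.0, "s0": 0.0}, "left": {"s0": 0.0}},
    "s1": {"right": {"s2": 1.0, "s1": 0.0}, "left": {"s0": 0.0}},
    "s2": {"right": {"s2": 0.0}, "left": {"s1": 0.0}},
}

# A policy: which action to take in each state.
policy = {"s0": "right", "s1": "right", "s2": "right"}
```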
Assumptions of MDP
- fully observable domain
- although you want to move in one direction (high probability), you may move in some other; you know that this happened, so it is fully observable
- Markovian property
- history is not important
- I don't care how you reached a state; what is important is what you do next (written out below)
- time of transition
- time taken to decide/transition is fixed
- for SMDP, it is not fixed
- use an MDP for rapid chess (fixed time to take a move), an SMDP for normal chess
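Written out, the Markov property says the next state depends only on the current state and action, not on the history:

```latex
P(s_{t+1} = s' \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0) = P(s_{t+1} = s' \mid s_t, a_t)
```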
Examples of RL
- robotics
- how should a robot behave to optimize its performance
- control theory
- how to automate the motion of a helicopter
MDP - contd
Elements of MDP
- state
- parameters describing the system
- e.g.: coordinates of a robot moving in a room
- actions
- which direction the robot moves
- transition prob
- probability of going from state s to s' under the influence of action a
- 3 states and 2 actions => a 3x3 matrix (9 probabilities) per action, 18 in total
- immediate rewards
- positive/negative reward when the system makes a transition
- policy
- actions to be chosen
- Value function
- the value of a state or state-action pair is the total expected reward starting from that state (policy evaluation is sketched below)
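To make "total expected reward" concrete, here is a hedged sketch of iterative policy evaluation, reusing states, P, R, gamma, and policy from the toy MDP sketched earlier; it repeatedly applies the backup V(s) ← Σ_s' Pss'(a) [Rss'(a) + γ V(s')] with a = policy(s):

```python
def evaluate_policy(states, P, R, policy, gamma, iterations=100):
    """Estimate V(s): expected discounted total reward from s under `policy`."""
    V = {s: 0.0 for s in states}
    for _ in range(iterations):
        for s in states:
            a = policy[s]
            # Bellman expectation backup: average over possible next states.
            V[s] = sum(prob * (R[s][a][s2] + gamma * V[s2])
                       for s2, prob in P[s][a].items())
    return V

# On the toy chain above, V("s1") > V("s0"): s1 is one step closer
# to the rewarding transition into s2, and the discount γ makes
# nearer rewards worth more.
```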
MDP and Reinforcement
- RL is a much more generic problem
- in RL, P and R are not known
- P and R can also change
Reinforcement Learning
Exploration vs Exploitation
- epsilon-greedy and softmax are exploration strategies (sketched below)
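Hedged sketches of the two exploration strategies named above; the Q-values passed in are assumed to come from some learner (e.g. Q-learning), and both functions return the index of the chosen action:

```python
import math
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Explore with probability epsilon (random action), else exploit (greedy)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def softmax_action(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature).
    High temperature -> near-uniform (more exploration); low -> near-greedy."""
    prefs = [math.exp(q / temperature) for q in q_values]
    return random.choices(range(len(q_values)), weights=prefs, k=1)[0]
```

Epsilon-greedy explores all actions uniformly, while softmax biases exploration toward actions that already look promising; which one works better is problem-dependent.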