Semester7
Notes from courses done/attended in semester 7 of college
Lecture 26
State and action space

- Simple agent
- always makes the same move regardless of what the opponent does
- policy: choose the lowest-numbered line that can still be drawn (a minimal sketch of this opponent is given below)

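A minimal sketch of such an opponent, assuming the 24 possible lines are simply numbered 0-23 and the state records which of them are already drawn (the numbering scheme is an assumption for illustration):

```python
def simple_agent_move(drawn_lines):
    """Baseline opponent: always draw the lowest-numbered line that is still free.

    `drawn_lines` is assumed to be a set of line indices (0-23) already on the board.
    """
    for line in range(24):          # 24 possible lines on a 16-dot board
        if line not in drawn_lines:
            return line
    return None                     # board is full, no move left
```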
- purpose of this agent is to give the Q-learning algo an easy and predictable opponent to play against
- state space size
- each line is distinct and is either drawn or not drawn
- 16 dots, 24 lines
- #states = 2^24
- the game can be in any one of these states
- state-action pairs
- mean #lines possible to draw in each state = 12
- #state-action pairs = 2^24 * 12 ≈ 200 million (see the sketch after this list)
- reward
- positive or negative reward only when the game ends, and also when a box is completed or lost
- reward for winning must be far greater than the reward for completing a box
- likewise, the punishment for losing the game needs to be more severe than that for losing a box
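A rough sketch of the counts above plus one possible reward scheme; the exact reward magnitudes are illustrative assumptions, only their relative sizes (win far above a completed box, loss far below a lost box) follow from the notes:

```python
NUM_LINES = 24                        # a 16-dot board has 24 drawable lines
num_states = 2 ** NUM_LINES           # each line drawn or not -> 16,777,216 states
avg_actions_per_state = 12            # mean number of lines still drawable in a state
num_state_action_pairs = num_states * avg_actions_per_state   # ~201 million

# Terminal rewards must dominate intermediate (box) rewards.
# The magnitudes below are made-up examples of that principle.
REWARD_WIN = +100
REWARD_LOSE = -100
REWARD_BOX_COMPLETED = +5
REWARD_BOX_LOST = -5

print(f"states: {num_states:,}, state-action pairs: {num_state_action_pairs:,}")
```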
Strategies for action selection
- greedy approach
- epsilon-greedy
- this is where the exploration part comes in
- pick the greedy action most of the time, a random action otherwise
- softmax
- mostly pick the max-value action, but you pick the other actions with probability proportional to their values (both are sketched below)
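Hedged sketches of the three selection strategies over a vector of estimated action values; the Q-values, epsilon, and temperature below are made-up numbers for illustration:

```python
import math
import random

def greedy(q_values):
    """Always pick the action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick the greedy action most of the time; with probability epsilon, explore randomly."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return greedy(q_values)

def softmax(q_values, temperature=1.0):
    """Sample an action with probability proportional to exp(Q / temperature):
    mostly the max, but the others are picked in proportion to their values."""
    prefs = [math.exp(q / temperature) for q in q_values]
    total = sum(prefs)
    return random.choices(range(len(q_values)), weights=[p / total for p in prefs], k=1)[0]

# example: three actions with illustrative value estimates
q = [1.0, 2.5, 0.3]
print(greedy(q), epsilon_greedy(q), softmax(q))
```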
Reinforcement Learning


- sequence of actions == sequence of decisions taken

Getting Lost

- RL can help in this situation

- the policy is: eat chips only when you take an action that leads to BnB
- bad policy?
- it depends on how you look at the risk
- if I eat, I will be sleepy
- if I do not, then I will have no strength

- actions, rewards?
- do not stay still (reward of -5, i.e. you pinch yourself if you stay put because you are sleepy)
- F is called the absorbing state (reward yourself by eating all the chips)
- moving between 2 squares could be good as it might be taking you closer to F, but you wouldn't know unless you refer to the map
- so you decide to give a reward only after reaching F
- all other actions are neutral
- when there is no direct road between 2 squares - no reward, as it is not a viable action
- reward matrix
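A hypothetical reward matrix for a small version of this map; the squares A-F and which pairs are connected by a road are assumed for illustration, not taken from the actual map in the lecture:

```python
# Rows = current square, columns = next square, ordered A, B, C, D, E, F.
# None  -> no direct road between the two squares (not a viable action)
# -5    -> the "pinch yourself" penalty for staying still
#  0    -> a neutral move
#  100  -> reaching the absorbing state F (eat all the chips)
STATES = ["A", "B", "C", "D", "E", "F"]

R = [
    #  A     B     C     D     E     F
    [ -5,    0,  None, None,   0,  None],   # from A
    [  0,   -5,    0,  None, None, None],   # from B
    [None,   0,   -5,    0,  None,  100],   # from C
    [None, None,   0,   -5,    0,   100],   # from D
    [  0,  None, None,   0,   -5,  None],   # from E
    [None, None, None, None, None,    0],   # from F (absorbing: the episode ends here)
]
```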

- As a reinforcement learner, you don't know the reward matrix
- this is what you are trying to discover
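A minimal Q-learning loop over such a map, to show how the values can be discovered purely from observed rewards rather than from being handed R up front; it reuses the assumed `R` and goal square F from the sketch above, and the learning rate, discount, and episode count are illustrative settings:

```python
import random

def q_learning(R, episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, goal=5):
    """Learn Q(state, action) by trial and error; the environment reveals R[s][a]
    only when that move is actually taken, never the whole matrix."""
    n = len(R)
    Q = [[0.0] * n for _ in range(n)]
    for _ in range(episodes):
        s = random.randrange(n)                          # start in a random square
        while s != goal:                                 # run until the absorbing state F
            valid = [a for a in range(n) if R[s][a] is not None]
            if random.random() < epsilon:                # epsilon-greedy exploration
                a = random.choice(valid)
            else:
                a = max(valid, key=lambda x: Q[s][x])
            reward = R[s][a]                             # observed only after acting
            next_valid = [x for x in range(n) if R[a][x] is not None]
            best_next = max(Q[a][x] for x in next_valid) # greedy estimate from the next square
            Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])
            s = a
    return Q

Q = q_learning(R)
print([round(v, 1) for v in Q[0]])                       # learned values for moves out of A
```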
