Semester7

Notes of courses done/attended in semester 7 in college

any device works in normal operation, fails, gets repaired, then normal operation starts again
this is the operate-repair cycle

ras

avail

in both planned shutdowns and unplanned failures, the system might be offline
planned ones are not counted in recovery, only failures are
transient = just restarting a service or a rollback is sufficient
permanent = during the failure I cannot use the system; some important component has failed
partial failures are preferred over total failures
total failure happens when there is a single point of failure (SPOF)
so our goal is to remove SPOF
to increase availability
- increase MTTF
- or, decrease MTTR
  - easier than above
  - MTTF comes predefined with the device
  - MTTR? with a SPOF, recovery takes time (call the maintenance person, they check, report the fault, etc.)
    - instead, if we have 2 systems running in parallel and one fails, the other takes over, MTTR = 0
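
The two levers can be sketched with the standard steady-state formula availability = MTTF / (MTTF + MTTR). The hour values below are made up for illustration and do not come from the course.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTTF / (MTTF + MTTR)."""
    return mttf_hours / (mttf_hours + mttr_hours)

# Hypothetical numbers, for illustration only.
base = availability(1000, 10)             # device with a 10 h repair time
longer_mttf = availability(2000, 10)      # lever 1: double the MTTF
shorter_mttr = availability(1000, 1)      # lever 2: cut the MTTR to 1 h
instant_failover = availability(1000, 0)  # parallel system takes over: MTTR = 0

print(f"{base:.4f} {longer_mttf:.4f} {shorter_mttr:.4f} {instant_failover:.4f}")
```

With these particular numbers, cutting MTTR helps more than doubling MTTF, and MTTR = 0 gives availability 1.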

in the first, shared memory and the bus are both SPOFs
shared memory is generally built from multiple h/w units, so the bus is the main one that matters
workstations connected to Ethernet
SPOF = Ethernet
in the third
- we have 2 networks in parallel
- assuming each workstation provides a different service, if one node is down we cannot use the applications and data present on it
in the fourth
- shared RAID
  - shared file system = a disk structure
  - each node can access the other's
- so even if one node fails, I can use the second; data can be accessed using logs, checkpoints, etc.

a workstation is available 99% of the time
but if one fails, the other is accessible
how to get the availability of the whole cluster
in series, if there are 2 devices both having avail = 0.99, then total avail = 0.99*0.99 (because if even one fails, the whole system fails)
if in parallel
- failure happens only when both fail
- so (1-0.99)*(1-0.99)
- this is the failure prob, so avail = 1-0.0001 = 0.9999
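
The same two rules, written for any number of components. This is a minimal sketch that assumes failures are independent; the function names are my own.

```python
from math import prod

def serial_availability(avails):
    """All components must be up, so multiply the availabilities."""
    return prod(avails)

def parallel_availability(avails):
    """The system is down only if every component is down."""
    return 1 - prod(1 - a for a in avails)

two_nodes = [0.99, 0.99]
print(round(serial_availability(two_nodes), 4))    # 0.9801
print(round(parallel_availability(two_nodes), 4))  # 0.9999
```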

spa

so in series total avail goes down; if done in parallel (there is a backup, on standby), avail increases
in the last question
- 52 weeks, so 52 hours not available in a year out of (365*24) hours; also not available 0.0001 percent of the time
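
A rough check of the planned-downtime part, assuming (as the 52 hours suggests) one hour of planned downtime per week. How this combines with the 0.0001 figure is not spelled out in these notes, so the sketch only computes the planned part.

```python
HOURS_PER_YEAR = 365 * 24        # 8760
planned_down_hours = 52 * 1      # assumed: 1 hour per week for 52 weeks

planned_unavailability = planned_down_hours / HOURS_PER_YEAR
planned_availability = 1 - planned_unavailability

print(f"{planned_unavailability:.4f}")  # 0.0059
print(f"{planned_availability:.4f}")    # 0.9941
```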

fpo

load balancer can still redirect requests to the new server
access to a shared disk is important, because if every node stores its data locally there is trouble as soon as one fails (so keep it on RAID, SAN, etc.)
set of processes
- process migration is tough to achieve
- because a lot of what relates to a process lives in OS space
- one way is checkpointing
  - store all related details in a file and move the file to another place
- second way is to make it stateless
  - the process does not remember state
  - the request maintains the state etc.
  - this is easy to implement with VMs
  - because they are independent of the OS
service group
- anything related to a particular service that has to be migrated
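
A toy sketch of the checkpointing idea, at application level only: the state is written to a file, the file is moved, and a new instance resumes from it. Real process migration also has to capture the OS-side state mentioned above, which this does not attempt. `Counter`, `checkpoint`, and `restore` are hypothetical names, and two temp directories stand in for two nodes.

```python
import json
import os
import shutil
import tempfile

class Counter:
    """Stand-in for a process whose only state is a running count."""
    def __init__(self, count=0):
        self.count = count

    def step(self):
        self.count += 1

def checkpoint(proc, path):
    """Write the process state to a file."""
    with open(path, "w") as f:
        json.dump({"count": proc.count}, f)

def restore(path):
    """Rebuild the process from a checkpoint file."""
    with open(path) as f:
        return Counter(**json.load(f))

# Two temp directories play the roles of node 1 and node 2.
node1, node2 = tempfile.mkdtemp(), tempfile.mkdtemp()

proc = Counter()
for _ in range(3):
    proc.step()

src = os.path.join(node1, "counter.ckpt")
checkpoint(proc, src)
dst = shutil.move(src, os.path.join(node2, "counter.ckpt"))  # move the file elsewhere

resumed = restore(dst)
resumed.step()
print(resumed.count)  # 4
```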

for

dual n/w connectivity is needed; if one fails, switch to the other
a set of disks shared by both, plus some local disks
portability
- a service running on server 1 has to move to server 2; a different library version, new OS version, etc. must not cause trouble; the executable program must be compatible
this leaves no SPOF

possible cluster configurations
active-passive (hot standby, already ready)
- primary and standby (active and passive)
- only the primary accepts requests
- the secondary's data is shared with the primary
- the secondary monitors the primary node; if it fails, the secondary becomes the main node
the passive node is less likely to fail since it is not doing anything
but that is also why it is expensive
so run only critical applications that cannot tolerate any downtime in this mode
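
The monitoring and takeover can be sketched as a small simulation. This is a toy model with hypothetical names (`Node`, `ActivePassiveCluster`, `tick`), not any real cluster manager: the standby checks the primary on each tick and promotes itself when the primary is down.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True

class ActivePassiveCluster:
    def __init__(self, primary, standby):
        self.primary = primary
        self.standby = standby

    def tick(self):
        """Standby checks the primary; on failure the roles are swapped."""
        if not self.primary.alive and self.standby.alive:
            self.primary, self.standby = self.standby, self.primary

    def handle(self, request):
        """Only the current primary accepts requests."""
        if not self.primary.alive:
            raise RuntimeError("no live primary")
        return f"{self.primary.name} served {request}"

cluster = ActivePassiveCluster(Node("A"), Node("B"))
print(cluster.handle("r1"))    # A served r1
cluster.primary.alive = False  # primary fails
cluster.tick()                 # standby notices and becomes the main node
print(cluster.handle("r2"))    # B served r2
```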

active-active
- both are receiving requests
- they both monitor each other
- if one fails, the other handles the load that was coming to the failed node

concept of failback
- in failover there was migration
- in active-active, when the failed node recovers, applications fail back to the original node, since both are active
active-active has manageable cost
but there is a performance impact when a single node handles the whole load
and the chance of the surviving node failing increases when one fails
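
Failover followed by failback, as a toy model. The class and the service and node names are hypothetical; each service has a home node, moves to the survivor on failure, and returns home on recovery.

```python
class ActiveActive:
    """Toy model: each service has a home node; both nodes serve requests."""
    def __init__(self, homes):
        self.homes = dict(homes)      # service -> home node
        self.placement = dict(homes)  # service -> node currently running it
        self.up = set(homes.values())

    def fail(self, node):
        """Failover: move the failed node's services to a surviving node."""
        self.up.discard(node)
        survivor = next(iter(self.up))  # raises StopIteration if no node is left
        for svc, where in self.placement.items():
            if where == node:
                self.placement[svc] = survivor

    def recover(self, node):
        """Failback: services return to their original (home) node."""
        self.up.add(node)
        for svc, home in self.homes.items():
            if home == node:
                self.placement[svc] = node

    def load(self, node):
        """Number of services currently running on a node."""
        return sum(1 for where in self.placement.values() if where == node)

c = ActiveActive({"web": "n1", "db": "n2"})
c.fail("n1")
print(c.placement, c.load("n2"))  # both services on n2, load 2
c.recover("n1")
print(c.placement)                # back to the original layout
```

The doubled load on `n2` between `fail` and `recover` is the performance impact noted above.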

mnp

now this is the parallel case
when will it fail?
- switch fails
- storage fails
- all the nodes fail
so, availability = 1*(200/205)*(1-(1-0.99)^4)
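
Evaluating the expression, with the factors read as: switch availability 1, shared storage 200/205, and 4 nodes in parallel at 0.99 each. Treating 200/205 as MTTF / (MTTF + MTTR) with MTTF = 200 and MTTR = 5 is an assumed reading of the fraction, not something stated in these notes.

```python
switch = 1.0
storage = 200 / (200 + 5)    # assumed: MTTF = 200, MTTR = 5
nodes = 1 - (1 - 0.99) ** 4  # down only if all 4 nodes are down

cluster = switch * storage * nodes
print(f"{cluster:.4f}")  # 0.9756
```

The storage term dominates: the four parallel nodes are almost never all down, so the single shared storage unit sets the overall availability.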