Semester7

Notes of courses done/attended in semester 7 in college

In distributed system, we have nodes connected over a network
a program running on single node
we have some challenges
- total failure
- partial failure
in single node system, program either fails or succeed (reaches expected point or not)
in distributed system, it may have more than 2 states
- one process down does not mean system will go down
- partial failures are diff to detect, since system is not down and seem to be working

when node fails, 3 common system models for nodes
crash-stop faults
- node was working, at some point it fails, after that it is not part of system., it is done
crash-recovery faults
- node failed at time t1, it might come back later at t2, tho t2-t2 time diff is not known
- challenges if node comes back, state uipdate ho chuki hai
Byzantine (arbitrary) faults
- node might to anything, including trying to trick and deceive other nodes
algos should take care of which failure they are modeling, and take care

any execution happening on distributed system should follow these
safety
- nothing bad happens
- every state within node is consistent with state other nodes have
liveness
- something does happens
- algo should come to a conclusion
- deadlock type ki state na aa jaye

Consistency, Availability , and Partition tolerance
single user agar hai db me, no worry about concurrency issues
multiple users => concurrency control needed
db systems ensure these ACID properties
- atomicity
  - txn may have multiple read/writes
  - prop states that either txn happens completely or nothing happens
- Consistency
  - data b4 txn and after txn are consistent
- Isolation
  - temporary state of a txn (inconsistency) is not visible to others
- Durability
  - whatever change is done should be durable even if system is crashed, it must remain
to ensure these properties
- single node db me koi bt nahi, sab kuch ek jagah hai
- but if data is partitioned across multiple systems, it might delay (overhead int erms of time and availability)
CAP thm says that out of consistency, availability, and partition tolerance, only 2 can be provided at a time

single node me partitioning hi nahi hogi toh C,A dono mil gaye
agar atleast 2 node hai
- partitioning possible
- so along with partitioning, I can only get consistency or availability
Consistency?
- changes should be consistent
- if db is replicated, change in one must reflect in others as well
- even if it is partitioned (darta partitioned into disjoint subsets and distributed)
  - constraints vagairah ensure karna, tough
  - availability affect hogi (mtlb timely nahi milega bande ko reply)
  - so tradeoff b/w consistency and availability
Availability + Consistency
- either single system or some protocol in place
Availability + Partition tolerance
- eventual consistency
- system maybe inconsistent at some pt of time, but eventually it will get corrected
- user may read inconsistent at some time
- dynamoDB is one eg
- there are mechanism where we can know if it is consistent or not
Consistency + Partition tolerance
- any copy of data picked will be consistent
MongoDB vagairah
- partition database in other systems, so that any txn requires data only for a particular system
- so agla involve hi nahi toh koi bt nahi consistency ki
- sharding karta na mongodb

basically available
- system does guarantee availkability
soft state
- whatever state we see at one pt of time may change at any pt of time without any external input
  - normal db me w/o input it fdoes not change
  - router me hoti hai soft state => has some time after which it changes, hard state => which remains constant
eventual consistency
- system will become consistent over time