20200423

@ShiKaiWi

Plan

60min: GFS

Notes

Why hard?

  • performance => sharding
  • faults => tolerance by replication
  • inconsistency because of replication
  • low performance because of consistency

What?

Consitency

strong consistency ensures client interactes with cluster(many machines) just like with single matchine.

Replication

  • double write: update disorder(inconsistency).

How is GFS built?

Features

  • Automatic Recovery
  • Single data center
  • Storing Large file
  • For sequential access
  • no gurantee for correct data

Archtecture

  • Single master keeps meta data for all chunks located in chunk servers.
  • Multiple chunk servers take responsibility for storing chunks whose size is 64MB based on linux filesystem.

Read/Write procedure

check these procedure in paper or the course vedio.

Shortcomings

  • single master => no ha of master
  • no strong onsistency gurantee

Interesting Q&A

(check answers in next note)

  1. why do all replica need a primary?
  2. why to keep primary role by leasing?
  3. why to use heartbeat to grant a lease of primary?

Conclusion

(check them in next note)

More