20200425

@ShiKaiWi 2020-04-25T00:00:00

Plan

60min: Primary-Backup Replication 30min: Finish notes of previous post

Notes

Primary-Backup Replication

Fault tolerant for fail stop not for bugs.

Two schemas

State transfer: transfer the whole state of primary to bacukup.
Replicated State Matchine: P/B are a state machine so replication of the whole state machine equals the replication of input into the state machine.

Important Problems about P/B Replication

Synchronization speed between the primary and the backup.
How to switch(cut over) to backup when primary fails down?
What are the anommalies in cut-over?
How to create a new replica after the backup has taken over the primary?

Interesting Q&A

why do all replica need a primary?

Short: The primary decides the mutation order so that keeps the consistency between all relicas.

Details: Say there are two updates(Set 1 & Set 2) on the variable X is sent concurrently to all replicas(no primary) and then client tries to fetch the value of X from replica0 and from replica1 different values(1 or 2) may be returned so inconsistency occurs. In GFS, all updates are sent to primary replica and the primary decides the order of the multiple updates which is sent to and executed by the other replicas so that the consistency between all replicas are achieved.

why to use heartbeat to grant a lease of primary?

Heartbeat is useful technique in distributed system: client is granted a lease with deadline from master and to keep the lease valid the client must be granted again before the lease expires(this is heartbeat). That is to say, client and master will both know the event of lease expiring if network partition occurs between client and master and eventually no multiple primary roles exist(no brain split).

why does any chunk has ben tagged with a version?

Say a primary chunkserver fails down and goes back again as a non-primary replica after a while during which a new primary has been elected by master and some updates has occured on the relavant replicas. You know the updates is missing on the ex-primary replica so relavant chunks are stale and GFS must not return the stale chunks to querying client. And the version number of chunk solves this issue. And updates on a chunk will increase the version number associated with the chunk and the version number is recorded by master which means the ex-primary replica reports its chunks(with their version) and master can detect some chunks are stale(the old version number is smaller than the lastest version number).

Conclusion

No more.