Distributed systems
Fall 2008
Exercise 6
Tanenbaum 6,7
Topic: Fault tolerance, two-phase commit

1. a) What kinds of failures can you expect
      - in a Local Area Network
      - in the UDP transport service
      - on the air link in GPRS?
   b) How can you detect such failures and how can you mask them? Is it possible that the masking process leads to new failures?
   c) What is the use of establishing failure classifications? Use RPC implementations as examples to clarify your points.

2. Fig. 7-16 describes the behavior of an implementation of virtual synchrony.
   a) "Prove" the correctness of the algorithm. In particular, describe what happens when, during a multicast operation,
      - a new node joins the group
      - a receiving node disappears
      - the sending node crashes.
   b) In Chapter 5 we had an algorithm which implemented a "distributed snapshot" (Fig. 5-10). In what way (if any) are these algorithms related?

3. a) Describe the behavior of the two-phase commit protocol in cases where
      - the coordinator suffers a (temporary) crash,
      - the participant suffers a (temporary) crash.
      For what purposes are the different states needed? The algorithm presented in Tanenbaum's book does not contain any "haveCommitted" message. Is this message necessary at all?
   b) Using unreliable e-mail, Alice and Bob tried, in vain, to agree upon meeting in a lunch restaurant. Why does the two-phase commit algorithm work in a satisfactory way when Alice and Bob cannot agree upon a lunch? Can you think of circumstances under which two-phase commit would work even in the "Alice and Bob" case? If Alice and Bob had used an ordinary old-fashioned telephone, there would not have been any problems. Why not? (Hint: the reason is NOT the slightly longer delay in message passing!)
      (For the story of Alice and Bob, see the slides or the article: Turek, J., Shasha, D., The many faces of Consensus in Distributed Systems; Computer 25,6 pp 8-17; June 1992.)

4. In a system based on the primary-backup architecture, how could you implement
   a) sequential consistency
   b) causal consistency?
   The user-visible response time will be shorter if the updates are implemented as asynchronous background operations (lazy updates). Do lazy updates create problems with respect to fulfilling the consistency requirements? Your answer should be sufficiently precise. Explain why your solution works properly.
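
For experimenting with question 4, the following is a minimal sketch of a primary with a single backup, assuming an in-memory key-value store and no failures. The names Primary, Backup, flush and demo are hypothetical and do not come from Tanenbaum's book. The sketch only shows the difference between propagating an update before acknowledging it and propagating it lazily in the background. It is not a solution to a) or b).

```python
from collections import deque


class Backup:
    """Hypothetical backup replica: applies updates sent by the primary."""

    def __init__(self):
        self.store = {}

    def apply(self, key, value):
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)


class Primary:
    """Hypothetical primary: all writes go through it (assumption of this sketch)."""

    def __init__(self, backup, lazy):
        self.store = {}
        self.backup = backup
        self.lazy = lazy
        self.pending = deque()  # updates not yet sent to the backup

    def write(self, key, value):
        self.store[key] = value
        if self.lazy:
            # Acknowledge immediately, propagate later in the background.
            self.pending.append((key, value))
        else:
            # Propagate to the backup before acknowledging the write.
            self.backup.apply(key, value)

    def flush(self):
        # Background propagation, in the order the primary applied the writes.
        while self.pending:
            self.backup.apply(*self.pending.popleft())


def demo(lazy):
    """Return what a client reading from the backup sees before and after flush()."""
    backup = Backup()
    primary = Primary(backup, lazy)
    primary.write("x", 1)
    before = backup.read("x")
    primary.flush()
    after = backup.read("x")
    return before, after
```

Calling demo(True) and demo(False) and comparing the two "before" values gives a concrete case to reason about when deciding which consistency model a client reading from the backup can still rely on.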