Distributed systems
Fall 2008
Exercise 6

Tanenbaum 6,7

Topic: Fault tolerance, two-phase commit

1. a) What kinds of failures can you expect
        - in a Local Area Network,
        - in the UDP transport service,
        - on the air link in GPRS?
   b) How can you detect such failures and how can you mask them? Is it
      possible that the masking process leads to new failures?
   c) What is the use of establishing failure classifications? Use RPC
      implementations as examples to clarify your points.


2. Fig. 7-16 describes the behavior of an implementation of virtual synchrony.
   a) "Prove" the correctness of the algorithm. In particular, describe what
       happens when, during a multicast operation,
       - a new node joins the group,
       - a receiving node disappears,
       - the sending node crashes.
   b) In Chapter 5 we had an algorithm that implemented a "distributed
      snapshot" (Fig. 5-10). In what way (if any) are these algorithms
      related?


3. a) Describe the behavior of the two-phase commit protocol in cases where
        - the coordinator suffers a (temporary) crash,
        - the participant suffers a (temporary) crash.
      For what purposes are the different states needed?
      The algorithm presented in Tanenbaum's book does not contain any
      "haveCommitted" message. Is this message necessary at all?
   b) Using unreliable e-mail, Alice and Bob tried, in vain, to agree on
      meeting at a lunch restaurant. Why does the two-phase commit algorithm
      work in a satisfactory way when Alice and Bob cannot agree on a lunch?
      Can you think of circumstances under which two-phase commit would work
      even in the "Alice and Bob" case?
      If Alice and Bob had used an ordinary old-fashioned telephone, there
      would not have been any problems. Why not? (Hint: the reason is
      NOT the slightly longer delay in message passing!)
      (For the story of Alice and Bob, see the slides or the article:
      Turek, J., Shasha, D., The Many Faces of Consensus in Distributed
      Systems; Computer 25(6), pp. 8-17; June 1992.)
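
As a reminder of the message flow that exercise 3 refers to, the failure-free
case of two-phase commit can be sketched roughly as follows. This is only an
illustrative sketch: the class names, method names and message strings are
assumptions made for this example, not code from the book. Timeouts, logging
to stable storage and crash recovery are deliberately left out, since those
are what the exercise asks about.

```python
from enum import Enum


class State(Enum):
    INIT = "INIT"
    WAIT = "WAIT"      # coordinator: vote requests sent, collecting votes
    READY = "READY"    # participant: voted commit, awaiting the decision
    COMMIT = "COMMIT"
    ABORT = "ABORT"


class Participant:
    """Hypothetical participant; `will_commit` stands in for its local decision."""

    def __init__(self, name, will_commit=True):
        self.name = name
        self.will_commit = will_commit
        self.state = State.INIT

    def on_vote_request(self):
        # Phase 1: answer the coordinator's vote request.
        if self.will_commit:
            self.state = State.READY
            return "VOTE_COMMIT"
        self.state = State.ABORT
        return "VOTE_ABORT"

    def on_decision(self, decision):
        # Phase 2: follow the coordinator's global decision.
        if decision == "GLOBAL_COMMIT":
            self.state = State.COMMIT
        else:
            self.state = State.ABORT


class Coordinator:
    """Hypothetical coordinator; assumes every message is delivered."""

    def __init__(self, participants):
        self.participants = participants
        self.state = State.INIT

    def run(self):
        # Phase 1: request and collect votes.
        self.state = State.WAIT
        votes = [p.on_vote_request() for p in self.participants]
        # Phase 2: commit only if every participant voted to commit.
        if all(v == "VOTE_COMMIT" for v in votes):
            decision = "GLOBAL_COMMIT"
            self.state = State.COMMIT
        else:
            decision = "GLOBAL_ABORT"
            self.state = State.ABORT
        for p in self.participants:
            p.on_decision(decision)
        return decision
```

Calling Coordinator([Participant("p1"), Participant("p2")]).run() walks
through both phases; giving one participant will_commit=False makes the
coordinator decide on a global abort.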


4. How could you implement, in a system based on the primary-backup
   architecture,
     a) sequential consistency
     b) causal consistency?
   The user-visible response time will be shorter if the updates are
   implemented as asynchronous background operations (lazy updates). Do lazy
   updates create problems with respect to fulfilling the consistency
   requirements?

   Your answer should be sufficiently precise. Explain why your solution
   works properly.