Goals and challenges of distribution |
- Operating systems and data communications:
  - basic concepts
  - central OS and comm. services
- Remote communication (RPC, RMI), message passing
- (Courses: Concurrent Programming, Introduction to Data Communications, Operating Systems)
|
- Can present, based on case descriptions, the basics, goals and challenges of distribution.
- Can describe and give examples of the concepts of transparency, heterogeneity, openness, scalability and consistency.
- Can give a justified definition for the term "distributed system".
|
- Can give a justified description of the effect that concurrency, partitioning and replication have on the availability, performance and scalability of a service.
- Can qualitatively estimate the benefits and disadvantages of distributing an application.
- Can explain the meaning of distributed time, concurrency and non-determinism.
|
- Can evaluate the performance and scalability of different distribution solutions by using load and queuing models.
- Can evaluate the reliability and fault tolerance of different solutions.
- Is aware of security-related threats.
|
Structure of a distributed system |
Courses: Operating Systems and Introduction to Data Communications |
- Can characterize the common goals of network operating systems, distributed operating systems and middleware, and describe their differences.
- Can describe vertical distribution.
|
- Can describe those hardware architectures, operating system and middleware level software architectures, and application level architecture models that are central to distribution.
- Can justify the interdependencies of different solutions and their effects on the application level.
|
Can evaluate the costs and benefits of different solutions under different system loads. |
Coordination of system and communications
Distributed decision making
|
- The principles, unreliability and inexactness of messaging services (Courses: Concurrent Programming, Introduction to Data Communications)
- Implementation principles of inter-process communication, IPC (Course: Operating Systems)
- The concept of a transaction
|
- Can explain clock synchronization, why logical scalar and vector clocks are needed, and how their algorithms work.
- Can describe the basic concepts and at least one algorithm for each of the following topics:
  - multicast and its implementation
  - defining global state
  - distributed decision making, mutual exclusion and election, transaction handling
- Can explain transaction serializability and how to implement it by using locks.
|
- Can implement as an algorithm the total and causal ordering of events, distributed snapshot, mutual exclusion and decision making.
- Can justify why the algorithms work and what is required for them to work, and evaluate the effectiveness of the algorithms in different environments.
- Can describe the operation of a distributed transaction and how to implement serializability by using locks and timestamps.
|
- Is familiar with recent literature in the field.
- Can apply formal methods to prove the correctness of algorithms related to the topic.
- Is aware of more advanced methods for implementing transaction serializability in a distributed environment.
|
Management of replicas |
The structural basic solutions of the DNS and www services on the Internet (Course: Introduction to Data Communications) |
- Knows why objects are replicated and how replicated objects are implemented.
- Can explain how consistency problems emerge, describe the differences between user-centric and data-centric models, and pick the suitable model for a given need.
- Can describe the most important approaches to implementing replicas and updating them.
- Is aware of epidemic and quorum-based updating methods.
|
- Can describe the core features of different consistency models on a conceptual level.
- Can choose a consistency model that fits a specific purpose, and apply it in a meaningful way.
- Can describe both epidemic protocols and quorum-based replica management.
|
- Can design and implement a basic solution for replica management in a fixed network.
- Can design and implement replica management based on epidemic and quorum protocols.
- Is aware of recent literature in the field.
|
Fault tolerance methods |
Error detection and recovery methods in data storage and data communication, based on parity and repetition. (Courses: Computer Organization and Introduction to Data Communications) |
- Can describe failure models, explain their differences and uses, and describe the central implementation methods for fault tolerance.
- Can explain how an agreement can be formed between unreliable/faulty actors.
- Can describe how reliable multicast can be implemented in a dynamically changing group.
|
- Can explain the interdependencies and interplay of different failures and their handling mechanisms.
- Can justify the correct operation of the reliable multicast algorithm.
- Can justify how two-phase commit works in different failure situations.
- Knows different ways to produce a checkpoint (for recovery) and can justify the correct operation of each method.
|
- Is aware of the state of the art in the literature on each of the topics above.
- Can apply probability calculus to estimate fault tolerance.
|