Exam Information

Course literature

Distributed Systems - Concepts and Design, 4th edition by Coulouris, Dollimore and Kindberg (published by Addison-Wesley).
Chapters 1, 2, 4.3-4.5, 5, 8, 10, 11.1-11.4, 12.1-12.5, 15 (excluding 15.5), 19 (excluding 19.7).
The Jini(TM) Technology Core Platform Specification, chapter EV (PDF)
I. Foster, C. Kesselman, Computational Grids, Chapter 2 of "The Grid: Blueprint for a New Computing Infrastructure", Morgan-Kaufman, 1999. [pdf]
P. Eerola et.al. The NorduGrid architecture and tools. In Proceedings of CHEP 2003. [pdf]

Exam literature

The exam will emphasize the literature below, and reading it in detail is essential for answering the exam questions.

Distributed Systems - Concepts and Design, 4th edition by Coulouris, Dollimore and Kindberg (published by Addison-Wesley).
Chapter 1-2: Introduction, background, and System Models.
Sections 4.3-4.4: External data representation and client-server.
Sections 5.1-5.2, 5.5: RMI.
Section 8.1-8.3: Distributed file systems, with emphasis on Sun NFS.
Section 10.1-10.5 Peer2Peer Systems
Section 11.1-11.4: Time
Section 12.1-12.3: Mutex and elections.
Section 12.4: Multicast.
Section 12.5: Consensus and related problems.
Section 15.1-15.4: Replication

About the Examination

The exam is oral with the a grade according to a binary pass/fail verdict. An internal censor supervises the exam. Examination and questioning is done mainly by the lecturer but also the censor. The exam is scheduled to take 20 minutes including evaluating and grading. This means that there is 15 minutes available per student. You are expected to prepare a 15 min presentation of each exam question at beforehand, ie. there is no separate preparation time. You can think of the presentation as a mini-lecture given to very quickly learning students.

When you enter the examination room you are requested to identify yourself using the AAU study card. You are then asked to pick a question by lottery (i.e. the question is chosen randomly among a subset of the exam questions). There is a number of hidden numbers (or playing cards) each corresponding to an exam question. You are then advised to spend a minute to browse/read your notes, then write up the agenda for your presentation, and immediately thereafter start on your prepared presentation. You are advised to put your note aside or at least not read from them continuously. You are recommended to write important points and conclusions on the blackboard, and recommend it to animate examples of distributed algorithm. Do not prepare slides - we have limited expectations about readability and artistic impression/beauty of drawings.

If there are less than 10 questions available at the time you draw, it is because we have just heard a presentation of that question a lot in the recent time; this variation will ensure that we are pay better attention to your presentation.

During the exam we evaluate your ability to

Understand the fundamental properties of distributed systems and their implications on system behavior and system design.
Understand common distributed problems and the basic algorithms and techniques that solve these.
Read and present distributed algorithms.
Compare and evaluate different distributed algorithms/solutions with respect to semantic guarantees/accuracy, performance, and fault-tolerance properties
Understand distributed high-level communication mechanisms (e.g. message passing, RMI, multicast) and how it technically can be realized
Practically implement/program small distributed algorithms or applications.
and for good students, to combine existing solution principles and create dedicated solutions to more complex applications.

Here is a generic outline of a presentation that you may find helpful when preparing your own agenda to a specific question

Give a short introduction to the context and problem addressed by the question
Address quickly the core issues of the question and solution goals
Explain in technical detail selected distributed algorithms (or implementation aspects) and discuss its properties, either as general algorithms or step by step examples.

Re-examination will be in august (time to be announced).

Slides of the figures from the book (only) will be available for you during the exam, please announce from the beginning which figure numbers you whish to use. Slides from the lectures are NOT available.

Exam questions

Below you will find a proposal for a list of exam question. Some of the questions are rather broad, and you will have to choose a subset that you will discuss in depth during the exam (of course you will still be required to know about the topics that you choose not to discuss, as well as issues from the exam literature not covered by any question).

In some of the questions you are recommended to choose where to go into depth, since time may not permit going into equal depth with all aspects; similarly treating everything at a superficial level is also not recommended. Included in the list below is a number of sub-questions. These are intended to be of assistance during the preparation of the questions and to hint important sub-topics - not as a strict list to be answered. You are not required to be able to answer all questions to pass; on the other hand, if you are uncertain about many of the answers it is probably worth reading on the topic again. Please do not use the sub-questions as an agenda or outline of your presentation!

On each question is noted the literature that must primarily be addressed in your presentation.

Time in distributed systems [11.1-11.4].
Discuss algorithms to achieve clock synchronization in distributed system, with emphasis on either logical time or physical time.

How are clocks implemented in computers? How is time represented ? what are the sources of inaccurate clocks? Why can't computer clocks not be 100% accurately synchronized? What is physical time as opposed to logical time? What is internal/external synchronization? How does Christians method and the Berkeley Algorithm synchronize clocks? What is the network time protocol? What is causal ordering and the happens-before relation? What is Lamport clocks? How are events time stamped? What is vector clocks? What can be done with vector clocks that cannot be achieved with Lamport clocks? What can logical time be used for? What system model is assumed for the different algorithms?
Mutex and elections [12.1-12.3]
Discuss the problems in performing mutual exclusion and leader election in distributed systems, and show mutex or leader elections algorithms.

What is distributed mutual exclusion? What are the correctness criteria? How does the central server algorithm work? the ring based token passing ? Multicast synchronization? The voting principle? What is election? What purposes may it have? How does ring based election work? The Bully Algorithm? What are the relevant performance metrics for mutex/election algorithms? What are the fault-tolerance properties of the algorithms? What is a reliable failure detector? What is an unreliable failure detector?
Multicast [12.4]
What are the advantages of multicast communication? Discuss either reliable multicast or ordered multicast algorithms (in both cases remember to discuss semantic models).

In what applications can multicast communications be a convenient abstraction? How can it be used to achieve more efficient communication than point-to-point communication? Can it be used to provide better deliverable guarantees that n point-to-point communication? For what applications are basic (un-ordered/unreliable) multicast sufficient, and what applications require stronger guarantees? What are the problems in realizing multi-cast communication? What are the correctness criteria for reliable multicast? How can we implement reliable multicast? - Assuming basic multicast? assuming IP-multicast? What is ordered multicast? what semantic orderings X can you imagine? When is what ordering for instance appropriate? How can the X-ordered delivery be implemented?
What system model is assumed for the different multicast algorithms?
Byzantine generals [12.5]
Explain what the Byzantine generals problem is. Present impossibility result for 3 Byzantine generals, 1 faulty as well as the solution for 4 Byzantine generals, 1 faulty.

What is the relation between Computer Science and Byzantine generals problem? What practical relevance does it have? What are the correctness criteria for the solution to BZG? Why is it impossible to achieve consensus for 3 generals even if only one is allowed to fail Byzantine? When is consensus possible under Byzantine failures? How? Is consensus possible in asynchronous systems? Why not? What other options are available? What is failure masking? How may failure detectors be used?
Remote Method Invocation [ 5.1-5.2, 5.5]
Give an introduction to the idea of RMI, and discuss the implementation principles.

What is Remote Method Invocation? Why is RMI suitable for distributed systems? What is the goal of RMI? What is a remote interface? How can it be specified? How are remote invocations different from local invocations? What is an idempotent operation? What options exists for call semantics? What is maybe invocation semantics? What is at least once semantics? What is at most once semantics? How can they be implemented? What problems arise in case of failures and concurrency? How can a distributed object system be implemented? What are the involved components? What happens step by step during a remote method invocation? What is static and dynamic invocation? What are server threads? What is java RMI? How are remote interfaces specified in Java RMI? How are parameters transferred? What is the call semantics? How is the code for parameter objects transferred? What is the purpose of the registry?
Distributed file systems [8.1-8.3]
Discuss what is the goal of distributed files systems, and describe SUN NFS.

What is the purpose of a basic distributed file system (DFSA) ? What are the required features and areas of responsibility? What are the expected benefits? What is a file? What is a directory? What typical operations must the DFS support? What is the architecture of a typical DFS? What is a flat file service? What is a directory service? How are the typical distributed flat file service operations different from their centralized counterparts+ What is an idempotent operation? what is a stateless server? How is a file identified? What is NFS ? What typical operations are offered by the NFS protocol? What is the architecture of a NFS system? What is a file handle? How is a path name translated? What is the purpose of caching? What is server caching? What is read-ahead? What is write through? What happens in case of failure? What is client caching? What is cache consistency? How does NFS check for validity of a client cache entry? What performance bottlenecks exists in NFS?
Replication [15.1-15.4]
Discuss the use of replication to achieve either fault tolerance or increased availability.

What is replication? How can it improve performance? How can it reduce performance? How can it increase/decrease availability? How can it increase/decrease fault tolerance? What are the common requirements for replicated data? What is the basic architectural model for replicating data? What are the general five phases for accessing replicated objects? What is a fault tolerant service? What are the correctness requirements for replicated fault tolerant data? What is linearizability? What is sequential consistency? What is passive replication? How can it be implemented? What is active replication ? How can it be implemented? What fault-tolerance guarantees can be provided? What are the advantages/disadvantages of active and passive replication respectively? What is (high) availability? What are the consistency requirements for high availability replication? What is a gossip? How does the gossip architecture look like? What is causal ordering? What is a vector time stamp? What are the main data structures in a gossip replica manager? How is a query performed? When may it be performed? How is an update operation performed? When can an update be performed? What is a gossip message? What happens when a replica manager receives a gossip? How are updates propagated in the system? What are the advantages/disadvantages of the gossip architecture?
Peer2peer [10.1-10.5]
Discuss the goal of Peer-to-Peer systems, and describe how searches in a Pastry net is performed.

What is a p2p system? What are their goals? How are they different from centralized or client/server systems? What are the advantages and disadvantages of p2p systems? How are resources identified? How are nodes identified? What is overlay routing networks? What is a distributed hash table? What is the API for a DHT? How is a DHT realized in Pastry? How is routing performed? What information is contained by the routing table? What is the longest common prefix? What is a leaf set? What purpose does it serve? What is the performance of the Pastry System? How does Pastry function when nodes fail or appear/disappear dynamically? What disadvantages do you see of the pastry approach?
Study-exercise
see exam notes under the study exercise text
Study-exercise