Exam Information
Course literature
- Distributed Systems - Concepts and Design, 4th edition by Coulouris, Dollimore and Kindberg (published by Addison-Wesley).
Chapters 1,
2, 4.3-4.5, 5, 8, 10, 11.1-11.4, 12.1-12.5, 15 (excluding 15.5), 19 (excluding
19.7).
- The Jini(TM) Technology Core Platform Specification, chapter EV (PDF)
- I. Foster, C. Kesselman, Computational Grids, Chapter 2 of "The
Grid: Blueprint for a New Computing Infrastructure", Morgan-Kaufman, 1999. [pdf]
- P. Eerola et.al. The NorduGrid architecture and tools. In
Proceedings of CHEP 2003. [pdf]
Exam literature
The exam will emphasize the literature below, and reading it in detail is
essential for answering the exam questions.
- Distributed Systems - Concepts and Design, 4th edition by Coulouris, Dollimore and Kindberg (published by Addison-Wesley).
Chapter 1-2: Introduction, background, and System Models.
Sections 4.3-4.4: External data representation and client-server.
Sections 5.1-5.2, 5.5: RMI.
Section 8.1-8.3: Distributed file systems, with emphasis on Sun NFS.
Section 10.1-10.5 Peer2Peer Systems
Section 11.1-11.4: Time
Section 12.1-12.3: Mutex and elections.
Section 12.4: Multicast.
Section 12.5: Consensus and related problems.
Section 15.1-15.4: Replication
About the Examination
The exam is oral with the a grade according to a binary pass/fail verdict. An internal censor
supervises the exam. Examination and questioning is done mainly by the lecturer
but also the censor. The exam is scheduled to take 20
minutes including evaluating and grading. This means that there is 15 minutes
available per student. You are expected to prepare a 15 min
presentation of each exam question at beforehand, ie. there is no separate
preparation time. You can think of the presentation as a mini-lecture given to
very quickly learning students.
When you enter the examination room you are requested to identify yourself
using the AAU study card. You are then asked to pick a question by lottery (i.e.
the question is chosen randomly among a subset of the exam questions). There is
a number of hidden numbers (or playing cards) each corresponding to an exam
question. You are then advised to spend a minute to browse/read your notes, then
write up the agenda for your presentation, and immediately thereafter start on
your prepared presentation. You are advised to put your note aside or at least
not read from them continuously. You are recommended to write important points and
conclusions on the blackboard, and recommend it to animate examples of
distributed algorithm. Do not prepare slides - we have limited expectations
about readability and artistic impression/beauty of drawings.
If there are less than 10 questions available at the time you draw, it is
because we have just heard a presentation of that question a lot in the recent
time; this variation will ensure that we are pay better attention to your
presentation.
During the exam we evaluate your ability to
- Understand the fundamental properties of distributed systems and their
implications on system behavior and system design.
- Understand common distributed problems and the basic algorithms and
techniques that solve these.
- Read and present distributed algorithms.
- Compare and evaluate different distributed algorithms/solutions with
respect to semantic guarantees/accuracy, performance, and fault-tolerance
properties
- Understand distributed high-level communication mechanisms (e.g. message
passing, RMI, multicast) and how it technically can be realized
- Practically implement/program small distributed algorithms or
applications.
- and for good students, to combine existing solution principles and
create dedicated solutions to more complex applications.
Here is a generic outline of a presentation that you may find helpful
when preparing your own agenda to a specific question
- Give a short introduction to the context and problem addressed by
the question
- Address quickly the core issues of the question and solution goals
- Explain in technical detail selected distributed algorithms (or
implementation aspects) and discuss its properties, either as general
algorithms or step by step examples.
Re-examination will be in august (time to be announced).
See also Hans Hüttels advice on oral exams.
Slides of the figures from the book (only) will be available for you during
the exam, please announce from the beginning which figure numbers you whish to
use. Slides from the lectures are NOT available.
Exam questions
Below you will find a proposal for a list of exam question. Some of the
questions are rather broad, and you will have to choose a subset that you will
discuss in depth during the exam (of course you will still be required to know
about the topics that you choose not to discuss, as well as issues from the exam
literature not covered by any question).
In some of the questions you are recommended to choose where to go
into depth, since time may not permit going into equal depth with all
aspects; similarly treating everything at a superficial level is also
not recommended. Included in the list below is a number of
sub-questions. These are intended to be of assistance during the
preparation of the questions and to hint important sub-topics - not as
a strict list to be answered. You are not required to be able to answer
all questions to pass; on the other hand, if you are uncertain about
many of the answers it is probably worth reading on the topic again.
Please do not use the sub-questions as an agenda or outline of your
presentation!
On each question is noted the literature that must primarily be addressed in
your presentation.
- Time in distributed systems [11.1-11.4].
Discuss algorithms to achieve clock synchronization in distributed
system, with emphasis on either logical time or physical time.
How are clocks implemented in computers? How is time represented ? what are
the sources of inaccurate clocks? Why can't computer clocks not be 100%
accurately synchronized? What is physical time as opposed to logical time?
What is internal/external synchronization? How does Christians method and
the Berkeley Algorithm synchronize clocks? What is the network time protocol?
What is causal ordering and the happens-before relation? What is Lamport
clocks? How are events time stamped? What is vector clocks? What can be done
with vector clocks that cannot be achieved with Lamport clocks? What can
logical time be used for? What system model is assumed for the different
algorithms?
- Mutex and elections [12.1-12.3]
Discuss the problems in performing mutual exclusion and leader
election in distributed systems, and show mutex or leader elections
algorithms.
What is distributed mutual exclusion? What are the correctness criteria? How
does the central server algorithm work? the ring based token passing ?
Multicast synchronization? The voting principle? What is election?
What purposes may it have? How does ring based election work? The Bully
Algorithm? What are the relevant performance metrics for mutex/election
algorithms? What are the fault-tolerance properties of the algorithms? What
is a reliable failure detector? What is an unreliable failure detector?
- Multicast [12.4]
What are the advantages of multicast communication? Discuss either
reliable multicast or ordered multicast algorithms (in both cases
remember to discuss semantic models).
In what applications can multicast communications be a convenient
abstraction? How can it be used to achieve more efficient communication than
point-to-point communication? Can it be used to provide better deliverable
guarantees that n point-to-point communication? For what applications
are basic (un-ordered/unreliable) multicast sufficient, and what
applications require stronger guarantees? What are the problems in realizing
multi-cast communication? What are the correctness criteria for reliable
multicast? How can we implement reliable multicast? - Assuming basic
multicast? assuming IP-multicast? What is ordered multicast? what semantic
orderings X can you imagine? When is what ordering for instance appropriate?
How can the X-ordered delivery be implemented?
What system model is assumed for the different multicast algorithms?
- Byzantine generals [12.5]
Explain what the Byzantine generals problem is. Present
impossibility result for 3 Byzantine generals, 1 faulty as well as the
solution for 4 Byzantine generals, 1 faulty.
What is the relation between Computer Science and Byzantine generals
problem? What practical relevance does it have? What are the correctness
criteria for the solution to BZG? Why is it impossible to achieve
consensus for 3 generals even if only one is allowed to fail Byzantine? When is
consensus possible under Byzantine failures? How? Is consensus possible in
asynchronous systems? Why not? What other options are available? What is
failure masking? How may failure detectors be used?
- Remote Method Invocation [ 5.1-5.2, 5.5]
Give an introduction to the idea of RMI, and discuss the
implementation principles.
What is Remote Method Invocation? Why is RMI suitable for distributed
systems? What is the goal of RMI? What is a remote interface? How can it
be specified? How are remote invocations different from local invocations?
What is an idempotent operation? What options exists for call semantics?
What is maybe invocation semantics? What is at least once semantics? What is
at most once semantics? How can they be implemented? What problems arise in
case of failures and concurrency? How can a distributed object system be
implemented? What are the involved components? What happens step by step
during a remote method invocation? What is static and dynamic invocation?
What are server threads? What is java RMI? How are remote interfaces
specified in Java RMI? How are parameters transferred? What is the call
semantics? How is the code for parameter objects transferred? What is the
purpose of the registry?
- Distributed file systems [8.1-8.3]
Discuss what is the goal of distributed files systems, and describe
SUN NFS.
What is the purpose of a basic distributed file system (DFSA) ? What are the
required features and areas of responsibility? What are the expected
benefits? What is a file? What is a directory? What typical operations must
the DFS support? What is the architecture of a typical DFS? What is a flat
file service? What is a directory service? How are the typical distributed
flat file service operations different from their centralized counterparts+
What is an idempotent operation? what is a stateless server? How is a file
identified? What is NFS ? What typical operations are offered by the NFS
protocol? What is the architecture of a NFS system? What is a file handle?
How is a path name translated? What is the purpose of caching? What is
server caching? What is read-ahead? What is write through? What happens in
case of failure? What is client caching? What is cache consistency? How does
NFS check for validity of a client cache entry? What performance bottlenecks
exists in NFS?
- Replication [15.1-15.4]
Discuss the use of replication to achieve either fault tolerance or
increased availability.
What is replication? How can it improve performance? How can it reduce
performance? How can it increase/decrease availability? How can it
increase/decrease fault tolerance? What are the common requirements for
replicated data? What is the basic architectural model for replicating data?
What are the general five phases for accessing replicated objects? What is a
fault tolerant service? What are the correctness requirements for replicated
fault tolerant data? What is linearizability? What is sequential
consistency? What is passive replication? How can it be implemented? What is
active replication ? How can it be implemented? What fault-tolerance
guarantees can be provided? What are the advantages/disadvantages of active
and passive replication respectively? What is (high) availability? What are
the consistency requirements for high availability replication? What is a
gossip? How does the gossip architecture look like? What is causal ordering?
What is a vector time stamp? What are the main data structures in a gossip
replica manager? How is a query performed? When may it be performed? How is
an update operation performed? When can an update be performed? What is a
gossip message? What happens when a replica manager receives a gossip?
How are updates propagated in the system? What are the
advantages/disadvantages of the gossip architecture?
- Peer2peer [10.1-10.5]
Discuss the goal of Peer-to-Peer systems, and describe how searches in a Pastry net
is performed.
What is a p2p system? What are their goals? How are they different from
centralized or client/server systems? What are the advantages and
disadvantages of p2p systems? How are resources identified? How are nodes
identified? What is overlay routing networks? What is a distributed hash
table? What is the API for a DHT? How is a DHT realized in Pastry? How is
routing performed? What information is contained by the routing table? What
is the longest common prefix? What is a leaf set? What purpose does it
serve? What is the performance of the Pastry System? How does Pastry
function when nodes fail or appear/disappear dynamically? What disadvantages
do you see of the pastry approach?
- Study-exercise
see exam notes under the study exercise text
- Study-exercise