direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Page Content

Distributed Systems

Almost every computing system nowadays is distributed, ranging from multi-core laptops to Internet-scale services; understanding the principles of distributed computing is hence important for the design and engineering of modern computing systems.  Fundamental issues that arise in reliable and efficient distributed systems include developing adequate methods for modeling failures and synchrony assumptions, determining precise performance bounds on implementations of concurrent data structures, capturing the trade-off between consistency and efficiency, and demarcating the frontier of feasibility in distributed computing.

For example, popular Internet services and applications such as CNN.com, YouTube, Facebook, Skype, BitTorrent attract millions of users every day, and only by the effective load-balancing and collaboration of many thousand machines, an acceptable Quality-of-Service/Quality-of-Experience can be guaranteed. While distributed systems promise a good scalability as well as a high robustness, they pose challenging research problems, such as: How to design robust and scalable distributed architectures and services? How to coordinate access to a shared resource, e.g., by electing a leader? Or how to provide incentives for cooperation in an open, collaborative distributed system?

Selected Publications

On the Weakest Failure Detector Ever
Citation key GHKLN-OWFDE-09
Author Guerraoui, Rachid and Herlihy, Maurice and Kuznetsov, Petr and Lynch, Nancy and Newport, Calvin
Pages 353–366
Year 2009
ISSN 0178-2770
DOI http://dx.doi.org/10.1007/s00446-009-0079-3
Journal Distributed Computing Journal (DC)
Volume 21
Number 5
Month August
Publisher Springer
Abstract Many problems in distributed computing are impossible to solve when no information about process failures is available. It is common to ask what information about failures is necessary and sufficient to circumvent some specific impossibility, e.g., consensus, atomic commit, mutual exclusion, etc. This paper asks what information about failures is necessary to circumvent any impossibility and sufficient to circumvent some impossibility. In other words, what is the minimal yet non-trivial failure information. We present an abstraction, denoted Υ, that provides very little information about failures. In every run of the distributed system, Υ eventually informs the processes that some set of processes in the system cannot be the set of correct processes in that run. Although seemingly weak, for it might provide random information for an arbitrarily long period of time, and it eventually excludes only one set of processes (among many) that is not the set of correct processes in the current run, Υ still captures non-trivial failure information. We show that Υ is sufficient to circumvent the fundamental wait-free set-agreement impossibility. While doing so, (a) we disprove previous conjectures about the weakest failure detector to solve set-agreement and (b) we prove that solving set-agreement with registers is strictly weaker than solving n+1-process consensus using n-process consensus. We show that Υ is the weakest stable non-trivial failure detector: any stable failure detector that circumvents some wait-free impossibility provides at least as much information about failures as Υ does. Our results are generalized, from the wait-free to the f-resilient case, through an abstraction $\Upsilon^f$ that we introduce and prove minimal to solve any problem that cannot be solved in an f-resilient manner, and yet sufficient to solve f-resilient f-set-agreement.
Download Bibtex entry

Zusatzinformationen / Extras

Quick Access:

Schnellnavigation zur Seite über Nummerneingabe