TU Berlin

Distributed Systems

Almost every computing system nowadays is distributed, ranging from multi-core laptops to Internet-scale services; understanding the principles of distributed computing is hence important for the design and engineering of modern computing systems.  Fundamental issues that arise in reliable and efficient distributed systems include developing adequate methods for modeling failures and synchrony assumptions, determining precise performance bounds on implementations of concurrent data structures, capturing the trade-off between consistency and efficiency, and demarcating the frontier of feasibility in distributed computing.

For example, popular Internet services and applications such as CNN.com, YouTube, Facebook, Skype, and BitTorrent attract millions of users every day, and an acceptable Quality-of-Service/Quality-of-Experience can be guaranteed only through effective load balancing and collaboration among many thousands of machines. While distributed systems promise good scalability and high robustness, they pose challenging research problems, such as: How can robust and scalable distributed architectures and services be designed? How can access to a shared resource be coordinated, e.g., by electing a leader? And how can incentives for cooperation be provided in an open, collaborative distributed system?

People

Selected Publications

The Fault Detection Problem
Citation key: HK-TFDP-09
Authors: Haeberlen, Andreas and Kuznetsov, Petr
Book title: Principles of Distributed Systems – Proceedings of the 13th International Conference on Principles of Distributed Systems (OPODIS '09)
Pages: 99–114
Year: 2009
ISBN: 978-3-642-10876-1
ISSN: 0302-9743
DOI: http://dx.doi.org/10.1007/978-3-642-10877-8_10
Location: Nimes, France
Address: Berlin / Heidelberg, Germany
Month: December
Publisher: Springer
Abstract: One of the most important challenges in distributed computing is ensuring that services are correct and available despite faults. Recently it has been argued that fault detection can be factored out from computation, and that a generic fault detection service can be a useful abstraction for building distributed systems. However, while fault detection has been extensively studied for crash faults, little is known about detecting more general kinds of faults. This paper explores the power and the inherent costs of generic fault detection in a distributed system. We propose a formal framework that allows us to partition the set of all faults that can possibly occur in a distributed computation into several fault classes. Then we formulate the fault detection problem for a given fault class, and we show that this problem can be solved for only two specific fault classes, namely omission faults and commission faults. Finally, we derive tight lower bounds on the cost of solving the problem for these two classes in asynchronous message-passing systems.