TU Berlin


Distributed Systems

Almost every computing system nowadays is distributed, ranging from multi-core laptops to Internet-scale services; understanding the principles of distributed computing is hence important for the design and engineering of modern computing systems.  Fundamental issues that arise in reliable and efficient distributed systems include developing adequate methods for modeling failures and synchrony assumptions, determining precise performance bounds on implementations of concurrent data structures, capturing the trade-off between consistency and efficiency, and demarcating the frontier of feasibility in distributed computing.

For example, popular Internet services and applications such as CNN.com, YouTube, Facebook, Skype, and BitTorrent attract millions of users every day, and an acceptable Quality-of-Service/Quality-of-Experience can be guaranteed only through effective load balancing and the collaboration of many thousands of machines. While distributed systems promise good scalability and high robustness, they pose challenging research problems, such as: How can robust and scalable distributed architectures and services be designed? How can access to a shared resource be coordinated, e.g., by electing a leader? And how can incentives for cooperation be provided in an open, collaborative distributed system?
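To make the leader-election question concrete, the following is a minimal sketch of one classic approach, the Chang-Roberts algorithm on a unidirectional ring, simulated in a single process. It is an illustration only and is not taken from the group's work; the function name `elect_leader`, the message queue, and the assumptions of distinct identifiers, reliable links, and no node failures are all choices made for this example.

```python
from collections import deque


def elect_leader(ring_ids):
    """Simulate Chang-Roberts leader election on a unidirectional ring.

    ring_ids lists distinct node identifiers in ring order; node i sends
    only to node (i + 1) % n. Returns (leader_id, messages_sent).
    Assumes reliable links and no failures (illustrative simplification).
    """
    n = len(ring_ids)
    if n == 0:
        raise ValueError("ring must contain at least one node")
    if len(set(ring_ids)) != n:
        raise ValueError("node identifiers must be distinct")

    # Every node starts by sending its own identifier to its successor.
    in_flight = deque(((i + 1) % n, ring_ids[i]) for i in range(n))
    messages_sent = n

    while in_flight:
        receiver, candidate = in_flight.popleft()
        own = ring_ids[receiver]
        if candidate == own:
            # The identifier travelled the whole ring, so no larger one exists.
            return own, messages_sent
        if candidate > own:
            # Forward larger identifiers to the successor.
            in_flight.append(((receiver + 1) % n, candidate))
            messages_sent += 1
        # Smaller identifiers are silently dropped.

    raise RuntimeError("unreachable: the largest identifier always returns")


if __name__ == "__main__":
    leader, messages = elect_leader([3, 7, 2, 5])
    print("leader:", leader, "messages:", messages)
```

The node with the largest identifier always wins, because its message is the only one that no other node drops. Handling crashes or message loss, which is where the research problems mentioned above begin, would require failure detection and is deliberately left out of this sketch.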


Selected Publications

Zeno: Eventually Consistent Byzantine-Fault Tolerance
Citation key: SFKRM-ZECBFT-09
Author: Singh, Atul and Fonseca, Pedro and Kuznetsov, Petr and Rodrigues, Rodrigo and Maniatis, Petros
Title of Book: 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09)
Pages: 169–184
Year: 2009
Location: Boston, MA, USA
Address: Berkeley, CA, USA
Month: April
Publisher: USENIX Association
Organization: USENIX
Abstract: Many distributed services are hosted at large, shared, geographically diverse data centers, and they use replication to achieve high availability despite the unreachability of an entire data center. Recent events show that non-crash faults occur in these services and may lead to long outages. While Byzantine-Fault Tolerance (BFT) could be used to withstand these faults, current BFT protocols can become unavailable if a small fraction of their replicas are unreachable. This is because existing BFT protocols favor strong safety guarantees (consistency) over liveness (availability). This paper presents a novel BFT state machine replication protocol called Zeno that trades consistency for higher availability. In particular, Zeno replaces strong consistency (linearizability) with a weaker guarantee (eventual consistency): clients can temporarily miss each other's updates but when the network is stable the states from the individual partitions are merged by having the replicas agree on a total order for all requests. We have built a prototype of Zeno and our evaluation using micro-benchmarks shows that Zeno provides better availability than traditional BFT protocols.
