Topics for the Seminar on Internet Measurement, SoSe 2011

Topics for the seminar on Internet Measurement (SoSe 2011).

01 — Quantifying Path Exploration in the Internet

Student/Bearbeiter: Jan Henning; Supervisor/Betreuer: Steve Uhlig;

A number of previous measurement studies have shown the existence of path exploration and slow convergence in the global Internet routing system, and a number of protocol enhancements have been proposed to remedy the problem. However, all the previous measurements were conducted over a small number of testing prefixes. There has been no systematic study to quantify the pervasiveness of BGP slow convergence in the operational Internet, nor is there any known effort to deploy any of the proposed solutions. In this paper we present our measurement results from identifying BGP slow convergence events across the entire global routing table. Our data shows that the severity of path exploration and slow convergence varies depending on where prefixes are originated and where the observations are made in the Internet routing hierarchy. In general, routers in tier-1 ISPs observe less path exploration, and hence shorter convergence delays, than routers in edge ASes, and prefixes originated from tier-1 ISPs also experience less path exploration than those originated from edge ASes. Our data also shows that the convergence time of route fail-over events is similar to that of new route announcements, and significantly shorter than that of route failures, which confirms our earlier analytical results. In addition, we also developed a usage-time based path preference inference method which can be used by future studies of BGP dynamics.
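
Independent of the paper's methodology, the following Python sketch illustrates how convergence events can be delimited in a stream of BGP updates: updates for the same prefix that arrive within a quiet-period threshold are grouped into one event, and the span of the group approximates the convergence duration. The threshold value, input format, and example prefix are assumptions made here for illustration.

    QUIET_PERIOD = 60.0  # seconds of silence that ends a convergence event (assumed value)

    def convergence_events(updates):
        """updates: iterable of (timestamp, prefix) BGP update records, sorted by time.
        Yields (prefix, start, end, n_updates) for each detected convergence event."""
        open_events = {}  # prefix -> [start, last_seen, update_count]
        for ts, prefix in updates:
            ev = open_events.get(prefix)
            if ev is None or ts - ev[1] > QUIET_PERIOD:
                if ev is not None:
                    yield prefix, ev[0], ev[1], ev[2]
                open_events[prefix] = [ts, ts, 1]
            else:
                ev[1] = ts
                ev[2] += 1
        for prefix, ev in open_events.items():
            yield prefix, ev[0], ev[1], ev[2]

    # Example: two bursts of updates for 192.0.2.0/24 separated by silence.
    trace = [(0.0, "192.0.2.0/24"), (12.5, "192.0.2.0/24"), (300.0, "192.0.2.0/24")]
    for prefix, start, end, n in convergence_events(trace):
        print(prefix, "converged after", end - start, "s and", n, "updates")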

05 — Routing Stability in Static Wireless Mesh Networks

Student/Bearbeiter: Andrii Soloviov; Supervisor/Betreuer: Ruben Merz;

Considerable research has focused on the design of routing protocols for wireless mesh networks. Yet, little is understood about the stability of routes in such networks. This understanding is important in the design of wireless routing protocols, and in network planning and management. In this paper, we present results from our measurement-based characterization of routing stability in two network deployments, the UCSB MeshNet and the MIT Roofnet. To conduct these case studies, we use detailed link quality information collected over several days from each of these networks. Using this information, we investigate routing stability in terms of route-level characteristics, such as prevalence, persistence and flapping. Our key findings are the following: wireless routes are weakly dominated by a single route; dominant routes are extremely short-lived due to excessive route flapping; and simple stabilization techniques, such as hysteresis thresholds, can provide a significant improvement in route persistence.
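
The route-level metrics named in the abstract can be illustrated with a small Python sketch (not the authors' code): prevalence is the fraction of time the dominant route is in use, and persistence the mean duration of uninterrupted use of a route. The sample data and time units are made up.

    from collections import defaultdict

    def prevalence_and_persistence(samples):
        """samples: list of (timestamp, route) observations for one source-destination
        pair, sorted by time; the last sample marks the end of the measurement."""
        time_per_route = defaultdict(float)  # total time each route was in effect
        run_durations = []                   # duration of each uninterrupted route run
        run_start = samples[0][0]
        for (t0, r0), (t1, r1) in zip(samples, samples[1:]):
            time_per_route[r0] += t1 - t0
            if r1 != r0:
                run_durations.append(t1 - run_start)
                run_start = t1
        run_durations.append(samples[-1][0] - run_start)
        total = samples[-1][0] - samples[0][0]
        dominant, dominant_time = max(time_per_route.items(), key=lambda kv: kv[1])
        prevalence = dominant_time / total                       # time share of dominant route
        persistence = sum(run_durations) / len(run_durations)    # mean route lifetime
        return dominant, prevalence, persistence

    print(prevalence_and_persistence([(0, "A"), (10, "A"), (20, "B"), (30, "A"), (60, "A")]))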

08 — A First Look at Mobile Hand-held Device Traffic

Student/Bearbeiter: Martin Schenck; Supervisor/Betreuer: Pan Hui;

Although mobile hand-held devices (MHDs) are ubiquitous today, little is known about how they are used—especially at home. In this paper, we cast a first look at mobile hand-held device usage from a network perspective. We base our study on anonymized packet-level data representing more than 20,000 residential DSL customers. Our characterization of the traffic shows that MHDs are active on up to 3% of the monitored DSL lines. Mobile devices from Apple (i.e., iPhones and iPods) are, by a huge margin, the most commonly used MHDs and account for most of the traffic. We find that MHD traffic is dominated by multimedia content and downloads of mobile applications.
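
One plausible way to spot MHD activity in passively captured traffic, used here purely as an illustration and not necessarily as the paper's method, is to look for hand-held device markers in HTTP User-Agent headers. The marker list and log format in this Python sketch are assumptions.

    import re
    from collections import defaultdict

    # Substrings that typically identify hand-held devices in User-Agent headers (assumed list).
    MHD_MARKERS = re.compile(r"iPhone|iPod|Android|BlackBerry|Windows CE|Symbian", re.I)

    def mhd_lines(http_log):
        """http_log: iterable of (dsl_line_id, user_agent, bytes_transferred) tuples."""
        per_line_bytes = defaultdict(int)
        mhd_seen = set()
        for line_id, user_agent, nbytes in http_log:
            per_line_bytes[line_id] += nbytes
            if MHD_MARKERS.search(user_agent or ""):
                mhd_seen.add(line_id)
        share = len(mhd_seen) / len(per_line_bytes) if per_line_bytes else 0.0
        return mhd_seen, share

    log = [("line-1", "Mozilla/5.0 (iPhone; CPU iPhone OS 4_0)", 4096),
           ("line-2", "Mozilla/5.0 (Windows NT 6.1)", 1024)]
    lines, share = mhd_lines(log)
    print(f"MHD-active lines: {lines}, share of monitored lines: {share:.0%}")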

09 — Understanding Online Social Network Usage from a Network Perspective

Student/Bearbeiter: Eric Klieme; Supervisor/Betreuer: Gilles Trédan;

Online Social Networks (OSNs) have already attracted more than half a billion users. However, our understanding of which OSN features attract and keep the attention of these users is poor. Studies thus far have relied on surveys or interviews of OSN users or focused on static properties, e.g., the friendship graph, gathered via sampled crawls. In this paper, we study how users actually interact with OSNs by extracting clickstreams from passively monitored network traffic. Our characterization of user interactions within the OSN for four different OSNs (Facebook, LinkedIn, Hi5, and StudiVZ) focuses on feature popularity, session characteristics, and the dynamics within OSN sessions. We find, for example, that users commonly spend more than half an hour interacting with the OSNs while the byte contributions per OSN session are relatively small.
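
A simplified Python sketch of how OSN sessions could be cut out of a clickstream, for illustration only: requests are grouped per user and a session is closed after a period of inactivity, yielding per-session duration and byte volume. The timeout value and record format are assumptions, not the paper's definitions.

    SESSION_TIMEOUT = 30 * 60  # seconds of inactivity that closes a session (assumed)

    def sessions(clickstream):
        """clickstream: iterable of (user_id, timestamp, bytes) OSN requests, sorted by time.
        Returns a list of (user_id, duration_seconds, total_bytes) per session."""
        active = {}    # user_id -> [session_start, last_request, bytes]
        finished = []
        def close(uid, s):
            finished.append((uid, s[1] - s[0], s[2]))
        for uid, ts, nbytes in clickstream:
            s = active.get(uid)
            if s is None or ts - s[1] > SESSION_TIMEOUT:
                if s is not None:
                    close(uid, s)
                active[uid] = [ts, ts, nbytes]
            else:
                s[1] = ts
                s[2] += nbytes
        for uid, s in active.items():
            close(uid, s)
        return finished

    print(sessions([("u1", 0, 500), ("u1", 600, 300), ("u1", 5000, 200)]))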

10 — Seven Years and One Day: Sketching the Evolution of Internet Traffic

Student/Bearbeiter: Dominik Barczyk; Supervisor/Betreuer: Steve Uhlig;

This contribution aims at performing a longitudinal study of the evolution of the traffic collected every day for seven years on a trans-Pacific backbone link (the MAWI dataset). Long-term characteristics are investigated both at the TCP/IP layers (packet and flow attributes) and in terms of application usage. The analysis of this unique dataset provides new insights into changes in traffic statistics, notably on the persistence of long-range dependence, induced by the ongoing increase in link bandwidth. Traffic in the MAWI dataset is subject to bandwidth changes, to congestion, and to a variety of anomalies. This allows the comparison of their impacts on the traffic statistics, but at the same time significantly impairs long-term evolution characterizations. To account for this difficulty, we show and explain how and why random projection (sketch) based analysis procedures provide practitioners with an efficient and robust tool to disentangle actual long-term evolutions from time-localized events such as anomalies and link congestion. Our central result is a strong and persistent long-range dependence that jointly governs byte and packet counts. An additional study of a 24-hour trace complements the long-term results with an analysis of intra-day variability.
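
To make the sketch idea concrete: a random projection hashes each flow into one of a few sub-traces, so that an anomaly carried by a few flows perturbs only some sub-traces while genuine long-term properties show up in all of them. The Python sketch below splits a packet trace into hashed byte-count sub-series; the hash, bucket count, bin width, and record format are illustrative assumptions.

    import hashlib

    N_BUCKETS = 8        # number of sketch buckets (assumed)
    BIN_SECONDS = 1.0    # time-series bin width (assumed)

    def bucket_of(flow_key, seed=0):
        """Hash a flow identifier (e.g. the 5-tuple) to a sketch bucket."""
        digest = hashlib.sha1(f"{seed}:{flow_key}".encode()).digest()
        return digest[0] % N_BUCKETS

    def sketch_series(packets, duration, seed=0):
        """packets: iterable of (timestamp, flow_key, bytes).
        Returns N_BUCKETS byte-count time series of equal length."""
        n_bins = int(duration / BIN_SECONDS) + 1
        series = [[0] * n_bins for _ in range(N_BUCKETS)]
        for ts, flow_key, nbytes in packets:
            series[bucket_of(flow_key, seed)][int(ts / BIN_SECONDS)] += nbytes
        return series

    pkts = [(0.2, "10.0.0.1:1234>192.0.2.7:80", 1500), (1.7, "10.0.0.2:40000>192.0.2.9:443", 600)]
    for i, s in enumerate(sketch_series(pkts, duration=2.0)):
        print("bucket", i, s)

A long-range-dependence estimator (for instance a wavelet-based one) would then be applied to each sub-series, and only features that appear across all buckets would be attributed to the traffic as a whole.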

16 — TCP Revisited: A Fresh Look at TCP in the Wild

Student/Bearbeiter: Cagdas Dönmez; Supervisor/Betreuer: Amir Mehmood;

Since the last in-depth studies of measured TCP traffic some 6–8 years ago, the Internet has experienced significant changes, including the rapid deployment of backbone links with 1–2 orders of magnitude more capacity, the emergence of bandwidth-intensive streaming applications, and the massive penetration of new TCP variants. These and other changes raise the question of whether the characteristics of measured TCP traffic in today's Internet reflect these changes or have largely remained the same. To answer this question, we collected and analyzed packet traces from a number of Internet backbone and access links, focusing on the "heavy-hitter" flows responsible for the majority of traffic. We then analyzed their within-flow packet dynamics and observed the following features: (1) in one of our datasets, up to 15.8% of flows have an initial congestion window (ICW) size larger than the upper bound specified by RFC 3390. (2) Among flows that encounter retransmission rates of more than 10%, 5% exhibit irregular retransmission behavior where the sender does not slow down its sending rate during retransmissions. (3) TCP flow clocking (i.e., regular spacing between flights of packets) can be caused by both RTT and non-RTT factors such as the application or the link layer, and 60% of the flows studied show no pronounced flow clocking. To arrive at these findings, we developed novel techniques for analyzing unidirectional TCP flows, including a technique for inferring ICW size, a method for detecting irregular retransmissions, and a new approach for accurately extracting flow clocks.
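
One of the listed techniques, ICW inference from a unidirectional trace, can be approximated by counting the data bytes a sender emits back-to-back before it first pauses to wait for ACKs. The pause threshold, RTT guess, and input format in this Python sketch are assumptions and not the authors' algorithm.

    def estimate_icw(data_packets, pause_factor=0.5, rtt_estimate=0.1):
        """data_packets: list of (timestamp, payload_bytes) for one sender-to-receiver
        flow, sorted by time, starting right after the handshake.
        Returns the estimated initial congestion window in bytes."""
        if not data_packets:
            return 0
        pause_threshold = pause_factor * rtt_estimate   # gap that suggests waiting for ACKs
        icw = data_packets[0][1]
        for (t0, _), (t1, nbytes) in zip(data_packets, data_packets[1:]):
            if t1 - t0 > pause_threshold:
                break
            icw += nbytes
        return icw

    # Three back-to-back segments followed by an ACK-bounded pause: ICW of about 3 segments.
    flow = [(0.000, 1460), (0.001, 1460), (0.002, 1460), (0.120, 1460)]
    print(estimate_icw(flow), "bytes")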

17 — On Dominant Characteristics of Residential Broadband Internet Traffic

Student/Bearbeiter: Michal Tadeusz Stawski; Supervisor/Betreuer: Juhoon Kim;

While residential broadband Internet access is popular in many parts of the world, only a few studies have examined the characteristics of such traffic. In this paper we describe observations from monitoring the network activity for more than 20,000 residential DSL customers in an urban area. To ensure privacy, all data is immediately anonymized. We augment the anonymized packet traces with information about DSL-level sessions, IP (re-)assignments, and DSL link bandwidth.

Our analysis reveals a number of surprises in terms of the mental models we developed from the measurement literature. For example, we find that HTTP—not peer-to-peer—traffic dominates by a significant margin; that more often than not the home user's immediate ISP connectivity contributes more to the round-trip times the user experiences than the WAN portion of the path; and that the DSL lines are frequently not the bottleneck in bulk-transfer performance.
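
The round-trip-time decomposition mentioned above can be approximated at a passive monitor sitting between the DSL access network and the wide area: for a customer-initiated connection, the gap between SYN and SYN/ACK at the monitor covers the WAN side, and the gap between SYN/ACK and the customer's final handshake ACK covers the access side. The example timestamps in this Python sketch are invented.

    def split_rtt(syn_ts, synack_ts, ack_ts):
        """Timestamps of the SYN, SYN/ACK and final handshake ACK as seen at a monitor
        located between the DSL access network and the wide-area Internet,
        for a connection initiated by the DSL customer."""
        wan_rtt = synack_ts - syn_ts      # monitor -> remote server -> monitor
        access_rtt = ack_ts - synack_ts   # monitor -> DSL customer -> monitor
        return access_rtt, wan_rtt

    access, wan = split_rtt(syn_ts=0.000, synack_ts=0.035, ack_ts=0.095)
    print(f"access: {access*1000:.0f} ms, WAN: {wan*1000:.0f} ms "
          f"({access / (access + wan):.0%} of the RTT on the access side)")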

19 — Characterizing VLAN-Induced Sharing in a Campus Network

Student/Bearbeiter: Ece Gürler; Supervisor/Betreuer: Cigdem Sengul;

Many enterprise, campus, and data-center networks have complex layer-2 virtual LANs ("VLANs") below the IP layer. The interaction between layer-2 and IP topologies in these VLANs introduces hidden dependencies between the IP-level network and the physical infrastructure, which have implications for network management tasks such as planning for capacity or reliability, and for fault diagnosis. This paper characterizes the extent and effect of these dependencies in a large campus network. We first present the design and implementation of EtherTrace, a tool that we make publicly available, which infers the layer-2 topology using data passively collected from Ethernet switches. Using this tool, we infer the layer-2 topology for a large campus network and compare it with the IP topology. We find that almost 70% of layer-2 edges are shared by 10 or more IP edges, and a single layer-2 edge may be shared by as many as 34 different IP edges. This sharing of layer-2 edges and switches among IP paths commonly results from trunking multiple VLANs to the same access router, or from colocation of academic departments that share layer-2 infrastructure but have logically separate IP subnets and routers. We examine how this sharing affects the accuracy and specificity of fault diagnosis. For example, applying network tomography to the IP topology to diagnose failures caused by layer-2 devices results in only 54% accuracy, compared to 100% accuracy when our tomography algorithm takes input across layers.
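
The degree of sharing reported above boils down to counting, per layer-2 edge, how many IP-layer edges traverse it once each IP link is mapped to its underlying switch path. EtherTrace's inference itself is not reproduced in this Python sketch, and the mapping format is an assumption.

    from collections import defaultdict

    def sharing_histogram(ip_edge_to_l2_path):
        """ip_edge_to_l2_path: dict mapping an IP-layer edge (router pair) to the list
        of layer-2 edges (switch pairs) that the IP edge traverses."""
        ip_edges_per_l2_edge = defaultdict(set)
        for ip_edge, l2_path in ip_edge_to_l2_path.items():
            for l2_edge in l2_path:
                ip_edges_per_l2_edge[l2_edge].add(ip_edge)
        return {l2_edge: len(ip_edges) for l2_edge, ip_edges in ip_edges_per_l2_edge.items()}

    mapping = {
        ("r1", "r2"): [("s1", "s2"), ("s2", "s3")],
        ("r1", "r3"): [("s1", "s2")],             # shares the s1-s2 trunk with r1-r2
    }
    print(sharing_histogram(mapping))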

20 — Live Streaming Performance of the Zattoo Network

Student/Bearbeiter: Michael Winkelmann; Supervisor/Betreuer: Oliver Hohlfeld;

A number of commercial peer-to-peer systems for live streaming, such as PPLive, Joost, LiveStation, SOPCast, and TVants, among others, have been introduced in recent years. The behavior of these popular systems has been extensively studied in several measurement papers. Due to the proprietary nature of these commercial systems, however, these studies have to rely on a "black-box" approach, where packet traces are collected from a single or a limited number of measurement points, to infer various properties of traffic on the control and data planes. Although such studies are useful for comparing different systems from an end-user's perspective, it is difficult to intuitively understand the observed properties without fully reverse-engineering the underlying systems. Our paper presents a large-scale measurement study of Zattoo, one of the largest production live streaming providers in Europe, using data collected by the provider. To highlight, we found that even when the Zattoo system was heavily loaded with as many as 20,000 concurrent users on a single overlay, the median channel join delay remained less than 2 to 5 seconds, and that, for a majority of users, the streamed signal lags the over-the-air broadcast signal by no more than 3 seconds. To motivate the measurement study, we also present a description of the Zattoo network architecture.

23 — Measuring Serendipity: Connecting People, Locations and Interests in a Mobile 3G Network

Student/Bearbeiter: Francisco Javier Sanchez-Migallon Blanco; Supervisor/Betreuer: Ingmar Poese;

Characterizing the relationship that exists between people's application interests and mobility properties is the core question relevant for location-based services, in particular those that facilitate serendipitous discovery of people, businesses and objects. In this paper, we apply rule mining and spectral clustering to study this relationship for a population of over 280,000 users of a 3G mobile network in a large metropolitan area. Our analysis reveals that (i) people's movement patterns are correlated with the applications they access, e.g., stationary users and those who move more often and visit more locations tend to access different applications. (ii) Location affects the applications accessed by users, i.e., at certain locations, users are more likely to evince interest in a particular class of applications than others irrespective of the time of day. (iii) Finally, the number of serendipitous meetings between users of similar cyber interest is larger in regions with a higher density of hotspots. Our analysis demonstrates how cellular network providers and location-based services can benefit from knowledge of the interplay between users and their locations and interests.
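
A toy Python illustration of the rule-mining side (not the authors' pipeline): from (user, location, application class) observations, rules of the form "at location L, users access application class A" can be scored by support and confidence. Record format and thresholds are assumptions.

    from collections import Counter

    def location_app_rules(records, min_support=2):
        """records: iterable of (user_id, location, app_class) observations.
        Returns {(location, app_class): (support, confidence)}."""
        loc_app = Counter((loc, app) for _, loc, app in records)
        loc_total = Counter(loc for _, loc, _ in records)
        rules = {}
        for (loc, app), support in loc_app.items():
            if support >= min_support:
                rules[(loc, app)] = (support, support / loc_total[loc])
        return rules

    obs = [("u1", "downtown", "maps"), ("u2", "downtown", "maps"),
           ("u3", "downtown", "mail"), ("u4", "suburb", "video")]
    print(location_app_rules(obs))   # the downtown->maps rule has support 2, confidence 2/3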

25 — The nature of data center traffic: measurements & analysis

Student/Bearbeiter: Adin Sljivar; Supervisor/Betreuer: Nadi Sarrar;

We explore the nature of traffic in data centers designed to support the mining of massive data sets. We instrument the servers to collect socket-level logs, with negligible performance impact. In a 1500-server operational cluster, we thus amass roughly a petabyte of measurements over two months, from which we obtain and report detailed views of traffic and congestion conditions and patterns. We further consider whether traffic matrices in the cluster might be obtained instead via tomographic inference from coarser-grained counter data.
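
A minimal Python sketch, for illustration only, of turning socket-level logs into a server-to-server traffic matrix; the log format and server names are assumptions rather than the instrumentation described above.

    from collections import defaultdict

    def traffic_matrix(socket_log):
        """socket_log: iterable of (src_server, dst_server, bytes) records from
        per-server socket-level logging. Returns {(src, dst): total_bytes}."""
        matrix = defaultdict(int)
        for src, dst, nbytes in socket_log:
            matrix[(src, dst)] += nbytes
        return dict(matrix)

    log = [("rack1-srv07", "rack3-srv12", 9_000_000), ("rack1-srv07", "rack3-srv12", 1_000_000)]
    print(traffic_matrix(log))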

32 — Measuring availability in the Domain Name System

Student/Bearbeiter: Michal Holowaty; Supervisor/Betreuer: Bernhard Ager;

The domain name system (DNS) is critical to Internet functionality. The availability of a domain name refers to its ability to be resolved correctly. We develop a model for server dependencies that is used as a basis for measuring availability. We introduce the minimum number of servers queried (MSQ) and redundancy as availability metrics and show how common DNS misconfigurations impact the availability of domain names. We apply the availability model to domain names from production DNS and observe that 6.7% of names exhibit sub-optimal MSQ, and 14% experience false redundancy. The MSQ and redundancy values can be optimized by proper maintenance of delegation records for zones.

  • Casey Deccio, Jeff Sedayao, Krishna Kant, Prasant Mohapatra. Measuring Availability in the Domain Name System, IEEE INFOCOM 2010
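
A rough Python sketch of the redundancy side of such a model (MSQ itself needs the full server-dependency graph and is not reproduced here): walk the zones a name depends on, collect each zone's nameserver addresses, and flag zones whose servers all fall into one /24 as a crude proxy for false redundancy. The sketch uses the third-party dnspython package; the zone-walking logic and the /24 criterion are assumptions.

    import ipaddress
    import dns.resolver   # third-party package "dnspython"

    def redundancy_report(name):
        """For the name and its ancestor zones (excluding the bare TLD), list how many
        nameserver addresses answer for the zone and how many distinct /24s they span."""
        labels = name.rstrip(".").split(".")
        report = {}
        for i in range(len(labels) - 1):
            zone = ".".join(labels[i:])
            try:
                ns_names = [r.target.to_text() for r in dns.resolver.resolve(zone, "NS")]
            except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN, dns.resolver.NoNameservers):
                continue   # not a zone cut, or not resolvable
            addrs = []
            for ns in ns_names:
                try:
                    addrs += [a.to_text() for a in dns.resolver.resolve(ns, "A")]
                except Exception:
                    pass
            nets = {ipaddress.ip_network(a + "/24", strict=False) for a in addrs}
            report[zone] = {"servers": len(addrs), "distinct_/24s": len(nets)}
        return report

    print(redundancy_report("www.example.com"))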

34 — TopBT: A Topology-Aware and Infrastructure-Independent BitTorrent Client

Student/Bearbeiter: Jan Nehring; Supervisor/Betreuer: Georgios Smaragdakis;

BitTorrent (BT) has carried a significant and continuously increasing portion of Internet traffic. While several designs have recently been proposed and implemented to improve resource utilization by bridging the application layer (overlay) and the network layer (underlay), these designs are largely dependent on Internet infrastructures such as ISPs and CDNs. In addition, they also demand large-scale deployment of their systems to work effectively. Consequently, they require efforts far beyond an individual user's means to be widely used in the Internet.
In this paper, aiming at building an infrastructure-independent user-level facility, we present our design, implementation, and evaluation of a topology-aware BT system, called TopBT, that significantly improves overall Internet resource utilization without degrading user downloading performance. The unique feature of TopBT is that a client actively discovers network proximities to connected peers, and uses both proximities and transmission rates to maintain fast downloading while reducing the distance that BT traffic, and thus Internet traffic, travels. As a result, a TopBT client neither requires feeds from major Internet infrastructures, such as ISPs or CDNs, nor requires large-scale deployment of other TopBT clients on the Internet to work effectively. We have implemented TopBT based on a widely used open-source BT client code base, and made the software publicly available. By deploying TopBT and other BitTorrent clients on hundreds of Internet hosts, we show that on average TopBT reduces download traffic by about 25% while achieving a 15% faster download speed compared to several prevalent BT clients. TopBT has been widely used in the Internet by users all over the world.

  • Shansi Ren, Enhua Tan, Tian Luo, Songqing Chen, Lei Guo, and Xiaodong Zhang. TopBT: A Topology-Aware and Infrastructure-Independent BitTorrent Client, IEEE INFOCOM 2010
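
The peer-selection idea can be illustrated with a toy Python scoring function that combines a measured proximity (e.g., ping RTT) with the observed transfer rate, preferring nearby peers among those delivering comparable rates. The weighting and field names are assumptions, not TopBT's actual algorithm.

    def rank_peers(peers, proximity_weight=0.3):
        """peers: list of dicts with 'addr', 'rate_kbps' (observed download rate) and
        'rtt_ms' (measured proximity). Returns peers sorted best-first by a score that
        rewards a high rate and penalizes distance."""
        def score(p):
            rate_score = p["rate_kbps"] / max(q["rate_kbps"] for q in peers)
            proximity_score = min(q["rtt_ms"] for q in peers) / p["rtt_ms"]
            return (1 - proximity_weight) * rate_score + proximity_weight * proximity_score
        return sorted(peers, key=score, reverse=True)

    candidates = [
        {"addr": "198.51.100.7", "rate_kbps": 800, "rtt_ms": 120},
        {"addr": "203.0.113.4",  "rate_kbps": 750, "rtt_ms": 15},   # nearby, similar rate
    ]
    for p in rank_peers(candidates):
        print(p["addr"])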

99 — Improving Content Delivery Using Provider-aided Distance Information

Student/Bearbeiter: Robert Philipp Skupin; Supervisor/Betreuer: Juhoon Kim;

Content delivery systems constitute a major portion of today's Internet traffic. While they are a good source of revenue for Internet Service Providers (ISPs), the huge volume of content delivery traffic also poses a significant burden and traffic engineering challenge for the ISP. The difficulty is due to the immense volume of transfers, while the traffic engineering challenge stems from the fact that most content delivery systems themselves utilize a distributed infrastructure. They perform their own traffic flow optimization and realize this using the DNS system. While content delivery systems may, to some extent, consider the user's performance within their optimization criteria, they currently have no incentive to consider any of the ISP's constraints. As a consequence, the ISP has "lost control" over a major part of its traffic. To overcome this impairment, we propose a solution where the ISP offers a Provider-aided Distance Information System (PaDIS). PaDIS uses information available only to the ISP to rank any client-host pair based on distance information, such as delay, bandwidth or number of hops.
In this paper we show that the applicability of the system is significant: more than 70% of the HTTP traffic of a major European ISP can be served from multiple different locations. Moreover, we show that deploying PaDIS is not only beneficial to ISPs, but also to users. Experiments with different content providers show that improvements in download times of up to a factor of four are possible. Furthermore, we describe a high-performance implementation of PaDIS and show how it can be deployed within an ISP.
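
A minimal Python sketch of the ranking step described above (not the PaDIS implementation): given several candidate locations that can serve the same content, sort them by an ISP-supplied distance estimate toward the requesting client. The metric, names, and values are assumptions.

    def rank_candidates(client, candidates, distance):
        """candidates: list of candidate server identifiers for one piece of content.
        distance(client, server) returns an ISP-internal cost, e.g. a weighted mix of
        delay, hop count and available bandwidth; lower is better."""
        return sorted(candidates, key=lambda server: distance(client, server))

    # Toy distance table standing in for the ISP's topology knowledge.
    cost = {("dsl-client-1", "cdn-node-a"): 4.0, ("dsl-client-1", "cdn-node-b"): 1.5}
    print(rank_candidates("dsl-client-1", ["cdn-node-a", "cdn-node-b"],
                          lambda c, s: cost[(c, s)]))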
