direkt zum Inhalt springen

direkt zum Hauptnavigationsmenü

Sie sind hier

TU Berlin

Inhalt des Dokuments

Topics for the Seminar on Internet Measurement, SS 2014

Topics for the seminar on Internet Measurement, SS 2014.
Themen für das Seminar über Internet Measurement, SS 2014.

#1 Evolution of Social-Attribute Networks: Measurements, Modeling, and Implications using Google+

Understanding social network structure and evolution has important implications for many aspects of network and system design including provisioning, bootstrapping trust and reputation systems via social networks, and defenses against Sybil attacks. Several recent results suggest that augmenting the social network structure with user attributes (e.g., location, employer, communities of interest) can provide a more fine-grained understanding of social networks. However, there have been few studies to provide a systematic understanding of these effects at scale. We bridge this gap using a unique dataset collected as the Google+ social network grew over time since its release in late June 2011. We observe novel phenomena with respect to both standard social network metrics and new attribute-related metrics (that we define). We also observe interesting evolutionary patterns as Google+ went from a bootstrap phase to a steady invitation-only stage before a public release. Based on our empirical observations, we develop a new generative model to jointly reproduce the social structure and the node attributes. Using theoretical analysis and empirical evaluations, we show that our model can accurately reproduce the social and attribute structure of real social networks. We also demonstrate that our model provides more accurate predictions for practical application contexts.

Gong, Neil Zhenqiang and Xu, Wenchang and Huang, Ling and Mittal, Prateek and Stefanov, Emil and Sekar, Vyas and Song, Dawn

ACM Internet Measurement Conference (IMC) 2012

#2 Bobtail: Avoiding Long Tails in the Cloud

Highly modular data center applications such as Bing, Facebook, and Amazon’s retail platform are known to be susceptible to long tails in response times. Services such as Amazon’s EC2 have proven attractive platforms for building similar applications. Unfortunately, virtualization used in such platforms exacerbates the long tail problem by factors of two to four. Surprisingly, we find that poor response times in EC2 are a property of nodes rather than the network, and that this property of nodes is both pervasive throughout EC2 and persistent over time. The root cause of this problem is co-scheduling of CPU-bound and latency-sensitive tasks. We leverage these observations in Bobtail, a system that proactively detects and avoids these bad neighboring VMs without significantly penalizing node instantiation. With Bobtail, common communication patterns benefit from reductions of up to 40% in 99.9th percentile response times.

Xu, Yunjing and Musgrave, Zachary and Noble, Brian and Bailey, Michael

USENIX conference on Networked Systems Design and Implementation2013

#3 3GOL: Power-boosting ADSL using 3G OnLoading

The co-existence of cellular and wired networks has been exploited almost exclusively in the direction of OffLoading traffic from the former onto the latter. In this paper we claim that there exist cases that call for the exact opposite, i.e., use the cellular network to assist a fixed wired network. In particular, we show that by “OnLoading” traffic from the wired broadband network onto the cellular network we can usefully speedup wired connections, on the downlink or the uplink. We consider the technological challenges pertaining to this idea and implement a prototype 3G OnLoading service that we call 3GOL, that can be deployed by an operator providing both the wired and cellular network services. By strategically OnLoading a fraction of the data transfers to the 3G network, one can significantly enhance the performance of particular applications. In particular we demonstrate non-trivial performance benefits of 3GOL to two widely used applications: video-on-demand and multimedia upload. We also consider the case when the operator that provides wired and cellular services is different, adding the analysis on economic constraints and volume cap on cellular data plans that need to be respected. Simulating 3GOL over a DSLAM trace we show that 3GOL can reduce video pre-buffering time by at least 20% for 50% of the users while respecting data caps and we design a simple estimator to compute the daily allowance that can be used towards 3GOL while respecting caps. Our prototype is currently being piloted in 30 households in a large European city by a large network provider.

Rossi, Claudio and Vallina-Rodriguez, Narseo and Erramilli, Vijay and Grunenberger, Yan and Gyarmati, Laszlo and Laoutaris, Nikolaos and Stanojevic, Rade and Papagiannaki, Konstantina and Rodriguez, Pablo

ACM Conference on Emerging Networking Experiments and Technologies (CoNEXT) 2013

#4 A Measurement-based Study of MultiPath TCP Performance over Wireless Networks

With the popularity of mobile devices and the pervasive use of cellular technology, there is widespread interest in hybrid networks and on how to achieve robustness and good performance from them. As most smart phones and mobile devices are equipped with dual interfaces (WiFi and 3G/4G), a promising approach is through the use of multi-path TCP, which leverages path diversity to improve performance and provide robust data transfers. In this paper we explore the performance of multi-path TCP in the wild, focusing on simple 2-path multi-path TCP scenarios. We seek to answer the following questions: How much can a user benefit from using multi-path TCP over cellular and WiFi relative to using the either interface alone? What is the impact of flow size on average latency? What is the effect of the rate/route control algorithm on performance? We are especially interested in understanding how application level performance is affected when path characteristics (e.g., round trip times and loss rates) are diverse. We address these questions by conducting measurements using one commercial Internet service provider and three major cellular carriers in the US.

Chen, Yung-Chih and Lim, YS and Gibbens, Richard J and Nahum, Erich M and Khalili, Ramin and Towsley, Don

ACM Internet Measurement Conference (IMC) 2013

#5 Mapping the Expansion of Google’s Serving Infrastructure

Modern content-distribution networks both provide bulk content and act as “serving infrastructure” for web services in order to reduce user-perceived latency. Serving infrastructures such as Google’s are now critical to the online economy, making it imperative to understand their size, geographic distribution, and growth strategies. To this end, we develop techniques that enumerate IP addresses of servers in these infrastructures, find their geographic location, and identify the association between clients and clusters of servers. While general techniques for server enumeration and geolocation can exhibit large error, our techniques exploit the design and mechanisms of serving infrastructure to improve accuracy. We use the EDNS-client-subnet DNS extension to measure which clients a service maps to which of its serving sites. We devise a novel technique that uses this mapping to geolocate servers by combining noisy information about client locations with speed-of-light constraints. We demonstrate that this technique substantially improves geolocation accuracy relative to existing approaches. We also cluster server IP addresses into physical sites by measuring RTTs and adapting the cluster thresholds dynamically. Google’s serving infrastructure has grown dramatically in the ten months, and we use our methods to chart its growth and understand its content serving strategy. We find that the number of Google serving sites has increased more than sevenfold, and most of the growth has occurred by placing servers in large and small ISPs across the world, not by expanding Google’s backbone.

Calder, Matt and Fan, Xun and Hu, Zi and Katz-Bassett, Ethan and Heidemann, John and Govindan, Ramesh

ACM Internet Measurement Conference (IMC) 2013

#6 A Fistful of Bitcoins: Characterizing Payments Among Men with No Names

Bitcoin is a purely online virtual currency, unbacked by either physical commodities or sovereign obligation; instead, it relies on a combination of cryptographic protection and a peer-to-peer protocol for witnessing settlements. Consequently, Bitcoin has the unintuitive property that while the ownership of money is implicitly anonymous, its flow is globally visible. In this paper we explore this unique characteristic further, using heuristic clustering to group Bitcoin wallets based on evidence of shared authority, and then using re-identification attacks (i.e., empirical purchasing of goods and services) to classify the operators of those clusters. From this analysis, we characterize longitudinal changes in the Bitcoin market, the stresses these changes are placing on the system, and the challenges for those seeking to use Bitcoin for criminal or fraudulent purposes at scale.

Meiklejohn, Sarah and Pomarole, Marjori and Jordan, Grant and Levchenko, Kirill and McCoy, Damon and Voelker, Geoffrey M and Savage, Stefan

ACM Internet Measurement Conference (IMC) 2013

#7 Understanding the Domain Registration Behavior of Spammers

Spammers register a tremendous number of domains to evade blacklisting and takedown efforts. Current techniques to detect such domains rely on crawling spam URLs or monitoring lookup traffic. Such detection techniques are only effective after the spammers have already launched their campaigns, and thus these countermeasures may only come into play after the spammer has already reaped significant benefits from the dissemination of large volumes of spam. In this paper we examine the registration process of such domains, with a particular eye towards features that might indicate that a given domain likely has a malicious purpose at registration time, before it is ever used for an attack. Our assessment includes exploring the characteristics of registrars, domain life cycles, registration bursts, and naming patterns. By investigating zone changes from the .com TLD over a 5-month period, we discover that spammers employ bulk registration, that they often re-use domains previously registered by others, and that they tend to register and host their domains over a small set of registrars. Our findings suggest steps that registries or registrars could use to frustrate the efforts of miscreants to acquire domains in bulk, ultimately reducing their agility for mounting large-scale attacks.

Hao, Shuang and Thomas, Matthew and Paxson, Vern and Feamster, Nick and Kreibich, Christian and Grier, Chris and Hollenbeck, Scott

ACM Internet Measurement Conference (IMC) 2013

#8 Analysis of a “/0” Stealth Scan from a Botnet

Botnets are the most common vehicle of cyber-criminal activity. They are used for spamming, phishing, denial of service attacks, brute-force cracking, stealing private information, and cyber warfare. Botnets carry out network scans for several reasons, including searching for vulnerable machines to infect and recruit into the botnet, probing networks for enumeration or penetration, etc. We present the measurement and analysis of a horizontal scan of the entire IPv4 address space conducted by the Sality botnet in February of last year. This 12-day scan originated from approximately 3 million distinct IP addresses, and used a heavily coordinated and unusually covert scanning strategy to try to discover and compromise VoIP-related (SIP server) infrastructure. We observed this event through the UCSD Network Telescope, a /8 darknet continuously receiving large amounts of unsolicited traffic, and we correlate this traffic data with other public sources of data to validate our inferences. Sality is one of the largest botnets ever identified by researchers, its behavior represents ominous advances in the evolution of modern malware: the use of more sophisticated stealth scanning strategies by millions of coordinated bots, targeting critical voice communications infrastructure. This work offers a detailed dissection of the botnet’s scanning behavior, including general methods to correlate, visualize, and extrapolate botnet behavior across the global Internet.

Dainotti, Alberto and King, Alistair and Papale, Ferdinando and Pescape, Antonio and others

ACM Internet Measurement Conference (IMC) 2012

#9 Cell vs. WiFi: On the Performance of Metro Area Mobile Connections

Cellular and 802.11 WiFi are compelling options for mobile Internet connectivity. The goal of our work is to understand the performance afforded by each of these technologies in diverse environments and use conditions. In this paper, we compare and contrast cellular and WiFi performance using crowd-sourced data from Speedtest.net. Our study considers spatio-temporal performance (upload/download throughput and latency) using over 3 million user-initiated tests from iOS and Android apps in 15 different metro areas collected over a 15 week period. Our basic performance comparisons show that (i) WiFi provides better absolute download/upload throughput, and a higher degree of consistency in performance; (ii) WiFi networks generally deliver lower absolute latency, but the consistency in latency is often better with cellular access; (iii) throughput and latency vary widely depending on the particular access type (e.g., HSPA, EVDO, LTE, WiFi, etc.) and service provider. More broadly, our results show that performance consistency for cellular and WiFi is much lower than has been reported for wired broadband. Temporal analysis shows that average performance for cell and WiFi varies with time of day, with the best performance for large metro areas coming at non-peak hours. Spatial analysis shows that performance is highly variable across metro areas, but that there are subregions that offer consistently better performance for cell or WiFi. Comparisons between metro areas show that larger areas provide higher throughput and lower latency than smaller metro areas, suggesting where ISPs have focused their deployment efforts. Finally, our analysis reveals diverse performance characteristics resulting from the rollout of new cell access technologies and service differences among local providers.

Sommers, Joel and Barford, Paul

ACM Internet Measurement Conference (IMC) 2012

#10 Volume-based Transit Pricing: Is 95 The Right Percentile?

The 95th percentile billing mechanism has been an industry de facto standard for transit providers for well over a decade. While the simplicity of the scheme makes it attractive as a billing mechanism, dramatic evolution in traffic patterns, associated interconnection practices and industry structure over the last two decades motivates an obvious question: is it still appropriate? In this paper, we evaluate the 95th percentile pricing mechanism from the perspective of transit providers, using a decade of traffic statistics from SWITCH (a large research/academic network), and more recent traffic statistics from 3 Internet Exchange Points (IXPs). We find that over time, heavy-inbound and heavy-hitter networks are able to achieve a lower 95th-to-average ratio than heavy-inbound and moderate-hitter networks, possibly due to their ability to better manage their traffic profile. The 95th percentile traffic volume also does not necessarily reflect the cost burden to the provider, motivating our exploration of an alternative metric that better captures the costs imposed on a network. We define the provision ratio for a customer, which captures its contribution to the provider’s peak load.

Raja, Vamseedhar Reddyvari and Dhamdhere, Amogh and Scicchitano, Alessandra and Shakkottai, Srinivas and Leinen, Simon and others

Passive and Active Measurement conference (PAM) 2014

#11 The Need for End-to-End Evaluation of Cloud Availability

People’s computing lives are moving into the cloud, making understanding cloud availability increasingly critical. Prior studies of Internet outages have used ICMP-based pings and traceroutes. While these studies can detect network availability, we show that they can be inaccurate at estimating cloud availability. Without care, ICMP probes can underestimate availability because ICMP is not as robust as applicationlevel measurements such as HTTP. They can overestimate availability if they measure reachability of the cloud’s edge, missing failures in the cloud’s back-end. We develop methodologies sensitive to five “nines” of reliability, and then we compare ICMP and end-to-end measurements for both cloud VM and storage services. We show case studies where one fails and the other succeeds, and our results highlight the importance of application-level retries to reach high precision. When possible, we recommend end-to-end measurement with application-level protocols to evaluate the availability of cloud services.

Hu, Zi and Zhu, Liang and Ardi, Calvin and Katz-Bassett, Ethan and Madhyastha, Harsha V and Heidemann, John and Yu, Minlan

Passive and Active Measurement conference (PAM) 2014

Tian Pan, Xiaoyu Guo, Chenhui Zhang,  Junchen Jiang, Hao Wu, Bin Liu, In IEEE Infocom, Mar 2012.

#12 A First Look at Cellular Machine-to-Machine Traffic – Large Scale Measurement and Characterization

Cellular network based Machine-to-Machine (M2M) communication is fast becoming a market-changing force for a wide spectrum of businesses and applications such as telematics, smart metering, point-of-sale terminals, and home security and automation systems. In this paper, we aim to answer the following important question: Does traffic generated by M2M devices impose new requirements and challenges for cellular network design and management? To answer this question, we take a first look at the characteristics of M2M traffic and compare it with traditional smartphone traffic. We have conducted our measurement analysis using a weeklong traffic trace collected from a tier-1 cellular network in the United States. We characterize M2M traffic from a wide range of perspectives, including temporal dynamics, device mobility, application usage, and network performance. Our experimental results show that M2M traffic exhibits significantly different patterns than smartphone traffic in multiple aspects. For instance, M2M devices have a much larger ratio of uplink to downlink traffic volume, their traffic typically exhibits different diurnal patterns, they are more likely to generate synchronized traffic resulting in bursty aggregate traffic volumes, and are less mobile compared to smartphones. On the other hand, we also find that M2M devices are generally competing with smartphones for network resources in co-located geographical regions. These and other findings suggest that better protocol design, more careful spectrum allocation, and modified pricing schemes may be needed to accommodate the rise of M2M devices.

Shafiq, Muhammad Zubair and Ji, Lusheng and Liu, Alex X and Pang, Jeffrey and Wang, Jia

ACM SIGMETRICS / Performance 2012

#13 Trinocular: Understanding Internet Reliability Through Adaptive Probing

Natural and human factors cause Internet outages—from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in Jan. 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand reliability of edge networks. Trinocular is principled : deriving a simple model of the Internet that captures the information pertinent to outages, and populating that model through long-term data, and learning current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet “background radiation” by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages one round or longer, and estimate outage duration within one-half round. Since we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions, and about 30% greater coverage at constant accuracy. We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.

Quan, Lin and Heidemann, John and Pradkin, Yuri


#14 An Empirical Reexamination of Global DNS Behavior

The performance and operational characteristics of the DNS protocol are of deep interest to the research and network operations community. In this paper, we present measurement results from a unique dataset containing more than 26 billion DNS query-response pairs collected from more than 600 globally distributed recursive DNS resolvers. We use this dataset to reaffirm findings in published work and notice some significant differences that could be attributed both to the evolving nature of DNS traffic and to our differing perspective. For example, we find that although characteristics of DNS traffic vary greatly across networks, the resolvers within an organization tend to exhibit similar behavior. We further find that more than 50% of DNS queries issued to root servers do not return successful answers, and that the primary cause of lookup failures at root servers is malformed queries with invalid TLDs. Furthermore, we propose a novel approach that detects malicious domain groups using temporal correlation in DNS queries. Our approach requires no comprehensive labeled training set, which can be difficult to build in practice. Instead, it uses a known malicious domain as anchor, and identifies the set of previously unknown malicious domains that are related to the anchor domain. Experimental results illustrate the viability of this approach, i.e. , we attain a true positive rate of more than 96%, and each malicious anchor domain results in a malware domain group with more than 53 previously unknown malicious domains on average.

Gao, Hongyu and Yegneswaran, Vinod and Chen, Yan and Porras, Phillip and Ghosh, Shalini and Jiang, Jian and Duan, Haixin


Zusatzinformationen / Extras


Schnellnavigation zur Seite über Nummerneingabe

NA: Internet Measurement Sem.
Dozent: Anja Feldmann

ab 17.04.2014

Fr 14:00 - 16:00 Uhr

Ort: MAR 4.033