Contents
- Analysis of a "/0" Stealth Scan from a Botnet
- Classifying Internet One-way Traffic
- Content delivery and the natural evolution of DNS: remote DNS trends, performance issues and alternative solutions
- Longtime Behavior of Harvesting Spam Bots
- Measuring the Deployment of IPv6: Topology, Routing and Performance
- Mitigating Sampling Error when Measuring Internet Client IPv6 Capabilities
- Obtaining In-Context Measurements of Cellular Network Performance
- Anatomy of a Large European IXP
- Deadline-Aware Datacenter TCP (D2TCP)
- Measuring and Fingerprinting Click-Spam in Ad Networks
- Tracking Millions of Flows in High Speed Networks for Application Identification
- Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery
Analysis of a "/0" Stealth Scan from a Botnet
Botnets are the most common vehicle of cyber-criminal activity. They are used for spamming, phishing, denial of service attacks, brute-force cracking, stealing private information, and cyber warfare. Botnets carry out network scans for several reasons, including searching for vulnerable machines to infect and recruit into the botnet, and probing networks for enumeration or penetration. We present the measurement and analysis of a horizontal scan of the entire IPv4 address space conducted by the Sality botnet in February 2011. This 12-day scan originated from approximately 3 million distinct IP addresses, and used a heavily coordinated and unusually covert scanning strategy to try to discover and compromise VoIP-related (SIP server) infrastructure. We observed this event through the UCSD Network Telescope, a /8 darknet continuously receiving large amounts of unsolicited traffic, and we correlate this traffic data with other public sources of data to validate our inferences. Sality is one of the largest botnets ever identified by researchers, and its behavior represents ominous advances in the evolution of modern malware: the use of more sophisticated stealth scanning strategies by millions of coordinated bots, targeting critical voice communications infrastructure. This work offers a detailed dissection of the botnet's scanning behavior, including general methods to correlate, visualize, and extrapolate botnet behavior across the global Internet.
A. Dainotti, A. King, K. Claffy, F. Papale, and A. Pescapè, in Internet Measurement Conference (IMC), Nov 2012.
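A minimal sketch of how scan participants can be flagged in darknet flow records. This is not CAIDA's actual analysis pipeline; the threshold, addresses, and record format are made up for illustration. The idea is simply that a source probing many distinct darknet destinations on the SIP port is likely part of the coordinated scan.

```python
# Illustrative sketch: flag darknet sources that probe many distinct
# destinations on the SIP port (5060) as likely scan participants.
from collections import defaultdict

SIP_PORT = 5060        # the sipscan targeted SIP (VoIP) infrastructure
SCAN_THRESHOLD = 3     # hypothetical: min distinct targets before a source counts

def find_scan_sources(flows):
    """flows: iterable of (src_ip, dst_ip, dst_port) darknet records."""
    targets = defaultdict(set)
    for src, dst, port in flows:
        if port == SIP_PORT:
            targets[src].add(dst)
    return {src: len(dsts) for src, dsts in targets.items()
            if len(dsts) >= SCAN_THRESHOLD}

flows = [("198.51.100.1", f"10.0.0.{i}", 5060) for i in range(5)]
flows.append(("198.51.100.2", "10.0.0.1", 80))   # unrelated background traffic
print(find_scan_sources(flows))   # → {'198.51.100.1': 5}
```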
Classifying Internet One-way Traffic
In this work we analyze a massive data-set that captures 5.23 petabytes of traffic to shed light on the composition of one-way traffic towards a large network, based on a novel one-way traffic classifier. We find that one-way traffic makes up a very large fraction of all traffic in terms of flows, that it can be primarily attributed to malicious causes, and that it has declined since 2004 because of a relative decrease in scan traffic. In addition, we show how our classifier is useful for detecting network outages.
Eduard Glatz, Xenofontas Dimitropoulos, in Internet Measurement Conference (IMC), Nov 2012.
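The core notion of one-way traffic can be sketched in a few lines: a flow is one-way if no packets are ever seen in the reverse direction. The paper's classifier is far richer (it attributes causes such as scanning and outages); this is only the basic matching step, on synthetic flow keys.

```python
# Minimal sketch of one-way flow identification: a flow is one-way if no
# reverse flow (swapped endpoints) appears in the observation window.
def one_way_flows(flows):
    """flows: set of 5-tuples (src, sport, dst, dport, proto)."""
    seen = set(flows)
    return {f for f in seen
            if (f[2], f[3], f[0], f[1], f[4]) not in seen}

flows = {
    ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp"),     # answered: reverse exists
    ("10.0.0.2", 80, "10.0.0.1", 1234, "tcp"),
    ("203.0.113.9", 4444, "10.0.0.3", 22, "tcp"),  # scan-like, unanswered
}
print(one_way_flows(flows))   # only the unanswered flow remains
```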
Content delivery and the natural evolution of DNS: remote DNS trends, performance issues and alternative solutions
Content Delivery Networks (CDNs) rely on the Domain Name System (DNS) for replica server selection. DNS-based server selection builds on the assumption that, in the absence of information about the client's actual network location, the location of a client's DNS resolver provides a good approximation. The recent growth of remote DNS services breaks this assumption and can negatively impact clients' web performance.
In this paper, we assess the end-to-end impact of using remote DNS services on CDN performance and present the first evaluation of an industry-proposed solution to the problem. We find that remote DNS usage can indeed significantly impact clients' web performance and that the proposed solution, if available, can effectively address the problem for most clients. Considering the performance cost of remote DNS usage and the limited adoption base of the industry-proposed solution, we present and evaluate an alternative approach, Direct Resolution, to readily obtain comparable performance improvements without requiring CDN or DNS participation.
John S. Otto, Mario A. Sánchez, John P. Rula, and Fabián E. Bustamante, in Internet Measurement Conference (IMC), Nov 2012.
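Direct Resolution hinges on the client issuing DNS queries itself rather than relying on a (possibly remote) resolver. As a stdlib-only sketch of the wire-level piece such a client needs, here is a minimal builder for the A-record query it would send; the hostname and transaction id are placeholders, and real code would of course also transmit the packet and parse the response.

```python
# Build a minimal DNS A-record query (RFC 1035 format) with only the stdlib.
import struct

def build_dns_query(hostname, txid=0x1234):
    # Header: id, flags (RD=1), QDCOUNT=1, ANCOUNT/NSCOUNT/ARCOUNT=0
    header = struct.pack(">HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels, terminated by a zero byte
    qname = b"".join(bytes([len(p)]) + p.encode() for p in hostname.split("."))
    question = qname + b"\x00" + struct.pack(">HH", 1, 1)   # QTYPE=A, QCLASS=IN
    return header + question

q = build_dns_query("example.com")
print(q[:2].hex())   # transaction id → "1234"
```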
Longtime Behavior of Harvesting Spam Bots
Our observations suggest that simple obfuscation methods are still efficient for protecting addresses from being harvested. A key finding is that search engines are used as proxies, either to hide the identity of the harvester or to optimize the harvesting process.
Oliver Hohlfeld, Thomas Graf, and Florin Ciucu, in Internet Measurement Conference (IMC), Nov 2012.
Measuring the Deployment of IPv6: Topology, Routing and Performance
We use historical BGP data and recent active measurements to analyze trends in the growth, structure, dynamics and performance of the evolving IPv6 Internet, and compare them to the evolution of IPv4. We find that the IPv6 network is maturing, albeit slowly. While most core Internet transit providers have deployed IPv6, edge networks are lagging. Early IPv6 network deployment was stronger in Europe and the Asia-Pacific region than in North America, and current IPv6 network deployment still shows the same pattern. The IPv6 topology is characterized by a single dominant player -- Hurricane Electric -- which appears in a large fraction of IPv6 AS paths, and is more dominant in IPv6 than the most dominant player in IPv4. Routing dynamics in the IPv6 topology are largely similar to those in IPv4, and churn in both networks grows at the same rate as the underlying topologies. Our measurements suggest that performance over IPv6 paths is comparable to that over IPv4 paths if the AS-level paths are the same, but can be much worse than IPv4 if the AS-level paths differ.
Amogh Dhamdhere, Matthew Luckie, Bradley Huffaker, kc claffy, Ahmed Elmokashfi, Emile Aben, in Internet Measurement Conference (IMC), Nov 2012.
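The "dominant player" observation boils down to a simple metric: the fraction of observed AS paths in which a given AS appears. A sketch with synthetic paths (Hurricane Electric's real ASN is 6939; the other ASNs are from the documentation range):

```python
# Fraction of AS paths that traverse a given AS -- the dominance metric.
def dominance(paths, asn):
    return sum(asn in p for p in paths) / len(paths)

paths = [
    (64496, 6939, 64500),
    (64497, 6939, 64501),
    (64498, 64502),
    (64499, 6939, 64503),
]
print(dominance(paths, 6939))   # → 0.75
```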
Mitigating Sampling Error when Measuring Internet Client IPv6 Capabilities
Despite the predicted exhaustion of unallocated IPv4 addresses between 2012 and 2014, it remains unclear how many current clients can use its successor, IPv6, to access the Internet. We propose a refinement of previous measurement studies that mitigates intrinsic measurement biases, and demonstrate a novel web-based technique using Google ads to perform IPv6 capability testing on a wider range of clients. After applying our sampling error reduction, we find that 6% of world-wide connections are from IPv6-capable clients, but only 1--2% of connections preferred IPv6 in dual-stack (dual-stack failure rates less than 1%). Except for an uptick around World IPv6 Day 2011, these proportions were relatively constant, while the percentage of connections with IPv6-capable DNS resolvers has increased to nearly 60%. The percentage of connections from clients with native IPv6 using Happy Eyeballs has risen to over 20%.
Sebastian Zander, Lachlan L.H. Andrew, Grenville Armitage, Geoff Huston, George Michaelson, in Internet Measurement Conference (IMC), Nov 2012.
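The two headline ratios, share of connections that are IPv6-capable versus share that actually prefer IPv6 when both stacks work, can be computed from per-connection test outcomes as below. The data here is synthetic, not the paper's, and the real study additionally corrects for sampling bias across client populations.

```python
# Compute IPv6 capability and dual-stack preference rates from test outcomes.
def ipv6_rates(results):
    """results: list of (v4_ok, v6_ok, chose_v6) per-connection outcomes."""
    total = len(results)
    capable = sum(1 for _, v6, _ in results if v6)
    dual = [r for r in results if r[0] and r[1]]        # both stacks worked
    prefer = sum(1 for _, _, chose in dual if chose)
    return capable / total, (prefer / len(dual) if dual else 0.0)

results = ([(True, True, False)] * 4 + [(True, True, True)]
           + [(True, False, False)] * 95)
cap, pref = ipv6_rates(results)
print(f"capable={cap:.2f} prefer-in-dual-stack={pref:.2f}")
```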
Obtaining In-Context Measurements of Cellular Network Performance
Network service providers, and other parties, require an accurate understanding of the performance that cellular networks deliver to users. In particular, they often seek a measure of the network performance users experience solely when they are interacting with their device---a measure we call in-context. Acquiring such measures is challenging due to the many factors, including time and physical context, that influence cellular network performance. This paper makes two contributions. First, we conduct a large scale measurement study, based on data collected from a large cellular provider and from hundreds of controlled experiments, to shed light on the issues underlying in-context measurements. Our novel observations show that measurements must be conducted on devices which (i) recently used the network as a result of user interaction with the device, (ii) remain in the same macro-environment (e.g., indoors and stationary), and in some cases the same micro-environment (e.g., in the user's hand), during the period between normal usage and a subsequent measurement, and (iii) are currently sending/receiving little or no user-generated traffic. Second, we design and deploy a prototype active measurement service for Android phones based on these key insights. Our analysis of 1650 measurements gathered from 12 volunteer devices shows that the system is able to obtain average throughput measurements that accurately quantify the performance experienced during times of active device and network usage.
Aaron Gember, Aditya Akella, Jeffrey Pang, Alexander Varshavsky, Ramon Caceres, in Internet Measurement Conference (IMC), Nov 2012.
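The three eligibility conditions (i)-(iii) amount to a gate that an on-device measurement service checks before probing. A sketch of that gate; the field names and thresholds are illustrative, not taken from the authors' Android prototype:

```python
# Gate deciding whether a measurement taken now would be "in-context".
from dataclasses import dataclass

@dataclass
class DeviceState:
    secs_since_user_traffic: float   # (i) network recently used interactively
    environment_changed: bool        # (ii) macro/micro-environment unchanged
    background_bytes_per_s: float    # (iii) little competing traffic right now

def eligible(s, max_idle=60.0, max_background=1024.0):
    return (s.secs_since_user_traffic <= max_idle
            and not s.environment_changed
            and s.background_bytes_per_s <= max_background)

print(eligible(DeviceState(10.0, False, 200.0)))   # → True
print(eligible(DeviceState(10.0, True, 200.0)))    # → False (device moved)
```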
Anatomy of a Large European IXP
The largest IXPs carry daily traffic volumes in the petabyte range, similar to what some of the largest global ISPs reportedly handle. This little-known fact is due to a few hundred member ASes exchanging traffic with one another over the IXP's infrastructure. This paper reports on a first-of-its-kind and in-depth analysis of one of the largest IXPs worldwide based on nine months' worth of sFlow records collected at that IXP in 2011.
A main finding of our study is that the number of actual peering links at this single IXP exceeds the number of total AS links of the peer-peer type in the entire Internet known as of 2010! To explain such a surprisingly rich peering fabric, we examine in detail this IXP's ecosystem and highlight the diversity of networks that are members at this IXP and connect there with other member ASes for reasons that are similarly diverse, but can be partially inferred from their business types and observed traffic patterns. In the process, we investigate this IXP's traffic matrix and illustrate what its temporal and structural properties can tell us about the member ASes that generated the traffic in the first place. While our results suggest that these large IXPs can be viewed as a microcosm of the Internet ecosystem itself, they also argue for a re-assessment of the mental picture that our community has about this ecosystem.
Bernhard Ager, Nikolaos Chatzis, Anja Feldmann, Nadi Sarrar, Steve Uhlig, Walter Willinger, In ACM SIGCOMM, Aug 2012.
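The peering-fabric count behind the study's headline finding reduces to: from (sampled) flow records annotated with the member ASes on each side, count the distinct unordered AS pairs that exchange traffic. A toy version with synthetic ASNs (the real analysis must also handle sampling and filter non-peering traffic):

```python
# Count distinct peering links (unordered AS pairs) seen in sFlow-like samples.
def peering_links(samples):
    """samples: iterable of (src_as, dst_as) pairs from flow records."""
    return {frozenset((a, b)) for a, b in samples if a != b}

samples = [(64496, 64497), (64497, 64496),   # same link, both directions
           (64496, 64498),
           (64496, 64496)]                   # self-pair: not a peering link
print(len(peering_links(samples)))   # → 2
```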
Deadline-Aware Datacenter TCP (D2TCP)
An important class of datacenter applications, called Online Data-Intensive (OLDI) applications, includes Web search, online retail, and advertisement. To achieve good user experience, OLDI applications operate under soft-real-time constraints (e.g., 300 ms latency) which imply deadlines for network communication within the applications. Further, OLDI applications typically employ tree-based algorithms which, in the common case, result in bursts of children-to-parent traffic with tight deadlines. Recent work on datacenter network protocols is either deadline-agnostic (DCTCP) or is deadline-aware (D3) but suffers under bursts due to race conditions. Further, D3 has the practical drawbacks of requiring changes to the switch hardware and not being able to coexist with legacy TCP. We propose Deadline-Aware Datacenter TCP (D2TCP), a novel transport protocol, which handles bursts, is deadline-aware, and is readily deployable. In designing D2TCP, we make two contributions: (1) D2TCP uses a distributed and reactive approach for bandwidth allocation which fundamentally enables D2TCP's properties. (2) D2TCP employs a novel congestion avoidance algorithm, which uses ECN feedback and deadlines to modulate the congestion window via a gamma-correction function. Using a small-scale implementation and at-scale simulations, we show that D2TCP reduces the fraction of missed deadlines compared to DCTCP and D3 by 75% and 50%, respectively.
Balajee Vamanan, Jahangir Hasan, T.N. Vijaykumar, In ACM SIGCOMM, Aug 2012.
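The gamma-correction idea in miniature: the window backs off in proportion to p = alpha ** d, where alpha is the (smoothed) fraction of ECN-marked packets and d is the deadline imminence factor, with d > 1 for near-deadline flows and d < 1 for far-deadline ones. The sketch below omits the paper's EWMA smoothing and clipping of d, so treat the numbers as illustrative rather than the exact D2TCP algorithm.

```python
# D2TCP-style window update: gamma-correct the congestion signal by deadline.
def d2tcp_window(cwnd, alpha, d):
    p = alpha ** d                   # d > 1 shrinks p: near-deadline flows
    if p > 0:                        # back off less under the same congestion
        return cwnd * (1 - p / 2)
    return cwnd + 1                  # no congestion: additive increase

# Same congestion level (alpha=0.5): the near-deadline flow (d=2) backs off
# less than the far-deadline flow (d=0.5).
print(d2tcp_window(100, 0.5, 2.0))   # → 87.5
print(d2tcp_window(100, 0.5, 0.5))   # ≈ 64.6
print(d2tcp_window(100, 0.0, 2.0))   # → 101 (no marks seen)
```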
Measuring and Fingerprinting Click-Spam in Ad Networks
Advertising plays a vital role in supporting free websites and smartphone apps. Click-spam, i.e., fraudulent or invalid clicks on online ads where the user has no actual interest in the advertiser's site, results in advertising revenue being misappropriated by click-spammers. While ad networks take active measures to block click-spam today, the effectiveness of these measures is largely unknown. Moreover, advertisers and third parties have no way of independently estimating or defending against click-spam.
In this paper, we take the first systematic look at click-spam. We propose the first methodology for advertisers to independently measure click-spam rates on their ads. We also develop an automated methodology for ad networks to proactively detect different simultaneous click-spam attacks. We validate both methodologies using data from major ad networks. We then conduct a large-scale measurement study of click-spam across ten major ad networks and four types of ads. In the process, we identify and perform in-depth analysis on seven ongoing click-spam attacks not blocked by major ad networks at the time of this writing. Our findings highlight the severity of the click-spam problem, especially for mobile ads.
Vacha Dave, Saikat Guha, Yin Zhang, In ACM SIGCOMM, Aug 2012.
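A stylized version of the measurement idea (the paper's estimator is considerably more careful): clicks on a deliberately nonsensical "bluff" ad are presumed invalid, so the engagement rate among bluff clickers calibrates how often invalid traffic looks engaged, under the simplifying assumption here that genuine users always engage on the landing page.

```python
# Toy bluff-ad estimator of the fraction of valid clicks on a real ad.
def estimate_valid_fraction(real_engaged, real_clicks,
                            bluff_engaged, bluff_clicks):
    r = real_engaged / real_clicks     # engagement rate on the real ad
    b = bluff_engaged / bluff_clicks   # engagement rate of invalid traffic
    # Mixture model: r = v * 1 + (1 - v) * b, solved for v.
    return max(0.0, (r - b) / (1 - b))

# 30% engagement on the real ad vs 10% among bluff clickers:
print(round(estimate_valid_fraction(300, 1000, 50, 500), 3))   # → 0.222
```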
Tracking Millions of Flows in High Speed Networks for Application Identification
Today's Internet applications exhibit increased diversity, while Internet routers are still oblivious to this trend. To improve end-to-end application QoS, one solution is to embed the application information explicitly in packet headers, but this requires global changes. Another, local, solution is router-assisted traffic differentiation. To achieve this, both packet identification and flow tracking are required inside the router. While most existing studies focus on the former, less effort has been put into the latter. Given that a large flow table is involved, tracking millions of concurrent flows in a cost-effective manner on a router's line card raises a great space-time challenge. To address this, we design an on-chip/off-chip flow tracking system that accommodates millions of flows and achieves throughput of tens of gigabits per second. By exploiting the temporal locality and heavy-tailedness of Layer-4 traffic, we design the Adaptive Least Frequently Evicted (ALFE) replacement policy to catch elephant flows, and therefore maintain a high cache hit rate. To alleviate the performance penalty due to cache misses, we organize the flow table in a fixed-allocated manner to fully utilize modern DRAM's burst feature. We have implemented a research prototype using FPGA for performance evaluation. The experiment results show that our system can reach an 80% hit rate with a small cache of 16K entries, while achieving 70Mpps throughput. This enables backbone line rate processing. Further, more than 40% power saving can be achieved by our system, which is fast and accurate with only 3% FPGA resource usage.
Tian Pan, Xiaoyu Guo, Chenhui Zhang, Junchen Jiang, Hao Wu, Bin Liu, In IEEE Infocom, Mar 2012.
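A toy stand-in for the on-chip flow cache, not the actual ALFE policy: a fixed-size table that, when full, evicts the least-frequently-hit flow, so that the heavy-tailed "elephant" flows the abstract mentions tend to stay cached.

```python
# Fixed-size flow cache with least-frequently-used eviction (illustrative).
class FlowCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}                 # flow key -> hit count
        self.hits = self.misses = 0

    def lookup(self, key):
        if key in self.counts:
            self.counts[key] += 1
            self.hits += 1
            return True
        self.misses += 1
        if len(self.counts) >= self.capacity:
            coldest = min(self.counts, key=self.counts.get)  # evict coldest
            del self.counts[coldest]
        self.counts[key] = 1
        return False

cache = FlowCache(capacity=2)
for key in ["a", "a", "a", "b", "c", "a"]:   # "a" is the elephant flow
    cache.lookup(key)
print(cache.hits, cache.misses)   # → 3 3 (the elephant survives eviction)
```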
Unreeling Netflix: Understanding and Improving Multi-CDN Movie Delivery
Netflix is the leading provider of on-demand Internet video streaming in the US and Canada, accounting for 29.7% of the peak downstream traffic in the US. Understanding the Netflix architecture and its performance can shed light on how to best optimize its design as well as on the design of similar on-demand streaming services. In this paper, we perform a measurement study of Netflix to uncover its architecture and service strategy. We find that Netflix employs a blend of data centers and Content Delivery Networks (CDNs) for content distribution. We also perform active measurements of the three CDNs employed by Netflix to quantify the video delivery bandwidth available to users across the US. Finally, as improvements to Netflix's current CDN assignment strategy, we propose a measurement-based adaptive CDN selection strategy and a multiple-CDN-based video delivery strategy, and demonstrate their potential for significantly increasing users' average bandwidth.
Vijay Kumar Adhikari, Yang Guo, Fang Hao, Matteo Varvello, Volker Hilt, Moritz Steiner, Zhi-Li Zhang, In IEEE Infocom, Mar 2012.
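The measurement-based adaptive selection can be sketched as: keep a smoothed bandwidth estimate per CDN and stream from the current best. The CDN names, numbers, and the EWMA smoothing constant below are synthetic illustrations, not the paper's measured values.

```python
# Adaptive CDN selection: update an EWMA bandwidth estimate, pick the best.
def pick_cdn(estimates, sample_cdn, sample_mbps, alpha=0.3):
    """Fold a new bandwidth sample into one CDN's estimate; return best CDN."""
    old = estimates.get(sample_cdn, sample_mbps)
    estimates[sample_cdn] = (1 - alpha) * old + alpha * sample_mbps
    return max(estimates, key=estimates.get)

estimates = {"cdn-A": 4.0, "cdn-B": 6.0, "cdn-C": 5.0}
# cdn-B degrades to 0.7*6 + 0.3*2 = 4.8 Mbps, so the client switches to cdn-C:
print(pick_cdn(estimates, "cdn-B", 2.0))   # → cdn-C
```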