Web Content Cartography

Recent studies show that a significant part of Internet traffic is delivered through Web-based applications. To cope with the increasing demand for Web content, large-scale content hosting and delivery infrastructures, such as data centers and content distribution networks, are continuously being deployed. Being able to identify and classify such hosting infrastructures is helpful not only to content producers and ISPs, but also to the research community at large, for example to quantify the degree of hosting infrastructure deployment in the Internet or the replication of Web content.
In this paper, we introduce Web Content Cartography, i.e., the identification and classification of content hosting and delivery infrastructures. We propose a lightweight and fully automated approach to discover hosting infrastructures based only on DNS measurements and BGP routing table snapshots. Our experimental results show that our approach is feasible even with a limited number of well-distributed vantage points. We find that some popular content is served exclusively from specific regions and ASes. Furthermore, our classification enables us to derive content-centric AS rankings that complement existing AS rankings and shed light on recent observations about shifts in interdomain traffic and the AS topology.
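
To illustrate the general idea of combining DNS measurements with a BGP snapshot, the following is a minimal sketch, not the paper's implementation. It maps each resolved IP address to its longest matching BGP prefix and origin AS, and then groups hostnames whose answers fall into exactly the same prefixes and ASes. The exact-match grouping is a deliberate simplification chosen for this example. All names (BGP_TABLE, DNS_ANSWERS, longest_prefix_match, footprint, group_by_footprint) and all data are hypothetical; the addresses come from documentation ranges and the AS numbers from the private-use range.

```python
import ipaddress
from collections import defaultdict

# Hypothetical BGP snapshot: (prefix, origin AS) pairs.
BGP_TABLE = [
    ("192.0.2.0/24", 64500),
    ("198.51.100.0/24", 64501),
    ("198.51.100.128/25", 64502),
    ("203.0.113.0/24", 64503),
]

# Hypothetical DNS measurements: hostname -> IPs seen across vantage points.
DNS_ANSWERS = {
    "www.example.com": ["192.0.2.10", "198.51.100.5"],
    "img.example.com": ["198.51.100.7", "192.0.2.99"],
    "cdn.example.net": ["198.51.100.200"],
    "static.example.org": ["203.0.113.1", "10.0.0.1"],
}


def longest_prefix_match(ip, table):
    """Return (network, asn) for the most specific covering prefix, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for prefix, asn in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, asn)
    return best


def footprint(ips, table):
    """Summarize a hostname's answers as (set of prefixes, set of ASes)."""
    prefixes, ases = set(), set()
    for ip in ips:
        match = longest_prefix_match(ip, table)
        if match is None:
            continue  # address not covered by the snapshot; skip it
        prefixes.add(str(match[0]))
        ases.add(match[1])
    return frozenset(prefixes), frozenset(ases)


def group_by_footprint(dns_answers, table):
    """Group hostnames that share an identical network footprint."""
    groups = defaultdict(list)
    for hostname in sorted(dns_answers):
        groups[footprint(dns_answers[hostname], table)].append(hostname)
    return dict(groups)


if __name__ == "__main__":
    for (prefixes, ases), hosts in group_by_footprint(DNS_ANSWERS, BGP_TABLE).items():
        print(sorted(ases), sorted(prefixes), hosts)
```

In this toy data, www.example.com and img.example.com end up in the same group because their answers map to the same two prefixes and ASes, which hints that they may be served by the same infrastructure. A linear scan over the table is used only for brevity; a real routing table would call for a prefix trie or similar structure.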

Selected Publications

Ager, Bernhard and Mühlbauer, Wolfgang and Smaragdakis, Georgios and Uhlig, Steve (2011). Web Content Cartography. Proceedings of Internet Measurement Conference (IMC '11). ACM, 585–600.

