Web Content Cartography
Recent studies show that a significant part of Internet traffic is delivered through Web-based applications. To cope with the increasing demand for Web content, large-scale content hosting and delivery infrastructures, such as data centers and content distribution networks, are continuously being deployed. Being able to identify and classify such hosting infrastructures is helpful not only to content producers and ISPs, but also to the research community at large. For example, such a classification makes it possible to quantify the degree of hosting infrastructure deployment in the Internet or the replication of Web content.
In this paper, we introduce Web Content Cartography, i.e., the identification and classification of content hosting and delivery infrastructures. We propose a lightweight and fully automated approach to discover hosting infrastructures based only on DNS measurements and BGP routing table snapshots. Our experimental results show that the approach is feasible even with a limited number of well-distributed vantage points. We find that some popular content is served exclusively from specific regions and ASes. Furthermore, our classification enables us to derive content-centric AS rankings that complement existing AS rankings and shed light on recent observations about shifts in interdomain traffic and the AS topology.
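The abstract does not spell out the clustering algorithm, so the following is only a minimal sketch of the general idea under simplifying assumptions. It maps IP addresses returned by DNS (collected from several vantage points) to BGP prefixes and origin ASes using longest-prefix match. It then groups hostnames whose answers fall into the same set of origin ASes and ranks ASes by how many hostnames they serve. The routing table, the hostnames, and the helper names (`build_rib`, `origin_of`, `footprints`, `group_by_footprint`, `rank_ases`) are all hypothetical. The addresses come from the RFC 5737 documentation ranges and the AS numbers from the documentation range.

```python
import ipaddress
from collections import defaultdict

# Hypothetical BGP snapshot: prefix -> origin AS (illustrative values only).
BGP_TABLE = {
    "192.0.2.0/24": 64500,
    "198.51.100.0/24": 64501,
    "198.51.100.128/25": 64502,
    "203.0.113.0/24": 64500,
}

# Hypothetical DNS answers: hostname -> IPs seen across all vantage points.
DNS_ANSWERS = {
    "www.example.com": ["192.0.2.10", "203.0.113.5"],
    "img.example.net": ["198.51.100.200"],
    "static.example.org": ["198.51.100.7", "192.0.2.99"],
}


def build_rib(table):
    """Parse prefixes and sort longest-first so the first match is the longest match."""
    rib = [(ipaddress.ip_network(p), asn) for p, asn in table.items()]
    rib.sort(key=lambda entry: entry[0].prefixlen, reverse=True)
    return rib


def origin_of(ip, rib):
    """Return (prefix, origin AS) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    for net, asn in rib:
        if addr in net:
            return str(net), asn
    return None


def footprints(dns_answers, rib):
    """Map each hostname to the sets of prefixes and origin ASes that serve it."""
    result = {}
    for host, ips in dns_answers.items():
        prefixes, ases = set(), set()
        for ip in ips:
            hit = origin_of(ip, rib)
            if hit is not None:
                prefixes.add(hit[0])
                ases.add(hit[1])
        result[host] = (frozenset(prefixes), frozenset(ases))
    return result


def group_by_footprint(fp):
    """Group hostnames that are served from exactly the same set of origin ASes."""
    groups = defaultdict(list)
    for host, (_, ases) in fp.items():
        groups[ases].append(host)
    return dict(groups)


def rank_ases(fp):
    """Rank ASes by the number of hostnames they serve (a simple content-centric count)."""
    counts = defaultdict(int)
    for _, ases in fp.values():
        for asn in ases:
            counts[asn] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


rib = build_rib(BGP_TABLE)
fp = footprints(DNS_ANSWERS, rib)
print(group_by_footprint(fp))
print(rank_ases(fp))
```

Sorting the table longest-prefix-first keeps the lookup obviously correct at the cost of a linear scan. A real BGP snapshot with hundreds of thousands of prefixes would call for a radix trie instead.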