by Fredy Künzler
How an Internet provider controls its data traffic
Where does the data that an average Internet customer downloads over their broadband connection actually come from? An Internet provider needs to analyze its traffic sources continuously. What matters, however, is not the traffic of an individual customer but the aggregated consumption of the entire customer base. This approach protects the data and privacy of individual customers (see FAQ).
The Internet is structured as a network of networks. Each of these networks is the infrastructure of one provider and is referred to as an autonomous system (AS). Each autonomous system has a globally unique number, the so-called AS number or ASN. Init7, for example, has the AS number 13030.
The autonomous systems are connected to each other via interconnections: plain Ethernet links that are usually set up in telehouses or data centers. These connections are referred to as “peering”. When two networks connect directly over a dedicated link, this is known as a PNI (Private Network Interconnect); the capacities most commonly used today are 10 or 100 Gbit/s. Another type of peering takes place via Internet Exchanges: switching platforms to which each participating network connects with one or more Ethernet ports. Peering is controlled via the routing protocol BGP4 (Border Gateway Protocol Version 4).
Peering alone, however, is not enough: a provider also needs additional capacity to reach the rest of the Internet. Direct interconnection with every other AS is not feasible, as there are currently over 100,000 ASNs worldwide connected to the Internet. A provider must therefore limit itself to a selection of peering partners. At the same time, its customers need to reach as many ASNs as possible. To achieve this, providers conclude one or more contracts for IP transit: a carrier with a larger network transports the provider's data to and from the autonomous systems that are not directly connected via peering.
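The scaling argument can be made concrete with the formula for a full mesh: connecting each of n networks directly to every other one requires n(n−1)/2 links, and every network would have to operate n−1 interconnections of its own. A minimal sketch, using the article's figure of 100,000 ASNs as input; the function names are purely illustrative.

```python
def full_mesh_links(n: int) -> int:
    """Links needed so that each of n networks has a direct link to every other one."""
    return n * (n - 1) // 2


def links_per_network(n: int) -> int:
    """Direct interconnections a single network would have to operate in a full mesh."""
    return n - 1


if __name__ == "__main__":
    n_asns = 100_000  # order of magnitude quoted in the text
    print(f"links in a full mesh: {full_mesh_links(n_asns):,}")        # 4,999,950,000
    print(f"interconnections per AS: {links_per_network(n_asns):,}")  # 99,999
```

Roughly five billion links in total, and almost 100,000 per network, which is why peering is limited to the networks that matter and IP transit covers the rest.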
But how do you select the relevant peering partners? In practice, only a few dozen to a hundred autonomous systems are really relevant in terms of the volume of data exchanged. Around 80% of the data volume received by an average Internet service provider comes from only about 10 autonomous systems. Unsurprisingly, these include the “Big 5”, sometimes referred to as “GAFAM”: Google, Amazon, Facebook (Meta), Apple and Microsoft. Also important are the Content Delivery Network (CDN) providers Akamai, Cloudflare and Fastly, as well as Netflix and Twitch (Amazon IVS). In Switzerland, the TV provider Zattoo is also important. In Europe, data center and cloud providers such as OVH (France), Eweka (Netherlands) and Hetzner (Germany) also play an important role. Among gaming platforms, Steam (Valve Corporation) is worth mentioning. For outgoing traffic, the largest volumes generally go to ISPs (Internet Service Providers) in the same country or language region. In Switzerland, these are primarily Swisscom, Sunrise, Quickline and Salt.
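How concentrated the traffic is can be checked with a cumulative-share calculation over per-AS volumes: sort the sources by volume and count how many are needed to reach a given share of the total. A minimal sketch; the function name is hypothetical, the ASNs are taken from the documentation range (RFC 5398), and the volumes are made-up illustrative values, not Init7 measurements.

```python
# Hypothetical monthly inbound volumes in TB per source AS (illustrative, not measured).
EXAMPLE_VOLUMES = {
    "AS64496": 420.0,
    "AS64497": 230.0,
    "AS64498": 170.0,
    "AS64499": 90.0,
    "AS64500": 55.0,
    "AS64501": 35.0,
}


def sources_for_share(volume_by_as: dict[str, float], share: float = 0.8) -> list[str]:
    """Return the fewest sources (largest first) whose combined volume reaches `share` of the total."""
    total = sum(volume_by_as.values())
    if total <= 0:
        return []
    selected: list[str] = []
    cumulative = 0.0
    # Walk the sources from largest to smallest until the target share is covered.
    for asn, volume in sorted(volume_by_as.items(), key=lambda item: item[1], reverse=True):
        selected.append(asn)
        cumulative += volume
        if cumulative >= share * total:
            break
    return selected


if __name__ == "__main__":
    top = sources_for_share(EXAMPLE_VOLUMES)
    print(f"{len(top)} of {len(EXAMPLE_VOLUMES)} sources carry 80% of the volume: {top}")
```

With real data, the resulting list is a natural shortlist of candidates for PNI peering.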
NetFlow or sFlow data exported by the routers is evaluated to aggregate and analyze the actual data volume. This flow data is based on sampling: for example, only every thousandth data packet forwarded by a router is recorded, together with its source and destination IP address and its size. These flow samples are then sent to a flow collector and analyzed. A sample of one packet in 1000 still provides a fairly accurate measure of the actual data volume, because the large flows that dominate the volume consist of a huge number of packets and are therefore sampled many times. Think of video streams from YouTube, for example, where millions of data packets are transmitted from the same source to the same destination.
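The extrapolation behind sampled flow data can be sketched in a few lines: with a sampling rate of 1:N, each recorded packet stands for roughly N packets, so summing the sampled packet sizes per source/destination pair and multiplying by N estimates the real volume. This is a simplified model, assuming samples arrive as plain (source IP, destination IP, packet size) tuples; it does not reflect the actual NetFlow or sFlow record formats, and the function names are hypothetical. The simulation samples one large stream at random to show how close the estimate lands.

```python
import random
from collections import defaultdict

SAMPLING_RATE = 1000  # 1:1000 packet sampling, as in the example in the text


def estimate_bytes(samples, sampling_rate=SAMPLING_RATE):
    """Scale sampled packet sizes up to an estimated byte count per (src, dst) pair.

    Each sampled packet is assumed to represent `sampling_rate` packets of the same size.
    """
    totals = defaultdict(int)
    for src, dst, size in samples:
        totals[(src, dst)] += size * sampling_rate
    return dict(totals)


def simulate_stream(n_packets=1_000_000, size=1500, sampling_rate=SAMPLING_RATE, seed=42):
    """Randomly sample one large flow (e.g. a video stream); return (true bytes, estimated bytes)."""
    rng = random.Random(seed)
    src, dst = "198.51.100.10", "203.0.113.20"  # documentation addresses (RFC 5737)
    samples = [(src, dst, size) for _ in range(n_packets) if rng.random() < 1 / sampling_rate]
    estimated = estimate_bytes(samples, sampling_rate).get((src, dst), 0)
    return n_packets * size, estimated


if __name__ == "__main__":
    true_bytes, estimated_bytes = simulate_stream()
    error = abs(estimated_bytes - true_bytes) / true_bytes
    print(f"true: {true_bytes:,} B, estimated: {estimated_bytes:,} B, error: {error:.1%}")
```

For a flow of a million packets, about 1000 samples are taken, so the relative error is typically only a few percent; small flows with a handful of packets, by contrast, may not be sampled at all.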
There are various tools for analyzing the flow samples; at Init7 we use the open source software Akvorado. A particularly impressive feature of this program is its ability to create Sankey diagrams, which can, for example, graphically display the top 15 traffic sources in a given period.
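Under the hood, a Sankey diagram is a list of weighted links between the values of two or more dimensions. A minimal sketch of that aggregation step, grouping estimated volumes by source AS and the type of interconnection the traffic arrived on, then keeping the heaviest links; the dimension names, record layout and function name are illustrative assumptions and not Akvorado's data model or API. The ASNs are from the documentation range and the volumes are made up.

```python
from collections import Counter

# Hypothetical flow records: (source AS, connection type, estimated bytes).
EXAMPLE_FLOWS = [
    ("AS64496", "PNI", 500),
    ("AS64497", "IX", 300),
    ("AS64496", "PNI", 200),
    ("AS64498", "Transit", 50),
    ("AS64497", "PNI", 100),
]


def top_sankey_links(flows, top_n=15):
    """Sum bytes per (source AS, connection type) link and return the top_n heaviest links."""
    links = Counter()
    for source_as, connection_type, nbytes in flows:
        links[(source_as, connection_type)] += nbytes
    return links.most_common(top_n)


if __name__ == "__main__":
    for (source_as, connection_type), nbytes in top_sankey_links(EXAMPLE_FLOWS, top_n=3):
        print(f"{source_as} -> {connection_type}: {nbytes} bytes")
```

Each returned pair becomes one band in the diagram, with its width proportional to the summed volume; a plotting library then only has to draw the bands.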
A provider's network architect tries to route the traffic from these sources (and to these destinations) via PNI peerings wherever possible. This is in the interest of both parties, as the content provider also strives to optimize its traffic. Besides the best quality, a PNI also offers the lowest costs compared to public peering or IP transit. However, peering takes time and money: you travel to industry events such as the European Peering Forum or the Global Peering Forum to negotiate directly with the peering coordinators of other networks. The effort needed to configure and operate the necessary tools should not be underestimated either: besides Akvorado, the open source program “Peering Manager” can be used to at least partially automate the management of router configurations.
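A common way to express the preference for PNI over public peering over IP transit inside a router is the BGP attribute LOCAL_PREF, which BGP evaluates before AS path length when it chooses among several routes to the same prefix. A simplified sketch of that part of the decision process; the LOCAL_PREF values are hypothetical, and the real BGP best-path selection has further tie-breakers (origin, MED, eBGP over iBGP, IGP cost, router ID) that are omitted here. Prefix and ASNs come from the documentation ranges (RFC 5737, RFC 5398).

```python
from dataclasses import dataclass

# Hypothetical LOCAL_PREF values (higher is preferred); each operator picks its own.
LOCAL_PREF = {"pni": 300, "ix": 200, "transit": 100}


@dataclass
class Route:
    prefix: str
    via: str            # "pni", "ix" or "transit"
    as_path: list[int]  # ASNs the announcement traversed, nearest neighbor first

    @property
    def local_pref(self) -> int:
        return LOCAL_PREF[self.via]


def best_route(candidates: list[Route]) -> Route:
    """Simplified BGP choice: highest LOCAL_PREF wins, then the shortest AS path."""
    return max(candidates, key=lambda r: (r.local_pref, -len(r.as_path)))


# Three paths to the same prefix, learned over transit, an IX and a PNI.
EXAMPLE_ROUTES = [
    Route("192.0.2.0/24", "transit", [64497, 64496]),
    Route("192.0.2.0/24", "ix", [64496]),
    Route("192.0.2.0/24", "pni", [64496]),
]

if __name__ == "__main__":
    print(best_route(EXAMPLE_ROUTES).via)  # pni
```

Because LOCAL_PREF is compared first, the PNI route wins even when the IX route has an equally short AS path; only if the PNI fails and its route is withdrawn does traffic fall back to the IX, and then to transit.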
At Init7, we have been working to route our traffic as efficiently as possible for well over 20 years and put considerable effort into it, which is reflected in the quality of our backbone. Other providers spare themselves this effort and simply buy capacity from carriers, usually at the cheapest price, as the purchasers often lack the expertise to assess its quality.
Init7 therefore operates PNIs with several dozen other networks, including, of course, the major content providers such as Google, Amazon, Facebook (Meta), Apple and Microsoft. These links almost always have a bandwidth of 100 Gbit/s and are provisioned redundantly at different locations, so that an alternative path is available if one fails. In addition, Init7 is connected to around 30 Internet exchanges. This information is documented in PeeringDB, a database operated by the peering community that lists relevant information about existing and potential peers. IP transit accounts for only around two to three percent of the total traffic handled by Init7, because we aim to control the quality of our Internet connections ourselves as far as possible, for the benefit of our customers.
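Whether redundant provisioning actually covers a failure can be checked per peer: take away the location carrying the most capacity toward that peer and verify that the remaining links still carry the peak traffic. A minimal sketch; the inventory layout, the function name, the peer and site names and all numbers are hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: (peer, location, capacity in Gbit/s) per PNI link.
EXAMPLE_LINKS = [
    ("peer-a", "site-1", 100.0),
    ("peer-a", "site-2", 100.0),
    ("peer-b", "site-1", 100.0),
    ("peer-c", "site-1", 100.0),
    ("peer-c", "site-3", 100.0),
]
# Hypothetical prime-time peaks in Gbit/s per peer.
EXAMPLE_PEAKS = {"peer-a": 80.0, "peer-b": 40.0, "peer-c": 150.0}


def survives_single_site_failure(links, peak_gbps):
    """For each peer, check that peak traffic still fits after losing any one location."""
    capacity = defaultdict(lambda: defaultdict(float))
    for peer, location, gbps in links:
        capacity[peer][location] += gbps
    result = {}
    for peer, by_location in capacity.items():
        total = sum(by_location.values())
        # Worst case: the location with the most capacity toward this peer goes down.
        worst_remaining = total - max(by_location.values())
        result[peer] = worst_remaining >= peak_gbps.get(peer, 0.0)
    return result


if __name__ == "__main__":
    print(survives_single_site_failure(EXAMPLE_LINKS, EXAMPLE_PEAKS))
```

In this example only peer-a passes: peer-b has no second location at all, and peer-c is redundant in principle, but its peak would no longer fit on a single remaining 100 Gbit/s link.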
Interconnection capacity must therefore be measured continuously and upgraded when necessary; otherwise overbooking can occur. Overbooking happens in two places. One is broadband access, where too many customers share the available capacity, as we described in the blog post “Overbooking – how providers divide up the bandwidth”. The other is interconnection. If the available peering capacity is exhausted at prime time, say on a Sunday evening at 8:30 pm, and no upgrades are made, end customers will notice: when streaming Netflix, YouTube or Zattoo, they see “pixel mush” instead of HD quality. One data source may work perfectly while another is overloaded. Most end customers cannot tell where the bottleneck actually lies.
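Measuring interconnection capacity on an ongoing basis can be reduced to comparing a high percentile of the measured load with the link capacity and planning an upgrade before prime-time peaks reach it. A minimal sketch using the nearest-rank 95th percentile of 5-minute averages and an 80% upgrade threshold; both values, the function names and the sample data are assumptions chosen for illustration, not Init7's actual policy.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty sequence of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def needs_upgrade(samples_gbps, capacity_gbps, pct=95, threshold=0.8):
    """Flag a link whose pct-th percentile load exceeds threshold * capacity."""
    return percentile(samples_gbps, pct) > threshold * capacity_gbps


if __name__ == "__main__":
    # Hypothetical 5-minute averages in Gbit/s for one 100 Gbit/s PNI over an evening.
    evening = [40.0] * 12 + [70.0] * 12 + [88.0] * 12
    print(needs_upgrade(evening, capacity_gbps=100.0))  # True
```

Using a percentile rather than the absolute maximum keeps a single short spike from triggering an upgrade, while sustained prime-time load does.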
Some providers, usually larger ones, deliberately do not upgrade their congested interconnections to meet demand, for monetary reasons: they demand payment from the traffic sources, which is problematic under antitrust law. We described how these cartels work some time ago in our blog post “To peer or not to peer – Kartelle im Internet” (Cartels on the Internet).