Facts and figures (instead of an introduction)
- In 2010, the average web page size was 481 kB; in 2019, it was already 1,936.7 kB (detailed statistics). Over the past three years, this indicator has grown by 314.7%. Studies show that web pages keep getting larger.
- Streaming audio and video services are gaining popularity. As of April 2019, the number of subscribers to the popular Spotify service was 217 million.
- According to surveys, 25% of users leave a web page if it takes more than 4 seconds to load. 74% of users opening a site on a mobile device will not wait if loading takes more than 5 seconds. 46% of users stop using a web service if it is slow.
What do these facts tell us?
First, that Internet content is getting "heavier" every year.
Second, that the speed of websites and services plays a huge role today. If it is too low, you risk losing your audience and, in many cases, revenue as well. One reliable way to solve this problem is to use a Content Delivery Network (CDN).
Selectel has been offering a CDN service since 2014, and we have studied its technical side in detail. In this article, we will talk about how a modern CDN is built and what its features are.
Key Terms
Before discussing the features of CDNs in substance, let's define the basic terminology.
CDN (Content Delivery Network) is a geographically distributed network infrastructure that delivers content quickly to users of web services and sites. The CDN's servers are placed geographically so as to minimize response time for the site's or service's users.
Origin - the server on which the source files or data distributed through the CDN are stored.
PoP (point of presence) - a caching server within the CDN, located in a specific geographical location. Such servers are also called edge servers.
Dynamic content - content generated on the server at the moment a request is received (for example, modified for a particular user or loaded from a database).
Static content - content stored on the server in an unchanged form (for example, binary files, audio and video files, JS and CSS).
A bit of history and theory
The rapid growth of the Internet in the mid-1990s led to a situation where servers could no longer withstand the load. With the servers of that time (some of which had weaker specifications than today's most powerful laptop), engineers had to resort to various tricks: search, for example, for "hierarchical caching" and "information superhighway"; today these phrases appear only in articles on the history of Internet technologies. To understand how content distribution technologies have evolved, let's make a theoretical digression.
Note: distributing static and dynamic content puts different kinds of load on a server. For dynamic content, whose generation involves database queries, processor speed and the amount of RAM matter most.
For static content, which in most cases is very "heavy" and has to be downloaded very quickly, network speed matters first of all. Technical solutions for accelerating static content delivery aim to provide horizontal scaling without complex two-way synchronization with the main server.
To reduce the load, in the late 1990s web service owners began serving static and dynamic content from different servers. Large web projects with huge audiences spread around the world began placing static content servers in different geographical locations.
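A minimal sketch of this split, assuming a front-end rule that decides by file extension: requests for stylesheets, scripts, images and media go to a static server, and everything else goes to the application server. The host app.example.com and the extension list are hypothetical; static.example.com follows the naming used later in this article.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical hosts for illustration.
STATIC_HOST = "static.example.com"
APP_HOST = "app.example.com"

# Assumed set of extensions treated as static content.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".svg", ".woff2", ".mp3", ".mp4", ".zip"}


def choose_backend(url: str) -> str:
    """Return the host that should serve a request: static files vs. dynamic pages."""
    path = urlsplit(url).path  # the query string is dropped here
    # splitext keeps the leading dot, e.g. "/img/logo.png" -> ".png"
    extension = posixpath.splitext(path)[1].lower()
    return STATIC_HOST if extension in STATIC_EXTENSIONS else APP_HOST


print(choose_backend("https://www.example.com/img/logo.png"))  # static.example.com
print(choose_backend("https://www.example.com/cart?id=42"))    # app.example.com
```

In practice this decision is usually made in web server or load balancer configuration rather than in application code; the point is only that the two kinds of content can be routed, and scaled, independently.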
Then, in the late 1990s, companies appeared for which static content distribution became one of the main lines of business. In 1998, MIT graduate student Daniel Lewin and MIT applied mathematics professor Tom Leighton founded Akamai. Today it is one of the largest (if not the largest) CDN providers in the world.
By 2004, more than 3,000 companies were already using CDNs, and total content delivery costs were as high as $20 million per month.
The number of CDNs around the world keeps growing: these services are offered both by large international companies (for example, Akamai, Amazon, Cloudflare) and by numerous regional providers (detailed reviews).
CDNs are used for more than distributing static content in the strict sense of the word: spreading content across multiple servers around the world also helps keep it available during peak periods.
Over the past 10-12 years, another type of content has become widespread on the Internet: streaming (numerous audio and video streaming services are very popular today, with audiences in the millions, if not billions). Delivering streaming content is another common use case for CDNs.
Let's look in more detail at how CDNs work and what to consider when using them.
How does CDN work?
Imagine a web service used by people throughout Russia. Its main servers are located in St. Petersburg, while its users are spread across the country: say, in Krasnodar (2,604.2 km from St. Petersburg), Novosibirsk (3,826.1 km), Irkutsk (5,661.7 km) or Vladivostok (9,602.4 km). The farther a user is from the origin server, the longer the response takes. In the early days of the Runet, at the very beginning of the 2000s, residents of Yuzhno-Sakhalinsk or Petropavlovsk-Kamchatsky could wait 5 or even 10 minutes for a web page to load fully.
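Distance alone sets a floor on response time. A back-of-the-envelope sketch, assuming signals in optical fiber travel at roughly 200,000 km/s (about two-thirds of the speed of light in a vacuum) and, for simplicity, that the cable follows the distances quoted above; real routes are usually longer and add routing and queuing delays.

```python
# Assumption: signals in optical fiber travel at roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000

# Distances from St. Petersburg quoted in the text, in kilometers.
DISTANCES_KM = {
    "Krasnodar": 2604.2,
    "Novosibirsk": 3826.1,
    "Irkutsk": 5661.7,
    "Vladivostok": 9602.4,
}


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on one round trip (there and back), ignoring routing and queuing delays."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


for city, km in DISTANCES_KM.items():
    print(f"{city}: at least {min_rtt_ms(km):.1f} ms per round trip")
```

For Vladivostok this gives at least 96 ms per round trip. A page load that needs ten sequential round trips (DNS lookup, TCP and TLS handshakes, HTTP requests) would spend close to a second on propagation alone, whereas a point of presence in or near the user's city shrinks each trip to a small fraction of that.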
With a CDN, things work differently: a user from Vladivostok is directed to the geographically closest caching server in the CDN, which makes delivery of static content much faster.
Dynamic content is accelerated by other mechanisms: the CDN provider shortens the network route by carrying traffic over its own network.
Another interesting CDN use case is live streaming: Internet users from all over the world can watch or listen to broadcasts from events in a browser (and sometimes in a dedicated app). It works like this: one or more origin servers receive the broadcast stream from a video camera and immediately relay it to the points of presence. The origin servers do not serve content to clients themselves. Streaming CDNs also include load balancers that redirect requests to the edge servers that are least loaded at the moment.
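The load-balancing step can be sketched as a "least-loaded" choice. This is a simplified illustration, assuming each edge server reports its current number of viewers and a fixed capacity (both hypothetical fields); production balancers also weigh distance, health checks and available bandwidth.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EdgeServer:
    name: str
    active_streams: int  # viewers currently served by this edge
    capacity: int        # assumed maximum number of concurrent viewers

    @property
    def load(self) -> float:
        return self.active_streams / self.capacity


def pick_edge(edges: List[EdgeServer]) -> EdgeServer:
    """Send a new viewer to the edge with the lowest load ratio that still has room."""
    available = [e for e in edges if e.active_streams < e.capacity]
    if not available:
        raise RuntimeError("all edge servers are at capacity")
    chosen = min(available, key=lambda e: e.load)
    chosen.active_streams += 1  # account for the viewer we just assigned
    return chosen


edges = [
    EdgeServer("edge-a", active_streams=80, capacity=100),
    EdgeServer("edge-b", active_streams=30, capacity=100),
    EdgeServer("edge-c", active_streams=50, capacity=50),  # full
]
print(pick_edge(edges).name)  # edge-b
```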
How is content distribution organized?
As a rule, setting up static content delivery via a CDN involves the following steps:
Step 1: Move the site's static content to a separate domain, for example, static.example.com. This will be the origin.
Step 2: Create a domain for delivery through the CDN, such as cdn.example.com.
Step 3: Connect the CDN service with the provider. To do this, the web service owner must give the provider:
the domain from which the CDN will fetch static content: static.example.com;
the domain from which content will be served: cdn.example.com.
Step 4: In your DNS settings (for example, at your DNS registrar), create a CNAME record pointing cdn.example.com to the domain that the CDN provider assigns when you connect.
For example, in the Selectel CDN, such a domain looks like 85e77c09-bc03-43bf-b8f3-9492ae33390f.selcdn.net, where the 85e77c09-bc03-43bf-b8f3-9492ae33390f part is generated automatically.
Step 5: On your site, change the domain of the static content you plan to serve via the CDN to cdn.example.com.
The user types www.example.com into the browser's address bar and receives an HTML page from it, while all static content, such as images, is loaded from the CDN (from cdn.example.com).
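Step 5 is often handled by a CMS plugin or a template setting. As a minimal sketch, assuming the pages contain absolute or protocol-relative asset URLs on static.example.com, a regular-expression rewrite of src and href attributes could look like this; a real site would more likely change its asset base URL in configuration or use an HTML parser.

```python
import re

ORIGIN_HOST = "static.example.com"
CDN_HOST = "cdn.example.com"

# src/href attributes whose URL starts with http://, https:// or // followed by the origin host.
# The lookahead makes sure the host name ends there (so "static.example.com.evil.net" is skipped).
_ASSET_URL = re.compile(
    r"""(\b(?:src|href)=["'])((?:https?:)?//)""" + re.escape(ORIGIN_HOST) + r"""(?=[/"'])"""
)


def rewrite_static_urls(html: str) -> str:
    """Point asset URLs on the static domain at the CDN domain, keeping the scheme."""
    return _ASSET_URL.sub(lambda m: m.group(1) + m.group(2) + CDN_HOST, html)


page = '<img src="https://static.example.com/img/logo.png">'
print(rewrite_static_urls(page))  # <img src="https://cdn.example.com/img/logo.png">
```

Links to www.example.com itself are left alone, so the HTML pages keep coming from the main server while images and other assets are requested from cdn.example.com.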
Static content intended for distribution is often placed in object storage (we wrote about this six years ago). There are many plugins and extensions for popular CMSs (WordPress, Joomla, Drupal, 1C-Bitrix and others) that let you integrate with cloud storage services and deliver static content via a CDN.
After the CDN is connected, the web service keeps running on the same origin server. Cached parts of the site are copied to the CDN's servers; the system finds the server closest to the user and serves the site's static content from it as quickly as possible.
Note one important point: CDN servers are not like file servers that host content for later download. CDNs are used not for storing content but for caching it according to specific algorithms.
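Which caching algorithm a CDN uses is provider-specific. As one common example, not necessarily what any particular CDN uses, here is a least-recently-used (LRU) cache: it holds a fixed number of objects and, when full, evicts the object that has gone unrequested the longest. This is why a cache is not a permanent file store.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Fixed-size cache that evicts the least recently used object when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()  # path -> content, least recently used first

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None  # cache miss: the PoP would have to fetch from origin
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.put("/a.css", b"...")
cache.put("/b.js", b"...")
cache.get("/a.css")          # /a.css is now the most recently used
cache.put("/c.png", b"...")  # cache is full, so /b.js is evicted
print(cache.get("/b.js"))    # None
```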
How does a CDN know where the nearest caching server is?
As a rule, two popular technologies are used to direct users to the right CDN server: GeoDNS and Anycast.
GeoDNS lets you bind multiple IP addresses to a single domain name. Depending on the geographical location (determined by the IP address the DNS query came from), the user is directed to the nearest server. You can read more about GeoDNS in this article (in English).
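A minimal GeoDNS sketch, assuming the client's location has already been estimated (a real GeoDNS server would typically look up the querying resolver's IP address in a geolocation database). The points of presence, their approximate coordinates and the documentation-range IP addresses are hypothetical; the DNS answer is simply the address of the PoP with the smallest great-circle (haversine) distance.

```python
import math
from typing import Dict, Tuple

# Hypothetical points of presence: approximate (latitude, longitude) and documentation-range IPs.
POPS: Dict[str, Tuple[Tuple[float, float], str]] = {
    "St. Petersburg": ((59.94, 30.31), "203.0.113.10"),
    "Novosibirsk": ((55.03, 82.92), "203.0.113.20"),
    "Vladivostok": ((43.12, 131.89), "203.0.113.30"),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_pop(client_location: Tuple[float, float]) -> str:
    return min(POPS, key=lambda name: haversine_km(client_location, POPS[name][0]))


def geodns_answer(client_location: Tuple[float, float]) -> str:
    """The A record returned to this client: the IP address of the nearest PoP."""
    return POPS[nearest_pop(client_location)][1]


print(geodns_answer((43.12, 131.89)))  # a client in Vladivostok -> 203.0.113.30
```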
With Anycast, all points of presence share the same addresses, but traffic is routed to the servers in the user's own region. When accessing www.example.com, the user reaches the nearest point of presence: the user's ISP receives several route announcements for the same address from different networks that have points of presence, and the ISP's router selects the closest of them. The response likewise travels back along the shortest route.
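In BGP terms, "closest" usually means the route that looks shortest to the router, for example the one passing through the fewest autonomous systems (AS), rather than the one closest in kilometers. A deliberately simplified sketch, assuming only AS-path length matters; the prefix (from a documentation range), the private AS numbers and the PoP names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Announcement:
    prefix: str               # the anycast prefix shared by all points of presence
    pop: str                  # which point of presence this route leads to
    as_path: Tuple[int, ...]  # autonomous systems the announcement passed through


def best_route(announcements: List[Announcement], prefix: str) -> Announcement:
    """Pick the route an ISP router would prefer, using only AS-path length.

    Real BGP compares other attributes first (e.g. local preference) and has more tie-breakers.
    """
    candidates = [a for a in announcements if a.prefix == prefix]
    return min(candidates, key=lambda a: len(a.as_path))


routes = [
    Announcement("198.51.100.0/24", "Frankfurt", (64512, 64513, 64520)),
    Announcement("198.51.100.0/24", "Moscow", (64514, 64520)),
]
print(best_route(routes, "198.51.100.0/24").pop)  # Moscow
```

Because every point of presence announces the same prefix, no DNS trickery is needed: each ISP's routers simply end up preferring a different PoP.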
How is content cached?
The most common approach is the first-access scheme: the longest wait falls on the first user, whose request has to go all the way to the origin server. All subsequent users receive the data cached at the point of presence closest to them.
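The first-access scheme can be sketched as a point of presence that goes to the origin only on a cache miss. The origin function and the paths below are hypothetical stand-ins for a real HTTP fetch.

```python
from typing import Callable, Dict, Tuple


class EdgeCache:
    """A point of presence using the first-access (pull-on-miss) scheme."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes]):
        self.name = name
        self.fetch_from_origin = fetch_from_origin  # hypothetical stand-in for a request to origin
        self.cache: Dict[str, bytes] = {}
        self.origin_fetches = 0

    def get(self, path: str) -> Tuple[bytes, str]:
        if path in self.cache:
            return self.cache[path], "HIT"  # later visitors: served locally
        # First visitor: pay the full round trip to the origin, then keep a copy.
        self.origin_fetches += 1
        body = self.fetch_from_origin(path)
        self.cache[path] = body
        return body, "MISS"


def origin(path: str) -> bytes:
    return b"content of " + path.encode()


pop = EdgeCache("Vladivostok", origin)
print(pop.get("/app.js")[1])  # MISS (first user)
print(pop.get("/app.js")[1])  # HIT (everyone after)
print(pop.origin_fetches)     # 1
```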
Geography matters a lot here: for example, after a request from a user in Rio de Janeiro, the data is cached on a server in Brazil, which does nothing for access speed for users in Paris or London.
To overcome this limitation, regional retrieval is used: neighboring CDN servers fetch content from each other instead of going to the origin server.
In most CDNs, a user requesting static content is directed to the nearest point of presence and receives a cached copy of that content from it. If the nearest point of presence does not have the files, it searches neighboring points of presence and serves the response from there. Akamai calls this procedure tiered distribution.
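The neighbor lookup can be sketched by extending a pull-on-miss cache: on a miss, a point of presence first asks its neighbors and only then goes to the origin. This is a simplified, one-hop illustration with hypothetical names; real tiered setups (for example, with dedicated parent caches) are more elaborate.

```python
from typing import Callable, Dict, List, Optional, Tuple


class TieredPop:
    """A point of presence that checks neighboring PoPs before going to the origin."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], bytes],
                 neighbors: Optional[List["TieredPop"]] = None):
        self.name = name
        self.fetch_from_origin = fetch_from_origin
        self.neighbors = neighbors or []
        self.cache: Dict[str, bytes] = {}

    def get(self, path: str) -> Tuple[bytes, str]:
        if path in self.cache:
            return self.cache[path], "local"
        # Ask neighboring points of presence before bothering the origin.
        for neighbor in self.neighbors:
            if path in neighbor.cache:
                body = neighbor.cache[path]
                self.cache[path] = body
                return body, "neighbor:" + neighbor.name
        body = self.fetch_from_origin(path)
        self.cache[path] = body
        return body, "origin"


origin_calls: List[str] = []


def origin(path: str) -> bytes:
    origin_calls.append(path)
    return b"video segment " + path.encode()


paris = TieredPop("Paris", origin)
london = TieredPop("London", origin, neighbors=[paris])
print(paris.get("/live/seg1.ts")[1])   # origin
print(london.get("/live/seg1.ts")[1])  # neighbor:Paris
print(len(origin_calls))               # 1
```

The origin is contacted once for the segment, even though two points of presence now serve it.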
What are CDNs used for?
Most often, a CDN is used to reduce response times for cached content, which, as mentioned above, reduces the loss of visitors due to slow loading and thus possible financial losses. A CDN also reduces the risk of losing access to content if the primary server fails: cached content remains available while you restore the main server.
Using a CDN significantly reduces the load on the main server, which helps cope with load peaks. Modern CDNs can withstand very high loads: at the end of 2018, Akamai announced a record CDN traffic volume of 72 Tbps.
Nowadays, CDNs are also actively used to distribute streaming content.
What is important to remember when working with a CDN?
Like any technology, CDNs have their own peculiarities.
The very first problem that CDN-based web services may encounter is a stale cache. The following situation is quite likely: a file has been changed on the main server, but the caching servers still hold the old version. This is especially important when frequently updated content is distributed via a CDN (photos from the scene of events, new software versions and so on).
To ensure that "fresh" content is delivered, modern CDNs offer a cache purge function, which removes content from the cache. In addition, site and service owners can control caching themselves using validator headers (see our article on this topic for recommendations).
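Validator headers let an edge server ask the origin whether its copy is still current instead of downloading the file again. A minimal sketch of the origin side of a conditional GET, assuming a content-hash ETag (servers may compute ETags differently) and a hypothetical max-age of 60 seconds; the comparison is simplified, since a real If-None-Match header may list several ETags or use weak validators.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # One possible scheme: a quoted, truncated SHA-256 of the content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_conditional_get(current_body: bytes,
                           if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    etag = make_etag(current_body)
    if if_none_match == etag:
        # The cached copy is still current: no body needs to be sent.
        return 304, {"ETag": etag}, b""
    # Missing or outdated validator: send the full, current content.
    return 200, {"ETag": etag, "Cache-Control": "max-age=60"}, current_body


cached_etag = make_etag(b"version 1")
print(handle_conditional_get(b"version 1", cached_etag)[0])  # 304
print(handle_conditional_get(b"version 2", cached_etag)[0])  # 200
```

When its copy expires, the edge server sends the stored ETag in If-None-Match: a 304 lets it keep serving the cached file, while a 200 replaces it. A cache purge, by contrast, simply drops the entry so that the next request goes to the origin.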
Another difficulty is related to blocking: if services that share IP addresses with you on the CDN provider's network are blocked for some reason, your site may be blocked along with them. This problem can be solved, though: on request, CDN providers can change your IP address.
Who needs a CDN?
A CDN is primarily needed by projects with a large audience in different regions or countries. The benefits are clear: lower latency, fast content delivery, greater convenience and, as a result, more satisfied users.
A CDN can also be useful for mobile app developers: according to statistics, users often abandon an app because of speed problems. Specialized solutions focused on delivering content to mobile devices have appeared recently; they are called Mobile CDNs. Many large CDN providers, such as Akamai and Amazon, offer such services.
Projects that distribute games, multimedia content and streaming (as mentioned above) also need a CDN.
What to look for when choosing a CDN provider (instead of a conclusion)
The number of users of your web service is growing, your audience is expanding, and you are thinking of connecting a CDN to optimize and speed up static content delivery and reduce the load on your main servers.
What should you look for when choosing a CDN provider?
First, the number of points of presence. This matters especially for projects with a large international audience. It is worth finding out about points of presence in the regions that matter most to you and comparing them with where your site's potential audience is.
Second, interconnections with telecom operators. This is another important factor that determines CDN speed and efficiency. For example, a CDN provider with points of presence in 100 cities but few interconnections may have higher latency than a provider with points of presence in only 5 cities but many more interconnections with telecom operators.
Unfortunately, in most cases CDN providers do not publish this information, so the only way to check is by testing.
Third, the availability of additional services and features. Many CDN providers offer consumption statistics analysis, caching policy management, HTTP header management, preloading of very "heavy" content (200 MB and larger), and full or selective cache purging.
In addition, when choosing a CDN provider, check whether it supports the technologies and protocols you need (HTTP/2, IPv6, SSL certificates and others).