CacheBrowser Experiment: Bypassing the Great Firewall of China Without a Proxy Using Content Caching


Today, a significant portion of all Internet content is distributed through CDNs. Yet there is little research on how censors extend their influence to such networks. Scientists from the University of Massachusetts analyzed possible methods of blocking CDN content, using the practices of the Chinese authorities as an example, and also developed a tool to bypass such blocks.

We have prepared an overview of the main conclusions and results of this experiment.

Introduction

Censorship is a global threat to free speech on the Internet and to free access to information. It is possible largely because the Internet borrowed the "end-to-end communication" model from the telephone networks of the 1970s. This model makes it possible to block access to content or user communication without serious effort or expense, simply based on the IP address. Several methods exist, from blocking the address that hosts forbidden content to preventing users from even learning that address through DNS manipulation.

However, the development of the Internet has also led to new ways of disseminating information. One of them is the use of cached content to improve performance and speed up communication. Today, CDN providers handle a significant share of the world's traffic: Akamai alone, the leader in this segment, accounts for up to 30% of global static web traffic.

A CDN is a distributed system for delivering Internet content at maximum speed. A typical CDN consists of servers in various geographical locations that cache content in order to serve it to the users closest to each server. This can significantly increase the speed of online communication.

In addition to improving the quality of service for end users, CDN hosting helps content creators scale their projects, reducing the load on the infrastructure.

Censoring CDN Content

Despite the fact that CDN traffic already accounts for a significant share of all information transmitted via the Internet, there is still almost no research on how censors in the real world approach its control.

The authors of the study began by examining censorship techniques that can be applied to CDNs. They then examined the real mechanisms that the Chinese authorities are using.

First, let's look at the possible censorship methods and how well each of them applies to CDNs.

IP Filtering

This is the easiest and cheapest Internet censorship technique. With this approach, the censor identifies and blacklists the IP addresses of resources hosting prohibited content. The Internet service providers under the censor's control then stop delivering packets sent to those addresses.

IP-based blocking is one of the most common methods of censoring the Internet. Most commercial network devices include features that perform such blocking without significant computational cost.

However, this method is not very suitable for blocking CDN traffic due to some properties of the technology itself:

Distributed caching - to ensure the best availability of content and optimize performance, CDNs cache customer content on a large number of edge servers in geographically distributed locations. To filter such content by IP, the censor would need to find the addresses of all edge servers and blacklist them. This undermines the main advantage of the method: in the usual case, blocking a single server cuts off access to prohibited content for a large number of people at once.
Shared IPs - commercial CDN providers share their infrastructure (edge servers, the mapping system, and so on) between multiple customers. As a result, forbidden CDN content is served from the same IP addresses as permitted content, and any attempt at IP filtering also blocks a huge number of sites and content that are of no interest to the censor.
Highly dynamic IP assignment - to optimize load balancing and improve the quality of service, the mapping between edge servers and end users changes quickly and dynamically. For example, Akamai updates the returned IP addresses every minute. This makes it almost impossible to associate addresses with prohibited content.
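The shared-IP problem can be illustrated with a small sketch. The mapping below is entirely hypothetical (the domain names are placeholders and the addresses come from the documentation ranges, not from any real CDN); it only shows how blacklisting every edge IP that serves one forbidden domain also affects unrelated domains hosted on the same servers.

```python
# Hypothetical snapshot of which customer domains each edge IP serves.
# Addresses are from documentation ranges, not real CDN servers.
EDGE_TO_DOMAINS = {
    "192.0.2.10": {"blocked.example", "news.example", "shop.example"},
    "192.0.2.11": {"blocked.example", "bank.example"},
    "198.51.100.7": {"video.example"},
}


def ips_serving(domain, edge_map):
    """Return every edge IP that serves the given domain."""
    return {ip for ip, domains in edge_map.items() if domain in domains}


def collateral_damage(target, edge_map):
    """Return the other domains that share at least one edge IP with
    `target`, i.e. the domains hit if all of those IPs are blacklisted."""
    affected = set()
    for ip in ips_serving(target, edge_map):
        affected |= edge_map[ip]
    affected.discard(target)
    return affected


if __name__ == "__main__":
    print(sorted(ips_serving("blocked.example", EDGE_TO_DOMAINS)))
    print(sorted(collateral_damage("blocked.example", EDGE_TO_DOMAINS)))
```

In this toy data set, blocking one domain requires blacklisting two addresses and affects three unrelated domains. With dynamic IP assignment, the mapping itself would also have to be refreshed constantly.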

DNS interference

In addition to IP filtering, another popular censorship method is DNS interference. With this approach, the censor makes sure that users never learn the IP addresses of resources with prohibited content. In other words, the intervention happens at the level of domain name resolution. There are several ways to do this, including hijacking DNS connections, DNS poisoning, and blocking DNS queries for banned sites.

This is a very effective way of blocking, but it can be circumvented with non-standard DNS resolution methods, for example out-of-band channels. Therefore, censors typically combine DNS blocking with IP filtering. But, as stated above, IP filtering is not effective for censoring CDN content.
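A toy model of this situation is sketched below. It performs no real DNS queries: the record table, the blacklist, and the bogus address are all hypothetical. It only shows why a poisoned resolver stops helping the censor once the client obtains the mapping through a channel the censor does not control.

```python
# All names and addresses are hypothetical placeholders.
REAL_RECORDS = {
    "blocked.example": "192.0.2.10",
    "allowed.example": "192.0.2.20",
}
CENSORED_NAMES = {"blocked.example"}
BOGUS_IP = "203.0.113.1"  # invalid address returned in poisoned answers


def censored_resolve(name):
    """Model of a resolver under the censor's control:
    blacklisted names get a poisoned (invalid) answer."""
    if name in CENSORED_NAMES:
        return BOGUS_IP
    return REAL_RECORDS.get(name)


def out_of_band_resolve(name, trusted_records):
    """Model of a lookup over a channel the censor does not see,
    for example a mapping stored locally on the client."""
    return trusted_records.get(name)


if __name__ == "__main__":
    print(censored_resolve("blocked.example"))
    print(out_of_band_resolve("blocked.example", REAL_RECORDS))
```

If the censor cannot also block the returned address, which is the case for shared CDN edge servers, the out-of-band answer is enough to reach the content.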

URL / Keyword Filtering with DPI

Modern network monitoring equipment can analyze specific URLs and keywords in transmitted data packets. This technology is called DPI (deep packet inspection). Such systems look for forbidden words and resources and then interfere with the connection: the matching packets are simply dropped.

This method is effective, but more complex and resource-intensive, since it requires reassembling all data packets sent within the inspected streams.

CDN content can be protected from such filtering in the same way as "regular" content: in both cases, encryption (that is, HTTPS) helps.
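The following sketch illustrates both points under simplifying assumptions. The keyword list is hypothetical, "packets" are plain byte strings, and the encrypted payload is a fixed stand-in for an opaque TLS record rather than real ciphertext. A per-packet check misses a keyword split across two packets, reassembly catches it, and neither finds anything in the opaque data.

```python
# Hypothetical blacklist used by the toy filter.
FORBIDDEN_KEYWORDS = (b"blocked.example",)


def contains_forbidden(data):
    """Case-insensitive keyword search in a byte string."""
    lowered = data.lower()
    return any(keyword in lowered for keyword in FORBIDDEN_KEYWORDS)


def inspect_per_packet(packets):
    """Cheap check: look at every packet in isolation."""
    return any(contains_forbidden(packet) for packet in packets)


def inspect_reassembled(packets):
    """Costlier check: rebuild the stream first, then search it."""
    return contains_forbidden(b"".join(packets))


request = b"GET /page.html HTTP/1.1\r\nHost: blocked.example\r\n\r\n"
# Split the request in the middle of the keyword, as TCP segmentation might.
cut = request.index(b"blocked.example") + 4
split_packets = [request[:cut], request[cut:]]

# Stand-in for an encrypted record: a TLS-like header plus opaque bytes.
opaque_record = bytes.fromhex("1703030010" "9f86d081884c7d659a2feaa0c55ad015")

if __name__ == "__main__":
    print(inspect_per_packet(split_packets))
    print(inspect_reassembled(split_packets))
    print(inspect_reassembled([opaque_record]))
```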

Besides searching for keywords or URLs of prohibited resources, DPI tools can be used for more advanced analysis, such as statistical analysis of traffic (online or offline) and protocol identification. These methods are extremely resource-intensive, and at the moment there is no evidence that censors use them on any significant scale.

Self-censorship of CDN providers

If the censor is a state, it has every opportunity to ban CDN providers that do not comply with local laws governing access to content in the country. Self-censorship cannot be circumvented in any way: if a CDN provider wants to operate in a certain country, it will be forced to comply with local laws, even if they restrict freedom of speech.

How China Censors CDN Content

The Great Firewall of China is rightly considered the most effective and advanced Internet censorship system.

Research methodology

The scientists experimented with a Linux node located inside China. They also had access to several computers abroad. First, the researchers verified that the node was subject to the same censorship as other Chinese users by trying to open various prohibited sites from this machine. This confirmed that the same level of censorship applied.

The list of CDN-hosted sites blocked in China was taken from GreatFire.org. The blocking method was then analyzed for each case.

According to open sources, Akamai is the only major player in the CDN market with its own infrastructure in China. Other providers involved in the study: CloudFlare, Amazon CloudFront, EdgeCast, Fastly, and SoftLayer.

During the experiments, the researchers discovered the addresses of Akamai edge servers inside the country and then tried to retrieve cached content through them. Forbidden content could not be accessed (the servers returned an HTTP 403 Forbidden error). Evidently, the company practices self-censorship in order to keep operating in the country. At the same time, access to these resources remained open outside the country.

Providers without infrastructure in China do not use self-censorship for local users.

In the case of other providers, the most commonly used method of blocking was DNS filtering - requests to blocked sites are resolved to invalid IP addresses. At the same time, the firewall does not block the CDN edge servers themselves, since they store both forbidden and allowed information.

With unencrypted traffic, the authorities can use DPI to block individual pages of a site. With HTTPS, they can only restrict access to the entire domain, which also blocks permitted content.

In addition, China has its own CDN providers, including networks such as ChinaCache, ChinaNetCenter and CDNetworks. All these companies fully comply with the laws of the country and block prohibited content.

CacheBrowser: a tool for bypassing CDN blocks

As the analysis showed, censors find it difficult to block CDN content. The researchers therefore decided to go further and develop a tool for bypassing blocks that does not rely on proxy technology.

The main idea of the tool is that censors have to interfere with DNS in order to block CDNs, but domain name resolution is not actually needed to load CDN content. The user can get the required content by contacting the edge server on which it is already cached directly.

The diagram below shows the system architecture.

Client software is installed on the user's computer; a regular browser is used to access the content.

When requesting a URL or a part of already requested content, the browser sends a query to the local DNS system (LocalDNS) to get the IP address of the host. Regular DNS is queried only for domains that are not yet in the LocalDNS database. The Scraper module continuously scans the requested URLs and looks for potentially blocked domain names. Scraper then asks the Resolver module to resolve the newly discovered blocked domains; Resolver performs the task and adds an entry to LocalDNS. After that, the browser's DNS cache is cleared to remove the existing DNS records for the blocked domain.

If the Resolver module cannot determine which CDN provider a domain belongs to, it asks the Bootstrapper module for help.
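The interaction between these modules can be sketched roughly as follows. This is not the researchers' code: the function names, the blocklist, and the contents of the two lookup tables are assumptions made for illustration, and the Bootstrapper is reduced to a placeholder that always fails.

```python
from urllib.parse import urlsplit

# Hypothetical data; the real tool keeps similar tables in text files.
BLOCKED_DOMAINS = {"blocked.example"}
CUSTOMER_TO_CDN = {"blocked.example": "cdn-a"}
CDN_TO_IP = {"cdn-a": ["192.0.2.10", "192.0.2.11"]}


def scrape(urls, local_dns):
    """Scraper: blocked domains in the URLs that LocalDNS lacks."""
    found = []
    for url in urls:
        host = urlsplit(url).hostname
        if host in BLOCKED_DOMAINS and host not in local_dns and host not in found:
            found.append(host)
    return found


def bootstrap(domain):
    """Bootstrapper placeholder: would identify the CDN by other means."""
    return None


def resolve(domain):
    """Resolver: map a domain to its CDN, then to a known edge IP."""
    cdn = CUSTOMER_TO_CDN.get(domain) or bootstrap(domain)
    if cdn is None:
        return None
    edges = CDN_TO_IP.get(cdn, [])
    return edges[0] if edges else None


def update_local_dns(urls, local_dns):
    """Add an entry to LocalDNS for every newly seen blocked domain."""
    for domain in scrape(urls, local_dns):
        ip = resolve(domain)
        if ip is not None:
            local_dns[domain] = ip
    return local_dns


if __name__ == "__main__":
    urls = ["http://blocked.example/a.html", "http://allowed.example/"]
    print(update_local_dns(urls, {}))
```

Picking the first known edge server is an arbitrary choice in this sketch. Clearing the browser's DNS cache is left out, as it depends on the browser.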

How it works in practice

The client software was implemented for Linux, but it can easily be ported to other systems, including Windows. The browser is a regular Mozilla Firefox. The Scraper and Resolver modules are written in Python, and the Customer-to-CDN and CDN-to-IP databases are stored in .txt files. The LocalDNS database is the regular /etc/hosts file on Linux.
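Because LocalDNS is just the hosts file, adding an entry amounts to editing text. The helper below is a hypothetical sketch of such an update, not the tool's actual code. It works on a string rather than on /etc/hosts itself, so it can be tried without root privileges.

```python
def upsert_hosts_entry(hosts_text, domain, ip):
    """Return hosts-file text in which `domain` maps to `ip`.

    Existing lines that list the domain are dropped (a simplification:
    any other host names on the same line are dropped with them).
    Comments and unrelated entries are kept unchanged.
    """
    kept = []
    for line in hosts_text.splitlines():
        # Ignore a trailing comment, then split into address and names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and domain in fields[1:]:
            continue
        kept.append(line)
    kept.append(f"{ip}\t{domain}")
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    original = "127.0.0.1\tlocalhost\n203.0.113.1\tblocked.example\n"
    print(upsert_hosts_entry(original, "blocked.example", "192.0.2.10"), end="")
```

Writing the result back to /etc/hosts requires administrator rights, and the browser's DNS cache still has to be cleared for the new entry to take effect.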

As a result, for a blocked URL on blocked.com, the script reads the IP address of the edge server from the /etc/hosts file and sends an HTTP GET request for BlockedURL.html with the following HTTP header fields:

Host: blocked.com
User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1
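A request of this kind can be built with Python's standard library as sketched below. The function name and the edge server address are placeholders; only the Host and User-Agent values come from the text above. The sketch constructs the request object without sending it and covers plain HTTP only.

```python
import urllib.request

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0.1"
)


def build_edge_request(edge_ip, domain, path):
    """Build a GET request addressed to the edge server's IP while
    naming the blocked domain in the Host header."""
    url = f"http://{edge_ip}/{path.lstrip('/')}"
    headers = {"Host": domain, "User-Agent": USER_AGENT}
    return urllib.request.Request(url, headers=headers, method="GET")


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address standing in for an edge server.
    req = build_edge_request("192.0.2.10", "blocked.com", "/BlockedURL.html")
    print(req.full_url)
    print(req.get_header("Host"))
```

Passing the object to urllib.request.urlopen would actually send it. The connection goes to the IP address, so no DNS lookup for the blocked domain takes place, while the Host header tells the edge server which customer's content is being requested.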

The Bootstrapper module is implemented using the free digwebinterface.com tool. This DNS resolver cannot be blocked and it answers DNS queries on behalf of many geographically distributed DNS servers in different network regions.

Using this tool, researchers managed to access Facebook from their Chinese node - although the social network has long been blocked in China.

Conclusion

The experiment showed that the difficulties censors face when trying to block CDN content can be exploited to build a system for bypassing blocks. Such a tool makes it possible to bypass blocks even in China, which operates one of the most powerful online censorship systems.