How Internet blocking works: an overview of modern methods with a real example

A group of Indian researchers has published a review of the Internet blocking methods that government agencies impose today, using their own country as the example. They studied the mechanisms Internet providers use to restrict access to prohibited content, assessed how accurate those mechanisms are, and tested how easily the blocks can be circumvented. Below are the main points of that work.

Input data

In recent years, researchers in many countries have studied the blocking methods used in states regarded as "not free", such as China or Iran. However, even democracies like India have recently built a massive infrastructure for censoring the Internet.

For the study, the researchers compiled a list of 1,200 sites potentially blocked in the country, drawing on open sources such as Citizen Lab and Herdict. They then obtained Internet access through the nine most popular Internet service providers.

To detect censorship and blocked sites, the OONI tool was used at first.

OONI vs. custom scripts for detecting blocks

The researchers initially intended to rely on OONI, a popular censorship detection tool. During the experiment, however, it turned out to produce many false positives: a manual check of its results revealed numerous inaccuracies.

OONI's poor detection accuracy may stem from outdated mechanisms. For example, to detect DNS filtering, the tool compares the IP address that Google DNS (assumed to be uncensored) returns for a host with the IP address returned for the same host by the Internet provider.

If the addresses do not match, OONI reports blocking. On the modern Internet, however, differing IP addresses prove nothing by themselves; they may simply indicate that the site is served through a CDN.
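The weakness of this comparison can be shown with a minimal sketch. The function name and the IP addresses (taken from documentation-only ranges) are hypothetical, and this is not OONI's actual code, only the heuristic as described above.

```python
def naive_dns_check(control_ips, isp_ips):
    """Flag blocking when the two resolvers share no address at all.

    This mirrors the heuristic described above; it is an illustration,
    not OONI's real implementation.
    """
    return set(control_ips).isdisjoint(isp_ips)


# Hypothetical answers for a CDN-hosted site: both addresses are valid,
# they just point at different edge servers.
control = ["203.0.113.10"]  # answer from the public (control) resolver
isp = ["198.51.100.7"]      # answer from the provider's resolver

print(naive_dns_check(control, isp))  # True, although nothing is blocked
```

A check of this kind needs a second signal, such as whether the returned address actually serves the site, before a mismatch can be called blocking.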

The researchers therefore had to write their own scripts to detect blocking. Below is an overview of popular ways to block content on the Internet and an analysis of their effectiveness under modern conditions.

How blocking is carried out, or what middleboxes are

The analysis showed that every type of blocking observed was implemented by devices embedded in the network. The researchers call them middleboxes: they intercept user traffic, analyze it, and, if they detect an attempt to connect to a banned site, inject special packets into the traffic.

To detect middleboxes, the researchers developed their own method, Iterative Network Tracing (INT), which builds on the principles of the traceroute utility. In essence, it sends web requests for a blocked site with progressively increasing TTL values in the IP headers. The smallest TTL at which a censored response comes back indicates how many hops away the middlebox is.
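The idea of probing with a growing TTL can be illustrated with a small simulation. This is a sketch under stated assumptions, not the researchers' tool: `probe` is a hypothetical callable standing in for "send one request for a blocked site with this TTL and report whether a censored reply came back", and the network path is simulated rather than real.

```python
from typing import Callable, Optional


def locate_middlebox(probe: Callable[[int], bool], max_ttl: int = 30) -> Optional[int]:
    """Return the smallest TTL at which a probe triggers a censored response.

    `probe(ttl)` is a hypothetical callable: it sends one request for a
    blocked site with the given IP TTL and returns True if a block page
    or forged reply comes back.
    """
    for ttl in range(1, max_ttl + 1):
        if probe(ttl):
            return ttl  # the request first reached the middlebox at this hop
    return None  # no censored response within max_ttl hops


# Simulated path: the middlebox sits at hop 4, so any request that
# survives at least 4 hops triggers it.
def simulated_probe(ttl: int, middlebox_hop: int = 4) -> bool:
    return ttl >= middlebox_hop


print(locate_middlebox(simulated_probe))  # 4
```

On a real connection, the TTL of outgoing packets could be set per socket with `sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)`; how the original scripts did this is not stated in the review.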

Middlebox interception mechanism

DNS blocking

DNS resolution is the first step in accessing any website: the domain name in the URL the user enters is resolved to the associated IP address. With DNS blocking, censors intervene at exactly this step: the resolver returns an invalid IP address to the user, and as a result the site simply does not open (DNS poisoning).

Another approach is DNS injection: a middlebox between the client and the resolver intercepts the DNS request and sends its own response containing an incorrect IP address.

To identify DNS blocking by Internet providers, the researchers used Tor with exit nodes in countries without censorship: if a site opens through Tor but not over a plain connection through the provider, it is being blocked.
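The decision rule behind this comparison can be written down in a few lines. The function name and the three labels are hypothetical; the "inconclusive" case is an added assumption covering sites that fail both ways and are probably just offline.

```python
def classify_reachability(opens_directly: bool, opens_via_tor: bool) -> str:
    """Compare a direct fetch with a fetch through an uncensored Tor exit."""
    if opens_directly:
        return "reachable"       # the provider is not blocking this site
    if opens_via_tor:
        return "likely blocked"  # works elsewhere, fails only via the provider
    return "inconclusive"        # fails everywhere: probably a dead site


print(classify_reachability(opens_directly=False, opens_via_tor=True))  # likely blocked
```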

After identifying the sites blocked via DNS, the researchers determined which blocking method was used.

Iterative network tracing method: the client sends special requests (DNS / HTTP GET) containing a blocked site and an ever-increasing TTL

TCP/IP packet filtering

Blocking by filtering on packet headers is considered a popular censorship method. Many published studies try to detect exactly this way of blocking sites.

In practice, the problem is that this method is easily confused with ordinary failures that cause network trouble and reduce throughput. Unlike HTTP blocking, TCP/IP filtering gives the user no notification that the site is blocked; it simply does not open. Separating genuine blocking from ordinary network failures and errors is very difficult.

Nevertheless, the researchers tried. They used the TCP handshake: handshake packets were tunneled through Tor with exit nodes in uncensored countries. For sites that could be reached through Tor, the handshake was then attempted five more times in a row through the provider, about two seconds apart. If every attempt failed, intentional filtering was highly likely.
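The repeated-handshake check can be sketched as follows. This is an illustration, not the researchers' script: the function names, the port, and the timeout are assumptions, while the five attempts and the two-second pause come from the description above. The `connect` parameter is injectable so the logic can be exercised without touching the network.

```python
import socket
import time
from typing import Callable


def tcp_handshake_ok(host: str, port: int = 80, timeout: float = 5.0) -> bool:
    """Return True if a TCP three-way handshake with host:port completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def looks_filtered(host: str,
                   connect: Callable[[str], bool] = tcp_handshake_ok,
                   attempts: int = 5,
                   delay: float = 2.0) -> bool:
    """Return True only if every one of `attempts` handshakes fails."""
    for i in range(attempts):
        if connect(host):
            return False       # one success rules out consistent filtering
        if i < attempts - 1:
            time.sleep(delay)  # pause before the next attempt
    return True
```

Calling `looks_filtered("blocked.example")` only makes sense for a host that has already been shown to be reachable through Tor; otherwise repeated failures may just mean the site is down.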

As a result, this blocking method was not detected at any of the tested Internet providers.

HTTP filtering

HTTP filtering, however, was detected at five of the nine providers. This method analyzes the contents of HTTP packets and can be implemented with the same intermediate network devices (middleboxes).

To identify HTTP filtering, the researchers built Tor circuits ending in countries without Internet censorship. They then compared the content returned for requests to blocked sites made from inside the country with the content returned through Tor.
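One simple way to compare the two responses is a text similarity score. The review does not say how the comparison was done, so the sketch below is an assumption: it uses Python's standard `difflib`, and the function name and the 0.5 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def responses_differ(direct_body: str, tor_body: str, threshold: float = 0.5) -> bool:
    """Return True if the two bodies are less similar than `threshold`.

    The threshold is an arbitrary illustrative value, not one taken
    from the study.
    """
    similarity = SequenceMatcher(None, direct_body, tor_body).ratio()
    return similarity < threshold


page = "welcome to the news portal, today's top stories follow"
print(responses_differ(page, page))  # False: identical content
```

Dynamic pages differ slightly between any two fetches, so a real comparison would need a tolerance like this threshold rather than an exact match.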

One of the first tasks was to pinpoint the moment at which blocking occurs. With some providers, for example, an HTTP GET request was answered with an HTTP 200 OK packet carrying a blocking notice and the TCP FIN bit set; it was this packet that made the client's browser disconnect from the target site. Yet a packet from the real site also arrived afterwards. In such cases it was unclear what triggered the blocking: the client's request or the site's response.

A simple manipulation settled the question: in the HTTP header of the GET request, the Host field name was replaced with HOST. That was enough for the blocked site to open, which shows that the censors inspect only client requests, not server responses.
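The trick works because HTTP header field names are case-insensitive for servers, while a filter that matches exact bytes is not. The sketch below illustrates this; `build_get` and `naive_middlebox_matches` are hypothetical names, and the exact-match filter is an assumption about how such a middlebox might behave, not a description of the real devices.

```python
def build_get(host: str, path: str = "/", host_field: str = "Host") -> bytes:
    """Build a raw HTTP/1.1 GET request with a configurable Host field name."""
    return (f"GET {path} HTTP/1.1\r\n"
            f"{host_field}: {host}\r\n"
            "Connection: close\r\n\r\n").encode("ascii")


def naive_middlebox_matches(request: bytes, blocked_host: str) -> bool:
    """Toy censor (assumption): looks for the exact bytes 'Host: <name>'."""
    return f"Host: {blocked_host}".encode("ascii") in request


normal = build_get("blocked.example")
evasive = build_get("blocked.example", host_field="HOST")

print(naive_middlebox_matches(normal, "blocked.example"))   # True
print(naive_middlebox_matches(evasive, "blocked.example"))  # False
```

A standards-compliant web server treats both requests the same, so only the filter is affected by the change of case.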

Conclusion: do all providers block?

Often, individual Internet providers do not block sites themselves but rely on the providers that manage "neighboring" networks. In the study under review, several Internet service providers were never observed doing any blocking of their own, yet sites blocked in the country still failed to open for their users.