Python and fast HTTP clients

Nowadays, almost any Python application you write will need an HTTP client to talk to HTTP servers. The ubiquity of REST APIs has made HTTP tooling a respected resident of countless software projects. That is why every programmer should know the patterns for working with HTTP connections efficiently.

There are many HTTP clients for Python. The most widely used among them, and also the easiest to work with, is requests. Today, this client is the de facto standard.
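
For reference, the simplest way to use it looks like this (a minimal sketch, with example.org standing in for a real URL):

import requests

response = requests.get("http://example.org")
print(response.status_code)
print(len(response.content))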



Persistent Connections



The first optimization to consider when working with HTTP is to use persistent connections to web servers. Persistent connections have been standard since HTTP 1.1, yet many applications still do not use them. The reason is easy to understand: when the requests library is used in its simple form (for example, via its get method), the connection to the server is closed as soon as the response is received. To avoid this, the application should use a Session object, which allows open connections to be reused:



import requests

session = requests.Session()

session.get("http://example.com")
# The connection is reused for the second request
session.get("http://example.com")





Connections are kept in a connection pool (10 connections by default). The pool size can be customized:



import requests

session = requests.Session()
adapter = requests.adapters.HTTPAdapter(
    pool_connections=100,
    pool_maxsize=100)
session.mount('http://', adapter)
response = session.get("http://example.org")





Reusing a TCP connection for multiple HTTP requests gives the application several performance advantages: the TCP handshake (and, for HTTPS, the TLS negotiation) does not have to be repeated for every request, which reduces latency and lowers the CPU and memory load on both the client and the server.





HTTP 1.1 also supports request pipelining. Pipelining lets you send several requests over the same connection without waiting for the responses to previously sent ones, i.e. send requests in "batches". Unfortunately, the requests library does not support this feature. Besides, pipelined requests may not be as fast as requests processed truly in parallel: the server must return the responses in the same order in which it received the requests, so the result is a FIFO ("first in, first out") processing scheme that is not the most efficient.
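
For the curious, here is a rough sketch of what pipelining looks like at the raw socket level; it assumes the server keeps the connection open and accepts pipelined requests, and it skips proper response parsing:

import socket

# Two identical GET requests for example.org, to be sent back-to-back.
request = (b"GET / HTTP/1.1\r\n"
           b"Host: example.org\r\n"
           b"\r\n")

with socket.create_connection(("example.org", 80), timeout=5) as sock:
    # Send the second request without waiting for the first response.
    sock.sendall(request * 2)
    data = b""
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    except socket.timeout:
        pass  # stop reading once the server goes quiet

# The responses arrive in the order the requests were sent (FIFO).
print(data.count(b"HTTP/1.1"), "responses received")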



Parallel request processing



requests has another serious drawback: it is a synchronous library. A call such as requests.get("http://example.org") blocks the program until the complete HTTP response has been received. Having the application sit idle while it waits is a clear downside of this scheme. Could the program do something useful instead of just waiting?



A cleverly designed application can mitigate this problem by using a thread pool, such as the one provided by concurrent.futures. This makes it easy to parallelize HTTP requests:



from concurrent import futures

import requests

with futures.ThreadPoolExecutor(max_workers=4) as executor:
    futures = [
        executor.submit(
            lambda: requests.get("http://example.org"))
        for _ in range(8)
    ]

results = [
    f.result().status_code
    for f in futures
]

print("Results: %s" % results)





This very useful pattern is implemented in the requests-futures library, which also uses Session objects transparently for the developer:



from requests_futures import sessions

session = sessions.FuturesSession()

futures = [
    session.get("http://example.org")
    for _ in range(8)
]

results = [
    f.result().status_code
    for f in futures
]

print("Results: %s" % results)





By default, a worker with two threads is created, but the program can easily change this by passing the max_workers argument, or even its own executor, to the FuturesSession object. For example, it might look like this:



FuturesSession(executor=ThreadPoolExecutor(max_workers=10))
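
Put together with the imports it needs, that snippet might be used like this (a sketch, with example.org standing in for a real URL):

from concurrent.futures import ThreadPoolExecutor

from requests_futures import sessions

# Hand the session a larger executor so up to 10 requests run concurrently.
session = sessions.FuturesSession(
    executor=ThreadPoolExecutor(max_workers=10))

futures = [session.get("http://example.org") for _ in range(10)]
print("Results: %s" % [f.result().status_code for f in futures])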





Asynchronous requests



As already mentioned, the requests library is completely synchronous, so the application blocks while waiting for the server's response, which hurts performance. One solution is to execute HTTP requests in separate threads, but threads put extra load on the system, and they introduce parallelism into the program, which does not suit everyone.



Starting with Python 3.5, asynchronous programming with asyncio is part of the standard library. The aiohttp library provides an asynchronous HTTP client built on asyncio. It lets the application send a series of requests and keep working, without waiting for the response to one request before sending the next. Unlike HTTP pipelining, aiohttp sends the requests in parallel over multiple connections, which avoids the FIFO problem described above. Here is what using aiohttp looks like:



import aiohttp
import asyncio

async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return response

loop = asyncio.get_event_loop()
coroutines = [get("http://example.com") for _ in range(8)]
results = loop.run_until_complete(asyncio.gather(*coroutines))
print("Results: %s" % results)





All the approaches described above (Session objects, threads, concurrent.futures or asyncio) offer different ways to make HTTP clients faster.



Performance



The following example sends requests to the httpbin.org server, whose API can, among other things, simulate a system that takes a long time to respond (here, 1 second). The code implements all the techniques discussed above and measures how long each of them takes:



import contextlib
import time

import aiohttp
import asyncio
import requests
from requests_futures import sessions

URL = "http://httpbin.org/delay/1"
TRIES = 10


@contextlib.contextmanager
def report_time(test):
    t0 = time.time()
    yield
    print("Time needed for `%s' called: %.2fs"
          % (test, time.time() - t0))


with report_time("serialized"):
    for i in range(TRIES):
        requests.get(URL)


session = requests.Session()
with report_time("Session"):
    for i in range(TRIES):
        session.get(URL)


session = sessions.FuturesSession(max_workers=2)
with report_time("FuturesSession w/ 2 workers"):
    futures = [session.get(URL)
               for i in range(TRIES)]
    for f in futures:
        f.result()


session = sessions.FuturesSession(max_workers=TRIES)
with report_time("FuturesSession w/ max workers"):
    futures = [session.get(URL)
               for i in range(TRIES)]
    for f in futures:
        f.result()


async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            await response.read()

loop = asyncio.get_event_loop()
with report_time("aiohttp"):
    loop.run_until_complete(
        asyncio.gather(*[get(URL)
                         for i in range(TRIES)]))





Here are the results obtained after running this program:



Time needed for `serialized' called: 12.12s
Time needed for `Session' called: 11.22s
Time needed for `FuturesSession w/ 2 workers' called: 5.65s
Time needed for `FuturesSession w/ max workers' called: 1.25s
Time needed for `aiohttp' called: 1.19s





Here is a chart of the results:

[Chart: the performance of the different methods for making HTTP requests]



It is not surprising that the simple synchronous scheme turned out to be the slowest: the requests are executed one after another, without reusing the connection, so 10 requests take about 12 seconds to complete.



Using a Session object, and therefore reusing connections, saves about 8% of the time. That is already a very good and very easy win: anyone who cares about performance should use at least a Session object.



If your system and your program allow you to work with threads, it is worth using them to parallelize the requests. Threads, however, put some extra load on the system; they are not "free": they have to be created, started, and then waited on to finish.



If you want a fast asynchronous HTTP client and you are not writing for older versions of Python, take a serious look at aiohttp. It is the fastest and most scalable solution, capable of handling hundreds of concurrent requests.
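
As an illustration, here is a rough sketch of how such a workload might be kept under control by limiting how many requests are in flight at once (the URL, the limit of 20 and the total of 200 requests are arbitrary values chosen for the example):

import asyncio

import aiohttp

URL = "http://example.org"   # stand-in URL for illustration
LIMIT = 20                   # maximum number of requests in flight
TOTAL = 200                  # total number of requests to send

async def fetch(session, semaphore, url):
    # The semaphore caps how many coroutines hold a connection at a time.
    async with semaphore:
        async with session.get(url) as response:
            return response.status

async def main():
    semaphore = asyncio.Semaphore(LIMIT)
    async with aiohttp.ClientSession() as session:
        statuses = await asyncio.gather(
            *[fetch(session, semaphore, URL) for _ in range(TOTAL)])
    print("Received %d responses" % len(statuses))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())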



The alternative to aiohttp, managing hundreds of threads in parallel, is not a particularly good one.



Streaming Data Processing



Another optimization of network access that can improve application performance is streaming the data. In the standard scheme, the application sends a request and the body of the response is downloaded all at once. The stream parameter supported by the requests library, as well as the content attribute of aiohttp, allow you to move away from this scheme.



Here is what streaming data with requests looks like:



import requests

# Using `with` ensures the response is closed and the connection
# is released back to the pool.
with requests.get('http://example.org', stream=True) as r:
    print(list(r.iter_content()))





And here is how to stream data using aiohttp:



import aiohttp
import asyncio

async def get(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.content.read()

loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(get("http://example.com"))]
loop.run_until_complete(asyncio.wait(tasks))
print("Results: %s" % [task.result() for task in tasks])





Not having to load the full response content at once matters when you need to avoid the potential allocation of hundreds of megabytes of memory for nothing. If the program does not need the response as a whole and can work with individual chunks of it, it is usually better to use the streaming approach. For example, if you are saving the server's response to a file, reading and writing it in chunks is far more memory-efficient than reading the entire response body, allocating a huge amount of memory, and only then writing everything to disk.
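
A download written in that style might look like the following sketch (the URL, the file name and the 64 KiB chunk size are placeholders chosen for the example):

import requests

URL = "http://example.org/big-file"   # placeholder URL
CHUNK_SIZE = 64 * 1024                # read 64 KiB at a time

with requests.get(URL, stream=True) as response:
    with open("big-file", "wb") as output:
        # Only one chunk is held in memory at any moment.
        for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
            output.write(chunk)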



Summary



I hope this overview of the different ways to optimize HTTP clients will help you choose what works best for your Python application.



Dear readers! If you know of other ways to optimize HTTP requests in Python applications, please share them.







