This article focuses on how Instagram uses types to document HTTP APIs and to enforce contracts when working with them.
Situation overview
When you open the Instagram mobile client, it talks over HTTP to the JSON API of our Python (Django) server.
Here are some numbers that give a sense of how complex the API behind the mobile client is:
- Over 2000 endpoints on the server.
- Over 200 top-level fields in a client data object that represents an image, video, or story in an application.
- Hundreds of programmers who write server code (and even more who deal with the client).
- Hundreds of commits to the server code every day that modify the API in order to support new features.
We use types to document our complex, constantly evolving HTTP APIs and to enforce contracts when working with them.
Types
Let's start from the very beginning. Type annotations for Python code were standardized in PEP 484. But why add type annotations to code at all?
Consider a function that fetches information about a Star Wars character:
def get_character(id, calendar):
    if id == 1000:
        return Character(
            id=1000,
            name="Luke Skywalker",
            birth_year="19BBY" if calendar == Calendar.BBY else ...
        )
    ...
To understand this function, you have to read its code. Doing so tells you the following:
- It takes the integer identifier (id) of the character.
- It takes a value from the corresponding enumeration (calendar). For example, Calendar.BBY stands for "Before the Battle of Yavin."
- It returns information about the character as an entity with fields for the character's identifier, name, and birth year.
The function has an implicit contract, which the programmer has to reconstruct every time they read the function's code. But code is written once and read many times, so this is not a good way to work.
Moreover, it is difficult to verify that callers of the function adhere to this implicit contract. It is just as difficult to verify that the contract is respected in the body of the function. In a large code base, such situations lead to errors.
Now consider the same function with type annotations:
def get_character(id: int, calendar: Calendar) -> Character: ...
Type annotations let you express the contract of this function explicitly. To understand what the function takes as input and what it returns, you only need to read its signature. A type checker can statically analyze the function and verify that the code complies with the contract. This eliminates a whole class of errors!
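To make the contract concrete, here is a minimal runnable sketch of the annotated function. The definitions of Calendar and Character, the single BBY member, and the hard-coded data are assumptions made for illustration; the article does not show them at this point.

```python
from dataclasses import dataclass
from enum import Enum


class Calendar(Enum):
    # Only BBY is mentioned in the article; other members are omitted here.
    BBY = "BBY"  # "Before the Battle of Yavin"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


def get_character(id: int, calendar: Calendar) -> Character:
    # Hard-coded data stands in for a real data store (assumption).
    if id == 1000:
        return Character(
            id=1000,
            name="Luke Skywalker",
            birth_year="19" + calendar.value,
        )
    raise KeyError(id)


luke = get_character(1000, Calendar.BBY)

# A static type checker such as mypy is expected to reject a call like the
# one below, because "1000" is a str and the signature asks for an int:
# get_character("1000", Calendar.BBY)
```

The signature alone now tells a reader, and a type checker, what goes in and what comes out.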
Types for various HTTP APIs
We'll develop an HTTP API that returns information about Star Wars characters. To describe the explicit contract of this API, we will use type annotations.
Our API should accept the character identifier (id) as a URL parameter and the value of the calendar enumeration as a query parameter. The API should return a JSON response with character information.
Here's what the API request looks like and the response it returns:
curl -X GET https://api.starwars.com/characters/1000?calendar=BBY
{
    "id": 1000,
    "name": "Luke Skywalker",
    "birth_year": "19BBY"
}
To implement this API in Django, you first register the URL path and the view function that receives HTTP requests made to this path and returns a response.
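from django.urls import path

urlpatterns = [
    # path() understands the "<id>" route syntax; url() expects a regular expression
    path("characters/<id>/", get_character),
]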
The view function takes the request and the URL parameters (in our case, id) as input. It parses the calendar query parameter and casts it to the corresponding enumeration type. It loads the character data from the store and returns a dictionary serialized to JSON and wrapped in an HTTP response.
def get_character(request: IGWSGIRequest, id: str) -> JsonResponse:
    calendar = Calendar(request.GET.get("calendar", "BBY"))
    character = Store.get_character(id, calendar)
    return JsonResponse(asdict(character))
Although the function has type annotations, it does not explicitly describe a strongly typed contract for the HTTP API. From its signature we cannot learn the names or types of the request parameters, or the fields of the response and their types.
Can we make the signature of the view function just as informative as the signature of the annotated function we looked at earlier?
def get_character(id: int, calendar: Calendar) -> Character: ...
The function's parameters can be the request parameters (URL, query, or request body parameters). The function's return type can represent the contents of the response. With this approach we would have a clearly defined, readable contract for the HTTP API, and a type checker could enforce it.
Implementation
How can we implement this idea?
We use a decorator to convert a strongly typed view function into a regular Django view function. This step requires no changes to how we work with the Django framework. We can keep using the same middleware, the same routes, and the other components we are used to.
@api_view
def get_character(id: int, calendar: Calendar) -> Character: ...
Let's look at the api_view decorator in detail:
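import functools
import inspect
from dataclasses import asdict

from django.http import JsonResponse

def api_view(view):
    @functools.wraps(view)
    def django_view(request, *args, **kwargs):
        # Build the arguments for the typed view from the request
        params = {
            param_name: param.annotation(extract(request, param))
            for param_name, param in inspect.signature(view).parameters.items()
        }
        data = view(**params)
        return JsonResponse(asdict(data))

    return django_view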
This code is dense, so let's walk through it piece by piece.
We take a strongly typed view function as input and wrap it in a regular Django view function, which we return:
def api_view(view):
    @functools.wraps(view)
    def django_view(request, *args, **kwargs):
        ...
    return django_view
Now take a look at the implementation of the Django view function. First we need to construct the arguments for the strongly typed view function. We use introspection, via the inspect module, to obtain the signature of this function and iterate over its parameters:
for param_name, param in inspect.signature(view).parameters.items()
For each parameter, we call the extract function, which extracts the parameter value from the request.
Then we cast the value to the expected type specified in the signature (for example, we cast the calendar string to a member of the Calendar enumeration).
param.annotation(extract(request, param))
We call the strongly typed view function with the arguments we constructed:
data = view(**params)
The function returns a strongly typed value of the Character class. We take this value, convert it to a dictionary, and wrap it in a JSON HTTP response:
return JsonResponse(asdict(data))
We now have a Django view function that wraps a strongly typed view function. Finally, take a look at the extract function:
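from inspect import Parameter
from typing import Any

from django.http import HttpRequest

def extract(request: HttpRequest, param: Parameter) -> Any:
    if f"<{param.name}>" in request.resolver_match.route:
        # The name appears in the registered route, so it is a URL parameter
        return request.resolver_match.kwargs.get(param.name)
    else:
        # Otherwise it is a query parameter
        return request.GET.get(param.name)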
Each parameter can be a URL parameter or a query parameter. The route of the request (the path we registered at the very beginning) is available on the match object produced by Django's URL resolver. We check whether the parameter name appears in the route. If it does, we have a URL parameter and read it from the keyword arguments captured by the resolver. Otherwise it is a query parameter and we read it from the query string.
That's all. This is a simplified implementation, but it illustrates the basic idea of typing an API.
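The pieces above can be tried end to end without a Django installation. The following is a sketch under stated assumptions: FakeRequest is a hypothetical stand-in for Django's request object (it is not a Django class), the response is a plain JSON string instead of a JsonResponse, and the Calendar and Character definitions are illustrative.

```python
import functools
import inspect
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Dict


class Calendar(Enum):
    BBY = "BBY"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


@dataclass
class FakeRequest:
    """Hypothetical stand-in for a Django request (assumption)."""
    route: str                  # registered route, e.g. "characters/<id>/"
    url_kwargs: Dict[str, str]  # values captured from the URL path
    query: Dict[str, str] = field(default_factory=dict)  # query-string values


def extract(request: FakeRequest, param: inspect.Parameter) -> Any:
    # A name that appears in the route is a URL parameter,
    # anything else is looked up in the query string.
    if f"<{param.name}>" in request.route:
        return request.url_kwargs.get(param.name)
    return request.query.get(param.name)


def api_view(view):
    @functools.wraps(view)
    def wrapper(request: FakeRequest) -> str:
        # Cast each raw string to the type named in the view's signature.
        params = {
            name: param.annotation(extract(request, param))
            for name, param in inspect.signature(view).parameters.items()
        }
        return json.dumps(asdict(view(**params)))
    return wrapper


@api_view
def get_character(id: int, calendar: Calendar) -> Character:
    return Character(id=id, name="Luke Skywalker", birth_year="19" + calendar.value)


request = FakeRequest("characters/<id>/", {"id": "1000"}, {"calendar": "BBY"})
response_body = get_character(request)
```

Note that calling the annotation as a constructor works for int and for enumerations, but a production version would need explicit handling for missing values and for types that cannot be built from a single string.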
Data types
The type used to represent the contents of the HTTP response (i.e., Character) can be either a dataclass or a typed dictionary.
A dataclass is a compact way of declaring a class that represents data.
from dataclasses import dataclass

@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str

luke = Character(
    id=1000,
    name="Luke Skywalker",
    birth_year="19BBY"
)
Instagram typically uses data classes to model HTTP response objects. Here are their main features:
- They automatically generate boilerplate code and various helper methods.
- They are understood by type checkers, which means that their values can be type checked.
- They enforce immutability through frozen=True.
- They are available in the Python 3.7 standard library, or as a backport on the Python Package Index.
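The features listed above can be observed directly. This short sketch uses only the standard library; the variable names are illustrative.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


luke = Character(id=1000, name="Luke Skywalker", birth_year="19BBY")
same_luke = Character(id=1000, name="Luke Skywalker", birth_year="19BBY")

# The generated __eq__ compares instances field by field.
are_equal = luke == same_luke

# The generated __repr__ lists every field.
text = repr(luke)

# frozen=True makes assignment to a field raise an exception.
try:
    luke.name = "Darth Vader"
    mutated = True
except FrozenInstanceError:
    mutated = False

# asdict() converts the instance to a plain dictionary, ready for JSON.
payload = asdict(luke)
```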
Unfortunately, Instagram has a legacy code base that passes large, untyped dictionaries between functions and modules. It would be difficult to migrate all of this code from dictionaries to dataclasses. As a result, we use dataclasses for new code and typed dictionaries for legacy code.
Using typed dictionaries allows us to add type annotations to client dictionary objects and, without changing the behavior of a working system, use the type checking capabilities.
from mypy_extensions import TypedDict

class Character(TypedDict):
    id: int
    name: str
    birth_year: str

luke: Character = {"id": 1000}
luke["name"] = "Luke Skywalker"
luke["birth_year"] = 19  # type error, birth_year expects a str
luke["invalid_key"]  # type error, invalid_key does not exist
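The reason typed dictionaries suit legacy code is that they change nothing at runtime. The sketch below illustrates this with typing.TypedDict, which is part of the standard library from Python 3.8 onward (an assumption about the Python version; the example above uses mypy_extensions). The describe function is hypothetical.

```python
from typing import TypedDict  # standard library since Python 3.8


class Character(TypedDict):
    id: int
    name: str
    birth_year: str


def describe(character: Character) -> str:
    # A type checker knows which keys exist and what types they hold.
    return f"{character['name']} (born {character['birth_year']})"


legacy_payload: Character = {
    "id": 1000,
    "name": "Luke Skywalker",
    "birth_year": "19BBY",
}

summary = describe(legacy_payload)

# At runtime the value is still an ordinary dict, so existing code
# that passes dictionaries around keeps working unchanged.
is_plain_dict = type(legacy_payload) is dict
```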
Error handling
The view function is expected to return character information in the form of a Character entity. What should we do if we need to return an error to the client?
You can raise an exception that the framework catches and converts into an HTTP response with error information.
@api_view("GET")
def get_character(id: str, calendar: Calendar) -> Character:
    try:
        # Pass calendar as well, matching the earlier call to Store.get_character
        return Store.get_character(id, calendar)
    except CharacterNotFound:
        raise Http404Exception()
This example also shows the HTTP method being passed to the decorator, which specifies the HTTP methods allowed for this API.
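One way such a decorator could work is sketched below, again without Django. Everything here is an assumption for illustration: the wrapper receives the request method as a plain string, returns a (status, body) tuple instead of an HTTP response object, and Http404Exception is defined locally as a hypothetical exception class.

```python
import functools
import json
from dataclasses import asdict, dataclass
from typing import Callable, Tuple


class Http404Exception(Exception):
    """Hypothetical exception meaning 'resource not found'."""


@dataclass(frozen=True)
class Character:
    id: int
    name: str


def api_view(method: str) -> Callable:
    """Decorator factory: restrict the HTTP method and map exceptions to statuses."""
    def decorator(view):
        @functools.wraps(view)
        def wrapper(request_method: str, **params) -> Tuple[int, str]:
            if request_method != method:
                return 405, json.dumps({"error": "method not allowed"})
            try:
                data = view(**params)
            except Http404Exception:
                # The view signals the error; the wrapper builds the response.
                return 404, json.dumps({"error": "not found"})
            return 200, json.dumps(asdict(data))
        return wrapper
    return decorator


@api_view("GET")
def get_character(id: int) -> Character:
    if id != 1000:
        raise Http404Exception()
    return Character(id=1000, name="Luke Skywalker")


ok = get_character("GET", id=1000)
missing = get_character("GET", id=1)
wrong_method = get_character("POST", id=1000)
```

The view's return type stays Character, so the success contract remains visible in the signature while errors travel through exceptions.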
Tools
The HTTP API is now strongly typed: it declares its HTTP method, request types, and response types. We can introspect this API and determine that it should accept a GET request with the id string in the URL path and a calendar value from the corresponding enumeration in the query string. We can also learn that the response to such a request should be JSON containing the fields of Character.
What can be done with all this information?
OpenAPI is an API description format with a rich ecosystem of tools built on top of it. If we write some code that introspects our endpoints and generates OpenAPI specifications from the result, all of those tools become available to us.
paths:
  /characters/{id}:
    get:
      parameters:
        - in: path
          name: id
          schema:
            type: integer
          required: true
        - in: query
          name: calendar
          schema:
            type: string
            enum: ["BBY"]
      responses:
        '200':
          content:
            application/json:
              schema:
                type: object
                ...
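The parameters section of such a specification can be derived from a view's signature. The sketch below shows one possible approach; the PRIMITIVES mapping and the helper names schema_for and openapi_parameters are assumptions for illustration, not part of any library.

```python
import inspect
from enum import Enum
from typing import Any, Dict, List


class Calendar(Enum):
    BBY = "BBY"


# Assumed mapping from Python types to OpenAPI schema types.
PRIMITIVES = {int: "integer", str: "string", float: "number", bool: "boolean"}


def schema_for(annotation: Any) -> Dict[str, Any]:
    # Enumerations become string schemas that list the allowed values.
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        return {"type": "string", "enum": [member.value for member in annotation]}
    return {"type": PRIMITIVES[annotation]}


def openapi_parameters(view, route: str) -> List[Dict[str, Any]]:
    parameters = []
    for name, param in inspect.signature(view).parameters.items():
        in_path = f"<{name}>" in route
        entry: Dict[str, Any] = {
            "in": "path" if in_path else "query",
            "name": name,
            "schema": schema_for(param.annotation),
        }
        if in_path:
            entry["required"] = True  # path parameters are always required
        parameters.append(entry)
    return parameters


def get_character(id: int, calendar: Calendar) -> dict:
    ...


parameters = openapi_parameters(get_character, "characters/<id>/")
```

The resulting list of dictionaries can be serialized to YAML or JSON and placed under the operation's parameters key.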
We can generate HTTP API documentation for the get_character API, including names, types, and request and response information. This is the right level of abstraction for client developers who need to make requests to the endpoint. They do not need to read the Python implementation of the endpoint.
Figure: API documentation
We can build more tools on top of this, for example a way to execute requests from the browser. This lets developers call the HTTP APIs they are interested in without writing any code. We can even generate type-safe client code to ensure that types match on both the client and the server. The result is a strictly typed API on the server that is called through strictly typed client code.
In addition, we can build a backward compatibility checker. What happens if we release a version in which clients of this API rely on id, name, and birth_year, and later realize that we do not know the birth year of every character? In that case the birth_year field has to become optional, but older clients that expect it may simply stop working. Even though our APIs are explicitly typed, those types can change (here, a field that was required becomes optional). We can track API changes and warn API developers, at the right moment in their workflow, that a change they are making could break clients.
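A compatibility check of this kind can be sketched as a comparison of two response schemas. The Field tuple, the schema representation, and the breaking_changes helper are hypothetical; a real checker would work on the generated API specifications.

```python
from typing import Dict, List, NamedTuple


class Field(NamedTuple):
    type: str
    required: bool


Schema = Dict[str, Field]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """Report response changes that could break clients built against `old`."""
    problems = []
    for name, old_field in old.items():
        new_field = new.get(name)
        if new_field is None:
            problems.append(f"{name}: removed from response")
        elif new_field.type != old_field.type:
            problems.append(
                f"{name}: type changed from {old_field.type} to {new_field.type}"
            )
        elif old_field.required and not new_field.required:
            problems.append(f"{name}: was required, is now optional")
    return problems


v1 = {
    "id": Field("integer", True),
    "name": Field("string", True),
    "birth_year": Field("string", True),
}
v2 = {
    "id": Field("integer", True),
    "name": Field("string", True),
    "birth_year": Field("string", False),  # now optional
}

problems = breaking_changes(v1, v2)
```

Adding a new field to the response is not reported, since existing clients can ignore fields they do not know about.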
Summary
There is a whole range of application protocols that computers can use to communicate with each other.
At one end of this spectrum are RPC frameworks like Thrift and gRPC. They usually define strict types for requests and responses and generate the client and server code for making requests. They can do without HTTP and even without JSON.
At the other end are unstructured web frameworks written in Python that have no explicit contracts for requests and responses. Our approach provides capabilities typical of the more structured frameworks while letting you keep using HTTP and JSON, and it requires only minimal changes to application code.
It is important to note that this idea is not new. Many frameworks written in strongly typed languages give developers the features we have described. In Python, one example is the APIStar framework.
We have successfully rolled out types for our HTTP API. We were able to apply this approach throughout our code base because it works well with existing view functions. The value of what we did is obvious to all our programmers: the automatically generated documentation has become an effective means of communication between those who develop the server and those who write the Instagram client.
Dear readers! How do you approach the design of HTTP APIs in your Python projects?