Types for HTTP APIs written in Python: Instagram experience

Today we are publishing the second article in our series on how Instagram uses Python. The previous article covered type checking of Instagram's server code. The server is a monolith written in Python: it consists of several million lines of code and has several thousand Django endpoints.







This article focuses on how Instagram uses types to document HTTP APIs and to enforce contracts when working with them.



Situation overview



When you open the Instagram mobile client, it talks over HTTP to the JSON API of our Python (Django) server.



Here is some information about our system that gives an idea of the complexity of the API behind the mobile client. So here is what we have:





We use types to document our complex, constantly evolving HTTP APIs and to enforce contracts when working with them.



Types



Let's start from the very beginning. The syntax for type annotations in Python code was introduced in PEP 484. But why add type annotations to code?



Consider a function that fetches information about a Star Wars character:



    def get_character(id, calendar):
        if id == 1000:
            return Character(
                id=1000,
                name="Luke Skywalker",
                birth_year="19BBY" if calendar == Calendar.BBY else ...
            )
        ...
      
      





In order to understand this function, you need to read its code. Having done this, you can find out the following:





The function has an implicit contract that programmers have to reconstruct every time they read its code. But code is written once and read many times, so this is not a good way to work with it.



Moreover, it is difficult to verify that the code calling the function adheres to the implicit contract described above. Similarly, it is difficult to verify that the body of the function respects this contract. In a large code base, such situations lead to errors.



Now consider the same function with type annotations:



 def get_character(id: int, calendar: Calendar) -> Character:    ...
      
      





Type annotations let you express the contract of this function explicitly. To understand what the function takes as input and what it returns, just read its signature. A type checker can statically analyze the function and verify that the code complies with the contract. This eliminates a whole class of errors!
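
To make the contract concrete, here is a minimal, self-contained sketch of the annotated function. The `Calendar` member, the `Character` fields, and the hard-coded lookup are assumptions inferred from the example data in this article; they are not Instagram's actual definitions.

```python
from dataclasses import dataclass
from enum import Enum


class Calendar(Enum):
    # Only BBY appears in the article; other members are left out here.
    BBY = "BBY"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


def get_character(id: int, calendar: Calendar) -> Character:
    # Hard-coded data stands in for a real data store (an assumption).
    if id == 1000 and calendar is Calendar.BBY:
        return Character(id=1000, name="Luke Skywalker", birth_year="19BBY")
    raise LookupError(f"no character with id {id}")


luke = get_character(1000, Calendar.BBY)
print(luke.name)

# A static type checker such as mypy would reject the call below,
# because "1000" is a str and the signature requires an int:
# get_character("1000", Calendar.BBY)
```

The commented-out call is the kind of mistake that would otherwise surface only at runtime.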



Types for HTTP APIs



Let's develop an HTTP API that returns information about Star Wars characters. To describe the explicit contract of this API, we will use type annotations.



Our API should accept the character identifier (id) as a URL parameter and a value of the calendar enumeration as a query parameter. The API should return a JSON response with character information.



Here's what the API request looks like and the response it returns:



    curl -X GET https://api.starwars.com/characters/1000?calendar=BBY

    {
       "id": 1000,
       "name": "Luke Skywalker",
       "birth_year": "19BBY"
    }
      
      





To implement this API in Django, you first register the URL path and the view function that receives HTTP requests made to that path and returns the response.



    from django.urls import path

    urlpatterns = [
        # The "<id>" capture syntax is understood by path().
        path("characters/<id>/", get_character),
    ]
      
      





The function takes the request and the URL parameters (in our case, id) as input. It parses the calendar query parameter and casts it to a value of the corresponding enumeration. It then loads the character data from the store and returns it as a dictionary serialized to JSON and wrapped in an HTTP response.



    def get_character(request: IGWSGIRequest, id: str) -> JsonResponse:
        calendar = Calendar(request.GET.get("calendar", "BBY"))
        character = Store.get_character(id, calendar)
        return JsonResponse(asdict(character))
      
      





Although the function has type annotations, it does not explicitly describe the contract of the HTTP API. From its signature we cannot learn the names or types of the request parameters, or the fields of the response and their types.



Is it possible to make the signature of the view function exactly as informative as the signature of the annotated function we looked at earlier?



 def get_character(id: int, calendar: Calendar) -> Character:    ...
      
      





The function's parameters can represent request parameters (URL, query, or request body parameters). The function's return type can represent the contents of the response. With this approach, we would have a clearly defined and readable contract for the HTTP API, and a type checker could enforce it.



Implementation



How can we implement this idea?



We use a decorator to convert a strongly typed view function into a regular Django view function. This step requires no changes to how we work with the Django framework. We can keep using the same middleware, the same routes, and the other components we are used to.



    @api_view
    def get_character(id: int, calendar: Calendar) -> Character:
        ...
      
      





Consider the details of the api_view decorator:



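    import functools
    import inspect
    from dataclasses import asdict

    from django.http import JsonResponse

    def api_view(view):
        @functools.wraps(view)
        def django_view(request, *args, **kwargs):
            # Build one typed argument per parameter of the wrapped view.
            params = {
                param_name: param.annotation(extract(request, param))
                for param_name, param in inspect.signature(view).parameters.items()
            }
            data = view(**params)
            # Serialize the returned data class as a JSON response.
            return JsonResponse(asdict(data))
        return django_view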
      
      





This code is dense, so let's walk through it.

We take a strongly typed view function as input and wrap it in a regular Django view function, which we return:



    def api_view(view):
        @functools.wraps(view)
        def django_view(request, *args, **kwargs):
            ...
        return django_view
      
      





Now take a look at the implementation of the Django view function. First we need to construct the arguments for the strongly typed view function. We use introspection, via the inspect module, to obtain the signature of this function and iterate over its parameters:



 for param_name, param in inspect.signature(view).parameters.items()
      
      





For each parameter, we call the extract function, which extracts the parameter value from the request.



Then we cast the value to the expected type given in the signature (for example, we cast the calendar string to a member of the Calendar enumeration).



 param.annotation(extract(request, param))
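
The introspection and casting steps can be tried in isolation. The sketch below is an illustration only: the `raw_values` dictionary stands in for whatever `extract` would return, and the `Calendar` enumeration has just the one member shown in this article.

```python
import inspect
from enum import Enum


class Calendar(Enum):
    BBY = "BBY"


def get_character(id: int, calendar: Calendar):
    ...


# Raw values as they would arrive from a URL or query string: always strings.
raw_values = {"id": "1000", "calendar": "BBY"}

params = {}
for param_name, param in inspect.signature(get_character).parameters.items():
    # param.annotation is the class written in the signature (int, Calendar);
    # calling it converts the raw string into a typed value.
    params[param_name] = param.annotation(raw_values[param_name])

print(params)
```

Calling the annotation works only for types whose constructor accepts a single string, such as `int` or an `Enum` subclass. Annotations such as `Optional[int]` or `List[str]` cannot be called this way and would need dedicated parsing.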
      
      





We call the strongly typed view function with the arguments we constructed:



 data = view(**params)
      
      





The function returns a strongly typed value of the Character class. We convert this value to a dictionary and wrap it in a JSON HTTP response:



 return JsonResponse(asdict(data))
      
      





Great! We now have a Django view function that wraps a strongly typed view function. Finally, take a look at the extract function:



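    from inspect import Parameter
    from typing import Any

    from django.http import HttpRequest

    def extract(request: HttpRequest, param: Parameter) -> Any:
        # A URL parameter appears in the registered route as "<name>".
        if f"<{param.name}>" in request.resolver_match.route:
            return request.resolver_match.kwargs.get(param.name)
        else:
            return request.GET.get(param.name)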
      
      





Each parameter is either a URL parameter or a query parameter. The URL path of the request (the path we registered at the very beginning) is available as the route attribute of the match object produced by Django's URL resolver. We check whether the parameter name appears in the path. If it does, we have a URL parameter and read it from the values captured from the URL. Otherwise, it is a query parameter and we read it from the query string.



That's all. This is a simplified implementation, but it illustrates the basic idea of typing an API.
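
To see the pieces working together without a Django installation, here is a hedged end-to-end sketch. It makes several assumptions: the request is a hypothetical stand-in built with `SimpleNamespace` that mimics only the attributes used above (`resolver_match.route`, `resolver_match.kwargs`, `GET`), the wrapper returns a plain dictionary instead of a `JsonResponse`, and the view returns fixed data.

```python
import functools
import inspect
from dataclasses import asdict, dataclass
from enum import Enum
from types import SimpleNamespace
from typing import Any


class Calendar(Enum):
    BBY = "BBY"


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


def extract(request: Any, param: inspect.Parameter) -> Any:
    # URL parameters appear in the route as "<name>"; everything else
    # is looked up in the query string.
    if f"<{param.name}>" in request.resolver_match.route:
        return request.resolver_match.kwargs.get(param.name)
    return request.GET.get(param.name)


def api_view(view):
    @functools.wraps(view)
    def wrapper(request, *args, **kwargs):
        params = {
            name: param.annotation(extract(request, param))
            for name, param in inspect.signature(view).parameters.items()
        }
        # A real Django view would return JsonResponse(asdict(...)) here.
        return asdict(view(**params))

    return wrapper


@api_view
def get_character(id: int, calendar: Calendar) -> Character:
    return Character(id=id, name="Luke Skywalker", birth_year="19BBY")


# Hypothetical stand-in for the request Django would build for
# GET /characters/1000/?calendar=BBY
fake_request = SimpleNamespace(
    resolver_match=SimpleNamespace(route="characters/<id>/", kwargs={"id": "1000"}),
    GET={"calendar": "BBY"},
)

print(get_character(fake_request))
```

Note that the id arrives as the string "1000" and reaches the view as the integer 1000, because the wrapper calls the `int` annotation on it.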



Data types



The type used to represent the contents of the HTTP response (i.e., Character) can be either a data class or a typed dictionary.



A data class is a compact way to describe a class that represents data.



    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Character:
        id: int
        name: str
        birth_year: str

    luke = Character(
        id=1000,
        name="Luke Skywalker",
        birth_year="19BBY"
    )
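
Two properties of this data class matter for HTTP responses and can be checked with the standard library alone: `asdict` turns an instance into a plain dictionary, and `frozen=True` makes instances immutable. The following is a small sketch of both; the attempted rename is purely illustrative.

```python
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class Character:
    id: int
    name: str
    birth_year: str


luke = Character(id=1000, name="Luke Skywalker", birth_year="19BBY")

# asdict() produces a plain dict that can be serialized to JSON.
payload = asdict(luke)
print(payload)

# frozen=True makes instances immutable: assignment raises an exception.
try:
    luke.name = "Darth Vader"  # type: ignore[misc]
except FrozenInstanceError:
    print("Character instances cannot be modified")
```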
      
      





Instagram typically uses data classes to model HTTP response objects. Here are their main features:





Unfortunately, Instagram has a legacy code base that passes large, untyped dictionaries between functions and modules. It would be difficult to migrate all of this code from dictionaries to data classes. As a result, we use data classes for new code and typed dictionaries in legacy code.



Typed dictionaries let us add type annotations to client dictionary objects and use type checking without changing the behavior of the running system.



    from mypy_extensions import TypedDict

    class Character(TypedDict):
        id: int
        name: str
        birth_year: str

    luke: Character = {"id": 1000}
    luke["name"] = "Luke Skywalker"
    luke["birth_year"] = 19 # type error, birth_year expects a str
    luke["invalid_key"] # type error, invalid_key does not exist
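
The claim that typed dictionaries do not change runtime behavior can be checked directly. This sketch uses `typing.TypedDict` from the standard library (Python 3.8 and later) rather than `mypy_extensions`; the `describe` helper is a hypothetical function added for illustration.

```python
from typing import TypedDict  # standard library since Python 3.8


class Character(TypedDict):
    id: int
    name: str
    birth_year: str


def describe(character: Character) -> str:
    # The type checker knows "name" and "birth_year" exist and are strings.
    return f'{character["name"]} (born {character["birth_year"]})'


luke: Character = {"id": 1000, "name": "Luke Skywalker", "birth_year": "19BBY"}

# At runtime a TypedDict value is an ordinary dict, so legacy code that
# passes dictionaries around keeps working unchanged.
print(type(luke) is dict)
print(describe(luke))
```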
      
      





Error handling



The view function is expected to return character information as a Character object. What should we do if we need to return an error to the client?



You can raise an exception, which the framework catches and converts into an HTTP response with error information.



    @api_view("GET")
    def get_character(id: str, calendar: Calendar) -> Character:
        try:
            return Store.get_character(id)
        except CharacterNotFound:
            raise Http404Exception()
      
      





This example also shows the HTTP method passed to the decorator, which sets the HTTP methods allowed for this API.
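
How a decorator might enforce the allowed methods and translate exceptions is not shown in the article, so the following is only a sketch under stated assumptions. The `Response` tuple, the `CHARACTERS` dictionary, the error bodies, and the 405 behavior are all hypothetical, and `Http404Exception` is defined locally as a stand-in. Parameter extraction is omitted to keep the focus on error handling.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, Dict, Tuple

Response = Tuple[int, Dict[str, Any]]  # (status code, JSON body)


class Http404Exception(Exception):
    """Hypothetical exception signalling that a resource was not found."""


def api_view(*allowed_methods: str) -> Callable:
    def decorator(view: Callable[..., Dict[str, Any]]) -> Callable[..., Response]:
        @functools.wraps(view)
        def wrapper(request: Any, **kwargs: Any) -> Response:
            # Reject HTTP methods the endpoint did not declare.
            if request.method not in allowed_methods:
                return 405, {"error": "method not allowed"}
            try:
                return 200, view(**kwargs)
            except Http404Exception:
                # Translate the exception into an error response.
                return 404, {"error": "not found"}

        return wrapper

    return decorator


CHARACTERS = {"1000": {"id": 1000, "name": "Luke Skywalker"}}


@api_view("GET")
def get_character(id: str) -> Dict[str, Any]:
    try:
        return CHARACTERS[id]
    except KeyError:
        raise Http404Exception()


print(get_character(SimpleNamespace(method="GET"), id="1000"))
print(get_character(SimpleNamespace(method="GET"), id="42"))
print(get_character(SimpleNamespace(method="POST"), id="1000"))
```

Keeping the error translation in the decorator means the view body only ever returns the success type, so its return annotation stays accurate.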



Tools



The HTTP API is now strongly typed: it declares its HTTP method, its request types, and its response types. We can introspect this API and determine that it accepts a GET request with the id string in the URL path and a calendar value from the corresponding enumeration in the query string. We can also learn that it responds with JSON containing Character data.



What can be done with all this information?



OpenAPI is an API description format with a rich ecosystem of tools built around it. If we write some code that introspects our endpoints and generates OpenAPI specifications from the result, we gain access to all of those tools.



    paths:
      /characters/{id}:
        get:
          parameters:
            - in: path
              name: id
              schema:
                type: integer
              required: true
            - in: query
              name: calendar
              schema:
                type: string
                enum: ["BBY"]
          responses:
            '200':
              content:
                application/json:
                  schema:
                    type: object
                    ...
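
The parameters section of such a specification can be derived from a view's signature. The sketch below is one possible approach, not Instagram's generator: `schema_for`, `openapi_parameters`, and the `PRIMITIVES` mapping are hypothetical names, only a few primitive types are handled, and the output is a list of Python dictionaries rather than YAML.

```python
import inspect
from enum import Enum
from typing import Any, Dict, List


class Calendar(Enum):
    BBY = "BBY"


def get_character(id: int, calendar: Calendar):
    ...


# Partial mapping from Python types to OpenAPI schema types.
PRIMITIVES = {int: "integer", str: "string", float: "number", bool: "boolean"}


def schema_for(annotation: Any) -> Dict[str, Any]:
    # Enumerations become string schemas listing the allowed values.
    if isinstance(annotation, type) and issubclass(annotation, Enum):
        return {"type": "string", "enum": [member.value for member in annotation]}
    return {"type": PRIMITIVES[annotation]}


def openapi_parameters(view: Any, route: str) -> List[Dict[str, Any]]:
    parameters = []
    for name, param in inspect.signature(view).parameters.items():
        in_path = f"<{name}>" in route
        entry = {
            "in": "path" if in_path else "query",
            "name": name,
            "schema": schema_for(param.annotation),
        }
        if in_path:
            entry["required"] = True  # OpenAPI requires this for path parameters
        parameters.append(entry)
    return parameters


for entry in openapi_parameters(get_character, "characters/<id>/"):
    print(entry)
```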
      
      





We can generate HTTP API documentation for the get_character API that includes names, types, and request and response information. This is the right level of abstraction for client developers who need to make requests to the endpoint: they do not need to read its Python implementation.





API documentation



Additional tools can be built on top of this. One example is a tool for making requests from a browser, which lets developers try the HTTP APIs they are interested in without writing code. We can even generate type-safe client code to ensure that types are consistent between client and server. That gives us a strongly typed API on the server that is called through strongly typed client code.



In addition, we can build a backward compatibility checker. What happens if we release a version of the server whose character API requires id, name, and birth_year, and later realize that we do not know the birthday of every character? We would need to make birth_year optional, but older clients that expect this field might simply stop working. Even though our APIs are explicitly typed, those types can change (here, the character's birth year goes from required to optional). We can track API changes and warn API developers, at the right moment, that a change they are making might break clients.
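
A very small version of such a check can compare the response types of two API versions. This is a sketch under assumptions: `CharacterV1`, `CharacterV2`, `is_optional`, and `breaking_changes` are hypothetical names, and only two kinds of change are detected (a removed field and a required field becoming optional).

```python
from dataclasses import dataclass
from typing import List, Optional, Union, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class CharacterV1:
    id: int
    name: str
    birth_year: str


@dataclass(frozen=True)
class CharacterV2:
    id: int
    name: str
    birth_year: Optional[str]  # the field may now be null


def is_optional(annotation: object) -> bool:
    # Optional[X] is represented as Union[X, None].
    return get_origin(annotation) is Union and type(None) in get_args(annotation)


def breaking_changes(old: type, new: type) -> List[str]:
    old_hints, new_hints = get_type_hints(old), get_type_hints(new)
    problems = []
    for name, old_type in old_hints.items():
        if name not in new_hints:
            problems.append(f"{name}: field removed")
        elif is_optional(new_hints[name]) and not is_optional(old_type):
            problems.append(f"{name}: was required, is now optional")
    return problems


print(breaking_changes(CharacterV1, CharacterV2))
```

A check like this could run when a change is proposed, so the warning reaches the API developer before older clients are affected.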



Summary



There is a whole spectrum of application protocols that computers can use to communicate with each other.



At one end of this spectrum are RPC frameworks such as Thrift and gRPC. They typically define strict types for requests and responses and generate client and server code for making requests. They can work without HTTP and even without JSON.



At the other end are unstructured web frameworks written in Python that have no explicit contracts for requests and responses. Our approach offers capabilities typical of the more structured frameworks, while letting you keep using HTTP and JSON and requiring only minimal changes to application code.



It is important to note that this idea is not new. Many frameworks written in strongly typed languages give developers the capabilities we described. In Python, one example is the APIStar framework.



We have successfully rolled out types for our HTTP APIs. We were able to apply this approach across our code base because it fits well with existing view functions. The value of what we did is obvious to all our programmers: the automatically generated documentation has become an effective means of communication between those who develop the server and those who write the Instagram client.



Dear readers! How do you approach the design of HTTP APIs in your Python projects?







