Schemathesis: property-based testing for API schemas








Photo by Chris Keats on Unsplash







Many companies, including ours, have switched from monoliths to microservices for the sake of better scalability and faster development cycles. We still have monolithic projects, but they are gradually being replaced by sets of small, neat microservices.







These microservices use Open API 3.0 schemas to describe what to expect from them. Schemas provide many useful things, for example, auto-generated clients or interactive documentation, but their main advantage is that they help control how the services communicate with each other.







Interservice communication becomes more complicated as the number of participants grows. In this article, I want to share my thoughts on the problems of using schemas in web applications and outline some ways to deal with them.










Even though Open API 3.0 is in many ways superior to its predecessor (also known as Swagger 2.0), it has many limitations, like any other specification. The main problem is that even if the schema fully reflects the author's vision, this does not mean that the real application will behave according to the schema.







There are many different approaches for synchronizing schemas with documentation and application logic. The most common:









None of these approaches guarantees a 1:1 correspondence between the behavior of the application and its schema, and there are many reasons for this. It can be a complex constraint at the database level that cannot be expressed in the schema language, or the ubiquitous human factor: we forget to update the application to reflect changes in the schema, or vice versa.







These inconsistencies have many consequences, from an unhandled error that crashes the application to security issues that can cause serious financial loss.










The obvious way to solve these problems is to test applications and configure schema linters (such as Zally from Zalando), which we do, but the situation becomes more complicated when you need to work with hundreds of services of various sizes.







Classic, example-based tests have a maintenance cost and take time to write, but they are still an integral part of any modern development process. We were looking for a cheap and effective way to find defects in our applications: something that lets us test applications written in different languages, has a minimal maintenance cost, and is easy to use.







Therefore, we decided to investigate the applicability of property-based testing (PBT) to Open API schemas. The concept itself is not new: it was first implemented in the Haskell library QuickCheck by Koen Claessen and John Hughes in 1999. Today, PBT tools exist for most programming languages, including Python, our main backend language. In the examples below, I will use Hypothesis, authored by David R. MacIver.







The essence of the approach is to define the properties that the code must satisfy and to verify that these properties hold for a large number of randomly generated inputs. Let's imagine a simple function that takes two numbers as input and returns their sum, as well as a test for this function. As an example, we can expect our implementation to be commutative.







def add(a, b):
    return a + b

def test_add(a, b):
    assert add(a, b) == add(b, a)


However, Hypothesis quickly reminds us that commutativity holds only for real numbers, and NaN is not one of them:







from hypothesis import strategies as st, given

NUMBERS = st.integers() | st.floats()

@given(a=NUMBERS, b=NUMBERS)
def test_add(a, b):
    assert add(a, b) == add(b, a)

# Falsifying example: test_add(a=0, b=nan)
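The falsifying example can look surprising, because the addition itself is symmetric. A minimal sketch using only the standard library shows that it is the equality check that fails: NaN never compares equal to anything, including itself.

```python
import math

nan = float("nan")

# Both sides evaluate to NaN, yet the comparison is False,
# because NaN is not equal to any value, itself included.
print(0 + nan == nan + 0)  # False

# Comparing "is this NaN?" on both sides confirms the results agree.
print(math.isnan(0 + nan) and math.isnan(nan + 0))  # True
```

A test that should tolerate NaN has to compare results with something like `math.isnan` rather than `==`, or exclude NaN from the generated inputs.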


PBT allows developers to find non-trivial examples when code does not work as expected. So how does this apply to API schemas?







It turns out that we expect quite a lot from our applications. They should:









Compliance with the schema can be broken down further:









Even taking into account the fact that it is impossible to fulfill all these properties in all cases, they are good guidelines. By themselves, schemas are a source of application properties, making them ideal for use in PBT.
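To make the idea of a schema as a source of properties more concrete, here is a minimal sketch of the property "every response body conforms to the schema" expressed as a check. The `conforms` helper and `ITEM_SCHEMA` are hypothetical names invented for this illustration, and the helper covers only a tiny subset of JSON Schema; it is not how Schemathesis or hypothesis-jsonschema validate data.

```python
# Hypothetical, heavily simplified validator: supports only `type`,
# `properties`, and `required`. Real tools use a full JSON Schema validator.
TYPE_CHECKS = {
    # bool is a subclass of int in Python, so it is excluded explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
}


def conforms(instance, schema):
    """Return True if `instance` satisfies the supported subset of `schema`."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](instance):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, subschema in schema.get("properties", {}).items():
            if key in instance and not conforms(instance[key], subschema):
                return False
    return True


ITEM_SCHEMA = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id"],
}

print(conforms({"id": 1, "name": "a"}, ITEM_SCHEMA))  # True
print(conforms({"name": "a"}, ITEM_SCHEMA))  # False: `id` is missing
print(conforms({"id": "1"}, ITEM_SCHEMA))  # False: `id` is not an integer
```

A property-based test would run such a check against the responses to many generated requests instead of three hand-picked values.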










First of all, we looked around and found that there was already a Python library for this, swagger-conformance, but it looked abandoned. We needed Open API support and more flexibility with data generation strategies than swagger-conformance offered. We also found a fresh library, hypothesis-jsonschema, written by Zac Hatfield-Dodds, one of the main Hypothesis developers. I am grateful to the people who wrote these tools. Thanks to their efforts, testing in Python has become more exciting, inspiring, and enjoyable.







Since the JSON Schema is the basis of the Open API, this library was suitable for us, but still did not provide everything that we needed. Having all these tools, we decided to build our own library based on Hypothesis, hypothesis-jsonschema and pytest, which would work with the Open API and Swagger specifications.







This is how the Schemathesis project came about. We started it a few months ago on the Testing Platform team at Kiwi.com. The idea is this:









Schemathesis generates data that matches the schema, makes the necessary network requests to the running application, and checks whether the application crashed and whether the response matches the schema.
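The generate, send, and check loop can be sketched with the standard library alone. Everything here is an assumption made for illustration: `SAMPLE_SCHEMA`, `generate`, `fake_app`, and `find_failures` are hypothetical names, the generator handles a handful of types, and `fake_app` is an in-process function standing in for a real HTTP service. Schemathesis itself relies on Hypothesis and hypothesis-jsonschema for generation.

```python
import random
import string

SAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "count": {"type": "integer"},
        "label": {"type": "string"},
        "enabled": {"type": "boolean"},
    },
}


def generate(schema, rng):
    """Produce a random value matching a tiny subset of JSON Schema."""
    kind = schema["type"]
    if kind == "integer":
        return rng.randint(-(2**40), 2**40)
    if kind == "string":
        length = rng.randint(0, 40)
        return "".join(rng.choice(string.printable) for _ in range(length))
    if kind == "boolean":
        return rng.choice([True, False])
    if kind == "object":
        required = set(schema.get("required", []))
        return {
            key: generate(sub, rng)
            for key, sub in schema.get("properties", {}).items()
            # Optional properties are present only some of the time.
            if key in required or rng.random() < 0.5
        }
    raise ValueError(f"unsupported type: {kind}")


def fake_app(body):
    """Stand-in for the running service; returns an HTTP-like status code."""
    try:
        body["count"], body["label"], body["enabled"]
    except KeyError:
        return 500  # models an unhandled server-side error
    return 200


def find_failures(schema, app, examples=200, seed=0):
    """Generate `examples` bodies and collect those that crash the app."""
    rng = random.Random(seed)
    cases = (generate(schema, rng) for _ in range(examples))
    return [case for case in cases if app(case) >= 500]


failures = find_failures(SAMPLE_SCHEMA, fake_app)
print(f"{len(failures)} of 200 generated bodies caused a server error")
```

Every body in `failures` is valid according to the schema and still crashes the handler, which is exactly the kind of mismatch between schema and implementation that this approach is meant to surface.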







We still have plenty of interesting functionality left to implement:









Even in its current state, Schemathesis has helped us improve our applications and deal with certain types of defects. Next, I will show some examples of how it works and what types of errors it can find. For this purpose, I created an application that provides a simple booking API; the source is available at https://github.com/Stranger6667/schemathesis-example. It contains errors that are not always obvious at first glance, and we will find them using Schemathesis.







In the example, there are two endpoints:









For the rest of this article, I assume that this project is running on 127.0.0.1:8080.







Schemathesis can be used as a command-line application or in Python tests. Both options have their advantages and disadvantages, which I will discuss later.

Let's start with the command line and try to create a new booking. The Booking model has only a few fields:







components:
  schemas:
    Booking:
      properties:
        id:
          type: integer
        name:
          type: string
        is_active:
          type: boolean
      type: object


The Booking model in the Open API 3 schema.







CREATE TABLE bookings (
    id INTEGER PRIMARY KEY,
    name VARCHAR(30),
    is_active BOOLEAN
);


Definition of the table in the database.







# models.py
import attr

@attr.s(slots=True)
class Booking:
    id: int = attr.ib()
    name: str = attr.ib()
    is_active: bool = attr.ib()

    asdict = attr.asdict

# db.py
from . import models

async def create_booking(pool, *, booking_id: int, name: str, is_active: bool) -> models.Booking:
    row = await pool.fetchrow(
        "INSERT INTO bookings (id, name, is_active) VALUES ($1, $2, $3) RETURNING *",
        booking_id, name, is_active
    )
    return models.Booking(**row)

# views.py
from aiohttp import web
from . import db

async def create_booking(request: web.Request, body) -> web.Response:
    booking = await db.create_booking(
        request.app["db"], booking_id=body["id"], name=body["name"], is_active=body["is_active"]
    )
    return web.json_response(booking.asdict())


Relevant Python Code







Have you noticed a defect that could lead to an unhandled error?







We need to run Schemathesis and specify the endpoint we need:







 $ schemathesis run -M POST -E /bookings/ http://0.0.0.0:8080/api/openapi.json
      
      





These two options, --method and --endpoint, allow you to run tests only for the endpoints you are interested in.













The Schemathesis CLI generates simple Python code so that the error can be easily reproduced, and also saves the failing example in the Hypothesis internal database for use in future runs. On the server side, we see the problematic parameter in the exception text:







 File "/example/views.py", line 13, in create_booking
   request.app["db"], booking_id=body["id"], name=body["name"], is_active=body["is_active"]
 KeyError: 'id'
      
      





To fix the error, we need to make id and the other parameters required in the schema.







components:
  schemas:
    Booking:
      properties:
        id:
          type: integer
        name:
          type: string
        is_active:
          type: boolean
      type: object
      required: [id, name, is_active]


Let's rerun the last command and check whether everything is fine:













Another error! On the server side, the output is:







 asyncpg.exceptions.UniqueViolationError: duplicate key value violates unique constraint "bookings_pkey"
 DETAIL: Key (id)=(0) already exists.
      
      





It seems that I did not consider the situation when a user tries to create a booking with the same ID twice! Yet this kind of problem is common in production: double clicks, retries after errors, and so on.
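One possible way to handle the duplicate is to catch the constraint violation and turn it into a client error instead of letting it crash the handler. The sketch below is an assumption, not the fix used in the example project: it uses the standard-library sqlite3 module as a stand-in for PostgreSQL and asyncpg so that it runs anywhere, `create_booking` is a hypothetical synchronous analogue of the handler above, and the choice of the 409 Conflict status is mine.

```python
import sqlite3

# Assumption: an in-memory SQLite database replaces PostgreSQL here.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookings (id INTEGER PRIMARY KEY, name VARCHAR(30), is_active BOOLEAN)"
)


def create_booking(conn, booking_id, name, is_active):
    """Insert a booking; return an HTTP-like status code instead of crashing."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO bookings (id, name, is_active) VALUES (?, ?, ?)",
                (booking_id, name, is_active),
            )
    except sqlite3.IntegrityError:
        return 409  # Conflict: a booking with this ID already exists
    return 201


print(create_booking(conn, 0, "first", True))   # 201
print(create_booking(conn, 0, "second", True))  # 409
```

With asyncpg, the exception to catch would be the asyncpg.exceptions.UniqueViolationError shown in the server output above, and the new response code would also have to be documented in the schema.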







We often cannot imagine how our applications will be used after deployment, but PBT can help find logic that is missing from the implementation.







Schemathesis also allows you to use its functionality in ordinary Python tests. The second endpoint of our example may look simple - take a record from the database and serialize it. But it also contains an error.







paths:
  /bookings/{booking_id}:
    parameters:
      - description: Booking ID to retrieve
        in: path
        name: booking_id
        required: true
        schema:
          format: int32
          type: integer
    get:
      summary: Get a booking by ID
      operationId: example.views.get_booking_by_id
      responses:
        "200":
          description: OK


Open API 3 definitions







# db.py
from typing import Optional

async def get_booking_by_id(pool, *, booking_id: int) -> Optional[models.Booking]:
    row = await pool.fetchrow(
        "SELECT * FROM bookings WHERE id = $1", booking_id
    )
    if row is not None:
        return models.Booking(**row)

# views.py
async def get_booking_by_id(request: web.Request, booking_id: int) -> web.Response:
    booking = await db.get_booking_by_id(request.app["db"], booking_id=booking_id)
    if booking is not None:
        data = booking.asdict()
    else:
        data = {}
    return web.json_response(data)


The central element of using Schemathesis is a schema instance. It provides test parameterization, selection of endpoints to test, and other configuration options.







There are several ways to create a schema, and all of them follow the same pattern: schemathesis.from_<something>. Usually, it is much more convenient to have the application as a pytest fixture so that it is launched when necessary (schemathesis.from_pytest_fixture exists for this purpose), but for simplicity I will continue to use the application running locally on port 8080:







import schemathesis

schema = schemathesis.from_uri("http://0.0.0.0:8080/api/openapi.json")

@schema.parametrize(method="GET", endpoint="/bookings/{booking_id}")
def test_get_booking(case):
    response = case.call()
    assert response.status_code < 500


Each test with the schema.parametrize decorator should take a case fixture as an argument. It contains all the attributes required by the schema and the additional information needed to make the requests over the network. The fixture looks something like this:







 >>> case
 Case(
     path='/bookings/{booking_id}',
     method='GET',
     base_url='http://0.0.0.0:8080/api',
     path_parameters={'booking_id': 2147483648},
     headers={},
     cookies={},
     query={},
     body=None,
     form_data={}
 )
      
      





Case.call() makes a network request with this data to the running application using requests.
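To illustrate how the attributes of a case map onto a concrete request, here is a minimal sketch of building the target URL from `base_url`, the templated `path`, and `path_parameters`. The `build_url` helper is hypothetical and written only for this illustration; it is not the Schemathesis implementation.

```python
from urllib.parse import quote


def build_url(base_url, path, path_parameters):
    """Fill a templated path such as /bookings/{booking_id} and join it to base_url."""
    filled = path.format(
        # Percent-encode each value so it is safe to place in a URL path segment.
        **{name: quote(str(value), safe="") for name, value in path_parameters.items()}
    )
    return base_url.rstrip("/") + filled


url = build_url(
    "http://0.0.0.0:8080/api",
    "/bookings/{booking_id}",
    {"booking_id": 2147483648},
)
print(url)  # http://0.0.0.0:8080/api/bookings/2147483648
```

The remaining attributes (method, headers, cookies, query, body) would then be passed along with this URL to the HTTP client.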







Tests can be run using pytest (the standard unittest is also supported):







 $ pytest test_example.py -v
      
      











Server Side Exception:







 asyncpg.exceptions.DataError: invalid input for query argument $1: 2147483648 (value out of int32 range)
      
      





The output points to insufficient validation of the input data. You can fix it by adding minimum and maximum values to the schema. Having format: int32 is not enough: according to the specification, it is only a hint.
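As a rough illustration of the bounds involved, the sketch below computes the int32 limits and adds them to the parameter schema, written as a Python dictionary. `BOOKING_ID_SCHEMA` and `in_bounds` are hypothetical names used only here; in the real project the minimum and maximum keywords would go into the YAML definition of the booking_id parameter.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

# The path parameter schema from the example, with explicit bounds added.
BOOKING_ID_SCHEMA = {
    "type": "integer",
    "format": "int32",
    "minimum": INT32_MIN,
    "maximum": INT32_MAX,
}


def in_bounds(value, schema):
    """Check only the `minimum`/`maximum` keywords of an integer schema."""
    return schema.get("minimum", value) <= value <= schema.get("maximum", value)


print(INT32_MIN, INT32_MAX)  # -2147483648 2147483647
print(in_bounds(2147483648, BOOKING_ID_SCHEMA))  # False: the value from the failing case
print(in_bounds(2147483647, BOOKING_ID_SCHEMA))  # True: the largest int32 value
```

With these keywords in place, a schema-aware generator stops producing out-of-range IDs, and request validation can reject them before they reach the database.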










The application in the example is too simplified and does not have much of the functionality necessary for production, such as authorization, monitoring, and so on. However, Schemathesis and property-based testing in general can detect a wide range of errors in applications. A short summary of the previous paragraphs and a few other examples:









These problems differ in severity, but even small errors can fill your error tracker and flood your notifications. I have come across errors like the ones mentioned above in production, and I would rather fix them as soon as possible than be woken up by PagerDuty in the middle of the night.







There are many things that can be improved in these areas, and I want to invite you to participate in the development of Schemathesis, Hypothesis, hypothesis-jsonschema, or pytest, all of which are open source projects. Links to the projects are listed below.







Thank you for your attention!







The project has its own Gitter chat, where you can talk to us and leave feedback: https://gitter.im/kiwicom/schemathesis







References :









