Tests or types

Hi, Habr. The other day I was looking for how to do something in Idris, and came across a good post, a free translation of which seems quite appropriate. Liberties and gag, where necessary, I will denote “here by such squiggles at the beginning and at the end”.

When to use tests, and when - types? What information and what guarantees do we receive in exchange for our efforts to write them?

We will look at a simple and slightly contrived example expressed in Python, C, Haskell, and Idris. We will also see what can be said about the implementation without any additional knowledge about it, in each case.

We will not take into account the various backdoors that allow explicitly violating the language guarantees (for example, C extensions, unsafePerformIO

in Haskell, unsafe type conversions), otherwise no conclusions could be made at all, and this post would have turned out to be quite short. ⟦In addition, the same Haskell has a subset of Safe Haskell that explicitly and transitively prohibits the use of these and a number of other tricks that could violate the integrity of the language.⟧

Specification

Let a list and some meaning be given. It is necessary to return the index of this value in the list or indicate that this value is not in the list.

The implementation of this specification is trivial, so it’s natural to ask, where do the tests or types in general. However, the properties and methods of reasoning that we will talk about today are applicable to a much more complex code. Let the implementation take ten thousand lines of unreadable spaghetti code, if that helps to see their usefulness.

Python

 def x(y, z): # 10000

We note right away that we are not interested in the unchecked ⟦and semantics-not affecting properties of a program like variable naming and text documentation, so I intentionally did not write code that helps perception. We are only interested in the fact that, subject to passing tests and type checks, it cannot be untrue .

In the code above, there is practically no useful information other than the fact that we have a function that takes two arguments. This function can equally well find the index of the value in the list, or it can send an insulting letter to your grandmother.

Analysis

Not only do we get fragile code without tests and types, but our only way to understand what a function does is documentation. And since the documentation is checked by people and not by machines, it can easily turn out to be outdated ⟦or initially incorrect⟧.

Documentation
- ✗ We know the expected behavior
  
  We have nothing to tell us about the behavior of this function. You hate your grandmother. You are a monster.
Warranty
- ✓ Memory Security
  
  Python is a garbage collection language that removes this concern from us. ⟦However, as far as I know, nothing prevents you from pulling unsafe libraries or C FFI inside this function.⟧

Python with tests

 def test_happy_path(): assert x([1, 2, 3], 2) == 1 def test_missing(): assert x([1, 2, 3], 4) is None

Now we know that our function works, and if the element is missing, the result is None

?

Oh no. This is just one example. Unfortunately, the scope of our function is infinite, and no number of examples can prove the correct operation of our function. More tests - more confidence, but no number of tests will resolve all doubts.

The possibility that this function will return None

for 4

, but not for 5

, sounds rather delusional, and in this particular case this is most likely nonsense. We can be satisfied with our level of faith and dwell on a certain number of examples. But then again, then the post will be short, so let's imagine that the implementation is not so obvious.

Since the tests cannot prove something in the general case, but only show the behavior with specific examples, the tests cannot show the absence of errors. For example, there is no test that would show that our function never throws an exception or never enters the eternal cycle, or does not contain invalid links. This can only be a static analysis.

However, even if the examples are not very good in the role of evidence, they at least constitute good documentation. From these two examples we can derive the full specification ⟦under some additional a priori assumptions - those two examples also satisfy, for example, the “counterspec” “find the element in the array and return the previous one, if any,” which took me ten seconds to invent .

Analysis

Although tests can show how to use our function, and also give a little confidence that this function works correctly with at least some examples, they cannot prove anything about our code in the general case. Unfortunately, this means that tests only partially help to avoid errors.

Documentation
- ✓ We have an example of use
- ✓ We know some classes of values that will be processed correctly
- ✗ We know all types of values that will be processed correctly
  
  We have no restrictions on the types of arguments, so despite the existence of examples of what the function can handle, we do not know which types have not been tested.
- ✗ We know the expected behavior
  
  ⟦The author of the original article ticked here, I will allow myself to put a cross, given the comment above⟧
Specification
- ✓ Works in at least one case
- ✗ The returned index is always a valid index
- ✗ The returned index always indicates a suitable value
- ✗ Missing item always returns None
  
  / Nothing
Common mistakes
- ✗ No typos or incorrect names
  
  Static analysis can help, but since Python is a dynamic language with the ability to override various things at runtime, we can never prove that there are no errors.
  
  In particular, it can be very difficult or impossible to determine whether the method name is correct, since the validity of the method call depends on the runtime type of the object on which the call is made.
- ✗ No unexpected null
- ✗ Error case is always handled
  
  In my experience, this is one of the most common sources of errors: in our example, the function returns None
  
  in case of a missing element, but the code using this function can assume, for example, that it will always return a number. In addition, this can also lead to an unhandled exception.
Warranty
- ✓ Memory Security
- ✗ The function cannot be called with the wrong type
- ✗ No side effects
- ✗ No exceptions
- ✗ No errors
- ✗ No perpetual cycles

Haskell

 xyz = -- 10000

If you are unfamiliar with the syntax: this is the definition of a function x

with parameters y

and z

. In Haskell, you can omit types, since they will be deduced from the implementation ⟦if, of course, you do not use various advanced features from modern extensions of the type system⟧.

It may seem that this is not very different from the Python version, but only thanks to the fact that we wrote our function in Haskell and it is tiled, we can already talk about some interesting properties.

Analysis

Obviously, here we can draw not so many conclusions, but here are some points to note:

Documentation
- ✗ We know the expected behavior
Common mistakes
- ✓ No typos or incorrect names
  
  Since Haskell is a compiled language, all names must be resolved at compile time. The program simply will not compile if there is this error.
- ✓ No unexpected null
  
  Haskell just doesn't have null
  
  . The problem is solved!
Warranty
- ✓ Memory Security
- ✓ The function cannot be called with the wrong type
- ✓ No unexpected side effects
  
  ⟦The author of the original article did not specify this item, but I will allow myself to note that if there are side effects, the deduced type of this function will indicate their presence, so that the calling code will know about their capabilities.⟧

Haskell type specifying

 x :: Eq a => [a] -> a -> Maybe Int xyz = -- ...

Earlier, we talked about a rather conniving attitude towards the safety of grandmothers: it was clear from the tests that the function was not going to harm anyone, but was the grandmother really safe? Does this function accurately send no swear letters?

Haskell is known for being a pure functional language. This does not mean that the code cannot have side effects, but all side effects must be present in the type. We know the type of this function, we see that it is clean, so we are sure that this function does not modify any external state.

This is a very interesting property for other reasons: since we know that there are no side effects, we can understand what this function does, based only on its signature! Just search this Hoogle signature and look at the first result. Of course, this is not the only possible function that would have such a type, but the type gives us enough confidence for documentation purposes.

Analysis

Documentation
- ✓ We know expected behavior
- ✗ We have an example of use
- ✓ We know some classes of values that will be processed correctly
- ✓ We know all types of values that will be correctly processed.
Specification
- ✗ Works in at least one case.
  
  With missing tests or evidence, we don’t know if our function works at all as we expect!
- ✗ The returned index is always a valid index.
- ✗ The returned index always indicates a suitable value.
- ✗ A missing item always returns None
  
  / Nothing
  
  .
Common mistakes
- ✓ No typos or incorrect names
- ✓ No unexpected null
- ✓ Error case is always handled
  
  If our function returns Nothing
  
  , then the type system allows us to verify that this case is correctly processed by the calling code. Of course, this case can be ignored, but this must be done explicitly.
Warranty
- ✓ Memory Security
- ✓ The function cannot be called with the wrong type
- ✓ No side effects
- ✗ No exceptions
  
  I share exceptions and errors, believing that after exceptions it is possible to recover, and after errors (for example, partially defined functions) - no.
  
  For the most part, exceptions are described in types (for example, in the IO monad). In a good way, we should know that a function will not throw an exception, only on the basis of its type. However, Haskell breaks this expectation by allowing exceptions to be thrown from pure code .
  
  ⟦In addition, it is worth noting that in Haskell, errors like incorrectly calling partially defined functions are also presented as exceptions that can be caught and processed, so the difference between the two categories is slightly less obvious.⟧
- ✗ No errors
  
  We can still use partially defined functions, for example, division by zero.
- ✗ No perpetual cycles

Haskell with tests

Remember, I said earlier that tests cannot prove the absence of errors? I lied. When the stars converge correctly, and if the tests are combined with types, then it becomes possible! The first star is the finiteness of the domain of our function. Second, the domain of definition should be not only finite, but also not very large, otherwise such a test will be difficult to put into practice.

For example:

 not :: Bool -> Bool not x = ...

The input can be either True

or False

. It’s enough to test these two options, and here it is, the Holy Grail! No exceptions, perpetual cycles, incorrect results, no errors. ⟦However, for a slightly more complex function, it may not be clear how much time is devoted to tests: if they take a long time to complete, have we got into the eternal cycle, or is it just heavy? The problem of stopping her.⟧

In fact, this is also not entirely true in the case of Haskell: in each Haskell type there is also a value ⊥ (which can be obtained as undefined

, error

or in some sense as infinite recursion), but Haskellists traditionally close their eyes and believe that it does not exist.

Extracurricular Reading: There are Only Four Billion Floats – So Test Them All!

In any case, in our original example, the scope is infinite, so tests can only show that our code works for a finite set of examples.

Analysis

In this case, the tests complement the types and plug some holes in the Haskell type system. We have much more confidence in our code compared to using tests alone or types.

C

 /* C    ,    int */ int x(int *y, size_t n, int z) { /* 10000    */ }

We consider C out of interest in older type systems. In C, in particular, the types are most likely needed not by the programmer, but by the compiler to help him generate faster code.

In our example, we do not know what the function will return if the element is not found. We will have to rely on tradition or on documentation (for example, in this case it may be -1

).

We could also use out arguments: this way we can return an error and store the return value in this out argument. This is a slightly more expressive option, but we still have to rely on the documentation to understand which parameters are read and which are written. In both cases, it’s hard to understand behavior by looking at types.

 /*   ,   out- */ error_t x(int *y, size_t n, int z, size_t *w) { /* 10000    */ }

Analysis

The type system itself does not give us so many guarantees. Of course, we get some information from these types, but just compare it with the Haskell case.

Idris

 x : Eq x => List x -> x -> Maybe Int xyz = ...

This function is of the same type as in the case of Haskell. However, with a more expressive type system, we can achieve more. The choice of types can talk about the implementation.

 %default total x : Eq x => Vect nx -> x -> Maybe (Fin n) xyz = ...

This type can be read as "give me a list of size n

and some value, and I will return you either a number less than n

or Nothing

." This ensures that the function returns an index that obviously does not go beyond the bounds.

In addition, this function is total, that is, the timer has checked that it always ends. This eliminates perpetual cycles and errors.

Analysis

Specification
- ✗ Works in at least one case.
- ✓ The returned index is always the correct index
- ✗ The returned index always indicates a suitable value
- ✗ Missing item always returns None
  
  / Nothing
Warranty
- ✓ Memory Security
- ✓ The function cannot be called with the wrong type
- ✓ No side effects
- ✗ No exceptions
- ✓ No errors
- ✓ No perpetual cycles

Idris with tests

Since the language of types Idris is as expressive as the language of its terms ⟦(or rather, its provably total part)⟧, the difference between the test and type is blurred:

 ex : x [1, 2, 3] 2 = Just 1 ex = Refl

This function has a rather strange type x [1, 2, 3] 2 = Just 1

. This type means that for a successful type check, the typer must prove that x [1, 2, 3] 2

structurally equal to Just 1

. ⟦In this case, the proof is trivial, since it is enough for the tyler to normalize the terms on both sides of the equal sign, which will be done in finite time due to the totality of all the functions used, and which will lead to a unique result due to Church-Rosser. After this, one can use the reflexivity of equality, which is what Refl

.⟧

In fact, we wrote a type level test.

Idris with evidence

For completeness of analysis, we can use the full power of dependent types and prove our implementation (since dependent types in Idris are equivalent to a logical system that includes constructive first-order logic).

In particular, we can prove properties that were previously unattainable to us:

 --      Eq  DecEq x : DecEq a => Vect na -> (y : a) -> Maybe (Fin n) xyz = ... --    ,       `x` findIndexOk : DecEq a => (y : Vect na) -> (z : a) -> case xyz of Just i => index iy = z Nothing => Not (Elem zy) findIndexOk yz = ...

Type findIndexOk

can be read as “for any type a

such that there exists an algorithmically decidable comparison ( DecEq

) for it, for any vector y

elements of type a

any length n

and any value z

type a

: if xyz

returns index i

, then by this index lies z

, but if xyz

returns Nothing

, then there is no such element in the vector at all. "

⟦It is interesting that the author of the original article gives a type slightly weaker than the one given above.⟧

Now we have everything captured! What are the disadvantages? Well, writing all this evidence can be quite difficult.

Comparison

	Python	Python tests	Haskell	Haskell types	Haskell types tests	Idris	Idris tests	Idris proofs
Documentation
We know the expected behavior	✗	✗	✗	✓	✓	✓	✓	✓
There is an example of use	✗	✓	✗	✗	✓	✗	✓	✓
We know some types of suitable values.	✗	✓	✗	✓	✓	✓	✓	✓
We know all types of suitable values.	✗	✗	✗	✓	✓	✓	✓	✓
Specification
Works in at least one case	✗	✓	✗	✗	✓	✗	✓	✓
The returned index is always valid.	✗	✗	✗	✗	✗	✓	✓	✓
The returned index is always valid.	✗	✗	✗	✗	✗	✗	✗	✓
Missing element gives `None` /` Nothing`	✗	✗	✗	✗	✗	✗	✗	✓
Common mistakes
No typos or wrong names	✗	✗	✓	✓	✓	✓	✓	✓
No sudden `null`	✗	✗	✓	✓	✓	✓	✓	✓
The error case is always handled.	✗	✗	✓	✓	✓	✓	✓	✓
Warranty
Memory security	✓	✓	✓	✓	✓	✓	✓	✓
Cannot be called with the wrong type.	✗	✗	✓	✓	✓	✓	✓	✓
No side effects	✗	✗	✗	✓	✓	✓	✓	✓
No exceptions	✗	✗	✗	✗	✗	✓	✓	✓
No mistakes	✗	✗	✗	✗	✗	✓	✓	✓
No perpetual cycles	✗	✗	✗	✗	✗	✓	✓	✓

Opinion

In my opinion, the use of a modern type system in itself is most effective in terms of the relationship of the information received and guarantees to the effort expended. If you want to write fairly reliable code, then the types can be seasoned with tests. Ideally - in the style of QuickCheck.

With dependent types, the line between tests and types becomes less obvious. If you are writing software for Boeing or for pacemakers, it may be useful to write evidence.

All Articles

Tests or types

Specification

Python

Python with tests

Haskell

Haskell type specifying

Haskell with tests

C

Idris

Idris with tests

Idris with evidence

Comparison

Opinion

More articles: