Tests or types

Hi, Habr. The other day I was looking for how to do something in Idris, and came across a good post, a free translation of which seems quite appropriate. Liberties and gag, where necessary, I will denote β€œhere by such squiggles at the beginning and at the end”.







When to use tests, and when - types? What information and what guarantees do we receive in exchange for our efforts to write them?







We will look at a simple and slightly contrived example expressed in Python, C, Haskell, and Idris. We will also see what can be said about the implementation without any additional knowledge about it, in each case.







We will not take into account the various backdoors that allow explicitly violating the language guarantees (for example, C extensions, unsafePerformIO



in Haskell, unsafe type conversions), otherwise no conclusions could be made at all, and this post would have turned out to be quite short. ⟦In addition, the same Haskell has a subset of Safe Haskell that explicitly and transitively prohibits the use of these and a number of other tricks that could violate the integrity of the language.⟧







Specification



Let a list and some meaning be given. It is necessary to return the index of this value in the list or indicate that this value is not in the list.

The implementation of this specification is trivial, so it’s natural to ask, where do the tests or types in general. However, the properties and methods of reasoning that we will talk about today are applicable to a much more complex code. Let the implementation take ten thousand lines of unreadable spaghetti code, if that helps to see their usefulness.







Python



 def x(y, z): # 10000   
      
      





We note right away that we are not interested in the unchecked ⟦and semantics-not affecting properties of a program like variable naming and text documentation, so I intentionally did not write code that helps perception. We are only interested in the fact that, subject to passing tests and type checks, it cannot be untrue .







In the code above, there is practically no useful information other than the fact that we have a function that takes two arguments. This function can equally well find the index of the value in the list, or it can send an insulting letter to your grandmother.







Analysis







Not only do we get fragile code without tests and types, but our only way to understand what a function does is documentation. And since the documentation is checked by people and not by machines, it can easily turn out to be outdated ⟦or initially incorrect⟧.









Python with tests



 def test_happy_path(): assert x([1, 2, 3], 2) == 1 def test_missing(): assert x([1, 2, 3], 4) is None
      
      





Now we know that our function works, and if the element is missing, the result is None



?







Oh no. This is just one example. Unfortunately, the scope of our function is infinite, and no number of examples can prove the correct operation of our function. More tests - more confidence, but no number of tests will resolve all doubts.







The possibility that this function will return None



for 4



, but not for 5



, sounds rather delusional, and in this particular case this is most likely nonsense. We can be satisfied with our level of faith and dwell on a certain number of examples. But then again, then the post will be short, so let's imagine that the implementation is not so obvious.







Since the tests cannot prove something in the general case, but only show the behavior with specific examples, the tests cannot show the absence of errors. For example, there is no test that would show that our function never throws an exception or never enters the eternal cycle, or does not contain invalid links. This can only be a static analysis.







However, even if the examples are not very good in the role of evidence, they at least constitute good documentation. From these two examples we can derive the full specification ⟦under some additional a priori assumptions - those two examples also satisfy, for example, the β€œcounterspec” β€œfind the element in the array and return the previous one, if any,” which took me ten seconds to invent .







Analysis







Although tests can show how to use our function, and also give a little confidence that this function works correctly with at least some examples, they cannot prove anything about our code in the general case. Unfortunately, this means that tests only partially help to avoid errors.









Haskell



 xyz = -- 10000  
      
      





If you are unfamiliar with the syntax: this is the definition of a function x



with parameters y



and z



. In Haskell, you can omit types, since they will be deduced from the implementation ⟦if, of course, you do not use various advanced features from modern extensions of the type system⟧.







It may seem that this is not very different from the Python version, but only thanks to the fact that we wrote our function in Haskell and it is tiled, we can already talk about some interesting properties.







Analysis







Obviously, here we can draw not so many conclusions, but here are some points to note:









Haskell type specifying



 x :: Eq a => [a] -> a -> Maybe Int xyz = -- ...
      
      





Earlier, we talked about a rather conniving attitude towards the safety of grandmothers: it was clear from the tests that the function was not going to harm anyone, but was the grandmother really safe? Does this function accurately send no swear letters?







Haskell is known for being a pure functional language. This does not mean that the code cannot have side effects, but all side effects must be present in the type. We know the type of this function, we see that it is clean, so we are sure that this function does not modify any external state.







This is a very interesting property for other reasons: since we know that there are no side effects, we can understand what this function does, based only on its signature! Just search this Hoogle signature and look at the first result. Of course, this is not the only possible function that would have such a type, but the type gives us enough confidence for documentation purposes.







Analysis









Haskell with tests



Remember, I said earlier that tests cannot prove the absence of errors? I lied. When the stars converge correctly, and if the tests are combined with types, then it becomes possible! The first star is the finiteness of the domain of our function. Second, the domain of definition should be not only finite, but also not very large, otherwise such a test will be difficult to put into practice.







For example:







 not :: Bool -> Bool not x = ...
      
      





The input can be either True



or False



. It’s enough to test these two options, and here it is, the Holy Grail! No exceptions, perpetual cycles, incorrect results, no errors. ⟦However, for a slightly more complex function, it may not be clear how much time is devoted to tests: if they take a long time to complete, have we got into the eternal cycle, or is it just heavy? The problem of stopping her.⟧







In fact, this is also not entirely true in the case of Haskell: in each Haskell type there is also a value βŠ₯ (which can be obtained as undefined



, error



or in some sense as infinite recursion), but Haskellists traditionally close their eyes and believe that it does not exist.







Extracurricular Reading: There are Only Four Billion Floats – So Test Them All!







In any case, in our original example, the scope is infinite, so tests can only show that our code works for a finite set of examples.







Analysis

In this case, the tests complement the types and plug some holes in the Haskell type system. We have much more confidence in our code compared to using tests alone or types.







C



 /* C    ,    int */ int x(int *y, size_t n, int z) { /* 10000    */ }
      
      





We consider C out of interest in older type systems. In C, in particular, the types are most likely needed not by the programmer, but by the compiler to help him generate faster code.







In our example, we do not know what the function will return if the element is not found. We will have to rely on tradition or on documentation (for example, in this case it may be -1



).







We could also use out arguments: this way we can return an error and store the return value in this out argument. This is a slightly more expressive option, but we still have to rely on the documentation to understand which parameters are read and which are written. In both cases, it’s hard to understand behavior by looking at types.







 /*   ,   out- */ error_t x(int *y, size_t n, int z, size_t *w) { /* 10000    */ }
      
      





Analysis

The type system itself does not give us so many guarantees. Of course, we get some information from these types, but just compare it with the Haskell case.







Idris



 x : Eq x => List x -> x -> Maybe Int xyz = ...
      
      





This function is of the same type as in the case of Haskell. However, with a more expressive type system, we can achieve more. The choice of types can talk about the implementation.







 %default total x : Eq x => Vect nx -> x -> Maybe (Fin n) xyz = ...
      
      





This type can be read as "give me a list of size n



and some value, and I will return you either a number less than n



or Nothing



." This ensures that the function returns an index that obviously does not go beyond the bounds.







In addition, this function is total, that is, the timer has checked that it always ends. This eliminates perpetual cycles and errors.







Analysis









Idris with tests



Since the language of types Idris is as expressive as the language of its terms ⟦(or rather, its provably total part)⟧, the difference between the test and type is blurred:







 ex : x [1, 2, 3] 2 = Just 1 ex = Refl
      
      





This function has a rather strange type x [1, 2, 3] 2 = Just 1



. This type means that for a successful type check, the typer must prove that x [1, 2, 3] 2



structurally equal to Just 1



. ⟦In this case, the proof is trivial, since it is enough for the tyler to normalize the terms on both sides of the equal sign, which will be done in finite time due to the totality of all the functions used, and which will lead to a unique result due to Church-Rosser. After this, one can use the reflexivity of equality, which is what Refl



.⟧







In fact, we wrote a type level test.







Idris with evidence



For completeness of analysis, we can use the full power of dependent types and prove our implementation (since dependent types in Idris are equivalent to a logical system that includes constructive first-order logic).







In particular, we can prove properties that were previously unattainable to us:







 --      Eq  DecEq x : DecEq a => Vect na -> (y : a) -> Maybe (Fin n) xyz = ... --    ,       `x` findIndexOk : DecEq a => (y : Vect na) -> (z : a) -> case xyz of Just i => index iy = z Nothing => Not (Elem zy) findIndexOk yz = ...
      
      





Type findIndexOk



can be read as β€œfor any type a



such that there exists an algorithmically decidable comparison ( DecEq



) for it, for any vector y



elements of type a



any length n



and any value z



type a



: if xyz



returns index i



, then by this index lies z



, but if xyz



returns Nothing



, then there is no such element in the vector at all. "







⟦It is interesting that the author of the original article gives a type slightly weaker than the one given above.⟧







Now we have everything captured! What are the disadvantages? Well, writing all this evidence can be quite difficult.







Comparison



Python Python

tests
Haskell Haskell

types
Haskell

types

tests
Idris Idris

tests
Idris

proofs
Documentation
We know the expected behavior βœ— βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“
There is an example of use βœ— βœ“ βœ— βœ— βœ“ βœ— βœ“ βœ“
We know some types of suitable values. βœ— βœ“ βœ— βœ“ βœ“ βœ“ βœ“ βœ“
We know all types of suitable values. βœ— βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“
Specification
Works in at least one case βœ— βœ“ βœ— βœ— βœ“ βœ— βœ“ βœ“
The returned index is always valid. βœ— βœ— βœ— βœ— βœ— βœ“ βœ“ βœ“
The returned index is always valid. βœ— βœ— βœ— βœ— βœ— βœ— βœ— βœ“
Missing element gives `None` /` Nothing` βœ— βœ— βœ— βœ— βœ— βœ— βœ— βœ“
Common mistakes
No typos or wrong names βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“ βœ“
No sudden `null` βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“ βœ“
The error case is always handled. βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“ βœ“
Warranty
Memory security βœ“ βœ“ βœ“ βœ“ βœ“ βœ“ βœ“ βœ“
Cannot be called with the wrong type. βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“ βœ“
No side effects βœ— βœ— βœ— βœ“ βœ“ βœ“ βœ“ βœ“
No exceptions βœ— βœ— βœ— βœ— βœ— βœ“ βœ“ βœ“
No mistakes βœ— βœ— βœ— βœ— βœ— βœ“ βœ“ βœ“
No perpetual cycles βœ— βœ— βœ— βœ— βœ— βœ“ βœ“ βœ“


Opinion



In my opinion, the use of a modern type system in itself is most effective in terms of the relationship of the information received and guarantees to the effort expended. If you want to write fairly reliable code, then the types can be seasoned with tests. Ideally - in the style of QuickCheck.







With dependent types, the line between tests and types becomes less obvious. If you are writing software for Boeing or for pacemakers, it may be useful to write evidence.








All Articles