Tests or types? - Rust version

A couple of days ago, 0xd34df00d published a translation of an article here that describes what you can learn about a function in different languages, if you consider it as a "black box" without using information about its implementation (but, of course, without stopping it from using the compiler). Of course, the information received is very dependent on the language - four examples were considered in the original article:

Python - dynamically typed, minimum information, only tests give some hints;
C - weakly statically typed, not much more information;
Haskell - strongly statically typed, with pure functions, much more information;
Idris is a language with dependent types, there is enough information to prove the correctness of the function during compilation.

"There is C, there is Haskell, but where is Rust ?!" - immediately the question was raised. The answer is under the cut.

Recall the condition of the problem:

Let a list and some meaning be given. It is necessary to return the index of this value in the list or indicate that this value is not in the list.

For the impatient - all the options discussed below can be seen in the Rust playground .

Let's go!

Simple search

We will start with an almost naive signature, which, in fact, differs from C code only in some idiomatic elements:

fn foo(x: &[i32], y: i32) -> Option<usize> { // 10000    }

What do we know about this feature? Well ... not so much, actually. Of course, having Option<usize>

in the return values is much better than what C provided us, but we still do not have any information about the semantics of the function. In particular, there is no guarantee that there will be no side effects, nor can there be any way to verify the expected behavior.

Can a correctly written test fix the situation? We look:

 #[test] fn test() { assert_eq!(foo(&[1, 2, 3], 2), Some(1)); assert_eq!(foo(&[1, 2, 3], 4), None); }

In general, we did not get anything new - all of the same checks we could easily do with Python (and, looking ahead, the tests will give practically nothing in the future).

Use the generics, Luke!

But is it really good that we are forced to use only signed 32-bit numbers? The mess. We fix:

 fn foo<El>(x: &[El], y: El) -> Option<usize> where El: PartialEq, { // 10000    }

Yeah! This is already something. Now we can take slice, consisting of any elements that we can compare for equality. Explicit polymorphism is almost always better than implicit and almost always better than none, is it?

However, such a function may unexpectedly pass the following test for us:

 fn refl<El: PartialEq + Copy>(el: El) -> Option<usize> { foo(&[el], el) // should always return Some(0), right? } #[test] fn dont_find_nan() { assert_eq!(refl(std::f64::NAN), None); }

This immediately indicates some flaw on our part, because according to the original specification, such a call would have to return Some(0)

. Of course, the problem here is due to the specificity of types with a partially defined comparison in general and floats in particular.

Suppose now we want to get rid of such a problem - for this we just tighten the requirements for the type El:

 fn foo<El>(x: &[El], y: El) -> Option<usize> where El: Eq, { // 10000    }

Now we demand not just the possibility of comparison for equality - we require that this comparison be an equivalence relation . This somewhat narrows the range of possible parameters, but now both types and tests suggest (albeit not explicitly) that the expected behavior should really fall into the specification.

Lyrical digression: we want to go MORE generic!

This option is not related to the original task, but, in my opinion, is a good illustration of the principle: "be liberal in what you accept, be conservative in what you do". In other words: if there is an opportunity, without prejudice to ergonomics and performance, to make the type of accepted values more general - it makes sense to do just that.

Consider this option:

 fn foo<'a, El: 'a>(x: impl IntoIterator<Item = &'a El>, y: El) -> Option<usize> where El: Eq, { // 10000    }

What do we know about this function now? Everything is the same, only now it accepts not a list or a slice as an input, but some arbitrary object that can be made to give out alternately links to objects of type El and compare them with the sought-after: an analogue in Java, if I remember correctly, was would be a function that takes an Iterable<Comparable>

.

As before, only stricter

However, for example, the guarantees offered by the compiler at already known stages are not enough for us. Or, say, we don’t want (for one reason or another) to get into a heap, but want to work on the stack — which means that we need an array instead of a vector — but at the same time we want our code to be generalized to different sizes of the array . Or we want the function to be optimized as much as possible for each specific size of the input list.

In short, we need a generic array - and Rust already has a package that provides this verbatim .

Now we have at our disposal the following code:

 use generic_array::{GenericArray, ArrayLength}; fn foo<El, Size>(x: GenericArray<El, Size>, y: El) -> Option<usize> where El: Eq, Size: ArrayLength<El>, { // 10000    }

What do we know from this code? That the function takes an array of some fixed size, reflected in its type (and compiled independently for each such size). So far, this does not change much - in the end, exactly the same guarantees, not only at the monomorphization stage, but in runtime, provided the previous version with a cut.

But we can go even further.

Type Level Arithmetic

The original article mentioned several guarantees that we received from Idris and could not get from anyone else. One of them - and perhaps the simplest one, because for it you don’t even need to write a full proof or a full test, but just specify the type a little bit - it says that the return value, if it exists (i.e. if it is not Nothing

), is guaranteed not to exceed the length of the input list.

It would seem that the necessary condition for such a guarantee is the presence of dependent types, well, or at least some kind of similarity, and it would be strange to expect something like this from Rust, right?

Meet - typenum . With it, our function can be depicted like this:

 use generic_array::{ArrayLength, GenericArray}; use typenum::{IsLess, Unsigned, B1}; trait UnsignedLessThan<T> { fn as_usize(&self) -> usize; } impl<Less, More> UnsignedLessThan<More> for Less where Less: IsLess<More, Output = B1>, Less: Unsigned, { fn as_usize(&self) -> usize { <Self as Unsigned>::USIZE } } fn foo<El, Size>(x: GenericArray<El, Size>, y: El) -> Option<Box<dyn UnsignedLessThan<Size>>> where El: Eq, Size: ArrayLength<El>, { // 10000    }

"What the hell is this black magic ?!" - you ask. And you will certainly be right: typenum is that black magic, and attempts to use it at least somehow sanely are doubly.

Nevertheless, the signature of this function is quite unambiguous.

The function accepts an array of El elements of length Size and one element of type El.
The function returns an Option value, which, if it is Some,
- It is a trait object based on the UnsignedLessThan<T>
  
  , which takes the Size type as a parameter;
- in turn, the UnsignedLessThan<T>
  
  IsLess<T>
  
  implemented for all types that implement Unsigned
  
  and IsLess<T>
  
  for which IsLess<T>
  
  returns B1, i.e. true

In other words, in this way we wrote a function that guaranteed to return a non-negative (unsigned) number smaller than the original size of the array (or rather, of course, it returns this same trait object, from which we must later call the as_usize

method, guaranteed to return such a number) .

There are exactly two tricks in this approach:

We can noticeably lose in performance. If suddenly, for some reason, such our function is in the "hot" part of the program, the constant need for dynamic calls may be one of the slowest operations. However, this drawback may well not be as significant as it seems, but there is a second:
In order for this function to compile correctly, we will need to either actually write inside it the proof of the correctness of its operation, or "trick" the type system through unsafe

. The first is too complicated for the Friday article, but the second is simply a scam.

Conclusion

Of course, in practice, in such cases, either the second implementation (receiving a slice of an arbitrary type) or the implementation under a spoiler (receiving an iterable object) will be used. All subsequent arguments almost certainly are of no practical interest and serve solely as an exercise in working with a type system.

Nevertheless, the fact that on the Rust type system can be able to emulate one of the features of the obviously stronger Idris type system is, in my opinion, quite indicative.

All Articles