The unreasonable effectiveness of multiple dispatch

Below the cut is a transcript of a talk by Stefan Karpinski, one of the core developers of the Julia language. In the talk he discusses the unexpected consequences of the convenience and efficiency of multiple dispatch, adopted as Julia's core paradigm.









From the translator: the title of the talk alludes to Eugene Wigner's article "The Unreasonable Effectiveness of Mathematics in the Natural Sciences."







Multiple dispatch is the key paradigm of the Julia language, and over the years of its existence we, the developers of the language, have noticed something that we expected, yet that still puzzles us. At least, we did not expect it to the extent that we now see it. This something is a staggering level of code reuse in the Julia ecosystem, much higher than in any other language I know.







We constantly see that one person writes generic code, someone else defines a new data type, these people are not acquainted with each other, and then a third person applies that code to that new data type... and everything just works. And this happens surprisingly often.

I always thought this was the behavior one should expect from object-oriented programming, but I have used many object-oriented languages, and it turns out that in them things usually just don't work. So at some point I asked myself: why is Julia so effective in this regard? Why is the level of code reuse so high? And also: what lessons can other languages borrow from Julia in order to become better?







Sometimes when I say this, people don't believe me, but you are at JuliaCon, so you already know what is going on, and I will focus on why, in my opinion, this happens.







But to start with, here is one of my favorite examples.













The slide shows a result by Chris Rackauckas. He writes very generic packages for solving differential equations: you can feed them dual numbers, BigFloat, whatever you want. At some point he decided he wanted to see the error bounds of an integration result. There is a package, Measurements.jl, that can track both the value of a physical quantity and the propagation of its measurement error through a sequence of formulas; it also supports an elegant syntax for uncertain values using the Unicode character ±. The slide shows that the gravitational acceleration, the length of the pendulum, the initial velocity, and the deflection angle are all known with some error. You define a simple pendulum, pass its equations of motion through an ODE solver and - bam! - everything works, and you see a plot with error whiskers. And I am not even showing that the plotting code is also generic: you just feed it values with errors from Measurements.jl and get a plot with error bars.
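For reference, a minimal sketch of what such code looks like. The package names (DifferentialEquations.jl, Measurements.jl) are real, but the exact numbers and variable names here are my assumptions, not the slide's:

    using DifferentialEquations, Measurements

    g = 9.79 ± 0.02                      # gravitational acceleration, with uncertainty
    L = 1.00 ± 0.01                      # pendulum length
    u0 = [π / 60 ± 0.01, 0.0 ± 0.0]      # initial angle and angular velocity

    # Equations of motion of a simple pendulum, in the form the solver expects.
    function pendulum!(du, u, p, t)
        du[1] = u[2]
        du[2] = -(g / L) * sin(u[1])
    end

    prob = ODEProblem(pendulum!, u0, (0.0, 6.3))
    sol = solve(prob, Tsit5())           # every computed value carries its propagated error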







The level of compatibility between different packages and of code genericity is simply mind-blowing. Can it really just work like that? It turns out it can.







Well, it's not that we did not expect this at all. After all, we built multiple dispatch into the language precisely because it makes it possible to express generic algorithms. So all of the above is not that crazy. But it is one thing to know this in theory, and another to see in practice that the approach really works. After all, single dispatch and operator overloading in C++ should give a similar result - but in reality they often do not work the way you would like.







Moreover, we are witnessing something more than what we expected when designing the language: it is not just that generic code gets written. Below I will try to explain what, in my opinion, this "more" is.







So, there are two kinds of code reuse, and they are quite different. One is generic algorithms, and that is the first thing people think of. The second, less obvious but apparently more important aspect is the ease with which the same data types are used across a wide variety of Julia packages. To some extent this happens because a type's methods are not an obstacle to using it: you do not need to agree with the type's author about interfaces or inherited methods; you can simply say: "Oh, I like this RGB type. I'll come up with my own operations on it, but I like its structure."







Foreword. Multiple dispatch versus function overloading



First I have to mention function overloading in C++ and Java, since I am constantly asked about it. At first glance it is no different from multiple dispatch. So what is the difference, and why is function overloading worse?







I'll start with an example in Julia:







    abstract type Pet end

    struct Dog <: Pet; name::String end
    struct Cat <: Pet; name::String end

    function encounter(a::Pet, b::Pet)
        verb = meets(a, b)
        println("$(a.name) meets $(b.name) and $verb")
    end

    meets(a::Dog, b::Dog) = "sniffs"
    meets(a::Dog, b::Cat) = "chases"
    meets(a::Cat, b::Dog) = "hisses"
    meets(a::Cat, b::Cat) = "slinks"





We define an abstract type Pet, introduce its subtypes Dog and Cat, each with a name field (the code repeats itself a little, but it is tolerable), and define a generic "encounter" function that takes two arguments of type Pet. In it, we first compute the "action" by calling the generic meets() function, and then print a sentence describing the encounter. In meets(), we use multiple dispatch to determine what action one animal performs when it meets another.







Let's add a couple of dogs and a couple of cats and look at the results of their encounters:







 fido = Dog("Fido") rex = Dog("Rex") whiskers = Cat("Whiskers") spots = Cat("Spots") encounter(fido, rex) encounter(rex, whiskers) encounter(spots, fido) encounter(whiskers, spots)
      
      





Now let's "translate" the same code into C++ as literally as possible. We define a Pet class with a name field - in C++ we can do that (by the way, one of the advantages of C++ is that data fields can be added even to abstract types). Then we define the fallback meets() function, define the encounter() function for two objects of type Pet and, finally, define the derived classes Dog and Cat and overload meets() for them:







    #include <iostream>
    #include <string>
    using namespace std;

    class Pet {
      public:
        string name;
    };

    string meets(Pet a, Pet b) { return "FALLBACK"; }

    void encounter(Pet a, Pet b) {
        string verb = meets(a, b);
        cout << a.name << " meets " << b.name << " and " << verb << endl;
    }

    class Cat : public Pet {};
    class Dog : public Pet {};

    string meets(Dog a, Dog b) { return "sniffs"; }
    string meets(Dog a, Cat b) { return "chases"; }
    string meets(Cat a, Dog b) { return "hisses"; }
    string meets(Cat a, Cat b) { return "slinks"; }





The main() function, as in the Julia code, creates the dogs and cats and makes them meet:







    int main() {
        Dog fido;     fido.name     = "Fido";
        Dog rex;      rex.name      = "Rex";
        Cat whiskers; whiskers.name = "Whiskers";
        Cat spots;    spots.name    = "Spots";

        encounter(fido, rex);
        encounter(rex, whiskers);
        encounter(spots, fido);
        encounter(whiskers, spots);

        return 0;
    }





So: multiple dispatch versus function overloading. Gong!













What do you think the code with multiple dispatch will print?







$ julia pets.jl

    Fido meets Rex and sniffs
    Rex meets Whiskers and chases
    Spots meets Fido and hisses
    Whiskers meets Spots and slinks





The animals meet, sniff, hiss, and chase one another - just as intended.







$ g++ -o pets pets.cpp && ./pets

    Fido meets Rex and FALLBACK
    Rex meets Whiskers and FALLBACK
    Spots meets Fido and FALLBACK
    Whiskers meets Spots and FALLBACK





In all cases, the "fallback" option is returned.







Why? Because this is how function overloading works. If multiple dispatch were at work, meets(a, b) inside encounter() would be called with the concrete types that a and b have at the time of the call. But overloading is resolved against the static types of a and b, which in this case are both Pet.







So in C++, a direct "translation" of the generic Julia code does not give the desired behavior, because the compiler resolves the call using types derived statically at compile time. And the whole point is that we want to call a function based on the actual concrete types that the variables have at run time. Function templates, although they improve the situation somewhat, still require all types involved in the expression to be known statically at compile time, and it is easy to come up with an example where that is impossible.
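To make the contrast concrete, here is a small sketch (reusing the Pet types defined above) where the static element type is the abstract Pet, yet Julia's dispatch still picks the method for the runtime types:

    pets = Pet[Dog("Fido"), Cat("Whiskers")]   # an array whose element type is the abstract Pet
    meets(pets[1], pets[2])                    # "chases": dispatch sees the runtime types Dog and Cat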







For me, examples like this show that multiple dispatch does the right thing, and all the other approaches are just more or less poor approximations of the correct result.







Now let's look at the following table. I hope you will find it meaningful:







    Dispatch type   Syntax            Dispatch arguments   Expressive power   Growth
    none            f(x1, x2, ...)    {}                   O(1)               constant
    single          x1.f(x2, ...)     {x1}                 O(|X1|)            linear
    multiple        f(x1, x2, ...)    {x1, x2, ...}        O(|X1|·|X2|·...)   exponential


In languages without dispatch you simply write f(x, y, ...); the types of all arguments are fixed, i.e. a call to f() is a call to the single function f() that exists in the program. The expressive power is constant: calling f() always does one and the same thing. Single dispatch was the major breakthrough of the transition to OOP in the 1990s and 2000s. It usually comes with the dot syntax, which people really like, and it adds expressive power: the call is dispatched on the type of the object x1. That power is characterized by the cardinality |X1| of the set of types that have an f() method. With multiple dispatch, the number of potential behaviors of the function f() equals the cardinality of the Cartesian product of the sets of types the arguments may belong to. In practice, of course, hardly anyone needs that many different methods in one program. The key point, though, is that the programmer is given a simple and natural way to use any element of this variety, and that leads to an exponential growth of expressive possibilities.
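A toy illustration of this combinatorics (the function name f and the argument types are mine, purely for demonstration): with just two argument types, a two-argument function already admits four independent behaviors, and the count multiplies with every extra type or argument.

    f(x::Int,    y::Int)    = "Int, Int"
    f(x::Int,    y::String) = "Int, String"
    f(x::String, y::Int)    = "String, Int"
    f(x::String, y::String) = "String, String"

    f(1, 2), f(1, "a"), f("a", 1), f("a", "b")   # all four methods are reachable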







Part 1. Generic programming



Let's talk about generic code - the headline feature of multiple dispatch.







Here is a (completely artificial) example of generic code:







    using LinearAlgebra

    function inner_sum(A, vs)
        t = zero(eltype(A))
        for v in vs
            t += inner(v, A, v)   # multiple dispatch!
        end
        return t
    end

    inner(v, A, w) = dot(v, A * w)   # generic definition





Here A is something matrix-like (although I did not specify any types, you can guess from the names), vs is a vector of some vector-like elements, and then a scalar product "through" this matrix is computed, for which a generic definition is given with no type annotations at all. The generic programming here is precisely that call to the inner() function in the loop (professional tip: if you want to write generic code, just remove all type annotations).







So, "look, mom, it works":







    julia> A = rand(3, 3)
    3×3 Array{Float64,2}:
     0.934255  0.712883  0.734033
     0.145575  0.148775  0.131786
     0.631839  0.688701  0.632088

    julia> vs = [rand(3) for _ in 1:4]
    4-element Array{Array{Float64,1},1}:
     [0.424535, 0.536761, 0.854301]
     [0.715483, 0.986452, 0.82681]
     [0.487955, 0.43354, 0.634452]
     [0.100029, 0.448316, 0.603441]

    julia> inner_sum(A, vs)
    6.825340887556694





Nothing special, it computes some value. But the code is written in a generic style and will work for any A and vs, as long as the corresponding operations can be performed on them.







As for efficiency on specific data types - that depends. For dense vectors and matrices this code will do "the right thing": it will compile down to machine code that invokes BLAS operations, and so on. If you pass static arrays, the compiler will take that into account, unroll the loops, apply vectorization - again, everything as it should be.
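For instance, here is a minimal sketch of the static-array case (assuming the StaticArrays.jl package; the sizes are arbitrary). The very same inner_sum() is called, and the specialization happens entirely in the compiler:

    using StaticArrays

    A  = @SMatrix rand(3, 3)              # the size is part of the type
    vs = [@SVector rand(3) for _ in 1:4]

    inner_sum(A, vs)   # same generic code, now specialized for 3×3 static arrays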







But more importantly, the code will work for new types, and you can make it not just super-efficient, but super-duper-efficient! Let's define a new type - a real data type used in machine learning - the one-hot vector. This is a vector in which one component is 1 and all the others are zero. It can be represented very compactly: all you need to store is the length of the vector and the index of the nonzero component.







    import Base: size, getindex, *

    struct OneHotVector <: AbstractVector{Int}
        len :: Int
        ind :: Int
    end

    size(v::OneHotVector) = (v.len,)
    getindex(v::OneHotVector, i::Integer) = Int(i == v.ind)





In fact, this is essentially the entire type definition from the package that provides it. And with this definition, inner_sum() just works:







    julia> vs = [OneHotVector(3, rand(1:3)) for _ in 1:4]
    4-element Array{OneHotVector,1}:
     [0, 1, 0]
     [0, 0, 1]
     [1, 0, 0]
     [1, 0, 0]

    julia> inner_sum(A, vs)
    2.6493739294755123





But the scalar product here falls back to the generic definition, which is slow for this data type - not cool!







So generic definitions work, but not always optimally, and you may occasionally run into this when using Julia: "ah, the generic fallback got called, that's why this GPU code has been running for five hours now..."







Inside inner(), the generic definition of matrix-vector multiplication is called by default; multiplied by a one-hot vector, it returns a copy of one of the columns as a Vector{Float64}. Then the generic definition of the scalar product dot() is called on the one-hot vector and that column, and it does a lot of unnecessary work: in effect, for every component it asks "are you equal to one? and you?", and so on.







We can optimize this procedure a great deal. For example, replace multiplication of a matrix by a OneHotVector with simply selecting a column. Fine - define this method, and that's it.







    *(A::AbstractMatrix, v::OneHotVector) = A[:, v.ind]





And here is the power: we say "we want to dispatch on the second argument," no matter what is in the first. This definition simply pulls the column out of the matrix, and it is much faster than the generic method - the iteration and summation over the columns is gone.
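A quick sanity check (the concrete values here are mine, purely illustrative):

    A = rand(3, 3)
    v = OneHotVector(3, 2)
    A * v == A[:, 2]   # true: the specialized method just selects the column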







But we can go further and optimize inner() directly, because a product of two one-hot vectors "through" a matrix simply extracts one element of that matrix:







    inner(v::OneHotVector, A, w::OneHotVector) = A[v.ind, w.ind]





That is the promised super-duper efficiency. And all that was needed was to define this one inner() method.
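Again a quick check (reusing the 3×3 matrix A from above; the values are mine):

    v, w = OneHotVector(3, 1), OneHotVector(3, 3)
    inner(v, A, w) == A[1, 3]   # true: a single array lookup instead of a matrix-vector product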







This example shows one of the uses of multiple dispatch: there is a generic definition of a function, but for some data types it is not optimal. So we point-wise add a method that preserves the function's behavior for those types but works far more efficiently.







But there is another situation: there is no generic definition of a function at all, and I want to add functionality for certain types. Then I can add it with minimal effort.







And the third case: you simply want the same function name with different behavior for different data types - for example, a function that behaves differently for dictionaries and for arrays.
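A tiny sketch of that third case (the function name describe is mine, purely for illustration):

    describe(d::AbstractDict)  = "a dictionary with $(length(d)) entries"
    describe(a::AbstractArray) = "an array of $(length(a)) elements"

    describe(Dict(:a => 1, :b => 2))   # "a dictionary with 2 entries"
    describe([1, 2, 3])                # "an array of 3 elements"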







How do you get similar behavior in single-dispatch languages? It is possible, but hard. The problem: when overloading *, we needed to dispatch on the second argument, not the first. You can do double dispatch: first dispatch on the first argument and call the matrix's method AbstractMatrix.*(v); that method in turn calls something like v.__rmul__(A), i.e. the second argument of the original call has now become the object whose method is actually invoked. The name __rmul__ is taken from Python, where this is a standard pattern - but it only exists for a handful of operators such as addition and multiplication. In other words, the double-dispatch problem is solved if the function we want to call is named + or *; otherwise - alas, not our day. In C++ and other languages you have to build your own contraption.







OK, and what about inner()? Now there are three arguments, and dispatch is on the first and the third. What to do in single-dispatch languages is unclear; I have never seen "triple dispatch" in the wild. There are no good solutions. Usually, when such a need arises (and in numerical code it arises quite often), people eventually implement their own multiple dispatch system. If you look at large numerical computing projects in Python, you will be amazed how many of them go down this path. Naturally, such implementations are ad hoc, poorly designed, buggy, and slow (a reference to Greenspun's tenth rule - approx. transl.), because Jeff Bezanson did not work on any of those projects (Jeff is the author and chief developer of the method dispatch system in Julia - approx. transl.).







Part 2. Generic types



Now to the other side of the Julia paradigm: generic types. This, in my opinion, is the main "workhorse" of the language, because it is in this area that I observe the high level of code reuse.







For example, suppose you have an RGB type, like the one in ColorTypes.jl. There is nothing complicated about it: it just gathers three values together. For simplicity, assume the type is not parametric (though it could be), and that its author defined the handful of basic operations he found useful. You take this type and think: "Hmm, I would like a few more operations on it." For example, to treat RGB as a vector space (which, strictly speaking, is incorrect, but works as a first approximation). In Julia, you simply add the missing operations in your own code.
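A minimal sketch of what "just add the operations" looks like, assuming the RGB type from ColorTypes.jl with fields r, g, b (the actual vector-space treatment of colors lives in the ColorVectorSpace.jl package):

    using ColorTypes
    import Base: +, *

    # Treat RGB as a vector space: componentwise addition and scaling.
    +(a::RGB, b::RGB) = RGB(a.r + b.r, a.g + b.g, a.b + b.b)
    *(s::Real, c::RGB) = RGB(s * c.r, s * c.g, s * c.b)

    0.5 * (RGB(1.0, 0.0, 0.0) + RGB(0.0, 1.0, 0.0))   # blend two colors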







The question arises: so what? Why am I making such a big deal of this? It turns out that in class-based object-oriented languages this approach is surprisingly hard to pull off. Since in those languages method definitions live inside the class definition, there are only two ways to add a method: either edit the class itself to add the desired behavior, or create a subclass with the necessary methods.







The first option bloats the definition of the base class, and it also forces the maintainer of the base class to keep supporting all the added methods whenever the code changes - which may one day render such a class unmaintainable.







Inheritance is the classic "recommended" option, but it has flaws too. First, you have to change the class name: now it is not RGB but MyRGB. Moreover, the new methods will not work on the original RGB class; if I want to apply my new method to an RGB object created in someone else's code, I have to convert it or wrap it in MyRGB. And that is not the worst of it. If I made a MyRGB class with some added functionality, and someone else made OurRGB, and so on - then anyone who wants a class with all the new functionality needs multiple inheritance (and that is only if the language allows it at all!).







So both options are so-so. There are, admittedly, other workarounds in various languages - extension methods, monkey patching, and the like - each with its own costs.









The key feature of Julia with respect to code reuse boils down almost entirely to the fact that methods are defined outside of types. That's it. Do the same in single-dispatch languages - and types could be reused just as easily. The whole idea of "let's make methods part of the class" is, in fact, a so-so idea. True, it has one good side: using classes as namespaces. If I write x.f(y), then f() is not required to be in the current namespace; it is looked up in the namespace of x. Yes, that is a nice thing - but is it worth all the other trouble? I don't know. In my opinion, no (though my opinion, as you may guess, is slightly biased).







Epilogue. The expression problem



There is a programming problem that was noticed back in the 1970s. It is often associated with static type checking, because it was first observed in statically typed languages, but I really do not think it has anything to do with static checking. The essence of the problem is this: can you extend both the data model and the set of operations on the data at the same time, without resorting to dubious tricks?







The problem can more or less be reduced to the following:







  1. Is it possible, easily and without errors, to add new data types to which the existing operations apply?
  2. Is it possible to add new operations on existing types?


(1) is easy in object-oriented languages and hard in functional ones; (2) is the other way around. In this sense, one can speak of a duality between the OOP and FP approaches.







In languages with multiple dispatch, both operations are easy: (1) is solved by adding methods to existing functions for the new types, and (2) by defining new functions on existing types. This is exactly the expression problem ( https://en.wikipedia.org/wiki/Expression_problem ), and multiple dispatch resolves both halves of it at once.
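A small sketch of both directions at once (the Point type and the norm2 function are mine, purely for illustration):

    # (1) A new type to which an existing operation applies: extend Base.+ for it.
    struct Point
        x::Float64
        y::Float64
    end
    Base.:+(a::Point, b::Point) = Point(a.x + b.x, a.y + b.y)

    # (2) A new operation on existing types: define it outside those types.
    norm2(p::Point) = p.x^2 + p.y^2
    norm2(x::Real)  = x^2            # Real existed long before norm2 did

    Point(1, 2) + Point(3, 4)        # Point(4.0, 6.0)
    norm2(3)                         # 9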





















