Brandon Rhodes is a very modest person who presents himself on Twitter as "a Python programmer who repays a loan to the community in the form of reports or essays." The number of these “reports and essays” is impressive, as is the number of free projects that Brandon was or is a contributor to. And Brandon has published two books and is writing a third.
In comments on Habr I very often run into a fundamental misunderstanding or rejection of dynamic languages, dynamic typing, generic programming and other paradigms. I am publishing this authorized (abridged) translation of a transcript of one of Brandon's talks in the hope that it will help programmers who live in the paradigms of static languages to better understand dynamic languages, Python in particular.
As is customary with me, please inform me in PM of my mistakes and typos.
What does the phrase “limiting case” mean in the title of my talk? A limiting case arises when you step through a sequence of options until you reach the extreme value. Take, for example, a regular n-sided polygon. If n = 3 it is a triangle, n = 4 is a quadrilateral, n = 5 is a pentagon, and so on. As n approaches infinity, the sides become shorter and more numerous, and the outline of the polygon comes to resemble a circle. Thus, the circle is the limiting case of regular polygons. This is what happens when an idea is taken to its limit.
I want to talk about Python as the limiting case of C++. If you take all the good ideas from C++ and refine them to their logical conclusion, I am sure you will arrive at Python as naturally as a series of polygons arrives at a circle.
I became interested in Python in the 90s. It was a period in my life when I was getting rid of "non-core assets", as I call them. Many things began to bore me. Interrupts, for example. Remember how many computer boards once had pins with jumpers? And you set those jumpers according to the manual so that the video card got a higher-priority interrupt and your game ran faster? Well, I got tired of allocating and freeing memory with malloc() and free() at about the same time I stopped tuning my computer's performance with jumpers. It was 1997 or so.
I mean, when we study a process, we usually strive for complete control over it, to have every possible lever and button at hand. Some people remain fascinated by that degree of control. But it is in my character that, as soon as I get used to the controls and understand what is what, I immediately start looking for a way to give up some of my powers and hand the levers and buttons over to a machine, so that it assigns the interrupts for me.
So, in the late 90s, I was looking for a programming language that would let me focus on the problem domain and on modeling the task, rather than worrying about which region of the computer's memory my data lives in. How can we simplify C++ without repeating the sins of the well-known scripting languages?
For example, I could not use Perl, and do you know why? That dollar sign! It immediately made it clear that the creator of Perl did not understand how programming languages work. You use the dollar sign in Bash to separate variable names from the rest of the string, because a Bash program consists of commands and their parameters taken literally. But once you get to know programming languages in which strings are placed between pairs of small characters called quotation marks, rather than spread across the entire program text, you begin to perceive $ as visual garbage. The dollar sign is useless, it is ugly, it must go! If you want to design a language for serious programming, you should not use special characters to mark variables.
What about the syntax? Take C as a basis! It works pretty well. Let assignment be denoted by an equal sign. Not every language uses this notation, but, one way or another, many people are used to it. But let's not make assignment an expression. The users of our language will be not only professional programmers, but also schoolchildren, scientists and data scientists (if you don't know which of these categories writes the worst code, here is a hint: it is not the schoolchildren). We will not let users change the state of variables in unexpected places, so we will make assignment a statement.
What, then, should denote equality if the equal sign is already taken by assignment? A double equal sign, of course, as in C! Many are already used to it. We will also borrow from C the symbols for all arithmetic and bitwise operations, because these symbols work, and many people are quite happy with them.
Of course, we can improve on some things. What do you think of when you see a percent sign in program text? String interpolation, of course! Although % is primarily the modulo operator, it was simply undefined for strings. And if so, why not reuse it?
Numeric and string literals, escape sequences with backslashes: all of this will look the way it does in C.
Execution flow control? The same if, else, while, break and continue. Of course, we'll add some fun by co-opting the good old for to iterate over data structures and value ranges. This would later be proposed for C++11, but in Python the for statement from the very beginning encapsulated all the work of computing sizes, following links, incrementing counters and so on; in other words, everything needed to hand the user the next element of a data structure. What kind of structure? It doesn't matter, just pass it to for and it will figure it out.
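As a rough illustration of that last point, the sketch below runs the same for statement over several built-in types. The variable names are made up for the example.

```python
# One loop form for several container types; no sizes, indexes, or pointers needed.
collected = []

for item in [10, 20, 30]:        # a list yields its elements
    collected.append(item)

for ch in "abc":                 # a string yields its characters
    collected.append(ch)

for key in {"x": 1, "y": 2}:     # a dict yields its keys
    collected.append(key)

for n in range(3):               # a range of values: 0, 1, 2
    collected.append(n)

print(collected)
```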
We will also borrow exceptions from C++, but we will make them so cheap in terms of resources that they can be used not only to handle errors but also to control the flow of execution. We will make indexing more interesting by adding slicing: the ability to index not only individual elements of a sequence, but also ranges of them.
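A minimal sketch of both ideas, assuming nothing beyond the standard built-ins. The helper to_int and the sample list are hypothetical names introduced only for illustration.

```python
letters = ["a", "b", "c", "d", "e"]

middle = letters[1:4]    # elements at indexes 1, 2, 3
tail = letters[-2:]      # the last two elements


def to_int(text, default=0):
    """Convert text to an int, using an exception as ordinary control flow."""
    try:
        return int(text)
    except ValueError:
        # Not a number: fall back instead of pre-validating the string.
        return default


print(middle, tail, to_int("42"), to_int("n/a"))
```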
Oh yes! We'll fix an original design flaw in C and add the trailing comma!
This story began with Pascal, a terrible language in which the semicolon is used as a statement separator. This means that the user must put a semicolon at the end of every statement in a block except the last. So every time you reorder statements in a Pascal program, you risk a syntax error unless you remember to remove the semicolon from the new last line and add it to the end of the line that used to be last.
if (n = 0) then begin
    writeln('N is now zero');
    func := 1
end
Kernighan and Ritchie did the right thing when they defined the semicolon in C as a statement terminator rather than a separator, creating that wonderful symmetry in which every line of a program, including the last, ends the same way and lines can be freely rearranged. Unfortunately, their sense of harmony later failed them, and they made the comma a separator in static initializers. This looks fine when the expression fits on one line:
int a[] = {4, 5, 6};
but when your initializer gets longer and you arrange it vertically, you get the same uncomfortable asymmetry as in Pascal:
int a[] = {
    4,
    5,
    6
};
Early in its development, Python made the trailing comma in data structures completely optional, regardless of whether the elements are laid out horizontally or vertically. By the way, this is very convenient for code generation: you do not need to treat the last element as a special case.
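The sketch below shows the optional trailing comma in both layouts, followed by a toy code generator that emits every element the same way. The generator and its variable names are assumptions made for this example only.

```python
horizontal = [4, 5, 6,]     # trailing comma allowed on one line
vertical = [
    4,
    5,
    6,                      # the last line looks like every other line
]

# Toy code generation: no special case for the last element.
lines = ["values = ["]
for v in (4, 5, 6):
    lines.append(f"    {v},")
lines.append("]")
generated_source = "\n".join(lines)

namespace = {}
exec(generated_source, namespace)   # the generated text is valid Python
print(namespace["values"])
```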
Later, the C99 and C++11 standards also corrected the original oversight, allowing a comma after the last literal in an initializer.
Our programming language also needs namespaces. This is a critical part of the language that should save us from mistakes such as name conflicts. We will make it simpler than in C++: instead of letting the user name namespaces arbitrarily, we will create one namespace per module (file) and name it after the file. For example, if you create the module foo.py, it will be assigned the namespace foo.
To work with such a simplified model of namespaces, the user needs only one statement.
Create the my_package directory, put the file my_module.py in it, and declare a class in the file:
class C(object):
    READ = 1
    WRITE = 2
then access to the class attributes will be as follows:
import my_package.my_module
my_package.my_module.C.READ
Do not worry, we will not force the user to type the full name every time. We will offer several forms of the import statement that vary how "close" the namespace is:
import my_package.my_module
my_package.my_module.C.READ

from my_package import my_module
my_module.C.READ

from my_package.my_module import C
C.READ
Thus, the same names given in different packages will never conflict:
import json
j = json.load(file)

import pickle
p = pickle.load(file)
The fact that each module has its own namespace also means that we do not need a static modifier. However, recall one job that static did: hiding internal variables. To show colleagues that a given name (a variable, class, or module) is not public, we start it with an underscore, for example _ignore_this. This can also signal the IDE not to offer the name in auto-completion.
We will not implement function overloading in our language. The overloading mechanism is too complex. Instead, we will use optional arguments with default values that can be omitted from the call, as well as named arguments that let the caller "jump over" optional arguments whose defaults are fine and specify only the values that differ from the defaults. Importantly, the absence of overloading spares us from having to work out which of a set of overloaded functions was just called and how dispatch resolved the call: there is always exactly one function with that name in a module, and it is easy to find by name.
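A minimal sketch of one function standing in for what would be several overloads in C++. The function connect and its parameters are hypothetical, invented only to show defaults and named arguments.

```python
def connect(host, port=80, timeout=10.0, use_tls=False):
    """Hypothetical example: a single definition covers every call shape."""
    return {"host": host, "port": port, "timeout": timeout, "use_tls": use_tls}


a = connect("example.org")                 # all defaults
b = connect("example.org", 8080)           # override the first optional argument
c = connect("example.org", use_tls=True)   # jump over port and timeout by name

print(a, b, c, sep="\n")
```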
We will give the user full access to many system APIs, including sockets. I don't understand why the authors of scripting languages always come up with their own ingenious ways to open a socket, yet never implement the full Unix socket API. They implement the five or six functions they understand and throw away everything else. Python, by contrast, has standard modules for interacting with the OS that expose each standard system call. That means you can open Stevens' book right now and start writing code, and all your sockets, processes and forks will work exactly as it says. Yes, it is possible that Guido or the early Python contributors did it this way because they were too lazy to write their own implementation of the system libraries, too lazy to explain to users all over again how sockets work. But as a result they achieved a wonderful effect: you can carry all the UNIX knowledge you gained in C and C++ over to the Python environment.
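As a small illustration, the sketch below uses the standard socket module, whose socketpair(), sendall(), recv() and close() mirror the calls of the same names in the C API. The message and variable names are arbitrary choices for the example.

```python
import socket

# A pair of already-connected sockets, as returned by socketpair(2) in C.
left, right = socket.socketpair()
try:
    left.sendall(b"ping")

    # recv() may return fewer bytes than requested, exactly as in C,
    # so read in a loop until the whole message has arrived.
    received = b""
    while len(received) < 4:
        chunk = right.recv(4 - len(received))
        if not chunk:        # peer closed the connection
            break
        received += chunk
finally:
    left.close()
    right.close()

print(received)
```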
So, we have decided which features we will "borrow" from C++ to create our simple scripting language. Now we need to decide what we want to fix.
Unspecified behavior, undefined behavior, implementation-defined behavior... These are all bad ideas for a language that will be used by schoolchildren, scientists and data scientists. And the performance gain for which such things are tolerated is often negligible compared to the inconvenience. Instead, we will declare that any syntactically correct program gives the same result on any platform. We will describe the language standard with phrases such as "Python evaluates all expressions from left to right", instead of reordering the evaluation depending on the processor, the OS, or the phase of the moon. If the user is sure that the order of evaluation matters, he is free to rewrite the code accordingly: in the end, the user is in charge.
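The left-to-right rule can be observed directly. In this sketch, trace is a hypothetical helper that records the order in which operands are evaluated.

```python
calls = []


def trace(label, value):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return value


# Multiplication still binds tighter than addition, but the operands
# themselves are evaluated strictly left to right: a, then b, then c.
result = trace("a", 1) + trace("b", 2) * trace("c", 3)

# The same holds for the elements of a tuple and for call arguments.
pair = (trace("d", 4), trace("e", 5))

print(result, pair, calls)
```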
You have probably run into errors like this one. The expression
oflags & 0x80 == nflags & 0x80
always evaluates to 0, because in C comparison takes precedence over the bitwise operators. In other words, this expression is evaluated as
oflags & (0x80 == nflags) & 0x80
Oh, that C!
We will eliminate the potential cause of such errors in our simple scripting language by giving comparison operators lower precedence than arithmetic and bitwise operators, so that the expression from our example is evaluated more intuitively:
(oflags & 0x80) == (nflags & 0x80)
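A quick check of this precedence rule in Python, with sample flag values chosen arbitrarily for the example. The second expression adds the parentheses that C's grouping implies, to show what the C reading would produce.

```python
oflags = 0x80
nflags = 0x80

# Python groups this as (oflags & 0x80) == (nflags & 0x80).
same_bit = oflags & 0x80 == nflags & 0x80

# The C reading, written out with explicit parentheses.
c_style = oflags & (0x80 == nflags) & 0x80

print(same_bit, c_style)
```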
Code readability is important to us. While C's arithmetic operators are familiar to the user from school arithmetic, the confusion between logical and bitwise operators is a clear source of errors. We will replace the double ampersand with the word and, and the double vertical bar with the word or, so that our language looks more like human speech than a picket fence of "computer" symbols.
We will let our logical operators keep short-circuit evaluation ( https://en.wikipedia.org/wiki/Short-circuit_evaluation ), but also give them the ability to return a final value of any type, not just a Boolean. Then expressions like this become possible:
s = error.message or 'Error'
In this example, the variable is assigned the value of error.message if it is nonempty, and the string 'Error' otherwise.
We extend C's idea that 0 is false to objects other than integers, for example to empty strings and empty containers.
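A short sketch of both behaviors: logical operators returning one of their operands, and empty objects counting as false. The function describe is a hypothetical stand-in for the error.message example.

```python
def describe(message):
    """Return the message itself when it is nonempty, else a fallback string."""
    return message or "Error"


# `or` returns the first truthy operand, `and` the first falsy one.
fallback = [] or "default"
short_circuit = 0 and 1 / 0      # the division is never evaluated

# Zero, empty strings, empty containers and None all count as false.
falsy_values = [0, 0.0, "", [], {}, (), set(), None]
none_are_true = not any(falsy_values)

print(describe("Disk full"), describe(""), fallback, short_circuit, none_are_true)
```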
We will eliminate integer overflow. Our language will be consistent in its implementation and easy to use, so our users will not need to remember a special value suspiciously close to two billion after which an integer incremented by one suddenly changes sign. We will implement integers that behave like integers until they exhaust all available memory.
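For illustration, the sketch below steps past the largest 32-bit signed value and then well beyond any fixed machine word. The variable names are chosen for the example.

```python
int32_max = 2**31 - 1          # 2147483647, the value "close to two billion"

next_value = int32_max + 1     # keeps growing instead of wrapping to a negative
big = 2**100                   # far beyond any 64-bit register

print(next_value, big, big.bit_length())
```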
Another important question in the design of a scripting language is how strict its typing is. Are many in the audience familiar with JavaScript? What happens if the number 3 is subtracted from the string '4'?
js> '4' - 3
1
Fine! And if you add the number 3 to the string '4'?
js> '4' + 3
"43"
This is called lax (or weak) typing. This is something like an inferiority complex when a programming language thinks that a programmer will condemn it if it cannot return the result of any, even obviously meaningless, expression by repeatedly casting types. The problem is that type casting, which a weakly typed language produces automatically, very rarely leads to meaningful results. Let's try a little more complex conversions:
js> [] + []
""
js> [] + {}
"[object Object]"
We expect addition to be commutative, but what happens if we swap the operands in the last case?
js> {} + []
0
JavaScript is not alone in its problems. Perl in a similar situation also tries to return at least something:
perl> "3" + 1
4
And awk will do something like that:
$ echo | awk '{print "3" + 1}'
4
The creators of scripting languages have traditionally believed that loose typing is convenient. They were mistaken: loose typing is terrible! It violates the principle of locality. If there is an error in the code, the programming language should tell the user about it by raising an exception as close as possible to the problematic place in the code. But in all these languages, which keep casting types until something works out, execution usually runs to the end, and we get a result from which we can only tell that something is wrong somewhere in our program. And we have to debug the entire program, one line after another, to find the error.
Loose typing also hurts code readability, because even if we use implicit type conversion correctly in a program, it comes as a surprise to the next programmer.
In Python, as in C++, such expressions produce an error.
>>> '4' - 3
TypeError
>>> '4' + 3
TypeError
Because type casting, if really necessary, is easy to write explicitly:
>>> int('4') + 3
7
>>> '4' + str(3)
'43'
This code is easy to read and maintain, and it makes clear exactly what is happening in the program and what leads to the result. This is because Python programmers believe that explicit is better than implicit, and that errors should not pass silently.
Python is a strongly typed language, and the only implicit type conversion in it occurs during arithmetic operations on integers, the result of which must be expressed as a fractional number. Perhaps this also should not be allowed in the program, but in this case too many users would have to immediately explain the difference between integers and floating-point numbers, which would complicate their first steps in Python.
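A minimal sketch of that one implicit conversion, assuming Python 3, where the / operator on two integers produces a float. The variable names are chosen for the example.

```python
quotient = 7 / 2        # two ints in, a float out
floor_div = 7 // 2      # integer division stays an int when asked for explicitly
mixed = 1 + 0.5         # an int is promoted to float in mixed arithmetic

print(quotient, floor_div, mixed)
```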
Continued: "Python as the limiting case of C++. Part 2/2."