Exploring the depths of type annotations in Python. Part 1

Since 2014, when Python introduced support for type annotations, programmers have been adopting them in their code. The author of this article, the first part of which we publish in translation today, estimates (quite boldly, by her own admission) that type annotations (sometimes called “hints”) are now used in about 20-30% of code written in Python 3. Here are the results of a survey she ran on Twitter in May 2019.

As it turned out, annotations are used by 29% of respondents. According to the author of the article, in recent years, she has increasingly come across type annotations in various books and study guides.

→ The second part

In the Python documentation, the terms “type hint” and “type annotation” are used interchangeably. The author of the article mainly uses the term “type hint”; in this translation we use the term “type annotation”.

This article will cover a wide range of issues regarding the use of type annotations in Python. If you want to discuss the original article with the author, you can use the pull-request mechanism.

Introduction

Here is a classic example of what code written with type annotations looks like.

Here is the regular code:

def greeting(name):
    return 'Hello ' + name

Here is the code to which the annotations are added:

def greeting(name: str) -> str:
    return 'Hello ' + name

The template according to which the code with type annotations is drawn up looks like this:

def function(variable: input_type) -> return_type:
    pass
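As a small sketch of what this template means in practice: annotations are stored on the function object in its `__annotations__` attribute, and the interpreter does not check them when the function is called. The `double` function below is a hypothetical example, not taken from the original article.

```python
def double(value: int) -> int:
    """Return the value multiplied by two."""
    return value * 2


# Annotations are stored as plain metadata on the function object.
annotations = double.__annotations__

# The interpreter does not enforce them: passing a str still "works",
# because str also supports the * operator.
result_for_int = double(21)
result_for_str = double("ab")

print(annotations)
print(result_for_int, result_for_str)
```

Flagging the second call as a likely mistake is a job for an external type checker such as mypy, not for the interpreter itself.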

At first glance, using type annotations looks simple and straightforward. But the developer community is still fairly uncertain about what exactly type annotations are. There is even disagreement about what to call them (“annotations” or “hints”) and about what benefits they bring to a code base.

When I began to research this topic and to think about whether I needed type annotations, I felt completely confused. So I did what I always do when I run into something I do not understand: I dug deep into the subject. I have written up my research in this article, which I hope will be useful to others who, like me, want to get to grips with type annotations in Python.

How do computers execute our programs?

In order to understand what Python developers are trying to achieve with type annotations, let's talk about the mechanisms of computer systems that are several levels below the Python code. Thanks to this, we can better understand how, in general terms, computers and programming languages work.

Programming languages, at their core, are tools that let you process data using a central processing unit (CPU) and store in memory both the data to be processed and the results of that processing.

Simplified computer circuit

The processor is, in essence, pretty “stupid”. It can do impressive things with data, but it only understands machine instructions, which boil down to sets of electrical signals. Machine language can be thought of as consisting of zeros and ones.

In order to prepare these zeros and ones that the processor understands, you need to translate the code from a high-level language to a low-level language. This is where compilers and interpreters come in.

Whether a language is compiled or interpreted (Python code is executed by an interpreter), code written in it is ultimately turned into low-level machine instructions for the computer's low-level components, that is, for the hardware.

There are several ways to translate code written in a programming language into code that machines can understand. You can either create a file with the program code and convert it to machine code using a compiler (this is how C++, Go, Rust and some other languages work), or run the code directly using an interpreter, which takes care of converting the code into machine commands. This is how programs in Python are run, as well as programs in other “scripting” languages such as PHP and Ruby.
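To make the interpreter's role a little more concrete, here is a minimal sketch that assumes CPython, the reference implementation. CPython first compiles source code into bytecode for its own virtual machine and then executes that bytecode; the standard `dis` module can list the instructions. The exact instructions differ between Python versions, so no output is shown.

```python
import dis


def greeting(name):
    return 'Hello ' + name


# The compiled bytecode lives on the function's code object.
bytecode = greeting.__code__.co_code

# Collect the instruction names instead of printing raw bytes.
instruction_names = [instr.opname for instr in dis.get_instructions(greeting)]

print(type(bytecode), len(bytecode))
print(instruction_names)
```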

Interpreted language code processing scheme

How does the hardware know how to store zeros and ones representing the data the program works with in memory? Our program must inform the computer how to allocate memory for this data. And what is this data? It depends on what types of data a particular language supports.

Every language has data types. They are typically one of the first topics beginners study when learning to program in a given language.

There are excellent Python tutorials, this one for example, where you can find detailed information about data types. In simple terms, data types are different ways of representing data stored in memory.

Strings and integers are two examples of data types. The set of data types available to a developer depends on the programming language in use. Here, for example, is a list of basic Python data types:

int, float, complex
str
bytes
tuple
frozenset
bool
array
bytearray
list
set
dict

There are data types consisting of other data types. For example, a list in Python can store integers or strings, as well as both.
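A short illustration of this point, with hypothetical variable names: a list can hold values of one type or of several, and `type()` reports the type of each stored value.

```python
# A list may hold values of a single type or of several types at once.
numbers = [1, 2, 3]
words = ["a", "b"]
mixed = [1, "two", 3.0]

# type() reports the type of each stored value at runtime.
container_type = type(mixed).__name__
mixed_types = [type(item).__name__ for item in mixed]

print(container_type)
print(mixed_types)
```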

To work out how much memory to allocate for some data, the computer needs to know what type of data the program is going to place in memory. Python's standard library provides the sys.getsizeof function, which reports the amount of memory, in bytes, needed to store values of various data types.

Here is a great answer on Stack Overflow that shows how to find the sizes of the “minimal” values that can be stored in variables of various types.

import sys
import decimal

# Sample "minimal" values of various types
d = {
    "int": 0,
    "float": 0.0,
    "dict": dict(),
    "set": set(),
    "tuple": tuple(),
    "list": list(),
    "str": "a",
    "unicode": u"a",
    "decimal": decimal.Decimal(0),
    "object": object(),
}

# Measure the size of each sample value in bytes, then sort by size
d_size = {}
for k, v in sorted(d.items()):
    d_size[k] = sys.getsizeof(v)

sorted_x = sorted(d_size.items(), key=lambda kv: kv[1])
sorted_x

[('object', 16), ('float', 24), ('int', 24), ('tuple', 48), ('str', 50), ('unicode', 50), ('list', 64), ('decimal', 104), ('set', 224), ('dict', 240)]

By sorting the sizes of these sample values, we can see that the largest is an empty dictionary (dict), followed by a set (set). By comparison, very little space is needed to store a single integer (int).

The above example gives us an idea of how much memory is required to store various values used in programs.
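The sizes above are for “minimal” values. As a hedged sketch, assuming CPython: the size reported by `sys.getsizeof` also depends on the value itself, and for containers it does not include the objects they refer to. Exact byte counts vary with the Python version and platform, so the example compares sizes rather than printing fixed numbers.

```python
import sys

# Integers in CPython have arbitrary precision, so larger values need more bytes.
small_int_size = sys.getsizeof(1)
large_int_size = sys.getsizeof(2 ** 100)

# Longer strings need more bytes than shorter ones.
short_str_size = sys.getsizeof("a")
long_str_size = sys.getsizeof("a" * 1000)

# getsizeof counts only the list object itself (its array of references),
# not the objects the list refers to.
big_string = "x" * 10_000
big_string_size = sys.getsizeof(big_string)
list_size = sys.getsizeof([big_string])

print(small_int_size < large_int_size)
print(short_str_size < long_str_size)
print(list_size < big_string_size)
```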

Why should we care about this at all? The fact is that some types suit certain problems better than others and let you solve those problems more efficiently. In some situations types need to be checked carefully. For example, a program may verify that the data types it uses do not contradict assumptions made when the program was designed.

But what are these types? Why do we need them? This is where the concept of the “type system” comes into play.

Introduction to Type Systems

A long time ago, in a galaxy far, far away, people who performed mathematical calculations by hand realized that if they assigned “types” to numbers or elements of equations, they could reduce the number of logical errors that arise when deriving mathematical proofs about those elements.

Since computer science, at the very beginning, essentially amounted to performing large volumes of manual calculations, some of those old principles carried over. Type systems became a tool for reducing the number of errors in programs by assigning appropriate types to variables and other elements.

Here are some examples:

If we write software for a bank, we cannot use strings in the code that calculates the balance of someone's account.
If we are working with survey data and want to know whether someone answered a question positively or negatively, the answers “yes” and “no” are most naturally encoded using a Boolean type.
When developing a large search engine, we must limit the number of characters that users of this system can enter in the search query field. This means that we need to check some string data for compliance with certain parameters.
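The three examples above can be sketched in Python as follows. All names here (`deposit`, `parse_answer`, `validate_query`, `MAX_QUERY_LENGTH`) and the limit of 100 characters are hypothetical choices made for illustration; because Python does not enforce annotations at runtime, the sketch adds explicit `isinstance` checks.

```python
from decimal import Decimal


def deposit(balance: Decimal, amount: Decimal) -> Decimal:
    """Return the new balance; reject anything that is not a Decimal."""
    if not isinstance(balance, Decimal) or not isinstance(amount, Decimal):
        raise TypeError("balance and amount must be Decimal values")
    return balance + amount


def parse_answer(answer: str) -> bool:
    """Encode a 'yes'/'no' survey answer as a Boolean."""
    normalized = answer.strip().lower()
    if normalized not in ("yes", "no"):
        raise ValueError(f"unexpected answer: {answer!r}")
    return normalized == "yes"


MAX_QUERY_LENGTH = 100  # hypothetical limit chosen for this sketch


def validate_query(query: str) -> str:
    """Check that a search query is a string within the length limit."""
    if not isinstance(query, str):
        raise TypeError("query must be a string")
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError("query is too long")
    return query


new_balance = deposit(Decimal("100.50"), Decimal("20.25"))
print(new_balance)
print(parse_answer("Yes"))
print(validate_query("type annotations"))
```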

Today there are two main kinds of type systems in programming. Here is what Steve Klabnik writes about this: “A static type system is the mechanism by which the compiler checks the source code and assigns labels (called “types”) to program fragments, and then uses them to draw conclusions about the program’s behavior. A dynamic type system is the mechanism by which the compiler generates code to observe what types of data (they are also called “types” by coincidence) are used by the program.”

What does this mean? It means that when working with compiled languages, you usually need to declare the types of entities in advance. This lets the compiler check them during compilation and determine whether a meaningful program can be built from the source code it was given.

I recently came across an explanation of the difference between static and dynamic typing. It is probably the best text I have read on the subject. Here is a fragment of it: “I used to use statically typed languages, but over the past few years I have been programming mainly in Python. At first, static typing annoyed me a bit. It felt as though the need to declare variable types slowed me down and forced me to spell out my ideas in excessive detail. Python just let me do what I wanted, even if I accidentally did something wrong. Using a statically typed language is like giving a task to someone who always asks follow-up questions to clarify the small details of the job. Dynamic typing is when the person given the task always nods in agreement. You get the feeling that they understood you, but sometimes you cannot be completely sure that they have really figured out what you want from them.”

While reading about type systems, I came across something I did not immediately understand. The concepts of “static typing” and “dynamic typing” are closely related to the concepts of “compiled language” and “interpreted language”, but the terms “static” and “compiled”, as well as the terms “dynamic” and “interpreted”, are not synonyms. A language can be dynamically typed, like Python, and at the same time compiled. Similarly, a language can be statically typed, like Java, but also interpreted (for example, in the case of Java, when using the Java REPL).
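A minimal sketch of this distinction, assuming CPython: the built-in `compile()` function turns source text into a code object without checking any types, so a type error in the source surfaces only when the offending line is executed. The `add_label` function and the source string are hypothetical.

```python
source = '''
def add_label(count):
    # Adding an int to a str is a type error, but only at runtime.
    return count + " items"
'''

# Compilation to bytecode succeeds: no types are checked at this stage.
code_object = compile(source, "<example>", "exec")

namespace = {}
exec(code_object, namespace)  # defines add_label; still no error

# The error appears only when the offending line actually runs.
try:
    namespace["add_label"](3)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```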

Comparison of data types in statically and dynamically typed languages

What is the difference between data types in statically and dynamically typed languages?

When using static typing, types must be declared in advance. For example, if you work in Java, then your programs will look something like this:

public class CreatingVariables {
    public static void main(String[] args) {
        int x, y, age, height;
        double seconds, rainfall;

        x = 10;
        y = 400;
        age = 39;
        height = 63;

        seconds = 4.71;
        rainfall = 23;

        double rate = calculateRainfallRate(seconds, rainfall);
    }

    private static double calculateRainfallRate(double seconds, double rainfall) {
        return rainfall / seconds;
    }
}

Pay attention to the beginning of the program. Several variables are declared there, next to which there are indications of the types of these variables:

int x, y, age, height;
double seconds, rainfall;

In addition, types are specified when declaring functions and their arguments. Without these type declarations, the program cannot be compiled. When writing Java programs, you need to plan from the very beginning what type each entity will have. As a result, the compiler knows exactly what it needs to check while processing the code and generating machine code.

Python relieves the programmer of such hassle. Similar Python code might look like this:

# The function must be defined before it is called
def calculateRainfall(seconds, rainfall):
    return rainfall / seconds

y = 400
age = 39
height = 63

seconds = 4.71
rainfall = 23
rate = calculateRainfall(seconds, rainfall)
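Although nothing is declared, every value still has a type, which Python determines at runtime. The following sketch inspects those types with the built-in `type()` function; the snake_case name `calculate_rainfall_rate` is a hypothetical variant of the function above.

```python
def calculate_rainfall_rate(seconds, rainfall):
    return rainfall / seconds


seconds = 4.71   # a float, because of the form of the literal
rainfall = 23    # an int
rate = calculate_rainfall_rate(seconds, rainfall)

# Types belong to values, not to names, and can be inspected at runtime.
value_types = {
    "seconds": type(seconds).__name__,
    "rainfall": type(rainfall).__name__,
    "rate": type(rate).__name__,
}
print(value_types)

# The same name can later be bound to a value of a different type.
rainfall = "heavy"
rebound_type = type(rainfall).__name__
print(rebound_type)
```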

How does all this work under the hood in Python? To be continued…

Dear readers! Which programming language that you used left the most pleasant impression?
