And yet C is a low-level language

In the decades since the C language appeared, many interesting programming languages have been created. Some are still in use, others influenced the next generation of languages, and the popularity of still others has quietly faded away. Meanwhile C (and its heirs), archaic, controversial, primitive, and built in the worst traditions of its generation of languages, remains more alive than all the living.

Criticism of C is a classic epistolary genre in our industry. It grows louder and quieter by turns, but lately it has been positively deafening. One example is the translation of David Chisnall's article “C is Not a Low Level Language,” published on our blog some time ago. You can say many things about C, and the design of the language really does contain a lot of unpleasant mistakes, but to deny C its "low-level" status is going too far!

Unwilling to put up with such an injustice, I plucked up my courage and tried to work out what a low-level programming language is and what practitioners want from it, and then went through the arguments of C's critics. That is how this article came about.


The arguments of C's critics

Here are some of the arguments made by C's critics, including those listed in David Chisnall's article:

The C abstract machine is too similar to the outdated PDP-11 architecture, which has long ceased to match the design of popular modern processors.
The mismatch between the abstract machine and the design of real machines complicates the development of optimizing compilers for the language.
The incompleteness and complexity of the language standard lead to discrepancies between implementations of the standard.
The dominance of C-like languages prevents the exploration of alternative processor architectures.

Let's first define the requirements for a low-level language, and then return to these arguments.

Low-level programming language

There is no universally accepted definition of a low-level language. But before discussing controversial issues, it helps to have at least some initial requirements for the subject of the dispute.

No one will dispute that assembly language sits at the lowest level. But it is unique to each platform, so code written in it cannot be portable. Even on a backward-compatible platform, you may need to use some new instructions.

From this follows the first requirement for a low-level language: it should capture the features common to popular platforms. Simply put, the compiler must be portable. A portable compiler simplifies bringing the language to new platforms, and the variety of platforms supported by compilers spares developers from rewriting application programs for each new machine.

The first requirement conflicts with the wishes of developers of specialized software: programming language implementations, drivers, operating systems, and high-performance databases. The programmers who write such software want to be able to optimize by hand, work directly with memory, and so on. In a word, a low-level language should allow working with the implementation details of the platform.

Finding a balance between these two requirements, identifying what platforms have in common and exposing as many details as possible, is the fundamental difficulty in designing a low-level language.

Note that high-level abstractions are not so important for such a language. It is more important that it serve as a contract between the platform, the compiler, and the developer. And if there is a contract, then there is a need for a language standard that is independent of any particular implementation.

Our first requirement, features common to the target platforms, is embodied in the language's abstract machine, so we'll start our discussion of C with it.

It's not just about PDP-11

The platform on which the C language appeared is the PDP-11. It is based on the traditional von Neumann architecture, in which programs are executed sequentially by a central processor and memory is a flat tape that stores both data and programs. Such an architecture is easy to implement in hardware, and over time all general-purpose computers came to use it.

Modern improvements to the von Neumann architecture aim to eliminate its main bottleneck: delays in the exchange of data between the processor and memory (the von Neumann bottleneck). The gap between memory and CPU performance led to the appearance of processor cache subsystems (single-level at first, multi-level later).

But these days even caches are not enough. Modern processors have become superscalar. Delays in fetching instructions and data from memory are partially offset by out-of-order execution of instructions (instruction-level parallelism), coupled with branch prediction.

The sequential abstract machine of C (and of many other languages) imitates not so much the PDP-11 specifically as any computer built on the von Neumann architecture. That includes architectures built around single-core processors: desktop and server x86, mobile ARM, and Sun/Oracle SPARC and IBM POWER, which are now leaving the stage.

Over time, several cores came to be integrated into one processor, which made it necessary to keep each core's caches coherent and required inter-core communication protocols. The von Neumann architecture was thus scaled to multiple cores.

The original version of the C abstract machine was sequential and did not reflect threads of execution interacting through memory. The introduction of a memory model into the standard extended the abstract machine to parallel execution.
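To make the point about the memory model concrete, here is a minimal sketch using the C11 `<stdatomic.h>` interface. The `mailbox` type and both function names are hypothetical, invented for this illustration. The release store and acquire load are what the abstract machine offers for one thread to hand ordinary data to another.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type for this sketch: one value handed from a writer to a reader. */
struct mailbox {
    int payload;          /* ordinary, non-atomic data */
    atomic_bool ready;    /* flag that publishes the payload */
};

/* Writer side: fill in the payload, then publish it with a release store. */
void mailbox_publish(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store above, so a
 * reader that sees ready == true is guaranteed to see the payload as well. */
bool mailbox_try_read(struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

In a real program the two functions would be called from different threads. The sketch leaves thread creation out so that it does not depend on a particular threading library.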

Thus, the claim that the C abstract machine has long ceased to match the design of modern processors concerns not so much a specific language as computers built on the von Neumann architecture, including those with parallel execution.

But as a practitioner I want to note the following: one may consider the von Neumann approach outdated, or one may consider it relevant, but that does not change the fact that today's general-purpose architectures are derivatives of the traditional approach.

The standardized and portable embodiment of the von Neumann architecture, the C abstract machine, is easy to implement on all major platforms and therefore deservedly enjoys its popularity as a portable assembler.

Optimizing compilers and a low-level language

Our second requirement for a low-level language is access to the low-level implementation details of each popular platform. In the case of C, this means working directly with memory and with the objects in it as arrays of bytes, working directly with byte addresses, and rich pointer arithmetic.
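As a small illustration of what "objects as arrays of bytes" means in practice, here is a sketch with two hypothetical helpers (the names are mine, not from any library). The first reads an arbitrary object through an `unsigned char` pointer, the second measures the distance between two array elements in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds up the bytes of any object.
 * Reading an object through an unsigned char pointer is always allowed in C. */
unsigned byte_sum(const void *obj, size_t size)
{
    assert(obj != NULL);
    const unsigned char *p = obj;
    unsigned sum = 0;
    for (size_t i = 0; i < size; i++)
        sum += p[i];
    return sum;
}

/* Hypothetical helper: distance in bytes between two elements of the same
 * int array, obtained by doing pointer arithmetic on character pointers. */
ptrdiff_t byte_distance(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return (const char *)last - (const char *)first;
}
```

For a `uint32_t` holding `0x01020304`, `byte_sum` returns 10 regardless of byte order, because only the order of the four bytes differs between platforms, not their values.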

Critics of C point out that the language standard gives too many guarantees regarding, for example, the placement of individual fields in structures and unions. Together with pointers and primitive loop constructs, this complicates the work of the optimizer.
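The layout guarantees in question can be checked directly. The sketch below uses a hypothetical `packet_header` structure and a hypothetical `word` union. It relies only on what the standard promises: the first member sits at offset 0, later members have increasing offsets, and every member of a union starts at the address of the union itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure for this sketch. */
struct packet_header {
    uint8_t  kind;
    uint32_t length;
    uint16_t checksum;
};

/* Hypothetical union for this sketch. */
union word {
    uint32_t      value;
    unsigned char bytes[4];
};

/* Guaranteed by the standard: no padding before the first member. */
static_assert(offsetof(struct packet_header, kind) == 0,
              "first member starts at offset 0");

/* Members are laid out in declaration order, so the compiler is not
 * free to reorder them even when that would save padding. */
int fields_in_declaration_order(void)
{
    return offsetof(struct packet_header, kind)
               < offsetof(struct packet_header, length)
        && offsetof(struct packet_header, length)
               < offsetof(struct packet_header, checksum);
}

/* All members of a union begin at the address of the union object. */
int union_members_share_address(const union word *w)
{
    assert(w != NULL);
    return (const void *)w == (const void *)&w->value
        && (const void *)w == (const void *)w->bytes;
}
```

The exact size of `struct packet_header` is not fixed by the standard. It is at least the 7 bytes of its members, and on typical platforms padding makes it larger.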

Indeed, a more declarative approach would let the compiler decide for itself how to align data in memory or how best to order the fields in structures, and high-level loop constructs would give it the freedom it needs for vectorization.

The position of C developers here is this: a low-level language should let the programmer work at a level low enough to solve optimization problems independently. In C, it is possible to do the compiler's job yourself, choosing, for example, SIMD instructions and laying out data in memory appropriately.
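Here is a portable sketch of "doing the compiler's job by hand". It deliberately avoids real SIMD intrinsics, which are platform-specific. Instead it fixes the alignment of the data explicitly and arranges the summation in four independent lanes, the shape that maps naturally onto a 128-bit vector register. The `samples` type, the `sum4` function, and the choice of 16-byte alignment are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical layout: the 16-byte alignment is an assumption chosen to
 * match a 128-bit vector register holding four floats. */
struct samples {
    alignas(16) float values[8];
};

/* Sums n floats with four independent accumulators ("lanes").
 * n must be a multiple of 4; handling a tail is left out of the sketch. */
float sum4(const float *v, size_t n)
{
    assert(v != NULL && n % 4 == 0);
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += v[i + k];   /* each lane depends only on itself */
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

Note that the lane-wise order of additions differs from a plain left-to-right loop. With floating-point data that can change the result slightly, which is exactly why the programmer, and not the compiler, has to make this choice.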

In other words, our requirement of access to each platform's implementation details conflicts with the wishes of the developers of optimizing compilers precisely because of those low-level tools.

Interestingly, in an article titled “C is not a low-level language,” Chisnall paradoxically argues that C is too low-level, pointing to its lack of high-level tools. But practitioners need exactly low-level tools; otherwise the language cannot be used to develop operating systems and other low-level programs, that is, it will not satisfy the second of our requirements.

Setting aside C's optimization problems, I want to note that no less effort is currently invested in the optimizing compilers of high-level languages (C# and Java, for instance) than in GCC or Clang. Functional languages also have reasonably effective compilers: MLton, OCaml, and others. Yet the developers of OCaml, for one, can so far boast of performance that is at best half the speed of C code...

Standard as an absolute good

In his article, Chisnall cites the results of a survey conducted in 2015: many programmers made mistakes when answering questions that tested their understanding of the C standard.

I assume some readers have had to deal with the C standard. I have a paper copy of C99, some 900 pages. This is not the laconic Scheme specification of under 100 pages, nor the polished Standard ML definition of about 300. Nobody enjoys working with the C standard: not compiler developers, not the authors of the document, not programmers.

But we must understand that the C standard was developed after the fact, once many dialects had appeared that were compatible only "almost, barely, in places." The authors of ANSI C did a great job of generalizing the existing implementations and of papering over the non-orthogonality in the language's design with countless “crutches.”

It may seem strange that anyone would undertake to implement such a document. Yet C has been implemented by many compilers. I will not retell other people's tales of the zoo that was the UNIX world of the late 80s, especially since at that time I myself could count only up to five, and not very confidently. But obviously the standard really was needed by everyone in the industry.

The great thing is that it exists and is implemented by at least three large compilers and many smaller ones, which together support hundreds of platforms. None of C's competitors claiming the crown of the king of low-level languages can boast such diversity and versatility.

In fact, the current C standard is not so bad. A more or less experienced programmer can write a non-optimizing C compiler in a reasonable amount of time, as the existence of many semi-amateur implementations confirms (TCC, LCC, and 8cc, for example).

Having a generally accepted standard means that C satisfies the last of our requirements for a low-level language: this language is built on a specification, not a specific implementation.

Alternative architectures - specialized computing

But Chisnall offers another argument, returning to the design of modern general-purpose processors that implement variants of the von Neumann architecture. He claims that it makes sense to change the principles on which the central processor works. Once again, this criticism is aimed not at C specifically but at the basic model of imperative programming itself.

Indeed, there are many alternatives to the traditional approach of sequential program execution: GPU-style SIMD models, models in the style of the Erlang abstract machine, and others. But each of these approaches has limited applicability when used in a central processor.

GPUs, for example, are remarkably good at multiplying matrices in games and machine learning, but they are difficult to use for ray tracing. In other words, this model suits specialized accelerators but does not work for general-purpose processors.

Erlang works great in a cluster, but an efficient quicksort or a fast hash table is hard to write in it. The model of independent actors is better applied at a higher level, in a large cluster, where each node is still the same high-performance machine with a traditional processor.

Meanwhile, modern x86-compatible processors have long included a set of vector instructions similar to the GPU in purpose and operating principles, while preserving the overall von Neumann-style design of the processor. I have no doubt that any sufficiently general approaches to computing will find their way into popular processors.

There is an authoritative opinion that the future belongs to specialized programmable accelerators. For such unusual pieces of hardware it really does make sense to develop languages with special semantics. But the general-purpose computer was and remains similar to that same PDP-11, for which C-like imperative languages are so well suited.

C will live

There is a fundamental contradiction in Chisnall's article. He writes that, to make C programs fast, processors imitate the C abstract machine (and the long-forgotten PDP-11), and then he points out the limitations of such a machine. But I do not understand why this means that "C is not a low-level language."

On the whole, this is not about the shortcomings of C as a language but about criticism of the prevailing von Neumann-style architectures and the programming model that follows from them. But so far it does not look as though the industry is ready to abandon the familiar architecture (at least not in general-purpose processors).

Despite the availability of many specialized processors such as GPUs and TPUs, the von Neumann architecture currently rules, and the industry needs a language that lets it work at the lowest possible level within the most popular architecture. C (and its closest relatives) is just such a language: fairly simple, ported to dozens of platforms, and standardized.

For all this, C has plenty of shortcomings: an archaic standard library, an intricate and contradictory standard, and gross design errors. But, apparently, the creators of the language still did something right.

One way or another, we still need a low-level language, and one built specifically for the prevailing von Neumann computers. C may be out of date, but any successor will apparently still have to build on the same principles.
