Introduction to C. Message from the last century

Foreword

In my comments I have referred several times to Andrew Tanenbaum's book Operating Systems: Design and Implementation (first edition) and to how C is presented in it, and those comments always attracted interest. So I decided it was time to publish a translation of the book's introduction to C. It is still relevant, even though some readers have surely never heard of the PL/1 programming language, and perhaps not even of the MINIX operating system.

This description is also interesting from a historical point of view, and for seeing how far both the C language and the IT industry as a whole have come since C was born.

I should warn you right away that my second language is French.

But this is offset by 46 years of programming experience.

So let's get started: over to Andrew Tanenbaum.

Introduction to the C Language (pp. 350 - 362)

The C programming language was created by Dennis Ritchie of AT&T Bell Laboratories as a high-level programming language for developing the UNIX operating system. Currently, the language is widely used in various fields. C is especially popular with system programmers because it allows you to write programs simply and concisely.

The principal book on C is The C Programming Language by Brian Kernighan and Dennis Ritchie (1978). Books on C have also been written by Bolon (1986), Gehani (1984), Hancock and Krieger (1986), Harbison and Steele (1984), and many others.

In this appendix we will try to give a fairly complete introduction to C, so that those who are familiar with high-level languages such as Pascal, PL/1, or Modula-2 will be able to understand most of the MINIX code in this book. Features of C that are not used in MINIX are not discussed here, and numerous subtle points are omitted. The emphasis is on reading C programs rather than writing them.

A.1. C language basics

A C program consists of a set of procedures (often called functions, even when they do not return values). These procedures contain declarations, statements, and other elements that together tell the computer what to do. Figure A-1 shows a small procedure in which three integer variables are declared and assigned values. The name of the procedure is main. The procedure has no formal parameters, as indicated by the absence of any identifiers between the parentheses following the procedure name. The body of the procedure is enclosed in braces ({ }). This example shows that C has variables, and that these variables must be declared before use. C also has statements; in this example they are assignment statements. All statements must be terminated by a semicolon (unlike Pascal, which uses semicolons between statements, not after them).

Comments begin with the characters "/*" and end with the characters "*/", and can span multiple lines.

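 main()                  /* this is a comment */
 {
     int i, j, k;        /* declaration of 3 integer variables */
     i = 10;             /* set i to 10 (decimal) */
     j = i + 015;        /* set j to i + 015 (octal) */
     k = j * j + 0xFF;   /* set k to j * j + 0xFF (hexadecimal) */
 }

Fig. A-1. An example procedure.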

The procedure contains three constants. The constant 10 in the first assignment is an ordinary decimal constant. The constant 015 is an octal constant (equal to 13 decimal). Octal constants always begin with a leading zero. The constant 0xFF is a hexadecimal constant (equal to 255 decimal). Hexadecimal constants always begin with 0x. All three notations are used in C.

A.2. Basic data types

C has two basic data types: integer and character, declared as int and char, respectively. There is no separate Boolean type; an int is used instead. A value of 0 means false, and any other value means true. C also has floating-point types, but MINIX does not use them.

The int type can be qualified with the "adjectives" short, long, or unsigned, which determine its (compiler-dependent) range of values. Most compilers for the 8088 use 16-bit integers for int and short int and 32-bit integers for long int. Unsigned integers (unsigned int) on the 8088 range from 0 to 65535, rather than from -32768 to +32767 as ordinary integers (int) do. A character occupies 8 bits.

The register specifier is also allowed for both int and char; it is a hint to the compiler that the declared variable should be kept in a register so that the program runs faster.

Some declarations are shown in Fig. A-2.

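 int i;                  /* one integer */
 short int z1, z2;       /* two short integers */
 char c;                 /* one character */
 unsigned short int k;   /* one unsigned short integer */
 long flag_pole;         /* the word 'int' may be omitted */
 register int r;         /* variable that should be kept in a register */

Fig. A-2. Some declarations.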

Conversion between types is permitted. For example, the statement

 flag_pole = i;

is allowed even if i is of type int and flag_pole is long. In many cases it is necessary or useful to force a conversion between data types. To do so, put the target type in parentheses in front of the expression to be converted. For example,

 p((long) i);

converts the integer i to long before passing it as a parameter to the procedure p, which expects a long parameter.

When converting between types, pay attention to sign.

When converting a character to an integer, some compilers treat characters as signed, that is, from -128 to +127, while others treat them as unsigned, that is, from 0 to 255. In MINIX, statements such as

 i = c & 0377;

occur, which convert c (a character) to an integer and then perform a bitwise AND (ampersand) with the octal constant 0377. As a result, the high-order 8 bits are set to zero, which forces c to be treated as an unsigned 8-bit number in the range 0 to 255.

A.3. Compound types and pointers

In this section we will look at four ways to build more complex data types: arrays, structures, unions, and pointers. An array is a collection of elements of the same type. All arrays in C begin with element 0.

The declaration

 int a[10];

declares an array a of 10 integers, stored in the array elements a[0] through a[9]. Two-, three-, and higher-dimensional arrays are also allowed, but they are not used in MINIX.

A structure is a collection of variables, usually of different types. A structure in C is similar to a record in Pascal. The declaration

 struct {int i; char c;} s;

declares s as a structure containing two members, the integer i and the character c.

To assign the value 6 to member i of structure s, write:

 s.i = 6;

where the dot operator indicates that the member i belongs to the structure s.

A union is also a set of members, similar to a structure, except that at any moment a union can hold only one of them. The declaration

 union {int i; char c;} u;

means that you can have an integer or a character, but not both. The compiler must allocate enough space for the union to hold its largest (in terms of memory occupied) member. Unions are used in only two places in MINIX (to define a message as a union of several different structures, and to define a disk block as a union of a data block, an i-node block, a directory block, etc.).

Pointers are used to hold machine addresses in C. They are used very, very often. An asterisk (*) is used to denote a pointer in declarations. The declaration

 int i, *pi, a[10], *b[10], **ppi;

declares an integer i, a pointer pi to an integer, an array a of 10 integers, an array b of 10 pointers to integers, and a pointer ppi to a pointer to an integer.

The exact syntax rules for declarations that combine arrays, pointers, and other types are somewhat intricate. Fortunately, MINIX uses only simple declarations.

Figure A-3 shows the declaration of an array z of structures of type struct table, each of which has three members: an integer i, a pointer cp to a character, and a character c.

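 struct table {          /* each structure is of type table */
     int i;              /* an integer */
     char *cp, c;        /* a pointer to a character and a character */
 } z[20];                /* this is an array of 20 structures */

Fig. A-3. An array of structures.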

Arrays of structures are common in MINIX. Moreover, this declaration makes table the name of a structure type, struct table, which can be used in subsequent declarations. For example,

 register struct table *p;

declares p to be a pointer to a struct table and suggests keeping it in a register. During program execution, p may point, for example, to z[4] or to any other element of z, all 20 of which are structures of type struct table.

To make p a pointer to z[4], just write

 p = &z[4];

where the ampersand, used as a unary (monadic) operator, means "take the address of what follows." Copying the value of member i of the structure pointed to by p into the integer variable n can be done as follows:

 n = p->i;

Note that the arrow is used to access a member of a structure through a pointer. If we use the variable z instead, we must use the dot operator:

 n = z[4].i;

The difference is that z[4] is itself a structure, and the dot operator selects members of compound types (structures, unions) directly. With a pointer we do not select a member directly: the pointer says to first fetch the structure it points to, and only then select a member of that structure.

Sometimes it is convenient to give a name to a type. For example,

 typedef unsigned short int unshort;

defines unshort as unsigned short (an unsigned short integer). Now unshort can be used in the program as if it were a basic type. For example,

 unshort u1, *u2, u3[5];

declares an unsigned short integer, a pointer to an unsigned short integer, and an array of unsigned short integers.

A.4. Statements

Procedures in C contain declarations and statements. We have already seen declarations, so now we will look at statements. The conditional and loop statements serve essentially the same purpose as in other languages. Figure A-4 shows several examples of them. The only points worth noting are that curly braces are used to group statements, and that the while statement has two forms, the second of which is similar to Pascal's repeat statement.

C also has a for statement, but it does not resemble the for statement of any other language. The for statement has the following form:

 for (<initialization>; <condition>; <expression>) <statement>;

The same thing can be expressed with a while statement:

 <initialization>;
 while (<condition>) {
     <statement>;
     <expression>;
 }

As an example, consider the following statement:

 for (i = 0; i < n; i = i + 1) a[i] = 0;

This statement sets the first n elements of array a to zero. Execution begins by setting i to zero (this is done outside the loop). Then the statement is repeated as long as i < n, performing the assignment and incrementing i each time. Of course, instead of the statement that sets the current array element to zero, the body may be a compound statement (block) enclosed in curly braces.

 if (x < 0) k = 3;            /* a simple if statement */
 if (x > y) {                 /* a compound if statement */
     i = 2;
     k = j + 1;
 }
 if (x + 2 < y) {             /* an if-else statement */
     j = 2;
     k = j - 1;
 } else {
     m = 0;
 }
 while (n > 0) {              /* a while statement */
     k = k + k;
     n = n - 1;
 }
 do {                         /* another kind of while statement */
     k = k + k;
     n = n - 1;
 } while (n > 0);

Fig. A-4. Some examples of if and while statements in C.

C also has a statement similar to Pascal's case statement: the switch statement. An example is shown in Figure A-5. Depending on the value of the expression given in the switch, one case or another is selected.

If the expression does not match any of the cases, the default case is selected.

If the expression matches none of the cases and there is no default, execution continues with the statement following the switch.

Note that a break statement must be used to leave a case. If the break is missing, execution falls through into the next case.

 switch (k) {
 case 10:
     i = 6;
     break;      /* do not go on to case 20, i.e., exit the switch */
 case 20:
     i = 2;
     k = 4;
     break;      /* do not go on to default */
 default:
     j = 5;
 }

Fig. A-5. An example of a switch statement.

The break statement also works inside for and while loops. Keep in mind that if a break is inside a series of nested loops, it exits only one level.

A related statement is the continue statement, which does not exit the loop but ends the current iteration and immediately begins the next one. In effect, it is a jump back to the top of the loop.

C has procedures that can be called with or without parameters.

According to Kernighan and Ritchie (p. 121), arrays, structures, and procedures may not be passed as parameters, although passing pointers to any of them is allowed. Book or no book (it brings to my mind the line "Is there life on Mars, or is there no life on Mars?"), many C compilers do allow structures as parameters.

An array name written without a subscript means a pointer to the array, which makes it easy to pass an array pointer. Thus, if a is the name of an array of any type, it can be passed to g by writing

 g(a);

This rule applies only to arrays, not to structures.

Procedures can return values by executing a return statement. This statement may contain an expression whose result is returned as the value of the procedure, but the caller may safely ignore the returned value. If a procedure returns a value, the type of the value is written before the procedure name, as shown in Fig. A-6. As with parameters, procedures cannot return arrays, structures, or procedures, but they can return pointers to them. This rule allows a more efficient implementation: all parameters and results always fit into a single machine word (in which the address is stored). Compilers that allow structures to be used as parameters usually also allow them to be used as return values.

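 int sum(i, j)        /* this procedure returns an integer */
 int i, j;            /* declaration of the formal parameters */
 {
     return (i + j);  /* add the parameters and return the sum */
 }

Fig. A-6. An example of a simple procedure that returns a value.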

C has no built-in I/O statements. Input and output are performed by calling library procedures, the most common of which is illustrated below:

 printf("x = %d y = %o z = %x\n", x, y, z);

The first parameter is a string of characters between quotation marks (in fact, an array of characters).

Any character that is not a percent sign is simply printed as is.

When a percent sign is encountered, the next parameter is printed in the form specified by the letter following the percent sign:

d - print as a decimal integer

o - print as an octal integer

u - print as an unsigned decimal integer

x - print as a hex integer

s - print as a string of characters

c - print as one character

The letters D, O, and X are also allowed, for printing long numbers in decimal, octal, and hexadecimal.

A.5. Expressions

Expressions are created by combining operands and operators.

Arithmetic operators such as + and -, and relational operators such as < and >, are similar to their counterparts in other languages. The % operator yields the remainder (modulo). It is worth noting that the equality operator is == and the inequality operator is !=. To test whether a and b are equal, you can write:

 if (a == b) <statement>;

C also allows the assignment operator to be combined with other operators, so

 a += 4;

is equivalent to

 a = a + 4;

Other operators can also be combined in this way.

C has operators for manipulating the bits of a word. Both shifts and bitwise logical operations are allowed. The left and right shift operators are << and >>, respectively. The bitwise logical operators &, |, and ^ are AND, inclusive OR, and exclusive OR (XOR), respectively. If i has the value 035 (octal), then the expression i & 06 has the value 04 (octal). As another example, if i = 7, then

 j = (i << 3) | 014;

gives j the value 074.

Another important group of operators is the unary operators, each of which takes only one operand. As a unary operator, the ampersand & takes the address of a variable.

If p is a pointer to an integer and i is an integer, the statement

 p = &i;

computes the address of i and stores it in the variable p.

The opposite of taking an address is an operator that takes a pointer as input and computes the value stored at that address. If we have just assigned the address of i to the pointer p, then *p has the same value as i.

In other words, the unary asterisk takes a pointer (or an expression yielding a pointer) as its operand and returns the value of the item it points to. If i has the value 6, then the statement

 j = *p;

will assign j the number 6.

The ! operator (exclamation point, the negation operator) returns 0 if its operand is nonzero, and 1 if its operand is 0.

It is mainly used in if statements; for example,

 if (!x) k = 0;

checks the value of x. If x is zero (false), then k is assigned the value 0. In effect, the ! operator negates the condition that follows it, just like the not operator in Pascal.

The ~ operator is the bitwise complement operator. Each 0 in its operand becomes a 1, and each 1 becomes a 0.

The sizeof operator reports the size of its operand in bytes. Applied to an array a of 20 integers on a computer with 2-byte integers, for example, sizeof a has the value 40.

The last group of unary operators is the increment and decrement operators. The statement

 p++;

means increment p. How much p is increased by depends on its type. Integers and characters are incremented by 1, but pointers are incremented by the size of the object pointed to. Thus, if a is an array of structures, p is a pointer to one of these structures, and we write

 p = &a[3];

to make p point to one of the structures in the array, then after p is incremented it will point to a[4], no matter how large the structures are. The statement

 p--;

is similar to p++, except that it decrements the value of the operand instead of incrementing it.

In the statement

 n = k++;

where both variables are integers, the original value of k is assigned to n, and only then is k incremented. In the statement

 n = ++k;

k is incremented first, and then its new value is stored in n.

Thus, the ++ (or --) operator can be written before or after its operand, producing different values.

The last operator is ? (question mark), which selects one of two alternatives separated by a colon. For example, the statement

 i = (x < y ? 6 : k + 1);

compares x with y. If x is less than y, then i gets the value 6; otherwise, i gets the value k + 1. The parentheses are optional.

A.6. Program structure

A C program consists of one or more files containing procedures and declarations. These files can be compiled separately into object files, which are then linked together (by the linker) to form an executable program.

Unlike in Pascal, procedure declarations cannot be nested, so all of them are written at the "top level" of the program file.

Variables may be declared outside procedures, for example at the beginning of the file before the first procedure declaration. Such variables are global and can be used in any procedure throughout the program, unless the declaration is preceded by the keyword static, in which case the variables cannot be used from another file. The same rules apply to procedures. Variables declared inside a procedure are local to that procedure.

A procedure can access an integer variable v declared in another file (provided that the variable is not static) by declaring it extern:

 extern int v;

Each global variable must be declared exactly once without the extern attribute in order to allocate memory for it.

Variables can be initialized when declared:

 int size = 100;

Arrays and structures can also be initialized. Global variables that are not initialized explicitly receive a default value of zero.

A.7. C preprocessor

Before a source file is passed to the C compiler, it is automatically processed by a program called the preprocessor. It is the output of the preprocessor, not the original program, that is fed to the compiler. The preprocessor performs three basic transformations on a file before passing it to the compiler:

1. Inclusion of files.

2. Definition and replacement of macros.

3. Conditional compilation.

All preprocessor directives begin with a number sign (#) in column 1.

When the preprocessor encounters a directive of the form

 #include "prog.h"

it inserts the file prog.h, line by line, into the program to be passed to the compiler. When the #include directive is written as

 #include <prog.h>

the included file is looked for in the /usr/include directory instead of the working directory. It is common practice in C to group declarations used by several files in a header file (usually with the suffix .h) and include it wherever it is needed.

The preprocessor also allows macro definitions. For example,

 #define BLOCK_SIZE 1024

defines the macro BLOCK_SIZE and gives it the value 1024. From then on, every occurrence of the 10-character string "BLOCK_SIZE" in the file will be replaced by the 4-character string "1024" before the compiler sees the program. By convention, macro names are written in uppercase. Macros can have parameters, but in practice few do.

The third feature of the preprocessor is conditional compilation. MINIX has several places where code is written specifically for the 8088 processor, and this code should not be included when compiling for another processor. These sections look like this:

 #ifdef i8088
 <statements for the 8088 only>
 #endif

If the symbol i8088 is defined, the statements between the two preprocessor directives #ifdef i8088 and #endif are included in the preprocessor output; otherwise they are skipped. By calling the compiler with the command

 cc -c -Di8088 prog.c

or by including the statement

 #define i8088

in the program, we define the symbol i8088, so all the 8088-dependent code is included. As MINIX evolves, it may acquire special code for the 68000 and other processors, which will be handled in the same way.

As an example of how the preprocessor works, consider the program in Fig. A-7(a). It includes a single file, prog.h, whose contents are as follows:

 int x;
 #define MAX_ELEMENTS 100

Suppose the compiler is called with the command

 cc -E -Di8088 prog.c

After the file has passed through the preprocessor, the output will be as shown in Fig. A-7 (b).

It is this output, not the source file, that is given as input to the C compiler.

(a)

 #include "prog.h"
 main()
 {
     int a[MAX_ELEMENTS];
     x = 4;
     a[x] = 6;
 #ifdef i8088
     printf("8088. a[x]: %d\n", a[x]);
 #endif
 #ifdef m68000
     printf("68000. x = %d\n", x);
 #endif
 }

(b)

 int x;
 main()
 {
     int a[100];
     x = 4;
     a[x] = 6;
     printf("8088. a[x]: %d\n", a[x]);
 }

Fig. A-7. (a) Contents of prog.c. (b) The preprocessor output.

Note that the preprocessor has done its job and deleted all the lines starting with the # sign. If the compiler had been called like this

 cc -c -Dm68000 prog.c

the other printf would have been included instead. If it had been called like this:

 cc -c prog.c

then neither printf would have been included. (The reader may consider what would happen if the compiler were called with both -D flags.)

A.8. Idioms

In this section, we will look at several constructs that are typical for C but not common in other programming languages. First, consider the loop:

 while (n--) *p++ = *q++;

Variables p and q are usually character pointers, and n is a counter. The loop copies an n-character string from where q points to where p points. On each iteration the counter is decremented until it reaches 0, and each pointer is incremented, so they point successively to higher-numbered memory locations.

Another common construct is

 for (i = 0; i < N; i++) a[i] = 0;

which sets the first N elements of a to 0. An alternative way to write this loop is:

 for (p = &a[0]; p < &a[N]; p++) *p = 0;

In this statement, the integer pointer p is initialized to point to element 0 of the array. The loop continues as long as p is below &a[N], the address just past the last of the N elements to be cleared. The pointer construct is much more efficient than the array construct, and is therefore commonly used.

Assignments may appear in unexpected places. For example,

 if (a = f(x)) <statement>;

first calls the function f, then assigns the result of the call to a, and finally tests whether a is true (nonzero) or false (zero). If a is nonzero, the condition is satisfied. The statement

 if (a = b) <statement>;

likewise first assigns the value of b to a and then tests whether a is nonzero. This statement is completely different from

 if (a == b) <statement>;

which compares the two variables and executes the statement if they are equal.

Afterword

That's all. You would not believe how much I enjoyed preparing this text, and how much useful C I recalled along the way. I hope you, too, will enjoy plunging into the wonderful world of the C language.
