The process of compiling C++ programs

The purpose of this article:



In this article I want to describe how programs written in C++ are compiled and walk through each stage of compilation. My goal is not to cover everything in detail, but to give a general picture. This article is also a necessary introduction to the next article, on static and dynamic libraries, because understanding the compilation process is essential before discussing libraries.







All actions will be performed on Ubuntu 16.04.

The g++ compiler version used:







 $ g++ --version
 g++ (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609
      
      





Composition of the g++ compiler





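g++ is made up of several components: the compiler proper (cc1plus), the assembler (as), and the linker (ld). For the most part we will not call these components directly, because working with C++ code requires additional libraries; instead we let the main compiler component, g++, pass everything that is needed.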







Why compile source files?



A C++ source file is just code; it cannot be run as a program or used as a library. Therefore, each source file must be compiled into an executable file or into a dynamic or static library (libraries will be discussed in the next article).







Compilation steps:



Before we begin, let's create the source .cpp file that we will work with throughout the article.







driver.cpp:







    #include <iostream>
    using namespace std;
    #define RETURN return 0

    int main() {
        cout << "Hello, world!" << endl;
        RETURN;
    }
      
      





1) Preprocessing



The very first stage of compiling a program.







The preprocessor is a macro processor that transforms your program for further compilation. At this stage the preprocessor directives are handled. For example, the preprocessor inserts headers into the code (#include), removes comments, replaces macros (#define) with their values, and selects the required pieces of code according to the conditions of #if, #ifdef and #ifndef.







Headers included with the #include directive are themselves preprocessed recursively and inserted into the output file. Because the same header can end up being included several times during preprocessing, special preprocessor directives (include guards) are usually used to protect against repeated inclusion and cyclic dependencies.







We will get the preprocessed code in the output file driver.ii (C++ files that have passed through the preprocessing stage have the .ii extension) using the -E flag, which tells the compiler not to compile the file (more on this later) but only to preprocess it:







 $ g++ -E driver.cpp -o driver.ii
      
      





Looking at the body of the main function in the newly generated file, you can see that the RETURN macro has been replaced:







    int main() {
        cout << "Hello, world!" << endl;
        return 0;
    }
      
      












In the newly generated file you can also see a huge number of new lines: these are the contents of the iostream header and of the library headers it includes.







2) Compilation



At this step g++ performs its main task: it compiles, that is, it converts the directive-free code obtained at the previous step into assembly code. This is an intermediate step between a high-level language and machine (binary) code.







Assembly code is a human-readable representation of machine code.







Using the -S flag, which tells the compiler to stop after the compilation stage, we get the assembly code in the output file driver.s:







 $ g++ -S driver.ii -o driver.s
      
      





driver.s:

        .file   "driver.cpp"
        .local  _ZStL8__ioinit
        .comm   _ZStL8__ioinit,1,1
        .section        .rodata
    .LC0:
        .string "Hello, world!"
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB1021:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %esi
        movl    $_ZSt4cout, %edi
        call    _ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc
        movl    $_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_, %esi
        movq    %rax, %rdi
        call    _ZNSolsEPFRSoS_E
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1021:
        .size   main, .-main
        .type   _Z41__static_initialization_and_destruction_0ii, @function
    _Z41__static_initialization_and_destruction_0ii:
    .LFB1030:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        subq    $16, %rsp
        movl    %edi, -4(%rbp)
        movl    %esi, -8(%rbp)
        cmpl    $1, -4(%rbp)
        jne     .L5
        cmpl    $65535, -8(%rbp)
        jne     .L5
        movl    $_ZStL8__ioinit, %edi
        call    _ZNSt8ios_base4InitC1Ev
        movl    $__dso_handle, %edx
        movl    $_ZStL8__ioinit, %esi
        movl    $_ZNSt8ios_base4InitD1Ev, %edi
        call    __cxa_atexit
    .L5:
        nop
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1030:
        .size   _Z41__static_initialization_and_destruction_0ii, .-_Z41__static_initialization_and_destruction_0ii
        .type   _GLOBAL__sub_I_main, @function
    _GLOBAL__sub_I_main:
    .LFB1031:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $65535, %esi
        movl    $1, %edi
        call    _Z41__static_initialization_and_destruction_0ii
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE1031:
        .size   _GLOBAL__sub_I_main, .-_GLOBAL__sub_I_main
        .section        .init_array,"aw"
        .align 8
        .quad   _GLOBAL__sub_I_main
        .hidden __dso_handle
        .ident  "GCC: (Ubuntu 5.4.0-6ubuntu1~16.04.9) 5.4.0 20160609"
        .section        .note.GNU-stack,"",@progbits
      
      





At this stage we can still open the result and read it. But for the machine to understand our code, it must be converted into machine code, which we will get in the next step.







3) Assembly



Since x86 processors execute instructions in binary form, the assembly code must be translated into machine code by the assembler.







The assembler converts assembly code into machine code and stores it in an object file.







An object file is an intermediate file created by the assembler that stores a piece of machine code. Such a piece of machine code, not yet linked with other pieces into a final executable program, is called object code.







Later, this object code can be saved in a static library so that it does not have to be compiled again.







We get the machine code in the output object file driver.o using the assembler (as):







 $ as driver.s -o driver.o
      
      





But nothing is finished at this step yet, because there can be many object files and all of them must be combined into a single executable file by the linker. So we move on to the next stage.







4) Linking



The linker combines all the object files and static libraries into a single executable file that we can then run. To understand how the linking happens, we need to talk about the symbol table.







A symbol table is a data structure created by the compiler and stored in the object files themselves. The symbol table stores the names of variables, functions, classes, objects, and so on, where each identifier (symbol) is associated with its type and scope. The symbol table also stores references to data and procedures located in other object files.

It is with the help of the symbol table and the references stored in it that the linker can later resolve the relationships between data across the object files and create a single executable file from them.







We get the driver executable:







 $ g++ driver.o -o driver
      
      





5) Loading



The last step our program goes through is the call to the loader, which loads the program into memory. Dynamic libraries may also be loaded at this stage.







Let's run our program:







 $ ./driver
 Hello, world!
      
      





Conclusion



This article covered the basics of the compilation process, which every novice programmer will find useful to understand. The second article, on static and dynamic libraries, will be published shortly.







