C Program Compilation Steps & Example with GCC on Linux

In this tutorial, we will look at the different steps involved in compiling a C program and at how to compile a C program with GCC in a Linux environment.

For the end user, compiling and running a C program takes a single click in most IDEs, or a single command on the command line. Most of us never bother to find out what the IDE does behind the scenes when we click the compile button. Yet this is an important question when working with the C programming language, and it is one of the most frequently asked interview questions for software engineering positions. An interviewer may well doubt your C skills if you cannot answer it.

C Program compilation steps and process

[Figure: C program compilation steps]

At a beginner level, the compilation of a C program can be divided into four major steps:

  • Pre-processing
  • Compilation
  • Assembly
  • Linking

Let’s now look at each of these steps in a little more depth.

Step 1: Pre-processing

During the pre-processing step, the pre-processor handles every line that begins with the pound (#) character. These lines are called pre-processor directives; #include and #define are the most common ones.

In a C program, the two most common kinds of directives that start with the pound character are:

  1. Included header files
  2. Macros

Both of these directives help reduce redundancy in the code, as you can see on lines 5 and 6 of the C program shown below.

[Figure: C and C++ program compilation steps example]

The pre-processor also strips out all comments (text after double slashes //, as on line 11 of the code above, as well as /* ... */ blocks) and joins every line ending with a backslash (\) to the line that follows it, as on line 12 of the code above.

Step 2: Compilation

During the second step, the pre-processed C program is translated into equivalent assembly language. Some compilers generate machine code directly at this stage, but GCC first produces assembly code, an intermediate form that humans can still read, rather than unreadable machine code.

If you ask GCC to stop after this step (with the -S option), it writes the result to a file with the assembly extension .s; in a normal build this file is only temporary. For instance, if the C program file is named Hello_world.c, this step generates a file named Hello_world.s containing the assembly language equivalent of your program.

Step 3: Assembler

In this step, as the name suggests, the assembler processes the assembly code created in the previous step and converts it into binary machine code: the 0s and 1s that the processor executes directly and that humans cannot easily read.

The output of this step is called an object file and has the extension .o.

Step 4: Linker

In the last step of compilation, the object file (.o) created in the previous step is processed. An object file is not yet in executable form: calls to functions defined elsewhere are still unresolved. The linker combines the object file with the required libraries and start-up code, fills in those references, and produces the final executable file. By default GCC names this file a.out unless you choose another name with the -o option.

Linking with the libraries supplies the code for every function that the program calls but does not define itself. For instance, line 12 of the example code calls the predefined function puts; its definition is provided by the C library and is resolved during this step.

[Figure: C program high level to assembly to machine code conversion]

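Another important job of the linker is to lay out the program in memory: it assigns (virtual) addresses to the code and data sections, following the rules in a linker script, and the operating system maps these onto physical memory at run time. With address space layout randomization (ASLR), the program may be loaded at a different address on every fresh run, but the layout defined by the linker stays the same.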

C Program Compilation Steps with GCC in Linux

To compile C code on the Linux command line using GCC, first make sure that the GCC compiler is installed on your Linux machine.

If GCC is not installed on your system, follow these steps (the commands below use apt, the package manager of Debian-based distributions such as Ubuntu):

1. First, refresh the list of available packages, since software on Linux is installed from packages:

$ sudo apt update

2. Then install the essential build tools by running this command:

$ sudo apt install build-essential

The build-essential package is a meta-package: it automatically installs GCC together with the other tools required to compile C code on the Linux command line, such as make and the standard C library headers.

3. To check whether GCC was installed successfully, run:

$ gcc --version 

It prints the installed GCC version, similar to this:

gcc (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Now let’s walk through the compilation steps of a C program with GCC on Linux.

Copy this example code into any editor and save it as add.c. The code defines two macros, x and y, both set to 10. The sum_int() function takes two integer parameters, adds them and returns the result, which main() stores in the integer variable out. The printf() function then prints this value on the console.

#include <stdio.h>
#define x 10
#define y 10
int sum_int(int a, int b)
{
     return a+b;
}
int main(void)
{
     int out = sum_int(x,y);
     printf("Sum of x and y = %d\n", out);
     return 0;
}

Compilation Steps with GCC in Linux

First, let’s run the pre-processing step on the C program above. To do so, execute this command on the Linux command line:

$ gcc -E add.c -o add.i

The C pre-processor can also be called directly to produce the same file:

$ cpp add.c > add.i

Running gcc -E add.c without -o prints the same result to the terminal. The pre-processed output (abridged) is shown below. Two important things to note here:

  • (Header file inclusion) First, the pre-processor pastes in the contents of every included header file; linemarkers such as # 1 "/usr/include/stdio.h" record where each file is stored on your machine.
  • (Macro expansion) Second, the pre-processor replaces the macros x and y with their defined value, 10.

# 1 "add.c"
# 1 "<built-in>"
# 1 "<command-line>"
# 31 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 32 "<command-line>" 2
# 1 "add.c"
# 1 "/usr/include/stdio.h" 1 3 4
# 27 "/usr/include/stdio.h" 3 4
# 1 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 1 3 4
# 33 "/usr/include/x86_64-linux-gnu/bits/libc-header-start.h" 3 4
# 1 "/usr/include/features.h" 1 3 4
# 461 "/usr/include/features.h" 3 4
[ ... many more lines like those above omitted ... ]
# 873 "/usr/include/stdio.h" 3 4
# 2 "add.c" 2
# 4 "add.c"
int sum_int(int a, int b)
{
     return a+b;
}
int main(void)
{
     int out = sum_int(10,10);
     printf("Sum of x and y = %d\n", out);
     return 0;
}

The second step is compilation. The following GCC command converts the pre-processed file add.i (from the previous step) into an assembly file:

$ gcc -S add.i

The output of the compilation step is an assembly file named add.s. The listing below was produced by GCC 9.3.0 on x86-64 Ubuntu, as its .ident line shows:

	.file	"add.c"
	.text
	.globl	sum_int
	.type	sum_int, @function
sum_int:
.LFB0:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	movl	%edi, -4(%rbp)
	movl	%esi, -8(%rbp)
	movl	-4(%rbp), %edx
	movl	-8(%rbp), %eax
	addl	%edx, %eax
	popq	%rbp
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE0:
	.size	sum_int, .-sum_int
	.section	.rodata
.LC0:
	.string	"Sum of x and y = %d\n"
	.text
	.globl	main
	.type	main, @function
main:
.LFB1:
	.cfi_startproc
	endbr64
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset 6, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register 6
	subq	$16, %rsp
	movl	$10, %esi
	movl	$10, %edi
	call	sum_int
	movl	%eax, -4(%rbp)
	movl	-4(%rbp), %eax
	movl	%eax, %esi
	leaq	.LC0(%rip), %rdi
	movl	$0, %eax
	call	printf@PLT
	movl	$0, %eax
	leave
	.cfi_def_cfa 7, 8
	ret
	.cfi_endproc
.LFE1:
	.size	main, .-main
	.ident	"GCC: (Ubuntu 9.3.0-10ubuntu2) 9.3.0"
	.section	.note.GNU-stack,"",@progbits
	.section	.note.gnu.property,"a"
	.align 8
	.long	 1f - 0f
	.long	 4f - 1f
	.long	 5
0:
	.string	 "GNU"
1:
	.align 8
	.long	 0xc0000002
	.long	 3f - 2f
2:
	.long	 0x3
3:
	.align 8
4:

The third step is to convert the assembly file (add.s) into an object file. The GNU assembler, as, does this (running gcc -c add.s has the same effect, because gcc invokes the assembler for you):

$ as -o add.o add.s

The assembler generates machine instructions for the instruction set architecture (ISA) of your machine, for example x86-64 (the ISA of the listing above) or AArch64 on 64-bit ARM systems.

Finally, the linker turns the object file (add.o) into an executable file. Linking also resolves the addresses of external functions. For example, the code above calls the external function printf(). The assembler does not resolve the addresses of such external functions; that is the linker’s job.

This command performs the linking step and names the executable add:

$ gcc add.o -o add

(Running gcc add.c -o add instead performs all four steps in one go.)

Now run the program from the Linux command line:

$ ./add

You will get this output:

Sum of x and y = 20
