From C To Assembly Language LG #94

...making Linux just a little more fun!

From C To Assembly Language
By Hiran Ramankutty

1. Overview

What is a microcomputer system made up of? A microcomputer system is made up of a microprocessor unit (MPU), a bus system, a memory subsystem, an I/O subsystem and an interface among all components. A typical answer one can expect.

This is only the hardware side. Every microcomputer system requires a software so as to direct each of the hardware components while they are performing their respective tasks. Computer software can be thought about at system side (system software) and user side (user software).

The user software may include some in-built libraries and user created libraries in the form of subroutines which may be needed in preparing programs for execution.

The system software may encompass a variety of high-level language translators, an assembler, a text editor, and several other programs for aiding in the preparation of other programs. We already know that there are three levels of programming and they are Machine language, Assembly language and High-level language.

Machine language programs are programs that the computer can understand and execute directly (think of programming in any microprocessor kit). Assembler language instructions match machine language instructions on a more or less one-for-one basis, but are written using character strings so that they are more easily understood, and high-level language instructions are much closer to the English language and are structured so that they naturally correspond to the way programmers think. Ultimately, an assembler language or high-level language program must be converted into machine language by programs called translators. They are referred to as assembler and compiler or interpreter respectively.

Compilers for high-level languages like C/C++ have the ability to translate high-level language into assembly code. The GNU C and C++ Compiler option of -S will generate an assembly code equivalent to that of the corresponding source program. Knowing how the most rudimentary constructs like loops, function calls and variable declaration are mapped into assembly language is one way to achieve the goal of mastering C internals. Before proceeding further, you must make it a point that you are familiar with Computer Architecture and Intel x86 assembly language to help you follow the material presented here.

2. Getting Started

To begin with, write a small program in C to print hello world and compile it with -S options. The output is an assembler code for the input file specified. By default, GCC makes the assembler file name by replacing the suffix `.c', with `.s'. Try to interpret the few lines at the end of the assembler file.

The 80386 and above family of processors have myriads of registers, instructions and addressing modes. A basic knowledge about only a few simple instructions is sufficient to understand the code generated by the GNU compiler.

Generally, any assembly language instruction includes a label, a mnemonic, and operands. An operand's notation is sufficient to decipher the operand's addressing mode. The mnemonics operate on the information contained in the operands. In fact, assembly language instructions operate on registers and memory locations. The 80386 family has general purpose registers (32 bit) called eax, ebx, ecx etc. Two registers, ebp and esp are used for manipulating the stack. A typical instruction, written in GNU Assembler (GAS) syntax, would look like this:

movl $10, %eax

This instruction stores the value 10 in the eax register. The prefix `%' to the register name and `$' to the immediate value are essential assembler syntax. It is to be noted that not all assemblers follow the same syntax.

Our first assembly language program, stored in a file named first.s is shown in Listing 1.

#Listing 1
.globl main
main:
  movl $20, %eax
  ret

This file can be assembled and linked to generate an a.out by giving the command cc first.s. The extensions `.s' are identified by the GNU compiler front end cc as assembly language files and invokes the assembler and linker, skipping the compilation phase.

The first line of the program is a comment. The .globl assembler directive serves to make the symbol main visible to the linker. This is vital as your program will be linked with the C startup library which will contain a call to main. The linker will complain about 'undefined reference to symbol main' if that line is omitted (try it). The program simply stores the value 20 in register eax and returns to the caller.

3. Arithmetic, Comparison, Looping

Our next program is Listing 2 which computes the factorial of a number stored in eax. The factorial is stored in ebx.

#Listing 2
.globl main
main: 
	movl $5, %eax
	movl $1, %ebx
L1:	cmpl $0, %eax		//compare 0 with value in eax
	je L2			//jump to L2 if 0==eax (je - jump if equal)
	imull %eax, %ebx	// ebx = ebx*eax
	decl %eax		//decrement eax
	jmp L1			// unconditional jump to L1
L2: 	ret

L1 and L2 are labels. When control flow reaches L2, ebx would contain the factorial of the number stored in eax.

4. Subroutines

When implementing complicated programs, we split the tasks to be solved in systematic order. We write subroutines and functions for each of the tasks which are called when ever required. Listing 3 illustrates subroutine call and return in assembly language programs.

#Listing 3
.globl main
main:
	movl $10, %eax
	call foo
	ret
foo:
	addl $5, %eax
	ret

The instruction call transfers control to subroutine foo. The ret instruction in foo transfers control back to the instruction after the call in main.

Generally, each function defines the scope of variables it uses in each call of the routine. To maintain the scopes of variables you need space. The stack can be used to maintain values of the variables in each call of the routine. It is important to know the basics of how the activation records can be maintained for repeated, recursive calls or any other possible calls in the execution of the program. Knowing how to manipulate registers like esp and ebp and making use of instructions like push and pop which operate on the stack are central to understanding the subroutine call and return mechanism.

5. Using The Stack

A section of your program's memory is reserved for use as a stack. The Intel 80386 and above microprocessors contain a register called stack pointer, esp, which stores the address of the top of stack. Figure 1 below shows three integer values, 49,30 and 72, stored on the stack (each integer occupying four bytes) with esp register holding the address of the top of stack.

Figure 1

Unlike the stack analogous to a pile of bricks growing up wards, on Intel machines stack grows down wards. Figure 2 shows the stack layout after the execution of the instruction pushl $15.

Figure 2

The stack pointer register is decremented by four and the number 15 is stored as four bytes at locations 1988, 1989, 1990 and 1991.

The instruction popl %eax copies the value at top of stack (four bytes) to the eax register and increments esp by four. What if you do not want to copy the value at top of stack to any register? You just execute the instruction addl $4, %esp which simply increments the stack pointer.

In Listing 3, the instruction call foo pushes the address of the instruction after the call in the calling program on to the stack and branches to foo. The subroutine ends with ret which transfers control to the instruction whose address is taken from the top of stack. Obviously, the top of stack must contain a valid return address.

6. Allocating Space for Local Variables

It is possible to have a C program manipulating hundreds and thousands of variables. The assembly code for the corresponding C program will give you an idea of how the variables are accommodated and how the registers are used for manipulating the variables without causing any conflicts in the final result that is to be obtained.

The registers are few in number and cannot be used for holding all the variables in a program. Local variables are allotted space within the stack. Listing 4 shows how it is done.

#Listing 4
.globl main
main:
	call foo
	ret
foo:
	pushl %ebp
	movl %esp, %ebp
	subl $4, %esp
	movl $10, -4(%ebp)
	movl %ebp, %esp
	popl %ebp
	ret

First, the value of the stack pointer is copied to ebp, the base pointer register. The base pointer is used as a fixed reference to access other locations on the stack. In the program, ebp may be used by the caller of foo also, and hence its value is copied to the stack before it is overwritten with the value of esp. The instruction subl $4, %esp creates enough space (four bytes) to hold an integer by decrementing the stack pointer. In the next line, the value 10 is copied to the four bytes whose address is obtained by subtracting four from the contents of ebp. The instruction movl %ebp, %esp restores the stack pointer to the value it had after executing the first line of foo and popl %ebp restores the base pointer register. The stack pointer now has the same value which it had before executing the first line of foo. The table below displays the contents of registers ebp, esp and stack locations from 3988 to 3999 at the point of entry into main and after the execution of every instruction in Listing 4 (except the return from main). We assume that ebp and esp have values 7000 and 4000 stored in them and stack locations 3988 to 3999 contain some arbitrary values 219986, 1265789 and 86 before the first instruction in main is executed. It is also assumed that the address of the instruction after call foo in main is 30000.

Table 1

6. Parameter Passing and Value Return

The stack can be used for passing parameters to functions. We will follow a convention (which is used by our C compiler) that the value stored by a function in the eax register is taken to be the return value of the function. The calling program passes a parameter to the callee by pushing its value on the stack. Listing 5 demonstrates this with a simple function called sqr.

#Listing 5
.globl main
main:
	movl $12, %ebx
	pushl %ebx
	call sqr
	addl $4, %esp       //adjust esp to its value before the push
	ret
sqr:
	movl 4(%esp), %eax
	imull %eax, %eax    //compute eax * eax, store result in eax 
	ret

Read the first line of sqr carefully. The calling function pushes the content of ebx on the stack and then executes a call instruction. The call will push the return address on the stack. So inside sqr, the parameter is accessible at an offset of four bytes from the top of stack.

8. Mixing C and Assembler

Listing 6 shows a C program and an assembly language function. The C function is defined in a file called main.c and the assembly language function in sqr.s. You compile and link the files together by typing cc main.c sqr.s.

The reverse is also pretty simple. Listing 7 demonstrates a C function print and its assembly language caller.

#Listing 6
//main.c
main()
{
	int i = sqr(11);
	printf("%d\n",i);
}

//sqr.s
.globl sqr
sqr:
	movl 4(%esp), %eax
	imull %eax, %eax
	ret

#Listing 7
//print.c
print(int i)
{
	printf("%d\n",i);
}

//main.s
.globl main
main:
	movl $123, %eax
	pushl %eax
	call print
	addl $4, %esp
	ret

9. Assembler Output Generated by GNU C

I guess this much reading is sufficient for understanding the assembler output produced by gcc. Listing 8 shows the file add.s generated by gcc -S add.c. Note that add.s has been edited to remove many assembler directives (mostly for alignments and other things of that sort).

#Listing 8
//add.c
int add(int i,int j)
{
	int p = i + j;
	return p;
}

//add.s
.globl add
add:
	pushl %ebp
	movl %esp, %ebp
	subl $4, %esp		//create space for integer p
	movl 8(%ebp),%edx	//8(%ebp) refers to i
	addl 12(%ebp), %edx	//12(%ebp) refers to j
	movl %edx, -4(%ebp)	//-4(%ebp) refers to p
	movl -4(%ebp), %eax	//store return value in eax
	leave			//i.e. to movl %ebp, %esp; popl %ebp ret

The program will make sense upon realizing the C statement add(10,20) which gets translated into the following assembler code:

pushl $20
pushl $10
call add

Note that the second parameter is passed first.

10. Global Variables

Space is created for local variables on the stack by decrementing the stack pointer and the allotted space is reclaimed by simply incrementing the stack pointer. So what is the equivalent GNU C generated code for global variables? Listing 9 provides the answer.

#Listing 9
//glob.c
int foo = 10;
main()
{
	int p foo;
}

//glob.s
.globl foo
foo:
	.long 10
.globl main
main:
	pushl %ebp
	movl %esp,%ebp
	subl $4,%esp
	movl foo,%eax
	movl %eax,-4(%ebp)
	leave
	ret

The statement foo: .long 10 defines a block of 4 bytes named foo and initializes the block with zero. The .globl foo directive makes foo accessible from other files. Now try this out. Change the statement int foo to static int foo. See how it is represented in the assembly code. You will notice that the assembler directive .globl is missing. Try this out for different storage classes (double, long, short, const etc.).

11. System Calls

Unless a program is just implementing some math algorithms in assembly, it will deal with such things as getting input, producing output, and exiting. For this it will need to call on OS services. In fact, programming in assembly language is quite the same in different OSes, unless OS services are touched.

There are two common ways of performing a system call in Linux: through the C library (libc) wrapper, or directly.

Libc wrappers are made to protect programs from possible system call convention changes, and to provide POSIX compatible interface if the kernel lacks it for some call. However, the UNIX kernel is usually more-or-less POSIX compliant: this means that the syntax of most libc "system calls" exactly matches the syntax of real kernel system calls (and vice versa). But the main drawback of throwing libc away is that one loses several functions that are not just syscall wrappers, like printf(), malloc() and similar.

System calls in Linux are done through int 0x80. Linux differs from the usual Unix calling convention, and features a "fastcall" convention for system calls. The system function number is passed in eax, and arguments are passed through registers, not the stack. There can be up to six arguments in ebx, ecx, edx, esi, edi, ebp consequently. If there are more arguments, they are simply passed though the structure as first argument. The result is returned in eax, and the stack is not touched at all.

Consider Listing 10 given below.

#Listing 10
#fork.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
	fork();
	printf("Hello\n");
	return 0;
}

Compile this program with the command cc -g fork.c -static. Use the gdb tool and type the command disassemble fork. You can see the assembly code used for fork in the program. The -static is the static linker option of GCC (see man page). You can test this for other system calls and see how the actual functions work.

There have been several attempts to write an up-to-date documentation of the Linux system calls and I am not making this another of them.

11. Inline Assembly Programming

The GNU C supports the x86 architecture quite well, and includes the ability to insert assembly code within C programs, such that register allocation can be either specified or left to GCC. Of course, the assembly instruction are architecture dependent.

The asm instruction allows you to insert assembly instructions into your C or C++ programs. For example the instruction:

asm ("fsin" : "=t" (answer) : "0" (angle));

is an x86-specific way of coding this C statement:

answer = sin(angle);

You can notice that unlike ordinary assembly code instructions asm statements permit you to specify input and output operands using C syntax. Asm statements should not be used indiscriminately. So, when should we use them?

Asm statements allow your programs to access the computer hardware directly. This can produce programs that execute quickly. You can use them when writing operating system code that directly needs to interact with the hardware. For example, /usr/include/asm/io.h contains assembly instructions to access input/output ports directly.
Inline assembly instructions also speed up the innermost loops of the programs. For instance, sine and cosine of the same angles can be found by fsincos x86 instruction. Probably, the two listings given below will help you understand this factor better.

#Listing 11
#Name : bit-pos-loop.c 
#Description : Find bit position using a loop

#include <stdio.h>
#include <stdlib.h>

int main (int argc, char *argv[])
{
	long max = atoi (argv[1]);
	long number;
	long i;
	unsigned position;
	volatile unsigned result;

	for (number = 1; number <= max; ; ++number) {
		for (i=(number>>1), position=0; i!=0; ++position)
			i >>= 1;
		result = position;
	}
	return 0;
}

#Listing 12
#Name : bit-pos-asm.c
#Description : Find bit position using bsrl

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
	long max = atoi(argv[1]);
	long number;
	unsigned position;
	volatile unsigned result;

	for (number = 1; number <= max; ; ++number) {
		asm("bsrl %1, %0" : "=r" (position) : "r" (number));
		result = position;
	}
	return 0;
}

Compile the two versions with full optimizations as given below:

$ cc -O2 -o bit-pos-loop bit-pos-loop.c
$ cc -O2 -o bit-pos-asm bit-pos-asm.c

Measure the running time for each version by using the time command and specifying a large value as the command-line argument to make sure that each version takes at least few seconds to run.

$ time ./bit-pos-loop 250000000

and

$ time ./bit-pos-asm 250000000

The results will be varying in different machines. However, you will notice that the version that uses the inline assembly executes a great deal faster.

GCC's optimizer attempts to rearrange and rewrite program' code to minimize execution time even in the presence of asm expressions. If the optimizer determines that an asm's output values are not used, the instruction will be omitted unless the keyword volatile occurs between asm and its arguments. (As a special case, GCC will not move an asm without any output operands outside a loop.) Any asm can be moved in ways that are difficult to predict, even across jumps. The only way to guarantee a particular assembly instruction ordering is to include all the instructions in the same asm.

Using asm's can restrict the optimizer's effectiveness because the compiler does not know the asms' semantics. GCC is forced to make conservative guesses that may prevent some optimizations.

12. Exercises

Interpret the assembly code for C program in Listing 6. Modify it for eliminating errors that are obtained when generating assembly code with -Wall option. Compare the two assembly codes. What changes do you observe?
Compile several small C programs with and without optimization options (like -O2). Read the resulting assembly codes and find out some common optimization tricks used by the compiler.
Interpret assembly code for switch statement.
Compile several small C programs with inline asm statements. What differences do you observe in assembly codes for such programs.
A nested function is defined inside another function (the "enclosing function"), such that:
- the nested function has access to the enclosing function's variables; and
- the nested function is local to the enclosing function, that is, it can be called from elsewhere unless the enclosing function gives you a pointer to the nested function.
Nested functions can be useful because they help control the visibility of a function.

Consider Listing 13 given below:
```
#Listing 13
/* myprint.c */
#include <stdio.h>
#include <stdlib.h>

int main()
{
	int i;
	void my_print(int k)
	{
		printf("%d\n",k);
	}
	scanf("%d",&i);
	my_print(i);
	return 0;
}
```
Compile this program with cc -S myprint.c and interpret the assembly code. Also try compiling the program with the command cc -pedantic myprint.c. What do you observe?

[BIO] I have just given my final year B.Tech examinations in Computer Science and Engineering and a native of Kerala, India.