To which address does an executable program get loaded before execution starts?

My question assumes that the program is compiled in a Linux hosted environment as opposed to a freestanding one (i.e. an embedded system). My interest in this question is half academic, but I do use Ubuntu Linux 12.04.04 LTS as my programming environment and I would just like to know how things work under the hood.

To help explain my question I've written some lead-in. Let's agree that the following is pseudo-code, so that any errors may be glossed over and the answer can focus on principles rather than a specific implementation.

int main(void)
{
    int j;
    j = 2;

    if (j == 2) {
        j = j + 1;
    } else {
        j = j - 1;
    }

    return 0;
}

The assembly output would have compare and branch instructions, such as in the following pseudo-assembly code. This is my assumption, so if my understanding is way off please set me on track...

.start
        LOAD J, #2    ;   j = 2
        CMP J, #2     ;   if (j == 2)
        BNE .else    
        INC J, #1     ;   j = j + 1
.else
        DEC J, #1     ;   j = j - 1

So the resulting machine code may cause the branch to jump relative to the PC. If we agree each instruction is always 32 bits wide for the sake of simplicity, then "BNE .else" may produce a machine code instruction that sets PC = PC + 8, since that's the location of .else relative to the branch (basically skipping the "INC J, #1" instruction).

However, if the .else block was further away, then "BNE .else" could result in an absolute jump to a specific location in memory, in which case the address would be .start + 16 bytes.

If the relative jump example were always true, then the program could just be copied to any part of memory, say 0x20000000 or 0x30000000, then the PC moved to that location and the code starts running. The op-code for the branch in each case would be "BNE 0x20000010" and "BNE 0x30000010" respectively (assuming somehow we found a way to squeeze the BNE op-code into the address by aligning to 4- or 8-byte boundaries).

If the second absolute jump example were true, then the start location would need to be known at compile time, or the operating system would have to relocate the branch instructions prior to running the program.

If the compiler assumed the code would be run at 0x00000000 but the OS wants the code to run at 0x20000000, then the OS would need to alter the branch statements as it loaded the code into memory, laying down the new jump address.

So my questions are:

  1. Typically, does a compiler produce relative or absolute machine code, or both? Perhaps it just never produces absolute style code to avoid all of this...

  2. If it does produce absolute machine code, to which address in memory does it assume the code will commence?

  3. If it doesn't fix the address in memory, then does it write the machine code as if it will commence at zero address and how would the OS go about displacing the branch instructions before running?

Answers


I think these articles will help answer your questions:

Relocation

Position Independent Code

Memory Management Unit

The short answer is that the compiler produces absolute machine code that is packaged in a relocatable format. The addresses are fixed when the relocatable executable is started.

Most modern hardware has a memory management unit that greatly simplifies this process: each process executes in its own virtual address space, which removes the need to relocate the addresses when the program is loaded.

Shared libraries are typically compiled with the -fPIC (or equivalent) option in order to produce position independent code.


First of all you should know that a compiler creates object files and the linker links multiple object files into an executable file.

The object file contains only "placeholders" for addresses (e.g. "jmp 0x0") which will be replaced by the linker (e.g. replace "jmp 0x0" by "jmp 0x12345678").

It is also possible to compile position-independent code using certain compiler command-line options. This is used for shared libraries, because such code must be able to run at different locations in memory without being changed. By default the compiler uses absolute addressing (the addresses will be inserted by the linker).

The executable file format contains information about the address at which the executable file is to be loaded. Unlike Windows (where executable files may be moved to another position in some, but not all, cases), under Linux the address at which a conventional (non-PIE) executable is loaded is fixed and given by the linker. Shared libraries can be loaded at any address in memory; therefore they have to use relative addressing instead of absolute addressing.

Read the link to the Memory Management Unit given in Brad Lanam's answer if you want to know how different programs can "share" the same address.

On machines that do not have an MMU (e.g. the Linux version for the late 1980s Amiga models - yes, they can run Linux!) executable files must be compiled position-independent, too, because they cannot be loaded to a fixed address.

By the way: Nearly all jump instructions on most CPUs (x86, ARM, MIPS, SPARC, PowerPC) are relative, not absolute. (Notable exceptions are the "far jump" on x86 and the "J(AL)" instruction on MIPS).

