computer system chapter 7. linking lynn choi korea university
TRANSCRIPT
Computer SystemComputer System
Chapter 7. LinkingChapter 7. Linking
Lynn ChoiLynn Choi
Korea UniversityKorea University
A Simple Program TranslationA Simple Program Translation
Problems:• Efficiency: small change requires complete recompilation• Modularity: hard to share common functions (e.g. printf)
Solution:• Separate compilation: use linker
Translator
m.c
p
ASCII source file
Binary executable object file(memory image on disk)
A Better Scheme Using a LinkerA Better Scheme Using a Linker
Linker (ld)
Translators
m.c
m.o
Translators
a.c
a.o
p
Separately compiled relocatable object files
Executable object file (contains code and data for all functions defined in m.c and a.c)
Compiler Driver Compiler Driver
Compiler driverCompiler driver coordinates all steps in the translation and linking coordinates all steps in the translation and linking process. process.
Typically included with each compilation system (e.g., gcc)
Invokes preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld).
Passes command line arguments to appropriate phases
Example: create executable Example: create executable pp from from m.cm.c and and a.ca.c::
bass> gcc -O2 -v -o p m.c a.c cpp [args] m.c /tmp/cca07630.i cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s as [args] -o /tmp/cca076301.o /tmp/cca07630.s <similar process for a.c>ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o bass>
Example Program 1Example Program 1/* main.c */ void swap();
int buf[2] = {1, 2};
int main() {
swap(); return 0;
}
/* swap.c */ extern int buf[];
int *bufp0 = &buf[0]; int *bufp1;
void swap() {
int temp;
bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp;
}
Unix> gcc –O2 –g –o p main.c swap.cUnix> gcc –O2 –g –o p main.c swap.cThis generates the following compilation processes
cpp [other arguments] main.c /tmp/main.ic11 /tmp/main.i main.c –O2 [other arguments] –o /tmp/main.sas [other arguments] –o /tmp/main.o /tmp/main.s(The driver goes through the same process to generate swap.o.)Ld –o p [system object files and args] /tmp/main.o /tmp/swap.o
LinkerLinker
LinkingLinkingThe process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed.Linking time
Can be done at compile time, i.e. when the source code is translatedOr, at load time, i.e. when the program is loaded into memoryOr, even at run time.
Static LinkerStatic LinkerPerforms the linking at compile time.Takes a collection of relocatable object files and command line arguments and generate a fully linked executable object file that can be loaded and run.Performs two main tasks
Symbol resolution: associate each symbol reference with exactly one symbol definitionRelocation: relocate code and data sections and modify symbol references to the relocated memory locations
Dynamic LinkerDynamic LinkerPerforms the linking at load time or at run time. Will be discussed later..
Object FilesObject Files
Relocatable object fileRelocatable object fileContains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file
Compilers and assemblers generate relocatable object files
Executable object fileExecutable object fileContains binary code and data in a form that can be copied directly into memory and executed
Linkers generate executable object files
Shared object fileShared object fileA special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or at run time
What Does a Linker Do?What Does a Linker Do?
Merges object filesMerges object filesMerges multiple relocatable (.o) object files into a single executable object file that can loaded and executed by the loader.
Resolves external referencesResolves external referencesAs part of the merging process, resolves external references.
External reference: reference to a symbol defined in another object file.
Relocates symbolsRelocates symbolsRelocates symbols from their relative locations in the .o files to new absolute positions in the executable.
Updates all references to these symbols to reflect their new positions.
References can be in either code or datacode: a(); /* reference to symbol a */
data: int *xp=&x; /* reference to symbol x */
Why Linkers?Why Linkers?
ModularityModularityProgram can be written as a collection of smaller source files, rather than one monolithic mass.
Can build libraries of common functions (more on this later)e.g., Math library, standard C library
EfficiencyEfficiencyTime:
Change one source file, compile, and then relink.
No need to recompile other source files.
Space: Libraries of common functions can be aggregated into a single file...
Yet executable files and running memory images contain only code for the functions they actually use.
Object File FormatObject File Format
Object file format varies from system to systemObject file format varies from system to systemExamples: a.out (early UNIX systems), COFF (Common Object File Format, used by early versions of System V), PE (Portable Executable, a variation of COFF used by Microsoft Windows), ELF (Used by modern UNIX including Linux)
ELF (Executable and Linkable Format)ELF (Executable and Linkable Format)Standard binary format for object files
Derives from AT&T System V UnixLater adopted by BSD Unix variants and Linux
One unified format for Relocatable object files (.o),
Executable object files
Shared object files (.so)
Generic name: ELF binaries
Better support for shared libraries than old a.out formats.
ELF Object File FormatELF Object File FormatELF headerELF header
Word size, byte ordering of the system, the machine type (e.g. IA32)The size of ELF header, the object file type (.e.g. relocatable, executable, or shared), the file offset of the section header table and the size and number of entries in the section header table.
Section header tableSection header tableContains the locations and sizes of various sectionsA fixed size entry for each section
Segment header tableSegment header tablePage size, virtual and physical addresses of memory segments (sections), segment sizes.
.text.text section: machine code section: machine code
.data.data section: initialized global variables section: initialized global variablesLocal C variables are maintained at run time on the stack, and do not appear in .data or .bss section
.bss.bss section: uninitialized global variables section: uninitialized global variables“Block Storage Start” or “Better Save Space!”Just a place holder and do not occupy space
ELF header
Segment header table(required for executables)
.text section
.data section
.bss section
.symtab
.rel.txt
.rel.data
.debug
Section header table(required for relocatables)
0
ELF Object File Format (cont)ELF Object File Format (cont)
.symtab.symtab section sectionSymbol tableInformation about function and global variables
.rel.text.rel.text section sectionRelocation info for .text sectionAddresses of instructions that will need to be modified in the executable
.rel.data.rel.data section sectionRelocation info for .data sectionAddresses of pointer data that will need to be modified in the merged executable
.debug.debug section sectionInfo for symbolic debugging (gcc -g)
.line.line section sectionMapping between line numbers in source code and machine instructions in the .text section
.strtab.strtab section sectionString table for symbols in the .symtab and .debug
ELF header
Program header table(required for executables)
.text section
.data section
.bss section
.symtab
.rel.text
.rel.data
.debug
Section header table(required for relocatables)
0
.line
.strtab
Life and Scope of an ObjectLife and Scope of an Object
Life vs. scopeLife vs. scopeLife of an object determines whether the object is still in memory (of the process) whereas the scope of an object determines whether the object can be accessed at this position
It is possible that an object is live but not visible.
It is not possible that an object is visible but not live.
Local variablesLocal variablesVariables defined inside a function
The scope of these variables is only within this function
The life of these variables ends when this function completes
So when we call the function again, storage for variables is created andvalues are reinitialized.
Static local variables - If we want the value to be extent throughout the life of a program, we can define the local variable as "static."
Initialization is performed only at the first call and data is retained between func calls.
Life and Scope of an ObjectLife and Scope of an Object
Global variablesGlobal variablesVariables defined outside a function
The scope of these variables is throughout the entire program
The life of these variables ends when the program completes
Static variablesStatic variablesStatic variables are local in scope to their module in which they are defined, but life is throughout the program.
Static local variables: static variables inside a function cannot be called from outside the function (because it's not in scope) but is alive and exists in memory.
Static variables: if a static variable is defined in a global space (say at beginning of file) then this variable will be accessible only in this file (file scope)
If you have a global variable and you are distributing your files as a library and you want others not to access your global variable, you may make it static by just prefixing keyword static
SymbolsSymbols
Three kinds of linker symbolsThree kinds of linker symbolsGlobal symbols that are defined by module m and that can be referenced by other modules.
Nonstatic C functions and nonstatic global variablesFunctions or global variables that are defined without the C static attribute.
Global symbols that are referenced by module m, but defined by other moduleThese are called externals.
Local symbols that are defined and referenced exclusively by module mStatic C functions and static variables
Local procedure variables that are defined with C static attribute are not managed on the stack. Instead, the compiler allocates space in .data or in .bss
Local linker symbol ≠ local program variable
ELF Symbol TableELF Symbol Table
typedef struct { int name; /* string table offset */ int value; /* section offset, or VM address */ int size; /* object size in bytes */ char type:4, /* data, func, section, or src file name (4 bits) */ binding:4; /* local or global (4 bits) */ char reserved; /* unused */ char section; /* section header index, ABS, UNDEF, */
/* or COMMON */ } Elf_Symbol;
Num: Value Size Type Bind Ot Ndx Name8: 0 8 OBJECT GLOBAL 0 3 buf9: 0 17 FUNC GLOBAL 0 1 main10: 0 0 NOTYPE GLOBAL 0 UND swap
Three entries in the symbol table for main.o
Strong and Weak SymbolsStrong and Weak Symbols
Symbol resolutionSymbol resolutionThe linker associates each reference with exactly one symbol definition
References to local symbols defined in the same module is straightforward
Resolving references to global variables is trickierThe same symbol might be defined by multiple object files
Program symbols are either strong or weakProgram symbols are either strong or weakstrong: procedures and initialized globals
weak: uninitialized globalsAt compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file
p1.c p2.c
strong
weak
strong
strong int foo=5;
p1() {}
int foo;
p2() {}
Linker’s Symbol Resolution RulesLinker’s Symbol Resolution Rules
Rule 1. A strong symbol can only appear once.Rule 1. A strong symbol can only appear once.
Rule 2. A weak symbol can be overridden by a strong symbol of the Rule 2. A weak symbol can be overridden by a strong symbol of the same name.same name.
references to the weak symbol resolve to the strong symbol.
Rule 3. If there are multiple weak symbols, the linker can pick an Rule 3. If there are multiple weak symbols, the linker can pick an arbitrary one.arbitrary one.
Linker PuzzlesLinker Puzzles
int x;p1() {}
int x;p2() {}
int x;int y;p1() {}
double x;p2() {}
int x=7;int y=5;p1() {}
double x;p2() {}
int x=7;p1() {}
int x;p2() {}
int x;p1() {} p1() {}
Link time error: two strong symbols (p1)
References to x will refer to the same uninitialized int. Is this what you really want?
Writes to x in p2 might overwrite y!Evil!
Writes to x in p2 will overwrite y!Nasty!
Nightmare scenario: two identical weak structs, compiled by different compilerswith different alignment rules.
References to x will refer to the same initializedvariable.
RelocationRelocation
After the symbol resolution phaseAfter the symbol resolution phaseOnce the symbol resolution step has been completed, the linker associates each symbol reference in the code with exactly one symbol definition
Linker knows the exact sizes of the code and data sections in each object module
Relocation consists of 2 stepsRelocation consists of 2 stepsRelocating sections and symbol definitions
Merges all sections of the same type into a new aggregate section.data sections from all the input modules are merged into a single .data section
Assign run-time memory addresses to the new aggregate sections and to each symbol defined internally in each module
Relocating symbol references within sectionsModify external references so that they point to the correct run-time addresses
To perform this step, the linker relies on relocation entries in the relocatable object modules
Example C ProgramExample C Program
int e=7; int main() { int r = a(); exit(0); }
m.c a.cextern int e; int *ep=&e;int x=15; int y; int a() { return *ep+x+y; }
Relocating SectionsRelocating Sections
main()m.o
int *ep = &e
a()
a.o
int e = 7
headers
main()
a()
0system code
int *ep = &e
int e = 7
system data
more system code
int x = 15int y
system data
int x = 15
Relocatable Object Files Executable Object File
.text
.text
.data
.text
.data
.text
.data
.bss .symtab.debug
.data
uninitialized data .bss
system code
Resolving External ReferencesResolving External References
Symbols are lexical entities that name functions and variables.
Each symbol has a value (typically a memory address).
Code consists of symbol definitions and references.
References can be either local or external.
int e=7; int main() { int r = a(); exit(0); }
m.c a.cextern int e; int *ep=&e;int x=15; int y; int a() { return *ep+x+y; }
Def of local symbol e
Ref to external symbol exit(defined in libc.so)
Ref toexternalsymbol e
Def oflocal symbol ep
Defs of local symbols x and y
Refs of local symbols ep,x,y
Def oflocal symbol a
Ref to external symbol a
m.om.o Relocation Info Relocation Info
Disassembly of section .text: 00000000 <main>: 00000000 <main>: 0: 55 pushl %ebp 1: 89 e5 movl %esp,%ebp 3: e8 fc ff ff ff call 4 <main+0x4> 4: R_386_PC32 a 8: 6a 00 pushl $0x0 a: e8 fc ff ff ff call b <main+0xb> b: R_386_PC32 exit f: 90 nop
Disassembly of section .data: 00000000 <e>: 0: 07 00 00 00
source: objdump
int e=7; int main() { int r = a(); exit(0); }
m.c
a.oa.o Relocation Info ( Relocation Info (.text.text))a.cextern int e; int *ep=&e;int x=15; int y; int a() { return *ep+x+y; }
Disassembly of section .text: 00000000 <a>: 0: 55 pushl %ebp 1: 8b 15 00 00 00 movl 0x0,%edx 6: 00 3: R_386_32 ep 7: a1 00 00 00 00 movl 0x0,%eax 8: R_386_32 x c: 89 e5 movl %esp,%ebp e: 03 02 addl (%edx),%eax 10: 89 ec movl %ebp,%esp 12: 03 05 00 00 00 addl 0x0,%eax 17: 00 14: R_386_32 y 18: 5d popl %ebp 19: c3 ret
a.oa.o Relocation Info (. Relocation Info (.datadata))a.cextern int e; int *ep=&e;int x=15; int y; int a() { return *ep+x+y; }
Disassembly of section .data: 00000000 <ep>: 0: 00 00 00 00
0: R_386_32 e 00000004 <x>: 4: 0f 00 00 00
Executable After Relocation (.Executable After Relocation (.texttext))08048530 <main>: 8048530: 55 pushl %ebp 8048531: 89 e5 movl %esp,%ebp 8048533: e8 08 00 00 00 call 8048540 <a> 8048538: 6a 00 pushl $0x0 804853a: e8 35 ff ff ff call 8048474 <_init+0x94> 804853f: 90 nop 08048540 <a>: 8048540: 55 pushl %ebp 8048541: 8b 15 1c a0 04 movl 0x804a01c,%edx 8048546: 08 8048547: a1 20 a0 04 08 movl 0x804a020,%eax 804854c: 89 e5 movl %esp,%ebp 804854e: 03 02 addl (%edx),%eax 8048550: 89 ec movl %ebp,%esp 8048552: 03 05 d0 a3 04 addl 0x804a3d0,%eax 8048557: 08 8048558: 5d popl %ebp 8048559: c3 ret
Executable After Relocation (.Executable After Relocation (.datadata))
Disassembly of section .data: 0804a018 <e>: 804a018: 07 00 00 00
0804a01c <ep>: 804a01c: 18 a0 04 08
0804a020 <x>: 804a020: 0f 00 00 00
int e=7; int main() { int r = a(); exit(0); }
m.c
a.cextern int e; int *ep=&e;int x=15; int y; int a() { return *ep+x+y; }
Linking with Static LibrariesLinking with Static Libraries
Static libraryStatic libraryA package of related object modules, which can be supplied as input to the linker at compile time
The linker copies only the object modules in the library that are actually referenced by the program
Exampleslibc.a: ANSI C standard C library that includes printf, scanf, strcpy, etc.
libm.a: ANSI C math library
Instead of Instead of unix> gcc main.c /usr/lib/printf.o /usr/lib/scanf.o …
DoDounix> gcc main.c /usr/lib/libc.a
To create a library, use AR toolTo create a library, use AR toolunix> gcc –c addvec.c multvec.c
unix> ar rcs libvector.a addvec.o multvet.o
Linking with Static LibrariesLinking with Static Libraries
Translators(cpp, cc1, as)
main2.c
main2.o
libc.a
Linker (ld)
p2
printf.o and any other modules called by printf.o
libvector.a
addvec.o
Static libraries
Source files
Relocatableobject files
Fully linked executable object file
vector.h
Packaging Commonly Used FunctionsPackaging Commonly Used Functions
How to package functions commonly used by programmers?How to package functions commonly used by programmers?Math, I/O, memory management, string manipulation, etc.
Awkward, given the linker framework so far:Awkward, given the linker framework so far:Option 1: Put all functions in a single source file
Programmers link big object file into their programsSpace and time inefficient
Each executable object file occupies disk space as well as memory spaceAny change in one function would require the recompilation of the entire source file
Option 2: Put each function in a separate source fileProgrammers explicitly link appropriate binaries into their programsMore efficient, but burdensome on the programmer
Solution: Solution: static librariesstatic libraries (. (.aa archive filesarchive files))Concatenate related relocatable object files into a single file with an index (called an archive).Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.If an archive member file resolves reference, link the member file into executable.
Static Libraries (archives)Static Libraries (archives)
Translator
p1.c
p1.o
Translator
p2.c
p2.o libc.astatic library (archive) of relocatable object files concatenated into one file.
executable object file (only contains code and data for libc functions that are actually called from p1.c and p2.c)
Further improves modularity and efficiency by packaging commonly used functions [e.g., C standard library (libc), math library (libm)] Linker selects only the .o files in the archive that are actually needed by the program.
Linker (ld)
p
Creating Static LibrariesCreating Static Libraries
Translator
atoi.c
atoi.o
Translator
printf.c
printf.o
libc.a
Archiver (ar)
... Translator
random.c
random.o
ar rs libc.a \ atoi.o printf.o … random.o
Archiver allows incremental updates: • Recompile function that changes and replace .o file in archive.
C standard library
Commonly Used LibrariesCommonly Used Libraries
libc.alibc.a (the C standard library) (the C standard library)8 MB archive of 900 object files.Standard I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math
libm.alibm.a (the C math library) (the C math library)1 MB archive of 226 object files. floating point math (sin, cos, tan, log, exp, sqrt, …)
% ar -t /usr/lib/libc.a | sort …fork.o … fprintf.o fpu_control.o fputc.o freopen.o fscanf.o fseek.o fstab.o …
% ar -t /usr/lib/libm.a | sort …e_acos.o e_acosf.o e_acosh.o e_acoshf.o e_acoshl.o e_acosl.o e_asin.o e_asinf.o e_asinl.o …
Using Static LibrariesUsing Static Libraries
Linker’s algorithm for resolving external references:Linker’s algorithm for resolving external references:Scan .o files and .a files in the command line order.During the scan, keep a list of the current unresolved references.As each new .o or .a file obj is encountered, try to resolve each unresolved reference in the list against the symbols in obj. If there exist any entries in the unresolved list at end of scan, then error.
Problem:Problem:Command line order matters!Moral: put libraries at the end of the command line.
bass> gcc -L. libtest.o –lmine.a bass> gcc -L. –lmine.a libtest.o libtest.o: In function `main': libtest.o(.text+0x4): undefined reference to `libfun'
Executable Object FileExecutable Object File
After the linking with static librariesAfter the linking with static librariesThe input C program (in ASCII text) file has been transformed into a single binary file that contains all of the information needed to load the program into memory and run.
The format of an executable object fileThe format of an executable object fileSimilar to the format of a relocatable object file, except the followings
ELF header includes program’s entry point, which is the address of the 1st instruction
.init section defines a small function called _init that will be called by the program’s initialization code
No relocation information
Segment header table describes Mapping between the segments in the executable object files and the memory segments in the virtual address space
Read/write/executable permission (alignment information between segments)
Executable Object FileExecutable Object File
.data
.symtab
.debug
0
.rodata
.bss
ELF header
Describes object file
sections
.strtab
Section header table
.line
Segment header table
.text
.init Read-only memory segment (code segment)
Read/write memory segment(data segment)
Symbol table and debugging info are notloaded into memory
Loading Executable BinariesLoading Executable Binaries
ELF header
Segment header table(required for executables)
.text section
.data section
.bss section
.symtab
.debug
.line
.strtab
Section header table(required for relocatables)
0
.text segment(r/o)
.data segment(initialized r/w)
.bss segment(uninitialized r/w)
Executable object file for example program p
Process image
init and shared libsegments
Kernel virtual memory
Memory-mapped region forshared libraries
Run-time heap(created by malloc)
User stack(created at runtime)
Unused0
%esp (stack pointer)
Memoryinvisible touser code
brk
0xc0000000
0x08048000
0x40000000
Read/write segment(.data, .bss)
Read-only segment(.init, .text, .rodata)
Loaded from the executable file
Linux Run-time Memory ImageLinux Run-time Memory Image
Startup Routines for C programStartup Routines for C program
When the loader (execve) runsWhen the loader (execve) runsIt creates the memory image by copying chunks of the executable object files into the code and data segments (guided by the segment header table)
The loader jumps to the program’s entry point, The loader jumps to the program’s entry point, Which is the address of the _start symbol.
The startup code at the _start address is defined in the object file crt1.oThe startup code at the _start address is defined in the object file crt1.o0x080480c0 <_start>: /* entry point */
call __libc_init_first /* startup code in .text */
call _init /* startup code in .init */Initialization routines from .text and .init sections
call atexit /* startup code in .text */Appends a list of routines that should be called when the application calls the exit function
call main /* application code */exit function runs the functions registered by atexit and returns control to OS by calling _exit
call _exit /* return control to OS */
Shared LibrariesShared Libraries
Static libraries have the following disadvantages:Static libraries have the following disadvantages:Potential for duplicating lots of common code in the executable files on a file system.
Every C program needs the standard C library
Potential for duplicating lots of code in the text segment of each process. Minor bug fixes of system libraries require each application to explicitly relink.
Solution: Solution: Shared librariesShared libraries Whose members are dynamically loaded and linked at run-time.Called shared objects (.so) in UNIX or dynamic link libraries (DLL) in Windows
Shared library routines can be shared by multiple processes.There is exactly one .so file for a particular library in any given file system.
The code and data in this .so file are shared by all the executable object files
A single copy of the .text section of a shared library in memory can be shared by multiple processes
Dynamic linking can occur when executable is first loaded and run.Common case for Linux, handled automatically by ld-linux.so.
Dynamic linking can also occur after program has begun.An application can request the dynamic linker to load and link shared libraries.In Linux, this is done explicitly by user with dlopen().
Dynamic Linking with Shared LibraryDynamic Linking with Shared Library
Translators (cpp, cc1, as)
main2.c
main2.o
libc.solibvector.so
Linker (ld)
p2
Dynamic linker (ld-linux.so)
Relocation and symbol table info
libc.solibvector.so
Code and data
Partially linked executable object file(on disk)
Relocatableobject file
Fully linked executablein memory
vector.h
Loader (execve)
gcc –shared –fPIC –o libvector.so addvec.c multvec.c
gcc –o p2 main2.c ./libvector.so
loader loads and runsthe dynamic linker, and
then passes control to the application
The Complete PictureThe Complete Picture
Translator
m.c
m.o
Translator
a.c
a.o
libc.so
Static Linker (ld)
p
Loader/Dynamic Linker(ld-linux.so)
libwhatever.a
p’
libm.so
Homework 3Homework 3
Read Chapter 3 (from Computer Organization and Design Textbook)Read Chapter 3 (from Computer Organization and Design Textbook)
Exercise (from Computer Systems Textbook)Exercise (from Computer Systems Textbook)
7.6
7.9
7.12
7.13
7.14