Post on 19-Dec-2015
Generating Programs and Linking
Professor Rick Han
Department of Computer Science
University of Colorado at Boulder
CSCI 3753 Announcements
• Moodle - posted last Thursday’s lecture
• Programming shell assignment 0 due Thursday at 11:55 pm, not 11 am
• Introduction to Operating Systems
• Read Chapters 3 and 4 in the textbook
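Operating System Architecture
[Figure: layered diagram. Applications (App1, App2, App3) sit on top of the system libraries and tools (compilers, shells, GUIs). They reach the OS "kernel" through library APIs (Posix, Win32, Java, C library) and the system call API. The kernel contains the scheduler, VM, file system, device manager, and I/O, and runs on the hardware: CPU, memory, disk, display, mouse.]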
What is an Application?
• A software program consists of a sequence of code instructions and data
– for now, let a simple app = a program
• The computer executes the instructions one at a time
– code instructions operate on data
[Figure: program P1 consists of code and data held in main memory. The CPU (program counter (PC), registers, ALU) fetches code and data from main memory and writes data back.]
Loading and Executing a Program
[Figure: the binaries of programs P1 and P2, each made of code and data, are stored on disk. The OS loader copies program P1's binary (code and data) from disk into main memory.]
• Examples of machine code instructions in a binary executable:
– shift register R1 left by 2 and put the result in address A
– invoke low-level system call n to the OS: syscall n
– jump to address B
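To make the fetch-and-execute idea concrete, here is a minimal sketch of a toy machine in C. Everything in it is invented for illustration: the opcodes `OP_SHL2`, `OP_JUMP`, and `OP_HALT`, the `struct instr` layout, and the single register `r1` are assumptions, not a real instruction set. The syscall instruction from the list above is left out to keep the sketch short.

```c
/* Hypothetical three-instruction machine, invented for illustration only. */
enum opcode { OP_SHL2, OP_JUMP, OP_HALT };

struct instr {
    enum opcode op;
    int operand;    /* OP_SHL2: data address A; OP_JUMP: code address B */
};

/*
 * Runs `code` starting at instruction 0 until OP_HALT is reached.
 * r1 stands in for register R1; `data` is the program's data memory.
 * Returns the number of instructions executed.
 */
int run(const struct instr *code, int r1, int *data)
{
    int pc = 0;         /* program counter: index of the next instruction */
    int executed = 0;

    for (;;) {
        struct instr in = code[pc];     /* fetch */
        pc++;                           /* default: fall through to the next one */
        executed++;

        switch (in.op) {                /* decode and execute */
        case OP_SHL2:
            data[in.operand] = r1 << 2; /* shift R1 left by 2, write to address A */
            break;
        case OP_JUMP:
            pc = in.operand;            /* jump to address B */
            break;
        case OP_HALT:
            return executed;
        }
    }
}
```

Calling `run` on a four-instruction program (shift, jump over one instruction, halt) shows the program counter deciding which instruction is fetched next, while the data array plays the role of the program's data in main memory.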
Generating a Program’s Binary Executable
• We write source code in a high-level language like C or Java, and use tools like compilers to create a program's binary executable
[Figure: source code file P1.c → compiler → P1.s → assembler → P1.o → linker → program P1's binary executable (code and data).]
• gcc can stop after any of these stages and emit the intermediate file
• Technically, there is a preprocessing step before the compiler. "gcc -c" will generate relocatable object files and not run the linker
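As a small sketch of the stages above, the following file can be pushed through gcc one stage at a time. The file name `stages.c`, the macro `SCALE`, and the function `scale` are made up for this example; the `-E`, `-S`, and `-c` options are standard gcc options.

```c
/*
 * stages.c (hypothetical file name): one small translation unit to push
 * through the tool chain one stage at a time.
 *
 *   gcc -E stages.c -o stages.i    preprocessor only: SCALE is replaced by 4
 *   gcc -S stages.c                stop after the compiler: writes stages.s
 *   gcc -c stages.c                stop after the assembler: writes stages.o
 *
 * Linking stages.o on its own would fail with an undefined reference to
 * main, because nothing in this file defines it.
 */
#define SCALE 4

int scale(int x)
{
    return x * SCALE;
}
```

The object file `stages.o` defines the global symbol `scale` and is only turned into an executable once it is linked with another object file that defines `main`.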
Linking Multiple Object Files Into an Executable
• The linker combines multiple .o object files into one binary executable file
– Why split a program into multiple objects and then relink them?
– Breaking up a program into multiple files, and compiling them separately, reduces the amount of recompilation if a single file is edited
• You don't have to recompile the entire program, just the object file of the changed source file, and then relink the object files
[Figure: source file P1.c → compiler (cc1) → P1.s → assembler (as) → P1.o → linker (ld), which combines P1.o with foo2.o and foo3.o into the executable P1 or P1.exe (code and data).]
Linking Multiple Object Files Into an Executable
• In combining multiple object files, the linker must
– resolve references to variables and functions defined in other object files (this is called symbol resolution)
– relocate each object's internal addresses so that the executable's combination of objects is consistent in its memory references
• An object's code and data are compiled in their own private world to start at address zero
Linker Resolves Unknown Symbols
P1.c:
extern void f1(...);
int globalvar1 = 0;
main(...) { ----- f1(...) ----- }
foo2.c:
extern int globalvar1;
void f1(...) { ---- }
void f2(...) { ---- globalvar1 = 4; ---- }
• The P1.o object file will contain a list of unknown symbols, e.g. f1, in a symbol table
• foo2.o's symbol table lists unknown symbols, e.g. globalvar1
Linker Resolves Unknown Symbols
• An ELF relocatable object file contains the following sections:
– ELF header (type, size, size/# of sections)
– code (.text)
– data (.data, .bss, .rodata), where .data = initialized global variables and .bss = uninitialized global variables (.bss does not actually occupy space on disk, it is just a placeholder)
– symbol table (.symtab)
– relocation info (.rel.text, .rel.data)
– debug symbol table (.debug, only if the "-g" compile flag is used)
– line info (.line, maps C source line numbers to .text instructions, only if "-g" is used)
– string table (.strtab, for the symbol tables)
[Figure: layout of an ELF relocatable object file, top to bottom: ELF header, .text, .rodata, .data, .bss, .symtab, .rel.text, .rel.data, .debug, .line, .strtab, section header table.]
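The split between .data, .bss, and .rodata can be seen from a few global declarations. The sketch below uses made-up names (`initialized_counter`, `zeroed_table`, `banner`, `bss_is_zeroed`), and the section noted in each comment is the typical placement by gcc on Linux, not something the C language guarantees.

```c
int initialized_counter = 42;     /* initialized global: typically .data */
int zeroed_table[256];            /* uninitialized global: typically .bss */
const char banner[] = "hello";    /* read-only data: typically .rodata */

/*
 * Returns 1 if every element of the uninitialized array reads as zero.
 * C guarantees this for objects with static storage, which is why .bss
 * needs no bytes in the file: only its size has to be recorded.
 */
int bss_is_zeroed(void)
{
    for (int i = 0; i < 256; i++)
        if (zeroed_table[i] != 0)
            return 0;
    return 1;
}
```

Compiling this with "gcc -c" and inspecting the object file with a tool such as `size` or `objdump -h` is one way to check where each variable actually ended up on a given system.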
Linker Resolves Unknown Symbols
• The symbol table contains 3 types of symbols:
– global symbols defined in this object
– global symbols referenced but not defined here
– local symbols defined and referenced exclusively by this object, e.g. static global variables and functions
• Local symbols are not equivalent to local variables, which get allocated on the stack at run time
Linker Resolves Unknown Symbols
extern float f1();      /* global symbol referenced here but defined elsewhere */
int globalvar1 = 0;     /* global symbol defined here */
void f2(...) {          /* global symbol defined here */
    static int x = -1;  /* "local" symbol */
    -----
}
• The symbol table informs the linker where symbols referenced or referenceable by each object file can be found:
– if another file references globalvar1, then look here for info
– if this file references f1, then another object file's symbol table will mention f1
Linker Resolves Unknown Symbols
• Each entry in the ELF symbol table looks like:
typedef struct {
    int name;        /* string table offset */
    int value;       /* section offset or VM address */
    int size;        /* object size in bytes */
    char type:4,     /* data, func, section or src file name (4 bits) */
         binding:4;  /* local or global (4 bits) */
    char reserved;   /* unused */
    char section;    /* section header index, ABS, UNDEF; here's where we flag the undefined status */
} ELF_Symbol;
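A minimal sketch of how the binding and section fields together give the three kinds of symbols described earlier. The struct follows the simplified layout on the slide (with unsigned fields), not the exact layout of a real ELF symbol entry. The names `BIND_LOCAL`, `BIND_GLOBAL`, `SECTION_UNDEF`, `symbol_kind`, and `classify` are assumptions made for this example; the numeric values 0 and 1 mirror ELF's STB_LOCAL, STB_GLOBAL, and SHN_UNDEF.

```c
/* Illustrative constants; values mirror ELF's STB_LOCAL/STB_GLOBAL and SHN_UNDEF. */
enum { BIND_LOCAL = 0, BIND_GLOBAL = 1 };
enum { SECTION_UNDEF = 0 };

/* Simplified symbol entry, following the slide rather than a real ELF header file. */
typedef struct {
    int name;                   /* string table offset */
    int value;                  /* section offset or VM address */
    int size;                   /* object size in bytes */
    unsigned char type : 4,     /* data, func, section or source file name */
                  binding : 4;  /* local or global */
    unsigned char reserved;     /* unused */
    unsigned char section;      /* section index, or SECTION_UNDEF */
} ELF_Symbol;

enum symbol_kind { SYM_LOCAL, SYM_GLOBAL_DEFINED, SYM_GLOBAL_UNDEFINED };

/* Classifies a symbol into one of the three kinds the linker cares about. */
enum symbol_kind classify(const ELF_Symbol *s)
{
    if (s->binding == BIND_LOCAL)
        return SYM_LOCAL;
    return s->section == SECTION_UNDEF ? SYM_GLOBAL_UNDEFINED
                                       : SYM_GLOBAL_DEFINED;
}
```

In the earlier example, f1 in P1.o would classify as a global undefined symbol, globalvar1 in P1.o as a global defined symbol, and a static variable as a local symbol.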
Linker Resolves Unknown Symbols
• During linking, the linker goes through each input object file and determines if unknown symbols are defined in other object files
[Figure: the linker takes the relocatable object files P1.o, P2.o, and P3.o, each with code, data, and .symtab. Function f1() is referenced but not defined in P1.o, hence unknown. Is it defined in P2.o? No. Is it defined in P3.o? Yes.]
Linker Resolves Unknown Symbols
• What if two object files use the same name for a global variable?
– The linker resolves multiply defined global symbols
– Functions and initialized global variables are defined as strong symbols, while uninitialized global variables are weak symbols
Rule 1: multiple strong symbols are not allowed
Rule 2: choose the strong symbol over the weak symbol
Rule 3: given multiple weak symbols, choose any one
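The three rules can be written down as a small decision function. This is only a sketch of the rules as stated above: the names `strength`, `outcome`, and `resolve_duplicate` are invented here, and Rule 3 ("choose any one") is implemented by arbitrarily keeping the first definition. Note also that newer GCC versions default to -fno-common, under which duplicate uninitialized globals are reported as errors rather than merged.

```c
enum strength { WEAK, STRONG };
enum outcome { KEEP_FIRST, KEEP_SECOND, ERROR_MULTIPLE_STRONG };

/*
 * Decides what happens when two object files define the same global name.
 * `first` and `second` are the strengths of the two definitions.
 */
enum outcome resolve_duplicate(enum strength first, enum strength second)
{
    if (first == STRONG && second == STRONG)
        return ERROR_MULTIPLE_STRONG;   /* Rule 1: not allowed */
    if (second == STRONG)
        return KEEP_SECOND;             /* Rule 2: strong beats weak */
    if (first == STRONG)
        return KEEP_FIRST;              /* Rule 2: strong beats weak */
    return KEEP_FIRST;                  /* Rule 3: any one; this sketch keeps the first */
}
```

For example, if one file has `int x = 1;` (strong) and another has `int x;` (weak), Rule 2 applies and both files end up sharing the initialized definition.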
Linker Resolves Unknown Symbols
• Linking with static libraries
– Bundle many related .o files together into a single file called a library or .a file (e.g. the C library libc.a contains printf(), strcpy(), random(), atoi(), etc.; a library is created using the archive tool, ar)
– the library is input to the linker as one file
– the linker can accept multiple libraries
– the linker copies only those object modules in the library that are referenced by the application program
– Example: gcc main.c /usr/lib/libm.a /usr/lib/libc.a
Linker Resolves Unknown Symbols
• A static library is a collection of relocatable object modules
– group together related object modules
– within each object, can further group related functions
– if an application links to libfoo.a, and only calls a function in foo3.o, then only foo3.o will be linked into the program
[Figure: libfoo.a contains foo1.o, foo2.o, foo3.o, and foo4.o.]
Linker Resolves Unknown Symbols
• The linker scans object files and libraries sequentially, left to right on the command line, to resolve unknown symbols
– for each input file on the command line, the linker (1) updates a list of defined symbols with the object's defined symbols, (2) tries to resolve the undefined symbols (from the object and from the list of previously undefined symbols) with the list of previously defined symbols, and (3) carries over the lists of defined and undefined symbols to the next input file
– so the linker looks in a library for a symbol's definition only after the symbol has been recorded as undefined! It doesn't go back over the entire set of input files to resolve the unknown symbol. If a symbol is first referenced after the library that defines it has already been scanned, then the linker won't be able to resolve the symbol!
• Thus, order on the command line is important: put libraries last!
Linker Resolves Unknown Symbols
• Example: gcc libfoo.a main.c
– main.c calls a function f1 defined in libfoo.a
– scanning left to right, when the linker hits libfoo.a, there are no unresolved symbols, so no object modules are copied
– when the linker hits main.c, f1 is unresolved and gets added to the unresolved list
– since there are no more input files, the linker stops and generates a linking error:
/tmp/something.o: In function ‘main’:
/tmp/something.o: undefined reference to ‘f1’
Linker Resolves Unknown Symbols
• Example: gcc main.c libfoo.a
– main.c calls a function f1 defined in libfoo.a
– scanning left to right, when the linker hits main.c, it will add f1 to the list of unresolved references
– when the linker next hits libfoo.a, it will look for f1 in the library's object modules, see that it is found, and add the object module to the linked program
– No errors are generated. A binary executable is generated.
• Lesson #1: the order of linking can be important, so put libraries at the end of command lines
• Lesson #2: an undefined symbol error can also mean that you
– didn't link in the right libraries or didn't add the right library path
– forgot to define the symbol somewhere in your code
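The left-to-right scan and the two command-line orders can be simulated with a small sketch. All types and names here (`struct object`, `struct input`, `struct sym_set`, `link_scan`) are invented for illustration, and the model is simplified: each object lists at most three defined and three referenced names, and a library member is pulled in only if it defines a symbol that is currently unresolved.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32
#define MAX_MEMBERS 8

/* One relocatable object: NULL-terminated lists of symbol names. */
struct object {
    const char *defined[4];
    const char *undefined[4];
};

/* One command-line input: a single object (member_count 1) or a library. */
struct input {
    int is_library;
    const struct object *members;
    int member_count;
};

struct sym_set {
    const char *names[MAX_SYMS];
    int count;
};

static int set_find(const struct sym_set *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return i;
    return -1;
}

static void set_add(struct sym_set *s, const char *name)
{
    if (set_find(s, name) < 0 && s->count < MAX_SYMS)
        s->names[s->count++] = name;
}

static void set_remove(struct sym_set *s, const char *name)
{
    int i = set_find(s, name);
    if (i >= 0)
        s->names[i] = s->names[--s->count];
}

/* Adds one object to the link: record its definitions, then its open references. */
static void add_object(const struct object *o, struct sym_set *def,
                       struct sym_set *undef)
{
    for (int i = 0; i < 4 && o->defined[i] != NULL; i++) {
        set_add(def, o->defined[i]);
        set_remove(undef, o->defined[i]);
    }
    for (int i = 0; i < 4 && o->undefined[i] != NULL; i++)
        if (set_find(def, o->undefined[i]) < 0)
            set_add(undef, o->undefined[i]);
}

static int defines_needed(const struct object *o, const struct sym_set *undef)
{
    for (int i = 0; i < 4 && o->defined[i] != NULL; i++)
        if (set_find(undef, o->defined[i]) >= 0)
            return 1;
    return 0;
}

/* Scans inputs left to right; returns how many symbols remain unresolved. */
int link_scan(const struct input *inputs, int n)
{
    struct sym_set def = {0}, undef = {0};

    for (int i = 0; i < n; i++) {
        const struct input *in = &inputs[i];
        if (!in->is_library) {
            add_object(&in->members[0], &def, &undef);
            continue;
        }
        /* Library: keep pulling in members until none resolves anything new. */
        int included[MAX_MEMBERS] = {0};
        int changed;
        do {
            changed = 0;
            for (int m = 0; m < in->member_count && m < MAX_MEMBERS; m++) {
                if (!included[m] && defines_needed(&in->members[m], &undef)) {
                    add_object(&in->members[m], &def, &undef);
                    included[m] = 1;
                    changed = 1;
                }
            }
        } while (changed);
    }
    return undef.count;
}
```

With a main object that references f1 and a library whose first member defines f1, scanning the library first leaves f1 unresolved, while scanning the main object first resolves it, which matches the two gcc examples above.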
Linker Relocates Addresses
• After resolving symbols, the linker relocates addresses when combining the different object modules
– merges separate code .text sections into a single .text section
– merges separate .data sections into a single .data section
– each section is assigned a memory address
– then each symbol reference in the code and data sections is reassigned to the correct memory address (the linker looks at .rel.text and .rel.data to find the relocation entries of references that need address translation)
– these are virtual memory addresses that are translated at load time into real run-time memory addresses
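The relocation steps can be sketched in a few lines of C. This is a simplified model under stated assumptions: sections are laid out back to back, only absolute 32-bit references are patched (real ELF relocation types also include PC-relative ones), and the names `struct section`, `struct reloc`, `assign_bases`, `symbol_address`, and `apply_reloc` are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* One input section; every object starts out believing it sits at address 0. */
struct section {
    uint32_t size;   /* size in bytes */
    uint32_t base;   /* address assigned by the linker */
};

/* Lays the sections out back to back from `start`; returns the end address. */
uint32_t assign_bases(struct section *secs, size_t n, uint32_t start)
{
    uint32_t addr = start;
    for (size_t i = 0; i < n; i++) {
        secs[i].base = addr;
        addr += secs[i].size;
    }
    return addr;
}

/* Final address of a symbol located `offset` bytes into `sec`. */
uint32_t symbol_address(const struct section *sec, uint32_t offset)
{
    return sec->base + offset;
}

/* "The 32-bit word at patch_offset must hold the address of some symbol." */
struct reloc {
    uint32_t patch_offset;    /* byte offset inside the section, multiple of 4 */
    uint32_t target_address;  /* resolved address of the referenced symbol */
};

/* Patches one absolute reference inside a section's contents. */
void apply_reloc(uint32_t *section_words, const struct reloc *r)
{
    section_words[r->patch_offset / 4] = r->target_address;
}
```

For instance, if two .text sections of 0x40 and 0x20 bytes are merged starting at a hypothetical address 0x1000, a function at offset 0x8 in the second section ends up at 0x1048, and every relocation entry that refers to it is patched with that value.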
Linked ELF Executable Object File
• An ELF executable object file contains the following sections:
– ELF header (type, size, size/# of sections)
– segment header table
– .init (program's entry point, i.e. address of first instruction)
– other sections similar
– Note the absence of .rel.text and .rel.data: they've been relocated!
• Ready to be loaded into memory and run
– only sections through .bss are loaded into memory
– .symtab and below are not loaded into memory
– code section is read-only
– .data and .bss are read/write
[Figure: layout of an ELF executable object file, top to bottom: ELF header, segment header table, .init, .text, .rodata, .data, .bss, .symtab, .debug, .line, .strtab, section header table.]
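The "type" field in the ELF header is what distinguishes a relocatable object file from an executable. Below is a minimal sketch that inspects the first bytes of a file image. The function name `elf_kind_of` and the `elf_kind` enum are made up; the byte offsets and values (magic bytes 0x7f 'E' 'L' 'F', the data-encoding byte at offset 5, the 16-bit type at offset 16 with 1 = relocatable and 2 = executable) follow the ELF format.

```c
#include <stddef.h>

enum elf_kind { ELF_NOT_ELF, ELF_RELOCATABLE, ELF_EXECUTABLE, ELF_OTHER };

/*
 * Classifies a file image from the first bytes of its ELF header.
 * `buf` holds at least `len` bytes read from the start of the file.
 */
enum elf_kind elf_kind_of(const unsigned char *buf, size_t len)
{
    unsigned type;

    if (len < 18)
        return ELF_NOT_ELF;                 /* too short to hold the type field */
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return ELF_NOT_ELF;                 /* magic number mismatch */

    /* Byte 5 gives the data encoding: 1 = little-endian, 2 = big-endian. */
    if (buf[5] == 1)
        type = buf[16] | ((unsigned)buf[17] << 8);
    else if (buf[5] == 2)
        type = ((unsigned)buf[16] << 8) | buf[17];
    else
        return ELF_OTHER;

    if (type == 1)
        return ELF_RELOCATABLE;             /* ET_REL: a .o file */
    if (type == 2)
        return ELF_EXECUTABLE;              /* ET_EXEC: a linked executable */
    return ELF_OTHER;                       /* e.g. shared objects (type 3) */
}
```

One caveat: many current toolchains build position-independent executables by default, which carry type 3 and would be reported as `ELF_OTHER` by this sketch.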