intro to reverse engineering

Intro to Reverse Engineering

~ intropy ~

Why do we reverse engineer?

• Closed source software
  – Vulnerability research
  – Product verification
• Proprietary formats
  – Interoperability
    • SMB on UNIX
    • Word-compatible editors
• Virus research

Why should you give a fuck?

• Basis of computing
  – Reverse engineering teaches the inner workings of any processor
  – Learning how the processor handles data helps in understanding many other aspects of computer security
• All the cool kids are doing it (not really)

Real Time RCE (Debugging)

• Debuggers that disassemble
  – OllyDbg
  – WinDbg
  – SoftICE
• Code actually runs
  – The application executes all instructions as if it were run normally
• Uses interrupts to control execution of the program
  – Swaps out the instruction at the breakpoint address with an interrupt instruction (INT 3, opcode 0xCC, on IA-32)
  – Swaps the original instruction back when execution is continued
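The instruction swap described above can be sketched in Python against a plain byte buffer. This is a toy model under stated assumptions, not how any particular debugger is implemented: `set_breakpoint` and `clear_breakpoint` are hypothetical helper names, and the buffer stands in for the target's code memory.

```python
INT3 = 0xCC  # one-byte x86 breakpoint opcode (INT 3)

def set_breakpoint(code: bytearray, offset: int, saved: dict) -> None:
    """Remember the original byte at offset, then overwrite it with INT 3."""
    saved[offset] = code[offset]
    code[offset] = INT3

def clear_breakpoint(code: bytearray, offset: int, saved: dict) -> None:
    """Put the original byte back so execution can continue normally."""
    code[offset] = saved.pop(offset)

# push ebp; mov ebp, esp; pop ebp; ret
code = bytearray(b"\x55\x8b\xec\x5d\xc3")
saved = {}

set_breakpoint(code, 1, saved)    # break on "mov ebp, esp"
patched = bytes(code)             # what the CPU would now fetch

clear_breakpoint(code, 1, saved)  # swap the real instruction back
restored = bytes(code)
```

A real debugger does the same two writes into the debuggee's address space and catches the resulting breakpoint exception in between.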

Static Analysis (Dead Listing)

• Traditional disassemblers
  – IDA Pro
  – W32Dasm
  – objdump
• Code does not execute
  – The disassembler parses the file format and related code sections
  – Good disassemblers do deep recursive analysis to ensure proper instruction disassembly
• Lets the user look at what code will do without actually running it
• Does not offer the conveniences of live disassembly/debugging
  – Viewing registers
  – Inspecting the contents of memory
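To illustrate what "recursive analysis" buys you, here is a minimal sketch of recursive-descent traversal. It is an assumption-heavy toy: it only understands four one- or two-byte x86 opcodes (nop, ret, jmp rel8, jz rel8), and `recursive_descent` is a hypothetical name, not the algorithm of any named disassembler. The point is that following control flow skips the data byte that a straight linear sweep would try to decode.

```python
import struct

def recursive_descent(code: bytes, entry: int = 0) -> list:
    """Return the sorted offsets of every instruction reachable from entry."""
    seen = set()
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in seen:
            op = code[pc]
            if op == 0x90:                      # nop: fall through
                seen.add(pc)
                pc += 1
            elif op == 0xC3:                    # ret: this path ends
                seen.add(pc)
                break
            elif op in (0xEB, 0x74) and pc + 1 < len(code):
                seen.add(pc)
                rel8 = struct.unpack_from("b", code, pc + 1)[0]
                target = pc + 2 + rel8          # relative to the next instruction
                if op == 0x74:                  # jz: follow both paths
                    worklist.append(target)
                    pc += 2
                else:                           # jmp: follow only the target
                    pc = target
            else:
                break                           # unknown opcode: stop this path
    return sorted(seen)

# offset 0: jmp +1 | offset 2: data byte | offset 3: jz +1
# offset 5: ret    | offset 6: nop       | offset 7: ret
code = bytes([0xEB, 0x01, 0xFF, 0x74, 0x01, 0xC3, 0x90, 0xC3])
reachable = recursive_descent(code)
```

Offset 2 never shows up in `reachable` because no control flow path lands on it.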

File Formats

What are file formats?

• A file format is a defined layout that a file adheres to; executable formats are the ones an operating system can load and run

• Executable files are created from source code and libraries by a compiler and linker

• Data files can be created by anything from a text editor to an mp3 encoder

Executable Contents

• Machine code
  – Instructions the program will run
  – Memory locations
    • Code addresses
    • Function addresses
• Program data
  – Static variables
  – Strings
• Loader data
  – Imports
  – Exports

Sections

• Allow the loader to find the various pieces of information it needs

• Not a fixed set: executables can have user-defined sections

Executable Formats

• ELF – Executable and Linkable Format
  – History
    • Originally published by UNIX System Laboratories as a dynamic, linkable format to be used on various UNIX platforms
  – What uses ELF
    • Linux
    • Solaris
    • Most modern BSD-based UNIXes
  – Dissection
    • Header
    • Sections

ELF Header

• The header contains various information the operating system needs to load the file

e_ident - Contains various identification fields including endianness, ELF version, and operating system
e_type - Identifies the object file type, such as relocatable, executable, or core file
e_machine - Contains the processor type, such as Intel 80386, HPPA, or PowerPC
e_version - Contains the file version information
e_entry - Contains the entry point for the executable
e_phoff - Contains the program header table's offset in bytes
e_shoff - Contains the section header table's offset in bytes
e_flags - Contains the processor specific flags
e_ehsize - Contains the ELF header size in bytes
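As a rough illustration of how these fields sit in the file, the sketch below unpacks them from a 32-bit ELF header with Python's `struct` module. Assumptions: it handles only the 32-bit class, stops at `e_ehsize` (the fields listed above), and `parse_elf32_header` plus the hand-built `sample` bytes are made up for the example rather than taken from a real binary.

```python
import struct

def parse_elf32_header(data: bytes) -> dict:
    """Unpack the ELF32 header fields from e_type through e_ehsize."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 1:                      # e_ident[EI_CLASS]: 1 = 32-bit
        raise ValueError("this sketch only handles 32-bit ELF")
    endian = "<" if data[5] == 1 else ">"  # e_ident[EI_DATA]: 1 = little endian
    names = ("e_type", "e_machine", "e_version", "e_entry",
             "e_phoff", "e_shoff", "e_flags", "e_ehsize")
    # e_ident occupies the first 16 bytes, so the rest starts at offset 16
    values = struct.unpack_from(endian + "HHIIIIIH", data, 16)
    return dict(zip(names, values))

# Synthetic header: 32-bit, little endian, ET_EXEC (2), EM_386 (3)
e_ident = b"\x7fELF" + bytes([1, 1, 1]) + bytes(9)
sample = e_ident + struct.pack("<HHIIIIIH", 2, 3, 1, 0x08048000, 52, 0, 0, 52)

header = parse_elf32_header(sample)
```

On a real file you would pass the first bytes read from disk instead of `sample`.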

ELF Sections

• Each section of an ELF executable contains various information needed to execute

.bss - This section holds uninitialized data that contributes to the program's memory image. By definition, the system initializes the data with zeros when the program begins to run.

.comment - This section holds version control information.

.ctors - This section holds initialized pointers to the C++ constructor functions.

.data - This section holds initialized data that contribute to the program's memory image.

.data1 - This section holds initialized data that contribute to the program's memory image.

.debug - This section holds information for symbolic debugging. The contents are unspecified.

.dtors - This section holds initialized pointers to the C++ destructor functions.

.dynamic - This section holds dynamic linking information.

ELF Sections Cont…

.dynstr - This section holds strings needed for dynamic linking, most commonly the strings that represent the names associated with symbol table entries.

.dynsym - This section holds the dynamic linking symbol table.

.fini - This section holds executable instructions that contribute to the process termination code. When a program exits normally the system arranges to execute the code in this section.

.got - This section holds the global offset table.

.hash - This section holds a symbol hash table.

.init - This section holds executable instructions that contribute to the process initialization code. When a program starts to run the system arranges to execute the code in this section before calling the main program entry point.

.interp - This section holds the pathname of a program interpreter. If the file has a loadable segment that includes the section, the section's attributes will include the SHF_ALLOC bit. Otherwise, that bit will be off.

.line - This section holds line number information for symbolic debugging, which describes the correspondence between the program source and the machine code. The contents are unspecified.

ELF Sections Cont…

.note - This section holds information in the ``Note Section'' format defined in the ELF specification.

.plt - This section holds the procedure linkage table.

.relNAME - This section holds relocation information. By convention, ``NAME'' is supplied by the section to which the relocations apply. Thus a relocation section for .text normally would have the name .rel.text

.rodata - This section holds read-only data that typically contributes to a non-writable segment in the process image.

.rodata1 - This section holds read-only data that typically contributes to a non-writable segment in the process image.

.shstrtab - This section holds section names.

.strtab - This section holds strings, most commonly the strings that represent the names associated with symbol table entries.

.symtab - This section holds a symbol table. If the file has a loadable segment that includes the symbol table, the section's attributes will include the SHF_ALLOC bit. Otherwise the bit will be off.

.text - This section holds the ``text'' or executable instructions, of a program.

Executable Formats Cont…

• PE – Portable Executable
  – History
    • Microsoft migrated to the PE format with the introduction of the Windows NT 3.1 operating system. It is based on a modified form of the UNIX COFF format
  – What uses PE
    • Windows NT
    • Windows 2000
    • Windows XP
    • Windows 2003
    • Windows CE
  – Dissection
    • DOS stub
      – The DOS stub contains a message that the executable will not run in DOS mode
    • Optional header (not optional)
    • RVA
      – Relative virtual address: an offset from the image base address once the file is loaded into memory
    • Sections

Optional Header

• The optional header in a PE executable contains various information regarding the executable contents needed for the OS loader

SizeOfCode - Size of the code (text) section, or the sum of all code sections if there are multiple sections.
AddressOfEntryPoint - RVA of the entry function to start execution from
BaseOfCode - RVA of the start of the code relative to the base address
BaseOfData - RVA of the start of the data relative to the base address
SectionAlignment - Alignment of sections when loaded into memory
FileAlignment - Alignment of sections on disk
SizeOfImage - Size, in bytes, of the image, including all headers; must be a multiple of SectionAlignment
SizeOfHeaders - Combined size of the MS-DOS stub, PE header, and section headers rounded up to a multiple of FileAlignment.
NumberOfRvaAndSizes - Number of data-directory entries in the remainder of the optional header. Each describes a location and size.
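The "multiple of SectionAlignment" and "rounded up to a multiple of FileAlignment" rules are plain round-up arithmetic. A minimal sketch follows; `align_up` is a hypothetical helper name and the alignment and size values are example numbers, not read from a real file.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

FILE_ALIGNMENT = 0x200      # assumed example value
SECTION_ALIGNMENT = 0x1000  # assumed example value

# Raw header bytes end at 0x2A8, so SizeOfHeaders is padded to 0x400
size_of_headers = align_up(0x2A8, FILE_ALIGNMENT)

# An image whose last byte lands at 0x2345 occupies 0x3000 in memory
size_of_image = align_up(0x2345, SECTION_ALIGNMENT)
```

Values that are already aligned come back unchanged, which is why the formula adds `alignment - 1` rather than `alignment`.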

Sections

• The sections in a PE file contain the various pieces of the executable needed to run, including various RVAs and offsets

.text - Contains all executable code

.idata - Contains import data, such as the addresses of functions imported from DLLs

.edata - Contains any exported data

.data - Contains initialized data like global variables and string literals

.bss - Contains uninitialized data

.rsrc - Contains all module resources

.reloc - Contains relocation data for the OS loader
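A common chore when working with RVAs is mapping one back to an offset in the file on disk. The sketch below shows the usual arithmetic under stated assumptions: the `Section` tuple and `rva_to_offset` are hypothetical names, the attributes mirror the VirtualAddress, VirtualSize, and PointerToRawData fields of a PE section header, and the section table values are invented for the example.

```python
from typing import NamedTuple, Optional

class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA where the section is mapped
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # file offset of the section's data

def rva_to_offset(rva: int, sections: list) -> Optional[int]:
    """Translate an RVA to a file offset, or None if no section contains it."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            return rva - s.virtual_address + s.pointer_to_raw_data
    return None

# Invented section table for illustration
sections = [
    Section(".text", 0x1000, 0x800, 0x400),
    Section(".data", 0x2000, 0x200, 0xC00),
]

image_base = 0x400000            # assumed load address
virtual_address = 0x401054       # an address as a debugger would show it
rva = virtual_address - image_base
offset = rva_to_offset(rva, sections)
```

Here the address 0x401054 becomes RVA 0x1054, which falls 0x54 bytes into `.text` and therefore at file offset 0x454.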

Data Formats

• Different from executable formats
  – Don't usually contain machine code
  – Have structure but not always defined sections
• A reverser often needs to reverse how a file format functions
  – Proprietary formats are not always published
  – Reversing allows compatibility (e.g. Microsoft .doc)
• Digital rights management
  – Often the only way to get what you pay for is to take action

Assembly Language

What is it

• Lowest level of programming (besides microcode)

• Direct processor register access utilizing architecture defined instructions

• Output of most compilers

How is it used

• Directly using an assembler
  – NASM
  – ml
  – as
• Output by a high level compiler
  – GCC
  – cl

What does it look like

• Depends on the instruction set
  – IA32
    • mov eax, 0x1
  – PA-RISC
    • copy %r14,%r25
  – ARM
    • LDR r0,[r8]

Instruction Sets

• The mnemonics for the opcodes handled by the processor

• Minimal set of “commands” that achieve a programming goal

Different Instruction Set Architectures

• RISC - Reduced Instruction Set Computing
  – Fixed length 32 bit instructions
  – 32 general purpose registers
  – Vendors
    • IBM (PowerPC)
    • HP (PA-RISC)
    • Apple (PowerPC)
• CISC - Complex Instruction Set Computing
  – Variable length, multibyte instructions
  – Multiple synonymous opcodes
  – 16 registers
  – Vendors
    • Intel (IA-32)
    • DEC (PDP-11)
    • Motorola (m68k)

Registers and the Stack

Overview

• Purpose
  – Registers are used to store temporary data
    • Pointers
    • Computations
  – The stack is used to manage data
    • Local variables
    • Function arguments and return addresses

Stack Layout

• The stack is dynamic: it grows and shrinks as the program runs
• It starts at a higher address and grows toward lower addresses
• On IA32 the stack is generally allocated in 4 byte chunks
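The downward growth is easiest to see with a small model. This is a sketch, assuming 4-byte slots as on IA32; the `Stack` class and its starting address are made up for illustration.

```python
class Stack:
    """Toy IA32-style stack: ESP moves down on push and up on pop."""

    def __init__(self, top: int = 0x0012FFC0):
        self.esp = top      # assumed initial stack pointer
        self.memory = {}    # address -> 32-bit value

    def push(self, value: int) -> None:
        self.esp -= 4                            # make room first
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self) -> int:
        value = self.memory.pop(self.esp)        # read the top slot
        self.esp += 4                            # then release it
        return value

stack = Stack(top=0x1000)
stack.push(0x11111111)
stack.push(0x22222222)
esp_after_pushes = stack.esp    # two slots below the starting address
last_in = stack.pop()           # last value pushed comes off first
esp_after_pop = stack.esp
```

Each push lowers ESP by 4, so the most recently pushed value always lives at the lowest used address.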

Register sizes

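• Register sizes depend on the supported architecture
  – 32 bit
  – 64 bit
• IA32
  – 16 basic program execution registers: the 8 general purpose registers, EFLAGS, and EIP are 32 bits (4 bytes) each; the 6 segment registers are 16 bits each
• RISC
  – Commonly 32 general purpose registers, 64 bits (8 bytes) each on 64 bit implementations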

IA32 Registers

• EBP – Stack frame base pointer
  – Points to the start of the function's stack frame
• ESP – Stack pointer
  – Points to the current (top) location on the stack
• EIP – Instruction pointer
  – Points to the next instruction to execute

IA32 Registers Cont…

• General purpose registers
  – Used in general computation and control flow
  – EAX – Accumulator register
  – EBX – General data register
  – ECX – Counter register
  – EDX – General data register
  – ESI – Source index register
  – EDI – Destination index register
• Segment registers
  – Used to segment memory and compute addresses
  – CS – Code segment register
  – SS – Stack segment register
  – DS – Data segment register
  – ES – Extra (more data) segment register
  – FS – Third data segment register
  – GS – Fourth data segment register
• EFLAGS
  – CF – Carry Flag
  – SF – Sign Flag
  – ZF – Zero Flag

Overview of IA-32 Instruction Set

• mov – Moves (copies) source to destination
• lea – Loads effective address
• jmp – Unconditional jump
  – jne – Jump if not equal
  – jg – Jump if greater than
• call – Unconditional function call
• ret – Returns from a function to the caller
• add – Adds two values
• sub – Subtracts two values
• xor – XORs two values
• cmp – Compares two operands by subtracting them and setting EFLAGS without storing the result
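How cmp feeds a conditional jump can be sketched in a few lines. This is a simplified model, assuming 32-bit operands and tracking only the three flags named earlier (CF, SF, ZF); `cmp32` and `jne_taken` are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def cmp32(a: int, b: int) -> dict:
    """Model cmp a, b: compute a - b, keep only the flags, discard the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),   # set when the operands are equal
        "SF": result >> 31,       # top bit of the 32-bit result
        "CF": int(a < b),         # unsigned borrow
    }

def jne_taken(flags: dict) -> bool:
    """jne (also spelled jnz) jumps when ZF is clear."""
    return flags["ZF"] == 0

equal = cmp32(0x111, 0x111)
smaller = cmp32(1, 2)
```

Neither operand is modified, which is the only difference between cmp and sub.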

Calling conventions

Calling conventions define how the caller's data is arranged on the stack and who cleans it up

• cdecl
  – Most common calling convention
  – Supports a variable number of parameters
  – Caller unwinds stack
    • pop ebp
    • ret
• fastcall
  – Higher performance
  – First two parameters are passed in registers
• stdcall
  – Common in Windows
  – Parameters are pushed in reverse (right to left) order
  – Function unwinds stack
    • ret 0x10
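The difference in who unwinds the stack can be followed by tracking ESP through a call. The sketch below assumes 4-byte arguments and a 4-byte return address; `call_sequence` is a hypothetical name and the starting ESP is arbitrary.

```python
def call_sequence(esp: int, arg_count: int, convention: str) -> tuple:
    """Return (ESP right after the callee returns, ESP after all cleanup)."""
    arg_bytes = 4 * arg_count
    esp -= arg_bytes              # caller pushes the arguments
    esp -= 4                      # call pushes the return address
    if convention == "stdcall":
        esp += 4 + arg_bytes      # callee: "ret N" pops return address and N bytes
        after_ret = esp
    elif convention == "cdecl":
        esp += 4                  # callee: plain "ret" pops only the return address
        after_ret = esp
        esp += arg_bytes          # caller: "add esp, N" removes the arguments
    else:
        raise ValueError("unknown convention: " + convention)
    return after_ret, esp

# Four 4-byte arguments, i.e. 16 (0x10) bytes on the stack
stdcall_result = call_sequence(0x1000, 4, "stdcall")
cdecl_result = call_sequence(0x1000, 4, "cdecl")
```

Both conventions end with ESP back where it started; with cdecl the arguments are still on the stack at the moment the callee returns.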

Example

PUSH EBP                       ; Push the contents of EBP onto the stack
MOV EBP, ESP                   ; Copy the value of ESP into EBP
CMP DWORD PTR [EBP+C], 111     ; Compare what is at EBP+0xC (12) with 0x111
JNZ 00401054                   ; If they are not equal, jump to 00401054
MOV EAX, DWORD PTR [EBP+10]    ; Move what is at EBP+0x10 (16) to EAX
CMP AX, 64                     ; Compare the low 16 bits of EAX with 0x64
JNZ 00401068                   ; If they are not equal, jump to 00401068
POP EBP                        ; Pop the value on top of the stack into EBP
RET                            ; Return to the caller
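Read as higher level logic, the listing above is two nested comparisons. The Python below is one plausible reading, not recovered source: it assumes a standard EBP frame where [EBP+8] is the first argument, so [EBP+0xC] and [EBP+0x10] are the second and third, and it treats the debugger's operands as hex. The function and parameter names are invented.

```python
def example_function(second_arg: int, third_arg: int) -> str:
    """Mirror the branch structure of the disassembly."""
    if second_arg != 0x111:              # CMP DWORD PTR [EBP+C], 111 / JNZ
        return "jump to 00401054"
    if (third_arg & 0xFFFF) != 0x64:     # CMP AX, 64 only looks at the low 16 bits
        return "jump to 00401068"
    return "return to caller"            # POP EBP / RET

both_match = example_function(0x111, 0x00010064)
first_fails = example_function(0x110, 0x64)
second_fails = example_function(0x111, 0x65)
```

Note that the upper 16 bits of the third argument are ignored, which is easy to miss when skimming the assembly.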

OllyDbg

Overview

• Purpose
  – OllyDbg is a general purpose win32 user land debugger. The great thing about it is the intuitive UI and powerful disassembler
• Licensing
  – OllyDbg is free (shareware); however, it is not open source and the source code is not available
• Extensibility
  – OllyDbg has a defined plugin architecture, allowing extensibility via powerful plugins

Window Layouts

• Window layouts are the various parts of the UI that contain pertinent information
  – Code window – Displays the disassembled machine code
  – Register window – Allows the user to watch the contents of each register during execution
  – Memory window – Allows the user to view the contents of various memory locations
  – Stack window – Displays the stack, including memory addresses and values

Working in OllyDbg

• Navigation
  – Moving
  – Searching
• Commenting
  – Can be entered in the code window with the ; or : keys
• Listing Names
  – The names window displays all functions or imported functions used in the program
  – Listing them is easy via the shortcut Ctrl + N
• Showing Memory
  – Displaying memory can be useful when looking for strings or other important data
  – Displaying the memory map window can be achieved via Alt + M

Working in OllyDbg Cont…

• Breakpoints
  – Breakpoints allow the debugger to stop at a specified address or instruction
  – There are two types of breakpoints in general
    • Software breakpoints
      – Handled by the operating system
      – Set by navigating to the specified address and hitting F2
    • Hardware breakpoints
      – Handled by the processor
      – Set by finding a place in memory you want to break on access, right clicking, and selecting the proper option
  – Olly also provides a way to view and toggle breakpoints via the breakpoints window with Alt + B


• Controlling Execution
  – Starting the process
    • Once the target program is either loaded or attached in Olly you can start execution. This will actually set up an initial breakpoint at the application entry point
  – There are several ways you can proceed from the entry point
    • Single stepping
      – Executes one instruction at a time and can be achieved by hitting F7
      – Steps into every function
      – Tedious as fuck
    • Execute until return
      – Executes until the ret instruction is encountered, which can be achieved by hitting Ctrl + F9
      – Executes all remaining instructions in the current function
      – Faster than single stepping but not as comprehensive


• Watching execution
  – Registers
    • Handled in the register window
    • Red highlighting indicates a register has changed
  – Stack
    • Handled in the stack window
    • Display can be address or relative address from ebp
• Call stack
  – Displays the chain of functions that led to the current function
  – Can be displayed with the shortcut Alt + K

OllyDbg Case Study* (smarty word for demo)

• Example
  – Program displays a popup box
  – Goal is to make the proper box show and exit
• Patching
  – Allows us to modify the executable assembly code and save it to a new file with the changes

OllyDbg Plugins

• OllyDbg provides a downloadable PDK for plugin development

• Several plugins exist that provide extra usability
  – Heap Vis
  – Breakpoint manager
  – OllyScript

IDA Pro

Overview

• IDA Pro was originally designed as a powerful disassembler
• Supports 30+ processors
• It has since been broadened to include a built in debugger
• Designed for reverse engineers with quickness and robustness in mind
  – This sometimes makes the learning curve steep
• Extensible plugin architecture and scripting language

Window Layouts

• Customizing window layouts
  – Each saved session will store any customized layouts
  – A default layout can also be saved
  – Customized layouts are provided to help the user with workflow and can consist of any combination or number of windows

Navigation

• Shortcuts
  – Most actions have equivalent shortcuts associated with them
  – Some of the most used
    • [Enter] – Jumps into the function under the cursor
    • [Esc] – Returns to the previous cursor position
• Jumping
  – IDA allows the user to jump to various parts of a binary file easily
  – Some of the jumps
    • Entry point – Jumps to the entry point of the binary
    • By name – Allows the user to jump to a specific function or string in the binary
    • By address – Allows the user to jump to a specific address
• Markers
  – Markers can be used to tag locations in the binary for future reference
  – Markers are set using Alt + M and naming
  – Jumping to a marker is easily achieved with Ctrl + M

Editing

• Comments
  – Comments allow you to organize and document important parts of the binary
  – Comments can be entered using the shortcut keys ; or :
• Functions can be renamed to something more descriptive
  – Often symbols are not available for the binary, and naming each function lets you understand and track your work
  – Functions can be renamed using the shortcut Alt + P

Windows

• IDA View
  – Displays the disassembled binary
• Hex View
  – Displays the hex view of the current cursor position
• Names
  – The names window displays textual names and addresses in the binary
• Strings
  – The strings window contains any ASCII strings present in the executable
• Imports
  – The imports window contains the functions imported from DLLs
• Functions
  – The functions window allows you to view all functions and their addresses

Graphing

• IDA Pro has a powerful graphing engine that allows a user to visualize call graphs and xrefs
  – Flow chart graphs display the current function's machine code and any branches
  – Function call graphs display the call flow of all the functions in the executable (can be large)
  – Xref graphs display the to and from xrefs with machine code

SDK/Plugins

• The SDK allows the user to develop plugins for use in IDA Pro

• Plugins are generally written in C/C++ and compiled against the SDK libraries and headers

• Using the SDK you can write
  – Processor modules
  – Input processing modules
  – Plugin modules
• Some good plugins
  – x86emu – Allows IDA to do runtime emulation
  – IDAPython – Access the IDA API in Python
  – Process Stalker – Allows visualization and run time tracing

Flirt

• Fast Library Identification and Recognition Technology

• Flirt is a means for IDA Pro to identify library functions and compilers by matching against a database of known signatures

• This greatly speeds up analysis by automatically naming discovered functions

• Only works with C/C++ functions
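The core idea of signature matching can be sketched as byte patterns with wildcards for the bytes that vary between builds (call targets, addresses). This is a deliberate simplification and not IDA's actual signature format or algorithm; `matches`, `identify`, and the signature name are all hypothetical.

```python
def matches(code: bytes, offset: int, pattern: list) -> bool:
    """True if pattern fits at offset; None in the pattern matches any byte."""
    if offset + len(pattern) > len(code):
        return False
    return all(p is None or code[offset + i] == p
               for i, p in enumerate(pattern))

def identify(code: bytes, signatures: dict) -> dict:
    """Map the first matching offset of each signature to its name."""
    found = {}
    for name, pattern in signatures.items():
        for offset in range(len(code) - len(pattern) + 1):
            if matches(code, offset, pattern):
                found[offset] = name
                break
    return found

# push ebp; mov ebp, esp; call <any rel32>; pop ebp; ret
signatures = {
    "hypothetical_lib_func": [0x55, 0x8B, 0xEC, 0xE8,
                              None, None, None, None, 0x5D, 0xC3],
}

# A nop followed by a function whose call target differs per build
code = bytes([0x90, 0x55, 0x8B, 0xEC, 0xE8, 0x10, 0x00, 0x00, 0x00, 0x5D, 0xC3])
names = identify(code, signatures)
```

The four wildcard bytes are what let one signature match the same library function at different link addresses.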

IDC Scripting

• The IDC scripting engine allows the user to automate small tasks

• IDC resembles C and has many helpful functions built in
  – PatchByte
  – Comment
  – FindCode

Decompiling

Overview

• Decompiling is different from disassembling in that it tries to reconstruct readable (and ultimately compilable) source code from machine code
  – Native compiled code is difficult to reconstruct because of the compiler's behavior when optimizing the produced code
  – Virtual machine code is much easier to turn back into readable code because of its nature. It must be compiled into an intermediate language with all necessary information the target platform may need to run
    • .Net
    • Java

.Net

• .Net is compiled down into MSIL (Microsoft Intermediate Language) and is a good candidate for decompiling

• .Net must provide the runtime with a wealth of information, including symbol names and data structures

Native code

• Native code is source code that has been compiled down into machine language

• Often, because of optimization, a compiler inadvertently obfuscates the higher level source code

• Native decompiling is not yet at the point of producing a good representation of the original source code

Decompilers

• .Net
  – ILDasm
  – Remotesoft Salamander
  – Reflector for .Net
• Java
  – JODE
  – JAD (Disappeared)
• Native
  – Boomerang

Decompilation Demo

Thanks fend3r!

Conclusion

• Reverse engineering is a vast and complex world

• With a lot of practice though it becomes much easier

• A good reverser knows their tools inside and out

• Workflow and organization are the keys to reversing

Shirt Quiz

• Name the IA-32 registers
• What does .Net assemble into
• In OllyDbg how do you list the Names
• What is the IA-32 instruction to compare two integers
• How does the IA-32 processor handle signedness
• What does the IDC scripting language resemble
• How many processors does IDA support (roughly)
• In IDA how do you quickly follow a CALL

References

• Reversing - http://www.wiley.com/WileyCDA/WileyTitle/productCd-0764574817.html
• ELF File Format - http://www.skyfree.org/linux/references/ELF_Format.pdf
• PE File Format - http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dndebug/html/msdn_peeringpe.asp
• http://lsd-pl.net/references.html
• OllyDbg - http://ollydbg.de/
• OllyDbg Plugins - http://ollydbg.win32asmcommunity.net/stuph/
• IDA Pro - http://www.datarescue.com/
• IDC - http://www.datarescue.com/idadoc/707.htm
• IDA Plugins - http://home.arcor.de/idapalace/
• Reflector - http://www.aisto.com/roeder/dotnet/
• JODE - http://jode.sourceforge.net/
• Boomerang - http://boomerang.sourceforge.net/
• Crackmes.de - http://www.crackmes.de/


Fucking done.

Questions?
