
Self Destructive Tamper Response for Software Protection

Kazuomi Oishi and Tsutomu Matsumoto
Graduate School and Research Institute of Environment and Information Sciences
Yokohama National University
79-7 Tokiwadai, Hodogaya-ku, Yokohama 240-8501, Japan

[email protected], [email protected]

ABSTRACT
A method of creating tamper resistant software that is resistant to unauthorized modification is proposed. It utilizes a primitive that combines self-modifying-based instruction camouflage with self integrity verification, and a method to construct a structure in which multiple primitives are interlocked with each other. Tamper resistant software created by the proposed method contains multiple camouflaged instructions in the object program, so that it is difficult for an attacker to correctly understand the processing using static analysis. When an attacker attempts dynamic analysis, anti-debugging techniques prevent the attempt. At runtime, the tamper resistant software continuously detects and prevents dynamic analysis, verifies its own integrity, and self-modifies itself in such a way that the target of self-modification is dynamically determined by the result of self integrity verification. If unauthorized modification is detected, the software self-modifies a part of an instruction that differs from the camouflaged instruction that should have been self-modified, and consequently executes instructions different from the original ones. As a result, it generates a series of unpredictable, abnormal, self destructive behaviors such as errors or termination, which strongly disturbs the attacker's analysis and modification. The cost of analysis increases with the numbers of self integrity verifications and camouflaged instructions; hence, the tamper resistance can be strengthened quantitatively.

Categories and Subject Descriptors
D.m [Software]: Miscellaneous—Software protection; K.6.5 [Management of Computing and Information Systems]: Security and Protection

General Terms
Security

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ASIACCS '11, March 22–24, 2011, Hong Kong, China.
Copyright 2011 ACM 978-1-4503-0564-8/11/03 ...$10.00.

Keywords
Instruction camouflage, self destructive tamper response, self integrity verification, self-modifying, tamper resistance

1. INTRODUCTION
Many security functionalities are implemented as software, and a huge number of users run such software programs on PCs, cellular phones, etc. To assure security, unauthorized observation and modification of the implemented data or algorithms must be difficult even when malicious users analyze and attack them. Software with these properties is called tamper resistant software, and its realization has been researched [1, 14].

Two conventional approaches to binary program analysis are static analysis and dynamic analysis. Static analysis does not execute the target program; it examines the target's code using tools such as disassemblers. Static analysis can examine the whole code of the target but cannot observe its runtime behavior. Dynamic analysis executes the target program and examines its runtime behavior using tools such as debuggers, virtual machines, and emulators. Dynamic analysis can examine the runtime behavior of the target in detail but cannot obtain information on parts of the code that are not executed. Thus, both analyses are often used together.

As countermeasures against static analysis, various obfuscation techniques have been proposed, e.g., obfuscation of control flow [3], instruction camouflage [9], protection of variables using encoding [4], and obfuscation of executables that improves resistance to static disassembly [10]. However, for most of them it is difficult to evaluate their effectiveness, and dynamic analysis is outside their scope.

Anti-debugging can be a countermeasure against dynamic analysis. Various anti-debugging tricks circumvent dynamic analysis by detecting or thwarting dynamic analysis tools [5]. Such anti-debugging techniques are usually implemented in the program to be analyzed, as anti-debugging routines, and they work at runtime. Many concrete examples of anti-debugging routines and their classification can be found in [13]. Anti-debugging routines need a mechanism to protect themselves from modification, but such a mechanism is not necessarily strong enough [11].

Self integrity verification has been proposed as a countermeasure against modification [2, 7]. A self integrity verification program includes verification routines (called guards or testers) and periodically invokes them during execution, so that the program verifies by itself at runtime whether it has been modified. Networks of interconnected verification routines are formed, so that all or almost all of the interconnected verification routines must be disabled at the same time for a modification to succeed; this increases the cost of attack.

However, a concrete method to create them and a concrete implementation are not clearly exhibited.

In this paper, we propose a method to create tamper resistant software that is resistant to unauthorized modification. It includes an improvement on the method of [12]. The proposed method consists of anti-debugging techniques that detect and hinder dynamic analysis, a primitive that combines instruction camouflage and self integrity verification, and a construction method that creates a structure in which multiple primitives are interlocked with each other. The proposed method improves and combines existing technologies in such a way that their respective weaknesses are canceled out and stronger tamper resistance becomes achievable. The proposed primitive is a novel integration of self-modifying-based instruction camouflage and self integrity verification, and it generates an unpredictable self destructive tamper response that varies from modification to modification. The specification and implementation of the proposed method are clearly exhibited, and the created tamper resistant software is resistant to both static and dynamic analyses.

In section 2, the proposed method is explained. In section 3, features of the proposed method are discussed. In section 4, a conclusion is given.

2. PROPOSED METHOD

2.1 Procedure of the proposed method
The proposed method utilizes a primitive and a construction method. The proposed primitive is called dynamic self-modifying based self destructive tamper response. The proposed construction method creates a structure in which multiple primitives are interlocked with each other; the structure is called interlocking self integrity verification. The program to which the proposed method is applied is called the host program.

Input: Host program P that is to be made tamper resistant.
Output: Tamper resistant object program Q in machine language.

(step-1) Add anti-debugging routines
Add anti-debugging routines and a hash function onto P. Any implementation method can be used (e.g., subroutines or inlining). For example, an anti-debugging routine, which implements a debugger-detection task and a termination task, is added repeatedly from P's beginning to its end so that it is executed periodically as P runs. If P is a source program in a high-level programming language such as C, then compile P with the added routines and use the obtained assembly program (see footnote 1) as Pasm in step-2.
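As an illustration of step-1, the following is a minimal sketch of one well-known anti-debugging trick on Linux: a process that is already being traced cannot attach to itself with ptrace(PTRACE_TRACEME), so a failing call suggests a debugger is present. The routine name and the choice to terminate immediately are illustrative assumptions, not part of the paper's specification.

    #include <stdlib.h>
    #include <sys/ptrace.h>

    /* Hypothetical anti-debugging routine: terminate if a tracer is attached.
       On Linux, ptrace(PTRACE_TRACEME) fails when the process is already being traced. */
    static void antidebug_check(void)
    {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1) {
            /* A debugger (or other tracer) appears to be attached: terminate. */
            exit(1);
        }
    }

In the proposed method, calls to such a routine would be scattered throughout P so that the check runs periodically during execution.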

(step-2) Specify camouflaged instructions
Select instructions in Pasm that are to be camouflaged and decide the corresponding camouflaged instructions. Any pair of a to-be-camouflaged instruction and a camouflaged instruction can be specified, or the program developer may directly specify pairs depending on the contents of Pasm. Suppose n pairs are specified.

Footnote 1: In this paper, an Intel x86 CPU is used as the example, and programs in assembly language use AT&T syntax.

(step-3) Execute the algorithm for tamper resistance
Execute the algorithm for tamper resistance, which transforms the host program into a tamper resistant program. The algorithm takes Pasm and the n pairs of instructions as inputs and outputs a tamper resistant object program Q in machine language.

2.2 Algorithm for tamper resistance

2.2.1 Terminologies
Let smaddress denote the target address of self-modification, base denote an address used for indirectly calculating smaddress, and hv denote the hash value of a protected area. Since the three variables smaddress, base, and hv can take arbitrary values, a variable mask is introduced to relate them. The relationship among the four variables is defined, for example, using a function h, addition, and subtraction, as follows.

smaddress − base = h(hv) + mask    (1)

The function h takes the output of the hash function as input and outputs a value whose bit length is less than or equal to the bit length used for addressing (e.g., a function that takes a 256-bit hv as input, divides it into 32 blocks, and outputs an 8-bit block). Hereafter, for concise explanation, we suppose that the bit length of hv is less than or equal to the bit length used for addressing and that h(x) = x. The value of mask is determined after the value of hv and the relative distance between smaddress and base are determined.
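To make equation (1) concrete, the following is a minimal sketch assuming 32-bit addressing as in the paper's running example. The helper names (fold_hash, compute_mask, compute_smaddress) are illustrative, not part of the paper's tooling: at build time the mask is solved from the known hv and the chosen base and smaddress; at runtime only base, mask, and the freshly computed hv are combined to recover smaddress.

    #include <stdint.h>

    /* Illustrative h(): fold a 256-bit hash (32 bytes) down to the address width.
       The paper allows any h whose output fits the bit length for addressing;
       here we simply take one 8-bit block. */
    static uint32_t fold_hash(const uint8_t hash[32])
    {
        return hash[0];
    }

    /* Build time: choose mask so that equation (1) holds for the correct hv. */
    static uint32_t compute_mask(uint32_t smaddress, uint32_t base, uint32_t hv)
    {
        return smaddress - base - hv;   /* mask = (smaddress - base) - h(hv) */
    }

    /* Runtime: the routine recomputes hv over the protected area and derives the
       self-modification target; a wrong hv silently yields a wrong target address. */
    static uint32_t compute_smaddress(uint32_t base, uint32_t hv, uint32_t mask)
    {
        return base + hv + mask;
    }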

An area of the object program that contains instruction sequences (code) is referred to as the code area. In the code area, the n camouflaged instructions are called camouflaged instruction 1, camouflaged instruction 2, . . ., camouflaged instruction n in ascending order of address (i.e., the address of camouflaged instruction i is lower than the address of camouflaged instruction (i + 1)). The Restoring Routine and the Hiding Routine that self-modify camouflaged instruction i (1 ≤ i ≤ n) and its corresponding original instruction i are called Restoring Routine i and Hiding Routine i, respectively.

2.2.2 Algorithm
We explain an algorithm that creates interlocking self integrity verification by applying dynamic self-modifying based self destructive tamper response n (3 ≤ n) times. Figure 1 is used as an example with n = 3.

[Figure 1: An example of tamper resistant software created by the proposed method]

Input: Assembly program Pasm that is to be made tamper resistant, and n pairs of a to-be-camouflaged instruction and a camouflaged instruction.
Output: Tamper resistant object program Q in machine language.

(step-1) Replace with camouflaged instructions
Replace every to-be-camouflaged instruction in Pasm with the corresponding camouflaged instruction.

(step-2) Generate Restoring and Hiding Routines
Generate n Restoring Routines and n Hiding Routines. For each Restoring Routine and each Hiding Routine, assign temporary values to the start address variable and the length variable of the protected area and to the mask variable. We explain the details of code generation using Restoring Routine 2 of Figure 1 as an example.

(step-2.1) Generate code that calculates the hash value hv by calling the hash function routine _hash.

The specification is that the start address and the length of the protected area are specified in registers eax and ebx, respectively. As these values are not yet determined at code generation time, a temporary value that occupies the maximum possible variable size is specified. For example, if the bit length for addressing is 32, then the value 0x12345678, which occupies 32 bits, can be used as the temporary value.

    movl $0x12345678, %ebx
    movl $0x12345678, %eax
    movl %ebx, 4(%esp)
    movl %eax, (%esp)
    call _hash
    movl %eax, %edx

In this code, the hash value hv is stored in register edx.

(step-2.2) Generate code that calculates smaddress = base + hv + mask.

    movl $L12, %eax
    addl %edx, %eax
    subl $0x12345678, %eax

In this code, a label L12 of the host program is used as base. As the mask value is not determined yet, a temporary value is specified. Considering that the mask value may be negative, subl is used. Note that smaddress is calculated in register eax, and neither the hash value nor the address of the instruction to be self-modified appears as immediate data in the program.

(step-2.3) Generate code that overwrites the data at smaddress with the to-be-camouflaged (original) instruction in machine language. As the difference between to-be-camouflaged instruction 2 (je, 0x74 in machine language) and camouflaged instruction 2 (jne, 0x75 in machine language) is the first byte, generate the following code.

    movb $0x74, (%eax)

The code generated in step-2.1, step-2.2, and step-2.3 constitutes Restoring Routine 2.
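Putting steps 2.1 to 2.3 together, the following is a rough C analogue of what the generated Restoring Routine 2 does at runtime. It is only a sketch under simplifying assumptions: hash() stands in for the _hash routine, 32-bit wraparound arithmetic is assumed as in the assembly, and the code pages are assumed writable (on most modern systems an extra step such as mprotect() would be needed before writing into the code area).

    #include <stdint.h>

    extern uint32_t hash(const uint8_t *start, uint32_t len);  /* stands in for _hash */

    /* Sketch of Restoring Routine 2: recompute hv over the protected area,
       derive the self-modification target from base, hv, and mask (equation (1)),
       and write back the original first byte of instruction 2 (je = 0x74). */
    void restoring_routine_2(const uint8_t *area_start, uint32_t area_len,
                             uint8_t *base, uint32_t mask)
    {
        uint32_t hv = hash(area_start, area_len);
        int32_t offset = (int32_t)(hv + mask);   /* 32-bit wraparound, as in the asm */
        uint8_t *smaddress = base + offset;      /* wrong hv => wrong target address */
        *smaddress = 0x74;                       /* restore 'je' over 'jne'          */
    }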

Repeating similar code generation, generate n Restoring Routines and n Hiding Routines. In order for a Hiding Routine not to coincide with its corresponding Restoring Routine, each should be generated with, for example, a different mask definition equation, label, or instruction.

Note: base should be chosen from labels defined in the host program, or should be a label defined at a random address of the host program. If the label is placed immediately before a to-be-camouflaged instruction, then its address appears as an immediate value in the machine-language program, which can be a clue for an attacker to find a camouflaged instruction. Thus, such a label is never used.

(step-3) Insert Restoring and Hiding Routines
Based on the control flow of Pasm, determine the positions of the Restoring Routines and Hiding Routines. The procedure for choosing positions of [9] can be used. Then, insert the Restoring Routines and Hiding Routines into their respective positions.

When Restoring Routine i and Hiding Routine i are inserted into their respective positions according to the procedure, it is guaranteed that camouflaged instruction i is rewritten as its original instruction i before it is executed, and is rewritten again as camouflaged instruction i before the program ends. As a result of position determination, the i-th Restoring Routine of the code area in ascending order may not always be Restoring Routine i, and the same holds for the Hiding Routines. Restoring Routines, camouflaged instructions, and Hiding Routines may each be consecutive. Suppose the arrangement of Figure 1.

(step-4) Set start address and length of protected area

(step-4.1) Select a Restoring Routine at random. Suppose the j-th Restoring Routine in Figure 1 with j = 2 is selected (see footnote 2).

Footnote 2: If the first Restoring Routine from the top is selected, then the area above the mask of that Restoring Routine is not protected, so it should not be selected. If the n-th (the first from the bottom) Restoring Routine is selected, then the area beneath the mask of the Hiding Routine under the n-th Restoring Routine is not protected.

(step-4.2) Divide the code area into n areas. The areas are referred to as the i-th area (i = 1, 2, . . . , n) from top to bottom in Figure 1. Divide the code area to satisfy the following conditions, except for the (j−1)-th, j-th, and (j+1)-th areas.
[Divide condition 1] Each area includes only one entire Restoring Routine (in order to form the cycle of paths described in divide condition 3 below).

[Divide condition 2] An area includes zero or more entire Hiding Routines.
[Divide condition 3] Determine the boundaries of the areas so that the areas are adjacent to one another and have no overlap, divide conditions 1 and 2 are satisfied, and the number of areas that include a camouflaged instruction becomes as large as possible.
As to the (j−1)-th area, the boundary between the (j−2)-th area and it can be defined arbitrarily; it is divided so that it includes the entire (j−1)-th Restoring Routine and the part of the j-th Restoring Routine that does not contain its mask or the area below the mask.
As to the j-th and (j+1)-th areas, the condition differs in the following two cases.
[Case 1] If there is one or more Hiding Routines between the j-th Restoring Routine and the (j+1)-th Restoring Routine, then, letting Hiding Routine α be the uppermost of them, the areas are divided so that the j-th area lies between immediately below the mask of the j-th Restoring Routine and immediately above the mask of Hiding Routine α, and the (j+1)-th area, including the (j+1)-th Restoring Routine, lies between immediately below the mask of Hiding Routine α and immediately above the (j+2)-th area. Figure 1 corresponds to this case with α = 2.
[Case 2] If there is no Hiding Routine between the j-th Restoring Routine and the (j+1)-th Restoring Routine, then the areas are divided so that the j-th area lies between immediately below the mask of the j-th Restoring Routine and immediately above the (j+1)-th area (the boundary between them is arbitrarily defined), the (j+1)-th area includes the entire (j+1)-th Restoring Routine, and divide conditions 1, 2, and 3 are satisfied.

(step-4.3) Determine a structure of networks with a cycle of paths so that the following arrangement conditions are satisfied. The Restoring Routines and Hiding Routines included in an area are collectively referred to as a self-modifying routine set. We define that the j-th Restoring Routine is included in the self-modifying routine set of the (j−1)-th area and that Hiding Routine α is included in the self-modifying routine set of the j-th area.
[Arrangement condition 1] Except for the (j−1)-th, j-th, and (j+1)-th areas, the Restoring Routines and Hiding Routines included in a self-modifying routine set have the same protected area.
[Arrangement condition 2] The protected area of the (j−1)-th Restoring Routine and that of the j-th Restoring Routine are different. When Hiding Routine α is included in the j-th area, the j-th Restoring Routine and Hiding Routine α have the same protected area, and the protected area of Hiding Routine α and that of the (j+1)-th Restoring Routine are different.
[Arrangement condition 3] Except for the (j−1)-th, j-th, and (j+1)-th areas, each of the n areas is assigned as the protected area of the self-modifying routine set of an area different from itself, the protected area of each self-modifying routine set is an area that does not include the self-modifying routine set itself, and the n areas constitute a cycle of paths where a path starts from an area (precisely, a Restoring Routine or a Hiding Routine in the area) and ends at another area (the protected area of that Restoring Routine or Hiding Routine) (see footnote 3).

Footnote 3: For example, as shown in Figure 1, Restoring Routine 1 included in the 1st area protects the 2nd area, Restoring Routine 2 included in the 2nd area protects the 3rd area, . . ., and the Restoring Routine included in the n-th area protects the 1st area.

(step-4.4) Assemble Pasm and obtain the object program Q in machine language, disassemble Q, and determine the start address and length of each area. The temporary values stored in the start address variables and length variables are replaced with the corresponding determined correct values.

(step-5) Determine and set mask values
For each Restoring Routine and each Hiding Routine of Q, calculate the hash value, then calculate the mask value using the hash value, and replace the temporary value with the mask value, in the order of the following steps. A hash value is calculated by taking as input the machine-language object program of its protected area as fixed in step-4. In the 1st step, calculate the hash values and mask values of the self-modifying routine set whose protected area is the j-th area, and replace the temporary values with these mask values. In the 2nd step, calculate the hash values and mask values of the self-modifying routine set whose protected area includes the self-modifying routine set whose mask values were determined in the 1st step, and replace the temporary values with these mask values. In the 3rd step, calculate the hash values and mask values of the self-modifying routine set whose protected area includes the self-modifying routine set whose mask values were determined in the 2nd step, and replace the temporary values with these mask values. Repeat the above procedure. In the n-th step, calculate the hash values and mask values of the j-th self-modifying routine set (its Restoring Routines and Hiding Routine α, if any), and replace the temporary values with these mask values.

If the camouflaged instructions included in a protected area are restored to the original instructions at the moment a Restoring Routine or Hiding Routine calculates its hash value at runtime, then the machine codes of the camouflaged instructions in the protected area are replaced with the machine codes of the original instructions, and the hash value is calculated taking the replaced protected area as input. How to decide whether the camouflaged instructions in a protected area are restored is as follows. Let Ri denote a Restoring Routine or a Hiding Routine of a self-modifying routine set, let camouflaged instruction k denote a camouflaged instruction included in the protected area of Ri, and let RRk and HRk denote the Restoring Routine and the Hiding Routine that self-modify camouflaged instruction k and its corresponding original instruction k, respectively. Compare the execution order of Ri, RRk, and HRk. Let thash(Ri) and tsm(Ri) denote the times when the hash function routine and the self-modifying instruction of Ri are executed, respectively. If tsm(RRk) < thash(Ri) < tsm(HRk) holds, then camouflaged instruction k is restored to the original instruction k; otherwise camouflaged instruction k is not restored. Compare the execution order of Ri, RRk, and HRk for all i and all k, determine the binary data of each protected area, and calculate the hash values. If the execution order changes at runtime depending on the execution status, then return to step-3, select different positions for Restoring Routine k and Hiding Routine k, and repeat the algorithm. If the execution order still changes even after different positions are tried, then return to step-1, change the to-be-camouflaged instruction k to a different instruction, and repeat the algorithm.
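The decision rule above can be stated compactly as a predicate. The following sketch uses illustrative parameter names (not from the paper) and returns whether camouflaged instruction k should be treated as restored in the snapshot hashed by routine Ri.

    #include <stdbool.h>

    /* t_sm_RRk:  time the self-modifying write of Restoring Routine k executes
       t_hash_Ri: time routine Ri computes its hash over the protected area
       t_sm_HRk:  time the self-modifying write of Hiding Routine k executes   */
    static bool instruction_k_is_restored(unsigned t_sm_RRk,
                                          unsigned t_hash_Ri,
                                          unsigned t_sm_HRk)
    {
        /* Restored exactly when the hash is taken after restoring and before hiding. */
        return t_sm_RRk < t_hash_Ri && t_hash_Ri < t_sm_HRk;
    }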

In the example of Figure 1 where j = 2, the self-modifying routine set whose protected area is the 2nd area is Restoring Routine 1. The self-modifying routine set whose protected area is the 1st area, which includes Restoring Routine 1, consists of Hiding Routine 1, Restoring Routine 3, and Hiding Routine 3.

The self-modifying routine set whose protected area is the 3rd area, which includes Hiding Routine 1, Restoring Routine 3, and Hiding Routine 3, consists of Restoring Routine 2 and Hiding Routine 2. Therefore, in the 1st step, calculate the hash value and mask value of Restoring Routine 1, and replace the temporary value with the mask value (-0xe4). (As the instruction calculating the mask is subl, it is replaced with 0xe4.) In the 2nd step, calculate the hash values and mask values of Hiding Routine 1, Restoring Routine 3, and Hiding Routine 3, and replace the temporary values with the mask values. The 1st area contains original instruction 1 when the hash function of Hiding Routine 1 takes the 1st area as input; the 1st area contains camouflaged instruction 1 when the hash functions of Restoring Routine 3 and Hiding Routine 3 take the 1st area as input. In the 3rd step, calculate the hash values and mask values of Restoring Routine 2 and Hiding Routine 2, and replace the temporary values with the mask values (0x1d and 0x79).

Output as Q the object program in machine language to which the above-mentioned processing has been applied.

2.3 Improvement
Masks that are not included in the n areas can be protected by a digital signature as follows. A signature verification routine including the verification key is added onto Pasm before step-4. It terminates program execution if its area for signature verification does not correspond to its signature data stored in the data area of Pasm; the data area is different from the protected n areas. As the start address and the length of the area for signature verification are not yet determined, temporary values are specified. The area for signature verification is defined after step-4.3 and before step-4.4 so that the area includes the unprotected masks. In step-4.4, the start address and the length are replaced with the determined values. After the area for signature verification is determined in step-5, the signature data is generated and set in the predefined place in the data area. As long as the signature scheme is secure and the signature generation key is kept secret, an attacker cannot forge the signature; hence, the masks are protected.
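A minimal sketch of the added verification routine is shown below under explicit assumptions: verify_sig(), the embedded PUBLIC_KEY and SIGNATURE symbols, and the way the signed region is located are all illustrative stand-ins, since the paper does not fix a particular signature scheme.

    #include <stdint.h>
    #include <stdlib.h>

    /* Hypothetical signature primitive: returns nonzero iff sig is a valid
       signature of data[0..len) under the embedded verification key. */
    extern int verify_sig(const uint8_t *data, uint32_t len,
                          const uint8_t *sig, const uint8_t *public_key);

    extern const uint8_t PUBLIC_KEY[];   /* verification key embedded in the program */
    extern const uint8_t SIGNATURE[];    /* signature data stored in the data area    */

    /* The start address and length of the area for signature verification are
       temporary values at code generation time and are patched in step-4.4. */
    static void check_unprotected_masks(const uint8_t *area_start, uint32_t area_len)
    {
        if (!verify_sig(area_start, area_len, SIGNATURE, PUBLIC_KEY)) {
            exit(1);   /* terminate execution, as specified for the improvement */
        }
    }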

3. FEATURES

3.1 Self destructive tamper response
We explain how the proposed primitive works using Restoring Routine 2 of Figure 1. When the program is executed, Restoring Routine 2 is executed. When modification of its protected area (i.e., the 2nd area) is not detected, the camouflaged instruction "jne L8" is overwritten with its original instruction "je L8", the original instruction is executed, Hiding Routine 2 is executed, and "je L8" is re-camouflaged to "jne L8" if modification of the 3rd area is not detected. When a modification is detected, the hash value differs from its original, and consequently the target address of self-modification differs from its original. As a result, the instruction at the different address is self-modified, so "jne" is not changed and a byte at the different address is overwritten with 0x74. If part of an instruction is overwritten with a different value, then multiple instructions following (or including) that part can change to different instructions (see footnote 4). In this case, instruction boundaries can be affected and the subsequent processing becomes different from the original, so that the program execution would not conform to the specification with high probability. For example, while program execution continues, it exhibits bugs due to different kinds of instructions or wrong data values, or abnormal behaviors such as termination by error may occur due to invalid memory access. That is, if a modification is detected, then self-modification is not performed correctly, two or more instructions become different instructions with high probability, and behavior that is different from the original and includes bugs or errors, i.e., a self destructive tamper response, occurs. When the same modification is made at different positions, or different modifications are made at the same position, the respective hash values, and therefore the respective target addresses of self-modification, are different, so the tamper responses also differ. If the hash value is unpredictable, then the tamper response is unpredictable as well. That is, the created tamper resistant software prevents an attacker's analysis and modification by generating unpredictable self destructive tamper responses that vary from modification to modification. Although each Restoring Routine of Figure 1 includes one camouflaged instruction, multiple instructions can be camouflaged. If the number of camouflaged instructions per self integrity verification is larger, then more self destructive tamper responses can be generated, so that analysis and modification are prevented more effectively.

Footnote 4: Two instruction sequences that start at slightly different addresses synchronize quickly, often after a few instructions [10, 8]. Therefore, a few different instructions continue, and the original instructions follow them.

3.2 Tamper resistance to modification
If an attacker can disable all self destructive tamper responses, the tamper resistance to modification of the proposed method is removed completely. Note that the attacker cannot disable a self destructive tamper response unless the hash value of the protected area and the target address of self-modification are determined. We suppose that the goal of the attacker is to determine the hash value of a protected area and the target address of self-modification, and we consider the cost of achieving this goal.

3.2.1 Resistance to static analysis
The hash value is computed over a protected area that is self-modified at runtime, and the value is stored in a register. Based on the value in the register, the address of the camouflaged instruction to be self-modified is calculated using registers at runtime. As neither the hash value nor the address appears as immediate data in the machine-language program, it is difficult to determine these values by static analysis. Therefore, it is difficult to disable the dynamic self-modifying based self destructive tamper response by static analysis.

3.2.2 Resistance to dynamic analysis
In order to perform dynamic analysis, an attacker needs to find and circumvent the anti-debugging routines in the program. As the program is camouflaged, he cannot necessarily find all anti-debugging routines. If he fails to circumvent an anti-debugging routine, then the program is, for example, terminated. If he can find and circumvent all anti-debugging routines, then he can execute and analyze the program with a debugger until one of the modifications made for circumvention is detected. But he cannot continue the analysis correctly, because abnormal execution of the program occurs after the detection. The position of the instruction that is self-modified when a modification is detected is independent of the position of the modification. Because of the camouflage, the attacker cannot necessarily understand the control flow of the program correctly.

So, it is difficult for the attacker to infer what behavior the modified program will exhibit and when it will happen. No algorithm that automates arbitrary dynamic analysis is known, and dynamic analysis generally depends on the attacker's experience and heuristics by trial and error. Therefore, if the attacker's trial and error can be made sufficiently difficult in practice by increasing the numbers of self integrity verifications and camouflaged instructions, then the proposed method can make a program practically resistant to modification.

3.3 Resistance to page replication attack
The following attack (page replication attack) is proposed in [15] as a generic attack to defeat self integrity verification under the condition that the OS can be modified. The original program P and a modified program P′ are loaded in memory, and P′ is executed on a modified OS that can manipulate memory access. Memory access is directed to P when the protected area is read (data fetch) and to P′ when an instruction is retrieved (instruction fetch), so that self integrity verification succeeds while the modified program is executed.

A scheme (the GCK scheme) is proposed in [6] as a countermeasure against the page replication attack. The GCK scheme embeds a task into the self integrity verification program in such a way that, at runtime, the task self-modifies an instruction in the protected area and judges whether it is under a page replication attack by confirming whether the self-modification is reflected in the result of self integrity verification.

In the proposed method, a protected area when an original instruction appears in it and the same protected area when the camouflaged instruction appears in it can be the objects of different dynamic self-modifying based self destructive tamper responses. For example, the 1st area, Hiding Routine 1, and Restoring Routine 3 in Figure 1 have this relationship. Accordingly, the method can verify whether self-modification is reflected, and it generates a self destructive tamper response if the verification fails, so it can prevent the page replication attack.

3.4 Resistances of GCK scheme and proposed method

Based on the general discussion in 3.2.2, we focus on the attacker's work in locating the position of a modification. We compare the costs of the attacker's analyses on two programs generated from the same program by the GCK scheme and by the proposed method, respectively. An example code of the GCK scheme is shown in Figure 2.

[Figure 2: An example code of GCK scheme]

For clarity, Figure 2 shows a mix of x86 assembly code using AT&T syntax and C-style branching. It works as follows. The instruction in line 2 self-modifies the value 0 in line 4 to the value 1. The computation of checksum(R1, R2) in line 3 returns the value 1 in register al if the checksum calculated over the self-modified area between R1 and R2 is equal to a known value, and otherwise returns the value 0 in al. Execution of andb in line 4 stores the value 1 in al if the data fetch and integrity verification by the checksum function and the retrieval of the value 1 by instruction fetch in line 4 are performed correctly; otherwise, it stores the value 0 in al. Line 6 checks whether the value in al is 1; verify_license() in line 8 is executed if it is 1, and failed_license() in line 7 is executed otherwise.
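Since Figure 2 itself is not reproduced in this transcript, the following is a rough, purely conceptual C rendering of the behavior described above. It simulates the self-modified constant with a byte in a data array rather than real self-modifying code (so, unlike the real scheme, it cannot distinguish data fetch from instruction fetch); checksum, verify_license, and failed_license follow the description, while everything else is an illustrative assumption.

    #include <stdint.h>

    extern unsigned marker_index;                 /* position of the constant written by "line 2" */
    extern int checksum(const uint8_t *r1,
                        const uint8_t *r2);       /* 1 iff checksum over [r1, r2) matches the known value */
    extern void verify_license(void);
    extern void failed_license(void);

    void gck_check(uint8_t *r1, uint8_t *r2)
    {
        r1[marker_index] = 1;            /* "line 2": self-modify the 0 into a 1                     */
        int al = checksum(r1, r2);       /* "line 3": data fetch must see the change                 */
        al = al & r1[marker_index];      /* "line 4": in the real scheme this 1 is an immediate
                                            operand obtained by instruction fetch                    */
        if (al == 1)                     /* "line 6": compare-and-branch                             */
            verify_license();            /* "line 8"                                                  */
        else
            failed_license();            /* "line 7"                                                  */
    }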

For an attacker to disable the tamper resistance of the GCK scheme, it is sufficient to modify the compare-and-branch code at lines 6 and 8 so that verify_license() in line 8 is always executed regardless of the value in al. A branch in C is implemented as a conditional jump in assembly language, for example je (jump to the specified address if the comparison succeeds), so the modification succeeds if je is replaced with the unconditional jump jmp. The GCK scheme assumes that the checksum computation code and verification code are protected by code obfuscation techniques and that the attacker cannot identify them using static analysis. Therefore, with respect to the GCK scheme, the cost of locating the position of modification, Cost^locate_GCK, is evaluated as the cost of locating the compare-and-branch code at lines 6 and 8 using dynamic analysis. On the one hand, if the attacker finds and disables the anti-debugging routines, then he can continue dynamic analysis with a debugger until a disabling modification is detected. On the other hand, if different modifications are made within a protected area, the same tamper response occurs regardless of the modification position or contents; that is, whichever modification checksum(R1, R2) detects, failed_license() in line 7 is always executed. Therefore, the attacker can conduct the analysis as follows. First, he disables all anti-debugging routines and makes two copies of the modified program. Next, he makes different modifications at the same position in them and obtains the respective execution traces. By comparing the traces and extracting the diff, he can narrow down the range corresponding to lines 3 and 7. Let Cost^narrow_GCK denote the cost of this narrowing down. The attacker can pinpoint the compare-and-branch code at lines 6 and 8 by analyzing the range in detail or by tracing back from line 7. Let Cost^detail_GCK denote the cost of this detailed analysis. Then, the cost of locating the position of modification in order to disable one self integrity verification is as follows.

Cost^locate_GCK = Cost^narrow_GCK + Cost^detail_GCK    (2)

In the proposed method, locating the position of modification means finding the Restoring Routine, the original instruction, and the address of the original instruction. Let Cost^locate_proposed denote this cost. By comparing execution traces of modified programs, as done for the GCK scheme, and extracting the diff, the attacker can narrow down the Restoring Routine and the different tamper responses. Let Cost^narrow_proposed denote the cost of this narrowing down. Now, we have the following.

Cost^narrow_GCK ≃ Cost^narrow_proposed    (3)

However, as the address of a camouflaged instruction and the original instruction corresponding to the camouflaged instruction do not appear in these execution traces, the analysis applied to the GCK scheme for pinpointing the position of modification does not work in the same way for the proposed method.

In order for the attacker to determine an original instruction and its address, it is sufficient to observe the hash value using a debugger, or to obtain an execution trace, in a situation where the Restoring Routine computes the correct hash value over its original protected area.

The following analysis is a way to do so. First, the attacker finds out which (original or camouflaged) instructions are present in the protected area when the Restoring Routine is executed. Next, he preserves the correct occurrence of those instructions and, in addition, keeps the anti-debugging routines inside the protected area unmodified. If the Restoring Routine is executed in that state, then the attacker can observe the correct hash value using a debugger or obtain a correct execution trace. Execution like this is referred to as adaptive dynamic analysis. In order for the attacker to execute a Restoring Routine in a state with no modification in its protected area, it is sufficient to apply adaptive dynamic analysis to all Restoring Routines that are executed from the beginning of the program up to that Restoring Routine.

An adaptive dynamic analysis of a Restoring Routine includes determining the Restoring Routine by narrowing down and, in addition, determining the protected area of the Restoring Routine, determining the original instructions that appear in the protected area when the Restoring Routine is executed, obtaining the correct hash value when it is computed, and determining the address of the camouflaged instruction and its original instruction. These acts cost more than pinpointing the compare-and-branch code. Let Cost^detail_proposed denote the cost of these acts; then

Cost^locate_proposed = Cost^narrow_proposed + Cost^detail_proposed    (4)

and

Cost^detail_GCK < Cost^detail_proposed    (5)

hold. From (2), (3), (4), and (5), we have the following.

Cost^locate_GCK < Cost^locate_proposed    (6)

Thus, the proposed method has stronger tamper resistance to modification than the GCK scheme.

4. CONCLUSION
We have proposed a method to create tamper resistant software that is resistant to both static analysis and dynamic analysis. The proposed method is based on anti-debugging, dynamic self-modifying based self destructive tamper response, and interlocking self integrity verification. Tamper resistant software created by the proposed method generates a series of unpredictable self destructive tamper responses, where each response varies from modification to modification. Therefore, the tamper resistant software disturbs an attacker's analysis and modification more strongly and has stronger tamper resistance than previous schemes. It is possible to strengthen the tamper resistance to unauthorized modification by increasing the numbers of self integrity verifications and camouflaged instructions.

5. REFERENCES
[1] D. Aucsmith. Tamper resistant software: An implementation. In Information Hiding, First International Workshop, volume 1174 of Lecture Notes in Computer Science, pages 317–333, 1996.
[2] H. Chang and M. Atallah. Protecting software code by guards. In Security and Privacy in Digital Rights Management, volume 2320 of Lecture Notes in Computer Science, pages 160–175, 2002.
[3] C. Collberg, C. Thomborson, and D. Low. A taxonomy of obfuscating transformations. Technical Report 148, The University of Auckland, July 1997.
[4] K. Fukushima, S. Kiyomoto, and T. Tanaka. An obfuscation scheme using affine transformation and its implementation. IPSJ Journal, 47(8):2556–2570, 2006.
[5] M. N. Gagnon, S. Taylor, and A. K. Ghosh. Software protection through anti-debugging. IEEE Security & Privacy, 5(3):82–84, 2007.
[6] J. T. Giffin, M. Christodorescu, and L. Kruger. Strengthening software self-checksumming via self-modifying code. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC), pages 23–32, 2005.
[7] B. Horne, L. Matheson, C. Sheehan, and R. Tarjan. Dynamic self-checking techniques for improved tamper resistance. In Security and Privacy in Digital Rights Management, volume 2320 of Lecture Notes in Computer Science, pages 141–159, 2002.
[8] M. Jacob, M. H. Jakubowski, and R. Venkatesan. Towards integral binary execution: Implementing oblivious hashing using overlapped instruction encodings. In Proceedings of the 9th Workshop on Multimedia & Security, pages 129–140. ACM, 2007.
[9] Y. Kanzaki, A. Monden, M. Nakamura, and K. Matsumoto. A software protection method based on instruction camouflage. Electronics and Communications in Japan (Part III: Fundamental Electronic Science), 89(1):47–59, 2006.
[10] C. Linn and S. Debray. Obfuscation of executable code to improve resistance to static disassembly. In Proceedings of the 10th ACM Conference on Computer and Communications Security, CCS '03, pages 290–299. ACM, 2003.
[11] A. Main and P. C. van Oorschot. Software protection and application security: Understanding the battleground. International Course on State of the Art and Evolution of Computer Security and Industrial Cryptography, Heverlee, Belgium, June 2003. Proceedings (revised papers): Springer, LNCS (to appear), version of Dec. 31, 2003.
[12] K. Oishi and T. Matsumoto. Tamper resistant software with self destructive tamper response. IEICE Trans. Fundamentals (Japanese Edition), 94(3), March 2011.
[13] T. Shields. Anti-debugging – a developer's view. Whitepaper, Veracode Inc., 2009. http://www.veracode.com/images/pdf/whitepaper_antidebugging.pdf, accessed December 28, 2010.
[14] P. C. van Oorschot. Revisiting software protection. In Information Security, 6th International Conference, ISC 2003, Bristol, UK, October 1–3, 2003, Proceedings, volume 2851 of Lecture Notes in Computer Science, pages 1–13. Springer, 2003.
[15] P. C. van Oorschot, A. Somayaji, and G. Wurster. Hardware-assisted circumvention of self-hashing software tamper resistance. IEEE Trans. Dependable Secur. Comput., 2(2):82–92, April 2005.
