

Page 1: [IEEE 2014 IEEE 38th International Computer Software and Applications Conference Workshops (COMPSACW) - Vasteras, Sweden (2014.7.21-2014.7.25)] 2014 IEEE 38th International Computer

Introducing Code Assets of a New White-Box Security Modeling Language

He Sun, Lin Liu, Letong Feng School of Software Tsinghua University

Beijing, China [email protected]

Yuan Xiang Gu Irdeto

Tsinghua University Ottawa, Canada

[email protected]

Abstract—This paper proposes a new conceptual modeling language for White-Box (WB) security analysis. In the WB security domain, an attacker may have access to the inner structure of an application or even to its entire binary code. With the information obtained this way, it becomes easy for attackers to inspect, reverse engineer, and tamper with the application. The basis of this paper is the 14 patterns developed by a leading provider of software protection technologies and solutions. We present part of a new modeling language named i*-WBS (White-Box Security) that describes WB security problems better. The essence of the White-Box security problem is code security, so the new modeling language focuses on code more than earlier approaches do. In this way, developers who are not security experts can easily understand what they really need to protect.

Keywords—White-box security; Code security; Security modeling language; i*-WBS;

I. INTRODUCTION

White-Box vs. Black-Box Security: Before introducing the new modeling language for White-Box security, we first explain what White-Box security is. White-box security has become an important subarea of the software security field and stands in contrast to black-box system security. More effort is being put into it because software is becoming a more valuable commodity than before. As the network becomes an indispensable part of people’s daily life, gigabytes of private information are transferred over the Internet every second. Many attackers sniff such information using illegal methods and with malicious intent, which infringes on our information property and privacy. In many cases, the system itself can also become the target of attackers. For example, flooding a service with requests to block it, also called a denial-of-service attack, is a typical black-box security attack.

On the other hand, when new software appears on the market, many attackers, driven by potential piracy profits, try their best to discover the inner logic and properties of the software using static and dynamic analysis tools. Piracy, which is a concern for the software industry, has been prevalent over the last few decades. Copyrighted software and digital products are greatly affected by cracked software, which is attractive because of its low cost and which disregards basic intellectual property rights.

The examples above are typical attacks in the information security field. They can be divided into two categories according to criteria explained below. The first scenario represents the traditional black-box attack, and the second scenario represents the white-box attack.

Compared to the traditional black-box domain, which is based on the assumption that attackers have only limited access to the software itself, the white-box domain places everything, including the code, in the hands of attackers. Because of this fundamental divergence between the two situations, further differences arise, leading to different views, tools, theories and methods.

For a better understanding of the differences between the two kinds of security, a reference to the corresponding testing theory may help. In testing, black-box testing focuses on functions, while white-box testing mainly focuses on structures; the same view carries over to security. Because of this fundamental difference, the derived differences can be divided into three parts:

1. Differences in basic description (aims)

Information security is the key concern in both kinds of attacks, but the effects and the attack measures are quite different. On the one hand, from the functional point of view, black-box attackers seek “digital information” transferred between endpoints, the so-called “man-in-the-middle” pattern, since the transport process is the only vulnerability available to them. On the other hand, white-box attackers play the role of “man-in-the-end” with full access to source code or binary code. With the entire system in hand and with the help of particular tools, it is easy for attackers to gather private data such as personal information and, moreover, proprietary material such as algorithms or logic.

As a result, black-box attackers disrupt the function of the system and obtain personal information illegally, while white-box attackers mainly violate the intellectual property of the software through piracy.

2. Differences in Attack Procedure

Taking accessibility into account, the two kinds of attack have different points of entry. For black-box attacks, the vulnerability lies in the transmission stages. Through large numbers of tests involving inserting, reversing, sniffing and so on, black-box attackers can find defects and obtain the desired data. Since they focus on function, only part of the system is involved.

2014 IEEE 38th Annual International Computers, Software and Applications Conference Workshops

978-1-4799-3578-9/14 $31.00 © 2014 IEEE

DOI 10.1109/COMPSACW.2014.24


For white-box attacks, although the entire system is available, it is somewhat difficult to understand it holistically and to focus on specific points. However, with the help of static or dynamic analysis tools, it becomes easier for the attacker to grasp the control logic and obtain important data.

3. Differences in Cost & Requirements

Without knowledge of the inside of the system, black-box attackers spend most of their time on trial and error; in the white-box security domain, on the other hand, more time is spent on analyzing and understanding the entire system.

Apart from time consumption, the knowledge required also makes a big difference. Knowledge of operating systems and compilers is the basis for analyzing source code or binary code, and knowledge of the transaction logic of the application is a plus for a white-box attacker. For black-box attackers, knowledge of certain kinds of attack is sufficient for the job.

The rest of the paper is organized as follows. In Section II, we present a basic example of White-Box security and model it in the traditional i* modeling language. In Section III, we introduce the part of the new modeling language, based on i* modeling, that relates to White-Box security, and show the usage and significance of code assets. Related work and the conclusion can be found in Section IV.

II. BACKGROUND

In a White-Box domain, the environment in which the software runs is untrusted. In this environment, attackers have full access to the source code or the binary code, and this is a fact we cannot change. It enables attackers to browse the program and trace its execution. Using a debugger, attackers can access memory and cache, stop the execution at any point, and alter the code or data in memory. Attackers can also use static analysis tools to analyze the code on disk without running the program. By tampering with the application and removing some of its restrictions, the attacker can ultimately create cracked programs. This can result in piracy of applications, which can significantly hurt the business model of information system developers.

A crack is a typical White-Box attack. We will show the whole process of a crack to make clear the importance of code assets.

A. Typical White-Box attack: crack

In the white-box attack domain, piracy is quite prevalent. It refers to the scenario where an illegal user uses the application without paying. In fact, it is the cracker who should be blamed for the infringement, since the cracker takes advantage of the fact that everybody wants to get the most benefit at the least cost.

The starting point of a software development life-cycle is developing the source code. We assume that the attacker is sophisticated enough to use dynamic analysis tools such as a debugger. In practice, the attacker must first remove the shell of the application. The shell here means a part of the program that protects the application from tampering. Removing the shell is not very difficult for a proficient attacker, because the real program running in memory has nothing to do with what the shell looks like.

After removing the shell of the application, the attacker carries out the main part of the crack.

As is well known, the program developer creates a license validation step to verify the user’s authorization. The user has to pass the validation before further use. The cracker therefore focuses on how to pass this step or how to generate the registration code. To accomplish this goal, the cracker has three choices:

1. Get the correct registration code.
2. Calculate the registration code.
3. Bypass the validation step.

The premise of the first choice is that the running program generates the explicit registration code in memory and compares it with the user’s input. In this situation, the cracker sets up a monitor in memory, usually called a memory registration machine, to inspect the generated registration code. The user first enters a wrong code (it is practically impossible to enter the correct one at the first attempt), but the monitor sees the correct registration code and presents it to the user. In the end, the illegal user is able to use the application without any payment.

For both the second and the third choice, the cracker must understand the cryptographic algorithm, and may use static or dynamic analysis tools to do so. Eventually, the cracker understands the registration algorithm by statically analyzing the confidential data and the call graph. The cracker is then able to obtain the corresponding registration number or a way to bypass the validation. This is not the outcome that the protector and paying customers would like to see.

B. The key point of White-Box security

From the analysis above, we find that the developer cannot stop the cracker from getting the code, because the cracker can be one of the users (in the white-box attack domain). In such circumstances, the most critical point is to protect the code from inspection and understanding.

C. Goal-oriented modeling language

The i* modeling language was designed for analyzing relationships among actors with strategic intent and choices. i* model elements can capture the consequences of attacks and solutions on the goals of each agent. i* is a high-level conceptual modeling language. This means that although the i* notation supports analysis of the strategic relationships that appear in White-Box security scenarios, it lacks some of the modeling elements required specifically for White-Box security.

First, we introduce the traditional i* language. Its elements are actors, goals, soft goals, tasks and a number of relationships. It focuses on the goals and motivations of attackers and has sufficient expressive power for goal analysis. The most important relationship is the dependency, which represents that one actor depends on another actor to accomplish some goal or task. A dependency therefore exposes a vulnerability. The decomposition link means that a goal or a task has sub-goals or sub-tasks. The means-ends link represents that there are several alternative ways to accomplish a goal, so it is a kind of “or” relationship. Figure 1 shows a basic traditional i* model of the process of a crack.

Figure 1. i* modeling of the process of crack.
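The link semantics described above can be illustrated with a small goal graph. The following is a minimal sketch, not a reproduction of Figure 1: the node names are taken from the crack process in Section II.A, while the class names `LinkKind` and `Node` and the function `achievable` are hypothetical. It assumes that a decomposition requires all of its children and that a means-ends link is satisfied by any one alternative.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class LinkKind(Enum):
    DECOMPOSITION = "and"  # assumption: every sub-task is required
    MEANS_ENDS = "or"      # assumption: any one alternative suffices


@dataclass
class Node:
    """A goal or task in a simplified, hypothetical i*-style graph."""
    name: str
    kind: Optional[LinkKind] = None          # None marks a leaf task
    children: List["Node"] = field(default_factory=list)


def achievable(node: Node, feasible: Set[str]) -> bool:
    """Return True if the node can be satisfied with the feasible leaf tasks."""
    if not node.children:
        return node.name in feasible
    results = [achievable(child, feasible) for child in node.children]
    if node.kind is LinkKind.DECOMPOSITION:
        return all(results)
    return any(results)


# The crack process of Section II.A as an and/or graph.
crack = Node("Use application without paying", LinkKind.DECOMPOSITION, [
    Node("Remove the shell"),
    Node("Pass the validation", LinkKind.MEANS_ENDS, [
        Node("Get the correct registration code"),
        Node("Calculate the registration code"),
        Node("Bypass the validation step"),
    ]),
])

if __name__ == "__main__":
    print(achievable(crack, {"Remove the shell", "Bypass the validation step"}))
    print(achievable(crack, {"Bypass the validation step"}))
```

Under these assumptions, a defender can ask which leaf tasks must be made infeasible to block the root goal: blocking shell removal alone is enough, whereas all three validation alternatives would have to be blocked together.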

III. I*-WHITE-BOX SECURITY MODELING: CODE ASSETS

The most distinctive feature of White-Box security is that it concerns a low level of abstraction, namely the code level. To reflect this characteristic, this paper proposes a change to the traditional i* modeling language. First, we treat code and its affiliates (such as data and control flow) as a special resource outside any actor. These resources are what we really want to protect. Second, since the format and sequence of code change constantly in White-Box security problems, we add some fields to the code resources to describe the attack context fully.

A. Code Resources

In traditional i* modeling, a resource is regarded as a tool that helps an actor accomplish a goal or a task. In i*-WBS modeling, we see code resources as something to protect rather than to use. Code resources have their own inner assets, such as data, instructions, jump statements and so on. We call these inner assets of code resources code assets, and they are what we really want to protect from attack.

Traditional i* modeling focuses on the goals and motivations of actors and neglects the details of tasks and techniques. Because White-Box security is characterized by code protection, i*-WBS emphasizes the analysis and description of code. To this end, a number of fields have been added to code resources, expressed as Code(State, Format, Time stamp, Protected). These fields are described below.

1. State: The state indicates whether the code is static or dynamic, in other words whether the code is running or not. The state value also implies where the code resides: on disk or in memory. The state is a very important field because it determines which methods or tools attackers can use. For example, an attacker cannot use a static analysis tool to analyze code in memory. The state field can take two values: Static (S) or Dynamic (D).

2. Format: The word “code” is a general concept, and the format of the code cannot be inferred from the word alone. The format field has three values: source code (SC), binary code (BC) and intermediate code (IC). Attack methods and tools differ greatly between code formats. For example, the attacker can use a disassembler to translate binary code into assembly language but cannot use the same tool to attack source code.

3. Time Stamp: The life of a program has many stages, such as build time (BT), load time (LT), run time (RT) and update time (UT). The field is denoted by N if the code is not executed. The time stamp indicates what information an attacker can obtain and what kinds of operations are possible. For example, if a key is generated at build time and is not stored on the hard disk in the final source or binary code, the attacker cannot obtain the key.

4. Protected: The Protected field has two values: encrypted (E) and unencrypted (UE). If code is encrypted, an attacker who wants to understand it first has to decrypt it. After decryption, the attacker is able to carry out further attacks.

With the help of the fields added above, the information about code can be described more completely. For example, “code(S, SC, N, E)” represents source code that exists on disk, is not running, and is encrypted. This representation is compact and easy to understand.
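The four fields can be captured in a small data structure. The following is a minimal sketch under stated assumptions: the field values and abbreviations come from the list above, while the class name `Code`, its methods, and the textual syntax accepted by `parse` are hypothetical and are not part of i*-WBS itself. The two helper methods encode only the two tool constraints mentioned in the State and Format descriptions.

```python
import re
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    STATIC = "S"
    DYNAMIC = "D"


class Format(Enum):
    SOURCE = "SC"
    BINARY = "BC"
    INTERMEDIATE = "IC"


class TimeStamp(Enum):
    BUILD = "BT"
    LOAD = "LT"
    RUN = "RT"
    UPDATE = "UT"
    NOT_EXECUTED = "N"


class Protected(Enum):
    ENCRYPTED = "E"
    UNENCRYPTED = "UE"


# Hypothetical textual form: code(State, Format, Time stamp, Protected)
_DESCRIPTOR = re.compile(
    r"\s*code\(\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*,\s*(\w+)\s*\)\s*",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Code:
    state: State
    fmt: Format
    time_stamp: TimeStamp
    protected: Protected

    @classmethod
    def parse(cls, text: str) -> "Code":
        """Parse a descriptor such as 'code(S, SC, N, E)'."""
        match = _DESCRIPTOR.fullmatch(text)
        if match is None:
            raise ValueError(f"not a code descriptor: {text!r}")
        s, f, t, p = (part.upper() for part in match.groups())
        # Enum lookup raises ValueError for an unknown abbreviation.
        return cls(State(s), Format(f), TimeStamp(t), Protected(p))

    def __str__(self) -> str:
        return (f"code({self.state.value}, {self.fmt.value}, "
                f"{self.time_stamp.value}, {self.protected.value})")

    def static_analysis_possible(self) -> bool:
        # Static analysis tools work on code at rest, not on code in memory.
        return self.state is State.STATIC

    def disassembler_applicable(self) -> bool:
        # A disassembler translates binary code; it does not apply to source code.
        return self.fmt is Format.BINARY


if __name__ == "__main__":
    example = Code.parse("code(S, SC, N, E)")
    print(example, example.static_analysis_possible())
```

Using enumerations rather than free strings means that a descriptor with an unknown abbreviation is rejected when it is parsed, which keeps models consistent with the value sets defined above.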

The goal of protecting code resources is to prevent malicious users from understanding the way in which the code implements its function. Attackers have to analyze parameters, instructions and functions to understand what the code does. In this paper, we call instructions, data, functions and so on code assets (Figure 2).

Figure 2. The structure of code assets.

B. Expression and Usage of Code Resources in i*-WBS

In the traditional i* modeling language, it is a convention to regard code resources (in fact, any resource) as part of one particular actor. In the i*-WBS modeling language, code resources are treated as an independent part outside any actor, because they interact frequently with many, or even all, actors (Figure 3).

Figure 3. The expression of code assets in i*-WBS.

In Figure 3, we can clearly see dependency relationships between actors’ tasks and particular kinds of code assets. For example, task 3 of actor 1 depends on the code’s data, which means that actor 1 wants to do something with the data. Whatever the details of task 3 are, the data is clearly the focus of our concern.
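The idea of attaching task dependencies to code assets that sit outside any actor can be sketched as a simple index from asset to dependent tasks. This is an illustrative sketch only: the `Dependency` record and the `tasks_by_asset` function are hypothetical names, and apart from “task 3 of actor 1 depends on data” the entries are invented placeholders, not the content of Figure 3.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Dependency:
    """A task of an actor that depends on one kind of code asset."""
    actor: str
    task: str
    asset: str


def tasks_by_asset(deps: Iterable[Dependency]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (actor, task) pairs by the code asset they depend on."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for dep in deps:
        index[dep.asset].append((dep.actor, dep.task))
    return dict(index)


# Only the first entry is taken from the text; the others are placeholders.
dependencies = [
    Dependency("Actor 1", "Task 3", "data"),
    Dependency("Actor 2", "Task 1", "instructions"),
    Dependency("Actor 2", "Task 2", "data"),
]

if __name__ == "__main__":
    for asset, users in tasks_by_asset(dependencies).items():
        print(asset, users)
```

Because the assets are keys that do not belong to any actor, the same asset can collect dependencies from several actors, which mirrors the reason given above for placing code resources outside the actors.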

The following example shows the importance of code assets in analyzing White-Box security problems (Figure 4).

Figure 4. Code Assets in Control Flow Flattening pattern


In Figure 4, we can see that the attacker’s malicious tasks, named “Inspect the code” and “Understand the branch instruction targets (Static Analysis)”, are both directed against the control flow of the code. The goal of the attacker is to understand the code and to reverse engineer it by reconstructing its control flow.

From Figure 4, we can clearly see what the attacker is concerned with and which approaches the attacker uses. In the Control Flow Flattening pattern, we can defend against attacks on the control flow of the code by obfuscating the jumps and instructions.
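To make the protected asset concrete, the following sketch shows the general idea of control flow flattening on a toy function. It is a generic textbook-style illustration, not the pattern implementation referred to in this paper; the functions `checksum` and `checksum_flattened` are hypothetical examples. The structured loop and branch are replaced by a single dispatcher loop over numbered blocks, so the original branch targets are no longer visible in the shape of the code.

```python
from typing import Sequence


def checksum(values: Sequence[int]) -> int:
    """Original version: the loop and the branch are directly visible."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v
        else:
            total -= v
    return total


def checksum_flattened(values: Sequence[int]) -> int:
    """Flattened version: every basic block is a case of one dispatcher loop."""
    total = 0
    i = 0
    state = 0  # dispatcher variable that selects the next basic block
    while state != 5:          # state 5 is the exit block
        if state == 0:         # loop test
            state = 1 if i < len(values) else 5
        elif state == 1:       # parity test
            state = 2 if values[i] % 2 == 0 else 3
        elif state == 2:       # even branch
            total += values[i]
            state = 4
        elif state == 3:       # odd branch
            total -= values[i]
            state = 4
        elif state == 4:       # loop increment
            i += 1
            state = 0
    return total


if __name__ == "__main__":
    sample = [1, 2, 3, 4]
    print(checksum(sample), checksum_flattened(sample))
```

In this toy version the next state is still assigned in plain sight; a practical protection would additionally obscure how the dispatcher variable is computed, which is beyond the scope of this sketch.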

With the help of the code assets concept, we can quickly and clearly identify what the attacker really wants to attack, and the defender can readily make a plan to protect the attacker’s targets.

IV. RELATED WORK AND CONCLUSION

White-Box security is a new domain of security conceptual modeling. The basis of this paper is the 14 patterns developed at Irdeto, a leading company in software protection technologies and solutions. The research of Golnaz Elahi and Luncheng Lin is relevant to this article.

In [2], Elahi et al. present a new conceptual modeling method for White-Box security. They employ the i* agent- and goal-oriented modeling approach to analyze WB security patterns. i* models can express the goals of WB attacks and solutions and help to decompose them into a hierarchical goal graph. Elahi et al. analyzed the 14 patterns developed at Irdeto and produced an i* graph of each pattern. The main purpose of that work is to extract the modeling concepts required to express those 14 patterns.

Based on a detailed analysis of the 14 White-Box security patterns, we found that traditional goal-oriented modeling is not sufficient to analyze White-Box security problems. Code security is the essence of White-Box security problems. In the second part of this paper, we introduced a typical White-Box attack, the crack, to show the importance of code assets. We aim to analyze not only the attacker’s goals, but also which kind of code asset the attacker wants to analyze and how the attack is carried out. In this paper, we add four fields to code resources and divide code resources into five kinds of code assets. This shows more clearly what the attacker’s target is and which kind of code asset is the most important in a specific scenario.

In this paper, we only show the importance of code assets in White-Box analysis and how to express them. In the future, we aim to add particular relationships to this ontology to express White-Box security problems more fully. After that, we will analyze the 14 patterns using the new ontology and give one or two real-life examples.

In addition, the method can be used in conjunction with other methods, for example the Problem Frame modeling language. A malicious task in an i*-WBS model can be transformed into an attacking machine in Problem Frames. The process is shown in Figure 5.

Figure 5. The transform from i*-WBS to PF through malicious task

ACKNOWLEDGMENT

This paper received financial support from the National Natural Science Foundation of China (No. 61033006) and the National High Technology Research and Development Program of China (2012AA040904).

REFERENCES

[1]. Lin L, Nuseibeh B, Ince D, et al. Using abuse frames to bound the scope of security problems[C]//Requirements Engineering Conference, 2004. Proceedings. 12th IEEE International. IEEE, 2004: 354-355.

[2]. Elahi G, Yu E, Gu Y X, et al. Goal-Oriented Modeling and Analysis of White-Box Security: Toward an Ontology, In Proceedings of the 2nd ACM workshop on Software Security Protection, June 16, Beijing, China, 2012.

[3]. C. Landwehr, A. R. Bull, J. P. McDermott, W. S. Choi. A Taxonomy of Computer Program Security Flaws, 1993.

[4]. G. Sindre and L. Opdahl. Eliciting security requirements with misuse cases. Requir. Eng., 10(1):34-44, 2005.

[5]. A. van Lamsweerde. Elaborating security requirements by construction of intentional antimodels. In Proc. of ICSE'04, pages 148-157. IEEE Computer Society, 2004.

[6]. J. Jurjens. Model-based Security Testing Using UMLsec: A Case Study. Electronic Notes in Theoretical Computer Science, 220(1):93-104, 2008. Proceedings of the Fourth Workshop on Model Based Testing (MBT 2008).

[7]. Golnaz Elahi, Eric Yu, Tong Li, Lin Liu: Security Requirements Engineering in the Wild: A Survey of Common Practices. COMPSAC 2011: 314-319

[8]. S. Chow, P. Eisen, H. Johnson, P.C. van Oorschot: White-Box Cryptography and an AES Implementation. In: Nyberg, K., Heys, H.M. (eds.) SAC 2002. LNCS, vol. 2595, Springer, Heidelberg (2003)

[9]. F. Braber, I. Hogganvik, M. S. Lund, K. Stolen, and F. Vraalsen. Model-based security analysis in seven steps, a guided tour to the CORAS method. BT Technology Journal, 25(1):101-117, 2007.

[10]. C. Liem, Y. Gu, H. Johnson, “A Compiler-Based Infrastructure for Software-Protection”, Programming Languages and Analysis for Security (PLAS’08), Tucson, AZ, June, 2009, pp 33-44

[11]. Yuhong Wen, Haihong Zhao, Lin Liu: Analysing security requirements patterns based on problems decomposition and composition. RePa 2011: 11-20

[12]. C. Alexander, S. Ishikawa, and M. Silverstein. A Pattern Language, vol. 2. Oxford University Press, 1977.

[13]. Yoshioka N, Washizaki H, Maruyama K. A survey on security patterns[J]. Progress in Informatics, 2008, 5(5): 35-47.

[14]. Mouratidis H, Giorgini P, Manson G, et al. A natural extension of tropos methodology for modelling security[C]//the Proceedings of the Agent Oriented Methodologies Workshop (OOPSLA 2002), Seattle-USA. 2002.

[15]. Supaporn K, Prompoon N, Rojkangsadan T. An approach: Constructing the grammar from security pattern[C]//Proc. 4th International Joint Conference on Computer Science and Software Engineering (JCSSE2007). 2007.

[16]. Schumacher M, Fernandez-Buglioni E, Hybertson D, et al. Security Patterns: Integrating security and systems engineering[M]. John Wiley & Sons, 2006.

[17]. McGraw G. Software security[J]. Security & Privacy, IEEE, 2004, 2(2): 80-83.

[18]. Chess B, Arkin B. Software security in practice[J]. Security & Privacy, IEEE, 2011, 9(2): 89-92.

[19]. Collberg C S, Thomborson C. Watermarking, tamper-proofing, and obfuscation-tools for software protection[J]. Software Engineering, IEEE Transactions on, 2002, 28(8): 735-746.

[20]. Collberg C, Carter E, Debray S, et al. Dynamic path-based software watermarking[J]. ACM SIGPLAN Notices, 2004, 39(6): 107-118.

[21]. Collberg C, Thomborson C, Townsend G M. Dynamic graph-based software watermarking[J]. TR04-08, Department of Computer Science, 2004.
