The Dataflow Pointcut - A Formal and Practical Framework*

Dima Alhadidi, Amine Boukhtouta, Nadia Belblidia, Mourad Debbabi, Prabir Bhattacharya

Computer Security Laboratory (CSL)
Concordia Institute for Information Systems Engineering
Concordia University, Montreal, Quebec, Canada

ABSTRACT

Some security concerns are sensitive to the flow of information in a program execution. The dataflow pointcut has been proposed by Masuhara and Kawauchi in order to easily implement such security concerns in aspect-oriented programming (AOP) languages. The pointcut identifies join points based on the origins of values. This paper presents a formal framework for this pointcut based on the λ calculus. Dataflow tags are propagated statically to track data dependencies between expressions. We introduce a static semantics for tag propagation and prove that it is consistent with respect to the dynamic semantics of the propagation. We instrument the static effect-based type system to propagate tags and to match and inject advices. This static approach can minimize the cost of dataflow pointcuts by reducing the runtime overhead, since much of the dataflow information is available statically, and it can also be used for verification. The proposed semantics for advice weaving is in the spirit of AspectJ, where advices are injected before, after, or around the join points that are matched by their respective pointcuts. Inspired by the formal framework, the AspectJ compiler ajc is extended with a dataflow pointcut that tracks data dependencies inside methods.

Categories and Subject Descriptors

D.3.1 [Programming Languages]: Formal Definitions and Theory - Semantics

General Terms

Languages, Theory

* This research is the result of a fruitful collaboration with the Department of National Defense, Bell Canada and the DND/NSERC Research Partnership Program.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

AOSD’09, March 2–6, 2009, Charlottesville, Virginia, USA.
Copyright 2009 ACM 978-1-60558-442-3/09/03 ...$5.00.

Keywords

Aspect-oriented Programming, Dataflow Analysis, Type Theory

1. MOTIVATION AND BACKGROUND

The security of an application is an attribute that permeates the whole system. As such, any attempt to address security concerns must be global in nature. Besides, security solutions must be applied consistently at every relevant location. One way of achieving these objectives is to separate security concerns from the rest of the application concerns, so that they can be addressed independently and applied globally. A methodology that encompasses both the separation of security concerns and the consistent implementation of security solutions would pave the road towards secure applications, enable a security expert to specify security properties, and facilitate the correctness verification of security solutions.

Application security hardening is any process, methodology, product, or combination thereof that is used to increase the security of applications. In practice, it amounts to code modifications scattered across the whole software. Accordingly, Aspect-Oriented Programming (AOP) [1], which allows the separation of crosscutting concerns such as synchronization and persistence, appears to be a promising paradigm for software security hardening. The most widely used AOP approaches are the pointcut-advice model [5], the multi-dimensional separation of concerns model [3], and the adaptive programming model [4]. This paper adopts the pointcut-advice model because it appears to be the most appropriate one for hardening the security of applications [7]. The fundamental concepts of this model are join points, pointcuts, and advices. A join point is a location in the execution of a program. A pointcut classifies join points in the same way a type classifies values. An advice is a code fragment executed when join points satisfying a particular pointcut are reached. This execution can take place before, after, or around a specific join point.

Despite the fact that AOP technology is well suited to security-relevant problems, there are no theoretical foundations, applications, or tools that address AOP and security together. Most of the contributions [6, 23, 24] that explore the usability of AOP for integrating security code into applications are presented as case studies that show the relevance of AOP languages for application security hardening. In this paper, we define a formal framework for the dataflow pointcut based on the λ calculus. This pointcut is important from a security perspective because it can detect and fix a considerable number of vulnerabilities that are related to unvalidated input, e.g., web application vulnerabilities, process injection, log forging, and path injection. The pointcut identifies join points based on the origins of values using dataflow tags. In addition, we introduce a static semantics for tag propagation and prove that it is consistent with respect to the dynamic semantics of the propagation. Masuhara and Kawauchi [22] have defined the dataflow pointcut but they have not provided a formal framework for it. They have presented the design of this pointcut with a web-application example, together with a prototype implementation.

The choice of the λ calculus as a starting core is motivated by the desire to remove the syntactic dissimilarities between related constructs in different AOP languages. The λ calculus is considered a useful mathematical tool in the study of programming languages, since programs can be identified with lambda terms [9]. An effect-based type system makes it possible to analyze programs statically in the presence of mutable data. Consequently, we instrument the effect-based type system to perform weaving during static typing. The main contributions of this paper can be summarized as follows:

• A formal framework of the dataflow pointcut is detailed. Dataflow tags discriminate dataflow pointcuts and propagate statically between expressions to track data dependencies. We introduce a static semantics for tag propagation and prove that it is consistent with respect to the dynamic semantics of the propagation. The effect-based type system is instrumented to take tag propagation, matching, and weaving into consideration. The instrumented static effect-based type system can be used to minimize the cost of dataflow pointcuts because much of the dataflow information is available statically. In addition, the defined semantics serves as a guide to implementing the dataflow pointcut in real or production compilers.

• The pointcuts set and get are considered. Many functional programming languages offer references for representing mutable data. The set and get pointcuts pick out join points where variables are set and read respectively. The set and get pointcuts have not been addressed in previous research proposals that deal with aspect-oriented functional programming [17, 18, 19, 20, 21].

• Advice weaving is formalized. Such a weaving is in the spirit of AspectJ [13], where advices are injected before, after, or around the join points that are matched by their respective pointcuts. Many research proposals have addressed this subject [17, 18, 19, 20, 21]. The novelty of our approach lies in the use of the sequence construct “;” of the extended λ calculus to perform the weaving.

• Inspired by the formal framework, an intra-procedural implementation of the dataflow pointcut has been introduced in the AspectJ compiler ajc 1.5.0.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 describes the syntax. A background for the effect-based type system is detailed in Section 4. Section 5 presents the formal framework of the dataflow pointcut whereas Section 6 covers its practical framework. Concluding remarks as well as a discussion of future work are presented in Section 7.

2. RELATED WORK

There are many research contributions that have targeted AOP semantics [12, 17, 18, 19, 20, 29, 31, 30]. The most relevant research proposals are the contribution of Walker et al., where the authors have defined the semantics of MinAML, an aspect-oriented language [17], and the contribution of Dantas et al., where the authors have defined PolyAML, a typed functional, aspect-oriented language [18]. Compared to these contributions, the defined framework does not use labels to mark points where advices are going to be injected. Furthermore, advice weaving is in the spirit of AspectJ. Consequently, the sequence construct “;” is introduced for advice injection instead of applying the advice to the argument or to the result of a function, as is the case in MinAML and PolyAML. In the aforementioned contribution [17], proceed can only appear once in the around-advice, but it can appear more than once in the around-advice of the defined framework, as is the case in AspectJ. Similar to MinAML and PolyAML, advising expressions inside the advice definition is not allowed.

Tatsuzawa et al. have implemented an aspect-oriented version of core O’Caml called Aspectual Caml [19]. Compared to the defined framework, Aspectual Caml carries out type inference on advices without consulting the types of the functions designated by the pointcuts. In addition, there is no formal semantics for Aspectual Caml. Wang et al. have provided seamless integration of AOP and strongly-typed functional languages through a static weaving process, which deals with around advices and type-scoped pointcuts in the presence of higher-order functions [20, 21].

Clifton et al. [2] have presented Modular Aspects with Ownership (MAO). Using MAO, programmers can specify the full range of their aspects’ interactions with the base program and their interference with one another, making aspect-oriented programs more precisely documented and easier to reason about. In addition, Clifton and Leavens have introduced MiniMAO1 [12], an object-based and aspect-oriented calculus. The defined calculus models context-exposing pointcut primitives as well as around-advice capable of changing parameter bindings on proceed invocations. The authors have given an operational semantics, type system, and proof of soundness for MiniMAO1.

The most prominent contribution that targets security and AOP together is that of Masuhara and Kawauchi [22]. This contribution is detailed in the previous section. Ramachandran et al. have described a case study [24] that incorporates a multilevel security (MLS) system using aspects. AspectJ is used to intercept Java library calls in order to enforce the MLS policy. The authors have described how AspectJ can go further than a conventional object-oriented approach to achieve stronger enforcement of MLS.

Huang et al. have introduced an implementation of a reusable and generic aspect library [23]. This aspect library is based on AspectJ and common Java security packages. It contains the following typical categories of security aspects: encryption, decryption, authentication, authorization, and security audit. By means of an example of access control, De Win et al. [6] have investigated how well AOP can deal with the separation of security from an application. In order to construct a more generic solution, they have suggested abstracting the relevant pointcuts out of the aspect implementation.

Tracematches [26] allow developers to specify traces of interest via regular expressions of symbols with free variables, along with some code to execute if such a trace occurs in a program execution. Comparatively, the defined dataflow pointcut provides more functionality than tracematches. The representation of tracematches allows expressing some special cases, but not all cases, of the dataflow pointcut. For example, tracematches cannot pick out all the join points that are data-dependent on a call of a specific function, because it is not known in advance how these join points appear in a program execution, so they cannot be modeled as a sequence of events. Besides, the implemented dataflow pointcut enables a succinct, elegant, and declarative representation for tracking dataflow dependencies. In addition, tracematches have been implemented in the abc compiler whereas the defined dataflow pointcut has been implemented in the ajc compiler.

3. SYNTAX

In this section, we present the syntax and the notations that are used throughout this paper. A program, as shown in Figure 1, consists of an expression and a sequence of advices. We consider the following expressions:

• Constants and variables.

• Functional constructs such as function abstraction, function application, and recursion.

• Let expressions.

• Sequence expressions.

• Imperative notations such as referencing. An expression of the form ref(e) allocates a new reference that points to the value obtained from the evaluation of e. The unary operator “!” is used for dereferencing, and the binary operator “:=” is used for assignment.

Each advice has a kind (akind) that can be either before, after, or around. It also contains a pointcut designator (pcd) that specifies the join points in which it is interested, and a body (exp) representing the action to be taken at those points. In the case of around advice, the body (exp) may contain a special variable proceed that represents the advice with next precedence, or the computation under the join point if there is no further advice. Around expressions (ArExp) are expressions that may contain proceed. AdvSeq is a sequence of advices. The empty sequence is represented by the symbol ε.
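The role of proceed in around advice can be illustrated with a small Python sketch. It is not part of the formal framework: the names weave_around, outer, inner, and base are hypothetical, and a join point is modeled simply as a zero-argument function. Each advice receives a proceed callback that stands for the advice with next precedence, or for the computation under the join point when no further advice exists.

```python
from typing import Callable, List

Proceed = Callable[[], int]
AroundAdvice = Callable[[Proceed], int]


def weave_around(join_point: Proceed, advices: List[AroundAdvice]) -> Proceed:
    """Chain around advices; advices[0] is assumed to have the highest precedence."""
    computation = join_point
    # Build from the lowest precedence upward, so that each `proceed` refers to
    # the advice with next precedence, or to the join point itself.
    for advice in reversed(advices):
        computation = (lambda adv, nxt: lambda: adv(nxt))(advice, computation)
    return computation


log: List[str] = []


def base() -> int:
    """The computation under the join point (hypothetical)."""
    log.append("join point")
    return 1


def outer(proceed: Proceed) -> int:
    log.append("outer")
    return proceed() + 10


def inner(proceed: Proceed) -> int:
    log.append("inner")
    # proceed may be invoked more than once inside an around advice
    return proceed() + proceed()


result = weave_around(base, [outer, inner])()
```

In this sketch, outer runs first, its proceed triggers inner, and inner reaches the join point twice, which mirrors the statement that proceed may appear several times in an around advice.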

We consider four kinds of basic pointcuts: call, set, get, and dflow. The pointcut syntax uses type schemes to specify join point types and tags to discriminate dataflow pointcuts. Types are detailed in the next section. Tags are represented by natural numbers. We write ι-set to denote a set having tags as elements.

Basic pointcuts can be logically combined to produce more complex ones using boolean operators. Vname is an infinite set of variable names whereas Fname is an infinite set of function names.

Prog ∋ pr ::= e · q   (Program)

Exp ∋ e ::= c   (Expressions)
          | x
          | λx.e
          | e₁ e₂
          | let rec f x = e₁ in e₂
          | let x = e₁ in e₂
          | e₁; e₂
          | ref e
          | ! e
          | e₁ := e₂

Const ∋ c ::= n | ( ) | true | false   (Constants)

AdvSeq ∋ q ::= ⟨akind: before | after, pcd: Pcd, exp: Exp⟩ q   (Advices)
             | ⟨akind: around, pcd: Pcd, exp: ArExp⟩ q
             | ε

ArExp ∋ eᵃ ::= c   (Around Expressions)
             | x
             | proceed
             | λx.eᵃ
             | eᵃ₁ eᵃ₂
             | let rec f x = eᵃ₁ in eᵃ₂
             | let x = eᵃ₁ in eᵃ₂
             | eᵃ₁; eᵃ₂
             | ref eᵃ
             | ! eᵃ
             | eᵃ₁ := eᵃ₂

Pcd ∋ p ::= true | ¬p | p ∧ p   (Pointcuts)
          | cgdp | sp

CGDPcd ∋ cgdp ::= ⟨pkind: call, var: Fname, scheme: FunctionTypeScheme⟩
                | ⟨pkind: get, var: Vname, scheme: RefTypeScheme⟩
                | ⟨pkind: dflow, tag: Tag, pcd: CGDPcd⟩

SPcd ∋ sp ::= ⟨pkind: set, var: Vname, scheme: RefTypeScheme⟩

Fname ∋ f ::= Identifier

Vname ∋ v ::= Identifier

Tag ∋ ι ::= n   (Tags)

TagSet ∋ t ::= ι-set

Figure 1: Syntax

Notations

• Given two sets A and B, we write $A \xrightarrow[m]{} B$ to denote the set of all mappings from A to B. A mapping (map for short) $m \in A \xrightarrow[m]{} B$ can be defined by extension as $[a_1 \mapsto b_1, \ldots, a_n \mapsto b_n]$ to denote the association of the elements $b_i$ to the elements $a_i$. We write Dom(m) to denote the domain of the map m. We write $m_{x_1, x_2, \ldots}$ to denote the map m excluding the associations of the form $x_i \mapsto \_$. Given two maps m and m′, we write $m \dagger m'$ to denote the overwriting of the map m by the associations of the map m′, i.e., the domain of $m \dagger m'$ is Dom(m) ∪ Dom(m′), and we have $(m \dagger m')(a) = m'(a)$ if a ∈ Dom(m′) and m(a) otherwise.

• Given a record space $D = \langle f_1 : D_1, f_2 : D_2, \ldots, f_n : D_n \rangle$ and an element e of type D, the access to the field $f_i$ of e is written as $e.f_i$.

• Given a category c, we write c-set to denote the type of sets having elements of category c.

• The type Identifier classifies identifiers.
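The map notations above can be mirrored with Python dictionaries. The following is only an illustrative sketch: the helper names dom, exclude, and overwrite are assumptions introduced here and do not appear in the paper.

```python
from typing import Dict, Hashable, Set, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def dom(m: Dict[K, V]) -> Set[K]:
    """Dom(m): the domain of the map m."""
    return set(m)


def exclude(m: Dict[K, V], *keys: K) -> Dict[K, V]:
    """m with the associations for the given keys removed (the subscript notation)."""
    return {k: v for k, v in m.items() if k not in keys}


def overwrite(m: Dict[K, V], m2: Dict[K, V]) -> Dict[K, V]:
    """The overwriting of m by m2: associations of m2 take priority over those of m."""
    return {**m, **m2}


# Rebinding x in an environment, in the style used by the typing rules.
gamma = {"x": "int", "y": "bool"}
extended = overwrite(exclude(gamma, "x"), {"x": "Unit"})
```

Both helpers return new dictionaries, so the original map is left untouched, which matches the purely functional reading of the notation.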

4. EFFECT-BASED TYPE SYSTEM

It is well known that the Hindley-Milner type discipline [25] for polymorphic type inference in functional programming languages is problematic in the presence of non-referentially transparent constructs. More precisely, the problem concerns type generalization in the presence of mutable data. Many solutions have therefore been proposed [8, 10, 11, 28]. The type and effect discipline adopted in this paper to solve this problem is a variant of that of Talpin and Jouvelot [11]. In this section, we give an overview of this discipline, which defines a static system that reconstructs the types, regions, and effects of expressions in an implicitly typed functional language that supports imperative operations on reference values. Static and dynamic semantics for the syntax of Section 3 according to this discipline are provided next.

4.1 Static Semantics

The static semantics presented in Figure 3 depends on three static domains, which are shown in Figure 2: regions, effects, and types. Regions are intended to abstract memory locations and represent sets of possibly aliased reference values. The domain of regions ρ is the disjoint union of a countable set of constants ranged over by r and variables ranged over by γ. Every data structure corresponds to a given region in the static semantics. This region abstracts the memory locations in which it will be allocated at run time. Two values are in the same region if they may share some memory locations.

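$$
\begin{array}{lcl}
\text{Region} \ni \rho & ::= & r \mid \gamma \\
\text{Effect} \ni \eta & ::= & \emptyset \mid \varsigma \mid \eta \cup \eta \mid init(\rho, \tau) \mid read(\rho, \tau) \mid write(\rho, \tau) \\
\text{Type} \ni \tau & ::= & int \mid bool \mid Unit \mid \tau \xrightarrow{\eta} \tau \mid ref_{\rho}(\tau) \mid \alpha \\
\text{TypeScheme} \ni \sigma & ::= & \tau \mid \forall \upsilon\, \sigma \\
\text{FunctionTypeScheme} \ni \varphi & ::= & \tau \xrightarrow{\eta} \tau \mid \forall \upsilon\, \varphi \\
\text{RefTypeScheme} \ni \phi & ::= & ref_{\rho}(\tau) \mid \forall \upsilon\, \phi
\end{array}
$$

Figure 2: Regions, Effects, and Types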

Effects represent approximations of the imperative behavior on regions. Basic effects η can be the constant ∅ that represents the absence of effects, an effect variable ς, init(ρ, τ) that stands for the allocation of a reference in a region ρ to a value of type τ, read(ρ, τ) that describes accesses to references in region ρ, or write(ρ, τ) that represents the assignments of values to references in the region ρ. Effects can be gathered together with the infix ∪ that denotes the union of effects. Effects define a set algebra. Hence, the equality on effects is defined modulo associativity, commutativity, and idempotence with ∅ as a neutral element. The considered types are:

• int

• bool

• Unit

• functional types $\tau \xrightarrow{\eta} \tau'$ from τ to τ′ with a latent effect η, where the latent effect of a function is the effect incurred when the function is applied; it encapsulates the side effects of the function body.

• reference types $ref_{\rho}(\tau)$ in region ρ to values of type τ.

• type variables α.

Type schemes are of the form $\forall \upsilon_1 \ldots \upsilon_n\, \tau$ where each υ can be a type, region, or effect variable. FunctionTypeScheme is defined to be used with the call pointcut while RefTypeScheme is defined to be used with the set and get pointcuts. In the effect-based type system, the typing judgment Γ ⊢ e : τ, η states that an expression e has type τ and effect η under some typing environment Γ, where a typing environment Γ maps variables to type schemes. The relation of type instance between two types states that a type τ′ is a type instance of a type τ if there exists a substitution θ such that θτ = τ′, where a substitution θ is a mapping from type variables to types, from region variables to regions, and from effect variables to effects. On the other hand, the relation of generic type instance between a type and a type scheme states that a type τ′ is a generic type instance of a type scheme $\sigma = \forall \upsilon_1 \ldots \upsilon_n\, \tau$ (written as σ ≻ τ′) if there exists a substitution θ defined over $\upsilon_1 \ldots \upsilon_n$ such that θτ = τ′.

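The notion of free variables and the notion of generalization are needed to handle the typing of let expressions. Type generalization states that a variable cannot be generalized if it is free in the typing environment Γ or if it is present in the inferred effect. This is represented by the following function Gen:

$$
Gen(\Gamma, \tau, \eta) = \forall\, \mathcal{F}(\tau) \setminus (\mathcal{F}(\Gamma) \cup \mathcal{F}(\eta))\; \tau
$$

where $\mathcal{F}(-)$ denotes the set of free variables as defined in the following:

$$
\begin{array}{lcl}
\mathcal{F}(int) & = & \{\,\} \\
\mathcal{F}(bool) & = & \{\,\} \\
\mathcal{F}(Unit) & = & \{\,\} \\
\mathcal{F}(\alpha) & = & \{\alpha\} \\
\mathcal{F}(\tau_1 \xrightarrow{\eta} \tau_2) & = & \mathcal{F}(\tau_1) \cup \mathcal{F}(\tau_2) \cup \mathcal{F}(\eta) \\
\mathcal{F}(ref_{\rho}(\tau)) & = & \mathcal{F}(\rho) \cup \mathcal{F}(\tau) \\
\mathcal{F}(\forall \upsilon_1 \ldots \upsilon_n\, \tau) & = & \mathcal{F}(\tau) \setminus \{\upsilon_1, \ldots, \upsilon_n\} \\
\mathcal{F}(\Gamma) & = & \bigcup_{x \in Dom(\Gamma)} \mathcal{F}(\Gamma(x)) \\
\mathcal{F}(\emptyset) & = & \{\,\} \\
\mathcal{F}(\varsigma) & = & \{\varsigma\} \\
\mathcal{F}(init(\rho, \tau)) & = & \mathcal{F}(\rho) \cup \mathcal{F}(\tau) \\
\mathcal{F}(read(\rho, \tau)) & = & \mathcal{F}(\rho) \cup \mathcal{F}(\tau) \\
\mathcal{F}(write(\rho, \tau)) & = & \mathcal{F}(\rho) \cup \mathcal{F}(\tau) \\
\mathcal{F}(\eta \cup \eta') & = & \mathcal{F}(\eta) \cup \mathcal{F}(\eta') \\
\mathcal{F}(r) & = & \{\,\} \\
\mathcal{F}(\gamma) & = & \{\gamma\}
\end{array}
$$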
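The free-variable function and Gen can be sketched in Python as follows. This is an illustration under stated assumptions rather than the paper's implementation: the class names (Base, TVar, Region, EVar, Access, Fun, Ref, Scheme) and the functions fv and gen are hypothetical, effects are modeled as frozensets so that ∪ is plain set union, and type, region, and effect variables are assumed to use distinct names.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Base:            # int, bool, Unit
    name: str

@dataclass(frozen=True)
class TVar:            # type variable
    name: str

@dataclass(frozen=True)
class Region:          # region constant (is_var=False) or region variable (is_var=True)
    name: str
    is_var: bool = False

@dataclass(frozen=True)
class EVar:            # effect variable
    name: str

@dataclass(frozen=True)
class Access:          # init, read or write on a region, at a given type
    kind: str
    region: Region
    type: object

@dataclass(frozen=True)
class Fun:             # function type with a latent effect (a frozenset)
    arg: object
    effect: frozenset
    res: object

@dataclass(frozen=True)
class Ref:             # reference type in a region
    region: Region
    type: object

@dataclass(frozen=True)
class Scheme:          # type scheme: bound variable names and a body
    bound: FrozenSet[str]
    body: object


def fv(x: object) -> FrozenSet[str]:
    """Free variables of a type, region, effect, scheme, or environment."""
    if isinstance(x, Base):
        return frozenset()
    if isinstance(x, (TVar, EVar)):
        return frozenset({x.name})
    if isinstance(x, Region):
        return frozenset({x.name}) if x.is_var else frozenset()
    if isinstance(x, (Access, Ref)):
        return fv(x.region) | fv(x.type)
    if isinstance(x, Fun):
        return fv(x.arg) | fv(x.res) | fv(x.effect)
    if isinstance(x, Scheme):
        return fv(x.body) - x.bound
    if isinstance(x, frozenset):      # an effect: union over its elements
        return frozenset().union(*map(fv, x))
    if isinstance(x, dict):           # a typing environment
        return frozenset().union(*map(fv, x.values()))
    raise TypeError(f"unsupported value: {x!r}")


def gen(env: dict, ty: object, effect: frozenset) -> Scheme:
    """Generalize the variables of ty that are free neither in env nor in effect."""
    return Scheme(fv(ty) - (fv(env) | fv(effect)), ty)


a = TVar("a")
g = Region("g", is_var=True)
ident = Fun(a, frozenset(), a)                    # type of the identity function
pure = gen({}, ident, frozenset())                # no effect: a is generalized
cell = Ref(g, ident)                              # type of ref(identity)
impure = gen({}, cell, frozenset({Access("init", g, ident)}))
```

The second call shows why the effect matters: the init effect mentions both the type variable and the region variable, so nothing is generalized for the reference, which is the restriction that keeps polymorphic references sound.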

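Subeffecting [11] is introduced by the following rule, where η ⊆ η′ if and only if there exists an effect η′′ such that η′ = η ∪ η′′:

$$
\frac{\Gamma \vdash e : \tau, \eta \qquad \eta \subseteq \eta'}{\Gamma \vdash e : \tau, \eta'}
$$

$$
\frac{TypeOf(c) \succ \tau}{\Gamma \vdash c : \tau, \emptyset}\ (\text{const})
$$

$$
\frac{x : \sigma \in \Gamma \qquad \sigma \succ \tau}{\Gamma \vdash x : \tau, \emptyset}\ (\text{var})
$$

$$
\frac{\Gamma_x \dagger [x \mapsto \tau_1] \vdash e : \tau_2, \eta}{\Gamma \vdash \lambda x.e : \tau_1 \xrightarrow{\eta} \tau_2, \emptyset}\ (\text{abs})
$$

$$
\frac{\Gamma \vdash e_1 : \tau_1 \xrightarrow{\eta} \tau_2, \eta' \qquad \Gamma \vdash e_2 : \tau_1, \eta''}{\Gamma \vdash e_1 e_2 : \tau_2, \eta \cup \eta' \cup \eta''}\ (\text{app})
$$

$$
\frac{\Gamma \vdash e_1 : \tau_1, \eta \qquad \Gamma \vdash e_2 : \tau_2, \eta'}{\Gamma \vdash e_1; e_2 : \tau_2, \eta \cup \eta'}\ (\text{seq})
$$

$$
\frac{\Gamma_{x,f} \dagger [x \mapsto \tau_1, f \mapsto \tau_1 \xrightarrow{\eta} \tau] \vdash e_1 : \tau, \eta \qquad \Gamma_f \dagger [f \mapsto Gen(\Gamma, \tau_1 \xrightarrow{\eta} \tau, \eta)] \vdash e_2 : \tau_2, \eta'}{\Gamma \vdash \mathbf{let\ rec}\ f\ x = e_1\ \mathbf{in}\ e_2 : \tau_2, \eta \cup \eta'}\ (\text{letrec})
$$

$$
\frac{\Gamma \vdash e_1 : \tau_1, \eta \qquad \Gamma_x \dagger [x \mapsto Gen(\Gamma, \tau_1, \eta)] \vdash e_2 : \tau_2, \eta'}{\Gamma \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 : \tau_2, \eta \cup \eta'}\ (\text{let})
$$

$$
\frac{\Gamma \vdash e : \tau, \eta}{\Gamma \vdash \mathbf{ref}(e) : ref_{\rho}(\tau), \eta \cup init(\rho, \tau)}\ (\text{ref})
$$

$$
\frac{\Gamma \vdash e : ref_{\rho}(\tau), \eta}{\Gamma \vdash\ !e : \tau, \eta \cup read(\rho, \tau)}\ (\text{deref})
$$

$$
\frac{\Gamma \vdash e_1 : ref_{\rho}(\tau), \eta \qquad \Gamma \vdash e_2 : \tau, \eta'}{\Gamma \vdash e_1 := e_2 : Unit, \eta \cup \eta' \cup write(\rho, \tau)}\ (\text{assign})
$$

Figure 3: Static Semantics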

Talpin and Jouvelot have shown that the static and the dynamic semantics are consistent with respect to a structural relation between values and types.

4.2 Dynamic Semantics

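$$
\begin{array}{lcl}
\text{Value} \ni v & = & c + \text{Ref} + \{u\} + \text{Closure} \\
\text{Closure} \ni c & = & \text{Var} \times \text{Exp} \times \text{Env} \\
\text{Store} \ni s & = & \text{Ref} \xrightarrow[m]{} \text{Value} \\
\text{Env} \ni \Sigma & = & \text{Var} \xrightarrow[m]{} \text{Value} \\
\text{Ref} \ni l & & \\
\text{Var} \ni x & &
\end{array}
$$

Figure 4: Computable Values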

The dynamic semantics presented in Figure 5 specifies the evaluation of expressions. Values are either the constant c, the command value u, reference values l, or closures c. A dynamic environment Σ maps variables to values. A closure c takes the form ⟨x, e, Σ⟩ where x is the formal parameter, e is the body of the function, and Σ is an environment which maps each free variable of e to the value it assumes at the time of the declaration of the function. A store s is a finite map from references to values. A trace a is a set of labeled reference values that indicate initialized, read, and written locations. A trace is the dynamic counterpart of a static side effect (described in subsection 4.1). Given a store s and an environment Σ, the dynamic semantics associates an expression e with the value v it computes, the trace a of the side effects it performs during its evaluation, and the possibly updated store s′. This is written s, Σ ⊢ e → v, a, s′.

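$$
s, \Sigma \vdash c \rightarrow c, \emptyset, s\ \ (\text{const})
$$

$$
\frac{x \in Dom(\Sigma)}{s, \Sigma \vdash x \rightarrow \Sigma(x), \emptyset, s}\ (\text{var})
$$

$$
s, \Sigma \vdash \lambda x.e \rightarrow \langle x, e, \Sigma_x \rangle, \emptyset, s\ \ (\text{abs})
$$

$$
\frac{s, \Sigma \vdash e_1 \rightarrow \langle x, e_1', \Sigma' \rangle, a, s' \qquad s', \Sigma \vdash e_2 \rightarrow v_2, a', s'' \qquad s'', \Sigma' \dagger [x \mapsto v_2] \vdash e_1' \rightarrow v_3, a'', s'''}{s, \Sigma \vdash e_1 e_2 \rightarrow v_3, a \cup a' \cup a'', s'''}\ (\text{app})
$$

$$
\frac{s, \Sigma \vdash e_1 \rightarrow v_1, a, s' \qquad s', \Sigma \vdash e_2 \rightarrow v_2, a', s''}{s, \Sigma \vdash e_1; e_2 \rightarrow v_2, a \cup a', s''}\ (\text{seq})
$$

$$
\frac{c = \langle x, e_1, \Sigma_{f,x} \dagger [f \mapsto c] \rangle \qquad s, \Sigma \vdash e_1 \rightarrow c, a, s' \qquad s', \Sigma_f \dagger [f \mapsto c] \vdash e_2 \rightarrow v, a', s''}{s, \Sigma \vdash \mathbf{let\ rec}\ f\ x = e_1\ \mathbf{in}\ e_2 \rightarrow v, a \cup a', s''}\ (\text{letrec})
$$

$$
\frac{s, \Sigma \vdash e_1 \rightarrow v_1, a, s' \qquad s', \Sigma_x \dagger [x \mapsto v_1] \vdash e_2 \rightarrow v_2, a', s''}{s, \Sigma \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \rightarrow v_2, a \cup a', s''}\ (\text{let})
$$

$$
\frac{s, \Sigma \vdash e \rightarrow v, a, s' \qquad l \notin Dom(s')}{s, \Sigma \vdash \mathbf{ref}(e) \rightarrow l, a \cup \{init(l)\}, s' \dagger [l \mapsto v]}\ (\text{ref})
$$

$$
\frac{s, \Sigma \vdash e \rightarrow l, a, s'}{s, \Sigma \vdash\ !e \rightarrow s'(l), a \cup \{read(l)\}, s'}\ (\text{deref})
$$

$$
\frac{s, \Sigma \vdash e_1 \rightarrow l, a, s' \qquad s', \Sigma \vdash e_2 \rightarrow v, a', s''}{s, \Sigma \vdash e_1 := e_2 \rightarrow u, a \cup a' \cup \{write(l)\}, s''_l \dagger [l \mapsto v]}\ (\text{assign})
$$

Figure 5: Dynamic Semantics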
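The evaluation rules of Figure 5 can be read as a small interpreter. The Python sketch below covers every rule except letrec and is only an illustration: the tuple encoding of expressions, the names Loc, Closure, UNIT, and evaluate, and the choice of len(store) as the fresh location are assumptions made here. It returns the triple (value, trace, store) of the judgment s, Σ ⊢ e → v, a, s′.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loc:
    """A reference value l (a store location)."""
    addr: int


@dataclass
class Closure:
    """A closure made of a formal parameter, a body, and a captured environment."""
    param: str
    body: tuple
    env: dict


UNIT = "u"  # the command value returned by assignments


def evaluate(s: dict, env: dict, e: tuple):
    """Evaluate e in store s and environment env; return (value, trace, store)."""
    kind = e[0]
    if kind == "const":
        return e[1], frozenset(), s
    if kind == "var":
        return env[e[1]], frozenset(), s
    if kind == "lam":
        _, x, body = e
        captured = {k: v for k, v in env.items() if k != x}
        return Closure(x, body, captured), frozenset(), s
    if kind == "app":
        clo, a1, s1 = evaluate(s, env, e[1])
        v2, a2, s2 = evaluate(s1, env, e[2])
        v3, a3, s3 = evaluate(s2, {**clo.env, clo.param: v2}, clo.body)
        return v3, a1 | a2 | a3, s3
    if kind == "seq":
        _, a1, s1 = evaluate(s, env, e[1])
        v2, a2, s2 = evaluate(s1, env, e[2])
        return v2, a1 | a2, s2
    if kind == "let":
        _, x, e1, e2 = e
        v1, a1, s1 = evaluate(s, env, e1)
        v2, a2, s2 = evaluate(s1, {**env, x: v1}, e2)
        return v2, a1 | a2, s2
    if kind == "ref":
        v, a, s1 = evaluate(s, env, e[1])
        loc = Loc(len(s1))  # fresh because stores only grow from address 0 upward
        return loc, a | {("init", loc)}, {**s1, loc: v}
    if kind == "deref":
        loc, a, s1 = evaluate(s, env, e[1])
        return s1[loc], a | {("read", loc)}, s1
    if kind == "assign":
        loc, a1, s1 = evaluate(s, env, e[1])
        v, a2, s2 = evaluate(s1, env, e[2])
        return UNIT, a1 | a2 | {("write", loc)}, {**s2, loc: v}
    raise ValueError(f"unsupported expression: {kind}")


# let r = ref 1 in (r := 2; !r)
program = ("let", "r", ("ref", ("const", 1)),
           ("seq", ("assign", ("var", "r"), ("const", 2)),
                   ("deref", ("var", "r"))))
value, trace, store = evaluate({}, {}, program)
```

The trace collected for this program contains one init, one write, and one read on the same location, which is the dynamic counterpart of the init, write, and read effects that the static semantics would infer.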

5. DATAFLOW POINTCUT

The dataflow pointcut is an important pointcut that analyzes information flow in a program execution to detect input validation vulnerabilities. For example, in the case of cross-site scripting (XSS), the dataflow pointcut can intercept any join point that prints an unauthorized string to a client. An unauthorized string is a string that is created from one of the client’s input parameters. Sanitizing is then used to replace characters that come from untrusted principals with quoted characters. The dataflow pointcut dflow[x,x’](p) as defined by Masuhara and Kawauchi [22] matches if there is a dataflow from x’ to x. Variable x should be bound to a value in the current join point whereas variable x’ should be bound to a value in a past join point matching p. Therefore, Masuhara and Kawauchi’s dataflow pointcut must be used in conjunction with some other pointcut that binds x to a value in the current join point. The dflow pointcut as defined by them for the sanitizing task in web applications is presented in Figure 6. The second line matches calls to print methods in Servlet subclasses, and binds the parameter string to variable o. The dflow pointcut restricts the join points to those whose parameter string originates from a return value of getParameter in a past join point.

    pointcut respondClientString(String o) :
        call(* PrintWriter.print*(String)) && args(o) && within(Servlet+)
        && dflow[o, i](call(String Request.getParameter(String)) && returns(i));

Figure 6: Masuhara and Kawauchi’s Dataflow Pointcut
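To make the intent of Figure 6 concrete, the following Python sketch mimics its behavior dynamically. It is not AspectJ and not the static approach developed in this paper: the class Tagged and the functions get_parameter and print_to_client are hypothetical stand-ins for Request.getParameter, PrintWriter.print, and the sanitizing advice, and the tag name "client-input" is an assumption.

```python
import html
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Tagged:
    """A string together with the tags of the sources it depends on."""
    text: str
    tags: FrozenSet[str] = frozenset()

    def __add__(self, other: "Tagged") -> "Tagged":
        # A concatenation is data-dependent on both of its operands.
        return Tagged(self.text + other.text, self.tags | other.tags)


def get_parameter(request: Dict[str, str], name: str) -> Tagged:
    """Source join point: the returned value originates from client input."""
    return Tagged(request[name], frozenset({"client-input"}))


def print_to_client(out: List[str], s: Tagged) -> None:
    """Sink join point: sanitize when there is a dataflow from a tagged source."""
    if "client-input" in s.tags:
        s = Tagged(html.escape(s.text))  # plays the role of the sanitizing advice
    out.append(s.text)


out: List[str] = []
request = {"name": "<b>Eve</b>"}
greeting = Tagged("Hello, ") + get_parameter(request, "name")
print_to_client(out, greeting)        # depends on client input: sanitized
print_to_client(out, Tagged("<hr>"))  # trusted markup: printed unchanged
```

Only the first call is affected, because only its argument carries the tag introduced at the getParameter join point. The sketch escapes the whole string for simplicity, whereas a finer-grained sanitizer would quote only the untrusted characters.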

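$$
E, q, m \vdash_d c : \{\,\}, m\ \ (\text{const})
$$

$$
\frac{x : t \in E \qquad \Gamma \vdash x : \tau, \eta}{E, q, m \vdash_d x : M(\tau, m) \cup t, m}\ (\text{var})
$$

$$
\frac{E_x \dagger [x \mapsto \{\,\}], q, m \vdash_d e : t, m'}{E, q, m \vdash_d \lambda x.e : t, m'}\ (\text{abs})
$$

$$
\frac{E, q, m \vdash_d e_1 : t_1, m' \qquad E, q, m' \vdash_d e_2 : t_2, m'' \qquad \Gamma \vdash e_1 : \tau, \eta \qquad \Gamma \vdash e_1 e_2 : \tau', \eta'}{E, q, m \vdash_d e_1 e_2 : M(\tau', m'') \cup t, m''}\ (\text{app})
\quad \text{where } t = t_1 \cup t_2 \cup searchTagCall(e_1, \tau, t_2, q)
$$

$$
\frac{E, q, m \vdash_d e_1 : t_1, m' \qquad E, q, m' \vdash_d e_2 : t_2, m''}{E, q, m \vdash_d e_1; e_2 : t_2, m''}\ (\text{seq})
$$

$$
\frac{E_{x,f} \dagger [x \mapsto \{\,\}, f \mapsto \{\,\}], q, m \vdash_d e_1 : t_1, m' \qquad E_f \dagger [f \mapsto t_1], q, m' \vdash_d e_2 : t_2, m''}{E, q, m \vdash_d \mathbf{let\ rec}\ f\ x = e_1\ \mathbf{in}\ e_2 : t_2, m''}\ (\text{letrec})
$$

$$
\frac{E, q, m \vdash_d e_1 : t_1, m' \qquad E_x \dagger [x \mapsto t_1], q, m' \vdash_d e_2 : t_2, m''}{E, q, m \vdash_d \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 : t_2, m''}\ (\text{let})
$$

$$
\frac{E, q, m \vdash_d e : t, m'}{E, q, m \vdash_d \mathbf{ref}(e) : t, m'}\ (\text{ref})
$$

$$
\frac{E, q, m \vdash_d e : t, m' \qquad \Gamma \vdash e : \tau, \eta \qquad \Gamma \vdash\ !e : \tau', \eta'}{E, q, m \vdash_d\ !e : M(\tau', m') \cup t', m'}\ (\text{deref})
\quad \text{where } t' = t \cup searchTagGet(e, \tau, t, q)
$$

$$
\frac{E, q, m \vdash_d e_1 : t_1, m' \qquad E, q, m' \vdash_d e_2 : t_2, m'' \qquad \Gamma \vdash e_1 : ref_{\rho}(\tau), \eta}{E, q, m \vdash_d e_1 := e_2 : t, m''_{\rho} \dagger [\rho \mapsto t_2]}\ (\text{assign})
\quad \text{where } t = t_1 \cup t_2 \cup searchTagSet(e_1, ref_{\rho}(\tau), t_2, q)
$$

Figure 7: Static Tagging Rules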

In the sequel, we introduce a static semantics for tag propagation and prove that it is consistent with respect to the dynamic semantics of the propagation using the concepts of Talpin and Jouvelot’s type discipline [11].

5.1 Static Approach

A formal static framework is defined to match and weave dataflow pointcuts. Dataflow tags are associated with expressions and propagated statically to track data dependencies. The set of tags that are associated with an expression is determined by the tagging rules specified in Figure 7.

The dataflow pointcut has a pointcut as a part of its syntax together with a tag that discriminates this dataflow pointcut from other dataflow pointcuts. In general, the pointcut of a dataflow pointcut should match join points where values are defined, so it can be a call or a get pointcut. If an expression matches the pointcut of a dataflow pointcut, this expression is tagged with the tag of this dataflow pointcut. This tag is then propagated to other expressions that are data-dependent on the expression that matches the pointcut of the dataflow pointcut. Finally, if another expression represents a join point, i.e., a call, dereferencing, or assignment expression, and is tagged with the tag of this dataflow pointcut, then it depends on the expression that matches the enclosed pointcut and therefore matches the corresponding dataflow pointcut. The maximum number of tags in a set associated with an expression is equal to the number of defined dataflow pointcuts.

In general, an expression e matches the dataflow pointcut 〈pkind:dflow, tag:n, pcd:p〉 if e represents a join point, e is data-dependent on a previous expression e′, and e′ matches p. To track data dependencies between expressions, all the expressions that match p, of which e′ is one, are tagged with the tag n of the dataflow pointcut, and n is then transmitted according to the defined tagging rules to the expressions that are data-dependent on e′, of which e is one. Accordingly, if e is an application expression, e matches the dataflow pointcut 〈pkind:dflow, tag:n, pcd:p〉 if its argument is tagged with n. If e is an assignment expression, it matches the pointcut if the right-hand side of the assignment operator is tagged with n. If e is a dereferencing expression, it matches the pointcut if its argument is tagged with n. This means that these expressions are data-dependent on a previous expression that matches the pointcut p of this dataflow pointcut.

The dataflow judgment E, q, m ⊢d e : t, m′ specifies that an expression e is associated with a set of tags t in the presence of a sequence of defined advices q, where a tagging environment E maps variables to tag sets. The concept of a tagging environment E is similar to that of a typing environment Γ, and the domains of both environments are equal. A mapping m stores mappings from regions to tag sets. The mapping m′ is the version of m obtained after tagging the expression e. We assume that expressions are α-converted.

The tagging rules presented in Figure 7 and the typing rules presented in the sequel in Figure 8 are related to each other, but they are separated to enhance readability. The function M checks whether the type is a reference type in region ρ and whether ρ is associated with a tag set t in the mapping m. If so, it returns t. Otherwise, it returns the empty set. Comparing types relies on pattern matching and not on type unification, because unification changes both types to make them equal, whereas here we only need to check whether the type is a reference type. The functions searchTagCall, searchTagGet, and searchTagSet work on a sequence of advices and return a set of tags. The tags returned by the first one discriminate the dataflow pointcuts whose enclosed pointcuts match an application expression; the tags returned by the second one discriminate the dataflow pointcuts whose enclosed pointcuts match a dereferencing expression; and the tags returned by the third one discriminate the dataflow pointcuts whose enclosed pointcuts match an assignment expression. We refer the reader to [14] for the formal definitions of the utility functions used in this paper.


Now let us turn to the explanation of the tagging rules of Figure 7. Constants are associated with empty sets of tags. The tagging of variables is dictated by the tagging environment. Besides, the function M accounts for the case where the variable has a reference type. For a function abstraction, the tags depend on the tags that are associated with its subexpression. For an application expression, the tag set contains the tags of the function expression and the tags of the argument. Besides, the function M accounts for the case where the application expression has a reference type. Moreover, the tag set contains the tags retrieved by the function searchTagCall. These tags discriminate dataflow pointcuts whose enclosed pointcuts match the corresponding application expression. The tags of a sequence expression are the tags of its second subexpression.

For a recursive let expression, the tags are those of its second subexpression, computed in a tagging environment extended with an assumption for the function name. The tags of the function name are set to the tags associated with the first subexpression, which is itself tagged in an environment extended with assumptions for the function name and its parameter. A similar explanation applies to the let expression: its tags are those of its second subexpression, computed in a tagging environment extended with an assumption for the bound variable.

The tags of a reference expression are the tags of its subexpression. For a dereferencing expression, the tag set contains the tags that are associated with its subexpression. Besides, the function M accounts for the case where the dereferencing expression has a reference type. Moreover, the tag set contains the tags retrieved by the function searchTagGet. These tags discriminate dataflow pointcuts whose enclosed pointcuts match the corresponding dereferencing expression. For an assignment expression, the tag set contains the tags of its first subexpression and the tags of its second subexpression. Moreover, it contains the tags retrieved by the function searchTagSet. These tags discriminate dataflow pointcuts whose enclosed pointcuts match the corresponding assignment expression. After tagging the assignment expression, and to record the fact that the left-hand side of the assignment depends on the right-hand side, the mapping m is changed so that the region of the left-hand side is associated with the tag set of the right-hand side. The resulting set is then used whenever the left-hand side subexpression appears elsewhere, which maintains the data dependency between expressions.

The following example demonstrates the basic ideas related to the dataflow pointcut:

let x = ref 3 in let y = ref 4 in let f = λz.z in x := !y; f(x)

The dataflow pointcut 〈pkind:dflow, tag:k, pcd:p〉 is matched by the expressions !y, x := !y, and f(x), where the pointcut p picks out join points where we dereference a variable y of type ∀αρ.refρ(α). This is justified by the following reasons:

• The expression !y satisfies the pointcut p and consequently !y is tagged with the tag k of the dataflow pointcut according to the (deref) static tagging rule.

• The assignment expression x := !y is then tagged with k according to the (assign) static tagging rule.

• x depends on dereferencing y. Accordingly, the region ρ of x is associated with the tag k and stored in the mapping m as indicated in the (assign) tagging rule. Afterward, the tag associated with the region ρ of x is retrieved from the mapping m wherever x appears using the function M. Hence, x in f(x) is tagged with k according to the (var) static tagging rule.

• Finally, we conclude that the expressions !y, x := !y, and f(x) match the defined pointcut because they are tagged with k.
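The reasoning above can be replayed mechanically. The following Java sketch is only an illustration under simplifying assumptions: the mini-AST (`Var`, `Deref`, `Assign`, `App`), the region names `rho_x` and `rho_y`, and the single hard-wired pointcut "dereference of `sourceVar`" are hypothetical and stand in for the typing judgment and the searchTag functions, which are not modeled.

```java
import java.util.*;

// Hypothetical mini-AST covering only what the example x := !y; f(x) needs.
class StaticTagSketch {
    interface Expr {}
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Deref implements Expr {
        final Expr target;
        Deref(Expr target) { this.target = target; }
    }
    static final class Assign implements Expr {
        final Var lhs; final Expr rhs;
        Assign(Var lhs, Expr rhs) { this.lhs = lhs; this.rhs = rhs; }
    }
    static final class App implements Expr {
        final Var fn; final Expr arg;
        App(Var fn, Expr arg) { this.fn = fn; this.arg = arg; }
    }

    final Map<String, String> regionOf = new HashMap<>();  // reference-typed variable -> region
    final Map<String, Set<String>> env = new HashMap<>();  // tagging environment E
    final Map<String, Set<String>> m = new HashMap<>();    // mapping m: region -> tag set
    final String tag;        // tag k of the single dataflow pointcut (assumption: only one)
    final String sourceVar;  // enclosed pointcut p, simplified to "dereference of sourceVar"

    StaticTagSketch(String tag, String sourceVar) { this.tag = tag; this.sourceVar = sourceVar; }

    Set<String> tags(Expr e) {
        Set<String> t = new TreeSet<>();
        if (e instanceof Var) {                 // (var): M(tau, m) union E(x)
            String name = ((Var) e).name;
            t.addAll(env.getOrDefault(name, Collections.emptySet()));
            String rho = regionOf.get(name);
            if (rho != null) t.addAll(m.getOrDefault(rho, Collections.emptySet()));
        } else if (e instanceof Deref) {        // (deref): t union searchTagGet
            Expr target = ((Deref) e).target;
            t.addAll(tags(target));
            if (target instanceof Var && ((Var) target).name.equals(sourceVar)) t.add(tag);
        } else if (e instanceof Assign) {       // (assign): t1 union t2, then m[rho -> t2]
            Assign a = (Assign) e;
            t.addAll(tags(a.lhs));
            Set<String> t2 = tags(a.rhs);
            t.addAll(t2);
            m.put(regionOf.get(a.lhs.name), t2);
        } else if (e instanceof App) {          // (app): t1 union t2
            App a = (App) e;
            t.addAll(tags(a.fn));
            t.addAll(tags(a.arg));
        }
        return t;
    }

    // Tags of !y, x := !y, and f(x), computed in that order.
    static List<Set<String>> runExample() {
        StaticTagSketch s = new StaticTagSketch("k", "y");
        s.regionOf.put("x", "rho_x");
        s.regionOf.put("y", "rho_y");
        Var x = new Var("x"), y = new Var("y"), f = new Var("f");
        List<Set<String>> result = new ArrayList<>();
        result.add(s.tags(new Deref(y)));
        result.add(s.tags(new Assign(x, new Deref(y))));
        result.add(s.tags(new App(f, x)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runExample());
    }
}
```

In this sketch all three expressions end up with the tag set containing k, and the tag reaches f(x) only through the region entry written by the assignment, which mirrors the role of the mapping m.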

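\[
\frac{TypeOf(c) \succ \tau}
     {\Gamma, q \vdash c : \tau, \emptyset \rightsquigarrow c} \quad \textrm{(const)}
\]

\[
\frac{x : \sigma \in \Gamma \qquad \sigma \succ \tau}
     {\Gamma, q \vdash x : \tau, \emptyset \rightsquigarrow x} \quad \textrm{(var)}
\]

\[
\frac{\Gamma_x \dagger [x \mapsto \tau_1], q \vdash e : \tau_2, \eta \rightsquigarrow e' \qquad \Gamma \vdash \lambda x.e' : \tau', \eta'}
     {\Gamma, q \vdash \lambda x.e : \tau_1 \xrightarrow{\eta} \tau_2, \emptyset \rightsquigarrow \lambda x.e'} \quad \textrm{(abs)}
\]

\[
\frac{\begin{array}{c}
\Gamma, q \vdash e_1 : \tau_1 \xrightarrow{\eta} \tau_2, \eta' \rightsquigarrow e'_1 \qquad \Gamma, q \vdash e_2 : \tau_1, \eta'' \rightsquigarrow e'_2 \\
E, q, m \vdash_d e_2 : t \qquad q' = f_{app}(e_1, \tau_1 \xrightarrow{\eta} \tau_2, t, q) \\
\langle e'_1 e'_2, q' \rangle \hookrightarrow \langle e', \varepsilon \rangle \qquad \Gamma \vdash e' : \tau, \eta'''
\end{array}}
     {\Gamma, q \vdash e_1 e_2 : \tau_2, \eta \cup \eta' \cup \eta'' \rightsquigarrow e'} \quad \textrm{(app)}
\]

\[
\frac{\Gamma, q \vdash e_1 : \tau_1, \eta \rightsquigarrow e'_1 \qquad \Gamma, q \vdash e_2 : \tau_2, \eta' \rightsquigarrow e'_2 \qquad \Gamma \vdash e'_1; e'_2 : \tau', \eta''}
     {\Gamma, q \vdash e_1; e_2 : \tau_2, \eta \cup \eta' \rightsquigarrow e'_1; e'_2} \quad \textrm{(seq)}
\]

\[
\frac{\begin{array}{c}
\Gamma_{x,f} \dagger [x \mapsto \tau_1, f \mapsto \tau_1 \xrightarrow{\eta} \tau], q \vdash e_1 : \tau, \eta \rightsquigarrow e'_1 \\
\Gamma_f \dagger [f \mapsto Gen(\Gamma, \tau_1 \xrightarrow{\eta} \tau, \eta)], q \vdash e_2 : \tau_2, \eta' \rightsquigarrow e'_2 \\
e' = \mathbf{let\ rec}\ f\ x = e'_1\ \mathbf{in}\ e'_2 \qquad \Gamma \vdash e' : \tau_3, \eta''
\end{array}}
     {\Gamma, q \vdash \mathbf{let\ rec}\ f\ x = e_1\ \mathbf{in}\ e_2 : \tau_2, \eta \cup \eta' \rightsquigarrow e'} \quad \textrm{(letrec)}
\]

\[
\frac{\begin{array}{c}
\Gamma, q \vdash e_1 : \tau_1, \eta \rightsquigarrow e'_1 \qquad \Gamma_x \dagger [x \mapsto Gen(\Gamma, \tau_1, \eta)], q \vdash e_2 : \tau_2, \eta' \rightsquigarrow e'_2 \\
\Gamma \vdash \mathbf{let}\ x = e'_1\ \mathbf{in}\ e'_2 : \tau, \eta''
\end{array}}
     {\Gamma, q \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 : \tau_2, \eta \cup \eta' \rightsquigarrow \mathbf{let}\ x = e'_1\ \mathbf{in}\ e'_2} \quad \textrm{(let)}
\]

\[
\frac{\Gamma, q \vdash e : \tau, \eta \rightsquigarrow e'}
     {\Gamma, q \vdash \mathbf{ref}(e) : ref_\rho(\tau), \eta \cup init(\rho, \tau) \rightsquigarrow \mathbf{ref}(e')} \quad \textrm{(ref)}
\]

\[
\frac{\begin{array}{c}
\Gamma, q \vdash e : ref_\rho(\tau), \eta \rightsquigarrow e' \qquad E, q, m \vdash_d e : t \\
q' = f_{deref}(e, ref_\rho(\tau), t, q) \qquad \langle !e', q' \rangle \hookrightarrow \langle e_1, \varepsilon \rangle \qquad \Gamma \vdash e_1 : \tau_1, \eta'
\end{array}}
     {\Gamma, q \vdash\ !e : \tau, \eta \cup read(\rho, \tau) \rightsquigarrow e_1} \quad \textrm{(deref)}
\]

\[
\frac{\begin{array}{c}
\Gamma, q \vdash e_1 : ref_\rho(\tau_1), \eta \rightsquigarrow e'_1 \qquad \Gamma, q \vdash e_2 : \tau_1, \eta' \rightsquigarrow e'_2 \\
E, q, m \vdash_d e_2 : t \qquad q' = f_{assign}(e_1, ref_\rho(\tau_1), t, q) \\
\langle e'_1 := e'_2, q' \rangle \hookrightarrow \langle e', \varepsilon \rangle \qquad \Gamma \vdash e' : \tau, \eta''
\end{array}}
     {\Gamma, q \vdash e_1 := e_2 : Unit, \eta \cup \eta' \cup write(\rho, \tau_1) \rightsquigarrow e'} \quad \textrm{(assign)}
\]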

Figure 8: Type-Based Weaving Rules

Next, we use the effect-based type inference system to handle the weaving process as it appears in Figure 8. For this purpose, we define a new judgment as follows:

\[
\Gamma, q \vdash e : \tau, \eta \rightsquigarrow e'
\]

This new judgment states that expression e has type τ and effect η under some typing environment Γ and that it is translated through weaving to e′. The translated expression e′ is the weaving outcome that results when the applicable advices of the sequence q are woven into the expression e. An advice in q is said to be applicable to e if its pointcut matches e.


In the rules (const) and (var) of Figure 8, the translation makes no changes because there are no applicable advices to weave. In the rules (abs, seq, letrec, let, and ref) there are also no applicable advices; however, these rules record the fact that subexpressions may have been translated at previous steps. The rules (app, deref, and assign) are crucial because we want to pick out join points where we call a function, get a variable, or set a variable. Besides, these join points may match the defined dataflow pointcuts. It is essential in these rules to check whether any pointcut matches those join points. In case of matching, the applicable advices are injected according to their kinds. We assume that the advices are sorted in the sequence q according to their precedence.

The function fapp in the rule (app) picks out all the applicable advices whose pointcuts match an application expression. These pointcuts are call pointcuts, dataflow pointcuts, or a logical combination of them using boolean operators. The matching of a call pointcut depends on the name of a function and its type, whereas the matching of a dataflow pointcut in this case depends on whether the argument of the application expression is tagged with its tag. The function fderef in the rule (deref) picks out all the applicable advices whose pointcuts match a dereferencing expression. These pointcuts are get pointcuts, dataflow pointcuts, or a logical combination of them using boolean operators. The matching of a get pointcut depends on the name of a variable and its type, whereas the matching of a dataflow pointcut in this case depends on whether the argument of the dereferencing expression is tagged with its tag. The function fassign in the rule (assign) picks out all the applicable advices whose pointcuts match an assignment expression. These pointcuts are set pointcuts, dataflow pointcuts, or a logical combination of them using boolean operators. The matching of a set pointcut depends on the name of a variable and its type, whereas the matching of a dataflow pointcut in this case depends on whether the right-hand side of the assignment expression is tagged with its tag.

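\[
\circ\ e^a \quad \textrm{(proceed Axiom)}
\]

\[
\frac{\circ\ e^a}{\circ\ e^a e^a_1} \qquad
\frac{\circ\ e^a}{\circ\ e^a_1 e^a} \qquad
\frac{\circ\ e^a}{\circ\ \lambda x.e^a}
\]

\[
\frac{\circ\ e^a}{\circ\ \mathbf{let}\ x = e^a\ \mathbf{in}\ e^a_1} \qquad
\frac{\circ\ e^a}{\circ\ \mathbf{let}\ x = e^a_1\ \mathbf{in}\ e^a}
\]

\[
\frac{\circ\ e^a}{\circ\ \mathbf{let\ rec}\ f\ x = e^a\ \mathbf{in}\ e^a_1} \qquad
\frac{\circ\ e^a}{\circ\ \mathbf{let\ rec}\ f\ x = e^a_1\ \mathbf{in}\ e^a}
\]

\[
\frac{\circ\ e^a}{\circ\ e^a; e^a_1} \qquad
\frac{\circ\ e^a}{\circ\ e^a_1; e^a} \qquad
\frac{\circ\ e^a}{\circ\ \mathbf{ref}\ e^a} \qquad
\frac{\circ\ e^a}{\circ\ !e^a}
\]

\[
\frac{\circ\ e^a}{\circ\ e^a := e^a_1} \qquad
\frac{\circ\ e^a}{\circ\ e^a_1 := e^a}
\]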

Figure 9: Derivation of proceed Expressions

In Figure 8, the weaving configuration is represented by 〈Exp, AdvSeq〉. Hence, the rule 〈e, q〉 ↪→ 〈e′, ε〉 means that e′ is the result of weaving all the advices in q into e. Notice that ↪→ is transitive. The axiom ° ea defines that the expression ea of type ArExp contains proceed, whereas the axiom ± ea defines that the expression ea of type ArExp does not contain proceed. Derivations of proceed expressions and no proceed expressions are shown in Figure 9 and Figure 10 respectively. Hereafter, we give the weaving rules:

\[
\frac{q = a q' \qquad a.kind = before}
     {\langle e, q \rangle \hookrightarrow \langle a.exp; e,\ q' \rangle}
\]

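\[
\pm\ e^a \quad \textrm{(No proceed Axiom)}
\]

\[
\frac{\pm\ e^a \qquad \pm\ e^a_1}{\pm\ e^a e^a_1} \qquad
\frac{\pm\ e^a}{\pm\ \lambda x.e^a} \qquad
\frac{\pm\ e^a \qquad \pm\ e^a_1}{\pm\ \mathbf{let}\ x = e^a\ \mathbf{in}\ e^a_1}
\]

\[
\frac{\pm\ e^a \qquad \pm\ e^a_1}{\pm\ \mathbf{let\ rec}\ f\ x = e^a\ \mathbf{in}\ e^a_1} \qquad
\frac{\pm\ e^a \qquad \pm\ e^a_1}{\pm\ e^a; e^a_1}
\]

\[
\frac{\pm\ e^a}{\pm\ \mathbf{ref}\ e^a} \qquad
\frac{\pm\ e^a}{\pm\ !e^a} \qquad
\frac{\pm\ e^a \qquad \pm\ e^a_1}{\pm\ e^a := e^a_1}
\]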

Figure 10: Derivation of No proceed Expressions

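\[
\frac{q = a q' \qquad a.kind = after}
     {\langle e, q \rangle \hookrightarrow \langle \mathbf{let}\ tmp = e\ \mathbf{in}\ a.exp; tmp,\ q' \rangle}
\]

\[
\frac{\begin{array}{c}
q = a q' \qquad a.kind = around \qquad \Gamma \vdash e : \tau, \eta \qquad \Gamma \vdash a.exp : \tau', \eta' \\
\theta\tau = \theta\tau' \qquad \theta\eta \subseteq \theta\eta' \qquad \pm\ a.exp
\end{array}}
     {\langle e, q \rangle \hookrightarrow \langle a.exp,\ q' \rangle}
\]

\[
\frac{\begin{array}{c}
q = a q' \qquad a.kind = around \qquad \langle e, q' \rangle \hookrightarrow \langle e', \varepsilon \rangle \\
\Gamma \vdash e : \tau, \eta \qquad \Gamma \vdash (\lambda proceed.a.exp)\ e' : \tau', \eta' \\
\theta\tau = \theta\tau' \qquad \theta\eta \subseteq \theta\eta' \qquad \circ\ a.exp
\end{array}}
     {\langle e, q \rangle \hookrightarrow \langle (\lambda proceed.a.exp)\ e',\ \varepsilon \rangle}
\]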

The weaving process is in the spirit of AspectJ. The sequence construct ";" of the extended λ calculus is introduced to perform the injection. The before-advice in the first rule is inserted before the expressions that match its pointcut. The after-advice in the second rule is inserted after the expressions that match its pointcut. The value of the matched expression is bound to a temporary variable so that it is still returned after the advice body has executed. The around-advice without proceed in the third rule bypasses the computation of a join point. The around-advice with proceed in the fourth rule allows the advice with the next precedence to run, or the computation under the join point if there is no further advice. Besides, the type of the around-advice must be the same as, or an instance of, the type of the expression that matches its pointcut. In the following, we state and prove a result that establishes the preservation of the weaving process.

Theorem 1. (Preservation) If Γ ⊢ e : τ, η and 〈e, q〉 ↪→ 〈e′, ε〉 then Γ ⊢ e′ : τ′, η′ where there exists a substitution θ such that θτ = θτ′ and θη ⊆ θη′.

Proof. The proof is done by induction over the length of q. For detailed information, we refer the reader to [14].
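The four weaving rules given before Theorem 1 can be mimicked in a few lines of Java. This is a hedged sketch rather than the formal semantics: join points are modeled as `Supplier<Integer>`, advice bodies as `IntUnaryOperator`, and the `usesProceed` flag replaces the derivations of Figures 9 and 10; all of these names are hypothetical. As in the call-by-value rule (λproceed.a.exp) e′, the woven join point is evaluated first and its value is passed to the advice as proceed.

```java
import java.util.*;
import java.util.function.IntUnaryOperator;
import java.util.function.Supplier;

class WeaveSketch {
    enum Kind { BEFORE, AFTER, AROUND }

    static final class Advice {
        final Kind kind;
        final boolean usesProceed;    // only meaningful for AROUND advice
        final IntUnaryOperator body;  // a.exp; the argument is the value of proceed, ignored otherwise
        Advice(Kind kind, boolean usesProceed, IntUnaryOperator body) {
            this.kind = kind; this.usesProceed = usesProceed; this.body = body;
        }
    }

    // Weaves the advice sequence q (sorted by precedence) into the expression e.
    static Supplier<Integer> weave(Supplier<Integer> e, List<Advice> q) {
        if (q.isEmpty()) return e;                    // <e, epsilon>: nothing left to weave
        Advice a = q.get(0);
        List<Advice> rest = q.subList(1, q.size());   // q'
        switch (a.kind) {
            case BEFORE:                              // <e, q> -> <a.exp; e, q'>
                return weave(() -> { a.body.applyAsInt(0); return e.get(); }, rest);
            case AFTER:                               // <e, q> -> <let tmp = e in a.exp; tmp, q'>
                return weave(() -> { int tmp = e.get(); a.body.applyAsInt(0); return tmp; }, rest);
            default:
                if (!a.usesProceed) {                 // around without proceed: <e, q> -> <a.exp, q'>
                    return weave(() -> a.body.applyAsInt(0), rest);
                }
                Supplier<Integer> inner = weave(e, rest);      // <e, q'> -> <e', epsilon>
                return () -> a.body.applyAsInt(inner.get());   // (lambda proceed. a.exp) e'
        }
    }

    // Replays the shape of Example 1: a before-advice e1 woven into the call f 2.
    static String demo() {
        List<String> log = new ArrayList<>();
        Supplier<Integer> joinPoint = () -> { log.add("f 2"); return 2; };
        Advice a1 = new Advice(Kind.BEFORE, false, ignored -> { log.add("e1"); return 0; });
        int value = weave(joinPoint, Collections.singletonList(a1)).get();
        return log + " -> " + value;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The demo records the advice body before the join point and still returns the value of the join point, which is the behavior the first rule prescribes for e1; f 2.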

Example 1.

We present in Figure 11 the derivation according to the type-based weaving rules for the following expression, advice, and pointcut:

Expression: e = (let rec f x = x in f 2)
Advices: a1 ::= 〈akind: before, pcd: p1, exp: e1〉
Pointcuts: p1 ::= 〈pkind: call, var: f, scheme: ∀α.α → α〉

Derivation: The rules (const, var, app, and letrec) in Figure 8 are used in the derivation. Typing the expression f 2 is instrumented to match and weave the advice a1.


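\[
\Sigma, E, q, s, r \vdash c \rightarrow c : \{\}, \emptyset, s, r \quad \textrm{(const)}
\]

\[
\frac{x : v \in \Sigma \qquad v : t \in E}
     {\Sigma, E, q, s, r \vdash x \rightarrow v : N(v, r) \cup t, \emptyset, s, r} \quad \textrm{(var)}
\]

\[
\frac{\Sigma_x \dagger [x \mapsto v], E_x \dagger [v \mapsto \{\}], q, s, r \vdash e \rightarrow v : t, a, s', r'}
     {\Sigma, E, q, s, r \vdash \lambda x.e \rightarrow \langle x, e, E_x \rangle : t, \emptyset, s, r'} \quad \textrm{(abs)}
\]

\[
\frac{\begin{array}{c}
\Gamma \vdash e_1 : \tau, \eta \qquad \Gamma \vdash e_1 e_2 : \tau', \eta' \\
\Sigma, E, q, s, r \vdash e_1 \rightarrow \langle x, e'_1, \Sigma' \rangle : t_1, a, s', r' \qquad \Sigma, E, q, s', r' \vdash e_2 \rightarrow v'' : t_2, a', s'', r'' \\
\Sigma'_x \dagger [x \mapsto v''], E_{v''} \dagger [v'' \mapsto t_2], q, s'', r'' \vdash e'_1 \rightarrow v' : t'_1, a'', s''', r'''
\end{array}}
     {\Sigma, E, q, s, r \vdash e_1 e_2 \rightarrow v' : N(v', r''') \cup t,\ a \cup a' \cup a'',\ s''', r'''} \quad \textrm{(app)}
\]
where $t = t'_1 \cup \mathit{searchTagCall}(e_1, \tau, t_2, q)$

\[
\frac{\Sigma, E, q, s, r \vdash e_1 \rightarrow v_1 : t_1, a, s', r' \qquad \Sigma, E, q, s', r' \vdash e_2 \rightarrow v_2 : t_2, a', s'', r''}
     {\Sigma, E, q, s, r \vdash e_1; e_2 \rightarrow v_2 : t_2, a \cup a', s'', r''} \quad \textrm{(seq)}
\]

\[
\frac{\begin{array}{c}
c = \langle x, e_1, \Sigma_{f,x} \dagger [f \mapsto c] \rangle \qquad \Sigma, E, q, s, r \vdash e_1 \rightarrow c : t_1, a, s', r' \\
\Sigma_f \dagger [f \mapsto c], E_f \dagger [c \mapsto t_1], q, s', r' \vdash e_2 \rightarrow v : t_2, a', s'', r''
\end{array}}
     {\Sigma, E, q, s, r \vdash \mathbf{let\ rec}\ f\ x = e_1\ \mathbf{in}\ e_2 \rightarrow v : t_2, a \cup a', s'', r''} \quad \textrm{(letrec)}
\]

\[
\frac{\begin{array}{c}
\Sigma, E, q, s, r \vdash e_1 \rightarrow v_1 : t_1, a, s', r' \\
\Sigma_x \dagger [x \mapsto v_1], E_{v_1} \dagger [v_1 \mapsto t_1], q, s', r' \vdash e_2 \rightarrow v_2 : t_2, a', s'', r''
\end{array}}
     {\Sigma, E, q, s, r \vdash \mathbf{let}\ x = e_1\ \mathbf{in}\ e_2 \rightarrow v_2 : t_2, a \cup a', s'', r''} \quad \textrm{(let)}
\]

\[
\frac{\Sigma, E, q, s, r \vdash e \rightarrow v : t, a, s', r' \qquad l \notin Dom(s')}
     {\Sigma, E, q, s, r \vdash \mathbf{ref}(e) \rightarrow l : t,\ a \cup \{init(l)\},\ s' \dagger [l \mapsto v],\ r'} \quad \textrm{(ref)}
\]

\[
\frac{\begin{array}{c}
\Gamma \vdash e : \tau, \eta \qquad \Gamma \vdash\ !e : \tau', \eta' \\
\Sigma, E, q, s, r \vdash e \rightarrow l : t, a, s', r'
\end{array}}
     {\Sigma, E, q, s, r \vdash\ !e \rightarrow s'(l) : N(s'(l), r') \cup t',\ a \cup \{read(l)\},\ s', r'} \quad \textrm{(deref)}
\]
where $t' = t \cup \mathit{searchTagGet}(e, \tau, t, q)$

\[
\frac{\begin{array}{c}
\Gamma \vdash e_1 : ref_\rho(\tau), \eta \\
\Sigma, E, q, s, r \vdash e_1 \rightarrow l : t_1, a, s', r' \qquad \Sigma, E, q, s', r' \vdash e_2 \rightarrow v : t_2, a', s'', r''
\end{array}}
     {\Sigma, E, q, s, r \vdash e_1 := e_2 \rightarrow u : t,\ a \cup a' \cup \{write(l)\},\ s''_l \dagger [l \mapsto v],\ r''_l \dagger [l \mapsto t_2]} \quad \textrm{(assign)}
\]
where $t = t_1 \cup t_2 \cup \mathit{searchTagSet}(e_1, ref_\rho(\tau), t_2, q)$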

Figure 12: Dynamic Tagging Rules
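As an informal companion to the (ref), (deref), and (assign) rules of Figure 12, the sketch below keeps a store s and a reference-to-tags mapping r at run time. It is a simplification under stated assumptions: values are plain integers (so N applied to a stored value is always empty), locations are integers, and the `matchedTags` argument stands in for the result of searchTagGet. The class and method names are hypothetical.

```java
import java.util.*;

class DynamicTagSketch {
    final Map<Integer, Integer> store = new HashMap<>();        // s: location -> value
    final Map<Integer, Set<String>> refTags = new HashMap<>();  // r: location -> tag set

    // (ref): allocate a fresh location l that is not yet in Dom(s).
    int ref(int value) {
        int l = store.size();
        store.put(l, value);
        return l;
    }

    // N(l, r): the tags recorded for location l, or the empty set.
    Set<String> tagsOfLocation(int l) {
        return refTags.getOrDefault(l, Collections.emptySet());
    }

    // (deref) for !e where e evaluates to location l.
    Set<String> derefTags(int l, Set<String> matchedTags) {
        Set<String> t = new TreeSet<>(tagsOfLocation(l));
        t.addAll(matchedTags);
        return t;
    }

    // (assign) for e1 := e2 where e1 evaluates to l and e2 to value with tags t2.
    Set<String> assign(int l, int value, Set<String> t2) {
        Set<String> t = new TreeSet<>(tagsOfLocation(l));  // t1
        t.addAll(t2);
        store.put(l, value);                               // s[l -> v]
        refTags.put(l, new TreeSet<>(t2));                 // r[l -> t2]
        return t;
    }

    // Runs x := !y for x = ref 3, y = ref 4, with !y matching a pointcut tagged k.
    static String demo() {
        DynamicTagSketch d = new DynamicTagSketch();
        int x = d.ref(3);
        int y = d.ref(4);
        Set<String> readTags = d.derefTags(y, Collections.singleton("k"));
        d.assign(x, d.store.get(y), readTags);
        return d.store.get(x) + " " + d.tagsOfLocation(x);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

After the assignment the location of x holds the value read from y and carries the tag k, which is the run-time counterpart of the region entry written by the static (assign) rule.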

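\[
\frac{x : int \in \Gamma_x \dagger [x \mapsto int] \qquad int \succ int}
     {\Gamma, a_1 \vdash x : int, \emptyset \rightsquigarrow x} \quad (1)
\]

\[
\frac{\begin{array}{c}
f : int \rightarrow int \in \Gamma_f \dagger [f \mapsto int \rightarrow int] \qquad int \rightarrow int \succ int \rightarrow int \qquad TypeOf(2) \succ int \\
\Gamma, a_1 \vdash f : int \rightarrow int, \emptyset \rightsquigarrow f \qquad \Gamma, a_1 \vdash 2 : int, \emptyset \rightsquigarrow 2 \qquad E, a_1, m \vdash_d 2 : \{\} \qquad q' = a_1 \\
\langle f\,2, a_1 \rangle \hookrightarrow \langle e_1; f\,2, \varepsilon \rangle \qquad \Gamma \vdash e_1; f\,2 : int, \emptyset
\end{array}}
     {\Gamma, a_1 \vdash f\,2 : int, \emptyset \rightsquigarrow e_1; f\,2} \quad (2)
\]

\[
\frac{\begin{array}{c}
\Gamma_{x,f} \dagger [x \mapsto int, f \mapsto int \rightarrow int], a_1 \vdash x : int, \emptyset \rightsquigarrow x \ \ \textrm{(from 1,2)} \\
\Gamma_f \dagger [f \mapsto int \rightarrow int], a_1 \vdash f\,2 : int, \emptyset \rightsquigarrow e_1; f\,2 \ \ \textrm{(from 2)} \\
\Gamma \vdash \mathbf{let\ rec}\ f\ x = x\ \mathbf{in}\ e_1; f\,2 : int, \emptyset
\end{array}}
     {\Gamma, a_1 \vdash \mathbf{let\ rec}\ f\ x = x\ \mathbf{in}\ f\,2 : int, \emptyset \rightsquigarrow \mathbf{let\ rec}\ f\ x = x\ \mathbf{in}\ e_1; f\,2}
\]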

Figure 11: Example Derivations

5.2 Dynamic Approach

In this subsection, a dynamic semantics is provided to match and weave dataflow pointcuts. Dataflow tags are associated with values and propagated to track data dependencies. This semantics, which is presented in Figure 12, is needed to prove the correctness of the aforementioned static semantics. Given an environment Σ that maps variables to values, an environment E that maps values to tags, a sequence of advices q, a store s, and a mapping r that maps references to tags, the dynamic semantics associates an expression e with a value v and a tag set t. The trace a represents the side effects that the expression e performs during its evaluation. In addition, s′ and r′ are the possibly updated store and the possibly updated mapping respectively. This is noted as Σ, E, q, s, r ⊢ e → v : t, a, s′, r′. A trace a is the dynamic counterpart of a static side effect η, whereas a mapping r is the dynamic counterpart of a static mapping m. Since matching the defined pointcuts depends primarily on types, some expressions need to be typed in the rules (app, deref, and assign) of the dynamic tagging rules. Accordingly, these rules are written as follows:

\[
\frac{\begin{array}{c}
staticPremise_1 \ \ldots\ staticPremise_n \\
dynamicPremise_1 \ \ldots\ dynamicPremise_n
\end{array}}
     {Conclusion}
\]

The function N checks whether the value is a reference value l and whether l is associated with a tag set t in the mapping r. If so, it returns t. Otherwise, it returns the empty set. On the other hand, the functions searchTagCall, searchTagGet, and searchTagSet are the same as their counterparts in the static approach.

5.3 Consistency of Dynamic and Static Approach

We use the proof method introduced by Talpin and Jouvelot [11] to show that the static and the dynamic semantics are consistent with respect to a structural relation between values and types, defined as the maximal fixed point of a monotonic property. A store model tells which region ρ and type τ correspond to a reference value l.

S ∈ StoreModel = Region × Type

Note that S ⊆ S′ if and only if ∀l ∈ Dom(S), S(l) = S′(l). A dynamic trace of side effects a is consistent with the effect η for the model S, noted S |= a : η, if and only if:

∀ init(l) ∈ a, S(l) = (ρ, τ) ∧ init(ρ, τ) ∈ η
∀ read(l) ∈ a, S(l) = (ρ, τ) ∧ read(ρ, τ) ∈ η
∀ write(l) ∈ a, S(l) = (ρ, τ) ∧ write(ρ, τ) ∈ η
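The three conditions above amount to a simple membership check, which the following Java sketch spells out. The string encodings are assumptions made only for brevity: a trace item is written "read:l1", a static effect "read:rho1,int", and the store model maps a location such as "l1" to its "region,type" pair.

```java
import java.util.*;

class TraceConsistencySketch {
    // Returns true when every init/read/write on a location in the trace has a
    // matching effect on that location's (region, type) pair in the store model.
    static boolean consistent(Map<String, String> storeModel, Set<String> trace, Set<String> effect) {
        for (String item : trace) {
            String[] parts = item.split(":", 2);              // parts[0] = kind, parts[1] = location
            String regionAndType = storeModel.get(parts[1]);  // S(l) = (rho, tau)
            if (regionAndType == null || !effect.contains(parts[0] + ":" + regionAndType)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> model = new HashMap<>();
        model.put("l1", "rho1,int");
        Set<String> eta = new HashSet<>(Arrays.asList("init:rho1,int", "read:rho1,int"));
        Set<String> okTrace = new HashSet<>(Arrays.asList("init:l1", "read:l1"));
        System.out.println(consistent(model, okTrace, eta));
        System.out.println(consistent(model, Collections.singleton("write:l1"), eta));
    }
}
```

A trace that only initializes and reads l1 is accepted against an effect containing init and read on (rho1, int), while a write on l1 is rejected because no write effect was inferred for that region.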

Note that, if S ⊆ S′ and S |= a : η, then S′ |= a : η. Also, when S |= a : η and S |= a′ : η′, then S |= a ∪ a′ : η ∪ η′. Tagged and typed stores are defined as models for describing the relation between values, types, and tag sets.

r : s : S ∈ TaggedTypedStore = Store × StoreModel × DynamicMapping

Note that DynamicMapping = Ref $\xrightarrow{m}$ TagSet. Given a tagged and typed store r : s : S, the value v is tagged with the tag set t and consistent with the type τ, noted r : s : S |=t v : τ, if and only if v, τ, and t verify one of the following properties:

r : s : S |=t u : Unit ⇔ r(u) = t
r : s : S |=t l : refρ(τ) ⇔ S(l) = (ρ, τ) and s : S |=t s(l) : τ and r(l) = t
r : s : S |=t 〈x, e, Σ〉 : τ ⇔ there exist Γ, E, q, and m such that r : s : S |=E Σ : Γ and q, Γ ⊢ λx.e : τ, ∅ ⇝ λx.e′ and E, q, m ⊢d λx.e : t, m′

Note that r : s : S |=E Σ : Γ if and only if Dom(E) = Dom(Σ) = Dom(Γ) and r : s : S |=E(x) Σ(x) : Γ(x) for every x ∈ Dom(E). Notice that we ignore constants for brevity. This structural property between values, types, and tags does not uniquely define a relation and must be regarded as a fixed point equation on the domain R = TaggedTypedStore × Value × Type of the relation. A function F is defined on the domain Pfin(R) → Pfin(R). Its fixed points are the relations on R that verify the property defined above.

F(Q) = {(r, s, S, v, τ, t) |
  if v = u then τ = Unit and r(u) = t
  if v = l then there exist ρ and τ′ such that τ = refρ(τ′) and S(l) = (ρ, τ′) and (s, S, s(l), τ′, t) ∈ Q
  if v = 〈x, e, Σ〉 then there exist Γ, E, q, and m such that r : s : S |=E Σ : Γ and q, Γ ⊢ λx.e : τ, ∅ ⇝ λx.e′ and E, q, m ⊢d λx.e : t, m′}

Theorem 2. (Consistency) Let Σ be an environment and Γ its type. Let r : s : S be a tagged and typed store such that r : s : S |=E Σ : Γ. Provided that Γ, q ⊢ e : τ, η ⇝ e′ and Σ, E, q, s, r ⊢ e → v : t, a, s′, r′, there exists a store model S′ such that r : s : S ⊑ r′ : s′ : S′ with S′ |= a : η and r′ : s′ : S′ |=t v : τ.

Proof. The proof is by induction on the length of the dynamic evaluation, for each syntactic category of expressions. It is based on the proof of Talpin and Jouvelot [11] and is omitted for space limitations.

6. DATAFLOW DESIGN AND IMPLEMENTATION IN ASPECTJ

We design and implement the dataflow pointcut as an extension to the ajc compiler of AspectJ-1.5.0. The adopted implementation methodology depends primarily on the semantics defined for the λ calculus. The implemented dataflow pointcut tracks data dependencies inside methods. The implementation is not an easy task because it requires digging into the actual code of the ajc compiler. It is time consuming since it implies reading thousands of lines of code, especially in the absence of documentation.

The design and implementation of the dataflow pointcut depend primarily on matching. The matching of the dataflow pointcut has two levels. The first level corresponds to the matching of the pointcut p enclosed by the dataflow pointcut dflow(p), where p is any of the defined AspectJ pointcuts. The second level resides in checking whether a join point has a define-use relation with the join point that is matched by the enclosed pointcut p. We depend on the effect of executing instructions on the operand stack of the Java Virtual Machine [16] to track define-use relationships. Accordingly, a join point jp matches a dataflow pointcut dflow(p) either:

• If this join point jp matches the enclosed pointcut p of the dataflow pointcut and at the same time defines a value in the operand stack of the JVM, or

• If this join point has a define-use relationship with a join point jp′, where jp′ matches the enclosed pointcut p of the dataflow pointcut and defines a value in the operand stack of the JVM.

The methodology followed to implement the dataflow pointcut consists of the following steps:

• The ajc parser is extended to recognize and parse the syntax of the dataflow pointcut.

• The dataflow pointcut goes through the same life cycle as any AspectJ pointcut.

• A new matching stage is added to match the pointcut p of the dataflow pointcut dflow(p). If p matches a bytecode instruction and this bytecode defines a value in the operand stack, all the subsequent instructions in the same method are visited, including the bytecode instruction that matches p itself. Once a bytecode instruction is visited, its behavior is simulated by a defined stack as follows. If the bytecode instruction defines a value on the operand stack, it is pushed on the defined stack. On the other hand, if the bytecode instruction uses a value from the operand stack, the first bytecode instruction on the defined stack is popped and a dependency relationship between the two instructions is created in a special structure represented as a hash table. The hash table represents the dependency relationships by having the bytecode instruction that uses a value from the operand stack as an entry connected to a dependency list, which contains the bytecode instructions that this instruction pops from the defined stack. If a bytecode instruction can be reached from a branch instruction or immediately follows a branch instruction, the branch instruction is also added to its dependency list.

• An analysis of the dependency relationships in the hash table is done to track data dependencies between instructions with the help of the Bytecode Engineering Library (BCEL) API [15]. The BCEL API is intended to give users a convenient way to analyze, create, and manipulate binary Java class files. Analyzing the dependency relationships takes into consideration transitivity and branch instructions.

• At the end of each method and after finishing the analysis, the hash tables are removed.
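The stack simulation and the dependency table described in the steps above can be sketched as follows. This is a hedged illustration, not the code of the extended ajc compiler: the `Insn` class is a hypothetical stand-in for BCEL instruction objects, only straight-line code is handled, and local variables and branch instructions are deliberately left out.

```java
import java.util.*;

class StackDependencySketch {
    // Hypothetical instruction model: how many operands it pops and whether it pushes a result.
    static final class Insn {
        final String name; final int pops; final boolean pushes;
        Insn(String name, int pops, boolean pushes) {
            this.name = name; this.pops = pops; this.pushes = pushes;
        }
    }

    // Simulates the operand stack with a "defined stack" of instruction indices and
    // records, for each instruction, the producers of the values it consumes.
    static Map<Integer, List<Integer>> dependencies(List<Insn> code) {
        Deque<Integer> defined = new ArrayDeque<>();
        Map<Integer, List<Integer>> deps = new HashMap<>();
        for (int i = 0; i < code.size(); i++) {
            Insn insn = code.get(i);
            for (int k = 0; k < insn.pops; k++) {
                deps.computeIfAbsent(i, key -> new ArrayList<>()).add(defined.pop());
            }
            if (insn.pushes) defined.push(i);
        }
        return deps;
    }

    // Transitive define-use check: does instruction `user` depend on instruction `source`?
    static boolean dependsOn(Map<Integer, List<Integer>> deps, int user, int source) {
        if (user == source) return true;
        for (int producer : deps.getOrDefault(user, Collections.emptyList())) {
            if (dependsOn(deps, producer, source)) return true;
        }
        return false;
    }

    // Roughly the shape of out.println("Welcome " + request.getParameter("name")).
    static List<Insn> example() {
        return Arrays.asList(
            new Insn("aload out", 0, true),            // 0
            new Insn("ldc \"Welcome \"", 0, true),     // 1
            new Insn("aload request", 0, true),        // 2
            new Insn("ldc \"name\"", 0, true),         // 3
            new Insn("invoke getParameter", 2, true),  // 4: source matched by p
            new Insn("invoke concat", 2, true),        // 5
            new Insn("invoke println", 2, false));     // 6: candidate join point
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> deps = dependencies(example());
        System.out.println(dependsOn(deps, 6, 4));
        System.out.println(dependsOn(deps, 4, 1));
    }
}
```

In this example the println call depends on getParameter only through the intermediate concat instruction, which is why the analysis has to follow dependency lists transitively.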

A legitimate question to ask is how the semantics defined for the dataflow pointcut using the λ calculus helps in the implementation. A natural answer stems from the fact that tag propagation using typing in the λ calculus is simulated in Java by the hash table. Actually, the tag of a specific dataflow pointcut propagates between bytecode instructions using the entries and dependency lists of the dependency relationships in a specific hash table. In addition, both approaches agree that much of the dataflow information can be available statically, which reduces the cost of the dataflow pointcut considerably.

    class CROSS {
        public void doGet(HttpServletRequest request, HttpServletResponse response)
                throws IOException {
            response.setContentType("application/x-JavaScript");
            PrintWriter out = response.getWriter();
            String name = request.getParameter("name");
            out.println("document.body.innerHTML = 'Welcome " + name + "';");
            out.close();
        }
    }

    aspect CheckDataCROSS {
        pointcut CROSSDataChecker():
            dflow(call(String HttpServletRequest.getParameter(String)))
            && call(* *.println(..));

        before(): CROSSDataChecker() {
            System.out.println("Detected Vulnerability");
        }
    }

Figure 13: Cross-site Scripting Vulnerability

Figure 14: Cross-site Scripting Vulnerability Detection

    class SQL {
        public void setuser(HttpServletRequest request, Connection con)
                throws SQLException {
            String username = request.getParameter("username");
            String password = request.getParameter("password");
            String query = "SELECT id FROM usertable WHERE "
                + "username='" + username + "' AND "
                + "password=PASSWORD('" + password + "')";
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery(query);
        }
    }

    aspect CheckDataSQL {
        pointcut SQLDataChecker():
            dflow(call(String HttpServletRequest.getParameter(String)))
            && call(* *.executeQuery(..));

        before(): SQLDataChecker() {
            System.out.println("Detected Vulnerability");
        }
    }

Figure 15: SQL Injection Vulnerability

Figure 16: SQL Injection Vulnerability Detection

Next we present two Java code examples that are vulnerable to XSS and SQL injection attacks respectively. Then we show how the implemented dataflow pointcut can detect and fix such vulnerabilities. One of the most important tools released with AspectJ is a graphical structure browser that edits program source files, compiles programs with ajc, runs programs, and navigates crosscutting concerns. This browser is used to demonstrate how the implemented dataflow pointcut picks out the vulnerable join points in a specific piece of code. Fig. 13 displays the code that is vulnerable to XSS together with a dataflow pointcut to detect such a vulnerability. Fig. 14 shows, with the help of the ajc browser, how the defined dataflow pointcut catches calls to println methods whose parameter string originates from a return value of a getParameter method in a past join point. On the other hand, Fig. 15 contains code that is vulnerable to SQL injection together with a dataflow pointcut to detect such a vulnerability. Fig. 16 shows how the dataflow pointcut catches calls to executeQuery methods whose parameter string originates from a return value of a getParameter method in a past join point. These vulnerabilities, in general, can be prevented by validating all input from outside the application using before or around advice. Validation should cover both length and content. Typically only alphanumeric characters are needed. Any other accepted characters should be escaped. For simplicity, the advice is represented here as a warning message.
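As an illustration of the validation step mentioned above, the following Java sketch shows a length and alphanumeric check together with a basic HTML escaping helper that an around advice could apply to the tainted string. The class name, method names, and the choice of ASCII-only alphanumerics are assumptions made for this sketch; the right escaping always depends on the output context (HTML, JavaScript string, SQL), and for SQL a parameterized `PreparedStatement` is usually the preferred fix.

```java
class InputValidationSketch {
    // Accepts only non-null ASCII alphanumeric input of bounded length.
    static boolean isAcceptable(String input, int maxLength) {
        if (input == null || input.length() > maxLength) return false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum) return false;
        }
        return true;
    }

    // Escapes the characters that are significant in an HTML context.
    static String escapeHtml(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) {
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(isAcceptable("alice42", 16));
        System.out.println(isAcceptable("x' OR '1'='1", 16));
        System.out.println(escapeHtml("<script>alert('x')</script>"));
    }
}
```

An advice bound to the dataflow pointcut could reject input for which `isAcceptable` is false, or pass the escaped string on to the original call instead of printing a warning.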

7. CONCLUSION

In this paper, we have presented a formal and a practical framework for the dataflow pointcut. Dataflow tags are propagated statically to track data dependencies. This approach can minimize the cost of dataflow pointcuts because much of the dataflow information would be available statically. We have introduced a static semantics for tag propagation and proved that it is consistent with respect to the dynamic semantics of the propagation. The proposed semantics for advice weaving is in the spirit of AspectJ where advices are injected before, after, or around the join points that are matched by their respective pointcuts. This contribution is the first step towards a complete security aspect core based on the extended λ-calculus. Hence, as future work, we plan to define a calculus that contains security-related pointcuts together with the underlying semantic foundations. In addition, we will target integrating inter-procedural analysis for dataflow pointcut matching.

8. REFERENCES[1] Kiczales, G., Lamping, J., Menhdhekar, A., Maeda, C., Lopes,

C., Loingtier, J., and Irwin., J. (1997) Aspect-orientedprogramming. In Aksit, M. and Matsuoka, S. (eds.),Proceedings European Conference on Object-OrientedProgramming, 1241, 220–242. Springer-Verlag.

[2] Clifton, C., Leavens, G. T., and Noble, J. (2007) MAO:Ownership and effects for more effective reasoning aboutaspects. Proceedings European Conference on Object-OrientedProgramming, 4609, 451–475. Springer-Verlag.

[3] Ossher, H. and Tarr, P. (2000) Multi-dimensional separation ofconcerns and the hyperspace approach. Proceedings of theSymposium on Software Architectures and ComponentTechnology: The State of the Art in Software Development.Enschede, The Netherlands. Kluwer.

[4] Orleans, D. and Lieberherr, K. (2001) DJ: Dynamic adaptiveprogramming in Java. Proceedings of the Third InternationalConference on Metalevel Architectures and Separation ofCrosscutting Concerns, REFLECTION ’01,Kyoto, Japan, pp.73–80. Springer-Verlag.

[5] Masuhara, H., Kiczales, G., and Dutchyn, Ch. (2003) Acompilation and optimization model for aspect-orientedprograms. In Hedin, G. (eds.), Proceedings of CompilerConstruction (CC2003), 2622, 46–60. Springer-Verlag.

[6] De Win, B., Vanhaute, B., and De Decker, B. (2001) Securitythrough aspect-oriented programming. Proceedings of theIFIP TC11 WG11.4 First Annual Working Conference onNetwork Security, Leuven, Belgium, pp. 125–138. Kluwer.

[7] Alhadidi, D., Belblidia, N., and Debbabi, M. (2006) Security crosscutting concerns and AspectJ. The International Conference on Privacy, Security and Trust, Markham, Ontario, Canada, 30 October–1 November. McGraw-Hill.

[8] Debbabi, M., Aidoud, Z., and Faour, A. (1997) On the inference of structured recursive effects with subtyping. Journal of Functional and Logic Programming, 1997(5).

[9] Plotkin, G. (1975) Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science, 1(2), pp. 125–159. Elsevier.

[10] Nielson, F. and Nielson, H. (1994) Constraints for polymorphic behaviours of concurrent ML. Constraints in Computational Logic, pp. 73–88. Springer-Verlag, London, UK.

[11] Talpin, J. and Jouvelot, P. (1992) Polymorphic type, region and effect inference. Journal of Functional Programming, 2(3), 245–271.

[12] Clifton, C. and Leavens, G. T. (2006) MiniMAO1: An imperative core language for studying aspect-oriented reasoning. Science of Computer Programming, 63(3), 321–374.

[13] The AspectJ programming guide. http://dev.eclipse.org/viewcvs/indextech.cgi/check-out/aspectj-home/doc/progguide/index.html, 2008.

[14] ENCS Users’ Web Pages, Concordia University, Montreal, Quebec. http://users.encs.concordia.ca/~a_boukh/Appendix.pdf, 2009.

[15] Byte Code Engineering Library BCEL. http://jakarta.apache.org/bcel/manual.html, 2008.

[16] The Java Virtual Machine Instruction Set. http://java.sun.com/docs/books/jvms/secondedition/html/Instructions.doc.html, 2008.

[17] Walker, D., Zdancewic, S., and Ligatti, J. (2003) A theory of aspects. Proceedings of the International Conference on Functional Programming, Uppsala, Sweden. ACM Press.

[18] Dantas, D., Walker, D., Washburn, G., and Weirich, S. (2005) PolyAML: A polymorphic aspect-oriented functional programming language. ACM SIGPLAN Notices, 40(9), 306–319.

[19] Masuhara, H., Tatsuzawa, H., and Yonezawa, A. (2005) Aspectual Caml: An aspect-oriented functional language. Proceedings of the International Conference on Functional Programming, Tallinn, Estonia, pp. 320–330. ACM Press.

[20] Wang, M., Chen, K., and Khoo, S. (2006) Type-directed weaving of aspects for higher order functional languages. PEPM’06: Proceedings of the 2006 Symposium on Partial Evaluation and Semantics-Based Program Manipulation, Charleston, South Carolina, pp. 78–87. ACM Press.

[21] Wang, M., Chen, K., and Khoo, S. (2006) On the pursuit of static and coherent weaving. Foundations of Aspect-Oriented Languages (FOAL) 2006.

[22] Masuhara, H. and Kawauchi, K. (2003) Dataflow pointcut in aspect-oriented programming. APLAS 03: Asian Symposium on Programming Languages and Systems. Springer-Verlag.

[23] Huang, M., Wang, C., and Zhang, L. (2004) Toward a reusable and generic security aspect library. In De Win, B., Shah, V., Joosen, W., and Bodkin, R. (eds.), AOSDSEC: AOSD Technology for Application-Level Security, Lancaster, England.

[24] Ramachandran, R., Pearce, D., and Welch, I. (2006) AspectJ for multilevel security. In Coady, Y., Lorenz, D., Spinczyk, O., and Wohlstadter, E. (eds.), ACP4IS06, Bonn, Germany, 20 March, pp. 13–17. Published as University of Virginia Computer Science Technical Report CS–2006–01.

[25] Damas, L. and Milner, R. (1982) Principal type-schemes for functional programs. POPL ’82: Proceedings of the 9th Symposium on Principles of Programming Languages, Albuquerque, New Mexico, pp. 207–212. ACM Press.

[26] Allan, C., Avgustinov, P., Christensen, A. S., Hendren, L., Kuzins, S., Lhotak, O., de Moor, O., Sereni, D., Sittampalam, G., and Tibble, J. (2005) Adding trace matching with free variables to AspectJ. Proceedings of the 20th Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA’05, San Diego, CA, USA, pp. 345–364. ACM Press.

[27] Avgustinov, P., Hajiyev, E., Ongkingco, N., de Moor, O., Sereni, D., Tibble, J., and Verbaere, M. (2007) Semantics of static pointcuts in AspectJ. Proceedings of the 34th Annual Symposium on Principles of Programming Languages, Nice, France, pp. 11–23. ACM Press.

[28] Tofte, M. (1990) Type inference for polymorphic references. Information and Computation, 89(1), 1–34.

[29] Jagadeesan, R., Jeffrey, A., and Riely, J. (2003) A calculus of untyped aspect-oriented programs. Proceedings of the European Conference on Object-Oriented Programming, Darmstadt, Germany, pp. 54–73. Springer.

[30] Avgustinov, P., Hajiyev, E., Ongkingco, N., de Moor, O., Sereni, D., Tibble, J., and Verbaere, M. (2007) Semantics of static pointcuts in AspectJ. Proceedings of the 34th Annual Symposium on Principles of Programming Languages, Nice, France, pp. 11–23. ACM Press.

[31] Wand, M., Kiczales, G., and Dutchyn, Ch. (2004) A semantics for advice and dynamic join points in aspect-oriented programming. Transactions on Programming Languages and Systems, 26(5), 890–910. ACM Press.
