automated inference of past action instances in digital investigations

27
Automated Inference of Past Action Instances in Digital Investigations Joshua I. James , Pavel Gladyshev Digital Forensic Investigation Research Laboratory Soonchunhyang University, Asan-si, Chungcheongnam-do, South Korea University College Dublin, Belfield, Dublin 4, Ireland Abstract As the amount of digital devices suspected of containing digital evidence increases, case backlogs for digital investigations are also increasing in many organizations. To ensure timely investigation of requests, this work proposes the use of signature-based methods for automated action in- stance approximation to automatically reconstruct past user activities within a compromised or suspect system. This work specifically explores how multiple instances of a user action may be detected using signature- based methods during a post-mortem digital forensic analysis. A system is formally defined as a set of objects, where a subset of objects may be altered on the occurrence of an action. A novel action-trace update time threshold is proposed that enables objects to be categorized by their respective update patterns over time. By integrating time into event re- construction, the most recent action instance approximation as well as limited past instances of the action may be differentiated and their time values approximated. After the formal theory if signature-based event reconstruction is defined, a case study is given to evaluate the practicality of the proposed method. Keywords: Automatic Event Reconstruction; Digital Forensic Investi- gations; Automated Inference; Signature Analysis; Action-Trace Update Pattern Detection 1 Introduction Since the definition of Digital Forensic Science at the Digital Forensics Research Workshop in 2001 [1], the field has grown almost as dramatically as technology itself. As described by Casey [2], Digital Forensic Science is “coming of age”, which not only brings about a maturation in the principal concepts of the field, but also an increased scrutiny against these principles and their lack of rigor- ous scientific backing. Digital forensic investigators are now finding themselves overwhelmed with the scale and quantity of cases, along with the pressure of 1 arXiv:1407.5714v1 [cs.CR] 22 Jul 2014

Upload: ucd

Post on 11-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

Automated Inference of Past Action Instances in

Digital Investigations

Joshua I. James†, Pavel Gladyshev‡

Digital Forensic Investigation Research Laboratory†Soonchunhyang University, Asan-si, Chungcheongnam-do, South Korea

‡University College Dublin, Belfield, Dublin 4, Ireland

Abstract

As the amount of digital devices suspected of containing digital evidenceincreases, case backlogs for digital investigations are also increasing inmany organizations. To ensure timely investigation of requests, this workproposes the use of signature-based methods for automated action in-stance approximation to automatically reconstruct past user activitieswithin a compromised or suspect system. This work specifically exploreshow multiple instances of a user action may be detected using signature-based methods during a post-mortem digital forensic analysis. A systemis formally defined as a set of objects, where a subset of objects maybe altered on the occurrence of an action. A novel action-trace updatetime threshold is proposed that enables objects to be categorized by theirrespective update patterns over time. By integrating time into event re-construction, the most recent action instance approximation as well aslimited past instances of the action may be differentiated and their timevalues approximated. After the formal theory if signature-based eventreconstruction is defined, a case study is given to evaluate the practicalityof the proposed method.

Keywords: Automatic Event Reconstruction; Digital Forensic Investi-gations; Automated Inference; Signature Analysis; Action-Trace UpdatePattern Detection

1 Introduction

Since the definition of Digital Forensic Science at the Digital Forensics ResearchWorkshop in 2001 [1], the field has grown almost as dramatically as technologyitself. As described by Casey [2], Digital Forensic Science is “coming of age”,which not only brings about a maturation in the principal concepts of the field,but also an increased scrutiny against these principles and their lack of rigor-ous scientific backing. Digital forensic investigators are now finding themselvesoverwhelmed with the scale and quantity of cases, along with the pressure of

1

arX

iv:1

407.

5714

v1 [

cs.C

R]

22

Jul 2

014

increasingly restrictive standards [3]. This combination translates into a consis-tently increasing number of delayed, or even neglected, cases.

Gogolin [4] states that in Michigan, USA, “50% or more of [Law Enforce-ment’s] cases have a digital component, and most agencies report that thisnumber is increasing”. This is not surprising given the rapid adoption and evo-lution of technology worldwide on a business as well as personal level within thelast 10 years alone. As technology becomes more a part of everyone’s life, it isnatural that more evidence in investigations will be found in digital form. Theissue is that many law enforcement agencies are not currently well positionedto handle an ever-increasing amount of data using traditional digital forensictechniques. Casey, et al. [5] observes “few [digital forensic laboratories] canstill afford to create a forensic duplicate of every piece of media and performan in-depth forensic examination of all data on those media”. Even though thefield of digital forensics has been advancing rapidly, the backlog for digital in-vestigations has continued to increase. Currently in the United States there arereports of backlogs from 12 to 18 months [6], and in some cases “approachingor exceeding 2 years” [4]. In 2004 the United Kingdom digital crime investiga-tion backlog was 6 to 12 months [7], and rose to 18 to 24 months in 2009 [8]before being improved through a number of policy, case prioritization and evi-dence outsourcing initiatives [9]. Since technological advancement for personaland business use shows little sign of slowing, data and backlog growth will con-tinue unless law enforcement, and the legal system in general, move towards theuse and acceptance of verifiable, highly-automated solutions during the digitalinvestigation process.

In an effort to provide faster data to knowledge acquisition for the digitalinvestigator, this research proposes the use of signature-based methods for theautomated analysis of actions that happened in a given digital system. It showsthat the state of low-level artifacts in a suspect system may be automatically ob-served and correlated to higher-level actions using signature-based methods thattake into account measured trace update thresholds, rather than the assump-tion of an immediate trace updates on the occurrence of an action. The resultis a fast and detailed reconstruction of action instances that could be appliedduring the triage, preliminary, and in-depth analysis phases of an investigation.

2 Related Work

2.1 State Machine Analysis

Formal analysis is a method to formally represent a system and analyze certainscenarios based on this formal model. Several works have proposed modeling acomputer system formally as a finite state machine (FSM) [10, 11, 12], whichallows event reconstruction to be reduced to a state-space exploration problem.FSM models have been applied to model suspect systems in various ways. Forexample, Carrier [11] proposed a computer history model that groups the systeminto primitive (lowest level) and complex (causing multiple primitive or complex

2

events) events. An investigator then formulates hypotheses of events that aretested against the created computer history model in order to support or refutethe hypothesis. Gladyshev and Patel [10] took a different approach by modelingthe suspect system as an FSM where only the final state is known. The state-space is then back traced in order to find all possible scenarios that could haveresulted in the final, observed state.

The issue, and benefit, of creating a state machine model of a suspect systemis that all possible states and transitions of the system are considered. However,in real-world systems the state-space of even the simplest system becomes com-putationally impractical. Even with efforts to reduce the possible state-space, asdescribed by James, et al. [13], computational modeling of a real-world systemfor analysis currently requires too much abstraction to be practical.

2.2 Computer Profiling

The work on computer profiling presented in Marrington, et al. [14] attemptsto generate a computer usage profile that “. . . allows a human examiner tomake an informed decision regarding the likely value of the computer system toan investigation before undertaking a detailed manual forensic examination”.In this work an abstracted object model is used to classify objects observedin a suspect system. Observed objects are categorized as particular objecttypes, such as system, principle (people/groups), application or content data.Relationships between these objects are then determined. These relationshipsprovide insight into the logic of the system, and allow for the identificationof indirect relationships between objects that were otherwise thought to beunrelated. By examining an object of interest the relation of other objects maybe found. Times and events – defined as recorded, inferred and unknown types –are found, and are associated with their corresponding objects, where possible.The overall computer profile is then represented by these object, relationshipand event connections. Hypotheses about a computer system and its historycan then be formulated and tested based on the derived computer profile.

By concentrating on an informational rather than a computational finitestate machine model, this method does increase practicality compared to meth-ods previously described. In general, states are defined at a more abstract(object) level, and are based on observed evidence. Because of this, creationof the informational model is less computationally intensive as not all possiblecombinations of past states must be considered. The drawback is this model rep-resents suspect objects and their relations, but makes no conclusion about whatexactly these relations mean. The investigator is still left to manual hypothesisgeneration and testing, where some previously mentioned methods attempt toautomatically present possible hypothesis as well as test the hypotheses withinthe created models.

3

2.3 File System Activity Analysis

One probabilistic method of file system activity analysis has been presented byKhan and Wakeman [15]. In their work, neural networks are used to learn anddetect application “footprints”. Traces that were created on a disk (usuallyfile creation and manipulation) by a particular application of interest were fedinto the neural network in the order in which they were accessed to learn theupdate time-span relationships and file-system manipulation patterns of theapplication.

This method is highly probabilistic, where the neural network is able tocalculate the likelihood of an observation matching a previously derived model(footprint). Khan, et al. [16] showed that neural networks trained on specificapplication trace creation variables show relatively good results in determiningand differentiating one application’s footprint from another. One issue – thatwas also discussed by the authors – is that these systems need a very largeamount of training data to be reliable. The training data needed, manual vari-able selection and separate neural networks per application make this methodless practically feasible. Also, when comparing the application footprint of thismethod to the use of signatures that encode multiple object update behaviorsdescribed in James, et al. [17], the differences in models suggest that learnedsignatures lack some specificity that could provide more event information foruse in reconstruction.

3 Detection of Action Instances

Unlike previous works, this work focuses on the automated extraction of useraction instance hypotheses. An action is any event external to the system that isthe direct cause of a process. An action is the farthest point at which a happenedevent can be inferred. Opening a program by clicking an icon, for example, is anexternal activity conducted by a user that manipulates the state of the system.This work submits the hypothesis that multiple past executions of an actionmay be inferred by analyzing sporadically updated artifacts associated withthe given action instance. In this work the detection of past action instancesusing signature-based methods that includes artifact update time-spans will bedescribed, and practically demonstrated in a simple case study.

3.1 Theory of Action Instance Reconstruction

When a user interacts with a system, actions cause changes in the state ofthe system. Given the deterministic nature of computer systems, a certainaction will consistently cause the same changes within the system each timethe action takes place if the system is in the exact same starting state. Auser clicking on a program’s icon, for example, will consistently execute theprogram. The program must access specific files to load, which in turn updatesfiles, file metadata, log entries, etc. This work will focus on file meta-data, andspecifically file access, modified and created time stamps.

4

Figure 1: Causal chain of an action causing a process that causes trace creation.

Figure 2: Back-tracing the causal chain from the observed trace to determinethe corresponding action that caused the trace.

A causal relation between actions and object updates (trace creation) viaprocess execution can be determined because of causal chaining, where an actioncauses the process and the process causes the update of corresponding objects(Fig. 1).

Causal links can be traced back from the final observed state to determineall possible beginning actions that could result in such a final state [18]. In thiscase, the observation of a resulting trace may be related back to the processthat caused it, which may in turn be related back to the action that caused theprocess (Fig. 2).

A system contains a finite set of objects, O, where each object in O may bedefined in terms of associated access (ta), modified (tm) and created (tc) timestamps:

o = (ta, tm, tc)

The objects and time stamps in a system can be described as follows:

t = (τ), where t is a time stamp and τ is the time value of the time stamp

O = {o1, o2, o3. . .}

Tm = {tm1, tm2, tm3. . .}

Ta = {ta1, ta2, ta3. . .}

Tc = {tc1, tc2, tc3. . .}

T = Tm ∪ Ta ∪ Tc

An action is defined as:

a = (Ma,Da,Oa, af)

where:

• Ma is an action that modifies timestamps to the current time plus somerandom period of time (τ +∆τ)

5

• Da is an action that sets timestamps to a default value

� Da = {(t, τ)}, where t is the timestamp, τ is the default value

• Oa is a collection of objects the action creates if they are not present

• af is a function which takes in a set of objects and returns another set ofobjects produced from the original

� O′ = af(O)

As defined, an action may update time stamps. Since the time stamp updateperiod is not instantaneous, the update will happen at some random intervalafter the action. This update takes place at a random delta after the time of theaction. The action will also create an object if it does not exist. If objects, andtheir corresponding timestamps have been destroyed, these timestamps are notupdated. Created objects may have timestamps set to default values that maypossibly be sometime before the time the action took place. For example, soft-ware installation and backup recovery actions could produce objects with timestamps that are before the installation and backup actions occurred. Finally,an action function exists that accepts a set of objects as an input, and returnsa modified set of objects produced from the original.

The set of all actions is defined as:

A = {a1, a2, a3 . . . }

Actions happen at a particular point in time. An instance of an action is definedas:

I = {(a, τ)|a ∈ A, τ ∈ R}

where:

• τ is the time of the action instance occurring

In the proposed model, the set of actions is defined such that each timestampis a result of some action. This simplified system model is defined as:

∀t ∈ T, ∃i ∈ I, (t.τ = i.τ +∆τ) ∨(∃d ∈ i.a.Da, (d.t = t) ∧ (t.τ = d.τ))

This model states that for all time stamps there either exists an actioninstance where the current time of the time stamp equals the time of the actioninstance plus some random period of time, or the time stamp is equal to adefault time stamp.

In the model so far defined, a single action deterministically causes modifi-cation or creation to a set of objects. However, in a real-world system a singleexecution of the action can have multiple paths through a program. An actionwith multiple paths through a program is defined as:

6

aa = {a1, a2, a3, a4}

Each path through a program will cause different modifications. The resultis that not all time stamps may be updated with every execution, depending onthe path.

The event reconstruction approach adopted in this work seeks to recover asequence of action instances (i) where:

i = (a, τ)

I = {i1, i2, i3, i4 . . . }

And where multiple action instances may have an effect on a single object.

O′ = in.aa.af(in− 1.aa.af(. . . i2.aa.af(i1.aa.af(O)) . . . ))

A reconstruction function (ISR) is defined as:

I ′ = ISR(O′)

where:

• I ′ ⊆ I

In this case I ′ = I is impossible. Correctness is such that i in I ′ implies i inI. However, the opposite is not necessarily always true.

Since the object update delay is random, it cannot be said when the actionoccurred. Likewise, different action instances (a1, τ1) and (a2, τ2) may be exe-cuted at different times but because delta is different for each action, the finaltime stamp may be the same. For example, if:

i1 = (a1, τ1)

i2 = (a2, τ2)

τ1 6= τ2

It is possible that O′ = i1.a.af(O) and O′ = i2.a.af(O) if deltas for eachaction instance are different. Therefore, a function that uniquely identifies mul-tiple action instances that acted upon the final observed object is impossible;however, it is possible to recover a subset of possible action instances.

Provided delta is random, the exact time of an action instance cannot befound, but the time of an action instance may be statistically approximated.With a sample of the time intervals in which an action instance takes to updateobjects (∆τ) a distribution for the action instance update threshold can becreated. Given a probability distribution of delta, the time of the action instancecan be time-bound, and probabilistically approximated. The threshold for anaction instance execution is defined as:

7

θ = Φ(∆τ1, ∆τ2, ∆τ3, ∆τ4. . .)

where:

• θ is the maximum action instance execution threshold

• Φ is a probability density function that accepts the set of ∆τ for a singleaction instance

A function that identifies the exact time of the action instance cannot be defined,but limits on the duration of the action instance are restricted to within θ fromthe set of time stamp values.

0 ≤ ∆τ ≤ θ

The definition of an action instance must be updated to account for the exe-cution threshold:

i = (a, τ, θ)

The aim the proposed algorithm is to recover a subset of action instanceapproximations where each approximation consists of an action that happenedand an approximation of the time in which this action happened. An actioninstance approximation (ia) is defined as:

ia = (a, λτ)

where:

• λτ is the time interval in which the instance must have occurred in theform of a double containing two time values [τ1, τ2]

� τ1 is the start time of the interval and τ2 is the end time of the interval

� τ1 and/or τ2 may be null denoting no limit

Ia = {ia1, ia2, ia3. . .}

To recover a subset of action instance approximations (Ia), first the functionAIA is defined that returns a time interval (λτ) from a given time value (τ)and an action instance execution threshold (θ).

λτ = AIA(τ, θ)

The function AIA returns double [τ1, τ2], where τ1 is calculated by subtract-ing θ from τ , and τ2 = τ .

The function ISR is defined that returns the set of possible action instanceapproximations (Ia) from the final observed state O′, and the set of all actioninstances I.

8

Ia = ISR(O′, I)

In ISR, for each time stamp (O′.o.t) in the input set O′, and for each updateinstance (i.a.Ma) in the input set I that the time stamp is a member of, get theaction path (I.i.aa.a) where the update instance set contained only the singletime stamp (I.i.aa.a.Ma = {O′.o.t}), and the result of AIA(O′.o.t., I.i.θ). Theresult is a set of all possibly executed actions that is a subset of I, and a set ofinstance approximation intervals per action instance.

3.2 Signatures of Action Instances

This work submits the hypothesis that signature based methods may be used toautomate the trace observation, action inference and action instance approxi-mation tasks. For the task of observation, object time stamps and their relationto the action instance must be known. For action instance execution approxi-mation, the action instance execution threshold is required, where an unknown(null) value equals any time in the past. And for the inference task, under-standing of the underlying relation between the observed facts and the inferredconclusion is required. Knowledge of the system may be encoded as a traceupdate consistency-checking function. From this, a signature is defined as:

S = {Ti, θ, cm}

where:

• Ti is the set of all object time stamps associated with the action instancein the form of an object-trace double [o, t]

� Ti = {t|t ∈ i.a.Ma}

• θ is the maximum action instance execution threshold

• cm is the update consistency checking function particular to the categoryof object update patterns

A method for the derivation of time stamps related to a particular actioninstance has been described by James [19]. This method allows for the deter-mination of the set Ti related to a particular action instance. The derivation ofthe object update threshold (θ) for a particular action instance, which will bedescribed. This section, however, will focus on the update consistency check-ing function (cm), and the definition of three main time stamp related updatepatterns.

3.2.1 Core Object Update Consistency

Core object time stamps are defined as a subset of time stamps Score in T thatare updated to the current value of the system clock on the occurrence of eachexecution of a single, specific action.

9

All of the time stamps in a Core set are said to be in the ‘always updated’time stamp category. Using this definition, if any trace in a Core set is observedthen it can be inferred that the action instance must have happened since theartifact relates to one, and only one, action.

Also, since Core time stamps are ‘always updated’, it is expected that eachtime stamp will be within a certain time range of each other depending on theparticular object update threshold of the action.

An example of a Core trace would be a configuration file that is alwaysmodified when its related program, Program X, is closed. If the configurationfile were only modified when Program X is closed, the modification time stampof the configuration file would be a Core trace for the action “Close ProgramX”.

From this definition, an object time stamp update consistency function(CoreTest) can be derived to test whether each object update conforms tothe Core signature category. In the case of Core, if each trace has been updatedwithin θ, then the execution time for the action instance can be time-boundbefore the oldest time in the array.

First, a function getTraceStates is defined to return the state of time stampsof all objects defined in S. For each object specified in the signature, add theobject, time stamp and time stamp value to the array TraceStates.

function getTraceStates(O′, S)

array TraceStates

foreach S.T i.o ∈ O′

letTraceStates = TraceStates+ [S.T i[o, t], O′.o.t.τ ]

return TraceStates

Next, the function CoreTest may be defined that accepts an object updatethreshold and the TraceStates array. First the TraceStates array is sortedbased on the time stamp values, where element 0 is the oldest and n − 1 isthe newest (most recent) time stamp value. If the oldest time stamp value inTraceStates plus the object update threshold is less than the most recent timestamp value in TraceStates, then the Core traces are not consistent. If theoldest time stamp value plus the object update threshold is greater than themost recent object update value, then the Core traces are considered consis-tent. The function CoreTraces return the array detected, which is a singleelement array containing a double with the oldest and most recent time stampsin TraceStates.

function CoreTest(θ, TraceStates)

sort TraceStates

if (TraceStates[0, 0] + θ < TraceStates[n− 1])

10

return null

else

return detected[TraceStates[0, 0], T raceStates[n− 1]]

3.2.2 Supporting Object Update Consistency

Supporting object time stamps are defined as a subset of time stamps Ssupport

in T that may or may not be updated to the current value of the system clockon the occurrence of each execution of a single, particular action instance, butthat will only be updated by a single, particular action.

Supporting object time stamps are in the ‘irregularly updated’ time stampcategory. However, similar to Core signatures, if any trace in a supportingsignature is detected, then it can be inferred that the action instance must havehappened since the trace also relates to one, and only one, action.

A time stamp can be irregularly updated if, for example, a file is cached inmemory after the first execution of an action. If the file data cached in memory,rather than the file on disk, is accessed on the next execution of the actioninstance then the trace update will not be observable on the disk. In this casethe original file’s meta-data on disk would not be updated on the execution ofthe second action instance.

From this definition, an object time stamp update consistency function(SupportTest) can be derived to test whether each trace conforms to the sup-porting signature category. In the case of supporting, if each trace has beenupdated within θ, then the execution time for the action instance can be ap-proximated to be at, or shortly before the oldest time in the array; however,depending on the action path, objects may not always be updated. If any ob-ject time stamp is updated outside of θ from another related object time stamp,then it can be inferred that a separate instance of the same action must havehappened.

The function SupportTest is defined that accepts an object update thresholdand the TraceStates array. First, the TraceStates array is sorted based onthe time stamp values, where element 0 is the oldest and n − 1 is the newest(most recent) time stamp value. Each object time stamp value is compared tothe oldest time stamp value in the TraceStates array. The comparison takesplace until the time stamp value plus the object update threshold is less thanthe newest compared time stamp value in the array. When this happens, alltime stamp values are assigned to the action instance that must have occurredbetween the oldest time stamp value and the most recent time stamp that is stillless than the threshold. The oldest time stamp is then replaced with the mostrecent time stamp that is greater than the threshold, and the process startsagain.

function SupportTest(θ, TraceStates)

sort TraceStates

11

timeV alue = TraceStates[0]

foreach i in TraceStates; do

if(timeV alue+ θ > TraceStates[i− 1, i− 1])

next

else

arraydetected = detected[] + [timeV alue,TraceStates[i− 2]]

let timeV alue = TraceStates[i− 1]

done

return detected[]

3.2.3 Shared Object Update Consistency

Shared object time stamps are defined as a subset of time stamps Sshared in Tthat may or may not be updated to the current value of the system clock on theoccurrence of each execution of multiple actions.

Shared object time stamps may be either ‘always updated’ or ‘irregularlyupdated’ category types depending on the action. Since the particular trace maybe associated with more than one action, it is possible that any associated actioninstance could have updated the trace. Thus, without additional information,the only information that can be inferred from the detection of a shared objecttime stamps is that at least one of the associated actions must have happened.

An example of a shared object time stamp would be the access time stamp ofa .dll file. Multiple actions can cause the .dll file to be accessed, so when exam-ining the system in a post-mortem environment with no additional information,each action that causes the .dll file to be accessed has the same probability tohave updated the accessed time stamp.

From this definition, an object time stamp update consistency function(SharedTest) can be derived to test whether each trace conforms to the sharedobject update category, and determine what action the trace is associated with.In the case of shared, if each trace has been updated within θ, then the exe-cution time for the action can be approximated to be at, or shortly before theoldest time in the array; however, objects may not always be updated. If anyobject is updated outside of θ from another object, then a separate execution ofthe same action may be inferred. Traces may also be associated with multipleactions. Because of this, additional context is needed to determine which actioncaused the trace. Some methods, such as probabilistic association, can be usedto attempt to determine which action the trace is associated with, but there arelimitations to these methods. However, action-to-trace association methods,and their weaknesses, are beyond the scope of this work.

12

Keeping with the currently defined signature creation model, at best whatcan be said when observing a shared trace is that all actions associated withthe trace could have happened within their respective update thresholds. Forthis reason, the consistency checking of a group of shared objects is much likesupporting objects. Each object time stamp value is compared from oldest tomost recent. Each time stamp is considered to be associated with one actioninstance if the value is within the update threshold. The result is an array withmultiple instances of the action. In the case of shared, the objects may be testedmore than once, since they will be present in multiple signatures. This meansthat a trace may be associated with multiple actions, and all actions associatedwith the shared trace are assumed to have happened.

function SharedTest(θ, TraceStates)

sort TraceStates

timeV alue = TraceStates[0]

foreach i in TraceStates; do

if(timeV alue+ θ > TraceStates[i− 1, i− 1])

next

else

arraydetected = detected[] + [timeV alue,TraceStates[i− 2]]

lettimeV alue = TraceStates[i− 1]

done

return detected[]

3.3 Object Update Threshold

The object update process is not instantaneous. In order to accurately differ-entiate between multiple instances of an action, object update duration mustbe defined for the particular action. The object update threshold, in seconds,of the actions “Open Internet Explorer 8” (IE8) and “Open Firefox 3.6” (FF3)were surveyed on 25 computer systems running Windows XP or Windows 7,and modeled as a normal distribution. To attempt to reduce noise, a limiter of2σ will be used.

For the action “Open Internet Explorer 8”, the average execution updateduration was 27.4 seconds, with a standard deviation (σ) of 16.76 seconds. Thethreshold chosen was 2σ, or from 0 to 61 seconds. Fig. 3 shows the resultsmodeled as a normal distribution, with a histogram of the data shown in Fig.4.

13

Figure 3: Normal distribution of IE8 execution times in seconds to two standarddeviations.

Figure 4: Histogram of IE8 execution times where X is time in seconds and Yis the number of occurrences within the update duration.

14

Figure 5: Normal distribution of FF3 execution times in seconds to two standarddeviations.

For the action “Open Firefox 3.6”, the average execution object updateduration was 24.5 seconds, with a standard deviation of 12.96 seconds. Thethreshold chosen was 2σ, or from 0 to 50 seconds (Fig. 5, 6).

Objects will be associated to the same action instance if each has beenupdated within the given threshold. To handle the situation of an overlap oftwo action instances where an object could be in a position to be associated witheither instance of the action, a search for the first and last time stamp in theset will be conducted, starting from the known point. If the update durationbetween the oldest object and the newest object is greater than the definedaction’s object update threshold, then at least two instances of the action musthave happened. For example, an artifact t2 is observed. t2 is associated to anaction whose signature consists of the set {t1, t2, t3}, and whose threshold (θ) is60 seconds. If t1 = 12:59:30, t2 = 13:00:00, and t3 = 13:00:58 then t2−t1 < θ andt3−t2 < θ. In this case t1 and t3 could be associated to the same action instancesince θ < 60 when compared to t2, even though θ < t3− t1 = 88. To handle thissituation, when an object is found, all existing time stamps associated to theaction instance are observed and sorted. The oldest time stamp is then used asthe base from where all other returned time stamps are compared. If any timestamp is greater than θ from the oldest time stamp, then multiple instances ofthe action must have happened.

15

Figure 6: Histogram of FF3 execution times where X is time in seconds and Yis the number of occurrences within the update duration.

3.4 Signature Analysis Model

The proposed signature analysis model for detecting actions uses the previouslydefined classes of signatures in a layered approach to build up knowledge ofactions that have happened in a system. To illustrate, a fictional example ofthis approach is given:

An action, ActionX, has a Core signature (SXCore) consisting of two timestamps, and a Supporting signature (SXSupport) that has three associated timestamps. All object update thresholds are defined as 30 seconds.

SXCore = {[(o1, t1), (o2, t2)], 30sec., Core}

SXSupport = {[(o3, t3), (o4, t4), (o5, t5)], 30sec.,Supporting}

The Shared signature (SXShared) forActionX has two associated time stamps.Both of these time stamps are also associated with another action, ActionY .The Shared signature for ActionY is denoted as SYShared.

SXShared = {[(o6, t6), (o7, t7)], 30sec., Shared}

SYShared = {[(o6, t6), (o7, t7)], 30sec., Shared}

The function SignatureMatch(O′, S) takes the system and signature as in-put, and returns the value of the observed action instance update time-span.

SignatureMatch(O′, SXCore) returns

16

ActionX ActionYCore 4/14/2010 19.28:25Core 4/14/2010 19:28:32Support 4/14/2010 15:13:25Support 4/14/2010 19:28:18Support 4/14/2010 19:28:34Shared 4/14/2010 19:28:25 4/14/2010 19:28:25Shared 05/02/2010 09:45 05/02/2010 09:45

Table 1: Summary of example time stamps related to ActionX.

{[“4/14/2010 19 : 28 : 25′′, “4/14/2010 19 : 28 : 32′′]}

SignatureMatch(O′, SXSupport) returns

{[“4/14/2010 15 : 13 : 25′′], [“4/14/2010 19 : 28 : 18′′, “4/14/2010 19 : 28 :34′′]}

SignatureMatch(O′, SXShared) returns

{[“4/14/2010 19 : 28 : 25′′], [“5/2/2010 9 : 45 : 02′′]}

SignatureMatch(O′, SYShared) returns

{[“4/14/2010 19 : 28 : 25′′], [“5/2/2010 9 : 45 : 02′′]}

The result of this detection process is summarized in Table 1.Since Core signature traces are always updated and relate only to ActionX,

it can be inferred that ActionX last happened approximately at 4/14/201019:28:25. Both ActionX Core timestamps are within θ, so the traces are con-sistent.

With the knowledge of the last execution time of ActionX, the Supportingsignature may now provide more information. In this case, two supportingtraces confirm the last execution time (t3 and t4). Traces in the Supportingsignature may not always be updated, as is shown by the supporting trace (t5)with a time stamp of 4/14/2010 15:13:25. This trace is consistent since the timeis before the identified last execution time (t1). Also, since Supporting tracesare associated only with one action, a previous execution of ActionX must havehappened at this time.

Finally, Shared traces are examined. Each trace is associated with bothActionX and ActionY . The first shared trace (t6) has a time stamp that iswithin the last execution time of ActionX; however, ActionY could have alsohappened at this time. Calculating the probability of one trace belonging to aparticular action has been discussed in [20, 21], but is beyond the scope of thiswork. Because of this, no conclusion can be made. The next trace, however,has a time that is after the detected last execution time (t1) of ActionX. Sincethis trace is associated only with ActionX or ActionY , it can be inferred that

17

ActionX ActionYLast Execution 4/14/2010

19.28:18Previous Execution 4/14/2010

15:13:2505/02/201009:45

Table 2: Known action instance approximation times after signature analysis.

Figure 7: Graph of objects in T related to ActionX grouped by θ, that showstwo distinct instances of ActionX.

the trace (t7) must belong to ActionY since it is not consistent with the infor-mation known about ActionX. An instance of ActionY must have happenedat approximately 5/2/2010 9:45:02, to be consistent with ActionX.

After this analysis, action instance approximations may be given as shownin Table 2.

The time stamps that are known to relate only to ActionX are shown inFig. 7. The times are grouped, where θ = 30 seconds. In the case of Core andSupporting signatures, where the traces are related only to ActionX, the mostrecent, as well as past executions of the action can be inferred.

This example illustrates that by layering multiple observations more infor-mation about previous executions of actions can be automatically inferred. Also,by building on already detected information, inferences about other non-relatedactions may be made. Evaluation of this method will be presented in a casestudy, where the process is applied to detect actions in a real environment.

4 Case Study

To test the proposed method, example signatures were created for the actions“Open Internet Explorer 8” and “Open Firefox 3.6” on Windows XP. WindowsXP was chosen since, at the time of data collection, it was still the most fre-quently encountered operating system of surveyed law enforcement [17]. Forbrevity, only Core and Supporting objects will be analyzed. The tested signa-tures are defined as regular expressions1 to support portability, and are listedin Table 3 and Table 4. As previously shown, θ for FF3 is 50 seconds, and θ forIE8 is 61 seconds.

1For more information on Regular Expressions, see http://www.bsd.org/regexintro.html

18

Category TimeStamp

Objects related to ‘OpeningFF3’

Core Modified .*/Firefox/Profiles/.*default/urlclassifierkey.\.txt

Core Modified .*/Prefetch/Firefox\.EXE-.*\.pf

Support Created .*/Prefetch/Firefox\.EXE-.*\.pf

Support Created .*/Firefox/Profiles/.*default/cookies.sqlite-journal

Support Created .*/Firefox/Profiles/.*\default/urlclassifierkey.\.txt

Support Created .*/Firefox/Profiles/ .*de-fault/startupCache$

Support Created .*/Firefox/Profiles/ .*de-fault/pluginreg.dat

Table 3: FF3 objects, categories and corresponding time stamp of interest.

Category TimeStamp

Objects related to ‘OpeningIE8’

Core Modified .*/Prefetch/IEXPLORE\.EXE-.*\.pf

Support Created .*/Prefetch/IEXPLORE\.EXE-.*\.pf

Support Created .*/Cookies/.*@ATDMT\[[0-9]\]\.TXT

Support Created .*/Cookies/.*@BING\[[0-9]\]\.TXT

Support Created .*/Cookies/.*@live\[[0-9]\]\.TXT

Table 4: IE8 objects, categories and corresponding time stamp of interest.

19

The case consists of two computers running Windows XP with both IE8 andFF3 installed. Each computer was used daily for entertainment, work and studytasks. Both users identified that they used Firefox as their primary browser. Toaccurately determine when IE8 and FF3 have been opened and closed, a Win-dows Security auditing policy was implemented on both computers to monitorprocess creation and executable access. Each computer was monitored for anumber of days, after which the Windows security event log was exported andthe computer’s file system meta-data was collected with tools from The SleuthKit version 3.2.2. The resulting Windows Security Logs and meta-data outputsare available as a downloadable dataset [22].

On ‘Computer 1’, 12 instances of opening FF3 from the 19th to the 24thwere identified from the Windows event log, and 6 instances of opening IE8 wereidentified. The meta-data from Computer 1 was scanned using the previouslydefined signature for opening FF3. The identified objects and associated timestamps are shown in Table 5.

In this case, all Core objects were discovered. However, one Core object hada time stamp that was different than another Core object. This unexpectedbehavior can be explained by looking at the Open and Close times from theWindows event log. The Firefox open event with the process ID 4284 occurredat 13:23, and was never followed by a process close event. While process 4284was still open, another instance of Firefox was started, process 5480, at 15:02.If process 4284 had locked the object in question, then the time stamp may notbe updated upon another instance of the action. However, T2 was not lockedby the first event, and was updated. Since both objects must be updated whenthe action happens, this must mean that two instances of the same action mustbe running in parallel, otherwise both artifacts would be updated.

Next, the meta-data from Computer 1 was scanned using the previouslydefined signature for opening IE8. The identified objects and associated timestamps are shown in Table 6. In this case, all Core objects were detected,identifying the most recent execution of IE8 as happening at approximately14:56 on 07/23/2011. All other associated objects had timestamps before thistime.

On Computer 2, 14 instances of opening FF3 from the 13th to the 17th wasidentified from the Windows event log, and two instances of opening IE8 wereidentified. The meta-data from Computer 2 was scanned using the previouslydefined signature for opening FF3. The identified objects and associated timestamps are shown in Table 7. In this case all Core artifacts were detected, iden-tifying the most recent execution of FF3 as approximately 20:24 on 07/17/2011.All other associated objects had time stamps before this time.

Next, the meta-data from Computer 2 was scanned using the previouslydefined signature for opening IE8. The identified objects and associated timestamps are shown in Table 8. In this case, all Core artifacts were detected,setting the most recent execution of IE8 at approximately 15:15 on 07/17/2011.All other associated objects had time stamps before this time.

20

Category Time Returned ObjectCore 07/24/2011

13:24:14C:/Documents and Set-tings/User1/ ApplicationData/Mozilla/Firefox/Profiles/94370b5u.default/urlclassifierkey3.txt

Core 07/24/201115:02:31

C:/Windows/Prefetch/Firefox.exe-28641590.pf

Support 12:26/201004:26:24

C:/Windows/Prefetch/Firefox.exe-28641590.pf

Support 07/24/201113:24:10

C:/Documents and Set-tings/User1/ ApplicationData/Mozilla/Firefox/Profiles/94370b5u.default/cookies.sqlite-journal

Support 01/05/201123:15:34

C:/Documents and Set-tings/User1/ ApplicationData/Mozilla/Firefox/Profiles/94370b5u.default/urlclassifierkey3.txt

Support N/A .*/Firefox/Profiles/ .*de-fault/startupCache$

Support 12/26/201003:04:55

C:/Documents and Set-tings/User1/ ApplicationData/Mozilla/Firefox/Profiles/94370b5u.default/pluginreg.dat

Table 5: FF3 objects and associated timestamps found using signature detectionon Computer 1.

21

Category Time Returned ObjectCore 07/23/2011

14:56:53C:/Windows/Prefetch/Iexplore.exe-27122324.pf

Support 07/19/201100:57:22

C:/Windows/Prefetch/Iexplore.exe-27122324.pf

Support 06/14/201110:47:26

C:/Documents andSettings/User1/ Cook-ies/user1@atdmt[2].txt

Support 01/11/201119:40:26

C:/Documents andSettings/User1/ Cook-ies/user1@bing[2].txt

Support 06/14/201110:47:26

C:/Documents andSettings/User1/ Cook-ies/user1@live[1].txt

Table 6: IE8 objects and associated time stamps found using signature detectionon Computer 1.

4.1 Evaluation

In this case study signatures were used to detect Core and Supporting objects.After, objects within the action’s object update threshold were grouped andconsidered related to the same action instance. The result is a list of objecttimestamps related to a particular action of a particular computer. The resultsof the previous signature detection, where the Windows Event log confirms alldetected timestamps, are given in Table 9.

Table 9 shows that the most recent execution, as well as at least one pastinstance of Firefox was detected on both computers. Further, at least the mostrecent execution of Internet Explorer was detected on both computers.

Detecting more previous instances of Firefox than Internet Explorer may be aresult of both users using Firefox as their primary browsers, meaning that theremay have been fewer instances of Internet Explorer during the experiment. Inall cases, however, the most recent instance of the action was always accuratelydetected.

4.1.1 Further Implementation

This case study was meant to illustrate the described theory. The researchersare currently implementing the proposed model in a tool designed for digital in-vestigators, and are applying it to anti-forensic action detection2. For signaturecreation, specific actions of interest are chosen, and a sandbox is used to extractobject updates in a target system. Object update ‘traces’ are categorized basedon the given model, and relevant traces are added as signatures for actions ofinterest. The tool then accepts a suspect disk image (or live computer), and

2The open source tool implementing the proposed theory can be found athttp://github.com/hvva/IoAF

22

Category Time Returned ObjectCore 07/17/2011

20:24:26C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/Profiles/c2yzki95.default/urlclassifierkey3.txt

Core 07/17/201120:24:18

C:/Windows/Prefetch/Firefox.exe-28641590.pf

Support 05/21/201016:15:23

C:/Windows/Prefetch/Firefox.exe-28641590.pf

Support 05/14/201016:44:22

C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/Profiles/c2yzki95.default/cookies.sqlite-journal

Support 10/23/201011:26:02

C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/ Pro-files/c2yzki95.default/cookies.sqlite-journal(deleted)

Support 04/13/201100:33:49

C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/Profiles/c2yzki95.default/urlclassifierkey3.txt

Support 07/17/201115:23:05

C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/Profiles/c2yzki95.default/startupCache

Support 07/17/201100:46:36

C:/Documents and Set-tings/user/ ApplicationData/Mozilla/Firefox/Profiles/c2yzki95.default/pluginreg.dat

Table 7: FF3 objects and associated time stamps found using signature detec-tion on Computer 2.

23

Category Time Returned ObjectCore 07/17/2011

15:15:13C:/Windows/Prefetch/Iexplore.exe-27122324.pf

Support 07/17/201115:15:09

C:/Windows/Prefetch/Iexplore.exe-27122324.pf

Support 03/10/201115:01:01

C:/Documents andSettings/User1/ Cook-ies/user1@atdmt[1].txt

Support 03/10/201115:38:37

C:/Documents andSettings/User1/ Cook-ies/user1@bing[2].txt

Support 03/10/201115:38:37

C:/Documents andSettings/User1/ Cook-ies/user1@live[2].txt

Table 8: IE8 objects and associated time stamps found using signature detectionon Computer 2.

Computer Action LoggedTime

DetectedTime

Computer 1 OpenFF3

07/24/201115:02:30

07/24/201115:02:31

Computer 1 OpenFF3

07/24/201113:23:58

07/24/201113:24:14

Computer 2 OpenFF3

07/17/201120:24:14

07/17/201120:24:18

Computer 2 OpenFF3

07/17/201115:23:01

07/17/201115:23:05

Computer 2 OpenFF3

07/17/201100:46:14

07/17/201100:46:36

Computer 1 OpenIE8

07/23/201114:56:46

07/23/201114:56:53

Computer 2 OpenIE8

07/17/201115:15:06

07/17/201115:15:09

Table 9: Detected executions of an action based on signature matching overmeta-data separated by a measured object update threshold, where the Win-dows Security Log confirms each detected time.

24

matches generated signatures with extracted suspect meta-data to reconstructaction instances in the suspect system.

4.1.2 Weaknesses

The greatest weakness with this method is the same weakness in all signature-based detection methods. Not all possible actions can be known, and unknownactions will not be considered. This lack of complete knowledge is especiallyrelevant in the case of action instance signature generation. If not all possibleactions are known, then it is impossible definitely determine a trace’s category.The answer to this weakness comes from the investigator. Human knowledgeis also incomplete, yet human investigators are able to state that actions in asystem must have happened based on their knowledge of how the system works.Human investigators also update their knowledge based on new experiences.Signature-based action instance detection methods must be able to be updatedwhen new information that leads to changing a trace’s category is found. Evenif action signatures are being updated, it is still not possible to account forevery piece of custom-made software. For this reason, this method is moresuitable as a pre-analysis inference guide for the human investigator or for post-examination human inference verification rather than for completely automatedinvestigations.

Another consideration is when an incident has multiple relevant actions.While some actions can be differentiated by using Core, Supporting and Sharedobjects, if relevant actions happen within a very short time of each other – asthe formal model shows – relating the object to the specific action is impossible.The final state of the system simply does not contain enough information todifferentiate two similar action instances happening at approximately the sametime. Further, this method can help detect action instances, but in most casesit will not help with the problem of attribution, unless a specific, attributablepattern is detectable. For example if malicious software creates detectable objectupdate patterns in the system.

5 Conclusions

In this work, the concept of signature detection of actions was discussed. Threecategories of signatures were formally defined that, based on their unique updatepatterns, allow for the detection of the most recent as well as past action instanceapproximations. A method for differentiating between two instances of thesame action within a very short time range was also given. After, the signatureanalysis method was illustrated with an example. Finally, a case study using theproposed signature analysis method on two real-world computers was examined.The case study showed good results in detecting the most recent instance of theaction, and did give more information about past action instances. However,the method did not detect every past instance of the action, but it also did notgive false positives. Overall, by using the proposed method to automatically

25

approximate action instances and display them in a timeline, an investigatorcan very quickly get an idea of how, and when, the system has been usedregardless of their knowledge of the system or specific data being analyzed.

Future work will look at improving the action instance extraction by in-tegrating more objects/content into the signature, such as Windows Registryand system log entries. Further, much like intrusion detection systems, a hy-brid statistical and signature-based approach may help reduce signature-basedweaknesses while still maintaining a high level of accuracy. And finally, futurework will attempt to monitor a system for a longer period of time to determineif detected times very far in the past correspond with actual action instances.

References

[1] Gary Palmer. DFRWS Technical Report: A Road Map for Digital ForensicResearch, 2001.

[2] Eoghan Casey. Digital forensics: Coming of age. Digital Investigation,6(1-2):1–2, 2009.

[3] Simson L Garfinkel. Digital forensics research: The next 10 years. DigitalInvestigation, 7(Supplement 1):S64–S73, 2010.

[4] Greg Gogolin. The Digital Crime Tsunami. Digital Investigation, 7(1-2):3–8, 2010.

[5] E Casey, M Ferraro, and L Nguyen. Investigation Delayed Is Justice De-nied: Proposals for Expediting Forensic Examinations of Digital Evidence*.Journal of forensic sciences, 54(6):1353–1364, 2009.

[6] Jeff Raasch. Child porn prosecutions delayed by backlog of cases, 2010.

[7] BBC. Police ’need more e-crime skills’, 2004.

[8] InfoSecurity. Digital forensics in a smarter and quicker way?, 2009.

[9] Don Kohtz. Dealing with the Digital Evidence Backlog, 2011.

[10] P Gladyshev and A Patel. Finite state machine approach to digital eventreconstruction. Digital Investigation, 1(2):130–149, 2004.

[11] B D Carrier and E H Spafford. A hypothesis-based approach to digitalforensic investigations. CERIAS, PhD, 2006.

[12] A R Arasteh, M Debbabi, A Sakha, and M Saleh. Analyzing multiple logsfor forensic evidence. Digital Investigation, 4:82–91, 2007.

[13] J James, P Gladyshev, M T Abdullah, and Y Zhu. Analysis of EvidenceUsing Formal Event Reconstruction. Digital Forensics and Cyber Crime,31:85–98, 2010.

26

[14] A Marrington, G Mohay, H Morarji, and A Clark. A Model for ComputerProfiling. pages 635–640. IEEE, 2010.

[15] M N A Khan and I Wakeman. Machine Learning for Post-Event TimelineReconstruction. pages 112–121, 2006.

[16] M N A Khan, C R Chatwin, and R C D Young. A framework for post-event timeline reconstruction using neural networks. Digital Investigation,4(3-4):146–157, 2007.

[17] J James. Survey of Evidence and Forensic Tool Usage in Digital Investiga-tions , 2010.

[18] Peter Menzies. Counterfactual Theories of Causation, 2008.

[19] Joshua I. James, Pavel Gladyshev, and Yuandong Zhu. Signature Based De-tection of User Events for Post- Mortem Forensic Analysis. Digital Foren-sics and Cyber Crime, 53:96–109.

[20] Megan Carney and Marc Rogers. The Trojan made me do it: A first stepin statistical based computer forensics event reconstruction. InternationalJournal of Digital Evidence, 2(4):1–11, 2004.

[21] M Kwan, K P Chow, F Law, and P Lai. Reasoning about evidence usingBayesian networks. Advances in Digital Forensics IV, pages 275–289, 2008.

[22] J I James. Internet Explorer and Firefox User Activity Dataset, 2013.

27