predicting cyber-attacks using hawkes processes · data breaches dataset autocorrelation of the...

Alexandre Boumezoued, Milliman R&D

OICA, 29 April 2020

Joint work with Caroline Hillairet (ENSAE) and Yannick Bessy-Roland (AXA GIE)

Predicting Cyber-Attacks using Hawkes Processes

Agenda

1 Introduction

2 Data breaches dataset

3 Hawkes model

4 Fitting and prediction

5 References

Introduction

Example of types of cyber attacks

• Definition: The risk of an attack on digital data as well as the consequences on the information system

• Zero day: Attack that occurs on the same day a weakness is discovered in software. At that point, the weakness is exploited before a fix becomes available from the software's creator

• Man in the middle: The attacker secretly relays and possibly alters the communications between two parties who believe they are directly communicating with each other

• Phishing: The attacker sends a document that appears reliable (mainly by e-mail) in order to collect sensitive information

• Malware: Software that is specifically designed to disrupt, damage, or gain unauthorized access to a computer system

• Denial of service: Attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by disrupting services of a host connected to the Internet

Introduction

Cyber insurance covers

Cyber insurance: Damage, Crisis management, Third party

Crisis management:
• Costs of investigation
• Costs of assistance
• Other costs of crisis management

Damage:
• Data cleaning
• Data restoration
• Payment of ransom
• Operating loss

Third party liability:
• Virus transmission
• Personal liability insurance
• Denial of service


Data breaches dataset

The Privacy Rights Clearinghouse database

Database
• Description: A public database that contains 8871 data breaches in the US over the period 2005-2019
• https://www.privacyrights.org/data-breaches

Covariates (those used in the results presented are highlighted in blue)
• Name of the covered entity
• Type of the covered entity: Healthcare provider, businesses…
• Total records breached
• Date Made Public: day/month/year
• Type of breach: « Hacking / IT incident », « Theft »
• Location of the breached entity: state/city

Data breaches dataset

Dataset description: types of breaches
• A majority of Theft/Loss and Hacking/Malware
• 21% of Unintended disclosure

Data breaches dataset

Dataset description: types of targets
• A majority in Healthcare/Medical
• Businesses are well represented too

Data breaches dataset

Descriptive statistics over 2010-2018

Data breaches dataset

Cyber attacks frequencies by type and organization
• Apparent clustering by type of attacks
• Deterministic trends or stochastic regimes?
• Apparent clustering by type of organization attacked
• No clear trends

Data breaches dataset

Autocorrelation of the number of incidents

Regression of the number of events during the following month $t + 1$ on the number of events during the current month $t$. The two counts should be independent for a Poisson process model to be valid.

R-squared values reported on the regression panels: 0.726 (95% confidence interval [0.687, 0.766]), 0.718, 0.154 and 0.780

Autocorrelation dramatically increases when focusing on attacks and/or organizations of the same type
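The month-on-month regression described on this slide can be sketched as follows. This is only an illustration: the function name `lag1_r_squared` and the monthly counts are hypothetical and are not taken from the Privacy Rights Clearinghouse data. For a simple linear regression, the R-squared equals the squared correlation between the counts in month t and in month t + 1.

```python
def lag1_r_squared(counts):
    """R-squared of the linear regression of counts[t + 1] on counts[t]."""
    x = counts[:-1]   # number of events in month t
    y = counts[1:]    # number of events in month t + 1
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0    # constant series: no linear relationship to measure
    return sxy ** 2 / (sxx * syy)


# Hypothetical monthly counts (illustration only, not from the dataset)
monthly_counts = [12, 15, 30, 28, 9, 11, 35, 33, 10, 14]
print(round(lag1_r_squared(monthly_counts), 3))
```

Under a homogeneous Poisson model the counts of successive months are independent, so a value close to zero would be expected on a long series.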


Hawkes model

Choice of the Hawkes model

Taking into account autocorrelation:
• Cox model: Poisson model with stochastic intensity → the dynamics of the stochastic intensity are difficult to specify
• Hawkes model: Self-exciting model with stochastic intensity, fully specified by the point process itself

Choice of the Hawkes model:
• Self-excitation: every event increases the probability that a new event occurs within a given group (same organization or attack type)
• Clustering: the self-exciting property makes it possible to model cluster effects (groups of attacks with the same origin)
• Inter-excitation: in the case of a multi-dimensional Hawkes process, every attack in one group increases the occurrence probability of new events in the other groups

Related references: Peng et al. (2017), Baldwin et al. (2017)

Hawkes model

Univariate Hawkes process

A Hawkes process with exponential kernel is a counting process $N_t = \sum_{n \ge 1} \mathbf{1}_{T_n \le t}$ with intensity:

$$\lambda(t) = \mu(t) + \sum_{T_n < t} \alpha \exp(-\beta (t - T_n))$$

• $\mu : \mathbb{R}_+ \to \mathbb{R}_+$ is a deterministic baseline intensity
• The sum represents the impact of past events; it captures the self-excitation property

[Figure: sample paths of $\lambda(t)$ and $N_t$. Each jump represents an attack; clustering phenomena are visible; the intensity decreases exponentially between jumps]
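The intensity of the univariate Hawkes process with exponential kernel can be evaluated directly from its definition. The sketch below is a minimal illustration; the function name `hawkes_intensity`, the constant baseline and the parameter values are assumptions chosen for the example, not calibrated values.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """lambda(t) = mu(t) + sum over T_n < t of alpha * exp(-beta * (t - T_n))."""
    excitation = sum(
        alpha * math.exp(-beta * (t - t_n))
        for t_n in event_times
        if t_n < t   # only strictly past events contribute
    )
    return mu(t) + excitation


def baseline(t):
    """Constant baseline intensity (illustrative assumption)."""
    return 1.0


# Illustrative event times and parameters (not calibrated on any data)
events = [0.5, 0.7, 2.0]
for t in (0.4, 0.8, 1.9, 2.1):
    print(t, round(hawkes_intensity(t, events, baseline, alpha=0.8, beta=2.0), 4))
```

The intensity equals the baseline before the first event, jumps by alpha right after each event, and decays back towards the baseline at rate beta.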

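Hawkes model

Multivariate Hawkes process - presentation

A multivariate Hawkes process makes it possible to model interactions between types of entities/attacks/states:

$(N_t^{(1)})_{t \ge 0}, \dots, (N_t^{(K)})_{t \ge 0}$ are $K$ counting processes with jump times $(T_n^{(1)})_{n \ge 1}, \dots, (T_n^{(K)})_{n \ge 1}$

The intensity process with exponential kernel of the counting process $(i)$ is defined as:

$$\lambda_i(t) = \mu_i(t) + \sum_{j=1}^{K} \sum_{T_n^{(j)} < t} \alpha_{i,j} \exp\left(-\beta_{i,j} (t - T_n^{(j)})\right)$$

• $\alpha_{i,j}$: impact of Group $j$ on Group $i$
• $\alpha_{1,1}$: Group 1 self-excitation
• $\alpha_{1,2}$: impact of Group 2 on Group 1

Matrix of excitation:

$$\alpha = \begin{pmatrix} \alpha_{1,1} & \alpha_{1,2} \\ \alpha_{2,1} & \alpha_{2,2} \end{pmatrix} = \begin{pmatrix} 0.0 & 0.99 \\ 0.0 & 0.90 \end{pmatrix}$$

• Group 2 is purely self-excited
• Group 1 is fully influenced by Group 2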
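The two-group example can be checked numerically with a short sketch of the multivariate intensity. The excitation matrix is the one given on the slide; the decay rates `beta`, the constant baselines `mu` and the event times are assumptions made only for illustration, and groups are indexed from 0 in the code (Group 1 is index 0, Group 2 is index 1).

```python
import math


def multivariate_intensity(i, t, events, mu, alpha, beta):
    """Intensity of group i at time t; events[j] lists the jump times of group j."""
    total = mu[i](t)
    for j, times in enumerate(events):
        for t_n in times:
            if t_n < t:   # only strictly past events contribute
                total += alpha[i][j] * math.exp(-beta[i][j] * (t - t_n))
    return total


# Excitation matrix from the two-group example
alpha = [[0.0, 0.99],
         [0.0, 0.90]]
# Assumed values, not given in the source
beta = [[1.0, 1.0],
        [1.0, 1.0]]
mu = [lambda t: 0.1, lambda t: 0.1]

only_group1 = [[1.0], []]   # a single event in Group 1 at time 1.0
only_group2 = [[], [1.0]]   # a single event in Group 2 at time 1.0

for label, events in (("event in Group 1", only_group1), ("event in Group 2", only_group2)):
    values = [multivariate_intensity(i, 1.5, events, mu, alpha, beta) for i in range(2)]
    print(label, [round(v, 4) for v in values])
```

An event in Group 1 leaves both intensities at their baseline, while an event in Group 2 raises both, which matches the reading of the matrix: Group 2 is purely self-excited and Group 1 is driven by Group 2.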

Hawkes model

Multivariate Hawkes process - kernels

« Classical » exponential kernel: $\phi_{i,j}(s) = \alpha_{i,j} \exp(-\beta_{i,j} s)$
• Instantaneous excitation
• Complexity: we assume all $\beta_{i,j}$ (excitation memory of $i$ from $j$) depend on the groups and are to be calibrated
• Intensity process: the intensity process is not Markov (for dimension ≥ 2)

Kernel with delay: $\phi_{i,j}(s) = \alpha_{i,j} s \exp(-\beta_i s)$
• Complexity: we assume all $\beta_i$ (excitation memory of $i$ from any other group) depend on the groups and are to be calibrated
• Intensity process: the intensity process is not Markov (even in dimension 1)

Development of closed-form formulas for the expected number of claims for such multivariate Hawkes processes
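The difference between the two kernel shapes can be made concrete with a small sketch. The parameter values are illustrative assumptions. The exponential kernel is largest at s = 0 (instantaneous excitation), whereas the kernel with delay starts at zero and, by setting its derivative to zero, reaches its maximum alpha / (beta * e) at s = 1 / beta.

```python
import math


def exp_kernel(s, alpha, beta):
    """Classical exponential kernel: alpha * exp(-beta * s)."""
    return alpha * math.exp(-beta * s)


def delay_kernel(s, alpha, beta):
    """Kernel with delay: alpha * s * exp(-beta * s)."""
    return alpha * s * math.exp(-beta * s)


alpha, beta = 0.5, 2.0               # illustrative values, not calibrated
peak_time = 1.0 / beta               # where the delayed excitation is maximal
peak_value = alpha / (beta * math.e)

print("exponential kernel at s = 0:", exp_kernel(0.0, alpha, beta))
print("delay kernel at s = 0:", delay_kernel(0.0, alpha, beta))
print("delay kernel peak:", peak_time, round(peak_value, 4))
```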


Fitting and prediction

Data Grouping

• Crossing variables: attack type, sector, state
• Groups with more than 200 attacks are retained; the remaining attacks are pooled in an OTHER group
• Total: six segments
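The grouping rule can be sketched as follows. The function name `assign_segments`, the label values and the small threshold used in the example are hypothetical; only the rule itself (cross the variables, keep cells with more than 200 attacks, pool the rest in OTHER) comes from the slide.

```python
from collections import Counter


def assign_segments(cells, min_count=200):
    """Keep a crossed cell as its own segment if it has more than min_count attacks, else 'OTHER'."""
    counts = Counter(cells)
    return [cell if counts[cell] > min_count else "OTHER" for cell in cells]


# Hypothetical (attack type, sector, state) cells, one per breach
cells = [("HACK", "MED", "CA")] * 3 + [("THEFT", "BUS", "NY")]

# A threshold of 2 is used here only to keep the example small
segments = assign_segments(cells, min_count=2)
print(Counter(segments))
```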

Fitting and prediction

Model specification

Baseline intensity: $\mu_i(t) = \mu_{0,i} + \gamma_i t$ to account for trends in the dataset

Possibility to add a Lasso penalty (not discussed here):

$$L_{\text{penalized}}(\mu, \phi) = L(\mu, \phi) - \nu \sum_{1 \le i,j \le d} |\alpha_{i,j}|$$

• Reduce complexity
• Improve prediction capacity

The three Hawkes kernels considered:

                   Kernel 1 (exp)                     Kernel 2 (exp)                        Kernel 3 (delay)
$\phi_{i,j}(s)$    $\alpha_{i,j} \exp(-\beta_i s)$    $\alpha_{i,j} \exp(-\beta_{i,j} s)$   $\alpha_{i,j} s \exp(-\beta_i s)$
Nb parameters      54                                 84                                    54
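The parameter counts are consistent with d = 6 groups (the six segments of the data grouping step), under the assumption that each group contributes two baseline parameters (mu_0 and gamma), each ordered pair of groups contributes one alpha, and the decay rates are either one per group or one per pair. The helper `n_parameters` below is a hypothetical name used only for this check.

```python
def n_parameters(d, beta_per_pair):
    """Parameter count for d groups under the assumed decomposition."""
    baseline = 2 * d                          # mu_{0,i} and gamma_i for each group
    excitation = d * d                        # alpha_{i,j} for each ordered pair
    decay = d * d if beta_per_pair else d     # beta_{i,j} (kernel 2) or beta_i (kernels 1 and 3)
    return baseline + excitation + decay


d = 6
print(n_parameters(d, beta_per_pair=False),   # kernel 1
      n_parameters(d, beta_per_pair=True),    # kernel 2
      n_parameters(d, beta_per_pair=False))   # kernel 3
# prints: 54 84 54
```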

Fitting and prediction

Calibration results

                          Kernel 1                           Kernel 2                              Kernel 3 (delay)
$\phi_{i,j}(s)$           $\alpha_{i,j} \exp(-\beta_i s)$    $\alpha_{i,j} \exp(-\beta_{i,j} s)$   $\alpha_{i,j} s \exp(-\beta_i s)$
Nb parameters             54                                 84                                    54
-Likelihood (2011-2015)   6513                               6172                                  6153
-Likelihood (2011-2016)   7639                               7516                                  7485

• Likelihood: kernel 3 with delay best fits the data
• Adequacy tests (Kolmogorov-Smirnov): adequacy is satisfactory, except for group (4)

[Figure: parameter estimates (kernel 3)]
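The fit comparison relies on the likelihood of the observed event times. The sketch below shows the standard point-process log-likelihood (sum of log-intensities at the event times minus the integrated intensity over the observation window) in the simplest setting: one group, a constant baseline and an exponential kernel. This setting is an assumption made for brevity; it is not the multivariate model with trend behind the reported values, and the function name and inputs are illustrative.

```python
import math


def hawkes_log_likelihood(event_times, horizon, mu, alpha, beta):
    """Log-likelihood on [0, horizon] for sorted event times (constant mu, exponential kernel)."""
    log_intensity_sum = 0.0
    decayed = 0.0      # sum over earlier events T_m of exp(-beta * (T_n - T_m))
    previous = None
    for t_n in event_times:
        if previous is not None:
            # Recursive update of the excitation sum between consecutive events
            decayed = math.exp(-beta * (t_n - previous)) * (1.0 + decayed)
        log_intensity_sum += math.log(mu + alpha * decayed)
        previous = t_n
    # Integral of the intensity over [0, horizon]
    compensator = mu * horizon + (alpha / beta) * sum(
        1.0 - math.exp(-beta * (horizon - t_n)) for t_n in event_times
    )
    return log_intensity_sum - compensator


# Illustrative event times and parameters (not from the dataset)
times = [0.3, 0.4, 1.7, 1.8, 1.9]
print(round(hawkes_log_likelihood(times, horizon=3.0, mu=1.0, alpha=0.8, beta=2.0), 4))
```

With alpha = 0 the expression reduces to the log-likelihood of a homogeneous Poisson process, which gives a quick sanity check.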

Fitting and prediction

Calibration analysis

• Same orders of magnitude
• Partly captures the historical trend
• Captures the baseline non-excited intensity
• Captures the major self/external interactions

Matrix of ratios between maximal excitation and baseline intensity:
• Strong self-excitation in these MED groups
• Strong reciprocal self-excitation of groups (2) and (4)
• «Causal» excitation of (2) by (5)

Fitting and prediction

Out-of-sample prediction results for 2017 (kernel 3)

Simulation based on the thinning algorithm for point processes

Predictions with mean and (0.5%, 99.5%) percentiles

Joint prediction of all groups capturing the causal and asymmetric interactions

Parameter uncertainty can be added
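Simulation by thinning can be sketched in the univariate case. The code below is a minimal illustration, assuming a constant baseline and an exponential kernel with made-up parameter values; it is not the multivariate delay-kernel model used for the 2017 predictions. Between events the intensity only decays, so its current value is a valid upper bound: candidate times are drawn at that rate and accepted with probability equal to the ratio of the true intensity to the bound. Repeating the simulation gives a mean and empirical (0.5%, 99.5%) percentiles of the number of events.

```python
import math
import random


def simulate_hawkes_thinning(mu, alpha, beta, horizon, rng):
    """Simulate event times on (0, horizon] by thinning (constant mu > 0, exponential kernel)."""
    events = []
    t = 0.0
    excitation = 0.0   # value of the self-exciting sum just after time t
    while True:
        upper_bound = mu + excitation          # the intensity cannot exceed this before the next event
        wait = rng.expovariate(upper_bound)    # candidate inter-arrival time
        t += wait
        if t > horizon:
            return events
        excitation *= math.exp(-beta * wait)   # decay of past excitation up to the candidate time
        if rng.random() * upper_bound <= mu + excitation:
            events.append(t)                   # candidate accepted: a new event occurs
            excitation += alpha                # each event adds alpha to the intensity


# Illustrative parameters (not calibrated); seed fixed for reproducibility
rng = random.Random(0)
counts = sorted(
    len(simulate_hawkes_thinning(mu=1.0, alpha=0.5, beta=1.0, horizon=12.0, rng=rng))
    for _ in range(2000)
)
mean_count = sum(counts) / len(counts)
low = counts[int(0.005 * len(counts))]     # simple order-statistic estimate of the 0.5% percentile
high = counts[int(0.995 * len(counts))]    # simple order-statistic estimate of the 99.5% percentile
print(round(mean_count, 2), low, high)
```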


References

Baldwin, A., Gheyas, I., Ioannidis, C., Pym, D., & Williams, J. (2017). Contagion in cyber security attacks. Journal of the Operational Research Society, 68(7), 780-791.

Böhme, R., & Kataria, G. (2006, June). Models and Measures for Correlation in Cyber-Insurance. In WEIS.

Boumezoued, A. (2016). Population viewpoint on Hawkes processes. Advances in Applied Probability, 48(2), 463-480.

Daley, D. J., & Vere-Jones, D. (2007). An introduction to the theory of point processes: volume II: general theory and structure. Springer Science & Business Media.

Edwards, B., Hofmeyr, S., & Forrest, S. (2016). Hype and heavy tails: A closer look at data breaches. Journal of Cybersecurity, 2(1), 3-14.

Hawkes, A. G. (1971). Spectra of some self-exciting and mutually exciting point processes. Biometrika, 58(1), 83-90.

Hawkes, A. G., & Oakes, D. (1974). A cluster process representation of a self-exciting process. Journal of Applied Probability, 493-503.

Oakes, D. (1975). The Markovian self-exciting process. Journal of Applied Probability, 69-77.

Peng, C., Xu, M., Xu, S., & Hu, T. (2017). Modeling and predicting extreme cyber attack rates via marked point processes. Journal of Applied Statistics, 44(14), 2534-2563.

Xu, M., & Hua, L. (2019). Cybersecurity Insurance: Modeling and Pricing. North American Actuarial Journal, 1-30.

Disclaimer

This presentation presents information of a general nature. It is not intended to guide or determine any specific individual situation and Milliman recommends that users of this presentation seek explanation and/or amplification of any part of the presentation that they consider unclear. Neither the presenter nor the presenter's employer shall have any responsibility or liability to any person or entity with respect to damages alleged to have been caused directly or indirectly by the content of this presentation. All persons who choose to rely in any way on the contents of this presentation do so entirely at their own risk. The contents of this presentation are confidential and must not be modified, copied, quoted, distributed or shown to any other parties without Milliman's prior written consent. Copyright © Milliman 2020. All rights reserved.
