![Page 1: Structured models with neural networksai.ms.mff.cuni.cz/~sui/pevnak.pdfDetectionofinfectedusers—results I Trainingdatafrom5-30.102017 3:6109=4:45106 urls/users I Testingdatafrom3.112017](https://reader035.vdocuments.mx/reader035/viewer/2022070909/5f907a5cb30e9e6f352b8ee2/html5/thumbnails/1.jpg)
Structured models with neural networks
Tomáš Pevný
February 28, 2020
Motivation — identification of infected computers
Examples of messages:
- http://lop.guardpair.com/affs?addonname=[Enter%20Product%20Name]&affid=9050&subaffid=5774&subID=undefined&clientuid=undefined&origaffid=9050&origsubaffid=5774&href=http%3A%2F%2F7
- http://pf.updatewp.org/?v=3.16&pcrc=308403836&LSVRDT=&ty=CHECK
- http://rules.similardeals.net/v1.0/whitelist/1052/9050x5774/7online.subsea7.net?partnerName=Lyrics
Motivation — representation of PE file format
[Figure: tree diagram of a PE executable. Node labels: Executable; PE header, Data, Code; Strings, Imports; Call Graph, Functions (f1 to f8); Control Flow Graph, Basic Blocks (b1 to b8); Instructions.]
Single-instance learning
- extract features: $x = (x_1, \ldots, x_d)$
- send to classifier: $f(x) \in \{+1, -1\}$
Icon made by Freepik from www.flaticon.com
Multi-instance learning
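- extract features: a sample is a set (bag) of $b$ instances, each a $d$-dimensional vector

$$x = \left\{ (x_{1,1}, \ldots, x_{1,d}),\ (x_{2,1}, \ldots, x_{2,d}),\ \ldots,\ (x_{b,1}, \ldots, x_{b,d}) \right\}$$

- send to classifier: $f(x) \in \{+1, -1\}$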
Our solution of multi-instance learning
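$$x_1, \ldots, x_b \in \mathbb{R}^d \;\xrightarrow{\;h(\cdot,\,\theta_h)\;}\; \tilde{x}_1, \ldots, \tilde{x}_b \in \mathbb{R}^u \;\xrightarrow{\;g(\cdot,\,\theta_g)\;}\; \bar{x} \in \mathbb{R}^u \;\xrightarrow{\;f(\cdot,\,\theta_f)\;}\; f(\bar{x}, \theta_f)$$

- One vector per instance: $\tilde{x}_i = h(x_i, \theta_h)$, where $h(x, \theta_h) : \mathbb{R}^d \to \mathbb{R}^u$ and $h(x, \theta_h) = \max\{0, x^T \theta_h\}$
- One vector per sample: $\bar{x} = g(\{\tilde{x}_i\}_{i=1}^{b}, \theta_g)$, here the mean $g(\{\tilde{x}_i\}_{i=1}^{b}) = \frac{1}{b} \sum_{i=1}^{b} \tilde{x}_i$
- Classifier: $f(\bar{x}, \theta_f) : \mathbb{R}^u \to \mathbb{R}^2$ and $f(\bar{x}, \theta_f) = \bar{x}^T \theta_f$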
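The composition of h, g and f on this slide can be sketched in a few lines of plain Python. This is only an illustrative sketch: the weights, the toy bag and the helper name `mil_forward` are made up here and are not taken from the talk or from Mill.jl.

```python
# Minimal sketch of the forward pass f(g({h(x_i)})) for a single bag.
# All numbers are toy values chosen for illustration (assumption).

def h(x, theta_h):
    """Instance embedding max{0, x^T theta_h}; theta_h has shape d x u."""
    u = len(theta_h[0])
    return [max(0.0, sum(x[k] * theta_h[k][j] for k in range(len(x))))
            for j in range(u)]

def g(embedded):
    """Mean pooling over the bag: one vector per sample."""
    b = len(embedded)
    return [sum(column) / b for column in zip(*embedded)]

def f(x_bar, theta_f):
    """Linear output layer x_bar^T theta_f; theta_f has shape u x 2."""
    return [sum(x_bar[j] * theta_f[j][c] for j in range(len(x_bar)))
            for c in range(len(theta_f[0]))]

def mil_forward(bag, theta_h, theta_f):
    """Embed every instance, pool the bag, then score the pooled vector."""
    return f(g([h(x, theta_h) for x in bag]), theta_f)

bag = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # b = 3 instances, d = 2
theta_h = [[1.0, -1.0], [0.5, 2.0]]         # d x u, with u = 2
theta_f = [[1.0, 0.0], [0.0, 1.0]]          # u x 2
scores = mil_forward(bag, theta_h, theta_f)
```

Because the pooling step is a mean, the scores do not depend on the order of the instances in the bag, and bags of different sizes map to vectors of the same length.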
Generalized multi-instance learning
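- extract features: a sample combines a fixed-length vector with several bags of different sizes and dimensions

$$x = \left( x^{(1)},\ \{x^{(2)}_i\}_{i=1}^{b},\ \{x^{(3)}_j\}_{j=1}^{c} \right)$$

$$x^{(1)} = (x^{(1)}_1, \ldots, x^{(1)}_{d_1}), \qquad x^{(2)}_i = (x^{(2)}_{i,1}, \ldots, x^{(2)}_{i,d_2}), \qquad x^{(3)}_j = (x^{(3)}_{j,1}, \ldots, x^{(3)}_{j,d_3})$$

- send to classifier: $f(x) \in \{+1, -1\}$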
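One possible way to turn such a mixed sample into a single fixed-length vector is to pool each bag separately and concatenate the results with the fixed part. The sketch below assumes mean pooling and plain concatenation; the slide does not specify either, and the dictionary keys `x1`, `bag2`, `bag3` are hypothetical names.

```python
# Sketch: fixed-length vector plus two bags -> one vector (assumed design).

def mean_pool(bag):
    """Average a non-empty bag of equal-length vectors into one vector."""
    return [sum(column) / len(bag) for column in zip(*bag)]

def embed_sample(sample):
    """Concatenate the fixed part with one pooled vector per bag."""
    return sample["x1"] + mean_pool(sample["bag2"]) + mean_pool(sample["bag3"])

sample = {
    "x1": [0.5, 1.5],                             # x^(1), d1 = 2
    "bag2": [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],   # b = 2 instances, d2 = 3
    "bag3": [[4.0], [0.0], [2.0]],                # c = 3 instances, d3 = 1
}
vector = embed_sample(sample)  # length d1 + d2 + d3 = 6
```

The resulting vector has the same length regardless of b and c, so it can be passed to an ordinary classifier.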
Extension of universal approximation theorem
Let $\mathcal{S}$ be the class of spaces which
1. contains all compact subsets of $\mathbb{R}^d$, $d \in \mathbb{N}$
2. is closed under finite Cartesian products
3. for each $\mathcal{X} \in \mathcal{S}$ we have $\mathcal{P}(\mathcal{X}) \in \mathcal{S}$ ¹

Then for each $\mathcal{X} \in \mathcal{S}$, every continuous function on $\mathcal{X}$ can be arbitrarily well approximated by neural networks.

¹ Here we assume that $\mathcal{P}(\mathcal{X})$ is endowed with some metric.
Identification of infected computers
Detection of infected users — model
[Figure: layered model diagram. Input vectors d, p, k and v feed aggregation nodes labeled KV, D, P and Q, with outputs q, d_1 ... d_n, p_1 ... p_n and q_1 ... q_n; these feed a node labeled U with outputs u of length 3n; bags of u vectors feed nodes labeled SLD with outputs s_1 ... s_k; these feed a node labeled user with output x_1 ... x_p, followed by the classifier f with output y.]
Detection of infected users — results
- Training data from 5-30.10.2017: $3.6 \cdot 10^9$ / $4.45 \cdot 10^6$ urls / users
- Testing data from 3.11.2017: $2 \cdot 10^8$ / $1.5 \cdot 10^6$ urls / users
[Figure: precision-recall plot, both axes from 0 to 1, comparing Convolution [1], R. Forests [2], MIL - URL and MIL - 5 min.]
[1] eXpose: A character-level convolutional neural network with embeddings for detecting malicious URLs, file paths and registry keys, J. Saxe and K. Berlin, 2017
[2] Learning detectors of malicious web requests for intrusion detection in network traffic, L. Machlica, K. Bartos, and M. Sofka, 2017
Static analysis of malware binaries — PE file format
[Figure: tree diagram of a PE executable. Node labels: Executable; PE header, Data, Code; Strings, Imports; Call Graph, Functions (f1 to f8); Control Flow Graph, Basic Blocks (b1 to b8); Instructions.]
Static analysis of malware binaries — model of a function
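[Figure: pipeline from disassembly output through tokenization and n-grams to a block representation, shown on three basic blocks.]

Disassembly output after tokenization:
1. push reg; mov reg reg; mov reg mem; dec reg; jne num
2. mov reg mem; push reg; mov mem reg; call KERNEL32!Disable...
3. xor reg reg; inc reg; pop reg; retn num

Each block is converted to integer token ids, n-grams over the tokens give one vector per block (x1, x2, x3), and the bag of block vectors is passed to a MIL NN.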
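The tokenization and n-gram steps could look roughly like the sketch below. It is an assumption-heavy illustration: treating every whitespace-separated word as a token, using bigrams, and hashing n-grams into a vector of length 8 with `zlib.crc32` are choices made here for the example, not details given on the slide.

```python
import zlib

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def block_vector(tokens, n=2, dim=8):
    """Count hashed n-grams into a fixed-length vector (hypothetical featurization)."""
    vec = [0] * dim
    for gram in ngrams(tokens, n):
        # crc32 is deterministic across runs, unlike the built-in hash() on strings
        vec[zlib.crc32(" ".join(gram).encode()) % dim] += 1
    return vec

# Two of the tokenized basic blocks from the slide
blocks = [
    "push reg mov reg reg mov reg mem dec reg jne num".split(),
    "xor reg reg inc reg pop reg retn num".split(),
]
vectors = [block_vector(b) for b in blocks]  # one vector per basic block
```

Every block yields a vector of the same length no matter how many instructions it contains, which is what lets a function be treated as a bag of block vectors.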
Static analysis of malware binaries — model of a binary
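[Figure: model of a binary as two stacked map / groupby / reduce stages. Block vectors x1, ..., xn each pass through a Block NN (map), are grouped by function (groupby) and pooled with mean and max (reduce) into function vectors f1, ..., fm. Each function vector passes through a Function NN (map), giving y1, ..., ym, which are pooled with mean and max (reduce) and passed to the Binary NN.]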
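The two stacked map / groupby / reduce stages can be sketched as follows. This is a toy illustration only: a single ReLU layer stands in for the Block NN and the Function NN, the weights are invented, and the function names are hypothetical.

```python
def meanmax(vectors):
    """Concatenate the element-wise mean and max of a non-empty list of vectors."""
    columns = list(zip(*vectors))
    return [sum(c) / len(c) for c in columns] + [max(c) for c in columns]

def relu_layer(x, weights):
    """Toy stand-in for a Block/Function NN; weights has shape len(x) x out."""
    return [max(0.0, sum(xi * w for xi, w in zip(x, column)))
            for column in zip(*weights)]

def embed_binary(functions, block_w, function_w):
    """functions: list of functions, each given as a list of block vectors."""
    function_vectors = []
    for blocks in functions:                               # groupby: blocks of one function
        mapped = [relu_layer(b, block_w) for b in blocks]  # map: Block NN
        function_vectors.append(meanmax(mapped))           # reduce: mean and max
    mapped = [relu_layer(v, function_w) for v in function_vectors]  # map: Function NN
    return meanmax(mapped)                                 # reduce over all functions

functions = [
    [[1.0, 2.0], [3.0, 0.0]],   # function with two basic blocks
    [[-1.0, 0.0]],              # function with one basic block
]
block_w = [[1.0], [1.0]]        # 2 x 1
function_w = [[1.0], [0.0]]     # 2 x 1 (input is mean and max of a 1-d embedding)
binary_vector = embed_binary(functions, block_w, function_w)
```

The returned vector is what a final Binary NN would consume; it has a fixed length for any number of functions and blocks.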
Static analysis of malware binaries — results
- Training data: $4 \cdot 10^5$ binaries
- Testing data: $5.5 \cdot 10^5$ binaries
- approx. 10% were malicious
Static analysis of malware binaries — feedback
- avoids specifying imports in the PE header
- loads addresses of library functions to an array
Static analysis of malware binaries — feedback
- adware-related behavior
- creating both visible and invisible windows
Future directions
- Learning representation of hierarchical data
- Learning distances for clustering
- Generative models
- Anomaly / few-shot learning
- Game theory in security
- Finding new attacks under constraints
- Decentralized learning
- Modeling and reasoning over relational data
Modeling the internet
https://www.threatcrowd.org/domain.php?domain=ibuyitttttttttttttttttttttttttttttttttttibuyit.com
Materials
- https://github.com/pevnak/Mill.jl
- https://github.com/pevnak/JsonGrinder.jl
- Discriminative models for multi-instance problems with tree-structure, Tomáš Pevný, Petr Somol, 2016
- Using Neural Network Formalism to Solve Multiple-Instance Problems, Tomáš Pevný, Petr Somol, 2016
- Approximation capability of neural networks on sets of probability measures and tree-structured data, Tomáš Pevný, Vojtěch Kovařík, 2019
- Nested Multiple Instance Learning in Modelling of HTTP network traffic, Tomas Pevny, Marek Dedic, 2020