open-csam information aggregator and reporting … · scalable backend using django rest api...

24
Georgios Chatzichristos Operational Security Unit - ENISA 04 06 2019 OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING TOOL USING AI AND NATURAL LANGUAGE PROCESSING

Upload: others

Post on 30-Jul-2020

18 views

Category:

Documents


4 download

TRANSCRIPT

Page 1: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

Georgios Chatzichristos

Operational Security Unit - ENISA

04 06 2019

OPEN-CSAM

INFORMATION AGGREGATOR AND

REPORTING TOOL USING AI AND

NATURAL LANGUAGE PROCESSING

Page 2: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

THE GOAL

Help Decision Makers

take better decisions !

Page 3: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

3

THE TRIGGER

Open Cyber Security Awareness Machine

BluePrint

Technical

Operational

OperationalTechnical

Page 4: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

4 Short name of the powerpoint presentation, maximum length two thirds of the page

Page 5: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

5

The Vision

Open Cyber Security Awareness Machine

Make enisa an open source info hub with good training data for AI available for all

Threat analystsAcademiaEssential Services providers

Researchers

Cyber Security professionals

.

.

.

.

Cyber Security

Professionals

CSIRTs Training data for AI

Use services

Contribute to QoS

Page 6: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

6

The process

Open Cyber Security Awareness Machine

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 7: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

7

NLP

Open Cyber Security Awareness Machine

Continuous monitoringDaily/Weekly/Monthly/Yearly

Stats

Trending terms in Tweets Trending terms in News

ENISA’s

termsENISA’s

topics

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 8: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

8

AI

Open Cyber Security Awareness Machine

Continuous monitoringDaily/Weekly/Monthly/Yearly

Stats

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 9: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

9 Open Cyber Security Awareness Machine

Hardcoded

Used to drive AI

Knowledge Graph

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 10: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

10

Searching

Open Cyber Security Awareness Machine

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 11: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

11 Open Cyber Security Awareness Machine

Searching

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 12: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

12

Reporting

Open Cyber Security Awareness Machine

Monitor & cluster

trending open source

information

Cybersecurity search engine

Reporting

Page 13: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

OPENCSAM

CURRENT STATUS OF

DEVELOPMENT

EAU DE WEB

Page 14: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

14

OPENCSAM STATUS

Open Cyber Security Awareness Machine

3 main directions:

● Scalable backend using Django REST API

● Friendly UX for all user types

● Proof-of-concept implementations for:

○ Knowledge Graph enrichment (w/ Twitter hashtags and

PageRank)

○ News clusterization (w/ Universal Sentence Encoder)

○ Text summarization (w/ Universal Sentence Encoder)

○ Classify web content for KG terms (w/ Tensorflow and

corpus-derived FastText SkipGram word embeddings)

○ Classify web corpus (w/ USE & seed corpus)

Page 15: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

15

OPENCSAM ARCHITECTURE

Short name of the powerpoint presentation, maximum length two thirds of the page

Page 16: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

16

Django REST API based backend

Celery for queue management and

scrapers

OPENCSAM - BACKEND

Short name of the powerpoint presentation, maximum length two thirds of the page

Postgresql database

ElasticSearch for document

management

Docker for easy deployment

Page 17: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

17

OPENCSAM - UI DEVELOPMENT

Short name of the powerpoint presentation, maximum length two thirds of the page

● Developed wireframes for main modules:

○ Knowledge graph

○ Reporting/summarisation

○ Trending terms

○ Search

○ Source administration

Page 18: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

18

Dynamic KG editor prototype

Term evolution over time

OPENCSAM - UI DEVELOPMENT

Page 19: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

19

• Extract emerging terms (hashtag

co-occurrence)

• Generate co-occurrence graph

• Score concepts with PageRank

• Detect variations in rank over time

• Suggest terms with the highest

positive variation

• The super-user adds the terms to

the KG

OPENCSAMKG SUGGESTED TERMS

Short name of the powerpoint presentation, maximum length two thirds of the page

Page 20: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

20

• Build a set of corpus-wide common keyphrases

• bigrams and trigrams, scored using Normalized Pointwise

Mutual Information (NPMI)

• 1,2,3,4 elements PositionRank-ed keyphrases

•Cluster the keyphrases using word embeddings

•Build co-occurrence graph, use traverse distances

as feature in text classification task

•Propose for Knowlege Graph, based on quality

OPENCSAM – KEYPHRASES

Short name of the powerpoint presentation, maximum length two thirds of the page

Page 21: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

21

High quality sentence encodings from Universal Sentence Encoder

● Seed topics with “topic words”

● Encode topic words and news titles to vectors

● Set seed-word vectors as cluster centers

● Group titles around cluster centers

Future:

● replace USE with BERT, finetune for our corpus

● Explore VLAWE (Vector of Locally-Aggregated Word Embeddings)

OPENCSAM – CLUSTER NEWS

Short name of the powerpoint presentation, maximum length two thirds of the page

Page 22: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

23

● Extractive summarization (best ranked phrases)

● Users can select phrases to be kept/removed

● Included text comes from multiple articles

● Algorithm uses Universal Sentence Encoder

sentence embeddings

OPENCSAM –SUMMARISATION

Short name of the powerpoint presentation, maximum length two thirds of the page

Page 23: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

24

[email protected]

JOIN THE OPENCSAM

COMMUNITY

Open Cyber Security Awareness Machine

https://github.com/enisaeu/OpenCSAM

Page 24: OPEN-CSAM INFORMATION AGGREGATOR AND REPORTING … · Scalable backend using Django REST API Friendly UX for all user types Proof-of-concept implementations for: Knowledge Graph enrichment

THANK YOU FOR YOUR

ATTENTION

Vasilissis Sofias Str 1, Maroussi 151 24,

Attiki, Greece

+30 28 14 40 9711

[email protected]

www.enisa.europa.eu