Page 1

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL

Human Language Technology

Center for Content Extraction

Content Extraction Analytics SIGDEV End-to-End Demo

21 May 2009

Derived From: NSA/CSSM 1-52 Dated: 20070108

Declassify On: 20330108


Page 2

TOP SECRET//COMINT//REL TO USA, AUS, CAN, GBR, NZL//20320108

Introduction to Content Extraction

• New technologies can find Essential Elements of Information in documents

The Center for Content Extraction provides "one stop shopping" for these technologies at NSA


Page 3


Extraction can benefit SIGDEV from end to end

• Selection
• Translation & Transliteration
• Analysis
• Interpretation/Enrichment
• Retrieval
• Storage & Distribution


Page 4


STAIRS Partners

S (Marina, CEA)

T (Cybertrans)

A (SNA/Paintball, Synapse)

I (Nymrod,Thundercloud)

R (Journeyman/CPE)

S (GoldenRetriever, SocioPath)


Page 5


Implementation: CCE Extraction Architecture (LexHound)

Subscription-based customers (extracted report/transcript content):
• Marina (comms tracking)
• Synapse/EKS (link analysis)
• Nymrod (Name Matching)

Web service on-demand customers (via Web Services):
• LexHound Web Demo
• CAMT (translation)
• TKB (target knowledge base)
• SNA (social network analysis)
• GIS (geo mapping)
• NTOC (terror cell tracking)
• Heresyitch (UC collateral)
• GoldenRetriever (record building)

Diagram: Reports/Transcripts → Ingester · Dispatcher · Task Manager · Extractor(s) · Transformer · Renderer · Sender → Output
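The component names on this slide (Ingester, Dispatcher, Task Manager, Extractor(s), Transformer, Renderer, Sender) describe a staged processing pipeline. The sketch below illustrates that general pattern in Python and assumes a simple linear flow through the stages. Every function name, the `Document` structure, the regex-based "extractor" and the callback-style subscribers are hypothetical illustrations. They do not represent LexHound's actual design or interfaces.

```python
# Hypothetical sketch: stage names follow the slide; all interfaces are invented.
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    """A report or transcript moving through the pipeline (hypothetical)."""
    doc_id: str
    text: str
    entities: list[str] = field(default_factory=list)


def ingest(doc_id: str, raw: str) -> Document:
    """Ingester: wrap raw text and collapse runs of whitespace."""
    return Document(doc_id, " ".join(raw.split()))


def extract_capitalized_phrases(doc: Document) -> list[str]:
    """Toy extractor (assumption): runs of two or more capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", doc.text)


# Registry of extractors; new ones can be appended without changing dispatch().
EXTRACTORS: list[Callable[[Document], list[str]]] = [extract_capitalized_phrases]


def dispatch(doc: Document) -> Document:
    """Dispatcher/Task Manager: run every registered extractor on the document."""
    for extractor in EXTRACTORS:
        doc.entities.extend(extractor(doc))
    return doc


def transform(doc: Document) -> Document:
    """Transformer: drop duplicate entities, keeping first-seen order."""
    doc.entities = list(dict.fromkeys(doc.entities))
    return doc


def render(doc: Document) -> str:
    """Renderer: format the extracted content as a one-line summary."""
    return f"{doc.doc_id}: " + "; ".join(doc.entities)


def send(output: str, subscribers: list[Callable[[str], None]]) -> None:
    """Sender: deliver the rendered output to every subscriber callback."""
    for deliver in subscribers:
        deliver(output)


def run_pipeline(doc_id: str, raw: str,
                 subscribers: list[Callable[[str], None]]) -> str:
    """Chain the stages in an assumed linear order and return the output."""
    output = render(transform(dispatch(ingest(doc_id, raw))))
    send(output, subscribers)
    return output
```

For example, `run_pipeline("r1", "Jane Doe met John Smith. Jane Doe left.", [print])` prints `r1: Jane Doe; John Smith`. Keeping extractors in a registry list loosely mirrors the "Extractor(s)" box: each extractor is a pluggable unit that the dispatcher fans out to. Passing subscriber callbacks to the sender stands in for delivering output to customers.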


Page 6


Elaboration: The Central Importance of Storage

• Each of the STAIRS Steps exploits stored information
  • Selection Dictionaries ("get it")
  • Linguistic Glossaries for Translation
  • Wikis etc. for enrichment ("know it")

• Manual record-formation is slow, prone to omissions and inconsistencies
  • <200K Person Targets in TKB
  • Growth ~ 20K/year

• Automatic extraction accelerates storage
  • >3000K Citation Records in Nymrod Entity DB
  • Growth ~ 1000K/year
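The slide gives approximate figures for the two record stores. TKB holds fewer than 200K person targets and grows by about 20K/year. Nymrod holds more than 3000K citation records and grows by about 1000K/year. These figures can be compared directly. Write $N$ for the record count and $r$ for the annual growth (labels introduced here, not on the slide). This is back-of-the-envelope arithmetic on bounds and approximate rates. It also compares citation records with person-target records, which are not like-for-like units:

$$
\frac{r_{\text{Nymrod}}}{r_{\text{TKB}}} \approx \frac{1000\,\text{K/year}}{20\,\text{K/year}} = 50,
\qquad
\frac{N_{\text{Nymrod}}}{N_{\text{TKB}}} > \frac{3000\,\text{K}}{200\,\text{K}} = 15
$$

At these rates, manual record formation would take roughly 50 years to add as many records as automatic extraction adds in one year.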


Page 7


Machine vs. Manual Chief-of-State Citations

(Nymrod Cites = machine-extracted citations in Nymrod; Last TKB Manual Update = most recent manual update in TKB)

No.  Name                        Role                       Code  Nymrod Cites  Last TKB Manual Update
1    Abdullah Badawi             Malaysian Prime Minister   COS   >100          10/15/2007
2    Abdullahi Yusuf             Somali President           COS   >300          N/A
3    Abu Mazin (Mahmud 'Abbas)   PA President               COS   >200          5/20/2009
4    Alan Garcia                 Peruvian President         COS   >100          N/A
5    Aleksandr Lukashenko        Belarusian President       COS   >50           N/A
6    Alvaro Colom                Guatemalan President       COS   >200          N/A
7    Alvaro Uribe                Colombian President        COS   >700          N/A
8    Amadou Toumani Toure        Malian President           COS   >50           N/A
9    Angela Merkel               German Chancellor          COS   >300          N/A
10   Bashar al-Asad              Syrian President           COS   >800          N/A
...
122  Yuliya Tymoshenko           Ukrainian Prime Minister   COS   >200          N/A


Page 8

Human Language Technology