DHS S&T Cyber Security Division (CSD)
PREDICT Overview
Homeland Security Advanced Research Projects Agency
Douglas Maughan, Division Director
November 2, 2015
http://www.dhs.gov/cyber-research
DHS S&T Research Infrastructure
Lowering the bar to meaningful cyber security R&D
• Research Data Repository (PREDICT)
– Repository of network data for use by the U.S.-based cyber security research community
– https://www.predict.org
• Experimental Research Testbed (DETER)
– Researcher and vendor-neutral experimental infrastructure
– Used by over 200 organizations from more than 20 states and 17 countries
– Used by over 40 classes, from 30 institutions involving 2,000+ students
– http://www.deter-project.org
• Software Assurance Market Place (SWAMP)
– A software assurance testing and evaluation facility and the associated research infrastructure services
– http://www.continuousassurance.org
– Launched February 3, 2014
PREDICT Background
• Rationale / Background / Historical:
– Researchers with insufficient access to data unable to adequately test their research prototypes
– Government technology decision-makers and researchers need data to evaluate competing “products”
– Supports scientific method via repeatability of tests and evaluations
– Unclear legal and ethical policies for Internet research
• Project Impetus:
– National Strategy to Secure Cyberspace (February 2003)
– 2009 Cyberspace Policy Review
– Supports “Expanding Public Access to the Results of Federally Funded Research”; see http://www.whitehouse.gov/sites/default/files/microsites/ostp/ostp_public_access_memo_2013.pdf
• The Research Data Repository (a.k.a. PREDICT) project is the only freely-available legally collected repository of large-scale datasets containing real network traffic and system logs in the U.S.
PREDICT
• Cyber Security Datasets
• Dataset Sharing Legal Framework
• Ethics & Disclosure Control
PREDICT Project Architecture
[Diagram. Labels: Researchers; Data Providers; Data Hosts; PREDICT Coordinating Center (PCC); Application Review Board; Account Request; Data Access; Data Hosting; Legal Analysis; MOAs; PREDICT Project Legal Framework]
PREDICT Dataset Focus
• Large scale internet datasets
– DITL (Day in the Life of the Internet)
• Collection mechanisms include:
– The Los Nettos network: 10 Gb backbone comprised of dark fiber and leased gigabit circuits
– Archipelago (Ark): UCSD/CAIDA's world-wide active measurement infrastructure with 71 Ark monitors located in 35 countries
– Merit Network: A regional ISP in Michigan with a 10 Gbps facilities-based core
– Packet Clearing House: Numerous collection locations associated with global IXP activities
• Others
– Collegiate competitions: entire CTF capture, including red team
– Synthetically generated datasets: from other USG, e.g. DARPA
– Malware Command and Control
• Future
– Insider Threat, Mobile, CPSSEC, Others?
Dataset Categories
• Address Space Allocation Data
• Border Gateway Protocol (BGP) Routing Data
• Blackhole Address Space Data
• Domain Name System (DNS) Data
• Internet Topology Data
• Intrusion Detection System (IDS) and Firewall Data
• Infrastructure Data
• Internet Protocol (IP) Packet Headers
• Synthetically Generated Datasets
• Traffic Flow Data
• Application Layer Security Data
• Unsolicited Bulk Email Data
• Botnet Sinkhole Data
• National Collegiate Cyber Defense Competition
• Netalyzr – Performance Data
Data Host/Providers
• UCSD/CAIDA
– Topology Measurements, Network Telescope
• USC – ISI / Colorado State University
– NetFlow, Internet Topology Data, Address Allocation, Spam logs, IP reputation lists
• University of Michigan/Merit Networks
– NetFlow, BGP Routing, Dark Address Space Monitoring, BGP Beacon Routing
• Georgia Tech
– Botnet Sinkhole Connection
• University of Wisconsin
– Global Intrusion Detection Database, Physical Infrastructure dataset
• Packet Clearing House
– BGP Routing, VoIP Measurement, Synthetic Attack Data
[Figure: data volumes shown on the slide (host labels not recoverable): 974 TB, 113 TB, 364 TB, 0.1 TB, 3.7 TB, 10.0 TB. TOTAL = 1500+ TB]
Research Impact
• Over 300 research papers/journals/technical reports within the last 3 years using PREDICT datasets
• Research groups that have used PREDICT include:
– 117 academic institutions
– 88 commercial entities
– 37 Government organizations
– 8 Foreign
– 11 non-profit organizations
• Menlo Report
– Highly visible in the cyber-security research community.
– Raising awareness of the importance of the issues associated with ethical and legal cybersecurity R&D
(Normative) Computer Ethics
“A typical problem in computer ethics arises because there is a policy vacuum about how computer technology should be used.
Computers provide us with new capabilities and these in turn give us new choices for action. Often, either no policies for conduct in these situations exist or existing policies seem inadequate.
A central task of computer ethics is to determine what we should do in such cases, i.e., to formulate policies to guide our actions.”
- James Moor, 1985
The Belmont Report
IRBs help ensure that research conforms with the ethical principles of the Belmont Report
“Ethical Principles and Guidelines for the Protection of Human Subjects of Research”, US Department of Health, Education, and Welfare, April 18, 1979
The Menlo Report
“Ethical Principles Guiding Information and Communication Technology Research” Supported by US Department of Homeland Security (published 2011).
Belmont Principle: Menlo Application
• Respect for Persons: Identify stakeholders; Informed consent
• Beneficence: Identify potential benefits and harms; Balance risks and benefits; Mitigate realized harms
• Justice: Fairness and equity
Additional Menlo Principle: Menlo Application
• Respect for the Law and Public Interest: Compliance; Transparency and accountability
Case Studies of ICT Research
• Shining Light in Dark Places: Understanding the Tor Network
• Learning More About the Underground Economy: A Case Study of Keyloggers and Dropzones
• Your Botnet is My Botnet: Examination of a Botnet Takeover
• Why and How to Perform Fraud Experiments
• Measurement and Mitigation of Peer-to-Peer-Based Botnets: A Case Study on Storm Worm
• Spamalytics: An Empirical Analysis of Spam Marketing Conversion
• Studying Spamming Botnets Using Botlab
• P2P as Botnet Command and Control: A Deeper Insight
• DDoS Attacks Against South Korean and U.S. Government Sites
• BBC: Experiments with Commercial Botnets
• Lycos Europe “Make Love Not Spam” Campaign
• University of Bonn: “Stormfucker”
• Information Warfare Monitor: “Ghostnet”
• Tipping Point: Kraken Botnet Takeover
• Symbiot: “Active Defense”
• Tracing Anonymous Packets to the Approximate Source
• LxLabs Kloxo/HyperVM
• Exploiting Open Functionality in SMS-Capable Networks
• Pacemakers and Implantable Cardiac Defibrillators: Software Radio Attacks and Zero-Power Defenses
• Black Ops 2008 -- It's The End Of The Cache As We Know It
• How to Own the Internet in Your Spare Time
• Botnet Design
• RFID Hacking
• WORM vs. WORM: preliminary study of an active counter-attack mechanism
• A Pact with the Devil
• Playing Devil's Advocate: Inferring Sensitive Data from Anonymized Network Traces
• Protected Repository for the Defense of Infrastructure Against Cyber Attacks
Likely to be considered Human Subjects Research subject to IRB review
Summary/Way Forward
• Expanding international cooperation framework to facilitate multi-national cyber security research collaboration
– Complete: Australia, Canada, Israel, Japan, United Kingdom
– In Progress: Netherlands, New Zealand, Singapore, Spain
• Technical and Policy work on disclosure control for traffic data
• Additional Data Access Methods, such as Secure Virtual Enclaves
• Expansion of Ethics of Cyber Security research to include the development of practical guidelines for research community, including emphasis on Institutional Review Boards (IRBs)
• Additional datasets and categories
– Public/restricted and live streaming/archival data
• Coordinate across USG agencies to share unclassified data resulting from cyber security R&D; currently sharing data produced by: DARPA, FCC, and IARPA
• We are actively seeking additional datasets for inclusion in PREDICT in addition to encouraging the use of existing datasets
For more information, visit http://www.dhs.gov/cyber-research
https://www.predict.org
Douglas Maughan, Ph.D.
Division Director, Cyber Security Division
Homeland Security Advanced Research Projects Agency (HSARPA)
[email protected]
202-254-6145 / 202-360-3170
What are ethics?
• “The field of ethics (or moral philosophy) involves systematizing, defending, and recommending concepts of right and wrong behavior.”
• Normative ethics is concerned with developing a set of morals or guiding principles intended to influence the conduct of individuals and groups within a population (i.e., a profession, a religion, or society at large).
Ethics != Law
• “Law can be defined as a consistent set of universal rules that are widely published, generally accepted, and usually enforced”
• Interrelated but by no means identical (e.g., legal but not ethical, ethical but not legal)
– Adherence to ethical principles may be required to meet regulatory requirements surrounding academic research
– A law may illuminate the line between beneficial acts and harmful ones.
– If the computer security research community develops ethical principles and standards that are acceptable to the profession and integrates those as standard practice, it makes it easier for legislatures and courts to effectively perform their functions.