Protecting the World with Big Data
Bill Pfeifer
Program Manager
Microsoft Malware Protection Center
September 2014
Abstract
We live in a world of viruses, worms, and browser threats that change and adapt on an hourly basis. Learn how Microsoft’s Protection Team, which brings you Microsoft Security Essentials and Windows Defender, has built and maintained a Big Data solution to protect Windows customers. These efforts offer monitoring and tools for release management, cloud protection, automatic signature generation, and malware research.
About Me
• Tlingit from southeast Alaska
• Interested in electronics and security from a young age
• BS from University of Alaska, Fairbanks
• MS from Purdue
• Member of AISES
• Microsoft for 3.5 years
Tlingit totem pole and community house in Totem Bight State Park, Ketchikan, Alaska. Credit: Bob and Ira Spring
MS Malware Protection Center
Offerings
• Microsoft Security Essentials
• Windows Defender
• System Center Endpoint Protection
• Office 365 Protection
• Azure Protection
• Windows Store protection
MMPC main goals
Help ensure all Microsoft customers are protected
• Protect the unprotected
• Remain security vendor agnostic
• Publish world-class security content, world security posture
Disrupt malware ecosystem
• Remove malware value proposition
• Reduce malware’s reach and life span with cloud protection, machine learning, automation, and faster sample collection
• Identify and work with partners to eliminate malware monetization schemes
Build a strong and united ecosystem
• Drive increased malware sample, telemetry, knowledge sharing
• Formalize strategic partner relationships with vendors, CERTs, e-commerce, application vendors and distributors
• Sponsor coordinated malware eradication campaigns
PROTECTION SENSORS
• Windows 8+ Defender: 55M
• Windows 7- MSE: 94M
• Enterprise (SCCM, Intune): 6M
• MSRT monthly cleanup: 1.2B
• Azure, Office 365
• Windows 7- Defender: 309M
DAILY
• 675K new samples
• 250M cloud calls
• 12 sig releases
RESULTS
• 79% protected
• 13% encounters
• 3.6% infected
• 122M unique files
Too many expired AVs on Windows 8+
Malware out-paces sigs
GOALS
Ensure all of Microsoft’s customers are protected
• Measure, push user-not-protected scenarios
Eradicate malware
• Apply new protection techniques
• Amplify researchers with automation
• Block the first time with the cloud
Lead antimalware ecosystem
• Drive appropriate behavior
• Coordinate activities across industry and ecommerce players
• Fix testing perception, testing approach
The usual suspects
• Malware families rarely die: 466 make up top 99% of infections
• Disruption helps, but most families come back…
• … and when they do, they come back more resilient
Inefficiency: Lingering malware infections
Encounters vs. Infections
Data from Microsoft real time protection clients
Heat map shows rate of encounters (Blue->Green->Yellow->Red)
Country color signifies % of customers with infections
Of note
Flooding works: more encounters mean more infections
World-wide: 9% encounters, 3% infections
The trick is to stop the encounters
http://www.microsoft.com/security/sir/threat/default.aspx
Family           Encounters  Infections  Industry Misses
Jenxcus           1,804,868     188,540          177,523
OptimizerElite      195,846     157,777                -
Zbot                237,470     143,353           34,107
Brantall            433,403     113,526                -
Wysotot             536,452     105,791          186,541
Rotbrow             508,577     100,354          271,326
Necurs              127,059      96,505                -
Sality              455,342      91,492           50,485
Rovnix               85,228      71,019                -
Kilim               196,326      67,871            1,198
Ramnit              437,128      67,171           30,629
Upatre              110,502      64,375                -
Clikug              441,677      62,438                -
Gamarue             920,694      58,110           26,624
Virut               223,311      57,594            2,516
Filcout           2,791,531      52,555        7,495,272
Spacekito            77,385      51,652           53,379
Napolar             156,702      51,492            1,973
Alureon              72,305      50,762            3,860
Dorkbot             471,114      48,697            8,128
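To make the encounter and infection columns of the family table easier to compare, here is a small sketch that ranks a few of the families by infections per encounter. The ratio is an illustrative metric chosen for this example and is not a metric defined in the slides; the function names are hypothetical, and only a subset of rows is copied in.

```python
# Figures copied from the family table (subset of rows): (encounters, infections).
# "Infections per encounter" is an illustrative ratio, not a metric from the slides.
FAMILIES = {
    "Jenxcus": (1_804_868, 188_540),
    "OptimizerElite": (195_846, 157_777),
    "Necurs": (127_059, 96_505),
    "Gamarue": (920_694, 58_110),
    "Filcout": (2_791_531, 52_555),
}


def infection_ratio(encounters: int, infections: int) -> float:
    """Return infections divided by encounters, or 0.0 if there were no encounters."""
    return infections / encounters if encounters else 0.0


def rank_by_ratio(families):
    """Return (family, ratio) pairs sorted from highest to lowest ratio."""
    rows = [(name, infection_ratio(enc, inf)) for name, (enc, inf) in families.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, ratio in rank_by_ratio(FAMILIES):
        print(f"{name:15s} {ratio:6.1%}")
```

Families with many encounters but few infections (Filcout, for example) fall to the bottom of this ranking, while families with a high ratio (OptimizerElite, Necurs) rise to the top.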
Threat Family reports
Antimalware automation
Big Data: samples, telemetry, reputation, determinations
Analysis
Auto-classification
Signature generation
Telemetry response
Industry
- Samples
- Meta-data
- Reputation
- Determinations
Collection
Customers
- Telemetry
- Samples
Collection
- Industry and customers
- Automatic and on demand
Big Data
- Samples
- Map reduce
- Processed/Workflow
Analysis
- Dynamic and Static
- Vendor rescans/determinations
- Human-supplied patterns
Auto-classification
- Combine analysis with reputation
- Assign determination, family
- Feeds sig-gen and cloud protection
Signature Generation
- Best-fit signature
- Static and proactive
- Signature release pipeline
Telemetry Monitoring
- FP detection
- Never unknowns
- Sample requests
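The auto-classification stage above combines analysis results with reputation to assign a determination and a family. A minimal sketch of that idea follows, assuming a simple rule-based combination; the record fields, thresholds, and verdict labels are all hypothetical and are not taken from the MMPC pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnalysisResult:
    sha1: str
    matched_family: Optional[str]  # family named by a human-supplied pattern, if any
    vendor_detections: int         # vendor rescans that flagged the file
    vendor_total: int              # vendor rescans performed


@dataclass
class Determination:
    sha1: str
    verdict: str                   # "malware", "clean", or "unknown" (hypothetical labels)
    family: Optional[str]


def classify(result: AnalysisResult, reputation: float,
             malware_ratio: float = 0.5, clean_reputation: float = 0.9) -> Determination:
    """Combine analysis output with a 0..1 reputation score (thresholds are assumptions)."""
    ratio = result.vendor_detections / result.vendor_total if result.vendor_total else 0.0
    # A pattern match or a majority of vendor detections is treated as malware.
    if result.matched_family or ratio >= malware_ratio:
        return Determination(result.sha1, "malware", result.matched_family)
    # High reputation with no vendor detections is treated as clean.
    if reputation >= clean_reputation and result.vendor_detections == 0:
        return Determination(result.sha1, "clean", None)
    # Everything else stays unknown and would need further analysis.
    return Determination(result.sha1, "unknown", None)
```

In a pipeline like the one on the slide, "malware" determinations would feed signature generation and cloud protection, while "unknown" results are the natural candidates for sample requests.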
Business Intelligence Team
Query Masters
• Dashboards
• Livesite reporting
• PoR meetings
• Researcher tools
• Query Optimizations
Data Infrastructure
Multiple data sources
• Windows Update
• Watson Error Reporting
• Software Quality Metrics
• Telemetry Threat/suspicious reports
Features
Storage & Usage Numbers
Threat Telemetry
Raw data: 2 TB per day, 360 TB for ½ year
Reduced: 200 GB per day, 36 TB for ½ year
More than 200 engineers and researchers on the protection team
2 Cosmos clusters, 3 VC instances each
2 PB stored between clusters
10K jobs/day
1.5K ad hoc jobs/day
4.5 PB read/day by ad hoc jobs
Queue wait time of 2 minutes
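The storage and usage figures above can be cross-checked with a little arithmetic. The sketch below assumes that "½ year" means 180 days and that units are decimal (1 PB = 1,000 TB, 1 TB = 1,000 GB); both are assumptions, and the per-job average is derived here rather than stated in the slides.

```python
DAYS_HALF_YEAR = 180   # assumption: "½ year" taken as 180 days
GB_PER_TB = 1000       # assumption: decimal units
TB_PER_PB = 1000


def retained_tb(gb_per_day: int, days: int = DAYS_HALF_YEAR) -> float:
    """Total volume in TB kept for `days` days at a given daily volume in GB."""
    return gb_per_day * days / GB_PER_TB


raw_tb = retained_tb(2 * GB_PER_TB)   # raw telemetry: 2 TB per day
reduced_tb = retained_tb(200)         # reduced telemetry: 200 GB per day
reduction_factor = (2 * GB_PER_TB) / 200

# Average data read per ad hoc job: 4.5 PB/day spread over 1.5K jobs/day.
avg_read_tb_per_adhoc_job = 4.5 * TB_PER_PB / 1500

if __name__ == "__main__":
    print(f"raw: {raw_tb:.0f} TB, reduced: {reduced_tb:.0f} TB")
    print(f"reduction factor: {reduction_factor:.0f}x")
    print(f"average read per ad hoc job: {avg_read_tb_per_adhoc_job:.1f} TB")
```

Under these assumptions the slide's totals are consistent (360 TB raw, 36 TB reduced), and an average ad hoc job reads about 3 TB.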
First Impressions
Issues we found
Missing features
• No coding guidelines
• Limited shared libraries
• No discoverability
• No scheduling
Impact
• Duplication of work
• Duplication of data
• Long execution time
Expensive operations
CROSS APPLY (de-serializing rows)
CLUSTER BY (partitioning the storage)
Data Skews
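Data skew appears in the list of expensive operations above. As a rough illustration of why, the sketch below hash-partitions rows by key and measures how unevenly they land when one key dominates. CRC32 partitioning is a stand-in chosen for this example; it is not a description of how Cosmos partitions data, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def partition_of(key: str, partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % partitions


def partition_sizes(keys, partitions: int):
    """Count how many rows land in each partition."""
    counts = Counter(partition_of(key, partitions) for key in keys)
    return [counts.get(p, 0) for p in range(partitions)]


def skew_factor(sizes) -> float:
    """Largest partition divided by the mean partition size (1.0 means perfectly even)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean if mean else 0.0


if __name__ == "__main__":
    # One dominant key plus a handful of rare keys, e.g. rows keyed by family name.
    keys = ["Filcout"] * 90 + [f"family{i}" for i in range(10)]
    sizes = partition_sizes(keys, partitions=4)
    print(sizes, f"skew factor: {skew_factor(sizes):.1f}")
```

All rows for the dominant key land in one partition, so the job finishes only when that one oversized partition does, regardless of how many workers are available.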
Evolution
How things started to evolve
Intermediate outputs for multi-stage jobs
• Rerun against the middle outputs while developing
Documenting reusable data streams (lookup tables & contextual streams)
Caching historical data
• Need to write stream sets to join over date ranges
Creating views over the caches
Creating libraries
• Common processors (strip the Threat Family Name out of the Threat Name)
• Contextual meta data (geolocation)
• Enumerations
Stopgap scheduling
• Task scheduler
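One of the common processors mentioned above strips the threat family name out of the threat name. A minimal sketch of such a helper follows, assuming threat names follow the Type:Platform/Family.Variant!Suffix pattern used in Microsoft's published malware names; the function name and regular expression are illustrative and are not taken from the team's shared library.

```python
import re
from typing import Optional

# Assumed name layout: Type:Platform/Family[.Variant][!Suffix]
_THREAT_NAME = re.compile(
    r"^(?P<type>[^:/]+):(?P<platform>[^/]+)/(?P<family>[^.!]+)"
    r"(?:\.(?P<variant>[^!]+))?(?:!(?P<suffix>.+))?$"
)


def family_from_threat_name(threat_name: str) -> Optional[str]:
    """Return the family part of a threat name, or None if the name does not parse."""
    match = _THREAT_NAME.match(threat_name.strip())
    return match.group("family") if match else None


if __name__ == "__main__":
    for name in ["Worm:VBS/Jenxcus.A", "PWS:Win32/Zbot", "Backdoor:Win32/Caphaw.D!lnk"]:
        print(name, "->", family_from_threat_name(name))
```

Keeping this logic in one shared function means every job groups detections by family the same way, which addresses the duplication of work noted earlier in the deck.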
Formalizing a Data Model
4  Metrics/KPIs Views, Lookup Profiles, Metrics/KPI Streams
3  Aggregate Views, Aggregate Streams
2  Filter Views, Filter Streams
1  Curated Views, Curated Streams
0  Raw
File Profile
Key: Sha1/Sha256
Provides: First Seen dates, Prevalence, Top sigseqs, Top filenames, etc.
Family Profile
Key: Family Name
Provides: Family owners, Class, Machine Impacts, etc.
Device Profile
Key: Machine GUID
Provides: Heartbeat rate, City, State, Country, Platform, Top Threat IDs, etc.
Filename Profile
Key: Filename
Provides: Ancestors, Top Threat IDs, Top Sigs, First Seen, etc.
Signature Profile
Key: SigSeq
Provides: Last check-in date, author, prevalence, family association, etc.
Sample Source Profile
Key: Source Name
Provides: count of samples, efficacy of source, rate of samples, etc.
URL Profile
Key: URL
Provides: Top Threat IDs, Top Sigs, First Seen, family association, etc.
IP Profile
Key: IP
Provides: Top Threat IDs, Top Sigs, First Seen, family association, etc.
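Each lookup profile above is keyed on one identifier and provides pre-computed attributes for it. The sketch below shows how a File Profile might be built from per-detection reports. The report field names are hypothetical, and prevalence is taken here to mean the number of distinct machines reporting the file, which is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List


@dataclass
class FileProfile:
    sha1: str                      # profile key
    first_seen: date
    prevalence: int = 0            # assumption: distinct machines reporting the file
    top_filenames: List[str] = field(default_factory=list)


def build_file_profiles(reports: Iterable[dict], top_n: int = 3) -> Dict[str, FileProfile]:
    """Group reports by SHA-1 and derive first-seen date, prevalence, and top filenames.

    Each report is a dict with hypothetical keys: sha1, machine_guid, filename, seen_on.
    """
    grouped: Dict[str, dict] = {}
    for report in reports:
        group = grouped.setdefault(
            report["sha1"],
            {"first": report["seen_on"], "machines": set(), "names": Counter()},
        )
        group["first"] = min(group["first"], report["seen_on"])
        group["machines"].add(report["machine_guid"])
        group["names"][report["filename"]] += 1
    return {
        sha1: FileProfile(
            sha1=sha1,
            first_seen=group["first"],
            prevalence=len(group["machines"]),
            top_filenames=[name for name, _ in group["names"].most_common(top_n)],
        )
        for sha1, group in grouped.items()
    }
```

A job that needs first-seen dates or prevalence can then join against the profile by key instead of rescanning raw telemetry.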
Example Metrics: PSL/ESL
4  PSL / ESL KPI, Lookup Profiles
3  Misses Aggregate, Actives Aggregate, Incorrect Detections Aggregate, Failures Aggregate, Encounters Aggregate
2  Misses view, Actives view, Incorrect Detections view, Failures view, Encounters view**
1  Canonical Telemetry view*, File Report view, Memory Report view, Boot-removal Report view, Boot Report view, Rootkit Report view
   File Report, Memory Report, Boot-removal Report, Boot Report, Rootkit Report
0  Raw Telemetry
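The layering above (raw, curated, filter, aggregate, KPI) can be sketched as a chain of small functions, each reading only from the layer below it. The row format, the outcome labels, and the "miss rate" KPI are all hypothetical; the slides do not define how PSL or ESL are computed, so this sketch does not attempt to reproduce them.

```python
from collections import Counter

# Layer 0: raw telemetry rows (hypothetical pipe-delimited format).
RAW = [
    "2014-09-01|m1|file|Zbot|encounter",
    "2014-09-01|m2|file|Zbot|miss",
    "2014-09-01|m3|memory|Sality|encounter",
    "bad row",
    "2014-09-02|m1|file|Zbot|encounter",
]


def curated(raw_rows):
    """Layer 1: parse raw rows into records and drop rows that do not parse."""
    for row in raw_rows:
        parts = row.split("|")
        if len(parts) != 5:
            continue
        day, machine, report, family, outcome = parts
        yield {"day": day, "machine": machine, "report": report,
               "family": family, "outcome": outcome}


def filter_view(rows, outcome):
    """Layer 2: keep only the records with one outcome (e.g. misses or encounters)."""
    return (row for row in rows if row["outcome"] == outcome)


def aggregate(rows):
    """Layer 3: count records per (day, family)."""
    return Counter((row["day"], row["family"]) for row in rows)


def miss_rate(raw_rows, day, family):
    """Layer 4: an illustrative KPI, misses / (misses + encounters). Not PSL or ESL."""
    rows = list(curated(raw_rows))
    encounters = aggregate(filter_view(rows, "encounter"))[(day, family)]
    misses = aggregate(filter_view(rows, "miss"))[(day, family)]
    total = encounters + misses
    return misses / total if total else 0.0


if __name__ == "__main__":
    print(miss_rate(RAW, "2014-09-01", "Zbot"))
```

Because every KPI is built from the same curated and filtered layers, two teams computing the same metric start from identical inputs, which is the main point of formalizing the model.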
PDM
Calculating the PSL, ESL based on the Protection Data Model
Operationalizing
• Monitoring
  - Job execution
  - Output stream creation
  - Dashboard creation
• Automated scheduling
  - Sangam workflows
• Production level libraries
  - Common source
• Production level caches
  - SLA on bug fixes, breaking change notification
• Documentation library
  - MSDN-like docs for reference and discoverability
• Testing framework
What is next?
• A/B testing
  - Increased production job stability
  - Increase agility
• Automatic dashboard generation
  - Custom views for each researcher
• Rule based query generation
  - Parallel logic across single data set
© 2013 Microsoft. All rights reserved. Microsoft, Windows and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.