STAAF: An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis
DESCRIPTION
There has been no shortage of Android malware analysis reports recently, but thus far that trend has not been accompanied by an equivalent scale of publicly released Android application tools or frameworks. To address this issue, we are presenting the Scalable Tailored Application Analysis Framework (STAAF), released as a new OWASP project for public use under Apache License 2.0. The goal of this framework is to allow a team of one or more analysts to efficiently analyze a large number of Android applications. In addition to large scale analysis, the framework aims to promote collaborative analysis through shared processing and results.
Our framework is designed using a modular and distributed approach, which allows each processing node to be highly tailored for a particular task. At the heart of the framework is the Resource Manager (RM) module, which is responsible for tracking samples, managing analysis modules, and storing results. The RM also reduces processing time and data management overhead by deduplicating data and work, and it aids with the scheduling of tasks so that they can be completed as a pipeline or as a single unit. When processing begins, the RM uses several default "primitive" modules that carry out the fundamental operations, such as extracting the manifest, transforming the Dalvik bytecode, and extracting application resources. The analysis modules then use the raw results to extract specific attributes such as permissions, receivers, invoked methods, external resources accessed, and control flow graphs. These results are then stored in a distributed data store, after which the information can be queried for high level trends or targeted searches.
The modular nature of our framework allows independent analyses to happen on a per module basis, and the results of this data processing can be merged with other results at a later time. This design promotes an agile approach to large scale analysis, because it permits a wide array of analyses to run distributed and in parallel. This means that teams with different needs or schedules can complete time-sensitive tasks separately with minimized data processing pipelines, while allowing more complex or time intensive tasks to be added later. Additionally, if analysis needs to be branched at some point in the pipeline, intermediate results can be retained and additional modules can be added that leverage the results from the past analysis steps. The results are also stored in a distributed database and designed to be queried using a map-reduce style query, which offers performance efficiencies and allows the transparent inclusion of remote third party analysis databases. By using this plug-in style analysis framework, we are able to attain more efficient processing schedules and tailor the analysis for a specific need.
This framework is designed to be scalable and extensible, and the initial offering of this framework includes several modules...
TRANSCRIPT
1 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured
STAAF An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis
OWASP AppSec USA
Thursday, September 22, 2011
Allow Me to Introduce Myself
Ryan W Smith
VP Engineering at Praetorian
OWASP DFW Chapter Leader (2011)
Active member of The Honeynet Project (2002- )
8+ years of work with DoD, Intelligence Community, Federal/State/Local governments, and Fortune 500 companies
Presentation Roadmap
• STAAF (Overview)
• Background
• STAAF (Deep Dive)
• Results
• Future Work
• Conclusions
What can STAAF do for you?
Observation #1:
There are a lot of Android app analysis tools freely available
BUT:
They’re typically designed for single app analysis
STAAF leverages the power of these tools as modules, and adds efficiency, scalability, data management, and sharing
What can STAAF do for you?
Observation #2:
Higher value analysis can be attained by analyzing large numbers of applications over long periods of time
SOLUTION:
Reduce the time and complexity for an analyst to process large numbers of apps
Goal: Analyze 50k apps in less than 2 days and make the extracted data readily available to analysts
What can STAAF do for you?
Goal: Minimize analysts’ effort to extract meaningful results from a large number of applications
[Diagram: Process Apps (Mod #1, Mod #2, Mod #3, ... Mod #N) -> Analytic Data -> Process Data (Mod A, Mod B, Mod C, ... Mod Z) -> Meaningful Results]
What is STAAF
What STAAF is NOT
• STAAF is not a standalone application
• STAAF is not only a malware detection or anti-virus engine
• STAAF is not an application collection tool
STAAF is a problem agnostic app analysis framework
Background
Android’s Open App Model
• Low barrier to entry
• Apps hosted and installed from anywhere
• All apps are created equal
• No distinction between core apps and 3rd party apps
• Accept apps based on:
1. Trust of the source
2. Permissions requested
“Legitimate” Monitoring Apps
Ad/Marketing Networks
Social Gaming Networks
“Not-So-Legitimate” Permission Use
• SMS Trojan
– Link to site hosting rogue app for “free movie player”
– Sends 2 premium SMS messages to a Kazakhstan number (about $5 per message)
• Geinimi
– Repackaged apps in Chinese markets
– Sex positions and MonkeyJump2 are known examples
– Central C&C
– Exfiltrates unique device identifiers
– Downloads and installs new apps (with permission)
• DroidDream
– Approx. 50 malicious apps in official market
– Central C&C
– Exfiltrates unique device identifiers
– Downloads additional code modules
STAAF (Deep Dive)
STAAF Workflow
Step 0: STAAF components initialized
Step 1: Users send APKs to be processed
Step 2: Coordinator checks database for previous results and logs new instance data for each APK
Step 3: Coordinator sends new APKs to the file repository service
Step 4: Coordinator sends tasking orders to the task queue
Step 5: Elastic computing nodes pull tasks from their designated task queue
Step 6: Elastic computing nodes pull in the APK and related information
Step 7: After processing, the elastic computing nodes push out processed files and analysis results
Step 8: When all tasking is complete, elastic computing nodes notify the coordinator
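The workflow above can be sketched in a single process. This is only an illustration under stated assumptions, not STAAF's implementation: the names `Coordinator`, `run_node`, and `apk_size` are hypothetical, plain dicts and a `queue.Queue` stand in for the database, file repository, and task queue, and identifying an APK by the SHA-256 of its contents is an assumed choice.

```python
import hashlib
import queue


class Coordinator:
    """Hypothetical coordinator; in-memory structures stand in for real services."""

    def __init__(self):
        self.results_db = {}             # apk_id -> {module_name: result}
        self.file_repo = {}              # apk_id -> raw APK bytes
        self.task_queue = queue.Queue()  # pending (apk_id, module_name) tasks
        self.finished = []               # completion notices from nodes

    def submit(self, apk_bytes, module_names):
        # Assumption: an APK is identified by the SHA-256 of its contents.
        apk_id = hashlib.sha256(apk_bytes).hexdigest()
        # Check the database for previous results for this APK.
        previous = self.results_db.setdefault(apk_id, {})
        # Send only new APKs to the file repository.
        self.file_repo.setdefault(apk_id, apk_bytes)
        # Queue a task for each module that has no stored result yet.
        for name in module_names:
            if name not in previous:
                self.task_queue.put((apk_id, name))
        return apk_id

    def notify_complete(self, apk_id, module_name):
        self.finished.append((apk_id, module_name))


def run_node(coordinator, modules):
    """Drain the task queue the way one computing node might."""
    while True:
        try:
            apk_id, name = coordinator.task_queue.get_nowait()
        except queue.Empty:
            return
        apk_bytes = coordinator.file_repo[apk_id]      # pull in the APK
        result = modules[name](apk_bytes)              # run the analysis
        coordinator.results_db[apk_id][name] = result  # push out the result
        # Report completion (per task here, to keep the sketch short).
        coordinator.notify_complete(apk_id, name)


# Toy module: a real one would extract the manifest, Dex bytecode, and so on.
modules = {"apk_size": len}
coordinator = Coordinator()
apk_id = coordinator.submit(b"fake apk contents", modules)
run_node(coordinator, modules)
coordinator.submit(b"fake apk contents", modules)  # resubmission queues nothing
print(coordinator.results_db[apk_id], coordinator.task_queue.empty())
```

Because the coordinator looks up stored results before queuing work, resubmitting an APK that has already been processed adds no new tasks.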
Task Modules
• Can be registered dynamically
• Task-Oriented
– High level
• What % of apps use permission X
• What are the most common libraries used
– Mid level
• Extract Permissions
• Extract static URLs
• Extract Methods Called
– Low level
• Extract manifest
• Extract Dex bytecode
[Diagram labels: PRIORITY, COMPLEXITY]
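One way to picture dynamically registered, layered task modules is a small registry in which each module names the lower-level modules it depends on. Everything here is a hypothetical sketch: `register`, `run`, `percent_using`, and the dict-based stand-in for an app are illustrative names and shapes, not part of STAAF.

```python
from collections import Counter

REGISTRY = {}  # module name -> (level, dependency names, function)


def register(name, level, depends_on=()):
    """Decorator that registers a task module at run time."""
    def decorator(func):
        REGISTRY[name] = (level, tuple(depends_on), func)
        return func
    return decorator


@register("extract_manifest", level="low")
def extract_manifest(app):
    # Stand-in: a real module would decode AndroidManifest.xml from the APK.
    return app["manifest"]


@register("extract_permissions", level="mid", depends_on=["extract_manifest"])
def extract_permissions(app, manifest):
    return sorted(manifest["permissions"])


def run(name, app, cache=None):
    """Run a module after its dependencies, reusing results within one app."""
    cache = {} if cache is None else cache
    if name not in cache:
        _level, deps, func = REGISTRY[name]
        inputs = [run(dep, app, cache) for dep in deps]
        cache[name] = func(app, *inputs)
    return cache[name]


def percent_using(apps, permission):
    """High-level question: what % of apps use permission X?"""
    counts = Counter()
    for app in apps:
        counts.update(set(run("extract_permissions", app)))
    return 100.0 * counts[permission] / len(apps)


apps = [
    {"manifest": {"permissions": ["SEND_SMS", "INTERNET"]}},
    {"manifest": {"permissions": ["INTERNET"]}},
]
print(percent_using(apps, "SEND_SMS"))
```

The high-level question only touches the mid-level module, which in turn pulls in the low-level one, so new questions can be added without changing the extraction steps.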
Deduplication of Effort
• All intermediate data are cached for later use
– Extract and convert manifest to ASCII
– Extract Dex and convert to Smali and Java
– Compute the control flow graph from the Dex
• Libraries and shared resources must only be processed once
• Apps must only be processed once by each module, ever
Small savings matter at large scales
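A minimal sketch of the "process once" idea, assuming results are cached under the module name plus a hash of the input bytes. The `ResultCache` class, the SHA-256 key, and the `count_bytes` stand-in module are assumptions made for illustration; the slides do not say how STAAF keys its cache.

```python
import hashlib


class ResultCache:
    """Cache keyed by (module name, content hash); both choices are assumptions."""

    def __init__(self):
        self._results = {}
        self.computed = 0  # how many times a module actually ran

    def get_or_compute(self, module_name, data, func):
        key = (module_name, hashlib.sha256(data).hexdigest())
        if key not in self._results:
            self._results[key] = func(data)
            self.computed += 1
        return self._results[key]


def count_bytes(data):
    # Stand-in for an expensive step such as converting Dex to Smali.
    return len(data)


cache = ResultCache()
shared_library = b"bytes of an ad library bundled into many apps"
for _app in ("app_one", "app_two", "app_three"):
    cache.get_or_compute("count_bytes", shared_library, count_bytes)
print(cache.computed)
```

Three apps that bundle identical library bytes trigger a single computation; the other two lookups are served from the cache.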
Distributed Data Sharing
• Sharing app samples is just the beginning
• Share the entire process:
– Raw Application
– Extracted Resources
– Raw Data
– Processed Data
• Or set specific limits on what data is shared
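Limits on sharing could be expressed as a policy over the four stages listed above. The `Stage` enum, the `filter_for_sharing` helper, and the record layout are hypothetical; this only illustrates the idea of sharing some stages while withholding others.

```python
from enum import Enum


class Stage(Enum):
    RAW_APPLICATION = "raw application"
    EXTRACTED_RESOURCES = "extracted resources"
    RAW_DATA = "raw data"
    PROCESSED_DATA = "processed data"


def filter_for_sharing(record, allowed):
    """Keep only the parts of a record whose stage the policy allows."""
    return {stage: value for stage, value in record.items() if stage in allowed}


record = {
    Stage.RAW_APPLICATION: b"apk bytes",
    Stage.RAW_DATA: {"permissions": ["INTERNET"]},
    Stage.PROCESSED_DATA: {"uses_internet": True},
}
# Example policy: share analysis output but keep the sample itself private.
policy = {Stage.RAW_DATA, Stage.PROCESSED_DATA}
shared = filter_for_sharing(record, policy)
print(sorted(stage.name for stage in shared))
```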
Results
Time Trials: STAAF Performance Tests
#  Time   Apps  ECUs  Nodes  Database
1  2h25m  500   1     1      Central
2  2h00m  500   1     2      Central
3  1h56m  500   1     4      Central
4  0h36m  500   1     4      Local
5  0h36m  500   5     1      Central
6  0h28m  500   5     4      Central
7  0h10m  500   5     4      Local
8  0h27m  1722  5     5      Local
9  1h19m  9349  5     10     Local
Achieved 50k apps in ~7 hours*
*Extrapolated from shorter tests
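The ~7 hour figure is consistent with scaling the largest test linearly. The slides do not say which run the extrapolation is based on, so using test 9 (9,349 apps in 1h19m) is an assumption, as are the helper names `to_minutes` and `extrapolate_hours`.

```python
def to_minutes(duration):
    """Convert a string like '1h19m' to minutes."""
    hours, rest = duration.split("h")
    return int(hours) * 60 + int(rest.rstrip("m"))


def extrapolate_hours(apps_tested, duration, target_apps):
    """Scale a measured run linearly to a larger number of apps."""
    return target_apps / apps_tested * to_minutes(duration) / 60


estimate = extrapolate_hours(9349, "1h19m", 50_000)
print(f"{estimate:.1f} hours")
```

Under that assumption the estimate comes out at roughly 7 hours, well inside the stated goal of 50k apps in less than 2 days.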
“One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.” -Amazon
1 ECU, 1 Node, Central DB: Compute Time 2h25m
5 ECUs, 1 Node, Central DB: Compute Time 0h36m
STAAF is bound by both CPU and database throughput
1 ECU, 1 Node, Central DB: Compute Time 2h25m
1 ECU, 4 Nodes, Central DB: Compute Time 1h56m
By using distributed, local databases, STAAF achieves a significant time performance increase
1 ECU, 4 Nodes, Central DB: Compute Time 1h56m
1 ECU, 4 Nodes, Local DB: Compute Time 0h36m
By adding multiple processors with local databases, we achieve near-linear scalability
1 ECU, 1 Node, Central DB: Compute Time 2h25m
1 ECU, 4 Nodes, Local DB: Compute Time 0h36m
By simply increasing the CPU capacity to 5 ECUs, we achieve the same performance as four 1 ECU nodes
1 ECU, 4 Nodes, Local DB: Compute Time 0h36m
5 ECUs, 1 Node, Central DB: Compute Time 0h36m
Once again, using a central database fails to achieve linear performance gains
5 ECUs, 1 Node, Central DB: Compute Time 0h36m
5 ECUs, 4 Nodes, Central DB: Compute Time 0h28m
By using distributed, local databases, we once again achieve near-linear performance gains
5 ECUs, 1 Node, Central DB: Compute Time 0h36m
5 ECUs, 4 Nodes, Local DB: Compute Time 0h10m
By increasing CPU capacity, number of processing nodes, and number of databases, we decreased processing time by 14.5x
1 ECU, 1 Node, Central DB: Compute Time 2h25m
5 ECUs, 4 Nodes, Local DB: Compute Time 0h10m
Larger tests confirm that STAAF continues to scale linearly
5 ECUs, 5 Nodes, Local DB, 1722 Apps: Compute Time 0h27m
5 ECUs, 10 Nodes, Local DB, 9349 Apps: Compute Time 1h19m
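The scaling claims can be checked against the time trial figures with a little arithmetic. The numbers below are copied from the performance table; the helper names `speedup` and `throughput` are illustrative.

```python
# (minutes, apps, ECUs, nodes, database) for selected tests from the table
TESTS = {
    1: (145, 500, 1, 1, "Central"),
    4: (36, 500, 1, 4, "Local"),
    7: (10, 500, 5, 4, "Local"),
    8: (27, 1722, 5, 5, "Local"),
    9: (79, 9349, 5, 10, "Local"),
}


def speedup(baseline, improved):
    """How many times faster `improved` ran than `baseline` (same app count)."""
    return TESTS[baseline][0] / TESTS[improved][0]


def throughput(test):
    """Apps processed per minute."""
    minutes, apps = TESTS[test][:2]
    return apps / minutes


print(speedup(1, 7))                            # 145 minutes vs 10 minutes
print(round(speedup(1, 4), 2))                  # four 1-ECU nodes, local databases
print(round(throughput(9) / throughput(8), 2))  # nodes doubled from 5 to 10
```

Test 7 is 14.5 times faster than test 1, matching the figure quoted above. Four nodes with local databases give about a 4x speedup over a single node, and doubling the nodes from 5 to 10 raises throughput by a little under 2x.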
Initial Results :: Permissions Requests
53,000 Applications Analyzed
Android Market: ~48,000
3rd Party Markets: ~5,000
Permissions Requested
Average: 3
Most Requested: 117
Permission              Apps    Share
Location Data           11,929  24%
Read Contacts            3,636   8%
Send SMS                 1,693   4%
Receive SMS              1,262   4%
Record Audio             1,100   2%
Read SMS                   832   2%
Process Outgoing Calls     323   1%
Additional Results :: Shared Libraries
53,000 Applications Analyzed
Android Market: ~48,000
3rd Party Markets: ~5,000
Library              Share  Apps
com.admob            38%    18,426
org.apache            8%     3,684
com.google.android    6%     2,838
com.google.ads        6%     2,779
com.flurry            6%     2,762
com.mobclix           4%     2,055
com.millennialmedia   4%     1,758
Permissions Are Not a Good Indicator
SMS Trojan
SMS Replicator
DroidDream
zsones & Fake Security Tool
Geinimi
Malware only needs a single permission
Future Work
STAAF’s Future
• Build a publicly available user interface
• Provide a dashboard with global stats
• Further tune database performance
• Build more complex analysis modules
– Static data flow analysis
– Dynamic sandbox analysis
• Expose a public module interface through UI
Conclusions
Final Thoughts
• STAAF is a system of systems and services, not an application
• STAAF enables large scale Android application analysis
• STAAF is problem agnostic and can be tailored to answer many analytic questions
• STAAF augments the capabilities of the analyst; it does not replace them
• STAAF achieves scalable performance increases by increasing computer nodes/power
Q&A