staaf, an efficient distributed framework for performing large-scale android application analysis

49
1 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured STAAF An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis OWASP AppSec USA Thursday, September 22, 2011

Upload: praetorian

Post on 09-Jun-2015

4.925 views

Category:

Technology


1 download

DESCRIPTION

There has been no shortage of Android malware analysis reports recently, but thus far that trend has not been accompanied with an equivalent scale of released public Android application tools or frameworks. To address this issue, we are presenting the Scalable Tailored Application Analysis Framework (STAAF), released as a new OWASP project for public use under Apache License 2.0. The goal of this framework is to allow a team of one or more analysts to efficiently analyze a large number of Android applications. In addition to large scale analysis, the framework aims to promote collaborative analysis through shared processing and results.Our framework is designed using a modular and distributed approach, which allows each processing node to be highly tailored for a particular task. At the heart of the framework is the Resource Manager (RM) module, which is responsible for tracking samples, managing analysis modules, and storing results. The RM also serves to reduce processing time and data management through the deduplication of data and work, and it also aids with the scheduling of tasks so that they can be completed as a pipeline or as a single unit. When processing begins, the RM uses several default "primitive" modules that carry out the fundamental operations, such as extracting the manifest, transforming the Dalvik bytecode, and extracting application resources. The analysis modules then use the raw results to extract specific attributes such as permissions, receivers, invoked methods, external resources accessed, control flow graphs, etc., and these results are then stored in a distributed data store, after which the information can be queried for high level trends or targeted searches.The modular nature of our framework allows independent analyses to happen on a per module basis, and the results of this data processing can be merged with other results at a later time. This design promotes an agile approach to large scale analysis, because it permits a wide array of analysis to happen distributively and in parallel. This means that teams with different needs or schedules can complete time-sensitive tasks separately with minimized data processing pipelines, while allowing more complex or time intensive tasks to be added later. Additionally, if analysis needs to be branched at some point in the pipeline, intermediate results can be retained and additional modules can be added leveraging the results from the past analysis steps. The results are also stored in a distributed database and designed to be queried using a map-reduce style query, which offers performance efficiencies as well as allowing the transparent inclusion of remote third party analysis databases. By using this plug-in style analysis framework, we are able to attain more efficient processing schedules and tailor the analysis for a specific need.This framework is designed to be scalable and extensible, and the initial offering of this framework includes several modules...

TRANSCRIPT

Page 1: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

1 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

OWASP AppSec USA

Thursday, September 22, 2011

Page 2: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

2 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Allow Me to Introduce Myself

Ryan W Smith VP Engineering at Praetorian

OWASP DFW Chapter Leader (2011)

Active member of The Honeynet

Project (2002- )

8+ years of work with DoD,

Intelligence Community,

Federal/State/Local governments,

and Fortune 500 companies

Page 3: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

3 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Page 4: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

4 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 5: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

5 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What can STAAF do for you?

Observation #1:

There are a lot of Android app analysis

tools freely available

BUT:

They’re typically designed for single app

analysis

STAAF leverages the power of these tools as modules, And adds efficiency, scalability, data mgmt and sharing

Page 6: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

6 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What can STAAF do for you?

Analyze 50k apps in less than 2 days and make

the extracted data readily available to analysts Goal

Observation #2:

Higher value analysis can be attained by

analyzing large numbers of applications

over long periods of time

SOLUTION:

Reduce the time and complexity for an

analyst to process large numbers of apps

Page 7: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

7 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Analytic Data

What can STAAF do for you?

Minimize analysts’ effort to extract meaningful

results from a large number of applications Goal

Process Apps

Mod #1

Mod #2

Mod #3

Mod #N

Process Data Mod A

Mod B

Mod C

Mod Z

Meaningful

Results

Page 8: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

8 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What is STAAF

Page 9: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

9 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What is STAAF

Page 10: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

10 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What is STAAF

Page 11: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

11 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What is STAAF

Page 12: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

12 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What is STAAF

Page 13: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

13 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

What STAAF is NOT

• STAAF is not a stand alone application

• STAAF is not only a malware detection or anti-virus engine

• STAAF is not an application collection tool

STAAF is a problem agnostic app analysis framework

Page 14: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

14 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 15: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

15 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Android’s Open App Model

• Low barrier to entry

• Apps hosted and installed from anywhere

• All apps are created equal

• No distinction between core apps and 3rd party apps

• Accept apps based on: 1. Trust of the source

2. Permissions requested

Page 16: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

16 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

“Legitimate” Monitoring Apps

Ad/Marketing Networks

Social Gaming Networks

Page 17: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

17 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

“Not-So-Legitimate” Permission Use

SMS Trojan – Link to site hosting rogue app for “free movie player” – Sends 2 Premium SMS messages to a Kazakhstan number

(about $5 per message)

Gemini – Repackaged apps in Chinese markets – Sex positions and MonkeyJump2 are known examples – Central C&C – Exfiltrates unique device identifiers – Downloads and Install New Apps (with permission)

DroidDream – Approx. 50 Malicious apps in official market – Central C&C – Exfiltrates unique device identifiers – Downloads additional code modules

Page 18: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

18 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 19: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

19 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 0: STAAF components initialized

Page 20: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

20 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 1: Users sends APKs to be processed

Page 21: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

21 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 2: Coordinator checks database for previous

results and logs new instance data for each APK

Page 22: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

22 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 3: Coordinator sends new APKs to the file

repository service

Page 23: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

23 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 4: Coordinator sends tasking orders to the task

queue

Page 24: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

24 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 5: Elastic computing nodes pull tasks from their

designated task queue

Page 25: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

25 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 6: Elastic computing nodes pull in the APK and

related information

Page 26: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

26 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 6: After processing the elastic computing nodes

push out processed files and analysis results

Page 27: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

27 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF Workflow

Step 7: When all tasking is complete elastic

computing nodes notify the coordinator

Page 28: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

28 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Task Modules

• Can be registered dynamically

• Task-Oriented

– High level

• What % of apps use permission X

• What is the most common libraries used

– Mid level

• Extract Permissions

• Extract static URLs

• Extract Methods Called

– Low level

• Extract manifest

• Extract Dex bytecode

PR

IOR

ITY

CO

MP

LE

XIT

Y

Page 29: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

29 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Deduplication of Effort

• All Intermediate data are cached for later use

– Extract and convert manifest to ASCII

– Extract Dex and convert to Smali and Java

– Compute the control flow graph from the Dex

• Libraries and shared resources must only be processed once

• Apps must only be processed once by each module, ever

Small savings matter at large scales

Page 30: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

30 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Distributed Data Sharing

• Sharing app samples is just the beginning

• Share the entire process:

– Raw Application

– Extracted Resources

– Raw Data

– Processed Data

• Or set specific limits on what data is shared

Page 31: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

31 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 32: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

32 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

Achieved 50k apps in ~7 hours* *Extrapolated from shorter tests

Page 33: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

33 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

“One EC2 Compute Unit (ECU) provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.” -Amazon

1 ECU

1 Node

Central DB

5 ECUs

1 Node

Central DB

Compute Time 2h25m

Compute Time 0h36m

Page 34: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

34 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

STAAF is bound by both CPU and database throughput

1 ECU

1 Node

Central DB

1 ECU

4 Nodes

Central DB

Compute Time 2h25m

Compute Time 1h56m

Page 35: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

35 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

By using distributed, local databases STAAF achieves a significant time performance increase

1 ECU

4 Nodes

Central DB

1 ECU

4 Nodes

Local DB

Compute Time 1h56m

Compute Time 0h36m

Page 36: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

36 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

Using by adding multiple processors with local databases, we achieve near linear scalability

1 ECU

1 Node

Central DB

1 ECU

4 Nodes

Local DB

Compute Time 2h25m

Compute Time 0h36m

Page 37: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

37 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

By simply increasing the CPU capacity to 5 ECUs, we achieve the same performance as four 1 ECU nodes

1 ECU

4 Nodes

Local DB

5 ECUs

1 Node

Central DB

Compute Time 0h36m

Compute Time 0h36m

Page 38: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

38 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

Once again, using a central database fails to achieve linear performance gains

5 ECUs

1 Node

Central DB

5 ECUs

4 Nodes

Central DB

Compute Time 0h36m

Compute Time 0h28m

Page 39: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

39 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

By using distributed, local databases we once again achieve near linear performance gains

5 ECUs

1 Node

Central DB

5 ECUs

4 Nodes

Local DB

Compute Time 0h36m

Compute Time 0h10m

Page 40: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

40 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

By increasing CPU capacity, number of processing nodes, and number of databases, we decreased processing time by 14.5x

1 ECU

1 Node

Central DB

5 ECUs

4 Nodes

Local DB

Compute Time 2h25m

Compute Time 0h10m

Page 41: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

41 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Time Trials STAAF Performance Tests

# Time Apps ECUs Nodes Database

1 2h25m 500 1 1 Central

2 2h00m 500 1 2 Central

3 1h56m 500 1 4 Central

4 0h36m 500 1 4 Local

5 0h36m 500 5 1 Central

6 0h28m 500 5 4 Central

7 0h10m 500 5 4 Local

8 0h27m 1722 5 5 Local

9 1h19m 9349 5 10 Local

Larger tests confirm that STAAF continues to scale linearly

5 ECUs

5 Nodes

Local DB

5 ECUs

10 Nodes

Local DB

1722 Apps Compute Time

0h27m

9349 Apps Compute Time

1h19m

Page 42: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

42 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Permissions Requested

Average: 3

Most Requested: 117

Initial Results :: Permissions Requests 53,000 Applications Analyzed

Android Market: ~48,000

3rd Party Markets: ~5,000

Location Data 11,929 (24%)

Read Contacts 3,636 (8%)

Send SMS 1,693 (4%)

Receive SMS 1,262 (4%)

Record Audio 1,100 (2%)

Read SMS 832 (2%)

Process Outgoing Calls 323 (1%)

Page 43: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

43 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Additional Results :: Shared Libraries 53,000 Applications Analyzed

Android Market: ~48,000

3rd Party Markets: ~5,000

com.admob 38% (18,426 apps )

org.apache 8% ( 3,684 apps )

com.google.android 6% ( 2,838 apps )

com.google.ads 6% ( 2,779 apps )

com.flurry 6% ( 2,762 apps )

com.mobclix 4% ( 2,055 apps )

com.millennialmedia 4% ( 1,758 apps)

Page 44: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

44 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Permissions Are Not a Good Indicator

SMS Trojan

SMS Replicator

Droid Dream

zsones

&

Fake Security Tool Gemeni

Malware only needs a single permission

Page 45: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

45 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 46: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

46 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

STAAF’s Future

• Build a publically available user interface

• Provide a dashboard with global stats

• Further Tune database performance issues

• Build more complex analysis modules – Static data flow analysis

– Dynamic sandbox analysis

• Expose a public module interface through UI

Page 47: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

47 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Presentation Roadmap

• STAAF (Overview)

• Background

• STAAF (Deep Dive)

• Results

• Future Work

• Conclusions

Page 48: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

48 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Final Thoughts

• STAAF is a system of systems and services, not an application

• STAAF enables large scale Android application analysis

• STAAF is problem agnostic and can be tailored to answer many analytic questions

• STAAF augments the capabilities of the analyst, it does not replace them

• STAAF achieves scalable performance increases by increasing computer nodes/power

Page 49: STAAF, An Efficient Distributed Framework for Performing Large-Scale Android Application Analysis

49 Entire contents © 2011 Praetorian. All rights reserved. Your World, Secured

Q&A