information flow leaks in android apps and analysis of ... · semantics. associate the semantics...

22
FlowCog: Context-aware Semantics Extraction and Analysis of Information Flow Leaks in Android Apps Xiang Pan, Yinzhi Cao, Xuechao Du, Boyuan He, Gan Fang, Yan Chen. Northwestern University, Johns Hopkins University Zhejiang University, Google

Upload: others

Post on 04-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

FlowCog: Context-aware Semantics Extraction

and Analysis of Information Flow Leaks

in Android Apps

Xiang Pan, Yinzhi Cao, Xuechao Du, Boyuan He, Gan Fang, Yan Chen.

Northwestern University, Johns Hopkins UniversityZhejiang University, Google

Page 2: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Roadmap1. Motivating Example2. FlowCog Overview 3. Design

a. View Dependency Explorerb. Flow and Semantics Correlation Inference

4. Implementation5. Evaluation & Case Study6. Conclusion

Page 3: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

MainActivity:onCreateuserLocation = getLastKnownLocation(...)allowShareLoc = (CheckBox) findViewById(...)

Main_activity.xml <Button android:onClick=”perfomUpdateWeather”>

MainActivity:performUpdateWeather()task=new UpdateWeatherAsyncTask(userLocation, allowShareLoc.isChecked())task.execute()

UpdateWeatherAsyncTask:doInBackground()

if (allowShareLoc) ServerApi.postData(location)else showDialogForPermission();

“Share location to automatically update city”

“Update Weather”

ServerApi.postData(location)req = prepareWeatherUpdateReq(location)HttpClient.execute(req) ServerApi.postData(location)

PosBtnListener:onClick()

Let our app provide accurate weather information based on your current location, this ...

(1)Source

(2)

(3)

(3)(4) Sink

Page 4: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

[MockDroid] Beresford, Alastair R., et al. "Mockdroid: trading privacy for application functionality on smartphones." Proceedings of the 12th workshop on mobile computing systems and applications. ACM, 2011.

[SmartDroid] Zheng, Cong, et al. "Smartdroid: an automatic system for revealing ui-based trigger conditions in android applications." Proceedings of the second ACM workshop on Security and privacy in smartphones and mobile devices. ACM, 2012.

[AppIntent] Yang, Zhemin, et al. "Appintent: Analyzing sensitive data transmission in android for privacy leakage detection." Proceedings of the 2013 ACM SIGSAC conference on Computer & communications security. ACM, 2013.

[Epicc] Octeau, Damien, et al. "Effective inter-component communication mapping in android with epicc: An essential step towards holistic security analysis." Effective Inter-Component Communication Mapping in Android with Epicc: An Essential Step Towards Holistic Security Analysis (2013).

[AmanDroid] Wei, Fengguo, Sankardas Roy, and Xinming Ou. "Amandroid: A precise and general inter-component data flow analysis framework for security vetting of android apps." Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 2014.

[TaintDroid] Enck, William, et al. "TaintDroid: an information-flow tracking system for realtime privacy monitoring on smartphones." ACM Transactions on Computer Systems (TOCS) 32.2 (2014): 5.

[FlowDroid] Arzt, Steven, et al. "Flowdroid: Precise context, flow, field, object-sensitive and lifecycle-aware taint analysis for android apps." Acm Sigplan Notices49.6 (2014): 259-269.

[IccTA] Li, Li, et al. "Iccta: Detecting inter-component privacy leaks in android apps." Proceedings of the 37th International Conference on Software Engineering-Volume 1. IEEE Press, 2015.

[Apparecium] Titze, Dennis, and Julian Schütte. "Apparecium: Revealing data flows in android applications." Advanced Information Networking and Applications (AINA), 2015 IEEE 29th International Conference on. IEEE, 2015.

[Droidsafe] Gordon, Michael I., et al. "Information Flow Analysis of Android Applications in DroidSafe." NDSS. Vol. 15. 2015.

[VetDroid] Pravin, Ms Nigam Paridhi. "Vetdroid: Analysis using permission for vetting undesirable behaviours in android applications." International Journal of Innovative and Emerging Research in Engineering 2.3 (2015): 131-136.

Cannot tell which flow is legitimate!

58% flows are legitimate!

Big burden on users!

Page 5: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Roadmap1. Motivating Example2. FlowCog Overview 3. Design

a. View Dependency Explorerb. Flow and Semantics Correlation Inference

4. Implementation5. Evaluation & Case Study6. Conclusion

Page 6: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

FlowCog Overview● First system to extract Android app’s flow-specific

semantics. ● Associate the semantics with flow’s behavior.

● High level steps:○ Associate each flow with its related views via static

analysis and an optional dynamic analysis.○ Extract view semantics. (e.g., “Update Weather”)○ Determine if semantics provides information about

flow behavior.

Page 7: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

FlowCog Architecture: Semantics Extraction (1/2)

App

1. Data flow analysis with FlowDroid.

App

Flow

2. Activation event and guarding conditions.

3. View dependency explorer.

Flow path

Dynamic Analysis

App4. Semantic Extractor.

Views

“Share location to automatically update city”

Activation Event.

UpdateWeatherAsyncTask:doInBackground

if (allowShareLoc) sendDataelse ...

Guarding Condition

Page 8: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

FlowCog: Flow and Semantics Correlation Inference (2/2)

Semantic Extractor

Classifier

Learning classifier.

Learning-free classifier

Ex: “Share location to automatically update city” => <update, city>, <share, location>

FilterFlow-specific

texts.

App description.[<Verb, Noun>,<Verb, Noun>, …,]

Page 9: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Roadmap1. Motivating Example2. FlowCog Overview 3. Design

a. View Dependency Explorerb. Flow and Semantics Correlation Inference

4. Implementation5. Evaluation & Case Study6. Conclusion

Page 10: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Design: View Dependency Explorer● Formalized into taint analysis problem.

○ Source: all the views. ■ findViewById(...) call statements.■ All new statements that create classes (and their builders)

extending View and Dialog classes.

○ Sink: ■ Statements in given data flow.■ Guarding condition statements.■ All the activation events’ registration statements.

○ Use IFDS framework provided by FlowDroid.

Page 11: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Design: Flow and Semantics Correlation Inference

Flow-specific

texts.

App description.Classifier

Learning Classifier

Learning-free Classifier

Filter

Vectorize input using TF-IDF.Classify using SVM and Gradient Boosting.

Use Word2Vec to convert two inputs into two vector lists, and then compute their similarity score.

Documentation of

source and sink

methods.

Page 12: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Roadmap1. Motivating Example2. FlowCog Overview 3. Design

a. View Dependency Explorerb. Flow and Semantics Correlation Inference

4. Implementation5. Evaluation & Case Study6. Conclusion

Page 13: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

ImplementationComponent Language Loc

Flow-related Semantics Extraction

Java ~12,000

Classifier Python ~3,000

Dynamic Analysis Python, Java ~1,000

Total Python, Java ~16,000

Page 14: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Roadmap1. Motivating Example2. FlowCog Overview 3. Design

a. View Dependency Explorerb. Flow and Semantics Correlation Inference

4. Implementation5. Evaluation & Case Study6. Conclusion

Page 15: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Evaluation: Ground TruthType Apps Apps

with Flows

Legitimate Flows

Malicious Flows

Total Flows

Benign 1,299/4,500 361 688 355 1,043

Malicious[Drebin dataset]

586/1,500 255 675 624 1,299

Overall 1,885/6,000 616 1,363 979 2,342

Page 16: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Evaluation: FlowCog Achieves High Accuracy.

Type Flows Precision Recall Accuracy

Benign 1,043 90.3% 95.1% 90.7%

Malicious 1,299 89.9% 91.0% 89.6%

Overall 2,342 90.1% 93.1% 90.2%

Page 17: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Evaluation: Size of Data Set Matters.Type # of Flows Accuracy

Credential 443 96.2%

Device/Card Id 373 91.4%

Phone number 103 94.2%

Internet 1,009 91.7%

SMS 233 83.7%

Location 173 79.2%

Contact 132 79.5%

Calendar 12 66.7%

Less accurate flow types can be improved with more training data.

Page 18: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Case Study: Home of Ocarina● Leaks out users’ geo-location. ● Labeled as legitimate because of

extracted semantics.

“Map”, “The location of home of Ocarina”

Page 19: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Case Study: SMS Irritate● Leaks out user-specific

information via SMS.● Labeled as legitimate.

“Send SMS”, “Number of SMS to flood”, “Message”

Page 20: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Case Study: Merry Christmas● Leaks out users’

information to Internet.● Labeled as malicious.

“Move the box to the target empty position...”

Page 21: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Conclusion● FlowCog is the first system to extract flow-specific semantics.● FlowCog adopts NLP techniques to associate flow-specific

semantics with flow behaviors.● Our evaluation results show that FlowCog can achieve a precision

of 90.1% and a recall of 93.1%.

Page 22: Information Flow Leaks in Android Apps and Analysis of ... · semantics. Associate the semantics with flow’s behavior. High level steps: Associate each flow with its related views

Thanks!

FlowCog open-source at: https: //github.com/SocietyMaster/FlowCog