TRANSCRIPT
Using Social Network Analysis Methods for
the Prediction of Faulty Components
Gholamreza Safi
List of Contents
• Motivations
• Our Vision
• Goal Models
• Software Architectural Slices (SAS)
• Predicting fault-prone components
• Comparison with related works
• Conclusions
Motivations
• Finding errors as early as possible in the software life-cycle is important
• Using dependency data and socio-technical analysis:
  • Considering dependencies between software elements
  • Considering interactions between developers during the life-cycle
[Bird et al 2009]
Our Vision
• Provide a facility for considering the concerns of roles other than the developers who participate in the development process
• Not directly, like socio-technical-based approaches:
  • Complexity
  • Some basis is needed to model the concerns
• Goal models and software architectures
Goal Models
Goal models and software architectures
• Software Architecture (SA): a set of principal design decisions
• Goal models represent the different ways of satisfying a high-level goal
• They can have an impact on the SA
• Components and connectors are a common representation of an SA
• So we should show the impact of goal models on this representation of the SA
Software Architectural Slices(SAS)
• A Software Architectural Slice (SAS) is a part of a software architecture (a subset of interacting components and related connectors) that provides the functionality of a leaf-level goal of a goal model graph
• An algorithm is designed to extract the SASs of a system, given the goal model and the entry points of the leaf-level goals in the SA
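The extraction algorithm itself is not shown in the slides. A minimal sketch, under the assumption that a slice is the set of components reachable from the leaf-level goal's entry component over the architecture's connectors; the `ARCHITECTURE` graph here is a hypothetical fragment using names from the SAS example:

```python
from collections import deque

# Hypothetical component-connector graph (adjacency list); only a fragment
# of the example architecture, with assumed connectors.
ARCHITECTURE = {
    "User Interface": ["User Manager"],
    "User Manager": ["User Interface", "User Data Interface"],
    "User Data Interface": ["User Manager"],
}

def extract_slice(architecture, entry_component):
    """Collect every component reachable from a leaf-level goal's entry
    point by breadth-first traversal over the connectors."""
    slice_components = {entry_component}
    queue = deque([entry_component])
    while queue:
        current = queue.popleft()
        for neighbour in architecture.get(current, []):
            if neighbour not in slice_components:
                slice_components.add(neighbour)
                queue.append(neighbour)
    return slice_components

print(sorted(extract_slice(ARCHITECTURE, "User Interface")))
# ['User Data Interface', 'User Interface', 'User Manager']
```

Starting from the User Interface entry point, the traversal recovers the three-component slice listed for "Send request for topic" on the next slide.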
Example of SAS
Leaf-Level Goal in Goal Model | Slice
Send request for topic | User Interface, User Manager, User Data Interface
Decrypt received message | User Manager
Send Requests for Interests | User Interface, User Manager, User Data Interface
Send Request for time table | User Interface, User Manager, User Data Interface, Time Table Manager
Choose schedule Automatically | User Interface, User Manager, User Data Interface, Event Manager, Event Data Interface
Select Participants Explicitly | User Interface, User Manager, User Data Interface
Collect Timetables by system from Agents | User Interface, Agent Manager Interface
Predicting Fault-Prone Components
• Social network analysis methods
• Metrics:
  • Connectivity metrics: individual nodes and their immediate neighbors
    • Degree
  • Centrality metrics: relations between non-immediate neighbor nodes in the network
    • Closeness
    • Betweenness
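As a concrete illustration of the two metric families, a small sketch computing degree and the slide's notion of closeness (sum of shortest-path distances to all other nodes, divided by n − 1, so lower means more central) on a hypothetical toy graph, not the example architecture itself:

```python
from collections import deque

# Hypothetical toy component graph (undirected adjacency sets).
GRAPH = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

def degree(graph, node):
    # Connectivity metric: number of immediate neighbours.
    return len(graph[node])

def closeness(graph, node):
    # Centrality metric as tabulated on the next slide: BFS shortest-path
    # distances to every other node, summed and divided by (n - 1).
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in dist:
                dist[neighbour] = dist[current] + 1
                queue.append(neighbour)
    return sum(dist.values()) / (len(graph) - 1)

print(degree(GRAPH, "C"), closeness(GRAPH, "C"))  # 3 1.0
```

Node C touches three neighbours and reaches every other node in one hop, so its average distance is 1.0, the most central node of this toy graph.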
Component | Degree | Closeness | Betweenness
User Interface | 4 | 8/6 = 1.33 | 6+1+1+1 = 9
User Manager | 2 | 11/6 = 1.83 | 1/2+1/2+1/2 = 1.5
Timetable Manager | 3 | 9/6 = 1.5 | 1/2+1/3+1+1/3+1/2+1/2 = 2.88
Event Manager | 2 | 11/6 = 1.83 | 1/3+1/3 = 2/3
Agent Manager Interface | 2 | 11/6 = 1.83 | 1/3+1/3 = 2/3
User Data Interface | 2 | 14/6 = 2.33 | 0
Event Data Interface | 3 | 12/6 = 2 | 0
Aggregated Metrics based on SAS
Leaf-Level Goal | Aggregated Degree | Aggregated Closeness | Aggregated Betweenness
Send request for topic | 8 | 33/6 | 10.5
Decrypt received message | 2 | 11/6 | 1.5
Send Requests for Interests | 8 | 33/6 | 10.5
Send Request for time table | 11 | 42/6 = 7 | 10.5 + 2.88 = 13.38
Choose schedule Automatically | 13 | 58/6 | 10.5 + 2/3 = 11.16
Select Participants Explicitly | 8 | 33/6 | 10.5
Collect Timetables by system from Agents | 6 | 19/6 | 9 + 2/3 = 9.66
• Metrics for individual components may not be very useful for test-related analysis, since they only provide information for unit-level testing
• In a real computation, many components collaborate with each other to provide a service or satisfy a goal of the system
• A bug in one of them could have a bad impact on all of the other collaborators
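The aggregation rule is not stated explicitly, but summing the per-component metrics over a slice's components reproduces the tabulated values. A sketch under that assumption, using the per-component numbers from the earlier table (only the components of the "Send request for topic" slice are included):

```python
# Per-component metrics copied from the earlier table.
COMPONENT_METRICS = {
    "User Interface":      {"degree": 4, "closeness": 8 / 6,  "betweenness": 9.0},
    "User Manager":        {"degree": 2, "closeness": 11 / 6, "betweenness": 1.5},
    "User Data Interface": {"degree": 2, "closeness": 14 / 6, "betweenness": 0.0},
}

def aggregate(slice_components, metric):
    # Assumed aggregation rule: sum the metric over the slice's components.
    return sum(COMPONENT_METRICS[c][metric] for c in slice_components)

topic_slice = ["User Interface", "User Manager", "User Data Interface"]
print(aggregate(topic_slice, "degree"))       # 8, matching the table
print(aggregate(topic_slice, "betweenness"))  # 10.5, matching the table
```

Degree 4 + 2 + 2 = 8, closeness 8/6 + 11/6 + 14/6 = 33/6, and betweenness 9 + 1.5 + 0 = 10.5 all agree with the row for "Send request for topic" above.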
Logistic Regression
f(z) = 1 / (1 + e^(-z))     (1)
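A minimal sketch of the logistic function in equation (1), which maps any real-valued score z to a probability between 0 and 1:

```python
import math

def f(z):
    # Equation (1): squashes any real z into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(f(0))  # 0.5: a score of zero means even odds of being faulty
```

Large positive z drives the probability toward 1, large negative z toward 0.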
Logistic regression and Architectural slices
• We want to select the beta values for the three aggregated metrics
• After this, using f(z), we can find the probability that the corresponding architectural slice encounters at least one error
• The process of making a logistic regression model ready for prediction has two stages:
  • Training
  • Validation
z = β0 + β1x1 + β2x2 + β3x3
How to train and validate?
• Consider a test suite and, based on the number of failed test cases, compute the probability of a slice being faulty (number of failed test cases for that slice / total number of test cases)
• Then, using the metrics, try to find beta values that make f(z) close to the computed probability
• Evaluate the model with actual data
• Validation measures can help us determine the quality of our initial model
• The process of training and validation should be repeated until we reach a certain level of confidence in our model
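The talk does not specify how the beta values are fitted, so the following is only a sketch using plain stochastic gradient descent on the cross-entropy loss; the training data (`X`, `y`) is hypothetical: aggregated (degree, closeness, betweenness) per slice and the fraction of its test cases that failed.

```python
import math

def sigmoid(z):
    # Logistic function from equation (1).
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, targets, steps=20000, lr=0.001):
    """Fit beta values so that f(z), with z = beta0 + sum(beta_i * x_i),
    approximates each slice's observed fault probability."""
    betas = [0.0] * (len(samples[0]) + 1)  # beta0 plus one beta per metric
    for _ in range(steps):
        for x, t in zip(samples, targets):
            z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
            residual = sigmoid(z) - t       # dLoss/dz for cross-entropy
            betas[0] -= lr * residual
            for i, xi in enumerate(x):
                betas[i + 1] -= lr * residual * xi
    return betas

# Hypothetical slices: aggregated metrics and observed fault probability.
X = [(8.0, 5.5, 10.5), (2.0, 1.83, 1.5)]
y = [0.6, 0.1]
b = train(X, y)
predicted = [sigmoid(b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))) for x in X]
print([round(p, 2) for p in predicted])
```

After training, f(z) for each slice should be close to its observed fault probability; the validation stage would then repeat this check on held-out data.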
Measures for Validation
• Precision: ratio of true positives to (true positives + false positives)
  • True positives: error-prone slices that are also determined to be error-prone by the model
  • False positives: slices that have no errors but are shown to have errors by the approach
• Recall: ratio of true positives to (true positives + false negatives)
  • False negatives: slices that are mistakenly considered error-free by the approach
• F score: the harmonic mean of precision and recall, 2 · (precision · recall) / (precision + recall)
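The three measures above are direct to compute from the confusion counts; a short sketch with hypothetical counts (8 slices correctly flagged as error-prone, 2 false alarms, 2 missed):

```python
def precision(tp, fp):
    # Fraction of slices flagged error-prone that truly are.
    return tp / (tp + fp)

def recall(tp, fn):
    # Fraction of truly error-prone slices that the model catches.
    return tp / (tp + fn)

def f_score(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

p, r = precision(8, 2), recall(8, 2)
print(p, r, round(f_score(p, r), 2))  # 0.8 0.8 0.8
```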
Related works
• Zimmermann and Nagappan
  • Use dependency data between different parts of the code
  • These kinds of techniques are accurate
  • Central components could have more errors
• Bird et al., using a socio-technical network
  • Consider the work of developers and the updates they make to files
  • Similar to Meneely et al.
  • The main idea: a developer who participated in developing different files could have the same impact on those files
    – Make the same sorts of faults
Comparison with related works
• Our approach has the benefits of dependency-data approaches
  • Dependencies between SA components
  • Dependencies between goals and the SA
  • Goal models introduce some advantages
• Compared to social-network-based approaches:
  • They only consider simple contributions of developers, such as updating a file
  • Goals and their relations show the concerns of stakeholders
    • They consider the impacts of different stakeholders implicitly
    • And other aspects of a developer:
      – Lack of knowledge in using a specific technology
      – Strong experience in using a method or technology
• Augmenting our approach to consider developer interactions is also possible
Conclusion
• Introduced metrics based on dependencies between the components of a software architecture
• Introduced aggregated metrics to show the impact of goal selection on error prediction, using architectural slices
• Prediction using logistic regression:
  • Training
  • Validation
• Compared to existing works:
  • We can consider roles other than developers
  • Different aspects of the contributions of developers
• Evaluations
References
• T. Zimmermann and N. Nagappan, “Predicting Subsystem Failures using Dependency Graph Complexities,” 18th IEEE International Symposium on Software Reliability Engineering (ISSRE '07), Trollhättan, Sweden, 2007, pp. 227-236.
• A. Meneely, L. Williams, W. Snipes, and J. Osborne, “Predicting Failures with Developer Networks and Social Network Analysis,” 16th ACM SIGSOFT International Symposium on Foundations of Software Engineering (SIGSOFT '08/FSE-16), Atlanta, Georgia, 2008, p. 13.
• C. Bird, N. Nagappan, H. Gall, B. Murphy, and P. Devanbu, “Putting It All Together: Using Socio-technical Networks to Predict Failures,” 20th International Symposium on Software Reliability Engineering (ISSRE '09), Mysuru, India, 2009, pp. 109-119.