SoftLab
Boğaziçi University, Department of Computer Engineering
Software Engineering Research Lab
http://softlab.boun.edu.tr/
Research Challenges
The trend to large, heterogeneous, distributed software systems leads to an increase in system complexity
Software and service productivity lags behind requirements
Increased complexity takes software developers further from stakeholders
The importance of interoperability, standardisation and reuse of software is increasing
Research Challenges
Service Engineering
Complex Software Systems
Open Source Software
Software Engineering Research
Software Engineering Research Approaches
Balancing theory and praxis
How engineering research differs from scientific research
The role of empirical studies
Models for SE research
The need to link research with practice
Why, after 25 years of SE, has SE research failed to influence industrial practice and the quality of the resulting software?
Potts argues that this failure is caused by treating research and its application by industry as separate, sequential activities, which he calls the research-then-transfer approach.
The solution he proposes is the industry-as-laboratory approach.
Colin Potts, Software Engineering Research Revisited, IEEE Software, September 1993
Industry-as-Laboratory Approach
The connection is stronger at the start because knowledge of the problem is acquired from real practitioners in industry, often industrial partners in a research consortium.
The connection is strengthened by practitioners and researchers constantly interacting to develop the solution.
Early evaluation and usage by industry lessens the technology transfer gap.
Reliance on empirical research:
  shift from solution-driven SE to problem-focused SE
  solve problems that really do matter to practitioners
Industry-as-Laboratory Emphasizes Real Case Studies
Advantages of case studies over studying problems in the research lab:
Scale and complexity: small, simple (even simplistic) cases are avoided, since these often bear little relation to real problems.
Unpredictability: assumptions are thrown out as researchers learn more about real problems.
Dynamism: a ‘real’ case study is more vital than a textbook account.
The real-world complications of industrial case studies are more likely to throw up representative problems and phenomena than research laboratory examples influenced by the researchers’ preconceptions.
Need to consider Human/Social Context in SE research
Not all solutions in software engineering are solely technical.
There is a need to examine organizational, social and cognitive factors systematically as well.
Many problems are “people problems”, and require “people-orientated” solutions.
Theoretical SE research
While there is still a place for innovative, purely speculative research in Software Engineering, research which studies real problems in partnership with industry needs to be given a higher profile.
These various forms of research ideally complement one another.
Neither is particularly successful if it ignores the other.
Research that is too industrially focused may lack adequate theory!
Academically focused research may miss the practice!
Software Engineering Research Approaches
The Industry-as-Laboratory approach links theory and praxis
Engineering research aims to improve existing processes and/or products
Empirical studies are needed to validate Software Engineering research
Models for SE research need to shift from the analytic to the empirical.
Empirical SE Research
Real life Problems
Do parts connect?
Research Questions:
Our Research Question
Software development lifecycle:
  Requirements
  Design
  Development
  Test (takes ~50% of overall time)
Detect and correct defects before delivering software.
Test strategies:
  Expert judgment
  Manual code reviews
  Oracles/predictors as secondary tools
In Practice
Product quality
  Lower defect rates
  Less costly testing times
  Low maintenance cost
Process quality
  Effort and cost estimation
  Process improvement
Research Question
How much testing is enough? When to stop testing?
Problem
Decision making under uncertainty
Solution
CS claims it can be solved
Artificial Intelligence
SE Research
Intersection of AI and Software Engineering
An opportunity to use some of the most interesting computational techniques to solve some of the most important and rewarding questions
AI Fields, Methods and Techniques
What Can We Learn From Each Other?
Software Development Reference Model
Intersection of AI and SE Research
Empirical Software Engineering
Intersection of AI and SE Research
Build oracles to predict
  Defects
  Cost and effort
  Refactoring
Measure
  Static code attributes
  Complexity and call graph structure
Data collection
  Open repositories (NASA, PROMISE)
  Open source
  Softlab Data Repository (SDR)
Software Engineering Domain
Classical ML applications
  Data miner performance
  The more data, the better the performance
  Little or no meaning behind the numbers, no interesting stories to tell
Software Engineering Domain
Algorithm performance
Understanding data
  Change training data: over/under/micro sampling
  Noise analysis
  Increase information content of data
  Feature analysis/weighting
  Learn what you will predict later
  Cross-company vs. within-company data
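The "under sampling" item above can be illustrated with a small sketch. Defect data is usually dominated by defect-free modules, and under-sampling drops some of them so both classes are equally represented in the training set. The function name undersample and the 0/1 label encoding are assumptions made for this example, not part of the slides. Real under-sampling normally picks the majority instances at random; this sketch keeps the first ones it meets so that the result is deterministic.

```c
#include <stddef.h>

/* Hypothetical helper: writes the indices of a class-balanced subset to out[]
 * and returns how many indices were written.
 * labels[i] is 1 for a defective module and 0 for a defect-free one.
 * out must have room for n entries. */
size_t undersample(const int *labels, size_t n, size_t *out)
{
    size_t defective = 0, clean = 0;
    for (size_t i = 0; i < n; i++) {
        if (labels[i])
            defective++;
        else
            clean++;
    }

    /* Each class may contribute at most as many rows as the smaller class has. */
    size_t limit = defective < clean ? defective : clean;

    size_t kept_defective = 0, kept_clean = 0, k = 0;
    for (size_t i = 0; i < n; i++) {
        if (labels[i] && kept_defective < limit) {
            out[k++] = i;
            kept_defective++;
        } else if (!labels[i] && kept_clean < limit) {
            out[k++] = i;
            kept_clean++;
        }
    }
    return k;
}
```

For eight modules of which two are defective, the sketch keeps four rows: both defective modules and the first two defect-free ones.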
Domain Knowledge SE ML
Software Engineering Research
Predictive Models
  Defect prediction and cost estimation
  Bioinformatics
Process Models
  Quality Standards
  Measurement
Major Research Areas Software Measurement
Defect Prediction/ Estimation
Effort & Cost Estimation
Process Improvement (CMM)
A Testing Workbench
[Figure: components of a testing workbench: source code, specification, test manager, test data generator, test data, oracle, program being tested, dynamic analyser, simulator, test results, test predictions, file comparator, execution report, report generator, test results report]
Static Code Attributes

    void main()
    {
        //This is a sample code
        //Declare variables
        int a, b, c;

        // Initialize variables
        a=2;
        b=5;

        //Find the sum and display c if greater than zero
        c=sum(a,b);
        if c < 0
            printf("%d\n", a);
        return;
    }

    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a+b;
    }

Module   LOC  LOCC  V  CC  Error
main()   16   4     5  2   2
sum()    5    1     3  1   0

LOC: lines of code
LOCC: lines of commented code
V: number of unique operands and operators
CC: cyclomatic complexity

Corrections marked on the sample for the two errors in main(): the condition should be c > 0, and the value printed should be c.
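The LOC and LOCC columns above can be approximated with a simple line scanner. What follows is a minimal sketch, not the way Prest or any other tool actually collects metrics. The names LineMetrics and count_lines are hypothetical. It assumes that LOC counts every line, blank lines included, and that only lines whose first non-blank characters are // count as comment lines.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef struct {
    int loc;           /* total number of lines, blank lines included */
    int comment_lines; /* lines whose first non-blank characters are "//" */
} LineMetrics;

/* Scans a NUL-terminated source text line by line. */
LineMetrics count_lines(const char *src)
{
    LineMetrics m = {0, 0};
    const char *p = src;

    while (*p != '\0') {
        /* Length of the current line, excluding the newline. */
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);

        /* Skip leading spaces and tabs. */
        size_t i = 0;
        while (i < len && (p[i] == ' ' || p[i] == '\t'))
            i++;

        if (i + 1 < len && p[i] == '/' && p[i + 1] == '/')
            m.comment_lines++;
        m.loc++;

        /* Move past the line and its newline, if any. */
        p += len;
        if (*p == '\n')
            p++;
    }
    return m;
}
```

Applied to a five-line layout of sum() with a single // comment line, the sketch returns loc = 5 and comment_lines = 1, in line with the sum() row of the table. Block comments and trailing comments are deliberately not handled.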
Prest
A tool developed by Softlab
Parser: C, Java, C++, JSP
Metric collection
Data analysis
Data Sources
Public datasets
  NASA (IV&V Facility, Metrics Program)
  PROMISE (Software Engineering Repository), which now includes Softlab data
  Open source projects (SourceForge, Linux, etc.)
  Internet-based small datasets
    University of Southern California (USC) dataset
    Desharnais dataset
    ISBSG dataset
    NASA COCOMO and NASA 93 datasets
Softlab Data Repository (SDR)
  Local industry collaboration
  Total of 20 companies, 25 projects over 5 years
Tangible Benefit
Project  Actual defective files  pd    pf   Modules inspected  Savings in test effort
XXX      5                       63%   9%   31                 84%
YYY      1                       100%  10%  27                 87%
ZZZ      1                       100%  23%  103                77%
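The slides do not define pd and pf. Assuming they follow the usual defect-prediction usage, pd is the probability of detection (the share of defective modules that the predictor flags) and pf is the probability of false alarm (the share of defect-free modules that it flags). Under that assumption, both can be computed from confusion-matrix counts as sketched below. The struct and function names are hypothetical.

```c
/* Confusion-matrix counts for a defect predictor (hypothetical struct). */
typedef struct {
    int tp; /* defective modules flagged as defective */
    int fn; /* defective modules the predictor missed */
    int fp; /* defect-free modules flagged as defective */
    int tn; /* defect-free modules correctly left unflagged */
} Confusion;

/* pd = tp / (tp + fn); returns 0.0 when there are no defective modules. */
double detection_rate(Confusion c)
{
    int defective = c.tp + c.fn;
    return defective > 0 ? (double)c.tp / defective : 0.0;
}

/* pf = fp / (fp + tn); returns 0.0 when there are no defect-free modules. */
double false_alarm_rate(Confusion c)
{
    int clean = c.fp + c.tn;
    return clean > 0 ? (double)c.fp / clean : 0.0;
}
```

As an illustration with made-up counts (not taken from the table), a predictor that catches 3 of 4 defective modules and flags 2 of 20 defect-free modules has pd = 0.75 and pf = 0.10.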
Requirements Analysis
Coding
Design
Test
Maintenance ∞
Defect prediction
Call Graph / Refactoring
Refactoring
Matching reqs with defects
Test driven development
Emerging Research Topics
Adding organizational factors to the local prediction model
  Information about the development team, experience, coding practices, etc.
Adding file metrics from version history
  Modified/added/deleted lines of code
  Selecting only modified files from each version in the prediction model
Confidence factor
Using time factors
Dynamic prediction: constructing a model
  for each application in a version
  for each module/package in an application
  for each developer, by learning from his/her coding habits
TDD
  Measuring test coverage
  Defect proneness
  Company-wide implementation process
  Embedded systems
Cost/effort estimation
  Dynamic estimation per process