SoftLab

Boğaziçi University Department of Computer Engineering

Software Engineering Research Lab

http://softlab.boun.edu.tr/

Research Challenges

The trend toward large, heterogeneous, distributed software systems increases system complexity.

Software and service productivity lags behind requirements.

Increased complexity moves software developers further from stakeholders.

The importance of interoperability, standardisation and reuse of software is increasing.

Research Challenges

Service Engineering

Complex Software Systems

Open Source Software

Software Engineering Research

Software Engineering Research Approaches

Balancing theory and praxis

How engineering research differs from scientific research

The role of empirical studies

Models for SE research

The need to link research with practice

Why after 25 years of SE has SE research failed to influence industrial practice and the quality of resulting software?

Potts argues that this failure is caused by treating research and its application by industry as separate, sequential activities, which he calls the research-then-transfer approach.

The solution he proposes is the industry-as-laboratory approach.

Colin Potts, Software Engineering Research Revisited, IEEE Software, September 1993

Industry-as-Laboratory Approach

Stronger connection at the start, because knowledge of the problem is acquired from real practitioners in industry, often industrial partners in a research consortium.

The connection is strengthened by practitioners and researchers constantly interacting to develop the solution.

Early evaluation and usage by industry lessens the technology transfer gap.

Reliance on empirical research

Shift from solution-driven SE to problem-focused SE

Solve problems that really do matter to practitioners

Industry-as-Laboratory emphasizes Real Case Studies

Advantages of case studies over studying problems in the research lab:

Scale and complexity: small, simple (even simplistic) cases are avoided; these often bear little relation to real problems.

Unpredictability: assumptions are thrown out as researchers learn more about real problems.

Dynamism: a ‘real’ case study is more vital than a textbook account.

The real-world complications of industrial case studies are more likely to throw up representative problems and phenomena than research laboratory examples influenced by the researchers’ preconceptions.

Need to consider Human/Social Context in SE research

Not all solutions in software engineering are solely technical.

There is a need to examine organizational, social and cognitive factors systematically as well.

Many problems are “people problems”, and require “people-orientated” solutions.

Theoretical SE research

While there is still a place for innovative, purely speculative research in Software Engineering, research which studies real problems in partnership with industry needs to be given a higher profile.

These various forms of research ideally complement one another.

Neither is particularly successful if it ignores the other.

Research that is too industrially focused may lack adequate theory!

Academically focused research may miss the practice!

Software Engineering Research Approaches

The Industry-as-Laboratory approach links theory and praxis

Engineering research aims to improve existing processes and/or products

Empirical studies are needed to validate Software Engineering research

Models for SE research need to shift from the analytic to the empirical.

Empirical SE Research

Real life Problems

Do parts connect?

Research Questions:

Our Research Question

Software development lifecycle:

Requirements

Design

Development

Test (takes ~50% of overall time)

Detect and correct defects before delivering software.

Test strategies:

Expert judgment

Manual code reviews

Oracles/predictors as secondary tools

In Practice

Product quality: lower defect rates, less costly testing times, low maintenance cost

Process quality: effort and cost estimation, process improvement

Research Question

How much test is enough? When to stop testing?

Problem

Decision making under uncertainty

Solution

CS claims it can be solved

Artificial Intelligence

SE Research

Intersection of AI and Software Engineering

An opportunity to use some of the most interesting computational techniques to solve some of the most important and rewarding questions

AI Fields, Methods and Techniques

What Can We Learn From Each Other?

Software Development Reference Model

Intersection of AI and SE Research

Empirical Software Engineering

Intersection of AI and SE Research

Build oracles to predict: defects, cost and effort, refactoring

Measure: static code attributes, complexity and call graph structure

Data collection: open repositories (NASA, PROMISE), open source, Softlab Data Repository (SDR)
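To make the idea of an oracle built from static code attributes concrete, here is a minimal sketch. The slides do not name a learning algorithm, so Gaussian naive Bayes is used purely as an illustrative stand-in. The training rows of (LOC, cyclomatic complexity) pairs and their labels are invented for the example, and `fit_gaussian_nb` and `predict` are hypothetical helper names.

```python
import math

def fit_gaussian_nb(rows, labels):
    """Estimate a class prior and per-feature mean/variance for each class."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((x - mean) ** 2 for x in column) / len(column)
            stats.append((mean, var + 1e-9))  # small floor avoids division by zero
        model[cls] = (len(members) / len(rows), stats)
    return model

def predict(model, row):
    """Return the class with the highest log posterior for one module."""
    def log_posterior(cls):
        prior, stats = model[cls]
        score = math.log(prior)
        for x, (mean, var) in zip(row, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical training data: (LOC, cyclomatic complexity) -> defective?
train_rows = [(20, 3), (120, 14), (95, 11), (5, 1),
              (8, 1), (150, 20), (12, 2), (10, 1)]
train_labels = [False, True, True, False, False, True, False, False]

oracle = fit_gaussian_nb(train_rows, train_labels)
large_module_flagged = predict(oracle, (110, 12))
small_module_flagged = predict(oracle, (7, 1))
```

In this toy setting the large, complex module is flagged as defect-prone and the small one is not. A real oracle would be trained on repository data such as the sources listed above and evaluated on held-out projects.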

Software Engineering Domain

Classical ML applications

Data miner performance: the more data, the better the performance

Little or no meaning behind the numbers, no interesting stories to tell

Software Engineering Domain

Algorithm performance

Understanding data

Change training data: over/under/micro sampling

Noise analysis

Increase information content of data

Feature analysis/weighting

Learn what you will predict later

Cross-company vs within-company data
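Changing the training data by over- or under-sampling can be sketched as follows. This is a minimal illustration assuming plain random sampling to balance defective and clean modules. The slides do not specify the sampling procedure, and the rows, the `undersample` and `oversample` helpers, and the fixed seed are all hypothetical.

```python
import random

def split_by_size(rows, is_defective):
    """Return (minority, majority) groups of rows by class size."""
    defective = [r for r in rows if is_defective(r)]
    clean = [r for r in rows if not is_defective(r)]
    return sorted([defective, clean], key=len)

def undersample(rows, is_defective, seed=0):
    """Drop random majority rows until both classes have equal size."""
    minority, majority = split_by_size(rows, is_defective)
    if not minority:
        return list(rows)  # nothing to balance against
    rng = random.Random(seed)
    return minority + rng.sample(majority, len(minority))

def oversample(rows, is_defective, seed=0):
    """Duplicate random minority rows until both classes have equal size."""
    minority, majority = split_by_size(rows, is_defective)
    if not minority:
        return list(rows)  # nothing to duplicate
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

# Hypothetical modules: (name, LOC, defective?), with 4 defective out of 20.
modules = [("m%d" % i, 10 * i, i % 5 == 0) for i in range(20)]
flag = lambda row: row[2]

smaller = undersample(modules, flag)   # 4 defective + 4 clean
larger = oversample(modules, flag)     # 16 defective + 16 clean
```

Under-sampling discards information from the majority class, while over-sampling repeats minority rows. Which one helps depends on the dataset and the learner.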

Domain Knowledge SE ML

Software Engineering Research

Predictive models: defect prediction and cost estimation, bioinformatics

Process models: quality standards, measurement

Major Research Areas

Software Measurement

Defect Prediction/ Estimation

Effort & Cost Estimation

Process Improvement (CMM)

[Figure: A Testing Workbench. Components shown: test manager, test data generator, oracle, specification, source code, program being tested, dynamic analyser, simulator, file comparator, report generator. Artefacts shown: test data, test predictions, test results, execution report, test results report.]

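Static Code Attributes

    void main()
    {
        // This is a sample code

        // Declare variables
        int a, b, c;

        // Initialize variables
        a = 2;
        b = 5;

        // Find the sum and display c if greater than zero
        c = sum(a, b);
        if (c < 0)
            printf("%d\n", a);
        return;
    }

    int sum(int a, int b)
    {
        // Returns the sum of two numbers
        return a + b;
    }

    Module   LOC   LOCC   V   CC   Error
    main()   16    4      5   2    2
    sum()    5     1      3   1    0

LOC: lines of code

LOCC: lines of commented code

V: number of unique operands and operators

CC: cyclomatic complexity

Flow graph labels on the slide: c > 0, c

Note: the comment in main() calls for displaying c when it is greater than zero, but the code tests c < 0 and prints a. These two discrepancies appear to be the 2 errors recorded for main().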
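The attributes above can be approximated with a few lines of text processing. The sketch below is an assumption-laden illustration, not how Prest or any other tool computes them. It counts non-blank lines, lines carrying a `//` comment, and cyclomatic complexity as one plus the number of decision points found by a simple pattern. `static_attributes` is a hypothetical helper, and the pattern ignores corner cases such as `//` or `?` inside string literals.

```python
import re

# Decision points that each add one path through the code.
DECISION_PATTERN = re.compile(r"\b(?:if|for|while|case)\b|&&|\|\||\?")

def strip_comment(line):
    """Return the code part of a line, dropping any // comment."""
    return line.split("//", 1)[0]

def static_attributes(source):
    """Count non-blank lines, comment lines and approximate cyclomatic complexity."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    comment_lines = sum(1 for ln in lines if "//" in ln)
    decisions = sum(len(DECISION_PATTERN.findall(strip_comment(ln))) for ln in lines)
    return {"loc": len(lines), "locc": comment_lines, "cc": decisions + 1}

MAIN_SRC = r'''
void main() {
    // This is a sample code
    // Declare variables
    int a, b, c;
    // Initialize variables
    a = 2; b = 5;
    // Find the sum and display c if greater than zero
    c = sum(a, b);
    if (c < 0)
        printf("%d\n", a);
    return;
}
'''

SUM_SRC = r'''
int sum(int a, int b) {
    // Returns the sum of two numbers
    return a + b;
}
'''

main_metrics = static_attributes(MAIN_SRC)
sum_metrics = static_attributes(SUM_SRC)
```

With this convention the cyclomatic complexity comes out as 2 for main() and 1 for sum(), matching the CC column. Line counts depend on how the source is laid out and on the counting rules, so they need not match the LOC column.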

Prest

A tool developed by Softlab

Parser: C, Java, C++, JSP

Metric collection

Data analysis

Data Sources

Public datasets

NASA (IV&V Facility, Metrics Program)

PROMISE (Software Engineering Repository), which now includes Softlab data

Open source projects (SourceForge, Linux, etc.)

Internet-based small datasets

University of Southern California (USC) dataset

Desharnais dataset

ISBSG dataset

NASA COCOMO and NASA 93 datasets

Softlab Data Repository (SDR)

Local industry collaboration

Total of 20 companies, 25 projects over 5 years

Tangible Benefit

    Project   Actual defective files   pd     pf    Modules inspected   Savings in test effort
    XXX       5                        63%    9%    31                  84%
    YYY       1                        100%   10%   27                  87%
    ZZZ       1                        100%   23%   103                 77%
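The pd and pf columns are not defined on the slide. Assuming they carry their usual meaning in defect prediction (pd as probability of detection, pf as probability of false alarm), they can be computed from actual and predicted labels as in the sketch below. The label lists are invented for illustration and do not correspond to any project in the table, and `pd_pf` is a hypothetical helper name.

```python
def pd_pf(actual, predicted):
    """Return (pd, pf) for two equal-length sequences of truthy/falsy labels.

    pd = TP / (TP + FN): share of defective modules that were flagged.
    pf = FP / (FP + TN): share of clean modules that were flagged.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    pd = tp / (tp + fn) if tp + fn else 0.0
    pf = fp / (fp + tn) if fp + tn else 0.0
    return pd, pf

# Hypothetical example: 4 defective and 6 clean modules.
actual =    [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pd, pf = pd_pf(actual, predicted)  # pd = 3/4, pf = 1/6
```

A useful predictor pushes pd toward 100% while keeping pf low, since every false alarm sends testers to inspect a module that has no defect.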

[Figure: software lifecycle phases (requirements analysis, design, coding, test, maintenance ∞) annotated with research topics: defect prediction, call graph / refactoring, refactoring, matching requirements with defects, test-driven development.]

Emerging Research Topics

Adding organizational factors to the local prediction model: information about the development team, experience, coding practices, etc.

Adding file metrics from version history: modified/added/deleted lines of code

Selecting only modified files from each version in the prediction model

Confidence factor

Using time factors

Dynamic prediction: constructing a model for each application in a version, for each module/package in an application, and for each developer by learning from his/her coding habits

TDD: measuring test coverage, defect proneness, company-wide implementation process, embedded systems

Cost/effort estimation: dynamic estimation per process
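The version-history metrics mentioned above (modified, added and deleted lines of code) can be derived by comparing two versions of a file. The sketch below uses Python's standard `difflib` for the comparison. How a replaced block is split into modified, added and deleted lines is an assumption made for illustration, and `churn` is a hypothetical helper name.

```python
from difflib import SequenceMatcher

def churn(old_lines, new_lines):
    """Count added, deleted and modified lines between two file versions."""
    added = deleted = modified = 0
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        old_n, new_n = i2 - i1, j2 - j1
        if tag == "replace":
            # Pair up replaced lines as modifications; any surplus on
            # either side counts as a pure deletion or addition.
            changed = min(old_n, new_n)
            modified += changed
            deleted += old_n - changed
            added += new_n - changed
        elif tag == "delete":
            deleted += old_n
        elif tag == "insert":
            added += new_n
    return {"added": added, "deleted": deleted, "modified": modified}

# Hypothetical versions of a small file.
version_1 = ["int a;", "a = 2;", "return;"]
version_2 = ["int a;", "a = 3;", "return;", "// done"]

file_churn = churn(version_1, version_2)
```

Here one line was modified and one was added. Summing such counts per file across versions gives churn features that can sit alongside the static code attributes in a prediction model.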
