What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes

Post on 13-Jan-2017

Andrea Di Sorbo, Sebastiano Panichella, Carol V. Alexandru, Junji Shimagaki, Corrado A. Visaggio, Gerardo Canfora, Harald Gall

UNIVERSITÀ DEGLI STUDI DEL SANNIO

OUTLINE

Context: Manual vs. Automated Analysis of User Reviews

Proposed Solution: Generating Summaries of User Reviews

Case Study: Assessment of the Summaries Involving 23 Developers

Conclusion and Future Work

Manual vs. Automated Analysis of User Reviews

Maintenance of Mobile Applications

“About one third of app reviews contain useful information for developers”

Pagano et al., RE 2013

Manual Analysis of Reviews

PAST WORK: Chen et al., ICSE 2014

Text Analysis to filter out non-informative reviews

Topic Analysis to recognize topics treated in the reviews classified as informative

PAST WORK: Panichella et al., ICSME 2015

FEATURE REQUEST, PROBLEM DISCOVERY, INFORMATION SEEKING, INFORMATION GIVING, OTHER

Sentiment Analysis + Natural Language Parsing + Text Analysis

The Problem

Feature Requests Bug Reports


Generating Summaries of User Reviews

SURF (Summarizer of User Review Feedback)

USER REVIEWS MODEL

“I love this app but it crashes my whole iPad and it has to restart itself”

• User intention: Problem Discovery
• Review topics: App, Model

“…The User Reviews Model proposed by the authors is impressive in how it analyzes a review sentence by sentence and is able to characterize a sentence with multiple labels…” (one of the FSE reviewers)
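The labeling shown in this example can be pictured as a small data structure: each review sentence carries one user intention and a set of topics. The sketch below is only an illustration; the names `Intention` and `LabeledSentence` are hypothetical and do not come from the SURF implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Intention(Enum):
    # The five intention categories named in the talk.
    FEATURE_REQUEST = "Feature Request"
    PROBLEM_DISCOVERY = "Problem Discovery"
    INFORMATION_SEEKING = "Information Seeking"
    INFORMATION_GIVING = "Information Giving"
    OTHER = "Other"


@dataclass
class LabeledSentence:
    """One review sentence with a single intention and any number of topics."""
    text: str
    intention: Intention
    topics: set[str] = field(default_factory=set)


# The example sentence from the slide, labeled as shown there.
example = LabeledSentence(
    text="I love this app but it crashes my whole iPad and it has to restart itself",
    intention=Intention.PROBLEM_DISCOVERY,
    topics={"App", "Model"},
)
print(example.intention.value, sorted(example.topics))
```

Keeping the topics in a set reflects the point made in the reviewer quote: a single sentence may be characterized by several labels at once.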

SUMMARIZER OF USER REVIEW FEEDBACK

1. Data Collection

2. Intention Classification (machine learning)

3. Topics Classification

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too.

GUI-related dictionary: screen, trajectory, button, white, background, interface, usability, tap, switch, icon, orientation, position, picture, show, list, category, cover, scroll, touch, website, swipe, sensitive, view, roll, side, sort, click, small, colorful, glitch, page, corner, bookmark…

P (SENTENCE, GUI) = 5/14 = 0.357
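The value 5/14 can be reproduced by counting how many of the 14 words in the sentence occur in the GUI-related dictionary (position, icons, screen, bookmarks, icon). The sketch below assumes whitespace tokenization and a crude plural-stripping step as a stand-in for whatever stemming the tool actually applies; `normalize` and `topic_probability` are hypothetical names.

```python
import string

# Dictionary terms as listed on the slide (the slide's list is truncated after "bookmark").
GUI_DICTIONARY = {
    "screen", "trajectory", "button", "white", "background", "interface",
    "usability", "tap", "switch", "icon", "orientation", "position", "picture",
    "show", "list", "category", "cover", "scroll", "touch", "website", "swipe",
    "sensitive", "view", "roll", "side", "sort", "click", "small", "colorful",
    "glitch", "page", "corner", "bookmark",
}


def normalize(token: str) -> str:
    """Lowercase, trim punctuation, and strip a trailing plural 's' (assumed stemming)."""
    word = token.lower().strip(string.punctuation)
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


def topic_probability(sentence: str, dictionary: set[str]) -> float:
    """Share of the sentence's words that appear in the topic dictionary."""
    words = [w for w in (normalize(t) for t in sentence.split()) if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in dictionary)
    return hits / len(words)


sentence = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too.")
print(round(topic_probability(sentence, GUI_DICTIONARY), 3))  # 0.357
```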

4. Sentence Scoring

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types.

Intention Class        Score
Problem Discovery      3.0
Feature Request        3.0
Information Seeking    1.0
Information Giving     1.5
Other                  0.5

IRS (SENTENCE) = 3.0

Obs2) Developers need reasonably useful sentences discussing a specific aspect of an app with respect to other review sentences.

P (SENTENCE, GUI) = 5/14 = 0.357

Obs3) Longer sentences are usually more informative than shorter ones.

L (SENTENCE) = 80

Obs4) Reviews treating frequently discussed features may attract more attention from developers than reviews dealing with features rarely used or discussed by users.

MFWR (SENTENCE, GUI) = 2/14 = 0.143
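Two of the scoring inputs for this sentence can be checked directly: the intention score is a lookup in the score table, and the length value of 80 matches the sentence's character count. The sketch below covers only these two; how SURF combines them with P and MFWR into a final score is not stated on the slides and is not modeled here. The function names are hypothetical, and the sentence is assumed to be labeled Problem Discovery (Feature Request would give the same score).

```python
# Scores per intention class, as given in the talk.
INTENTION_SCORES = {
    "Problem Discovery": 3.0,
    "Feature Request": 3.0,
    "Information Seeking": 1.0,
    "Information Giving": 1.5,
    "Other": 0.5,
}


def irs(intention: str) -> float:
    """Look up the score for a sentence's intention class."""
    return INTENTION_SCORES[intention]


def sentence_length(sentence: str) -> int:
    """Length of the sentence in characters, spaces included."""
    return len(sentence)


sentence = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too")

# Assumption: the sentence is classified as Problem Discovery.
print(irs("Problem Discovery"))    # 3.0
print(sentence_length(sentence))   # 80
```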

5. Summary Generation

Case Study

Involving 23 Developers

3439 Reviews of 17 Apps

Research Questions

RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?


Study Procedure

TWO Experiments

Experiment I: ITALY, SWITZERLAND, NETHERLANDS

1) Summaries for 15 Apps
2) Involving 16 Developers (6 were the original developers)
3) We assigned an app to each participant.

Experiment II: JAPAN

1) Summaries of 2 Apps
2) Involving 7 Employees from

Groups: Group 1 (3 subjects), Group 2 (4 subjects)
Parts: Experiment II-A, Experiment II-B
Tasks: Participants classified reviews according to URM; participants validated the summaries generated by SURF

RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?

Experiment I & Experiment II

78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE.

82% of participants declared that the most important topics modeled in URM are the App, GUI and Feature or Functionality categories.

“I found the classification GUI-BUG, APP-BUG, etc. very useful…”

“…in case I'm searching for BUGs, I can just look for the category, instead of reading everything over and over again…”

SUMMARY: Most participants consider URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?

The validation task performed by the survey participants highlights the very high classification accuracy of SURF, which is 91%.

SURF works reasonably well in summarizing user feedback regarding change requests concerning GUI, APP, and FEATURE improvements, with the only exception of the maintenance topic “COMPANY”.

How do app review summaries generated by SURF impact the time required by developers to analyze user reviews?

The time-saving capability of SURF perceived by all developers is at least 50%.

94% of participants believe that the time-saving capability of SURF is 75%.

SURF helps to save more than 50% of the time required by developers for analyzing user feedback and planning software changes.

66% of the feedback manually extracted by the participants also appears in the summaries automatically generated by SURF.

SUMMARY:
1) SURF helps to save more than half of the time required by developers for analyzing user feedback and planning software changes.
2) 66% of manually extracted feedback also appears in the automatically generated summaries.
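An overlap figure of this kind is a simple ratio: the number of manually extracted feedback items that also occur in the generated summary, divided by the number of manually extracted items. The sketch below shows that arithmetic on made-up placeholder items; the study's actual matching criterion is not given in the slides, so exact matching after whitespace and case normalization is an assumption, and `coverage` is a hypothetical name.

```python
def coverage(manual: list[str], summary: list[str]) -> float:
    """Fraction of manually extracted items that also appear in the summary."""
    def norm(text: str) -> str:
        # Assumed matching rule: case-insensitive, whitespace-normalized equality.
        return " ".join(text.lower().split())

    if not manual:
        return 0.0
    summary_items = {norm(s) for s in summary}
    found = sum(1 for item in manual if norm(item) in summary_items)
    return found / len(manual)


# Placeholder data for illustration only, not taken from the study.
manual_items = ["item A", "item B", "item C", "item D"]
summary_items = ["Item A", "item  C", "item D", "item E"]
print(coverage(manual_items, summary_items))  # 0.75
```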

Quality of SURF's Summaries

Conclusion

1) URM is a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.

2) SURF helps to save more than half of the time required for analyzing user feedback and planning software changes.

3) 66% of manually extracted feedback also appears in the automatically generated summaries.

4) Summaries generated by SURF are reasonably correct, adequate, concise, and expressive.

Thanks for the Attention!

Questions?

SURF (Summarizer of User Review Feedback)
