what would users change in my app? summarizing app reviews for recommending software changes

What Would Users Change in My App? Summarizing App Reviews for Recommending Software Changes

Andrea Di Sorbo, Sebastiano Panichella, Carol V. Alexandru, Junji Shimagaki, Corrado A. Visaggio, Gerardo Canfora, Harald Gall

UNIVERSITÀ DEGLI STUDI DEL SANNIO

OUTLINE

Context: Manual vs. Automated Analysis of User Reviews

Proposed Solution: Generating Summaries of User Reviews

Case Study: Assessment of the Summaries Involving 23 Developers

Conclusion and Future Work

Manual vs. Automated Analysis of User Reviews

Maintenance of Mobile Applications

“About one third of app reviews contain useful information for developers”

Pagano et al., RE 2013

Manual Analysis of Reviews

PAST WORK: Chen et al. – ICSE

Text Analysis to filter out non-informative reviews

Topic Analysis to recognize topics treated in the reviews classified as informative

PAST WORK: Panichella et al. – ICSME

FEATURE REQUEST, PROBLEM DISCOVERY, INFORMATION SEEKING, INFORMATION GIVING

Natural Language Parsing + Text Analysis + Sentiment Analysis

The Problem

Feature Requests Bug Reports

Generating Summaries of User Reviews

SURF (Summarizer of User Review Feedback)

USER REVIEWS MODEL

“I love this app but it crashes my whole iPad and it has to restart itself”

• User intention: Problem Discovery

• Review topics: App, Model

“…The User Reviews Model proposed by the authors is impressive in how it analyzes a review sentence by sentence and is able to characterize a sentence with multiple labels…” – one of the FSE reviewers
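The labelling shown on this slide (one user intention plus one or more topics per review sentence) can be sketched as a small data structure. This is only an illustrative sketch: the class name `LabeledSentence`, its fields, and the validation step are assumptions made for the example, not SURF's actual implementation. The intention names are the ones listed on the slides.

```python
from dataclasses import dataclass, field

# Intention classes as named on the slides.
INTENTIONS = {
    "Problem Discovery",
    "Feature Request",
    "Information Seeking",
    "Information Giving",
    "Other",
}


@dataclass
class LabeledSentence:
    """Hypothetical container for one review sentence and its labels."""

    text: str
    intention: str
    topics: set = field(default_factory=set)  # a sentence may carry several topics

    def __post_init__(self):
        # Reject intentions that are not part of the model.
        if self.intention not in INTENTIONS:
            raise ValueError(f"unknown intention: {self.intention!r}")


# The example sentence from the slide, labelled as shown there.
example = LabeledSentence(
    text="I love this app but it crashes my whole iPad and it has to restart itself",
    intention="Problem Discovery",
    topics={"App", "Model"},
)
```

Keeping the topics in a set reflects the point made in the reviewer quote: a single sentence can be characterized with multiple labels.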

SUMMARIZER OF USER REVIEW FEEDBACK

1. Data Collection

2. Intention Classification (machine learning)

3. Topics Classification

Can't change position of icons on main screen and can't close bookmarks icon

GUI-related dictionary: screen, trajectory, button, white, background, interface, usability, tap, switch, icon, orientation, position, picture, show, list, category, cover, scroll, touch, website, swipe, sensitive, view, roll, side, sort, click, small, colorful, glitch, page, corner, bookmark…

P (SENTENCE, GUI) = 5/14 = 0.357
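The value 5/14 can be reproduced with a few lines of code, assuming that P is the fraction of sentence words found in the topic dictionary and using the 14-word form of the example sentence (the one ending in "too", as it appears on the scoring slide). The function names and the crude plural stripping are assumptions made for this sketch, not necessarily what SURF does internally.

```python
# GUI-related dictionary as listed on the slide (the slide's list is truncated).
GUI_DICTIONARY = {
    "screen", "trajectory", "button", "white", "background", "interface",
    "usability", "tap", "switch", "icon", "orientation", "position",
    "picture", "show", "list", "category", "cover", "scroll", "touch",
    "website", "swipe", "sensitive", "view", "roll", "side", "sort",
    "click", "small", "colorful", "glitch", "page", "corner", "bookmark",
}

SENTENCE = ("Can't change position of icons on main screen "
            "and can't close bookmarks icon too")


def normalize(token):
    """Lowercase, drop surrounding punctuation, strip a plural 's' (crude stand-in for stemming)."""
    word = token.lower().strip(".,!?;:")
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


def topic_probability(sentence, dictionary):
    """Fraction of whitespace-separated words that belong to the topic dictionary."""
    tokens = sentence.split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if normalize(token) in dictionary)
    return hits / len(tokens)


print(round(topic_probability(SENTENCE, GUI_DICTIONARY), 3))  # 0.357
```

The five matching words are "position", "icons", "screen", "bookmarks", and "icon", out of 14 words in total.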

4. Sentence Scoring

Obs1) User feedback discussing bug reports and feature requests is more important for developers than all other review types.

Intention Class        Score
Problem Discovery      3.0
Feature Request        3.0
Information Seeking    1.0
Information Giving     1.5
Other                  0.5

IRS_SENTENCE = 3.0

SENTENCE = Can't change position of icons on main screen and can't close bookmarks icon too

Obs2) Developers need reasonably useful sentences that discuss a specific aspect of an app, compared with other review sentences.

P (SENTENCE, GUI) = 5/14 = 0.357

Obs3) Longer sentences are usually more informative than shorter ones.

L_SENTENCE = 80

Obs4) Reviews about frequently discussed features may attract more attention from developers than reviews about features rarely used or discussed by users.

MFWR (SENTENCE,GUI) = 2/14 = 0.143
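The four per-sentence values shown on these slides (IRS = 3.0, P = 0.357, L = 80, MFWR = 0.143) can be recomputed with the sketch below. Several things here are assumptions: L is taken to be the sentence length in characters, MFWR is taken to be the fraction of sentence words that belong to a set of frequently used topic words, and the `FREQUENT_GUI_WORDS` set is hypothetical (the slides do not say which two words matched). The slides also do not state how the four values are combined into a final score, so the sketch stops at the individual features.

```python
from dataclasses import dataclass

# Intention scores as given in the slide's table.
INTENTION_SCORES = {
    "Problem Discovery": 3.0,
    "Feature Request": 3.0,
    "Information Seeking": 1.0,
    "Information Giving": 1.5,
    "Other": 0.5,
}

# Excerpt of the GUI-related dictionary from the slides.
GUI_WORDS = {"screen", "button", "interface", "icon", "position", "bookmark"}

# HYPOTHETICAL set of most frequent GUI words; chosen only to illustrate 2/14.
FREQUENT_GUI_WORDS = {"screen", "position"}


@dataclass
class SentenceFeatures:
    irs: float    # intention relevance score (Obs1)
    p: float      # share of words in the topic dictionary (Obs2)
    length: int   # sentence length in characters (Obs3, assumed unit)
    mfwr: float   # share of words among frequent topic words (Obs4, assumed definition)


def normalize(token):
    """Lowercase, drop surrounding punctuation, strip a plural 's' (crude stand-in for stemming)."""
    word = token.lower().strip(".,!?;:")
    if len(word) > 3 and word.endswith("s"):
        word = word[:-1]
    return word


def ratio(tokens, vocabulary):
    """Fraction of tokens contained in the given vocabulary."""
    return sum(1 for t in tokens if t in vocabulary) / len(tokens) if tokens else 0.0


def sentence_features(sentence, intention, topic_words, frequent_words):
    tokens = [normalize(t) for t in sentence.split()]
    return SentenceFeatures(
        irs=INTENTION_SCORES.get(intention, INTENTION_SCORES["Other"]),
        p=ratio(tokens, topic_words),
        length=len(sentence),
        mfwr=ratio(tokens, frequent_words),
    )


features = sentence_features(
    "Can't change position of icons on main screen and can't close bookmarks icon too",
    "Problem Discovery",
    GUI_WORDS,
    FREQUENT_GUI_WORDS,
)
print(features.irs, round(features.p, 3), features.length, round(features.mfwr, 3))
```

Unknown intention labels fall back to the "Other" score, which is a design choice of this sketch rather than something stated on the slides.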

5. Summary Generation

Case Study

Involving 23 Developers

Case Study

3439 Reviews

Research Questions

RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?

Study Procedure

TWO Experiments

Experiment I Experiment II

ITALY SWITZERLAND NETHERLANDS

Experiment I

SWITZERLAND NETHERLANDS

1) Summaries for 15 apps

2) Involving 16 developers (6 were the original developers)

3) We assigned an app to each participant.

Experiment II

1) Summaries of 2 apps

2) Involving 7 employees from

Group 1 (3 subjects), Group 2 (4 subjects)

Experiment II-A

Participants classified reviews according to URM

Participants validated the summaries generated by SURF

Experiment II-B

Participants classified reviews according to URM

Participants validated the summaries generated by SURF

RQ1: Is URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers?

Experiment I & Experiment II

78.26% of participants declared that URM is not missing any relevant information and that the topics considered in URM are EXHAUSTIVE.

82% of participants declared that the most important topics modeled in URM are the App, GUI, and Feature or Functionality categories.

“I found the classification GUI-BUG, APP-BUG, etc. very useful...”

“...in case I'm searching for BUGs, I can just look for the category, instead of reading everything over and over again...”

SUMMARY: Most participants consider URM a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.

RQ2: To what extent does a summarization technique developed on top of URM help mobile developers better understand the users' needs?

The validation task performed by the survey participants highlights the very high classification accuracy of SURF, which is 91%.

SURF works reasonably well in summarizing user feedback regarding change requests concerning GUI, APP, and FEATURE improvements, with the only exception of the maintenance topic “COMPANY”.

How do app review summaries generated by SURF impact the time required by developers to analyze user reviews?

The time-saving capability of SURF, as perceived by all developers, is at least 50%.

94% of participants believe that the time-saving capability of SURF is 75%.

SURF helps to save more than 50% of the time required by developers for analyzing user feedback and planning software changes.

66% of the feedback manually extracted by the participants also appears in the summaries automatically generated by SURF.

SUMMARY:

1) SURF helps to save more than half of the time required by developers for analyzing user feedback and planning software changes.

2) 66% of manually extracted feedback also appears in the automatically generated summaries.

Quality of SURF's Summaries

Conclusion

1) URM is a robust and suitable model for representing user needs in meaningful maintenance tasks for developers.

2) SURF helps to save more than half of the time required for analyzing user feedback and planning software changes.

3) 66% of manually extracted feedback also appears in the automatically generated summaries.

4) Summaries generated by SURF are reasonably correct, adequate, concise, and expressive.

Thanks for the Attention!

Questions?

SURF (Summarizer of User Review Feedback)
