Spotting Working Code Examples (ICSE 2014)

Post on 02-Jul-2015


SPOTTING WORKING CODE EXAMPLES

Iman Keivanloo, Juergen Rilling, Ying Zou

1

Code Completion

_File_>

public static void test() {
    FileInputStream fStream = new FileInputStrea…
    try {
        String everything = IOUtils.toString(fStream);
    } finally {
        fStream.close();
    }

2

Code Recommendation

_FileInputStream_>

3

• Limited query

• Usage pattern

4

Spotting Working Code Examples

_Read file line by line FileInputStream_> Real-time search

100 ms < response time < 400 ms

Challenges in Spotting Working Code Example

Correct + Complete + Concise

FileInputStream fis = null;
File file = new File("foo.txt");
fis = new FileInputStream(file);
int content;
while ((content = fis.read()) != -1) {
    System.out.print((char) content);
}

Send SMS …

5

Challenges in Spotting Working Code Example

Query: {read, file}

549,750

6

7

8

Why NOT Vector Space Model?

• e.g.,

test(readFile("f1.txt"));
test(readFile("f2.txt"));
test(readFile("f3.txt"));

VSM: Bag-of-words + Cosine similarity

VSM does not search for patterns
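To make the point about the vector space model concrete, the following is a minimal sketch of bag-of-words term counting plus cosine similarity. The class name `VsmSketch`, the camelCase-splitting tokenizer, and the two sample snippets are assumptions made for illustration; the slides do not describe the actual tokenizer or weighting. Under these assumptions, the three repeated `test(readFile(...))` calls score higher for the query {read, file} than a snippet that actually reads a file, because term counts say nothing about code structure.

```java
import java.util.HashMap;
import java.util.Map;

class VsmSketch {
    // Illustrative snippets (assumed): repeated calls vs. a real read loop.
    static final String CALLS =
            "test(readFile(\"f1.txt\"));\n"
          + "test(readFile(\"f2.txt\"));\n"
          + "test(readFile(\"f3.txt\"));";
    static final String WORKING =
            "FileInputStream fis = new FileInputStream(file);\n"
          + "int content;\n"
          + "while ((content = fis.read()) != -1) {\n"
          + "    System.out.print((char) content);\n"
          + "}";

    // Bag-of-words: split camelCase, lower-case, and count alphabetic terms.
    static Map<String, Integer> termFrequencies(String code) {
        Map<String, Integer> tf = new HashMap<>();
        String spaced = code.replaceAll("([a-z])([A-Z])", "$1 $2");
        for (String term : spaced.toLowerCase().split("[^a-z]+")) {
            if (!term.isEmpty()) {
                tf.merge(term, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity between two term-frequency vectors.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
            normA += e.getValue() * e.getValue();
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> query = termFrequencies("read file");
        System.out.println("repeated calls: " + cosine(query, termFrequencies(CALLS)));
        System.out.println("read loop:      " + cosine(query, termFrequencies(WORKING)));
    }
}
```

The ranking depends entirely on how often the query terms occur relative to the other terms, which is one way to read the slide's remark that VSM does not search for patterns.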

9

Search Space

Search Algorithm

Similarity

Search Space

Content Similarity*

int temp = 1;   →  {int, temp}
int temp = 0;   →  {int, temp}
float var = 3;  →  {float, var}

*Bag-of-words model

10

Our Approach

Search Space

Content Similarity*

int temp = 1;   →  {int, temp}
int temp = 0;   →  {int, temp}
float var = 3;  →  {float, var}

Pattern Similarity**

int temp = 1;   →  𝜌 𝜌 = 𝜌 ;
int temp = 0;   →  𝜌 𝜌 = 𝜌 ;
float var = 3;  →  𝜌 𝜌 = 𝜌 ;

𝜌 𝜌 = 𝜌 ; + {int, temp, float, var}

*Bag-of-words model
**p-strings [Baker, B. S. 1993]
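The two views of a code line on this slide can be sketched as follows. This is a simplified illustration, not Baker's p-string construction and not the authors' implementation: the class `PStringSketch`, its regex tokenizer, and the rule "replace every word or number with a placeholder, keep punctuation" are assumptions chosen so that the slide's three example lines map to the same pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PStringSketch {
    // Placeholder symbol (Greek rho), written as an escape to avoid encoding issues.
    static final String RHO = "\u03C1";

    // A word (identifier, keyword, type name), a number, or one other non-space character.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_]\\w*|\\d+(?:\\.\\d+)?|\\S");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    // Pattern view: words and numbers become the placeholder, punctuation is kept.
    static String abstractLine(String line) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(line)) {
            char first = t.charAt(0);
            boolean wordOrNumber = Character.isLetterOrDigit(first) || first == '_';
            out.add(wordOrNumber ? RHO : t);
        }
        return String.join(" ", out);
    }

    // Content view: the set of words on the line, with ordering discarded.
    static Set<String> bagOfWords(String line) {
        Set<String> words = new TreeSet<>();
        for (String t : tokenize(line)) {
            char first = t.charAt(0);
            if (Character.isLetter(first) || first == '_') {
                words.add(t);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"int temp = 1;", "int temp = 0;", "float var = 3;"}) {
            System.out.println(abstractLine(line) + "  +  " + bagOfWords(line));
        }
    }
}
```

All three lines share one abstract pattern, while the union of their word sets is {float, int, temp, var}, which corresponds to the combined representation shown on the slide.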

11

Offline Code Snippet Processing

12

Discarding Unnecessary Details …

13

Representation without Ordering Data

{int, temp, float, var}

14

Mining Abstract Solutions

15

abstract programming solution (clone)

16

Search Space

Search Algorithm

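The Proposed Greedy Algorithm

[Figure: the query {read, file} is matched against the top-k lines l_{q,1}, l_{q,2}, …, l_{q,n} (an imaginary snippet); these map to the top-k abstract clones p_{c,1}, p_{c,2}, …, p_{c,n} and their snippets c_{p,1}, c_{p,2}, …, c_{p,n}; the top snippet of the 1st abstract clone is selected.]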

17

Spotting Working Code Examples

1. Free-form querying

2. Self-contained code examples

query = {JFreeChart, JPEG}

18

Spotting Working Code Examples

3. Less dependency on term matching

4. No limitation on query’s terms

query = {bubblesort}

19

Case Study

1. Feasibility (e.g., no data/control flow data!)

2. Scalability

3. Performance:

• RQ1: Ranking schema?

• RQ2: Our approach vs. code search engines?

20

Corpus

~12 million Java classes
~25,000
~3 million unique Java classes
~300 million LOC
-----------------
5.5 million fragments
~15.5 million abstract clones

21

RQ1 – What is the best ranking schema for spotting working code examples?

• Features for ranking:

1. Similarity (S)

2. Popularity (P)

3. Size (A)

4. Combination of P and S

5. Combination of A and S

feature X → Top-K → Re-ranking
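One plausible reading of the combined schemas is a two-stage ranking: select the top-K candidates by one feature, then re-rank those K by a second feature. The sketch below shows that reading for Popularity + Similarity. The class `RerankSketch`, the `Candidate` fields, and the choice to filter by similarity first and re-rank by popularity second are assumptions; the slides do not state the order or the exact scoring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RerankSketch {
    // Hypothetical candidate: an id plus two of the ranking features.
    static final class Candidate {
        final String id;
        final double similarity; // similarity to the query (S)
        final int popularity;    // e.g., how often the pattern occurs (P)

        Candidate(String id, double similarity, int popularity) {
            this.id = id;
            this.similarity = similarity;
            this.popularity = popularity;
        }
    }

    // Stage 1: keep the k most similar candidates.
    // Stage 2: re-rank those k by popularity, most popular first.
    static List<Candidate> topKThenRerank(List<Candidate> all, int k) {
        List<Candidate> bySimilarity = new ArrayList<>(all);
        bySimilarity.sort(
                Comparator.comparingDouble((Candidate c) -> c.similarity).reversed());
        int limit = Math.min(k, bySimilarity.size());
        List<Candidate> top = new ArrayList<>(bySimilarity.subList(0, limit));
        top.sort(Comparator.comparingInt((Candidate c) -> c.popularity).reversed());
        return top;
    }
}
```

With this design a very popular but barely relevant candidate is dropped in the first stage, so popularity only decides the order among candidates that already match the query well.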

22

RQ1 – What is the best ranking schema?

• Recall is misleading

• The first answer matters

• WTA (Winner Takes All)

23

RQ1 – What is the best ranking schema?

Is the top-ranked answer correct?

[Chart: Coverage and Precision (axis ticks 60 to 90) for the ranking schemas S, P, P+S, A, and A+S, where S = Similarity, P = Popularity, A = Size.]

24

RQ1 – What is the best ranking schema?

Is the top-ranked answer a good code example?

[Charts: Completeness and Conciseness (axis ticks up to 100) for the ranking schemas S, P, A, P+S, and A+S, where S = Similarity, P = Popularity, A = Size.]

Popularity + Similarity leads to the best ranking schema for spotting working code examples

RQ2 – Can our approach outperform Internet-scale code search engines?

Our approach

~25,000

26

27

RQ2 – Our approach vs. Ohloh Code?

[Charts: Best Hit's Rank (axis ticks 40, 20, 2, 1) and NDCG (axis ticks 0.7, 0.5), each comparing our approach with Ohloh Code.]

The proposed real-time search is feasible + outperforms Ohloh Code
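For readers unfamiliar with the two measures named on this slide, the sketch below computes a best hit's rank and NDCG for one ranked result list. It uses the common log2-discount form of DCG; the slides do not say which NDCG variant or relevance scale the study used, so the class `NdcgSketch` and the integer relevance grades are assumptions.

```java
import java.util.Arrays;

class NdcgSketch {
    // Discounted cumulative gain: relevance at rank i (1-based) divided by log2(i + 1).
    static double dcg(int[] relevance) {
        double sum = 0;
        for (int i = 0; i < relevance.length; i++) {
            double discount = Math.log(i + 2) / Math.log(2);
            sum += relevance[i] / discount;
        }
        return sum;
    }

    // NDCG: DCG of the actual ranking divided by DCG of the ideal (sorted) ranking.
    static double ndcg(int[] relevance) {
        int[] ideal = relevance.clone();
        Arrays.sort(ideal);
        // Reverse in place so the most relevant results come first.
        for (int i = 0, j = ideal.length - 1; i < j; i++, j--) {
            int tmp = ideal[i];
            ideal[i] = ideal[j];
            ideal[j] = tmp;
        }
        double idealDcg = dcg(ideal);
        return idealDcg == 0 ? 0 : dcg(relevance) / idealDcg;
    }

    // 1-based rank of the first relevant result, or -1 if there is none.
    static int bestHitRank(int[] relevance) {
        for (int i = 0; i < relevance.length; i++) {
            if (relevance[i] > 0) {
                return i + 1;
            }
        }
        return -1;
    }
}
```

A lower best hit's rank and a higher NDCG both indicate that relevant examples appear nearer the top of the result list.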

28

Summary
