digging for diamonds: identifying valuable web automation programs in repositories jarrod jackson 1,...

Digging for diamonds: Identifying valuable web automation programs in repositories

Jarrod Jackson1, Chris Scaffidi2, Katie Stolee2

1 Oregon State University
2 University of Nebraska-Lincoln


Web scripts: Enabling users to enhance the browser

IBM CoScripter Web Macro



Yahoo Pipe




GreaseMonkey UserScript



Repositories of end-user code: The good, the great, and the “other”

C. Bogart, et al. End-User Programming in the Wild: A Field Study of CoScripter Scripts. VL/HCC 2008.

Previous study:

Of 1445 web macros…
• ~10% had many runs
• ~10% had many users
• ~80% were “other”

This is the largest web macro repository: > 6000 users, > 3000 “public” scripts



What if our repositories could…

• … omit pieces of code from search results if they are unlikely to be reused, anyway?

• … provide a UI for administrators to review (and remove?) old code that’s unlikely to be used?

• … advise programmers, when they upload code, about how to improve the reusability of their code?



Needed: a model for predicting reuse

• Key questions for discovering such a model…
  – What information about the code indicates reusability?
  – How do we combine this information to predict reuse?

• Similar models have been successful on object-oriented (OO) code
  – Predicting reuse based on coupling & cohesion
  – Predicting bugginess based on code complexity metrics, information about code authors, code churn, …

Web scripts are much simpler (they don’t call each other, don’t have inheritance, etc.), so we need different information here.



Prior work found 35 traits (in 8 categories) statistically related to reuse

• Mass appeal – e.g., popular keywords
• Language – e.g., data values are in English
• Annotations – e.g., comments
• Flexibility – e.g., parameterization (variables)
• Length – e.g., small number of distinct lines of code
• Author information – e.g., early adopter?
• Advanced syntax – e.g., “control-click” keyword
• No preconditions – e.g., no cookies needed

All traits are computed automatically from one of four sources: executable code statements, URLs referenced, annotations, code history.
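The slides name the four trait sources but not the exact trait definitions. The following is a minimal sketch, assuming a hypothetical script record with one field per source and simplified trait definitions (for instance, any double-quoted string counts as a literal). `Script`, `compute_traits`, and the `distinct_hosts` trait are illustrative names only, not the paper's.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical script record; field names are illustrative assumptions,
# not the schema of CoScripter or any other repository.
@dataclass
class Script:
    code_lines: list[str]       # executable statements (source: code)
    urls: list[str]             # URLs the script references (source: URL)
    comment_lines: list[str]    # author annotations (source: annotations)
    author_prev_created: int    # author's earlier scripts (source: history)

# Assumption: a "literal" is any double-quoted string in a statement.
QUOTED = re.compile(r'"[^"]*"')

def compute_traits(s: Script) -> dict[str, tuple[str, int]]:
    """Return {trait: (source, value)} for a few illustrative traits."""
    return {
        "lines_of_code": ("code", len(s.code_lines)),
        "distinct_lines": ("code", len(set(s.code_lines))),
        "literals": ("code", sum(len(QUOTED.findall(line)) for line in s.code_lines)),
        "distinct_hosts": ("url", len({urlparse(u).netloc for u in s.urls})),
        "comments": ("annotation", len(s.comment_lines)),
        "prev_created": ("history", s.author_prev_created),
    }

example = Script(
    code_lines=[
        'go to "http://example.com"',
        'click the "Search" button',
        'click the "Search" button',
    ],
    urls=["http://example.com/search"],
    comment_lines=["Runs a search on example.com"],
    author_prev_created=12,
)
traits = compute_traits(example)
```

Tagging each trait with its source makes it easy to later keep only, say, the code-based traits.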



Model that we developed (in words & pictures)

• Given a binary measure of reuse, for each trait:
  – Find the threshold that optimally divides the reused scripts from the un-reused scripts

[Figure: trait level, with the chosen threshold marked]
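The slides do not say which criterion makes a threshold "optimal". A minimal sketch, assuming the criterion is classification accuracy and that a threshold may point either way (≥ or ≤, as in the later example's `literals ≤ 4`); `best_threshold` is a hypothetical name.

```python
def best_threshold(values, reused):
    """Return (direction, threshold, accuracy) for one trait.

    values: the trait's value for each script
    reused: matching booleans from the binary reuse measure
    Assumption: "optimal" means highest accuracy; ties keep the first found.
    """
    n = len(values)
    best = None
    for t in sorted(set(values)):
        for direction in (">=", "<="):
            preds = [v >= t if direction == ">=" else v <= t for v in values]
            acc = sum(p == r for p, r in zip(preds, reused)) / n
            if best is None or acc > best[2]:
                best = (direction, t, acc)
    return best
```

Only observed values are tried as candidates, which is enough because accuracy can change only when the threshold crosses a value.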



Predicting if a macro will be reused

• Count how many predictors are satisfied

• Predict that the macro will be reused if this count reaches some minimum
  – A tunable parameter
  – A higher minimum sets a higher bar that a script must clear to be predicted as reused
    • Fewer false positives, more false negatives
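A minimal sketch of the counting rule and of how the tunable minimum trades false positives for false negatives. Predictors are assumed to be `(trait, direction, threshold)` triples, and all function names are hypothetical.

```python
def satisfies(value, direction, threshold):
    return value >= threshold if direction == ">=" else value <= threshold

def count_satisfied(traits, predictors):
    """How many (trait, direction, threshold) predictors a script meets."""
    return sum(satisfies(traits[name], d, t) for name, d, t in predictors)

def predict_reused(traits, predictors, minimum):
    """Predict reuse when at least `minimum` predictors are satisfied."""
    return count_satisfied(traits, predictors) >= minimum

def rates_by_minimum(scripts, reused, predictors):
    """For every possible minimum, return (minimum, TP rate, FP rate).

    Raising the minimum can only lower both rates:
    fewer false positives, more false negatives.
    """
    pos = sum(reused)
    neg = len(reused) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both reused and un-reused scripts")
    counts = [count_satisfied(s, predictors) for s in scripts]
    rows = []
    for m in range(len(predictors) + 1):
        tp = sum(1 for c, r in zip(counts, reused) if c >= m and r)
        fp = sum(1 for c, r in zip(counts, reused) if c >= m and not r)
        rows.append((m, tp / pos, fp / neg))
    return rows
```

Sweeping the minimum in this way yields one (TP rate, FP rate) point per setting.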



Example

• E.g., suppose that our measure of reuse is “the script is reused more than 75% of other scripts” (i.e., roughly the top 25% by reuse)

• Suppose that, for this measure of reuse, the best thresholds for four predictors are…
  – comments ≥ 3
  – lines_of_code ≥ 40
  – prev_created ≥ 10
  – literals ≤ 4

• The model would then predict that any other script satisfies the reuse criterion if that script satisfies at least n of these predictors
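The example can be worked through directly. The four thresholds come from the slide, while the candidate script's trait values are made up for illustration and `satisfied` is a hypothetical helper.

```python
# Thresholds from the example (measure: "reused more than 75% of other
# scripts"). The script's trait values below are invented.
PREDICTORS = [
    ("comments", ">=", 3),
    ("lines_of_code", ">=", 40),
    ("prev_created", ">=", 10),
    ("literals", "<=", 4),
]

def satisfied(traits):
    """Names of the predictors this script satisfies."""
    return [
        name for name, op, t in PREDICTORS
        if (traits[name] >= t if op == ">=" else traits[name] <= t)
    ]

script = {"comments": 5, "lines_of_code": 12, "prev_created": 15, "literals": 2}
hits = satisfied(script)  # every predictor except lines_of_code
predictions = {n: len(hits) >= n for n in range(1, 5)}
```

With n = 3 this script is predicted to be reused. With n = 4 it is not, because its 12 lines of code fall short of the lines_of_code threshold.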



How well does this approach work…

• … for different kinds of web scripts?

• … for different reuse measures?

• … when predicting future reuse based on past data?

• … when only a subset of traits are available?



Scripts and measures for our evaluation

[Table: scripts and reuse measures used in the evaluation, with the cutoff for each measure]
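One way to turn a raw reuse count (e.g., runs or users) into the binary measure used above is a percentile cutoff. This is a minimal sketch, assuming "reused more than 75% of other scripts" means that strictly more than 75% of the other scripts have a lower count. `binary_measure` and its `cutoff` parameter are hypothetical.

```python
from bisect import bisect_left

def binary_measure(reuse_counts, cutoff=0.75):
    """Label a script reused if its count beats more than `cutoff` of the others.

    Assumption: ties do not count as "beating" another script.
    """
    n = len(reuse_counts)
    if n < 2:
        raise ValueError("need at least two scripts")
    ordered = sorted(reuse_counts)
    # bisect_left counts the scripts with a strictly lower count; the
    # script itself is never counted because c < c is false.
    return [bisect_left(ordered, c) / (n - 1) > cutoff for c in reuse_counts]
```

Sorting once and using binary search keeps this O(n log n) rather than comparing every pair of scripts.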


Accuracy varied little by measure or script type (e.g., true-positive rate ≥ 0.7 at a false-positive rate of 0.4)



Yahoo Pipe accuracy slipped a bit when using past data to predict future reuse
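The slides do not detail the past-to-future protocol. The following is a minimal sketch, assuming scripts are split by creation time: thresholds are fit on the older scripts and scored on the newer ones. For brevity it fits a single "≥" threshold by accuracy. The toy data, field names, and functions are all hypothetical.

```python
def temporal_split(scripts, split_time):
    """Scripts created before split_time train; later ones test."""
    past = [s for s in scripts if s["created"] < split_time]
    future = [s for s in scripts if s["created"] >= split_time]
    return past, future

def accuracy(scripts, trait, threshold):
    """Fraction of scripts where 'trait >= threshold' matches the reuse label."""
    hits = sum((s[trait] >= threshold) == s["reused"] for s in scripts)
    return hits / len(scripts)

def fit_threshold(scripts, trait):
    """Highest-accuracy '>=' threshold on the training scripts (assumed criterion)."""
    candidates = sorted({s[trait] for s in scripts})
    return max(candidates, key=lambda t: accuracy(scripts, trait, t))

# Toy data: "created" is a timestamp; "reused" comes from the reuse measure.
SCRIPTS = [
    {"created": 1, "comments": 0, "reused": False},
    {"created": 2, "comments": 4, "reused": True},
    {"created": 3, "comments": 1, "reused": False},
    {"created": 4, "comments": 5, "reused": True},
    {"created": 5, "comments": 4, "reused": False},
    {"created": 6, "comments": 6, "reused": True},
]
past, future = temporal_split(SCRIPTS, split_time=4)
t = fit_threshold(past, "comments")
```

In this toy data the threshold separates the past scripts perfectly but misclassifies one future script. That is the kind of slip a past-to-future evaluation can reveal and a random split might hide.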



Code-based traits gave nearly the full accuracy

[Figure: accuracy by trait source (History, URL, Annotations, Code)]
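Restricting the model to one trait source just drops the predictors from the other sources before counting. This is a minimal sketch reusing the example's thresholds. It assumes that `comments` comes from annotations, `prev_created` from history, and the other two traits from code; those source assignments and the helper names are illustrative.

```python
# (trait, source, direction, threshold); source tags are assumptions.
PREDICTORS = [
    ("comments", "annotation", ">=", 3),
    ("lines_of_code", "code", ">=", 40),
    ("prev_created", "history", ">=", 10),
    ("literals", "code", "<=", 4),
]

def restrict(predictors, sources):
    """Keep only predictors whose trait comes from one of `sources`."""
    return [p for p in predictors if p[1] in sources]

def count_satisfied(traits, predictors):
    return sum(
        (traits[name] >= t) if op == ">=" else (traits[name] <= t)
        for name, _source, op, t in predictors
    )

script = {"comments": 5, "lines_of_code": 45, "prev_created": 2, "literals": 3}
code_only = restrict(PREDICTORS, {"code"})
all_count = count_satisfied(script, PREDICTORS)
code_count = count_satisfied(script, code_only)
```

Because fewer predictors remain, the tunable minimum n has to be re-chosen for the subset. It can be at most `len(code_only)`.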



Conclusions and future work

• Model is similarly accurate for a range of uses
  – And might only require code-based traits
  – But can we improve accuracy by using information available after reuse is attempted?
  – Can we also predict how happy people will be when reusing different pieces of code?

• And now to put the model to work…
  – Improving search engines
  – Providing a UI for administrators to review macros
  – Giving programmers advice automatically



Thank You

To ICISA for this opportunity to present this paper



So how do we separate the wheat from the chaff?

• Providing such features requires predicting whether code will ever be reused
  – Without relying on information that’s available only after code is reused (“chicken and egg”)
    • Ratings, reviews, etc.
    • (For some features, of course, we can always add this information in later.)
  – With a fairly simple model for making predictions
    • So that predictions can be explained to users
    • Especially when we’re advising users about how to improve the reusability of their code!

