Post on 22-Feb-2016

Page 1: Sharing is fun! Thoughts on open access research

Sharing is fun! Thoughts on open access research

Mike Godfrey, University of Waterloo

Page 2: Sharing is fun! Thoughts on open access research

Why share your research artifacts?

• So the community can validate your results

• So the community can build on your results

• It improves your visibility in the research community

Page 3: Sharing is fun! Thoughts on open access research

Two experiences

• GXL
– Ric Holt, Andy Schürr, Susan Sim, Andreas Winter, many others
– c. 1999

• What's in a name?
– Abram Hindle, Neil Ernst
– c. 2011

Page 4: Sharing is fun! Thoughts on open access research

Project 1: GXL

• Background: in the late 1990s, lots of research environments for source code reverse engineering were emerging, e.g., Rigi/Shrimp, SwagKit/PBS, Bauhaus, MOOSE, GuPro, Datrix, Dali, CIA/Acacia …

Page 5: Sharing is fun! Thoughts on open access research

CSER / CASCON 1999, November 7, 1999

Architectural Reconstruction

[Figure: pipeline diagram. Labels: System Artifacts (Source Code, Executing System, Source Control); Extractors (Scanning, Parsing, Profiling, Change Reporting); Extracted Facts; Repository; View Generation; Visualization; Manipulation; Architecture]

Page 6: Sharing is fun! Thoughts on open access research

TAXForm Utopia

[Figure: tool interchange diagram. Labels: System Artifacts; PBS Extractor (cfx); Rigi Extractor (rigiparse); Dali Extractor (SNiFF+); cfx to TAXForm Converter; Rigi to TAXForm Converter; Dali to TAXForm Converter; TAXForm Repository; Bunch / TAXForm Converter; TAXForm to Rigi Converter; PBS Viewer and Abstraction Tools; Bunch Clustering Tool; Rigi SHriMP Viewer]

Page 7: Sharing is fun! Thoughts on open access research

Transforming Between Schemas

[Figure: schema hierarchy. Labels: Universal; High-Level; Procedural (PL/I, C); Object-Oriented (C++, Java); Dali C; Rigi C; PBS C]

Page 8: Sharing is fun! Thoughts on open access research

Let's share our tools!

• "I want to use your source code fact extractor with my analysis engine … how hard can that be?"

– "Just make your tools available for download!"

– "Just make your APIs and output data format public!"

– "Just make your source code available!"

– "Just show me your main internal meta-model!"

– "OK, maybe we need to talk …"

Page 9: Sharing is fun! Thoughts on open access research

PBS C Language E/R View

Page 10: Sharing is fun! Thoughts on open access research

PBS Architectural Schema

Page 11: Sharing is fun! Thoughts on open access research

TA++ Combined E/R Model

Page 12: Sharing is fun! Thoughts on open access research

BAUHAUS Combined E/R

Page 13: Sharing is fun! Thoughts on open access research

GUPRO Multi-Language Model

Page 14: Sharing is fun! Thoughts on open access research

Some problems

• To pre-process or not?

• Templates/generics are a bear

• Are interfaces classes? It's important …

• Naming, UIDs, mangling

• Lies my extractor told me

Page 15: Sharing is fun! Thoughts on open access research

Let's share our tools!

• Key events
– CASCON 1999 / 2000 workshops on tool interoperability
– ICSE 2000 Workshop on Std Exchange Formats (WoSEF)
– Dagstuhl Seminar 01041, Interoperability of Reverse Engineering Tools, Jan 2001

• Months of discussion + arguing led to three levels:
1. Software architecture
2. "Middle model" (f calls g, h uses v)
3. "Code level"
PLUS an XML-based notation that can be used to represent all three: GXL
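A minimal sketch of what a "middle model" fact base might look like, assuming facts are stored as (relation, source, target) triples. The entity names f, g, h, v come from the slide; the `Fact` type and the `targets_of` helper are hypothetical names made up for illustration, not part of any tool mentioned in this talk.

```python
from collections import namedtuple

# One extracted fact: a typed, directed edge between two program entities.
# (Hypothetical representation, chosen only for this sketch.)
Fact = namedtuple("Fact", ["relation", "source", "target"])

# The two example facts from the slide: f calls g, h uses v.
facts = [
    Fact("calls", "f", "g"),
    Fact("uses", "h", "v"),
]


def targets_of(facts, relation, source):
    """Return the targets reached from `source` via `relation`, in input order."""
    return [
        fact.target
        for fact in facts
        if fact.relation == relation and fact.source == source
    ]


print(targets_of(facts, "calls", "f"))
```

Seen this way, a fact base is just a typed, directed graph, which is the one point the community ended up agreeing on.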

Page 16: Sharing is fun! Thoughts on open access research

GXL

• After arguing and arguing, we realized that all we could really agree on as a community is that models of programs are graphs

• GXL: Graph eXchange Language
– It's XML!
– It's not XMI!
– … but, ummm, BYO schema!
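To make "models of programs are graphs" concrete, here is a rough sketch that writes the slide's two example facts (f calls g, h uses v) as GXL-style XML using Python's standard library. The element names (`gxl`, `graph`, `node`, `edge`, `attr`, `string`) and the `edgemode` attribute follow the GXL 1.0 DTD; the function name `facts_to_gxl`, the graph id, and the choice to store the relation in an `attr` named `relation` are assumptions of this sketch. A real exporter would more likely point each edge at a schema type, which is exactly the "BYO schema" part.

```python
import xml.etree.ElementTree as ET


def facts_to_gxl(facts, graph_id="middle-model"):
    """Serialize (relation, source, target) triples as a GXL-style XML string."""
    gxl = ET.Element("gxl")
    graph = ET.SubElement(gxl, "graph", {"id": graph_id, "edgemode": "directed"})

    # Emit each entity once as a <node>, in first-seen order.
    seen = []
    for _, source, target in facts:
        for name in (source, target):
            if name not in seen:
                seen.append(name)
                ET.SubElement(graph, "node", {"id": name})

    # Emit each fact as an <edge>. The relation name goes into an <attr>
    # because this sketch brings no schema of its own (an assumption).
    for relation, source, target in facts:
        edge = ET.SubElement(graph, "edge", {"from": source, "to": target})
        attr = ET.SubElement(edge, "attr", {"name": "relation"})
        ET.SubElement(attr, "string").text = relation

    return ET.tostring(gxl, encoding="unicode")


gxl_text = facts_to_gxl([("calls", "f", "g"), ("uses", "h", "v")])
print(gxl_text)
```

Two tools can both emit well-formed documents like this and still disagree on what a node or a "calls" edge means, which is why agreeing on the notation was the easy part.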

Page 17: Sharing is fun! Thoughts on open access research

Success! …

• Some tool owners created GXL converters for their tools, but its use fizzled out

• It's a headache to maintain inter-tool compatibility when you're doing active research and keep changing your mind (and others do the same)
– That's the nature of research!
– Probably this works best with "stable" tools

Page 18: Sharing is fun! Thoughts on open access research

Success! …

• So this "sharing" turned out to be a lot harder than it looked, even with a lot of good will and energy

• Instead of building large, robust bridges, we built a raft factory
– And that was good enough for its purpose

• Most importantly, we learned a lot about what to do "next time"

Page 19: Sharing is fun! Thoughts on open access research

Project 2: What's in a name?

• Can we label/name topics automatically extracted from version control meta-data?

• For a given LDA topic, can we label it with non-functional requirements (NFRs) automatically + without training?

• … semi-automatically?

Ref: "Automated Topic Naming: Supporting Cross-project Analysis of Software Maintenance", by A. Hindle, N. Ernst, M. Godfrey, J. Mylopoulos. Empirical Software Engineering, 18(6), December 2013.
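A rough sketch of the "label without training" idea, assuming a topic is labelled with every NFR whose word list overlaps the topic's top words. This is a simplification for illustration, not the procedure from the paper: the word lists below are invented, and `NFR_WORDS`, `label_topic`, and `min_hits` are hypothetical names.

```python
# Invented NFR word lists for illustration only; these are NOT the
# lists published with the paper.
NFR_WORDS = {
    "performance": {"fast", "slow", "cache", "latency", "speed"},
    "portability": {"windows", "linux", "platform", "port", "compile"},
    "reliability": {"crash", "failure", "error", "recover", "robust"},
}


def label_topic(topic_words, nfr_words=NFR_WORDS, min_hits=1):
    """Return the sorted NFR labels whose word list shares at least
    `min_hits` words with the topic's top words."""
    words = {w.lower() for w in topic_words}
    return sorted(
        nfr
        for nfr, vocab in nfr_words.items()
        if len(words & vocab) >= min_hits
    )


# Top words of a made-up LDA topic from commit messages.
topic = ["fix", "crash", "linux", "build", "error"]
print(label_topic(topic))
```

Raising `min_hits` trades coverage for precision: with `min_hits=2` the same topic keeps only the label backed by two matching words.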

Page 20: Sharing is fun! Thoughts on open access research
Page 21: Sharing is fun! Thoughts on open access research

What's in a name

• Abram Hindle is a big advocate of open access, wants to set a good example:

http://softwareprocess.es/static/What's_in_a_Name.html

– Source code for original tools
– Original data (MaxDB repo: 1GB)
– Extracted data
– Tool output (LDA topics)
– VirtualBox VM (LDA, other tools + data preloaded: 3GB)
– Wordnet-like list for NFRs (Please reference if you use it!)

Page 22: Sharing is fun! Thoughts on open access research

Lessons learned and open sores

• It is our moral duty as scientists to be open

• Assume no one will care, but someone might

• Sharing is hard!
– You can "design for sharing", but it takes effort
– You will get better at it

Page 23: Sharing is fun! Thoughts on open access research

Sharing is fun! Thoughts on open access research

Mike Godfrey, University of Waterloo