I. Keivanloo, L. Roostapour, P. Schugerl, J. Rilling Scalable Semantic Web-based Source Code Search Infrastructure SE-CodeSearch

Page 1: Scalable Semantic Web-based Source Code Search Infrastructure

I. Keivanloo, L. Roostapour, P. Schugerl, J. Rilling

Scalable Semantic Web-based Source Code Search Infrastructure

SE-CodeSearch

Page 2: Scalable Semantic Web-based Source Code Search Infrastructure

ICSM 2010 ERA 2

Search

Who lives in London?

Who has relatives in London?

9/14/2010

Page 3: Scalable Semantic Web-based Source Code Search Infrastructure

Source code search

Where is it defined?

Where is it called?

Page 4: Scalable Semantic Web-based Source Code Search Infrastructure

Query types

Requirement-based classification

• Pure structural (PSQ)

• Metadata (MDQ)

• Transitive closure-based (TCQ)

• Method call (MCQ)

• Absent information (AIQ)

• Mixed queries (MXQ)

Page 5: Scalable Semantic Web-based Source Code Search Infrastructure

SICS: Semantic-rich Internet-scale Code Search

• Supports all query types

• Handles a tera-scale repository

Page 6: Scalable Semantic Web-based Source Code Search Infrastructure

Is there any SICS?

• No

Page 7: Scalable Semantic Web-based Source Code Search Infrastructure

Challenges

• Incomplete code (no binaries)

• Repository evolution

– The crawler works 24/7

– Dependent code might be indexed in any order

• Very large repository (tera-scale)

Page 8: Scalable Semantic Web-based Source Code Search Infrastructure

SE-CodeSearch

• Creates a small ontology for each code part

– Code facts

– Static code analysis rules

• Saves them in the RDF repository

• Uses a backward-chaining reasoner to answer

– Not only structural queries

– But also all the other query types

(embedded code analysis at runtime)
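
As a rough illustration of the idea on this slide, the sketch below stores code facts as subject-predicate-object triples and answers a transitive-closure query by goal-driven (backward-chaining) search at query time, instead of materializing every inferred fact up front. It is not the SE-CodeSearch implementation: the in-memory set stands in for the RDF repository, and the predicate names `extends` and `implements` as well as the function names are assumptions made for this example.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative code facts, as if extracted independently from three files.
FACTS: Set[Triple] = {
    ("ArrayList", "extends", "AbstractList"),
    ("AbstractList", "extends", "AbstractCollection"),
    ("AbstractCollection", "implements", "Collection"),
}


def direct_supertypes(facts: Set[Triple], cls: str) -> Iterator[str]:
    """Yield the types that `cls` directly extends or implements."""
    for subject, predicate, obj in facts:
        if subject == cls and predicate in ("extends", "implements"):
            yield obj


def is_subtype_of(facts: Set[Triple], sub: str, sup: str,
                  _seen: Optional[Set[str]] = None) -> bool:
    """Try to prove `sub` is a (transitive) subtype of `sup`.

    The goal is reduced to subgoals on the direct supertypes, so only the
    facts relevant to this one query are ever visited.
    """
    seen = set() if _seen is None else _seen
    if sub in seen:  # already explored without success; also guards cycles
        return False
    seen.add(sub)
    for parent in direct_supertypes(facts, sub):
        if parent == sup or is_subtype_of(facts, parent, sup, seen):
            return True
    return False


print(is_subtype_of(FACTS, "ArrayList", "Collection"))
```

Because nothing is precomputed, adding new facts to the set requires no re-indexing; the next query simply sees them.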

Page 9: Scalable Semantic Web-based Source Code Search Infrastructure

SICSONT

• Source Code Ontology for Internet-scale Static Analysis

http://aseg.cs.concordia.ca/ontology#sicsont

Page 10: Scalable Semantic Web-based Source Code Search Infrastructure

Semantic Web-based Static Code Analysis

• Knowledge-based approach

• Inference engine does the analysis

• Restricted to OWL-DL

– De facto standard for knowledge sharing

– Based on Description Logic

• Decidable

• More restricted than rule-based families

Page 11: Scalable Semantic Web-based Source Code Search Infrastructure

Semantic Web-based Static Code Analysis (Cont.)

• No compiler

• Possible analyses

– Inheritance tree computation

– Fully qualified name resolution

– Method call/return statement and type resolution

• Translation template for each analysis rule
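
To make "fully qualified name resolution without a compiler" more concrete, here is a minimal sketch that derives candidate fully qualified names for a simple type name from the `package` and `import` statements of a single Java file. It only mirrors the general idea: the function name `candidate_fqns`, the regular expressions, and the sample source are assumptions for this example, and the slides do not say how SE-CodeSearch expresses this analysis in its ontology rules. Static imports and nested types are ignored here.

```python
import re
from typing import List

# Matches `import a.b.C;` and `import a.b.*;` (static imports are skipped).
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+?)(\.\*)?\s*;", re.MULTILINE)
PACKAGE_RE = re.compile(r"^\s*package\s+([\w.]+)\s*;", re.MULTILINE)

SAMPLE = """
package org.example.app;

import java.util.*;
import org.example.model.Order;

class Checkout { List items; Order order; }
"""


def candidate_fqns(source: str, simple_name: str) -> List[str]:
    """Return the possible fully qualified names for `simple_name`.

    A single-type import is unambiguous. Otherwise the name could live in
    the file's own package, in any wildcard-imported package, or in
    java.lang, so all of these are returned as unresolved candidates.
    """
    wildcard_packages = []
    for name, star in IMPORT_RE.findall(source):
        if star:
            wildcard_packages.append(name)
        elif name.rsplit(".", 1)[-1] == simple_name:
            return [name]

    candidates = []
    package = PACKAGE_RE.search(source)
    if package:
        candidates.append(f"{package.group(1)}.{simple_name}")
    candidates.extend(f"{pkg}.{simple_name}" for pkg in wildcard_packages)
    candidates.append(f"java.lang.{simple_name}")
    return candidates


print(candidate_fqns(SAMPLE, "Order"))
print(candidate_fqns(SAMPLE, "List"))
```

With incomplete code the candidate list cannot be narrowed immediately; it can only be settled once a declaration matching one of the candidates appears in the repository.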

Page 12: Scalable Semantic Web-based Source Code Search Infrastructure

Scalability Test

Queries:

1. Transitive closure-based

2. Method call

Dataset: 600,000 Java classes (no binaries) from a very large dataset (~400 GB)

http://www.ics.uci.edu/~lopes/datasets.

Hardware:

• 3 GB RAM

• 3.40 GHz CPU

Page 13: Scalable Semantic Web-based Source Code Search Infrastructure

SE-CodeSearch Highlights

• Avoids expensive knowledge modeling

• Optimized ontology population

• Backward-chaining reasoner

• Disk-based computation

– Works on minimal hardware

Page 14: Scalable Semantic Web-based Source Code Search Infrastructure

SE-CodeSearch Highlights (Cont.)

• Parallelization

– One-pass code analysis

– Static code analysis on

• Complete code

• Partial code

– Independent of parsing order

• First Package A, then Package B

• First Package B, then Package A

– Repository evolves incrementally

• Open-world reasoning (not available in relational databases)
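
The order-independence and open-world points can be sketched in a few lines. In the example below, facts from two packages are added to a repository in either order with the same result, and a query that cannot be proven from the current facts is answered with "unknown" (`None`) instead of "false", since the missing package may simply not have been crawled yet. The package contents, the `extends` predicate, and the function names are hypothetical, and the sketch assumes at most one `extends` fact per class.

```python
from typing import Iterable, Optional, Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical facts from two packages; a.Child depends on b.Base.
PACKAGE_A: Set[Fact] = {("a.Child", "extends", "b.Base")}
PACKAGE_B: Set[Fact] = {("b.Base", "extends", "c.Root")}


def index(batches: Iterable[Set[Fact]]) -> Set[Fact]:
    """Add each batch of facts as it arrives; no cross-batch resolution."""
    repository: Set[Fact] = set()
    for batch in batches:
        repository |= batch
    return repository


def extends_transitively(repository: Set[Fact], sub: str,
                         sup: str) -> Optional[bool]:
    """Return True if provable from the known facts, otherwise None.

    None means "unknown": under an open-world reading, a missing fact is
    not evidence that the relationship does not hold.
    """
    current, seen = sub, set()
    while current not in seen:
        seen.add(current)
        parents = [o for s, p, o in repository
                   if s == current and p == "extends"]
        if not parents:
            return None  # the chain leaves the indexed part of the code
        current = parents[0]
        if current == sup:
            return True
    return None  # cyclic facts; nothing provable


a_then_b = index([PACKAGE_A, PACKAGE_B])
b_then_a = index([PACKAGE_B, PACKAGE_A])
print(a_then_b == b_then_a)
print(extends_transitively(a_then_b, "a.Child", "c.Root"))
print(extends_transitively(index([PACKAGE_A]), "a.Child", "c.Root"))
```

Deferring the inference to query time is what makes the indexing order irrelevant in this sketch: indexing only stores facts, so a later batch can complete a chain that an earlier batch left open.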

Page 15: Scalable Semantic Web-based Source Code Search Infrastructure

The poster

Page 16: Scalable Semantic Web-based Source Code Search Infrastructure

?

• SE-CodeSearch homepage: http://aseg.cs.concordia.ca/codesearch

• Source Code Ontology homepage: http://aseg.cs.concordia.ca/ontology

• ASEG Lab homepage: http://aseg.cs.concordia.ca

• Any questions: [email protected]