c-store: rdf data management using column stores
DESCRIPTION
C-Store: RDF Data Management Using Column Stores. Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 24, 2009. What is RDF data?. RDF (Resource Description Framework) The data model behind the Semantic Web . The Semantic Web’s vision is to make Web machine readable. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/1.jpg)
C-Store: RDF Data Management Using Column Stores
Jianlin FengSchool of SoftwareSUN YAT-SEN UNIVERSITYApr. 24, 2009
![Page 2: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/2.jpg)
What is RDF data?
RDF (Resource Description Framework) The data model behind the Semantic Web.
The Semantic Web’s vision is to make Web machine readable.
Represents data as statements of the form <subject, property, object> To represent the notion "The sky has the color blue" use the triple < The sky, has the color, blue>.
![Page 3: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/3.jpg)
DBFacebook RDF Graph:Triples make the graph
![Page 4: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/4.jpg)
RDF Data Is Proliferating
Swoogle: Semantic Web Search Engine Indexes about 2,889,974 Semantic Web documen
ts. Number of triples could be parsed from all the doc
uments is 699,043,992. http://swoogle.umbc.edu/
Simile: MIT Digital Library Data in RDF More than 50 million triples. http://simile.mit.edu/
![Page 5: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/5.jpg)
RDF Data Management
Early projects built their own RDF stores.
Trend now towards storing in RDBMSs.
Examines 3 approaches for storing RDF data in a RDBMS
![Page 6: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/6.jpg)
Approach 1: Triple Stores
![Page 7: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/7.jpg)
Approach 2: Property Tables
![Page 8: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/8.jpg)
Approach 3: One-table-per-property
Favors Column Store
![Page 9: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/9.jpg)
Comparison Results Synopsis
Triple-store really slow on benchmark with 50M triples.
Property-tables and one-table-per-property approaches are factor of 3 faster.
One-table-per-property with column-store yields another factor of 10.
![Page 10: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/10.jpg)
Querying RDF Data
SPARQL is the dominant language. Examples:
SELECT ?name WHERE { ?x type Person . ?x name ?name }
SELECT ?likes ?dislikesWHERE { ?x title “Implementation Techniques for Main Memory Databases”. ?y authorOf ?x . ?y likes ?likes . ?y dislikes ?dislikes }
![Page 11: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/11.jpg)
Translation to SQL over triples is easy
![Page 12: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/12.jpg)
SPARQL SQL (over triple store) Query 1 SPARQL:
SELECT ?nameWHERE { ?x type Person . ?x name ?name }
Query 1 SQL:SELECT B.objectFROM triples AS A, triples as BWHERE A.subject = B.subject
AND A.property = “type”AND A.object = “Person”AND B.predicate = “name”
![Page 13: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/13.jpg)
Characteristics of Triple Stores
Accessing multiple properties for a resource require subject-subject joins.
Path expressions require subject-object joins. Can improve performance by:
Indexing each column Dictionary encoding string data
Ultimately: Do not scale
![Page 14: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/14.jpg)
Property Tables Can Reduce Joins
![Page 15: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/15.jpg)
Characteristics of Property Tables Complex to design
If narrow: reduces nulls, increases unions/joins If wide: reduces unions/joins, increases nulls
Implemented in Jena and Oracle But main representation of data is still triples
![Page 16: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/16.jpg)
Table-Per-Property Approach
• Nulls not stored
• Easy to handle multi-valued attributes
• Only need to read relevant properties
•Still need joins (but they are linear merge joins)
![Page 17: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/17.jpg)
Materialized Paths
![Page 18: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/18.jpg)
Accelerating Path Expressions
Materialize Common Paths Improved property table per
formance by 18-38% Improved one-table-per-pro
perty performance by 75-84%
Use automatic database designer (e.g., C-Store /Vertica) to decide what to materialize
![Page 19: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/19.jpg)
One-table-per-property Column-Store Can think of one-table-per-property as vertica
l partitioning super-wide property table. Column-store is a natural storage layer to use
for vertical partitioning. Advantages:
Tuple Headers Stored Separately. Column-oriented data compression. Do not necessarily have to store the subject colu
mn Carefully optimized merge-join code
![Page 20: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/20.jpg)
Library Benchmark
Data Real Library Data (50 million RDF triples) Data acquired from a variety of diverse sources (s
ome quite unstructured). Queries
Automatically generated from the Longwell RDF browser.
Details in Abadi’s paper .
![Page 21: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/21.jpg)
Results
![Page 22: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/22.jpg)
Future Work
build a fully-functional RDF database Extracts and loads RDF data from structured, sem
i-structured, and unstructured data sources. Translates SPARQL to queries over vertical sche
ma. Performs reasoning inside the DB. Use with biology research.
![Page 23: C-Store: RDF Data Management Using Column Stores](https://reader035.vdocuments.mx/reader035/viewer/2022062422/56813492550346895d9b7b8b/html5/thumbnails/23.jpg)
References
Abadi, Daniel J., Marcus, Adam, Madden, Samuel R., and Hollenbach, Kate. Scalable Semantic Web Data Management Using Vertical Partitioning. In VLDB, 2007.
Abadi, Daniel J., Marcus, Adam, Madden, Samuel R., and Hollenbach, Kate. SW-Store: A Vertically Partitioned DBMS for Semantic Web Data Management. In VLDB Journal, 2009.