A Primer on Converting Analysis Results Data to RDF Data Cubes using Free and Open Source Tools
Tim Williams Principal Statistical Solutions Analyst Global Statistical Sciences UCB BioSciences, Inc.
PhUSE 2014
TT03
The Semantic Web (circa 2011)
"I want to take the clinical trials results..."
"..and put them in an RDF Data Cube!"
                 Placebo     LowDose     HighDose
Baseline         N=28        N=30        N=29
---------------------------------------------------
Sex   F          12 (42.9)   14 (46.7)   16 (55.2)
      M          16 (57.1)   16 (53.3)   13 (44.8)
ds:obs1 a qb:Observation ;
    prop:treatment "Plc" ;
    prop:sex "F" ;
    prop:statistic "count" ;
    prop:result "12"^^xsd:double ;
    qb:dataSet ds:dataset-demog .
ds:obs2 a qb:Observation ;
    ...
"op": "rdf-extension/save-rdf-schema",
"description": "Save RDF schema skeleton",
"schema": {
  "baseUri": "http://www.example.org/",
  "prefixes": [
    { "name": "dccs", "uri": "http://www.example.org/dc/demog/dccs/" },
    { "name": "rdfs", "uri": "http://www.w3.org/2000/01/rdf-schema#" },
    { "name": "prov", "uri": "http://www.w3.org/ns/prov#" },
    ........
JSON
ts:i7832 ts:firstName "Homer" ;
         ts:lastName "Simpson" ;
         ts:hasSpouse ts:i5628 .
ts:i5628 ts:firstName "Marge" ;
         ts:lastName "Simpson" .
Turtle
Triple
Homer Simpson hasSpouse Marge Simpson
[Images: Tribble, Turtle, Jason]
How to start?
In Scope
• Simplified RDF Data Cube
• Two creation methods (overview)
Out of Scope
• Introduction to Semantic Web, RDF....
  • PhUSE Wiki "PhUSE Semantic Technology Curriculum"
• Detailed tutorial
PhUSE Wiki: Companion Documents
What is an RDF Data Cube?
3 Main Components in the Cube Model
• Attributes
  • metadata
  • status=final, issued="2014-08-06T00:00:00"^^xsd:dateTime
• Measure (or Primary Measure)
  • the observed value of primary interest
  • count=12
• Dimensions
  • value keys or indices that identify the measure
  • treatment="Plc", sex="F", statistic="count"
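The three components can be pictured as fields of a single observation record. The following is a minimal plain-Ruby sketch, not part of any cube library; the `Observation` struct and its field names are hypothetical and chosen only for illustration.

```ruby
# Hypothetical record type for illustration; not part of Publisci or OpenRefine.
Observation = Struct.new(:dimensions, :measure, :attributes, keyword_init: true)

obs = Observation.new(
  # Dimensions: the keys that identify the value.
  dimensions: { treatment: "Plc", sex: "F", statistic: "count" },
  # Measure: the observed value of primary interest.
  measure: 12,
  # Attributes: metadata about the observation.
  attributes: { status: "final", issued: "2014-08-06T00:00:00" }
)

# Prints: Plc/F/count = 12 (final)
puts "#{obs.dimensions.values.join('/')} = #{obs.measure} (#{obs.attributes[:status]})"
```

Keeping the dimensions separate from the measure mirrors the cube model: the dimensions say which cell is meant, and the measure is what the cell holds.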
[Figure: the demographics results drawn as a cube with axes Treatment (Plc, LowD, HighD), Sex (F, M), and Statistic (count, percentage)]
Baseline         Placebo     LowDose     HighDose
Characteristic   N=28        N=30        N=29
----------------------------------------------------------------------
Sex   F          12 (42.9)   14 (46.7)   16 (55.2)
      M          16 (57.1)   16 (53.3)   13 (44.8)
Statistic • count • percentage
Dimensions
treatment="Plc", sex="F", statistic="count"
Measure
It's a hit!! count=12
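Locating a cell by its three dimension values amounts to a keyed lookup. A minimal plain-Ruby sketch, assuming the cube is held in memory as a hash; the `cube` layout and the `measure_for` helper are hypothetical, and the values are copied from the demographics table.

```ruby
# Cube cells keyed by [treatment, sex, statistic]; values from the demographics table.
cube = {
  ["Plc",   "F", "count"] => 12, ["Plc",   "F", "percentage"] => 42.9,
  ["Plc",   "M", "count"] => 16, ["Plc",   "M", "percentage"] => 57.1,
  ["LowD",  "F", "count"] => 14, ["LowD",  "F", "percentage"] => 46.7,
  ["LowD",  "M", "count"] => 16, ["LowD",  "M", "percentage"] => 53.3,
  ["HighD", "F", "count"] => 16, ["HighD", "F", "percentage"] => 55.2,
  ["HighD", "M", "count"] => 13, ["HighD", "M", "percentage"] => 44.8
}

# Hypothetical helper: the three dimension values together identify one measure.
def measure_for(cube, treatment:, sex:, statistic:)
  cube.fetch([treatment, sex, statistic])
end

# Prints: 12
puts measure_for(cube, treatment: "Plc", sex: "F", statistic: "count")
```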
Publisci OpenRefine
Publisci Method
Map table -> CSV -> Ruby Script
Treatment,Sex,Statistic,Result
Plc,F,count,12
Plc,F,percentage,42.9
Plc,M,count,16
Plc,M,percentage,57.1
LowD,F,count,14
LowD,F,percentage,46.7
LowD,M,count,16
LowD,M,percentage,53.3
etc.
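The mapping from the printed table to one-result-per-row CSV can be sketched in plain Ruby. This is only an illustration of the reshaping step, assuming the table cells are available as "count (percentage)" strings; the `table` hash and the `to_long_rows` helper are hypothetical.

```ruby
# Cell values as printed in the table: "count (percentage)" per sex and treatment.
table = {
  "F" => { "Plc" => "12 (42.9)", "LowD" => "14 (46.7)", "HighD" => "16 (55.2)" },
  "M" => { "Plc" => "16 (57.1)", "LowD" => "16 (53.3)", "HighD" => "13 (44.8)" }
}

# Hypothetical helper: emits one row per treatment, sex, and statistic.
def to_long_rows(table)
  rows = [%w[Treatment Sex Statistic Result]]
  treatments = table.values.first.keys
  treatments.each do |treatment|
    table.each do |sex, cells|
      # Split a cell such as "12 (42.9)" into the count and the percentage.
      count, percentage = cells.fetch(treatment).match(/\A(\d+) \((\d+(?:\.\d+)?)\)\z/).captures
      rows << [treatment, sex, "count", count]
      rows << [treatment, sex, "percentage", percentage]
    end
  end
  rows
end

puts to_long_rows(table).map { |row| row.join(",") }
```

Each table cell becomes two rows, one per statistic, so the 6 cells yield 12 data rows plus the header.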
require 'publisci'
include PubliSci::DSL

# Describe the cube: source file, dimension columns, and the measure column.
data do
  source 'demog3DimSource.csv'
  dimension 'Treatment', 'Sex', 'Statistic'
  measure 'Result'
  option :base_url, 'http://example.org'
  option 'base', 'http://example.org/'
  option 'label_column', 'Statistic'
end

# Dataset-level metadata.
metadata do
  dataset 'Demographics Analysis Results'
  title 'Demographics'
  creator 'Your-Name-Here'
  description 'Table example for Demographics and Baseline Characteristics'
  date '2014-07-07T00:00:00'
end

# Write the generated cube to a Turtle file.
open('demog3Dim_p.ttl', 'w') { |file| file.write generate_n3 }
...
ns:obs1 a qb:Observation ;
    qb:dataSet ns:dataset-demog3DimSource ;
    rdfs:label "1" ;
    prop:Treatment <code/treatment/Plc> ;
    prop:Sex <code/sex/F> ;
    prop:Statistic <code/statistic/count> ;
    prop:Result 12 ;
    ... .
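To make the link between a CSV row and the generated observation concrete, the sketch below builds a comparable Turtle snippet by plain string formatting. It is not how Publisci works internally; the `observation_turtle` helper is hypothetical, and the prefixes and code-list paths are simply copied from the output above.

```ruby
# Hypothetical helper: formats one CSV row as a qb:Observation in Turtle.
def observation_turtle(index, row, dataset: "ns:dataset-demog3DimSource")
  [
    "ns:obs#{index} a qb:Observation ;",
    "    qb:dataSet #{dataset} ;",
    "    rdfs:label \"#{index}\" ;",
    "    prop:Treatment <code/treatment/#{row[:treatment]}> ;",
    "    prop:Sex <code/sex/#{row[:sex]}> ;",
    "    prop:Statistic <code/statistic/#{row[:statistic]}> ;",
    "    prop:Result #{row[:result]} ."
  ].join("\n")
end

row = { treatment: "Plc", sex: "F", statistic: "count", result: 12 }
puts observation_turtle(1, row)
```

Each dimension value is written as a reference into a code list rather than as a plain string, which is the part Publisci generates automatically.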
Publisci Method
Advantages
• Simple, quick, easy
• Minimal cube knowledge
• Automatic code list generation
Disadvantages
• Limited support*
• Harder to extend unless you are a Ruby and Cube expert
• Not as flexible as OpenRefine
OpenRefine Method
• Map table (CSV/XLS)
• Import (Create Project)
• Construct (Cube Skeleton Components: Attributes, Dimensions, Measure)
• Attach (Values)
• Export (RDF Data Cube)
OpenRefine
Save & Re-use JSON from OpenRefine
OpenRefine Method
Advantages
• Flexibility in cube design
• Incremental development
• Cube components available within interface
• Data reconciliation
Disadvantages
• Greater cube knowledge required
• Steep learning curve
• Labour-intensive, manual steps
• Measures in the same cube all receive the same data type
  Example: count and percentage as xsd:double
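The data type limitation can be illustrated outside OpenRefine. In the plain-Ruby sketch below, a count is typed as xsd:integer and a percentage as xsd:double, which is the distinction a single shared measure type cannot express. The `XSD_TYPES` mapping and the `typed_literal` helper are assumptions made for illustration, not OpenRefine features.

```ruby
# Assumed mapping from statistic to XSD datatype; adjust to the cube's needs.
XSD_TYPES = { "count" => "xsd:integer", "percentage" => "xsd:double" }.freeze

# Hypothetical helper: renders a result as a typed Turtle literal.
def typed_literal(statistic, value)
  "\"#{value}\"^^#{XSD_TYPES.fetch(statistic)}"
end

puts typed_literal("count", 12)         # "12"^^xsd:integer
puts typed_literal("percentage", 42.9)  # "42.9"^^xsd:double
```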
Query the data with SPARQL
SPARQL Protocol and RDF Query Language

PREFIX prop: <http://www.example.org/dc/demog/prop/>
SELECT ?value
WHERE {
  ?obs prop:treatment "Plc" ;
       prop:sex "F" ;
       prop:statistic "count" ;
       prop:result ?value .
}

"Where did I go wrong with this child?"
"I blame The Internets, honey."
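What the query asks for can be mimicked in plain Ruby: keep the observations whose fixed properties match, then return their result. This sketch only imitates the pattern matching over in-memory hashes; it is not a SPARQL engine, and the `observations` data and the `select_values` helper are hypothetical.

```ruby
# A few observations as property/value hashes, with results stored as doubles.
observations = [
  { "prop:treatment" => "Plc", "prop:sex" => "F", "prop:statistic" => "count",      "prop:result" => 12.0 },
  { "prop:treatment" => "Plc", "prop:sex" => "F", "prop:statistic" => "percentage", "prop:result" => 42.9 },
  { "prop:treatment" => "Plc", "prop:sex" => "M", "prop:statistic" => "count",      "prop:result" => 16.0 }
]

# Returns prop:result for every observation matching all fixed property values.
def select_values(observations, pattern)
  observations
    .select { |obs| pattern.all? { |property, value| obs[property] == value } }
    .map { |obs| obs["prop:result"] }
end

pattern = { "prop:treatment" => "Plc", "prop:sex" => "F", "prop:statistic" => "count" }
p select_values(observations, pattern)  # [12.0]
```

Dropping a property from the pattern widens the match, much as removing a triple pattern from the WHERE clause would.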
Cube Construction: an Evolution.
Publisci
• “My first cube!”
• Codelists
OpenRefine
• Structure and Skeletons
• Customization
• Data reconciliation
rrdf, rrdflibs
• Production solution
Data Transparency?
• Metadata
  • embedded with the data
• Standardization
  • data reconciliation with online vocabularies & thesauri
  • translation between different coding systems and data models
• Merge data
  • similar and dissimilar sources
• Machine readable
  • Reasoning, logic, intelligent search
Semantic Interoperability: "The ability for computer systems to exchange data with unambiguous, shared meaning". - Wikipedia.
Thank you!
Tim Williams
UCB Biosciences, Inc.
Raleigh, NC USA
Contact: www.linkedin.com/in/timpwilliams/

Acknowledgements
• Will Strinz - Publisci Author
• OpenRefine team
• Ian Fleming, Marc Andersen - PhUSE WG Leads
• PhUSE WG team members
• Open Source Movement
• The Internets
Copyright & Source Attributions
All images are copyrights of their creators and respective owners.
• Paramount Pictures
• Hasbro Inc.
• Nickelodeon
• Ron Leishman. Image 440722, illustrationsOf.com
• LOD Cloud Diagram as of September 2011. CC BY-SA 3.0. Anja Jentzsch, own work
• http://dgallery.s3.amazonaws.com/sparql-protocol.png
• Davidson University Dept. of Biology Herpetology Lab Research http://www.bio.davidson.edu/people/midorcas/research/stresearch/tercar.jpg
• My life with Fly Ball dogs http://mylifewithflyballdogs.com http://farm6.staticflickr.com/5341/7186194778_3c9d6b56be.jpg
• Daily Tombstone Photo http://dailytombstonephoto.blogspot.com/2010/05/mausoleum-of-charles-lucky-luciano-st.html MAUSOLEUM OF CHARLES "LUCKY" LUCIANO - St. John's Cemetery, Middle Village, New York. Image modifications by TW, Aug 2014