
Post on 24-Jul-2020


A Primer on Converting Analysis Results Data to RDF Data Cubes using Free and Open Source Tools

Tim Williams
Principal Statistical Solutions Analyst, Global Statistical Sciences
UCB BioSciences, Inc.

PhUSE 2014

TT03

The Semantic Web (circa 2011)

"I want to take the clinical trials results..."

"...and put them in an RDF Data Cube!"

              Placebo      LowDose      HighDose
Baseline      N=28         N=30         N=29
------------------------------------------------
Sex   F       12 (42.9)    14 (46.7)    16 (55.2)
      M       16 (57.1)    16 (53.3)    13 (44.8)

ds:obs1 a qb:Observation ;
    prop:treatment "Plc" ;
    prop:sex "F" ;
    prop:statistic "count" ;
    prop:result "12"^^xsd:double ;
    qb:dataSet ds:dataset-demog .

ds:obs2 a qb:Observation ;
    ...

"op": "rdf-extension/save-rdf-schema",
"description": "Save RDF schema skeleton",
"schema": {
  "baseUri": "http://www.example.org/",
  "prefixes": [
    { "name": "dccs", "uri": "http://www.example.org/dc/demog/dccs/" },
    { "name": "rdfs", "uri": "http://www.w3.org/2000/01/rdf-schema#" },
    { "name": "prov", "uri": "http://www.w3.org/ns/prov#" },
    ........

JSON

ts:i7832 ts:firstName "Homer" ;
         ts:lastName "Simpson" ;
         ts:hasSpouse ts:i5628 .

ts:i5628 ts:firstName "Marge" ;
         ts:lastName "Simpson" ;

Turtle

Homer Simpson  --hasSpouse-->  Marge Simpson

Triple

[Images: Tribble, Turtle, Jason]

How to start?

In Scope

•  Simplified RDF Data Cube
•  Two creation methods (overview)

Out of Scope

•  Introduction to Semantic Web, RDF....
•  PhUSE Wiki "PhUSE Semantic Technology Curriculum"
•  Detailed tutorial

PhUSE Wiki: Companion Documents

What is an RDF Data Cube?


3 Main Components in the Cube Model

•  Attributes
   •  metadata
   •  status=final, issued="2014-08-06T00:00:00"^^xsd:dateTime ;
•  Measure (or Primary Measure)
   •  the observed value of primary interest
   •  count=12
•  Dimensions
   •  value keys or indices that identify the measure
   •  treatment="Plc", sex="F", statistic="count"
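As a rough illustration of the three component types, the sketch below models the single observation from the bullets above as a plain Ruby hash and groups its fields into dimensions, measure, and attributes. The `split_components` helper and the field name `result` are hypothetical and do not come from any cube library; the values are the ones shown on the slide.

```ruby
# Hypothetical helper: group the fields of one observation by the cube
# component they belong to. The component lists are passed in explicitly.
def split_components(obs, dimensions:, measures:, attributes:)
  {
    dimensions: obs.slice(*dimensions),
    measure:    obs.slice(*measures),
    attributes: obs.slice(*attributes)
  }
end

# One observation, using the slide's values: count=12 for Plc / F.
observation = {
  treatment: 'Plc',
  sex:       'F',
  statistic: 'count',
  result:    12,
  status:    'final',
  issued:    '2014-08-06T00:00:00'
}

parts = split_components(
  observation,
  dimensions: %i[treatment sex statistic],
  measures:   %i[result],
  attributes: %i[status issued]
)

parts.each { |component, fields| puts "#{component}: #{fields}" }
```

The dimensions act as the key and the measure as the value; the attributes only describe the observation and play no part in identifying it.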

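[Figure: the demographics table drawn as a cube with three dimensions: Treatment (Plc, LowD, HighD), Sex (F, M) and Statistic (count, percentage). Each cell holds one result, e.g. count 12 and percentage 42.9 for Plc / F.]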

Baseline         Placebo      LowDose      HighDose
Characteristic   N=28         N=30         N=29
----------------------------------------------------------------------
Sex   F          12 (42.9)    14 (46.7)    16 (55.2)
      M          16 (57.1)    16 (53.3)    13 (44.8)

Statistic
•  count
•  percentage

Treatment

[Figure: locating one observation in the cube. The Dimensions treatment="Plc", sex="F", statistic="count" identify a single cell, and its Measure is count=12. "It's a hit!!"]

Publisci    OpenRefine

Publisci Method

Map table

Ruby Script

CSV

Baseline         Placebo      LowDose      HighDose
Characteristic   N=28         N=30         N=29
----------------------------------------------------------------------
Sex   F          12 (42.9)    14 (46.7)    16 (55.2)
      M          16 (57.1)    16 (53.3)    13 (44.8)

Statistic
•  count
•  percentage

Treatment,Sex,Statistic,Result
Plc,F,count,12
Plc,F,percentage,42.9
Plc,M,count,16
Plc,M,percentage,57.1
LowD,F,count,14
LowD,F,percentage,46.7
LowD,M,count,16
LowD,M,percentage,53.3
etc.
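One possible way to produce these long-format rows is to start from the counts and arm sizes in the source table. The sketch below is an assumption about how that could be done, not the method used in the talk: `long_rows` is a hypothetical helper, the values are typed in by hand, and the percentage is computed as count / N * 100 rounded to one decimal.

```ruby
# Hypothetical helper: expand per-arm counts into long-format rows, one row
# per (treatment, sex, statistic) combination, as in the CSV above.
def long_rows(arm_totals, counts)
  rows = [%w[Treatment Sex Statistic Result]]
  counts.each do |treatment, by_sex|
    by_sex.each do |sex, count|
      percentage = (100.0 * count / arm_totals.fetch(treatment)).round(1)
      rows << [treatment, sex, 'count', count]
      rows << [treatment, sex, 'percentage', percentage]
    end
  end
  rows
end

# Values typed in from the demographics table (N per arm, count per sex).
arm_totals = { 'Plc' => 28, 'LowD' => 30, 'HighD' => 29 }
counts = {
  'Plc'   => { 'F' => 12, 'M' => 16 },
  'LowD'  => { 'F' => 14, 'M' => 16 },
  'HighD' => { 'F' => 16, 'M' => 13 }
}

long_rows(arm_totals, counts).each { |row| puts row.join(',') }
```

Each cell of the display table, such as "12 (42.9)", becomes two rows because count and percentage are separate values of the Statistic dimension.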

require 'publisci'
include PubliSci::DSL

# Describe the source data: which CSV columns are dimensions, which is the measure.
data do
  source 'demog3DimSource.csv'
  dimension 'Treatment', 'Sex', 'Statistic'
  measure 'Result'
  option :base_url, 'http://example.org'
  option 'base', 'http://example.org/'
  option 'label_column', 'Statistic'
end

# Descriptive metadata for the generated dataset.
metadata do
  dataset 'Demographics Analysis Results'
  title 'Demographics'
  creator 'Your-Name-Here'
  description 'Table example for Demographics and Baseline Characteristics'
  date '2014-07-07T00:00:00'
end

# Write the generated cube to a Turtle file.
File.open('demog3Dim_p.ttl', 'w') { |file| file.write generate_n3 }

...
ns:obs1 a qb:Observation ;
    qb:dataSet ns:dataset-demog3DimSource ;
    rdfs:label "1" ;
    prop:Treatment <code/treatment/Plc> ;
    prop:Sex <code/sex/F> ;
    prop:Statistic <code/statistic/count> ;
    prop:Result 12 ;
    ... .
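For intuition only, the conversion from a CSV row to an observation can be sketched in plain Ruby. This is not how Publisci is implemented: `observation_turtle` is a hypothetical helper, prefix declarations are assumed to be written elsewhere, and dimension values are emitted as plain string literals (as in the simplified example near the start of the talk) rather than as code-list URIs.

```ruby
# Hypothetical helper: render one long-format row as a Turtle observation.
# Assumes the ds:, prop: and qb: prefixes are declared elsewhere and that
# the dimension values contain no characters that need escaping.
def observation_turtle(index, row, dataset: 'ds:dataset-demog')
  [
    "ds:obs#{index} a qb:Observation ;",
    "    qb:dataSet #{dataset} ;",
    "    prop:treatment \"#{row[:treatment]}\" ;",
    "    prop:sex \"#{row[:sex]}\" ;",
    "    prop:statistic \"#{row[:statistic]}\" ;",
    "    prop:result #{row[:result]} ."
  ].join("\n")
end

rows = [
  { treatment: 'Plc', sex: 'F', statistic: 'count',      result: 12 },
  { treatment: 'Plc', sex: 'F', statistic: 'percentage', result: 42.9 }
]

rows.each_with_index do |row, i|
  puts observation_turtle(i + 1, row)
  puts
end
```

A tool such as Publisci additionally generates the dataset structure and code lists, which this sketch leaves out.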

Publisci Method

Advantages

Simple, quick, easy

Minimal cube knowledge

Automatic code list generation

Disadvantages

Limited support*

Harder to extend unless you are a Ruby and Cube expert

Not as flexible as OpenRefine

RDF Data Cube

OpenRefine Method

•  Map table: CSV/XLS
•  Import: Create Project
•  Construct: Cube Skeleton
   •  Components: Attributes, Dimensions, Measure
•  Attach: Values
•  Export

OpenRefine

Save & Re-use JSON from OpenRefine

OpenRefine Method

Advantages

•  Flexibility in cube design
•  Incremental development
•  Cube components available within interface
•  Data reconciliation

Disadvantages

•  Greater cube knowledge required
•  Steep learning curve
•  Labour-intensive, manual steps
•  Measures in the same cube all receive the same data type
   Example: count and percentage as xsd:double
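The data type limitation can be made concrete with a small sketch, independent of OpenRefine, of what per-statistic typing would look like if each result carried its own XSD datatype. The `datatype_for` mapping and the `typed_literal` helper are hypothetical; the choice of xsd:integer for counts is an assumption made for illustration.

```ruby
# Hypothetical mapping from statistic to the XSD datatype it would ideally use.
def datatype_for(statistic)
  { 'count' => 'xsd:integer', 'percentage' => 'xsd:double' }.fetch(statistic)
end

# Render a result as a Turtle typed literal, e.g. "12"^^xsd:integer.
def typed_literal(statistic, value)
  "\"#{value}\"^^#{datatype_for(statistic)}"
end

puts typed_literal('count', 12)
puts typed_literal('percentage', 42.9)
```

With a single measure type shared across the cube, the count is instead written as "12"^^xsd:double, as in the observation shown at the start of the talk.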

Where did I go wrong with this child?

Query the data with SPARQL

PREFIX prop: <http://www.example.org/dc/demog/prop/>

SELECT ?value
WHERE {
  ?obs prop:treatment "Plc" ;
       prop:sex "F" ;
       prop:statistic "count" ;
       prop:result ?value .
}

I blame The Internets, honey.

SPARQL Protocol and RDF Query Language
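To see what the query above asks for without a triple store, the same pattern can be mimicked over in-memory data. This is a plain-Ruby sketch, not a SPARQL engine: `select_values` is a hypothetical helper, and the observations are hashes whose keys stand in for the prop: properties.

```ruby
# Hypothetical stand-in for the SELECT above: return the result of every
# observation whose dimension values all match the given pattern.
def select_values(observations, pattern)
  observations
    .select { |obs| pattern.all? { |key, value| obs[key] == value } }
    .map { |obs| obs[:result] }
end

observations = [
  { treatment: 'Plc',  sex: 'F', statistic: 'count',      result: 12 },
  { treatment: 'Plc',  sex: 'F', statistic: 'percentage', result: 42.9 },
  { treatment: 'Plc',  sex: 'M', statistic: 'count',      result: 16 },
  { treatment: 'LowD', sex: 'F', statistic: 'count',      result: 14 }
]

p select_values(observations, { treatment: 'Plc', sex: 'F', statistic: 'count' })
# prints [12]
```

Because the three dimensions together identify one observation, the pattern matches exactly one cell of the cube.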

Cube Construction: an Evolution.

•  Publisci
   •  “My first cube!”
   •  Codelists
•  OpenRefine
   •  Structure and Skeletons
   •  Customization
   •  Data reconciliation
•  rrdf, rrdflibs
   •  Production solution

Data Transparency?

•  Metadata
   •  embedded with the data
•  Standardization
   •  data reconciliation with online vocabularies & thesauri
   •  translation between different coding systems and data models
•  Merge data
   •  similar and dissimilar sources
•  Machine readable
   •  Reasoning, logic, intelligent search

Semantic Interoperability: "The ability for computer systems to exchange data with unambiguous, shared meaning". - Wikipedia.


Thank you!

Contact:
Tim Williams
UCB Biosciences, Inc., Raleigh, NC USA
tim.williams@ucb.com
www.linkedin.com/in/timpwilliams/

Acknowledgements
•  Will Strinz - Publisci Author
•  OpenRefine team
•  Ian Fleming, Marc Andersen - PhUSE WG Leads
•  PhUSE WG team members
•  Open Source Movement
•  The Internets

Copyright & Source Attributions
All images are copyrights of their creators and respective owners.


Paramount Pictures

Hasbro Inc.

Ron Leishman. Image 440722 illustrations Of.com

LOD Cloud Diagram as of September 2011. CC BY-SA 3.0. Anja Jentzsch, own work

http://dgallery.s3.amazonaws.com/sparql-protocol.png

Davidson University Dept. of Biology Herpetology Lab Research http://www.bio.davidson.edu/people/midorcas/research/stresearch/tercar.jpg

My life with Fly Ball dogs http://mylifewithflyballdogs.com http://farm6.staticflickr.com/5341/7186194778_3c9d6b56be.jpg

Daily Tombstone Photo http://dailytombstonephoto.blogspot.com/2010/05/mausoleum-of-charles-lucky-luciano-st.html MAUSOLEUM OF CHARLES "LUCKY" LUCIANO - St. John's Cemetery, Middle Village, New York Image modifications by TW, Aug 2014


Nickelodeon
