(Linked) Data Curation challenges
Kevin Ashley, Director, Digital Curation Centre, www.dcc.ac.uk. Reusable with attribution: CC-BY. The DCC is supported by Jisc. Acknowledgements: John Wilkins & Cameron Neylon (ideas, images, slides, inspiration).
![Page 1: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/1.jpg)
(Linked) Data Curation challenges
Kevin Ashley
Director, Digital Curation Centre
Reusable with attribution: CC-BY
The DCC is supported by Jisc
![Page 2: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/2.jpg)
Acknowledgements
• John Wilkins & Cameron Neylon
• Ideas, images, slides, inspiration
2013-07-05 Kevin Ashley – CC-BY
![Page 3: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/3.jpg)
Data views and processes
• Administration
• Discovery
• Work-level description
• Discipline-level interpretation
![Page 4: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/4.jpg)
Administrative view
Data from projects funded by NERC
Data produced by the department of linguistics
![Page 5: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/5.jpg)
Discovery view
Data about reproductive behaviour in freshwater fish
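The administrative and discovery views can be read as different queries over the same set of linked statements. The following is a minimal sketch in Python, assuming a toy in-memory list of triples. The dataset identifiers, the predicate names (`funder`, `department`, `subject`) and all values are hypothetical and are not taken from any real vocabulary or catalogue.

```python
# Toy metadata graph: (subject, predicate, object) triples.
# All identifiers, predicates and values are illustrative assumptions.
TRIPLES = [
    ("ds:1", "funder", "NERC"),
    ("ds:1", "department", "Biology"),
    ("ds:1", "subject", "reproductive behaviour in freshwater fish"),
    ("ds:2", "funder", "AHRC"),
    ("ds:2", "department", "Linguistics"),
    ("ds:2", "subject", "dialect variation"),
    ("ds:3", "funder", "NERC"),
    ("ds:3", "department", "Geosciences"),
    ("ds:3", "subject", "river sediment transport"),
]


def datasets_where(predicate, matches):
    """Return sorted dataset ids whose value for `predicate` satisfies `matches`."""
    return sorted({s for s, p, o in TRIPLES if p == predicate and matches(o)})


# Administrative view: who funded or produced the data?
funded_by_nerc = datasets_where("funder", lambda v: v == "NERC")
from_linguistics = datasets_where("department", lambda v: v == "Linguistics")

# Discovery view: what is the data about?
about_fish = datasets_where("subject", lambda v: "freshwater fish" in v)

print(funded_by_nerc)    # ['ds:1', 'ds:3']
print(from_linguistics)  # ['ds:2']
print(about_fish)        # ['ds:1']
```

The same records answer both kinds of question; only the predicate being filtered changes.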
![Page 6: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/6.jpg)
Work-level description
![Page 7: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/7.jpg)
![Page 8: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/8.jpg)
![Page 9: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/9.jpg)
Data is variable
• Not always textual
• Not always tabular
• Not always fixed
• Not always clearly authored – think of archival provenance
• Not always associated with publication
![Page 10: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/10.jpg)
95% of research results are never published
http://www.flickr.com/photos/sethw/113073189/
![Page 11: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/11.jpg)
If a million postdocs repeat a million experiments…
http://flickr.com/photos/heymans/480396810/
![Page 12: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/12.jpg)
And 25% of those don’t work…
http://flickr.com/photos/cliche/120070310/
![Page 13: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/13.jpg)
…how much taxpayers’ money is that?
http://flickr.com/photos/luismimunoznajar/2093185804/
![Page 14: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/14.jpg)
I need that data now!!! I don’t care how messy it is – I can fix it!
I’ve wasted too much of my life fixing other people’s bad data. I’m not interested until you’ve cleaned it up and documented it. Besides, I have other things to think about.
![Page 15: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/15.jpg)
Grandfather’s axe
[email protected] CC-BY-NC-SA
When is my dataset a new dataset?
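One way to make the question concrete is to compare content fingerprints. The sketch below assumes a dataset is a list of row dictionaries and uses a hypothetical helper, `fingerprint`, built on the standard `hashlib` and `json` modules. It only detects that content has changed; whether a change amounts to a "new dataset" is still a curation decision.

```python
import hashlib
import json


def fingerprint(rows):
    """Hash a list of row dicts in a canonical form (keys sorted, compact separators)."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative values only.
original = [{"site": "A", "temp_c": 11.2}, {"site": "B", "temp_c": 9.8}]
reordered_keys = [{"temp_c": 11.2, "site": "A"}, {"temp_c": 9.8, "site": "B"}]
corrected = [{"site": "A", "temp_c": 11.2}, {"site": "B", "temp_c": 9.9}]

# Same content, different key order: the fingerprint is unchanged.
print(fingerprint(original) == fingerprint(reordered_keys))  # True
# One corrected cell: the fingerprint changes.
print(fingerprint(original) == fingerprint(corrected))       # False
```

Like the axe with a replaced handle and head, a dataset can keep its name while every fingerprint along the way differs.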
![Page 16: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/16.jpg)
Authorship
• Reference data – cell-level provenance versus single-author data table
• ‘Cleaned’ data – can pass through many hands
• Synthesis…
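The contrast between a single-author table and cell-level provenance can be sketched as two data structures. This is an illustration under assumed names: the `Cell` class, the `contributors` helper and the source labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: float
    source: str  # who supplied or last corrected this value (hypothetical labels)


# Single-author table: one attribution covers every value.
single_author_table = {
    "author": "Lab A",
    "rows": [{"mass_g": 1.2}, {"mass_g": 1.4}],
}

# Reference data: each cell carries its own provenance.
reference_table = [
    {"mass_g": Cell(1.2, "Lab A")},
    {"mass_g": Cell(1.4, "Lab B")},
    {"mass_g": Cell(1.3, "Curator C (cleaned)")},
]


def contributors(table):
    """Collect every distinct source recorded at cell level."""
    return sorted({cell.source for row in table for cell in row.values()})


print(contributors(reference_table))
# ['Curator C (cleaned)', 'Lab A', 'Lab B']
```

With cell-level records, cleaning that passes through many hands stays attributable; with a single author field, it does not.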
![Page 17: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/17.jpg)
![Page 18: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/18.jpg)
![Page 19: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/19.jpg)
Potential wins
• Provenance of machine-gathered data – linking observations to instrument descriptions
• Linking data in multiple places
• Data and publications and plans
• Robust assertions about data versioning
• Association of data with institutions
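Linking observations to instrument descriptions might look like the following sketch. It assumes each observation stores an identifier for the instrument that produced it. The identifiers, field names and accuracy figures are hypothetical, and a real deployment would more likely use resolvable URIs.

```python
# Hypothetical instrument descriptions, keyed by identifier.
INSTRUMENTS = {
    "inst:thermo-7": {"type": "thermometer", "accuracy_c": 0.1},
    "inst:thermo-9": {"type": "thermometer", "accuracy_c": 0.5},
}

# Hypothetical machine-gathered observations, each linked to its instrument.
OBSERVATIONS = [
    {"id": "obs:1", "temp_c": 11.2, "made_by": "inst:thermo-7"},
    {"id": "obs:2", "temp_c": 11.9, "made_by": "inst:thermo-9"},
    {"id": "obs:3", "temp_c": 10.4, "made_by": "inst:thermo-7"},
]


def with_instrument(observation):
    """Follow the `made_by` link and attach the instrument description."""
    return {**observation, "instrument": INSTRUMENTS[observation["made_by"]]}


# Keep only observations from instruments accurate to 0.2 degrees C or better.
precise = [
    o["id"]
    for o in map(with_instrument, OBSERVATIONS)
    if o["instrument"]["accuracy_c"] <= 0.2
]
print(precise)  # ['obs:1', 'obs:3']
```

Because the instrument is described once and linked many times, a later correction to its description reaches every observation that refers to it.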
![Page 20: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/20.jpg)
networks of people…
![Page 21: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/21.jpg)
![Page 22: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/22.jpg)
More wins
• Assertions at table and variable group level
• Linking that crosses disciplinary boundaries:
  – Biochemistry and neuroscience
  – Naval history, economics and climate science
• Linking that crosses research and administrative boundaries
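Assertions at table and variable group level could be modelled as statements that variables inherit from their enclosing group and table. The sketch below is one possible reading, not a description of any existing standard. The structure of `TABLE`, the key names and the override rule (group beats table) are all assumptions.

```python
# Hypothetical table description; names and values are illustrative only.
TABLE = {
    "assertions": {"licence": "CC-BY", "collected_by": "Survey team"},
    "groups": {
        "temperature": {
            "assertions": {"unit": "degC", "collected_by": "Sensor network"},
            "variables": ["air_temp", "water_temp"],
        },
        "counts": {
            "assertions": {"unit": "individuals"},
            "variables": ["fish_count"],
        },
    },
}


def assertions_for(variable):
    """Merge table-level assertions with those of the variable's group.

    Group-level statements override table-level ones with the same key.
    """
    for group in TABLE["groups"].values():
        if variable in group["variables"]:
            return {**TABLE["assertions"], **group["assertions"]}
    raise KeyError(variable)


print(assertions_for("water_temp"))
# {'licence': 'CC-BY', 'collected_by': 'Sensor network', 'unit': 'degC'}
```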
![Page 23: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/23.jpg)
IGFBP-5 plays a role in the regulation of cellular senescence via a p53-dependent pathway and in aging-associated vascular diseases
After John Wilbanks
![Page 24: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/24.jpg)
Tylenol
N-acetyl-p-aminophenol
Acetaminophen
Paracetamol
SameAs
N-(4-hydroxyphenyl)ethanamide
N-(4-hydroxyphenyl)acetamide
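The names on this slide can be treated as one equivalence class built from pairwise sameAs assertions. The sketch below resolves such assertions with a small union-find structure. Which pairs were actually asserted is an assumption here (each name is chained to the next), and "Aspirin" is added only as a contrasting name that appears in no assertion.

```python
NAMES = [
    "Tylenol",
    "N-acetyl-p-aminophenol",
    "Acetaminophen",
    "Paracetamol",
    "N-(4-hydroxyphenyl)ethanamide",
    "N-(4-hydroxyphenyl)acetamide",
]

# Pairwise sameAs assertions; chaining neighbours is an illustrative assumption.
SAME_AS = list(zip(NAMES, NAMES[1:]))

parent = {}


def find(name):
    """Return the representative of the group that `name` belongs to."""
    parent.setdefault(name, name)
    while parent[name] != name:
        parent[name] = parent[parent[name]]  # path halving
        name = parent[name]
    return name


def union(a, b):
    """Record that `a` and `b` name the same thing."""
    parent[find(a)] = find(b)


for a, b in SAME_AS:
    union(a, b)

print(find("Tylenol") == find("Paracetamol"))  # True
print(find("Tylenol") == find("Aspirin"))      # False
```

Five pairwise assertions are enough to connect all six names, so a query under any one name can reach data recorded under the others.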
![Page 25: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/25.jpg)
“I never had an idea that couldn’t be improved by sharing it with as many people as possible…”
Bill Hooker (2006)
http://3quarksdaily.blogs.com/3quarksdaily/2006/10/the_future_of_s_1.html
![Page 26: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/26.jpg)
[Diagram labels: Idea, Develop, Fund, Plan, Record, Process, Publish, Read]
![Page 27: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/27.jpg)
![Page 28: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/28.jpg)
![Page 29: (Linked) Data Curation challenges](https://reader035.vdocuments.mx/reader035/viewer/2022062520/56816359550346895dd413a0/html5/thumbnails/29.jpg)
Challenge? Opportunity
• Linked data can improve administration of research and research data
• The real potential is in improving research quality and efficiency
• The same actors can’t do both
• The actions don’t need to be in lock-step