![Page 1: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/1.jpg)
The Content Mine
Peter Murray-Rust[*], University of Cambridge, Open Knowledge, & Shuttleworth Fellow
OKFest, Berlin, 2014-07-15, DE
[*] and Michelle Brook, Jenny Molloy, Ross Mounce, Richard Smith-Unna, Mark MacGillivray, Emanuel Toliv
![Page 2: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/2.jpg)
Liberating facts for humanity*
• Public science: 500,000,000,000 USD per year
• 85% of medical research is wasted (bad design, lost data, non-communication)
• ContentMine will liberate 100,000,000 facts per year from scientific literature
• Crawl, Scrape, Extract, Republish
• Open Data CC 0, Open Standards, Open Source
• COLLABORATIVE, any data-rich discipline
• [*] Closed data means people die
![Page 3: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/3.jpg)
We can’t turn a hamburger into a cow
But we can now turn PDFs into Science
![Page 4: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/4.jpg)
Figure labels: UNITS, TICKS, QUANTITY, SCALE, TITLES, DATA!! (2000+ points)
![Page 5: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/5.jpg)
Dumb PDF
CSV
Semantic Spectrum
2nd Derivative
Smoothing Gaussian Filter
Automatic extraction
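The smoothing and 2nd-derivative steps named on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the ContentMine implementation: the two-column CSV layout (header row, then x,y pairs) and the function names `read_xy`, `smooth`, and `second_derivative` are hypothetical.

```python
import csv
import io
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(values, sigma=2.0):
    """Convolve values with a Gaussian, repeating the edge values at both ends."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index to the valid range
            acc += w * values[j]
        out.append(acc)
    return out


def second_derivative(values, step=1.0):
    """Central-difference second derivative; the two end points are left at 0."""
    n = len(values)
    d2 = [0.0] * n
    for i in range(1, n - 1):
        d2[i] = (values[i - 1] - 2.0 * values[i] + values[i + 1]) / (step * step)
    return d2


def read_xy(csv_text):
    """Parse two-column CSV text (assumed layout: header row, then x,y) into lists."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    xs = [float(r[0]) for r in rows[1:]]
    ys = [float(r[1]) for r in rows[1:]]
    return xs, ys
```

A typical use would be `xs, ys = read_xy(text)` followed by `second_derivative(smooth(ys))`. Smoothing first matters because finite differences amplify the noise in extracted points.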
![Page 6: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/6.jpg)
Chemical Computer Vision
1 sec to turn this into semantic science
![Page 7: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/7.jpg)
PROPERTIES (Name-Value-Units-Error)
Figure labels: Name (N), Value (V), Units (U), Error (E)
Note: CML supports value ranges and errors
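One way to picture a Name-Value-Units-Error property is as a small record plus a parser for a single text line. The sketch below is an assumption for illustration only: the `Property` class, the `parse_property` function, and the input pattern `name: value ± error units` are hypothetical and are not CML or any ContentMine API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Property:
    """A Name-Value-Units-Error record; units and error are optional."""
    name: str
    value: float
    units: Optional[str] = None
    error: Optional[float] = None


# Hypothetical input pattern: "<name>: <value> [± <error>] [<units>]"
_PATTERN = re.compile(
    r"^\s*(?P<name>[^:]+?)\s*:\s*"
    r"(?P<value>[-+]?\d+(?:\.\d+)?)"
    r"(?:\s*(?:±|\+/-)\s*(?P<error>\d+(?:\.\d+)?))?"
    r"(?:\s+(?P<units>\S.*?))?\s*$"
)


def parse_property(text):
    """Return a Property for a matching line, or None if the line does not match."""
    m = _PATTERN.match(text)
    if m is None:
        return None
    error = m.group("error")
    return Property(
        name=m.group("name"),
        value=float(m.group("value")),
        units=m.group("units"),
        error=float(error) if error is not None else None,
    )
```

For example, `parse_property("melting point: 132.5 ± 0.5 °C")` fills all four fields, while `parse_property("pH: 7")` leaves units and error as `None`. Value ranges, which the slide says CML supports, are deliberately left out of this sketch.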
![Page 8: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/8.jpg)
“nuggets” in a scientific paper
Figure labels: quantity, units, value ranges, chemical, project, places
Humans aren’t designed to mine this …
![Page 9: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/9.jpg)
Parsing chemical sentences
![Page 10: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/10.jpg)
http://wwmm.ch.cam.ac.uk/chemicaltagger
• Typical chemical synthesis
![Page 11: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/11.jpg)
Open Content Mining of FACTs
Machines can interpret chemical reactions
We have done 500,000 patents. There are > 3,000,000 reactions/year. Added value > 1B Eur.
![Page 12: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/12.jpg)
Evolution of ultraviolet vision in the largest avian radiation - the passerines
Anders Ödeen 1*, Olle Håstad 2,3 and Per Alström 4
HTML
Styles, superscripts
And diåcritics preserved!
AMI
![Page 13: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/13.jpg)
PDF
Species: Turdus iliacus, Taeniopygia guttata, Serinus canaria, Lanius excubitor, Melopsittacus undulatus, Pavo cristatus, Sturnus vulgaris, Dolichonyx oryzivorus, Ficedula hypoleuca, Vaccinium myrtillus, Falco tinnunculus
Genera: Turdus, Pomatostomus, Leothrix, Amytornis, Acanthisitta, Orthonyx x 2, Malurus, Cnemophilus x 4, Philesturnus x 2, Motacilla x 2, Toxorhampus x 2
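A species list with genus counts like the one above could be produced roughly as follows. This is a hedged sketch and not how AMI works: it assumes a supplied lexicon of known genus names (the `known_genera` parameter is hypothetical) and a simple regular expression for "Genus species" pairs.

```python
import re
from collections import Counter

# Candidate binomial: a capitalized word followed by a lowercase word.
# On its own this over-matches ordinary sentence starts, so candidates are
# filtered against a lexicon of known genera (an assumption of this sketch).
BINOMIAL = re.compile(r"\b([A-Z][a-z]+) ([a-z]{3,})\b")


def find_binomials(text, known_genera):
    """Return 'Genus species' mentions whose genus is in known_genera, in order."""
    return [f"{genus} {species}"
            for genus, species in BINOMIAL.findall(text)
            if genus in known_genera]


def genus_counts(binomials):
    """Count how many mentions belong to each genus."""
    return Counter(name.split()[0] for name in binomials)


if __name__ == "__main__":
    # Invented example sentence; the species names are taken from the slide.
    sample = ("Samples of Turdus iliacus and Pavo cristatus were listed; "
              "Turdus iliacus appears again here.")
    names = find_binomials(sample, {"Turdus", "Pavo"})
    for genus, count in genus_counts(names).items():
        print(f"{genus} x {count}")
```

The lexicon filter is the important design choice: without it, a phrase such as "Samples of" would be reported as a species.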
![Page 14: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/14.jpg)
Linked Open Data – the world’s knowledge
very little physical science http://upload.wikimedia.org/wikipedia/commons/3/34/LOD_Cloud_Diagram_as_of_September_2011.png
Figure labels: DBPedia, BIO, Comp, Lib, PDB, Ontologies, GOV, GOV.uk, Music, Art, Literature, Social, Knowledgebases, RDF triples
![Page 15: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/15.jpg)
Family: Acanthisittidae, Acanthizidae, Acrocephalidae, Callaeidae, Campephagidae, Cnemophilidae, Corvidae
Genus: Acanthisitta, Acrocephalus, Ailuroedus, Ailuroedus, Amytornis, Camptostoma
Posterior probability: 0.84, 0.91, 0.93, 0.95
AMI: 23.12, 34.54, 37.21, 38.55
AMI can MEASURE branch lengths!
NexML
HTML
![Page 16: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/16.jpg)
We can do any data…
![Page 17: Csvconf](https://reader036.vdocuments.mx/reader036/viewer/2022062512/55495f05b4c905f24e8b57fe/html5/thumbnails/17.jpg)
… pixel analysis …