![Page 1: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/1.jpg)
Mining science from the plant literature
ContentMine
Open Plant Forum, Norwich, UK, 2016-07-27
Peter Murray-Rust [1] University of Cambridge [2] TheContentMine
10,000 scholarly publications every day. How many relate to plants?
![Page 2: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/2.jpg)
(2x digital music industry!)
Non-profit
Downloading several thousand papers per day and making search results open for everyone
http://contentmine.org
Downloadable Open source
![Page 3: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/3.jpg)
MozFest 2015
ContentMine + TGAC / hack
![Page 4: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/4.jpg)
Terpinome Phytochemists!
Salvia officinalis
Salvia microphylla
Origanum vulgare
Ocimum basilicum
Laurus nobilis [1]
[1] Lauraceae
![Page 5: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/5.jpg)
We can search for
• Plants
• Compounds
• Other species
• Diseases
• Places
• Important terms
• We’ll need: sources, dictionaries, software
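A dictionary-driven search of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not the ContentMine implementation: the `DICTIONARIES` contents, the `count_terms` helper and the sample sentence are all hypothetical, and real dictionaries would hold thousands of terms.

```python
import re
from collections import Counter

# Hypothetical mini-dictionaries, one per facet (assumption for illustration).
DICTIONARIES = {
    "plant": ["Salvia officinalis", "Origanum vulgare", "Ocimum basilicum"],
    "compound": ["carvone", "limonene"],
}


def count_terms(text, dictionaries):
    """Count case-insensitive whole-phrase hits for every term in each facet."""
    results = {}
    for facet, terms in dictionaries.items():
        counts = Counter()
        for term in terms:
            # \b keeps "carvone" from matching inside a longer word.
            pattern = r"\b" + re.escape(term) + r"\b"
            hits = re.findall(pattern, text, flags=re.IGNORECASE)
            if hits:
                counts[term] = len(hits)
        results[facet] = counts
    return results


sample = ("Carvone and limonene were identified in Origanum vulgare; "
          "carvone was absent from Ocimum basilicum.")
facets = count_terms(sample, DICTIONARIES)
for facet, counts in facets.items():
    print(facet, dict(counts))
```

Exact phrase matching is the simplest choice here; it misses abbreviated binomials such as "O. vulgare", which a production tool would need to handle separately.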
![Page 6: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/6.jpg)
Europe PubMed Central
Over 1 million biomedical papers
![Page 7: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/7.jpg)
![Page 8: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/8.jpg)
Dictionaries!
Diseases (WHO)
![Page 9: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/9.jpg)
![Page 10: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/10.jpg)
![Page 11: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/11.jpg)
CONTENTMINE SOFTWARE (latest 20150908)
[Architecture diagram. Sources: EuPMC, arXiv, CORE, HAL, university repositories, Crossref, ToC services, and publisher sites (PLoS ONE, BMC, PeerJ … Nature, IEEE, Elsevier …). getpapers (queries) and quickscrape (scrapers), with a daily crawl, build a catalogue. norma (normalizer, structurer, semantic tagger) converts PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS, URLs and DOIs into Semantic ScholarlyHTML (W3C community group), with taggers for abstract, methods, references, captioned figures and HTML tables. ami plugins (Chem, Phylo, Trials, Crystal, Plants, plus community plugins) search and look up text, data and figures to produce facts for visualization and analysis. Throughput: 100,000 pages/day.]
![Page 12: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/12.jpg)
What plants produce Carvone?
https://en.wikipedia.org/wiki/Carvone
![Page 13: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/13.jpg)
https://en.wikipedia.org/wiki/Carvone
WIKIDATA
![Page 14: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/14.jpg)
Carvone in Wikidata
Also SPARQL endpoint
WP identifier
Chemical type
Chemical identifier
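The question "what plants produce carvone?" can be put to the Wikidata SPARQL endpoint. The sketch below only builds the query and request URL; it does not call the network. It assumes the compound can be found by its exact English label and that plant sources are recorded with Wikidata property P703 ("found in taxon"); whether Wikidata actually holds such statements for carvone is not guaranteed. The helper names `taxa_query` and `query_url` are hypothetical.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def taxa_query(compound_label, limit=50):
    """Build a SPARQL query for taxa linked to a compound via P703.

    Assumption: compound_label is a trusted, exact English label
    (no escaping of quotes is attempted in this sketch).
    """
    return f"""
SELECT ?taxon ?taxonLabel WHERE {{
  ?compound rdfs:label "{compound_label}"@en ;
            wdt:P703 ?taxon .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
""".strip()


def query_url(compound_label, limit=50):
    """Return a GET URL that asks the endpoint for JSON results."""
    params = {"query": taxa_query(compound_label, limit), "format": "json"}
    return WIKIDATA_SPARQL + "?" + urlencode(params)


print(taxa_query("carvone"))
print(query_url("carvone"))
```

The printed URL can be opened in a browser or fetched with any HTTP client; the results then need to be checked by hand, since label matching may hit more than one item.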
![Page 15: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/15.jpg)
ARTICLES FACETS
gene disease drug Phytochem
species genus words
![Page 16: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/16.jpg)
Suggest the title of this article
![Page 17: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/17.jpg)
![Page 18: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/18.jpg)
![Page 19: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/19.jpg)
species words
drug Phytochem disease
![Page 20: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/20.jpg)
species words
drug Phytochem disease
disease
![Page 21: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/21.jpg)
![Page 22: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/22.jpg)
![Page 23: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/23.jpg)
![Page 24: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/24.jpg)
![Page 25: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/25.jpg)
(2x digital music industry!)
Downloading and searching several thousand papers per day and making the results open for everyone
![Page 26: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/26.jpg)
end
![Page 27: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/27.jpg)
Mining for phytochemicals
• getpapers -q carvone -o carvone -x -k 100
  Search “carvone”, output to carvone/, format XML, limit 100 hits
• cmine carvone
  Normalize papers; search locally for species, sequences, diseases, drugs
  Results in dataTables.html and results/…/results.xml (includes W3C annotation)
• python cmhypy.py carvone/ -u petermr <key>
  Send annotations -> hypothes.is
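For repeated searches, the first two steps can be scripted. The sketch below only assembles the command lines, using the flags shown on the slide (`-q`, `-o`, `-x`, `-k`); it does not run them. The helper names and the rule for deriving the output directory from the query are assumptions made for illustration.

```python
import shlex


def getpapers_cmd(query, outdir, limit=100):
    """Argument list for getpapers: query, output directory, XML, hit limit."""
    return ["getpapers", "-q", query, "-o", outdir, "-x", "-k", str(limit)]


def cmine_cmd(project_dir):
    """Argument list for cmine, run on the directory getpapers filled."""
    return ["cmine", project_dir]


def pipeline(query, limit=100):
    """Return the two commands in order; the directory name is derived
    from the query (hypothetical convention: spaces become underscores)."""
    outdir = query.replace(" ", "_")
    return [getpapers_cmd(query, outdir, limit), cmine_cmd(outdir)]


for argv in pipeline("carvone"):
    # To execute instead of print: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```

Passing argument lists rather than a single shell string avoids quoting problems with multi-word queries such as "diseases of oats".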
![Page 28: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/28.jpg)
Annotation (entity in context)
prefix
surface
label
location
suffix
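The parts of such an annotation can be illustrated with a small Python sketch that records each hit together with its surrounding text. It is not the ContentMine code: the `annotate` function, the field layout and the sample sentence are hypothetical. The prefix/match/suffix triple mirrors the W3C Web Annotation `TextQuoteSelector`, which names the matched text `exact` where the slide says "surface".

```python
import re


def annotate(text, term, label, width=20):
    """Return one record per occurrence of term, with context on both sides."""
    annotations = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start, end = match.span()
        annotations.append({
            "label": label,                                 # dictionary / entity type
            "surface": match.group(0),                      # text as it appears
            "prefix": text[max(0, start - width):start],    # context before the hit
            "suffix": text[end:end + width],                # context after the hit
            "location": {"start": start, "end": end},       # character offsets
        })
    return annotations


sentence = "The essential oil of spearmint is rich in carvone and limonene."
hits = annotate(sentence, "carvone", "compound", width=10)
for hit in hits:
    print(hit)
```

Storing the prefix and suffix alongside the offsets lets an annotation be re-anchored if the document text shifts slightly, which is why the quote-based form is useful for services such as hypothes.is.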
![Page 29: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/29.jpg)
![Page 30: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/30.jpg)
Search for carvone
![Page 31: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/31.jpg)
Mining for phytochemicals
• getpapers -q carvone -o carvone -x -k 100
  Search “carvone”, output to carvone/, format XML, limit 100 hits
• cmine carvone
  Normalize papers; search locally for species, sequences, diseases, drugs
  Results in dataTables.html and results/…/results.xml (includes W3C annotation)
• python cmhypy.py carvone/ -u petermr <key>
  Send IUCN Red List plant annotations -> hypothes.is
![Page 32: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/32.jpg)
Annotation (entity in context)
prefix
surface
label
location
suffix
![Page 33: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/33.jpg)
Facilitating synthetic biology literature mining and searching for the plant community
Robert Davey joined TGAC in February 2010 as the lead software engineer on the MISO LIMS project, which was released as an open source framework in June 2012. He went on to become the Core Bioinformatics Project Leader, and was then appointed as Data Infrastructure and Algorithms (DIA) Group Leader in late 2012.
Ksenia Krasileva is a Group Leader with a joint appointment at The Genome Analysis Centre and The Sainsbury Laboratory. She joined Norwich Research Park in December 2014, moving from the University of California, Davis, where she held a fellowship from the National Institute of Food and Agriculture (NIFA) to develop functional genomic tools for wheat, working with Jorge Dubcovsky.
Nicola Patron is a molecular and synthetic biologist at The Sainsbury Laboratory (TSL), a world-leading research institute working on the science of plant-microbe interactions.
Richard Smith-Unna is a PhD student in Plant Sciences, Cambridge.
Peter Murray-Rust is a (retired but highly active) chemist at Cambridge University.
Report of 2-day workshop (hack) held at TGAC 2016-03-10/11
The workshop centered on novel methods for discovering information about plants from the existing literature (“Content Mining”). We prepared ContentMine software specifically for the workshop on the basis that “anyone can run it and get useful results”. Everyone was asked to install the software on whatever platform they commonly used (Mac, Windows, Unix). There were few problems and most people were running it within an hour. A typical example was “find all you can about diseases of oats” using Europe PubMed Central (with over 1 million Open Access papers). This retrieved about 500 papers, which were then filtered for chemicals, diseases, species, etc. and displayed within a minute or two, significantly increasing the speed of knowledge-driven scientific discovery. We also jointly made considerable improvements to the software and have agreed to meet regularly to take this forward.
![Page 34: Mining the scientific literature for plants and chemistry](https://reader035.vdocuments.mx/reader035/viewer/2022062902/58ef16001a28ab032a8b45c5/html5/thumbnails/34.jpg)