chic - converting hamburgers into cows
How to convert legacy documents to more semantic forms, and a demonstration of what becomes possible once they are in that form.
CHIC – Converting Hamburgers Into Cows
Joseph [email protected]
The Scholarly Publication Cycle
Generate
Publish
Capture
Use
What is a Cow?
• the character encoding is clearly stated
• the document uses a mark-up technology to identify components
• the components have meaning and possibly behaviour associated with them
• unreduced data available
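The first two criteria above can be checked mechanically. The following is a minimal sketch, assuming the document is XML; the helper names `declared_encoding` and `component_tags` and the sample document are illustrative and do not come from the slides.

```python
import re
import xml.etree.ElementTree as ET

def declared_encoding(xml_bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    m = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._\-]+)["\']', xml_bytes)
    return m.group(1).decode("ascii") if m else None

def component_tags(xml_bytes):
    """Return the sorted set of element names used to mark up components."""
    root = ET.fromstring(xml_bytes)
    return sorted({el.tag for el in root.iter()})

# Hypothetical sample document, for illustration only.
doc = (b'<?xml version="1.0" encoding="UTF-8"?>'
       b'<article><abstract>text</abstract>'
       b'<experimental>text</experimental></article>')

encoding = declared_encoding(doc)
tags = component_tags(doc)
```

Whether the components carry meaning or behaviour, and whether unreduced data is available, cannot be judged from syntax alone, so this sketch leaves those criteria out.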
What we thought the workflow should look like
Standoff Annotation File
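A standoff annotation file keeps the annotations apart from the text and points into it by character offset, so the source document is never modified. A minimal sketch, assuming a hypothetical `Annotation` record with `start`, `end` and `label` fields (illustrative names, not OSCAR's actual file format):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int   # offset of the first character (inclusive)
    end: int     # offset one past the last character (exclusive)
    label: str   # hypothetical label, e.g. "chemical"

def annotate(text, terms, label):
    """Return standoff annotations for every occurrence of each term."""
    found = []
    for term in terms:
        pos = text.find(term)
        while pos != -1:
            found.append(Annotation(pos, pos + len(term), label))
            pos = text.find(term, pos + 1)
    return sorted(found, key=lambda a: a.start)

def resolve(text, ann):
    """Look up the span of text an annotation refers to."""
    return text[ann.start:ann.end]

text = ("Deionised water was added and the aqueous phase "
        "was extracted with ethyl acetate.")
annotations = annotate(text, ["water", "ethyl acetate"], "chemical")
spans = [resolve(text, a) for a in annotations]
```

Because the annotations only hold offsets, several annotation layers can describe the same text without interfering with one another.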
OSCAR
http://sourceforge.net/projects/oscar3-chem/
http://www.omii.ac.uk/wiki/Nwsltr1209OSCAR
http://tinyurl.com/yakzgkd
Article: Front Matter, Abstract, Introduction, Discussion, Experimental, References, Results
Experimental: Compound Name, Synthesis, Set up, Analysis
DOCX Workflow (part 1)
DOCX Workflow (part 2)
OREChem
Diagram labels: PSU, Soton, IU, Cam; Atom; Molecules, SVG, Text; CrystalEye, PubChem; Gaussian workflow; ORE Triplestore
http://research.microsoft.com/en-us/projects/orechem/
What can we do with a Cow?
5-Cyclobutyl-2,3-dihydro-[1H]-2-benzazepine 82:
Potassium carbonate (0.63 g, 4.56 mmol) and thiophenol (0.19 g, 1.69 mmol) were added to the 2-nitrobenzene sulfonamide 50 (0.50 g, 1.302 mmol) in N,N-dimethylformamide (33 cm3) at room temperature and the mixture was stirred for 16 h. Deionised water (50 cm3) was added and the aqueous phase was extracted with ethyl acetate (5 x 50 cm3). The organic extracts were dried (MgSO4) and concentrated under reduced pressure to give the title compound 82 (0.259 g, 1.302 mmol, ca. 100%) as an oil used without further purification.
Parsing and Semantics
Tokenization and Chunking
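As a rough illustration of this step, the sketch below tokenizes a fragment of the experimental text and groups each number with the unit that follows it into a single chunk. The token pattern, the `UNITS` set and the `QUANTITY`/`TOKEN` labels are assumptions made for the example; the slides do not say which tokenizer or chunker was used.

```python
import re

# Numbers, words (letters, digits, hyphens), or single punctuation marks.
TOKEN = re.compile(r"\d+(?:\.\d+)?|[A-Za-z][A-Za-z0-9\-]*|[^\sA-Za-z0-9]")
UNITS = {"g", "mmol", "cm3", "h"}  # illustrative unit list only

def tokenize(text):
    """Split text into number, word and punctuation tokens."""
    return TOKEN.findall(text)

def chunk(tokens):
    """Group each number followed by a known unit into a QUANTITY chunk."""
    chunks, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        if tok[0].isdigit() and nxt in UNITS:
            chunks.append(("QUANTITY", f"{tok} {nxt}"))
            i += 2
        else:
            chunks.append(("TOKEN", tok))
            i += 1
    return chunks

tokens = tokenize("thiophenol (0.19 g, 1.69 mmol)")
chunks = chunk(tokens)
```

A real chemistry tokenizer has to cope with names such as N,N-dimethylformamide, which this simple pattern would split at the comma.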
Phrase identification
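One simple way to picture phrase identification is to label each sentence of the experimental text with the laboratory actions it mentions. This is a keyword-based sketch under stated assumptions: the `ACTIONS` table and its labels are hypothetical, and the approach shown in the slides is likely more sophisticated.

```python
import re

# Hypothetical mapping from verb forms in the text to action labels.
ACTIONS = {
    "added": "Add",
    "stirred": "Stir",
    "extracted": "Extract",
    "dried": "Dry",
    "concentrated": "Concentrate",
}
ACTION_RE = re.compile(r"\b(" + "|".join(ACTIONS) + r")\b")

def identify_phrases(text):
    """Split text into sentences and list the actions found in each one."""
    sentences = re.split(r"(?<=\.)\s+(?=[A-Z])", text.strip())
    return [(s, [ACTIONS[m] for m in ACTION_RE.findall(s)]) for s in sentences]

sample = ("Deionised water (50 cm3) was added and the aqueous phase was "
          "extracted with ethyl acetate (5 x 50 cm3). The organic extracts "
          "were dried (MgSO4) and concentrated under reduced pressure.")
phrases = identify_phrases(sample)
actions = [found for _, found in phrases]
```

The word boundaries in the pattern keep the noun "extracts" from being mistaken for the action "extracted".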
RDF of reaction components
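To make the idea concrete, the sketch below writes a few reaction components as N-Triples lines using only the standard library. The `http://example.org/reaction#` namespace, the property names (`hasComponent`, `role`, `name`, `massInGrams`) and the role assigned to each substance are all assumptions for illustration; the vocabulary actually used in the project is not given in the slides.

```python
EX = "http://example.org/reaction#"  # hypothetical namespace

def uri(name):
    """Wrap a local name in the example namespace as an N-Triples IRI."""
    return f"<{EX}{name}>"

def literal(value):
    """Format a value as a plain N-Triples string literal."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

def reaction_triples(reaction_id, components):
    """Return N-Triples lines for each (role, name, mass_g) component."""
    lines = []
    for i, (role, name, mass_g) in enumerate(components, start=1):
        node = uri(f"{reaction_id}_component{i}")
        lines.append(f"{uri(reaction_id)} {uri('hasComponent')} {node} .")
        lines.append(f"{node} {uri('role')} {literal(role)} .")
        lines.append(f"{node} {uri('name')} {literal(name)} .")
        lines.append(f"{node} {uri('massInGrams')} {literal(mass_g)} .")
    return lines

# Names and masses come from the experimental text; roles are assumed.
triples = reaction_triples("reaction82", [
    ("reagent", "potassium carbonate", 0.63),
    ("reagent", "thiophenol", 0.19),
    ("product", "5-Cyclobutyl-2,3-dihydro-[1H]-2-benzazepine", 0.259),
])
```

Once the components are expressed as triples, they can be loaded into a triplestore and queried alongside data from other sources.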
• 3D Boxes: Solid
• Double Circles: Oil
• Octagon: Gum
• Triple Octagon: Foam
• Diamond: Crystals or Needles
• Ellipses: Unknown or Unspecified
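The legend above can be expressed as a lookup from physical state to node shape. The shape names happen to match Graphviz node shapes (`box3d`, `doublecircle`, `octagon`, `tripleoctagon`, `diamond`, `ellipse`), so this sketch emits Graphviz DOT; that the original figure was drawn with Graphviz is an assumption, and `to_dot` is a hypothetical helper.

```python
# Physical state of the product mapped to a Graphviz node shape.
SHAPE_FOR_STATE = {
    "solid": "box3d",
    "oil": "doublecircle",
    "gum": "octagon",
    "foam": "tripleoctagon",
    "crystals": "diamond",
    "needles": "diamond",
}
DEFAULT_SHAPE = "ellipse"  # unknown or unspecified state

def to_dot(products):
    """Return a DOT graph with one node per (label, state) pair."""
    lines = ["digraph products {"]
    for label, state in products:
        shape = SHAPE_FOR_STATE.get((state or "").lower(), DEFAULT_SHAPE)
        lines.append(f'  "{label}" [shape={shape}];')
    lines.append("}")
    return "\n".join(lines)

# Compound 82 is described as an oil; no state is given for compound 50.
dot = to_dot([("82", "oil"), ("50", None)])
```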
Semantic Authoring
• ICE-TheOREM
  – http://tinyurl.com/y85vh22
• Chem4Word
  – http://research.microsoft.com/en-us/projects/chem4word/
  – http://bit.ly/c4w