Page 1: Text mining

Text mining

Page 2: Text mining

Explosion

Page 3: Text mining

exponential increase

Page 4: Text mining
Page 5: Text mining
Page 6: Text mining

some things are constant

Page 7: Text mining
Page 8: Text mining

“graph calculus”

Page 9: Text mining

=

Page 10: Text mining

~45 seconds per paper

Page 11: Text mining

Information retrieval

Page 12: Text mining

find the relevant papers

Page 13: Text mining

user-specified query

Page 14: Text mining

“yeast AND cell cycle”
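
A minimal sketch of how a Boolean AND query such as the one above could be evaluated against a small collection. The helper names (`normalize`, `matches_all`) and the three one-line "abstracts" are made up for illustration; real retrieval systems use an inverted index rather than scanning every text.

```python
import re

def normalize(text):
    # Lowercase, keep only letters/digits, and pad with spaces so that
    # substring checks only match whole words or whole phrases.
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "

def matches_all(text, terms):
    # Boolean AND: every term (single word or phrase) must occur in the text.
    doc = normalize(text)
    return all(normalize(term) in doc for term in terms)

# Made-up one-line "abstracts", for illustration only.
abstracts = {
    "paper_a": "Regulation of the cell cycle in budding yeast.",
    "paper_b": "Cell cycle checkpoints in human fibroblasts.",
    "paper_c": "Yeast metabolism under nitrogen starvation.",
}

query = ["yeast", "cell cycle"]
hits = [pid for pid, text in abstracts.items() if matches_all(text, query)]
print(hits)
```

Because matching is exact, a text that only says "yeasts" would be missed, which is the gap that stemming is meant to close.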

Page 15: Text mining
Page 16: Text mining

stemming
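
To illustrate the idea, here is a deliberately crude suffix stripper. It is not the Porter algorithm or any other published stemmer; `toy_stem` and its suffix list are assumptions chosen only to show how different word forms can be mapped to a shared stem before matching.

```python
def toy_stem(word):
    # Strip at most one common English suffix, keeping at least 3 letters.
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "cycle" and "cycling" share a stem.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

forms = ["cycle", "cycles", "cycling", "cycled"]
stems = {toy_stem(w) for w in forms}
print(stems)
```

A rule set this small will over- and under-stem many words; it only shows why a query for "cycle" can also retrieve "cycles" or "cycling".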

Page 17: Text mining

dynamic query expansion

Page 18: Text mining

ranking

Page 19: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 20: Text mining

no tool will find that

Page 21: Text mining

Entity recognition

Page 22: Text mining

identify the substance(s)

Page 23: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 24: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 25: Text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

Page 26: Text mining

comprehensive lexicon

Page 27: Text mining

orthographic variation

Page 28: Text mining

“black list”
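
The three ingredients named so far (a lexicon, tolerance for orthographic variation, and a black list) can be sketched as a small dictionary-based tagger. This is an illustration only, not the method of any particular tool: the mini-lexicon is hand-written from the example sentence, Cdk1 is treated as a synonym of Cdc28 purely for the demo, and the `XYZ1` entry with its synonym "step" is invented to show why a black list is needed.

```python
import re

# Illustrative mini-lexicon: canonical name -> names to look for.
LEXICON = {
    "CLB2": ["Clb2"],
    "CDC28": ["Cdc28", "Cdk1"],  # Cdk1 treated as a synonym for this demo
    "SWE1": ["Swe1"],
    "CDC5": ["Cdc5"],
    "XYZ1": ["Xyz1", "step"],    # made-up entry; "step" collides with an English word
}
BLACK_LIST = {"step"}            # names too ambiguous to tag

def variant_pattern(name):
    # Orthographic variation: ignore case and allow an optional hyphen
    # or space between the letters and the digits ("Cdc28", "CDC-28", "cdc 28").
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    core = re.escape(m.group(1)) + r"[- ]?" + m.group(2) if m else re.escape(name)
    return re.compile(r"(?<![A-Za-z0-9])" + core + r"(?![A-Za-z0-9])", re.IGNORECASE)

def tag_entities(text):
    hits = []
    for canonical, names in LEXICON.items():
        for name in names:
            if name.lower() in BLACK_LIST:
                continue
            for m in variant_pattern(name).finditer(text):
                hits.append((m.start(), m.group(0), canonical))
    return sorted(hits)

sentence = ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated "
            "Swe1 and this modification served as a priming step to promote subsequent "
            "Cdc5-dependent Swe1 hyperphosphorylation and degradation")
for start, surface, canonical in tag_entities(sentence):
    print(start, surface, canonical)
```

Without the black list, the word "step" in "priming step" would be tagged as the invented gene `XYZ1`.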

Page 29: Text mining

manual correction

Page 30: Text mining

still too much to read

Page 31: Text mining

Information extraction

Page 32: Text mining

formalize the facts

Page 33: Text mining

co-occurrence

Page 34: Text mining

global statistical analysis
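
As a rough sketch of the co-occurrence idea: once entities have been recognized in each abstract, count how often every pair is mentioned together across the whole collection. The per-abstract entity sets below are invented, and the sketch stops at raw counts; a real analysis would go on to score each pair against what chance alone would predict.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    # docs: one collection of recognized entity names per abstract.
    pair_counts = Counter()
    for entities in docs:
        # Sort so that each unordered pair is always stored the same way round.
        for a, b in combinations(sorted(set(entities)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

# Made-up tagging results for three abstracts.
docs = [
    {"CDC28", "SWE1"},
    {"CDC28", "SWE1", "CDC5"},
    {"CLB2", "CDC28"},
]

counts = cooccurrence_counts(docs)
for pair, n in counts.most_common():
    print(pair, n)
```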

Page 35: Text mining

NLP: Natural Language Processing

Page 36: Text mining

parsing individual sentences

Page 37: Text mining

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]
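
The bracketed example above comes from a parser that chunks the sentence into nested noun phrases. As a much simpler stand-in, the sketch below uses a single regular expression to pull the same regulator/target facts out of the plain sentence. It is not the parsing approach shown on the slide: the gene-name pattern (`GENE`) and the one hard-coded cue phrase are assumptions that only fit this example.

```python
import re

GENE = r"[A-Z]{3}\d+"  # assumed naming convention, e.g. CYC1, HAP1

# One hard-coded cue: "expression of ... <targets> is controlled by <regulator>".
CONTROL = re.compile(
    rf"expression of .*?({GENE}(?:(?:, | and ){GENE})*) is controlled by ({GENE})"
)

def extract_regulation(sentence):
    # Returns (regulator, target) pairs, or an empty list if the cue is absent.
    m = CONTROL.search(sentence)
    if m is None:
        return []
    targets = re.findall(GENE, m.group(1))
    return [(m.group(2), target) for target in targets]

sentence = "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"
relations = extract_regulation(sentence)
print(relations)
```

Pairs in this form are what would then be stored as rows in a database.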

Page 38: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 39: Text mining

store in a database

Page 40: Text mining

then the fun begins :-)

Page 41: Text mining

Acknowledgments

NLP pipeline
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
– Peer Bork

Reflect
– Heiko Horn
– Sune Frankild
– Evangelos Pafilis
– Sven Haag
– Michael Kuhn
– Peer Bork
– Reinhardt Schneider
– Sean O’Donoghue
