Text mining

Upload: lars-juhl-jensen

Post on 24-Jan-2015

TRANSCRIPT

Page 1: Text mining

Text mining

Page 2: Text mining

Explosion

Page 3: Text mining

exponential increase

Page 4: Text mining
Page 5: Text mining
Page 6: Text mining

some things are constant

Page 7: Text mining
Page 8: Text mining

“graph calculus”

Page 9: Text mining

=

Page 10: Text mining

~45 seconds per paper

Page 11: Text mining

Information retrieval

Page 12: Text mining

find the relevant papers

Page 13: Text mining

user-specified query

Page 14: Text mining

“yeast AND cell cycle”
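
As a rough illustration of how a Boolean query like the one above could be evaluated, here is a minimal sketch using a toy inverted index. The corpus, the document IDs, and the function names `build_index` and `search_and` are made up for this example; real literature search engines are far more elaborate.

```python
from collections import defaultdict

# Toy corpus; document IDs and texts are invented for illustration.
DOCS = {
    1: "cell cycle regulation in yeast",
    2: "yeast two-hybrid screening",
    3: "the cell cycle in human cells",
}

def build_index(docs):
    """Map each lowercased token to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search_and(docs, index, terms):
    """Return sorted IDs of documents matching every term.

    A multi-word term is treated as a phrase whose words must be adjacent.
    """
    result = set(docs)
    for term in terms:
        words = term.lower().split()
        # Use the index to find documents containing every word of the term.
        candidates = set(docs)
        for word in words:
            candidates &= index.get(word, set())
        # For phrases, additionally require the words to appear side by side.
        if len(words) > 1:
            phrase = " " + " ".join(words) + " "
            candidates = {
                d for d in candidates
                if phrase in " " + " ".join(docs[d].lower().split()) + " "
            }
        result &= candidates
    return sorted(result)

index = build_index(DOCS)
hits = search_and(DOCS, index, ["yeast", "cell cycle"])
print(hits)
```

Only the first toy document mentions both "yeast" and the phrase "cell cycle", so it is the only hit.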

Page 15: Text mining
Page 16: Text mining

stemming
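
A minimal sketch of the idea behind stemming: strip common suffixes so that different word forms map to the same index term. This is a deliberately crude suffix stripper written for this example, not the Porter stemmer or whatever any particular search engine uses; the suffix list and the `crude_stem` name are assumptions.

```python
# Suffixes are tried in order; longer, more specific ones come first.
SUFFIXES = ("ing", "ed", "es", "e", "s")

def crude_stem(word, min_stem=3):
    """Lowercase a word and strip the first matching suffix.

    The suffix is only removed if at least `min_stem` characters remain.
    """
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["cycle", "cycles", "phosphorylated", "phosphorylating", "cells"]:
    print(w, "->", crude_stem(w))
```

With stems as index terms, a query for "cycle" also retrieves papers that only mention "cycles". A stripper this simple will not relate "phosphorylation" to "phosphorylated"; proper stemmers handle many more cases.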

Page 17: Text mining

dynamic query expansion

Page 18: Text mining

ranking
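
The slides do not say which ranking scheme is meant. One common choice is TF-IDF weighting, sketched below under that assumption; the toy documents and the `rank` function are hypothetical.

```python
import math
from collections import Counter

def rank(docs, query_terms):
    """Return document IDs ordered by summed TF-IDF weight of the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    terms = [t.lower() for t in query_terms]
    # Document frequency: in how many documents does each term occur?
    df = {t: sum(1 for toks in tokenized.values() if t in toks) for t in terms}
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for t in terms:
            if df[t] and tokens:
                tf = counts[t] / len(tokens)          # term frequency
                idf = math.log(n_docs / df[t]) + 1.0  # rarer terms weigh more
                score += tf * idf
        scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)

# Toy documents, invented for illustration.
TOY_DOCS = {
    "a": "yeast yeast cell cycle",
    "b": "yeast genome sequencing project overview",
    "c": "human cell biology",
}
ranking = rank(TOY_DOCS, ["yeast", "cycle"])
print(ranking)
```

Document "a" ranks first because it mentions both query terms, "b" second because it mentions only one, and "c" last because it mentions neither.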

Page 19: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 20: Text mining

no tool will find that

Page 21: Text mining

Entity recognition

Page 22: Text mining

identify the substance(s)

Page 23: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 24: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 25: Text mining

Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009

Page 26: Text mining

comprehensive lexicon

Page 27: Text mining

orthographic variation

Page 28: Text mining

“black list”
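
The three ingredients named on these slides (a lexicon, tolerance for orthographic variation, and a black list) can be combined in a simple dictionary-based tagger. The sketch below is only an illustration and not the implementation behind any real tool: the identifiers are placeholders, the variation rule (ignore case, allow an optional hyphen or space before a trailing number) is one assumption among many possible, and "Was" is a made-up ambiguous name.

```python
import re

# Hypothetical mini-lexicon: name -> placeholder identifier.
LEXICON = {
    "Cdc28": "gene:1",
    "Swe1": "gene:2",
    "Clb2": "gene:3",
    "Was": "gene:4",  # made-up name that collides with an English word
}
# Names that cause too many false positives are never tagged.
BLACK_LIST = {"was"}

def variant_pattern(name):
    """Build a regex that matches a name despite simple spelling variation."""
    m = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if m:
        # Allow "Cdc28", "CDC-28", "cdc 28", ...
        core = re.escape(m.group(1)) + r"[- ]?" + re.escape(m.group(2))
    else:
        core = re.escape(name)
    # Require that the match is not embedded in a longer alphanumeric token.
    return re.compile(r"(?<![A-Za-z0-9])" + core + r"(?![A-Za-z0-9])", re.IGNORECASE)

def tag(text, lexicon=LEXICON, black_list=BLACK_LIST):
    """Return (start offset, matched text, identifier) tuples in text order."""
    matches = []
    for name, ident in lexicon.items():
        if name.lower() in black_list:
            continue
        for m in variant_pattern(name).finditer(text):
            matches.append((m.start(), m.group(0), ident))
    return sorted(matches)

tags = tag("CDC-28 phosphorylated swe1, which was degraded")
print(tags)
```

Both spelling variants are found, while "was" is skipped because it is on the black list.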

Page 29: Text mining

manual correction

Page 30: Text mining

still too much to read

Page 31: Text mining

Information extraction

Page 32: Text mining

formalize the facts

Page 33: Text mining

co-occurrence

Page 34: Text mining

global statistical analysis
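
A minimal sketch of co-occurrence scoring: count how often two tagged entities are mentioned in the same document, then compare that count with what would be expected by chance given how often each is mentioned overall. The observed-over-expected ratio used here is an assumption chosen for simplicity, and the entity lists are invented; the slides do not specify the statistic.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_scores(doc_entities):
    """Score entity pairs by observed co-mentions relative to chance.

    `doc_entities` is a list with one list of entity names per document.
    """
    n_docs = len(doc_entities)
    single = Counter()  # documents mentioning each entity
    pair = Counter()    # documents mentioning both entities of a pair
    for entities in doc_entities:
        unique = sorted(set(entities))
        single.update(unique)
        pair.update(combinations(unique, 2))
    scores = {}
    for (a, b), observed in pair.items():
        # Expected co-mentions if a and b were mentioned independently.
        expected = single[a] * single[b] / n_docs
        scores[(a, b)] = observed / expected
    return scores

# Invented example: entities tagged in four abstracts.
TAGGED = [
    ["Cdc28", "Swe1"],
    ["Cdc28", "Swe1", "Clb2"],
    ["Cdc28"],
    ["Clb2"],
]
scores = cooccurrence_scores(TAGGED)
for names, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(names, round(score, 2))
```

Scores above 1 indicate pairs that are mentioned together more often than chance would predict. With a corpus this small the numbers mean little; the approach relies on statistics over a very large body of text.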

Page 35: Text mining

NLP (Natural Language Processing)

Page 36: Text mining

parsing individual sentences

Page 37: Text mining

Gene and protein names

Cue words for entity recognition

Verbs for relation extraction

[nxexpr The expression of [nxgene the cytochrome genes [nxpg CYC1 and CYC7]]] is controlled by [nxpg HAP1]

Page 38: Text mining

Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1 and this modification served as a priming step to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and degradation

Page 39: Text mining

store in a database

Page 40: Text mining

then the fun begins :-)

Page 41: Text mining

Acknowledgments

NLP pipeline
– Jasmin Saric
– Rossitza Ouzounova
– Isabel Rojas
– Peer Bork

Reflect
– Heiko Horn
– Sune Frankild
– Evangelos Pafilis
– Sven Haag
– Michael Kuhn
– Peer Bork
– Reinhardt Schneider
– Sean O’Donoghue