text-mining practical

Lars Juhl Jensen Text-mining practical

Post on 10-May-2015

TRANSCRIPT

Page 1: Text-mining practical

Lars Juhl Jensen

Text-mining practical

Page 2: Text-mining practical

the task

Page 3: Text-mining practical

named entity recognition

Page 4: Text-mining practical

human proteins

Page 5: Text-mining practical

link proteins to diseases

Page 6: Text-mining practical

what I have done

Page 7: Text-mining practical

information retrieval

Page 8: Text-mining practical

two diseases

Page 9: Text-mining practical

prostate cancer

Page 10: Text-mining practical

schizophrenia

Page 11: Text-mining practical

two sets of documents

Page 12: Text-mining practical

62,755 abstracts

Page 13: Text-mining practical

65,588 abstracts

Page 14: Text-mining practical

one directory with each set

Page 15: Text-mining practical

one file with each abstract

Page 16: Text-mining practical

dictionary

Page 17: Text-mining practical

tab-delimited file

Page 18: Text-mining practical

human proteins

Page 19: Text-mining practical

22,523 entities

Page 20: Text-mining practical

synonyms

Page 21: Text-mining practical

from many databases

Page 22: Text-mining practical

orthographic variation

Page 23: Text-mining practical

prefixes and postfixes

Page 24: Text-mining practical

automatically generated

Page 25: Text-mining practical

2,726,495 names
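
A minimal sketch of loading such a dictionary, assuming each line of the tab-delimited file holds an entity identifier followed by one name. The slides do not give the actual column layout, so the two-column format, the function name `load_dictionary`, and the identifier `ENSP0001` are hypothetical.

```python
import csv
import io
from collections import defaultdict

def load_dictionary(handle):
    """Read 'entity<TAB>name' lines (assumed layout) into {entity: set of names}."""
    names_by_entity = defaultdict(set)
    # QUOTE_NONE keeps quote characters inside names untouched.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        entity, name = row[0], row[1]
        names_by_entity[entity].add(name)
    return names_by_entity

# Hypothetical two-line dictionary: one entity with two synonyms.
example = io.StringIO(
    "ENSP0001\tFOLH1\n"
    "ENSP0001\tGlutamate carboxypeptidase II\n"
)
dictionary = load_dictionary(example)
```

Grouping names by entity is what lets several synonyms resolve to the same protein later on.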

Page 26: Text-mining practical

tagdir program

Page 27: Text-mining practical

flexible matching

Page 28: Text-mining practical

upper- and lower-case

Page 29: Text-mining practical

spaces and hyphens
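
One way to picture flexible matching is to normalize every name before comparing. The sketch below is an assumption about the general idea, not a description of how tagdir works internally; `normalize` and `build_index` are hypothetical helpers.

```python
import re

def normalize(name):
    """Lower-case a name and drop spaces and hyphens so variants compare equal."""
    return re.sub(r"[\s-]+", "", name.lower())

def build_index(names_by_entity):
    """Map each normalized name to the set of entities that carry it."""
    index = {}
    for entity, names in names_by_entity.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(entity)
    return index

# Hypothetical entity identifier; the names are illustrative spellings.
index = build_index({"E1": {"Glutamate carboxypeptidase II"}})
hit = index.get(normalize("glutamate-carboxypeptidase II"))
```

With this normalization, spelling variants that differ only in case, spacing, or hyphenation look up the same index entry.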

Page 30: Text-mining practical

tab-delimited output

Page 31: Text-mining practical

what you will do

Page 32: Text-mining practical

named entity recognition

Page 33: Text-mining practical

find unfortunate names

Page 34: Text-mining practical

create “black list”
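
A hedged sketch of how unfortunate names might be spotted: count how often each name was matched and inspect the most frequent ones by hand. The match format (document, matched name, entity) is an assumption, since the slides do not show the columns of the tagger output; the data and entity identifiers below are made up for illustration.

```python
from collections import Counter

def count_matches(matches):
    """Count matches per name, ignoring case."""
    return Counter(name.lower() for _doc, name, _entity in matches)

def black_list_candidates(counts, min_matches):
    """Names matched at least min_matches times, most frequent first."""
    return [name for name, n in counts.most_common() if n >= min_matches]

def apply_black_list(matches, black_list):
    """Drop every match whose name is on the black list (case-insensitive)."""
    blocked = {name.lower() for name in black_list}
    return [m for m in matches if m[1].lower() not in blocked]

# Made-up matches: "was" is an ordinary English word that collides with a gene symbol.
matches = [
    ("doc1", "FOLH1", "E1"),
    ("doc1", "was", "E2"),
    ("doc2", "WAS", "E2"),
    ("doc3", "was", "E2"),
]
counts = count_matches(matches)
candidates = black_list_candidates(counts, min_matches=2)
filtered = apply_black_list(matches, candidates)
```

The candidates are only suggestions; deciding which frequent names are genuinely bad still requires looking at the text around the matches.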

Page 35: Text-mining practical

information extraction

Page 36: Text-mining practical

co-mentioning

Page 37: Text-mining practical

within documents

Page 38: Text-mining practical

link proteins to diseases

Page 39: Text-mining practical

link between the diseases
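
The co-mentioning idea can be sketched as follows, assuming that every abstract in a disease's document set counts as a mention of that disease, so a protein tagged in such an abstract is co-mentioned with it. The input format (document, entity) and all identifiers are hypothetical.

```python
from collections import defaultdict

def documents_per_entity(matches):
    """Count the distinct documents in which each entity is mentioned."""
    docs = defaultdict(set)
    for doc_id, entity in matches:
        docs[entity].add(doc_id)
    return {entity: len(ids) for entity, ids in docs.items()}

def shared_entities(counts_a, counts_b, min_docs=1):
    """Entities mentioned in at least min_docs documents of both sets."""
    return sorted(
        entity
        for entity, n in counts_a.items()
        if n >= min_docs and counts_b.get(entity, 0) >= min_docs
    )

# Made-up tagger results for the two document sets.
prostate = [("p1", "E1"), ("p2", "E1"), ("p2", "E2")]
schizophrenia = [("s1", "E1"), ("s1", "E3")]

prostate_counts = documents_per_entity(prostate)
schizophrenia_counts = documents_per_entity(schizophrenia)
links = shared_entities(prostate_counts, schizophrenia_counts)
```

Counting documents rather than raw matches keeps one abstract that repeats a name many times from dominating the ranking; proteins that appear in both sets are candidate links between the diseases.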

Page 40: Text-mining practical
Page 41: Text-mining practical

a helping hand

Page 42: Text-mining practical

“black list”

Page 43: Text-mining practical

100+ matches

Page 44: Text-mining practical

10+ matches

Page 45: Text-mining practical
Page 46: Text-mining practical

wrap up

Page 47: Text-mining practical

prostate cancer

Page 48: Text-mining practical

FOLH1

Page 49: Text-mining practical

schizophrenia

Page 50: Text-mining practical

Glutamate carboxypeptidase II

Page 51: Text-mining practical

same protein

Page 52: Text-mining practical

synonyms matter

Page 53: Text-mining practical

“black list” is crucial

Page 54: Text-mining practical

text mining is useful

Page 55: Text-mining practical

not black magic