text-mining practical

Text-mining practical Lars Juhl Jensen

Upload: lars-juhl-jensen

Post on 15-Jul-2015

TRANSCRIPT

Page 1: Text-mining practical

Text-mining practical Lars Juhl Jensen

Page 2: Text-mining practical

unix primer

Page 3: Text-mining practical

the command line

Page 4: Text-mining practical

some useful commands

Page 5: Text-mining practical

cat

Page 6: Text-mining practical

less

Page 7: Text-mining practical

head -10

Page 8: Text-mining practical

tail -10

Page 9: Text-mining practical

grep 'needle'

Page 10: Text-mining practical

cut -f 2
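
As a minimal sketch of what cut -f 2 does, the example below builds a tiny tab-delimited file and extracts its second column. The file name, the temporary directory and the three made-up rows are assumptions for illustration only; they are not part of the practical's data.

```shell
workdir=$(mktemp -d)

# Made-up three-column sample: identifier, name, count (tab-delimited).
printf 'id1\tPKB\t12\nid2\tAkt\t7\n' > "$workdir/sample.tsv"

# cut uses the tab character as its default delimiter;
# -f 2 keeps only the second field of every line.
second_column=$(cut -f 2 "$workdir/sample.tsv")
echo "$second_column"
```

Several columns can be selected at once with a comma-separated list, for example cut -f 1,3.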

Page 11: Text-mining practical

sort

Page 12: Text-mining practical

sort -nr

Page 13: Text-mining practical

uniq -c

Page 14: Text-mining practical

redirecting output

Page 15: Text-mining practical

write to file

Page 16: Text-mining practical

command > filename
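
A small sketch of redirection, assuming a made-up input file in a temporary directory: grep would normally print its matches to the screen, and > sends them to a file instead.

```shell
workdir=$(mktemp -d)

# Made-up input: two of the three lines contain the word "needle".
printf 'needle in line one\nplain line two\nneedle in line three\n' > "$workdir/infile"

# > creates the output file, or overwrites it if it already exists.
grep 'needle' "$workdir/infile" > "$workdir/matches"

# Nothing was printed by grep itself; the matches are now in the file.
cat "$workdir/matches"
```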

Page 17: Text-mining practical

using pipes

Page 18: Text-mining practical

command1 | command2
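
The sketch below chains several of the commands above with pipes to count how often each line occurs. The input lines are made up for illustration. Note that uniq -c only merges adjacent identical lines, which is why sort comes first.

```shell
# Each | feeds the output of the command on its left
# into the command on its right.
counts=$(printf 'PKB\nAkt\nPKB\nAKT1\nPKB\nAkt\n' |
  sort |        # bring identical lines next to each other
  uniq -c |     # collapse them and prefix each with its count
  sort -nr)     # highest count first
echo "$counts"
```

The most frequent line (PKB, three occurrences) ends up at the top; the exact width of the count column differs between uniq implementations.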

Page 19: Text-mining practical

putting it all together

Page 20: Text-mining practical

cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
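
To make the pipeline concrete, here is a runnable sketch on a made-up four-column file. The column layout and the values are assumptions chosen only so that the fourth column has something to count. head -n 100 is the POSIX spelling of the head -100 used on the slide.

```shell
workdir=$(mktemp -d)

# Made-up tab-delimited input; only column 4 matters here.
printf 'd1\t10\t14\tAKT1\nd1\t30\t34\tTP53\nd2\t5\t9\tAKT1\n' > "$workdir/infile"
printf 'd3\t1\t5\tAKT1\nd3\t8\t12\tTP53\nd4\t2\t7\tBRCA2\n' >> "$workdir/infile"

# Take column 4, count each distinct value, sort by count (descending),
# keep at most the top 100 and write them to a file.
cut -f 4 "$workdir/infile" | sort | uniq -c | sort -nr | head -n 100 > "$workdir/outfile"

cat "$workdir/outfile"
```

In this toy input AKT1 occurs three times, TP53 twice and BRCA2 once, so AKT1 is listed first.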

Page 21: Text-mining practical

the task

Page 22: Text-mining practical

disease gene finding

Page 23: Text-mining practical

named entity recognition

Page 24: Text-mining practical

human genes

Page 25: Text-mining practical

gene prioritization

Page 26: Text-mining practical

what I have done

Page 27: Text-mining practical

information retrieval

Page 28: Text-mining practical

two diseases

Page 29: Text-mining practical

prostate cancer

Page 30: Text-mining practical

schizophrenia

Page 31: Text-mining practical

two sets of documents

Page 32: Text-mining practical

62,755 abstracts

Page 33: Text-mining practical

65,588 abstracts

Page 34: Text-mining practical

one directory with each set

Page 35: Text-mining practical

one file with each abstract

Page 36: Text-mining practical

dictionary

Page 37: Text-mining practical

tab-delimited file

Page 38: Text-mining practical

human genes

Page 39: Text-mining practical

22,523 entities

Page 40: Text-mining practical

synonyms

Page 41: Text-mining practical

from many databases

Page 42: Text-mining practical

orthographic variation

Page 43: Text-mining practical

prefixes and suffixes

Page 44: Text-mining practical

automatically generated

Page 45: Text-mining practical

2,726,495 names

Page 46: Text-mining practical

tagdir program

Page 47: Text-mining practical

flexible matching

Page 48: Text-mining practical

upper- and lower-case

Page 49: Text-mining practical

spaces and hyphens
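
The idea of matching regardless of case, spaces and hyphens can be illustrated with a simple normalization step. This is only a sketch of the concept, not how the tagdir program is implemented; the normalize function is a hypothetical helper introduced here.

```shell
# Hypothetical helper: lowercase a name, then drop spaces and hyphens,
# so that spelling variants collapse to the same string.
normalize() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[ -]//g'
}

normalize 'AKT-1'   # prints akt1
normalize 'Akt 1'   # prints akt1
normalize 'akt1'    # prints akt1
```

Two names can then be treated as the same whenever their normalized forms are equal.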

Page 50: Text-mining practical

tab-delimited output

Page 51: Text-mining practical

what you will do

Page 52: Text-mining practical

named entity recognition

Page 53: Text-mining practical

find unfortunate names

Page 54: Text-mining practical

create “black list”
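
One possible way to find candidates for the black list is to count how often each name was matched and to inspect the most frequent ones by hand. The sketch below assumes, hypothetically, that the tagger output is tab-delimited with the matched name in column 4; check this against the real output before using it. The rows and the threshold of 3 are made up for the demonstration.

```shell
workdir=$(mktemp -d)

# Made-up tagger output: document, start, end, matched name.
printf 'd1\t0\t3\twas\nd2\t4\t7\twas\nd3\t9\t12\twas\n' > "$workdir/matches.tsv"
printf 'd1\t20\t24\tAKT1\nd2\t30\t34\tAKT1\nd3\t40\t56\tprotein kinase B\n' >> "$workdir/matches.tsv"

# Count matches per name (awk with a tab separator keeps names that
# contain spaces intact) and keep names with at least `min` matches.
awk -F '\t' -v min=3 '
  { n[$4]++ }
  END { for (name in n) if (n[name] >= min) print n[name] "\t" name }
' "$workdir/matches.tsv" | sort -nr > "$workdir/candidates.tsv"

cat "$workdir/candidates.tsv"
```

The output is only a list of candidates; a frequent name is not necessarily a bad one, so each entry still needs a manual look before it goes on the black list.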

Page 55: Text-mining practical

information extraction

Page 56: Text-mining practical

co-mentioning

Page 57: Text-mining practical

within abstracts

Page 58: Text-mining practical

rank genes for each disease
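
Since each document set was retrieved for one disease, a simple ranking is the number of abstracts in that set that mention a gene. The sketch below again assumes a hypothetical tab-delimited layout with the abstract identifier in column 1 and the matched gene in column 4; the rows are made up. It counts each gene once per abstract, so repeated mentions within one abstract do not inflate the score.

```shell
workdir=$(mktemp -d)

# Made-up matches for one disease: AKT1 appears twice in abstract d1.
printf 'd1\t0\t4\tAKT1\nd1\t9\t13\tAKT1\nd2\t3\t7\tAKT1\n' > "$workdir/disease.tsv"
printf 'd1\t20\t24\tTP53\nd3\t5\t9\tTP53\nd4\t1\t5\tTP53\n' >> "$workdir/disease.tsv"

# Keep (abstract, gene) pairs, drop duplicate pairs, then count
# how many distinct abstracts mention each gene.
ranking=$(cut -f 1,4 "$workdir/disease.tsv" | sort -u |
  cut -f 2 | sort | uniq -c | sort -nr)
echo "$ranking"
```

Here both genes have three raw mentions, but TP53 is found in three abstracts and AKT1 in only two, so TP53 ranks first.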

Page 59: Text-mining practical

find shared gene
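
Given one list of top-ranked genes per disease, the genes present in both can be found with comm, which compares two sorted files. The file names and the placeholder gene labels below are made up; they stand in for whatever lists the ranking step produces.

```shell
workdir=$(mktemp -d)

# Placeholder top lists, one gene per line (not real results).
printf 'geneC\ngeneB\ngeneA\n' > "$workdir/disease1_top.txt"
printf 'geneD\ngeneB\ngeneE\n' > "$workdir/disease2_top.txt"

# comm expects both inputs to be sorted.
sort "$workdir/disease1_top.txt" > "$workdir/disease1_sorted.txt"
sort "$workdir/disease2_top.txt" > "$workdir/disease2_sorted.txt"

# -12 suppresses lines unique to file 1 and to file 2,
# leaving only the lines common to both.
shared=$(comm -12 "$workdir/disease1_sorted.txt" "$workdir/disease2_sorted.txt")
echo "$shared"   # prints geneB
```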

Page 60: Text-mining practical
Page 61: Text-mining practical

a helping hand

Page 62: Text-mining practical

“black list”

Page 63: Text-mining practical

100+ matches

Page 64: Text-mining practical

10+ matches

Page 65: Text-mining practical
Page 66: Text-mining practical

wrap up

Page 67: Text-mining practical

Protein kinase B

Page 68: Text-mining practical

PKB

Page 69: Text-mining practical

Akt

Page 70: Text-mining practical

AKT1

Page 71: Text-mining practical

same protein

Page 72: Text-mining practical

synonyms matter

Page 73: Text-mining practical

“black list” is crucial

Page 74: Text-mining practical

text mining is useful

Page 75: Text-mining practical

not black magic

Page 76: Text-mining practical

Thanks for your attention!
