text-mining practical

unix primer

the command line

some useful commands

head -10

tail -10

grep 'needle'

cut -f 2

sort -nr

uniq -c
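A minimal sketch of the commands above on a toy file. The file name and its contents are made up for illustration; only the commands themselves come from the slides. Note that uniq only collapses adjacent identical lines, so its input is normally sorted first.

```shell
# build a tiny tab-delimited sample file (hypothetical data, for illustration only)
tmp=$(mktemp -d)
printf 'doc1\tPSA\ndoc2\tTP53\ndoc3\tPSA\ndoc4\tAR\n' > "$tmp/sample.tsv"

head -2 "$tmp/sample.tsv"        # first 2 lines
tail -2 "$tmp/sample.tsv"        # last 2 lines
grep 'PSA' "$tmp/sample.tsv"     # only the lines containing PSA
cut -f 2 "$tmp/sample.tsv"       # second tab-separated column

# sort -nr: numeric sort, largest first
printf '3\n10\n2\n' | sort -nr

# uniq -c: count each distinct line (input sorted so duplicates are adjacent)
cut -f 2 "$tmp/sample.tsv" | sort | uniq -c
```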

redirecting output

write to file

command > filename

using pipes

command1 | command2
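A small sketch combining both ideas: a pipe passes the output of one command to the next, and > writes the final result to a file (replacing the file if it already exists). The file names here are placeholders.

```shell
tmp=$(mktemp -d)

# write to file: the output of printf goes into letters.txt
printf 'b\na\nb\n' > "$tmp/letters.txt"

# pipe, then redirect: sort feeds uniq -c, and the counts end up in counts.txt
sort "$tmp/letters.txt" | uniq -c > "$tmp/counts.txt"

cat "$tmp/counts.txt"
```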

putting it all together

cut -f 4 infile | sort | uniq -c | sort -nr | head -100 > outfile
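The same pipeline run on a toy input, as a sketch. The four-column layout and the gene labels are invented; the pipeline takes column 4, counts each distinct value, sorts by count with the largest first, and keeps at most the top 100.

```shell
tmp=$(mktemp -d)

# hypothetical 4-column tab-delimited input; column 4 is the value being counted
printf 'd1\t0\t3\tGENE_A\nd2\t5\t8\tGENE_B\nd3\t1\t4\tGENE_A\n' >  "$tmp/infile"
printf 'd4\t2\t5\tGENE_A\nd5\t7\t9\tGENE_B\nd6\t0\t2\tGENE_C\n' >> "$tmp/infile"

cut -f 4 "$tmp/infile" | sort | uniq -c | sort -nr | head -100 > "$tmp/outfile"

# one line per distinct value: count first, most frequent value on top
cat "$tmp/outfile"
```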

the task

disease gene finding

named entity recognition

human genes

gene prioritization

what I have done

information retrieval

two diseases

prostate cancer

schizophrenia

two sets of documents

62,755 abstracts

65,588 abstracts

one directory with each set

one file with each abstract

dictionary

tab-delimited file

human genes

22,523 entities

synonyms

from many databases

orthographic variation

prefixes and postfixes

automatically generated

2,726,495 names

tagdir program

flexible matching

upper- and lower-case

spaces and hyphens

tab-delimited output

what you will do

named entity recognition

find unfortunate names

create “black list”
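One possible way to spot unfortunate names, sketched under assumptions: the column layout below (abstract id, matched name, gene identifier) is hypothetical and may not match the real tagger output. Counting how often each matched name occurs tends to push ordinary words that happen to be gene synonyms to the top of the list.

```shell
tmp=$(mktemp -d)

# ASSUMPTION: column 1 = abstract id, column 2 = matched name, column 3 = gene identifier
printf 'a1\tPSA\tG1\na1\tWAS\tG2\na2\twas\tG2\n' >  "$tmp/matches.tsv"
printf 'a3\tWas\tG2\na3\tAR\tG3\na4\tPSA\tG1\n'  >> "$tmp/matches.tsv"

# fold case (matching ignores case), then count and rank the matched names
cut -f 2 "$tmp/matches.tsv" | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -nr | head -100 > "$tmp/name_counts.txt"

cat "$tmp/name_counts.txt"
```

Names at the top of the ranking are only candidates; each one still needs a manual look before it goes on the black list.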

information extraction

co-mentioning

within abstracts

rank genes for each disease

find shared gene
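A sketch of the ranking step, assuming one two-column match file per disease (abstract id, gene identifier); the file names, layout, and identifiers are invented. Because each document set was retrieved for one disease, counting the abstracts that mention a gene is used here as a simple stand-in for co-mentioning with that disease.

```shell
tmp=$(mktemp -d)

# ASSUMPTION: abstract id <TAB> gene identifier, one line per match
printf 'p1\tG1\np1\tG1\np2\tG1\np2\tG7\np3\tG9\n' > "$tmp/prostate.tsv"
printf 's1\tG4\ns2\tG7\ns2\tG7\ns3\tG7\ns4\tG4\n' > "$tmp/schizo.tsv"

rank_genes() {
    # sort -u counts a gene once per abstract; then rank by number of abstracts
    sort -u "$1" | cut -f 2 | sort | uniq -c | sort -nr
}

rank_genes "$tmp/prostate.tsv" > "$tmp/prostate.rank"
rank_genes "$tmp/schizo.tsv"   > "$tmp/schizo.rank"

# shared genes: comm -12 prints only lines present in both sorted lists
awk '{print $2}' "$tmp/prostate.rank" | sort > "$tmp/prostate.genes"
awk '{print $2}' "$tmp/schizo.rank"   | sort > "$tmp/schizo.genes"
comm -12 "$tmp/prostate.genes" "$tmp/schizo.genes" > "$tmp/shared.genes"

cat "$tmp/shared.genes"
```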

a helping hand

“black list”

100+ matches

10+ matches
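One reading of the thresholds, as a sketch: keep only the names whose match count reaches a cut-off. The input below imitates the "count name" lines that uniq -c produces; the counts and names are made up.

```shell
tmp=$(mktemp -d)

# hypothetical "count name" lines in the style of uniq -c output
printf '    150 was\n     12 cat\n      3 psa\n' > "$tmp/name_counts.txt"

filter_by_count() {
    # $1 = minimum count, $2 = counts file; prints the name without its count
    awk -v min="$1" '$1 >= min { sub(/^ *[0-9]+ +/, ""); print }' "$2"
}

filter_by_count 100 "$tmp/name_counts.txt" > "$tmp/candidates_100.txt"
filter_by_count 10  "$tmp/name_counts.txt" > "$tmp/candidates_10.txt"

cat "$tmp/candidates_100.txt"
cat "$tmp/candidates_10.txt"
```

A lower cut-off gives a longer candidate list, which means more names to check by hand.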

wrap up

prostate cancer

schizophrenia

Glutamate carboxypeptidase II

same protein

synonyms matter

“black list” is crucial

text mining is useful

not black magic

EMBO Practical Course Computational Biology: Genomes to Systems, Puerto Varas, 3-9 April 2014

Thank you!
