Networks of proteins and diseases
Lars Juhl Jensen
sequence analysis
protein networks
de Lichtenberg, Jensen et al., Science, 2005
adverse drug reactions
Campillos, Kuhn et al., Science, 2008
group leader
cofounder
data mining
proteomics
text mining
biomedical literature
electronic health records
protein networks
guilt by association
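Guilt by association means that a protein of unknown function is assumed to share functions with the proteins it is linked to in the network. This is a minimal sketch. It assumes the network is a plain list of edges and that the annotations are a hypothetical dictionary of function terms. Real predictors weight edges by confidence instead of counting every neighbour equally:

```python
from collections import Counter

def predict_function(protein, edges, annotations):
    """Rank function terms by how many network neighbours of `protein` carry them."""
    neighbours = set()
    for a, b in edges:
        if a == protein:
            neighbours.add(b)
        elif b == protein:
            neighbours.add(a)
    votes = Counter()
    for neighbour in neighbours:
        for term in annotations.get(neighbour, ()):
            votes[term] += 1
    return votes.most_common()

# Toy network: "X" is the uncharacterized protein (all names hypothetical)
edges = [("X", "A"), ("X", "B"), ("C", "X"), ("D", "X"), ("A", "B")]
annotations = {
    "A": {"cell cycle"},
    "B": {"cell cycle", "DNA repair"},
    "C": {"DNA repair"},
    "D": {"cell cycle"},
}
print(predict_function("X", edges, annotations))
# [('cell cycle', 3), ('DNA repair', 2)]
```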
STRING
computational predictions
gene fusion
Korbel et al., Nature Biotechnology, 2004
gene neighborhood
Korbel et al., Nature Biotechnology, 2004
phylogenetic profiles
Korbel et al., Nature Biotechnology, 2004
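Phylogenetic profiles link genes whose presence/absence pattern across sequenced genomes is similar. The idea is that genes working together tend to be gained and lost together. This minimal sketch scores only the plain agreement between binary profiles. The gene names and profiles are hypothetical, and published methods use more careful statistics that account for the phylogeny:

```python
def profile_similarity(p, q):
    """Fraction of genomes in which two genes agree (both present or both absent)."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and cover the same genomes")
    return sum(a == b for a, b in zip(p, q)) / len(p)

# 1 = an ortholog is present in that genome, 0 = absent (toy data)
profiles = {
    "geneA": [1, 1, 0, 1, 0, 0],
    "geneB": [1, 1, 0, 1, 0, 0],
    "geneC": [0, 1, 1, 0, 1, 1],
}
for other in ("geneB", "geneC"):
    score = profile_similarity(profiles["geneA"], profiles[other])
    print("geneA", other, round(score, 3))
```

With these profiles, geneA matches geneB in every genome but agrees with geneC in only one of the six.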
experimental data
gene coexpression
protein interactions
Jensen & Bork, Science, 2008
curated knowledge
complexes
pathways
Letunic & Bork, Trends in Biochemical Sciences, 2008
many databases
different formats
different identifiers
variable quality
not comparable
hard work
quality scores
von Mering et al., Nucleic Acids Research, 2005
calibrate vs. gold standard
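Calibrating against a gold standard turns raw, source-specific scores into comparable confidence values. Benchmark pairs are binned by raw score, and each bin gets the fraction of its pairs that the gold standard confirms (for example, shared pathway membership). Independent channels can then be combined as 1 - ∏(1 - p_i). This is a minimal sketch with equal-width bins and toy benchmark data. The published scoring scheme differs in its details:

```python
def calibrate(benchmark, n_bins=2):
    """benchmark: list of (raw_score, confirmed_by_gold_standard) pairs.
    Returns a function mapping a raw score to the confirmed fraction of its bin."""
    lo = min(score for score, _ in benchmark)
    hi = max(score for score, _ in benchmark)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all scores are equal
    hits, totals = [0] * n_bins, [0] * n_bins
    for score, confirmed in benchmark:
        i = min(int((score - lo) / width), n_bins - 1)
        totals[i] += 1
        hits[i] += bool(confirmed)
    fraction = [h / t if t else 0.0 for h, t in zip(hits, totals)]

    def to_confidence(score):
        i = min(max(int((score - lo) / width), 0), n_bins - 1)
        return fraction[i]

    return to_confidence

def combine(confidences):
    """Combine channels assumed independent: 1 - prod(1 - p)."""
    all_wrong = 1.0
    for p in confidences:
        all_wrong *= 1.0 - p
    return 1.0 - all_wrong

benchmark = [(0.1, False), (0.2, False), (0.4, True), (0.45, False),
             (0.7, True), (0.75, False), (0.8, True), (0.9, True), (1.0, True)]
to_confidence = calibrate(benchmark, n_bins=2)
print(to_confidence(0.3), to_confidence(0.95))  # 0.25 0.8
print(round(combine([to_confidence(0.95), 0.5]), 2))  # 0.9
```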
missing most of the data
text mining
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
comprehensive lexicon
CDC2
cyclin dependent kinase 1
expansion rules
hCdc2
CDC2
flexible matching
cyclin-dependent kinase 1
cyclin dependent kinase 1
“black list”
SDS
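These tricks can be combined into a toy dictionary tagger: a lexicon of names, expansion rules that generate variants such as hCdc2, flexible matching of spaces and hyphens, and a black list of names such as SDS that in most texts refer to something else. The lexicon entries, the single "h" prefix rule and the normalization are simplifying assumptions made for illustration. A production tagger uses far larger dictionaries and more rules:

```python
import re

# Toy lexicon mapping names to a canonical identifier (illustrative entries)
LEXICON = {
    "CDC2": "CDK1",
    "cyclin dependent kinase 1": "CDK1",
    "SDS": "SDS",  # serine dehydratase, but in text usually the detergent
}
BLACKLIST = {"sds"}

def normalize(text):
    """Flexible matching: ignore case and treat runs of spaces/hyphens alike."""
    return re.sub(r"[\s\-]+", " ", text.lower()).strip()

def build_dictionary(lexicon, blacklist):
    """Drop black-listed names and apply a hypothetical expansion rule ('h' = human)."""
    names = {}
    for name, identifier in lexicon.items():
        key = normalize(name)
        if key in blacklist:
            continue
        names[key] = identifier
        if " " not in key:  # only expand single-word symbols such as CDC2
            names["h" + key] = identifier
    return names

def tag(text, names):
    """Return identifiers whose names occur as whole words in the text."""
    norm = normalize(text)
    found = set()
    for key, identifier in names.items():
        if re.search(r"(?<!\w)" + re.escape(key) + r"(?!\w)", norm):
            found.add(identifier)
    return found

names = build_dictionary(LEXICON, BLACKLIST)
print(tag("Lysates were run on SDS-PAGE; hCdc2 and cyclin-dependent kinase 1 co-purified.", names))
# {'CDK1'}
```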
augmented browsing
Reflect
browser add-on
real-time text mining
Pafilis, O’Donoghue, Jensen et al., Nature Biotechnology, 2009
O’Donoghue et al., Journal of Web Semantics, 2010
information extraction
co-mentioning
within documents
within paragraphs
within sentences
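Co-mentioning counts how often two entities are mentioned together and gives more weight the closer the mentions are. This is a minimal sketch. It assumes the documents have already been tagged and split into paragraphs and sentences, and its weights are hypothetical. Real scoring schemes also normalize by how often each entity is mentioned overall, so that heavily studied proteins do not dominate:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights; a pair in the same sentence collects all three
W_DOCUMENT, W_PARAGRAPH, W_SENTENCE = 1.0, 2.0, 3.0

def co_mention_scores(corpus):
    """corpus: list of documents; a document is a list of paragraphs,
    a paragraph a list of sentences, a sentence a set of entity identifiers."""
    scores = defaultdict(float)
    for document in corpus:
        doc_entities = set()
        par_pairs, sent_pairs = set(), set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sent_pairs.update(combinations(sorted(sentence), 2))
                par_entities |= sentence
            par_pairs.update(combinations(sorted(par_entities), 2))
            doc_entities |= par_entities
        # Each pair counts at most once per level within a document
        for pair in combinations(sorted(doc_entities), 2):
            scores[pair] += W_DOCUMENT
        for pair in par_pairs:
            scores[pair] += W_PARAGRAPH
        for pair in sent_pairs:
            scores[pair] += W_SENTENCE
    return scores

# Toy corpus of two tagged documents
corpus = [
    [[{"CDK1", "CCNB1"}, {"TP53"}], [{"TP53", "MDM2"}]],
    [[{"CDK1", "CCNB1"}]],
]
scores = co_mention_scores(corpus)
print(scores[("CCNB1", "CDK1")], scores[("CDK1", "MDM2")])  # 12.0 1.0
```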
text corpus
~22 million abstracts
no access
~4 million full-text articles
localization and disease
general approach
COMPARTMENTS
TISSUES
DISEASES
curated knowledge
experimental data
text mining
computational predictions
common identifiers
quality scores
visualization
compartments.jensenlab.org
tissues.jensenlab.org
dissemination
web interfaces
web services
diseases.jensenlab.org
bulk download
disease networks
medical data
electronic health records
central registries
individual hospitals
Jensen et al., Nature Reviews Genetics, 2012
structured data
Jensen et al., Nature Reviews Genetics, 2012
unstructured data
in Danish
by busy doctors
confounding factors
age and gender
reporting bias
custom dictionaries
typo rules
age/gender matching
comorbidity
Jensen et al., Nature Reviews Genetics, 2012
Roque et al., PLOS Computational Biology, 2011
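Age and gender confound comorbidity estimates: two diagnoses that are both common in older men will appear linked even if they are unrelated. One way to adjust for this is to compare the observed number of co-diagnosed patients with the number expected from each patient's own stratum. The sketch below makes several simplifying assumptions: the patients are toy data, strata are (age band, gender) pairs, prevalence is computed over all patients in the stratum, and "A" and "B" are placeholder diagnoses:

```python
from collections import defaultdict

def naive_relative_risk(patients, a, b):
    """P(b | a) / P(b | not a), ignoring age and gender."""
    with_a = [p for p in patients if a in p["diagnoses"]]
    without_a = [p for p in patients if a not in p["diagnoses"]]
    if not with_a or not without_a:
        return float("nan")
    risk_a = sum(b in p["diagnoses"] for p in with_a) / len(with_a)
    risk_rest = sum(b in p["diagnoses"] for p in without_a) / len(without_a)
    return risk_a / risk_rest if risk_rest else float("inf")

def matched_relative_risk(patients, a, b):
    """Observed/expected count of b among patients with a, where the expected
    count uses b's prevalence within each patient's age/gender stratum."""
    totals, with_b = defaultdict(int), defaultdict(int)
    for p in patients:
        totals[p["stratum"]] += 1
        with_b[p["stratum"]] += b in p["diagnoses"]
    observed = expected = 0.0
    for p in patients:
        if a in p["diagnoses"]:
            observed += b in p["diagnoses"]
            expected += with_b[p["stratum"]] / totals[p["stratum"]]
    return observed / expected if expected else float("nan")

def patient(stratum, *diagnoses):
    return {"stratum": stratum, "diagnoses": set(diagnoses)}

OLD_MEN, YOUNG_WOMEN = ("70-79", "M"), ("20-29", "F")
patients = [
    patient(OLD_MEN, "A", "B"), patient(OLD_MEN, "A"), patient(OLD_MEN, "B"), patient(OLD_MEN),
    patient(YOUNG_WOMEN, "A"), patient(YOUNG_WOMEN), patient(YOUNG_WOMEN), patient(YOUNG_WOMEN),
]
print(round(naive_relative_risk(patients, "A", "B"), 2))  # 1.67
print(matched_relative_risk(patients, "A", "B"))  # 1.0
```

In this toy cohort, the apparent link between A and B disappears once each patient is compared with their own age/gender group.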
temporal correlation
diagnosis trajectories
Jensen et al., in preparation, 2013
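Temporal correlation asks whether one diagnosis tends to precede another in patients who receive both. Significantly directed pairs can then be chained into longer diagnosis trajectories. This minimal sketch covers only the first step. It assumes each patient history maps a diagnosis to the date it was first given, and it uses an exact two-sided binomial (sign) test against a 50/50 null. The diagnoses are placeholders, and the actual analysis may use a different test:

```python
from datetime import date
from math import comb

def direction_counts(histories, a, b):
    """Count patients diagnosed with a before b and with b before a."""
    a_first = b_first = 0
    for history in histories:
        if a in history and b in history:
            if history[a] < history[b]:
                a_first += 1
            elif history[b] < history[a]:
                b_first += 1  # same-day diagnoses are ignored
    return a_first, b_first

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of outcomes
    no more likely than observing k successes out of n."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    return sum(q for q in probs if q <= probs[k] * (1 + 1e-9))

histories = [{"A": date(2000, 1, 1), "B": date(2001, 6, 1)} for _ in range(9)]
histories.append({"A": date(2003, 1, 1), "B": date(2002, 1, 1)})
a_first, b_first = direction_counts(histories, "A", "B")
p_value = binomial_two_sided_p(a_first, a_first + b_first)
print(a_first, b_first, round(p_value, 4))  # 9 1 0.0215
```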
pharmacovigilance
adverse drug reactions
Eriksson et al., submitted, 2013
ADR profiles
Eriksson et al., submitted, 2013
ADR frequencies
Eriksson et al., submitted, 2013
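An ADR profile for a drug can be summarized by the fraction of treated patients whose records mention each adverse reaction after treatment starts. This is a minimal sketch with hypothetical input formats: prescriptions and text-mined mentions are both (patient, item, day) tuples, and the drug and patient names are placeholders. A real pharmacovigilance analysis would also compare these frequencies against the rate of the same mention before treatment or in untreated patients:

```python
from collections import Counter

def adr_frequencies(prescriptions, mentions, drug):
    """prescriptions: (patient, drug, start_day) tuples.
    mentions: (patient, adverse_event, day) tuples mined from clinical notes.
    Returns {adverse_event: fraction of the drug's patients mentioning it
    on or after their first prescription}."""
    first_day = {}
    for patient, prescribed, day in prescriptions:
        if prescribed == drug:
            first_day[patient] = min(day, first_day.get(patient, day))
    if not first_day:
        return {}
    seen = set()
    counts = Counter()
    for patient, event, day in mentions:
        if patient in first_day and day >= first_day[patient] and (patient, event) not in seen:
            seen.add((patient, event))  # count each patient once per event
            counts[event] += 1
    return {event: n / len(first_day) for event, n in counts.items()}

prescriptions = [("p1", "drugX", 10), ("p2", "drugX", 5), ("p3", "drugX", 20), ("p4", "drugY", 1)]
mentions = [("p1", "nausea", 12), ("p1", "nausea", 15), ("p2", "nausea", 3),
            ("p3", "headache", 25), ("p4", "nausea", 2)]
profile = adr_frequencies(prescriptions, mentions, "drugX")
print({event: round(f, 2) for event, f in profile.items()})  # {'nausea': 0.33, 'headache': 0.33}
```

Patient p2's nausea is not counted because it was recorded before the drug was started.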
molecular basis
protein networks
Acknowledgments
STRING: Christian von Mering, Damian Szklarczyk, Michael Kuhn, Manuel Stark, Samuel Chaffron, Chris Creevey, Jean Muller, Tobias Doerks, Philippe Julien, Alexander Roth, Milan Simonovic, Jan Korbel, Berend Snel, Martijn Huynen, Peer Bork
Text mining: Sune Frankild, Evangelos Pafilis, Kalliopi Tsafou, Alberto Santos, Janos Binder, Heiko Horn, Michael Kuhn, Nigel Brown, Reinhardt Schneider, Sean O’Donoghue
EHR mining: Anders Boeck Jensen, Peter Bjødstrup Jensen, Francisco S. Roque, Henriette Schmock, Marlene Dalgaard, Massimo Andreatta, Thomas Hansen, Karen Søeby, Søren Bredkjær, Anders Juul, Tudor Oprea, Pope Moseley, Thomas Werge, Søren Brunak
Thank you!