One tagger, many uses: Illustrating the power of dictionary-based named entity recognition
Lars Juhl Jensen (@larsjuhljensen)
>10 km
too much to read
computer
as smart as a dog
teach it specific tricks
named entity recognition
dictionary
genes / proteins
diseases
expansion rules
prefixes and suffixes
curated blacklist
SDS
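The ingredients listed above (a dictionary, expansion rules, suffix handling, and a curated blacklist) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual tagger: the dictionary entries, the identifiers such as GENE:1, the plural-"s" suffix rule, and the function names are all hypothetical. SDS is blacklisted here only because it is the example shown on the slide.

```python
import re

# Toy dictionary (hypothetical entries): canonical name -> (type, identifier).
DICTIONARY = {
    "IL-2": ("gene", "GENE:1"),
    "SDS": ("gene", "GENE:2"),
    "breast cancer": ("disease", "DIS:1"),
}

# Curated blacklist of exact strings that should never be tagged (lowercase).
BLACKLIST = {"sds"}

# Hypothetical suffix rule: also accept a plural "s".
SUFFIXES = ("", "s")


def expand(name):
    """Generate lowercase orthographic variants of one dictionary name."""
    base = name.lower()
    variants = {base, base.replace("-", ""), base.replace("-", " ")}
    return {variant + suffix for variant in variants for suffix in SUFFIXES}


def build_index(dictionary, blacklist):
    """Map every variant that is not blacklisted to its (type, identifier)."""
    index = {}
    for name, entity in dictionary.items():
        for variant in expand(name):
            if variant not in blacklist:
                index[variant] = entity
    return index


def tag(text, index, max_words=3):
    """Return (start, end, matched text, type, identifier) tuples.

    Scans left to right and prefers the longest match at each position.
    """
    tokens = [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9-]+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            key = " ".join(text[s:e] for s, e in tokens[i:i + n]).lower()
            if key in index:
                matches.append((start, end, text[start:end]) + index[key])
                i += n
                break
        else:
            i += 1  # no dictionary name starts at this token
    return matches


index = build_index(DICTIONARY, BLACKLIST)
for match in tag("IL2 and IL-2 levels in breast cancers after SDS treatment", index):
    print(match)
```

In this sketch the expansion rules let "IL2" and "breast cancers" match even though only "IL-2" and "breast cancer" are in the dictionary, while the blacklist suppresses the "SDS" mention.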
software
C++ tagger
>1000 abstracts / second
70–80% recall
80–90% precision
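For readers less familiar with the two quality measures quoted above, the sketch below shows how recall and precision are computed when tagger output is compared with a manually annotated gold standard. The spans and identifiers are made up for illustration and are not benchmark data for the tagger.

```python
def precision_recall(predicted, gold):
    """Compare two collections of (start, end, identifier) annotations."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted annotations that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold annotations that were found.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations for one abstract.
gold = [(0, 3, "GENE:1"), (8, 12, "GENE:1"), (23, 37, "DIS:1"),
        (50, 54, "GENE:3"), (60, 66, "DIS:2")]
predicted = [(0, 3, "GENE:1"), (8, 12, "GENE:1"), (23, 37, "DIS:1"),
             (43, 46, "GENE:2")]

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of the four predictions are correct and three of the five gold annotations are found.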
open source: bitbucket.org/larsjuhljensen/tagger/
Docker: hub.docker.com/r/larsjuhljensen/tagger/
web service: tagger.jensenlab.org
community resources
Extract: extract.jensenlab.org
STRING: string-db.org
DISEASES: diseases.jensenlab.org
Cytoscape
curated knowledge
experimental data
co-occurrence text mining
Medline abstracts
<1 km
15 million full-text articles
Westergaard et al., bioRxiv, 2017
~50% more associations
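The basic idea behind co-occurrence text mining can be sketched as counting how often two tagged entities appear in the same document and comparing that with what would be expected by chance. This is a simplified illustration only; the scoring scheme actually used in the resources above is not described in these slides, and the documents, identifiers, and function names below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def count_cooccurrences(documents):
    """Count documents per entity and per unordered entity pair."""
    single, pair = Counter(), Counter()
    for entities in documents:
        unique = sorted(set(entities))  # count each entity once per document
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def observed_over_expected(a, b, single, pair, n_documents):
    """Ratio of observed co-occurrences to the count expected by chance."""
    expected = single[a] * single[b] / n_documents
    observed = pair[tuple(sorted((a, b)))]
    return observed / expected if expected else 0.0


# Hypothetical tagger output: one list of entity identifiers per abstract.
documents = [
    ["GENE:1", "DIS:1"],
    ["GENE:1", "DIS:1", "GENE:2"],
    ["GENE:2"],
    ["DIS:1"],
]

single, pair = count_cooccurrences(documents)
print(observed_over_expected("GENE:1", "DIS:1", single, pair, len(documents)))
```

A ratio above 1 means the pair is mentioned together more often than independent mentions would predict.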
electronic health records
Jensen et al., Nature Reviews Genetics, 2012
in Danish
dictionary
drugs
adverse events
in Danish
named entity recognition
temporal correlations
[Figure: timeline with drug introduction, drug discontinuation, and identification start; labels: adverse event, negative modifier, indication, pre-existing condition, adverse drug reaction, possible adverse drug reaction, ADR of additional drug]
Eriksson et al., Drug Safety, 2014
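To illustrate how temporal correlations can separate candidate adverse drug reactions from other mentions, the sketch below classifies a tagged adverse event mention by its date relative to drug introduction and discontinuation. The rules, their order, the "after discontinuation" label, and the function name are assumptions made for this example; they are not the procedure published by Eriksson et al.

```python
from datetime import date


def classify_mention(mention_date, drug_start, drug_stop=None,
                     negated=False, is_indication=False):
    """Assign a tagged adverse event mention to an illustrative category."""
    if negated:
        # For example, "no nausea": the event is explicitly ruled out.
        return "negative modifier"
    if is_indication:
        # The term is the reason the drug was prescribed.
        return "indication"
    if mention_date < drug_start:
        # Mentioned before the drug was introduced.
        return "pre-existing condition"
    if drug_stop is None or mention_date <= drug_stop:
        # Mentioned while the patient was on the drug.
        return "possible adverse drug reaction"
    return "after discontinuation"


# Hypothetical patient: drug given from 10 January to 20 January 2010.
start, stop = date(2010, 1, 10), date(2010, 1, 20)
print(classify_mention(date(2010, 1, 5), start, stop))
print(classify_mention(date(2010, 1, 12), start, stop))
print(classify_mention(date(2010, 1, 12), start, stop, negated=True))
```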
find novel associations
summary
broadly applicable
keep it simple
free tools
Acknowledgments
Evangelos Pafilis
Sune Pletscher-Frankild
Nadezhda Doncheva
Damian Szklarczyk
Michael Kuhn
Robert Eriksson
John “Scooter” Morris
Tudor Oprea
Christian von Mering
Peer Bork
Christos Arvanitidis
Søren Brunak