The pragmatic text miner: It's just another type of poorly standardized data

Posted on 27-Jun-2015

Lars Juhl Jensen

The pragmatic text miner

It’s just another type of poorly standardized data

why text mining?

data mining

guilt by association

structured data

unstructured text

biomedical literature

>10 km

too much to read

computer

as smart as a dog

teach it specific tricks

named entity recognition

text corpus

comprehensive lexicon

synonyms

expansion rules

prefixes and suffixes

flexible matching

hyphens and spaces

“black list”

a
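The dictionary-based tagging outlined above can be sketched roughly as follows. This is a minimal illustration, not the tagger used in the talk: the lexicon entries, the identifiers (`protein:1` and so on), the blacklist contents and the function names are all hypothetical. It covers only flexible matching of hyphens and spaces plus the black list; expansion rules for prefixes and suffixes are left out.

```python
import re

# Hypothetical toy lexicon: identifier -> synonyms (illustrative only).
LEXICON = {
    "protein:1": ["CDC2", "CDK1", "cyclin-dependent kinase 1"],
    "protein:2": ["IL-2", "interleukin 2"],
    "protein:3": ["WAS", "WASP"],
}

# Hypothetical black list of normalized names that are too ambiguous to tag.
BLACKLIST = {"a", "was"}


def normalize(name):
    """Lowercase and drop hyphens and spaces, so 'IL-2', 'IL 2' and 'IL2' agree."""
    return re.sub(r"[-\s]", "", name.lower())


def build_index(lexicon, blacklist):
    """Map every normalized synonym to its identifier, skipping blacklisted names."""
    index = {}
    for identifier, synonyms in lexicon.items():
        for synonym in synonyms:
            key = normalize(synonym)
            if key not in blacklist:
                index[key] = identifier
    return index


def tag(text, index, max_words=4):
    """Greedy longest-match tagging over runs of up to max_words tokens."""
    tokens = re.findall(r"\w+(?:-\w+)*", text)
    hits = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            key = normalize("".join(tokens[i:i + n]))
            if key in index:
                hits.append((" ".join(tokens[i:i + n]), index[key]))
                i += n
                break
        else:
            i += 1  # no name starts at this token
    return hits


INDEX = build_index(LEXICON, BLACKLIST)
for mention, identifier in tag("CDC2 was co-purified with IL 2.", INDEX):
    print(mention, identifier)
```

Normalizing both the lexicon and the text in the same way is what makes the matching tolerant of hyphen and space variants, and the black list is applied once, when the index is built.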

co-mentioning

within documents

within paragraphs

within sentences

weighted score
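One way to picture a weighted co-mention score is the sketch below. It is an assumption-laden illustration rather than the scoring scheme from the talk: the weights, the additive form and the function names are hypothetical, and the input is assumed to be text that has already been tagged, with each sentence reduced to a set of entity identifiers.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights: closer co-mentions contribute more.
WEIGHTS = {"document": 1.0, "paragraph": 2.0, "sentence": 3.0}


def pairs(entities):
    """All unordered pairs of distinct entities, as sorted tuples."""
    return set(combinations(sorted(entities), 2))


def comention_scores(corpus, weights=WEIGHTS):
    """Sum, over documents, a weight for each level at which a pair co-occurs.

    corpus: list of documents; a document is a list of paragraphs;
    a paragraph is a list of sentences; a sentence is a set of entity ids.
    Each level is counted at most once per document.
    """
    scores = defaultdict(float)
    for document in corpus:
        doc_entities, par_pairs, sen_pairs = set(), set(), set()
        for paragraph in document:
            par_entities = set()
            for sentence in paragraph:
                sen_pairs |= pairs(sentence)
                par_entities |= set(sentence)
            par_pairs |= pairs(par_entities)
            doc_entities |= par_entities
        levels = (
            ("document", pairs(doc_entities)),
            ("paragraph", par_pairs),
            ("sentence", sen_pairs),
        )
        for level, level_pairs in levels:
            for pair in level_pairs:
                scores[pair] += weights[level]
    return dict(scores)


# Toy corpus: two documents with already-tagged sentences.
CORPUS = [
    [[{"A", "B"}, {"C"}], [{"D"}]],
    [[{"A", "D"}]],
]
for pair, score in sorted(comention_scores(CORPUS).items()):
    print(pair, score)
```

In this toy scheme a pair mentioned in the same sentence also collects the paragraph and document weights, so closer co-mentions always score at least as high as looser ones.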

unifying text & data

text mining

curated knowledge

experimental data

computational predictions

integrated web resources

protein networks

string-db.org

chemical networks

stitch-db.org

subcellular localization

compartments.jensenlab.org

tissue expression

tissues.jensenlab.org

disease associations

many sources

different formats

different identifiers

variable quality

hard work
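The hard work of reconciling sources might look something like the sketch below. Everything in it is hypothetical: the two source formats, the alias table, the identifiers and the rescaling of quality scores are invented for illustration and do not describe any real resource.

```python
import csv
import io

# Hypothetical alias table: source-specific names -> one common namespace.
ALIASES = {
    "genex": "gene:1",
    "acc123": "gene:1",
    "disease one": "disease:1",
    "d:0001": "disease:1",
}

# Source A (made up): tab-separated gene symbol, disease name, score in [0, 1].
SOURCE_A = "GeneX\tdisease one\t0.8\n"

# Source B (made up): comma-separated disease code, gene accession, 1-5 stars.
SOURCE_B = "D:0001,ACC123,4\nD:9999,ACC123,5\n"


def load_source_a(text):
    """Yield (gene, disease, confidence) from the tab-separated format."""
    for gene, disease, score in csv.reader(io.StringIO(text), delimiter="\t"):
        yield gene, disease, float(score)


def load_source_b(text):
    """Yield (gene, disease, confidence); stars are rescaled to [0, 1]."""
    for disease, gene, stars in csv.reader(io.StringIO(text)):
        yield gene, disease, int(stars) / 5


def unify(records, source):
    """Map both identifiers to the common namespace; drop unmappable records."""
    unified = []
    for gene, disease, confidence in records:
        gene_id = ALIASES.get(gene.strip().lower())
        disease_id = ALIASES.get(disease.strip().lower())
        if gene_id and disease_id:
            unified.append((gene_id, disease_id, source, confidence))
    return unified


evidence = unify(load_source_a(SOURCE_A), "A") + unify(load_source_b(SOURCE_B), "B")
for row in evidence:
    print(row)
```

Each source gets its own small loader, while identifier mapping and confidence scaling happen in one shared step, so adding a source means writing one parser rather than touching the rest of the pipeline.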

collaboration model

domain experts

what?

why?

problem

manpower

me

how?

technology

guidance

biodiversity

organisms

environments

Encyclopedia of Life

Biodiversity Heritage Library

what we need

the format is not important

the license is

Acknowledgments

Protein networks

Michael Kuhn
Damian Szklarczyk
Andrea Franceschini
Milan Simonovic
Alexander Roth
Sune Pletscher-Frankild
Jianyi Lin
Pablo Minguez
Christian von Mering
Peer Bork

Localization and disease

Sune Pletscher-Frankild
Alberto Santos
Janos Binder
Kalliopi Tsafou
Christian Stolte
Albert Palleja
Heiko Horn
Evangelos Pafilis
Reinhardt Schneider
Sean O’Donoghue