Talk at the CBM
TRANSCRIPT
Past, present and future of scientific literature search
Ramón Alonso-Allende
Science Cycle: write, search, read, experiment
Timeline: 1990's, 2000's, Today, Future
Today: Relevance + Complete + Easy - Time
Future: Search = Integration + Meaning + Social
Value system
Information systems (timeline: 1995, 2000, 2005, 2010)
[Figure: Searches in PubMed, 1997 to 2008; vertical axis: searches (1000s), 0 to 1,000,000]
Challenges
‣ Handling huge amounts of information.
‣ Ambiguity of language.
‣ Time.
‣ Keeping up to date.
jordinho_dp
[Figure: growth of GB, PDB, Medline and SwissProt, 1992 to 2007; vertical axis: 0 to 80,000,000]
A lot of heterogeneous information
43% of human genes have ambiguous names
Some data
‣ 5,892 terms can be either genes or diseases
‣ 3,963 names refer to 2 different genes
‣ One term refers to 114 genes
[Figure: number of terms (0 to 4,000) by number of concepts per term (2 to 9), for diseases, genes and drugs]
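A distribution like the one in the chart above can be tallied from a term-to-concept dictionary. The following is a minimal sketch, assuming a hypothetical `term_to_concepts` mapping; the entries are illustrative (borrowed from examples elsewhere in this talk), not the real dictionary behind the chart.

```python
from collections import Counter

# Hypothetical term -> concepts dictionary (illustrative entries only).
term_to_concepts = {
    "SCT": {"stem cell transplant", "secretin", "salmon calcitonin"},
    "PAP": {"PAP", "MRPS30", "PAPOLA"},
    "CAT": {"catalase", "cat (animal)"},
    "BRCA1": {"BRCA1"},  # unambiguous: maps to a single concept
}


def ambiguity_histogram(mapping):
    """Count how many ambiguous terms map to 2, 3, 4... concepts."""
    sizes = (len(concepts) for concepts in mapping.values())
    # Terms with a single concept are not ambiguous, so they are skipped.
    return Counter(n for n in sizes if n > 1)


histogram = ambiguity_histogram(term_to_concepts)
for n_concepts, n_terms in sorted(histogram.items()):
    print(f"{n_concepts} concepts: {n_terms} term(s)")
```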
Some examples
sps, AAt1
‣ stiff-man syndrome (Diseases or Syndromes)
‣ annuloaortic ectasia (Diseases or Syndromes)
‣ polystyrene sulfonate (Pharmacological substances)
‣ alanine aminotransferase (Genes and Proteins)
‣ systolic blood pressure (Biological functions)
‣ spermine synthase (Genes and Proteins)
Language ambiguity
Synonyms: different words for the same biomedical entity
• In humans there are at least 5,418 genes with synonyms (38% of the total genome)
• Drugs have a commercial name and a chemical name
Homonyms: the same name for different biomedical entities
Symbol PAP is an alias for:
• PAP (pancreatitis-associated protein)
• MRPS30 (mitochondrial ribosomal protein S30)
• PAPOLA (poly(A) polymerase alpha)
Acronyms: a shortened word representing a biomedical entity
SCT stands for:
• Stem cell transplant
• Secretin
• Salmon calcitonin
Unmanageable
‣ More than 25 million documents (scientific articles, grants, biomedical patents…) are relevant sources of information for biomedical researchers.
‣ 2,000 new scientific papers are published every day.
‣ It takes 5 years to read the new scientific material produced every 24 hours.
‣ Following a single disease, such as breast cancer, means scanning 130 journals and reading 27 articles per day.
Keeping up to date
‣ Search engine alerts
‣ Emailed eTOCs
‣ RSS feeds
[Figure: Search tasks & lab work by discipline; vertical axis: % time (0% to 80%); disciplines: All, Biochemistry, Mol. & Cell Biol., Genetics, Biotechnology, Bioinformatics, Medicine, Other; series: searching literature, searching data from DBs, working in the lab]
Roos, A., Kumpulainen, S., Järvelin, K. and Hedlund, T. (2008). "The information environment of researchers in molecular medicine". Information Research, 13(3), paper 353. [Available at http://InformationR.net/ir/13-3/paper353.html]
How we tackle the challenges
We tackle the challenges by:
‣ Integrating information for the user.
‣ Analyzing the text (text mining).
‣ Offering useful functionality.
‣ Technology + simple interface = less time
Data integration
Sequence DBs: UniProt, GenBank, RefSeq, PIR, EMBL, Entrez Protein, UniSTS
Gene DBs: GDB, Ensembl, Entrez Gene, UniGene, H-InvDB, MGC, HGNC
Pathway DBs: KEGG, EC, Reactome
Domain DBs: Pfam, PROSITE, SMART, ProDom, InterPro
Other DBs: Affymetrix, GO, PDB, MIM, CCDS, HPRD, HGNC
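One way to picture this integration is as a merge of per-database records that share a common gene identifier. The sketch below is an assumption about the general approach, not a description of the novo|seek pipeline: the three source lists, their field names and the `integrate` helper are all hypothetical, and only the GeneIDs, names and synonyms come from the talk.

```python
from collections import defaultdict

# Hypothetical records from three sources, keyed by a shared GeneID.
gene_db = [{"gene_id": 2688, "symbol": "GH1", "synonyms": ["GHN", "GH"]}]
sequence_db = [{"gene_id": 2688, "protein_name": "Growth Hormone 1"}]
other_db = [
    {"gene_id": 8836, "name": "Gamma Glutamyl Hydrolase", "synonyms": ["conjugase", "GH"]}
]


def integrate(*sources):
    """Merge records from several sources into one entry per GeneID."""
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            entry = merged[record["gene_id"]]
            for field, value in record.items():
                # The first source to provide a field wins; later ones only fill gaps.
                entry.setdefault(field, value)
    return dict(merged)


integrated = integrate(gene_db, sequence_db, other_db)
print(integrated[2688])
```

Letting the first source win is only one possible conflict policy; a real integration would need explicit rules for contradictory fields.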
Text mining
Gene: GH1 (Growth Hormone 1), GeneID: 2688
Synonyms: GHN, GH
Context terms (weight): adenoma (0.300), adipocyte (0.418), adipose (0.324), age-related (0.442), genotropin (19.368)
Gene: GGH (Gamma Glutamyl Hydrolase), GeneID: 8836
Synonyms: conjugase, GH
Context terms (weight): antifolate (2.850), carboxypeptidase (12.618), folate (0.674), gamma-glu-x (15.452), antifolylpoly-gamma-glutamate (12.054)
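The shared synonym GH can be resolved by looking at the words around it. The following is a minimal sketch, assuming the weights are simply summed over the context terms found in the text; the slide gives the weights but not how they are combined, so the scoring rule, the `disambiguate` function and the example sentence are all illustrative assumptions.

```python
# Context-term weights copied from the slide, keyed by GeneID.
CONTEXT_WEIGHTS = {
    2688: {  # Growth Hormone 1
        "adenoma": 0.300, "adipocyte": 0.418, "adipose": 0.324,
        "age-related": 0.442, "genotropin": 19.368,
    },
    8836: {  # Gamma Glutamyl Hydrolase
        "antifolate": 2.850, "carboxypeptidase": 12.618, "folate": 0.674,
        "gamma-glu-x": 15.452, "antifolylpoly-gamma-glutamate": 12.054,
    },
}


def disambiguate(text, candidates=CONTEXT_WEIGHTS):
    """Return (gene_id, score) for the candidate whose context terms best match the text."""
    # Naive tokenization: lowercase, drop commas and periods, split on whitespace.
    tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
    scores = {
        gene_id: sum(weight for term, weight in weights.items() if term in tokens)
        for gene_id, weights in candidates.items()
    }
    # Assumed scoring rule: highest summed weight wins; ties go to the first candidate.
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented example sentence, for illustration only.
sentence = "GH cleaves folate polyglutamates and modulates antifolate resistance."
print(disambiguate(sentence))
```

A text with no known context terms scores zero for every candidate, so a real system would need a threshold below which the mention is left unresolved.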
Indexed data
Medline abstracts: 18 M
Open access full text: 145,000
R&D project abstracts: 1.5 M
> 4 M concepts
> 200 M relations
Comparison: use case: looking for the gene SCT
PubMed: SCT is solid-cystic tumor
Google Scholar: SCT is the name of an author
novo|seek: SCT has the meaning you are looking for:
- Secretin
- Stem cell transplantation
novo|seek vs. Google Scholar
Google Scholar: no way to focus the search beyond reading: time-consuming
Technology
Semantic Search, Concept Relations, Knowledge Extraction, Discovery
‣ Search more efficiently.
‣ Extract more information.
‣ Relate different sources of information.
‣ Save time.
by L cornide
Semantic Search
‣ Conceptual search
e.g. Search for breast cancer:
Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome
Women who do not receive regular mammograms are more likely than others to have breast cancer diagnosed at an advanced stage
[…] thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line
All of these keywords refer to the same biomedical concept, so a search for breast cancer will retrieve these three documents.
‣ Use of context and semantic information to identify the relevant information
e.g. Search for CAT, which could refer to the enzyme catalase or to the animal, "cat":
[…] activity of antioxidant enzymes (GSH-Px, SOD, CAT) and content of malondialdehyde (MDA) were determined
[…] 26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes) […]
The same keyword refers to different biomedical concepts. Using the context, we can identify that only the first sentence is about an enzyme.
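Conceptual search can be sketched as mapping every surface form to a concept identifier and matching on concepts instead of keywords. This is a minimal illustration, not the novo|seek implementation: the `SYNONYMS` table, the concept identifier `C_BREAST_CANCER` and the substring matching are assumptions made for the example.

```python
# Hypothetical synonym dictionary: surface form -> concept identifier.
SYNONYMS = {
    "breast cancer": "C_BREAST_CANCER",
    "breast carcinoma": "C_BREAST_CANCER",
    "mammary carcinoma": "C_BREAST_CANCER",
}

DOCUMENTS = [
    "Detection of breast carcinoma cells in effusions is associated with rapidly fatal outcome",
    "Women who do not receive regular mammograms are more likely than others "
    "to have breast cancer diagnosed at an advanced stage",
    "thereby providing higher cytotoxicity against the 4T1 mouse mammary carcinoma cell line",
    "26 free-living lynx, 53 domestic cats, 28 dogs, 33 red foxes (Vulpes vulpes)",
]


def concepts_in(text):
    """Concept identifiers whose surface forms occur in the text (naive substring match)."""
    lowered = text.lower()
    return {concept for form, concept in SYNONYMS.items() if form in lowered}


def conceptual_search(query, documents):
    """Return the documents that share at least one concept with the query."""
    wanted = concepts_in(query)
    return [doc for doc in documents if wanted & concepts_in(doc)]


hits = conceptual_search("breast cancer", DOCUMENTS)
print(len(hits))
```

A plain keyword search for "breast cancer" would match only the second document; matching on the concept also retrieves the other two.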
Concept Relations
e.g. Search for Alzheimer's disease:
The apolipoprotein E gene (APOE) polymorphism genotyping has an allegedly important predictive value for coronary heart disorders and Alzheimer's disease.
Apolipoprotein E (apoE), a ligand for the low-density lipoprotein receptor family, has been implicated in modulating glial inflammatory responses and the risk of neurodegeneration associated with Alzheimer's disease.
Although many genes have been suggested to be associated with AD, with the exception of APOE, most polymorphic variants of potential risk exhibit a very weak association with AD
The protein apolipoprotein E and Alzheimer's disease are related with a relevance of 36%.
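The talk does not say how the relevance figure is computed. As one plausible stand-in, the sketch below scores a pair of concepts by how often they are mentioned together, using the Jaccard overlap of the documents that mention each. The function name, the scoring rule and the annotated documents are assumptions for illustration and will not reproduce the 36% above.

```python
def cooccurrence_relevance(concept_a, concept_b, annotated_docs):
    """Jaccard overlap: documents mentioning both concepts / documents mentioning either."""
    with_a = {i for i, concepts in enumerate(annotated_docs) if concept_a in concepts}
    with_b = {i for i, concepts in enumerate(annotated_docs) if concept_b in concepts}
    either = with_a | with_b
    return len(with_a & with_b) / len(either) if either else 0.0


# Hypothetical concept annotations, one set per document.
docs = [
    {"APOE", "Alzheimer's disease", "coronary heart disorder"},
    {"APOE", "Alzheimer's disease"},
    {"APOE", "low-density lipoprotein receptor"},
    {"Alzheimer's disease"},
]

score = cooccurrence_relevance("APOE", "Alzheimer's disease", docs)
print(f"{score:.0%}")
```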
Knowledge Extraction
‣ Based on the detected relations between concepts, we can automatically extract knowledge from text
e.g. Obtain the knowledge about breast cancer extracted from the literature:
[…] BRCA1 or BRCA2 […] Information was recorded on prophylactic mastectomy, prophylactic oophorectomy, use of tamoxifen […] had a bilateral prophylactic oophorectomy. […] breast cancer, 248 (18.0%) had had a prophylactic bilateral mastectomy. Among those who did not have a prophylactic mastectomy, only 76 women (5.5%) took tamoxifen and 40 women (2.9%) took raloxifene for breast cancer prevention. […].
The genes BRCA1 and BRCA2 are related to breast cancer. Tamoxifen and raloxifene are drugs used in its prevention and treatment, and mastectomy and oophorectomy are common procedures against it.
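Once relations and semantic categories have been detected, a summary like the one above amounts to grouping the related concepts by category. This is a minimal sketch under that assumption; the category labels and the `summarize` helper are hypothetical, and the input list is written by hand rather than extracted from text.

```python
from collections import defaultdict

# Hypothetical (concept, semantic category) pairs detected as related to breast cancer.
RELATED_TO_BREAST_CANCER = [
    ("BRCA1", "Genes"), ("BRCA2", "Genes"),
    ("tamoxifen", "Drugs"), ("raloxifene", "Drugs"),
    ("mastectomy", "Procedures"), ("oophorectomy", "Procedures"),
]


def summarize(relations):
    """Group related concepts by their semantic category, keeping input order."""
    summary = defaultdict(list)
    for concept, category in relations:
        summary[category].append(concept)
    return dict(summary)


profile = summarize(RELATED_TO_BREAST_CANCER)
for category, concepts in profile.items():
    print(f"{category}: {', '.join(concepts)}")
```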
Make new Discoveries
‣ Discover hidden relations between concepts that have not been described before in the scientific literature
e.g. Relating fish oil to Raynaud's syndrome through evidence extracted from the literature:
[…] meal fatty acids appear to be an important determinant of vascular reactivity, with fish oils significantly improving postprandial endothelium-independent vasodilation
Numerous studies have documented longer bleeding times and decreased platelet aggregation in subjects ingesting omega-3 fatty acids
vasomotor pain, in particular the fact of reactional vasodilation during Raynaud's syndrome, inflammation in the region surrounding zones of ischemic necrosis, and infection of ulcers
Objective judgement on effects of medicine in patients with Raynaud's phenomenon--measurement of cutaneous blood flow using laser Doppler flowmeter and platelet aggregation activity
By finding evidence of a relation between fish oils and both vasodilation and platelet aggregation, and evidence linking these two functions to Raynaud's syndrome, we can uncover a relation not previously described in the literature: the possible treatment of Raynaud's syndrome with fish oil.
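The reasoning above follows a two-step pattern: if A is related to B and B is related to C, but A and C are never linked directly, then A and C form a candidate discovery. The sketch below illustrates that pattern only; the relation set is written by hand from the example, and the function names are hypothetical.

```python
# Relations assumed to have been extracted from the literature (illustrative).
RELATIONS = {
    ("fish oil", "vasodilation"),
    ("fish oil", "platelet aggregation"),
    ("vasodilation", "Raynaud's syndrome"),
    ("platelet aggregation", "Raynaud's syndrome"),
}


def neighbours(concept, relations):
    """Concepts directly linked to `concept`, treating relations as undirected."""
    linked = set()
    for a, b in relations:
        if a == concept:
            linked.add(b)
        elif b == concept:
            linked.add(a)
    return linked


def hidden_links(start, relations):
    """Map each indirectly linked concept to the intermediate concepts connecting it."""
    direct = neighbours(start, relations)
    candidates = {}
    for middle in direct:
        for target in neighbours(middle, relations):
            # Keep only targets that are not already directly linked to the start.
            if target != start and target not in direct:
                candidates.setdefault(target, set()).add(middle)
    return candidates


found = hidden_links("fish oil", RELATIONS)
print(found)
```

Each candidate comes with the intermediate concepts that support it, which is the evidence a researcher would need to judge whether the hidden link is worth testing.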
The Future
‣ Structured information.
‣ User identifier.
‣ The article of the future.
‣ Social search.
Collective
Collaborative Q&A
Friend-Filtered
Social Search
http://www.readwriteweb.com/archives/3_flavors_of_social_search_what_to_expect.php
Beta testers
Collaborate in the development of one of the leading biomedical search engines on the market.
Access to the latest updates of our search engine.
A guaranteed gift.
www.novoseek.com/betatesters.html
Contact
Ramón Alonso-Allende
Marketing & Business [email protected]: +34 91 141 71 50