Biogen Idec Literature Informatics for Drug Discovery
William Hayes, PhD
Phoebe Roberts, PhD
March 19, 2007

Mission
• Provide
  – access to literature and text resources
  – tools to access and manage literature and text resources
  – expert analyses of literature and text resources
  – the most advanced tools and analyses available

Agenda
• Value Proposition
• Literature Informatics Overview
• Projects
• Summary

Value Proposition
• A recent trend in the industry is to cut the library to a bare operational staff - to manage E-journals and document delivery
• To do so eliminates our ability to make knowledgeable decisions for drug development

The Scope of the Literature Problem – You cannot keep up!
The annual worldwide production of information in publications is estimated as 8 TB in books, 25 TB in newspapers, 20 TB in magazines, and 2 TB in journals
Every minute scientific knowledge increases by 2,000 pages
It takes five years to read the new scientific material produced every 24 hours
80% of information is stored as unstructured text
The number of papers associated with a pharma target:
  – in 1990 = 100
  – in 2001 = 8

Library -> Literature Informatics
• Deliver information
• Requires variety of skill sets (Library science, operations, technical, informatics, domain expertise)

What is Literature Informatics?
• Applying data management and analytical technologies to extract and store knowledge from scientific/business literature
• Analytical technologies:
  – Information retrieval
  – Text mining
  – Semantic reasoning and inference
• Analytical objectives:
  – What protein interactions can be found in the corpus?
  – Which gene is expressed in a particular pathway with respect to a particular disease for a particular genetic group?
  – Which compounds inhibit a protein?
  – Which documents found are toxicology-related?
  – Show me all co-occurring genes and diseases
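The last objective, listing co-occurring genes and diseases, can be sketched in a few lines. This is only an illustration and not the tooling described in the talk: the term lists, the sample texts, and the function name `cooccurrences` are hypothetical placeholders, and matching is plain case-insensitive substring search rather than ontology-based entity recognition.

```python
from collections import Counter
from itertools import product

# Hypothetical placeholder lexicons; a real system would use curated ontologies.
GENES = ["geneA", "geneB"]
DISEASES = ["diseaseX", "diseaseY"]


def cooccurrences(abstracts):
    """Count the abstracts in which a gene term and a disease term both appear."""
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        genes = [g for g in GENES if g.lower() in lowered]
        diseases = [d for d in DISEASES if d.lower() in lowered]
        # Every gene/disease combination found in this abstract counts once.
        counts.update(product(genes, diseases))
    return counts


# Made-up texts, for illustration only.
abstracts = [
    "Expression of geneA was measured in diseaseX samples.",
    "geneA and geneB were both studied in diseaseX.",
    "geneB was examined in a diseaseY model.",
]
pairs = cooccurrences(abstracts)
```

Sorting `pairs.most_common()` would give a ranked list of gene-disease pairs, which is the kind of overview the objective asks for.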

Literature Informatics Benefits
• Much more efficient overview of research areas
  – Save significant time for individual researchers/the company
• Ability to effectively extract information from hundreds to millions of documents
• Greater than 10X improvements in speed of analysis and recall
• More value captured from $Millions spent on literature content and research

External vs Internal Research Dollars
• US Total: $94.3B (2003) (JAMA. 2005;294:1333-1342)
  – Public 43% - NIH (28%), Other Federal (7%), State/local gov (5%), Charity (3%)
  – Private 57% - Pharma (29%), Biotech ~1500 companies (19%), Device (9%)
• Pfizer R&D (2004)
  – $8B (3.5X of Pfizer spend from one funding agency!)
• Biogen Idec - 3rd largest biotech
  – $684M (2004) R&D (0.7% of US Total)

Number of Papers Published (from Pubmed)
             2002    2003    2004    2005
Medline      550538  580725  619626  681899
Pfizer       381     444     490     460
Biogen Idec  70      53      51      52
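To make the comparison in the table concrete, the sketch below computes each company's share of all Medline papers per year. The counts are copied from the slide; the variable and function names are illustrative.

```python
# Paper counts per year, copied from the slide's table.
YEARS = [2002, 2003, 2004, 2005]
MEDLINE = [550538, 580725, 619626, 681899]
COMPANIES = {
    "Pfizer": [381, 444, 490, 460],
    "Biogen Idec": [70, 53, 51, 52],
}


def share_of_medline(counts):
    """Return {year: fraction of that year's Medline papers}."""
    return {year: n / total for year, n, total in zip(YEARS, counts, MEDLINE)}


shares = {name: share_of_medline(counts) for name, counts in COMPANIES.items()}

for name, by_year in shares.items():
    for year, fraction in by_year.items():
        print(f"{name} {year}: {fraction:.4%} of Medline")
```

Every share comes out well below one percent, which is the slide's underlying point: nearly all relevant literature is produced outside the company.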

Text Analytics Financial Analysis
• Given 1000 researchers
• 22% of time spent searching and analyzing literature (Outsell survey 2002)
• 220 person-years per year analyzing literature
  – $22M / year
• A significant percentage of that time is recoverable using advanced text analytics and expert analysts
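The figures above can be reproduced with simple arithmetic. In this sketch the cost per person-year is not stated on the slide; $100,000 is the value implied by dividing $22M by 220 person-years, and it is labeled as an assumption. The helper `recoverable_cost` and its percentage argument are hypothetical, since the slide gives no number for how much time can be recovered.

```python
RESEARCHERS = 1000               # from the slide
LITERATURE_PERCENT = 22          # Outsell survey 2002, from the slide
COST_PER_PERSON_YEAR = 100_000   # assumption: implied by $22M / 220 person-years

# Person-years spent on literature each year.
person_years = RESEARCHERS * LITERATURE_PERCENT // 100

# Annual cost of that time in dollars.
annual_cost = person_years * COST_PER_PERSON_YEAR


def recoverable_cost(percent_recovered):
    """Dollars recovered per year for a given (hypothetical) percentage of time saved."""
    return annual_cost * percent_recovered // 100


print(person_years, annual_cost)
```

Calling `recoverable_cost` with any percentage the reader considers realistic gives the corresponding annual saving under these assumptions.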

Front-loading Safety Concerns
• Lead optimization (LO) costs ~$126M (Tufts survey)
• LO projects take between 2-4 years
• ~50% LO projects undergo attrition due to safety concerns (Tufts survey results)
• ~50% of safety issues had literature indicators at beginning of project (anecdotal evidence)
• $25M per 4 LO projects can be recovered IF comprehensive literature analyses can speed up Safety analyses by 20%
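The slide does not show how the $25M figure is derived. One reading that reproduces it, offered here as an assumption rather than the presenters' stated calculation, treats the 20% speed-up as recovering 20% of the LO cost for each project whose safety issue had an early literature indicator.

```python
LO_COST_MUSD = 126           # lead optimization cost in $M, from the slide
PROJECTS = 4                 # the slide's "per 4 LO projects"
SAFETY_ATTRITION = 0.5       # ~50% of LO projects fail on safety, from the slide
LITERATURE_INDICATOR = 0.5   # ~50% of those had literature indicators, from the slide
SPEEDUP = 0.2                # 20% faster safety analyses, from the slide

# Projects (out of 4) expected to fail on a safety issue flagged in the literature.
projects_affected = PROJECTS * SAFETY_ATTRITION * LITERATURE_INDICATOR

# Assumption: the speed-up recovers the same fraction of the LO cost.
savings_musd = projects_affected * LO_COST_MUSD * SPEEDUP

print(f"{projects_affected:.0f} project(s), about ${savings_musd:.1f}M recovered")
```

Under this reading one project in four is affected and the recovered amount is about $25M, matching the slide.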

Text Analytics Impact
• Case 1: start with an unknown protein, determine interaction network. No standard procedure without NLP tools – estimated 2-3 weeks of manual mining. With an NLP tool that extracts connectivity information w/ graph visualization from full-text journal articles – 1 hour
• Case 2: determine toxicity patterns for a compound, or determine toxicity side-effects of inhibiting a target. With manual OVID search – library scientists have already put in 3 months, a total of a year estimated. With NLP+ontologies (OBIIE) – 2-3 weeks.
• Case 3: An unknown protein is somehow linked to a known disease. There is a lot of disease literature, but only 4 papers on the protein. Establish a plausible connection of mechanism of action with this disease. Without NLP – indefinite. With OBIIE – 2-3 weeks.

The Analyst’s Role
• Understand questions asked, problems encountered
  – Too much information
  – Not enough information
  – Relevant information is buried
• Match resources to needs
  – Protein-centric versus pipeline?
  – Better clinical or chemistry coverage?
• Know search logic and available tools
• Pre-screen end-user tools

The Analyst’s Role (continued)
• Link disparate resources for improved coverage
• Repackage results to match question, user preferences
• Never lose sight of user experience
  – Alleviate tedium
  – Minimize error
  – Increase relevance
  – Make them look good
• Raise awareness of previously unanswerable questions

Drug Discovery & Due Diligence Information Requirements
• Set up alerts/RSS feeds on company, compound, clinical trial info, etc.
• What’s in clinic for indication, trial info/protocols and stage of trial
• Safety issues
• Potential alternative indications
• Biomarkers
• Toxicities of compounds for indication
• Potential consultants, collaboration map
• More comprehensive searches for research, development, pharmacodynamics, clinical trials, adverse events, etc.

Typical Text Mining Workflow
Text documents -> Retrieval/Storage -> Text docs -> Text Processing -> Text docs -> Feature Extraction -> Numerical Feature Vectors -> Data Mining
• Retrieval/Storage: Indexing, Access Drivers, Storage
  – Retrieve and organize relevant documents
• Text Processing: Stemming, Stop-word filters, Pattern filters, Lexicon matching, Ontologies, NLP parsing, etc.
  – Pre-process documents to enhance the ease of feature extraction
• Feature Extraction
  – Statistical: Word Counts, Pattern Extraction & Counts, etc.
  – Domain-specific: Gene Name counts, etc.
  – NLP-specific: Phrase counts, etc.
  – Features are summarized into vector forms which are suitable for data mining
• Data Mining: Classification, Clustering, Association, Statistical Analysis, Visual Analysis, etc.
  – Results can be document characterization or hidden relationship extraction
• Using workflow technologies to build text mining applications using finer grain components/services
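The workflow stages can be sketched end to end in a small script. This is a toy version under stated assumptions: the stop-word list is hypothetical, `stem` is a crude suffix stripper standing in for a real stemmer, features are plain word counts, and cosine similarity stands in for the data mining stage. None of the function names come from the tools named in the talk.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "to"}


def stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Text processing: tokenize, drop stop words, stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def vectorize(docs):
    """Feature extraction: word counts over a shared, sorted vocabulary."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def cosine(u, v):
    """Data mining step (simplified): similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Made-up documents for illustration only.
docs = [
    "The antibody blocked binding",
    "Antibodies blocking the receptor",
    "Clinical trials of compounds",
]
vocab, vectors = vectorize(docs)
```

Here the first two documents share the stem "block" and so score a non-zero similarity, while the first and third share no terms. The crude stemmer does not unify "antibody" and "antibodies", which is one reason the slide also lists lexicon matching and ontologies among the processing steps.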

Overview
• Collect
  – Quosa
  – Medline
• Explore
  – Biovista
• Extract
  – Linguamatics I2E
• Infrastructure
  – KDE

Quosa
• Federated search/alerts
• Localize full-text papers
• Find information not found in abstracts (kinetic parameters, experimental protocols, etc.)
• Manage literature
• Collaborate
• Analyze literature sets
• Develop corpora for other applications to analyze

Biovista Interactive Co-occurrence Analysis
• Basic Research
  – Target expansion and off-target effects
  – Experimental design
  – Going fishing
  – Finding connections between known facts
  – Comprehensive summary of a research area
  – Collaboration
• Clinical Development
  – Drug-Drug interactions
  – Timeline studies
  – Side effects to worry about
• Intellectual Property
  – Analyze issued patents
• Competitive Intelligence

Linguamatics I2E
• Fact search engine
• Uses semantic entity types coupled with syntactic search criteria for relationship extraction
• Agile NLP application

Inforsense KDE
• Text Mining Infrastructure
• Text/Data workflow environment

Use case 1: Where are early licensing opportunities in academia?
Goal: identify areas of research that could yield potential therapeutics
Criteria:
• some efficacy is established in the form of testing in animal models
• Pre-IND filing
Approach: Survey the literature for papers that describe in vivo testing of reagents that affect a particular biology (e.g. immunity, neurology or tumor growth)

Paint a picture of the desired target
• Use internal projects to develop search criteria
• Four early-stage projects each have 5-10 papers describing neutralizing antibodies
• The papers mention an indication only half the time
• The papers always mention tissues and cell types
• Antibodies are described in a limited number of ways
• The target of the antibody is almost always in the same sentence as the antibody term
• The ability of an antibody to block function is described in a limited number of ways

Use the desired features to construct a search
Antibody and protein terms in the same sentence
Block/neutralize and variations somewhere in the abstract
Nervous tissues somewhere in the abstract
“a neutralizing monoclonal antibody against IL-1 beta was infused into the wound immediately following the injury”
“a neutralizing monoclonal antibody directed against MMP-9 was administered intravenously”
“anti-rat neutralizing IL-1 beta antibody (anti-IL-1 beta) or control immunoglobulin G antibody (IgG) was microinjected”
“potent blocking of p75 binding occurs only with MAb 909”
“an antibody that blocks erbB2/neu-mediated signaling inhibited vestibular ganglion neuron viability”
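The three search criteria can be approximated with regular expressions. This is a rough sketch, not the Linguamatics I2E query used in the talk: the protein and tissue term lists are hypothetical and tiny, sentence splitting is naive, and the function name `matches` is illustrative.

```python
import re

# Antibody terms: "antibody", "antibodies", "MAb" (case-insensitive).
ANTIBODY = re.compile(r"\b(?:antibod(?:y|ies)|mab)\b", re.IGNORECASE)
# Block/neutralize and variations: blocks, blocking, neutralizing, neutralised, ...
BLOCKING = re.compile(r"\b(?:block|neutrali[sz])\w*", re.IGNORECASE)

# Hypothetical lexicons, seeded from the example sentences on the slide.
PROTEINS = ["IL-1 beta", "MMP-9", "p75", "erbB2"]
TISSUES = ["spinal cord", "neuron", "ganglion", "brain"]


def contains_any(text, terms):
    """Case-insensitive substring match against a term list."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in terms)


def matches(abstract):
    """Apply the slide's three criteria to one abstract."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", abstract)
    same_sentence = any(
        ANTIBODY.search(s) and contains_any(s, PROTEINS) for s in sentences
    )
    blocking_anywhere = BLOCKING.search(abstract) is not None
    tissue_anywhere = contains_any(abstract, TISSUES)
    return bool(same_sentence and blocking_anywhere and tissue_anywhere)


hit = ("An antibody that blocks erbB2/neu-mediated signaling inhibited "
       "vestibular ganglion neuron viability.")
miss = "An antibody was raised in rabbits. MMP-9 activity blocks repair in spinal cord."
print(matches(hit), matches(miss))
```

The second text fails only the same-sentence criterion: the antibody term and the protein term appear in different sentences, which is the distinction the slide's first search feature is designed to capture.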

Search Results

Use Case 2: the Gene List
Official Name  BIIB name
Itga4          Tysabri
Itgb1          Tysabri
Tnfsf13b       BAFF
Tdgf1          Cripto
Cd80           Galiximab
Fcer2a         Lumiliximab
– Generated by biomarker studies, toxicity studies, central to translational medicine
– Often hundreds of genes
– Official names are obscure
– Finding all the names, and the most common name, is hard
– On average, one a week
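Translating official gene names into the names people actually use is, at its simplest, a dictionary lookup. The sketch below uses only the six pairs shown on the slide; the function name `annotate` and the case-insensitive matching are illustrative choices, and a real gene list of hundreds of entries would need a full synonym source.

```python
# Official gene name -> internal (BIIB) name, copied from the slide's table.
BIIB_NAMES = {
    "Itga4": "Tysabri",
    "Itgb1": "Tysabri",
    "Tnfsf13b": "BAFF",
    "Tdgf1": "Cripto",
    "Cd80": "Galiximab",
    "Fcer2a": "Lumiliximab",
}

# Lower-cased index so that "ITGA4" and "itga4" both resolve.
_INDEX = {official.lower(): name for official, name in BIIB_NAMES.items()}


def annotate(gene_list):
    """Pair each official name with its familiar name, or None if unknown."""
    return [(gene, _INDEX.get(gene.lower())) for gene in gene_list]


# "Xyz1" is a made-up name that is deliberately absent from the table.
result = annotate(["ITGA4", "Cd80", "Xyz1"])
print(result)
```

Unknown names come back paired with `None`, so an analyst can see at a glance which entries still need manual lookup.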

A Literature Analytics Workflow
• Find Relevant Genes from Online Databases
• Find Associations between Frequent Terms
• Gene Expression Analysis

Visualizing search results and information within yields new insights
• Paging through abstracts one by one doesn’t show the big picture:
  – Who’s collaborating with whom?
  – Who’s patenting their work?
  – When did the field develop and mature?
  – Who are the opinion leaders?
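The first question, who is collaborating with whom, reduces to counting how often two authors appear on the same paper. The sketch below builds those pair counts, which are the edges a network visualization would draw. The author names and papers are made up, and `coauthor_pairs` is an illustrative name rather than a function from any tool mentioned in the talk.

```python
from collections import Counter
from itertools import combinations


def coauthor_pairs(papers):
    """Count, for every pair of authors, the papers they share."""
    pairs = Counter()
    for authors in papers:
        # Sort and de-duplicate so (A, B) and (B, A) are the same edge.
        for a, b in combinations(sorted(set(authors)), 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up author lists, one per paper, for illustration only.
papers = [
    ["Author A", "Author B", "Author C"],
    ["Author A", "Author B"],
    ["Author C", "Author D"],
]
pairs = coauthor_pairs(papers)

for (a, b), n in pairs.most_common():
    print(f"{a} - {b}: {n} shared paper(s)")
```

The same counting approach applies to author-affiliation or author-patent relations if each record lists the entities to be linked.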

[Figure not available: 1934 Author/Affiliations, 8893 relations. Blue = Aurora Kinases, Green = Cancer lit, Red = Patents lit]

Where do we need to be?
• Spend less time acquiring, more time assimilating
• Provide domain experts with powerful literature analytics
• Mix/match best of breed applications for combining text/data mining
• Need knowledge discovery/exploitation environment that supports rapid construction of integrated text/data results for researchers

Acknowledgements
• Connie Matsui
• June Ivey
• Pam Gollis
• Harry Bochner
• Adrean Andreas
• Cindy Shamel
• Steve French
• Research Informatics