integrating patents with research data

18
Nicko Goncharoff ACS Fall 2012 Andrew Hinton, PhD 19 August Christopher Southan, PhD Integrating patent chemistry with public and private non-patent research resources

Upload: chris-southan

Post on 12-Feb-2015

530 views

Category:

Technology


2 download

DESCRIPTION

SureChem ACS 2012. Presented by Nico on behalf of all three authors. The data is searchable at https://open.surechem.com/login. Related information included recent posts at http://cdsouthan.blogspot.se/

TRANSCRIPT

Page 1: Integrating Patents with Research Data

Nicko Goncharoff ACS Fall 2012Andrew Hinton, PhD 19 AugustChristopher Southan, PhD

Integrating patent chemistry with public and private non-patent research resources

Page 2: Integrating Patents with Research Data
Page 3: Integrating Patents with Research Data
Page 4: Integrating Patents with Research Data

Database of automatically mined structure data from text and images

•20M annotated US, EP, WO full text records and Japan patent abstracts•12M unique chemical structures•MEDLINE – 19M abstracts (coming Q4)

SureChem Data Collection

I

Page 5: Integrating Patents with Research Data

Free resource for researchers Enables linking to public and

proprietary content

Professional search needs Data export, alerts, patent family

search, chemical relevance filters…

API or Data Feed access to chemistry & full text

Integrate with internal databases & workflows

Page 6: Integrating Patents with Research Data

Chemistry Mining Workflow

Page 7: Integrating Patents with Research Data

Public Patent Chemistry Landscape

Page 8: Integrating Patents with Research Data

Current Patent Sources In PubChem

0

500000

1000000

1500000

2000000

2500000

3000000

3500000

4000000

EPO(Sling) Chemicalize.org IBM ThompsonPharma

Nu

mb

ers

of

SID

's

10 K280 K

2.3 M

3.7 M

Thomson

Page 9: Integrating Patents with Research Data

Patent & Literature Sources in PubChem

52,975

67,437129,448

3,291,940 515,745

2,113,169

281,920

ChEMBL + PubMed + Journals

918,077 45% lead-like

IBM, pre-2000 patents 2,369,481 32% lead-like

The Big ThreeThomson Pharma,

patents and literature 3,756,283

41% lead-like

Page 10: Integrating Patents with Research Data

SureChem to Deposit All Structures* into PubChem - 2012

•1976 to present

•Deposition of structures only

•View related patents in SureChemOpen

•*Some filtering of common chemistry likely

Page 11: Integrating Patents with Research Data

SureChem and IBM in PubChem (2 Example Patents)

US583593, Inhibitors of squalene synthetase and protein

farnesyltransferase. Abbott

686 93146

IBM Total: 239SureChem Total: 832

WO-1994018188-A1 4-hydroxy-benzopyran-2-ones and 4-

hydroxy-cycloalkyl[b]pyran-2-ones HIV protease inhibitors, Upjohn

SureChem Total: 776

298478 229

IBM Total : 527

Page 12: Integrating Patents with Research Data

Identifying Relevant Chemistry - IC50

US-20120035195-A1 BACE2, Hoffman LaRoche

Page 13: Integrating Patents with Research Data

Structures with IC50 Values

PDF SureChemOpen Excel

US-20120035195-A1

Page 14: Integrating Patents with Research Data

Search IC50 Structures in PubChem

search

Page 15: Integrating Patents with Research Data

SureChem Pubchem (ThomsonPharma ,

Chemicalize)

9679

Stage No. of Structures

Available from SureChem (SC) 1848

Pre-Exist in PubChem 669

Pre-Exist – not from IC50 table 573

Pre-Exist – from IC50 table 96 (12 from TP + 84 via chemicalize.org)

Unique-SC with IC50 79

Unique-SC – beyond IC50 table 1100

SureChem Unique Contribution

Page 16: Integrating Patents with Research Data

http://opentox.informatik.uni-freiburg.de/ches-mapper/

Identifying Relevant Chemistry

Patent US-20120035195-A1

Page 17: Integrating Patents with Research Data

SureChem Chemical Relevance Filtering

• Frequency counts of chemicals within patents• Additional molecular property filtering i.e. Lipinski descriptors• Natural Language Processing – based indexing of Exemplified Compounds

Automated indexing of Exemplified Compounds in text

Page 18: Integrating Patents with Research Data

Conclusion

SureChem deposition into PubChem will

– Significantly expand public patent chemistry scope– Contribute unique and timely MedChem-relevant data– Enable open drug discovery and chemical biology– Advance progress toward a more open, federated

chemical information network