new generation of patent sequence databases · patent families gm671154 ada42650 cs017585 acq13114...


Post on 13-Jul-2020


TRANSCRIPT

Page 1: New generation of patent sequence databases · Patent families GM671154 ADA42650 CS017585 ACQ13114 DI603183 AAR79155 DD649656 100% identical sequences Invention A Invention B HB492658

EBI is an Outstation of the European Molecular Biology Laboratory.

New generation of patent sequence databases

Information Sources in Biotechnology

Japan

Page 2: Patent-related resources

http://www.ebi.ac.uk/ (Resources, Patents)

Page 3: Patent resources at EBI

http://www.ebi.ac.uk/patentdata/

Page 4: Patent resources at EBI

Patent proteins: EPO, USPTO, JPO, KIPO

Patent nucleotides: ENA (EPO, USPTO, JPO, KIPO)

Non-redundant sequence data:

• Same sequences (EPO, USPTO, JPO, KIPO)

• Patent family classification

• Enriched with patent information

Page 5: Sequence data from patent literature

[Diagram: patent offices (USPTO, EPO, KIPO, JPO, other patent offices); INSDC (NCBI GenBank, NIG DDBJ, EBI EMBL-Bank); NR patent sequence databases]

INSDC agreement:

• Free unrestricted access

• All data exchanged daily

Page 6: Non-redundant patent databases

http://www.ebi.ac.uk/patentdata/

Level-1: groups together 100% identical patent sequences

• Patent nucleotides: NRNL1 (Non-redundant nucleotide level-1)

• Patent proteins: NRPL1 (Non-redundant protein level-1)

Level-2: groups together identical sequences by patent family

• Patent nucleotides: NRNL2 (Non-redundant nucleotide level-2)

• Patent proteins: NRPL2 (Non-redundant protein level-2)

Page 7: Patent sequence record in NRNL1

Record contents highlighted on the slide:

• Patents containing the 100% identical sequence

• Sequence

Page 8: Patent sequence record in NRNL2

Record contents highlighted on the slide:

• Patent equivalents

• Sequence record in ENA

• Patent literature

• Priority number and date

• Translation

• Sequence

Page 9: Non-redundant patent databases

www.ebi.ac.uk

1) EMBL patents (redundant)

2) Remove sequence redundancy: Level-1 NR

3) Group by patent families: Level-2 NR

Additional annotation, including priority dates for patent families

Page 10: Patent sequence records at EBI

Nucleotide:

• ENA: ~23.9 M PAT sequences (>230 M total)

• NRNL1: ~12.2 M sequences

• NRNL2: ~15.5 M sequences

Protein:

• Patent Proteins: ~6.5 M PRT sequences (>32 M total)

• NRPL1: ~2.5 M sequences

• NRPL2: ~3.8 M sequences
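As a rough worked example, the counts on this slide can be turned into the percentage of records removed at each level. This sketch assumes the smaller figure in each pair belongs to level-1 and the larger to level-2 (level-2 splits each level-1 group by patent family, so it cannot be smaller); the dictionary layout and the `reduction` helper are illustrative names, not part of any EBI tool.

```python
# Approximate counts in millions, as quoted on the slide.
# Assumption: the smaller non-redundant figure is level-1, the larger is level-2.
counts = {
    "nucleotide": {"redundant": 23.9, "level1": 12.2, "level2": 15.5},
    "protein": {"redundant": 6.5, "level1": 2.5, "level2": 3.8},
}


def reduction(redundant, non_redundant):
    """Return the percentage of records removed relative to the redundant set."""
    return 100.0 * (1.0 - non_redundant / redundant)


for kind, c in counts.items():
    for level in ("level1", "level2"):
        pct = reduction(c["redundant"], c[level])
        print(f"{kind} {level}: {pct:.1f}% fewer records than the redundant set")
```

Under that assumption, level-1 roughly halves the nucleotide set, and the reduction is larger still for proteins.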

Page 11: Sequence search

Page 12: Sequence searching

http://www.ebi.ac.uk/

Tools: Sequence Similarity & Analysis

Page 13: Sequence searching

www.ebi.ac.uk/Tools/sss/

Wide variety of search tools

Page 14: Choosing the right search engine

• FASTA: better general search engine

• SSEARCH: sensitive but slow; good for short sequences

• GGSEARCH: force full-length matches (query and subject aligned over their full lengths)

• GLSEARCH: match domains/patterns to protein; oligo-to-gene (full-length query aligned within a longer subject)

• BLAST: general search engine
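The guidance on this slide can be captured as a small lookup. This is only an illustrative sketch: the need labels and the `choose_engines` helper are hypothetical names, and the mapping simply restates the slide rather than any official recommendation logic.

```python
# Search need -> engines suggested on the slide (labels are hypothetical).
ENGINES_FOR_NEED = {
    "general": ["FASTA", "BLAST"],   # FASTA is described as the better general engine
    "short_query": ["SSEARCH"],      # sensitive but slow
    "full_length": ["GGSEARCH"],     # forces full-length matches
    "domain_or_oligo": ["GLSEARCH"], # domains/patterns to protein, oligo-to-gene
}


def choose_engines(need):
    """Return the engines suggested for a need, falling back to the general ones."""
    return ENGINES_FOR_NEED.get(need, ENGINES_FOR_NEED["general"])


print(choose_engines("short_query"))
print(choose_engines("something else"))
```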

Page 15: Search a variety of databases

Protein: patent databases

*Selecting all 6 patent databases returns results in triplicate!

Page 16: Search a variety of databases

Nucleotide: patent data

*Selecting all 3 patent databases returns results in triplicate!

Page 17: Let’s look at an example…

Page 18: Searching a redundant database

http://www.ebi.ac.uk/Tools/sss/

Example: search a patent protein sequence

Database selected: Protein, Patent proteins

Page 19: Results from a redundant database

>260 identical results: too much to analyze

Page 20: LEVEL-1 NR patent sequence database

• Removes redundancy

• Fewer results to analyze, less chance of missing important results
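To make the level-1 idea concrete, here is a minimal sketch of collapsing 100% identical sequences into one entry that lists every patent containing the sequence. It assumes records are simple (accession, patent number, sequence) tuples; the accessions, patent numbers, sequences and the `group_level1` function are all hypothetical and do not reflect how the EBI databases are actually built.

```python
from collections import defaultdict


def group_level1(records):
    """Group (accession, patent, sequence) records by exact sequence identity."""
    groups = defaultdict(list)
    for accession, patent, sequence in records:
        # Compare case-insensitively so the same residues always share one key.
        groups[sequence.upper()].append((accession, patent))
    return dict(groups)


# Hypothetical toy records: three copies of one sequence plus one variant.
records = [
    ("ACC1", "EP-0000001", "MKTAYIAK"),
    ("ACC2", "WO-0000002", "MKTAYIAK"),
    ("ACC3", "US-0000003", "mktayiak"),
    ("ACC4", "JP-0000004", "MKTAYIAR"),
]

level1 = group_level1(records)
for sequence, members in level1.items():
    patents = [patent for _, patent in members]
    print(sequence, len(members), patents)
```

A search against the grouped data returns each distinct sequence once, with the list of patents attached, instead of one hit per copy.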

Page 21: Searching NR level-1 patent database

http://www.ebi.ac.uk/Tools/sss/

Example: search a patent protein sequence

Database selected: NR patent level-1

Page 22: Results from NR level-1 database

Each hit unique

Page 23: Results from NR level-1 database

• Link to patent documentation

• List of all patents containing the sequence

• Link to sequence entry

• Earliest publication date

Page 24: Patent families

A simple patent family is a group of patents that relate to the same invention and are based on the same originating application.

Patent families arise when an invention is patented in multiple countries.

Grouping patents into families reduces multi-national results to a single representative member.

Page 25: Patent families

[Diagram: 100% identical sequences GM671154, CS017585, ACQ13114, DI603183, AAR79155, DD649656, ADA42650, HB492658; patents EP, WO, US, US, JP; Invention A (patent family) and Invention B (second patent family)]

The same sequence can appear multiple times in a database because:

• The same invention is filed multiple times in different offices (same patent family)

• Different inventors use the same sequence in different contexts (different patent families)

Page 26: LEVEL-2 NR patent sequence database

• Groups identical sequences by patent family

• Provides earliest priority date for family
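The level-2 grouping can be sketched in the same spirit: identical sequences are grouped per patent family, and each group keeps the earliest priority date seen. The record fields, family identifiers, dates and the `group_level2` function below are invented for illustration and are not the actual EBI schema or pipeline.

```python
from collections import defaultdict
from datetime import date


def group_level2(records):
    """Group records by (sequence, family) and track the earliest priority date."""
    groups = defaultdict(lambda: {"patents": [], "earliest_priority": None})
    for rec in records:
        entry = groups[(rec["sequence"].upper(), rec["family"])]
        entry["patents"].append(rec["patent"])
        prio = rec["priority_date"]
        if entry["earliest_priority"] is None or prio < entry["earliest_priority"]:
            entry["earliest_priority"] = prio
    return dict(groups)


# Hypothetical toy data: one sequence used in two inventions (two families).
records = [
    {"patent": "EP-A", "family": "FAM-A", "sequence": "MKTAYIAK", "priority_date": date(2001, 3, 1)},
    {"patent": "WO-A", "family": "FAM-A", "sequence": "MKTAYIAK", "priority_date": date(2001, 3, 1)},
    {"patent": "US-A", "family": "FAM-A", "sequence": "MKTAYIAK", "priority_date": date(2000, 11, 20)},
    {"patent": "US-B", "family": "FAM-B", "sequence": "MKTAYIAK", "priority_date": date(2004, 6, 15)},
    {"patent": "JP-B", "family": "FAM-B", "sequence": "MKTAYIAK", "priority_date": date(2004, 6, 15)},
]

level2 = group_level2(records)
for (sequence, family), entry in level2.items():
    print(sequence, family, entry["patents"], entry["earliest_priority"])
```

With this toy input, level-1 would give a single entry for the sequence, while level-2 gives two entries, one per invention.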

Page 27: Searching NR level-2 patent database

http://www.ebi.ac.uk/Tools/sss/

Example: search a patent protein sequence

Database selected: NR patent level-2

Page 28: Results from NR level-2 database

Each hit = one family

Page 29: Results from NR level-2 database

• Patent equivalents

• Earliest publication date in family

• Earliest active priority date in family

Page 30: Results from NR level-2 database

• Link to patent documentation

• Link to sequence entry

• Patents in same family

Page 31: Text search

Page 32: SRS: advanced text search

http://www.ebi.ac.uk/srs/

1st: Select resources to search

2nd: Create query

Page 33: SRS: advanced text search

Sequence Searching Tools

Select library tab

Page 34: SRS: advanced text search

Select library tab

Search >100 databases, including:

• NR patent DNA (NRNL1 & NRNL2)

• NR patent proteins (NRPL1 & NRPL2)

Page 35: SRS: advanced text search

Select library tab

Search >100 databases

Example: selected to search the NR level-1 patent DNA database

Page 36: SRS: advanced text search

Select library tab

Select resources to search

Page 37: SRS: advanced text search

Select library tab

Select resources to search

1) Select field

2) Type in text

Page 38: SRS: advanced text search

Select library tab

Select resources to search

Here, the selected field is patent number

Page 39: SRS: advanced text search

Select library tab

Select resources to search

Create query

Page 40: SRS: advanced text search

Select library tab

Select resources to search

Create query

Lists non-redundant nucleotide sequences from WO0146262

Page 41: SRS: advanced text search

Select library tab

Select resources to search

Create query

WO0146262 sequences

Page 42: SRS: advanced text search

Select library tab

Select resources to search

Create query

WO0146262 sequences

WO0146262 nucleotide sequence record in NRNL1: details which other patents also claim this sequence (with NRNL2, would see family grouping)

Page 43: SRS: advanced text search

Select library tab

Select resources to search

Create query

WO0146262 sequences

NRNL1 sequence record

Page 44: SRS: advanced text search

http://www.ebi.ac.uk/srs/

Select library tab

Select resources to search

Create query

WO0146262 sequences

NRNL1 sequence record

WO0146262 literature

Page 45: SRS: advanced text search

• EMBL-Bank: find all sequences associated with a patent

• NRNL1: find all sequences associated with a patent + identify all patents associated with each sequence

• NRNL2: find all sequences associated with a patent + identify all patents in the same family associated with each sequence
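The three kinds of lookup summarised on this slide can be illustrated on in-memory toy data. This is a sketch only: it does not call SRS or any EBI service, and the record fields, patent identifiers, family labels and function names are hypothetical.

```python
# Hypothetical toy records: one row per (patent, sequence) occurrence.
records = [
    {"patent": "WO-A", "family": "F1", "sequence": "ACGTACGT"},
    {"patent": "US-A", "family": "F1", "sequence": "ACGTACGT"},
    {"patent": "JP-B", "family": "F2", "sequence": "ACGTACGT"},
    {"patent": "WO-A", "family": "F1", "sequence": "TTGACCAA"},
]


def sequences_for(patent):
    """Redundant-style lookup: all sequences associated with a patent."""
    return sorted({r["sequence"] for r in records if r["patent"] == patent})


def patents_sharing(sequence):
    """Level-1-style lookup: all patents associated with a sequence."""
    return sorted({r["patent"] for r in records if r["sequence"] == sequence})


def family_members_sharing(sequence, family):
    """Level-2-style lookup: patents in one family associated with a sequence."""
    return sorted(
        {r["patent"] for r in records
         if r["sequence"] == sequence and r["family"] == family}
    )


for seq in sequences_for("WO-A"):
    print(seq, patents_sharing(seq), family_members_sharing(seq, "F1"))
```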

Page 46: For more information

http://www.ebi.ac.uk/patentdata/ (Non-redundant)

Page 47: For more information

• Publication

• User Manual

Page 48: Help

Contacts: http://www.ebi.ac.uk/support/