Using Network Processors in Genomics

Herbert Bos* †

Kaiming Huang*

{herbertb,khuang}@liacs.nl

* Leiden Universiteit, Netherlands
† Vrije Universiteit, Netherlands

http://www.liacs.nl/~herbertb/projects/biocomp/

H. Bos – Leiden University 13/02/2004 1

Case study: BLAST

● search nucleotide/protein database for query
● BLAST discovers similarity rather than exact match
● two main phases:
1. scoring (registering where query and DNADB match)
2. alignment (dynamic programming)
● only the first phase on NPUs

Window matching

● naïve approach: roughly W*N*M comparisons
● does not scale
● string search algorithms: Aho-Corasick
– all windows matched at the same time
– shifting genome one nucleotide at a time
– matching algorithm transformed into a DFA
● DFA may be quite large
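
To make the W*N*M estimate concrete, here is a minimal Python sketch of the naïve approach. The slide does not define the symbols, so the sketch assumes W is the window size, M the number of windows and N the genome length. The names `windows` and `naive_match` are illustrative and are not taken from IXPBlast.

```python
def windows(query, w):
    """All length-w substrings of the query, in order of appearance."""
    return [query[i:i + w] for i in range(len(query) - w + 1)]


def naive_match(genome, wins):
    """Compare every window against every genome position.

    Returns (hits, comparisons). comparisons counts single-character
    comparisons and is bounded by W * N * M.
    """
    hits, comparisons = [], 0
    for pos in range(len(genome)):
        for win in wins:
            if pos + len(win) > len(genome):
                continue  # window would run past the end of the genome
            matched = True
            for k, ch in enumerate(win):
                comparisons += 1
                if genome[pos + k] != ch:
                    matched = False
                    break
            if matched:
                hits.append((pos, win))
    return hits, comparisons
```

With the query acgccga, window size 3 and the text tacgcga (the values used in the Aho-Corasick slides), this reports acg at offset 1, cgc at offset 2 and cga at offset 4, counting offsets from 0.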

Aho-Corasick

● Alphabet: acgt
● Window size: 3
● Query: acgccga
● Windows: {acg,cgc,gcc,ccg,cga}

[Figure: Aho-Corasick goto graph with states 0 to 12 and edges labelled with the nucleotides a, c, g, t]

Failure function:

s      1   2   3   4   5   6   7   8   9   10  11  12
f(s)   0   4   5   0   7   8   0   4   10  4   5   1

Output states:

s       3    6    9    11   12
output  acg  cgc  gcc  ccg  cga

Input text: tacgcga
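
The example above can be reproduced with a short Python sketch of the textbook Aho-Corasick construction. This is not the IXPBlast code; `build_automaton` and `search` are illustrative names. States are numbered in the order they are created, which for the five windows in slide order gives the same numbering as the f(s) table.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for a set of patterns."""
    goto = [{}]    # goto[s][ch] -> next state
    output = [[]]  # output[s] -> patterns recognised on entering s
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                output.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        output[s].append(p)

    # Breadth-first pass: a state's failure link points to the state for
    # the longest proper suffix of its string that is also in the trie.
    fail = [0] * len(goto)
    queue = deque(goto[0].values())
    while queue:
        r = queue.popleft()
        for ch, s in goto[r].items():
            queue.append(s)
            f = fail[r]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[s] = goto[f].get(ch, 0)
            output[s] = output[s] + output[fail[s]]
    return goto, fail, output


def search(text, goto, fail, output):
    """Scan text once; return (start_offset, pattern) for every match."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in output[s]:
            hits.append((i - len(p) + 1, p))
    return hits
```

Building the automaton from acg, cgc, gcc, ccg, cga yields 13 states whose failure links match the f(s) row, and scanning tacgcga passes through output states 3, 6 and 12, reporting acg, cgc and cga.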

IXPBlast Architecture

[Figure: block diagram of the IXPBlast setup. Labelled components: Pentium, PCI bus, NPU (IXP1200), control processor, StrongARM, six microengines (ME), scratch, SRAM, DRAM, Gbps ports.]

IXPBlast: packet handling

● packets read and processed in batches of 100,000
● “spilling” must be taken into account
● currently no feedback
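
The slides do not define “spilling”. One plausible reading, assumed here, is that a window can straddle two packets, so a scanner that treats packets independently would miss it. The Python sketch below carries the last W-1 nucleotides of each packet over to the next one. `scan_packets` and its parameters are hypothetical and do not come from IXPBlast; an automaton-based scanner could carry its current state across packets instead.

```python
def scan_packets(packets, wins, w):
    """Find windows in a genome delivered as consecutive packet payloads.

    The last w-1 characters of each packet are kept as carry, so a window
    that spans a packet boundary is still seen, and seen only once.
    Returns (genome_offset, window) pairs.
    """
    wanted = set(wins)
    hits = []
    carry = ""
    consumed = 0  # genome characters contained in earlier packets
    for payload in packets:
        data = carry + payload
        base = consumed - len(carry)  # genome offset of data[0]
        for i in range(len(data) - w + 1):
            if data[i:i + w] in wanted:
                hits.append((base + i, data[i:i + w]))
        consumed += len(payload)
        carry = data[-(w - 1):] if w > 1 else ""
    return hits
```

Splitting tacgcga into the payloads tac, gcg and a still finds acg, cgc and cga at offsets 1, 2 and 4, although every one of those windows crosses a packet boundary.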

Results

● 232 MHz IXP1200 ~ 1.8 GHz Pentium-4
● 1611-nucleotide query (MyD88)
● 1.4 GB genome (Zebrafish)
– IXP1200: 90 sec with DFA
– IXP1200: 129 sec with “trie”
– P4: 132 sec with “trie”
● number of matches: 524856
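
The slides do not spell out how the “trie” and DFA versions differ. A common distinction, assumed here, is that a trie version follows failure links at run time, while a DFA version precomputes one next state for every (state, nucleotide) pair. The Python sketch below expands the 13-state example from the Aho-Corasick slides. The edges in `GOTO` are reconstructed from the five windows (the original figure is not in the transcript), and `FAIL` is the f(s) row with f(0) = 0 added. All names are illustrative.

```python
ALPHABET = "acgt"

# Goto edges of the trie for {acg, cgc, gcc, ccg, cga} (reconstructed).
GOTO = {
    0: {"a": 1, "c": 4, "g": 7},
    1: {"c": 2}, 2: {"g": 3},
    4: {"g": 5, "c": 10}, 5: {"c": 6, "a": 12},
    7: {"c": 8}, 8: {"c": 9},
    10: {"g": 11},
}
FAIL = [0, 0, 4, 5, 0, 7, 8, 0, 4, 10, 4, 5, 1]
OUTPUT = {3: "acg", 6: "cgc", 9: "gcc", 11: "ccg", 12: "cga"}


def next_state(s, ch):
    """Trie-style step: follow failure links until ch can be consumed."""
    while s and ch not in GOTO.get(s, {}):
        s = FAIL[s]
    return GOTO.get(s, {}).get(ch, 0)


def build_dfa():
    """Full transition table: one row per state, one column per symbol."""
    return [[next_state(s, ch) for ch in ALPHABET] for s in range(len(FAIL))]


def run_dfa(dfa, text):
    """One table lookup per nucleotide; text must contain only a, c, g, t."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        s = dfa[s][ALPHABET.index(ch)]
        if s in OUTPUT:
            hits.append((i - len(OUTPUT[s]) + 1, OUTPUT[s]))
    return hits
```

The table holds one entry per state per symbol (13 × 4 = 52 here), which illustrates why the DFA can become large for long queries even though each input nucleotide then costs a single lookup.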

Results

Query size   DNADB size   Impl.          Performance
1611         1.4 GB       P4             132 sec
1611         1.4 GB       IXP1200        129 sec
1611         1.4 GB       IXP1200 DFA    90 sec
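
For a quick sense of scale, the timings in the table can be turned into speedups and a scan rate. This is plain arithmetic on the reported numbers; the only assumption is that 1 GB means 10**9 bytes.

```python
# Timings from the results table, in seconds.
P4_TRIE, IXP_TRIE, IXP_DFA = 132, 129, 90
GENOME_BYTES = 1.4e9  # assumption: 1 GB = 10**9 bytes

speedup_dfa = P4_TRIE / IXP_DFA    # IXP1200 with DFA vs. P4 with trie
speedup_trie = P4_TRIE / IXP_TRIE  # IXP1200 with trie vs. P4 with trie
throughput_mb_s = GENOME_BYTES / IXP_DFA / 1e6

print(f"{speedup_dfa:.2f}x, {speedup_trie:.2f}x, {throughput_mb_s:.1f} MB/s")
```

This gives roughly 1.47x for the DFA version, 1.02x for the trie version, and about 15.6 MB/s of genome data scanned by the DFA version.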

Conclusions

● NPUs are useful in other application domains
● Newer hardware is expected to perform much better
● “Throughput processors”
● Adapting our current approach to use BLAST tricks/heuristics

Network processors

● geared for high throughput
● used exclusively in network systems
● example: intrusion detection
● similar to looking for genes in genomes
● differences

[Figure: Radisys IXP1200 board]

Application domain: “Genomics”

● example: search genome for occurrence of “patterns”
● similar problems as IDS, poor performance on GPP
– cannot exploit parallelism
– throughput-driven
– how about FPGAs?
– how about clusters?
● NPU
– easier to program than FPGAs
– cheaper than cluster computing
– “on the desktop”: IP never leaves the room
