First Place Memocode'14 Design Contest Entry

A High Performance Systolic Architecture for k-NN Classification. Kevin Townsend, Philip Jones, Joseph Zambreno. Reconfigurable Computing Laboratory, Iowa State University. MEMOCODE’14.

DESCRIPTION

This is what I presented at the 2014 MEMOCODE conference on Iowa State's winning design contest entry. I led the team.

TRANSCRIPT

Page 1: First Place Memocode'14 Design Contest Entry

A High Performance Systolic Architecture for k-NN Classification

Kevin Townsend, Philip Jones, Joseph Zambreno

Reconfigurable Computing Laboratory, Iowa State University

MEMOCODE’14

Townsend (RCL@ISU) k-NN on FPGA MEMOCODE’14 1 / 11

Page 2: First Place Memocode'14 Design Contest Entry

Outline

1 The Competition

2 Our Approach

3 Hardware Design

   Platform

   Systolic Array

   Processing Element

   Dot Product

   Sort

4 Results

Page 3: First Place Memocode'14 Design Contest Entry

The Competition

Problem Statement

k Nearest Neighbors

32-dimensional space, i.e. vectors of 32 elements

1,000 (M) test vectors

10,000,000 (N) training vectors

Values are 12 bits

Mahalanobis distance $\sqrt{(x - y)^T S^{-1} (x - y)}$ vs. Euclidean distance $\sqrt{(x - y)^T (x - y)}$, where $x$ is a training vector and $y$ is a testing vector.

Better results for some problems

1024 multiplications vs 32
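
The two distances can be compared in a few lines of Python. This is a minimal sketch, not the contest code: the helper names (`mat_vec`, `euclidean`, `mahalanobis`) and the 2-element example values are made up for illustration, and `s_inv` stands for the matrix S^-1 from the slide.

```python
import math

def mat_vec(matrix, vec):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def euclidean(x, y):
    """sqrt((x - y)^T (x - y)): n multiplications for n-element vectors."""
    d = [xi - yi for xi, yi in zip(x, y)]
    return math.sqrt(sum(di * di for di in d))

def mahalanobis(x, y, s_inv):
    """sqrt((x - y)^T S^-1 (x - y)), with s_inv standing for S^-1."""
    d = [xi - yi for xi, yi in zip(x, y)]
    w = mat_vec(s_inv, d)  # n * n multiplications (1024 for n = 32)
    return math.sqrt(sum(di * wi for di, wi in zip(d, w)))

# Hypothetical 2-element vectors and matrices, for illustration only.
x, y = [3, 1], [1, 0]
identity = [[1, 0], [0, 1]]
s_inv = [[2, 0], [0, 1]]

print(euclidean(x, y))              # sqrt(5)
print(mahalanobis(x, y, identity))  # same value: S^-1 = I gives the Euclidean distance
print(mahalanobis(x, y, s_inv))     # sqrt(2*2*2 + 1*1) = 3.0
```

The matrix-vector product inside `mahalanobis` is where the 1024 multiplications per pair come from when the vectors have 32 elements.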

Page 12: First Place Memocode'14 Design Contest Entry

Our Approach

Optimizations

We chose a brute-force solution: all 10,000,000,000 (M × N) products are computed.

$(x - y)^T S^{-1} (x - y)$ is used in place of its square root because $\sqrt{\cdot}$ is an increasing function, so the ordering of the distances does not change.

$(x - y)^T (S^{-1}x - S^{-1}y)$ reduces the computation from 1024 multiplications to 32 multiplications.

$S^{-1}x$ and $S^{-1}y$ can be calculated ahead of time. (Only 10,001,000 matrix-vector multiplications)

This results in approximately 1.3 trillion integer operations required.
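
The precomputation can be checked numerically. The sketch below is illustrative only: the function names and the small 3-element integer data are hypothetical, and the figure of 128 operations per product is taken from the operator count quoted for the dot product pipeline.

```python
def mat_vec(matrix, vec):
    """Multiply a square matrix (a list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def squared_direct(x, y, s_inv):
    """(x - y)^T S^-1 (x - y), with the matrix product done per pair."""
    d = [a - b for a, b in zip(x, y)]
    return sum(di * wi for di, wi in zip(d, mat_vec(s_inv, d)))

def squared_precomputed(x, xs, y, ys):
    """(x - y)^T (S^-1 x - S^-1 y), where xs = S^-1 x and ys = S^-1 y."""
    return sum((a - b) * (c - d) for a, b, c, d in zip(x, y, xs, ys))

# Hypothetical small integer data, for illustration only.
s_inv = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
train = [[5, 1, 2], [0, 7, 3]]
test = [[1, 1, 1]]

# One matrix-vector product per vector, done ahead of time.
train_s = [mat_vec(s_inv, x) for x in train]
test_s = [mat_vec(s_inv, y) for y in test]

for x, xs in zip(train, train_s):
    for y, ys in zip(test, test_s):
        assert squared_direct(x, y, s_inv) == squared_precomputed(x, xs, y, ys)

# Operation count for the full problem, assuming about 128 operations per product.
M, N, OPS_PER_PRODUCT = 1_000, 10_000_000, 128
total_ops = M * N * OPS_PER_PRODUCT
print(total_ops)  # 1280000000000, i.e. roughly 1.3 trillion
```

Because the data are integers, the two forms agree exactly, and skipping the square root leaves the neighbor ranking unchanged.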

Page 17: First Place Memocode'14 Design Contest Entry

Our Approach

High level approach

[Figure: data-flow diagram spanning the Host and the Coprocessor. Labels: trainA, trainB, testA, testB (each shown on both sides), Mahalanobis Product, k-NN, ret, print. Size annotations: 0.6GB, 1.3GB, 64KB, 128KB, 256KB. Timing markers: start time, end time.]

Page 29: First Place Memocode'14 Design Contest Entry

Hardware Design Platform

The Convey Platform

[Figure: the Convey platform. An array of kNN PE blocks (64 are drawn) connected to Memory Controllers 1 through 8.]

Design a k-NN processing element (PE) with one floating-point multiply-accumulator (MAC).

Duplicate the PE block as many times as possible.

Give each PE access to memory.

Page 32: First Place Memocode'14 Design Contest Entry

Hardware Design Systolic Array

Systolic Arrays

[Figure: a linear chain of k-NN PEs (four drawn, followed by ". . ."). Signal labels: testA, testB, trainA, trainB, ret.]

Solves routing problem
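
A linear systolic chain can be modeled in software to show why only neighbor-to-neighbor wiring is needed. The sketch below is a generic illustration under stated assumptions, not the authors' hardware: it assumes each PE holds one test vector, that training vectors shift one PE per cycle, and it uses a plain squared Euclidean distance for brevity. The class and function names are hypothetical.

```python
import heapq

class PE:
    """One processing element: holds one test vector and its k best matches."""

    def __init__(self, test_vec, k):
        self.test_vec = test_vec
        self.k = k
        self.best = []   # heap of (-distance, index); the worst match sits on top
        self.reg = None  # pipeline register handed to the right-hand neighbor

    def step(self, incoming):
        """Accept the item from the left neighbor and emit the previous one."""
        outgoing = self.reg
        self.reg = incoming
        if incoming is not None:
            idx, vec = incoming
            dist = sum((a - b) ** 2 for a, b in zip(vec, self.test_vec))
            heapq.heappush(self.best, (-dist, idx))
            if len(self.best) > self.k:
                heapq.heappop(self.best)  # drop the current worst match
        return outgoing

def run_chain(test_vecs, train_vecs, k):
    pes = [PE(t, k) for t in test_vecs]
    # Training vectors enter at the first PE; trailing None items flush the chain.
    stream = list(enumerate(train_vecs)) + [None] * len(pes)
    for item in stream:
        for pe in pes:  # one clock cycle: each PE hands its register to the next
            item = pe.step(item)
    return [sorted((-d, i) for d, i in pe.best) for pe in pes]

# Hypothetical 2-element data, for illustration only.
result = run_chain([[0, 0], [10, 10]], [[1, 0], [9, 9], [5, 5], [0, 2]], k=2)
print(result)  # [[(1, 0), (4, 3)], [(2, 1), (50, 2)]]
```

Each training vector visits every PE in turn, so no PE needs a direct connection to memory or to any PE other than its neighbors.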

Page 33: First Place Memocode'14 Design Contest Entry

Hardware Design Processing Element

Single Processing Element

[Figure: block diagram of a single kNN PE, built up over several slide steps.]

Ports: Data in, Data out (192-bit data path), Opcode in, Index in, Opcode out, Index out.

Blocks and resource usage, listed with the block alongside which each figure first appears in the slide build:

Buffer: ≈ 1536 registers

TestCache: 660 registers, 560 LUTs

TrainBuffer: ≈ 1536 registers, ≈ 768 LUTs

Product: 8704 registers, 6806 LUTs, 20 DSPs

Sort: 316 registers, 388 LUTs, 7 block RAMs

Page 41: First Place Memocode'14 Design Contest Entry

Hardware Design Dot Product

Dot Product Pipeline

31, 12-bit subtracters

31, 24-bit subtracters

32, 13x25-bit multipliers

31, 45-bit adder tree

≈ 128 integer operators

150 MHz, 128 processing elements

2.4 trillion operations per second

[Figure: dot product pipeline. Inputs: testA, testB, trainA, trainB. Stages: Vector Subtracter, Vector Subtracter, Vector Multiplier, Adder Tree. Output: product.]
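
The multiplier input widths and the aggregate throughput follow from the figures on this slide. The sketch below is a back-of-the-envelope check, assuming two's-complement arithmetic; the function names are hypothetical.

```python
CLOCK_HZ = 150_000_000  # 150 MHz, from the slide
NUM_PES = 128           # processing elements, from the slide

# Assumed two's-complement widths: the difference of two n-bit values fits in
# n + 1 bits, and a signed a-bit by b-bit product fits in a + b bits.
def difference_width(n_bits):
    return n_bits + 1

def product_width(a_bits, b_bits):
    return a_bits + b_bits

mult_inputs = (difference_width(12), difference_width(24))  # (13, 25)
operators_listed = 31 + 31 + 32 + 31  # 125, which the slide rounds to about 128
ops_per_second = CLOCK_HZ * NUM_PES * 128

print(mult_inputs, product_width(*mult_inputs), operators_listed, ops_per_second)
```

With one result per PE per clock cycle, 150 MHz × 128 PEs × 128 operators comes to about 2.46 × 10^12 integer operations per second.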

Page 44: First Place Memocode'14 Design Contest Entry

Hardware Design Dot Product

Dot Product Pipeline

31, 12-bit subtracters

31, 24-bit subtracters

32, 13x25-bit multipliers

31, 45-bit adder tree

≈ 128 interger operators

150Mhz, 128 processingelements

2.4 billion operations persecond

testA

testB

trainA

trainB

pro

du

ct

Vec

tor

Su

btr

acte

rV

ecto

rS

ub

trac

ter

Vec

tor

Mu

ltip

lier

Ad

der

Tre

e

Townsend (RCL@ISU) k-NN on FPGA MEMOCODE’14 9 / 11

Page 45: First Place Memocode'14 Design Contest Entry

Hardware Design Sort

Sort

[Figure: animated diagram of the sort unit. Blocks: Counter, product input, Bouncer (entries B0 to B3), Inserter, RAM (entries V0 to V3), out.]

Animation frames, in order:

RAM holds 7, 19, 42, 68; B1 = 100.

A product of 13 arrives at the input.

13 moves past the Bouncer to the Inserter.

RAM holds 7, 13, 42, 68; 19 is carried.

RAM holds 7, 13, 19, 68; 42 is carried.

RAM holds 7, 13, 19, 42; 68 is carried.

RAM holds 7, 13, 19, 42; B1 = 68.
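
The animation can be read as a threshold check followed by an insertion pass. The sketch below is one interpretation of those frames, not the authors' RTL: the class name `BoundedSorter` is hypothetical, and the rule that the evicted value becomes the new bound is inferred from the last frame (B1 changing from 100 to 68).

```python
class BoundedSorter:
    """Keeps the k smallest products seen so far, in ascending order."""

    def __init__(self, k, values, bound):
        self.k = k
        self.values = sorted(values)  # plays the role of the RAM contents
        self.bound = bound            # plays the role of the Bouncer threshold

    def offer(self, product):
        # Bouncer: reject products that cannot be among the k smallest.
        if product >= self.bound:
            return False
        # Inserter: walk the list once, swapping the carried value into place.
        carried = product
        for i in range(len(self.values)):
            if carried < self.values[i]:
                self.values[i], carried = carried, self.values[i]
        if len(self.values) < self.k:
            self.values.append(carried)
        else:
            self.bound = carried  # the evicted value becomes the new bound
        return True

# Values taken from the animation frames.
sorter = BoundedSorter(k=4, values=[7, 19, 42, 68], bound=100)
sorter.offer(13)
print(sorter.values, sorter.bound)  # [7, 13, 19, 42] 68
```

A bound equal to the last evicted value is looser than the current maximum, but it is still safe: anything at or above it can never enter the list.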

Page 53: First Place Memocode'14 Design Contest Entry

Results

Results

1.3 trillion integer operations / 2.4 trillion integer operations per second = 0.54 seconds.

Actual runtime is 0.54 seconds.
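
The runtime estimate can be reproduced from the earlier figures. This is a rough check only, assuming M × N products at about 128 operations each and a throughput of 150 MHz × 128 PEs × 128 operators; the variable names are hypothetical.

```python
total_ops = 1_000 * 10_000_000 * 128      # M * N products, about 128 operations each
ops_per_second = 150_000_000 * 128 * 128  # clock rate * PEs * operators per PE

estimate_unrounded = total_ops / ops_per_second  # about 0.52 s
estimate_rounded = 1.3e12 / 2.4e12               # about 0.54 s, using the rounded figures

print(f"{estimate_unrounded:.2f} s, {estimate_rounded:.2f} s")
```

Both estimates land within a few percent of the measured runtime, which suggests the pipeline stays busy for nearly the whole run.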

Paper at: http://www.rcl.ece.iastate.edu/sites/default/files/papers/TowJon14A.pdf
