Page 1: Alvaro Cassinelli*, Makoto Naruse* , ** and Masatoshi Ishikawa*

Quad-tree image compression using reconfigurable free-space optical interconnections and pipelined parallel processors

Alvaro Cassinelli*, Makoto Naruse*,** and Masatoshi Ishikawa*

* Ishikawa-Hashimoto lab., University of Tokyo
** PRESTO JST

PRESTO = Precursory Research for Embryonic Science and Technology

JST = Japan Science and Technology

[Figure labels: LCD/SLM (x4)]

Page 2:

Plan of the presentation

I. OCULAR architectures for computing
- Reconfigurable Single Stage (OCULAR-I)
- Reconfigurable Multi-stage (OCULAR-II)

II. OCULAR-II demonstration: Quad-tree compression
- Quad-tree compression algorithm
- Set-up and Demonstration
- Discussion

III. Conclusion and further work

Page 3:

I. OCULAR architectures for computing

OCULAR = Optoelectronic Computer Using Laser Arrays with Reconfiguration

I.1 Reconfigurable Single Stage (OCULAR-I)

[Diagram labels: 2D array of data; Photo Detector Array; Processing Element Array; VCSEL array; Optical Interconnections; Optical feed-back]

I.2 Reconfigurable Multi-stage (OCULAR-II)

[Diagram labels: 2D array of data; Photo Detector / Processing Element Array / VCSEL; Optical Interconnections; Output]

Page 4:

I.1 Single-stage paradigm for parallel computing

Optical technology offers enhanced parallel communication primitives…

…of great benefit for shared memory and for distributed memory (= network-based parallel computers)

Static: fixed interconnection (X, Y, and Z)
…switches inside processors (local control)

Dynamic: reconfigurable interconnection (X, Y or Z)
…switches outside processors (local or global/external control possible)

[Diagram labels: processors P1, P2, … Pn; links X, Y, Z; inside a processor: mux, ALU, Mem, control; external controller]

Page 5:

I.1 Dynamic architecture vs. static

[slide not shown in main presentation]

Static: in an n-degree static topology, each processor has n distinct optoelectronic I/O ports…
- Technologically challenging
- Non-reusable architecture
- Bad scalability

…anyway, static networks can be redesigned as single-stage dynamic networks…

Dynamic: …processors, switches and interconnections located in distinct modules
- Optimal use of electronics, optoelectronics and optics
- Scalability, hardware reusability in other topologies
- Possible introduction of multiple stages…

[Diagram labels: processors P1, P2, … Pn; switches; interconnections; feed-back loop]

Page 6:

I.1 OCULAR-I system architecture

[ Modular architecture ]

dynamic single stage… …optical architecture

2D optoelectronic processing layer (PD-PE-VCSEL) + switches and interconnections: reconfigurable diffractive optics module

[Diagram labels: Elementary Processor Array; VCSEL array; Photo-detector array; Optical interconnection module; Optical feed-back; processors P1, P2, … Pn; links X, Y, Z]

Page 7:

Processing Module

[ SIMD Processor array ]
- 8x8 PEs (on FPGA)
- Electronic mesh for rapid short-range communication between PEs.

[PE diagram labels: A, B; 4-neighbors, VCSEL, PD; ALU; mapped I/O; local memory (24 bits); registers]

[ Photo-detector array ]
- Si photo-detectors with integrated amplifier / threshold

[ VCSEL array ]
- 850 nm VCSELs
- Modulation > 1 GHz (possible 10-50 GHz)

Each array attached to a PCB

10 MHz operation demonstrated

Page 8:

Reconfigurable interconnection module

The module generates the interconnection pattern… …it is therefore responsible for interconnection and switching (= X, Y, Z)

- Folded 4-f system, 14 x 25 x 6.2 cm
- The CGH is generated by an optically addressable SLM, using a laser diode and a liquid crystal display coupled through a fiber optical plate.
- Space-invariant interconnections – good/bad?
- Free-space – alignment issues?
- Multi-level CGH – good diffraction efficiency
- Reconfiguration (“switch”) freq. – 100 Hz…

[Diagram labels: Laser diode; FT lens]

alvaro: In this optical interconnection module, we require adjustable components to adapt the diffraction positions to the LD and PD. We have designed a zooming Fourier transform lens as the adjustable component. The focal length is adjustable from 360 mm to 440 mm by moving one of the lenses, as illustrated in the figure. This function is important for matching interconnection parameters such as the pixel pitches of the VCSEL array, the PD array and the CGH, and for compensating for wavelength variation of the VCSEL array.
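A rough sketch of why a zoom Fourier transform lens helps with pitch matching, assuming the textbook small-angle relation x = wavelength x focal length / grating period for the first diffraction order in the focal plane. The 100 µm fringe period is a hypothetical value chosen for illustration; only the 850 nm wavelength and the 360-440 mm focal range come from the slides.

```python
def first_order_offset(wavelength_m, focal_length_m, grating_period_m):
    """Small-angle position of the first diffraction order in the FT lens focal plane."""
    return wavelength_m * focal_length_m / grating_period_m


def focal_length_for_offset(offset_m, wavelength_m, grating_period_m):
    """Focal length that puts the first order at a given offset (inverse of the above)."""
    return offset_m * grating_period_m / wavelength_m


WAVELENGTH_M = 850e-9  # VCSEL wavelength quoted on the slides
PERIOD_M = 100e-6      # hypothetical CGH fringe period (assumption)

if __name__ == "__main__":
    # Zooming the focal length rescales the whole interconnection pattern.
    for f_mm in (360, 400, 440):
        x = first_order_offset(WAVELENGTH_M, f_mm * 1e-3, PERIOD_M)
        print(f"f = {f_mm} mm -> first order at {x * 1e3:.2f} mm")

    # A wavelength drift can be compensated by re-zooming to keep the spot in place.
    target = first_order_offset(WAVELENGTH_M, 0.400, PERIOD_M)
    f_new = focal_length_for_offset(target, 855e-9, PERIOD_M)
    print(f"855 nm needs f = {f_new * 1e3:.1f} mm for the same spot position")
```

In this simplified picture the spot position scales linearly with both focal length and wavelength, which is why a single zoom adjustment can absorb either a pitch mismatch or a wavelength shift.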

Page 9:

I.2 Multi-stage paradigm for parallel computing

Single-stage architecture can be “spanned” into multi-stages

The cost of multiplying the processors is paid back as…
- Simplicity & Speed – S & I (switch & interconnection) does not need to be complex (shuffle-exchange networks).
- Scalability / Reconfigurability – for different topologies.
- Pipelining – possible.
- Theoretical background – Multi-stage architectures have been studied for decades in networking applications…

Topologies [computing] [computing & networking]: Hypercube, Mesh, Cube Cycle, Shuffle/exchange, Delta, Benes, De Bruijn, Tree, Pyramid, Omega, Clos, Banyan

[Diagram labels: single stage: processors P1, P2, … Pn with Switch & Interconnection; multi-stage: Stage 1, Stage 2, … Stage m, each with processors P1, P2, … Pn and S & I - 1, S & I - 2, … S & I - m]
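As an illustration of how simple each switch-and-interconnection stage can be, here is a minimal sketch of a shuffle-exchange stage and of the standard bit-by-bit routing over log2(n) stages. The function names are illustrative and not taken from the slides.

```python
def perfect_shuffle(node, n_bits):
    """Fixed interconnection: rotate the n_bits-wide address left by one bit."""
    mask = (1 << n_bits) - 1
    return ((node << 1) | (node >> (n_bits - 1))) & mask


def exchange(node):
    """Switch: swap with the partner that differs in the lowest address bit."""
    return node ^ 1


def route(src, dst, n_bits):
    """Nodes visited when routing src -> dst through n_bits shuffle-exchange stages."""
    path = [src]
    node = src
    for k in range(n_bits - 1, -1, -1):
        node = perfect_shuffle(node, n_bits)
        wanted_bit = (dst >> k) & 1
        if (node & 1) != wanted_bit:
            node = exchange(node)  # the switch is set only when the low bit is wrong
        path.append(node)
    return path


if __name__ == "__main__":
    print([perfect_shuffle(i, 3) for i in range(8)])
    print(route(5, 2, 3))
```

Every stage uses the same fixed wiring and a one-bit switching decision, which is the kind of simplicity the first bullet refers to.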

Page 10:

I.2 OCULAR-II system architecture

[Diagram labels: optoelectronic processing module (two-layer module: Photo-detector array, Elementary Processor Array, VCSEL array); optical interconnection module, repeated between successive processing modules…]

Page 11:

II. Quad-tree compression on OCULAR-II

II.1 Quad-tree compression algorithm

II.2 Set-up and Demonstration

II.3 Discussion

[Diagram labels: Sender array (PE array, VCSELs); Interconnection module (SLM); Receiver array (Photo Detectors, PE array); electrical feed-back through host computer]

Page 12:

II.1 Principle of the quad-tree compression algorithm

Quadrant addresses:
A B
D C

Leaf = ( level , address )

Image…
- This group of pixels is a level 2 leaf of address B
- …level 1 leaf of address DB
- …this pixel is a level 0 leaf of address CDA
- …this pixel is NOT a leaf

…corresponding tree
[Tree diagram labels: level 3, level 2, level 1, level 0; branches A, B, C, D; leaves B, DB, CDA]

Image as a tree = ( 2 , B ) + ( 1 , DB ) + ( 0 , CDA )
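The (level, address) encoding can be sketched with a short recursive routine. This is a sequential illustration, not the parallel procedure used on OCULAR-II. It assumes the quadrant labels read A = top-left, B = top-right, C = bottom-right, D = bottom-left (one reading of the "A B / D C" layout), and that only uniformly ON blocks become leaves. The 8x8 test image is constructed here to reproduce the three leaves of the example.

```python
# (label, row half, column half) -- quadrant labelling is an assumption.
QUADRANTS = (("A", 0, 0), ("B", 0, 1), ("C", 1, 1), ("D", 1, 0))


def encode(image, top=0, left=0, size=None, address=""):
    """Return the (level, address) leaves of a square binary image of side 2^N."""
    if size is None:
        size = len(image)
    pixels = [image[r][c]
              for r in range(top, top + size)
              for c in range(left, left + size)]
    if all(pixels):                       # uniformly ON block -> one leaf
        return [(size.bit_length() - 1, address)]
    if not any(pixels):                   # uniformly OFF block -> nothing to store
        return []
    half = size // 2
    leaves = []
    for label, dr, dc in QUADRANTS:       # mixed block -> descend into quadrants
        leaves += encode(image, top + dr * half, left + dc * half,
                         half, address + label)
    return leaves


def make_example():
    """8x8 image containing exactly the leaves (2, B), (1, DB) and (0, CDA)."""
    img = [[0] * 8 for _ in range(8)]
    for r in range(0, 4):
        for c in range(4, 8):
            img[r][c] = 1                 # 4x4 block in quadrant B
    for r in range(4, 6):
        for c in range(2, 4):
            img[r][c] = 1                 # 2x2 block: quadrant D, then B
    img[6][4] = 1                         # single pixel: C, then D, then A
    return img


if __name__ == "__main__":
    print(sorted(encode(make_example()), reverse=True))
```

The address is read from the root downwards, so a level 2 leaf of an 8x8 image carries one letter and a level 0 leaf carries three.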

Page 13:

II.1 Quad-tree compression on OCULAR-II architecture

• Initialization:
- Load 2^N x 2^N image. ON pixels are set as lowest-level leaves in local PE memories.

• From stage to stage (array n, array n+1, array n+2):

• Detect upper leaves
- sequentially broadcast leaf values to the corresponding upper PE.
- compare on receiver side.
- update the leaf level of the upper-level PE if its corners turn out to be lower-level “false” leaves.

• Cutting branches
- parallel broadcast signal for resetting false low-level leaves.

• End on last stage:
- Download data from last array.
- Save data (level, address) from PEs which are still leaves.

[Diagram labels: array n, array n+1, array n+2; broadcast sequence 1, 2, 3, 4]

Rem: data from the receiver side is fed back electronically to the sender side through the host computer…
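The stage-by-stage procedure above can be mimicked sequentially. The sketch below is a software stand-in for the PE arrays, with several assumptions: the top-left ("A") corner PE of each quadrant plays the role of the upper PE, quadrant labels read A = top-left, B = top-right, C = bottom-right, D = bottom-left, and the nested loops replace what the hardware does in parallel over the optical links.

```python
LABELS = {(0, 0): "A", (0, 1): "B", (1, 1): "C", (1, 0): "D"}  # assumed labelling


def address_of(row, col, level, n_levels):
    """Quadrant letters from the root down to the block whose corner PE is (row, col)."""
    return "".join(LABELS[(row >> b) & 1, (col >> b) & 1]
                   for b in range(n_levels - 1, level - 1, -1))


def compress_bottom_up(image):
    """Merge leaves level by level; return the surviving (level, address) pairs."""
    size = len(image)
    n_levels = size.bit_length() - 1
    # Initialization: ON pixels become level 0 leaves in the local PE memory.
    leaf = [[0 if px else None for px in row] for row in image]
    for k in range(n_levels):
        step = 1 << k
        for top in range(0, size, 2 * step):
            for left in range(0, size, 2 * step):
                corners = [(top, left), (top, left + step),
                           (top + step, left + step), (top + step, left)]
                # Detect upper leaves: all four corners must be level k leaves.
                if all(leaf[r][c] == k for r, c in corners):
                    leaf[top][left] = k + 1       # upper PE updates its leaf level
                    for r, c in corners[1:]:      # cutting branches
                        leaf[r][c] = None
    # End: save (level, address) from the PEs which are still leaves.
    return [(leaf[r][c], address_of(r, c, leaf[r][c], n_levels))
            for r in range(size) for c in range(size)
            if leaf[r][c] is not None]


if __name__ == "__main__":
    img = [[0] * 8 for _ in range(8)]
    for r in range(4):
        for c in range(4, 8):
            img[r][c] = 1
    for r in range(4, 6):
        for c in range(2, 4):
            img[r][c] = 1
    img[6][4] = 1
    print(compress_bottom_up(img))
```

Each pass of the outer loop corresponds to one stage: a detection step followed by a cut, which is why the total work grows with the number of levels rather than with the number of pixels.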

Page 14:

Example: interconnection for processing of level 1

[slide not shown in main presentation]

1) Detecting leaves: …Is A a level one leaf?

2) Conditional broadcast: …If so, A must update its leaf level and cut lower branches.

[Diagram labels: quadrant corners A, B, C, D; computing PE on array n+1; broadcasting PE on array n; CCD image of PD plane showing A (zero order) and D (first order)]

Page 15:

II.2 OCULAR-II demonstrator setup

• The demonstration is carried out on a two-layer OCULAR-II prototype. Multiple-layer processing is simulated thanks to electronic feed-back between the first and second processor arrays.

• Interconnections for each level are time-multiplexed on the SLM module.

• Two-level CGHs are used (enough diffraction efficiency).

[Figure labels: CGH and diffraction pattern for Level 0, Level 1, Level 2; PE array 1, VCSEL array, optical interconnection module, PD array, PE array 2]

Page 16:

…quad-tree algorithm and hypercube network

2^n elementary processors arranged in an n-dimensional hypercube topology

Image 2^(n/2) x 2^(n/2) pixels large

Quad-tree on OCULAR-II: pairs of (6-dimensional) hypercube links are generated and multiplexed in time thanks to the SLM-based interconnection module…

…on level 1: X, Z …on level 2: Y, W

[Diagram labels: hypercube links X, Y, Z, W]
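A small sketch of how an 8x8 PE array can be read as a 6-dimensional hypercube. The bit assignment used here (row bits above column bits) and the pairing of one row link with one column link per level are assumptions made for illustration; the slides name the links X, Y, Z, W without giving the mapping.

```python
def hypercube_neighbors(address, n_dims):
    """Neighbors in an n_dims-dimensional hypercube: flip one address bit each."""
    return [address ^ (1 << d) for d in range(n_dims)]


def pe_address(row, col, bits_per_axis):
    """Hypercube address of a PE on a 2^bits x 2^bits array (assumed bit layout)."""
    return (row << bits_per_axis) | col


def level_links(row, col, level):
    """Pair of array positions reached from (row, col) at a given quad-tree level."""
    step = 1 << level
    return [(row ^ step, col), (row, col ^ step)]


if __name__ == "__main__":
    here = pe_address(5, 3, 3)
    print(sorted(hypercube_neighbors(here, 6)))
    for level in range(3):
        print(level, [pe_address(r, c, 3) for r, c in level_links(5, 3, level)])
```

Under this mapping each quad-tree level uses two of the six hypercube dimensions, so three time-multiplexed link pairs cover the whole 8x8 array.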

Page 17:

II.2 Quad-Tree Compression Demonstration Setup

[Photo labels: “sender” array (SIMD + VCSELs); interconnection module; “receiver” array (SIMD + PD); monitor CCD; CGH monitor; control and results on host computer…]

Page 18:

Example: holograms required during level 1 processing.

[slide not shown in main presentation]

1) Broadcast hologram (quadrant comparison)

2) Re-broadcast hologram (cutting branches)

[Diagram labels: quadrant corners A, B, C, D; computing PE; broadcasting PE; potential leaf on level one; A (zero order), D (first order)]

Page 19:

Level 0. Detecting upper leaves.

…symbolic representation of the initial tree, containing 28 level 0 leaves (most of them false)

[Figure labels: level 0 quadrants; quadrant corners A, B, C, D; level 0 leaves marked true or false]

Page 20:

Detail of level 0 broadcasting

[slide not shown in main presentation]

In this demonstration we used two-level phase CGHs computed by SA. Only the 1st order of diffraction is used as the interconnection pattern.

[Figure: photo-detector chip surface as seen through the alignment CCD camera; sender array and receiver array; legend: “D” corners with leaf bit ON, “D” corners with leaf bit OFF]

Page 21:

Level 0. Cutting branches.

[Figure labels: quadrant corners A, B, C, D; newly created leaf on level 1]

Page 22:

Level 1. Detecting upper leaves.

[Figure labels: level 1 quadrants; quadrant corners A, B, C, D]

Page 23:

Level 1. Cutting branches.

[Figure labels: quadrant corners A, B, C, D; newly created leaf on level 2]

Page 24:

Level 2. Detecting leaves and cutting branches.

…symbolic representation of the encoded image as a minimal tree with seven leaves.

[Figure labels: level 2 quadrants; quadrant corners A, B, C, D]

Page 25:

II.3 Discussion

8x8 image (N=3): 28 pixels ON = 28 initial leaves… only seven final leaves. 15 iterations…

Compression of a 2^N x 2^N pixel image takes O(5N) clock cycles...

SIMD array, VCSELs and photo-detectors can run at more than 100 MHz… two million 1024x1024 images compressed per second!

However, SLM reconfiguration limits operation to a maximum of about a hundred hertz…

Also, one has to remember that our chips are only 8x8 pixels large.
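The figures on this slide can be checked with a few lines of arithmetic. The sketch treats the 5N estimate as an exact cycle count and ignores the SLM reconfiguration time, both of which are simplifying assumptions.

```python
def clock_cycles(n_levels, cycles_per_level=5):
    """Cycle count for a 2^N x 2^N image, taking the O(5N) estimate literally."""
    return cycles_per_level * n_levels


def images_per_second(image_side, clock_hz, cycles_per_level=5):
    """Images compressed per second if only the electronic clock limits the rate."""
    n_levels = image_side.bit_length() - 1
    if 1 << n_levels != image_side:
        raise ValueError("image side must be a power of two")
    return clock_hz / clock_cycles(n_levels, cycles_per_level)


if __name__ == "__main__":
    print(clock_cycles(3))                 # 8x8 image, N = 3
    print(images_per_second(1024, 100e6))  # 1024x1024 image at 100 MHz
```

With N = 10 for a 1024x1024 image, 50 cycles at 100 MHz gives the two million images per second quoted above; the hundred-hertz SLM would dominate long before that rate is reached.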

Page 26:

III. Conclusion and further work

III.1 Summary

III.2 Research underway and further work

Page 27:

III.1 Summary

We have successfully tested the OCULAR-II multistage architecture with reconfigurable optical interconnections by implementing quad-tree compression on binary images (= example of embedded hypercube).

However…
- The optically addressed SLM-based interconnection module accounts for the strongest bandwidth limitation (hundred hertz).
- Electronic feed-back through the host computer generates parasitic signals and synchronization problems!
- Alignment is not difficult, but may become a critical issue in “true” multistage architectures...

Page 28:

III.2 Further work: OCULAR-III

[ Research underway ]

Alignment issues (between 2D arrays)
- dynamic alignment using actuators and control theory.
- pre-aligned connectors using fiber bundles.

Design of an integrated (VLSI) optoelectronic layer (with switching…)

Concurrent multistage paradigm using fixed interconnections: design of fixed, guided-wave-based pre-aligned interconnection modules (the processor array is in charge of the switching function) => OCULAR-III

[ Future research directions ]
- Test of these “modular” architectures for building computing and networking MINs.
- Design of all-optical networks using the above paradigm.

[Diagram labels: fiber bundle; network interconnection modules; processor arrays]

http://www.k2.t.u-tokyo.ac.jp/index-e.html