

TECHNICAL SEMINAR REPORT

ON

Three-dimensional Image Processing VLSI System with Network-on-chip System and Reconfigurable Memory Architecture

Submitted in partial fulfillment of the Technical Seminar

VIII Semester, ECE

Under the guidance of

Mrs. Bhagirathi N.M.

(Asst. Professor, Dept. of E & C)

Prescribed By

VISVESVARAYA TECHNOLOGICAL UNIVERSITY

Submitted by:

SHASHIKIRAN K 1BI08EC090

2012

Department of Electronics and Communication Engineering

Bangalore Institute of Technology

K R Road, V V Puram, Bengaluru-560004



BANGALORE INSTITUTE OF TECHNOLOGY
(Affiliated to Visvesvaraya Technological University)

K R ROAD, V V PURAM, BANGALORE-560004

DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING

CERTIFICATE

This is to certify that the seminar report entitled “Three-dimensional Image Processing VLSI System with Network-on-chip System and Reconfigurable Memory Architecture” was presented by Shashikiran K, bearing USN 1BI08EC090, a final-year student, in partial fulfillment of the course of Bachelor of Engineering in Electronics and Communication Engineering of the Visvesvaraya Technological University during the academic year 2011-2012.

Signature of the H.O.D Evaluated By

Mrs. Bhagirathi N.M.

Asst. Professor

Name of the student: Shashikiran.K

USN: 1BI08EC090

Date:50



CONTENTS

Abstract
Introduction
Three-dimensional layer architecture
3D RAM/ROM synthesis design system
Reconfigurable memory system
Processor control system design
Network-on-chip design
Self-repairable VLSI and dependable reconfigurable system
Chip simulation and experimental results
Conclusion
References
Appendix




ABSTRACT

This report introduces a new RAM/ROM module system with a reconfigurable memory architecture for a three-dimensional (3D) image processing VLSI system.

Suitable input/output data control is a critical feature of a high-performance image processing system because it enables flexible image data processing. A fast 3D VLSI system also requires efficient pipelined data operation. The new RAM/ROM synthesis design system is realized by a specific arrangement of RAM, ROM, pins, and interconnections. The pipeline Flip-Flop control, clock buffer insertion, and critical signal routes have been improved to raise the operating speed of the whole system. A network-on-chip system is also proposed to enable fast signal transmission and correct control operation. The 3D image processing VLSI system can be improved further by suitable data storage and pipeline control flow. Chip simulation gives accurate results with 247.728 mW power consumption and a 50 MHz processing frequency. Practical chip tests confirm that the new RAM/ROM synthesis design realizes the inner-chip write/read function and efficient data flow control, improving the efficiency of the 3D reconfigurable system. A better image VLSI system can be realized with an elaborate network-on-chip system and precise 3D stacking layer design.




I. INTRODUCTION

Recently, image processing technology has been widely used in vision systems, multimedia processors, and consumer electronics. Rapidly developing technology requires high-performance image processors with fast computation speed, small chip size, and low power consumption. In addition, flexible data flow, robust signal control, and inner write/read operation are also important for an image processing system.

To improve image chip performance, three-dimensional (3D) technology has been used to realize effective image processing VLSI systems. Typical 3D technology separates the whole image chip into several function layers. The layers are stacked vertically and connected by Through-Silicon Vias (TSVs) between layers [4]. In Fig. 1, the function layers include a CMOS image sensor layer and an analog-to-digital (A/D) converter layer, which converts the analog image signal into digital input image data. The following stacked layers, such as the frame memory layer, reconfigurable memory layer, and Processing Element (PE) module layer, handle the digital input data and realize fast image processing. To improve system operation efficiency and avoid multi-layer pipeline delay, reconfigurable memory technology has been introduced to accelerate 3D image processing. Recent network-on-chip research has also been applied to 3D architecture construction and inter-layer data transmission. Data synchronization can be improved by a single instruction, multiple data (SIMD) stream, and the related multiple instruction, multiple data (MIMD) pipeline stream can also be used to raise image VLSI system performance.

Global data control is important for a parallel image processing system. To realize suitable data control, several RAM modules are inserted into the PE layer, as shown in the 3D image chip layer architecture of Fig. 1. Some unused ROM parts have been replaced by additional RAM modules. A special Flip-Flop design and clock buffer adjustment are also used to enable the inner data write/read flow. Consequently, image data and control instructions can be inserted or monitored by outside controller parts. The 3D image system can also be realized easily, and the control pipeline thread can be improved by direct data/instruction operation.

The rest of this paper is organized as follows. Section II describes the whole system configuration and layer architecture of the 3D image processing VLSI system. In Section III, new design methods for the Flip-Flop and clock buffer are proposed for RAM/ROM co-design operation. Sections IV and V describe the reconfigurable memory system and the processor layer system design. Section VI proposes the 3D network-on-chip system architecture, and Section VII introduces the self-repairable image system for dependable reconfigurable VLSI design. Section VIII presents 3D system simulation results and image chip experiments. Finally, Section IX draws our conclusions and outlines future work.

II. THREE-DIMENSIONAL LAYER ARCHITECTURE

The three-dimensional (3D) architecture for the parallel image processing system is shown in Fig. 1. Many different function layers are stacked vertically, and Through-Silicon Vias (TSVs) connect the chip layers in a specific stacking sequence. Through effective function layer design and precise inter-layer connection, the 3D architecture can reduce chip size, lower power consumption, and accelerate system speed. In addition, image data transmission, system signal bandwidth, and analog-to-digital converter efficiency can also be significantly improved.

As shown in Fig. 1, input image data flow from the top layer to the bottom layer for reconfigurable system operation. The input image signal is sampled by the image sensor layer and converted to digital image data by the A/D converter layer. The frame memory layer, reconfigurable memory layer, and processing element layer handle the digital image data. Reconfigurable operation requires careful thread pipelining and intricate state control. Frequent data write/read and direct instruction control are critical characteristics of a 3D pipelined image system. Thus a combined RAM and ROM system has been proposed to realize effective data control and high-performance image processing.




III. 3D RAM/ROM SYNTHESIS DESIGN SYSTEM

A. Synchronous System Architecture

The RAM and ROM modules are used together to realize better data write/read and inner-chip signal control in the 3D image system. A new 3D processing technique with Flip-Flops and a clock buffer is proposed to generate the input image signals, as shown in Fig. 2. To enable system control and data pipelining, a synchronous signal system is used in the 3D RAM/ROM co-design system. As in Fig. 2, a synchronous clock buffer is used to drive and delay the input clock signal. Serial Flip-Flops are also used to create synchronous signals under input clock control. The signal phase can be adjusted, and the synchronous output keeps all system signals in the same operation sequence as the input clock. With the proposed synchronous architecture, the RAM/ROM synthesis design method realizes global synchronous control, and the image processing system can be pipelined to accelerate chip operation.
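The behaviour of a serial Flip-Flop chain under a common clock can be illustrated with a small behavioural model. The sketch below is only an illustration and not the circuit of Fig. 2: the names `DFlipFlop` and `run_chain`, the default chain length, and the zero reset state are assumptions made for the example.

```python
class DFlipFlop:
    """Edge-triggered D flip-flop: the output changes only on a clock tick."""

    def __init__(self):
        self.q = 0  # assumed reset state

    def tick(self, d):
        """Capture input d on the clock edge and return the previous output."""
        old_q = self.q
        self.q = d
        return old_q


def run_chain(samples, stages=2):
    """Feed one input sample per clock cycle through a serial flip-flop chain.

    Returns the state of the last stage after each clock edge.
    """
    chain = [DFlipFlop() for _ in range(stages)]
    outputs = []
    for sample in samples:
        value = sample
        # Every stage receives the previous stage's old output, which models
        # all flip-flops being triggered by the same clock edge.
        for ff in chain:
            value = ff.tick(value)
        outputs.append(chain[-1].q)
    return outputs
```

With two stages, `run_chain([1, 0, 1, 1, 0])` returns `[0, 1, 0, 1, 1]`: the input pattern reappears at the last stage shifted by one clock edge, and each extra stage adds one more edge of delay, which is how a chain can adjust signal phase while staying aligned to the clock.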

B. Pipeline Latch System

In a 3D image processing system, pipeline thread mismatch can happen frequently and causes system processing faults. To keep the 3D system pipeline in step, a synchronous system with precise instruction control is recommended. The proposed method of replacing the common latch with a pipeline Flip-Flop is described in Fig. 3. As the waveform illustration shows, input signals do not always stay synchronous with the clock signal. The output then cannot easily deliver synchronous data, which causes system mismatch. The Pipeline Flip-Flop (PFF) method is proposed, in which a data switch module controls the output signal according to the input signal combination. The related Karnaugh map is also given in Fig. 3 to show the detailed switch selection. In addition, a new D-FF module replaces the common RS-FF latch to enable signal synchronization and the 3D system pipeline process.
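The difference between a level-sensitive latch and an edge-triggered Flip-Flop can be seen in a short simulation. This is a minimal sketch under stated assumptions: a transparent D latch stands in for the common latch (the source replaces an RS-FF latch, whose exact circuit is not given here), and the function name `simulate` and the sample traces are hypothetical.

```python
def simulate(clock, data):
    """Return (latch_out, dff_out) traces for equal-length clock and data traces."""
    latch_q, dff_q, prev_clk = 0, 0, 0
    latch_out, dff_out = [], []
    for clk, d in zip(clock, data):
        if clk == 1:
            latch_q = d  # transparent latch: follows data while the clock is high
        if prev_clk == 0 and clk == 1:
            dff_q = d    # D flip-flop: samples data only on the rising edge
        latch_out.append(latch_q)
        dff_out.append(dff_q)
        prev_clk = clk
    return latch_out, dff_out


clock = [0, 1, 1, 0, 0, 1, 1, 0]
data = [1, 1, 0, 0, 1, 1, 0, 1]
latch_trace, dff_trace = simulate(clock, data)
```

In this run the latch output toggles whenever the data changes during the high clock phase, while the Flip-Flop output changes only at rising edges, so an input that is not aligned with the clock no longer propagates into the next pipeline stage.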




C. 3D RAM/ROM Reconfigurable Memory System

The whole RAM/ROM system configuration for the 3D image processing system is illustrated in Fig. 4. The input image data are stored in the frame memory and the inner-chip data memory. Through the interconnection network between adjacent layers, image data are sent to four Processing Elements (PEs) for 3D pipelined operation. The output image signal is sent out through the system output interface. To control inner-chip data, a control unit and a RISC processor realize the signal pipeline and data flow. The configuration memory is used to insert the reconfiguration signal and enable 3D reconfigurable image processing. The RAM/ROM synthesis design system can also write input image data to, and read them from, the inner memory modules directly, and direct control instructions through the 3D layers further improve image chip performance.




D. Whole Chip Architecture for 3D Image System

The VLSI chip architecture of the 3D image processing system is given in Fig. 5. The input data, address information, and control signals enter the input switch of the VLSI processor chip. Through the control module and SRAM module, input image data undergo pipelined image processing. Through the output switch module, image data are sent out to construct the new output image. The control module in Fig. 5 consists of several modules, including the frame memory, four PE modules, MAIN memory, INST memory for instant data processing, and CONFIG memory for reconfigurable data processing. The inner image data are pipelined through the frame memory and PE modules. The image data are fetched from the MAIN memory module. The neighboring INST memory and CONFIG memory are used together to control the pipeline thread and reconfiguration sequence. The additional SRAM modules store image data and control instructions so that the outside system can directly control the inner-chip modules. With the proposed chip architecture and memory modules, the RAM and ROM synthesis architecture in the inner control module realizes system control and a precise data pipeline.

IV. RECONFIGURABLE MEMORY SYSTEM

In the proposed 3D image processing system, RAM and ROM are used together to realize image data write/read and inner signal control. The synchronous clock buffer and pipeline Flip-Flop element are applied to image data operation for system instruction insertion and memory data fetch. To realize synchronous system control in the 3D image processing system, a new 3D reconfigurable memory system is proposed to enable image data reconfiguration and self-repairable operation. Fig. 6 illustrates a typical 3D stacking architecture for the sensor and reconfigurable memory. A common sensor network is used to capture input image data, including static picture data and dynamic moving image data. The sensor image data are transferred by the A/D converter layer and interconnect network to the next function layer, as shown in Fig. 1.




The next image processing layer is divided into several frame memory blocks, as in Fig. 6. Different target image data are assembled into related reconfigurable memory blocks. The separated memory blocks differ from one another and are matched to the detailed input image data. If image data operation has problems, such as image data loss or picture damage, neighboring memory blocks are recombined to remove the erroneous image blocks and enable re-healing, or self-repairable, image processing. The image data being processed and the reconfiguration instructions are controlled by the processing element layer. Thus the 3D reconfigurable memory system realizes precise image processing and raises the robustness of the whole system.
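One way to picture block-level reconfiguration is a table that maps each logical frame memory block to a physical block and can be rewritten when a block fails. The sketch below is an assumption-laden illustration, not the mechanism of Fig. 6: it swaps in spare blocks rather than recombining neighboring blocks, and the class `ReconfigurableFrameMemory` with its methods is hypothetical.

```python
class ReconfigurableFrameMemory:
    """Frame memory split into equal blocks with a logical-to-physical block map."""

    def __init__(self, num_blocks, block_size, num_spares=1):
        total = num_blocks + num_spares
        self.storage = [[0] * block_size for _ in range(total)]
        # Initially logical block i is stored in physical block i.
        self.block_map = list(range(num_blocks))
        # Spare physical blocks (an assumption of this sketch).
        self.free_blocks = list(range(num_blocks, total))

    def write(self, logical_block, offset, value):
        self.storage[self.block_map[logical_block]][offset] = value

    def read(self, logical_block, offset):
        return self.storage[self.block_map[logical_block]][offset]

    def reconfigure(self, logical_block):
        """Remap a faulty logical block to a spare and return the new physical index."""
        if not self.free_blocks:
            raise RuntimeError("no spare memory block left")
        new_physical = self.free_blocks.pop(0)
        self.block_map[logical_block] = new_physical
        return new_physical
```

After `reconfigure`, accesses to the same logical block transparently use the replacement block; the data of the faulty block still have to be rewritten or repaired by a separate step.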

V. PROCESSOR CONTROL SYSTEM DESIGN

The Processing Element (PE) layer in the 3D image processor system controls image memory configuration and pipeline data flow, as shown in Fig. 7. To realize direct system control and reduce inter-layer transmission loss, processor modules and their related memory blocks are mostly stacked in the same vertical column. Input image data from the outside sensor layer are converted by the following A/D converter layer. The analog image signal is converted to pipelined digital image data in the following column memory block. The image data flow vertically from the top layer to the bottom layer, whereas system control instructions travel in the opposite direction, from the bottom layer to the top layer. The 3D reconfigurable RAM/ROM memory layer can be controlled easily, and input image data can be processed with a fast pipeline thread and flexible inner instructions.

Many related reconfigurable processor systems are used in recent VLSI processing systems. Similarly, the processor layer has many processing elements, and a reconfigurable system is applied to realize reconfigurable image operation. As in Fig. 7, the vertical data flow is controlled by the adjacent processor layer and frame memory layer. The Processing Element (PE) layer enables the related memory combination and data block partition. Depending on image operation requirements, the frame memory layer can be divided into several blocks to store pipelined image data. ROM and RAM modules can also be combined to constitute a whole memory block and realize image self-repairable operation. Image data accuracy and system processing speed are improved by precise processor control and the 3D reconfigurable architecture.

VI. NETWORK-ON-CHIP DESIGN

In advanced VLSI system research, system processing speed, whole chip area, and power consumption have become the critical design challenges. Recently, 3D stacking layer architecture and network-on-chip design have been considered as ways to improve system efficiency. Further research focuses on combined system design for 3D architecture and network-on-chip systems. Thus a new 3D interconnect network architecture is proposed in this paper to improve layer stacking flexibility and whole system performance.

In the practical 3D image processing chip, we improve the 3D network-on-chip design based on the layer architecture in Fig. 8. In a complex stacking layer system, many different function layers are connected by a specific Through-Silicon Via (TSV) architecture. Several stacking layer types are used with related TSV connection structures, including inter-layer TSV, trans-layer TSV, and multi-layer TSV. As in Fig. 8, a neighboring layer connection is an inter-layer TSV network, which connects adjacent layers by a specific silicon via and interconnect network. Another type is the trans-layer TSV, which passes through a neighboring function layer and connects the corresponding layers by a trans-connection silicon via, such as the reconfigurable memory layer and processing element layer in Fig. 8. In addition, a further TSV tunnel design can connect several layers together and realize the multi-layer TSV type with an assembled stacking layer connection. The three-layer assembled connection for the A/D converter, power network, and reconfigurable memory layers in Fig. 8 shows a typical multi-layer TSV architecture. The four-layer TSV network from the power network layer to the processing element layer is a more complex multi-layer structure in the seven-layer image processing system of Fig. 8.
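The three TSV types can be told apart from the set of layers a via connects. The following sketch encodes that reading of the text; the function name `classify_tsv` and the use of integer stack positions are assumptions, and the indices in the usage note are hypothetical rather than the layer order of Fig. 8.

```python
def classify_tsv(layers):
    """Classify a TSV by the stack positions (0 = top layer) of the layers it connects."""
    connected = sorted(set(layers))
    if len(connected) < 2:
        raise ValueError("a TSV must connect at least two layers")
    if len(connected) > 2:
        return "multi-layer"   # one via tunnel joins three or more layers
    if connected[1] - connected[0] == 1:
        return "inter-layer"   # the two layers are direct neighbours
    return "trans-layer"       # the via passes through at least one layer in between
```

For example, `classify_tsv([1, 2])` gives `"inter-layer"`, `classify_tsv([2, 4])` gives `"trans-layer"`, and `classify_tsv([1, 2, 3])` gives `"multi-layer"`.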

VII. SELF-REPAIRABLE VLSI AND DEPENDABLE RECONFIGURABLE SYSTEM

Dependable reconfigurable VLSI systems are a recent research focus for high-performance processor systems. In a practical image VLSI chip, data operation errors can happen frequently and seriously degrade whole system performance. To solve the image data mismatch and processing error problems, self-repairable methods and re-healing design technologies are applied in our practical chip design.




A common robust design method for repairing VLSI system errors is reconfigurable re-healing technology. As in Fig. 9, the image data being processed are damaged in the center part of the image blocks. The reconfigurable self-repair method checks the vertical image blocks to get the detailed error address. The horizontal image blocks are also identified by memory data scanning to obtain the required image data, which are used to repair the erroneous image blocks with suitable re-healing methods.

After the damaged image data and related address blocks are located by image block sweeping, the specific erroneous image blocks are reconfigured: neighboring memory blocks replace the erroneous image blocks and repair the damaged image data according to the system design target and related image information. When the erroneous image data have been corrected in the corresponding image memory blocks, the whole VLSI system enters reconfigurable operation again to recover the original image block architecture. The repaired image blocks are assembled to construct the new center image block. The border separation is removed, and the four corrected image sub-blocks are composited to complete the image re-healing operation.

The critical points of VLSI self-repairable design are the reconfigurable block area and the repair control sequence. The VLSI re-healing performance is determined by design requirements and system robustness. If a large memory area is used for image repair, the correction efficiency increases, but the block searching time is extended, with high power consumption and large chip area. If new dependable design technologies, such as a compact reconfigurable border and small image sub-blocks, are used as well, operating power and chip area can be reduced greatly. However, the related reconfigurable processing then cannot always repair successfully, and system robustness falls rapidly. Thus in the common repair method, the selected image border varies from three to five pixels around the detected erroneous image block.




The specific repair sequence also determines the operation and processing efficiency of the whole 3D reconfigurable image VLSI system. In the detailed control sequence, the first operation scans the image range and searches for the addresses of the image blocks to be repaired. The second is a reconfigurable operation that separates the detected damaged image blocks and constructs neighboring memory blocks. The third is re-healing processing and new image block reconfiguration by the related memory interconnection and image repair method. Finally, memory blocks and processing elements are combined again to recover the original image VLSI system. The re-healed sub-image blocks are assembled, and improved image results are produced by the repairable processing VLSI and reconfigurable image system.
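The scan, isolate, re-heal, and reassemble sequence can be sketched in software. This is only a simplified illustration under several assumptions: damaged pixels are already flagged as `None`, there is a single error block, and the refill value is the mean of valid pixels within a border of `border` pixels (three by default, within the three-to-five pixel range given above). The function names are hypothetical, and the averaging rule is a stand-in for the repair method, which the text does not specify.

```python
def find_error_block(image):
    """Step 1: scan the image and return the bounding box of damaged (None) pixels."""
    rows = [r for r, row in enumerate(image) if any(p is None for p in row)]
    cols = [c for row in image for c, p in enumerate(row) if p is None]
    if not rows:
        return None
    return min(rows), max(rows), min(cols), max(cols)


def repair(image, border=3):
    """Steps 2 to 4: isolate the error block, refill it from its border, reassemble."""
    box = find_error_block(image)
    if box is None:
        return image
    top, bottom, left, right = box
    height, width = len(image), len(image[0])
    # Collect valid pixels in a frame of `border` pixels around the error block.
    neighbours = [
        image[r][c]
        for r in range(max(0, top - border), min(height, bottom + border + 1))
        for c in range(max(0, left - border), min(width, right + border + 1))
        if image[r][c] is not None
    ]
    if not neighbours:
        raise ValueError("no valid neighbouring pixels to repair from")
    fill = round(sum(neighbours) / len(neighbours))
    # Reassemble: build a new image in which only the damaged pixels are replaced.
    return [[fill if p is None else p for p in row] for row in image]
```

A wider border uses more neighboring data for each repair at the cost of a larger search area, which mirrors the trade-off between correction efficiency and search time discussed above.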

VIII. CHIP SIMULATION AND EXPERIMENTAL RESULTS

A. Image Processor Chip Design and Simulation

Based on the synchronous improvements to the clock buffer and Flip-Flop latch in the 3D stacking layers, we designed a new image processing chip in 0.13 um technology. Fig. 10 shows the layout micrograph of the manufactured VLSI chip. The chip has 208 pins and measures 5000 um by 5000 um. The gate count is about 980,000, and chip utilization is 20.655%. The chip clock frequency is 50 MHz, and the practical processing frequency is 25 MHz. The interconnect uses 8 metal layers, including the power mesh network, clock tree, and other function signal wires.

For IR-drop verification with zero EM violations, the practical switching rate is 20% under 50 MHz clock control. The power consumption is 247.728 mW with a 1.2 V power source. In the VDD-drop simulation of Fig. 11, the worst drop is 13.802 mV, a 1.15% drop rate. Similarly, in the VSS-rise simulation of Fig. 12, the worst rise is 9.919 mV, a 0.827% rise rate. Based on the VDD-drop and VSS-rise simulations, the image processor chip can perform image operation without excessive disturbance from signal floating and drop/rise variance. 3D layer stacking in the practical processor chip can also be realized easily and assembled successfully, with precise image data adaptation and smooth inter-layer signal transmission.
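The quoted drop and rise rates follow from dividing each voltage deviation by the 1.2 V supply. The short check below reproduces them; the variable and function names are chosen for this example, and the average supply current is a derived figure (from P = V x I) that does not appear in the source.

```python
SUPPLY_V = 1.2          # supply voltage from the text, in volts
POWER_W = 0.247728      # reported power consumption, in watts
VDD_DROP_V = 0.013802   # worst VDD drop, in volts
VSS_RISE_V = 0.009919   # worst VSS rise, in volts


def percent_of_supply(delta_v, supply_v=SUPPLY_V):
    """Express a voltage deviation as a percentage of the supply voltage."""
    return 100.0 * delta_v / supply_v


vdd_drop_rate = percent_of_supply(VDD_DROP_V)   # about 1.15 %
vss_rise_rate = percent_of_supply(VSS_RISE_V)   # about 0.827 %
# Average supply current implied by P = V * I (derived, not stated in the source).
supply_current_a = POWER_W / SUPPLY_V
```

Both rates agree with the simulated values, and the implied average supply current is roughly 0.206 A.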

B. Test Board Experiments and Simulation

We also designed a test board to obtain practical experimental results after the 3D image processing chip was manufactured. A photograph of the implemented test board is shown in Fig. 13. The image test board system consists of a base board, socket, input/output ports, interface, and the image processing chip. The base board is a 4-layer experimental board measuring 180 mm by 180 mm. The socket is embedded in the center of the base board, and the chip under test is inserted in the socket with tight pin contact. Around the socket, we designed four columns of input/output ports, which are used to write and read image data, memory addresses, and control instruction signals. Through an outside computer interface port, we can also control the test board and access board signals for 3D image processing system simulation.

Practical test results for the 3D image processing chip, obtained with a computer monitor system, are given in Fig. 14 and Fig. 15. We drive the inner image data bus with a 1-bit step increment input through the inserted SRAM modules of Fig. 5. The SRAM output is likewise updated step by step with a 1-bit data change. The ladder increment results demonstrate a correct write and read procedure from outside directly into the chip, as in Fig. 14. Image system control is greatly improved, and global pipelined operation can be realized easily with immediate outside instruction control. Inner memory in the processor module, such as the Main memory and instant memory (INST memory) in Fig. 5, can also access outside data signals directly. In addition, Fig. 15 shows the processing waveform for input data read and output data write. Main memory stores the main image processing data, and INST memory holds the adjustment instructions for image system reconfiguration. The experimental waveforms show that outside data and control instructions can be inserted into the inner RAM modules and fetched by the outside system. Thus we can realize better data control and faster pipeline operation to increase the performance of the whole 3D image chip.
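The ladder increment test amounts to writing a value that grows by one step at a time and checking that every read returns what was written. The sketch below shows that procedure in software; `ladder_test` and its parameters are hypothetical names, and a Python dictionary stands in for the inserted SRAM module, so this is not the interface of the actual test board.

```python
def ladder_test(sram_write, sram_read, address, width_bits=8):
    """Write 0, 1, 2, ... to one address and check that each value reads back."""
    readback = []
    for value in range(2 ** width_bits):
        sram_write(address, value)
        readback.append(sram_read(address))
    passed = readback == list(range(2 ** width_bits))
    return passed, readback


# A dictionary stands in for the SRAM module in this sketch.
_memory = {}
passed, trace = ladder_test(_memory.__setitem__, _memory.__getitem__,
                            address=0, width_bits=4)
```

A stuck or shorted data line would break the staircase in `trace`, which is why a ladder pattern is a convenient first check of the write/read path.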




C. Image Simulation and Conversion Results

Image simulation results from the 3D reconfigurable image chip are given in Fig. 16. Many images were tested and can be used for conversion simulation with the practical image chip and a computer simulation program. For the image conversion experiments in Fig. 16, we tested the typical image named “Cameraman”, which shows a man using a camera to photograph his surroundings. Based on specific picture conversion and related internal image processing, the “Cameraman” image can be compressed rapidly. The picture can be used for the next image operation to enhance display precision and system processing performance.




To realize data transformation and image compression, we use specific MPEG image algorithms to extract the figure edges and obtain the corresponding thresholded figure for final image processing. Fig. 16(b) and Fig. 16(c) illustrate the detailed image conversion and data transmission, respectively. The practical image data can be processed quickly, giving the fast image processing needed for very high speed camera design. The 3D network-on-chip architecture ensures fast system speed and improves data transmission efficiency. Other MPEG/JPEG image processing methods, such as the DCT/IDCT algorithm, can also be applied in the new 3D image processing system, together with pipelined image operation, the multi-layer stacking method, and reconfigurable self-repairable memory.
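Edge extraction followed by thresholding can be sketched with a very simple operator. The code below is not the algorithm used on the chip, which the text does not specify: it substitutes a first-difference gradient with an absolute-sum magnitude, and the function name `edge_threshold` and the threshold value in the usage note are assumptions.

```python
def edge_threshold(image, threshold):
    """Return a binary edge map: 1 where the local gradient exceeds the threshold."""
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    # The last row and column are left at 0 because they have no forward neighbour.
    for r in range(height - 1):
        for c in range(width - 1):
            gx = image[r][c + 1] - image[r][c]   # horizontal difference
            gy = image[r + 1][c] - image[r][c]   # vertical difference
            if abs(gx) + abs(gy) > threshold:
                edges[r][c] = 1
    return edges
```

On a grey-scale image stored as a list of rows, `edge_threshold(image, 100)` marks only the pixels where brightness changes sharply, producing the kind of binary figure that later processing stages can handle cheaply.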

D. Reconfigurable Image Self-repairable Processing

In practical image chip tests, image precision problems and picture distortion happen frequently. Common image errors are generated during signal processing and data transformation in the 3D image processing system. Robust image self-repairable technology is necessary for high-performance image chip design. Fig. 17 shows typical image repair methods in our 3D image processing system. The six pictures from Fig. 17(a) to Fig. 17(f) explain the detailed processing sequence for picture data recovery and the image re-healing results.




The first test picture, the plane image in Fig. 17(a), is the original experimental picture. In Fig. 17(b), the two square blocks mark image errors in the test picture. Data scanning is necessary to locate them within the picture range. Next, reconfigurable technology replaces the erroneous image blocks with neighboring image parts, as in Fig. 17(c). Through the related image block repair, the erroneous image blocks are removed and new image blocks are assembled in a specific sequence to recover the previous image part. The same operation is then carried out for the other error block in Fig. 17(d). The following step, with reconfigurable memory blocks and image re-healing, repairs the image data in Fig. 17(e). Finally, the whole test picture is recovered to the correct image data, as shown in Fig. 17(f).

If there are numerous image errors and a large picture area, reconfigurable self-repairable operation in the 3D image VLSI chip proceeds step by step in the same sequence as in Fig. 17. The self-repairable technology enhances the performance of the whole image system. It can also heal erroneous picture parts after 3D image system processing and inter-layer picture data transmission. Image size and the number of errors influence the data recovery efficiency and output image quality. Thus the combined image operation, including picture data processing and the reconfigurable repairable system, is our main design contribution in the practical 3D image VLSI system.

E. Memory Allocation and Chip Size

Practical image operation and the whole VLSI system are realized by the related RAM/ROM memory blocks in our 3D reconfigurable image chip. The chip size is also determined by the memory allocation area and the number of RAM/ROM blocks. Table I shows the memory information and allocation sequence in detail for our image processing chip.




Based on the 3D chip architecture in Fig. 5, frame memory blocks (FMem) are used to store image frame data. The FMem module has 8 KB total data capacity, made up of 16 frame memories with a 256x8-bit unit size each. The area of each FMem block is 46600 um², so the whole FMem block is allocated 745600 um², as shown in Table I. Similarly, the data memory (DMem) and I/O memory (IOMem) have the same size and area allocation in our 3D image chip. The important configuration memory (CMem) has a 4096x40-bit block size and 587600 um² chip area. The related processing memories in Table I, namely the main memory (MMem), pipeline instruction memory (PMem), and table memory (TMem), also use SRAM modules to handle image data directly in the VLSI system. The pipeline instructions are handled in the PMem module, and the TMem block provides table memory to store intermediate image data for the next reconfigurable image processing step. Furthermore, the additional SRAM module in Fig. 5 uses a large 8192x32-bit size and two 32-bit fast memory blocks. The inserted SRAM blocks occupy 212800 um² and are used for direct image data and instruction fetch operations in our 3D image processing system.

In summary, the memory blocks comprise nine module types, with 106 KB total size and 70 memories. The whole memory area is about 4250600 um² by size accumulation and system allocation. If the chip peripheral ring area is also considered in the practical 3D image system, the additional allocated area is about 1000000 um², which commonly corresponds to about 25% of the area of the entire memory system. Thus the whole image chip area is more than 5000000 um² with the detailed memory size allocation. If additional core processing elements are also allocated, the chip area increases further, and global synthesis design between memory and processor will become a future research challenge. Robust image processing, such as self-repairable operation and the re-healing method, must also consider memory allocation and module area in the whole 3D image chip.
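The area figures in this subsection can be cross-checked with a few lines of arithmetic. The constants below are taken from the text; the variable names are chosen for this example, and the comparison against the 5000 um by 5000 um die of Section VIII-A is an added calculation that the source does not make.

```python
FMEM_BLOCKS = 16
FMEM_BLOCK_AREA_UM2 = 46_600
MEMORY_AREA_UM2 = 4_250_600       # total memory area quoted in the text
PERIPHERAL_AREA_UM2 = 1_000_000   # approximate peripheral ring area
DIE_AREA_UM2 = 5_000 * 5_000      # 5000 um x 5000 um die from Section VIII-A

fmem_area = FMEM_BLOCKS * FMEM_BLOCK_AREA_UM2               # whole FMem allocation
total_area = MEMORY_AREA_UM2 + PERIPHERAL_AREA_UM2          # memory plus peripheral ring
peripheral_ratio = PERIPHERAL_AREA_UM2 / MEMORY_AREA_UM2    # ring relative to memory
die_fraction = total_area / DIE_AREA_UM2                    # share of the full die
```

The FMem product reproduces the 745600 um² allocation, the sum of 5250600 um² confirms the "more than 5000000 um²" statement, and the peripheral ring comes out at roughly 23.5% of the memory area, in line with the "about 25%" figure.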

F. Discussion and Future Challenge

Direct memory insertion in the 3D image processing system can improve system robustness and control inner data operation. As Table II shows, a common 2D planar image system has a large chip size and slow processing speed. Its image data and operation instructions are handled in a common sequence, and its robustness is insufficient without suitable self-repairable capability and re-healing features. Compared with a common 2D image system, our proposed 3D architecture reduces chip size and increases processing speed. The 3D system can also realize fast parallel processing and robust image operation for consumer image electronics products.

In this paper, we propose a new 3D RAM/ROM image system with reconfigurable operation memory and a 3D network-on-chip architecture. The designed chip can insert image data or control instructions into inner-chip modules directly. The image data being processed can also be sent out immediately, and inner control instructions can be monitored for reconfigurable system processing. The speed of the whole 3D VLSI system is raised by fast parallel operation and direct data control. The 3D image system also has self-repairable and re-healing features that realize a dependable reconfigurable VLSI system and highly robust image operation, as in Table II.

One drawback of the 3D network-on-chip architecture is the complex critical data path from input ports to output ports. The extended signal routes cause data transmission loss and can affect the performance of the whole VLSI system. Additional clock buffer insertion and pipeline Flip-Flop latch replacement introduce inner signal delay and waste inner processing time. The frequency of the whole pipeline system is also reduced, and image data processing slows down because of the critical data path delay. In addition, complex network connections can affect system data transmission and introduce inter-chip data mismatch.

Another weak point of our 3D chip system is layer stacking efficiency. Different image function layers have their own layer features and connection methods. The stacking sequence and the relation between neighboring layers also require precise design and system-level consideration. More stacking layers can realize small chip size, fast operation speed, and compact image operation. Global system synthesis with different stacking layers and related function combinations is a current design focus. In addition, low-power systems and highly robust chips will also become important research targets in the future.

Thus our future design challenges for the next 3D image system will focus on several
significant targets, such as reducing the critical data path, decreasing inter-layer connection
complexity, and accelerating image processing speed. The pipeline thread of the inner
image element will be studied, and the delay path will be divided for inner pipeline
processing. Multi-layer stacking technology is also a future research topic, with detailed
layer combination and specific TSV tunnel design. Consequently, the design of processing
element modules and reconfigurable data sequences will be adjusted precisely to satisfy the
new data path flow and highly pipelined image operation in our future complex 3D
system research.
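
The idea of dividing a delay path for pipeline processing can be sketched as follows. This is an illustrative partitioning method, not the procedure used in the paper: it assumes the path is an ordered list of element delays that may only be cut between elements, and it searches for the cut positions that minimize the slowest stage. The function names and delay values are hypothetical.

```python
def stages_needed(delays_ps, limit_ps):
    """Greedy count of contiguous stages such that no stage exceeds limit_ps."""
    count, current = 1, 0
    for d in delays_ps:
        if current + d > limit_ps:
            count += 1      # start a new pipeline stage at this element
            current = d
        else:
            current += d
    return count


def min_stage_delay(delays_ps, stages):
    """Smallest worst-stage delay when the path is cut into at most `stages` parts."""
    lo, hi = max(delays_ps), sum(delays_ps)
    while lo < hi:          # binary search over the stage delay limit
        mid = (lo + hi) // 2
        if stages_needed(delays_ps, mid) <= stages:
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical element delays along one data path, in picoseconds.
path = [900, 700, 1200, 400, 800]
for n in (1, 2, 3):
    print(n, min_stage_delay(path, n))
# 1 4000
# 2 2400
# 3 1600
```

With these assumed numbers, cutting the path into three stages reduces the worst stage delay from 4000 ps to 1600 ps, at the cost of extra pipeline registers and latency.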

IX. CONCLUSION

In this paper, a new reconfigurable system with RAM/ROM memory modules and a
3D layer architecture is proposed for a highly pipelined image processing chip. Flexible
data flow and direct system control are realized by precise data fetches from RAM and
ROM memory. Synchronous clock buffers and pipeline flip-flop modules are used to
adjust the 3D system processing flow. The new 3D stacking layer architecture also reduces
image chip size and increases system pipeline speed. The additional 3D network-on-chip
connection system satisfies 3D chip stacking requirements and enables global pipeline
operation for the multi-layer VLSI image system. The experimental results in this paper
illustrate that the new 3D reconfigurable memory system can handle inner data and control
instruction signals directly for a dependable VLSI chip. Further robustness methods,
including self-repair operation and a re-healing system, are also used in the proposed 3D
image processing system. Future work will focus on critical path reduction, fast pipeline
thread construction, complex multi-layer stacking methods, and highly robust,
self-dependable VLSI systems.




REFERENCES

D. Doswald, J. Hafliger, P. Blessing, N. Felber, P. Niederer, and W. Fichtner, “A
30-frames/s megapixel real-time CMOS image processor,” IEEE J. Solid-State
Circuits, vol. 35, no. 11, pp. 1732–1743, Nov. 2000.

M. Koyanagi, Y. Nakagawa, K.-W. Lee, T. Nakamura, Y. Yamada, K. Inamura,
K. Ki-Tae Park, and H. Kurino, “Neuromorphic vision chip fabricated using
three-dimensional integration technology,” in Proc. ISSCC Dig. Tech. Papers,
Feb. 2001, pp. 270–271, 454.

J. W. Joyner, P. Zarkesh-Ha, and J. D. Meindl, “Global interconnect design in a
three-dimensional system-on-a-chip,” IEEE Trans. VLSI Systems, vol. 12, no. 4,
pp. 367–372, Apr. 2004.

M. Koyanagi, T. Fukushima, and T. Tanaka, “High-density through silicon vias
for 3-D LSIs,” Proceedings of the IEEE, vol. 97, no. 1, pp. 49–59, Jan. 2009.

K. Kiyoyama, Y. Ohara, K.-W. Lee, Y. Yang, T. Fukushima, T. Tanaka, and M.
Koyanagi, “A parallel ADC for high-speed CMOS image processing system with
3D structure,” in Proc. IEEE Int. Conf. 3D System Integration, Sep. 2009, pp. 1–4.

T. Sugimura, Y. Konishi, J. Deguchi, T. Ishihara, T. Fukushima, A. Konno, M.
Uchiyama, and M. Koyanagi, “Design of parallel reconfigurable image processor
with three-dimensional structure,” IEICE Trans. Inf. Syst., vol. J89-D, no. 6, pp.
1141–1152, Jun. 2006.

D. Amano, T. Sugimura, Y. Konishi, T. Fukushima, T. Tanaka, and M. Koyanagi,
“Reconfigurable stacked memory system for parallel image processing using
three-dimensional LSI technology,” in Proc. IPSJ-SLDM, Oct. 2006, pp. 147–152.

D. Lattard, E. Beigne, F. Clermidy, Y. Durand, R. Lemaire, P. Vivet, and F.
Berens, “A reconfigurable baseband platform based on an asynchronous
network-on-chip,” IEEE J. Solid-State Circuits, vol. 43, no. 1, pp. 223–235, Jan.
2008.

T. Komuro, S. Kagami, and M. Ishikawa, “A dynamically reconfigurable SIMD
processor for a vision chip,” IEEE J. Solid-State Circuits, vol. 39, no. 1, pp.
265–268, Jan. 2004.

S. Kodama, D. Amano, T. Sugimura, T. Fukushima, T. Tanaka, and M.
Koyanagi, “New reconfigurable memory architecture for parallel image-processing
LSI with three-dimensional structure,” Japanese J. Applied Physics,
vol. 47, no. 4, pp. 2774–2778, Apr. 2008.




D. Kim, Z. Fu, J. H. Park, and E. Culurciello, “A 1-mW CMOS temporal-difference
AER sensor for wireless sensor networks,” IEEE Trans. Elec. Devices,
vol. 56, no. 11, pp. 2586–2593, Nov. 2009.

J. Guo and S. Sonkusale, “A high dynamic range CMOS image sensor for
scientific imaging applications,” IEEE J. Sensors, vol. 9, no. 10, pp. 1209–1218,
Oct. 2009.

H. Singh, M. Lee, G. Lu, F. Kurdahi, N. Bagherzadeh, and E. Filho,
“MorphoSys: an integrated reconfigurable system for data-parallel and
computation-intensive applications,” IEEE Trans. Computers, vol. 49, no. 5, pp.
465–481, May 2000.

H. Kondo, M. Nakajima, N. Masui, S. Otani, N. Okumura, Y. Takata, T. Nasu,
H. Takata, T. Higuchi, M. Sakugawa, H. Fujiwara, K. Ishida, K. Ishimi, S.
Kaneko, T. Itoh, M. Sato, O. Yamamoto, and K. Arimoto, “Design and
implementation of a configurable heterogeneous multicore SoC with nine CPUs
and two matrix processors,” IEEE J. Solid-State Circuits, vol. 43, no. 4, pp.
892–901, Apr. 2008.

H. Kanbara, R. Kinjo, Y. Toda, H. Okuhata, and M. Ise, “Dependable embedded
processor core for higher reliability,” in Proc. IEEE Int. Symp. Consumer
Electronics, May 2009, pp. 819–822.

O. J. Kuiken, X. Zhang, and H. G. Kerkhoff, “Built-in self-diagnostics for a
NoC-based reconfigurable IC for dependable beamforming applications,” in
Proc. IEEE Int. Symp. Defect and Fault Tolerance of VLSI Systems, Oct. 2008,
pp. 45–53.

I. Loi, S. Mitra, T. H. Lee, S. Fujita, and L. Benini, “A low-overhead fault
tolerance scheme for TSV-based 3D network-on-chip links,” in Proc. IEEE/ACM
Int. Conf. CAD, Nov. 2008, pp. 598–602.

F. Li, C. Nicopoulos, T. Richardson, Y. Xie, V. Narayanan, and M. Kandemir,
“Design and management of 3D chip multiprocessors using network-in-memory,”
in Proc. Int. Symp. Computer Architecture, Jun. 2006, pp. 130–141.

B. Feero and P. P. Pande, “Performance evaluation for three-dimensional
networks-on-chip,” in Proc. IEEE Computer Society Annual Symp. VLSI, Mar.
2007, pp. 9–11.

Y. Xu, Y. Du, B. Zhao, X. Zhou, Y. Zhang, and J. Yang, “A low-radix and
low-diameter 3D interconnection network design,” in Proc. IEEE Int. Symp. High
Performance Computer Architecture, Feb. 2009, pp. 30–42.




APPENDIX

IEEE paper on Three-dimensional Image Processing VLSI
System with Network-on-chip System and Reconfigurable
Memory Architecture

By

Yun Yang, Member, IEEE
