vlsi design of a neurohardware processor...

VLSI DESIGN OF A NEUROHARDWARE PROCESSOR IMPLEMENTING

THE KOHONEN NEURAL NETWORK ALGORITHM

AVINASH RAJAH

UNIVERSITI TEKNOLOGI MALAYSIA

VLSI DESIGN OF A NEUROHARDWARE PROCESSOR IMPLEMENTING

THE KOHONEN NEURAL NETWORK ALGORITHM

AVINASH RAJAH

A thesis submitted in fulfilment of the

requirements for the award of the degree of

Master of Engineering (Electrical)

Faculty of Electrical Engineering

Universiti Teknologi Malaysia

DECEMBER 2005

v

ABSTRACT

As artificial neural networks continue to gain popularity in the domain of

pattern recognition, there have been growing demands for these models to be

executed at high-speeds. Thus, to cater to this need, the VLSI design and

implementation of a neurohardware for high-speed pattern recognition is proposed in

this research. The UTM-Neuroprocessor implements the Kohonen Neural Network

for pattern classification. High-speed pattern classification by the neural paradigm is

achieved through massively parallel execution based on the neuron-parallel

processing approach. For proof of concept purposes, a 10x10 UTM-Neuroprocessor,

which implements a 10x10 Kohonen network, was developed in this work. The

design and rapid FPGA prototyping of the neuroprocessor was achieved using

VHDL and the Altera Nios embedded system development kit. The FPGA-based

prototype of the 10x10 UTM-Neuroprocessor is able to function at a frequency of

100 MHz and delivers performances up to 5.079 GCPS and 2.285 GCUPS. Software

components, including a VB-based GUI, were also developed to allow execution of

pattern recognition applications on the UTM-Neuroprocessor. For efficient VLSI

implementation of the UTM-Neuroprocessor, the combined FPGA-VLSI approach

was proposed. Correspondingly, the VLSI design of a 2x2 array computation engine,

termed the Array_2x2 microchip, was developed in the AMI 0.5µm process

technology and fabricated at the Europractice IC foundry. The fabricated Array_2x2

microchip can be applied to produce a 2x2 UTM-Neuroprocessor, in the combined

FPGA-VLSI implementation approach. The design consumes an area of 16.9 mm2 on

silicon and is encapsulated in 84-pin PGA package. SPICE simulations of the

Array_2x2 design proved functionality at an operating frequency of 90 MHz. The

microchip is able to deliver performances of up to 169.41 MCPS and 75.78 MCUPS

MCUPS for a 2x2 UTM-Neuroprocessor.

vi

ABSTRAK

Pertumbuhan dalam penggunaan teknologi rangkaian saraf (neural networks)

dalam pelbagai bidang aplikasi telah mewujudkan keperluan untuk perkakasan

mikroelektronik canggih yang mampu melaksanakan rangkaian saraf pada kelajuan

tinggi. Oleh demikian, implementasi VLSI bagi sebuah pemproses yang

melaksanakan rangkaian saraf Kohonen pada kelajuan tinggi untuk aplikasi

pengecaman corak, telah dicadangkan dalam kajian ini. Pengecaman corak pada

kelajuan tinggi oleh UTM-Neuroprocessor direalisasikan menggunakan kaedah

pemprosesan selari. Bagi tujuan pemprototaipan, rekabentuk sebuah UTM-

Neuroprocessor, yang melaksanakan rangkaian Kohonen 10x10, telah dibangunkan.

Pembangunan rekabentuk 10x10 UTM-Neuroprocessor telah menggunakan VHDL

dan kit pembangunan sistem terbenam Altera Nios. Rekabentuk yang dibangunkan

telah diprototaip segera pada FPGA dan mampu mencapai kelajuan setinggi 100

MHz. Beberapa aturcara perisian juga telah dibangunkan bersama, untuk

membolehkan pemprosesan aplikasi pengecaman corak pada 10x10 UTM-

Neuroprocessor. Menggunakan aturcara yang telah dibangunkan, beberapa aplikasi

pengecaman corak dunia sebenar telah dilaksanakan. UTM-Neuroprocessor telah

didapati mampu menawarkan kuasa pemprosesan setinggi 5.079 GCPS dan 2.285

GCUPS untuk aplikasi-aplikasi tersebut. Bagi implementasi VLSI UTM-

Neuroprocessor, kaedah FPGA-VLSI tergabung telah dicadangkan. Berdasarkan

cadangan tersebut, sebuah mikrocip yang melaksanakan rangkaian Kohonen 2x2

telah direkabentuk dengan proses teknologi AMI 0.5µm. Mikrochip tersebut telah

difabrikasi di Europractice, Belgium dan boleh digunakan untuk membangunkan

sebuah 2x2 UTM-Neuroprocessor dalam kaedah implementasi FPGA-VLSI

tergabung. Simulasi SPICE telah membuktikan kefungsian rekabentuk mikrocip

tersebut pada 90 MHz. Pada kelajuan ini, Array_2x2 membolehkan kuasa

pemprosesan setinggi 169.41 MCPS dan 75.78 MCUPS dicapai oleh sebuah 2x2

UTM-Neuroprocessor.

vii

TABLE OF CONTENTS

CHAPTER TITLE PAGE

DECLARATION ii

DEDICATION iii

ACKNOWLEDGEMENTS iv

ABSTRACT v

ABSTRAK vi

TABLE OF CONTENTS vii

LIST OF TABLES xiv

LIST OF FIGURES xvi

LIST OF ABBREVIATIONS xxi

LIST OF APPENDICES xxii

PART ONE

THESIS CONTENT

1 INTRODUCTION 1

1. 1 Background and Motivation

1.2 Problem Statement

1.3 Objectives

1.4 Scope of Work

1.5 Research Contributions

1

3

4

5

6

viii

1.6 Organization of Thesis

1.7 Summary

6

8

2 LITERATURE REVIEW 9

2.1 Pattern Recognition

2.2 Unsupervised Learning in Neural Networks

2.3 The Kohonen Neural Network

2.4 Neurohardware Architecture

2.4.1 General-Purpose versus Special-Purpose

Neurohardware

2.4.2 System Architecture

2.5 Review of Previous Works

2.5.1 The nEXPERT System

2.5.2 The UTM-Neuroprocessor

2.5.3 The NBISOM-25 Chip

2.5.4 The COKOS Chip

2.6 Summary

9

11

12

12

13

14

15

15

16

17

18

18

3 VLSI DESIGN METHODS AND TOOLS 20

3.1 Research Overview

3.2 VHDL Modeling with UTM-VHDLmg

3.3 Hardware Prototyping of FPGA-based Embedded

Systems

3.3.1 Design Flow of FPGA-based Embedded

Systems

3.3.2 Embedded System Design with Altera

Nios Development Kit

3.3.3 Designing with Altera Quartus II EDA

Tool

20

22

23

23

24

24

ix

3.3.4 Embedded Software Development

3.4 ASIC Design Methodology in UTM-ECAD

VLSI Research Laboratory

3.4.1 Standard Cell Library for AMI 0.5um

Process Technology

3.4.2 Logic Synthesis with Synopsys Design

Compiler

3.4.3 Back-End VLSI Design with Tanner

EDA Pro Software Suite

3.4.3.1 Placement & Routing with L-

Edit SPR

3.4.3.2 Circuit Extraction with L-Edit

Extract

3.4.3.3 Circuit Simulation with T-Spice

Pro

3.4.4 Physical Verification & Tape-Off

Procedures

3.5 Development of Standard Cell Libraries

3.5.1 BBX Standard Cell Library for P&R Tool

3.5.2 Circuit Extraction Of Designs based on

BBX Standard Cells

3.5.3 BBX Replacement Procedure for

Fabrication

3.6 Summary

26

27

30

32

34

35

36

37

38

40

40

42

45

46

4 ALGORITHM & ARCHITECTURAL

SPECIFICATIONS FOR THE PROPOSED

NEUROHARDWARE

47

4.1 Kohonen Neural Network

4.1.1 Kohonen NN Learning Algorithm

4.1.2 Kohonen NN Recall Algorithm

47

49

52

x

4.2 Hardware Implementation - Algorithm Issues

4.2.1 Grid Topology & Initialization Scheme

4.2.2 Similarity Measure with Manhattan

Distance

4.2.3 Learning Rate

4.2.4 Neighbourhood Function

4.2.5 Termination Condition

4.3 Hardware Implementation - Architectural Issues

4.3.1 Type of Parallelism & Hardware

Architecture

4.3.2 Data Representation & Precision

4.3.3 Weight Storage

4.4 Summary

53

54

54

55

55

57

58

58

60

61

61

5 DESIGN OF THE NEURO CORE 63

5.1 Introduction of the UTM-Neuroprocessor (Top-

Level Architecture and Behaviour)

5.2 Design of the Neuro Core

5.3 Instruction Set & Format

5.4 Design Parameterization

5.5 Design of the Controller Module

5.5.1 Global Control Submodule

5.5.2 Core Control Submodule

5.5.3 Controller Operation For Kohonen Recall

Phase

5.5.4 Controller Operation For Kohonen

Learning Phase

5.6 Design of the Array Computation Engine

5.6.1 Manhattan Distance Computation

5.6.2 Countdown Operation

5.6.3 Weight Adaptation Computation

63

66

68

69

71

72

74

75

77

80

82

83

85

xi

5.7 Design of the Processing Element (PE)

5.7.1 Instruction & Data Passing

5.7.2 Manhattan Distance Computation

5.7.3 Countdown Function

5.7.4 Weight Adaptation Computation

5.8 Summary

87

88

88

90

91

93

6 DESIGN OF THE UTM-NEUROPROCESSOR 95

6.1 Design of the Neuro Co-processor

6.1.1 Design Of The Avalon Bus Interface

Hardware Module

6.2 The Design of the UTM-Neuroprocessor

6.3 Hardware Development of the UTM-

Neuroprocessor

6.4 Embedded Software Development

6.4.1 Pattern Classification & Recognition

(PCR) Software Module

6.5 Summary

95

96

98

99

103

103

107

7 VLSI IMPLEMENTATION OF THE ARRAY

COMPUTATION ENGINE

109

7.1 UTM-Neuroprocessor : Combined FPGA-VLSI

Implementation

7.2 Design Of The 2x2 Array Computation Engine

7.3 Logic Synthesis

7.4 Floorplanning

7.4.1 Floorplanning Of Top-Level Blocks

7.4.2 Power Distribution Network

7.5 Layout Design Of PE & TM Blocks

109

112

113

115

116

117

119

xii

7.5.1 VLSI Layout of Processing Element (PE)

7.5.2 VLSI Layout of the Transmission

Multiplexer (TM)

7.6 I/O Padframe Design

7.7 Full Chip Layout Design

7.8 Physical Verification

7.9 Tape-Off

7.10 Fabricated Array_2x2 Microchip

7.11 Summary

120

123

124

127

130

130

134

134

8 APPLICATION DEMONSTRATION &

PERFORMANCE EVALUATION

136

8.1 Demonstration of Pattern Recognition

Applications

8.1.1 Generic Classification – Iris Plants

Dataset

8.1.2 Classification in the Medical Domain –

WBCD Dataset

8.1.3 Classification in the Medical Domain –

HIV Blood Dataset

8.2 Performance Evaluation

8.2.1 Derivation of the Performance Equations

8.2.2 The MCPS Equation

8.2.3 The MCUPS Equation

8.2.4 Performance of the UTM-Neuroprocessor

8.2.5 Comparison with Previous Works

8.3 Summary

136

137

139

140

142

143

144

145

146

147

149

9 CONCLUSION 151

xiii

9.1 Concluding Remarks

9.2 Future Work

151

155

REFERENCES 157

PART TWO

APPENDICES

APPENDIX A – F 163

xiv

LIST OF TABLES

TABLE NO TITLE PAGE

3.1 Features of UTM-MTC35000 and AMI

MTC35100 Standard Cell Libraries

32

4.1 Implemented Properties of the Kohonen NN

Algorithm

62

5.1 Instruction Set 68

5.2 NN Variables for Execution of Process

Instructions

69

5.3 Parameterizable Settings of the Neuro Core 70

5.4 State Descriptions & Outputs of Global Control

Submodule

73

5.5 State Descriptions of Core Control Submodule 75

6.1 Avalon Basic Signals for Fundamental Slave

Transfers

97

6.2 Address Bit State & Corresponding Operations 98

6.3 List of Peripherals in UTM-Neuroprocessor 101

6.4 Software Routines Of The PCR Module 104

6.5 Address Pointers Used By The Software

Routines

106

7.1 LE resource count for FPGA implementation of

the UTM-Neuroprocessor

110

7.2 Parameters settings of the 2x2 Array

Computation Engine

112

7.3 Relevant Details for Array_2x2 Floorplanning

115

xv

7.4 Settings for P&R based Layout Design

Generation

120

7.5 VLSI Layout Details of PE 122

7.6 VLSI Layout Details of TM 124

7.7 Interconnections Between Package Pins and

Design Ports

132

8.1 Summary of Iris Plants Dataset 137

8.2 Classification Results of the Iris Plants Dataset 138

8.3 Summary of WBCD Dataset 139

8.4 Classification Results of the WBCD Dataset 140

8.5 Summary of HIV Blood Dataset 141

8.6 Classification results of the HIV Blood Dataset 142

8.7 Performance of the UTM-Neuroprocessor 147

8.8 Comparison with the UTM-Neuroprocessor

(Cheang 2003)

148

8.9 Comparison with the NBISOM-25 (Ruping et

al. 1996)

148

8.10 Comparison with the COKOS Chip

(Speckmann et al. 1992)

149

9.1 General Specifications of the UTM-

Neuroprocessor

152

9.2 Design Specifications of the 10x10 UTM-

Neuroprocessor Rapidly Prototyped in FPGA

153

9.3 Design Specifications of the Array_2x2

Microchip

154

9.4 Performance Estimation of a 10x10 UTM-

Neuroprocessor, Applying an Array

Computation Engine Implemented in VLSI

154

xvi

LIST OF FIGURES

FIGURE NO TITLE PAGE

1.1 Top-Level Block Diagram of UTM-

Neuroprocessor

5

3.1 Research Overview 21

3.2 Hardware/Software Development Flow for Nios

Processor based Embedded Systems

23

3.3 ASIC Design Flow Applied in UTM VLSI-

ECAD Research Laboratory

28

3.4 Unformatted Nor4 BBX Cell & Modified Nor4

BBX Cell For Tanner EDA Suite Compatibility

31

3.5 Logic Synthesis Process for Cell-Based Designs 33

3.6 Workflow to Perform Logic Synthesis using

Synopsys DC

34

3.7 Back-End VLSI Design using Tanner EDA Pro

Software Suite

35

3.8 Format of Cell-Based Designs Produced By

L-Edit SPR

36

3.9 Process Flow For DRC & ERC Conducted At

The Foundry

38

3.10 Empty Bonding Diagram & Completed

Bonding Diagram

39

3.11 Full Layout Version of Nand2 Cell & BBX

Version of the Nand2 Cell

34

3.12 Segment Of A Cell-Based Design Composed Of

3 Interconnected BBX Nand2 Cells

42

xvii

3.13 Extracted Netlist Of A BBX Standard Cell-

Based Design

43

3.14 Device Level Netlist of And2 Standard Cell

Obtained from library vendor

44

3.15 Netlist after Annotation of Subcircuit

Definitions using SPICE Include Statements

44

3.16 Segment of the Cell-Based Design after

Undergoing the BBX Replacement Procedure

45

4.1 General Structure Of Kohonen Neural Network 48

4.2 Grid Topologies of the Output Layer 49

4.3 Kohonen NN Learning Algorithm 49

4.4 Basic Neighborhood Functions 52

4.5 Kohonen NN Recall Algorithm 53

4.6 Manhattan Distance Equation 54

4.7 Alternative Neighbourhood Function for

Rectangular Topology

56

4.8 Boundaries & Relevant Parameters for

Rectangular Learning Neighbourhood Function

57

4.9 Processing Difference between Neuron-

Parallelism and Synapse-Parallelism

59

4.10 2-Dimensional Array Architecture for

Implementation of the Neuron-Parallel

Approach

60


Neuroprocessor

64

5.2 Design Hierarchy of UTM-Neuroprocessor 64

5.3 Top-Level Behaviour of UTM-Neuroprocessor 65

5.4 Top-Level Architecture of the Neuro Core 66

5.5 Neuro Core with a 10x10 Array Computation

Engine

67

5.6 I/O Interface of the Neuro Core 68

5.7 Instruction Format 69

5.8 I/O Block Diagram of the Controller 71

xviii

5.9 Functional Block Diagram of the Controller

Module

71

5.10 State Diagram of the Global Control Submodule 72

5.11 State Diagram of the Core Control Submodule 74

5.12 Computational Steps in the Kohonen Recall

Phase

75

5.13 Control Operation for Kohonen Recall Phase 76

5.14 Computational Steps of the Kohonen Learning

Algorithm

77

5.15 Control Operation for Kohonen Learning Phase 78

5.16 Array Computation Engine of 10x10 UTM-

Neuroprocessor

80

5.17 Instruction Passing Within A PE Row of the

10x10 Array

81

5.18 NIOS System Module design flow 82

5.19 Manhattan Distance Computation Within a Row

of the 10x10 Array

83

5.20 Countdown Operation 84

5.21 Signal Assertion During A Countdown

Operation

84

5.22 Learning Neighbourhood Lineation Using iRow

& iCol Signals

86

5.23 Weight Adaptation Computation Within Row 4

of the 10x10 Array

86

5.24 Functional Block Diagram of the PE 87

5.25 Instruction Passing Datapath & Data Passing

Datapath

88

5.26 Pipelined Datapath For Computation of

Manhattan Distance

89

5.27 Pipelined Computation of Manhattan Distance

in the PE

90

5.28 Countdown Function

91

xix

5.29 Pipelined Datapath For Weight Adaptation

Computation

92

5.30 Pipelined Execution of the Weight Update

Computation in the PE

93

6.1 Functional Block Diagram of Avalon Bus

Interface Module

96

6.2 Functional Block Diagram of Avalon Bus

Interface Module

97


Neuroprocessor

99

6.4 UTM-Neuroprocessor Hardware Development

Flow

100

6.5 User Logic Interface Wizard 101

6.6 SOPC Builder’s System Contents Window 102

6.7 Main Window of the GUI 105

6.8 Software Code Declaring the Address Pointers 106

6.9 Software Code Demonstrating Initiation of the

Recall Instruction

107

7.1 Combined FPGA-VLSI Implementation of

UTM-Neuroprocessor

111

7.2 Top-Level Block Diagram of 2x2 Array

Computation Engine

112

7.3 Bottom-Up Synthesis Flow of Array_2x2

Design

114

7.4 Floorplan of Array_2x2 Microchip 116

7.5 Pin Orientation of PE Block & I/O Port

Positioning for TM Block

117

7.6 Routing Through the Blocks & Routing Around

the Blocks

118

7.7 Power Distribution Network for Array_2x2

Microchip

119

7.8 PE for Column 0 121

7.9 PE for Column 1 122

xx

7.10 TM Block 123

7.11 Layout Design of I/O Padframe 126

7.12 I/O Filling for Pad Cells 127

7.13 Full Chip Design Workflow 128

7.14 Full Chip VLSI Layout of Array_2x2

Microchip

129

7.15 Physical Verification of Array_2x2 VLSI

Design

130

7.16 Wire Bonding Diagram of Array_2x2

Microchip

131

7.17 Pin Layout of PGA-84 Package 132

7.18 Fabricated and Packaged Array_2x2 Microchip 134

8.1 Graphical Visualization of the Trained Kohonen

Network for the Iris Plants Dataset

138

8.2 Graphical Visualization of the trained Kohonen

Network for the WBCD Dataset

140

8.3 Graphical Visualization of the trained Kohonen

Network for the HIV Blood Dataset

142

xxi

LIST OF ABBREVIATIONS

ASIC – Application Specific Integrated Circuit

CAD – Computer Aided Design

CMOS – Complementary Metal Oxide Semiconductor

CPU – Central Processing Unit

DMA – Direct Memory Access

EDA – Electronic Design Automation

FPGA – Field Programmable Gate Array

FSM – Finite State Machine

GUI – Graphical User Interface

HDL – Hardware Description Language

I/O – Input / Output

IP – Intellectual Property

LE – Logic Element

MCPS – Millions of Connections Per Second

MCUPS – Millions of Connections Update Per Second

NN – Neural Network

PC – Personal Computer

RAM – Random Access Memory

SIMD – Single Instruction Multiple Data

SoC – System On Chip

SOM – Self-Organizing Map

SRAM – Static Random Access Memory

VB – Visual Basics

VHDL – VHSIC Hardware Description Language

VHSIC – Very High Speed Integrated Circuit

VLSI – Very High Scale Integrated Circuit

xxii

LIST OF APPENDICES

APPENDIX TITLE PAGE

A VHDL Source Code of the Neuro Co-Processor 163

B VHDL Source Code of the 2x2 Array

Computation Engine

167

C C Source Code of the PCR Embedded Software

Module

173

D Visual Basics Source Code Of The GUI

Program

185

E Matlab Script For Graphical Visualization Of

Kohonen Network

199

F Pattern Recognition Datasets 201

CHAPTER 1

INTRODUCTION

This thesis proposes the VLSI design and implementation of a neural network

hardware, or neurohardware for pattern recognition. The aim is to produce a

neurohardware that executes the Kohonen Neural Network at massive parallelism for

high-speed pattern classification and serves as a comprehensive computing platform

for pattern recognition applications. In this first chapter, the background of the

domain is discussed, providing the rationale and focus points behind the research.

1. 1 Background and Motivation

Although conventional logic based computing has been successful in many

applications, it has not been effective in solving a variety of critical and complex

problems. At the same time, these perplexing problems are solved trivially and

routinely in real-time by human beings. It is this intriguing predicament that has led

to the study of information processing by the human brain and subsequently the

emergence of artificial neural networks. Artificial neural networks, or simply neural

networks, attempt to mimic the computational power of the mammalian brain by

massively interconnecting very simple computational units called neurons (Misra

1997). The adoption of this similar design philosophy provides neural networks with

the ability to learn and solve tasks challenging to conventional computing.

2

Neural networks have found a wide range of applications, with the majority

associated with the pattern recognition domain. The pattern classification attribute of

neural networks have been instrumental in the following examples of pattern

recognition applications; predictive and preventive maintenance, condition

monitoring, character recognition, speech synthesis, intelligence based medical

diagnosis and intrusion detection and predictive penetration services in computer

networks.

As artificial neural networks gain popularity for pattern recognition in a

variety of application domains, it is critical that these models are able to be executed

speedily and generate results in real-time (Lindsey 1998). Although a number of

implementations of neural networks are available on conventional general purpose

machines, most of these implementations require an inordinate amount of time to

train or run neural networks and are not able to provide real-time response, especially

when the network sizes are large. Large network sizes are often required in real

world applications and execution performances by these machines are simply

unacceptable. This drawback is apparently due to the fact that general purpose

computers are traditionally based on the von-Neumann architecture which is

sequential in nature (Schoenauer 1998). Artificial neural networks on the other hand

have a parallel structure by conception.

The most obvious solution to this problem is to accelerate the execution

artificial neural network algorithms is through simulation on dedicated parallel

hardware. The massively parallel and distributed processing brand of neural

networks suggest massively parallel hardware as an obvious implementation choice

to obtain appropriate algorithm-architecture matching and high execution speeds.

When implemented in parallel hardware, neural networks can take full advantage of

their inherent parallelism and run in orders of magnitude faster than software

simulations on sequential machines. Parallel processing with multiple simple

processing elements working together can provide tremendous speed-ups in neural

network and produce real-time responses and fast learning phases.

Consequently, a new breed of hardware, termed neurohardware, have

emerged. Neurohardware is typically defined as dedicated hardware designed to

3

implement neural algorithms and take full advantage of the inherent parallelism in

neural networks through parallel processing. At present, a wide spectrum of

neurohardware implementations that primarily differ in terms of performance-space

compromise, degree of parallelism and system architecture approaches are available.

However, with the continuous burgeoning of neural paradigms and increasing

applicability of neural networks in real-time based applications, there is a great

demand and market for massively parallel and dedicated neurohardware.

1.2 Problem Statement

Neurohardware providing parallel execution platforms for neural networks

can adopt two different architectural directions in doing so; general-purpose

architectures that emulate a wide range of neural network models, and special-

purpose architectures dedicated to a specific neural paradigm (Ruckert 2002).

Dedicated implementations are able to be built at a low hardware cost to execute the

algorithm more quickly and efficiently compared to general-purpose architectures.

The speed achieved by special-purpose architectures is unattainable by general-

purpose architectures (Liao 2001). Therefore, special-purpose architectures would be

viable for neurohardware targeting high-speed execution of specific neural

paradigms.

Neural network can be effectively grouped into three categories that are

distinguishable by their learning approaches; supervised, reinforcement and

unsupervised (Cheang 2003). Unsupervised learning possesses a number of

advantages over the other types of learning, which includes faster training and

execution for large networks. This brand of networks would be suitable for pattern

recognition problems, given their ability to detect structures and relations in data that

are not so apparent. One such neural paradigm that has been successful in pattern

classification and recognition applications is the Kohonen neural network (NN)

algorithm. The Kohonen NN is a proven algorithm and could be easily mapped onto

hardware than other neural paradigms (Glesner and Pochmuller 1994).

4

In developing the ASIC implementation of neurohardware, two main stream

technologies can be considered. FPGAs have evolved tremendously under the current

advancements of VLSI process technologies. The flexibility and reconfigurability of

FPGAs advantageously support parameterized designs and rapid prototyping of

advanced hardware architectures. VLSI implementations on the other hand are able

to guarantee higher compaction and speed for hardware designs, compared to

FPGAs. However, both technologies can be jointly utilized and advantageously

exploited for implementation of neurohardware.

1.3 Objectives

From the discussion in the preceding sections, the objectives of the work

presented in this thesis are as follows:

1) To design a neurohardware that provides high-speed pattern classification.

The Kohonen NN algorithm is to be implemented, in a massively parallel and

dedicated manner, to deliver high-speed pattern classification.

2) To design a neurohardware that serves as a comprehensive computing

platform for pattern recognition applications, based on the Kohonen NN

algorithm.

3) To propose a viable VLSI implementation approach for the designed

neurohardware and to develop a prototype for demonstration of real-world

pattern recognition applications.

5

1.4 Scope of Work

Neuroprocessor

GUI

Nios

Standard

Peripherals

AvalonBus

InterfaceController

Av

alo

n B

us

Nios CPU

Nios LibraryModule

PCRModule

UART

Host PC

Array Computation Engine

Neuro Co-processor

Neuro Core

Figure 1.1 : Top-Level Block Diagram of UTM-Neuroprocessor

Based on the outlined objectives, the neurohardware design illustrated by

Figure 1.1, the UTM-Neuroprocessor, is proposed in this work. The scope of work

involved in producing the proposed neurohardware is as follows:

1) Comprehensive study of the Kohonen NN algorithm and determining

necessary algorithmic modifications for efficient hardware implementation of

the algorithm.

2) Design and FPGA prototyping of the UTM-Neuroprocessor, using the Altera

Nios embedded system development kit. The Neuro co-processor that

executes the Kohonen NN at massive parallelism is designed using VHDL.

3) Development of the software components of the UTM-Neuroprocessor. The

embedded software component, the PCR module, is developed in C while the

GUI program is developed using Visual Basics.

6

4) VLSI implementation and fabrication of the array computation engine in the

AMI 0.5 µm process technology, to realize implementation of the UTM-

Neuroprocessor in the proposed combined FPGA-VLSI approach.

5) Demonstration of real-world pattern recognition datasets on the UTM-

Neuroprocessor. Datasets from selected application domains are used to

verify the classification speed and viability of the neuroprocessor as a

computing platform for pattern recognition applications.

1.5 Research Contributions

1) A systematic study and modification of the Kohonen NN algorithm for

efficient hardware implementation.

2) A comprehensive design and prototyping flow for FPGA-based embedded

systems using the Altera Nios development kit.

3) A viable ASIC design methodology for the UTM-ECAD VLSI Laboratory,

based on the AMI 0.5µm process technology. The design methodology

incorporates industry standard EDA tools and is applicable from design entry

stages to tape-out.

1.6 Organization of Thesis

The work in this thesis is presented conveniently over eight chapters. The

first chapter outlines the motivations and objectives of the thesis and subsequently

presents the scope of work involved in meeting the research objectives.

7

The second chapter provides brief summaries of the literature reviewed prior

to engaging in the mentioned scope of work. Review of literature on previously

attempted efforts assists in achieving the research objectives.

Chapter three presents the VLSI design methods and tools adopted in

producing the FPGA prototyping and ASIC implementation of the neurohardware in

this work.

Chapter four provides a detailed discussion on the implemented Kohonen NN

algorithm and outlines the necessary algorithmic modifications. Based on the

modifications, the architectural and design specifications of the neurohardware is

ascertained.

Chapter five delivers a description of the top-level architecture and behaviour

of the UTM-Neuroprocessor. Subsequently, the focus is shifted to the design of the

hardware module which implements the Kohonen NN algorithm at massive

parallelism for the neuroprocessor, is detailed elaborately in the chapter.

Chapter six dwells into the embedded system design and prototyping of the

UTM-Neuroprocessor using the Altera Nios development kit. The software

components of the neuroprocessor are also discussed in the chapter.

Chapter seven focuses on the VLSI implementation of the array computation

engine, for implementation of the UTM-Neuroprocessor in the combined FPGA-

VLSI approach. The chapter also presents ASIC design and fabrication of a

prototype design of the computation engine, termed the Array_2x2 microchip, in the

AMI 0.5 µm process technology.

Chapter eight provides details the application demonstration work on the

UTM-Neuroprocessor, using real-world pattern recognition datasets. Performance

evaluation and benchmarking of the neuroprocessor against previous works are also

reported by the chapter.

8

In the final chapter of the thesis, the research work is summarized and

deliverables of the research are stated. Potential extensions and improvements to the

design are also given.

1.7 Summary

In this chapter, a brief introduction was given to the background and

motivation. The need for neurohardware that executed neural networks at massive

parallelism for high-speed pattern classification was identified. Correspondingly,

several objectives were outlined to meet this need in the research. The UTM-

Neuroprocessor was proposed to fulfill the objectives of the research. The UTM-

Neuroprocessor executes the Kohonen Neural Network at massive parallelism for

high-speed pattern classification and serves as a comprehensive computing platform

for pattern recognition applications. The following chapter will discuss some

literature relevant to producing the proposed neurohardware and covers previous

works accomplished on the design of neurohardware for the similar objectives.

REFERENCES

Ahmad, R., Bambang, S.S., Rahman, W. and Rais, A. (1999). “Development of

Optimized Digital CMOS Standard Cell Library”. Proceeding of World Engineering

Congress (WEC99). 159-161.

Altera Corporation. (2003a). “Nios Hardware Development Tutorial”. Altera

Corporation.

Altera Corporation. (2003b). “Introduction to Quartus II”. Altera Corporation.

Altera Corporation. (2003c). “Sopc Builder Data Sheet”. Altera Corporation.

Altera Corporation. (2003d). “Nios Software Development Tutorial”. Altera

Corporation.

Altera Corporation (2003e). “Nios Development Board: Reference Manual, Stratic

Professional Edition”. Altera Corporation.

Altera Corporation. (2003f). “Stratix Device Handbook: Volume 1”. Altera

Corporation.

Asral B.J, Ahmad, R., Rais, A. (1999). “Standard Cell Library Development.”

Proceeding of IEEE International Conference on Microelectronics (ICM). 161-163.

Beale, R. and Jackson, T. (1990). “Neural Computing: An Introduction”. Adam

Hilger, IOP Publishing Ltd.

158

Brown, S., and Vranesic, Z. (2000). “Fundamentals of Digital Logic with VHDL

Design”. Singapore: McGraw_Hill.

Burr, J.B. (1992). “Digital Parallel Implementations of Neural Networks”. Prentice

Hall. 223-281.

Carpinelli, J. D. (2001). “Computer Systems, Organization & Architecture”. USA:

Addison Wesley.

Cheang, CH. (2003). “A Digital Neurohardware Implementation of Kohonen Neural

Network for Pattern Recognition”. Universiti Teknologi Malaysia.

Fausett, L. (1994). “Fundamentals of Neural Networks”. Prentice-Hall Inc.

Fischer, T., Eppler, W., Gemmeke, H., Kock, G. and Becher, T. (1997). “The SAND

Neurochip and its Embedding in the MiND System”. 7th

International Conference on

Artificial Neural Networks.

Fisher, R.A. (1936). "The Use of Multiple Measurements in Taxonomic Problems".

Annual Eugenics, 7, Part II, 179-188.

Geus, A. J. (1989). “Logic synthesis speeds ASIC Design”. IEEE Spectrum. August

1989. 27-31.

Glesner, M. and Pochmuller, W. (1994). “Neurocomputers – An overview of neural

networks in VLSI”. Chapman and Hall.

Haykin, S. (1994). “Neural Networks: A Comprehensive Foundation”. Macmillan

College Publishing Company.

Honkela, T., Kaski, S., Lagus, K. and Kohonen, T. (1997). “Websom --- self-

organizing maps of document collections”. Workshop on Self-Organizing Maps.

159

Ienne, P. (1995). “Digital Hardware Architectures for Neural Networks”. SPEEDUP

Journal, Vol. 9, No. 1.

Kohonen, T. (1995). “Self-Organizing Maps”. Springer-Verlag.

Kohonen, T. (1988). “The neural phonetic typewriter”. Computer.

Kohonen, T., Torkkola, K., Shozakai, M.,.Kangas, J. and Venta, O. (1988).

“Phonetic Typewriter for Finnish and Japanese”. Proceedings of the IEEE 1988

International Conference on Acoustics, Speech and Signal Processing.

Karnik, T. (2000). “Microprocessor Layout Method”. in Chen Wai-Kai. “The VLSI

Handbook”. USA: CRC Press. 62.1-62.28.

Kurup, P. and Abbasi, T. (1997). “Logic Synthesis Using Synopsys”. USA: Kluwer

Academic Publishers.

Lindsey, Clark S. (1998). “Neural Networks in Hardware: Architectures, Products

and Applications”. www.particle.kth.se/~lindsey.

Manavendra Misra (1997). “Parallel Environments for Implementing Neural

Networks”. Neural Computing Surveys. Vol 1. 48-60.

Martin-Del-Brio, B., Medrano-Marques, N. and Hernandez-Sanchez, S. “A Low-

Cost Neuroprocessor Board for Emulating the SOFM Neural Model”. 5th IEEE

International Conference on Electronics, Circuits and Systems.

Melton, M.S., Tan Phan, Reeves, D.S. and Van den Bout, D.E. (1992). “The

TInMANN VLSI Chip”. IEEE Transactions on Neural Networks.

Moerland, P.D. and Fiesler, E. (1996). “Hardware-Friendly Learning Algorithms for

Neural Networks: An Overview”. Fifth International Conference on Microelectronics

for Neural Networks and Fuzzy Systems.

160

Moerland, P. and Fiesler, E. (1997). “Neural Network Adaptations to Hardware

Implementations”. Handbook of Neural Computation E1.2. 1-13.

Mohamed Khalil and Koay, K.H. (1999). “VHDL Module Generator: A Rapid-

prototyping Design Entry Tool for Digital ASICs”. Jurnal Teknologi.

Muroga, S. (2000). “Cell-Library Design Approach”, in Chen Wai-Kai. “The VLSI

Handbook”. USA: CRC Press. 45.1-45.3.

Murphy, P.M. and Aha, D.W. (1994). “UCI Repository of machine learning

databases”. University of California, Department of Information and Computer

Science. (http://www.ics.uci.edu/~mlearn/MLRepository.html).

Patterson, D.W. (1996). “Artificial Neural Networks: Theory and Applications”.

Prentice Hall.

Peter van der Putten. (1996). “Utilizing the Topology Preserving Property of Self-

Organizing Maps for Classification”. Utrecht University: Masters thesis.

Rabaey, J. (2003). “Digital Integrated Circuits: A Design Perspective”. Usa:

Prentice-Hall. 2nd

Edition.

Ruping, S., Ruckert, U and Goser, K. (1993). “Hardware Design For SOFM with

Binary Input Vectors”. “New Trends in Neural Computation, Lecture Notes in

Computer Science”. Springer Verlag, Berlin.

Ruping, S., Ruckert, U and Goser, K. (1994). “A Chip for Self Organizing Feature

Maps”. Fourth International Conference on Microelectronics for Neural Networks

and Fuzzy Systems.

Ruping, S. and Ruckert U. (1996). “A Scalable Processor Array for SOFM”. Fifth

International Conference on Microelectronics for Neural Networks and Fuzzy

Systems.

161

Ruping, S., Porrmann, M. and Ruckert, U. (1997). “A High Performance SOFM

Hardware System”. International Work-Conference on Artificial and Natural Neural

Networks.

Ruping, S., Porrmann, M. and Ruckert, U. (1999) “SOM Hardware with

Acceleration Module for Graphical Representation of the Learning Process”. Seventh

International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired

Systems.

Salapura, V. (1994). “Neural Networks using bit stream arithmetic: A space efficient

implementation”. IEEE International Symposium on Circuits and Systems, London.

Schoenauer, T. et al. (1998). “Digital Neurohardware: Principles and perspectives”.

Neuronal Networks in Applications.

Schoenauer, T., Jahnke, A., Roth, U. and Klar, H. (1998). “Digital Neurohardware:

Principles and Perspectives”. Neural Networks in Applications.

Skapura, D.M. (1996). “Building Neural Networks”. ACM Press.

Speckmann, H., Thole, P. and Rosenstiel, W. (1992). “Hardware Implementation of

Kohonen's Self Organising Feature Map”. International Conference of Neural

Networks.

Speckmann, H., Thole, P. and Rosenstiel, W. (1993). “Hardware Synthesis for

Neural Networks from a Behavioral Description With VHDL”. International Joint

Conference on Neural Networks.

Silvaco International (1998). “Cell Characterization with .Modif Statement in

SmartSpice”. Simulation Standard. Volume 9. 12-13.

Synopsys Online Documentation. (2000a). “Design Compiler User Guide”. Synopsys

Inc., USA.

162

Synopsys Online Documentation. (2000b). “Design Compiler Tutorial”. Synopsys

Inc., USA.

Synopsys Online Documentation. (2000c). “Library Compiler User Guide”.

Synopsys Inc., USA.

Tanner Research, Inc. (2001a). “L-Edit User Guide”. Tanner Research Inc., USA.

Tanner Research, Inc. (2001b). “T-Spice Pro User Guide”. Tanner Research Inc.,

USA.

Vesanto, J. (2000). “Neural Network Tool for Data Mining: SOM Toolbox”.

Symposium for Tool Environments and Development Methods for Intelligent

Systems.

Visa, A., Iivarinen, J., Valkealahti, K. and Simula, O. (1995). “Neural Network

Based Classifier”. International Conference on Artificial Neural Networks.

Weste, N.H.E. and Eshraghian, K. (1992). “Principles of CMOS VLSI Design – A

Systems Perspective (Second Edition)”. Addison-Wesley Publishing Company.

Wolberg, W. H. and Mangasarian, O. L. (1990) "Cancer diagnosis via linear

programming". SIAM News, Volume 23, Number 5.

Xiang Fang, Thole, P., Goppert, J. and Rosenstiel, W. (1996). “A Hardware

Supported System for A Special Online Application of Self-Organising Maps”.

International Conference on Neural Networks.

Yihua Liao. (2001). “Neural Networks in Hardware: A Survey”. Department of

Computer Science, University of California: ECS250A Project.

vlsi design of a neurohardware processor...

Documents