vlsi design of a neurohardware processor...
TRANSCRIPT
VLSI DESIGN OF A NEUROHARDWARE PROCESSOR IMPLEMENTING
THE KOHONEN NEURAL NETWORK ALGORITHM
AVINASH RAJAH
UNIVERSITI TEKNOLOGI MALAYSIA
VLSI DESIGN OF A NEUROHARDWARE PROCESSOR IMPLEMENTING
THE KOHONEN NEURAL NETWORK ALGORITHM
AVINASH RAJAH
A thesis submitted in fulfilment of the
requirements for the award of the degree of
Master of Engineering (Electrical)
Faculty of Electrical Engineering
Universiti Teknologi Malaysia
DECEMBER 2005
v
ABSTRACT
As artificial neural networks continue to gain popularity in the domain of
pattern recognition, there have been growing demands for these models to be
executed at high-speeds. Thus, to cater to this need, the VLSI design and
implementation of a neurohardware for high-speed pattern recognition is proposed in
this research. The UTM-Neuroprocessor implements the Kohonen Neural Network
for pattern classification. High-speed pattern classification by the neural paradigm is
achieved through massively parallel execution based on the neuron-parallel
processing approach. For proof of concept purposes, a 10x10 UTM-Neuroprocessor,
which implements a 10x10 Kohonen network, was developed in this work. The
design and rapid FPGA prototyping of the neuroprocessor was achieved using
VHDL and the Altera Nios embedded system development kit. The FPGA-based
prototype of the 10x10 UTM-Neuroprocessor is able to function at a frequency of
100 MHz and delivers performances up to 5.079 GCPS and 2.285 GCUPS. Software
components, including a VB-based GUI, were also developed to allow execution of
pattern recognition applications on the UTM-Neuroprocessor. For efficient VLSI
implementation of the UTM-Neuroprocessor, the combined FPGA-VLSI approach
was proposed. Correspondingly, the VLSI design of a 2x2 array computation engine,
termed the Array_2x2 microchip, was developed in the AMI 0.5µm process
technology and fabricated at the Europractice IC foundry. The fabricated Array_2x2
microchip can be applied to produce a 2x2 UTM-Neuroprocessor, in the combined
FPGA-VLSI implementation approach. The design consumes an area of 16.9 mm2 on
silicon and is encapsulated in 84-pin PGA package. SPICE simulations of the
Array_2x2 design proved functionality at an operating frequency of 90 MHz. The
microchip is able to deliver performances of up to 169.41 MCPS and 75.78 MCUPS
MCUPS for a 2x2 UTM-Neuroprocessor.
vi
ABSTRAK
Pertumbuhan dalam penggunaan teknologi rangkaian saraf (neural networks)
dalam pelbagai bidang aplikasi telah mewujudkan keperluan untuk perkakasan
mikroelektronik canggih yang mampu melaksanakan rangkaian saraf pada kelajuan
tinggi. Oleh demikian, implementasi VLSI bagi sebuah pemproses yang
melaksanakan rangkaian saraf Kohonen pada kelajuan tinggi untuk aplikasi
pengecaman corak, telah dicadangkan dalam kajian ini. Pengecaman corak pada
kelajuan tinggi oleh UTM-Neuroprocessor direalisasikan menggunakan kaedah
pemprosesan selari. Bagi tujuan pemprototaipan, rekabentuk sebuah UTM-
Neuroprocessor, yang melaksanakan rangkaian Kohonen 10x10, telah dibangunkan.
Pembangunan rekabentuk 10x10 UTM-Neuroprocessor telah menggunakan VHDL
dan kit pembangunan sistem terbenam Altera Nios. Rekabentuk yang dibangunkan
telah diprototaip segera pada FPGA dan mampu mencapai kelajuan setinggi 100
MHz. Beberapa aturcara perisian juga telah dibangunkan bersama, untuk
membolehkan pemprosesan aplikasi pengecaman corak pada 10x10 UTM-
Neuroprocessor. Menggunakan aturcara yang telah dibangunkan, beberapa aplikasi
pengecaman corak dunia sebenar telah dilaksanakan. UTM-Neuroprocessor telah
didapati mampu menawarkan kuasa pemprosesan setinggi 5.079 GCPS dan 2.285
GCUPS untuk aplikasi-aplikasi tersebut. Bagi implementasi VLSI UTM-
Neuroprocessor, kaedah FPGA-VLSI tergabung telah dicadangkan. Berdasarkan
cadangan tersebut, sebuah mikrocip yang melaksanakan rangkaian Kohonen 2x2
telah direkabentuk dengan proses teknologi AMI 0.5µm. Mikrochip tersebut telah
difabrikasi di Europractice, Belgium dan boleh digunakan untuk membangunkan
sebuah 2x2 UTM-Neuroprocessor dalam kaedah implementasi FPGA-VLSI
tergabung. Simulasi SPICE telah membuktikan kefungsian rekabentuk mikrocip
tersebut pada 90 MHz. Pada kelajuan ini, Array_2x2 membolehkan kuasa
pemprosesan setinggi 169.41 MCPS dan 75.78 MCUPS dicapai oleh sebuah 2x2
UTM-Neuroprocessor.
vii
TABLE OF CONTENTS
CHAPTER TITLE PAGE
DECLARATION ii
DEDICATION iii
ACKNOWLEDGEMENTS iv
ABSTRACT v
ABSTRAK vi
TABLE OF CONTENTS vii
LIST OF TABLES xiv
LIST OF FIGURES xvi
LIST OF ABBREVIATIONS xxi
LIST OF APPENDICES xxii
PART ONE
THESIS CONTENT
1 INTRODUCTION 1
1. 1 Background and Motivation
1.2 Problem Statement
1.3 Objectives
1.4 Scope of Work
1.5 Research Contributions
1
3
4
5
6
viii
1.6 Organization of Thesis
1.7 Summary
6
8
2 LITERATURE REVIEW 9
2.1 Pattern Recognition
2.2 Unsupervised Learning in Neural Networks
2.3 The Kohonen Neural Network
2.4 Neurohardware Architecture
2.4.1 General-Purpose versus Special-Purpose
Neurohardware
2.4.2 System Architecture
2.5 Review of Previous Works
2.5.1 The nEXPERT System
2.5.2 The UTM-Neuroprocessor
2.5.3 The NBISOM-25 Chip
2.5.4 The COKOS Chip
2.6 Summary
9
11
12
12
13
14
15
15
16
17
18
18
3 VLSI DESIGN METHODS AND TOOLS 20
3.1 Research Overview
3.2 VHDL Modeling with UTM-VHDLmg
3.3 Hardware Prototyping of FPGA-based Embedded
Systems
3.3.1 Design Flow of FPGA-based Embedded
Systems
3.3.2 Embedded System Design with Altera
Nios Development Kit
3.3.3 Designing with Altera Quartus II EDA
Tool
20
22
23
23
24
24
ix
3.3.4 Embedded Software Development
3.4 ASIC Design Methodology in UTM-ECAD
VLSI Research Laboratory
3.4.1 Standard Cell Library for AMI 0.5um
Process Technology
3.4.2 Logic Synthesis with Synopsys Design
Compiler
3.4.3 Back-End VLSI Design with Tanner
EDA Pro Software Suite
3.4.3.1 Placement & Routing with L-
Edit SPR
3.4.3.2 Circuit Extraction with L-Edit
Extract
3.4.3.3 Circuit Simulation with T-Spice
Pro
3.4.4 Physical Verification & Tape-Off
Procedures
3.5 Development of Standard Cell Libraries
3.5.1 BBX Standard Cell Library for P&R Tool
3.5.2 Circuit Extraction Of Designs based on
BBX Standard Cells
3.5.3 BBX Replacement Procedure for
Fabrication
3.6 Summary
26
27
30
32
34
35
36
37
38
40
40
42
45
46
4 ALGORITHM & ARCHITECTURAL
SPECIFICATIONS FOR THE PROPOSED
NEUROHARDWARE
47
4.1 Kohonen Neural Network
4.1.1 Kohonen NN Learning Algorithm
4.1.2 Kohonen NN Recall Algorithm
47
49
52
x
4.2 Hardware Implementation - Algorithm Issues
4.2.1 Grid Topology & Initialization Scheme
4.2.2 Similarity Measure with Manhattan
Distance
4.2.3 Learning Rate
4.2.4 Neighbourhood Function
4.2.5 Termination Condition
4.3 Hardware Implementation - Architectural Issues
4.3.1 Type of Parallelism & Hardware
Architecture
4.3.2 Data Representation & Precision
4.3.3 Weight Storage
4.4 Summary
53
54
54
55
55
57
58
58
60
61
61
5 DESIGN OF THE NEURO CORE 63
5.1 Introduction of the UTM-Neuroprocessor (Top-
Level Architecture and Behaviour)
5.2 Design of the Neuro Core
5.3 Instruction Set & Format
5.4 Design Parameterization
5.5 Design of the Controller Module
5.5.1 Global Control Submodule
5.5.2 Core Control Submodule
5.5.3 Controller Operation For Kohonen Recall
Phase
5.5.4 Controller Operation For Kohonen
Learning Phase
5.6 Design of the Array Computation Engine
5.6.1 Manhattan Distance Computation
5.6.2 Countdown Operation
5.6.3 Weight Adaptation Computation
63
66
68
69
71
72
74
75
77
80
82
83
85
xi
5.7 Design of the Processing Element (PE)
5.7.1 Instruction & Data Passing
5.7.2 Manhattan Distance Computation
5.7.3 Countdown Function
5.7.4 Weight Adaptation Computation
5.8 Summary
87
88
88
90
91
93
6 DESIGN OF THE UTM-NEUROPROCESSOR 95
6.1 Design of the Neuro Co-processor
6.1.1 Design Of The Avalon Bus Interface
Hardware Module
6.2 The Design of the UTM-Neuroprocessor
6.3 Hardware Development of the UTM-
Neuroprocessor
6.4 Embedded Software Development
6.4.1 Pattern Classification & Recognition
(PCR) Software Module
6.5 Summary
95
96
98
99
103
103
107
7 VLSI IMPLEMENTATION OF THE ARRAY
COMPUTATION ENGINE
109
7.1 UTM-Neuroprocessor : Combined FPGA-VLSI
Implementation
7.2 Design Of The 2x2 Array Computation Engine
7.3 Logic Synthesis
7.4 Floorplanning
7.4.1 Floorplanning Of Top-Level Blocks
7.4.2 Power Distribution Network
7.5 Layout Design Of PE & TM Blocks
109
112
113
115
116
117
119
xii
7.5.1 VLSI Layout of Processing Element (PE)
7.5.2 VLSI Layout of the Transmission
Multiplexer (TM)
7.6 I/O Padframe Design
7.7 Full Chip Layout Design
7.8 Physical Verification
7.9 Tape-Off
7.10 Fabricated Array_2x2 Microchip
7.11 Summary
120
123
124
127
130
130
134
134
8 APPLICATION DEMONSTRATION &
PERFORMANCE EVALUATION
136
8.1 Demonstration of Pattern Recognition
Applications
8.1.1 Generic Classification – Iris Plants
Dataset
8.1.2 Classification in the Medical Domain –
WBCD Dataset
8.1.3 Classification in the Medical Domain –
HIV Blood Dataset
8.2 Performance Evaluation
8.2.1 Derivation of the Performance Equations
8.2.2 The MCPS Equation
8.2.3 The MCUPS Equation
8.2.4 Performance of the UTM-Neuroprocessor
8.2.5 Comparison with Previous Works
8.3 Summary
136
137
139
140
142
143
144
145
146
147
149
9 CONCLUSION 151
xiii
9.1 Concluding Remarks
9.2 Future Work
151
155
REFERENCES 157
PART TWO
APPENDICES
APPENDIX A – F 163
xiv
LIST OF TABLES
TABLE NO TITLE PAGE
3.1 Features of UTM-MTC35000 and AMI
MTC35100 Standard Cell Libraries
32
4.1 Implemented Properties of the Kohonen NN
Algorithm
62
5.1 Instruction Set 68
5.2 NN Variables for Execution of Process
Instructions
69
5.3 Parameterizable Settings of the Neuro Core 70
5.4 State Descriptions & Outputs of Global Control
Submodule
73
5.5 State Descriptions of Core Control Submodule 75
6.1 Avalon Basic Signals for Fundamental Slave
Transfers
97
6.2 Address Bit State & Corresponding Operations 98
6.3 List of Peripherals in UTM-Neuroprocessor 101
6.4 Software Routines Of The PCR Module 104
6.5 Address Pointers Used By The Software
Routines
106
7.1 LE resource count for FPGA implementation of
the UTM-Neuroprocessor
110
7.2 Parameters settings of the 2x2 Array
Computation Engine
112
7.3 Relevant Details for Array_2x2 Floorplanning
115
xv
7.4 Settings for P&R based Layout Design
Generation
120
7.5 VLSI Layout Details of PE 122
7.6 VLSI Layout Details of TM 124
7.7 Interconnections Between Package Pins and
Design Ports
132
8.1 Summary of Iris Plants Dataset 137
8.2 Classification Results of the Iris Plants Dataset 138
8.3 Summary of WBCD Dataset 139
8.4 Classification Results of the WBCD Dataset 140
8.5 Summary of HIV Blood Dataset 141
8.6 Classification results of the HIV Blood Dataset 142
8.7 Performance of the UTM-Neuroprocessor 147
8.8 Comparison with the UTM-Neuroprocessor
(Cheang 2003)
148
8.9 Comparison with the NBISOM-25 (Ruping et
al. 1996)
148
8.10 Comparison with the COKOS Chip
(Speckmann et al. 1992)
149
9.1 General Specifications of the UTM-
Neuroprocessor
152
9.2 Design Specifications of the 10x10 UTM-
Neuroprocessor Rapidly Prototyped in FPGA
153
9.3 Design Specifications of the Array_2x2
Microchip
154
9.4 Performance Estimation of a 10x10 UTM-
Neuroprocessor, Applying an Array
Computation Engine Implemented in VLSI
154
xvi
LIST OF FIGURES
FIGURE NO TITLE PAGE
1.1 Top-Level Block Diagram of UTM-
Neuroprocessor
5
3.1 Research Overview 21
3.2 Hardware/Software Development Flow for Nios
Processor based Embedded Systems
23
3.3 ASIC Design Flow Applied in UTM VLSI-
ECAD Research Laboratory
28
3.4 Unformatted Nor4 BBX Cell & Modified Nor4
BBX Cell For Tanner EDA Suite Compatibility
31
3.5 Logic Synthesis Process for Cell-Based Designs 33
3.6 Workflow to Perform Logic Synthesis using
Synopsys DC
34
3.7 Back-End VLSI Design using Tanner EDA Pro
Software Suite
35
3.8 Format of Cell-Based Designs Produced By
L-Edit SPR
36
3.9 Process Flow For DRC & ERC Conducted At
The Foundry
38
3.10 Empty Bonding Diagram & Completed
Bonding Diagram
39
3.11 Full Layout Version of Nand2 Cell & BBX
Version of the Nand2 Cell
34
3.12 Segment Of A Cell-Based Design Composed Of
3 Interconnected BBX Nand2 Cells
42
xvii
3.13 Extracted Netlist Of A BBX Standard Cell-
Based Design
43
3.14 Device Level Netlist of And2 Standard Cell
Obtained from library vendor
44
3.15 Netlist after Annotation of Subcircuit
Definitions using SPICE Include Statements
44
3.16 Segment of the Cell-Based Design after
Undergoing the BBX Replacement Procedure
45
4.1 General Structure Of Kohonen Neural Network 48
4.2 Grid Topologies of the Output Layer 49
4.3 Kohonen NN Learning Algorithm 49
4.4 Basic Neighborhood Functions 52
4.5 Kohonen NN Recall Algorithm 53
4.6 Manhattan Distance Equation 54
4.7 Alternative Neighbourhood Function for
Rectangular Topology
56
4.8 Boundaries & Relevant Parameters for
Rectangular Learning Neighbourhood Function
57
4.9 Processing Difference between Neuron-
Parallelism and Synapse-Parallelism
59
4.10 2-Dimensional Array Architecture for
Implementation of the Neuron-Parallel
Approach
60
5.1 Top-Level Block Diagram of UTM-
Neuroprocessor
64
5.2 Design Hierarchy of UTM-Neuroprocessor 64
5.3 Top-Level Behaviour of UTM-Neuroprocessor 65
5.4 Top-Level Architecture of the Neuro Core 66
5.5 Neuro Core with a 10x10 Array Computation
Engine
67
5.6 I/O Interface of the Neuro Core 68
5.7 Instruction Format 69
5.8 I/O Block Diagram of the Controller 71
xviii
5.9 Functional Block Diagram of the Controller
Module
71
5.10 State Diagram of the Global Control Submodule 72
5.11 State Diagram of the Core Control Submodule 74
5.12 Computational Steps in the Kohonen Recall
Phase
75
5.13 Control Operation for Kohonen Recall Phase 76
5.14 Computational Steps of the Kohonen Learning
Algorithm
77
5.15 Control Operation for Kohonen Learning Phase 78
5.16 Array Computation Engine of 10x10 UTM-
Neuroprocessor
80
5.17 Instruction Passing Within A PE Row of the
10x10 Array
81
5.18 NIOS System Module design flow 82
5.19 Manhattan Distance Computation Within a Row
of the 10x10 Array
83
5.20 Countdown Operation 84
5.21 Signal Assertion During A Countdown
Operation
84
5.22 Learning Neighbourhood Lineation Using iRow
& iCol Signals
86
5.23 Weight Adaptation Computation Within Row 4
of the 10x10 Array
86
5.24 Functional Block Diagram of the PE 87
5.25 Instruction Passing Datapath & Data Passing
Datapath
88
5.26 Pipelined Datapath For Computation of
Manhattan Distance
89
5.27 Pipelined Computation of Manhattan Distance
in the PE
90
5.28 Countdown Function
91
xix
5.29 Pipelined Datapath For Weight Adaptation
Computation
92
5.30 Pipelined Execution of the Weight Update
Computation in the PE
93
6.1 Functional Block Diagram of Avalon Bus
Interface Module
96
6.2 Functional Block Diagram of Avalon Bus
Interface Module
97
6.3 Top-Level Block Diagram of UTM-
Neuroprocessor
99
6.4 UTM-Neuroprocessor Hardware Development
Flow
100
6.5 User Logic Interface Wizard 101
6.6 SOPC Builder’s System Contents Window 102
6.7 Main Window of the GUI 105
6.8 Software Code Declaring the Address Pointers 106
6.9 Software Code Demonstrating Initiation of the
Recall Instruction
107
7.1 Combined FPGA-VLSI Implementation of
UTM-Neuroprocessor
111
7.2 Top-Level Block Diagram of 2x2 Array
Computation Engine
112
7.3 Bottom-Up Synthesis Flow of Array_2x2
Design
114
7.4 Floorplan of Array_2x2 Microchip 116
7.5 Pin Orientation of PE Block & I/O Port
Positioning for TM Block
117
7.6 Routing Through the Blocks & Routing Around
the Blocks
118
7.7 Power Distribution Network for Array_2x2
Microchip
119
7.8 PE for Column 0 121
7.9 PE for Column 1 122
xx
7.10 TM Block 123
7.11 Layout Design of I/O Padframe 126
7.12 I/O Filling for Pad Cells 127
7.13 Full Chip Design Workflow 128
7.14 Full Chip VLSI Layout of Array_2x2
Microchip
129
7.15 Physical Verification of Array_2x2 VLSI
Design
130
7.16 Wire Bonding Diagram of Array_2x2
Microchip
131
7.17 Pin Layout of PGA-84 Package 132
7.18 Fabricated and Packaged Array_2x2 Microchip 134
8.1 Graphical Visualization of the Trained Kohonen
Network for the Iris Plants Dataset
138
8.2 Graphical Visualization of the trained Kohonen
Network for the WBCD Dataset
140
8.3 Graphical Visualization of the trained Kohonen
Network for the HIV Blood Dataset
142
xxi
LIST OF ABBREVIATIONS
ASIC – Application Specific Integrated Circuit
CAD – Computer Aided Design
CMOS – Complementary Metal Oxide Semiconductor
CPU – Central Processing Unit
DMA – Direct Memory Access
EDA – Electronic Design Automation
FPGA – Field Programmable Gate Array
FSM – Finite State Machine
GUI – Graphical User Interface
HDL – Hardware Description Language
I/O – Input / Output
IP – Intellectual Property
LE – Logic Element
MCPS – Millions of Connections Per Second
MCUPS – Millions of Connections Update Per Second
NN – Neural Network
PC – Personal Computer
RAM – Random Access Memory
SIMD – Single Instruction Multiple Data
SoC – System On Chip
SOM – Self-Organizing Map
SRAM – Static Random Access Memory
VB – Visual Basics
VHDL – VHSIC Hardware Description Language
VHSIC – Very High Speed Integrated Circuit
VLSI – Very High Scale Integrated Circuit
xxii
LIST OF APPENDICES
APPENDIX TITLE PAGE
A VHDL Source Code of the Neuro Co-Processor 163
B VHDL Source Code of the 2x2 Array
Computation Engine
167
C C Source Code of the PCR Embedded Software
Module
173
D Visual Basics Source Code Of The GUI
Program
185
E Matlab Script For Graphical Visualization Of
Kohonen Network
199
F Pattern Recognition Datasets 201
CHAPTER 1
INTRODUCTION
This thesis proposes the VLSI design and implementation of a neural network
hardware, or neurohardware for pattern recognition. The aim is to produce a
neurohardware that executes the Kohonen Neural Network at massive parallelism for
high-speed pattern classification and serves as a comprehensive computing platform
for pattern recognition applications. In this first chapter, the background of the
domain is discussed, providing the rationale and focus points behind the research.
1. 1 Background and Motivation
Although conventional logic based computing has been successful in many
applications, it has not been effective in solving a variety of critical and complex
problems. At the same time, these perplexing problems are solved trivially and
routinely in real-time by human beings. It is this intriguing predicament that has led
to the study of information processing by the human brain and subsequently the
emergence of artificial neural networks. Artificial neural networks, or simply neural
networks, attempt to mimic the computational power of the mammalian brain by
massively interconnecting very simple computational units called neurons (Misra
1997). The adoption of this similar design philosophy provides neural networks with
the ability to learn and solve tasks challenging to conventional computing.
2
Neural networks have found a wide range of applications, with the majority
associated with the pattern recognition domain. The pattern classification attribute of
neural networks have been instrumental in the following examples of pattern
recognition applications; predictive and preventive maintenance, condition
monitoring, character recognition, speech synthesis, intelligence based medical
diagnosis and intrusion detection and predictive penetration services in computer
networks.
As artificial neural networks gain popularity for pattern recognition in a
variety of application domains, it is critical that these models are able to be executed
speedily and generate results in real-time (Lindsey 1998). Although a number of
implementations of neural networks are available on conventional general purpose
machines, most of these implementations require an inordinate amount of time to
train or run neural networks and are not able to provide real-time response, especially
when the network sizes are large. Large network sizes are often required in real
world applications and execution performances by these machines are simply
unacceptable. This drawback is apparently due to the fact that general purpose
computers are traditionally based on the von-Neumann architecture which is
sequential in nature (Schoenauer 1998). Artificial neural networks on the other hand
have a parallel structure by conception.
The most obvious solution to this problem is to accelerate the execution
artificial neural network algorithms is through simulation on dedicated parallel
hardware. The massively parallel and distributed processing brand of neural
networks suggest massively parallel hardware as an obvious implementation choice
to obtain appropriate algorithm-architecture matching and high execution speeds.
When implemented in parallel hardware, neural networks can take full advantage of
their inherent parallelism and run in orders of magnitude faster than software
simulations on sequential machines. Parallel processing with multiple simple
processing elements working together can provide tremendous speed-ups in neural
network and produce real-time responses and fast learning phases.
Consequently, a new breed of hardware, termed neurohardware, have
emerged. Neurohardware is typically defined as dedicated hardware designed to
3
implement neural algorithms and take full advantage of the inherent parallelism in
neural networks through parallel processing. At present, a wide spectrum of
neurohardware implementations that primarily differ in terms of performance-space
compromise, degree of parallelism and system architecture approaches are available.
However, with the continuous burgeoning of neural paradigms and increasing
applicability of neural networks in real-time based applications, there is a great
demand and market for massively parallel and dedicated neurohardware.
1.2 Problem Statement
Neurohardware providing parallel execution platforms for neural networks
can adopt two different architectural directions in doing so; general-purpose
architectures that emulate a wide range of neural network models, and special-
purpose architectures dedicated to a specific neural paradigm (Ruckert 2002).
Dedicated implementations are able to be built at a low hardware cost to execute the
algorithm more quickly and efficiently compared to general-purpose architectures.
The speed achieved by special-purpose architectures is unattainable by general-
purpose architectures (Liao 2001). Therefore, special-purpose architectures would be
viable for neurohardware targeting high-speed execution of specific neural
paradigms.
Neural network can be effectively grouped into three categories that are
distinguishable by their learning approaches; supervised, reinforcement and
unsupervised (Cheang 2003). Unsupervised learning possesses a number of
advantages over the other types of learning, which includes faster training and
execution for large networks. This brand of networks would be suitable for pattern
recognition problems, given their ability to detect structures and relations in data that
are not so apparent. One such neural paradigm that has been successful in pattern
classification and recognition applications is the Kohonen neural network (NN)
algorithm. The Kohonen NN is a proven algorithm and could be easily mapped onto
hardware than other neural paradigms (Glesner and Pochmuller 1994).
4
In developing the ASIC implementation of neurohardware, two main stream
technologies can be considered. FPGAs have evolved tremendously under the current
advancements of VLSI process technologies. The flexibility and reconfigurability of
FPGAs advantageously support parameterized designs and rapid prototyping of
advanced hardware architectures. VLSI implementations on the other hand are able
to guarantee higher compaction and speed for hardware designs, compared to
FPGAs. However, both technologies can be jointly utilized and advantageously
exploited for implementation of neurohardware.
1.3 Objectives
From the discussion in the preceding sections, the objectives of the work
presented in this thesis are as follows:
1) To design a neurohardware that provides high-speed pattern classification.
The Kohonen NN algorithm is to be implemented, in a massively parallel and
dedicated manner, to deliver high-speed pattern classification.
2) To design a neurohardware that serves as a comprehensive computing
platform for pattern recognition applications, based on the Kohonen NN
algorithm.
3) To propose a viable VLSI implementation approach for the designed
neurohardware and to develop a prototype for demonstration of real-world
pattern recognition applications.
5
1.4 Scope of Work
Neuroprocessor
GUI
Nios
Standard
Peripherals
AvalonBus
InterfaceController
Av
alo
n B
us
Nios CPU
Nios LibraryModule
PCRModule
UART
Host PC
Array Computation Engine
Neuro Co-processor
Neuro Core
Figure 1.1 : Top-Level Block Diagram of UTM-Neuroprocessor
Based on the outlined objectives, the neurohardware design illustrated by
Figure 1.1, the UTM-Neuroprocessor, is proposed in this work. The scope of work
involved in producing the proposed neurohardware is as follows:
1) Comprehensive study of the Kohonen NN algorithm and determining
necessary algorithmic modifications for efficient hardware implementation of
the algorithm.
2) Design and FPGA prototyping of the UTM-Neuroprocessor, using the Altera
Nios embedded system development kit. The Neuro co-processor that
executes the Kohonen NN at massive parallelism is designed using VHDL.
3) Development of the software components of the UTM-Neuroprocessor. The
embedded software component, the PCR module, is developed in C while the
GUI program is developed using Visual Basics.
6
4) VLSI implementation and fabrication of the array computation engine in the
AMI 0.5 µm process technology, to realize implementation of the UTM-
Neuroprocessor in the proposed combined FPGA-VLSI approach.
5) Demonstration of real-world pattern recognition datasets on the UTM-
Neuroprocessor. Datasets from selected application domains are used to
verify the classification speed and viability of the neuroprocessor as a
computing platform for pattern recognition applications.
1.5 Research Contributions
1) A systematic study and modification of the Kohonen NN algorithm for
efficient hardware implementation.
2) A comprehensive design and prototyping flow for FPGA-based embedded
systems using the Altera Nios development kit.
3) A viable ASIC design methodology for the UTM-ECAD VLSI Laboratory,
based on the AMI 0.5µm process technology. The design methodology
incorporates industry standard EDA tools and is applicable from design entry
stages to tape-out.
1.6 Organization of Thesis
The work in this thesis is presented conveniently over eight chapters. The
first chapter outlines the motivations and objectives of the thesis and subsequently
presents the scope of work involved in meeting the research objectives.
7
The second chapter provides brief summaries of the literature reviewed prior
to engaging in the mentioned scope of work. Review of literature on previously
attempted efforts assists in achieving the research objectives.
Chapter three presents the VLSI design methods and tools adopted in
producing the FPGA prototyping and ASIC implementation of the neurohardware in
this work.
Chapter four provides a detailed discussion on the implemented Kohonen NN
algorithm and outlines the necessary algorithmic modifications. Based on the
modifications, the architectural and design specifications of the neurohardware is
ascertained.
Chapter five delivers a description of the top-level architecture and behaviour
of the UTM-Neuroprocessor. Subsequently, the focus is shifted to the design of the
hardware module which implements the Kohonen NN algorithm at massive
parallelism for the neuroprocessor, is detailed elaborately in the chapter.
Chapter six dwells into the embedded system design and prototyping of the
UTM-Neuroprocessor using the Altera Nios development kit. The software
components of the neuroprocessor are also discussed in the chapter.
Chapter seven focuses on the VLSI implementation of the array computation
engine, for implementation of the UTM-Neuroprocessor in the combined FPGA-
VLSI approach. The chapter also presents ASIC design and fabrication of a
prototype design of the computation engine, termed the Array_2x2 microchip, in the
AMI 0.5 µm process technology.
Chapter eight provides details the application demonstration work on the
UTM-Neuroprocessor, using real-world pattern recognition datasets. Performance
evaluation and benchmarking of the neuroprocessor against previous works are also
reported by the chapter.
8
In the final chapter of the thesis, the research work is summarized and
deliverables of the research are stated. Potential extensions and improvements to the
design are also given.
1.7 Summary
In this chapter, a brief introduction was given to the background and
motivation. The need for neurohardware that executed neural networks at massive
parallelism for high-speed pattern classification was identified. Correspondingly,
several objectives were outlined to meet this need in the research. The UTM-
Neuroprocessor was proposed to fulfill the objectives of the research. The UTM-
Neuroprocessor executes the Kohonen Neural Network at massive parallelism for
high-speed pattern classification and serves as a comprehensive computing platform
for pattern recognition applications. The following chapter will discuss some
literature relevant to producing the proposed neurohardware and covers previous
works accomplished on the design of neurohardware for the similar objectives.
REFERENCES
Ahmad, R., Bambang, S.S., Rahman, W. and Rais, A. (1999). “Development of
Optimized Digital CMOS Standard Cell Library”. Proceeding of World Engineering
Congress (WEC99). 159-161.
Altera Corporation. (2003a). “Nios Hardware Development Tutorial”. Altera
Corporation.
Altera Corporation. (2003b). “Introduction to Quartus II”. Altera Corporation.
Altera Corporation. (2003c). “Sopc Builder Data Sheet”. Altera Corporation.
Altera Corporation. (2003d). “Nios Software Development Tutorial”. Altera
Corporation.
Altera Corporation (2003e). “Nios Development Board: Reference Manual, Stratic
Professional Edition”. Altera Corporation.
Altera Corporation. (2003f). “Stratix Device Handbook: Volume 1”. Altera
Corporation.
Asral B.J, Ahmad, R., Rais, A. (1999). “Standard Cell Library Development.”
Proceeding of IEEE International Conference on Microelectronics (ICM). 161-163.
Beale, R. and Jackson, T. (1990). “Neural Computing: An Introduction”. Adam
Hilger, IOP Publishing Ltd.
158
Brown, S., and Vranesic, Z. (2000). “Fundamentals of Digital Logic with VHDL
Design”. Singapore: McGraw_Hill.
Burr, J.B. (1992). “Digital Parallel Implementations of Neural Networks”. Prentice
Hall. 223-281.
Carpinelli, J. D. (2001). “Computer Systems, Organization & Architecture”. USA:
Addison Wesley.
Cheang, CH. (2003). “A Digital Neurohardware Implementation of Kohonen Neural
Network for Pattern Recognition”. Universiti Teknologi Malaysia.
Fausett, L. (1994). “Fundamentals of Neural Networks”. Prentice-Hall Inc.
Fischer, T., Eppler, W., Gemmeke, H., Kock, G. and Becher, T. (1997). “The SAND
Neurochip and its Embedding in the MiND System”. 7th
International Conference on
Artificial Neural Networks.
Fisher, R.A. (1936). "The Use of Multiple Measurements in Taxonomic Problems".
Annual Eugenics, 7, Part II, 179-188.
Geus, A. J. (1989). “Logic synthesis speeds ASIC Design”. IEEE Spectrum. August
1989. 27-31.
Glesner, M. and Pochmuller, W. (1994). “Neurocomputers – An overview of neural
networks in VLSI”. Chapman and Hall.
Haykin, S. (1994). “Neural Networks: A Comprehensive Foundation”. Macmillan
College Publishing Company.
Honkela, T., Kaski, S., Lagus, K. and Kohonen, T. (1997). “Websom --- self-
organizing maps of document collections”. Workshop on Self-Organizing Maps.
159
Ienne, P. (1995). “Digital Hardware Architectures for Neural Networks”. SPEEDUP
Journal, Vol. 9, No. 1.
Kohonen, T. (1995). “Self-Organizing Maps”. Springer-Verlag.
Kohonen, T. (1988). “The neural phonetic typewriter”. Computer.
Kohonen, T., Torkkola, K., Shozakai, M.,.Kangas, J. and Venta, O. (1988).
“Phonetic Typewriter for Finnish and Japanese”. Proceedings of the IEEE 1988
International Conference on Acoustics, Speech and Signal Processing.
Karnik, T. (2000). “Microprocessor Layout Method”. in Chen Wai-Kai. “The VLSI
Handbook”. USA: CRC Press. 62.1-62.28.
Kurup, P. and Abbasi, T. (1997). “Logic Synthesis Using Synopsys”. USA: Kluwer
Academic Publishers.
Lindsey, Clark S. (1998). “Neural Networks in Hardware: Architectures, Products
and Applications”. www.particle.kth.se/~lindsey.
Manavendra Misra (1997). “Parallel Environments for Implementing Neural
Networks”. Neural Computing Surveys. Vol 1. 48-60.
Martin-Del-Brio, B., Medrano-Marques, N. and Hernandez-Sanchez, S. “A Low-
Cost Neuroprocessor Board for Emulating the SOFM Neural Model”. 5th IEEE
International Conference on Electronics, Circuits and Systems.
Melton, M.S., Tan Phan, Reeves, D.S. and Van den Bout, D.E. (1992). “The
TInMANN VLSI Chip”. IEEE Transactions on Neural Networks.
Moerland, P.D. and Fiesler, E. (1996). “Hardware-Friendly Learning Algorithms for
Neural Networks: An Overview”. Fifth International Conference on Microelectronics
for Neural Networks and Fuzzy Systems.
160
Moerland, P. and Fiesler, E. (1997). “Neural Network Adaptations to Hardware
Implementations”. Handbook of Neural Computation E1.2. 1-13.
Mohamed Khalil and Koay, K.H. (1999). “VHDL Module Generator: A Rapid-
prototyping Design Entry Tool for Digital ASICs”. Jurnal Teknologi.
Muroga, S. (2000). “Cell-Library Design Approach”, in Chen Wai-Kai. “The VLSI
Handbook”. USA: CRC Press. 45.1-45.3.
Murphy, P.M. and Aha, D.W. (1994). “UCI Repository of machine learning
databases”. University of California, Department of Information and Computer
Science. (http://www.ics.uci.edu/~mlearn/MLRepository.html).
Patterson, D.W. (1996). “Artificial Neural Networks: Theory and Applications”.
Prentice Hall.
Peter van der Putten. (1996). “Utilizing the Topology Preserving Property of Self-
Organizing Maps for Classification”. Utrecht University: Masters thesis.
Rabaey, J. (2003). “Digital Integrated Circuits: A Design Perspective”. Usa:
Prentice-Hall. 2nd
Edition.
Ruping, S., Ruckert, U and Goser, K. (1993). “Hardware Design For SOFM with
Binary Input Vectors”. “New Trends in Neural Computation, Lecture Notes in
Computer Science”. Springer Verlag, Berlin.
Ruping, S., Ruckert, U and Goser, K. (1994). “A Chip for Self Organizing Feature
Maps”. Fourth International Conference on Microelectronics for Neural Networks
and Fuzzy Systems.
Ruping, S. and Ruckert U. (1996). “A Scalable Processor Array for SOFM”. Fifth
International Conference on Microelectronics for Neural Networks and Fuzzy
Systems.
161
Ruping, S., Porrmann, M. and Ruckert, U. (1997). “A High Performance SOFM
Hardware System”. International Work-Conference on Artificial and Natural Neural
Networks.
Ruping, S., Porrmann, M. and Ruckert, U. (1999) “SOM Hardware with
Acceleration Module for Graphical Representation of the Learning Process”. Seventh
International Conference on Microelectronics for Neural, Fuzzy and Bio-Inspired
Systems.
Salapura, V. (1994). “Neural Networks using bit stream arithmetic: A space efficient
implementation”. IEEE International Symposium on Circuits and Systems, London.
Schoenauer, T. et al. (1998). “Digital Neurohardware: Principles and perspectives”.
Neuronal Networks in Applications.
Schoenauer, T., Jahnke, A., Roth, U. and Klar, H. (1998). “Digital Neurohardware:
Principles and Perspectives”. Neural Networks in Applications.
Skapura, D.M. (1996). “Building Neural Networks”. ACM Press.
Speckmann, H., Thole, P. and Rosenstiel, W. (1992). “Hardware Implementation of
Kohonen's Self Organising Feature Map”. International Conference of Neural
Networks.
Speckmann, H., Thole, P. and Rosenstiel, W. (1993). “Hardware Synthesis for
Neural Networks from a Behavioral Description With VHDL”. International Joint
Conference on Neural Networks.
Silvaco International (1998). “Cell Characterization with .Modif Statement in
SmartSpice”. Simulation Standard. Volume 9. 12-13.
Synopsys Online Documentation. (2000a). “Design Compiler User Guide”. Synopsys
Inc., USA.
162
Synopsys Online Documentation. (2000b). “Design Compiler Tutorial”. Synopsys
Inc., USA.
Synopsys Online Documentation. (2000c). “Library Compiler User Guide”.
Synopsys Inc., USA.
Tanner Research, Inc. (2001a). “L-Edit User Guide”. Tanner Research Inc., USA.
Tanner Research, Inc. (2001b). “T-Spice Pro User Guide”. Tanner Research Inc.,
USA.
Vesanto, J. (2000). “Neural Network Tool for Data Mining: SOM Toolbox”.
Symposium for Tool Environments and Development Methods for Intelligent
Systems.
Visa, A., Iivarinen, J., Valkealahti, K. and Simula, O. (1995). “Neural Network
Based Classifier”. International Conference on Artificial Neural Networks.
Weste, N.H.E. and Eshraghian, K. (1992). “Principles of CMOS VLSI Design – A
Systems Perspective (Second Edition)”. Addison-Wesley Publishing Company.
Wolberg, W. H. and Mangasarian, O. L. (1990) "Cancer diagnosis via linear
programming". SIAM News, Volume 23, Number 5.
Xiang Fang, Thole, P., Goppert, J. and Rosenstiel, W. (1996). “A Hardware
Supported System for A Special Online Application of Self-Organising Maps”.
International Conference on Neural Networks.
Yihua Liao. (2001). “Neural Networks in Hardware: A Survey”. Department of
Computer Science, University of California: ECS250A Project.