High Performance Computing on SpiNNaker Neuromorphic Platform: a Case Study for Energy Efficient Image Processing
Indar Sugiarto, Gengting Liu, Simon Davidson, Luis A. Plana and Steve B. Furber
School of Computer Science, University of Manchester
Oxford Road, Manchester, United Kingdom, M13 9PL
Email: [indar.sugiarto, gengting.liu, simon.davidson, luis.plana, steve.furber]@manchester.ac.uk
Abstract: This paper presents an efficient strategy to implement parallel and distributed computing for image processing on a neuromorphic platform. We use SpiNNaker, a many-core neuromorphic platform inspired by neural connectivity in the brain, to achieve fast response and low power consumption. Our proposed method is based on fault-tolerant fine-grained parallelism that uses SpiNNaker resources optimally for process pipelining and decoupling. We demonstrate that our method can achieve a performance of up to 49.7 MP/J (megapixels per joule) for the Sobel edge detector, and can process 1600 x 1200 pixel images at 697 fps. Using a simulated Canny edge detector, our method can achieve a performance of up to 21.4 MP/J. Moreover, the framework can be extended further by using larger SpiNNaker machines. This will be very useful for applications such as energy-aware and time-critical mission robotics, as well as very high resolution computer vision systems.
1. Introduction
In the last decade there has been growing interest in bringing green technology into the high-performance computing (HPC) domain. The Top500 Project report shows that supercomputer performance is approximately doubling every year, whilst power consumption is also rising. The energy efficiency of those supercomputers has increased, but at a slower rate than performance. On the other hand, the emergence of neuromorphic technology, i.e., computing platforms inspired by the brain, offers a new paradigm of computation. This technology differs from conventional computer technology not only in its architecture, but also in that it offers the attractive feature of lower power consumption.
SpiNNaker (Spiking Neural Network Architecture) is a neuromorphic computing system whose design was motivated by an appreciation of the remarkable capabilities of the brain. No human-made computing technology can match the human brain in terms of combined communication and computing power. Even Sunway TaihuLight, currently the fastest supercomputer in the world, has an inferior performance when measured using the TEPS (Traversed Edges Per Second) metric. SpiNNaker, in contrast, aims to provide relatively less computational power but highly efficient interconnectivity, similar to the brain itself.
Although initially intended for neuromorphic applications, SpiNNaker has also attracted attention from fields such as robotics, due to its low power consumption. The SpiNNaker chip is designed around the ARM968 processor, the primary market for which is low-power embedded microcontroller applications. A SpiNNaker chip is able to deliver 3600 MOPS (million operations per second) at only 1 W in UMC 130 nm technology. However, the SpiNNaker chip was designed to be optimal for spiking neural network simulation, with little regard for applications outside of this limited space. Consequently, common features that one might expect to find on a general-purpose CPU, such as floating-point hardware and memory management, are not present. Despite this lack of focus outside of the neural space, the unusual communications fabric and power efficiency of SpiNNaker suggest that it could be an interesting platform on which to evaluate other classes of algorithm. For example, we foresee potential applications of SpiNNaker in computer vision and robotics.
To begin our exploration in this field, in this paper we propose an energy-efficient, high-performance approach to image processing, and evaluate how SpiNNaker performs in this domain. As exemplars that address different aspects of image processing, we demonstrate three algorithms: the Sobel edge detector, image smoothing using Gaussian filtering, and image sharpening using histogram equalization.
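To make the first of these exemplars concrete, the following is a minimal integer-only sketch of the Sobel operator in plain Python. It is illustrative only and is not the SpiNNaker implementation evaluated in this paper; the function name `sobel_magnitude`, the nested-list image format, and the |Gx| + |Gy| approximation of the gradient magnitude are assumptions chosen for brevity.

```python
# Illustrative integer-only Sobel edge detector (not the paper's SpiNNaker code).
# The image is a list of rows of 8-bit grey levels; border pixels are left at 0.

SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1),
           (0, 0, 0),
           (1, 2, 1))


def sobel_magnitude(image):
    """Return the gradient magnitude, approximated as |Gx| + |Gy|, clipped to 255."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            # Accumulate both 3x3 convolutions over the pixel neighbourhood.
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            out[y][x] = min(255, abs(gx) + abs(gy))
    return out


# A vertical step edge: dark left half, bright right half.
step = [[0, 0, 255, 255] for _ in range(4)]
edges = sobel_magnitude(step)
```

Because every output pixel depends only on its 3 x 3 neighbourhood, the loops over `y` and `x` can be split across many workers without coordination, which is the property that makes this kind of filter a natural fit for fine-grained parallelism.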
Our contributions can be summarized as follows:
1) We propose an efficient implementation of fundamental image processing algorithms on a neuromorphic computing platform.
2) We evaluate the efficiency of a scalable, parallel and distributed algorithm running on SpiNNaker.
3) We provide a new benchmark for performance evaluation of many-core neuromorphic platforms.
The rest of this paper is structured as follows. In Section 2, the SpiNNaker architecture and communication network are introduced. In Section 3, we describe our novel method to achieve scalable and efficient image processing using SpiNNaker. Section 4 presents our evaluation of the proposed method. The paper closes with the Conclusions section.
978-1-5090-5252-3/16/$31.00 ©2016 IEEE
2. SpiNNaker Neuromorphic Platform
SpiNNaker is a distributed computing system designed originally to simulate millions of neurons in a spiking neural network model. As a neurally-inspired computing system, it offers opportunities to explore new principles of massively parallel computation which cannot easily be performed by traditional supercomputers.
2.1. SpiNNaker Chips and Machines
The main element of SpiNNaker machines is the SpiNNaker chip. The SpiNNaker chip is a multicore system-on-chip that comprises 18 ARM968 processor cores. Inside the chip, those cores are surrounded by a light-weight, packet-switched asynchronous communications infrastructure that makes the SpiNNaker machine a globally asynchronous, locally synchronous (GALS) system. Each chip also has 128 MByte of SDRAM (Synchronous Dynamic Random Access Memory), which is physically mounted on top of the SpiNNaker die.
The ARM cores on the SpiNNaker chip do not have a floating-point unit (FPU), but they can use emulated floating-point operations provided by compilers such as GCC (the GNU C compiler). Running such emulated operations on a core using GCC yields a performance of up to 9.4 MFLOPS (Million Floating-point Operations Per Second) at a clock frequency of 200 MHz. Higher performance can be achieved, theoretically up to 100 MFLOPS, by taking advantage of the special features of the ARM architecture. The work by Iordache gives a good example of how to achieve high-performance emulated floating-point operation on an integer processor without an FPU.
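Where software floating point is too slow, a widely used alternative on cores without an FPU is fixed-point arithmetic, which needs only integer instructions. The sketch below illustrates the idea in Python with a Q16.16 format. The format, the helper names, and the 3-tap smoothing example are assumptions made for illustration; they are not taken from this paper or from the work by Iordache.

```python
# Illustrative Q16.16 fixed-point helpers (assumed format, not from the paper).
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the value 1.0 in Q16.16


def to_fixed(value):
    """Convert a Python number to Q16.16, rounding to nearest."""
    return int(round(value * ONE))


def fixed_mul(a, b):
    """Multiply two Q16.16 numbers using only an integer multiply and a shift.

    On a 32-bit core the intermediate product needs 64 bits.
    """
    return (a * b) >> FRAC_BITS


def to_float(value):
    """Convert Q16.16 back to a float (for inspection only)."""
    return value / ONE


# Weighted sum of three pixels with a 1-2-1 smoothing kernel scaled to sum to 1.
weights = [to_fixed(0.25), to_fixed(0.5), to_fixed(0.25)]
pixels = [to_fixed(10), to_fixed(20), to_fixed(40)]
smoothed = sum(fixed_mul(w, p) for w, p in zip(weights, pixels))
```

In this example every weight is a power of two, so the result is exact; with general weights the shift discards low-order bits and introduces a small rounding error.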
The chip does not employ cache coherence. Instead, each core incorporates two tightly-coupled static memories (SRAM) in a Harvard architecture. As is common in ARM processors, SpiNNaker cores can run in ARM or THUMB mode. Programs written in ARM mode are slightly faster than their THUMB counterparts; however, programs written in THUMB mode have higher code density. Table 1 shows the SpiNNaker ARM core performance benchmarked using Dhrystone, measured in DMIPS (Dhrystone million instructions per second).
TABLE 1. THE SPINNAKER DHRYSTONE BENCHMARK AT 200 MHZ.

Mode     Unoptimized    Optimized
THUMB    52.7 DMIPS     121.1 DMIPS
ARM      57.5 DMIPS     138.8 DMIPS
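The figures in Table 1 can be normalised by clock frequency to give DMIPS/MHz, a common way to compare cores that run at different clock rates. The short Python calculation below does this using only the values in Table 1; the variable names are illustrative.

```python
# Values copied from Table 1 (DMIPS at 200 MHz).
CLOCK_MHZ = 200
dmips = {
    ("THUMB", "unoptimized"): 52.7,
    ("THUMB", "optimized"): 121.1,
    ("ARM", "unoptimized"): 57.5,
    ("ARM", "optimized"): 138.8,
}

# Normalise each score by the clock frequency.
dmips_per_mhz = {key: score / CLOCK_MHZ for key, score in dmips.items()}

# Relative advantage of ARM mode over THUMB mode for optimized code.
arm_over_thumb = dmips[("ARM", "optimized")] / dmips[("THUMB", "optimized")]

for (mode, build), value in dmips_per_mhz.items():
    print(f"{mode:5s} {build:11s} {value:.3f} DMIPS/MHz")
print(f"ARM/THUMB (optimized): {arm_over_thumb:.2f}x")
```

With these inputs, optimized ARM code reaches about 0.69 DMIPS/MHz and is roughly 15% faster than optimized THUMB code.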
As a general-purpose computing engine, SpiNNaker is a machine with a large number of homogeneous processing elements. In this paper, we use the term node to refer to a single SpiNNaker chip. The actual number of working cores may differ from chip to chip due to defects introduced during the chip fabrication process. During the boot process, any malfunctioning core is excluded from the list of available cores. This is part of the fault tolerance mechanisms in SpiNNaker, which operate at several levels to ensure reliability against system failure. In our work, we also applied an additional mechanism so that malfunctioning cores can be detected at run time by monitoring their activity. This monitoring task is assigned to the leading core in each node, and faulty cores are excluded during workload distribution to ensure the integrity of the image processing algorithm.
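The effect of excluding faulty cores during workload distribution can be sketched as follows. This is a minimal Python illustration, assuming that the image is split into contiguous blocks of rows and that the leading core holds a set of cores currently believed to be healthy. The function `distribute_rows`, its arguments, and the core numbering in the example are hypothetical and do not reproduce the actual SpiNNaker implementation.

```python
def distribute_rows(image_height, healthy_cores):
    """Split image rows as evenly as possible among the healthy cores.

    Returns a dict mapping core id -> (first_row, last_row_exclusive).
    """
    if not healthy_cores:
        raise ValueError("no healthy cores available")
    base, extra = divmod(image_height, len(healthy_cores))
    assignment = {}
    start = 0
    for index, core in enumerate(sorted(healthy_cores)):
        # The first `extra` cores take one additional row each.
        count = base + (1 if index < extra else 0)
        assignment[core] = (start, start + count)
        start += count
    return assignment


# Hypothetical node: cores 1..17 are workers (core 0 is assumed to be the
# leading core), and cores 5 and 11 have been flagged as faulty at run time.
workers = set(range(1, 18)) - {5, 11}
plan = distribute_rows(1200, workers)
```

Recomputing the plan whenever the set of healthy cores changes keeps every image row assigned to exactly one working core, at the cost of a slightly larger share for each survivor.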
The SpiNNaker architecture is scalable and SpiNNaker machines are classified by the number of processor cores. Table 2 shows the nomenclature used for SpiNNaker machines, where the 10^x machine has approximately 10^x processor cores. Currently, the largest machine in operation consists of five 10^5 machines and contains 518,400 cores. Fig. 1 shows the SpiNNaker 10^3 machine used in this paper.
TABLE 2. SUMMARY OF SPINNAKER MACHINE NAMING CONVENTION.

Name               Features
10^3 machine *)    A 48-node board (864 ARM cores) with two 100 Mbps Ethernet ports, six 3.1 Gbps serial transceivers, and one SpiNN-link port. It requires a 12 V, 6 A supply.
10^4 machine       A single frame incorporating 24 pieces of 10^3 machine. It has 10,368 ARM processor cores and consumes approx. 1 kW of power.
10^5 machine       A 19-inch rack cabinet incorporating 5 frames of 10^4 machine. It has 103,680 ARM processor cores, and requires a 10 kW (approx.) power supply.
10^6 machine +)    It comprises 10 cabinets (each a 10^5 machine). It will have 1,036,800 ARM processor cores, and will require a 100 kW (approx.) power supply.

*) used in this paper    +) under construction
Figure 1. SpiNNaker 10^3 machine, also known as a SpiNN-5 board.
2.2. SpiNNaker Communication Network
SpiNNaker machines are networks of SpiNNaker chips. A SpiNNaker chip has six bidirectional, inter-chip links that allow the creation of networks with efficient topologies, such as the preferred 2D-torus interconnect. The key component of the SpiNNaker communication infrastructure is its packet-switched network that can distribute short packets in an energy-efficient manner. Packet routing is managed by the asynchronous Network-on-Chip (NoC), which extends seamlessly to the inter-chip links.
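As an illustration of how six links per chip map onto a 2D torus, the Python sketch below computes the neighbours of a chip at coordinates (x, y) on a grid with wrap-around in both dimensions. The link names and coordinate offsets (east, north-east, north, west, south-west, south) are stated here as an assumption about the link layout, as are the function name `torus_neighbours` and the 8 x 8 grid size.

```python
# Assumed link directions and (dx, dy) offsets for a six-link chip.
LINK_OFFSETS = {
    "E": (1, 0),
    "NE": (1, 1),
    "N": (0, 1),
    "W": (-1, 0),
    "SW": (-1, -1),
    "S": (0, -1),
}


def torus_neighbours(x, y, width, height):
    """Return {link: (nx, ny)} for chip (x, y) on a width x height torus."""
    return {
        link: ((x + dx) % width, (y + dy) % height)
        for link, (dx, dy) in LINK_OFFSETS.items()
    }


# Hypothetical 8 x 8 torus: the corner chip wraps around to the far edges.
neighbours = torus_neighbours(0, 0, 8, 8)
```

The modulo operation is what closes the grid into a torus: every chip, including those on the edges, has exactly six neighbours.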
Each chip contains a bespoke multicast router with a configurable 3-state, CAM-based look-up table which can send a copy of a packet to any subset of the 18 on-chip cores and the 6 external links c