4-4: a programmable and configurable mixed-mode … programmable and configurable mixed-mode fpaa...

4
A Programmable and Configurable Mixed-Mode FPAA SoC Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins, Richard Wunderlich, Stephen Nease, and Shubha Ramakrishnan Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, GA, USA Abstract: The authors present a Floating-Gate based, System-On-Chip large-scale Field-Programmable Analog Array IC that integrates divergent concepts from previous designs along with low-power digital computation including a 16bit open-source MSP430 microprocessor, and resulting interface circuitry such as DACs and ADCs. We describe the SoC FPAA architecture, experimental results of a range of compiled circuit and systems. Keywords: FPAAs; Floating-Gate, SoC, Command Word Classification This paper presents a Floating-Gate (FG) based, System- On-Chip (SoC) large-scale Field-Programmable Analog Array (FPAA) FPAA IC that integrates divergent concepts from previous multiple FPAA designs [1,2] along with low-power digital computation including a 16bit open- source MSP430 microprocessor (uP), and resulting interface circuitry such as DACs and ADCs. The unified structure enables a wide range of system-on-a-chip computing options that can be optimized over a range of metrics, such as power or energy consumed. The resulting IC architecture is the most sophisticated FPAA device built to date. This paper shows an analog command-word recognition system to illustrate the system computation, demonstrates experimentally the first embedded classifier structure compiled onto a single FPAA device. This FPAA SoC enables both high parameter density (number of programmable elements / area / normalized to process) as well as high accessibility of each of the resulting computations due to its advanced data flow handling; the next closest design optimizing both parameters is nearly a 600,000 factor less than this SoC FPAA. This IC was fabricated in a 350nm CMOS process; such approaches have recently been shown to be possible in scaled down IC processes [3]. The FPAA SoC die measures 12mm x 7mm, fabricated in a 350nm standard CMOS process. Figure 1 plots various FPAA devices showing the percentage of control path implemented versus analog parameter density. Figure 1 shows two key metrics for FPAA approaches based on a collection of published FPAA devices. We define analog parameter density as the number of programmable parameters per mm 2 , normalized to a 1μm CMOS node. Analog parameter density determines critically the IC computation complexity, particularly when using routing as computation. Figure 1 shows FG based FPAAs enable 1000 improvement in parameter density, providing orders of magnitude potential computation on a single device; alternatives to FG devices require a DAC at every node or similar dynamic techniques. Our second metric is to describe the amount of control flow (mostly digital) relative to the amount of analog and digital data flow capability. Practically, the ability to get data to all of the processors can be a primary limitation for a series of application spaces, such as image processing, where data does not always arrive in the desired order for the computation. The presented SoC FPAA device maximizes both metrics, being nearly a factor of 500 improvement in area efficiency as typical of other analog FPAA devices, but with high utilization of the resulting computational resources; the closest high utilization structure (i.e. like PSoC5 [4]) is nearly a 600,000 factor improvement. This large-scale Field-programmable analog array (FPAA) still enables analog computational energy efficiencies of 1000 as well as area efficiencies of 100 over digital solutions. This capability enables low-power system computation, in the μW levels, enabling a whole range of applications, particularly always-on context-aware processors. The saturation of computational energy efficiency in digital computation [2] enables this SoC Fig. 1: Using data from generations of FPAA devices built at GT and elsewhere, various FPAA devices are plotted as the Percentage of Control Path implemented versus Analog Parameter Density. Recent FPAA ICs, like the dynamically reconfigurable FPAA or FPAADD device begin to effectively maximize both parameters. Analog Parameter Density is the number of analog parameters per mm 2 , normalized to a 1μm process (or analog parameter density). Analog parameters directly set the complexity possible by the particular FPAA device. 57 Distribution A: Approved for public release; distribution unlimited.

Upload: vonhu

Post on 24-Apr-2018

235 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 4-4: A Programmable and Configurable Mixed-Mode … Programmable and Configurable Mixed-Mode FPAA SoC Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins,

A Programmable and Configurable Mixed-Mode FPAA SoC

Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins, Richard Wunderlich, Stephen Nease, and Shubha Ramakrishnan

Electrical and Computer Engineering Georgia Institute of Technology, Atlanta, GA, USA

Abstract: The authors present a Floating-Gate based, System-On-Chip large-scale Field-Programmable Analog Array IC that integrates divergent concepts from previous designs along with low-power digital computation including a 16bit open-source MSP430 microprocessor, and resulting interface circuitry such as DACs and ADCs. We describe the SoC FPAA architecture, experimental results of a range of compiled circuit and systems.

Keywords: FPAAs; Floating-Gate, SoC, Command Word Classification

This paper presents a Floating-Gate (FG) based, System-On-Chip (SoC) large-scale Field-Programmable Analog Array (FPAA) FPAA IC that integrates divergent concepts from previous multiple FPAA designs [1,2] along with low-power digital computation including a 16bit open-source MSP430 microprocessor (uP), and resulting interface circuitry such as DACs and ADCs. The unified structure enables a wide range of system-on-a-chip computing options that can be optimized over a range of metrics, such as power or energy consumed. The resulting IC architecture is the most sophisticated FPAA device built to date. This paper shows an analog command-word recognition system to illustrate the system computation, demonstrates experimentally the first embedded classifier structure compiled onto a single FPAA device. This FPAA SoC enables both high parameter density (number of programmable elements / area / normalized to process) as well as high accessibility of each of the resulting computations due to its advanced data flow handling; the next closest design optimizing both parameters is nearly a 600,000 factor less than this SoC FPAA. This IC was fabricated in a 350nm CMOS process; such approaches have recently been shown to be possible in scaled down IC processes [3]. The FPAA SoC die measures 12mm x 7mm, fabricated in a 350nm standard CMOS process.

Figure 1 plots various FPAA devices showing the percentage of control path implemented versus analog parameter density. Figure 1 shows two key metrics for FPAA approaches based on a collection of published FPAA devices. We define analog parameter density as the number of programmable parameters per mm2, normalized to a 1μm CMOS node. Analog parameter density determines critically the IC computation complexity, particularly when using routing as computation. Figure 1 shows FG based FPAAs enable ≈ 1000 improvement in parameter density, providing orders of magnitude potential

computation on a single device; alternatives to FG devices require a DAC at every node or similar dynamic techniques. Our second metric is to describe the amount of control flow (mostly digital) relative to the amount of analog and digital data flow capability. Practically, the ability to get data to all of the processors can be a primary limitation for a series of application spaces, such as image processing, where data does not always arrive in the desired order for the computation. The presented SoC FPAA device maximizes both metrics, being nearly a factor of 500 improvement in area efficiency as typical of other analog FPAA devices, but with high utilization of the resulting computational resources; the closest high utilization structure (i.e. like PSoC5 [4]) is nearly a 600,000 factor improvement.

This large-scale Field-programmable analog array (FPAA) still enables analog computational energy efficiencies of 1000 as well as area efficiencies of 100 over digital solutions. This capability enables low-power system computation, in the μW levels, enabling a whole range of applications, particularly always-on context-aware processors. The saturation of computational energy efficiency in digital computation [2] enables this SoC

Fig. 1: Using data from generations of FPAA devices built at GT and elsewhere, various FPAA devices are plotted as the Percentage of Control Path implemented versus Analog Parameter Density. Recent FPAA ICs, like the dynamically reconfigurable FPAA or FPAADD device begin to effectively maximize both parameters. Analog Parameter Density is the number of analog parameters per mm2, normalized to a 1μm process (or analog parameter density). Analog parameters directly set the complexity possible by the particular FPAA device.

57Distribution A: Approved for public release; distribution unlimited.

Page 2: 4-4: A Programmable and Configurable Mixed-Mode … Programmable and Configurable Mixed-Mode FPAA SoC Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins,

FPAA as a representative of physical computing to be a potential solution for the industrial pain point for embedded systems applications (acoustics, vision, communication, robotics) resulting from its factor of 1000 improvement in energy efficiency [1,5]. Computational efficiency, not including the energy required for communication, is measured in equivalent Multiply and Accumulate (MAC) operations per unit time (sometimes implicit in the units) per unit power, some of the fundamental computing operations found in analog and digital computation. For example, both custom [6] and configurable implementations [7] of Vector-Matrix Multiplication (VMM), demonstrate 1-10 MMAC (/s)/μW power ranges while the digital MAC energy wall remains roughly at 10MMAC (/s)/mW [8].

Architecture Description of the FPAA SOC IC

Figure 2 shows the block diagram for the RASP 3.0 FPAA IC based on Manhattan FPAA architecture, including the array of computation blocks and routing, composed of Connection (C) and Switch (S) blocks. This configurable fabric effectively integrates Analog (A) and Digital (D) components in a hardware platform easily mapped towards compiler tools. The switchable analog and digital devices are a combination of the components in the Computational Analog Blocks (CAB), in the Computational Logic Blocks (CLB), and in the devices in the routing architectures that are programmed to non-binary levels.

Fig. 2: RASP 3.0 functional block diagram illustrating the resulting computational blocks and resulting routing architecture. The infrastructure control includes a μP developed from an open-source MSP 430 processor, as well as on chip structures including on-chip DACs, current-to-voltage conversion, and voltage measurement, to program each Floating-Gate (FG) device. The (FG) switches in the Connection (C) Blocks, the Switch Blocks (S), and the local routing are a single pFET (FG) transistor programmed to be a closed switch over the entire fabric signal swing of 0 to 2.5V. The Computational Analog Blocks (CAB) and Computational Logic Blocks (CLB) are similar to previous approaches [2]. Eight, 4 input Boolean Logic Element (BLE) lookup tables with a latch comprise the CLB blocks.

58

Page 3: 4-4: A Programmable and Configurable Mixed-Mode … Programmable and Configurable Mixed-Mode FPAA SoC Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins,

Further, the interaction of analog components, digital FPGA-like components, and a μP infrastructure coming together creates a significant co-design problem between these three domains. This FPAA SoC utilizes a non-volatile, reprogrammable, fine-grain fabric to enable circuits and signal-processing approaches. This FPAA further employs an open-source MSP 430 microprocessor (μP) with on-chip structures for 7bit signal DACs, a ramp ADC, memory mapped General Purpose (GP) IO, and related components. The processor is able to send information to and from the array through memory mapped I/O special purpose peripherals. These peripherals include

16 memory-mapped 7bit signal DACs for the architecture, allowing measurements to be performed on chip, with the data taken by and stored in the processor, as well as additional DACs (and one 14bit ramp ADC) for the FG programming. The processor supplements the processing power of the digital portion of the system and increases overall implementation flexibility; portions of a problem can be mapped to reconfigurable analog, reconfigurable digital, or a general-purpose digital processor.

The architecture is based on Floating-Gate (FG) device, circuit, programming, and system techniques. The μP and other control infrastructure allow all programming of FG devices on-chip by simply downloading the list of switches

Fig. 3: Analog auditory word classification application compiled into the RASP 3.0, showing the experimental waveforms from the IC. (a) Block diagram for the classifier algorithm, in a similar representation used for our tool framework. (b) BPF center frequencies that are scaled evenly on a log-frequency scale between 100Hz and 4kHz with a nearly constant Q (Q≈2). The programming approach only accounted for part of the threshold voltage mismatch variations (indirect FG devices) and did not consider effects of capacitor mismatch; additional techniques can be used for the existing IC for tighter measurements. The peak gain was the largest variation between the filters; the Q was roughly 2 with some small variation. The center frequency for each of the filters, as shown was monotonic and fairly close to an ideal exponential spacing. (c) BandPass Filter (BPF) outputs and Amplitude Detection for a single phrase from the TIMIT database, (d) Classification of word and components for a TIMIT waveform, using a k-WTA with three outputs to detect the word “dark” in the resulting phrase.

59

Page 4: 4-4: A Programmable and Configurable Mixed-Mode … Programmable and Configurable Mixed-Mode FPAA SoC Sahil Shah, Sihwan Kim, Farhan Adil, Jennifer Hasler, Suma George, Michelle Collins,

to be programmed. The code for programming is eliminated when the processor is executing, efficiently utilizing the on-chip SRAM capabilities. The external system, through a serial port interface, first loads the programming code, and then executes the programming code on the downloaded data.

This FPAA SoC practically requires a toolset for system design in a reasonable design timeframe. A user of these ICs can compile their design, rather than undertaking a full custom IC design process, typical of current FPGA devices. Compilation means the user starts with a description of the IC, going from a range of high level block abstractions to a literal list of FG switches going (and intermediate representations like verilog), to an experimentally measured IC utilized in the same places as a commercial IC, including potential volume production.

Command Word Recognition in SoC FPAA

This paper demonstrates the first embedded classifier structure (command word recognition) compiled onto a single FPAA device, going from sensor input (audio) to classified word, experimentally demonstrated in analog hard- ware; this demonstration is a small fraction of the overall IC. The system power for this compiled system on the SoC FPAA (23μW) is consistent with the x1000 improvement factor (comparison of MACs) for physical computation over digital approaches, with future opportunities for improved performance in the same IC. We expect the SoC FPAA could be used for a variety of potential applications for sound / acoustics / speech, image processing and vision sensors, robotics, and wireless communication.

Figure 3 shows the auditory classifier structure for a limited phrase, like a command word, that can be classified through features in the averaged signal spectrum. Continuous-time spectrum decomposition used a bank of constant Q filters at the first processing stage, using a bank of amplitude detection and filtering operations, and then next a VMM + WTA classifier block to classify each of the resulting spectrum into simple symbols. In a more complex speech recognition system, one might have the spectrum correspond to phonemes or part of phonemes and build up the temporal representations using temporal classification (i.e. HMM classification) to word spot the resulting

phonemes, syllables, and words. A simple command word application requires only to distinguish between a few simple symbols, can be directly computed as a state machine on the MSP430 processor; a next level of computation, say a simple Viterbi decoder, could be directly implemented on the MSP430 processor as well.

Table I shows the power required for the compiled command-word classifier computation by functional block, as well as the entire system, which requires 23μW for the current implementation. The implementation is not optimal, particularly in the LPF and classifier blocks, but shows the functional capability of the system at a very low (23μW) power level. This classifier system, compiled on this FPAA is consistent with the x1000 improvement factor in computation (measured in equivalent multiplication and accumulate, or MAC, operations). The focus in this paper is to demonstrate the operation of this classifier structure; further papers will describe the optimal design, classification accuracy metrics, robust behavior over datasets, and power consumption optimization. Table II shows the table of parameters for the resulting SoC FPAA.

Acknowledgements

We want to thank all of the efforts put into the open-source MSP 430 development, as well as we want to thank all of the efforts put into the open-source VPR / VTR place and routes project.

References [1] C. Schlottmann, S. Shapero, S. Nease, and P. Hasler, Journal of Solid State Circuits, 2012, vol. 47, no. 10, pp. 2174-2184. [2] R. Wunderlich, F. Adil, and P. Hasler, IEEE Transactions on VLSI, vol. 21, no. 8, 2013, pp. 1496-1505. [3] J. Hasler and H. Wang, “A Fine-Grain FPAA fabric for RF + Baseband,” GOMAC 2015. [4] PSoC5 Data Sheet, Cyprus Semi, 2011. [5] C. Mead, “Neuromorphic electronic systems”, IEEE Proceedings, vol. 78, no. 10, 1990, pp. 1629-1636. [6] R. Chawla, A. Bandyopadhyay, V. Srinivasan, and P. Hasler, IEEE Custom Integrated Circuits Conference, 2004, pp. 651-654. [7] C. Schlottmann, and P. Hasler, IEEE Journal of Emerging CAS, vol. 1, 2012, 403-411. [8] H. B. Marr, B. Degnan, P. Hasler, and D. Anderson, ``Scaling Energy Per Operation via an Asynchronous Pipeline,'' IEEE Trans. on VLSI, Vol. 21, No. 1, January 2013, pp. 147-151.

Table 1: Measured Power Numbers for Compiled Command-Word Classifier Function

Table 2: Table of important SoC FPAA IC Parameters

60