
Design and Simulation of FPGA based RISC-CPU and System On Chip

Mr.M.Maharaj1*, Dr.S.Praveenkumar2

1PG Student, Dept of ECE, Saveetha Engineering College, Chennai, Tamilnadu

2Associate Professor, Dept of ECE, Saveetha Engineering College, Chennai, Tamilnadu

Corresponding Author Email : [email protected]

Abstract

This paper presents the design and simulation of an FPGA-based RISC processor and System on Chip (SoC) using Verilog HDL. It describes the processor design, the instruction set architecture and the core design. The main advantage of this design is a soft CPU core that enables custom instructions and function units and can be reconfigured to enhance SoC development, debugging, testing, and tuning. The functionality of the SoC is verified using testbench code.

Keywords: Field Programmable Gate Array (FPGA), RISC Processor, System On Chip (SOC).

1. Introduction

A system on a chip or system on chip (SoC or SOC) is an integrated circuit (also known as an "IC" or "chip") that integrates all components of a computer or other electronic system. It may contain digital, analog, mixed-signal, and often radio-frequency functions, all on a single substrate. SoCs are very common in the mobile computing market because of their low power consumption. A typical application is in the area of embedded systems [1][2].

International Journal of Pure and Applied Mathematics, Volume 119, No. 15, 2018, 535-546. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ (Special Issue)

An SoC integrates a microcontroller (or microprocessor) with advanced peripherals such as a graphics processing unit (GPU), a Wi-Fi module, or a coprocessor. If a microcontroller is defined as a system that integrates a microprocessor with peripheral circuits and memory, then an SoC is to a microcontroller what a microcontroller is to a processor, bearing in mind that an SoC does not necessarily contain built-in memory [3][4].

A typical SoC consists of a microcontroller, microprocessor or digital signal processor (DSP) core (multiprocessor SoCs (MPSoC) have more than one processor core); memory blocks, including a selection of ROM, RAM, EEPROM and flash memory; timing sources, including oscillators and phase-locked loops; peripherals, including counter-timers, real-time timers and power-on reset generators; external interfaces, including industry standards such as USB, FireWire, Ethernet, USART and SPI; analog interfaces, including ADCs and DACs; and voltage regulators and power management circuits [5][6].

2. System On Chip (SOC)

Fig 1. Microcontroller-based system on a chip


An SoC consists of both the hardware described above and the software controlling the microcontroller, microprocessor or DSP cores, peripherals and interfaces. The design flow for an SoC aims to develop this hardware and software in parallel. Most SoCs are developed from pre-qualified hardware blocks for the hardware elements described above, together with the software drivers that control their operation. Of particular importance are the protocol stacks that drive industry-standard interfaces like USB [7][8]. The hardware blocks are put together using CAD tools; the software modules are integrated using a software-development environment.

Once the architecture of the SoC has been defined, any new hardware elements are written at the register-transfer level (RTL), an abstract description that defines the circuit behavior. These elements are then connected together, also at RTL, to create the full SoC design.

3. RISC Architecture

RISC is concerned with the interaction and trade-offs between the instruction set architecture and its implementation. The work that each instruction of a RISC machine performs is simple and straightforward. Thus, the time required to execute each instruction can be shortened and the number of cycles reduced. Typically the instruction execution time is divided into five stages (machine cycles), and as soon as processing in one stage is finished, the instruction proceeds to the next stage. When a stage becomes free, it is used to perform the same operation for the next instruction. The instructions are thus executed in a pipeline fashion, similar to an assembly line in a factory [9][10].

Typically those five pipeline stages are IF (Instruction Fetch), ID (Instruction Decode), EX (Execute), MA (Memory Access) and WB (Write Back).

By overlapping the execution of several instructions in a pipeline fashion, RISC achieves its inherent execution parallelism, which is responsible for its performance advantage over Complex Instruction Set Computer (CISC) architectures.

The goal of RISC is to achieve an execution rate of one cycle per instruction (CPI = 1.0), which would be the case if no interruptions in the pipeline occurred. In practice, however, this is not the case, because hazards such as data dependences and branches stall the pipeline.
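To make the CPI goal concrete, the following minimal Python sketch counts the cycles needed by an idealized in-order pipeline. The function names and the lumped `stall_cycles` parameter are illustrative assumptions, not part of the design described in this paper.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 5, stall_cycles: int = 0) -> int:
    """Cycles to run n instructions through an idealized in-order pipeline.

    Assumption: one instruction enters the pipeline per cycle. The first
    instruction needs n_stages cycles to complete; every later instruction
    completes one cycle after its predecessor. Stalls are added as a lump sum.
    """
    if n_instructions <= 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def cpi(n_instructions: int, n_stages: int = 5, stall_cycles: int = 0) -> float:
    """Average cycles per instruction under the same model."""
    return pipeline_cycles(n_instructions, n_stages, stall_cycles) / n_instructions


if __name__ == "__main__":
    print(pipeline_cycles(1))            # a single instruction needs all 5 stages
    print(cpi(1000))                     # approaches 1.0 for long instruction streams
    print(cpi(1000, stall_cycles=200))   # stalls push CPI above 1.0
```

Under this model, CPI approaches 1.0 only for long instruction streams without stalls, which is why the ideal rate is not reached in practice.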



Figure 2. Typical five stage RISC pipeline

The instructions and addressing modes in a RISC architecture are carefully selected and tailored to the most frequently used operations, in a way that results in the most efficient execution of the RISC pipeline [1].

4. Proposed System

Figure 3 SOC Data Flow Diagram



Figure 4 SOC Module Diagram

4.1. Processor Design

We now design a simple, FPGA-optimized, 16-bit, 16-register RISC processor core for hosting embedded applications written in (integer) C, with code-size-efficient 16-bit instructions.

4.2. Instruction Set Architecture

First we choose an instruction set architecture. To simplify the development tool chain, it is tempting to reuse an existing (legacy) ISA; however, a new, custom instruction set can be better optimized to minimize the area and hence the cost of an FPGA implementation. In FPGAs, wires (programmable interconnect) and 4-LUTs are the most precious resources, and most legacy ISAs were not designed with that in mind. Here are the two key ideas behind this new instruction set.

1) Using the zero-cycle latency on-chip block RAM as an instruction store (either RAM or i-cache), each new instruction is available almost immediately. Therefore, as compared to our earlier CPUs (which have an instruction fetch pipeline stage to compensate for the latency of off-chip memory), it should be possible to build a simpler, non-pipelined processor with good performance.

[Block diagram labels: RST, CLK, SOC, I/O PORTS, CPU, ALU, RAM 16X16, ADD, SUB, ROM, ENCODER, DECODER, TIMER, DECODER, PAR_IN, PAR_OUT]

2) In a non-pipelined RISC CPU, a two-operand architecture (all register-register operations of the form dest = dest op src;) enables the register file to be implemented with a single bank of dual-port distributed RAM.

With these two key decisions made, the rest of the design flows naturally. So here is our streamlined GR0000 16-bit RISC instruction set architecture. There are sixteen 16-bit registers, r0-r15, and a 16-bit program counter PC.
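The two-operand form dest = dest op src can be illustrated with a small behavioral model in Python. This is only a sketch under stated assumptions: the class name `RegisterFile`, the method `execute`, and the mapping of port A to the destination and port B to the source are illustrative and do not come from the Verilog source of this design.

```python
MASK16 = 0xFFFF  # registers are 16 bits wide


class RegisterFile:
    """Behavioral model of a 16 x 16-bit register file with two ports.

    Assumption: port A addresses rd (read, then write back) and port B
    addresses rs (read only), mirroring one bank of dual-port RAM.
    """

    def __init__(self):
        self.regs = [0] * 16  # r0..r15

    def execute(self, op, rd: int, rs: int) -> int:
        a = self.regs[rd]              # port A read: dest doubles as first operand
        b = self.regs[rs]              # port B read: source operand
        result = op(a, b) & MASK16     # keep the result within 16 bits
        self.regs[rd] = result         # port A write: result overwrites dest
        return result


if __name__ == "__main__":
    rf = RegisterFile()
    rf.regs[1] = 0x0003
    rf.regs[2] = 0x0004
    rf.execute(lambda a, b: a + b, rd=1, rs=2)  # r1 = r1 + r2
    print(hex(rf.regs[1]))
```

Because the destination is also the first operand, only two register addresses are needed per instruction, which is what allows a single dual-port memory to serve all register accesses in this model.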

There are five instruction formats:

And 22 operations plus 16 branch instructions:

Some instructions are interlocked and uninterruptible. These include imm, adc*, sbc*, and *cmp* (here * is read as a wildcard matching instruction variants such as rcmpi). imm establishes the upper 12 bits of the immediate data of the instruction that follows. The carry-out of adc*/sbc* becomes the carry-in of the add*/sub/adc*/sbc* that follows. *cmp* establishes condition codes (not programmer visible) for the conditional branch that follows. These compose, e.g.:



imm 0xABC

rcmpi rd,0xD

ble label
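The effect of the imm prefix in the sequence above can be sketched in Python. The sketch assumes that the instruction following imm carries a 4-bit immediate field, which is inferred from the 12-bit prefix and the 16-bit data width; the function name `compose_immediate` is hypothetical.

```python
def compose_immediate(imm12: int, imm4: int) -> int:
    """Combine an imm prefix with the immediate field of the next instruction.

    Assumption: the prefix supplies bits [15:4] and the following
    instruction supplies bits [3:0] of the 16-bit immediate.
    """
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("imm prefix must fit in 12 bits")
    if not 0 <= imm4 <= 0xF:
        raise ValueError("instruction immediate must fit in 4 bits")
    return (imm12 << 4) | imm4


if __name__ == "__main__":
    # imm 0xABC followed by rcmpi rd,0xD
    print(hex(compose_immediate(0xABC, 0xD)))
```

Under this assumption, the example compares rd against the 16-bit value 0xABCD, and the ble that follows uses the resulting condition codes.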

4.3. Core Interface

The core interface is relatively simple. The 16-bit core is parameterized to make it easier to derive 8- and 32-bit register variants. Reset is synchronous, sampled on the rising edge of the clock. On reset, the processor jumps to address i_ad_rst. The processor core has a Harvard architecture, with separate instruction-fetch and load/store-data ports. The instruction port operates as follows.

As each instruction completes, late in the clock cycle, the core asserts the next instruction address i_ad, qualified by insn_ce. After clk rises, the system drives insn with the next instruction word and asserts hit ("cache hit").

If insn is not ready, or upon an i-cache miss, hit is deasserted, so the processor ignores (annuls) the current, invalid instruction. Therefore, in the implementation that follows, certain decode signals must be qualified by hit. int_en (with insn_ce) signals that the currently completing instruction is interruptible, and that the system may safely insert an interrupt instruction. Somewhat surprisingly, this signal is all that is necessary to implement interrupt handling in a modular way, entirely outside of the processor core itself.
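The instruction-port handshake and the idea of inserting an interrupt outside the core can be sketched as a system-side behavioral model in Python. Everything beyond the signal names i_ad, insn_ce, insn, hit and int_en is an assumption of this sketch: the class `InstructionPort`, the `INT_INSN` encoding, the `is_hit` callback, and the choice to hold the previous outputs while insn_ce is low. Saving and restoring the interrupted address is not modeled.

```python
INT_INSN = 0xFFFF  # hypothetical encoding of the inserted interrupt instruction


class InstructionPort:
    """System-side model of the instruction port (logic outside the core)."""

    def __init__(self, program, is_hit=lambda addr: True):
        self.program = program      # dict: instruction address -> 16-bit word
        self.is_hit = is_hit        # models i-cache hit/miss per address
        self.irq_pending = False    # set by an external interrupt source
        self.outputs = (0, False)   # (insn, hit) currently driven to the core

    def clock(self, i_ad: int, insn_ce: bool, int_en: bool):
        """Return (insn, hit) as driven after a rising clock edge."""
        if not insn_ce:
            return self.outputs                        # assumption: hold outputs
        if self.irq_pending and int_en:
            self.irq_pending = False
            self.outputs = (INT_INSN, True)            # insert interrupt instruction
        elif not self.is_hit(i_ad):
            self.outputs = (0, False)                  # miss: core annuls this word
        else:
            self.outputs = (self.program[i_ad], True)  # normal fetch
        return self.outputs


if __name__ == "__main__":
    port = InstructionPort({0: 0x1234, 2: 0x5678})
    print(port.clock(0, insn_ce=True, int_en=True))
    port.irq_pending = True
    print(port.clock(2, insn_ce=True, int_en=False))  # not interruptible yet
    print(port.clock(2, insn_ce=True, int_en=True))   # interrupt inserted here
```

In this model the core never sees the interrupt request itself; it only receives a substituted instruction word at a point where int_en allows it.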

During a load or store instruction, the core requests a data transfer on the load/store-data port:

The data port outputs sw, sb, lw, and lb are valid well ahead of clk. The system can sample these and determine whether to assert rdy in the current clock cycle. Memory is byte addressable, and so d_ad is the big-endian effective address of the load or store. During a store instruction, the processor asserts d_ad together with do each cycle until the system signals rdy, indicating that it has captured the store data. sw (store word) data are on do[15:0], while sb (store byte) data are on do[7:0] only.

During a load instruction, the core asserts d_ad and awaits rdy to indicate that the load data are valid on data[15:0]. During lb (zero-extending load byte), the system must drive data[15:8] with 8'b0. Besides loaded data, the tri-state data bus is also used within the core to carry all other instruction result values.
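The byte and word behavior of the data port can be illustrated with a small memory model in Python. This is a sketch under simplifying assumptions: the class `DataMemory` and its method names are hypothetical, rdy is treated as always asserted (every access completes in one cycle), and word accesses are assumed not to run past the end of memory.

```python
class DataMemory:
    """Byte-addressable, big-endian memory as seen from the load/store-data port."""

    def __init__(self, size: int = 256):
        self.mem = bytearray(size)

    def store(self, d_ad: int, do: int, sw: bool = False, sb: bool = False) -> None:
        if sw:                                   # store word: do[15:0]
            self.mem[d_ad] = (do >> 8) & 0xFF    # big-endian: high byte at d_ad
            self.mem[d_ad + 1] = do & 0xFF       # low byte at d_ad + 1
        elif sb:                                 # store byte: do[7:0] only
            self.mem[d_ad] = do & 0xFF

    def load(self, d_ad: int, lw: bool = False, lb: bool = False):
        if lw:                                   # load word onto data[15:0]
            return (self.mem[d_ad] << 8) | self.mem[d_ad + 1]
        if lb:                                   # zero-extending load byte:
            return self.mem[d_ad]                # data[15:8] are driven with zeros
        return None                              # no load requested


if __name__ == "__main__":
    m = DataMemory()
    m.store(0x10, 0xBEEF, sw=True)
    print(hex(m.load(0x10, lw=True)))
    print(hex(m.load(0x10, lb=True)))  # high byte, because memory is big-endian
```

A real system would additionally deassert rdy for slow memories or peripherals, in which case the core keeps d_ad and do asserted until the transfer is accepted.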

5. RESULTS AND DISCUSSION

RTL simulation is performed with QuestaSim 10.0b from Mentor Graphics, an EDA tool vendor. Verilog HDL is used to design the entire SoC system, and scripting is also used.

6. SIMULATION OUTPUT

6.1. SOC TESTBENCH OUTPUT:

Figure 5 SOC Testbench output



6.2. TIMER/COUNTER OUTPUT:

Figure 6 Timer/Counter output

6.3. ADDER & SUBTRACTOR SIMULATION OUTPUT:

Figure 7 Adder/Subtractor output



7. CONCLUSION AND FUTURE WORK

Thus, a RISC-based SoC has been designed and simulated using QuestaSim from Mentor Graphics. The advantages of the RISC-based SoC are its compactness and the integration of different peripherals within the SoC. We wrote a test bench for the SoC and verified its functionality.

At present, the SoC is designed in a low-end 8-bit configuration comprising the instruction set, program and data memory, a timer/counter and parallel I/O port peripherals. In future, we can extend the instruction set and increase the program and data memory.

The data/address bus can then be widened to 32 or 64 bits. Among the peripherals, we can develop higher-level ones for biometric operation, advanced sensor interfacing, wireless module interfacing and modern agriculture. The block diagram for modern agriculture is given below.

REFERENCES

[1] G.M. Amdahl, G.A. Blaauw, F.P. Brooks, "Architecture of the IBM System/360", IBM Journal of Research and Development, Vol.8, No.2, p.87-101, April 1964.

[2] G.A. Blaauw, F.P. Brooks, "The Structure of System/360", IBM Systems Journal, Vol.3, No.2, p.119-135, 1964.

[3] R.P. Case, A. Padegs, "Architecture of the IBM System/370", Communications of ACM, Vol.21, No.1, p.73-96, January 1978.

[4] D.W. Anderson, F.J. Sparacio, and R.M. Tomasulo, "The IBM 360 Model 91: Machine philosophy and instruction handling", IBM Journal of Research and Development, Vol.11, No.1, p.8-24, January 1967.

[5] G. Radin, "The 801 Minicomputer", IBM T.J. Watson Research Center, Report RC 9125, November 11, 1981; also in SIGARCH Computer Architecture News 10, No.2, p.39-47, March 1982.

[6] John Cocke and Viky Markstein, "The Evolution of RISC Technology at IBM", IBM Journal of Research and Development, Vol.34, No.1, pp.37, January 1990.

[7] M. Jothi Kumar and Chitravalavan, "Implementation Of Blake Algorithm Using Pipelining in FPGA", International Journal of Innovations in Scientific and Engineering Research (IJISER), Vol.1, No.12, pp.488-493, 2014.

[8] M.E. Hopkins, "A Perspective on the 801 / Reduced Instruction Set Computer", IBM Systems Journal, Vol.26, No.1, 1987.

[9] Henry S. Warren, Jr., "Instruction scheduling for the IBM RISC System/6000 processor", IBM Journal of Research and Development, Vol.34, No.1, pp.37, January 1990.

[10] D.A. Patterson, C.H. Sequin, "A VLSI RISC", IEEE Computer Magazine, September 1982.



[11] J.L. Hennessy, "VLSI Processor Architecture", IEEE Transactions on Computers, Vol. C-33, No.12, December 1984. J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, Morgan Kaufmann Publishers, San Mateo, California.

[12] M. Prakash and C.J. Kavitha Priya, "An Analysis of Types of Protocol Implemented in Internet of Things Based on Packet Loss Ratio", in Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, p.27, ACM, 2016.

[13] L.J. Shustek, "Analysis and Performance of Computer Instruction Sets", PhD Thesis, Stanford University, May 1978.

[14] Gregory F. Grohoski, "Machine Organization of the IBM RISC System/6000 processor", IBM Journal of Research and Development, Vol.34, No.1, pp.37, January 1990.

[15] V.G. Oklobdzija, "Issues in CPU-Coprocessor Communication and Synchronization", EUROMICRO '88, Fourteenth Symposium on Microprocessing and Microprogramming, pp.695, Zurich, Switzerland, August 1988.

[16] R.M. Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM Journal of Research and Development, Vol.11, No.1, p.25-33.

[17] John Cocke, Gregory Grohoski, and Vojin Oklobdzija, "Instruction Control Mechanism for a Computing System with Register Renaming, MAP Table and Queues Indicating Available Registers", U.S. Patent No. 4,992,938, February 12, 1991.

