Design and Simulation of FPGA based RISC-CPU and
System On Chip
Mr.M.Maharaj1*, Dr.S.Praveenkumar2
1PG Student, Dept of ECE, Saveetha Engineering College, Chennai, Tamilnadu
2Associate Professor, Dept of ECE, Saveetha Engineering College, Chennai, Tamilnadu
Corresponding Author Email : [email protected]
Abstract
This paper presents the design and simulation of an FPGA based RISC processor and System On
Chip (SOC) using Verilog HDL. It describes the processor design, the instruction set architecture and the core design. The main advantage of this design is a soft CPU core
that supports custom instructions and function units and can be reconfigured to enhance SoC development, debugging, testing, and tuning. The functionality of the SOC is verified using testbench code.
Keywords: Field Programmable Gate Array (FPGA), RISC Processor, System On Chip (SOC).
1. Introduction
A system on a chip (SoC or SOC) is an integrated circuit (also known as an "IC" or "chip")
that integrates all components of a computer or other electronic system. It may contain digital,
analog, mixed-signal, and often radio-frequency functions, all on a single substrate. SoCs are
very common in the mobile computing market because of their low power consumption. A typical
application is in the area of embedded systems [1][2].
International Journal of Pure and Applied Mathematics, Volume 119 No. 15 2018, 535-546. ISSN: 1314-3395 (on-line version). url: http://www.acadpubl.eu/hub/ Special Issue http://www.acadpubl.eu/hub/
An SoC integrates a microcontroller (or microprocessor) with advanced peripherals such as a
graphics processing unit (GPU), Wi-Fi module, or coprocessor. If a microcontroller is defined as
a system that integrates a microprocessor with peripheral circuits and memory, then the SoC is to
a microcontroller what a microcontroller is to a processor, bearing in mind that the SoC does not
necessarily contain built-in memory [3][4].
A typical SoC consists of: a microcontroller, microprocessor or digital signal processor (DSP)
core (multiprocessor SoCs, or MPSoCs, have more than one processor core); memory blocks,
including a selection of ROM, RAM, EEPROM and flash memory; timing sources, including
oscillators and phase-locked loops; peripherals, including counter-timers, real-time timers and
power-on reset generators; external interfaces, including industry standards such as USB,
FireWire, Ethernet, USART and SPI; analog interfaces, including ADCs and DACs; and voltage
regulators and power management circuits [5][6].
2. System On Chip (SOC)
Fig 1. Microcontroller-based system on a chip
An SoC consists of both the hardware described above and the software controlling the
microcontroller, microprocessor or DSP cores, peripherals and interfaces. The design flow for an
SoC aims to develop this hardware and software in parallel. Most SoCs are developed from
pre-qualified hardware blocks for the hardware elements described above, together with the
software drivers that control their operation. Of particular importance are the protocol stacks
that drive industry-standard interfaces like USB [7][8]. The hardware blocks are put together
using CAD tools; the software modules are integrated using a software-development environment.
Once the architecture of the SoC has been defined, any new hardware elements are
described at the register-transfer level (RTL), an abstraction that defines the circuit behavior.
These elements are then connected together, again at RTL, to create the full SoC design.
3. RISC Architecture
RISC concerns two levels, the instruction set architecture and its implementation, and more
precisely their interaction and trade-offs. The work that each instruction of a RISC machine
performs is simple and straightforward. Thus, the time required to execute each instruction can
be shortened and the number of cycles reduced. Typically the instruction execution time is
divided into five stages (machine cycles), and as soon as processing of one stage is finished,
the machine proceeds with executing the next stage. When a stage becomes free, it is used to
perform the same operation for the next instruction. Instructions are thus executed in a
pipelined fashion, similar to an assembly line in a factory [9][10].
Typically those five pipeline stages are IF (Instruction Fetch), ID (Instruction Decode),
EX (Execute), MA (Memory Access) and WB (Write Back).
By overlapping the execution of several instructions in a pipelined fashion, RISC achieves
its inherent execution parallelism, which is responsible for its performance advantage over
Complex Instruction Set Computer (CISC) architectures.
The goal of RISC is to achieve an execution rate of one Cycle Per Instruction (CPI = 1.0),
which would be the case if no interruptions occurred in the pipeline. In practice, however,
such interruptions (stalls) do occur, so the achieved CPI is higher.
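The effect of pipeline interruptions on CPI can be illustrated with a small calculation. The sketch below is not part of the proposed design; it assumes an idealized in-order pipeline in which the first instruction takes one cycle per stage, every later instruction completes one cycle after its predecessor, and any stalls simply add cycles. The function names and the stall counts are hypothetical.

```python
def pipeline_cycles(n_instructions, n_stages=5, stall_cycles=0):
    """Total cycles for n instructions on an idealized in-order pipeline.

    Assumption: the first instruction needs n_stages cycles to drain,
    each following instruction retires one cycle later, and stalls
    add directly to the total.
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1) + stall_cycles


def cpi(n_instructions, n_stages=5, stall_cycles=0):
    """Average cycles per instruction for the same idealized model."""
    return pipeline_cycles(n_instructions, n_stages, stall_cycles) / n_instructions


if __name__ == "__main__":
    # Without stalls the CPI approaches 1.0 as the instruction count grows.
    print(cpi(10), cpi(1000))
    # With (hypothetical) 200 stall cycles the CPI stays above 1.0.
    print(cpi(1000, stall_cycles=200))
```

With 1000 instructions and no stalls the model needs 1004 cycles, so the CPI is close to but never exactly 1.0; every stall cycle moves it further away.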
Figure 2. Typical five stage RISC pipeline
The instructions and the addressing modes in a RISC architecture are carefully selected and
tailored to the most frequently used operations, in a way that results in the most efficient
execution of the RISC pipeline [1].
4. Proposed System
Figure 3 SOC Data Flow Diagram
Figure 4 SOC Module Diagram
4.1. Processor Design
Now let’s get right down to work and design a simple, FPGA-optimized, 16-bit, 16
register RISC processor core, for hosting embedded applications written in (integer) C, with
code-size-efficient 16-bit instructions.
4.2. Instruction Set Architecture
First we’ll choose an instruction set architecture. To simplify the development tool
chain, it is tempting to reuse an existing (legacy) ISA; however, a new, custom instruction set
can be better optimized to minimize the area, and hence the cost, of an FPGA implementation. In
FPGAs, wires (programmable interconnect) and 4-LUTs are the most precious resources, and
most legacy ISAs were not designed with that in mind. Here are the two key ideas behind this
new instruction set.
1) Because the on-chip block RAM used as the instruction store (either RAM or i-cache) has
zero-cycle latency, each new instruction is available almost immediately. Therefore, as compared
to our earlier CPUs (which sport an instruction fetch pipeline stage to compensate for the
latency of off-chip memory), it should be possible to build a simpler, non-pipelined processor
with good performance.
[Figure 4 block labels: RST, CLK, SOC, I/O PORTS, CPU, ALU, RAM 16X16, ADD, SUB, ROM, ENCODER, DECODER, TIMER, DECODER, PAR_IN, PAR_OUT]
2) In a non-pipelined RISC CPU, a two-operand architecture (all register-register operations
are of the form dest = dest op src;) enables the register file to be implemented with a single
bank of dual-port distributed RAM.
With these two key decisions made, the rest of the design flows naturally. So here is our
streamlined GR0000 16-bit RISC instruction set architecture. There are sixteen 16-bit registers,
r0-r15, and a 16-bit program counter PC.
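The two-operand form can be made concrete with a small behavioral model. The following is a minimal sketch in Python rather than the Verilog of the actual design; the function name execute_rr and the restriction to add and sub are illustrative assumptions. It shows why only two register addresses are needed per instruction: the destination is read as the first operand and written back at the same address.

```python
MASK16 = 0xFFFF  # results wrap to the 16-bit register width


def execute_rr(regs, op, rd, rs):
    """Model one register-register instruction of the form dest = dest op src.

    regs is a list of sixteen 16-bit values (r0-r15). Only two addresses
    are involved: rd (read, then written) and rs (read only), which is
    what lets a single dual-port RAM bank serve as the register file.
    """
    a = regs[rd]  # first port: read the destination register
    b = regs[rs]  # second port: read the source register
    if op == "add":
        result = a + b
    elif op == "sub":
        result = a - b
    else:
        raise ValueError(f"unsupported op in this sketch: {op}")
    regs[rd] = result & MASK16  # write back at the same address as the first read
    return regs[rd]


if __name__ == "__main__":
    regs = [0] * 16
    regs[1], regs[2] = 0xFFFF, 2
    print(hex(execute_rr(regs, "add", 1, 2)))  # r1 = r1 + r2, wraps at 16 bits
```

A three-operand form (dest = src1 op src2) would need two read addresses plus a separate write address in the same cycle, which a single dual-port bank cannot supply.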
There are five instruction formats:
And 22 operation plus 16 branch instructions:
Some instructions are interlocked and uninterruptible. These include imm, adc*, sbc*, and
*cmp*. Imm establishes the upper 12 bits of the immediate data of the instruction that follows.
The carry-out of adc*/sbc* becomes the carry-in of the add*/sub/adc*/sbc* that follows.
*Cmp* establishes condition codes (not programmer visible) for the conditional branch that
follows. These compose, e.g.:
imm 0xABC
rcmpi rd,0xD
ble label
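In the sequence above, the imm prefix supplies the upper 12 bits and the following instruction supplies the low 4 bits, giving the 16-bit immediate 0xABCD. The sketch below models only this composition. The class name ImmState is hypothetical, and the handling of an immediate without a preceding imm (zero-extension here) is a placeholder assumption, since the text does not state it.

```python
class ImmState:
    """Tracks a pending imm prefix between two consecutive instructions."""

    def __init__(self):
        self.prefix = None  # upper 12 bits set by a preceding imm, if any

    def imm(self, imm12):
        """Model the imm instruction: latch the upper 12 immediate bits."""
        self.prefix = imm12 & 0xFFF

    def operand(self, imm4):
        """Return the 16-bit immediate seen by the instruction that follows."""
        if self.prefix is not None:
            value = (self.prefix << 4) | (imm4 & 0xF)
            self.prefix = None  # the prefix applies to one instruction only
        else:
            # Assumption: zero-extend when no prefix is pending.
            value = imm4 & 0xF
        return value


if __name__ == "__main__":
    state = ImmState()
    state.imm(0xABC)                 # imm 0xABC
    print(hex(state.operand(0xD)))   # rcmpi rd,0xD compares against 0xabcd
```

Because the prefix is consumed by exactly the next instruction, the pair must not be separated by an interrupt, which is why imm is listed among the interlocked instructions.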
4.3. Core Interface
The core interface is relatively simple. The 16-bit core is parameterized to make it easier
to derive 8- and 32-bit register variants:
Reset is synchronous, sampled on the rising edge of the clock. On reset, the processor jumps to
address i_ad_rst. The processor core has a Harvard architecture, with separate instruction-fetch
and load/store-data ports. Here’s the instruction port:
As each instruction completes, late in the clock cycle, the core asserts the next instruction
address i_ad, qualified by insn_ce. After clk rises, the system drives insn with the next
instruction word, and asserts hit ("cache hit").
If insn is not ready, or upon an i-cache miss, hit is deasserted, so the processor ignores
(annuls) the current, invalid instruction. Therefore, in the implementation that follows, certain
decode signals must be qualified by hit. Int_en (with insn_ce) signals that the currently
completing instruction is interruptible, and the system may safely insert an interrupt instruction.
Somewhat surprisingly, this signal is all that is necessary to implement interrupt handling in a
modular way, entirely outside of the processor core itself.
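The role of hit can be illustrated with a cycle-by-cycle model of the instruction port. This is a minimal sketch under stated assumptions, not the core's actual logic: it assumes purely sequential execution (no branches), that the instruction address is held while hit is deasserted, and that each miss lasts a fixed number of cycles. The function name simulate_fetch and the miss_cycles table are hypothetical.

```python
def simulate_fetch(program, miss_cycles):
    """Return a per-cycle trace of (address, executed instruction or None).

    program     -- list of instruction words, indexed by address
    miss_cycles -- dict mapping an address to the number of cycles for
                   which hit stays deasserted before the word arrives
    """
    remaining = dict(miss_cycles)
    trace = []
    pc = 0
    while pc < len(program):
        if remaining.get(pc, 0) > 0:
            # hit deasserted: the instruction is annulled and the
            # address is presented again (assumption of this sketch)
            remaining[pc] -= 1
            trace.append((pc, None))
        else:
            # hit asserted: the instruction executes and pc advances
            trace.append((pc, program[pc]))
            pc += 1
    return trace


if __name__ == "__main__":
    for cycle, (addr, insn) in enumerate(simulate_fetch(["a", "b", "c"], {1: 2})):
        print(cycle, addr, insn if insn is not None else "(annulled)")
```

In the trace, the two annulled cycles at address 1 change no state, which mirrors why decode signals in the implementation must be qualified by hit.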
During a load or store instruction, the core requests a data transfer on the load/store-data port:
The data port outputs sw, sb, lw, and lb are valid well ahead of clk. The system can
sample these and determine whether to assert rdy in the current clock cycle. Memory is byte
addressable, and so d_ad is the big-endian effective address of the load or store. During a store
instruction, the processor asserts d_ad with do each cycle until the system signals rdy, indicating
that it has captured the store data. Sw (store word) data are on do[15:0], while sb (store byte)
data are on do[7:0] only.
During a load instruction, the core asserts d_ad and awaits rdy to indicate that the load
data are valid on data[15:0]. During lb (zero-extending load byte), the system must drive
data[15:8] with 8’b0. Besides loaded data, the tri-state data bus is also used within the core to
carry all other instruction result values.
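The byte-lane rules described above can be sketched as a behavioral model of the system side of the data port. This is an illustration only: the memory is a plain bytearray, rdy is treated as always asserted, and alignment is not checked. The function names store and load are hypothetical; the signal names sw, sb, lw, lb, d_ad and do are those used in the text.

```python
def store(mem, d_ad, do, kind):
    """Capture store data from the 16-bit do bus into byte-addressable memory."""
    if kind == "sw":
        # Big-endian word: high byte at d_ad, low byte at d_ad + 1.
        mem[d_ad] = (do >> 8) & 0xFF
        mem[d_ad + 1] = do & 0xFF
    elif kind == "sb":
        # Store byte: only do[7:0] carries data.
        mem[d_ad] = do & 0xFF
    else:
        raise ValueError(f"not a store: {kind}")


def load(mem, d_ad, kind):
    """Return the value the system drives on data[15:0] for a load."""
    if kind == "lw":
        return (mem[d_ad] << 8) | mem[d_ad + 1]
    if kind == "lb":
        # Zero-extending load byte: data[15:8] is driven with zeros.
        return mem[d_ad]
    raise ValueError(f"not a load: {kind}")


if __name__ == "__main__":
    mem = bytearray(4)
    store(mem, 0, 0x1234, "sw")
    print(hex(load(mem, 0, "lw")), hex(load(mem, 1, "lb")))
```

Reading back the second byte of the stored word with lb returns its low byte, zero-extended to 16 bits, which matches the big-endian layout stated for d_ad.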
5. RESULTS AND DISCUSSION
RTL simulation was carried out with Questa Sim 10.0b, from the EDA tool vendor Mentor
Graphics. Verilog HDL is used to design the entire SOC system, and scripting is also used.
6. SIMULATION OUTPUT
6.1. SOC TESTBENCH OUTPUT:
Figure 5 SOC Testbench output
6.2 TIMER/COUNTER OUTPUT:
Figure 6 Timer/Counter output
6.3 ADDER & SUBTRACTOR SIMULATION OUTPUT:
Figure 7 Adder/Subtractor output
7. CONCLUSION AND FUTURE WORK
Thus, a RISC based SOC was designed and simulated using Questa Sim software from Mentor
Graphics. The advantages of the RISC based SOC are its compactness and the integration of
different peripherals within the SOC. We wrote a test bench for the SOC and verified its
functionality.
The SOC as designed is an 8-bit, low-end configuration comprising the instruction set, program
and data memory, a timer/counter and parallel I/O port peripherals. In future, the instruction
set can be extended and the program and data memory increased.
The data/address bus can then be widened to 32 or 64 bits. Higher-level peripherals can also be
developed, such as biometric operation, high-level sensor interfacing, wireless module
interfacing and modern agriculture peripherals.
The block diagram of modern agriculture is mentioned below.
REFERENCES
[1] G.M.Amdahl, G.A. Blaauw, F.P. Brooks, "Architecture of the IBM System/360", IBM Journal of
Research and Development, Vol.8, No.2, p.87-101, April 1964.
[2] G.A. Blaauw, F.P. Brooks, "The Structure of System/360", IBM Systems Journal, Vol.3, No.2, p.119-
135, 1964.
[3] R.P.Case, A.Padegs, "Architecture of the IBM System/370", Communications of ACM, Vol.21,
No.1,p. 73-96, January 1978.
[4] D.W.Anderson, F.J.Sparacio, and R.M.Tomasulo, "The IBM 360 Model 91: Machine philosophy and
instruction handling", IBM Journal of Research and Development, Vol.11, No.1, p.8-24, January 1967.
[5] G. Radin, "The 801 Minicomputer", IBM T.J.Watson Research Center, Report RC 9125,
November11, 1981, also in SIGARCH Computer Architecture News 10, No.2, p.39-47, March 1982.
[6] John Cocke and Viky Markstein, "The Evolution of RISC Technology at IBM", IBM Journal of
Research and Development, Vol.34, No.1, pp.37, January 1990.
[7] M.Jothi Kumar and Chitravalavan, "Implementation Of Blake Algorithm Using Pipelining in FPGA",
International Journal of Innovations in Scientific and Engineering Research (IJISER), Vol.1, No.12,
pp.488-493, 2014.
[8] M. E. Hopkins, "A Perspective on the 801 / Reduced Instruction Set Computer", IBM Systems
Journal, Vol. 26, No.1, 1987.
[9] Henry S. Warren, Jr., "Instruction scheduling for the IBM RISC System/6000 processor", IBM
Journal of Research and Development, Vol.34, No.1, pp.37, January 1990.
[10] D.A. Patterson, C.H.Sequin, "A VLSI RISC", IEEE Computer Magazine, September 1982.
[11] J. L. Hennessy, "VLSI Processor Architecture", IEEE Transactions on Computers, Vol. C-33,
No.12, December 1984. J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach,
Morgan Kaufmann Publishers, San Mateo, California.
[12] Prakash, M., and C. J. KavithaPriya. "An Analysis of Types of Protocol Implemented in Internet of Things Based on Packet Loss Ratio." In Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, p. 27. ACM, 2016.
[13] L.J.Shustek, "Analysis and Performance of Computer Instruction Sets", PhD. Thesis, Stanford
University, May 1978.
[14] Gregory F. Grohoski, "Machine Organization of the IBM RISC System/6000 processor", IBM
Journal of Research and Development, Vol.34, No.1, pp.37, January 1990.
[15] V.G.Oklobdzija, "Issues in CPU-Coprocessor Communication and Synchronization", EUROMICRO
’88, Fourteenth Symposium on Microprocessing and Microprogramming, pp. 695, Zurich, Switzerland,
August 1988.
[16] R.M.Tomasulo, "An Efficient Algorithm for Exploiting Multiple Arithmetic Units", IBM Journal of
Research and Development, Vol.11, No.1, p.25-33.
[17] John Cocke, Gregory Grohoski, and Vojin Oklobdzija, "Instruction Control Mechanism for a
Computing System with Register Renaming, MAP Table and Queues Indicating Available
Registers", U.S. Patent No. 4,992,938, February 12, 1991.