ECE 554 Final Project: Sega Master System
-TEAM ZOOP-
Clement Luk, Eric Jackowski, Ilhyun Kim, Mike Wiktor, Karthik Ramachandran, Tsung-Hao Chen, Tsung-Chi Lin, Yi-Ting Chen, Dan Luu
Outline
Features
CPU
Translator
Graphics
Memory & I/O devices
Emulator
Outline (Contd.)
Implementation
Problems & Difficulties encountered
Debug & Testing
Design Changes
Final Product
Features
– On-the-fly CISC to RISC instruction translation
– Memory paging
– Sega emulation
– Interrupts
Z80 processor
CISC processor
– Variable-length instructions (1 to 4 bytes long)
– 16-bit address bus, 8-bit data bus
– Other signals to interface with I/O and memory
Registers
– 2 sets of 6 general-purpose registers (8 bits each, or 16 bits as pairs)
– 2 sets of accumulator and flag registers
– Special registers: Program Counter, Stack Pointer, Index Registers, Interrupt Register, Memory Refresh Register
Z80 Processor Instructions
158 instruction types in total; addressing modes can be mixed
1) Load and exchange
2) Block transfer and search
3) Arithmetic and logical
4) Rotate and shift
5) Bit manipulation (Set, Reset, Test)
6) Jump, call, and return
7) Input/Output
8) Basic CPU control
RISC Core
z80 compatible
– Emulates all architectural state
– Timing is not emulated
RISC implementation
– Single-issue pipelined core
– Load / Store architecture
Translating z80 instructions into micro-ops
– Reduces the complexity of the pipelined implementation of the z80
Micro-op format
Instruction format (operand bits + 6 opcode bits)
Type  Width        Fields
RRR   15 + 6 bits  op | Rdst(5) | Rs(5) | Rt(5)
RRI   18 + 6 bits  op | Rdst(5) | Rs(5) | IMM(8)
RI    21 + 6 bits  op | Rdst/Rsrc(5) | IMM(16)
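The three micro-op formats can be illustrated with a small bit-packing sketch. Only the field widths come from the slide; the bit positions (opcode in the top bits, operands in the order listed), the function names, and the opcode values used in the usage note are assumptions for illustration.

```python
# Hypothetical packing of the three micro-op formats.
# Field widths are from the slide; bit positions are an assumption.
OP_BITS, REG_BITS = 6, 5


def _field(value, bits, name):
    """Validate that value fits in an unsigned field of the given width."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return value


def encode_rrr(op, rdst, rs, rt):
    """op(6) | Rdst(5) | Rs(5) | Rt(5): 21 bits."""
    word = _field(op, OP_BITS, "op")
    for name, reg in (("rdst", rdst), ("rs", rs), ("rt", rt)):
        word = (word << REG_BITS) | _field(reg, REG_BITS, name)
    return word


def encode_rri(op, rdst, rs, imm8):
    """op(6) | Rdst(5) | Rs(5) | IMM(8): 24 bits."""
    word = _field(op, OP_BITS, "op")
    word = (word << REG_BITS) | _field(rdst, REG_BITS, "rdst")
    word = (word << REG_BITS) | _field(rs, REG_BITS, "rs")
    return (word << 8) | _field(imm8, 8, "imm8")


def encode_ri(op, reg, imm16):
    """op(6) | Rdst/Rsrc(5) | IMM(16): 27 bits."""
    word = _field(op, OP_BITS, "op")
    word = (word << REG_BITS) | _field(reg, REG_BITS, "reg")
    return (word << 16) | _field(imm16, 16, "imm16")


def decode_ri(word):
    """Split a 27-bit RI word back into (op, reg, imm16)."""
    return (word >> 21) & 0x3F, (word >> 16) & 0x1F, word & 0xFFFF
```

For example, `encode_ri(1, 2, 0x1234)` packs a made-up opcode 1 with register 2 and a 16-bit immediate, and `decode_ri` recovers the same three fields.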
Processor Block Diagram
[Figure: pipeline block diagram. Fetch / translator front end: z80 fetch, xlate, instruction queue. RISC core: decode / RF access, ALU stage, Mem access. Both the front end and the core connect to Memory.]
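Instruction fetch / translation
[Figure: z80 instruction fetcher. A MUX selects between fetch PC + 1 (adder) and the target PC from the core, which is chosen on a taken CTI or interrupt signalled by the core. The fetch PC is sent as a 16-bit address to the memory system, which returns 8-bit z80 instruction words into the z80 inst queue. Pre-decode logic hands the 1~4 byte z80 I-word, "z80 I ready", and pre-decode info to the translator; "xlator stall" is the back-pressure signal between them.]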
Instruction fetch / translation: Instruction queue
[Figure: the xlator drives uop[0], uop[1], uop[2], uop[3] into FIFO queue0 to FIFO queue3 (core gen), with a "queue full" signal. A MUX, sequencing 0,1,2,3, selects one uop for decode / RF.]
– Multi-input, single-output FIFO
– Enqueues 4 uops in parallel (if an instruction has fewer than 4, the rest are nops)
– Dequeues 1 uop at a time, taking each queue in sequence
– When the uop carrying EOI is dequeued, the nops are dequeued from the remaining queues and dequeuing restarts from queue0
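The queue behaviour described above can be modelled in a few lines. This is a behavioural sketch only, not the Verilog design: it assumes EOI marks the last real uop of each translated instruction, and the class name, the per-lane depth, and the tuple representation of a uop are all hypothetical.

```python
from collections import deque

NOP = ("nop", False)  # (name, EOI flag); representation is an assumption


class UopQueue:
    """Four parallel FIFOs written together and read one uop at a time."""

    def __init__(self, depth=8):
        self.lanes = [deque() for _ in range(4)]
        self.depth = depth      # per-lane depth is an assumption
        self.next_lane = 0      # lane the next dequeue reads from

    def full(self):
        return any(len(lane) >= self.depth for lane in self.lanes)

    def enqueue(self, uops):
        """Write the 1-4 uops of one z80 instruction, padding with nops."""
        if not 1 <= len(uops) <= 4:
            raise ValueError("an instruction translates to 1-4 uops")
        if self.full():
            raise OverflowError("queue full")
        padded = [(name, False) for name in uops]
        padded[-1] = (uops[-1], True)          # last real uop carries EOI
        padded += [NOP] * (4 - len(uops))
        for lane, uop in zip(self.lanes, padded):
            lane.append(uop)

    def dequeue(self):
        """Read one uop in lane order; returns None when nothing is queued."""
        lane = self.lanes[self.next_lane]
        if not lane:
            return None
        name, eoi = lane.popleft()
        if eoi:
            # Drop the padding nops in the remaining lanes, restart at lane 0.
            for rest in self.lanes[self.next_lane + 1:]:
                rest.popleft()
            self.next_lane = 0
        else:
            self.next_lane += 1
        return name
```

Enqueuing a two-uop instruction followed by a one-uop instruction and then dequeuing yields the three real uops back to back; the padding nops never reach the decode stage in this model.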
RF Stage
[Figure: RF stage datapath. Blocks and signals recoverable from the diagram:]
– fields_decoder: Iword in; Imm, Wr_id, Rd0_id, Rd1_id out
– RF: Rd0_id, Rd1_id, Wr_id, Wr_data, F_mask, F_data in (write side from the MEM_WB pipe latch); Rd_data0, Rd_data1 out
– PC_update and archPCreg: PC_disp, archPC, targetPC, PCupdate, pipe_stall in; next_archPC out (targetPC and PCupdate come from flush_unit)
– dep_check: Rd0_id, Rd1_id, EXE_opcode, EXE_Wr_id, EXE_Fmask (from the EXE stage) in; bubble out (to flush_unit)
– Iword fields shown: Iword(38:36), Fmask(7:0) = Iword(35:28), EOI = Iword(27), Iword(26:21)
– Pipe latch outputs: opcode(5:0), IMM(15:0), Wr_id(4:0), seqNPC(15:0), Fmask(7:0), EOI, targetPC, Rd_data0(15:0), Rd0_id(4:0), Rd_data1(15:0), Rd1_id(4:0)
– Controls: flush (flush_unit/flush1), stall (mem_pipe_stall, dep_check/bubble); stall overrides flush
EXE Stage
[Figure: EXE stage datapath. Blocks and signals recoverable from the diagram:]
– Input latch: opcode(5:0), IMM(15:0), Wr_id(4:0), seqNPC(15:0), Fmask(7:0), EOI, Rd_data0(15:0), Rd0_id(4:0), Rd_data1(15:0), Rd1_id(4:0)
– ALU_input_sel: Rd_data0, Rd0_id, Rd_data1, Rd1_id, MEM_Result, MEM_Wr_id, MEM_Flags, MEM_Fmask, WB_Results, WB_Wr_id, WB_Flags, WB_Fmask in; Src0, Src1 out
– ALU: Opcode, Imm, Src0, Src1, Flags in; Result out
– branch_unit: Opcode, Imm, Src0, Src1, seqNPC in; targetPC, taken out
– EXE_result_mux: opcode, ALU_result, IFF1, IFF2, seqNPC in; Result out
– IFF1reg, IFF2reg: IFF1, IFF2, controlled by EI / DI (marked "EI/DI??" on the slide)
– interrupt_handler: extern_INT, extern_NMI, intern_INT, intern_NMI, xlator/INT, xlator/NMI, interrupt_RQ
– flush_unit: mem_pipe_stall, bubble, EOI_RF_stage, EOI_EXE_stage, PC_EXE_stage, EOI_MEM_stage, PC_MEM_stage, paging_RQ, br_targetPC, br_taken, interrupt_RQ in; flush0, flush1, flush2, targetPC, PCupdate out
– ld/st/in/out ctrl: MEMctrl(5:0)
– Output latch: Result(15:0), Wr_id(4:0), Fmask(7:0), Flags(7:0), EOI, seqNPC(15:0), Src1(15:0), MEMctrl(5:0)
– Controls: flush, stall (mem_pipe_stall); stall overrides flush
MEM Stage
[Figure: MEM stage datapath. Blocks and signals recoverable from the diagram:]
– Input latch: Result(15:0), Wr_id(4:0), Fmask(7:0), Flags(7:0), EOI, seqNPC(15:0), Src1(15:0), MEMctrl(5:0)
– paging_handler: MREQ, addr, Rd_Wr, flush0 (from flush_unit), paging_RQ (to flush_unit)
– memory interface: MREQ, IORQ, Rd/Wr, D_data, D_addr, D_wait, mem_pipe_stall
– mux: selects between mem/IO read data and the ALU result to form Result(15:0)
– parity check / F correction (marked "not sure" on the slide): Flags(7:0)
– Output latch: Result(15:0), Wr_id(4:0), Fmask(7:0), Flags(7:0)
– Controls: flush (flush2), stall (mem_pipe_stall); stall overrides flush
Interrupts
IRQ (Maskable Interrupt)
– The SMS always uses interrupt mode 1, which causes a jump to location $0038 when an interrupt is generated
– A frame interrupt occurs every 1/60 second
NMI (Non-Maskable Interrupt)
– Generated when the Pause button is pressed
– Causes an unconditional processor jump to address $0066
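The two interrupt entry points can be sketched as a single function evaluated at an instruction boundary. The vectors come from the slide; pushing the return address and clearing IFF1 follow standard Z80 behaviour. The function name, the argument layout, and the use of a bytearray for memory are assumptions for illustration, not the project's RTL.

```python
IRQ_VECTOR = 0x0038  # interrupt mode 1 target (from the slide)
NMI_VECTOR = 0x0066  # NMI target (from the slide)


def take_interrupt(pc, sp, iff1, iff2, irq, nmi, memory):
    """Sample the interrupt lines; return the new (pc, sp, iff1, iff2)."""

    def push(addr, sp):
        # The Z80 pushes the high byte first, then the low byte.
        sp = (sp - 1) & 0xFFFF
        memory[sp] = addr >> 8
        sp = (sp - 1) & 0xFFFF
        memory[sp] = addr & 0xFF
        return sp

    if nmi:
        # Not maskable: taken regardless of IFF1. IFF2 is left untouched so
        # the previous enable state can be restored on return.
        return NMI_VECTOR, push(pc, sp), False, iff2
    if irq and iff1:
        # Maskable: taken only while interrupts are enabled.
        return IRQ_VECTOR, push(pc, sp), False, False
    return pc, sp, iff1, iff2
```

With interrupts disabled (IFF1 clear), a pending IRQ leaves the state unchanged, while a Pause-button NMI still redirects execution to $0066.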
VDP
[Figure: VDP block diagram. Blocks and ports recoverable from the diagram:]
– VDP top level: Data_in[7:0], RD_b, WR_b, RESET_b, CLK, CLK2X; RAMDAC side rd[7:0], rceb, roeb, rweb, pixelCLK
– VDP_IO: Addr[7:0], Data_in[7:0], Data_out[7:0], IORQ_b, RD_b, WR_b, WAIT_b, IRQ_b; Addr_M[13:0], Data_M_in[7:0], Data_M_out[7:0], RD_M_b, WR_M_b, WAIT_M_b; Addr_R[3:0], Data_R_out[7:0], WR_R_b; Data_C_out[10:0], WR_C_b, FULL_C_b; Hsyn_b, Fsyn_b; CLK, RESET_b
– VDP_reg (11x8): Addr_rd[3:0], Data_in[7:0], WR_b, RESET_b, CLK; outputs reg12[13:0], NTB[2:0], STB[5:0], SPB[0], OBC[3:0], BX[7:0], BY[7:0], LC[7:0]
– VDP_mem_ctrl: Addr_MPU[13:0], Data_MPU_in[7:0], Data_MPU_out[7:0], RD_MPU_b, WR_MPU_b, WAIT_MPU_b; Addr_IO[13:0], Data_IO_out[7:0], Data_IO_in[7:0], RD_IO_b, WR_IO_b, WAIT_IO_b
– VDP_draw: reg_in, frame, Data_W[21:0], WR_W_b, FULL_W_b
– VDP_CQ (32x11): Data_in[10:0], WR_b, FULL_b, RESET_b, CLK (neg-trig), Q_out[10:0], Empty_b, RD_b
– VDP_WQ (64x22): Data_in[21:0], WR_b, FULL_b, RESET_b, CLK (neg-trig), Q_out[21:0], Empty_b, RD_b
– VDP_color_update: Data_in[10:0], CQ_empty_b, CQ_RD_b, RESET_b, CLK, Ready_b, RD_b, WR_b, RS[2:0], D[7:0] (DAC_RD_b, DAC_WR_b, DAC_RS, DAC_D)
– VDP_VGA_ctrl: Data_in[21:0], WQ_empty_b, WQ_RD_b, RESET_b, CLK, Addr_out[18:0], Data_out[7:0], RD_b, WR_b, Hsyn_b, Vsyn_b, Blank_b, Fsyn_b, frame
VDP Overview
Function:
1) Memory RD/WR from CPU traffic
2) Register RD/WR from CPU traffic
3) Paint the screen based on the memory location defined by the registers
4) Interrupt generation
5) Return current scan line information
VDP
Graphics Features
– 256 x 192 resolution
– 64 sprites on screen, selected out of 512 possible tiles
– 32 simultaneous colors out of 64
– Background flipping (horizontal and vertical)
– Background scrolling
– Relocatable sprite table
– Sprite data interleaving, so that a single write changes the color of the whole 8 x 8 sprite
System I/O FPGA
[Figure: FPGA top-level diagram. Blocks and ports recoverable from the diagram:]
– Clocks and reset: CLK and CLK2X from the DLL, RESET_b from the FPGA
– Z80: Addr[15:0], Data_IN[7:0], Data_OUT[7:0], INT_b, NMI_b, IORQ_b, MREQ_b, RD_b, WR_b, WAIT_b, RESET_b, CLK; fetch port Addr_Fetch[15:0], Data_Fetch[7:0], FHRQ_b, Flush_b, WAIT_F_b
– MPU: Addr[15:0], Data_IN[7:0], Data_OUT[7:0], MREQ_b, RD_b, WR_b, WAIT_b; Addr_V[13:0], Data_V_IN[7:0], Data_V_OUT[7:0], RD_V_b, WR_V_b, WAIT_V_b; Addr_F[15:0], Data_F[7:0], FHRQ_b, Flush_b, WAIT_F_b; Addr_SRAM[18:0], Data_SRAM[15:0], CE_, WE_, OE_ (ld[15:0], la[18:0], lceb, lweb, loeb)
– VDP: Addr[7:0], Data_IN[7:0], Data_OUT[7:0], INT_b, IORQ_b, RD_b, WR_b, WAIT_b, RESET_b, CLK; Addr_M[13:0], Data_M_IN[7:0], Data_M_OUT[7:0], RD_M_b, WR_M_b, WAIT_M_b; control and data to the RAMDAC
– JPC: Addr[7:0], Data_OUT[7:0], NMI_b, IORQ_b, RD_b, WAIT_b, CLK, PS2
– External parts: four 512K RAMs and the RAMDAC
– Annotations on the slide: "not implemented", "always HIGH", and "???"
MPU (Memory Paging Unit)
[Figure: MPU block diagram. Blocks and ports recoverable from the diagram:]
– MPU top level: Addr, Data_IN, Data_OUT, MREQ_b, RD_b, WR_b, WAIT_b; Addr_FH, Data_FH, FHRQ_b, Flush_b, WAIT_F_b; Addr_VDP, Data_VDP_in, Data_VDP_out, RD_VDP_b, WR_VDP_b, WAIT_VDP_b; Addr_SRAM, Data_SRAM, ceb, oeb, web; CLK, RESET_b
– CPU strobes: RD_CPU_b = MREQ_b | RD_b, WR_CPU_b = MREQ_b | WR_b
– MPU_paging: Addr_FH_in, Addr_in, Data_in, Data_out, WR_b in; Addr_FH_out, Addr_out, Data_read, A_page, A_F_page out; CLK, RESET_b
– MPU_scheduler: Addr_CPU, Data_CPU_in, Data_CPU_out, RD_CPU_b, WR_CPU_b, WAIT_CPU_b; Addr_VDP, Data_VDP_in, Data_VDP_out, RD_VDP_b, WR_VDP_b, WAIT_VDP_b; Addr_FH, Data_FH_out, RQ_FH_b, FLUSH_b, WAIT_FH_b; Addr_SRAM, Data_SRAM, ceb, oeb, web; CLK, RESET_b
Physical address map, selected by the 5 MSBs:
5 MSB   Size               Region
0xxxx   256K               SMS ROM
1xxxx   256K - last 54K    SMS Battery-Backed-Up RAM
11101   16K                User RAM
11110   4 bits             FCR (FFFC~FFFF)
11111   16K                VDP Memory
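The 5-MSB decode of the physical address can be written out as a priority check, most specific pattern first. This sketch assumes the 19-bit SRAM address shown on the MPU ports (so each 5-bit prefix selects a 16K block) and reads the first pattern as `0xxxx`; the function name and the region labels returned are hypothetical.

```python
def physical_region(addr19):
    """Classify a 19-bit physical address by its five most significant bits."""
    if not 0 <= addr19 < (1 << 19):
        raise ValueError("expected a 19-bit address")
    top5 = addr19 >> 14          # 19 - 5 = 14 offset bits, i.e. 16K blocks
    if top5 == 0b11111:
        return "VDP memory"
    if top5 == 0b11110:
        return "FCR"
    if top5 == 0b11101:
        return "user RAM"
    if top5 & 0b10000:           # remaining 1xxxx patterns
        return "battery-backed RAM"
    return "SMS ROM"             # 0xxxx
```

Checking the three specific patterns before the general `1xxxx` case is what carves the top blocks out of the upper 256K.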
Memory & I/O devices
– All devices and memory share the same address bus and data bus.
– For input/output instructions, the Z80 asserts IORQ. The lower 8 bits of the address are the port number, and the data goes through the data bus.
– For load-type instructions, the Z80 asserts MREQ. The 16-bit address is the virtual address, and the data goes through the data bus.
– The virtual address from the Z80 is mapped to a physical address by the memory paging chip.
Memory & I/O devices (Contd.)
Reason for memory paging
– The Z80 supports only 16-bit addresses
– Need to handle up to 512 KB ROM + 16 KB RAM (needs more than 19 bits)
Memory map summary
– On-board user RAM (8 KB x 2)
– ROM frames 0, 1, 2 (16 KB each)
– Frame Control Registers
Frame Control Registers (FCRs)
– $FFFC: RAM select register
– $FFFD: Frame 0 ROM bank
– $FFFE: Frame 1 ROM bank
– $FFFF: Frame 2 ROM bank
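The frame scheme can be sketched as a small address translator. This is a simplified model under stated assumptions, not the MPU design: the reset values of the FCRs, the reading of "8 KB x 2" as 8 KB of RAM mirrored across the top 16 KB, and the class and method names are all assumptions, and the RAM select register at $FFFC is stored but otherwise ignored.

```python
FRAME_SIZE = 0x4000              # each ROM frame is 16 KB (from the slide)
ROM_BANKS = 512 * 1024 // FRAME_SIZE  # 32 banks for a 512 KB ROM


class Pager:
    """Maps 16-bit Z80 addresses to ROM or RAM offsets via the FCRs."""

    def __init__(self):
        # Reset values are an assumption: frames 0-2 map banks 0-2.
        self.fcr = {0xFFFC: 0, 0xFFFD: 0, 0xFFFE: 1, 0xFFFF: 2}

    def write(self, addr, value):
        """Latch a CPU write if it targets one of the four FCRs."""
        if addr in self.fcr:
            self.fcr[addr] = value & 0xFF

    def translate(self, addr):
        """Return ("rom", physical offset) or ("ram", offset)."""
        frame, offset = addr >> 14, addr & 0x3FFF
        if frame < 3:
            bank = self.fcr[0xFFFD + frame] % ROM_BANKS
            return "rom", bank * FRAME_SIZE + offset
        # Top 16 KB: 8 KB of user RAM, assumed mirrored.
        return "ram", offset & 0x1FFF
```

Writing 5 to $FFFF, for instance, makes a read at $8123 land at ROM offset 5 x 16 KB + $0123, which is how a 16-bit address space reaches the whole 512 KB ROM.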
Joypad I/O
PORT $DC ($C0) – Joypad port 1 (read only)
– Each bit corresponds to a button: 0 for pressed, 1 for released
– Bit meanings:
  bit 7 : Joypad 2 Down
  bit 6 : Joypad 2 Up
  bit 5 : Joypad 1 Fire B
  bit 4 : Joypad 1 Fire A
  bit 3 : Joypad 1 Right
  bit 2 : Joypad 1 Left
  bit 1 : Joypad 1 Down
  bit 0 : Joypad 1 Up
PORT $DD ($C1) – Joypad port 2 (read only)
– Bit meanings:
  bit 7 : Lightgun 2
  bit 6 : Lightgun 1
  bit 5 : Unused
  bit 4 : Reset Button
  bit 3 : Joypad 2 Fire B
  bit 2 : Joypad 2 Fire A
  bit 1 : Joypad 2 Right
  bit 0 : Joypad 2 Left
Any kind of joypad is okay (serial or parallel)
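Reading the two ports back into button names is a direct use of the bit assignments above. The sketch below assumes port $DD uses the same active-low convention the slide states for port $DC; the table and function names are hypothetical.

```python
# Index = bit number, taken from the bit assignments on the slide.
PORT_DC_BITS = [
    "Joypad 1 Up", "Joypad 1 Down", "Joypad 1 Left", "Joypad 1 Right",
    "Joypad 1 Fire A", "Joypad 1 Fire B", "Joypad 2 Up", "Joypad 2 Down",
]
PORT_DD_BITS = [
    "Joypad 2 Left", "Joypad 2 Right", "Joypad 2 Fire A", "Joypad 2 Fire B",
    "Reset Button", "Unused", "Lightgun 1", "Lightgun 2",
]


def pressed(port_value, names):
    """List the inputs whose bit reads 0 (active low: 0 means pressed)."""
    return [
        name
        for bit, name in enumerate(names)
        if name != "Unused" and not (port_value >> bit) & 1
    ]
```

A read of `0xFF` from either port therefore means nothing is pressed, and `pressed(0xEE, PORT_DC_BITS)` reports Joypad 1 Up together with Fire A.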
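Joypad Controller (JPC Unit)
[Figure: JPC block diagram. A keyboard connects through a PS/2 adapter (PS2_Clk and PS2_Data, each tied to VDD) to a decoder that feeds the Key_Status registers for Joystick1 and Joystick2 and the Pause flag. An address decoder and a data port connect these to the bus signals Addr[7:0], Data[7:0], RD, IORQ, and NMI. Other labels on the diagram: "Floating", "Proxy".]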
Joypad to Keyboard Mapping
                    Up    Down  Left  Right  FireA  FireB  Pause  Reset
Joy1  Mapping Key   W     S     A     D      < ,    > .    F1     ESC
      Scan Code     1D    1B    1C    23     41     49     05     76
Joy2  Mapping Key   ↑     ↓     ←     →      0 Ins  . Del
      Scan Code     E075  E072  E068  E074   70     71
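Combining the mapping table with the port bit assignments gives a compact model of what the key status registers present to the CPU. The scan codes are copied from the slide's table as given. Treating Pause (F1) as the NMI source rather than a port bit follows the Interrupts slide. The dictionary layout and the function name are assumptions, and key-release (break code) handling is left out.

```python
# Scan code -> (port, bit); scan codes as listed on the slide.
KEYMAP = {
    0x1D: (0xDC, 0), 0x1B: (0xDC, 1), 0x1C: (0xDC, 2), 0x23: (0xDC, 3),  # Joy1 W S A D
    0x41: (0xDC, 4), 0x49: (0xDC, 5),                                    # Joy1 Fire A, Fire B
    0xE075: (0xDC, 6), 0xE072: (0xDC, 7),                                # Joy2 Up, Down
    0xE068: (0xDD, 0), 0xE074: (0xDD, 1),                                # Joy2 Left, Right
    0x70: (0xDD, 2), 0x71: (0xDD, 3),                                    # Joy2 Fire A, Fire B
    0x76: (0xDD, 4),                                                     # Reset (ESC)
}
PAUSE_SCAN_CODE = 0x05  # F1


def port_values(keys_down):
    """Return (port $DC value, port $DD value, nmi) for the held keys."""
    ports = {0xDC: 0xFF, 0xDD: 0xFF}     # all bits 1: nothing pressed
    for code in keys_down:
        if code in KEYMAP:
            port, bit = KEYMAP[code]
            ports[port] &= ~(1 << bit) & 0xFF   # active low: clear the bit
    return ports[0xDC], ports[0xDD], PAUSE_SCAN_CODE in keys_down
```

Holding W and the comma key, for example, yields `0xEE` on port $DC, while F1 only raises the NMI flag and leaves both ports at `0xFF`.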
I/O Controller
– Interacts with the CPU, the scheduler, and color RAM
– Four basic operations: VRAM write, VRAM read, register write, color RAM write
– Interrupt handling
Functional Operation
– On an I/O request, fetches the command word and data
– Decodes Command[15:14]: 00: read; 01: write; 10: register write; 11: CRAM
– Line interrupts
– Frame interrupts
– Supports pre-fetching on read
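The command decode step can be sketched as follows. Only the meaning of Command[15:14] is taken from the slide; the layout of the lower bits (a 14-bit address, or a 4-bit register number in bits 11:8 with the value in bits 7:0) is an assumption that matches the 14-bit memory address and 4-bit register address widths in the VDP diagram. The names are hypothetical.

```python
COMMANDS = {
    0b00: "vram_read",
    0b01: "vram_write",
    0b10: "reg_write",
    0b11: "cram_write",
}


def decode_command(word):
    """Split a 16-bit command word into (operation, target, data)."""
    kind = COMMANDS[(word >> 14) & 0b11]
    if kind == "reg_write":
        # Assumed layout: register number in bits 11:8, value in bits 7:0.
        return kind, (word >> 8) & 0x0F, word & 0xFF
    # Assumed layout: the low 14 bits are the VRAM / CRAM address.
    return kind, word & 0x3FFF, None
```

For example, `decode_command(0x8105)` is a register write of value 5 to register 1 under this layout.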
SMS Emulator
Z80 version
–Download the framework from the web
–Reconstruct the Z80 CPU
[Figure: emulator structure. Z80 CPU (Z80 Decoder, Z80 Core) connected to Memory ( ), SYS_IO ( ), VDP ( ), MPU ( ), JPC ( ).]
SMS Emulator (Contd.)
Uop translator version
–Map the Verilog translator to C++ code
–Implement the micro instruction set
[Figure: emulator structure. RISC CPU (Z80 Decoder, Micro_I Translator, Z80 Core) connected to Memory ( ), SYS_IO ( ), VDP ( ), MPU ( ), JPC ( ).]
Miscellanea
Disassembler and profiler
–Z80 instruction decoder
–Every instruction is used
Debugging tools
–Virtual VDP and MEM
–Software interface with SPART
•Baud rates from 9600 to 115200
–Bus traffic generator for VDP debugging
–Bus traffic playback based VDP
Processor Debugging Tools
Remote VDP / memory
– Originally part of plan B (no VDP)
– All memory / IO requests from the processor are sent to the software emulator through SPART (serial port, 57600 baud)
– Captures all memory / IO traffic
– Allows comparison with the software emulator
[Figure: on the FPGA, the z80 mem/IO bus goes through the remote VDP/mem interface and SPART, over the serial port, to the VDP / mem emulator, which produces a trace of PC, addr, data.]
Processor Debugging Tools (Contd.)
Remote VDP/mem handshaking
– Processor sends PC, inst, addr to the emulator
– Emulator echoes back to the processor for:
  flow control
  debugging break points
  communication error detection
Debugging issues
– Traces diverge under different timings
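Comparing the captured hardware trace against the software emulator's trace comes down to finding the first record where they disagree. A minimal sketch, assuming each record is a (PC, addr, data) tuple as in the trace described above; the function name and the list-of-tuples format are hypothetical.

```python
def first_divergence(hw_trace, emu_trace):
    """Return (index, hw_record, emu_record) at the first mismatch, else None.

    A trace that ends early counts as diverging at its end; the missing
    record is reported as None.
    """
    for i, (hw, emu) in enumerate(zip(hw_trace, emu_trace)):
        if hw != emu:
            return i, hw, emu
    if len(hw_trace) != len(emu_trace):
        i = min(len(hw_trace), len(emu_trace))
        hw = hw_trace[i] if i < len(hw_trace) else None
        emu = emu_trace[i] if i < len(emu_trace) else None
        return i, hw, emu
    return None
```

Because timing is not emulated, a record-by-record comparison like this is only meaningful for events whose order does not depend on cycle counts, which is the trace divergence issue noted above.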
VDP Debugging: Steps
Steps in debug
1) Correct image painting
2) Left SRAM update: correct image setup and attributes
3) Right SRAM update: painting the correct image
Use bus traffic and SRAM.v to dump memory for checking
VDP Debugging: Challenge
Using the RAMDAC
– Not enough documentation: timing issues
– Color update takes several clock cycles
– Dark image: caused by not disabling the Ethernet port
Sprite ordering: error in the documentation (the first try showed no sprites)
Bus traffic: commands and data are not self-contained (sequence matters)
Demo
1) Emulator
  A) Translated Z-80 into u-ops
  B) Bus unit playback
2) Debugging tools
  A) Modelsim and emulator tracer
  B) SPART version
3) Sega game system
Problems Encountered
VDP
– Documentation
– Core generated files
– Timing simulation accuracy
– Debugging issues
RISC core
– Memory interface issues
– Special cases in instructions
Memory Controller