Design Review
Scooby Doo gang: Jonathan Hsieh
Annie Pettengill
Jim Hollifield
Jeff Barbieri
Matt Silverstein
Design Goals
• Mystery Machine requirements:
  – Correctness / proficiency
  – Compliance with the external interface / protocol
  – Support an interface for a human to play
  – Asynchronous “Decide now” button
Other Goals
• Priorities
  – Speedup over the pure-software baseline 68HC11-based implementation
    • Hardware “functional units”
    • Software optimization
  – Interesting architecture
  – Opening / closing book (download/upload in play configurations)
  – Hardware HCI (software HCI optional)
Function Call Dependence
Main -> Init, Gen, Think, MakeMove, Gen
Think -> Search
Search -> Quiesce, In_check, Gen, Sort_pv
  – For each move: MakeMove, Search, Takeback
Quiesce -> In_check, Gen, Eval, Gen_caps, Sort_pv
  – For each move: MakeMove, Quiesce, Takeback
More Call Dependences
In_check -> for each square: Attack
Eval -> Eval_dark_pawn, Eval_light_pawn, Eval_light_king, Eval_dark_king
Gen -> Gen_push, Gen_promote
MakeMove -> In_check, Can_castle, Takeback
Read / Write access analysis
• Eval:
  – There are no writes to the board structure (but many reads).
• In_check / attack:
  – Many reads. Returns a boolean; could instead be an array of bits that is looked up to see whether a square is being attacked.
• Gen:
  – Generates a list of values.
  – Variable run time.
Quantify parameters
• A profiling program on Sun machines
• Compiles code with special instrumentation hooks
• Graphically displays call and run-time information for the profiled program.
• The idea (Amdahl’s law): speeding up the slowest parts gives the most overall speedup.
• Slowest parts: move to hardware!
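Amdahl’s law makes the “speed up the slowest parts” argument quantitative. If a fraction p of the baseline run time is accelerated by a factor s and the rest is unchanged, the overall speedup S is:

```latex
S(p, s) = \frac{1}{(1 - p) + \dfrac{p}{s}}, \qquad
\lim_{s \to \infty} S(p, s) = \frac{1}{1 - p}
```

For example, with p = 0.95 even infinitely fast hardware gives at most 1/0.05 = 20x, so the part that is not accelerated sets the ceiling.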
Quantify Results
• After about 20 moves, these functions take the most time (not including print and scanf):
Function           % of run time
Attack             49.13%
Eval               19.99%
Gen_caps            9.14%
In_check            6.53%
Gen                 3.52%
Sort                2.49%
Eval_dark_pawn      1.99%
Eval_light_pawn     1.74%
Makemove            1.15%
Takeback            0.88%
Quiesce             0.38%
Eval_dark_king      0.36%
Eval_light_king     0.34%
Search              0.19%
Think               0.00%
Summed run-time analysis
• In_check -> attack
  – ~55% of program run time!
  – Straightforward for hardware.
• Eval -> eval_*
  – ~25% of program run time!
  – Straightforward for hardware.
• Gen*
  – ~15% of program run time!
Conclusion
• Optimize in_check, eval, and gen by placing in hardware
• This is most effective if the board is held in FPGA registers; try to figure out whether the FPGA can act as memory for the processor.
• Keep recursion on processor.
Hardware software partition
• SW / CPU
  – Memory structure allows for recursion / dynamic structures
  – Compiler can handle that
  – Recursion cannot really happen in parallel (?)
  – Should be able to access RAM as well as FPGA registers using
• FPGA
  – Many parallel executions happening
  – High-speed custom implementations
  – Good for static structures and constants
  – Simple for “read”-only functions if the data they read is in registers (always execute!)
• Memory
  – Move histories
  – Recursion stacks
• Serial interface
  – Can access any memory the CPU can, in simulation
Implementation plan
Implementation Plan
• Design hierarchy
  – HW/SW split
    • HW subsystems / goals
    • SW goals
  – Physical design
    • HW/SW interface
    • Memory access architecture
    • FPGA/HC11/mem interaction
Implementation Plan
• Handle all recursion on the HC11: compiled and assembly code are best for memory structures (trees, hash tables, etc.).
• Software analysis shows that three function trees (in_check/attack, eval, and gen) take the majority of the algorithm’s time.
Shared memory architecture
[Figure: FPGA and HC11 sharing one memory; clk drives the FPGA, which generates the HC11’s pseudo clock]
Architecture features
• Observation:
  – Memory (12 ns => 83 MHz) is about as fast as the max FPGA speed (100 MHz).
  – About 10x faster than the 8 MHz HC11.
• Zoinks!
  – Clock set at the high FPGA clock speed.
  – HC11 clock: a “pseudo clock,” a function in the FPGA that slows the FPGA clock down to something in the HC11 range.
Clocking Diagram
[Figure: waveforms for FPGA clk, FPGA clk x2, and FPGA clk x4 (HC11 pseudo clk), all derived from a 3-bit counter stepping 000, 001, 010, 011, 100, 101, 110, 111]
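The counter in the diagram can be modeled in a few lines of C to check the divide ratios. This is a minimal software sketch, assuming the divider is a free-running 3-bit counter clocked by the FPGA clock and that the divided clocks are simply its bits; the names clk_counter, clk_tick, and divided_clk are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the FPGA's 3-bit clock counter. Bit k toggles every
   2^k FPGA clock cycles, so it is a clock whose period is 2^(k+1) FPGA cycles:
   bit 0 = FPGA clk x2, bit 1 = FPGA clk x4, bit 2 = FPGA clk x8. */
typedef struct {
    uint8_t count; /* 000 .. 111 */
} clk_counter;

/* Advance by one FPGA clock cycle, wrapping from 111 back to 000. */
static void clk_tick(clk_counter *c) {
    c->count = (uint8_t)((c->count + 1u) & 0x7u);
}

/* Current level of the divided clock; period_mult must be 2, 4, or 8. */
static int divided_clk(const clk_counter *c, unsigned period_mult) {
    unsigned bit = (period_mult == 2u) ? 0u : (period_mult == 4u) ? 1u : 2u;
    return (c->count >> bit) & 1u;
}
```

Taking the HC11 pseudo clock from a higher bit is just a wiring choice, which is why the same counter serves both the x4 and x8 diagrams.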
Clocking Diagram
[Figure: FPGA clk and FPGA clk x8 (HC11 pseudo clk); within each pseudo-clock period, one memory slot is an HC11 memory read/write and the remaining slots are FPGA memory reads/writes]
Pseudo-clock possibilities
• Could allow the processor to access memory as if it were the only thing using it.
• While the processor is waiting for the next clock tick and is done with memory, the FPGA can read/write memory.
• The FPGA can run and calculate information concurrently with the HC11!
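The memory time-slicing shown in the x8 clocking diagram amounts to a fixed slot schedule. Below is a minimal C model, assuming 8 FPGA cycles per pseudo-clock period with exactly one of them reserved for the HC11; which slot that is (HC11_SLOT = 3 here) is a hypothetical choice, as are the function names.

```c
#include <stdint.h>

typedef enum { OWNER_FPGA, OWNER_HC11 } mem_owner;

#define SLOTS_PER_PSEUDO_CLK 8u /* FPGA cycles per HC11 pseudo-clock period */
#define HC11_SLOT            3u /* hypothetical: the one slot the HC11 owns */

/* Which master may drive the shared memory bus on a given FPGA cycle. */
static mem_owner slot_owner(uint32_t fpga_cycle) {
    return (fpga_cycle % SLOTS_PER_PSEUDO_CLK) == HC11_SLOT ? OWNER_HC11 : OWNER_FPGA;
}

/* Number of memory slots left for the FPGA over n FPGA cycles. */
static uint32_t fpga_slots(uint32_t n_cycles) {
    uint32_t n = 0;
    for (uint32_t c = 0; c < n_cycles; c++)
        if (slot_owner(c) == OWNER_FPGA)
            n++;
    return n;
}
```

Because the HC11 always finds memory free in its own slot, it sees memory as if it were the only user, while the FPGA gets the other 7 of every 8 slots.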
FPGA Hardware Units
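[Figure: the FPGA contains the Eval unit, Attack/Check unit, and Gen unit built around chess board registers and chess piece registers, plus a memory bus controller and the pseudo clk; the FPGA connects to memory (Mem), the HC11, and the HCI]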
FPGA/Memory organization
• Specific addresses would always contain specific information:
  – Board representation address
  – Current eval score
  – In-check map
  – Next generated moves
• Addresses can be proxied by the FPGA so that FPGA registers act like memory to the HC11!
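A sketch of what such a fixed memory map and the FPGA’s address decode might look like. The base address, offsets, and sizes below are hypothetical (the review does not fix them); only the list of items comes from the slide.

```c
#include <stdint.h>

/* Hypothetical window in the HC11's 64 KB address space that the FPGA answers
   from its own registers instead of letting the cycle reach the SRAM. */
#define FPGA_WINDOW_BASE 0x7000u
#define BOARD_ADDR       (FPGA_WINDOW_BASE + 0x00u) /* 64 bytes: board representation */
#define EVAL_SCORE_ADDR  (FPGA_WINDOW_BASE + 0x40u) /* 2 bytes: current eval score    */
#define IN_CHECK_ADDR    (FPGA_WINDOW_BASE + 0x42u) /* 2 bytes: in-check map          */
#define NEXT_MOVES_ADDR  (FPGA_WINDOW_BASE + 0x44u) /* next generated moves           */
#define FPGA_WINDOW_END  (FPGA_WINDOW_BASE + 0x100u)

/* Address decode performed by the FPGA on each HC11 bus cycle:
   1 = serve from FPGA registers, 0 = pass through to the shared SRAM. */
static int fpga_claims(uint16_t addr) {
    return addr >= FPGA_WINDOW_BASE && addr < FPGA_WINDOW_END;
}
```

On the HC11 side nothing special is needed: reading the eval score would be an ordinary load such as `*(volatile uint16_t *)EVAL_SCORE_ADDR`, which is what lets the compiler-generated code treat FPGA registers as memory.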
Performance prediction
• Baseline = 1
• Best case based on profiling (assuming hyper-idealized HW):
  – 55% => 0% (attack)
  – 25% => 0% (eval)
  – 15% => 0% (gen)
• HW accelerated => 0.05 of baseline!
  – 20x speedup.
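The prediction above is plain arithmetic on the profile, which a small C helper can reproduce. This is a sketch; predicted_time and the convention that a speedup factor of 0.0 means “idealized, time drops to zero” are assumptions made for illustration.

```c
#include <stddef.h>

/* Predicted run time relative to a baseline of 1.0. frac[i] is the share of
   baseline time spent in accelerated unit i (from profiling); speedup[i] is
   that unit's speedup factor, with 0.0 meaning "idealized" (time drops to 0).
   Everything not listed stays on the HC11 unchanged. */
static double predicted_time(const double *frac, const double *speedup, size_t n) {
    double t = 1.0;
    for (size_t i = 0; i < n; i++) {
        double remaining = (speedup[i] > 0.0) ? frac[i] / speedup[i] : 0.0;
        t -= frac[i] - remaining;   /* time saved by accelerating unit i */
    }
    return t;
}
```

Called with frac = {0.55, 0.25, 0.15} and all-ideal factors it returns 0.05 (20x). With a more modest 10x per unit it returns 0.145, about 6.9x, so the real per-unit speedups matter as much as the 5% left in software.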
Attack/in_check
Annie Pettengill
In_check
• Input: the color of the side to test for check
• Outputs true if in check, otherwise outputs false
in_check
• In_check looks at each of the 64 squares for the king of the color passed in to the function.
• It then calls attack on that square with the opposing color.
• If we used a piece implementation (versus a board implementation), this would turn a for loop and an if statement into a single call of the attack function.
[Figure: In_check under the board implementation (loops over all 64 squares) vs. the piece implementation (a single attack call)]
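The two in_check variants can be compared side by side in C. This sketch assumes piece[64]/color[64] arrays like the ones eval() uses, a piece encoding with KING and EMPTY codes, and a hypothetical king_sq[2] array that makemove() and takeback() would keep current; attack() is passed in as a function pointer, and demo_attack is only a stand-in so the sketch runs.

```c
#include <stdbool.h>

/* Assumed encodings; square 0 = a8, square 60 = e1. */
enum { LIGHT = 0, DARK = 1 };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, EMPTY };

/* attack(sq, s): true if square sq is attacked by side s. */
typedef bool (*attack_fn)(int sq, int s);

/* Board implementation: scan all 64 squares for side s's king, then ask
   whether the other side (s ^ 1) attacks that square. */
static bool in_check_board(const int piece[64], const int color[64], int s,
                           attack_fn attack) {
    for (int sq = 0; sq < 64; sq++)
        if (piece[sq] == KING && color[sq] == s)
            return attack(sq, s ^ 1);
    return true; /* no king found: should not happen; report check to be safe */
}

/* Piece implementation: the king's square is already known, so the loop and
   the if statement collapse into one attack() call. */
static bool in_check_piece(const int king_sq[2], int s, attack_fn attack) {
    return attack(king_sq[s], s ^ 1);
}

/* Demonstration stand-in for attack(): DARK attacks e1 (60) and nothing else. */
static bool demo_attack(int sq, int s) {
    return sq == 60 && s == DARK;
}
```

Both return the same answer; the difference is up to 64 compare steps versus none, which matters most in hardware where the king_sq registers can be read directly.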
Attack
• Inputs: the square the piece is on and the color of the other side
• Outputs true if the square is being attacked by side s (the color passed in) and false if it is not
Details about Attack
• The pawn is handled separately because the way it moves differs from the way it attacks.
• The moves, as currently organized, are different for black and white pawns.
• The other pieces are evaluated in every direction to see whether they can actually move there and whether they can slide.
Bigger Picture: Attack Tables
• Construct two chess boards: one for white pieces and one for black.
• Instead of a piece, each square would contain true or false depending on whether that square is attacked by pieces of that color.
So….
• Every time a player makes a move, the attack function on the FPGA rebuilds the table.
• Can tailor the attack function for specific pieces on specific squares using a combination of the board and piece implementations.
Implementation of Attack Tables
• Check for pieces between the square and a rook/queen (board implementation)
• Check the row and column for a rook or queen (piece implementation)
• Check for pieces in between a bishop and the square (board implementation)
• Check the diagonals for a bishop (piece implementation)
  – For perimeter pieces: ignore squares off the board
• Check the line of attack for pawns (piece implementation)
  – For perimeter pieces: ignore squares off the board
• Check knight attacks (piece implementation)
  – Given a piece on the perimeter
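The two-board attack table can be sketched in C using a piece-by-piece walk: each piece marks the squares it attacks, sliders stop at the first occupied square, and off-board steps are simply ignored. The square numbering (0 = a8, light pawns moving toward row 0), the piece codes, and the names board_t, attack_table, mark_ray, and build_attack_table are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Assumed encodings: squares 0..63 with 0 = a8, row = sq / 8, col = sq % 8,
   light pawns moving toward row 0; empty squares use EMPTY in both arrays. */
enum { LIGHT = 0, DARK = 1 };
enum { PAWN, KNIGHT, BISHOP, ROOK, QUEEN, KING, EMPTY };

typedef struct {
    int piece[64];
    int color[64];
} board_t;

/* One true/false board per color: t[s][sq] is true if side s attacks sq. */
typedef bool attack_table[2][64];

static bool on_board(int r, int c) { return r >= 0 && r < 8 && c >= 0 && c < 8; }

/* Mark squares reached by stepping (dr, dc) from (r, c). A non-sliding piece
   marks one square; a slider continues to the edge or the first occupied
   square, which is still marked (it is attacked or defended). */
static void mark_ray(const board_t *b, bool *out, int r, int c,
                     int dr, int dc, bool slide) {
    for (r += dr, c += dc; on_board(r, c); r += dr, c += dc) {
        int sq = r * 8 + c;
        out[sq] = true;
        if (!slide || b->piece[sq] != EMPTY)
            break;
    }
}

/* Rebuild both tables from scratch, e.g. after every makemove(). */
static void build_attack_table(const board_t *b, attack_table t) {
    static const int knight[8][2] = {{-2,-1},{-2,1},{-1,-2},{-1,2},{1,-2},{1,2},{2,-1},{2,1}};
    static const int diag[4][2] = {{-1,-1},{-1,1},{1,-1},{1,1}};
    static const int orth[4][2] = {{-1,0},{1,0},{0,-1},{0,1}};

    memset(t, 0, sizeof(attack_table));
    for (int sq = 0; sq < 64; sq++) {
        int p = b->piece[sq], r = sq / 8, c = sq % 8;
        if (p == EMPTY)
            continue;
        bool *out = t[b->color[sq]];
        switch (p) {
        case PAWN: {                       /* pawns attack diagonally forward only */
            int dr = (b->color[sq] == LIGHT) ? -1 : 1;
            mark_ray(b, out, r, c, dr, -1, false);
            mark_ray(b, out, r, c, dr, 1, false);
            break;
        }
        case KNIGHT:
            for (int i = 0; i < 8; i++)
                mark_ray(b, out, r, c, knight[i][0], knight[i][1], false);
            break;
        case KING:
            for (int i = 0; i < 4; i++) {
                mark_ray(b, out, r, c, diag[i][0], diag[i][1], false);
                mark_ray(b, out, r, c, orth[i][0], orth[i][1], false);
            }
            break;
        default:                           /* BISHOP, ROOK, QUEEN slide */
            for (int i = 0; i < 4; i++) {
                if (p != ROOK)
                    mark_ray(b, out, r, c, diag[i][0], diag[i][1], true);
                if (p != BISHOP)
                    mark_ray(b, out, r, c, orth[i][0], orth[i][1], true);
            }
            break;
        }
    }
}
```

Once the tables exist, in_check(s) reduces to reading t[s ^ 1][king_square]. In hardware the 128 entries would be computed in parallel rather than by this sequential loop.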
Advantages
• Avoid lots of useless searching: with the piece implementation you know exactly where each piece is.
• If running attack on one square, why not on 128 squares (64 per color) in parallel?
  – Or perhaps use a piece implementation of the attack table and only run it on 32 squares…
• Acts as a lookup table for other functions.
Perimeter versus Internal
• Use special tailoring for perimeter squares that checks only the pertinent cases
  – Use a special numbering scheme for the location of pieces in the piece implementation
• Internal squares stay the same
Eval
Matthew Silverstein
Eval() Function Inputs
• Which side the current move is for
  – Light or Dark
• Board configuration
  – The algorithm uses two 64-space arrays to represent the board:
    • Which pieces are where (piece[64] structure)
    • What color the pieces are (color[64] structure)
  – The hardware overhead can be cut by using an array that is indexed by piece, not by position
Eval() Function Outputs
• Score
  – An integer value
  – Based on the present configuration of the board
  – Calibrated for whether the current player is Light or Dark
Eval
• The function call breaks down into three main subsections:
  – Initialization
    • Takes the current board configuration and sets all of the internal registers to appropriate values
    • Sets up the pawn_rank, pawn_mat, and pawn_count structures
  – Bonus / penalty assessment
    • For each square, calculates either a bonus or a penalty based on the relative benefit of certain pieces being on that square
    • Sums the results for each square and provides a Light_score and a Dark_score
  – Calculate score
    • Combines the Light and Dark scores to provide a single return value for the function, based on whether it is presently Light’s or Dark’s turn.
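The last two stages (summing per-square values per side, then combining them for the side to move) are small enough to show in C. This is a sketch under assumptions: the per-square bonus/penalty values are taken as already computed into a hypothetical bonus[64] array, and the score is reported so that a positive value is good for the side to move.

```c
/* Assumed color codes; any other value in color[] means an empty square. */
enum { LIGHT = 0, DARK = 1 };

/* Bonus/penalty stage: one value per square, summed separately per side
   (the adder in the hardware structure). */
static void sum_squares(const int bonus[64], const int color[64],
                        int *light, int *dark) {
    *light = *dark = 0;
    for (int sq = 0; sq < 64; sq++) {
        if (color[sq] == LIGHT)
            *light += bonus[sq];
        else if (color[sq] == DARK)
            *dark += bonus[sq];
    }
}

/* Calculate-score stage: one return value from the side-to-move's view. */
static int combine_scores(int light_score, int dark_score, int side) {
    return (side == LIGHT) ? light_score - dark_score
                           : dark_score - light_score;
}
```

Reporting the score relative to the side to move lets the search treat both sides identically, which is why a single Light/Dark select at the end is all the hardware needs.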
Eval structure
[Figure: Board registers -> Init -> Bonus and penalty cases -> Calculate (Light or Dark)]
Bonus / Penalty Assess structure
[Figure: one block for each input square, in which Eval_pawn_sq, no penalty, knight penalty, bishop penalty, rook penalty, and Eval_king_sq feed a mux; an adder sums the values generated at each block to produce Score_light, and the same structure is repeated for the other side]
Eval_pawn
• Inputs:
  – Square to calculate the penalty for
  – Pawn_rank structure
  – Pawn_count structure
• Based on the inputs, up to four different penalties or bonuses may be assessed
Eval_pawn Penalties, Bonuses
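• Penalty A: if there’s a friendly pawn behind this one (doubled pawn)
• Penalty B: if there are no friendly pawns on the files adjacent to the current pawn (isolated pawn)
• Penalty C: if the pawn is not isolated but is behind the friendly pawns on both adjacent files (backward pawn)
• Bonus D: if the pawn is passed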
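Penalties A to D can be sketched in C for a light pawn. This assumes a pawn_rank[2][10] layout in which file index f = column + 1 (indices 0 and 9 are padding, so f - 1 and f + 1 are always valid), rows run 0 to 7 with light pawns moving toward row 0, and each entry holds the row of that side’s rearmost pawn on the file (0 = no light pawn, 7 = no dark pawn). The weights are hypothetical, pawn_count is not needed for these four checks, and the dark version would mirror this one.

```c
#define ROW(sq) ((sq) >> 3)
#define COL(sq) ((sq) & 7)
enum { LIGHT = 0, DARK = 1 };

/* Hypothetical weights; the review does not give actual values. */
#define DOUBLED_PAWN_PENALTY   10 /* penalty A */
#define ISOLATED_PAWN_PENALTY  20 /* penalty B */
#define BACKWARDS_PAWN_PENALTY  8 /* penalty C */
#define PASSED_PAWN_BONUS      20 /* bonus D, per row still to go */

static int eval_light_pawn(int sq, int pawn_rank[2][10]) {
    int r = 0;
    int f = COL(sq) + 1;

    /* A: a light pawn on a higher row of this file is behind us: doubled */
    if (pawn_rank[LIGHT][f] > ROW(sq))
        r -= DOUBLED_PAWN_PENALTY;

    /* B: no light pawns on either adjacent file: isolated */
    if (pawn_rank[LIGHT][f - 1] == 0 && pawn_rank[LIGHT][f + 1] == 0)
        r -= ISOLATED_PAWN_PENALTY;
    /* C: not isolated, but neighbours are further advanced: backward */
    else if (pawn_rank[LIGHT][f - 1] < ROW(sq) && pawn_rank[LIGHT][f + 1] < ROW(sq))
        r -= BACKWARDS_PAWN_PENALTY;

    /* D: no dark pawn ahead on this or adjacent files: passed */
    if (pawn_rank[DARK][f - 1] >= ROW(sq) && pawn_rank[DARK][f] >= ROW(sq) &&
        pawn_rank[DARK][f + 1] >= ROW(sq))
        r += (7 - ROW(sq)) * PASSED_PAWN_BONUS;

    return r;
}
```

Each if statement is independent of the square’s neighbours except through pawn_rank, which is what makes the mux-per-penalty hardware structure possible.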
Eval_pawn structure
[Figure: penalties A, B, C, and D each feed a mux that selects either the penalty or 0; the mux outputs go to an adder. Pawn penalty “control logic,” driven by Square, pawn_rank, and pawn_count, selects each mux]
Eval_king
• Inputs (same as eval_pawn):
  – Square to calculate the penalty for
  – Pawn_rank structure
  – Pawn_count structure
• The function returns a penalty value that is adjusted depending on how well shielded the king is by its own pawns
Eval_king Penalties
• The file A, B, C, F, G, and H penalties
  – These penalties are assessed when there is no pawn in the file one row away from the king.
  – The magnitude of the penalty depends on how many rows the pawn is from the king.
• The pawn attack penalty
  – This penalty is assessed if the enemy’s pawns have advanced too far down the board toward the king.
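The king-shield penalties can be sketched in C for a light king on its home row (row 7). This reuses an assumed pawn_rank[2][10] layout (file index = column + 1, rearmost pawn row per file, 0 = no light pawn, 7 = no dark pawn); the penalty values and the rule that a central king is not scored are hypothetical choices for illustration.

```c
enum { LIGHT = 0, DARK = 1 };

/* Shield penalty for one file in front of a light king on row 7.
   All penalty values are hypothetical. */
static int light_king_file_penalty(int f, int pawn_rank[2][10]) {
    int r = 0;

    switch (pawn_rank[LIGHT][f]) { /* friendly shield pawn */
    case 6:  break;                /* one row in front of the king: no penalty */
    case 5:  r -= 10; break;       /* advanced one square */
    case 0:  r -= 25; break;       /* no friendly pawn on this file */
    default: r -= 20; break;       /* advanced two or more squares */
    }

    switch (pawn_rank[DARK][f]) {  /* pawn attack: enemy pawn approaching */
    case 5:  r -= 10; break;       /* two rows in front of the king */
    case 4:  r -= 5;  break;       /* three rows in front of the king */
    default: break;
    }
    return r;
}

/* A kingside king (col > 4) is scored on files F, G, H; a queenside king
   (col < 3) on files A, B, C; a central king is not scored here. */
static int light_king_shield(int king_col, int pawn_rank[2][10]) {
    int first;
    if (king_col > 4)      first = 6; /* file indices 6..8 = F, G, H */
    else if (king_col < 3) first = 1; /* file indices 1..3 = A, B, C */
    else                   return 0;
    int r = 0;
    for (int f = first; f < first + 3; f++)
        r += light_king_file_penalty(f, pawn_rank);
    return r;
}
```

The three per-file calls are independent, matching the parallel file-penalty blocks and the two adders in the hardware structure.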
Eval_king structure
[Figure: Pawn_count and pawn_rank feed the file A, B, and C penalty and pawn approach blocks and the file F, G, and H penalty and pawn approach blocks, alongside a no-penalty input; two adders sum the groups and a mux with control logic selects the result]
Bonus Penalty Assess Structure
• Switching from a position to a piece representation of the board:
  – No longer need to repeat the mux 64 times
  – Adder now has 16 inputs, one for each piece (vs. 64 inputs)
  – Knight, bishop, and rook are still straight table lookups
  – Pawn_eval is repeated 8 times
    • Still better than 64 times
Gen
Jim Hollifield
gen() function
• Searches through all 64 spaces
• Skips empty spaces and opponent pieces
• Creates all possible moves for each friendly piece
• Pushes the moves onto move_stack (with the helper function gen_push())
Possible Moves
• Basic pawn moves
  – Move forward 1 or 2 spaces
  – Take left or right
• Non-pawn piece (N, B, R, Q, K) moves
  – B, R, Q can slide (move more than one space), but stop when another piece is blocking the path
• Castle (King or Queen side)
• En Passant
For Pawns
[Figure: pawn move tree. Light: Take Left, Take Right, Move Forward 1 Space, Move Backward 1 Space. Dark: same as Light, but reversed]
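For Non-Pawn Pieces
[Flowchart, as a list:]
• Move to the next square in the current direction
  – Edge of board: move to the next direction (no more directions: move to the next piece)
• Is the space empty?
  – Yes: does the piece “slide”? If yes, move to the next square in the current direction; if no, move to the next direction
  – No: is the piece friendly?
    • Yes: move to the next direction
    • No: take the piece, then move to the next direction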
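The flowchart above maps onto a short nested loop. This C sketch generates moves for one non-pawn piece; the 0..63 square numbering, the color encoding, the move_t struct, and the name gen_piece are assumptions, and the caller supplies the (row, column) direction steps for the piece type.

```c
#include <stdbool.h>

enum { LIGHT = 0, DARK = 1, EMPTY = 6 }; /* assumed color[] codes */

typedef struct { int from, to; bool capture; } move_t;

/* Moves for one non-pawn piece on `from`: for each direction, step until the
   edge of the board, a friendly piece (stop), or an enemy piece (capture,
   then stop). Non-sliding pieces (N, K) take only one step per direction.
   dirs holds (row, col) steps; out needs room for 27 moves (a queen's max). */
static int gen_piece(const int color[64], int from, int side,
                     const int dirs[][2], int ndirs, bool slides, move_t *out) {
    int n = 0;
    for (int d = 0; d < ndirs; d++) {                    /* next direction */
        int r = from / 8, c = from % 8;
        for (;;) {
            r += dirs[d][0];                             /* next square */
            c += dirs[d][1];
            if (r < 0 || r > 7 || c < 0 || c > 7)        /* edge of board */
                break;
            int to = r * 8 + c;
            if (color[to] == EMPTY) {                    /* empty: quiet move */
                out[n++] = (move_t){ from, to, false };
                if (!slides)
                    break;
                continue;
            }
            if (color[to] != side)                       /* enemy: take piece */
                out[n++] = (move_t){ from, to, true };
            break;                                       /* friendly or captured */
        }
    }
    return n;
}
```

The same loop structure is what the generator FSM unrolls in hardware, with one state per direction and up to 7 steps along each sliding direction.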
Other Functions
• gen_caps()
  – Same as gen(), except it only checks for capture moves
  – Called by quiesce()
• gen_push()
  – Pushes moves from gen() and gen_caps() onto move_stack
• gen_promote()
  – Pushes pawn promotion moves onto move_stack
  – One move for each possible piece (Q, B, R, or N)
move_stack
[Figure: move_stack snapshots before ply 0, after ply 0 (before ply 1), and after ply 1 (before ply 2). Each ply’s moves (move 0 … move n) are stacked directly above the previous ply’s, with gen_begin and gen_end marking the current ply’s slice]
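The move_stack layout can be sketched in C: each ply’s moves form a slice that starts where the parent ply’s slice ends, so returning from a deeper ply leaves the parent’s moves untouched. The capacities, the move_t fields, and the helpers ply_open and ply_count are hypothetical, and this gen_push signature is an assumption rather than the engine’s actual one.

```c
#define STACK_SIZE 1120 /* hypothetical capacity */
#define MAX_PLY    32

typedef struct { int from, to, score; } move_t;

typedef struct {
    move_t moves[STACK_SIZE];
    int gen_begin[MAX_PLY]; /* first move of each ply's slice */
    int gen_end[MAX_PLY];   /* one past the last move of each ply's slice */
} move_stack_t;

/* Start a new ply: its slice begins right after the parent ply's moves. */
static void ply_open(move_stack_t *st, int ply) {
    int start = (ply == 0) ? 0 : st->gen_end[ply - 1];
    st->gen_begin[ply] = st->gen_end[ply] = start;
}

/* Append one move to the current ply's slice; returns 0 if the stack is full. */
static int gen_push(move_stack_t *st, int ply, int from, int to, int score) {
    if (st->gen_end[ply] >= STACK_SIZE)
        return 0;
    st->moves[st->gen_end[ply]++] = (move_t){ from, to, score };
    return 1;
}

static int ply_count(const move_stack_t *st, int ply) {
    return st->gen_end[ply] - st->gen_begin[ply];
}
```

In the proposed split, the FPGA’s pusher would perform the gen_push step directly into shared memory, while the HC11 only reads gen_begin/gen_end to find and sort each ply’s moves.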
HW/SW Breakdown
• FPGA puts moves into the stack structure
  – Currently done by the gen(), gen_caps(), gen_push(), and gen_promote() functions
• HC11 sorts the stack structure
  – Currently done by the sort() and sort_pv() functions
gen() Hardware
[Figure: the Move Generator (FSM) reads the Pieces and Board registers and passes moves to the Pusher, which writes them to the stack in shared memory using current_ply, gen_begin, and gen_end]
Generator FSM
[Figure: move-generator FSM, starting at Reset and ending at DONE, with one group of states per piece type:
Pawn: P take Left, P take Right, P move 1, P move 2
Knight: N move0 through N move7
Rook: R move Forward, Right, Back, Left
Bishop: B move ForwardLeft, ForwardRight, BackRight, BackLeft
Queen: Q move in all eight directions
King: K move in all eight directions
Special: Castle K side, Castle Q side, En Passant Left, En Passant Right
Multipliers on the diagram show groups repeated per piece (x2) and sliding directions stepped up to 7 squares (x7).
* marks moves skipped by gen_caps(); # marks moves checked by gen_promote().]
HCI
Jeff Barbieri
Possible Ideas
• Interface design #1
  – 8x8 bar LED board on the left displaying the piece in each location (the period indicates black/white)
  – DIP switches to select the from/to squares for a move
  – “Clock” to make the move
• Interface design #2
  – 8x8 bar LED board on the left displaying the piece in each location (the period indicates black/white)
  – A button in each square next to the bar LED to select the from square and then the to square for a move
  – “Clock” to make the move
Interface Design Idea #1
[Figure: LED board linked by a ribbon cable connection to From and To DIP switches, a Make Move button, and latches & other logic]
Interface Design Idea #2
[Figure: LED board with a button in each square, linked by a ribbon cable connection, plus a Make Move button]
Other Ideas
• Beep to signify illegal move
• Touch-screen for the board
• LEDs to signify which player’s move it is
• Alternate board layouts (many of these)
Considerations
• Selection of parts
  – Numeric, alphanumeric, LCD, etc. LEDs
  – Push buttons
  – Latches
• Costs for parts
  – Number of FPGA pins needed
  – Time to wire-wrap the board
  – $$$ for parts
Considerations
• Feasibility of design
• Design of board in relation to design of chess game
Summary
• Many possibilities
• Two basic likely designs
• Lots of thought and planning needs to go into design before acquiring parts and building
Software optimizations
Jonathan Hsieh
Software optimizations
• All that remains is:
  – sort 2.49% -> currently an O() algorithm; could change to a heap (log n / constant)
  – makemove 1.15% -> these will be slower
  – takeback 0.88% -> these will be slower
  – quiesce 0.38% -> will probably go up
  – search 0.19%
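The heap idea for sort can be sketched in C: heapify a ply’s moves once, then pop the best-scoring move each time the search loop needs the next one. This is a generic binary max-heap under assumptions (a move_t with an integer score; the names heapify_moves and pop_best_move are hypothetical), not the engine’s existing sort().

```c
typedef struct { int from, to, score; } move_t;

static void swap_moves(move_t *a, move_t *b) { move_t t = *a; *a = *b; *b = t; }

/* Restore the max-heap property below index i (children at 2i+1, 2i+2). */
static void sift_down(move_t *h, int n, int i) {
    for (;;) {
        int best = i, l = 2 * i + 1, r = l + 1;
        if (l < n && h[l].score > h[best].score) best = l;
        if (r < n && h[r].score > h[best].score) best = r;
        if (best == i)
            return;
        swap_moves(&h[i], &h[best]);
        i = best;
    }
}

/* Turn one ply's moves into a max-heap in O(n). */
static void heapify_moves(move_t *h, int n) {
    for (int i = n / 2 - 1; i >= 0; i--)
        sift_down(h, n, i);
}

/* Remove and return the highest-scoring move in O(log n); requires *n > 0. */
static move_t pop_best_move(move_t *h, int *n) {
    move_t best = h[0];
    h[0] = h[--*n];
    sift_down(h, *n, 0);
    return best;
}
```

Since an alpha-beta cutoff often happens after the first few moves, the remaining moves are never fully ordered, which is where the heap saves time over repeatedly scanning for the best move.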
Algorithm improvements
• search
  – Killer heuristic (search attack branches first; already implemented)
  – Think on the opponent’s time (multithreading/interrupts! Do it)
  – History heuristic (built in already?)
  – Tighter searches (could be implemented)
  – Refutation tables (not sure what they are)
  – Transposition tables (not sure what they are)
• Eval
  – Pawn formation hash table (not needed, in HW)
  – King safety hash table (not needed, in HW)
• Jinkies!
  – Can probably squeeze out another 20% reduction: 0.05 => 0.04.
• Idealized speedup target = 25x!
Integration plan
[Figure: integration plan chart with tasks Software/Profiling, Baseline Stats, HW/SW Partitioning, HW/SW Interface, Physical Design, FPGA:Attack, FPGA:Eval, FPGA:Gen, FPGA:HCI, SW modification / optimization, CoSim, Integration/Debugging, and Optimizations]
Division of Labor
If it weren’t for those meddling kids!
Jon
• Group leader (prevent him from dropping the class)
• Software guru (algorithm, HC11, HiWare)
  – Strong software background
  – Hates wires
• Overall system design
  – Some HW/SW partitioning experience (research)
Jim
• Move generation hardware considerations
  – When wiring requirements dried up, we moved from individual projects to pairs; volunteered to put thought into it and implement it
• Wire-wrap whiz (physical interfacing) / FPGA interface whiz
  – Doesn’t care what he does and didn’t volunteer for a Verilog job at first
Jeff
• Project management software
  – Experience with lots of MS software
• HCI hardware
  – Display, inputs, Verilog, (beeper?) and (wiring?)
Matt
• Verilog whiz: eval function design
  – Wanted it and did a good job with it.
• Memory master: got the FPGA Demo 0 mem->fpga->mem proof working.
Annie
• Soldering / wire-wrap queen
  – Wire-wrapped everything a lot faster than Jim
• HC11 hardware interface
  – Got that portion of Demo 0 to work.
• Attack function design.
Demonstration Plan
• Demo 1 (week of 10/4)
  – Stats on the baseline chess algorithm
  – HW/SW partitioning and interfacing method
  – Details about HW subsystems
    • eval / attack / gen / hci
  – (Co)simulation of separate parts of the HW/SW partitioning
• Demo 2 (week of 11/1)
  – Frozen physical hardware
  – Co-simulation working with HW/SW
  – Chess that works and communicates
  – Preliminary stats on the new design
  – Optimizing / debugging process
• Final Demo 11/29
  – Optimizations and speedup statistics
  – PC interface GUI (depending on interface)
Demo 1 Work Schedule
• 9/17 F. Demo 0 completion
• 9/20 M. Internal design review
• 9/22 W. HW/SW partitioning details
• 9/23 R. Design review
• 9/29 W. HW/SW interfacing resolved
• 10/4 M. Verilog simulations for FPGA stuff.
Demo 2 Work Schedule
• 10/11 M. HW/SW integration/Co-simulation
• 10/18 M. Physical hardware frozen
• 10/25 Algorithm Optimizations
• 11/1 Clock speed optimizations
Final Demo
• Final review ready. Add more bells and whistles
• Have one month for unpredicted delays.