Shakti Processor Program. G S Madhusudan (Principal Scientist), Neel Gala (PhD candidate), RISE Group, CSE Dept., IIT Madras


Post on 01-Feb-2020


Page 1: SHAKTI RISCV MONTEREY · Shakti Processor Program, G S Madhusudan (Principal Scientist), Neel Gala (PhD candidate), RISE Group, CSE Dept., IIT Madras

Shakti Processor Program

G S Madhusudan (Principal Scientist) Neel Gala (PhD candidate)

RISE Group, CSE Dept.

IIT Madras


Overview
• Goals of the program: consolidate various processor development initiatives and create a viable processor eco-system
• Research in design of next generation processors, interconnects and storage systems
• Experimental processors
• SHAKTI family of processors
• Low power, mobile, server and HPC variants
• RapidIO based interconnect for the SHAKTI family
• RapidIO based SSD storage to complement SHAKTI server class processors
• OS, compilers, BSP and related SW sub-systems
• Reference designs and boards


Program Goals
• Processor research
• SHAKTI family, Adaptive System Fabric/RapidIO interconnect, Lightstor storage
• Experimental architectures like Tagged ISA, Belt systems and Single Address Space systems
• Low power design
• Bluespec/Chisel based workflow
• ADD workflow
• Formal techniques for system design workflow
• Low power design techniques and tool development


Program Goals
• OS development for SHAKTI
• Linux, Android and L4 BSP development
• Compiler development
• OS research for experimental processors
• Single address space OS
• Secure OS for Tagged CPU arch.
• Languages for secure CPU/OS


Processor Variants


Overview
• The SHAKTI processor system encompasses the following HW/system components:
• A family of cores ranging from uC class cores to server class cores
• A collection of interconnect fabrics, from single core AHB based bus fabrics to 100+ core multi-stage fabrics
• Serial RapidIO based interconnect for external I/O and chip to chip CC interconnects


Overview
• SSD based storage system that uses SRIO to provide multi-million IOPS and exabyte scalable storage
• Hybrid Memory Cube based memory controller to provide a scalable fabric based memory architecture
• Security components to provide trusted computing support
• Fault tolerant SoC components (cores, fabrics) for safety critical applications


Shakti Processor Family
• C-Class
  – 32-bit micro-controller
  – RV32I
• I-Class variant
  – 64-bit GP controller
  – Single-threaded, quad issue, etc.
• S-Class
  – Targeting server applications
  – Multi-stage fabric
  – Hybrid Memory Cube
• M-Class
  – Multi-core version of I-Class
  – High performance and embedded applications
  – Targeting complex SoC systems
• H-Class
  – 64+ cores
  – SIMD/VPU support
• T-Class
  – Tagged ISA for security


Secure variants
• All SHAKTI variants can have secure variants.
• The standard variant will have TrustZone-like functionality for a formally defined OS TCB.
• Other standard security blocks like HAB and a security co-processor will be developed.
• Experimental T-class variants will have a tagged ISA architecture.


Tagged ISA
• Various approaches are being examined; the final version will most likely use a combination of the free space available above the 43rd bit in 64-bit pointers for smaller tags and an indirect scheme for larger tags.
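The combination described above can be illustrated with a minimal Python sketch. It is not the SHAKTI design: the 43-bit address field, the use of the top bit of the spare field as an "indirect" flag, and the names `tag_pointer`, `untag_pointer` and `tag_table` are all assumptions made for illustration.

```python
ADDR_BITS = 43                       # assumed width of the address field
FIELD_BITS = 64 - ADDR_BITS          # 21 spare upper bits of a 64-bit pointer
ADDR_MASK = (1 << ADDR_BITS) - 1
INDIRECT = 1 << (FIELD_BITS - 1)     # hypothetical flag: field holds a table index
tag_table = []                       # hypothetical side table for larger tags


def tag_pointer(addr: int, tag: int) -> int:
    """Pack a tag into the upper bits of a 64-bit pointer."""
    if not 0 <= addr <= ADDR_MASK:
        raise ValueError("address does not fit in the assumed address field")
    if tag < 0:
        raise ValueError("tag must be non-negative")
    if tag < INDIRECT:
        field = tag                  # small tag: stored inline
    else:
        tag_table.append(tag)        # large tag: stored indirectly
        field = INDIRECT | (len(tag_table) - 1)
    return (field << ADDR_BITS) | addr


def untag_pointer(ptr: int):
    """Return (address, tag) for a pointer built by tag_pointer."""
    addr = ptr & ADDR_MASK
    field = ptr >> ADDR_BITS
    if field & INDIRECT:
        return addr, tag_table[field & (INDIRECT - 1)]
    return addr, field
```

A small tag travels with the pointer at no memory cost, while a large tag costs one extra table lookup; the sketch ignores table overflow and tag reclamation.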


C-CLASS PROCESSOR AND FAULT TOLERANCE


C Class Processor
[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, Write Back) with RF, Inst Cache, Data Cache and BPU]
• 32-bit 5-stage pipeline with branch prediction.
• Supports all integer instructions.
• AHB/AXI-Lite bus


C Class Processor
• More features
  – An L1 cache.
  – Optional single and multi-cycle multiplier, DSP instr.
  – Memory protection (8 regions)
  – Tournament branch predictor.
  – Targeting 50-75 MHz on FPGA.
  – Modularized RTL
• Target applications
  – Non-MMU class control applications (Cortex M class)
  – Secure and FT variants


Fault Tolerant Variant
• Fault tolerant features to meet ISO 26262 type standards (HW+SW eco-system)
• Will have a companion formally verified micro-kernel OS
• Information redundancy
  – Parity check at each pipeline stage
• ALUs
  – DMR: Dual Modular Redundancy (parameterized).
  – Time redundancy to isolate the faulty module.
  – Recomputation with different coding schemes for different arithmetic operations for fault isolation.
  – Proposed technique bridges the gap between space redundancy and time redundancy.
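The per-stage parity check can be pictured with a small Python sketch. It assumes a single even-parity bit over a 32-bit pipeline word; the slides do not state the parity scheme, and the names `parity_bit`, `latch` and `parity_ok` are hypothetical.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit pipeline word


def parity_bit(word: int) -> int:
    """Even-parity bit: 1 when the word has an odd number of set bits."""
    return bin(word & WORD_MASK).count("1") & 1


def latch(word: int):
    """Value stored in a pipeline register together with its parity bit."""
    word &= WORD_MASK
    return word, parity_bit(word)


def parity_ok(latched) -> bool:
    """Check performed by the next stage before it consumes the word."""
    word, parity = latched
    return parity_bit(word) == parity
```

A single parity bit detects any odd number of flipped bits but cannot locate them, so correction would need a stronger code or a replay of the stage.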


Micro-architecture
[Figure: fault tolerant pipeline with Fetch, Decode, ALU1 and ALU2 feeding a comparator (==), MEM and WB stages, a parity generator, a parity checker + error correction block, and TR check + fault isolation logic]
• Programmable tolerance level.
• The number of cycles for TR can be programmed through a register.
• Programmable confidence driven computing
• Error resilient initiative.


Micro-architecture
• Performance enhancement through ILP
• Split ALU into functional units.
• Out of order dispatch.
• Use ROB structure to maintain in-order commits.
• Extra stages added.
[Figure: Decode and DISPATCH stages feeding duplicated Add/Sub, Mul and Logical units, each pair with a comparator (==) and a Fault Handling block, followed by the ROB]


Fault Handling Methodology
• Cycle 1: compare result Match; error type: no error; action taken: posted from ALU1.
• Cycles 2, 3, 4: compare result Mismatch; error type: transient; action taken: no posting (redo the operation).
• Cycle 5: as the transient error persists for 3 consecutive cycles, the permanent error flag is raised. Recomputing with different coding schemes for different operations identifies the faulty ALU. The health flags of the ALUs are set to decide data selection and posting. Post the data from the healthy ALU.
• Cycle 6: conclusion/action taken from the health flags (ALU1 health flag, ALU2 health flag):
  – 0, 0: not able to detect the faulty ALU. Post '0'; the health flag status will be sent to the ISR to decide.
  – 0, 1: ALU2 faulty. Isolate ALU2 and post from ALU1 from here onwards.
  – 1, 0: ALU1 faulty. Isolate ALU1 and post from ALU2 from here onwards.
  – 1, 1: both ALU1 and ALU2 faulty. Post '0'; the health flag status will be sent to the ISR to decide.
• Cycle 7: post from one available healthy ALU.
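The decision flow in this table can be sketched in Python as follows. This is a behavioural illustration only: the ALUs are modelled as plain functions, `is_faulty` stands in for the recomputation-based self-check, and the names `dmr_execute`, `decide` and `MAX_MISMATCHES` are hypothetical.

```python
from typing import Callable, Tuple

Alu = Callable[[int, int], int]
MAX_MISMATCHES = 3  # consecutive mismatches before the error is treated as permanent


def decide(alu1_faulty: bool, alu2_faulty: bool, r1: int, r2: int) -> Tuple[int, bool]:
    """Map the two health flags to (value to post, raise interrupt for the ISR)."""
    if alu1_faulty == alu2_faulty:
        # 0/0: faulty ALU not identified; 1/1: both faulty. Post 0, the ISR decides.
        return 0, True
    # Exactly one flag set: post from the ALU that is still healthy.
    return (r2, False) if alu1_faulty else (r1, False)


def dmr_execute(alu1: Alu, alu2: Alu, is_faulty: Callable[[Alu], bool],
                a: int, b: int) -> Tuple[int, bool]:
    """Run one operation on both ALUs with compare, redo and fault isolation."""
    r1 = r2 = 0
    for _ in range(MAX_MISMATCHES):
        r1, r2 = alu1(a, b), alu2(a, b)
        if r1 == r2:
            return r1, False          # match: post from ALU1
        # mismatch: treated as transient, nothing is posted, operation is redone
    # mismatch persisted: permanent error, consult the health flags
    return decide(is_faulty(alu1), is_faulty(alu2), r1, r2)
```

For example, with a healthy adder and one whose bit 3 is stuck at 1, `dmr_execute` posts the healthy result once `is_faulty` flags the stuck unit, and posts 0 with the interrupt flag set when the flags cannot single out one ALU.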


Re-computation Methodology
• ADD/SUB
  – Normal operation: F = OP1 ± OP2 (Cin = 0 implied)
  – Recomputing operation: F' = (~OP1 + 1) ± ~OP2 (+1 to take care of Cin = 1)
  – Fault detection capability: as F and F' are complementary, a stuck-at failure at any bit position will be known.
• Logical
  – Normal operation: F = OP1 and/or/xor OP2
  – Recomputing operation: swapped operands
  – Fault detection capability: the upper half (31:16) bits are computed by the lower half (15:0) circuit and vice versa, so an error at any bit position will be detected.
• Signed multiplication
  – Normal operation: F1 = OP1 * OP2
  – Recomputing operation: F2 = (2's complement of OP1) * OP2, so F2 = -F1. The 2's complement of F2 is compared with F1 for error detection.
  – No-error condition: F1 = 2^n - F2
  – Stuck-at failure at a particular bit position i: R_normal = F1 ± 2^i, R_recomp = F2 or F2 ± 2^i
  – 2's complement of R_recomp = 2^n - F2 or 2^n - (F2 ± 2^i) = F1 or F1 ∓ 2^i
  – The idea is that if R_normal increases, the 2's complement of R_recomp remains the same or decreases, and vice versa, so the error is detected.
• Unsigned multiplication
  – Append a zero at the MSB and use the signed multiplication method for fault handling.
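The three recomputation checks can be tried out with the Python sketch below. It models each unit as a function on Python integers and covers only the ADD case of the ADD/SUB row, where (~OP1 + 1) + ~OP2 equals ~(OP1 + OP2) modulo 2^32. The 32-bit operand width, the 64-bit product width and all function names are assumptions for illustration.

```python
MASK32 = 0xFFFFFFFF


def add_check(add, a: int, b: int) -> bool:
    """True when F and F' are bitwise complements, as expected for a healthy adder."""
    f = add(a, b) & MASK32
    f_rec = add((~a + 1) & MASK32, ~b & MASK32) & MASK32
    return (f ^ f_rec) == MASK32


def swap_halves(x: int) -> int:
    """Exchange bits 31:16 with bits 15:0 of a 32-bit value."""
    return ((x << 16) | (x >> 16)) & MASK32


def logical_check(op, a: int, b: int) -> bool:
    """Recompute with swapped halves so each bit uses the other half's circuit."""
    f = op(a, b) & MASK32
    f_rec = swap_halves(op(swap_halves(a), swap_halves(b)) & MASK32)
    return f == f_rec


def mul_check(mul, a: int, b: int, n: int = 64) -> bool:
    """True when F1 and F2 = (-a) * b cancel modulo 2^n, i.e. F1 = 2^n - F2."""
    mask = (1 << n) - 1
    f1 = mul(a, b) & mask
    f2 = mul(-a, b) & mask
    return ((f1 + f2) & mask) == 0
```

Modelling a unit whose output bit 3 is stuck at 1, for instance `lambda a, b: ((a + b) & MASK32) | 0x8`, makes `add_check` return False for operands 1 and 2, while the healthy adder passes. Unsigned operands can reuse `mul_check` directly, since non-negative Python integers already behave as if a zero were appended at the MSB.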


I-CLASS PROCESSOR


I Class Processor
• 500 MHz – 1 GHz class multi-core variant
• Supports all 64-bit RISC-V instructions.
• Parameterized issue width.
• IEEE-754 single and double precision FPU.
• Enables quick design space exploration.
  – Highly parameterized
• Initial target: 45/65 nm


Micro-Architecture
Explicit renaming:
• Physical register file stores both speculative and committed values.
• Register alias table stores mapping.
• Simple commit logic.
• FRQ: Free Register Queue
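A minimal Python sketch of explicit renaming with a register alias table and a free register queue follows. It is a generic textbook-style model, not the I-Class RTL: the register counts, the `Renamer` class and its method names are assumptions, and RISC-V details such as the hard-wired x0 register are ignored.

```python
from collections import deque


class Renamer:
    """Explicit renaming: RAT maps architectural to physical registers."""

    def __init__(self, n_arch: int = 32, n_phys: int = 64):
        self.rat = list(range(n_arch))            # register alias table
        self.frq = deque(range(n_arch, n_phys))   # free register queue

    def rename(self, dst: int, srcs):
        """Return (new physical dst, physical sources, previous mapping of dst)."""
        phys_srcs = [self.rat[s] for s in srcs]   # read sources before remapping dst
        if not self.frq:
            raise RuntimeError("no free physical register: rename must stall")
        new = self.frq.popleft()
        old = self.rat[dst]
        self.rat[dst] = new
        return new, phys_srcs, old

    def commit(self, old: int) -> None:
        """At commit the previous mapping can no longer be read, so free it."""
        self.frq.append(old)

    def rollback(self, dst: int, new: int, old: int) -> None:
        """On mis-prediction or exception, revert the mapping and reclaim the register."""
        self.rat[dst] = old
        self.frq.appendleft(new)
```

Because every write receives a fresh physical register, two in-flight writes to the same architectural register never collide, which is why the WAW and WAR checks disappear in this scheme.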


PRF vs ROB renaming
• Physical Register File:
  – Decouples issuing and operand fetch.
  – One source for all operands.
  – Less data movement.
  – No WAW and WAR checks required at all
  – Revert mapping in case of mis-prediction/exception
  – Scalable
• ROB:
  – In-order commit: precise exception handling.
  – Plenty of data movement for operand forwarding.
  – With RS allows distributed scheduling points.


S-CLASS PROCESSOR AND SERVER FABRICS


S-Class Processor
• 64-bit SMT variant for server applications
  – Virtualization supported (CPU and I/O)
  – 1-3 GHz
• SoC configuration
  – Clusters of 8 cores/cluster, each connected via fabric, for 32 clusters (total 256 cores).
  – Intra-cluster: ring bus
  – Inter-cluster: mesh, ring, NoC
  – Socket to socket CC interconnect: SRIO GSM based
• Memory/Cache
  – Private L1, shared/private L2 and shared segmented L3 cache
  – Main memory: DDR4/Hybrid Memory Cube
• Parameterized for N threads (8 max) and M-issue (4 max).


Micro-Architecture
[Figure: S-Class pipeline with FETCH, DECODE and ISSUE stages, multiple register files (RF), Reservation Station, BPU, Inst Cache, ROB, CDBs, ALU, FPU, Branch and LSU units, and TLB]


Distributed vs Centralized Issue Queue
• Unified IQ:
  – Holds all types of instructions.
  – Effective usage of IQ space.
  – Has higher issue logic delay.
  – Larger instruction window compared to distributed IQ.
• Distributed IQ:
  – Each smaller sub-IQ holds a specific type of instruction.
  – Lower issue logic delay compared to unified IQ.
  – Easier to power gate an unused queue.


Server Fabrics for the S-Class
• Hybrid Ring + Mesh fabric
• Each cluster uses a 512-bit bi-directional ring for up to 8 cores.
• Closely coupled MOESI variant with home node
• Power, area and verification efficiency over mesh
• Memory ordering semantics easier to prove
[Figure: ring of eight Core + L2 nodes with HMC/DDR and RIO-CC ports]


Server Fabrics (contd.)
• Hybrid interconnect scheme
  – Rings scale well for up to 8 cores.
  – Inter-ring connection via mesh or ring depending on server characteristics
  – Cache coherency will be directory based and will depend on CC-NUMA aware SW architecture
[Figure: three topologies: mesh of rings with 2 bridges, mesh of rings with 4 bridges, and hybrid with hierarchical rings]
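One way to see why a single ring is kept small is to count hops. The Python sketch below computes the average and worst-case shortest-path hop count between distinct nodes of a bi-directional ring; it is a simple topological estimate under the assumption of one hop per link, not a latency model of the SHAKTI fabric, and `ring_hops` is a hypothetical helper.

```python
def ring_hops(n: int):
    """Return (average hops, maximum hops) between distinct nodes of an n-node bi-directional ring."""
    if n < 2:
        raise ValueError("a ring needs at least two nodes")
    # Traffic can go either way round, so node k is min(k, n - k) hops from node 0.
    distances = [min(k, n - k) for k in range(1, n)]
    return sum(distances) / len(distances), max(distances)
```

For 8 nodes the worst case is 4 hops, while a single 64-node ring would have a worst case of 32 hops, which is consistent with joining small rings through a mesh or a second-level ring.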


Socket Interconnect
• 10/25G RapidIO for socket-socket connection
  – A 5+ state MOESI/MESIF-like directory based protocol will be the CC protocol
  – Scalable up to 128 sockets (16 clusters of 8 sockets)
  – Quasi-parallel SRIO variant being examined to reduce latency, since SERDES latency is too high (vis-a-vis QPI)
[Figure: four sockets (Proc-1 to Proc-4) connected through RIO]


Adaptive System Fabric
• Proposed experimental fabric to unify memory, I/O and networking for clusters and data-centers.
• Based on persistent memory based systems with NVM storage
• Initial candidate being explored
  – A Hybrid Memory Cube + RapidIO fabric
  – Common PHY + physical connectivity for both fabrics
  – Application specific protocols will dynamically configure an interconnect link for a specific purpose
  – The fabric's topology can be changed dynamically to suit the workload in question. The intelligence will reside in the fabric router

 


Adaptive System Fabric
• A microkernel based OS architecture is also being proposed to take advantage of the HW fabric; computation can dynamically move to where the data is located
• Berkeley Spark/Tachyon being used to evaluate viability of the architecture for data analytics scenarios


Thank you
• Contact and further info:
  – Madhusudan: [email protected]
  – Neel: [email protected]
  – SHAKTI: http://rise.cse.iitm.ac.in/shakti