research presentation

Nirav A. Desai desai.nirav.12 [email protected] 1

Upload: nirav-desai

Post on 30-Oct-2014


DESCRIPTION

Covers a summary of the research work I have done to date and what I am working on right now.

TRANSCRIPT

Page 1: Research Presentation

Nirav A. Desai [email protected]

1

Page 2: Research Presentation


Page 3: Research Presentation


MM-Wave Active Sensor: BPSK Spectrum can be seen in the Spectrum Analyzer

Nirav Desai

Page 4: Research Presentation


I assisted in these mm-wave MIMO experiments at UCSB

Page 5: Research Presentation


Page 6: Research Presentation


Page 7: Research Presentation


Page 8: Research Presentation


Page 9: Research Presentation


Page 10: Research Presentation


Page 11: Research Presentation


Page 12: Research Presentation


Page 13: Research Presentation


Page 14: Research Presentation


Page 15: Research Presentation


Page 16: Research Presentation


Page 17: Research Presentation


Page 18: Research Presentation


Page 19: Research Presentation


Page 20: Research Presentation


Page 21: Research Presentation


Page 22: Research Presentation


Page 23: Research Presentation


Page 24: Research Presentation


EE 5323: VLSI DESIGN 1 PROJECT
Course Instructor: Prof. Chris Kim

16-bit BRENT KUNG ADDER DESIGN in 45 nm CMOS
Nirav Desai
ID: 4280229

Department of Electrical and Computer Engineering
University of Minnesota

Page 25: Research Presentation


Page 26: Research Presentation


Brent Kung Adder Gate Level Diagram

1. Input Block with Pre Computation

[Figure: gate-level schematic of Input Adder Chains 1 to 4. Gate sizes are annotated from 1X at the inputs up to 36X and 40X at the output buffers that drive the capacitive loads. The block outputs Pi*Pi-1 and Gi + Pi*Gi-1.]

Page 27: Research Presentation


Brent Kung Adder Gate Level Diagram

2. Intermediate Dot Product Blocks

[Figure: gate-level schematic of Intermediate Adder Chains 1 and 2. Gate sizes are annotated as 1X, 1.72X, 4X, 6X and 16X, with output buffers to drive capacitive loads. The block outputs Pi*Pi-1 and Gi + Pi*Gi-1.]

Page 28: Research Presentation


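Brent Kung Adder Gate Level Diagram

3. Output Block for Post Computation

[Figure: gate-level schematic of the output block, which takes Ci-1 and Pi and produces Si through output buffers that drive capacitive loads. Gate sizes are annotated as 1.182X and 1.117X.]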
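The three blocks on these slides (pre-computation, dot product tree, post-computation) can be sketched in software as a behavioural model. The Python below is a minimal sketch, not the slide author's netlist: the function names `dot` and `brent_kung_add` are illustrative, the tree wiring is the textbook Brent-Kung up-sweep and down-sweep, and the sum is assumed to be Si = Pi XOR Ci-1 with no carry-in.

```python
def dot(high, low):
    """Dot operator on (generate, propagate) pairs; `high` is the more significant group."""
    g_high, p_high = high
    g_low, p_low = low
    # Group generate: Gi + Pi*Gi-1. Group propagate: Pi*Pi-1.
    return (g_high | (p_high & g_low), p_high & p_low)


def brent_kung_add(a, b, width=16):
    """Add two unsigned integers of `width` bits (width assumed to be a power of two)."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]

    # 1. Pre-computation: per-bit generate (A AND B) and propagate (A XOR B).
    propagate = [x ^ y for x, y in zip(a_bits, b_bits)]
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]

    # 2a. Up-sweep: combine groups of 2, 4, 8, ... bits.
    step = 1
    while step < width:
        for i in range(2 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step *= 2

    # 2b. Down-sweep: fill in the prefixes the up-sweep skipped.
    step = width // 4
    while step >= 1:
        for i in range(3 * step - 1, width, 2 * step):
            gp[i] = dot(gp[i], gp[i - step])
        step //= 2

    # 3. Post-computation: gp[i][0] is now the carry out of bit i.
    carries = [g for g, _ in gp]
    total = 0
    for i in range(width):
        carry_in = carries[i - 1] if i > 0 else 0
        total |= (propagate[i] ^ carry_in) << i
    return total, carries[width - 1]
```

Calling `brent_kung_add(0xFFFF, 0x0001)` exercises the longest carry chain, since the carry generated at bit 0 propagates through every higher bit.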

Page 29: Research Presentation


Brent Kung Adder Transistor Level Design

XOR GATE

Page 30: Research Presentation


Brent Kung Adder Transistor Level Design

Inverter Design Optimization

• NMOS Width = 90 nm
• PMOS / NMOS Length = 50 nm
• Vdd = 1.1 V
• Current Averaged Over One Period of 2 ns
• Optimal PMOS Width = 165 nm
• βinverter = 165/90 ≈ 1.833
• Sizing for NAND, NOR and XOR changed accordingly

Page 31: Research Presentation


Brent Kung Adder Transistor Level Design

1. Input Block with Pre Computation

Input Adder Block Chain 1

Gate Number   1        2          3       4          5
Gate Name     BUFFER   INVERTER   NOR     INVERTER   NAND       LOAD
g value       1.000    1.000      1.646   1.000      1.352      36.000
f value       3.540    3.540      2.151   3.540      2.618648
b value       2.893    2.400      1.000   1.000      1.000      1.000
S Value       1.000    1.224      1.097   3.883      10.16831   36.000
Stage G = 2.225, Stage F = 36.000, Stage B = 6.943, Stage H = 556.248, Gate h = 3.540

Input Adder Block Chain 2

Gate Number   1        2          3       4
Gate Name     BUFFER   INVERTER   XOR     NAND    LOAD
g value       1.000    1.000      1.893   1.295   13.748
f value       4.518    4.518      2.386   3.488
b value       2.893    2.400      1.780   1.000   1.000
S Value       1.000    1.562      1.553   3.043   13.748
Stage G = 2.451, Stage F = 13.748, Stage B = 12.359, Stage H = 416.510, Gate h = 4.518

Input Adder Block Chain 3

Gate Number   1        2          3
Gate Name     BUFFER   INVERTER   NOR     LOAD
g value       1.000    1.000      1.646   3.941
f value       3.558    3.558      2.162
b value       2.893    2.400      1.000
S Value       1.000    1.230      1.108   3.941
Stage G = 1.646, Stage F = 3.941, Stage B = 6.943, Stage H = 45.038, Gate h = 3.558

Input Adder Block Chain 4

Gate Number   1        2          3       4       5
Gate Name     BUFFER   INVERTER   XOR     NAND    INVERTER   LOAD
g value       1.000    1.000      1.893   1.295   1.000      40.000
f value       3.686    3.686      1.947   2.847   3.686447
b value       2.893    2.400      1.000   1.000   1.000      1.000
S Value       1.000    1.274      1.034   2.943   10.85056   40.000
Stage G = 2.451, Stage F = 40.000, Stage B = 6.943, Stage H = 680.832, Gate h = 3.686

Logical Effort Design for Signal Chains labeled in previous slide #2
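The summary columns of these tables can be reproduced with a few lines of arithmetic. The sketch below assumes the slide's naming, which is inferred from the numbers rather than stated: Stage G is the product of the g values, Stage B the product of the b values, Stage F the load ratio, Stage H = G x F x B, Gate h the N-th root of Stage H for N gates, and each f value is h / g. The function name `path_design` is illustrative.

```python
from math import prod


def path_design(g, b, load_ratio):
    """Return (Stage H, Gate h, f values) for one signal chain.

    g          -- logical effort of each gate in the chain
    b          -- branching effort at each gate
    load_ratio -- Stage F on the slide (load relative to the first gate)
    """
    stage_g = prod(g)
    stage_b = prod(b)
    stage_h = stage_g * stage_b * load_ratio
    # Equal effort per gate minimises the chain delay.
    gate_h = stage_h ** (1.0 / len(g))
    f_values = [gate_h / gi for gi in g]
    return stage_h, gate_h, f_values


# Input Adder Block Chain 1: BUFFER, INVERTER, NOR, INVERTER, NAND driving a 36X load.
stage_h, gate_h, f_values = path_design(
    g=[1.000, 1.000, 1.646, 1.000, 1.352],
    b=[2.893, 2.400, 1.000, 1.000, 1.000],
    load_ratio=36.0,
)
print(round(stage_h, 3), round(gate_h, 3), [round(f, 3) for f in f_values])
```

With the Chain 1 inputs this gives a Stage H close to 556.248 and a Gate h close to 3.540, in line with the table.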

Page 32: Research Presentation


Brent Kung Adder Transistor Level Design

2. Intermediate Dot Product Blocks

Logical Effort Design for Signal Chains labeled in previous slide #3

Intermediate Adder Block Chain 1

Gate Number   1          2
Gate Name     INVERTER   NAND    LOAD
g value       1.000      1.352   1.000
f value       2.848      2.107   2.848
b value       1.000      1.000   1.000
S Value       1.000      2.107   6.000
Stage G = 1.352, Stage F = 6.000, Stage B = 1.000, Stage H = 8.112, Gate h = 2.848

Intermediate Adder Block Chain 2

Gate Number   1        2
Gate Name     BUFFER   NAND    LOAD
g value       1.000    1.352   2.848
f value       2.775    2.053
b value       2.000    1.000
S Value       1.000    1.026
Stage G = 1.352, Stage F = 2.848, Stage B = 2.000, Stage H = 7.701, Gate h = 2.775

Page 33: Research Presentation


Brent Kung Adder Simulated Performance

Simulations with maximally sized 1 stage buffers as determined by Logical Effort Design of individual chains:

Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (x 1E-12 J)
1.1           0.359                6.73             2.41
0.9           0.503                2.95             1.483
0.7           0.937                0.924            0.865

Simulations with minimally sized 1 stage buffers:

Voltage (V)   Delay Max-C14 (ns)   Power Max (mW)   Power-Delay Product (x 1E-12 J)
1.1           0.403                5.186            2.089
0.9           0.569                2.277            1.295
0.7           1.069                0.692            0.739

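Without parasitic extraction and interconnect parasitics, buffering does not improve performance significantly: at 1.1 V the larger buffers cut the delay from 0.403 ns to 0.359 ns but raise the maximum power from 5.186 mW to 6.73 mW, so the power-delay product is higher.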
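The power-delay product column is simply delay times power, and since ns x mW = pJ the result is already in units of 1E-12 J. A minimal sketch that recomputes the column from the delay and power figures above (the helper name `power_delay_product_pj` is illustrative):

```python
def power_delay_product_pj(delay_ns, power_mw):
    """Power-delay product in picojoules: 1 ns * 1 mW = 1e-9 s * 1e-3 W = 1e-12 J."""
    return delay_ns * power_mw


# (supply voltage in V, delay in ns, power in mW), copied from the two tables.
max_buffers = [(1.1, 0.359, 6.73), (0.9, 0.503, 2.95), (0.7, 0.937, 0.924)]
min_buffers = [(1.1, 0.403, 5.186), (0.9, 0.569, 2.277), (0.7, 1.069, 0.692)]

for label, rows in (("max buffers", max_buffers), ("min buffers", min_buffers)):
    for vdd, delay_ns, power_mw in rows:
        pdp = power_delay_product_pj(delay_ns, power_mw)
        print(f"{label}: Vdd={vdd} V, PDP={pdp:.3f} pJ")
```

At every supply voltage the minimally sized buffers give the lower product, which is the comparison the slide draws.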

Page 34: Research Presentation


Brent Kung Adder Worst Case Delay

Input Pattern: A: FFFF B: 0000 -> 0001

Dotted Lines show Carry Bits 15 and 14

Carry Bit 15 Carry Bit 14

Page 35: Research Presentation


Brent Kung Adder Layout

Input Block with Pre Computation

Input Inverters for Bit 0 and Bit 1

Output Buffers: PEX waveforms show larger size may be needed

XOR

NAND 10X

Page 36: Research Presentation


Brent Kung Adder Layout

XOR 1.553X

Page 37: Research Presentation


Brent Kung Adder Layout

NAND 10.57X: layout with interdigitated fingers to reduce parasitics

Page 38: Research Presentation


Brent Kung Adder Layout

Intermediate Dot Product Generator

Output Buffers: PEX waveforms show larger size may be necessary here

Page 39: Research Presentation


Brent Kung Adder Layout

Output Stage with Buffers

Page 40: Research Presentation


Brent Kung Adder Layout

Full Layout: 49.5 µm x 48.6 µm

Page 41: Research Presentation


Future Design Modifications

• The design uses large buffers at the output of every stage to drive large capacitances.
• The buffers are not needed at nodes with low fanouts and can be eliminated.
• The buffers at input nodes right now cause more power consumption and add to the delay.
• Thus the overall performance can be improved with fewer buffers.

Page 42: Research Presentation


References:

Course Slides from Prof. Kia Bazargan’s Course on VLSI

A Taxonomy of Parallel Prefix Networks (David Harris), reference paper on course website

Digital Integrated Circuits by Jan Rabaey

Page 43: Research Presentation


SRAM DESIGN PROJECT PHASE 2

Nirav Desai
4280229

VLSI DESIGN 2: Prof. Kia Bazargan
Dept. of ECE

College of Science and Engineering
University of Minnesota, Twin Cities

Page 44: Research Presentation


SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

• NMOS inverter = 110 nm, PMOS inverter = 220 nm, NMOS Access = 90 nm
• NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 2.4
• Cbitline = 0.747 fF for 512 cell array (Interconnect Parasitics from ASU PTM Website)

Page 45: Research Presentation


SRAM CELL READ AND WRITE MARGIN FROM BUTTERFLY CURVE

• NMOS inverter = 150 nm, PMOS inverter = 555 nm, NMOS Access = 180 nm
• NMOSinv/NMOSaccess = 1.2, PMOSinv/NMOSaccess = 3, Cbitline = 0.747 fF
• Curve shows SRAM cell is close to write failure.
• Bitline Precharge to less than 1.1 V could be explored to increase SNM.

Page 46: Research Presentation


Simulation Setup

• M0, M1, M3, M4 form the cross coupled inverter pair
• M5, M6 are access transistors
• C1, C2 are the bitline capacitances
• M7 is the precharge switch for bitline (bit); V3 precharges the bitline to 0.8 V
• V6 precharges bitbar and writes a 0 to the cell

[Figure labels: V(write), V(ic), V(word), V(qbar), V(q), V(bitbar), V(bit)]

Page 47: Research Presentation


Timing Waveforms for Characterization

V(write) – Applied to source of M7 (precharge switch)

V(word) – Wordline Voltage

V(qbar)

V(q)

V(ic) – Enables the precharge switch M7

V(bitbar)

V(bit)

• V(write) precharges Cbit to 0.8 V via M7.
• V(word) disables access transistors M5 and M6 during precharge.
• V(qbar) and V(q) are used to generate the butterfly curves.
• V(ic) enables M7 during precharge. It could be implemented as NOT(V(word)).
• V(bitbar) precharges to 0.8 V, shows charge pumping when M7 turns off and follows V(qbar) when the wordline is enabled.
• V(bit) follows V(q) after the wordline is enabled.
• V(bit) is precharged to Vdd by V6.

Page 48: Research Presentation


PASS TRANSISTOR BASED TREE DESIGN

1:8 Row Decoder Tree

Similar Tree Decoder for 16 LSB Bits


Page 49: Research Presentation


TREE DECODER DESIGN

Page 50: Research Presentation


PASS TRANSISTOR BASED TREE DESIGN

[Figure: pass transistor switch between IN and OUT, gated by two CK inputs, with W = 880 and L = 50]

Identical Sizing for NMOS and PMOS to minimize charge injection effects

• Delay drops by ~40ps/2 for every doubling of transistor widths
• The delay stops improving around a width of 1000 nm, where it reaches 89 ps
• Used W/L of 880/50 for final tree
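The selection logic of a pass transistor tree decoder can be modelled behaviourally. The sketch below is an illustration, not the slide's circuit: it assumes a binary tree in which each address bit steers the signal through one of two pass switches per level, and it models an unselected branch as logic 0 even though a real floating node would need a discharge path. The function name `tree_decode` and its parameters are hypothetical.

```python
def tree_decode(address, num_bits=3, data=1):
    """Return the 2**num_bits leaf outputs of a binary pass-switch tree.

    address  -- row address, 0 <= address < 2**num_bits
    num_bits -- tree depth (3 gives a 1:8 decoder)
    data     -- value driven into the root of the tree
    """
    outputs = [data]  # signal at the root
    for level in reversed(range(num_bits)):  # most significant bit first
        bit = (address >> level) & 1
        next_outputs = []
        for value in outputs:
            # Each node feeds two pass switches: one conducts when the
            # address bit is 0, the other when it is 1.
            next_outputs.append(value if bit == 0 else 0)
            next_outputs.append(value if bit == 1 else 0)
        outputs = next_outputs
    return outputs


print(tree_decode(5))  # only leaf 5 carries the root signal
```

The signal crosses `num_bits` switches in series on its way to the selected leaf, which is why the switch width matters for the decoder delay.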


Page 51: Research Presentation


TREE DECODER TIMING DIAGRAMS

The following waveforms were applied to the row and column selection inputs of the tree decoder


Page 52: Research Presentation


TREE DECODER TIMING DIAGRAMS

It takes one cycle for initializing the tree decoder after which we get clean pulses for each row output

LSB pulse is wider than MSB pulse in bottom figure to allow the tree to clear present state before next


Page 53: Research Presentation


TREE DECODER TIMING DIAGRAMS

The top waveform shows the matrix point output where the row and column select inputs are high.

The output node discharges when the input goes low.

Page 54: Research Presentation


Page 55: Research Presentation


READ WRITE CIRCUIT ( Design by Bong Jin )

Sense Amplifier Write Driver

Precharge Circuit


Page 56: Research Presentation


READ WRITE CIRCUIT TEST SETUP

Bitline Capacitance estimate from ASU PTM Website

Cbit estimate for 512 rows

NMOS Switches to allow read without disabling write circuit

Single SRAM Cell for simulations


Page 57: Research Presentation


READ / WRITE TIMING WAVEFORMS

Precharge Pulse ( Active Low )

Data Meant to be written to cell

Write Enable Pulse

Read Enable Pulse

Output of Write Buffer

Disable output buffer ( tristate logic )

Bitline

Bitline Bar

Data Output

Data Out Bar


Page 58: Research Presentation


SRAM Cell Layout


Page 59: Research Presentation


2X2 SRAM Array Layout

[Figure: 2X2 SRAM array layout with rails VDD, GND and GND, wordlines WORD 1 and WORD 0, and bitlines B0, B0BAR, B1 and B1BAR]

This unit can be replicated in all directions without any changes. LVS check remaining.

Array Size = 3.7975 µm x 2.4725 µm

Page 60: Research Presentation


References

Digital Integrated Circuits, Jan Rabaey, Anantha Chandrakasan, Borivoje Nikolic (SRAM Cell Design, Decoders, Read Write Circuits)

CMOS VLSI Design, Weste and Harris (Butterfly Curves)

CMOS Circuit Design, Layout and Simulation, Baker, Li, Boyce (Decoder Design)

Course slides of Prof. Kia Bazargan (Precharge Techniques, Decoders, SRAM Cell Design)

Page 61: Research Presentation


System Diagram for developing LMS Algorithm for Channel Estimation ( H(z) )

Errors e1 and e2 (e2 being the quantized error) could have the same convergence if the channel model H(z) is adapted using an LMS model.

Next few slides show regular LMS and modified LMS Error Convergence

Adaptive DSP Course by Prof. Keshab Parhi

Page 62: Research Presentation


Error Convergence for regular LMS takes more time than the modified LMS


Page 63: Research Presentation


The modified LMS adapts each tap weight using a different error, computed from as many filter output estimates as the filter order. The assumption is that the optimum gradient direction for each tap weight is different and is given by the corresponding error.

Lattice predictors would be a more efficient way to do this than LMS, since each stage of a lattice predictor is optimum for its order, whereas the modified LMS adapts each tap weight in a suboptimal manner.
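For reference, the regular LMS baseline against which the modified scheme is compared can be sketched as follows. This is the textbook LMS update applied to channel estimation, not the modified algorithm from the slides; the function names, the 3-tap example channel, the step size and the white-noise input are all assumptions made for illustration.

```python
import random


def white_noise(count, seed=0):
    """Uniform white-noise input in [-1, 1] (illustrative test signal)."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]


def simulate_channel(h, x):
    """Output of an FIR channel H(z) with taps h driven by x."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def lms_identify(x, d, num_taps, mu):
    """Regular LMS: adapt weights w so the filter output tracks d.

    Every tap is updated with the same scalar error e[n] = d[n] - y[n].
    Returns the final weights and the error sequence.
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # Newest sample first, zero padded before the start of the data.
        window = [x[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * xk for wk, xk in zip(w, window))
        e = d[n] - y
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]
        errors.append(e)
    return w, errors


channel = [0.8, -0.4, 0.2]
x = white_noise(5000)
w, errors = lms_identify(x, simulate_channel(channel, x), num_taps=3, mu=0.05)
print([round(wk, 3) for wk in w])
```

The modified LMS described above would replace the single error `e` with one error per tap weight; that variant is not reproduced here because the slides do not give its update equations.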


Page 64: Research Presentation


EEG Spectral Estimates for Pre-Ictal, Ictal and Post-Ictal Signal Sequences


Page 65: Research Presentation


Spectral Estimation for a low pass filtered impulse sequence using different techniques


Page 66: Research Presentation


Correlograms provide best Spectral Estimates for Low Pass Filtered Impulse Trains


Page 67: Research Presentation


EE 5364 / CS 5204: Advanced Computer Architecture

Final Course Project on Design of a Branch Predictor

Prepared by:
Nirav Desai 4280229
Amanda Skinner 3749048

Course Instructor: Prof. Pen-Chung Yew

Department of ECE
University of Minnesota, Twin Cities

Page 68: Research Presentation

Nirav Desai 4280229 ECE
Amanda Skinner 3749048 CS

Why Branch Predictor?

• Branch predictors improve the flow of the instruction pipeline.
• As branch predictor accuracy increases, cache misses decrease for both data and instruction caches.

Page 69: Research Presentation


Why Branch Predictor?


Page 70: Research Presentation


Why Prefetching?

• As branch predictor accuracy increases, cache misses go down.
• Prefetching and increasing cache size decrease cache misses.

Miss rate for the Mesa benchmark. Both the L1-Data and L2 cache associativities were changed. [4]

Page 71: Research Presentation


• LA-PC runs ahead of PC and keeps track of load and store instructions

• RPT keeps track of previous reference addresses and strides for load and store instructions

• L2 Cache prefetching can be done by storing spill over data and instructions from L1 Cache blocks.

• INTEL CORE 2 Duo uses RPT for L1 Cache Prefetching and Loop Counter Local Branch Predictor

Reference Prediction Table[1]
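The stride tracking that the RPT performs can be sketched as a small table keyed by instruction address. This is a simplified illustration only: the class and method names are hypothetical, and a single "stride seen twice in a row" check stands in for whatever state machine the scheme in [1] uses.

```python
class ReferencePredictionTable:
    """Tracks, per load/store instruction, the last address and stride."""

    def __init__(self):
        # instruction address -> (previous data address, last stride)
        self.entries = {}

    def access(self, pc, address):
        """Record that the instruction at `pc` touched `address`.

        Returns the address to prefetch, or None when no stable stride
        has been observed yet (a simplifying assumption of this sketch).
        """
        if pc not in self.entries:
            self.entries[pc] = (address, 0)
            return None
        previous_address, previous_stride = self.entries[pc]
        stride = address - previous_address
        self.entries[pc] = (address, stride)
        if stride != 0 and stride == previous_stride:
            return address + stride
        return None


rpt = ReferencePredictionTable()
for data_address in (1000, 1008, 1016, 1024):
    print(rpt.access(pc=0x400, address=data_address))
```

In the scheme described on the slide, the LA-PC would drive `access` ahead of the real PC so that the prefetch is issued before the load or store executes.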

Page 72: Research Presentation


Design of Branch Predictor

• A loop counter would give high accuracy on matrix multiplication.
• Track all registers for the loop counter, since different interleaved threads may use different registers.
• A loop counter error would imply a dynamic update of registers based on non-local values.
• Tag registers giving repeated conditional branch errors in the Branch Decision Table.
• Use the O-GEHL predictor for all tagged branches.
• Using the loop counter and duplicate ALU will allow indexing long histories with limited geometric length.

Page 73: Research Presentation


Branch Decision Table

Field                     Entered by
Branch Address            LA-PC
Predicted Direction       Loop Counter or O-GEHL
Predicted Branch Target   Duplicate ALU
Actual Direction          PC
Actual Branch Target      PC
Counters Used C(i)(j)     O-GEHL
Tag                       O-GEHL

if prediction != actual decision:

  Prediction computed by Loop Counter?
    Yes: Incorrect Duplicate Register Values
      Re-Initialize Duplicate Register Stack
      Set LA-PC to PC
      After 2 successive errors make an entry in O-GEHL
      Also tag the branch address in Branch Decision Table to be used with O-GEHL

  Prediction computed by O-GEHL?
    Yes: Run the update equation on counters listed in table
      Set LA-PC to PC

Page 74: Research Presentation


Loop Counter Branch Predictor

The loop counter looks at only the conditional branches: Op-Code = 4 (beq) OR Op-Code = 5 (bne). Can be extended to bgtz, blez.

Instruction fields: Op-Code: Bits 31:26, rs: Bits 25:21, rt: Bits 20:16

Duplicate Register Flag == 0 ?
  Yes: First Conditional Branch. Copy Register Stack to Duplicate Register Stack (equivalent to initializing the duplicate register stack).
  No: Duplicate Register Stack Initialized.

Set Register Flag for rs and rt = 1. These registers will be tracked by the Duplicate ALU: do addition and subtraction for all instructions having rs and rt with register flags set to 1.

Proceed to Branch Prediction Computation:
  Op code == 4 ? Then test rs == rt ?
  Op code == 5 ? Then test rs != rt ?
    Yes: Execute
      Copy Off-Set from bits 15 to bit 0
      Sign Extend Off-Set to bit 31 (Total 32 bits)
      Left Shift by 2 (to get Word Address)
      Add to PC+4 to get Branch Target Address
    No: Inc LA-PC by 4
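The target computation in the flowchart (take bits 15:0, sign extend, shift left by 2, add to PC+4) and the beq/bne decision can be written out directly. A minimal sketch with illustrative function names; register values are passed in as plain integers, standing in for the duplicate register stack.

```python
def sign_extend_16(value):
    """Sign extend a 16-bit field to a Python integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def branch_target(pc, instruction):
    """Branch target: PC + 4 + (sign-extended bits 15:0, shifted left by 2)."""
    offset = sign_extend_16(instruction & 0xFFFF)
    return (pc + 4 + (offset << 2)) & 0xFFFFFFFF


def predict_taken(opcode, rs_value, rt_value):
    """Loop counter decision for beq (opcode 4) and bne (opcode 5)."""
    if opcode == 4:
        return rs_value == rt_value
    if opcode == 5:
        return rs_value != rt_value
    raise ValueError("only beq and bne are handled in this sketch")


def next_la_pc(pc, instruction, rs_value, rt_value):
    """Next look-ahead PC: the branch target if taken, otherwise PC + 4."""
    opcode = (instruction >> 26) & 0x3F
    if predict_taken(opcode, rs_value, rt_value):
        return branch_target(pc, instruction)
    return (pc + 4) & 0xFFFFFFFF


# beq $1, $2, -3 encoded as opcode 4, rs 1, rt 2, offset 0xFFFD.
print(hex(next_la_pc(0x00400010, 0x1022FFFD, rs_value=7, rt_value=7)))
```

Extending this to bgtz and blez, as the slide suggests, would only add two more cases to `predict_taken`.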

Page 75: Research Presentation


O-GEHL Branch Predictor[2]

[Figure: ten tables of counters C(i)(j). Table 1 holds C(1)(1) to C(1)(2), table 2 holds C(2)(1) to C(2)(4), table 3 holds C(3)(1) to C(3)(9), and so on up to table 10, which holds C(10)(1) to C(10)(266).]

History Lengths go in Geometric Progression given by L(i) = α^(i-1) L(1) + constant

Best Series found from experiments: 2, 4, 9, 12, 18, 31, 54, 114, 145, 266

Dynamic History length fitting with variable α also possible.

Sum = C(i)(j) + C(i+1)(k) + … + C(i+9)(l)

• j, k, l, ... are incremented on every unconditional branch.
• j increments are modulo 2, k increments are modulo 4, l increments are modulo 266.
• Each C(i)(j) is a 4 bit saturating counter that counts -8 to 7.
• Counter Update given by:
    if (p != out)
        if (branch == taken) c(i)(j)++
        if (branch != taken) c(i)(j)--
• Dynamic Threshold (θ) Fitting possible
• Threshold (θ) by default is 0.
    Sum > θ then p = taken
    Sum < θ then p = not taken
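The counter arithmetic on this slide can be sketched as below. The code follows the slide's simplified description (one running index per table, incremented modulo the table size) and should not be read as the predictor defined in [2]; the class name is hypothetical, and treating Sum == θ as not taken is an assumption, since the slide leaves that case open.

```python
SERIES = [2, 4, 9, 12, 18, 31, 54, 114, 145, 266]  # table sizes from the slide


class SlideGehlSketch:
    """Sum of one 4-bit saturating counter per table, compared with theta."""

    def __init__(self, sizes=SERIES, theta=0):
        self.tables = [[0] * size for size in sizes]
        self.index = [0] * len(sizes)  # j, k, ..., l on the slide
        self.theta = theta

    def predict(self):
        """Return True for taken when the selected counters sum above theta."""
        total = sum(table[i] for table, i in zip(self.tables, self.index))
        return total > self.theta  # assumption: a tie counts as not taken

    def update(self, predicted, taken):
        """On a misprediction, move each selected counter toward the outcome."""
        if predicted == taken:
            return
        for table, i in zip(self.tables, self.index):
            if taken:
                table[i] = min(7, table[i] + 1)   # saturate at +7
            else:
                table[i] = max(-8, table[i] - 1)  # saturate at -8

    def advance(self):
        """Increment every index modulo its table size."""
        self.index = [(i + 1) % len(t) for i, t in zip(self.index, self.tables)]


predictor = SlideGehlSketch()
guess = predictor.predict()
predictor.update(guess, taken=True)
print(guess, predictor.predict())
```

A dynamic threshold, which the slide mentions as possible, would adjust `theta` at run time; that part is left out of the sketch.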

Page 76: Research Presentation


Duplicate ALU ( for MIPS )[3]

[Figure: the LA-PC address fetches an instruction into the Duplicate Instruction Queue. The Decode Unit splits it into Op Code (bits 31-26), Reg 1 (bits 25-21), Reg 2 (bits 20-16) and Reg 3 (bits 15-11).]

Compare Op-Code:
  Op-Code == 4 OR 5 (beq, bne): Use Loop Counter
  Op-Code == 2 OR 3 (jump, jal): Always take
  Op-Code == 0 & FUNCT == 8 OR 9 (jr, jalr): Always take

Branch Target for Jump (32 bits):
  bits 31:28: 4 MSB bits of current PC+4
  bits 27:2: Jump Target from instruction
  bits 1:0: 00 (Word Addresses)

Branch Target for Branch (32 bits): Current PC + 4 + bits 15:0 left shifted by 2 to give word addresses

Compare Register Flags for reg1, reg2, reg3. If register flags set, do the computation for:
  Op-Code: 0, bits(5:0) 32: add r1, r2, r3
  Op-Code: 0, bits(5:0) 34: sub r1, r2, r3
  Op-Code: 0, bits(5:0) 33: addu r1, r2, r3
  Op-Code: 0, bits(5:0) 35: subu r1, r2, r3
  Op-Code: 8: addi r1, constant
  Op-Code: 9: addiu r1, constant

• Set LA-PC Busy bit on instruction read
• When LA-PC updated by branch predictors, busy bit reset
• For arithmetic, reset busy bit after 2 cycles
• Instruction read when busy bit reset
• LA-PC different from that used in RPT

This branch predictor can be used on Multi Threaded CPUs
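The decode step and the jump target rule can be illustrated with plain bit operations on a 32-bit MIPS instruction word. A minimal sketch: the function names and the returned labels such as "loop_counter" are illustrative, and only the cases listed on the slide are classified.

```python
def decode(instruction):
    """Split a 32-bit MIPS instruction into the fields used by the decode unit."""
    return {
        "opcode": (instruction >> 26) & 0x3F,  # bits 31:26
        "reg1": (instruction >> 21) & 0x1F,    # bits 25:21
        "reg2": (instruction >> 16) & 0x1F,    # bits 20:16
        "reg3": (instruction >> 11) & 0x1F,    # bits 15:11
        "funct": instruction & 0x3F,           # bits 5:0
    }


def jump_target(pc, instruction):
    """Bits 31:28 from PC+4, bits 27:2 from the instruction, bits 1:0 = 00."""
    return ((pc + 4) & 0xF0000000) | ((instruction & 0x03FFFFFF) << 2)


def classify(instruction):
    """Decide which predictor path handles the instruction."""
    fields = decode(instruction)
    if fields["opcode"] in (4, 5):                                # beq, bne
        return "loop_counter"
    if fields["opcode"] in (2, 3):                                # j, jal
        return "always_take"
    if fields["opcode"] == 0 and fields["funct"] in (8, 9):       # jr, jalr
        return "always_take"
    if fields["opcode"] == 0 and fields["funct"] in (32, 33, 34, 35):
        return "duplicate_alu"                                    # add, addu, sub, subu
    if fields["opcode"] in (8, 9):                                # addi, addiu
        return "duplicate_alu"
    return "other"


print(classify(0x08100008), hex(jump_target(0x00400000, 0x08100008)))
```

For jr and jalr the target comes from a register rather than from the instruction word, so `jump_target` applies only to j and jal.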

Page 77: Research Presentation


Test results on O-GEHL Branch Predictor[5]


Page 78: Research Presentation


References

1. An Effective On-Chip Preloading Scheme to Reduce Data Access Penalty. Jean-Loup Baer, Tien-Fu Chen, Department of Computer Science and Engineering, University of Washington, Seattle, WA 98195. Supercomputing '91: Proceedings of the 1991 ACM/IEEE Conference on Supercomputing.

2. The O-GEHL Branch Predictor. Andre Seznec. The 1st JILP Championship Branch Prediction Competition, CBP1 (2004). Available from www.jilp.org

3. Computer Organization and Design: The Hardware/Software Interface. David Patterson and John Hennessy.

4. http://en.wikipedia.org/wiki/CPU_cache

5. Analysis of the Optimized GEHL Predictor. Andre Seznec. Available from: http://www.irisa.fr/caps/people/seznec/ISCA05.pdf

Page 79: Research Presentation


Research Ideas I am working on right now

Page 80: Research Presentation


Strained Silicon on SiGe Solar Cell

• Requires Chemical Vapor Deposition or MBE techniques for fabrication

• A tandem solar cell design combines layers with different band gaps, which gives a wide band of absorbable frequencies.

• Optimal thickness at quarter wavelength will give maximum absorption at designed frequency

• Back plate metal contacts and top plate fingered contacts

• Economically viable for charging battery packs in electric vehicles and for replacing LPG cooking gas cylinders.

• Long term viability for power generation feasible due to low operating costs and low distribution costs in a distributed model.

• Reference: Usami, N.; Takahashi, T.; Fujiwara, K.; Ujihara, T.; Sazaki, G.; Murakami, Y.; Nakajima, K. (Inst. for Mater. Res., Tohoku Univ., Sendai, Japan). Si/multicrystalline-SiGe heterostructure as a candidate for solar cells with high conversion efficiency. Conference Record of the Twenty-Ninth IEEE Photovoltaic Specialists Conference, 19-24 May 2002, pp. 247-249.

Page 81: Research Presentation


Rake Receiver with MDS Codes

• Rake receivers could be used to identify strongest multi path component from a received signal.

• This could be achieved by correlating the received signal with itself over different delays and finding the strongest delay component.

• This does not involve maximal ratio combining.

• It could be combined with MDS codes for wireless communications where given any d bits corrupted by channel noise or multi path effects, the signal could still be recovered uniquely.

• Reference: Lectures of Prof. Cutter on iTunesU under the course on Digital Communications 2 taught at MIT.

• Reference: W-CDMA Rake Receiver implementation in DSP: EE Times: Link: http://www.eetimes.com/electronics-news/4139933/W-CDMA-RAKE-Receiver-Comes-to-Life-in-DSP

• Reference: A Rake Receiver for Maximal Ratio Combining without Channel Estimation for UWB Communications: http://digitalcommons.unf.edu/cgi/viewcontent.cgi?article=1044&context=ojii_volumes
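The idea of correlating the received signal with itself over different delays and picking the strongest one can be sketched as follows. This is an illustration under stated assumptions: the transmitted sequence is random +/-1 chips, the channel is a direct path plus two weaker echoes, and the helper `multipath_signal` and its tap values are invented for the example.

```python
import random


def multipath_signal(count, taps, seed=0):
    """Random +/-1 chips passed through a multipath channel.

    taps maps a delay in samples to the gain of that path (illustrative values).
    """
    rng = random.Random(seed)
    chips = [rng.choice((-1.0, 1.0)) for _ in range(count)]
    return [
        sum(gain * chips[n - delay] for delay, gain in taps.items() if n - delay >= 0)
        for n in range(count)
    ]


def autocorrelation(signal, lag):
    """Correlation of the signal with a copy of itself delayed by `lag` samples."""
    return sum(signal[n] * signal[n - lag] for n in range(lag, len(signal)))


def strongest_delay(signal, max_lag):
    """Non-zero lag with the largest autocorrelation magnitude."""
    return max(range(1, max_lag + 1), key=lambda lag: abs(autocorrelation(signal, lag)))


received = multipath_signal(2000, taps={0: 1.0, 5: 0.6, 9: 0.3}, seed=1)
print(strongest_delay(received, max_lag=20))
```

Only the single strongest delay is reported, in keeping with the bullet above that this approach does not involve maximal ratio combining.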

Page 82: Research Presentation


Class S RF Power Amplifiers on GaN HEMTs

• Class S RF Power Amplifiers with fully differential H-Bridge topology could give a theoretical 100% efficiency.

• GaN HEMTs give the best high frequency switching characteristics.

• The 2 features could be combined to give a high efficiency RF power amplifier topology.

• Reference: Ph.D. Dissertation of Stephan Maroldt, University of Freiburg

Page 83: Research Presentation


Microprocessor Design

• The attached slides describe the design of a 16 bit Brent Kung Adder and 1024x16 asynchronous SRAM in 45 nm CMOS along with the design of a branch predictor and cache prefetch unit for a MIPS microprocessor.

• These design ideas could be combined with other ideas for pipeline design, ALU design and interconnect circuit design to give a full physical layer design of a MIPS microprocessor in 45 nm CMOS.

• Various power reduction and clock gating techniques could be applied at a higher level of the hierarchy.

Page 84: Research Presentation


mm-wave MIMO OFDM

• mm-wave MIMO OFDM could be used for wireless backhaul networks due to its high capacity

• mm-wave MIMO systems could be extended to 2x2, 4x4, 8x8, etc topologies to exploit spatial diversity and get higher data rate.

• Reference: Sheldon, C.; Munkyo Seo; Torkildson, E.; Rodwell, M.; Madhow, U. (Dept. of Electr. & Comput. Eng., Univ. of California, Santa Barbara, CA, USA). 4 channel spatial multiplexing over a mm-wave line of sight link. IEEE MTT-S International Microwave Symposium Digest, 2009 (MTT '09), 7-12 June 2009, pp. 389-392.

Page 85: Research Presentation


Routing algorithm to reduce congestion

• The routing algorithm to reduce congestion could be based on the idea of sparsity.

• High congestion nodes could be dropped from the network map till congestion on the node drops.

• The underlying packet streams would be using a flow control based routing protocol.

• Each node would store a map of the network which would be updated periodically using ping back messages.

• Could be applied to packet switched networks, traffic control and wireless sensor networks.
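One way to read the "drop congested nodes from the map" idea is as a shortest-path search over the stored network map that ignores any intermediate node whose congestion is above a threshold. The sketch below is an interpretation, not the author's algorithm: the function name `route`, the adjacency-list map, the congestion scale of 0 to 1 and the threshold are all assumptions.

```python
from collections import deque


def route(network, congestion, source, destination, threshold):
    """Fewest-hop path that avoids congested intermediate nodes.

    network    -- dict: node -> list of neighbour nodes (the stored map)
    congestion -- dict: node -> congestion level, assumed to lie in [0, 1]
    threshold  -- nodes above this level are treated as dropped from the map
    Returns the path as a list of nodes, or None when no route remains.
    """
    def usable(node):
        # End points are never dropped; other nodes must be below the threshold.
        return node in (source, destination) or congestion.get(node, 0.0) <= threshold

    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in network.get(node, ()):
            if neighbour not in previous and usable(neighbour):
                previous[neighbour] = node
                queue.append(neighbour)
    return None


network = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "E": ["D"], "D": []}
print(route(network, {"B": 0.9}, "A", "D", threshold=0.5))
```

When the congestion on a node falls back below the threshold, the periodic map update would make it usable again and the shorter route through it would be selected.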

Page 86: Research Presentation


Photonic Computers

• These could use multiplexer based logic gates.

• Photonic multiplexers have been widely researched and developed for optical communications.

• Phase detectors could be used to identify the phase and thus the value of the stored signal.

• These would use electronic charge storage and high speed electro-optic conversion.

• Reference: Prior research on this has been carried out at UCSB.