cs 61c: great ideas in computer architecture

CS 61C: Great Ideas in Computer Architecture
Performance and Floating-Point Arithmetic

Instructors: Krste Asanović & Randy H. Katz

http://inst.eecs.berkeley.edu/~cs61c/fa17

10/24/17, Fall 2017, Lecture #17

Outline
• Defining Performance
• Floating-Point Standard
• And in Conclusion…

What is Performance?
• Latency (or response time or execution time)
  – Time to complete one task
• Bandwidth (or throughput)
  – Tasks completed per unit time

Cloud Performance: Why Application Latency Matters
• Key figure of merit: application responsiveness
  – Longer the delay, the fewer the user clicks, the less the user happiness, and the lower the revenue per user

CPU Performance Factors
• A program executes instructions
• CPU Time/Program
    = Clock Cycles/Program × Clock Cycle Time
    = Instructions/Program × Average Clock Cycles/Instruction × Clock Cycle Time
• 1st term called Instruction Count
• 2nd term abbreviated CPI for average Clock Cycles Per Instruction
• 3rd term is 1/Clock rate

Restating Performance Equation
• Time = Seconds/Program
       = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle

What Affects Each Component? Instruction Count, CPI, Clock Rate

Hardware or software component?   Affects what?
Algorithm                         Instruction Count, CPI
Programming Language              Instruction Count, CPI
Compiler                          Instruction Count, CPI
Instruction Set Architecture      Instruction Count, Clock Rate, CPI

Peer Instruction
• Which computer has the highest performance for a given program?

Computer   Clock frequency   Clock cycles per instruction   # instructions per program
RED        1.0 GHz           2                              1000
GREEN      2.0 GHz           5                              800
ORANGE     0.5 GHz           1.25                           400
           5.0 GHz           10                             2000
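The table can be checked by plugging each row into the performance equation (time = instructions × CPI ÷ clock rate). This is a minimal sketch: the function name `cpu_time` is made up for illustration, and "UNLABELED" is only a placeholder for the fourth row, whose label is missing from these notes.

```python
# Execution time = instruction count x CPI / clock rate.
# Rows copied from the table; "UNLABELED" is a placeholder name for the
# fourth row, whose label does not appear in the notes.
machines = {
    "RED": (1.0e9, 2.0, 1000),
    "GREEN": (2.0e9, 5.0, 800),
    "ORANGE": (0.5e9, 1.25, 400),
    "UNLABELED": (5.0e9, 10.0, 2000),
}

def cpu_time(clock_hz, cpi, instructions):
    """Return execution time in seconds."""
    return instructions * cpi / clock_hz

times = {name: cpu_time(*row) for name, row in machines.items()}
for name, seconds in times.items():
    print(f"{name:10s} {seconds * 1e6:.2f} microseconds")

# Highest performance for this program means shortest execution time.
fastest = min(times, key=times.get)
print("Shortest execution time:", fastest)
```

Note that the highest clock frequency does not give the shortest time here: all three factors matter.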

Peer Instruction
Which system is faster?

System   Rate (Task 1)   Rate (Task 2)
A        10              20
B        20              10

• RED: System A
• GREEN: System B
• ORANGE: Same performance
• Unanswerable question!

Workload and Benchmark
• Workload: Set of programs run on a computer
  – Actual collection of applications run or made from real programs to approximate such a mix
  – Specifies both programs and relative frequencies
• Benchmark: Program selected for use in comparing computer performance
  – Benchmarks form a workload
  – Usually standardized so that many use them

SPEC (System Performance Evaluation Cooperative)
• Computer vendor cooperative for benchmarks, started in 1989
• SPEC CPU2006
  – 12 Integer Programs
  – 17 Floating-Point Programs
• Often turned into a number where bigger is faster
• SPECratio: reference execution time on old reference computer divided by execution time on new computer, reported as speed-up
• (SPEC 2017 just released)

SPECINT2006 on AMD Barcelona

Description                         Instruction   CPI     Clock cycle   Execution   Reference   SPECratio
                                    Count (B)             time (ps)     Time (s)    Time (s)
Interpreted string processing       2,118         0.75    400           637         9,770       15.3
Block-sorting compression           2,389         0.85    400           817         9,650       11.8
GNU C compiler                      1,050         1.72    400           724         8,050       11.1
Combinatorial optimization          336           10.0    400           1,345       9,120       6.8
Go game                             1,658         1.09    400           721         10,490      14.6
Search gene sequence                2,783         0.80    400           890         9,330       10.5
Chess game                          2,176         0.96    400           837         12,100      14.5
Quantum computer simulation         1,623         1.61    400           1,047       20,720      19.8
Video compression                   3,102         0.80    400           993         22,130      22.3
Discrete event simulation library   587           2.94    400           690         6,250       9.1
Games/path finding                  1,082         1.79    400           773         7,020       9.1
XML parsing                         1,058         2.70    400           1,143       6,900       6.0
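As a sanity check, the Execution Time and SPECratio columns can be recomputed from the other columns for a few rows. This is a sketch with illustrative function names. The results differ from the table in the last digit for some rows, presumably because the CPI column is rounded.

```python
# Recompute two columns of the table for a few rows (values copied from the table).
rows = [
    # (description, instruction count in billions, CPI, cycle time in ps, reference time in s)
    ("Interpreted string processing", 2118, 0.75, 400, 9770),
    ("Combinatorial optimization", 336, 10.0, 400, 9120),
    ("XML parsing", 1058, 2.70, 400, 6900),
]

def execution_time(instr_billions, cpi, cycle_ps):
    """Seconds = instructions x cycles/instruction x seconds/cycle."""
    return instr_billions * 1e9 * cpi * cycle_ps * 1e-12

def spec_ratio(reference_s, execution_s):
    """Reference time divided by measured time; bigger is faster."""
    return reference_s / execution_s

for name, ic, cpi, cycle, ref in rows:
    t = execution_time(ic, cpi, cycle)
    print(f"{name}: {t:.0f} s, SPECratio {spec_ratio(ref, t):.1f}")
```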

Summarizing SPEC Performance
• Varies from 6x to 22x faster than reference computer
• Geometric mean of ratios: N-th root of product of N ratios
  – Geometric mean gives same relative answer no matter what computer is used as reference
• Geometric mean for Barcelona is 11.7
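The geometric mean can be reproduced from the twelve SPECratios in the SPECINT2006 table. The sketch below also illustrates the reference-independence claim; the machine and reference times in that second part are made up purely for illustration.

```python
import math

# SPECratio column of the SPECINT2006 table for AMD Barcelona.
ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

def geometric_mean(values):
    """N-th root of the product of N values, computed via logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geometric_mean(ratios), 1))

# Reference independence, with hypothetical execution times (seconds):
times_a = [2.0, 8.0, 5.0]    # machine A on three benchmarks
times_b = [4.0, 4.0, 10.0]   # machine B on the same benchmarks
quotients = []
for ref in ([10.0, 10.0, 10.0], [1.0, 50.0, 7.0]):  # two different reference machines
    gm_a = geometric_mean([r / t for r, t in zip(ref, times_a)])
    gm_b = geometric_mean([r / t for r, t in zip(ref, times_b)])
    quotients.append(gm_a / gm_b)
print(quotients)  # A-versus-B comparison comes out the same for both references
```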

Administrivia
• Midterm 2 in one week!
  – Guerrilla Session tonight 7-9pm in Cory 293
  – ONE double-sided cheat sheet
  – Stay tuned for room assignments
  – Review session: Saturday 10-12pm in Hearst Annex A1
• Homework 4 released!
  – Caches and Floating Point
  – Due after the midterm
  – Still good cache practice!
• Proj2-Part2 released
  – Due after the midterm, but good to do before!

Outline
• Defining Performance
• Floating-Point Standard
• And in Conclusion…

Quick Number Review
• Computers deal with numbers
• What can we represent in N bits?
  – 2^N things, and no more! They could be…
    • Unsigned integers: 0 to 2^N - 1 (for N = 32, 2^N - 1 = 4,294,967,295)
    • Signed integers (two’s complement): -2^(N-1) to 2^(N-1) - 1 (for N = 32, 2^(N-1) = 2,147,483,648)
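The two ranges follow directly from the formulas. A small sketch, with function names chosen here for illustration:

```python
def unsigned_range(n_bits):
    """Smallest and largest unsigned integers representable in n_bits."""
    return 0, 2**n_bits - 1

def twos_complement_range(n_bits):
    """Smallest and largest two's complement integers representable in n_bits."""
    return -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1

print(unsigned_range(32))
print(twos_complement_range(32))
```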

Other Numbers
1. Very large numbers? (seconds/millennium)
   ⇒ 31,556,926,000_ten (3.1556926_ten × 10^10)
2. Very small numbers? (Bohr radius)
   ⇒ 0.0000000000529177_ten m (5.29177_ten × 10^-11)
3. Numbers with both integer & fractional parts?
   ⇒ 1.5
• First consider #3
• …our solution will also help with #1 and #2

Goals for Floating Point
• Standard arithmetic for reals for all computers
  – Like two’s complement
• Keep as much precision as possible in formats
• Help programmer with errors in real arithmetic
  – +∞, -∞, Not-A-Number (NaN), exponent overflow, exponent underflow
• Keep encoding that is somewhat compatible with two’s complement
  – E.g., 0 in Fl. Pt. is 0 in two’s complement
  – Make it possible to sort without needing to do floating-point comparison

Representation of Fractions
• “Binary Point”, like decimal point, signifies boundary between integer and fractional parts:

     x     x   .   y      y      y      y
    2^1   2^0     2^-1   2^-2   2^-3   2^-4

Example 6-bit representation: 10.1010_two = 1×2^1 + 1×2^-1 + 1×2^-3 = 2.625_ten

If we assume “fixed binary point”, range of 6-bit representations with this format: 0 to 3.9375 (almost 4)
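A fixed binary point amounts to reading the bits as an integer and dividing by a power of two. A minimal sketch for the xx.yyyy format above (the helper name is illustrative):

```python
def fixed_to_float(bits, frac_bits):
    """Interpret a string of 0s and 1s as unsigned fixed point with frac_bits fractional bits."""
    return int(bits, 2) / 2**frac_bits

print(fixed_to_float("101010", 4))  # the example pattern 10.1010
print(fixed_to_float("111111", 4))  # largest 6-bit pattern, 11.1111
```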

Fractional Powers of 2

 i    2^-i
 0    1.0                  1
 1    0.5                  1/2
 2    0.25                 1/4
 3    0.125                1/8
 4    0.0625               1/16
 5    0.03125              1/32
 6    0.015625
 7    0.0078125
 8    0.00390625
 9    0.001953125
10    0.0009765625
11    0.00048828125
12    0.000244140625
13    0.0001220703125
14    0.00006103515625
15    0.000030517578125

Representation of Fractions with Fixed Pt.

What about addition and multiplication?

Addition is straightforward:

      01.100    1.5_ten
    + 00.100    0.5_ten
      10.000    2.0_ten

Multiplication a bit more complex:

          01.100    1.5_ten
        × 00.100    0.5_ten
           00000
          00000
         01100
        00000
       00000
      0000110000

Where’s the answer, 0.11? (e.g., 0.5 + 0.25; need to remember where point is!)
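The bookkeeping for the binary point can be sketched with plain integers: a sum keeps the same number of fractional bits, while a raw product has twice as many. The names `FRAC_BITS` and `to_fixed` are illustrative, and the inputs are assumed to be exactly representable.

```python
FRAC_BITS = 3  # the xx.yyy format used in the example

def to_fixed(x):
    """Scale x to an integer with FRAC_BITS fractional bits (assumes x is exactly representable)."""
    return int(x * 2**FRAC_BITS)

a = to_fixed(1.5)   # 0b01100
b = to_fixed(0.5)   # 0b00100
total = a + b       # sum still has FRAC_BITS fractional bits
product = a * b     # raw product has 2 * FRAC_BITS fractional bits

print(format(total, "05b"), total / 2**FRAC_BITS)
print(format(product, "010b"), product / 2 ** (2 * FRAC_BITS))
```

The product bits 0000110000 only mean 0.11_two (0.75) once the point is placed six positions from the right.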

Representation of Fractions
Our examples used a “fixed” binary point. What we really want is to “float” the binary point to make most effective use of limited bits.

Example: put 0.1640625_ten into binary. Represent with 5 bits, choosing where to put the binary point.

    … 000000.001010100000 …

Store these bits and keep track of the binary point 2 places to the left of the MSB.
Any other solution would lose accuracy!

• With floating-point representation, each numeral carries an exponent field recording the whereabouts of its binary point
• Binary point can be outside the stored bits, so very large and small numbers can be represented
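One way to see which 5 bits get stored is to write the value as an integer times a power of two. This sketch uses Python's `fractions` module; the variable names are illustrative.

```python
from fractions import Fraction

value = Fraction("0.1640625")  # exact rational value, reduced to lowest terms

# For a value with a finite binary expansion the denominator is a power of two,
# so value = numerator * 2**exponent with a negative exponent.
numerator, denominator = value.numerator, value.denominator
exponent = -(denominator.bit_length() - 1)

print(format(numerator, "b"), exponent)     # stored bits and where the point sits
print(numerator * 2.0**exponent)            # back to the original value
```

The stored bits are 10101 with a scale of 2^-7, which places the binary point two positions to the left of the MSB (0.0010101_two).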

Scientific Notation (in Decimal)

    6.02_ten × 10^23
    (mantissa: 6.02; decimal point: after the 6; radix (base): 10; exponent: 23)

• Normalized form: no leading 0s (exactly one digit to left of decimal point)
• Alternatives to representing 1/1,000,000,000
  – Normalized: 1.0 × 10^-9
  – Not normalized: 0.1 × 10^-8, 10.0 × 10^-10

Scientific Notation (in Binary)

    1.01_two × 2^-1
    (mantissa: 1.01; “binary point”: after the leading 1; radix (base): 2; exponent: -1)

• Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers
  – Declare such variable in C as float
    • double for double precision

Floating-Point Representation (1/4)
• 32-bit word has 2^32 patterns, so must be approximation of real numbers
• IEEE 754 Floating-Point Standard:
  – 1 bit for sign (s) of floating-point number
  – 8 bits for exponent (E)
  – 23 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit)

      (-1)^s × (1 + F) × 2^E, where 1.0 ≤ (1 + F) < 2

• Can represent from roughly 2.0 × 10^-38 to 2.0 × 10^38

Floating-Point Representation (2/4)
• Normal format: +1.xxx…x_two × 2^(yyy…y_two)
• Multiple of Word Size (32 bits)

    Bits     31      30 to 23    22 to 0
    Field    S       Exponent    Significand
    Width    1 bit   8 bits      23 bits

• S represents Sign
• Exponent represents y’s
• Significand represents x’s
• Represent numbers as small as 2.0_ten × 10^-38 to as large as 2.0_ten × 10^38

Floating-Point Representation (3/4)
• What if result too large? (> 2.0×10^38, < -2.0×10^38)
  – Overflow! ⇒ Exponent larger than represented in 8-bit Exponent field
• What if result too small? (> 0 & < 2.0×10^-38, < 0 & > -2.0×10^-38)
  – Underflow! ⇒ Negative exponent larger than represented in 8-bit Exponent field

    overflow  |  -2×10^38 … -1 … -2×10^-38  |  underflow (around 0)  |  2×10^-38 … 1 … 2×10^38  |  overflow

• What would help reduce chances of overflow and/or underflow?

Floating-Point Representation (4/4)
• What about bigger or smaller numbers?
• IEEE 754 Floating-Point Standard: Double Precision (64 bits)
  – 1 bit for sign (s) of floating-point number
  – 11 bits for exponent (E)
  – 52 bits for fraction (F) (get 1 extra bit of precision if leading 1 is implicit)

      (-1)^s × (1 + F) × 2^E

• Can represent from roughly 2.0 × 10^-308 to 2.0 × 10^308
• 32-bit format called Single Precision

Which is Less? (i.e., closer to -∞)
• 0 vs. 1 × 10^-127?
• 1 × 10^-126 vs. 1 × 10^-127?
• -1 × 10^-127 vs. 0?
• -1 × 10^-126 vs. -1 × 10^-127?

Floating Point: Representing Very Small Numbers
• Zero: Bit pattern of all 0s is encoding for 0.000
  ⇒ But 0 in exponent should mean most negative exponent (want 0 to be next to smallest real)
  ⇒ Can’t use two’s complement (10000000_two)
• Bias notation: subtract bias from exponent
  – Single precision uses bias of 127; DP uses 1023
• 0 uses 00000000_two ⇒ 0 - 127 = -127; ∞, NaN uses 11111111_two ⇒ 255 - 127 = +128
  – Smallest SP real can represent: 1.00…00 × 2^-126
  – Largest SP real can represent: 1.11…11 × 2^+127
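The bias arithmetic and the resulting single-precision range can be sketched in a few lines. `BIAS` and the helper name are illustrative. The magnitudes are computed in Python's double-precision floats, which can hold these single-precision values exactly.

```python
BIAS = 127  # single precision; double precision uses 1023

def actual_exponent(field):
    """Exponent value encoded by an 8-bit exponent field under bias notation."""
    return field - BIAS

# Fields 0 and 255 are reserved, so normalized numbers use fields 1 to 254.
print(actual_exponent(1), actual_exponent(127), actual_exponent(254))

smallest = 1.0 * 2.0 ** actual_exponent(1)               # 1.00...00 x 2^-126
largest = (2 - 2.0**-23) * 2.0 ** actual_exponent(254)   # 1.11...11 x 2^+127
print(smallest, largest)
```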

IEEE 754 Floating-Point Standard (1/3)
Single Precision (Double Precision similar):

    Bits     31      30 to 23    22 to 0
    Field    S       Exponent    Significand
    Width    1 bit   8 bits      23 bits

• Sign bit: 1 means negative, 0 means positive
• Significand in sign-magnitude format (not 2’s complement)
  – To pack more bits, leading 1 implicit for normalized numbers
  – 1 + 23 bits single, 1 + 52 bits double
  – Always true: 0 ≤ Significand < 1 (for normalized numbers)
• Note: 0 has no leading 1, so reserve exponent value 0 just for number 0

IEEE 754 Floating-Point Standard (2/3)
• IEEE 754 uses “biased exponent” representation
  – Designers wanted FP numbers to be used even if no FP hardware; e.g., sort records with FP numbers using integer compares
  – Wanted bigger (integer) exponent field to represent bigger numbers
  – 2’s complement poses a problem (because negative numbers look bigger)
    • Use just magnitude and offset by half the range
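The "sort with integer compares" goal can be tried out by ordering values by their raw bit patterns. This sketch is restricted to non-negative values: because the format is sign-magnitude, negative numbers would need extra handling that is not shown here. The helper name and sample values are illustrative.

```python
import struct

def float_bits(x):
    """Return the 32-bit single-precision pattern of x as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

values = [3.0, 0.0, 0.15625, 1.0e10, 2.5, 1.0e-10]  # non-negative examples only
by_bits = sorted(values, key=float_bits)             # integer comparison of bit patterns

print(by_bits)
print(by_bits == sorted(values))  # same order as a floating-point comparison
```

The biased exponent sits in the high-order bits, so a larger exponent always gives a larger integer pattern.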

IEEE 754 Floating-Point Standard (3/3)
• Called Biased Notation, where bias is number subtracted to get final number
  – IEEE 754 uses bias of 127 for single precision
  – Subtract 127 from Exponent field to get actual exponent value
• Summary (single precision):

    Bits     31      30 to 23    22 to 0
    Field    S       Exponent    Significand
    Width    1 bit   8 bits      23 bits

    (-1)^S × (1 + Significand) × 2^(Exponent - 127)

• Double precision identical, except with exponent bias of 1023 (half, quad similar)
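The summary formula translates almost directly into code. A minimal sketch for normalized numbers only (exponent field 1 to 254); the function name and the two test patterns are chosen here for illustration.

```python
def decode_single(bits):
    """Apply (-1)^S x (1 + Significand) x 2^(Exponent - 127) to a 32-bit pattern.

    Valid for normalized numbers only; exponent fields 0 and 255 are special cases.
    """
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    significand = (bits & 0x7FFFFF) / 2**23   # fraction in [0, 1)
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)

print(decode_single(0x3FC00000))  # S=0, Exponent=127, Significand=0.5
print(decode_single(0xC0A00000))  # S=1, Exponent=129, Significand=0.25
```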

Bias Notation (+127)
[Figure: exponent field values, “How it is encoded” versus “How it is interpreted”, with annotations “∞, NaN”, “Getting closer to zero”, and “Zero”]

Peer Instruction
• Guess this Floating Point number: 11000000010000000000000000000000
• RED: -1 × 2^128
• GREEN: +1 × 2^-128
• ORANGE: -1 × 2^1
• -1.5 × 2^1
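One way to check a guess is to split the pattern into its three fields and let Python's `struct` module interpret the same 32 bits as a single-precision float. A sketch, with illustrative variable names:

```python
import struct

pattern = "11000000010000000000000000000000"

# Field boundaries for single precision: 1 sign bit, 8 exponent bits, 23 significand bits.
sign, exponent, significand = pattern[0], pattern[1:9], pattern[9:]
value = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]

print("sign bit:", sign)
print("exponent field:", int(exponent, 2), "actual exponent:", int(exponent, 2) - 127)
print("significand bits:", significand)
print("value:", value)
```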

More Floating Point
• What about 0?
  – Bit pattern all 0s means 0, so no implicit leading 1
• What if divide 1 by 0?
  – Can get infinity symbols +∞, -∞
  – Sign bit 0 or 1, largest exponent (all 1s), 0 in fraction
• What if do something stupid? (∞ - ∞, 0 ÷ 0)
  – Can get special symbols NaN for “Not-a-Number”
  – Sign bit 0 or 1, largest exponent (all 1s), not zero in fraction
• What if result is too big? (2×10^308 × 2×10^2)
  – Get overflow in exponent, alert programmer!
• What if result is too small? (2×10^-308 ÷ 2×10^2)
  – Get underflow in exponent, alert programmer!

Representation for ±∞
• In FP, divide by 0 should produce ±∞, not overflow
• Why?
  – OK to do further computations with ∞. E.g., X/0 > Y may be a valid comparison
• IEEE 754 represents ±∞
  – Most positive exponent reserved for ∞
  – Significand all zeroes

Representation for 0
• Represent 0?
  – Exponent all zeroes
  – Significand all zeroes
  – What about sign? Both cases valid
    +0: 0 00000000 00000000000000000000000
    -0: 1 00000000 00000000000000000000000

Special Numbers
• What have we defined so far? (Single Precision)

Exponent   Significand   Object
0          0             0
0          nonzero       ???
1-254      anything      +/- fl. pt. #
255        0             +/- ∞
255        nonzero       ???

Representation for Not-a-Number
• What do I get if I calculate sqrt(-4.0) or 0/0?
  – If ∞ not an error, these shouldn’t be either
  – Called Not a Number (NaN)
  – Exponent = 255, Significand nonzero
• Why is this useful?
  – Hope NaNs help with debugging?
  – They contaminate: op(NaN, X) = NaN
  – Can use the significand to identify which! (e.g., quiet NaNs and signaling NaNs)
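The "contaminate" behavior is easy to observe. One caveat for this sketch: Python itself raises exceptions for `math.sqrt(-4.0)` and `0.0 / 0.0` rather than returning NaN, so the NaN here is produced with ∞ - ∞ instead. Python floats are double precision, but the NaN rules are the same.

```python
import math

inf = float("inf")
nan = inf - inf              # infinity minus infinity has no meaningful value

print(1e308 * 10)            # result too big for double precision: becomes inf
print(math.isnan(nan))       # True
print(math.isnan(nan + 1.0)) # True: any operation involving NaN yields NaN
print(nan == nan)            # False: NaN compares unequal even to itself
```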

Representation for Denorms (1/2)
• Problem: There’s a gap among representable FP numbers around 0
  – Smallest representable positive number:
      a = 1.0…_two × 2^-126 = 2^-126
  – Second smallest representable positive number:
      b = 1.000……1_two × 2^-126
        = (1 + 0.00…1_two) × 2^-126
        = (1 + 2^-23) × 2^-126
        = 2^-126 + 2^-149
  – a - 0 = 2^-126
  – b - a = 2^-149
• Gaps! Normalization and implicit 1 is to blame!
[Figure: number line around 0 marking a and b, showing the gap between 0 and a]

Representation for Denorms (2/2)
• Solution:
  – We still haven’t used Exponent = 0, Significand nonzero
  – DEnormalized number: no (implied) leading 1, implicit exponent = -126
  – Smallest representable positive number:
      a = 2^-149 (i.e., 2^-126 × 2^-23)
  – Second-smallest representable positive number:
      b = 2^-148 (i.e., 2^-126 × 2^-22)
[Figure: number line around 0]
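These values can be confirmed by building the bit patterns directly and letting `struct` interpret them as single-precision floats. The helper name is illustrative. The comparisons are exact because Python's double-precision floats can hold every single-precision value.

```python
import struct

def from_bits(bits):
    """Interpret a 32-bit integer pattern as a single-precision float."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

smallest_denorm = from_bits(0x00000001)  # Exponent = 0, Significand = 00...01
second_denorm = from_bits(0x00000002)    # Exponent = 0, Significand = 00...10
smallest_normal = from_bits(0x00800000)  # Exponent = 1, Significand = 0
next_normal = from_bits(0x00800001)      # Exponent = 1, Significand = 00...01

print(smallest_denorm == 2.0**-149, second_denorm == 2.0**-148)
print(smallest_normal == 2.0**-126)
print(next_normal - smallest_normal == 2.0**-149)  # same spacing as the denorms
```

With denorms, the numbers between 0 and 2^-126 are spaced 2^-149 apart, the same spacing as just above 2^-126, so the gap disappears.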

Special Numbers Summary
• Reserve exponents, significands:

Exponent   Significand   Object
0          0             0
0          nonzero       Denorm
1-254      anything      +/- fl. pt. number
255        0             +/- ∞
255        nonzero       NaN
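The summary table maps naturally onto a small classifier over the exponent and significand fields of a 32-bit pattern. A sketch; the function name and returned labels are illustrative.

```python
def classify(bits):
    """Classify a 32-bit single-precision pattern using the table above."""
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF
    if exponent == 0:
        return "zero" if significand == 0 else "denorm"
    if exponent == 255:
        return "infinity" if significand == 0 else "NaN"
    return "normalized"   # exponent field 1 to 254, any significand

for pattern in (0x00000000, 0x80000000, 0x00000001, 0x3F800000, 0x7F800000, 0x7FC00000):
    print(f"{pattern:#010x}: {classify(pattern)}")
```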

Saving Bits
• Many applications in machine learning, graphics, signal processing can make do with lower precision
• IEEE “half-precision” or “FP16” uses 16 bits of storage
  – 1 sign bit
  – 5 exponent bits (exponent bias of 15)
  – 10 significand bits
• Microsoft “BrainWave” FPGA neural net computer uses proprietary 8-bit and 9-bit floating-point formats
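Python's `struct` module supports IEEE half precision through the `e` format character (Python 3.6 and later), which makes the 1/5/10 field split easy to inspect. A sketch, with an illustrative helper name:

```python
import struct

def half_fields(x):
    """Return (sign, exponent field, significand field) of x stored as IEEE half precision."""
    bits = struct.unpack(">H", struct.pack(">e", x))[0]
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

print(half_fields(1.0))    # exponent field equals the bias, 15
print(half_fields(-2.5))   # -1.25 x 2^1

# Round trip through 16 bits shows the precision lost with a 10-bit significand.
print(struct.unpack(">e", struct.pack(">e", 0.1))[0])
```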

Outline
• Defining Performance
• Floating-Point Standard
• And in Conclusion…

And In Conclusion, …
• Time (seconds/program) is measure of performance

    Seconds/Program = Instructions/Program × Clock cycles/Instruction × Seconds/Clock cycle

• Floating-point representations hold approximations of real numbers in a finite number of bits
  – IEEE 754 Floating-Point Standard is most widely accepted attempt to standardize interpretation of such numbers (Every desktop or server computer sold since ~1997 follows these conventions)
  – Single Precision:

    Bits     31      30 to 23    22 to 0
    Field    S       Exponent    Significand
    Width    1 bit   8 bits      23 bits
