cs 61c: great ideas in computer architecture

46
CS 61C: Great Ideas in Computer Architecture Performance and Floating-Point Arithmetic Instructors: Krste Asanović & Randy H. Katz http://inst.eecs.berkeley.edu/~cs61c/fa17 10/24/17 Fall 2017-- Lecture #17 1

Upload: others

Post on 16-Oct-2021

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: CS 61C: Great Ideas in Computer Architecture

CS61C:GreatIdeasinComputerArchitecturePerformanceandFloating-PointArithmetic

Instructors:KrsteAsanović &RandyH.Katz

http://inst.eecs.berkeley.edu/~cs61c/fa17

10/24/17 Fall2017-- Lecture#17 1

Page 2: CS 61C: Great Ideas in Computer Architecture

Outline• DefiningPerformance• Floating-PointStandard• AndinConclusion…

10/24/17 2

Page 3: CS 61C: Great Ideas in Computer Architecture

WhatisPerformance?• Latency(orresponsetimeorexecutiontime)– Timetocompleteonetask

• Bandwidth(orthroughput)– Taskscompletedperunittime

10/24/17 3

Page 4: CS 61C: Great Ideas in Computer Architecture

CloudPerformance:WhyApplicationLatencyMatters

• Keyfigureofmerit:applicationresponsiveness– Longerthedelay,thefewertheuserclicks,thelesstheuser

happiness,andthelowertherevenueperuser10/24/17 4

Page 5: CS 61C: Great Ideas in Computer Architecture

CPUPerformanceFactors• Aprogramexecutesinstructions• CPU Time/Program

= Clock Cycles/Program x Clock Cycle Time= Instructions/Program

x Average Clock Cycles/Instruction x Clock Cycle Time

• 1st termcalledInstructionCount• 2nd termabbreviatedCPIforaverage

ClockCyclesPerInstruction• 3rdtermis1/Clockrate

10/24/17 5

Page 6: CS 61C: Great Ideas in Computer Architecture

RestatingPerformanceEquation• Time= Seconds

ProgramInstructions Clockcycles SecondsProgram Instruction ClockCycle××=

10/24/17 6

Page 7: CS 61C: Great Ideas in Computer Architecture

WhatAffectsEachComponent?InstructionCount,CPI,ClockRate

Hardwareorsoftwarecomponent? AffectsWhat?

Algorithm InstructionCount,CPI

ProgrammingLanguage

InstructionCount,CPI

Compiler InstructionCount,CPI

InstructionSet Architecture InstructionCount,ClockRate,CPI

10/24/17 7

Page 8: CS 61C: Great Ideas in Computer Architecture

PeerInstruction

• Whichcomputerhasthehighestperformanceforagivenprogram?

Computer Clock frequency Clock cyclesperinstruction

#instructionsperprogram

RED 1.0GHz 2 1000

GREEN 2.0GHz 5 800

ORANGE 0.5GHz 1.25 400

5.0GHz 10 2000

10/24/17 8

Page 9: CS 61C: Great Ideas in Computer Architecture

PeerInstruction

Whichsystemisfaster?

9

System Rate(Task1) Rate(Task2)A 10 20B 20 10

RED:SystemAGREEN:SystemB

ORANGE:Sameperformance:Unanswerablequestion!

10/24/17

Page 10: CS 61C: Great Ideas in Computer Architecture

WorkloadandBenchmark• Workload: Setofprogramsrunonacomputer– Actualcollectionofapplicationsrunormadefromrealprogramstoapproximatesuchamix

– Specifiesbothprogramsandrelativefrequencies• Benchmark:Programselectedforuseincomparingcomputerperformance– Benchmarksformaworkload– Usuallystandardizedsothatmanyusethem

10/24/17 10

Page 11: CS 61C: Great Ideas in Computer Architecture

SPEC(SystemPerformanceEvaluationCooperative)• ComputerVendorcooperativeforbenchmarks,startedin1989• SPECCPU2006

– 12IntegerPrograms– 17Floating-PointPrograms

• Oftenturnintonumberwherebiggerisfaster• SPECratio:referenceexecutiontimeonoldreferencecomputer

dividedbyexecutiontimeonnewcomputer,reportedasspeed-up• (SPEC2017justreleased)

10/24/17 11

Page 12: CS 61C: Great Ideas in Computer Architecture

SPECINT2006onAMDBarcelonaDescription Instruction

Count (B) CPI Clock cycle time (ps)

Execution Time (s)

Reference Time (s)

SPEC-ratio

Interpreted string processing 2,118 0.75 400 637 9,770 15.3Block-sorting compression 2,389 0.85 400 817 9,650 11.8GNU C compiler 1,050 1.72 400 724 8,050 11.1Combinatorial optimization 336 10.0 400 1,345 9,120 6.8Go game 1,658 1.09 400 721 10,490 14.6Search gene sequence 2,783 0.80 400 890 9,330 10.5Chess game 2,176 0.96 400 837 12,100 14.5Quantum computer simulation 1,623 1.61 400 1,047 20,720 19.8Video compression 3,102 0.80 400 993 22,130 22.3Discrete event simulation library 587 2.94 400 690 6,250 9.1Games/path finding 1,082 1.79 400 773 7,020 9.1XML parsing 1,058 2.70 400 1,143 6,900 6.012

Page 13: CS 61C: Great Ideas in Computer Architecture

SummarizingSPECPerformance• Variesfrom6xto22xfasterthanreferencecomputer

• Geometricmeanofratios:N-th rootofproductofNratios– GeometricMeangivessamerelativeanswernomatterwhatcomputerisusedasreference

• GeometricMeanforBarcelonais11.710/24/17 13

Page 14: CS 61C: Great Ideas in Computer Architecture

Administrivia

1410/24/17

• Midterm2inoneweek!– GuerrillaSessiontonight7-9pminCory

293– ONEdoublesidedcheatsheet– Staytunedforroomassignments– Reviewsession:Saturday10-12pmin

HearstAnnexA1• Homework4released!

– CachesandFloatingPoint– Dueafterthemidterm– Stillgoodcachepractice!

• Proj2-Part2released– Dueafterthemidterm,butgoodtodo

before!

Page 15: CS 61C: Great Ideas in Computer Architecture

Outline• DefiningPerformance• Floating-PointStandard• AndinConclusion…

10/24/17 Fall2016– Lecture#17 15

Page 16: CS 61C: Great Ideas in Computer Architecture

QuickNumberReview• Computersdealwithnumbers• WhatcanwerepresentinNbits?– 2N things,andnomore!Theycouldbe…

• Unsignedintegers:0 to 2N- 1(forN=32,2N– 1 =4,294,967,295)

• SignedIntegers(Two’sComplement)-2(N-1) to 2(N-1)- 1(forN=32,2(N-1)=2,147,483,648)

10/24/17 16

Page 17: CS 61C: Great Ideas in Computer Architecture

OtherNumbers1. Verylargenumbers? (seconds/millennium)

Þ 31,556,926,00010 (3.155692610 x1010)2. Verysmallnumbers?(Bohrradius)

Þ 0.000000000052917710m(5.2917710 x10-11)3. Numberswithbothinteger&fractionalparts?

Þ 1.5• Firstconsider#3• …oursolutionwillalsohelpwith#1and#210/24/17 17

Page 18: CS 61C: Great Ideas in Computer Architecture

GoalsforFloatingPoint• Standardarithmeticforreals forallcomputers

– Liketwo’scomplement• Keepasmuchprecisionaspossibleinformats• Helpprogrammerwitherrorsinrealarithmetic

– +∞,-∞,Not-A-Number(NaN),exponentoverflow,exponentunderflow

• Keepencodingthatissomewhatcompatiblewithtwo’scomplement– E.g.,0inFl.Pt.is0intwo’scomplement– Makeitpossibletosortwithoutneedingtodofloating-point

comparison

10/24/17 18

Page 19: CS 61C: Great Ideas in Computer Architecture

RepresentationofFractions• “BinaryPoint”likedecimalpointsignifiesboundarybetweenintegerandfractionalparts:

xx.yyyy21 20 2-1 2-2 2-3 2-4

Example6-bitrepresentation:10.1010two = 1x21 + 1x2-1 + 1x2-3 = 2.625ten

Ifweassume“fixedbinarypoint”,rangeof6-bitrepresentationswiththisformat: 0to3.9375(almost4)

10/24/17 19

Page 20: CS 61C: Great Ideas in Computer Architecture

FractionalPowersof2

10/24/17 20

0 1.0 11 0.5 1/22 0.25 1/43 0.125 1/84 0.0625 1/165 0.03125 1/326 0.0156257 0.00781258 0.003906259 0.00195312510 0.000976562511 0.0004882812512 0.00024414062513 0.000122070312514 0.0000610351562515 0.000030517578125

i 2-i

Page 21: CS 61C: Great Ideas in Computer Architecture

RepresentationofFractionswithFixedPt.

Whataboutadditionandmultiplication?Additionisstraightforward:

01.100 1.5ten+ 00.100 0.5ten

10.000 2.0ten

Multiplicationabitmorecomplex:

01.100 1.5ten00.100 0.5ten 00 000

000 000110 0

0000000000

0000110000

Where’stheanswer,0.11?(e.g.,0.5+0.25;Needtorememberwherepointis!)

10/24/17 21

Page 22: CS 61C: Great Ideas in Computer Architecture

RepresentationofFractionsOurexamplesuseda“fixed”binarypoint.Whatwereallywantisto“float”thebinarypointtomakemosteffecitve useoflimitedbits

… 000000.001010100000…

Anyothersolutionwouldloseaccuracy!

example: put0.1640625ten intobinary.Representwith5-bitschoosingwheretoputthebinarypoint.

Storethesebitsandkeeptrackofthebinarypoint2placestotheleftoftheMSB

• Withfloating-pointrepresentation,eachnumeralcarriesanexponentfieldrecordingthewhereaboutsofitsbinarypoint

• Binarypointcanbeoutside thestoredbits,soverylargeandsmallnumberscanberepresented

10/24/17 22

Page 23: CS 61C: Great Ideas in Computer Architecture

ScientificNotation(inDecimal)

• Normalizedform:noleadings0s(exactlyonedigittoleftofdecimalpoint)

• Alternativestorepresenting1/1,000,000,000– Normalized: 1.0x10-9

– Notnormalized: 0.1x10-8,10.0x10-10

6.02ten x1023

radix(base)decimalpoint

mantissa exponent

10/24/17 23

Page 24: CS 61C: Great Ideas in Computer Architecture

ScientificNotation(inBinary)

• Computerarithmeticthatsupportsitiscalledfloatingpoint,becauseitrepresentsnumberswherethebinarypointisnotfixed,asitisforintegers– DeclaresuchvariableinCasfloat

• double fordoubleprecision

1.01two x2-1

radix(base)“binarypoint”

exponentmantissa

10/24/17 24

Page 25: CS 61C: Great Ideas in Computer Architecture

Floating-PointRepresentation(1/4)• 32-bitwordhas232 patterns,somustbeapproximationofrealnumbers≥1.0,<2

• IEEE754Floating-PointStandard:– 1bitforsign(s)offloatingpointnumber– 8bitsforexponent(E)– 23bitsforfraction(F)(get1extrabitofprecisionifleading1isimplicit)

(-1)s x (1+F)x 2E• Canrepresentfrom2.0x 10-38to2.0x 1038

10/24/17 25

Page 26: CS 61C: Great Ideas in Computer Architecture

Floating-PointRepresentation(2/4)• Normalformat:+1.xxx…xtwo*2yyy…ytwo• MultipleofWordSize(32bits)

031S Exponent

30 23 22Significand

1bit 8bits 23bits

• S representsSign• Exponent representsy’s• Significand representsx’s• Representnumbersassmallas2.0ten x10-38 toaslargeas2.0ten x1038

10/24/17 26

Page 27: CS 61C: Great Ideas in Computer Architecture

Floating-PointRepresentation(3/4)• Whatifresulttoolarge?

(>2.0x1038 ,<-2.0x1038 )– Overflow!Þ Exponentlargerthanrepresentedin8-bitExponentfield

• Whatifresulttoosmall?(>0&<2.0x10-38 ,<0&>-2.0x10-38 )– Underflow!Þ Negativeexponentlargerthanrepresentedin8-bitExponentfield

• Whatwouldhelpreducechancesofoverflowand/orunderflow?0 2x10-38 2x10381-1 -2x10-38-2x1038

underflow overflowoverflow

10/24/17 27

Page 28: CS 61C: Great Ideas in Computer Architecture

Floating-PointRepresentation(4/4)• Whataboutbiggerorsmallernumbers?• IEEE754Floating-PointStandard:

DoublePrecision(64bits)– 1bitforsign(s)offloating-pointnumber– 11bitsforexponent(E)– 52bitsforfraction(F)

(get1extrabitofprecisionifleading1isimplicit)(-1)s x (1+F)x 2E• Canrepresentfrom2.0x 10-308to2.0x 10308• 32-bitformatcalledSinglePrecision

10/24/17 28

Page 29: CS 61C: Great Ideas in Computer Architecture

WhichisLess?(i.e.,closerto-∞)

• 0 vs.1x10-127?

• 1x 10-126 vs.1x10-127?

• -1x10-127 vs.0?

• -1x 10-126 vs.-1x10-127?10/24/17 29

Page 30: CS 61C: Great Ideas in Computer Architecture

FloatingPoint:RepresentingVerySmallNumbers• Zero:Bitpatternofall0sisencodingfor0.000

Þ But0inexponentshouldmeanmostnegativeexponent(want0tobenexttosmallestreal)

Þ Can’tusetwo’scomplement(10000000two)

• Biasnotation:subtractbiasfromexponent– Singleprecisionusesbiasof127;DPuses1023

• 0uses00000000two =>0-127=-127;∞,NaN uses11111111two =>255-127=+128– SmallestSPrealcanrepresent:1.00…00x 2-126

– LargestSPrealcanrepresent:1.11…11x2+12710/24/17 30

Page 31: CS 61C: Great Ideas in Computer Architecture

IEEE754Floating-PointStandard(1/3)SinglePrecision(DoublePrecisionsimilar):

• Signbit: 1meansnegative0meanspositive

• Significand insign-magnitudeformat(not2’scomplement)– Topackmorebits,leading1implicitfornormalizednumbers– 1+23bitssingle,1+52bitsdouble– alwaystrue:0<Significand <1(fornormalizednumbers)

• Note:0hasnoleading1,soreserveexponentvalue0justfornumber0

031S Exponent30 23 22

Significand1bit 8bits 23bits

10/24/17 31

Page 32: CS 61C: Great Ideas in Computer Architecture

IEEE754Floating-PointStandard(2/3)• IEEE754uses“biasedexponent” representation– DesignerswantedFPnumberstobeusedevenifnoFPhardware;e.g.,sortrecordswithFPnumbersusingintegercompares

– Wantedbigger(integer)exponentfieldtorepresentbiggernumbers

– 2’scomplementposesaproblem(becausenegativenumberslookbigger)• Usejustmagnitudeandoffsetbyhalftherange

10/24/17 32

Page 33: CS 61C: Great Ideas in Computer Architecture

IEEE754Floating-PointStandard(3/3)

• Summary(singleprecision):

• CalledBiasedNotation,wherebiasisnumbersubtractedtogetfinalnumber− IEEE754usesbiasof127forsingleprecision− Subtract127fromExponentfieldtogetactualexponentvalue

031S Exponent

30 23 22Significand

1bit 8bits 23bits

• (-1)S x(1+Significand)x2(Exponent-127)•Doubleprecisionidentical,exceptwithexponentbiasof1023(half,quadsimilar)

10/24/17 33

Page 34: CS 61C: Great Ideas in Computer Architecture

BiasNotation(+127)

∞,NaN

Zero

Gettingclosertozero

HowitisencodedHowitisinterpreted

10/24/17 34

Page 35: CS 61C: Great Ideas in Computer Architecture

PeerInstruction• GuessthisFloatingPointnumber:11000000010000000000000000000000• RED: -1x2128

• GREEN:+1x2-128

• ORANGE:-1x21

-1.5x21

3510/24/17 35

Page 36: CS 61C: Great Ideas in Computer Architecture

MoreFloatingPoint• Whatabout0?

– Bitpatternall0smeans0,sonoimplicitleading1• Whatifdivide1by0?

– Cangetinfinitysymbols+∞,-∞– Signbit0or1,largestexponent(all1s),0infraction

• Whatifdosomethingstupid?(∞– ∞,0÷ 0)– CangetspecialsymbolsNaN for“Not-a-Number”– Signbit0or1,largestexponent(all1s),notzeroinfraction

• Whatifresultistoobig?(2x10308x 2x102)– Getoverflowinexponent,alertprogrammer!

• Whatifresultistoosmall?(2x10-308÷ 2x102)– Getunderflowinexponent,alertprogrammer!

10/24/17 36

Page 37: CS 61C: Great Ideas in Computer Architecture

Representationfor± ∞• InFP,divideby0shouldproduce± ∞,notoverflow•Why?– OKtodofurthercomputationswith∞E.g.,X/0>Ymaybeavalidcomparison

• IEEE754represents± ∞– Mostpositiveexponentreservedfor∞– Significand allzeroes

10/24/17 37

Page 38: CS 61C: Great Ideas in Computer Architecture

Representationfor0• Represent0?– Exponentallzeroes– Significand allzeroes–Whataboutsign?Bothcasesvalid+0: 0 00000000 00000000000000000000000-0: 1 00000000 00000000000000000000000

10/24/17 38

Page 39: CS 61C: Great Ideas in Computer Architecture

SpecialNumbers• Whathavewedefinedsofar?(SinglePrecision)

Exponent Significand Object0 0 00 nonzero ???1-254 anything +/- fl.pt.#255 0 +/- ∞255 nonzero ???

10/24/17 39

Page 40: CS 61C: Great Ideas in Computer Architecture

RepresentationforNot-a-Number• WhatdoIgetifIcalculate sqrt(-4.0)or0/0?

– If∞notanerror,theseshouldn’tbeeither– CalledNota Number(NaN)– Exponent=255,Significand nonzero

• Whyisthisuseful?– HopeNaNs helpwithdebugging?– Theycontaminate:op(NaN,X)=NaN– Canusethesignificand toidentifywhich!(e.g.,quietNaNs andsignalingNaNs)

10/24/17 40

Page 41: CS 61C: Great Ideas in Computer Architecture

RepresentationforDenorms(1/2)• Problem:There’sagapamongrepresentableFPnumbers

around0– Smallestrepresentablepositivenumber:

a=1.0…2 *2-126 =2-126– Secondsmallestrepresentablepositivenumber:

b =1.000……12 *2-126=(1+0.00…12)*2-126=(1+2-23)*2-126=2-126 +2-149

a- 0=2-126b- a=2-149 b

a0 +-

Gaps!

Normalization and implicit 1is to blame!

10/24/17 41

Page 42: CS 61C: Great Ideas in Computer Architecture

RepresentationforDenorms(2/2)• Solution:

− Westillhaven’tusedExponent=0,Significand nonzero− DEnormalized number:no(implied)leading1,implicit

exponent=-126− Smallestrepresentablepositivenumber:

a=2-149 (i.e.,2-126x2-23)﹣ Second-smallestrepresentablepositivenumber:

b=2-148 (i.e.,2-126x2-22)

0 +-10/24/17 42

Page 43: CS 61C: Great Ideas in Computer Architecture

SpecialNumbersSummary• Reserveexponents,significands:

Exponent Significand Object0 0 00 nonzero Denorm1-254 anything +/- fl.pt.number255 0 +/- ∞255 Nonzero NaN

10/24/17 43

Page 44: CS 61C: Great Ideas in Computer Architecture

SavingBits• Manyapplicationsinmachinelearning,graphics,signal

processingcanmakedowithlowerprecision• IEEE“half-precision”or“FP16”uses16bitsofstorage

– 1signbit– 5exponentbits(exponentbiasof15)– 10significand bits

• Microsoft“BrainWave”FPGAneuralnetcomputerusesproprietary8-bitand9-bitfloating-pointformats

44

Page 45: CS 61C: Great Ideas in Computer Architecture

Outline• DefiningPerformance• FloatingPointStandard• AndinConclusion…

10/24/17 45

Page 46: CS 61C: Great Ideas in Computer Architecture

AndInConclusion,…• Time(seconds/program)ismeasureofperformance

Instructions Clockcycles SecondsProgram Instruction ClockCycle

• Floating-pointrepresentationsholdapproximationsofrealnumbersinafinitenumberofbits- IEEE754Floating-PointStandardismostwidelyacceptedattemptto

standardizeinterpretationofsuchnumbers(Everydesktoporservercomputersoldsince~1997followstheseconventions)

- SinglePrecision:

××=

10/24/17 46

031S Exponent

30 23 22Significand

1bit 8bits 23bits