microprocessors: the engines of the digital age

18
The University of Manchester Research Microprocessors: the engines of the digital age DOI: 10.1098/rspa.2016.0893 Document Version Accepted author manuscript Link to publication record in Manchester Research Explorer Citation for published version (APA): Furber, S. (2017). Microprocessors: the engines of the digital age. Royal Society of London. Proceedings A. Mathematical, Physical and Engineering Sciences. https://doi.org/10.1098/rspa.2016.0893 Published in: Royal Society of London. Proceedings A. Mathematical, Physical and Engineering Sciences Citing this paper Please note that where the full-text provided on Manchester Research Explorer is the Author Accepted Manuscript or Proof version this may differ from the final Published version. If citing, it is advised that you check and use the publisher's definitive version. General rights Copyright and moral rights for the publications made accessible in the Research Explorer are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Takedown policy If you believe that this document breaches copyright please refer to the University of Manchester’s Takedown Procedures [http://man.ac.uk/04Y6Bo] or contact [email protected] providing relevant details, so we can investigate your claim. Download date:10. Dec. 2021

Upload: others

Post on 10-Dec-2021

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Microprocessors: the engines of the digital age

The University of Manchester Research

Microprocessors: the engines of the digital age

DOI:10.1098/rspa.2016.0893

Document VersionAccepted author manuscript

Link to publication record in Manchester Research Explorer

Citation for published version (APA):Furber, S. (2017). Microprocessors: the engines of the digital age. Royal Society of London. Proceedings A.Mathematical, Physical and Engineering Sciences. https://doi.org/10.1098/rspa.2016.0893

Published in:Royal Society of London. Proceedings A. Mathematical, Physical and Engineering Sciences

Citing this paperPlease note that where the full-text provided on Manchester Research Explorer is the Author Accepted Manuscriptor Proof version this may differ from the final Published version. If citing, it is advised that you check and use thepublisher's definitive version.

General rightsCopyright and moral rights for the publications made accessible in the Research Explorer are retained by theauthors and/or other copyright owners and it is a condition of accessing publications that users recognise andabide by the legal requirements associated with these rights.

Takedown policyIf you believe that this document breaches copyright please refer to the University of Manchester’s TakedownProcedures [http://man.ac.uk/04Y6Bo] or contact [email protected] providingrelevant details, so we can investigate your claim.

Download date:10. Dec. 2021

Page 2: Microprocessors: the engines of the digital age

Microprocessors:theenginesofthedigitalage

SteveFurberCBEFRSFREng,

SchoolofComputerScience,

TheUniversityofManchester,

OxfordRoad,

ManchesterM139PL

[email protected]

Page 3: Microprocessors: the engines of the digital age

AbstractThe microprocessor – a computer central processing unit integrated onto asinglemicrochip–hascometodominatecomputingacrossallofitsscalesfromthe tiniest consumer appliance to the largest supercomputer. This dominancehastakendecadestoachieve,butanirresistiblelogicmadetheultimateoutcomeinevitable.TheobjectivesofthisPerspectivepaperaretoofferabriefhistoryofthedevelopmentofthemicroprocessorandtoanswerquestionssuchas:wheredidthemicroprocessorcomefrom,whereisitnow,andwheremightitgointhefuture?

KeywordsMicroprocessor;Moore’sLaw;Intel;ARM;SpiNNaker.

1. IntroductionAcomputerrequiresmemorytoholdprogramsanddata,aprocessortoexecutethoseprogramsusingthedata,andI/O(input/output)capabilitiestointerfacetotheoutsideworld.Theintenseactiontakesplacewithintheprocessor,andthemicroprocessorachievesintegrationofalloftheprocessingfunctionsonasinglemicrochip.Theintroductionofthemicroprocessorrepresentedabreakthroughintermsofthesizeandcostofacomputersystem,andwasoneoftheadvancesthat made the personal computer (PC) revolution, and later the mobilerevolution, come about. The next revolution in computing in which themicroprocessorwillplayacentralroleisIoT–theInternetofThings.

Today, thankstotheexponentialprogress inthenumberof transistorsthatcanbefabricatedonasinglechip(Moore’sLaw[1]),theterm“microprocessor”hasbecomelessclearinitsprecisemeaning.TheprocessorchipinatypicalPCisa formidable beast with several processor ‘cores’, complex cache memoryhierarchies (though the main memory is still off chip) and very high-performance I/O interfaces (though most of the I/O components are still offchip). The nearest analogue to the original microprocessor is the individualprocessorcore,andthisistheinterpretationthatwillbeusedinthispaper.

The key benefit of the microprocessor results from integrating all of thecomponentsofacomputer thatare involved inexecuting instructions togetheronthesamemicrochip.Instructionsarefetchedfromexternalmemory(thoughoften today this is cache memory on the same chip) and data is loaded andstored from external memory (again, often using on-chip caches), but theinstruction decode and execute logic is all collocated, resulting in significantperformance and energy benefits compared with splitting the processingfunctions across two or more chips, as was done prior to the arrival of themicroprocessor.Thesebenefitsaccruebecauseon-chipconnectionsincurmuchlowerparasiticcapacitancethandooff-chipconnections,andmostofthedelaysandenergyconsumedbyaprocessorresultfromdrivingcapacitiveloadsupanddownduringexecution.

In this Perspectives paper I will offer a personal view of the keydevelopments in thehistoryof themicroprocessor,whichcanbedividedquite

Page 4: Microprocessors: the engines of the digital age

cleanlyintodecade-by-decadeprogress.Thisisnotanexhaustivehistory,butanattempt to highlight the key issues as they emerged. Our story starts in the1970s…

2. The1970s:emergenceBack in1969NipponCalculatingMachineCorporationapproached IntelwithaproposalforInteltobuild12customchipsforitsnewBusicom141-PF*rangeofcalculators. Intelcamebackwithacounter-proposal todevelop just fourchips,one of which could be programmed to meet the needs of the range. Thatprogrammable chipwas the Intel 4004. Intel bought the rights to those chipsback from the customer and launched the Intel 4004 and its accompanyingchipsetwith an advertisement in theNovember 15, 1971, issue of ElectronicsNews: “Announcing a New Era in Integrated Electronics”. Themicroprocessorwasborn.

TheIntel4004[2]used2,300transistorsona10-micrometerpMOSprocess.Itcouldbeclockedatfrequenciesupto740kHzandwouldexecuteupto92,600instructionspersecond.Itwasa4-bitdevice(definedbythe4-bitwidthofthedatabus),with8-bit instructionsanda12-bitaddressbus,all integratedintoa16-pin dual-in-line package. From this modest start a new era did, indeedemerge!

Throughthe1970sadiverserangeofmicroprocessorsweredeveloped,thegreatmajorityofwhichwere8-bitdeviceswith16-bitaddressbusespackagedin40-pin dual-in-line packages. These included direct descendants of the 4004suchastheIntel8008andthe8080,theSignetics2650(myfirstmicroprocessor,nowlargelyforgotten!), theMotorola6800,theNationalSemiconductorSC/MP(fromwhichAcornwasbootstrapped),theMOSTechnology6502andtheZilogZ80.The6502drovedownthepricetonewlevelsofaffordability,andtogetherwith the Z80 was largely responsible for the emergence of the computerhobbyistmovementwhich in turn led to thehome computer revolutionof the1980s.

Thankstothe8-bitmicroprocessor,thecomputerwasnowoutofthehandsof thewhite-coatedcomputeroperatoremployedbythe largecorporation,andinto the hands of the young enthusiast – students and entrepreneurs. Whenthose young enthusiasts included the likes of Steve Jobs and Steve WozniakcreatingtheApple1,theseedsofchangewerewellandtrulysown.

3. The1980s:RISCvsCISCBy thebeginningof the1980s the personal computermarketwas established,anditwasbeginningtobreakoutfromitshobbyistoriginsintothewiderhomemarket,withbasic computer familiarity andgamingbeing theprimaryuses inthe home. These machines used 8-bit microprocessors, but there was a clearroadmapupto16-bitmicroprocessors.

In 1981 IBM introduced the IBM PC, also powered by an 8-bitmicroprocessor – the Intel 8088 – clearly targeting desk-tops in business. Itwasn’t an especially ambitious machine, but the IBM name carried a lot ofweight,andtheIBMPCcametosetthestandardformostofthePCmarketupto

Page 5: Microprocessors: the engines of the digital age

this day (though IBM itself no longer makes IPCs). Only Apple offered somecredibledegreeofcompetition.

Thus thePCwasestablished,and thescenewasset for themicroprocessormanufacturers to move their customers up to 16-bit machines, as moreperformancewouldclearlysellmoremachines.Buthowshoulda16-bitmachinebe architected? The established microprocessor manufacturers were all largesemiconductorcompanieswhoknewalotaboutmakingchipsbutfarlessaboutcomputer architecture. 8-bitmicroprocessorswere relatively simple to design,but16-bitarchitectureswereacompletelydifferentkettle-of-fish.

There was a readily available source of architectural insight into how toconfigurea16-bitmachineastheminicomputerbusinesshadbeentherebefore.Theleadingminicomputerinthe1970swasthe(32-bit)DECVAX11/780.Itdidnotuseamicroprocessor,butitshowedhowtoarchitectsuchamachineusingamulti-chipprocessor,andwhyshouldn'tamicroprocessordosomethingsimilar?The11/780architecturewasverycomplex,andreflectedthedesireatthetimeto ‘close the semantic gap’ between the high-level language and the machineinstruction set. 16-bit microprocessors should follow this trend, within theconstraintsofthelimitedtransistorresourceonasinglechip.

Butsomefolkhadotherideas!In1980DavidA.PattersonandDavidR.Ditzelpublished their seminal paper “The case for the Reduced Instruction SetComputer” [3]. This paper made the very strong case that optimizing anarchitecture for the limited resource on a single chipwas quite different fromoptimizing it foramulti-chipprocessorsuchas thaton theVAX11/780.Theirargumentswerestrong,andwerebackedupbyarealchipdesign–theBerkeleyRISCI–thatwasbeingdesignedbyapostgradclassinonesession.Therewereotherdesignsaroundatthetimethatreinforcedthismessage,mostnotablytheIBM 801 [4] and soon thereafter the Stanford MIPS (Microprocessor withoutInterlockingPipelineStages)[5].Thefundamentalcasewasthatanarchitecturebased on a 1970s minicomputer would have a very complex instruction set(CISC!),which incurreda lotof chiparea for themicrocodeROMtomapallofthoseinstructionintobasicinstructionelements.WithRISCthatcomplexitywasreversed; by keeping the instruction set as simple and regular as possible, nomicrocodeROMwouldberequired,sothereweremoretransistorsavailableforarchitecturefeaturesthatgavemorebenefit,suchasafull32-bitinstructionsetand pipelined execution (whichwas also facilitated by the regular instructionset).

Themainstreammicroprocessormanufacturerswereunconvincedbyall ofthis academic argument, and indeed spentmost of the 1980s expressing theirfirm opposition to the concept (though by the end of the 1980s most hadsuccumbed and had some sort of in-houseRISC project underway). However,awayfromthemainstream,smallercompaniesconsideringdesigningtheirownprocessors lapped this all up. One such company was Acorn Computers inCambridge,UK,whowereresponsibleforthedesignoftheverysuccessfulBBCMicrocomputer,andwerestrugglingtoseehowtheyshouldmoveupto16-bitprocessing.AtitspeakAcornemployedaround400staffandsomein-housechipdesignexpertise,andtheyhadworkedcloselywithVLSITechnology,Inc.,ofSanJose, California, so they were developing some experience in designing chipsfromscratch.

Page 6: Microprocessors: the engines of the digital age

Acorn started by looking at the 16-bit microprocessors emerging frommainstreamindustry,butfoundthattheyhadtwoprincipaldrawbacks:• Real-time performance: the BBC Micro made extensive use of the 6502’sgood real time response to handle a lot of complex I/O in software. Theemerging 16-bit microprocessor had significantly inferior real-timecapabilities.

• Memorybandwidthutilization:Acorn’sengineershadformedtheviewthatthe primary determinant of performance was the processor’s ability toaccessmemoryathighbandwidth.Themostexpensivecomponent inaPCwasthememory,butthe16-bitprocessorsofthedaycouldnotmakefulluseofthebandwidthofferedbythosememories–surelyamistake?

So the Acorn team had started to think about designing their ownmicroprocessor toovercometheseperceiveddeficiencies.TheRISCphilosophyfrom Berkeley found an eager audience, and the Acorn RISC Machine (latersimplyARM)projectwasborn.TheARMwasdesignedfromtheoutsetasa32-bitmachine, so Acorn largely skipped the 16-bit generation.Why not? 32 bitsshould give you twice the bandwidth of 16 bits, and hence twice theperformance!

Acornwas,ofcourse,notaloneinpursuingtheRISCideainthecommercialdomain.StanfordUniversityspunoutacompanytocommercializederivativesoftheirRISCwork, and again in theUK, Inmos Ltd pursued a somewhat parallelpath (thoughnot influencedbyRISC) todevelop the transputer–a single-chipmachinewith processor,memory and communications on the same chip, in aform that enabled easy scaling up to large-scale parallel machines. And therewereothers[6].

4. The1990s:clockwarsandSoCsThrough the 1990sMoore’s Law delivered ever-increasing transistor resourcethatenabledfull32-bitRISCandCISCmicroprocessorstobedeliveredwithevermore complexmicroarchitectures, including cachememories, translation look-asidebuffers,etc.–thefullgamutoftricksandfeaturesthathadbeendevelopedinadifferentageinmainframeandminicomputers,andthensome.

Thecharacteristicthatdrovemicroprocessordesignmorethananythingelsewas the desire for faster clock rates. This became the major marketingdifferentiator between high-end processors, leading to the ‘clockwars’, wheresellingmicroprocessors toabroadand largelynon-technicalmarketdependedincreasinglyonclaims forextremeclockrates thatoftenhad littleornodirectrelationshiptoanyperformancebenefitrealizablebyatypicaluser.

Atthesametime,inaverydifferentmarket,mobilesystemsweredevelopinginaverydifferentway.The1990ssawtheintroductionofthemobileSystem-on-Chip (SoC)where, insteadofusing thegrowing transistor resource to improvethe performance of the microprocessor, it was used to bring more and moresystem functions onto the same microchip as the microprocessor. For thispurpose, a small and simplemicroprocessor left themaximumspace for otherfunctions, and the newly formed ARM Ltd had the smallest and simplestmicroprocessor offering in this newly emergingmarket [7]. Nowherewas thismorevital than inthedigitalmobilephonehandsetmarket,whereEuropehadstolen a lead over the rest of the world, and when the leading handset

Page 7: Microprocessors: the engines of the digital age

manufacturer of the time – Nokia – adopted the ARM for its products, theconsequenceswerehighlydisruptive.

TheNokiadealwasn’taforegoneconclusion.ARM’sRISCheritageconveyedmany advantages in terms of performance and simplicity, but it had onesignificant drawback: the fixed 32-bit instruction set architecture resulted inworsecodedensity thanwasachievedby thevariable-length instructionsusedbytheCISCcompetition,andcodedensitymattershere.Poorcodedensityleadstolargercodememoriesandmorepowerdissipatedincodefetchingfromthosememories.

ARM Ltd addressed the code density issue with an imaginative leap. TheyintroducedtheThumb16-bitinstructionset[8],whereeachThumbinstructionis a compressed form of a 32-bit ARM instruction. Thumb instructions aredecompressed using simple combinatorial logic at the front of the ARMexecutionpipeline,which isotherwiseprettymuchunchanged.Amodechangeswitched theprocessor between32-bitARMand16-bit Thumbexecution, andthecodedensityproblemwentaway.Nokiawasconvinced,andARM’sroutetodominationofthemobilephonehandsetbusinesswasestablished.

All this SoC development attracted little attention from the high-endmicroprocessorcompanies.ARMwasn’tevenpretendingtoplaytheclock-warsgame;itwasinadifferentbusinessataverydifferentperformancelevel.

5. The2000s:manycoresmakeefficientworkShortly after the turn of the Millennium, the world of the high-endmicroprocessorchangedbeyondallrecognition.Theclockwarsofthe1990shadrunintoawall–power.Chipsweresimplygettingtoohot,andtheproblemsofcoolingthemwerebecominginsurmountable.

High-profile development programmes for very high clock-rate productswerescrapped,presumablyatconsiderableexpense,andanotherway forwardhad tobe found.Fortunately, therewasa simple solution. Insteadofusing the(still) ever-increasing transistor resource to make a single microprocessor gofaster,usethatresourcetoputtwo(ormore)identicalmicroprocessorsontothesame microchip. Due to the fundamental physics that defines the powerconsumption of a CMOS circuit, two half-speed processors use about half thepower of a single double-speedprocessor. The only down-side is that the twoprocessors are much harder to program than is the single one, but that is aproblemfortheuser,not themanufacturer; inanycase, there isno longeranychoiceinthismatter.Goparallel,oracceptperformancestagnation!

Meanwhile,inthemobilephonebackwater,thingswerechangingtoo.Simplemobile telephone handsets were becoming smarter, with pretensions tofunctionality that had previously been the domain of the computer. Textmessaginghadalwaysbeenamobilephone function,butnowuserswantedtoread their emails on their phones, and maybe even browse the web. Mobilephonesneededmorecomputepower,butstillonavery tightpowerbudget (amobilephonehandsetcanaccommodatearound3wattsofprocessordissipationbefore itbecomesuncomfortable for theuser), somulticoresolutionsemergedheretoo.

TherealturningpointcamewiththeintroductionofthefirstAppleiPhonein1997.ApplereinventedthesmartphoneconceptinthewaythatonlyApplecan,

Page 8: Microprocessors: the engines of the digital age

andintroducedauserinterfacethatgavethesmartphoneuseraccesstomostofthefunctionalityofadesktopmachine,albeitsomewhatrestrictedbythesizeofscreen that could be fitted onto a product designed to fit into a pocket, inadditiontophonecapabilityandadditionalfunctionalityresultingfromarangeofsensorsbuiltintothemachine.

Allofthisrequiredthehumblemobileprocessortostepuptoamuchhigherperformance mark. A smartphone incorporates many processors, not just thefrontlineapplicationprocessors,butthosefrontlineprocessorswerenowbeingaskedtodelivertheperformanceofa1980ssupercomputerjusttokeeptheuserinterfacesmoothandresponsive. Ineffect, smartphoneprocessorswerebeingaskedtodeliverperformanceapproachingthatofadesktopprocessorbutonasmartphonepowerbudget.

6. The2010s:mobility,performance,machinelearningThepresentdecadehasseendramaticgrowthinmobiletechnology,withsmartphonesdominatingthemobilephonehandsetmarket,drivingupthedemandformobiledatabandwidth.Alongsidethesmartphone,itsbiggerbrother,thetablethas gained market acceptance, again driven initially by Apple’s innovativeproduct developments and marketing. The tablet’s successful format hasemerged as a large-screen variant of the smart phone rather than as amobileversionofaPC,previousattemptstointroducewhichlargelyfloundered.

With the growth of the market for smart phones and tablets, screenresolution has improved and processor performance has been pushed hardwithin the very restrictive power and battery-life limits. Who would haveimagined that a humble mobile phone would ever require more processingpowerthancouldbedeliveredbyastate-of-the-art32-bitprocessor?Yet64-bitmachinesnowdominatehigh-endsmartphonesandtablets.Theoverwhelmingmajority of these processors are based on designs fromARM Ltd,who by thebeginning of 2016 had seen 85 billion ARM-powered chips shipped by theirglobal partnership, giving them absolute numerical domination of the worldmicroprocessormarket.

This decade has also seen dramatic growth in ‘cloud’ computing – vastwarehousesfullofserversandstoragesystems[9],predominantlypoweredbyIntel’s high-end microprocessors. The scale of these largely invisible systemsbeggars belief, with global corporations such as Amazon and Google havingmultipledatacenterinstallationsaroundtheworld,collectivelyusingmillionsofhigh-endIntelmicroprocessors toprovide internetservicessuchassearchandvideo streaming, with individual power budgets in the region of 100MW – asignificantproportionoftheelectricalpoweroutputofasmallpowerstation.

Someof thekeyuser functionsavailableonsmartphones–suchasspeechrecognition – are, in fact, currently delivered through cloud services. Thissynergyoflow-poweruser-friendlymobileplatforms,high-speeddigitalwirelesscommunicationandhugecloudcomputeresourcesunderpinstoday’sconsumersmartphoneexperience.

Several of the cloud applications employ ‘deep learning’ [10] techniques –machinelearningusinglarge-scalemulti-layeredneuralnetworks.Thereisalotof interest around accelerators for the neural networks as employed by deep

Page 9: Microprocessors: the engines of the digital age

learning systems, but the jury is still out as to thebest approach to takehere.Graphicsprocessors(GPUs)havebecomeincreasinglyusableforhighlyparallelnumericalcomputation,andthisiscurrentlytheleadingapproachtoacceleratedeep learning. An innovative approach is demonstrated by Google’s TensorProcessingUnit[11],acustomchipdesignedtoacceleratemachinelearningbyexploiting the reduced computational precision typically required for suchalgorithms. Brain-inspired ‘neuromorphic’ technologies [12] offer greaterenergy-efficiency, but are relatively unproven. All manner of intermediatesolutionsareemerging,andthisisanareathatisripeforstart-upcompaniesandentrepreneurialinnovation.

7. TechnologytrendsUnderpinningtheprogressinmicroprocessorsistheprogressinsemiconductortechnology over the last half century. This progress has been exponential inalmostallrespects,including:

• thegrowthinthenumberoftransistorsthatcanbeintegratedonasinglemicrochip,whichhas followedMoore’sLaw[1] froma fewthousand intheearly1970stoafewbilliontoday;

• thematchingprogressinmemorytechnologyintermsofthenumberofbitsthatcanbestoredonachip,fromafewthousandintheearly1970stobillionstoday;

• theshrinkageof thetransistor featuresizewhichhasmadethisgrowthpossible, from10microns in theearly1970s toaround10nanometerstoday;

• theincreaseinclockspeedfromjustunder1MHzinthe1970stoaround3-4GHztoday,afigurethatislimitedbypowerratherbythetechnologyitself;

• thecostofthedesignof,andthemanufacturingfacilityfor,astate-of-theartmicroprocessor.

Formuchofthehistoryofthemicroprocessortherehasbeenawin-winscenariowhereby smaller transistors have delivered faster, more efficient and cheaperfunctionality,andtherateofprogresshasbeengovernedprincipallybythetimerequired to recoup enough revenue from one generation to cover the costs ofdeveloping thenextgeneration.Thisvirtuouscirclehasnowcome toanend–thecostperfunctionisnowincreasingastransistorsshrinkbeyondabout30nm,andtheeconomicsofdesignareundersignificantstrain.

If Moore’s Law is no longer delivering progress, what other options arethere?Microchips are still two-dimensional,with all of the action close to thesurfaceofathinsliverofsilicon.Thereisgrowinginterestin3Dpackaging[13]–stacking microchips on top of each other, thereby reducing the distance thatsignals have to travel from one chip to another, for example from themicroprocessor to its memory. To a first approximation one can consider theenergy required to perform a given computation to be determined by thenumber of bits of data that are moved in the course of that computationmultiplied by the distance that they move. 3D packaging therefore offers theprospect of improving energy-efficiency, but it brings with it quite severethermal considerations:memory chips, in particular, cannot be allowed to get

Page 10: Microprocessors: the engines of the digital age

toohotiftheyaretooperatereliably,sostackingsensitivememoriesontopofpower-hungrymicroprocessorsisnotarecipeforreliableoperation.

8. Thefuture:IoT,heterogeneityanddarksiliconAlthough there is always the risk of technology disruptions rendering anyattempt topredict the future futile, thenextdecadeor twoofdevelopmentsofmainstream microprocessor technology seem to be fairly settled. It seemsunlikely that radical new technologies (such as the much vaunted quantumcomputing)willhaveasignificant impactonmainstreamcomputingwithinthenext 20 years, so theworld of themicroprocessorwill likely be dominatedbytrendsthatarealreadyvisibletoday.

Amongthesevisibletrends,IoT(the“InternetofThings”)[14]willaffectthedesignofthelargestvolumeofmicroprocessors,andislargelyunaffectedbytheendofMoore’sLaw.IoTenvisagesaworldwherelocalconnectedintelligenceisinbuilt ineverything, epitomizedby the ‘internet lightbulb’ and the connectedhome.Sensorsareeverywhere,relayingtheirsensedinformationbackintocloudservices. The world will have a very fine-grained nervous system. Themicroprocessorsthatpowertheperipheryofthisnetworkwillneedtobesmalland efficient, and operatewith nomaintenance, often harvesting energy fromtheir environment (since replacingbatterieswill be out of thequestion). Theywill number in the hundreds of billions. Inmanyways theymay be similar totoday’smicrocontrollers, but securitywill be a huge issue. Therehave alreadybeenissueswithcomputersystemsincarsbeinghacked;whenthewholeplanetis available for hacking, security will be paramount, and innovation will berequired to achieve thiswithin the verymodest scale and resources of an IoTdevice.

Higherupthemicroprocessorfamilytree,energy-efficiencywilldominateinawaythathasaquitedifferentmanifestation.Efficiencyconsiderations,togetherwiththeendofMoore’sLaw,willdrivemicroprocessorstowardsheterogeneoussolutions, with tuned accelerators available to reduce the energy demands ofparticularclassesofapplication.Graphicsprocessorswereanearlyexampleofan application-specific accelerator, though they were motivated primarily byperformance requirements rather thanby energy-efficiency.The current trendin seeking efficient accelerators for deep learning, probably alongside newlyemergingapproachestoartificialintelligence,willcontinue,andwillbejoinedbyotherapplication-specificaccelerators.

The overall picture that emerges, then, is that futuremicroprocessors willhaveafewcomplex‘fat’corestomaximizetheperformanceofcodethatwillnotparallelize, together with many simpler ‘thin’ cores to run parallelizable codemore efficiently, and a range of special-purpose accelerators to supportimportantcomputationalkernelsevenmoreefficiently thancan the thincores.Heterogeneity will become a common feature of high-end many-coremicroprocessors.

Withallofthiscomputeresourceavailable,therun-timesystemwillhavetomanage thepower andperformance levels of each computational unit to keeppower under control, and at any time a significant proportion of thecomputationalresourcewillhavetobepowereddown,sincerunningitallatthe

Page 11: Microprocessors: the engines of the digital age

same time will result in excessive power consumption. These powered downunitswillformthe‘darksilicon’[15].

9. ProgressoveronecareerI have offered a personal view of the key stages in the progress of themicroprocessoroverthehalf-centuryof itsexistence,andwillclosebyofferingtwospecificdatapointsthatrepresentthestartandtowardstheendofmyowncareer in terms of microprocessors and microprocessor-based systems intowhichIhavehadsignificantinput:

• ThefirstARMprocessor(Fig.1),retrospectivelycalledARM1sincetherehavebeenmanysuccessors,wherethedesignstartedlatein1983andthefirstsiliconchipwasoperationalonApril25th1985.

• SpiNNaker(Fig.2),wherethedesignstartedin2006andthefirstsiliconchipwasoperationalin2011.Thelarge-scalemachine(Fig.3)wasturnedon under the auspices of the European Union Human Brain Project inApril2016.

ThedevelopmentoftheARMwasdescribedearlierinthehistoricalaccount.Thefirst ARM microprocessor was designed on a 3 micron CMOS process, usingaround25,000transistorsona7mmx7mmsiliconchip.Itwouldexecuteatupto 6 MHz, processing 6 million instructions per second while using about 0.1wattsofelectricalpower.

Aquarterofacenturylater,theSpiNNakerprocessorwasdesignedona130nanometerCMOSprocess(anold,buteconomic,technologyby2006standards),using around 100million transistors on a 1 cm x 1 cm silicon chip [16]. TheSpiNNaker processor incorporates 18 ARM processor cores, each with a localmemorysystem(whichaccountforthegreatmajorityofthetransistors),andawide range of system support components. The ARM cores operate at up to200MHz,givingthechipa total throughputofupto3.6billion instructionspersecond, with a total electrical power consumption (including the 128 Mbytememoryincorporatedintothesamepackage)of1watt.

SpiNNaker was designed to support large-scale brain and brain-likecomputationalmodelsinbiologicalrealtime,andassuchtheprocessorisanodein a large machine [17]. The Human Brain Project SpiNNaker platform uses28,800 SpiNNaker processors to yield a machine with half a million ARMprocessor cores occupying a total active silicon area approaching 5 squaremetres.

SpiNNaker exemplifies some of the principles outlined earlier for futuretrendsincomputerdesign:

• massiveparallelismusingmany‘thin’cores(withoutthefatcores,astheSpiNNakertargetapplicationareais“embarrassingly”parallel;

• 3Dpackaging(whichisfairlybasicinSpiNNaker’scase)tominimizethedistanceoverwhichdatamoves;

• prioritizingenergy-efficiencyoverperformance.The instruction set architecture supported by the ARM processors on

SpiNNakerhasseenmanyextensionsandenhancementssincethatusedonthefirst ARM processor, but it is still highly recognizable as a descendant of thatoriginal instruction set. In contrast, the silicon technology and the resulting

Page 12: Microprocessors: the engines of the digital age

performance, functional density and energy-efficiency have changed by manyordersofmagnitudeoverthatquarterofacentury.

10. ConclusionsThemicroprocessor has seen formidable growth in its influence on humanitysinceitshumbleoriginsin1969asaproposaltooptimizethecostofbuildingarange of desktop calculators. The inescapable cost, performance and efficiencybenefits of integrating all of the central functions of a computer onto a singlemicrochiphavedriven the entire computerbusiness into themicroprocessor’sclutches, and in the process the market for products with embeddedcomputationalintelligencehasdiversifiedinmanydirectionstothepointwherenowtherearealreadymorethantencomputersforeveryhumanontheplanet–a numberwhichwill increaseby another order ofmagnitudeover the comingdecadeasIoTbecomesapervasivereality.

Since the spread of the personal computer through the 1980s, followed bytheintroductionoftheworld-widewebinthe1990s(makingtheInternet,whichhad already been around for a couple of decades, usable by the widerpopulation)andconcurrentdevelopmentsinmobile(wireless)communications,computerandcommunicationstechnologyhasbecomepartoftheinfrastructureofhumansocietyeverywhere.Wehavetrulyenteredintoadigitalage,andthemicroprocessoristheengineofthatdigitalage.

DataaccessibilityNotapplicable.

CompetinginterestsThe author is a founder, director and shareholder of Cogniscience Ltd, aUniversityofManchester spin-outcompany thatownsaSpiNNakerpatentandrelatedIP.

Authors’contributionsSingleauthor–notapplicable.

AcknowledgementsThe instruction set architecture of the original ARM microprocessor wasdevelopedprincipallybymycolleagueatthattime,SophieWilson,withwhomIshare theRoyalSociety2016MullardAward.TheAcornteamincludedseveralfolk who wrote software test programs, and a VLSI design team whoimplementedthesiliconchipandlast,butnotleast,supportandencouragementfromAcorn’sseniormanagementatthattime,mostnotablyHermannHauser.

SpiNNaker has been 15 years in conception and 10 years in construction,andmany folk in Manchester and in various collaborating groups around theworld have contributed to get the project to its current state. I gratefullyacknowledgeallofthesecontributions.

Page 13: Microprocessors: the engines of the digital age

FundingstatementThedesignandconstructionoftheSpiNNakermachinewassupportedbyEPSRC(the UK Engineering and Physical Sciences Research Council) under grantsEP/D07908X/1 and EP/G015740/1, in collaboration with the universities ofSouthampton, Cambridge and Sheffield and with industry partners ARM Ltd,SilistixLtdandThales.OngoingdevelopmentofthesoftwareissupportedbytheEUICTFlagshipHumanBrainProject(FP7-604102),incollaborationwithmanyuniversityand industrypartnersacross theEUandbeyond,andexplorationofthecapabilitiesof themachine is supportedby theEuropeanResearchCouncilundertheEuropeanUnion’sSeventhFrameworkProgramme(FP7/2007-2013)/ERCgrantagreement320689.

EthicsstatementNotapplicable.

References

[1] Moore, GE. 1965 Cramming More Components onto Integrated Circuits.Electronics,pp.114–117,April19.

[2] Intel’smuseumarchive,“i4004datasheet”,http://www.intel.com/Assets/PDF/DataSheet/4004_datasheet.pdf

[3] Patterson,DA, Ditzel,DR.1980TheCasefortheReducedInstructionSetComputer.ACMSIGARCHComputerArchitectureNews8(6),pp.25-33.

[4] Radin,G. 1982The801minicomputer.Proc.1st IntSymponArchitecturalSupport for Programming Languages and Operating Systems (ASPLOS-I),pp.39–47.

[5] Hennessy, JL, Jouppi, N, Przybylski, S, Rowen, C, Gross, T, Baskett, F, Gill, J. 1982 MIPS: A Microprocessor Architecture. MICRO-15: Proc 15th Ann Symp. on Microprogramming, pp. 17-22, IEEE Press.

[6] Furber, SB.1989VLSIRISCArchitectureandOrganization,MarcelDekker,NewYork.

[7] Furber,S.2000ARMSystem-on-ChipArchitecture,AddisonWesley.[8] JaggerD(ed.).1997AdvancedRISCMachinesArchitectureReferenceManual

(1stedition),PrenticeHall.[9] Barroso,LA,Clidara, J, Hölze,U.2013TheDatacenterasaComputer:An

Introductions to the Design of Warehouse-Scale Machines (2nd edition),SynthesisLecturesonComputerArchitecture8(3),pp.1-154.

[10] Hinton,GE,Osindero,S,The,Y-W.2006AFastLearningAlgorithmforDeepBeliefNets.NeuralComputation18,pp.1527-1554.

[11] Jouppi,N.May18,2016.GooglesuperchargesmachinelearningtaskswithTPUcustomchip.https://cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html

[12] Furber,S.2016Large-ScaleNeuromorphicComputingSystems.J.NeuralEngineering13(5),pp.1-14.

[13] Pavlidis, V F, Friedman, E G. 2009 Three-dimensional Integrated CircuitDesign,MorganKaufmann,SanFrancisco,CA.

Page 14: Microprocessors: the engines of the digital age

[14] Atzori, L, Iera, A, Morabito, G. 2010 The Internet of Things: A Survey.ComputerNetworks54(15),pp.2787-2805.

[15] Esmeilzadeh,H,Blem,E,St.Amany,R,Sankaralingam,K, Burger,D.2011Dark Silicon and the End of Multicore Scaling, Proc. 38th Int Symp onComputerArchitecture(ISCA’11),SanJose,CA,USA,pp.365-376.

[16] Painkras, E, Plana, LA, Garside, JD, Temple, S, Galluppi, F, Patterson, C,Lester,DR,Brown,AD,Furber,S.2013SpiNNaker:A1W18-coreSystem-on-ChipforMassively-ParallelNeuralNetworkSimulation.IEEEJ.ofSolid-StateCircuits,pp1943-1953.

[17] Furber,SB,Galluppi,F,Temple,S,Plana,LA.2014TheSpiNNakerProject.Proc.IEEE102(5),pp.652-665.

Page 15: Microprocessors: the engines of the digital age

Figure1

Figure1: The firstARMchip,delivered in1985.Theregularityof the layout–particularlyofthe32-bitdatapaththatoccupiesoverhalfofthechipareaatthebottomofthefigure–betraysthemanualprocessusedforthedesign.Thechipincorporates25,000transistorsona7mmx7mmarea.Connectionstoexternalcircuitryaremadethroughtheouterringofinput/outputcircuitry.

Page 16: Microprocessors: the engines of the digital age

Figure2

Figure2. TheSpiNNakerchip,deliveredin2011.Eachofthe18coresincludesanARMprocessorandsundryadditionalcircuitrywithinthelighterrectangulararea.Thedarkerareawithineachcoreismemory(RAM).Thechipincorporates100,000,000transistors(thegreatmajorityofwhichareinthememories)ona1cm x 1cm area. Again, connections to external circuitry aremade through theouterringofinput/outputcircuitry.

Page 17: Microprocessors: the engines of the digital age

Figure3

Figure3.TheSpiNNakermachine,deliveredin2016.Themachineincorporates28,800 SpiNNaker chips – half a million ARM processor cores in total –connected into a two-dimensional toroidal surface, contained in five 19” rackcabinets. The sixth cabinet contains the associated server systems. The totalactivesiliconareainthemachineamountstofivesquaremetres.

Page 18: Microprocessors: the engines of the digital age

Authorprofile

SteveFurberCBEFRSFREng isICLProfessorofComputerEngineeringintheSchool of Computer Science at the University of Manchester, UK. AftercompletingaBAinmathematicsandaPhDinaerodynamicsattheUniversityofCambridge, UK, he spent the 1980s at Acorn Computers, where he was aprincipal designer of the BBC Microcomputer and the ARM 32-bit RISCmicroprocessor.Over85billionvariantsoftheARMprocessorhavesincebeenmanufactured,poweringmuchoftheworld'smobileandembeddedcomputing.HemovedtotheICLChairatManchesterin1990whereheleadsresearchintoasynchronous and low-power systems and, more recently, neural systemsengineering,wheretheSpiNNakerprojectisdeliveringacomputerincorporatingamillionARMprocessorsoptimizedforbrainmodellingapplications.