Microprocessor Design

Upload: muhammad-umer

Post on 20-Nov-2015

  • Microprocessor Design

  • Contents

      0.1 Wikibooks:Collections Preface
        0.1.1 What is Wikibooks?
        0.1.2 What is this book?
        0.1.3 Who are the authors?
        0.1.4 Wikibooks in Class
        0.1.5 Happy Reading!
      0.2 Microprocessor Design/Introduction
        0.2.1 About This Book
        0.2.2 How Will This Book Be Organized?
        0.2.3 Prerequisites
        0.2.4 Who Is This Book For?
        0.2.5 What This Book Will Not Cover
        0.2.6 Terminology

    1 Microprocessor Basics
      1.1 Microprocessor Design/Microprocessors
        1.1.1 Microprocessors
        1.1.2 Abstraction Layers
        1.1.3 Operating System
        1.1.4 ISA
        1.1.5 Moore's Law
        1.1.6 Clock Rates
        1.1.7 Basic Elements of a Computer
      1.2 Microprocessor Design/OS
        1.2.1 Operating System
        1.2.2 Types of Operating System
      1.3 Microprocessor Design/Real-Time Operating System
        1.3.1 Real-Time Operating System
        1.3.2 RTOS vs. General Purpose OS
        1.3.3 Kernel in RTOS
      1.4 Microprocessor Design/Embedded System
        1.4.1 Embedded System
        1.4.2 Characteristics
        1.4.3 Real-Time Embedded System
      1.5 Microprocessor Design/Computer Architecture
        1.5.1 Von Neumann Architecture
        1.5.2 Harvard Architecture
        1.5.3 Modern Computers
        1.5.4 RISC and CISC and DSP
        1.5.5 Microprocessor Components
        1.5.6 Endian
        1.5.7 Stack
        1.5.8 further reading
      1.6 Microprocessor Design/Instruction Set Architectures
        1.6.1 ISAs
        1.6.2 Memory Arrangement
        1.6.3 Common Instructions
        1.6.4 Instruction Length
        1.6.5 Further reading
      1.7 Microprocessor Design/Memory
        1.7.1 Memory Hierarchy
        1.7.2 Hard Disk Drives
        1.7.3 RAM
        1.7.4 Cache
        1.7.5 Registers
      1.8 Microprocessor Design/Control and Datapath
        1.8.1 References
      1.9 Microprocessor Design/Performance
        1.9.1 Clock Cycles
        1.9.2 Cycles per Instruction
        1.9.3 Instruction count
        1.9.4 CPU Time
        1.9.5 Performance
        1.9.6 Amdahl's Law
        1.9.7 Benchmarking
      1.10 Microprocessor Design/Assembly Language
        1.10.1 Assemblers
        1.10.2 Assembly Language Constructs
        1.10.3 Load and Store
        1.10.4 Arithmetic
        1.10.5 Jumping
        1.10.6 Branching
        1.10.7 Further reading
      1.11 Microprocessor Design/Design Steps
        1.11.1 Determine Machine Capabilities
        1.11.2 Design the Datapath
        1.11.3 Create ISA
        1.11.4 Instruction Set Design
        1.11.5 Build Control Logic
        1.11.6 Design the Address Path
        1.11.7 Verify the design
        1.11.8 Further reading
        1.11.9 References

    2 Microprocessor Components
      2.1 Microprocessor Design/Basic Components
        2.1.1 Basic Components
        2.1.2 Registers
        2.1.3 Multiplexers
        2.1.4 Adder
      2.2 Microprocessor Design/Program Counter
        2.2.1 Updating the PC
        2.2.2 Branching
      2.3 Microprocessor Design/Instruction Decoder
        2.3.1 RISC Instruction Decoder
        2.3.2 CISC Instruction Decoder
      2.4 Microprocessor Design/Register File
        2.4.1 Register File
        2.4.2 More registers than you can shake a stick at
        2.4.3 Register Bank
        2.4.4 References
      2.5 Microprocessor Design/Memory Unit
        2.5.1 Memory Unit
        2.5.2 Actions of the Memory Unit
        2.5.3 Timing Issues
      2.6 Microprocessor Design/ALU
        2.6.1 Tasks of an ALU
        2.6.2 ALU Slice
        2.6.3 Example: 2-Bit ALU
        2.6.4 Example: 4-Bit ALU
        2.6.5 Additional Operations
        2.6.6 ALU Configurations
        2.6.7 References
      2.7 Microprocessor Design/FPU
        2.7.1 Floating point numbers
        2.7.2 Floating Point Unit Design
        2.7.3 Further Reading
      2.8 Microprocessor Design/Control Unit
        2.8.1 Simple Control Unit
        2.8.2 Complex Control Unit

    3 ALU Design
      3.1 Microprocessor Design/Add and Subtract Blocks
        3.1.1 Addition and Subtraction
        3.1.2 Bit Adders
        3.1.3 Serial Adder
        3.1.4 Parallel Adder
        3.1.5 Sources
      3.2 Microprocessor Design/Shift and Rotate Blocks
        3.2.1 Shift and Rotate
        3.2.2 Logical Shift
        3.2.3 Arithmetic shift
        3.2.4 Rotations
        3.2.5 Fast Shift Implementations
        3.2.6 Further reading
      3.3 Microprocessor Design/Multiply and Divide Blocks
        3.3.1 Multiply and Divide Problems
        3.3.2 Multiplication Algorithms
        3.3.3 Division Algorithm
        3.3.4 Multiply and Accumulate
      3.4 Microprocessor Design/ALU Flags
        3.4.1 Comparisons
        3.4.2 Zero Flag
        3.4.3 Overflow Flag
        3.4.4 Carry/Borrow flag
        3.4.5 Comparisons
        3.4.6 Latch ALU flags or not?

    4 Design Paradigms
      4.1 Microprocessor Design/Single Cycle Processors
        4.1.1 Cycle Times
        4.1.2 Redundant Hardware
        4.1.3 Single Cycle Designs
      4.2 Microprocessor Design/Multi Cycle Processors
        4.2.1 Multi-Cycle Stages
        4.2.2 Hardware Reuse
      4.3 Microprocessor Design/Pipelined Processors
        4.3.1 Pipelining Introduction
        4.3.2 Pipelining Hardware
        4.3.3 Superpipeline
        4.3.4 Resources
      4.4 Microprocessor Design/Superscalar Processors
      4.5 Microprocessor Design/VLIW Processors
        4.5.1 VLIW Vs Superscalar
        4.5.2 Multi-Issue
      4.6 Microprocessor Design/Vector Processors
        4.6.1 Parallel Execution
        4.6.2 Non-Parallel Execution
      4.7 Microprocessor Design/Multi-Core Systems
        4.7.1 Symmetric Multi-core
        4.7.2 Asymmetric Multi-core
        4.7.3 Symmetric Multicore
        4.7.4 Asymmetric Multi-core
        4.7.5 further reading

    5 Execution Problems
      5.1 Microprocessor Design/Exceptions
      5.2 Microprocessor Design/Interrupts
        5.2.1 Further Reading
      5.3 Microprocessor Design/Hazards
        5.3.1 Data Hazards
        5.3.2 Control Hazards
        5.3.3 Structural Hazards
        5.3.4 Fixing Hazards

    6 Benchmarking and Optimization
      6.1 Microprocessor Design/Performance Metrics
      6.2 Performance Metrics
        6.2.1 Runtime
        6.2.2 Processor Time
        6.2.3 MIPS/$
        6.2.4 Latency
        6.2.5 MIPS/mW
        6.2.6 Further reading
      6.3 Microprocessor Design/Benchmarking
        6.3.1 Benchmarks
        6.3.2 Common Benchmarks
        6.3.3 Benchmark Problems
        6.3.4 Further reading
      6.4 Microprocessor Design/Optimizations

    7 Parallel Processing
      7.1 Microprocessor Design/Multi-Core Systems
        7.1.1 Symmetric Multi-core
        7.1.2 Asymmetric Multi-core
        7.1.3 Symmetric Multicore
        7.1.4 Asymmetric Multi-core
        7.1.5 further reading
      7.2 Microprocessor Design/Memory-Level Parallelism
        7.2.1 Memory-Level Parallelism
      7.3 Microprocessor Design/Out Of Order Execution
        7.3.1 Hazards
        7.3.2 Example: Intel Hyperthreading

    8 Support Software
      8.1 Microprocessor Design/Assembler
      8.2 Microprocessor Design/Simulator
      8.3 Microprocessor Design/Compiler
        8.3.1 Further reading

    9 Microprocessor Production
      9.1 Microprocessor Design/FPGA
      9.2 Microprocessor Design/Wire Wrap
        9.2.1 Overview
        9.2.2 Parts
        9.2.3 Tools
        9.2.4 Design Tips
        9.2.5 Assembly Tips
        9.2.6 Programming Tips
        9.2.7 Further Reading
      9.3 Microprocessor Design/Photolithography
        9.3.1 Wafers
        9.3.2 Basic Photolithography
        9.3.3 packaging
        9.3.4 further reading
      9.4 Microprocessor Design/Sockets and interfacing
        9.4.1 Form Factors
        9.4.2 Connectors
        9.4.3 Sockets

    10 Advanced Topics
      10.1 Microprocessor Design/Microcodes
        10.1.1 Further Reading
        10.1.2 References
      10.2 Microprocessor Design/Cache
        10.2.1 Cache
        10.2.2 No cache
        10.2.3 Single cache
        10.2.4 Hit or Miss
        10.2.5 Cache performance
        10.2.6 Cache Hierarchy
        10.2.7 Size of Cache
        10.2.8 Cache Tagging
        10.2.9 Memory Stall Cycles
        10.2.10 Associativity
        10.2.11 Cache Misses
        10.2.12 Cache Write Policy
        10.2.13 Stale Data
        10.2.14 Split cache
        10.2.15 Error detection
        10.2.16 Specialized cache features
        10.2.17 References
        10.2.18 Further reading
      10.3 Microprocessor Design/Virtual Memory
        10.3.1 Implementation
        10.3.2 Memory Accessing
        10.3.3 Pages
        10.3.4 Page Table
        10.3.5 Further reading
      10.4 Microprocessor Design/Power Dissipation
        10.4.1 Gene's Law
        10.4.2 Two reasons to reduce power
        10.4.3 Heat
        10.4.4 further reading
        10.4.5 Resources
      10.5 Microprocessor Design/GPU
        10.5.1 Characteristics of GPU
        10.5.2 GPU Architecture
        10.5.3 GPU Functions & Computing
        10.5.4 GPU Accelerated Computing
        10.5.5 GPGPU Computing
        10.5.6 Difference between GPU and CPU
        10.5.7 GPU APIs
        10.5.8 GPU Applications
        10.5.9 Further reading
        10.5.10 References

    11 Text and image sources, contributors, and licenses
      11.1 Text
      11.2 Images
      11.3 Content license

  • 0.1. WIKIBOOKS:COLLECTIONS PREFACE 1

    0.1 Wikibooks:Collections Preface

    This book was created by volunteers at Wikibooks (http://en.wikibooks.org).

    0.1.1 What is Wikibooks?

    Started in 2003 as an offshoot of the popular Wikipedia project, Wikibooks is a free, collaborative wiki website dedicated to creating high-quality textbooks and other educational books for students around the world. In addition to English, Wikibooks is available in over 130 languages, a complete listing of which can be found at http://www.wikibooks.org. Wikibooks is a wiki, which means anybody can edit the content there at any time. If you find an error or omission in this book, you can log on to Wikibooks to make corrections and additions as necessary. All of your changes go live on the website immediately, so your effort can be enjoyed and utilized by other readers and editors without delay.

    Books at Wikibooks are written by volunteers, and can be accessed and printed for free from the website. Wikibooks is operated entirely by donations, and a certain portion of proceeds from sales is returned to the Wikimedia Foundation to help keep Wikibooks running smoothly. Because of the low overhead, we are able to produce and sell books much more cheaply than proprietary textbook publishers can. This book can be edited by anybody at any time, including you. We don't make you wait two years to get a new edition, and we don't stop selling old versions when a new one comes out.

    Note that Wikibooks is not a publisher of books, and is not responsible for the contributions of its volunteer editors. PediaPress.com is a print-on-demand publisher that is also not responsible for the content that it prints. Please see our disclaimer for more information: http://en.wikibooks.org/wiki/Wikibooks:General_disclaimer.

    0.1.2 What is this book?

    This book was generated by the volunteers at Wikibooks, a team of people from around the world with varying backgrounds. The people who wrote this book may not be experts in the field. Some may not even have a passing familiarity with it. As a result, some information in this book may be incorrect, out of place, or misleading. For this reason, you should never rely on a community-edited Wikibook when dealing in matters of medical, legal, financial, or other importance. Please see our disclaimer for more details on this.

    Despite the warning of the last paragraph, however, books at Wikibooks are continuously edited and improved. If errors are found, they can be corrected immediately. If you find a problem in one of our books, we ask that you be bold in fixing it. You don't need anybody's permission to help or to make our books better.

    Wikibooks runs on the assumption that many eyes can find many errors, and many able hands can fix them. Over time, with enough community involvement, the books at Wikibooks will become very high-quality indeed. You are invited to participate at Wikibooks to help make our books better. As you find problems in your book, don't just complain about them: log on and fix them! This is a kind of proactive and interactive reading experience that you probably aren't familiar with yet, so log on to http://en.wikibooks.org and take a look around at all the possibilities. We promise that we won't bite!

    0.1.3 Who are the authors?

    The volunteers at Wikibooks come from around the world and have a wide range of educational and professional backgrounds. They come to Wikibooks for different reasons, and perform different tasks. Some Wikibookians are prolific authors, some are perceptive editors, some fancy illustrators, others diligent organizers. Some Wikibookians find and remove spam, vandalism, and other nonsense as it appears. Most Wikibookians perform a combination of these jobs.

    It's difficult to say who the authors are for any particular book, because so many hands have touched it and so many changes have been made over time. It's not unheard of for a book to have been edited thousands of times by hundreds of authors and editors. You could be one of them too, if you're interested in helping out.

    0.1.4 Wikibooks in Class

    Books at Wikibooks are free, and with the proper editing and preparation they can be used as cost-effective textbooks in the classroom or for independent learners. In addition to using a Wikibook as a traditional read-only learning aid, it can also become an interactive class project. Several classes have come to Wikibooks to write new books and improve old books as part of their normal course work. In some cases, the books written by students one year are used to teach students in the same class next year. Books written can also be used in classes around the world by students who might not be able to afford traditional textbooks.

    0.1.5 Happy Reading!

    We at Wikibooks have put a lot of effort into these books, and we hope that you enjoy reading and learning from them. We want you to keep in mind that what you are holding is not a finished product but instead a work in progress. These books are never finished in the traditional sense, but they are ever-changing and evolving to meet the needs of readers and learners everywhere. Despite this constant change, we feel our books can be reliable and high-quality learning tools at a great price, and we hope you agree. Never hesitate to stop in at Wikibooks and make some edits of your own. We hope to see you there one day. Happy reading!

    0.2 Microprocessor Design/Introduction


    0.2.1 About This Book

    Computers and computer systems are a pervasive part of the modern world. Aside from the common desktop PC, there are a number of other types of specialized computer systems that pop up in many different places. The central component of these computers and computer systems is the microprocessor, or the CPU. The CPU (short for Central Processing Unit) is essentially the brains behind the computer system; it is the component that computes. This book is going to discuss what microprocessor units do, how they do it, and how they are designed.

    This book is going to discuss the design of microprocessor units, but it will not discuss the design of complete computer systems nor the design of other computer components or peripherals. Some microprocessor designs will be implemented and synthesized in Hardware Description Languages, such as Verilog or VHDL. The book will be organized to discuss simple designs and concepts first, and expand the initial designs to include more complicated concepts as the book progresses.

    This book will attempt to discuss the basic concepts and theory of microprocessor design from an abstract level, and give real-world examples as necessary. This book will not focus on studying any particular processor architecture, although several of the most common architectures will appear frequently in examples and notes.

    0.2.2 How Will This Book Be Organized?

    The first section of the book will review computer architecture, and will give a brief overview of the components of a computer, the components of a microprocessor, and some of the basic architectures of modern microprocessors.

    The second section will discuss in some detail the individual components of a microcontroller, what they do, and how they are designed.

    The third section will focus on the ALU and FPU, and will discuss implementation of particular mathematical operations.

    The fourth section will discuss the various design paradigms, from the simplest single-cycle machine to more complicated exotic architectures such as vector and VLIW machines.

    Additional chapters will serve as extensions and support chapters for concepts discussed in the first four sections.

    0.2.3 Prerequisites

    This book will rely on some important background information that is currently covered in a number of other local wikibooks. Readers of this book will find the following prerequisites important to understand the material in this book:

    Digital Circuits

    Programmable Logic

    Embedded Systems

    Assembly Language

    All readers must be familiar with binary numbers and also hexadecimal numbers. These notations will be used throughout the book without any prior explanation. Readers of this book should be familiar with at least one assembly language, and should also be familiar with a hardware description language. This book will use both types of languages in the main narrative of the text without offering explanation beforehand. Appendices might be included that contain primers on this material.

    Readers of this book will also find some pieces of software helpful in examples. Specifically, assemblers and assembly language simulators will help with many of the examples. Likewise, HDL compilers and simulators will be useful in the design examples. If free versions of these software programs can be found, links will be added in an appendix.


    0.2.4 Who Is This Book For?

    This book is designed to accompany an advanced undergraduate or graduate study in the field of microprocessor design. Students in the areas of Electrical Engineering, Computer Engineering, or Computer Science will likely find this book to be the most useful. The basic subjects in this field will be covered, and more advanced topics will be included depending on the proficiencies of the authors.

    Many of the topics considered in this book will apply to the design of many different types of digital hardware, including ASICs. However, the main narrative of the book, and the ultimate goals of the book, will be focused on microcontrollers and microprocessors, not other ASICs.

    0.2.5 What This Book Will Not Cover

    This book is about the design of microcontrollers and microprocessors only. This book will not cover the following topics in any detail, although some mention might be made of them as a matter of interest:

    Transistor mechanics, semiconductors, or integrated circuit fabrication (Microtechnology)

    Digital Circuit Logic, Design or Layout (Programmable Logic)

    Design or interfacing with other computer components or peripherals (Embedded Systems)

    Design or implementation of communication protocols used to communicate between computer components (Serial Programming)

    Design or creation of computer software (Computer Programming)

    Design of System-on-a-Chip hardware or any device with an integrated microcontroller

    0.2.6 Terminology

    Throughout the book, the words Microprocessor, Microcontroller, Processor, and CPU will all generally be used interchangeably to denote a digital processing element capable of performing arithmetic and quantitative comparisons. We may differentiate between these terms in individual sections, but an explanation of the differences will always be provided.


  • Chapter 1

    Microprocessor Basics

    1.1 Microprocessor Design/Microprocessors


    1.1.1 Microprocessors

    Microprocessors are the devices in a computer which make things happen. Microprocessors are capable of performing basic arithmetic operations, moving data from place to place, and making basic decisions based on comparisons of values.

    The components of a PC computer. Part number 3 is the CPU.

    Types of Processors

    The vast majority of microprocessors can be found in embedded microcontrollers. The second most common type of processors are common desktop processors, such as Intel's Pentium or AMD's Athlon. Less common are the extremely powerful processors used in high-end servers, such as Sun's SPARC, IBM's Power, or Intel's Itanium.

    Historically, microprocessors and microcontrollers have come in standard sizes of 8 bits, 16 bits, 32 bits, and 64 bits. These sizes are common, but other sizes are available. Some microcontrollers (usually specially designed embedded chips) come in non-standard sizes such as 4 bits, 12 bits, 18 bits, or 24 bits. The number of bits usually describes how many bits can be transferred by one read/write operation, and in the simplest case it also determines how much physical memory can be directly addressed by the CPU. In some circumstances, these two widths are different; for instance, many 8-bit microprocessors have an 8-bit data bus and a 16-bit address bus. When the address width equals the data width:

    8-bit processors can read/write 1 byte at a time, and can directly address 256 bytes

    16-bit processors can read/write 2 bytes at a time, and can address 65,536 bytes (64 Kilobytes)

    32-bit processors can read/write 4 bytes at a time, and can address 4,294,967,296 bytes (4 Gigabytes)

    64-bit processors can read/write 8 bytes at a time, and can address 18,446,744,073,709,551,616 bytes (16 Exabytes)
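The figures in this list follow from two simple relations: an n-bit address selects one of 2^n bytes, and an n-bit data path moves n/8 bytes per transfer. The sketch below reproduces them; the function names are illustrative and not part of any library.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with an address of this width."""
    return 2 ** address_bits


def bytes_per_transfer(data_bits: int) -> int:
    """Number of bytes moved by one read/write on a data bus of this width."""
    return data_bits // 8


if __name__ == "__main__":
    for bits in (8, 16, 32, 64):
        print(f"{bits}-bit: {bytes_per_transfer(bits)} byte(s) per transfer, "
              f"{addressable_bytes(bits):,} addressable bytes")

    # The mixed case mentioned in the text: an 8-bit data bus with a
    # 16-bit address bus moves 1 byte at a time but can reach 64 KB.
    print(bytes_per_transfer(8), addressable_bytes(16))
```

The mixed case at the end shows why the data width and the address width have to be considered separately.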

    General Purpose Versus Specific Use

    Microprocessors that are capable of performing a wide range of tasks are called general purpose microprocessors. General purpose microprocessors are typically the kind of CPUs found in desktop computer systems. These chips typically are capable of a wide range of tasks (integer and floating point arithmetic, external memory interface, general I/O, etc). We will discuss some of the other types of processor units available:

    General Purpose: A general purpose processing unit, typically referred to as a microprocessor, is a chip that is designed to be integrated into a larger system with peripherals and external RAM. These chips can typically be used with a very wide array of software.

    DSP: A Digital Signal Processor, or DSP for short, is a chip that is specifically designed for fast arithmetic operations, especially addition and multiplication. These chips are designed with processing speed in mind, and don't typically have the same flexibility as general purpose microprocessors. DSPs also have special address generation units that can manage circular buffers, perform bit-reversed addressing, and simultaneously access multiple memory spaces with little to no overhead. They also support zero-overhead looping and a single-cycle multiply-accumulate instruction. They are not typically more powerful than general purpose microprocessors, but can perform signal processing tasks using far less power (as in watts).

    Embedded Controller: Embedded controllers, or microcontrollers, are microprocessors with additional hardware integrated into a single chip. Many microcontrollers have RAM, ROM, A/D and D/A converters, interrupt controllers, timers, and even oscillators built into the chip itself. These controllers are designed to be used in situations where a whole computer system isn't available, and only a small amount of simple processing needs to be performed.

    Programmable State Machines: The simplest of processors, programmable state machines are minimalist microprocessors designed for very small and simple operations. PSMs typically have a very small amount of program ROM available and limited scratch-pad RAM, and they are also typically limited in the type and number of instructions that they can perform. PSMs can either be used stand-alone, or (more frequently) they are embedded directly into the design of a larger chip.

    Graphics Processing Units: Computer graphics are so complicated that functions to process the visuals of video and game applications have been offloaded to a special type of processor known as a GPU. GPUs typically require specialized hardware to implement matrix multiplications and vector arithmetic. GPUs are typically also highly parallelized, performing shading calculations on multiple pixels and surfaces simultaneously.
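Two of the DSP features named in the list above, circular buffers and multiply-accumulate, are easiest to see in a small filter. The following is a minimal software sketch of a finite impulse response (FIR) filter; the class name and coefficients are illustrative assumptions, and a real DSP would do the index wrap-around and each multiply-accumulate in hardware rather than in a loop like this.

```python
class CircularFIR:
    """FIR filter that keeps its recent samples in a circular buffer."""

    def __init__(self, coefficients):
        self.coeffs = list(coefficients)
        self.buffer = [0.0] * len(self.coeffs)
        self.head = 0  # index where the next sample will be written

    def step(self, sample):
        """Store one new sample and return the filtered output."""
        self.buffer[self.head] = sample
        acc = 0.0
        idx = self.head
        for c in self.coeffs:
            # Multiply-accumulate: the operation a DSP performs in one cycle.
            acc += c * self.buffer[idx]
            # Step backwards through older samples, wrapping at the start.
            idx = (idx - 1) % len(self.buffer)
        # Advance the write position, wrapping at the end of the buffer.
        self.head = (self.head + 1) % len(self.buffer)
        return acc


if __name__ == "__main__":
    # Two-tap moving average: each output is the mean of the last two samples.
    fir = CircularFIR([0.5, 0.5])
    print([fir.step(x) for x in (1.0, 3.0, 5.0)])
```

Because the buffer wraps around, no samples are ever shifted in memory; only the `head` index moves, which is the work a DSP address generation unit takes over.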

    Types of Use

    Microcontrollers and microprocessors are used for a number of different types of applications. People may be the most familiar with the desktop PC, but the fact is that desktop PCs make up only a small fraction of all microprocessors in use today. We will list here some of the basic uses for microprocessors:

    Signal Processing: Signal processing is an area that demands high performance from microcontroller chips to perform complex mathematical tasks. Signal processing systems typically need to have low latency, and are very deadline driven. An example of a signal processing application is the decoding of digital television and radio signals.

    Real Time Applications: Some tasks need to be performed so quickly that even the slightest delay or inefficiency can be detrimental. These applications are known as real time systems, and timing is of the utmost importance. An example of a real-time system is the anti-lock braking system (ABS) controller in modern automobiles.

    Throughput and Routing: Throughput and routing is the use of a processor where data is moved from one particular input to an output, without necessarily requiring any processing. An example is an internet router, which reads in data packets and sends them out on a different port.

    Sensor Monitoring: Many processors, especially small embedded processors, are used to monitor sensors. The microprocessor will either digitize and filter the sensor signals, or it will read the signals and produce status outputs (the sensor is good, the sensor is bad). An example of a sensor monitoring processor is the processor inside an anti-lock brake system: this processor reads the brake sensor to determine when the brakes have locked up, and then outputs a control signal to activate the rest of the system.

    General Computing: A general purpose processor is the kind of processor that is typically found inside a desktop PC. Names such as Intel and AMD are typically associated with this type of processor, and this is also the kind of processor that the public is most familiar with.

    Graphics: Processing of digital graphics is an area where specialized processor units are frequently employed. With the advent of digital television, graphics processors are becoming more common. Graphics processors need to be able to perform multiple simultaneous operations. In digital video, for instance, a million pixels or more will need to be processed for every single frame, and a particular signal may have 60 frames per second! To the benefit of graphics processors, the color value of a pixel is typically not dependent on the values of surrounding pixels, and therefore many pixels can typically be computed in parallel.


    1.1.2 Abstraction Layers

    Computer systems are developed in layers known as layers of abstraction. Layers of abstraction allow people to develop computer components (hardware and software) without having to worry about the internal design of the other layers in the system. At the highest level are the user-interface programs that people use on their computers. At the lowest level are the transistor layouts of the individual computer components. Some of the layers in a computer system are (listed from highest to lowest):

    1. Application

    2. Operating System

    3. Firmware

    4. Instruction Set Architecture

    5. Microprocessor Control Logic

    6. Physical Circuit Layout

    This book will be mostly concerned with the Instruction Set Architecture (ISA) and the Microprocessor Control Logic, but we will also describe the Operating System (OS) in brief. Topics above these are typically the realm of computer programmers. The bottom layer, the Physical Circuit Layout, is the job of hardware and VLSI engineers.

    1.1.3 Operating System

    An operating system is a program which acts as an interface between the system user and the computer hardware, and which controls the execution of application programs. The part of the operating system that is running at all times on the computer is usually called the kernel.

    1.1.4 ISA

    The Instruction Set Architecture is a long name for the assembly language of a particular machine, and the associated machine code for that assembly language. We will discuss this below.

    Assembly Language

    An assembly language is a small language that contains a short word or mnemonic for each individual command that a microcontroller can follow. Each command gets a single mnemonic, and each mnemonic corresponds to a single machine command. Assembly language gets converted (by a program called an assembler) into the binary machine code. The machine code is specific to each different type of machine.
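The one-to-one mapping between mnemonics and machine commands can be illustrated with a toy assembler. The instruction set below is entirely hypothetical (it is not the ISA of any real machine): every instruction is one byte, with a 4-bit opcode in the upper half and a 4-bit operand in the lower half.

```python
# Hypothetical 4-bit opcodes, invented for this illustration only.
OPCODES = {"LOAD": 0x1, "ADD": 0x2, "STORE": 0x3, "HALT": 0xF}


def assemble_line(line: str) -> int:
    """Translate one line such as 'ADD 3' into a single machine-code byte."""
    parts = line.split()
    mnemonic = parts[0].upper()
    operand = int(parts[1], 0) if len(parts) > 1 else 0
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    if not 0 <= operand <= 0xF:
        raise ValueError(f"operand out of range: {operand}")
    # Opcode in the upper four bits, operand in the lower four bits.
    return (OPCODES[mnemonic] << 4) | operand


def assemble(source: str) -> bytes:
    """Translate a multi-line program into machine code, one byte per line."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    return bytes(assemble_line(ln) for ln in lines)


if __name__ == "__main__":
    program = """
    LOAD 2
    ADD 3
    STORE 4
    HALT
    """
    print(assemble(program).hex(" "))
```

A different machine would use a different opcode table and encoding, which is why machine code is specific to each type of machine.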

    Common ISAs

    Wikibooks contains books about programming in multiple different types of assembly language. For more information about assembly language, or for books on a particular ISA, see Assembly Language.

    Some of the most common ISAs, listed in order of popularity (most popular first) are:

    ARM

    IA-32 (Intel x86)

    MIPS

    Motorola 68K

    PowerPC

    Hitachi SH

    SPARC

    1.1.5 Moore's Law

    A common law that governs the world of microprocessors is Moore's Law. The observation was made by Intel co-founder Gordon Moore, and the name "Moore's Law" was coined by Dr. Carver Mead at Caltech. Moore's Law states that the number of transistors on a single chip at the same price will double every 18 to 24 months. This trend has held approximately for several decades since it was originally stated in 1965. Current microprocessor chips contain anywhere from millions to billions of transistors, and the number continues to grow. Here is Moore's summarization of the law from Electronics Magazine in 1965:

    The complexity for minimum component costs has increased at a rate of roughly a factor of two per year... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.

    Gordon Moore

    Moore's Law has been used incorrectly to calculate the speed of an integrated circuit, or even to calculate its power consumption, but neither of these interpretations is correct. Also, Moore's Law refers to the number of transistors on a chip for a minimum component cost, which means that the number of transistors on a chip, for the same price, will double. It follows that cheaper chips can have fewer transistors, and more expensive chips can have more. On an economic note, a consequence of Moore's Law is that companies need to continue to innovate and integrate more transistors onto a single chip, without being able to increase prices.

    Moore's Law does not require that the speed of the chip increase along with the number of transistors on the chip. However, the two measurements are typically related. Some points to keep in mind about transistors and Moore's Law are:

    1. Smaller transistors typically switch faster than larger transistors.

    2. To get more transistors on a single chip, the chip needs to be made larger, or the transistors need to be made smaller. Typically, the transistors get smaller.

    3. Transistors tend to leak electrical current as they get smaller. This leakage wastes power and generates additional heat.

    4. Transistors tend to generate heat as a function of frequency. Higher clock rates tend to generate more heat.

    Moore's Law is occasionally misinterpreted to mean that the speed of processors, in hertz, will double every 18 months. This is not strictly true, although the speed of processors does tend to increase as transistors are made smaller and more compact. With the advent of multi-core processors, some people have used Moore's Law to mean that processor throughput increases with time, which is not strictly the case either (although it is a likely side effect of Moore's Law).
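The doubling described in this section is plain exponential growth, so projections are easy to check numerically. In the sketch below the starting value of 64 components is an assumption, obtained by working backwards from the 65,000 figure in Moore's 1965 quotation (65,000 divided by 2^10 is roughly 64); it is not a number given in the text.

```python
def project_components(initial: float, years: float,
                       doubling_period_years: float) -> float:
    """Project a component count forward, doubling once every doubling period."""
    return initial * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Moore's 1965 projection: doubling every year for ten years.
    # The starting count of 64 is an assumed, back-calculated value.
    print(project_components(64, years=10, doubling_period_years=1))

    # Growth factors over ten years for the 18- and 24-month statements.
    for period in (1.5, 2.0):
        factor = project_components(1, years=10, doubling_period_years=period)
        print(f"doubling every {period} years -> x{factor:.1f} in 10 years")
```

The same ten-year span gives roughly a thousandfold increase at one doubling per year, but only about a hundredfold at 18 months and thirtyfold at 24 months, which is why the stated doubling period matters so much.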

    1.1.6 Clock Rates

    Microprocessors are typically discussed in terms of their clock speed. The clock speed is measured in hertz (or megahertz, or gigahertz). A hertz is a cycle per second. Each cycle, a microprocessor will perform certain tasks, although the amount of work performed in a single cycle will be different for different types of processors. The number of cycles a processor needs to complete its work is measured in cycles per instruction (CPI), the average number of clock cycles needed to complete one instruction. For some systems, such as a simple single-cycle MIPS design, there is 1 cycle per instruction. For other systems, such as many x86 chips, individual instructions can take several cycles.

    The clock time is related to the clock rate as follows:

    $$\mbox{Clock Time} = \frac{1}{\mbox{Clock Rate}}$$

    This means that the amount of time for a cycle is inversely proportional to the clock rate. A computer with a 1 MHz clock rate will have a clock time of 1 microsecond. A modern desktop computer with a 3.2 GHz processor will have a clock time of approximately $3.1 \times 10^{-10}$ seconds, or about 310 picoseconds. That is an incredibly small amount of time, and there is a lot that needs to happen inside the processor in each clock cycle.
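The clock time relation can be checked with a few lines of code. The second function below uses the standard relation that execution time equals instruction count times cycles per instruction times clock time; that relation is not stated in the text above and is included here as a commonly used extension. The function names are illustrative.

```python
def clock_time_seconds(clock_rate_hz: float) -> float:
    """Duration of one clock cycle: the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def execution_time_seconds(instruction_count: int,
                           cycles_per_instruction: float,
                           clock_rate_hz: float) -> float:
    """Time to run a program: instructions x cycles each x time per cycle."""
    cycles = instruction_count * cycles_per_instruction
    return cycles * clock_time_seconds(clock_rate_hz)


if __name__ == "__main__":
    print(clock_time_seconds(1e6))      # 1 MHz clock
    print(clock_time_seconds(3.2e9))    # 3.2 GHz clock
    # One million instructions at 1 cycle each versus 4 cycles each, at 1 MHz.
    print(execution_time_seconds(1_000_000, 1, 1e6))
    print(execution_time_seconds(1_000_000, 4, 1e6))
```

The last two lines show that a higher clock rate alone does not determine performance: at the same clock rate, a design that needs four cycles per instruction takes four times as long.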

    1.1.7 Basic Elements of a Computer

    There are a few basic elements that are common to all computers. These elements are:

    CPU

    Memory

    Input Devices

    Output Devices

    Depending on the particular computer architecture, these elements may be available in various sizes, and they may be accompanied by additional elements.

    1.2 Microprocessor Design/OS

    1.2.1 Operating System

    An operating system is an essential component of the system software. It is a program that enables the computer hardware to communicate and operate with the computer software. An operating system, which is initially loaded into the computer by a boot program, manages all the other programs in a computer. The OS provides a software platform on top of which other application programs can run. The application programs make use of the operating system by making requests for services through a defined application program interface (API). In addition, users can interact directly with the operating system through a user interface such as a command language or a graphical user interface (GUI). An OS is commonly equipped with features like multitasking, synchronization, interrupt and event handling, input/output, inter-task communication, timers and clocks, and memory management to fulfill its primary role of managing the hardware resources to meet the demands of application programs.

    Operating systems can be found on almost any device that contains a computer, from cellular phones and video game consoles to supercomputers and web servers. Examples of popular modern operating systems include Android, BSD, iOS, Linux, OS X, QNX, Microsoft Windows, Windows Phone, and IBM z/OS. All of these, except Windows, Windows Phone, and z/OS, share roots in UNIX.

    A kernel is a program that constitutes the central component of an operating system. It has complete control over everything that occurs in the system.

    Functions of an Operating System

    1. It makes the system more convenient to use by the user.

    2. It manages the hardware and software resources of the system by making sure that each application gets the necessary resources while managing all the other applications simultaneously, thus increasing efficiency.

    3. The OS is responsible for providing a consistent application program interface (API). A consistent application program interface allows a software developer to write an application on one device and have a high level of confidence that it will run on another device of the same type, even if the amount of memory or the quantity of storage is different on the two machines.

    4. An OS should be constructed in such a way as to permit the effective development, testing, and introduction of new system functions without at the same time interfering with service.

    1.2.2 Types of Operating System

    A part of the operating system called the scheduler is responsible for deciding which program to run when, and provides the illusion of simultaneous execution by rapidly switching between each program. The type of an operating system is defined by how the scheduler decides which program to run when.

    1. Real-time operating system (RTOS) - A real-time operating system is an OS intended to serve real-time application requests. It must be able to process data as it comes in, typically without buffering delays.

    2. Single-user, single task - As the name implies, this operating system is designed to manage the computer so that one user can effectively do one thing at a time, e.g. Palm OS.

    3. Single-user, multi-tasking - Operating systems that will let a single user have several programs in operation at the same time. This is the type of operating system most people use on their desktop and laptop computers today, e.g. Microsoft's Windows and Apple's MacOS.

    4. Multi-user - A multi-user operating system allows many different users to take advantage of the computer's resources simultaneously. The operating system must make sure that the requirements of the various users are balanced, and that each of the programs they are using has sufficient and separate resources so that a problem with one user doesn't affect the entire community of users, e.g. Unix, VMS, and mainframe operating systems.
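The illusion of simultaneous execution mentioned at the start of this section comes from the scheduler giving each program a short turn and then moving on. The sketch below models that with round-robin scheduling, one simple policy among many; the programs are modelled as Python generators, and all names are illustrative assumptions rather than any real OS interface.

```python
from collections import deque


def program(name, steps):
    """A stand-in for a running program: each step is one turn on the CPU."""
    for i in range(1, steps + 1):
        yield f"{name}: step {i}"


def round_robin(programs):
    """Run each program for one step in turn until all of them have finished."""
    ready = deque(programs)
    trace = []
    while ready:
        current = ready.popleft()
        try:
            trace.append(next(current))  # let the program run for one turn
            ready.append(current)        # then send it to the back of the line
        except StopIteration:
            pass                         # program finished; drop it
    return trace


if __name__ == "__main__":
    for line in round_robin([program("A", 2), program("B", 3)]):
        print(line)
```

The output interleaves the steps of A and B even though only one of them runs at any instant, which is what a user perceives as multitasking.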

    In this book, we will only discuss Real-Time Operating Systems in detail.

    1.3 Microprocessor Design/Real-Time Operating System

    1.3.1 Real-Time Operating System

    A Real-Time Operating System (RTOS) is a multitasking operating system intended for serving real-time application requests. It must be able to process data as it comes in, typically without buffering delays. RTOSs are found in products all around us, ranging from military and consumer to scientific applications. An RTOS is the operating system used in many embedded systems, because it allows real-time applications to be designed and expanded more easily while still meeting the required performance.

    The name comprises two parts, namely Real-Time and Operating System. Real-Time indicates that the operating system must respond within a definite time for the critical operations that it performs, along with high reliability. An RTOS is therefore an operating system that supports real-time applications and embedded systems by providing logically correct results within the required deadline. Such capabilities define its deterministic timing behavior and limited resource utilization.

    Classification of RTOS

    RTOSs are broadly classified into three types:

    1. Hard real-time: The degree of tolerance for missed deadlines is extremely small or zero. A missed deadline has catastrophic results for the system.

    2. Firm real-time: Missing a deadline might result in an unacceptable quality reduction.

    3. Soft real-time: The deadlines may be missed and can be recovered from. Reduction in system quality is acceptable.

    Features of RTOS

    A basic RTOS will be equipped with the following features:

    Multitasking and Preemptibility

    An RTOS must be multi-tasked and preemptible to support multiple tasks in real-time applications. The scheduler should be able to preempt any task in the system and allocate the resource to the task that needs it most, even at peak load.

    Task Priority

    Preemption requires the capability to identify the task that needs a resource the most and to give that task control of the resource. In an RTOS, this capability is achieved by assigning each task an appropriate priority level.
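Choosing the task that "needs a resource the most" reduces to picking the highest-priority task among those ready to run. The sketch below keeps the ready tasks in a heap; the class, the task names, and the convention that a larger number means a higher priority are all assumptions made for illustration.

```python
import heapq


class ReadyQueue:
    """Ready tasks ordered so the highest priority is always retrieved first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker: equal priorities run first-come, first-served

    def add(self, priority, name):
        # heapq is a min-heap, so the priority is negated to get max-first order.
        heapq.heappush(self._heap, (-priority, self._counter, name))
        self._counter += 1

    def pop_highest(self):
        """Remove and return (name, priority) of the highest-priority ready task."""
        neg_priority, _, name = heapq.heappop(self._heap)
        return name, -neg_priority

    def should_preempt(self, running_priority):
        """True if some ready task outranks the task that is currently running."""
        return bool(self._heap) and -self._heap[0][0] > running_priority


if __name__ == "__main__":
    ready = ReadyQueue()
    ready.add(1, "logger")
    ready.add(5, "brake_control")
    ready.add(3, "display")
    print(ready.should_preempt(running_priority=2))
    print(ready.pop_highest())
```

A running task of priority 2 would be preempted here, because a ready task of priority 5 outranks it.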

    Reliable and Sufficient Inter-Task Communication Mechanism

    For multiple tasks to communicate in a timely manner and to ensure data integrity among each other, reliable and sufficient inter-task communication and synchronization mechanisms are required.
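A message queue is one common form of inter-task communication. The sketch below uses Python's standard `queue.Queue` and `threading.Thread` as stand-ins for RTOS tasks and queues; Python offers no real-time guarantees, so this shows only the communication pattern, and the task functions and the `None` end-of-stream marker are illustrative choices.

```python
import queue
import threading


def producer(q, readings):
    """Sending task: posts each reading, then a None marker to signal the end."""
    for value in readings:
        q.put(value)      # blocks while the bounded queue is full
    q.put(None)


def consumer(q, received):
    """Receiving task: takes messages in arrival order until the end marker."""
    while True:
        item = q.get()    # blocks until a message is available
        if item is None:
            break
        received.append(item)


def run_demo(readings):
    """Run one producer and one consumer and return what the consumer received."""
    q = queue.Queue(maxsize=2)  # small bound, so the producer must sometimes wait
    received = []
    tasks = [
        threading.Thread(target=producer, args=(q, readings)),
        threading.Thread(target=consumer, args=(q, received)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return received


if __name__ == "__main__":
    print(run_demo([10, 20, 30, 40]))
```

The queue handles both concerns from the paragraph above: the blocking `put` and `get` calls synchronize the two tasks, and because the queue is the only shared object that both tasks touch while running, neither task ever sees a half-written message.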

    Priority Inheritance

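    To allow applications with stringent priority requirements to be implemented, an RTOS must have a sufficient number of priority levels when using priority scheduling. Priority inheritance itself is the mechanism by which a low-priority task that holds a resource temporarily takes on the priority of a higher-priority task waiting for that resource, which limits priority inversion.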

    Predefined Short Latencies

    An RTOS needs to have short, accurately defined timing for its system calls.

    Control of Memory Management

    To ensure a predictable response to an interrupt, an RTOS should provide a way for a task to lock its code and data into real memory.

    RTOS Architecture

    The architecture of an RTOS is dependent on the complexity of its deployment. Good RTOSs are scalable to meet different sets of requirements for different applications. For simple applications, an RTOS usually comprises only a kernel. For more complex embedded systems, an RTOS can be a combination of various modules, including the kernel, networking protocol stacks, and other components.

    1.3.2 RTOS vs. General Purpose OS

    Many non-real-time operating systems also provide similar kernel services. The key difference between general-computing operating systems and real-time operating systems is the need for deterministic timing behavior in the real-time operating systems. Formally, deterministic timing means that operating system services consume only known and expected amounts of time. In theory, these service times could be expressed as mathematical formulas. These formulas must be strictly algebraic and not include any random timing components. Random elements in service times could cause random delays in application software and could then make the application randomly miss real-time deadlines, a scenario clearly unacceptable for a real-time embedded system.

    General-computing non-real-time operating systems are often quite non-deterministic. Their services can inject random delays into application software and thus cause slow responsiveness of an application at unexpected times. If you ask the developer of a non-real-time operating system for the algebraic formula describing the timing behavior of one of its services (such as sending a message from task to task), you will invariably not get an algebraic formula. Instead, the developer of the non-real-time operating system (such as Windows, Unix, or Linux) will just give you a puzzled look. Deterministic timing behavior was simply not a design goal for these general-computing operating systems.

    On the other hand, real-time operating systems often go a step beyond basic determinism. For most kernel services, these operating systems offer constant load-independent timing.
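One way a kernel service can have load-independent timing is to avoid searching through tasks at all. The sketch below shows a technique often used for this purpose: a bitmap with one bit per priority level, where the highest ready priority is simply the position of the highest set bit. The class and the limit of 32 levels are assumptions for illustration, and Python itself gives no timing guarantees; on real hardware the lookup would typically map to a single bit-scan or count-leading-zeros instruction.

```python
NUM_PRIORITIES = 32  # assumed number of priority levels, one bit each


class BitmapReadySet:
    """Tracks which priority levels have a ready task, one bit per level."""

    def __init__(self):
        self.bitmap = 0

    def mark_ready(self, priority):
        if not 0 <= priority < NUM_PRIORITIES:
            raise ValueError("priority out of range")
        self.bitmap |= 1 << priority

    def mark_not_ready(self, priority):
        self.bitmap &= ~(1 << priority)

    def highest_ready(self):
        """Highest ready priority, found without scanning any task list."""
        if self.bitmap == 0:
            return None
        # Position of the most significant set bit.
        return self.bitmap.bit_length() - 1


if __name__ == "__main__":
    ready = BitmapReadySet()
    for p in (3, 17, 5):
        ready.mark_ready(p)
    print(ready.highest_ready())
    ready.mark_not_ready(17)
    print(ready.highest_ready())
```

The lookup does the same amount of work whether one priority level is ready or all 32 are, which is the property the text calls constant load-independent timing. A scan over a list of tasks, by contrast, would take longer as the list grows.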

    1.3.3 Kernel in RTOS

    Kernel: the part of an operating system that provides the most basic services to application software running on a processor.

    The kernel of a real-time operating system (RTOS) provides an abstraction layer that hides from application software the hardware details of the processor (or set of processors) upon which the application software will run. In providing this abstraction layer, the RTOS kernel supplies five main categories of basic services to application software.

    1. The most basic category of kernel services, at the very center, is Task Management.

    2. The second category of kernel services is Intertask Communication and Synchronization.

    3. Many RTOS kernels provide Dynamic Memory Allocation services.

    4. Many RTOS kernels also provide a Device I/O Supervisor category of services.

    5. In addition to kernel services, many RTOSs offer a number of optional add-on operating system components for such high-level services as file system organization, network communication, network management, database management, user-interface graphics, etc.

    Functions

    Task Scheduling

    Most RTOSs schedule tasks using a scheme called priority-based preemptive scheduling. Each task is assigned a priority, and at any point in time the scheduler runs the highest-priority task that is ready to run. Every task runs to completion unless it is preempted. The scheduler is responsible for time-sharing the CPU among tasks. Each time the priority-based preemptive scheduler is alerted by an external world trigger (such as a switch closing) or a software trigger (such as a message arrival), it must go through the following 5 steps:

    1. Determine whether the currently running task should continue to run.

    2. If not, determine which task should run next.

    3. Save the environment of the task that was stopped (so it can continue later).

    4. Set up the running environment of the task that will run next.

    5. Allow this task to run.

    These 5 steps together are called task switching.
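
    The five steps above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular RTOS is implemented: the names `Task`, `Scheduler`, `make_ready` and `finish` are hypothetical, a dictionary stands in for the saved processor environment, and a lower number is assumed to mean a higher priority.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Task:
    priority: int                      # assumption: lower number = higher priority
    name: str = field(compare=False)
    # Stand-in for the saved registers and stack of a real task.
    context: dict = field(default_factory=dict, compare=False)


class Scheduler:
    def __init__(self):
        self.ready = []      # min-heap of tasks that are ready to run
        self.running = None  # task currently holding the CPU
        self.log = []        # order in which tasks were given the CPU

    def make_ready(self, task):
        """A hardware or software trigger has made `task` ready to run."""
        heapq.heappush(self.ready, task)
        self._task_switch()

    def finish(self):
        """The running task ran to completion."""
        self.running = None
        self._task_switch()

    def _task_switch(self):
        if not self.ready:
            return
        # Step 1: should the currently running task continue to run?
        if self.running is not None and self.running.priority <= self.ready[0].priority:
            return
        # Step 2: determine which task should run next.
        next_task = heapq.heappop(self.ready)
        # Step 3: save the environment of the task that was stopped.
        if self.running is not None:
            self.running.context["suspended"] = True
            heapq.heappush(self.ready, self.running)
        # Step 4: set up the running environment of the next task.
        next_task.context["suspended"] = False
        # Step 5: allow this task to run.
        self.running = next_task
        self.log.append(next_task.name)
```

    In this model, making a higher-priority task ready while a lower-priority task is running preempts the running task immediately, and the preempted task goes back on the ready heap until it has the highest priority again.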

    Fixed Time Task Switching

    The time it takes to do task switching is of interest when evaluating an operating system. A simple general-computing (non-preemptive) operating system might do task switching only at timer tick times, which might for example be ten milliseconds apart. Then if the need for a task switch arises anywhere within a 10-millisecond timeframe, the actual task switch would occur only at the end of the current 10-millisecond period. Such a delay would be unacceptable in most real-time embedded systems.

    In fact, the term "real-time" does not mean "as fast as possible"; rather, real-time demands consistent, repeatable, known timing performance. Although a non-real-time operating system might do some faster task switching for small numbers of tasks, it might equally well introduce a long time delay the next time it does the same task switch. The strength of a real-time operating system is in its known, repeatable timing performance, which is also typically faster than that of a non-deterministic task scheduler in situations of large numbers of tasks in a software system. Most often, the real-time operating system will exhibit task-switching times much faster than its non-real-time competitor when the number of tasks grows above 5 or 10.
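
    As a small worked example of the tick-based delay described above, the following sketch computes how long a switch request waits for the next timer tick. The function name `switch_delay_ms` is hypothetical, and the 10-millisecond default simply mirrors the example figure in the text.

```python
import math


def switch_delay_ms(request_time_ms: float, tick_ms: float = 10.0) -> float:
    """Time a task-switch request waits if switches happen only on timer ticks."""
    # The switch is deferred to the first tick at or after the request.
    next_tick = math.ceil(request_time_ms / tick_ms) * tick_ms
    return next_tick - request_time_ms
```

    A request arriving 3 ms into a tick period waits 7 ms, while one arriving just after a tick waits almost the full 10 ms, so the delay depends on when the request happens to arrive rather than on anything the application controls.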

    Intertask Communication And Synchronization

    Most operating systems, including RTOSs, offer a variety of mechanisms for communication and synchronization between tasks. These mechanisms are necessary in a preemptive environment of many tasks, because without them the tasks might well communicate corrupted information or otherwise interfere with each other.

    For instance, a task might be preempted when it is in the middle of updating a table of data. If a second task that preempts it reads from that table, it will read a combination of some areas of newly-updated data plus some areas of data that have not yet been updated. These updated and old data areas together may be incorrect in combination, or may not even make sense. An RTOS's mechanisms for communication and synchronization between tasks are provided to avoid these kinds of errors. Most RTOSs provide several mechanisms, with each mechanism optimized for reliably passing a different kind of information from task to task. Probably the most popular kind of communication between tasks in embedded systems is the passing of data from one task to another. Most RTOSs offer a message passing mechanism for doing this. Each message can contain an array or buffer of data.
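
    Message passing between two tasks can be illustrated with Python threads and the standard `queue.Queue` class, which here stands in for an RTOS message queue. The task names and message fields are assumptions made for the example. Because each message is a complete, self-consistent buffer, the receiving task never sees a half-updated table.

```python
import queue
import threading


def sensor_task(outbox: queue.Queue) -> None:
    """Producer task: sends one complete message per reading."""
    for reading in (20, 21, 23):
        # Both fields travel together, so they can never be out of step.
        outbox.put({"temp_c": reading, "temp_f": reading * 9 / 5 + 32})
    outbox.put(None)  # sentinel: no more messages


def display_task(inbox: queue.Queue, shown: list) -> None:
    """Consumer task: blocks until a message arrives, then records it."""
    while (message := inbox.get()) is not None:
        shown.append(message)


def run_demo() -> list:
    mailbox = queue.Queue()
    shown = []
    tasks = [
        threading.Thread(target=sensor_task, args=(mailbox,)),
        threading.Thread(target=display_task, args=(mailbox, shown)),
    ]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return shown
```

    Calling `run_demo()` returns the messages in the order they were sent, regardless of how the two threads are interleaved.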

    Deterministic and High-Speed Message Passing

    Intertask message communication is another area where different operating systems show different timing characteristics. Most operating systems actually copy messages twice as they transfer them from task to task via a message queue. The first copy is from the message-sender task to an operating-system-owned secret area of RAM, and the second copy is from the operating system's secret RAM area to the message-receiver task. Clearly this is non-deterministic in its timing, as these copying activities take longer as message length increases.

    An approach that avoids this non-determinism, and also accelerates performance, is to have the operating system copy a pointer to the message and deliver that pointer to the message-receiver task without moving the message contents at all. In order to avoid access collisions, the operating system then needs to go back to the message-sender task and obliterate its copy of the pointer to the message. For large messages, this eliminates the need for lengthy copying and eliminates non-determinism.
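
    The difference between the two approaches can be modeled as follows. This is a sketch only: Python object references stand in for pointers, and `MessageHandle`, `send_by_copy`, `receive_by_copy` and `send_by_pointer` are hypothetical names, not the API of any real RTOS.

```python
from collections import deque


class MessageHandle:
    """Sender-side variable holding the 'pointer' to a message buffer."""

    def __init__(self, buf: bytearray):
        self.buf = buf


def send_by_copy(msg_queue: deque, buf: bytearray) -> None:
    # First copy: sender's buffer -> operating-system-owned area.
    msg_queue.append(bytes(buf))  # cost grows with len(buf)


def receive_by_copy(msg_queue: deque) -> bytearray:
    # Second copy: operating-system-owned area -> receiver's buffer.
    return bytearray(msg_queue.popleft())  # cost grows with len(buf)


def send_by_pointer(msg_queue: deque, handle: MessageHandle) -> None:
    # Only the reference is queued; the cost does not depend on message length.
    msg_queue.append(handle.buf)
    # Obliterate the sender's copy of the pointer to avoid access collisions.
    handle.buf = None
```

    After `send_by_pointer`, the receiver obtains the very same buffer object with `msg_queue.popleft()`, and the sender can no longer reach it through its handle.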

    Dynamic Memory Allocation

    Dynamic memory allocation is when an executing program requests that the operating system give it a block of main memory. The program then uses this memory for some purpose. Determinism of service times is also an issue in the area of dynamic allocation of RAM. Many general-computing non-real-time operating systems offer memory allocation services from what is termed a "Heap". Heaps suffer from a phenomenon called "External Memory Fragmentation" that may cause the heap services to degrade. External fragmentation arises when free memory is separated into small blocks and is interspersed with allocated memory. It is a weakness of certain storage allocation algorithms, which fail to arrange the memory used by programs efficiently. The result is that, although free storage is available, it is effectively unusable because it is divided into pieces that are individually too small to satisfy the demands of the application.

    This fragmentation problem can be solved by garbage collection (defragmentation) software. Unfortunately, garbage collection algorithms are often wildly non-deterministic, injecting randomly-appearing, random-duration delays into heap services. These are often seen in the memory allocation services of general-computing non-real-time operating systems.

    Real-time operating systems solve this problem of delay by altogether avoiding both memory fragmentation and garbage collection, and their consequences. RTOSs offer non-fragmenting memory allocation techniques instead of heaps. They do this by limiting the variety of memory chunk sizes they make available to application software. While this approach is less flexible than the approach taken by memory heaps, it does avoid external memory fragmentation and the need for defragmentation.
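
    One common way to limit the variety of chunk sizes is a pool of fixed-size blocks. The sketch below assumes a single block size per pool; the class name `BlockPool` and its methods are hypothetical and chosen only for illustration. Allocation and release each take constant time, and because every block has the same size, any free block can satisfy any request, so external fragmentation cannot occur.

```python
class BlockPool:
    """A pool of equally sized memory blocks with O(1) alloc and free."""

    def __init__(self, block_size: int, block_count: int):
        self.block_size = block_size
        self._memory = bytearray(block_size * block_count)
        self._free = list(range(block_count))  # indices of free blocks

    def alloc(self):
        """Return the index of a free block, or None if the pool is exhausted."""
        return self._free.pop() if self._free else None

    def free(self, index: int) -> None:
        """Return a block to the pool (no double-free check in this sketch)."""
        self._free.append(index)

    def view(self, index: int) -> memoryview:
        """Writable view of the bytes belonging to block `index`."""
        start = index * self.block_size
        return memoryview(self._memory)[start:start + self.block_size]
```

    The price of this determinism is internal waste: a request for 5 bytes from a pool of 32-byte blocks still consumes a whole block, which is why real systems often provide several pools with different block sizes.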

    1.4 Microprocessor Design/Embedded System

    1.4.1 Embedded System

    An embedded system is a system that has software embedded into computer hardware, which makes the system dedicated to an application, to a specific part of an application or product, or to part of a larger system. It is a microprocessor-based control system which processes a fixed set of programmed instructions to control electromechanical equipment which may be part of an even larger system. Embedded systems are electronic systems that contain a microprocessor or a microcontroller, but we do not think of them as computers: the computer is hidden, or embedded, in the system.

    Physically, embedded systems range from portable devices such as digital watches and MP3 players, to large stationary installations like traffic lights and factory controllers, and to large, complex systems like hybrid vehicles, MRI, and avionics. Complexity varies from low, with a single microcontroller chip, to very high, with multiple units, peripherals and networks mounted inside a large chassis or enclosure.

    Three main embedded components are:

    1. It embeds hardware to give computer-like functionality.

    2. It embeds the main application software, generally in flash or ROM, and the application software performs a number of tasks concurrently.

    3. It embeds a real-time operating system (RTOS), which supervises the application software tasks running on the hardware and organizes access to system resources according to the priorities and timing constraints of the tasks in the system.

    1.4.2 Characteristics

    The main characteristics of an embedded system are:

    1. Dedicated functions

    2. Dedicated complex algorithms

    3. Dedicated GUIs and other user interfaces for the application

    4. Real time operations

    5. Multi-rate operations

    1.4.3 Real-Time Embedded System

    Real-time embedded systems are defined as those systems in which the correctness of the system depends not only on the logical result of computation, but also on the time at which the results are produced.

    Real-time embedded systems are used in many applications such as airborne computers, medical instruments and communication systems. Embedded systems are characterized by limited processor memory, limited processing power, and unusual interfaces to the outside world. Real-time requirements impose stringent time deadlines for delivering the results of embedded processing.

    RTOS kernels hide from application software the low-level details of system hardware, and at the same time provide several categories of services to application software. These include: task management with priority-based preemptive scheduling, reliable intertask communication and synchronization, non-fragmenting dynamic memory allocation, and basic timer services.

    The issue of timing determinism is important in differentiating general-computing operating systems from real-time operating systems. This issue crops up in many parts of operating system kernels, such as task schedulers, dynamic memory allocation and intertask message communication. While general-computing operating systems often offer non-deterministic services in these areas, fully deterministic solutions are needed for real-time and embedded systems. A number of real-time operating systems implement these solutions in their compact high-performance kernels.

    1.5 Microprocessor Design/Computer Architecture

    1.5.1 Von Neumann Architecture

    Early computer programs were hard wired. To reprogram a computer meant changing the hardware switches manually, which took a long time and was prone to errors. Computer memory was only used for storing data.

    John von Neumann suggested that data and programs should be stored together in memory. This is now called Von Neumann architecture. Programs are fetched from memory for execution by a central unit that we call the CPU. Basically, programs and data are represented in memory in the same way. The program is just data encoded with special meaning. The main criticism of this approach is that security problems can arise when instructions can be manipulated as if they were data, and vice-versa.

    A Von Neumann microprocessor is a processor that follows this pattern:

    Fetch: An instruction and the necessary data are obtained from memory.

    Decode: The instruction and data are separated, and the components and pathways required to execute the instruction are activated.

    Execute: The instruction is performed, the data is manipulated, and the results are stored.

    This pattern is typically implemented by separating the task into two components: the control and the datapath.

    Control

    The control unit reads the instruction and activates the appropriate parts of the datapath.

    Datapath

    The datapath is the pathway that the data takes through the microprocessor. As the data travels to different parts of the datapath, the command signals from the control unit cause the data to be manipulated in specific ways, according to the instruction. The datapath consists of the circuitry for transforming data and for storing temporary data. It contains ALUs capable of transforming data through operations such as addition, subtraction, logical AND, OR, inverting, and shifting.

    We discuss the control and datapath in far more detail in a later section, Control and Datapath.
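
    The fetch, decode, execute pattern and the control/datapath split can be sketched with a toy machine. Everything about this machine is an assumption made for illustration: the 16-bit instruction format (4-bit opcode, 4-bit register number, 8-bit operand), the five opcodes, and the four registers do not correspond to any real processor. Program and data share one memory, as in a Von Neumann design.

```python
LOAD, ADD, SUB, STORE, HALT = range(5)  # hypothetical opcodes


def encode(op: int, reg: int = 0, operand: int = 0) -> int:
    """Pack an instruction into one 16-bit word: op | reg | operand."""
    return (op << 12) | (reg << 8) | operand


# Datapath: the operations the ALU can apply to two register values.
ALU = {ADD: lambda a, b: a + b, SUB: lambda a, b: a - b}


def run(memory: list) -> list:
    regs = [0, 0, 0, 0]
    pc = 0
    while True:
        word = memory[pc]                      # fetch
        pc += 1
        op = word >> 12                        # decode: split the word
        reg = (word >> 8) & 0xF                #   into its fields
        operand = word & 0xFF
        if op == HALT:                         # control: choose what to activate
            return regs
        if op == LOAD:
            regs[reg] = memory[operand]        # execute: memory -> register
        elif op == STORE:
            memory[operand] = regs[reg]        # execute: register -> memory
        else:
            # execute: ALU operation on two registers, kept to 16 bits
            regs[reg] = ALU[op](regs[reg], regs[operand]) & 0xFFFF
```

    A short program that adds the values stored at addresses 6 and 7 and stores the sum at address 8 looks like this:

```python
memory = [
    encode(LOAD, 0, 6),   # r0 = memory[6]
    encode(LOAD, 1, 7),   # r1 = memory[7]
    encode(ADD, 0, 1),    # r0 = r0 + r1
    encode(STORE, 0, 8),  # memory[8] = r0
    encode(HALT),
    0,                    # unused
    40, 2, 0,             # data at addresses 6, 7, 8
]
run(memory)
```

    Note that nothing but the program counter distinguishes the instruction words from the data words, which is exactly the property the text describes.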

    1.5.2 Harvard Architecture

    In a Harvard Architecture machine, the computer system's memory is separated into two discrete parts: data and instructions. In a pure Harvard system, the two different memories occupy separate memory modules, and instructions can only be executed from the instruction memory.

    Many DSPs are modified Harvard architectures, designed to simultaneously access three distinct memory areas: the program instructions, the signal data samples, and the filter coefficients (often called the P, X, and Y memories). In theory, such three-way Harvard architectures can be three times as fast as a Von Neumann architecture that is forced to read the instruction, the data sample, and the filter coefficient one at a time.
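
    The "three times as fast" figure follows from simple counting. The sketch below is an idealized model, assuming that each filter tap needs exactly three memory accesses and that memory access is the only cost; the function name `memory_cycles` is hypothetical.

```python
def memory_cycles(taps: int, parallel_memories: int) -> int:
    """Memory cycles for `taps` filter taps, three accesses per tap."""
    accesses_per_tap = 3  # instruction, data sample, filter coefficient
    # With k independent memories, up to k accesses share one cycle.
    cycles_per_tap = -(-accesses_per_tap // parallel_memories)  # ceiling division
    return taps * cycles_per_tap
```

    For a 64-tap filter, `memory_cycles(64, 1)` gives 192 cycles for a single shared memory and `memory_cycles(64, 3)` gives 64 cycles for three independent memories, a ratio of three. Real processors rarely reach this bound, since caches, pipelining and non-memory work also affect throughput.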

    1.5.3 Modern Computers

    Modern desktop computers, especially computers based on the Intel x86 ISA, are not Harvard computers, although the newer variants have features that are Harvard-like. All information, program instructions, and data are stored in the same RAM areas. However, a modern feature called paging allows the physical memory to be segmented into large blocks of memory called pages. Each page of memory can either be instructions or data, but not both.

    Modern embedded computers, however, are typically based on a Harvard architecture. Instructions are stored in a different addressable memory block than the data is, and there is no way for the microprocessor to interchange data and instructions.

    1.5.4 RISC and CISC and DSP

    Historically, the first type of ISA was the complex instruction set computer (CISC), and the second type was the reduced instruction set computer (RISC). It is a common misunderstanding that RISC systems typically have a small ISA (fewer instructions) but make up for it with faster hardware. RISC systems actually have "reduced instructions", in the sense that each instruction does so little that it takes very little time to execute it. It is a common misunderstanding that CISC systems have more instructions, but typically pay a steep performance penalty for the added versatility. CISC systems actually have "complex instructions", in the sense that at least one instruction takes a long time to execute -- for example, the "double indirect" addressing mode inherently requires two memory cycles to execute, and a few CPUs have a "string copy" instruction that may require hundreds of memory cycles to execute. MIPS and SPARC are examples of RISC computers. Intel x86 is an example of a CISC computer.

    Some people group stack machines with the RISC machines; others group stack machines with the CISC machines; some people describe stack machines as neither RISC nor CISC.

    Other ISA types include DSPs, stack machines, VLIW machines, MISC machines, TTA architectures, massively parallel processor arrays, etc.

    We will discuss these terms and concepts in more detail later.

    1.5.5 Microprocessor Components

    Some of the common components of a microprocessor are:

    Control Unit

    I/O Units

    Arithmetic Logic Unit (ALU)

    Registers

    Cache

    A brief introduction to these components is placed below.

    Control Unit

    The control unit, as described above, reads the instructions and generates the necessary digital signals to operate the other components. An instruction to add two numbers together would cause the Control Unit to activate the addition module, for instance.

    I/O Units

    A processor needs to be able to communicate with the rest of the computer system. This communication occurs through the I/O ports. The I/O ports will interface with the system memory (RAM), and also the other peripherals of a computer.

    Arithmetic Logic Unit

    The Arithmetic Logic Unit, or ALU, is the part of the microprocessor that performs arithmetic and logical operations. ALUs can typically add, subtract, divide, multiply, and perform logical operations on two numbers (and, or, nor, not, etc).

    The ALU will be discussed in far more detail in a later chapter, ALU.

    Registers

    This book discusses several different kinds of registers. Hopefully it will be obvious from the context which kind of register we are talking about.

    The most general meaning is a "hardware register": anything that can be used to store bits of information, in a way that all the bits of the register can be written to or read out simultaneously. Since registers outside of a CPU are also outside the scope of the book, this book will only discuss processor registers, which are hardware registers that happen to be inside a CPU. But usually we will refer to a more specific kind of register.

    Registers are discussed in far more detail in a later chapter, Register File.

    programmer-visible registers

    The programmer-visible registers, also called the user-accessible registers, also called the architectural registers, often simply called "the registers", are the registers that are directly encoded as part of at least one instruction in the instruction set.

    The registers are the fastest accessible memory locations, and because they are so fast, there are typically very few of them. In most processors, there are 32 or fewer of these registers. The size of the registers defines the size of the computer. For instance, a "32 bit computer" has registers that are 32 bits long. The length of a register is known as the word length of the computer.

    There are several factors limiting the number of registers, including:

    It is very convenient for a new CPU to be software-compatible with an old CPU. This requires the new chip to have exactly the same number of programmer-visible registers as the old chip.

    Doubling the number of general-purpose registers requires adding another bit to each instruction field that selects a particular register. Each 3-operand instruction (one that specifies 2 source operands and a destination operand) would expand by 3 bits. Modern chip manufacturing processes could put a million registers on a chip; that would make each and every 3-operand instruction require 60 bits just to select the registers, not counting the bits required to specify what to do with those operands.

    Adding more registers adds more wires to the critical path, adding capacitance, which reduces the maximum clock speed of the CPU.

    Historically, CPUs were designed with few registers, because each additional register increased the cost of the CPU significantly. But now that modern chip manufacturing can put tens of millions of bits of storage on a single commodity CPU chip, this is less of an issue.
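
    The instruction-width arithmetic in the list above can be checked with a short calculation. The helper name `register_field_bits` is hypothetical; it assumes each register selector is a plain binary number wide enough to name every register.

```python
def register_field_bits(num_registers: int, operands: int = 3) -> int:
    """Bits needed in one instruction just to select its register operands."""
    # Smallest width that can encode register numbers 0 .. num_registers - 1.
    bits_per_operand = max(1, (num_registers - 1).bit_length())
    return bits_per_operand * operands
```

    With 32 registers, a 3-operand instruction spends 15 bits on register selection; doubling to 64 registers raises that to 18 bits, the 3-bit growth mentioned above. A million registers would need 20 bits per operand, or 60 bits per 3-operand instruction.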

    Microprocessors typically contain a large number of registers, but only a small number of them are accessible by the programmer. The registers that can be used by the programmer to store arbitrary data, as needed, are called general purpose registers. Registers that cannot be accessed by the programmer directly are known as reserved registers [citation needed].

    Some computers have highly specialized registers -- memory addresses always come from the program counter or the index register or the stack pointer; one ALU input is always hooked to data coming from memory, the other ALU input is always hooked to the accumulator; etc.

    Other computers have more general-purpose registers -- any instruction that accesses memory can use any address register as an index register or as a stack pointer; any instruction that uses the ALU can use any data register.

    Other computers have completely general-purpose registers -- any register can be used as data or an address in any instruction, without restriction.

    microarchitectural registers

    Besides the programmer-visible registers, all CPUs have other registers that are not programmer-visible, called microarchitectural registers or physical registers. These registers include:

    memory address register

    memory data register

    instruction register

    microinstruction register

    microprogram counter

    pipeline registers

    extra physical registers to support register renaming

    the prefetch input queue

    writable control stores (We will discuss the control store in the Microprocessor Design/Control Unit and Microprocessor Design/Microcode chapters.)

    Some people consider on-chip cache to be part of the microarchitectural registers; others consider it outside the CPU.

    There are a wide variety of ways to implement any one instruction set. The vast majority of these microarchitectural registers are technically not necessary. A designer could choose to design a CPU that had almost no physical registers other than the programmer-visible registers. However, many designers choose to design a CPU with lots of physical registers, using them in ways that make the CPU execute the same given instruction set much faster than a CPU that lacks those registers.

    Cache

    Most CPUs manufactured do not have any cache.

    Cache is memory that is located on the chip, but that is not considered registers. The cache is used because reading external memory is very slow (compared to the speed of the processor), and reading a local cache is much faster. In modern processors, the cache can take up as much as 50% or more of the total area of the chip. The following table shows the relationship between different types of memory:

    Cache typically comes in 2 or 3 "levels", depending on the chip. Level 1 (L1) cache is smaller and faster than Level 2 (L2) cache, which is larger and slower. Some chips have Level 3 (L3) cache as well, which is larger still than the L2 cache (although L3 cache is still much faster than external RAM).

    We discuss cache in far more detail in a later chapter, Cache.

    1.5.6 Endian

    Different computers order their multi-byte data words (i.e., 16-, 32-, or 64-bit words) in different ways in RAM. Each individual byte in a multi-byte word is still separately addressable. Some computers order their data with the most significant byte of a word in the lowest address, while others order their data with the most significant byte of a word in the highest address. There is logic behind both approaches, and this was formerly a topic of heated debate.

    This distinction is known as endianness. Computers that order data with the least significant byte in the lowest address are known as Little Endian, and computers that order the data with the most significant byte in the lowest address are known as Big Endian. It is easier for a human (typically a programmer) to view multi-word data dumped to a screen one byte at a time if it is ordered as Big Endian. However, it makes more sense to others to store the least significant (LS) data at the LS address.

    When using a computer this distinction is typically transparent; that is, the user cannot tell the difference between computers that use the different formats. However, difficulty arises when different types of computers attempt to communicate with one another over a network.

    With a big-endian 68K sort of machine,

    address increases > ------ >
    data : 74 65 73 74 00 00 00 05

    is the string "test" followed by the 32-bit integer 5. The little-endian x86 sort of machine would interpret the last part as the integer 0x0500_0000.

    When communicating over a network composed of both big-endian and little-endian machines, the network hardware (should) apply the Address Invariance principle, to avoid scrambling text (avoiding the NUXI problem). High-level software (should) format packets of data to be transmitted over the network in Network Byte Orde