CAO Notes by Girdhar Gopal Gautam

Uploaded by girdhar-gopal-gautam, 04-Apr-2018



Gopal Sharma
MVN Institute of Technology & Management
Branch: ECE (5th Sem)
Session: 2012

Computer Architecture and Organization

Mrs. Rama Pathak
Submitted by: Girdhar Gopal Gautam ([email protected])


Lecture 1:

Digital Logic, Boolean Algebra, Logic Gates, Truth Tables

Here we deal with the basic digital circuits of our computer: what hardware components we are using, how these hardware components are related to and interact with each other, and how this hardware is accessed or seen by the user.

This gives rise to the classification of our computer study into:

Computer Design: This is concerned with the hardware design of the computer. Here the designer decides on the specifications of the computer system.

Computer Organization: This is concerned with the way the hardware components operate and the way they are connected to form the computer system.

Computer Architecture: This is concerned with the structure and behavior of the computer as seen by the user. It includes the information formats, the instruction set and the addressing modes for accessing memory.

    In our course we will be dealing with computer architecture and organization.

Before starting with computer architecture and organization, let us discuss the components which make up the hardware, or the organization, of the computer: the digital circuits handled by a digital computer.

Digital Computers: imply that the computer deals with digital information.
Digital information: is represented by binary digits (0 and 1).
Gates: blocks of hardware that produce 1 or 0 when input logic requirements are satisfied.


Functions of gates can be described by: Truth Table, Boolean Function, Karnaugh Map.

Fig 1.1: A gate, with a binary digital input signal and a binary digital output signal.


    Boolean algebra

Algebra with binary (Boolean) variables and logic operations. Boolean algebra is useful in the analysis and synthesis of digital logic circuits:

- Input and output signals can be represented by Boolean variables, and
- the function of the digital logic circuit can be represented by logic operations, i.e., Boolean function(s).
- From a Boolean function, a logic diagram can be constructed using AND, OR, and NOT (inverter).

Note: We can have many circuits for the same Boolean expression.

Truth Table
The most elementary specification of the function of a digital logic circuit is the truth table.


A table that describes the output values for all combinations of the input values; the input combinations are called MINTERMS.

n input variables give 2^n minterms.
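As an illustration, the 2^n minterms of a Boolean function can be enumerated in Python. This is a sketch: the helper name truth_table and the sample function F are illustrative, not from the notes.

```python
from itertools import product

def truth_table(f, n):
    """Return the truth table of an n-input Boolean function f.
    Each input combination is one minterm; n variables give 2**n rows."""
    rows = []
    for bits in product((0, 1), repeat=n):   # all 2**n minterms, in order
        rows.append((bits, f(*bits)))
    return rows

# Sample Boolean function: F(x, y, z) = x AND (y OR NOT z)
f = lambda x, y, z: x & (y | (1 - z))

for bits, out in truth_table(f, 3):
    print(*bits, '|', out)
```

The loop produces 2^3 = 8 rows, one per minterm, exactly as the definition above states.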

Summary:
o Computer Design: what hardware components we need.
o Computer Organization: how these hardware components interact.
o Computer Architecture: how these are connected with the user.
o Logic Gates: blocks of hardware giving a result of 0 or 1. There are 8 basic logic gates, out of which 3 (AND, OR and NOT) are fundamental.
o Boolean Algebra: the representation of input and output signals in the form of expressions.
o Truth Table: a table that describes the output values for all the combinations of the input values.

    Lecture 2:

Combinational Logic Blocks: Multiplexers, Adders, Encoders, Decoders

Combinational circuits are circuits without memory, where the outputs are obtained from the inputs only. An n-input m-output combinational circuit has the form shown below.

A multiplexer is a combinational circuit which selects one of many inputs depending on the selection inputs. The number of selection inputs depends on the number of inputs in the manner 2^x = y: if y is the number of inputs, then x is the number of selection lines. Thus if we have 4 input lines we use 2 selection lines, as 2^2 = 4, and so on. This is called a 4:1 (or 4x1) multiplexer.

This is shown in the diagram:

Fig: A combinational circuit block with n inputs and m outputs.
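The 2^x = y selection rule above can be sketched behaviourally in Python for the 4:1 case (the function name mux4 is illustrative):

```python
def mux4(inputs, s1, s0):
    """4-to-1 multiplexer: two select lines (s1, s0) address one of
    the four inputs I0..I3, since 2**2 = 4."""
    return inputs[(s1 << 1) | s0]   # select lines form a 2-bit index

data = [10, 20, 30, 40]   # I0..I3
print(mux4(data, 0, 0))   # S1 S0 = 0 0 selects I0 -> 10
print(mux4(data, 1, 0))   # S1 S0 = 1 0 selects I2 -> 30
```

A real multiplexer does this with AND-OR gating; the list index here plays the role of the select decoding.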


Adders: Half Adder, Full Adder

Half Adder: adds 2 bits and gives out carry and sum as result.

4-to-1 Multiplexer (inputs I0-I3, select lines S1 S0, output Y):

Select S1 S0 | Output Y
0 0 | I0
0 1 | I1
1 0 | I2
1 1 | I3


    Full Adder: Adds 2 bits with carry in and gives carry out and sum as result.

(Half adder, truth table and logic, from the previous page:)

x y | c s
0 0 | 0 0
0 1 | 0 1
1 0 | 0 1
1 1 | 1 0

c = xy
s = x'y + xy' = x XOR y

(Full adder, truth table and logic:)

x y Cin | Cout S
0 0 0 | 0 0
0 0 1 | 0 1
0 1 0 | 0 1
0 1 1 | 1 0
1 0 0 | 0 1
1 0 1 | 1 0
1 1 0 | 1 0
1 1 1 | 1 1

Cout = xy + xCin + yCin = xy + (x XOR y)Cin
S = x'y'Cin + x'yCin' + xy'Cin' + xyCin = x XOR y XOR Cin
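The two adders can be sketched in Python, with the full adder built from two half adders as in the logic above (function names are illustrative):

```python
def half_adder(x, y):
    """Adds 2 bits: sum s = x XOR y, carry c = x AND y."""
    return x ^ y, x & y

def full_adder(x, y, cin):
    """Adds 2 bits plus carry-in, built from two half adders."""
    s1, c1 = half_adder(x, y)       # s1 = x^y, c1 = xy
    s, c2 = half_adder(s1, cin)     # s = x^y^cin, c2 = (x^y)cin
    return s, c1 | c2               # cout = xy + (x^y)cin

# Check every row of the truth tables against ordinary addition
for x in (0, 1):
    for y in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(x, y, cin)
            assert (cout << 1) | s == x + y + cin
```

The exhaustive loop is exactly the 8-row full adder truth table.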


Decoder: a decoder takes n inputs and gives 2^n outputs. That is, we get 8 outputs for 3 inputs, and this is called a 3x8 decoder. We also have 2x4 decoders, 4x16 decoders and so on.

    We are implementing a decoder with the help of NAND gates.

    Using NAND gates, it becomes more economical.
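A NAND-built decoder has active-low outputs: the selected line goes to 0 while all others stay at 1. A behavioural Python sketch (it models the function, not the gate-by-gate wiring; the name is illustrative):

```python
def decoder3to8_nand(a2, a1, a0):
    """Behavioural model of a 3-to-8 decoder built from NAND gates.
    NAND outputs are active-low: the addressed line is 0, the rest 1."""
    sel = (a2 << 2) | (a1 << 1) | a0   # which of the 2**3 lines is addressed
    return [0 if i == sel else 1 for i in range(8)]

print(decoder3to8_nand(1, 0, 1))  # line 5 goes low
```

With AND gates instead of NAND, the same selection logic would give active-high outputs.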


Summary:
Combinational circuits: circuits where the outputs are obtained from the inputs only. Various combinational circuits are:
o Multiplexers: the number of selection inputs depends on the number of inputs in the manner 2^x = y.
o Half Adder: adds 2 bits and gives the result as carry and sum.
o Full Adder: adds 2 bits with carry in and gives the result as carry out and sum.
o Encoder: takes 2^n inputs and gives n outputs.
o Decoder: takes n inputs and gives 2^n outputs.

Important Questions derived from this:
Q1. What is the difference between a multiplexer and a decoder?
Q2. Draw a 4x1 decoder with the help of AND gates.


    Lecture 3:

Sequential Logic Blocks: Latches, Flip Flops, Registers, Counters

Sequential logic blocks: logic blocks whose output logic value depends on the input values and the state of the block.

Here we have the concept of memory, which was not applicable for combinational circuits.

    The various sequential blocks or circuits are:

Latches: A latch is a kind of bistable multivibrator, an electronic circuit which has two stable states and thereby can store one bit of information. Today the word is mainly used for simple transparent storage elements, while slightly more advanced non-transparent (or clocked) devices are described as flip-flops. Informally, as this distinction is quite new, the two words are sometimes used interchangeably.

    S-R latch:

To overcome the restricted combination, one can add gates to the inputs that would convert (S,R) = (1,1) into one of the non-restricted combinations. That can be:
Q = 1, i.e. (1,0), referred to as an S-latch
Q = 0, i.e. (0,1), referred to as an R-latch


Keep state, i.e. (0,0), referred to as an E-latch

D-LATCH
Forbidden input values are forced not to occur by using an inverter between the inputs.

    Flip Flops:

    D flip flop:

D flip flop (inputs D (data) and E (enable); outputs Q and Q'):

Characteristic table:
D | Q(t+1)
0 | 0
1 | 1


If you compare the D flip flop and the D latch, the only difference you find in the circuit is that latches do not have clocks and flip flops have them.

So you can note down the differences between latches and flip flops as:
- A latch is a level triggered device, whereas a flip flop is an edge triggered device.
- The output of a latch changes independent of a clock signal, whereas the output of a flip flop changes at specific times determined by a clocking signal.
- Latches do not require clock pulses; flip flops are clocked devices.
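The level-triggered vs. edge-triggered distinction can be sketched behaviourally in Python (class names are illustrative):

```python
class DLatch:
    """Level-triggered: output follows D for as long as enable E is high."""
    def __init__(self):
        self.q = 0
    def update(self, d, e):
        if e:                 # transparent while E = 1
            self.q = d
        return self.q

class DFlipFlop:
    """Edge-triggered: output samples D only on a rising clock edge."""
    def __init__(self):
        self.q = 0
        self._prev_clk = 0
    def update(self, d, clk):
        if clk and not self._prev_clk:   # rising edge detected
            self.q = d
        self._prev_clk = clk
        return self.q

latch, ff = DLatch(), DFlipFlop()
latch.update(1, 1); latch.update(0, 1)   # q tracks every change of D while E=1
ff.update(1, 1); ff.update(0, 1)         # q changed only on the first rising edge
```

With the clock held high, the latch keeps following D but the flip flop ignores it until the next rising edge.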

Characteristics:
- Edge-triggered flip flops (positive): state transition occurs at the rising edge or falling edge of the clock pulse; they respond to the input only at this time.
- Latches: respond to the input during the whole period in which they are enabled.


    Counters: A counter is a device which stores (and sometimes displays) the number of times a particular event or process has occurred, often in relationship to a clock signal.

    4 bit binary counter:

    RING COUNTER:

In a ring counter the output of the 1st flip flop is moved to the input of the 2nd flip flop, and the output of the last flip flop is fed back to the first.

Fig: 4-bit binary counter built from JK flip flops (inputs: Clock, Counter Enable; outputs A0 A1 A2 A3 and Output Carry).


JOHNSON COUNTER:

In a Johnson counter the output of the last flip flop is inverted and given to the first flip flop.
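The two feedback arrangements can be sketched as one-clock step functions in Python (names are illustrative):

```python
def ring_counter_step(state):
    """Ring counter: each flip flop feeds the next; the last output
    wraps around, uninverted, to the first flip flop."""
    return [state[-1]] + state[:-1]

def johnson_counter_step(state):
    """Johnson counter: the last output is INVERTED before it feeds
    the first flip flop."""
    return [1 - state[-1]] + state[:-1]

state = [1, 0, 0, 0]           # ring counter circulating a single 1
for _ in range(4):
    state = ring_counter_step(state)
# state is back to [1, 0, 0, 0]: an n-stage ring counter has period n

j = [0, 0, 0, 0]               # an n-stage Johnson counter has period 2n
for _ in range(4):
    j = johnson_counter_step(j)
# j is now [1, 1, 1, 1]; four more clocks return it to all zeros
```

The inversion in the feedback path is what doubles the Johnson counter's period relative to the ring counter.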

Registers: a register is a group of flip-flops operating as a coherent unit to hold data. This is different from a counter, which is a group of flip-flops operating to generate new data by tabulating it.


Shift register: A register that is capable of shifting data one bit at a time is called a shift register. The logical configuration of a serial shift register consists of a chain of flip-flops connected in cascade, with the output of one flip-flop connected to the input of its neighbor. The operation of the shift register is synchronous; thus each flip-flop is connected to a common clock. Using D flip-flops forms the simplest type of shift register.
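The cascade of D flip-flops on a common clock can be sketched in Python (the class name is illustrative):

```python
class ShiftRegister:
    """Serial shift register: a chain of D flip-flops sharing one clock.
    On each clock, every stage takes its neighbour's previous output."""
    def __init__(self, nbits):
        self.bits = [0] * nbits
    def clock(self, serial_in):
        out = self.bits[-1]                       # bit shifted out serially
        self.bits = [serial_in] + self.bits[:-1]  # shift the chain by one
        return out

sr = ShiftRegister(4)
for b in (1, 0, 1, 1):
    sr.clock(b)
# after 4 clocks the register holds the serial stream: [1, 1, 0, 1]
```

Shifting the whole list at once mirrors the fact that all flip-flops are clocked simultaneously.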

    Bi- directional shift register with parallel load

    Summary:

Fig: Bidirectional shift register with parallel load: four D flip-flops (A0-A3), each fed by a 4x1 MUX; inputs: Clock, select lines S1 S0, serial inputs, and parallel inputs I0-I3.


Sequential circuits: the output logic value depends on the input values and the state of the blocks. These circuits have memory. Various sequential circuits are:
o Latches: an electronic circuit which has two stable states and thereby can store one bit of information.
o Flip flops: also have 2 stable states, but are clocked devices.
o Counters: a device which stores the number of times a particular event or process has occurred.
o Registers: a group of flip-flops operating as a coherent unit to hold data.

Important Questions derived from this:
Q1. What is the difference between a latch and a flip flop?
Q2. Explain the Johnson counter.
Q3. Draw a shift register with parallel load.


    Lecture 4:

Stored Program Control Concept

Flynn's classification of computers: SISD, SIMD, MISD, MIMD

After discussing the basic principles of hardware and the combinational and sequential circuits in our computer system, let us see how these components interact to make the computer system we use. We will start with the basic architectures of the computer system, and the most basic question is how programs are stored in our computer system, or how the different programs and data are arranged in our system.

    Stored Program control concept

The simplest way to organize a computer is to have one processor register and an instruction code with 2 parts:

Opcode (what operation is to be completed)
Address (address of the operands on which the operation is to be computed)

A computer that by design includes an instruction set architecture can store in memory a set of instructions (a program) that details the computation and the data on which the computation is to be done.

The opcode tells us the operation to be performed. The address tells us the memory location where to find the operand. For a memory unit of 4096 words we need 12 bits to specify the address (2^12 = 4096).

Instruction format (16 bits):
bits 15-12: Opcode
bits 11-0: Address

A memory word holding data is simply a 16-bit binary operand (bits 15-0).

Fig 1: Stored Program Organization: a 4096 x 16 memory holding instructions (program) and operands (data), plus a processor register (accumulator, or AC).


When we store an instruction code in memory, 4 bits are available to specify one of 16 operations (as 12 bits are used for the operand address).

For an operation, the control fetches the instruction from memory, decodes the operation (one out of 16), finds the operands and then performs the operation.

Computers with one processor register generally name it the accumulator (or AC). The operation is performed with the operand and the content of the AC. In case no operand is specified, the operation is computed on the accumulator itself, e.g. clear AC, complement AC, etc.
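Splitting the 16-bit instruction word into its 4-bit opcode and 12-bit address is two shifts and masks; a small Python sketch (the function name is illustrative):

```python
def decode(instruction):
    """Split a 16-bit instruction word into its two fields:
    bits 15-12 = opcode (16 operations), bits 11-0 = address (4096 words)."""
    opcode = (instruction >> 12) & 0xF    # top 4 bits
    address = instruction & 0xFFF         # bottom 12 bits
    return opcode, address

word = 0b0010_000000000101   # opcode 2, address 5
op, addr = decode(word)
print(op, addr)              # 2 5
```

The control unit performs exactly this field extraction before dispatching one of the 16 operations.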

PARALLEL COMPUTERS

The organization we studied was a very basic one, but sometimes we have very large computations for which one processor with a general architecture will not be of much help. Thus we take the help of many processors, or divide the processor functions into many functional units, or perform the same computation on many data values. To cover all these solutions we have various types of computers.

Architectural Classification

Flynn's classification is based on the multiplicity of instruction streams and data streams:
Instruction stream: the sequence of instructions read from memory.
Data stream: the operations performed on the data in the processor.

Fig 2: Classification according to instruction and data streams

There are a variety of ways parallel processing can be classified. M. J. Flynn considered the organization of a computer system by the number of instructions and data items manipulated simultaneously. The normal operation of a computer is to fetch instructions from memory and execute them in the processor.

Number of instruction streams vs. number of data streams:

                      Single data   Multiple data
Single instruction    SISD          SIMD
Multiple instruction  MISD          MIMD


The sequence of instructions read from memory constitutes an instruction stream.

The operations performed on the data in the processor constitute a data stream.

Parallel processing can be implemented with either instruction stream, data stream, or both.

    SISD COMPUTER SYSTEMS

SISD (single instruction, single data stream) is the simplest computer available. It contains no parallelism: it has a single instruction stream and a single data stream. The instructions associated with SISD are executed sequentially, and the system may or may not have external parallel processing capabilities.

Fig 3: SISD Architecture

Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations
Von Neumann bottleneck: the maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec).
- Limitation on memory bandwidth
- Memory is shared by CPU and I/O

Examples: superscalar processors, superpipelined processors, VLIW

    MISD COMPUTER SYSTEMS

MISD (multiple instruction, single data stream) is of no practical use, as there is little chance that many instructions need to execute on a single piece of data.

(Fig 3 detail: control unit, processor unit and memory, connected by the instruction stream and the data stream.)


Fig 4: MISD Architecture

Characteristics
- There is no computer at present that can be classified as MISD.

    SIMD COMPUTER SYSTEMS

SIMD (single instruction, multiple data stream) is a computer where a single instruction is applied to different sets of data. It is executed with the help of many processing units controlled by a single control unit. The shared memory must contain multiple modules so that it can communicate with all the processors at the same time.

Main memory is used for storage of programs. The master control unit decodes the instruction and determines the instruction to be executed.

(Fig 4, MISD: control units CU1..CUn, each driving a processor P1..Pn, operating on a single data stream from memory.
Fig 5, SIMD: a single control unit issues the instruction stream to processor units P1..Pn, which reach memory modules M1..Mn through an alignment network and data bus.)


Fig 5: SIMD Architecture

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time

Examples: array processors, systolic arrays, associative processors

    MIMD COMPUTER SYSTEMS

MIMD (multiple instruction, multiple data stream) refers to a computer system where we have different processing elements working on different data. Under this head we classify the various multiprocessors and multicomputers.

Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data

    Fig 6: MIMD Architecture

Types of MIMD computer systems
- Shared memory multiprocessors (UMA, NUMA)
- Message-passing multicomputers

SHARED MEMORY MULTIPROCESSORS

Example systems
- Bus and cache-based systems: Sequent Balance, Encore Multimax
- Multistage interconnection-network-based systems: Ultracomputer, Butterfly, RP3, HEP

(Figure: processors P1..Pn with memories M1..Mn connected through an interconnection network to a shared memory.)


- Crossbar switch-based systems: C.mmp, Alliant FX/8

Limitations: memory access latency; hot spot problem.

    SHARED MEMORY MULTIPROCESSORS (UMA)

Fig 7: Uniform Memory Access (UMA)

Characteristics
All processors have equally direct access to one large memory address space. The access time to reach that memory is the same for all processors; hence the name UMA.

    SHARED MEMORY MULTIPROCESSORS (NUMA)

(Figure: processors P1..Pn, each with a local memory M, connected through an interconnection network to shared memory modules M1..Mn.)


Fig 8: NUMA (Non-Uniform Memory Access)

Characteristics
All processors have direct access to one large memory address space and also have their own local memory. The access time to reach the different memories is different for each processor; hence the name NUMA.

    MESSAGE-PASSING MULTICOMPUTER

Fig 9: Message-passing multicomputer architecture

Characteristics
- Interconnected computers
- Each processor has its own memory, and communicates via message passing

Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations
- Communication overhead
- Hard to program

Summary:
Stored program control concept: instructions (the program) and data are stored in memory, with the program and the operands occupying separate regions.
Flynn's classification of computers: it divides the processing work into data streams and instruction streams, resulting in:



o SISD (single instruction, single data)
o SIMD (single instruction, multiple data)
o MISD (multiple instruction, single data)
o MIMD (multiple instruction, multiple data)

Important Questions:
Q1. Explain the stored program control concept.
Q2. Explain Flynn's classification of computers.
Q3. Describe the concept of data stream and instruction stream.

Lecture 5:

Multilevel viewpoint of a machine
Macro architecture, ISA, micro architecture
CPU, caches, main memory and secondary memory units, input/output mapping

After the discussion of the stored program control concept and the various types of parallel computers, let us study the different components of the computer structure.

MULTILEVEL VIEWPOINT OF A MACHINE

Our computer is built in various layers, basically divided into:
Software layer
Hardware layer
Instruction Set Architecture


    Fig 1: Multilevel viewpoint of a machine

Computer system architecture is decided on the basis of the type of applications or usage of the computer.

The computer architect decides the different layers and the function of each layer for a specific computer. These layers, or the functions of each, can vary from one organization to another.

Our layered architecture is basically divided into 3 parts:

Macro-architecture: as a unit of deployment, we will talk about client applications and COM servers. Computer architecture is the conceptual design and fundamental operational structure of a computer system. It is a blueprint and functional description of requirements (especially speeds and interconnections) and design implementations for the various parts of a computer.

This is basically the software layer of the computer. It comprises:

User application layer
The user layer basically gives the user an interface to the computer for which the computer is designed. At this layer the user specifies what processing has to be done. The requirements given by the user have to be implemented by the computer architect with the help of the other layers.

(Fig 1 layers, top to bottom:
SOFTWARE LAYER / MACRO ARCHITECTURE: user application layer; OS (MS-DOS, Windows, Unix/Linux); high level language; compiler; assembler
INSTRUCTION SET ARCHITECTURE (ISA)
HARDWARE LAYER / MICRO ARCHITECTURE: processor, memory, I/O system; data path and control; gate level design; circuit level design; silicon layout layer)


A high-level programming language is a programming language with strong abstraction from the details of the computer. In comparison to low-level programming languages, it may use natural language elements, be easier to use, or be more portable across platforms. Such languages hide the details of CPU operations such as memory access models and management of scope. E.g. C/Fortran/Pascal; these are not computer dependent.

Assembly language
Assembly language refers to the lowest-level human-readable method for programming a particular computer. Assembly languages are platform specific, and therefore a different assembly language is necessary for programming every different type of computer.

Machine language
Machine languages consist entirely of numbers and are almost impossible for humans to read and write.

Operating system
Operating systems interface with hardware to provide the necessary services for application software. E.g. MS-DOS, Linux, Unix.

Functions of an operating system: process management, memory management, file management, device management, error detection, security.

Types of operating systems: multiprogramming, multiprocessing, time sharing, real time, distributed, and network operating systems.

Compiler
Software that translates a program written in a high-level programming language (C/C++, COBOL, etc.) into machine language. A compiler usually generates assembly language first and then translates the assembly language into machine language. A utility known as a "linker" then combines all required machine language modules into an executable program that can run in the computer.


Assembler: the software that translates assembly language into machine language. Contrast with a compiler, which is used to translate a high-level language, such as COBOL or C, into assembly language first and then into machine language.

Instruction set architecture: this is an abstraction of the interface between the hardware and the low-level software. It deals with the functional behaviour of a computer system as viewed by a programmer. (Computer organization, by contrast, deals with structural relationships that are not visible to a programmer.) The instruction set architecture is the attribute of a computing system as seen by the assembly language programmer or compiler.

The ISA is determined by:
Data storage.
Memory addressing modes.
Operations in the instruction set.
Instruction formats.
Encoding the instruction set.
The compiler's view.

Micro-architecture: inside a unit of deployment, we talk about running processes, COM apartments, thread concurrency and synchronization, and memory sharing.

Micro architecture, also known as computer organization, is a lower level, more concrete description of the system that involves how the constituent parts of the system are interconnected and how they interoperate in order to implement the ISA. The size of a computer's cache, for instance, is an organizational issue that generally has nothing to do with the ISA.

Processor, memory, I/O system: these are the basic hardware devices required for the processing of any system application.

Data path and control: different computers have different numbers and types of registers and other logic circuits. The data path and control decide the flow of information between the various parts and circuits of the computer system.

Gate level design: circuits such as registers, counters etc. are implemented in the form of the various gates available.

Circuit level design: to combine the gates into a logical circuit or component we have the basic circuit level design, which ultimately gives birth to all the hardware components of a computer system.

Silicon layout layer

Other than the architecture of the computer, we have some very basic units which are important for our computer.

Memory units:


Encoding the instruction set. The compiler's view.

Other than the structured organization of the computer, the other important elements are:
o Memory
o CPU
o I/O

Important Questions:
Q1. Explain the multilevel viewpoint of a machine.
Q2. Describe micro architecture.
Q3. Describe macro architecture.
Q4. Explain the ISA and why we call it a link between the hardware and software components.
Q5. What is an operating system?


Lecture 6:

CPU performance measures: MIPS, MFLOPS

After the discussion of all the elements of computer structure in the previous topics, we describe the performance of a computer in this lecture with the help of performance metrics.

Performance of a machine is determined by:
- Instruction count
- Clock cycle time
- Clock cycles per instruction

Processor design (datapath and control) will determine:
- Clock cycle time
- Clock cycles per instruction

Single cycle processor: one clock cycle per instruction.
Advantages: simple design, low CPI.
Disadvantages: long cycle time, which is limited by the slowest instruction.

We have different methods to calculate the performance of a CPU, or to compare two CPUs, but the results depend heavily on what type of instructions we give to these CPUs.

The two measures we generally use are MIPS and MFLOPS.

MIPS: for a specific program running on a specific computer, MIPS is a measure of how many millions of instructions are executed per second:

MIPS = Instruction count / (Execution time x 10^6)
     = Instruction count / (CPU clocks x Cycle time x 10^6)
     = (Instruction count x Clock rate) / (Instruction count x CPI x 10^6)
     = Clock rate / (CPI x 10^6)

Faster execution time usually means a higher MIPS rating.

(Figure: execution time is the product of instruction count, CPI and cycle time.)


MIPS is a useful measure but it has some pitfalls.

Problems with the MIPS rating:
- It takes no account of the instruction set used.
- Program-dependent: a single machine does not have a single MIPS rating, since the MIPS rating may depend on the program used.
- Easy to abuse: the program used to get the MIPS rating is often omitted.
- It cannot be used to compare computers with different instruction sets.
- A higher MIPS rating in some cases may not mean higher performance or better execution time, e.g. due to compiler design variations.

For a machine with the instruction classes below:

For a given program, two compilers produced the instruction counts shown below. The machine is assumed to run at a clock rate of 100 MHz.

MIPS = Clock rate / (CPI x 10^6)
CPI = CPU execution cycles / Instruction count
CPU time = Instruction count x CPI / Clock rate

For compiler 1:
CPI1 = (5 x 1 + 1 x 2 + 1 x 3) / (5 + 1 + 1) = 10 / 7 = 1.43
MIPS1 = (100 x 10^6) / (1.43 x 10^6) = 70.0
CPU time1 = ((5 + 1 + 1) x 10^6 x 1.43) / (100 x 10^6) = 0.10 seconds

For compiler 2:
CPI2 = (10 x 1 + 1 x 2 + 1 x 3) / (10 + 1 + 1) = 15 / 12 = 1.25
MIPS2 = (100 x 10^6) / (1.25 x 10^6) = 80.0
CPU time2 = ((10 + 1 + 1) x 10^6 x 1.25) / (100 x 10^6) = 0.15 seconds

Note that compiler 2 gets the higher MIPS rating (80 vs. 70) yet takes longer to run (0.15 s vs. 0.10 s): exactly the pitfall described above.

Instruction class | CPI
A | 1
B | 2
C | 3

Instruction counts (in millions) for each instruction class:
Code from  | A  | B | C
Compiler 1 | 5  | 1 | 1
Compiler 2 | 10 | 1 | 1
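The whole example can be checked numerically; a small Python sketch (the helper name perf is illustrative):

```python
def perf(counts, cpis, clock_hz):
    """CPI, MIPS rating and CPU time for a program, given per-class
    instruction counts (in millions) and per-class CPIs."""
    total = sum(counts)                            # millions of instructions
    cycles = sum(c * p for c, p in zip(counts, cpis))
    cpi = cycles / total                           # average cycles/instruction
    mips = clock_hz / (cpi * 1e6)                  # MIPS = clock / (CPI * 10^6)
    cpu_time = total * 1e6 * cpi / clock_hz        # seconds
    return cpi, mips, cpu_time

cpis = [1, 2, 3]                                   # classes A, B, C
cpi1, mips1, t1 = perf([5, 1, 1], cpis, 100e6)     # compiler 1
cpi2, mips2, t2 = perf([10, 1, 1], cpis, 100e6)    # compiler 2
print(round(mips1, 1), round(t1, 2))               # 70.0 0.1
print(round(mips2, 1), round(t2, 2))               # 80.0 0.15
```

Running it reproduces the figures above, including the fact that the higher-MIPS compiler produces the slower program.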


MFLOPS: for a specific program running on a specific computer, MFLOPS is a measure of millions of floating-point operations (megaflops) executed per second:

MFLOPS = Number of floating-point operations / (Execution time x 10^6)

MFLOPS is a better comparison measure between different machines than MIPS, but it too has some pitfalls.

Problems with MFLOPS:
- A floating-point operation is an addition, subtraction, multiplication, or division operation applied to numbers represented by a single or double precision floating-point representation.
- Program-dependent: different programs have different percentages of floating-point operations; e.g. compilers contain almost no floating-point operations and yield a MFLOPS rating near zero.

- Dependent on the type of floating-point operations present in the program.

Summary:
Performance of a machine is determined by: instruction count, clock cycle time, clock cycles per instruction.

MIPS = Instruction count / (Execution time x 10^6)

MFLOPS = Number of floating-point operations / (Execution time x 10^6)

Important Questions:
Q1. What is MIPS?
Q2. What is MFLOPS?
Q3. What is the difference between MIPS and MFLOPS?
Q4. What are the CPU performance measures?


Lecture 7:

Cache Memory, Main Memory, Secondary Memory

We have basically three types of memory attached to our processor:

Cache memory
Main memory
Secondary memory

    Primary storage , presently known as memory , is the only one directly accessible to theCPU. The CPU continuously reads instructions stored there and executes them as required.Any data actively operated on is also stored there in uniform manner.

    there are two more sub-layers of the primary storage, besides main large-capacity RAM:

    Processor registers are located inside the processor. Each register typically holds aword of data (often 32 or 64 bits). CPU instructions instruct the arithmetic and logic unit to perform various calculations or other operations on this data (or with thehelp of it). Registers are technically among the fastest of all forms of computer datastorage.

Processor cache is an intermediate stage between ultra-fast registers and much slower main memory. It is introduced solely to increase the performance of the computer. The most actively used information in main memory is duplicated in the cache memory, which is faster but of much smaller capacity; on the other hand, the cache is much slower but much larger than the processor registers. A multi-level hierarchical cache setup is also commonly used: the primary cache is the smallest and fastest and is located inside the processor, while the secondary cache is somewhat larger and slower.

These are the types of memory accessed when we work with the processor. But if we have to store some data permanently, we need the help of secondary or auxiliary memory.

Secondary memory (or secondary storage) is the slowest and cheapest form of memory. It cannot be processed directly by the CPU. It must first be copied into primary storage (also known as RAM).

Secondary memory devices include magnetic disks like hard drives and floppy disks; optical disks such as CDs and CD-ROMs; and magnetic tapes, which were the first form of secondary memory.

Primary memory                             Secondary memory
1. Fast                                    1. Slow
2. Expensive                               2. Cheap
3. Low capacity                            3. Large capacity
4. Connects directly to the processor      4. Not connected directly to the processor

    Hard Disks:

Hard disks, like cassette tapes, use magnetic recording techniques: the magnetic medium can be easily erased and rewritten, and it will "remember" the magnetic flux patterns stored onto the medium for many years.

A hard drive consists of platters, a control circuit board and interface parts.

A hard disk is a sealed unit containing a number of platters in a stack. Hard disks may be mounted in a horizontal or a vertical position. In this description, the hard drive is mounted horizontally.

Electromagnetic read/write heads are positioned above and below each platter. As the platters spin, the drive heads move in toward the center surface and out toward the edge. In this way, the drive heads can reach the entire surface of each platter.

On a hard disk, data is stored in thin, concentric bands. A drive head, while in one position, can read or write a circular ring, or band, called a track. There can be more than a thousand tracks on a 3.5-inch hard disk. Sections within each track are called sectors. A sector is the smallest physical storage unit on a disk, and is almost always 512 bytes (0.5 kB) in size.
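The geometry above gives a simple capacity calculation. All the numbers below are assumptions chosen for illustration (real drives vary, and one surface is typically reserved for track-positioning data, as described later):

```python
# Hypothetical disk geometry (all numbers are assumptions for illustration):
platters = 2
surfaces = platters * 2          # both sides of each platter
tracks_per_surface = 1000
sectors_per_track = 63
bytes_per_sector = 512           # the standard sector size mentioned above

capacity_bytes = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(capacity_bytes)                 # 129024000
print(capacity_bytes / 10**6, "MB")   # 129.024 MB
```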

The stack of platters rotates at a constant speed. The drive head, while positioned close to the center of the disk, reads from a surface that is passing by more slowly than the surface at the outer edges of the disk. To compensate for this physical difference, tracks near the outside of the disk are less densely populated with data than the tracks near the center of the disk. The result of the different data densities is that the same amount of data can be read over the same period of time from any drive head position.

The disk space is filled with data according to a standard plan. One side of one platter contains space reserved for hardware track-positioning information and is not available to the operating system. Thus, a disk assembly containing two platters has three sides available for data. Track-positioning data is written to the disk during assembly at the factory. The system disk controller reads this data to place the drive heads in the correct sector position.

    Magnetic Tapes:

An electric current in a coil of wire produces a magnetic field similar to that of a bar magnet, and that field is much stronger if the coil has a ferromagnetic (iron-like) core.

Tape heads are made from rings of ferromagnetic material with a gap where the tape contacts the ring, so the magnetic field can fringe out to magnetize the emulsion on the tape. A coil of wire around the ring carries the current to produce a magnetic field proportional to the signal to be recorded. If an already magnetized tape is passed beneath the head, it can induce a voltage in the coil. Thus the same head can be used for recording and playback.


    Lecture 8:

Instruction Set based classification of computers
Three address instructions
Two address instructions
One address instructions
Zero address instructions
RISC instructions
CISC instructions
RISC Vs CISC

In the last chapter we discussed the various architectures and the layers of computer architecture. In this chapter we explain the middle layer of the multilevel viewpoint of a machine, i.e. the Instruction Set Architecture.

Instruction Set Architecture (ISA) is an abstraction of the interface between the hardware and the low-level software.

It comprises:
- Instruction Formats
- Memory Addressing Modes
- Operations in the Instruction Set
- Encoding the Instruction Set
- Data Storage
- Compiler's View

Instruction Format
The instruction format is the representation of the instruction. It contains the various instruction fields:

- The opcode field specifies the operation to be performed
- The address field(s) designate memory address(es) or processor register(s)
- The mode field(s) determine how the address field is to be interpreted to get the effective address or the operand

The number of address fields in the instruction format depends on the internal organization of the CPU. The three most common CPU organizations are:

- Single accumulator organization:
    ADD X           /* AC ← AC + M[X] */
- General register organization:
    ADD R1, R2, R3  /* R1 ← R2 + R3 */
    ADD R1, R2      /* R1 ← R1 + R2 */
    MOV R1, R2      /* R1 ← R2 */
    ADD R1, X       /* R1 ← R1 + M[X] */
- Stack organization:
    PUSH X          /* TOS ← M[X] */
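The stack (zero-address) organization can be sketched as a tiny simulator: PUSH copies a memory operand to the top of the stack (TOS), arithmetic instructions pop their two operands and push the result, and POP stores the TOS back to memory. The memory contents below are assumptions for illustration.

```python
# A toy sketch of the stack (zero-address) CPU organization.
# Memory values are assumptions chosen for illustration.
memory = {'A': 2, 'B': 3, 'C': 4, 'D': 5, 'X': 0}
stack = []

def PUSH(addr):
    stack.append(memory[addr])        # TOS <- M[addr]

def POP(addr):
    memory[addr] = stack.pop()        # M[addr] <- TOS

def ADD():
    stack.append(stack.pop() + stack.pop())

def MUL():
    stack.append(stack.pop() * stack.pop())

# Evaluate X = (A + B) * (C + D) with zero-address arithmetic instructions:
PUSH('A'); PUSH('B'); ADD()
PUSH('C'); PUSH('D'); ADD()
MUL()
POP('X')

print(memory['X'])  # 45, i.e. (2 + 3) * (4 + 5)
```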


One goal for CISC machines was to have a machine language instruction to match each high-level language statement type.

Criticisms of CISC:
- Complex instruction format, length and addressing modes → complicated instruction-cycle control due to the complex decoding hardware and decoding process
- Multiple-memory-cycle instructions: operations on memory data require multiple memory accesses per instruction
- Microprogrammed control is a necessity: microprogram control storage takes a substantial portion of the CPU chip area, and the semantic gap between machine instructions and microinstructions is large
- A general-purpose instruction set includes all the features required by individually different applications; when any one application is running, all the features required by the other applications are an extra burden on that application

    RISC

In the late 70s - early 80s, there was a reaction to the shortcomings of the CISC style of processors. Reduced Instruction Set Computers (RISC) were proposed as an alternative. The underlying idea behind RISC processors is to simplify the instruction set and reduce instruction execution time.

Note: In RISC-type instruction sets, we can't access memory operands directly.

Evaluate X = (A + B) * (C + D):
MOV R1, A       /* R1 ← M[A] */
MOV R2, B       /* R2 ← M[B] */
ADD R1, R1, R2  /* R1 ← R1 + R2 */
MOV R2, C       /* R2 ← M[C] */
MOV R3, D       /* R3 ← M[D] */
ADD R2, R2, R3  /* R2 ← R2 + R3 */
MUL R1, R1, R2  /* R1 ← R1 * R2 */
MOV X, R1       /* M[X] ← R1 */
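The RISC sequence above can be traced line by line in Python: only the loads and stores touch memory, and the arithmetic works register-to-register. The memory values are assumptions chosen for illustration.

```python
# Tracing the load/store (RISC) evaluation of X = (A + B) * (C + D).
# Memory values are assumptions for illustration.
M = {'A': 2, 'B': 3, 'C': 4, 'D': 5}
R = {}

R['R1'] = M['A']             # MOV R1, A
R['R2'] = M['B']             # MOV R2, B
R['R1'] = R['R1'] + R['R2']  # ADD R1, R1, R2
R['R2'] = M['C']             # MOV R2, C
R['R3'] = M['D']             # MOV R3, D
R['R2'] = R['R2'] + R['R3']  # ADD R2, R2, R3
R['R1'] = R['R1'] * R['R2']  # MUL R1, R1, R2
M['X'] = R['R1']             # MOV X, R1

print(M['X'])  # 45, i.e. (2 + 3) * (4 + 5)
```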

RISC processors often feature:
- Few instructions
- Few addressing modes
- Only load and store instructions access memory


- All other operations are done using on-processor registers
- Fixed-length instructions
- Single-cycle execution of instructions
- The control unit is hardwired, not microprogrammed

Since all instructions (except the load and store instructions) use only registers for operands, only a few addressing modes are needed.

By having all instructions the same length, reading them in is easy and fast. The fetch and decode stages are simple, looking much more like Mano's BC than a CISC machine.

The instruction and address formats are designed to be easy to decode: unlike the variable-length CISC instructions, the opcode and register fields of RISC instructions can be decoded simultaneously.

The control logic of a RISC processor is designed to be simple and fast: the control logic is simple because of the small number of instructions and the simple addressing modes, and it is hardwired, rather than microprogrammed, because hardwired control is faster.

ADVANTAGES OF RISC

VLSI Realization
- Control area is considerably reduced; RISC chips allow a large number of registers on the chip
- Enhancement of performance and HLL support
- Higher regularization factor and lower VLSI design cost

Computing Speed
- Simpler, smaller control unit → faster
- Simpler instruction set, addressing modes and instruction format → faster decoding
- Register operations are faster than memory operations
- Register windows enhance the overall speed of execution
- Identical instruction length and one-cycle instruction execution are suitable for pipelining → faster

Design Costs and Reliability
- Shorter time to design → reduction in the overall design cost, and less risk that the end product will be obsolete by the time the design is completed
- Simpler, smaller control unit → higher reliability
- Simple instruction format (of fixed length) → ease of virtual memory management

High Level Language Support
- A single choice of instruction → shorter, simpler compiler
- A large number of CPU registers → more efficient code
- Register windows → direct support of HLL
- Reduced burden on the compiler writer

    RISC VS CISC

The CISC Approach
On a CISC machine, the entire task of multiplying two numbers (here, the operands stored at memory locations 2:3 and 5:2) can be completed with one instruction:

    MULT 2:3, 5:2

One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.

The RISC Approach
In order to perform the exact series of steps described in the CISC approach, a programmer would need to code four lines of assembly:

LOAD A, 2:3
LOAD B, 5:2
PROD A, B
STORE 2:3, A

At first, this may seem like a much less efficient way of completing the operation. Because there are more lines of code, more RAM is needed to store the assembly-level instructions. The compiler must also perform more work to convert a high-level language statement into code of this form.

    RISC vs CISC

CISC                                           RISC
Emphasis on hardware                           Emphasis on software
Transistors used for storing                   Spends more transistors
complex instructions                           on memory registers
Includes multi-clock complex instructions      Single-clock, reduced instructions only
Memory-to-memory: "LOAD" and "STORE"           Register to register: "LOAD" and "STORE"
incorporated in instructions                   are independent instructions
Small code sizes                               Large code sizes
High cycles per second                         Low cycles per second

Summary:
The instruction format is composed of the opcode field, address field, and mode field.
The different types of address instructions used are three-address, two-address, one-address and zero-address.
RISC and CISC: introduction with their advantages and criticisms.
RISC Vs CISC.

Important Questions:
Q1. Explain the different addressing formats in detail with example.
Q2. Explain RISC and CISC with their advantages and criticisms.
Q3. Numerical


    Lecture 9:

Addressing modes
Implied Mode
Immediate Mode
Register Mode
Register Indirect Mode
Autoincrement or Autodecrement Mode
Direct Addressing Mode
Indirect Addressing Mode
Relative Addressing Mode

In the last lecture we studied the instruction formats; now we study how instructions use the different types of addressing modes.

    Addressing Modes

An addressing mode specifies a rule for interpreting or modifying the address field of the instruction (before the operand is actually referenced). A variety of addressing modes exist:
- to give programming flexibility to the user
- to use the bits in the address field of the instruction efficiently

In simple words, we can say an addressing mode is the way operands (or data) are fetched from memory.

TYPES OF ADDRESSING MODES

Implied Mode: The address of the operand is specified implicitly in the definition of the instruction.
- No need to specify an address in the instruction
- EA = AC, or EA = Stack[SP]
- Examples from BC: CLA, CME, INP

Immediate Mode: Instead of specifying the address of the operand, the operand itself is specified.
- No need to specify an address in the instruction; however, the operand itself needs to be specified
- (-) Sometimes requires more bits than an address
- (+) Fast to acquire an operand
- Useful for initializing registers to a constant value

Register Mode: The address specified in the instruction is a register address.
- The designated operand needs to be in a register
- (+) Shorter address than a memory address, saving address-field bits in the instruction
- (+) Faster to acquire an operand than with memory addressing
- EA = IR(R) (IR(R): register field of IR)

Register Indirect Mode: The instruction specifies a register which contains the memory address of the operand.
- (+) Saves instruction bits, since a register address is shorter than a memory address
- (-) Slower to acquire an operand than either register addressing or memory addressing
- EA = [IR(R)] ([x]: content of x)

Autoincrement or Autodecrement Mode: Similar to the register indirect mode, except that when the address in the register is used to access memory, the value in the register is incremented or decremented by 1 automatically.

Direct Address Mode: The instruction specifies the memory address, which can be used directly to access memory.
- (+) Faster than the other memory addressing modes
- (-) Too many bits are needed to specify the address for a large physical memory space
- EA = IR(addr) (IR(addr): address field of IR)
- E.g., the address field in a branch-type instruction

Indirect Addressing Mode: The address field of an instruction specifies the address of a memory location that contains the address of the operand.
- (-) Slow to acquire an operand because of an additional memory access
- EA = M[IR(address)]

Relative Addressing Modes: The address field of an instruction specifies part of the address (an abbreviated address) which can be used along with a designated register to calculate the address of the operand.
--> Effective address = address part of the instruction + content of a special register
- (+) A large physical memory can be accessed with a small number of address bits
- EA = f(IR(address), R), where R is sometimes implied
--> Typically EA = IR(address) + R
- 3 different relative addressing modes, depending on R:
* (PC) Relative Addressing Mode (R = PC)
* Indexed Addressing Mode (R = IX, where IX: index register)
* Base Register Addressing Mode (R = BAR (base address register))
* Indexed addressing mode vs. base register addressing mode:
- IR(address) (address field of the instruction): base address vs. displacement
- R (index/base register): displacement vs. base address
- Difference: the way they are used (NOT the way they are computed)
* Indexed addressing mode: processing many operands in an array using the same instruction


* Base register addressing mode: facilitates the relocation of programs in memory in multiprogramming systems

    Addressing Modes: Examples
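A small sketch of how the effective address (EA) is formed in several of the modes above. All register, field and memory values are assumptions chosen purely for illustration:

```python
# Effective-address calculation for a few addressing modes.
# All values below are assumptions for illustration.
address_field = 0x20            # IR(address)
R             = 0x100           # contents of the register named in the instruction
PC            = 0x055           # program counter
memory        = {0x20: 0x300}   # M[0x20] holds a pointer

ea_direct       = address_field          # Direct:            EA = IR(address)
ea_indirect     = memory[address_field]  # Indirect:          EA = M[IR(address)]
ea_reg_indirect = R                      # Register indirect: EA = [IR(R)]
ea_pc_relative  = address_field + PC     # PC-relative:       EA = IR(address) + PC

# Autoincrement: use the register's value as the EA, then bump the register.
ea_autoincr = R
R = R + 1

print(hex(ea_direct), hex(ea_indirect), hex(ea_reg_indirect), hex(ea_pc_relative))
# 0x20 0x300 0x100 0x75
print(hex(ea_autoincr), hex(R))  # 0x100 0x101
```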

Summary:
An addressing mode specifies a rule for interpreting or modifying the address field of the instruction.
The different types of addressing modes are: implied mode, immediate mode, register mode, register indirect mode, autoincrement or autodecrement mode, direct mode, indirect mode, and relative addressing mode.

Important Questions:
Q1. Explain the addressing modes with suitable examples.


    Lecture 10:

Instruction set
Data Transfer Instructions
  o Typical Data Transfer Instructions
  o Data Transfer Instructions with Different Addressing Modes
Data Manipulation Instructions
  o Arithmetic instructions
  o Logical and bit manipulation instructions
  o Shift instructions
Program Control Instructions
  o Conditional Branch Instructions
  o Subroutine Call & Return

    DATA TRANSFER INSTRUCTIONS

These are the type of instructions used only for the transfer of data from register to register, from registers to memory operands, and between other memory components. No manipulation is done on the data values.

In these instructions there is no use of the various addressing modes; we have a direct transfer between the various registers and memory components.

Examples are Load and Store, used for the transfer of data to and from the accumulator.

Typical Data Transfer Instructions

Name        Mnemonic
Load        LD
Store       ST
Move        MOV
Exchange    XCH
Input       IN
Output      OUT
Push        PUSH
Pop         POP

Table 3.1


Arithmetic Instructions: These are the type of instructions used for arithmetic calculations like addition, subtraction, increment, etc.

    Logical and Bit Manipulation Instructions

These are the type of instructions in which operations are computed on a string of bits. The bits are treated individually, so an operation can be done on an individual bit or a group of bits ignoring the whole value, and even the insertion of new bits is possible.

For example:
CLR R1 will make all the bits 0.
COM R1 will invert all the bits.
AND, OR and XOR produce their result bitwise, on the corresponding individual bits of the two operands. E.g.: AND of 0011 and 1100 results in 0000.
The AND instruction is also known as the mask instruction: if we have to mask (clear) some bits of an operand, we AND it with a value that has 0s in those positions and 1s (high) everywhere else. E.g.: suppose we have to mask the value 11000110 on its 1st, 3rd and 7th bits; then we AND it with the value 01011101.
CLRC, SETC and COMC work on only one bit of the operand, i.e. the carry.
Similarly, EI and DI work on only the one-bit interrupt flip-flop, to enable or disable interrupts.
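The masking example above is easy to verify with a bitwise AND (bit positions are counted from the left, as in the text):

```python
# ANDing 11000110 with 01011101 clears (masks) the 1st, 3rd and 7th bits,
# counted from the left, and leaves the other bits unchanged.
value = 0b11000110
mask  = 0b01011101

result = value & mask
print(format(result, '08b'))  # 01000100
```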

Name                      Mnemonic
Increment                 INC
Decrement                 DEC
Add                       ADD
Subtract                  SUB
Multiply                  MUL
Divide                    DIV
Add with Carry            ADDC
Subtract with Borrow      SUBB
Negate (2's Complement)   NEG

Table 3.3


Name                 Mnemonic
Clear                CLR
Complement           COM
AND                  AND
OR                   OR
Exclusive-OR         XOR
Clear carry          CLRC
Set carry            SETC
Complement carry     COMC
Enable interrupt     EI
Disable interrupt    DI

Table 3.4

Shift Instructions: These are the type of instructions which modify the whole value of the operand by shifting its bits to the left or right.

Say R1 has the value 11001100:
o SHR shifts right, inserting 0 at the leftmost position. Result: 01100110
o SHL shifts left, inserting 0 at the rightmost position. Result: 10011000
o SHRA shifts right, but the sign bit remains the same while every other bit shifts right accordingly. Result: 11100110
o SHLA is the same as SHL, inserting 0 at the end. Result: 10011000
o In ROR, all the bits are shifted towards the right and the rightmost one moves to the leftmost position. Result: 01100110
o In ROL, all the bits are shifted towards the left and the leftmost one moves to the rightmost position. Result: 10011001
o In RORC, suppose we have a carry bit of 0 with register R1. All the bits of the register are shifted right, the value of the carry moves to the leftmost position, and the rightmost bit moves into the carry. Result: 01100110 with carry 0
o Similarly, in ROLC, all the bits of the register are shifted left, the value of the carry moves to the rightmost position, and the leftmost bit moves into the carry. Result: 10011000 with carry 1
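All eight results above can be checked with a small 8-bit simulation; the helper functions are a sketch of the generic semantics described here, not any particular ISA's definition:

```python
# 8-bit shift and rotate operations on R1 = 11001100.
def shr(x):  return x >> 1                               # logical shift right
def shl(x):  return (x << 1) & 0xFF                      # logical shift left
def shra(x): return (x >> 1) | (x & 0x80)                # arithmetic shift right (keep sign bit)
def ror(x):  return (x >> 1) | ((x & 1) << 7)            # rotate right
def rol(x):  return ((x << 1) & 0xFF) | (x >> 7)         # rotate left
def rorc(x, c): return ((x >> 1) | (c << 7), x & 1)      # rotate right through carry
def rolc(x, c): return (((x << 1) & 0xFF) | c, x >> 7)   # rotate left through carry

R1 = 0b11001100
print(format(shr(R1), '08b'))   # 01100110
print(format(shl(R1), '08b'))   # 10011000
print(format(shra(R1), '08b'))  # 11100110
print(format(ror(R1), '08b'))   # 01100110
print(format(rol(R1), '08b'))   # 10011001
x, c = rorc(R1, 0)
print(format(x, '08b'), c)      # 01100110 0
x, c = rolc(R1, 0)
print(format(x, '08b'), c)      # 10011000 1
```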

PROGRAM CONTROL INSTRUCTIONS:

Before starting with program control instructions, let us study the concept of the PC, i.e. the program counter. The program counter is the register which holds the address of the next instruction to be executed. When we fetch the instruction pointed to by the PC from memory, it changes its value to give the address of the next instruction to be fetched. For sequential instructions it simply increments itself; for branching or modular programs it gives the address of the first instruction of the called program. After execution of the called program, the program counter points back to the instruction following the one from which the subprogram was called. For go-to kinds of instructions, the program counter simply changes its value without keeping any reference to the previous instruction.

Name                       Mnemonic
Logical shift right        SHR
Logical shift left         SHL
Arithmetic shift right     SHRA
Arithmetic shift left      SHLA
Rotate right               ROR
Rotate left                ROL
Rotate right thru carry    RORC
Rotate left thru carry     ROLC

Table 3.5

The PC is updated in one of two ways: in-line sequencing (PC ← PC + 1, so the next instruction is fetched from the next adjacent location in memory), or with an address from another source (the current instruction, the stack, etc.), as used by branch, conditional branch and subroutine instructions.


Program Control Instructions: These instructions are used for the transfer of control to other instructions; that is, they are used when the next instruction must be executed from some other location instead of in sequential order.

The conditions can be:
- Calling a subprogram
- Returning to the main program
- Jumping to some other instruction or location
- Skipping instructions, as for break and exit, or when the condition being checked is false, and so on

* CMP and TST instructions do not retain the results of their operations (subtraction and AND, respectively); they only set or clear certain flags.

Conditional Branch Instructions: These are instructions in which some condition is tested and, depending on the result, execution either branches or continues sequentially.

Name                Mnemonic
Branch              BR
Jump                JMP
Skip                SKP
Call                CALL
Return              RTN
Compare (by −)      CMP
Test (by AND)       TST

Table 3.6


    Subroutine Call and Return:

Subroutine Call, also known as: Call Subroutine, Jump to Subroutine, Branch to Subroutine, or Branch and save return address.

Two most important operations are implied:
* Branch to the beginning of the subroutine (same as a branch or conditional branch)
* Save the return address, so that the location in the calling program can be recovered upon exit from the subroutine.

Locations for storing the return address:
- Fixed location in the subroutine (memory)
- Fixed location in memory

Mnemonic   Branch condition             Tested condition
BZ         Branch if zero               Z = 1
BNZ        Branch if not zero           Z = 0
BC         Branch if carry              C = 1
BNC        Branch if no carry           C = 0
BP         Branch if plus               S = 0
BM         Branch if minus              S = 1
BV         Branch if overflow           V = 1
BNV        Branch if no overflow        V = 0

Unsigned compare conditions (A - B):
BHI        Branch if higher             A > B
BHE        Branch if higher or equal    A ≥ B
BLO        Branch if lower              A < B
BLOE       Branch if lower or equal     A ≤ B
BE         Branch if equal              A = B
BNE        Branch if not equal          A ≠ B

Signed compare conditions (A - B):
BGT        Branch if greater than       A > B
BGE        Branch if greater or equal   A ≥ B
BLT        Branch if less than          A < B
BLE        Branch if less or equal      A ≤ B
BE         Branch if equal              A = B
BNE        Branch if not equal          A ≠ B

Table 3.7
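The flags that these branches test are produced by a compare, which subtracts B from A and discards the numeric result. The sketch below assumes an 8-bit two's-complement machine with a borrow-style carry (real ISAs differ in their carry convention for subtraction):

```python
# Flag computation for CMP (A - B) on an assumed 8-bit machine.
def compare(a, b):
    diff = (a - b) & 0xFF
    Z = int(diff == 0)     # zero flag
    S = diff >> 7          # sign flag (MSB of the 8-bit result)
    C = int(a < b)         # carry flag: borrow out of A - B (unsigned)
    # overflow: operands have different signs and the result's sign differs from A's
    V = int(((a ^ b) & (a ^ diff) & 0x80) != 0)
    return Z, C, S, V

Z, C, S, V = compare(5, 9)
print(Z, C, S, V)                                   # 0 1 1 0
print("BLO taken" if C == 1 else "BLO not taken")   # BLO taken (5 < 9, unsigned)
```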


- In a processor register
- In a memory stack: the most efficient way

Summary:
Data Transfer Instructions are of two types, namely Typical Data Transfer Instructions and Data Transfer Instructions with Different Addressing Modes.
Data Manipulation Instructions are of three types: arithmetic instructions, logical and bit manipulation instructions, and shift instructions.
Program Control Instructions can be divided into Conditional Branch Instructions and Subroutine Call & Return instructions.

Important Questions:
Q1. Explain the data transfer instructions.
Q2. Explain the data manipulation instructions.
Q3. Explain the program control instructions with example.


The CALL and RTN micro-operations with a memory stack:

CALL:
SP ← SP - 1
M[SP] ← PC
PC ← EA

RTN:
PC ← M[SP]
SP ← SP + 1
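These micro-operations can be traced directly in Python. The addresses and the initial stack pointer are assumptions chosen for illustration:

```python
# Simulating CALL and RTN with a memory stack.
memory = {}
SP = 0x100       # stack pointer (assumed initial value)
PC = 0x020       # program counter: address of the instruction after CALL

# CALL subroutine at EA = 0x200
EA = 0x200
SP = SP - 1      # SP <- SP - 1
memory[SP] = PC  # M[SP] <- PC   (save return address on the stack)
PC = EA          # PC <- EA      (jump to the subroutine)

# ... subroutine body executes ...

# RTN
PC = memory[SP]  # PC <- M[SP]   (restore return address)
SP = SP + 1      # SP <- SP + 1  (pop the stack)

print(hex(PC), hex(SP))  # 0x20 0x100
```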


    Lecture 11:

Program Interrupts
MASM

    PROGRAM INTERRUPT:

Types of Interrupts:

1. External Interrupts: initiated from outside the CPU and memory.
- I/O device -> data transfer request or data transfer complete
- Timing device -> timeout
- Power failure
- Operator

2. Internal Interrupts (traps): caused by the currently running program.
- Register or stack overflow
- Divide by zero
- Op-code violation
- Protection violation

3. Software Interrupts: both external and internal interrupts are initiated by the computer hardware, whereas software interrupts are initiated by an executing instruction.
- Supervisor call -> switches from user mode to supervisor mode, allowing the program to execute a certain class of operations which are not allowed in user mode.

    MASM:

If you have used a modern word processor such as Microsoft Word, you may have noticed the macro feature, where you can record a series of frequently used actions or commands into a macro. For example, suppose you always need to insert a 2-by-4 table with the headings "Date" and "Time". You can start the macro recorder and create the table as you wish; after that, you can save the macro. The next time you need to create the same kind of table, you just need to execute the macro. The same applies to a macro assembler: it enables you to record frequently performed actions or a frequently used block of code so that you do not have to re-type it each time.

The Microsoft Macro Assembler (abbreviated MASM) is an x86 high-level assembler for DOS and Microsoft Windows. Currently it is the most popular x86 assembler. It supports a wide variety of macro facilities and structured programming idioms, including high-level functions for looping and procedures. Later versions added the capability of producing programs for Windows. MASM is one of the few Microsoft development tools that target 16-bit, 32-bit and 64-bit platforms. Earlier versions were MS-DOS applications. Versions 5.1 and 6.0 were OS/2 applications and later versions were Win32 console applications. Versions 6.1 and 6.11 included Phar Lap's TNT DOS extender so that MASM could run in MS-DOS.

    The name MASM originally referred to as MACRO ASSEMBLER but over theyears it has become synonymous with Microsoft Assembler.An Assembly language translator converts macros into several machine languageinstructions.MASM isn't the fastest assembler around (it's not particularly slow, except in acouple of degenerate cases, but there are faster assemblers available).

Though very powerful, there are a couple of assemblers that, arguably, are more powerful (e.g., TASM and HLA). MASM is only usable for creating DOS and Windows applications; you cannot effectively use it to create software for other operating systems.

Benefits of MASM
There are some benefits to using MASM today:

Steve Hutchessen's ("Hutch") MASM32 package provides the support for MASM that Microsoft no longer provides.

You can download MASM (and MASM32) free from Microsoft and other sites.
Most Windows assembly language examples on the Internet today use MASM syntax.
You may download MASM directly from Webster as part of the MASM32 package.

    Summary:

Program interrupts can be external, internal, or software interrupts.
MASM is the Microsoft Macro Assembler, used for implementing macros.

Important Questions:
Q1. What are program interrupts? Explain the types of program interrupts.
Q2. Explain MASM in detail.


    Lecture 10:

CPU Architecture types
o Accumulator
o Register
o Stack
o Memory / Register

    Detailed data path of a register based CPU

In Unit 3 we discussed the instruction set architecture (ISA), which deals with the various types of address instructions, addressing modes, and the different types of instructions in various computer architectures.

In this chapter we will discuss the various types of computer organizations. In general, most processors or computers are organized in one of three ways:

Single register (accumulator) organization
- The Basic Computer is a good example: the accumulator is the only general purpose register.

Stack organization
- All operations are done using the hardware stack. For example, an OR instruction will pop the two top elements from the stack, do a logical OR on them, and push the result on the stack.

General register organization
- Used by most modern computer processors; any of the registers can be used as the source or destination for computer operations.

Accumulator type of organization:
In the accumulator type of organization, one operand is in memory and the other is in the accumulator.

The instructions we can run with the accumulator are:

AC ← AC ∧ DR              AND with DR
AC ← AC + DR              Add with DR
AC ← DR                   Transfer from DR
AC(0-7) ← INPR            Transfer from INPR
AC ← AC'                  Complement
AC ← shr AC, AC(15) ← E   Shift right
AC ← shl AC, AC(0) ← E    Shift left
AC ← 0                    Clear
AC ← AC + 1               Increment
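The accumulator microoperations listed above can be sketched as a small simulation. This is a minimal sketch, not the actual hardware: the 16-bit register width follows the basic-computer convention used in these notes, and the function names are illustrative.

```python
# Minimal sketch of the accumulator (AC) microoperations for a 16-bit machine.
MASK = 0xFFFF  # 16-bit registers

def ac_and(ac, dr):      return ac & dr                 # AC <- AC AND DR
def ac_add(ac, dr):      return (ac + dr) & MASK        # AC <- AC + DR (carry discarded here)
def ac_load(dr):         return dr                      # AC <- DR
def ac_complement(ac):   return (~ac) & MASK            # AC <- AC'
def ac_shr(ac, e):       return (e << 15) | (ac >> 1)   # shift right, E enters AC(15)
def ac_shl(ac, e):       return ((ac << 1) & MASK) | e  # shift left, E enters AC(0)
def ac_clear():          return 0                       # AC <- 0
def ac_increment(ac):    return (ac + 1) & MASK         # AC <- AC + 1

ac = ac_load(0x00F0)
ac = ac_add(ac, 0x0001)   # 0x00F1
ac = ac_complement(ac)    # 0xFF0E
assert ac == 0xFF0E
```

Each helper returns the new AC value, mirroring the register-transfer statements one-for-one.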


    Circuit required:

Stack Organization:
Stack
- Very useful feature for nested subroutines and nested interrupt services
- Also efficient for arithmetic expression evaluation
- Storage which can be accessed in LIFO order
- Pointer: SP (stack pointer)
- Only PUSH and POP operations are applicable

The stack type of organization is of two types: register stack and memory stack.

[Figure: Accumulator circuit — a 16-bit adder and logic circuit takes inputs from DR and INPR and feeds the 16-bit AC (accumulator) register; control gates generate the LD, INR, and CLR signals; the AC output goes to the bus; the clock synchronizes the transfers.]


    REGISTER STACK ORGANIZATION

[Figure: Register stack with push and pop operations — a 64-word stack (addresses 0 to 63) holding items A, B, C at the bottom; a 6-bit stack pointer SP addresses the top of the stack; FULL and EMPTY flag bits; data enters and leaves through DR.]

    /* Initially, SP = 0, EMPTY = 1, FULL = 0 */

PUSH:                            POP:
SP ← SP + 1                      DR ← M[SP]
M[SP] ← DR                       SP ← SP - 1
If (SP = 0) then (FULL ← 1)      If (SP = 0) then (EMPTY ← 1)
EMPTY ← 0                        FULL ← 0
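The register-stack PUSH and POP microoperations, together with the FULL and EMPTY flags, can be sketched in software like this. This is a minimal model assuming a 64-word stack whose 6-bit pointer wraps modulo 64; the class and method names are illustrative.

```python
# Sketch of a 64-word register stack with SP and the FULL/EMPTY flags,
# following the PUSH/POP microoperations above (SP wraps modulo the size,
# just as a 6-bit counter wraps at 64).
class RegisterStack:
    def __init__(self, size=64):
        self.mem = [0] * size
        self.size = size
        self.sp = 0          # stack pointer (initially 0)
        self.empty = 1       # EMPTY = 1, FULL = 0 initially
        self.full = 0

    def push(self, dr):
        assert not self.full, "stack overflow"
        self.sp = (self.sp + 1) % self.size   # SP <- SP + 1
        self.mem[self.sp] = dr                # M[SP] <- DR
        if self.sp == 0:                      # pointer wrapped: stack is full
            self.full = 1
        self.empty = 0

    def pop(self):
        assert not self.empty, "stack underflow"
        dr = self.mem[self.sp]                # DR <- M[SP]
        self.sp = (self.sp - 1) % self.size   # SP <- SP - 1
        if self.sp == 0:                      # pointer back at 0: stack is empty
            self.empty = 1
        self.full = 0
        return dr

s = RegisterStack()
s.push(10); s.push(20)
assert s.pop() == 20 and s.pop() == 10 and s.empty == 1
```

Note how the FULL test fires only when SP wraps back to 0 after a push, exactly as in the microoperation table.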


    MEMORY STACK ORGANIZATION

    Memory with Program, Data, and Stack Segments

    A portion of memory is used as a stack with a processor register as a stack pointer

- PUSH: SP ← SP - 1
        M[SP] ← DR
- POP:  DR ← M[SP]
        SP ← SP + 1

Note: Most computers do not provide hardware to check stack overflow (full stack) or underflow (empty stack); this must be done in software.
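A memory stack that grows toward lower addresses, with the overflow and underflow checks done in software, can be sketched as follows. The segment bounds 3000 and 4001 are illustrative (taken from the figure's layout), and the function names are hypothetical.

```python
# Sketch of a memory stack that grows toward lower addresses, with
# overflow/underflow checked in software as the note above requires.
STACK_BOTTOM = 4001   # initial SP; the first item is pushed at 4000
STACK_LIMIT  = 3000   # stack may not grow below this address

memory = {}
sp = STACK_BOTTOM

def push(dr):
    global sp
    if sp - 1 < STACK_LIMIT:          # software overflow check
        raise OverflowError("stack overflow")
    sp -= 1                           # SP <- SP - 1
    memory[sp] = dr                   # M[SP] <- DR

def pop():
    global sp
    if sp >= STACK_BOTTOM:            # software underflow check
        raise IndexError("stack underflow")
    dr = memory[sp]                   # DR <- M[SP]
    sp += 1                           # SP <- SP + 1
    return dr

push(7); push(8)
assert pop() == 8 and pop() == 7 and sp == STACK_BOTTOM
```

Because SP is decremented before the store, the stack grows downward through memory, away from the program and data segments.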

Register type of organization:
In this organization we take the help of several registers, say R1 to R7, for the transfer and manipulation of data.

    Detailed data path of a typical register based CPU

[Figure: Memory with program, data, and stack segments — the program (instructions) begins at address 1000 and is addressed by PC; the data (operands) begin at address 3000 and are addressed by AR; the stack occupies addresses 3997-4001 and is addressed by SP, with the stack growing toward lower addresses.]


Because accessing memory directly is very time-consuming (and thus costly), we prefer the register organization, which proves to be more efficient and time-saving.

In this we are using 7 registers. Two multiplexers decide which registers are used as operand sources, and a decoder decides which register is used as the destination for storing the result. MUX 1 selects the first operand register, depending on the value of SELS1 (selector for source 1). Similarly, SELS2 selects the second operand register through MUX 2.

These two inputs reach the ALU through the S1 bus and the S2 bus. OPR denotes the type of operation to be performed, and the computation is carried out by the ALU. The result is then either stored back into one of the 7 registers, with the destination register selected by the decoder via SELD, or sent to the output.
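The data path just described — two source multiplexers, the ALU, and a decoder-selected destination register — can be sketched as a toy model. The operation set and naming here are illustrative, not a hardware description.

```python
# Sketch of the general register organization: two multiplexers select the
# source registers for the S1 and S2 buses, the ALU applies OPR, and the
# decoder-selected load line stores the result in the destination register.
regs = {i: 0 for i in range(1, 8)}          # R1..R7, 16-bit values

ALU_OPS = {
    "ADD": lambda a, b: (a + b) & 0xFFFF,
    "SUB": lambda a, b: (a - b) & 0xFFFF,
    "AND": lambda a, b: a & b,
    "OR":  lambda a, b: a | b,
}

def cycle(sels1, sels2, opr, seld):
    s1bus = regs[sels1]                     # MUX1: first operand onto the S1 bus
    s2bus = regs[sels2]                     # MUX2: second operand onto the S2 bus
    result = ALU_OPS[opr](s1bus, s2bus)     # ALU performs the operation named by OPR
    regs[seld] = result                     # decoder asserts the load line of R[seld]
    return result

regs[1], regs[2] = 5, 3
cycle(sels1=1, sels2=2, opr="ADD", seld=3)  # R3 <- R1 + R2
assert regs[3] == 8
```

One call to `cycle` corresponds to one clock period: both operand selections, the ALU operation, and the destination load all happen in a single register-transfer step.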

[Figure: General register organization — registers R1-R7 feed two multiplexers (MUX1 selected by SELS1, MUX2 selected by SELS2) that drive the S1 bus and S2 bus into the ALU; OPR selects the ALU operation; a 3 x 8 decoder driven by SELD asserts one of 7 load lines to route the output/result back into the destination register; the clock synchronizes the transfers.]


    Lecture 13:

    Address Sequencing / Microinstruction Sequencing

    Implementation of control unit

Address Sequencing / Microinstruction Sequencing:
Microinstructions are stored in control memory in groups, with each group specifying a routine. The hardware that controls the address sequencing of the control memory must be capable of sequencing the microinstructions within a routine and of branching from one routine to another.

Steps: An initial address is loaded into the CAR at power-on; this is usually the address of the first microinstruction, which activates the instruction fetch routine. This routine may be sequenced by incrementing. At the end of the fetch routine the instruction is in the IR of the computer. Next, the control memory computes the effective address of the operand. The next step is the execution of the instruction fetched from memory.
The transformation from the instruction code bits to an address in control memory where the routine is located is referred to as a mapping process.

[Figure: Selection of address for control memory — the instruction code passes through mapping logic; multiplexers choose the next address for the control address register (CAR) from among the mapping logic, an incrementer, the subroutine register (SBR), and a branch address; branch logic examines the status bits (selecting a status bit) to produce the MUX select signals; the CAR addresses the control memory (ROM), which emits the microoperations.]


At the completion of the execution of the instruction, control must return to the fetch routine by executing an unconditional branch microinstruction to the first address of the fetch routine.
Sequencing Capabilities Required in a Control Storage

- Incrementing of the control address register
- Unconditional and conditional branches
- A mapping process from the bits of the machine instruction to an address for control memory
- A facility for subroutine call and return
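The four sequencing capabilities above can be sketched as a next-address function for the CAR. This is a simplified model; the select-signal names and the routine addresses used in the example are illustrative.

```python
# Sketch of the next-address logic for the control address register (CAR):
# a multiplexer chooses among increment, branch, mapping, call, and return.
def next_car(car, sbr, mux_select, branch_addr=0, map_addr=0, status_bit=1):
    if mux_select == "INCREMENT":
        return car + 1, sbr                               # sequence within a routine
    if mux_select == "BRANCH":                            # conditional/unconditional branch
        return (branch_addr if status_bit else car + 1), sbr
    if mux_select == "MAP":                               # map opcode bits to a routine address
        return map_addr, sbr
    if mux_select == "CALL":                              # SBR saves the return address
        return branch_addr, car + 1
    if mux_select == "RETURN":                            # return from subroutine
        return sbr, sbr
    raise ValueError(mux_select)

car, sbr = 0, 0
car, sbr = next_car(car, sbr, "INCREMENT")                 # fetch routine, next step
car, sbr = next_car(car, sbr, "MAP", map_addr=64)          # jump to the instruction's routine
car, sbr = next_car(car, sbr, "CALL", branch_addr=96)      # subroutine call, SBR <- 65
assert (car, sbr) == (96, 65)
```

An unconditional branch is simply a conditional branch whose status input is tied to 1, which is why the model needs only one BRANCH case.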

Design of Control Unit:
After obtaining the microoperations we have to execute them, but first we need to decode them.

Fig: Decoding of microoperation fields.
Because each field represents 8 microoperations with 3 bits, and we have 3 such fields, we decode the microoperation field bits with three 3 x 8 decoders. After decoding, the microoperations must be routed to particular circuits: data-manipulation microoperations such as AND, ADD, SUB, and so on are given to the ALU, and

[Figure: Decoding of microoperation fields — the three 3-bit fields F1, F2, and F3 each feed a 3 x 8 decoder; decoded lines such as ADD and DRTAC drive the arithmetic logic and shift unit, which takes data from AC and DR and loads its result into AC; the transfer lines DRTAR and PCTAR drive a multiplexer (select 0/1) that chooses DR(0-10) or PC as the source loaded into AR; the clock synchronizes the transfers.]


the corresponding results are moved to AC. The ALU is provided data from AC and DR. For data-transfer microoperations such as PCTAR or DRTAR we simply need to transfer the values. Because we have two possible sources for a transfer into AR, we use a 2 x 1 MUX to choose one, with its select line attached to the DRTAR microoperation signal: if DRTAR is high, the MUX chooses DR as the source for AR; otherwise PC's value is moved to AR. The corresponding data movement takes place through the load line — if either signal is high, the value is loaded into AR.

    The clock signal is provided for the synchronization of microoperations.


Lecture 14:

Fetch and decode cycle
Control Unit

    Fetch and Decode

T0: AR ← PC                                    (S2S1S0 = 010, T0 = 1)
T1: IR ← M[AR], PC ← PC + 1                    (S2S1S0 = 111, T1 = 1)
T2: D0, ..., D7 ← Decode IR(12-14), AR ← IR(0-11), I ← IR(15)
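The T0-T2 transfers can be sketched as a short simulation, assuming the basic computer's 16-bit instruction format (I in bit 15, opcode in bits 12-14, address in bits 0-11); the memory contents in the example are illustrative.

```python
# Sketch of the fetch and decode phase T0-T2 for the basic computer.
# 16-bit instruction word = I (bit 15) | opcode (bits 12-14) | address (bits 0-11).
memory = {0x010: 0x8123}             # illustrative program word at address 0x010
pc, ar, ir = 0x010, 0, 0

ar = pc                              # T0: AR <- PC
ir = memory[ar]                      # T1: IR <- M[AR]
pc = pc + 1                          #     PC <- PC + 1
i_bit  = (ir >> 15) & 0x1            # T2: I <- IR(15)
opcode = (ir >> 12) & 0x7            #     decode IR(12-14) into one of D0..D7
ar     = ir & 0x0FFF                 #     AR <- IR(0-11)

assert (i_bit, opcode, ar, pc) == (1, 0, 0x123, 0x011)
```

Note that AR is reused: at T0 it holds the instruction's address, and by the end of T2 it already holds the operand address taken from IR(0-11).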

[Figure: Register transfers for the fetch phase — the common bus (select lines S2, S1, S0) connects the memory unit (bus point 7), AR (bus point 1), PC (bus point 2), and IR (bus point 5); at T0 the bus carries PC and AR's LD input is enabled; at T1 the memory Read signal is asserted, IR's LD input and PC's INR input are enabled; the clock synchronizes the transfers.]


    Control Unit

The control unit (CU) of a processor translates machine instructions into the control signals for the microoperations that implement them.

Control units are implemented in one of two ways:
Hardwired Control
- The CU is made up of sequential and combinational circuits that generate the control signals.
Microprogrammed Control
- A control memory on the processor contains microprograms that activate the necessary control signals.
We will consider a hardwired implementation of the control unit for the Basic Computer.



    Lecture 15:

Memory hierarchy and its organization
Need of memory hierarchy
Locality of reference principle

In the last units we studied the various instructions, data, and registers associated with our computer organization. Let's move on to the microarchitecture of the computer, in which an important part is the memory. Let's study what memory is and what types of memory are available.

The memory unit is an essential component of a computer, used for storing programs and data. We use main memory for running programs, plus additional capacity for storage. We have various levels of memory units organized in a memory hierarchy.

    MEMORY HIERARCHY

The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.

    The various components are:

Main memory: The memory unit that communicates directly with the CPU. The programs and data currently needed by the processor reside in main memory.

Auxiliary memory: Made up of devices that provide backup storage. Examples: magnetic tapes, magnetic disks, etc.

Cache memory: The memory that lies between main memory and the CPU.

[Figure: Memory hierarchy — magnetic tapes and magnetic disks (auxiliary memory) connect through the I/O processor to main memory; cache memory sits between main memory and the CPU.]


Fig: Memory Hierarchy

In this hierarchy, magnetic tapes are at the lowest level, which means they are very slow and very cheap. Moving to the upper levels, such as main memory, we get increased speed but at an increased cost per bit.

Thus we can conclude that as we go toward the upper levels:
- Price increases
- Speed increases
- Cost per bit increases
- Access time decreases
- Size decreases

Many operating systems are designed to enable the CPU to process a number of independent programs concurrently. This concept is called multiprogramming. It is made possible by two programs residing in different parts of the memory hierarchy at the same time — for example, one using the CPU while another performs an I/O transfer.

The locality of reference, also known as the locality principle, is the phenomenon that the collection of data locations referenced in a short period of time in a running computer often consists of relatively well-predictable clusters.

Analysis of a large number of typical programs has shown that memory references in any given interval of time tend to be confined to a few localized areas of memory. This phenomenon is known as locality of reference.

[Figure: Memory hierarchy pyramid — from top (fastest, smallest) to bottom (slowest, largest): register, cache, main memory, magnetic disk, magnetic tape.]


Important special cases of locality are temporal, spatial, equidistant, and branch locality.

Temporal locality: if at one point in time a particular memory location is referenced, then it is likely that the same location will be referenced again in the near future. There is temporal proximity between adjacent references to the same memory location. In this case it is common to store a copy of the referenced data in special memory storage that can be accessed faster. Temporal locality is a very special case of spatial locality, namely when the prospective location is identical to the present location.

Spatial locality: if a particular memory location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future. There is spatial proximity between memory locations referenced at almost the same time. In this case it is common to try to estimate how big a neighbourhood around the current reference is worth preparing for faster access.

Equidistant locality: halfway between spatial locality and branch locality. Consider a loop accessing locations in an equidistant pattern, i.e. the path in the spatial-temporal coordinate space is a dotted line. In this case, a simple linear function can predict which location will be accessed in the near future.

Branch locality: there are only a few possible alternatives for the prospective part of the path in the spatial-temporal coordinate space. This is the case when an instruction loop has a simple structure, or when the possible outcomes of a small system of conditional branch instructions are restricted to a small set of possibilities. Branch locality is typically not a spatial locality, since the few possibilities can be located far away from each other.

Sequential locality: in a typical program the execution of instructions follows a sequential order unless branch instructions create out-of-order execution. This also involves spatial locality, since sequential instructions are stored near each other.
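Spatial locality can be demonstrated with a toy direct-mapped cache: a sequential scan reuses each fetched block, while a large-stride scan misses on every access. The cache parameters below are illustrative, not those of any real machine.

```python
# Toy demonstration of spatial locality: a direct-mapped cache with 8-word
# blocks gets far more hits on a sequential scan than on a large-stride scan.
BLOCK = 8          # words per cache block
LINES = 16         # number of cache lines

def hit_rate(addresses):
    tags = [None] * LINES
    hits = 0
    for addr in addresses:
        block = addr // BLOCK          # which memory block this word belongs to
        line = block % LINES           # direct-mapped: block determines the line
        if tags[line] == block:
            hits += 1                  # word already cached: spatial locality pays off
        else:
            tags[line] = block         # miss: fetch the whole block
    return hits / len(addresses)

sequential = list(range(1024))                                   # scan an array in order
strided    = [(i * BLOCK * LINES) % 1024 for i in range(1024)]   # jump cache-sized strides

assert hit_rate(sequential) > hit_rate(strided)
print(hit_rate(sequential), hit_rate(strided))   # -> 0.875 0.0
```

In the sequential scan every 8-word block costs one miss followed by seven hits (7/8 hit rate); the strided scan maps every reference to the same cache line with a different block, so nothing is ever reused.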

To benefit from the very frequently occurring temporal and spatial kinds of locality, most information storage systems are hierarchical. Equidistant locality is usually supported by the diverse non-trivial increment instructions of processors. For branch locality, contemporary processors have sophisticated branch predictors, and on the basis of this prediction the processor's memory manager tries to collect and preprocess the data of the plausible alternatives.

    Reasons for locality

There are several reasons for locality. These reasons are either goals to achieve or circumstances to accept, depending on the aspect. The reasons below are not disjoint; in fact, the list goes from the most general case to special cases.


Predictability: locality is merely one type of predictable behavior in computer systems. Luckily, many practical problems are decidable, and hence the corresponding program can behave predictably if it is well written.

Structure of the program: locality often occurs because of the way computer programs are created for handling decidable problems. Generally, related data is stored in nearby locations in storage. One common pattern in computing involves processing several items one at a time. This means that if much processing is done, a single item will be accessed more than once, leading to temporal locality of reference. Furthermore, moving to the next item implies that the next item will be read, hence spatial locality of reference, since memory locations are typically read in batches.

Linear data structures: locality often occurs because code contains loops that tend to reference arrays or other data structures by indices. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly. For example, the simple traversal of the elements of a one-dimensional array, from the base address to the highest element, exploits the sequential locality of the array in memory.[2] The more general equidistant locality occurs when the linear traversal covers a longer area of adjacent data structures of identical structure and size, and only the mutually corresponding elements of the structures are accessed rather than the whole structures. This is the case when a matrix is represented as a sequential array of rows and the requirement is to access a single column of the matrix.

    Use of locality in general

If, most of the time, a substantial portion of the references aggregate into clusters, and if the shape of this system of clusters can be well predicted, then it can be used for speed optimization. There are several ways to benefit from locality. The common optimization techniques are:

- to increase the locality of references. This is usually achieved on the software side.
- to exploit the locality of references. This is usually achieved on the hardware side. Temporal and spatial locality can be capitalized on by hierarchical storage hardware. Equidistant locality can be used by appropriately specialized processor instructions; this possibility is not only the responsibility of the hardware but of the software as well, whose structure must be suitable for compiling a binary program that calls the specialized instructions in question. Branch locality is a more elaborate possibility, hence more development effort is needed, but there is a much larger reserve for future exploration in this kind of locality than in all the remaining ones.


    Lecture 16:

Main Memory
o RAM chip organization
o ROM chip organization
Expansion of main memory
o Memory connections to CPU
o Memory address map

Till now we have discussed the memory interconnections and their comparisons. Let's take each in detail.

Main Memory: Main memory is a large (with respect to cache memory) and fast memory (with respect to magnetic tapes, disks, etc.) used to store the programs and data during computer operation. The I/O processor manages data transfers between auxiliary memory and main memory.

Main memory is available in two types: RAM and ROM. The principal technology used for main memory is based on semiconductor integrated circuits.
RAM: This is the part of main memory where we can both read and write data.

    Typical RAM chip:

CS1 and CS2 are used to enable or disable a particular RAM chip.

We have the corresponding truth table (its behavior is described below):

[Figure: Typical RAM chip — a 128 x 8 RAM with control inputs CS1 (chip select 1), CS2 (chip select 2), RD (read), WR (write), a 7-bit address input AD7, and a bidirectional 8-bit data bus.]


The RAM is enabled when CS1 is 1 and CS2 is 0; otherwise the operation is inhibited and the chip is in a high-impedance state. Even when the chip is selected, if both RD and WR are 0 no operation takes place and the RAM remains in the high-impedance state. The RD pin indicates that the RAM is being used for a read operation. Similarly, the WR pin indicates that a write operation is being performed. If both WR and RD are high, we choose the read operation; otherwise we would have data inconsistency.

Since we have a 128 x 8 RAM, we have 128 words, each 8 bits long.

Thus we need an 8-bit data bus to transfer the data, and it is a bidirectional 8-bit data bus.

To access 128 words we need 7 address bits, since 2^7 = 128.
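The chip-select and read/write behaviour described above can be sketched as a small model. This is a simplified sketch of the function table, not a hardware description; the priority given to RD when both RD and WR are high follows the text above.

```python
# Sketch of the 128 x 8 RAM chip behaviour: the chip responds only when
# CS1 = 1 and CS2 = 0; RD reads a word onto the data bus, WR writes one.
WORDS, WIDTH = 128, 8            # 2**7 = 128 words, so 7 address bits
assert WORDS == 2 ** 7

ram = [0] * WORDS

def ram_chip(cs1, cs2, rd, wr, addr, data_in=0):
    if not (cs1 == 1 and cs2 == 0):
        return None              # chip not selected: inhibit / high-impedance bus
    if rd:                       # read has priority when both RD and WR are set
        return ram[addr & 0x7F]  # 7-bit address mask
    if wr:
        ram[addr & 0x7F] = data_in & 0xFF   # 8-bit word
    return None                  # RD = WR = 0: no operation, high impedance

ram_chip(1, 0, rd=0, wr=1, addr=5, data_in=0xAB)   # write 0xAB to word 5
assert ram_chip(1, 0, rd=1, wr=0, addr=5) == 0xAB  # read it back
assert ram_chip(0, 1, rd=1, wr=0, addr=5) is None  # chip disabled
```

Returning `None` stands in for the high-impedance state: a deselected chip drives nothing onto the shared data bus, which is what lets several chips share one bus.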

    Integrated c