tijubaby.weebly.comtijubaby.weebly.com/uploads/4/3/6/9/4369784/module5.d…  · web viewpipelining...


Upload: voque

Post on 06-Feb-2018


TB/AMP/2010

The Pentium Microprocessors

The Pentium microprocessor signals an improvement to the architecture found in the 80486

microprocessor. The changes include an improved cache structure, a wider data bus width, a

faster numeric coprocessor, a dual integer processor, and branch prediction logic. The cache

has been reorganized to form two caches that are each 8K bytes in size, one for caching

data, and the other for instructions. The data bus width has been increased from 32 bits to 64

bits. The numeric coprocessor operates about five times faster than the 80486 numeric

coprocessor. A dual-integer processor often allows two instructions per clock. Finally, the

branch prediction logic allows programs that branch to execute more efficiently. Notice that

these changes are internal to the Pentium, which makes software upward-compatible from

earlier Intel 80X86 microprocessors. A later improvement to the Pentium was the addition

of the MMX instructions.

SALIENT FEATURES OF 80586 (PENTIUM)

A salient feature of the Pentium is its superscalar, superpipelined architecture. It has two integer pipelines, U and V, each of which is a 5-stage pipeline. This greatly increases the speed of integer arithmetic. Moreover, it has an on-chip floating-point unit, which has increased floating-point performance manifold compared with that of the 80386/486 processors.

Another feature of Pentium is that it contains two separate caches, viz. data cache and

instruction cache. In 80486 there was a single unified data/instruction cache.

The Intel CPU architectures up to the 80486 issue only one instruction to the execution unit per cycle, which limits the rate of decoding and execution. To raise processor performance beyond one instruction per cycle, computer architects employ the technique of multiple instruction issue (MII). A microprocessor that is capable of issuing more than one instruction per processor cycle is termed an MII microprocessor. Obviously, to execute more than one instruction in a cycle, the microprocessor must have more than one execution channel. Thus there are two problems, viz.

(a) how to issue multiple instructions, and (b) how to execute them concurrently. Keeping these two issues in view, MII architectures may be further divided into two classes: (i) Very Long Instruction Word (VLIW) architecture and (ii) superscalar architecture.

Fig. 5.1 Pentium CPU Architecture

In VLIW processors, the compiler reorders the sequential stream of code that is coming

from memory into a fixed size instruction group and issues them in parallel for execution.

On the other hand, in superscalar architecture the hardware decides which instructions are to

be issued concurrently at run time.

The Pentium CPU is based on superscalar architecture. The hardware, in case of the

superscalar architecture like Pentium, becomes enormously complex because in such a

processor multiple instructions have to be issued in each cycle to the execution unit.

Another important concept involved here is that of pipelining. Pipelining has been

implemented in all the processors from 8086 onwards, in a limited sense when instructions

have been prefetched and stored in a queue.

Superscalar Execution

The salient feature of Pentium is that it supports superscalar architecture. For execution of

multiple instructions concurrently, Pentium microprocessor issues two instructions in

parallel to the two independent integer pipelines known as U and V pipelines. Each of these

two pipelines has 5 stages, as shown in Fig. 5.2. These pipeline stages are similar to the one

in 80486 CPU. Functions of these pipelines have been presented in brief.

Fig. 5.2 Superscalar Organisation

1. In the prefetch stage of the pipeline, the CPU fetches the instructions from the

instruction cache, which stores the instructions to be executed. In this stage, the CPU

also aligns the codes appropriately. This is required since the instructions are of

variable length and the initial opcode bytes of each instruction should be

appropriately aligned. After the prefetch stage, there are two decode stages D1 and

D2.

2. In the D1 stage, the CPU decodes the instruction and generates a control word. For

simple RISC like instructions involving register data transfer or arithmetic and

logical operations, a single control word might be sufficient to start the execution. However, the x86 architecture also supports complex CISC instructions, which require microcoded control sequencing.

3. Thus a second decode stage D2 is required where the control word from D1 stage is

again decoded for final execution. Also the CPU generates addresses for data

memory references in this stage.

4. In the execution stage, known as E stage, the CPU either accesses the data cache for

data operands or executes the arithmetic/logic computations or floating-point

operations in the execution unit.

5. In the final stage of the five stage pipeline, which is the WB (writeback) stage, the

CPU updates the registers’ contents or the status in the flag register depending upon

the execution result.

Although, as we mentioned Pentium pipeline structure is somewhat similar to the 80486

pipeline structure, Pentium achieves a lot of speed-up by integrating additional hardware in

each pipeline stage. Thus while 80486 may take two clock cycles to decode some

instructions, Pentium takes only one.
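
The benefit of issuing to two pipelines can be illustrated with a rough cycle count. The sketch below is only an idealised model: it assumes no stalls and that every pair of instructions can be issued together, which real code does not always allow. The function name `ideal_cycles` and the stage labels are chosen here for illustration.

```python
import math

# The five stages described above: prefetch, two decodes, execute, writeback.
STAGES = ["PF", "D1", "D2", "E", "WB"]

def ideal_cycles(n_instructions, pipes=1):
    """Clock cycles to complete n instructions on an ideal, stall-free pipeline.

    pipes=1 models a single pipeline; pipes=2 models the U and V pipelines
    under the (optimistic) assumption that every pair issues together.
    """
    if n_instructions == 0:
        return 0
    issue_slots = math.ceil(n_instructions / pipes)
    # The first issue slot needs len(STAGES) cycles; each later slot adds one.
    return len(STAGES) + issue_slots - 1

print(ideal_cycles(10, pipes=1))  # single pipeline
print(ideal_cycles(10, pipes=2))  # dual issue, ideal pairing
```

For long instruction streams the ratio of the two counts approaches two, which is the upper bound for a dual-issue design.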

Separate Code and Data Cache

Unlike the 80486 microprocessor's unified code/data cache of 8K bytes, the Pentium has two separate 8K-byte caches for code and data. From the fundamental principles of cache operation, one may observe that a unified cache, as in the 80486, will generally have a higher hit ratio than two separate caches. Why, then, has the Pentium opted for separate caches? The answer probably lies in the fact that the superscalar organisation demands more bandwidth than a unified cache could provide. Moreover, separate caches serve branch prediction more effectively.

The Memory System

The memory system for the Pentium microprocessor is 4G bytes in size, just as in the

80386DX and 80486 microprocessors. The difference lies in the width of the memory data

bus. The Pentium uses a 64-bit data bus to address memory organized in eight banks that

each contains 512Mbytes of data. Figure 5.3 shows the organization of the Pentium physical

memory system.

The Pentium memory system is divided into eight banks that each stores a byte of data with

a parity bit. The Pentium, like the 80486, employs internal parity generation and checking

logic for the memory system’s data bus information. (Note that most Pentium systems do

not use parity checks, but it is available.) The 64-bit wide memory is important to double-

precision floating-point data. Recall that a double-precision floating-point number is 64 bits

wide. Because of the change to a 64-bit wide data bus, the Pentium is able to retrieve

floating-point data with one read cycle, instead of two as in the 80486. This causes the

Pentium to function at a higher throughput than an 80486. As with earlier 32-bit Intel

microprocessors, the memory system is numbered in bytes from byte 00000000H to byte

FFFFFFFFH.

Fig. 5.3. The 8-byte wide memory banks of the Pentium microprocessor.

Memory selection is accomplished with the bank enable signals (BE7 through BE0). These

separate memory banks allow the Pentium to access any single byte, word, doubleword, or

quadword with one memory transfer cycle. As with earlier memory selection logic, we often

generate eight separate write strobes for writing to the memory system.
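
As an illustration of bank selection, the following sketch computes which of the eight banks take part in a transfer. It is a simplified model, not a description of the actual bus logic: a 1 bit in the result means "bank selected" (the polarity of the real pins is not modelled), and the function name `byte_enables` is hypothetical.

```python
def byte_enables(address, size):
    """Return an 8-bit mask of the banks used by one transfer (bit n = bank n).

    Raises ValueError when the access would spill past the 8-byte-wide
    memory row and therefore could not complete in a single transfer.
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4 or 8 bytes")
    first_bank = address & 7            # low three address bits pick the first bank
    if first_bank + size > 8:
        raise ValueError("access crosses an 8-byte boundary")
    return ((1 << size) - 1) << first_bank

print(bin(byte_enables(0x1000, 8)))  # quadword: all eight banks
print(bin(byte_enables(0x1004, 4)))  # doubleword in the upper half
print(bin(byte_enables(0x1003, 1)))  # single byte in bank 3
```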

A new feature added to the Pentium is its capability to check and generate parity for the

address bus (A31 through A5) during certain operations. The AP pin provides the system with

parity information and the APCHK indicates a bad parity check for the address bus. The

Pentium takes no action when an address parity error is detected. The error must be assessed

by the system and the system must take appropriate action (an interrupt), if so desired.

The Pentium can function with a 32-bit wide memory system by using a set of bi-directional buffers as multiplexers to convert its 64-bit data bus into a 32-bit data bus. Care must be taken when using this arrangement because

software could access a doubleword that crosses the boundary between the lower and upper

halves of the data bus. All doublewords must be stored at doubleword boundaries. Note that

a doubleword boundary is an address that is divisible by 4.

Input/output System

The input/output system of the Pentium is completely compatible with earlier Intel

microprocessors. The I/O port number appears on address lines A15 through A3, with the bank

enable signals used to select the actual memory banks used for the I/O transfer.

Beginning with the 80386 microprocessor, I/O privilege information is added to the TSS

segment when the Pentium is operated in the protected mode. This allows I/O ports to be

selectively inhibited. If a blocked I/O location is accessed, the Pentium generates a type 13

interrupt to signal an I/O privilege violation.

Special Pentium Registers

The Pentium is essentially the same microprocessor as the 80386 and 80486, except that

some additional features and changes to the control register set have occurred.

Control Registers

Figure 5.4 shows the control register structure for the Pentium microprocessor. Note that a

new control register CR4 has been added to the control register array.

Fig. 5.4. The structure of the Pentium control registers.

PG Selects page table translation of linear addresses into physical addresses

when PG = 1. Page table translation allows any linear address to be assigned

any physical memory location.

CD Cache disable controls the internal cache. If CD = 1, the cache will not fill

with new data for cache misses, but it will continue to function for cache hits.

If CD = 0, misses will cause the cache to fill with new data.

NW Not write-through selects the mode of operation for the data cache. If NW =

1, the data cache is inhibited from cache write-through.

AM Alignment mask enables alignment checking when set. Note that alignment

checking only occurs for protected mode operation when the user is at

privilege level 3.

WP Write protect protects read-only user level pages against supervisor level write operations. When WP = 1, the supervisor cannot write to read-only user level pages; when WP = 0, it can.

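NE Numeric error selects how numeric coprocessor errors are reported. If NE = 1, an unmasked error is reported internally as a type 16 (floating-point error) exception. If NE = 0, errors are reported in the PC-compatible manner through the FERR pin and an external interrupt, and they are ignored while the IGNNE input is active.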

ET Selects the 80287 coprocessor when ET =0 or the 80387 coprocessor when

ET=1. This bit was installed because there was no 80387 available when the

80386 first appeared. In most systems, ET is set to indicate that an 80387 is

present in the system.

TS Indicates that the 80386 has switched tasks (in protected mode, changing the

contents of TR places a 1 into TS). If TS = 1, a numeric coprocessor

instruction causes a type 7 (coprocessor not available) interrupt.

EM Is set to cause a type 7 interrupt for each ESC instruction. (ESCape

instructions are used to encode instructions for the 80387 coprocessor.) We

often use this interrupt to emulate, with software, the function of the

coprocessor. Emulation reduces the system cost, but it often requires at least

100 times longer to execute the emulated coprocessor instructions.

MP Is set to indicate that the arithmetic coprocessor is present in the system.

PE Is set to select the protected mode of operation for the 80386. It may also be

cleared to re-enter the real mode. In the 80286, this bit could only be set: the 80286 could not return to real mode without a hardware reset, which precluded its use in most systems that use protected mode.

VME Virtual mode extension enables support for the virtual interrupt flag in virtual 8086 mode. If VME = 0, virtual interrupt support is disabled.

PVI Protected mode virtual interrupt enables support for the virtual interrupt flag

in protected mode.

TSD Time stamp disable controls the RDTSC instruction.

DE Debugging extension enables I/O breakpoint debugging extensions when set.

PSE Page size extension enables 4M-byte memory pages when set.

MCE Machine check enable enables the machine checking interrupt.

The Pentium contains new features that are controlled by CR4 and a few bits in CR0.
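
A register value can be decoded into the bit names listed above with a small lookup table. The sketch below is illustrative only: Figure 5.4 is not reproduced here, so the bit positions are taken from Intel's published CR0 and CR4 layouts and should be treated as an assumption, and the helper name `decode` is hypothetical.

```python
# Bit positions (assumed from Intel's published layout, not from Fig. 5.4).
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
            "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}
CR4_BITS = {"VME": 0, "PVI": 1, "TSD": 2, "DE": 3, "PSE": 4, "MCE": 6}

def decode(value, layout):
    """Return the names of the bits in `layout` that are set in `value`."""
    return [name for name, bit in layout.items() if (value >> bit) & 1]

# Sample values chosen for illustration: protected mode with paging,
# and 4M-byte pages enabled.
print(decode(0x80000011, CR0_BITS))
print(decode(0x00000010, CR4_BITS))
```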

EFLAG Register

The extended flag (EFLAG) register has been changed in the Pentium microprocessor.

Figure 5.5 pictures the contents of the EFLAG register. Four new flag bits have been added

to this register to control or indicate conditions about some of the new features in the

Pentium.

Fig. 5.5. The structure of the Pentium EFLAG register.

Following is a list of the four new flags and the function of each:

ID The identification flag is used to test for the CPUID instruction. If a program can set

and clear the ID flag, the processor supports the CPUID instruction.

VIP Virtual interrupt pending indicates that a virtual interrupt is pending.

VIF Virtual interrupt flag is the image of the interrupt flag IF and is used with VIP.

AC Alignment check enables alignment checking when set together with the AM bit in control register 0.

VM Virtual Mode Flag If this flag is set, the 80386 enters the virtual 8086 mode within

the protected mode. This is to be set only when the 80386 is in protected mode. In

this mode, if any privileged instruction is executed an exception 13 is generated.

This bit can be set using the IRET instruction or any task switch operation only in

the protected mode.

RF Resume Flag This flag is used with the debug register break points. It is checked at

the starting of every instruction cycle and if it is set, any debug fault is ignored

during the instruction cycle. The RF is automatically reset after successful execution

of every instruction, except for the IRET and POPF instructions. Also, it is not

automatically cleared after the successful execution of JMP, CALL and INT

instructions causing a task switch. These instructions are used to set the RF to the

value specified by the memory data available at the stack.

NT Nested Task Flag

IOPL I/O privilege level
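
The flags listed above can be picked out of an EFLAG value as shown in the following sketch. Figure 5.5 is not reproduced here, so the bit positions are assumed from Intel's published register layout. The CPUID test is only simulated: `write_then_read` is a hypothetical callback standing in for "write EFLAG, then read it back", since a script cannot write the real register.

```python
# Bit positions assumed from Intel's published EFLAGS layout.
FLAG_BITS = {"NT": 14, "RF": 16, "VM": 17, "AC": 18,
             "VIF": 19, "VIP": 20, "ID": 21}
ID_MASK = 1 << FLAG_BITS["ID"]

def decode_eflags(eflags):
    """Return each single-bit flag plus the two-bit IOPL field (bits 12-13)."""
    flags = {name: (eflags >> bit) & 1 for name, bit in FLAG_BITS.items()}
    flags["IOPL"] = (eflags >> 12) & 3
    return flags

def cpuid_supported(write_then_read):
    """CPUID is supported if the ID flag can be both set and cleared."""
    can_set = (write_then_read(ID_MASK) & ID_MASK) != 0
    can_clear = (write_then_read(0) & ID_MASK) == 0
    return can_set and can_clear

print(decode_eflags(0x00203000))
print(cpuid_supported(lambda value: value))   # ID bit is writable
print(cpuid_supported(lambda value: 0))       # ID bit stuck at 0
```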

Built-In Self-Test (BIST)

The built-in self-test (BIST) is accessed on power-up by placing a logic 1 on INIT while the

RESET pin changes from 1 to 0. The BIST tests 70 percent of the internal structure of the

Pentium in approximately 150μs. Upon completion of the BIST, the Pentium reports the

outcome in register EAX. If EAX = 0, the BIST passed and the Pentium is ready for

operation. If EAX contains any other value, the Pentium has malfunctioned and is faulty.

PENTIUM MEMORY MANAGEMENT

The memory-management unit within the Pentium is upward-compatible with the 80386

and 80486 microprocessors. Many of the features of these earlier microprocessors are

basically unchanged in the Pentium. The main change is in the paging unit and a new

system memory-management mode.

Paging Unit

The paging mechanism functions with 4K-byte memory pages or with a new extension

available to the Pentium with 4M byte-memory pages. As detailed in Chapters 1 and 17, the

size of the paging table structure can become large in a system that contains a large memory.

Recall that to fully repage 4G bytes of memory, the microprocessor requires slightly over

4M bytes of memory just for the page tables. In the Pentium, with the new 4M-byte paging

feature, this is dramatically reduced to just a single page directory and no page tables. The new 4M-byte page sizes

are selected by the PSE bit in control register 4.

The main difference between 4K paging and 4M paging is that in the 4M paging scheme

the linear address contains no page table index. See Figure 5.6 for the 4M paging system in

the Pentium microprocessor. Pay close attention to the way the linear address is used with

this scheme. Notice that the leftmost 10 bits of the linear address select an entry in the page

directory (just as with 4K pages). Unlike 4K pages, there are no page tables; instead, the

page directory addresses a 4M-byte memory page.

Fig. 5.6 The linear address 00200001H repaged to memory location

01000002H in 4Mbyte pages. Note that there are no page tables.
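
The 4M-byte scheme can be sketched in a few lines. The example assumes 32-bit (4-byte) directory and table entries with 1024 entries each, models the page directory as a plain dictionary, and uses a made-up directory entry rather than the values of Fig. 5.6. The names `translate_4m` and `paging_overhead_4k` are hypothetical.

```python
ENTRY_BYTES = 4      # each directory or table entry is 32 bits
ENTRIES = 1024       # entries per page directory or page table

def paging_overhead_4k():
    """Bytes of tables needed to map all 4G bytes with 4K pages."""
    table_bytes = ENTRIES * ENTRY_BYTES           # one 4K-byte table
    return ENTRIES * table_bytes + table_bytes    # 1024 page tables + directory

def translate_4m(linear, page_directory):
    """Translate a linear address using 4M-byte pages.

    page_directory maps a 10-bit index to the 4M-aligned physical base.
    """
    index = linear >> 22            # leftmost 10 bits select the directory entry
    offset = linear & 0x3FFFFF      # remaining 22 bits are the offset in the page
    return page_directory[index] + offset

directory = {3: 0x01000000}         # hypothetical entry, for illustration only
print(hex(translate_4m(0x00C00010, directory)))
print(paging_overhead_4k())         # slightly over 4M bytes
```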

Memory-Management Mode

The system memory-management mode (SMM) is on the same level as protected mode, real

mode, and virtual mode, but it is provided to function as a manager. The SMM is not

intended to be used as an application or a system-level feature. It is intended for high-level

system functions such as power management and security, which most Pentiums use during

operation.

Access to the SMM is accomplished via a new external hardware interrupt applied to the SMI pin on the Pentium. When the SMM interrupt is activated, the processor begins executing system-level software in an area of memory called the system management RAM, or SMMRAM, which also holds the SMM state dump record. The SMI interrupt disables all other interrupts that are normally handled by user applications and the operating system. A return from the SMM interrupt is accomplished with a new instruction, RSM, which returns from the memory-management mode interrupt to the interrupted program at the point of the interruption.

The SMM interrupt calls the software, initially stored at memory location 38000H, using

CS=3000H and EIP = 8000H. This initial state can be changed using a jump to any location

within the first 1M byte of memory. An environment similar to real-mode memory

addressing is entered by the management mode interrupt, but it is different because, instead

of being able to address the first 1M of memory, SMM mode allows the Pentium to treat the

memory system as a flat, 4G-byte system.

In addition to executing software that begins at location 38000H, the SMM interrupt also

stores the state of the Pentium in what is called a dump record. The dump record is stored at

memory locations 3FFA8H through 3FFFFH, with an area at locations 3FE00H through

3FEF7H that is reserved by Intel. The dump record allows a Pentium-based system to enter

a sleep mode and reactivate at the point of program interruption. This requires that the

SMMRAM be powered during the sleep period. Many laptop computers have a separate

battery to power the SMMRAM for many hours during sleep mode.
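
The addresses quoted above can be checked with simple arithmetic. The sketch assumes that the entry point is formed by the real-mode rule (segment shifted left four bits plus offset), which matches the 38000H figure given in the text. The helper names are hypothetical.

```python
def entry_address(cs, eip):
    """Real-mode style address: segment * 16 + offset."""
    return (cs << 4) + eip

def span(first, last):
    """Number of bytes in the inclusive range first..last."""
    return last - first + 1

print(hex(entry_address(0x3000, 0x8000)))   # SMM entry point
print(span(0x3FFA8, 0x3FFFF))               # bytes in the dump record
print(span(0x3FE00, 0x3FEF7))               # bytes in the reserved area
```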

The halt auto restart and I/O trap restart data are used when the SMM mode is exited by the RSM instruction. These data allow the RSM instruction to return to the halt state or to the interrupted I/O instruction. If neither a halt nor an I/O operation is in effect upon entering the mode, the RSM instruction reloads the state of the machine from the state dump and returns to the point of interruption.

The SMM mode can be used by the system before the normal operating system is placed in

the memory and executed. It can also periodically be used to manage the system, provided

that normal software doesn't exist at locations 38000H through 3FFFFH. If the system relocates the SMMRAM before booting the normal operating system, it becomes available for use in

addition to the normal system.

The base address of the SMMRAM is changed by modifying the value in the state dump base address register (locations 3FEF8H through 3FEFBH) after the first memory-management mode interrupt. When the first RSM instruction is executed, returning control to the interrupted system, the new value from these locations changes the base address of the SMM interrupt for all future uses. For example, if the state dump base address is changed to 000E8000H, all subsequent SMM interrupts use locations E8000H through EFFFFH for the Pentium state dump. These locations are compatible with DOS and Windows.

PENTIUM II

Pentium II is also a 32-bit processor with 64-bit data bus and 36-bit address bus to address

up to 64GB of physical memory space. It is actually a Pentium Pro processor with on-chip

MMX (Multi Media Extension). It is available with maximum internal ratings of 233 MHz

to 450 MHz.

The features of the Pentium II processor are:

(i) Supports the INTEL architecture with dynamic execution.

(ii) Integrated primary (L1) 16K-byte instruction cache and 16K-byte write-back data cache.

(iii) Integrated 256K-byte second-level (L2) cache.

(iv) Fully compatible with previous microprocessors.

(v) Supports MMX technology.

(vi) Quick start and Deep sleep modes provide extremely low power dissipation.

(vii) Low-power GTL+ processor system bus interface (GTL: Gunning Transceiver Logic).

(viii) Integrated math co-processor.

(ix) Integrated thermal diode for measuring processor temperature.

Pentium II Software Changes

The Pentium II microprocessor core is a Pentium Pro. This means that the Pentium II and

the Pentium Pro are essentially the same device for software. This section lists the changes

to the CPUID instruction; and the SYSENTER, SYSEXIT, FXSAVE, and FXRSTOR

instructions (the only modifications to the software).

CPUID Instruction

Table 5.1 lists the values passed between the Pentium II and the CPUID instruction. These

are changed from earlier versions of the Pentium microprocessor.

The version information is returned in EAX after executing the CPUID instruction with a 1 in EAX. The family ID is returned in bits 8 to 11, the model ID in bits 4 to 7, and the stepping ID in bits 0 to 3. For the Pentium II, the family ID is 6 and the model number is 3 (model 5 for later versions). The stepping number refers to an update number: the higher the stepping number, the newer the version.
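
The bit fields described above can be extracted from a version value with shifts and masks. The sketch below uses a made-up sample value purely to exercise the field positions; it does not read the real CPUID instruction, and the function name `decode_version` is hypothetical.

```python
def decode_version(eax):
    """Split a CPUID version value into its three 4-bit fields."""
    return {
        "stepping": eax & 0xF,         # bits 0 to 3
        "model": (eax >> 4) & 0xF,     # bits 4 to 7
        "family": (eax >> 8) & 0xF,    # bits 8 to 11
    }

sample = 0x0652                        # illustrative value, not read from hardware
print(decode_version(sample))
```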

TABLE 5.1 CPUID instruction.

The features are indicated in the EDX register after executing the CPUID instruction with a 1 in EAX. The new features returned in EDX for the Pentium II are as follows. Bit position 11 indicates whether the microprocessor supports the two new fast call instructions, SYSENTER and SYSEXIT. Bit position 23 indicates whether the microprocessor supports the MMX instruction set. Bit 16 indicates whether the microprocessor supports the page attribute table, or PAT. Bit 17 indicates whether the microprocessor supports the page size extension found with the Pentium Pro and Pentium II microprocessors. The page size extension allows memory above 4G through 64G to be addressed. Finally, bit 24 indicates whether the fast floating-point save and restore instructions are implemented. The remaining bits are identical to earlier versions of the microprocessor and are not described.
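
Testing these feature bits amounts to masking the EDX value. In the sketch below the short labels (`SEP`, `PAT`, `PSE36`, `MMX`, `FXSR`) are names chosen for illustration, the bit numbers are those given in the text, and the sample EDX value is made up rather than read from a processor.

```python
# Bit numbers as listed in the text; the labels are illustrative.
FEATURE_BITS = {
    "SEP": 11,     # SYSENTER and SYSEXIT
    "PAT": 16,     # page attribute table
    "PSE36": 17,   # page size extension above 4G
    "MMX": 23,     # MMX instruction set
    "FXSR": 24,    # FXSAVE and FXRSTOR
}

def features(edx):
    """Return the set of feature labels whose bit is set in EDX."""
    return {name for name, bit in FEATURE_BITS.items() if (edx >> bit) & 1}

sample_edx = (1 << 11) | (1 << 23)     # illustrative value only
print(sorted(features(sample_edx)))
```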

SYSENTER and SYSEXIT Instructions

The SYSENTER and SYSEXIT instructions use the fast call facility introduced in the Pentium II microprocessor. Please note that SYSENTER always transfers control to ring 0 (privilege level 0) in protected mode, and that SYSEXIT can be executed only in ring 0. Windows operates in ring 0, but does not allow applications access to ring 0. These new instructions are meant for operating system software.

The SYSENTER instruction uses some of the model-specific registers to store CS, EIP, and

ESP to execute a fast call to a procedure defined by the model-specific register. The fast call

is different from a regular call because it does not push the return address onto the stack as a regular call does. Table 5.2 illustrates the model-specific registers used with SYSENTER and

SYSEXIT. Note that the model-specific registers are read with the RDMSR instruction and

written with the WRMSR instruction.

TABLE 5.2 The model-specific registers used with SYSENTER and SYSEXIT.

To use the RDMSR or WRMSR instructions, place the register number in the ECX register.

If the WRMSR is used, place the new data for the register in EDX:EAX. For the SYSENTER instruction, you need use only the EAX register, but place a zero into EDX. If the RDMSR instruction is used, the data are returned in the EDX:EAX register pair.
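
The EDX:EAX pairing simply splits a 64-bit register value into two 32-bit halves, with EDX holding the upper half. The sketch below shows the split and the recombination; it only models the arithmetic and does not execute RDMSR or WRMSR, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def split_msr(value):
    """Split a 64-bit value into (EDX, EAX) as WRMSR expects."""
    return (value >> 32) & MASK32, value & MASK32

def join_msr(edx, eax):
    """Combine the halves RDMSR returns into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

edx, eax = split_msr(0x12345678)    # a 32-bit value leaves EDX at zero
print(hex(edx), hex(eax))
print(hex(join_msr(1, 2)))
```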

To use the SYSENTER instruction, first load the SYSENTER_CS and SYSENTER_EIP model-specific registers with the address of the system entrance point. This would normally be the address of the operating system, such as Windows or Windows NT. Note that this instruction is meant as a system instruction to access code or software in ring 0. The stack segment register is loaded with the value placed into SYSENTER_CS plus 8. In other words, the selector pair addressed by the SYSENTER_CS selector value is loaded into CS and SS. The value of the stack offset is placed in SYSENTER_ESP, from which ESP is loaded.

The SYSEXIT instruction loads CS and SS with the selector pair addressed by

SYSENTER_CS plus 16 and 24. Table 5.3 illustrates the selectors from the global selector

table, as addressed by SYSENTER_CS. In addition to the code and stack segment selector

and the memory segments that they represent, the SYSEXIT instruction passes the value in

EDX to the EIP register and the value in ECX to the ESP register. The SYSEXIT instruction

returns control back to application ring 3. As mentioned, these instructions appear to have

been designed for quick entrance and return from the Windows or Windows NT operating

systems on the personal computer.

TABLE 5.3 Selectors addressed by the SYSENTER_CS select value.

To use SYSENTER and SYSEXIT, the SYSENTER instruction must pass the return address to the system. This is accomplished by loading the EDX register with the return offset and by placing the segment address in the global descriptor table at location SYSENTER_CS+16. The stack segment is transferred by loading the stack segment selector into SYSENTER_CS+24 and the ESP into ECX.
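
The selector arithmetic described above can be summarised in a short sketch. It models only the offsets of 8, 16 and 24 from the SYSENTER_CS value and ignores the privilege-level bits that the processor places in the selectors. The sample SYSENTER_CS value and the function names are assumptions made for illustration.

```python
def sysenter_selectors(sysenter_cs):
    """Selectors loaded on entry to ring 0."""
    return {"CS": sysenter_cs, "SS": sysenter_cs + 8}

def sysexit_selectors(sysenter_cs):
    """Selectors loaded on return to the application."""
    return {"CS": sysenter_cs + 16, "SS": sysenter_cs + 24}

base = 0x08                          # hypothetical SYSENTER_CS value
print(sysenter_selectors(base))
print(sysexit_selectors(base))
```

Because all four selectors are derived from one register, the operating system must place the four descriptors in consecutive entries of the global descriptor table.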

FXSAVE and FXRSTOR Instructions

The last two new instructions added to the Pentium II microprocessor are the FXSAVE and

FXRSTOR instructions, which are almost identical to the FSAVE and FRSTOR

instructions. The main difference is that the FXSAVE instruction is designed to properly

store the state of the MMX machine, while the FSAVE properly stores the state of the

floating-point coprocessor. The FSAVE instruction stores the entire tag field, while the

FXSAVE instruction only stores the valid bits of the tag field. The valid tag field is used to

reconstruct the restore tag field when the FXRSTOR instruction executes. This means that if

the MMX state of the machine is saved, use the FXSAVE instruction; if the floating-point

state of the machine is saved, use the FSAVE instruction. For new applications, it is

recommended that the FXSAVE and FXRSTOR instructions should be used to save the

MMX state and floating-point state of the machine. Do not use the FSAVE and FRSTOR

instructions in new applications.

THE PENTIUM III

The Pentium III microprocessor is an improved version of the Pentium II microprocessor.
Even though it is newer than the Pentium II, it is still based on the Pentium Pro architecture.
There are two versions of the Pentium III. One version is available with a non-blocking
512K-byte cache and is packaged in the Slot 1 cartridge, and the other version is available
with a 256K-byte advanced transfer cache and is packaged in an integrated circuit. The cache
in the Slot 1 version runs at half the processor speed, and the cache in the integrated
version runs at the processor clock frequency. As shown in most benchmarks of cache
performance, increasing the cache size from 256K bytes to 512K bytes improves performance
by only a few percent.

The salient architectural features are:

1. The P-III CPU was developed using 0.25-micron technology and includes over 9.5
million transistors. Three versions, operating at 450 MHz, 500 MHz, and 550 MHz, are
commercially available.

2. P-III incorporates multiple branch prediction algorithms.

3. Seventy new instructions have been added to Pentium III. These instructions are useful in

advanced imaging, speech processing and multimedia applications.

4. Dual independent bus architecture increases bandwidth.

5. P-III employs dynamic execution technology, which has already been discussed.

6. A 512K-byte unified, non-blocking level 2 cache is used.

7. Eight 64-bit wide Intel MMX registers, along with a set of 57 instructions for
multimedia applications, are available.

Chip Sets

The chip set for the Pentium III is different from that of the Pentium II. The Pentium III
uses an Intel 810, 815, or 820 chip set. The 815 is most commonly found in newer systems
that use the Pentium III. A few chip sets from other vendors are available, but problems
with drivers for new peripherals, such as video cards, have been reported. An 840 chip set
was also developed for the Pentium III, but Intel does not make it available.

Bus

The Coppermine version of the Pentium III increases the bus speed to either 100 MHz or
133 MHz. The faster version allows transfers between the microprocessor and the memory at
higher speeds. Suppose that a 1-GHz microprocessor uses a 133-MHz memory bus. You
might think that a faster memory bus would improve performance. However, the
connections between the microprocessor and the memory preclude using a higher speed
for the memory. If it is decided to use a 200-MHz bus speed, we must recognize that a
wavelength at 200 MHz is 300,000,000/200,000,000 or 3/2 meter. An antenna is 1/4 of a
wavelength. At 200 MHz, an antenna is 14.8 inches. We do not want to radiate energy at
200 MHz, so we need to keep the printed circuit board connections shorter than 1/4
wavelength. In practice, we would keep the connections to no more than 1/10 of 1/4
wavelength. This means that the connections in a 200-MHz system should be no longer than
1.48 inches. This size would present the main board manufacturer with a problem when
placing the sockets for a 200-MHz memory system.
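
The wavelength arithmetic above can be reproduced with a short sketch. It uses the same rounded propagation speed of 300,000,000 meters per second as the text and assumes 39.37 inches per meter; the function names are illustrative.

```python
SPEED_M_PER_S = 300_000_000   # rounded speed of light, as used in the text
INCHES_PER_METER = 39.37


def quarter_wavelength_inches(bus_hz):
    """Length of a quarter-wavelength antenna at the given frequency."""
    wavelength_m = SPEED_M_PER_S / bus_hz
    return wavelength_m / 4 * INCHES_PER_METER


def max_connection_inches(bus_hz, fraction=0.1):
    """Practical limit: a fraction (here 1/10) of the quarter wavelength."""
    return quarter_wavelength_inches(bus_hz) * fraction


bus_hz = 200_000_000
print(f"quarter wavelength: {quarter_wavelength_inches(bus_hz):.1f} in")
print(f"max connection:     {max_connection_inches(bus_hz):.2f} in")
```

Calling the same functions with 133,000,000 shows how much more room the board designer has at the 133-MHz bus speed.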

It is possible to approach or even exceed a 200-MHz memory system if we develop a new
technology for interconnecting the microprocessor, chip set, and memory. At present, the
memory functions in bursts of four 64-bit numbers each time we read the main memory.
This burst of 32 bytes is read into the cache. The main memory requires three wait states
at 100 MHz to access the first 64-bit number (four bus clocks) and then zero wait states
for each of the three remaining 64-bit numbers (one bus clock each), for a total of seven
100-MHz bus clocks, or 70 ns. This means we are reading data at 70 ns / 32 = 2.1875 ns per
byte, which is a bus speed of about 457M bytes per second. This is slower than the clock
on a 1-GHz microprocessor, but because most programs are cyclic and the instructions are
stored in the internal cache, we can and often do approach the operating frequency of the
microprocessor.
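
The burst timing can be checked with a small calculation. This is only a sketch of the arithmetic in the paragraph above; the function and parameter names are illustrative, and the defaults mirror the figures given in the text.

```python
def burst_read_timing(bus_hz=100_000_000, wait_states=3,
                      transfers=4, bytes_per_transfer=8):
    """Return (bus clocks, total ns, ns per byte, M bytes per second)."""
    clock_ns = 1e9 / bus_hz
    # First transfer: one clock plus the wait states.
    # Each remaining transfer: one clock with zero wait states.
    clocks = (1 + wait_states) + (transfers - 1)
    total_ns = clocks * clock_ns
    total_bytes = transfers * bytes_per_transfer
    ns_per_byte = total_ns / total_bytes
    mbytes_per_s = 1000 / ns_per_byte
    return clocks, total_ns, ns_per_byte, mbytes_per_s


clocks, total_ns, ns_per_byte, rate = burst_read_timing()
print(clocks, total_ns, ns_per_byte, round(rate))
```

Changing wait_states or bus_hz shows how sensitive the sustained rate is to the latency of the first access in each burst.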

PENTIUM IV

The most recent version of the Pentium Pro architecture microprocessor is the Pentium 4
microprocessor from Intel. The Pentium 4 was released initially in November 2000 with a
speed of 1.3 GHz. It is currently available in speeds up to 2.0 GHz. Two packages are
available for this integrated microprocessor: the 423-pin PGA and the 478-pin FC-PGA2.
Both versions use 0.18-micron technology for fabrication. As with earlier versions of the
Pentium, the Pentium 4 uses a 100-MHz memory bus speed, but because it is quad pumped,
the bus speed can approach 400 MHz.

Memory Interface

The memory interface to the Pentium 4 typically uses the Intel 850 chipset. The 850

provides a dual-pipe memory bus to the microprocessor with each pipe interfaced to a 32-bit

wide section of the memory. The two pipes function together to comprise the 64-bit wide

data path to the microprocessor. Because of the dual pipe arrangement, the memory must be

populated with pairs of RDRAM memory devices operating at either 600 MHz or 800 MHz.

According to Intel, this arrangement provides a 300% increase in speed over a memory

populated with PC-100 memory.
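
As a rough plausibility check of these figures, the sketch below compares peak transfer rates. It assumes the 64-bit data path and the quad-pumped 100-MHz bus described above, and it models PC-100 memory as a 64-bit path with one transfer per 100-MHz clock. This is peak-rate arithmetic only, and it is not necessarily how Intel arrived at its figure; the function name is illustrative.

```python
def peak_mbytes_per_s(clock_mhz, transfers_per_clock, bus_width_bits):
    """Peak transfer rate in millions of bytes per second."""
    return clock_mhz * transfers_per_clock * bus_width_bits // 8


pentium4_bus = peak_mbytes_per_s(100, 4, 64)   # quad-pumped 100-MHz bus
pc100_memory = peak_mbytes_per_s(100, 1, 64)   # one transfer per clock

increase_percent = (pentium4_bus - pc100_memory) / pc100_memory * 100
print(pentium4_bus, pc100_memory, increase_percent)
```

Under these assumptions the peak rate is four times that of PC-100 memory, which corresponds to an increase of 300%.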

Hyper Pipelined Technology

The Pentium 4 incorporates a deeper pipelined architecture than prior versions of the

Pentium microprocessor. Not only does it queue instructions for execution, but it also

queues microinstructions for execution in a special cache for the microprocessor core. This

special microinstruction cache is 12K bytes deep. This technology excludes the execution

unit from the main cache path to the microinstruction stream to increase performance.

RISC Architecture

The complexity of the instructions supported by CISC processors kept increasing as
more and more sophisticated processors were designed and marketed. This resulted in an
increase in processor die size to accommodate the large microcode required by the complex
instructions. The larger die in turn meant more cost, since it consumes more silicon. Also,
as the chip size increases, the power consumption increases, resulting in more heating of
the chip. This in turn requires more cooling.

If we use a processor that supports a set of simpler instructions, which do not require
complex decoding, then the design of the processor becomes simpler, with an associated
reduction in cost and power consumption. The execution of these instructions also becomes
very fast.

As the name implies, the Reduced Instruction Set Computer, or RISC as it is popularly
known, is a type of architecture that utilizes a small, highly optimized set of instructions,
rather than the more specialized set of instructions often found in other types of
architectures. Typically, every instruction is executed in a single clock after it is fetched
and decoded, so these instructions execute very fast. In a CISC design, a lot of die space
is consumed by microcode, space that could otherwise be used for enhanced features.
Because a RISC design avoids this overhead, its die can be smaller, and it is thus possible
to produce more RISC processors per silicon wafer. This makes RISC processors smaller,
with lower energy consumption.

THE ADVANTAGES OF RISC

Implementing a processor with a simplified instruction set provides several advantages
over implementing a comparable CISC design. Some of these advantages are listed below.

(i) RISC instructions, being simple, can be hard-wired, while CISC architectures may
have to use microprogramming in order to implement complex instructions.

(ii) A set of simple instructions results in reduced complexity of the control unit and
the data path; as a consequence, the processor can work at a high clock frequency
and thus yields higher speed.

(iii) Because the simpler control unit occupies less chip area, several extra functional
units, such as memory management units or floating-point arithmetic units, can also
be placed on the same chip.

(iv) Smaller chips allow a semiconductor manufacturer to place more parts on a single
silicon wafer, which can lower the per-chip cost dramatically.

(v) High-level language compilers produce more efficient code for a RISC processor
than for its CISC counterpart, because they tend to use the smaller set of
instructions in a RISC computer.

(vi) Shorter design cycle: a new RISC processor can be designed and tested more
quickly, since RISC processors are simpler than corresponding CISC processors.

(vii) Application programmers who use the microprocessor's instructions will find
it easier to develop code with a smaller, optimized instruction set.

(viii) The loading and decoding of instructions in a RISC processor is simple and fast,
because the processor need not wait until the length of one instruction is known
before it starts decoding the following one. Decoding is simplified because the
opcode and address fields are located in the same position for all instructions.
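
Point (viii) can be illustrated with a minimal sketch of fixed-length decoding. The 32-bit instruction layout used here (a 6-bit opcode, two 5-bit register fields, and a 16-bit immediate) is a hypothetical format chosen for illustration, not the encoding of any particular processor, and the function names are likewise illustrative.

```python
import struct

INSTRUCTION_BYTES = 4   # every instruction has the same length


def split_instructions(code):
    """Find instruction boundaries without decoding anything."""
    return [code[i:i + INSTRUCTION_BYTES]
            for i in range(0, len(code), INSTRUCTION_BYTES)]


def decode(instruction):
    """Extract fields that sit at the same bit positions in every word."""
    (word,) = struct.unpack(">I", instruction)
    return {
        "opcode": word >> 26,            # bits 31..26
        "rd": (word >> 21) & 0x1F,       # bits 25..21
        "rs": (word >> 16) & 0x1F,       # bits 20..16
        "imm": word & 0xFFFF,            # bits 15..0
    }


# Two hypothetical instructions packed back to back.
code = struct.pack(">II", 0x20220064, 0x20430001)
for instruction in split_instructions(code):
    print(decode(instruction))
```

Because the boundaries are known in advance, the words could be decoded in parallel. With variable-length instructions, the start of the second instruction is not known until the first has been at least partly decoded.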

BASIC FEATURES OF RISC PROCESSORS

(i) Simple instruction set: In a RISC machine, the instruction set contains simple,
basic instructions, from which more complex instructions can be composed.
Instructions with low latency are therefore preferred.

(ii) Same length instruction: Each instruction is of the same length, so that it may be
fetched in a single operation. The traditional microprocessors from Intel or
Motorola support variable length instructions.

(iii) Single machine-cycle instructions: Most instructions complete in one machine
cycle, which allows the processor to handle several instructions at the same time.
RISC processors aim for a CPI (clocks per instruction) of one, which is due to the
optimization of each instruction on the CPU and the massive pipelining embedded
in a RISC processor.

(iv) Pipelining: Usually massive pipelining is embedded in a RISC processor.
Pipelining is the key to speeding up RISC machines.

(v) Very few addressing modes and formats: Unlike CISC processors, where the
number of addressing modes is very high, RISC processors have far fewer
addressing modes and support only a few instruction formats.

(vi) Large number of registers: The RISC design philosophy generally incorporates a
larger number of registers to avoid frequent interactions with memory.

(vii) Microcoding not required: Unlike in a CISC machine, in RISC architecture,
instruction microcoding is not required. This is because the instruction set is
simple, and simple instructions may be easily built into the hardware.

(viii) Load and Store architecture: The RISC architecture is primarily a Load and
Store architecture, implying that all memory accesses take place using Load
or Store type operations.
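
The link between pipelining and a CPI close to one, noted in features (iii) and (iv), can be sketched with the standard cycle count for an ideal pipeline: the first instruction needs one clock per stage, and every later instruction completes one clock after its predecessor. The five-stage depth and the absence of stalls are assumptions made for illustration.

```python
def pipelined_cycles(instructions, stages):
    """Clocks to finish a run of instructions on an ideal pipeline."""
    return stages + instructions - 1


def clocks_per_instruction(instructions, stages):
    """Average CPI over the whole run."""
    return pipelined_cycles(instructions, stages) / instructions


STAGES = 5   # hypothetical pipeline depth
for count in (1, 10, 1000):
    print(count, pipelined_cycles(count, STAGES),
          clocks_per_instruction(count, STAGES))
```

As the run gets longer, the average CPI falls toward one, even though each individual instruction still takes five clocks from fetch to completion.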
