AMD K-6 Processor Evaluation
Registers
AMD-K6 Registers
• General purpose registers
• Segment registers
• Floating point registers
• MMX registers
• EFLAGS register
• Control registers
• Task register
• Debug registers
• Test registers
• Descriptor/memory registers
• Model-specific registers (MSRs)-Model 6
General-Purpose Registers
• 8 32-bit general-purpose registers
• EAX
• EBX
• ECX
• EDX
• EDI
• ESI
• ESP
• EBP
Segment registers
• 6 16-bit segment registers
• Used as pointers to areas (segments) of memory
• CS
• DS
• ES
• FS
• GS
• SS
Floating-Point Registers
• 8 80-bit numeric floating point registers
• Help the floating-point execution unit
• Labeled FPR0–FPR7
MMX Registers
• 8 64-bit MMX registers
• Used by multimedia software
EFLAGS Register
• Provides for three different types of flags
– System flags
– Control flag
– Status flags
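As an illustration of the three flag types, the sketch below groups a few well-known x86 EFLAGS bits and decodes a raw value. The function name `decode_eflags` and the choice of which flags to list are assumptions made for this example; the list is not exhaustive.

```python
# Bit positions of some x86 EFLAGS flags, grouped by flag type.
# The selection is illustrative and not a complete list.
STATUS_FLAGS = {"CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "OF": 11}
CONTROL_FLAGS = {"DF": 10}
SYSTEM_FLAGS = {"TF": 8, "IF": 9, "NT": 14, "RF": 16, "VM": 17}


def decode_eflags(value):
    """Return the names of the flags set in `value`, grouped by type."""
    groups = {
        "status": STATUS_FLAGS,
        "control": CONTROL_FLAGS,
        "system": SYSTEM_FLAGS,
    }
    return {
        group: [name for name, bit in flags.items() if value & (1 << bit)]
        for group, flags in groups.items()
    }


# 0x246 has bits 1, 2, 6, and 9 set: PF and ZF (status) and IF (system).
print(decode_eflags(0x246))
```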
Control Registers
• 5 control registers
• Contain system control bits and pointers
Task Register
• Contains a pointer to the Task State Segment of the current task
Debug Registers
• 8 Debug registers
• Labeled DR0–DR7
Descriptors
• Define, protect, and isolate code segments, data segments, task state segments, and gates
Memory Management Registers
• The AMD-K6 processor controls segmented memory management with 4 registers:
– Global Descriptor Table Register
– Interrupt Descriptor Table Register
– Local Descriptor Table Register
– Task Register
Model-Specific Registers (MSR)
• 5 MSRs
– Machine Check Address Register (MCAR)
– Machine Check Type Register (MCTR)
– Test Register 12 (TR12)
– Time Stamp Counter (TSC)
– Write Handling Control Register (WHCR)
MCAR and MCTR
• Both are 64-bit
• The AMD-K6 processor does not support the generation of a machine check exception; the MCAR and MCTR registers are provided instead
Test Register 12
• Provides a way to disable the L1 caches
Time Stamp Counter (TSC)
• 64-bit
• The time stamp counter (TSC) MSR is incremented by the processor with each processor clock cycle
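Because the TSC advances once per processor clock, the difference between two readings can be converted to elapsed time. The sketch below shows that arithmetic; the function names and the 200 MHz clock are assumptions for illustration, and the readings are plain integers rather than values read from real hardware.

```python
TSC_MASK = (1 << 64) - 1  # the x86 time stamp counter is 64 bits wide


def elapsed_cycles(start, end):
    """Cycles between two TSC readings, tolerating one counter wraparound."""
    return (end - start) & TSC_MASK


def elapsed_seconds(start, end, clock_hz):
    """Convert a TSC difference to seconds for a given processor clock."""
    return elapsed_cycles(start, end) / clock_hz


# Hypothetical readings taken on an assumed 200 MHz part.
print(elapsed_seconds(1_000, 200_001_000, 200_000_000))  # 1.0
```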
Write Handling Control Register (WHCR)
• Contains three fields: WCDE bit, Write Allocate Enable Limit (WAELIM) field, and the Write Allocate Enable 15-to-16-Mbyte (WAE15M) bit
CPU SPEED
• Very fast under Windows NT 4.0
• The 32-bit performance is excellent
• Runs Windows 95 faster than the Intel Pentium MMX
• Good choice for a great gaming machine
• Good engine for running Microsoft Office, surfing the net, and checking email
• If Windows NT is the primary operating system, the AMD K6 should be considered as a low cost but good performing alternative to a Pentium Pro or Pentium II
CPU Type: RISC86
RISC86 Superscalar Microarchitecture
• RISC86 microarchitecture - Internally translates x86 instructions into RISC86 operations
– x86 instructions - 1 to 15 bytes
– RISC86 opcodes - simpler, fixed-length
• Superscalar operation - multiple decode, execution, and retirement
– Centralized Schedule Buffer / Instruction Control Unit
    • Buffers and manages up to 24 RISC86 operations at one time
    – Equates to 12 x86 instructions
– Multiple Decoders
    • Buffer can receive up to 4 RISC86 operations from decoders in 1 clock
– 7 Parallel Execution Units
    • Buffer can issue up to 6 RISC86 operations to execution units in 1 clock
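The buffer figures above (24 entries, up to 4 operations received and up to 6 issued per clock) can be explored with a toy occupancy model. This is only a sketch under simplifying assumptions: operations have no dependencies, issue happens before receive within a clock, and the `issue_limit` parameter is a hypothetical knob standing in for stalls. It does not model the real scheduler.

```python
BUFFER_CAPACITY = 24        # RISC86 operations held at one time
MAX_RECEIVE_PER_CLOCK = 4   # from the decoders
MAX_ISSUE_PER_CLOCK = 6     # to the execution units


def simulate(pending_ops, clocks, issue_limit=MAX_ISSUE_PER_CLOCK):
    """Return the buffer occupancy at the end of each clock."""
    occupancy = 0
    history = []
    for _ in range(clocks):
        # Issue first, freeing entries for newly decoded operations.
        occupancy -= min(occupancy, issue_limit, MAX_ISSUE_PER_CLOCK)
        received = min(pending_ops, MAX_RECEIVE_PER_CLOCK,
                       BUFFER_CAPACITY - occupancy)
        pending_ops -= received
        occupancy += received
        history.append(occupancy)
    return history


# With full issue width the buffer never backs up.
print(simulate(100, 5))                  # [4, 4, 4, 4, 4]
# If only 2 operations can issue per clock, it fills to capacity.
print(simulate(100, 12, issue_limit=2))
```

In this model the issue width exceeds the decode width, so the buffer only fills when something restricts issue, which is the case the second call imitates.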
x86 Instruction Categories (Short and Long Decodes)
• Short Decode
– Common x86 instructions up to 7 bytes in length
– Produce 1 RISC86 operation
– 2 processed per clock
– Processed completely within the decoders
• Long Decode
– More complex and somewhat common x86 instructions up to 11 bytes in length
– Produce up to 4 RISC86 operations
– 1 processed per clock
– Processed completely within the decoders
x86 Instruction Categories (Vector Decode)
• Vector Decode
– Complex x86 instructions requiring long sequences of RISC86 operations
– 1 processed per clock
– Decoders generate an initial set of 4 RISC86 operations
    • Decode is completed by fetching a sequence of additional operations from an on-chip ROM at a rate of 4 operations per clock
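The vector decode description implies a simple clock count: one clock for the initial set of up to 4 operations, then one more clock for every further 4 operations fetched from the ROM. The sketch below works that out; the function name is hypothetical, and treating the initial set as exactly one clock is an assumption.

```python
import math

OPS_PER_CLOCK = 4  # initial set size and ROM fetch rate, per the slide


def vector_decode_clocks(total_ops):
    """Clocks needed to produce `total_ops` RISC86 operations (assumed model)."""
    if total_ops < 1:
        raise ValueError("an instruction produces at least one operation")
    remaining = total_ops - min(total_ops, OPS_PER_CLOCK)
    # One clock for the initial set, then 4 operations per clock from ROM.
    return 1 + math.ceil(remaining / OPS_PER_CLOCK)


for ops in (4, 5, 12):
    print(ops, vector_decode_clocks(ops))
```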
RISC86 Operations Categories
• Memory load operations (load)
• Memory store operations (store)
• Integer register operations (alu/alux)
• MMX register operations (meu)
• Floating-point register operations (float)
• Branch condition evaluations (branch)
x86 to RISC86 Translation Example
Instructions (x86)    Operations (RISC86)
MOV CX, [SP + 4]      Load
ADD AX, BX            Alu (Add)
CMP CX, [AX]          Load, Alu (Sub)
JZ foo                Branch
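The translation example can be written as a small lookup table, which also makes it easy to compute the operations-per-instruction ratio for this fragment. The data structure and function name are assumptions for illustration; the mapping itself is copied from the example above.

```python
# Each x86 instruction from the example, with the RISC86 operations it yields.
TRANSLATION = [
    ("MOV CX, [SP + 4]", ["load"]),
    ("ADD AX, BX", ["alu"]),
    ("CMP CX, [AX]", ["load", "alu"]),
    ("JZ foo", ["branch"]),
]


def ops_per_instruction(translation):
    """Average number of RISC86 operations per x86 instruction."""
    total_ops = sum(len(ops) for _, ops in translation)
    return total_ops / len(translation)


# 5 operations for 4 instructions.
print(ops_per_instruction(TRANSLATION))  # 1.25
```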
Instruction Set
• Categories
– Arithmetic
– Conversions
– Logical Operations
– Transfers and Memory Operations
• Compatibility
– Uses full Intel Instruction Set
• Features
– Three Separate Instruction Sets
    • Integer Instruction Set
    • Floating-Point Instruction Set
    • MMX Instruction Set
Technologies Used
• RISC86 Superscalar microarchitecture
– This enables leading-edge performance on both Microsoft Windows 95 and Windows NT operating systems, and the installed base of x86 software
• Socket 7-compatible Bus Interface
– This allows PC manufacturers and resellers to leverage today’s infrastructure to quickly bring superior price/performance PC systems to market
Detailed Comparison of the K-6 to the Intel Pentium Pro
Feature                                               AMD K-6                               Intel Pentium Pro
x86 Decoders                                          2 Sophisticated, 1 Long, 1 Vector     1 Sophisticated, 2 Simple
Average RISC ops/x86, 32-bit code (lower is better)   1.2                                   1.5
Average RISC ops/x86, 16-bit code (lower is better)   1.5                                   2.0
Maximum ROp Issue Rate                                6                                     5
Physical Registers                                    48                                    40
Centralized Buffer (max/active)                       24/18                                 40/20
FPU Multiply/Add Latency                              2/2                                   5/3
Pipeline Stages                                       6                                     12
Misaligned Loads                                      1 cycle penalty                       6 cycle penalty
Branch History Table                                  8192 entries                          512 entries
Branch Prediction Accuracy                            95%                                   85-90%
Misprediction Penalty
Instruction/Data TLB                                  64/128 entries                        32/64 entries
L1 Instruction Cache                                  32 KB + Predecode, 2-Way Set-Assoc.   8 KB, 2-Way Set-Assoc.
L1 Data Cache                                         32 KB, 2-Way Set-Assoc.               8 KB, 4-Way Set-Assoc.
Local Bus Bandwidth                                   528 MB/sec                            528 MB/sec
Local Bus Latency                                     2 clocks                              5-7 clocks
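The 528 MB/sec local bus figure is consistent with a 64-bit data bus clocked at 66 MHz. Those two parameters are not stated in the slides; they are assumed here as typical values for buses of this class, and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
def bus_bandwidth_mb_per_sec(bus_clock_mhz, bus_width_bits):
    """Peak bandwidth: one transfer of `bus_width_bits` per bus clock."""
    return bus_clock_mhz * bus_width_bits / 8


# Assumed parameters: 66 MHz bus clock, 64-bit data bus.
print(bus_bandwidth_mb_per_sec(66, 64))  # 528.0
```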
Factors Affecting Performance
• Pipelining, prefetching, and predecoding
– Using a 32-byte instruction cache line, lines are prefetched and predecoded. This enables the decoders to efficiently decode multiple instructions simultaneously.
• Multiple Decoders
– The decoders issue up to four operations at a time to the centralized schedule buffer, which buffers and manages up to 24 operations at a time.
• Parallel Execution Units
– The Instruction Control Unit issues up to six operations to the execution units, and they are executed in parallel.
Addressing Modes
Memory Map
Address Range (Decimal)  Address Range (Hex)  Size     Description
1024K-131072K            100000-8000000       130048K  Extended Memory
960K-1023K               F0000-FFFFF          64K      AMI System BIOS
952K-959K                EE000-EFFFF          8K       FLASH Boot Block (Available as HIMEM)
948K-951K                ED000-EDFFF          4K       ESCD (Plug and Play Configuration area)
944K-947K                EC000-ECFFF          4K       OEM Logo Area (Available as UMB)
896K-943K                E0000-EBFFF          48K      BIOS Reserved (Available as UMB)
640K-895K                A0000-DFFFF          256K     Available High DOS Memory (open to the ISA & PCI bus)
639K                     9FC00-9FFFF          1K       Extended BIOS Data (moveable by QEMM, 386MAX)
512K-638K                80000-9FBFF          127K     Extended conventional
0K-511K                  00000-7FFFF          512K     Conventional
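The Size column follows from the hex ranges: a region's size is its last address minus its first address plus one, divided by 1024. The sketch below checks a few rows of the map this way; the function name is an assumption, and the ranges are treated as inclusive on both ends.

```python
def region_size_kb(start_hex, end_hex):
    """Size in KB of an inclusive address range given as hex strings."""
    start = int(start_hex, 16)
    end = int(end_hex, 16)
    return (end - start + 1) // 1024


# A few rows from the memory map: (first address, last address, listed size).
rows = [
    ("F0000", "FFFFF", 64),
    ("A0000", "DFFFF", 256),
    ("9FC00", "9FFFF", 1),
    ("80000", "9FBFF", 127),
    ("00000", "7FFFF", 512),
]
for start, end, listed in rows:
    print(start, end, region_size_kb(start, end), listed)
```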
Addressing Modes
• Direct Addressing:
- address operand byte points directly to the target data
- only for internal RAM
• Indirect Addressing:
- two address operand bytes pointing to another pair of address bytes, containing the address of the operand
- for internal and external RAM
• Immediate Constants:
- value of a constant can follow the operation code in the program memory
• Indexed Addressing:
- only program memory can be accessed
- only read operations are possible
- addressing mode reads lookup tables in the program memory
- base register points to the base of the table entry and the accumulator is set up with the table entry number
• Register Instructions:
- for register banks containing registers from R0 to R7
- 3-bit register specification in the operation code of the instruction
- no address byte
• Register-Specific Instructions:
- some instructions are specific to certain registers
- no address byte necessary
- operation code does the pointing itself
Perspective On Role In The Market Place
• Microprocessor Market:
- short product life cycles
- migration to higher performance microprocessors
- dominant position of Intel Corporation:
    setting standards
    affecting margins and profitability of competitors
    restricting innovation and differentiation of product offerings
- successful competition possible if:
    new process technologies
    higher performance microprocessors
    greater volumes
    significant capital expenditures
AMD-K6 Processor Family Roadmap
Perspective On Role In The Market Place
- K6 - key element of further developments
- aggressive technology transition schedule
- possible risks and uncertainties:
    successful fabrication of higher performance AMD-K6?
    Intel’s new product introduction, marketing strategies and pricing?
    continued development of worldwide market acceptance?
    availability of financial and other resources?
    possible adverse market conditions in the PC market?
unexpected interruptions of production ?