

Computer Hardware and Peripherals

S Sandeep Kumar

28 Dec, 2006

Introduction to PC Hardware

The newspaper ads scream out prices, MHz, ATA, DDR, PCI-e. Do you know what these buzzwords really mean? Does anyone? The PC may be the single most important tool for researchers and executives, but because it is purchased in a camera store or discount food warehouse it is often treated as a commodity item. It is not possible to buy a PC with poor quality core components. AMD and Intel don't make bad CPUs. Mainboards, disks, and video cards are uniformly decent quality. You can buy a cheap case (with sharp edges that cut your hands), but if you never open the covers you will never know. So in an era where game consoles cost $600, and you can spend real money on a flat panel TV set, you can still buy the least expensive Dell or HP system for $350 and get a pretty good computer.

If that is all the advice you are looking for, read no further. The rest of this article will be spent explaining how the technology works and what everything means. This information may have no immediate practical use, but it will increase your comfort level. If the PC is no longer a big investment in money, it is still an important tool in business or education. Knowing more about it should make concerned people more comfortable with their decisions. No technical background is assumed. Even very complex issues will be explained in terms that everyone can understand.

Who Made It?

IBM invented the modern PC design, but they recently sold that business to a Chinese company. This should not be a big surprise. Often the only American thing in a computer is the name on the cardboard box it came in. Apple assembles systems in Shanghai and ships them overnight to the US.

If you buy a car from Ford, you expect the frame, engine, transmission, generator, and other parts to come from Ford or at least be built to Ford specifications. You do not expect to be able to put a Ford transmission in a GM car.

In a PC, however, the CPU, memory, disk, CD, power supply, and case are all manufactured to industry standards. You can take a hard disk or memory out of a Dell computer and put it into a system made by HP. The brand names you know are the names of companies that assemble, distribute, and support the computers, not the companies that make the parts.

This is an international business. The mainboard almost certainly comes from a Taiwanese company (Asus, Abit, Shuttle, MSI, ...). Disks tend to come from Singapore or Indonesia (Seagate, Western Digital, Maxtor). Memory and LCD displays often come from Korea. The external case and the power supply probably come from China (though over time more and more parts will come from China).

You can buy the components from CDW or NewEgg and assemble a computer yourself, but you won't save any money. The big computer makers buy parts in lots of a thousand, packaged in bulk to save packing and shipping. Nine screws attach the motherboard to the mounts on the case. Four screws attach the disk to the disk bay. Then the cables all plug into sockets. An unskilled worker can be quickly trained to assemble a computer every few minutes.

The advanced technology is in the manufacture of the chips, not the final assembly of the finished product. A CPU chip is constructed in a plant that costs billions of dollars. The building is on shock absorbers because the vibration generated by passing trucks would disturb the process. People wear spacesuits not to protect them from the environment, but to protect the chips from flakes of loose skin or the particles we exhale in every breath.


Then the chip is packaged in plastic and shipped out. There is a socket on the mainboard. One corner of the chip has an arrow, and one corner of the socket has an arrow. Drop the chip into the socket while matching the two arrows, then drop a lever to hold it in place. It is harder to tie a shoelace than to install a CPU chip on a mainboard.

Trends

Let's assume you don't have the time to read everything about computers. What are the key concepts you need to know to be informed about the technology, even if you won't impress your teenage son?

The Intel blunder - Inside a computer there is an assembly line with stations (called the "pipeline"). There is a "clock" that signals periodically (like the bell in a high school) when it is time for work to move on to the next station in the line. About 5 years ago Intel designed the Pentium 4 chip based on the assumption that clocks would get faster, so the time for any station to do work would get shorter. This meant building a longer pipeline with lots of stations, and simplifying the work done at each station. It's like creating an assembly line where each worker just turns one screw and then passes the work on to the next person. As you might imagine, there is some inefficiency passing work from station to station, but that would have been overcome as the clock sped up. Unfortunately, problems in the materials used to construct chips made it impossible to significantly increase clock speeds for the entire five year period, and the inefficiency of the design was a drag on performance. Fortunately, Intel had an entirely different chip design for laptops, where efficiency improved battery life. So Intel reversed course and applied this more efficient laptop design to desktop computer chips, producing the current family of chips called Core Duo (or Core 2 Duo in a second generation).

The CPU and GPU - Inside every computer there are two processing units. The Central Processing Unit (CPU) from Intel or AMD runs the operating system (Windows, Mac, ...) and all the applications (Office, Web, Mail, ...). The Graphics Processing Unit is a specialized device that can perform certain operations 10 times faster than the CPU. It does not run ordinary programs. The operating system treats it as a device, feeds it data, and expects it to perform its specialized calculations. You can buy a system with two or four CPU "cores". Each core can run a different software program at the same time. If you are running a long program in the background, say compressing a TV show so it can display on a video iPod, then having a second core to run the rest of the programs can smooth out your performance. More than two cores can be very valuable on a corporate server, but are unlikely to help the home user. Similarly, you can plug two video adapter cards into the same computer and get 2 (or sometimes 4) GPU chips. Unlike CPU cores, they will not be used to run different programs. Rather, they will be tied together to do one bulk process even faster. Now compressing video for an iPod turns out to be something that the GPUs could do much, much faster than the CPUs if anyone wrote the software to use them. This has been promised for some time but never delivered. While Intel's strategy seems to emphasize more CPU cores in the future, AMD has announced a strategy (called "Fusion") to produce chips with (probably) two CPU cores and then additional high speed specialized processing cores that look a lot more like a GPU. This will speed up games, multimedia processing, Photoshop, and the other specialized applications that can actually use more processing power.

Overclocking - The pipeline advances when the clock signals. To be sure that everything works properly, vendors like Intel test their chips with very fast clocks to make sure they work, but then ship them with slower clock speeds to protect against any possibility of error. If you are using your computer to run a business, you don't want to mess with the clock speed. You never, ever want the computer to crash. If you are playing games, however, then a setting that crashes the computer once a month (and you reboot) may be worth it if it makes the game run faster the rest of the time. Setting the clock to run faster is called "overclocking" and has become interesting because it is easy to do. It just isn't particularly useful.


Vista - Anybody who really knows computers spends a certain amount of time helping out friends who have loaded up their computers with trash, viruses, and spyware. So you spend a half hour cleaning things up, then turn the machine back over to someone who isn't smart enough to click the "No" button any time someone offers him free stuff. Windows was designed for TV's Mayberry, and then it got connected to the modern Internet. Microsoft spent 5 years creating a new tougher operating system that behaved the same as the old XP system. This is something like taking an armored Humvee that troops use in Baghdad and making it look and drive like a minivan that soccer moms use to pick up the kids after school. It is bigger and requires a more powerful computer, but is better suited for the dangerous real world we are in.

Standards

There are 8 basic parts to a computer:

1. Case
2. Power Supply
3. Mainboard
4. CPU (and cooling tower)
5. Memory
6. Video Card
7. Hard Disk
8. DVD Drive

Case

The case is the metal or plastic box that holds the computer. You can get small cases that look like stereo equipment, or large cases with room for a lot of disks and add-on cards. Enthusiasts may buy a case that is transparent or one that looks sporty.

Things to look for in a case: A smaller case is more convenient. The large case is mostly empty, which may seem like a waste of space. However, a computer needs airflow to cool the CPU and video card. Bigger cases have more air and are easier to cool. Also, noise from a computer mostly comes from fans. A small case has a small fan that has to rotate very fast and is loud. A big case can have a 120 mm fan that can rotate more slowly and still move a lot of air. If you have the space on the floor, the bigger case has advantages. Other than this, look for a case with no sharp edges that cut your fingers if you open it up.

Mainboard

The mainboard must have a socket for at least one CPU chip, and sockets for typically 2 or 4 memory "sticks". The back panel will expose plugs for keyboard, mouse, USB, Ethernet, and maybe external SATA disks. Some mainboards have integrated video. It will then have slots and connectors for the things you add to it: SATA connectors for disks, and PCI and PCI-e slots for adapter cards.

Things to look for in a mainboard: The CPU socket has to match the CPU chip. Mainboards come in large (ATX) and small (MATX) sizes, with a few other (jumbo, tiny) sizes used only for specialty purposes. The small MATX board will only have two memory slots and room for four adapter cards, but it will fit in media center cases designed to look like stereo equipment. If you get a big case you can fit a full sized ATX board. Gigabit Ethernet is useful, particularly when you move large files from machine to machine in a home network. External SATA (eSATA) allows you to connect external disks to the computer at full speed. Integrated video is OK for running Office, but look for a DVI connector for modern flat panel screens. Otherwise, compare the adapter cards you want to install with the slots you have. Mainboards can have four PCI slots, but boards designed for running video games often have room for a second oversized video card and cut the PCI slots down to two.

Video


The video card will plug into the large PCI Express slot on the mainboard. Enthusiasts will spend more on the video card (or cards) than they do on the rest of the computer. Video cards can use more power than the rest of the system. This makes sense, however, because for the specialized type of computing they do, a video card can be 10 times faster than the CPU on the mainboard. Gamers want the most powerful card they can afford. Home "media center" users probably want a card that supports HDCP and can play High Definition TV recordings. Business users don't need any of this, and will be satisfied with the integrated video that comes on some inexpensive mainboards.

Things to look for in a Video card: Serious gamers already knew all this stuff and stopped reading long ago. You are therefore probably looking for integrated video or a single medium performance PCI Express video card. There are two main GPU chip makers: Nvidia and ATI. Since you are not looking to spend a few thousand dollars for the biggest and baddest machine on the planet, the two vendors are approximately interchangeable. Minimally, you want video that supports DirectX 9 (Windows XP) and has some hardware support for decoding standard definition MPEG (DVD). If you read carefully, for the same money you can find integrated video or inexpensive adapters that will decode High Definition TV (broadcast and cable). In cards this means some type of ATI 1xxx (1600, 1900, 1950) GPU or an Nvidia 7xxx (7600, 7900) GPU. If you get 256 megabytes of memory, this card will be ideal for Vista. However, if you plan to add a Blu-Ray or HD-DVD drive to your system and view recorded high definition movies on a big screen display, then you need a display and adapter card that support "HDCP" and, to be safe, a GPU that can decode a format called "H.264" (generally the same ATI 1xxx or Nvidia 7xxx, but read the specs). To do everything except high performance gaming, expect the video card to cost $150 to $200. Vista supports DirectX 10, but at the start of 2007 the only cards that support this (Nvidia 88xx) start at $400. If you have only flat panel displays, the DVI connectors are much better than old analog connectors. A few cards support the smaller digital HDMI connector normally associated with flat panel TV sets.

Power Supply

The Power Supply is rated by the maximum amount of power it must provide to the rest of the computer. Most business systems will get along with 350 watts, but a system with four cores or high end video cards can require 500 watts or more. The Power Supply is the component most likely to fail sometime during the life of the computer. It may overheat or the cooling fan may fail. As it starts to fail the voltage levels go bad (the 12 volt wires may carry only 11 volts), and that can make the computer unstable and it will start to crash.

Things to look for in a Power Supply: I would say that you should read reviews and buy one from the company that has the best reputation for reliability, but even the best company produces the occasional bad unit. So look for a unit with two 12 volt "rails" instead of one, and note that a unit with a 120 mm fan will make less noise than one with 80 mm fans. Spend 10 minutes reading reviews at www.newegg.com and don't buy a real stinker that customers all hate. If you have a small case or one with a compartment for the Power Supply, watch out for units that are slightly larger than normal, especially if they have "modular cables" that plug into one end of the supply, because when you assemble everything the Power Supply may not fit. If you don't stock a spare Power Supply in your house, then know where to find a CompUSA or computer repair store where you can get one when you need it, or your computer may be down for a few days waiting for an internet vendor to ship one.

Disk

SATA is simpler and faster than the old parallel ATA. There is one high speed disk from WD called the "Raptor" that is twice as fast as all other disks. Otherwise, you buy disks based on price, capacity, and reliability.


Things to look for in Disk: First consider your case. A small case may have room for only one or at most two disks, so you may want to buy the biggest disk you can get. A large case can have room for 6 disks, and you can buy rails that adapt the unused front 5.25" CD-sized slots to hold even more disks. Now if you look carefully at prices, you will see that two 250 gigabyte hard drives are often less expensive than one 500 gigabyte disk. A disk has movable "arms" that position somewhere on the surface to read or write data. Each disk positions its arms independently. So if you are copying data from one disk to another the arms on each disk can remain positioned in one place, but if you process data from one file to another on the same disk, the arms move back and forth and the same processing can take 20 times longer than a two disk operation. So if you have room and are willing to think about optimizing your data layout, more small disks are better than fewer big disks.
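To see where the "20 times" comes from, here is a minimal back-of-the-envelope sketch in Python. The seek and transfer times are assumed round numbers for illustration, not measurements:

    # Rough model: on a single disk the arm must seek between the source
    # and destination areas for every chunk copied, while with two disks
    # each arm stays parked over its own file.
    SEEK_MS = 10       # assumed average time for one arm movement
    CHUNK_MS = 1       # assumed time to transfer one chunk of data
    CHUNKS = 1000

    two_disk_ms = CHUNKS * CHUNK_MS                  # arms never move
    one_disk_ms = CHUNKS * (CHUNK_MS + 2 * SEEK_MS)  # seek there and back

    print(f"two disks: {two_disk_ms} ms")
    print(f"one disk:  {one_disk_ms} ms, {one_disk_ms // two_disk_ms}x slower")

With these assumed numbers the single-disk copy comes out 21 times slower, which is the right order of magnitude for the effect described above.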

All you need is a Phillips Screwdriver

You can buy every part (except the CPU) from a dozen different vendors. There are only two CPU choices. Everything will fit together because there are standards. Let's build the system.

1. The Power Supply is a metal box with cables that dangle from it and connect to devices inside the case. You slide it into place and attach it to the back of the case with four Phillips screws (that come with the Power Supply).
2. The mainboard sits on screw holes on top of nine "standoffs" that keep the bottom of the board a quarter inch away from the metal of the case tray. Standoffs and the screws that attach the mainboard come with the case. In almost every case these are the same size screws as the ones that hold the Power Supply.
3. A modern power supply has a 24 pin connector that plugs into a slot on the mainboard. There is a second 4 pin connector. There is a latch on one side of each plug that attaches to a plastic notch on one side of the mainboard socket, but the plugs have shapes so they cannot be plugged in upside down.
4. The CPU drops into the socket and clamps in place. A cooling tower (heat sink and fan) clamps on top of the mounting assembly on the mainboard. Plug the fan into the little 3 pin power connector on the mainboard.
5. Plug memory into the memory slots. If the sticks don't go in at first, you may be trying to plug them in backwards.
6. Plug the video card into the long PCI Express slot.
7. The hard disk and DVD drive connect to the case with four screws, typically two screws on each side. These screws also come with the case, but they are slightly smaller than the screws that hold the mainboard to the standoffs. There are only the two sizes of screws.
8. Now plug the long thin SATA cable into SATA socket number 1 on the mainboard and connect it to the back of the disk. The connector is L-shaped so it cannot go in wrong. Connect a power cable from the power supply to the back of the disk.
9. Now there is a block of pins on the mainboard and some little cables from the front panel of the machine. Typically you connect the power switch, power light, disk light, and speaker cables. There will be a diagram in the manual for the mainboard. The power and disk light connectors are the only things in the computer that you can plug in backwards. If either light does not go on, open the case later and flip the connector around.
10. Connect the monitor, keyboard, and mouse. Plug the power cable in the wall and turn the machine on. It is time to install an operating system.

The first time you do all this you may spend hours checking and rechecking everything. When you get used to it, the whole process can take as little as 20 minutes. On an assembly line, a worker probably builds a system in 3 minutes.

Carburetor or Cup Holders?

Back in the 1950's, before fuel economy and pollution controls, people judged a car by the power of its engine. Today, however, cars are sold to the mass market based on a quiet smooth ride, side air bags, a nice sound system, and Cup Holders.


There is a "hot rod" market for custom computers today. As with the automotive hot rod, the

target audience for high performance computers are young males playing games. They have even borrowed a word from automotive customization, buying "modding" hardware for their systems. Cases have transparent panels and low heat internal lights to show off the electronics. The CPU is "overclocked" (run at a speed higher than that recommended by the manufacturer) and the extra

heat is removed with exotic cooling systems. As with cars, this is a specialty market. Maybe its the lower speed limits, or maybe the boomers are just getting older. Big engines are less important than they used to be. Some people even dream of zipping though town on a

Segway. The same thing may happen with computers. The 55 to 65 miles per hour boundary for computers may correspond to 700 MHz to 1 GHz. That is fast enough to run Windows, browse the Web, read E-Mail, listen to MP3 files, and run all the Office programs. Faster speeds are only

useful if you play computer games, convert video files, or run a server. This is not a message that the computer makers want you to hear. It might suggest to people that they keep their old computer for another year rather than replacing it with a new model.

However, technology continues to evolve across the board, and CPU may not be the most important thing. You didn't buy your last new car because the speed limit increased on the highway. Consider some other issues:

• Power. A few years ago a typical CPU used 100 to 120 watts of power. Today vendors build a version of their mainstream CPU chips that use only 65 watts (for two CPU cores!) and Intel makes specialty processors that can drop the power use as low as 10 watts. However, video cards can draw 200 watts each, and high end gamers add a second power supply to their system just to power the video. If you can select your own components, and you do not happen to own your own electric company, then it will certainly be worth the $4 difference to buy a 65 watt processor rather than the 100 watt version of the same CPU. During the lifetime of the system you will save a lot more than $4 worth of electricity (a rough arithmetic sketch follows this list). If you buy a preassembled system from a vendor, they have no incentive to add $4 to the purchase cost of a system for a feature that most consumers are not smart enough to look for.

• Noise. The cheapest way to deal with the heat generated by the CPU, mainboard, and video card is to put a small fan on each device and run it at a high speed. This generates a lot of noise. For only a little more money one can carefully design the airflow, install larger metal radiators to spread the heat more widely, and run a few much larger fans at much lower speed. A quiet computer doesn't cost much more, and has a big effect on either your office or your living room.

• Keyboard and Mouse. The keyboard is the one part of the computer system that arguably has gotten worse over the years instead of better. When IBM first designed its personal computer, they used the same keyboard technology used in other IBM devices designed for full time professional use. Some people believe that the old "clicky key" IBM keyboards of the 1980's represented the peak of technology. Today a "keyboard" is a no-cost item you can select on the Dell Web page, and replacement keyboards are sold in computer stores for $15. Compare this cost to the medical expenses of a repetitive stress injury. Fortunately, keyboards are interchangeable across systems, and a good one never breaks (although it doesn't improve if you spill pancake syrup into it). If you get a computer with a lousy keyboard, invest in a better one. Similarly, you can spend a few extra dollars getting a comfortable mouse and save your hand (although it is a lot easier to find mice shaped for right handed than left handed people).

• Screens. Size matters. Anybody reading this document wants some kind of LCD monitor. They come in two flavors. Devices sold as computer monitors have a limited brightness and tend to have high resolution on small screen sizes. Today the 19" monitor has become standard, but there are 20" and 24" monitors, some with wide screens. For a bit more you can buy an LCD TV. When sold as a TV, the screen is typically twice as bright as a computer monitor. TV panels (32" to 42") have resolutions of approximately 1280x768 or 1920x1080. A smaller 24" screen sold as a computer monitor will have resolutions of 1600x1200. Baby Boomers will discover that higher resolution isn't necessarily a good thing. While it makes photographs sharper, it also makes text too small to read comfortably.
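As promised under the Power bullet, here is the electricity arithmetic behind the 65 watt recommendation. A minimal sketch; the usage hours, lifetime, and electricity rate are assumptions chosen for illustration:

    # Does a $4 premium for a 65 W CPU (vs. the 100 W version) pay off?
    WATTS_SAVED = 100 - 65     # difference between the two versions
    HOURS_PER_DAY = 8          # assumed office-style usage
    DAYS = 365 * 3             # assumed three year system lifetime
    RATE_PER_KWH = 0.10        # assumed electricity price in dollars

    kwh_saved = WATTS_SAVED * HOURS_PER_DAY * DAYS / 1000
    print(f"{kwh_saved:.0f} kWh saved, worth ${kwh_saved * RATE_PER_KWH:.2f}")

Even at 8 hours a day this works out to roughly 307 kWh, or about $30 of electricity: far more than the $4 premium. A machine that runs around the clock saves three times that.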

High performance computers run games that appeal to the teenager. For the rest of us, a comfortable keyboard, easy to read screen, and quiet room may be more important.

The Q Bridge

In downtown New Haven, CT where I-91 meets I-95, the "Q Bridge" crosses the harbor area. It must be one of the hottest attractions in southern New England, because every morning and afternoon cars line up for miles to cross it. It defines the rush hour commute, and nothing that you do to the other roads or exits in West Haven or East Haven will materially speed things up.

Inside your computer there is an electronic version of the Q Bridge. Depending on the application, some component will become the choke point, and all the data bytes will line up waiting to get through. But while the real Q Bridge never changes, the PC choke point moves as you change use. A Porsche and a Yugo get caught in the backup at the same point in West Haven. Twenty minutes later they cross the bridge at the same time. It doesn't do any good to spend a lot of money on a fast car and a big engine if the limiting factor is traffic moving five miles an hour. Yet customers often select a server with a fast CPU, without first considering what the bottleneck will be.

If you play video games or edit video, then the speed of your computer depends on the speed of the CPU. If you record TV shows on your computer and then edit out undesirable material periodically inserted into the program, processing may be 10 times faster if you read from one disk and write to another than if you use a single disk. The performance of a database is typically determined by the amount of memory you have. Copying files from one machine to another depends on the network speed.

Summary and Links to Topics

Each main point will be summarized here. The summary is then linked to a secondary page where you can learn more about a particular topic. Readers are urged to follow each such link to get the full story, but you can make your own decision.

Circuit Size, Voltage, and Heat

All the electronic components of a computer follow some basic design principles. To make a computer circuit operate faster, you have to make it smaller. Smaller circuits can run at a higher speed, using less voltage, and producing less heat. By analogy, if the only question is 0 or 1, empty or full, then it is much faster and requires much less work to fill a shotglass with water than to fill a bathtub. These four factors (size, speed, voltage, and heat) are always in balance. You can increase speed on a given chip by increasing the voltage, but that produces more heat and requires more expensive cooling.
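The textbook rule of thumb for switching power in a chip captures this balance: power grows with capacitance (circuit size), with the square of voltage, and with clock frequency. A minimal sketch; the relation is standard, but the specific values below are assumptions for illustration:

    # Dynamic (switching) power scales roughly as P ~ C * V^2 * f.
    def relative_power(cap, volts, freq):
        return cap * volts ** 2 * freq

    base = relative_power(1.0, 1.0, 1.0)

    # Push the voltage 10% to reach a 10% higher clock on the same chip:
    pushed = relative_power(1.0, 1.1, 1.1)
    print(f"overvolted: {pushed / base:.0%} of baseline heat")   # 133%

    # Shrink the circuit (less capacitance, lower voltage), same clock:
    shrunk = relative_power(0.5, 0.8, 1.0)
    print(f"shrunk chip: {shrunk / base:.0%} of baseline heat")  # 32%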

24 June, 2005

The Rules of Circuits


To understand how circuits work, consider plumbing. Electricity runs through wire much like water runs through a pipe. The amount of water that flows through the pipe is equivalent to the current of electricity through the wire. The water pressure is equivalent to voltage.

There is one slight problem with the plumbing analogy. You fill a bathtub with water using the water pressure in the line, but you empty a bathtub by opening the drain and letting gravity push the water down. Circuits, however, use both positive and negative voltage, so the speed with which electric charge drains from a circuit is the same as the speed at which it fills (determined by voltage).

For a lamp, stereo amplifier, or drill, the measure of quality is more "power". The opposite is true for computers. A computer measures data as 0 and 1, on and off, empty or full. When we talk about a "powerful" computer, we mean a machine that can perform its calculations more quickly. However, each calculation only requires accurate measurement of the state of the circuits. Real electrical power, measured in "watts" like a light bulb, is waste heat.

In direct contact with the top of the CPU chip is a block of metal called the "heatsink". It is solid at the bottom, but has cooling fins at the top. Waste heat generated by the CPU is conducted into the heatsink. A fan blows air over the fins. The heat is transferred to the air, which is then blown out the back of the computer. A 486 CPU generated about 4 watts of waste heat. A Pentium III generated around 25 watts. A Pentium 4 generates 80 watts, and Intel is struggling to keep the next generation chip below 100 watts.

Heat has become the most important problem in CPU design. Chips could be much faster if the engineers could find some way to reduce the waste heat. Unable to do this, engineers must content themselves with more efficient systems to conduct the heat away from the CPU chip and dump it into the room.

Size Matters (Small is Better)

To indicate a 1, you have to fill something. It doesn't matter what you fill as long as you can quickly fill or empty it and you can accurately measure whether it is full or empty. Obviously, it takes less time to fill a shot glass with water than it does a bathtub. Not only is it faster, but it takes less water (current), and you can do it with a much lower water pressure (voltage). You can also empty it faster.

So the trick to computer circuits is to make them as small as possible, so that filling them with electrons, or removing all the electrons, can be done in the least amount of time with the least work. The size of circuits is determined by the width of the smallest line that the chip manufacturing technology can draw.

Circuits used to be measured in microns (millionths of a meter). Recent generations of chips were created at .25 microns, then .18, .13, and now .09 microns. Given these sizes, the industry is beginning to change scale to nanometers (billionths of a meter). Convert microns to nanometers by multiplying by 1000. So a .13 micron circuit is a 130 nanometer circuit.

On the flat surface of a computer chip, the size of something is determined by its area (width times height). So chip improvements are determined by the square of the line width and not just the line width. Comparing squares, a 130 nanometer technology has almost half the area of a 180 nanometer technology, and is therefore twice as good.
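A quick check of that "comparing squares" arithmetic, as a sketch:

    # Shrinking the line width shrinks a circuit in two dimensions at
    # once, so the gain goes as the square of the ratio of line widths.
    for old_nm, new_nm in [(250, 180), (180, 130), (130, 90)]:
        ratio = (new_nm / old_nm) ** 2
        print(f"{old_nm} nm -> {new_nm} nm: {ratio:.0%} of the old area")

The 180-to-130 step leaves about 52% of the old area, and the 130-to-90 step about 48%: each generation roughly halves the space a circuit occupies.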

Vendors are beginning to migrate from 130 nanometer chips to a new generation of 90 nanometer chips. An Intel CPU commonly known as "Prescott" appeared early in 2004 based on this technology. Intel has demonstrated a technology that may be available in a few years based on 65 nanometer elements.

A Dripping Faucet

As the size of circuits becomes smaller, chips have begun to leak current. As the size of wires and transistors got smaller, the space between the wires also shrank. This space is supposed to be an insulator, but as the distance between two wires gets smaller and smaller, some current begins to leak across the thinner insulation barrier.

There is also a problem inside the transistors themselves. In a computer, a transistor circuit is supposed to be "on" or "off". We think of this difference like a light switch, but following the analogy with plumbing it is something like a faucet that is open or closed. The problem here is that as the size of the transistor gets smaller, some current leaks through the insulating barrier when the transistor is "off". It is something like water dripping through a faucet with a bad washer. The circuit still works. It is easy to tell the difference between a faucet that is open with water running and one that is almost closed with water dripping.

Not only is this a problem, but the physics causes it to get exponentially worse as circuit size gets smaller. There are solutions. New chip materials provide better insulation in the space between wires compared to current silicon. However, Intel did not expect the problem would get so bad so quickly, and their new materials will not become available fast enough to solve the problem this year.

Work is Heat and Heat is Work

Flanders and Swan wrote this in a song about the laws of Thermodynamics (CD). Their summary runs "Work is Heat and Heat is Work." In plumbing, it takes work to move the water up to your second floor bathroom. You don't see the work because it is being done by massive pumps at the Water Company. However, if you had to pump the water yourself, or carry it upstairs in buckets, you would immediately recognize that work is involved. As you build up a sweat, you realize that work is heat.

The smaller a circuit is, the less work is required. It requires a lot less effort to carry up the stairs enough water to fill a shot glass than it does to fill a bathtub.

Every modern CPU chip generates more heat than it can tolerate. By itself, the chip will overheat in a few seconds and stop running. A few years back, someone posted pictures on the Web of a test in which they cooked an egg on a standard CPU chip. In a real computer, however, heat is just a waste product that must be discarded.

Cooling a CPU follows essentially the same rules as cooling a car engine. You can pump liquid through a block of metal that covers the chip, then vent the heat through a radiator. Most computers today, however, opt for a simple block of metal with cooling fins (a "heat sink") that makes the CPU air cooled. Anyone alive in the 60's may remember that the old Volkswagens had air cooled engines that worked the same way.

Smaller circuits generate less heat, but the heat they do generate is concentrated in a smaller area. This may require better cooling. The most recent processors require a layer of copper between the CPU and the heat sink, because copper conducts heat better than aluminum. The very best heat sinks are all copper, but that presents another problem. Copper is much heavier than aluminum, and an all copper heat sink can add more weight than the motherboard can support, particularly when the system is moved.


Overclocking

If a zero bit is represented by empty and a one bit is represented by full, then under ideal circumstances every measurement would find every circuit either entirely empty or entirely full. However, there are slight imperfections in the material or manufacturing process for each circuit. Some may fill or empty slightly slower than others. So the computer is designed with some tolerance. If the circuit is less than 1/3 full, we may treat it as "empty". If it is more than 2/3 full, we may treat it as "full". In between, the status is indeterminate.
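Spelled out as a sketch (the thresholds are the ones just described; the function is hypothetical, for illustration only):

    # Read a circuit's fill level with the tolerance described above:
    # below 1/3 counts as a 0 bit, above 2/3 counts as a 1 bit, and
    # anything in between is an indeterminate (unreliable) reading.
    def read_bit(fill_level):
        if fill_level < 1 / 3:
            return 0
        if fill_level > 2 / 3:
            return 1
        return None  # indeterminate: measured too early

    print(read_bit(0.10))  # 0
    print(read_bit(0.90))  # 1
    print(read_bit(0.50))  # None -> a crash waiting to happen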

Intel or AMD test every circuit in every chip they make. They apply a standard voltage, then fill or empty the circuits and measure them. Some chips will be nearly perfect and will operate at the highest speed. Other chips may have circuits that run a bit more slowly and take longer to fill or empty. They will be sold to run at a slower speed.

A conservative buyer will accept the vendor rating. Other users, looking for "extreme" performance, may try to squeeze more performance out of the chip by increasing the clock rate. Since there is a little slop factor left from the vendor testing, most processors will run at a 5 or 10% faster clock speed. More than that requires effort.

A faster clock doesn't really make anything in the CPU run faster. If you start to fill a circuit up with electrons, it will fill at whatever speed it operates without regard to the clock. What the clock does is indicate when the filling process ends and when to measure which circuits are full and which are empty. If every operation completed with time to spare, then you can safely speed up the clock and shorten the time between the start of the process and the point of measurement. Eventually you shorten the period so much that one circuit is not only still filling, but has not yet reached the 2/3 full point or whatever mark generates a reliable value. Then the system crashes.

One solution is to increase the voltage by one notch (usually a quarter volt). In the analogy, voltage is like water pressure. Higher pressure means more flow, and everything fills and empties faster. Running a chip at a slightly higher voltage than recommended can compensate for running the clock at a slightly higher speed. However, it will also generate more heat.

Extreme performance fans crank the voltage and clock speed up to high values, but compensate with large, loud, expensive cooling solutions. Consider for example the Zalman water cooling system with a radiator that is bigger than most computers.
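The whole trade-off fits in a toy model. This sketch assumes a made-up fill time and a linear voltage-to-speed relationship, chosen only to show the shape of the effect:

    # A bit is reliable only if the circuit passes the 2/3 mark before
    # the clock ticks. Higher voltage fills the circuit faster.
    def is_stable(clock_ghz, volts, fill_time_at_1v_ns=0.35):
        period_ns = 1.0 / clock_ghz            # time between clock ticks
        fill_ns = fill_time_at_1v_ns / volts   # more pressure, faster fill
        return fill_ns <= period_ns

    print(is_stable(2.4, 1.00))  # True:  rated speed, rated voltage
    print(is_stable(3.0, 1.00))  # False: overclocked, circuit not full yet
    print(is_stable(3.0, 1.25))  # True:  the extra quarter volt
                                 #        compensates, at a cost in heat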

Balanced, Serial, Point to Point, One Way

At slow speed anything works, so the first computer designs used whatever options were simpler. Then you push the speed higher and higher till you hit a barrier. There are several ways to move data around in a computer at higher speed. Each solves a problem. Newer architectures combine several of these features to provide some improvement.

Voltage is a Difference

If you look out a window, you may see a bird sitting on top of a power line. Birds can do this because voltage isn't an absolute property but is always measured relative to something else. Electricity doesn't move through the bird because the bird isn't connected to the other power wire or to a ground. Occasionally a squirrel will touch two wires to complete the circuit.

All early computer interfaces started by assigning one wire to each bit of data or control signal. The voltage on all the signal wires would be measured relative to a single common ground wire. This works at low speed, over short distances.


The problem is using a single common ground to measure several different wires. There is some delay after the signal wires change before everything settles down and a reliable measurement can be made. Things are better if you have fewer signal wires for every ground wire. In a modern CPU, every fourth pin can be a ground connection. Still, this type of structure seems to max out at around a 200 MHz clock signal.

Balanced

The solution to this problem has been known since the '60s, but was first applied to communication over long distances. Each signal is represented by two dedicated wires. To generate a signal, apply a small positive voltage to one wire and an equal negative voltage to the other wire. The receiver measures the difference between the two wires in the pair and determines which is positive and which is negative. Since the two wires have opposite signals, they exactly balance each other and produce 0 net voltage relative to any external reference point.

Balanced pairs also solve the problem of external interference. A long wire is also an antenna. Look in your AM radio and you may find that the antenna is just some ordinary electric wire run around the case. The longer the wire, the more outside signal gets picked up. Insulation blocks the flow of electricity, but radio waves pass right through it. The radio measures a signal induced on a single wire loop. However, when computers use a pair of balanced wires, any external source of interference produces exactly the same effect on each wire of the pair, and at the receiving end the two cancel out.

Over short distances, like on a mainboard, it is sufficient to run the pair of wires next to each other. Any interference they generate or receive tends to cancel out when the pair is measured against each other. Over longer distances, such as a USB or Ethernet cable, the pair of wires is twisted round each other. This prevents either wire from being "closer" all the time to either a source or recipient of interference.

The bad news is that you need more wires or pins than in the older one-pin-per-signal design. If every fourth pin used to be a ground, the pin count increases by 50% to switch to a balanced signal (3 signals require 6 pins instead of 4). However, the clock speed on the pair of wires can be increased by such a large factor that all of the new balanced pair connections end up using far fewer wires in total.
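A minimal sketch of why the balanced pair cancels interference (the voltages are arbitrary illustrative numbers):

    # Balanced signaling: send +v on one wire and -v on the other, then
    # recover the bit from the difference between the two wires.
    def send(bit, v=0.5):
        return (+v, -v) if bit else (-v, +v)

    def receive(wire_a, wire_b):
        return 1 if (wire_a - wire_b) > 0 else 0

    noise = 3.7                  # external interference, in volts
    a, b = send(1)
    a, b = a + noise, b + noise  # both wires pick up the same noise
    print(receive(a, b))         # still 1: the difference is unchanged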

Serial (not Parallel)

If you send the same electric signal through any parallel set of wires, the electricity will move more slowly through some wires due to slight differences in the metal. The signals arrive at the other end with very small timing differences. This is called "skew". Like runners in the various lanes of a 100 meter dash, they all start at the same time and place, but they arrive at the finish line staggered by small differences in speed.

    Start                                                        Finish
    o----------------------------------------------------->
    o------------------------------------------------------->
    o--------------------------------------------------------->
    o--------------------------------------------------->
    o----------------------------------------------------------->
    o------------------------------------------------------->
                                                  |<- Clock Pulse ->|
                                                  covers worst skew


Skew is less of a problem over short distances or low speeds. It gets worse when, as in the PCI bus, the wire is connected at points along the path to connectors on sockets into which adapter cards may or may not be plugged. Every time the signal hits a point where it is soldered to something, or where the signal splits in two directions, there will be some delay. These contacts must be manufactured for pennies, so it isn't feasible for them to be of uniform quality.

In addition to the signal wires, a parallel bus carries a clock pulse. All the data bits start out at the same time. The clock, however, cycles half way between adjacent data bits. The idea is that the clock signals the earliest point when the next bit can arrive on any wire, and the last moment when the slowest previous data bit can have arrived.

The problems with a parallel bus are problems in physics, wire, and solder. You can't fix them with faster CPU chips. Eventually, each parallel bus in the computer is replaced by something better.

The alternative has been understood for as long as there have been Personal Computers. Instead of sending one bit of data down each wire in a parallel bus, send all of the data one bit at a time down a single pair of wires. If there are slight imperfections in the wire, they affect each bit equally. The bits arrive at the other end at the same speed they were sent. This is a Serial bus.

The problem with a Serial bus is that it requires a very fast computer chip to generate and receive the signal. This, however, was a problem easily solved as silicon computer chips got faster and cheaper. As chip technology improved, one by one each parallel bus in the PC has been replaced by a serial alternative.
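A sketch of the skew problem in a dozen lines (the delays are random made-up numbers):

    import random

    # Bits launched together on parallel wires arrive staggered, so the
    # clock period must be long enough to cover the slowest wire. A
    # serial link uses one pair, so every bit sees the same delay.
    random.seed(1)
    wire_delays_ns = [5.0 + random.uniform(0, 0.8) for _ in range(16)]

    skew = max(wire_delays_ns) - min(wire_delays_ns)
    print(f"skew across a 16-wire bus: {skew:.2f} ns")
    # The parallel clock period can be no shorter than this window, no
    # matter how fast the chips at either end are.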

Point to Point (not really a Bus)

In computer terms, a "bus" is a communication path shared by many devices. The memory bus is a sequence of 1 to 4 slots into which you can plug "DIMM" modules of memory. The PCI bus is a sequence of up to 5 slots into which you plug adapter cards. Even the IDE and SCSI disk connectors are bus cables that support more than one device.

However, each time a wire encounters a slot or connector there is an interface point that minimally increases skew and can also generate a "reflection" where the signal bounces off the interface and starts to travel back towards its source.

The solution is to redesign each connection to be "point to point" between two devices. In the newer disk technologies (SATA replacing old IDE, and SAS replacing the old SCSI) a dedicated cable with two pairs of wires connects each disk to a dedicated connector on the controller card. On many mainboards today, the fastest supported memory speed is only permitted when a single DIMM is plugged into the memory slot. If you want to plug two DIMMs of memory into the same bus, you have to drop performance to a lower clock speed.

The most advanced version of this design principle is provided by HyperTransport, a new high speed chip interconnect system used by AMD, IBM, and Apple. The CPU is connected to other chips on the mainboard using a system of point to point wires. To make the system work, each chip can receive a signal from wires on one side and relay the signal bit by bit on to the next chip.

One Way

The conventional design for memory, CPU, PCI, and other chip connection technology is to use the same pins to both transmit and receive data. This requires the chip to have both "transmit" and "receive" electronics connected to the same wire.


Other systems designate one pair of wires to transmit data from the chip, and a different pair of wires to receive data on the chip. Superficially such a design appears to either require twice as many wires to get the same speed or else to cut the amount of data that can be transferred in half. That would be true if the data was going in only one direction. In practice, however, data has to flow in both directions, and this design allows each chip to transmit and receive data simultaneously. Plus it allows a slightly higher clock rate.

Summary

Typically no system requires all of these features at once. Almost all new technologies use balanced pairs of wires. Otherwise, Intel's PCI Express uses serial point to point connections (not a parallel bus), while AMD, IBM, and Apple like HyperTransport, which is a parallel point to point (not serial, not bus) system.

The following table shows some PC serial connections that have been or will be replacing older parallel connections:

Serial          Replaces                  Timeframe
USB printer     Parallel Printer cable    2000
Ethernet        [nothing]                 1990s
Firewire        [nothing]                 2000
Serial ATA      80 wire ATA cable         2003
Serial SCSI     various SCSI cables       2004
PCI Express     PCI bus                   2004
ExpressCard     PCCard/Cardbus            2005

Clocks and Cycles

Components of a computer (the CPU, memory, adapter cards) are coordinated by a "clock" signal measured in Megahertz (millions of ticks per second) or Gigahertz (billions of ticks per second). Generally we say that speeding up the clock makes the computer run faster, but that is slightly misleading. The clock tells all the components when they should all be done with their previous operation and when they should begin the next step. Components all run at whatever speed their design permits. If all the components can complete their longest operation with lots of time to spare, then there is room to speed up the clock, shorten the periods, and get more work done in the same amount of time. Set the clock too fast ("overclock") and it ticks before one of the components is quite done with its last operation. Then the system crashes.

22 Dec, 2004

Clock Speed: Tell Me When it Hertz

Jargon explained: clock, megahertz/gigahertz, cycle

Computer performance is a traffic problem, moving data and instructions from memory and around inside the chip. Most people think of "traffic" in terms of cars and highways. However, there is a more relevant traffic analogy that everyone experienced before they learned to drive.

Students have been sitting in class for a long time. Finally the bell rings throughout the school signaling the end of the current period. Everyone gets up and moves through the hall to their next classroom. After a few minutes the bell rings again to signal the start of the next period. The bell has to ring everywhere in the school at the same time to coordinate movement. Without the bell, some classes would be released early and others would be released late.

The various parts of a computer hold instructions and data. Periodically they send this data along wires to the next processing station. To coordinate this activity, the computer provides a clock pulse. The clock is a regular pattern of alternating high and low voltages on a wire. To compare this with a clock in the hall, let's say the high voltage signal is a "tick" and the low voltage signal is a "tock". The clock speed is measured in millions per second (Megahertz) or billions per second (Gigahertz). A 100 MHz PC mainboard has a clock which "ticks" and "tocks" 100 million times each second. Each tick-tock sequence is called a cycle. The clock pulse tells some circuits when to start sending data on the wires, while it tells other circuits when the data from the previous pulse should have already arrived.

A small point of notation: The standard clock speeds are some multiple of 33.3333... MHz. Three times this speed is 100 MHz. By convention, the speeds are rounded down to 33 and 66 MHz, but the fraction explains why three times a 33 MHz clock is 100 and not 99.
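The fraction spelled out, as a two-line sketch:

    base_mhz = 100 / 3            # the true base clock: 33.333... MHz
    print(f"{base_mhz:.4f}")      # 33.3333, advertised as "33 MHz"
    print(f"{3 * base_mhz:.0f}")  # 100, not 99: the fraction was real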

There are five ways to increase the processing power of a CPU or the teaching power of a High School.

• Raise the clock speed - In the analogy, this corresponds to reducing the time available for each class period. If the teacher can talk faster, and if the students behave and listen more closely, this can work up to a point. Each student gets done with the school day earlier.

• Build a Pipeline - A more complicated solution shortens the class period, but then breaks each subject into a sequence of steps. If it takes 45 minutes to cover Algebra, and that time cannot be reduced, then the subject could be covered in three consecutive 15 minute periods. A simpler subject might be covered in just one period. After all, there is no reason other than the convenience of scheduling why every class for every subject lasts the same period of time. Students get done quicker, but only if some of the subjects are lightweight.

• Parallelism - Add more classrooms and more students. No one student learns anything faster, but at the end of the day the school has taught more people in the same amount of time. Of course, this only works if you have more students in the school district to teach.

• Class Size - Double the number of students in each classroom. High Schools don't like to do this. Computers, however, can easily switch from 32 to 64 bit operations. This will not affect most programs, but the particular applications that need processing power (games, multimedia) can be distributed in a 64 bit form to get more work done per operation.

• Build a Second School - Sometime in '05 or '06 both Intel and AMD will begin to ship "multi-core" processor chips. This creates a system with two separate CPUs. An individual program won't run any faster, and if these chips have a slower clock may even run more slowly. However, two programs will be able to run at once, and programs that require the most performance (games, multimedia) can be written to use both CPUs at once.

The easiest solution, and the one that benefits everyone without requiring any changes to software, is to speed up the clock. Beyond a point, that also required a longer pipeline. Then sometime in 2004 both CPU vendors ran into a ceiling. Intel had difficulty pushing its clock much beyond 3 GHz, and AMD had trouble pushing past 2 GHz. Because AMD had more parallelism, the AMD chip was just as powerful as the Intel chip despite the lower clock speed.

So both vendors reconsidered their strategy and have decided to consider the other options. AMD was first to offer a 64 bit processor, but because Microsoft was not ready to ship a corresponding 64 bit version of its operating system the AMD advantage has been limited. Intel developed a range of tricks to get more work done at lower clock speeds and lower power in their Centrino laptop processor, and they are now migrating some of this technology to desktop systems.

Vital Statistics

Warning: This is a confusing collection of numbers. What is important is how the different numbers relate to each other. Specific numbers are plugged in because they are occasionally mentioned in the literature.

Example: There are two versions of the Intel 2.4 GHz Pentium 4. One gets a clock speed from the mainboard of 100 MHz, but since it transfers data 4 times per clock tick its "Front Side Bus" (FSB) to memory and I/O is said to be four times the clock, or 400 MHz. Internally the CPU has a "multiplier" of 24, meaning each tick of the external clock is divided into 24 internal periods to produce the 2.4 GHz value. A slightly more modern version of the P4 gets a 133 MHz clock, has a 533 MHz Front Side Bus, and has a multiplier of 18. The equivalent AMD Athlon XP 2400+ gets a clock of 133 MHz, has a Front Side Bus twice that at 266 MHz, and an internal multiplier of 15. That gives it an internal speed of 2.0 GHz, but since it executes more instructions per internal clock tick it is rated to be equivalent to Intel's 2.4 GHz.

Chip Type         Actual Clock   Transfers/Clock   FSB       Multiplier   Speed
Pentium 4 2.4     100 MHz        4                 400 MHz   24           2.4 GHz
Pentium 4 2.4A    133 MHz        4                 533 MHz   18           2.4 GHz
Athlon XP 2400+   133 MHz        2                 266 MHz   15           2.0 GHz
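
To see how these numbers hang together, here is a minimal Python sketch (the function name is invented for illustration) that reproduces the table above from the clock, the transfers per tick, and the multiplier:

    # Sketch: FSB = socket clock x transfers per tick; CPU speed = clock x multiplier.
    def describe(name, clock_mhz, transfers_per_tick, multiplier):
        fsb_mhz = clock_mhz * transfers_per_tick
        cpu_ghz = clock_mhz * multiplier / 1000.0
        print(f"{name}: FSB {fsb_mhz:.0f} MHz, internal speed {cpu_ghz:.1f} GHz")

    describe("Pentium 4 2.4", 100, 4, 24)    # FSB 400 MHz, 2.4 GHz
    describe("Pentium 4 2.4A", 133, 4, 18)   # FSB 532 MHz (sold as 533), 2.4 GHz
    describe("Athlon XP 2400+", 133, 2, 15)  # FSB 266 MHz, 2.0 GHz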

The earliest PC had one clock, and its signal was applied to the CPU, memory, and all the I/O devices.

A modern PC has many different clock signals for different areas of the machine. Clocks are generated by the mainboard. Their speed is often set in the BIOS setup panels that appear when the user presses DEL or another key during the power up boot.

CPU socket clock

The mainboard generates a clock signal that paces the transfer of data to and from the CPU. Data from the CPU may be going to memory, to the AGP video card, or to an I/O device. The mainboard may sense the CPU chip and set the clock based on the manufacturer's recommendation, or it may provide a BIOS setup panel that lets the user adjust the clock value. The standard values tend to be 100, 133, 166, or 200 MHz.

Front Side Bus (FSB)

The CPU transfers data to the "Northbridge" chip on the mainboard. From there it can go to memory, the video card, or the I/O bus. An Intel CPU transfers data 4 times for every cycle of the CPU socket clock. So while the actual clock speed may be 200 MHz, an Intel CPU chip is typically described as having an 800 MHz Front Side Bus. AMD is more complicated. The old 32 bit Athlon processors transferred data only twice per clock cycle. With a CPU clock of 166 MHz, the FSB is 333 MHz. However, the new Athlon 64 CPU chip has its own integrated memory controller and a high speed integrated HyperTransport I/O bus. FSB numbers would be meaningless. There is no Northbridge chip between the CPU and other devices. The CPU can use its direct connection to memory while at the same time performing high speed I/O to video or other devices.

Multiplier

The CPU generates an internal clock that runs faster than the mainboard clock. If the mainboard clock is 100 MHz and the CPU "multiplier" is 24, then the internal clock cycles 24 times for every tick of the mainboard clock, producing a CPU speed of 2.4 GHz. The same 2.4 GHz can also be produced by applying a multiplier of 18 to a mainboard clock running at 133 MHz. The multiplier is manufactured into the CPU chip and cannot be changed.

Memory


Modern mainboards generate a separate clock to the memory. As it happens, the current memory clock rates are also 100, 133, 166, and 200 MHz. Some motherboards generate this clock as a completely independent number, while others express it as a ratio to the CPU bus clock. DDR (double data rate) memory transfers data twice per cycle (on the tick and again on the tock) and is therefore often quoted as having a speed that is twice the actual clock speed (200, 266, 333, or 400 MHz).

PCI Bus

The PCI standard calls for a 33 MHz clock speed. Some systems generate this independently, but most systems simply divide the 100 MHz CPU bus clock by 3 or the 133 MHz clock by 4. This is fine as long as you stick to the standard values. If you use the BIOS to nudge the CPU up slightly to a non-standard value like 110 MHz, then the PCI bus will also be running fast. At some point, one of the adapter cards will be far enough out of spec that it will become unreliable.
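
A small Python sketch (with made-up values) shows why nudging the CPU clock pushes the PCI bus out of spec when the PCI clock is derived by division:

    # Sketch: the PCI clock is usually the CPU bus clock divided by 3 or 4.
    def pci_clock(cpu_clock_mhz, divisor):
        return cpu_clock_mhz / divisor

    print(f"{pci_clock(100, 3):.2f}")   # 33.33 MHz - in spec
    print(f"{pci_clock(133, 4):.2f}")   # 33.25 MHz - in spec
    print(f"{pci_clock(110, 3):.2f}")   # 36.67 MHz - about 10% over spec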

The Front Side Bus connects the CPU to memory. If the FSB is running at an effective rate of 800 MHz but the fastest memory is 400 MHz, then the CPU gets no benefit from its data transfer ability. The newest high performance mainboards have two separate memory buses. DDR memory has to be installed in pairs. A memory reference is split between the two 400 MHz buses, producing an 800 MHz aggregate transfer rate that matches the speed of the CPU.

BIOS Setup

Each time the computer powers up, the mainboard senses the type of CPU that is installed. It can sense if the type of CPU has changed. Initially, the CPU clock speed will be set to whatever value is standard for this particular model of processor. Similarly, the mainboard determines the type of memory and sets speeds and timings to match the slowest type of memory installed in the system.

After the mainboard has been shipped to customers, the CPU vendor may add new processor models. Existing mainboards can be updated to correctly handle these new CPU chips by updating a set of programs called the BIOS. Unlike ordinary software, the BIOS is stored in read-only memory on the mainboard and it provides programming for the chipset instead of the CPU. However, if the new CPU chip fits in the same socket and uses the same voltage levels as the older processors, then an alternative to updating the BIOS is to manually enter all the right speeds and timings into the BIOS configuration screens displayed if you press Del or F2 just as the computer begins to power up.

More aggressive computer users may enter values that are faster than the numbers published for their CPU chip. This practice is called "overclocking." Because processors are tested beyond their rated speed, almost any CPU can be overclocked by 5% or 10%. More than that may require special cooling.

Nanoseconds

All the ads and specifications quote clock speed in Megahertz. However, the more important number is the length of time between clock ticks (the cycle time). Such periods are usually measured in nanoseconds (billionths of a second) abbreviated "nsec."

Electricity travels through a copper wire just a bit slower than the speed of light. Normally, we can just regard the speed of light as "very fast." It becomes important when the distances are very long (astronomy) or when the times are very short (computers). A nanosecond is the amount of time that it takes light (or an electric signal) to travel about one foot.


PC clock speeds appear at first to be a strange collection of numbers. However, the corresponding cycle times display a much more regular pattern:

Clock     Cycle      Bus
33 MHz    30 nsec    PCI (general adapter cards)
66 MHz    15 nsec    AGP (video adapter)
100 MHz   10 nsec    mainboard clock to the CPU
2 GHz     0.5 nsec   CPU internal clock after multiplier
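
The pattern is simply cycle time = 1 / frequency. A short Python sketch reproduces the table:

    # Sketch: cycle time in nanoseconds is 1000 / (clock in MHz).
    for name, mhz in [("PCI", 33), ("AGP", 66), ("mainboard", 100), ("CPU", 2000)]:
        print(f"{name}: {mhz} MHz -> {1000.0 / mhz:.1f} nsec")
    # PCI: 30.3, AGP: 15.2, mainboard: 10.0, CPU: 0.5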

A processor with a 2 GHz clock must perform operations in less time than it takes for light (or electricity) to travel 6 inches. The chip is very small, but it has millions of circuits. All must be manufactured to a very high level of precision.

However, it is much simpler to apply quality control to a chip the size of a fingernail than to the entire mainboard. This by itself shows the problem with a higher speed main clock, and the benefit of capping the I/O bus design at 33 MHz (30 light-feet of signal distance).

Instructions per Cycle: Get in Gear

To add up a column of numbers with a pocket calculator, you simply type each number in and press the "+" key (or the "=" key at the end). Most users probably think that a PC spreadsheet program does the same thing. However, the human brain has actually been doing the hard part of the operation, moving down one row in the column, focusing on the number, and recognizing it. Each PC instruction carries with it a number of additional operations that would not be obvious to the casual user.

First, the computer must locate the next instruction in memory and move it to the CPU. This instruction is coded as a number. The computer must decode the number to determine the operation (say ADD), and the size of the data (say 16-bits). Additional information is then moved and decoded to determine the location in memory (the row and column of the spreadsheet). Finally, the number is added to the running total. Although a human might take some time to add two eight digit numbers together, the addition is the simplest part of the operation for a computer chip. Decoding the instruction and locating the data take the most time.

Each generation of Intel CPU chip has performed this operation in fewer clock cycles than the previous generation.

• A 386 CPU required a minimum of 6 clock ticks to add two numbers.

• A 486 CPU could generally add two numbers in two clock ticks.

• A Pentium CPU could add two numbers in a single clock tick.

• A modern processor can add two to six pairs of numbers in a single clock tick. If it discovers that the next instruction needs data that hasn't arrived from slow memory, it can rearrange things to execute subsequent instructions until the data arrives.

To make a car go faster, one steps on the accelerator. Extra gas makes the engine rotate faster. When RPM gets high enough, it is better to shift to a higher gear. The PC system clock (measured in MHz) is like the engine speed (measured in RPM). The CPU model selects the gear. The original 8086 processor was like first gear, and the 486 is like fourth gear. So it is a mistake to compare clock speed across changes in the architecture.

This explains the current difference between Intel and AMD chip speeds. AMD has more internal processing units, so it executes more instructions at the same clock speed. AMD therefore quotes its processor by the equivalent Intel processor speed and not the actual clock.


CPU, Instructions

A computer chip can do simple arithmetic, compare numbers, and move numbers around in memory. Everything else, from word processing to browsing the Web, is done by programs that use those basic instructions. CPUs get faster in three ways. First, better designs can do the simple operations faster. Second, better designs can do as many as six simple operations at the same time in different areas of the CPU. Third, since a lot of time is lost if the CPU has to wait for data from slower memory, techniques that reduce the memory wait time appear to speed up the CPU.

1 Jan, 2007

CPU Design

It's all Numbers [but not Math]

At the hardware level, a computer executes sequences of individual instructions. Each instruction tells the computer to add, subtract, multiply, or divide two numbers, compare numbers to see if they are equal or which is larger, and move numbers between the CPU and a location in memory. The rest of the instructions are mostly housekeeping.

Everything in a computer is represented as numbers. Each memory location has a numeric "address" that identifies it. Each I/O device (disk, CD, keyboard, printer) has a range of assigned address numbers. Every key pressed on the keyboard generates numbers. Every dot on the computer monitor has an address, and the color of the point is represented by three numbers that mix the primary colors of red, green, and blue. Sound is represented as a stream of numbers.

Consider the automatic error correction in a word processor. If you type in " teh " the computer appears to recognize the common misspelling and changes it to " the ". What does this have to do with numbers? Well, every character that you type, including the space bar, transmits a code to the computer. The code is ASCII, and in that code a blank is 32, "a" is 97 and "z" is 122. So the computer sees " teh " as the sequence 32 116 101 104 32. The word processor has been programmed to check for this sequence, and when it sees it it exchanges the 101 and 104. The CPU chip doesn't know about spelling, but it is very fast and accurate handling numbers.
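
A Python sketch makes the point that this is pure number handling (the check is simplified to just this one sequence):

    # Sketch: " teh " arrives as ASCII codes; swapping 101 ('e') and 104 ('h') fixes it.
    codes = [ord(c) for c in " teh "]
    print(codes)                              # [32, 116, 101, 104, 32]
    if codes == [32, 116, 101, 104, 32]:
        codes[2], codes[3] = codes[3], codes[2]
    print("".join(chr(n) for n in codes))     # " the "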

You might then think that the speed of the computer is determined by how fast it can add. People expect this because adding large numbers takes us a long time. Ask someone how much is 2+2 and they will respond immediately 4. Ask how much is 154373 + 382549 and they will stop for a minute and take out a pencil. A computer adds numbers with electronic circuits that work just as fast for large or small numbers. Arithmetic is what computers do best, and they do it almost instantly.

If I ask you to add 2+2, you can do it immediately. Now suppose I put two numbers in different rooms of your house, write the name of the room on a sheet of paper, put the paper in an envelope, and ask you how long it will take to find the numbers and add them. You won't know until you open the paper and find out where the numbers are. It gets worse if some of the numbers are in other houses in the neighborhood and I put the address of the house on the paper instead of just the room name.

The CPU has different places to store numbers. It has 8 or 16 "registers" which require no delay at all. It has "L1 cache" which is almost instantaneous, and L2 Cache which is just a little slower. Then it has main memory. Memory is very fast these days, but it is so slow compared to the speed of the CPU that you waste hundreds of instructions waiting for a response.


The computer processes instructions through a sequence of steps. First you have to read the next instruction itself, which may be in cache or it may be in memory and have to be fetched. Then the computer decodes the instruction to determine what is to be done and, more importantly, where the data that the instruction needs is located. It may be in registers, L1 cache, L2 cache, or memory. The CPU has to fetch the data, then turn the instruction over to one of the processing units. There are many preliminary steps, and then several processing steps. So the CPU processes instructions through a "pipeline" that behaves like an assembly line (where the work comes to the workers) or like a cafeteria line (where the users come to the food).

A CPU is measured by how many instructions it can process in a second, not by how long it takes to process any single instruction. Consider a fast food counter. They have a bunch of lines, several people working the counter, and lots of people in back cooking the food. They measure themselves by how many customers they serve in any period of time. When you come to the front of the line, the item you want may be temporarily unavailable and you have to step aside. It might take you unusually long to get your burger, but lots of other people are being served during the period. To you, the service is slow. To the business, they are moving lots of people through.

In the same way, a CPU is designed to fetch programming, fetch data, and execute instructions. Sometimes a particular instruction needs data that is not immediately available. All modern processors can push the instruction aside and have it wait while subsequent instructions are serviced. Speed is measured by the overall throughput of the chip.

The High School Analogy

The first generation of PC CPU chips was like a one room schoolhouse. A class of students could enter and be seated. The first period would be English. When the bell rings, they switch books and take a period of Math. Then History, a Language, and finally Science. After the last subject, the school day is done. However, in the computer version of the "school" another class of students immediately enters the building and begins their subjects.

If you want the school to educate students more efficiently, you could try to shorten the periods (speed up the clock). However, you can also speed up things by building more classrooms. That is what happened with the 286, 386, and 486 generations of chips. In a school designed like a 486, there is one classroom for each subject. When the bell rings, the students in the English room move to Math, the Math students move to History, and so on. The students in the last class, Science, leave the school. A new class enters and sits down in the English classroom to begin their sequence of subjects.

Each new generation of chips typically triples the number of circuits of the previous generation. So the fifth generation chip, the Pentium, added a complete second set of classrooms. Now two groups of students would take each subject at the same time.

If the first five generations of CPU acted like a grade school and then a high school, processors after the Pentium II act a bit like college. The chip has some larger number of internal instruction processing stations. Some handle integers, and some handle floating point numbers. Instructions enter execution and are given a sequence of operations they need to perform. In a sense, the instructions wander around from station to station with some level of independence. Some instructions get done quickly, some take longer. However, there is a rule that the instructions must end in the order in which they began. So the instructions that get done quickly have to wait at the exit for the slower instructions that entered before them to finish up.

This analogy also explains an important detail about the clock rate. Speeding up the clock doesn't tell the computer to do anything faster. Each circuit performs its operation as fast as it can. The clock tells the circuits when to begin the next set of operations. If the clock is too fast, the next operation begins before the previous operation is complete, the data is corrupted, and the system crashes.

Dependent Instructions

Suppose you want to add three numbers together:

5 + 22 + 7

A person and a computer program will first add 5 to 22 getting 27. Then adding 27 to 7 gets 34. Two operations are performed. Since the second operation uses the result (27) from the first operation, they have to be done in order.

Now consider adding four numbers together:

5 + 22 + 7 + 18

A person will accomplish this by appending a third operation that adds the 34 calculated by the first two operations to 18 to get 52. However, a computer can perform more than one numerical operation at the same time, provided that the two operations are independent of each other. So if you want to optimize this for a modern PC, you would arrange the instructions as follows:

1. Add 5 and 22 (27)

2. Add 7 and 18 (25)

3. Add the results of the previous two steps, 27 and 25, together (52).

Since steps 1 and 2 don't depend on each other's results, they can both be run at the same time. Step 3 requires the results of both previous steps, so it runs in the next cycle. As a result, the computer can add four numbers together in the same two cycles it took to add just three numbers together, because the first two operations can both run in the first cycle at the same time.
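
As a toy model (purely illustrative, not how real hardware is programmed), a Python sketch can count the cycles needed when up to two independent adds may issue per cycle:

    # Sketch: ops with literal inputs are independent; "z" must wait for "x" and "y".
    ops = {"x": [5, 22], "y": [7, 18], "z": ["x", "y"]}
    done, cycle = {}, 0
    while len(done) < len(ops):
        cycle += 1
        visible = dict(done)                 # results available at the start of this cycle
        issued = 0
        for name, args in ops.items():
            if name in done or issued == 2:  # finished already, or both issue slots used
                continue
            if all(not isinstance(a, str) or a in visible for a in args):
                done[name] = sum(visible[a] if isinstance(a, str) else a for a in args)
                issued += 1
    print(cycle, done["z"])                  # 2 cycles, result 52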

The original Pentium chip could execute two instructions at the same time, provided that they were not dependent on each other. It required the programmer or compiler to arrange the instructions in an optimal order. The Pentium II, III, and Pentium 4 CPU chips internally rearrange instructions when they are not dependent on prior results, so optimization doesn't depend as much on how the program is coded.

Registers

All computers designed in the last forty years hold data in "registers". If you are adding up a column of numbers, the register holds the running total. If you are scanning a document for spelling errors, a register keeps track of your location in the document.

The original 16-bit Intel CPU design had a very small number of highly specialized registers known by letters. As it happened, the letters were associated with words that described their use. If you were adding up numbers in the column of a spreadsheet, the A register "accumulated" the total, the B register was the "base" and pointed to the column or cell, and the C register held the "count" of the number of cells remaining to be added.

In 1986 Intel introduced the 386 CPU with a new set of 32-bit instructions. The original seven highly specialized 16-bit registers became seven largely interchangeable general purpose registers. However, it was not until nine years later that Microsoft released a generally available operating system (Windows 95) that made use of the 386 instructions and registers.

The 32-bit instruction set of the 386 chip has survived for almost 20 years. Meanwhile, Moore's Law tells us that the number of circuits on a chip doubles about every 18 months. Hardware is much easier to change than all the software. A modern CPU chip has a lot more than 7 registers, but they are invisible to the user and even to the operating system.

A program may have a sequence of operations that one after another "accumulate" different totals into the A register. In each step, a different "count" may be loaded into the "C" register. However, each of these operations may be independent of the others. Under the covers, the CPU may recognize this and speed up processing by allowing operations to run in parallel. In doing so, the CPU will assign two real registers to pretend to be the A and C registers for one group of operations, while a different pair of real registers will pretend to be A and C for different operations. Of course this pretending is complicated and only goes so far.
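
A toy Python sketch (the register names and the pool of physical registers are invented) shows the renaming idea: each new use of "A" or "C" is quietly given a fresh physical register:

    # Sketch: map each new use of an architectural register to a free physical one.
    free = ["p0", "p1", "p2", "p3"]
    table = {}

    def rename(arch_reg):
        phys = free.pop(0)        # grab the next free physical register
        table[arch_reg] = phys    # later reads of arch_reg use this physical register
        return phys

    # Two independent accumulation groups land in different physical registers,
    # so the CPU can run them in parallel.
    print(rename("A"), rename("C"))   # p0 p1
    print(rename("A"), rename("C"))   # p2 p3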

In 2004 AMD introduced its Athlon 64 family of processors with a 64-bit instruction set. Initially Intel resisted, but it has finally caved in and cloned the AMD design. Server programs benefit from the ability to use more than 4 Gigabytes of memory, which after all is only about $400 worth of memory. However, for every type of program the more important feature of the new 64-bit instructions may be a new set of 8 registers that compilers can now use to optimize program execution. Early tests suggest that many programs run 20 to 30% faster thanks to the extra registers.

Memory Access Delay

Memory is a lot slower than the CPU. If an instruction requires data that is out in the main memory of the computer, it may have to wait for a period of time equal to the processing of hundreds of instructions. Since some of the subsequent instructions will depend on the results of this previous operation, the CPU will halt waiting for memory.

To get around this problem, a CPU has two types of internal high speed memory to hold recently used instructions and data. This high speed memory is called "cache".

The best type of internal memory is the Level 1 (L1) cache. This memory is part of the CPU core along with the units that decode instructions and perform arithmetic. If the instruction and data are in L1 cache then the CPU can execute at full speed. The modern Intel processors have 32K of L1 internal cache. Competing processors from AMD have even more.

When the instruction or data is not found in the L1 cache, modern processors have a larger amount of Level 2 cache integrated into the CPU chip. Different chips have 128K, 256K, or 512K of L2 cache depending on cost and technology. A "Pentium 4" chip always has more L2 cache than the less expensive "Celeron" chip of the same generation. Access to the L2 cache may delay an instruction for several clock cycles, but the CPU chip will often be able to reorder instructions and keep busy during the period.
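
With assumed (not measured) hit rates and delays, a short Python calculation shows why the caches matter so much:

    # Sketch: average memory access time = sum of (fraction of accesses x cost).
    l1 = 0.95 * 1      # ~95% hit in L1 at ~1 cycle
    l2 = 0.045 * 10    # most of the rest hit in L2 at ~10 cycles
    ram = 0.005 * 300  # rare misses pay hundreds of cycles
    print(round(l1 + l2 + ram, 2))   # ~2.9 cycles on average, instead of 300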

The main memory of the computer is Synchronous Dynamic Random Access Memory (SDRAM). This memory is measured in units of 128 or 256 megabytes. The CPU and memory are connected by the component of the mainboard called the Northbridge. The CPU transmits data two (AMD) or four (Intel) times per tick of a 100, 133, 166, or 200 MHz clock. The memory transmits data two times per tick of a 100, 133, 166, or 200 MHz clock.

It sounds like they are matched, but there is a missing number. After the computer generates the address of the desired memory location, there is a delay called the "latency" before the memory begins to respond. Then it transfers data at the rated clock speed. The problem is that the latency is measured in tens of nanoseconds, while a modern CPU can execute 12 to 24 instructions per nanosecond.

Latency is the performance killer. In the time it takes to fetch a new byte of data from a new address, the CPU could have executed hundreds of instructions. By reordering subsequent instructions that do not depend on the results of this memory fetch, a CPU might continue to run for a few dozen instructions, but then it will stop. Even if the L1 and L2 cache handle more than 99.5% of all data requirements inside the CPU itself, the latency delay may mean that a typical CPU with a typical workload spends half its time waiting for the memory to respond while executing no instructions.

Older SDRAM was classified by the clock speed. PC100 and PC133 memory runs at clock rates of 100 or 133 MHz. Today's computers use Double Data Rate (DDR) SDRAM, and the naming convention changes to use the clock times two (for DDR) and then times eight for the 8 bytes per transfer. So DDR memory at 100 MHz is represented as PC1600, 133 MHz is PC2100, and 166 MHz is PC2700.
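
The naming is just arithmetic, as a quick Python sketch shows:

    # Sketch: module rating = clock x 2 (DDR) x 8 (bytes per transfer), then rounded.
    for clock in (100, 133, 166, 200):
        print(f"{clock} MHz DDR -> PC{clock * 2 * 8}")
    # 100 -> PC1600, 133 -> PC2128 (sold as PC2100),
    # 166 -> PC2656 (sold as PC2700), 200 -> PC3200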

RISC Architecture

The first Intel "CPU on a chip" was the 4004 processor. It was more like a pocket calculator than a real computer. It handled ordinary base 10 digits encoded as four bits. Later chips added the ability to handle 8 bit, 16 bit, and 32 bit numbers. So on a modern Intel CPU chip there is no single Add instruction. Instead, there are separate Add operations for digits, bytes, and every other size of number. The resulting set of possible instructions is a mess. This is typical of a "Complex Instruction Set" computer chip.

In your Sunday paper, right next to the CompUSA insert there is probably something from Sears. Look at the last few pages of the ad, where they show the tools. There will almost certainly be a picture of the traditional "190 Piece Socket Wrench Set." If you purchased this item, you would always have the right tool for any job. In reality, it is almost impossible to keep all the pieces organized, and you will spend minutes searching through all the attachments to find one of the right size.

Go to a tire store. They lift your car off the floor, remove the hubcaps, and then pick up a gun shaped device connected to a hose. "Zuuurp" and each bolt comes off the wheel. You could do the same thing with the 190 Piece Socket Wrench Set, but every garage knows that automotive wheel bolts come in only one size. So they don't have to spend time searching for the right size tool, and they can optimize the one size that they really need.

When computer designers realized the same thing, it was called Reduced Instruction Set Computers or RISC. Make all the instructions the same size. Use only one size of data. Simplify the instructions and therefore the operation decode. Then use all the room on the chip to optimize what is left, rather than filling the chip with support for instructions that are seldom executed.

Today the RISC philosophy of CPU design is represented by the IBM Power line of processors used in the XBox 360, PS/3, and Wii, and also in big IBM Unix computers. Sun also has its SPARC family of chips. However, the advantage of a Reduced Instruction Set turned out to be most important in the period when chips had 2-3 million transistors (the period of the late 486 chips and the early Pentium chips). When the PowerPC was first announced, it was billed as having "the power of a Pentium at the price of a 486."

Every 18 months the CPU chip doubles the number of transistors it can hold. Today's CPU has hundreds of millions of transistors. With transistors to spare, it quickly became unimportant to restructure the work just to simplify the design of the chip. RISC today has its greatest effect in video game consoles, where the computer program is specifically designed for the hardware and maximum performance is worth the extra investment in design.

Pipeline, Superscalar

Although a tire store may be fast at changing tires, when you really need speed look at how they do things in Indianapolis. A race car pulls into the pit for service. They jack it off the ground, and then four teams of mechanics go to work on all four wheels simultaneously. The car is back in the race in a matter of seconds. In ordinary life, such service would be prohibitively expensive. But in the world of microelectronics, transistors are cheap.

A pipeline is the sequence of processing stations that decode instructions, fetch data, perform the operation, and save the results. Inside the CPU, instructions are processed at a sequence of stations that resemble an assembly line. Fifteen years ago, a CPU would process instructions in five or six steps. Each step is completed in one clock cycle.

In order to speed up the clock, it is necessary to break the processing down into smaller steps that can be accomplished in the shorter clock cycle. A modern Intel CPU may have a pipeline with 40 steps in it. One instruction in the program occupies each step. At each tick of the clock, all of the instructions advance one step forward in the pipeline. An instruction may finish at the end of the line, and a new instruction may enter at the beginning.

Pipelines have a potential problem whenever the program encounters a branch instruction. This is a decision point where the program will continue by executing one of two alternate paths of new instructions. The problem is that the CPU will not really know which of the two paths will be taken until the branch instruction is at or near the end of the pipeline. To keep the pipeline full, the CPU has to guess which of the two alternate instruction paths will be executed and begin processing it through the pipeline. If this "branch prediction" is wrong, then the partially executed path has to be abandoned, and the correct path has to enter the pipeline at the beginning. A mistake in branch prediction can cause the CPU to miss around 30 clock cycles of execution.
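
A back-of-the-envelope Python sketch (all figures assumed for illustration) shows how much throughput mispredictions can cost:

    # Sketch: amortize the flush penalty over all instructions.
    base_ipc = 3.0         # instructions per cycle with perfect prediction
    branch_freq = 0.2      # one instruction in five is a branch
    miss_rate = 0.05       # the predictor is wrong 5% of the time
    penalty = 30           # cycles thrown away per mispredict

    cpi = 1.0 / base_ipc + branch_freq * miss_rate * penalty
    print(round(1.0 / cpi, 2))   # ~1.58 - nearly half the base throughput is lost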

A computer is superscalar when it can execute more than one instruction per clock cycle. The pipeline discussion talked about one instruction ending and one beginning at every clock tick. A Pentium 4 CPU can actually start or terminate up to three instructions in a clock cycle. Along the pipeline, most of the processing steps are duplicated. The CPU can be adding two or more pairs of numbers at the same time.

However, one of the things that makes a Pentium 4 or AMD CPU so complicated is that this ability to execute more than one instruction at a time has to be completely hidden from the program. The program is written to execute one instruction after the other, and the CPU produces results that exactly duplicate this behavior. So to use the extra processing power, the CPU chip must have a large amount of complex control logic to detect when two instructions that the program is written to execute one after the other are actually independent and can really be executed at the same time.

SIMD

There are two processing units in a typical home computer. The CPU is made by Intel or AMD, and it is the chip you normally hear about. However, in most systems there is actually a second chip that, in raw computational ability, is a much more powerful computer. It is the main chip on the video card, the Graphics Processing Unit or GPU.


The GPU is not the kind of general purpose computer for which you could write an operating system or applications. It does a small number of things over and over, but it is very fast when doing them. It also has some local memory that may be faster than the main memory of your computer.

What makes the GPU so powerful? Data is displayed on the screen as a set of dots. Each dot is represented by three numbers for the three colors. Three dimensional applications (mostly video games) execute mathematical operations to calculate the correct values for each color of each dot in some area of the screen. Video images are compressed into the MPEG 2 streams of a DVD or HDTV by comparing the colors of adjacent dots with trial and error to find a mathematical sequence that can generate the same image pattern while occupying considerably less memory.

This can always be done one instruction at a time, but it is repetitive. More importantly, whatever you do to one dot you also have to do to the next dot and the one after it.

In a square dance, someone stands at the microphone calling out the next step. In unison, all the dancers on the floor do the same thing, then the caller announces another step.

You can design a processing unit the same way. One part of the processor reads the program and determines what the next operation should be. However, unlike a PC CPU, the instruction doesn't apply to one number or a pair of numbers. Instead, a whole line of numbers has been loaded into the unit, and the one instruction applies to all of them at the same time. This is called SIMD, for "Single Instruction, Multiple Data". Thirty years ago on big room sized mainframe computers, it was called "vector processing."

Fifteen years ago, the first SIMD chips began to be used in Personal Computers. They weren't powerful enough to be used for video applications, but they could provide support for the much less complicated processing of audio data. Such chips are called DSPs, for "Digital Signal Processor". They could be used for everything from computer modems to removing the sound of scratches from old phonograph records. Today, CPUs and SIMD are much faster and more powerful.

There is a small amount of SIMD capability built into the Intel and AMD CPU chip. It is used to support multimedia and games. In an Intel chip, it is called MMX, SSE, SSE2, and SSE3. AMD SIMD is called "3DNow!", and like Intel it has gone through several generations.

A number of vendors are building specialized SIMD CPU chips that are somewhere between the highly specialized design of the GPU and the general design of the CPU. Sony and IBM are collaborating on the "Cell" processor for the Playstation 3. Smaller vendors offer boards that can plug into a conventional PC to speed up the processing of games or scientific computation.

More Core or Fusion?

Intel is now selling its Core 2 Duo chips, while AMD is selling Athlon 64 X2 chips. These chips have two internal CPU processors instead of one. The ability to run two different programs at the same time improves power and smoothes performance. Intel will be bringing out a Core 2 Quad chip with four CPUs later in 2007. Unfortunately, for a desktop rather than a server workload, each additional processor is less useful than the previous one.

AMD just purchased ATI, one of the two leading vendors of video Graphics Processing Chips, and they have announced a different strategy they call "Fusion". After two conventional CPU cores, the additional cores may be based on a SIMD GPU design. They would behave more like devices than computers. The operating system would assign these devices to the kinds of programs that can use them: games, HDTV recoding, Photoshop, voice recognition, etc.


Specialized SIMD cores would be of no use to Office applications or when you are just browsing the Web. However, these applications can hardly use all the processing power in a single CPU, let alone a dual core system. Today the only types of applications that run 100% CPU busy are exactly the applications that can benefit from a SIMD parallel processing capability that will perform specific operations 10 or 20 times faster than a conventional CPU.

The CPU Market

A CPU can execute two or three instructions per cycle. Memory continues to have a delay (the "latency") of around 50 nanoseconds between the time the CPU makes a request for data and the time that the memory can respond with the data. If the CPU has to wait for memory, a 3 GHz processor could have executed, conservatively, 2 (instructions per clock) times 3 (cycles per nanosecond) times 50 nanoseconds, or 300 instructions, in the time it takes the memory to respond. Making the CPU run faster doesn't help.
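
The arithmetic from that paragraph, written out as a Python sketch:

    # Sketch: instructions the CPU could have executed during one memory wait.
    per_clock = 2        # instructions per clock, conservatively
    ghz = 3              # 3 GHz = 3 cycles per nanosecond
    latency_nsec = 50    # typical memory latency

    print(per_clock * ghz * latency_nsec)   # 300 instructions lost per wait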

Fortunately, programs use the CPU in only a few common patterns:

• An interactive user running Office or a Web browser uses the CPU only in short bursts. The computer sits around waiting for the next keystroke or for some data to arrive over the network. If the CPU were able to respond in .01 seconds instead of .02 seconds, the human being would not notice the difference.

• A computer processing a stream of video data or running a video game has a lot of processing to do. The CPU is 100% busy. Most of the time, the program fetches the next byte of data from the stream. Memory access can be predicted, so the CPU seldom has to wait for an unexpected reference to a random memory location.

• A computer acting as a Web or application server, however, runs hundreds of small programs on behalf of thousands of remote user requests. No single request uses a lot of CPU. There is no way to anticipate the next request or the data that it will need from memory. This is the kind of usage pattern where the CPU is most likely to have to wait for memory to respond with data, and because the server handles so many remote users, it is also the case where performance is most important. If the CPU supports HyperThreading, then when the instructions for one thread block waiting for memory, there is an entirely different thread sitting in the CPU with an independent set of instructions able to execute until memory responds or, by bad luck, until the second thread also blocks for memory.

Currently Intel and AMD have four families of CPU chip.

1. Core 2 Duo or Solo ("Mainstream"). Plugs into a socket with 775 pins and transfers data four times per tick of a 200 to 333 MHz clock (800 to 1333 MHz FSB). Internally it runs at a speed around 3 GHz. Prices run from $180 to $600 per chip. AMD calls this the Athlon 64 X2.

2. Celeron ("Value"). An inexpensive version of the mainstream chip that plugs into all the same boards. It has slower internal clock speeds, slower FSB speed, and smaller cache. Celeron prices, however, are around $100. AMD calls this the Sempron.

3. "Mobile". Various versions of the single core mainstream chip that have been optimized to run at very or ultra low power. Some versions of this chip drop the battery use from the conventional desktop 60-90 watts down to as low as 10 or even 5 watts. AMD calls this the Turion.

4. Xeon ("Server"). This is a version of the mainstream chip that has been modified so a mainboard can have two or more CPU chips. Xeon was also the first family to roll out a Quad (4-core) chip. Intel has demonstrated servers with four quad-core Xeon chips, providing a total of 16 CPUs. AMD calls this the Opteron.


Currently (start of 2007) Intel has a manufacturing advantage over AMD. Each new generation of chip is measured by the width of the smallest circuit features. Intel is at 65 nanometers and is working on the next generation of 45 nanometers. AMD has just begun to ship its first 65 nanometer chips. The Intel Core 2 Duo design is also a bit faster.

AMD has been smarter than Intel in all its strategic decisions. AMD came out with a 64 bit architecture that Intel resisted for years but finally gave up and had to copy. AMD chips have an integrated memory controller and HyperTransport bus that Intel resisted for years, but now Intel will introduce its version of the same design in 2008. AMD is working on Fusion, and maybe Intel will have to match that initiative a few years later.

For consumers, however, none of this really matters. You have chips, the chips have power and a price. Either CPU vendor can take the "lead" in performance simply by reducing the price of its chips.

Hyperthreading and Multi-Core

The processing steps of a computer program can be decomposed into a set of independent "threads". To display a Web page, the Browser has to read in the page itself plus each individual file representing the pictures and ads displayed within the page. Then the text has to be arranged on the page and each picture has to be decompressed. Finally, the page has to be arranged and displayed on the screen. Each of these operations can be assigned to a thread. If a computer has (or appears to have) two CPUs, Windows will assign a separate thread to each processor and the computer will process two different streams of data at the same time.

No matter how fast Intel makes its chip, a modern CPU spends 50% or more of its time waiting for data to arrive from main memory. This is only getting worse, because CPU speed increases much more quickly than memory speed. A larger cache provides some help. Another idea, however, is for the CPU to have some way to switch from the instruction and thread that is blocked waiting for data to another thread that is ready to execute. This is the idea behind "Hyperthreading". Each CPU pretends to be two processors. The OS assigns a thread to each pretend processor. When one thread is blocked waiting for data, the CPU can switch over to the other thread and get more work done.

1 Jan, 2007

Hyper-Threading and Multi-Core

Threads

Consider the problem of cooking for a big dinner party. Each dish has its own recipe. You could follow the instructions in one recipe until that one dish is done, then set it aside and start the next dish. Unfortunately, it would take several days to cook the dinner, and everything would come out cold. Fortunately, there are long periods of time when something sits in the oven, and while it is cooking you can prepare one or two other things.

A sequence of instructions to do one thing is called a "recipe" in the kitchen, and a "thread" in computer programming. A computer user intuitively understands the behavior of threads when running several programs on the screen, or when listening to an MP3 file in the background while typing a letter into the word processor. Even a single program can make use of threads. The Browser has separate threads for every file or image you are downloading, and it may assign a separate thread to decode each image or banner ad that appears on the screen when you visit the New York Times web site.
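
A minimal Python sketch of the same idea (the page elements and delays are made up; time.sleep stands in for network and decode work):

    import threading
    import time

    def fetch_and_decode(name, seconds):
        time.sleep(seconds)          # pretend to download and decompress
        print(name, "ready")

    parts = [("banner ad", 0.2), ("photo", 0.5), ("logo", 0.1)]
    threads = [threading.Thread(target=fetch_and_decode, args=p) for p in parts]
    for t in threads:
        t.start()                    # all three overlap
    for t in threads:
        t.join()                     # total wait ~0.5s, not 0.8s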


Some short operations have a very high priority. For example, a pot of rice you just started has to be checked every 30 seconds or so to see if it has come to a full boil. At that point the heat can be turned down, the pot can be covered, and now you can forget it for 15 minutes. However, if you don't check it regularly at first, it will boil over, make a mess on the stove, and you will have to start over.

Computer programs also assign a priority to their threads. As with cooking, high priority can only be assigned to trivial tasks that can be accomplished in almost no time at all. Just as a kitchen has to have timers, and a beep when the microwave is done, so the operating system has to have support for program threads and the ability to connect them to timers and to events signaled when data arrives from the network or another device.

In the kitchen, each task you perform has its own set of tools. To chop carrots, you need a knife and a cutting board. To take something from the oven, you need oven mittens. It takes some small amount of time to set down what you are doing and change. If you don't change, you will find it is very difficult to cut carrots while wearing oven mittens.

Each thread in the computer stores its status and data in the CPU chip. To switch threads, the operating system has to take this data out of the CPU, store it away, and load up data for the other thread. Switching from one thread to another takes a few hundred instructions, but this is not a problem when the CPU can execute billions of instructions a second while a hard drive or network performs only about 30 operations per second. The overhead of thread switching for I/O is trivial.

If it is a big complicated dinner that one person simply cannot get done in time, you need some help. Specific tasks can be assigned to different people. The threads don't change. The bread is still cooked the same way whether there is one person in the kitchen or two. With two people, however, one can chop carrots while the other peels potatoes.

Modern operating systems support computers with more than one CPU chip. The system assigns one thread to run on one CPU, and another thread to run on the next CPU. The two threads run concurrently. However, such systems are expensive and are typically found only in big servers or engineering workstations. Desktop and laptop computers have come with only one CPU.

Hyper-Threading

As has already been noted, memory delay has become an important problem for computer performance. When an instruction requires data that is in second level cache, it may have to wait a cycle or two. During this time, the CPU will look for other instructions that do not depend on the result of the blocked instruction and execute them out of order. However, out of order execution is at best good for a dozen instructions. When an instruction needs data from DDR DRAM, it will be blocked for a length of time during which the CPU could have run hundreds of instructions.

In 2002, Intel tried to address this memory delay problem with a trick called Hyper-Threading. Rather than duplicate the entire circuitry of a CPU, a Hyper-Threading processor simply duplicates the registers that hold all the data that the OS would have to remove from the CPU in order to run a different thread. The OS thinks that there are two CPUs and it assigns two different threads to them. All the registers and data needed to run each thread are loaded into the same CPU chip at the same time.

When both threads are able to run at full speed, the CPU spends half its time running instructions for each thread. Unlike the OS, the CPU doesn't have a view of "priority" and cannot favor one thread because it is more important. However, if one thread becomes blocked because it is waiting for data from the very slow main memory, then the CPU can apply all of its resources to executing instructions for the other thread. Only when both threads are simultaneously blocked waiting for data from memory does the CPU become idle.

Multi-Core

Moore's Law says that every 18 months the number of circuits on a chip can double. About one Moore Generation after Intel introduced Hyperthreading, both Intel and AMD decided to spend the extra transistors to take the next step and create two real CPUs in the same chip.

It has always been possible to do this in any 18 month cycle. However, vendors previously decided to use the transistors to make the single CPU run faster, by supporting out of order execution and register renaming.

A Server tends to assign a thread to each incoming user request. Generally all network users are of equal priority, so threading is an obvious choice for Server software. However, desktop users tend to do one primary thing at a time. If you are running a low intensity job like Word or Web browsing, CPU doesn't matter. However, playing video games, retouching photographs, compressing TV programs, and a few other consumer programs will use a lot of one CPU, and making the one CPU run faster seemed more important.

Engineers ran out of ideas for using transistors to make a single program run faster. So starting last year they started building "dual core" chips with two CPUs. That forced some of the software vendors, particularly the video game makers, to redesign their software to make better use of the second processor.

Two CPUs can do twice as much work as one CPU if you can keep both processors busy all the time. Unfortunately, that is not realistic. Even on a server, the value of each subsequent processor goes down, and on a desktop there just isn't enough work to distribute it uniformly. So while Intel is beginning to show off a Core 2 Quad chip with four CPUs, it makes little sense to go farther than that.

Heat and Power

Computers are idle a lot of the time. When they are running, there is often only work to keep one core busy. The easy thing to do would be to design a dual core machine where both processors run all the time. Such a system will generate twice as much heat and use twice as much energy. Intel and AMD rushed their first generation of dual core processors out the door, so this is how they operate.

Given more time to do the engineering, you can design multi-core systems to shut down parts of the chip that are not being used. This is critical in a laptop system running on battery, but in today's heat and power conscious environment it is useful even for desktop machines.

Co(re)ordination

Two programs are running on your computer. While they mostly do different things, they may both store data on the same disk and they both display output on the same screen. Internally, the operating system must coordinate their concurrent access to shared resources. At the hardware level, each CPU core must coordinate access to memory and to the I/O devices.

In the old days when Intel had one CPU per chip, coordination between the processors was done by the Northbridge chip on the mainboard. That was a perfectly sensible design. However, when Intel moved to Core Duo and started to put two CPUs in the same chip, it was left with the unfortunate consequence that the two CPUs could not talk directly to each other or coordinate activity, but instead had to go out to the Northbridge chip for every such request.

When AMD came up with the Athlon 64/Opteron design they moved memory management into the CPU. That eliminated the need for a Northbridge chip. Processors were connected to the Southbridge (and thus all I/O devices) and to other processors using HyperTransport links. Each AMD chip has one CPU, a memory manager, and 1 to 3 HyperTransport managers. AMD connected these five components to each other with a general purpose switch called the crossbar, or "XBar" for short. At the time, they may not have given much thought to multiple CPU cores, but this turned out to be an ideal design.

Inside the AMD chip, a CPU that needs data from memory, an I/O device, or another CPU makes a request to the XBar. The XBar determines if the requested data is local (another CPU on the same chip, memory controlled by this chip) or remote (another chip connected by a HyperTransport link).

The use of the XBar to connect devices in the same chip, and the HyperTransport link to connect to external devices, creates a design that is efficient, scalable, and flexible. Recently AMD purchased ATI, a leading maker of the Graphics Processing Unit on a video card. This architecture will let them explore hybrid chips that contain a CPU to run programs and a GPU to handle video, both part of the same chip. Alternate designs combine the CPU and some of the Southbridge function to produce ultra cheap or ultra small boards.

This is such a good idea that Intel will be copying it, but probably not until 2008.

Memory and "Burst" Speed

Technology has been applied to increase memory speed only when it can be done without reducing size or increasing cost. Current mass market designs favor Double Data Rate SDRAM. When a CPU instruction requires data from memory, it presents the address and then has to wait several cycles. Once the first block of data has been located by the memory hardware, the 32 bytes immediately surrounding the address can also be transferred in a "burst" of activity. DDR memory transfers the data at twice the ordinary speed of the memory bus by transferring bytes on both the tick and the tock of the clock.

5 Jan, 2007

Memory

The newspaper ad offers a computer system with a "2.2 GHz CPU" and "256 Megabytes of RAM". We are interested in the speed of the CPU and the size of the memory.

Periodically the CPU has to wait for data to come in from memory. During this period the CPU appears to be "busy" but it really isn't doing any useful work. In some applications, the speed of memory is the limiting factor. Then a "faster" CPU isn't really any faster and it can't get any more work done.

Modern Synchronous DRAM has two performance numbers. The first is latency, the delay between the time that a particular data item is requested and the time when the memory can reliably transmit the data back to the CPU. The latency number is hard to dig out of the technical information provided by memory vendors, but it is measured in tens of nanoseconds.


The second performance number is throughput, the rate at which SDRAM can return additional data from the same general area of memory once the latency period ends. This is often called the "burst" period because there is a burst of data transfer after the idle latency period.

Memory transfers data 8 bytes (64 bits) at a time. Since the memory can't run faster, modern mainboards provide additional bandwidth by maintaining two memory buses. The user must install memory in matching pairs of DIMM modules, but then there is twice as much memory transfer capability.

The CPU stores the most recently fetched data in its Cache. It would be too complex to track smaller units of data, so Cache saves blocks of 32 bytes of data fetched from memory. Therefore, whenever the computer makes a request for data in memory, even for just one byte, the hardware actually fetches the 32 bytes around the address generated by the CPU and stores a copy of it all in Cache.

When the CPU needs data, it looks in Cache. If the data isn't there, then it makes a memory request. The CPU has to wait until the data is ready to transfer (the latency period). When the data is ready, then the memory bus transfers four consecutive units of 8 bytes each (the burst).
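
A one-function Python sketch shows which block a given address pulls in:

    # Sketch: round an address down to its 32-byte block boundary.
    LINE = 32

    def cache_block(address):
        start = address & ~(LINE - 1)
        return start, start + LINE - 1

    print(cache_block(1000))   # (992, 1023) - the whole block is fetched and cached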

Synchronous

In the first generation of IBM PC, DRAM memory transferred one unit of memory for every CPU request. The CPU presented an address, the memory responded with data. The CPU presented another address, the memory responded with another unit of data. This worked well because the CPU and memory ran at essentially the same speed.

An operation that can only proceed when the sender and receiver both indicate that they are ready is said to be "asynchronous". It runs at the speed of the slower of the two ends. Baseball is an asynchronous game. The pitcher can take his time, look at the runner on first, get signals from the catcher. If the batter needs more time he can step back out of the box, stretch, and rub something on his hands. Only when the batter is in the box and the pitcher starts his windup can we really expect a pitch.

There is another mode of operation represented by the pitching machine in a batting cage. The machine delivers balls regularly and mechanically, whether the batter is ready or not. When something happens at a regular rate, driven by a clock, then computer experts call it a "synchronous" operation.

The first generation of Synchronous DRAM (SDRAM) transferred 8 bytes of data with every clock tick. A second generation of Double Data Rate (DDR) SDRAM transfers data with every tick and tock of the clock (when the clock signal rises from low to high and when it drops from high to low).

Today most computers use a slightly faster version called DDR2 SDRAM. DDR2 runs at a faster clock rate, but that doesn't mean that the latency is greatly improved. Today only AMD boards with older sockets (754, 939, and 940) use the older DDR memory.

Latency

1. First, the mainboard has to generate the address of the data and hold it on the memory bus while issuing a read or write command.

2. Then the memory control logic has to convert this address into "row" and "column" numbers to select the data from the array of transistors in the memory chip. There is also a delay in the chip before the data becomes available. At the end of this latency period the data is available.

3. After the first transfer, three more transfers complete the burst. They transfer the data at the full speed of the bus, one transfer per clock period.

The second phase, when the address is being decoded and the data is being fetched, is the primary latency number. Unfortunately there are a lot of numbers that go into this period, and each tells only part of the story. There is a latency number for the individual chip that is usually quoted at 8 to 10 nanoseconds. There is a CAS number that is measured as 2 or 3 clock cycles. The bottom line is that when you add all the numbers together, the middle piece adds up to 40 to 50 nanoseconds.

Now consider that DDR400 memory has a 5 nanosecond clock cycle and transfers data during the burst every 2.5 nanoseconds. A 40 or 50 nanosecond delay in the middle of every data transfer overwhelms the calculation. The timing looks like:

• 5 nsec to present the command
• 45 nsec latency, plus or minus a bit
• 2.5 + 2.5 + 2.5 nsec to finish off the burst

This is a total of 57.5 nanoseconds to get 32 bytes.
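
You can check the arithmetic with a few lines of Python (a sketch of the numbers above, nothing more):

    command, latency, gap = 5.0, 45.0, 2.5      # nanoseconds, from the timing above
    total_ns = command + latency + 3 * gap      # 57.5 ns to move 32 bytes
    effective = 32 / total_ns * 1000            # bytes per ns -> MB per second
    burst_only = 8 / gap * 1000                 # what the bus could do with no latency
    print(total_ns, round(effective), round(burst_only))   # 57.5  557  3200

So scattered 32 byte requests see roughly a sixth of the rated burst bandwidth.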

If all you look at is the clock rate and the bandwidth of the memory bus, you might think that the memory speed keeps up with the CPU. However, when you recognize that there is a latency delay, you realize that the CPU has to stop every time it references data in memory, and the delay represents enough time that the CPU could have executed hundreds of instructions. CPUs get faster, but memory doesn't. At some point it would no longer matter if you made a CPU faster, because it would be spending almost all its time waiting for memory. Memory speed is the major limiting factor in the performance of computers today. Making the Cache larger helps, but it ultimately cannot solve the problem.

What to Buy

If you plan on running Vista, you want a computer with 2 Gigabytes of RAM. A 1 Gigabyte memory "stick" (DIMM) costs slightly more than $100. Memory prices were flat or went up slightly through 2006.

You can spend more money for faster memory. This might be represented by a faster clock speed, or it may be represented by lower latency numbers. Faster memory speeds up the execution of a dedicated program (mostly video games). However, the very fastest memory costs substantially more, and most users would get more out of twice as much memory than they get out of slightly faster memory.

Memory contains performance values (the SPD, or Serial Presence Detect) that the mainboard can read at power up time. The mainboard has clock speeds that it supports, and it will try to run the memory at the fastest speed the memory supports. Unfortunately, each mainboard is a little different, and each memory is a little different, and the standards here are not as tight as we would like. Some combinations of board and memory don't work at the fastest speed. If you really, really want your memory to run at 800 MHz, then you have to choose a particular memory vendor and part that the mainboard vendor has tested (or that someone else has tested).

If you simply buy parts based on their rated speed, you may end up with memory that runs at 800 MHz, but only on some other board, and a board that works at 800 MHz, but only with some other memory. Fortunately, you can simply go to the BIOS screens and force the memory bus speed to the next lower value (667 MHz) and everything will work flawlessly, if not at top speed.
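
A sketch of that selection logic in Python (the helper and its arguments are hypothetical; real BIOS code is far messier):

    def pick_memory_speed(board_speeds, dimm_speeds, force_max=None):
        common = sorted(set(board_speeds) & set(dimm_speeds), reverse=True)
        for mhz in common:                       # fastest speed both sides support...
            if force_max is None or mhz <= force_max:
                return mhz                       # ...unless the user forced a ceiling
        return None

    # board and DIMM both claim 800 MHz, but the combination is flaky,
    # so force the next lower value exactly as you would in the BIOS:
    print(pick_memory_speed([800, 667, 533], [800, 667], force_max=667))   # 667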

The most important memory feature is one that most desktop users are unable to select. Memory can come with ECC error checking. This will detect a problem in the memory itself, but it will also detect sporadic problems caused by mismatches with the board. If you don't have ECC memory, then memory problems show up as corrupted data and cause your programs and OS to crash in all sorts of random ways. To use ECC, you not only have to get the feature in the memory stick but it also has to be supported by the mainboard, and most mainboard vendors only support it on server configurations. ECC costs a few bucks more, but that is a lot cheaper than the hours or days that you spend trying to track down a problem that initially appears to be a software problem but is ultimately resolved as a memory problem.
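
Real ECC memory uses a Hamming-style SECDED code (8 check bits per 64 data bits) that can correct a single-bit error and detect a double-bit error. This toy Python sketch shows only the simpler detection idea, using a single parity bit:

    def parity(word):                  # word is a 64-bit integer
        return bin(word).count("1") % 2

    stored = 0x123456789ABCDEF0
    check = parity(stored)             # extra bit written alongside the data
    corrupted = stored ^ (1 << 17)     # a single stray bit flip in a memory cell
    print(parity(stored) == check)     # True:  data and check bit agree
    print(parity(corrupted) == check)  # False: the error is caught, not silent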

The Mainboard (Motherboard)

The mainboard contains slots for the CPU, memory, and I/O devices. In current designs, one chip called the Northbridge sits between and connects the three high speed devices: CPU, memory, and AGP video port. It is then connected to a second chip called the Southbridge that provides logic for all the slow speed devices: the keyboard, mouse, modem port, printer port, IDE controller, PCI, USB, and any other devices.

29 Dec, 2006

The Mainboard

After the CPU chip, the mainboard or "motherboard" is the most important component of any personal computer. Intel makes a small number of its own boards, but most systems use motherboards from companies such as Abit, Asus, MSI, SuperMicro, Tyan, …

Mainboards come in standard sizes. A typical full sized computer uses an ATX mainboard, which has room for 4 memory sticks and 7 PCI or PCI-e adapter cards. A smaller board and case support the MATX standard, which typically has only two memory slots and room for 4 adapter cards. Although the mainboard is smaller, MATX vendors typically find room for integrated video on the mainboard, so you don't necessarily need to use up an adapter card slot. There are oversized boards, but they are only used in servers. There are tiny boards used in specialty devices.

The mainboard is attached to a tray in the bottom or side of the case by nine screws that screw into metal “standoffs” that keep the bottom of the mainboard a safe distance from the metal of the case. Everything else plugs into the mainboard:

• The CPU drops into the mainboard socket.
• A mainboard for Intel CPUs has an LGA 775 socket, but the clock generated by the mainboard to run the CPU has to be fast enough to support newer CPU models.
• AMD Athlon 64 CPUs come in versions for Socket 939 and AM2. The difference is that 939 CPUs support older DDR memory and AM2 CPUs support DDR2 memory.
• Server boards support a different Intel Xeon or AMD Opteron socket, but ordinary users will typically not encounter them. However, some Opteron chips are made for the 939 and AM2 sockets. They are equivalent to Athlon 64 processors, but with a larger 1 megabyte of cache.
• DDR and DDR2 memory is rated by speed. A mainboard will be rated for a certain maximum speed, but it will slow down to support slower memory. Memory will be rated for a maximum speed, but it will also slow down if plugged into a slower board. So the only compatibility issue is to put DDR memory into a DDR mainboard, and DDR2 memory into a DDR2 mainboard.

• Memory is rated for a certain speed, and the mainboard is rated for a certain maximum speed. However, in some cases the combination of a particular brand of low cost memory plus a particular brand of mainboard will cause trouble. The mainboard will try to run at the rated top speed, but occasionally the data will be corrupted. The problem can be solved by entering the BIOS configuration panels at power up and manually setting the memory speed to the next slower setting. There is no message telling you that you have the problem, and until you realize it the computer will crash in many different ways at different times. In order for the computer to detect a memory problem, it needs an extra memory feature called ECC. All servers use ECC. Microsoft would love computer makers to put ECC memory in every computer, because they get blamed for all the failures that are really caused by memory problems. However, ECC costs a bit more and computer makers unwisely shave a few bucks off the cost of a system by using cheaper unchecked memory. If you can find a combination of mainboard, CPU, and memory that supports ECC, it is the single most valuable upgrade you can consider.

• A modern mainboard has some combination of old flat "parallel" ATA connectors and new, small Serial ATA (SATA) connectors. SATA is better, but until recently you needed the old flat cable for DVD drives. During 2007 we may see a transition to all SATA.

• The two leading makers of video chips are Nvidia and ATI. AMD bought ATI last year, and Nvidia makes a very popular and powerful set of chips for controlling mainboards. As a result, the quality and capability of mainboards that come with integrated graphics is improving. Anyone playing video games still wants a separate video card. Integrated video typically uses some of the computer's main memory, and contention for memory between ordinary programs running on the CPU and video requirements has an impact on overall system performance. However, business users and adults who just want to run Windows, Office, and view TV or DVDs may be quite happy with the video that comes on a mainboard. If you need more, you can always buy a video card later on.

• Mainboards also have integrated audio. If your tastes are simple, this will be perfectly adequate. However, games may require computer generated sounds, and when you have the power, computer processing of even recorded sound can add impressive effects that make ordinary headphones sound richer. An add-on audio card from companies like Creative will be able to do far more sound processing, if you need it.

• All mainboards have USB capability. However, to run external disks at their full speed, an external form of SATA called "eSATA" is becoming popular. New mainboards may have one or two eSATA ports on the back panel next to the USB and Ethernet.
• Modern mainboards have some combination of modern PCI Express (PCI-e) slots and older PCI slots. PCI-e card slots come in sizes. Video always uses a long "x16" slot. Other types of cards can fit into smaller x1 and x4 slots. PCI-e cards have a size, but they can always be plugged into a larger mainboard slot. An x1 card will also plug into an x4 or x16 slot. However, a larger card will not fit into a smaller slot. Other than video, PCI-e cards are still rather exotic devices. There are a few disk controllers and TV tuners. There is currently no PCI-e add-in sound card. The high end boards sold to people who play video games have room for two oversize video cards, but then they can support only two old PCI cards. Less expensive boards with one PCI-e video slot and two PCI-e x1 or x4 slots leave room for up to four old PCI slots and may be more useful to mainstream users.

Chip Set

The core of each mainboard is a pair of chips collectively referred to as "the Chip Set". They sit in the middle of the mainboard and are connected to everything else.

Intel makes its own Chip Sets for people who want a high quality, conservative, middle of the road system. A bit more function at lower cost is provided by alternate chip sets from companies named VIA, SIS, and Nvidia. The support for CPU, memory, and PCI is pretty much the same from all vendors, so the choice of mainboard and chipset may be driven by video, USB 2, FireWire, audio, and integrated LAN.

Each vendor has different Chip Sets for Intel and AMD systems. Even for an Intel Pentium 4, however, there are different FSB CPU speeds (400, 533, 800) and different DDR memory speeds (266, 333, 400, and dual bus 400) to support.

The first chip in the set is called the "Northbridge". It connects to the three high speed devices: the CPU, memory, and video card. Most of the time the Northbridge moves data between the CPU and memory.

The second chip, called the "Southbridge", provides the control function for all the other devices.

• The Southbridge generates the PCI bus and typically generates a few extra PCI Express lanes.
• The Southbridge contains the controller function for the parallel and Serial ATA hard disks and DVD drive. Some mainboards have a secondary SATA controller chip connected to the Southbridge using one PCI-e lane.

• The Southbridge generates ports for USB and for all the standard low speed devices (keyboard, mouse, serial port, printer port).

• One Ethernet port is supported by the Southbridge, but some mainboards add a second Ethernet port supported by an external chip connected to the Southbridge.

Each control function of the Southbridge started out as a separate device with its own controller chip. In fact, in the first IBM PC, the Serial Port and Printer Port each came on separate adapter cards. Over the next 25 years each of the old support chips was combined, new control functions were added, and then these too were combined with the old functions. The result is a single Southbridge chip that combines functions that at one time were on dozens of separate chips.

In order to maintain compatibility with all the operating systems and applications previously written, new chips continue to pretend to be each old chip they replace. Thus the Southbridge doesn't behave like a single device, but rather like dozens of individual devices each with their own I/O addresses, interrupt levels, and individual states and status.

It is important to remember that the Southbridge only provides the control functions of all these devices. Control logic operates at the low voltages associated with a chip's internal processing. The external devices being controlled (keyboards, mice, USB, Ethernet, etc.) have long external cables that require higher voltage and often require additional power lines. So the Southbridge chip isn't directly connected to the keyboard it controls. There have to be additional intermediate circuits (traditionally called "drivers" and "receivers") to take Southbridge signals and step them up to 5 or 12 volts on the way out, and then to accept signals at 5 or 12 volts and step that signal back down to the lower voltage that can be accepted by the Southbridge.

AMD and HyperTransport

Putting a Northbridge chip between the CPU and the memory adds a slight delay. AMD decided a few years ago to avoid the delay and simplify the mainboard by connecting the memory directly to the CPU chip. This makes things slightly more complicated, because the CPU socket changes for each type of memory (DDR or DDR2, with or without ECC). However, it has turned out to be such a good design that Intel intends to adopt it a few years from now.

When the memory is connected to the CPU, then there is no need for a Northbridge chip per se. There are two solutions. For low cost systems, all the PCI-e support (including the 16 lanes needed to run a video card) moves to a slightly enhanced Southbridge. Such systems have only the one chip and support only one video card. More expensive systems add a second chip that creates additional PCI-e lanes to support a second video card, and maybe a few extra SATA connections.

AMD needed a standard connection between the CPU and the one or two external mainboard chips. It adopted an industry standard called HyperTransport. HT is a very high speed connection between chips on a mainboard. When three or more chips are connected together, middle chips act as a "tunnel" to receive and forward on data transmitted between external chips.

The AMD design is particularly good when you want to add additional CPU chips. In the Intel design, more than one CPU (even more than one core inside the same chip) is supported through the Northbridge. In the AMD design, cores that are part of the same chip can talk directly to each other, and one CPU chip talks to another CPU chip directly over a HT link.

Hard Disks and CD Drives

Apple adopted an industry standard technology called SCSI for its Macintosh computers. It was a standard that applied to desktops, servers, and even mainframe computers. PC makers, however, followed a path of tricks and gimmicks to design the lowest cost disk attachment. The simplest possible electronic interface was a chip that duplicated exactly the mainboard I/O bus available at the time. A simple 40 wire cable connected this chip to logic chips on the disk. The mainboard bus had been introduced on the IBM PC AT in 1985, so the disk connection became known as AT Attachment or "ATA". It is also popularly known as "IDE", but some manufacturer claimed that as a trademark, barring its use as an official name. Then a dozen years passed, and each year the chips got twice as smart as the year before. ATA evolved from an 8 MHz connection to a 133 MHz connection and became smart enough to handle other types of devices. However, the physical connectors and programming interface had to build on and remain compatible with an idea that some engineers developed to build the lowest cost possible interface based on the primitive electronics available at that moment in time. Today computers are transitioning to a new, simpler, and higher speed interface called Serial ATA.

5 Jan, 2007

Hard Disk

There are all sorts of hard disks. They come in desktop and laptop sizes, with large and small capacity, and in consumer and enterprise (corporate) configurations. You can put them inside the computer, or you can buy an external enclosure that connects over USB, eSATA, or SCSI. You can also buy network attached enclosures, and corporations can buy large storage devices called "SANs".

There are lots of options and lots of salesmen who promise all sorts of magic, particularly to corporate managers who should know better. Remarkably, even IT professionals can get hoodwinked by a good pitch and some hand waving. In reality, all disks work exactly the same way and their performance is determined by some very simple factors. Whether you put the disk into your desktop or into some massive, expensive box in a machine room, it will continue to rotate and perform at the same speed.

The Physical Characteristics of All Disks

The hard disk has one or more metal platters coated top and bottom with a magnetic material similar to the coating on a VCR magnetic tape. In the VCR the tape moves by a fixed recording and sensing device (the "head"). In a disk, the recording head is on a movable metal support called the "arm".

Information is recorded onto bands of the disk surface that form concentric circles. The circle closest to the outside is much bigger than the circle closest to the center. Since each metal platter has a top and bottom surface, there are at least two magnetic circles for each size and location. However, a disk may have as many as five platters, producing ten of these identical circles at the same distance out from the center.

There is a separate magnetic read/write head for each disk surface. With five platters there are ten heads. They are all fixed on a single metal device that moves the heads from the center of the disk to the outer edge of the disk. Instead of moving smoothly across the surface, the arm jumps from one position to the next. Each arm position corresponds to one circle of recorded material on each surface of each platter of the disk.

The information in one recorded circle of material used to be called a "track". The information on all of the tracks of all of the surfaces when the arm is in any single fixed position used to be called a "cylinder". That was back in the old days when disk electronics was dumb.

Today the control chip on each disk is very smart. It is smart enough to know that the outermost circle on the disk is bigger and can hold more data than the innermost circle. So the physical layout of a disk (the number of platters and arms or the amount of data in each track) is today only known by the electronics on the disk. To the rest of the world, the disk is a bunch of records numbered 0 to some large number. Some drives and some OS software still report a number of "cylinders", but on a modern drive this is just a unit of storage that is some number of megabytes.

Although you cannot externally determine where any particular record is located, this does not change the basic performance of the disk. To find any particular piece of data, the disk must first move the arm to the correct location where the data has been stored (on some platter surface). Then it must wait for the disk to rotate until the data it wants is positioned under one of the heads.

A desktop computer has disks that rotate at 7200 RPM and an arm that moves from one position to another across the disk surface. Moving the arm is an operation called a "seek". It takes much less time to move from one position on the disk to the next position than it does to move the arm across the entire surface. Disk makers report an average seek time, which is the time it takes to move between two randomly chosen positions. For a desktop disk, the average seek time is 8 or 9 milliseconds (thousandths of a second). For enterprise (corporate server) disks this drops to 4.5 milliseconds.

After the arm is in position, the operation must wait for the disk surface to rotate to the point where the start of the data is positioned directly under the read head. Sometimes the data can be read immediately, but other times the disk will have to rotate almost completely around before the position is right. So on average, the disk has to wait one half of a rotation (a period called the "rotational latency").

Once the arm is in position, the data is read at whatever speed it passes under the heads. This is called the "transfer", but it takes so little time compared to the previous two numbers that it doesn't matter. In practice the performance of a disk is the sum of average seek time and rotational latency.

A desktop disk rotates at 7200 RPM and has an average seek time of 8 or 9 milliseconds. Latency is half of 60/7200 or around 4 milliseconds. Total average delay is around 12 or 13 milliseconds. This means that a desktop drive can position to a random new location on the disk around 75 times per second.

An enterprise disk rotates at 10,000 or 15,000 RPM and has an average seek time of 4.5 milliseconds. That makes it almost exactly twice as fast as a desktop drive. These disks attach through a SCSI or Serial Attached SCSI (SAS) connection and they can cost 5 to 10 times as much as a desktop disk, depending on what price you get from your vendor.

Suppose your computer is reading 4K chunks of data from the disk surface. You can read 75 such random chunks per second, for a total of 300K per second. If you move to enterprise disks, you can bump this to 600K.

Now suppose you are reading a really big file from beginning to end. The arm moves once to the start of the data, but now instead of reading 4K of data it reads all the data on the track. Then it reads the other tracks on the other disk surfaces that can be accessed without moving the arm. It reads one track for every disk rotation. When it has read all the data in this one location, it moves the arm the minimum amount (to the next higher position) and reads all the data there. Disks are typically optimized for this type of sequential access, and they can transfer large data files at an aggregate of 40 Megabytes per second (for desktop disks) or 80 Megabytes per second (for enterprise disks).

So if you read random things scattered across the disk you get 300,000 bytes per second, while if you read one big file sequentially you get 40,000,000 bytes per second. That's a factor of 100 difference. If you spend 5 to 10 times as much, you can double the speed. If you can avoid moving the arm unnecessarily, you can get 100 times the speed.
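
Here is the arithmetic as a Python sketch, using the round numbers above (the function name is invented):

    def disk_profile(rpm, seek_ms, sequential_mb_s, chunk_kb=4):
        rotational_ms = 0.5 * 60000 / rpm        # wait half a turn, on average
        access_ms = seek_ms + rotational_ms      # cost of one random request
        ops = 1000 / access_ms                   # random requests per second
        return (round(access_ms, 1), round(ops),
                round(ops * chunk_kb), sequential_mb_s * 1024)

    # (ms per access, random reads/s, random KB/s, sequential KB/s)
    print(disk_profile(7200, 8.5, 40))    # desktop:    (12.7, 79, 316, 40960)
    print(disk_profile(15000, 4.5, 80))   # enterprise: (6.5, 154, 615, 81920)

The last two columns show the same two-orders-of-magnitude gap between random and sequential access described above.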

This becomes important as new generations of easy to use desktop search technology allow casual users to look for documents or mail meeting certain criteria. If the mail has already been indexed this is not a problem. Consider two users who each have 40 megabytes of data that needs to be searched and, by good luck, each chunk of data is located in one place, but the two chunks are on different areas of the same disk. If the search technology is smart enough to know that these are two requests for the same disk, the best performance would be to complete the search for one user (1 second on a desktop disk, 0.5 seconds on an enterprise disk) and then do the search for the other user. Unfortunately, a lot of systems will try to run both searches at the same time, forcing the arm to move back and forth between the two chunks of data being searched. It could take 100 times as long.

Corporate data moves into increasingly sophisticated RAID arrays and Storage Area Networks that promise to free the administrator from the problem of tracking where or how the data is physically stored. Unfortunately, this also means that an application cannot tell when two search requests are on the same or different disks. You might hope that for $100,000 or more the SAN would be smart enough to fix things automatically, but it isn't.

There is no substitute for manually optimizing things that you do all the time. In a corporate database, for example, you always put the "log" file on a physically different disk than the data (or the arm will be constantly jumping between the two). A desktop computer user should think about how often he copies or processes large files (for example, removing commercials from recorded TV programs). Putting the input and output files on different disks will cause the operation to run more than 10 times faster.

Using cache (in the computer memory, on the disk controller, or on the disk itself) will optimize the random requests you cannot anticipate. Careful positioning of data for things you do over and over will have a much greater effect. You might think it would cost more money to have two disks than to have one, but it depends on the amount of data you store. Two 250 Gigabyte desktop computer disks actually cost less than one 500 Gigabyte drive, and three of them cost a lot less than one 750 Gigabyte disk. You pay a premium for large devices, but a larger number of small independent devices will always perform better. Of course, you must plan for such a configuration. You need room in the case for more drives, and you need SATA connectors on the mainboard, and you need power from the Power Supply.

One of the most widely quoted performance characteristics is totally meaningless. A desktop disk can read data at a maximum rate of 40 to 60 megabytes per second. An ATA or Serial ATA connection may be advertised at 100, 150, or 300 megabytes per second. That speed represents the burst speed for transferring data from the disk cache to the computer, but the disk performance still remains a tiny fraction of this nominal transfer rate.

ATA

When IBM designed its first PC Hard Drives, it put all the logic chips on a separate controller card. The disk itself was a dumb magnetic recording device, like a tape player. By the early '90s, however, computer chips had become cheap and powerful enough to put all the control logic for motors, recording arm positioning, and digital to analog conversion on the disk itself, eliminating the need for a controller.

Still, the disk had to be connected to the mainboard through some type of cable, and the data transfer required a protocol. Chips were smarter, but not quite that smart yet. The simplest design was to create a chip that didn't do anything. To make that work, they decided that the protocol on the cable between the disk and the mainboard would be exactly the same as the protocol that all the devices on the mainboard itself used. This was called the Industry Standard Architecture (ISA) bus, but in reality it was a version of the hardware standards introduced by IBM in 1985 with its PC "AT" model. "AT" was a registered trademark of IBM, so most of the industry avoided using it to talk about standards. However, the disk people were less sensitive to this and called the disk interface based on the ISA protocol the "AT Attachment" or ATA.

The beauty of ATA is that the disk cable and the I/O bus used the same signals and transferred data the same way. A chip to connect the two had to do almost nothing at all, and that saved money.

Then both chip and disk technology changed. The old ISA bus was way too slow and dumb, so it was replaced by the PCI bus. So now disks were emulating a bus that no longer existed on the mainboard. Unfortunately, there were a lot of computers out there and a lot of disks, and it would be way too disruptive to change everything. They needed to maintain compatibility, so you could plug a new disk into an old computer or an old disk with valuable data into a new computer.

PIO and UDMA

In the old PC AT, data was transferred two bytes at a time by the CPU. This was a very inefficient use of CPU power, but it was as much as could be done at the time. As disks got bigger and faster, they needed a more efficient approach. Direct Memory Access (DMA) provided an alternative way to run the old AT bus that bypassed the CPU. So disk vendors created a version of DMA that would run on their disk cable. Then they sped it up and called it Ultra DMA or UDMA.

Up to 33 MHz it was possible to run UDMA on the old 40 wire flat cable that had always connected the disk to the mainboard. However, at higher speeds there was interference between adjacent wires. They solved this problem by creating an 80 wire cable. Every other wire in the cable was a dummy connected to ground. So really the 80 wire cable had 40 real signal wires just as before, but now when a wire generated interference with the wires next to it, the two wires on each side were dummies that didn't do anything.

Successive generations of the ATA standard have a number. Only experts really know these numbers. Most people remembered the speed of data transfer, which jumped from 33 to 66 to 100 MHz. There are some connections that run at 133 MHz, but there is no meaningful difference between 100 and 133.

The disk has a maximum speed it will support, but it can also run at all the older speeds. In fact, it can go all the way back to the old two bytes at a time managed by the CPU, which is called PIO or Programmed I/O. When the system powers up, the mainboard tests the connection to the disks over the cable. The disk and mainboard agree to operate at the highest speed both can support. That is, unless there is something wrong with the cable. If they can't communicate reliably, then they try successively slower speeds until they settle at the bottom with PIO.
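
In Python-flavored pseudocode, the negotiation might be sketched this way (the link test is hypothetical; real hardware exchanges test patterns on the cable):

    MODES = ["UDMA133", "UDMA100", "UDMA66", "UDMA33", "PIO"]   # fastest first

    def negotiate(disk_best, link_ok):
        for mode in MODES[MODES.index(disk_best):]:   # never above the disk's rating
            if link_ok(mode):                         # does the cable pass at this speed?
                return mode
        return "PIO"                                  # the floor: slow, but always works

    # a damaged cable that is only reliable at UDMA33 and below:
    print(negotiate("UDMA100", lambda mode: mode in ("UDMA33", "PIO")))   # UDMA33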

Unless you have a 15 year old disk, PIO is never a meaningful choice. Unfortunately, it is what gets silently selected when you have a bad cable or the thing isn't plugged in right. The system will be quite happy to come up in PIO mode, until you try to copy a big file. Then it will take 100 times longer than it should and the CPU will be 100% busy. Since the original failure produced no messages, it may take a while to track down the problem and discover the cable problem. Look in Windows Device Manager at the ATA disk controllers. There are two devices on each controller, and Windows will show you what speed they are running at. If any is in PIO mode, shut the system down and replace or reseat the disk cables.

"Master" and "Slave" (or Cable Select)

An ATA cable has two devices. One must be declared to be the Master (address 0) and the other must be the Slave (address 1).

Each disk or DVD drive has jumpers. One setting is for Master, the other for Slave. If both disks on a cable are set to Master (or both to Slave) then they won't work. Modern disks have an alternate jumper setting called Cable Select. If all disks are set to Cable Select, then the disk at the end of the cable will become the Master and the disk connected to the middle connector will be the Slave.
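
The rule is simple enough to express in a few lines of Python (illustrative only):

    def role(jumper, position):
        # jumper: "master", "slave", or "cable_select"; position: "end" or "middle"
        if jumper == "cable_select":
            return "master" if position == "end" else "slave"
        return jumper                          # explicit jumpers override position

    print(role("cable_select", "end"))     # master (address 0)
    print(role("cable_select", "middle"))  # slave  (address 1)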

Serial ATA

The old parallel ATA disks and cables are being replaced by a new generation of Serial ATA cables. Instead of the old 40 pin, 80 wire cable, the SATA connection has just seven wires. Two pairs of wires transfer data, from the mainboard to the disk and from the disk to the mainboard, and the other three wires are grounds to avoid interference:

1. Ground
2. Positive half of signal from mainboard to disk
3. Negative half of signal from mainboard to disk
4. Ground
5. Negative half of signal from disk to mainboard
6. Positive half of signal from disk to mainboard
7. Ground

If you imagine that they invented a whole new protocol to support this new hardware, you would be entirely wrong. PCs are all about compatibility. What they did was to invent a way to transmit the exact same commands and signals that used to be in the 40 pin interface over two pairs of wires. So inside the disk the SATA data is turned back into the same commands and protocol as old ATA, and the same on the mainboard end.

The first generation of SATA transferred data at 150 megabytes per second. SATA II now transfers data at 300 megabytes per second. However, remembering that desktop disks transfer data at 40 megabytes per second and enterprise disks max out at 80 MB/s, the difference between 150 and 300 is meaningless.

What kind of disk to buy?

It used to be that SATA disks were more expensive, but now they are the same price or a few dollars cheaper than old parallel ATA models. Motherboards have more SATA capability, and SATA is simpler and faster. Unless you need to support an older computer, SATA is the modern choice.

Each vendor has desktop and RAID models of SATA disks. They are exactly the same until the disk begins to fail. Then the desktop models go into heroic recovery, trying over and over, hundreds or thousands of times, to read some part of the disk that has bad data. If you are using the RAID capability of the mainboard to store a second copy of the data on a mirrored or RAID array of disks, then the data can be recovered from the other disk. In this case, the heroic efforts of a standard desktop disk are disruptive and unnecessary. The alternative RAID-friendly models of disks give up retrying much sooner.
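
A sketch of the policy difference in Python (the timings are invented; drive firmware obviously does not run Python):

    import time

    def read_with_deadline(read_once, deadline_s):
        start = time.monotonic()
        while time.monotonic() - start < deadline_s:
            data = read_once()
            if data is not None:
                return data                # a retry finally succeeded
        return None                        # give up and let the RAID layer recover

    # a desktop drive may retry for tens of seconds ("heroic recovery");
    # a RAID-friendly model returns quickly so the whole array is not stalled.
    print(read_with_deadline(lambda: None, deadline_s=0.01))   # None -> use the mirror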

Western Digital has an enterprise class set of disks sold (at a premium price) to desktop users. They are called the Raptor. While a 250 Gigabyte SATA disk may cost $70, a 150 Gigabyte Raptor costs $200. However, it rotates at 10,000 RPM instead of 7200 RPM and it seeks twice as fast (4.5 milliseconds instead of 9 milliseconds).

Starting in 2007, disk vendors will begin to make "hybrid" disks. This is a disk device with some flash memory. It may improve the performance of Vista, and it may allow laptop disks to shut down more often to extend battery life.

SCSI (For Servers and Power Users)

SCSI was developed as an industry-wide standard. It is used on PC Servers, and was used on Macintosh, but it was also designed to handle the heavy load of performance critical central servers that cost hundreds of thousands of dollars. It is better than ATA, but more expensive.

In the IDE design, the mainboard is always in control and the disk is a slave. In the SCSI design, the computer and all the disks are more like peers. In the IDE design the computer addresses one of the two devices on the cable, and then the cable is busy until the device is done. In the SCSI design, the computer may send a command to one of the devices, but then the disks go off to move their arms, position their heads, and transfer data between the disk surface and the cache memory in the disk device. All this happens in the background. The SCSI bus is only "busy" when it is actually transferring data or commands. A Server with SCSI disks can have commands outstanding to every disk. It can even queue up a few commands in each disk device so the disk can begin to move the arm to the next location even while the data from the previous command is being transferred down the bus to the PC.

As with ATA, the SCSI bus has improved with new technology. The original bus was replaced with a "wide" bus and then an "ultra-wide" bus. The original speed has been increased to "fast" and then faster still.

SCSI disks are much more expensive, but then they are much higher quality. IDE disks rotate at 5400 or 7200 RPM. SCSI disks rotate at 10,000 and 15,000 RPM. IDE disks come with a one year warranty. SCSI disks come with a 5 year warranty and 1,200,000 hours mean time to failure. [Note, this does not mean that a disk will run continuously for 136 years before you can expect it to fail. It means that if you buy 136 of these disks and run them for a year, you can expect one of them to fail.]
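
The bracketed note is just this arithmetic, which a couple of lines of Python confirm:

    def expected_failures(disks, hours_each, mtbf_hours):
        return disks * hours_each / mtbf_hours

    # 136 disks running around the clock for one year:
    print(expected_failures(136, 24 * 365, 1200000))   # ~0.99, i.e. about one failure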

The difference between ATA and SCSI is becoming blurred by the Serial ATA standard. Serial ATA is fast enough to match the best SCSI speed, and a system that runs a separate dedicated cable from the controller card to every disk solves all the blocking and bus performance problems.

A concrete measure of this change was the recent announcement from Western Digital of a disk device with SCSI performance and reliability characteristics (10,000 RPM rotation, 5 year warranty) but a Serial ATA interface.

Serial Attached SCSI (SAS)

Even the fastest SCSI disk cannot achieve a sustained transfer rate to match the 150 megabytes per second capability of the current Serial ATA cable. No plausible future disk technology will do better than the 600 megabytes per second expected from future versions of the same technology. And while the SCSI bus avoided the blocking problems of the old ATA cable, the new Serial ATA cables connect one disk to one controller port.

So a new generation of SCSI devices began to appear on corporate servers in 2006 that combine the advantages of SCSI intelligence with the low cost and high speed of the Serial ATA cable. This is called Serial Attached SCSI or SAS.

A Serial ATA disk will offer more storage for much less money than a SAS disk. The type of cable doesn't change the quality, rotation speed, warranty, or other cost factors. SAS disks will be more reliable and smarter.

Most vendors (Dell in particular) sell SAS disks as replacements for the older SCSI systems. They don't take advantage of the new architecture. However, SAS disks were designed to attach to a type of storage network. Each disk has a hardware ID. Every computer controller card has a hardware ID. You can connect the disks directly to the controller cards, or you can put intelligent switching devices in the middle. Then a bunch of disks and a bunch of computers are all connected together. Computers know which disks to talk to, and vice versa. However, because the connection between any particular disk and any particular computer is all software, the system can be automatically reconfigured when one of the computers (or disks) begins to fail.

Video and Monitors

The video adapter requires higher data transfer speeds than any other device. While the disks and network plug into the PCI bus or Southbridge mainboard chip, the video adapter is connected at high speed to the CPU and memory. For a decade, the video connector has been an AGP slot rated by speed (2x, 4x, or 8x). In the last few months, Intel has begun to offer a new slot design called PCI-Express that on paper can operate four times as fast as the fastest AGP slot. However, with currently available technology, no video adapter card requires that much extra speed.

1 Jan, 2007

Video

Ten years ago most video adapters had a sluggish processor, a megabyte or less of memory, and a weak analog signal conversion chip. Given the limited memory, a user might have to choose between the number of dots on the screen (resolution) and the number of colors displayed. Limits in the analog conversion chip might force a slower refresh rate, leading to a flickering screen. However, video adapters were already close to being fully adequate.

Then ten years of chip technology came along. Bumping the memory from one to four megabytes solved the color/resolution problem. Then cards had 8 megs, 16 megs, 32 megs, 64 megs, and now 128 or 256 megabytes of memory. There is certainly no plausible use for such video hardware if you just run Windows and Office. Today, the only function of a high end video adapter is to play 3D games.

This will not be true in 2007. The most visible feature of the new Microsoft Vista operating system will be its "Avalon" user interface. Microsoft will transfer much of the burden of presenting windows, menus, toolbars, and other screen elements to the processing power of current generation video adapters. Of course, if you only have an old video adapter you will continue to get the current Windows interface. However, business users in a few years will require the kind of video card that today is only useful for blasting aliens and saving the universe.

Many systems come with some video capability built into the mainboard at no cost. This video is adequate for business use on the current Windows system. Even then, there are some legitimate business reasons for buying a separate video adapter to plug into the PCI Express slot:

• Video integrated on the mainboard uses main memory. This will slow down the performance of the overall system when access to memory by the program you are running collides with access to the same memory by the video hardware. A separate video adapter card has its own memory and operates independently.

• Mainboard video typically connects to the monitor through the traditional small 15 pin analog VGA connector. The VGA connector is OK up to XGA (1024x768) resolutions and is required for a CRT monitor. However, at higher resolutions you get a sharper picture on your LCD display if you use the larger digital DVI connector. Typically you can only get DVI on a separate video card.
• Video cards costing as little as $100 allow two monitors to be connected to the same PC. Set side by side, Windows treats them as two halves of a single desktop. You can drag applications with the mouse from one screen to the other. This expands your work surface. You can leave a database query or spreadsheet open on one screen while you compose a report or letter referencing the numbers on the other screen. Now that 20" flat LCD monitors cost $200, two display systems are a very affordable productivity aid.

However, after the cluster of basic video cards costing $100 there is a gap. The newest, most powerful cards cost $350 or $400. These cards are for the serious gaming enthusiast, although there might be some 3D applications in architecture. There is simply a lot more software for blasting aliens from the planet Zoron than there is for visualizing the layout of a new kitchen.

Visible Features

Resolution

Screen resolution is stated as two numbers. The first number counts the dots horizontally spaced across each line. The second number counts the number of lines from the top to the bottom of the screen. In the old days, there were five standard resolutions:

640x480 (VGA)
800x600 (SVGA)
1024x768 (XGA)
1280x1024 (SXGA)
1600x1200 (UXGA)

The horizontal number is always the larger of the two. There are more dots on each line than there are lines from the top to the bottom of the screen, yet the screen keeps the same squarish 4:3 shape. This can be explained by the fact that the "dots" are not necessarily round or square, but can be rectangles that are taller than they are wide. The extra height of each "dot" makes up for the smaller number of them.

Screens that use these standard resolutions display a picture in approximately the standard 4:3 "aspect ratio" used for standard TV broadcast. Movies recorded on DVD and High Definition TV pictures are intended for display on a wider aspect ratio of 16:9. A small number of laptop and desktop LCD panels are designed to support this type of widescreen display.

The wider aspect ratio is designated by prepending the letter "W" in front of a standard resolution name. This means that the width has been bumped to the next larger size, but the resolution of the height has not changed. For example, XGA is 1024 wide and 768 high. The next higher standard size is SXGA, which is 1280 wide and 1024 high. However, a WXGA wide aspect ratio screen has a resolution that is 1280 wide but only 768 high.

You can buy wide aspect displays with resolutions of:

1280x768 (WXGA)
1600x1024 (WSXGA)
1920x1200 (WUXGA)
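
A quick Python sketch makes the shapes obvious (the mode table is just the lists above):

    modes = {"XGA": (1024, 768), "SXGA": (1280, 1024), "UXGA": (1600, 1200),
             "WXGA": (1280, 768), "WUXGA": (1920, 1200)}

    for name, (w, h) in modes.items():
        print(f"{name:6} {w}x{h}  width/height = {w / h:.2f}")
    # the standard modes cluster near 4:3 (1.33); the "W" modes run 1.60 to 1.67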

The wide screen resolutions have become more common with the adoption of LCD flat panel TV sets. High Definition TV has two standard resolutions of 1280x720 (just a few lines shorter than WXGA) and 1920x1080 (just a few lines shorter than WUXGA). Of course, HDTV sets will also display XGA (1024x768), but if you hook a computer up to a flat screen TV it will look better in a wide screen resolution.

A CRT monitor can switch between resolutions up to some maximum supported value. A laptop or flat panel LCD monitor generally has one native resolution that corresponds to the dots manufactured into the screen. It may support lower resolutions, but will look best at its native setting.

Brightness

An LCD display has a white backlight that shines through a screen filled with red, green, and blue bits of glass. This produces tiny dots of colored light. The active Liquid Crystal part of the LCD display is a variable polarized filter in front of each dot that controls the amount of each dot of red, green, and blue light that gets through. If all the light from all three colors gets through, the eye merges the three colors and sees a "white" light. If all the light is blocked, you see black. Otherwise, you see a generated color.

A desktop display that only has to show Excel spreadsheets and PowerPoint presentations can get along fine with a few bold color distinctions. The eye can also draw clear distinctions between different bright colors on a standard computer monitor. However, dark colors present a separate problem.

One performance measurement in the specifications of every LCD panel is a measure of brightness. It is expressed in "nits", units of candelas per square meter. A standard desktop LCD monitor has a brightness of 250 units. An LCD TV monitor designed to be viewed from across the room typically has a brightness of 500 units, and that is the best available today. A small number of devices sold by various vendors provide intermediate values of 300, 350, 400, or 450.

A 17 inch LCD computer monitor has a resolution of 1280x1024. A 20" LCD TV, however, is often sold with a resolution of only 800x600. From across the room, you can't see high resolution. If you consider a "TV" to double as a computer monitor, check carefully its native resolution and be sure it matches a value supported by the video adapter in your PC.

Color Range

If you look even closer, each dot on the screen consists of three separate parts. One component is Red, one is Green, and one is Blue. Seen from a distance, the three components merge to form a composite which can be adjusted to any color our eye can see. Although the amount of each of the three base colors is continuous, computer equipment generally creates a range of possible brightness from 0 to 255 so that the intensity can be represented by a byte.

Ten years ago video adapters had small amounts of memory. They would save space by allocating only one or two bytes of memory for every dot on the screen. The adapter would then translate the smaller value into a full color value. Today there are no adapters with so little memory that they cannot allocate three or four bytes per dot, even at the highest resolutions.
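
The memory arithmetic is easy to verify in Python (a sketch, nothing vendor-specific):

    def framebuffer_mb(width, height, bytes_per_dot):
        return width * height * bytes_per_dot / (1024 * 1024)

    print(framebuffer_mb(640, 480, 1))     # ~0.29 MB: why a 1 MB card forced trade-offs
    print(framebuffer_mb(1600, 1200, 4))   # ~7.3 MB: trivial for a 128 or 256 MB card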

However, as frequently happens, a new need comes along to make use of an otherwise obsolete old feature. Windows 2000, XP, and Server 2003 machines all support "remote desktop connection". When this feature is enabled, an administrator can connect to the computer over the network using the RDC client program. A window opens on the client machine and shows an image of the desktop of the remote computer. To save bandwidth, the connection can be configured to use the old one byte or two byte video modes previously used by obsolete adapters. Emulating these video modes reduces the amount of data that has to be transmitted over the network to display the desktop image.

Performance Features

Integrated or Separate Adapter?

To save money and space, smaller mainboards often come with a video adapter built in. This is attractive for corporate systems, where the only application is running Office, or for Media Center systems that only have to drive the TV set.

In the past, integrated video was always crappy. However, the main suppliers of video cards now build chips for mainboards, and while integrated video is not as powerful as a separate adapter card, it can be adequate for most purposes.

If you want to play the latest video games, you want a separate video card. Otherwise, consider what you want to do and compare it with the capability of the integrated video. For example, if you intend to play Blu-ray or HD-DVD movies through the computer, then you need integrated video that will connect to your high definition monitor and you want an integrated video chip that supports decoding H.264 data.

How many Monitors?

Monitors used to be expensive. Now a 20" LCD panel can be purchased for $200, which means you can get two for $400. However, integrated mainboard video is often limited to a single monitor. Most video adapter cards support two monitors. If you want to run three monitors, you may need a second video card.

SLI or Crossfire

Video cards perform massive amounts of repetitive operations. You can buy faster video cards with faster processors, but when you reach the limit here the next step is to add a second video card and split the work between the cards. Nvidia calls this "SLI" while ATI calls it "Crossfire".

This is only interesting for video games. Unlike the previous case, where you added a second card to drive the third monitor, all the cards you use in a SLI/Crossfire configuration drive a single monitor that is running the one gaming application.

DirectX 9 or 10

The Windows programming support for games and TV applications is called DirectX. This is a programming standard that changes from year to year. Windows XP used to support DirectX 8, but today most XP users have installed the free upgrade to DirectX 9. Windows Vista comes with support for DirectX 10.

Video cards support some level of DirectX. You can always plug an old card into a new system, but it won't be able to use all the features. As this is being written (Jan 2007) there is only one graphics chip that supports DirectX 10, and all the cards that use the chip cost $400 or more. So DirectX 9 is the only cost effective solution and is generally the level of support you should look for when buying new equipment. As the year progresses, more cost effective support for DirectX 10 will become available.

Purevideo or Avivo

Video adapter hardware can also be used to offload a lot of the video stream processing when you are watching live or recorded video. This comes at several levels.

• MPEG 2 is the video compression used in DVDs, most Media Center video recording cards, and broadcast and cable digital TV (including HD broadcast programs). Some level of MPEG 2 support has been provided by all integrated mainboard and video adapter cards for the last five years (for as long as DVD movies have been widely used).

• MPEG 4 and Windows Media (WMV) are more advanced compression methods. They create smaller files (or better pictures), but they require more processing.

• H.264 is the newest compression method. It requires the most processing and provides the best compression. It may be found on some Blu-ray disks.

Each new generation of video processing chip provides hardware support for more video compression options. The Nvidia 6xxx (6000 series) of cards provided the first "Purevideo" acceleration of MPEG 2. The subsequent 7xxx cards support more video formats. The latest 8xxx cards will do better when they become more widely available.

ATI has corresponding support and a brand name called "Avivo". It is not clear exactly what that means, but you will get better hardware support for displaying video files in the 1xxx series of cards (1600, 1650, 1900, 1950) than in older cards, and newer chips will follow.

Without hardware acceleration, trying to play a Blu-ray or HD-DVD movie may run your CPU into the ground and produce unsatisfactory results.

AGP

For about a decade, video cards plugged into a special AGP video slot. The AGP slot ran at a higher clock rate than PCI (66 MHz instead of 33), and successive generations of AGP video cards transferred data 2, 4, or 8 times per clock cycle.

Each subsequent generation of AGP card ran faster, and in computer terms that means that it ran with a lower voltage level.

• AGP 1 supports 1x and 2x adapter cards with a signal level of 3.3 volts.

• AGP 2 supports 4x adapter cards with a signal level of 1.5 volts (it also supports 1x and 2x at the lower voltage, but why bother).

• AGP 3 is a new standard that will support 8x adapter cards with a signal level of 0.8 volts (and again it "supports" slower transfer, but why bother).

There are slightly different plug configurations to prevent you from accidentally plugging an AGP 1 card into a socket that only supports AGP 2 cards. Many adapter cards are configured to plug into either an AGP 1 or AGP 2 slot and to automatically adapt and run at either 3.3 V or 1.5 V.

Today there are still a few mainboards with AGP slots and a small number of cards made with AGP connectors. However, most video adapters have moved on to PCI Express.

PCI-Express

PCI Express is an entirely new bus architecture from Intel. It replaces not only the AGP slot for video, but also the PCI slots for all the other adapter cards (and the PC Card slot in your laptop). A more extensive discussion of PCI-e is provided in another article.

PCI Express transmits data over two pairs of wires that provide 250 Megabytes per second in each direction. The two pairs are called a "lane". Additional bandwidth can be added by simply running 2, 4, 8, or 16 lanes of PCI-e to the same adapter card.
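
The lane arithmetic in Python (a sketch of the figures above):

    def pcie_mb_s(lanes, per_lane=250):    # 250 MB/s per lane, each direction
        return lanes * per_lane

    for lanes in (1, 4, 8, 16):
        print(f"x{lanes}: {pcie_mb_s(lanes)} MB/s each direction")
    # an x16 slot offers 4000 MB/s -- far more than any current video card can use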

Video adapter cards that use PCI-e always support the maximum 16 lanes of PCI-e bandwidth. However, this is far more data transfer capability than any video card can actually use. Some mainboards provide the full 16 lane slot to hold a video card, but then they only connect the first 8 lanes on the card. This is perfectly adequate for today's video cards.

If you are only running Windows and Office, you need even less bandwidth than this. For a very short time, mainboard vendors designed products where the second video card slot might have even fewer PCI-e lanes. However, mainboard chipsets have caught up, and today most mainboards can provide more PCI-e lanes than anyone can meaningfully use.

External Connectors

VGA Connector

In 1987 IBM introduced a 15 pin analog video interface plug for its "VGA" display. Technically this connector is called an MD15, where M stands for "mini", D because the plug is shaped like a letter "D", and 15 because there are 15 pins in three rows. Three pairs (six pins) present a voltage level for the three colors Red, Green, and Blue.

The video adapter and display monitor negotiate a resolution and refresh rate. This information implies a particular clock rate. Each dot of each line corresponds to a particular time period. During that time, the adapter generates voltage levels for the three colors and the display generates the corresponding dot.

When IBM invented the interface, monitors had a resolution of 640x480 refreshed 60 times a second. However, the interface design works at any resolution and refresh rate. Today it is frequently used for resolutions up to 1600x1200.

DVI Connector

The analog design of the VGA plug is a good match to the intrinsically analog operation of a CRT monitor. As long as you are using a CRT, no better interface design is possible.

However, today more people are buying flat panel LCD monitors. In the LCD each dot is an individually addressable digital element. It is more efficient and precise for the adapter to transmit digital numeric values for the color intensity of each dot.

The DVI connector is much larger than the analog VGA plug. It has many more pins, which allow digital information to be transferred between the adapter and the video monitor. The DVI plug carries both digital and analog versions of the signal, and an inexpensive converter plug can turn a DVI socket into an old-style VGA socket for connection to an older monitor.

It is common for high end video adapters to have both a DVI and a VGA plug. LCD display panels also commonly support both DVI and VGA connectors. If you have two identical panels, you might as an experiment plug one into the DVI plug and one into the VGA plug of the same adapter. You should notice that the monitor on the DVI plug has a slightly sharper picture with better colors.

HDMI Connector

The smaller HDMI connector is becoming popular for consumer electronics (TV and HD DVD applications). Basically HDMI is a smaller-plug version of the DVI connector that adds a wire for digital audio. A small number of video adapter cards support HDMI today. It may become more popular, or computers may wait for the next standard to come along. Monitors may come with a cable that is HDMI on one end and DVI on the other, so they can connect either to real HDMI devices (Blu-ray players) or to DVI video cards.

HDTV

A high definition tube TV has the same basic design as a CRT computer monitor. A large Plasma TV hanging on a wall has a lot in common with an LCD monitor. Computer standards are so common that you can typically plug a computer into any TV that costs more than $3000.

However, there are two notable differences between TV and computer standards.

1. Standard definition TV signals (and one form of High Definition TV known as "1080i") are interlaced. The TV first receives every other line of the picture (say the odd lines). Then it goes back to the top and receives the lines that were skipped (the even ones). Any TV set can process interlaced signals, and conventional TVs can only process interlaced signals. A computer, however, generates each line one after the other. This is called progressive scan (a term some consumers may have picked up from the descriptions of better DVD players). A $3000 TV can process either interlaced or progressive signals, but a $300 desktop computer monitor can only handle progressive signals.

2. Every TV and computer monitor, whether CRT, LCD, Plasma, or even projector, generates the screen as a sequence of Red, Green, and Blue dots of light. The eye merges adjacent Red, Green, and Blue dots of various intensities to produce all the other colors. Computer video adapter cards work by generating values for Red, Green, and Blue directly and transmitting those values over the analog VGA or digital DVI cable. TV, however, started as a black-and-white system and added color later, and that original design could never be removed from the standard. So even today a digital TV, cable box, or DVD player generates a black-and-white signal (Y) and then two color signals (Pr and Pb). You can generate the same picture either way (see the sketch after this list). However, again, a $3000 TV can accept component TV input (Y Pr Pb), analog computer input (VGA), or digital input (DVI). A computer monitor generally cannot display component TV signals.
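
For the curious, the relationship between the two representations is plain arithmetic. The Python sketch below uses the standard-definition (BT.601) weights as an illustration; real equipment also applies gamma correction and scaling, which are omitted here.

def rgb_to_ypbpr(r, g, b):
    # R, G, B in the range 0..1
    y = 0.299 * r + 0.587 * g + 0.114 * b  # the black-and-white signal
    pb = 0.564 * (b - y)                   # how much bluer than gray
    pr = 0.713 * (r - y)                   # how much redder than gray
    return y, pb, pr

def ypbpr_to_rgb(y, pb, pr):
    # Invert the equations above to recover R, G, B
    r = y + pr / 0.713
    b = y + pb / 0.564
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

print(rgb_to_ypbpr(1.0, 0.0, 0.0))                 # pure red
print(ypbpr_to_rgb(*rgb_to_ypbpr(0.2, 0.5, 0.8)))  # round trip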

Some computer monitors are sold with the ability to process component TV signals (the three RCA plugs colored Red, Green, and Blue that carry the Y Pr Pb signals). This allows the monitor to display HDTV from a cable set top box. Some computer video adapters contain a round connector that breaks out into the component TV signals to drive an HDTV set that doesn't support DVI.

In practice, then, everything in the video hardware area will soon connect to everything else, one way or another. DVI or HDMI is preferred because it gives the sharper picture. Then you can use the component connectors on your monitor for your XBOX 360.

If you want more details on computer-TV convergence, recording shows from your TV, and displaying HDTV on your computer, another article is available on this subject.

PCI and PCI Express

For a decade starting in 1985, PC adapter cards all plugged into the "ISA" bus. Then Intel came up with a better, faster PCI bus, which has dominated the decade since. The good news from such a long period of stability is that there are lots of fast, cheap, compatible adapter cards to upgrade your computer with an extra disk controller or a better audio system. The bad news is that the ten years are up and it is time for a new I/O bus. The only part of PCI-Express that resembles the old PCI bus is its name. It provides much higher speed in a much smaller socket. However, although there are a few PCI-Express video cards available, there are almost no PCI-Express adapter cards of any other type. Systems will continue to need PCI slots for at least the next few years.

24 Dec, 2006

The Bus (PCI and PCI-Express)

The CPU, memory, disks, and all the other devices in a computer have to be able to communicate and exchange data. The technology that connects them is called the "Bus".

In the first IBM PC, the Bus was just a set of wires that ran through the mainboard. Everything connected to this one Bus and operated at one clock speed. Soon, however, the CPU had to run much faster than anything else, and the memory had to be faster than any I/O device.

In a modern PC, the CPU, memory, and video card are connected to a high speed control chip called the "Northbridge". Each device runs at its own speed:

• The mainboard delivers a clock signal to the CPU at speeds of 166, 200, or 266 megahertz (ticks per second). The CPU typically transfers data four times per tick, so the effective rate of this bus is typically 800 or 1066 MHz. This is called the front side bus (FSB). Internally the CPU chops each FSB tick into smaller units, producing an internal clock of typically 1.8 to 3.6 GHz. The ratio of the internal speed of the chip to the FSB clock is called the "multiplier". The BIOS allows you to increase the speed of the mainboard clock, which then speeds up the CPU. This is called "overclocking".

• Memory can transfer data at twice the rate (Double Data Rate, or DDR) of a 200 to 400 MHz clock. Each transfer is 8 bytes on a single memory bus, or 16 bytes on a dual-channel system. However, the clock speed is not always the best measure of memory speed. After requesting data, the mainboard or CPU has to wait some number of clock ticks before reading the first data transfer. This is called the latency. So in theory a memory maker could double the clock speed but also double the number of clock ticks of latency, and the memory would not actually be any faster, although it would appear to have jumped from 400 to 800 MHz.

• Modern video adapter cards plug into a PCI-Express slot with 16 lanes of data transfer, each of which can transfer data at 250 megabytes per second. That is an aggregate of 4 gigabytes per second, but no modern video card actually needs that kind of data transfer capability.

The Northbridge is then connected to a second slower control chip called the "Southbridge" that supports all the other devices.

• The old "Parallel" ATA (IDE) transfers data down the flat ribbon cable at 100 Megabytes per second (Mb/s).

• The new Serial ATA transfers data at 150 or 300 Mb/s. This higher speed is misleading, because no desktop hard drive can actually read or write data faster than 40 Mb/s, so even the slower SATA is faster than is necessary.

• The standard PCI bus transfers four bytes of data with every tick of a 33 MHz clock. It

therefore also has a rate of 133 Mb/s. • Each line of a PCI-Express bus transfers data at 250 Mb/s. PCI-e slots on the mainboard

can have 1, 2, 4, 8, or 16 lines that combine their data transfer capability. However, just one PCI-e line is faster than the old PCI bus, and only powerful and expensive RAID disk

adapters typically need more than one PCI-e line. • The USB port running in high speed mode transfers 480 megabits per second. That is 60

megabytes per second.
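
The arithmetic behind this list is simple enough to check yourself. The Python sketch below recomputes the rates; the multiplier value is an assumed example, and all figures are peak rates rather than measured throughput.

fsb_clock_mhz = 266      # mainboard clock delivered to the CPU
transfers_per_tick = 4   # the CPU moves data four times per tick
multiplier = 12          # example value; varies by CPU model
print(f"Effective FSB:  {fsb_clock_mhz * transfers_per_tick} MHz")
print(f"Internal clock: {fsb_clock_mhz * multiplier / 1000:.1f} GHz")

ddr_clock_mhz = 400      # DDR transfers twice per tick of this clock
bytes_per_transfer = 8   # single channel; 16 with dual channel
print(f"Memory: {ddr_clock_mhz * 2 * bytes_per_transfer} MB/s peak")

print(f"PCI:       {33 * 4} MB/s")  # 4 bytes per 33 MHz tick
print(f"PCI-e x16: {250 * 16} MB/s per direction")
print(f"USB high speed: {480 // 8} MB/s before overhead")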

The Athlon 64 CPU has a simpler architecture. The memory connects directly to the CPU chip, and a slightly more powerful support chip can add video support to the usual Southbridge functions.

A car drives down the local streets at 25 miles per hour. Then it turns onto a highway ramp and accelerates to 55. Is there one road system, or two? The important thing is that there is a connection that allows a flow of traffic between the two speed zones. So data may flow from the CPU to the Northbridge at 6400 megabytes per second and then queue up to flow down the PCI bus at 133 megabytes per second. The effective data rate will be the slowest bus speed, but data can flow from any device to any other device.

PCI

Even a small modern mainboard can include integrated support for video, 6-channel high definition audio, gigabit Ethernet, lots of USB ports, plus all the usual devices. If you need a different type of audio support, a TV tuner to record programs, a document scanner, or most other devices, the 60 megabyte per second transfer rate of USB 2 is more than fast enough.

However, if you need extra gigabit Ethernet, or a RAID disk adapter, or a second video card, then USB is not fast enough. You need to plug these devices into an I/O bus. For the last fifteen years, that bus has been some form of PCI.

Classic PCI

Depending on its size, every desktop computer mainboard has one to five 32-bit PCI adapter slots. This traditional PCI bus transfers 4 bytes of data with every tick of a 33 MHz clock, producing an aggregate bandwidth of 133 megabytes per second. However, this bandwidth has to be shared among all of the devices in all of the slots.

There are a limited number of "interrupt levels" available in a PC. One interrupt level can be shared by two or more devices. Each PCI slot is assigned to an interrupt level, and on most mainboards that interrupt level is already in use by some device built into the mainboard (video, disk controller, audio, Ethernet, etc.). This isn't necessarily a problem, but in practice some adapter cards don't get along well with specific mainboard devices. If a new PCI card is not working correctly, and there is another free slot in the computer, moving the card may make the problem go away. This does not mean there is something wrong with the slot. It is probably just an incompatibility with the device that shares the interrupt level with the slot.

Server PCI [becoming obsolete]

If you need to support two or more gigabit Ethernet cards, or a big RAID adapter, or a FiberChannel interface, then 133 megabytes per second is not enough bandwidth.

Server computers have addressed this problem with expanded versions of the PCI bus. First, the connection between the adapter and the slot can be widened to support 64-bit data transfer. Then the speed of the bus can be increased to 66, 100, or even 133 MHz. Twice the data per tick at up to four times the clock rate yields just over a gigabyte per second.

However, you will only find these types of PCI slots on expensive Server computers, and inexpensive mainboards have PCI-e slots that are even faster.

PCI-Express (PCI-e)

PCI-Express is a new high speed I/O bus.

The old PCI bus starts on the Mainboard at the Southbridge chip. Each of the 32 bits of data is represented by a single wire that runs through each PCI socket. The same 32 wires are used to send data from memory to the devices and from the devices to memory.

It is much easier to change hardware than software, and you don't want to have to wait for a new version of Windows or of the Linux Kernel. So the first requirement of a new bus is that it appear to the OS to be exactly the same as good old PCI. It is not necessary to change a single line of code in the OS or any device driver.

Now if you are going to have a bus that runs at a much higher speed than old PCI, you will have to connect it to the higher speed Northbridge rather than the low speed Southbridge chip on the mainboard. However, this is not a big problem, because the new bus will, among other things, replace the old AGP video interface that the Northbridge chip used to provide.

Now let's build the PCI-Express bus up component by component.

Pair: Each bit of data is carried on a pair of wires instead of on a single wire. Balanced signals mean that you can run a much higher clock speed with a much lower voltage. The old PCI bus ran at 33 MHz on a desktop and maxed out at 133 MHz on exotic servers. The PCI Express bus runs at 2.5 GHz, but to detect errors and provide timing, it takes 10 clock ticks to transmit an 8-bit byte. Therefore, the pair of wires transmits 250 megabytes per second.
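
The numbers "2.5 GHz" and "250 megabytes per second" sound unrelated, so the arithmetic is worth spelling out:

signal_rate_hz = 2.5e9  # ticks per second on one pair of wires
ticks_per_byte = 10     # 8 data bits encoded into 10 bits on the wire
print(f"One lane: {signal_rate_hz / ticks_per_byte / 1e6:.0f} MB/s per direction")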

Point-to-Point: Each pair of wires goes from the Northbridge to a single device. The old PCI bus ran the same wire through every device slot. With PCI Express, a pair of wires is dedicated to a single slot.

A Lane: One pair of wires carries data from the Northbridge to the device. A second pair carries data from the device to the Northbridge. This group of two pairs (four wires) is called a "lane".

Uni-Directional: The pairs operate independently, so a device can be sending and receiving data at 250 megabytes per second at the same time on each lane. A few vendors claim that a lane transfers 500 megabytes per second, but this is misleading because few devices transmit and receive the same amount of data.

The old desktop PCI bus could transfer a total of 133 megabytes per second of data, sent in both directions, shared among all of the five PCI slots on the mainboard. A single lane of PCI Express is almost twice as fast in either direction, can transmit data in both directions simultaneously, and is dedicated to a single device. For ordinary desktop use, one lane of PCI Express is very fast.

However, the more exotic forms of Server PCI could go faster, as could the old AGP slot. To match or exceed these higher speeds, PCI Express allows one device to use two or more lanes at the same time.

Round-Robin: The PCI bus transmitted one bit of data down each wire, and the receiver accumulated these bits to form the data. A PCI Express lane always sends a complete byte down the wire in 10 ticks of the 2.5 GHz clock. When a device is connected by more than one lane, the bytes are transmitted "round robin" by assigning each consecutive byte to the next lane, then wrapping back from the last lane to the first. Two lanes can carry 500 megabytes per second in each direction, four lanes can carry a gigabyte, eight lanes can carry 2 gigabytes, and sixteen lanes can carry 4 gigabytes per second in each direction.
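
A toy model makes the striping concrete. This sketch just deals the bytes of a packet out to the lanes the way the text describes; a real link also adds framing and error checking.

def stripe(data, lanes):
    # Assign each consecutive byte to the next lane, wrapping around
    queues = [bytearray() for _ in range(lanes)]
    for i, byte in enumerate(data):
        queues[i % lanes].append(byte)
    return queues

for lane, q in enumerate(stripe(b"ABCDEFGHIJ", 4)):
    print(f"lane {lane}: {q.decode()}")
# lane 0 carries A, E, I; lane 1 carries B, F, J; and so on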

x Notation: The convention is to use an "x" followed by the number of lanes in use. This is, unfortunately, often confused with the AGP speed notation. An "x16" PCI Express video card has 16 lanes and can transmit data at 4 gigabytes per second in both directions simultaneously. An old 8x AGP card runs at "8 times" the base speed of the interface and can transfer only about 2 gigabytes per second totaled over both directions.

Negotiate: At startup time, the Northbridge sends a message down each lane of PCI Express asking the device at the other end to identify itself. When it gets back the same identity from two or more lanes, it configures the device to round-robin the byte transmission across the lanes that are connected to that device.

Similarly, when a PCI Express device is plugged into a socket, it does not know how many lanes it will actually be able to communicate across. Every PCI Express device must be ready to do everything on just one lane if that is all the mainboard is willing to allocate to it. The extra lanes don't add anything except additional transmission capacity.

Power then data: A PCI Express socket has some power pins, a plastic barrier, and then a slot for signal pins. The signal slot can be large enough to accommodate connectors for 1, 4, 8, or 16 lanes. A PCI Express card has connectors for the power, a gap that matches the plastic barrier in the slot, and then a tab that plugs into the signal slot.

The card can be shorter than the slot. A PCI Express card with a short tab can always plug into a plastic socket that is longer. Thus a PCI Express card with one lane can plug into any mainboard PCI Express slot, even one designed for 4, 8, or 16 lanes. Alternately, a PCI Express socket large enough to accommodate an x16 card will also accept any smaller size of card.

The data can also be shorter than the slot. A mainboard doesn't have to connect an actual lane of transmission capability to every connector on the slot. Several mainboards have "Universal" x16 plastic slots to which only 8, 4, or 2 lanes are actually connected. At startup the card will sense which lanes are active and use only the ones it really has.

Thus at startup the mainboard indicates how many lanes it has, and the card responds with the number of lanes it can accommodate. They end up using the smaller of the two numbers, a width both ends can support.
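
The negotiation therefore reduces to taking the smaller of the two widths, rounded down to a width both ends recognize. A minimal sketch:

SUPPORTED_WIDTHS = (1, 2, 4, 8, 16)

def negotiate(slot_lanes, card_lanes):
    # Use the widest standard width that fits both limits
    limit = min(slot_lanes, card_lanes)
    return max(w for w in SUPPORTED_WIDTHS if w <= limit)

print(negotiate(16, 16))  # 16: full-width video slot
print(negotiate(8, 16))   # 8: "Universal" x16 slot with 8 wired lanes
print(negotiate(16, 1))   # 1: x1 card in a long slot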

However, today there are only PCI Express video cards, and they all have connectors for 16 lanes. They can only plug into a plastic slot that also has room for 16 lanes. Most PCI Express boards have one primary video card slot to which 16 lanes are normally connected. This means the card can both transmit and receive data at 4 gigabytes per second.

Some boards have a second video slot. However, almost no boards have enough lanes to assign 16 to each slot. One common configuration is the Nvidia "SLI" approach, in which there are two x16 plastic video slots into which two x16 video cards are plugged. However, the mainboard only connects 8 lanes to the first 8 connectors on each card. Thus the 16 available lanes are split evenly between the two adapter cards.

All the other approaches attach the full 16 lanes to the first slot, but then the second slot really has only 2 or 4 lanes of transmission capacity. That may not seem like a lot, but it is perfectly adequate bandwidth for anything except playing Doom, and video games don't support more than one display monitor. A second video card can then be used to connect 3 or 4 monitors, a configuration that is more popular in business, where x2 or x4 is perfectly adequate video bandwidth.

x8: Intel Servers may have plastic slots for x4 and x8 adapter cards. This will be useful at some time in the future when there are x4 and x8 adapter cards, probably some type of SCSI RAID adapter. None exist now.

x1: Desktop boards often have some x1 plastic slots for the still-rare x1 PCI Express adapter cards. These are often right next to the video card slot. Before you complain about the wasted space, realize that modern video cards often require extra power and have large fans and heat sinks to dispose of the waste heat. Sometimes the video card doesn't actually fit in just its own slot and encroaches on the slot next to it. So if the neighboring slot is a little-used x1 PCI Express slot, you haven't really lost anything.

Power: The PCI Express standard requires that the mainboard deliver more power to bigger slots than to smaller slots. There is a table of required power delivery for x1, x4, x8, and x16 plastic slots. Even when the mainboard doesn't populate all the data connectors with active lanes, it must deliver the amount of power indicated for each slot size.

PCI or PCI-e

An ATX board has room for 7 slots. A microATX board has room for 4 slots. Since modern video adapters use 16 lanes of PCI-e, there will typically be one full-sized PCI-e slot on the board. The rest of the slots will be divided between PCI and PCI-e based on guesswork. You choose a mainboard based on your own guesswork of how many slots of each type you intend to use.

The very expensive video cards have very hot processing units that require extra cooling. As a result, they are often designed to occupy two card slots instead of one. Mainboard vendors typically put a small x1 (one lane) PCI-e slot immediately next to the full size x16 slot reserved for the video card. If you buy a more modest video card, you can use the x1 slot for some other PCI-e card.

For the very enthusiastic gamer, vendors build an "SLI" or "Crossfire" board with room for two double-slot PCI-e video cards. Such a board will typically have room left for only two traditional PCI slots. The full size x16 PCI-Express slot can take smaller x4 cards, so even if you don't intend to use two video cards it can be useful for, among other things, an x4 RAID adapter.

If the mainboard has only one PCI-e video card slot, then there is room for 2 other PCI-e x1 or x4 slots, and four traditional PCI slots.

What can you do with a PCI-e slot? If the mainboard has a limited number of SATA ports, or has no external e-SATA port, then you can spend $30-$40 for a PCI-e adapter card that adds that capability. If you want server class I/O capability, you can get an x4 PCI-e disk RAID adapter card for $800 that connects a dozen disks with hardware error recovery. Consumers will be pleased to know that a small number of vendors are adding TV tuner cards for Media Center use that have a PCI-e x1 connection.

However, the industry is still in transition and there are (at the start of 2007) still fewer PCI-e cards than PCI. Creative, for example, only has PCI versions of its add-in audio adapters. The PCI TV tuner cards are less expensive and more numerous than PCI-e cards.

Long term, PCI-e is better. Short term, particularly if you have a bunch of existing cards that you want to keep, you need a mainboard with an adequate number of old PCI slots.

It is useful to compare the PCI-Express standard from Intel to the HyperTransport standard used by AMD and Apple:

• PCI-Express supports adapter cards plugged into sockets on the mainboard. HyperTransport connects chips soldered to the mainboard and cannot be connected to an I/O socket.

• PCI-Express runs point to point between the Northbridge and one device. HyperTransport is also a point to point connection, but each chip has a bridging capability to the next chip, so one HyperTransport bus can connect a sequence of chips through bridged point to point links.

• PCI-Express has variable bandwidth represented by the number of lanes dedicated to a device. However, a device has exclusive control of whatever bandwidth is assigned to it. HyperTransport has a fixed bandwidth that is shared by all the bridged chips.

That said, this is mostly a theoretical comparison. If you buy a mainboard with an Nvidia NForce chipset, the board will use HyperTransport between the CPU and the chipset and PCI Express between the chipset and the video adapters. Each bus has its own role and its own devices.

Ethernet

An Ethernet adapter card connects an office PC to the corporate network. At home it connects several computers to each other for file sharing, and it allows all the computers to share a single high speed Internet connection over a DSL or Cable modem.

1 Jan, 2007

Ethernet

Ethernet has become a standard feature of every mainboard, because Ethernet is the preferred connection to DSL from the phone company or cable modems from Comcast. New mainboards support Gigabit Ethernet, but there are still some older models that have 100 megabit Ethernet. While you can attach one computer directly to the DSL or cable modem, for $30 you can buy a router and connect all the computers in your house to Internet services. In the process, you will have connected all your computers to each other.

What is Ethernet? Ethernet today has almost nothing to do with the ideas of the researchers at Xerox who invented it. The first Ethernet was just a thick copper wire encased in a protective sheath. Each computer would drill into the wire with a special tap. Since there was just the one piece of copper, any data transmitted by any computer would be received by all the other computers connected to the same wire.

Wire is cheap and dumb. However, the equipment needed to connect to the wire was expensive. Twenty years ago the card that connected a minicomputer to the Ethernet cost $2000, while a bridge (a two port switch to connect two separate Ethernet wires to each other) cost $7000. Then chips got cheaper. Today an Ethernet adapter is $15 and an eight port switch costs $28.

The change in technology and economics transformed the Ethernet physically. It no longer made sense to have a single big dumb piece of copper. With cheap smart circuits, the network could be made simpler and cheaper by connecting computers to central switches over ordinary (although high grade) phone wires.

Your home telephone connects to the phone company over one pair of copper wires. This one pair of wires both sends whatever you say and receives whatever is said by the person at the other end. However, in most cases both of you don't try to speak at the same time, and voice is a relatively small amount of data.

Ethernet operating at speeds up to 100 megabits uses two pairs of copper phone wires. Data transmitted by a computer goes out one pair, while data received from all other computers (and from the Internet) comes in on the other pair. Gigabit Ethernet uses four pairs of wires and transmits in both directions on all pairs.

Cables and Jacks

An ordinary telephone uses the small standard phone company jack called an "RJ11". It supports four wires. The phone company also has a larger standard jack called an "RJ45" with room for eight wires. Normally the larger jack is used for corporate phone systems with many lines. Ethernet standardized on the larger jack even when it only uses four wires. If nothing else, this is useful for distinguishing the network jack from the smaller modem phone line jack on most laptops.

At speeds of 10 or 100 megabits, the Ethernet devices at each end of the wire (the computer and the switch) each expect to transmit data on one pair of wires and receive data on another pair. They have to choose pairs that match. This is achieved in several ways (a sketch of the pin assignments follows this list):

• Computers and printers are all wired to transmit on one designated pair. Switches, routers, and modems, on the other hand, expect to receive data from that pair and transmit on the pair computers receive on. So an ordinary cable can connect a computer to a port on a switch.

• Sometimes you want to connect similar devices directly to each other. For example, you can create an "Ethernet" simply by connecting two computers to each other. However, since the two Ethernet ports are wired identically, you need a "Crossover" cable. This cable connects each pair of wires to one position on the plug at one end, and the opposite position on the plug at the other end. What one computer regards as transmit, the other regards as receive.

• When one switch is full, you get additional ports by connecting it to another switch. You could connect the two switches with a special Crossover cable. However, this is such a common requirement that one port on each 10/100 megabit switch is specially wired as the uplink port. That port is wired like a computer instead of like a normal switch port, so an ordinary cable can be connected from the uplink port of one switch to any standard port on another switch.
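
As promised, here are the pin assignments as a lookup table. Pins 1 and 2 transmit and pins 3 and 6 receive on a PC-style 10/100 port; pins 4, 5, 7, and 8 are unused at these speeds and pass straight through.

# Which pin at the far end each pin connects to
STRAIGHT = {pin: pin for pin in range(1, 9)}
CROSSOVER = {1: 3, 2: 6, 3: 1, 6: 2, 4: 4, 5: 5, 7: 7, 8: 8}

def far_end(pin, cable):
    return cable[pin]

print(far_end(1, STRAIGHT))   # 1: PC transmit meets the switch's receive
print(far_end(1, CROSSOVER))  # 3: PC transmit meets the other PC's receive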

When you move to Gigabit Ethernet, however, there are no dedicated transmit and receive wires. Each wire pair has to carry 250 megabits per second of the aggregate 1 gigabit load, which means every pair has to both transmit and receive data. When a Gigabit Ethernet device (computer or switch) is connected to an older 100 megabit device, the two ends not only sense the slower speed but also sense which pair of wires to use as transmit and which as receive.

The original Ethernet standard operated at 10 megabits per second. When run over twisted pair wire, this standard is called "10BaseT". The speed is "10" (megabits/sec), and the "T" is for "telephone twisted pair". "Base" stands for a "baseband" signal. In the popular press, "broadband" has been used as a synonym for "high speed". In technical standards, however, "broadband" means that the data is transmitted over a carrier frequency, such as a channel in a Cable TV system. The phone company transmits DSL over the same pair of wires that carries your voice call, but the data is carried at a much higher frequency than the human ear can hear.

The current standard supports 100 megabits over the same type of cable, so it is called "100BaseT". Actually, the required quality of the cable is slightly higher for 100BaseT than for 10BaseT. Cable quality is designated as Category 3, 4, 5, or 6. Normally this is shortened to "Cat", and you will sound more impressive if you ask for "Cat 5" cable. The cable gets better with every higher number. Higher quality cable may cost a few cents more, but as everyone with a closet full of power cords can testify, wire lasts for decades while technology changes.

The highest current standard is Cat 5E or Cat 6 cable. This wiring connects all four twisted pairs to all eight pins on the RJ45 plug. It supports 10 and 100 megabit transmission, but it also supports the newer standard for Gigabit Ethernet, 1000BaseT.

Packets and Hardware Addressing

Today Internet protocols are used for everything. Ethernet, however, predates the Internet and has its own conventions for device addressing and packet formation. Ethernet conventions extend only as far as the wire. An Ethernet may connect devices in your home, but to communicate outside your house you need Internet support.

When an Ethernet was formed from one shielded copper wire, the maximum size for each packet of data was set at 1500 bytes. Anything bigger has to be broken down into multiple packets, and after a device sends one packet it must pause before sending the next. All this made sense when devices shared the same wire, but with modern equipment these conventions just slow down large file transfers.

Every Ethernet adapter is assigned a unique six byte number called its "MAC" address. Every packet of data has a source MAC address (of the adapter that sent it) and a destination MAC address. Normal data is sent to one machine, but a packet can be given a "broadcast" address, in which case it will be duplicated by the switches and sent to every computer in the local network. The adapter card in every computer checks the destination MAC address of every packet it receives. It accepts packets addressed to it or carrying the broadcast address, and discards all data addressed to another machine.

Modern switches watch the packets that pass through them and learn the port to which each MAC address is connected. However, a residue of the old days, when the Ethernet was just a dumb piece of copper, is the convention that all packets could be broadcast to all computers and the adapters would ignore packets not addressed to them. The ability of switches to filter and direct traffic aids performance, but it is not required for the system to work.

Internet protocols were added on top of this system of Ethernet packets. Each Internet device has an IP address, and Internet packets are directed to the IP address. Each computer or router maintains a table that maps IP addresses to Ethernet MAC addresses. Traffic to other computers on the local network is sent directly. Traffic to computers outside the local network goes out through the gateway router connected to the modem.
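
The "send directly or use the gateway" decision can be sketched in a few lines with Python's standard ipaddress module. The addresses here are illustrative, borrowed from the home network range discussed later in this article.

import ipaddress

LOCAL_NET = ipaddress.ip_network("192.168.1.0/24")
GATEWAY = ipaddress.ip_address("192.168.1.1")

def next_hop(destination):
    # Local traffic goes straight to the destination; everything
    # else is handed to the gateway router
    dest = ipaddress.ip_address(destination)
    return dest if dest in LOCAL_NET else GATEWAY

print(next_hop("192.168.1.105"))  # a PC in the same house: direct
print(next_hop("130.132.51.8"))   # a server on the Internet: via gateway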

The maximum packet size of 1500 bytes reflects a physical limitation of a type of wiring that hasn't been used in ten years. However, for compatibility purposes, it is still the default maximum packet size on modern equipment. Gigabit Ethernet is slowed down by the requirement to send lots of data in such small packets. Gigabit Ethernet devices therefore can use "Jumbo" packet sizes, typically up to 9000 bytes. If you transfer large files between computers on a home network, enabling Jumbo packets should improve performance. However, you need to buy a $30 Ethernet switch that supports Jumbo packets and not a $30 Ethernet switch that doesn't.

Switches, Routers, Gateways, and Firewalls

A DSL or Cable modem frequently comes with an Ethernet adapter for a PC and a cable. Put the adapter in the PC, connect it through the cable to a jack in the modem, install the software, and the computer is connected to the Internet. This creates a simple Ethernet with just two devices.

To share the Internet connection or other devices between two or more PCs, you need a switch or router.

A "switch" is a device typically costing $30 to $50 with a row of jacks. Connect each computer to the switch through phone wire cable. Any data sent by any computer goes through the switch and

arrives at the computer or device to which it was directed. A switch knows nothing about Internet

Page 57: Computer Hardware and Peripherals

Computer Hardware and Peripherals

S Sandeep Kumar

protocols. Data move through the switch, but the switch itself neither generates nor receives messages.

A "router" is a slightly more expensive and more intelligent device. Home users typically

purchase a router that controls the DSL or Cable modem connecting to the Internet. A router knows Internet protocols. It has an address just like the computers. Modern routers frequently have a built in Web Server and can be controlled from a PC Web Browser.

To clarify obsolete terminology, a "hub" is an older device that does a subset of the functions of a modern switch. Given current prices, it makes no sense today to use hubs.

A switch has memory to hold some amount of data from each device. This allows different computers to connect to the same switch at different speeds. For example, a very old printer could connect to the switch at 10 megabits per second, while an old computer connects at 100 megabits per second, and a current computer connects at gigabit speed. The switch receives the data at whatever speed the sender can manage, then turns around and sends the data on at whatever speed the receiver can support. Gigabit data is retransmitted at 10 megabits per second if it goes to the printer.

The switch negotiates speed with each device and learns its MAC address. Ethernet packets have an address field that contains the MAC value of the intended receiver. A switch forwards each packet only to its intended recipient.

Ethernet was developed by Xerox back in the 1970s. The Internet became widely used in the middle of the 1990s. Today most Ethernet traffic uses Internet protocols, but they are really two different communications systems. The Internet uses IP addresses and can transmit data around the world. Ethernet uses MAC addresses and can transfer data around your living room (or around your house, if you run the wires that far). Switches operate on the Ethernet level and look at MAC addresses.

One device in your home will probably take all the Ethernet traffic and connect it to the Internet through your DSL or cable modem. It is called a Router. A Router operates on IP addresses and the worldwide Internet protocols.

The Router that you buy for $40 is actually a little computer. In many cases it runs a special version of the Linux operating system. Companies that modify Linux have to publish the source code for their changes, so programmers have modified this source and offer alternative versions of the firmware for popular Router devices. Some Routers will also connect to and share printers or disks. Almost every Router has a firewall that prevents programmers in Romania from trying to hack into your home computers.

A typical Router has a four port Ethernet switch to connect your home computers, but if you have more than four computers you can simply connect one port of the built in switch to a second external switch and add the extra computers to it. Since the Router connects to a DSL or cable modem, there is no particular reason for that link to run faster than 10 megabits, but with modern equipment it is more convenient if the built in switch supports gigabit speeds and Jumbo packets. If it doesn't, just get an external switch that does.

For a few dollars more, you can get a Wireless Router that also supports wireless Ethernet connections from laptops and handheld devices.

Internet Addressing

Ethernet delivers packets based on the MAC address. Internet protocols require a second address number called the "IP Address". The IP address is a four byte number, and by convention it is represented as the decimal value of each byte (0 to 255) separated by periods. Yale University, for example, has IP addresses beginning with 130.132.*.*, and the machine on which PCLT is hosted at the time this is being written has address 130.132.51.8. Every source or destination of messages on the Internet has to be assigned one of these numbers. There are enough consumers who cannot set the clock on their microwave oven that expecting them to correctly enter a number like this into the system is unreasonable. Most of the time the number is provided automatically over the network.

The phone company will have assigned one IP address to your DSL modem, or the Cable TV company will have assigned an IP address to your cable modem. Unless you have purchased an extra cost business service, the IP address you have been assigned can change from day to day. The provider has a pool of available addresses, and when you begin to use the service it assigns an unused number from the pool for your temporary use. If you use a dial up phone line to connect to the Internet, the Internet Service Provider gives you a phone number to dial and an id and password to log on to their system. During the logon the ISP passes back to your machine an IP address it should use during the connection.

The same approach is used when a high speed DSL or Cable modem is connected to a home network through an Ethernet Router box. The router is configured with the node name, userid, and password needed to log on to the ISP network. The ISP passes back an IP address value that the Router box then uses to communicate with the outside world.

In either case, the IP address provided by the ISP, even temporarily, allows the one computer or the one Router box to communicate with any mail, Web, or other server anywhere in the world. This still leaves the question of how computers inside your home talk to each other or to the Router box. The answer is a trick that Routers know called "NAT" (Network Address Translation). The NAT function in the Router rewrites all messages from the other computers so that, to the outside world, they look like they came from programs running inside the Router itself. Therefore, the other computers in the home network don't have to be assigned addresses that are meaningful outside the home.
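
A toy model of the trick: the Router hands each outgoing connection a port number on its single public address and remembers the mapping so that replies find their way back. The public address and port range used here are made-up illustrations.

PUBLIC_IP = "203.0.113.7"  # example address assigned by the ISP
nat_table = {}             # public port -> (private address, private port)
next_port = 50000

def outbound(private_ip, private_port):
    # Rewrite a connection from a home PC to look like the Router
    global next_port
    nat_table[next_port] = (private_ip, private_port)
    next_port += 1
    return PUBLIC_IP, next_port - 1

def inbound(public_port):
    # Deliver a reply back to the home PC that opened the connection
    return nat_table[public_port]

print(outbound("192.168.1.100", 3345))  # ('203.0.113.7', 50000)
print(inbound(50000))                   # ('192.168.1.100', 3345)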

The Internet reserves sets of IP addresses for non-public use. These numbers can be assigned to machines that are isolated from the public network and either do not communicate at all or else communicate only through gateways. A popular range reserved for non-public use is the set of addresses beginning 192.168.*.*; home routers typically use the addresses beginning 192.168.1.

The simplest way to assign IP addresses to all the computers of a home network is to let the Router box that provides connectivity to the Internet assign numbers on request to any machine that asks for one. By default, the Linksys Router assigns itself the address 192.168.1.1 in the home network. It then skips numbers 2-99 and assigns numbers as requested by computers, starting at 192.168.1.100. The protocol for serving up IP address values on request is called DHCP. All of these values can be configured in the advanced control panels of the Router, but there is typically no reason to change them.
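
The handout logic is simple enough to sketch. This toy allocator mimics the Linksys behavior just described; a real DHCP server also expires leases after a set time, which is omitted here.

import ipaddress

POOL_START = ipaddress.ip_address("192.168.1.100")
leases = {}  # MAC address -> assigned IP address

def request_address(mac):
    # A machine that asks again gets its existing lease back;
    # a new machine gets the next address in the pool
    if mac not in leases:
        leases[mac] = POOL_START + len(leases)
    return leases[mac]

print(request_address("00:16:17:aa:bb:01"))  # 192.168.1.100
print(request_address("00:16:17:aa:bb:02"))  # 192.168.1.101
print(request_address("00:16:17:aa:bb:01"))  # same machine, same address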

Having explained how this all works: in practice the equipment and services come configured so that you don't need to know the details.

The ISP will provide you with a DSL or Cable modem, some software for a computer, and the names and passwords needed to access the system. Since some ISP agreements don't allow multiple machines in a home network to share the same line, it may be a good idea, while the installer is in the house, to hide any Router in a closet and install and test everything on one computer.

After the ISP equipment has been tested, replace the single computer with the Router box and connect at least one computer Ethernet adapter to the Router. The computer should be set to pick up its IP address automatically from the network, and if it is the same computer used to test the modem, it should be rebooted so it picks up a new address from the Router. Now follow the instructions in the Router manual to configure the Router with the same ID and password that the ISP provided for the previous connection. It may be helpful to know the buzzword that identifies the particular type of logon protocol used by the ISP (for example, "PPPoE" is a popular choice), since this has to be selected from a menu of options in the Router.

Once the Router logs on successfully to the ISP, computers connected to it through Ethernet should be able to access Web sites. The IP addresses vended by the Router also allow the computers to talk to each other to share files and printers.

Wireless

a, b, g, and n

The FCC in the US and its international counterparts license various frequencies to radio, TV, military, and other users. Specific bands of frequency are assigned for "unlicensed" use by household devices. The first devices to use these frequencies were cordless telephones. Computers quickly followed.

The first unlicensed frequency range was 900 MHz. There are still wireless phones at this frequency, but the initial generation of non-standard wireless computer cards that used it has been phased out. A second band of frequencies was opened at 2.4 GHz. This is the most popular choice for wireless phones and for the current standard "802.11b" and "802.11g" ("WiFi") wireless Ethernet equipment. A newer band of frequencies at 5 GHz is now becoming available. It is used for "802.11a" wireless Ethernet equipment, but no wireless phones currently operate in this range.

The 2.4 GHz frequency (b and g) is preferred for wireless phones because it has good performance, long range, and some ability to pass through the walls of a house. Its disadvantage is that the frequencies are crowded with devices, and they are subject to interference from microwave ovens. The 5 GHz devices (a) are free from interference, but they don't reach as far and have serious problems passing through walls. In fact, the "a" devices operate over such short distances and limited environments that they are almost worthless.

The b standard runs at 11 megabits per second. The g standard nominally runs at 54 megabits per second, but vendors have come up with ways to use two channels inside the frequency band to double this to 108 Mb/s.

Obviously the next step is to take over all the frequencies a device can get access to and crank the speed up as high as possible. This idea is being explored in a new standard called 802.11n. The problem in designing the n standard isn't how to increase speed, but how to design a device considerate enough that it doesn't shut down your own wireless phone, or the wireless phones and routers of everyone in your neighborhood. There are some "pre-n" routers on the market, but you will be really unpopular if you install one anywhere but on a farm or in a cabin in the woods.

The simplest way to add wireless capability to a home system is to use a Wireless Router instead of a conventional Router to connect to the DSL or Cable modem. Wireless routers have all the functions and Ethernet ports of the standard router, with a set of antennas and wireless capability added in. Prices vary, and you may prefer another vendor, but for reference the following three Linksys routers were priced in Dec, 2004:

Model     Function                                  Price
BEFSR41   4 RJ45 ports (no wireless)                $54
WRT54G    802.11g Router + 4 RJ45 ports             $80
WRT55AG   802.11a, b, and g Router + 4 RJ45 ports   $115

The most common wireless Ethernet adapter is a Cardbus device that plugs into laptop computers. To make a wireless connection from a desktop computer, the most convenient option is probably an external network adapter device with a USB connection to the PC.

WEP

Wireless Ethernet broadcasts data for at least a hundred feet, and the signal may go much farther if the recipient uses more sensitive professional equipment. To provide even the most basic element of privacy, the data should be encrypted.

Wireless standards provide for data encryption called "WEP". WEP comes in 64, 128, and 152 bit versions. The larger number is better, but it must be supported by all of the devices in the network. It is generally agreed that 64 bit WEP is not particularly good, but it is still better than nothing. Use at least 128 bit if possible.

WEP is driven by an encryption key. You can generate the key manually, but there is typically an algorithm that will generate a key from a password. The key is initially generated on the Access Point. It must then be entered into the configuration panels of every computer that you want to connect to the Access Point. Since it is very easy to get this wrong the first time you try, make sure that there is at least one wired Ethernet computer that can connect to the Access Point and run the configuration panels. Otherwise, if something goes wrong, you may not be able to get back to the Access Point with any wireless device to check or change the WEP configuration.
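
The password-to-key algorithms were never formally standardized. One widely copied de facto scheme for 128 bit keys repeats the passphrase to fill 64 bytes, hashes the result with MD5, and keeps the first 13 bytes (104 secret bits; the other 24 bits of the "128" are a per-packet value). The sketch below illustrates the idea; do not assume any particular router uses exactly this recipe.

import hashlib

def wep128_key_from_passphrase(passphrase):
    # De facto scheme: repeat the passphrase to 64 bytes, MD5 it,
    # keep 13 bytes (104 bits) as the hexadecimal key
    buffer = (passphrase.encode() * 64)[:64]
    return hashlib.md5(buffer).digest()[:13].hex()

print(wep128_key_from_passphrase("my home network"))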

Infrastructure

Wireless Ethernet adapter cards can be configured to run in "ad hoc" or "infrastructure" modes. The "ad hoc" mode allows any two computers that come within range of each other to begin communicating. This is not, however, an easy configuration to debug. Furthermore, low level Ethernet connectivity has already been shown to be useless without also getting an IP Address. Since "ad hoc" operation requires manual configuration of IP, it is difficult to set up.

Normally the adapter is configured for "infrastructure" mode. It then searches not for another computer, but for a Wireless Router or "Access Point" such as the devices listed in the table above.

A 2.4 GHz device (b or g) has a range of 100 to 150 feet indoors, less through thick walls. A 5 GHz device has a range of 25 to 75 feet and generally cannot penetrate a real wall. To provide full coverage, a company may scatter Access Points around a building. By luck, somebody is going to be located midway between two Access Points with the opportunity to connect to either.

Access Points are configured with a network identifier (SSID) and a channel number (recommended to be 1, 6, or 11). Access Points that cover adjacent territory should be assigned different channels so their signals do not interfere with each other. Generally, Access Points shared by workers in the same company, or Access Points at opposite ends of a really big home, will have the same SSID. You can configure the Access Point either to broadcast the SSID or to be quiet. Broadcasting the SSID makes it easy to select a particular Access Point when there are several networks close to each other, but keeping the SSID a secret improves security.

If you live in an apartment building, it is possible that the signal from a neighbor's Access Point will leak into your apartment. In that case it is strongly recommended that you choose a different SSID and a different channel.

When you install a Wireless Ethernet adapter in a computer and set it up for "infrastructure" mode, the Windows support will display the SSID of all the Access Points close enough to read their broadcast. The user must select one Access Point, and if it is secured must provide a WEP Key.

USB and FireWire

To connect external devices (printers, scanners, disks, CD or DVD writers) to a computer there are two popular connection standards. USB 2.0 and FireWire provide full speed support for large numbers and a broad variety of external plug and play devices.

1 Jan, 2007

USB, 1394 (Firewire), and e-SATA

When IBM designed its first PC, it minimized any custom hardware. The box used industry standard off-the-shelf chips for each interface. The keyboard was connected to one type of chip, serial ports ran off a National Semiconductor serial chip, and the printer connected to a "parallel" port driven by yet a third chip.

Today all these specialized devices are supported as functions of the "Southbridge" chip on the mainboard. It is expensive to design all these different specialized functions into the chip and to provide connectors from that chip to all the different ports. Even the physical specialized connectors are expensive, and the parallel printer port takes up more space than any three other devices.

These "legacy" interfaces are obsolete. Today all their functions, and may additional hardware services, can be provided by a single standard device interface. Computers have been shipping

with a USB port for five years. There are USB keyboards and mice. Printers and scanners come with a USB interface. A few vendors have shipped "legacy free" mainboards that replace all the old connectors with USB ports, but consumers have expressed some resistance to change. The old inefficient interfaces remain because people seem to want them.

USB

Every modern computer mainboard has four USB 2 ports on the back. Typically the mainboard will have two additional "headers" onto which you can plug a cable that feeds two additional pairs of USB ports. A case will have one pair of USB ports on the front, and mainboards often come with a bracket that you can screw onto the end of an unoccupied card slot to provide two more USB ports in the back.

All the devices (except video) that used to have individual ports on the back of the computer can now be connected to a USB port. The keyboard and mouse can connect to either USB or the round "PS/2" port. Printers now connect directly to USB instead of the massive obsolete "parallel" port. If anyone needs a modem any more, an external serial port can also be connected through USB.

If you have an old computer without USB 2, you can buy a PCI card with 4 or 5 USB ports for $27.

The modern USB 2.0 port supports three specific transfer speeds:

• A "low speed" USB device transfers data at 1.5 megabits per second. • A "full speed" USB device transfers data at 12 megabits per second. • A "high speed" USB device transfers data at 480 megabits per second.

Unfortunately, a buyer has to be careful of the terminology. Technically, USB 2.0 is a specification of the physical and electrical interface, and a few vendors claim USB 2.0 ports that do not actually support high speed devices. So make sure that a device claims "high speed" or 480 Mb/s before buying it.

Converting to bytes, USB appears to deliver 60 megabytes per second, but in practice it does not transfer data that fast. There is overhead, and in practice a USB disk is unable to operate faster than 10-16 megabytes per second (the sketch below turns these rates into copy times).
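
To make these rates concrete, here is a sketch of the time to copy a 700 MB file (one CD image) at each nominal USB speed and at the practical disk figure just mentioned:

FILE_MB = 700
rates_mb_per_s = {
    "low speed (1.5 Mb/s)":  1.5 / 8,
    "full speed (12 Mb/s)":  12 / 8,
    "high speed (480 Mb/s)": 480 / 8,
    "USB disk in practice":  12,      # within the 10-16 MB/s range above
}
for name, rate in rates_mb_per_s.items():
    print(f"{name:24s} {FILE_MB / rate / 60:6.1f} minutes")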

That is still much faster than is necessary for DVD, HD DVD, Blu-ray, and standard or high definition TV tuners. Although the video feed to monitors requires massive bandwidth, compressed HDTV amounts to only a few megabytes per second.

With an external adapter (typically built into the cable) a USB port can be connected to old parallel printers or serial modems. For around $60 several vendors produce a box that connects to one USB port on a laptop and provides legacy connectors for the keyboard and mouse (PS/2), modem (9 pin), parallel printer (25 pin), and even a 100 megabit Ethernet port. This allows the laptop vendor to omit obsolete connectors that most of the time serve no useful purpose.

FireWire

FireWire is the informal name for a standard technically known as IEEE 1394. It was developed by Apple as a low cost alternative to SCSI for disk-speed external devices. The original FireWire specification runs at 400 megabits per second, and a second generation doubles that to 800 megabits.

FireWire is more efficient than USB. A FireWire port rated at a nominal 400 Mb/s actually transfers data faster than the nominally quicker 480 Mb/s USB port.

However, FireWire has not caught on. It is available on some but not all mainboards, and it is much less common than USB for most devices. Even Apple has transferred its attention to USB.

External SATA (eSATA)

If you are looking for a simple and inexpensive way to connect external disks to your computer, the current best technology is eSATA. This is a slightly more rugged version of the simple SATA cables that connect hard disks inside the computer.

Like internal SATA, it is available in 150 and 300 megabyte per second speeds. Since desktop disks typically transfer at a maximum of 40 megabytes/sec, and even the fastest Raptor disks only do 80 megabytes/sec, 150 is a perfectly fine speed. In fact, since the slower 150 MB/s setting is slightly more robust against connection problems, the author forces all his own external SATA disks to run at 150 even when they are nominally rated at 300.

eSATA allows you to run external disks at the same speed and with the same performance as internal disks. If you have a small case, then external disks may be a requirement. Even with a large case they can be useful for backup or archival storage. While USB will continue to be useful for the wide range of slower speed devices, eSATA is establishing itself as the preferred external disk connection.

External Enclosures

For $30 one can purchase an external ATA or Serial ATA device enclosure. They come in various sizes to hold a 5 1/4 inch CD or DVD reader or writer, a standard 3 1/2 inch IDE hard disk, or a 2 1/2 inch low profile "laptop" disk.

The external enclosure can connect to the PC over a USB, FireWire, or eSATA cable. Converting a SATA disk to an eSATA cable is the simplest arrangement, but the electronics to do any other conversion are fairly trivial and a small part of the overall cost of any enclosure.

The 2 1/2 inch laptop disk is smaller and more expensive than the standard 3 1/2 inch disk. However, its power requirements are low enough that the disk and its small enclosure can be powered by the USB port on many (but not all) laptop computers. The enclosure is small enough to fit in a pocket and provides convenient supplemental storage for the typically limited disk space of laptop computers. It is also a convenient way to carry data between desktop systems.

If you need to upgrade to a standard 3 1/2 inch disk, then you need an external power source anyway. At that point the higher speed of the eSATA connection is more attractive than USB.