
Table of Contents

1. INTRODUCTION TO MEMORY STORAGE DRIVES
1.1 AS IT EXISTS TODAY
1.2 SOLID STATE DRIVES – A BRIEF OVERVIEW
1.3 THESIS OBJECTIVE
1.4 SUMMARY OF CHAPTERS
2. SYSTEM MEMORY OVERVIEW
2.1 SYSTEM ARCHITECTURE
2.2 MEMORY
2.3 STORAGE HIERARCHY
2.4 MEMORY CONTROLLER
2.5 SUMMARY
3. MAGNETIC DISK STORAGE
3.1 HARD DISK DRIVES
3.2 HARD DISK DRIVE SYSTEM ARCHITECTURE
3.3 HARD DISK DRIVE INTERFACES
3.4 EXTERNAL HARD DISK DRIVES
3.5 FUTURE OF HARD DISK DRIVES
4. SOLID STATE DRIVES
4.1 FLASH MARKET DEVELOPMENT
4.2 SOLID STATE DRIVES
4.3 PHYSICAL LAYOUT
4.4 FLASH TRANSLATION LAYER (FTL)
4.5 SOLID STATE DRIVE INTERFACES
4.6 SSD MARKET
4.7 FUTURE
4.8 SUMMARY
4.9 TYPICAL CHARACTERISTICS OF HDD AND SSD
5. PERFORMANCE: HDD VS SSD
5.1 BENCHMARK
5.2 BENCHMARK ENVIRONMENT
5.3 TPC-H BENCHMARK
5.4 ENERGY EFFICIENCY TEST
5.5 HD TUNE BENCHMARK
5.6 SUMMARY
6. BETTER INVESTMENT: SSD OR ADDITIONAL RAM?
6.1 BENCHMARK ENVIRONMENT
6.2 RESULTS
6.3 CONCLUSION
6.4 BENCHMARK PROBLEMS
7. REVERSE ENGINEERING
7.1 INTEL X25-EXTREME
7.2 CRUCIAL REAL C300
7.3 SUMMARY
7.4 CONCLUSION
8. DESIGNING OPTIMAL PERFORMANCE BASED SSD SYSTEM LEVEL ARCHITECTURE AND ITS CONTROLLER COST ESTIMATION
8.1 COST ESTIMATION OF CONTROLLER FOR SYSTEM DESIGNED TO MEET PERFORMANCE SPECIFICATION
8.2 IMPLEMENTATION FACTORS IN OPTIMIZATION TOOL
8.3 OPTIMIZATION TOOL CONSISTENCY TEST FOR CONTROLLER SIZE
8.4 HINTS TO USE TOOL FOR OPTIMAL SYSTEM DESIGN AND CONTROLLER COST ESTIMATION
9. SUMMARY
9.1 CONCLUSION
9.2 FUTURE WORK ON SSD
APPENDIX A
APPENDIX B


List of Figures

Figure 1: View of personal computer system [25]
Figure 2: Interconnections of memory components
Figure 3: Forms of storage, divided according to their distance from the CPU [19]
Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32]
Figure 5: Memory controller hub
Figure 6: Hard Disk Drive [27]
Figure 7: Representations of sectors, blocks and tracks on platter surface [27]
Figure 8: Representation of Hard Disk Drive as blocks
Figure 9: Role of Cache buffer in Hard disk
Figure 10: Typical IDE/ATA ribbon cable and its socket on a motherboard [28]
Figure 11: A single-drop 68-conductor SCSI ribbon cable [28]
Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]
Figure 13: A Seagate 1TB external hard drive [28]
Figure 14: Moving parts in Hard Disk Drives [29]
Figure 15: Evolution in density of NAND flash memory
Figure 16: HDD and SSD [30]
Figure 17: NAND flash memory chip [30]
Figure 18: Flash memory overwrite mechanism
Figure 19: A generic overview of a Flash memory bank [5]
Figure 20: Components of SSD
Figure 21: Organization of conventional SSD
Figure 22: Address translation in solid state drive [8]
Figure 23: Internal structure of solid state drive [6]
Figure 24: x4 PCI-Express card with NAND flash chips on it [31]
Figure 25: SSD market development
Figure 26: TPC-H benchmark application outline
Figure 27: TPC-H benchmark performance results
Figure 28: Comparison for energy efficiency
Figure 29: Read speed comparison
Figure 30: Access time comparison
Figure 31: Performance comparison between HDD with 12GB system RAM vs SSDs with 2GB system RAM
Figure 32: Performance comparison between HDD with 2GB, 8GB, and 12GB system RAM
Figure 33: Intel X25-Extreme SSD
Figure 34: Controller from Marvell on Intel X25-E SSD board
Figure 35: Crucial Real C300 SSD
Figure 36: Controller from Marvell on Crucial Real C300 SSD board
Figure 37: Design tool outlook
Figure 38: Warning - system is over-designed or under-designed with respect to the performance specified
Figure 39: Cost calculation tool
Figure 40: Process flow for flip chip BGA and wire bonded BGA packaging
Figure 41: Controller size for the system with SATA 2.0 interface
Figure 42: Controller size for the system with SATA 3.0 interface
Figure 43: Procedure to create application package


List of Tables

Table 4-1 SLC vs MLC [9]
Table 5-1 Overview of drives in benchmark environment
Table 6-1 Overview of drives in benchmark environment
Table 7-1 Controller chip details of Intel X25-E and Crucial Real C300 SSD
Table 7-2 Controller chip details of Intel X25-E and Crucial Real C300 SSD
Table 7-3 Interface compatibility of Intel X25-E and Crucial Real C300 SSD
Table 8-1 System interface types and their performances
Table 8-2 Buffer cache types and their performances
Table 8-3 SSD controller interface signals


Abbreviations:

Acronym Definition

BA Bank Address

BGA Ball Grid Array

CS Chip Select

CK Clock

CKE Clock Enable

CAS Column Address Strobe

CLK Clock

DRAM Dynamic Random Access Memory

DQ Data Bus

DQS Data Strobe

DM Data Mask

MA Memory Address

MLC Multi Level Cell

RST Reset

RAS Row Address Strobe

REF_CLK_P/N PCI Express Clock

SATA Serial ATA

SSD Solid State Drive

SLC Single Level Cell

PATA Parallel ATA

PET_P/N PCI Express differential signal

HDD Hard Disk Drive

ODT On-Die Termination

UART Universal Asynchronous Receiver Transmitter

WE Write Enable


Chapter 1

1. Introduction to Memory Storage Drives

1.1 As it exists today

Fifty years after the first commercial drive, the disk drive remains the prevailing storage medium in almost every computer system. Surprisingly, despite the technological improvements in storage capacity and operational performance, modern disk drives are still based on the same physical and electromagnetic principles as their earliest predecessors. With rapidly changing technologies and innovations, electronic storage devices such as computer hard disks are becoming more and more sophisticated in both design and performance. Even though traditional hard disk drives (HDDs) are being threatened to a certain extent by flash-based storage devices, they are still the most popular form of storage for computing today. Hard drives are used in everything from servers to desktops and notebooks and offer higher storage capacities than the flash-based alternatives.

1.2 Solid State Drives – A brief overview

Solid State Drives cost significantly more per unit capacity than their rotating counterparts as

of today, but there are numerous applications where they can be applied to great benefit. For

example, in transaction-processing systems, disk capacity is often wasted in order to improve

operation throughput. In such configurations, many small (cost inefficient) rotating disks are

deployed to increase I/O parallelism. Large SSDs, suitably optimized for random read and

write performance, could effectively replace whole farms of slow, rotating disks. Currently,

small SSDs are starting to appear in laptops because of their reduced power profile and their reliability (for example, shock resistance) in portable environments. As the cost of flash continues to

decline, the potential application space for solid state disks will certainly continue to grow.

Solid state drives are already among the most popular storage devices available in the electronic hardware market, and it is worth examining what these drives are and what makes them so special.


1.3 Thesis Objective

The objectives of this thesis are:

Performance evaluation of hard disk drives and solid state drives through an extensive comparison followed by benchmarking,

Analysis of the architecture of an SSD controller by reverse engineering,

Development of a tool which suggests an optimal and cost-efficient system-level SSD architecture based on a selected interface.

1.4 Summary of Chapters

Chapter 2 gives an overview of memory design architectures in traditional computer systems and a glance at the storage hierarchy.

Chapter 3 goes into detail about hard disk drives, their architecture, physical structure and operation, followed by a discussion of the different types of host interfaces used by hard disk drives today.

Chapter 4 is dedicated to flash memory technology and gives a taste of solid state drive technology: its architecture, operation and advantages over its counterpart, the hard disk drive.

In Chapter 5 the performance of magnetic disks and SSDs is analysed in different scenarios. The aim is to identify their main characteristics and to point out possible weaknesses, along with solutions to these drawbacks.

Chapter 6 gives insights into deciding between an SSD and additional RAM for performance enhancement.

Chapter 7 covers reverse engineering of the solid state drive controller and its structure at package level. Here, the different factors responsible for the varied performance of solid state drives are listed.

Chapter 8 describes the tool that was developed, which suggests a system-level solid state drive architecture for optimum performance, together with an estimated controller cost, based on the selected host interface.

Chapter 9 summarizes the results and discusses the future of solid state drives.


Chapter 2

2. System Memory Overview

2.1 System Architecture

The system architecture determines the main hardware components that make up the physical

computer system and the way in which they are interconnected. The main components

required for a computer system are listed below:

Central processing unit (CPU),

Random access memory (RAM),

Read-only memory (ROM),

Input / output (I/O) ports,

The system bus,

A power supply unit (PSU).

In addition to these core components, further components are required to extend the functionality of the system and to provide a computing environment with which a human operator can interact more easily. These could include:

Secondary storage devices (e.g. disk drives),

Input devices (e.g. keyboard, mouse, scanner)

Output devices (e.g. display adapter, monitor, printer)

The core system components are mounted on a backplane, more commonly referred to as a

mainboard (or motherboard). The mainboard is a relatively large printed circuit board that

provides the electronic channels (buses) that carry data and control signals between the

various components, as well as the necessary interfaces (in the form of slots or sockets) to

allow the CPU, Memory cards and other components to be plugged into the system. In most

cases, the ROM chip is built in to the mainboard, and the CPU and RAM must be compatible


with the mainboard in terms of their physical format and electronic configuration. Internal

I/O ports are provided on the mainboard for devices such as internal disk drives and optical

drives.

Figure 1: View of personal computer system [25]

The relationship between the elements that make up the core of the system is illustrated

below.

Figure 2: Interconnections of memory components

The data flows back and forth between the processor and the memory over shared electrical

conduits called buses which carry address, data, and control signals. Depending on the

particular bus design, data and address signals can share the same set of wires, or they can

use different sets.


External I/O ports are also provided on the mainboard to enable the system to be connected to

external peripheral devices such as the keyboard, mouse, video display unit, and audio

speakers. Both the video adaptor and audio card may be provided on-board (i.e. built in to the

mainboard), or as separate plug-in circuit boards that are mounted in an appropriate slot on

the mainboard. The mainboard also provides much of the control circuitry required by the

various system components, allowing the CPU to concentrate on its main role, which is to

execute programs. Memory is an integral and critically important part of a computational system. In this chapter, the focus is on memory organization, as a clear understanding of these ideas is vital for the analysis of system performance.

2.2 Memory

Memory lies at the heart of the stored-program computer. The system memory is the place

where the computer holds current programs and data that are in use. Although memory is

used in many different forms around modern PC systems, it can be divided into two essential

types:

Read-only memory (ROM),

Random access memory (RAM).

2.2.1 ROM

ROM refers to non-volatile memory, which means that it retains data even after power is removed; in fact, it needs no power at all to retain its contents. It is used to store permanent or semi-permanent data that persists even while the system is turned off, typically small start-up programs such as the BIOS, which is used to bootstrap the computer. There are several extended types of ROM, namely:

PROM: Programmable ROM

EPROM: Erasable PROM

EEPROM: Electrically Erasable PROM (Flash)


Flash memory is essentially EEPROM with the added benefit that data can be written or

erased in blocks, removing the one-byte-at-a-time limitation. This makes flash memory considerably faster than conventional EEPROM for bulk writes and erases.
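To make the block-erase behaviour concrete, the following minimal Python sketch (illustrative only, not taken from the thesis) models a single NAND-style block: individual bytes can only be programmed by clearing bits from 1 to 0, and restoring them to 1 requires erasing the whole block at once.

    # Minimal model of flash block semantics: program clears bits, erase resets the whole block.
    class FlashBlock:
        def __init__(self, size_bytes=4096):
            self.data = bytearray([0xFF] * size_bytes)   # erased state is all 1s

        def program(self, offset, value):
            # Programming can only clear bits (1 -> 0); it cannot set them back to 1.
            self.data[offset] &= value

        def erase(self):
            # Erasure works on the whole block, not on individual bytes.
            for i in range(len(self.data)):
                self.data[i] = 0xFF

    block = FlashBlock()
    block.program(0, 0xA5)        # first write succeeds: 0xFF & 0xA5 = 0xA5
    block.program(0, 0xFF)        # "overwriting" with 0xFF changes nothing
    print(hex(block.data[0]))     # 0xa5 - the old value is still there
    block.erase()                 # only a full block erase restores 0xFF
    print(hex(block.data[0]))     # 0xff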

2.2.2 RAM

RAM refers to volatile memory which means that the data is lost once the power is turned

off. There are two types of RAM, Static RAM (SRAM) and Dynamic RAM (DRAM).

SRAM: It consists of circuits similar to the D flip-flop. Therefore, it doesn’t need to be

refreshed, unlike its counterpart, the DRAM. SRAM is faster and much more expensive than

DRAM and is used to build cache memory.

DRAM: It stores each bit of data in a separate capacitor within an integrated circuit. The

capacitor can be either charged or discharged; these two states are taken to represent the two

values of a bit, conventionally called '0' and '1'. Capacitors leak the charge stored in them

slowly over time and thus must be refreshed every few milliseconds to prevent data loss.

DRAM is "cheap" memory owing to its simple design compared to SRAM. Designers use DRAM as it is much denser, uses less power and generates less heat than SRAM. For these reasons, DRAMs are preferred over SRAMs for building the main memory.

There are many kinds of DRAM memories and new kinds appear in the market with

regularity as manufacturers attempt to keep up with rapidly increasing processor speeds. Each

design is based on the conventional DRAM cell, with optimizations that improve the speed

with which the basic DRAM cells can be accessed.

Synchronous DRAM (SDRAM)

SDRAM has a synchronous interface, meaning that it waits for a clock signal before

responding to control inputs and is therefore synchronized with the computer's system

bus. The clock is used to drive an internal finite state machine that pipelines incoming

instructions. This allows the chip to have a more complex pattern of operation, enabling

higher speeds.


Double Data Rate SDRAM (DDR SDRAM)

DDR SDRAM has the same working principle as SDRAM. The difference is that DDR SDRAM doubles the bandwidth by double-pumping, i.e. transferring data on both the rising and the falling edge of the clock signal, without increasing the clock frequency.

DDR2 SDRAM

DDR2 is the next generation of memory developed after DDR. DDR2 increased the data transfer rate, referred to as bandwidth, by increasing the operating frequency to match high FSB frequencies and by doubling the pre-fetch depth. Like DDR SDRAM, DDR2 transfers data on both the rising and the falling edge of the clock signal. The trade-off is that internal operations are carried out at only half the external clock rate.

DDR3 SDRAM

DDR3 is the successor to DDR2. DDR3 increased the pre-fetch depth to 8 bits and increased the operating frequency once again, resulting in higher data transfer rates than its predecessor DDR2. Like DDR2 SDRAM, DDR3 transfers data on both the rising and the falling edge of the clock signal, although internal operations are limited to only a quarter of the external clock rate (a short bandwidth calculation follows after the DRAM variants listed below).

Rambus DRAM (RDRAM)

This is an alternative proprietary technology with a higher maximum bandwidth than

DDR SDRAM. Compared to other contemporary standards, Rambus shows a slight

increase in latency, heat output, manufacturing complexity, and cost.

Video RAM (VRAM)

VRAM has two ports namely, DRAM port and video port. The second port, the video

port, is typically read-only and is dedicated to providing a high bandwidth data channel

for the graphics chipset. This is used in the frame buffers of graphics systems.
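As a rough illustration of how double-pumping and the pre-fetch depth translate into bandwidth, the following Python sketch computes the peak transfer rate of a DDR-style module from its internal (core) clock, its pre-fetch factor and its bus width. The figures are approximate and for illustration only; they are not drawn from the benchmark chapters.

    # Peak bandwidth of a DDR-style module (approximate, for illustration only).
    # transfers per second = core clock * pre-fetch factor (2 for DDR, 4 for DDR2, 8 for DDR3)
    # peak bandwidth       = transfers per second * bus width in bytes
    def peak_bandwidth_mb_s(core_clock_mhz, prefetch, bus_width_bits=64):
        transfers_per_s = core_clock_mhz * 1e6 * prefetch
        return transfers_per_s * (bus_width_bits / 8) / 1e6

    # DDR-400, DDR2-800 and DDR3-1600 all use a 200 MHz core clock;
    # only the pre-fetch depth (and hence the I/O rate) differs.
    for name, prefetch in [("DDR-400", 2), ("DDR2-800", 4), ("DDR3-1600", 8)]:
        print(name, round(peak_bandwidth_mb_s(200, prefetch)), "MB/s")
    # -> DDR-400 3200 MB/s, DDR2-800 6400 MB/s, DDR3-1600 12800 MB/s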

2.3 Storage Hierarchy

Storage hierarchy refers to the different types of memory devices and equipment configured

into an operational computer system to provide the necessary attributes of storage capacity,

speed, access time, and cost to make a cost-effective practical system.


In practice, almost all computers use a variety of memory types, organized in a storage

hierarchy around the CPU, as a trade-off between performance and cost. Generally, the lower a storage tier is in the hierarchy, the lower its bandwidth and the greater its access latency from the CPU. This traditional division of storage into primary, secondary, tertiary and off-line storage is also guided by cost per bit.

2.3.1 Primary storage

Primary storage, commonly referred to as main memory, is the memory which is directly accessible to the CPU. The CPU continuously reads instructions stored there and executes them as required. Any data actively operated on is also stored there in a uniform manner.

Besides the main large-capacity RAM, primary storage consists of two additional sub-layers, namely processor registers and processor cache, as shown in Figure 3.

Processor registers are located inside the processor. Each register typically holds a

word of data (often 32 or 64 bits). CPU instructions direct the arithmetic and logic unit to perform various calculations or other operations on this data (or with its help). Registers are the fastest of all forms of computer data storage.

Processor cache is an intermediate stage between the ultra-fast registers and the much slower main memory. It is introduced solely to increase the performance of the computer. The most actively used information in main memory is duplicated in the cache memory, which is faster but of much smaller capacity; the cache, in turn, is slower but much larger than the processor registers. A multi-level hierarchical cache setup is also commonly used: the primary cache is the smallest and fastest and is located inside the processor, while the secondary cache is somewhat larger and slower.

The secondary cache is the L2 cache, usually contained on the motherboard. However, more and more chip makers are planning to put this cache on board the processor itself. The benefit is that it then runs at the same speed as the processor, and it costs less to put it on the chip than to set up a bus and logic external to the processor. The hierarchy continues with what is referred to as the L3 cache. This cache used to be the L2 cache on the motherboard, but now that some processors include the L1 and L2 caches on the chip, it becomes the L3 cache. It usually runs slower than the processor, but faster than main memory.

Random-access memory (RAM) is comparatively small, but quite expensive at the same time. (The particular types of RAM used for primary storage are also volatile, i.e. they lose the information when not powered.)
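The benefit of placing small, fast caches between the registers and main memory can be quantified with the standard average memory access time (AMAT) formula. The sketch below uses purely illustrative latencies and hit rates (they are not measurements from this work) to show how even modest hit rates hide most of the main-memory latency.

    # Average memory access time (AMAT) for caches placed in front of main memory.
    # Latencies (nanoseconds) and hit rates below are illustrative values only.
    def amat(levels, memory_latency_ns):
        """levels: list of (hit_latency_ns, hit_rate) ordered from L1 outward."""
        time, reach_prob = 0.0, 1.0
        for hit_latency, hit_rate in levels:
            time += reach_prob * hit_latency          # every access that gets this far pays the lookup
            reach_prob *= (1.0 - hit_rate)            # only misses continue to the next level
        return time + reach_prob * memory_latency_ns  # remaining misses go to main memory

    print(amat([(1, 0.95), (5, 0.80)], 60))   # with L1 and L2 caches: ~1.85 ns on average
    print(amat([], 60))                        # without any cache: 60 ns per access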

Main memory is directly or indirectly connected to the central processing unit via a

memory bus. It is actually two buses: an address bus and a data bus. The CPU first sends a memory address over the address bus to indicate the desired location of the data, and then reads or writes the data itself using the data bus. Additionally, a memory management unit (MMU), a small device between the CPU and RAM, recalculates the actual memory address, for example to provide an abstraction of virtual memory or to perform other tasks.

Figure 3: Forms of storage, divided according to their distance from the CPU [19]
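As a simplified picture of what the MMU does, the following sketch splits a virtual address into a page number and an offset and rebuilds the physical address from the mapped frame. The single-level page table and the frame numbers are made up for illustration; real MMUs use multi-level tables and hardware TLBs.

    # Simplified single-level address translation, as an MMU might perform it.
    PAGE_SIZE = 4096                      # 4 KB pages
    page_table = {0: 7, 1: 3, 2: 12}      # virtual page number -> physical frame number (made up)

    def translate(virtual_address):
        page = virtual_address // PAGE_SIZE      # which virtual page the address falls in
        offset = virtual_address % PAGE_SIZE     # position inside that page is unchanged
        frame = page_table[page]                 # a miss here would be a page fault
        return frame * PAGE_SIZE + offset

    print(hex(translate(0x1ABC)))   # virtual page 1 -> frame 3, so 0x3abc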


2.3.2 Secondary Storage

Secondary storage is also known as external memory or auxiliary storage. The term

'secondary' refers to the inability of the CPU to access it directly. Data in secondary storage must first be transferred into primary storage, over the various input/output channels, before the CPU can operate on it. Secondary storage is non-volatile, which means it does not lose the data

when the device is powered down. Per unit, it is typically also two orders of

magnitude less expensive than primary storage. Consequently, modern computer

systems typically have two orders of magnitude more secondary storage than primary

storage and data is kept for a longer time there. In modern computers, hard disk drives

are commonly used as secondary storage.

2.3.3 Offline Storage

Offline storage is where removable types of storage media sit, such as tape cartridges and optical discs such as CDs and DVDs. Offline storage can be used to transfer data between systems, but it also allows data to be secured off-site so that companies always have a copy of valuable data in the event of a disaster.

2.3.4 Tertiary Storage

Tertiary storage is mainly used for backup and archival of data and, although based on the slowest devices, can be classed as the most important in terms of data protection against the variety of disasters that can affect an IT infrastructure. Most devices in this segment are automated via robotics and software to reduce management costs and the risk of human error, and consist primarily of disk- and tape-based backup devices.


Figure 4: Memory hierarchy in comparison with Cost/MB, Size & Access speed [32]

*The values are approximated for illustration

2.4 Memory Controller

The memory controller is a digital circuit which manages the flow of data going to and

from the main memory. It can be a separate chip or integrated into another chip, such as

on the die of a microprocessor. This is also called a Memory Chip Controller (MCC).

Figure 5: Memory controller hub

The memory controller scans for the type and speed of the RAM connected. It also

determines the maximum size of each individual memory module and the overall memory

capacity of the system.

Memory controllers contain the logic necessary to read, write and refresh the main

memory. Considering DRAM as an example, reading and writing are performed by presenting the row and column addresses of the target location as inputs to the multiplexer circuit; the de-multiplexer on the DRAM uses these inputs to select the correct memory location and return the data, which is then passed back through a multiplexer to consolidate the data and reduce the required bus width for the operation.

Bus width is the number of parallel lines available to communicate with the memory cells. Memory controller bus widths range from 8-bit to 64-bit. In more complex systems, memory controllers are operated in parallel, for example four 64-bit buses operating in parallel, though some are designed to operate in "gang mode", where two 64-bit memory controllers can be used to access a 128-bit memory device.
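To illustrate how a controller derives the row, bank and column addresses that are multiplexed onto the DRAM, the sketch below decodes a physical address under one hypothetical bit layout; real controllers use vendor-specific mappings and interleaving schemes, so the field widths here are assumptions for illustration only.

    # Hypothetical address mapping: | row | bank (3 bits) | column (10 bits) | byte-in-bus (3 bits) |
    COLUMN_BITS, BANK_BITS, BUS_OFFSET_BITS = 10, 3, 3   # 3 offset bits = 64-bit (8-byte) data bus

    def decode(physical_address):
        addr = physical_address >> BUS_OFFSET_BITS        # low bits select a byte within one 64-bit transfer
        column = addr & ((1 << COLUMN_BITS) - 1)
        addr >>= COLUMN_BITS
        bank = addr & ((1 << BANK_BITS) - 1)
        row = addr >> BANK_BITS
        return row, bank, column

    print(decode(0x12345678))   # -> (row, bank, column) under this hypothetical layout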

2.5 Summary

In this chapter, the different memory technologies in a computer system were introduced. The system memory hierarchy was analysed closely, which gives a much better idea of how to choose a storage technology of suitable type and size. The coming chapters give more insight into the current technological trends in secondary storage. The next chapter focuses on the current state of secondary storage, represented by magnetic disks, also called hard disk drives.


Chapter 3

3. Magnetic Disk Storage

As computing capacity increases, so does the need for secondary storage. The most important device of this class is the Hard Disk Drive (HDD), which is based on magnetic principles for permanently storing information. HDDs have a cost per byte at least two orders of magnitude lower than that of DRAM, making them suitable for storing vast amounts of data. Hence, hard disk drives are used as secondary memory in most computer systems.

In this chapter, an insight into the current state of disk storage, represented by magnetic disks, is given starting with Section 3.1.

3.1 Hard Disk Drives

The hard disk drive is by far the most common secondary storage device in use today. Having been in use for half a century, hard disk drives are today considered very mature and have seen many major improvements. Hard disk drives (HDDs) are storage devices containing one or more rotating platters made of a non-magnetic material and coated with a thin layer of magnetic material. Small sections of this material are manipulated into different magnetic states, making it possible to store data. Magnetic disks have shown a great ability to scale in capacity and continue to do so today. An internal view of a magnetic disk can be seen in Figure 6.

3.1.1 Physical layout

Hard disk drives are so called because of the rotating magnetic platters inside them, which are used for storage. The rotating platters in magnetic disks sometimes use both sides for storage. Each surface is divided into sectors and tracks; the intersection of a single sector and a single track makes up a block. As seen in Figure 7, tracks on the outer part of the disk platter are made up of more sectors. This is because the surface passes under the disk head faster there, and the larger surface area allows more data to be stored in these tracks. These different sections are called zones. To achieve a higher data capacity in a disk, several platters are mounted on a single spindle. The disk arm has a separate head for each surface and is able to access more sectors without seeking to a different track. The set of identical tracks across all surfaces is called a cylinder. Cylinders make it possible to speed up read and write operations, as the disk arm can perform operations on multiple surfaces without needing to move to a different position.

Figure 6 : Hard Disk Drive [27]

3.1.2 Working Principle

The platters are made from a non-magnetic material and are coated with a thin layer of

magnetic material. Read-and-write heads are positioned on top of the disks. The platters are

spun at very high speeds with a motor. A typical hard drive has two electric motors, one to

spin the disks and one to position the read/write head assembly. Information is examined or

altered on the platter as it rotates past the read/write heads. The read-and-write head can

detect and modify the magnetization of the ferromagnetic material immediately under it.


3.1.3 Disk access time

Disk access time in magnetic disks is made up of three different operations. The time the

different operations take will vary on position of disk head, where in the rotation the disk

surface is and physical abilities of the disk. Disks read and write data in sector-sized blocks.

The access time for a sector has three main components:

I. Seek time: To read the contents of some target sector, the arm initially positions the

head over the track that contains the target sector. The time required to move the arm

is called the seek time. The seek time, Tseek, depends on the previous position of the

head and the speed at which the arm moves across the surface. The average seek time in modern drives, Tavg seek, is measured by taking the mean of several thousand seeks to random sectors.

II. Rotational latency: this depends on rotational speed of the disk (RPM). Once the

head is in position over the track, the drive waits for the first bit of the target sector to

pass under the head. The performance of this step depends on both the position of the

surface when the head arrives at the target sector and the rotational speed of the disk.

In the worst case, the head just misses the target sector and waits for the disk to make

a full rotation. Thus, the maximum rotational latency is given by

TMax Rotation = (1/RPM) x (60 s / 1 min)

Figure 7 : Representations of sectors, blocks and tracks on platter surface [27]


III. Transfer time : When the first bit of the target sector is under the head, the drive can

begin to read or write the contents of the sector. The transfer time for one sector

depends on the rotational speed and the number of sectors per track. Thus, the

average transfer time for one sector can be roughly estimated as

TAvg Transfer = (1/RPM) x (1/(average number of sectors per track)) x (60 s / 1 min)

With these characteristics, the seek time and rotational delay become a significant part of a random read or write operation. For sequential operations, the disk is able to work on entire tracks/cylinders at a time, continuing with neighbouring tracks/cylinders. Sequential reads, because of the short physical distance between the locations of the data, minimize the time spent on seeks, resulting in an overall lower access time for the data.
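Putting the three components together, the short Python sketch below estimates the average access time for one random sector read from the RPM, the average seek time and the average number of sectors per track. The drive parameters are illustrative assumptions, not measurements from the benchmark chapters.

    # Average time to service one random sector read (all times in milliseconds).
    def avg_access_time_ms(rpm, avg_seek_ms, avg_sectors_per_track):
        t_max_rotation = 60.0 / rpm * 1000            # one full revolution, in ms
        t_avg_rotation = 0.5 * t_max_rotation         # on average the head waits half a revolution
        t_avg_transfer = t_max_rotation / avg_sectors_per_track
        return avg_seek_ms + t_avg_rotation + t_avg_transfer

    # Illustrative 7200 rpm desktop drive: ~9 ms seek, ~400 sectors per track.
    print(round(avg_access_time_ms(7200, 9.0, 400), 2))   # ~13.19 ms, dominated by seek and rotation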

3.1.4 Addressing

The location of a specific sector is referenced using its cylinder number, head number and

sector number (this addressing scheme is often abbreviated to CHS). Indeed, the total number

of sectors on the drive could be calculated by multiplying the number of cylinders by the

number of read/write heads, and then multiplying the result by the number of sectors per

track. Since the introduction of zoned bit recording (as mentioned above, this is a drive

geometry in which the number of sectors per track is smaller at the centre of the disk) this

calculation can no longer be used. The way in which sectors are addressed has also become

more abstract, relieving the operating system software of the need to know about physical

drive geometry. Note that sectors that are logically sequential are not necessarily physically

contiguous. After reading a sector, there may be a small delay before the drive controller is

ready to read another sector. Sectors that are logically sequential may therefore be spaced at

discrete intervals on the disk to give the drive controller time to get ready to read the next

sector - a technique known as interleaving. If an interleave factor of 3:1 were used for

example, it would take three full rotations for the controller to read all of the sectors on a

single track. Thanks to advances in technology, most modern hard drives do not need to use

interleaving.

Modern hard disk drives use logical block addressing (LBA), a simple linear

addressing scheme in which each sector is given an integer index number, starting with 0.

The drive controller translates each logical block address into a cylinder, head and sector number in order to obtain the physical location of the sector on disk. The maximum number

of sectors that can be addressed is dependent on the number of bits used for the logical block

address.
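The relationship between CHS and LBA addressing described above can be captured in a few lines. The sketch below uses an invented, small logical geometry purely for illustration; real drives report a logical geometry that no longer matches the physical zoned layout.

    # Conversion between CHS (cylinder, head, sector) and LBA for a fixed logical geometry.
    HEADS, SECTORS_PER_TRACK = 16, 63          # invented logical geometry

    def chs_to_lba(cylinder, head, sector):
        # Sectors are numbered from 1 within a track, hence the "- 1".
        return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

    def lba_to_chs(lba):
        cylinder, remainder = divmod(lba, HEADS * SECTORS_PER_TRACK)
        head, sector_index = divmod(remainder, SECTORS_PER_TRACK)
        return cylinder, head, sector_index + 1

    lba = chs_to_lba(2, 5, 10)
    print(lba, lba_to_chs(lba))    # round-trips back to (2, 5, 10)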

3.2 Hard disk drive system architecture

The system-level design of a hard disk drive is characterized by the use of a few highly integrated chips; their interconnection is represented as a block diagram in Figure 8.

As can be seen in the figure, the whole layout is based upon the chips below:

System controller chip including the read/write channel, disk controller and RISC

control processor (microcontroller),

Flash ROM chip containing drive firmware,

RAM chip used as a cache buffer.

Figure 8 : Representation of Hard Disk Drive as blocks

The disk controller is the most complicated drive component; it determines the speed of data exchange between the HDD and the host.


The disk controller has four ports, used for connection to the host, the microcontroller, the buffer RAM and the data exchange channel between it and the head disk assembly. The disk controller is an automatic device driven by the microcontroller; from the host side, only the standard task-file registers are accessible. The disk controller is programmed by the microcontroller at the initialization stage; during this procedure it sets up the data encoding methods, selects the polynomial used for error correction, defines flexible or hard partitioning into sectors, and so on.

The buffer manager is a functional part of the disk controller governing the operation of the buffer RAM, referred to as the cache. The capacity of the latter ranges in modern HDDs from 512 KB to 16 MB. The buffer manager splits the whole buffer RAM into separate sectioned buffers. Special registers accessible from the microcontroller contain the start addresses of those sectioned buffers. While the host exchanges data with one of the buffers, the read/write channel can exchange data with another buffer section. Thus the system achieves multi-sequencing of the processes of reading/writing data from/to the disk and exchanging data with the host.

3.2.1 Hard disk drive cache

Hard disk drives contain an integrated cache, also often called a buffer. The purpose of this

cache is not dissimilar to other caches used in the PC, even though it is not normally thought

of as part of the regular PC cache hierarchy. The function of the cache is to act as a buffer

between a relatively fast device and a relatively slow one. For hard disks, the cache is used to

hold the results of recent reads from the disk and also to 'pre-fetch' information that is likely

to be requested in the near future, for example the sector or sectors immediately after the one just requested.

Figure 9 : Role of Cache buffer in Hard disk


The basic principle behind the operation of a simple cache is straightforward. Reading data

from the hard disk is generally done in blocks of various sizes, not just one 512-byte sector at

a time. The cache is broken into segments, or pieces, each of which can contain one block of

data. When a request is made for data from the hard disk, the cache circuitry is first queried to

see if the data is present in any of the segments of the cache. If it is present, it is supplied to

the logic board without access to the hard disk's platters being necessary. If the data is not in

the cache, it is read from the hard disk, supplied to the controller, and then placed into the

cache in the event that it gets asked for again. Since the cache is limited in size, there are only

so many pieces of data that can be held before the segments must be recycled. Typically the

oldest piece of data is replaced with the newest one. This is called circular, first-in, first-out

(FIFO) or wrap-around caching.

The use of a cache improves the performance of any hard disk by reducing the number of physical accesses to the disk on repeated reads and by allowing data to stream from the disk uninterrupted when the bus is busy.
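A minimal sketch of the wrap-around (FIFO) caching scheme described above follows; block numbers stand in for whole read blocks, and the "read from platter" branch is just a counter so the effect on physical accesses can be seen. This is an illustration of the principle, not the firmware of any particular drive.

    from collections import OrderedDict

    # FIFO ("wrap-around") read cache with a fixed number of segments.
    class DiskCache:
        def __init__(self, segments=8):
            self.segments = segments
            self.cache = OrderedDict()       # block number -> data, oldest entry first
            self.platter_reads = 0

        def read_block(self, block):
            if block in self.cache:
                return self.cache[block]     # served from cache, no mechanical access
            self.platter_reads += 1          # cache miss: the platter must be accessed
            data = f"<data of block {block}>"
            if len(self.cache) >= self.segments:
                self.cache.popitem(last=False)   # recycle the oldest segment (FIFO)
            self.cache[block] = data
            return data

    cache = DiskCache(segments=2)
    for block in [7, 7, 7, 8, 7, 9, 7]:
        cache.read_block(block)
    print(cache.platter_reads)    # 4 physical reads instead of 7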

3.3 Hard Disk Drive Interfaces

The host interface, also called the drive interface, defines the characteristics of the electronic interface between the disk drive and the computer. The type of interface used will to a great

extent depend on the purpose for which the computer is to be used, and the type of

interface(s) supported by the system motherboard. A number of different interfaces have been

developed over the years, some of which are described below.

3.3.1 Advanced Technology Attachment (ATA)

ATA has in the past been somewhat incorrectly referred to as Integrated Drive Electronics

(IDE) and has been retrospectively renamed as Parallel ATA (PATA) to distinguish it from

the more recent Serial ATA (SATA) interface. The use of the popular IDE misnomer comes

from the fact that this interface was the first in widespread use to have the drive controller

built into the drive itself. Previously, the drive controller was a separate add-on card that

occupied one of the ISA slots on the computer's motherboard. The drive was connected to the

motherboard using a 40 or 80-conductor ribbon cable that connected a 40-pin socket on the

drive itself to a similar socket on the motherboard and transferred sixteen bits of data in


parallel. Each ribbon cable could connect two ATA drives in a master-slave configuration.

Enhanced IDE, introduced in anticipation of changes to the ATA standard, allowed the use of

direct memory access (DMA) which meant that data could be transferred directly between the

disk and memory without involving the CPU in the data transfer process. This freed up the

CPU for other tasks.

Figure 10: Typical IDE/ATA ribbon cable and its socket on a motherboard [28]

3.3.2 Small Computer System Interface (SCSI)

SCSI disk and tape drives were standard fare on servers and high-performance workstations

and despite advances in ATA technology can still be found in many high-performance server

applications. SCSI can be used to connect a wide range of devices, and the SCSI standard

defines command sets for many specific types of peripheral device. The SCSI interface

allows a maximum of either 8 or 16 peripheral devices to connect to the host computer via a

shared parallel bus.

Servers typically employ RAID drives in which multiple disks are connected to a SCSI RAID

controller card via a SCSI backplane inside a disk enclosure. The connection between the

backplane and the controller card will typically be a 68 or 80-conductor single drop ribbon

cable. Multiple non-RAID devices could also be connected to a SCSI controller card using

multi-drop cables. SCSI drives have not been widely adopted for personal computers due to

their cost, and the availability of relatively inexpensive ATA drives that provide perfectly

adequate performance for most desktop computing environments. SCSI controller cards are

nonetheless still available for personal computers, and can be mounted in a standard PCI-X or


PCI-E expansion slot. Parallel SCSI has largely been superseded in server and mass storage

applications by Fibre Channel (FC) or Serially Attached SCSI (SAS), both of which use a

high-speed serial interface.

Figure 11: A single-drop 68-conductor SCSI ribbon cable [28]

3.3.3 Serial Advanced Technology Attachment (SATA)

SATA is the successor to Parallel ATA. One of the most obvious differences is the use of a

high-speed serial signal cable instead of the parallel ribbon cable used for ATA drives. It has

two pairs of wires for carrying data and 3 ground wires, giving a total of seven wires. The

cable is cheaper and less bulky than its PATA counterpart, allowing a better flow of air

within the system case and making it easier to install. A SATA signal cable connects a single

drive to a SATA socket on the motherboard - there is no master/slave arrangement. SATA

drives use a 15-pin power connector rather than the 4-pin Molex power connectors used for

PATA drives, although adapters are available to enable a SATA drive to be connected to a

power supply via a 4-pin Molex power cable should the need arise.

The first version of the SATA standard is officially designated as Serial ATA International

Organization: Serial ATA Revision 1.0 (the technology itself should be referred to as SATA

1.5 Gbps) and specifies a gross transfer rate of 1.5 gigabits per second. Taking the 8b/10b encoding into account, this equates to 1.2 gigabits (150 megabytes) of usable data per second. Subsequent revisions have

doubled and redoubled the transfer rates. Revision 2.0 (SATA 3.0 Gbps) is capable of a gross

transfer rate of 3.0 gigabits per second, and Revision 3.0 (SATA 6.0 Gbps) has a gross

transfer rate of 6.0 gigabits per second. As of 2010, most installed hard drives and PC

chipsets implement SATA 3.0 Gb/s, although SATA 6.0 Gbps products are now becoming

available (the Version 3.0 standard was released in May 2010). Most motherboards produced


since 2003 have integrated SATA controllers (although an add-on controller card can be

installed in a PCI or PCI-E slot). The SATA controller can use the Advanced Host Controller

Interface (AHCI) in order to take advantage of advanced features such as the hot-swapping of

drives, provided both the motherboard and operating system support AHCI. If not, SATA

controllers are capable of operating in "IDE emulation" mode.
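The relation between the SATA line rate and the effective payload rate quoted above can be checked with a small calculation. The following C sketch is purely illustrative (not part of the SATA specification or any driver); it simply assumes 8b/10b encoding, i.e. ten line bits per payload byte.

/* Illustrative sketch: effective SATA payload rate, assuming 8b/10b encoding
 * (every byte of payload costs 10 bits on the wire). */
#include <stdio.h>

static double sata_effective_mb_per_s(double line_rate_gbps)
{
    double bytes_per_s = (line_rate_gbps * 1e9) / 10.0;  /* 10 line bits per byte */
    return bytes_per_s / 1e6;                            /* megabytes per second  */
}

int main(void)
{
    printf("SATA 1.5 Gbps -> %.0f MB/s effective\n", sata_effective_mb_per_s(1.5));
    printf("SATA 3.0 Gbps -> %.0f MB/s effective\n", sata_effective_mb_per_s(3.0));
    printf("SATA 6.0 Gbps -> %.0f MB/s effective\n", sata_effective_mb_per_s(6.0));
    return 0;
}

Running this reproduces the 150 MB/s figure for SATA 1.5 Gbps and, correspondingly, 300 MB/s and 600 MB/s for the later revisions.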

Figure 12: Close-ups of SATA cable and its slots on a motherboard [28]

3.4 External Hard Disk Drives

External hard disk drives are generally standard ATA, SCSI or SATA hard disk drives

mounted in a suitable portable disk enclosure. The drive can be connected to a computer via a

Universal Serial Bus (USB) or Firewire port, or in the case of SATA drives via an eSATA

(external SATA) or eSATAp (power over eSATA) interface. If an eSATA or eSATAp port is

not available on the system, one can usually be added using a PCI add-on card. The use of an

eSATA interface has the advantage that data transfer rates are generally faster than for

contemporary versions of either USB or Firewire. Having said that, a future iteration of

Firewire is predicted to be able to achieve a data transfer rate of 6.4 Gbps, which will be

slightly faster than the SATA 6.0 Gbps version of eSATA, while USB 3.0 will not be far

behind with a data transfer rate of 4.8 Gbps. Unlike USB or Firewire however, eSATA allows

low-level drive features such as SMART (Self-Monitoring, Analysis, and Reporting

Technology) to be accessed by the host. Unlike Firewire, neither USB 2.0 nor eSATA is

capable of providing the 12V power supply required by some 3.5" external hard disk drives

(such as the 1TB Seagate external drive pictured below), which means they need a separate


power supply. The introduction of eSATAp is intended to resolve this issue, while USB 3.0

will reportedly be able to provide voltages of 5V, 12V or 24V. At the time of writing, the

storage capacity of a typical external hard drive can range from a few hundred gigabytes up

to 4 terabytes.

Figure 13: A Seagate 1TB external hard drive [28]

To keep pace with these fast-growing interface technologies, the data access time of the drive itself must fall

considerably, which is possible either by increasing the rotational speed of the platters or by

enlarging the cache size to hide the latency.

3.5 Future of Hard Disk Drives

Magnetic disks have followed Moore’s Law during the last decades, doubling in capacity

roughly every 12 months. As well as capacity, bandwidth has also followed this trend.

Latency does, however, improve with a smaller factor, making random seeks more and more

expensive [13]. Continuing this trend requires either rethinking the way magnetic disks are used

or moving to an alternative storage solution.

Considering the future of magnetic disk storage technology, there are a few bottlenecks that limit

this trend, the chief among them being the rotational speed of the platters (RPM).

Disk RPM is a critical component of hard disk drive performance, as it directly impacts the

latency and the data transfer rate of the disk. The faster the disk spins, the more data the

read/write head can access per unit time; the slower the RPM, the higher the mechanical latencies.


A Fujitsu white paper, Trends in Enterprise Hard Disk Drives [10], states:

"Ultrahigh-speed HDDs rotating at speeds exceeding 20,000 rpm have also been researched

but not commercialized due to heat generation, power consumption, noise, vibration and

other problems in characteristics, and a lack of long term reliability."

Companies have tried ingenious designs to reduce the excessive heat produced by a high

spin rate. Generally, the physical platters of a standard 3.5 inch hard disk have an

approximate diameter of 3 inches. In some high-RPM designs, however, such as the Pegasus II,

the platter diameter has been further reduced to 2.5 inches.

The smaller platters cause less air friction and therefore reduce the amount of heat generated

by the drive. In addition, the actual drive chassis is one big heat fin, which also helps

dissipate the heat. The disadvantage is that, since the platters are smaller, they have less

data capacity. This can be overcome by stacking more platters, but the

height of the drive then increases.

To get higher data rates from HDDs, manufacturers can:

Spin the disks faster, but at 20,000 RPM enterprise-class HDD platters are already under

severe mechanical stress.

Increase the number of read/write heads that can be active simultaneously, which

constitutes a radical, substantial, and costly architectural and electronic change to HDD

design.

Add a second servo actuator with another set of read/write heads and another set of

read/write electronics, which is completely out of the question from an economic

perspective.

Figure 14: Moving Parts in Hard Disk Drives [29]


Taken together, these trends suggest that what customers of large multi-user servers

really want is faster disk drives with lower power consumption, and that is becoming ever

harder to deliver with hard disk technology.


Chapter 4

4 . Solid State Drives

In the past few years flash memory has become more and more important. Flash memory has been

used in small amounts for years in mobile devices such as mobile phones, digital cameras, USB

memory sticks and MP3 players. But as the price of flash memory decreases rapidly and the

storage density of flash memory chips grows, it becomes feasible to use flash memory even in

notebooks, desktop computers and servers. It is now possible to combine an array of individual

flash chips into a single device whose capacity is sufficient for main storage. Such a device is

called a Solid State Drive (SSD).

Solid State Drives are increasingly common in small form factor computing like notebooks;

but SSDs are also used in the desktop and enterprise server space by those looking to leverage

the speed of an SSD for maximum performance. While solid state drives have several

benefits, including speed, longevity and practically no noise output, they are not always the

best choice as hard drives still dominate in both capacity and cost.

4.1 Flash Market Development

The flash memory market is changing: the density of NAND flash is increasing

drastically. Even as flash memory chips become physically smaller, their capacity doubles

approximately every year.

This development leads to a widespread usage of flash memory. Today solid state drives are

mainly used in notebooks. In the future they might even be used in server architectures as the

standard configuration.

Another interesting development is the cost of flash memory. The price of flash memory is

rapidly dropping. Every month flash memory devices and flash memory storage cards get

cheaper, and new products with larger capacity emerge on the market.


Figure 15: Evolution in density of NAND flash memory

4.2 Solid State Drives

Solid state drives do not need any mechanical parts. They are fully electronic devices and use

solid-state memory to store data persistently. Two different types of storage chip are used: flash

memory or SDRAM chips. This thesis considers only flash-memory-based solid state drives,

which are the type mostly used today.


Figure 16: HDD and SSD [30]

Flash memory is the cornerstone of the Solid State Drive. With the increasing use of flash-

based secondary storage, a detailed understanding of the flash behaviour that affects operating

system design and performance becomes important.

This chapter first provides detailed information about flash memory; the internal parts of a

solid state drive are then discussed in the following sections. Section 4.4 describes

the flash translation layer and the techniques which ensure the functionality of the solid state

drive.

4.2.1 FLASH MEMORY

Flash memory is a specific type of EEPROM that can be electrically erased and programmed

in blocks. Flash memory is non-volatile memory. There are two different types of flash

memory cells:

NOR flash memory cells.

NAND flash memory cells.

In the early days of flash memory, NOR flash was often used. It can be addressed by

the processor directly and is handy for small amounts of storage.


Figure 17: NAND flash memory chip [30]

Today, NAND flash memory is used to store the data. It offers a much higher density, which

is more suitable for large amounts of data, its cost is lower, and its endurance is much longer

than that of NOR flash. NAND flash can only be addressed at the page level. Flash memory

comes with either single-level cells (SLC) or multi-level cells (MLC). The difference between the

two cell models is that an SLC cell can store only 1 bit (1 or 0), whereas an MLC cell can store

multiple bits (e.g. 00, 01, 10 or 11). Internally these values are managed by holding different

voltage levels; both cell types are otherwise similar in design. MLC flash devices cost less per bit

and allow a higher storage density, which is why MLC cells are used in most mass-produced

products. SLC flash devices provide faster write performance and greater reliability, and SLC

cells are therefore usually used in high-performance storage solutions. Table 4-1 compares the

two cell models.

                              SLC     MLC
High density                          X
Low cost per bit                      X
Endurance                      X
Operating temperature range    X
Low power consumption          X
Write/erase speeds             X
Write/erase endurance          X

Table 4-1: SLC vs MLC [9]
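The relationship between the number of bits stored per cell and the number of voltage levels the cell must distinguish can be made explicit with a short sketch. The following C fragment is illustrative only and is not taken from any flash datasheet.

/* Illustrative sketch: a cell storing n bits must reliably hold and
 * distinguish 2^n voltage levels. */
#include <stdio.h>

int main(void)
{
    for (int bits_per_cell = 1; bits_per_cell <= 2; bits_per_cell++) {
        int levels = 1 << bits_per_cell;              /* 2^n voltage levels */
        printf("%d bit(s) per cell -> %d voltage levels (%s)\n",
               bits_per_cell, levels, bits_per_cell == 1 ? "SLC" : "MLC");
    }
    return 0;
}

The narrower spacing between MLC voltage levels is one reason why MLC cells are slower to program and less durable than SLC cells, as summarized in Table 4-1.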


Flash memory only allows two possible states:

erased

programmed

When a flash memory cell is in the erased state, its bits are all set to zero (or one,

depending on the flash device). Only when a flash cell is in the erased state can the controller

write to that cell; in this example that means the 0 can be set to 1. The cell is then

programmed and effectively frozen. It is not possible to simply change the 1 back to a 0 and

write again: the flash memory cell has to be erased first (Figure 18). Worse still, it is not

possible to erase just a small group of cells. The erase operation has to be performed on a

much larger scale; it can only be done at the granularity of erase units, which are, for example, 512 KB.

If the amount of data being written is small compared to the erase unit, then a

correspondingly large penalty is incurred in writing the data. The flash memory is divided into

blocks, and the smallest erasable block is called an erase unit. If

the position of the written data overlaps two blocks, then both blocks have to be erased.

However, this erase operation need not necessarily be executed right before or after the write:

the controller of the device may simply choose a new block for the write request and update

the internal addressing map.

Figure 18: Flash memory overwrite mechanism


4.2.1.1 Flash Structure

NAND Flash memory is organized into blocks where each block consists of a fixed number

of pages. Each page stores data and corresponding metadata and error correction code (ECC)

information. A single page is the smallest read and write unit. The internal structure of Flash

memory is rarely identical from chip to chip. As the technology has matured over the years,

many smaller architectural changes have been made. There are, however, a few fundamentals

for how Flash memory is constructed. Each chip will have a large number of storage cells. To

be able to store data, these will be arranged into rows and columns [1]. This is called the

Flash array. The Flash array is connected to a data register and a cache register (Figure 19).

These registers are used when reading or writing data to or from the Flash array. By having a

cache register in addition to a data register, the flash memory bank can internally start

processing the next request while data for the current one is still being read or written.

Figure 19 : A generic overview of a Flash memory bank [5]

4.2.1.2 Page

Pages in a Flash array are the smallest unit any higher level of abstraction will be working on.

The size of a page may vary, depending on the specifics of the physical structure, but is

typically 4 KB [6, 5]. With 128 pages per block, the next larger unit in the flash

memory hierarchy is the erase unit of 512 KB; this can vary from drive to drive. In addition,

each page also has an allotted space for Error-Correction Code (ECC). During a read


operation, all the data from the page will be transferred to the data register. In a similar way,

write operations to a page will write all data in the data register to the cells within a page.

Recall that, when writing, flash cells support only two states: a cell can be in a

neutral or a negative state. When writing data to a page, it is only possible to change cells from the

neutral (logical one) to the negative (logical zero) state, meaning that to change a bit from

zero back to one the whole page has to be reset, which in practice happens at erase-block

granularity. Furthermore, flash chips can be grouped together in

so-called planes to increase storage capacity, and multiple planes can be accessed in parallel to

enhance data throughput [12].

4.2.1.3 Erase Block

When resetting cell states with field emission, multiple pages are affected by the reset.

This group of pages is called an erase block. A typical erase block contains 128 pages

[7], but the number can differ, depending on how the flash cells are structured. Given a page size of

4 KB, an erase block would then be 512 KB in size. This means that changing the content of any of

the pages within the erase block requires rewriting all 512 KB. For this simple reason, in-

place writes are not possible in flash memory.
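The geometry just described can be summarized in a few constants. The sketch below is illustrative only; it uses the example figures from the text (4 KB pages, 128 pages per erase block), which vary between real drives.

/* Minimal sketch of the erase-block geometry described above, using the
 * example values from the text (real drives may differ). */
#include <stdio.h>

#define PAGE_SIZE_KB     4     /* smallest read/write unit            */
#define PAGES_PER_BLOCK  128   /* pages grouped into one erase unit   */

int main(void)
{
    int erase_block_kb = PAGE_SIZE_KB * PAGES_PER_BLOCK;   /* 512 KB */
    /* An in-place update of even a single 4 KB page would force the whole
     * 512 KB erase block to be read, erased and reprogrammed. */
    printf("Erase block: %d KB; in-place update of one page touches %d KB\n",
           erase_block_kb, erase_block_kb);
    return 0;
}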

4.2.1.4 Cell degradation

Each time a flash cell is erased, the stress on the cell from the field emission

contributes to cell degradation [1]. Modern flash memory banks are usually rated for

approximately 10^5 (100,000) erase cycles, but to be able to handle a small number of faulty cells, each

page is fitted with ECC data.

4.3 Physical layout

While flash memory is the cornerstone of the solid state drive, data must pass through

several other SSD components before it reaches the flash memory. An SSD does

not actually have many unique parts, and the differentiation in SSDs from different

manufacturers often happens in the controller and firmware more than anything else.

There is little information released by hardware manufacturers about drive layout and how

data is organized. To illustrate this, look at the entirety of what the Intel® X25-E datasheet

has to say about its architecture.


Figure 20 : Components of SSD

The Intel® X25-E SATA Solid State Drives utilize a cost effective System on Chip (SOC)

design to manage a full SATA 3Gbps bandwidth with the host while managing multiple flash

memory devices on multiple channels internally [2].

The structure of the flash memory banks shown in Figure 19 gives a general idea

of what to expect, but only for a simple read/write operation. As seen in the block diagram in

Figure 23, an SSD connects several flash memory banks together under a Flash Controller (FC).

In a single SSD there are usually multiple FCs, which are commonly called channels. As

implied by the name, each channel can process requests independently, giving SSDs

the ability to internally process a number of operations in parallel.


Figure 21: Organization of a conventional SSD

4.4 Flash Translation Layer (FTL)

In order to alleviate the "erase-before-write" problem in flash memory, most flash memory

storage devices are equipped with a software or firmware layer called Flash Translation

Layer (FTL) [11]. An FTL makes a flash memory storage device look like a hard disk drive

to the upper layers. One key role of an FTL is to redirect each logical page write from the

host to a clean flash memory page which has been erased, and to remap the logical page

address from an old physical page to a new physical page. In addition to this address

mapping, an FTL is also responsible for data consistency and uniform wear-leveling. The

concept of the FTL is implemented by the controller of the solid state drive. The layer tries to

efficiently manage the read and write access to the underlying flash memory chips. It hides

all the details from the user. So when writing to the solid state drive the user does not have to

worry about free blocks and the erase operation. All the managing is done internally by the

FTL. It provides a mechanism to ensure that writes are distributed uniformly across the

media. This process is called wear-leveling and prevents flash memory cells from wearing

out.

4.4.1 Controller

The controller of a solid state drive manages all the internal processes to make the FTL work.

It contains a mapping table that performs the logical-to-physical mapping. The logical address

in a request is mapped to the physical address that points to the flash memory

block where the data is actually stored. Whenever a read or write request arrives at the solid

state drive, the logical block address (LBA) first has to be translated into the physical block

address (PBA) (Figure 22). The LBA is the block address used by the operating system to


read or write a block of data on the flash drive. The PBA is the physical address of a block of

data on the flash drive. Note that over time the PBA corresponding to one and the same LBA

can change often.
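The behaviour described above can be illustrated with a small sketch. The following C fragment is a purely conceptual model (not the firmware of the X25-E, the C300 or any other drive): it keeps an LBA-to-PBA table, redirects every logical write to a clean physical block, and shows how the same LBA comes to point to a different physical block over time.

/* Conceptual FTL sketch: LBA->PBA remapping on every write (illustrative only). */
#include <stdio.h>

#define NUM_LOGICAL_BLOCKS  8
#define NUM_PHYSICAL_BLOCKS 16
#define UNMAPPED            -1

static int map[NUM_LOGICAL_BLOCKS];   /* LBA -> PBA translation table */
static int next_free_pba = 0;         /* naive free-block allocator   */

static void ftl_write(int lba)
{
    if (next_free_pba >= NUM_PHYSICAL_BLOCKS) {
        printf("no clean blocks left: garbage collection needed\n");
        return;
    }
    int old_pba = map[lba];
    map[lba] = next_free_pba++;        /* redirect write to a clean block */
    if (old_pba != UNMAPPED)
        printf("LBA %d: PBA %d becomes invalid, ", lba, old_pba);
    printf("LBA %d now maps to PBA %d\n", lba, map[lba]);
}

int main(void)
{
    for (int i = 0; i < NUM_LOGICAL_BLOCKS; i++)
        map[i] = UNMAPPED;
    ftl_write(3);   /* first write of LBA 3         */
    ftl_write(3);   /* overwrite: same LBA, new PBA */
    return 0;
}

The invalidated physical blocks accumulate until garbage collection reclaims them, which is exactly the task discussed in section 4.4.2.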

Figure 22 : Address translation in solid state drive [8]

The controller handles the wear-leveling process (see section 4.4.2). When a write request

arrives at the solid state drive, a free block is selected, the data is written, and the address

translation table is updated. Internally, the old block does not have to be erased immediately:

the controller may postpone the erasure and perform a kind of garbage collection

when the number of free blocks falls below a certain limit, or wait

until the drive is idle. Data structures are of course needed to maintain a free block list

and to track the used blocks. Each flash memory block contains a small amount of overhead memory where

metadata can be stored to help manage these structures; for example, a counter records how

many times the block has already been erased.

Like conventional hard disk drives, SSDs usually have an internal DRAM cache to buffer

write requests or store prefetched pages. This buffer enables solid-state disks to backup and

restore pages during erase cycles and to keep in-memory information, e.g., page-mapping

structures. Using a larger DRAM cache and adding more intelligent techniques for

organizing requests can make a huge difference. By using an FTL, it is possible to avoid most

drawbacks of flash chips while making use of their advantages. The FTL is therefore a major

performance-critical part of every SSD, and for such an SSD controller one can think of many

optimizations.


Figure 23 : Internal structure of solid state drive [6]

Pre-fetching data when sequential read patterns occur (much as a conventional hard disk drive

fills its buffer) might speed up the reading process. A controller can also write

to different flash chips in parallel (Figure 23). Since all the parts are purely electronic,

parallelization is not very hard to add: flash memory can be seen as

many memory cells arranged in parallel. With parallelization, the I/O requests, the

erase process and the internal maintenance of the data structures become more complicated, but

much higher performance can be achieved. One could even think of constructing an SSD

that internally combines several drives in a RAID configuration.

4.4.2 Garbage collection and Wear-leveling

Garbage collection and wear-leveling are other important tasks of the FTL. Garbage

collection is needed because blocks must be erased before they are used. The garbage

collector works by scanning the SSD blocks for invalid pages, then reclaiming those invalid

pages. Wear-leveling is necessary because most workloads write to a subset of blocks

frequently, while rarely writing to other blocks. Because each block of flash memory sustains

only a limited number of program/erase cycles before it is worn out, without wear-leveling the frequently

written blocks would wear out well before the other blocks. Wear-leveling helps

solve this problem by shuffling cold (unused or less frequently used) blocks with hot (frequently

used) blocks to balance out the number of writes over all of the flash memory.
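A very simplified victim-selection policy of this kind is sketched below. It is not taken from any particular drive's firmware; the scoring rule and the example numbers are assumptions chosen only to show how reclaimable pages and wear can be traded off against each other.

/* Illustrative garbage-collection / wear-leveling sketch (not a real FTL):
 * pick a victim block by combining reclaimable pages with past wear. */
#include <stdio.h>

#define NUM_BLOCKS 4

struct block {
    int invalid_pages;   /* pages reclaimed by erasing this block          */
    int erase_count;     /* how many times the block was erased already    */
};

/* Higher score = more attractive victim: many reclaimable pages,
 * penalized if the block is already heavily worn. */
static int gc_score(const struct block *b)
{
    return b->invalid_pages - b->erase_count;
}

int main(void)
{
    struct block blocks[NUM_BLOCKS] = {
        { 100, 900 }, { 120, 200 }, { 40, 10 }, { 90, 300 }
    };
    int victim = 0;
    for (int i = 1; i < NUM_BLOCKS; i++)
        if (gc_score(&blocks[i]) > gc_score(&blocks[victim]))
            victim = i;
    printf("Erase block %d (%d invalid pages, erased %d times before)\n",
           victim, blocks[victim].invalid_pages, blocks[victim].erase_count);
    return 0;
}

Real controllers use far more elaborate policies, but the principle, balancing reclaimed space against accumulated wear, is the same.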


4.4.3 Write Amplification

As seen in the earlier sections, SSDs need workarounds to emulate in-place writing of data.

That is, changing a few bytes of data will require either moving the entire erase block or

rewriting the entire erase block.

Write amplification is a measure of the number of bytes actually written when writing a

certain number of bytes. For example, if you write a 4K file, on average, the drive may write

40K bytes worth of data. This comes back to the flash characteristics. At some point, you will

need to combine data from several partially used blocks to free up pages for new data to be

written. Write amplification has an impact on the life of a drive. One effective way to

measure drive lifetime is to measure how many bytes can be written to the drive over its

lifetime.
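The 4 KB / 40 KB example above corresponds to a write amplification factor of 10, which directly shrinks the amount of host data the drive can absorb over its lifetime. The short worked example below uses hypothetical figures (a 32 GB drive rated for 100,000 erase cycles per block) purely for illustration.

/* Worked example: write amplification and its effect on drive lifetime.
 * The 32 GB capacity and 100,000-cycle rating are hypothetical values. */
#include <stdio.h>

int main(void)
{
    double host_bytes = 4.0 * 1024;              /* 4 KB written by the host   */
    double nand_bytes = 40.0 * 1024;             /* 40 KB written to the flash */
    double wa         = nand_bytes / host_bytes; /* write amplification = 10   */

    double raw_endurance_gb  = 32.0 * 100000.0;  /* total NAND writes (GB)     */
    double host_endurance_gb = raw_endurance_gb / wa;

    printf("Write amplification: %.1f\n", wa);
    printf("Host data writable over drive lifetime: ~%.0f GB\n", host_endurance_gb);
    return 0;
}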

4.4.4 Error correction

Knowing that a cell will lose its ability to properly store data after a certain number of

writes, the SSD controller needs to be able to handle erroneous pages in a graceful manner.

To detect errors, each page has an allotted space for ECC, which makes it possible to check the

consistency of the stored data. The ECC can correct a given number of

damaged cells, but at some point the amount of corruption becomes uncorrectable. Such a page is

then marked as invalid and is no longer used by the FTL.

4.4.5 Trim

Trim is a command by which the operating system tells the drive that a page is no longer valid.

This helps to reduce write amplification because stale pages are not copied. There will also

When it is time to consolidate blocks to free up space, the SSD must copy all of the data it

considers valid to a new block before it can erase the current block. Without trim, the SSD

does not know a page is invalid unless the LBA associated with it has been rewritten.
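The effect of trim on block consolidation can be modelled with a few lines of code. The sketch below is conceptual only: it simply marks trimmed pages as invalid in an FTL-style page-state table and counts how many pages would still need to be copied before the block can be erased.

/* Conceptual sketch: trim marks pages invalid, so garbage collection
 * no longer has to copy them before erasing the block. */
#include <stdio.h>

#define PAGES_PER_BLOCK 8

enum page_state { FREE, VALID, INVALID };

static enum page_state block[PAGES_PER_BLOCK] =
    { VALID, VALID, VALID, VALID, VALID, VALID, VALID, VALID };

/* The OS deleted the file occupying pages 2..5 and issues a trim for them. */
static void trim(int first, int last)
{
    for (int p = first; p <= last; p++)
        block[p] = INVALID;            /* no longer copied during GC */
}

int main(void)
{
    trim(2, 5);
    int to_copy = 0;
    for (int p = 0; p < PAGES_PER_BLOCK; p++)
        if (block[p] == VALID)
            to_copy++;                 /* only still-valid pages are relocated */
    printf("Pages to copy before erasing this block: %d of %d\n",
           to_copy, PAGES_PER_BLOCK);
    return 0;
}

Without trim, all eight pages would look valid to the drive and would have to be copied, even though half of them belong to a deleted file.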


4.5 Solid State Drive Interfaces

Solid state drives are available with a variety of system interfaces, chosen primarily according to the

performance requirements for the SSD in the system. Also, since SSDs are generally used

alongside, or interchangeably with, magnetic disk drives, a common mass storage bus

interface is used in most cases. This also allows the system software to manage both drive

types in a similar way, making system integration nearly plug-and-play. There are also

interfaces that were initially designed for other purposes but have been adopted by SSDs in some cases.

Generally, SSDs support SATA, Serial Attached SCSI and ATA/IDE, just like HDDs, but SSDs

also support the latest revised versions of these standards. To meet higher demands, SSDs

with PCI Express interfaces are used. In many server applications, the bandwidth of PCI Express

is only reached by HDDs when they are used in RAID mode. With a PCI Express SSD, in contrast,

the flash chips are placed directly on the PCI Express card, enabling better performance to be

extracted from the flash chips than from HDDs used in RAID through other interfaces.

Figure 24 : x4 PCI Express card with NAND flash chips on it [31]

4.5.1 PCI Express

PCI Express is a serial, differential, point-to-point high-speed interconnect running at

2.5 gigatransfers per second per lane in its first generation, with added flexibility and scalability. The immediate benefit is increased


bandwidth. PCI Express offers 4GB/s of peak bandwidth per direction for a x16 link and 8

GB/s concurrent bandwidth. This allows for the highest performance in gaming and video

capture. In addition, PCI Express is designed for cost parity. The PCI Express x16 connector

is expected to be at cost parity to the high volume standard connectors.
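The 4 GB/s figure quoted above follows directly from the per-lane rate and the 8b/10b line encoding. The sketch below reproduces the calculation for a first-generation x16 link; it is a back-of-the-envelope check, not part of the PCI Express specification text.

/* Back-of-the-envelope check of the PCIe 1.x x16 bandwidth figure:
 * 2.5 GT/s per lane, 16 lanes, 8b/10b encoding. */
#include <stdio.h>

int main(void)
{
    double gt_per_s_per_lane = 2.5;          /* gigatransfers per second   */
    int    lanes             = 16;
    double payload_fraction  = 8.0 / 10.0;   /* 8b/10b line encoding       */

    double gb_per_s = gt_per_s_per_lane * lanes * payload_fraction / 8.0;
    printf("PCIe 1.x x16: %.1f GB/s per direction, %.1f GB/s concurrent\n",
           gb_per_s, 2.0 * gb_per_s);        /* 4 GB/s and 8 GB/s          */
    return 0;
}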

Peripheral Component Interconnect Express (PCI-e) is an internal interface, so a

SSD would be on a circuit board and plugged into a PCI express slot on the motherboard.

4.6 SSD Market

Market sources predict that SSDs will have a major impact on the storage market. As of today,

companies such as Crucial, Intel and Fusion-io, to name just a few, have already released high-speed

SSDs to the market. The architectural technologies that can speed up performance and IOPS

are developed independently by various original equipment manufacturers to suit particular

products or markets. The key architectural features which will increase throughput and shrink

the asymmetry gap between read and write IOPS are:

Parallelization of the internal flash arrays

Improved flash management technology.

Faster flash controllers.

Faster host interface controllers (and faster interfaces driven by the needs of the SSD

market rather than adapted from the HDD market).

Hybridizing on board memory technologies - for example using faster RAM-like non

volatile memory in some parts of the device and slower flash-like memory in the bulk

storage arrays

A lot of trial and error will be involved as original equipment manufacturers throw products

at the market which tweak the technologies they understand best - and see which products

stick. Some of these will enhance currently known architectures, while others may make

some architectural features obsolete. Within a few years, flash SSD technology is

expected to reach a point where the architecture of an ideal SSD is well established,

and the ongoing developments will be driven more by process changes than anything else.


Figure 25 : SSD Market development

4.7 Future

The availability and maturity of SSD technology has changed drastically over the last couple

of years, going from a vastly more expensive technology that proved better in

only a small subset of scenarios to a mainstream option. With ONFI (Open NAND Flash Interface) [3] working

intensively on NAND technology, the SSD future looks bright. ONFI has created the

Block Abstracted NAND addendum specification to simplify the host controller design by

relieving the host of the complexities of ECC, bad block management, and other low-level

NAND management tasks. The ONFI block abstracted NAND revision 1.1 specification adds

the high speed source synchronous interface, which provides up to a 5X improvement in

bandwidth compared with the traditional asynchronous NAND interface. The ONFI

workgroup continues to evolve the ONFI specifications to meet the needs of a rapidly

growing and changing industry.


ONFI 2.1 [3] contains a plethora of new features that deliver speeds of 166 MB/s and 200

MB/s, plus other enhancements that improve power efficiency, performance, and ECC capabilities. Along

with ONFI, the SSD manufacturing companies are designing their products to meet fast

interface technologies such as SATAIII, PCI Express.

ONFI is dedicated to simplifying NAND flash integration into consumer electronic products,

computing platforms, and any other application that requires solid state mass storage.

4.8 Summary

This chapter has given an overview of the technology behind SSDs. It is seen that flash cells

are at a point where production and technology are mature enough to make storage devices

capable of competing with magnetic disks. Some of the challenges SSDs face when using these

flash cells for bulk storage, such as the FTL and wear-leveling, were also discussed.

4.9 Typical characteristics of HDD and SSD

Reliability of the drive

HDDs use mechanical parts whose lifespan is limited, while SSDs using flash memory

can sustain approximately 10^5 (100,000)

write cycles per cell [21].

Access Speed

The typical access time for a Flash based SSD is about 35 – 100 micro-seconds whereas

that of a rotating disk is around 5,000 – 10,000 micro-seconds. That makes a Flash-based

SSD approximately 100 times faster than a rotating disk.

Consistent read performance

Read performance does not depend on where data is stored on an SSD; on an HDD, however, if

data is written in a fragmented way, reading it back will show varying response times.

Defragmentation

SSDs do not benefit from defragmentation, because reading data sequentially brings little

advantage and any defragmentation process adds additional writes to the NAND flash,

which already has a limited cycle life [22]. HDDs may require defragmentation after


continued operations or erasing and writing data, especially involving large files or where

the disk space becomes low.

Audible noise

HDDs produce audible clicks and crunching sounds, while SSDs are often quieter

because they have no mechanical parts.

Size

Flash-based SSDs are manufactured in standard 2.5″ and 3.5″ form factors. 2.5″ SSDs are

normally used in laptops and notebooks, while the 3.5″ form factor is used in desktops.

Vibration

SSDs are naturally more rugged than HDDs. An SSD can sustain up to 1,000 G/0.5 ms of

shock [16] before suffering damage or a drop in performance, while HDDs can

withstand up to 63 G/2 ms while operating and 350 G/1 ms [24] when turned off.

Power Consumption

SSDs have lower power consumption than HDDs.

Heat Dissipation

Along with the lower power consumption, there is also much less heat dissipation in

systems using flash-based SSDs as their data storage solution, owing to the absence of

heat generated by rotating or moving media. This is certainly one of the

main advantages of flash-based SSDs relative to traditional HDDs. With less heat

dissipation, they are an ideal data storage solution for mobile systems such as PDAs,

notebooks, etc.

Mean Time Between Failures (MTBF)

The average MTBF for SSDs is approximately 2,000,000 hours [16], while the MTBF for HDDs is

approximately 700,000 hours [24].

Cost Considerations

As of February 2011, NAND flash SSDs cost about (US) $1.20–2.00 per GB and HDDs

cost about (US) $0.05/GB for 3.5 inch and $0.10/GB for 2.5 inch drives.


Chapter 5

5 . Performance: HDD vs SSD

The previous chapters have indicated that SSDs have the advantage of not having moving parts, giving

them an overall low latency. Magnetic disks, on the other hand, have a harder time keeping latency

low, due to seek and rotational latency. In this chapter, the focus is on how the above-

mentioned general performance characteristics add up when faced with specific application

scenarios. The goal is to get a clear profile of both SSD and HDD, so that the right choice can be

made when it comes to performance.

There are various techniques used to analyse the performance of storage drives and the

architecture behind that performance.

5.1 Benchmark

In computing, a benchmark is the act of running a computer program, a set of programs, or

other operations, in order to assess the relative performance of an object, normally by

running a number of standard tests and trials against it [23].

Benchmarks provide a method of comparing the performance of various subsystems across

different chip/system architectures.

The performance of both SSDs and magnetic disks can be difficult to summarize with just a

few numbers. As discussed earlier, certain aspects of a disk might give different performance

results, and one might get different performance depending on the workload. In addition to

these uncertainties, different file systems will store data in fundamentally different ways. Putting

all this together, it is hard to get a clear answer about what level of performance a given

application can expect merely by looking at datasheet numbers.

To investigate performance levels, up-to-date high-end SATA consumer and enterprise flash

solid state drives are benchmarked against a mechanical hard disk drive. When choosing

drives for the benchmark, the focus was on mid-range alternatives: two of the most popular SSDs on

the market today, namely the Intel X25-E and the Crucial Real C300, are considered. The two SSDs

differ in the type of memory and the system interface technology used. An HDD

from Seagate is used as the reference drive for benchmarking.

Disk Specification

                        Seagate [24]        Intel X25-E [16]    Crucial Real C300 [17]
Type                    HDD                 SSD                 SSD
Size (GB)               80                  32                  128
Form factor             3.5"                2.5"                2.5"
Interface               SATA                SATA                SATA
Transfer rate (Gbps)    1.5/3               1.5/3               6/3/1.5
Rotation (RPM)          7200                -                   -
Memory                  Magnetic platter    SLC NAND            MLC NAND
Average access time     4.16 ms             0.08 ms             <0.1 ms
Sequential read         -                   250 MB/s            355 MB/s
Sequential write        -                   170 MB/s            140 MB/s

Table 5-1: Overview of drives in the benchmark environment

5.2 Benchmark Environment

The benchmark environment consists of two different SSDs, one from Intel and the other from

Crucial, along with a magnetic disk drive from Seagate. Information about the drives, as

provided in their datasheets, is summarized for comparison in Table 5-1. During the benchmarks, for

simplicity, the drives are referred to by their manufacturer. The test PC consists of an Intel®

Xeon® Processor 5600 series CPU at 2.67 GHz on the Intel® 5520 chipset with 2 GB of Random Access

Memory (RAM), running the Windows XP (32-bit) operating system.

In section 5.3, different benchmarks are run on both SSDs and the magnetic disk, and their

performance is analysed using the resulting benchmark values.

5.3 TPC-H Benchmark

Transaction Processing Performance Council (TPC) is a non-profit organization founded in

1988 to define transaction processing and database benchmarks and to disseminate objective,

verifiable TPC performance data to the industry. TPC benchmarks are widely used today in

evaluating the performance of computer systems.


The TPC-H benchmark is widely used in the database community as a yardstick to

assess the performance of database management systems against large scale decision support

applications. The benchmark is designed and maintained by the Transaction Processing

Performance Council.

5.3.1 BACKGROUND AND SIGNIFICANCE OF TPC-H

The TPC-H benchmark [13] tests the performance of analytics servers used by decision

support systems by measuring the performance of ad-hoc queries against a data set (called a

scale factor) of a specific size while the underlying data is being modified. The objective is to

simulate an on-line production database environment with an unpredictable query load that

represents a business oriented decision support workload where a DBA must balance query

performance and operational requirements such as locking levels and refresh functions.

Results are usually expressed as QphH@Size for performance or $/QphH@Size for price-performance, where

"Size" indicates the database size or scale factor used for the testing. The performance metric

reported by TPC-H is called the TPC-H Composite Query-per-Hour Performance Metric

(QphH@Size), and reflects multiple aspects of the capability of the system to process queries.

TPC-H benchmarking database sizes are currently 1GB, 10GB, 30GB, 100GB, 300GB,

1,000GB, 3,000GB, 10,000GB, 30,000GB, and 100,000GB but the TPC discourages

comparing results across different database sizes since database size is a major and obvious

factor in performance.

Although any benchmark, including the TPC-H, is unlikely to represent any particular

customer’s decision support workload or environment, TPC-H is an important test because of

the high level of stress it puts on many parts of a decision support system, and is used by

virtually all major platform vendors, and many decision support system suppliers to

demonstrate the performance attributes of their systems.

In this thesis, there is one important consideration to note: in contrast to expressing

results as QphH@Size, the time taken by individual queries to run against

the database is measured.

The TPC provides a set of tools, distributed as source code, to build the TPC-H benchmark.

The tools provided with TPC-H include a database population generator (DBGEN) and a

query template translator (QGEN).


DBGEN and QGEN are written in ANSI 'C' for portability, and have been successfully

ported to over a dozen different systems. While the TPC-H specification allows an

implementer to use any utility to populate the benchmark database and to create the

benchmark query sets, the resultant population must exactly match the output of DBGEN.

The source codes have been provided to make the building a compliant database population

and query sets as simple as possible.

A TPC-H benchmark application package is created and bound to a database; this

application measures the time taken by each individual query to run against a 10 GB

application-specific database generated using DBGEN. An overview of the TPC-H benchmark

application that was created is shown in Figure 26. Several steps have to be followed in order to

create the application package; for the detailed procedure see Appendix A.

TPC benchmark results are expected to be accurate representations of system

performance. Hence, there are certain guidelines that are expected to be followed when

measuring those results. The approach or methodology used in measurements is explicitly

described in the specification [13].

Figure 26 : TPC-H benchmark application outline


(Figure 27 plots the average execution time in seconds of TPC-H queries 1-17 for the Seagate HDD, the Intel X25-E SSD and the Crucial C300 SSD.)

5.3.2 Test scenario

SSDs are somewhat more sensitive to random reads than to sequential reads, and it is therefore

interesting to observe how they perform as the queries change the data access pattern

while scanning through a large database. In addition to looking at read performance, it is

interesting to know how much time the same operations, such as table scans and

traversals of the database, take on the Intel X25-E and the Crucial Real C300 as opposed to Seagate's

HDD.

The application package is created using 17 queries from QGEN, each of which

accesses the 10 GB database. The application database is created on all three drives independently,

and the identical application is run on each drive by binding the package to the

respective database. The execution time of every individual query is recorded.

To discover how much impact the access time has on performance, the same set of

queries was tested across the three different disks listed in Table 5-1. Each query

was run 115 times, and the initial 15 runs were excluded when calculating the

average time taken. This was done to ensure that the standard deviation between query execution

times is below 3 percent and that the processor is occupied only with the application query.
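The measurement procedure can be summarized in a short sketch. The following C outline is illustrative only; it is not the actual benchmark application described in Appendix A, and run_query() stands in for submitting one TPC-H query to the database under test.

/* Illustrative outline of the timing procedure: run each query 115 times
 * and average only the last 100 runs (the first 15 are warm-up). */
#include <stdio.h>
#include <time.h>

#define TOTAL_RUNS  115
#define WARMUP_RUNS 15

/* Placeholder: in the real setup, this submits one TPC-H query
 * through the benchmark application package. */
static void run_query(int query_number)
{
    (void)query_number;
}

static double average_query_time(int query_number)
{
    double sum = 0.0;
    for (int run = 0; run < TOTAL_RUNS; run++) {
        time_t start = time(NULL);
        run_query(query_number);
        double seconds = difftime(time(NULL), start);  /* wall-clock seconds   */
        if (run >= WARMUP_RUNS)                        /* discard first 15 runs */
            sum += seconds;
    }
    return sum / (TOTAL_RUNS - WARMUP_RUNS);
}

int main(void)
{
    for (int q = 1; q <= 17; q++)
        printf("Query %2d: %.1f s average over %d measured runs\n",
               q, average_query_time(q), TOTAL_RUNS - WARMUP_RUNS);
    return 0;
}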

5.3.3 Results

Comparing the results in Figure 27, the query execution times of the Intel X25-E

and the Crucial Real C300 are low and comparable to each other, in contrast to the very high

execution times of the Seagate HDD.

Figure 27 : TPC-H benchmark performance results


The difference in execution time is not consistent over the queries executed; this depends largely

on how the database is laid out across the drives. Considering the query execution times,

on average the Intel X25-E is 8 times and the Crucial C300 is 10 times faster than the Seagate HDD.

Both the Intel X25-E and the Crucial Real C300 perform relatively close to their advertised

speeds when doing reads; this is due to the symmetrical latency properties of SSDs.

Except for queries 1, 8, 10, 12 and 16, both SSDs achieve

significantly lower execution times. This can be attributed to the fact that, although the database

is distributed across flash memory chips, the flash memory banks are organized into parallel channels,

hence the lower execution times: when receiving a series of requests for data located on

different channels, the SSDs are able to handle these requests in parallel.

For queries 1, 8, 10, 12 and 16, both SSDs and the Seagate HDD

show comparable values, because these queries access sequentially stored data.

Summarizing, solid state drives perform significantly better than hard disk drives in random

operations, whereas their advantage is much smaller for sequential operations.

5.4 Energy Efficiency Test

While talking about solid state drives, power consumption becomes an interesting point of

discussion. Nowadays there are many huge server architectures that run 24 hours a day and

consume quite an amount of power.

An advantage of flash memory is its low power consumption. The main

advantage of a solid state drive over a good hard disk drive in terms of power consumption is

that the power consumption during operation is lower. On the other hand, several measurements have shown that

the idle power consumption can be much lower for a hard drive. Workloads can certainly be

found where the hard disk drive wins and others where the solid state drive shows better

energy efficiency. Newer high-performance solid state drives often contain an additional

amount of DRAM cache, which also uses some additional power. A general power

consumption comparison is therefore not easy to make and no general statement can be given: every

hard disk drive or solid state drive has slightly different power consumption. See the critical

article from Tom's Hardware [15] for more information.


5.4.1 Test scenario

It is interesting to know how much power, on average, the Seagate hard disk drive consumes

in comparison to the Intel X25-E and the Crucial Real C300.

The power consumption was measured while running all 17 TPC-H queries, each of them 10

times. The measurements were taken with the Cost Control 3000, a product from Base Tech, which

reports not only the energy consumed but also the corresponding energy cost.

The consumption indicated is not purely that of the drives alone but also includes the

system on which they are running; the variations between the drives are nevertheless dominated by

the moving parts of the Seagate hard disk.

5.4.2 Results

The results are displayed in Figure 28.

Figure 28: Comparison for energy efficiency

These values are indirectly influenced by the total amount of time taken by each drive

to execute all the TPC-H queries. The Seagate drive consumes approximately 6 times more

energy than the solid state drives under test. This is mainly due to the moving parts of the Seagate drive:

especially during random reads, the head has to be moved repeatedly. In the case of the SSDs, no

additional power is required to spin platters or move mechanical arms.

(Figure 28 plots the energy consumed, in kWh, by the Seagate HDD, the Intel X25-E and the Crucial C300 while running the TPC-H queries.)


However, as said earlier, no general conclusion can be drawn from this alone; the

figure nevertheless shows the overall amount of energy that can be saved by replacing an HDD with a solid state

drive while performing the same task.

5.5 HD Tune Benchmark

HD Tune is a hardware-independent utility that administrators can use to perform a variety of

hard disk diagnostics, regardless of manufacturer, to confirm hard disk health and

performance.

HD Tune is a hard disk utility with many functions, namely benchmark (measuring

low-level read/write performance), file benchmark, random access, health status check and drive

temperature display. However, the main interest here is the benchmark functionality;

the benchmark function itself offers four different tests:

Transfer rate

The data transfer rate is measured across the entire disk surface (default) or across the

selected capacity [14] for the specified data block size. It gives the option of measuring the transfer rate

for both read and write. To prevent accidental data loss, the write test can only be performed

on a disk with no partitions.

During the transfer test, certain parameters have to be set according to the requirements, such as:

Test speed/accuracy: the full test will read or write every sector on the disk. This will give

the most accurate results but the test time will be very long. By choosing the

partial test the transfer speed is sampled across the disk surface. The test time

and accuracy can be chosen by moving the slider.

Block size: block size which is used during the transfer rate test. Lower values may give

lower test results. The default and recommended value is 64 KB.

Access time

The average access time is measured and displayed in milliseconds (ms).

Burst rate

The burst rate is the highest speed (in megabytes per second) at which data can be transferred

from the drive interface (IDE, SATA, USB) to the operating system.


CPU usage

The CPU usage shows how much CPU time (in %) the system needs to read data from the

hard disk.

As seen in section 5.3.3, the time taken by the Seagate hard disk to run the queries was quite

high in comparison to the Intel X25-E and the Crucial Real C300. This variation could be due to

various factors such as access time and average transfer rate. In section 5.5.1, the above-mentioned

factors are examined by running HD Tune on all three drives.

5.5.1 Test scenario

The HD Tune utility was run on all three drives. For the transfer rate test, the block size

was set to 4 MB with the full test in fast mode.

5.5.2 Results

Transfer rate:

Figure 29: Read speed comparison

The sequential read speed of the Seagate hard disk is low compared to the solid state drives, as shown in Figure

29: the Seagate HDD is nearly 4 times slower than the Intel X25-E and 5 times slower than the Crucial

C300. This is mainly because of its much larger access time.

(Figure 29 plots the average read speed in MB/s for the Seagate HDD, the Intel X25-E and the Crucial C300.)


Figure 30: Access time comparison

This access time difference alone shows how valuable solid state drives can be in high-speed,

real-time applications: the SSDs access data nearly 150 times faster than the HDD.

5.6 Summary

The results obtained from the benchmarks fit many of the observations made in section 3.1.3.

Magnetic disks showed an overall low performance on read operations, due to seek time and

rotational delay. On average, the SSDs are 8-10 times faster for reads and 150 times faster

with respect to access time compared to the hard disk drive. The SSDs showed a high

performance on read operations, with an even higher relative advantage on random

reads in the TPC-H benchmark. Most likely, this can be attributed to the fact that an SSD

consists of multiple flash memory chips connected in parallel, as discussed in section 4.3;

SSDs depend heavily on the FTL, as each channel can handle requests in parallel.

The overall results indicate that solid state drives give better performance than HDDs,

but on closer observation a considerable performance difference can also be seen between the two

solid state drives: the Crucial Real C300 outperforms the Intel X25-E in most of the

benchmark tests conducted above. This performance difference could be due to various

factors such as flash type, controller, system architecture and cache buffer used. The coming chapters

analyse the reasons for this performance difference.

(Figure 30 plots the measured access times: Seagate HDD 15.7 ms, Intel X25-E 0.1 ms, Crucial C300 0.1 ms.)


Chapter 6

6 . Better Investment: SSD or additional RAM?

In comparison to hard disk drives, solid state drives are faster, quieter and more efficient in

energy consumption, but costlier.

Usually, to get better performance with hard disk drives, systems are installed with more

Random Access Memory. Since the cost per gigabyte of hard disk drives is low, the effective

performance of a hard-disk-based system can be tuned simply by adding RAM. A system

working on 10 GB of data, if installed with more than 10 GB of RAM, should see its performance

increase immensely, as the data can be loaded completely into RAM, thereby hiding

the access time of the hard disk drive. But RAM is itself costly, as stated in section

2.3.1.

By adding more RAM to the system, the performance can be increased immensely;

as seen in the previous chapter, performance can also be enhanced by replacing the HDD with an SSD. To

get a better picture of which is the better buy, section 6.1 focuses on comparing the

performance of the HDD with additional RAM against the SSDs with 2 GB of RAM. This gives an

idea of whether RAM or an SSD is the better investment for enhancing system performance.

For this analysis, the same two solid state drives used in section 5.1 and the Seagate

hard disk drive are considered, and the TPC-H benchmark is used for the performance comparison.

6.1 Benchmark Environment

Drive Specification

                     Seagate [24]        Intel X25-E [16]    Crucial Real C300 [17]
Type                 HDD                 SSD                 SSD
Size (GB)            80                  32                  128
Interface            SATA 2.0            SATA 2.0            SATA 2.0
Rotation (RPM)       7200                -                   -
Memory               Magnetic platter    SLC NAND            MLC NAND
System RAM (GB)      2 / 8 / 12          2                   2

Table 6-1: Overview of drives in the benchmark environment


The TPC-H benchmark with a scale factor of 10 was initially run on all the drives with 2 GB of system

RAM. The benchmark was later repeated on the Seagate HDD with 8 GB and with 12 GB of RAM.

CPU: Intel® Xeon® Processor 5600

Main board: Intel® 5520

OS: Microsoft Windows 7 Professional x64

Memory:

2 GB, DDR3-1333 SDRAM (Kingston)

4 GB, DDR3-1333 SDRAM (Kingston)

8 GB, DDR3-1333 SDRAM (Micron)

The TPC-H queries were run 115 times, but when calculating the average query execution

time the first 15 results were excluded to ensure that the processor is occupied only with the

application query.

6.2 Results

To try to match the performance of the SSDs, the system was upgraded with more RAM in steps.

Figure 31: Performance comparison between the HDD with 12 GB of system RAM and the SSDs with 2 GB of system RAM

Analysis of Figure 31 indicates that, except for queries 1, 3, 5, 12 and 16, the performance

of the SSDs with 2 GB of RAM cannot be matched even by increasing the RAM to 12 GB in the system


with the Seagate HDD. Increasing the RAM beyond 12 GB to attain better performance is not

productive in the current scenario. Since database sizes are normally huge, large amounts of RAM

would have to be added continually as the database grows in order to sustain the performance.

Considering the overall query performance, the SSDs provide the better results.

The comparison of HDD performance as the amount of RAM is varied is shown in Figure

32. The performance appears to saturate irrespective of the increase in RAM, except

for queries 1, 2, 8, 10, 12 and 15. Increasing the RAM beyond the actual database size to attain better

performance is not productive.

Figure 32: Performance comparison between the HDD with 2 GB, 8 GB, and 12 GB of system RAM

6.3 Conclusion

Database (server) applications benefit greatly from random disk access speed; this is why servers

have large DRAM footprints used as disk cache. Solid state drives, however, provide

significantly better performance for random reads, and the test results indicate that solid state drives

perform better when dealing with large amounts of data. For server applications it is therefore worth

investing in solid state drives rather than in RAM; hence solid state drives are the better choice for

performance enhancement.

A definitive decision cannot be made based on the above results alone, as the scenario cannot be

generalised. For applications where random reads or writes are rare compared to sequential

reads or writes, an HDD with more RAM is the better buy and saves a significant part of the investment. Therefore,


you have to be really careful about where SSDs are used; otherwise it is very difficult to

justify their additional cost.

6.4 Benchmark problems

There are no standard benchmarking tools that are specifically built to test Solid State Drives.

Benchmarking SSDs using tools developed for HDDs causes several unique problems that

need to be solved by developing benchmarking software that catches up with the technology.

As mentioned, SSDs use different strategies and data geometry than conventional HDDs.

This causes some functional differences and, more importantly, makes some benchmarks

inadequate, particularly those that were optimized for the standard platter configuration of

HDDs. Due to these addressing issues, some benchmarks could show radically different

results also on the transfer graphs and or average the performance values incorrectly.

Needless to say that the same algorithms applied to a functionally totally different device will

not render the same ―realistic‖ performance values, on the contrary, many of the test points

will fall within one block but others will span from the end of one block to the beginning of

another block. That will cause delays in the completion of the reads or writes and since the

test samples are relatively small in size, it will result in low ―calculated‖ performance values.

Because the stride size is constant in most benchmarks and the page size is constant, too,

these values will result in a saw-tooth pattern of the performance graph, simply as a

consequence of the periodicity of the two address patterns. Some benchmarks appear to use

test patterns that don’t seem to work well with SSDs and, thus, generate artefacts. Thus

benchmark cannot be viewed as cent percent representative of the SSD performance.
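The block-boundary effect just described can be made concrete with a small sketch. The stride and sample sizes below are hypothetical and chosen only to show that, with a constant stride, some test points stay inside one erase block while others straddle a boundary, which is the source of the periodic saw-tooth.

/* Illustrative sketch: with a constant stride, some benchmark samples stay
 * inside one erase block while others span two blocks (hypothetical sizes). */
#include <stdio.h>

#define ERASE_BLOCK_BYTES (512 * 1024)   /* example geometry from chapter 4 */
#define STRIDE_BYTES      (200 * 1024)   /* hypothetical benchmark stride   */
#define SAMPLE_BYTES      (64 * 1024)    /* hypothetical test sample size   */

int main(void)
{
    for (int i = 0; i < 8; i++) {
        long start   = (long)i * STRIDE_BYTES;
        long end     = start + SAMPLE_BYTES - 1;
        int  crosses = (start / ERASE_BLOCK_BYTES) != (end / ERASE_BLOCK_BYTES);
        printf("sample %d at offset %7ld KB: %s\n",
               i, start / 1024,
               crosses ? "spans two erase blocks" : "within one block");
    }
    return 0;
}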


Chapter 7

7 . Reverse engineering

As seen in section 5.3.3, the Crucial Real C300 performed better than the Intel X25-E. This chapter

looks more deeply into the system-level structure of solid state drives and tries to analyze the

factors behind the difference in performance. To visualize this, a reverse engineering

process is carried out on the Intel X25-E and the Crucial Real C300.

Reverse engineering is the process of discovering the technological principles of a human

made device, object or system through analysis of its structure, function and operation. It

often involves taking something (a mechanical device, electronic component, or software

program) apart and analyzing its workings in detail to be used in maintenance, or to try to

make a new device or program that does the same thing without using or simply duplicating

(without understanding) any part of the original.

Reverse engineering has its origins in the analysis of hardware for commercial or

military advantage. The purpose is to deduce design decisions from end products with little or

no additional knowledge about the procedures involved in the original production.

The basic building blocks of a solid state drive are the flash chip array, the host interface and the controller chip, which holds the other two together and manages the entire system. The Intel X25-E and the Crucial Real C300 are analyzed in terms of these building blocks.

7.1 Intel X25-Extreme

The Intel X25-E [16] board, shown below in Figure 33, carries mainly three kinds of chips: a set of flash chips, a controller and a DRAM chip.

The Intel X25 Extreme uses 50 nm single-level cell (SLC) NAND to build its flash array. The X25-E uses a 10-channel storage controller backed by 16MB of cache. Interestingly, the cache is provided by a Samsung K4S281632I-UC60 SDRAM chip. The storage controller is an Intel design that is particularly crafty, supporting not only SMART monitoring, but also


Native Command Queuing (NCQ). NCQ was originally designed to compensate for the rotational latency inherent in mechanical hard drives, but here it is used the other way round: the drive's ability to queue and re-order SATA commands is exploited to maximize execution efficiency, because a little time passes (time is of course relative when talking about an SSD whose access latency is measured in microseconds) between when the system completes one request and issues the next.

Figure 33: Intel X25 – Extreme SSD

Intel X25E is compatible with SATA 1.5 Gbps and 3 Gbps. The flash packages of course are

only the building blocks of an SSD. Much of the magic comes from the architecture and

optimizations of the SSD controller logic.

7.1.1 Controller Analysis

The Intel X25-E controller was scanned and opened up to gather more information about its internal structure. The resulting pictures of the Intel X25-E are shown below.


Figure 34: Controller from Marvell on Intel X25-E SSD board

The controller is packaged in a ball grid array (BGA) with single-row wire bonding; the bonding wires are made of gold. Although at first glance the controller chip seems to be from Intel, the markings on the die indicate that it comes from Marvell.

The analog and digital sections in the controller die were well distinguished. The orientation

of the controller die on the Intel X25E clearly indicates the SATA controller, DRAM

controller and the Flash controller sections.

The specifications of the controller are given in Table 7-1, Table 7-2 and Table 7-3.

7.2 Crucial Real C300

The Crucial Real C300 [17] features 16 MLC flash memory chips split evenly between the two sides of the circuit board. The 128GB SSD uses 8GB flash chips that carry two NAND dies apiece. The Crucial Real SSD uses flash memory chips from Micron. The flash chips in modern solid-state drives usually conform to the Open NAND Flash Interface (ONFI) 1.0 standard, as in the Intel X25E, but the Crucial Real SSD's flash chips follow the much more recent ONFI 2.1 specification.

The ONFI 2.1 specification pushes NAND performance levels into a new performance range:

166 MB/s to 200 MB/s. This new specification is the first NAND specification to specifically

address the performance needs of solid-state drives to offer faster data transfer rates in

combination with other new technologies like SATA 6 Gbps, USB 3.0 and PCI Express

Gen2.


Figure 35: Crucial Real C300 SSD

If you want to wring more than 300MB/s from a mechanical hard drive, you are going to

have to combine several of them in RAID. Solid-state drive makers are actually faced with

the same challenge. Individual flash chips do not necessarily offer superior sequential

throughput to traditional hard drives, which means that an SSD seeking to maximize

performance must distribute the load across numerous chips tied to multiple memory

channels, effectively creating a multiple channel array within the confines of a single drive.

The Crucial Real SSD inherits its 6Gbps Serial ATA support from Marvell's 88SS9174 flash

controller which supports TRIM command set. TRIM works in conjunction with Marvell's


garbage collection routine, which runs in the background to reclaim flash pages marked as

available by the command. The frequency with which garbage collection is performed

depends on how the drive is being used and how much free capacity it has available. With

eight memory channels, the Marvell controller is two short of the ten channels Intel squeezed

into its X25E SSD. Crucial claims the C300 SSD can sustain a sequential read rate of

355MB/s when connected to a 6Gbps SATA interface. The drive's sequential read

performance purportedly drops to 265MB/s when using a 3Gbps link.
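The 3Gbps figure is consistent with the link itself becoming the bottleneck: with 8b/10b encoding, only 8 of every 10 line bits carry payload, so a quick back-of-the-envelope conversion (sketched below in MATLAB) puts the payload ceiling of a 3Gbps link at roughly 300MB/s.

    % Payload bandwidth of a SATA link after 8b/10b encoding overhead.
    payloadMBps = @(lineRateBps) lineRateBps * (8/10) / 8 / 1e6;         % bits/s -> MB/s
    fprintf('3 Gbps SATA link: %.0f MB/s payload\n', payloadMBps(3e9));  % ~300 MB/s
    fprintf('6 Gbps SATA link: %.0f MB/s payload\n', payloadMBps(6e9));  % ~600 MB/s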

Flipping the C300's circuit board reveals a DDR memory chip that serves as the drive's cache.

The 128MB Micron DDR3 DRAM module offers decent cache performance for fast

transaction buffering, which will become more important as SATA-III 6.0 Gbps transfers are

observed.

7.2.1 Controller Analysis

Since the Marvell controller delivers considerably better performance, it is worth a closer look. The scanned and opened-up pictures provide more information about the internal structure.

Figure 36: Controller from Marvell on Crucial Real C300 SSD board

Unlike the Intel X25-E controller, the Crucial Real C300 controller die did not give a clear picture. However, from the orientation of the controller die on the board, the SATA and cache interconnections could be visualized. A closer look suggested that the controller chip is a ball grid array (BGA) package with wire bonding. In contrast to the single-row bonding on the Intel X25-E controller die, the Crucial Real C300's Marvell controller uses three rows for bonding, except for SATA, where a single row is used. The pads for wire bonding are neatly arranged in multiple rows to shrink the die size.


The surface of the die looked like an FPGA, but a more detailed analysis suggested that it is a mesh used by Marvell to make it harder for competitors to copy the design. A deeper analysis would have been of interest, but due to certain limitations the analysis was concluded at this stage.

7.3 Summary

The reverse engineering analysis of the controllers from the Intel X25-E and the Crucial C300 is summarized in Table 7-1. With the same package technology and an increased number of balls, the die of the Crucial Real C300 SSD controller is comparatively small. Although both SSDs use a controller from Marvell, the Crucial C300 uses the more recent release, which includes improved firmware features.

7.3.1 Controller Specification

SSD                            Intel X25-E SSD (32GB)    Crucial Real C300 SSD (128GB)
Controller manufacturer, year  Marvell, 2007             Marvell, 2009
Chip size (cm)                 1.9 x 1.9                 1.7 x 1.7
Part number                    PC29AS21AA0               88SS9174-BJP2
Package technology             BGA                       BGA
Balls                          409                       521
Die size (mm)                  5.9 x 5.9                 4.4 x 4.4
Bonding                        Wire                      Wire
Bonding rows                   1 row                     3 rows (SATA signals: 1 row)

Table 7-1 Controller chip details of Intel X25-E and Crucial Real C300 SSD

As seen in section 4.4.1, better performance can be obtained by increasing the number of channels, giving SSDs the ability to process a number of operations internally in parallel. The Intel X25-E uses 10 channels with 20 flash chips, yet the Crucial C300 achieves better performance with only 8 channels and 16 flash chips. This is because the flash chips in the Crucial C300 use the more advanced interface standard (ONFI 2.1), in contrast to the Intel X25-E's ONFI 1.0. ONFI 2.1 offers a simplified synchronous flash controller design and pushes performance levels into a higher range of 166 MB/s to 200 MB/s. This is summarized in Table 7-2.

7.3.2 Flash Interface

SSD                           Intel X25-E SSD (32GB)    Crucial Real C300 SSD (128GB)
Flash chip                    SLC                       MLC
Number of channels            10                        8
ONFI standard*                1.0                       2.1
Speed/flash chip (MB/s) [26]  50                        166-200
Cache                         SDRAM                     DDR3 SDRAM

Table 7-2 Flash interface details of Intel X25-E and Crucial Real C300 SSD
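Using the per-chip rates from Table 7-2, a rough upper bound on the internal flash-array bandwidth of the two drives can be compared; the calculation below assumes one chip transferring per channel at a time and is a theoretical peak only, since real throughput is limited by the controller and the SATA link.

    % Rough theoretical peak flash-array bandwidth from the Table 7-2 figures.
    x25e_peak = 10 * 50;   % 10 channels x 50 MB/s per ONFI 1.0 chip  = 500 MB/s
    c300_peak = 8 * 166;   % 8 channels x 166 MB/s per ONFI 2.1 chip = 1328 MB/s
    fprintf('X25-E flash-array peak: %d MB/s\n', x25e_peak);
    fprintf('C300  flash-array peak: %d MB/s\n', c300_peak);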


The advanced flash interface technology in the Crucial Real C300 is matched by a SATA III (6 Gbps) host interface, while the Intel X25-E supports only SATA II.

7.3.3 Host Interface

SSD                            Intel X25-E SSD (32GB)    Crucial Real C300 SSD (128GB)
SATA interface compatibility   1.5 Gbps, 3 Gbps          1.5 Gbps, 3 Gbps, 6 Gbps

Table 7-3 Interface compatibility of Intel X25-E and Crucial Real C300 SSD

*Note: In our tests, a common SATA 3 Gbps link was used for testing both SSDs.

7.4 Conclusion

The newer Marvell controller and the more advanced flash interface standard in the Crucial Real C300 provide higher internal bandwidth; backed by a hefty DDR3 buffer, this enables the drive to meet the demand for faster data rates with SATA 6 Gbps. These features are what allow the Crucial Real C300 SSD to outperform the Intel X25-E SSD.

The controller is the central part that ties the surrounding interfaces together to deliver the combined performance while hiding the bottlenecks of the individual parts. Overall NAND performance is an important factor at a time when speed is a critical design criterion for solid state drives, especially as the interfaces these SSDs connect to offer ever faster data rates with Serial ATA 6 Gbps, USB 3.0, and PCI Express Gen2.


Chapter 8

8 . Designing optimal performance-based SSD system-level architecture and its controller cost estimation

System designers perform a series of trade-offs when selecting a particular controller for their

target product and target market(s).

The trade-offs include:

Programmatic – cost, schedule, support, warranty, and availability.

Technical – performance, power, package options, features, scalability, and

flexibility.

Other – commonality, compatibility, documentation, development support, testing,

and reputation.

In the process of controller selection, the system designer is also doing the same analysis for

the flash parts and other parts needed in the design. It is an iterative process to find the right

combination of components to best meet the requirements for the particular product.

Due to proprietary concerns, not all the controller design data is available to the general

public over the Internet. There is however a significant amount of application detail that can

be learned for each of the SSD controllers on the market by studying their use in existing

SSDs. In order to meet the known performance and bandwidth specifications of current interface technologies through an SSD, and to put them into economic perspective, this section performs a package-level cost-effectiveness analysis of controllers by varying the system-level architecture of the SSD, taking SSD controller price per performance as the metric of choice.

8.1 Cost estimation of controller for a system designed to

meet performance specification

8.1.1 MATLAB GUIDE

GUIDE, the MATLAB graphical user interface development environment, provides a set of

tools for creating graphical user interfaces (GUIs). These tools simplify the process of laying

out and programming GUIs.


Using the GUIDE Layout Editor, user can populate a GUI by clicking and dragging

GUI components-such as axes, panels, buttons, text fields, sliders, and so on-into the layout

area. User can also create menus and context menus for the GUI. From the Layout Editor,

user can size the GUI, modify component look and feel, align components, set tab order,

view a hierarchical list of the component objects, and set GUI options.

A tool was created using MATLAB GUIDE, whose GUI presents options for the user to design his own system. The tool determines the controller size and its cost for the designed system.

The tool is a system-level optimization tool, designed to optimize the different interfaces in a solid state storage system so as to obtain the best performance and cost for a desired system interface. It determines the controller that meets the performance of the system interface by varying the quantity of the other selected integral parts of the SSD.

8.2 Implementation factors in optimization tool

While designing the optimization tool, several factors were taken into consideration; they are listed below.

Performance Factors

Host Interface

Host Interface   Coding technique   Data rate/second   Transfers/second   Clock
SATA 2.0         8b/10b             300 MB             3 GT/s             3 GHz
SATA 3.0         8b/10b             600 MB             6 GT/s             6 GHz
PCI-e 2.0*       8b/10b             500 MB             5 GT/s             5 GHz
PCI-e 3.0*       128b/130b          1 GB               8 GT/s             8 GHz

Table 8-1 System interface types and their performances

* Per lane

Buffer Cache Interface

Cache Type Transfers/second

DDR1 200-400 MT/s

DDR2 400-1066 MT/s

DDR3 800-2133 MT/s

Table 8-2 Buffer Cache types and their performances


When setting the buffer cache interface for the designed system, the tool chooses a cache configuration that delivers four times the maximum performance of the selected system (host) interface.

Flash Interface

The flash interface is the main part of the solid state drive through which the performance of the designed system is controlled. The performance is varied through the number of flash channels, the number of flash chips per channel and the channel width. The tool provides two options, single-level cell (SLC) and multi-level cell (MLC), to choose from while designing the system.

Note: Flash read Performance per chip per channel is considered as a variable as it depends

on the manufacturer.
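A minimal MATLAB sketch of how these inputs could be combined is shown below; the way the per-chip rate is scaled by the channel width, and the numeric values themselves, are assumptions for illustration and not the thesis tool's actual implementation.

    % Sketch: designed flash-array bandwidth and the 4x buffer-cache rule.
    hostRate     = 300;   % MB/s, e.g. a SATA 2.0 host interface
    nChannels    = 4;     % number of flash channels
    channelWidth = 16;    % channel width in bits
    chipRate     = 40;    % MB/s per chip on an 8-bit interface (assumed user input)

    designedRate = nChannels * (channelWidth / 8) * chipRate;    % = 320 MB/s here

    cacheChannelRate = 1066 * 2;                                 % MB/s of one 16-bit DDR2-1066 channel
    nCacheChannels   = ceil(4 * hostRate / cacheChannelRate);    % four-times rule described above

    fprintf('Designed flash bandwidth: %d MB/s (host interface: %d MB/s)\n', designedRate, hostRate);
    fprintf('Buffer cache channels needed: %d\n', nCacheChannels);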

Controller Size Factors

While calculating the controller size resulting from system designed, different interfaces that

are to be handled and their resulting signal pins are considered. Table below lists the different

signal pins that are considered for respective interfaces on the controller.

Interface      Signals considered
Flash chips    Control: chip select, write enable, command latch enable, ready/busy, reset/write protect. Data: DQ
DRAM**         Control: CK, CK#, CKE, RST#, RAS#, CAS#, WE#, CS#, ODT. Memory address: MA. Bank address: BA. Data: DQ, DQS, DQS#, DM
SATA           Control: CLK. Data: Transmitter [+,-], Receiver [+,-]
PCI-Express*   Control: REF_CLK_P, REF_CLK_N. Data: PET_P[x:0], PET_N[x:0] (transmit), PER_P[x:0], PER_N[x:0] (receive)
UART           Control: CLK

Table 8-3 SSD Controller interface signals

* x: 2^(Number of lanes)   ** More signals with varying quantity

The ratio of power and ground pins to signal pins is set as a variable.
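The sketch below shows one way a ball-count estimate could be assembled from per-interface signal counts and the power/ground ratio; the per-interface signal figures are illustrative placeholders, not the exact counts used by the tool.

    % Sketch: estimating the controller ball count from interface signal pins.
    nChannels    = 4;
    flashSignals = nChannels * (16 + 5);  % assumed: 16 data + 5 control lines per channel
    dramSignals  = 60;                    % assumed: address, data, strobe and control pins
    sataSignals  = 7;                     % assumed: TX/RX pairs plus clock
    uartSignals  = 2;

    signalPins       = flashSignals + dramSignals + sataSignals + uartSignals;
    powerGroundRatio = 1;                 % one power and one ground pin per two I/O pins
    powerGroundPins  = round(signalPins * powerGroundRatio);

    totalBalls = signalPins + powerGroundPins;
    fprintf('Estimated controller balls: %d\n', totalBalls);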


Controller Cost Factors

The tool calculates controller cost for two types of packages namely, Flip-Chip BGA and

Wire bonded BGA. The packaging cost for FCBGA and wire bonded BGA is calculated as a

function of variables such as die cost, number of I/Os, wafer level die yield, and assembly

process yield. The cost of the designed controller is calculated in two sections – Cost of the

Die and cost of package.

The size of the die depends on the number of I/O pads, the pad pitch and their arrangement.

Package type             FCBGA              Wire-bonded BGA
Pad pitch (microns)      150                80
Bond pad configuration   Area array pads    Peripheral pads
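One plausible way to turn a pad count and pad pitch into a die-size estimate is sketched below, with peripheral pads placed along the die edge for the wire-bonded case and a square area array for flip-chip; this geometric model is an assumption and not necessarily the exact one used by the tool.

    % Sketch: die size estimate from pad count, pad pitch and pad arrangement.
    function S = dieSizeEstimate(nPads, pitchMicron, arrangement, nRows)
        pitchMm = pitchMicron / 1000;
        switch arrangement
            case 'peripheral'                          % wire-bonded BGA: pads along the edge
                side = nPads * pitchMm / (4 * nRows);  % pads shared across edges and rows
            case 'area'                                % flip-chip BGA: pads in a square array
                side = ceil(sqrt(nPads)) * pitchMm;
            otherwise
                error('unknown pad arrangement');
        end
        S = side^2;                                    % die area in mm^2
    end

With this model, for example, 521 peripheral pads at an 80 micron pitch bonded in three rows give a die side of roughly 3.5 mm, the same order of magnitude as the 4.4 mm measured on the C300 die.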

The gross dies per wafer (DPW) for a wafer of diameter d (mm) and a die of area S (mm2) can be estimated by the standard expression

    DPW ≈ π · (d/2)² / S − π · d / √(2 · S)

The cost of the die is then given by

    Cost per die ($) = Cost of processed wafer ($) / (DPW × wafer-level die yield)

The cost of the respective package depends on the cost of each process step indicated in Figure 37.

Figure 37: Process flow for flip-chip BGA and wire-bonded BGA packaging.
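A direct MATLAB transcription of the two expressions above is given below; the wafer cost and yield figures are assumed purely for illustration.

    % Gross dies per wafer and cost per die, following the expressions above.
    d         = 300;    % wafer diameter in mm
    S         = 19.4;   % die area in mm^2 (roughly a 4.4 mm x 4.4 mm die)
    waferCost = 3000;   % assumed cost of a processed wafer in $
    dieYield  = 0.85;   % assumed wafer-level die yield

    DPW        = pi * (d/2)^2 / S - pi * d / sqrt(2 * S);   % gross dies per wafer
    costPerDie = waferCost / (DPW * dieYield);              % $ per good die

    fprintf('Gross dies per wafer: %.0f\n', DPW);
    fprintf('Cost per die: %.2f $\n', costPerDie);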


8.2.1 The properties of the tool are as defined below

1. Verifies the designed system for optimality by warning about over-design and under-design relative to the maximum performance of the selected system interface.

2. Estimates the number of pins on the controller chip based on the designed system.

3. Based on the number of pins indicated, the die size required and the cost is calculated

with selected technology and resources.

4. Calculates the cost of the estimated controller chip for the designed system.

Figure 38: Design tool outlook

Figure 39: Warning- System is over designed or under designed with respect to performance specified.


Figure 40: Cost calculation tool

8.2.2 Advantages of the tool

User can design the SSD system architecture for highest degree of performance by

varying flash type, buffer type, their size, number of channels, channel bandwidth.

The user is provided with input option to set the performance for flash modules (one

of the main contributor for system performance) based on which the tool calculates

the overall SSD system performance.

Based on the flash module specifications, the tool calculates the system capacity.

The tool warns the user when system is over and under designed.

The tool suggests the number of buffer channels required automatically based on the

type, size, and channel bandwidth of buffer selected.

The tool provides options to select type of package for controller before calculating

the cost.

8.2.3 Limitations

The tool has a limited number of package options when calculating the cost.

The performance values indicated in the tool during design are all theoretical.


8.3 Optimization tool consistency test for controller size

Systems were designed using the optimization tool with the SATA 2.0 and SATA 3.0 interfaces. The controller sizes of the designed systems are compared below with those of the Intel X25-E and the Crucial Real C300.

SSD controller (system interface)   Tool designed (number of pins/balls)   Company designed (number of pins/balls)
Intel X25-E (SATA 2.0)              448                                    409
Crucial C300 (SATA 3.0)             554                                    521

Figure 41 : Controller size for the system with SATA 2.0 interface

Figure 42 : Controller size for the system with SATA 3.0 interface


The values from the tool indicate that the controller sizes are comparable. The differences could be due to various reasons, such as the signal-to-power pin ratio; these values are intended for comparison only.

The use of the tool for optimal system design and controller cost estimation is illustrated in Appendix B.

8.4 Hints to use tool for optimal system design and

controller cost estimation:

Select the desired host interface to set the performance specification for the system to

be designed.

Fill in the inputs such as flash chip performance, power-ground to I/O pins ratio,

select type of cache desired.

While designing the system, watch for the warning message about over-design or under-design of the system. This helps the user design an optimal system architecture for the performance specified in the first step.

Select calculate button to know total number of pins (balls) on controller chip for the

designed system.

To find the cost of the controller for the designed system, press Die Cost ($) at the bottom left of the screen, select the desired node technology, then select the desired package type, and finally press the Chip cost button to view the cost.


Chapter 9

9 . Summary

9.1 Conclusion

In conclusion, the common-sense intuition that flash-based solid state drives (SSDs) provide superior performance for large read I/O is validated. As studied, SSDs are several times faster for reads on average, and dramatically faster in terms of access time, compared to hard disk drives. Solid state drives are also comparatively more efficient in power consumption.

Heavily used transactional databases, with their excessive random I/O workload, benefit the most from SSD technology, which additionally helps to negate disk configuration issues. With HDD devices it is critical how the database structure is laid out, how many spindles are used, and so on; for SSD-based systems it hardly matters how the data is laid out, or whether column- or row-oriented storage is used, as all of the data space ultimately delivers the same performance.

Although the answer may be application specific, the test results indicate that, with regard to performance gains, investment in solid state drives is better than investment in additional random access memory.

The system-level optimization tool simulates the scenario, which helps in studying the solid state storage system and in exploring design variations to enhance the performance of the system and save cost before it is implemented in practice. The tool is effective for designing solid state storage system architectures for the best performance at a desired system interface. A detailed analysis of the factors considered in the tool helps to guide the decision and clarifies the effect of the individual variables on the cost of the controller.


Despite the tremendous hype, and however exciting the potential of SSDs and the rate at which manufacturers are improving this technology may be, seeing them dominate the market is still a little far from reality at this point.

9.2 Future work on SSD

DRAM-based SSDs will continue as a niche product, as cost and capacities will continue to limit their use to system memory. If this continues, then improvements will have to be made in

data management to make better use of these SSDs, bringing tiered storage architecture into

an area that was traditionally just a flat file system. Developing middleware software to take

advantage of SSD however will require a much longer time frame than simply improving on

an existing product. It will also require a certain amount of discipline to manage a more

graduated approach of architecture with a better level of overall management.

The controller is a vital part of a solid state drive, and companies have to focus on bringing out better architectures. SSD controller technology has to be targeted in parallel with the flash interface. Currently, major SSD designs have moved from 4-channel to 10-channel controllers, and controllers with even more channels will have to be implemented. This will allow SSD drives to perform much faster.

Improvements in MLC technology in terms of reliability, capacity and cost will increase the appeal of SSDs. Alternative technologies such as phase change memory (PRAM) and resistive memory (RRAM) are also waiting in the wings, and may eventually offer more appealing cost and performance than SSDs can achieve.


Appendix A

Building TPC-H benchmark

The TPC Benchmark H (TPC-H) is a decision support benchmark. It consists of a suite of

business oriented ad-hoc queries and concurrent data modifications. The queries and the data

populating the database have been chosen to have broad industry-wide relevance. This

benchmark illustrates decision support systems that examine large volumes of data, execute

queries with a high degree of complexity, and give answers to critical business questions.

The TPC-H benchmark is an embedded SQL database application, which connects to the database and executes embedded SQL statements. Embedded SQL statements are embedded within a host language application.

To build the TPC-H benchmark, TPC provides a set of tools namely, a database population

generator (DBGEN) and a query template translator (QGEN). With IBM DB2 as a platform,

DBGEN provides data for the database and QGEN provides SQL queries. Using these,

TPC-H benchmark application can be created. This is done in two parts, creating TPC-H

database and creating a package between query source file and the database.

TPC-H database generation

DBGEN generates 8 separate ASCII files, each containing pipe-delimited data.

Create 8 tables under a database schema named TPC-H and import each of the ASCII files into the tables defined in the TPC-H database schema.

Assign keys by altering the tables in the TPC-H database as per the TPC-H specification [13].


TPC-H application package

Source file is created by embedding TPC-H queries/SQL statements in 'C' programming

language. To run applications written in compiled host languages, you must create the

packages needed by the database manager at execution time. The Figure 43 shows the order

of these steps, along with the various modules of a typical compiled DB2 application [4]

1. Create source files that contain programs with TPC-H queries.

2. Connect to the TPC-H database generated using DBGEN, then precompile each source file to convert the embedded SQL statements into a form the database manager can use.

Figure 43: Procedure to create the application package

[The precompiler converts embedded SQL statements directly into DB2 run-time

services API calls. When the precompiler processes a source file, it specifically looks

for SQL statements and avoids the non-SQL host language. PRECOMPILE (PREP) is an

application process that modifies source files containing embedded SQL statements

(*.sqc) and yields host language calls consisting of a source file(s) (*.c) and a

package. It is at this precompile time that the TIMESTAMP, which is also known as

the UNIQUE ID or CONSISTENCY TOKEN, is generated and is associated with the

package through the bind file and modified source code.]

3. Compile the modified source files (*.c) using the host language compiler.

4. Link the object files with the DB2 and host language libraries to produce an

executable program.

Compiling and linking (steps 3 and 4) create the required object modules.

5. The BIND command invokes the bind utility. It prepares SQL statements stored in the

bind file generated by the precompiler and creates a package that is stored in the

database. Bind the bind file to create the package, or bind again if a different database is going to be accessed. Binding creates the package used by the database manager when the program is run.

6. Run the TPC-H benchmark application. The application accesses the TPC-H database

using the access plans.


Appendix B

System level optimization tool

The system-level optimization tool, designed to optimize the different interfaces in a solid state storage system to get the best performance and cost for a desired system interface, is illustrated here.

Different interfaces to the controller influencing the performance of Solid State Drives:

Host Interface

Flash Interface
    Number of channels
    Channel width
    Flash chip read performance

Buffer cache Interface
    Cache type
    Cache standard
    I/O channel width
    Number of channels
    Cache size

The tool is operated in two sections:

Section1: Design a solid state drive for optimal performance

The tool provides options to select different interfaces mentioned above while designing the

solid state storage system. Based on the system interface (Host Interface) selected and its

maximum performance, the tool focuses on performance optimality: it warns if the system is over- or under-designed while the different interfaces that influence the performance are being selected. This is illustrated below in steps.


Step 1: Select the desired Host Interface, the critical factor on which the system design is based. The system is designed to match the maximum performance offered by the selected Host Interface. Here SATA 2.0 is selected for illustration.

Step 2: Enter all the parameters needed to design the system and calculate its performance, such as the flash chip read performance and the signal-to-power pin ratio.


Step 3: Select the type of cache and its standard to vary the number of cache channels suitably. The buffer cache channels are selected so that the performance of the buffer cache is four times that of the selected system interface. Here DDR2 is selected and the flash chip read performance is entered as 25ns (nano-seconds).

Step 4: The Signal-pins button calculates the designed system performance along with the number of balls on the designed controller. In this case the tool shows a system warning because the system is under-designed for the desired system interface. This can be seen by comparing the desired and designed system performance, which are 300MBps and 80MBps respectively. The Vcc-Vdd/IO pin ratio is taken as 1 (every 2 I/O pins require 1 Vdd and 1 Vcc).

Step 5: To increase the performance, the number of parallel flash channels is increased; in this case from 2 to 4 channels.


Step 6: The system warning indicates that the system is still under-designed, so either the number of flash channels or the channel width has to be increased to attain better performance. In this case the channel width is increased from 8 bit to 16 bit.

Step 7: When the designed system performance matches the desired system performance to within ±10 percent, the warning stops, indicating that the system design is optimal with respect to the selected Host Interface. The designed system is optimal in performance and the resulting controller has 340 pins (balls).
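The ±10 percent tolerance described in this step can be written as a tiny check; the MATLAB function below is only a sketch of that rule, with assumed warning texts.

    % Sketch of the +/-10 percent over/under-design check described in Step 7.
    function verdict = designCheck(designedMBps, desiredMBps)
        ratio = designedMBps / desiredMBps;
        if ratio < 0.9
            verdict = 'Warning: system is under-designed for the selected host interface';
        elseif ratio > 1.1
            verdict = 'Warning: system is over-designed for the selected host interface';
        else
            verdict = 'System design is optimal for the selected host interface';
        end
    end

For instance, designCheck(320, 300) reports an optimal design, while designCheck(80, 300), corresponding to the situation in Step 4, produces the under-design warning.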


Step 8: If the designed system exceeds the maximum performance of the host interface, the tool warns that the system is over-designed. The system should then be adjusted by comparing the host interface performance and the designed system performance.

Step 9: The designed system capacity can be increased by varying the Flash Chips/Channel menu and also by selecting the number of dies per flash chip.

Section 2: Cost calculation of the controller for the designed Solid State Drive

Continuing the previous example, the designed system has 340 pins (balls), as seen in Section 1, Step 7. This section of the tool calculates the cost of the die from the resulting number of pins along with the selected parameters, such as node technology, wafer diameter and the chosen package. By comparison with similar packages available from Texas Instruments, an estimate of the cost of the controller for the designed system can be made.


The cost of the controller chip for the designed system is approximately $9.89.


Bibliography

[1] R. Bez, E. Camerlenghi, A. Modelli, and A. Visconti. Introduction to flash memory.

Proceedings of the IEEE, 91(4):489–502, April 2003.

[2] Intel X25-E Extreme SATA Solid-State Drive

http://download.intel.com/design/flash/nand/extreme/319984.pdf,

[3] http://onfi.org/specifications/

[4] IBM DB2 Guide -IBM public library

http://publib.boulder.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=/com.ibm.db2.luw.

apdv.embed.doc/doc/c0021136.html

[5] Cagdas Dirik and Bruce Jacob. The performance of pc solid-state disks (ssds) as a

function of bandwidth, concurrency, device architecture, and system organization. In ISCA

’09: Proceedings of the 36th annual international symposium on Computer architecture,

pages 279–289,New York, NY, USA, 2009. ACM.

[6] Design Tradeoffs for SSD Performance, Nitin Agrawal, Vijayan Prabhakaran.
www.usenix.org/event/usenix08/tech/full_papers/agrawal/agrawal.pdf

[7] David Roberts, Taeho Kgil, and Trevor Mudge. Integrating NAND flash devices onto
servers. Commun. ACM, 52(4):98–103, 2009.

[9] Super Talent Technology. SLC vs. MLC: An Analysis of Flash Memory.

http://www.supertalent.com/datasheets/SLC_vs_MLCwhitepaper.pdf.

[10] Trends in Enterprise Hard Disk Drives, Seiichi Sugaya (June 30, 2005)

http://www.fujitsu.com/downloads/MAG/vol42-1/paper08.pdf

[11] Intel Corporation. Intel - Understanding the Flash Translation Layer (FTL) Specification.
http://www.embeddedfreebsd.org/Documents/Intel-FTL.pdf, 1998.

[12] Imation. Solid State Drives - Data Reliability and Lifetime.

http://www.imation.com/PageFiles/83/SSD-Reliability-Lifetime-White-Paper.pdf.

[13] TPC BENCHMARKTM H- www.tpc.org/tpch/spec/tpch2.1.0.pdf


[14] HD tune pro manual, hdtunepro.pdf

[15] Tom's hardware. Flash SSD Update: More Results, Answers.

http://www.tomshardware.com/reviews/ssd-hard-drive,1968-4.html.

[16] Intel® X25-E Extreme SATA Solid-State Drives

http://download.intel.com/design/flash/nand/extreme/319984.pdf

[17] RealSSD™ C300 2.5 Technical Specifications – Crucial-

www.crucial.com/pdf/Datasheets-letter_C300_RealSSD_v2-5-10_online.pdf

[18] Computer organization and design: the hardware and software interface by David

A.Patterson, John L. Hennessy (page 450-475)

[19] en.wikipedia.org/wiki/computer_data_storage

[20] Intel: Disk Interface Technology,Quick reference guide – NP2108.pdf 1040211

[21] Write Endurance in Flash Drives: Measurements and Analysis, Simona Boboila &

Peter Desnoyer-http://www.usenix.org/event/fast10/tech/full_papers/boboila.pdf

[22] Intel High Performance Solid State Drive - Solid State Drive Frequently Asked Questions

http://www.intel.com/support/ssdc/hpssd/sb/CS-029623.htm#5

[23] http://en.wikipedia.org/wiki/Benchmark_(computing)

[24] Barracuda 7200.10 – www.seagate.com/docs/pdf/datasheet/ds_7200_10.pdf

[25] en.wikipedia.org

[26] http://onfi.org/wp-content/uploads/2011/03/20100818_S104_Grunzke.pdf

[27] http://www.novopc.com/2008/09/hard-disk/

[28] http://www.easy-computer-tech.com

[29] http://www.ramsan.com/resources/SSDOverview

[30] http://www.datarecoverytools.co.uk

[31] www.bit-tech.net

[32] http://tjliu.myweb.hinet.net


Index

A

Addressing, 22

Advanced Technology Attachment, 25

B

ball grid array package (BGA), 65

Benchmark, 49

C

cache, 12, 14, 23, 24, 25, 29, 37, 41, 54, 58, 63, 67

Cell degradation, 38

Controller, 40

Crucial Real C300, 65

D

Disk access time, 21

E

Erase Block, 38

F

FLASH MEMORY, 34

Flash Structure, 37

Flash Translation Layer, 40

G

Garbage collection, 42

H

Hard Disk Drives, 19

HD Tune, 56

I

Intel X25-Extreme, 63

M

Marvell, 65, 66, 67, 68, 69

MATLAB GUIDE, 71

Memory, 11

MLC, 35

O

Offline Storage, 16

P

Page, 37

PCI Express, 44

Primary storage, 14

Processor cache, 14

R

RAM, 12

Reverse engineering, 63

ROM, 11

Rotational latency, 21

S

Secondary Storage, 16

Seek time, 21

Serial Advanced Technology Attachment, 27

SLC, 35

Small Computer System Interface, 26

Solid State Drives, 32

Storage Hierarchy, 13

System Architecture, 9

T

Tertiary Storage, 16

TPC-H, 50, 51, 52, 60, 81, 82, 83

Trim, 43

W

Wear-leveling, 42

Write Amplification, 43