Chapter 12.3 Mass-Storage Systems

Silberschatz, Galvin and Gagne ©2005, Operating System Concepts

Chapter 12-1: Overview of Mass Storage Structure

Chapter 12-2: Disk Attachment, Disk Scheduling

Chapter 12-3: Disk Management, Swap-Space Management, RAID Structure

Chapter 12-4: Stable-Storage Implementation, Tertiary Storage Devices, Operating System Issues, Performance Issues


Disk Management

We will discuss three very important topics:

Disk Formatting

Boot Blocks, and

Bad Blocks


Disk Management – Preliminary Comments

Before a disk is first put into use, it must be formatted.

But a disk typically supports a variety of diverse uses:

Operating system needs,

User needs, and

Specialized needs of particular applications.

So, a disk can be (and is) formatted in a number of ways.

Further, when disks are manufactured and sent out for use, they often have bad spots (bad sectors).

This is the norm

Thus any kind of formatting must account for bad spots on the disk and map logical blocks into physical sectors.

So, there’s a lot of important information in this section.


Disk Management – Disk Formatting

Initially a disk is divided into sectors that the disk controller can read from and write to.

Recall: the disk controller is itself a small, very specialized processor and executes a restricted instruction set.

The instruction set deals primarily with I/O and with operating the device itself.

Instructions include I/O requests (open, close, read, write, etc.) and additional instructions needed to control and manage disk operations (is the device ready; timing; and much more).

As stated in previous lectures, requests to the disk controller and other low-level privileged operations take the form of ‘commands’ or ‘instructions’, depending on the computing system. Commands are handled interpretively; instructions, generally at the assembler level, are a bit different.


Disk Management – Disk Sectors

Formatting the disk into sectors is referred to as low-level formatting.

Typical sector size is 512 bytes; others are available, such as 256 bytes, 1024 bytes (1K) and others. 512 is the norm.

Each sector itself contains a specific data structure consisting of a header, body of the sector, and a trailer.

Headers and Trailers are used for control information needed by the disk controller. These typically include:

Sector number (can check against the request for I/O) - Usually found in header.

Error Correction Codes (ECC) - Usually found in trailer

Data area itself – generally 512 bytes.


Disk Management – Disk Sectors

The error-correcting code (not exactly the same as those used in main memory, but the thinking and specific bits are very similar) is generated when data is written to the data portion. There are a variety of formulas used in generating ECCs.

When a read takes place, the hardware recalculates the code from the data it read (based on the number and position of specific bits) and compares it to the code stored in the sector.

If they differ, the sector is somehow corrupted.

Because the ECC is error-correcting, the bad bit (hopefully just one) may be both identified (detected) and corrected. This is called a soft error, and it is reported to system administrators and tech reps for maintenance concerns.

We talk a lot about ECC, parity, redundancy and more when we discuss RAID ahead.
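The write-then-verify cycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the 4-byte header and trailer sizes are hypothetical, and CRC32 (from Python's `zlib`) stands in for the check code. CRC32 can only detect corruption, whereas a real disk ECC can also correct small errors.

```python
import struct
import zlib

SECTOR_DATA_SIZE = 512  # bytes in the data area, as in the slides


def write_sector(sector_number: int, data: bytes) -> bytes:
    """Build header + data + trailer; the trailer holds the check code."""
    if len(data) != SECTOR_DATA_SIZE:
        raise ValueError("data area must be exactly 512 bytes")
    header = struct.pack("<I", sector_number)      # hypothetical 4-byte header
    trailer = struct.pack("<I", zlib.crc32(data))  # CRC32 stands in for the ECC
    return header + data + trailer


def read_sector(raw: bytes, expected_number: int) -> bytes:
    """Recompute the code on read and compare it with the stored one."""
    (number,) = struct.unpack("<I", raw[:4])
    data = raw[4:4 + SECTOR_DATA_SIZE]
    (stored_code,) = struct.unpack("<I", raw[4 + SECTOR_DATA_SIZE:])
    if number != expected_number:
        raise IOError("sector number does not match the request")
    if zlib.crc32(data) != stored_code:
        raise IOError("sector is corrupted")
    return data
```

Flipping any single bit of the data area after `write_sector` makes `read_sector` raise, which mirrors the "if they differ, the sector is corrupted" step.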


Disk Management – Low-Level Formatting

Usually the disk is low-level formatted as part of production.

Part of this formatting is choosing the number of bytes in the data portion of each sector.

Header and trailer sizes are generally fixed because they are hardware processed.

As stated, the disk may be ‘divided’ into specific portions of the disk (called partitions) which may be used for specific needs.

Because of the physical characteristics of disk access, partitions are normally allocated ‘in cylinders.’

Each partition is essentially a separate logical disk.

Three typical partitions are

Partition for the operating system’s executable code

Partitions for user files

Often, partition(s) of raw disk.


Disk Management – Low-Level Formatting

Given that three (or more) partitions are established, these partitions need to be made ready.

After partitioning, step 2 is logical formatting (creation of a file system).

Here, the OS needs to establish several data structures for control and management.

These partitions are initialized with structures such as

empty directories and

other structures (like maps of free and allocated space)

used to manage free and allocated regions during normal operations.

In truth, there is much more than just these…


Really – much more is done depending on the OS

Are we using virtual memory?

Many more support structures are needed:

Paging? Segmentation? Memory maps?

Setting up queues to support multi-tasking operations

Other skeletal data structures to be used during operations…

Perhaps we want to be able to dual boot this computing system…

Another side note: Interestingly, actual disk I/O is done in blocks, but file system I/O is done in clusters, which are simply larger groups of blocks.

This is done to facilitate sequential I/O, exploiting the principle of locality. (Recall?)


Disk Management – Low-Level Formatting – Raw Disk

The notion of raw disk is an important one.

This partition (normally not very large) has no associated file system included in its initialization. It is simply a ‘raw’ area of sequential blocks.

Actually, working directly on raw disk can speed up many operations, but it requires special processing, which is the responsibility of the clients that use it.

Very specific locations in the partition can be exactly specified and hence the need to use a file system (directory, etc.) is bypassed.

In fact, using a file system would be a major hindrance and would likely slow things down considerably!

Raw disk is simply available to some special clients with some special applications to use as they wish.

These clients (for example, database engines) are on their own and do not enjoy the benefits of buffering, caching, pre-fetching, etc.

But, these needs are often found in the real world, and thus there is often a disk partition for raw I/O.
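To make "very specific locations can be exactly specified" concrete, here is a minimal sketch of addressing a raw area by block number alone, with no directory lookup. It assumes 512-byte blocks and uses an in-memory buffer (`make_demo_device`, a hypothetical helper) as a stand-in for a real raw partition. The function names are illustrative, not any OS's API.

```python
import io

BLOCK_SIZE = 512  # assumed block size


def make_demo_device(num_blocks: int) -> io.BytesIO:
    """Stand-in for a raw partition: a zero-filled run of sequential blocks."""
    return io.BytesIO(bytes(num_blocks * BLOCK_SIZE))


def write_block(dev, block_no: int, data: bytes) -> None:
    """Write one block at an exact location computed from its number."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("raw I/O here is done in whole blocks")
    dev.seek(block_no * BLOCK_SIZE)
    dev.write(data)


def read_block(dev, block_no: int) -> bytes:
    """Read one block back; the client must remember where it put things."""
    dev.seek(block_no * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)
```

The client computes the byte offset itself, which is both the speed advantage and the extra responsibility the slide mentions.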


Disk Management – That Boot Block

We know that we typically have a bootstrap loader located in ROM such that when power is supplied, this burned in code is executed.

In truth, this ROM bootstrap loader is a very simple program whose job is to bring in the full bootstrap loader, which is stored at a fixed location on disk.

The ROM loader transfers control to a reserved area of the OS’s partition, which then undertakes full bootstrap operations.

Since the bootstrap loader in ROM is read-only, we don’t have to worry about it becoming corrupted.

Once the full bootstrap program is brought in (note that no device drivers are loaded yet), this loader proceeds to load and initialize the rest of the operating system.


Windows 2000 Boot Approach

Here’s a system specific approach:

First of all, code is run in the system’s ROM. This directs control to a fixed address which contains the boot code (as we implied).

This fixed address is the first sector of the hard disk, which is called the Master Boot Record.

The MBR contains boot code, which locates the real boot code to be executed, and a table that lists the partitions and their locations on this disk.

This table in the Master Boot Record is thus a map of the partitions of the disk, and it also contains a pointer (a flag) identifying one of the partitions as the boot partition.

The boot partition contains the operating system and all the device drivers.

Control is transferred to the boot sector of the boot partition, whose code continues loading and initializing the rest of the operating system and its key support algorithms and data structures.


Booting from a Disk in Windows 2000

First sector of disk is the Master Boot Record.

Boot partition (sector) contains OS and device drivers.

Code in the bootstrap loader in ROM directs the system to read boot code from the MBR. This record contains a table listing the partitions of the hard disk and a pointer to the boot sector, from which the remainder of the OS is booted.
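As a rough illustration of "a table of partitions plus a pointer to the boot partition", the sketch below parses a 512-byte MBR image. The slides give no byte offsets; the code assumes the conventional PC MBR layout (partition table at offset 446, four 16-byte entries, status byte 0x80 marking the bootable entry, signature 0x55 0xAA at the end). `build_demo_mbr` is a hypothetical helper that fabricates a sector for demonstration only.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446       # assumed: table follows the boot code area
ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3sB3sII"  # status, CHS start, type, CHS end, first LBA, count
BOOT_FLAG = 0x80
SIGNATURE = b"\x55\xaa"


def parse_mbr(sector: bytes) -> list:
    """Return the non-empty partition table entries of an MBR sector."""
    if len(sector) != SECTOR_SIZE or sector[510:512] != SIGNATURE:
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        start = TABLE_OFFSET + i * ENTRY_SIZE
        status, _, ptype, _, first_lba, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + ENTRY_SIZE])
        if ptype == 0:
            continue  # unused table slot
        partitions.append({"index": i, "bootable": status == BOOT_FLAG,
                           "first_sector": first_lba, "sector_count": count})
    return partitions


def find_boot_partition(partitions: list):
    """Follow the 'pointer': the entry whose boot flag is set, if any."""
    for part in partitions:
        if part["bootable"]:
            return part
    return None


def build_demo_mbr() -> bytes:
    """Fabricate an MBR with two partitions; the second is the boot one."""
    sector = bytearray(SECTOR_SIZE)
    entries = [(0x00, 0x07, 2048, 100000), (BOOT_FLAG, 0x07, 102048, 500000)]
    for i, (status, ptype, first, count) in enumerate(entries):
        struct.pack_into(ENTRY_FORMAT, sector, TABLE_OFFSET + i * ENTRY_SIZE,
                         status, b"\0\0\0", ptype, b"\0\0\0", first, count)
    sector[510:512] = SIGNATURE
    return bytes(sector)
```

Calling `find_boot_partition(parse_mbr(build_demo_mbr()))` yields the second entry, whose first sector is where the boot sector would be read from.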


Bad Blocks – IDE Controllers

As stated, disks are normally manufactured with bad blocks on them. After low-level formatting (normally done at the factory) they are sent out with bad blocks on them. How bad spots on disk are accommodated is the subject here.

For small systems with IDE controllers, when one formats the disk, the format scans the disk and finds these bad blocks.

This approach is simple: an appropriate entry is written into the FAT citing that this block is not available for assignment.

Once the disk is in operation, a common system program, chkdsk, can be run; it identifies blocks that have gone bad and likewise marks them as defective in the FAT. Unfortunately, if this happens, the data in such a block is normally lost.
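The FAT-marking idea can be sketched as below. This is a simplified model, not a real FAT implementation: the table is a plain list with one entry per block (a real FAT reserves its first entries), `is_readable` is a hypothetical surface-test callback, and the marker values are the FAT16 ones (0xFFF7 for a bad cluster, 0xFFFF for end of chain), used here as an assumption.

```python
FREE = 0x0000
BAD = 0xFFF7           # assumed FAT16 bad-cluster marker
END_OF_CHAIN = 0xFFFF  # assumed FAT16 marker for an allocated last cluster


def format_scan(num_blocks: int, is_readable) -> list:
    """Build a simplified allocation table, flagging unreadable blocks."""
    return [FREE if is_readable(block) else BAD for block in range(num_blocks)]


def allocate_block(fat: list) -> int:
    """Hand out the first free block; blocks marked BAD are never chosen."""
    for block, entry in enumerate(fat):
        if entry == FREE:
            fat[block] = END_OF_CHAIN
            return block
    raise RuntimeError("no free blocks left")
```

Because the allocator only ever looks for `FREE` entries, a block marked `BAD` during the scan is simply invisible to later file allocation.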


Bad Blocks – SCSI Controllers

SCSI disks are usually found on servers, workstations, and high-end PCs.

As before, the initial list of bad blocks is determined at the factory. But during operation, the controller automatically updates this list on the disk.

Part of the initialization of the disk is to reserve some spare sectors, which can be used to replace bad sectors by substituting a spare for a bad one.

Called: sector sparing or forwarding.

Bad sectors can arise from almost anything. A request to read from the bad block is simply mapped to the spare. Unfortunately, substitution brings with it problems, as one could imagine.
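Sector sparing amounts to a small redirection table kept by the controller. The sketch below is a minimal model under assumptions: the class and method names are hypothetical, and sectors are identified by plain integers.

```python
class SparingController:
    """Toy model of a controller that forwards bad sectors to spares."""

    def __init__(self, spare_sectors):
        self._free_spares = list(spare_sectors)  # reserved at format time
        self._remap = {}                         # bad sector -> spare sector

    def mark_bad(self, sector: int) -> int:
        """Substitute a spare for a sector found to be bad."""
        if sector in self._remap:
            return self._remap[sector]
        if not self._free_spares:
            raise RuntimeError("no spare sectors left")
        spare = self._free_spares.pop(0)
        self._remap[sector] = spare
        return spare

    def resolve(self, sector: int) -> int:
        """Translate a requested sector to the one actually accessed."""
        return self._remap.get(sector, sector)
```

For example, after `mark_bad(87)` on a controller created with spares `[1000, 1001]`, every later request for sector 87 resolves to sector 1000 while other sectors are unaffected.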


Bad Blocks – SCSI Controllers – more

Continuing: bad sectors on a SCSI disk:

Unfortunately, a spare may lie far from the sector it replaces.

The redirection can therefore undermine the optimization done by the disk scheduling algorithm.

So, most disks are formatted to have some spare sectors within the same cylinder.

An alternative to sector sparing is sector slipping.

As its name implies, once a bad sector is identified, the contents of each block from the one after it up to the first spare are shifted into the ‘next’ sector (that is, all are ‘moved up’ one, with the last one landing in the spare). The bad sector’s logical block is then mapped to the slot freed right after it.

Of course, we have a performance drop off here as data is moved…
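Sector slipping can be sketched on a toy disk as follows. The model is an assumption made for illustration: `disk` is a list of sector contents indexed by physical sector, and `mapping[logical]` gives the physical sector of each logical block. The bad sector's own data is treated as unrecoverable here.

```python
def slip_sectors(disk: list, mapping: list, bad: int, spare: int) -> None:
    """Shift sectors bad+1 .. spare-1 up by one, ending in the spare."""
    if not 0 <= bad < spare < len(disk):
        raise ValueError("spare must follow the bad sector on the disk")
    # Copy from the far end first so nothing is overwritten before it moves.
    for physical in range(spare, bad + 1, -1):
        disk[physical] = disk[physical - 1]
    disk[bad + 1] = None  # freed slot that now backs the bad sector's block
    # Every logical block from the bad one up to the spare moves up by one.
    for logical, physical in enumerate(mapping):
        if bad <= physical < spare:
            mapping[logical] = physical + 1
```

With `disk = ["A", "B", "C", "D", "E", None]`, `mapping = [0, 1, 2, 3, 4]`, a bad sector 2 and the spare at 5, the call leaves the disk as `["A", "B", "C", None, "D", "E"]` and the mapping as `[0, 1, 3, 4, 5]`. The copy loop is the data movement behind the performance drop-off noted above.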

Almost always, data in a bad block is gone.

There are techniques by which the data of an identified bad block can be copied and the block spared or slipped. Errors that can be recovered this way are called soft errors, and we normally ‘press on.’

Hard errors, in contrast, typically represent really lost data and necessitate some kind of restore from backup media.

This usually requires a computer operator to load the backup (if not already loaded) and then invoke a restore procedure.


Swap-Space Management

Introductory thoughts

Swap Space Use

Swap Space Location

Swap Space in Solaris


Swap Space Management - Introduction

Introductory Thoughts

Swapping is used to make room in memory generally when we are space constrained.

In truth, swapping entire processes is rarely used nowadays.

Rather than swap entire processes, which can be very expensive from a performance perspective, we normally swap pages – a virtual memory technique.

So, we sometimes use the terms swapping and paging somewhat interchangeably nowadays.

The idea is to use low-level OS functions in a virtual memory environment so that disk space serves as an extension of primary memory.

Any time swapping is undertaken, there is a significant performance cost, because disk access is far slower than memory access. But remember, the reason for swapping is, of course, to improve overall performance in a virtual memory system.


Swap Space Use

Naturally, the use of swap space will depend upon the operating system.

If an entire process is swapped, more swap space will be needed than in paged systems, where only a page may need to be stored.

Also, when an entire process is swapped, we are usually talking not only about the code but about data areas (stacks, etc.) as well.

So, how much space do we really need on disk?

It can range from a few megabytes to gigabytes!

Overestimate needed swap space?

May waste space – but often no additional harm done.

Underestimate needed swap space?

May result in the process being aborted!

Some systems (Linux) suggest an amount of swap space:

“Double the amount of physical memory”

Most set aside less; some argue whether swap space should be set aside at all.

Some systems have multiple swap spaces on different disks so that the space is distributed over the system’s I/O devices.


Swap Space Location

Found in: the normal file system; sometimes in a separate disk partition (a raw partition).

File System Swap Space: So, think about using a file system…

File systems incur a lot of overhead.

We get some nice features using a file management system, but when time is very critical, this approach can introduce many inefficiencies.

We’d have to use directory lookups and perhaps extra disk accesses. These are (in this context) not desirable. This is called traversing the file system.

Raw Partition as Swap Space:

Uses no file system (there’s no file system / directory structure in this approach).

We do have a swap space manager, which must be used to manage the blocks in this partition, whose size is fixed during initial disk partitioning.

This manager uses special algorithms optimized for speed rather than space efficiency.

Internal fragmentation? It may increase, but swap space is reinitialized at reboot time, so any internal fragmentation goes away. This is nice.

The data in the swap space don’t stay there long, as they might in typical disk partitions that use a file system.

Linux allows swap space in both raw partitions and in the file system.

The trade-offs are clearly between swap performance (time) and the convenience of management of the file system.
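A swap space manager that favors speed over space efficiency can be as simple as a pool of fixed-size page slots. The sketch below is a hypothetical design, not how any particular OS does it: slot allocation and release are constant-time stack operations, and a page always occupies a whole slot, which is where internal fragmentation can come from.

```python
class SwapSlotManager:
    """Toy manager for a raw swap partition divided into page-sized slots."""

    def __init__(self, num_slots: int):
        # Stack of free slot numbers; pop() hands out slot 0 first.
        self._free = list(range(num_slots - 1, -1, -1))
        self._owner = {}  # slot number -> page identifier

    def swap_out(self, page_id) -> int:
        """Reserve a slot for a page being written out to swap."""
        if not self._free:
            raise MemoryError("swap space exhausted")
        slot = self._free.pop()
        self._owner[slot] = page_id
        return slot

    def swap_in(self, slot: int):
        """Release a slot when its page is brought back into memory."""
        page_id = self._owner.pop(slot)
        self._free.append(slot)
        return page_id

    def free_slots(self) -> int:
        return len(self._free)
```

The `MemoryError` path corresponds to an underestimated swap space: once no slot is free, the requesting process cannot be paged out and may have to be aborted.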


RAID


Redundant Arrays of Independent Disks (RAID)

RAID architecture is a very interesting modern approach to significantly improve the way we read and write data to disks.

In RAID, we are exploiting the notion of parallelism in disks.

Parallelism via RAID can significantly improve performance (speed of I/O) and reliability (backup, recovery, and redundancy in stored data).

In the past, RAIDs were built from inexpensive disks and were considered an inexpensive alternative to large, very expensive disks; the ‘I’ stood for Inexpensive.

Now, the ‘I’ stands for Independent: Redundant Arrays of Independent Disks.

The reality is that disks do fail.

One approach is to have redundancy: the storage of extra information that we do not expect to use, but that can be used in an emergency to rebuild the lost information.


Redundant Arrays of Independent Disks (RAID)

We must consider engineering metrics such as Mean Time Between Failures (MTBF) as well as Mean Time to Repair (MTTR).

We first consider the technique of mirroring, where two physical disks are considered one logical disk.

Every write operation is carried out on both disks.

This appears to provide good reliability, assuming that the second disk will not fail before the first failed disk can be repaired.

Unfortunately, independence of disk failures cannot be assumed.

Often power failures / surges, earthquakes, and other disasters may wipe out more than one disk at the same time.

Another approach is to have a second disk but to stagger the second write operation a short time interval after the first. The operation is not ‘complete’ until the second ‘write’ is successful.

Still another approach is to add a nonvolatile RAM (NVRAM) cache to the RAID array.

This write-back cache is protected from data loss during power failures (often the big culprit in failures). So here, the write can be considered complete once it has reached the NVRAM cache.
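To see how MTBF and MTTR combine for a mirrored pair, a common back-of-the-envelope estimate is mean time to data loss ≈ MTBF² / (2 × MTTR). This holds only under the independence assumption that, as noted above, often fails in practice, so treat the result as an optimistic bound. The figures used (100,000 hours MTBF, 10 hours MTTR) are illustrative values, not measurements.

```python
HOURS_PER_YEAR = 24 * 365


def mean_time_to_data_loss(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate for a two-disk mirror, assuming independent failures.

    Data is lost only if the second disk fails during the repair window
    of the first, which gives roughly MTBF^2 / (2 * MTTR).
    """
    return mtbf_hours ** 2 / (2 * mttr_hours)


def hours_to_years(hours: float) -> float:
    return hours / HOURS_PER_YEAR
```

With the illustrative figures, `mean_time_to_data_loss(100_000, 10)` gives 500 million hours, roughly 57,000 years. Correlated failures such as power surges make the real figure much lower.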


Improvement in Performance via Parallelism

Several improvements in disk-use techniques involve the use of multiple disks working cooperatively.

Disk striping can be used to significantly improve the transfer rate.

Data striping at the bit level means splitting the bits of each byte across multiple disks.

An eight-bit byte would need eight disks.

In this architecture, an array of eight disks is considered one logical disk.

Note: with the disks operating in parallel, we have eight times the transfer rate.

Each disk participates in each access, so the number of accesses that can be processed per second is about the same as on a single disk, but each access can read eight times as much data in the same time as from a single disk!

Of course, the principle of bit-level striping generalizes to block-level striping.

Overall goals of striping are two:

Increase the throughput of multiple small accesses (that is, page accesses) by load balancing, and

Reduce the response time of large accesses.
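Both forms of striping reduce to a small placement rule, sketched below. The function names are hypothetical, disks are numbered from zero, and the block-level rule assumes the usual round-robin placement (logical block i on disk i mod n).

```python
def bit_stripe(byte_value: int, num_disks: int = 8) -> list:
    """Bit-level striping: bit i of the byte is stored on disk i."""
    return [(byte_value >> i) & 1 for i in range(num_disks)]


def bit_unstripe(bits: list) -> int:
    """Reassemble the byte by reading one bit from every disk in parallel."""
    return sum(bit << i for i, bit in enumerate(bits))


def block_stripe_location(logical_block: int, num_disks: int) -> tuple:
    """Block-level striping: return (disk index, block index on that disk)."""
    return logical_block % num_disks, logical_block // num_disks
```

For instance, `block_stripe_location(10, 4)` places logical block 10 on disk 2 as that disk's block 2, so consecutive logical blocks land on different disks and a large access can be served by all of them at once.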

End of Chapter 12.3
