28=16341_sad ppt27

29
Design of files and use of auxiliary storage devices

Upload: sidharath-pathania

Post on 02-Apr-2018

TRANSCRIPT

Page 1: 28=16341_sad ppt27

7/27/2019 28=16341_sad ppt27

http://slidepdf.com/reader/full/2816341sad-ppt27 1/29

Introduction

• Information systems in business are file- and database-oriented. Data are accumulated into files that are processed and maintained by the system.

• Databases draw on data accumulated in transaction and other kinds of files and are designed to share data among different applications.

• The systems analyst is responsible for designing the files, determining their contents, and selecting a method for organizing the data.

• At the same time, if the proposed applications will draw on database resources, the analyst must develop the means of interacting with the database.

Basic File Terminology

• Data Item:

Individual elements of data are called data items. Each data item is identified by name and has a specific value associated with it.

The association of a value with a field creates one instance of the data item.

Data items can comprise subitems or subfields. For example, a date is often used as a single data item consisting of three subitems: month, day, and year.

Basic File Terminology

• Record:

The complete set of related data pertaining to an entry is a record.

Each field has a defined length and type.

When the number and size of the data items in a record are constant for every record, the record is called a fixed-length record.

Variable-length records are less common than fixed-length designs in most business applications.
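
A fixed-length record can be sketched with Python's standard struct module. The field layout below (a 10-character account number, a 6-character check number, and an amount held as integer cents) is an assumption made for illustration and is not taken from the slides.

```python
import struct

# Hypothetical layout (not from the slides): 10-byte account number,
# 6-byte check number, and an 8-byte signed integer amount in cents.
# "<" disables alignment padding, so every record is exactly 24 bytes.
CHECK_RECORD = struct.Struct("<10s6sq")

def pack_check(account: str, check_no: str, amount_cents: int) -> bytes:
    """Encode one record; short text fields are padded with NUL bytes."""
    return CHECK_RECORD.pack(
        account.encode("ascii"), check_no.encode("ascii"), amount_cents
    )

def unpack_check(raw: bytes) -> tuple[str, str, int]:
    """Decode one record and strip the NUL padding from the text fields."""
    account, check_no, amount_cents = CHECK_RECORD.unpack(raw)
    return (
        account.decode("ascii").rstrip("\x00"),
        check_no.decode("ascii").rstrip("\x00"),
        amount_cents,
    )
```

Because every field has a defined length and type, every packed record has the same size whatever values it holds.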

Basic File Terminology

• Record Key:

To distinguish one specific record from another, systems analysts select one data item in the record that is likely to be unique in all records of a file and use it for identification purposes.

This item, called the record key, key attribute, or simply key, is already part of the record; it is not additional data added just for the purpose of identification.

Basic File Terminology

• Entity:

An entity is any person, place, thing, or event of interest to the organization and about which data are captured, stored, or processed.

Patients and tests are entities of interest in hospitals, while banking entities include customers and checks.

Basic File Terminology

• File:

A file is a collection of related records. Each record in a file is included because it pertains to the same entity.

Figure: A file of checks

The number of records in the file determines the file size. If each record is fixed-length and uses 200 characters of storage, a file of 6 records uses 6 times 200, or 1,200, characters of storage.
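
The file-size arithmetic, and the related calculation of where a given record starts, can be sketched as follows. The function names are hypothetical, and the offset rule assumes records are stored back to back with no header, which the slides do not state.

```python
RECORD_LENGTH = 200  # characters per fixed-length record, as in the example

def file_size(record_count: int, record_length: int = RECORD_LENGTH) -> int:
    """Total storage used by a file of fixed-length records."""
    return record_count * record_length

def record_offset(record_number: int, record_length: int = RECORD_LENGTH) -> int:
    """Start position of a record, counting records from 0.

    Assumes records are stored contiguously with no file header.
    """
    return record_number * record_length
```

For the six-record file, file_size(6) gives 1200 characters, and the fourth record (number 3) would start at position 600.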

Basic File Terminology

• Databases:

A database is an integrated collection of data stored in different types of records, in a way that makes them accessible to multiple applications.

The interrelation of the records derives from the relationships in the data, not from their physical storage location.

Records for different entities are typically stored in a database (whereas a file stores records for a single entity). In a university database, for example, records for students, courses, and faculty are interrelated in the same database.

Databases do not eliminate the need for files in an information system. Different types of files are still needed to capture the details of events and business activities, to prepare reports, or to store data that are not in the database.

Data Structure Diagrams

Purpose:

Data structure diagrams are graphic tools that show the logical data structure requirements of an information system application.

They serve four purposes:

1. Verify information requirements.

2. Describe data associated with entities.

3. Show the relationships between entities.

4. Communicate the data requirements to a file designer or database administrator.

Each data item either identifies the entity or describes an important attribute. Data structure diagrams organize the data.

Data Structure Diagrams

Notation:

A common notation is used in preparing data structure diagrams.

Entities are represented by rectangles, with the entity name at the top and a list of attributes (data items or fields) describing the entity.

Each entity is identifiable by a key attribute, which by convention is the first data item listed.

Data Structure Diagrams

Use in file design:

The use of data structure diagrams requires the analyst to address important questions about the entity being described:

• What data items will uniquely identify an occurrence of the entity?

• By what means will information about the entity be accessed?

• What other data items describe the attributes of the entity?

Data Structure Diagrams

  Check                    (entity name)
  ---------------------
  Account number           (key)
  Check number             (data items)
  Date
  Payee
  Amount of transaction

Figure: Data structure diagram for the checking example

Data Structure Diagrams

• The figure shows a simple data structure diagram for the checking example.

• As the illustration shows, the record key, which in this case is the account number, uniquely identifies the account. Other details, including check number, date, payee, and amount of the transaction, are attributes.

• Analyzing the use of the checking information through the data structure diagram shows that the actual check number must also be used for identification purposes. Since the account number uniquely identifies the account but does not describe transactions involving it, a combined key of account number and check number must be used to trace individual transactions.
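
One way to picture the combined key is a lookup table keyed by the pair (account number, check number). The sketch below uses a Python dictionary; the sample records and helper names are invented for illustration.

```python
# Hypothetical check records; all values are made up for illustration.
# The key is the pair (account number, check number).
checks = {
    ("1001", 101): {"date": "2019-07-01", "payee": "City Power", "amount": 8450},
    ("1001", 102): {"date": "2019-07-03", "payee": "Grocer", "amount": 2315},
    ("2002", 101): {"date": "2019-07-02", "payee": "Landlord", "amount": 90000},
}

def find_check(account_number: str, check_number: int):
    """Trace one transaction; returns None if no such check exists."""
    return checks.get((account_number, check_number))

def checks_for_account(account_number: str) -> list[dict]:
    """All checks on one account; the account number alone is not unique."""
    return [rec for (acct, _), rec in checks.items() if acct == account_number]
```

Check number 101 appears on two accounts, so neither data item identifies a transaction alone; the pair does.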

Types of Files

• There are four main types of files:

Master file

Transaction file

Table file

Report file

Master File

• A master file is a collection of records about an important aspect of an organization's activities.

• It may contain data describing the current status of specific events or business indicators.

• For example, the master file in an accounts payable system shows the balance owed to every vendor and supplier from whom the organization purchases supplies or services.

• A second type of master file reflects the history of events affecting a particular entity.

Transaction File

• A transaction file is a temporary file with two purposes:

1. Accumulating data about events as they occur.

2. Updating master files to reflect the results of current transactions.
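
The second purpose can be sketched as a simple posting run. The sketch assumes an accounts payable master file held as a mapping from vendor ID to balance owed (in cents) and a transaction file held as a list of (vendor ID, amount) pairs; these structures and names are illustrative assumptions, not part of the slides.

```python
def apply_transactions(
    master: dict[str, int], transactions: list[tuple[str, int]]
) -> dict[str, int]:
    """Return a new master file with every transaction posted.

    Positive amounts are new purchases; negative amounts are payments.
    A vendor seen for the first time starts from a zero balance.
    """
    updated = dict(master)  # leave the original master untouched
    for vendor_id, amount in transactions:
        updated[vendor_id] = updated.get(vendor_id, 0) + amount
    return updated
```

Once the master file has been updated, the transaction file has served its purpose, which is why it is described as temporary.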

Table File

• Table files contain reference data used in processing transactions, updating master files, or producing output.

• Table files conserve storage space and ease program maintenance by storing in a file data that otherwise would be included in programs or master file records.

Report File

• Report files are temporary files used when printing time is not available for all the reports produced, a situation that frequently arises in overlapped processing (the capability of a computer to carry out input, processing, and output operations simultaneously, which increases throughput considerably).

• The computer writes the report or document to a file on magnetic disk or tape, where it remains until it can be printed.

• This process is known as spooling: output that cannot be printed when it is produced is spooled into a report file.

Other Files

• Other kinds of files play a role in information systems.

• A backup file is a copy of a master, transaction, or table file made to ensure that a duplicate is available if anything happens to the original.

• Archival files, copies made for long-term storage of data, usually are stored away from the computer center to prevent their being inadvertently accessed or retrieved for use, thus ensuring their preservation.

Methods of file organization

• Sequential organization

• Direct-Access organization

• Indexed organization

Sequential organization

• Sequential organization is the simplest way to store and retrieve records in a file.

• In a sequential file, records are stored one after another without concern for the actual value of the data in the records.

• The first record is stored at the beginning of the file.

• The second is stored right after the first, the third after the second, and so on.

• This order never changes in sequential file organization.

Sequential organization 1

• Reading sequential files: To read a sequential file, the system always starts at the beginning of the file and reads its way up to the desired record, one record at a time.

• Searching for records: Sequential files do not use physical record keys; records are accessed in order of their appearance in the file.

• Evaluation of sequential files: Sequential organization is a good method when every record in a file must be accessed. It is also acceptable if, on average, about one-half of the records in the file are used. On the other hand, where the requirement is to find one particular record in a very large file, sequential file organization becomes a disadvantage.
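
The cost of finding one record in a sequential file can be made concrete with a small sketch. The function below is a hypothetical illustration: it reads records in stored order and counts how many reads were needed before the wanted record turned up.

```python
def sequential_find(records, wanted_key, key_of):
    """Scan records from the start; return (record, number_of_reads).

    key_of is a caller-supplied function that extracts the identifying
    value from a record. Returns (None, reads) if nothing matches.
    """
    reads = 0
    for record in records:
        reads += 1
        if key_of(record) == wanted_key:
            return record, reads
    return None, reads
```

Finding the last record in a file of n records takes n reads, and so does discovering that a record is absent, which is why the method suits runs that touch most of the file rather than single lookups.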

Direct-Access organization

• Direct-access files are keyed files. They associate a record with a specific key value and a particular storage location.

• All records are stored by key at addresses rather than by position.

• If the program knows the record key, it can determine the location address of a record and retrieve it independently of every other record in the file.

• In general, if fewer than 10 percent of the records in a file will be needed during a typical processing run, the file should not be established as a sequential file.

• On the other hand, if more than 40 percent of the records will be accessed, the analyst should select the sequential organization.
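
The two rules of thumb can be written down as a small helper. The slides give no guidance for runs that touch between 10 and 40 percent of the records, so the middle result below is an assumption, as are the function name and the wording of the returned labels.

```python
def suggest_organization(records_needed: int, records_in_file: int) -> str:
    """Apply the 10 percent / 40 percent rules of thumb to one typical run."""
    if records_in_file <= 0:
        raise ValueError("records_in_file must be positive")
    activity = records_needed / records_in_file
    if activity < 0.10:
        return "not sequential"       # fewer than 10 percent needed
    if activity > 0.40:
        return "sequential"           # more than 40 percent accessed
    return "no clear preference"      # assumption: the slides are silent here
```

For example, a run that needs 5 of 100 records argues against a sequential file, while a run that needs 50 of 100 favors one.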

Direct-Access organization 2

• Using the record key as the storage address is called direct addressing. When this method can be used, it is simple and quick.

• However, the requirements of this method often prevent its use. Direct addressing calls for a data set with the following characteristics:

1. The key set (i.e., the range of key values assigned) is in a dense, ascending order with few unused values (unused values mean wasted storage space); few open gaps in key values are wanted.

2. The record keys correspond to the numbers of the storage addresses: there is a storage address for each actual or possible key value in the file, and there are no duplicate key values.
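
A minimal sketch of direct addressing follows, assuming integer keys and an in-memory list standing in for storage addresses. The class and method names are hypothetical. One slot is reserved for every possible key value in the range, so gaps in the key set show up directly as unused slots.

```python
class DirectFile:
    """Records addressed directly by key; one slot per possible key value."""

    def __init__(self, lowest_key: int, highest_key: int):
        self.lowest_key = lowest_key
        self.slots = [None] * (highest_key - lowest_key + 1)

    def address(self, key: int) -> int:
        """The key itself (offset by the lowest key) is the storage address."""
        if not self.lowest_key <= key < self.lowest_key + len(self.slots):
            raise KeyError(key)
        return key - self.lowest_key

    def store(self, key: int, record) -> None:
        slot = self.address(key)
        if self.slots[slot] is not None:
            raise ValueError(f"duplicate key {key}")
        self.slots[slot] = record

    def fetch(self, key: int):
        """Retrieve a record without touching any other record."""
        return self.slots[self.address(key)]

    def unused_slots(self) -> int:
        """Wasted storage: slots reserved for key values never assigned."""
        return self.slots.count(None)
```

With keys 1000 through 1009 and only three records stored, seven slots sit empty, which illustrates why a dense key set is wanted.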

Direct-Access organization 3

• Hash addressing: When direct addressing is not possible but direct access is necessary, the analyst specifies the alternative access method of hashing.

• Hashing (also called key transformation or randomizing) refers to the process of deriving a storage address from a record key.

• An algorithm (an arithmetic procedure) is devised to change a key value into another value that serves as a storage address. (The data value in the record itself does not change.)

Direct-Access organization 4

• Types of hashing algorithms: A simple hashing algorithm for changing a social security number into a suitable storage address follows:

1. Strip off the first three digits of the number: 456821455 becomes 821455.

2. Divide the new key by a prime number. Here we are using 41.

3. Modular division is used, so only the remainder is kept.

4. The remainder is the storage address: 821455 mod 41 = 20.

• Folding: Split the key into pieces and process them further (add, subtract, divide, etc.).

    821
  + 455
  -----
   1276   storage location
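
The division-remainder and folding methods can be sketched in a few lines of Python. The function names and default arguments are assumptions chosen to mirror the worked example (strip three digits, divide by 41, fold in three-digit pieces); they are not a standard library API.

```python
def strip_leading_digits(key: int, count: int = 3) -> int:
    """Drop the first `count` digits of the key."""
    return int(str(key)[count:])

def division_remainder(key: int, prime: int = 41) -> int:
    """Modular division: the remainder serves as the storage address."""
    return key % prime

def fold(key: int, piece_length: int = 3) -> int:
    """Split the key into fixed-size pieces and add the pieces together."""
    digits = str(key)
    pieces = [digits[i:i + piece_length]
              for i in range(0, len(digits), piece_length)]
    return sum(int(piece) for piece in pieces)

short_key = strip_leading_digits(456821455)   # 821455
remainder_address = division_remainder(short_key)  # 821455 = 41 * 20035 + 20
folded_address = fold(short_key)              # 821 + 455 = 1276
```

Dividing by 41 can yield only the addresses 0 through 40, so different keys will sometimes produce the same address; a real design has to decide how such collisions are handled, a point the slides do not cover.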

Direct-Access organization 5

• Extraction: Select specific digits from the key and process them with the remaining digits.

    814   (1st, 3rd, 4th digits)
  - 255   (2nd, 5th, 6th digits)
  -----
    559   storage location

• Squaring: Multiply the number by itself and then apply other hashing methods.

  821,455 * 821,455 = 674,788,317,025; the example keeps the leading eight digits, 67,478,831.

  Fold the first half with the second half:

     6747
  +  8831
  -------
   15,578   storage location

  Extract the 1st and 2nd digits and add them to the other digits:

     578
  +   15
  ------
     593   storage location
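
The extraction and squaring examples can be checked with a short sketch. The helper names are hypothetical, and keeping only the leading eight digits of the square is an assumption inferred from the figures in the example (the full square of 821455 has twelve digits).

```python
def extract(key: int, positions: tuple[int, ...]) -> tuple[int, int]:
    """Split a key into (chosen digits, remaining digits).

    positions are 1-based digit positions counted from the left.
    """
    digits = str(key)
    chosen = "".join(digits[p - 1] for p in positions)
    rest = "".join(d for i, d in enumerate(digits, start=1)
                   if i not in positions)
    return int(chosen), int(rest)

def extraction_hash(key: int, positions: tuple[int, ...] = (1, 3, 4)) -> int:
    """Subtract the remaining digits from the extracted digits."""
    chosen, rest = extract(key, positions)
    return chosen - rest

def square_then_fold(key: int, keep: int = 8) -> int:
    """Square the key, keep the leading digits, and fold the two halves."""
    squared = str(key * key)[:keep]
    half = len(squared) // 2
    return int(squared[:half]) + int(squared[half:])

folded = square_then_fold(821455)          # 6747 + 8831
head, tail = extract(folded, (1, 2))       # first two digits, then the rest
second_address = head + tail               # 15 + 578
```

For the key 821455, extraction gives 814 - 255 = 559, folding the squared value gives 15,578, and the second extraction step reduces that to 593.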

Indexed organization

• A third way of accessing records is through an index.

• The basic form of an index includes a record key and a storage address for a record.

• To find a record when a storage address is unknown (as with direct access and hashing structures), it is necessary to scan the records.

• However, the search will be faster if an index is used, since it takes less time to search an index than an entire file of data.

Characteristics of an Index

• An index is a separate file from the master file to which it pertains.

• Each record in the index contains only two items of data: a record key and a storage address.

• To find a specific record when the file is stored under an indexed organization, the index is first searched to find the key of the record wanted.

• When it is found, the corresponding storage address is noted and then the program accesses the record directly.

• This method uses a sequential scan of the index, followed by direct access to the appropriate record.

• The index helps speed the search compared with a sequential file, but it is slower than direct addressing.
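
The two-step lookup (scan the index, then go straight to the record) can be sketched with an in-memory byte buffer standing in for the data file. The 16-byte record length, the sample data, and the function names are assumptions for illustration.

```python
import io

RECORD_LENGTH = 16  # hypothetical fixed record length in bytes

def build_file(records):
    """Write records to a data file and build a separate index.

    Each index entry holds only a record key and a storage address.
    Record text is assumed to fit within RECORD_LENGTH bytes.
    """
    data = io.BytesIO()
    index = []
    for key, text in records:
        index.append((key, data.tell()))
        data.write(text.encode("ascii").ljust(RECORD_LENGTH))
    return data, index

def indexed_read(data, index, wanted_key):
    """Scan the index sequentially, then access the record directly."""
    for key, address in index:
        if key == wanted_key:
            data.seek(address)
            return data.read(RECORD_LENGTH).decode("ascii").rstrip()
    return None
```

Only the small index entries are scanned; the data file itself is touched once, at the noted address. Direct addressing skips even the index scan, which is why it remains the faster of the two.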