cs102-01 intro to file org
Post on 02-Dec-2014
CS 102: File Structures & File Organizations
Chapter 01
Introduction to File Organizations
CJD
Files Terminology
File – a collection of related data
examples : student records, payroll file
Entity – a data item of interest
example : student or employee
Fields or Columns – data related to an entity or object
examples : last name, first name, gender, birth date, (degree program or department), (year level or job title), etc.
Record or Row – a grouped collection of fields about an entity or object
examples: fields about a student or employee
Key – one or more fields chosen to identify a record uniquely. Files are usually ordered according to key values.
examples: student or employee number, or last name, first name and birth date.
Why External Storage ?
collection of data usually too large to fit in main memory
only the small portion of the file being accessed at a given time needs to be in internal memory
file needs to be stored permanently in non-volatile storage for access by several programs
Preference : Contiguous storage of data
provides quicker access to information.
Our Objective
Objective :
Investigate structures used to organize large collections of data into files stored on secondary storage devices.
File Design goals :
Efficiency of storage and access.
Many factors affect file design.
Factors of File Design
File Organization – how records are stored
File Type – role of the file in an information system
File Characteristics – activity and volatility of the file
File Manipulations – operations to access the file and keep it current
File Organizations
the way in which records are stored in an external file.
the data structures used for organizing the data
Common File Organizations
1. Sequential
2. Random
3. Indexed Sequential
4. Multikey
Sequential File Organization
Records are stored and accessed consecutively in sequence from beginning to end
Records are usually in ascending or descending order by the key field
On average, half the records must be accessed to locate a record of interest
The entire file must usually be copied in order to update it
Records may be fixed length or of varying length
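The "half the records on average" claim can be checked with a small sketch. It assumes an in-memory list stands in for a sequential file sorted in ascending key order; the record layout and the name sequential_search are hypothetical.

```python
def sequential_search(records, key):
    """Scan records in ascending key order; return (record, records_read)."""
    reads = 0
    for record in records:
        reads += 1
        if record["key"] == key:
            return record, reads
        if record["key"] > key:
            # Passed the position where the key would be: it is not in the file.
            break
    return None, reads


# A stand-in "file" of 100 records with keys 1..100.
records = [{"key": k, "name": f"student-{k}"} for k in range(1, 101)]

# Search once for every key and average the number of records read.
total_reads = sum(sequential_search(records, k)[1] for k in range(1, 101))
average_reads = total_reads / len(records)
print(average_reads)  # 50.5, about half of the 100 records
```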
Random File Organization
Allows access of a record without sequential search through the file
There is a relationship between each record’s key and its location in an external file
Directly computes the location of the record using key value
There may be no relationship between the logical ordering and the physical ordering of the file.
Relative Files
an implementation of random file organization with fixed record lengths.
Each record in secondary storage is assigned a record number which designates its relative position with respect to the beginning of the file.
The first record may be record number 0 or 1 depending on implementation.
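record address =
address of beginning of file +
(relative record number × fixed record length)
(this assumes the first record is number 0; if numbering starts at 1, use relative record number − 1)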
can be updated in place
can be sequentially accessed according to physical order of records in storage but this access is usually meaningless.
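The address computation and update in place can be sketched as follows. This is an illustration only: the 20-byte record layout (a 4-byte student number plus a 16-byte name), the helper names, and the use of an in-memory io.BytesIO in place of a disk file are all assumptions.

```python
import io
import struct

# Hypothetical fixed-length layout: 4-byte student number + 16-byte name.
RECORD = struct.Struct("<I16s")


def record_address(rrn, file_start=0, first_rrn=0):
    """Byte address of a record, given its relative record number (rrn)."""
    return file_start + (rrn - first_rrn) * RECORD.size


def write_record(f, rrn, number, name):
    f.seek(record_address(rrn))
    f.write(RECORD.pack(number, name.encode("ascii")))  # name is null-padded


def read_record(f, rrn):
    f.seek(record_address(rrn))
    number, raw = RECORD.unpack(f.read(RECORD.size))
    return number, raw.rstrip(b"\x00").decode("ascii")


f = io.BytesIO()  # stands in for a file on secondary storage
rows = [(1001, "Cruz"), (1002, "Reyes"), (1003, "Santos")]
for rrn, (number, name) in enumerate(rows):
    write_record(f, rrn, number, name)

write_record(f, 1, 1002, "Reyes-Lim")  # update in place: only record 1 is rewritten
third = read_record(f, 2)              # direct access without reading records 0 and 1
```

Because every record has the same length, the file size stays at three records after the update, and no other record has to be moved.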
Indexed Sequential File Organization
a hybrid of sequential and random file organizations
Records may be variable length.
Records are grouped into blocks.
Blocks are fixed length.
Records in a block are not necessarily ordered but usually are.
Each block is stored in a contiguous location.
The blocks are organized into a relative file ordered by the key fields of representative records per block.
Indexes
there is a hierarchical structure of record keys and relative block numbers called an index
To retrieve a record,
the index is used to find the relative block number of the block containing the record,
then the relative file of blocks is accessed randomly
then the block is searched sequentially for the record
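The three retrieval steps can be sketched as below. For brevity this assumes a single-level index holding the highest key of each block (a real index would be hierarchical, as noted above), and in-memory lists in place of disk blocks; the names blocks, index and lookup are hypothetical.

```python
from bisect import bisect_left

# Each block holds records sorted by key; blocks are ordered by key range.
blocks = [
    [(101, "Abad"), (105, "Cruz"), (109, "Diaz")],
    [(112, "Lim"), (118, "Ong"), (120, "Reyes")],
    [(131, "Santos"), (140, "Tan")],
]

# Index entry i is the highest key stored in relative block i.
index = [block[-1][0] for block in blocks]


def lookup(key):
    # Step 1: search the index for the relative block number.
    block_no = bisect_left(index, key)
    if block_no == len(blocks):
        return None  # key is larger than every key in the file
    # Step 2: access that block directly (random access by block number).
    block = blocks[block_no]
    # Step 3: search the block sequentially for the record.
    for record_key, value in block:
        if record_key == key:
            return value
    return None


print(lookup(118))  # Ong
```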
index can specify order in which file should be accessed sequentially by record keys.
access is almost as fast as with random file organization, but slightly slower because of the index search
Multikey File Organization
allows multiple ways to access the file by several different key fields
uses indexing structure with indexes for each key field
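A minimal sketch of one index per key field, assuming the records live in a relative file addressed by record number. The sample records, field names and build_index are hypothetical, and a plain dictionary stands in for a real index structure.

```python
from collections import defaultdict

# Position in this list plays the role of the relative record number.
records = [
    {"number": 1001, "last_name": "Cruz", "major": "Civil"},
    {"number": 1002, "last_name": "Reyes", "major": "Electrical"},
    {"number": 1003, "last_name": "Cruz", "major": "Electrical"},
]


def build_index(records, field):
    """Map each value of `field` to the record numbers that contain it."""
    index = defaultdict(list)
    for rrn, record in enumerate(records):
        index[record[field]].append(rrn)
    return index


# One index per key field gives several independent access paths.
indexes = {field: build_index(records, field) for field in ("number", "last_name", "major")}

electrical = [records[rrn]["number"] for rrn in indexes["major"]["Electrical"]]
cruz = [records[rrn]["number"] for rrn in indexes["last_name"]["Cruz"]]
```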
File Types
Six File Types according to functionality :
1. Master File – records of relatively permanent data that are updated occasionally
major collection of data pertaining to a specific application
usually stored on disks, and nowadays rarely on tape
Example : Bank accounts file, accounts balances file
2. Transaction File – records of operations applied to master file
Example : new account opening file, Deposits/Withdrawals file
3. Table File – records used for lookup
Example : Interest rates file, minimum balance requirement file, account type description file
4. Report File – information prepared for the user. May be printed or displayed on-screen.
Example : Summary of accounts, Error and Audit listings of a maintenance run
5. Control File – summary of maintenance run
Example contents : run date, maintenance statistics
6. History File – backup of master, transaction and control files from past runs.
File Characteristics
Usage characteristics of a file :
file size – consider the initial and expected future file sizes to choose storage that can accommodate the file.
activity – percentage of master records updated during a maintenance run
high activity is more efficiently stored using sequential file organization
low activity is more efficiently stored using organizations with random access that allow update in place
volatility – number of records added and deleted compared to original number of records.
high volatility – best updated using merge procedures. Non-sequential files have high overhead to reorganize as a result of these updates.
frequency of use – sequential files take more total time to update on an hourly basis than on a daily or monthly basis, so they are poorly suited to files that must be updated frequently
required response time – real-time access requires random access
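Activity and volatility as defined above reduce to simple ratios. The sketch below assumes volatility is expressed as a fraction of the original record count; the function names and the sample figures are made up for illustration.

```python
def activity(records_updated, master_size):
    """Percentage of master records updated during one maintenance run."""
    return 100.0 * records_updated / master_size


def volatility(added, deleted, original_size):
    """Records added plus deleted, relative to the original number of records."""
    return (added + deleted) / original_size


# Hypothetical run: 1000 master records, 150 updated, 30 added, 20 deleted.
run_activity = activity(150, 1000)
run_volatility = volatility(30, 20, 1000)
print(run_activity, run_volatility)  # 15.0 0.05
```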
File Manipulations : Queries
1. Queries
searching for records whose values meet given criteria
the types of queries to be performed can affect file design
Examples :
List the record of student Juan De La Cruz
List all students enrolled in more than 20 units
List all second-year level Engineering students
Count all Engineering students per major with an average of 1.5 or better
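The first three example queries can be sketched as predicates applied to each record. This assumes a full scan of an in-memory list; the field names and sample students other than Juan De La Cruz are hypothetical.

```python
students = [
    {"name": "Juan De La Cruz", "college": "Engineering", "year": 2, "units": 21},
    {"name": "Maria Santos", "college": "Engineering", "year": 3, "units": 18},
    {"name": "Pedro Reyes", "college": "Science", "year": 2, "units": 24},
]


def query(records, predicate):
    """Return every record that satisfies the predicate (a full scan)."""
    return [record for record in records if predicate(record)]


by_name = query(students, lambda r: r["name"] == "Juan De La Cruz")
over_20_units = query(students, lambda r: r["units"] > 20)
second_year_engineering = query(
    students, lambda r: r["college"] == "Engineering" and r["year"] == 2
)
```

A query on a single key value, like the first one, benefits from random or indexed access, while queries that touch most records are served just as well by a sequential scan. This is one way the expected queries influence file design.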
File Manipulations : Merging
2. Merging
combining data extracted from two or more files
Examples :
File of Engineering students is usually separate from file of grades
Counting all Engineering students per major with an average of 1.5 or better requires merging these files.
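The count-per-major example can be sketched as a merge of the two files on student number. The sample data is made up, and "1.5 or better" is assumed to mean an average of 1.5 or lower, as in grading systems where 1.0 is the highest mark.

```python
from collections import Counter, defaultdict

# Students file: (student number, major). Grades file: (student number, grade).
students = [(1, "Civil"), (2, "Civil"), (3, "Electrical")]
grades = [(1, 1.25), (1, 1.5), (2, 2.0), (2, 1.75), (3, 1.0), (3, 2.0)]

# Group the grades by student number so the two files can be matched up.
grades_by_student = defaultdict(list)
for number, grade in grades:
    grades_by_student[number].append(grade)

# Merge: for each student, look up the matching grades and test the average.
counts = Counter()
for number, major in students:
    student_grades = grades_by_student.get(number)
    if student_grades and sum(student_grades) / len(student_grades) <= 1.5:
        counts[major] += 1

print(dict(counts))  # {'Civil': 1, 'Electrical': 1}
```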
File Manipulations : Maintenance
3. Maintenance
updating master file with transaction file to keep the master file up-to-date
possible transaction or update codes could be
A = addition of a new master file record
C = changing values of fields of master file records
D = deletion of an existing record from master file
the file maintenance program merges the transaction and master files
if a transaction refers to a non-existent master file record, it must be an addition
if a transaction refers to an existing master file record, it must be a change or a deletion
if no transactions refer to a master file record, no update on that record is made
For sequential files, transactions must be ordered in the same key sequence as the master file so they may be merged. A new master file is created by the maintenance program.
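The sequential update described above can be sketched as a single merge pass over both files. This assumes both inputs are sorted by key, at most one transaction per key, and transactions of the form (key, code, data) with the codes A, C and D; the function name and the error list are hypothetical.

```python
def update_master(master, transactions):
    """Merge sorted transactions into a sorted master; return (new_master, errors)."""
    new_master, errors = [], []
    m = t = 0
    while m < len(master) or t < len(transactions):
        if t == len(transactions) or (
            m < len(master) and master[m][0] < transactions[t][0]
        ):
            # No transaction refers to this master record: copy it unchanged.
            new_master.append(master[m])
            m += 1
        elif m == len(master) or transactions[t][0] < master[m][0]:
            # Transaction refers to a non-existent record: must be an addition.
            key, code, data = transactions[t]
            if code == "A":
                new_master.append((key, data))
            else:
                errors.append((key, code))
            t += 1
        else:
            # Keys match: must be a change or a deletion.
            key, code, data = transactions[t]
            if code == "C":
                new_master.append((key, data))
            elif code != "D":
                errors.append((key, code))
                new_master.append(master[m])  # keep the record on an invalid code
            m += 1
            t += 1
    return new_master, errors


master = [(1, "a"), (3, "c"), (5, "e")]
transactions = [(2, "A", "b"), (3, "C", "c2"), (5, "D", None), (7, "D", None)]
new_master, errors = update_master(master, transactions)
print(new_master)  # [(1, 'a'), (2, 'b'), (3, 'c2')]
print(errors)      # [(7, 'D')]
```

The old master is never modified; the pass writes a complete new master, which is why the whole file is copied on every run.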
File Manipulations : Multifile
4. Multifile Information Systems
uses many master files
designed to reduce duplication of information
Examples :
The last name and first name need not be stored with each grade of a student.
Only the key field “student number” is in both student file and grades file to link corresponding records.
4. Multifile Information Systems : DBMS
A database management system (DBMS) is a multifile information system
DBMSs are outside the scope of this course and are covered in a different course.
End