datastructures A Brief Look at Reality

Upload: neal-ball

Post on 11-Jan-2016

PC Course: datastructures 6.*
Sydney Airport - 7000 complaints per day
Software used:
- CA Windows/4GL (application development tool)
- CA-OpenIngres (DBMS)
Developers: Lochard Environment Systems
Lochard has installed noise and flight plan systems at Sydney, Melbourne, Brisbane, Cairns, Perth, Adelaide and Coolangatta.
All systems are integrated into one system monitored from the offices of Air Services, Canberra.
Aircraft Noise Monitoring System: some features
- A reading every second ... threshold
- Noise and flight plan overlaid on a map of the surrounding area, viewed in real time
- Records stored in a CA-OpenIngres database
- Geographic Information System interface
- Aircraft information
- Communications
6.0 Design and Data Structures
Data structures are the bricks and mortar that hold databases together.
In the ANSI/SPARC architecture, data structures are defined at the internal model level and implemented in the physical data organisation.
Design and Data Structures
Data structures are often hidden from the application programmer, since they are used primarily by the DBMS and the operating system.
Understanding data structures is nonetheless important: it helps performance, improves program design and allows easier communication with DBMS specialists.
Attributes they should contain
Avoid Redundancy if possible
A group of attributes has a natural “inherent” structure
This structure is independent of the way the data is used
Normalisation
- Codd originally defined three normal forms
- later expanded to Boyce-Codd, fourth and fifth normal forms
Dependency Theory
"One truly scientific part of the field [of database design]" (Date, 6th ed., p. 380) - needs more research
Relational database design: a mechanical approach to producing a database schema with certain desirable properties
Study algorithms which produce normalised relations with desirable properties
may be better than another
Each normal form requires that a relation satisfies the criteria for that form; each form eliminates a different kind of redundancy
database operations and will store each fact only once
Database operations applied to unnormalised relations may lead to anomalies
S79 32 P1 4
Functional Dependencies
- the values of one set of attributes (X) determine the values of another set of attributes (Y), written X → Y
- the value of Y depends on the value of X
- the simplest case is one attribute determining another single attribute
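A functional dependency can be tested against the rows of a table: X → Y is violated if two rows agree on X but disagree on Y. The sketch below is a hypothetical helper (`fd_holds` is not from the course) using the Building/Room example from the 2NF slide. Note that a table instance can only show that an FD is violated; whether it holds is a rule about every legal instance.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts (attribute name -> value); lhs, rhs: tuples of names.
    """
    seen = {}  # X value -> the Y value first seen with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X value paired with two different Y values
    return True

rooms = [
    {"building": "A", "room": "214", "seats": 85, "levels": 4},
    {"building": "A", "room": "242A", "seats": 25, "levels": 4},
    {"building": "B", "room": "213", "seats": 135, "levels": 6},
]
print(fd_holds(rooms, ("building",), ("levels",)))  # True
print(fd_holds(rooms, ("building",), ("seats",)))   # False
```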
Functional Dependencies
Is Redundant
What is it?
It is a method for increasing the quality of database design.
It also doubles up as a theoretical base for definition of the properties of tables.
It probably has its origins in the early days of moving from ‘files’ to database tables.
It supports and verifies the database model.
COT7180 9142717 C Wilson
Each row MUST CONTAIN the same number of columns
c3576 Doe,J Jones,R Smith,V Ng,K
k4567 Nguyen,P
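The two example rows carry a repeating group: a code followed by a variable number of names. A minimal sketch of bringing them into 1NF, assuming the first value is the key (the slide does not say what the codes or names stand for), emits one row per member of the group, so every row has the same number of columns. `to_1nf` is a hypothetical name.

```python
# Unnormalised rows: a key followed by a variable-length repeating group.
unnormalised = [
    ("c3576", ["Doe,J", "Jones,R", "Smith,V", "Ng,K"]),
    ("k4567", ["Nguyen,P"]),
]

def to_1nf(rows):
    """Flatten each repeating group into one (key, member) row per member."""
    return [(key, member) for key, group in rows for member in group]

for row in to_1nf(unnormalised):
    print(row)
```

The key of the 1NF table becomes the pair (code, name), since the code alone no longer identifies a row.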
Normalisation - 2
(1) The table must be in 1NF
(2) Each attribute which is NOT part of the key must be functionally dependent on the whole key
Building  Room  Seats  No. of Levels
A         214   85     4
A         242A  25     4
B         213   135    6
No. of Levels depends on BUILDING only, not on Building and Room
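Because No. of Levels depends on only part of the key (Building), the table violates 2NF. A sketch of the usual fix splits it into two tables; the names `Room` and `Building` and the function `decompose_2nf` are hypothetical. The final assertion checks that joining the parts on Building gives back the original rows.

```python
rooms = [
    ("A", "214", 85, 4),
    ("A", "242A", 25, 4),
    ("B", "213", 135, 6),
]

def decompose_2nf(rows):
    """Split (building, room, seats, levels), keyed on (building, room), into
    Room(building, room, seats) and Building(building, levels), because
    levels depends on building alone, i.e. on part of the key."""
    room_table = [(b, r, s) for b, r, s, _ in rows]
    building_table = sorted({(b, lv) for b, _, _, lv in rows})
    return room_table, building_table

room_table, building_table = decompose_2nf(rooms)
print(building_table)  # [('A', 4), ('B', 6)]

# Joining the two tables on building gives back the original rows (lossless).
levels = dict(building_table)
assert [(b, r, s, levels[b]) for b, r, s in room_table] == rooms
```

Each building's number of levels is now stored once, so it cannot become inconsistent between rooms.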
Normalisation 3
(1) Table must be in 2NF (and therefore 1NF)
(2) Every attribute which is NOT part of the primary key must be dependent ONLY on the P.K. (i.e. not dependent on any other NON-KEY attribute)
b7654 Fabbri,M 312 B Building
Normalisation 3a
c3576 Doe,J Doe,J 101 A Building
k4567 Nguyen,L Nguyen,L 209 F Building
b7645 Fabbri,M Fabbri,M 312 B Building
These tables are in 3NF.
Table1
the key
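The slide does not label the columns of its example rows. This sketch reads each row as (course, lecturer, room, building) with course as the primary key, which is an assumption. Under that reading, room and building describe the lecturer, a non-key attribute, so they depend on the key only transitively (course → lecturer → room, building). `decompose_3nf` is a hypothetical name.

```python
# Assumed reading of the example rows: (course, lecturer, room, building).
courses = [
    ("c3576", "Doe,J", "101", "A Building"),
    ("k4567", "Nguyen,L", "209", "F Building"),
    ("b7645", "Fabbri,M", "312", "B Building"),
]

def decompose_3nf(rows):
    """Remove the transitive dependency course -> lecturer -> (room, building)."""
    course_table = [(course, lect) for course, lect, _, _ in rows]
    lecturer_table = sorted({(lect, room, bldg) for _, lect, room, bldg in rows})
    return course_table, lecturer_table

course_table, lecturer_table = decompose_3nf(courses)
for row in lecturer_table:
    print(row)
```

If a lecturer moves office, only one row of the lecturer table changes, however many courses that lecturer teaches.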
Review of Kent's Process
- no repeating groups
Relationship between key and non-key fields: 1:1 or 1:N
2NF is violated when a non-key field is a fact about a subset of the key
Problems with relations in 2NF
- repeated information
- update anomalies
- potential inconsistency
- delete anomalies
3NF
- is violated when a non-key field is a fact about another non-key field
Problems with relations in 3NF
- when a non-key field is a fact about a key field
Modelling - some hints
1. Do not prematurely combine entities into tables
2. Concentrate on access mechanisms which can be shared among requests
3. Deviate from the data model in a responsible manner
4. Use table/view and column names which closely reflect data model names
5. Do not define multiple attributes as one (composite) column in a table (i.e. attribute = column)
6. Develop an ‘architecture’ to support business rules (rules, procedures, integrity)
7. Position your organisation for future technology (windows, CASE, optimisers)
We are going to have a quick look at some aspects of accessing data from the database.
[Figure: levels labelled Cobol, C, Assembler (External); DBMS, Conceptual Model (Conceptual)]
No. of disk I/Os per second: 3
Assume 1 disk I/O per page
Assume HEAP storage structure
(nearly 3 hours!)
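The slide's worked figures are only partly preserved (3 disk I/Os per second, one I/O per page, a heap table, "nearly 3 hours"). The arithmetic behind such an estimate can be sketched as follows: a heap offers no access path on a key, so in the worst case every page is read. The 30,000-page table size is a hypothetical figure chosen for illustration, not one taken from the slide.

```python
def heap_scan_seconds(pages, ios_per_second, ios_per_page=1):
    """Time to read every page of a heap table (no key-based access path)."""
    return pages * ios_per_page / ios_per_second

pages = 30_000  # hypothetical table size, not from the slide
seconds = heap_scan_seconds(pages, ios_per_second=3)
print(f"{seconds:.0f} s = {seconds / 3600:.1f} hours")  # 10000 s = 2.8 hours
```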
provides access on a ‘key’
Modify existing structure to ? ? on what key (attribute or multiple attributes)
compression, leaffill, location
Performance depends on how quickly an application can access table data (remember caching in Session 2?)
In Oracle, these points should be considered in implementation design:
1. Applications can access table data with or without indexes
2. IF an index is present and IF the index will help performance, Oracle will use the index
3. Oracle will automatically update an index to keep it in synchronisation with its table
Indexes - Performance Improvers
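The points above (access with or without an index, and the index kept in step with its table) can be illustrated with a toy table that maintains one secondary index on every insert and delete. This is a hypothetical sketch (`IndexedTable` is an invented name) using a Python dict as the index; it is not how Oracle stores its indexes, which are B-trees.

```python
class IndexedTable:
    """Rows stored by row id, plus one secondary index on a chosen column.

    The index maps each column value to the set of row ids holding it and
    is updated on every insert and delete, so it always matches the table.
    """

    def __init__(self, column):
        self.column = column
        self.rows = {}    # row id -> row (dict)
        self.index = {}   # column value -> set of row ids
        self.next_id = 0

    def insert(self, row):
        rid = self.next_id
        self.next_id += 1
        self.rows[rid] = row
        self.index.setdefault(row[self.column], set()).add(rid)  # index upkeep
        return rid

    def delete(self, rid):
        row = self.rows.pop(rid)
        ids = self.index[row[self.column]]
        ids.discard(rid)                                          # index upkeep
        if not ids:
            del self.index[row[self.column]]

    def lookup_scan(self, value):
        """Access without an index: examine every row."""
        return [r for r in self.rows.values() if r[self.column] == value]

    def lookup_index(self, value):
        """Access with the index: go straight to the matching row ids."""
        return [self.rows[rid] for rid in sorted(self.index.get(value, ()))]

t = IndexedTable("building")
t.insert({"building": "A", "room": "214"})
rid = t.insert({"building": "B", "room": "213"})
t.insert({"building": "A", "room": "242A"})
print(t.lookup_index("A") == t.lookup_scan("A"))  # True
t.delete(rid)
print(t.lookup_index("B"))  # []
```

Every insert and delete does extra work on the index, which is the maintenance overhead in time and storage mentioned on the next slide.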
It is not a good tactic to index every attribute set.
Indexes should be restricted to key attributes
Index maintenance generates overheads - time and storage
Indexes - Performance Improvers
File Organisation
A file organisation is a technique for physically arranging the records of a file on a secondary storage device.
File organisations
Record Access Modes
Sequential Access
In sequential access, record storage starts at a designated point, usually the beginning, and proceeds in a linear sequence through the file. Each record can only be retrieved by accessing all the records that physically precede it.
Random Access
In random access, a given record is accessed directly, "out of the blue", without referencing other records in the file.
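The difference between the two access modes shows up in how many records must be read to reach one. A minimal sketch, with a Python list standing in for a file of 1,000 hypothetical records and invented function names:

```python
records = [(key, f"record {key}") for key in range(1, 1001)]

def sequential_read(records, target_key):
    """Read records from the start until target_key is found.
    Returns (record, number of records read)."""
    for count, record in enumerate(records, start=1):
        if record[0] == target_key:
            return record, count
    return None, len(records)

def random_read(records, position):
    """Go directly to the record at a known position; nothing else is read."""
    return records[position], 1

print(sequential_read(records, 750)[1])  # 750
print(random_read(records, 749))         # ((750, 'record 750'), 1)
```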
File Organisation and Access Mode
A file organisation is established when the file is created, and is rarely changed. However, the record access mode can change each time the file is used.
Indexed Sequential File Organisation
There are two basic implementations of the indexed sequential organisation:
- hardware-dependent: uses a block index on the key, a disk address to the prime area (which contains the data records) and the track index for the cylinder
- hardware-independent: uses a control interval, which may be considered a virtual track; free space for new records is provided by distributed free space
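The core idea of indexed sequential organisation is that records are kept in key order in blocks, and a small index holds the highest key in each block. A random lookup searches the index and then scans one block, while sequential processing reads the blocks in order. The sketch below is a simplification under stated assumptions: a block size of 4, no tracks, cylinders, control intervals or overflow handling, and invented function names.

```python
from bisect import bisect_left

BLOCK_SIZE = 4  # records per block (hypothetical)

def build(sorted_keys, block_size=BLOCK_SIZE):
    """Split key-ordered records into blocks and index each block's highest key."""
    blocks = [sorted_keys[i:i + block_size]
              for i in range(0, len(sorted_keys), block_size)]
    index = [block[-1] for block in blocks]
    return blocks, index

def find(key, blocks, index):
    """Random access: search the index, then scan one block.
    Returns the block number holding key, or None."""
    b = bisect_left(index, key)  # first block whose highest key >= key
    if b < len(index) and key in blocks[b]:
        return b
    return None

def scan(blocks):
    """Sequential access: read every block in key order."""
    for block in blocks:
        yield from block

keys = list(range(10, 200, 10))
blocks, index = build(keys)
print(index)                     # [40, 80, 120, 160, 190]
print(find(130, blocks, index))  # 3
```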
- Each record is identified by its relative record number.
- The relative record number is a number from 0 to n that gives the position of the record relative to the beginning of the file.
- This provides a method of direct file organisation.
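With fixed-length records, a relative record number converts directly into a byte offset (offset = relative record number × record length), so a record can be read with a single seek. A sketch assuming a hypothetical 16-byte layout (6-byte id, 10-byte name), with `io.BytesIO` standing in for a disk file:

```python
import io
import struct

# Hypothetical fixed-length layout: 6-byte id + 10-byte name, null-padded.
RECORD = struct.Struct("6s10s")

def write_record(f, rrn, rec_id, name):
    """Write a record at its relative record number (0-based)."""
    f.seek(rrn * RECORD.size)  # byte offset = rrn * record length
    f.write(RECORD.pack(rec_id.encode(), name.encode()))

def read_record(f, rrn):
    """Read the record at rrn directly, without touching earlier records."""
    f.seek(rrn * RECORD.size)
    rec_id, name = RECORD.unpack(f.read(RECORD.size))
    return rec_id.decode().rstrip("\x00"), name.decode().rstrip("\x00")

f = io.BytesIO()
write_record(f, 0, "c3576", "Doe,J")
write_record(f, 2, "b7645", "Fabbri,M")  # slot 1 is left empty (zero bytes)
print(read_record(f, 2))  # ('b7645', 'Fabbri,M')
print(read_record(f, 1))  # ('', '')
```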
Hashed Files
Hashed file organisation suits applications which do updates and retrievals in random mode and rarely need sequential access to the data records (e.g. reservation systems). It provides rapid access to individual records based on a key.
The major disadvantage of hash organisation is that sequential access is not convenient, because the records are not stored in primary key sequence. However, highly concurrent environments doing random access are well suited to hash organisation.
The basis of a hash file is an addressing algorithm which transforms the record identifier into a relative address.
[Figure: Identifier Transformation; Primary storage area; Overflow storage area; Bucket overflow technique]
Hashing Routines
Records are assigned to buckets by means of a hashing routine, or transformation, which is an algorithm that converts each primary key value into a relative disk address.
One routine that consistently performs best under most conditions is the division/remainder method:
1. Determine the number of buckets to be allocated to the file.
2. Select a prime number that is approximately equal to this number.
3. Divide each primary key value (usually the ASCII sum) by the prime number.
4. Use the remainder of the division as the relative bucket address.
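The division/remainder steps can be sketched as below. Choosing the largest prime not above the bucket count is one assumption about "approximately equal"; it keeps every remainder within the allocated buckets. The function names are hypothetical.

```python
def is_prime(n):
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def choose_prime(buckets):
    """Step 2: the largest prime not above the bucket count (an assumption)."""
    if buckets < 2:
        raise ValueError("need at least 2 buckets")
    p = buckets
    while not is_prime(p):
        p -= 1
    return p

def ascii_sum(key):
    return sum(ord(ch) for ch in key)

def bucket_address(key, prime):
    """Steps 3-4: divide the key's ASCII sum by the prime; keep the remainder."""
    return ascii_sum(key) % prime

p = choose_prime(100)               # 97
print(bucket_address("c3576", p))  # 21
print(bucket_address("k4567", p))  # 30
print(bucket_address("b7645", p))  # 21, a synonym of c3576
```

An ASCII sum ignores character order, so "c3576" and "b7645" both sum to 312 and hash to the same bucket. Such synonyms are why overflow has to be managed.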
Managing Overflow
Types of overflow chaining:
* separate chaining - overflow records are relocated to avoid merging of synonym chains
* separate overflow area - uses an independent overflow area, with a pointer in the home buckets to the overflow buckets
Load Factor: The load factor is the percentage of space allocated to the file that is taken up by the records in the file. A low load factor reduces the number of records that overflow their home addresses. It is common to use 50% to 80%, using a lower load factor for files that will grow.
Bucket Capacity: Increasing the bucket capacity will also reduce the number of overflows and hence the average search length.
[Figure: search length against load factor (%) for bucket capacities b = 1, 2, 3, 4]
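A minimal sketch of a hashed file with a separate overflow area: each home bucket holds up to `capacity` records, extra synonyms go to the overflow area, and the home bucket points to a chain of them. It also reports the search length (accesses per lookup) and the load factor. The class is hypothetical; it chains individual overflow records rather than overflow buckets, and it uses the ASCII-sum division/remainder addressing with a prime number of buckets.

```python
class HashFile:
    """Home buckets of fixed capacity plus a separate, chained overflow area."""

    def __init__(self, n_buckets, capacity):
        self.n_buckets = n_buckets                 # ideally a prime
        self.capacity = capacity
        self.home = [[] for _ in range(n_buckets)]
        self.overflow = []                         # entries: (key, next slot or None)
        self.chain_head = [None] * n_buckets       # home bucket -> first overflow slot

    def _address(self, key):
        return sum(ord(ch) for ch in key) % self.n_buckets

    def insert(self, key):
        b = self._address(key)
        if len(self.home[b]) < self.capacity:
            self.home[b].append(key)
        else:  # home bucket full: put the record in the overflow area, chain it in
            self.overflow.append((key, self.chain_head[b]))
            self.chain_head[b] = len(self.overflow) - 1

    def search(self, key):
        """Number of accesses needed to find key, or None if absent."""
        b = self._address(key)
        accesses = 1                               # read the home bucket
        if key in self.home[b]:
            return accesses
        slot = self.chain_head[b]
        while slot is not None:                    # follow the overflow chain
            accesses += 1
            k, slot = self.overflow[slot]
            if k == key:
                return accesses
        return None

    def load_factor(self):
        stored = sum(len(bucket) for bucket in self.home) + len(self.overflow)
        return stored / (self.n_buckets * self.capacity)

hf = HashFile(n_buckets=7, capacity=1)
keys = ["c3576", "k4567", "b7645"]
for k in keys:
    hf.insert(k)
print(f"{hf.load_factor():.2f}")                                # 0.43
print(f"{sum(hf.search(k) for k in keys) / len(keys):.2f}")     # 1.33
```

Here "b7645" is a synonym of "c3576", so it lands in the overflow area and needs two accesses. A larger `capacity` would have kept it in its home bucket.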
A tree is a non-linear data structure in which each element can have several "next" elements (branching).
A binary tree has a maximum of two branches per element, or node.
A node consists of some data and at most two pointers: a left pointer to the left branch and a right pointer to the right branch. If there is no left or right branch, a nil pointer is used.
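The node layout above maps directly onto a small record type, with `None` playing the role of the nil pointer. The sketch additionally assumes binary *search* tree ordering (smaller keys to the left, others to the right), which the description above does not require but which is what gives trees their reduced search times. The function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: str
    left: Optional["Node"] = None   # None plays the role of the nil pointer
    right: Optional["Node"] = None

def insert(root, key):
    """Insert key into a binary search tree; return the (possibly new) root."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    """Follow left/right pointers; return the number of nodes visited, or None."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return visited
        node = node.left if key < node.key else node.right
    return None

root = None
for k in ["M", "D", "T", "A", "F"]:
    root = insert(root, k)
print(search(root, "F"))  # 3 (M, D, F)
```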
[Figure: basic binary tree record layout]
2. Visit the root.
3. Visit the root.
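The surviving steps ("2. Visit the root", "3. Visit the root") appear to come from traversal orders; the full lists did not survive. For reference, the three standard depth-first traversals are sketched below: pre-order visits the root first, in-order second and post-order third. The example tree and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    key: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def preorder(node):   # 1. visit root, 2. traverse left, 3. traverse right
    if node is None:
        return []
    return [node.key] + preorder(node.left) + preorder(node.right)

def inorder(node):    # 1. traverse left, 2. visit root, 3. traverse right
    if node is None:
        return []
    return inorder(node.left) + [node.key] + inorder(node.right)

def postorder(node):  # 1. traverse left, 2. traverse right, 3. visit root
    if node is None:
        return []
    return postorder(node.left) + postorder(node.right) + [node.key]

tree = Node("M", Node("D", Node("A"), Node("F")), Node("T"))
print(inorder(tree))  # ['A', 'D', 'F', 'M', 'T']
```

On a binary search tree, the in-order traversal returns the keys in sorted order.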
B Trees
The problem with binary trees is balance: the tree can easily deteriorate into a linked list.
Consequently, the reduced search times are lost.
This problem is overcome in B-trees.
B for Balanced, where all the leaves are the same distance from the root.
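The deterioration is easy to reproduce. Inserting keys in sorted order into an unbalanced binary search tree sends every key down the right branch, so 1,023 keys produce 1,023 levels, whereas a perfectly balanced tree would need only 10. A shuffled insertion order does much better, but nothing guarantees it. This sketch uses iterative code to avoid Python's recursion limit on the degenerate tree; the names and the seed are arbitrary choices.

```python
import random

class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Iterative binary-search-tree insert; returns the root."""
    new = Node(key)
    if root is None:
        return new
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = new
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = new
                return root
            node = node.right

def height(root):
    """Number of levels, counted breadth-first (no recursion)."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return levels

def build(keys):
    root = None
    for k in keys:
        root = insert(root, k)
    return root

n = 1023
sorted_tree = build(range(n))
shuffled = list(range(n))
random.seed(1)
random.shuffle(shuffled)
random_tree = build(shuffled)
print(height(sorted_tree))  # 1023: effectively a linked list
print(height(random_tree))  # far fewer levels; exact value depends on the shuffle
```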
B+ Trees
There are several varieties of B-trees; most applications use the B+ tree.
The + indicates the presence of an index:
- A B-tree index is an ordered tree of index nodes.
- Each index node contains one or more index entries.
- Each index entry corresponds to a row in a table.
- The index entry contains:
the ROWID (physical disk location)
B-tree indexes are not suitable for all types of attributes.
They are suitable for processing where data is being inserted, updated and deleted at a high or constant rate.
Good candidates for B-tree indexes are primary keys, and so are foreign keys.
Oracle automatically creates B-tree indexes for all primary keys and UNIQUE integrity constraints of a table.
B+ Trees
A B+-tree of degree m has the following properties:
1. Every non-leaf node has between ⌈m/2⌉ and m children (where m is an integer > 3 and usually odd), except the root, which is not bound by a lower limit.
2. All leaves are at the same level, that is, the same depth from the root.
3. A non-leaf node that has n children contains n-1 keys.
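The three properties can be checked mechanically on a given tree. The checker below is a hypothetical sketch that tests only the structural properties as stated on the slide (child counts, keys per node, leaf depth); it does not check key ordering or the linking of leaves that B+ trees also use.

```python
from math import ceil

class BNode:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # empty list for a leaf

def check_bplus(root, m):
    """Return a list of violations of the three properties (empty if none)."""
    problems = []
    leaf_depths = set()

    def visit(node, depth, is_root):
        if not node.children:                          # leaf
            leaf_depths.add(depth)
            return
        n = len(node.children)
        # Property 1: between ceil(m/2) and m children; root has no lower limit.
        if n > m or (not is_root and n < ceil(m / 2)):
            problems.append(f"node {node.keys}: {n} children")
        # Property 3: n children need n-1 keys.
        if len(node.keys) != n - 1:
            problems.append(f"node {node.keys}: {len(node.keys)} keys for {n} children")
        for child in node.children:
            visit(child, depth + 1, False)

    visit(root, 0, True)
    # Property 2: all leaves at the same depth.
    if len(leaf_depths) > 1:
        problems.append(f"leaves at different depths: {sorted(leaf_depths)}")
    return problems

good = BNode([20, 40], [BNode([5, 10, 15]), BNode([20, 25, 30]), BNode([40, 45])])
bad = BNode([20], [BNode([5]), BNode([30], [BNode([25]), BNode([35])])])
print(check_bplus(good, 5))  # []
print(check_bplus(bad, 5))   # ['node [30]: 2 children', 'leaves at different depths: [1, 2]']
```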
A Review of Trees
Can permit rapid retrieval of data for both random and sequential processing.
Can be used on primary or secondary keys.
Trees are special cases of networks; in networks, records from different files are joined without a strict hierarchy being observed. This is addressed in the hierarchical and network model lectures.
Other Methods
Bit Mapping
- a bit to indicate the presence of some value
- a row i.d. to reference the row
[Figure: bitmap with one bit per row (Row 1, Row 2, Row 3)]
- useful in DSS/data warehouse applications
- not good for frequent update or insert applications
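A bitmap index keeps one bitmap per distinct column value, where bit i is set if row i holds that value, and the bit position gives the row id. Queries that combine conditions become cheap bitwise AND/OR operations, which suits DSS workloads. Every insert or update, however, has to touch the bitmaps. The sketch uses Python integers as bitmaps; the columns and values are hypothetical.

```python
def build_bitmaps(values):
    """One bitmap per distinct value; bit i is set if row i holds that value."""
    bitmaps = {}
    for row_id, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

def row_ids(bitmap):
    """Translate set bits back into row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical columns, one value per row (rows 0..5).
building = ["A", "B", "A", "C", "B", "A"]
size = ["small", "large", "large", "small", "small", "large"]

b = build_bitmaps(building)
s = build_bitmaps(size)
print(row_ids(b["A"] & s["large"]))  # [2, 5]: building = 'A' AND size = 'large'
print(row_ids(b["A"] | b["C"]))      # [0, 2, 3, 5]: building = 'A' OR 'C'
```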
Clustering - Oracle
Clustering is a technique which ‘clusters’, or groups together, related rows of one or more tables in the same data block.
The objective is to store together on disk the rows an application uses together (e.g. Orders and Items); this saves disk I/O in analysis applications.
A cluster key is necessary for each cluster.
Not very successful for high volume processing
Other Methods
Table size: small                                  2    1  1  3
Table size: medium (modify; disk space available)  4    1  1  1
Deletes frequent                                   4    1  1  3
Updates frequent                                   4    1  1  2
Secondary index structure                          N/A  1  1  1
Exact match key retrieval                          4    1  2  2
Sorted data                                        4    4  2  1
Concurrent updates                                 4    1  1  2
Add data - no modify                               2    3  3  1
Sequential addition of data                        1**  2  5  1
Initial bulk copy of data                          1    2  2  2
Table growth - nil/static                          N/A  1  1  2
Table growth - low (15%)                           N/A  1  1  2
Plan to modify periodically
Too fast to modify
** secondary indexes used with a heap structure