slide 8-1 copyright © 2009 pearson education, inc. 1 j. glenn brookshear c h a p t e r 3 chapter 9...

63
1 Slide 8-1 Copyright © 2009 Pearson Education, Inc. J. Glenn Brookshear C H A P T E R 3 Chapter 9 J. Glenn Brookshear 蔡 蔡 蔡 DataBase & DBMS

Upload: annis-ramsey

Post on 02-Jan-2016

230 views

Category:

Documents


2 download

TRANSCRIPT

1

Slide 8-1 Copyright © 2009 Pearson Education, Inc.

J. Glenn Brookshear

C H A P T E R 3

Chapter 9

J. Glenn Brookshear蔡 文 能

DataBase & DBMS

2

Slide 8-2 Copyright © 2009 Pearson Education, Inc.

Chapter 9: Database Systems

• 9.5 Traditional File Structures

• 9.1 Database Fundamentals

• 9.2 The Relational Model

• 9.3 Object-Oriented Databases

• 9.4 Maintaining Database Integrity

• 9.6 Data Mining

• 9.7 Social Impact of Database Technology

3

Slide 8-3 Copyright © 2009 Pearson Education, Inc.

Data Structure vs. File Structure

• Data Structure : How to arrange data in memory• File Structure : How to arrange data in Disk and/or

any other secondary storage• DataBase and DataBase Management System

– Users do NOT have to care about how to store data in a file. DBMS will handle the detail.

– Users can use SQL (Structured Query Language) to access the DataBase in an interactive command and/or through a program (embeded SQL)

RAM Disk

4

Slide 8-4 Copyright © 2009 Pearson Education, Inc.

File Types

• Sequential file: one accessed in a serial manner from beginning to end. E.g. audio, video, text, programs.

• Text file: sequential file in which each logical record is a single character.

ASCII: 1 byte/char

Unicode: 2 bytes/char

5

Slide 8-5 Copyright © 2009 Pearson Education, Inc.

Sequential Files

• Sequential file: A file whose contents can only be read in order– Reader must be able to detect end-of-file (EOF)– Data can be stored in logical records, sorted by a

key field• Greatly increases the speed of batch updates

6

Slide 8-6 Copyright © 2009 Pearson Education, Inc.

Text Files

• Simple file structure.• Extendable to more complex file structures

using markup languages (XHTML, HTML).• XHTML, HTML control the display of the file

on a monitor.• XML is a standard for markup languages.

7

Slide 8-7 Copyright © 2009 Pearson Education, Inc.

Converting data from two’s complement notation into ASCII for storage in a text file

8

Slide 8-8 Copyright © 2009 Pearson Education, Inc.

Figure 9.15 A procedure for merging two sequential files

9

Slide 8-9 Copyright © 2009 Pearson Education, Inc.

Figure 9.16 Applying the merge algorithm (Letters are used to represent entire records. The particular letter indicates the value of the record’s key field.)

10

Slide 8-10 Copyright © 2009 Pearson Education, Inc.

XML

• Example

<note><to>Tove</to><from>Jani</from><heading>Reminder</heading> <body>Don't forget me this weekend!

</body></note>

http://www.w3schools.com/xml/xml_whatis.asp

11

Slide 8-11 Copyright © 2009 Pearson Education, Inc.

Indexed File

Brown P1

Johnson P2

Jones P3

Smith P4

Watson P5

key pointer

Index fileData file

Data item for Brown

Index: A list of key values and the location of their associated records

12

Slide 8-12 Copyright © 2009 Pearson Education, Inc.

An inverted file

13

Slide 8-13 Copyright © 2009 Pearson Education, Inc.

A file with a partial index

14

Slide 8-14 Copyright © 2009 Pearson Education, Inc.

Figure 9.17 Opening an indexed file

15

Slide 8-15 Copyright © 2009 Pearson Education, Inc.

Hashing

• In hashing the index file is replaced by a hash function.– The storage space is divided into buckets.– Each record has a key field. Each record is stored in the

bucket corresponding to the hash of its key.– A hash function computes a bucket number for each key

value.

• Advantage: no index table needed.• Disadvantages:

i) hash function needs careful design;ii) unpredictable performance

16

Slide 8-16 Copyright © 2009 Pearson Education, Inc.

Terminology

• Bucket: section of the data storage area.• Key: identifier for a block of information.• Hash function: takes as input a key and outputs

a bucket number.• Collision: two keys yield the same bucket

number.

17

Slide 8-17 Copyright © 2009 Pearson Education, Inc.

Hash Functions

• Hash Function Requirements– Easily and quickly computed.– Values evenly spread over the bucket numbers.– What can go wrong: bucket number computed from 1st and 3rd

characters of a name:Brown, Brook, Broom, Broadhead, Biot, Bloom, …

Examples of Hash FunctionsMid square: compute (key x key) and

set bucket number = middle digits.Extraction: select digits from certain positions within the key.Divide key by number of buckets and use the remainder.

18

Slide 8-18 Copyright © 2009 Pearson Education, Inc.

Figure 9.18 Hashing the key field value 25X3Z to one of 41 buckets

19

Slide 8-19 Copyright © 2009 Pearson Education, Inc.

Figure 9.19 The rudiments of a hashing system, in which each bucket holds those records that hash to that bucket number

20

Slide 8-20 Copyright © 2009 Pearson Education, Inc.

Collisions in Hashing

• Collision: The case of two keys hashing to the same bucket– Clustering problem: Poorly designed hashing function can

have uneven distribution of keys into buckets

– Collision also becomes a problem when there aren’t enough buckets (probability greatly increases as load factor (% of buckets filled) approaches 75%)

– Solution: somewhere between 50% and 75% load factor, increase number of buckets and rehash all data

21

Slide 8-21 Copyright © 2009 Pearson Education, Inc.

Handling bucket overflow

22

Slide 8-22 Copyright © 2009 Pearson Education, Inc.

A large file partitioned into bucketsto be accessed by hashing

23

Slide 8-23 Copyright © 2009 Pearson Education, Inc.

The role of an operating system when accessing a file

System calls ?

24

Slide 8-24 Copyright © 2009 Pearson Education, Inc.

Maintaining a file’s order by means of a File Allocation Table (FAT)

25

Slide 8-25 Copyright © 2009 Pearson Education, Inc.

Information Required on a Hard Drive to Load an OS

• Startup BIOS (POST , Load MBR)• Master Boot Record (MBR)

– Master Boot Program– Partition Table (16 bytes * 4)

• OS Boot Record (Boot Sector)– Loads the first program file of the OS

• Boot Loader Program– Begins process of loading OS into memory

26

Slide 8-26 Copyright © 2009 Pearson Education, Inc.

How Data Is Logically Stored on a Floppy Disk

• All floppy and hard disk drives are divided into tracks and sectors

• Tracks are concentric circles on a disk

• Sector– Always 512 bytes

– Physical organization of a disk

– BIOS manages disk as sectors

• Cluster (file allocation unit)– Group of sectors

– Logical organization of a disk

– OS views disk as a list of clusters

27

Slide 8-27 Copyright © 2009 Pearson Education, Inc.

The Boot Record

• Track 0, sector 1 of a floppy disk

• Contains basic information about how the disk is organized

• Includes bootstrap program, which can be used to boot from the disk

• Uniform layout and content of boot record allows any version of DOS or Windows to read any DOS or Windows disk

28

Slide 8-28 Copyright © 2009 Pearson Education, Inc.

The File Allocation Table (FAT)

• Lists the location of files on disk in a one-column table

• Floppy disk FAT is 12 bits wide, called FAT12

• Each entry describes how a cluster on the disk is used

• A bad cluster on the disk will be marked in the FAT

29

Slide 8-29 Copyright © 2009 Pearson Education, Inc.

The Root Directory

• Lists all the files assigned to this table

• Contains a fixed number of entries

• Some items included are:– Filename and extension

– Time and date of creation or last update

– File attributes

– First cluster number

30

Slide 8-30 Copyright © 2009 Pearson Education, Inc.

How a Hard Drive is Logically Organized to Hold Data

• Low-level format – Creates tracks and sectors, done at factory

• Partition the hard drive (FDISK.EXE)– Creates partition table at the beginning of drive

• High-level format – Done by OS for each logical drive

• Master Boot Record (MBR) is the first 512 bytes of a hard drive– Master boot program (446 bytes) calls boot program to load

OS– Partition table

• Description, Location, Size

31

Slide 8-31 Copyright © 2009 Pearson Education, Inc.

Partitions and Logical Drives

32

Slide 8-32 Copyright © 2009 Pearson Education, Inc.

FAT16

• Supported by DOS and all versions of Windows

• Uses 16 bits for each cluster entry

• As the size of the logical drive increases, FAT16 cluster size increases dramatically

33

Slide 8-33 Copyright © 2009 Pearson Education, Inc.

FAT32

• Became available with Windows 95 OSR2

• Used 32 bits per FAT entry, although only 28 bits were used to hold cluster numbers

• More efficient than FAT16 in terms of cluster size

34

Slide 8-34 Copyright © 2009 Pearson Education, Inc.

NTFS

• Supported by Windows NT/2000/XP

• Provides greater security

• Used a database called the master file table (MFT) to locate files and directories

• Supports large hard drives

35

Slide 8-35 Copyright © 2009 Pearson Education, Inc.

Comparing FAT16, FAT32, NTFS

36

Slide 8-36 Copyright © 2009 Pearson Education, Inc.

Figure 9.1: A file versus a database organization (1/2)

37

Slide 8-37 Copyright © 2009 Pearson Education, Inc.

Figure 9.1: A file versus a database organization(2/2)

38

Slide 8-38 Copyright © 2009 Pearson Education, Inc.

Figure 9.2: The conceptual layers of a database

39

Slide 8-39 Copyright © 2009 Pearson Education, Inc.

Schemas

• Schema: A description of the structure of an entire database, used by database software to maintain the database

• Subschema: A description of only that portion of the database pertinent to a particular user’s needs, used to prevent sensitive data from being accessed by unauthorized personnel

40

Slide 8-40 Copyright © 2009 Pearson Education, Inc.

Database Management Systems

• Database Management System (DBMS): A software layer that manipulates a database in response to requests from applications

• Distributed Database: A database stored on multiple machines– DBMS will mask this organizational detail from its users

• Data independence: The ability to change the organization of a database without changing the application software that uses it

41

Slide 8-41 Copyright © 2009 Pearson Education, Inc.

Database Models

• Database model: A conceptual view of a database– Relational database model– Object-oriented database model

42

Slide 8-42 Copyright © 2009 Pearson Education, Inc.

Relational Database Model

• Relation: A rectangular table– Attribute: A column in the table– Tuple: A row in the table

• Relational Design– Avoid multiple concepts within one relation

• Can lead to redundant data

• Deleting a tuple could also delete necessary but unrelated information

43

Slide 8-43 Copyright © 2009 Pearson Education, Inc.

Improving a Relational Design

• Decomposition: Dividing the columns of a relation into two or more relations, duplicating those columns necessary to maintain relationships– Lossless or nonloss decomposition: A “correct”

decomposition that does not lose any information

44

Slide 8-44 Copyright © 2009 Pearson Education, Inc.

Figure 9.3: A relation containing employee information

45

Slide 8-45 Copyright © 2009 Pearson Education, Inc.

Figure 9.4: A relation containing redundancy

46

Slide 8-46 Copyright © 2009 Pearson Education, Inc.

Figure 9.5 An employee database consisting of three relations

47

Slide 8-47 Copyright © 2009 Pearson Education, Inc.

Figure 9.6 Finding the departments in which employee 23Y34 has worked

48

Slide 8-48 Copyright © 2009 Pearson Education, Inc.

Figure 9.7: A relation and a proposed decomposition

49

Slide 8-49 Copyright © 2009 Pearson Education, Inc.

Relational Operations

• Select: Choose rows

• Project: Choose columns

• Join: Assemble information from two or more relations

50

Slide 8-50 Copyright © 2009 Pearson Education, Inc.

Figure 9.8: The SELECT operation

51

Slide 8-51 Copyright © 2009 Pearson Education, Inc.

Figure 9.9: The PROJECT operation

52

Slide 8-52 Copyright © 2009 Pearson Education, Inc.

Figure 9.10: The JOIN operation

53

Slide 8-53 Copyright © 2009 Pearson Education, Inc.

Figure 9.11: Another example of the JOIN operation

54

Slide 8-54 Copyright © 2009 Pearson Education, Inc.

Figure 9.12 An application of the JOIN operation

60

Slide 8-60 Copyright © 2009 Pearson Education, Inc.

Figure 9.13: The associations between objects in an object-oriented database

61

Slide 8-61 Copyright © 2009 Pearson Education, Inc.

Maintaining Database Integrity (1/2)

• Transaction: A sequence of operations that must all happen together– Example: transferring money between bank accounts

• Transaction log: A non-volatile record of each transaction’s activities, built before the transaction is allowed to execute– Commit point: The point at which a transaction has been

recorded in the log

– Roll-back: The process of undoing a transaction

62

Slide 8-62 Copyright © 2009 Pearson Education, Inc.

Maintaining database integrity (2/2)

• Simultaneous access problems– Incorrect summary problem– Lost update problem

• Locking = preventing others from accessing data being used by a transaction– Shared lock: used when reading data– Exclusive lock: used when altering data

63

Slide 8-63 Copyright © 2009 Pearson Education, Inc.

表格正規化簡介 表格正規化簡介 (1/2)(1/2)• 第一階正規化 第一階正規化 (1NF):(1NF):

– 關聯式資料庫 ((relational databaserelational database) 要求各個資料表皆須符合第一正規化(first normal form, 簡稱 1NF) ,如下:

– 第一正規化:每一個欄位只准有一個值。第一正規化:每一個欄位只准有一個值。– 關聯式資料庫要求每個資料表都要有一個主鍵 ((primary key)primary key) ,用來識別

每一個 tuple 。– 主鍵可以是資料表中的某一個欄位,也可以由幾個欄位組成。– 關聯式資料庫對主鍵欄位另外有個要求,即實體完整性 ((entity entity

integrityintegrity) :組成主鍵的任何欄位值,都不可以是 Null 。• 第二階正規化 (2NF) :

– 資料表要滿足第一正規化,而且所有欄位都資料表要滿足第一正規化,而且所有欄位都完全功能相依完全功能相依於主鍵。於主鍵。• 功能相依功能相依 (functionally dependence) :

– 由 attribute X 的值可以決定一個唯一的 attribute Y 的值,簡寫成 X→Y 。• 完全功能相依完全功能相依 (full functional dependence) :

– 如果 attribute Y 功能相依於 attribute X ,但是並不功能相依於 attribute X的任何子集,則稱 attribute Y 完全功能相依於 attribute X 。

64

Slide 8-64 Copyright © 2009 Pearson Education, Inc.

X→YX→Y ,, Y→ZY→Z 的關係稱做遞移功能相依遞移功能相依 (transitive functional dependence) ,即 Z 遞移功能相依於 X 。

• 第三階正規化第三階正規化 ((third normal formthird normal form ,,簡稱簡稱 33NF)NF)– 資料表要滿足第二正規化,而且所有欄位都不可遞移功能相依於主鍵。資料表要滿足第二正規化,而且所有欄位都不可遞移功能相依於主鍵。

• 第三階正規化解決的問題第三階正規化解決的問題– 解決了新增資料的問題– 解決了刪除資料的問題– 解決了修改資料的問題

表格正規化簡介 表格正規化簡介 (2/2)(2/2)

65

Slide 8-65 Copyright © 2009 Pearson Education, Inc.

Data Mining ( 資料探勘 )

• Data Mining: The area of computer science that deals with discovering patterns in collections of data

• Data warehouse: A static data collection to be mined– Data cube: Data presented from many perspectives to

enable mining

• Data Mining Strategies– Class description– Class discrimination– Cluster analysis– Association analysis– Outlier analysis– Sequential pattern analysis

66

Slide 8-66 Copyright © 2009 Pearson Education, Inc.

Data mining( 資料探勘 ) 主要功能與技術

功能 技術 適用領域

關聯性 (Association) 案例庫推理 / 集合理論 / 統計 菜籃分析

時間序列 (Sequence) 類神經網路 / 統計 利率預測

分類 (Classification) 基因演算 / 類神經網路 / 統計 / 客戶評鑑分類 模糊邏輯案例推理 / 決策樹

公式 (Modeling) 基因規劃 / 基因演算 / 迴歸 銷售預測

群組 (Clustering) 類神經網路 / 模糊邏輯 / 市場區隔 基因演算 / 統計

67

Slide 8-67 Copyright © 2009 Pearson Education, Inc.

Social Impact of Database Technology

• Problems– Massive amounts of personal data are being

collected• Often without knowledge or meaningful consent of

affected people– Data merging produces new, more invasive

information– Errors are widely disseminated and hard to correct

• Remedies– Existing legal remedies often difficult to apply– Negative publicity may be more effective

68

Slide 8-68 Copyright © 2009 Pearson Education, Inc.

DataBaseManagement System

謝謝捧場[email protected]

蔡文能

Thank You!Thank You!