cas cs 460 introduction to database systems thanks to prof. george kollios, boston university and...
Post on 19-Dec-2015
220 views
TRANSCRIPT
CAS CS 460CAS CS 460Introduction to Database SystemsIntroduction to Database Systems
Thanks to Prof. George Kollios, Boston University and Prof. Mitch Cherniack Brandeis University for lecture materials
1.2
About the course – AdministriviaAbout the course – Administrivia
Instructor: Ravi Kothuri, [email protected]
Office, Hours: MCS 147, Mon/Wed 5-6PM and 7:30-8PM
Teaching Fellow: Panagiotis Papapetrou, [email protected]
MCS 147, Tue/Thu 11 - 12:30 AM
Home Page: http://www.cs.bu.edu/rkothuri
Check frequently! Syllabus, schedule, assignments, announcements…
1.4
My BackgroundMy Background
Oracle Corporation PhD from University of California, Santa Barbara Research:
Multi-dimensional indexing Mobile Databases Spatial, GIS systems and CAD/CAM databases
Google Maps type of technologies for Enterprise Geometric algorithms for terrain management, city modeling,…
Data Mining (spatial, financial, …) RFID technologies Semantic –web (RDF) technologies
Book: “Pro Oracle Spatial”, Nov 2004 Teaching on invitation from Prof. George Kollios
1.5
Who uses Databases?Who uses Databases?
Universities (records for students, faculty, courses,…
Airlines (passengers, flights, luggage, …)
Banking (customers, loans, …)
Utilities (customers, usage history, bills); e.g. telecom, electric,..
Any Company: human resources Employees, depts, facilities,…
“Data is the primary and integral part of information industry. Proper
management of the data using database technology is essential
for any large-scale company, organization.’’
1.6
What is a Database What is a Database SystemSystem??
Database:
A very large collection of related data
Models a real world enterprise: Entities (e.g., teams, games / students, courses)
Relationships (e.g., The Patriots are playing in the Superbowl)
Even active components (e.g. “business logic”)
DBMS: A software package/system that can be used to store, manage and retrieve data form databases
Database System: DBMS+data (+ applications)
1.7
Why Study Databases??Why Study Databases??
Shift from computation to information Always true for corporate computing
More and more true in the scientific world
and of course, Web
DBMS encompasses much of CS in a practical discipline OS, languages, theory, AI, logic
1.8
Managing data: A naïve approachManaging data: A naïve approach
Why not store everything on flat files: use the file system of the OS, cheap/simple…
Name, Course, Grade
John Smith, CS112, B
Mike Stonebraker, CS234, A
Jim Gray, CS560, A
John Smith, CS560, B+
…………………
Yes, but not scalable… Filesize limitations, access/update performance is slow,..
1.9
Problem 1Problem 1
Data redundancy and inconsistency Multiple file formats, duplication of information in different files
(say, in different departments)
John Smith, [email protected], CS112, B
John Smith, Arts560, [email protected], B+
Smith J, [email protected], Math212, A
Why is this a problem?
Wasted space
Potential inconsistencies (multiple formats, John Smith vs Smith J.)
1.10
Problem 2Problem 2
Data retrieval: Find the students who took CS560
Find the students with GPA > 3.5
For every query we need to write a program!
Need a Query/Retrieval engine that can support different ways to access data Easy to write
Execute efficiently
1.11
Problem 3Problem 3
Data Integrity
No support for sharing:
Prevent simultaneous modifications
No coping mechanisms for system crashes
No means of Preventing Data Entry Errors (checks must be hard-coded in the programs)
Security problems
1.12
Database SystemsDatabase Systems
Database systems offer solutions to all the mentioned problems
Database systems: Support Modeling of the data
Provide Levels of Abstraction of the data
Provide programs to allow you to Retrieve/modify the data
SQL
• For easy, standard specification of queries
Query Optimizer
• To process your queries efficiently
Ensure Integrity Maintenance
Transaction Manager/Recovery Manager
• to ensure atomicity/integrity in concurrent transactions
• to ensure integrity after system crashes)
…………….
1.13
Database SystemsDatabase Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.14
Data ModelData Model
A framework for describing data data relationships data semantics data constraints
Entity-Relationship model (Ch. 6) A set of entities to model real-world objects
Relationships among entities
Relational model Data as a set (or sets) of “records” or “tuples”
Each tuple in the set has the same set of attributes
Other models: object-oriented model: inheritance, abstraction,… semi-structured data models, XML: tuples in a set can have different attributes
1.15
Entity-Relationship ModelEntity-Relationship Model
Example of schema in the entity-relationship model
1.16
Entity Relationship Model (Cont.)Entity Relationship Model (Cont.)
E-R model of real world Entities (objects)
E.g. customers, accounts, bank branch
Relationships between entities
E.g. Account A-101 is held by customer Johnson
Relationship set depositor associates customers with accounts
Widely used for database design Database design in E-R model usually converted to design in the
relational model (coming up next) which is used for storage and processing
1.17
Relational ModelRelational Model
Example of tabular data in the relational model
customer-name
Customer-idcustomer-street
customer-city
account-number
Johnson
Smith
Johnson
Jones
Smith
192-83-7465
019-28-3746
192-83-7465
321-12-3123
019-28-3746
Alma
North
Alma
Main
North
Palo Alto
Rye
Palo Alto
Harrison
Rye
A-101
A-215
A-201
A-217
A-201
Attributes
1.18
Database SystemsDatabase Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.19
Levels of AbstractionLevels of Abstraction
Data storage Involves Complex data structures Hide complexity from users
Abstract views of the data (e.g., for storing a customer record) Physical level: how a customer record is stored as
bytes/words on disk
• Mostly hidden from database users/programmers Logical level: describes “types” inside the database
type customer = recordname : string;street : string;city : integer;
ssn; integer;
end;
View level: application programs hide details of data types. Views can also hide information (e.g., ssn) for security purposes.
1.21
Physical Level: Data OrganizationPhysical Level: Data Organization
Data Storage (Ch 11)
Where can data be stored? Main memory
Secondary memory (hard disks)
Optical store
Tertiary store (tapes)
Move data? Determined by buffer manager
Mapping data to files? Determined by file manager
1.22
Database ArchitectureDatabase Architecture(physical level data organization)(physical level data organization)
DBA
DDL Interpreter
Buffer ManagerFile Manager
Data
Schema
DDL Commands
Metadata
Storage Manager
Secondary Storage
1.23
Database SystemsDatabase Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.24
Data retrievalData retrieval
Queries (Ch 3, 4)
Query = Declarative data retrieval
describes what data, not how to retrieve it
Ex. Give me the students with GPA > 3.5 vs
Scan the student file and retrieve the records with gpa>3.5
Why?
1. Easier to write
2. Efficient to execute
1.25
Data retrievalData retrieval
Query Optimizer“compiler” for queries (aka “DML Compiler”)
Plan ~ Assembly Language Program
Optimizer Does Better With Declarative Queries:
1. Algorithmic Query (e.g., in C) 1 Plan to choose from2. Declarative Query (e.g., in SQL) n Plans to choose from
Query Optimizer Query Evaluator
Query
Plan
Data
Query Processor
1.26
Specifying the Query using SQLSpecifying the Query using SQL
SQL: widely used (declarative) non-procedural language E.g. find the name of the customer with customer-id 192-83-7465
select customer.customer-namefrom customerwhere customer.customer-id = ‘192-83-7465’
E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465
select account.balancefrom depositor, accountwhere depositor.customer-id = ‘192-83-7465’ and depositor.account-number = account.account-
number
Procedural languages: C++, Java, relational algebra
1.27
Data retrieval: Data retrieval: Indexing (Ch 12)Indexing (Ch 12)
How to answer fast the query: “Find the student with SID = 101”?
One approach is to scan the student table, check every student, retrurn the one with id=101… very slow for large databases
Any better idea?
1st keep student record over the SID. Do a binary search…. Updates…2nd Use a dynamic search tree!! Allow insertions, deletions, updates and at the same time keep the records sorted! In databases we use the B+-tree (multiway search tree) 3rd Use a hash table. Much faster for exact match queries… but cannot support Range queries. (Also, special hashing schemes are needed for dynamic data)
1.28
Root
B+Tree Example B=4
120
150
180
30
100
3 5 11
30
35
100
101
110
120
130
150
156
179
180
200
1.29
Database ArchitectureDatabase Architecture(data retrieval)(data retrieval)
DBA
Query OptimizerDDL Interpreter
Query Evaluator
Buffer Manager
File Manager
Data
Schema
DDL Commands
User
Query
DB Programmer
DML Precompiler
Code w/ embedded queries
Statistics
Indices Metadata
Query Processor
Storage Manager
Secondary Storage
1.30
Database SystemsDatabase Systems
Data Modeling
Levels of Abstraction
Data Retrieval
Data Modification/Integrity Maintenance
1.31
Data IntegrityData Integrity Transaction processing (Ch 15, 16)Transaction processing (Ch 15, 16)
Why Concurrent Access to Data must be Managed?
John and Jane withdraw $50 and $100 from a common account…
Initial balance $300. Final balance=?
It depends…
John: 1. get balance 2. if balance > $50 3. balance = balance - $50 4. update balance
Jane: 1. get balance 2. if balance > $100 3. balance = balance - $100 4. update balance
1.32
Data IntegrityData IntegrityRecovery (Ch 17)Recovery (Ch 17)
Transfer $50 from account A ($100) to account B ($200)
1. get balance for A
2. If balanceA > $50
3. balanceA = balanceA – 50
4.Update balanceA in database
5. Get balance for B
6. balanceB = balanceB + 50
7. Update balanceB in database
System crashes….
Recovery management
1.33
Database ArchitectureDatabase Architecture
DB Programmer
User DBA
DML Precompiler Query OptimizerDDL Interpreter
Query Evaluator
Buffer Manager
File Manager
Data
Statistics
Indices
Schema
DDL CommandsQueryCode w/ embedded queries
Transaction ManagerRecovery Manager
Metadata
Integrity Constraints
Secondary Storage
Storage Manager
Query Processor