Post on 17-Jan-2018
Fall 2002 1
CSE330/CIS550: Introduction to Database
Management Systems
Prof. Susan Davidson
Office: 278 Moore
Office hours: TTh 10-11

Administrative Stuff
• What you should know to take this class.
• Handouts: Syllabus and Homework 1.
• Resources: Text, TAs, Web site, bulletin board and office hours.
• Coursework: homeworks, exams, project.
• Computer accounts.

What the subject is about
• Modeling and organization of data
• Efficient (expressive?) retrieval of data
• Reliable and consistent storage of data
• Not surprisingly, all these topics are interrelated.

What is a DBMS?
• A database (DB) is a large, integrated collection of data.
• A DB models a real-world enterprise.
• A database management system (DBMS) is a software package designed to store and manage databases.

Why study databases?
• Everybody needs them, i.e. $$$.
• There are lots of interesting problems, both in database research and in implementation.
• Good design is always a challenge.

Connection to other areas of CS…
• Programming languages and software engineering (obviously)
• Algorithms (obviously)
• Logic, discrete math, and theory of computation
• “Systems” issues: concurrency, operating systems, file organization and networks.

But 80% of the world’s data is not in a DB!
Examples:
- scientific data (large images, complex programs that analyze the data)
- personal data
- WWW

Why don't we “program up” databases when we need them?
• For simple and small databases this is often the best solution. Flat files and grep get us a long way.
• We run into problems when
  – The structure is complicated (more than a simple table)
  – The database gets large
  – Many people want to use it simultaneously

Example: Personal Calendar
• We might start by building a file with the following structure:

What    Day    When  Who          Where
Lunch   10/24  1pm   Rick         Joe’s Diner
CS123   10/25  9am   Dr. Egghead  Morris234
Biking  10/26  9am   Jane         Jane’s house
Dinner  10/26  6PM   Jane         Café Le Boeuf

• This text file is easy to deal with. So there's no need for a DBMS!
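A minimal sketch of the flat-file approach: the calendar is kept as delimited text and searched with a linear scan, much as grep would do. The tab delimiter and the helper names `load_calendar` and `find_entries` are assumptions made for illustration, not part of the slides.

```python
import csv
import io

# Hypothetical tab-separated version of the calendar file from the slide.
CALENDAR_TEXT = (
    "What\tDay\tWhen\tWho\tWhere\n"
    "Lunch\t10/24\t1pm\tRick\tJoe's Diner\n"
    "CS123\t10/25\t9am\tDr. Egghead\tMorris234\n"
    "Biking\t10/26\t9am\tJane\tJane's house\n"
    "Dinner\t10/26\t6PM\tJane\tCafé Le Boeuf\n"
)

def load_calendar(text):
    """Parse the flat file into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))

def find_entries(entries, field, value):
    """Linear scan over all entries, comparable to grep on one column."""
    return [e for e in entries if e[field] == value]

entries = load_calendar(CALENDAR_TEXT)
for e in find_entries(entries, "Who", "Jane"):
    print(e["What"], e["Day"], e["When"], e["Where"])
```

For a file this small the scan is perfectly adequate, which is the point of the slide.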

Problem 1: Data Organization
• Consider the all-important “who” field. Do we also want to keep e-mail addresses, telephone numbers, etc.?
• Expand our file to look like:

What  When  Who-name  Who-email  Who-tel  …  Where  …

• Now we are keeping our address book in our calendar and doing so redundantly.

“Link” Calendar with Address Book?
• Two conceptual “entities” -- contact information and calendar -- with a relationship between them, linking people in the calendar to their contact information.
• This link could be based on something as simple as the person's name.
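One way to picture the two entities and the name-based link is a pair of tables joined on the person's name. This is a sketch using Python's built-in `sqlite3` module; the table names `Contact` and `Calendar`, the column names, and the e-mail addresses and phone numbers are made-up placeholders, not data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Contact information is stored once per person.
    CREATE TABLE Contact (
        Name  TEXT PRIMARY KEY,
        Email TEXT,
        Tel   TEXT
    );
    -- Calendar.Who holds a name that matches Contact.Name.
    -- "When" and "Where" are quoted because both are SQL keywords.
    CREATE TABLE Calendar (
        What    TEXT,
        Day     TEXT,
        "When"  TEXT,
        Who     TEXT,
        "Where" TEXT
    );
""")
conn.executemany("INSERT INTO Contact VALUES (?, ?, ?)", [
    ("Rick", "rick@example.org", "555-0101"),   # placeholder contact details
    ("Jane", "jane@example.org", "555-0102"),
])
conn.executemany("INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)", [
    ("Lunch", "10/24", "1pm", "Rick", "Joe's Diner"),
    ("Biking", "10/26", "9am", "Jane", "Jane's house"),
    ("Dinner", "10/26", "6PM", "Jane", "Café Le Boeuf"),
])

# Follow the link: look up each appointment's contact e-mail by name.
rows = conn.execute("""
    SELECT c.What, c.Day, p.Email
    FROM Calendar AS c
    JOIN Contact AS p ON p.Name = c.Who
    ORDER BY c.Day, c.What
""").fetchall()
for row in rows:
    print(row)
```

Jane's e-mail address is stored once, even though she appears in two calendar entries, which removes the redundancy described in Problem 1.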

Problem 2: Efficiency
• Size of personal address book is probably less than one hundred entries, but there are things we'd like to do quickly and efficiently.
  – “Give me all appointments on 10/28”
  – “When am I next meeting Jim?”
• “Program” these as quickly as possible.
• Have these programs executed efficiently.
• What would happen if you were using a corporate calendar with hundreds of thousands of entries?
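The two example requests can be sketched in plain Python to show where the cost comes from. The entries, the assumed year 2002, and the function names are hypothetical. The linear scan touches every entry, while the dictionary built by `build_day_index` plays the role of a simple index on the day.

```python
from collections import defaultdict
from datetime import date

# Hypothetical entries: (what, day, who). The year 2002 is an assumption.
entries = [
    ("Lunch", date(2002, 10, 24), "Rick"),
    ("Biking", date(2002, 10, 26), "Jane"),
    ("Review", date(2002, 10, 28), "Jim"),
    ("Seminar", date(2002, 10, 28), "Dr. Egghead"),
    ("Coffee", date(2002, 11, 4), "Jim"),
]

def appointments_on_scan(entries, day):
    """Linear scan: examines every entry on every request."""
    return [e for e in entries if e[1] == day]

def build_day_index(entries):
    """One pass to build a day -> entries map; later lookups skip the scan."""
    index = defaultdict(list)
    for e in entries:
        index[e[1]].append(e)
    return index

def next_meeting(entries, who, today):
    """Earliest entry with `who` on or after `today`, or None if there is none."""
    upcoming = [e for e in entries if e[2] == who and e[1] >= today]
    return min(upcoming, key=lambda e: e[1]) if upcoming else None

by_day = build_day_index(entries)
print(by_day[date(2002, 10, 28)])                         # "all appointments on 10/28"
print(next_meeting(entries, "Jim", date(2002, 10, 29)))   # "when am I next meeting Jim?"
```

With a hundred entries the scan is fine. With hundreds of thousands, the index has to be built and kept up to date on every change, which is work a DBMS takes over.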

Problem 3: Concurrency and Reliability
• Suppose other people are allowed to access your calendar and modify it. How do we stop two people from changing the file at the same time and leaving it in a physical (or logical) mess?
• Suppose the system crashes while we are changing the calendar. How do we recover our work?

Transactions
• Key concept for concurrency is that of a transaction: an atomic sequence of database actions (read/write) on data items (e.g. calendar entry).
• Key concept for recoverability is that of a log: keeping track of all actions carried out by the db.
• Sounds like operating systems all over again!
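Atomicity can be illustrated with a small sketch using Python's built-in `sqlite3` module. The function `reschedule` and its `fail` flag are hypothetical; the flag simulates a crash between two writes. The sketch shows only the all-or-nothing behavior of one transaction, relying on SQLite's own rollback mechanism, and does not model concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Calendar (What TEXT, Day TEXT, Who TEXT)")
conn.execute("INSERT INTO Calendar VALUES ('Dinner', '10/26', 'Jane')")
conn.commit()

def reschedule(conn, what, new_day, fail=False):
    """Move an entry as one transaction: delete the old row, insert the new one."""
    with conn:  # commits on success, rolls back if an exception escapes
        who = conn.execute(
            "SELECT Who FROM Calendar WHERE What = ?", (what,)
        ).fetchone()[0]
        conn.execute("DELETE FROM Calendar WHERE What = ?", (what,))
        if fail:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute(
            "INSERT INTO Calendar VALUES (?, ?, ?)", (what, new_day, who)
        )

try:
    reschedule(conn, "Dinner", "10/27", fail=True)
except RuntimeError:
    pass
# The delete was rolled back, so the original entry is still there.
print(conn.execute("SELECT * FROM Calendar").fetchall())

reschedule(conn, "Dinner", "10/27")
print(conn.execute("SELECT * FROM Calendar").fetchall())
```

Without the transaction, the simulated crash would leave the calendar with the dinner deleted and never re-entered.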

Database architecture -- the traditional view
It is common to describe databases in two ways:
  – The logical structure. What users see. The program or query language interface.
  – The physical structure. How files are organized. What indexing mechanisms are used.
Further, it is traditional to split the logical level into two components: overall database design (conceptual) and the views that various users get to see.

Three-level architecture
[Figure: View 1, View 2, …, View N at the top; the Conceptual Level (schema) in the middle; the Physical Level (file organization, indexing) at the bottom.]
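The top two levels can be sketched with an SQL view, here run through Python's built-in `sqlite3` module. The table `Calendar`, the column `Place` (used in place of Where to avoid a keyword), and the view name `BusyDays` are assumptions for illustration. The physical level is left entirely to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full calendar schema.
    CREATE TABLE Calendar (What TEXT, Day TEXT, Who TEXT, Place TEXT);
    INSERT INTO Calendar VALUES
        ('Lunch',  '10/24', 'Rick', 'Joe''s Diner'),
        ('Biking', '10/26', 'Jane', 'Jane''s house');

    -- View level: a colleague sees only which days are busy,
    -- not with whom or where.
    CREATE VIEW BusyDays AS SELECT DISTINCT Day FROM Calendar;
""")

busy = [row[0] for row in conn.execute("SELECT Day FROM BusyDays ORDER BY Day")]
print(busy)
```

Different users can be given different views over the same conceptual schema, and none of them needs to know how the rows are laid out on disk.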

Data independence
• A user of a relational database system should be able to use SQL to query the database without knowing precisely how the data is stored, e.g.

    -- "When" and "Where" are quoted because both are SQL keywords
    SELECT "When", "Where"
    FROM Calendar
    WHERE Who = 'Bill'

• After all, you don't worry much about how numbers are stored when you program some arithmetic or use a computer-based calculator.

More on data independence
• Logical data independence protects the user from changes in the logical structure of the data -- could completely reorganize the calendar “schema” without changing how I query it.
• Physical data independence protects the user from changes in the physical structure of data: could add an index on Who without changing how the user would write the query, but the query would execute faster (query optimization).
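Physical data independence can be sketched with Python's built-in `sqlite3` module: the same query text is run before and after adding an index on Who, and only the query plan changes. The generated rows, the index name `idx_calendar_who`, and the helper `plan` are assumptions for illustration. The exact wording of the plan varies between SQLite versions, so it is printed rather than assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Calendar (What TEXT, Day TEXT, "When" TEXT, Who TEXT, "Where" TEXT)'
)
# Hypothetical filler rows plus one appointment with Bill.
filler = [
    ("Meeting %d" % i, "10/26", "9am", "Person %d" % (i % 500), "Room %d" % i)
    for i in range(5000)
]
conn.executemany(
    "INSERT INTO Calendar VALUES (?, ?, ?, ?, ?)",
    filler + [("Lunch", "10/28", "1pm", "Bill", "Joe's Diner")],
)

# The user's query; its text never changes.
QUERY = 'SELECT "When", "Where" FROM Calendar WHERE Who = ?'

def plan(conn):
    """Return SQLite's textual plan for QUERY (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Bill",)).fetchall()
    return " ".join(row[-1] for row in rows)

before_rows = conn.execute(QUERY, ("Bill",)).fetchall()
before_plan = plan(conn)

# Physical change only: add an index on Who.
conn.execute("CREATE INDEX idx_calendar_who ON Calendar (Who)")

after_rows = conn.execute(QUERY, ("Bill",)).fetchall()
after_plan = plan(conn)

print(before_plan)   # a full scan of the table
print(after_plan)    # a search that mentions idx_calendar_who
print(before_rows == after_rows)
```

The answer is identical in both runs. Choosing to use the new index is the job of the query optimizer, not the person writing the query.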

That's the traditional view, but ...
• Three-level architecture is not always achievable for database programmers. When databases get big, queries must be carefully written to achieve efficiency.
• There are databases over which we have no control. The Web is a giant, disorganized database.
• There are also well-organized databases on the web (e.g., the Movie database) for which the terminology does not quite apply.

In this course...
• Study relational databases, their design, how to query, what forms of indices to use.
• Beyond relational algebra: a logical model of data (Datalog), recursion.
• Beyond “first-normal form”: object-oriented databases, how to query, using OO design techniques.
• XML and semi-structured data models

What we won’t cover in any depth...
• The “technology” of databases:
  – details of physical design
  – concurrency control
  – transaction management
  – query optimization
(although a few of these issues will be briefly discussed)