introduction and conceptual modelling - lecture 1 - introduction to databases (1007156anr)

2 December 2005 Introduction to Databases Introduction and Conceptual Modelling Prof. Beat Signer Department of Computer Science Vrije Universiteit Brussel http://www.beatsigner.com

Upload: beat-signer

Post on 01-Nov-2014


DESCRIPTION

This lecture is part of an Introduction to Databases course given at the Vrije Universiteit Brussel.

TRANSCRIPT

Page 1: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

2 December 2005

Introduction to Databases Introduction and Conceptual Modelling

Prof. Beat Signer

Department of Computer Science

Vrije Universiteit Brussel

http://www.beatsigner.com

Page 2: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Beat Signer - Department of Computer Science - [email protected]

2 February 14, 2014

Course Organisation

Prof. Beat Signer

Vrije Universiteit Brussel

10 G 731d

+32 2 629 12 39

[email protected]

Reinout Roels

Vrije Universiteit Brussel

10 F 730

+32 2 629 37 53

[email protected]

Page 3: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Course Information

Course book

Database System Concepts (Sixth Edition), Abraham Silberschatz, Henry Korth and S. Sudarshan, McGraw-Hill, 2010

additional information from the book is available online

- http://highered.mcgraw-hill.com/sites/0073523321

Course information (lecture slides, exercises, …)

available on PointCarré http://pointcarre.vub.ac.be/index.php?application=weblcms&go=course_viewer&course=2321

Page 4: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Exercises

Course content is going to be applied in the exercise sessions

Weekly exercise sessions starting on February 20

- group 1: computer room E.1.4, Thursday 11:00-13:00

- group 2: computer room E.1.7, Thursday 14:00-16:00

assistants: Reinout Roels

Additional content may be covered in exercise sessions

exam covers content of lectures and exercises

Page 5: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Exam

Written closed book exam in Dutch / English

covers content of lectures, specific book chapters and exercises

Page 6: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Course Overview

1. Introduction

overview

conceptual modelling and ER model

2. Extended ER Model and other Modelling Languages

3. Relational Model and Relational Algebra

4. Relational Database Design

reduction

functional dependencies and normalisation

5. Structured Query Language (SQL)

6. Advanced SQL

Page 7: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Course Overview …

7. DBMS Architectures and Features

DBMS components

client-server architecture

parallelisation and distribution

8. Storage Management

9. Access Methods

indexing and hashing

10. Query Processing and Optimisation

11. Transaction Management

transactions

concurrency and recovery

Page 8: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Course Overview …

12. Object and Object-Relational Databases

13. Future Trends and Review

Page 9: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action

Online shops

product information, customer data, order data, ...

e.g. Amazon

- hundreds of millions of customers

- more than 50 terabytes of data

Human resources

course registration, student grades, employee records, salary information, tax information, ...

e.g. PointCarré

- course registration

Page 10: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Banking and trading

customer data, account information, transactions, ...

e.g. London Stock Exchange

- almost 1 million trades per day

Reservation systems

book flights from multiple airlines, hotel rooms etc.

e.g. Amadeus systems

- Global Distribution System (GDS) founded by Lufthansa, Air France and other partners

Page 11: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Digital archives

persistently store various types of digital media

e.g. Internet Archive project [archive.org]

- access to more than 150 billion archived web pages

Libraries

index for traditional paper-based libraries as well as digital libraries

e.g. Open Library project [openlibrary.org]

- over 23 million indexed books

Page 12: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Geographic Information Systems (GIS)

store raster (bitmap) or vector data representing real world objects

geospatial query language

Scientific databases

sensor data, classifications (e.g. human genome) as well as data from simulations

e.g. LHC Computing Grid

- LHC experiments at CERN

- 15 petabytes of data per year

Page 13: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Many everyday devices contain databases

TVs, washing machines, mobile phones, ...

e.g. Android phones with SQLite database

Embedded databases in cars, airplanes etc.

manage configurations and store sensor data

e.g. db4o object database used in BMW's Car IT system

Page 14: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Databases in the WISE research lab (VUB)

database-driven cross-media publishing

database extensions for hypermedia services

personal information management

data visualisation

- e.g. ArtVis

human-information interaction

paper-digital interfaces

Page 15: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Databases in Action ...

Databases touch all aspects of our daily life!

Numerous large database software companies

e.g. Oracle was the 3rd largest software company in 2011

Page 16: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Basic Terminology

Database

collection of logically related data

database schema describes the database design

- format and relationships between stored data (often rather static)

collection of data stored in a database at a given time is called an instance of the database

Database Management Systems (DBMS)

tools (programs) to efficiently store, maintain and retrieve information from a database

- support for creating, reading, updating and deleting data (CRUD operations)

- data definition language (DDL) to define the database schema

- data manipulation language (DML) to query and update the data

• often declarative fourth generation languages (4GLs) such as SQL

- data access control, transactions and concurrency control
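The split between DDL and DML can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the lecture: the table name `employees` and its columns are assumptions, and the sample row reuses an employee from the later ER examples.

```python
import sqlite3

# In-memory database; table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# DDL: define the database schema.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# DML: create, read, update and delete (CRUD) on the instance.
conn.execute("INSERT INTO employees (id, name) VALUES (?, ?)", (1576, "Lode Hoste"))
conn.execute("UPDATE employees SET name = ? WHERE id = ?", ("L. Hoste", 1576))
rows_after_update = conn.execute("SELECT id, name FROM employees").fetchall()
conn.execute("DELETE FROM employees WHERE id = ?", (1576,))
rows_after_delete = conn.execute("SELECT id, name FROM employees").fetchall()
```

The schema stays fixed after the CREATE TABLE statement, while the instance changes with every DML statement.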

Page 17: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

File-Processing System

Why should we not just use multiple files in a file system to store our data?

There are various disadvantages of such an approach

data redundancy and inconsistency

- different file formats over time

- duplication of information in different files

limited data access

- we have to write new programs to carry out new tasks

- data cannot be retrieved in a convenient and efficient manner

data isolation

- data may be distributed over different files without a common format

integrity

- integrity constraints (e.g. balance > 0) are hidden in the program code and not explicitly stated and checked
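As a contrast to constraints hidden in program code, the following sketch states the `balance > 0` constraint from the slide explicitly in the schema, so that the DBMS checks it. It uses Python's sqlite3 module; the `accounts` table and the amounts are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The integrity constraint is part of the schema, not of the application code.
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO accounts VALUES (1, 100)")

try:
    # Would leave a negative balance, so the DBMS rejects the update.
    conn.execute("UPDATE accounts SET balance = balance - 150 WHERE id = 1")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```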

Page 18: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

File-Processing System …

missing atomic operations

- system failures and crashes may leave the data in an inconsistent state (e.g. only parts of an operation have been carried out)

- example: transfer of money from one account to another account

concurrent update anomalies

- concurrent updates may leave the data in an inconsistent state

- example: two programs simultaneously removing money from a single account

limited security control

- difficult to give a user only access to parts of a file

DBMSs offer solutions to all these problems

concepts and algorithms to solve the problems with file-processing systems
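The money transfer example can be sketched as a transaction with Python's sqlite3 module. The `transfer` helper, the table layout and the amounts are hypothetical; the point is only that both updates are applied together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates apply or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # debit violates the CHECK, credit is rolled back
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

In the failing call the credit to account 2 has already been executed when the debit is rejected; the rollback undoes it, which is exactly what a plain file-based program would have to implement by hand.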

Page 19: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

General Database Architecture

[Figure: components of a DBMS. Actors: programmers, users, DB admins. Inputs: application programs, queries, database schema. Components: DML preprocessor, query compiler, DDL compiler, program object code, command processor, query optimiser, authorisation control, integrity checker, catalogue manager, transaction manager, scheduler, recovery manager, buffer manager, access methods, file manager, system buffers, grouped into database manager and data manager. Storage: data, indices and system catalogue.]

Based on 'Components of a DBMS', Database Systems, T. Connolly and C. Begg, Addison-Wesley 2010

Page 20: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Abstraction

Physical level

physical schema describes how the data is stored (complex low-level data structures)

Logical level

logical schema describes what data is stored

- simple structures: attribute names, data types and relationships between data

implementation of simple structures might be based on complex physical-level structures but the user of the logical level should not be aware of that (physical data independence)

View level

subschemas provide only access to parts of the database

- reduce complexity and introduce security

multiple views might be defined for a single database

[Figure: levels of data abstraction, with View 1 ... View n on top of the logical level, which sits on top of the physical level]
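A view that exposes only part of the logical schema might look as follows in Python's sqlite3 module. The `salary` column, the view name `employee_directory` and the sample values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full table, including a sensitive column (hypothetical).
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1576, 'Lode Hoste', 3000)")

# View level: a subschema that hides the salary column.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employees")

cursor = conn.execute("SELECT * FROM employee_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

How the table is laid out on disk (the physical level) never appears in this code, which is the physical data independence mentioned above.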

Page 21: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Duality of the Database Schema

[Figure: the database schema describes application concepts in the application world and database concepts in the computer world]

Page 22: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Models and History of DBMSs

Punched cards used since 1725

Hollerith cards later used by IBM for data processing

1950s: Data processing with magnetic tapes as storage

only sequential access to data

reading from one or multiple tapes and writing to a new tape

sometimes combined with input from punched cards etc.

[Figures: magnetic tape, punched card]

Page 23: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Models and History of DBMSs ...

1960s: Widespread use of hard disks

direct access (random access) to data

opened possibilities for new Navigational DBMSs

- IBM's IMS (1968), hierarchical database

- Integrated Data Store (IDS), network database

[Figures: hard disk, IBM 350 Disk Storage Unit]

Page 24: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Models and History of DBMSs ...

A data model is a collection of conceptual tools

describes data, data relationships, data semantics and constraints

Hierarchical model

data organised in a tree structure

used in early mainframe DBMS

- e.g. IBM's Information Management System (IMS)

XML documents also described by a hierarchical model

Network model

generalised graph structure

two main constructs

- records contain fields and sets define relationships between records

navigational operations

- follow the relationship from one record to another record
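The two constructs of the network model can be mimicked in a few lines of Python. This is a toy sketch and not how IDS or any real network DBMS is implemented; the `Record` class, the set name `employs` and the sample data are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A record holds fields; named sets link it to member records."""
    fields: dict
    sets: dict = field(default_factory=dict)  # set name -> list of member records

department = Record({"name": "Computer Science"})
employee = Record({"id": 1576, "name": "Lode Hoste"})

# A set defines a relationship between an owner record and its members.
department.sets["employs"] = [employee]

# Navigational access: follow the relationship from one record to another.
names = [member.fields["name"] for member in department.sets["employs"]]
```

The query is expressed as a path through the data rather than as a declarative statement, which is the main difference from the relational model introduced next.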

Page 25: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Models and History of DBMSs ...

Relational model

collection of tables (relations) containing records

described in a paper by Edgar F. Codd in 1970

1970s: Relational DBMS

IBM's System R (1974) "based" on Codd's paper; SQL added later

Entity-Relationship (ER) model

representation of basic objects (entities) and their relationships

widely used in conceptual database design

Object-based data model

introduces object identity, encapsulation and methods

1980s: Object Databases

initial work on object databases

Page 26: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Data Models and History of DBMSs ...

1990s: Web Interfaces to Databases

databases deployed much more extensively

Semistructured data model

no clear separation between data and the schema ("self-describing" data)

individual data items of the same type may have different attributes

XML is widely used to represent semistructured data

2000s: XML and XQuery

relational databases often still form the core

Later 2000s: Extremely large-scale distributed DBMSs

BigTable or Hadoop and HBase

"NoSQL databases"

Page 27: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Database Design

Conceptual Design

define an abstract conceptual application model containing the main domain concepts

- interact with domain experts to get the requirements

describe the entities (with attributes) and their relationships

specify the functional requirements (operations)

- ensure that operations can be realised based on the conceptual model

Database implementation based on conceptual model

logical design phase

- mapping of the conceptual schema to the implementation data model

• e.g. reduction from the ER model to the relational data model

- define the logical database schema

physical design phase

- define the physical database layout based on the logical database schema
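The two implementation phases can be sketched for the Employees, Offices and LocatedAt example used later in the lecture. The mapping below is one possible reduction, not the one prescribed by the course; the table names, column names and the index are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Logical design: each entity set and the relationship set becomes a table.
conn.executescript("""
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE offices (name TEXT PRIMARY KEY, address TEXT);
CREATE TABLE located_at (
    employee_id INTEGER REFERENCES employees(id),
    office_name TEXT REFERENCES offices(name),
    PRIMARY KEY (employee_id, office_name)
);
-- Physical design: an index is a layout decision that leaves the logical schema unchanged.
CREATE INDEX idx_employees_name ON employees(name);
""")

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```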

Page 28: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Database Design ...

Two major database design pitfalls have to be avoided

we should avoid any redundancy where information is repeated at multiple places since this might lead to inconsistent data

- e.g. a lecture management system where a student's name is stored for each lecture they are attending instead of storing it in a separate student entity

a database design may be incomplete and not enable the representation of certain aspects of the application domain

- e.g. in a shopping application where the customer information is stored as part of an order we cannot enter new customer data without having an order

There is often more than one "good design"

e.g. when do we model something as a relationship and when as a separate entity?

modelling is a challenging task that requires a combination of engineering skills and "good taste"
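The redundancy pitfall can be made concrete with plain Python data structures. The lecture names, the student and the rename are invented for the example.

```python
# Redundant design: the student's name is repeated for every attended lecture.
attendance = [
    {"lecture": "Databases", "student_id": 1, "student_name": "Ann"},
    {"lecture": "Algorithms", "student_id": 1, "student_name": "Ann"},
]
attendance[0]["student_name"] = "Anna"  # only one of the copies gets updated
redundant_names = {
    row["student_name"] for row in attendance if row["student_id"] == 1
}

# Separate student entity: the name is stored once and referenced by id.
students = {1: {"name": "Ann"}}
attends = [("Databases", 1), ("Algorithms", 1)]
students[1]["name"] = "Anna"  # a single update is visible everywhere
normalised_names = {students[student_id]["name"] for _, student_id in attends}
```

After the update the redundant design holds two different names for the same student, while the design with a separate student entity cannot get into that state.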

Page 29: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Entity-Relationship (ER) Model

Conceptual model based on a set of entities and relationships

An entity is a "thing" or "object" that can be distinguished from other objects

A relationship describes an association between multiple entities

"Introduced" and formalised by Peter Chen

P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976

[Photo: Peter Chen]

Page 30: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Entities

An entity represents a distinguishable object

e.g. specific person, car or company

An entity is described by a number of attributes (ovals)

has to be uniquely identifiable by its attributes

An entity set (rectangle) is a set of entities with the same type

the extension of the entity set are its entities

an entity may belong to multiple entity sets

[Figure: entity set Employees with attributes id and name; example entities (1234, Beat Signer), (1576, Lode Hoste) and (3212, William Van Woensel)]

note that we will use a slightly different notation than in the book!

Page 31: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Attributes

The set of permitted attribute values is called the domain or value set

entity instances can be described by a set of (name, value) pairs

e.g. {(id, 1576), (name, Lode Hoste)}

The ER model supports the following attribute types

simple attributes

composite attributes

- hierarchy of sub-attributes

single-valued attributes

multivalued attributes

- optional lower and upper bounds

derived attributes

- computed via relationships or other attribute values
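The attribute types can be illustrated with Python dataclasses. This is only an analogy for the ER concepts; the class layout and all sample values (employee, address, phone number, dates) are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: a hierarchy of sub-attributes."""
    street: str
    city: str

@dataclass
class Employee:
    id: int              # simple, single-valued attribute
    name: str
    birthday: date
    address: Address     # composite attribute
    phones: list = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from birthday instead of being stored."""
        before_birthday = (today.month, today.day) < (self.birthday.month, self.birthday.day)
        return today.year - self.birthday.year - int(before_birthday)

employee = Employee(
    id=1,
    name="Example Employee",
    birthday=date(1990, 6, 15),
    address=Address("Example Street 1", "Brussels"),
    phones=["+32 2 000 00 00"],
)
age_before = employee.age(date(2014, 2, 14))
age_on_birthday = employee.age(date(2014, 6, 15))
```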

Page 32: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Attributes ...

A multivalued attribute is represented by a double ellipse

Derived attributes are indicated by dashed ellipses

address is an example of a composite attribute

[Figure: ER diagram with the entity sets Employees and Offices connected by the relationship set LocatedAt (cardinalities 0..1 and 0..*); attributes shown: id, name, birthday, phone, age, #offices and the composite attribute address (street, city)]

Page 33: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Keys

An entity's attribute values must uniquely identify the entity

A subset of attributes that uniquely identifies an entity is called a superkey

A minimal superkey without any unnecessary attributes is called a candidate key

The primary key is one of the candidate keys chosen by the database designer for unique entity identification

in the ER model, the primary key is highlighted by the set of underlined attributes

the value of a primary key should change very rarely
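The difference between superkeys and candidate keys can be explored with a small Python sketch. One important caveat: keys are a property of the schema that the designer decides on, whereas the hypothetical helpers below only test one given set of entities. The fourth employee is invented to show that names need not be unique.

```python
from itertools import combinations

def is_superkey(entities, attrs):
    """True if no two entities agree on all of the given attributes."""
    seen = set()
    for entity in entities:
        key = tuple(entity[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(entities, attributes):
    """Minimal superkeys, found by testing attribute subsets of growing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in keys)
            if not contains_smaller_key and is_superkey(entities, attrs):
                keys.append(attrs)
    return keys

employees = [
    {"id": 1234, "name": "Beat Signer"},
    {"id": 1576, "name": "Lode Hoste"},
    {"id": 3212, "name": "William Van Woensel"},
    {"id": 9999, "name": "Lode Hoste"},  # hypothetical namesake
]
keys = candidate_keys(employees, ["id", "name"])
```

Here {id, name} is a superkey but not a candidate key, because id alone already identifies every entity.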

Page 34: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Relationships

A relationship is an association between multiple entities

A relationship set (diamond) is a set of relationships of the same type

e.g. LocatedAt

[Figure: relationship set LocatedAt between the entity sets Employees (id, name) and Offices (name, address); example employees (1234, Beat Signer), (1576, Lode Hoste), (3212, William Van Woensel) and example offices 10F718, 10F705, 10F721, 10F703]

Page 35: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Relationships ...

We can have binary or n-ary relationship sets

$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$

Each relationship instance in an ER schema represents an association between the involved entities

The role defines an entity's function in a relationship

has to be explicitly defined if the same entity set participates more than once in a relationship set (recursive relationship)

A relationship may contain descriptive attributes

A relationship instance must be uniquely identifiable by its entities (without any descriptive attributes)

i.e. a relationship set cannot contain two relationship instances that only differ in their descriptive attributes
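A relationship set can be modelled in Python as a set of tuples over the participating entity sets. The concrete pairings of employees and offices and the duration values are assumptions, since the slides do not state them.

```python
# Entity sets, identified here by their key values.
employees = {1234, 1576, 3212}
offices = {"10F718", "10F705", "10F721", "10F703"}

# Binary relationship set LocatedAt: tuples (e1, e2) with e1 in Employees, e2 in Offices.
located_at = {(1234, "10F718"), (1576, "10F705")}
is_valid = all(e in employees and o in offices for e, o in located_at)

# Recursive relationship set WorksFor with roles (employee, boss) and a
# descriptive attribute. Keying by the participating entities means two
# instances cannot differ only in their descriptive attributes.
works_for = {}
works_for[(1576, 1234)] = {"duration": 3}
works_for[(1576, 1234)] = {"duration": 5}  # replaces the instance, no duplicate
```

The roles are encoded by the position in the tuple, which is why they have to be named explicitly when the same entity set appears twice.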

Page 36: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Relationship with Roles and Attributes

[Figure: ER diagram with Employees (id, name) and Offices (name, address) connected by LocatedAt; relationship set WorksFor with the roles boss and employee and the descriptive attribute duration]

Page 37: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Example of a 3-ary Relationship

[Figure: 3-ary relationship set WorksFor between Employees (id, name), Companies (name, address) and Durations (from, to)]

Page 38: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Cardinality Constraints

A relationship can be one-to-one, one-to-many, many-to-one or many-to-many

An arrow indicates a to-one relationship

cardinality constraints may also be expressed by numbers

- e.g. 0..*, 1..*, 0..1, 1..1, 2..5

a double line or 1..* indicates a total participation constraint

[Figure: example relationship sets MarriedTo between Men and Women, Teaches between Teachers and Courses, and StarsIn between Filmstars and Films]
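Numeric cardinality constraints can be checked against a concrete relationship set with a short Python function. The helper `satisfies`, the sample teachers and courses, and the reading of the bounds as "number of relationship instances each entity participates in" are assumptions; notations differ on which side of the relationship the numbers are written.

```python
from collections import Counter

def satisfies(relationships, entities, position, lower, upper=None):
    """Check that every entity occurs between lower and upper times
    (upper=None means unbounded, i.e. '*') at the given tuple position."""
    counts = Counter(r[position] for r in relationships)
    return all(
        lower <= counts[e] and (upper is None or counts[e] <= upper)
        for e in entities
    )

teachers = {"t1", "t2"}
courses = {"c1", "c2", "c3"}
teaches = {("t1", "c1"), ("t1", "c2"), ("t2", "c3")}

each_course_one_teacher = satisfies(teaches, courses, 1, 1, 1)   # 1..1
each_teacher_at_least_one = satisfies(teaches, teachers, 0, 1)   # 1..*, total participation
each_teacher_at_most_one = satisfies(teaches, teachers, 0, 0, 1) # 0..1
```

A lower bound of 1 corresponds to the total participation constraint mentioned above, because it rules out entities that take part in no relationship at all.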

Page 39: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Weak Entity Sets

An entity set with a primary key is called a strong entity set

A weak entity set (double rectangle) does not have enough attributes to form a primary key

[Figure: weak entity set Seats (number, colour) related to the entity set Cinemas (id, name) via the relationship set Offers]

Page 40: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Weak Entity Sets ...

A weak entity set must be associated with an identifying entity set via an identifying relationship (double diamond)

a weak entity set is existence dependent on an identifying entity set

- can also participate in other non-identifying relationships

a weak entity set must relate to the identifying entity set via a total participation constraint and each weak entity instance can only be related to one identifying entity instance

a discriminator or partial key (underlined dashed attributes) uniquely identifies a weak entity relative to a strong entity

In some cases a weak entity may also be expressed as a multivalued composite attribute
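One common way to realise a weak entity set in a relational database is a composite primary key made of the identifying entity's key plus the discriminator. The sketch below does this for the Cinemas and Seats example with Python's sqlite3 module; treating `number` as the discriminator, the cascade rule and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE cinemas (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE seats (
    cinema_id INTEGER NOT NULL REFERENCES cinemas(id) ON DELETE CASCADE,
    number INTEGER NOT NULL,   -- discriminator (partial key)
    colour TEXT,
    PRIMARY KEY (cinema_id, number)
);
""")
conn.execute("INSERT INTO cinemas VALUES (1, 'Cinema A'), (2, 'Cinema B')")
# The same seat number may exist in different cinemas.
conn.execute("INSERT INTO seats VALUES (1, 1, 'red'), (2, 1, 'blue')")
conn.commit()

try:
    # A seat without an identifying cinema cannot exist.
    conn.execute("INSERT INTO seats VALUES (99, 1, 'green')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Existence dependency: deleting a cinema removes its seats.
conn.execute("DELETE FROM cinemas WHERE id = 1")
remaining = conn.execute("SELECT cinema_id, number FROM seats").fetchall()
```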

Page 41: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

ER Design Issues

When do we model something as an attribute and when as an entity set?

there is no general answer and the choice depends on the specific application domain to be modelled

When do we model something as a relationship set and when as an entity set?

a relationship set often corresponds to an action between entities

In general we should try to avoid higher level n-ary relationship sets (3-ary relationship sets should be the maximum and even these should be used carefully)

There should be no redundant attributes

Page 42: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Homework

Study the following two chapters of the Database System Concepts book

chapter 1

- Introduction

chapter 7

- Database Design and the ER Model

- sections 7.1-7.5 and 7.7

Page 43: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)


Exercise 1

Conceptual modelling

Entity-Relationship (ER) model

Page 44: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

References

A. Silberschatz, H. Korth and S. Sudarshan, Database System Concepts (Sixth Edition), McGraw-Hill, 2010

P. Chen, The Entity-Relationship Model - Toward a Unified View of Data, ACM Transactions on Database Systems 1 (1), March 1976

WISE Lab http://wise.vub.ac.be

ArtVis Project http://wise.vub.ac.be/content/artvis-exploring-information-through-advanced-visualisation-techniques

Page 45: Introduction and Conceptual Modelling - Lecture 1 - Introduction to Databases (1007156ANR)

Next Lecture

Extended ER Model and other Modelling Languages