data modelling and databases donald kossmann systems group eth zürich 1

25
Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich www.systems.ethz.ch 1

Upload: gillian-copeland

Post on 26-Dec-2015

220 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Data Modelling and Databases

Donald Kossmann

Systems Group

ETH Zürich

www.systems.ethz.ch

1

Page 2: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

A Short History of Computing

• since 1940s: Computers for Number Crunching– von Neuman Machine, Moore’s law

• since 1990s: Magnetic storage cheaper than paper– various technologies (tape, disk, flash, …)

• since 2000s: “The Cloud”– sensors: most data is digitally born– e.g., mobile phones, cars, microwaves, fitbit, …

Page 3: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

A Short History of Computing• since 1940s: Computers for Arithmetics

– Moore’s law has dominated this trend– (hitting its limits today)

• since 1990s: Magnetic storage cheaper than paper– various technologies (tape, disk, DRAM, SSD, PCM,

…)– continued trend of higher density at same cost

• since 2000s: “Internet of things”– mobile phones have cameras– various sensing technologies (RFIDs, …)

Calculator(+, -, *, /)

Calculator(+, -, *, /)

Information Hub(store, process, communicate data)

Information Hub(store, process, communicate data)

Page 4: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Computer Science in Change

• Traditional Computing– automate processes: execute a sequence of +, *, …

• Today: “Big Data”– automate experience:

• do not do the same mistake twice• answer tough questions based on past evidence

Page 5: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Simple Truths• „Power of data“

– the more data the merrier (GB -> TB)– data comes from everywhere in all shapes– value of data often discovered later– data has no owner within an organization (no silos!)

• Services turn data into $– the more services the merrier – need to adapt quickly

• E.g.: Google, Amadeus, Disney, Walmart, BMW, ...• Platforms: Oracle, MS, SAP, Google, 28msec, ... 5

Page 6: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Two Examples• Google Translate

– translate text based on snippets of multi-lang. corpora – e.g., EU patents, translated books, Web sites, etc.

• Patients like me– find patients with the same disease and markers– exchange war stories

Page 7: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Challenges

• Automate Experience – NOT Thinking!– only works if you ask the right questions

and interpret the answers correctly– shortage of Big Data talent on job market

• Misuse of data & privacy– owner must control usage of data

• Democratization – Big Data opportunities in the hands of everybody

Page 8: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Big Data Question: Yes or No?

• Find a spouse?• Should Adam bite into the apple?• 1 + 1?• Cure for cancer?• How to treat a cough?• Should I give Donald a loan?• Premium for fire insurance?• When should my son come home?• Which book should I read next?• Translate from German to English.

Page 9: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Vision

• Answer all questions– Store all data and make it available and useful to all

authorized people, anytime and anywhere.

• Google‘s mission statement:Organize all the information of the world.

• Status: Technology is there (card boxes). The model is missing (labels).

9

Page 10: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Data Science: Science of Questions• How to formulate questions?

– relational algebra

• How to organize data to answer questions?– ER / UML, relational data model

• How to acquire data to answer questions?– project, transactions, (much more not covered)

• How to make it efficient– normal forms, optimization

• How to quantify error, avoid stupid questions?– not covered in this class

10

Page 11: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

What is a Database System (DBMS)?

• A DBMS is a tool that helps develop and run data-intensive applications:– large databases– large data streams

11

Page 12: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

The Data Management Universe

12

Page 13: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

DB vs. BD

• Databases– You know questions upfront– Closed world: data correct & complete

• Big Data– The exact opposite in all regards

• But, similar algorithms, languages, technology– Collect data to answer question when it is asked– Bridge time between event (data) and question

13

Page 14: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Data and Data Models

• Formats– XML, serialized Java objects, binary, ...

• Structures / Models– Tuples, hierarchies, relationships, lists, unstructured, ...

• Examples– Lecture notes– Financial accounts– Emotions (?): love, taste, ...

14

Page 15: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Systems

• Software platforms that store & organize data– File system: Windows, ...– Relational database systems: Postgres, Oracle, ...– Other database systems: Sausalito, OODB, ...– Key/value stores: HBase, AWS S3, MongoDB, ...– Interpreters: JVM, .NET, ...– Human intelligence

• Hardware that stores & organizes data– HDD, SSD, main memory, ...– Paper– Human brain

15

Page 16: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Where is data stored today?

Costper Bit

Time2000 20101990

Paper

Machine

Human

Mechanical Turk: Prices for humans going down again. How come?16

Page 17: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Typical Applications(data / operations)

• Bank (Accounts / „Money Transfer“)

• Library (Books / „Lend Book“)

• Content Management System (docs, „show“)

• E-Business (Catalogue, „search“)

• ERP (Order, „delivery“)

• Decision Support (Order, „emp of the month“)

• Facebook, Twitter, … (Friends, „post tweet“)17

Page 18: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Why use a DBMS?

• Avoid redundancy and inconsistency

• Rich (declarative) access to the data

• Synchronize concurrent data access

• Recovery after system failures

• Security and privacy

• Reduce cost and pain to do something useful– There is always an alternative!!!

18

Page 19: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

The Data Management Universe

19

Page 20: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Data Modelling

Relational Schema

HierarchicalSchema

Object-orientedSchema

Conceptual Schema(ER-Schema)

Manual Modelling

Semi-automaticTransformation

„Mini World“

XML

20

Page 21: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Example

Student

LectureProfessor

Real World: University

PersNrMatrNr

NameNameStudent Professor

attends gives

Lecture Title

Nr

Conceptual Modelling

21

Page 22: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Overview of Data Models

• Network model (e.g., CODASYL/COBOL)

• Hierarchical model (IBM IMS/FastPath)

• Relational model (SQL)

• Object-oriented model (ODMG 2.0)

• Semi-structured model (XML Infoset)

• Deductive model (Datalog, Prolog)

22

Page 23: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Relational Data Model

Student

Legi Name

2612025403

...

FichteJonas

...

attends

Legi Lecture

2540326120

...

50225001

...

Lecture

Nr Title

50015022

...

Grundzüge Glaube und Wissen

...Select NameFrom Student, attend, LectureWhere Student.Legi= attend.Legi and

attend.Lecture= Lecture.Nr andLecture.Title = `Grundzüge´;

Update Lecture set Title = `Grundzüge der

Logik´ where Nr = 5001; 23

Page 24: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Logs Indexes DB Catalogue

Storage Manager

TA ManagementRecovery

Runtime Schema

Query Optimizer DBMS

DML-Compiler DDL-Compiler

Application Ad-hoc QueryManagement

toolsCompiler

„Naive“User

ExpertUser

App-Developer

DB-admin

External Storage

Components of a Database System

24

Page 25: Data Modelling and Databases Donald Kossmann Systems Group ETH Zürich  1

Database Abstraction Layers

Data Independence

Physical Layer(e.g., indexes)

Logical Layer (schema)

View1 View 2 View 3...

Logical Data Independence

Physical Data Independence

Changes at one layer do not affect another layer!

25