[Ira Assent | Christian S. Jensen]

www.cs.au.dk/~[ira|csj]

Week 7

Database Systems/Data-Intensive Systems

dPersp - October 4, 2010 2

Overview
• Introduction to databases
  - Applications
  - Problems solved by database technology
  - Key background concepts needed for the exercises
  - DBMSs available
  - Data-intensive systems
• Introduction to the exercises
  - The ideas behind them
  - Practicalities

Database Applications
• Database applications abound
  - Financials: banking, investment, insurance
  - Travel: reservations, schedules
  - Universities: registration, grades, courses
  - Sales: customers, products, purchases
  - Online retailers: order tracking, customized recommendations
  - Manufacturing: production, inventory, orders, supply chain
  - Human resources: employee records, salaries, tax deductions
• Databases touch all aspects of our lives
  - Convenience
  - Efficiency
• Can you imagine a life without databases?


Definitions
• Mini-world
  - A part of reality about which information is stored
• Data
  - Known facts about the mini-world that can be recorded and have an implicit meaning
• Database (DB)
  - A collection of related data
• Database Management System (DBMS)
  - A software package that facilitates the creation and maintenance of a computerized database
  - One DBMS, many DBs
• Database System
  - A database and a DBMS
• Database Instance
  - The contents of a DB at a particular time


Data Management Example
• Scenario
  - You own a video store
  - Customers rent DVDs
  - There are several copies of each movie
• Needs
  - Which DVDs has a customer rented?
  - Are any rentals overdue?
  - When will a DVD become available?
• Let’s use files for managing this data…


Files and Their Advantages
• Example: Customers in a flat file, called customers.txt
    C1, Eric, Elm Street, Tucson, Arizona
    C2, Kim, Broadway, New York, New York
    C3, Bo, Speedway Boulevard, Tucson, Arizona
• Advantages
  - Text editors are easy to use
  - Simple to insert/delete/update a record
  - Cheap solution


Example Headaches
• Queries
  - Sort the customers by name? By state? By city and state?
  - Requirement: a robust, sophisticated query language
• Concurrent users
  - Example: Ben and Sarah edit customers.txt at the same time.
    1. Ben starts to edit customers.txt and reads it into memory.
    2. Sarah starts to edit customers.txt.
    3. Ben adds a new record.
    4. Ben saves customers.txt to disk.
    5. Sarah saves customers.txt to disk.
    Ben’s new record disappears!
  - Requirement: must support multiple users
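The Ben and Sarah scenario can be replayed in a few lines. This is only a sketch: the "file" is a Python dictionary entry rather than a real file on disk, and the helper names `load` and `save` are made up for illustration. It shows why whole-file overwrites lose updates when two editors work from stale copies.

```python
# Simulate the lost update with an in-memory stand-in for customers.txt.
# The store dictionary and the load/save helpers are illustrative assumptions.

def load(store):
    """Read the whole 'file' into an editor's private buffer (a copy)."""
    return list(store["customers.txt"])

def save(store, buffer):
    """Overwrite the whole 'file' with the editor's buffer."""
    store["customers.txt"] = list(buffer)

store = {"customers.txt": [
    "C1, Eric, Elm Street, Tucson, Arizona",
    "C2, Kim, Broadway, New York, New York",
]}

ben = load(store)       # 1. Ben reads the file into memory
sarah = load(store)     # 2. Sarah reads the same file
ben.append("C3, Bo, Speedway Boulevard, Tucson, Arizona")  # 3. Ben adds a record
save(store, ben)        # 4. Ben saves his copy
save(store, sarah)      # 5. Sarah saves her stale copy over Ben's

final_records = store["customers.txt"]
bens_record_survived = any(r.startswith("C3") for r in final_records)
print(bens_record_survived)  # False: Sarah's save overwrote Ben's change
```

A DBMS avoids this by coordinating concurrent access to individual records instead of letting each user overwrite the whole data set.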


Example Headaches
• Crashes
• Integrity
  - Inconsistent data, missing data, wrong data
      C1,Eric,Elm,Tucson
      C1,PPeter,Tucson,Elm,Arizona
  - Requirement: techniques for specifying and ensuring integrity
• Security
• Efficiency
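As a rough illustration of "specifying and ensuring integrity", the sketch below declares two constraints and lets the DBMS enforce them. It uses SQLite through Python's standard `sqlite3` module. The table layout, the column names, and the helper `try_insert` are assumptions chosen to match the customer records in these slides, not something the slides prescribe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# PRIMARY KEY forbids two rows with the same customer_id;
# NOT NULL forbids rows with a missing field.
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT NOT NULL,
        street      TEXT NOT NULL,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO customer VALUES ('C1', 'Eric', 'Elm Street', 'Tucson', 'Arizona')"
)

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

# A second record that reuses the ID C1 (inconsistent data).
duplicate_id = try_insert(("C1", "Peter", "Elm Street", "Tucson", "Arizona"))
# A record with no state (missing data).
missing_state = try_insert(("C2", "Kim", "Broadway", "New York", None))

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(duplicate_id)
print(missing_state)
```

These constraints catch duplicate IDs and missing fields. They do not catch swapped fields, such as a city stored in the street column; that kind of wrong data needs additional checks.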


Database vs. File
• Example of database

  CustomerID  Name  Street              City      State
  C1          Eric  Elm Street          Tucson    Arizona
  C2          Kim   Broadway            New York  New York
  C3          Bo    Speedway Boulevard  Tucson    Arizona

• Example of file
    C1, Eric, Elm Street, Tucson, Arizona
    C2, Kim, Broadway, New York, New York
    C3, Bo, Speedway Boulevard, Tucson, Arizona
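To make the contrast concrete, here is a minimal sketch that loads the three customers into a table and answers the earlier sorting questions with a query language instead of hand-written file parsing. It assumes SQLite via Python's standard `sqlite3` module; the table name `customer` and the lowercase column names are illustrative choices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name TEXT, street TEXT, city TEXT, state TEXT
    )
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", [
    ("C1", "Eric", "Elm Street", "Tucson", "Arizona"),
    ("C2", "Kim", "Broadway", "New York", "New York"),
    ("C3", "Bo", "Speedway Boulevard", "Tucson", "Arizona"),
])

# Sort the customers by name.
by_name = [row[0] for row in conn.execute(
    "SELECT name FROM customer ORDER BY name")]

# Sort by state, then city (name breaks ties so the order is deterministic).
by_state_city = conn.execute(
    "SELECT name, city, state FROM customer ORDER BY state, city, name"
).fetchall()

print(by_name)
print(by_state_city)
```

Changing the sort order means changing one clause of the query; the stored data and the rest of the program stay the same.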


Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
  - Example: The database consists of information about a set of customers and accounts and the relationship between them
  - Analogous to the type of a variable
• Instance – the actual content of the database at a particular point in time
  - Analogous to the value of a variable
• Schema
    Customer(CustomerID, Name, E-mail, Street, City, State)
• Instance

  CustomerID  Name  Street      City      State
  C1          Eric  Elm Street  Tucson    Arizona
  C2          Kim   Broadway    New York  New York
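The type/value analogy can be observed directly. The sketch below, again assuming SQLite via Python's `sqlite3` module, defines the Customer schema once and then takes snapshots of the instance as rows are added. The column name `email` stands in for "E-mail", and the helper names `schema` and `instance` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema: defined once, like a type declaration.
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name TEXT, email TEXT, street TEXT, city TEXT, state TEXT
    )
""")

def schema():
    """Column names of the customer table (field 1 of PRAGMA table_info)."""
    return [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

def instance():
    """The current contents of the table, like the current value of a variable."""
    return conn.execute(
        "SELECT customer_id, name FROM customer ORDER BY customer_id"
    ).fetchall()

schema_before = schema()
instance_t0 = instance()   # empty database

conn.execute(
    "INSERT INTO customer (customer_id, name, street, city, state) "
    "VALUES ('C1', 'Eric', 'Elm Street', 'Tucson', 'Arizona')")
instance_t1 = instance()   # one customer

conn.execute(
    "INSERT INTO customer (customer_id, name, street, city, state) "
    "VALUES ('C2', 'Kim', 'Broadway', 'New York', 'New York')")
instance_t2 = instance()   # two customers

schema_after = schema()
print(schema_before == schema_after)  # True: the instance changed, the schema did not
```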


Tables
• Columns/attributes
  - customer_name, account_number
• Rows/tuples
• Keys
  - A set of attributes that uniquely identifies a tuple
  - {customer_id, account_number}
• What should we name the table?
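A key made of several attributes can be declared as a composite primary key. In the sketch below (SQLite via Python's `sqlite3`), the table name `depositor` and the sample IDs are hypothetical; the slide leaves the table name as an open question. Either attribute may repeat on its own, but the pair must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE depositor (
        customer_id    TEXT NOT NULL,
        account_number TEXT NOT NULL,
        PRIMARY KEY (customer_id, account_number)  -- the key {customer_id, account_number}
    )
""")

# C1 appears twice and A-101 appears twice, but every pair is distinct.
conn.executemany("INSERT INTO depositor VALUES (?, ?)", [
    ("C1", "A-101"),
    ("C1", "A-102"),
    ("C2", "A-101"),
])

# Repeating an existing pair violates the key.
try:
    conn.execute("INSERT INTO depositor VALUES ('C1', 'A-101')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM depositor").fetchone()[0]
print(duplicate_rejected, row_count)  # True 3
```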


Databases
• Many tables
• Can the tables have independent content?
• Queries
  - Who lives in the same city?
  - How much money does Johnson have?
  - List the top-3 richest cities
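The three questions can be phrased as SQL once some tables are fixed. The tables on the original slide are not part of this transcript, so everything below is an assumption: the `customer` and `account` tables, their columns, the sample rows, and the reading of "richest city" as the city with the largest total balance. The sketch runs on SQLite via Python's `sqlite3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE account  (account_number TEXT PRIMARY KEY,
                           customer_id TEXT REFERENCES customer(customer_id),
                           balance INTEGER);
""")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("C1", "Johnson", "Tucson"), ("C2", "Smith", "Tucson"),
    ("C3", "Jones", "New York"), ("C4", "Lee", "Aarhus"),
    ("C5", "Hansen", "Odense"),
])
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("A-101", "C1", 500), ("A-102", "C1", 250), ("A-103", "C2", 900),
    ("A-104", "C3", 700), ("A-105", "C4", 100), ("A-106", "C5", 50),
])

# Who lives in the same city? Pair each customer with a later one in the same city.
same_city = conn.execute("""
    SELECT a.name, b.name, a.city
    FROM customer AS a JOIN customer AS b
      ON a.city = b.city AND a.customer_id < b.customer_id
    ORDER BY a.city, a.name, b.name
""").fetchall()

# How much money does Johnson have? Sum over all of Johnson's accounts.
johnson_total = conn.execute("""
    SELECT SUM(balance)
    FROM account JOIN customer USING (customer_id)
    WHERE name = 'Johnson'
""").fetchone()[0]

# Top-3 richest cities, measured by total balance per city.
top3_cities = conn.execute("""
    SELECT city, SUM(balance) AS total
    FROM customer JOIN account USING (customer_id)
    GROUP BY city
    ORDER BY total DESC
    LIMIT 3
""").fetchall()

print(same_city, johnson_total, top3_cities)
```

Two of the three questions need both tables, which is one reason the content of the tables cannot be fully independent: the `customer_id` values in `account` have to match rows in `customer`.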


Database Design
• The process of designing the schema of a database
  - (Also the result of the process)
• A good database design
  - Captures all relevant aspects of the mini-world
    Relevant aspects: those needed to support the applications
  - Is a “clean” and “nice” model
• Designing good schemas is very important.
  - Some people design schemas for a living (and are very well paid).
  - Requires understanding of both the business and database technology
  - Requires conceptual thinking
  - Involves trade-offs and is not just a matter of true or false
• Bad schemas create problems for “all.”
  - More complex applications, potential integrity problems, query and update performance problems, and maintenance problems


The Entity-Relationship Model
• Models an organization as a collection of entities and relationships
  - Entity: a “thing” or an “object” in the organization that is distinguishable from other objects
    Described by a set of attributes
  - Relationship: an association among several entities
• Represented by an entity-relationship diagram (ERD)
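One informal way to get a feel for entities and relationships is to write them down as data types. The sketch below is not ER notation and not a database; it is a plain Python illustration in which `Customer` and `Account` are hypothetical entity types, their fields are the attributes, and `Depositor` is a hypothetical relationship that associates one customer with one account.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:            # entity, described by its attributes
    customer_id: str
    name: str
    city: str

@dataclass(frozen=True)
class Account:             # entity
    account_number: str
    balance: int

@dataclass(frozen=True)
class Depositor:           # relationship: an association among entities
    customer: Customer
    account: Account

eric = Customer("C1", "Eric", "Tucson")
kim = Customer("C2", "Kim", "New York")
joint = Account("A-101", 500)

# Two customers share one account, so the relationship has two instances.
depositors = [Depositor(eric, joint), Depositor(kim, joint)]

owners_of_joint = sorted(d.customer.name for d in depositors if d.account == joint)
print(owners_of_joint)  # ['Eric', 'Kim']
```

In an ERD the same structure would be drawn as two entity boxes joined by a relationship.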


Example DBMSs

4th Dimension, Adabas D, Alpha Five, Apache Derby, Aster Data, BlackRay, CA-Datacom, CSQL, CUBRID, Daffodil database, DataEase, Database Management Library, Dataphor, DB-Fast, DB2, Derby/Java DB, ElevateDB, Empress Embedded Database, EnterpriseDB, EffiProz, eXtremeDB, fastDB, FileMaker Pro, Firebird, FrontBase, Gladius DB, Greenplum, H2, Helix database, HSQLDB, SQLDB, IBM DB2, IBM Lotus Approach, IBM DB2 Express-C, Infobright, Informix, Ingres, InterBase, InterSystems Caché, Kognitio, Linter, LucidDB, MariaDB, MaxDB, Mckoi SQL Database, Microsoft Access, Microsoft Jet Database Engine (part of Microsoft Access), Microsoft SQL Server, Microsoft SQL Server Express, Microsoft Visual FoxPro, Mimer SQL, MonetDB, mSQL, MySQL, Netezza, Nexusdb, NonStop SQL, Openbase, OpenLink Virtuoso (Open Source Edition), OpenLink Virtuoso Universal Server, Oracle, Oracle Rdb for OpenVMS, Panorama, Pervasive, PostgreSQL, Progress Software, RDM Embedded, RDM Server, The SAS system, Sav Zigzag, ScimoreDB, SmallSQL, solidDB, SQLbase, SQLite, Sybase Adaptive Server Enterprise, Sybase Adaptive Server IQ, Sybase SQL Anywhere (formerly known as Sybase Adaptive Server Anywhere and Watcom SQL), Sybase Advantage Database Server, Tdbengine, Teradata, TimesTen, txtSQL, UniData, UniVerse, Valentina, Vertica, VistaDB, VMDS, XSPRADA


DBMS Comparisons
• Relational DBMSs
  http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
• Object-Relational DBMSs
  http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems
• Database tools
  http://en.wikipedia.org/wiki/Comparison_of_database_tools


Database vs. Data-Intensive System
• Data-intensive system: A larger IT system that may include one or more DBMSs and where data management is challenging in some way.
• A broader focus
• Examples
  - Clustering of high-dimensional data
  - Tracking of moving objects
  - Route prediction
  - Mobile service infrastructure
  - Location privacy
  - Continuous queries on moving objects
  - Spatio-textual search/hyper-local web search
  - Multimedia similarity search
• This is where much of our research “lives.”


Underlying Idea
• We want today to be bottom-up instead of top-down.
• We want you to invent something.
• We want you to think, be creative, and consider alternatives.
• We do not care so much about the correctness of the solutions. If you hand in solutions that show you were thinking about things, you will get credit for week 7.
• This may be chaotic!


Practicalities
• Remember to bring paper and pencil
  - You will be doing the solutions on paper and will then upload the answers at the end of the day.
• We merge “small” a and b groups to get “big” groups suitable for discussion.
  - After lunch we will re-do the groups.
  - Thus, each “small” group should be prepared to upload its own solution.
• You need one laptop per group.
  - Record your solutions in the template provided as you go along.
  - No need to install anything.
• We may, or may not, end each half day with a joint session.
• Complete the evaluation of week 7 at the end of the day.
