[Ira Assent | Christian S. Jensen]
www.cs.au.dk/~[ira|csj]
Week 7
Database Systems/Data-Intensive Systems
dPersp - October 4, 2010 2
Overview
• Introduction to databases
  – Applications
  – Problems solved by database technology
  – Key background concepts needed for the exercises
  – DBMSs available
  – Data-intensive systems
• Introduction to the exercises
  – The ideas behind them
  – Practicalities
Database Applications
• Database applications abound
  – Financials: banking, investment, insurance
  – Travel: reservations, schedules
  – Universities: registration, grades, courses
  – Sales: customers, products, purchases
  – Online retailers: order tracking, customized recommendations
  – Manufacturing: production, inventory, orders, supply chain
  – Human resources: employee records, salaries, tax deductions
• Databases touch all aspects of our lives
  – Convenience
  – Efficiency
• Can you imagine a life without databases?
Definitions
• Mini-world: a part of reality about which information is stored
• Data: known facts about the mini-world that can be recorded and have an implicit meaning
• Database (DB): a collection of related data
• Database Management System (DBMS): a software package that facilitates the creation and maintenance of a computerized database
  – One DBMS, many DBs
• Database System: a database and a DBMS
• Database Instance: the contents of a DB at a particular time
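To make "one DBMS, many DBs" concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. SQLite is only our illustrative choice of DBMS; the slides do not prescribe one. Each ":memory:" connection opens a separate, independent database managed by the same engine, and the table names are hypothetical.

```python
import sqlite3

# One DBMS: the SQLite engine that ships with Python's standard library.
# Each ":memory:" connection opens a separate, independent database.
video_store = sqlite3.connect(":memory:")
university = sqlite3.connect(":memory:")

# Hypothetical tables, one per database.
video_store.execute("CREATE TABLE rental (dvd_id TEXT, customer_id TEXT)")
university.execute("CREATE TABLE enrollment (student_id TEXT, course TEXT)")

def tables(db):
    """List the tables stored in one database."""
    rows = db.execute("SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
    return [name for (name,) in rows]

print(tables(video_store))  # ['rental']
print(tables(university))   # ['enrollment']
```

Neither database sees the other's tables, even though both are managed by the same DBMS.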
Data Management Example
• Scenario
  – You own a video store.
  – Customers rent DVDs.
  – There are several copies of each movie.
• Needs
  – Which DVDs has a customer rented?
  – Are any rentals overdue?
  – When will a DVD become available?
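Before choosing any storage technology, the three needs can be phrased as small computations over rental records. The sketch below keeps everything in Python lists, which is an assumption made for illustration; the movie titles, copy IDs, and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Rental:
    copy_id: str   # one physical DVD; a movie has several copies
    movie: str
    customer: str
    due: date

# Hypothetical data for illustration only.
copies = {"Alien": ["D1-a", "D1-b"], "Heat": ["D2-a", "D2-b"]}
rentals = [
    Rental("D1-a", "Alien", "Eric", date(2010, 10, 6)),
    Rental("D1-b", "Alien", "Kim", date(2010, 10, 9)),
    Rental("D2-a", "Heat", "Eric", date(2010, 10, 1)),
]

def rented_by(customer):
    """Which DVDs has a customer rented?"""
    return sorted(r.copy_id for r in rentals if r.customer == customer)

def overdue(today):
    """Are any rentals overdue?"""
    return [r.copy_id for r in rentals if r.due < today]

def available_from(movie, today):
    """When will a DVD of this movie become available?

    Today if some copy is not rented out; otherwise the earliest due date
    (this sketch assumes copies come back on time).
    """
    rented = {r.copy_id: r.due for r in rentals if r.movie == movie}
    if any(c not in rented for c in copies[movie]):
        return today
    return min(rented.values())

today = date(2010, 10, 4)
print(rented_by("Eric"))               # ['D1-a', 'D2-a']
print(overdue(today))                  # ['D2-a']
print(available_from("Alien", today))  # 2010-10-06
print(available_from("Heat", today))   # 2010-10-04
```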
• Let’s use files for managing this data…
Files and Their Advantages
• Example: customers in a flat file, called customers.txt
  C1, Eric, Elm Street, Tucson, Arizona
  C2, Kim, Broadway, New York, New York
  C3, Bo, Speedway Boulevard, Tucson, Arizona
• Advantages
  – Text editors are easy to use
  – Simple to insert/delete/update a record
  – Cheap solution
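A minimal Python sketch of the flat-file approach, to show why it feels simple: reading is line splitting, inserting is appending a line, and deleting is rewriting the file without one line. To keep the sketch runnable without touching the disk, customers.txt is held in a string; that, the helper names, and the extra customer C4 are hypothetical.

```python
import csv
import io

# Contents of customers.txt, held in a string for illustration.
CUSTOMERS_TXT = """C1, Eric, Elm Street, Tucson, Arizona
C2, Kim, Broadway, New York, New York
C3, Bo, Speedway Boulevard, Tucson, Arizona
"""

def read_customers(text):
    """Parse each comma-separated line into a list of trimmed fields."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [row for row in reader if row]

def insert_customer(text, record):
    """Inserting a record is just appending one more line."""
    return text + ", ".join(record) + "\n"

def delete_customer(text, customer_id):
    """Deleting a record means rewriting the file without that line."""
    kept = [line for line in text.splitlines(keepends=True)
            if not line.startswith(customer_id + ",")]
    return "".join(kept)

customers = read_customers(CUSTOMERS_TXT)
updated = insert_customer(CUSTOMERS_TXT, ["C4", "Ann", "Main Street", "Phoenix", "Arizona"])
print(len(read_customers(updated)))                        # 4
print(len(read_customers(delete_customer(updated, "C2"))))  # 3
```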
Example Headaches
• Queries
  – Sort the customers by name? By state? By city and state?
  – Requirement: a robust, sophisticated query language
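The point of the sorting questions is that a flat file offers no query facility: each new ordering needs its own hand-written code. The sketch contrasts that with a single declarative SQL statement, run through Python's sqlite3 module (SQLite is our illustrative choice). We read "by city and state" as city first, then state; that reading is an assumption.

```python
import sqlite3
from operator import itemgetter

# Records from customers.txt: (id, name, street, city, state).
customers = [
    ("C1", "Eric", "Elm Street", "Tucson", "Arizona"),
    ("C2", "Kim", "Broadway", "New York", "New York"),
    ("C3", "Bo", "Speedway Boulevard", "Tucson", "Arizona"),
]

# With a flat file, every new question needs new code.
by_name = sorted(customers, key=itemgetter(1))
by_state = sorted(customers, key=itemgetter(4))
by_city_and_state = sorted(customers, key=itemgetter(3, 4))  # city, then state

# With a query language, the same question is one statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT, name TEXT, street TEXT, city TEXT, state TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", customers)
sql_by_city_and_state = [
    row[0] for row in conn.execute("SELECT id FROM customer ORDER BY city, state, id")
]

print([c[0] for c in by_city_and_state])  # ['C2', 'C1', 'C3']
print(sql_by_city_and_state)              # ['C2', 'C1', 'C3']
```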
• Concurrent users
  – Example: Ben and Sarah edit customers.txt at the same time.
    1. Ben starts to edit customers.txt and reads it into memory.
    2. Sarah starts to edit customers.txt.
    3. Ben adds a new record.
    4. Ben saves customers.txt to disk.
    5. Sarah saves customers.txt to disk. Ben’s new record disappears!
  – Requirement: must support multiple users.
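The five steps are a classic lost update. The sketch replays them in Python with a hypothetical FlatFile class that simply overwrites on save. With check=True it adds a simple version check (a form of optimistic concurrency control, chosen here only for illustration) that refuses Sarah's stale save, so she would have to reload and redo her edit. Real DBMSs handle this through transactions with locking or similar mechanisms.

```python
class FlatFile:
    """Hypothetical file store that counts saves so stale writes can be detected."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.version = 0

    def read(self):
        # Each user gets a private in-memory copy plus the version they saw.
        return list(self.lines), self.version

    def save(self, lines, read_version, check=False):
        if check and read_version != self.version:
            raise RuntimeError("file changed since it was read; reload and retry")
        self.lines = list(lines)  # a plain save overwrites whatever is there
        self.version += 1

def ben_and_sarah(check):
    f = FlatFile(["C1, Eric, Elm Street, Tucson, Arizona"])
    ben, ben_v = f.read()                                  # 1. Ben reads the file
    sarah, sarah_v = f.read()                              # 2. Sarah reads the file
    ben.append("C2, Kim, Broadway, New York, New York")    # 3. Ben adds a record
    f.save(ben, ben_v, check)                              # 4. Ben saves
    try:
        f.save(sarah, sarah_v, check)                      # 5. Sarah saves her stale copy
    except RuntimeError:
        pass  # with checking on, Sarah's stale save is refused
    return f.lines

print(len(ben_and_sarah(check=False)))  # 1: Ben's record disappeared
print(len(ben_and_sarah(check=True)))   # 2: Ben's record survives
```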
Example Headaches
• Crashes
• Integrity
  – Inconsistent data, missing data, wrong data:
    C1,Eric,Elm,Tucson
    C1,PPeter,Tucson,Elm,Arizona
  – Requirement: techniques for specifying and ensuring integrity
• Security
• Efficiency
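One such technique is to declare integrity constraints in the schema and let the DBMS enforce them on every change. A minimal sketch with SQLite via Python's sqlite3 module (our illustrative choice): a PRIMARY KEY rejects the second C1, and NOT NULL rejects the record with no state. Simple constraints like these do not catch everything; the misspelled "PPeter" or a street swapped with a city would still pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Constraints are declared once in the schema; the DBMS checks every insert.
conn.execute("""
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,  -- no duplicate IDs
        name        TEXT NOT NULL,
        street      TEXT NOT NULL,
        city        TEXT NOT NULL,
        state       TEXT NOT NULL      -- no missing state
    )
""")
conn.execute("INSERT INTO customer VALUES ('C1', 'Eric', 'Elm Street', 'Tucson', 'Arizona')")

rejected = []
bad_rows = [
    ("C1", "PPeter", "Tucson", "Elm", "Arizona"),  # duplicate key C1
    ("C4", "Eric", "Elm", "Tucson", None),          # missing state
]
for row in bad_rows:
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row[0], str(exc)))

print(len(rejected))  # 2
```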
Database vs. File
• Example of database
  CustomerID  Name  Street              City      State
  C1          Eric  Elm Street          Tucson    Arizona
  C2          Kim   Broadway            New York  New York
  C3          Bo    Speedway Boulevard  Tucson    Arizona
• Example of file
  C1, Eric, Elm Street, Tucson, Arizona
  C2, Kim, Broadway, New York, New York
  C3, Bo, Speedway Boulevard, Tucson, Arizona
Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
  – Example: the database consists of information about a set of customers and accounts and the relationship between them
  – Analogous to the type of a variable
• Instance – the actual content of the database at a particular point in time
  – Analogous to the value of a variable
• Schema
  Customer(CustomerID, Name, E-mail, Street, City, State)
• Instance
  CustomerID  Name  Street      City      State
  C1          Eric  Elm Street  Tucson    Arizona
  C2          Kim   Broadway    New York  New York
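The type/value analogy can be seen directly in a DBMS: the schema is declared once with CREATE TABLE, while the instance changes with every insert. A minimal sketch with SQLite via Python's sqlite3 module (an illustrative choice). Two assumptions: "E-mail" is written as Email so it is a plain SQL identifier, and because the slide's instance shows no e-mail addresses, that column is left NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the logical structure, fixed up front (like a variable's type).
conn.execute("""CREATE TABLE Customer (
    CustomerID TEXT PRIMARY KEY,
    Name       TEXT,
    Email      TEXT,
    Street     TEXT,
    City       TEXT,
    State      TEXT)""")

def schema():
    """Column names of Customer, read from SQLite's catalog."""
    return [row[1] for row in conn.execute("PRAGMA table_info(Customer)")]

def instance():
    """The rows stored right now (like a variable's current value)."""
    return conn.execute("SELECT CustomerID, Name FROM Customer ORDER BY CustomerID").fetchall()

columns_before = schema()
conn.execute("INSERT INTO Customer VALUES ('C1', 'Eric', NULL, 'Elm Street', 'Tucson', 'Arizona')")
first = instance()
conn.execute("INSERT INTO Customer VALUES ('C2', 'Kim', NULL, 'Broadway', 'New York', 'New York')")
second = instance()
columns_after = schema()

print(first)   # [('C1', 'Eric')]
print(second)  # [('C1', 'Eric'), ('C2', 'Kim')]
```

The instance differs between the two snapshots while the schema stays the same.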
Tables
• Columns/attributes: customer_name, account_number
• Rows/tuples
• Keys: a set of attributes that uniquely identifies a tuple, e.g., {customer_id, account_number}
• What should we name the table?
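A key can consist of several attributes. In the minimal SQLite sketch below (Python's sqlite3 module, an illustrative choice), the pair {customer_id, account_number} is declared as a composite primary key: a customer may hold several accounts and an account may be shared, but the same pair cannot appear twice. The table name depositor, the helper try_insert, and the account numbers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Neither attribute alone is unique; together they identify a row.
conn.execute("""CREATE TABLE depositor (
    customer_id    TEXT NOT NULL,
    account_number TEXT NOT NULL,
    PRIMARY KEY (customer_id, account_number))""")

def try_insert(customer_id, account_number):
    """Return True if the row was accepted, False if it violates the key."""
    try:
        conn.execute("INSERT INTO depositor VALUES (?, ?)", (customer_id, account_number))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert("C1", "A-101"),  # new pair
    try_insert("C1", "A-102"),  # same customer, another account
    try_insert("C2", "A-101"),  # shared account, another customer
    try_insert("C1", "A-101"),  # duplicate pair: rejected
]
print(results)  # [True, True, True, False]
```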
Databases
• Many tables
• Can the tables have independent content?
• Queries
  – Which customers live in the same city?
  – How much money does Johnson have?
  – List the top-3 richest cities
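The three queries combine information from several tables. The SQL sketch below runs them with SQLite via Python's sqlite3 module. The attribute names follow the slides (customer_id, customer_name, account_number), but the customer/account/depositor layout, the city column, and all data are hypothetical, and "richest city" is read as the largest total balance of its residents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT, customer_city TEXT);
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT,
                            PRIMARY KEY (customer_id, account_number));
    INSERT INTO customer VALUES ('C1', 'Johnson', 'Tucson'), ('C2', 'Smith', 'Tucson'),
                                ('C3', 'Jones', 'New York'), ('C4', 'Hayes', 'Phoenix'),
                                ('C5', 'Lee', 'Boston');
    INSERT INTO account VALUES ('A-101', 500), ('A-102', 700), ('A-201', 900),
                               ('A-301', 300), ('A-401', 100), ('A-501', 50);
    INSERT INTO depositor VALUES ('C1', 'A-101'), ('C1', 'A-102'), ('C2', 'A-201'),
                                 ('C3', 'A-301'), ('C4', 'A-401'), ('C5', 'A-501');
""")

# Which customers live in the same city? (self-join; each pair listed once)
same_city = conn.execute("""
    SELECT a.customer_name, b.customer_name
    FROM customer a JOIN customer b
      ON a.customer_city = b.customer_city AND a.customer_id < b.customer_id
""").fetchall()

# How much money does Johnson have? (join across all three tables)
johnson_total = conn.execute("""
    SELECT SUM(account.balance)
    FROM customer
    JOIN depositor ON depositor.customer_id = customer.customer_id
    JOIN account   ON account.account_number = depositor.account_number
    WHERE customer.customer_name = 'Johnson'
""").fetchone()[0]

# Top-3 richest cities by total balance of residents.
top3 = conn.execute("""
    SELECT customer.customer_city, SUM(account.balance) AS total
    FROM customer
    JOIN depositor ON depositor.customer_id = customer.customer_id
    JOIN account   ON account.account_number = depositor.account_number
    GROUP BY customer.customer_city
    ORDER BY total DESC
    LIMIT 3
""").fetchall()

print(same_city)      # [('Johnson', 'Smith')]
print(johnson_total)  # 1200
print(top3)           # [('Tucson', 2100), ('New York', 300), ('Phoenix', 100)]
```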
Database Design
• The process of designing the schema of a database (also the result of that process)
• A good database design
  – Captures all relevant aspects of the mini-world; relevant aspects are those needed to support the applications
  – Is a “clean” and “nice” model
• Designing good schemas is very important.
  – Some people design schemas for a living (and are very well paid).
  – Requires understanding of both the business and database technology
  – Requires conceptual thinking
  – Involves trade-offs; it is rarely a matter of simply right or wrong
• Bad schemas create problems for “all”: more complex applications, potential integrity problems, query and update performance problems, and maintenance problems
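One way a bad schema causes integrity problems is redundancy. In the first hypothetical design below, a customer's city is repeated on every account row, so an update that touches only one row leaves the data contradictory. The second hypothetical design stores each customer once, which avoids the problem at the price of a join; that is one of the trade-offs mentioned above. The sketch uses SQLite via Python's sqlite3 module, and all table names and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical design 1: customer data repeated once per account.
conn.execute("""CREATE TABLE customer_account (
    customer_id TEXT, customer_name TEXT, customer_city TEXT,
    account_number TEXT PRIMARY KEY, balance INTEGER)""")
conn.executemany("INSERT INTO customer_account VALUES (?, ?, ?, ?, ?)", [
    ("C1", "Johnson", "Tucson", "A-101", 500),
    ("C1", "Johnson", "Tucson", "A-102", 700),
])
# Johnson moves, but the update touches only one of the copies.
conn.execute("UPDATE customer_account SET customer_city = 'Phoenix' WHERE account_number = 'A-101'")
cities_bad = conn.execute("""SELECT DISTINCT customer_city FROM customer_account
    WHERE customer_id = 'C1' ORDER BY customer_city""").fetchall()

# Hypothetical design 2: each customer stored once; accounts refer to it.
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT, customer_city TEXT)")
conn.execute("CREATE TABLE account (account_number TEXT PRIMARY KEY, customer_id TEXT, balance INTEGER)")
conn.execute("INSERT INTO customer VALUES ('C1', 'Johnson', 'Tucson')")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)", [("A-101", "C1", 500), ("A-102", "C1", 700)])
conn.execute("UPDATE customer SET customer_city = 'Phoenix' WHERE customer_id = 'C1'")
# The price: per-account questions about the customer now need a join.
cities_good = conn.execute("""SELECT DISTINCT c.customer_city
    FROM account a JOIN customer c ON a.customer_id = c.customer_id
    WHERE c.customer_id = 'C1'""").fetchall()

print(cities_bad)   # [('Phoenix',), ('Tucson',)]: Johnson now "lives" in two cities
print(cities_good)  # [('Phoenix',)]
```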
The Entity-Relationship Model
• Models an organization as a collection of entities and relationships
  – Entity: a “thing” or an “object” in the organization that is distinguishable from other objects; described by a set of attributes
  – Relationship: an association among several entities
• Represented by an entity-relationship diagram (ERD)
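An ERD is a diagram, not code, but its ingredients have a rough programming analogue. In the Python sketch below, each dataclass plays the role of an entity set whose fields are attributes, and a Depositor value records one relationship between a customer and an account. The names Customer, Account, Depositor, and accounts_of, and all data, are hypothetical and serve only as an analogy.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:          # entity set; attributes describe each entity
    customer_id: str     # key attribute: distinguishes one customer from another
    name: str
    city: str

@dataclass(frozen=True)
class Account:           # entity set
    account_number: str  # key attribute
    balance: int

@dataclass(frozen=True)
class Depositor:         # relationship: associates a customer with an account
    customer: Customer
    account: Account

johnson = Customer("C1", "Johnson", "Tucson")
holds = [
    Depositor(johnson, Account("A-101", 500)),
    Depositor(johnson, Account("A-102", 700)),
]

def accounts_of(customer, relationships):
    """Follow the relationship from a customer to the accounts they hold."""
    return [r.account.account_number for r in relationships if r.customer == customer]

print(accounts_of(johnson, holds))  # ['A-101', 'A-102']
```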
Example DBMSs
4th Dimension, Adabas D, Alpha Five, Apache Derby, Aster Data, BlackRay, CA-Datacom, CSQL, CUBRID, Daffodil database, DataEase, Database Management Library, Dataphor, DB-Fast, DB2, Derby/Java DB, ElevateDB, Empress Embedded Database, EnterpriseDB, EffiProz, eXtremeDB, fastDB, FileMaker Pro, Firebird, FrontBase, Gladius DB, Greenplum, H2, Helix database, HSQLDB, SQLDB, IBM DB2, IBM Lotus Approach, IBM DB2 Express-C, Infobright, Informix, Ingres, InterBase, InterSystems Caché, Kognitio, Linter, LucidDB, MariaDB, MaxDB, Mckoi SQL Database, Microsoft Access, Microsoft Jet Database Engine (part of Microsoft Access), Microsoft SQL Server, Microsoft SQL Server Express, Microsoft Visual FoxPro, Mimer SQL, MonetDB, mSQL, MySQL, Netezza, Nexusdb, NonStop SQL, Openbase, OpenLink Virtuoso (Open Source Edition), OpenLink Virtuoso Universal Server, Oracle, Oracle Rdb for OpenVMS, Panorama, Pervasive, PostgreSQL, Progress Software, RDM Embedded, RDM Server, The SAS system, Sav Zigzag, ScimoreDB, SmallSQL, solidDB, SQLbase, SQLite, Sybase Adaptive Server Enterprise, Sybase Adaptive Server IQ, Sybase SQL Anywhere (formerly known as Sybase Adaptive Server Anywhere and Watcom SQL), Sybase Advantage Database Server, Tdbengine, Teradata, TimesTen, txtSQL, UniData, UniVerse, Valentina, Vertica, VistaDB, VMDS, XSPRADA
DBMS Comparisons
• Relational DBMSs
  http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems
• Object-Relational DBMSs
  http://en.wikipedia.org/wiki/Comparison_of_object-relational_database_management_systems
• Database tools
  http://en.wikipedia.org/wiki/Comparison_of_database_tools
Database vs. Data-Intensive System
• Data-intensive system: a larger IT system that may include one or more DBMSs and in which data management is challenging in some way
• A broader focus
• Examples
  – Clustering of high-dimensional data
  – Tracking of moving objects
  – Route prediction
  – Mobile service infrastructure
  – Location privacy
  – Continuous queries on moving objects
  – Spatio-textual search/hyper-local web search
  – Multimedia similarity search
• This is where much of our research “lives.”
Underlying Idea
• We want today to be bottom-up instead of top-down.
• We want you to invent something.
• We want you to think, be creative, and consider alternatives.
• We do not care so much about the correctness of the solutions: if your solutions show that you have thought about the problems, you will get credit for week 7.
• This may be chaotic!
Practicalities
• Remember to bring paper and pencil.
  – You will do the solutions on paper and then upload the answers at the end of the day.
• We merge “small” a and b groups to get “big” groups suitable for discussion.
  – After lunch we will re-form the groups.
  – Thus, each “small” group should be prepared to upload its own solution.
• You need one laptop per group.
  – Record your solutions in the template provided as you go along.
  – No need to install anything.
• We may, or may not, end each half day with a joint session.
• Complete the evaluation of week 7 at the end of the day.