
CS 157B: Database Management Systems II
March 20 Class Meeting

Department of Computer Science
San Jose State University

Spring 2013
Instructor: Ron Mak

www.cs.sjsu.edu/~mak

Department of Computer Science, Spring 2013: March 20

CS 157B: Database Management Systems II © R. Mak

Unofficial Field Trip

Computer History Museum in Mountain View: http://www.computerhistory.org/

Experience a fully restored IBM 1401 mainframe computer from the early 1960s in operation.
- General info: http://en.wikipedia.org/wiki/IBM_1401
- My summer seminar: http://www.cs.sjsu.edu/~mak/1401/
- Restoration: http://ed-thelen.org/1401Project/1401RestorationPage.html

Private demos at 11:45 and at 2:00.

See a life-size working model of Charles Babbage’s Difference Engine, a hand-cranked mechanical computer designed in the early 1800s, in operation. Public demo at 1:00.

Saturday, March 23. Meet in the museum lobby at 11:15 AM.


Extra Credit!

There will be extra credit if you participate in the unofficial field trip to the Computer History Museum: up to 10 points added to your midterm score.

To be decided: a quiz (via Desire2Learn) or an essay.


Extract, Transform, and Load (ETL)


Extract, Transform, and Load (ETL)

You want only high-quality data in your data warehouse.

What is high-quality data? Data that is:
- correct
- unambiguous
- consistent
- complete

The transform phase of ETL produces high-quality data by:
- cleaning the data
- conforming data from multiple sources


Extract, Transform, and Load (ETL)

In the real world, data is often dirty. Therefore, the ETL process must clean the source data as it is copied into the data warehouse.

Cleaning operations:
- Remove or correct corrupted data.
- Remove or correct invalid or inconsistent data, such as:
  - unexpected null values
  - missing data
  - values out of range
  - misspellings
  - referential integrity violations
  - business rule violations
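A minimal Python sketch of these cleaning operations, assuming source rows arrive as dictionaries with hypothetical fields `customer_id`, `age`, `state`, and `store_id`. The age range, the misspelling table, and the set of known store keys are illustrative assumptions; a real ETL job would derive them from business rules and from the dimension tables.

```python
from typing import Optional

# Hypothetical reference data for the cleaning rules (assumptions, not real rules).
KNOWN_STORE_IDS = {1, 2, 3}                       # stands in for the store dimension
SPELLING_FIXES = {"Californa": "California", "Nevade": "Nevada"}
VALID_AGE_RANGE = range(0, 121)                   # business rule: ages 0..120


def clean_row(row: dict) -> Optional[dict]:
    """Return a cleaned copy of row, or None if the row must be rejected."""
    # Unexpected null in a required field: reject the row.
    if row.get("customer_id") is None:
        return None
    cleaned = dict(row)  # never mutate the source row
    # Out-of-range (or non-integer) age: keep the row but null the untrusted value.
    if cleaned.get("age") not in VALID_AGE_RANGE:
        cleaned["age"] = None
    # Misspelling: correct it from a lookup table.
    state = cleaned.get("state")
    cleaned["state"] = SPELLING_FIXES.get(state, state)
    # Referential integrity: the store must exist in the store dimension.
    if cleaned.get("store_id") not in KNOWN_STORE_IDS:
        return None
    return cleaned


def clean(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into (clean rows, rejected rows kept for later review)."""
    good, rejected = [], []
    for row in rows:
        result = clean_row(row)
        if result is None:
            rejected.append(row)
        else:
            good.append(result)
    return good, rejected
```

Keeping rejected rows instead of silently dropping them lets someone correct the source data later.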


Extract, Transform, and Load (ETL)

Data from multiple sources may need to be conformed before it can be used together in the data warehouse.

- Type conversion
  Example: Convert a user ID in one data source from a string to a long integer to match the user ID in other data sources.
- Format conversion
  Examples: dates and times, names
- Align field and attribute names
  Examples: customer_name vs. name_of_client, store vs. retail_outlet
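A minimal Python sketch of conforming one source to the warehouse's conventions. The source layout (a string `user_id`, US-style `MM/DD/YYYY` dates in `order_date`, columns named `name_of_client` and `retail_outlet`) and the target names are assumptions chosen to mirror the examples above.

```python
from datetime import datetime

# Hypothetical mapping from this source's field names to the warehouse's names.
FIELD_NAME_MAP = {"name_of_client": "customer_name", "retail_outlet": "store"}


def conform(row: dict) -> dict:
    """Rename fields, convert types, and normalize formats for one source row."""
    # Align field and attribute names.
    out = {FIELD_NAME_MAP.get(name, name): value for name, value in row.items()}
    # Type conversion: this source stores user IDs as strings; others use integers.
    out["user_id"] = int(out["user_id"].strip())
    # Format conversion: parse MM/DD/YYYY into a date object.
    out["order_date"] = datetime.strptime(out["order_date"], "%m/%d/%Y").date()
    # Format conversion for names: collapse extra whitespace, use title case.
    out["customer_name"] = " ".join(out["customer_name"].split()).title()
    return out
```

For example, `conform({"user_id": " 00042 ", "order_date": "03/20/2013", "name_of_client": "  ron   mak ", "retail_outlet": "San Jose"})` yields `user_id` 42, the date 2013-03-20, `customer_name` "Ron Mak", and `store` "San Jose".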


ETL: Semantic Mappings

- Unit conversions
  Examples: feet vs. yards, miles vs. kilometers
- Structural mappings
  Example: the hierarchy federal, state, city, district vs. kingdom, region, parish
- Temporal mappings
  Example: One data source has a measure taken once an hour; another data source has the same measure taken daily.
- Spatial mappings
  Example: street addresses vs. GIS coordinates (latitude + longitude) vs. political boundaries (cities, districts, counties, etc.)
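Two of these mappings can be sketched in Python: a unit conversion (miles to kilometers, using the exact definition 1 mile = 1.609344 km) and a temporal mapping that brings an hourly measure to the daily grain of another source. Averaging is an assumption made for the sketch; depending on the measure, a daily sum, maximum, or last reading may be the correct mapping.

```python
from collections import defaultdict
from datetime import datetime

KM_PER_MILE = 1.609344  # exact, by international definition


def miles_to_km(miles: float) -> float:
    """Unit conversion so that distances from both sources use kilometers."""
    return miles * KM_PER_MILE


def hourly_to_daily(readings: list[tuple[datetime, float]]) -> dict:
    """Temporal mapping: average hourly readings into one value per calendar day."""
    by_day = defaultdict(list)
    for timestamp, value in readings:
        by_day[timestamp.date()].append(value)
    return {day: sum(values) / len(values) for day, values in by_day.items()}
```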


ETL: Semantic Mappings

- Spatio-temporal mappings
  Locations in space-time
- Even more complex mappings
  May require the use of ontologies:
  - shared vocabularies
  - knowledge structures
  - models of reality
  - etc.


Dimensional Modeling

Fact tables
- Contain values that are measures, usually numeric.
- Example: the number of sales

Dimension tables
- Contain the context for the measures.
- Examples: time, location, product
- Dimensions are usually grouped and hierarchical.
  Example: western locations, eastern locations
  Example: yearly, quarterly, monthly, weekly, daily, hourly

Often denormalized for query performance: many queries, few updates.


Dimensional Modeling

Design criteria

- What are the facts? What are we measuring?
  Example: number of sales
- What is the grain, or granularity, of the facts?
  Determined by the dimensions. All measurements in a fact table must be at the same grain.
  Example: sales figures collected at the point of sale
- What are the dimensions? What context do we need to provide for the measures in the fact table?
  Examples: stores, dates, products


Dimensional Modeling

Implementation: star schema

Measures: number of units sold
Dimensions: date, store, product
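This star schema can be sketched with Python's built-in `sqlite3` module: one fact table holding the measure (units sold) with foreign keys to the date, store, and product dimension tables. The table and column names and the sample rows are hypothetical, and the daily-per-store-per-product grain is a choice made for the sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE date_dim    (date_key INTEGER PRIMARY KEY, day TEXT, month INTEGER,
                          quarter TEXT, year INTEGER);
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Fact table: one row per date, store, and product (the grain), plus the measure.
CREATE TABLE sales_fact (
    date_key    INTEGER REFERENCES date_dim(date_key),
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    units_sold  INTEGER NOT NULL,
    PRIMARY KEY (date_key, store_key, product_key)
);
"""

# Typical star-join query: the measure grouped by attributes of two dimensions.
STAR_JOIN = """
SELECT d.quarter, s.country, SUM(f.units_sold)
FROM sales_fact f
JOIN date_dim d  ON f.date_key = d.date_key
JOIN store_dim s ON f.store_key = s.store_key
GROUP BY d.quarter, s.country
ORDER BY d.quarter, s.country
"""


def build_demo_warehouse() -> sqlite3.Connection:
    """Create an in-memory star schema filled with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?, ?, ?)",
                     [(1, "2013-01-15", 1, "Q1", 2013), (2, "2013-04-10", 4, "Q2", 2013)])
    conn.executemany("INSERT INTO store_dim VALUES (?, ?, ?)",
                     [(1, "Toronto", "Canada"), (2, "Chicago", "USA")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Laptop", "computer"), (2, "TV", "home entertainment")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 5), (1, 2, 2, 3), (2, 1, 2, 7)])
    return conn
```

Running `build_demo_warehouse().execute(STAR_JOIN).fetchall()` on this data returns `[("Q1", "Canada", 5), ("Q1", "USA", 3), ("Q2", "Canada", 7)]`. The dimension tables are small and wide, and each query joins them to the single large fact table, which gives the schema its star shape.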


Online Analytical Processing (OLAP)

A common type of business analysis, also used to analyze scientific data.

- Visualize data in a multidimensional manner.
- Analytical processes that involve manipulating data along different dimensions.
- The OLAP cube.

“What happened recently, and why?”

http://gerardnico.com/wiki/database/oracle/oracle_olap


Online Analytical Processing (OLAP)

OLAP operations:
- slice and dice
- drill up, drill down
- drill across, drill through
- pivot


Online Analytical Processing (OLAP)

Slice: View or manipulate the data along a subset of the dimensions.

Consider only data from the first quarter.

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
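A slice can be sketched in Python over a toy cube stored as a dictionary that maps (quarter, city, item) coordinates to units sold. The cell values are made up, and `slice_cube` is a hypothetical helper, not part of any OLAP product's API. Fixing one dimension at a single value, as in "first quarter only", yields a cube with one fewer dimension.

```python
# Toy cube: (quarter, city, item) -> units sold. Illustrative numbers only.
CUBE = {
    ("Q1", "Toronto", "computer"): 10,
    ("Q1", "Vancouver", "phone"): 4,
    ("Q2", "Toronto", "computer"): 12,
}


def slice_cube(cube: dict, axis: int, value: str) -> dict:
    """Keep only cells whose coordinate on `axis` equals `value`,
    and drop that axis from the keys (the result has one fewer dimension)."""
    return {key[:axis] + key[axis + 1:]: units
            for key, units in cube.items() if key[axis] == value}
```

Here `slice_cube(CUBE, 0, "Q1")` returns `{("Toronto", "computer"): 10, ("Vancouver", "phone"): 4}`.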


Online Analytical Processing (OLAP)

Dice: View or manipulate the data within subsets of the ranges of the dimensions.

Consider only data from Q1 and Q2, from only Toronto and Vancouver, for only computers and home entertainment.

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
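A dice can be sketched on the same kind of toy dictionary cube (made-up values; `dice_cube` is a hypothetical helper). Unlike a slice, every dimension is kept, but each one is restricted to a subset of its values, here the Q1/Q2, Toronto/Vancouver, computer/home entertainment subsets from the example above.

```python
# Toy cube: (quarter, city, item) -> units sold. Illustrative numbers only.
CUBE = {
    ("Q1", "Toronto", "computer"): 10,
    ("Q2", "Vancouver", "home entertainment"): 7,
    ("Q3", "Toronto", "computer"): 12,
    ("Q1", "Chicago", "computer"): 9,
    ("Q2", "Toronto", "phone"): 3,
}

# One set of allowed values per axis: quarters, cities, items.
SLIDE_DICE = [{"Q1", "Q2"}, {"Toronto", "Vancouver"}, {"computer", "home entertainment"}]


def dice_cube(cube: dict, allowed: list[set]) -> dict:
    """Keep cells whose coordinate on every axis is in that axis's allowed set."""
    return {key: units for key, units in cube.items()
            if all(coord in allowed[axis] for axis, coord in enumerate(key))}
```

With this data, `dice_cube(CUBE, SLIDE_DICE)` keeps only the Q1 Toronto computer cell and the Q2 Vancouver home entertainment cell.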


Online Analytical Processing (OLAP)

Drill down: View or manipulate a dimension at a lower level of detail.

Drill down on the time dimension from quarters to months.

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
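Drilling down only works if the finer-grained data exists. A minimal Python sketch, assuming the base facts are kept at the month grain as (month, city) to units sold with made-up numbers: the quarterly view is computed from them, and drilling down into a quarter returns the monthly cells behind it. The helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical base facts at the month grain: (month, city) -> units sold.
MONTHLY_FACTS = {(1, "Toronto"): 10, (2, "Toronto"): 20, (3, "Toronto"): 30,
                 (4, "Toronto"): 40, (1, "Vancouver"): 5, (5, "Vancouver"): 15}


def quarter_of(month: int) -> str:
    """Time hierarchy: month (1-12) -> quarter label."""
    return f"Q{(month - 1) // 3 + 1}"


def quarterly_view(facts: dict) -> dict:
    """Coarser view: total units per (quarter, city)."""
    view = defaultdict(int)
    for (month, city), units in facts.items():
        view[(quarter_of(month), city)] += units
    return dict(view)


def drill_down_to_months(facts: dict, quarter: str) -> dict:
    """Drill down: show the monthly cells inside `quarter` instead of its total.
    The detail comes from the base facts; it cannot be recovered from quarterly totals."""
    return {(month, city): units for (month, city), units in facts.items()
            if quarter_of(month) == quarter}
```

Here the quarterly view shows 60 units for Toronto in Q1, and drilling down into Q1 shows the 10, 20, and 30 units for January, February, and March.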


Online Analytical Processing (OLAP)

Drill up: “Roll up” (aggregate) data to a higher level along a dimension.

Sum up the cities by country.

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
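Rolling up the cities by country can be sketched in Python by replacing each city coordinate with its parent in the location hierarchy and summing the measure. The city-to-country mapping reflects real geography, but the cube values are made up and `roll_up` is a hypothetical helper.

```python
from collections import defaultdict

# Location hierarchy: city -> country.
CITY_TO_COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada",
                   "Chicago": "USA", "New York": "USA"}

# Toy cube: (quarter, city, item) -> units sold. Illustrative numbers only.
CUBE = {
    ("Q1", "Toronto", "computer"): 10,
    ("Q1", "Vancouver", "computer"): 5,
    ("Q1", "Chicago", "computer"): 8,
    ("Q1", "New York", "computer"): 2,
}


def roll_up(cube: dict, axis: int, parent_of: dict) -> dict:
    """Replace the coordinate on `axis` with its parent level and sum the measure."""
    result = defaultdict(int)
    for key, units in cube.items():
        parent_key = key[:axis] + (parent_of[key[axis]],) + key[axis + 1:]
        result[parent_key] += units
    return dict(result)
```

With this data, `roll_up(CUBE, 1, CITY_TO_COUNTRY)` gives 15 units for Canada and 10 for the USA in Q1. Summing is right for an additive measure such as units sold; a measure like an average would need a different aggregate.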


Online Analytical Processing (OLAP)

Drill across: Integrate data from more than one fact table.

Drill through: Access the database tables that underlie the OLAP cube.
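Drill across can be sketched in Python as a full outer join of two fact tables on the dimensions they share. Both fact tables (units sold and units returned, keyed by quarter and city) and their numbers are hypothetical, and the sketch assumes the shared dimensions are conformed, that is, both tables use the same keys and values.

```python
# Two hypothetical fact tables with conformed dimensions: (quarter, city) -> measure.
UNITS_SOLD = {("Q1", "Toronto"): 120, ("Q1", "Vancouver"): 80, ("Q2", "Toronto"): 150}
UNITS_RETURNED = {("Q1", "Toronto"): 6, ("Q2", "Toronto"): 9}


def drill_across(*fact_tables: dict) -> dict:
    """Combine the measures of several fact tables cell by cell on their shared keys.
    A cell missing from a table gets None, as in a full outer join."""
    keys = sorted(set().union(*fact_tables))
    return {key: tuple(table.get(key) for table in fact_tables) for key in keys}
```

Here `drill_across(UNITS_SOLD, UNITS_RETURNED)` maps ("Q2", "Toronto") to (150, 9) and ("Q1", "Vancouver") to (80, None).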


Online Analytical Processing (OLAP)

Pivot: Rotate the axes (dimensions) to present a different view.

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf
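For a two-dimensional view, a pivot just swaps the row and column axes. A minimal Python sketch, assuming the view is a dictionary keyed by (row, column) with made-up values; `pivot` and `render` are hypothetical helpers, and `render` prints the view as a tab-separated grid so the rotation is visible.

```python
def pivot(view: dict) -> dict:
    """Swap the two axes of a 2-D view keyed by (row, column)."""
    return {(col, row): value for (row, col), value in view.items()}


def render(view: dict) -> str:
    """Format a 2-D view as a tab-separated grid with sorted rows and columns."""
    rows = sorted({r for r, _ in view})
    cols = sorted({c for _, c in view})
    lines = ["\t".join([""] + cols)]
    for r in rows:
        lines.append("\t".join([r] + [str(view.get((r, c), "")) for c in cols]))
    return "\n".join(lines)


# Items as rows and cities as columns (illustrative numbers only).
VIEW = {("computer", "Toronto"): 10, ("computer", "Vancouver"): 20, ("phone", "Toronto"): 5}
```

`render(VIEW)` shows items down the side and cities across the top; `render(pivot(VIEW))` shows cities down the side and items across the top, with the same cell values.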


OLAP Summary

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf


DW Summary

http://www.csun.edu/~twang/595DM/Slides/Week6.pdf

Plus: dashboards and scorecards


Cognos

Business intelligence (BI) tool from IBM:
- queries and reports
- dashboards and scorecards
- OLAP
- data mining (predictive analysis)

Cognos Business Intelligence 10 is available in the IBM Academic Cloud, along with a sample data warehouse. I will create student accounts.
- Online tutorials
- Cognos demo