Zhangxi Lin Texas Tech University ISQS 6347, Data & Text Mining 1 ISQS 6339 Data Management and Business Intelligence Database Review


Post on 14-Dec-2015



Zhangxi Lin

Texas Tech University

ISQS 6347, Data & Text Mining

ISQS 6339 Data Management and Business Intelligence

Database Review


Attributes of Data

Sharable. Moveable. Secure. Accurate. Timely. Relevant.


Data Hierarchy

Bits. Characters. Fields (columns). Records (rows). Files (tables). Database.


Why Build a Database?

Handle large amounts of data.

Satisfy multiple users.

Make information retrieval faster.

Make data input faster.

Provide greater accuracy.


Database versus Database Management System (DBMS)

A database is a self-describing collection of integrated files.

A DBMS is a complex computer program that acts as a data librarian, supervising the transfer of data between the end user and the database.
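One way to see what "self-describing" means in practice is to ask a database for its own structure. The sketch below assumes SQLite (through Python's standard sqlite3 module) as the DBMS; the customer table and its columns are made up for illustration and do not come from the slides.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT)")

# The database describes itself: its catalog lists the tables it contains...
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# ...and the columns of each table (the second field of each row is the column name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]

print(tables)
print(columns)
```

Here the Python program plays the end user and SQLite plays the librarian: the program never touches the stored file layout, it only sends requests to the DBMS.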


Relational Model

Relation? Attribute? Tuple?

Keys: primary and foreign.

Referential integrity.

Relational algebra. Relational calculus.
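The key concepts above can be sketched with two small tables. This is a minimal illustration, assuming SQLite through Python's sqlite3 module; the department and employee tables are hypothetical. Note that SQLite only enforces foreign keys after the PRAGMA shown below is turned on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# dept_id is the primary key of department and a foreign key in employee.
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES department(dept_id)
    )
""")

conn.execute("INSERT INTO department VALUES (1, 'Sales')")
conn.execute("INSERT INTO employee VALUES (10, 'Ann', 1)")  # department 1 exists

# Referential integrity: an employee may not point at a department that does not exist.
try:
    conn.execute("INSERT INTO employee VALUES (11, 'Bob', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(orphan_rejected, employee_count)
```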


Relational DB Rules

Every row must have exactly the same number of columns (fields or attributes).

Each row can have only one value stored in each column (fields or attributes).

A column must contain the same kind of value in every row of that column.

No two rows can be exactly the same.

The order of the rows or of the columns can’t be used to provide information.
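Two of these rules can be shown in a few lines. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical student table: a primary key is one common way to guarantee that no two rows are identical, and an explicit ORDER BY is how a query asks for a sequence instead of relying on storage order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(2, "Lee"), (1, "Kim")])

# No two rows can be exactly the same: the primary key rejects a repeated row.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Kim')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Row order carries no meaning, so the query states the order it wants.
names = [row[0] for row in conn.execute("SELECT name FROM student ORDER BY student_id")]
print(duplicate_rejected, names)
```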


Terminology

Data Processing    Informal Relational DB    Formal Relational DB
File               Table                     Relation
Record             Row                       Tuple
Field              Column                    Attribute


Normalization

Purpose:

Avoid anomalies.

Not delete something you wish to keep while deleting something you do not want to keep.

Not having to add something that is unnecessary while adding something that is necessary.

Reduce redundancy.

Process:

Successive application of rules.

Bottom-up (data drives process).

Move from first through fifth normal form.

Does it make more or fewer tables?
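The deletion anomaly described above can be reproduced with a small experiment. This is a sketch under assumptions: SQLite through Python's sqlite3 module, with hypothetical tables (flat_order, customer, purchase) and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: customer facts are repeated on every order row.
conn.execute(
    "CREATE TABLE flat_order (order_id INTEGER PRIMARY KEY, customer TEXT, city TEXT, item TEXT)")
conn.executemany("INSERT INTO flat_order VALUES (?, ?, ?, ?)", [
    (1, "Ann", "Lubbock", "desk"),
    (2, "Ann", "Lubbock", "lamp"),
    (3, "Bob", "Austin", "chair"),
])

# Deletion anomaly: removing Bob's only order also erases where Bob lives.
conn.execute("DELETE FROM flat_order WHERE order_id = 3")
bob_in_flat = conn.execute(
    "SELECT COUNT(*) FROM flat_order WHERE customer = 'Bob'").fetchone()[0]

# Normalized: one table per kind of fact, linked by customer_id.
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "CREATE TABLE purchase (order_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customer(customer_id), item TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Ann", "Lubbock"), (2, "Bob", "Austin")])
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?)",
                 [(1, 1, "desk"), (2, 1, "lamp"), (3, 2, "chair")])

# The same deletion now removes only the order; Bob's city survives.
conn.execute("DELETE FROM purchase WHERE order_id = 3")
bob_city = conn.execute("SELECT city FROM customer WHERE name = 'Bob'").fetchone()[0]
print(bob_in_flat, bob_city)
```

In this sketch one table became two, and Ann's city is stored once instead of once per order.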


Entity Relationship Modeling

List the entities or objects in the environment: people, things, transactions.

Describe the relationship between them:

A single row in table A can be related to how many rows in table B (one or many)?

A single row in table B can be related to how many rows in table A (one or many)?
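The two cardinality questions can be answered by counting related rows in each direction. The sketch below assumes SQLite through Python's sqlite3 module and a hypothetical student/course example; when the answer is "many" in both directions, a junction table (here called enrollment) is a common way to store the relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: each row records one student taking one course.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Data Mining');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
""")

# How many course rows can relate to a single student row?
courses_for_ann = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1").fetchone()[0]

# How many student rows can relate to a single course row?
students_in_databases = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_id = 10").fetchone()[0]

print(courses_for_ann, students_in_databases)
```

Both counts exceed one, so the relationship is many-to-many (M:M). If each student could take only one course, the course_id could sit directly in the student table as a foreign key and the junction table would not be needed.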


Example E/RD


SQL

Definition (DDL): CREATE, ALTER, DROP.

Manipulation (DML): SELECT, INSERT, UPDATE, DELETE.

The most used SQL command: SELECT.
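Each of the commands listed above appears once in the following sketch. It assumes SQLite through Python's sqlite3 module; the product table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change structure.
conn.execute("CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("ALTER TABLE product ADD COLUMN price REAL")

# DML: work with the rows.
conn.execute("INSERT INTO product VALUES (1, 'desk', 120.0)")
conn.execute("INSERT INTO product VALUES (2, 'lamp', 25.0)")
conn.execute("UPDATE product SET price = 30.0 WHERE product_id = 2")
conn.execute("DELETE FROM product WHERE product_id = 1")
rows = conn.execute("SELECT product_id, name, price FROM product").fetchall()

# DDL again: remove the structure itself, not just its rows.
conn.execute("DROP TABLE product")
remaining_tables = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]

print(rows, remaining_tables)
```

The distinction to notice is that DELETE removes rows and leaves the table in place, while DROP removes the table definition.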


General Format of SELECT

SELECT [DISTINCT] item(s)
FROM table(s)
[WHERE condition]
[GROUP BY column(s)]
[HAVING condition]
[ORDER BY column(s)]
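A query that uses every clause of the general format might look like the sketch below. It assumes SQLite through Python's sqlite3 module; the sale table and its amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sale (region, amount) VALUES (?, ?)", [
    ("East", 100.0), ("East", 250.0),
    ("West", 40.0), ("West", 30.0),
    ("North", 500.0), ("North", 5.0),
])

query = """
    SELECT region, SUM(amount) AS total   -- item(s)
    FROM sale                             -- table(s)
    WHERE amount >= 10                    -- filter rows before grouping
    GROUP BY region                       -- one result row per region
    HAVING SUM(amount) > 100              -- filter groups after aggregation
    ORDER BY total DESC                   -- sort the result
"""
totals = conn.execute(query).fetchall()

# DISTINCT collapses repeated values into one.
regions = [row[0] for row in conn.execute(
    "SELECT DISTINCT region FROM sale ORDER BY region")]

print(totals)
print(regions)
```

WHERE and HAVING both filter, but at different stages: WHERE drops the 5.0 sale before any totals are computed, and HAVING drops the West group because its total of 70.0 does not exceed 100.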


Case Study - IMW


Want a house? Find one from the web

Up to 50% of prospective American homebuyers use the Internet to search for new homes, encompassing more than 9% of households online, or about six million visitors to various real estate sites.

These users have accessed the real estate sites on an average of 1.8 days per month or a total of 13.9 minutes each day.

From 1996 to 2000, more than 400 business models were created across the entire real estate spectrum.


About IMW


Based in Austin, Texas, IMW (Internet Media Works!) is an ASP specializing mainly in web-based application development, database integration, and web development and hosting for all kinds of businesses.

IMW has been more successful in selling its e-business services for commercial real estate. Its services include lead generation, real estate transaction management, property listing, realtor membership management, real estate indices, real estate auctions, etc., with COMMREX as a complete e-business solution.

IMW used to have up to 6 full-time employees and a few part-time employees.


IMW’s Services

[Figure: IMW’s Services. Labeled components: Internet Service Provider’s Services; IMW’s Web-Based Application Services; Website Hosting Services; Core Membership Database Services; Core Property Listing Database Services; Optional Website Hosting Services; Optional Membership Database Services; Optional Property Listing Database Services; Public User Application Services; Networking and System Operation Services; Public User Support.]


IMW’s Data Model

[Figure: IMW’s Data Model, an entity-relationship diagram covering the Property Listing Database and the Membership Database. Labels shown: Property ID, Listor ID, Address, Property Type, City, Office, Chapter, Functions, Specializations, Company ID, Telephone #, Company Name, Listor Name, Feature, Type Name, Subtype 1 through Subtype n. Relationships are marked M:1 and M:M. Legend: primary key, secondary key, link to a table.]