Zhangxi Lin
Texas Tech University
ISQS 6347, Data & Text Mining
ISQS 6339 Data Management and Business Intelligence
Database Review
Attributes of Data
Sharable. Moveable. Secure. Accurate. Timely. Relevant.
Data hierarchy
Bits. Characters. Fields (columns). Records (rows). Files (tables). Database.
Why Build a Database?
Handle large amounts of data. Satisfy multiple users. Make information retrieval faster. Make data input faster. Provide greater accuracy.
Database versus Database Management System (DBMS)
A database is a self-describing collection of integrated files.
A DBMS is a complex computer program that acts as a data librarian, supervising the transfer of data between the end user and the database.
Relational Model
Relation? Attribute? Tuple?
Keys: primary and foreign.
Referential integrity.
Relational algebra. Relational calculus.
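Primary keys, foreign keys, and referential integrity can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The tables company and listor, their columns, and the sample rows are hypothetical names chosen for illustration, not part of the slides. Note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON is issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# Parent table: company_id is the primary key.
conn.execute("CREATE TABLE company (company_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Child table: company_id is a foreign key pointing at the parent.
conn.execute("""
    CREATE TABLE listor (
        listor_id  INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        company_id INTEGER NOT NULL REFERENCES company(company_id)
    )
""")

conn.execute("INSERT INTO company VALUES (1, 'Acme Realty')")   # hypothetical data
conn.execute("INSERT INTO listor VALUES (10, 'Pat', 1)")        # parent row exists: accepted


def orphan_is_rejected(connection):
    """Try to insert a listor whose company does not exist; report whether it was refused."""
    try:
        connection.execute("INSERT INTO listor VALUES (11, 'Sam', 99)")
    except sqlite3.IntegrityError:
        return True
    return False


orphan_rejected = orphan_is_rejected(conn)
listor_count = conn.execute("SELECT COUNT(*) FROM listor").fetchone()[0]
```

Here the DBMS, not the application, refuses the row that would point at a nonexistent company, which is the essence of referential integrity.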
Relational DB Rules
Every row must have exactly the same number of columns (fields or attributes).
Each row can have only one value stored in each column (fields or attributes).
A column must contain the same kind of value in every row of that column.
No two rows can be exactly the same.
The order of the rows or of the columns can’t be used to provide information.
Terminology
Data Processing    Informal Relational DB    Formal Relational DB
File               Table                     Relation
Record             Row                       Tuple
Field              Column                    Attribute
Normalization
Purpose:
Avoid anomalies.
Deletion: do not delete something you wish to keep while deleting something you do not want to keep.
Insertion: do not have to add something that is unnecessary while adding something that is necessary.
Reduce redundancy.
Process:
Successive application of rules. Bottom-up (data drives process). Move from first through fifth normal form. Does it make more or fewer tables?
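The deletion anomaly and the effect of normalization on the number of tables can be sketched as follows, using Python's built-in sqlite3 module. The table names (listing_flat, office, listing), columns, and sample rows are assumptions made up for this illustration. The sketch performs only a single decomposition step, not a full walk through the normal forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized: the office phone number is repeated on every property row.
conn.execute("""
    CREATE TABLE listing_flat (
        property_id INTEGER PRIMARY KEY, city TEXT, office TEXT, office_phone TEXT
    )
""")
conn.executemany(
    "INSERT INTO listing_flat VALUES (?, ?, ?, ?)",
    [(1, "Austin", "Downtown", "555-0100"),
     (2, "Austin", "Downtown", "555-0100"),   # redundant copy of the phone number
     (3, "Lubbock", "West", "555-0199")],
)

# Normalized: office facts are stored once; listings refer to them by key.
conn.execute("CREATE TABLE office (office_id INTEGER PRIMARY KEY, name TEXT UNIQUE, phone TEXT)")
conn.execute("""
    CREATE TABLE listing (
        property_id INTEGER PRIMARY KEY, city TEXT,
        office_id INTEGER REFERENCES office(office_id)
    )
""")
conn.execute("INSERT INTO office (name, phone) SELECT DISTINCT office, office_phone FROM listing_flat")
conn.execute("""
    INSERT INTO listing (property_id, city, office_id)
    SELECT f.property_id, f.city, o.office_id
    FROM listing_flat AS f JOIN office AS o ON o.name = f.office
""")

# Deleting the only West listing no longer loses the West office's phone number.
conn.execute("DELETE FROM listing WHERE property_id = 3")
office_count = conn.execute("SELECT COUNT(*) FROM office").fetchone()[0]
listing_count = conn.execute("SELECT COUNT(*) FROM listing").fetchone()[0]
west_phone = conn.execute("SELECT phone FROM office WHERE name = 'West'").fetchone()[0]
```

In this sketch one table became two, which suggests the answer to the question on the slide: normalization generally produces more tables, each with less redundancy.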
Entity Relationship Modeling
List the entities or objects in the environment: people, things, transactions.
Describe the relationships between them:
A single row in table A can be related to how many rows in table B (one or many)?
A single row in table B can be related to how many rows in table A (one or many)?
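The two cardinality questions above can be asked of sample data with a short sketch in Python's built-in sqlite3 module. The office and property tables and their rows are hypothetical. Sample data can only suggest a cardinality; the actual rule is a design decision about the business, so treat the result as a sanity check rather than a definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE office (office_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE property (
        property_id INTEGER PRIMARY KEY,
        office_id   INTEGER REFERENCES office(office_id)
    )
""")
conn.executemany("INSERT INTO office VALUES (?, ?)", [(1, "Downtown"), (2, "West")])
conn.executemany("INSERT INTO property VALUES (?, ?)", [(100, 1), (101, 1), (102, 2)])

# Question 1: one office row relates to at most how many property rows?
max_properties_per_office = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(*) AS n FROM property GROUP BY office_id)"
).fetchone()[0]

# Question 2: one property row relates to at most how many office rows?
max_offices_per_property = conn.execute(
    "SELECT MAX(n) FROM (SELECT COUNT(DISTINCT office_id) AS n FROM property GROUP BY property_id)"
).fetchone()[0]


def cardinality(max_related):
    """Label a side of the relationship as 'one' or 'many' from the observed maximum."""
    return "many" if max_related > 1 else "one"


office_to_property = cardinality(max_properties_per_office)
property_to_office = cardinality(max_offices_per_property)
```

With this data, an office relates to many properties while each property relates to one office, i.e. a one-to-many (1:M) relationship from office to property.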
Example E/RD
SQL
Definition (DDL): CREATE, ALTER, DROP.
Manipulation (DML): SELECT, INSERT, UPDATE, DELETE.
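Each of the DDL and DML commands listed above can be tried in a minimal sketch with Python's built-in sqlite3 module. The property table, its columns, and the sample values are hypothetical; other DBMSs support a richer ALTER than SQLite does, so only ADD COLUMN is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure.
conn.execute("CREATE TABLE property (property_id INTEGER PRIMARY KEY, city TEXT)")
conn.execute("ALTER TABLE property ADD COLUMN property_type TEXT")

# DML: work with the rows.
conn.execute("INSERT INTO property (property_id, city, property_type) VALUES (1, 'Austin', 'Office')")
conn.execute("INSERT INTO property (property_id, city, property_type) VALUES (2, 'Lubbock', 'Retail')")
conn.execute("UPDATE property SET city = 'Dallas' WHERE property_id = 2")
conn.execute("DELETE FROM property WHERE property_id = 1")
rows = conn.execute("SELECT property_id, city, property_type FROM property").fetchall()
conn.commit()

# DDL again: remove the structure (and its data) entirely.
conn.execute("DROP TABLE property")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```

DELETE removes rows but leaves the table in place, whereas DROP removes the table definition itself; that difference is the line between manipulation and definition.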
The most used SQL command: SELECT
General Format of SELECT
SELECT [DISTINCT] item(s)
FROM table(s)
[WHERE condition]
[GROUP BY column(s)]
[HAVING condition]
[ORDER BY column(s)]
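The general format can be exercised clause by clause in a small sketch using Python's built-in sqlite3 module. The property table, its columns (including price), and the sample rows are assumptions invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE property (
        property_id INTEGER PRIMARY KEY, city TEXT, property_type TEXT, price INTEGER
    )
""")
conn.executemany(
    "INSERT INTO property VALUES (?, ?, ?, ?)",
    [(1, "Austin", "Office", 500),
     (2, "Austin", "Retail", 300),
     (3, "Austin", "Office", 700),
     (4, "Lubbock", "Office", 200),
     (5, "Dallas", "Retail", 50)],
)

# DISTINCT collapses duplicate values; ORDER BY sorts the result.
distinct_cities = conn.execute(
    "SELECT DISTINCT city FROM property ORDER BY city"
).fetchall()

# WHERE filters rows before grouping; HAVING filters groups after grouping.
summary = conn.execute("""
    SELECT city, COUNT(*) AS n, SUM(price) AS total
    FROM property
    WHERE price >= 100
    GROUP BY city
    HAVING COUNT(*) >= 2
    ORDER BY total DESC
""").fetchall()
```

In the second query, WHERE drops the Dallas row, GROUP BY forms one group per remaining city, and HAVING then drops Lubbock because its group has only one row.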
Case Study - IMW
Want a house? Find one on the web
Up to 50% of prospective American homebuyers use the Internet to search for new homes, encompassing more than 9% of online households, or about six million visitors to various real estate sites.
These users have accessed the real estate sites on an average of 1.8 days per month or a total of 13.9 minutes each day.
From 1996 to 2000, more than 400 business models were created across the entire real estate spectrum.
About IMW
Based in Austin, Texas, IMW (Internet Media Works!) is an ASP specializing mainly in web-based application development, database integration, and web development and hosting for all kinds of businesses.
IMW has been more successful in selling its e-business services for commercial real estate. Its services include lead generation, real estate transaction management, property listing, realtor membership management, real estate indices, real estate auctions, etc., with COMMREX as a complete e-business solution.
IMW used to have up to 6 full-time employees and a few part-time employees.
[Figure: IMW’s Services. Labels shown: Internet Service Provider’s Services; IMW’s Web-Based Application Services; Website Hosting Services; Core Membership Database Services; Core Property Listing Database Services; Optional Website Hosting Services; Optional Membership Database Services; Optional Property Listing Database Services; Public User Application Services; Networking and System Operation Services; Public User Support.]
[Figure: IMW’s Data Model. Two parts: Property Listing Database and Membership Database. Labels shown: Property ID, Listor ID, Address, Property Type, City, Office, Chapter, Functions, Specializations, Company ID, Telephone #, Company Name, Listor Name, Feature, Type Name, Subtype 1, Subtype 2, ..., Subtype n. Relationships are marked M:1 and M:M. Legend: Primary Key, Secondary Key, Link to a table.]