EM7033 Data Management and Business Intelligence
Luca Cosmo
Database Systems: The Complete Book (2nd Edition). Hector Garcia-Molina, Jeffrey D. Ullman, Jennifer Widom. Prentice Hall
Business Intelligence: Data Mining and Optimization for Decision Making. Carl Vercellis. Wiley
Class hours: Wednesday 14:00-15:00
Thursday 15:45-17:15
Friday 15:45-17:15
Office hours: In my office in via Torino, send me an email for an
appointment!
EMAIL: [email protected]
Put [EM7033] as email subject prefix
Content of EM7033
Design of databases.
E/R model, relational model.
Database programming.
SQL, Relational algebra.
Introduction to Data Mining
Course Requirements
1. Exam type: written exam
2. Group activity: design of a database
Groups of (exactly) 4 students MUST register by sending an email by midnight on September 30th to [email protected] with
subject: EM7033 WG: GROUPNAME (write the name of the group, not GROUPNAME!)
content: ID (matricola) and name of each student in the group
Project will be assigned by October 5th
Presentations will be scheduled from October 13th
Each presentation will run about 20 minutes.
Up to 4 bonus points to be gained! Good luck ;-)
Example: Retention in the mobile phone industry
The marketing manager of a mobile phone company realizes that a large number of customers are discontinuing their service, leaving her company in favor of some competing provider.
Suppose that the marketing manager can rely on a budget adequate to pursue a customer retention campaign aimed at 2000 individuals out of a total customer base of 2 million people.
How should she go about choosing the customers to be contacted so as to optimize the effectiveness of the campaign?
The target group can be chosen as the 2000 people having the highest churn likelihood among the customers of high business value. (Not even that simple…)
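The selection rule above can be sketched as a single query. This is only an illustration under stated assumptions: the Customers table, its columns, the value threshold, and the five sample rows are all hypothetical, and a real campaign would first have to estimate the churn likelihood (which is why it is "not even that simple").

```python
import sqlite3

# Hypothetical customer table: business value and an already estimated churn likelihood.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (id INTEGER PRIMARY KEY, business_value REAL, churn_likelihood REAL)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, 900.0, 0.80),
        (2, 150.0, 0.95),  # most likely to churn, but low business value
        (3, 700.0, 0.60),
        (4, 850.0, 0.10),
        (5, 650.0, 0.75),
    ],
)

MIN_VALUE = 500.0  # assumed threshold for "high business value"
BUDGET = 2         # stands in for the 2000 contacts of the example

# Keep only high-value customers, then take those with the highest churn likelihood.
target_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM Customers WHERE business_value >= ? "
        "ORDER BY churn_likelihood DESC LIMIT ?",
        (MIN_VALUE, BUDGET),
    )
]
print(target_ids)
```

Customer 2 has the highest churn likelihood but is filtered out by the value threshold, so the campaign targets customers 1 and 5.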
Business Intelligence vs Intuitive approach
Business Intelligence
The main purpose of business intelligence systems is to provide knowledge workers with tools and methodologies that allow them to make effective and timely decisions.
Effective decisions. The application of rigorous analytical methods allows decision makers to rely on information and knowledge which are more dependable.
Timely decisions. The ability to rapidly react to the actions of competitors and to new market conditions is a critical factor in the success or even the survival of a company.
https://en.wikipedia.org/wiki/DIKW_Pyramid
DIKW Pyramid
Data
For a retailer, data refer to primary entities such as customers, points of sale and items, while sales receipts represent the commercial transactions.
Information
Information is the outcome of extraction and processing activities carried out on data, and it appears meaningful to those who receive it in a specific domain.
Knowledge
Information is transformed into knowledge when it is used to make decisions and develop the corresponding actions.
Wisdom
Wisdom is the ability to make a decision based on knowledge and previous experience.
What is Business Intelligence?
Business Intelligence is a set of methods, processes, architectures, applications, and technologies that gather and transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making (to drive business performance).
BI: A General Process
Data Gathering: The collection of raw data from different sources by different means.
Data Cleanse: The transformation of data into clean and standard models and formats.
Data Storage: The refined data are stored under a particular data model for quality management and for easy, fast access.
Data Analysis: The process involves analytical components, such as OLAP, data quality, data profiling, business rule analysis, and data mining, to extract information and knowledge.
Data Presentation: Results are presented and delivered in different human-readable formats to support decisions.
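The five stages of the general process can be sketched end to end as a toy pipeline. This is a minimal sketch, not a description of any real BI product: the function names, the Sales table, and the sample records are all assumptions made for illustration.

```python
import sqlite3

def gather():
    # Hypothetical raw sales records, with inconsistent formatting and a missing value.
    return [
        {"item": " Beer ", "amount": "3.50"},
        {"item": "beer", "amount": "4.00"},
        {"item": "Wine", "amount": None},  # incomplete record
        {"item": "wine", "amount": "6.50"},
    ]

def cleanse(raw):
    # Drop incomplete records and bring the rest to one standard format.
    clean = []
    for record in raw:
        if record["amount"] is None:
            continue
        clean.append((record["item"].strip().lower(), float(record["amount"])))
    return clean

def store(rows):
    # Store the refined data under a (very small) relational data model.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (item TEXT, amount REAL)")
    conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)
    return conn

def analyze(conn):
    # A simple aggregate query stands in for OLAP or data-mining components.
    return conn.execute(
        "SELECT item, SUM(amount) FROM Sales GROUP BY item ORDER BY item"
    ).fetchall()

def present(results):
    # Deliver the results in a human-readable form.
    return "\n".join(f"{item}: {total:.2f}" for item, total in results)

report = present(analyze(store(cleanse(gather()))))
print(report)
```

Each stage only consumes the output of the previous one, which is the point of the diagram: cleansing happens before storage, and analysis runs on the stored, refined data rather than on the raw sources.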
GATHER AND ORGANIZE DATA
The first source of data is internal to the enterprise:
Customers, products, transactions, warehouse, ...
Historical data
It is very important to organize data so that it is easily accessible and contains all the useful information (but no more, to avoid data overload).
DATABASES
Do You Know SQL?
Explain the difference between:
SELECT a
FROM R
WHERE a<10 OR a>=10;
and
SELECT b
FROM R;
R
a     b
5     20
10    30
20    40
…     …
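One way to explore the question is to run both queries on a small instance of R. Apart from the different column in the SELECT list, a difference that is easy to miss is NULL handling: a condition such as a<10 OR a>=10 looks always true, but it is unknown when a is NULL. The sketch below assumes one extra row with a NULL in column a, which is not part of the table shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?)",
    [(5, 20), (10, 30), (20, 40), (None, 50)],  # last row is an assumed NULL case
)

# For the NULL row the condition evaluates to unknown, so the row is not selected.
filtered = conn.execute("SELECT a FROM R WHERE a<10 OR a>=10").fetchall()

# Without a WHERE clause every row is returned.
unfiltered = conn.execute("SELECT b FROM R").fetchall()

print(len(filtered), len(unfiltered))
```

The filtered query returns three rows while the unfiltered one returns four, because the row with a NULL satisfies neither comparison.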
And How About These?
SELECT b
FROM R, S
WHERE R.b = S.b;
and
SELECT b
FROM R
WHERE b IN (SELECT b FROM S);
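A small experiment hints at how the two forms can differ. The tables and rows below are assumptions made for illustration. The select list of the join is written as R.b here, since an unqualified b is ambiguous when both R and S have a column named b, and many systems reject it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (a INTEGER, b INTEGER)")
conn.execute("CREATE TABLE S (b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 20), (2, 30)])
# The value 20 appears twice in S.
conn.executemany("INSERT INTO S VALUES (?, ?)", [(20, 100), (20, 200)])

# Join: one output row per matching (R tuple, S tuple) pair.
join_rows = conn.execute("SELECT R.b FROM R, S WHERE R.b = S.b").fetchall()

# IN subquery: each R tuple is tested once, so it appears at most once.
in_rows = conn.execute("SELECT b FROM R WHERE b IN (SELECT b FROM S)").fetchall()

print(join_rows, in_rows)
```

With this data the join returns the value 20 twice, once per matching tuple of S, while the IN version returns it once.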
Interesting Stuff About Databases
It used to be about boring stuff: employee records, bank records, etc.
Today, the field covers all the largest sources of data, with many new ideas.
Web search.
Data mining.
Scientific and medical databases.
ERPs.
More Interesting Stuff
Database programming centers around limited programming languages.
Only area where non-Turing-complete languages make sense.
Leads to very succinct programming, but also to unique query-optimization problems.
Still More …
You may not notice it, but databases are behind almost everything you do on the Web.
Google searches.
Queries at Amazon, eBay, etc.
And More…
Databases often have unique concurrency-control problems.
Many activities (transactions) at the database at all times.
Must not confuse actions, e.g., two withdrawals from the same account must each debit the account.
ACID properties
(Atomicity, Consistency, Isolation, Durability)
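The withdrawal example can be sketched with transactions in SQLite. This is a minimal illustration under stated assumptions: the Accounts table, the CHECK constraint, and the withdraw helper are hypothetical, and a single connection is used, so the sketch shows atomicity and consistency but does not exercise isolation between concurrent transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is an assumed consistency rule: no negative balances.
conn.execute(
    "CREATE TABLE Accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES (1, 100)")
conn.commit()

def withdraw(conn, account_id, amount):
    # "with conn" commits the transaction on success and rolls it back on an exception.
    with conn:
        conn.execute(
            "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )

# Two withdrawals from the same account: each one must debit the account.
withdraw(conn, 1, 30)
withdraw(conn, 1, 30)

# A withdrawal that would violate the constraint is rejected and leaves no trace.
try:
    withdraw(conn, 1, 500)
except sqlite3.IntegrityError:
    pass

balance = conn.execute("SELECT balance FROM Accounts WHERE id = 1").fetchone()[0]
print(balance)
```

The debit is expressed as balance = balance - amount inside the database rather than as a read followed by a write in the application, which is what lets the system, not the programmer, keep the two withdrawals from overwriting each other.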
What is a Data Model?
1. Mathematical representation of data.
Examples: relational model = tables; semistructured model = trees/graphs.
2. Operations on data.
3. Constraints.
A Relation is a Table
Beers (relation name)
name        manf
Kilkenny    Arthur Guinness
Bud Lite    Anheuser-Busch
Attributes = column headers (name, manf)
Tuples = rows
Schemas
Relation schema = relation name and attribute list. Optionally: types of attributes.
Example: Beers(name, manf) or Beers(name: string, manf: string)
Database = collection of relations.
Database schema = set of all relation schemas in the database.
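These definitions can be made concrete by declaring a relation and reading its schema back from the system. This is a sketch using SQLite, where TEXT plays the role of the string type; PRAGMA table_info and the sqlite_master catalog are SQLite-specific, and other systems expose schemas differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Relation schema: relation name plus attribute list, here with attribute types.
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")

# Attribute names and types of the relation Beers.
attributes = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Beers)")]

# Database schema: the set of all relation schemas; here, the list of relation names.
relations = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

print(relations, attributes)
```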
Why Relations?
Very simple model.
Often matches how we think about data.
Abstract model that underlies SQL, the most important database language today.
Our Running Example
Beers(_name_, manf)
Bars(_name_, addr, license)
Drinkers(_name_, addr, phone)
Likes(_drinker_, _beer_)
Sells(_bar_, _beer_, price)
Frequents(_drinker_, _bar_)
Underscores (_attr_) mark key attributes (tuples cannot have the same value in all key attributes).
Excellent example of a constraint.
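A key constraint can be declared in SQL and is then enforced by the system. The sketch below assumes that (bar, beer) is the key of Sells; the bar name and prices are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed key: a bar sells a given beer at exactly one price.
conn.execute(
    "CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL, PRIMARY KEY (bar, beer))"
)

insert = "INSERT INTO Sells VALUES (?, ?, ?)"
conn.execute(insert, ("Joe's Bar", "Bud Lite", 2.50))
# Same bar, different beer: the tuples differ in one key attribute, so this is allowed.
conn.execute(insert, ("Joe's Bar", "Kilkenny", 3.00))

# Same value in all key attributes: the system rejects the tuple.
try:
    conn.execute(insert, ("Joe's Bar", "Bud Lite", 2.75))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sells").fetchone()[0]
print(duplicate_rejected, row_count)
```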
Database Schemas in SQL
SQL is primarily a query language, for getting information from a database.
But SQL also includes a data-definition component for describing database schemas,
and data-manipulation instructions to insert, delete and modify the contents of tables.
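The three roles of SQL (definition, manipulation, query) can be seen side by side on the Beers relation. This is a small sketch run through SQLite; declaring name as the primary key and the changed manufacturer value are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: describe the schema of the relation.
conn.execute("CREATE TABLE Beers (name TEXT PRIMARY KEY, manf TEXT)")

# Data manipulation: insert, modify and delete tuples.
conn.execute("INSERT INTO Beers VALUES (?, ?)", ("Kilkenny", "Arthur Guinness"))
conn.execute("INSERT INTO Beers VALUES (?, ?)", ("Bud Lite", "Anheuser-Busch"))
conn.execute("UPDATE Beers SET manf = ? WHERE name = ?", ("Guinness", "Kilkenny"))
conn.execute("DELETE FROM Beers WHERE name = ?", ("Bud Lite",))

# Query: get information from the database.
remaining = conn.execute("SELECT name, manf FROM Beers").fetchall()
print(remaining)
```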