New Frontiers in Business Intelligence:
Distribution and Personalization
Matteo Golfarelli, University of Bologna - Italy
Summary
• The challenges of BI 2.0
• Motivating scenario and envisioned architecture: Business Intelligence Networks
• Distribution
  - Research issues
  - A mapping language
  - Query reformulation
• Personalization → Patrick Marcel
From BI 1.0 to BI 2.0
• Business intelligence (BI) transformed the role of computer science in companies from a technology for storing data into a discipline for detecting key business factors in a timely manner and for effectively solving strategic decision problems
• In the current changeable and unpredictable market scenarios, the needs of decision makers are evolving rapidly
• To meet the new, more sophisticated user needs, a new generation of BI systems (BI 2.0) has been emerging
Issues in BI 1.0
• Performance optimization (query plans, materialized views, indexing, etc.)
• Logical design
• Conceptual design methodologies and formalisms
• ETL modeling and automation
• Testing the DW
• ....
Issues in BI 2.0
• BI as a service
• On-demand BI
• Real-time BI
• Situational BI
• Collaborative BI
• Pervasive BI
• ....
Motivating scenario
• Companies today see cooperation as one of the major means of increasing flexibility and innovating so as to survive in today's uncertain and changing market
• Companies need strategic information about the outer world, for instance about trading partners and related business areas
• It is estimated that more than 80% of waste in inter-company and supply-chain processes is due to a lack of communication between the companies involved
Motivating scenario
• In such a distributed business scenario, where multiple partner companies/organizations cooperate towards a common goal, traditional BI systems are no longer sufficient to maximize the effectiveness of decision-making processes
• Two new significant requirements arise:
  - Cross-organization monitoring and decision making: accessing local information is no longer enough; users need to transparently and uniformly access information scattered across several heterogeneous BI platforms
  - Pervasive and personalized access to information: users require that information can be accessed easily and in a timely manner through devices with different computation and visualization capabilities, and with sophisticated and customizable presentations
Envisioned architecture
• Business Intelligence Network (BIN): a dynamic, collaborative network of peers, each hosting a local, autonomous BI platform
  1. Each peer relies on a local multidimensional schema that represents the peer's view of the business, and it offers monitoring and decision support functionalities to the other peers
  2. Users transparently access business information distributed over the network in a pervasive and personalized fashion
  3. Access is secure, depending on the access control and privacy policies adopted by each peer
  4. Participants are collaborative, though to different degrees
  5. Inclination to collaboration does not reduce the autonomy of participants, who are not subject to a shared schema
  6. A BIN is decentralized and scalable because the number of participants, the complexity of business models, and the workload can change
Envisioned architecture
[Figure: architecture of peer i in a Business Intelligence Network of peers 1..N. Each peer comprises a local BI platform, a local MD schema, mappings, and five components: query forwarding, query reformulation, query result reconciliation, local query processing, and access policies resolution.]
• Local query processing: interacts with the peer's BI platform to obtain results from the local data
• Query reformulation: uses the semantic mappings established towards the peer's neighbors to reformulate queries accordingly
• Query forwarding: applies query routing policies to select the most relevant peers to forward a query to
• Query result reconciliation: collects and integrates the results coming from the peers
• Access policies resolution: sets policies for data sharing depending on the degree of trust between participants
A typical user interaction
[Figure: a BIN whose peers are located in Milan, Bologna, Florence, Naples, and Rome, shown with their local DWs and DBs.]
1. A user formulates an OLAP query q by accessing the local multidimensional schema of her peer
2. She can annotate q with a preference that enables her to rank the returned information according to her specific interests
3. To enhance the decision-making process, q is forwarded to the network and reformulated on the other peers in terms of their own multidimensional schemata
4. Each involved peer locally processes the reformulated query and returns its (possibly partial or approximate) results to the querying peer
5. The results are integrated, ranked according to the preference expressed by the user, and returned to the user based on the lexicon used to formulate q
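The interaction described on these slides can be illustrated with a minimal Python sketch. It is a toy under strong assumptions, not the BIN implementation: the `Peer` class, the `bin_query` function, the attribute names `area`, `yr`, `total`, and the sample rows are all hypothetical. Mappings are reduced to plain attribute renamings, every peer is queried (no routing), and results are integrated by summing, which is only valid for a distributive operator such as SUM.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    """A toy peer: `rows` stands in for the local DW (hypothetical data)."""
    name: str
    rows: list
    # Hypothetical mapping: attribute of the querying peer -> local attribute.
    to_local: dict = field(default_factory=dict)

    def answer(self, group_by, measure):
        # Reformulate the query on the local vocabulary.
        local_group_by = [self.to_local.get(a, a) for a in group_by]
        local_measure = self.to_local.get(measure, measure)
        # Process the reformulated query locally (SUM grouped by the attributes).
        totals = {}
        for row in self.rows:
            key = tuple(row[a] for a in local_group_by)
            totals[key] = totals.get(key, 0) + row[local_measure]
        # Return results in the lexicon used by the querying peer.
        return [dict(zip(group_by, key), **{measure: value})
                for key, value in totals.items()]


def bin_query(peers, group_by, measure, preference):
    """Forward the query to all peers, integrate the results, and rank them."""
    merged = {}
    for peer in peers:
        for row in peer.answer(group_by, measure):
            key = tuple(row[a] for a in group_by)
            merged[key] = merged.get(key, 0) + row[measure]
    rows = [dict(zip(group_by, key), **{measure: value})
            for key, value in merged.items()]
    # The preference is modeled here as a numeric score (an assumption).
    return sorted(rows, key=preference, reverse=True)


rome = Peer("Rome", [
    {"region": "Lazio", "year": 2010, "cost": 100.0},
    {"region": "Lazio", "year": 2010, "cost": 50.0},
])
florence = Peer("Florence", [
    {"area": "Lazio", "yr": 2010, "total": 25.0},
    {"area": "Tuscany", "yr": 2010, "total": 10.0},
], to_local={"region": "area", "year": "yr", "cost": "total"})

for row in bin_query([rome, florence], ["region", "year"], "cost",
                     preference=lambda r: r["cost"]):
    print(row)
```

The querying user only ever sees `region`, `year`, and `cost`; the renaming to the Florence vocabulary happens inside `Peer.answer`, which mirrors the idea that results come back in the lexicon used to formulate q.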
Research issues
• Query reformulation on peers is a challenging task due to the presence of aggregation and to the possibility that information is represented at a different granularity in each peer
• To optimize query answering across the network, query routing strategies are needed that forward queries only to the most promising peers
• The strategic nature of the exchanged information and its multidimensional structure require advanced approaches for security
• Mechanisms for controlling data provenance and quality should be devised in order to provide users with information they can rely on
• A unified, integrated vision of the heterogeneous information collected must be returned to users through object fusion techniques
Query reformulation
• Mapping language:
  - Handling the asymmetry between dimensions and measures
  - Specifying the relationship between two attributes of different multidimensional schemata in terms of their granularity
  - Considering aggregation operators to avoid the risk of inconsistent query reformulations
  - Also expressing mappings at the instance level to transcode data
(Golfarelli et al., 2010)
Query reformulation
• Mapping language:
[Figure: the two multidimensional schemata used as a running example. HOSPITALIZATION @Rome: measures cost, durationOfStay; attributes organ, disease, date, week, month, year, ward, LHD, patient, birthDate, city, region, segment, gender. ADMISSIONS @Florence: measures totStayCost, totExamCost, totLength, numAdmissions; attributes diagnosis, category, date, month, year, ward, unit, patientCity, patientNation, patientBirthYear, patientGender.]
Query reformulation
• Mapping language: mappings can be annotated with encoding functions
[Figure: the HOSPITALIZATION @Rome and ADMISSIONS @Florence schemata connected by mappings labeled same, roll-up, equi-level, and drill-down.]
Query reformulation
[Figure: reformulation framework. The multidimensional schemata of peer i and peer j go through schema translation, the semantic mappings between them go through mapping translation into s-t tgds, and the OLAP query goes through query translation into a relational query.]
• Framework: to translate semantic mappings we use a logical formalism called source-to-target tuple generating dependencies, or s-t tgds (ten Cate & Kolaitis, 2010)
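For reference, a source-to-target tuple generating dependency has the following general shape in the standard data-exchange formulation. The symbols used here are the usual textbook notation and are not taken from the slides.

```latex
\forall \mathbf{x} \, \bigl( \varphi_S(\mathbf{x}) \rightarrow \exists \mathbf{y} \, \psi_T(\mathbf{x}, \mathbf{y}) \bigr)
```

Here \varphi_S is a conjunction of atoms over the source schema and \psi_T is a conjunction of atoms over the target schema.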
Example: Schema translation
[Figure: the HOSPITALIZATION @Rome and ADMISSIONS @Florence multidimensional schemata, each translated into the relational schema listed below.]
HOSPITALIZATION @Rome:
  HospFT(organ, disease, date, ward, patient, cost, durationOfStay)
  OrganDT(organ)
  DiseaseDT(disease)
  DateDT(date, week, month, year)
  WardDT(ward, LHD)
  PatientDT(patient, birthDate, city, region, segment, gender)
ADMISSIONS @Florence:
  AdmFT(diagnosis, date, ward, patientCity, patientBirthYear, patientGender, totStayCost, totExamCost, totLength, numAdmissions)
  DiagnosisDT(diagnosis, category)
  DateDT(date, month, year)
  WardDT(ward, unit)
  PatientCityDT(patientCity, patientNation)
  PatientBirthYearDT(patientBirthYear)
  PatientGenderDT(patientGender)
Example: Query translation
[Figure: the HOSPITALIZATION @Rome multidimensional schema.]
• Total hospitalization costs by region and year
π_{region,year,SUM(cost)} (HospFT ⋈ DateDT ⋈ PatientDT)
q(R,Y,SUM(C)) ← HospFT(_,_,D,_,P,C,_), DateDT(D,_,_,Y), PatientDT(P,_,_,R,_,_)
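The query on this slide (total hospitalization cost by region and year) can be tried on a relational engine. The sketch below uses Python's built-in `sqlite3` module; the column types, the helper names `build_demo_db` and `total_cost_by_region_and_year`, and all sample rows are assumptions made for illustration. Only the three tables that the query touches are created.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE DateDT (date TEXT PRIMARY KEY, week TEXT, month TEXT,
                             year INTEGER);
        CREATE TABLE PatientDT (patient TEXT PRIMARY KEY, birthDate TEXT,
                                city TEXT, region TEXT, segment TEXT,
                                gender TEXT);
        CREATE TABLE HospFT (organ TEXT, disease TEXT, date TEXT, ward TEXT,
                             patient TEXT, cost REAL, durationOfStay INTEGER);
    """)
    conn.executemany("INSERT INTO DateDT VALUES (?, ?, ?, ?)", [
        ("2010-01-05", "2010-W01", "2010-01", 2010),
        ("2011-03-02", "2011-W09", "2011-03", 2011),
    ])
    conn.executemany("INSERT INTO PatientDT VALUES (?, ?, ?, ?, ?, ?)", [
        ("p1", "1970-01-01", "Rome", "Lazio", "s1", "F"),
        ("p2", "1980-05-05", "Viterbo", "Lazio", "s2", "M"),
        ("p3", "1990-09-09", "Perugia", "Umbria", "s1", "F"),
    ])
    conn.executemany("INSERT INTO HospFT VALUES (?, ?, ?, ?, ?, ?, ?)", [
        ("heart", "arrhythmia", "2010-01-05", "w1", "p1", 100.0, 3),
        ("lung", "pneumonia", "2010-01-05", "w2", "p2", 50.0, 2),
        ("heart", "arrhythmia", "2011-03-02", "w1", "p1", 70.0, 1),
        ("lung", "pneumonia", "2011-03-02", "w2", "p3", 30.0, 4),
    ])
    return conn


def total_cost_by_region_and_year(conn):
    """SQL counterpart of the projection/aggregation over the three-way join."""
    return conn.execute("""
        SELECT p.region, d.year, SUM(f.cost)
        FROM HospFT AS f
        JOIN DateDT AS d ON f.date = d.date
        JOIN PatientDT AS p ON f.patient = p.patient
        GROUP BY p.region, d.year
        ORDER BY p.region, d.year
    """).fetchall()


for row in total_cost_by_region_and_year(build_demo_db()):
    print(row)
```

The two joins correspond to the shared variables D and P in the Datalog-style rule, and the GROUP BY columns correspond to the head variables R and Y.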
Example: Mapping translation
[Figure: the HOSPITALIZATION @Rome and ADMISSIONS @Florence schemata with a same mapping highlighted between them.]
∀S,E,C (AdmFT(_,...,S,E,_,_), C=S+E → HospFT(_,...,C,_))
Example: The rewriting
[Figure: the ADMISSIONS @Florence multidimensional schema.]
• The group-by is reformulated using the roll-up mapping from region to patientCity, while measure cost is derived using the same mapping
π_{year,patientCity,SUM(totStayCost+totExamCost)} (AdmFT ⋈ DateDT ⋈ PatientCityDT)
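The rewriting step can be mimicked with a small lookup-based sketch in Python. This is only an illustration under stated assumptions, not the reformulation algorithm of the cited work: the `Query` class and the `reformulate` and `render` helpers are hypothetical, the two mappings for region and cost are the ones named on this slide, and the entry for year is an assumption, since the text does not say which mapping covers it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    group_by: tuple   # group-by attributes
    measure: str      # expression aggregated with SUM
    tables: tuple     # relations joined by the query


# Mappings from HOSPITALIZATION @Rome to ADMISSIONS @Florence.
ATTRIBUTE_MAPPINGS = {
    "region": ("roll-up", "patientCity"),   # stated on the slide
    "year": ("equi-level", "year"),         # assumption: not stated in the text
}
MEASURE_MAPPINGS = {
    "cost": ("same", "totStayCost+totExamCost"),  # from the mapping translation
}
TARGET_TABLES = ("AdmFT", "DateDT", "PatientCityDT")


def reformulate(query, attr_map, measure_map, target_tables):
    """Replace each attribute and the measure by its mapped counterpart."""
    group_by = tuple(attr_map[a][1] for a in query.group_by)
    measure = measure_map[query.measure][1]
    return Query(group_by, measure, target_tables)


def render(query):
    """Print the query in the algebra-like notation used on the slides."""
    columns = ",".join(query.group_by)
    joins = " ⋈ ".join(query.tables)
    return f"π {columns},SUM({query.measure}) ({joins})"


source = Query(("region", "year"), "cost", ("HospFT", "DateDT", "PatientDT"))
target = reformulate(source, ATTRIBUTE_MAPPINGS, MEASURE_MAPPINGS, TARGET_TABLES)
print(render(source))
print(render(target))
```

A plain lookup is enough here because each attribute of the source query has exactly one mapped counterpart; the sketch ignores the mapping kind and does not check whether the aggregation operator is compatible with the mapping.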
Personalization
• The goal of personalization is to deliver information that is relevant to an individual or a group of individuals in the most appropriate format and layout
  - Recommendation: the system suggests new queries to support users in navigating the cube (Giacometti et al., 2009)
  - Personalized visualization: the user specifies constraints that are used to determine a preferred visualization according to a user profile (Bellatreche et al., 2005)
  - Ranking: query results are organized in a total or partial order so that the user visualizes only the "most relevant" tuples (Golfarelli et al., 2011)
  - Contextualization: the query is enhanced by adding predicates that depend on the context (Jerbi et al., 2008)
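Two of the techniques listed above, contextualization and ranking, can be illustrated with a short Python sketch. It makes simplifying assumptions that the cited approaches do not: the context is a set of equality predicates, the preference is a numeric score inducing a total order, and the function names and sample rows are hypothetical.

```python
import heapq


def contextualize(rows, context):
    """Keep only the rows that satisfy every context-dependent predicate.

    `context` maps an attribute to the value it must take (an assumption:
    only equality predicates are modeled).
    """
    return [row for row in rows
            if all(row.get(attr) == value for attr, value in context.items())]


def rank(rows, score, k):
    """Return the k rows with the highest score, best first."""
    return heapq.nlargest(k, rows, key=score)


results = [
    {"region": "Lazio", "year": 2010, "cost": 150.0},
    {"region": "Lazio", "year": 2011, "cost": 70.0},
    {"region": "Umbria", "year": 2011, "cost": 30.0},
]

in_context = contextualize(results, {"year": 2011})
print(rank(in_context, score=lambda row: row["cost"], k=1))
```

Filtering before ranking keeps the ranked list small; a partial order, as mentioned in the ranking item, would need a comparison-based notion of dominance instead of a single score.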
Thank you for your attention
Questions?
Related readings
• Abiteboul, S. Managing an XML warehouse in a P2P context. In Proc. CAISE, 2003
• Banek, M., Vrdoljak, V., Min Tjoa, A., & Skocir, Z. Automated integration of heterogeneous data warehouse schemata. IJDWM, 4(4), 2008
• Bellatreche, L., Giacometti, A., Marcel, P., Mouloudi, H., & Laurent, D. A personalization framework for OLAP queries. In Proc. DOLAP, 2005
• Chomicki, J. Preference formulas in relational queries. ACM TODS, 28(4), 2003
• Cui, Y., & Widom, J. Lineage Tracing for General Data Warehouse Transformations. JVLDB, 12(1), 2003
• da Silva, P.P., McGuinness, D.L., & McCool, R. Knowledge Provenance Infrastructure. IEEE Data Engineering Bulletin, 26(4), 2003
• Dubois, D., & Prade, H. On the use of aggregation operations in information fusion processes. International Journal on Fuzzy Sets and Systems, 142(1), 2004
• Georgiadis, P., Kapantaidakis, I., Christophides, V., Nguer, E. M., & Spyratos, N. Efficient rewriting algorithms for preference queries. In Proc. ICDE, 2008
• Giacometti, A., Marcel, P., & Negre, E. Recommending MDX Queries. In Proc. DaWaK, 2009
• Golfarelli, M., Rizzi, S., & Biondi, P. myOLAP: An approach to express and evaluate OLAP preferences. To appear in IEEE TKDE, 2011
• Golfarelli, M., Mandreoli, F., Penzo, W., Rizzi, S., & Turricchia, E. Towards OLAP Query Reformulation in Peer-to-Peer Data Warehousing. In Proc. DOLAP, 2010
• Halevy, A. Y., Ives, Z. G., Madhavan, J., Mork, P., Suciu, D., & Tatarinov, I. The Piazza Peer Data Management System. IEEE TKDE, 16(7), 2004
• Hoang, T. A. D., & Binh Nguyen, T. State of the art and emerging rule-driven perspectives towards service-based business process interoperability. In Proc. Int. Conf. on Computing and Communication Technologies, 2009
• Jerbi, H., Ravat, F., Teste, O., & Zurfluh, G. Management of context-aware preferences in multidimensional databases. In Proc. ICDIM, 2008
• Kalnis, P., Siong Ng, W., Chin Ooi, B., Papadias, D., & Tan, K.-L. An adaptive peer-to-peer network for distributed caching of OLAP results. In Proc. SIGMOD Conference, 2002
• Kehlenbeck, M., & Breitner, M. H. Ontology-based exchange and immediate application of business calculation definitions for online analytical processing. In Proc. DaWaK, 2009
• Kießling, W. Foundations of preferences in database systems. In Proc. VLDB, 2002
• Mandreoli, F., Martoglia, R., Penzo, W., & Sassatelli, S. SRI: exploiting semantic information for effective query routing in a PDMS. In Proc. ACM Int. Workshop on Web Information and Data Management, 2006
• Mecca, G., Papotti, P., & Raunich, S. Core Schema Mappings. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2009
• Stefanidis, K., Pitoura, E., & Vassiliadis, P. Modeling and Storing Context-Aware Preferences. In Proc. ADBIS, 2006
• Sung, S., Liu, Y., Xiong, H., & Ng, P. Privacy preservation for data cubes. Knowledge and Information Systems, 9(1), 2006
• Tatarinov, I., & Halevy, A. Y. Efficient Query Reformulation in Peer-Data Management Systems. In Proc. ACM SIGMOD Int. Conf. on Management of Data, 2004
• ten Cate, B., & Kolaitis, P. G. Structural characterizations of schema-mapping languages. Comm. ACM, 53(1), 2010
• Torlone, R. Two approaches to the integration of heterogeneous data warehouses. Int. Journ. on Distributed and Parallel Databases, 23(1), 2008
• Xin, D., Han, J., Cheng, H., & Li, X. Answering Top-k Queries with Multidimensional Selections: The Ranking Cube Approach. In Proc. VLDB, 2006