ESTEEM: Trust-aware P2P data integration
Carola Aiello, Tiziana Catarci, Diego Milano, Monica Scannapieco
Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”
Outline
• Previous projects
• ESTEEM objectives
• Research issues and directions of the unit
  • Data quality: quality-aware query processing
  • Privacy: privacy-aware record matching
  • Trust: trust model for data sources
DaQuinCIS project (2003)
• MIUR – COFIN/PRIN
• Main focus: data quality in cooperative information systems (CISs)
• Data quality problems:
  • Record matching
  • Quality-driven query processing
Motivations
• A real example: an e-Government project to integrate data about Italian companies
[Figure: a client poses the query "Company XYZ?" to a data integration layer built over three sources: Chambers of Commerce, Social Insurance Agency, Accident Insurance Agency.]
[Figure: the three sources (Chambers of Commerce, Social Insurance Agency, Accident Insurance Agency) each hold company records with the attributes Id, Name, Type of activity, Address, City.]
The Three Real Records

| ID | Type of Activity | City | Name | Address |
|---|---|---|---|---|
| CNCBTB765SDV | Retail of bovine and ovine meats | Novi Ligure | Meat production of Bartoletti Benito | National Street dei Giovi |
| 0111232223 | Grocer’s shop, beverages | Pizzolo Formigaro | Bartoletti Benito Meat production | 9, Rome Street |
| CNCBTR765LDV | Butcher | Ovada | Meat production in Piemonte of Bartoletti Benito | 4, Mazzini Square |

Which is the actual company XYZ to be returned to the client?
• One of the 3? Which one?
• A “merge” of the 3?
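To make the question concrete, the following is a minimal sketch of how the three records above could be flagged as candidates for the same company. It is not the DaQuinCIS record matching algorithm: the token-based Jaccard similarity on the Name field and the `THRESHOLD` value are assumptions chosen only for illustration.

```python
from itertools import combinations

# The three records from the slide (only the fields used here).
records = [
    {"id": "CNCBTB765SDV", "name": "Meat production of Bartoletti Benito"},
    {"id": "0111232223", "name": "Bartoletti Benito Meat production"},
    {"id": "CNCBTR765LDV", "name": "Meat production in Piemonte of Bartoletti Benito"},
]

THRESHOLD = 0.5  # hypothetical cut-off, not taken from the slides


def tokens(text):
    """Lower-cased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between the word sets of two strings."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb)


def candidate_matches(recs, threshold=THRESHOLD):
    """Return (id1, id2, score) for every pair whose names are similar enough."""
    pairs = []
    for r1, r2 in combinations(recs, 2):
        score = jaccard(r1["name"], r2["name"])
        if score >= threshold:
            pairs.append((r1["id"], r2["id"], round(score, 2)))
    return pairs


matches = candidate_matches(records)
for m in matches:
    print(m)
```

Under these assumptions all three pairs exceed the threshold, so the three records are candidates for the same real-world company even though their IDs, cities, and addresses differ.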
Objectives of the Research
Given a set of distributed and heterogeneous data sources that are affected by data quality problems:
1. Improve the quality of each data source
   • Record matching across sources
2. Provide unified and transparent access to the data sources
   • Data integration and quality-driven query processing
Improving quality of addresses in Italian PA (2004)
• Collaboration agreement between AIPA (now CNIPA) and ISTAT, April 2002 to July 2004
• Proposal of standard formats for the acquisition and exchange of addresses
• Proposal to redesign the flows for updating addresses
• Methodology for measuring the quality of addresses
• Experimental measurement of address quality in three national archives:
  • Agenzia delle Entrate
  • Camere di Commercio (Chambers of Commerce)
  • INPS
Data Quality and Data Privacy (Current)
• Joint activity with Purdue University, Indiana, USA
• Publishing elementary data may violate privacy requirements, even when the data are anonymized
  • Anonymization removes principal identifiers such as SSN, Name+Surname+DOB, etc.
• Privacy-aware record matching
  • Only the result of the intersection (A ∩ B) across the data sets is shared and nothing else (neither A − (A ∩ B) nor B − (A ∩ B))
ESTEEM Objectives
• Study of trust and data quality issues in P2P systems
• Specification of P2P data integration systems with trust requirements
• Definition of quality- and trust-aware query processing algorithms
P2P Systems
• P2P systems are loosely coupled, dynamic, and open
• Data sharing in such systems:
  • No centralized global schema
  • Peer mappings are built dynamically
  • New peers can make new data schemas available
Data Quality
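EmployeeS1

| EmployeeID | Name | Surname | Salary | Email |
|---|---|---|---|---|
| arpa78 | John | Smith | 2600 | [email protected] |
| eugi98 | Edward | Monroe | 1500 | [email protected] |
| ghjk09 | Anthony | White | 1250 | [email protected] |
| dref43 | Marianne | Collins | 1150 | [email protected] |

EmployeeS2

| EmployeeID | Name | Surname | Salary | Email |
|---|---|---|---|---|
| arpa78 | John | Smith | 2000 | [email protected] |
| eugi98 | Edward | Monroe | 1500 | [email protected] |
| ghjk09 | Anthony | Wite | 1250 | [email protected] |
| treg23 | Marianne | Collins | 1150 | [email protected] |

• Attribute conflict: the same key carries different attribute values in the two sources (e.g. arpa78 has Salary 2600 in EmployeeS1 and 2000 in EmployeeS2; ghjk09 has Surname White in EmployeeS1 and Wite in EmployeeS2)
• Key conflict: the same employee carries different keys in the two sources (Marianne Collins is dref43 in EmployeeS1 and treg23 in EmployeeS2)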
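The two kinds of conflict can be detected mechanically. The sketch below loads the EmployeeS1 and EmployeeS2 rows (the Email column is omitted) and reports both kinds. Using exact equality of the non-key attributes to find key conflicts is a simplifying assumption that stands in for a real record matching step.

```python
# Rows of the two sources, keyed by EmployeeID.
S1 = {
    "arpa78": {"Name": "John", "Surname": "Smith", "Salary": 2600},
    "eugi98": {"Name": "Edward", "Surname": "Monroe", "Salary": 1500},
    "ghjk09": {"Name": "Anthony", "Surname": "White", "Salary": 1250},
    "dref43": {"Name": "Marianne", "Surname": "Collins", "Salary": 1150},
}
S2 = {
    "arpa78": {"Name": "John", "Surname": "Smith", "Salary": 2000},
    "eugi98": {"Name": "Edward", "Surname": "Monroe", "Salary": 1500},
    "ghjk09": {"Name": "Anthony", "Surname": "Wite", "Salary": 1250},
    "treg23": {"Name": "Marianne", "Surname": "Collins", "Salary": 1150},
}


def attribute_conflicts(a, b):
    """Same key in both sources, but at least one attribute value differs."""
    conflicts = []
    for key in sorted(a.keys() & b.keys()):
        for attr in a[key]:
            if a[key][attr] != b[key][attr]:
                conflicts.append((key, attr, a[key][attr], b[key][attr]))
    return conflicts


def key_conflicts(a, b):
    """Different keys whose remaining attributes are identical (assumed same entity)."""
    conflicts = []
    for ka, ra in a.items():
        for kb, rb in b.items():
            if ka != kb and ra == rb:
                conflicts.append((ka, kb))
    return conflicts


attr_found = attribute_conflicts(S1, S2)
key_found = key_conflicts(S1, S2)
print(attr_found)
print(key_found)
```

The exact-equality test would miss a key conflict as soon as one attribute also differs, which is why approximate record matching is needed in practice.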
Quality-aware query processing - 1
• Key conflicts require the application of record matching techniques
• Attribute conflicts are solved by query-time conflict resolution techniques
• The resolution of such conflicts in P2P systems is an open issue:
  • Definition of a quality-aware semantics for query answering in P2P systems
  • Need to develop techniques for solving such conflicts according to the defined semantics
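As an illustration of query-time conflict resolution, here is a minimal sketch in which each attribute can be given its own resolution function. The function names (`resolve_max`, `resolve_majority`, `resolve`) and the choice of strategies are hypothetical and do not come from the slides, which leave the semantics open.

```python
from collections import Counter


def resolve_max(values):
    """Keep the largest of the conflicting values."""
    return max(values)


def resolve_majority(values):
    """Keep the most frequent value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


def resolve(candidates, strategies):
    """Merge records describing the same entity.

    candidates: list of dicts with the same attributes, one per source.
    strategies: maps an attribute name to the function used when values differ.
    """
    merged = {}
    for attr in candidates[0]:
        values = [c[attr] for c in candidates]
        if len(set(values)) == 1:
            merged[attr] = values[0]                # no conflict on this attribute
        else:
            merged[attr] = strategies[attr](values)  # conflict: apply the strategy
    return merged


# Two conflicting copies of the same employee record.
copy_1 = {"Name": "John", "Surname": "Smith", "Salary": 2600}
copy_2 = {"Name": "John", "Surname": "Smith", "Salary": 2000}

answer = resolve([copy_1, copy_2], {"Salary": resolve_max})
print(answer)
```

Keeping the strategy as a per-attribute parameter makes it possible to state the resolution policy in the query itself rather than fixing it in the integration layer.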
Quality-aware query processing - 2
• Query language supporting the specification of conflict resolution strategies
• Important in P2P systems: search space pruning on the basis of the quality characterization of sources
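Pruning by quality can be sketched as a filter applied before a query is forwarded. The `Source` class, the single `accuracy` score, and the threshold are all assumptions made for illustration; the slides do not specify how the quality of a source is characterized.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    accuracy: float  # hypothetical declared quality score in [0, 1]


def query_plan(sources, min_accuracy):
    """Drop sources below the quality threshold, then visit the best ones first."""
    kept = [s for s in sources if s.accuracy >= min_accuracy]
    return sorted(kept, key=lambda s: s.accuracy, reverse=True)


# Hypothetical peers and quality scores.
peers = [Source("PeerA", 0.95), Source("PeerB", 0.60), Source("PeerC", 0.80)]

plan = query_plan(peers, min_accuracy=0.75)
print([s.name for s in plan])
```

In this sketch PeerB is never contacted, which reduces both the number of messages and the number of conflicts that have to be resolved afterwards.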
Privacy
• How to protect privacy when sharing data?
• With sources S1 and S2 issuing the queries Q1 and Q2 respectively, at the end of the interaction:
  • S1 must learn the result of Q1 and nothing else
  • S2 must learn the result of Q2 and nothing else
[Figure: S1 sends query Q1 to S2 and receives the result of Q1; S2 sends query Q2 to S1 and receives the result of Q2.]
Privacy-aware Record Matching - 1
[Figure: two data sets A and B and their intersection A ∩ B.]
• Secure set intersection: (i) exact matching; (ii) not on records; (iii) expensive
• Private data sharing: (i) exact matching; (ii) schema-unaware
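The exact-matching limitation can be seen in a small sketch. Here two parties compare keyed digests of their values instead of the values themselves. This is only an illustration and not a secure set intersection protocol: the shared key `SHARED_KEY`, the normalization step, and the sample names are all assumptions, and a real protocol would need stronger guarantees.

```python
import hashlib
import hmac

SHARED_KEY = b"hypothetical-shared-secret"  # assumed to be agreed out of band


def blind(value, key=SHARED_KEY):
    """Keyed SHA-256 digest of a normalized value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


# Values held privately by each party (sample data).
party_a = ["John Smith", "Anthony White", "Marianne Collins"]
party_b = ["John Smith", "Anthony Wite", "Marianne Collins"]

# Each party publishes digests only; A maps its digests back to its own values.
a_blinded = {blind(v): v for v in party_a}
b_digests = {blind(v) for v in party_b}

shared = sorted(v for digest, v in a_blinded.items() if digest in b_digests)
print(shared)
```

"Anthony White" and "Anthony Wite" produce unrelated digests, so the misspelled record is not matched. Digest-based schemes of this kind only support exact matching, whereas record matching on dirty data needs approximate comparison.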
Privacy-aware Record Matching - 2
• Algorithms that enable privacy-aware record matching in P2P contexts
  • The third-party problem
  • First proposals: El Abbadi (ICDE 2006), but with exact matching only
Trust
• Trust is typically associated with a source as a whole
• Need for a finer-grained characterization
  • E.g.: the Ministero delle Finanze is reliable with respect to tax codes (Codici Fiscali)
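Trust Model for Data Sources - 1
• Previous proposals: trust associated with the whole organization (peer)
• Our proposal: trust associated with the pair <Organization, Data Type>

$$R(\langle Org_k, D \rangle) = \frac{\sum_{i=1}^{n} C_{i,k}^{D}}{|O(Org_k, D)|}$$

where $C_{i,k}^{D}$ is the number of <D, Org_k> complaints sent by Org_i, and $|O(Org_k, D)|$ is the number of D-exchanges of Org_k.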
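A minimal sketch of this score follows. It assumes, as the legends on the slide suggest, that the value for a pair <Org_k, D> is the total number of complaints about D received by Org_k divided by the number of D-exchanges of Org_k. The function name, the data layout, and the sample counts are hypothetical.

```python
def complaint_ratio(complaints, exchanges, org, data_type):
    """Complaints about `data_type` sent to `org`, per exchange of that data type.

    complaints: {(sender, target, data_type): count}
    exchanges:  {(org, data_type): number of exchanges}
    Returns None when the organization never exchanged that data type.
    """
    total = sum(
        count
        for (sender, target, d), count in complaints.items()
        if target == org and d == data_type
    )
    n_exchanges = exchanges.get((org, data_type), 0)
    if n_exchanges == 0:
        return None
    return total / n_exchanges


# Hypothetical counts: Org3 is the provider being evaluated.
complaints = {
    ("Org1", "Org3", "address"): 2,
    ("Org2", "Org3", "address"): 3,
    ("Org1", "Org3", "tax_code"): 0,
}
exchanges = {("Org3", "address"): 50, ("Org3", "tax_code"): 40}

r_address = complaint_ratio(complaints, exchanges, "Org3", "address")
r_tax = complaint_ratio(complaints, exchanges, "Org3", "tax_code")
print(r_address, r_tax)
```

The same organization gets a different value per data type, which is the point of characterizing trust at the level of <Organization, Data Type> rather than for the organization as a whole.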
Trust Model for Data Sources - 2
• Drawback: centralized
• Need for:
  • A decentralized model
  • A more flexible model (e.g. trust associated with views)
Trust Model for Data Sources - 3
• More general trust characterization based on the evaluation of a peer’s assertions on some metadata:
  • Data quality-aware: trust computed on the basis of the declared quality of the provided data
  • Privacy-aware: trust computed on the basis of the declared privacy level
    • Different roles for providers and consumers: e.g. a provider can decide not to release data if a requester is not privacy-trusted (or to adopt a specific technique)