Presentation: Data Quality and Big Data
Data Quality in the Era of Big Data
Roberto Leombruni, Università di Torino and Laboratorio Revelli
DATA ARCHIVING, DISSEMINATION AND REUSE: WHERE DO WE STAND?
Turin, Campus Luigi Einaudi, 23 November 2017
The importance of quality
NASA launched the space shuttle Challenger on 28 January 1986. Shortly after liftoff, a seal in one of the solid rocket boosters failed, leading to the destruction of the shuttle and the deaths of all seven crew members.
The Challenger disaster has been attributed to a problem with the quality of the data on which the launch was managed (Fisher & Kingma, 2001).
What we mean by data quality
«The fitness for use of information»
Martin Eppler
«The state of completeness, validity, consistency, timeliness and accuracy that makes data appropriate for a specific use»
Government of British Columbia
…
What we mean by data quality
relevance
accuracy
completeness
consistency
timeliness
accessibility
comparability
costs
…
And in the case of Big Data?
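Some of these dimensions can be turned into simple ratios computed on the data itself. The following is a minimal Python sketch under stated assumptions: the records, the field names (`birth_year`, `hire_date`, `updated`), the plausible-year range, the cross-field rule and the one-year freshness threshold are all hypothetical, and each function is only one possible way to operationalize its dimension. Accuracy is left out because it requires a trusted reference to compare against; relevance, accessibility, comparability and costs depend on the user and context rather than on the values alone.

```python
from datetime import date

# Hypothetical toy records; fields and rules below are illustrative assumptions.
RECORDS = [
    {"id": 1, "birth_year": 1970, "hire_date": date(1995, 3, 1), "updated": date(2017, 10, 1)},
    {"id": 2, "birth_year": None, "hire_date": date(2001, 6, 15), "updated": date(2016, 1, 10)},
    {"id": 3, "birth_year": 1985, "hire_date": date(1980, 1, 1), "updated": date(2017, 11, 1)},
    {"id": 4, "birth_year": 2090, "hire_date": None, "updated": date(2012, 5, 5)},
]


def completeness(records, fields):
    """Share of (record, field) cells that are not missing."""
    cells = [r[f] for r in records for f in fields]
    return sum(v is not None for v in cells) / len(cells)


def validity(records, low=1900, high=2017):
    """Share of non-missing birth years inside an assumed plausible range."""
    years = [r["birth_year"] for r in records if r["birth_year"] is not None]
    return sum(low <= y <= high for y in years) / len(years)


def consistency(records):
    """Share of records satisfying a cross-field rule: hire year after birth year."""
    pairs = [(r["birth_year"], r["hire_date"]) for r in records
             if r["birth_year"] is not None and r["hire_date"] is not None]
    return sum(hired.year > born for born, hired in pairs) / len(pairs)


def timeliness(records, as_of, max_age_days=365):
    """Share of records updated within max_age_days of the reference date."""
    return sum((as_of - r["updated"]).days <= max_age_days for r in records) / len(records)


if __name__ == "__main__":
    as_of = date(2017, 11, 23)  # reference date for timeliness (assumption)
    print(f"completeness: {completeness(RECORDS, ['birth_year', 'hire_date']):.2f}")
    print(f"validity:     {validity(RECORDS):.2f}")
    print(f"consistency:  {consistency(RECORDS):.2f}")
    print(f"timeliness:   {timeliness(RECORDS, as_of):.2f}")
```

On these toy records it prints completeness 0.75, validity 0.67, consistency 0.50 and timeliness 0.50.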
The costs of poor accuracy
More on the costs of poor quality
And what about the other costs?
No «fitness for use» → bad decisions
(the garbage in, garbage out principle)
An example from social research: WHIP-Salute
The database tracks the main events of individuals' working careers
WHIP stands for Work Histories Italian Panel
[Figure: example timeline of one individual's working career, 1985 to 2010, with spells of dependent work, self employment, s.s. provision and pension, plus injury and hospitalization events]
It is based on administrative data collected by the Italian National Institute for Social Security (INPS), the National Institute for Work Injuries Insurance (INAIL), the Ministry of Welfare and the National Institute of Statistics (ISTAT).
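A working career of this kind can be thought of as a set of dated spells. As a minimal sketch, assuming a hypothetical `Spell` record (not WHIP's actual schema) and an invented example career, the Python below stores spells and flags those that start before an earlier spell of the same kind has ended. Such overlaps are a consistency signal to review rather than a certain error, since a person can hold two jobs at once.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Spell:
    """One episode of a working career (hypothetical schema)."""
    kind: str    # e.g. "dependent work", "self employment", "pension"
    start: date
    end: date    # inclusive end date


def overlapping(spells, kind):
    """Return spells of `kind` that start before an earlier spell of the same kind has ended."""
    same = sorted((s for s in spells if s.kind == kind), key=lambda s: s.start)
    flagged = []
    latest_end = None  # latest end date seen so far among earlier spells
    for s in same:
        if latest_end is not None and s.start <= latest_end:
            flagged.append(s)
        latest_end = s.end if latest_end is None else max(latest_end, s.end)
    return flagged


# Hypothetical career, loosely following the event types shown in the figure.
CAREER = [
    Spell("dependent work", date(1985, 1, 1), date(1990, 12, 31)),
    Spell("dependent work", date(1990, 6, 1), date(1995, 5, 31)),
    Spell("self employment", date(1996, 1, 1), date(2004, 12, 31)),
    Spell("pension", date(2005, 1, 1), date(2010, 12, 31)),
]

if __name__ == "__main__":
    for s in sorted(CAREER, key=lambda s: s.start):
        print(s.start.isoformat(), s.end.isoformat(), s.kind)
    print(overlapping(CAREER, "dependent work"))
```

Tracking the latest end date, rather than comparing only neighbouring spells, also catches a short spell nested inside a long earlier one.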
An example from social research: WHIP-Salute
[Figure: WHIP-Salute processing flow. Inputs: administrative data and docs. Stages: data reception and docs retrieval, data cleansing, data normalization, longitudinal identification of firms, longitudinal identification of job spells. Outputs: work histories database and online documentation.]
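The slide names the processing stages but not how each one is implemented. As a generic illustration only, not the actual WHIP-Salute procedure, here is a minimal Python sketch of cleansing, normalization and longitudinal identification of job spells, assuming hypothetical monthly contribution records of the form `(worker_id, firm_id, year, month)`. All identifiers and rules are invented for the example.

```python
from itertools import groupby

# Hypothetical raw contribution records: (worker_id, firm_id, year, month).
RAW = [
    ("W1", " f01", 2009, 11),
    ("W1", "F01", 2009, 12),
    ("W1", "F01 ", 2010, 1),
    ("W1", "F01", 2010, 5),
    ("W1", None, 2010, 6),    # missing firm id: dropped by cleansing
    ("W2", "f02", 2010, 13),  # impossible month: dropped by cleansing
    ("W2", "F02", 2010, 2),
]


def cleanse(records):
    """Drop records with a missing identifier or an impossible month."""
    return [r for r in records if r[0] and r[1] and 1 <= r[3] <= 12]


def normalize(records):
    """Harmonize firm identifiers: trim whitespace and upper-case."""
    return [(w, f.strip().upper(), y, m) for w, f, y, m in records]


def to_index(year, month):
    """Map (year, month) to a running month number, so adjacent months differ by 1."""
    return year * 12 + (month - 1)


def from_index(idx):
    """Inverse of to_index: running month number back to (year, month)."""
    return idx // 12, idx % 12 + 1


def job_spells(records):
    """Merge consecutive months for the same worker-firm pair into job spells."""
    spells = []
    ordered = sorted(set(records))  # dedupe, then sort by worker, firm, year, month
    for (worker, firm), group in groupby(ordered, key=lambda r: (r[0], r[1])):
        months = [to_index(y, m) for _, _, y, m in group]
        start = prev = months[0]
        for idx in months[1:]:
            if idx != prev + 1:  # gap in months: close the current spell
                spells.append((worker, firm, from_index(start), from_index(prev)))
                start = idx
            prev = idx
        spells.append((worker, firm, from_index(start), from_index(prev)))
    return spells


if __name__ == "__main__":
    for spell in job_spells(normalize(cleanse(RAW))):
        print(spell)
```

Here normalization runs before spell identification because otherwise `" f01"` and `"F01"` would be treated as different firms and split one job into several spells. Linking real administrative archives would likely need further rules (for example, tolerated gaps between records), which this sketch does not attempt.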