From unstructured data to structured journalism
![Page 1: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/1.jpg)
From unstructured data to structured journalism
Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)
April 12, 2016
Master in Giornalismo "Giorgio Bocca" di Torino
![Page 2: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/2.jpg)
Nexa Center for Internet & Society at Politecnico di Torino
Website: http://nexa.polito.it/
![Page 3: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/3.jpg)
Communication Manager
Website, social media, mailing list
![Page 4: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/4.jpg)
Research Fellow
GitHub account:
https://github.com/giuseppefutia
![Page 5: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/5.jpg)
Start with Why
![Page 6: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/6.jpg)
Presentation of Jonathan Stray
(Journalist, data scientist)
YouTube Video:
https://www.youtube.com/watch?v=z4wHiv4bs-Y
![Page 7: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/7.jpg)
Who said What?
Best tool for multi-lingual journalists
#newsHack 2016
organized by BBC Connected Studio
![Page 9: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/9.jpg)
Team
• 1 Product manager
• 1 Software engineer
• 2 Researchers
• And journalists…?
![Page 10: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/10.jpg)
New York Times, BBC, Washington Post
Source: Poynter.org
![Page 11: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/11.jpg)
Using "machine learning," technologists at news outlets around the world are helping newsrooms eliminate extra time-consuming tasks and giving humans more time to do what they do best: reporting the news (Poynter.org)
![Page 13: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/13.jpg)
Juicer (BBC News Labs)
![Page 14: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/14.jpg)
Linked Data Cloud
Source: https://en.wikipedia.org/wiki/Linked_data
![Page 15: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/15.jpg)
Knowledge Map (Washington Post)
![Page 16: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/16.jpg)
Panama Papers leak
Source: Wired.com
![Page 17: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/17.jpg)
Panama Papers leak
• 11.5 million documents
– 4.8 million emails
– 3 million database entries
– 2 million PDFs
– 1 million images
– 320,000 text documents
• 100 news organisations and 400 journalists
![Page 18: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/18.jpg)
Panama Papers processing
• Sort and organise the files
• Index these files
• Extract all of the metadata
• Investigate the data from a big-data and analytics perspective
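
The first three steps above can be sketched on a toy scale. This is a minimal illustration only, not the pipeline used for the actual leak: the `FILES` dictionary, the file names, and all three helper functions are hypothetical, and a real investigation would rely on a dedicated search engine rather than an in-memory dictionary.

```python
import re
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical in-memory "leak": file name -> text content (invented data).
FILES = {
    "mail/0001.eml": "Transfer approved for the offshore company.",
    "docs/contract.txt": "The company signed the contract in Panama.",
    "docs/notes.txt": "Meeting notes about the transfer.",
}


def sort_by_type(files):
    """Sort and organise: group file names by extension."""
    groups = defaultdict(list)
    for name in sorted(files):
        groups[PurePosixPath(name).suffix].append(name)
    return dict(groups)


def build_index(files):
    """Index: map each lowercased word to the set of files containing it."""
    index = defaultdict(set)
    for name, text in files.items():
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(name)
    return index


def extract_metadata(files):
    """Metadata: record the extension and length in characters of each file."""
    return {
        name: {"type": PurePosixPath(name).suffix, "chars": len(text)}
        for name, text in files.items()
    }


groups = sort_by_type(FILES)
index = build_index(FILES)
metadata = extract_metadata(FILES)

# Which files mention "transfer"?
print(sorted(index["transfer"]))
```

Once the index exists, a journalist's query becomes a dictionary lookup instead of a scan over every file, which is the point of indexing before investigating.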
![Page 19: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/19.jpg)
Panama Papers result
• The final database: 30 per cent of the original data size
• Extract entities: first names and surnames
• Analytics to find how these names relate to the documents
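
Extracting names and relating them to documents can be sketched as follows. This is a deliberately naive illustration under stated assumptions: the documents and names in `DOCS` are invented, and the rule "two consecutive capitalised words form a person name" (`NAME_PATTERN`) is a stand-in for a real named-entity recognition system, not the method used by the investigation.

```python
import re
from collections import defaultdict

# Invented sample documents: document id -> text.
DOCS = {
    "doc1": "Payment from Mario Rossi to Anna Bianchi was recorded.",
    "doc2": "Anna Bianchi is listed as director.",
}

# Naive assumption: a person is two consecutive capitalised words.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def names_to_documents(docs):
    """Map each 'First Last' name to the set of documents mentioning it."""
    mentions = defaultdict(set)
    for doc_id, text in docs.items():
        for first, last in NAME_PATTERN.findall(text):
            mentions[f"{first} {last}"].add(doc_id)
    return mentions


mentions = names_to_documents(DOCS)
for name in sorted(mentions):
    print(name, sorted(mentions[name]))
```

A name that appears in several documents, such as the hypothetical "Anna Bianchi" here, is exactly the kind of link between files that the analytics step is meant to surface.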
![Page 20: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/20.jpg)
TellMeFirst http://tellmefirst.polito.it
![Page 21: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/21.jpg)
Public Contracts http://public-contracts.nexacenter.org/
![Page 22: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/22.jpg)
Data journalism as a framework
![Page 23: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/23.jpg)
BBC News Labs Project
“To help news organisations curate stories that scale, adapt and connect across platforms and use cases”
![Page 24: From unstructured data to structured journalism](https://reader030.vdocuments.mx/reader030/viewer/2022021422/5aabe1c97f8b9aaf528b55e7/html5/thumbnails/24.jpg)
Thanks!
GitHub Repository
https://github.com/giuseppefutia/