text & data mining for the national library of...
TRANSCRIPT
![Page 1: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/1.jpg)
Text & Data Mining
for the National Library of Greece
in consideration of GDPR
Dr. Marinos Papadopoulos
Attorney-at-Law
Mr. Konstantinos Vavousis, M.Sc.
IT Security expert
Mrs. Dimitra Hioti, M.A.
Information Scientist, Librarian
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
1
![Page 2: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/2.jpg)
NLG & TDM in Greece
• Law 4452/2017 art.4(4)(b)
• Regulation 2016/679/EU (GDPR)
• Directive 2019/790/EU (Directive on Copyright in the DSM)
• NATIONAL LIBRARY OF GREECE (NLG) deployed TDM since FEB.2017
• Harvesting the .GR, .EDU, & .COM of the Web
• 18 TB of information
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
2
![Page 3: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/3.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
Working Stages Of Web Archiving In Greece By NLG
Stage I Economic and technical study on the
needs and content of the Greek web
harvest. Study of international
experience
1st web harvest:
broad crawl – national
level: text data only
Stage II Definition of "Greek" sites to be mined
Stage III Data Analysis of 1st web harvest to
create a National Web Archiving System
Stage IV Installing and checking the operation of
tools for all phases of national web
archiving: extraction, archiving /
classification and finally, user search
and access): Heritrix for harvesting, Solr
for indexing and Open Wayback for web
site reconstitution. Netarchive Suite
using.
2nd web harvest:
broad - national level:
text only)
thematic (text and
images)
Stage V Developing a National Archiving System
of Greek Web (“ΕΣΑΕΙ”): the Greek
user/librarian interface
3
![Page 4: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/4.jpg)
Harvesting
• Bulk harvesting of works in the Greek Web.
• Selective harvesting leveraging on criteria:
1. Subject/Topic
2. Creator/Provenance
3. Type/Format
• The NLG Curator tool allows for either domain or selective harvests according to 6 predefined schedules.
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
4
![Page 5: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/5.jpg)
Preservation & Access
• Copies of harvested works are preserved in different servers in
different buildings of NLG.
• Users can access works leveraging on the NLG Curator tool.
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
5
![Page 6: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/6.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
ΕΣΑΕΙ WEB ARCHIVE
6
![Page 7: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/7.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
ΕΣΑΕΙ WEB ARCHIVE
7
![Page 8: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/8.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
ΕΣΑΕΙ WEB ARCHIVE
8
![Page 9: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/9.jpg)
GDPR vs SECURITY?
• Digital environments
• Strong Security Mechanisms
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
9
![Page 10: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/10.jpg)
R.O.S.I.
• Minimize investments
• Heavy fines by GDPR
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
10
![Page 11: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/11.jpg)
Strong Security
Mechanisms• Functions (Rec. 78, 108, 119)
• Processes (Rec.63, Art.4, 11, 28)
• Controls (Rec.37)
• Systems (Rec.49, 105, 151)
• Procedures (Rec.71, 88, Art.6, 12, 40, 41, 43, 47, 70)
• Policies (Rec.78, Art.4, 24, 39)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
11
Personal and critical data protection
![Page 12: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/12.jpg)
Optimal Infrastructure
• Firewall
• DLP
• MDM
• IDPS
• Email ENCRYPTION
• Database ENCRYPTION & PSEUDONYMIZATION
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
12
![Page 13: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/13.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
13
![Page 14: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/14.jpg)
Authentication &
Authorization
• Strong access control policies
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
14
![Page 15: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/15.jpg)
Epilogue
• CYBER-SECURITY is top priority for GDPR
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
15
![Page 16: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/16.jpg)
Text & Data Mining
for the National Library of Greece
in consideration of GDPRc
Dr. Marinos Papadopoulos & Konstantinos Vavousis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
contributors
Dr. Marinos Papadopoulos
Attorney-at-Law
Mr. Konstantinos Vavousis, M.Sc.
IT Security Expert
Trust-IT Ltd.
Dr. Charalampos Bratsas
Mathematician
Aristotle University of Thessaloniki
Mrs. Eliza Makridou, M.Phil.
Information Scientist, Librarian
National Library of Greece
Dr. Michalis Gerolimos
Information Scientist, Librarian
National Library of Greece
Mrs. Dimitra Hioti, M.A.
Information Scientist, Librarian
National Library of Greece
16
![Page 17: Text & Data Mining for the National Library of …netpreserve.org/ga2019/wp-content/uploads/2019/07/...Text & Data Mining for the National Library of Greece in consideration of GDPR](https://reader034.vdocuments.mx/reader034/viewer/2022042309/5ed6cc71777e4e4f012b2b4c/html5/thumbnails/17.jpg)
Dr. Marinos Papadopoulos & Konstantinos Vavoussis & Dimitra Hiot i| IIPC WAC 2019: 06-07/06/2019, Zagreb, Croatia
17