Your content hides a treasure, and you might not have found it yet
TYPO3 Congres Amsterdam 2014
Let me start with a quote
“Nowadays people know the price of everything
and the value of nothing.”
― Oscar Wilde, The Picture of Dorian Gray
1. The problem
2. The solution
3. Semantic web
4. The values
5. The strategy
6. Questions
7. Contact
The problem
Welcome to the digital information age... a never-ending flood of content!
Technology enables us to produce nearly unlimited content
We are still “hunters and gatherers”
Storage space feels “infinite”, but resources are limited
Evolution of technology = new standards and formats
What happens if we lose the ability to view or retrieve all this content?
So how do we handle this?
dkd staff Meeting, 13.08.2014, Frankfurt
We preserve!
“Preservation — The protection of cultural property through activities that minimize chemical and physical deterioration and damage and that prevent loss of informational content. The primary goal of preservation is to prolong the existence of cultural property.”
Preservation 101
Preserving a website is not trivial
What do you want to preserve?
Content only?
Content and Design?
How often? Stock prices vs. Company History page
How do you deal with browser differences?
How do you preserve functionality? E.g. insurance fee calculator
What do you preserve?
How do you preserve?
A project funded by the EU called ForgetIT
The solution
Concise Preservation by combining Managed Forgetting and Contextualized Remembering
The Project - Facts
EU research project
Part of the Seventh Framework Programme (FP7)
Countries involved: Germany, Sweden, Israel, Turkey, Greece, United Kingdom, Italy
Project duration: 2013 to 2016
The partners
The Project - Goals
ForgetIT aims to transfer the human Preservation and Forgetting concepts to computer systems
Meaningful Preservation and Forgetting
Consider differences between individuals and organizations and demonstrate via use cases
The Project - our approach to the organizational part
Transform the basis of content management to become semantic, using a linked data approach
Subsequently measure key value indicators to determine the value of content, its relevance and benefits
Enable humans and systems to actively preserve or forget
How does human memory work?
What have you stored in your mind?
What is the dkd color code?
In which color was the word “Holz” (wood) written on the Post-it?
How many wooden sticks were in the vase?
Were there 2, 3 or 4 green Copic markers?
Was the laptop on the lounge table open or closed?
A conceptual model of the human memory
Role in the Preserve-or-Forget
Active system
Preserve-or-Forget Middleware
Archival Information System
The link to human memory
Digital preservation
Forgetting without context
Preservation with context
Preservation with learning
A web of data
Semantic web
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” Tim Berners-Lee
The current web
The semantic web
What people see in a website
What machines see in a website
dkd staff Meeting, 18.09.2014, Frankfurt
What machines could see in a website
Location
Social Media
Berlin
Brandenburg gate
Subscription
Destination
Let's start simple
Brandenburg gate
Type: Landmark
Location: Berlin
Built: 1788-1791
Berlin
Type: Capital
Location: Germany
Area: 891.85 km²
Triples
Berlin is a capital
Subject: Berlin, Predicate: is a, Object: capital
How do we implement triples?
URI URI URI
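A minimal sketch of the idea in Python, assuming nothing beyond the standard library: each part of the triple "Berlin is a capital" is replaced by a URI and written out as one N-Triples line. The `http://example.org/` namespace and the `Triple` and `to_ntriples` names are hypothetical and only serve the illustration; the `rdf:type` URI is the standard RDF way to say "is a".

```python
from typing import NamedTuple

# Standard RDF predicate for "is a".
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace, used for illustration only.
EX = "http://example.org/"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Serialize one triple as an N-Triples line (all three terms are URIs)."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


berlin_is_a_capital = Triple(EX + "Berlin", RDF_TYPE, EX + "Capital")
print(to_ntriples(berlin_is_a_capital))
```

Because every term is a URI, two systems that use the same URIs are talking about the same thing, which is what makes the data linkable.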
Ontologies
define hierarchies
help us describe relations
provide the general structure
ensure interdisciplinary understanding
Hierarchy in an ontology
Europe
Germany
Berlin
Brandenburg gate
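As a rough sketch, the hierarchy on this slide can be stored as "is located in" links and walked upwards. The `LOCATED_IN` mapping and the `ancestors` helper are hypothetical names chosen for this example; a real ontology would express the same thing with URIs and a dedicated predicate.

```python
# Hypothetical "is located in" links taken from the hierarchy on the slide.
LOCATED_IN = {
    "Brandenburg gate": "Berlin",
    "Berlin": "Germany",
    "Germany": "Europe",
}


def ancestors(place: str) -> list[str]:
    """Walk up the hierarchy and return every enclosing place, nearest first."""
    chain = []
    while place in LOCATED_IN:
        place = LOCATED_IN[place]
        chain.append(place)
    return chain


print(ancestors("Brandenburg gate"))
```

The point of the hierarchy is that a machine can now infer facts nobody wrote down, for example that the Brandenburg gate is in Europe.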
DBpedia
Figure: Linking Open Data cloud diagram, as of September 2011. Dataset categories in the legend: Media, Geographic, Publications, Government, Cross-domain, Life sciences, User-generated content.
Author: Anja Jentzsch, source: http://en.wikipedia.org/wiki/File:LOD_Cloud_Diagram_as_of_September_2011.png
Custom ontologies
Industry specific ontology
Geographic locations ontology
Your very own company ontology
Semantic search
create specific and precise queries
capture the intended meaning of the information
receive cumulative results from different sources
image search through concept detection
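To illustrate precise queries with cumulative results, here is a small sketch that matches a pattern against triples from two sources at once. The sources, their contents and the `match` function are assumptions made for this example; a production setup would more likely use a triple store and a query language such as SPARQL.

```python
# Two hypothetical sources of (subject, predicate, object) triples.
SOURCE_A = [
    ("Brandenburg gate", "type", "Landmark"),
    ("Brandenburg gate", "location", "Berlin"),
]
SOURCE_B = [
    ("Berlin", "type", "Capital"),
    ("Berlin", "location", "Germany"),
]


def match(sources, subject=None, predicate=None, obj=None):
    """Return every triple matching the pattern; None acts as a wildcard."""
    pattern = (subject, predicate, obj)
    return [
        triple
        for source in sources
        for triple in source
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# "What is located in Berlin?" and "What kind of thing is Berlin?"
print(match([SOURCE_A, SOURCE_B], predicate="location", obj="Berlin"))
print(match([SOURCE_A, SOURCE_B], subject="Berlin", predicate="type"))
```

Unlike a keyword search for "Berlin", each query states which role Berlin plays, so the two questions return different, precise answers.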
Concept detection
If we enable computers to see the content of an image, they can detect concepts and give us accurate image results
Derive context and meaning
Content extraction
Once a computer is able to understand the content of a text, it can remove redundancy (unnecessary words, repetitions)
Integration and reuse of information
Content value
Custom ontologies allow a company to assess a semantic object’s value
The value redefines itself over time
Think of a 2D map that locates important terms, products and events from a company’s perspective
Preservation value and memory buoyancy
The Values
Active system
Content repository
Archive
Content value
Memory buoyancy
Preservation value
Further assumptions on Content Value
Relevance influences the value in the ontology over time
Changes in a company's strategy or portfolio will force a reassessment of Content and Content Value
Who created the content could be important when calculating the Content Value
Four strategies that can be combined to gain access to content value
Strategy
How to define your own content value (1)
Start with an inventory of the content
What kind of content do you create? News? Pages? PDF?
How long does it stay in its actual state? (Content Lifecycle)
When does it expire? What happens with archived content?
How much content do you create?
…
How to define your own content value (2)
Look at the people creating the content
Create a social graph of the people and the content they create
Identify the main nodes in the graph
Calculate the network size of those nodes
Identify those that have the most impact based on borders crossed
…
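The first steps above can be sketched in a few lines of Python. Everything here is an assumption for illustration: the author names, the content ids, and the choice to read "network size" as the number of direct co-authors. The "borders crossed" measure is not covered, because it would need data about teams or departments.

```python
from collections import defaultdict

# Hypothetical (author, content item) pairs; in practice they would come from the CMS.
AUTHORED = [
    ("anna", "news-1"), ("ben", "news-1"),
    ("anna", "page-7"), ("carl", "page-7"),
    ("dora", "pdf-3"),
]


def coauthor_graph(pairs):
    """Link two people whenever they worked on the same content item."""
    by_item = defaultdict(set)
    for author, item in pairs:
        by_item[item].add(author)
    graph = defaultdict(set)
    for authors in by_item.values():
        for author in authors:
            # Also creates an empty entry for people who always work alone.
            graph[author] |= authors - {author}
    return graph


def network_sizes(graph):
    """Network size here means the number of direct neighbours of a node."""
    return {person: len(neighbours) for person, neighbours in graph.items()}


sizes = network_sizes(coauthor_graph(AUTHORED))
# Main nodes first: largest network, ties broken alphabetically.
main_nodes = sorted(sizes, key=lambda person: (-sizes[person], person))
print(main_nodes, sizes)
```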
How to define your own content value (3)
Look at the analytics of the content
Where is your hot and cold content, based on external usage?
Which content gets the most reactions, such as shares, mentions and comments?
Bear in mind that popularity can be an indicator, but it can also mislead you.
…
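A possible starting point, sketched under assumptions: the usage figures, the content names and the two thresholds below are invented for the example and would have to be replaced with numbers from your own analytics.

```python
# Hypothetical usage figures per content item: (page views, reactions).
USAGE = {
    "company-history": (40, 0),
    "product-launch": (5200, 310),
    "press-release-2012": (3, 0),
    "how-to-guide": (900, 45),
}


def classify(usage, hot_views=1000, cold_views=50):
    """Label each item hot, warm or cold by page views (thresholds are assumptions)."""
    labels = {}
    for item, (views, _reactions) in usage.items():
        if views >= hot_views:
            labels[item] = "hot"
        elif views < cold_views:
            labels[item] = "cold"
        else:
            labels[item] = "warm"
    return labels


def most_reactions(usage, top=2):
    """Items ranked by shares, mentions and comments combined."""
    return sorted(usage, key=lambda item: usage[item][1], reverse=True)[:top]


print(classify(USAGE))
print(most_reactions(USAGE))
```

Note that "company-history" comes out cold here although it may be highly valuable to the company, which is exactly why popularity alone can mislead.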
How to define your own content value (4)
Look inside the content you create
What are the top 100 words in your content (without stop words)?
Which entities belong to you? Which to your industry?
How is the density of your entities built up?
Cluster documents based on the taxonomies they use
Identify orphaned content
…
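Two of these checks, the top words and the orphaned content, can be sketched with the Python standard library. The sample documents, the tiny stop-word list and the definition of "orphaned" (no other document links to it) are assumptions made for this example.

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "and", "of", "is", "to", "in", "our"}  # tiny illustrative list

# Hypothetical documents: id -> (text, ids of the documents it links to).
DOCS = {
    "home": ("Welcome to our archive of preservation projects", ["projects"]),
    "projects": ("The preservation of content is a project in the archive", []),
    "old-promo": ("Summer promotion", []),
}


def top_words(docs, n=100):
    """Most frequent words across all documents, stop words excluded."""
    counts = Counter()
    for text, _links in docs.values():
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


def orphaned(docs, roots=("home",)):
    """Documents that no other document links to (entry pages excluded)."""
    linked = {target for _text, links in docs.values() for target in links}
    return sorted(doc for doc in docs if doc not in linked and doc not in roots)


print(top_words(DOCS, 2))
print(orphaned(DOCS))
```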
Contact
Olivier Dobberkau <[email protected]>
ForgetIT Project Website: www.forgetit-project.eu
Twitter: @ForgetITProject
Code will be published on GitHub in 2015
Thank you for your attention!