linked data for abbreviations and segmentation
TRANSCRIPT
Creating Knowledge out of Interlinked Data
LOD2 Presentation . 02.09.2010 . Page
http://lod2.eu
AKSW, Universität Leipzig
Sebastian Hellmann
Linked Data for Abbreviations and Segmentation
http://nlp2rdf.org
http://lod2.eu
http://slideshare.net/kurzum
Sebastian Hellmann
Researcher working on the LOD2 EU Project
AKSW: Agile Knowledge and the Semantic Web research group in Leipzig - http://aksw.org
InfAI: Institute for Applied Informatics - http://infai.org
Contents:
Introduction to Linked Data
Linked data close-up: DBpedia data set
Exploitation of free and open data for CLDR
Collaboration points
Introduction
http://lod-cloud.net
Linked Open Data
- All datasets provide open access to individual records via HTTP
- Many are free (no payment required, as in royalty-free)
- Some are openly licensed, e.g. CC-0 or CC-BY-SA
=> Open access also applies to published HTML on the WWW, but here the data itself is published unrendered via RDF
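The difference between rendered HTML and unrendered RDF can be sketched with HTTP content negotiation. The snippet below only builds a request; the function name and the choice of the text/turtle media type are assumptions made for illustration, and the resource URI is the DBpedia example used later in these slides.

```python
from urllib.request import Request


def rdf_request(resource_uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) a GET request that asks for RDF instead of HTML."""
    # The Accept header tells the server which representation the client wants.
    return Request(resource_uri, headers={"Accept": media_type}, method="GET")


req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.get_method(), req.full_url, req.get_header("Accept"))
# To actually fetch the record: urllib.request.urlopen(req).read()
```

A browser sending Accept: text/html for the same URI would typically be served the rendered page instead.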
http://dbpedia.org
DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.
allows sophisticated queries against Wikipedia content
allows links from the different data sets on the Web to Wikipedia data
data is extracted continuously: http://live.dbpedia.org
Wikidata will be integrated within the next four months
via Google Summer of Code project
http://dbpedia.org/resource/Berlin
- First paragraph in more than 20 languages
- Facts from Wikipedia infoboxes
- Several hierarchical classifications
- Links
- Multilingual labels
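As an illustration of querying the multilingual labels, the sketch below builds a SPARQL query and a request URL for the public DBpedia endpoint. The helper names are hypothetical, and the "format" URL parameter is an assumption about the endpoint software rather than part of the SPARQL standard. Nothing is sent over the network.

```python
from urllib.parse import urlencode

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def label_query(resource_uri: str, languages: list[str]) -> str:
    """Return a SPARQL query selecting rdfs:label values in the given languages."""
    langs = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f"  FILTER (lang(?label) IN ({langs}))\n"
        "}"
    )


def query_url(query: str) -> str:
    """Encode the query as a GET URL (the 'format' parameter is an assumption)."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return SPARQL_ENDPOINT + "?" + urlencode(params)


berlin_query = label_query("http://dbpedia.org/resource/Berlin", ["en", "de"])
print(berlin_query)
print(query_url(berlin_query))
```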
Trend 1: I18n
DBpedia Extraction Framework can be extended to easily extract any data from Wikipedia: https://github.com/dbpedia/extraction-framework
We are using it to extract corpora for NLP, e.g. URI, surrounding text, surface form
Probabilities:
P(sf|URI): probability that wikipedia:Apple_Inc. appears as the surface form "apple" in text
P(URI|sf): probability that the surface form "apple" refers to wikipedia:Apple_Inc.
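Read in the usual conditional-probability sense, P(URI|sf) says how likely a surface form is to denote a given URI, and P(sf|URI) says how likely a URI is to be written with a given surface form. A minimal sketch of estimating both from co-occurrence counts follows; the (surface form, URI) pairs are invented toy data, not DBpedia statistics, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical (surface form, URI) pairs, e.g. as harvested from link anchors.
occurrences = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(occurrences)
sf_counts = Counter(sf for sf, _ in occurrences)
uri_counts = Counter(uri for _, uri in occurrences)


def p_sf_given_uri(sf: str, uri: str) -> float:
    """Relative frequency of the surface form among all mentions of the URI."""
    total = uri_counts[uri]
    return pair_counts[(sf, uri)] / total if total else 0.0


def p_uri_given_sf(uri: str, sf: str) -> float:
    """Relative frequency of the URI among all uses of the surface form."""
    total = sf_counts[sf]
    return pair_counts[(sf, uri)] / total if total else 0.0


print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
```

With this toy data the two values differ (2 of 3 mentions versus 2 of 4 uses), which is why both directions are worth storing.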
Trend 2: DBpedia 4 NLP
DBpedia is a data dissemination project:
As download for reuse
As Linked Data for interlinking
Corpora will be published via the NLP Interchange RDF Format (NIF) - http://nlp2rdf.org
Trend 2: DBpedia 4 NLP
DBpedia Live Abbreviation Example
Up-to-date gazetteer
- The AfD party was founded earlier this year.
- Lexical information and statistics could be included
Linguistic LOD Cloud
DBpedia
Main version and I18n chapters
http://dbpedia.org/Datasets/NLP
Wiktionary 2 RDF: http://dbpedia.org/Wiktionary
Wortschatz from Uni Leipzig (planned as Linked Data)
http://corpora.informatik.uni-leipzig.de/download.html
JRC Names: http://langtech.jrc.it/JRC-Names.html
JRC-Names is a highly multilingual named entity resource for person and organisation names
Lexvo.org: provides URIs for ISO 639-3
http://lexvo.org/id/iso639-3/spa
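The Lexvo URI pattern shown above can be sketched as a small helper. The function name is hypothetical, and the check only tests the shape of the code (three ASCII letters); it does not verify that the code is actually assigned in ISO 639-3.

```python
LEXVO_ISO639_3 = "http://lexvo.org/id/iso639-3/"


def lexvo_uri(code: str) -> str:
    """Return the Lexvo URI for a three-letter ISO 639-3 language code."""
    code = code.strip().lower()
    if len(code) != 3 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"not a three-letter ISO 639-3 code: {code!r}")
    return LEXVO_ISO639_3 + code


print(lexvo_uri("spa"))
```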
Example data sets from LLOD
http://linguistics.okfn.org/resources/llod/
=> CLDR will make an excellent addition to LLOD
Linguistic LOD
CLDR as Linked Data
empowers third parties to link to your authoritative data
links are reusable
LIDER EU project (presumably starting in October) will provide some support for linked data adopters
ULI members can join the industry and advisory board
Workshop DBpedia & NLP in Oct 2013
http://nlp-dbpedia2013.blogs.aksw.org/
Creation of free and open benchmarks in RDF
We could promote CLDR and collect contributions
Collaboration points I
Personally, I can:
Join ULI mailing list
Look out for appropriate data
Look for opportunities (e.g. synergies with other projects)
Provide some counseling (e.g. pointers, technology Q&A)
=> this will be done as preparation for the LIDER EU project, CLDR
Academic collaboration:
Excellent PhD student topic: Create corpora, interlink and fuse data and benchmark effectiveness for segmentation
Provide knowledge transfer (e.g. tutorials, visits)
Collaboration points II
Open Community All feedback is welcome!
http://slideshare.net/kurzum
Websites:
http://dbpedia.org
http://nlp2rdf.org
http://lod2.eu
Thanks for your attention
Wiktionary Example
LOD2 EU Project produces LOD2 Stack.
Three requirements to unlock Natural Language Processing (NLP) for the project:
1. NLP tool output is required to be in RDF
2. Scalability (fewer triples, focus on usefulness)
3. Common vocabulary to integrate and use NLP tools
The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
Version 1.0 published in November 2011
Version 2.0 is scheduled for completion within 2013
NLP Interchange Format 2.0
NIF Architecture
Addressing Primary Data
NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729
NIF 2.0 uses RFC 5147: http://www.w3.org/DesignIssues/LinkedData.html#char=717,729
User extensions possible: http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme
(but you have to link to documentation on how it was created)
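The two addressing schemes differ only in how the character offsets are written into the fragment. A minimal sketch, assuming zero-based begin and end positions that count the gaps between characters (the RFC 5147 convention, which coincides with Python slicing); the function names are hypothetical and not part of any NIF library.

```python
def _check_span(begin: int, end: int) -> None:
    """Reject spans that cannot denote a substring."""
    if begin < 0 or end < begin:
        raise ValueError(f"invalid span: {begin}, {end}")


def nif1_offset_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 1.0 style fragment: #offset_<begin>_<end>."""
    _check_span(begin, end)
    return f"{document_uri}#offset_{begin}_{end}"


def nif2_char_uri(document_uri: str, begin: int, end: int) -> str:
    """NIF 2.0 style fragment following RFC 5147: #char=<begin>,<end>."""
    _check_span(begin, end)
    return f"{document_uri}#char={begin},{end}"


def addressed_text(text: str, begin: int, end: int) -> str:
    """Return the substring that such a URI would denote in the given text."""
    _check_span(begin, end)
    return text[begin:end]


doc = "http://www.w3.org/DesignIssues/LinkedData.html"
print(nif1_offset_uri(doc, 717, 729))
print(nif2_char_uri(doc, 717, 729))
```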
As a Web Service
curl --data-urlencode prefix="http://prefix.given.by/theClient#" --data-urlencode input="[...]" (--data-urlencode source=http://www.w3.org/DesignIssues/LinkedData.html) http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
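The same form-encoded POST can be sketched in Python. The parameter names prefix, input and source and the service URL are taken from the curl call above; treating source as optional is a reading of the parentheses there, and the sample sentence is an invented placeholder. The request is only built, not sent, since the demo service may not be reachable.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"  # URL from the curl call


def nif_request(prefix: str, text: str, source: Optional[str] = None) -> Request:
    """Build a POST request with URL-encoded form fields, like --data-urlencode."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        params["source"] = source  # assumed optional, as the parentheses suggest
    body = urlencode(params).encode("utf-8")
    return Request(SERVICE, data=body, method="POST")


req = nif_request(
    prefix="http://prefix.given.by/theClient#",
    text="Linked Data is useful.",  # hypothetical input text
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To call the service: urllib.request.urlopen(req).read()
```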
Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst
Russian TreeTagger : http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform
German STTS: http://purl.org/olia/stts.owl#VAPP
English Penn: http://purl.org/olia/penn.owl#VBG
all map to http://purl.org/olia/olia.owl#NonFiniteVerb
Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free and open, CC-BY)
Vocabulary Module: OLiA
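The benefit of the shared reference category can be illustrated with a lookup table. This is a simplification: OLiA expresses such mappings as OWL axioms, whereas the sketch below hard-codes only the four example tags from the slide, and the helper name is hypothetical.

```python
from typing import Optional

OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Tagset-specific URIs from the slide, each mapped to its OLiA reference category.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}


def reference_category(tag_uri: str) -> Optional[str]:
    """Return the OLiA category for a tag, or None if the tag is not in the table."""
    return TAG_TO_OLIA.get(tag_uri)


def same_category(tag_a: str, tag_b: str) -> bool:
    """True if both tags are known and map to the same OLiA category."""
    cat_a = reference_category(tag_a)
    return cat_a is not None and cat_a == reference_category(tag_b)


print(same_category(
    "http://purl.org/olia/stts.owl#VAPP",
    "http://purl.org/olia/penn.owl#VBG",
))
```

A query phrased against the reference category therefore matches annotations from the German and the English tagset alike.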
NIF 2.0 aims to be compatible with (Vocabulary Module):
ITS 2.0
FISE used in Apache Stanbol (IKS-EU Project)
LAF/GrAF XML ISO standard, recently published
Fragment Identifiers by IETF and W3C
Lemon ontology from Monnet EU Project
NERD ontology from EURECOM and LinkedTV EU Project
XPointer/XPath URI scheme
Open Annotation
NIF 2.0 - plans
NIF 2.0:
NIF is free and open (CC-0 or CC-BY)
All ontologies will be hosted persistently by the University of Leipzig
Sign up on the mailing list at http://nlp2rdf.org
Provide Use Cases, Requirements, Implementations at:
http://wiki.nlp2rdf.org/wiki/Use_cases#Use_cases
http://wiki.nlp2rdf.org/wiki/Requirements#Requirements
How you can contribute:
LOD2 Stack
Currently at project half-time
Most of the tools are free and open source
Commercial rollout planned
Many webinars available
You can integrate your tool via Debian package
http://lod2.eu
http://stack.lod2.eu/
How you can contribute:
ULI meeting 2013/05/28 Page
Address
University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany
Contact