linked data for abbreviations and segmentation


Creating Knowledge out of Interlinked Data

LOD2 Presentation . 02.09.2010

http://lod2.eu

AKSW, Universität Leipzig

Sebastian Hellmann

Linked Data for Abbreviations and Segmentation

http://nlp2rdf.org

http://lod2.eu

http://slideshare.net/kurzum

Sebastian Hellmann: researcher working on the LOD2 EU Project

AKSW: Agile Knowledge Engineering and Semantic Web research group in Leipzig - http://aksw.org

InfAI: Institute for Applied Informatics - http://infai.org

Contents:

Introduction to Linked Data

Linked data close-up: DBpedia data set

Exploitation of free and open data for CLDR

Collaboration points

Introduction

http://lod-cloud.net


Linked Open Data

- All datasets provide open access to individual records via HTTP
- Many are free (no payment required, as in royalty-free)
- Some are openly licensed, e.g. CC-0 or CC-BY-SA

=> Open access also applies to published HTML on the WWW, but here the data itself is published unrendered via RDF
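A minimal sketch of how a client could ask for the unrendered RDF rather than the HTML page, using HTTP content negotiation. The resource URI is the DBpedia example used in these slides; the helper name rdf_request and the choice of text/turtle as media type are assumptions for illustration. The request is only built here, not sent.

```python
from urllib.request import Request

def rdf_request(resource_uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP request that asks for RDF instead of HTML."""
    # The Accept header tells a Linked Data server which serialization we prefer.
    return Request(resource_uri, headers={"Accept": media_type})

# Example resource taken from the DBpedia slides.
req = rdf_request("http://dbpedia.org/resource/Berlin")
print(req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would perform the actual lookup; whether a given server honours the requested media type depends on that server.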

http://dbpedia.org

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web.

allows sophisticated queries against Wikipedia content

allows links from the different data sets on the Web to Wikipedia data

data is extracted continuously: http://live.dbpedia.org

Wikidata will be integrated within the next four months via a Google Summer of Code project


http://dbpedia.org/resource/Berlin

First paragraph in more than 20 languages


Facts from Wikipedia infoboxes


Several hierarchical classifications


Links

Multilingual labels
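As an illustration of querying this data, the sketch below builds a SPARQL query for the multilingual labels of a resource and encodes it as a GET URL. It assumes the public endpoint at http://dbpedia.org/sparql and the standard rdfs:label property; the function names label_query and query_url are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # assumed public DBpedia SPARQL endpoint

def label_query(resource_uri, languages):
    """Return a SPARQL query selecting the labels of a resource in the given languages."""
    langs = ", ".join(f'"{lang}"' for lang in languages)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?label WHERE {\n"
        f"  <{resource_uri}> rdfs:label ?label .\n"
        f"  FILTER (lang(?label) IN ({langs}))\n"
        "}"
    )

def query_url(query):
    """Encode the query as a GET URL following the SPARQL protocol ('query' parameter)."""
    return ENDPOINT + "?" + urlencode({"query": query})

q = label_query("http://dbpedia.org/resource/Berlin", ["en", "de"])
print(q)
print(query_url(q))
```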

Trend 1: I18n

DBpedia Extraction Framework can be extended to easily extract any data from Wikipedia: https://github.com/dbpedia/extraction-framework

We are using it to extract corpora for NLP, e.g. URI, surrounding text, surface form

Probabilities:

P(sf|URI): probability that wikipedia:Apple_Inc. appears as "apple" in text

P(URI|sf): probability that "apple" in text refers to wikipedia:Apple_Inc.
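A minimal sketch of how the two conditional probabilities could be estimated by relative frequency from extracted (surface form, URI) pairs. Here P(sf|URI) is the share of a URI's mentions that use a given surface form, and P(URI|sf) is the share of a surface form's occurrences that link to a given URI. The toy corpus and the function names are hypothetical; the real corpora come from the extraction framework.

```python
from collections import Counter

# Hypothetical toy corpus of (surface form, URI) link occurrences.
links = [
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple_Inc."),
    ("apple", "wikipedia:Apple"),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
    ("Apple Inc.", "wikipedia:Apple_Inc."),
]

pair_counts = Counter(links)
sf_counts = Counter(sf for sf, _ in links)
uri_counts = Counter(uri for _, uri in links)

def p_sf_given_uri(sf, uri):
    """Relative frequency with which `uri` is mentioned using surface form `sf`."""
    total = uri_counts[uri]
    return pair_counts[(sf, uri)] / total if total else 0.0

def p_uri_given_sf(uri, sf):
    """Relative frequency with which surface form `sf` links to `uri`."""
    total = sf_counts[sf]
    return pair_counts[(sf, uri)] / total if total else 0.0

print(p_sf_given_uri("apple", "wikipedia:Apple_Inc."))
print(p_uri_given_sf("wikipedia:Apple_Inc.", "apple"))
```

In the toy data the company is mentioned four times, twice as "apple", while "apple" occurs three times, twice pointing to the company, so the two values differ.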

Trend 2: DBpedia 4 NLP

DBpedia is a data dissemination project:

As download for reuse

As Linked Data for interlinking

Corpora will be published via the NLP Interchange RDF Format (NIF) - http://nlp2rdf.org


DBpedia Live Abbreviation Example

Up-to-date gazetteer
- The AfD party was founded earlier this year.
- Lexical information and statistics could be included

Linguistic LOD Cloud

DBpedia: main version and I18n chapters

http://dbpedia.org/Datasets/NLP

Wiktionary 2 RDF: http://dbpedia.org/Wiktionary

Wortschatz from Uni Leipzig (planned as Linked Data): http://corpora.informatik.uni-leipzig.de/download.html

JRC Names: http://langtech.jrc.it/JRC-Names.html

JRC-Names is a highly multilingual named entity resource for person and organisation names

Lexvo.org: provides URIs for ISO 639-3

http://lexvo.org/id/iso639-3/spa
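The URI pattern above can be sketched as a small helper that turns an ISO 639-3 code into a Lexvo URI. The function name lexvo_uri is hypothetical, and the check only validates the three-letter shape of the code, not whether the code is actually assigned.

```python
import re

LEXVO_ISO639_3 = "http://lexvo.org/id/iso639-3/"

def lexvo_uri(code):
    """Return the Lexvo URI for a three-letter ISO 639-3 language code."""
    code = code.strip().lower()
    # Shape check only: ISO 639-3 codes are three lowercase ASCII letters.
    if not re.fullmatch(r"[a-z]{3}", code):
        raise ValueError(f"not a three-letter ISO 639-3 code: {code!r}")
    return LEXVO_ISO639_3 + code

print(lexvo_uri("spa"))
```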

Example data sets from LLOD

http://linguistics.okfn.org/resources/llod/

=> CLDR will make an excellent addition to LLOD

Linguistic LOD

CLDR as Linked Data:

empowers third parties to link to your authoritative data

links are reusable

LIDER EU project (presumably starting in October) will provide some support for linked data adopters

ULI members can join the industry and advisory board

Workshop DBpedia & NLP in Oct, 2013: http://nlp-dbpedia2013.blogs.aksw.org/

Creation of free and open benchmarks in RDF

We could promote CLDR and collect contributions

Collaboration points I

Personally, I can:

Join ULI mailing list

Look out for appropriate data

Look for opportunities (e.g. synergies with other projects)

Provide some counseling (e.g. pointers, technology Q&A)

=> this will be done as preparation for the LIDER EU project, CLDR

Academic collaboration:

Excellent PhD student topic: Create corpora, interlink and fuse data and benchmark effectiveness for segmentation

Provide knowledge transfer (e.g. tutorials, visits)

Collaboration points II

Open community: all feedback is welcome!

http://slideshare.net/kurzum

Websites:

http://dbpedia.org

http://nlp2rdf.org

http://lod2.eu

Thanks for your attention

Wiktionary Example

LOD2 EU Project produces LOD2 Stack.

Three requirements to unlock Natural Language Processing (NLP) for the project:

1. NLP tool output is required to be in RDF

2. Scalability (fewer triples, focus on usefulness)

3. Common vocabulary to integrate and use NLP tools

The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

Version 1.0 was published in November 2011

Version 2.0 is scheduled for completion within 2013

NLP Interchange Format 2.0

NIF Architecture

Addressing Primary Data

NIF 1.0: http://www.w3.org/DesignIssues/LinkedData.html#offset_717_729

NIF 2.0 uses RFC 5147: http://www.w3.org/DesignIssues/LinkedData.html#char=717,729

User extensions possible: http://www.w3.org/DesignIssues/LinkedData.html#your_own_scheme (but you have to link to documentation on how it was created)
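A small sketch of the two addressing schemes: it builds the NIF 1.0 style offset fragment and the RFC 5147 style char fragment for a character range, and resolves a char range against a text. The function names are hypothetical, and the resolver only handles the simple char=begin,end form, not the other RFC 5147 options (single positions, line ranges, integrity checks). RFC 5147 counts positions between characters starting at 0, which coincides with Python slice indices.

```python
def nif1_offset_uri(document_uri, begin, end):
    """NIF 1.0 style fragment: #offset_<begin>_<end>."""
    return f"{document_uri}#offset_{begin}_{end}"

def nif2_char_uri(document_uri, begin, end):
    """NIF 2.0 / RFC 5147 style fragment: #char=<begin>,<end>."""
    return f"{document_uri}#char={begin},{end}"

def resolve_char_fragment(uri, text):
    """Return the substring of `text` addressed by a #char=begin,end fragment."""
    _, _, fragment = uri.partition("#")
    if not fragment.startswith("char="):
        raise ValueError(f"unsupported fragment: {fragment!r}")
    begin, end = fragment[len("char="):].split(",")
    return text[int(begin):int(end)]

DOC = "http://www.w3.org/DesignIssues/LinkedData.html"
print(nif1_offset_uri(DOC, 717, 729))
print(nif2_char_uri(DOC, 717, 729))

# Hypothetical short document used to show how a range resolves.
sample = "Linked Data for abbreviations"
print(resolve_char_fragment(nif2_char_uri("http://example.org/doc.txt", 7, 11), sample))
```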

As a Web Service

curl --data-urlencode prefix="http://prefix.given.by/theClient#" --data-urlencode input="[...]" (--data-urlencode source=http://www.w3.org/DesignIssues/LinkedData.html) http://nlp2rdf.lod2.eu/demo/NIFStanfordCore
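The curl call above could be mirrored in Python roughly as follows. This is a sketch that only builds the form-encoded POST request; the parameter names prefix, input and source and the service URL are taken from the curl example, while the function name build_nif_request and the sample text are hypothetical. The demo service may no longer be reachable, so the request is not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SERVICE = "http://nlp2rdf.lod2.eu/demo/NIFStanfordCore"

def build_nif_request(text, prefix, source=None):
    """Build (but do not send) a form-encoded POST request for the NIF demo service."""
    params = {"prefix": prefix, "input": text}
    if source is not None:
        # Optional parameter, shown in parentheses in the curl example.
        params["source"] = source
    body = urlencode(params).encode("utf-8")
    # Supplying a body makes urllib issue a POST, as curl does with --data-urlencode.
    return Request(SERVICE, data=body)

req = build_nif_request("Hello world", "http://prefix.given.by/theClient#")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```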

Tibeto-Burman languages: http://purl.org/olia/tibet.owl#VNst

Russian TreeTagger : http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform

German STTS: http://purl.org/olia/stts.owl#VAPP

English Penn: http://purl.org/olia/penn.owl#VBG

all map to http://purl.org/olia/olia.owl#NonFiniteVerb

Ontologies of Linguistic Annotation (OLiA) contain mappings for over 50 tagsets (free and open, CC-BY)
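The benefit of the shared reference category can be sketched with a plain lookup table: tags from different tagsets become comparable once they map to the same OLiA class. The flat dictionary is a deliberate simplification for illustration (OLiA itself expresses these mappings as OWL ontologies), and the function name same_olia_category is hypothetical. The four tag URIs and the target class are the ones listed above.

```python
OLIA_NON_FINITE_VERB = "http://purl.org/olia/olia.owl#NonFiniteVerb"

# Simplified stand-in for the OLiA mappings; only the four example tags from the slide.
TAG_TO_OLIA = {
    "http://purl.org/olia/tibet.owl#VNst": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/russ.owl#partizip_prt_sg_neut_passiv_gen_langform": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/stts.owl#VAPP": OLIA_NON_FINITE_VERB,
    "http://purl.org/olia/penn.owl#VBG": OLIA_NON_FINITE_VERB,
}

def same_olia_category(tag_a, tag_b):
    """True if both tags are known and map to the same OLiA reference class."""
    category_a = TAG_TO_OLIA.get(tag_a)
    category_b = TAG_TO_OLIA.get(tag_b)
    return category_a is not None and category_a == category_b

# A German STTS tag and an English Penn tag become comparable via OLiA.
print(same_olia_category("http://purl.org/olia/stts.owl#VAPP",
                         "http://purl.org/olia/penn.owl#VBG"))
```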

Vocabulary Module: OLiA

NIF 2.0 tries to be compatible with (Vocabulary Module):

ITS 2.0

FISE used in Apache Stanbol (IKS-EU Project)

LAF/GrAF XML ISO standard, recently published

Fragment Identifiers by IETF and W3C

Lemon ontology from Monnet EU Project

NERD ontology from EURECOM and LinkedTV EU Project

XPointer/XPath URI scheme

Open Annotation

NIF 2.0 - plans

NIF 2.0:

NIF is free and open (CC-0 or CC-BY)

All ontologies will be hosted persistently by the University of Leipzig

Sign up on the mailing list at http://nlp2rdf.org

Provide Use Cases, Requirements, Implementations at:

http://wiki.nlp2rdf.org/wiki/Use_cases#Use_cases

http://wiki.nlp2rdf.org/wiki/Requirements#Requirements

How you can contribute:

LOD2 Stack

Currently at project half-time

Most of the tools are free and open source

Commercial rollout planned

Many webinars available

You can integrate your tool via Debian package

http://lod2.eu

http://stack.lod2.eu/


ULI meeting 2013/05/28


Address

University of Leipzig
Faculty of Mathematics and Computer Science
Institute of Computer Science
Department of Business Information Systems
Postfach 100920
04009 Leipzig
Germany
