
The Roadmap for the Arabic Chapter of DBpedia

HAYTHAM AL-FEEL

Information Systems Department, Faculty of Computers and Information Fayoum University

Cairo, Egypt

htf00@fayoum.edu.eg

Abstract: - DBpedia is nowadays considered one of the main sources of structured data on the web; it extracts its content from Wikipedia and is linked to thousands of web resources, forming the Linked Open Data. DBpedia faces obstacles, such as the lack of multilingualism, which appears clearly in the Arabic domain. From that point of view, this paper draws a roadmap to address these obstacles by presenting the main vocabularies used in DBpedia, the main participants, and best practices, in addition to our Mapping Methodology for the Arabic Chapter, which we call ACMM. The paper highlights our efforts, which are considered the first contribution to deploying the Arabic Chapter of DBpedia from scratch. These efforts increased the number of mapped infoboxes and open the door to the development of new applications based on this chapter, as a step toward publishing and linking large semantic data in Arabic.

Key-Words: - Semantic Web; DBpedia; Wikipedia; Mapping; Data Sharing; Arabic Chapter

1 Introduction

The DBpedia project [1] is the result of a collaboration between the Free University of Berlin, OpenLink Software and the University of Leipzig that aims to represent Wikipedia content in a structured form, such as RDF triples. DBpedia is considered the main web hub that links different datasets to each other [2][3][4]. The availability of DBpedia in local languages is expected to open the door for semantic applications that discover new knowledge from different resources, and to facilitate sophisticated queries, which are not possible in Wikipedia. DBpedia is licensed under the Creative Commons Attribution-ShareAlike License [5] and the GNU Free Documentation License [6]. At the time of writing, there are 18 localized DBpedia chapters: English, German, French, Greek, Basque, Czech, Dutch, Polish, Swedish, Spanish, Italian, Portuguese, Russian, Ukrainian, Korean, Japanese, Indonesian and Esperanto [7]. By contrast, there are 279 active Wikipedia editions [8] in different languages worldwide, so the available DBpedia chapters cover no more than 6.45% of the Wikipedia editions. DBpedia faces obstacles that prevent sustainable operation, including the lack of multilingualism and of documentation. Each language has its own characteristics and its own way of mapping. Many languages still have no DBpedia chapter; for example, the Arabic Chapter did not exist before the work described in this paper, although there are around 135,610,819 [9][10] Arab internet users worldwide, about 4.8% of all users in the world.

This paper describes the work we did on the mapping and implementation of the Arabic Chapter. It is intended as a roadmap for the Arabic Chapter of DBpedia: it explains how to deal with the difficulties and problems we met during mapping, and describes the best practices and our mapping methodology, which increased the number of mapped templates and properties in the Arabic Chapter and brought it into the DBpedia mapping race through our individual efforts.

This paper is organized as follows: Section 2 discusses the related work. Section 3 gives an overview of the Wikipedia project and the DBpedia project, while Section 4 highlights the DBpedia Extraction Framework. Section 5 describes the steps needed to build the Arabic Chapter of DBpedia, highlighting the difficulties we met and how we solved them; it also describes our mapping methodology and the conditions needed to run SPARQL queries. Section 6 shows our mapping results. Finally, Section 7 concludes the paper and discusses future work.

Mathematical and Computational Methods in Electrical Engineering

ISBN: 978-1-61804-329-0 115

2 Related Work

DBpedia internationalization efforts began only a few years ago, aiming to help find solutions for the shortcomings of multilingualism in DBpedia. Not much work has been done in this direction, and it remains an open research area. Kontokostas and Hellmann [11] described the main problems faced by the multilingual chapters of DBpedia, such as the lack of tools that support multilingualism and the lack of documentation and tutorials that would help the community contribute. An attempt to map multilingual Wikipedia infoboxes to the DBpedia ontology was discussed in [12], but it does not meet the needs of mapping the Arabic language. Xing Niu and colleagues proposed strategies for the mappings of the Chinese DBpedia, but these do not fully match the Arabic mappings because of the different characteristics of the two languages, especially in punctuation [13]. Alessio Palmero attempted to automate the mapping of multilingual DBpedia, but his work has not been proven in use by the DBpedia community and did not consider the Arabic language [14]. The Arabic language was first discussed in the context of DBpedia in [15], which highlighted the weaknesses and difficulties that face the Arabic language in this domain.

3 Wikipedia & DBpedia

3.1 Wikipedia

Wikipedia is a major source of data and has become the 7th most visited website in the world [16]. It is an open, multilingual, online encyclopedia covering many topics in different domains; it is authored, edited and published by volunteers worldwide and has a total of around 35,579,919 articles [8] written in wikitext, a lightweight markup language [17]. Wikipedia contains inter-language links that connect articles in different languages [18], but this does not mean that a change in one language affects the others; each language edition has to maintain its own content. The Arabic edition is ranked as the 22nd largest edition of Wikipedia, with 350,000 articles [8][19], which is 0.98% of the total number of articles in Wikipedia. A Wikipedia article begins with a short paragraph describing the topic, and some pages also contain infoboxes. An infobox is a table placed in the upper right corner of the Wikipedia article in the Arabic edition, enclosed by {{ }} operators in wikitext; it contains attribute-value pairs that can be written in Arabic or English, although Arabic is preferable. Not all Wikipedia pages have infoboxes, but most have a reference template that describes the infobox and its attributes. Fig. 1 shows the Arabic infobox of the city of Cairo on its Wikipedia page.

Fig. 1. The Arabic Wikipedia page infobox for the Cairo city in Egypt
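The attribute-value structure of an infobox described above can be illustrated with a minimal Python sketch. It is not the parser used by DBpedia; it assumes a flat infobox with no nested templates and no pipes inside links, and the sample attribute names and values are hypothetical.

```python
def parse_infobox(wikitext):
    """Return (template_name, {attribute: value}) for the first {{ }} block.

    Simplified on purpose: nested templates and pipes inside links
    are not handled.
    """
    start = wikitext.find("{{")
    end = wikitext.find("}}", start)
    if start == -1 or end == -1:
        return None, {}
    body = wikitext[start + 2:end]
    parts = [part.strip() for part in body.split("|")]
    name = parts[0]
    attributes = {}
    for part in parts[1:]:
        if "=" in part:
            key, value = part.split("=", 1)
            attributes[key.strip()] = value.strip()
    return name, attributes


# Hypothetical wikitext; real infoboxes carry many more attributes.
sample = """{{Infobox settlement
| name = Cairo
| country = Egypt
}}"""

name, attrs = parse_infobox(sample)
print(name, attrs)
```

A page without any {{ }} block yields `(None, {})`, which corresponds to the case of articles that have no infobox.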

Wikipedia also has some weaknesses: its content is unstructured, and it supports no search capabilities beyond textual search. It is also not capable of answering sophisticated queries.

3.2 DBpedia

DBpedia gained its prominence from the large number of resources and datasets linked to it, forming the Linked Open Data (LOD) [4]. The version available at the time of writing is the DBpedia 2014 release, which contains 38 million things with 3 billion RDF triples and 685 classes [20], in addition to 1079 object properties and 1600 datatype properties [21] that represent the relationship between two concepts or between a concept and its value.

The English Chapter of DBpedia is considered the most complete chapter so far, and it is the one we use as a reference when mapping to the DBpedia ontology. It describes 4.58 million entities, including 1,455,000 persons, 241,000 organizations, 735,000 places and 411,000 creative works (films, music albums, and video games), in addition to 131.2 million fact assertions derived from infoboxes and 168.5 million triples representing the Wikipedia structure [22][21]. The DBpedia ontology is based mainly on OWL [23], which is considered the backbone of the Semantic Web. In addition to OWL, other schemes are used in DBpedia, such as Wikipedia Categories, YAGO classes [24], and the Upper Mapping and Binding Exchange Layer (UMBEL) [25]. Other vocabularies are used to describe attributes and their values, such as FOAF [26], SKOS [27], and DC [28].

4. DBpedia Extraction Framework

The main goal of DBpedia is to extract the attributes represented in Wikipedia infoboxes to form a structured, linked knowledge graph that facilitates sophisticated queries and can easily be used in various applications [29]. There are two main ways of gathering data from Wikipedia, distinguished by their timing: dump-based extraction and live extraction [30][31][32]. Both use the Extraction Manager [33][34], which manages the collecting and parsing of Wikipedia articles and the conversion of their values into different units, data types and geo-coordinates [30][31]. Dump-based extraction depends on local Wikipedia dumps, which are updated monthly in SQL form and converted to triples by the N-Triples serializer or loaded into the Virtuoso triplestore [32]. Live extraction, on the other hand, depends on extractors such as the Label Extractor, the Abstract Extractor and others that are explained later on. These extractors are applied to a Wikipedia page and extract triples, which are placed in the N-Triples dumps as well as in Virtuoso, which serves as a SPARQL endpoint [30][31][35][36]. Any change on Wikipedia is immediately reflected in DBpedia through the access that Wikipedia gives DBpedia via the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [37][38], as shown in Fig. 2.

Fig. 2. DBpedia Extraction Framework

There are also two main methods of infobox extraction: Mapping-Based Infobox Extraction and Generic Infobox Extraction. The mapping-based method collects the most common attributes in Wikipedia and maps them to properties in the DBpedia ontology. It is used to cover the shortcomings of Generic Infobox Extraction, which extracts all attributes and their values by combining the title of a Wikipedia article, the attribute as a property in a namespace such as http://dbpedia.org/property, and the value of that attribute into a subject, predicate and object [31]. The DBpedia Information Extraction Framework (DIEF) added great value to the extraction process for multilingual content [2][39]. The main extractors [29][31][32][39] used in DBpedia are:

- Abstract Extractor: extracts the first paragraph of the Wikipedia article and places it in the DBpedia property dbpedia-owl:abstract [35].
- Infobox Extractor: extracts the properties of all infoboxes.
- Geo-Coordinates Extractor: extracts the geo-coordinates placed on Wikipedia pages, either inside the infobox or in the upper left-hand side of the page in the English edition and the upper right-hand side in the Arabic edition of Wikipedia.
- Page Links Extractor: extracts the links between Wikipedia articles.
- Label Extractor: places the title of the Wikipedia article in rdfs:label in DBpedia.
- Homepage Extractor: extracts the website of the main entity of the article and places it in foaf:homepage [31].
- Inter-language Links Extractor: links an article in one language to the articles in other languages that represent the same topic.
- Image Extractor: extracts the image of the wiki page and places it in the property foaf:depiction [31].
- Page ID Extractor: every page in Wikipedia has an integer ID that identifies it; the value of the page ID is placed in dbpedia-owl:wikiPageID.
- Revision ID Extractor: extracts the integer value that represents the revision ID and places it in dbpedia-owl:wikiPageRevisionID [30][31].
- Person Data Extractor: extracts data about a person, such as date and place of birth, which can be represented in the FOAF vocabulary, such as foaf:birthDate, or in the DBpedia vocabulary, such as dbpedia-owl:birthDate [39].
- Wiki Page Extractor: extracts links to the corresponding pages or articles in Wikipedia.
- Disambiguation Extractor: handles homonyms that refer to different web resources via dbpedia-owl:wikiPageDisambiguates.
- Redirects Extractor: redirects links between articles for synonyms; for example, the name of the famous Egyptian actor Adel Imam, dbpedia:Adel_Imam, is redirected to dbpedia:Adel_Emam because both represent the same resource [31].
- Article Category Extractor: each article is categorized according to the nature of its topic. This categorization is represented in DBpedia via the Dublin Core [28] vocabulary, especially dc:subject, and the SKOS vocabulary [27].
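To make the subject, predicate and object form concrete, the following Python sketch shows one possible reading of Generic Infobox Extraction and of the Label Extractor. It is an illustration only, not the DBpedia Information Extraction Framework itself: the function names are hypothetical, every object is treated as a plain literal, and the sample attribute and value are assumptions.

```python
RESOURCE_NS = "http://dbpedia.org/resource/"
PROPERTY_NS = "http://dbpedia.org/property/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def resource_iri(title):
    """Build a resource identifier from an article title."""
    return RESOURCE_NS + title.replace(" ", "_")


def generic_infobox_triples(title, attributes):
    """One triple per infobox attribute, with the raw attribute name as property."""
    subject = resource_iri(title)
    return [
        (subject, PROPERTY_NS + attribute.replace(" ", "_"), value)
        for attribute, value in attributes.items()
    ]


def label_triple(title):
    """The article title placed in rdfs:label, as the Label Extractor does."""
    return (resource_iri(title), RDFS_LABEL, title)


def to_ntriples(triple):
    """Serialize a triple with a literal object as one N-Triples line."""
    subject, predicate, obj = triple
    escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


triples = generic_infobox_triples("Cairo", {"country": "Egypt"})
triples.append(label_triple("Cairo"))
lines = [to_ntriples(t) for t in triples]
print("\n".join(lines))
```

Because the generic method copies attribute names verbatim, two infoboxes that spell the same attribute differently produce two different properties, which is the shortcoming the mapping-based method is meant to cover.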

5. Arabic Chapter

Arabic is a non-ASCII language; for that reason we use the Internationalized Resource Identifier (IRI) to identify resources uniquely, as a complement to the Uniform Resource Identifier (URI), which is widely used on the Semantic Web [13][40]. When an IRI is used in the address bar of a browser to identify a resource, special characters may appear in the address, such as

http://ar.dbpedia.org/resource/%D%A%d0%d1

which is an unclear representation of that resource; for that reason it is better to use one of the available plugins to avoid this problem. In addition, the following main namespaces are used to define the different players in the Arabic Chapter of DBpedia:

- http://dbpedia.org/resource, which represents the data extracted from an article according to the mapping of an infobox and has the prefix dbpedia:
- http://dbpedia.org/ontology, which represents the DBpedia ontology and has the prefix dbpedia-owl:
- http://dbpedia.org/property/, which represents properties extracted from Wikipedia templates as attributes and has the prefix dbpprop: [32]
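The percent-encoded addresses mentioned above come from converting an IRI into an ASCII-only URI. A minimal Python sketch of that conversion, using only the standard library, is given here; the resource name is an illustrative assumption, and the helper names are hypothetical.

```python
from urllib.parse import quote, unquote

BASE = "http://ar.dbpedia.org/resource/"


def iri_to_uri(local_name):
    """Percent-encode the UTF-8 bytes of an Arabic local name."""
    return BASE + quote(local_name, safe="_")


def uri_to_iri(uri):
    """Decode percent-escapes back into readable Unicode characters."""
    return unquote(uri)


iri_name = "القاهرة"  # illustrative resource name (Cairo)
uri = iri_to_uri(iri_name)
print(uri)              # ASCII-only form, as a browser may display it
print(uri_to_iri(uri))  # readable IRI form
```

Each Arabic letter becomes two percent-escaped bytes in the URI form, which is why such addresses look unclear even though they identify the same resource as the IRI.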

5.1 Mapping Difficulties

One of the objectives of the DBpedia ontology is to structure Wikipedia data in a standard way and to reduce redundancy, but this is not currently the case: the DBpedia ontology is becoming unclear, and standards are being lost [41]. This is one of the reasons that led us to define a methodology for Arabic DBpedia mapping that can enhance the work on this chapter and attract more mapped classes, more contributions, and more applications. Mapping is one of the main players on the DBpedia stage and can be a reason for the success or failure of a chapter. The extractors depend mainly on the mapping, so a wrong mapping can cause loss of data; this is where the importance of mapping comes from. Given the importance of the mapping process in DBpedia [42], and because our work is considered the first contribution to the deployment of the Arabic Chapter of DBpedia [15], we explain the main participants in mappings and their relations with each other. Before going deeper into our mapping methodology, we first discuss the problems we faced when mapping Wikipedia templates and attributes to DBpedia classes and properties in the Arabic Chapter.

The infobox template is one of the keys to the success of DBpedia. Unfortunately, not all Wikipedia contributors know the importance of an infobox, and for that reason they do not include one in their articles. For example, only four out of 50 pages about Egyptian museums on Wikipedia have infoboxes, which is a relatively small number given the importance and value of these pages for cultural heritage. Infoboxes may be missing for different reasons; one of them is that the writer of an article may not know the importance of the infobox or how to use it [15].

The English Chapter is considered the most complete chapter and is the reference chapter that we map to. Unfortunately, a change to one of the classes we map to in the English Chapter may change the data retrieved when querying another chapter of DBpedia. We should therefore make sure that the class we map to is available in the English edition, and we should maintain the mapping continuously. For example, the Cairo page in Wikipedia has an infobox called settlement in the English Chapter, as shown in Fig. 3, while in the Arabic Chapter it is called "تجمع سكاني" and in the German Chapter it is "ort".

Fig. 3. Settlement Infobox in the English Chapter for Cairo

These infoboxes were mapped to the class Settlement, which has attributes such as areaTotal, areaLand, and areaWater. The class Settlement is now mapped to the top class PopulatedPlace in the DBpedia ontology, which affects the chapters that depend on the English Chapter of DBpedia and its ontology. We therefore remapped this infobox in the Arabic Chapter to the class PopulatedPlace in the DBpedia ontology to overcome this shortcoming.

Another problem found while mapping is an infobox whose name does not reflect the page content correctly or overlaps with another infobox. For example, the article "متحف عابدين", which refers to the Abdeen Museum in Egypt, has an infobox named "مبنى أثري", which has different attributes from the museum infobox, so mapping it to the DBpedia ontology would cause confusion. For that reason, we changed it to "متحف" so that it corresponds to the museum infobox, and mapped it to the class Museum in the DBpedia ontology. Dissimilarity of attributes between corresponding infoboxes also creates inconsistency; for example, the Album infobox, which is mapped to the class Album in the DBpedia ontology, differs from one chapter to another: the French Chapter has only six attributes mapped, while 16 attributes are mapped in the Arabic Chapter. Moreover, an attribute may appear in one chapter and not be available in another; for example, there is no position corresponding to the Chancellor in universities in the Arab region, and for that reason it is not included in the mapping of the University infobox in the Arabic Chapter, which is represented as mappings.dbpedia.org/index.php/mapping-ar:معلومات_جامعة. An attribute may be unavailable in an infobox for several reasons: the author of the article may not know its value, the attribute may have no meaning in the context of the article, or the attribute name may have been chosen wrongly in the Wikipedia template. The appearance of different attributes that have the same meaning but different syntax is another Arabic mapping problem, such as "ولد في", "تاريخ الميلاد", "الميلاد" and "المولد", as shown in Fig. 4, which illustrates this problem in different infoboxes for public figures in the Arab region.

Fig. 4. Different syntax for the birthdate attribute in different infoboxes


This problem can be solved by selecting one of these attributes and using it to replace the other attributes that have the same meaning in the Arabic Wikipedia. Another problem in the Arabic mapping is the dissimilarity between the name of an infobox and the template it refers to, such as the infobox "معلومات جامعة" and its template, which is called "جامعة". In addition, there are infoboxes with different names that refer to the same thing and have different attributes, which causes inconsistency, such as "معلومات فنان موسيقى", "صندوق معلومات فنان موسيقى" and "فنان موسيقى", which all refer to the infobox "MusicalArtist" in the English Chapter. To overcome these shortcomings, only one infobox should be used to represent the same topic.
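The idea of selecting one attribute name and replacing its synonyms can be sketched in Python as follows. This is only an illustration of the principle: the choice of "تاريخ الميلاد" as the canonical name is an assumption, the Arabic spellings are the birth-date variants as best read from the text, and the sample value is a placeholder.

```python
# Assumed canonical name; the paper only says that one variant should be chosen.
CANONICAL_BIRTH_DATE = "تاريخ الميلاد"

# Variants of the birth-date attribute found in different infoboxes.
BIRTH_DATE_VARIANTS = {"ولد في", "تاريخ الميلاد", "الميلاد", "المولد"}

SYNONYMS = {variant: CANONICAL_BIRTH_DATE for variant in BIRTH_DATE_VARIANTS}


def normalize_attributes(attributes):
    """Rename synonymous attribute names to a single canonical name.

    If two variants occur in the same infobox, the first value is kept.
    """
    normalized = {}
    for key, value in attributes.items():
        canonical = SYNONYMS.get(key.strip(), key.strip())
        normalized.setdefault(canonical, value)
    return normalized


infobox_a = {"ولد في": "placeholder date"}
infobox_b = {"المولد": "placeholder date"}
print(normalize_attributes(infobox_a) == normalize_attributes(infobox_b))
```

Once the names are unified, a single property mapping for the canonical attribute covers every infobox that previously used one of the variants.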

5.2 Mapping Methodology

As discussed before, the infobox is the main player in DBpedia mappings in general, not only in the Arabic Chapter. For each article in Wikipedia, there is an infobox template that covers the topic, whether or not the infobox is present in the article. The template includes attributes that are later mapped to properties in the DBpedia ontology. Some of these attributes differ from one language to another, depending on the creator of the template and the culture of that language, but most of them should be the same. Our Arabic Chapter's Mapping Methodology (ACMM) is shown in Fig. 5, in addition to the examples illustrated in this section. ACMM is organized in eight sequential steps as follows:

1- Creation of an Infobox: this step applies if no infobox is available for the Wikipedia article that we want to map. Selecting the infobox that most closely suits the article is one of the keys to successfully mapping Wikipedia infoboxes to the DBpedia ontology. For example, for an article about Adel Emam, the famous Arab actor, it is better to create an Actor infobox for that article, if none is available, instead of a Person infobox.

2- Check the Mapping Statistics: in this step we first check whether the infobox has been mapped before; this can be found in the Arabic mappings section of the DBpedia mappings.

3- Search for the Suitable Class: if the infobox is not mapped to one of the ontology classes, we look at the English Wikipedia and the DBpedia ontology to find the most appropriate class to map to.

4- Extract the Mapping of the Equivalent Class: at this stage, the equivalent infobox in the English Chapter, which was previously mapped to the DBpedia ontology, serves as the reference for our infobox in the Arabic Chapter.

5- Mapping: in this step we map the attributes of the Arabic infobox to those of the English one and to their corresponding properties in the DBpedia ontology. Before mapping, however, if no class represents this infobox in the Arabic mapping, we should create this class first.

6- Revision: we revise the mapping to ensure that it is correct.

7- Validation: in this step we validate and test the mapping to correct any inconsistencies found.

8- Publishing: finally, the mapped infobox is ready to be published.

Fig. 5. Flowchart describing the ACMM

Let us illustrate our methodology with an article about the Al-Ahram daily newspaper, one of the famous daily newspapers in the Arab region, which has a template called newspaper that can be visited through https://en.wikipedia.org/wiki/Template:Infobox_newspaper for the English version and via the URI https://ar.wikipedia.org/wiki/قالب:صندوق_معلومات_جريدة for the Arabic version, as shown in Fig. 6.

Fig. 6. Newspaper template in both Arabic & English Wikipedia

Fig. 6 shows the template that is the reference for the infobox. Each article has a template, and if it does not exist, we can create it if we have a Wikipedia account. These two templates are filled in with the information known to the author of the infobox or by other Wikipedia contributors. If the author of an infobox does not know the value of an attribute, he can leave it blank until someone who knows its value fills it in. Fig. 7 and Fig. 8 show the Al-Ahram infobox in Arabic and English after their attributes have been filled in.

Fig. 7. Infobox newspaper after being filled with data

Fig. 8. Infobox معلومات جريدة after being filled with data

After an infobox is created and filled with data, it is validated by the Wikipedia community and appears on the article page, as shown in Fig. 9.

Fig. 9. Infobox newspaper as it appears on the article page in the Arabic & English Wikipedia

Mapping statistics, as discussed earlier, give an overview of the mapped and unmapped infoboxes, as shown in Fig. 10. After making sure that the infobox is not yet available there, we create this infobox in the domain of our language, which in our case is Arabic, referring to the infobox that was previously mapped in the English Chapter to the DBpedia ontology, as shown in Fig. 11.

Fig. 10. Mapping Statistics for the Arabic Chapter of DBpedia

Fig. 11. Part of the Infoboxes mapped in the Arabic Chapter of DBpedia

As shown in Fig. 12, we use the template mapping to map each attribute in the infobox to its corresponding property in the DBpedia ontology. If we do not find a corresponding property, we can create it, specify its domain and range, and then use it. The result, after verification, is the mapping page shown in Fig. 13 for the Al-Ahram newspaper, which contains its data described with different vocabularies.

Fig. 12. The mapping template for the newspaper infobox

Fig. 13. Al-Ahram page on DBpedia
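The effect of a template mapping can be sketched in Python as a lookup from infobox attributes to ontology properties. This is a simplified illustration, not the syntax of the DBpedia mappings wiki: the Arabic attribute names, the choice of the Newspaper class and the founder property, and the resource identifier are all assumptions made for the example.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
ONTOLOGY_NS = "http://dbpedia.org/ontology/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Hypothetical mapping for the Arabic newspaper infobox.
NEWSPAPER_MAPPING = {
    "map_to_class": ONTOLOGY_NS + "Newspaper",
    "properties": {
        "الاسم": FOAF_NAME,                  # assumed attribute for the name
        "المؤسس": ONTOLOGY_NS + "founder",   # assumed attribute for the founder
    },
}


def apply_mapping(resource, infobox, mapping):
    """Turn infobox attributes into triples; report attributes with no mapping."""
    triples = [(resource, RDF_TYPE, mapping["map_to_class"])]
    skipped = []
    for attribute, value in infobox.items():
        prop = mapping["properties"].get(attribute)
        if prop is None:
            skipped.append(attribute)  # data that is lost without a mapping
        else:
            triples.append((resource, prop, value))
    return triples, skipped


resource = "http://ar.dbpedia.org/resource/الأهرام"  # hypothetical identifier
infobox = {"الاسم": "الأهرام", "unmapped_attribute": "some value"}
triples, skipped = apply_mapping(resource, infobox, NEWSPAPER_MAPPING)
print(triples)
print(skipped)
```

The `skipped` list shows why an incomplete or wrong mapping causes loss of data: attributes without a corresponding property never reach the ontology-based output.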

5.3 SPARQL Endpoint

After the mapped classes are verified, one might expect that queries can be made, but this is not the case: before the SPARQL endpoint is deployed, we cannot run any queries on our chapter of DBpedia. SPARQL is one of the building blocks of DBpedia. It is a protocol and RDF query language that became a recommendation in January 2008 as SPARQL 1.0 [43], while SPARQL 1.1 [44] became a recommendation in March 2013. SPARQL is used to retrieve triples and make sophisticated queries. Virtuoso is the SPARQL endpoint that we use. The infrastructure given to the SPARQL endpoint for the Arabic Chapter is an Intel Core i5 machine with 4 GB of memory and a 500 GB hard disk running the Ubuntu 12.04 Linux operating system. Fig. 14 shows a SPARQL query that retrieves the foaf name of the Al-Ahram newspaper from the SPARQL endpoint.

Fig. 14. A SPARQL query on Al-Ahram page on DBpedia
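A query of the kind described above can be sketched as follows. The endpoint URL and the resource IRI are placeholders chosen for illustration, not the actual addresses of the Arabic Chapter; only the FOAF namespace and the `query` request parameter of the SPARQL Protocol are taken as given. The sketch builds the request URL but does not send it.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder values (assumptions): substitute the real endpoint and resource IRI.
ENDPOINT = "http://example.org/sparql"
RESOURCE = "http://example.org/resource/Al-Ahram"


def build_name_query(resource_iri: str) -> str:
    """Build a SELECT query that asks for the foaf:name of one resource."""
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?name WHERE {\n"
        f"  <{resource_iri}> foaf:name ?name .\n"
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query text in the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


query = build_name_query(RESOURCE)
url = build_request_url(ENDPOINT, query)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
```

The resulting URL could then be fetched with any HTTP client once a real endpoint is available.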

6 Results

Our work on the Arabic DBpedia is considered the first contribution to that chapter of DBpedia. We established the "ar" namespace and sought a methodology that could increase the number of mapped templates and properties in the Arabic Chapter. When we established the Arabic Chapter, the number of mapped template occurrences was zero, as shown in Fig. 15.

Fig. 15. Template occurrences were zero on DBpedia when we

established the Arabic Chapter

After the deployment of our mapping methodology, the number of mapped classes in the Arabic Chapter reached 52 mappings from Wikipedia templates to DBpedia classes, out of 1234 [21]. In addition, the number of template occurrences mapped from the Arabic Wikipedia to the Arabic Chapter of DBpedia increased from zero to 59,087 out of 208,913, which is 28.28% of the total template occurrences in the Arabic Wikipedia. Likewise, the number of mapped property occurrences increased from zero to 342,231 out of 2,817,139, which is 12.15% of the total number of property occurrences. The state of the mapping of the Arabic Chapter to date is shown in Fig. 16.

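Fig. 16. Percentage of occurrences after mapping for classes, templates and properties in the Arabic Chapter of DBpedia (vertical axis: occurrences (%), 0 to 30; categories: Classes Mapped, Templates Mapped, Properties Mapped)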
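The two coverage percentages reported above follow directly from the mapped and total occurrence counts. A short check, using only the figures stated in this section (the helper name `coverage_percent` is ours):

```python
def coverage_percent(mapped: int, total: int) -> float:
    """Share of mapped occurrences, as a percentage rounded to two decimals."""
    return round(100.0 * mapped / total, 2)


# Counts as reported in this section.
template_pct = coverage_percent(59087, 208913)    # template occurrences
property_pct = coverage_percent(342231, 2817139)  # property occurrences

print(f"Templates mapped:  {template_pct}%")
print(f"Properties mapped: {property_pct}%")
```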

7 Conclusions and Future Work

DBpedia is currently a topic of considerable interest because of its importance to the web of Linked Open Data. Unfortunately, Arabic was not included before the work described here, despite the importance of the Arabic language given its enormous number of internet users worldwide. This paper described the creation and mapping of the Arabic Chapter and showed the increase in the number of templates and attributes mapped from Wikipedia to the classes and properties of the DBpedia ontology after the deployment of our mapping methodology. Our future work will address the quality of the data. We will also try to find ways to fully automate the mapping process of the Arabic DBpedia.

References:

[1] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak and S. Hellmann, 'DBpedia - A crystallization point for the Web of Data', Web Semantics: Science, Services and Agents on the World Wide Web, vol. 7, no. 3, pp. 154-165, 2009.
[2] D. Kontokostas, C. Bratsas, S. Auer, S. Hellmann, I. Antoniou and G. Metakides, 'Internationalization of Linked Data: The case of the Greek DBpedia edition', Web Semantics: Science, Services and Agents on the World Wide Web, vol. 15, pp. 51-61, 2012.
[3] C. Bizer, T. Heath and T. Berners-Lee, 'Linked Data - The Story So Far', International Journal on Semantic Web and Information Systems, vol. 5, no. 3, pp. 1-22, 2009.
[4] Y. Yamamoto, A. Yamaguchi and A. Yonezawa, 'Building Linked Open Data towards integration of biomedical scientific literature with DBpedia', J Biomed Sem, vol. 4, no. 1, p. 8, 2013.
[5] Creativecommons.org, 'Creative Commons', 2015. [Online]. Available: https://creativecommons.org. [Accessed: 25-Jul-2015].
[6] Gnu.org, 'GNU Free Documentation License v1.3 - GNU Project - Free Software Foundation', 2015. [Online]. Available: http://www.gnu.org/licenses/fdl-1.3.en.html. [Accessed: 21-Jul-2015].
[7] Wiki.dbpedia.org, 'Language Chapters DBpedia', 2015. [Online]. Available: http://wiki.dbpedia.org/about/language-chapters. [Accessed: 22-Jul-2015].
[8] Wikipedia, 'List of Wikipedias', 2015. [Online]. Available: https://en.wikipedia.org/wiki/List_of_Wikipedias. [Accessed: 18-Jul-2015].
[9] Wikipedia, 'Languages used on the Internet', 2015. [Online]. Available: https://en.wikipedia.org/wiki/Languages_used_on_the_Internet. [Accessed: 18-Jul-2015].
[10] Internetworldstats.com, 'Arabic Speaking Internet Users and Population Statistics', 2015. [Online]. Available: http://www.internetworldstats.com/stats19.htm. [Accessed: 27-Jul-2015].
[11] D. Kontokostas and S. Hellmann, The DBpedia data stack. Towards a Sustainable DBpedia Project provide a public data infrastructure for Europe. [Online]. Available: http://svn.aksw.org/papers/2014/EDF_DBpediaDataStack/public.pdf. [Accessed: 15-June-2015].
[12] C. Bratsas, L. Ioannidis, D. Kontokostas, S. Auer, C. Bizer, S. Hellmann and I. Antoniou, 'DBpedia internationalization - a graphical tool for I18n infobox-to-ontology mappings', in International Semantic Web Conference Demo (ISWC2011 Demo), Bonn, Germany, 2011.
[13] X. Niu, X. Sun, H. Wang, S. Rong, G. Qi and Y. Yu, 'Zhishi.Me - Weaving Chinese Linking Open Data', in The Semantic Web - ISWC 2011: 10th International Semantic Web Conference, Bonn, Germany, 2011.


[14] A. Palmero Aprosio, 'Extending Linked open data resources exploiting Wikipedia as a source of information', Ph.D., Università degli Studi di Milano, 2012.
[15] H. Al-Feel, 'A Step towards the Arabic DBpedia', International Journal of Computer Applications, vol. 80, no. 3, pp. 27-33, 2013.
[16] Alexa.com, 'Alexa Top 500 Global Sites', 2015. [Online]. Available: http://www.alexa.com/topsites. [Accessed: 27-Jul-2015].
[17] S. Auer and J. Lehmann, 'What Have Innsbruck and Leipzig in Common? Extracting Semantics from Wiki Content', in The 4th European Semantic Web Conference (ESWC 2007), Innsbruck, Austria, 2007, pp. 503-517.
[18] L. Zhang, A. Rettinger and S. Thoma, 'Bridging the Gap between Cross-lingual NLP and DBpedia by Exploiting Wikipedia', 2014. [Online]. Available: http://www.aifb.kit.edu/images/4/43/Nlp%26dbpedia.pdf. [Accessed: 27-Jul-2015].
[19] Wikipedia, 'Arabic Wikipedia', 2015. [Online]. Available: https://en.wikipedia.org/wiki/Arabic_Wikipedia. [Accessed: 27-Jul-2015].
[20] 'DBpedia Statistics', 2015. [Online]. Available: http://Wiki.dpedia.org/Datasets2014/dataset/Statistics. [Accessed: 27-Jul-2015].
[21] Blog.dbpedia.org, 'DBpedia Version 2014 released | DBpedia Blog', 2014. [Online]. Available: http://blog.dbpedia.org/?p=77. [Accessed: 8-Jul-2015].
[22] D. Kontokostas, 'The past, present & future of DBpedia', in 18th International Conference on Business Information Systems (BIS), Poznan, Poland, 2015.
[23] W3.org, 'OWL 2 Web Ontology Language Document Overview (Second Edition)', 2015. [Online]. Available: http://www.w3.org/TR/owl2-overview/. [Accessed: 27-Jul-2015].
[24] F. Suchanek, G. Kasneci and G. Weikum, 'YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia', in Proceedings of the Sixteenth International World Wide Web Conference (WWW2007), Banff, Alberta, Canada, 2007, pp. 697-702.
[25] Umbel.org, 'UMBEL: Upper Mapping and Binding Exchange Layer', 2015. [Online]. Available: http://www.umbel.org/. [Accessed: 23-Jul-2015].
[26] Xmlns.com, 'FOAF Vocabulary Specification', 2014. [Online]. Available: http://xmlns.com/foaf/spec/. [Accessed: 17-Jul-2015].
[27] A. Miles and S. Bechhofer, 'SKOS Simple Knowledge Organization System Namespace Document 30 July 2008 "Last Call" Edition', W3.org, 2008. [Online]. Available: http://www.w3.org/TR/2008/WD-skos-reference-20080829/skos.html. [Accessed: 11-Jul-2015].
[28] Dublincore.org, 'DCMI Home: Dublin Core® Metadata Initiative (DCMI)', 2014. [Online]. Available: http://dublincore.org/. [Accessed: 27-Jul-2015].
[29] S. Hellmann, C. Stadler, J. Lehmann and S. Auer, 'DBpedia Live Extraction', in On the Move to Meaningful Internet Systems: OTM, Vilamoura, Portugal, 2009, pp. 1209-1223.
[30] M. Morsey, J. Lehmann, S. Auer, C. Stadler and S. Hellmann, 'DBpedia and the live extraction of structured data from Wikipedia', Program: electronic library and information systems, vol. 46, no. 2, pp. 157-181, 2012.
[31] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C. Becker, R. Cyganiak and S. Hellmann, 'DBpedia - A crystallization point for the Web of Data', Web Semantics: Science, Services and Agents on the World Wide Web, vol. 7, no. 3, pp. 154-165, 2009.
[32] J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D. Kontokostas, P. Mendes, S. Hellmann, M. Morsey, P. van Kleef, S. Auer and C. Bizer, 'DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia', Semantic Web Journal, vol. 5, pp. 1-29, 2014.
[33] S. Hellmann, 'DBpedia Extraction of Knowledge from Wikipedia', http://lod2.eu, 2011. [Online]. Available: http://semanticweb.kaist.ac.kr/workshop2011/presentation/3_dbpedia_sebastian.pdf. [Accessed: 20-Jul-2015].
[34] D. Lange, C. Böhm and F. Naumann, 'Extracting Structured Information from Wikipedia Articles to Populate Infoboxes', in 19th ACM international conference on Information and knowledge management, 2010, pp. 1661-1664.
[35] 'LOD2 - Creating Knowledge out of Interlinked Data', 2010. [Online]. Available: http://cordis.europa.eu/docs/projects/cnect/3/257943/080/deliverables/001-LOD2D322DBpediaLiveExtraction.pdf. [Accessed: 10-Jul-2015].
[36] Wiki.dbpedia.org, 'Data Set 2014 | DBpedia', 2015. [Online]. Available: http://wiki.dbpedia.org/data-set-2014. [Accessed: 6-Jul-2015].
[37] Openarchives.org, 'Open Archives Initiative - Protocol for Metadata Harvesting - v.2.0', 2015. [Online]. Available: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm. [Accessed: 2-- Jul- 2015].


[38] S. Hamburger, 'Using the open archives initiative protocol for metadata harvesting', Library Collections, Acquisitions, and Technical Services, vol. 32, no. 2, p. 114, 2008.
[39] A. Ismayilov, D. Kontokostas, S. Auer, J. Lehmann and S. Hellmann, Wikidata through the Eyes of DBpedia, 1st ed. 2015.
[40] W3.org, 'Internationalized Resource Identifiers (IRIs)', 2011. [Online]. Available: http://www.w3.org/International/O-URL-and-ident.html. [Accessed: 15-June-2015].
[41] 'Mapping Guide', 2015. [Online]. Available: http://mappings.dbpedia.org/index.php/maping_guide. [Accessed: 10-June-2015].
[42] D. Kontokostas, DBpedia Mappings Crowd-sourcing, 1st ed. AKSW, Universität Leipzig, 2014, pp. 2-23.
[43] W3.org, 'SPARQL Query Language for RDF 1.0', 2008. [Online]. Available: http://www.w3.org/TR/rdf-sparql-query/. [Accessed: 27-Jul-2015].
[44] W3.org, 'SPARQL 1.1 Query Language', 2013. [Online]. Available: http://www.w3.org/TR/sparql11-query/. [Accessed: 10-June-2015].
