the roadmap for the arabic chapter of dbpediaits content from wikipedia and linked to thousands of...
Post on 06-Jul-2020
6 Views
Preview:
TRANSCRIPT
The Roadmap for the Arabic Chapter of DBpedia
HAYTHAM AL-FEEL
Information Systems Department, Faculty of Computers and Information Fayoum University
Cairo, Egypt
htf00 @fayoum.edu.eg
Abstract: - DBpedia is nowadays considered one of the main sources of structured data on the web that extract
its content from Wikipedia and linked to thousands of web resources forming the Linked Open Data. DBpedia
faces obstacles, such as the lack of multilingualism that appears clearly in the Arabic domain. From that point
of view, the paper on hands draws a roadmap to adhere these obstacles via representing the main vocabularies
used in DBpedia, main participants, and best practices. In addition to the representing of our Mapping
Methodology for the Arabic Chapter, which we call ACMM. This paper highlights our efforts that considered
the first contribution in the deployment of the Arabic Chapter of DBpedia from scratch that increased the
number of mapped infoboxes, open the door for development of new applications based on this chapter, as a
step in publishing and linking large semantic data in Arabic.
Key-Words: - Semantic Web; DBpedia; Wikipedia; Mapping; Data Sharing; Arabic Chapter
1 Introduction
DBpedia project [1] is considered the results of the
collaboration between Freie University of Berlin,
Open Link Software and the University of Leipzig
which aims to represent Wikipedia content in a
structured from, such as RDF triples. DBpedia is
considered the main web hub that links different
datasets to each other [2][3][4]. The availability of
DBpedia in local languages is expected to open the
door for semantic applications to be developed to
discover new knowledge from different resources,
and facilitates the making of sophisticated queries;
which is not available in Wikipedia. DBpedia works
under the license of the Creative Commons
Attribution-Share Alike License [5] and GNU Free
Documentation License [6 ].At the time of writing,
there are 18 localized DBpedia chapters, including
English, German, French, Greek, Basque, Czech,
Dutch, Polish, Swedish, Spanish, Italian,
Portuguese, Russian, Ukrainian, Korean, Japanese,
Indonesian and Esperanto [7]. On the other hand,
there are 279 active Wikipedia editions [8] which
refer to different languages available World Wide,
while the number of available DBpedia chapters did
not exceed 6.45% of the total number of Wikipedia
editions. DBpedia faces obstacles preventing it from
the sustainable operation, including the lack of
multilingualism and documentation. Each language
has its characteristics and its way of mapping. There
are still many languages that do not have DBpedia
chapters up till now, for example, the Arabic
Chapter was not there before the work we did and
explained in this paper. However, there are around
135,610,819 [9][10] Arabian internet-users World
Wide, which are around 4.8% of users all over the
world.
The paper on hands describes the work we did in the
mapping and the implementation of the Arabic
Chapter. This work is considered the road-map to
the Arabic Chapter of DBpedia, explaining how to
deal with different difficulties and problems that we
met on mapping, describing the best practices and
our methodology of mappings that increased the
number of mapped templates and properties in the
Arabic Chapter and became available in the
DBpedia mapping race with our individual efforts.
This paper is organized as follows: Section 2
discusses the related work; Section 3 overviews the
Wikipedia Project and the DBpedia project. While,
Section 4 highlights the DBpedia Extraction
Framework. On the other hand, Section 5 describes
the steps needed to build the Arabic Chapter of
DBpedia, highlighting difficulties that we met and
how we solved them. Also, this describes our
mapping methodology and the conditions needed to
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 115
do SPARQL queries. Section 6 shows our mapping
results. Finally, section 7 concludes the paper and
discusses the future work that we can do.
2 RELATED WORK
DBpedia internationalization efforts just began a
few years ago aiming to participate in finding
solutions for the short comes of multilingualism in
DBpedia. There is no much work done in this
direction, and this is an open research area. On the
other hand, Kontokostas and Hellman [11]
described the main problems faced in the
multilingual chapters of DBpedia, such as lack of
tools that support multilingualism, in addition to the
lack of documentation and tutorials that help
contribution of the community [11]. An attempt to
the mapping of multilingual Wikipedia infoboxes to
DBpedia ontology was discussed in [12] but it does
not fill the needs of the mapping of the Arabic
Language. Also, Xing Niu and colleagues tried to
put strategies to the mappings of the Chinese
DBpedia but it doesn't match fully with the Arabic
mappings according to the different characteristics
of the two languages, especially in the punctuation
[13]. Also, Alessio Palmero made an attempt to
automate the mapping of multilingual DBpedia, but
he did not prove his work to be used in the DBpedia
community and did not consider the Arabic
language in his work [14].The first time that Arabic
language being discussed in DBpedia was in [15]
which highlighted the weakness and difficulties that
face the Arabic Language in the domain.
3 Wikipedia & DBpedia
3.1 Wikipedia
Wikipedia is considered a great source of data that
became the 7th most visited website all over the
world [16]. It's an open source, multilingual, online
encyclopedia in different domains covering many
topics that authored, edited and published by
volunteers worldwide and has a total number of
articles around 35,579,919 [8] written using wikitext
which is considered a lightweight markup language
[17]. Wikipedia contains inter-language links that
connect articles in different languages [18], but this
does not mean that any change in one language will
affect others; each language has to maintain its
content. The Arabic edition of Wikipedia is ranked
as the 22nd
largest edition of Wikipedia by 350,000
articles [8][19] which is considered 0.98% of the
total number of articles in Wikipedia. The
Wikipedia article begins with a short paragraph
describing the topic. On the other hand, there are
some pages that contain infoboxes. Infobox is
considered a tabular form placed in the upper right
corner of the Wikipedia article in the Arabic edition
enclosed by {{ }} operators and it contains
attributes with their values as pairs that can be
written in Arabic or English, but it is preferable to
be drafted in Arabic. Not all Wikipedia pages have
infoboxes while most of Wikipedia pages have a
reference template that includes the description of
an infobox and its different attributes, Fig.1 shows
an Arabic infobox of Cairo city on the Wikipedia
page.
Fig. 1. The Arabic Wikipedia page infobox for the Cairo city in Egypt
On the other hand, Wikipedia has some weakness
points, such as being unstructured and it does not
support any further searching capabilities rather than
textual searching. Also, it is not capable of making
of sophisticated queries.
3.2 DBpedia
DBpedia gains its publicity according to a large
number of resources and datasets linked to it
forming the Linked Open Data (LOD) [4]. DBpedia
version available at the time of writing of this paper
is DBpedia 2014 releasing that contains 38 million
things with 3 billion RDF triples and 685 classes
[20] in addition to 1079 object proprieties and 1600
data type proprieties [21] that represents the
relationship between two concepts or between a
concept and its value.
The English Chapter of DBpedia is considered the
most completed chapter till now that we use in
mapping to the DBpeida ontology. It describes 4.58
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 116
million entities including 1,455.000 persons,
241.000 organizations 735,000 places and 411.000
creative works that include films, music albums, and
video games in addition to 131.2 million fact
assertion derived from infoboxes and 168.5 million
triples representing Wikipedia structure [22]
[21].DBpedia ontology-based mainly on OWL [23]
which is considered the backbone of the Semantic
Web. In addition to OWL, there are other schemes
used in DBpedia, such as Wikipedia Categories,
Yago classes [24], the Upper Mapping and Binding
Exchange Layer (UMBEL) [25]. Also, there are
other vocabularies used to describe different
attributes and their values, such as FOAF[26],
SKOS[27], DC[28].
4. DBpedia Extraction Framework
The main goal of DBpedia is to extract the
different attributes represented in Wikipedia
infoboxes to form a structured linked knowledge
graph that facilitate the searching of sophisticated
queries and can easily be used in various
applications [29]. There are two main ways of
Wikipedia gathering data, time is playing an
important role in their classifications; the first one is
the Dump Based, and the second one is the Live
Extraction [30][31][32]. Both of them use the
Extraction Manger [33][34] that manages the
collecting and parsing of Wikipedia articles and
converting their values into different units, data
types and Geo-coordinates [30][31]. While DBpedia
Dump depends on Wikipedia local dumps that are
updated monthly in the form of SQL that are
converted to a triple form in N-triple Serializer or
Virtuoso triplestore [32]. On the other hand, the live
extraction depends on some extractors such as the
Labels Extractor, the Abstract Extractor and others
that will be explained later on. These extractors are
applied to the Wikipedia page and extract the
triples. These triples will be placed in the N-Triple
Dumps, in addition to the Virtuoso, which is used as
a SPARQL endpoint [30][31][35][36]. Any changes
on Wikipedia will instantly be updated in DBpedia
according to the access given from Wikipedia to
DBpedia via Open Archives Initiative Protocol for
Metadata Harvesting (OAI-PMH) [37][38] as shown
in Fig.2.
Fig. 2. DBpedia Extraction Framework
On the other hand, there are two main methods of
infobox extraction. These are Mapping Based
Infobox and Generic Infobox Extraction. Mapping
Based Infobox extraction method used to collect and
map most of the common attributes in Wikipedia to
properties in the DBpedia ontology. This method is
used to cover the shortcomings of the Generic
Infobox Extraction which extract all attributes and
their values via concatenating the title of an article
in Wikipedia with the attribute as a property in a
namespace, such as http://dbpedia.org/property and
the value of that attribute in the form of subject,
predicate and object [31]. DBpedia Information
Extraction Framework (DIEF) added a great value
to the extraction process for the multilingual content
[2][39].The main extractors [29 ][31] [32][39]used
in DBpedia are:
- Abstract Extractor: responsible for
extracting the first paragraph in Wikipedia
article and placed on the DBpedia property
dbpedia–owl: abstract [35]
- Infoboxes Extractor: responsible for the
extraction of properties of all infoboxes.
- Geo-Coordinates Extractor: responsible
for the extraction of Geo-coordinates placed
on Wikipedia pages inside the infobox or in
the upper left hand side of the English
edition and in the upper right hand side in
the Arabic edition of Wikipedia.
- Page Links Extractor: responsible for the
extraction of links between Wikipedia
articles.
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 117
- Label Extractor: responsible for replacing
the article title in Wikipedia in rdfs: label in
DBpedia
- Homepage Extractor: extracts the website
of the main entity of that article and replace
it in foaf: homepage[31 ].
- Interlingua Links Extractor: responsible
for the linking of an article in one language
to other languages representing the same
article.
- Image Extractor: responsible for the
extraction of the image of the wiki page and
replace it in the property foaf: depiction[31]
- Page ID Extractor: every page in
Wikipedia has an ID that is an integer
number to identify this page, and the value
of the page id will be placed on dbpedia-
owl:wikiPageID:
- Revision ID Extractor: it is an extractor
that extracts the integer value represents the
revision ID and place it in
dbpedia_owl:wikiPageRevisionID[30 ][31]
- Person Data Extractor: this extractor
responsible for the extraction of a person
data, such as date and place of birth which
can represented in the FOAF vocabulary
such as foaf:birthDate or in Dbpedia
vocabulary,such as dbpedia-
owl:birthDate[39]
- Wiki Page Extractor: responsible for the
extraction of links to corresponding pages
or articles in Wikipedia
- Disambiguation Extractor: this extractor
works with homonyms words that refer to
different web resources via dbpedia-
owl:wikiPageDisambiguates
- Redirects Extractor: this extractor used to
redirect links between articles for synonyms
words, for example, the name of the famous
Egyptian actor Adel Imam dbpedia:
Adel_Imam is redirected to dbpedia:
Adel_Emam for representing the same
resource [31].
- Article Category Extractor: according to
the nature of the topic, is categorized. This
categorization in DBpedia is represented via
Dublin Core [28] vocabulary especially
dc:subject and SKOS vocabulary [27].
5. Arabic Chapter
Arabic language is considered one of the non-ASCII
languages, for that reason we use the International
Resource Identifier (IRI) to identify uniquely
different resources as a complementary to the
Uniform Resource Identifier (URI) which is widely
used on the SemanticWeb [13] [40].When the IRI
used in the address bar of the browser to identify a
resource, may special characters appear in the
address, such as
http://ar.dbpedia.org/resource/%D%A%d0%d1
which is considered unclear representation for that
resource, and for that reason it is better to use one of
the plugins to avoid this problem. In addition, there
are main namespaces that are used to define
different players in the Arabic Chapter of DBpedia
such as:
http://dbpedia.org/resource which represents
the data extracted from an article according
to the mapping of an infobox and has a
prefix of dbpedia:
http://dbpedia.org/ontology represents the
DBpedia ontology and has a prefix of
dbpedia–owl:
http://dbpedia.org/property/ represents
properties extracted from Wikipedia
templates as attributes and has a prefix of
dbpprop: [32]
5.1 Mapping Difficulties
One of the objectives of the DBpedia ontology is to
structure Wikipedia data in a standard way and
reduce redundancy, but this is not the case now.
DBpedia Ontology is getting unclear, and standards
are going lost [41].This is one of the reasons that
forces us to put a methodology for Arabic DBpedia
mapping that can enhance the work on this chapter
to get more mapped classes, more contributions, and
more applications. DBpedia mapping is considered
one of the main players in the DBpedia stage and
can be one of the reasons for the success or the
failure of a chapter. Different extractors depend
mainly on mapping, so the wrong mapping can
cause loss of data, from this point of view, the
importance of mapping occurs. According to the
importance of the mapping process in DBpedia[42]
and because of our work is considered the first
contribution in the deployment of the Arabic
Chapter of DBpedia[15]; we will explain the main
participants in mappings and their relations with
each other. Before we go deeply in our methodology
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 118
of mapping, we should discuss first the problems
that we face on mapping of the Wikipedia templates
and attributes to the DBpedia classes and proprieties
in the Arabic Chapter.
Infobox template is one of the key successes of
DBpedia. Unfortunately, not all Wikipedia
contributors know the importance of an infobox and
for that reasons are not included it in their articles.
For example, only four pages out of 50 of the
Egyptian museums on Wikipedia have infoboxes
which is relatively a small number according to the
importance and the value of these pages in the
cultural heritage. Wikipedia infoboxes may miss
according to different reasons, one of them is may
the writer of an article does not know the
importance of the infobox or does not know how to
use it [15].
On the other hand, English Chapter is considered
the most completed and the reference chapter that
we map to. Unfortunately changing in one of the
classes that we map to in the English Chapter may
cause changing the results of the retrieved data
when querying from another chapter of DBpedia. So
we should be sure from the availability of the class
we map to in the English edition, and we should
make maintenance for the mapping day after day.
For example, Cairo page in Wikipedia has an
infobox in the English Chapter called settlement, as
shown in Fig.3 While in the Arabic Chapter, is
called "تجمع سكاني", and in the Deutsch is ''ort''
Fig. 3. Settlement Infobox in the English Chapter for Cairo
These infoboxes mapped to the class settlement that
has attributes, such as areaTotal, areaLand, and
areaWater. Nowadays, the class settlement is
mapped to the top class PopulatedPlace in the
DBpedia ontology that affects the chapters that
depend on the English Chapter of DBpedia and its
ontology. Thus, we mapped this infobox again in the
Arabic Chapter to the class PopulatedPlace in the
DBpedia ontology to overcome this short comes.
One of the problems also found while mapping is
the naming of an infobox by a name that does not
reflect the page content correctly or have an overlap
with another infobox, for example "متحف عابدٌه" that
refers to Abdeen Museum in Egypt has an infobox
that has a name "مبنى أثري" that has different
attributes than the museum infobox, so when it is
mapped to the DBpedia ontology, it will make a
confusion. For that reason, we changed to تحف" "م to
be corresponding to the museum infobox and
mapped it to the class museum in the DBpedia
ontology. Also, dissimilarity of attributes between
corresponding infoboxes create inconsistency, for
example, the Album infobox that is mapped to the
class Album in DBpedia ontology differs from one
chapter to another. The French Chapter has only six
attributes mapped while 16 attributes are mapped in
the Arabic Chapter. On the other hand, an attribute
may appear in one of the chapters and is not
available on another, for example, there is no
corresponding position for the Chancellor in the
universities in the Arab region and for that reason it
is not included in the mapping of the University
infobox in the Arabic chapter that is represented as
mappings.dbpedia.org/index.php/mapping-ar:
Thus, one of the reasons for the ."معلوماث_جامعت"
unavailability of an attribute in an infobox may go
back to several reasons, one of them is the author of
that article do not know it's value or may there is no
meaning of that attribute in the context of that
article. Another reason may go back to the wrong
selection of an attribute name in Wikipedia
template. Also, appearance of different attributes in
the Arabic language refers to the same meaning but
in different syntax is considered one of the Arabic
mapping problems such as"ولد فً" , "تارٌخ المٍالد", "
as shown in Fig.4 which "المولد " and "المٍالد
represents this problem in different infoboxes for
public figures in the Arab region.
Fig. 4. Different syntax for the birthdate attribute in different infoboxes
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 119
This problem can be solved by the selection of one
of these attributes and using it to replace other
attributes refer to the same meaning in the Arabic
Wikipedia. Another problem occurs in the Arabic
mapping is the dissimilarity between the infobox
and the templates that refer to, such as the infobox
of "معلوماث جامعت" and its template that is called
In addition, there are different names for ."جامعت"
infoboxes refer to the same thing and have different
attributes which cause inconsistency such as
, ""معلوماث فنان موسٍقى" "صندوق معلوماث فنان موسٍقى""
"موسٍقى ن"فنا that refers to the infobox
“MusicalArtist” in the English Chapter, so to
overcome these shortcomes we should use only one
infobox for representing the same topic.
5.2 Mapping Methodology
The infobox is considered the main player in the
DBpedia mappings in general not only in the Arabic
Chapter as we discussed before. For each article in
Wikipedia, there is an infobox template that covers
the topic may or may not being available in the
article. The template includes different attributes
that are mapped later to properties in the DBpedia
ontology. Part of these attributes differs from one
language to another, according to the creator of the
template and the culture of that language, but most
of them should be the same. Our Arabic Chapter’s
Mapping Methodology (ACMM) is described as
shown in Fig.5, addition to the clear examples
illustrated in this section. ACMM is organized in
eight sequential steps as follows:
1-Creation of an Infobox: if the infobox is not
available for an article in the Wikipedia and we
want to map this article. The selection of the closely
suitable infobox to the article is considered one of
the key successes of the mapping of Wikipedia
infoboxes to DBpedia ontology. For example, there
is an article about Adel Emam, the famous Arabian
actor, so it is better to create an actor infobox for
that article if it is not available instead of a Person
infobox.
2-Check the Mapping Statistics: on this step we
should check first if this infobox is mapped before
or not, and this can be discovered in the Arabic
mappings section inside the DBpedia mappings.
3- Search for the Suitable Class: If the infobox is not
mapped to one of the ontology classes, we look at
the English Wikipedia and DBpedia ontology to
find the most appropriate class that we can map to.
4- Extract the Mapping of the Equivalent Class: at
this stage, the equivalent infobox in the English
Chapter that mapped before to the DBpedia
ontology is considered the reference infobox to ours
in the Arabic Chapter.
5-Mapping: on this step we map the attributes of the
Arabic infobox to the English one and their
corresponding on the DBpedia ontology. However,
before mapping, if there is no any class presents this
infobox in the Arabic mapping, we should create
this class first.
6-Revision: make a revision for that mapping to
ensure that we mapped correctly.
7-Validation: on this step we validate and test the
mapping to correct the inconsistencies may found.
8-Publishing: finally the mapped infobox is now
ready to be published
Fig. 5. Flowchart describing the ACMM
On the other hand, let’s show by examples our
methodology on an article about Al-Ahram daily
newspaper which is considered one of the famous
daily newspapers in the Arab region, and which has
a template called newspaper that can be visited
through
https://en.wikipedia.org/wiki/Template:Infobox_ne
wspaper for the English version and via the URI
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 120
https://ar.wikipedia.org/wiki/ ةقالب:صندوق_معلوماث_جرٌد
for the Arabic version as shown in Fig.6
Fig. 6. Newspaper template in both Arabic & English Wikipedia
What we see in Fig.6 is the template that is
considered the reference for the infobox. Each
article has a template, and if it is not there, we can
create it if we have a Wikipedia account. These two
templates fulfilled with the information known to
the author of an infobox or by other Wikipedia
contributors. If the author of an infobox does not
know the value of an attribute, he can left it blanks
till another one knows its value come and fill the
space. Fig.7 and Fig.8 show Al-Ahram infobox in
Arabic and English after the fulfillment of their
attributes.
Fig. 7. Infobox newspaper after fulfilled with
Fig. 8. Infobox معلوماث جرٌدة after fulfilled with data
After the creation of an infobox and filling it with
data, it is validated by the Wikipedia Community
and appears on the article page as shown in Fig. 9
Fig. 9. Infobox newspaper on the article page in Wikipedia Arabic &
English as appear on the page article
Mapping statistics as we discussed later give an
overview about the mapped infoboxes and
unmapped infoboxes as shown in Fig.10. After
being sure for the unavailability of that infobox, we
create this infobox from the domain of our language
that is in our case is the Arabic Language, referring
to the infobox that mapped before in English
Chapter to the DBpedia ontology as shown in Fig.11
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 121
Fig. 10. Mapping Statistics for the Arabic Chapter of DBpedia
Fig. 11. Part of the Infoboxes mapped in the Arabic Chapter of DBpedia
As shown in Fig.12, we use the template mapping to
map each attribute in the infobox to its
corresponding property in the DBpedia ontology. If
we do not find a corresponding property, we can
create it and specify its domain and range, and then
use it and the results is the mapping page after
verification as shown in Fig.13 for Al-Ahram
newspaper contains its data with different
vocabularies.
Fig. 12. The mapping template for the newspaper infobox
Fig. 13. Al-Ahram page on DBpedia
5.3 SPARQL Endpoint
After the verification of the mapped classes, it is
supposed that we can make queries, but this is not
the case. Before the deployment of the SPARQL
endpoint we cannot make any queries on our chapter
of DBpedia. SPARQL is one of the building blocks
of DBpedia. It is a protocol, and RDF query became
a recommendation in January 2008 as SPARQL 1.0
[43] while SPARQL 1.1 [44]became a
recommendation in March 2013. SPARQL is used
to retrieve triples and make sophisticated queries.
Virtuoso is the SPARQL endpoint that we use. The
infrastructure that given to the SPARQL endpoint
for the Arabic Chapter is Intel Core I5 with 4 GBs
memory and 500 GBs hard disks with Ubuntu 12.04
Linux Operating System. Fig.14 shows a SPARQL
query retrieve the foaf name for Al-Ahram
newspaper from the SPARQL endpoint.
Fig. 14. A SPARQL query on Al-Ahram page on DBpedia
6. Results
Our work on the Arabic DBpedia is considered the
first contribution of that chapter of DBpedia. We
established the namespace ''ar'' and tried to find a
methodology that can increase the mapping of
templates and properties in the Arabic Chapter.
When we established the Arabic Chapter, the
template occurrences were zero as in Fig.15
Fig. 15. Template occurrences were zero on DBpedia when we
established the Arabic Chapter
After the deployment of our methodology of
mapping, the number of mapped classes in the
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 122
Arabic Chapter became 52 mappings from
Wikipedia templates to DBpedia classes out of 1234
[21]. In addition, the number of template
occurrences mapped from the Arabic Wikipedia to
the Arabic Chapter of DBpeida increased from zero
to 59087 out of 208913, which is considered
28.28% of the total templates in the Arabic
Wikipedia. Also, the occurrences of properties in
the Arabic Wikipedia increased from zero to 342231
out of 2817139 which represents 12.15% of a total
number of occurrences in the Arabic Chapter of
DBpedia. Mapping of the Arabic Chapter up till
now is shown in Fig.16
Fig. 16. Occurrences percentage after mapping for classes, templates
and properties on Arabic Chapter of DBpedia
7. Conclusions and Future Work
DBpedia is considered one of the hot topics
nowadays because of its importance to the web of
the Linked Open Data. Unfortunately, the Arabic
was not included before the work done by the author
of this work; however of the importance of the
Arabic language according to the enormous number
of internet users worldwide. The paper on hands
described the creation and mapping of the Arabic
Chapter shows the increasing on the number of
mapped templates and attributes from Wikipedia to
the classes and properties in the DBpedia ontology
after the deployment of our mapping methodology.
Our Future work will concern the quality of data.
Also, we will try to find solutions for fully automate
the mapping process of the Arabic DBpedia. References:
[1] C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C.
Becker, R. Cyganiak and S. Hellmann,
'DBpedia - A crystallization point for the Web
of Data', Web Semantics: Science, Services and
Agents on the World Wide Web, vol. 7, no. 3,
pp. 154-165, 2009.
[2] D. Kontokostas, C. Bratsas, S. Auer, S.
Hellmann, I. Antoniou and G. Metakides,
'Internationalization of Linked Data: The case of
the Greek DBpedia edition', Web Semantics:
Science, Services and Agents on the World Wide
Web, vol. 15, pp. 51-61, 2012.
[3] C. Bizer, T. Heath and T. Berners-Lee, 'Linked
Data - The Story So Far', International Journal
on Semantic Web and Information Systems, vol.
5, no. 3, pp. 1-22, 2009.
[4] Y. Yamamoto, A. Yamaguchi and A. Yonezawa,
'Building Linked Open Data towards integration
of biomedical scientific literature with
DBpedia', J Biomed Sem, vol. 4, no. 1, p. 8,
2013.
[5] Creativecommons.org, 'Creative Commons',
2015.[Online].Available: https://creativecomm
ons.org . [Accessed: 25- Jul- 2015].
[6] Gnu.org, 'GNU Free Documentation License
v1.3- GNU Project - Free Software Foundation',
2015.[Online].Available:http://www.gnu.org/
licenses/fdl-1.3.en.html.[Accessed:21-Jul-
2015].
[7]Wiki.dbpedia.org, 'Language Chapters DBpedia',
2015.[Online].Available: http://wiki.dbpedia.
org/about/language-chapters.[Accessed: 22- Jul-
2015].
[8] Wikipedia, 'List of Wikipedias', 2015. [Online].
Available:https://en.wikipedia.org/wiki/List_of
Wikipedias. [Accessed: 18- Jul- 2015].
[9] Wikipedia, 'Languages used on the Internet',
2015.[Online].Available: https://en.wikipedia
.org/wiki/Languages_used_on_the_Internet
[Accessed: 18- Jul- 2015].
[10]Internetworldstats.com, 'Arabic Speaking
Internet Users and Population Statistics', 2015.
[Online].Available: http://www.internetworldsta
ts.com/ stats19.htm. [Accessed: 27- Jul- 2015].
[11] D. kontokastas and S. Hellman, The DBpedia
data stack. Towards a Sustainable DBpedia
Project provide a public data infrastructure for
Europe,[Online].Available: http://svn.aksw.org/
papers/2014/EDF_DBpediaDataStack/pub
lic.pdf. [Accessed: 15- June- 2015].
[12] C. Bratsas, L. Ioannidis, D. Kontokostas, S.
Auer, C. Bizer, S. Hellmann and I. Antoniou,
'DBpedia internationalization-a graphical tool
for I18n infobox-to-ontology mappings', in
International Semantic Web Conference Demo
(ISWC2011 Demo), Bonn, Germany, 2011.
[13] X. Niu, X. Sun, H. Wang, S. Rong, G. Qi and
Y. Yu, Zhishi.Me - Weaving Chinese Linking
Open Data, in The Semantic Web -- ISWC 2011:
10th International Semantic Web Conference,
Bonn, Germany, 2011
05
1015202530
Acc
ure
nce
s (%
)
Categories
Classes Mapped
Templates Mapped
Properties Mapped
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 123
[14] A. Palmero Aprosio, 'Extending Linked open
data resources exploiting Wikipedia as a source
of information', Ph.D., Universit`a degli Studi di
Milano, 2012.
[15] H. Al-Feel, 'A Step towards the Arabic
DBpedia', International Journal of Computer
Applications, vol. 80, no. 3, pp. 27-33, 2013.
[16] Alexa.com, 'Alexa Top 500 Global Sites', 2015.
[Online].Available:http://www.alexa.com/to
psites . [Accessed: 27- Jul- 2015].
[17] S. Auer and J. Lehmann, "What Have
Innsbruck and Leipzig in Common? Extracting
Semantics from Wiki Content," in THE 4TH
EUROPEAN SEMANTIC WEB CONFERENCE (ESWC
2007) Innsbruck, Austria,2007, pp. 503-517.
[18]L. Zhang, A. Rettinger and S. Thoma, 'Bridging
the Gap between Cross-lingual NLP and
DBpedia by Exploiting Wikipedia', 2014.
[Online].Available: http://www.aifb.kit.edu/im
ages/4/43/Nlp%26dbpedia.pdf. [Accessed: 27-
Jul- 2015].
[19] Wikipedia, 'Arabic Wikipedia', 2015. [Online].
Available:https://en.wikipedia.org/wiki/Arabic_
Wikipedia. [Accessed: 27- Jul- 2015].
[20]'DBpedia Statistics', 2015. [Online]. Available:
http://Wiki.dpedia.org/Datasets2014/dataset/Stat
istics. [Accessed: 27- Jul- 2015].
[21] Blog.dbpedia.org, 'DBpedia Version 2014
released|DBpedia Blog', 2014. [Online].
Available:http://blog.dbpedia.org/?p=77.
[Accessed: 8- Jul- 2015].
[22]D. Kontokostas, 'The past, present & future of
DBpedia', in 18th International Conference on
Business Information Systems(BIS), Poznan,
Poland, 2015.
[23] W3.org, 'OWL 2 Web Ontology Language
Document Overview (Second Edition)', 2015.
[Online].Available: http://www.w3.org/TR/owl
2-overview/. [Accessed: 27- Jul- 2015].
[24]F. Suchanek, G. Kasneci and G. Weikum,
'YAGO: A Core of Semantic Knowledge
Unifying WordNet and Wikipedia', in
Proceedings of the Sixteenth International
World Wide Web Conference (WWW2007),
Banff, Alberta, CANADA, 2015, pp. 697-702.
[25] Umbel.org, 'UMBEL: Upper Mapping and
Binding Exchange Layer', 2015. [Online].
Available: http://www.umbel.org/. [Accessed:
23- Jul- 2015].
[26] Xmlns.com, 'FOAF Vocabulary Specification',
2014.[Online].Available:http://xmlns.com/f
oaf/spec/. [Accessed: 17- Jul- 2015].
[27]A. Miles and S. Bechhofer, 'SKOS Simple
Knowledge Organization System Namespace
Document 30 July 2008 "Last Call" Edition',
W3.org,2008.[Online].Available: http://www
.w3.org/TR/2008/WD-skos-reference-200808
29/skos.html. [Accessed: 11- Jul- 2015].
[28] Dublincore.org, 'DCMI Home: Dublin Core®
Metadata Initiative (DCMI)', 2014. [Online].
Available: http://dublincore.org/. [Accessed: 27-
Jul- 2015].
[29]S. Hellmann, C. Stadler, J. Lehmann and S.
Auer, 'DBpedia Live Extraction', in On the
Move to Meaningful Internet Systems: OTM,
Vilamoura, Portugal, 2009, pp. 1209 – 1223.
[30]M. Morsey, J. Lehmann, S. Auer, C. Stadler and
S. Hellmann, 'DBpedia and the live extraction of
structured data from Wikipedia', Program:
electronic library and information systems, vol.
46, no. 2, pp. 157-181, 2012.
[31]C. Bizer, J. Lehmann, G. Kobilarov, S. Auer, C.
Becker, R. Cyganiak and S. Hellmann,
'DBpedia - A crystallization point for the Web
of Data', Web Semantics: Science, Services and
Agents on the World Wide Web, vol. 7, no. 3,
pp. 154-165, 2009.
[32]J. Lehmann, R. Isele, M. Jakob, A. Jentzsch, D.
Kontokostas, P. Mendes, S. Hellmann, M.
Morsey, P. van Kleef, S. Auer and C. Bizer,
'DBpedia-a large-scale, multilingual knowledge
base extracted from Wikipedia'', Semantic Web
Journal, vol. 5, pp. 1-29, 2014.
[33]S. Hellmann, 'DBpedia Extraction of
Knowledge from Wikipedia', http://lod2.eu,
2011. [Online].Available:
http://semanticweb.kaist.ac.kr/
workshop2011/presentation/3_dbpedia_sebastia
n.pdf. [Accessed: 20- Jul- 2015].
[34]D. Lange, C. Böhm and F. Naumann,
'Extracting Structured Information from
Wikipedia Articles to Populate Infoboxes', in
19th ACM international conference on
Information and knowledge management, 2010,
pp. 1661-1664.
[35]'LOD2 - Creating Knowledge out of Interlinked
Data',2010.[Online].Available:
http://cordis.europa.eu/docs/projects/ cnect/3/2
57943/080/deliverables/001-LOD2D322DBp
ediaLiveExtraction.pdf. [Accessed: 10- Jul-
2015].
[36] Wiki.dbpedia.org, 'Data Set 2014 | DBpedia',
2015.[Online].Available:
http://wiki.dbpedia.org/data-set-2014.
[Accessed: 6- Jul- 2015].
[37] Openarchives.org, 'Open Archives Initiative -
Protocol for Metadata Harvesting - v.2.0', 2015.
[Online].Available: http://www.openarchives
.org/OAI/2.0/openarchivesprotocol.htm.
[Accessed: 2-- Jul- 2015].
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 124
[38]S. Hamburger, 'Using the open archives
initiative protocol for metadata harvesting',
Library Collections, Acquisitions, and
Technical Services, vol. 32, no. 2, p. 114, 2008.
[39]A. Ismayilov, D. Kontokostas, S. Auer, J.
Lehmann and S. Hellmann, Wikidata through
the Eyes of DBpedia, 1st ed. 2015.
[40] W3.org, 'Internationalized Resource Identifiers
(IRIs)', 2011. [Online]. Available:
http://www.w3.org/International/O-URL-and-
ident.html. [Accessed: 15- June - 2015].
[41]'Mapping Guide', 2015. [Online]. Available:
http://mappings.dbpedia.org/index.php/map
ing_guide. [Accessed: 10- June - 2015].
[42]D. Kontokostas, DBpedia Mappings Crowd-
sourcing, 1st ed. AKSW, Universität Leipzig,
2014, pp. 2-23.
[43] W3.org, 'SPARQL Query Language for RDF
1.0',2008.[Online].Available:http://www.w3
.org/TR/rdf-sparql-query/. [Accessed: 27- Jul-
2015].
[44] W3.org, 'SPARQL 1.1 Query Language', 2013.
[Online].Available: http://www.w3.org/TR/spa
rql11-query/. [Accessed: 10- June- 2015].
Mathematical and Computational Methods in Electrical Engineering
ISBN: 978-1-61804-329-0 125
top related