
  • 8/14/2019 Web Search Systems


    Chapter 2

    BACKGROUND AND LITERATURE REVIEW

The Web's information resources are growing explosively in number and volume, but retrieving relevant information from them is becoming more difficult and time-consuming. The way Web information resources are stored is considered a root cause of irrelevant retrieval, because they are not stored in a machine-understandable organization. It has been estimated [12] that nearly 48 to 63 percent of retrieved results are irrelevant; that is, relevant results lie in the range of 37 to 52 percent, as shown in Table 2.1, which is far from accurate. An extension of the current Web, known as the Semantic Web, has been envisioned to organize Web content through ontologies in order to make it machine-understandable. To support the theme of the Semantic Web, there is a crucial need for techniques for designing, developing, populating and integrating ontologies.

2.1 Semantic Web vs. Current Web

The Semantic Web may be compared with the non-semantic Web along several parameters, such as content, conceptual perception, scope, environment and resource utilization.

(a) Content: The Semantic Web encompasses actual content along with its formal semantics. Here, the formal semantics are machine-understandable descriptions of content, expressed in logic-based languages such as the Web Ontology Language (OWL) recommended by the W3C. Through the formal semantics of content, computers can make inferences about data, i.e., understand what a data resource is and how it relates to other data. In today's Web there are no formal semantics for existing content; the content is machine-readable but not machine-understandable.


Table 2.1: Relevant results (from top 20 results) of some search engines

    Search Engine Precision

    Yahoo 0.52

    Google 0.48

    MSN 0.37

    Ask 0.44

    Seekport 0.37

(b) Conceptual Perception: The current Web is like a book of multiple hyperlinked documents. In the book scenario, an index of keywords is provided, but the contexts in which those keywords are used are missing from the index; that is, the index carries no formal semantics for its keywords. To check which occurrence is relevant, we have to read the corresponding pages of the book. The same is the case with the current Web. In the Semantic Web this limitation is eliminated via ontologies, where data is given well-defined meanings understandable by machines.

(c) Scope: The literature indicates that the inaccessible part of the Web is about five hundred times larger than the accessible one [13]. It is estimated that billions of pages of information are available on the Web, and only a few of them can be reached via traditional search engines. In the Semantic Web, formal semantics of data are available via ontologies, and these ontologies, the essential component of the Semantic Web, are accessible to semantic search engines.

(d) Environment: The Semantic Web is a web of ontologies holding data with formal meanings, in contrast to the current Web, which contains virtually boundless information in the form of documents. The Semantic Web offers data as well as documents, which machines can process, transform, assemble, and even act on in useful ways.


(e) Resource Utilization: There are many Web resources that could be very useful in our everyday activities. In the current Web it is difficult to locate them, because they are not properly annotated with machine-understandable metadata. The Semantic Web will form a network of related resources, making them very easy to locate and use. There are further criteria for comparing the current Web and the Semantic Web. For example, information searching, accessing, extracting, interpreting and processing on the Semantic Web will be easier and more efficient; the Semantic Web will have inference or reasoning capability; network and communication cost will be reduced thanks to more relevant results; and many more. Some are listed in Table 2.2.

Table 2.2: Semantic vs. current Web

Sr. No | Factor | (Non-Semantic) Web | Semantic Web
1. | Conceptual perception | Large hyperlinked book | Large interlinked database
2. | Content | No formal meanings | Formally defined
3. | Scope | Limited, invisible web probably excluded | Boundless, invisible web probably included
4. | Environment | Web of documents | Web of ontologies, data and documents
5. | Resource utilization | Minimum-normal | Maximum
6. | Inference/reasoning capability | No | Yes
7. | Knowledge management application support | No | Yes
8. | Information searching, accessing, extracting | Difficult and time-consuming | Easy and efficient
9. | Timeliness, accuracy, transparency of information | Less | More
10. | Semantic heterogeneity | More | Less
11. | Ingredients | Content, presentation | Content, presentation, formal semantics
12. | Text simplification and clarification | No | Yes

According to the theme of the Semantic Web, if the explicit semantics of Web resources are put together with their linguistic semantics and represented in logic-based languages, then we can overcome the limitations of the current Web. To support this theme, the W3C has recommended several standards [14] such as RDF (Resource Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language), SPARQL (a query language for RDF) and GRDDL (Gleaning Resource Descriptions from Dialects of Languages). RDF provides the data model of a Semantic Web application: resources are represented through URIs and connected through labeled edges, which are also represented through URIs. An RDF model can be described through the RDFS language, or through the more powerful OWL; a query language such as SPARQL can be used to query an RDF model. The Semantic Web vision has now become a reality [15, 16]. Several Semantic Web systems have been developed (a subset of these systems is given in Table 2.3), and a huge number of ontology-based Web documents have been published.
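As a small illustration of the RDF data model just described, resources and edge labels can be treated as URIs and a SPARQL-like query as a triple pattern with variables. The following sketch is only an analogy in Python; the URIs and the `match` helper are invented for illustration and are not part of any W3C standard.

```python
# Toy illustration of the RDF data model: triples of URI strings,
# queried with a SPARQL-like triple pattern. Invented example URIs.

triples = [
    ("ex:Farooq", "ex:isFatherOf", "ex:Humza"),
    ("ex:Farooq", "rdf:type", "ex:Person"),
    ("ex:Humza",  "rdf:type", "ex:Person"),
]

def match(pattern, data):
    """Return bindings for variables (strings starting with '?')."""
    results = []
    for triple in data:
        if all(p.startswith("?") or p == t for p, t in zip(pattern, triple)):
            binding = {p: t for p, t in zip(pattern, triple)
                       if p.startswith("?")}
            results.append(binding)
    return results

# Analogue of: SELECT ?child WHERE { ex:Farooq ex:isFatherOf ?child }
children = match(("ex:Farooq", "ex:isFatherOf", "?child"), triples)
```

The variable `?child` binds to every object reachable from `ex:Farooq` over the `ex:isFatherOf` edge, mirroring SPARQL basic graph pattern matching.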


Table 2.3: Examples of Semantic Web Systems [17]

SW System | Country | Activity area | Application area of SWT | SWT used | SW technology benefits
A Digital Music Archive (DMA) for NRK using SW techniques | Norway | Broadcasting | IS, CD & DI | 1, 2, 3 & IHV | IS, INR, S&RD
A Linked Open Data Resource List Management Tool for Undergraduate Students | UK | ELT and publishing | CD, CM, DI and SA | RDF, 3, 1, SKOS & PV | ECR, P, reduced time to market, and S&RD
A Semantic Web Content Repository for Clinical Research | US | HC and PI | DI | 1, 2, 4, 5 and PV | Automation, IM and IS
An Intelligent Search Engine for Online Services for Public Admin. | Spain | PI and eG | Portal and IS | 1 and IHV | ECR, INR and IS
An Ontology of Cantabria's Cultural Heritage | Spain | PI and museum | Portal and DI | 1 and IHV | ECR and IM
Composing Safer Drug Regimens for the Individual Patient using SWT | US | HC | DI and IS | 1, 2, PV and IHV | S&RD, open model, IS and DCG
CRUZAR: An application of semantic matchmaking for eTourism in the city of Zaragoza | Spain | PI and eTourism | Portal and DI | 1, 3, Rules, PV and IHV | Faceted navigation, P and S&RD
Enhancement and Integration of Corporate Social Software Using SW | France | Utilities and energy | DI, CM and SN | 1, 3 and PV | IS, S&RD and INR
Enhancing Content Search Using SW | US | IT industry | Portal and IS | 1 | IS and S&RD
Geographic Referencing Framework | UK | PI, GIS & eG | DI | 1, 2 & IHV | S&RD and automation
Improving the Reliability of Internet Search Results Using Search Thresher | Ireland | Web accessibility | IS | 1 | IS
Improving Web Search Using Metadata | Spain and US | Search | Portal, IS, CD and customization | 1, 2, RDF, 4, PV and IHV | P, open model and S&RD
KDE 4.0 Semantic Desktop Search and Tagging | Germany | Semantic desktop | DI, CD, IS, service integration and SA | 1, 3, 2 and RDFS++ | IS, IM and open model
POPS: NASA's Expertise Location Service Powered by SW Technologies | US | PI | DI and SN | 1 and 3 | Faceted navigation, S&RD and ECR
Prioritization of Biological Targets for Drug Discovery | US | Life sciences | DI and portal | 1, 3, 2, 2DL, PV and IHV | IS, S&RD, IM and ECR
Real Time Suggestion of Related Ideas in the Financial Industry | Spain | Financial | CD | 1 and IHV | IS
S_Ctnt_Desc to improve discovery | UK | Telecom | CD and IS | 1, PV & IHV | S&RD, open model, and IS
Semantic MDR and IR for National Archives | Korea | PI | Portal, CD and SA | 1, 2, Rules, PV and IHV | IS
Semantic Tags | Serbia | IT industry | SA, SN and DI | 1, 3 and PV | INR, IS, S&RD and DCG
SWT for Public Health Awareness | US | HC, PI & eG | DI | 1, 2, PV & IHV | S&RD and IM
Semantic-based Search and Query System for the Traditional Chinese Medicine Community | China | PI and HC | DI, IS and schema mapping | 2, 3 and PV | S&RD and IS
The SW for the Agricultural Domain, Semantic Navigation of Food, Nutrition and Agricultural Journal | Italy | PI and eG | Portal, IS, SA and CD | 1, 2, SKOS, PV and IHV | IS
The Swordfish Metadata Initiative: Better, Faster, Smarter Web Content | US | IT industry | Portal and DI | 1 and IHV | DCG and ECR
Twine | US | IT industry | SA, SN and DI | RDF, 1, 2, 3 | INR, IS, S&RD and DCG
Use of SWT in Natural Language Interface to Business Applications | India | IT industry | Natural language interface | 1, 2, 5, 3 and IHV | IM and ECR

Abbreviations and integers used in the table: SWT (Semantic Web Technologies), IHV (In-House Vocabularies), IS (Improved Search), IM (Incremental Modeling), CD (Content Discovery), DI (Data Integration), INR (Identify New Relationships), S&RD (Share and Reuse Data), PV (Public Vocabularies), ECR (Explicit Content Relationships), SA (Semantic Annotation), SN (Social Network), GIS (Geographic Information System), ELT (Education, Learning Technology), CM (Content Management), DCG (Dynamic Content Generation), P (Personalization), PI (Public Institution), HC (Health Care), eG (eGovernment), 1 (RDFS), 2 (OWL), 3 (SPARQL), 4 (GRDDL), 5 (Rules, N3).


    2.2 The Web Ontology

    Ontology formally provides structural knowledge of a domain and its data in the machine-

    understandable way in some W3C [18] recommended technologies such as RDFS [19] and OWL [20] for

    ontology formalization. It is an essential component of any semantic web application. Basically, it is the

    central component in overall layered architecture of the semantic web, as shown in Figure 2.1. In

    ontologies, the information-resources are connected in such a way that each is uniquely identified by URI

    and new information can be drawn through a reasoning process. Basically, the ontology is a special type of

    network of web-resources. A conceptual view of ontology can be in RDF-graph or in triples form. Consider

    an example of a persons family ontology as shown in Figure 2.2. Now by definition of ontology, it looks a

    special kind of a network of information resources. The only information explicitly given in said ontology

    are isFatherOf, isMotherOf, isBrotherOf, isSisterOf, but we can drive a several types of new information

    such as isSonOf(3,1) (i.e. Humza is the son of Farooq), isHusbandOF(8,9), isWifeOf(9,8),

    isGrandFatherOf(8,3), isGrandmotherOf(9,3) and many more can easily be derived. It can be implemented

    in some logic-based language such as OWL and conceptually it can be seen as triples or RDF-graph.
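The derivations above can be sketched as a small forward-chaining program. This is only a minimal sketch: the individual numbering (1 = Farooq, 3 = Humza, 8 and 9 = the grandparents), the explicit "male" fact and the rules themselves are assumptions made for illustration, not part of Figure 2.2 or of OWL reasoning.

```python
# Forward-chaining sketch of the family derivations described above.
# Facts are tuples; the numbering and the "male" fact are assumptions.

BASE_FACTS = {
    ("isFatherOf", 8, 1), ("isMotherOf", 9, 1),   # grandparents of 3
    ("isFatherOf", 1, 3), ("isMotherOf", 2, 3),   # parents of 3
    ("male", 3),                                  # Humza is male
}

def derive(facts):
    """Apply the rules until no new fact can be added (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        triples = [f for f in facts if len(f) == 3]
        for (p, x, y) in triples:
            if p == "isFatherOf" and ("male", y) in facts:
                new.add(("isSonOf", y, x))              # male child of father
            for (q, u, v) in triples:
                if p == "isFatherOf" and q == "isMotherOf" and v == y:
                    new.add(("isHusbandOf", x, u))      # parents of same child
                    new.add(("isWifeOf", u, x))
                if p == "isFatherOf" and q in ("isFatherOf", "isMotherOf") and u == y:
                    new.add(("isGrandFatherOf", x, v))  # father of a parent
                if p == "isMotherOf" and q in ("isFatherOf", "isMotherOf") and u == y:
                    new.add(("isGrandMotherOf", x, v))  # mother of a parent
        if new <= facts:
            return facts
        facts |= new

derived = derive(BASE_FACTS)
```

Running `derive` yields exactly the derived relations named in the text, e.g. isSonOf(3,1) and isGrandFatherOf(8,3); an OWL reasoner obtains the same conclusions from property axioms rather than hand-written rules.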


    Figure 2.1: A layered model of semantic web [21].


Figure 2.2: A sample slice of a person's family ontology

    2.3 Engineering Ontologies for Semantic Web

We have reviewed the literature concerning the adaptation of existing web development methodologies for semantic web applications. The findings disclose that not much work has been done in this direction. An adaptation of XML web engineering methodologies has been proposed in the form of WEESA [22], which generates semantic annotations by defining a mapping between XML schemas and existing ontologies. It offers no proper guidelines for designing an ontology: WEESA cannot directly design or develop ontologies, since it focuses on mapping XML schema concepts onto the concepts of an existing ontology. In [23], the authors have proposed an extension of the Web Site Design Method, in which object chunk entities are mapped to concepts in the ontology. OOHDM has been extended in the light of semantic web technologies [24]: its primitives for conceptual and navigation models are described as DAML classes, and RDFS is used for the domain vocabulary. The Hera [25] methodology has been extended for adaptive web-based application engineering; it uses semantic web standards for model representation.

The existing web engineering methodologies mentioned above have adapted the use of ontology in their development processes, but their main focus is mapping and annotation using existing ontologies. The design of new ontologies during semantic web application development remains out of focus, and there is no proper guidance on how to design an ontology for a semantic web application.

There are some other methodologies for ontology development, as surveyed in [26, 27, 28]. Mostly these methodologies focus on the specification and formalization of an ontology and do not concentrate on its design phase. They are based on natural language processing (NLP) and machine learning techniques, and their orientation is the facilitation of web agents rather than the formalization of web content. Work on ontology development was boosted when the idea of the Semantic Web was envisioned.

In the KBSI IDEF5 methodology [29], data about the domain is collected and analyzed, and then a build-and-fix strategy is used to create the ontology. Uschold and King [30] proposed an ontology development methodology in which, after identifying the purpose of the ontology, it is captured and then coded. In METHONTOLOGY [31], after preparing the ontology specification, knowledge is acquired and analyzed to determine domain terms such as concepts, relations and properties, and then formalization is started; after that, evaluation and documentation are performed.

In [32] a methodology based on a collaborative approach has been proposed. In its first phase, the design criteria for the ontology, its boundary conditions, and a set of standards for evaluating it are defined. In the second phase, an initial version of the ontology is produced, and the desired ontology is then obtained through an iterative process. A software build-and-fix approach is followed, which leads to heavy development and maintenance cost.

Helena and João [33] proposed a methodology for ontology construction in 2004. This method divides ontology construction into steps such as specification, conceptualization, formalization, implementation and maintenance; knowledge acquisition, evaluation and documentation are performed during each phase. There are some other approaches investigating the transformation of a relational model into an ontological model, in which the ontology is developed from database schemas, mainly using reverse engineering theory. In [34], a methodology is outlined for constructing an ontology from conceptual database schemas using a mapping process, carried out by taking into consideration the logical database model of the target system.

Most of the methodologies reviewed above are based on the build-and-fix approach, in which an initial version of the ontology is built and then improved iteratively until the domain requirements are satisfied. In this way the basic principles of software engineering are not followed properly. These methodologies mainly focus on data during the development process rather than on descriptive knowledge, and they mainly work on the specification and implementation phases while the design phase lacks proper attention. Moreover, the design and implementation phases of these methodologies are difficult to identify and separate.

One of the challenges in real-world applications is to improve access to, and sharing of, knowledge that resides in databases. Domain knowledge to be shared needs to be formalized in a formal ontology language, and extracting and building a web ontology on top of a Relational Database (RDB) is one way to represent domain-specific knowledge. Moreover, information residing in RDBs is highly engineered for accessibility and scalability [35] and is characterized by high quality and a rapidly increasing correspondence to the surface web [36]. In schema mapping techniques, the idea is to convert an RDB schema to an ontology based on predefined schema mapping rules; various research groups have proposed various such techniques.


Automapper [37] is a Semantic Web interface for RDBs that automatically generates the data source ontology and the respective mappings. It is an application-independent tool that uses the RDB schema to create a basic OWL ontology and dynamically produces instance data using that ontology. The translation process relies on a set of data source-to-domain mapping rules, and a processing module named Semantic Bridge for RDB translates queries to produce OWL ontologies. This method quickly exposes the data to the Semantic Web, where a variety of tools and applications are available to support translation, integration, reasoning, and visualization.

The following class descriptions, axioms and restrictions are currently generated by Automapper:

- maxCardinality is set to 1 for all nullable columns and is used for descriptive purposes.
- minCardinality is set to 1 for all non-nullable columns and is used for descriptive purposes.
- All datatype and object properties that represent columns are marked as functional properties. To ensure global uniqueness and class specificity, these columns are given URIs based on concatenating the table and column names.
- An allValuesFrom restriction reflects the datatype or class associated with each column and is used for descriptive purposes.

Table 2.4 lists the contents of the Departments table.

Table 2.4: Departments Table

ID(a) | NAME
1 | System Solutions
2 | Research and Development
3 | Management

a. ID is the primary key.
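The rules listed above can be sketched as a small generator that turns column metadata into OWL axiom strings. This is a simplified assumption about the input format and URI scheme for illustration only, not Automapper's actual implementation.

```python
# Sketch of the Automapper-style rules above: cardinality derived from
# nullability, functional properties, and property URIs built by
# concatenating table and column names. Illustrative, not Automapper.

def property_uri(table, column):
    # Global uniqueness via table + column name concatenation.
    return "dsont:%s.%s" % (table.lower(), column.lower())

def column_axioms(table, column, nullable, xsd_type):
    uri = property_uri(table, column)
    axioms = ["%s a owl:FunctionalProperty ." % uri]
    restrictions = [
        "[ a owl:Restriction ; owl:onProperty %s ; "
        "owl:allValuesFrom %s ]" % (uri, xsd_type)
    ]
    if nullable:
        restrictions.append(
            "[ a owl:Restriction ; owl:onProperty %s ; "
            'owl:maxCardinality "1"^^xsd:nonNegativeInteger ]' % uri)
    else:
        restrictions.append(
            "[ a owl:Restriction ; owl:onProperty %s ; "
            'owl:minCardinality "1"^^xsd:nonNegativeInteger ]' % uri)
    return axioms, restrictions

# The non-nullable ID column of the Departments table (Table 2.4).
axioms, restrictions = column_axioms("Departments", "ID",
                                     nullable=False, xsd_type="xsd:decimal")
```

For the non-nullable ID column this produces a functional property with an allValuesFrom restriction on xsd:decimal and a minCardinality of 1, matching the bulleted rules.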


From this schema, Automapper creates the data source ontology and class-specific inverse functional rules, as shown in Fig. 2.3.

    Figure 2.3: Ontology and inverse functional rules created by Automapper

XTR-RTO [38] is an approach to build an OWL ontology from eXtensible Markup Language (XML) documents. Its transformation module first maps the source XML schema into an RDB and then maps that into an OWL ontology. Specific attributes are used for describing the RDB, such as rdb:DbName, rdb:Relation, rdb:RelationList, rdb:Table and rdb:Attribute. All tables are mapped to instances of type rdb:Relation and consequently added to type rdb:RelationList, whereas each attribute is mapped to an instance of type rdb:Attribute and an instance of type rdb:hasType.

Figure 2.3 (excerpt), in Turtle notation:

    dsont:Hresources.Departments a owl:Class ;
        rdfs:subClassOf
            [ a owl:Restriction ;
              owl:onProperty dsont:hresources.departments.ID ;
              owl:allValuesFrom xsd:decimal ] ,
            [ a owl:Restriction ;
              owl:onProperty dsont:hresources.departments.ID ;
              owl:minCardinality "1"^^xsd:nonNegativeInteger ] .


The entity-relationship model is at present the most popular style of organizing a database, and it can express the relationships between data clearly. Therefore, metadata information and structural restrictions are extracted from the relational database to construct ontologies. The ontology contains:

- Vocabularies for describing relational database systems, such as rdb:DbName, rdb:Relation, rdb:RelationList, rdb:Table, rdb:Attribute, rdb:PrimaryKeyAttribute, and rdb:ForeignKeyAttribute.
- Semantic relationships between vocabularies, such as rdb:hasRelation, rdb:hasAttribute, rdb:primaryKey, rdb:hasType and rdb:isNullable.
- Restrictions on the vocabularies and their semantic relationships, such as: each relation has zero or more attributes, and each attribute has exactly one type.

The mapping approach used in XTR-RTO is described below:

- Each table is mapped to an instance of type rdb:Relation and added to type rdb:RelationList.
- Each attribute is mapped to an instance of type rdb:Attribute, and an instance of type rdb:hasType is generated simultaneously. If the attribute is a foreign key, an instance of type rdb:ReferenceAttribute and an instance of type rdb:ReferenceRelation are generated to represent this information.
- The restrictions of each instance of type rdb:Attribute, such as cardinality restrictions and foreign key restrictions, are generated.
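The mapping steps above can be sketched as follows. The input format (a list of attribute descriptions) and the helper function are assumptions made for illustration; this is not XTR-RTO's actual code.

```python
# Sketch of the XTR-RTO mapping steps: tables become rdb:Relation
# instances, attributes become rdb:Attribute instances with rdb:hasType,
# and foreign keys add rdb:ReferenceAttribute / rdb:ReferenceRelation
# information. Input format is an illustrative assumption.

def map_table(name, attributes):
    """attributes: list of (attr_name, sql_type, referenced_table or None)."""
    triples = [
        (name, "rdf:type", "rdb:Relation"),
        ("rdb:RelationList", "rdb:hasRelation", name),
    ]
    for attr, sql_type, fk_table in attributes:
        attr_id = "%s.%s" % (name, attr)
        triples.append((attr_id, "rdf:type", "rdb:Attribute"))
        triples.append((attr_id, "rdb:hasType", sql_type))
        if fk_table is not None:
            triples.append((attr_id, "rdf:type", "rdb:ReferenceAttribute"))
            triples.append((attr_id, "rdb:ReferenceRelation", fk_table))
    return triples

# The BOOK table of Table 2.5; PRINTER_ID is assumed to reference
# a PRINTER table for the sake of the foreign-key case.
book = map_table("BOOK", [("BOOK_ID", "string", None),
                          ("TITLE", "string", None),
                          ("AUTHOR", "string", None),
                          ("PRINTER_ID", "string", "PRINTER")])
```

Each table yields one rdb:Relation instance plus two triples per attribute, and foreign-key attributes yield two further triples recording the referenced relation.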

There is one table in this relational database, as illustrated in Table 2.5.

Table 2.5: Book Table in the Relational Model

BOOK_ID | TITLE | AUTHOR | PRINTER_ID

Suppose the database is saved on localhost, and the OWL ontology describing the database has the namespace http://localhost/book.owl, which can be changed by users. Fig. 2.4 illustrates the relations in the ontology:


    Figure 2.4: Relations in the ontology

The OWL description of the BOOK table is shown in Fig. 2.5.

    Figure 2.5: OWL description of BOOK Table

(The figure shows each of the four columns declared with isNullable "false" and type string.)


RTAXON [39] uses a data mining approach. This work proposed a learning technique that exploits the database content to identify categorization patterns, which are then used to generate class hierarchies. The fully formalized method combines a classical schema analysis with hierarchy mining (extraction) in the data. The learning method focuses on concept hierarchy identification by identifying the role of attributes: on the assumption that attribute names have a specific role in a relation, the approach identifies lexical clues in attribute names that reveal their specific role (i.e., classifying the tuples).


2.3.1 Identification of the categorizing attributes:

Two sources are involved in the identification of categorizing attributes: the names of attributes and the data diversity in attribute extensions (i.e., in column data). These two indicators allow finding attribute candidates and selecting the most plausible one.

a. Identification of lexical clues in attribute names:

Attributes bear names that reveal their specific role in the relation (i.e., classifying the tuples) used for categorization. The lexical clue that indicates the role of the attribute can be just a part of the name, as in the attribute names CategoryID or ObjectType. In Fig. 2.6, for example, the categorizing attribute in the Products relation is clearly identified by its name (Category). A list of clues is set up and used to perform a first filtering of potential candidates.

b. Filtering through entropy-based estimation of data diversity:

With an extensive list of lexical clues, the first filtering step appears to be effective; for example, the Category column in the Products relation can be used to derive subclasses. However, experiments on complex databases show that this step often identifies several candidates. The selection among the remaining candidates is based on an estimation of the data diversity in the attribute extensions. A good candidate should exhibit a typical degree of redundancy that can be formally characterized using the concept of entropy from information theory. Entropy is a measure of the uncertainty of a data source: attributes with highly repetitive content are characterized by low entropy, while, conversely, among the attributes of a given relation the primary key has the highest entropy, since all values in its extension are distinct. Informally, the rationale behind this selection step is to favor the candidate that would provide the most balanced distribution of instances within the subclasses.
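The entropy estimate itself is standard Shannon entropy over a column's extension. A minimal sketch, with invented sample data standing in for the Products relation:

```python
import math
from collections import Counter

# Entropy-based filtering sketched above: repetitive columns such as
# Category have low entropy, while the primary key has the maximum.
# The sample data is invented for illustration.

def entropy(values):
    """Shannon entropy (in bits) of a column's extension."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

products = {
    "ProductID": [1, 2, 3, 4, 5, 6],                        # primary key
    "Category":  ["Beverages", "Beverages", "Condiments",
                  "Condiments", "Beverages", "Condiments"],
}

# All primary-key values are distinct, so its entropy is maximal;
# the repetitive Category column scores far lower.
assert entropy(products["ProductID"]) > entropy(products["Category"])
```

Among several lexically plausible candidates, the one whose entropy indicates a balanced but redundant distribution would be preferred as the categorizing attribute.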

2.3.2 Generation and population of the subclasses:

The generation of subclasses from an identified categorizing attribute can be straightforward: a subclass is derived for each distinct value of the attribute extension (i.e., for each element of the attribute's active domain). However, proper handling of the categorization source may require more complex mappings.
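The straightforward case can be sketched in a few lines: one subclass per distinct value of the categorizing attribute, populated with the matching tuples. The table layout below is an invented illustration.

```python
# One subclass per element of the categorizing attribute's active
# domain, populated with the matching tuples. Invented sample rows.

def derive_subclasses(rows, cat_attr):
    subclasses = {}
    for row in rows:
        subclasses.setdefault(row[cat_attr], []).append(row)
    return subclasses

rows = [
    {"ProductID": 1, "Category": "Beverages"},
    {"ProductID": 2, "Category": "Condiments"},
    {"ProductID": 3, "Category": "Beverages"},
]
subs = derive_subclasses(rows, "Category")
```

Here the active domain {Beverages, Condiments} yields two subclasses of the Products class, each holding its tuples as instances.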

Example

As illustrated in Fig. 2.6, the derivation applied in this example can be divided into two inter-related parts. The first part, labeled (a) in the figure, includes derivations motivated by the identification of patterns in the database schema: each relation (or table) definition from the relational database schema is the source of a class in the ontology. To complete the class definitions, datatype properties are derived from some of the relation attributes. The foreign key relationships are the most reliable source for linking classes, and in this example each such relationship is translated into an object property. The derivations applied to obtain this upper part of the


2.3.3 Rules for learning classes

When learning ontological classes, one class may derive from information spread across several relations. These rules integrate the information in several relations into one ontological class when those relations are used to describe one entity.

2.3.4 Rules for learning properties and property characteristics

There are two kinds of properties: object properties and datatype properties. These rules learn object properties according to the reference relationships between relations. In a relational model, a relation may be used to indicate the relationship between two other relations; in this case, such a relation can be mapped to an ontological object property. These rules learn object properties using the binary relationships between relations; complex or non-binary (n-ary) relations are not directly supported by ontology languages and need to be decomposed into a group of binary relations. Each attribute in a relation that cannot be converted into an ontological object property is converted into an ontological datatype property.

2.3.5 Rules for learning hierarchy

Classes and properties can be organized in a hierarchy. If two relations in the database have an inheritance relationship, then the two corresponding ontological classes or properties are organized in a hierarchy.

2.3.6 Rules for learning cardinality

Ontology defines property cardinality to further specify properties. The property cardinality is learned from the constraints on attributes in relations: if an attribute is a primary or foreign key, then the minCardinality and maxCardinality of the property are both 1; if an attribute is declared NOT NULL, the minCardinality of the property is 1; and if an attribute is declared UNIQUE, the maxCardinality of the property is 1.
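These cardinality rules translate directly into code. A minimal sketch, assuming a simple record of per-attribute constraints (the input format is an assumption for illustration):

```python
# Direct encoding of the cardinality rules above: keys imply exactly
# one, NOT NULL implies min 1, UNIQUE implies max 1. The argument
# format is an illustrative assumption.

def cardinality(is_key, not_null, unique):
    """Return (minCardinality, maxCardinality); None means unconstrained."""
    min_card = max_card = None
    if is_key:           # primary or foreign key: exactly one value
        min_card = max_card = 1
    if not_null:         # NOT NULL: at least one value
        min_card = 1
    if unique:           # UNIQUE: at most one value
        max_card = 1
    return min_card, max_card
```

Note that a column declared both NOT NULL and UNIQUE receives the same (1, 1) cardinality as a key column, which is consistent with the rules as stated.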

2.3.7 Rules for learning instances

For an ontological class, its instances consist of the tuples in the relations corresponding to that class, and relations between instances are established using the information contained in the foreign keys of the database tuples.


According to the above-mentioned rules, an ontology is constructed automatically based on the data in the relational database.

2.3.8 Implementation

The overall framework of ontology learning is presented in Fig. 2.7. The input to the framework is data stored in a relational database. The framework uses a database analyzer to extract schema information from the database, such as primary keys, foreign keys and dependencies. The obtained information is then transferred to the ontology generator, which generates the ontology based on the schema information and rules. As a last step, the user can modify and refine the obtained ontology with the aid of an ontology reasoner and an ontology editor. The framework is domain/application independent and can learn ontologies for general or specific domains from a relational database. Fig. 2.7 illustrates the ontology construction framework.

    Figure 2.7: Ontology Construction Framework

    [Figure: an RDB feeds an RDB Analyzer, whose output drives an Ontology Generator backed by a Rules Lib; the generated Ontology is refined through an Ontology Reasoner and Ontology Editor and consumed by an Ontology-Based Application.]
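    The "RDB analyzer" step of the framework can be illustrated with SQLite's introspection PRAGMAs; this is a sketch under the assumption of an in-memory SQLite database with invented tables, not the surveyed framework's actual implementation. A real system would query the catalog of its target DBMS.

    ```python
    # Sketch of the RDB-analyzer step: extract primary keys, foreign keys and
    # NOT NULL constraints using SQLite introspection. Table names are invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            dept_id INTEGER REFERENCES department(id)
        );
    """)

    def analyze(conn, table):
        """Extract column names, NOT NULL flags, primary keys and foreign keys."""
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        return {
            "columns": [{"name": c[1], "not_null": bool(c[3]),
                         "primary_key": bool(c[5])} for c in cols],
            "foreign_keys": [{"column": fk[3], "references": fk[2]} for fk in fks],
        }

    schema = analyze(conn, "employee")
    print(schema["foreign_keys"])  # dept_id references department
    ```

    The extracted dictionary is exactly the kind of schema information the ontology generator would combine with the mapping rules of Sections 2.3.6 and 2.3.7.
    
    
    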


    As this survey shows, each approach defines common rules for mapping basic RDB schema patterns, such as

    relations, properties, reference keys and cardinalities, to ontology, as shown in Table

    2.5.

    Table 2.5: Comparative Analysis of ontology construction approaches

    | Approach | Mapping Creation | Procedure | Sources Used | Mapping Rules | Limitations |
    |---|---|---|---|---|---|
    | Automapper: Relational Database-Semantic Translation using OWL and SWRL [37] | Automatic | Construction of ontology from RDB with the help of a configuration file | Configuration file; RDB schema; mapping rules | Class, datatype property, and object property | Un-normalized RDB; un-resolvable URIs |
    | Using Relational Database to Build OWL Ontology from XML Data Sources [38] | Semi-automatic | Construction of ontology from RDB | RDB schema; mapping rules | rdb:Relation, rdb:Attribute, and rdb:ReferenceAttribute | Un-normalized RDB; needs domain experts' help for extracting cardinality restrictions |
    | Mining the Content of Relational Database to Learn Ontology with Deeper Taxonomies [39] | Semi-automatic | Schema analysis and hierarchy mining from stored data | RDB schema; mapping rules; stored data | Class, datatype properties, object property, and M:M object property | Sometimes the attribute name does not represent its value |
    | Learning Ontology from Relational Database [40] | Automatic | Construction of ontology from RDB without using a middle model | RDB schema; mapping rules | Classes, properties, property characteristics, cardinalities and data instances | Un-normalized RDB |

    Ontologies are built either from scratch using an ontology editor or by leveraging databases (document

    collections) through (semi-)automatic ontology construction. The topic of ontology construction has been

    receiving growing attention, since different applications may focus on different aspects; the basic

    idea, however, is to provide vocabularies of concepts and their relationships within a specific domain. Much work

    has been done in the area of ontology construction using relational databases as the data

    source.

    As described in Table 2.5, the techniques used in ontology construction from relational databases are

    based on schema-mapping and data-mining approaches.


    Automapper [37] is a Semantic Web interface for RDB that generates the data source and the respective

    mapping ontology automatically. The translation process relies on a set of data source-to-domain

    mapping rules, which depend on a well-formed relational database schema. In many applications,

    a well-formed relational database schema is not available, so better results cannot be guaranteed in the

    ontology construction process.

    XTR-RTO [38] uses metadata information extracted from a relational database to construct an ontology

    based on predefined schema translation rules. The effectiveness of these translation rules depends upon a well-

    formed relational database schema; the unavailability of a well-designed relational database sometimes causes

    challenges in the ontology construction process.

    RTXON [39] identifies lexical clues in attribute names during the filtering process. In many applications,

    attribute names sometimes do not represent their values (i.e., offer no lexical clue), so better results cannot be

    guaranteed in the filtering process.

    The approach used in [40] acquires an ontology from a relational database automatically by using a group of

    learning rules. These learning rules depend upon the relational database schema, and in many applications

    the unavailability of a well-formed relational database results in inconsistent and incorrect ontology

    construction.

    2.4 Populating Web Ontologies

    Automatic population of web ontologies is one of the current research issues. There are two stages of

    web-ontology development: schema creation and population. Although populating

    a web ontology is not a complicated task, it is very time-consuming and laborious when

    performed manually. For the success of the Semantic Web, a proper solution for populating

    ontologies is needed. In the current web, most data is available in XML data files, and there is an extremely large

    number of these data resources, containing terabytes of data. To upgrade such data-intensive web


    applications into semantic web applications, proper methodologies for automatically

    populating ontologies are needed.

    Different approaches have been proposed for ontology population. Some of them are based on natural

    language processing and machine learning techniques [41] [42]. Junte Zhang and Proscovia Olango [43]

    presented a novel approach for creating and populating ontologies. In this method, domain

    knowledge about the ontology is collected and the domain ontology is constructed using the open-source tool

    Protégé. This domain ontology is transformed into an equivalent RDF file, which is

    manipulated manually to populate the ontology skeleton created by Protégé. XSLT or XQuery is

    used to extract the relevant information from Wikipedia pages through Perl regular expressions, and

    ontology instances are then generated from those expressions. Semantic heterogeneity and inconsistency

    problems arose while exporting Wikipedia pages to XML format and remained unsolved.

    Web-ontology creation and population guidelines are provided for developing new semantic web

    applications using WSDM [44] [45]. Concepts in the ontology are mapped to object chunks manually at the

    conceptual level, and this conceptual mapping is used to generate the actual semantic web page at the implementation

    level. A similar approach is used in WEESA [46], an adaptation of XML-based web engineering:

    web-ontology concepts are mapped to schema elements of an XML document, this mapping is defined for

    each page, and the ontology is then populated via a tool [47]. In [48] a methodology is proposed for extracting data

    from web documents for ontology population. This methodology consists of three steps. The first step

    extracts information in the form of sentences and paragraphs; web documents are selected

    using search engines or manually. The system then understands this information

    semantically and syntactically, including the relations between the terms of the text, using

    rhetorical structures. For efficient representation of the extracted information, XML is used because of its flexibility

    and data-handling abilities. We proposed an ontology-population methodology [49] to populate an ontology

    from data stored in XML data files. This methodology may help in transforming an existing non-semantic


    web application into a semantic web application by populating its web ontology semi-automatically through a

    set of transformation algorithms, reducing the time-consuming task of ontology population. In [50], similar

    work is presented. The proposed methodologies take a web-ontology schema and an XML document

    as input and produce a populated ontology as output.
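    The XML-driven population idea can be sketched as follows: elements of an XML file are mapped to instances of an ontology class, with child elements becoming property values. The XML layout, class name, and instance representation below are invented for illustration and are not the actual methodology of [49] or [50].

    ```python
    # Hedged sketch of populating an ontology class from an XML data file:
    # one instance per element, child elements as property values.
    import xml.etree.ElementTree as ET

    XML = """
    <books>
      <book isbn="111"><title>Semantic Web</title><author>Berners-Lee</author></book>
      <book isbn="222"><title>OWL Primer</title><author>Smith</author></book>
    </books>
    """

    def populate(xml_text, class_name, id_attr):
        """Create one instance (a dict of property/value pairs) per element."""
        root = ET.fromstring(xml_text)
        instances = []
        for elem in root:
            inst = {"class": class_name, "id": elem.get(id_attr)}
            for child in elem:
                inst[child.tag] = child.text
            instances.append(inst)
        return instances

    triples = populate(XML, "Book", "isbn")
    print(len(triples), triples[0]["title"])
    ```

    A real transformation algorithm would additionally map element and attribute names onto the ontology schema's classes and properties rather than copying tags verbatim.
    
    
    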

    2.5 Integrating Web Ontologies

    When multiple ontologies are used simultaneously in integration operations such as merging,

    mapping and aligning, they may suffer from different types of heterogeneity, such as semantic

    heterogeneity and non-semantic (syntactic) heterogeneity [51,52]. Syntactic heterogeneity is due to

    the use of different languages for the formalization of ontologies, e.g., one ontology is formalized in OWL

    [20] and the other in the DAML [53] language. Semantic heterogeneity comprises

    terminological, conceptual and contextual heterogeneities. Terminological heterogeneity occurs when

    different terms are used to represent the same concept, or the same term is used to represent different concepts.

    Conceptual heterogeneity between two concepts may occur due to the granularity of concepts, i.e., when one

    is a sub-concept or super-concept of the other, or when the two overlap. Similarly, two concepts are explicit-

    semantic heterogeneous when they have different roles or functionalities in a similar domain.
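    Terminological heterogeneity can be made concrete with a toy label matcher: labels are normalized and looked up in a synonym map before comparison. The synonym table and normalization scheme are simplistic stand-ins for real ontology-matching techniques, added here purely for illustration.

    ```python
    # Toy illustration of terminological heterogeneity: different terms may
    # denote the same concept. The synonym map is a hypothetical stand-in
    # for lexical resources used by real ontology matchers.
    SYNONYMS = {"automobile": "car", "vehicle": "car"}

    def normalize(label):
        label = label.lower().replace("_", "").replace("-", "")
        return SYNONYMS.get(label, label)

    def same_concept(label_a, label_b):
        """True when two class labels normalize to the same term."""
        return normalize(label_a) == normalize(label_b)

    print(same_concept("Automobile", "Car"))  # different terms, same concept
    print(same_concept("Desk", "Chair"))      # genuinely different concepts
    ```

    Note that the converse case, the same term denoting different concepts (e.g. "Bank"), cannot be resolved by label comparison at all, which is why contextual information matters in integration.
    
    
    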

    In an ontology model, the taxonomic and non-taxonomic relations of a concept represent its super/sub-concepts

    and its roles, respectively. Since, in a certain domain, an intellectual concept is explicitly defined in

    terms of the roles that it holds, we may call the roles of a concept its explicit semantics. However, the

    explicit semantics of a non-intellectual concept may be derived from its granularities and its physical

    attributes because of the typical nature of such domains. For example, in an ontology of the furniture domain, concepts like

    chair, table and desk have no intellectual properties; these concepts have only taxonomic (i.e., parent, child,

    sibling) and elementary (i.e., color, type, etc.) characteristics.