
  • 8/14/2019 Web Search Systems


    Chapter 2

    BACKGROUND AND LITERATURE REVIEW

The Web's information resources are growing explosively in number and volume, but retrieving relevant information from them is becoming more difficult and time-consuming. The way Web information resources are stored is considered a root cause of irrelevant retrieval, because they are not stored in a machine-understandable organization. It has been estimated [12] that nearly 48 to 63 percent of retrieved results are irrelevant; that is, relevant results lie in the range of 37 to 52 percent, as shown in Table 2.1, which is far from accurate. An extension of the current Web, known as the Semantic Web, has been envisioned to organize Web content through ontologies in order to make it machine-understandable. To support the theme of the Semantic Web, there is a crucial need for techniques for designing, developing, populating and integrating ontologies.

2.1 Semantic Web vs. Current Web

The Semantic Web may be compared with the non-semantic Web along several parameters, such as content, conceptual perception, scope, environment and resource utilization.

(a) Content: The Semantic Web encompasses actual content along with its formal semantics. Here, the formal semantics are machine-understandable descriptions of content, expressed in logic-based languages such as the Web Ontology Language (OWL) recommended by the W3C. Through the formal semantics of content, computers can make inferences about data, i.e., understand what a data resource is and how it relates to other data. In today's Web there are no formal semantics for existing content; the content is machine-readable but not machine-understandable.


Table 2.1: Relevant results (from top 20 results) of some search engines

    Search Engine Precision

    Yahoo 0.52

    Google 0.48

    MSN 0.37

    Ask 0.44

    Seekport 0.37

(b) Conceptual Perception: The current Web is like a book of multiple hyperlinked documents. In the book scenario, an index of keywords is provided, but the contexts in which those keywords are used are missing from the index; that is, the index carries no formal semantics for its keywords. To check which occurrence is relevant, we have to read the corresponding pages of the book. The same is the case with the current Web. In the Semantic Web this limitation is eliminated via ontologies, where data is given well-defined meanings understandable by machines.

(c) Scope: The literature indicates that the inaccessible part of the Web is about five hundred times larger than the accessible one [13]. It is estimated that billions of pages of information are available on the Web, and only a few of them can be reached via traditional search engines. In the Semantic Web, formal semantics of data are available via ontologies, and these ontologies, the essential component of the Semantic Web, are accessible to semantic search engines.

(d) Environment: The Semantic Web is a web of ontologies holding data with formal meanings, in contrast to the current Web, which contains virtually boundless information in the form of documents. The Semantic Web offers data as well as documents, which machines can process, transform, assemble, and even act on in useful ways.


(e) Resource Utilization: There are many Web resources that could be very useful in our everyday activities. In the current Web it is difficult to locate them, because they are not properly annotated with machine-understandable metadata. The Semantic Web will form a network of related resources, making them very easy to locate and use. There are further criteria for comparing the current Web and the Semantic Web. For example, information searching, accessing, extracting, interpreting and processing on the Semantic Web will be easier and more efficient; the Semantic Web will have inference or reasoning capability; network and communication cost will be reduced thanks to more relevant results; and many more. Some are listed in Table 2.2.

Table 2.2: Semantic vs. current Web

Sr. No | Factor | (Non-Semantic) Web | Semantic Web
1. | Conceptual perception | Large hyperlinked book | Large interlinked database
2. | Content | No formal meanings | Formally defined
3. | Scope | Limited, invisible web probably excluded | Boundless, invisible web probably included
4. | Environment | Web of documents | Web of ontologies, data and documents
5. | Resource utilization | Minimum-normal | Maximum
6. | Inference/reasoning capability | No | Yes
7. | Knowledge management application support | No | Yes
8. | Information searching, accessing, extracting | Difficult and time-consuming | Easy and efficient
9. | Timeliness, accuracy, transparency of information | Less | More
10. | Semantic heterogeneity | More | Less
11. | Ingredients | Content, presentation | Content, presentation, formal semantics
12. | Text simplification and clarification | No | Yes

According to the theme of the Semantic Web, if the explicit semantics of Web resources are put together with their linguistic semantics and represented in logic-based languages, then we can overcome the limitations of the current Web. To support this theme, the W3C has recommended several standards [14] such as RDF (Resource Description Framework), RDFS (RDF Schema), OWL (Web Ontology Language), SPARQL (a query language for RDF) and GRDDL (Gleaning Resource Descriptions from Dialects of Languages). RDF provides the data model of a Semantic Web application: resources are represented through URIs and connected through labeled edges, which are also represented through URIs. An RDF model can be described through the RDFS language, or through the more powerful OWL; a query language such as SPARQL can be used to query an RDF model. The Semantic Web vision has now become a reality [15, 16]. Several Semantic Web systems have been developed (a subset of these systems is given in Table 2.3), and a huge number of ontology-based Web documents have been published.
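As a small illustration of the RDF data model just described, resources and edge labels can be treated as URIs and a SPARQL-like query as a triple pattern with variables. The following sketch is only an analogy in Python; the URIs and the `match` helper are invented for illustration and are not part of any W3C standard.

```python
# Toy illustration of the RDF data model: triples of URI strings,
# queried with a SPARQL-like triple pattern. Invented example URIs.

triples = [
    ("ex:Farooq", "ex:isFatherOf", "ex:Humza"),
    ("ex:Farooq", "rdf:type", "ex:Person"),
    ("ex:Humza",  "rdf:type", "ex:Person"),
]

def match(pattern, data):
    """Return bindings for variables (strings starting with '?')."""
    results = []
    for triple in data:
        if all(p.startswith("?") or p == t for p, t in zip(pattern, triple)):
            binding = {p: t for p, t in zip(pattern, triple)
                       if p.startswith("?")}
            results.append(binding)
    return results

# Analogue of: SELECT ?child WHERE { ex:Farooq ex:isFatherOf ?child }
children = match(("ex:Farooq", "ex:isFatherOf", "?child"), triples)
```

The variable `?child` binds to every object reachable from `ex:Farooq` over the `ex:isFatherOf` edge, mirroring SPARQL basic graph pattern matching.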


Table 2.3: Examples of Semantic Web Systems [17]

SW System | Country | Activity area | Application area of SWT | SWT used | SW technology benefits
A Digital Music Archive (DMA) for NRK using SW techniques | Norway | Broadcasting | IS, CD & DI | 1, 2, 3 & IHV | IS, INR, S&RD
A Linked Open Data Resource List Management Tool for Undergraduate Students | UK | ELT and publishing | CD, CM, DI and SA | RDF, 3, 1, SKOS & PV | ECR, P, reduced time to market, and S&RD
A Semantic Web Content Repository for Clinical Research | US | HC and PI | DI | 1, 2, 4, 5 and PV | Automation, IM and IS
An Intelligent Search Engine for Online Services for Public Admin. | Spain | PI and eG | Portal and IS | 1 and IHV | ECR, INR and IS
An Ontology of Cantabria's Cultural Heritage | Spain | PI and museum | Portal and DI | 1 and IHV | ECR and IM
Composing Safer Drug Regimens for the Individual Patient using SWT | US | HC | DI and IS | 1, 2, PV and IHV | S&RD, open model, IS and DCG
CRUZAR: An application of semantic matchmaking for eTourism in the city of Zaragoza | Spain | PI and eTourism | Portal and DI | 1, 3, Rules, PV and IHV | Faceted navigation, P and S&RD
Enhancement and Integration of Corporate Social Software Using SW | France | Utilities and energy | DI, CM and SN | 1, 3 and PV | IS, S&RD and INR
Enhancing Content Search Using SW | US | IT industry | Portal and IS | 1 | IS and S&RD
Geographic Referencing Framework | UK | PI, GIS & eG | DI | 1, 2 & IHV | S&RD and automation
Improving the Reliability of Internet Search Results Using Search Thresher | Ireland | Web accessibility | IS | 1 | IS
Improving Web Search Using Metadata | Spain and US | Search | Portal, IS, CD and customization | 1, 2, RDF, 4, PV and IHV | P, open model and S&RD
KDE 4.0 Semantic Desktop Search and Tagging | Germany | Semantic desktop | DI, CD, IS, service integration and SA | 1, 3, 2 and RDFS++ | IS, IM and open model
POPS: NASA's Expertise Location Service Powered by SW Technologies | US | PI | DI and SN | 1 and 3 | Faceted navigation, S&RD and ECR
Prioritization of Biological Targets for Drug Discovery | US | Life sciences | DI and portal | 1, 3, 2, 2DL, PV and IHV | IS, S&RD, IM and ECR
Real Time Suggestion of Related Ideas in the Financial Industry | Spain | Financial | CD | 1 and IHV | IS
S_Ctnt_Desc to improve discovery | UK | Telecom | CD and IS | 1, PV & IHV | S&RD, open model, and IS
Semantic MDR and IR for National Archives | Korea | PI | Portal, CD and SA | 1, 2, Rules, PV and IHV | IS
Semantic Tags | Serbia | IT industry | SA, SN and DI | 1, 3 and PV | INR, IS, S&RD and DCG
SWT for Public Health Awareness | US | HC, PI & eG | DI | 1, 2, PV & IHV | S&RD and IM
Semantic-based Search and Query System for the Traditional Chinese Medicine Community | China | PI and HC | DI, IS and schema mapping | 2, 3 and PV | S&RD and IS
The SW for the Agricultural Domain, Semantic Navigation of Food, Nutrition and Agricultural Journal | Italy | PI and eG | Portal, IS, SA and CD | 1, 2, SKOS, PV and IHV | IS
The Swordfish Metadata Initiative: Better, Faster, Smarter Web Content | US | IT industry | Portal and DI | 1 and IHV | DCG and ECR
Twine | US | IT industry | SA, SN and DI | RDF, 1, 2, 3 | INR, IS, S&RD and DCG
Use of SWT in Natural Language Interface to Business Applications | India | IT industry | Natural language interface | 1, 2, 5, 3 and IHV | IM and ECR

Abbreviations and integers used in the table: SWT (Semantic Web Technologies), IHV (In-House Vocabularies), IS (Improved Search), IM (Incremental Modeling), CD (Content Discovery), DI (Data Integration), INR (Identify New Relationships), S&RD (Share and Reuse Data), PV (Public Vocabularies), ECR (Explicit Content Relationships), SA (Semantic Annotation), SN (Social Network), GIS (Geographic Information System), ELT (Education, Learning Technology), CM (Content Management), DCG (Dynamic Content Generation), P (Personalization), PI (Public Institution), HC (Health Care), eG (eGovernment), 1 (RDFS), 2 (OWL), 3 (SPARQL), 4 (GRDDL), 5 (Rules, N3).


    2.2 The Web Ontology

    Ontology formally provides structural knowledge of a domain and its data in the machine-

    understandable way in some W3C [18] recommended technologies such as RDFS [19] and OWL [20] for

    ontology formalization. It is an essential component of any semantic web application. Basically, it is the

    central component in overall layered architecture of the semantic web, as shown in Figure 2.1. In

    ontologies, the information-resources are connected in such a way that each is uniquely identified by URI

    and new information can be drawn through a reasoning process. Basically, the ontology is a special type of

    network of web-resources. A conceptual view of ontology can be in RDF-graph or in triples form. Consider

    an example of a persons family ontology as shown in Figure 2.2. Now by definition of ontology, it looks a

    special kind of a network of information resources. The only information explicitly given in said ontology

    are isFatherOf, isMotherOf, isBrotherOf, isSisterOf, but we can drive a several types of new information

    such as isSonOf(3,1) (i.e. Humza is the son of Farooq), isHusbandOF(8,9), isWifeOf(9,8),

    isGrandFatherOf(8,3), isGrandmotherOf(9,3) and many more can easily be derived. It can be implemented

    in some logic-based language such as OWL and conceptually it can be seen as triples or RDF-graph.
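The derivations above can be sketched as a small forward-chaining program. This is only a minimal sketch: the individual numbering (1 = Farooq, 3 = Humza, 8 and 9 = the grandparents), the explicit "male" fact and the rules themselves are assumptions made for illustration, not part of Figure 2.2 or of OWL reasoning.

```python
# Forward-chaining sketch of the family derivations described above.
# Facts are tuples; the numbering and the "male" fact are assumptions.

BASE_FACTS = {
    ("isFatherOf", 8, 1), ("isMotherOf", 9, 1),   # grandparents of 3
    ("isFatherOf", 1, 3), ("isMotherOf", 2, 3),   # parents of 3
    ("male", 3),                                  # Humza is male
}

def derive(facts):
    """Apply the rules until no new fact can be added (a fixpoint)."""
    facts = set(facts)
    while True:
        new = set()
        triples = [f for f in facts if len(f) == 3]
        for (p, x, y) in triples:
            if p == "isFatherOf" and ("male", y) in facts:
                new.add(("isSonOf", y, x))              # male child of father
            for (q, u, v) in triples:
                if p == "isFatherOf" and q == "isMotherOf" and v == y:
                    new.add(("isHusbandOf", x, u))      # parents of same child
                    new.add(("isWifeOf", u, x))
                if p == "isFatherOf" and q in ("isFatherOf", "isMotherOf") and u == y:
                    new.add(("isGrandFatherOf", x, v))  # father of a parent
                if p == "isMotherOf" and q in ("isFatherOf", "isMotherOf") and u == y:
                    new.add(("isGrandMotherOf", x, v))  # mother of a parent
        if new <= facts:
            return facts
        facts |= new

derived = derive(BASE_FACTS)
```

Running `derive` yields exactly the derived relations named in the text, e.g. isSonOf(3,1) and isGrandFatherOf(8,3); an OWL reasoner obtains the same conclusions from property axioms rather than hand-written rules.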


    Figure 2.1: A layered model of semantic web [21].


Figure 2.2: A sample slice of a person's family ontology

    2.3 Engineering Ontologies for Semantic Web

We have reviewed the literature concerning the adaptation of existing web development methodologies for semantic web applications. The findings disclose that not much work has been done in this direction. An adaptation of XML web engineering methodologies has been proposed in the form of WEESA [22], which generates semantic annotations by defining a mapping between XML schemas and existing ontologies. It offers no proper guidelines for designing an ontology: WEESA cannot directly design or develop ontologies, since it focuses on mapping XML schema concepts onto the concepts of an existing ontology. In [23], the authors have proposed an extension of the Web Site Design Method, in which object chunk entities are mapped to concepts in the ontology. OOHDM has been extended in the light of semantic web technologies [24]: its primitives for conceptual and navigation models are described as DAML classes, and RDFS is used for the domain vocabulary. The Hera [25] methodology has been extended for adaptive web-based application engineering; it uses semantic web standards for model representation.

The existing web engineering methodologies mentioned above have adapted the use of ontology in their development processes, but their main focus is mapping and annotation using existing ontologies. The design of new ontologies during semantic web application development remains out of focus, and there is no proper guidance on how to design an ontology for a semantic web application.

There are some other methodologies for ontology development, as surveyed in [26, 27, 28]. Mostly these methodologies focus on the specification and formalization of an ontology and do not concentrate on its design phase. They are based on natural language processing (NLP) and machine learning techniques, and their orientation is the facilitation of web agents rather than the formalization of web content. Work on ontology development was boosted when the idea of the Semantic Web was envisioned.

In the KBSI IDEF5 methodology [29], data about the domain is collected and analyzed, and then a build-and-fix strategy is used to create the ontology. Uschold and King [30] proposed an ontology development methodology in which, after identifying the purpose of the ontology, it is captured and then coded. In METHONTOLOGY [31], after preparing the ontology specification, knowledge is acquired and analyzed to determine domain terms such as concepts, relations and properties, and then formalization is started; after that, evaluation and documentation are performed.

In [32] a methodology based on a collaborative approach has been proposed. In its first phase, the design criteria for the ontology, its boundary conditions, and a set of standards for evaluating it are defined. In the second phase, an initial version of the ontology is produced, and the desired ontology is then obtained through an iterative process. A software build-and-fix approach is followed, which leads to heavy development and maintenance cost.

Helena and João [33] proposed a methodology for ontology construction in 2004. This method divides ontology construction into steps such as specification, conceptualization, formalization, implementation and maintenance; knowledge acquisition, evaluation and documentation are performed during each phase. There are some other approaches investigating the transformation of a relational model into an ontological model, in which the ontology is developed from database schemas, mainly using reverse engineering theory. In [34], a methodology is outlined for constructing an ontology from conceptual database schemas using a mapping process, carried out by taking into consideration the logical database model of the target system.

Most of the methodologies reviewed above are based on the build-and-fix approach, in which an initial version of the ontology is built and then improved iteratively until the domain requirements are satisfied. In this way the basic principles of software engineering are not followed properly. These methodologies mainly focus on data during the development process rather than on descriptive knowledge, and they mainly work on the specification and implementation phases while the design phase lacks proper attention. Moreover, the design and implementation phases of these methodologies are difficult to identify and separate.

One of the challenges in real-world applications is to improve access to, and sharing of, knowledge that resides in databases. Domain knowledge to be shared needs to be formalized in a formal ontology language, and extracting and building a web ontology on top of a Relational Database (RDB) is one way to represent domain-specific knowledge. Moreover, information residing in RDBs is highly engineered for accessibility and scalability [35] and is characterized by high quality and a rapidly increasing correspondence to the surface web [36]. In schema mapping techniques, the idea is to convert an RDB schema to an ontology based on predefined schema mapping rules; various research groups have proposed various such techniques.


Automapper [37] is a Semantic Web interface for RDBs that automatically generates the data source ontology and the respective mappings. It is an application-independent tool that uses the RDB schema to create a basic OWL ontology and dynamically produces instance data using that ontology. The translation process relies on a set of data source-to-domain mapping rules, and a processing module named Semantic Bridge for RDB translates queries to produce OWL ontologies. This method quickly exposes the data to the Semantic Web, where a variety of tools and applications are available to support translation, integration, reasoning, and visualization.

The following class descriptions, axioms and restrictions are currently generated by Automapper:

- maxCardinality is set to 1 for all nullable columns and is used for descriptive purposes.
- minCardinality is set to 1 for all non-nullable columns and is used for descriptive purposes.
- All datatype and object properties that represent columns are marked as functional properties. To ensure global uniqueness and class specificity, these columns are given URIs based on concatenating the table and column names.
- An allValuesFrom restriction reflects the datatype or class associated with each column and is used for descriptive purposes.

Table 2.4 lists the contents of the Departments table.

Table 2.4: Departments Table

ID(a) | NAME
1 | System Solutions
2 | Research and Development
3 | Management

a. ID is the primary key.
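The rules listed above can be sketched as a small generator that turns column metadata into OWL axiom strings. This is a simplified assumption about the input format and URI scheme for illustration only, not Automapper's actual implementation.

```python
# Sketch of the Automapper-style rules above: cardinality derived from
# nullability, functional properties, and property URIs built by
# concatenating table and column names. Illustrative, not Automapper.

def property_uri(table, column):
    # Global uniqueness via table + column name concatenation.
    return "dsont:%s.%s" % (table.lower(), column.lower())

def column_axioms(table, column, nullable, xsd_type):
    uri = property_uri(table, column)
    axioms = ["%s a owl:FunctionalProperty ." % uri]
    restrictions = [
        "[ a owl:Restriction ; owl:onProperty %s ; "
        "owl:allValuesFrom %s ]" % (uri, xsd_type)
    ]
    if nullable:
        restrictions.append(
            "[ a owl:Restriction ; owl:onProperty %s ; "
            'owl:maxCardinality "1"^^xsd:nonNegativeInteger ]' % uri)
    else:
        restrictions.append(
            "[ a owl:Restriction ; owl:onProperty %s ; "
            'owl:minCardinality "1"^^xsd:nonNegativeInteger ]' % uri)
    return axioms, restrictions

# The non-nullable ID column of the Departments table (Table 2.4).
axioms, restrictions = column_axioms("Departments", "ID",
                                     nullable=False, xsd_type="xsd:decimal")
```

For the non-nullable ID column this produces a functional property with an allValuesFrom restriction on xsd:decimal and a minCardinality of 1, matching the bulleted rules.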


From this schema, Automapper creates the data source ontology and class-specific inverse functional rules, as shown in Fig. 2.3.

    Figure 2.3: Ontology and inverse functional rules created by Automapper

XTR-RTO [38] is an approach to build an OWL ontology from eXtensible Markup Language (XML) documents. Its transformation module first maps the source XML schema into an RDB and then maps that into an OWL ontology. Specific attributes are used for describing the RDB, such as rdb:DbName, rdb:Relation, rdb:RelationList, rdb:Table and rdb:Attribute. All tables are mapped to instances of type rdb:Relation and consequently added to type rdb:RelationList, whereas each attribute is mapped to an instance of type rdb:Attribute and an instance of type rdb:hasType.

Figure 2.3 (excerpt), in Turtle notation:

    dsont:Hresources.Departments a owl:Class ;
        rdfs:subClassOf
            [ a owl:Restriction ;
              owl:onProperty dsont:hresources.departments.ID ;
              owl:allValuesFrom xsd:decimal ] ,
            [ a owl:Restriction ;
              owl:onProperty dsont:hresources.departments.ID ;
              owl:minCardinality "1"^^xsd:nonNegativeInteger ] .


The entity-relationship model is at present the most popular style of organizing a database, and it can express the relationships between data clearly. Therefore, metadata information and structural restrictions are extracted from the relational database to construct ontologies. The ontology contains:

- Vocabularies for describing relational database systems, such as rdb:DbName, rdb:Relation, rdb:RelationList, rdb:Table, rdb:Attribute, rdb:PrimaryKeyAttribute, and rdb:ForeignKeyAttribute.
- Semantic relationships between vocabularies, such as rdb:hasRelation, rdb:hasAttribute, rdb:primaryKey, rdb:hasType and rdb:isNullable.
- Restrictions on the vocabularies and their semantic relationships, such as: each relation has zero or more attributes, and each attribute has exactly one type.

The mapping approach used in XTR-RTO is described below:

- Each table is mapped to an instance of type rdb:Relation and added to type rdb:RelationList.
- Each attribute is mapped to an instance of type rdb:Attribute, and an instance of type rdb:hasType is generated simultaneously. If the attribute is a foreign key, an instance of type rdb:ReferenceAttribute and an instance of type rdb:ReferenceRelation are generated to represent this information.
- The restrictions of each instance of type rdb:Attribute, such as cardinality restrictions and foreign key restrictions, are generated.
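The mapping steps above can be sketched as follows. The input format (a list of attribute descriptions) and the helper function are assumptions made for illustration; this is not XTR-RTO's actual code.

```python
# Sketch of the XTR-RTO mapping steps: tables become rdb:Relation
# instances, attributes become rdb:Attribute instances with rdb:hasType,
# and foreign keys add rdb:ReferenceAttribute / rdb:ReferenceRelation
# information. Input format is an illustrative assumption.

def map_table(name, attributes):
    """attributes: list of (attr_name, sql_type, referenced_table or None)."""
    triples = [
        (name, "rdf:type", "rdb:Relation"),
        ("rdb:RelationList", "rdb:hasRelation", name),
    ]
    for attr, sql_type, fk_table in attributes:
        attr_id = "%s.%s" % (name, attr)
        triples.append((attr_id, "rdf:type", "rdb:Attribute"))
        triples.append((attr_id, "rdb:hasType", sql_type))
        if fk_table is not None:
            triples.append((attr_id, "rdf:type", "rdb:ReferenceAttribute"))
            triples.append((attr_id, "rdb:ReferenceRelation", fk_table))
    return triples

# The BOOK table of Table 2.5; PRINTER_ID is assumed to reference
# a PRINTER table for the sake of the foreign-key case.
book = map_table("BOOK", [("BOOK_ID", "string", None),
                          ("TITLE", "string", None),
                          ("AUTHOR", "string", None),
                          ("PRINTER_ID", "string", "PRINTER")])
```

Each table yields one rdb:Relation instance plus two triples per attribute, and foreign-key attributes yield two further triples recording the referenced relation.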

There is one table in this relational database, as illustrated in Table 2.5.

Table 2.5: Book Table in the Relational Model

BOOK_ID | TITLE | AUTHOR | PRINTER_ID

Suppose the database is saved on localhost, and the OWL ontology describing the database has the namespace http://localhost/book.owl, which can be changed by users. Fig. 2.4 illustrates the relations in the ontology:


    Figure 2.4: Relations in the ontology

The OWL description of the BOOK table is shown in Fig. 2.5.

    Figure 2.5: OWL description of BOOK Table

(The figure shows each of the four columns declared with isNullable "false" and type string.)


RTAXON [39] uses a data mining approach. This work proposed a learning technique that exploits the database content to identify categorization patterns, which are then used to generate class hierarchies. The fully formalized method combines a classical schema analysis with hierarchy mining (extraction) in the data. The learning method focuses on concept hierarchy identification by identifying the role of attributes: on the assumption that attribute names have a specific role in a relation, the approach identifies lexical clues in attribute names that reveal their specific role (i.e., classifying the tuples).


2.3.1 Identification of the categorizing attributes:

Two sources are involved in the identification of categorizing attributes: the names of attributes and the data diversity in attribute extensions (i.e., in column data). These two indicators allow finding attribute candidates and selecting the most plausible one.

a. Identification of lexical clues in attribute names:

Attributes bear names that reveal their specific role in the relation (i.e., classifying the tuples) used for categorization. The lexical clue that indicates the role of the attribute can be just a part of the name, as in the attribute names CategoryID or ObjectType. In Fig. 2.6, for example, the categorizing attribute in the Products relation is clearly identified by its name (Category). A list of clues is set up and used to perform a first filtering of potential candidates.

b. Filtering through entropy-based estimation of data diversity:

With an extensive list of lexical clues, the first filtering step appears to be effective; for example, the Category column in the Products relation can be used to derive subclasses. However, experiments on complex databases show that this step often identifies several candidates. The selection among the remaining candidates is based on an estimation of the data diversity in the attribute extensions. A good candidate should exhibit a typical degree of redundancy that can be formally characterized using the concept of entropy from information theory. Entropy is a measure of the uncertainty of a data source: attributes with highly repetitive content are characterized by low entropy, while, conversely, among the attributes of a given relation the primary key has the highest entropy, since all values in its extension are distinct. Informally, the rationale behind this selection step is to favor the candidate that would provide the most balanced distribution of instances within the subclasses.
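The entropy estimate itself is standard Shannon entropy over a column's extension. A minimal sketch, with invented sample data standing in for the Products relation:

```python
import math
from collections import Counter

# Entropy-based filtering sketched above: repetitive columns such as
# Category have low entropy, while the primary key has the maximum.
# The sample data is invented for illustration.

def entropy(values):
    """Shannon entropy (in bits) of a column's extension."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

products = {
    "ProductID": [1, 2, 3, 4, 5, 6],                        # primary key
    "Category":  ["Beverages", "Beverages", "Condiments",
                  "Condiments", "Beverages", "Condiments"],
}

# All primary-key values are distinct, so its entropy is maximal;
# the repetitive Category column scores far lower.
assert entropy(products["ProductID"]) > entropy(products["Category"])
```

Among several lexically plausible candidates, the one whose entropy indicates a balanced but redundant distribution would be preferred as the categorizing attribute.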

2.3.2 Generation and population of the subclasses:

The generation of subclasses from an identified categorizing attribute can be straightforward: a subclass is derived for each distinct value of the attribute extension (i.e., for each element of the attribute's active domain). However, proper handling of the categorization source may require more complex mappings.
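The straightforward case can be sketched in a few lines: one subclass per distinct value of the categorizing attribute, populated with the matching tuples. The table layout below is an invented illustration.

```python
# One subclass per element of the categorizing attribute's active
# domain, populated with the matching tuples. Invented sample rows.

def derive_subclasses(rows, cat_attr):
    subclasses = {}
    for row in rows:
        subclasses.setdefault(row[cat_attr], []).append(row)
    return subclasses

rows = [
    {"ProductID": 1, "Category": "Beverages"},
    {"ProductID": 2, "Category": "Condiments"},
    {"ProductID": 3, "Category": "Beverages"},
]
subs = derive_subclasses(rows, "Category")
```

Here the active domain {Beverages, Condiments} yields two subclasses of the Products class, each holding its tuples as instances.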

Example

As illustrated in Fig. 2.6, the derivation applied in this example can be divided into two inter-related parts. The first part, labeled (a) in the figure, includes derivations motivated by the identification of patterns in the database schema: each relation (or table) definition from the relational database schema is the source of a class in the ontology. To complete the class definitions, datatype properties are derived from some of the relation attributes. The foreign key relationships are the most reliable source for linking classes, and in this example each such relationship is translated into an object property. The derivations applied to obtain this upper part of the


2.3.3 Rules for learning classes

When learning ontological classes, one class may derive from information spread across several relations. These rules integrate the information in several relations into one ontological class when those relations are used to describe one entity.

2.3.4 Rules for learning properties and property characteristics

There are two kinds of properties: object properties and datatype properties. These rules learn object properties according to the reference relationships between relations. In a relational model, a relation may be used to indicate the relationship between two other relations; in this case, such a relation can be mapped to an ontological object property. These rules learn object properties using the binary relationships between relations; complex or non-binary (n-ary) relations are not directly supported by ontology languages and need to be decomposed into a group of binary relations. Each attribute in a relation that cannot be converted into an ontological object property is converted into an ontological datatype property.

2.3.5 Rules for learning hierarchy

Classes and properties can be organized in a hierarchy. If two relations in the database have an inheritance relationship, then the two corresponding ontological classes or properties are organized in a hierarchy.

2.3.6 Rules for learning cardinality

Ontology defines property cardinality to further specify properties. The property cardinality is learned from the constraints on attributes in relations: if an attribute is a primary or foreign key, then the minCardinality and maxCardinality of the property are both 1; if an attribute is declared NOT NULL, the minCardinality of the property is 1; and if an attribute is declared UNIQUE, the maxCardinality of the property is 1.
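These cardinality rules translate directly into code. A minimal sketch, assuming a simple record of per-attribute constraints (the input format is an assumption for illustration):

```python
# Direct encoding of the cardinality rules above: keys imply exactly
# one, NOT NULL implies min 1, UNIQUE implies max 1. The argument
# format is an illustrative assumption.

def cardinality(is_key, not_null, unique):
    """Return (minCardinality, maxCardinality); None means unconstrained."""
    min_card = max_card = None
    if is_key:           # primary or foreign key: exactly one value
        min_card = max_card = 1
    if not_null:         # NOT NULL: at least one value
        min_card = 1
    if unique:           # UNIQUE: at most one value
        max_card = 1
    return min_card, max_card
```

Note that a column declared both NOT NULL and UNIQUE receives the same (1, 1) cardinality as a key column, which is consistent with the rules as stated.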

2.3.7 Rules for learning instances

For an ontological class, its instances consist of the tuples in the relations corresponding to that class, and relations between instances are established using the information contained in the foreign keys of the database tuples.


According to the above-mentioned rules, an ontology is constructed automatically based on the data in the relational database.

2.3.8 Implementation

The overall framework of ontology learning is presented in Fig. 2.7. The input to the framework is data stored in a relational database. The framework uses a database analyzer to extract schema information from the database, such as primary keys, foreign keys and dependencies. The obtained information is then transferred to the ontology generator, which generates the ontology based on the schema information and rules. As a last step, the user can modify and refine the obtained ontology with the aid of an ontology reasoner and an ontology editor. The framework is domain/application independent and can learn ontologies for general or specific domains from a relational database. Fig. 2.7 illustrates the ontology construction framework.

    Figure 2.7: Ontology Construction Framework

    [Figure: an RDB feeds an RDB Analyzer, whose output drives an Ontology Generator backed by a Rules Lib; the generated Ontology is refined through an Ontology Reasoner and Ontology Editor and consumed by an Ontology-Based Application.]
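    The "RDB analyzer" step of the framework can be illustrated with SQLite's introspection PRAGMAs; this is a sketch under the assumption of an in-memory SQLite database with invented tables, not the surveyed framework's actual implementation. A real system would query the catalog of its target DBMS.

    ```python
    # Sketch of the RDB-analyzer step: extract primary keys, foreign keys and
    # NOT NULL constraints using SQLite introspection. Table names are invented.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE employee (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            dept_id INTEGER REFERENCES department(id)
        );
    """)

    def analyze(conn, table):
        """Extract column names, NOT NULL flags, primary keys and foreign keys."""
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        fks = conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()
        return {
            "columns": [{"name": c[1], "not_null": bool(c[3]),
                         "primary_key": bool(c[5])} for c in cols],
            "foreign_keys": [{"column": fk[3], "references": fk[2]} for fk in fks],
        }

    schema = analyze(conn, "employee")
    print(schema["foreign_keys"])  # dept_id references department
    ```

    The extracted dictionary is exactly the kind of schema information the ontology generator would combine with the mapping rules of Sections 2.3.6 and 2.3.7.
    
    
    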


    As this survey shows, each approach defines common rules for mapping basic RDB schema patterns, such as

    relations, properties, reference keys and cardinalities, to ontology, as shown in Table

    2.5.

    Table 2.5: Comparative Analysis of ontology construction approaches

    | Approach | Mapping Creation | Procedure | Sources Used | Mapping Rules | Limitations |
    |---|---|---|---|---|---|
    | Automapper: Relational Database-Semantic Translation using OWL and SWRL [37] | Automatic | Construction of ontology from RDB with the help of a configuration file | Configuration file; RDB schema; mapping rules | Class, datatype property, and object property | Un-normalized RDB; un-resolvable URIs |
    | Using Relational Database to Build OWL Ontology from XML Data Sources [38] | Semi-automatic | Construction of ontology from RDB | RDB schema; mapping rules | rdb:Relation, rdb:Attribute, and rdb:ReferenceAttribute | Un-normalized RDB; needs domain experts' help for extracting cardinality restrictions |
    | Mining the Content of Relational Database to Learn Ontology with Deeper Taxonomies [39] | Semi-automatic | Schema analysis and hierarchy mining from stored data | RDB schema; mapping rules; stored data | Class, datatype properties, object property, and M:M object property | Sometimes the attribute name does not represent its value |
    | Learning Ontology from Relational Database [40] | Automatic | Construction of ontology from RDB without using a middle model | RDB schema; mapping rules | Classes, properties, property characteristics, cardinalities and data instances | Un-normalized RDB |

    Ontologies are built either from scratch using an ontology editor or by leveraging databases (document

    collections) through (semi-)automatic ontology construction. The topic of ontology construction has been

    receiving growing attention, since different applications may focus on different aspects; the basic

    idea, however, is to provide vocabularies of concepts and their relationships within a specific domain. Much work

    has been done in the area of ontology construction using relational databases as the data

    source.

    As described in Table 2.5, the techniques used in ontology construction from relational databases are

    based on schema-mapping and data-mining approaches.


    Automapper [37] is a Semantic Web interface for RDB that generates the data source and the respective

    mapping ontology automatically. The translation process relies on a set of data source-to-domain

    mapping rules, which depend on a well-formed relational database schema. In many applications,

    a well-formed relational database schema is not available, so better results cannot be guaranteed in the

    ontology construction process.

    XTR-RTO [38] uses metadata information extracted from a relational database to construct an ontology

    based on predefined schema translation rules. The effectiveness of these translation rules depends upon a well-

    formed relational database schema; the unavailability of a well-designed relational database sometimes causes

    challenges in the ontology construction process.

    RTXON [39] identifies lexical clues in attribute names during the filtering process. In many applications,

    attribute names sometimes do not represent their values (i.e., offer no lexical clue), so better results cannot be

    guaranteed in the filtering process.

    The approach used in [40] acquires an ontology from a relational database automatically by using a group of

    learning rules. These learning rules depend upon the relational database schema, and in many applications

    the unavailability of a well-formed relational database results in inconsistent and incorrect ontology

    construction.

    2.4 Populating Web Ontologies

    Automatic population of web ontologies is one of the current research issues. There are two stages of

    web-ontology development: schema creation and population. Although populating

    a web ontology is not a complicated task, it is very time-consuming and laborious when

    performed manually. For the success of the Semantic Web, a proper solution for populating

    ontologies is needed. In the current web, most data is available in XML data files, and there is an extremely large

    number of these data resources, containing terabytes of data. To upgrade such data-intensive web


    applications into semantic web applications, proper methodologies for automatically

    populating ontologies are needed.

    Different approaches have been proposed for ontology population. Some of them are based on natural

    language processing and machine learning techniques [41] [42]. Junte Zhang and Proscovia Olango [43]

    presented a novel approach for creating and populating ontologies. In this method, domain

    knowledge about the ontology is collected and the domain ontology is constructed using the open-source tool

    Protégé. This domain ontology is transformed into an equivalent RDF file, which is

    manipulated manually to populate the ontology skeleton created by Protégé. XSLT or XQuery is

    used to extract the relevant information from Wikipedia pages through Perl regular expressions, and

    ontology instances are then generated from those expressions. Semantic heterogeneity and inconsistency

    problems arose while exporting Wikipedia pages to XML format and remained unsolved.

    Web-ontology creation and population guidelines are provided for developing new semantic web

    applications using WSDM [44] [45]. Concepts in the ontology are mapped to object chunks manually at the

    conceptual level, and this conceptual mapping is used to generate the actual semantic web page at the implementation

    level. A similar approach is used in WEESA [46], an adaptation of XML-based web engineering:

    web-ontology concepts are mapped to schema elements of an XML document, this mapping is defined for

    each page, and the ontology is then populated via a tool [47]. In [48] a methodology is proposed for extracting data

    from web documents for ontology population. This methodology consists of three steps. The first step

    extracts information in the form of sentences and paragraphs; web documents are selected

    using search engines or manually. The system then understands this information

    semantically and syntactically, including the relations between the terms of the text, using

    rhetorical structures. For efficient representation of the extracted information, XML is used because of its flexibility

    and data-handling abilities. We proposed an ontology-population methodology [49] to populate an ontology

    from data stored in XML data files. This methodology may help in transforming an existing non-semantic


    web application into a semantic web application by populating its web ontology semi-automatically through a

    set of transformation algorithms, reducing the time-consuming task of ontology population. In [50], similar

    work is presented. The proposed methodologies take a web-ontology schema and an XML document

    as input and produce a populated ontology as output.
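    The XML-driven population idea can be sketched as follows: elements of an XML file are mapped to instances of an ontology class, with child elements becoming property values. The XML layout, class name, and instance representation below are invented for illustration and are not the actual methodology of [49] or [50].

    ```python
    # Hedged sketch of populating an ontology class from an XML data file:
    # one instance per element, child elements as property values.
    import xml.etree.ElementTree as ET

    XML = """
    <books>
      <book isbn="111"><title>Semantic Web</title><author>Berners-Lee</author></book>
      <book isbn="222"><title>OWL Primer</title><author>Smith</author></book>
    </books>
    """

    def populate(xml_text, class_name, id_attr):
        """Create one instance (a dict of property/value pairs) per element."""
        root = ET.fromstring(xml_text)
        instances = []
        for elem in root:
            inst = {"class": class_name, "id": elem.get(id_attr)}
            for child in elem:
                inst[child.tag] = child.text
            instances.append(inst)
        return instances

    triples = populate(XML, "Book", "isbn")
    print(len(triples), triples[0]["title"])
    ```

    A real transformation algorithm would additionally map element and attribute names onto the ontology schema's classes and properties rather than copying tags verbatim.
    
    
    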

    2.5 Integrating Web Ontologies

    When multiple ontologies are used simultaneously in integration operations such as merging,

    mapping and aligning, they may suffer from different types of heterogeneity, such as semantic

    heterogeneity and non-semantic (syntactic) heterogeneity [51,52]. Syntactic heterogeneity is due to

    the use of different languages for the formalization of ontologies, e.g., one ontology is formalized in OWL

    [20] and the other in the DAML [53] language. Semantic heterogeneity comprises

    terminological, conceptual and contextual heterogeneities. Terminological heterogeneity occurs when

    different terms are used to represent the same concept, or the same term is used to represent different concepts.

    Conceptual heterogeneity between two concepts may occur due to the granularity of concepts, i.e., when one

    is a sub-concept or super-concept of the other, or when the two overlap. Similarly, two concepts are explicit-

    semantic heterogeneous when they have different roles or functionalities in a similar domain.
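    Terminological heterogeneity can be made concrete with a toy label matcher: labels are normalized and looked up in a synonym map before comparison. The synonym table and normalization scheme are simplistic stand-ins for real ontology-matching techniques, added here purely for illustration.

    ```python
    # Toy illustration of terminological heterogeneity: different terms may
    # denote the same concept. The synonym map is a hypothetical stand-in
    # for lexical resources used by real ontology matchers.
    SYNONYMS = {"automobile": "car", "vehicle": "car"}

    def normalize(label):
        label = label.lower().replace("_", "").replace("-", "")
        return SYNONYMS.get(label, label)

    def same_concept(label_a, label_b):
        """True when two class labels normalize to the same term."""
        return normalize(label_a) == normalize(label_b)

    print(same_concept("Automobile", "Car"))  # different terms, same concept
    print(same_concept("Desk", "Chair"))      # genuinely different concepts
    ```

    Note that the converse case, the same term denoting different concepts (e.g. "Bank"), cannot be resolved by label comparison at all, which is why contextual information matters in integration.
    
    
    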

    In an ontology model, the taxonomic and non-taxonomic relations of a concept represent its super/sub-concepts

    and its roles, respectively. Since, in a certain domain, an intellectual concept is explicitly defined in

    terms of the roles that it holds, we may call the roles of a concept its explicit semantics. However, the

    explicit semantics of a non-intellectual concept may be derived from its granularities and its physical

    attributes because of the typical nature of such domains. For example, in an ontology of the furniture domain, concepts like

    chair, table and desk have no intellectual properties; these concepts have only taxonomic (i.e., parent, child,

    sibling) and elementary (i.e., color, type, etc.) characteristics.