personalized ontology learning and mining for web information gathering

Upload: kiran-matukumilli

Post on 06-Apr-2018

  • 8/2/2019 Personalized Ontology Learning and Mining for Web Information Gathering

    Personalised Ontology Learning and Mining for Web Information Gathering

    Xiaohui (Daniel) Tao

    Agenda

    • Introduction
    • Ontology Learning for User Background Knowledge
    • Ontology Mining for Personalisation
    • Evaluation
    • Conclusions

    Challenge and Solution

    • Web information is growing rapidly and, as a result, Web information gathering (WIG) has become challenging;

    • Most systems are based on keyword-matching techniques:
      - feature vectors based on the statistics of terms and documents;
      - information mismatching and overloading problems.

    • Capturing user information needs can benefit Web information gathering:
      - personalised Web information gathering.

    User Profiles in Personalised WIG

    • What is a user profile?
      - The concepts of interest that make up a user's information needs.

    • User profile acquisition:
      - global analysis techniques use global knowledge bases;
      - local analysis techniques use user feedback or observe user behaviour.

    • Interviewing, non-interviewing, and semi-interviewing techniques:
      - interviewing: e.g. the TREC-11 Filtering Track training sets;
      - non-interviewing: e.g. the OBIWAN model;
      - semi-interviewing: e.g. the Foxtrot recommender system.

    User Profiles in Other Fields

    • Biological/medical professionals have widely varying levels of computer knowledge;

    • Patients also differ in many other respects;

    • By adapting personalised computer applications with user profiles, we can improve the usability of biological/medical systems and thus support the work of professionals:
      - provide simpler, more efficient user interfaces for bio/medical professionals;
      - tailor user interfaces to professionals' and patients' needs and impairments.

    • In a smart clinic, bio/medical professionals' preferences and priorities can be taken into account, their input can be simplified, and human errors can be significantly reduced.

    Ontologies for User Profiles

    • Ontologies can be used to describe user background knowledge and thus to represent user profiles.

    • Ontology definition:
      - a formal description and explicit specification of a conceptualisation;
      - consists of concepts, instances, semantic relations, and axioms;
      - domain ontologies and generic ontologies.

    • Ontology learning:
      - manual ontology learning (efficiency needs to be improved);
      - automated ontology learning (accuracy needs to be improved).

    • Knowledge specification in ontologies.

    Problem and Motivation

    • Acquiring user profiles with ontology-based methods is an important hypothesis in personalised Web information gathering. However, existing approaches have limitations.

    • A breakthrough is needed for the clear and complete specification of knowledge in ontologies through mining local information.

    • This thesis addresses these problems by:
      - proposing a novel ontology learning and mining model; and
      - evaluating the model against a number of existing personalised WIG models.

    Ontology Learning for User Background Knowledge

    World Knowledge Representation

    • World knowledge is the commonsense knowledge possessed by people and acquired through their experience and education;

    • A world knowledge base (WKB) is a global ontology that formally describes and specifies world knowledge;

    • With a WKB, user-interesting concepts can be extracted, including both relevant and non-relevant concepts according to the user's information needs;

    • Library of Congress Subject Headings (LCSH).

    World Knowledge Base Construction

    • MARC 21 authority records of the LCSH system:
      - 130 MB in a single data stream.
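
    To make the structure concrete, the sketch below turns a few simplified, hypothetical authority records into a subject graph with the three relation types the WKB specifies (is-a, part-of, and related-to). It is not a MARC 21 parser. The `AuthorityRecord` class and its fields are illustrative, and mapping broader references to is-a, used-for references to part-of, and related references to related-to is an assumption made for this example.

```python
from dataclasses import dataclass, field

@dataclass
class AuthorityRecord:
    """Hypothetical, simplified view of one LCSH authority record."""
    heading: str
    broader: list[str] = field(default_factory=list)   # assumed to map to is-a
    used_for: list[str] = field(default_factory=list)  # assumed to map to part-of
    related: list[str] = field(default_factory=list)   # assumed to map to related-to

# ASSUMED mapping from record references to WKB semantic relations.
RELATION_OF = {"broader": "is-a", "used_for": "part-of", "related": "related-to"}

def build_wkb(records):
    """Return {subject: {relation: set of target subjects}}."""
    wkb = {}
    for rec in records:
        # Every heading becomes a subject, even if it has no references.
        relations = wkb.setdefault(rec.heading, {})
        for attr, relation in RELATION_OF.items():
            targets = getattr(rec, attr)
            if targets:
                relations.setdefault(relation, set()).update(targets)
    return wkb
```

    For example, `build_wkb([AuthorityRecord("Economic espionage", broader=["Business intelligence"])])` yields `{"Economic espionage": {"is-a": {"Business intelligence"}}}`.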

    Ontology Taxonomy Construction

    • The WKB works as a global ontology from which user-interesting concepts are extracted to bootstrap personalised user ontologies;

    • For a given topic, three different sets of concepts need to be extracted, along with their associated semantic relations:
      - Positive subjects: concepts that are interesting to the user with respect to the topic;
      - Negative subjects: concepts that may give paradoxical or ambiguous interpretations of the topic, making it difficult to capture the information needs;
      - Neutral subjects: concepts that give no indication of being either positive or negative.

    Semi-automatic Taxonomy Construction

    • Ontology Learning Environment (OLE):
      - extracts candidate subjects from the WKB; and
      - lets users identify the positive and negative subjects.

    • Support values of the selected subjects:
      - the subjects are selected by the user, so their support values are approved by the user;
      - sup(s,T) = 1 for positive subjects;
      - sup(s,T) = -1 for negative subjects;
      - sup(s,T) = 0 for neutral subjects.
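
    The support assignment for user-selected subjects can be written down directly. Below is a minimal Python sketch. It assumes the OLE hands back each selected subject with a label from a hypothetical `Label` enum.

```python
from enum import Enum

class Label(Enum):
    """Hypothetical labels a user assigns to a candidate subject."""
    POSITIVE = "positive"
    NEGATIVE = "negative"
    NEUTRAL = "neutral"

# sup(s, T) for user-selected subjects, as listed on the slide.
SUPPORT = {Label.POSITIVE: 1, Label.NEGATIVE: -1, Label.NEUTRAL: 0}

def support_values(selections):
    """Map {subject: Label} chosen by the user to {subject: sup(s, T)}."""
    return {subject: SUPPORT[label] for subject, label in selections.items()}
```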

    Automatic Taxonomy Construction

    • Web users may not want to burden themselves with providing feedback, so automatic taxonomy construction is necessary;

    • Subjects whose terms overlap with the terms of the given topic are extracted as positive subjects;

    • The neighbours of positive subjects are extracted as negative subjects;

    • Support values of the identified subjects:
      - Positive subjects:
      - sup(s,T) = -1 for negative subjects;
      - no neutral subjects are identified by this method at this stage.
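
    Here is a rough sketch of the automatic method under stated assumptions. Topic and subject labels are compared as lower-cased word sets, and the WKB neighbours of each positive subject become negatives with sup(s,T) = -1. The positive support formula cannot be recovered from the slide, so the overlap ratio used below is a hypothetical placeholder, not the thesis's definition.

```python
import re

def terms(text):
    """Lower-cased word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def auto_taxonomy(topic, subjects, neighbours):
    """
    topic: the topic title.
    subjects: candidate subject labels from the WKB.
    neighbours: {subject: set of adjacent subjects in the WKB}.
    Returns {subject: sup(s, T)}; no neutral subjects are produced.
    """
    topic_terms = terms(topic)
    support = {}
    for s in subjects:
        s_terms = terms(s)
        overlap = s_terms & topic_terms
        if overlap:
            # HYPOTHETICAL positive support: share of the subject's terms
            # that also occur in the topic (placeholder, not the thesis formula).
            support[s] = len(overlap) / len(s_terms)
    positives = list(support)
    for s in positives:
        for n in neighbours.get(s, ()):
            if n not in support:  # a subject already marked positive stays positive
                support[n] = -1   # neighbours of positives become negatives
    return support
```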

    Illustration of Constructed Ontologies

    Specificity and Exhaustivity

    • Ontology mining aims to discover interesting concepts from ontologies:
      - the difficulty is how to emphasise the specific is-a and part-of relations in a single computational model.

    • A multidimensional ontology mining method, Specificity and Exhaustivity, is introduced to solve this problem:
      - Specificity describes the focus of a subject on a given topic:
        - semantic specificity and topic specificity;
      - Exhaustivity restricts the semantic extent covered by a subject that deals with the topic.

    • Specificity and Exhaustivity are designed to investigate the concepts in ontologies and the strength of the associations between them.

    Semantic Specificity

    • Semantic specificity: the focus of a subject on the concepts it refers to.
      - It is influenced by the subject's locality in the taxonomic structure of the ontology;
      - upper-level subjects are more abstract, cover more descendant subjects, and refer to more concepts than lower-level subjects;
      - thus, lower-level subjects have a stronger focus because they refer to fewer concepts.

    • The semantic specificity of a subject, spe_a(s), is measured by:
      - investigating the subject's locality in the taxonomic structure; and
      - taking the associated is-a and part-of semantic relations into account.
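
    The slide does not give the formula for spe_a(s), so the following is only an illustrative sketch of a locality-based measure. It assumes that leaves score 1.0 and that a hypothetical decay factor `THETA` is applied at each level upward. It also assumes is-a children are aggregated with `min` and part-of children with the mean. These choices are illustrative, meant only to show how the two relation types can be treated differently within one model.

```python
from statistics import mean

THETA = 0.9  # HYPOTHETICAL per-level decay factor

def semantic_specificity(children, root):
    """
    children: {subject: [(child, relation), ...]}, relation in {"is-a", "part-of"}.
    Returns {subject: spe_a(subject)} for every subject reachable from root.
    Assumes the taxonomy is acyclic.
    """
    spe = {}

    def visit(s):
        if s in spe:
            return spe[s]
        kids = children.get(s, [])
        is_a = [visit(c) for c, rel in kids if rel == "is-a"]
        part_of = [visit(c) for c, rel in kids if rel == "part-of"]
        if not is_a and not part_of:
            spe[s] = 1.0  # leaf: the narrowest, most focused subject
        else:
            # Aggregate each relation type differently, then damp by THETA,
            # so subjects higher in the taxonomy get lower specificity.
            scores = ([min(is_a)] if is_a else []) + ([mean(part_of)] if part_of else [])
            spe[s] = THETA * min(scores)
        return spe[s]

    visit(root)
    return spe
```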

    Topic Specificity

    • Topic specificity: the focus of a subject on a given topic.
      - Topic specificity can be discovered from a user's personal information collection, the Local Instance Repository (LIR).

    • LIR: documents stored by the user, browsed Web pages, composed/received emails, and so on:
      - these items have content-related descriptors associated with the concepts specified in a knowledge base;
      - documents with such semantic metadata are becoming increasingly popular on the Web and are argued to be the mainstream of Semantic Web documents.

    • A user's LIR is simulated by a collection of user-visited information items in a library catalogue.

    Topic Specificity

    • Mappings between the subjects in the WKB and the instances in an LIR:
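
    One way to turn such a mapping into a topic-specificity score is sketched below. This is a hypothetical scheme, not the thesis's formula. Each LIR instance is scored by the average sup(s,T) of the subjects it cites, and that score is credited to every subject the instance cites.

```python
from collections import defaultdict

def topic_specificity(lir, sup):
    """
    lir: {instance_id: [subjects cited by that instance]}, i.e. the WKB-LIR mapping.
    sup: {subject: sup(s, T)}; subjects without a value count as neutral (0).
    Returns {subject: spe_t(s, T)} under a HYPOTHETICAL averaging scheme.
    """
    spe_t = defaultdict(float)
    for cited in lir.values():
        if not cited:
            continue
        # How strongly this instance leans towards the topic, on average.
        strength = sum(sup.get(s, 0) for s in cited) / len(cited)
        for s in cited:
            spe_t[s] += strength
    return dict(spe_t)
```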

    Exhaustivity

    • Exhaustivity: the extent of concepts dealt with by a subject with respect to a given topic.

    • The extent of user-interesting concepts referred to by a subject grows if the subject has more descendants that are positive to the topic.

    • Conversely, if the subject has more negative descendants, the extent of user-interesting concepts it refers to shrinks.

    • The exhaustivity of a subject, exh(s,T), is measured by investigating how strongly its descendant subjects support or oppose the given topic.
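
    Here is a minimal sketch of the idea. It assumes exh(s,T) is simply the sum of sup(s,T) over a subject and all its descendants, which is a hypothetical definition because the slide gives no formula. Positive descendants raise the value and negative descendants lower it, matching the behaviour described above.

```python
def exhaustivity(children, sup, subject):
    """
    children: {subject: [child subjects]} via is-a / part-of links.
    sup: {subject: sup(s, T)}; subjects without a value count as neutral (0).
    HYPOTHETICAL: exh(s, T) = sum of sup over s and all its descendants,
    each descendant counted once even if reachable by several paths.
    """
    seen, stack, total = set(), [subject], 0.0
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        total += sup.get(node, 0)
        stack.extend(children.get(node, []))
    return total
```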

    User Profiles Refinement

    • Subjects are considered user-interesting only if they have positive specificity and exhaustivity values.
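
    The refinement rule amounts to a filter. The sketch below assumes a single specificity score and an exhaustivity score have already been computed for each subject and stored in dictionaries. The names are hypothetical, and how semantic and topic specificity are combined into one score is not shown here.

```python
def refine_profile(subjects, spe, exh):
    """Keep only subjects whose specificity and exhaustivity are both positive.
    Subjects missing from either dictionary are treated as 0 and dropped."""
    return [s for s in subjects if spe.get(s, 0) > 0 and exh.get(s, 0) > 0]
```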

    Interesting Subject Discovery

    • Interesting subjects are discovered from the user's LIR, based on which subjects the LIR instances cite.

    Interesting Subject Discovery

    • The interestingness of these discovered subjects is measured:
      - based on the size of their overlap with the positive subjects;
      - a discovered subject is more interesting if:
        - it has more related-to positive subjects; and
        - these related-to positive subjects support the topic more strongly.

    • Methods are proposed to calculate:
      - min_interest, which is determined by the positive subjects and used to prune weak discoveries;
      - interest(s,T) of the discovered interesting subjects.
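
    The sketch below illustrates interest(s,T) and min_interest under three explicit assumptions. Candidate subjects are those cited by LIR instances that are not already positive. interest(s,T) is the summed support of the positive subjects related to s. min_interest is the mean support of the positive subjects. All three definitions are hypothetical stand-ins for the methods referred to on the slide.

```python
from statistics import mean

def discover_interesting(lir, related, sup):
    """
    lir: {instance: [subjects cited]}
    related: {subject: set of related-to subjects in the WKB}
    sup: {subject: sup(s, T)}; positive subjects are those with sup > 0.
    Returns {subject: interest(s, T)} for discoveries that survive pruning.
    """
    positives = {s for s, v in sup.items() if v > 0}
    if not positives:
        return {}
    # HYPOTHETICAL threshold: mean support of the positive subjects.
    min_interest = mean(sup[p] for p in positives)
    candidates = {s for cited in lir.values() for s in cited} - positives
    found = {}
    for s in candidates:
        # More related-to positive subjects, and stronger ones, raise interest.
        interest = sum(sup[p] for p in related.get(s, set()) & positives)
        if interest >= min_interest:  # prune weak discoveries
            found[s] = interest
    return found
```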

    User Profiles Refinement (2)

    • The ontology-based user profiles are further refined for personalisation with the interesting subjects discovered from users' LIRs.

    Evaluation

    Evaluation Experiments

    • Objective evaluation method:
      - standard test data, topics, and measuring methods;
      - no human participants are involved;
      - comparison with state-of-the-art models is possible;
      - fully repeatable.

    • Testing data set:
      - the Reuters Corpus Volume 1 (RCV1), with 806,791 Web documents;
      - the data set was also used in the TREC-11 Filtering Track.

    • Experimental topics (50 topics):
      - TREC-11 Filtering Track topics R101-R150, manually created by NIST experts;
      - high stability of the evaluation experiment: C. Buckley and E. M. Voorhees [16] state that 25 topics are barely enough for an experiment, whereas 50 topics are stable.

    Experiment Dataflow

    Common Platform: WIG System

    • Used by all the experimental models;
    • An implementation of the model developed by Li and Zhong [8], which uses user profiles for Web information gathering.

    • It was chosen because it:
      - was verified to perform better than the Rocchio and Dempster-Shafer models;
      - is extensible in using the support values of training documents.

    • Input: user profiles containing positive and negative documents associated with support(d) values.

    • Output: ranked documents gathered from the RCV1 testing set.

    Ontology Models

    • Implementation of the proposed ontology learning and mining model:
      - Ontology-I model: ontologies were learnt using the semi-automatic method;
      - Ontology-II model: ontologies were learnt using the automatic method.

    • World Knowledge Base:
      - constructed based on the LCSH system;
      - contains about 491,250 topical, geographic, and corporate subjects;
      - specifies three semantic relations: is-a, part-of, and related-to.

    • Local Instance Repository:
      - the catalogue information in the QUT library, containing 448,590 items;
      - available for public access.

    • Documents in user profiles were extracted from the LIRs using the user-interesting subjects.

    • The training documents were weighted by

    TREC Model

    • Demonstrated the manual user profile acquisition methods;
    • For a given topic:
      - linguists read a set of documents and marked them as positive or negative against the topic;
      - positive documents: support(d) = 1/|D+|, where D+ is the set of positive documents;
      - negative documents: support(d) = 0.

    • The TREC model serves as the target for the proposed model to match:
      - the positive and negative judgements were made manually by users;
      - assumption: only users know their interests and preferences perfectly.

    • Experiment hypothesis:
      - the Ontology models can achieve the same or similar performance to that of this TREC model.
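
    The TREC support values are fully specified above, so they translate directly into code. In this Python sketch the judgements dictionary is an assumed input format. Note that the positive supports sum to 1.

```python
def trec_support(judgements):
    """
    judgements: {doc_id: True for positive, False for negative}.
    support(d) = 1/|D+| for positive documents and 0 for negative ones.
    """
    n_pos = sum(1 for positive in judgements.values() if positive)
    # 1 / n_pos is only evaluated for positive documents, so n_pos > 0 there.
    return {d: (1 / n_pos if positive else 0.0) for d, positive in judgements.items()}
```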

    Category Model

    • Demonstrated the non-interviewing user profile acquisition techniques:
      - in particular, Gauch et al.'s OBIWAN model and Sieg et al.'s ontological user profile model.

    • The user-interesting concepts were represented by a set of weighted positive subjects, constructed in ontology form with the super-class and sub-class relations specified.

    • The positive subjects were the same as those used in the Ontology-I model and were obtained via the OLE;

    • Training sets were extracted from the user LIRs by the same process as in the Ontology models.

    • support(d) was determined by the number of positive subjects cited by the document.

    • Experiment hypothesis: the Ontology models can outperform this baseline Category model.
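
    The slide says only that support(d) depends on the number of positive subjects a document cites. The sketch below counts those subjects and, as an assumption, normalises the counts so that they sum to 1. Documents that cite no positive subject are left out, which is also an assumption.

```python
def category_support(lir_docs, positive_subjects):
    """
    lir_docs: {doc_id: [subjects cited by the document]}
    positive_subjects: set of positive subject labels
    Returns {doc_id: support(d)} for documents citing at least one positive subject.
    HYPOTHETICAL normalisation: counts are divided by their total.
    """
    counts = {d: len(set(cited) & positive_subjects) for d, cited in lir_docs.items()}
    counts = {d: c for d, c in counts.items() if c > 0}
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}
```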

    Web Model

    • Implementation of the preliminary study model and of typical semi-interviewing user profile acquisition models.

    • The positive and negative subjects were identified manually by users;
    • The subjects were used to acquire training sets from the Web via Google;
    • support(d) was determined by:
      - the beliefs of the referring positive subjects;
      - the ranking position in the returned list;
      - Google's precision performance.

    • Experiment hypothesis:
      - the Ontology models can outperform this preliminary study model.
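
    The slide does not state how the three factors are combined. Purely as an illustration, the sketch below multiplies the subject's belief, a linearly decaying rank weight, and an estimated search-engine precision. The function name, parameters, and product form are all assumptions.

```python
def web_support(belief, rank, list_length, precision):
    """
    belief: belief in the positive subject that produced the query (0..1)
    rank: 1-based position of the document in the returned list
    list_length: number of documents returned for that query
    precision: estimated precision of the search engine (0..1)
    HYPOTHETICAL combination: product of the three factors, with a linear
    decay over ranking positions (rank 1 -> 1.0, last rank -> 1/list_length).
    """
    if not 1 <= rank <= list_length:
        raise ValueError("rank must lie in 1..list_length")
    position_weight = (list_length - rank + 1) / list_length
    return belief * position_weight * precision
```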

    Performance Measuring

    • Common and modern performance measuring methods:
      - the precision averages at eleven standard recall levels (11SPR);
      - the mean average precision (MAP);
      - the macro-F1 and micro-F1 measures.

    • Statistical significance tests:
      - Student's paired t-test, which largely agrees with the bootstrap and randomisation tests in information gathering evaluations [181].

    • Percentage change in performance.
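
    For reference, the sketch below computes per-topic average precision, MAP over topics, and interpolated precision at the eleven standard recall levels. It follows the usual TREC-style definitions, where interpolated precision at recall level r is the maximum precision at any recall of at least r. Averaging the per-topic 11-point lists across topics gives the 11SPR curve. Function names are illustrative.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    # Divide by all relevant documents, so unretrieved ones count as zero.
    return total / len(relevant)

def mean_average_precision(runs):
    """runs: list of (ranked list, relevant set) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

def eleven_point_precision(ranked, relevant):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 for one topic."""
    if not relevant:
        return [0.0] * 11
    points, hits = [], 0  # (recall, precision) after each retrieved document
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / i))
    # Interpolated precision at level L: best precision at any recall >= L.
    return [max((p for r, p in points if r >= j / 10), default=0.0) for j in range(11)]
```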
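
    The remaining measures can be sketched in the same way. Macro-F1 averages per-topic F1 scores. Micro-F1 pools the true-positive, false-positive, and false-negative counts over all topics before computing F1. The paired t statistic is computed from per-topic score differences. Only the standard library is used, so the function returns the t statistic rather than a p-value; if SciPy is available, `scipy.stats.ttest_rel` returns both.

```python
from math import sqrt
from statistics import mean, stdev

def f1(tp, fp, fn):
    """F1 = 2PR / (P + R), written with raw counts; 0.0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def macro_micro_f1(counts):
    """counts: list of (tp, fp, fn) tuples, one per topic. Returns (macro, micro)."""
    macro = mean(f1(*c) for c in counts)             # average of per-topic F1
    tp, fp, fn = (sum(col) for col in zip(*counts))  # pool counts over topics
    return macro, f1(tp, fp, fn)

def paired_t(a, b):
    """Student's paired t statistic (n - 1 degrees of freedom) for per-topic scores.
    Raises ZeroDivisionError if all differences are identical."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def percentage_change(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100
```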

    11SPR Results

    MAP and F1 Measure Results

    Statistical Significance Test Results

    Ontology vs. TREC

    • The coverage of user profiles:
      - limited reading in the TREC user profile acquisition process.

    • The representation of user profiles:
      - formal definitions vs. no definitions;
      - ontology taxonomy structure vs. no structure.

    • The specification of semantic relations:
      - is-a, part-of, and related-to vs. no specification.

    • The support values of training documents:
      - float values vs. binary values.

    • Manual acquisition still maintained the accuracy of the TREC user profiles.

    Ontology vs. Web

    • The definition of concepts:
      - the use of the WKB in the Ontology model;
      - users had no clear definition of concepts for computational models.

    • The accuracy and coverage of user profiles:
      - interesting concept discovery in the Ontology model;
      - the refining phases in Ontology user profile acquisition.

    • Semantic relations specification:
      - is-a, part-of, and related-to in the Ontology model;
      - no semantic relations taken into account in the Web model.

    • Training document extraction:
      - abstracted information in the Ontology user profiles;
      - freely contributed Web documents used by the Web model.

    Ontology vs. Category

    • The coverage of user profiles:
      - the interesting concept discovery phase in the Ontology model.

    • The accuracy of user profiles:
      - the refining phase of user profile acquisition in the Ontology model.

    • Knowledge specification:
      - is-a, part-of, and related-to vs. super-class and sub-class.

    • The representation of user profiles:
      - positive, negative, and neutral subjects in the Ontology user profiles;
      - positive subjects only in the Category user profiles.

    Conclusions

    Ontology Learning and Mining Model

    • An ontology learning and mining model is proposed to acquire user profiles for personalised Web information gathering;

    • A large world knowledge base is constructed based on the LCSH system;
    • Two ontology learning methods, automatic and semi-automatic, are proposed to learn personalised ontologies for user profiles;

    • A multidimensional ontology mining method, specificity and exhaustivity, is introduced to investigate the concepts and their semantic relationships in ontologies;

    • The ontologies are personalised based on the user-interesting concepts discovered from the users' Local Instance Repositories;

    • The model is evaluated by comparing the acquired user profiles with those acquired by benchmark models in experiments, and the evaluation results are promising.

    Contributions

    • Contributions to knowledge engineering:
      - proposed a computational model that emphasises the specific semantic relations of is-a, part-of, and related-to in a single model, using the multidimensional specificity and exhaustivity;
      - explored a breakthrough for the clear and complete specification of knowledge in ontologies through mining local information;
      - provided an ideal world knowledge base for knowledge models developed by other researchers;
      - introduced a reliable, objective ontology evaluation method for other ontology researchers to evaluate their work.

    • Contributions to Web information gathering:
      - proposed a concept-based approach to acquire ontology-based user profiles for personalised WIG;
      - provided a new benchmark for other researchers.

    Future Work

    • Extending user profile acquisition from the short term to the long term:
      - it will be interesting to investigate how user interests change over a long period and to measure the influence of this change on WIG performance by extending the work presented in this thesis.

    • Extending the work on user profiles to other fields, such as:
      - mining biological data to create profiles for cancer patients, so that we can provide them with better care as well as systematically collect more effective medical/biological data.

    References

    • For the references cited in these slides, refer to the References section of the thesis.

    • X. Tao, Y. Li, and N. Zhong. A Personalized Ontology Model for Web Information Gathering. IEEE Transactions on Knowledge and Data Engineering, 23(4):496-511, 2011.

    • X. Tao, Y. Li, and N. Zhong. A Knowledge-based Model Using Ontologies for Personalized Web Information Gathering. Web Intelligence and Agent Systems: An International Journal, 8(3):235-254, 2010.

    Thanks for Listening

    Questions?