an ontology-driven web- based personalized news service

Knowledge Media Institute

myPlanet: An Ontology-Driven Web-Based Personalized News Service

Yannis Kalfoglou, John Domingue, Enrico Motta

Maria Vargas-Vera, Simon Buckingham Shum

KMI-TR-102

February, 2001

In Proceedings of the IJCAI’01 Workshop on Ontologies andInformation Sharing, Seattle, WA, USA, August 2001

www.kmi.open.ac.uk/tr/papers/kmi-tr-102.pdf

myPlanet: an ontology-drivenWeb-basedpersonalisednewsservice

YannisKalf oglou,John Domingue,Enrico Motta,Maria Vargas-Vera, SimonBuckingham Shum

KnowledgeMediaInstitute(KMi),TheOpenUniversity,

Milton KeynesMK7 6AA, UK�y.kalfoglou,j.b.domingue,e.motta,m.vargas-vera,s.buckingham.shum� @open.ac.uk

Abstract

In this paperwe presentmyPlanet, an ontology-driven personalisedWeb-basedservice. We ex-tended the existing infrastructureof the Plane-tOnto newspublishingsystem.Our concernsweremainly to provide lightweight meansfor ontologymaintenanceandeasetheaccessto repositoriesofnewsitems,arich resourcefor informationsharing.We reasonaboutthe informationbeingsharedbyproviding anontology-driveninterest-profilingtoolwhich enableusersto specify their interests. Wealso developedontology-driven heuristicsto findnews itemsrelatedto users’interests. This paperarguesfor therole of ontology-drivenpersonalisedWeb-basedservicesin informationsharing.

1 Intr oductionNowadays,we observe a trend in providing personalizedWeb-basedservicesin order to accommodatethe versatileneedsof an ever increasingnumberof Web users. Recentadvancesin agentandInternettechnologyprovide the tech-nological means,however, equally important is to providethe meansfor semanticinfrastructure. Towards this goal,[HuhnsandStephens,1999] proposetheuseof “personalon-tologies”whereeachWeb userwill beableto createhis/herown ontologytailoredto his/herview of theworld. Althoughwe found this idea fruitful, it bearsa contradictoryconno-tation. Whenwe talk aboutontologies,we can’t really say“personal”. Ontologiesare- by definition - sharedviews ofthe world([Kalfoglou, 2000a]). We ratherprefer to usethemetaphor“personalviews” of anontologytailoredto specificservices.That is, eachuserwill see- andeventuallybeableto edit - partof anontologythat is tailoredto a specificser-vice. Theontologyitself will remainshared,in thesensethatthecreation,editingandmaintenancetasksinvolvetheeffortsof many agents(letthembe peopleor software). The way itwill beexposedto userswill dependon thekind of servicesthey want. For example,in our domainof Web-basednewsservices,a useris able to browsethosecontentsof the on-tology thatarerelatedto news items,like peoplewho wrotethem,projectsmentioned,etc. This kind of Web-basednewsservicesenableusersto accessinformation tailoredto theirinterests.

Valuable information is sharedamong the membersofa community by using the lowest-common-denominatormedium:anemailmessage.Userssendastoryin theform ofanemail(hereafter, e-Story)to anewsserver from whichdes-ignatedsystemsredirectthe e-Storyback to targetedmem-bersof thecommunity. This is anindirectform of communi-cation(incomparisonwith amember-to-memberform), how-ever, weenrichit by anontology-driveninterest-profilingtoolanddeductive knowledgeretrieval techniques.This allowedus to reasonabout the knowledgebeing sharedand targetit to certainpeople. The meansfor connectingknowledgeto peoplewere analyzedfrom the processpoint of view in[O’Leary, 1998]. His framework hasbeenusedin someon-tology applications([Kalfoglou, 2000b]) and in [Domingueand Motta, 2000] the authorsshowed how theseprocessesarerealizedin thecontext of PlanetOnto. In particulartheyfocussedon the two connectingprocesses:peopleto knowl-edge andknowledge to people. Themeanswhich wereusedto connectpeopleto knowledgein PlanetOnto were inte-gratedvisualisation,search,and query-answeringfacilitieswhereastheconnectionof knowledgeto peopleachievedbypro-actively contactingpeopleto solicit e-Storiesand alertthemwhenitemsof interestwerepublished.

To deliver such an ontology-driven servicewe need tohave flexible mechanismsfor ontology maintenance,anarea which is still in its infancy and hampersontologyapplications([Kalfoglou et al., 2000]). In this work, we de-ployed InformationExtraction(hereafter, IE) systemsto ex-tract informationfrom usersusinga servicewhich could beusedto updatethe underlyingontology. In that sense,theuserbecomesthemainagentresponsiblefor maintainingtheontologyinstances,lifting the burderfrom ontologicalengi-neerswhocanfocusonstructuralandsemanticissuesrelatedwith ontologydesignanddeployment.In our domainwe ex-perimentedwith extractinginformationfrom users’e-Storiesin orderto updatetheunderlyingontology.

Our researchgoalsaretwo-fold: (a) to improve andeaseontologyusabilityfor Webusersbymeansof ontology-drivenWeb-basedfront-endsto personalizedservices;and (b) toprovide lightweight meansfor ontology maintenancetrig-geredby users’input by deploying IE techniquesalongwithdomainspecifictemplates.This tight couplingof Web-basedenvironmentswith underlyingontologiesis a promisingandappealingtechnologyfor themajorityof users.

1

7

6

5

4

3

2

Figure1: ThePlanetOnto architecture.

We organizethis paperas follows: in section2 we de-scribetheexisting infrastructure,PlanetOnto, which we ex-tendin section3 with the personalizedservicesprovidedbymyPlanet. We reporton relatedefforts in section4 andweconcludethe paperin section5 by discussingfuture direc-tionsandimplicationsof this work.

2 PlanetOntoIn this sectionwe briefly describetheexisting infrastructure,PlanetOnto, an integratedsuiteof tools developedover thelast 4 yearsin the Knowledge Media Institute(KMi). Thewhole infrastructureis describedin detail in [DomingueandMotta,2000]. Herewerecapitulateontheimportantelementsof thePlanetOntoarchitecturesomeof whichwerethefocusof ourwork aswedescribein thenext section.

In thePlanetOntodomainweidentify threetypesof users:journalistswho sendstoriesto KMI Planet, knowledgeedi-torswhomaintainthePlanetontologyandthePlanetknowl-edgebase,andreaderswho readthePlanetstories.In figure1, we illustrate the PlanetOnto architecturealong with theactivities thatsupports:

1. Storysubmission:Storiesaresubmittedto KMi Planetin theform of emailwhich is thenformattedandstoredin KMi Planet’sstorydatabase;

2. Storyreading: Storiescanbe readby usinga standardWebbrowser;

3. Storyannotation:A specializedtool, KNote, is usedtohelp the journalistor the knowledgeeditor to associate

thestorywith knowledgestructureof theunderlyingon-tology. This processwas manualand we have semi-automatedit aswe describein section3.2;

4. Provision of customizedalerts: An agent, Newsboy,builds userprofiles from patternsof accessto Plane-tOntoandthenusestheseprofilesto alert readersaboutrelevantstories.While thattool usesstatisticalevidenceto build profiles, in section3.1 we presentmyPlanetwhich makesit possiblefor a userto build a profile byusinganontology-drawn structure;

5. Ontology editing: A Web-based ontology editor,WebOnto[Domingue, 1998], is used for construct-ing knowledgemodelsin the OCML language[Motta,1999];

6. Story soliciting: An agent, Newshound, gathersdataaboutpopularnews items and then solicits potentiallypopular stories from the journalists. The ontology-driven heuristicsof myPlanet, describedin section3,could extendthis tool to solicit storiesfrom journalistswith similar interests;

7. Storyretrieval and queryanswering:A Web-basedin-terface,Lois, providesaccessto thestoryarchiveandtheassociatedknowledgebaseby integratingWeb-browsingandsearchwith knowledge-basedqueryretrieval.

3 myPlanetPlanetOnto wasoriginally conceivedasaninternalnewslet-

Figure2: Thee-StoriesfinderJava Applet.

ter andprogressively becamean integratedsuiteof tools forknowledgemanagement.It is usedasa masscommunicationmediumfrom membersof our lab but lacks the advantagesof personalised,tailored-to-preferences,services.myPlanetaimsto fill-in this gapby providing themeansfor easynav-igation throughthe e-Storiesrepository, settinguserprefer-ences,andproviding assistanceto theknowledgeeditorsforannotatinge-Stories.Wedescribethesetoolsin thefollowingtwo sections.

3.1 Ontological interest-profilingOneof the limitations of the PlanetOnto suiteof tools wasthelack of ane-Storiesretrieval methodwhich would enableusersto readonly the e-Storiesof their interestinsteadofforcing themto browsethee-Storiesdatabasefor potentiallyinterestingitems. A possiblefix to this problemwould havebeento provide a keyword-basedsearchengine.This sortofsolution,however, bearstheknown limitationsthateveryoneof ushasexperiencedwith currentkeyword-basedsearchen-gines(e.g,unrelatedmatches).

Consequently, we worked on a methodwhich allows theuserto specifyhis/herinterests(crudelyspeaking,“the searchcriteria”), andthenwe searchfor e-Storiesthat matchtheseinterests.Thedifferenceof our approachwhencomparingit

with a keyword-basedsearchengineis that the structureofthe interestsis drawn from the underlyingontology. Hence,we deliberatelyimposea genericstructureof intereststo theuserwhich containsthemostimportanttypesof informationonewould typically find in the KMi Planete-Stories. Thisstructureis composedof thefollowing items:

� Research areas thatareinvestigatedin KMi;� Research themes thatareinvestigatedin KMi;� Organizations thatKMi collaborateswith;� Projects in KMi;� Technologies usedin KMi;� Application domains that are investigated in

KMi;� People - membersof theKMi lab.

All of these items are classesin the underlying KMiPlanet ontology1. The advantageof this is that we cango beyond the expectedcategory namematching: we can

1Accessiblefrom the Web through the WebOnto browser onURL: http://webonto.open.ac.uk/

reasonaboutthe categoriesselectedby applying ontology-driven� deductiveheuristics.For example,if someoneis inter-estedin Research Area GeneticAlgorithms, we wouldnormally return all the e-Storiesthat talk about that Re-search Area by employing thestring-matchingtechniquewe describein the sequel.However, by usingthe ontologi-cal relationsthat hold betweenthesecategorieswe canfindwhich Projects have asResearch Area GeneticAl-gorithmsand thensearchfor e-Storiesthat talk abouttheseProjects. Thesewould then be includedin our answersetaspotentiallyinterestinge-Storiesalthoughthey don’t ex-plicitly mentiontheGeneticAlgorithmsResearch Area.In the samemanner, we can apply more complex heuris-tics suchas finding Technologies that have beenusedin Projects andPeople who aremembersor leadersoftheseProjects - which have asResearch Area Ge-netic Algorithms - thereforeinferring that thesePeoplemightbeapotentialcontactfor informationonTechnolo-gies for GeneticAlgorithms. In termsof the underlyingontologystructure,our aim is to take advantageof the richdefinitionsof classesin the OCML language.For example,thefollowing OCML codeis thedefinitionof aninstanceof aKMi researchanddevelopmentproject,the “sharingontolo-gieson theweb” project:

(def-instanceproject-sharing-ontologies-on-the-webkmi-r&d-pr oject((has-research-areares-area-ontologiesres-area-knowledge-sharing-and-reuse)(project-application-domainorganisational-learning)(addresses-themetheme-collaboratingtheme-communicatingtheme-reasoning)(has-project-leaderjohn-domingueenrico-mottazdenek-zdrahal)(funding-sourceorg-european-commission)(has-goals”Enablingknowledgeengineersto shareontologieson theweb.”)(has-web-addressweb-page-project-sharing-ontologies-on-the-web)(uses-technologylisp java tech-lispwebtech-ocml)(associated-productstech-webontotech-tadzebao)))

As we can see,this definition is sufficient for deducingfactsrelatedto theproject’s researchareas,themes,applica-tion domain,leaders,etc. Most of theseconstructsareuseddirectly in the browsablestructurewe imposedto the userin myPlanet’s interface. Thus, the deductionstepinvolvesa straightforward OCML query. Other slots, however, likefunding sourceand technologiesused,canbe usedto inferfurtherlinks asin thescenariowedescribedbefore.This richrepresentationof aprojectinstancehighlightsthestrengthsofOCML as a knowledgemodelling language([Motta, 1999])which hasbeenusedin many projectsover the last 6 years.Currently, thereareover 90 modelsdefinedin the WebOntolibrary all of which areaccessiblewith a Web browserfromwebonto.open.ac.uk. Wealsouserelationsto link peo-plewith projectssuchas:

(def-relationinvolved-in-projects(?x ?project):constraint(and(person?x)

(project?project)):sufficient (or (has-project-member?project?x)

(has-project-leader?project?X)))

The OCML languageprovides supportfor defining opera-tional optionsfor eachrelationsuchasthe:sufficient

constructin our exampleabove. Its purposeis to helpchar-acterizethe extensionof a relation. For the relation givenabove, it is sufficient to prove that a personis a memberorleaderof a projectin orderfor therelationinvolved-in-project/2 to hold. We also store the selectionsa usermakes,that is, we save the user’s profile with respectto theselectedinterests.This profile canbeeditedlateron aswellasusedfor findingpro-actively e-Storiesthatmatchit.

The matchingof interestsin a e-Storyis basedon string-matchingbut employs the notion of “cue phrases”and“cuewords” which areassociatedwith the instancesof the cate-goriesgivenabove. Weusetwo meaningsof “cue”: evidenceandabstraction.A cuephrase,in ourapproach,is bothanab-stractionof thecategory that is associatedwith andevidencethat thee-Storywhich containsit is relevantto thatcategory.For example,we defineasa cuephrasefor theResearchAreaOntologies, thephrase“knowledgesharingandreuse”.This is anabstractionof the termOntologies. Whenever wefind thatphrasein an e-Storywe assumethat this e-Storyisrelevant to Ontologies. This finding is the evidenceof rel-evance. This techniquehasbeenproved easyto apply andgave usa broaderandmoreaccurateanswersetthantheonewe would getwith a simplematchof thecategoryname.Onthe other hand,we needto be careful whenwe identify ordevise cuesfor a particularcategory sincea looselydefinedcuephrasecould result in looselyrelatede-Stories.For ex-ample,thecuephrase“survival of thefittest” couldbearguedthat is an abstractionof the GeneticalgorithmsResearchArea sinceit describesa commontechniqueof molecularbiology usedin Geneticalgorithms. It might be dangerousto useit though,sinceit is looselyconnectedto thetermGe-neticalgorithmsandthepossibilityto getunrelatede-Storiesis high(e.g,e-Storiesabouta fighting contestmight containthis phrase).We seethis asa tradeoff: the moregenericthecuephrasesarethemorephraseswecandefineor devise,thelessgenericthe cuephrasesarethe lessphraseswe cande-fine or devise. It is obvious that,with morecuephraseswecanfind moree-Storiesbut the phrasescan’t be too genericbecausethismayresultin unrelatede-Stories.To resolvethistradeoff, we hadto follow a manualapproachin identifyingor evendevising,whenevernecessary, cuephrasesfor all theinstancesof thesevencategoriesdescribedabove. Thatway,we wereableto judgeby ourselvesthe “closeness”of a cuephraseto a particularcategory by referring to literaturere-sources,askingexpertsin that category for advice,etc. Weareplanning,however, to automatethis processto themaxi-mumdegreepossibleasthis is adesiredrequirementin orderto scale-upthisapproachin a time-effectivemanner.

To illustrate the usageof this tool, we will go throughadetailedscenarioin which a usertries to find e-Storiesre-latedto his/herinterests.As we canseefrom figure2, a JavaApplet is usedas the front-endfor choosingthe categoriesuponwhich the searchwill be based. When this Applet isloadedover the Web it loadsall the instancesfor the sevencategoriesgiven above, henceit provides a partial view oftheunderlyingontology’scontents.In our example,theuser“yanniskalfoglou” hasbrowsethehierarchytreeandchosentwo categories: Application Domain Distanceteach-ing andProjectSharingOntologiesontheWeb. Thesetwo

Figure3: A e-Storyof myPlanet.

aredisplayedin theupperright paneof thewindow in figure2. Thelower left paneis usedfor displayingadditionalinfor-mationwith respectto the category currentlyviewed in thetree.In ourexample,weseea textualdescriptionof thegoalsfor theProject beingviewed.This informationis obtainedby queryingthe underlyingontologyfor the project’s goals.We displaydifferenttypesof textual informationtailoredtothetypeof categorybeingviewed.For example,whenanin-stanceof People is viewedthenwedisplaytheprojectsthatthis personis involvedto. This informationis obtainedfromtheontologyafterfiring therelevantquery.

After selectingthecategories,user“yanniskalfoglou” cansave his profile andinitiate thesearchby pressingtheViewmyPlanet button. This will displaythe results,if any, in apersonalizedWeb-pagewhich will beusedin futuresessionsastheuser’s personalPlanet Web-page(hence,myPlanet).Sucha pagecontainsthe setof e-Storiesthat matchthe se-lectedcategoriesby employing thestring-matchingtechniquewe describedabove. We includea snapshotof a e-Storythatwasfound relevant to the user’s interestsin figure 3. As wecansee,this e-Storycontainsthecuephrase“distancelearn-ing”(which is deliberatelycircled for the sake of this exam-ple) which is associatedwith theApplication domainDistanceteaching.

3.2 Populating the ontologyThe e-Storiesare formalized in terms of associatingthemwith a formal representationwhich supportsvariousformsof reasoningin PlanetOnto. This formalizationprocess,as[DomingueandMotta,2000] describe:

“is drivenby anontologythatdefinestheconceptsneededto describeeventsrelatedto academiclife -

for example,projects,products,seminars,publica-tionsandsoforth. This meansthatwe ignorepartsof anewsstorythatarenotrelevantto theontology,muchasin template-driveninformationextractionapproaches.”

In theseapproaches,IE systemsfocus only on portionsoftext that arerelevant to a particulardomain. From that per-spective, IE can be seenas the task of pulling pre-definedrelationsfrom texts aswe seein applicationsof IE in vari-ousdomains(see,for example,[ProuxandChenevoy, 1997]).Furthermore,IE canbeusedto partially parsea pieceof textin orderto recognisesyntacticconstructswithout theneedofgeneratinga completeparsetreefor eachsentence.This ap-proachcould be coupledwith domainspecifictemplatesinorderto identify relevant information. If no extractiontem-plate appliesto the parsedsentencethen no information isretrieved.

Thesecharacteristicsof IE technologywereappealingforour task: to populatethe ontologywith new instancesof e-Storiesin an automatedmanner. IE gave us the meanstoidentify thepartof ane-Storythatwill beprocessed,whereasdomainspecifictemplatesmadeit possibleto fill-in slots inontology instances.For example,in a e-Storyfor the KMidomainonemight be interestedto extract only the nameofKMi projects,KMi members,KMi funding organisations,KMi award bodies,money being awarded,etc., and ignorethe rest. As it is describedin [Vargas-Veraet al., 2001], thekind of information that will be extractedis determinedbythepre-definedtemplateswhich arebasedon thetypologyofeventsin our KMi Planet ontology. Examplesof eventsare visiting-a-place-or-people, academic-

Figure4: A e-Storysendto KMi Planet.

conference, event-involving-project, and soforth. Currently, we have 40 event typesdefinedin our on-tology andwe have devise templatesfor 10 of them. Thesearethedomainspecifictemplatesusedin IE systems.

An exampletemplatefor the event type visiting-a-place-or-people is asfollows:

[ ,X, ,visited,Y,from, Z, ]

This templatematchesthe sentenceword list where X isrecognisableasan entity capableof visiting, Y is the placebeingvisitedandcannotbea preposition,andZ is recognis-ableasa rangeof datesby virtue of their syntacticfeatures.The remainingtokensin the sentenceare ignored. We usetheunderlyingkmi-ontology instancesto identify propernamesfor visitors(if they areKMi employees)andwheneverthis failswedeploy anamedentity recogniserto helpuswithidentifying additionalpropernamesfor visitors andplaces.Each templateis triggeredby the main verb in any tense.In this template,the trigger word is the verb “visited”. As[Riloff, 1996] describes,linguistic rulescouldbedeployedtohelp identify trigger wordsreliably. For example,if the tar-getedinformationis thesubjector thedirectobjectof a verbthenthetriggerword shouldbethemainverb.

Assumethat a KMi journalistsubmitsa e-StoryaboutanAKT meeting.We illustratesuchae-Storyin figure4. As wecansee,thefirst sentenceof thee-Storymatchesthetemplategivenabove. It containsthetriggerword “visited”. This will

activatethetemplateandvariablesX, Y, andZ will beinstan-tiatedto visitor, placebeingvisitedandrangeof dates,whichgiveusthefollowing information:

� visitor: “AKT collaboratinginstitutions”� place:“Sheffield”� date:“January29-312001”

This will beautomaticallyconvertedto OCML codein or-derto fill-in theslotsin theinstanceof theeventtypewe aredealingwith:

(def-instancevisit-of-akt-collaborating-institutionsvisiting-a-place-or-people((has-duration‘3 days’)(start-timejanuary-29-2001)(end-timejanuary-31-2001)(has-locationsheffield)(visitor akt-collaborating-institutions)))

In the sequel,a form-basedinterfaceis usedto visualizethe informationextractedasshown in figure 5. Uninstanti-atedslotscould be filled-in manuallyby the knowledgeen-gineer. Themainhelpof this semi-automaticinstantiationofeventtypeis theextractionof informationfrom e-Stories,thepartialslots-filling,andtheidentificationof eventtype.

In sometemplateswe canalsomake useof the underly-ing ontologyto supporttheeventidentification.For example,thetemplatefor theconferring-a-monetary-awardeventtypeis:

[X, ,has been awarded,Y,from,Z, ]

whereY is amountof money, Z is a funding body, andX iseitherapersonof aproject.To decidewhichone,wetraversethe instancesof peopleandprojectsin theunderlyingkmi-ontology to find outwhich matchesX.

4 RelatedworkAlthoughwe couldn’t find directly comparableprojectswithour domain- ontology-drivenWeb-basedpersonalizednewsservices- there several efforts describedin the literaturewhereontologiesandWeb-basedserviceswereput together.We reporton thesein thesequel:

In the FindURproject[McGuinness,1998], the meansforknowledge-enhancedsearchby usingontologieswereinves-tigated.McGuinnessdescribesa tool, deployedat theAT&Tresearchlabs, which usesontologiesto improve the searchexperiencesfrom the perspectivesof recall andprecisionaswell as easeof query formation. Their tool is mainly tar-getedto the InformationRetrieval researchareaandaimstoimprovethesearchenginestechnology. However, theideaofdeploying ontologiesto achieve thesegoalsis similar to ourapproachwhich is mostlyconcernedwith usingontologiestostructurethe searchspace(i.e.,pre-selectedcategoriesof in-terests- section3.1) andincreasetheanswerset(i.e.,heuris-tics deployed to selecta relevant e-Story- section3.1). Intheir work though,meansfor updatingthe topic setsusedto categorizeinformation(similarto our interestscategories)were investigated.In contrastwith our approachwherethe

3 days

Figure5: Semi-automaticallyannotatethe e-Storyof figure 4: a partial instantiationof the event type: visiting a placeorpeople.

categoriesof interestsarepre-definedandmaintainedinter-nally, the FindUR teamwere“experimentingwith a collab-orative topic-building environment that allows domain ex-pertsto expandof modify topic setsdirectly”[McGuinness,1998]. Although this approachhasthe advantageof speed-ing up the maintenancetask, in our casewe seethe pre-selectedcategoriesasa stablepieceof knowledgeover time.If however, thesecategoriesneedto be updated,we coulduse the WebOnto[Domingue,1998] environment for edit-ing andbrowsing the underlyingontology. We shouldalsopointout asimilarity in theuseof cuephrasesandcuewordsto increasethe numberof relatede-Stories. In the FindURproject,thenotionof “evidencephrases”wasused.However,their definitionas“evidence”phraseshighlightsa differencein their application: aswe describedin section3.1, we usecuesbothasabstractionsof termsandasevidencewhereasinthe FindUR domainthey usedonly asevidence. For exam-ple, asthe authorsdescribe,the company Vocalteccould bean evidencefor the topic Internet telephonybut certainly isnot anabstractionof it. In particular, they defineda typology

of evidencephrases:synonyms, subclasses, products, compa-nies, associatedstandards, key people. Thesewerethenusedto increasethe numberof relatedanswersto a given query.They weredeployedin thebackgroundalongwith rulesthatgovern their interrelations. As in our approach,thesewerenot automaticallygenerated.

A similar approachwhich deploys contentmatching tech-niquesisdescribedin [Guarinoetal., 1999] wheretheauthorspresentthe OntoSeeksystemdesignedto supportcontent-basedaccessto theWeb. As in theFindURproject,thetargetwastheInformationRetrieval areawith theaimof improvingrecall and precisionand the focus was two specificclassesof informationrepositories:yellow pagesandproductcata-logues.Their underlyingmechanismusesconceptualgraphsto representqueriesand resourcesdescriptions.As the au-thorsargue,“with conceptualgraphs,theproblemof contentmatchingreducesto ontology-drivengraphmatching,whereindividualnodesandarcsmatchif theontologyindicatesthata subsumptionrelationshipholdsbetweenthem”[Guarinoetal., 1999]. However thesegraphsarenot constructedauto-

matically. The OntoSeekteamdevelopeda semi-automaticapproach� in which the userhasto verify the links betweendifferentnodesin the graphvia a designateduser-interface.Thesimilarity of thiswork with myPlanetlies in theusageofanontology. However, aspreviously, we deployedour ontol-ogy in differentphases:in structuringthesearchspaceandinincreasingtheanswerset.

On a slightly different focus, the IMPS(Internet-basedMulti-agent ProblemSolving) systemusessoftware agentsto conductknowledgeacquisitionon-line using distributedresources[Crow andShadbolt,1999]. Oneof theseagents,OCA(OntologyConstructionAgent), is usedto facilitatethetaskof constructinganontologyat runtime,that is, queryingvariousresourcesfor filling in the gapsin the ontology. Al-thoughthe goalsof this work weredifferent,the underlyingideafor theOCA is similartooureffortsof populatingtheon-tology by automaticallyinstantiatingclassesaswe describedin section3.2. OCA wasused“to extract information fromnetworked knowledgeresources- like WordNet, the onlinethesaurus/lexical databaseanda plain text domaindatabasein the field of geology, the IGBA dataset”[Crow andShad-bolt, 1999]. Our approachis different in that we deploy IEtechniquesalongwith domainspecifictemplatesto instantiatespecificontologyclasseswhereastheOCAdeploys heuristicmethodsfor extractionandfocuseson creatingan hierarchylatticeof classesof concepts.

In the context of managinguserprofileswe shouldpointto attemptsthat have beenmadeto infer userprofiles fromanalyzingpatternsof accessto documents[Krulwich andBurkley, 1997]. However, most of theseapproachestry toinduceuserinterestsby employing empiricalmethods.In ourcase,we deliberatelyimposeanontology-drivenstructuretotheuserprofile whichenabledusto reasonaboutit.

Finally, [Rouxetal., 2000] and[Faatzetal., 2000] discussearly ideason theuseof IE techniquescoupledwith ontolo-giesin orderto helpthemunderstandcomplex relationships,statementsor termsin semi-structuredor unstructureddocu-ments.

5 Summary and futur e workIn this paperwe presenteda system,myPlanet, which actsasthe front-endto a news server. It is placedon the top of theexisting infrastructurefor ontology-driven Web-basednewsservices,PlanetOnto. It aimsto allow usersbrowsee-Storiesaccordingto their preferences(i.e.,searchcriteria). The us-ageof theunderlyingontologyallowedusto deviseheuristicswhich make it possibleto increasethe answersetof relatede-Stories.Wealsoprovidefacilitiesfor saving users’profiles,a featurevital for providing furtherservicestailoredto theirpreferences.

While the easeof accessibilityto our e-Storiesrepositorywasaprimarygoal,equallyimportantwasthemaintenanceofthis repository. Sincewebaseourserviceson theenrichmentof e-Storiesin termsof annotatingthemwith ontology-drawnknowledgestructureswe hadto find waysof automatingthisprocess.We usedIE techniquesanddevelopeddomainspe-cific templatesto automaticallyidentify the event type of ae-Storyandextract specificinformationneededfor instanti-

atingit in theunderlyingontology.Therearecertainresearchissueswhichremainopenin this

work. In theareaof personalisedservicesweneedto taketheontology-basedreasoningto a furtherstage:reasonaboutthekind of outputthatwill bedispatchedto theuserby analysinghis/herprofile. Sincewesavetheuser’spreferenceswecouldapplydeductiveheuristicsto find e-Storiesthatarerelatedtothesepreferencesby meansof tracing their interrelationsinthe underlyingontology. A simpleexamplecould be to in-fer that technologiesusedin projectsmight be of interesttousersthat looking for e-Storiesrelatedto otherprojectswiththesameresearcharea.Furthermore,weareinvestigatingthepossibility of extendingthe type of output. Currently, a re-latede-Storyis theoutputof myPlanet. In thefuturethough,wemightwantto provideotherkind of outputlike,for exam-ple, suggestionsaboutpotentialcollaboratorson a researchtopic, or organizationswith a potentialinterestin the user’sresearchareas.Thesecouldbeinferredby applyingthesamestyleof deductiveheuristicsbut changingtheoutputto ades-ignated“personalinterests”Web-page.As in theexistingsys-tem,editingfacilitiesarevital to keepthesystemupdatedandlet theuserdrive thereasoningprocess.

One of the advantages of our “lowest-common-denominator” medium(the email message) is that wemake no commitmentsas to what the structure shouldbe. Which meansthat we can apply exactly the sameinfrastructure to any kind of document, not necessarilyemailmessages.Thetechnologyneedsno changes,howeverwe might need to edit or even createnew ontologies tocharacterisethe new domain. Towards this direction, weplan to extend the usageof IE techniquescoupled withdomainspecifictemplatesasit hasbeenproveda fastway ofinstantiatingour ontologies.In our ontologypopulationtaskwe hadto manuallyconstructthe templatesfor eachtypeofevent. We areplanningto automatethis taskby deployinginductive learningalgorithms. The existing setof e-Storiescould be used, potentially, as the training set to identifycharacteristicsof event typeswhich will eventually lead toautomaticallyconstruct their templates. These templatescanthenbe testedon the annotatede-Storiesto judge theirquality and appropriateness.In the sameline of work, weintend to expand on IE techniquesand include tools thatallow detectionof anaphorawhich is an important featurewhen dealing with large corpussesof text from the sameorganisationbut differentdepartments.In thesecases,termsareoften usedin different formats(i.e.,abbreviatednames).Co-referrencesbetweenthoseare importantto be identifiedprior to IE tasksin orderto avoid duplicationsor omissionsof information.

Finally, theuseof cuephrasesandcuewordsfor increasingtheanswersetworkedwell in ourapproach.Althoughthesetis relatively small(wehave somethinglike 200 cue phrasesdefined)their identification needto be automated. To dothis we have begunto work with a techniqueborrowedfromthedataengineeringdomain[Krulwich,1995], whichappliesheuristicsto identify ‘semanticallysignificantphrases’.Theunderlyingprincipleis to observevisualeffectsoftenusedbyauthorsto emphasizeimportantconceptsin their documents.For example,boldfacedor italicisedwords,heavily repeated

phrases,compoundnounphrases,list of items,etc. We havebuild� a prototypetool which extractsa largesetof potentialcuephrasesafterapplyinga designatedsetof heuristics.Thepotentialphraseswill thenbeeditedto constructthefinal set.

With this first versionof myPlanetandthe extensionsweplanto makeweareworkingtowardsthevisionof theKnowl-edge User erawherethe useris the focal point in a settingwith a plethoraof knowledge-intensive systemsaim to de-liver intelligentservicesover theWebsurroundinghim. Thismetaphor, althoughin its infancy yet, is in contrastwith thetraditionalview of knowledge-intensivesystemsbeingthefo-cal point with userssurroundingthemactingassubscribersfor knowledgeservices.

AcknowledgementsTheresearchdescribedin thispaperis supportedby theInter-disciplinaryResearchCollaboration(IRC)project: AdvanceKnowledge Technologies(AKT - http://www.aktors.org)fundedby theUK government.

References[Crow andShadbolt,1999] L. Crow and N. Shadbolt. Ac-

quiring and Structuring Web Content with KnowledgeLevel Models. In R. Studer and D. Fensel, editors,Proceedingsof the 11th EuropeanWorkshopon Knowl-edgeAcquisition,ModellingandManagement(EKAW’99),Dagstuhl,Germany, pages83–101.SpringerVerlag,May1999.

[DomingueandMotta,2000] J. Domingue and E. Motta.Planet-Onto:FromNewsPublishingto IntegratedKnowl-edgeManagementSupport. IEEE Intelligent Systems,15(3):26–32,2000.

[Domingue,1998] J. Domingue. Tadzebaoand WebOnto:Discussing, Browsing, and Editing Ontologies on theWeb. In Proceedingsof the11thKnowledge Acquisition,Modelling and ManagementWorkshop,KAW’98, Banff,Canada, April 1998.

[Faatzet al., 2000] A. Faatz,T. Kaamps,andR. Steinmetz.Backgroundknowledge,indexing and matchinginterde-pendenciesif documentmanagementandontologymain-tenance.In positionpaperin Proceedingsof theECAI2000workshopon Ontology Learning(OL2000),Berlin, Ger-many, August2000.

[Guarinoet al., 1999] N. Guarino,C.Masolo,andG.Vetere.OntoSeek:Content-BasedAccessto theWeb. IEEEIntel-ligentSystems, 14(3):70–80,May 1999.

[HuhnsandStephens,1999] M. HuhnsandL. Stephens.Per-sonalOntologies. IEEE InternetComputing, 3(5):85–87,September1999.

[Kalfoglouet al., 2000] Y. Kalfoglou, T. Menzies,K-D. Al-thoff, andE. Motta. Meta-knowledgein systemsdesign:panacea...orundeliveredpromise? TheKnowledge Engi-neeringReview, 15(4),2000.

[Kalfoglou,2000a] Y. Kalfoglou. DeployingOntologies inSoftware Design. PhDthesis,Departmentof Artificial In-telligence,Universityof Edinburgh,June2000.

[Kalfoglou,2000b] Y. Kalfoglou. On the convergenceofcoretechnologiesfor knowledgemanagementandorgan-isationalmemories:ontologiesand experiencefactories.In Proceedingsof the ECAI 2000 Workshopon Knowl-edge Managementand Organisational Memories(W11-KMOM’00), Berlin, Germany, August2000.

[Krulwich andBurkley, 1997] B. Krulwich andC. Burkley.The InfoFinder Agent: LearningUser IntereststhroughHeuristic PhraseExtraction. IEEE Intelligent Systems,12(5):22–27,1997.

[Krulwich, 1995] B. Krulwich. Learningdocumentcategorydescriptionsthroughtheextractionof semanticallysignif-icant phrases.In Proceedingsof the IJCAI’95 Workshopon DataEngineeringfor InductiveLearning, 1995.

[McGuinness,1998] L.D. McGuinness.OntologicalIssuesfor Knowledge-EnhancedSearch. In N. Guarino,editor,Proceedingsof the1stInternationalConferenceonFormalOntology in InformationSystems(FOIS’98),Trento,Italy,pages302–316.IOSPress,June1998.

[Motta,1999] E. Motta. ReusableComponentsfor Knowl-edgeModels:CaseStudiesin ParametricDesignProblemSolving, volume53 of Frontiers in Artificial IntelligenceandApplications. IOS Press,1999. ISBN: 1-58603-003-5.

[O’Leary, 1998] D. O’Leary. KnowledgeManagementSys-tems: ConvertingandConnecting. IEEE Intelligent Sys-tems, 13(3):30–33,June1998.

[ProuxandChenevoy, 1997] D. Proux and Y. Chenevoy.Naturallanguageprocessingfor bookstorage:Automaticextractionof information from bibliographicnotices. InProceedingsof the Natural Language ProcessingPacificRimSymposium(NLPRS’97), pages229–234,1997.

[Riloff, 1996] E. Riloff. An empirical study of automateddictionaryconstructionfor informationextractionin threedomains.AI Journal, (85):101–134,1996.

[Rouxet al., 2000] C. Roux,D. Proux,F. Rechermann,andL. Julliard. An ontologyenrichmentmethodfor a prag-maticinformationextractionsystemgatheringdataon ge-netic interactions. In position paper in ProceedingsoftheECAI2000Workshopon OntologyLearning(OL2000),Berlin, Germany. August2000.

[Vargas-Veraet al., 2001] M. Vargas-Vera, J. Domingue,Y. Kalfoglou, E. Motta, and S. Buckingham-Shum.Template-driveninformationextractionfor populatingon-tologies. TechnicalReport105, KnowledgeMedia Insti-tute(KMi), March2001.

an ontology-driven web- based personalized news service

Documents