ekaw 2016 - ontology forecasting in scientific literature: semantic concepts prediction based on...
![Page 1: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/1.jpg)
Amparo Elizabeth Cano Basave¹, Francesco Osborne², Angelo Salatino²
¹ Aston University, United Kingdom
² KMi, The Open University, United Kingdom
EKAW 2016
Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors
![Page 2: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/2.jpg)
Osborne, F., Motta, E. and Mulholland, P. Exploring scholarly data with Rexplore. International Semantic Web Conference 2013.
technologies.kmi.open.ac.uk/rexplore/
![Page 3: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/3.jpg)
The Computer Science Ontology 1
Standard research area taxonomies/classifications/ontologies such as ACM are not apt to the task.
• Not fine-grained enough.
  – E.g., only 2 topics are classified under Semantic Web.
• Static and manually defined, hence prone to become obsolete very quickly.
ACM 2012
![Page 4: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/4.jpg)
The Computer Science Ontology 2
The Computer Science Ontology (CSO) was automatically created and updated by applying the Klink-2 algorithm.
Osborne, F. and Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In ISWC 2015. (2015)
![Page 5: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/5.jpg)
The Computer Science Ontology 3
• We automatically generated a version of CSO consisting of about 15,000 topics linked by about 70,000 semantic relationships.
• It includes very granular, low-level research areas and can be regularly updated by running Klink-2 on a new set of publications.
• We also have different versions of CSO, obtained by running Klink-2 on the set of documents up to a certain year.
CSO 2012 CSO 2013 CSO 2014 CSO 2015
[…]
![Page 6: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/6.jpg)
A shared conceptualization
“Ontologies are a formal, explicit specification of a shared conceptualization” (Studer et al., 1998)
“The conceptualization should express a shared view between several parties, a consensus rather than an individual view” (Guarino et al., 2009)
“Ontologies are us: inseparable from the context of the community in which they are created and used.” (Mika, 2005)
“Ontology Evolution is the timely adaptation of an ontology to the arisen changes and the consistent propagation of these changes to dependent artefacts.” (Stojanovic, 2004)
![Page 7: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/7.jpg)
But what if we cannot wait for shared consensus?
These ontologies reflect the past, and can only contain concepts that are already popular enough to be selected by experts or automatic methods.
Hence, they hardly support tasks that require the ability to describe emerging concepts, e.g.:
• Exploring the forefront of research;
• Trend detection;
• Horizon scanning;
• Producing smart analytics to inform business decisions.
![Page 8: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/8.jpg)
Ontology Forecasting
Given an ontology at time t, a team of experts and/or a piece of software considers a number of relevant knowledge sources and updates the ontology by also including new concepts on which there will (probably) be a shared consensus at time t+1.
For example, a forecasted ontology of research topics in 2000 may already include a new topic associated with the dynamics preluding the “Semantic Web” (new collaborations between Knowledge Base Systems, AI and WWW).
[…]
t-n t-1 t t+1
![Page 9: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/9.jpg)
Contributions – a first step towards ontology forecasting
1. We approach the novel task of ontology forecasting by predicting semantic concepts in the research domain.
2. We introduce metrics to analyse the linguistic and semantic progressiveness in scholarly data.
3. We propose Semantic Innovation Forecast (SIF), a novel weakly-supervised approach for the forecasting of emerging semantic concepts.
4. We evaluate our approach on a dataset of over 1 million documents in the Computer Science domain.
   – The proposed framework offers competitive boosts in mean average precision at ten for forecasts over 5 years.
![Page 10: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/10.jpg)
Scopus (Computer Science) - # of publications
[Figure: number of articles per year. x-axis: year (ticks 1995 to 2007); y-axis: number of articles (0 to 250,000).]
![Page 11: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/11.jpg)
Scopus (Computer Science) – vocabulary size
[Figure: vocabulary size per year. x-axis: year (ticks 1995 to 2007); y-axis: vocabulary size (0 to 160,000).]
![Page 12: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/12.jpg)
Klink-2 Computer Science Ontology - # of classes
![Page 13: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/13.jpg)
Linguistic Progressiveness
Language innovation in a corpus refers to the introduction of novel patterns of language.
We generate a language model per year using Katz back-off smoothing and analyze the differences between consecutive years using the perplexity metric.
[Figure: perplexity per year. x-axis: year (ticks 1995 to 2007); y-axis: perplexity (0 to 1.4E+11).]
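The year-over-year comparison can be sketched as follows. This is only an illustration under stated assumptions: it uses a unigram model with add-one smoothing instead of the Katz back-off model named on the slide, and the function names and toy corpora are hypothetical.

```python
import math
from collections import Counter

def train_unigram(tokens):
    """Return token counts and the total token count for one year's corpus."""
    counts = Counter(tokens)
    return counts, sum(counts.values())

def perplexity(model, tokens, vocab_size):
    """Perplexity of `tokens` under an add-one-smoothed unigram model.

    Assumption: add-one smoothing stands in for Katz back-off here.
    """
    counts, total = model
    log_prob = 0.0
    for tok in tokens:
        p = (counts[tok] + 1) / (total + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(tokens))

# Toy corpora (hypothetical), one token list per year.
corpus_by_year = {
    1999: "web search ranking web pages".split(),
    2000: "semantic web ontology web services".split(),
}
vocab = {tok for toks in corpus_by_year.values() for tok in toks}

model_1999 = train_unigram(corpus_by_year[1999])
ppl_same = perplexity(model_1999, corpus_by_year[1999], len(vocab))
ppl_next = perplexity(model_1999, corpus_by_year[2000], len(vocab))
```

A model trained on one year is more perplexed by the following year's text when that text introduces new vocabulary, which is the signal the slide uses as a proxy for language innovation.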
![Page 14: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/14.jpg)
Linguistic Progressiveness
We also perform a progressive analysis based on lexical innovation and lexical adoption.
A large number of new words appear each year, but only a few of them are adopted (i.e., still used in the following year).
[Figure: number of new words per year and number of adopted words per year. x-axis: year (ticks 1997 to 2007); y-axis: number of words (0 to 70,000).]
![Page 15: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/15.jpg)
Measure Linguistic Progressiveness
We introduce the linguistic progressiveness metric:
$$LP_t = \frac{LA_t}{LI_t}$$
[Figure: linguistic progressiveness per year. x-axis: year (ticks 1997 to 2007); y-axis: linguistic progressiveness (0 to 0.35).]
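A minimal sketch of how innovation, adoption and their ratio could be computed from per-year vocabularies. The exact set definitions are assumptions, since the slides only describe them informally: here a word is "new" in year t if it has not been seen in any earlier year of the data, it is "adopted" if it also occurs in year t+1, and progressiveness is the ratio of the two set sizes. The function name and toy data are hypothetical.

```python
def innovation_adoption(vocab_by_year):
    """Map each year to (new words, adopted words, progressiveness ratio).

    The first year only serves as a baseline; the last year is skipped
    because adoption needs the following year's vocabulary.
    """
    years = sorted(vocab_by_year)
    seen = set(vocab_by_year[years[0]])
    result = {}
    for year, next_year in zip(years[1:], years[2:]):
        li = vocab_by_year[year] - seen          # words first seen this year
        la = li & vocab_by_year[next_year]       # new words still used next year
        lp = len(la) / len(li) if li else 0.0    # adopted / innovative
        result[year] = (li, la, lp)
        seen |= vocab_by_year[year]
    return result

# Toy vocabularies (hypothetical).
vocab_by_year = {
    1995: {"web", "search"},
    1996: {"web", "search", "xml", "agent"},
    1997: {"web", "xml", "ontology", "rdf", "grid"},
    1998: {"web", "xml", "ontology", "rdf"},
}
stats = innovation_adoption(vocab_by_year)
```

In the toy data, 1996 introduces two words of which one survives into 1997, so its ratio is 0.5.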
![Page 16: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/16.jpg)
Innovation-Adoption Priors
We assume that emerging topics will be associated with novel words, so we compute priors at time t by considering innovative words (LI) and adopted words (LA).
A word prior is a probability distribution that expresses a word's relevance to, in this case, being characteristic of innovative topics.
We build the prior matrix by assigning a weight to each term in the vocabulary:
– 0.7 if $w \in LI_{t-2}$ and 0.9 if $w \in LA_{t-1}$, because our analysis shows that recently adopted words (LA) are more often associated with emerging topics than new words (LI).
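The weighting scheme can be sketched as a simple lookup. Only the 0.7 and 0.9 weights come from the slide. The weight for words in neither set (`default_weight`) and the choice to prefer 0.9 when a word is in both sets are assumptions made for illustration, and the function name is hypothetical.

```python
def build_word_priors(vocabulary, li_t_minus_2, la_t_minus_1, default_weight=0.1):
    """Assign a prior weight to every word in the vocabulary.

    0.9 for recently adopted words, 0.7 for innovative words (from the slide).
    default_weight is an assumed value for all remaining words.
    """
    priors = {}
    for word in vocabulary:
        if word in la_t_minus_1:
            priors[word] = 0.9
        elif word in li_t_minus_2:
            priors[word] = 0.7
        else:
            priors[word] = default_weight
    return priors

# Toy example (hypothetical words).
vocabulary = {"web", "xml", "ontology", "grid"}
priors = build_word_priors(
    vocabulary,
    li_t_minus_2={"xml", "grid"},
    la_t_minus_1={"xml", "ontology"},
)
```

The resulting weights can then be used as an asymmetric per-word prior for a topic model.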
![Page 17: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/17.jpg)
Semantic Innovation Forecast (SIF) model
SIF is a generative probabilistic topic model that takes as input a set of documents at year t and a set of historical priors, and forecasts topic-word distributions representing new concepts in the ontology $O_{t+1}$.
![Page 18: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/18.jpg)
Semantic Innovation Forecast (SIF) model
We use Collapsed Gibbs Sampling to infer the model parameters and topic assignments for a corpus at year t+1, given the observed documents at year t.
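The transcript does not give SIF's generative structure, so the sketch below is not the SIF sampler. It shows collapsed Gibbs sampling for a standard LDA model with an asymmetric per-word prior `beta`, which is one plausible way word-level priors can enter such a sampler. All names and hyperparameter values are assumptions.

```python
import random

def collapsed_gibbs_lda(docs, vocab_size, n_topics, beta, alpha=0.1, n_iter=50, seed=0):
    """Collapsed Gibbs sampling for LDA (not SIF itself).

    docs: list of documents, each a list of integer word ids.
    beta: per-word prior weights, one positive value per vocabulary entry.
    """
    rng = random.Random(seed)
    beta_sum = sum(beta)
    doc_topic = [[0] * n_topics for _ in docs]                 # counts per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # counts per (topic, word)
    topic_total = [0] * n_topics                               # tokens per topic
    assignments = []
    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic, up to a constant.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta[w])
                    / (topic_total[j] + beta_sum)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Posterior mean estimate of the topic-word distributions.
    phi = [
        [(topic_word[k][w] + beta[w]) / (topic_total[k] + beta_sum)
         for w in range(vocab_size)]
        for k in range(n_topics)
    ]
    return phi, assignments

# Toy run (hypothetical data): 4 word ids, 2 topics, higher prior on ids 2 and 3.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
phi, z = collapsed_gibbs_lda(docs, vocab_size=4, n_topics=2,
                             beta=[0.1, 0.1, 0.9, 0.7], n_iter=20)
```

Each row of `phi` is a topic-word distribution of the kind the slide describes as the model's output.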
![Page 19: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/19.jpg)
Evaluation
We perform this task by applying our framework to the Scopus dataset for Computer Science (>1M publications).
Each collection of documents in a year is randomly partitioned into three subsets: 20% is used to derive innovation priors, 40% is the training set, and 40% is the testing set.
We train a SIF model on year t using innovation priors computed for the two previous years (t-1 and t-2), and we use the SIF model to forecast semantic concepts at year t+1.
We then compute the cosine similarity between the predicted semantic concepts for t+1 and the gold standard (GS) concepts for that year. We consider a concept correctly forecasted if its similarity with a GS concept is higher than 0.5.
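The matching criterion can be sketched as below. How predicted and gold standard concepts are turned into vectors is not stated in the slides, so the sparse word-to-weight dictionaries used here are an assumption, as are the function names and toy data. Only the 0.5 threshold comes from the slide.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse word -> weight vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def is_correct_forecast(predicted, gold_concepts, threshold=0.5):
    """True if the predicted concept is similar enough to any gold concept."""
    return any(cosine(predicted, gold) > threshold for gold in gold_concepts)

# Toy data (hypothetical).
gold_concepts = [
    {"semantic": 1.0, "web": 1.0},
    {"cloud": 1.0, "computing": 1.0},
]
hit = is_correct_forecast({"semantic": 0.5, "web": 0.4, "ontology": 0.1}, gold_concepts)
miss = is_correct_forecast({"grid": 1.0}, gold_concepts)
```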
![Page 20: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/20.jpg)
Evaluation - Baselines
We compare SIF against four baselines. For a year t forecasting for year t+1:
1. LDA Topics (LDA): computes topics on the full training set. This setting makes no assumption over innovative/adopted lexicons.
2. LDA Innovative Topics (LDA-I): computes topics based on documents containing at least one word appearing in $LI_t$.
3. LDA Adopted Topics (LDA-A): computes topics based only on documents containing at least one word appearing in $LA_t$.
4. LDA Innovation/Adoption Topics (LDA-IA): computes topics based only on documents containing at least one word appearing in $LI_t$ or $LA_t$.
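The document selection behind the three filtered baselines can be sketched as a lexicon filter applied before topic modelling. The function name and toy data are hypothetical, and documents are assumed to be token lists.

```python
def select_documents(docs, lexicon):
    """Keep documents (token lists) containing at least one word from `lexicon`."""
    return [doc for doc in docs if lexicon.intersection(doc)]

# Toy training set and lexicons (hypothetical).
docs = [
    ["web", "search", "ranking"],
    ["semantic", "web", "ontology"],
    ["xml", "schema", "web"],
]
li_t = {"ontology"}   # innovative words
la_t = {"xml"}        # adopted words

lda_docs = docs                                        # LDA: full training set
lda_i_docs = select_documents(docs, li_t)              # LDA-I
lda_a_docs = select_documents(docs, la_t)              # LDA-A
lda_ia_docs = select_documents(docs, li_t | la_t)      # LDA-IA
```

Each resulting subset would then be passed to an ordinary LDA implementation.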
![Page 21: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/21.jpg)
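Evaluation - Mean Average Precision@10

| Year | SIF  | LDA  | LDA-A | LDA-I | LDA-IA |
|------|------|------|-------|-------|--------|
| 2000 | 0.70 | 0.12 | 0.48  | 0     | 0.41   |
| 2002 | 0.87 | 0    | 0.82  | 0.64  | 0.75   |
| 2004 | 0.91 | 0    | 0.58  | 0.57  | 0.63   |
| 2006 | 0.87 | 0.31 | 0.78  | 0.84  | 0.69   |
| 2008 | 0.99 | 0.40 | 0.68  | 0.57  | 0.70   |
| AVG  | 0.87 | 0.17 | 0.67  | 0.52  | 0.64   |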
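For readers unfamiliar with the metric, here is a sketch of one common way to compute mean average precision at k. The slides do not specify which variant was used, so the normalisation (dividing by the number of hits within the top k) is an assumption, and the function names and toy rankings are hypothetical.

```python
def average_precision_at_k(relevance, k=10):
    """Average precision over the top-k items of one ranked list.

    relevance: list of booleans, True where the ranked item is correct.
    Assumption: normalise by the number of correct items found in the top k.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevance[:k], start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

def mean_average_precision_at_k(runs, k=10):
    """Mean of the per-list average precision values."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)

# Toy rankings (hypothetical).
runs = [
    [True, False, True],
    [False, True],
]
map_at_10 = mean_average_precision_at_k(runs, k=10)
```

The first list scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean is 2/3.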
![Page 22: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/22.jpg)
Conclusion
It is possible to reliably forecast emerging semantic concepts if the ontology is associated with a large collection of documents.
The next challenge is to forecast a new version of an ontology, that is, to produce an ontology that includes all concepts and relationships that will (probably) be included in the new version.
![Page 23: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/23.jpg)
Future works
• Integration of explicit and latent semantics;
• Including graph-structure information in the model;
• Understanding how research topics are created and forecasting topic trends.
Salatino, A.A., Osborne, F., Motta, E. (2016) How are topics born? Understanding the research dynamics preceding the emergence of new areas. PeerJ Preprints
![Page 24: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/24.jpg)
Francesco Osborne, Angelo Salatino, Amparo Cano Basave
Cano-Basave, A. E., Osborne, F., Salatino, A. A. (2016) Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction based on Innovation-Adoption Priors. EKAW 2016, Bologna, Italy
Email: [email protected]: FraOsborne
Site: people.kmi.open.ac.uk/francesco
![Page 25: EKAW 2016 - Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors](https://reader031.vdocuments.mx/reader031/viewer/2022030207/58ac3fc41a28ab99028b46f7/html5/thumbnails/25.jpg)