cascade prediction

Social Media Analysis and Mining

Huawei Shen (沈华伟)

Institute of Computing Technology, Chinese Academy of Sciences


Social Media

Social media are computer-mediated tools that allow people, companies and other organizations to create, share, or exchange information, career interests, ideas, and pictures/videos in virtual communities and networks.

Wikipedia

Typical examples:

Twitter, Facebook, Weibo, WeChat, Science, …

What I will talk about

• Cascade prediction – Shen et al., AAAI 2014; Bao et al., 2015; Gao et al., WWW 2016

• Influence maximization – Cheng et al., CIKM 2013; Cheng et al., SIGIR 2014

• Collective behavior – Shen et al., PNAS 2014; Scientometrics 2016

Part I

Cascade Prediction

Cascades

• Three kinds of cascades

– $t_1, t_2, \cdots, t_i, \cdots, t_n$
  • Page views, query usage, ……

– $(u_1, t_1), (u_2, t_2), \cdots, (u_i, t_i), \cdots, (u_n, t_n)$
  • WeChat, epidemics, ……

– $(u_1, v_1, t_1), (u_2, v_2, t_2), \cdots, (u_i, v_i, t_i), \cdots, (u_n, v_n, t_n)$
  • Weibo, citation, ……
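The three cascade forms differ only in how much is recorded per event. A minimal Python sketch of one possible representation, assuming each event is a hypothetical `Event` record whose optional `u` field holds the user and whose optional `v` field holds, for the third kind only, the user the item came from (reading $(u_i, v_i)$ as such an edge is an assumption). `Event`, `popularity`, and `diffusion_edges` are illustrative names, not from the talk.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One event of a cascade (hypothetical representation)."""
    t: float                 # timestamp, present in all three kinds
    u: Optional[str] = None  # user involved in the event (kinds 2 and 3)
    v: Optional[str] = None  # user the item came from (kind 3 only; assumed reading)


def popularity(cascade: List[Event], t: float) -> int:
    """Number of events observed up to and including time t."""
    return sum(1 for e in cascade if e.t <= t)


def diffusion_edges(cascade: List[Event]) -> List[Tuple[str, str]]:
    """(source, adopter) pairs; only kind-3 cascades carry this information."""
    return [(e.v, e.u) for e in cascade if e.u is not None and e.v is not None]


# Kind 1: timestamps only (e.g. page views)
page_views = [Event(0.5), Event(1.2), Event(3.0)]
# Kind 2: (user, time) pairs (e.g. WeChat shares)
shares = [Event(0.0, u="alice"), Event(2.0, u="bob")]
# Kind 3: (user, source, time) triples (e.g. Weibo retweets); the root has no source
retweets = [Event(0.0, u="alice"), Event(1.0, u="bob", v="alice"), Event(1.5, u="carol", v="bob")]

print(popularity(page_views, 1.2))  # 2
print(diffusion_edges(retweets))    # [('alice', 'bob'), ('bob', 'carol')]
```

Popularity, the quantity predicted in the rest of this part, is available for all three kinds; the diffusion path is only available for the third.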

Are the popularity dynamics predictable? (Power-law distribution, null model, ……)

Yearly citations c(t) for 200 randomly selected papers published between 1960 and 1970 in the PR corpus. The color code corresponds to each paper's publication year.

Cascade prediction: Heterogeneous popularity dynamics

Shen et al., AAAI 2014

Cascade prediction

C. S. Peirce. The Numerical Measure of the Success of Predictions. Science, 4(93): 453–454, 1884.

Burst: temporally scale-free

Shen et al., AAAI 2014

Cascade prediction

• Feature-based methods
– Extract features, e.g., content features, user features, and structural features, and then predict popularity using standard classification or regression models [Lerman et al., WWW 2010; Bao et al., WWW 2013]

• Temporal analysis
– Treat popularity dynamics as time series and make predictions by exploiting temporal correlations [Szabo and Huberman, Comm. ACM 2010; Yang and Leskovec, ICDM 2010; Gomez-Rodriguez et al., ICML 2013]

These methods ignore the fact that popularity dynamics reflect the arrival process of collective attention.

Popularity in the early stage

Structural diversity

Cascade prediction

Key factors in popularity dynamics

Intrinsic attractiveness: $\eta_i$

Rich gets richer: $c_i^t$

Aging effect: $P_i(t) = \dfrac{1}{\sqrt{2\pi}\,\sigma_i t} \exp\left(-\dfrac{(\ln t - \mu_i)^2}{2\sigma_i^2}\right)$

The arrival rate of attention to an item is determined by these three factors.

Visibility?

Candidate aging functions: power-law, log-normal, exponential, and Rayleigh distributions

Shen et al., AAAI 2014
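The log-normal aging term $P_i(t)$, and two of the alternative forms listed, can be written down directly. A minimal Python sketch, assuming time is measured from publication ($t > 0$); the cumulative function $\Phi((\ln t - \mu_i)/\sigma_i)$ is the integral of $P_i(t)$ from 0 to $t$. The function names and the exponential and Rayleigh parameterizations are illustrative choices; a power law is omitted because it needs a lower cutoff to be normalized.

```python
import math


def lognormal_aging(t: float, mu: float, sigma: float) -> float:
    """P_i(t) = exp(-(ln t - mu)^2 / (2 sigma^2)) / (sqrt(2 pi) sigma t), for t > 0."""
    return math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * t
    )


def lognormal_cumulative(t: float, mu: float, sigma: float) -> float:
    """Integral of P_i from 0 to t, i.e. Phi((ln t - mu) / sigma)."""
    if t <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2))))


def exponential_aging(t: float, rate: float) -> float:
    """Alternative form: rate * exp(-rate * t)."""
    return rate * math.exp(-rate * t)


def rayleigh_aging(t: float, sigma: float) -> float:
    """Alternative form: (t / sigma^2) * exp(-t^2 / (2 sigma^2))."""
    return t / sigma ** 2 * math.exp(-t ** 2 / (2 * sigma ** 2))


# Under the log-normal term, half of the total attention arrives before t = e^mu.
print(lognormal_cumulative(math.exp(2.0), mu=2.0, sigma=1.0))
```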

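Cascade prediction: Generative model of citation dynamics

Reinforced Poisson Process (RPP), with aging parameters $\theta_i = \{\mu_i, \sigma_i\}$; the rate combines attractiveness, the aging effect, and rich-gets-richer.

Parameters are estimated by maximum likelihood (MLE) and then used for prediction.

Shen et al., AAAI 2014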
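A hedged sketch of the standard closed forms for a reinforced Poisson process, assuming the rate $x_i(t) = \lambda_i P_i(t)\,(m + n_i(t))$, where $n_i(t)$ is the number of attentions received before $t$, $P_i$ is the log-normal aging function with cumulative $F$, and $m$ is a global reinforcement constant (this follows the form used by Wang, Song & Barabási, 2013). With $\mu_i, \sigma_i$ held fixed, the likelihood of event times $t_1 < \cdots < t_n \le T$ is maximized by $\hat\lambda_i = n \big/ \left((m+n)F(T) - \sum_{j=1}^{n} F(t_j)\right)$, and the expected count at $t > T$ is $(m + n_i(T))\,e^{\lambda_i (F(t) - F(T))} - m$. Fitting $\mu_i, \sigma_i$ (e.g., numerically over `log_likelihood`) and the prior of "RPP with prior" are left out, so this corresponds to the variant without a prior; all names and values are hypothetical.

```python
import math
from typing import Sequence


def aging_cdf(t: float, mu: float, sigma: float) -> float:
    """F(t): integral of the log-normal aging function from 0 to t."""
    if t <= 0:
        return 0.0
    return 0.5 * (1 + math.erf((math.log(t) - mu) / (sigma * math.sqrt(2))))


def aging_pdf(t: float, mu: float, sigma: float) -> float:
    """P_i(t): log-normal aging function, t > 0."""
    return math.exp(-(math.log(t) - mu) ** 2 / (2 * sigma ** 2)) / (
        math.sqrt(2 * math.pi) * sigma * t
    )


def exposure(times: Sequence[float], T: float, mu: float, sigma: float, m: float) -> float:
    """Integral of P(t) * (m + n(t)) over [0, T], where n(t) counts events before t."""
    return (m + len(times)) * aging_cdf(T, mu, sigma) - sum(aging_cdf(tj, mu, sigma) for tj in times)


def log_likelihood(lam: float, times: Sequence[float], T: float, mu: float, sigma: float, m: float) -> float:
    """Log-likelihood of sorted event times in (0, T] under rate lam * P(t) * (m + n(t))."""
    # The j-th event (0-based) arrives after j earlier attentions, so its rate has factor m + j.
    log_rates = sum(
        math.log(lam * aging_pdf(tj, mu, sigma) * (m + j)) for j, tj in enumerate(times)
    )
    return log_rates - lam * exposure(times, T, mu, sigma, m)


def mle_lambda(times: Sequence[float], T: float, mu: float, sigma: float, m: float) -> float:
    """Closed-form maximizer of log_likelihood in lam (mu, sigma, m held fixed)."""
    return len(times) / exposure(times, T, mu, sigma, m)


def predict(n_T: int, T: float, t: float, lam: float, mu: float, sigma: float, m: float) -> float:
    """Expected number of attentions by time t >= T, given n_T observed by time T."""
    return (m + n_T) * math.exp(lam * (aging_cdf(t, mu, sigma) - aging_cdf(T, mu, sigma))) - m


# Toy observation window (0, T]; the values are illustrative, not fitted to real data.
times = [0.5, 1.0, 1.2, 2.0, 3.5]
T, mu, sigma, m = 4.0, 1.5, 1.0, 3.0
lam = mle_lambda(times, T, mu, sigma, m)
print(lam, predict(len(times), T, 20.0, lam, mu, sigma, m))
```

Starting from $t = 0$ with no observations, `predict` reduces to $m\,(e^{\lambda_i F(t)} - 1)$, the familiar cumulative-citation curve.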

Quantifying scientific impact

$\tilde{t} \equiv (\ln t - \mu_i)/\sigma_i$

$\tilde{c} \equiv \ln(1 + c_i^t/m)/\lambda_i$

$\tilde{c} = \Phi(\tilde{t})$
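The rescaling collapses every trajectory of the form $c_i^t = m\,(e^{\lambda_i \Phi((\ln t - \mu_i)/\sigma_i)} - 1)$ onto the single curve $\Phi$. A small Python check of that claim; the parameter values and the constant `m` are illustrative assumptions, not values fitted on the slides.

```python
import math


def Phi(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def citations(t: float, lam: float, mu: float, sigma: float, m: float) -> float:
    """Cumulative citations c_i^t = m * (exp(lam * Phi((ln t - mu) / sigma)) - 1)."""
    return m * (math.exp(lam * Phi((math.log(t) - mu) / sigma)) - 1)


def rescale(t: float, c: float, lam: float, mu: float, sigma: float, m: float):
    """Map (t, c_i^t) to (t_tilde, c_tilde)."""
    t_tilde = (math.log(t) - mu) / sigma
    c_tilde = math.log(1 + c / m) / lam
    return t_tilde, c_tilde


m = 30.0  # illustrative global constant
# Two papers with very different (lam, mu, sigma) land on the same curve after rescaling.
for lam, mu, sigma in [(1.0, 2.0, 0.8), (3.0, 4.0, 1.2)]:
    for t in [1.0, 10.0, 100.0]:
        tt, ct = rescale(t, citations(t, lam, mu, sigma, m), lam, mu, sigma, m)
        print(f"t~={tt:.3f}  c~={ct:.4f}  Phi(t~)={Phi(tt):.4f}")
```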

Examples:

• Bonner & Fisher, Linear magnetic chains with anisotropic coupling, Physical Review (1964)
• Hohenberg & Kohn, Inhomogeneous electron gas, Physical Review (1964)
• Bardakci et al., Intrinsically Broken U(6) ⊗ U(6) Symmetry for Strong Interactions, Physical Review Letters (1964)
• Berglund & W. E. Spicer, Photoemission studies of copper and silver: Theory, Physical Review (1964)

Parameter values shown with the examples: (1.1, 4.8, 1.1), (3.0, 8.8, 1.2), (1.9, 7.5, 0.9), (6.7, 9.2, 1.0)

Quantifying scientific impact

Wang, Song, & Barabási, Science, 2013.

Universal citation dynamics for different journals:

Cascade prediction: Citation dynamics

• RPP (Reinforced Poisson Process) consistently outperforms competing methods.
• RPP without a prior performs almost identically to RPP with a prior (high accuracy), but performs remarkably badly on a handful of cases because of overfitting (high MAPE).
• The superiority of RPP with a prior increases with the length of the training period.

Cascade prediction: Extensions of the RPP model

• Replace the relaxation function with other forms of functions [Gao et al., WSDM 2015]
– e.g., log-normal → exponential or power-law function

• Replace the "rich-gets-richer" mechanism with observed visibility [Zhao et al., KDD 2015]
– e.g., number of retweeters → follower count of each retweeter

• Mixture of RPPs to model diffusion with multiple stages [Gao et al., WWW 2016]

Shen et al., AAAI 2014

Cascade prediction: Mixture of RPP model

Diffusion process with multiple stages

Each component is an RPP model

Cascade prediction: Hawkes process

Rate function:

Attractiveness or infectiousness of messages

Zhao et al., KDD 2015; Bao et al., WWW 2015
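As a hedged illustration only: a self-exciting (Hawkes-type) rate in which each past retweet $j$ at time $t_j$ by a user with $n_j$ followers raises the rate by $p\,n_j\,\phi(t - t_j)$, giving $\lambda(t) = p \sum_{t_j < t} n_j\,\phi(t - t_j)$, where $p$ plays the role of the message's infectiousness and $\phi$ is a memory kernel. The exponential kernel with time scale `tau` is an assumption chosen for simplicity; Zhao et al. (KDD 2015) fit a different, empirically measured kernel. All names and numbers are hypothetical.

```python
import math
from typing import List, Tuple

Event = Tuple[float, int]  # (time of retweet, follower count of the retweeter)


def memory_kernel(dt: float, tau: float) -> float:
    """Exponential kernel that integrates to 1 over [0, inf) (assumed form)."""
    return math.exp(-dt / tau) / tau


def intensity(t: float, events: List[Event], p: float, tau: float) -> float:
    """Self-exciting rate at time t: p * sum of n_j * kernel(t - t_j) over earlier events."""
    return p * sum(n * memory_kernel(t - tj, tau) for tj, n in events if tj < t)


def expected_direct_retweets(events: List[Event], p: float) -> float:
    """Expected number of retweets triggered directly by the given events.

    Because the kernel integrates to 1, event j contributes p * n_j in total.
    """
    return p * sum(n for _, n in events)


history = [(0.0, 100), (1.0, 50)]  # original post plus one retweet
print(intensity(2.0, history, p=0.01, tau=2.0))
print(expected_direct_retweets(history, p=0.01))
```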

Cascade prediction: Recurrent marked point process

Idea: learn the rate function from data instead of using a human-defined rate function

Du et al., KDD 2016

• Embed the event history into a vector

• Learn the rate function within the framework of marked point processes

Cascade prediction: Summary

• Popularity dynamics form an arrival process capturing how a message accrues attention

• Feature-based methods will be combined with feature-learning methods

Popularity prediction is still an open problem!

Part II

Influence Maximization

Influence maximization: Problem definition

– Input:
  • A social network $G = (V, E)$, with $V$ being the node set and $E$ being the edge set
  • Diffusion model: the independent cascade model or the linear threshold model
  • $k$: number of seed nodes
– Output: a set of seed nodes $S$, $|S| \le k$
– Objective: maximize the spread of influence $\sigma(S)$

[Figure: a social network whose edges carry influence probabilities between 0.1 and 0.5, illustrating the spread of influence]

Given a social network, influence maximization aims to find a fixed-size set of seed nodes that maximizes the spread of influence.

Cheng et al., CIKM 2013
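The objective $\sigma(S)$ depends on the diffusion model. A minimal Python sketch for the independent cascade model: each newly activated node gets one chance to activate each out-neighbor with that edge's probability, and $\sigma(S)$ is estimated by averaging the final number of active nodes over Monte Carlo runs. The adjacency-dict graph format and the function names are assumptions of this sketch.

```python
import random
from typing import Dict, Iterable, List, Tuple

# node -> list of (out-neighbor, activation probability)
Graph = Dict[int, List[Tuple[int, float]]]


def simulate_ic(graph: Graph, seeds: Iterable[int], rng: random.Random) -> int:
    """Run one independent-cascade simulation and return the number of active nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p in graph.get(u, []):
                # Each newly active u tries each still-inactive neighbor exactly once.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_spread(graph: Graph, seeds: Iterable[int], num_sims: int, rng: random.Random) -> float:
    """Monte Carlo estimate of sigma(S)."""
    seeds = list(seeds)
    return sum(simulate_ic(graph, seeds, rng) for _ in range(num_sims)) / num_sims


toy = {0: [(1, 0.5), (2, 0.5)], 1: [(3, 0.4)], 2: [(3, 0.4)]}
print(estimate_spread(toy, [0], num_sims=10_000, rng=random.Random(42)))
```

The estimate's error shrinks only as the number of simulations grows, which is the root of the scalability-accuracy dilemma discussed below.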

Influence maximization: Existing methods

• Greedy algorithm
– Select, one by one, the node with the maximum marginal influence and add it to the set of seed nodes [Kempe et al., KDD 2003; Leskovec et al., KDD 2007]
– Guaranteed accuracy of 1 − 1/e − ϵ, but not scalable

• Heuristic algorithms
– Select seed nodes either by proxy metrics, e.g., degree or PageRank, or by estimating the influence spread with approximate methods [Chen et al., KDD 2009; Jung et al., ICDM 2011]
– Scalable, but without guaranteed accuracy

We lack an accurate and scalable algorithm for influence maximization.
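The greedy algorithm can be sketched as follows: in each of $k$ rounds, add the node whose addition gives the largest estimated spread, which equals the largest estimated marginal gain because $\sigma(S)$ is fixed within a round. Every candidate is evaluated with fresh, independent Monte Carlo runs, which is exactly what makes the method slow. A self-contained Python sketch, assuming an adjacency-dict graph of (neighbor, probability) pairs under the independent cascade model; function names are hypothetical, and lazy-evaluation speedups such as CELF (Leskovec et al., KDD 2007) are omitted.

```python
import random
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(out-neighbor, probability)]


def simulate_ic(graph: Graph, seeds: Sequence[int], rng: random.Random) -> int:
    """One independent-cascade run; returns the number of activated nodes."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v, p in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)


def greedy_im(graph: Graph, nodes: Sequence[int], k: int, num_sims: int, seed: int = 0) -> List[int]:
    """Greedy seed selection with independent Monte Carlo estimates of sigma(S)."""
    rng = random.Random(seed)

    def spread(seeds: List[int]) -> float:
        return sum(simulate_ic(graph, seeds, rng) for _ in range(num_sims)) / num_sims

    selected: List[int] = []
    for _ in range(k):
        candidates = [v for v in nodes if v not in selected]
        # Largest sigma(S + v) is the largest marginal gain, since sigma(S) is fixed this round.
        best = max(candidates, key=lambda v: spread(selected + [v]))
        selected.append(best)
    return selected


# Node 0 reaches three nodes, node 4 reaches one more, node 6 is isolated.
graph = {0: [(1, 1.0), (2, 1.0), (3, 1.0)], 4: [(5, 1.0)]}
print(greedy_im(graph, nodes=range(7), k=2, num_sims=100))  # [0, 4]
```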

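Influence maximization: Properties of IM

Influence maximization (IM): $\arg\max_{S} \sigma(S)$ s.t. $|S| \le k$

• Properties of $\sigma(S)$
– Non-negative: $\sigma(S) \ge 0$ for any $S$
– Monotone: $\sigma(S) \le \sigma(T)$ if $S \subseteq T$
– Submodular: $\sigma(S \cup \{v\}) - \sigma(S) \ge \sigma(T \cup \{v\}) - \sigma(T)$ if $S \subseteq T$

The greedy algorithm achieves an accuracy of 1 − 1/e if the value of $\sigma(S)$ can be computed exactly.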

Influence maximization: Scalability-accuracy dilemma

• The value of $\sigma(S)$ cannot be computed exactly
– Monte Carlo simulation is used to estimate $\sigma(S)$ approximately

• Scalability-accuracy dilemma
– Increase the number of Monte Carlo simulations
  • Estimation of $\sigma(S)$ becomes accurate, ϵ decreases
  • Low scalability
– Decrease the number of Monte Carlo simulations
  • High scalability
  • Estimation of $\sigma(S)$ becomes inaccurate, ϵ increases

Cheng et al., CIKM 2013

Influence maximization: Our solution, StaticGreedy

• Idea: reuse the same set of Monte Carlo simulations

[Figure: the conventional approach compares $\sigma(S)$ and $\sigma(T)$ using independent Monte Carlo simulations 1…N ($N \to \infty$); StaticGreedy compares them using the same set of simulations 1…R (small R)]

Monotonicity and submodularity are strictly satisfied, and the scalability-accuracy dilemma is solved.
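A basic Python sketch of the StaticGreedy idea, assuming the independent cascade model: sample R "snapshots" once, each keeping every edge independently with its activation probability, and estimate $\sigma(S)$ as the average number of nodes reachable from $S$ across those same snapshots for every evaluation. For a fixed set of snapshots, each reachability count is monotone and submodular in $S$, so their average inherits both properties exactly. The incremental-update optimizations of the CIKM 2013 paper are omitted, and all names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # node -> [(out-neighbor, probability)]
Snapshot = Dict[int, List[int]]              # node -> out-neighbors on live edges


def sample_snapshots(graph: Graph, R: int, seed: int = 0) -> List[Snapshot]:
    """Draw R live-edge graphs once; each edge survives with its probability."""
    rng = random.Random(seed)
    return [
        {u: [v for v, p in nbrs if rng.random() < p] for u, nbrs in graph.items()}
        for _ in range(R)
    ]


def reachable(snapshot: Snapshot, sources: Sequence[int]) -> Set[int]:
    """Nodes reachable from the sources in one snapshot (depth-first search)."""
    seen, stack = set(sources), list(sources)
    while stack:
        u = stack.pop()
        for v in snapshot.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def estimated_spread(snapshots: List[Snapshot], seeds: Sequence[int]) -> float:
    """Average reach over the same snapshots, reused for every evaluation."""
    return sum(len(reachable(s, seeds)) for s in snapshots) / len(snapshots)


def static_greedy(graph: Graph, nodes: Sequence[int], k: int, R: int, seed: int = 0) -> List[int]:
    snapshots = sample_snapshots(graph, R, seed)
    selected: List[int] = []
    for _ in range(k):
        candidates = [v for v in nodes if v not in selected]
        best = max(candidates, key=lambda v: estimated_spread(snapshots, selected + [v]))
        selected.append(best)
    return selected


graph = {0: [(1, 0.6), (2, 0.6)], 1: [(3, 0.5)], 2: [(3, 0.5)], 4: [(5, 0.9)]}
print(static_greedy(graph, nodes=range(6), k=2, R=100))
```

Because the snapshots are drawn once, a small R already yields a consistent, submodular estimate; the conventional greedy instead redraws randomness for every candidate.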

Influence maximization: StaticGreedy results

[Figure: $d_{k,R}$ versus $\log R$; running time on several datasets, with our method highlighted]

• The number of Monte Carlo simulations decreases by 2–3 orders of magnitude

• Running time decreases by 3–4 orders of magnitude

Cheng et al., CIKM 2013

Influence maximization: Summary

• Previous research focuses on designing scalable and accurate algorithms for influence maximization
– Limited by the influence spread model
– Limited by unknown interpersonal influence [Wang et al., AAAI 2015]

• For further research on influence maximization
– Fully data-driven
  • e.g., select seed nodes from historical cascades
– Without requiring an influence spread model
  • End-to-end, without caring about the spread path of influence

Part III

Collective credit allocation in science [Shen et al., PNAS 2014]

Credit allocation

Multi-author papers dominate scientific publishing, increasing by 7 percent every 10 years between 1900 and 2012.

Science's credit system is under pressure to evolve: the norm of credit allocation for single-author publications fails for multi-author publications.

Shen & Barabási, PNAS, 2014

Credit allocation

[Figure: first page of D. M. Meekhof, C. Monroe, B. E. King, W. M. Itano, and D. J. Wineland, "Generation of Nonclassical Motional States of a Trapped Atom," Physical Review Letters 76(11), 1796 (1996)]

2012 Nobel Prize-winning paper in Physics




Credit allocation: 1984 Nobel Prize-winning paper in Physics


Alphabetic author list.

Credit allocation

Problem: How to allocate credit for multi-author publications?

Challenges:

1. Multiple authorship breaks the symmetry between an author's contribution and the credit received for it.

2. It is hard to quantify the actual contribution of each author, especially for those outside the particular research field.

3. Each discipline runs its own informal credit allocation system.


Credit allocation

Case A: 2010 Nobel Prize in Chemistry
Baba, Negishi, J. Am. Chem. Soc. 98, 6729 (1976)

Case B: 2010 Nobel Prize in Physics
Novoselov, Geim, Science 306, 666 (2004)


Case study:

Frequently co-cited papers:

Case A
1. Negishi, Okukado, King, Van Horn, Spiegel, J. Am. Chem. Soc. (1978)
2. Negishi, King, Okukado, J. Org. Chem. (1977)
3. Negishi, Van Horn, J. Am. Chem. Soc. (1977)

5. Negishi, Valente, Kobayashi, J. Am. Chem. Soc. (1980)

Case B
1. Geim, Novoselov, Nature (2007)
2. Novoselov, Jiang, Schedin, Booth, Khotkevich, Morozov, Geim, PNAS (2005)
3. Novoselov, Geim, Morozov, Jiang, Katsnelson, Grigorieva, Dubonos, Firsov, Nature (2005)
4. Castro Neto, Guinea, Peres, Novoselov, Geim, Rev. Mod. Phys. (2009)
5. Ferrari, Meyer, Scardaci, Casiraghi, Lazzeri, Mauri, Piscanec, Jiang, Novoselov, Roth, Geim, Phys. Rev. Lett. (2006)

Credit allocation

Co-cited papers, co-citation strength $s$, credit allocation matrix $A$

Credit share: $c = As$


Prior for credit allocation

Any credit allocation method for a single multi-author paper could be taken as a prior, e.g., fractional credit allocation or harmonic credit allocation.
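The two priors named above assign each author of a single paper a share of its credit. A minimal Python sketch, assuming the common definitions: fractional credit gives each of the $n$ authors $1/n$; harmonic credit gives the author in position $k$ the share $(1/k)/\sum_{j=1}^{n} 1/j$. Either scheme could replace the $1/n_j$ entries of the credit allocation matrix $A$; `prior_share` and the other names are hypothetical.

```python
from fractions import Fraction
from typing import Callable, List, Sequence


def fractional_credit(n: int) -> List[Fraction]:
    """Equal split: every author gets 1/n."""
    return [Fraction(1, n)] * n


def harmonic_credit(n: int) -> List[Fraction]:
    """Author in position k (1-based) gets (1/k) / (1 + 1/2 + ... + 1/n)."""
    h_n = sum(Fraction(1, k) for k in range(1, n + 1))
    return [Fraction(1, k) / h_n for k in range(1, n + 1)]


def prior_share(
    authors: Sequence[str], author: str, scheme: Callable[[int], List[Fraction]]
) -> Fraction:
    """Entry of A for one author and one paper under the chosen prior."""
    if author not in authors:
        return Fraction(0)
    return scheme(len(authors))[list(authors).index(author)]


print(harmonic_credit(3))  # [Fraction(6, 11), Fraction(3, 11), Fraction(2, 11)]
print(prior_share(["X", "Y", "Z"], "Y", fractional_credit))  # 1/3
```

Exact fractions make it easy to confirm that each scheme distributes exactly one unit of credit per paper.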

Credit allocation

Case A: 2010 Nobel Prize in Chemistry
Baba, Negishi, J. Am. Chem. Soc. 98, 6729 (1976)

Case B: 2010 Nobel Prize in Physics
Novoselov, Geim, Science 306, 666 (2004)


Case revisiting:

Case A
1. Negishi, Okukado, King, Van Horn, Spiegel, J. Am. Chem. Soc. (1978)
2. Negishi, King, Okukado, J. Org. Chem. (1977)

5. Negishi, Valente, Kobayashi, J. Am. Chem. Soc. (1980)

Case B
1. Geim, Novoselov, Nature (2007)
2. Novoselov, Jiang, Schedin, Booth, Khotkevich, Morozov, Geim, PNAS (2005)
3. Novoselov, Geim, Morozov, Jiang, Katsnelson, Grigorieva, Dubonos, Firsov, Nature (2005)
4. Castro Neto, Guinea, Peres, Novoselov, Geim, Rev. Mod. Phys. (2009)
5. Ferrari, Meyer, Scardaci, Casiraghi, Lazzeri, Mauri, Piscanec, Jiang, Novoselov, Roth, Geim, Phys. Rev. Lett. (2006)

Case A credit share (Baba, Negishi): (0.28, 0.72)
Case B credit share (Novoselov, Geim): (0.5, 0.5)

Datasets

• American Physical Society (APS)
– Period: 1893–2009
– Papers and citations from all 11 journals of APS
– 463,348 papers, 4,710,547 citations, and 248,738 authors

• Web of Science (WOS)
– Period: 1955–2012
– Multidisciplinary
– 37,553,657 papers, 672,321,250 citations, and 8,724,394 authors

Credit allocation

Datasets: APS (American Physical Society), WOS (Web of Science)

Nobel Prize-winning papers

Validation

Metric: whether our method can identify the Nobel laureates from the author list.

We are correct in 51 of 63 test cases. One-hit situation: first author: 30; last author: 32; our method: 56.

Middle-author laureates

Credit allocation: Credit share evolution

$c_a$: average credit share over the 3 years after the Nobel Prize; $c_b$: average credit share over the 3 years before the Nobel Prize; increase ratio: $c_a / c_b$

To track credit share evolution, we use only the citations made before each year for credit allocation.

Nobel Prize effect
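The increase ratio $c_a / c_b$ compares an author's average credit share in a 3-year window after a reference year with the average in the 3 years before it, where each yearly share is computed only from citations made before that year. A minimal Python sketch; excluding the reference year itself from both windows is an assumption, the yearly shares are invented for illustration, and `increase_ratio` is a hypothetical name.

```python
from typing import Dict


def increase_ratio(share_by_year: Dict[int, float], ref_year: int, window: int = 3) -> float:
    """c_a / c_b around ref_year; years missing from share_by_year are skipped."""
    before = [share_by_year[y] for y in range(ref_year - window, ref_year) if y in share_by_year]
    after = [share_by_year[y] for y in range(ref_year + 1, ref_year + window + 1) if y in share_by_year]
    if not before or not after:
        raise ValueError("need at least one year on each side of ref_year")
    c_b = sum(before) / len(before)
    c_a = sum(after) / len(after)
    return c_a / c_b


shares = {2007: 0.40, 2008: 0.40, 2009: 0.40, 2010: 0.50, 2011: 0.60, 2012: 0.60, 2013: 0.60}
print(round(increase_ratio(shares, 2010), 3))  # 1.5
```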

Credit allocation: Comparing independent authors

Three independent papers (six scientists) contributed to the theoretical prediction of the Higgs boson.

Who gets the Nobel Prize, i.e., who gets high credit from the Nobel committee?

Credit allocation

"I really rather hoped before the announcement that they would make the number up to three, and there was certainly an obvious candidate to be the third, Tom Kibble"

(Peter Higgs, BBC interview, 2014)

Higgs

Kibble

Englert, Brout

Guralnik, Hagen

Comparing independent authors

Higgs & Englert

Kibble

• We developed a method to quantify the credit share of coauthors by reproducing the collective credit allocation process informally used by the scientific community.
– Credit is allocated among coauthors based on their perceived contribution rather than their actual contribution.
– Established scientists receive more credit than their junior collaborators from their coauthored publications.
  • This situation can change, however, if the junior coauthor makes important independent contributions to the field.
– Credit share is a dynamic quantity that changes with the evolution of the field.

Credit allocation: Summary

• Credit is allocated by the whole community rather than by the coauthors themselves.

• Citation is the most elementary form of visibility and credit in the scientific community.
– Other tokens of impact, like invited talks, keynotes, mentoring, and books, implicitly alter the credit share by enhancing an author's visibility and citations relative to other coauthors.

• How to choose the most appropriate collaborators? You know.

Credit allocation: What can we learn from this?

Do you want to know the credit share you get from your paper? shenhuawei@gmail.com

Acknowledgements

Xueqi Cheng

Suqi Cheng, Yongqing Wang, Peng Bao, Junming Huang

Albert-László Barabási

Tong Man, Bingjie Sun

Related publications from our group

1. Hua-Wei Shen, Albert-László Barabási. Collective credit allocation in science. PNAS, 111(34): 12325–12330, 2014.
2. Huawei Shen, Dashun Wang, Chaoming Song, Albert-László Barabási. Modeling and predicting popularity dynamics via reinforced Poisson process. AAAI 2014.
3. Dashun Wang, Chaoming Song, Hua-Wei Shen, Albert-László Barabási. Response to Comment on "Quantifying long-term scientific impact". Science, 345: 149, 2014.
4. Jinhua Gao, Huawei Shen, Shenghua Liu, Xueqi Cheng. Modeling and predicting retweeting dynamics via a mixture process. WWW 2016.
5. Yongqing Wang, Huawei Shen, Shenghua Liu, Xueqi Cheng. Learning user-specific latent influence and susceptibility from information cascades. AAAI 2015.
6. Suqi Cheng, Huawei Shen, Junming Huang, Wei Chen, Xueqi Cheng. IMRank: influence maximization via finding self-consistent ranking. SIGIR 2014.
7. Suqi Cheng, Huawei Shen, Junming Huang, Guoqing Zhang, Xueqi Cheng. StaticGreedy: solving the scalability-accuracy dilemma in influence maximization. CIKM 2013.
8. Peng Bao, Hua-Wei Shen, Junming Huang, Xue-Qi Cheng. Popularity prediction in microblogging network: A case study on Sina Weibo. WWW 2013.
9. Tong Man, Huawei Shen, Shenghua Liu, Xiaolong Jin, Xueqi Cheng. Predict anchor links across social networks via an embedding approach. IJCAI 2016.
10. Hao Wang, Hua-Wei Shen*, Xue-Qi Cheng. Scientific credit diffusion: Researcher level or paper level? Scientometrics, 2016.

Huawei Shen

Thank you!
