The Good, the Bad and the Ugly - Uncovering Novel Opportunities of Data Science. Huan Liu, Arizona State University, Data Mining and Machine Learning Lab. DFC2016, Nov 13.


Page 1: The Good, the Bad and the Ugly - Uncovering Novel Opportunities of Data Science

Huan Liu

Uncovering Novel Opportunities, Arizona State University, Data Mining and Machine Learning Lab, DFC2016, Nov 13

Page 2

http://dmml.asu.edu/smm/

Page 3: Big Data Challenges Traditional Thinking

•  Data is ubiquitous and can only become bigger
•  Big data is not just big
   – Transforming how we live, work, and think
•  Big data makes many tasks easier and better
•  An example of big mobile data
   – Using GPS to guide our travel today vs. not so long ago
•  Opportunities are where challenges are

Page 4: Traditional Media and Data

Broadcast Media: One-to-Many

Communication Media: One-to-One

Traditional Data

Page 5: Some Challenges in Understanding Social Media

•  Noise-Removal Fallacy
   – Can we remove noise without losing much information?
•  Studying Distrust (the Implicit) in Social Media
   – Where to find the invisible distrust?
•  Big-Data Paradox
   – Lack of data with big social media data
•  Evaluation Dilemma
   – Where is ground truth? How to evaluate without it?
•  Data Sampling Bias and Its Mitigation
   – Often we get a small sample of (still big) data. Would that data suffice to obtain credible findings?

Page 6: The Good, the Bad, and the Ugly of Social Media Data

•  The good
   – Social media data is big and linked
•  The bad
   – Social media data is noisy and short of data where it is most needed
•  The ugly
   – Social media data is heterogeneous, partial, and asymmetrical

Two Illustrative Cases for Novel Challenges: (1) Removing noise, and (2) Inferring the implicit

Page 7: Removing Noise - a First Task in Data Mining

•  We often hear that "99% of Twitter data is useless."
   – "Had eggs, sunny-side-up, this morning"
   – Can we remove noise as we usually do in data mining (DM)?
•  What is left after noise removal?
   – Twitter data can be rendered useless after conventional noise removal
•  As we are certain there is noise in data, should we remove it?
   – If yes, how?
•  A new challenge: Feature selection with linked data

Page 8: Social Data and Feature Selection

•  High-dimensional social media data poses unique challenges to data mining tasks
•  Feature selection has been widely used to prepare large-scale, high-dimensional data for effective data mining
•  Traditional feature selection algorithms deal with only "flat" data (attribute-value data).
•  We now can take advantage of linked data for feature selection
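
To make the notion of "flat" feature selection concrete, here is a minimal sketch of a filter method that ranks the columns of an attribute-value table by their absolute Pearson correlation with the label. The scoring rule, the function names, and the toy data are illustrative assumptions; the slides do not say which traditional algorithm is meant. Note that nothing in it can use links between posts or users.

```python
from math import sqrt

def correlation_scores(X, y):
    """Score each column of a flat table X by |Pearson correlation| with y."""
    n, m = len(X), len(X[0])
    y_mean = sum(y) / n
    y_var = sum((v - y_mean) ** 2 for v in y)
    scores = []
    for j in range(m):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_var = sum((v - c_mean) ** 2 for v in col)
        cov = sum((col[i] - c_mean) * (y[i] - y_mean) for i in range(n))
        # Constant columns carry no information, so they score 0.
        ok = c_var > 0 and y_var > 0
        scores.append(abs(cov) / sqrt(c_var * y_var) if ok else 0.0)
    return scores

def top_k_features(X, y, k):
    """Return the indices of the k highest-scoring features."""
    scores = correlation_scores(X, y)
    return sorted(range(len(scores)), key=lambda j: -scores[j])[:k]

# Hypothetical toy data: 6 posts x 4 features; feature 2 copies the label.
X = [[1, 0, 1, 3],
     [0, 1, 1, 3],
     [1, 1, 1, 3],
     [0, 0, 0, 3],
     [1, 0, 0, 3],
     [0, 1, 0, 3]]
y = [1, 1, 1, 0, 0, 0]
selected = top_k_features(X, y, k=1)
```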

Page 9: Representation for Social Media Data

Social Context

Page 10: New Problem Statement of Feature Selection

•  Given labeled data X and its label indicator matrix Y, the dataset F, its social context including user-user following relationships S and user-post relationships P,
•  Select k most relevant features from m features for dataset F with its social context S and P
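
The inputs named in this problem statement can be pictured as one small container. This is only a sketch: the class name `LinkedData`, the use of Python sets of pairs for S and P, and the validation rules are assumptions made for illustration, not the authors' data format.

```python
from dataclasses import dataclass

@dataclass
class LinkedData:
    X: list  # n x m feature rows, one row per labeled post
    Y: list  # n x c label indicator rows (assumed one-hot)
    S: set   # user-user following: (u, v) means user u follows user v
    P: set   # user-post: (u, i) means user u wrote post i

    def validate(self):
        """Check that the four parts are mutually consistent; return (n, m)."""
        n, m = len(self.X), len(self.X[0])
        assert all(len(row) == m for row in self.X), "ragged feature rows"
        assert len(self.Y) == n, "one label row per post"
        assert all(sum(row) == 1 for row in self.Y), "label rows must be one-hot"
        assert all(0 <= i < n for _, i in self.P), "post index out of range"
        return n, m

# Hypothetical toy instance: 3 posts, 3 features, 2 classes, 2 users.
data = LinkedData(
    X=[[1, 0, 2], [0, 1, 0], [1, 1, 0]],
    Y=[[1, 0], [0, 1], [1, 0]],
    S={("u1", "u2")},
    P={("u1", 0), ("u1", 1), ("u2", 2)},
)
n_posts, n_features = data.validate()
```

The task is then to pick k of the m feature columns using S and P in addition to X and Y.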

Page 11: How to Use Link Information

•  Would the additional (i.e., link) information be useful for feature selection?
•  Some technical challenges
   – Relation extraction: What are distinct relations that can be extracted from linked data
   – Mathematical representation: How to use these relations in feature selection formulation
•  Are there theories to guide us in generating hypotheses?

Page 12: Social Theories Guided Research

•  Social correlation theories suggest that the four relations may affect the relationships between posts
•  Social correlation theories
   – Homophily: People with similar interests are more likely to be linked
   – Influence: People who are linked are more likely to have similar interests
•  Guided by theories, we turn social relations into hypotheses for investigation

Page 13: Relation Extraction

1.  CoPost
2.  CoFollowing
3.  CoFollowed
4.  Following
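
The slide lists only the four relation names. The sketch below shows how such relations might be extracted from the following relationships S and the user-post relationships P, under assumed definitions inferred from the names: CoPost (two posts by the same user), CoFollowing (two users who follow a common user), CoFollowed (two users followed by a common user), and Following (one user follows another). The function names and toy data are hypothetical.

```python
from itertools import combinations

def co_post(P):
    """Pairs of posts written by the same user (assumed definition)."""
    by_user = {}
    for user, post in P:
        by_user.setdefault(user, set()).add(post)
    return {pair for posts in by_user.values()
            for pair in combinations(sorted(posts), 2)}

def co_following(S):
    """Pairs of users who follow at least one common user (assumed)."""
    followers = {}
    for u, v in S:
        followers.setdefault(v, set()).add(u)
    return {pair for users in followers.values()
            for pair in combinations(sorted(users), 2)}

def co_followed(S):
    """Pairs of users followed by at least one common user (assumed)."""
    followees = {}
    for u, v in S:
        followees.setdefault(u, set()).add(v)
    return {pair for users in followees.values()
            for pair in combinations(sorted(users), 2)}

def following(S):
    """Directed pairs (u, v) where u follows v."""
    return set(S)

# Hypothetical toy network: a and b follow c; c follows d and e.
S = {("a", "c"), ("b", "c"), ("c", "d"), ("c", "e")}
P = {("a", 0), ("a", 1), ("b", 2)}
relations = {
    "CoPost": co_post(P),
    "CoFollowing": co_following(S),
    "CoFollowed": co_followed(S),
    "Following": following(S),
}
```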

Page 14: Evaluation Results on Digg

Page 15: Evaluation Results on Digg

Page 16: Summary

•  We evaluate if link information can be used for feature selection and understand how it works
   – Link information can help feature selection for social media data, in particular, when we are short of data
•  Unlabeled data is more common in social media, so unsupervised learning is more sensible, but also more challenging

Page 17: Inferring the Implicit - Second Case

•  Both trust and distrust (positive and negative info) help decision makers reduce the uncertainty and risk associated with decisions
•  Distrust may play an equally, if not more, critical role as trust does in decision making
•  Distrust is new in Social Media Analysis
   – Asymmetry of information available (like vs. dislike)
•  Distrust is, however, not new in Social Sciences
   – Various definitions of distrust in Social Sciences

Page 18: Two Theories of Distrust from Social Sciences

•  Distrust is the negation of trust
   – Low trust is equivalent to high distrust
   – The absence of distrust means high trust
   – The lack of study of distrust matters little
•  Distrust is a new dimension of trust
   – Trust and distrust are two separate concepts
   – Trust and distrust can co-exist
   – A study ignoring distrust would yield an incomplete estimate of the effect of trust

Page 19: Challenges in Studying Distrust in Social Media

•  Challenge 1: Lack of computational understanding of distrust with social media data
   – Social media data is based on passive observations
   – Lack of some information that social sciences conventionally use to conduct studies
•  Challenge 2: Distrust information is usually not publicly available
   – Trust is desired while distrust is not for open online social platforms

Page 20: Computational Understanding of Distrust

•  Design computational tasks to help understand distrust with passively observed social media data
   – Q1: Is distrust the negation of trust? Yes or No?
   – Q2: Is there any value of distrust after Q1 is answered? If distrust is a new dimension of trust, what is the added value of distrust?
•  How can we use social media data to computationally answer the two questions?

Page 21: Task 1: Is distrust the negation of trust?

•  If distrust is the negation of trust, or low trust is equivalent to distrust, distrust should be predictable using trust information

IF distrust is equivalent to low trust, THEN predicting distrust is equivalent to predicting low trust

Page 22: Evaluation of Task 1

•  The performance of using low trust for distrust is consistently worse than randomly guessing
•  Task 1: Since it fails to predict distrust with only trust, distrust is not the negation of trust

dTP: It uses trust propagation to calculate trust scores for pairs of users
dMF: It uses the matrix factorization based predictor to compute trust scores for pairs of users
dTP-MF: It is the combination of dTP and dMF using OR
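
The Task 1 procedure can be sketched as follows: each predictor produces trust scores for user pairs, the lowest-scoring pairs are taken as predicted distrust, and the combined predictor takes the union (OR) of the two predictions. The trust scores below are made-up numbers standing in for the outputs of trust propagation and matrix factorization, and the cut-off `n_predict` and the precision measure are assumptions; the slide does not give them.

```python
def lowest_trust_pairs(trust_scores, n_predict):
    """Treat the n_predict pairs with the lowest trust scores as distrust."""
    ranked = sorted(trust_scores, key=lambda pair: (trust_scores[pair], pair))
    return set(ranked[:n_predict])

def precision(predicted, actual):
    """Fraction of predicted pairs that are true distrust pairs."""
    return len(predicted & actual) / len(predicted) if predicted else 0.0

# Hypothetical trust scores from two predictors (stand-ins for dTP and dMF).
tp_scores = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.4, ("c", "d"): 0.7}
mf_scores = {("a", "b"): 0.8, ("a", "c"): 0.3, ("b", "c"): 0.2, ("c", "d"): 0.6}
actual_distrust = {("c", "d")}  # hypothetical ground truth

d_tp = lowest_trust_pairs(tp_scores, n_predict=1)
d_mf = lowest_trust_pairs(mf_scores, n_predict=1)
d_tp_mf = d_tp | d_mf  # OR combination: flagged by either predictor

score = precision(d_tp_mf, actual_distrust)
```

In this toy case the distrusted pair has a fairly high trust score, so ranking by low trust misses it entirely.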

Page 23: Task 2: Is there any added value of distrust?

•  If distrust has any added value, we should predict trust better with distrust
•  To verify the above statement, we define the second computational task involving distrust
   – Incorporating distrust in trust prediction

[Diagram labels: Old Trust, New Trust, Distrust; Trust Prediction]

Page 24: Evaluation of Distrust in Trust Propagation

•  Incorporating distrust propagation can improve the performance of trust measurement
•  One step distrust propagation usually outperforms multiple step distrust propagation

[Figure: PA performance versus x%]
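
The difference between one-step and multiple-step distrust propagation can be illustrated with one common formulation from the trust-propagation literature, simplified here to direct propagation only. With trust matrix T and distrust matrix D, trust-only propagation uses T^k, one-step distrust uses T^k (T - D), and multiple-step (propagated) distrust uses (T - D)^k. This is an assumed formulation for illustration; the slides do not state which propagation model was evaluated.

```python
def matmul(A, B):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matsub(A, B):
    """Element-wise difference A - B."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def matpow(A, k):
    """k-th power of a square matrix (k >= 0)."""
    n = len(A)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, A)
    return result

def trust_only(T, k):
    return matpow(T, k)

def one_step_distrust(T, D, k):
    # Distrust is applied once, after k steps of trust propagation.
    return matmul(matpow(T, k), matsub(T, D))

def propagated_distrust(T, D, k):
    # Distrust is propagated at every step alongside trust.
    return matpow(matsub(T, D), k)

# Hypothetical network: user 0 trusts user 1; user 1 distrusts user 2.
T = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
D = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]

without_distrust = trust_only(T, 2)[0][2]
with_one_step = one_step_distrust(T, D, 1)[0][2]
```

Here trust alone says nothing about the pair (0, 2), while one-step distrust yields a negative score: user 0 inherits the distrust held by a user they trust.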

Page 25: Experimental Settings for Task 2

•  x% of pairs of users with trust relations are chosen as old trust relations and the remaining as new trust relations, denoted $A_T^n$
•  Task 2 predicts a set of user pairs $P$ as new trust relations
•  The performance is computed as

$$PA = \frac{|A_T^n \cap P|}{|A_T^n|}$$
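
The evaluation setting on this slide, splitting trust relations into old and new and scoring a predicted set P by the fraction of new trust relations it recovers, can be sketched as below. The deterministic split by sorted order is an assumption made to keep the example reproducible; the slide only says that x% of the pairs are "chosen". Function names and data are hypothetical.

```python
def split_old_new(trust_pairs, x_percent):
    """Take x% of trust pairs as old relations, the rest as new relations."""
    ordered = sorted(trust_pairs)  # assumed ordering, for reproducibility
    n_old = len(ordered) * x_percent // 100
    return set(ordered[:n_old]), set(ordered[n_old:])

def prediction_accuracy(new_trust, predicted):
    """PA: size of (new trust intersect predicted) over size of new trust."""
    if not new_trust:
        raise ValueError("no new trust relations to evaluate against")
    return len(new_trust & predicted) / len(new_trust)

# Hypothetical trust relations and a hypothetical predicted set P.
trust_pairs = {("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")}
old_trust, new_trust = split_old_new(trust_pairs, x_percent=50)
predicted = {("b", "c"), ("a", "d")}

pa = prediction_accuracy(new_trust, predicted)
```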

Page 26: Findings from Understanding Distrust

•  Distrust presents distinct properties
   – Properties of trust cannot be extended to distrust
•  Distrust is not the negation of trust
   – Low trust fails to predict distrust
•  Distrust has added value over trust
   – Distrust helps improve trust prediction performance
•  However, distrust information is usually not available on a social networking site
•  Next task - discovering negative links like distrust

Page 27: Some Challenges in Understanding Social Media

•  Noise-Removal Fallacy
   – Can we remove noise without losing much information?
•  Studying Distrust in Social Media
   – Where to find the invisible distrust?
•  Big-Data Paradox
   – Lack of data with big social media data
•  Evaluation Dilemma
   – Where is ground truth? How to evaluate without it?
•  Sampling Bias and Its Mitigation
   – Often we get a small sample of (still big) data. Would that data suffice to obtain credible findings?

Page 28: Repositories and Recent Books

•  scikit-feature – an open source feature selection repository in Python
•  Social Computing Repository
•  Some books available as free download

Page 29: THANK YOU and DFC2016

•  For this opportunity to share our research
•  Acknowledgments
   – Grants from NSF, ONR, and ARO
   – DMML members and project leaders
   – Collaborators

Search "Huan Liu" for more information or at http://www.public.asu.edu/~huanliu

H Liu, F Morstatter, J Tang, and R Zafarani. "The good, the bad, and the ugly: uncovering novel research opportunities in social media mining", in Trends of Data Science, International Journal on Data Science and Analytics, Springer International Publishing Switzerland. September, 2016. DOI 10.1007/s41060-016-0023-0