Creative Industries Faculty, Queensland University of Technology
Extracting meaningful information from Social Network streams for crisis mapping
Avijit Paul (n8459941)
Stage 2 Proposal, Doctor of Philosophy
May 2012





 

“Extracting  meaningful  information  from  Social  Network  streams  for  Crisis  Mapping”  Avijit  Paul  –  n8459941  –  PhD  -­‐  Stage  2  Proposal  -­‐  [email protected]  

 


Table  of  Contents  

1. The Proposed Title
2. The Proposed Supervisors and their Credentials
   Principal Supervisor: Associate Professor Dr. Axel Bruns
   Associate Supervisor: Associate Professor Dr. Dian Tjondronegoro
   Associate Supervisor: Dr. Oksana Zelenko
3. Background and Literature Review
   Keywords
   Research Domain
   3.1 Introductory Statement
   3.2 Literature Review
      New Media & Communication Studies
         Crisis Communication and Social Media
      Twitter Analytics
         Contextual Analysis
         Computational Linguistics
      Information Design
         Visual Analytics
      Early Detection
   3.3 Research Problem
      Central Research Problem: How to extract and present useful information from Social Media stream during crisis time?
      Sub Problem 1: How to identify what is useful information?
      Sub Problem 2: How to capture selected data from Social Media Stream?
      Sub Problem 3: How to extract and analyse captured data in real time to find useful information
      Sub Problem 4: How to present the information to stakeholders
4. Program And Design Of The Research Investigation
   4.1 Objectives, Methodology and Research Plan
   4.2 Resources and Funding Required
      Books and journals required
   4.3 Individual Contribution to the Research Team
   4.4 Timeline of Completion of the Program
5. Reference List
6. Appendix
   6.1 Coursework

 


1.    The  Proposed  Title    

Extracting  meaningful  information  from  Social  Network  streams  for  Crisis  Mapping  

2.    The  Proposed  Supervisors  and  their  Credentials    

Principal  Supervisor:  Associate  Professor  Dr.  Axel  Bruns    

Dr. Axel Bruns is an Associate Professor in the Creative Industries Faculty at Queensland University of Technology (QUT) in Brisbane, Australia, and a Chief Investigator in the ARC Centre of Excellence for Creative Industries and Innovation (cci.edu.au). He is the author of Blogs, Wikipedia, Second Life and Beyond: From Production to Produsage (2008) and Gatewatching: Collaborative Online News Production (2005), and the editor of Uses of Blogs with Joanne Jacobs (2006; all released by Peter Lang, New York). In addition to developing metrics to analyse and map Twitter data, in recent years he has published a wide range of research in the area of Social Networks and Crisis Communication, including topics such as “Twitter and Crises” and “Twitter and Disaster Resilience”.

 

Associate  Supervisor:  Associate  Professor  Dr.  Dian  Tjondronegoro      

Dr. Dian Tjondronegoro is an Associate Professor at QUT, researching and teaching in the area of “Mobile and Multimedia Technologies”. Dr. Tjondronegoro leads the “Mobile Multimedia Research Group” and teaches in the area of “Mobile Devices and Mobile Application Development”. Of specific significance to this project is his expertise in extracting semantic content from video using audiovisual features. Previously, Dr. Tjondronegoro examined cross-media content tagging and clustering of text, image, and video to support the extraction of semantically related web content.

 

Associate  Supervisor:  Dr.  Oksana  Zelenko      

Dr. Oksana Zelenko is a researcher in the Creative Industries Faculty at QUT. Her research focuses on the role of visual and interaction design in the field of mental health promotion for children and young people. Her previous design work included researching and developing online visual counseling tools that are currently in use by one of Australia's largest youth counseling organisations. In addition, Dr. Zelenko has demonstrated expertise in the area of information design for community resilience and organisational communication.


3.    Background  and  Literature  Review    

During recent natural disasters (e.g., the Queensland floods of 2010-2011 and the earthquake, tsunami and nuclear crisis in Japan in 2011), millions of status updates appeared on various social networks, indicating that people’s reliance on social media at times of disaster has increased tremendously in recent years. The greatest concern in harvesting information from social network users for emergency services, however, is the uncertain credibility of the content received. At present it is highly problematic to differentiate between information with a high degree of crisis-relevance and information with a very low degree of crisis-relevance. Prior research by Bruns (2011) and Potts et al. (2011) shows that certain methods, such as following keywords and hashtags in publicly available Twitter data, make it possible to identify information related to a specific crisis in progress and to extract meaningful information from these status updates, or tweets.

 

However, as tweets are produced and disseminated extremely quickly, there is the very practical problem of filtering the highly useful information stream from non-relevant tweets (Boulos et al., 2011). This is not simply an inconvenience; it poses a significant challenge that, if resolved, can mean the difference between decisions that save lives and decisions that cost them.

 

This concern is compounded by the complex task of appropriately disseminating the crisis-relevant information harvested by filtering social media streams to the multiple government disaster relief agencies (DCS, 2011) and Non-Government Organisations (NGOs) whose relief capacities, resources and decisions would benefit greatly from such information. Additionally, as the state and the value of information change constantly during a crisis, information representation techniques need improvement in order to present temporal data in an actionable manner. The literature demonstrates a gap in current approaches to presenting such information to these stakeholders.

 

Therefore this project will address some of the issues that surround the management and the dynamic state of an unfolding disaster by extracting high-value, context-specific and chronologically framed disaster-based information. Through a process of digitally harvesting and categorising social media conversation streams, this project also seeks to deliver both a framework and a system that will facilitate key decision-making processes during times of natural disaster.

 

 

 


Keywords    

Natural  Disaster,  Flood,  Earthquake,  Social  Network  Analysis,  Twitter  Analytics,  Big  data,  

Visualisation,  Information  Retrieval,  Text  Mining,  Machine  Learning,  Natural  Language  Processing.  

   

Research  Domain    

This research utilizes an interdisciplinary approach that combines elements from media and communication studies, crisis communication, communication design, Twitter analytics, sentiment analysis and computational linguistics.

 

 

 

   

Image  1:  Domain  areas  of  this  research  

 

 

 

 

 

 


 

3.1    Introductory  Statement    

The first 24 hours are often the most critical time during any natural disaster and are also the period when most community harm occurs (DCS, 2011). Casualties increase due to slow response times from relief organisations, as they lack verifiable information (Meier, 2012). The Department of Community Safety (DCS) of the Queensland Government, for example, in its 2011 report entitled “All Hazards’ Information Management Program”, identified reducing response time during disaster events as a priority in order to reduce community harm (DCS, 2011) (Image 2 below).

 

 Image  2:  Enhancing  disaster  response  system  from  current  to  future  (DCS,  2011).  

 

Prior research suggests that by using crowd-sourced information from various sources, including social networks, it is potentially possible to shorten the time it takes to find information that allows a faster response (Platt, Hood, & Citrin, 2011). In recent disasters people from all over the world used social network sites to post updates on their situation and seek help. This made social media streams an extremely powerful information source during crisis events (Muralidharan, Rasmussen, Patterson, & Shin, 2011). Two sites, Facebook and Twitter, were the most popular social network sites during these acute events. However, prior research suggests that due to its “walled-garden” approach, Facebook is less accessible than Twitter for public communication (Bruns, 2012). As Twitter updates are visible even to non-registered users, and Twitter allows a user to follow another user without needing to know that person, a person can quickly follow a crisis authority during a disaster to receive real-time updates. This enables Twitter to draw on information sources and to become an information source at the same time. For this reason Twitter is the social network of choice for this research.

 


However, as updates on Twitter happen extremely quickly, keeping track of all the updates to extract useful information is a daunting task. Additionally, during a crisis different authorities require different information to act on. Selecting the relevant set of information for each authority is a challenge in harnessing the power of social media (DCS, 2011). According to the CCI Floods report by Bruns, Burgess, Crawford & Shaw (2012), tweets during a crisis can be grouped into five major categories: Information, Media Sharing, Help and Fundraising, Direct Experience, and Discussion and Reaction. Extracting and presenting tweets in such groups can provide authorities with actionable information. However, not all tweets can be grouped distinctly, and the challenge therefore remains of identifying, in real time, tweets that do not clearly fall into a certain group.
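
The grouping problem described above can be illustrated with a minimal sketch. It is not the method of the CCI Floods report or of this proposal: the five category names come from the report as cited, but the cue-word lists, the `categorise` function and the "Unclassified" label are hypothetical and chosen only for illustration. The sketch shows how a simple rule can leave a tweet unassigned when no category, or more than one, matches equally well.

```python
import re

# Category names follow the five groups cited above; the cue words are
# illustrative assumptions, not taken from any published coding scheme.
CATEGORY_CUES = {
    "Information": {"warning", "alert", "closed", "evacuate", "update"},
    "Media Sharing": {"photo", "video", "pic", "footage"},
    "Help and Fundraising": {"donate", "volunteer", "appeal", "fundraiser"},
    "Direct Experience": {"my", "our", "we", "i"},
    "Discussion and Reaction": {"thoughts", "prayers", "amazing", "terrible"},
}


def categorise(text):
    """Return the single best-matching category, or 'Unclassified'
    when nothing matches or several categories tie."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    scores = {name: len(words & cues) for name, cues in CATEGORY_CUES.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return "Unclassified"
    return winners[0]
```

A tweet such as "Road closed, new photo attached" matches one cue in each of two categories and is therefore left unclassified, which is exactly the kind of ambiguous message a real-time system would need to handle differently.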

 

Additionally, a large body of present Twitter research uses methods such as hashtag tracking to identify messages related to a specific natural disaster and to extract meaningful information from them (Bruns, Burgess, Crawford, & Shaw, 2012). However, this method of tracking via pre-defined keywords has its limitations. As most natural disasters are unpredictable events, it is difficult to guess in advance which keywords will become popular and noteworthy enough to be selected for tracking. Additionally, when a crisis happens, users may introduce new keywords or hashtags, which may take time to become noteworthy or may be abandoned again as other, similar keywords gain importance (Bruns & Liang, 2012).
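
One naive way to look for hashtags that were not known in advance is to compare hashtag counts between two consecutive time windows. The sketch below is only a baseline under stated assumptions: the function names, the `min_count` and `growth` thresholds and the idea of passing two lists of tweet texts are all hypothetical, and it makes no claim to be an established detection method.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtag_counts(tweets):
    """Count hashtags, case-insensitively, across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts


def emerging_hashtags(previous_window, current_window, min_count=3, growth=2.0):
    """Return hashtags seen at least `min_count` times in the current window
    and at least `growth` times as often as in the previous window.
    Both thresholds are arbitrary illustrative values."""
    before = hashtag_counts(previous_window)
    now = hashtag_counts(current_window)
    return sorted(
        tag
        for tag, count in now.items()
        if count >= min_count and count >= growth * max(before[tag], 1)
    )
```

Because no keyword list is supplied, a tag that first appears during the crisis can still surface, although a fixed threshold of this kind would also pick up unrelated bursts and would need the credibility and relevance checks discussed in this section.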

 

On top of that, there are plenty of rumours and false information on Twitter (Gayo-Avello, 2012), which makes information credibility one of the biggest issues for Twitter (Castillo, Mendoza, & Poblete, 2011; Gupta, Zhao, & Han, 2012). Not all messages that appear in the tweet stream are authentic. As a result, rumours and fake information during disasters often create unnecessary situations (Mendoza, Poblete, & Castillo, 2010) and contribute significantly to the irrelevant information, or noise, that needs to be eliminated in order to find information that is useful. Therefore finding information from its early ripples and grouping it together before it becomes prominent is one of the key areas of this research.

 

Furthermore, as a crisis continues, its status and conditions change, which may make earlier information irrelevant. At present most crowdsourced crisis information visualisations use some form of map to display information (Elwood, 2011). However, map data often do not portray this temporal aspect of the data. As presenting chronological information about a disaster is crucial for informed decision making at times of disaster, this is another key area of this research.

 


Therefore the primary aim of this research is to formulate new research perspectives and methods to extract and present relevant information from ongoing social media updates during natural disasters. By building a theoretical framework and an online system, this project will harvest social media conversation streams to help make life-saving decisions.

3.2    Literature  Review    

New  Media  &  Communication  Studies    

In recent years, new media such as social media have been heavily influencing the way we communicate socially and interpersonally (Baym, Zhang, & Lin, 2004). While social media have given ordinary citizens the power to broadcast their messages to a potentially unlimited number of people, the inability to identify who actually reads a message is at the same time a significant limitation. Quite often, therefore, when people tweet they have only an imagined audience in mind and hope that someone will read the tweet (Boyd & Marwick, 2011). According to prior research, this imagined audience affects how people tweet and how they balance their authenticity and reputation in the tweetverse (Boyd, 2011). As this research focuses on communication via social media, theories of new media and communication studies will be extensively reviewed. Additionally, to gain a better understanding of social media usage in crises, the literature on crisis communication will be thoroughly reviewed.

Crisis  Communication  and  Social  Media    

Prior research suggests that people have been using Twitter for spontaneous volunteerism in recent crisis situations (Starbird & Palen, 2011). When a crisis looms, ordinary citizens who are not affected move from a passive ‘everyday user’ role to a more active one (Bruns, 2011), reaching out and helping people through social media. Concepts such as ‘Voluntweeters’, self-organising online microblogging volunteer communities, have emerged in recent natural disasters without any directive or influence from governments or authorities (Starbird & Palen, 2011). And where people are unable to contribute information directly from the ground, they tend to ‘retweet’ very quickly in an effort to spread the news as fast as possible (Starbird, 2012).

 

Apart from collective behaviour phenomena, Twitter has also been used for intensified information search, social convergence in physical space, and information contagion (Starbird, Palen, Hughes, & Vieweg, 2010). As Twitter has repeatedly been shown to maintain connectivity (Bruns, 2011), provide ways to show empathy for the people involved (Sarcevic et al., 2012), streamline multi-channel communication processes, and remain readily accessible to the news media during crisis situations (Large, 2012), a thorough understanding and testing of crisis communication theories can help to create the framework needed to analyse social network data sets in real time.

Twitter  Analytics    

These two-way communications, multiplied by thousands of people, create a firehose of information (Wu, Hofman, Mason, & Watts, 2011). The Twitter firehose consists of the entire tweet stream at any given time (Dong et al., 2010). Since updates can be extremely rapid and massive in number (more than 5,000 tweets per second on Twitter alone during the Japan tsunami (Empson, 2012)), microsyntax conventions such as hashtags are particularly useful for bringing a particular topic to the forefront of an ongoing conversation (Stamberger, 2010). However, the factors that establish a keyword as a hashtag are still not well researched (Cullum, 2010). In fact, there is limited research on extracting useful information from the firehose. Furthermore, identifying keywords or hashtags alone is not enough, as various other metrics such as widely shared links, influential users and retweets can be significant and are important items to extract and analyse (Boyd, Golder, & Lotan, 2010).

 

At present the most common form of Twitter analytics is tracking keywords and hashtags (Bruns & Liang, 2012). Other analytics involve locating and profiling user IDs (Twitter handles) (Yugami, Igata, Anai, & Inakoshi, 2012), geotagging (Lee, Wakamiya, & Sumiya, 2011), URL and linkage data (Aggarwal, 2011), etc. Twitter analytics has been used for academic citation prediction (Eysenbach, 2011), for tracking temporal patterns of happiness (Dodds, Harris, Kloumann, Bliss, & Danforth, 2011) and for finding meaningful expressions of engagement (Huston, Weiss, & Benyoucef, 2011). Most, if not all, Twitter analytics are however post hoc: the data is archived first and analysed later. In the early stage of this research I will use the most appropriate of the available methods to simulate and test my hypothesis, and will develop a new method for real-time testing in the last phase of the research. This presents the first research gap: extracting meaningful and useful information from ongoing social media updates during a crisis.

Contextual  Analysis    

Even though real-time data processing can be used to extract data (Vlachos, 2011), it cannot by itself identify meaning within a given context. In order to understand meaning, a system needs to learn the rules and patterns (Valero, Gómez, & Pineda, 2009). Different methods, such as dictionary-based, rule-based and hybrid approaches, have been proposed for such pattern or named entity recognition (Song, Tjondronegoro, & Docherty, 2012; Döhling & Leser, 2011). However, limited research has combined disaster response with contextual and sentiment analysis and named entity recognition (Park, Cha, Kim, & Jeong, 2012; De Fortuny, De Smedt, Martens, & Daelemans, 2012). To be usable in picking up early disaster signals, contextual analysis can be used to find the meaning of a word in context (Maxwell, Raue, Azzopardi, Johnson, & Oates, 2012). By mining subjective expressions or opinions, it will therefore be possible to differentiate between similar words used in different contexts and to avoid creating false alarms while grouping data extracted from a social media stream (Liu, 2010).

 

Computational Linguistics

In recent years there has been growing interest in applying computational linguistics to Twitter during a crisis (Corvey, Vieweg, Rood, & Palmer, 2010), mostly to identify trending keywords (Sakaki, Toriumi, & Matsuo, 2011). It has also been used to identify problems with products and services from Twitter data (N. K. Gupta, 2011). As this research requires extensive analysis of text data in order to understand the use of words in context, computational linguistic methods applied in emergencies will be studied in order to separate noise from useful data.

 

Information  Design    

Traditionally, maps have been used to represent crisis-related data in order to identify priority areas (Tufte, 2001). However, as information changes rapidly in social networks, a map may not be the best way to present crisis information gathered from them. Furthermore, most available crisis presentation systems require extensive manual entry and monitoring to project the data onto a crisis map (Meier, 2012). Although this has proven useful, it is often time- and resource-consuming. Since every minute is important when saving lives after a natural disaster, alternative information design and presentation techniques such as fractal maps, heat maps or other non-map-based visualisation techniques will be explored. As limited research has been done on presenting data generated from such massive datasets during disasters, one of the major challenges for this research is to present real-time information extracted from social network streams in a meaningful manner.

 


Visual  Analytics    

As the number of data-driven documents and services increases rapidly, visual analytics has been gaining momentum in recent years (Bostock, Ogievetsky, & Heer, 2011). Collaborative and social visualisation techniques have also gained popularity for visualising crowd-sourced data (Heer & Agrawala, 2008; Keim et al., 2008). These visual analytic methods and processes will be studied to find how they can best be used to present data quickly and effectively in a crisis situation.

 

Early  Detection    

Prior research has suggested using traditional and social media to predict health disasters such as H1N1 (Liu & Kim, 2011). Social media have also been used for low-level prediction of natural disasters (Li, Wang, & Liu, 2011). However, once data is gathered, the vast differences in the information generated make it quite difficult to analyse in real time. On top of that, there is no established methodology to identify the time taken before a certain term becomes a trending topic: there is a methodological gap when it comes to identifying weak signals surfacing through social media streams before they become widely visible, in order to understand which keywords are likely to be important. Limited research has been conducted to identify links between social media updates and natural disaster prediction. Therefore the third area of interest is to identify probabilistically the relationship between social media updates and potential natural disasters.

3.3    Research  Problem    

Based on the preceding literature review, a central research problem and four sub-problems are identified. They are:

Central  Research  Problem:  How  to  extract  and  present  useful  information  from  Social  Media  stream  during  crisis  time?  

 

As updates happen extremely quickly in social networks, especially during a crisis, one of the most important tasks is to extract information that is useful. Even though it is possible to read through real-time social network data, the problem remains of extracting information that is useful and usable in close to real time. Additionally, the quality of information degrades over time, and current presentation techniques limit how quickly up-to-date information can be obtained. Thus, the central challenge of this thesis is to extract useful information from social media and present it with as little delay as possible.

 

Sub  Problem  1:  How  to  identify  what  is  useful  information?    

As “useful” is a relative term, the first problem to address is what counts as useful during a crisis. Prior research shows that Twitter conversations exhibit various patterns and metrics of communication, so the first challenge is to find which metrics, patterns and frameworks can identify a conversation as useful. For example, knowing who the most active users are during a disaster, and whose original messages are retweeted most, may help significantly in finding useful conversations; such measures will therefore be treated as variables. Once the variables are determined, the task is to develop hypotheses and test them on archival data before testing them in a live environment at a later stage.
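
The two example variables named above (most active users, most retweeted original posters) can be sketched as follows. This is a minimal illustration, not the proposed method: the `(author, text)` record layout, the sample messages and the function name `activity_metrics` are assumptions, and retweets are recognised only by the manual `RT @user` prefix convention.

```python
from collections import Counter
import re

# Assumption: a retweet is a message that starts with "RT @username".
RETWEET_PATTERN = re.compile(r"^RT @(\w+)")

def activity_metrics(tweets):
    """Return (messages sent per user, retweets received per original author)."""
    sent = Counter()
    retweeted = Counter()
    for author, text in tweets:
        sent[author.lower()] += 1
        match = RETWEET_PATTERN.match(text)
        if match:
            # Credit the user whose original message was retweeted.
            retweeted[match.group(1).lower()] += 1
    return sent, retweeted

# Invented sample data, for illustration only.
sample = [
    ("alice", "Bridge closed at Main St #qldfloods"),
    ("bob", "RT @alice: Bridge closed at Main St #qldfloods"),
    ("carol", "RT @alice: Bridge closed at Main St #qldfloods"),
    ("bob", "Stay safe everyone"),
]
sent, retweeted = activity_metrics(sample)
print(sent.most_common(1))       # most active user
print(retweeted.most_common(1))  # most retweeted original poster
```

Counting both measures in one pass keeps the sketch applicable to archival files and to a live stream alike.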

 

Sub  Problem  2:  How  to  capture  selected  data  from  Social  Media  Stream?    

The second problem is to capture data from the social media stream during a crisis. Various capture methods are already available and deployed, such as twapperkeeper. However, most of them look for a predetermined keyword or hashtag, or for a pre-identified user. As this research seeks information from the full firehose tweet stream, newer technologies such as Hadoop and Twitter Storm will be used to capture the social media stream. Since the available methods each have their own strengths and weaknesses, finding the right capture approach is the second issue to solve.

 

Sub Problem 3: How to extract and analyse captured data in real time to find useful information?

 

Once the capture method is identified, the next challenge is to analyse the captured data and separate the information from the noise. At this stage, the hypotheses developed under Sub Problem 1 will be applied to the data collected under Sub Problem 2. The challenge will be to identify how to filter information out of the data source by applying Twitter analytics, sentiment analysis, computational linguistics or any other necessary methods, in real time, to a live Twitter data stream.

 


Sub Problem 4: How to present the information to stakeholders?

Once usable information is extracted, the next challenge is to present it in a form that the stakeholders (authorities, communities and media) can act on. As different stakeholders require different types of information, and no single filter fits all of it, the presentation must be flexible enough for each of them to act on. The visualisation techniques identified in the literature review will be tested at this stage to find out which of them represents temporal data in chronological order most effectively.

 

4.    Program  And  Design  Of  The  Research  Investigation      

This research will be divided into four iterative phases, preceded by an initial literature review (Phase 0), that will allow me to continually develop and evaluate the whole research project (Image 3). The key phases are:

 

Phase 0: Initial literature review (first 3 months)

Phase 1: Building hypotheses and a theoretical algorithm from the literature (second 3 months)

Phase 2: Capturing real-time Twitter data using capture technologies such as Hadoop and Storm (last 6 months of first year)

Phase 3: Real-time analytics, hypothesis testing and submission for evaluation (second year)

Phase 4: Information design and creation of the crisis visualisation (initial months of third year)

 

 

   

Image  3:  Key  phases  of  the  research  design  

 

The phases are broken down below into actionable tasks, which allows moving back and forth between tasks as necessary.

 


4.1    Objectives,  Methodology  and  Research  Plan    

The objective of this research is to address the research problems identified in the earlier sections. To do so, it will use a mixed-methods approach that combines qualitative and quantitative research methods. As various methodologies are currently available for data analysis and for communication during disasters, some of the methods will use quantitative data and others qualitative data. The broad methods to be studied during this project are outlined below.

 

The first objective is to identify what is useful, which will be done by reviewing the literature in this area. The review will analyse reports, media coverage and academic writing on recent research in social networks, natural disasters, media and communication studies, crisis communication, Twitter analytics, sentiment analysis and computational linguistics. Based on these studies, variables will be identified that capture what is useful in social media conversations during a disaster.

 

This will be followed closely by the development and testing of hypotheses on how Twitter users communicate during a crisis. To do this, I will first slice the disaster-related Twitter data (QLD flood, Japan tsunami, New Zealand earthquake) gathered at CCI from twapperkeeper, using scripts developed by Axel Bruns in awk (a data extraction and reporting tool). By mapping the relationships between Twitter datasets, in terms of both disaster and social media communication, I will be able to test the hypotheses on communication during a crisis. I will then formulate approaches for extracting relevant information from a large dataset archived at QUT. Using visualisation tools such as Gephi, I will also explore other ways of presenting the information. This task will be done after submission of Stage 2.
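
To illustrate what slicing an archive involves, the sketch below selects the messages from a time window, optionally restricted to a keyword. It is written in Python and is not the awk scripts referred to above; the column names (`from_user`, `created_at`, `text`), the timestamp format and the sample rows are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

def slice_archive(rows, start, end, keyword=None):
    """Yield rows posted in [start, end) and, if a keyword is given,
    only those whose text contains it (case-insensitive)."""
    for row in rows:
        posted = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
        if not (start <= posted < end):
            continue
        if keyword and keyword.lower() not in row["text"].lower():
            continue
        yield row

# Invented sample standing in for an archive export (assumed layout).
ARCHIVE = """from_user,created_at,text
alice,2011-01-11 09:15:00,Water rising fast near the river
bob,2011-01-11 14:30:00,Evacuation centre open at the showgrounds
carol,2011-01-12 08:00:00,Power is out across the suburb
"""

reader = csv.DictReader(io.StringIO(ARCHIVE))
window = list(slice_archive(
    reader,
    start=datetime(2011, 1, 11, 12, 0, 0),
    end=datetime(2011, 1, 12, 0, 0, 0),
))
print([row["from_user"] for row in window])
```

Because the function is a generator, it reads one row at a time and so should also cope with archives too large to hold in memory.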

 

However, as natural disasters continue to occur around the world, research articles in this area are appearing rapidly. To keep abreast of these developments, the literature review will be ongoing throughout the PhD in case new variables are identified.

 

The second objective is to capture live Twitter streams so that they can be stored for future analysis. To do this, I will set up a NoSQL database (MongoDB or Couchbase) with one Hadoop cluster and one Storm cluster to store incoming Twitter streams. The target at this stage is to use the Twitter firehose as the input stream; however, as firehose access must be purchased, I will use keyword-specific input streams if I am unable to obtain it. The database and the server will initially be hosted on two cloud instances from NeCTAR, an Australian Government project conducted as part of the Super Science initiative and financed by the Education Investment Fund. The choice of database and cluster file system may change if new and improved versions are released. This task will be done between the Stage 2 submission and the confirmation seminar. It will result in a system that can capture Twitter data from the firehose in real time and that provides the basis for real-time analysis of the captured data stream.
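
The keyword-specific fallback described above can be sketched as a filter over an incoming stream. This is only an illustration under stated assumptions: the messages are plain dictionaries with a `text` field, the stream is a Python iterator rather than a Storm or Hadoop component, and the list `captured` stands in for the NoSQL store. The name `keyword_filter` and the sample data are hypothetical.

```python
import re

TOKEN = re.compile(r"\w+")

def keyword_filter(stream, keywords):
    """Yield only the messages that mention at least one tracked keyword."""
    # Normalise tracked terms: lower case, without a leading '#'.
    tracked = {k.lower().lstrip("#") for k in keywords}
    for message in stream:
        tokens = {t.lower() for t in TOKEN.findall(message["text"])}
        if tokens & tracked:
            yield message

# Invented sample stream, for illustration only.
incoming = [
    {"id": 1, "text": "Road closed due to flood water #qldfloods"},
    {"id": 2, "text": "Great coffee this morning"},
    {"id": 3, "text": "Earthquake felt in the city centre"},
]

# 'captured' stands in for the database that would store the stream.
captured = list(keyword_filter(iter(incoming), ["#qldfloods", "earthquake"]))
print([m["id"] for m in captured])
```

Matching on whole tokens rather than substrings avoids, for example, the keyword "fire" capturing every message that mentions "firehose".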

 

The third objective is to extract the useful information from this live Twitter stream. This will be done using suitable Twitter analytics methods available at that time. In addition, to understand the meaning of words in their context and so identify weak signals, I will apply contextual analysis and other computational linguistics methods. The whole system will then go through an iterative process of testing, evaluation and improvement to make it more effective. This step will use the hypotheses developed in the first phase (first objective) and the data collected in the second phase (second objective), testing initially on archival data. Depending on the results, the system will be sent to the Queensland Government’s Department of Community Safety (DCS) for assessment, and improvements will be made based on the feedback gathered. This whole process will take place during the second year of candidature.
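
One simple way to make the idea of a weak signal concrete is to flag terms that appear in the current time window far more often than in earlier windows. The sketch below is an assumed baseline heuristic, not the contextual or linguistic analysis proposed above; the function names, the thresholds `min_count` and `ratio`, and the sample messages are all illustrative.

```python
from collections import Counter
import re

TOKEN = re.compile(r"[a-z']+")

def term_counts(messages):
    """Count, for each term, the number of messages that contain it."""
    counts = Counter()
    for text in messages:
        counts.update(set(TOKEN.findall(text.lower())))
    return counts

def emerging_terms(history, current, min_count=2, ratio=2.0):
    """Return terms whose frequency in `current` is at least `ratio` times
    their smoothed average frequency over the past windows in `history`."""
    past = Counter()
    for window in history:
        past.update(term_counts(window))
    now = term_counts(current)
    flagged = {}
    for term, count in now.items():
        baseline = past[term] / len(history) if history else 0.0
        score = count / (baseline + 1.0)  # +1 smooths previously unseen terms
        if count >= min_count and score >= ratio:
            flagged[term] = score
    return sorted(flagged, key=flagged.get, reverse=True)

# Invented sample windows, for illustration only.
history = [
    ["nice day at the beach", "traffic is slow today"],
    ["lunch at the beach", "slow day"],
]
current = [
    "smoke near the highway",
    "anyone else see smoke",
    "smoke over the hills",
    "traffic is slow",
]
print(emerging_terms(history, current))
```

Common words such as "the" are not flagged because their baseline is already high, while a term that is new to the stream only needs a few mentions to stand out.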

 

The fourth objective is to present the information in a way that is useful to the stakeholders. Various presentation techniques will be tested on the extracted information to see which offers the most benefit. Since maps such as Google Maps are the most traditional way of presenting such information, the data will first be presented on a map. However, as maps are limited in their ability to show temporal data in chronological order, other information design techniques will also be tested on the extracted information at this stage. The whole process will be iterative, with feedback sought from DCS, as there are a number of ways in which the data can be presented and sampled.
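
As an illustration of preparing temporal data for a chronological presentation, the sketch below groups timestamped messages into fixed-width time bins, oldest first. It is one possible pre-processing step under assumed inputs, not a chosen visualisation technique; the `(timestamp, text)` layout, the 30-minute width and the sample messages are assumptions.

```python
from datetime import datetime, timedelta

def bin_by_interval(items, minutes=30):
    """Group (timestamp, text) pairs into fixed-width bins, oldest bin first."""
    width = timedelta(minutes=minutes)
    bins = {}
    for posted, text in sorted(items):
        # Floor the timestamp to the start of the bin that contains it.
        offset = (posted - datetime.min) % width
        bins.setdefault(posted - offset, []).append(text)
    return bins

# Invented sample messages, deliberately out of order.
items = [
    (datetime(2011, 1, 11, 9, 50), "Creek over the road"),
    (datetime(2011, 1, 11, 9, 5), "Heavy rain started"),
    (datetime(2011, 1, 11, 9, 20), "Street flooding reported"),
]
for start, texts in bin_by_interval(items).items():
    print(start.strftime("%H:%M"), texts)
```

A timeline, or a map that is redrawn bin by bin, could then be produced from these bins, which is one way to work around the limits of a static map.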

4.2    Resources  and  Funding  Required    

In the first stage I will use my own personal computer and QUT lab computers to slice data with awk scripts and to visualise it with Gephi. After that, for real-time data extraction, I will first use the free Australian Research Cloud (NeCTAR) instances that are already available to QUT students. At the same time I will submit a NeCTAR RFP stage 2 proposal to secure longer-term use of their cloud instances. If I need even larger cloud instances, I will use AWS (Amazon Web Services) and will apply for further research funding, such as the auDA (.au Domain Administration Ltd) grant, to support usage and storage on AWS or another appropriate cloud.

 


Books  and  journals  required    

As this research draws on several emerging fields, some of the relevant books and journals are still in early-access editions and are therefore not yet available through the QUT library. Where that is the case, I will ask the library to purchase them.

4.3    Individual  Contribution  to  the  Research  Team    

Although this is an individual project, it is linked to the ARC Linkage-funded project “Social Media in Times of Crisis: Learning from Recent Natural Disasters to Improve Future Strategies”, conducted in collaboration with the Queensland Department of Community Safety and the Eidos Institute. That project combines large-scale quantitative and close qualitative analysis to investigate the public use of social media during disasters, working with key emergency management organisations to improve their communication strategies. My contribution will be to build theory and a framework for what to extract, and to develop improved extraction and presentation methods for social media data streams.

 

4.4    Timeline  of  Completion  of  the  Program    

Please  refer  to  the  attached  timeline.    

 

 

 

 

 

 

 

 

 

 

 

   


PHD TIMELINE - AVIJIT PAUL

Columns: Time Elapsed (in months for 3 yr study: 3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36); Key Dates; Resource Implications; Constraints

PhD Milestones

Stage 2: 5th June 2012

Confirmation: 5th March 2013

Annual Progress: 30th Sept 2013

Final Seminar: 4th Dec 2014

Lodgement: 4th Jan 2015

Generic Capabilities

Advanced theoretical knowledge and analytical skills, as well as methodological, research design and problem-solving skills in a particular research area: Develop method; ATN More Critical and Creative Thinking

Advanced information processing skills and knowledge of advanced information technologies and other research technologies: AIRS

Independence in research planning and execution, consistent with the level of the research degree: Apply for research grant; Apply for research grant; Apply for research grant

Competence in the execution of protocols for research health and safety, ethical conduct and intellectual property: Confirm IP Arrangements; Submit Ethics Application; Complete H&S training

Skills in project management, teamwork, academic writing and oral communication: ATN Leap Communication and Leadership; ATN Leap Project Management; Grad Cert in Research Commercialisation

Awareness of the mechanisms for research results transfer to end-users, scholarly dissemination through publications and presentations, research policy, and research career planning: ATN More Critical Writing; Journal; Conference; Publication Workshop; Presentation Workshop; Conference; Journal; Commercialization exploration

Coursework

Advanced Information Retrieval Skills (IFN001 Mandatory for PhD candidates): 15th June 2012

Enquiry to Creative Industries (KKP 6601): 15th June 2012

Thesis Writing

Title & Abstract; Introduction; Literature Review; Methodology; Data Analysis - Archival Data; Data Analysis - Live Data; Data Analysis - Visual Analytics; Discussion; Conclusion

Research Process (methodology in sections)

Accessing Literature; Consider Methodologies; Hypothesis development; Real Time Capture; Implementation of Real time Analytics; Live testing with Twitter Stream (Resource Implications: Funding for large scale access to twitter data; Constraints: If unable to gain access will work with keywords); Information design; Gather Results

Approvals/Agreements/Applications

Intellectual Property; Ethics; Industry Health & safety; Scholarships; Grants in Aid; Write Up Scholarship

Outputs

Conference Papers; Journals; System Commercialization

Other entries

Meeting Final Seminar timeline; Confirmation Seminar; Develop tools; Develop skills in statistics, use of key software e.g. endnote, SPSS, AWK, STORM, Python Data analysis



 

 


6.    Appendix  

6.1    Coursework      

AIRS  Unit  –  IFN  001  

I have taken the course Advanced Information Retrieval Skills (IFN001), have submitted the assignment and am waiting for the result.

 

Approaches  to  Enquiry  In  the  Creative  Industries  -­‐  KKP601    

I have taken this course, Approaches to Enquiry in the Creative Industries, have completed the presentation and submitted the final assignment, and am waiting for the result.