datascience$in$the$21 $century$ - cra · introducmons$ • panelists:$ –...

24
Data Science in the 21 st Century chair: Barbara Ryder, Virginia Tech cochairs: Lise Getoor, UC Santa Cruz Steve Heller, Two Sigma Snowbird CRA Conference July 19 , 2016

Upload: others

Post on 10-Jun-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Data  Science  in  the  21st  Century  

chair:  Barbara  Ryder,  Virginia  Tech  co-­‐chairs:  Lise  Getoor,  UC  Santa  Cruz    

Steve  Heller,  Two  Sigma    

Snowbird  CRA  Conference  July  19  ,  2016  

Page 2: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

IntroducMons  

•  Panelists:  – David  Culler,  UC  Berkeley  – Rahel  Jhirad,  Hearst  – Rayid  Ghani,  UChicago  – Rob  Rutenbar,  UIUC  

Page 3: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Roadmap  

 •  IntroducMon  -­‐  Lise    •  Panelist  Remarks  -­‐  David,  Rahel,  Rayid,  Rob    •  Q&A  

hSp://bit.ly/cradsp  

Page 4: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

QuesMons  •  i.  What  is  data  science  and  what  is  its  relaMon  to  computer  science?      

•  ii.  What  do  students  have  to  learn  about  data  science  to  become  pracMMoners?  Should  learning  data  science  require  technical  skill,  and  if  so,  what  is  the  minimum  skill  set?    

•  iii.  What  are  the  big  research  quesMons  in  data  science?      

Page 5: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Data  Science  Panel:  IntroducMon  

Lise  Getoor  UC  Santa  Cruz  

 

Snowbird  Panel  on  Data  Science  July  19  ,  2016  

Page 6: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

What  is  Data  Science?  

Page 7: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

buzz  

Page 8: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

magic  

Page 9: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

old  

Page 10: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Defini&on:  Turning  data  into  knowledge,  and  knowledge  into  ac&on  

 

huge   heterogeneous   linked  &  noisy  

Page 11: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Data  Science  is  Interdisciplinary  

Systems:    Storage,  DB,  Security,  networking,  High-­‐

performance  compuMng    

Analy,cs:  Data  Mining,  

Machine  Learning,  Math,  StaMsMcs,  OpMmizaMon,  

Decision  Making  

User  Interac,on:  VisualizaMon,  Human  Computer  InteracMon,  

Sense  Making  

Data:  unstructured  

data,  graph  data,  text  data,  image,  

video  

EducaMon  

Health  Science  

BioinformaMcs  

Ecology   Smart  CiMes  

Social  Systems  

Astrophysics  

Personalized  Medicine  

Biology  

Marine  Science  

HumaniMes  Cybersecurity  

Environment   Business  

Page 12: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

What  do  data  scienMsts  do?  

source:  data  from  hSp://www.linkedin.com/skills   Courtesy:  Peter  Skomoroch    

Page 13: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Four  types  of  data  scienMst  

source:  "Analyzing  the  Analyzers"  O'Reilly  Media  

Page 14: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Why?  

Page 15: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Two  Views  

•  Pessimist:    No  choice  –  data  is  transforming  academic  disciplines  and  we  need  to  parMcipate  in  defining  the  discipline  of  data  science  to  not  be  lec  behind  

•  Op,mist:    Opportunity  to  develop  new  computaMonal  paradigms  for  data-­‐driven  and  model-­‐based  methods  and  to  train  students  to  do  responsible  data  science  

Page 16: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

How?  

Page 17: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Resources  •  NSF  Report  on  Data  Science  EducaMon  

–  hSp://bit.ly/DSEW2015_DracReport  •  NSF  CISE  AC  Report  on  Data  Science  

–  Outline,  expected  out  this  fall  •  NTRD  Report  on  Federal  Big  Data  Research  and  

Development  Plan  •  NAS  report  on  FronMers  of  Massive  Data  AnalyMcs,  2013  •  NSF  Workshop  Report  on  the  Social,  Econonic  and  

Workforce  ImplicaMons  of  Big  Data  AnalyMcs  and  Decision  Making  

•  ASA  Statement  on  Role  of  StaMsMcs  in  Data  Science  •  CRA  Statement  on  Data  Science    

Page 18: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Academic  Audience    

•  CS  departments  fall  into  following  categories:  1.   Leading  efforts  at  their  university  around  DS  2.  Part  of  an  integrated  effort  across  campus    3.  One  of  many  independent  efforts  across  campus  4.  Not  involved,  but  wondering  whether  to  get  

involved,  as  DS  efforts  are  being  under  taken  by  other  departments,  and  they  are  not  included  

5.  Not  involved  and  not  interested  

Page 19: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Academic  Audience    

•  CS  departments  fall  into  following  categories:  1.   Leading  efforts  at  their  university  around  DS  2.  Part  of  an  integrated  effort  across  campus    3.  One  of  many  independent  efforts  across  campus  4.  Not  involved,  but  wondering  whether  to  get  

involved,  as  DS  efforts  are  being  under  taken  by  other  departments,  and  they  are  not  included  

5.  Not  involved  and  not  interested  

Page 20: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

CRA  Statement  Goals  1.  Communicate  the  centrality  of  computer  science  research  

and  educaMon  to  the  emerging  discipline  of  data  science.    Emphasize  the  need  for  computer  science  knowledge  in  data  science  and  vice-­‐versa.  

2.  Highlight  the  need  and  opportuniMes  for  collaboraMon  around  data  science  research  between  academia  and  industry.  

3.  Commit  to  developing  a  program  of  responsible  data  science  research,  which  acknowledges  the  limitaMons  of  tradiMonal  computaMonal  and  staMsMcal  thinking  for  many  of  big  data  science  problems  and  highlights  the  importance  of  security,  privacy,  interpretability,  ethics  and  fairness.  

Page 21: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

CS  Research  OpportuniMes  

•  From  a  computaMonal  point  of  view,  data  science  requires  a  much  deeper  understanding  and  representaMon  of  how  data  is  acquired,  stored  and  accessed.    Data  lineage,  data  quality,  quality  assurance,  data  integraMon,  storage  and  security  all  need  to  be  rethought.    

•  The  tradiMonal  approach  of  acquisiMon  and  storage,  then  processing  ocen  does  not  work  for  very  high  volume/high  rate  data.    Big  data  demands  sublinear  algorithms;  along  with  this,  there  will  be  a  growing  need  for  probabilisMc,    stochasMc  and  online  approaches    

•  Many  classic  staMsMcal  assumpMons  do  not  fit  the  current  data  science  needs.      Ocen  derived  from  natural  sources,  data  is  increasingly  likely  to  be  biased,  incomplete  and  highly  heterogeneous.      Even  in  the  small  data  sejng,  we  need  new  techniques  that  can  cope  with  heterogeneity  and  biased  sampling.    causality  

•  The  challenges  in  scale  and  heterogeneity  also  fundamentally  change  how  users  interact  with  data,  how  the  data  is  visualized,  what  algorithms  are  needed  to  support  understanding  and  interpretaMon  of  the  results  of  data  science  models,  and  how  user  feedback  is  incorporated  

It  will  require  foundaMonal  new  research  into  computaMon,  systems,  machine  intelligence,  and  user  interacMon  

Data  Science  ≠  Hadoop  +  R    

Page 22: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Panel  QuesMons  i.  What  is  data  science  and  what  is  its  relaMon  to  computer  science?        ii.  What  do  students  have  to  learn  about  data  science  to  become  pracMMoners?  Should  learning  data  science  require  technical  skill,  and  if  so,  what  is  the  minimum  skill  set?      iii.  What  are  the  big  research  quesMons  in  data  science?      

hSp://bit.ly/cradsp  

Page 23: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Your  Turn  

•  At  your  university:  – What  is  being  done?  •  New  degrees,  departments,  etc.?  

– Who  is  doing  it?  • Which  departments  are  leading  the  efforts?  

– How  is  the  effort  being  supported?  • Who  is  championing  the  effort?    

Page 24: DataScience$in$the$21 $Century$ - CRA · IntroducMons$ • Panelists:$ – David$Culler,$UC$Berkeley$ – Rahel$Jhirad,$Hearst – Rayid$Ghani, UChicago$ – Rob$Rutenbar,$UIUC$

Closing  Remarks  

•  Let’s  work  together  to  share  experiences  and  resources  

•  Please  see  and  share  your  resources  here:  

 hSp://bit.ly/cradsp