building data acumen - national...

Post on 07-Aug-2020

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

BuildingDataAcumen

NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs

MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,

AssociateViceChancellorforResearchDevelopmentandAdministra=on

2

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

BuildingDataAcumen

NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs

CapstoneCourses

3

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

HistoryofCapstoneattheUniversityofGeorgia

Ø  FirstofferedinAY2007–2008,10students

Ø  PartofUGA’sWriDngIntensiveProgramsinceAY2008–2009Ø  RequiredfromAY2010–2011forallstaDsDcsmajors

Ø  PostersessionintroducedinAY2010–2011

Ø  Enrollmenthasgrownsteadily,trackinggrowthofmajor;

46studentsincurrentoffering

4

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

GoalsofCapstoneattheUniversityofGeorgia

Ø  ExposuretoadvancedstaDsDcaltechniques

Ø  PracDcecommunicaDonofstaDsDcalideas,inwriDngandorallyØ  Groupwork

Ø  VerDcalintegraDonoflearning

(faculty→graduateassistants→students)

Ø  ConsulDng/workwithclient

Ø  Workwithrealdata

Ø  Professionaldevelopment

Withgrowth,challengingtomaintainallofthese.Currentoffering:relaxingclientandgroupwork.

5

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

CapstoneattheUniversityofGeorgia:ProjectFormats

Ø  Typically,findprojectsfromresearchersaroundcampus;scopeandscalehaveincreaseddramaDcallyoverDme

Ø  Alsoprojectsfromgovernmentoffices(e.g.,USDA,CDC)

Ø  Whenclasssizewassmaller,community-basedsurveyprojectsincorporated(e.g.,Nuci’sSpace,CampusTransit)

Ø  Currentoffering:

•  experimentalindividualandgroup(“datarepository”)opDons

•  morepublicoutreachprojects(HumaneSociety,UnitedWay)

6

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

EssenFalComponentsforBuildingDataAcumen:TheData

Ø  Movingbeyondt-testsandANOVA(topicscoveredhaveincludedbootstrap,classificaDonandregressiontrees,survivalanalysis,mulDpletesDng,issuesofreproducibilityinscience...)

Ø  Hands-onpracDcebeyondtheproject

Ø  Realdata(allprojectsinvolvereal,okenlarge,datasets)

Ø  Messydata(studentsarenotguaranteedtoreceivecleandatasetsfromclients)

7

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

EssenFalComponentsforBuildingDataAcumen:TheStudent

Ø  BuildingawarenessofaprofessionalidenDtyasstaDsDciansand(morerecently)datascienDsts

Ø  CommunicaDon!MuchresistanceonthisattheDme;benefitsareseenlater(ingraduateschool,intheworkforce)

Ø  Buildingasenseofcommunityamongstudentsandinstructors,TAs(howtoscaleupfrom10to~50students?)

Ø  Challengestudentstogoaboveandbeyondtheirintellectualcomfortzones

8

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

LookingtotheFuture

Ø  ConflicDngdirecDons

•  CanexpectconDnuedgrowthinstaDsDcsanddatascienceprograms

•  Hands-onpracDcalexperiencewithrealdataisessenDal

Ø  HowtoscaleandsDllprovideausefulexperiencetostudents?

Ø  NeedforinnovaDveapproaches(e.g.team“compeDDons”,smallerworkshopgroups...others?)

Ø  Anysuchcoursewillbeaconstantworkinprogress:don’tfallintoarouDneof“We’vealwaysdoneitthisway”–thatisnolongersufficient

9

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

HowDoWe(CanWe)KnowIt’sWorking?

Ø  Internalassessments:Qualityofprojects;clientsaDsfacDon(repeatparDcipants);postersessionchecklist(completedbyaqendees–staDsDcsfacultyandgraduatestudents,clients)

Ø  Feedbackfromgraduates:What’susefulwhentheygettothenextstage?

Ø  Whatareemployersandgraduateschoolslookingfor?

10

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

BuildingDataAcumen

NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs

CapstoneCourses

Q&A

11

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

BuildingDataAcumen

MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,

AssociateViceChancellorforResearchDevelopmentandAdministra=on

NCStateUniversityDataScienceIniFaFve

(dis.ncsu.edu)

12

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

Context

•  Understanding,managing,andusingdata—okenlargeamountsofunstructureddata—isbecomingincreasinglyimportantinnearlyeveryindustry,governmentsector,andacademicdomain.

•  NothavingtheskillsandinfrastructuretoapplydatascienceandanalyDcsreliablyandcorrectlyhasbecomeamajorriskforallsectors.

13

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

Data(and)ScienceLiteracyAgooddatascienDstdoesnothavetobeacomputerscienDst,amathemaDcian,orastaDsDcian.

But….

14

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

EducaFon?

•  Whatshouldbeincludedindatasciencecurriculum,bothnowandinthefuture?

•  HowtoprioriDzeorbestconveyfordifferingtypesofdata(scienceprograms)?

•  HowcanopportuniDestoenhancedataacumen(i.e.,theabilitytomakegoodjudgmentsanddecisionswithdata)beintegratedintodatascienceeducaDonalprograms?

•  Howcandataacumenbemeasuredorevaluated?

•  etc.

15

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

DisrupFve?

Largesttaxicompaniesmaynotowntaxis?

(e.g.,Uber)

LargestaccommodaDonprovidersmaynot

ownanyaccomodaDons?(AirBnB)

etc.

16

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

TheFiveV-sof(Big)Data*

Veracity(Uncertainty,Trust,Source,InformaDon

Density,Intent,…)

Velocity(Howfastis

dataarriving?)

Value(Economics,

Health,Security,…) Variety

(RangeofData-text,

visual,numerical,locaDon…)

Volume(Howmuchdataisarriving?)

Howmuchisneeded?Kb/secMb/secGb/secTb/sec

MB,6GB,9TB,12PB,15XB,18ZB,21YB,24

Errors:EpistemicAleatoricWorkingwithanAI

RealLifeDataTypes

(*)www.ibmbigdatahub.com/infographic/four-vs-big-data

17

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

ValuePath

Wisdom&ValueAdded,Impact

Knowledge&

Valaue

(Big)Data

DataScience&AnalyDcs

ApplicaDons

Data

People

Technology

Provenance,Compliance,Security,Usability,Privacy,EthicsReliability,Trustworthiness,SenseofImpact…

e.g.,Extract,Verify,Transform,Load,Organize,…

e.g.curaDon

18

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

DataScienceWorkThatMaYersImaginebeingabletoholdageographicinformaDonsystem(GIS)inyourhands,feeltheshapeoftheearth,sculptitstopography,anddirecttheflowofwater.ResearchersatNCState’sCenterforGeospaDalAnalyDcshavemadethisnovelideaarealitywithTangibleLandscape,anopensourcetangibleinterfacepoweredbyGRASSGISthatphysicallyandinteracDvelymanifestsgeospaDaldatasothatuserscannaturallyfeelit,shapeit,andimmediatelyseeresultsprojectedontothe3-Dmodel.ThismakesGISfarmoreintuiDveandaccessibleforbeginners,empowersgeospaDalexperts,andcreatesexciDngnewopportuniDesfordatascienDstsanddevelopersalike–likegamingwithGIS.TangibleLandscapeisnowbeingappliedtotacklecomplexrealworldproblems,fromcontrollingthespreadofwildfireoremerginginfecDousdiseasestounderstandingtheimpactsofstormhazards.

19

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

CiDzenScienDstsarecomparingnearly300,000satelliteimageswiththeseexamplestohelpimprovetheglobalrecordofhurricanesandtropicalcyclones.Seehowyoucanhelpimproveourunderstandingoftropicalcyclones.Crowd-sourcing.

hqps://ncics.org/events/cics-nc-leads-the-launch-of-cyclonecenter-org/

CycloneCenter.org

20

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

Technology

Data People

Provenance,Compliance,Security,Usability,Privacy,Etc.…

EducaFon

TechnologyDistributedHPC/HPDCapableCloud

DataBus

Tools+ApplicaDons

DataInteracDonModels:•  TotalIsolaDon•  Compute-to-Data•  Data-to-Compute•  Openmodel

DataandAnalyDcsLiteracy•  ComputaDonalThinking•  Math,modeling,sokware,

datamanagement,methods•  CommunicaDons•  Domainview&relevancy•  Ethics,senseofIntegrity&

ConfidenDality(security)

Users/Actors•  Naïve2advanceuser•  Concierge•  Content/Appdeveloper•  Datamanagerand

Admin•  Toolsandsystemdev.

Framework

21

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

HuntLibrary

CreaDvityLab-Hunt

LASMersiveExperienceEB2VisualizaDonlab-Hunt

VizLab–D.H.Hill

GamesRoom-Hunt

NCStatehasExtensiveVisualizaFonFaciliFes

22

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

NCStateUniversityDataScienceIniFaFveGoals•  CoordinatedatascienceacDviDes

•  EstablishaninterdisciplinarydatasciencecurriculumtofurtherdatascienceeducaDon

•  FosterresearchcollaboraDon,bothinternalandexternal

•  IncreaseresearchfundingandcompeDDveness

•  Buildindustrypartnerships

•  Provideservices&infrastructuretofaculty

•  Raisevisibility&increasereputaDon

•  …

Ins$tu$onalizeDataScience

23

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

AnalyFcs&DataSciencePrograms

•  Undergraduate-  GeneralEducaDonthemaDctrack-  Co-taughtComputerScience/StaDsDcsundergraduateelecDves-  PooleCollegeofManagement(PCOM)Undergraduate15-hourData

AnalyDcsHonorsProgram-  SomeoftheExecuDveEducaDonand“DataMaqers”offerings

•  Graduate-  InsDtuteforAdvancedAnalyDcs(IAA)–ProfessionalMS

(analyDcs.ncsu.edu)-  DataScienceGraduateCerDficateandMS(CSC/Stat)-  CSCandStatgraduatetracksinDataScience-  PooleCollegeofManagement–DigitalAnalyDcsCerDficatewithinMBA

program,CSC/Stat/PCOMexecuDveeducaDonforcompanies-  DataMaqerscourses(hqps://research.ncsu.edu/dsi/data-maqers/)

•  Other…(e.g.,Libraryofferings)

24

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

ForExample:

25

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

NCSUEnd-to-EndDataScienceCurriculum*

Domain-Focused Data Science (security, supply chain, economics, ...)

Story Telling and Visualization

End-to-end Data Science

Statistics Machine Learning

Applied Machine Learning

Foundation: Algebra, Calculus, Optimization, and Probability

Pro

gra

mm

ing

En

viro

nm

en

ts, P

latf

orm

s,

an

d P

rod

uc

tivi

ty T

oo

ls

Dat

a M

an

ag

em

en

t, W

ran

glin

g

Big

Dat

a

Insight Generation, Decision Science and Leadership

26

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

DataScience(DS)MasteryLevels•  Core(C):

o  AbletomasterindividualcoreconceptswithintheBloom’staxonomy*:Knowledge,Comprehension,ApplicaDon,Analysis,EvaluaDon,andSynthesis

o  AbletoadaptpreviouslyseensoluDonstodatascienceproblemsfortargetdomain-focusedapplicaDonsuDlizingthesecoreconcepts

•  IntermediateElecDves(I):o  AbletosynthesizemulDpleconceptstosolve,evaluate,andvalidatethe

proposeddatascienceproblemfromtheend-to-endperspecDveo  AbletoidenDfyandproperlyapplythetextbook-leveltechniquessuitablefor

solvingeachpartofthecomplexdatascienceproblempipeline•  AdvancedElecDves(A):

o  Abletoformulatenewdomain-targeteddatascienceproblems,jusDfytheirbusinessvalue,andmakedata-guidedacDonabledecisions

o  Abletoresearchthecu|ngedgetechnologies,comparethemandcreatetheopDmalonesforsolvingDStheproblemsathand

o  Abletoleadasmallteamworkingontheend-to-endexecuDonoftheDSproject

(*)e.g.,hqps://www.csun.edu/science/ref/reasoning/quesDons_blooms/blooms.html

27

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

CoreCurriculumCoursesFoundaDons:

•  MatrixAlgebra•  Calculus•  OpDmizaDon•  Probability

DataScienceMethods:•  StaDsDcs•  MachineLearningandDataMining•  AlgorithmsandDataStructures

DecisionMaking•  Data-guidedDecisionMaking•  VisualizaDon,VisualDataExploraDon,andStoryTelling

InfrastructureandProgrammingEnvironments•  ScripDngLanguagesforDS:e.g.,R,Python•  DatabaseManagementandOpDmizaDon

28

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

AdvancedElecFveCurriculumCoursesDataScienceMethods

•  DeepLearning•  BayesianReasoningandProbabilisDcGraphicalModels•  ProcessMining

DataScienceApplicaDons:•  NaturalLanguageProcessingandTextAnalyDcs•  GraphDataMining•  SocialNetworkAnalyDcs•  DataStreamAnalyDcs•  SenDmentAnalyDcsandRecommendaDonSystems•  SupplyChainDataAnalyDcs•  MarkeDngandFinanceDataAnalyDcs

DecisionMaking•  DiscreteEventSimulaDonandProcessModelingandControlInfrastructureand

ProgrammingEnvironments•  BigDataMiddleware:Hadoop,Spark,GraphDB,etc•  ParallelProgramming(e.g.,withSpark,MPI,etc.)•  IoT

29

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

ClosingThoughtsHow could these components be prioritized or best conveyed for differing types of data science programs? •  Conference on Undergraduate Research in Data Science •  Senior Design Projects on Data Science •  Internship/job Opportunities in partnership with Data Science and other Industries •  Annual Data Science Hackathons •  Data Science Competitions How can opportunities to enhance data acumen (i.e., the ability to make good judgments & decisions with data) be integrated into DS educational programs? •  Industry Surveys •  Foundations and Methods: High Priority •  Is data science introduced as an academic enrichment to the existing university curriculum?

-  e.g., Data Science Concentration within the Computer Science Curriculum How can data acumen be measured or evaluated? •  Individual Course Level Capstone Projects •  Senior Design Project in collaboration with Industry Partners •  Standardized Placement Tests for different Levels of the DS ladder •  Data Science Competition and Contests: perhaps developed in collaboration with industry •  Data Science and DS-related Job Placement Statistics: collected and monitored

30

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

Acknowledgments•  I would like to thank a number of my colleagues at NC

State, UNC-Charlotte, UNC-Chapel Hill and RENCI for discussion and direct or indirect input and contributions as summarized in this presentation. This includes:

Alyson Wilson, Nagiza Samatova, Rada Chirkova, Patrick Dreher, Jamie Roseborough, Michael Rappa, Christopher Healey, Dan McGurrin, Trey Overman, Stan Ahalt, Andrew Wilson, Mirsad Hadjikadic, Ashok Krishnamurthy, Raju Varsavai, Otis Brown, Mike Kowolenko, Andy Rindos.

• Support for some of the described activities comes in part from NC State University, UNC General Administration, State of North Carolina, a number of USA federal agencies and a number of industrial partners.

31

Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS

BuildingDataAcumen–Q&A

NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs

MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,

AssociateViceChancellorforResearchDevelopmentandAdministra=on

32

9/12/17–BuildingDataAcumen9/19/17–IncorporaDngReal-WorldApplicaDons

9/26/17–FacultyTrainingandCurriculumDevelopment

10/3/17–CommunicaDonSkillsandTeamwork

10/10/17–Inter-DepartmentalCollaboraDonandInsDtuDonalOrganizaDon

10/17/17–Ethics10/24/17–AssessmentandEvaluaDonforDataSciencePrograms

11/7/17–Diversity,Inclusion,andIncreasingParDcipaDon

11/14/17–Two-YearCollegesandInsDtuDonalPartnerships

Provideinputandlearnmore

aboutthestudyatwww.nas.edu/EnvisioningDS

33

top related