Getting Data Science with R and ArcGIS
Shaun Walbridge
Marjean Pobuda
https://github.com/scw/r-devsummit-2017-talk
High Quality PDF (4MB)
Resources Section
Data Science
Data Science
A much-hyped phrase, but effectively is about the application of statistics and machine learning to real-world data, and developing formalized tools instead of one-off analyses. Combines diverse fields to solve problems.
Data Science
What’s a data scientist?
“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” - Josh Wills
Data Science
Us geographic folks also rely on knowledge from multiple domains. We know that spatial is more than just an x and y column in a table, and how to get value out of this data.
Data Science Languages
Python (SciPy stack, Jupyter, scikit-learn, …)
C++ (TensorFlow, Shark, MLC++)
Java (Spark MLlib, Weka)
R (ML task view)
Many workflows require combining components from multiple environments.
Industry standard for package management in the data science context, built by Continuum Analytics.
Started with Python, but as shown in the R segment of the plenary, it can be used to support R, and hybrid workflows which connect multiple languages.
Technology partner of Esri, have a talk tomorrow: Exploring Continuum Analytics’ Open-Source Offerings (Thurs 10:30AM, Mesquite G-H)
R
Esri and R?
Integration via ArcGIS-R bridge
Joined R Consortium and R Foundation
More to come: GIS has historically been more coupled with Python
Why R?
Powerful core data structures and operations
    Data frames, functional programming
Unparalleled breadth of statistical routines
    The de facto language of Statisticians
CRAN: 6400 packages for solving problems
Versatile and powerful plotting
We assume basic proficiency programming
See resources for a deeper dive into R
R
Data types you’re used to seeing…
Numeric - Integer - Character - Logical - timestamp
…but others you probably aren’t:
vector - matrix - data.frame - factor
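The less familiar types are easiest to understand by trying them at the R prompt. The following is a minimal sketch in base R; all variable names and values are invented for illustration.

```r
# vector: an ordered collection of values that all share one type
elevation <- c(120.5, 98.2, 143.0)

# matrix: a vector with two dimensions (2 rows by 3 columns, filled by column)
grid <- matrix(1:6, nrow = 2, ncol = 3)

# factor: categorical data stored as integer codes plus a set of labels
landuse <- factor(c("urban", "forest", "urban", "water"))

# data.frame: a list of equal-length columns, one row per observation
sites <- data.frame(id = 1:3, elevation = elevation)

length(elevation)   # number of elements in the vector
dim(grid)           # rows and columns of the matrix
levels(landuse)     # distinct categories
table(landuse)      # count of observations per category
str(sites)          # compact summary of column names and types
```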
Data Frames
Treats tabular (and multi-dimensional) data as a labeled, indexed series of observations. Sounds simple, but is a game changer over typical software which is just doing 2D layout (e.g. Excel)
Data Types
    # Create a data frame out of an existing source
    df.from.csv <- read.csv("data/growth.csv", header=TRUE)
Data Types
    # Create a data frame from scratch
    quarter <- c(2, 3, 1)
    person <- c("Goodchild", "Tobler", "Krige")
    met.quota <- c(TRUE, FALSE, TRUE)
    df <- data.frame(person, met.quota, quarter)
Data Types
    R> df
         person met.quota quarter
    1 Goodchild      TRUE       2
    2    Tobler     FALSE       3
    3     Krige      TRUE       1
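To show what "labeled, indexed series of observations" buys in practice, here is a short sketch of common operations on the data frame from the slide. It rebuilds the same three rows so it runs on its own; the derived column name late.in.year is made up for this example.

```r
# Rebuild the data frame from the slide so this block runs on its own
df <- data.frame(
  person    = c("Goodchild", "Tobler", "Krige"),
  met.quota = c(TRUE, FALSE, TRUE),
  quarter   = c(2, 3, 1)
)

# Columns are addressed by label rather than by cell position
df$person

# Rows are filtered with a logical condition on a column
met <- df[df$met.quota, ]

# Sorting reorders whole observations, keeping each row intact
by.quarter <- df[order(df$quarter), ]

# A derived column is computed for every observation at once
df$late.in.year <- df$quarter >= 3
```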
sp Types
0D: SpatialPoints
1D: SpatialLines
2D: SpatialPolygons
3D: Solid
4D: Space-time
Entity + Attribute model
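The entity + attribute model can be seen by promoting an ordinary table to an sp object. This is a minimal sketch that assumes the sp package is installed; the station names and coordinates are invented, and no coordinate system is assigned.

```r
library(sp)  # assumes the sp package is installed

# Attribute table with coordinate columns (all values are invented)
stations <- data.frame(
  name = c("A", "B", "C"),
  x    = c(10, 20, 30),
  y    = c(5, 15, 25)
)

# Promote the table to a SpatialPointsDataFrame: the x and y columns
# become the geometry (entity), the remaining columns stay as attributes
coordinates(stations) <- ~ x + y

class(stations)        # the sp class of the promoted object
coordinates(stations)  # matrix of point coordinates
stations@data          # attribute table, one row per point
bbox(stations)         # bounding box of the geometry
```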
R-ArcGIS Bridge
ArcGIS developers can create tools and toolboxes that integrate ArcGIS and R
ArcGIS users can access R code through geoprocessing scripts
R users can access their organization’s GIS data, managed in traditional GIS ways
https://r-arcgis.github.io
R-ArcGIS Bridge
Store your data in ArcGIS, access it quickly in R, return R objects back to ArcGIS native data types (e.g. geodatabase feature classes).
Knows how to convert spatial data to sp objects.
Package Documentation
Demo: Getting Started
ArcGIS vs R Data Types
ArcGIS              R          Example Value
Address Locator     Character  Address Locators\\MGRS
Any                 Character
Boolean             Logical
Coordinate System   Character  "PROJCS[\"WGS_1984_UTM_Zone_19N\"...
Dataset             Character  "C:\\workspace\\projects\\results.shp"
Date                Character  "5/6/2015 2:21:12 AM"
Double              Numeric    22.87918
ArcGIS vs R Data Types
ArcGIS      R                                 Example Value
Extent      Vector (xmin, ymin, xmax, ymax)   c(0, -591.561, 1000, 992)
Field       Character
Folder      Character                         full path, use with e.g. file.info()
Long        Long                              19827398L
String      Character
Text File   Character                         full path
Workspace   Character                         full path
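A few of the example values from the tables can be handled with plain base R once they arrive. This is a sketch only: the element names on the extent vector are added here for readability, and tempdir() stands in for a real folder parameter.

```r
# Extent arrives as a numeric vector ordered (xmin, ymin, xmax, ymax)
extent <- c(0, -591.561, 1000, 992)
names(extent) <- c("xmin", "ymin", "xmax", "ymax")  # names added for readability
width  <- extent[["xmax"]] - extent[["xmin"]]
height <- extent[["ymax"]] - extent[["ymin"]]

# The L suffix marks an integer literal rather than a double
count <- 19827398L
is.integer(count)

# Folder and file parameters are plain character paths, so base R file
# functions apply; tempdir() stands in for a real workspace folder
folder <- tempdir()
file.info(folder)$isdir
```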
Access ArcGIS from R
Start by loading the library, and initializing connection to ArcGIS:
    # load the ArcGIS-R bridge library
    library(arcgisbinding)
    # initialize the connection to ArcGIS. Only needed when running directly from R.
    arc.check_product()
Access ArcGIS from R
First, select a data source (can be a feature class, a layer, or a table):
    input.fc <- arc.open('data.gdb/features')
Then, filter the data to the set you want to work with (creates in-memory data frame):
    filtered.df <- arc.select(input.fc, fields=c('fid', 'mean'), where_clause="mean < 100")
This creates an ArcGIS data frame: it looks like a data frame, but retains references back to the geometry data.
Access ArcGIS from R
Now, if we want to do analysis in R with this spatial data, we need it to be represented as sp objects. arc.data2sp does the conversion for us:
    df.as.sp <- arc.data2sp(filtered.df)
arc.sp2data inverts this process, taking sp objects and generating ArcGIS compatible data frames.
Access ArcGIS from R
Finished with our work in R, want to get the data back to ArcGIS. Write our results back to a new feature class, with arc.write:
    arc.write('data.gdb/new_features', results.df)
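The steps on these slides can be chained into one round trip. The sketch below only strings together the calls shown above; the function name round_trip and its arguments are hypothetical, the analysis step is left as a placeholder, and the body can only run on a machine with ArcGIS and the arcgisbinding package installed.

```r
# Hypothetical helper: read from ArcGIS, convert to sp, write back.
round_trip <- function(in_path, out_path, fields, where_clause) {
  if (!requireNamespace("arcgisbinding", quietly = TRUE)) {
    stop("arcgisbinding is not installed; this sketch needs the R-ArcGIS bridge")
  }
  # Initialize the connection (needed when running directly from R)
  arcgisbinding::arc.check_product()

  # Open the source and pull the filtered subset into an ArcGIS data frame
  src    <- arcgisbinding::arc.open(in_path)
  arc.df <- arcgisbinding::arc.select(src, fields = fields,
                                      where_clause = where_clause)

  # Convert to sp objects for analysis in R
  as.sp <- arcgisbinding::arc.data2sp(arc.df)

  # ... analysis on as.sp would go here ...

  # Convert back to an ArcGIS compatible data frame and write it out
  arcgisbinding::arc.write(out_path, arcgisbinding::arc.sp2data(as.sp))
  invisible(out_path)
}
```

Called with the values from the slides, this would look like round_trip('data.gdb/features', 'data.gdb/new_features', c('fid', 'mean'), "mean < 100").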
Access ArcGIS from R
WKT to proj.4 conversion: arc.fromP4ToWkt, arc.fromWktToP4
Interacting directly with geometries: arc.shapeinfo, arc.shape2sp
Geoprocessing session specific: arc.progress_pos, arc.progress_label, arc.env (read only)
Data Science with R
Hadley Stack
Hadley Wickham
Developer at RStudio, Professor at Rice University
ggplot2, scales, dplyr, devtools, many others
New, in collaboration with Wes McKinney: feather
Statistical Formulas
    fit.results <- lm(pollution ~ elevation + rainfall + ppm.nox + urban.density)
Domain specific language for statistics
Similar properties in other parts of the language
caret for model specification consistency
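The formula on the slide refers to variables that are not included here, so the following sketch uses the airquality data set that ships with base R instead. It shows the same formula notation in lm, how a fitted model can be extended with update, and the notation reappearing in another function (aggregate).

```r
# airquality ships with base R (daily New York air measurements, 1973)
fit <- lm(Ozone ~ Temp + Wind, data = airquality)

coef(fit)      # intercept plus one coefficient per term on the right-hand side
summary(fit)   # standard errors, t-tests, R-squared

# Formulas are ordinary R objects, so a model can be extended without retyping it
fit.more <- update(fit, . ~ . + Solar.R)
formula(fit.more)   # Ozone ~ Temp + Wind + Solar.R

# The same formula notation drives other functions, such as aggregate()
monthly <- aggregate(Ozone ~ Month, data = airquality, FUN = mean)
```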
Literate Programming
packages: RMarkdown, Roxygen2
Jupyter notebooks
I believe that the time is ripe for significantly better documentation of programs, and that we can best achieve this by considering programs to be works of literature. - Donald Knuth, “Literate Programming”
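As a small illustration of documentation living next to the code, here is a function annotated with roxygen2-style comments. The function feet_to_metres is a made-up example; the #' lines are ordinary comments to R itself and only become help pages when roxygen2 processes a package.

```r
#' Convert elevation from feet to metres
#'
#' @param feet Numeric vector of elevations in feet.
#' @return Numeric vector of the same length, in metres.
#' @examples
#' feet_to_metres(c(0, 1000))
#' @export
feet_to_metres <- function(feet) {
  stopifnot(is.numeric(feet))
  feet * 0.3048
}

feet_to_metres(c(0, 1000))
```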
Development Environments
Jupyter, née IPython
Best of class tools for interacting with data.
R Tools for Visual Studio
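dplyr Package
    library(dplyr)   # provides %>%, group_by, summarise, arrange, desc
    library(Lahman)  # provides the Batting data set
    # %>% replaces the older %.% chain operator, which has been removed from dplyr
    Batting %>%
      group_by(playerID) %>%
      summarise(total = sum(G)) %>%
      arrange(desc(total)) %>%
      head(5)
Introducing dplyr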
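The chain above follows a group, summarise, sort, truncate pattern. For readers without dplyr installed, the same pattern can be sketched in base R; this version swaps in the built-in mtcars data set rather than the Batting table.

```r
# Total horsepower per cylinder count, using the built-in mtcars data
totals <- aggregate(hp ~ cyl, data = mtcars, FUN = sum)  # group_by + summarise
totals <- totals[order(-totals$hp), ]                    # arrange(desc(...))
head(totals, 5)                                          # head(5)
```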
R Challenges
Performance issues
Not a general purpose language
Lacks purely UI mode of interaction (e.g. plots must be manually specified)
Programmer only. There is shiny, but R is first and foremost a language that expects fluency from its users
R-ArcGIS Bridge Deep Dive
Building R Script Tools
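The slides do not show what a script tool looks like. As far as I understand the bridge's convention, the R script defines a function named tool_exec that receives the tool's input and output parameters as lists and returns the outputs; treat that as an assumption and check the package documentation. The parameter layout below (a CSV path, a column name, an output path) is hypothetical and chosen so the sketch also runs outside ArcGIS.

```r
# Entry point ArcGIS is assumed to call when the script tool runs
tool_exec <- function(in_params, out_params) {
  # Parameters arrive as lists, in the order defined on the tool dialog
  input_csv  <- in_params[[1]]   # hypothetical: path to a CSV table
  value_col  <- in_params[[2]]   # hypothetical: name of a numeric column
  output_csv <- out_params[[1]]  # hypothetical: path for the summary table

  # Summarise the chosen column and write a one-row result table
  values <- read.csv(input_csv)[[value_col]]
  stats  <- data.frame(n = length(values), mean = mean(values), sd = sd(values))
  write.csv(stats, output_csv, row.names = FALSE)

  return(out_params)
}

# Outside ArcGIS the function can be exercised by hand with temporary files
in_file  <- tempfile(fileext = ".csv")
out_file <- tempfile(fileext = ".csv")
write.csv(data.frame(value = c(1, 2, 3, 4)), in_file, row.names = FALSE)
tool_exec(list(in_file, "value"), list(out_file))
read.csv(out_file)
```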
Demo: R-ArcGIS bridge
How To Install
Install with the R bridge install
Detailed installation instructions
Where Can I Run This?
Now:
First, install R 3.1 or later
ArcGIS Pro (64-bit) 1.1 or later
ArcMap 10.3.1 or later:
    32-bit R by default
    64-bit R available via Background Geoprocessing
ArcGIS Server 10.3.1+ / ArcGIS Enterprise
What’s Next?
Conda for managing R environments
    Starting at Pro 2.0, can be installed as any other package
Raster support
Resources
Training Resources
Learn Lesson: Analyze Crime Using Statistics and the R-ArcGIS bridge
Web Course 1: Using the R-ArcGIS bridge
Web Course 2: Integrating R Scripts into ArcGIS Geoprocessing Tools
Other Sessions
earlier today, yesterday, earlier today
Getting Data Science with R and ArcGIS
Integrating Open-source Statistical Packages with ArcGIS (2016 video)
Harnessing the Power of Python in ArcGIS Using the Conda Distribution (2016 video)
Scientific Programming with the SciPy Stack (2015 video)
2016 video
R
Looking for a package to solve a problem? Use the CRAN Task Views.
Tons of good books and resources on R available, check out the RSeek engine to find resources for the language, which can be difficult to locate because of the name.
R Packages by Hadley Wickham
Spatial R / Data Science
An Introduction to Statistical Learning (PDF), website: A free and accessible version of the classic in the field, Elements of Statistical Learning.
Getting Started in Data Science
ArcGIS + R
UC Plenary Demo: Statistical Integration with R
Demo of SSN: spatial modeling on stream networks
Cam Plouffe (Esri CA) ran an R ArcGIS Workshop, covers materials in more depth.
Materials
Courses:
    High Performance Scientific Computing
    The Data Scientist’s Toolbox
Books:
    Spatial Statistical Data Analysis for GIS Users
    Konstantin Krivoruchko (GA creator)
    Too big to print. Tons of useful stuff, covers both R and ArcGIS extensively.
Packages
Clustering demo covers mclust and sp.
Tree-based models, e.g. CART
Time series data, e.g. Little Book of R
R ArcGIS Extensions
R ArcGIS Bridge
Marine Geospatial Ecology Tools (MGET)
    Combines Python, R, and MATLAB to solve a wide variety of problems
Geospatial Modeling Environment
    An R flavored language for spatial analysis
Conferences
useR! Conference
    useR! 2017 is being held July 5-7 in Brussels, Belgium
Open Data Science Conference (ODSC)
    Many happening around the world, some upcoming ones:
    ODSC East, May 3-5 in Boston
    ODSC West, Nov 2-4 in San Francisco
Closing
Outreach
Resources and outreach: connect the dots, want this to be outreach so we can build up more R + ArcGIS people who aren’t as common as our core language folks.
Future of the project, questions
Community
Open source project, different ethos
Contributions are the currency
That said, major uptake in the commercial space: Microsoft R (bought Revolution Analytics); RStudio
Thanks
R team: Dmitry Pavlushko, Steve Kopp, Mark Janikas; today’s speakers
Geoprocessing Team
Contact Us
Rate This Session
iOS, Android: Feedback from within the app
Windows Phone, or no smartphone? Cuneiform tablets accepted.