d w d +d q g olq j ,p s r u w &oh d q lq j d q g 9lv x d ... · 'd w d +d q g olq j ,p s r...
TRANSCRIPT
![Page 1: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/1.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 1/48
Data Handling: Import, Cleaning and VisualisationLecture 5:
Programming with Data
Prof. Dr. Ulrich Matter
17/10/2019
![Page 2: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/2.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 2/48
Updates
![Page 3: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/3.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 3/48
Part I: Data (Science) fundamentals
Date Topic
19.09.19 Introduction: Big Data/Data Science, course overview
26.09.19 An introduction to data and data processing
26.09.19 Exercises/Workshop 1: Tools, working with text files
03.10.19 Data storage and data structures
10.10.19 'Big Data‘ from the Web
10.10.19 Exercises/Workshop 2: Computer code and data storage
17.10.19 Programming with data
![Page 4: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/4.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 4/48
Next Week (24.10.2019)
No lecture in the morning! (no rooms available)
The exercise session in the afternoon is not affected by this (will takeplace as scheduled)!
·
·
![Page 5: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/5.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 5/48
Recap: "Big Data" from the Web
![Page 6: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/6.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 6/48
Limitations of rectangular data
Only two dimensions.
Hard to represent hierarchical structures.
·
Observations (rows)
Characteristics/variables (columns)
-
-
·
Might introduce redundancies.
Machine-readability suffers (standard parsers won't recognize it).
-
-
![Page 7: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/7.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 7/48
XML: JSON:
<person> <firstName>John</firstName> <lastName>Smith</lastName> <age>25</age> <address> <streetAddress>21 2nd Street</streetAddress> <city>New York</city> <state>NY</state> <postalCode>10021</postalCode> </address> <phoneNumber> <type>home</type> <number>212 555-1234</number> </phoneNumber> <phoneNumber> <type>fax</type> <number>646 555-4567</number> </phoneNumber> <gender> <type>male</type> </gender></person>
{"firstName": "John", "lastName": "Smith", "age": 25, "address": { "streetAddress": "21 2nd Street", "city": "New York", "state": "NY", "postalCode": "10021" }, "phoneNumber": [ { "type": "home", "number": "212 555-1234" }, { "type": "fax", "number": "646 555-4567" } ], "gender": { "type": "male" }}
![Page 8: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/8.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 8/48
XML: JSON:
<person> <firstName>John</firstName> <lastName>Smith</lastName> </person>
{"firstName": "John", "lastName": "Smith", }
![Page 9: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/9.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 9/48
Parsing XML in R
The following examples are based on the example code shown above (thetwo text-files persons.json and persons.xml)
# load packages library(xml2) # parse XML, represent XML document as R object xml_doc <- read_xml("persons.xml") xml_doc
## {xml_document} ## <person> ## [1] <firstName>John</firstName> ## [2] <lastName>Smith</lastName> ## [3] <age>25</age> ## [4] <address>\n <streetAddress>21 2nd Street</streetAddress>\n <city>New York</## [5] <phoneNumber>\n <type>home</type>\n <number>212 555-1234</number>\n</phoneN## [6] <phoneNumber>\n <type>fax</type>\n <number>646 555-4567</number>\n</phoneNu## [7] <gender>\n <type>male</type>\n</gender>
![Page 10: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/10.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 10/48
Parsing JSON in R
# load packages library(jsonlite) # parse the JSON-document shown in the example above json_doc <- fromJSON("persons.json") # check the structure str(json_doc)
## List of 6 ## $ firstName : chr "John" ## $ lastName : chr "Smith" ## $ age : int 25 ## $ address :List of 4 ## ..$ streetAddress: chr "21 2nd Street" ## ..$ city : chr "New York" ## ..$ state : chr "NY" ## ..$ postalCode : chr "10021" ## $ phoneNumber:'data.frame': 2 obs. of 2 variables: ## ..$ type : chr [1:2] "home" "fax" ## ..$ number: chr [1:2] "212 555-1234" "646 555-4567" ## $ gender :List of 1 ## $ type: chr "male"
![Page 11: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/11.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 11/48
<!DOCTYPE html> <html> <head> <title>hello, world</title> </head> <body> <h2> hello, world </h2> </body> </html>
![Page 12: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/12.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 12/48
HTML documents: code and data!
HTML documents/webpages consist of 'semi-structured data':
A webpage can contain a HTML-table (structured data)…
…but likely also contains just raw text (unstructured data).
·
·
![Page 13: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/13.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 13/48
Characteristics of HTML
1. Annotate/'mark up' data/text (with tags)
2. Nesting principle
3. Expresses what is what in a document.
Defines structure and hierarchy
Defines content (pictures, media)
·
·
head and body are nested within the html document
Within the head, we define the title, etc.
·
·
Doesn't explicitly 'tell' the computer what to do
HTML is a markup language, not a programming language.
·
·
![Page 14: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/14.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 14/48
HTML document as a 'tree'
HTML (DOM) tree diagram (by Lubaochuan 2014, licensed under the Creative Commons Attribution-Share Alike 4.0 International license).
![Page 15: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/15.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 15/48
Parsing a Webpage with R
# install package if not yet installed# install.packages("rvest") # load the package library(rvest)
# parse the webpage, show the content swiss_econ_parsed <- read_html("https://en.wikipedia.org/wiki/Economy_of_Switzerlandswiss_econ_parsed
## {html_document} ## <html class="client-nojs" lang="en" dir="ltr"> ## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n## [2] <body class="mediawiki ltr sitedir-ltr mw-hide-empty-elt ns-0 ns-subject mw-e
![Page 16: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/16.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 16/48
Parsing a Webpage with R
Now we can easily separate the data/text from the html code. For example,we can extract the HTML table containing the data we are interested in as adata.frames.
tab_node <- html_node(swiss_econ_parsed, xpath = "//*[@id='mw-content-text']/div/tabtab <- html_table(tab_node) tab
## Year GDP (billions of CHF) US Dollar Exchange ## 1 1980 184 1.67 Francs ## 2 1985 244 2.43 Francs ## 3 1990 331 1.38 Francs ## 4 1995 374 1.18 Francs ## 5 2000 422 1.68 Francs ## 6 2005 464 1.24 Francs ## 7 2006 491 1.25 Francs ## 8 2007 521 1.20 Francs ## 9 2008 547 1.08 Francs ## 10 2009 535 1.09 Francs ## 11 2010 546 1.04 Francs ## 12 2011 659 0.89 Francs ## 13 2012 632 0 94 Francs
![Page 17: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/17.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 17/48
Basic Programming Concepts
![Page 18: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/18.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 18/48
Loops
Repeatedly execute a sequence of commands.
Known or unknown number of iterations.
Types: 'for-loop' and 'while-loop'.
·
·
·
'for-loop': number of iterations typically known.
'while-loop: number of iterations typically not known.
-
-
![Page 19: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/19.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 19/48
for-loop
![Page 20: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/20.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 20/48
for-loop in R
# number of iterations n <- 100# start loopfor (i in 1:n) { # BODY}
![Page 21: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/21.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 21/48
for-loop in R
# vector to be summed up numbers <- c(1,2,3,4,5)# initiate total total_sum <- 0# number of iterations n <- length(numbers)# start loopfor (i in 1:n) { total_sum <- total_sum + numbers[i]}
![Page 22: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/22.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 22/48
Nested for-loops
# matrix to be summed up numbers_matrix <- matrix(1:20, ncol = 4) numbers_matrix
## [,1] [,2] [,3] [,4] ## [1,] 1 6 11 16 ## [2,] 2 7 12 17 ## [3,] 3 8 13 18 ## [4,] 4 9 14 19 ## [5,] 5 10 15 20
![Page 23: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/23.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 23/48
Nested for-loops
# number of iterations for outer loop m <- ncol(numbers_matrix)# number of iterations for inner loop n <- nrow(numbers_matrix)# start outer loop (loop over columns of matrix)for (j in 1:m) { # start inner loop # initiate total total_sum <- 0 for (i in 1:n) { total_sum <- total_sum + numbers_matrix[i, j] } print(total_sum) }
![Page 24: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/24.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 24/48
while-loop
![Page 25: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/25.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 25/48
while-loop in R
# initiate variable for logical statement x <- 1# start loopwhile (x == 1) { # BODY}
![Page 26: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/26.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 26/48
while-loop in R
# initiate starting value total <- 0# start loopwhile (total <= 20) { total <- total + 1.12}
![Page 27: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/27.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 27/48
Booleans and logical statements
2+2 == 4
## [1] TRUE
3+3 == 7
## [1] FALSE
4!=7
## [1] TRUE
![Page 28: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/28.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 28/48
Booleans and logical statements
condition <- TRUE if (condition) { print("This is true!")} else { print("This is false!")}
## [1] "This is true!"
![Page 29: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/29.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 29/48
Booleans and logical statements
condition <- FALSE if (condition) { print("This is true!")} else { print("This is false!")}
## [1] "This is false!"
![Page 30: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/30.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 30/48
R functions
'Take a variable/parameter value as input and provide value asoutput'
For example, .
R functions take 'parameter values' as input, process those valuesaccording to a predefined program, and 'return' the results.
· f : X → Y
· X Y
· 2 × X = Y
·
![Page 31: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/31.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 31/48
R functions
Many functions are provided with R.
More can be loaded by installing and loading packages.
·
·
# install a package install.packages("<PACKAGE NAME>")# load a package library(<PACKAGE NAME>)
![Page 32: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/32.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 32/48
Tutorial: A Function to Compute the Mean
![Page 33: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/33.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 33/48
Preparation
1. Open a new R-script and save it in your code-directory as my_mean.R.
2. In the first few lines, use # to write some comments describing what thisscript is about.
![Page 34: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/34.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 34/48
Preparation
###################################### Mean Function:# Computes the mean, given a # numeric vector.
![Page 35: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/35.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 35/48
Preparation
1. Open a new R-script and save it in your code-directory as my_mean.R.
2. In the first few lines, use # to write some comments describing what thisscript is about.
3. Also in the comment section, describe the function argument (input)and the return value (output)
![Page 36: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/36.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 36/48
Preparation
###################################### Mean Function:# Computes the mean, given a # numeric vector.# x, a numeric vector# returns the arithmetic mean of x (a numeric scalar)
![Page 37: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/37.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 37/48
Preparation
1. Open a new R-script and save it in your code-directory as my_mean.R.
2. In the first few lines, use # to write some comments describing what thisscript is about.
3. Also in the comment section, describe the function argument (input)and the return value (output)
4. Add an example (with comments), illustrating how the function issupposed to work.
![Page 38: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/38.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 38/48
Preparation
# Example:# a simlpe numeric vector, for which we want to compute the mean# a <- c(5.5, 7.5)# desired functionality and output:# my_mean(a)# 6.5
![Page 39: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/39.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 39/48
1. Know the concepts/context!
Programming a function in R means telling R how to transform a giveninput (x).
Before we think about how we can express this transformation in the Rlanguage, we should be sure that we understand the transformation perse.
·
·
![Page 40: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/40.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 40/48
1. Know the concepts/context!
Here, we should be aware of how the mean is defined:
.
Programming a function in R means telling R how to transform a giveninput (x).
Before we think about how we can express this transformation in the Rlanguage, we should be sure that we understand the transformation perse.
·
·
= ( ) =x̄ 1
n∑n
i=1xi
+ +⋯+x1 x2 xn
n
![Page 41: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/41.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 41/48
2. Split the problem into several smaller problems
From looking at the mathematical definition of the mean ( ), we recognizethat there are two main components to computing the mean:
x̄
: the sum of all the elements in vector
and , the number of elements in vector .
· ∑ni=1
xi x
· n x
![Page 42: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/42.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 42/48
3. Address each problem step-by-step
In R, there are two built-in functions that deliver exactly these twocomponents:
sum() returns the sum of all the values in i ts arguments (i.e., if x is anumeric vector, sum(x) returns the sum of all elements in x).
length() returns the total number of elements in a given vector (thevector's 'length').
·
·
![Page 43: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/43.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 43/48
4. Putting the pieces together
With the following short line of code we thus get the mean of the elementsin vector a.
sum(a)/length(a)
![Page 44: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/44.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 44/48
5. Define the function
All that is left to do is to pack all this into the function body of our newlydefined my_mean() function:
# define our own function to compute the mean, given a numeric vector my_mean <- function(x) { x_bar <- sum(x) / length(x) return(x_bar)}
![Page 45: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/45.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 45/48
6. Test it with the pre-defined example
# test it a <- c(5.5, 7.5) my_mean(a)
## [1] 6.5
![Page 46: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/46.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 46/48
6. Test it with other implementations
Here, compare it with the built-in mean() function:
b <- c(4,5,2,5,5,7) my_mean(b) # our own implementation
## [1] 4.666667
mean(b) # the built_in function
## [1] 4.666667
![Page 47: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/47.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 47/48
Q&A
![Page 48: D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D ... · 'D W D +D Q G OLQ J ,P S R U W &OH D Q LQ J D Q G 9LV X D OLV D W LR Q / H FW X UH 3 UR JUD P P LQ J Z LW K 'D W D](https://reader036.vdocuments.mx/reader036/viewer/2022071006/5fc3d3bcc245fc1a76775f06/html5/thumbnails/48.jpg)
17/10/2019 Data Handling: Import, Cleaning and Visualisation
file:///Users/umatter/Dropbox/Teaching/HSG/datahandling/datahandling/materials/slides/html/05_programming.html#47 48/48
References