R Programming for Data Science
Outline
● History of R
● Installation (Windows and Linux)
● Data Types
● Reading Data:
– Tabular
– Large datasets
● Textual Data Formats
● Subsetting:
– Lists, Matrices, Partial matching
– Removing missing values
Outline (continued)
● Vectorized operations
● Control Structures
– if-else
– for, while, repeat, next, break
● Functions
– Scoping
● Dates and Times
● Loop functions
– lapply, tapply, apply, mapply, split
● Simulation and profiling
– Generating random numbers, simulating a linear model, random sampling
● Visualizations
History of R
● R originates from the S language. S was initiated in 1976 as an internal statistical analysis environment, originally implemented as Fortran libraries
– History of S: http://www.stat.bell-labs.com/S/history.html
● R development history:
– https://en.wikipedia.org/wiki/R_(programming_language)
R and Statistics
● R developed from S, which is a statistical analysis tool, and so is R
● Its functionality is divided into modules (packages)
– You need to load the relevant package to use its functionality
● Has more sophisticated graphics capabilities than most other statistical packages
● Useful for interactive work: run from the terminal
● Contains a powerful programming language for developing new tools
– Tools: for visualizations and analysis
Design of the R System
● The “base” system, downloaded from CRAN
● “All other stuff”
● Packages in R
– The “base” system has the base package, which is required to run R and contains the most fundamental functions
– Other packages are also included in the “base” system: utils, stats, datasets, graphics, grDevices, tools, etc. Most of these are attached automatically when R starts; the others (e.g. tools) must be loaded before use
– Recommended packages: boot, class, cluster, codetools, foreign, lattice, etc.
– Load packages with library() or require()
R Resources
● CRAN:
– http://cran.r-project.org
● Quick-R: a book
– http://www.statmethods.net/
● R-Bloggers (platform): not a social network
– R-Bloggers is about empowering bloggers to empower other R users
– R-Bloggers.com is a blog aggregator of content contributed by bloggers who write about R (in English)
– https://www.r-bloggers.com/
Installation of R: Ubuntu
● Run from the terminal:
– sudo apt-get install r-base r-base-dev
● If this doesn’t work, then you need to:
– Add the repository:
sudo echo "deb http://cran.rstudio.com/bin/linux/ubuntu xenial/" | sudo tee -a /etc/apt/sources.list
– Add the keyring:
gpg --keyserver keyserver.ubuntu.com --recv-key E084DAB9
gpg -a --export E084DAB9 | sudo apt-key add -
– Install R-Base:
sudo apt-get update; sudo apt-get install r-base r-base-dev
● You can also install from a PPA, which has the most recent versions
– Add the PPA:
sudo add-apt-repository ppa:marutter/rrutter
– Install R-Base:
sudo apt-get update; sudo apt-get install r-base r-base-dev
Installation of R: Windows
● Visit CRAN
– https://cran.r-project.org/
● CRAN: Comprehensive R Archive Network
Installation of R: Windows
Click/Select Download R for Windows
Installation of R: Windows
Then click/select base or install R for the first time
Installation of R: Windows
● Then click/select Download R X.X.X for Windows
● After the download has finished, locate the downloaded file and install.
RStudio: Introduction
● RStudio is a set of integrated tools designed to help you be more productive with R.
● How? It includes:
– a console,
– a syntax-highlighting editor that supports direct code execution,
– a variety of robust tools for plotting, viewing history, debugging and managing your workspace.
RStudio: Installation
● From the RStudio home page, go to Products then select RStudio
– Then scroll down and click Download RStudio Desktop
– Then click Download under RStudio Desktop Personal License.
– Select RStudio for your platform. Clicking on the link will download the file directly.
– Locate the file in your system Downloads folder and start the installation.
RStudio: Parts
The Console is where you write and run code interactively
The Files tab shows all the files and folders in your default workspace, much like a file browser window on a PC/Mac.
The Plots tab shows all your graphs.
The Packages tab lists the packages (add-ons) needed to run certain processes.
For additional information see the Help tab
The Environment tab shows all the active objects
The History tab shows a list of commands used so far
RStudio: Working Directory
● It is important to organize all files for a particular project under one main/parent directory
● A working directory in RStudio is where all the files for a particular project are stored
● All relative paths used in the console to load data files and scripts are resolved against the working directory.
RStudio: Working Directory
● To set the working directory:
– Start RStudio the same way you start other programs on your computer
– From the File menu select New Project, then New Directory, then Empty Project. Type the directory name (rprogramming), then under “Create project as subdirectory of” click Browse and select Desktop
R: Getting Started
● A few basic commands to test on the console
– getwd(): get the current working directory
– setwd("/path/to/directory"): set the working directory to the specified path
– install.packages("package_name"): install a package. Requires an internet connection
– library(package_name), require(package_name): load and attach add-on packages
– ?object: show documentation/help for an object, e.g. ?mtcars
– summary(object): summarize an object such as a dataset, e.g. summary(mtcars)
● Whenever you run library(package_name) and get the error “there is no package called ‘package_name’”, you need to install the package first and then call library() on it.
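The commands above can be combined into a short session. This is a minimal sketch rather than a prescribed workflow: the package name ggplot2 and the CRAN mirror URL are illustrative assumptions, and any other package or mirror would work the same way.

```r
# Where am I? Relative paths are resolved against this directory.
print(getwd())

# Install a package only when it is missing, then load it.
# "ggplot2" is an example name chosen for illustration.
pkg <- "ggplot2"
if (!requireNamespace(pkg, quietly = TRUE)) {
  # Requires an internet connection; the mirror URL is an assumption.
  install.packages(pkg, repos = "https://cloud.r-project.org")
}
library(pkg, character.only = TRUE)

# Summarize a built-in dataset.
print(summary(mtcars))
```

Checking with requireNamespace() first avoids the “there is no package called …” error without reinstalling a package that is already present.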
Data Visualizations in R: Introduction
● R has different systems (packages) for making graphs (visualizations)
● Here we are going to use ggplot2, which is more elegant and versatile than many of the others (ggvis, rgl, htmlwidgets, googleVis, etc.)
● ggplot2 is built upon “The Layered Grammar of Graphics”
Data Visualizations in R: Tidyverse
● Tidyverse is a set of packages
– The packages work in harmony
– Reason: they share common data representations and API design.
● The tidyverse package makes it easy to install and load the core packages in a single command
● To install, run: install.packages("tidyverse")
● To use it, run: library(tidyverse), which loads the tidyverse core packages: ggplot2, tibble, tidyr, readr, purrr, and dplyr.
– Look up each of these packages to learn what it does
Data Visualizations: First Steps
● library(tidyverse) loads all the core packages from tidyverse
● The library() function also reports any conflicts with base R or other packages that arise from loading the named package.
– e.g. in this case filter() and lag() from dplyr mask the functions of the same name in the stats package
● In such cases you may need to call a function explicitly from its package, in the form package::function()
– e.g. ggplot2::ggplot() calls the ggplot() function from the ggplot2 package.
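To make the conflict concrete, here is a small sketch that calls both filter() functions by their full names. The mtcars data and the moving-average weights are just illustrative choices, not part of the original slides.

```r
library(dplyr)

# dplyr::filter() keeps the rows of a data frame that match a condition.
four_cyl <- dplyr::filter(mtcars, cyl == 4)
print(nrow(four_cyl))

# stats::filter() is a different function: it applies a linear filter
# (here a 3-point moving average) to a numeric series.
moving_avg <- stats::filter(1:10, rep(1 / 3, 3))
print(moving_avg)
```

Writing the package prefix makes it unambiguous which function runs, regardless of the order in which packages were loaded.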
Data Visualizations: First Steps
● Which is more fuel efficient: cars with big engines or cars with small engines?
● The mpg data frame:
– A data frame is a rectangular collection of variables (in columns) and observations (in rows)
– The mpg data frame in ggplot2 contains observations collected by the US Environmental Protection Agency on 38 models of cars.
● Run ?mpg (from the console) to learn more about the data set.
First Steps Creating a ggplot
● To answer the question about fuel efficiency, plot fuel efficiency (hwy, highway miles per gallon: y-axis) against engine size (displ: x-axis)
● See the magic of this command:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
First Steps Creating a ggplot
> ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
A negative relationship between engine size (displ) and fuel efficiency (hwy) means that cars with bigger engines use more fuel.
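The visual impression can be backed by a number. The sketch below computes the Pearson correlation between the two variables; using cor() here is an illustrative addition, not something the slides prescribe.

```r
library(ggplot2)  # provides the mpg data frame

# Pearson correlation between engine displacement and highway mileage.
r <- cor(mpg$displ, mpg$hwy)
print(r)  # a negative value: larger engines go fewer miles per gallon
```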
Creating a ggplot
● In ggplot2:
– You begin with the function ggplot(). ggplot() creates a coordinate system that you can add layers onto. The first argument is the data set that you are going to use for plotting
– To complete the graph, add more layers to the coordinate system created by ggplot()
– The geom_point() function adds a layer of points to the plot (which creates a scatter plot in this case)
– Each geom function in ggplot2 takes a mapping argument, which defines how variables are mapped to visual properties.
– The mapping argument is always paired with aes(). The x and y arguments of aes() specify which variables to map to the x and y axes.
– ggplot2 looks for the mapped variables in the data argument, in this case mpg
Creating a ggplot: Template
● A graphing template for ggplot:
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
● You can get a list of <GEOM_FUNCTION>s by following this link (http://docs.ggplot2.org/current/)
ggplot: Aesthetics Mappings
● Look at the graph and note the circled dots
● What is special with these big engine cars?
ggplot: Aesthetics
● ggplot2 aesthetic mappings can help answer the question
● An aesthetic is a visual property of the objects in a plot.
– These are things like the size, shape or color of points.
● You can therefore display a point in different ways by changing the values of its aesthetic properties.
● You can convey information about your data by mapping the aesthetics in your plot to the variables in your dataset.
– e.g. you can map the colors of your points to the class variable to reveal the class of each car.
ggplot: Aesthetics
● New plot with the color aesthetic mapped to class:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))
● Try year and manufacturer as well and look at the trends
ggplot: Aesthetics
● Other aesthetics:
– size: best for ordered variables; the size of each point reflects its value
– alpha: controls the transparency of the points
– shape: points are drawn with different shapes
● Exercise: try plotting the same geom with these different aesthetics
● ggplot2 takes care of selecting a reasonable scale to use with the aesthetic and constructs a legend
ggplot: Aesthetics
● The aesthetic properties of a geom can be set manually.
– For example:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
– This sets all points to blue
– Note that color is outside aes()
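The difference between mapping an aesthetic and setting it can be inspected without looking at the picture. This sketch uses ggplot2's layer_data() helper to read back the colours each layer would draw; the object names p_mapped and p_set are made up for illustration.

```r
library(ggplot2)

# Mapped: colour depends on a variable, so it goes inside aes().
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Set: one fixed colour for every point, so it goes outside aes().
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the computed values ggplot2 will draw for a layer.
print(length(unique(layer_data(p_mapped)$colour)))  # one colour per class
print(unique(layer_data(p_set)$colour))             # a single colour
```

If color = "blue" were placed inside aes() instead, ggplot2 would treat "blue" as a data value to map, not as the colour to paint with.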
ggplot: Facets
● When the data has categorical variables, it is possible to split the plot into facets.
● Facets are subplots that each display a subset of the data.
● To plot facets with a single variable, use the function facet_wrap(formula, …)
– The formula is created with ~ variable-name
– Here, formula is the name of a data structure in R, not a synonym for equation.
– The variable (variable-name) should be discrete.
ggplot: Facets
● For example:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "red") +
  facet_wrap(~ class, nrow = 3)
● This produces a subplot for each distinct value of mpg$class, and the subplots are laid out in three rows.
ggplot: Facets
● Can we facet the plot using two discrete variables?
● Do this:
– ?facet_grid
– ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)
● In the plot, why do we have empty sub-plots?
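One way to investigate the empty sub-plots is to count how many cars fall into each drv and cyl combination. The cross-tabulation below is an illustrative sketch using base R's table(); it is not part of the original exercise.

```r
library(ggplot2)  # provides the mpg data frame

# Cross-tabulate drive train against number of cylinders.
combos <- table(drv = mpg$drv, cyl = mpg$cyl)
print(combos)

# Combinations with a count of zero have no observations,
# so facet_grid(drv ~ cyl) draws an empty panel for them.
print(sum(combos == 0))
```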
ggplot: Facets
● Hack:
– With facet_grid(), what happens when you use a . in place of one variable?
– Is there an advantage of faceting over the color aesthetic? Any disadvantages? What if the dataset is very large?
– In facet_wrap(), what do nrow and ncol do?
– When using facet_grid(), put the variable with more unique levels in the columns (RHS of the formula). Why?
– Why doesn’t facet_grid() have nrow and ncol arguments?
ggplot2::Geometric objects (geoms)
● These are the geometric objects used to represent the data.
– e.g. bar geoms, point geoms, line geoms, smooth geoms, etc.
● To change the geom in your plot, change the geom function (geom_xxx())
● For example:
– ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
– ggplot(data = mpg) + geom_smooth(mapping = aes(x = displ, y = hwy))
● Not every aesthetic works with every geom
– e.g. you can set the shape of a point, but not the shape of a line
– Read: ?geom_point, ?geom_smooth
ggplot2: geoms
● ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
● Try:
ggplot(data = mpg) +
  geom_line(mapping = aes(x = displ, y = hwy, linetype = drv))
ggplot2: geoms
● Plot:
– ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))
– ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, group = drv))
● What is the difference? Which is better? Why?
ggplot2: combined geoms
● Can we use more than one geom on the same plot?
● Try:
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  geom_smooth(mapping = aes(x = displ, y = hwy))
● When using multiple geoms on the same plot you can use global mappings:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth()
● This makes the code easier to read and modify.
ggplot2: combined geoms
● When you use global mappings and also set some mappings in a geom function, the latter are treated as local to that layer only.
● For example:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()
ggplot2: combined geoms
● In the same way, you can specify different data for each layer.
– Say you only want to fit a smooth line for one class of cars:
ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
– Hack: can we plot more than one of the same geom?
– Try a smooth geom with a different car class
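One possible answer to the hack, as a sketch: the same geom can be added more than once, each time with its own data. The second class ("suv") and the dashed line type are arbitrary choices made for illustration.

```r
library(ggplot2)
library(dplyr)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  # First smooth layer: subcompact cars only.
  geom_smooth(data = dplyr::filter(mpg, class == "subcompact"), se = FALSE) +
  # Second smooth layer: SUVs only, drawn dashed to tell the lines apart.
  geom_smooth(data = dplyr::filter(mpg, class == "suv"), se = FALSE,
              linetype = "dashed")

# Three layers in total: one point layer and two smooth layers.
print(length(p$layers))
```

Each geom_smooth() layer inherits the global x and y mappings but fits its curve only to the rows passed in its own data argument.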
ggplot2: combined geoms
Combined Geoms: exercise
ggplot2: geoms
● How many geoms does ggplot2 have?
– Visit this page: https://www.rstudio.com/resources/cheatsheets/
– Look for the Data Visualization Cheat Sheet
● ggplot2 extensions provide more geoms to use. Take a look at the available extensions in this gallery (http://www.ggplot2-exts.org/gallery/)
ggplot2: statistical transformations
● Read: ?diamonds
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
– Where does count come from?
Statistical Transformations
● Some plots plot raw values
– e.g. scatterplots
● Some plots use calculated values
– bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points that fall in each bin.
– smoothers fit a model to your data and then plot predictions from the model. (Remember regression lines)
– boxplots compute a robust summary of the distribution and then display a specially formatted box.
Statistical Transformation
● The algorithm used to calculate new values for a graph is called a stat (statistical transformation)
● You can check which stat a geom uses by default by looking at the default value of its stat argument.
– geom_bar() uses count. Thus you can recreate the bar chart by running:
ggplot(data = diamonds) +
  stat_count(mapping = aes(x = cut))
● Every geom has a default stat, and every stat has a default geom. This means that you can typically use geoms without worrying about the underlying statistical transformation.
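The equivalence of geom_bar() and stat_count() can be checked by comparing the values each one computes. This is a sketch that uses layer_data() to read the computed counts and base R's table() as an independent check; the variable names are made up for illustration.

```r
library(ggplot2)

p_geom <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) + stat_count(mapping = aes(x = cut))

# The count column is produced by the stat, not stored in diamonds.
counts_geom <- layer_data(p_geom)$count
counts_stat <- layer_data(p_stat)$count
print(counts_geom)

# The same numbers computed by hand.
print(table(diamonds$cut))
```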
Statistical Transformation
● Reasons to explicitly specify a stat:
– You want to override the default stat, e.g. run:
demo <- tribble(
  ~a, ~b,
  "bar_1", 20,
  "bar_2", 30,
  "bar_3", 40
)
Then run:
ggplot(data = demo) +
  geom_bar(mapping = aes(x = a, y = b), stat = "identity")
Statistical Transformation
● Reasons to explicitly specify a stat (continued):
– You want to override the default mapping from transformed variables to aesthetics.
ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = ..prop.., group = 1))
– This draws a bar chart of proportions instead of counts
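Newer ggplot2 releases (3.3.0 onwards, to the best of my knowledge) also accept after_stat(prop) in place of the ..prop.. notation. The sketch below uses that form and checks that the computed proportions add up to one; it assumes a reasonably recent ggplot2 installation.

```r
library(ggplot2)

# Same chart as above, written with after_stat() instead of ..prop..
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))

# With group = 1 the proportions are taken over the whole dataset,
# so they should sum to 1.
props <- layer_data(p_prop)$prop
print(sum(props))
```

Without group = 1, each cut would form its own group and every bar would show a proportion of 1.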
Position Adjustments
● A bar chart can be colored in either of two ways: colour (the outline of the bars) and fill (the interior of the bars).
– ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, colour = cut))
– ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))
Position Adjustments
● Check what the following plots look like
– ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))
– ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
  geom_bar(alpha = 1/5, position = "identity")
– ggplot(data = diamonds, mapping = aes(x = cut, colour = clarity)) +
  geom_bar(fill = NA, position = "identity")
– ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
– ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
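To see what position = "fill" does numerically, the sketch below reads back the bar coordinates with layer_data() and looks at the top of each stack. The variable names are illustrative; the point is only that every stack is rescaled to the same height.

```r
library(ggplot2)

p_fill <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")

bars <- layer_data(p_fill)

# Highest ymax within each cut category, i.e. the top of each stack.
tops <- tapply(bars$ymax, as.integer(bars$x), max)
print(tops)
```

Because every stack reaches the same height, "fill" makes it easier to compare proportions across groups, at the cost of hiding the absolute counts.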
Position Adjustments
● Learn more about position adjustments
– ?position_dodge
– ?position_fill
– ?position_identity
– ?position_jitter
– ?position_stack
Position Adjustments: overplotting
● Recall: ggplot(data = mpg) + geom_point(mapping = aes(x = displ, y = hwy))
– It appears to display fewer points than the 234 observations in the dataset (can you count them?)
– The values of displ and hwy are rounded, so the points fall on a grid and many of them overlap each other. This problem is called overplotting.
● You can avoid this gridding by setting the position adjustment to "jitter"
– position = "jitter" adds a small amount of random noise to each point
– Since no two points are likely to receive the same amount of noise, they are spread out.
● Jittering makes the graph less accurate at small scales, but more revealing at large scales.
● In ggplot2 the shorthand for geom_point(position = "jitter") is geom_jitter()
Position Adjustments: jitter
● ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
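The amount of overplotting can be measured by counting distinct (displ, hwy) positions. The sketch below does that and then draws the jittered plot with geom_jitter(); the width and height values are arbitrary illustrative choices, not recommended settings.

```r
library(ggplot2)

# Number of observations versus number of distinct plotting positions.
n_obs <- nrow(mpg)
n_positions <- nrow(unique(mpg[, c("displ", "hwy")]))
print(c(observations = n_obs, distinct_positions = n_positions))

# geom_jitter() lets you control how much noise is added in each direction.
p_jitter <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), width = 0.1, height = 0.5)
```

Printing p_jitter shows the spread-out points; because the noise is random, the picture differs slightly on every run.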
Thank You! Asanteni!
Working with Data
● In this part we are going to learn how to work with your data.
– Getting data: importing your own data, tidying data
– How to work with different data types: relational data, strings, factors, dates and times
Importing Data
● For importing files, we will use the readr package, which is part of the tidyverse core packages.
● Most readr functions turn flat files into data frames. A data frame is a tabular data format with rows and columns. It is a list of vectors of equal length.
– read_csv(): reads comma separated files
– read_csv2(): reads semicolon separated files
– read_tsv(): reads tab delimited files
– read_delim(): reads files with any delimiter
● Activity:
– Check what read_table(), read_fwf() and read_log() do
Importing Data: read_csv()
● The first argument is the path to the file to read
– read_csv("data/students.csv")
● read_csv() prints out a column specification
● read_csv() by default uses the first row as the column names
– You can use skip = n to skip the first n lines if they contain data you don’t need (most likely metadata)
– You can use comment = "#" to drop all lines that start with #, for example
– Use col_names = FALSE so that read_csv() doesn’t treat the first row as the column names
● Missing values in R are represented by NA. When loading files where missing values are written differently, use the na argument, e.g. na = "." if missing values are written as a period.
– What will this line do?
read_csv("students.csv", skip = 2, comment = "//", col_names = FALSE, na = "-")
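To try the question out without a real students.csv, the sketch below writes a small made-up file to a temporary location and reads it with the same arguments. The file contents are invented purely for illustration.

```r
library(readr)

# A hypothetical file: two metadata lines, a comment, then two data rows.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported from the registry",
  "Term one",
  "// rows below are student records",
  "John,21,-",
  "Jean,-,B"
), path)

students <- read_csv(path, skip = 2, comment = "//",
                     col_names = FALSE, na = "-")
print(students)
```

The two metadata lines are skipped, the // line is dropped, the columns get the default names X1, X2 and X3, and each "-" becomes NA.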
Importing Data: Parsing
● The parse_*() functions:
– ?parse_logical, ?parse_integer, ?parse_date
● The parse functions take in a character vector and return a more specialized vector.
– A character vector can hold any text, letters and digits alike, e.g. "dLab", "2013", "xyz3", "12.09"
– A specialized vector contains, say, only integers, only decimal numbers, or only dates; this is what the parse functions do: return a vector of one specific type
● A vector in R is an ordered collection of values of the same type, created with c()
– For example:
names <- c("John", "Jean", "Giovanni", "Joni")
dates_of_birth <- c("2012-12-31", "1988-05-02", "1990-01-06")
Importing Data: Parsing
● What happens to the following?
parse_integer(c("1", "231", ".", "456"), na = ".")
x <- parse_integer(c("123", "345", "abc", "123.45"))
● parse_logical() and parse_integer() parse logicals and integers respectively. There’s basically nothing that can go wrong with these parsers so I won’t describe them here further.
● parse_double() is a strict numeric parser, and parse_number() is a flexible numeric parser. These are more complicated than you might expect because different parts of the world write numbers in different ways.
● parse_character() seems so simple that it shouldn’t be necessary. But one complication makes it quite important: character encodings.
● parse_factor() creates factors, the data structure that R uses to represent categorical variables with fixed and known values.
● parse_datetime(), parse_date(), and parse_time() allow you to parse various date & time specifications. These are the most complicated because there are so many different ways of writing dates.
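The two parse_integer() calls from this slide can be run as they are. The sketch below adds problems() to inspect what went wrong in the second call; the variable name with_na is introduced only for illustration.

```r
library(readr)

# "." is declared as the missing-value marker, so it becomes NA.
with_na <- parse_integer(c("1", "231", ".", "456"), na = ".")
print(with_na)

# "abc" and "123.45" are not integers: readr warns and returns NA for them.
x <- parse_integer(c("123", "345", "abc", "123.45"))
print(x)

# problems() lists each value that failed to parse.
print(problems(x))
```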
Importing Data: parsing
● One important thing to note when parsing characters is the encoding. UTF-8 is the most common, and specifying it may save you hours of fixing problems. Specify it when parsing characters like this:
x <- "El Niño was particularly bad this year"
parse_character(x, locale = locale(encoding = "utf-8"))
● ?parse_datetime, ?parse_date, ?parse_time
● Generate correct format strings to parse each of the following dates and times
– d1 <- "January 1, 2010"
– d2 <- "2015-Mar-07"
– d3 <- "06-Jun-2017"
– d4 <- c("August 19 (2015)", "July 1 (2015)")
– d5 <- "12/30/14" # Dec 30, 2014
– t1 <- "1705"
– t2 <- "11:15:10.12 PM"
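One possible set of format strings for most of the exercise is sketched below; t2 is left for you to try. These are my suggested answers rather than the only valid ones, and the *_parsed names are made up for illustration.

```r
library(readr)

d1 <- "January 1, 2010"
d2 <- "2015-Mar-07"
d3 <- "06-Jun-2017"
d4 <- c("August 19 (2015)", "July 1 (2015)")
d5 <- "12/30/14"  # Dec 30, 2014
t1 <- "1705"

# %B = full month name, %b = abbreviated month name,
# %d = day, %Y = 4-digit year, %y = 2-digit year, %m = month number.
d1_parsed <- parse_date(d1, "%B %d, %Y")
d2_parsed <- parse_date(d2, "%Y-%b-%d")
d3_parsed <- parse_date(d3, "%d-%b-%Y")
d4_parsed <- parse_date(d4, "%B %d (%Y)")
d5_parsed <- parse_date(d5, "%m/%d/%y")

# %H = hour (24-hour clock), %M = minutes.
t1_parsed <- parse_time(t1, "%H%M")

print(list(d1_parsed, d2_parsed, d3_parsed, d4_parsed, d5_parsed, t1_parsed))
```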
Importing Data: parsing files
● example_file <- read_csv(readr_example("challenge.csv"))
● Use the problems() function to look at any issues with the import
– problems(example_file)
● Specify the column types explicitly when reading the file:
example_file <- read_csv(
  readr_example("challenge.csv"),
  col_types = cols(
    x = col_double(),
    y = col_date()
  )
)
● Use tail(dataframe, n = X) and head(dataframe, n = X) to look at the last and first X rows of the data frame.
Parsing files
● One more strategy to get the column types is to use the guess_max option when reading in a file.
example_file2 <- read_csv(readr_example("challenge.csv"), guess_max = 1001)
Writing to a file
● If you want to save the data to a delimited text file, you can use either of the functions write_csv() (comma separated) or write_tsv() (tab separated), where you need to specify:
– The data frame you are saving
– The file path (location) where to save it
● Optionally:
– You can set how missing values are written with na
– You can also append to an existing file
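A short round trip shows these options together. The data frame and the "-" marker below are made up for illustration, and the file goes to a temporary path so nothing in your project is overwritten.

```r
library(readr)

# Hypothetical scores with one missing value.
scores <- data.frame(name = c("John", "Jean"), score = c(71, NA))
path <- tempfile(fileext = ".csv")

# Write missing values as "-" instead of the default "NA".
write_csv(scores, path, na = "-")

# Append one more row to the same file (no header is written again).
write_csv(data.frame(name = "Joni", score = 64), path, append = TRUE)

# Read it back, telling readr that "-" means missing.
back <- read_csv(path, na = "-")
print(back)
```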
![Page 67: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/67.jpg)
67
Parsing Files
● Group Activity
– Download the dataset "Number of Trainees with Special Needs enrolled in Vocational Training Centres" from http://opendata.go.tz
– Read it into a data frame and do some manipulations, including making some plots
– Inspect read_rds() and write_rds() and see where you can use these functions
– Explore these packages: haven, readxl, DBI
![Page 68: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/68.jpg)
68
Tidy Data
● A tidy dataset has these features
– Each variable is in its own column
– Each observation is in its own row
– Each value is in its own cell
● ?gather, ?spread
● Missing Values:
– Can be explicitly stated with NA
– Can be implicit: not present in the data
● Use gather(…, na.rm = TRUE) to drop explicit missing values while gathering
● You can use the complete() function to make implicit missing values explicit in tidy data.
– ?complete
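A minimal sketch of the difference between implicit and explicit missing values, assuming tidyr is installed. The returns data frame is invented for illustration.

```r
library(tidyr)  # assumes tidyr is installed

# Hypothetical data: the 2016 / quarter 2 row is simply absent (implicitly missing)
returns <- data.frame(
  year = c(2015, 2015, 2016),
  qtr = c(1, 2, 1),
  value = c(1.88, 0.59, 0.92)
)

# complete() adds a row for every year/qtr combination, filling gaps with NA
returns_full <- complete(returns, year, qtr)
print(returns_full)
```

After complete(), the gap shows up as a row with NA in value, so it can be counted, filtered or filled like any other missing value.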
![Page 69: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/69.jpg)
69
Case Study
● Optionally download the data from http://www.who.int/tb/country/data/download/en/
● Load the data from the file or from the package: tidyr::who
● Looking at the data:
– Country, iso2, iso3 are similar: each represents a country
– Year is clearly a variable
– The other columns have unclear names; look at the data dictionary
![Page 70: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/70.jpg)
70
Case Study cntd...
● Gather all the other columns, removing all missing values
who1 <- who %>%
  gather(new_sp_m014:newrel_f65, key = "key", value = "cases", na.rm = TRUE)
● Look at the structure of the values in the new key column by counting
who1 %>%
  count(key)
– Use the data dictionary for the definition of the keys
● Rename "newrel" to "new_rel" so that all keys follow the same pattern
who2 <- who1 %>%
  mutate(key = stringr::str_replace(key, "newrel", "new_rel"))
● Separate the key variable into different columns
who3 <- who2 %>%
  separate(key, c("new", "type", "sexage"), sep = "_")
● Look at the new column
who3 %>%
  count(new)
● Drop the new column because it is constant
who4 <- who3 %>%
  select(-new)
● Separate sexage into sex and age
who5 <- who4 %>%
  separate(sexage, c("sex", "age"), sep = 1)
![Page 71: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/71.jpg)
71
![Page 72: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/72.jpg)
72
Writing Code in R
● Create new objects with <- using the format object_name <- object_value
● The <- symbol is the assignment operator
● Examples:
– first_name <- "Sovello"
– date.of.birth <- "12/31/1980"
– PlaceOfBirth <- "Njombe"
– AGE <- 37
– x = 200 * 5
● Object names must start with a letter.
● Object names can only contain letters, numbers, underscore (_), and period (.)
– Look at the examples above
![Page 73: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/73.jpg)
73
Writing code in R
● You can look at what is in an object by typing its name
● You can also print an object explicitly
– print(first_name)
[1] "Sovello"
● The [1] shown in the output indicates that first_name is a vector and "Sovello" is its first element.
![Page 74: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/74.jpg)
74
Writing code in R
● All character (text) values must be enclosed in double or single quotes ("value" or 'value')
– Look at the definition of place.of.birth in the screenshot
● Typos matter when using object names. Case matters too: surname and Surname are not the same object.
● The # character indicates a comment. Anything to the right of # is ignored by R
● R has no multi-line comment syntax; start each comment line with #
![Page 75: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/75.jpg)
75
Group Exercise (5min)
● What is wrong with this code snippet?
Surname <- "Mkulima"
surname
● If you start typing a value for an object and press Enter before the closing quote or parenthesis, the console will look like
college <- "College of informatics
+
– A + means R is waiting for you to continue typing. What would you do to fix, stop or escape from the problem?
● Fix errors in this piece of code until it works
library(tidyverse)
ggplot(dota = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
fliter(mpg, cyl = 8)
![Page 76: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/76.jpg)
76
R Objects
● R has five basic (atomic) classes of objects
– Character
– Numeric (real numbers)
– Integer
– Complex
– Logical (TRUE/FALSE)
● The most basic type of R object is a vector. An empty vector can be created with vector()
● A vector can only contain objects of the same type.
● Numbers are generally treated as numeric objects
– If you want an integer, you have to explicitly add the L suffix: 1L is an integer, 1 is a real number
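The points above can be checked directly in the console. This is a small base R sketch; the object names empty and mixed are chosen only for illustration.

```r
# 1 is stored as a real (double-precision) number; the L suffix asks for an integer
class(1)      # "numeric"
class(1L)     # "integer"

# vector() creates an empty vector of a given mode and length
empty <- vector("numeric", length = 3)
print(empty)  # three zeros

# Mixing types in c() does not fail: R converts everything to one common type
mixed <- c(1.5, "a", TRUE)
class(mixed)  # "character"
```

The last line shows what "a vector can only contain objects of the same type" means in practice: R quietly converts the number and the logical value to character strings.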
![Page 77: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/77.jpg)
77
R Objects
● Inf is a special number which represents infinity.
– You can use Inf in calculations like 1/Inf
● Creating vectors
– Use the c() function to create vectors
> x <- c(0.5, 0.6) ## numeric
> x <- c(TRUE, FALSE) ## logical
> x <- c(T, F) ## logical
> x <- c("a", "b", "c") ## character
> x <- 9:29 ## integer
> x <- c(1+0i, 2+4i) ## complex
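A short base R sketch of how Inf behaves in ordinary arithmetic:

```r
1 / Inf         # 0
1 / 0           # Inf
-1 / 0          # -Inf
is.finite(Inf)  # FALSE

# Some operations on Inf have no defined result and give NaN instead
Inf - Inf       # NaN
```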
![Page 78: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/78.jpg)
78
Coercion of R objects
● You can explicitly coerce objects using the as.* functions: ?as.integer, ?as.character, ?as.logical, ?as.numeric
> x <- 0:6
> class(x)
[1] "integer"
> as.numeric(x)
[1] 0 1 2 3 4 5 6
> as.logical(x)
[1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE
> as.character(x)
[1] "0" "1" "2" "3" "4" "5" "6"
● If R fails to coerce an object, it produces NAs.
> x <- c("a", "b", "c")
> as.numeric(x)
Warning: NAs introduced by coercion
[1] NA NA NA
> as.logical(x)
[1] NA NA NA
> as.complex(x)
Warning: NAs introduced by coercion
[1] NA NA NA
![Page 79: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/79.jpg)
79
R Objects: Matrices
● Matrices are vectors with a dimension attribute.
● The dimension is an integer vector of length 2 (number of rows, number of columns)
> m <- matrix(nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]   NA   NA   NA
[2,]   NA   NA   NA
> dim(m)
[1] 2 3
> attributes(m)
$dim
[1] 2 3
![Page 80: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/80.jpg)
80
Matrices
● Matrices are constructed column-wise, so entries start in the "upper left" corner and run down the columns
> m <- matrix(1:6, nrow = 2, ncol = 3)
> m
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
● You can create matrices from vectors by adding a dimension attribute
> m <- 1:10
> m
 [1]  1  2  3  4  5  6  7  8  9 10
> dim(m) <- c(2, 5)
> m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
● Every element of a matrix must be of the same class (e.g. all integers or all numeric).
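To make the column-wise filling concrete, the sketch below contrasts the default with the byrow argument of matrix(). The names by_col and by_row are chosen for this example only.

```r
# Default: values fill the matrix column by column
by_col <- matrix(1:6, nrow = 2, ncol = 3)

# byrow = TRUE fills row by row instead
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)

# Elements are addressed as [row, column]
by_col[1, 2]  # 3
by_row[1, 2]  # 2
```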
![Page 81: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/81.jpg)
81
Group work
● What do cbind() and rbind() do?
● Create 3 vectors and 3 matrices.
● Create 3 matrices from vectors
● Create 2 matrices using cbind() and rbind()
● Read about R lists: how to create them using list()
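If you get stuck, the following base R sketch may serve as a hint. The vectors x and y and the person list are made up for illustration.

```r
x <- 1:3
y <- 10:12

by_columns <- cbind(x, y)  # each vector becomes a column: 3 rows, 2 columns
by_rows <- rbind(x, y)     # each vector becomes a row: 2 rows, 3 columns

print(by_columns)
print(by_rows)

# Unlike a vector, a list can hold elements of different classes
person <- list(name = "Sovello", age = 37, enrolled = TRUE)
person$name
```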
![Page 82: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/82.jpg)
82
R Objects: Factors
● Factors represent categorical data
● Factors can be ordered or unordered
● Factor objects can be created with the factor() function
> x <- factor(c("yes", "yes", "no", "yes", "no"))
> x
[1] yes yes no  yes no
Levels: no yes
> table(x)
x
 no yes
  2   3
![Page 83: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/83.jpg)
83
Factors
● Say you want to sort a vector
> x1 <- c("Dec", "Apr", "Jan", "Mar")
> sort(x1)
[1] "Apr" "Dec" "Jan" "Mar"
● The target was to see the months sorted in calendar order: Jan, Mar, Apr, Dec
● To solve this problem we can make use of factors
– Create a vector of month levels
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun",
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)
● Then create a factor that uses the month levels.
> y1 <- factor(x1, levels = month_levels)
● Applying sort to the new variable produces a vector sorted in calendar order
> sort(y1)
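The whole example can be run end to end as below. As a shortcut, this sketch uses month.abb, a constant built into base R that holds the same twelve abbreviations as month_levels.

```r
x1 <- c("Dec", "Apr", "Jan", "Mar")

# Alphabetical order, which is not what we want
sort(x1)

# month.abb is built into R: "Jan", "Feb", ..., "Dec"
y1 <- factor(x1, levels = month.abb)
sorted <- sort(y1)
print(sorted)  # Jan Mar Apr Dec

# A value that is not among the levels silently becomes NA
typo <- factor(c("Jan", "Jam"), levels = month.abb)
print(typo)
```

The last two lines show a common pitfall: a misspelled value does not raise an error, so it is worth checking a new factor for unexpected NAs.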
![Page 84: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/84.jpg)
84
R Objects: missing values
● Missing values are denoted by NA; the results of undefined mathematical operations are denoted by NaN
– is.na() is used to test objects if they are NA
– is.nan() is used to test for NaN
● NA values have a class also, so there are integer NA, character NA, etc.
● A NaN value is also NA but the converse is not true
> ## Create a vector with NAs in it
> x <- c(1, 2, NA, 10, 3)
> ## Return a logical vector indicating which elements are NA
> is.na(x)
[1] FALSE FALSE TRUE FALSE FALSE
> ## Return a logical vector indicating which elements are NaN
> is.nan(x)
[1] FALSE FALSE FALSE FALSE FALSE
● What is the difference between missing values (NAs) and zero?
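A base R sketch that illustrates both points: NaN counts as NA but not the other way round, and a missing value is not the same thing as zero. The vectors are invented for illustration.

```r
x <- c(1, 2, NaN, NA, 4)
is.na(x)   # TRUE for both NaN and NA
is.nan(x)  # TRUE for NaN only

# NA means "unknown"; zero is a known value, so summaries differ
with_na <- c(10, NA, 30)
with_zero <- c(10, 0, 30)

mean(with_na)                # NA: the answer depends on the unknown value
mean(with_na, na.rm = TRUE)  # 20: the missing value is left out
mean(with_zero)              # 13.33...: the zero counts as an observation
```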
![Page 85: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/85.jpg)
85
R Objects:Data Frames
● Data frames store tabular data in R
● Data frames are represented as a special type of list where every element of the list has to have the same length.
● Each element of the list can be thought of as a column and the length of each element of the list is the number of rows.
● Unlike matrices, data frames can store different classes of objects in each column.
![Page 86: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/86.jpg)
86
Data Frames
> x <- data.frame(foo = 1:4, bar = c(T, T, F, F))
> x
  foo   bar
1   1  TRUE
2   2  TRUE
3   3 FALSE
4   4 FALSE
> nrow(x)
[1] 4
> ncol(x)
[1] 2
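The list-like nature of a data frame can be seen in a small base R sketch. The trainees data frame and its values are hypothetical.

```r
# Hypothetical data: each column has its own class
trainees <- data.frame(
  name = c("Asha", "Baraka", "Chausiku"),
  age = c(23L, 31L, 19L),
  enrolled = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

column_classes <- sapply(trainees, class)
print(column_classes)  # one class per column

is.list(trainees)  # TRUE: a data frame is a special kind of list
length(trainees)   # 3: the list elements are the columns
nrow(trainees)     # 3: the length of each element is the number of rows
trainees$age       # a single column comes back as an ordinary vector
```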
![Page 87: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/87.jpg)
87
Writing Code in R
● Scripts:
– Turning interactive code into scripts
![Page 88: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/88.jpg)
88
Data Transformation
● Filter rows with filter()
– Comparisons: >, >=, <, <=, !=, ==
sqrt(2) ^ 2 == 2  # FALSE because of floating-point rounding; use near() instead of ==
– Logical operators
And &
Or | (x %in% y is shorthand for several == tests joined by |, e.g. 2 %in% c(1, 2, 3, 4))
Not !
– To test for missing values use is.na(x)
● Ordering: use arrange()
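The operators above can be tried on a small table. This sketch assumes dplyr is installed; the people data frame is invented for illustration.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical data for this sketch
people <- data.frame(
  name = c("Asha", "Baraka", "Chausiku", "Daudi"),
  age = c(23, 31, NA, 19),
  region = c("Njombe", "Dodoma", "Njombe", "Arusha"),
  stringsAsFactors = FALSE
)

# & requires both conditions; rows where the test evaluates to NA are dropped
njombe_adults <- filter(people, region == "Njombe" & age > 20)

# %in% replaces region == "Njombe" | region == "Arusha"
two_regions <- filter(people, region %in% c("Njombe", "Arusha"))

# Keep only the rows with a missing age
no_age <- filter(people, is.na(age))

# Sort by age, oldest first; missing values are always placed last
by_age <- arrange(people, desc(age))

# Floating-point comparison: == is too strict, near() allows a tiny tolerance
sqrt(2)^2 == 2      # FALSE
near(sqrt(2)^2, 2)  # TRUE
```

Note that Chausiku does not appear in njombe_adults: her age is NA, so the comparison age > 20 is NA and filter() keeps only rows where the condition is TRUE.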
![Page 89: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/89.jpg)
89
Reading Data: large datasets
● With much larger datasets, there are a few things that you can do that will make your life easier and will prevent R from choking.
– Read the help page for read.table, which contains many hints
– Estimate how much memory the dataset needs; stop if it is more than the RAM on your computer
– Set comment.char = "" if there are no commented lines in your file.
– Use the colClasses argument. Specifying this option instead of using the default can make read.table run MUCH faster, often twice as fast. You have to know the class of each column
– Set nrows. This doesn't make R run faster but it helps with memory usage.
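A rough memory estimate can be worked out before reading anything. The sketch below assumes a hypothetical dataset of 1,500,000 rows and 120 columns in which every column is numeric, so each value takes 8 bytes.

```r
# Hypothetical dataset: 1,500,000 rows and 120 columns, all numeric
n_rows <- 1500000
n_cols <- 120
bytes_per_numeric <- 8  # R stores each numeric value in 8 bytes

bytes <- n_rows * n_cols * bytes_per_numeric
gigabytes <- bytes / 2^30
round(gigabytes, 2)  # 1.34
```

This is only the size of the finished object. Reading the file typically needs extra working memory on top, so treat the figure as a lower bound.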
![Page 90: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/90.jpg)
90
Reading large datasets
● A quick way to figure out the classes of each column is the following:
> initial <- read.table("datatable.txt", nrows = 100)
> classes <- sapply(initial, class)
> tabAll <- read.table("datatable.txt", colClasses = classes)
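A self-contained version of the same idea is sketched below. Because "datatable.txt" is not available here, the sketch first writes a small stand-in file to a temporary location; the columns id, score and group are invented for illustration.

```r
# Build a small stand-in for "datatable.txt" so the sketch runs anywhere
path <- tempfile(fileext = ".txt")
sample_data <- data.frame(
  id = 1:1000,
  score = seq(0.5, 500, by = 0.5),
  group = rep(c("a", "b"), 500),
  stringsAsFactors = FALSE
)
write.table(sample_data, path, row.names = FALSE)

# Step 1: read only the first 100 rows and record each column's class
initial <- read.table(path, header = TRUE, nrows = 100)
classes <- sapply(initial, class)
print(classes)

# Step 2: read the whole file, telling read.table the classes up front
full <- read.table(path, header = TRUE, colClasses = classes,
                   comment.char = "", nrows = 1000)
```

The approach relies on the first 100 rows being representative. If a column looks like integers at the top of the file but contains decimals or text further down, the second read stops with an error, which is a useful signal that the guessed classes need adjusting.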
![Page 91: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/91.jpg)
91
Control Structures
● Control structures allow you to control the flow of execution of a series of R expressions.
● Control structures allow you to put some “logic” into R code, rather than just always executing the same R code every time.
● Control structures allow you to respond to inputs or to features of the data and execute different R expressions accordingly.
![Page 92: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/92.jpg)
92
Control Structures: if-else
● The if-else structure allows you to test a condition and act on it depending on whether it's true or false
– You can use the if statement on its own
if(<condition>) {
  ## do something
}
## Continue with rest of code
● Or use the complete if-else
if(<condition>) {
  ## do something
} else {
  ## do something else
}
● You can have a series of tests by following the initial if with any number of else ifs.
if(<condition1>) {
  ## do something
} else if(<condition2>) {
  ## do something different
} else {
  ## do something different
}
![Page 93: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/93.jpg)
93
Example: if-else
## Generate a uniform random number
x <- runif(1, 0, 10)
if(x > 3) {
  y <- 10
} else {
  y <- 0
}
● This is the same as executing
y <- if(x > 3) {
  10
} else {
  0
}
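A chain of else ifs can be wrapped in a function so it is easy to test with fixed inputs. The function grade_band and its thresholds are hypothetical, made up for this sketch.

```r
# Hypothetical helper for this sketch: label a score with a band
grade_band <- function(score) {
  if (score >= 70) {
    "high"
  } else if (score >= 40) {
    "medium"
  } else {
    "low"
  }
}

grade_band(85)  # "high"
grade_band(40)  # "medium"
grade_band(12)  # "low"

# if() tests a single condition; for a whole vector use ifelse()
ifelse(c(2, 5, 9) > 3, 10, 0)  # 0 10 10
```

The conditions are checked from top to bottom and the first one that is TRUE wins, so the order of the tests matters.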
![Page 94: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/94.jpg)
94
Control Structures: for
● For loops are the most commonly used looping construct in R
for(x in sequence) {
  ## Execute code
}
● For one-line loops, the curly braces are not strictly necessary.
> x <- c("a", "b", "c", "d")
> for(i in 1:4) print(x[i])
[1] "a"
[1] "b"
[1] "c"
[1] "d"
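Hard-coding 1:4 only works while the vector has exactly four elements. The base R sketch below shows two alternatives that adapt to the vector's length; the vector names are chosen for illustration.

```r
x <- c("a", "b", "c", "d")

# seq_along(x) generates 1, 2, ..., length(x)
for (i in seq_along(x)) {
  cat(i, x[i], "\n")
}

# You can also loop over the elements directly, without an index
for (letter in x) {
  print(letter)
}

# seq_along() is safe for empty vectors: the loop body never runs
empty <- character(0)
runs <- 0
for (i in seq_along(empty)) {
  runs <- runs + 1
}
runs  # 0

# By contrast, 1:length(empty) counts down from 1 to 0
1:length(empty)  # 1 0
```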
![Page 95: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/95.jpg)
95
Control Structures: while
● While loops begin by testing a condition
● If it is true, the loop body is executed and the condition is tested again; this repeats until the condition is false
> count <- 0
> while(count < 10) {
    print(count)
    count <- count + 1
  }
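While loops are useful when you do not know in advance how many iterations are needed. The sketch below halves a value until it drops to 1 or below; the starting value and the max_steps guard are arbitrary choices for illustration.

```r
value <- 100
steps <- 0
max_steps <- 1000  # safety guard so a mistake cannot loop forever

# Keep halving until the value is no longer above 1
while (value > 1 && steps < max_steps) {
  value <- value / 2
  steps <- steps + 1
}

steps  # 7
value  # 0.78125
```

A while loop whose condition never becomes false runs forever, so a guard like max_steps is a cheap precaution while developing.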
![Page 96: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/96.jpg)
96
Control Structures: next
● next is used to skip an iteration of a loop
for(i in 1:100) {
if(i <= 20) {
## Skip the first 20 iterations
next
}
## Do something here
}
![Page 97: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/97.jpg)
97
Control Structures: break
● break is used to exit the loop immediately, regardless of which iteration the loop may be on.
for(i in 1:100) {
  print(i)
  if(i > 20) {
    ## Stop the loop once i exceeds 20
    break
  }
}
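next and break can be combined in one loop. This base R sketch collects the even numbers up to 10; the variable name evens is chosen for illustration.

```r
evens <- c()
for (i in 1:100) {
  if (i %% 2 == 1) {
    next   # odd number: skip the rest of this iteration
  }
  if (i > 10) {
    break  # past 10: leave the loop entirely
  }
  evens <- c(evens, i)
}
print(evens)  # 2 4 6 8 10
```

The loop is written for 100 iterations but stops at i = 12, the first even number above 10.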
![Page 98: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/98.jpg)
98
Functions
![Page 99: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/99.jpg)
99
Functions: scoping
![Page 100: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/100.jpg)
100
Dates and Times
![Page 101: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/101.jpg)
101
Loop functions
![Page 102: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/102.jpg)
102
Simulating and Profiling
![Page 103: R programming for data science](https://reader033.vdocuments.mx/reader033/viewer/2022051123/5a6e6f587f8b9ae3258b5c67/html5/thumbnails/103.jpg)
103
Vectorized Operations