

DAI 523 Information Design 1: Data Visualization | Trogu | Fall 2013
Last updated: Tuesday, September 2, 2014
Sorted bar plot with 45 degree labels – step by step

Sorted bar plot with 45 degree labels in R

In this exercise we’ll plot a bar graph, sort it in decreasing order (big to small from left to right) and place long labels under the bars. The labels will be at a 45 degree angle so that they can fit and still be readable. Note that in Illustrator you can quickly do this with a rotated text box and another box that wraps the text and forces the labels to align with the bars at the base of the graph.

Thanks to Gabriel Bentley and Maggie Lee (TAs) who researched the code. The data file was originally used by student Michelle Boccia.

Download the data set here, composed of increases in tuition by various state universities from 2010-11 to 2011-12. In the exercise, the percent increase will be plotted. The data set file (CSV) is called:
stateU1011.csv
Below is the way the text file looks and the way it will look in R.

The data has been cleaned and there are no spaces or special characters in the file name or header names. For example, if there is a dash in a header name, R will change it into a period when importing. Also, if a header name starts with a number, R will add an X in front of the name when importing.

Please note that not all data sets are ideally plotted as a bar chart. I believe that bar plots are best used when the X axis (horizontal) is used for categories (universities, states, etc.) rather than dates. When a time series needs to be plotted (years etc. on the X axis), a line graph is sufficient. Also, plotting percentages as bars is usually great for comparing the items with each other, but the Y axis usually stops well short of 100%, so the relation of each bar to the whole is cut off at the top and that can skew the perception of the graph. Just be aware of it.

The final code can be found at the end of this document.

Import the dataset stateU1011.csv into RStudio (header: yes, comma separated: yes) and plot it. As a rule, type your code in the R script window (upper left) and run it with the Run button in the upper right of that window. If necessary, select only the code you would like to run, then run. In the matrix that the plot command draws, identify which data columns from the data set you are going to visualize. Your choices are the labels in the boxes along the diagonal of the matrix. For each plot, look up or down to identify the X axis, and look sideways to identify the Y axis. For this step, refer also to Chapter 6 in the textbook, and especially my annotated pages 188-189. In this example I will pick campus and percentIncrease.
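The header renaming described above can be checked with a small sketch. The CSV text and its header names below are made up for illustration (they are not taken from stateU1011.csv); the example only assumes the default behavior of read.csv.

```r
# Hypothetical two-row CSV, inlined as text so the example runs without a file.
# The header names are invented to trigger R's renaming rules.
csv_text <- "campus,percent-increase,2011tuition
Campus A,12,5000
Campus B,5,4200"

demo <- read.csv(text = csv_text, header = TRUE, sep = ",")

# The dash becomes a period, and the name starting with a digit gets an X in front.
print(names(demo))

# Reading the real file from the working directory would look like this:
# stateU1011 <- read.csv("stateU1011.csv", header = TRUE, sep = ",")
```

Cleaning the header names in the CSV before importing avoids having to remember the renamed versions later in the code.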


plot(stateU1011)

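Plot campus and percentIncrease to check the graph. In this case R will plot dashes instead of dots: campus is a category (a factor), so R draws a box plot for each campus, and with a single value per campus each box collapses to a short line. If campus was imported as plain text rather than as a factor (the default since R 4.0), wrap it in factor() first.

plot(stateU1011$campus, stateU1011$percentIncrease)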
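To see where the dashes come from, here is a minimal sketch with a stand-in data frame. The campus names and values are placeholders invented for this example, not values from stateU1011.csv.

```r
# Placeholder data: three made-up campuses with made-up percent increases.
demo <- data.frame(
  campus = c("Campus A", "Campus B", "Campus C"),
  percentIncrease = c(12, 5, 21),
  stringsAsFactors = FALSE
)

# Convert the text column to a factor so plot() treats it as categories.
demo$campus <- factor(demo$campus)

# With a factor on X and numbers on Y, plot() draws one box plot per category.
# Each category has a single value here, so each box collapses to a flat line.
plot(demo$campus, demo$percentIncrease)
```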


Plot percentIncrease using the barplot command (note that only one data column is needed to plot the graph). Bars are arranged alphabetically by campus (the names of the universities). It looks cool but it’s difficult to compare each university with the others. Sorting the bars will look less cool but it will be much more informative.

barplot(stateU1011$percentIncrease)

Below, add the campus name labels using names.arg. Notice that only a few labels are displayed, simply because there is not enough room for all the labels to show up.

barplot(stateU1011$percentIncrease, names.arg=stateU1011$campus)
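As an aside, one alternative that this tutorial does not use is to turn the labels perpendicular to the axis and shrink them, which usually makes all of them show up. The sketch below assumes a stand-in data frame with invented names and values; las and cex.names are standard barplot settings.

```r
# Placeholder data: made-up campus names and values, for illustration only.
demo <- data.frame(
  campus = c("Hypothetical University A", "Hypothetical University B",
             "Hypothetical University C", "Hypothetical University D",
             "Hypothetical University E"),
  percentIncrease = c(12, 5, 21, 9, 15),
  stringsAsFactors = FALSE
)

# las = 2 draws the axis labels perpendicular to the axis (vertical under the bars);
# cex.names shrinks the label text so long names take less room.
mids <- barplot(demo$percentIncrease, names.arg = demo$campus,
                las = 2, cex.names = 0.6)
```

Vertical labels are harder to read than angled ones, which is one reason to prefer the 45 degree approach used in the rest of this exercise.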


Next, we’ll sort the bars. In order to do this, we’ll create an object in R where the data will be sorted by the increase amount.

sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")

See result below and sortedTable object in following picture.

Select the sortedTable object (right window) to display this new virtual data set (sorted by percentIncrease). Note that by default R sorted the data in increasing order (small to big).
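If the order() part of the sorting line looks mysterious, this small sketch shows what it returns. The numbers and campus names are placeholders invented for the example.

```r
# Three made-up percent increases.
pct <- c(12, 5, 21)

# order() does not return the sorted values. It returns the row positions
# that would put the values in increasing order.
idx <- order(pct)
print(idx)        # positions: 2 1 3
print(pct[idx])   # values in increasing order: 5 12 21

# Using those positions as row numbers rearranges a whole data frame,
# which is what the sortedTable line does with stateU1011.
demo <- data.frame(
  campus = c("Campus A", "Campus B", "Campus C"),
  percentIncrease = pct,
  stringsAsFactors = FALSE
)
sortedDemo <- demo[order(demo$percentIncrease), ]
print(sortedDemo)
```

The comma before the closing bracket matters: it tells R to reorder the rows and keep all the columns.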


Now labels for the university names will be added at a 45 degree angle (third line in code below, run all three lines at once).

sortedTable <- stateU1011[order(stateU1011$percentIncrease), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts, y=-1, sortedTable$campus, cex=0.5, srt=45, xpd=TRUE, pos=2)
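The midpts object in the code above holds what barplot() hands back: the horizontal position of the middle of each bar. A quick sketch with three made-up values shows the idea (the numbers are placeholders, and the spacing assumes barplot’s default gap between bars).

```r
# Three made-up bar heights.
heights <- c(12, 5, 21)

# barplot() draws the bars and also returns the X position of each bar's midpoint.
# The second argument (1) is the bar width, as in the tutorial code.
midpts <- barplot(heights, 1, names.arg = "")

# With bars of width 1 and the default gap of 0.2, the midpoints
# fall at 0.7, 1.9 and 3.1 on the X axis.
print(as.numeric(midpts))

# text() can then use those positions to put one label under each bar.
text(x = midpts, y = -1, labels = c("Campus A", "Campus B", "Campus C"),
     cex = 0.5, srt = 45, xpd = TRUE, pos = 2)
```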

x tells R where the labels should be positioned horizontally. It uses midpts, the object returned by barplot that holds the midpoint of each bar (see the data set window, but don’t worry about it here).
y sets the vertical position of the labels, in the units of the Y axis; -1 puts them just below the base of the bars. Play around with this value: if it looks like nothing happened and you don’t get an error, it probably means the labels are rendering off screen, outside the window. Change the value until the labels appear.
sortedTable$campus supplies the names of the campuses, in the new order sorted by percent increase.
cex sets the type size of the labels.
srt sets the angle of the label, in this case 45 degrees.
xpd=TRUE allows R to draw outside the plot area, which is what lets the labels show up below the bars.
pos sets the alignment; 2 places the text to the left of the x, y position, so each label ends at its bar (flush right).

Next, we’ll reverse the sorting to the more traditional large to small, left to right. The code is the same as before with the extra decreasing = TRUE part and a bigger type size (cex). Run all at once again.

sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]

midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts, y=-1, sortedTable$campus, cex=1, srt=45, xpd=TRUE, pos=2)

Note that the labels are still disappearing under the window. Don’t worry, after exporting the plot to PDF and opening the file in Illustrator the labels will display correctly, just make the artboard bigger to make them fit. (See pic on page 6).
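If you would rather see the full labels in R itself, one option that is not part of the original exercise is to enlarge the bottom margin with par(mar = ...) and write the plot straight to a PDF with pdf(). The sketch below is only an illustration under assumptions: the data frame, the file name and the margin and page sizes are all made up, so adjust them for the real data set.

```r
# Placeholder data: made-up campus names and values, for illustration only.
demo <- data.frame(
  campus = c("Hypothetical State University A",
             "Hypothetical State University B",
             "Hypothetical State University C"),
  percentIncrease = c(12, 5, 21),
  stringsAsFactors = FALSE
)
sortedDemo <- demo[order(demo$percentIncrease, decreasing = TRUE), ]

# Hypothetical output file, written to R's temporary folder.
out_file <- file.path(tempdir(), "sorted_barplot_demo.pdf")

pdf(out_file, width = 8, height = 6)   # open a PDF device, size in inches
op <- par(mar = c(12, 4, 2, 1))        # margins in lines: bottom, left, top, right

midpts <- barplot(sortedDemo$percentIncrease, 1, names.arg = "")
text(x = midpts, y = -1, labels = sortedDemo$campus,
     cex = 1, srt = 45, xpd = TRUE, pos = 2)

par(op)                                # restore the previous margins
dev.off()                              # close the device so the PDF is written
```

The bottom margin of 12 lines is a guess that leaves room for fairly long names; longer labels or a bigger cex will need more.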


After opening the file in Illustrator, remember to:
Select all > Object > Clipping Mask > Release
Also: Compound Path > Release.
Remove any unwanted boxes.
When editing objects (rectangles etc.) remember that each object is split into two separate objects: fill and border. This is unlike the normal way, where border and fill are separate attributes but belong to the same object (a quirk of the R > PDF export).
If you want to change the spacing of the labels, you need to use Align and space equally. Or put all text in one continuous text box, rotate, place object wrap on top, and use leading (line spacing) to space labels.

For more information:

How can I sort my data in R?
http://bit.ly/dxWybg

How to display all x labels in R barplot?
http://bit.ly/1fkfVhu


Final code:

sortedTable <- stateU1011[order(stateU1011$percentIncrease, decreasing = TRUE), ]
midpts <- barplot(sortedTable$percentIncrease, 1, names.arg="")
text(x=midpts, y=-1, sortedTable$campus, cex=1, srt=45, xpd=TRUE, pos=2)
