class: center, middle, inverse, title-slide

# More plotting with ggplot2

### Kevin Stachelek

### 2019/03/18 (updated: 2019-03-19)

---

## Install required packages

```r
# Install the packages used in these slides (only needed once)
install.packages("ggplot2")
install.packages("magrittr") # important so that we can use %>%!
install.packages("gapminder")
install.packages("dplyr")
```

## Load required packages

```r
# Load ggplot2 and the helper packages
library(ggplot2)
library(magrittr) # important so that we can use %>%!
library(gapminder)
library(dplyr)
```

---

### Explore the mtcars data frame with `head()`

<table> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:right;"> mpg </th> <th style="text-align:left;"> cyl </th> <th style="text-align:right;"> disp </th> <th style="text-align:right;"> hp </th> <th style="text-align:right;"> drat </th> <th style="text-align:right;"> wt </th> <th style="text-align:right;"> qsec </th> <th style="text-align:right;"> vs </th> <th style="text-align:right;"> am </th> <th style="text-align:right;"> gear </th> <th style="text-align:right;"> carb </th> <th style="text-align:left;"> fcyl </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Mazda RX4 </td> <td style="text-align:right;"> 21.0 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 160 </td> <td style="text-align:right;"> 110 </td> <td style="text-align:right;"> 3.90 </td> <td style="text-align:right;"> 2.620 </td> <td style="text-align:right;"> 16.46 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> Mazda RX4 Wag </td> <td style="text-align:right;"> 21.0 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 160 </td> <td style="text-align:right;"> 110 </td> <td style="text-align:right;"> 3.90 </td> <td style="text-align:right;"> 2.875 </td> <td 
style="text-align:right;"> 17.02 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> Datsun 710 </td> <td style="text-align:right;"> 22.8 </td> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 108 </td> <td style="text-align:right;"> 93 </td> <td style="text-align:right;"> 3.85 </td> <td style="text-align:right;"> 2.320 </td> <td style="text-align:right;"> 18.61 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 4 </td> </tr> <tr> <td style="text-align:left;"> Hornet 4 Drive </td> <td style="text-align:right;"> 21.4 </td> <td style="text-align:left;"> 6 </td> <td style="text-align:right;"> 258 </td> <td style="text-align:right;"> 110 </td> <td style="text-align:right;"> 3.08 </td> <td style="text-align:right;"> 3.215 </td> <td style="text-align:right;"> 19.44 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 6 </td> </tr> <tr> <td style="text-align:left;"> Hornet Sportabout </td> <td style="text-align:right;"> 18.7 </td> <td style="text-align:left;"> 8 </td> <td style="text-align:right;"> 360 </td> <td style="text-align:right;"> 175 </td> <td style="text-align:right;"> 3.15 </td> <td style="text-align:right;"> 3.440 </td> <td style="text-align:right;"> 17.02 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:left;"> 8 </td> </tr> <tr> <td style="text-align:left;"> Valiant </td> <td style="text-align:right;"> 18.1 </td> <td style="text-align:left;"> 6 
</td> <td style="text-align:right;"> 225 </td> <td style="text-align:right;"> 105 </td> <td style="text-align:right;"> 2.76 </td> <td style="text-align:right;"> 3.460 </td> <td style="text-align:right;"> 20.22 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:left;"> 6 </td> </tr> </tbody> </table> --- ### Explore the mtcars data frame with `str()` ```r str(mtcars) ``` ``` ## 'data.frame': 32 obs. of 12 variables: ## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ... ## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ... ## $ disp: num 160 160 108 258 360 ... ## $ hp : num 110 110 93 110 175 105 245 62 95 123 ... ## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ... ## $ wt : num 2.62 2.88 2.32 3.21 3.44 ... ## $ qsec: num 16.5 17 18.6 19.4 17 ... ## $ vs : num 0 0 1 1 0 1 0 1 1 1 ... ## $ am : num 1 1 1 0 0 0 0 0 0 0 ... ## $ gear: num 4 4 4 3 3 3 3 4 4 4 ... ## $ carb: num 4 4 1 1 2 1 4 2 2 4 ... ## $ fcyl: Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ... ``` --- ### Build a simple plot ```r ggplot(mtcars, aes(x = cyl, y = mpg)) + geom_point() ``` <!-- --> --- ### Axes aesthetics ```r ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() ``` <!-- --> --- ### Color aesthetics ```r ggplot(mtcars, aes(x = wt, y = mpg, color = disp)) + geom_point() ``` <!-- --> --- ### Size aesthetics ```r ggplot(mtcars, aes(x = wt, y = mpg, size = disp)) + geom_point() ``` <!-- --> --- ### One common error to avoid ```r ggplot(mtcars, aes(x = wt, y = mpg, shape = disp)) + geom_point() ``` -- ```r table(mtcars$disp) ``` ``` ## ## 71.1 75.7 78.7 79 95.1 108 120.1 120.3 121 140.8 145 146.7 ## 1 1 1 1 1 1 1 1 1 1 1 1 ## 160 167.6 225 258 275.8 301 304 318 350 351 360 400 ## 2 2 1 1 3 1 1 1 1 1 2 1 ## 440 460 472 ## 1 1 1 ``` -- The _types_ of variables (i.e. 
columns) are important

---

### Let's look at another dataset

```r
# Explore the diamonds data frame with str()
str(diamonds)
```

```
## Classes 'tbl_df', 'tbl' and 'data.frame': 53940 obs. of 10 variables:
## $ carat : num 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
## $ cut : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
## $ color : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
## $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
## $ depth : num 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
## $ table : num 55 61 65 58 58 57 57 55 61 61 ...
## $ price : int 326 326 327 334 335 336 336 337 337 338 ...
## $ x : num 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
## $ y : num 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
## $ z : num 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
```

---

### We can build up a new plot following the same 'recipe' every time

+ add data (diamonds)

--

+ specify aesthetics (x axis and y axis inside `aes`)

--

+ add a geom object (`geom_point`)

--

```r
# Add geom_point() with +
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()
```

---

### We can layer additional `geoms` one after another

```r
# Add geom_point() and geom_smooth() with +
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth()
```

---

### Let's recreate that plot

--

```r
# 1 - The plot you created in the previous exercise
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  geom_smooth()
```

---

### Plotting only the trend-line

--

```r
# 2 - Copy the above command but show only the smooth line
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_smooth()
```

---

### Plotting separate lines for each diamond clarity value

--

```r
# 3 - Copy the above command and assign the correct value to col in aes()
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_smooth()
```

---

### Color points by clarity

--

```r
# 4 - Keep the color settings from previous command. Plot only the points with argument alpha.
ggplot(diamonds, aes(x = carat, y = price, color = clarity)) +
  geom_point(alpha = 0.4)
```

---

### Plots can be created in two different steps using the `+` symbol

```r
# Create the object containing the data and aes layers: dia_plot
dia_plot <- ggplot(diamonds, aes(x = carat, y = price))

# Add a geom layer with + and geom_point()
dia_plot + geom_point()
```

---

### Aesthetics can be 'scoped' to a specific `geom` object

```r
# Add the same geom layer, but with aes() inside
dia_plot + geom_point(aes(color = clarity))
```

---

### Basic scatter plot

--

```r
# 2 - Expand dia_plot by adding geom_point() with alpha set to 0.2
dia_plot <- dia_plot + geom_point(alpha = 0.2)
print(dia_plot)
```

---

### Plot a single trendline

--

```r
# 3 - Plot dia_plot with additional geom_smooth() with se set to FALSE
dia_plot + geom_smooth(se = FALSE)
```

* `se` controls whether the confidence band around the smooth is drawn (`se = FALSE` hides it)

---

### Plot trendline by clarity

--

```r
# 4 - Copy the command from above and add aes() with the correct mapping to geom_smooth()
dia_plot + geom_smooth(aes(col = clarity), se = FALSE)
```

---

# _Base R_ plotting

---

#### Take a look at the built-in `iris` dataset

```r
knitr::kable(head(iris), format = "html")
```

<table> <thead> <tr> <th style="text-align:right;"> Sepal.Length </th> <th style="text-align:right;"> Sepal.Width </th> <th style="text-align:right;"> Petal.Length </th> <th style="text-align:right;"> Petal.Width </th> <th style="text-align:left;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 5.1 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 3.0 </td> <td 
style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 1.3 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.6 </td> <td style="text-align:right;"> 3.1 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 3.6 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 5.4 </td> <td style="text-align:right;"> 3.9 </td> <td style="text-align:right;"> 1.7 </td> <td style="text-align:right;"> 0.4 </td> <td style="text-align:left;"> setosa </td> </tr> </tbody> </table> --- ### a simple base R plot ```r plot(iris$Sepal.Length, iris$Sepal.Width) points(iris$Petal.Length, iris$Petal.Width, col = "red") ``` <!-- --> --- ### Limitations of Base Plotting 1. Plot doesn't get redrawn (this is similar to `matplotlib` in python) 2. Plot is drawn as an image 3. Need to manually add legend 4. 
No unified framework for plotting

---

### More about Base plots

```r
# Plot the correct variables of mtcars
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)
```

---

### More about Base plots

```r
# Change cyl inside mtcars to a factor
mtcars$fcyl <- as.factor(mtcars$cyl)

# Make the same plot as in the first instruction
plot(mtcars$wt, mtcars$mpg, col = mtcars$fcyl)
```

---

### More about Base plots

```r
# Use lm() to calculate a linear model and save it as carModel
carModel <- lm(mpg ~ wt, data = mtcars)

# Basic plot
mtcars$cyl <- as.factor(mtcars$cyl)
plot(mtcars$wt, mtcars$mpg, col = mtcars$cyl)

# Call abline() with carModel as first argument and set lty to 2
abline(carModel, lty = 2)
```

---

### The same plot using `ggplot2`

```r
# Plot 1: add geom_point() to this command to create a scatter plot
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point()
```

---

### The same plot using `ggplot2`

```r
# Plot 2: add one dashed lm line fitted to the whole dataset (group = 1)
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point() +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)
```

---

### The same plot using `ggplot2`

```r
# Plot 3: add an lm line for each cyl group, plus the dashed line for the whole dataset
ggplot(mtcars, aes(x = wt, y = mpg, col = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, linetype = 2)
```

---

# Let's talk about _tidy_ data!

### ggplot2 usually works with _tidy_ data

There are three interrelated rules which make a dataset tidy:

1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.
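---

### Building a long version of `iris`

The slides plot a long table called `iris.long` without showing how it was built. One way it could be built is sketched below in base R. The construction itself is an assumption; only the column names `Length`, `Width`, `Species` and `Part` come from the slides.

```r
# Assumed construction: stack the petal and sepal measurements so that
# each row describes one flower part (one observation per row)
iris.long <- rbind(
  data.frame(Length = iris$Petal.Length, Width = iris$Petal.Width,
             Species = iris$Species, Part = "Petal"),
  data.frame(Length = iris$Sepal.Length, Width = iris$Sepal.Width,
             Species = iris$Species, Part = "Sepal")
)

dim(iris.long)  # 300 rows, 4 columns
head(iris.long)
```

Each variable now has its own column, so `Part` can be mapped to an aesthetic such as `col` instead of adding one layer per flower part.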
--- ### two types of data: 'wide' and 'long' <table> <thead> <tr> <th style="text-align:right;"> Sepal.Length </th> <th style="text-align:right;"> Sepal.Width </th> <th style="text-align:right;"> Petal.Length </th> <th style="text-align:right;"> Petal.Width </th> <th style="text-align:left;"> Species </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 5.1 </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 3.2 </td> <td style="text-align:right;"> 1.3 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 4.6 </td> <td style="text-align:right;"> 3.1 </td> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 3.6 </td> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> </tr> <tr> <td style="text-align:right;"> 5.4 </td> <td style="text-align:right;"> 3.9 </td> <td style="text-align:right;"> 1.7 </td> <td style="text-align:right;"> 0.4 </td> <td style="text-align:left;"> setosa </td> </tr> </tbody> </table> <table> <thead> <tr> <th style="text-align:right;"> Length </th> <th style="text-align:right;"> Width </th> <th style="text-align:left;"> Species </th> <th style="text-align:left;"> Part </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa 
</td> <td style="text-align:left;"> Petal </td> </tr> <tr> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> <td style="text-align:left;"> Petal </td> </tr> <tr> <td style="text-align:right;"> 1.3 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> <td style="text-align:left;"> Petal </td> </tr> <tr> <td style="text-align:right;"> 1.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> <td style="text-align:left;"> Petal </td> </tr> <tr> <td style="text-align:right;"> 1.4 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:left;"> setosa </td> <td style="text-align:left;"> Petal </td> </tr> <tr> <td style="text-align:right;"> 1.7 </td> <td style="text-align:right;"> 0.4 </td> <td style="text-align:left;"> setosa </td> <td style="text-align:left;"> Petal </td> </tr> </tbody> </table>

* Tidy data is usually 'long'

---

### Tidy data makes code cleaner and therefore more reliable

```r
# Option 1: wide data, one geom_point() layer per flower part
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point() +
  geom_point(aes(x = Petal.Length, y = Petal.Width), col = "red")

# Option 2: long data (iris.long has columns Length, Width, Species, Part), one layer
ggplot(iris.long, aes(x = Length, y = Width, col = Part)) +
  geom_point()
```

---

### Advantages of ggplot in comparison to base

1. The legend gets taken care of
2. The axis labels are legible
3. The plot can be __iterated on__

---

### So how do we make data tidy?

--

## Use `tidyr` and `dplyr`!
```r install.packages("nycflights13") install.packages("tidyr") ``` ```r library(tidyr) library(nycflights13) ``` --- ### Gather  --- ### Gather ```r ds <- nycflights13::airports %>% gather(lat, lon, key = "coordinate", value = "value") ``` <table> <thead> <tr> <th style="text-align:left;"> faa </th> <th style="text-align:left;"> name </th> <th style="text-align:right;"> alt </th> <th style="text-align:right;"> tz </th> <th style="text-align:left;"> dst </th> <th style="text-align:left;"> tzone </th> <th style="text-align:left;"> coordinate </th> <th style="text-align:right;"> value </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 04G </td> <td style="text-align:left;"> Lansdowne Airport </td> <td style="text-align:right;"> 1044 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> lat </td> <td style="text-align:right;"> 41.13047 </td> </tr> <tr> <td style="text-align:left;"> 06A </td> <td style="text-align:left;"> Moton Field Municipal Airport </td> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> <td style="text-align:left;"> lat </td> <td style="text-align:right;"> 32.46057 </td> </tr> <tr> <td style="text-align:left;"> 06C </td> <td style="text-align:left;"> Schaumburg Regional </td> <td style="text-align:right;"> 801 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> <td style="text-align:left;"> lat </td> <td style="text-align:right;"> 41.98934 </td> </tr> <tr> <td style="text-align:left;"> 06N </td> <td style="text-align:left;"> Randall Airport </td> <td style="text-align:right;"> 523 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td 
style="text-align:left;"> lat </td> <td style="text-align:right;"> 41.43191 </td> </tr> <tr> <td style="text-align:left;"> 09J </td> <td style="text-align:left;"> Jekyll Island Airport </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> lat </td> <td style="text-align:right;"> 31.07447 </td> </tr> <tr> <td style="text-align:left;"> 0A9 </td> <td style="text-align:left;"> Elizabethton Municipal Airport </td> <td style="text-align:right;"> 1593 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> lat </td> <td style="text-align:right;"> 36.37122 </td> </tr> </tbody> </table>

---

### Spread

---

### Spread

```r
# key = coordinate, value = value; a third positional argument would be
# taken as `fill`, so the column names are not passed again here
ds <- ds %>%
  tidyr::spread(coordinate, value)
```

<table> <thead> <tr> <th style="text-align:left;"> faa </th> <th style="text-align:left;"> name </th> <th style="text-align:right;"> alt </th> <th style="text-align:right;"> tz </th> <th style="text-align:left;"> dst </th> <th style="text-align:left;"> tzone </th> <th style="text-align:left;"> lat </th> <th style="text-align:left;"> lon </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 04G </td> <td style="text-align:left;"> Lansdowne Airport </td> <td style="text-align:right;"> 1044 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> 41.1304722 </td> <td style="text-align:left;"> -80.6195833 </td> </tr> <tr> <td style="text-align:left;"> 06A </td> <td style="text-align:left;"> Moton Field Municipal Airport </td> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> <td 
style="text-align:left;"> 32.4605722 </td> <td style="text-align:left;"> -85.6800278 </td> </tr> <tr> <td style="text-align:left;"> 06C </td> <td style="text-align:left;"> Schaumburg Regional </td> <td style="text-align:right;"> 801 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> <td style="text-align:left;"> 41.9893408 </td> <td style="text-align:left;"> -88.1012428 </td> </tr> <tr> <td style="text-align:left;"> 06N </td> <td style="text-align:left;"> Randall Airport </td> <td style="text-align:right;"> 523 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> 41.431912 </td> <td style="text-align:left;"> -74.3915611 </td> </tr> <tr> <td style="text-align:left;"> 09J </td> <td style="text-align:left;"> Jekyll Island Airport </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> 31.0744722 </td> <td style="text-align:left;"> -81.4277778 </td> </tr> <tr> <td style="text-align:left;"> 0A9 </td> <td style="text-align:left;"> Elizabethton Municipal Airport </td> <td style="text-align:right;"> 1593 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> <td style="text-align:left;"> 36.3712222 </td> <td style="text-align:left;"> -82.1734167 </td> </tr> </tbody> </table> --- ### Separate  --- ### Separate ```r ds2 <- nycflights13::airports %>% tidyr::separate(tzone, into = c("country", "city"), sep = "/") ``` <table> <thead> <tr> <th style="text-align:left;"> faa </th> <th style="text-align:left;"> name </th> <th style="text-align:right;"> lat </th> <th style="text-align:right;"> lon </th> <th style="text-align:right;"> alt </th> <th 
style="text-align:right;"> tz </th> <th style="text-align:left;"> dst </th> <th style="text-align:left;"> country </th> <th style="text-align:left;"> city </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 04G </td> <td style="text-align:left;"> Lansdowne Airport </td> <td style="text-align:right;"> 41.13047 </td> <td style="text-align:right;"> -80.61958 </td> <td style="text-align:right;"> 1044 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> New_York </td> </tr> <tr> <td style="text-align:left;"> 06A </td> <td style="text-align:left;"> Moton Field Municipal Airport </td> <td style="text-align:right;"> 32.46057 </td> <td style="text-align:right;"> -85.68003 </td> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> Chicago </td> </tr> <tr> <td style="text-align:left;"> 06C </td> <td style="text-align:left;"> Schaumburg Regional </td> <td style="text-align:right;"> 41.98934 </td> <td style="text-align:right;"> -88.10124 </td> <td style="text-align:right;"> 801 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> Chicago </td> </tr> <tr> <td style="text-align:left;"> 06N </td> <td style="text-align:left;"> Randall Airport </td> <td style="text-align:right;"> 41.43191 </td> <td style="text-align:right;"> -74.39156 </td> <td style="text-align:right;"> 523 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> New_York </td> </tr> <tr> <td style="text-align:left;"> 09J </td> <td style="text-align:left;"> Jekyll Island Airport </td> <td style="text-align:right;"> 31.07447 </td> <td style="text-align:right;"> 
-81.42778 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> New_York </td> </tr> <tr> <td style="text-align:left;"> 0A9 </td> <td style="text-align:left;"> Elizabethton Municipal Airport </td> <td style="text-align:right;"> 36.37122 </td> <td style="text-align:right;"> -82.17342 </td> <td style="text-align:right;"> 1593 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America </td> <td style="text-align:left;"> New_York </td> </tr> </tbody> </table> --- ### Unite  --- ### Unite ```r ds2 <- ds2 %>% tidyr::unite(country, city, col = "tzone", sep = "/") ``` <table> <thead> <tr> <th style="text-align:left;"> faa </th> <th style="text-align:left;"> name </th> <th style="text-align:right;"> lat </th> <th style="text-align:right;"> lon </th> <th style="text-align:right;"> alt </th> <th style="text-align:right;"> tz </th> <th style="text-align:left;"> dst </th> <th style="text-align:left;"> tzone </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 04G </td> <td style="text-align:left;"> Lansdowne Airport </td> <td style="text-align:right;"> 41.13047 </td> <td style="text-align:right;"> -80.61958 </td> <td style="text-align:right;"> 1044 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> </tr> <tr> <td style="text-align:left;"> 06A </td> <td style="text-align:left;"> Moton Field Municipal Airport </td> <td style="text-align:right;"> 32.46057 </td> <td style="text-align:right;"> -85.68003 </td> <td style="text-align:right;"> 264 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> </tr> <tr> <td style="text-align:left;"> 06C </td> <td style="text-align:left;"> Schaumburg Regional </td> 
<td style="text-align:right;"> 41.98934 </td> <td style="text-align:right;"> -88.10124 </td> <td style="text-align:right;"> 801 </td> <td style="text-align:right;"> -6 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/Chicago </td> </tr> <tr> <td style="text-align:left;"> 06N </td> <td style="text-align:left;"> Randall Airport </td> <td style="text-align:right;"> 41.43191 </td> <td style="text-align:right;"> -74.39156 </td> <td style="text-align:right;"> 523 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> </tr> <tr> <td style="text-align:left;"> 09J </td> <td style="text-align:left;"> Jekyll Island Airport </td> <td style="text-align:right;"> 31.07447 </td> <td style="text-align:right;"> -81.42778 </td> <td style="text-align:right;"> 11 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> </tr> <tr> <td style="text-align:left;"> 0A9 </td> <td style="text-align:left;"> Elizabethton Municipal Airport </td> <td style="text-align:right;"> 36.37122 </td> <td style="text-align:right;"> -82.17342 </td> <td style="text-align:right;"> 1593 </td> <td style="text-align:right;"> -5 </td> <td style="text-align:left;"> A </td> <td style="text-align:left;"> America/New_York </td> </tr> </tbody> </table> --- ### Modifying aesthetics + everything in `aes` is an aesthetic mapping (this includes x-axis and y-axis) + we can modify these `aes` with some specific options + let's modify the x-axis aesthetic! 
+ standard position (the geom's default)
+ `fill` position (proportional)
+ `dodge` position (bars placed side by side)

---

# Other Types of Plots

### Load Data

```r
gm_2007 <- gapminder %>%
  filter(year == 2007)
```

--

### Load (More) Data

```r
by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(totalPop = sum(as.numeric(pop)), meanLifeExp = mean(lifeExp)) # as.numeric() avoids integer overflow
# The 2007 rows, used for the bar plot below
by_continent <- filter(by_year_continent, year == 2007)
```

---

# Line plots

change over time

--

```r
ggplot(by_year_continent, aes(x = year, y = totalPop, color = continent)) +
*  geom_line() +
  expand_limits(y = 0)
```

---

# Bar plots

compare a value across several categories

--

```r
ggplot(by_continent, aes(x = continent, y = meanLifeExp)) +
*  geom_col()
```

---

# Histograms

distribution of a single numeric variable

--

```r
ggplot(gm_2007, aes(x = lifeExp)) +
*  geom_histogram()
```

---

# It's important to manage the binwidth of a histogram

```r
ggplot(gm_2007, aes(x = lifeExp)) +
*  geom_histogram(binwidth = 5)
```

---

# Box plots

distribution of a numeric variable compared across categories

--

```r
ggplot(gm_2007, aes(x = continent, y = lifeExp)) +
*  geom_boxplot()
```

---

# Histogram vs Box Plot

.pull-left[

```r
hist_plot <- ggplot(gm_2007, aes(x = lifeExp)) +
*  geom_histogram()
hist_plot
```

]

.pull-right[

```r
ggplot(gm_2007, aes(x = continent, y = lifeExp)) +
*  geom_boxplot()
```

]

---

# How to Print to a File

### Three ways:

--

### ggsave

--

### graphics device

--

### manual

---

# ggsave

#### ggsave defaults to the last printed plot

```r
ggsave("test_histogram.pdf")
```

#### otherwise you can specify a plot object as the second argument

```r
ggsave("test_histogram2.pdf", plot = hist_plot)
```

---

# Graphics device (pdf, png, jpg, etc.)

#### *Warning* The plot needs to be *printed*, not just created within the graphics device

```r
pdf("test_histogram3.pdf")
print(hist_plot) # explicit print() also works inside functions and loops
dev.off()
```

```
## png
##   2
```

---

# Manually

```r
# Show the plot in RStudio, then export it from the Plots pane
print(hist_plot)
```
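---

### Printing inside a function

The warning on the graphics device slide matters most inside functions and loops, where a bare plot object is not printed automatically. Below is a minimal sketch of that case. The helper name `save_plot_pdf`, the example plot and the temporary file path are all hypothetical and not part of the original slides.

```r
library(ggplot2)

# Hypothetical helper: write one ggplot object to a PDF file
save_plot_pdf <- function(plot, path) {
  pdf(path)                       # open the PDF graphics device
  on.exit(dev.off(), add = TRUE)  # close the device even if printing fails
  print(plot)                     # explicit print(); a bare `plot` here would draw nothing
  invisible(path)
}

# Example plot built from the built-in mtcars data
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 5)

out_file <- save_plot_pdf(p, file.path(tempdir(), "example_histogram.pdf"))
file.exists(out_file)
```

Using `on.exit()` to close the device is a design choice rather than a requirement: it keeps a failed plot from leaving a PDF device open for the rest of the session.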