Subsetting, str(), Shiny debugging

Learning goals

After this lesson, you should be able to:

Subset vectors and matrices with [ by index, name, logical vector, and indirectly with variables
Subset data frames with $ and [[
Use the str() function to examine the structure of an unfamiliar object and extract components from the object
Apply printing strategies with a Shiny app to streamline the debugging and development process

You can download a template Quarto file to start from here. Put this file in a folder called functions within a folder for this course.

Subsetting

The content here comes from Chapter 27 of R4DS, with some small additions.

Selecting many elements with `[`

There are five main types of things that you can subset a vector with, i.e., that can be the i in x[i]:

A vector of positive integers. Subsetting with positive integers keeps the elements at those positions:
```
x <- c("one", "two", "three", "four", "five")
x[c(3, 2, 5)]
```
```
[1] "three" "two"   "five" 
```
```
x[2:4]
```
```
[1] "two"   "three" "four" 
```
By repeating a position, you can actually make a longer output than input, making the term “subsetting” a bit of a misnomer.
```
x[c(1, 1, 2)]
```
```
[1] "one" "one" "two"
```
A vector of negative integers. Negative values drop the elements at the specified positions:
```
x[c(-1, -3, -5)]
```
```
[1] "two"  "four"
```
A logical vector. Subsetting with a logical vector only keeps values corresponding to TRUE. This is generally used with comparison functions and operators.
```
x <- c(10, 3, NA, 5, 8, 1, NA)

# All non-missing values of x
x[!is.na(x)]
```
```
[1] 10  3  5  8  1
```
```
# All values greater than 5, with NAs
x[x > 5]
```
```
[1] 10 NA  8 NA
```
```
# All non-missing values greater than 5
x[x > 5 & !is.na(x)]
```
```
[1] 10  8
```
Unlike filter(), which drops rows where the condition evaluates to NA, [ keeps NA indices: each one appears in the output as NA.
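One way to get filter()-like behavior from [ is to wrap the condition in which(), which returns the positions of the TRUE values and skips NA. A minimal sketch, reusing the same values of x as above:
```r
x <- c(10, 3, NA, 5, 8, 1, NA)

# which() returns the positions where the condition is TRUE and skips NAs
idx <- which(x > 5)
idx

# Subsetting by those positions gives the same result as x[x > 5 & !is.na(x)]
x[idx]
```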
A character vector. If you have a named vector, you can subset it with a character vector:
```
x <- c(abc = 1, def = 2, xyz = 5)
x[c("xyz", "def")]
```
```
xyz def 
  5   2 
```
As with subsetting with positive integers, you can use a character vector to duplicate individual entries.
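A small sketch of that duplication, reusing the named vector from above (the name dup is made up for this example):
```r
x <- c(abc = 1, def = 2, xyz = 5)

# Repeating a name repeats that element (and its name) in the output
dup <- x[c("abc", "abc", "xyz")]
dup
```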

An object. A variable that stores any of the previous four types of index can be used to subset:

x <- c(first = "one", second = "two", third = "three", fourth = "four")
# Note that x can also be created as follows
x <- c("one", "two", "three", "four")
names(x) <- c("first", "second", "third", "fourth")

# Subset with an integer object
idx_pos <- c(1, 3)
idx_neg <- c(-1, -3)
x[idx_pos]

  first   third 
  "one" "three"

x[idx_neg]

second fourth 
 "two" "four"

# Subset with a logical object
bool <- c(TRUE, FALSE, FALSE, TRUE)
x[bool]

 first fourth 
 "one" "four"

# Subset with a character object
which_names <- c("first", "fourth")
x[which_names]

 first fourth 
 "one" "four"

All of the above subsetting options can be combined with assignment <-. Be very wary of vector recycling when doing this! The number of things that you’re inserting should either be 1 or the size of the x[i] subset.

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x

  first  second   third  fourth 
  "one"   "two" "three"  "four"

x[c(1, 3)] <- "new" # Replacement length is 1
x

 first second  third fourth 
 "new"  "two"  "new" "four"

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3)] <- c("new1", "new2") # Replacement length is 2, and length of subset is 2
x

 first second  third fourth 
"new1"  "two" "new2" "four"

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3, 4)] <- c("new1", "new2") # BAD! Replacement length is 2, and length of subset is 3

Warning in x[c(1, 3, 4)] <- c("new1", "new2"): number of items to replace is
not a multiple of replacement length

x

 first second  third fourth 
"new1"  "two" "new2" "new1"

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3)] <- c("new1", "new2", "new3") # BAD! Replacement length is 3, and length of subset is 2

Warning in x[c(1, 3)] <- c("new1", "new2", "new3"): number of items to replace
is not a multiple of replacement length

x

 first second  third fourth 
"new1"  "two" "new2" "four"

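A common use of subsetting combined with assignment is replacing the values that meet a condition. The sketch below replaces missing values with 0; the vector y is made up for illustration. The replacement has length 1, so it is recycled safely to every selected position.

```r
y <- c(10, 3, NA, 5, NA)

# is.na(y) is a logical index; the length-1 replacement fills every TRUE position
y[is.na(y)] <- 0
y
```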
All of the above subsetting options can be used for subsetting matrices and data frames, with one index for rows and one for columns. Note that if the result has a single row or a single column, [ simplifies it by default to a vector rather than a matrix.

m <- matrix(1:12, nrow = 3, ncol = 4)
m

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

m[1,] # Get 1st row

[1]  1  4  7 10

m[,1] # Get 1st column

[1] 1 2 3

m[1,3] # Get the element in the 1st row and 3rd column

[1] 7

m[c(1,3),] # Get 1st and 3rd rows

     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    3    6    9   12

m[,c(1,3)] # Get 1st and 3rd columns

     [,1] [,2]
[1,]    1    7
[2,]    2    8
[3,]    3    9

m[c(1,3),c(1,3)] # Get 1st and 3rd rows and 1st and 3rd columns

     [,1] [,2]
[1,]    1    7
[2,]    3    9

m[-1,] # Get all rows except 1st

     [,1] [,2] [,3] [,4]
[1,]    2    5    8   11
[2,]    3    6    9   12

m[c(TRUE, FALSE, FALSE),] # Get the 1st row via a logical

[1]  1  4  7 10

# Get the 1st row via a variable
which_rows <- 1
m[which_rows,]

[1]  1  4  7 10

# Add row and column names to the matrix
# str_c() comes from the stringr package (loaded as part of the tidyverse)
colnames(m) <- str_c("col", 1:4)
rownames(m) <- str_c("row", 1:3)
m["row1",]

col1 col2 col3 col4 
   1    4    7   10

which_rows <- c("row1", "row3")
m[which_rows,]

     col1 col2 col3 col4
row1    1    4    7   10
row3    3    6    9   12
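
If a one-row or one-column result needs to stay a matrix, [ accepts a drop = FALSE argument. A short sketch using a fresh copy of the matrix from above, named m2 here so that it does not overwrite m:

```r
m2 <- matrix(1:12, nrow = 3, ncol = 4)

row_vec <- m2[1, ]                # simplified to a vector of length 4
row_mat <- m2[1, , drop = FALSE]  # stays a matrix with 1 row and 4 columns

dim(row_vec)  # NULL, because a plain vector has no dimensions
dim(row_mat)
```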

Selecting a single element with `$` and `[[`

We can use $ and [[ to extract a single column of a data frame. (The same operators can be used to subset lists, which we’ll talk about next week. A data frame is actually a special case of a list.)

mtcars

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

mtcars$mpg

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

mtcars[["mpg"]]

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

which_var <- "mpg"
mtcars[[which_var]]

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4

# pull() is the dplyr equivalent (requires the tidyverse to be loaded)
mtcars %>% pull(mpg)

 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
[31] 15.0 21.4
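
Two details are worth checking for yourself. First, $ takes the column name literally, so it cannot be used with a variable that stores the name. Second, single-bracket [ with a column name keeps the data frame structure, while [[ returns the column itself. A small sketch (the names literal, col_vec, and col_df are made up for this example):

```r
which_var <- "mpg"

# $ looks for a column literally named "which_var", which does not exist in mtcars
literal <- mtcars$which_var
is.null(literal)

# [[ evaluates the variable and returns the column as a vector
col_vec <- mtcars[[which_var]]
is.numeric(col_vec)

# [ returns a data frame with one column
col_df <- mtcars[which_var]
class(col_df)
```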

Exercises

Write functions that take a vector as input and return:

The elements at even-numbered positions. (Hint: use the seq() function.)
Every element except the last value.
Only even values (and no missing values).

Solutions

get_even_pos <- function(x) {
    if (length(x) <= 1) {
        print("No even positions")
    } else {
        idx <- seq(2, length(x), by = 2)
        x[idx]
    }
}
get_even_pos(1:10)

[1]  2  4  6  8 10

get_even_pos(1:9)

[1] 2 4 6 8

get_even_pos(1)

[1] "No even positions"

get_all_but_last <- function(x) {
    head(x, -1)
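    # Alternative when length(x) >= 2: x[1:(length(x) - 1)]
    # (for a length-1 vector, 1:0 counts down and returns the element instead of dropping it)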
}
get_all_but_last(1:10)

[1] 1 2 3 4 5 6 7 8 9

get_evens <- function(x) {
    x[x %% 2 == 0 & !is.na(x)]
}

get_evens(c(1, 2, 7, NA))

[1] 2

get_evens(c(1, 2, 7, 8, NA))

[1] 2 8
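
An alternative for the first function, offered as a sketch rather than the intended solution: build a logical index from the positions themselves. This avoids the special case for short vectors, because a vector with no even positions simply returns an empty vector. The name get_even_pos_lgl is made up for this example.

```r
get_even_pos_lgl <- function(x) {
    # seq_along(x) gives the positions 1, 2, ..., length(x)
    # Keeping the positions divisible by 2 works for any length, including 0 and 1
    x[seq_along(x) %% 2 == 0]
}

get_even_pos_lgl(1:9)
get_even_pos_lgl(1)
```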

Exploring the structure of an object with `str()`

The str() function shows you the structure of an object and is useful for exploring model objects and objects created by packages that are new to you. In the output of str(), dollar signs indicate named components of a list that can be extracted via $ or [[.

The output below shows that both mod and mod_summ are lists, so we can also view these objects interactively with View(mod) and View(mod_summ) in the Console.

mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_summ <- summary(mod)
str(mod)

List of 12
 $ coefficients : Named num [1:3] 37.2273 -0.0318 -3.8778
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "hp" "wt"
 $ residuals    : Named num [1:32] -2.572 -1.583 -2.476 0.135 0.373 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ effects      : Named num [1:32] -113.65 -26.046 -15.894 0.447 0.662 ...
  ..- attr(*, "names")= chr [1:32] "(Intercept)" "hp" "wt" "" ...
 $ rank         : int 3
 $ fitted.values: Named num [1:32] 23.6 22.6 25.3 21.3 18.3 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ assign       : int [1:3] 0 1 2
 $ qr           :List of 5
  ..$ qr   : num [1:32, 1:3] -5.657 0.177 0.177 0.177 0.177 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
  .. .. ..$ : chr [1:3] "(Intercept)" "hp" "wt"
  .. ..- attr(*, "assign")= int [1:3] 0 1 2
  ..$ qraux: num [1:3] 1.18 1.08 1.09
  ..$ pivot: int [1:3] 1 2 3
  ..$ tol  : num 1e-07
  ..$ rank : int 3
  ..- attr(*, "class")= chr "qr"
 $ df.residual  : int 29
 $ xlevels      : Named list()
 $ call         : language lm(formula = mpg ~ hp + wt, data = mtcars)
 $ terms        :Classes 'terms', 'formula'  language mpg ~ hp + wt
  .. ..- attr(*, "variables")= language list(mpg, hp, wt)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "mpg" "hp" "wt"
  .. .. .. ..$ : chr [1:2] "hp" "wt"
  .. ..- attr(*, "term.labels")= chr [1:2] "hp" "wt"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(mpg, hp, wt)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:3] "mpg" "hp" "wt"
 $ model        :'data.frame':  32 obs. of  3 variables:
  ..$ mpg: num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
  ..$ hp : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
  ..$ wt : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula'  language mpg ~ hp + wt
  .. .. ..- attr(*, "variables")= language list(mpg, hp, wt)
  .. .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:3] "mpg" "hp" "wt"
  .. .. .. .. ..$ : chr [1:2] "hp" "wt"
  .. .. ..- attr(*, "term.labels")= chr [1:2] "hp" "wt"
  .. .. ..- attr(*, "order")= int [1:2] 1 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(mpg, hp, wt)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:3] "mpg" "hp" "wt"
 - attr(*, "class")= chr "lm"

str(mod_summ)

List of 11
 $ call         : language lm(formula = mpg ~ hp + wt, data = mtcars)
 $ terms        :Classes 'terms', 'formula'  language mpg ~ hp + wt
  .. ..- attr(*, "variables")= language list(mpg, hp, wt)
  .. ..- attr(*, "factors")= int [1:3, 1:2] 0 1 0 0 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:3] "mpg" "hp" "wt"
  .. .. .. ..$ : chr [1:2] "hp" "wt"
  .. ..- attr(*, "term.labels")= chr [1:2] "hp" "wt"
  .. ..- attr(*, "order")= int [1:2] 1 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(mpg, hp, wt)
  .. ..- attr(*, "dataClasses")= Named chr [1:3] "numeric" "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:3] "mpg" "hp" "wt"
 $ residuals    : Named num [1:32] -2.572 -1.583 -2.476 0.135 0.373 ...
  ..- attr(*, "names")= chr [1:32] "Mazda RX4" "Mazda RX4 Wag" "Datsun 710" "Hornet 4 Drive" ...
 $ coefficients : num [1:3, 1:4] 37.22727 -0.03177 -3.87783 1.59879 0.00903 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "(Intercept)" "hp" "wt"
  .. ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
 $ aliased      : Named logi [1:3] FALSE FALSE FALSE
  ..- attr(*, "names")= chr [1:3] "(Intercept)" "hp" "wt"
 $ sigma        : num 2.59
 $ df           : int [1:3] 3 29 3
 $ r.squared    : num 0.827
 $ adj.r.squared: num 0.815
 $ fstatistic   : Named num [1:3] 69.2 2 29
  ..- attr(*, "names")= chr [1:3] "value" "numdf" "dendf"
 $ cov.unscaled : num [1:3, 1:3] 3.80e-01 2.21e-05 -1.09e-01 2.21e-05 1.21e-05 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:3] "(Intercept)" "hp" "wt"
  .. ..$ : chr [1:3] "(Intercept)" "hp" "wt"
 - attr(*, "class")= chr "summary.lm"
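
Once str() has shown the names and types of the components, they can be pulled out with the subsetting tools from earlier in this lesson. A brief sketch, refitting the same model so that the block runs on its own (the names r2, hp_se, and qr_rank are made up for this example):

```r
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_summ <- summary(mod)

# r.squared is a single number (shown as "num 0.827" in the str() output)
r2 <- mod_summ$r.squared

# coefficients in the summary is a 3 x 4 matrix with row and column names,
# so it can be subset by name
hp_se <- mod_summ$coefficients["hp", "Std. Error"]

# Components can be nested: qr is itself a list with a rank component
qr_rank <- mod$qr$rank

r2
hp_se
qr_rank
```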

Exercise

Write a function that takes the following inputs:

data: A dataset
yvar: Outcome variable to be used in a linear model (a length-1 character vector)
preds: Predictor variables to be used in a linear model (a character vector)
pred_of_interest: The variable whose coefficient estimate and confidence interval are of interest (a length-1 character vector and should be one of preds)

Your function will fit a linear model on the dataset using the given outcome and predictor variables and return a data frame (tibble) with the coefficient estimate and CI for the predictor of interest.

Test your function on the mtcars dataset.

Development tip: As you develop, it will help to create objects for the arguments so that you can see what output looks like interactively:

data <- mtcars
yvar <- "mpg"
preds <- c("hp", "wt")
pred_of_interest <- "hp"

When you’re done developing your function, remove these objects to declutter your environment by entering rm(data, yvar, preds, pred_of_interest) in the Console.

fit_mod_and_extract <- function(___) {
    # Use str_c to create a string (mod_formula_str) that looks like "yvar ~ pred1 + pred2"
    # Look at the documentation for a helpful argument
    mod_formula_str <- ___
    mod_form <- as.formula(mod_formula_str)
    
    # Fit a linear model using the constructed formula and given data
    mod <- lm(mod_form, data = data)
    
    # Obtain 95% confidence interval
    ci <- confint(mod, level = 0.95)
    
    # Return the coefficient estimate and CI for the predictor of interest
    tibble(
        which_pred = pred_of_interest,
        estimate = ___,
        ci_lower = ___,
        ci_upper = ___
    )
}

Solutions

fit_mod_and_extract <- function(data, yvar, preds, pred_of_interest) {
    # Use str_c to create a string (mod_formula_str) that looks like "yvar ~ pred1 + pred2"
    # The collapse argument joins the predictors into a single string
    mod_formula_str <- str_c(yvar, " ~ ", str_c(preds, collapse = " + "))
    mod_form <- as.formula(mod_formula_str)
    
    # Fit a linear model using the constructed formula and given data
    mod <- lm(mod_form, data = data)
    
    # Obtain 95% confidence interval
    ci <- confint(mod, level = 0.95)
    
    # Return the coefficient estimate and CI for the predictor of interest
    tibble(
        which_pred = pred_of_interest,
        estimate = mod$coefficients[pred_of_interest],
        ci_lower = ci[pred_of_interest, "2.5 %"],
        ci_upper = ci[pred_of_interest, "97.5 %"]
    )
}


fit_mod_and_extract(data = mtcars, yvar = "mpg", preds = c("hp", "wt"), pred_of_interest = "hp")

# A tibble: 1 × 4
  which_pred estimate ci_lower ci_upper
  <chr>         <dbl>    <dbl>    <dbl>
1 hp          -0.0318  -0.0502  -0.0133
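
As an aside, base R's reformulate() builds a formula directly from character vectors, so it could stand in for the str_c() and as.formula() steps. This is an alternative sketch, not the approach the exercise asks for:

```r
yvar <- "mpg"
preds <- c("hp", "wt")

# reformulate() joins the predictor names with + and attaches the response
mod_form <- reformulate(preds, response = yvar)
mod_form

# The resulting formula can be passed to lm() like any other
mod <- lm(mod_form, data = mtcars)
coef(mod)[["hp"]]
```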

Iterative development and debugging in Shiny

When working with new features in Shiny, it is very helpful to combine str() with renderPrint() and verbatimTextOutput() so that the structure of an input or intermediate object is printed directly in the app. We’ll go through Shiny Challenge 2 to demonstrate.
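
A minimal sketch of this pattern, assuming the shiny package is installed. The input ID n_bins, the output ID debug, and the slider itself are made up for illustration and are not taken from Shiny Challenge 2.

```r
library(shiny)

ui <- fluidPage(
    sliderInput("n_bins", "Number of bins", min = 5, max = 50, value = 20),
    # verbatimTextOutput() displays printed output in a fixed-width font
    verbatimTextOutput("debug")
)

server <- function(input, output) {
    # renderPrint() captures whatever the expression prints,
    # here the structure of the current input value
    output$debug <- renderPrint({
        str(input$n_bins)
    })
}

# Launch the app only in an interactive session
if (interactive()) {
    shinyApp(ui = ui, server = server)
}
```

While developing, the debug panel shows the type and value of the input each time the slider moves, which makes it easier to see what the server code is actually receiving. The panel can be removed once the feature works.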