# Subsetting

The content here comes from [Chapter 27](https://r4ds.hadley.nz/base-r) of R4DS, with some small additions.

## Selecting many elements with `[`

There are five main types of things that you can subset a vector with, i.e., that can be the `i` in `x[i]`:

1.  **A vector of positive integers**. Subsetting with positive integers keeps the elements at those positions:

    ```{r}
    x <- c("one", "two", "three", "four", "five")
    x[c(3, 2, 5)]
    x[2:4]
    ```

    By repeating a position, you can actually make a longer output than input, making the term "subsetting" a bit of a misnomer.

    ```{r}
    x[c(1, 1, 2)]
    ```

2.  **A vector of negative integers**. Negative values drop the elements at the specified positions:

    ```{r}
    x[c(-1, -3, -5)]
    ```

3.  **A logical vector**. Subsetting with a logical vector only keeps values corresponding to `TRUE`. This is generally used with comparison functions and operators.

    ```{r}
    x <- c(10, 3, NA, 5, 8, 1, NA)

    # All non-missing values of x
    x[!is.na(x)]

    # All values greater than 5, plus an NA for each missing value
    x[x > 5]

    # All non-missing values greater than 5
    x[x > 5 & !is.na(x)]
    ```

    Unlike `filter()`, which drops rows where the condition is `NA`, `[` keeps `NA` indices in the output as `NA`s.

4.  **A character vector**. If you have a named vector, you can subset it with a character vector:

    ```{r}
    x <- c(abc = 1, def = 2, xyz = 5)
    x[c("xyz", "def")]
    ```

    As with subsetting with positive integers, you can use a character vector to duplicate individual entries.

5.  **An object**. An object that stores any of the previous four types of index can be used to subset:

    ```{r}
    x <- c(first = "one", second = "two", third = "three", fourth = "four")

    # Note that x can also be created as follows
    x <- c("one", "two", "three", "four")
    names(x) <- c("first", "second", "third", "fourth")

    # Subset with an integer object
    idx_pos <- c(1, 3)
    idx_neg <- c(-1, -3)
    x[idx_pos]
    x[idx_neg]

    # Subset with a logical object
    bool <- c(TRUE, FALSE, FALSE, TRUE)
    x[bool]

    # Subset with a character object
    which_names <- c("first", "fourth")
    x[which_names]
    ```

All of the above subsetting options can be combined with assignment `<-`. **Be very wary of vector recycling when doing this!** The number of things that you're inserting should either be 1 or the size of the `x[i]` subset.

```{r}
x <- c(first = "one", second = "two", third = "three", fourth = "four")
x
x[c(1, 3)] <- "new" # Replacement length is 1
x

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3)] <- c("new1", "new2") # Replacement length is 2, and length of subset is 2
x

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3, 4)] <- c("new1", "new2") # BAD! Replacement length is 2, and length of subset is 3
x

x <- c(first = "one", second = "two", third = "three", fourth = "four")
x[c(1, 3)] <- c("new1", "new2", "new3") # BAD! Replacement length is 3, and length of subset is 2
x
```

All of the above subsetting options can be used for subsetting matrices and data frames. Note that when a matrix subset has only one row or one column, the result is a vector rather than a matrix.

```{r}
library(tidyverse) # Provides str_c(), pull(), and %>% used below

m <- matrix(1:12, nrow = 3, ncol = 4)
m
m[1, ] # Get 1st row
m[, 1] # Get 1st column
m[1, 3] # Get the element in the 1st row and 3rd column
m[c(1, 3), ] # Get 1st and 3rd rows
m[, c(1, 3)] # Get 1st and 3rd columns
m[c(1, 3), c(1, 3)] # Get 1st and 3rd rows and 1st and 3rd columns
m[-1, ] # Get all rows except 1st
m[c(TRUE, FALSE, FALSE), ] # Get the 1st row via a logical

# Get the 1st row via a variable
which_rows <- 1
m[which_rows, ]

# Add row and column names to the matrix
colnames(m) <- str_c("col", 1:4)
rownames(m) <- str_c("row", 1:3)
m["row1", ]
which_rows <- c("row1", "row3")
m[which_rows, ]
```

## Selecting a single element with `$` and `[[`

We can use `$` and `[[` to extract a single column of a data frame. (The same operators can be used to subset lists, which we'll talk about next week. A data frame is actually a special case of a list.)

```{r}
mtcars
mtcars$mpg
mtcars[["mpg"]]

which_var <- "mpg"
mtcars[[which_var]]

mtcars %>% pull(mpg)
```

## Exercises

Write functions that take a vector as input and return:

1.  The elements at even-numbered positions. (Hint: use the `seq()` function.)
2.  Every element except the last value.
3.  Only even values (and no missing values).
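
As a supplement to the matrix and data frame examples earlier in this section, here is a small sketch of how to keep the matrix or data frame shape when a subset has a single row or column. `drop = FALSE` is a base R argument to `[`; the object name `m2` is made up for illustration.

```{r}
# Hypothetical small matrix, for illustration only
m2 <- matrix(1:6, nrow = 2)

m2[1, ]                # One row: returned as a plain integer vector
m2[1, , drop = FALSE]  # drop = FALSE keeps the result as a 1 x 3 matrix

mtcars["mpg"]    # `[` with a column name returns a one-column data frame
mtcars[["mpg"]]  # `[[` extracts the column itself as a numeric vector
```

The same distinction is why `[[` (or `$`) is the tool to reach for when a function needs the column's values rather than a smaller data frame.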

# Exploring the structure of an object with `str()`

The `str()` function shows you the structure of an object and is useful for exploring model objects and objects created from packages that are new to you.

In the output of `str()`, dollar signs indicate named components of a list that can be extracted via `$` or `[[`. In the example below, both `mod` and `mod_summ` are lists, so we can also view these objects interactively with `View(mod)` and `View(mod_summ)` in the Console.

```{r}
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_summ <- summary(mod)

str(mod)
str(mod_summ)
```

## Exercise

Write a function that takes the following inputs:

- `data`: A dataset
- `yvar`: Outcome variable to be used in a linear model (a length-1 character vector)
- `preds`: Predictor variables to be used in a linear model (a character vector)
- `pred_of_interest`: The variable whose coefficient estimate and confidence interval are of interest (a length-1 character vector that should be one of `preds`)

Your function should fit a linear model on the dataset using the given outcome and predictor variables and return a data frame (`tibble`) with the coefficient estimate and CI for the predictor of interest.

Test your function on the `mtcars` dataset.

Development tip: As you develop, it will help to create objects for the arguments so that you can see what the output looks like interactively:

```{r eval=FALSE}
data <- mtcars
yvar <- ___
preds <- ___
pred_of_interest <- ___
```

When you're done developing your function, remove these objects to declutter your environment by entering `rm(data, yvar, preds, pred_of_interest)` in the Console.

```{r eval=FALSE}
fit_mod_and_extract <- function(___) {
    # Use str_c to create a string (mod_formula_str) that looks like "yvar ~ pred1 + pred2"
    # Look at the documentation for a helpful argument
    mod_formula_str <- ___
    mod_form <- as.formula(mod_formula_str)

    # Fit a linear model using the constructed formula and given data
    mod <- lm(mod_form, ___)

    # Obtain 95% confidence interval
    ci <- confint(mod, level = 0.95)

    # Return the coefficient estimate and CI for the predictor of interest
    tibble(
        which_pred = pred_of_interest,
        estimate = ___,
        ci_lower = ___,
        ci_upper = ___
    )
}
```
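
Going back to the `str()` output at the start of this section, the sketch below illustrates how the components marked with dollar signs can be pulled out with `$` and `[[`. It refits the same model so that it stands on its own; `which_part` is a made-up object name, and `r.squared` and `sigma` are just two of the components that `summary()` returns for an `lm` fit.

```{r}
mod <- lm(mpg ~ hp + wt, data = mtcars)
mod_summ <- summary(mod)

names(mod_summ)          # Component names, matching the `$` lines in str(mod_summ)
mod_summ$r.squared       # Extract one component with `$`
mod_summ[["r.squared"]]  # The same component with `[[`

# `[[` also works when the component name is stored in an object
which_part <- "sigma"
mod_summ[[which_part]]
```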

# Iterative development and debugging in Shiny

When working with new features in Shiny, it is very helpful to combine `str()` with `renderPrint()` and `verbatimTextOutput()`. We'll go through Shiny Challenge 2 to demonstrate.
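
Before turning to the challenge, here is a minimal sketch of the general pattern, not the challenge solution. The input ID `"n"`, the output ID `"debug"`, and the slider settings are made up for illustration. The idea is to call `str()` inside `renderPrint()` on whatever object you want to inspect and display the result with `verbatimTextOutput()`.

```{r eval=FALSE}
library(shiny)

# Hypothetical minimal app: the IDs "n" and "debug" are illustrative only
ui <- fluidPage(
  sliderInput("n", "Number of points", min = 1, max = 100, value = 10),
  verbatimTextOutput("debug") # Displays printed output in a fixed-width font
)

server <- function(input, output, session) {
  # renderPrint() captures anything printed inside it, including str() output
  output$debug <- renderPrint({
    str(input$n)
  })
}

app <- shinyApp(ui, server)
# runApp(app) # Launch the app from the Console
```

Once the structure of the object is clear, the debugging output can be removed or replaced with the real output.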