Topic 1 Welcome and Motivation
Introductions
Get to know the others at your table. Share your names, preferred pronouns, and anything else that is important to you.
A taste of data visualization in R
Throughout the semester we’ll be using the statistical software R to analyze data. We’ll take a first look at R code today in the following activity.
Board games are a favorite hobby of your instructor, and to her delight, there is a dataset containing information about many different board games and their ratings from the website BoardGameGeek.com!
You will work through exploring this dataset in pairs, and the instructor will be circling around to help. For some phases of this exploration, you will need to have this page open to access the data description.
This code may seem foreign at first, but over the semester you will have a lot of practice, and you’ll be able to write code like this by yourself at the end! Today, focus on getting a feel for what the code looks like and its potential for helping us gain insight.
Warm-up
The data are stored in an object (a container) called `games`. Let’s get a quick feel for this data.
- How many variables are present? Which are quantitative? Which are categorical?
- How many cases, or units of observation, do we have?
- How are the variables named / labeled?
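One way to answer the warm-up questions is with a few base R inspection functions. The sketch below is only an illustration: the small `games` data frame is a made-up stand-in (not the BoardGameGeek data), and the column names `average_rating`, `min_players`, and `min_playtime` are assumptions.

```r
# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  name           = paste("Game", LETTERS[1:10]),
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

dim(games)    # number of cases (rows) and number of variables (columns)
names(games)  # how the variables are named
str(games)    # type of each variable (numeric vs. character)
head(games)   # first few cases
```

With the real `games` object already loaded, only the last four lines would be needed.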
Phase 1
- Use the code below to construct a visualization (a histogram) of average ratings for the games.
- What do you notice? What information can you gain from this plot?
- What do you think `binwidth = 2` does? Try setting a different value for `binwidth` to make a plot that looks better to you.
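A histogram like the one described in this phase could be sketched as follows, assuming the ggplot2 package. The `games` data frame here is a made-up stand-in, and the column name `average_rating` is an assumption rather than a confirmed name from the real dataset.

```r
library(ggplot2)

# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

# binwidth sets how wide each bar's interval is, in rating points.
p_hist <- ggplot(games, aes(x = average_rating)) +
  geom_histogram(binwidth = 2)

print(p_hist)
```

Changing `binwidth = 2` to a smaller value such as `0.5` gives narrower bars and a more detailed picture of the distribution.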
Phase 2
- Another way to visualize average ratings is shown below. Use the code below to construct a density plot of average ratings.
- Do you like this plot or the previous one better? Why?
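A density plot of the same variable might look like the sketch below, again assuming ggplot2, a made-up stand-in for `games`, and a hypothetical `average_rating` column.

```r
library(ggplot2)

# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

# A density plot is a smoothed version of a histogram:
# the area under the curve adds up to 1.
p_dens <- ggplot(games, aes(x = average_rating)) +
  geom_density()

print(p_dens)
```

The only change from the histogram sketch is the layer: `geom_density()` replaces `geom_histogram()`.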
Phase 3
- What other variables might you visualize with code similar to the examples in Phases 1 and 2? Try the code here.
- Make note of any interesting findings.
Phase 4
- Using the code below, construct a visualization of average ratings broken down by minimum number of players.
- What do you notice? What is surprising? What is not surprising?
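One way to break a plot down by a second variable is to draw one panel per group with `facet_wrap()`. The sketch below assumes ggplot2, a made-up stand-in for `games`, and hypothetical column names `average_rating` and `min_players`. It uses histograms in each panel; a density layer could be swapped in instead.

```r
library(ggplot2)

# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

# facet_wrap(~ min_players) draws a separate panel for each value of min_players.
p_facet <- ggplot(games, aes(x = average_rating)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ min_players)

print(p_facet)
```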
Phase 5
- Using the code below, construct two alternate visualizations of average ratings broken down by minimum number of players. These plots use color instead of panels and also only keep games for which the minimum number of players ranges from 1 to 4.
- Which of the three visualizations do you like best? Why?
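Filtering the data and then mapping the grouping variable to color could be sketched as below. This is one possible pair of plots, not necessarily the pair used in the activity. It assumes ggplot2 and dplyr, a made-up stand-in for `games`, and hypothetical column names `average_rating` and `min_players`.

```r
library(ggplot2)
library(dplyr)

# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

# Keep only games whose minimum number of players is between 1 and 4.
games_1to4 <- dplyr::filter(games, min_players >= 1, min_players <= 4)

# Version 1: one colored density curve per group.
# factor() tells ggplot to treat min_players as categories, not a number line.
p_lines <- ggplot(games_1to4, aes(x = average_rating, color = factor(min_players))) +
  geom_density()

# Version 2: filled, semi-transparent densities so overlaps stay visible.
p_fill <- ggplot(games_1to4, aes(x = average_rating, fill = factor(min_players))) +
  geom_density(alpha = 0.5)

print(p_lines)
print(p_fill)
```

Each condition passed to `dplyr::filter()` must be true for a row to be kept, so the game with a minimum of 5 players is dropped here.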
Phase 6
- Using the code below, construct a visualization of the relationship between minimum recommended play time and average rating (a scatterplot).
- Is the plot useful? Why or why not?
- Using the `dplyr::filter` part of the Phase 5 code, subset the data to only keep games where the minimum recommended playtime is less than 1000 minutes. Then remake the scatterplot.
- Make note of any interesting findings.
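The scatterplot and the filtering step from this phase could be sketched as follows, assuming ggplot2 and dplyr. The `games` data frame is a made-up stand-in with one deliberately extreme playtime, and the column names `min_playtime` and `average_rating` are assumptions.

```r
library(ggplot2)
library(dplyr)

# Toy stand-in for `games`: all values and column names are made up for illustration.
games <- data.frame(
  average_rating = c(5.2, 5.7, 6.8, 7.4, 6.3, 4.9, 6.1, 8.1, 7.2, 7.0),
  min_players    = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 5),
  min_playtime   = c(30, 15, 45, 60, 90, 20, 40, 120, 60, 6000)
)

# Scatterplot on all games: one extreme playtime squeezes the other points together.
p_all <- ggplot(games, aes(x = min_playtime, y = average_rating)) +
  geom_point()

# Keep only games with a minimum playtime under 1000 minutes, then replot.
games_short <- dplyr::filter(games, min_playtime < 1000)

p_short <- ggplot(games_short, aes(x = min_playtime, y = average_rating)) +
  geom_point()

print(p_all)
print(p_short)
```

Comparing the two plots shows why the filtering step helps: with the outlier removed, the horizontal axis covers a range where the remaining points can be told apart.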