Schedule
Course Calendar
Readings in the schedule below refer to the following textbooks (freely available online):
- R for Data Science (2e) by Wickham, Cetinkaya-Rundel, and Grolemund (Abbreviated as R4DS)
- R Programming for Data Science (Abbreviated as RPDS)
- Modern Data Science with R (3e) by Baumer, Kaplan, and Horton (Abbreviated as MDSR)
Guiding questions for the readings are available at the bottom of this page.
Week | Tuesday | Thursday | Announcements |
---|---|---|---|
1 | 9/5: Welcome! Meeting each other and designing our learning community. Before class: Review the syllabus and think about the questions posed in the green "Reflect" blocks. | 9/7: Advanced visualization in ggplot. Before class: Review the construction of plots from STAT 112 and STAT 155. Answer the Guiding Questions at the bottom of this page. | Work on HW0 (your 10-year vision, doesn't need to be turned in). Look ahead to HW1. |
2 | 9/12: Advanced map visualization. Before class: Watch this video on Coordinate Reference Systems, and answer the Guiding Questions at the bottom of this page. | 9/14: Advanced map visualization (continued) | Turn in HW1 by midnight on Wed 9/13. Look ahead to HW2, due Wednesday 9/20 at midnight. |
3 | 9/19: Interactive visualization. Before class: Listen to this podcast from Chapter 7 (timestamp 18:09) through Chapter 8 (ending at timestamp 25:27). Answer the Guiding Question at the bottom of this page. Install the "shiny" and "plotly" R packages. | 9/21: Classroom Community and Connectedness (CC&C) Survey. For the first 30 minutes, we will move our course projects forward. In the last hour of class, CC&C facilitators will come in to run an activity on how community-building is going in our course. | Turn in HW2 by midnight on Wednesday 9/20. Look ahead to HW3. |
4 | 9/26: Data wrangling: numbers, logicals, and dates. Helpful readings (read before or after class): R4DS Chapter 13 (Logicals), Chapter 14 (Numbers), and Chapter 18 (Dates/Times). | 9/28: Data wrangling: strings. Helpful readings (read before or after class): R4DS Chapter 15 (Strings) and Chapter 16 (Regular Expressions). | Turn in HW3. Look ahead to HW4 (Project Milestone 2). |
5 | 10/3: Data wrangling: factors. Helpful readings (read before or after class): R4DS Chapter 17 (Factors). | 10/5: Writing functions. Helpful readings (read before or after class): R4DS Chapter 26 (Functions) and RPDS Section 13.1 (if-else). | Turn in Project Milestone 2 by either Wed 10/4 or Wed 10/11 at midnight. Start looking at Reflection 1. |
6 | 10/10: Loops and iteration. Helpful readings (read before or after class): R4DS Chapter 27 (Iteration) and this tutorial. | 10/12: Loops and iteration | Turn in HW4 (Project Milestone 2) by Wed 10/11 if you haven't already. Turn in Reflection 1 by Wed 10/11. |
7 | 10/17: Data acquisition: APIs. Helpful readings (read before or after class): | 10/19: Data acquisition: Scraping. Helpful readings (read before or after class): rvest vignette | Turn in HW5. |
8 | 10/24: Project feedback. Project progress presentation #1 (Milestone 3): Your team will present a 5-7 minute progress report and plan for next steps. | 10/26: No class - Fall Break 🍁 | |
9 | 10/31: Review and practice (Day 1) | 11/2: Review and practice (Day 2) | Work on review activities as part of HW6. |
10 | 11/7: Data acquisition: databases. Required reading before class: R4DS Chapter 22 (Databases). | 11/9: Data acquisition: databases (continued) | Work on HW7. |
11 | 11/14: Missing data: wrangling and missingness mechanisms. Required reading before class: Chapters 1, 2, and 3 in The Missing Book. | 11/16: Missing data: imputation. Required reading before class: Chapters 11, 13, and 14 in The Missing Book. | Turn in HW7 on Wednesday 11/15. |
12 | 11/21: Project presentations. Project progress presentation #2 (Milestone 4): Your team will give a 10-minute presentation with intermediate results for 2 research questions. | 11/23: No class - Thanksgiving Break 🦃 | |
13 | 11/28: Project work time | 11/30: Project work time | |
14 | 12/5: Project work time | 12/7: Project work time | |
15 | 12/12: Project presentations (last day of class) | | |
Guiding Questions
Do your best to answer guiding questions before the indicated class period. Responses don’t need to be turned in, but answering helps you prepare effectively for class.
11/7: Databases
Before class on Tuesday, install the DBI and duckdb packages. Post in the #questions channel on Slack if you run into issues.
As you read the Databases chapter of R4DS, answer the following questions:
- What are the differences between client-server, cloud, and in-process database management systems (DBMSs)?
- Make note of how the following functions are used in a database workflow. What does each function do? What inputs/arguments does each function require? Create a set of notes that shows the sequence of these functions in a database workflow.
  - DBI::dbConnect()
  - DBI::dbReadTable()
  - DBI::dbGetQuery()
  - tbl()
  - collect()
- Make note of how the show_query() function can help you learn SQL (structured query language) by translating dplyr code into SQL.
- As you read Section 22.5 (SQL), create notes that relate parts of SQL queries to dplyr functions.
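As a starting point for your notes, here is one way the functions listed above might fit together. This is only a minimal sketch, not the chapter's own example: it assumes the DBI, duckdb, dplyr, and dbplyr packages are installed, and the in-memory database and the table name mtcars_db are hypothetical choices made for illustration.

```r
library(DBI)
library(dplyr)

# Connect to an in-process DuckDB database (in-memory when no file is given)
con <- DBI::dbConnect(duckdb::duckdb())

# Copy a built-in data frame into the database.
# "mtcars_db" is a hypothetical table name used only for this sketch.
DBI::dbWriteTable(con, "mtcars_db", mtcars)

# Read an entire table into R as a data frame
full_table <- DBI::dbReadTable(con, "mtcars_db")

# Send a SQL query and get the result back as a data frame
heavy_cars <- DBI::dbGetQuery(con, "SELECT mpg, wt FROM mtcars_db WHERE wt > 4")

# Create a lazy reference to the table; dplyr verbs on it are translated to SQL
cars_tbl <- tbl(con, "mtcars_db")
mpg_by_cyl <- cars_tbl |>
  group_by(cyl) |>
  summarize(mean_mpg = mean(mpg, na.rm = TRUE))

# Print the SQL that dplyr generated for the pipeline above
show_query(mpg_by_cyl)

# Run the query in the database and bring the result into R
mpg_by_cyl_local <- collect(mpg_by_cyl)

# Close the connection and shut down the in-process database
DBI::dbDisconnect(con, shutdown = TRUE)
```

Comparing the output of show_query() with the dplyr pipeline that produced it is one way to start relating SQL clauses to dplyr verbs.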
10/19: Web scraping
As you read the rvest package vignette, answer the following questions:
- The first step of getting (scraping) data from an arbitrary web page is to read in the webpage. What rvest function(s) are relevant for this step?
- Next we need to select the HTML elements that contain the information that we want. What function(s) are relevant here?
  - How would we select all elements that have the class “author”?
  - How would we select all level 1 headers (tag <h1>)?
- Next we extract information from the selected HTML elements.
  - What is the difference between selecting the text contents of an HTML element and selecting an attribute of the HTML element? What functions are used for these two tasks?
  - How would we get all URLs for links that appear on a webpage?
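To check your answers, it can help to experiment on a tiny page you control. The sketch below assumes rvest 1.0 or later is installed and uses a made-up HTML snippet (the names and links are invented for illustration) rather than a live website.

```r
library(rvest)

# Build a small, made-up page so the example does not depend on a live site.
# For a real page you would call read_html() on the page's URL instead.
page <- minimal_html('
  <h1>Quotes</h1>
  <p class="author">Ada Lovelace</p>
  <p class="author">Grace Hopper</p>
  <a href="https://example.com/a">First link</a>
  <a href="https://example.com/b">Second link</a>
')

# Select elements with CSS selectors: by class (".author") and by tag ("h1")
authors <- html_elements(page, ".author")
headers <- html_elements(page, "h1")

# Extract the text contents of the selected elements
author_names <- html_text2(authors)
header_text <- html_text2(headers)

# Extract an attribute (here, the href of every link) instead of the text
link_urls <- page |>
  html_elements("a") |>
  html_attr("href")
```

Printing author_names, header_text, and link_urls shows the difference between the text of an element and the value of one of its attributes.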
9/19: Interactive visualization
After listening to this podcast from Chapter 7 (timestamp 18:09) through Chapter 8 (ending at timestamp 25:27), reflect on the following question:
- What was new, unexpected, or interesting in the discussion about animations, interactivity, and dashboards?
9/12: Advanced map visualization
After/while watching this video on Coordinate Reference Systems (CRS), answer the following questions:
- What is the shape of the Earth?
- Why is GDA94 a great datum name?
- What are the two components of a CRS/GCS?
- Why do we use many different local CRSs rather than just one CRS for the whole earth?
- Why is it insufficient to identify a location by its latitude and longitude?
- Why do we need to be mindful about CRSs when working with different spatial datasets?
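The last question can be explored in code. The following is a minimal sketch that assumes the sf package is installed; the coordinates and the two EPSG codes are illustrative choices, not taken from the video.

```r
library(sf)

# One point given as longitude/latitude in WGS 84 (EPSG:4326).
# The coordinates are illustrative.
pt_wgs84 <- st_sfc(st_point(c(-93.17, 44.94)), crs = 4326)

# The same location re-expressed in a projected CRS (Web Mercator, EPSG:3857)
pt_mercator <- st_transform(pt_wgs84, 3857)

# Same place on Earth, very different numbers: degrees versus meters
coords_wgs84 <- st_coordinates(pt_wgs84)
coords_mercator <- st_coordinates(pt_mercator)

# Check whether two layers share a CRS before combining or comparing them
same_crs <- st_crs(pt_wgs84) == st_crs(pt_mercator)
```

Here same_crs is FALSE, so one of the two objects would need to be transformed with st_transform() before the layers are plotted or joined together.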
9/7: Advanced visualization in ggplot
To review plot creation skills from STAT/COMP 112 and STAT 155, use the diamonds dataset in the ggplot2 package to recreate the following visualizations: