Practicing our tools so-far; mini-project

Based on your observations in Reflection 1, use these exercises/explorations to practice what will be most beneficial to you.

As part of Homework 6 (due next Wed 10/8), the only required part of these activities is the “Improving your data visualizations” section. You are welcome to submit anything you work on this week for feedback.



Improving your data visualizations (REQUIRED)

Section 4.2.1 Guidelines for good plots presents 6 guidelines for creating great plots.

  • In class: Skim through the examples for 6 guidelines to get a feel for how the guidelines are implemented.
  • Out of class: Read through Guidelines for good plots in detail, and complete the following exercises in the following sections.

Resources for sparking creativity and imagination in your plots

Exercise: Explore at least one of these resources, and keep an eye out for the use of the 6 guidelines for great plots. Write a few sentences describing what you observe.

Applying the guidelines to your project

Part 1: Identify a key visualization from your project–one that communicates a finding that you really want your audience to remember. Display this original plot in the code chunk below. If you pick the same plot as your teammates, that’s fine. But please work on improvements to the visualization on your own so that you can best develop your own skills.

# Code for your original project visualization

Part 2: Evaluate your original visualization in terms of the 6 guidelines. Write a paragraph in the space below explaining which guidelines you want to use to improve your plot and why.

Write your evaluation here.

Part 3: Implement the changes that you described in Part 2, and display your improved plot below.

# Code for improved project visualization



Practice: recreating visualizations

Context 1: In the New York Times article Vast New Study Shows a Key to Reducing Poverty: More Friendships Between Rich and Poor, jump down to the section titled “A culture of success”. Recreate the “Where People Make Friends, By Income Rank” visualization for the “In College” graph. You can generate some example data with the following code:

# Each case represents a person
# income: This person's income in dollars
# friend_from_college: An indicator of whether this person made most of their current friends in college (1 = yes, 0 = no)
set.seed(62)
friend_data <- tibble(
    income = rnorm(n = 1000, mean = rep(seq(from = 40, by = 10, length.out = 20), each = 50)*1000, sd = 1000),
    friend_from_college = rbinom(n = 1000, size = 1, prob = rep(seq(from = 0.01, to = 0.7, length.out = 20), each = 50))
)

Context 2: Recreate any of the visualizations in The Pudding’s Film or Digital? article using the data available here. Suggested plots to recreate:

  • The first bar graph (“MEDIUMS OF TOP MOVIES”)
  • The square grid graphs (“MEDIUMS OF TOP MOVIES - Broken down by genre”)
  • The final dot plot (“MEDIUMS OF TOP MOVIES BY BUDGET RANGE (in logarithmic scale)”)

Context 3: Recreate any of the visualizations in The Pudding’s When Women Make Headlines article using the data available here. Suggested plots to recreate:

  • The first plot(s) with the title and subtitle “Words used in headlines about women - Arranged by country and ordered by frequency”. Create different plots highlighting different words.
  • The horizontal line plot with black and green dots (“Most outlets sensationalize headlines about women more than other topics - Comparing 65 news outlets over 10 years”)



Practice: string wrangling and regular expressions

Bookmark the strings cheatsheet and the regular expressions cheatsheet

  • Try any number of the exercises in the Regular Expressions chapter of R4DS.
  • Try any number of the Puzzles and Challenges at https://regexcrossword.com/. (Start with the Tutorial.)
    • As you work on these, it may be helpful to test out the regular expressions in the Console. Try to create examples of strings that fit the pattern and examples that look like they might fit the pattern but don’t.



Practice: maps

The following TidyTuesday datasets contain spatial information:

Use each of these datasets to create maps within the United States that display the locations of these sites (e.g., nationwide or for a particular state).

  1. The first step will be to display the map background. Review the use of the tidycensus::get_acs() function from the setup of Homework 2. The tidycensus package is an API wrapper package and provides tools for accessing geographic and census information in the United States. The get_acs() function gets information from the American Community Survey, a regular nationwide survey that provides information on a large range of demographic and livelihood indicators at different geographic levels (i.e., census tracts, counties, states).

    • Look at the documentation by entering ?tidycensus::get_acs in the Console.
    • If you want to make a choropleth map and color geographic regions by census variables, it will be useful to look at the Seaching for variables documentation from the tidycensus package. This will help with the variables argument in the get_acs() function.
  2. Review the usage of the st_as_sf() function from Homework 2 as well as the st_crs() function. Use these functions to convert the historical markers and wastewater plants datasets into sf objects.

  3. Continue to reference your work from Homework 2 as you make your maps. Practice fine tuning labels and color scales on your map.



Mini-project: Mac schedule explorer

Part 1: Navigate to the Spring 2024 Course Schedule.

  • Google Chrome users: Use Command-S, Ctrl-S, or File > Save Page As… to bring up options for saving the webpage. For Format, choose Webpage, Complete, and save this file as spring24_schedule.html.
  • All other browsers: Navigate to Moodle to download the webpage HTML source that the instructor downloaded.
    • Note: Safari has a Web Archive format, but unfortunately, the file type can’t be read by rvest.

Part 2: Use rvest to scrape the following data for all courses:

  • Course ID (e.g., STAT 212-01)
  • Course title (e.g., Intermediate Data Science)
  • Meeting days (e.g., T R)
  • Meeting times (e.g., 9:40 - 11:10 am)
  • Instructor name (e.g., Leslie Myint)
  • Available seats
  • Max enrollment

Part 3: Use any data wrangling tools necessary to clean this data so that the expected data is properly formatted.

Part 4: (OPTIONAL STRETCH GOAL) Scrape the following information from all courses in the Mac course catalog: course ID, course title, and URL for the catalog description for the course. Combine this information with your cleaned course schedule dataset.

  • Note: For getting the URL, you will need to look at the html_attr() function in rvest.

Part 5: Create a Shiny app that allows the user to explore/filter the schedule by whatever criteria you see fit. Include functionality to allow users to navigate to course descriptions in the Mac course catalog. Use your expertise as a student to design this app in a useful way for students!

  • Optional: If you want to learn a new way of creating more beautiful Shiny user interfaces, check out the flexdashboard package.

New topics to explore

  • Advanced interactivity
    • Shiny: Check out this series of tutorials to learn more about reactivity
    • plotly: Check out this tutorial that explains how plotly graphics and Shiny can work together
    • htmlwidgets
  • Text Mining with R: https://www.tidytextmining.com/